Theory behind the Technology: On Weak and Strong Laws of Large Numbers

Introduction

In statistics, we use the mean of a sample as an estimate of the mean of the population. For example, if the average height of a random sample of a thousand people from a region is 6 feet, we estimate that the average height of all people in that region is 6 feet. Why does this work? The weak and strong laws of large numbers provide the theoretical basis. Given below are the two laws and the difference between them, which justifies their names.

Notation

Let $X_1$, $X_2$, $\cdots$, $X_n$ be independent and identically distributed random variables. (In layman's terms, $X_1$ is the first observation, $X_2$ is the second, and so on.)
Let $M_n$ be a random variable denoting the mean of $X_1$, $X_2$, $\cdots$, $X_n$. In other words, \[M_n=\frac{1}{n}\sum_{i=1}^{n}X_i\]
So this is the mean of the sample.
Let $\mu$ be the mean of each of $X_1$, $X_2$, $\cdots$, $X_n$. In other words, $\mathbf{E}(X_i)=\mu$ for each $i$. So $\mu$ is the mean of the population (usually unknown, which is why we want to estimate it!).
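
To make the notation concrete, here is a minimal Python sketch. The fair six-sided die (so that $\mu=3.5$), the seed, and the function name `sample_mean` are assumptions chosen only for illustration; they do not come from the references.

```python
import random


def sample_mean(observations):
    """Return M_n, the arithmetic mean of the n observations X_1..X_n."""
    return sum(observations) / len(observations)


# Assumed example population: rolls of a fair six-sided die, so mu = 3.5.
mu = 3.5
rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility

# X_1, ..., X_n with n = 1000: independent, identically distributed rolls.
rolls = [rng.randint(1, 6) for _ in range(1000)]
m_n = sample_mean(rolls)

print(f"n = {len(rolls)}, M_n = {m_n:.3f}, mu = {mu}")
```

In practice $\mu$ is unknown and only `m_n` can be computed; the die is used here because its mean is known, so the two can be compared.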

Weak Law of Large Numbers

This law states that for any $\epsilon>0$,
\[\lim_{n\to\infty}\mathbf{P}(|M_n-\mu|>\epsilon)=0\]

Interpretation

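For any fixed tolerance $\epsilon>0$, the probability that $M_n$ (the sample mean) differs from the population mean $\mu$ by more than $\epsilon$ can be made as small as we like by taking $n$ large enough. That is, for every $\delta>0$ there is an $n_0$ such that $\mathbf{P}(|M_n-\mu|>\epsilon)<\delta$ for all $n>n_0$. The probability tends to 0, but it need not equal 0 for any finite $n$.
Alternatively, as $n$ grows, the probability mass of $M_n$ concentrates in the interval $(\mu-\epsilon,\mu+\epsilon)$.
Alternatively, for large samples, the sample mean is within an $\epsilon$-neighborhood of the population mean with high probability. This kind of convergence is called convergence in probability.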
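
The weak law can be watched numerically. The sketch below is a Monte Carlo estimate, not a proof: it assumes fair coin flips (so $\mu=0.5$), a tolerance of $\epsilon=0.05$, and 2000 repetitions per sample size, all chosen for illustration. The function name `deviation_probability` is hypothetical.

```python
import random


def deviation_probability(n, epsilon, trials, rng):
    """Estimate P(|M_n - mu| > epsilon) for n fair coin flips (mu = 0.5)."""
    mu = 0.5
    exceed = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        # |heads - n*mu| > epsilon*n is the same event as |M_n - mu| > epsilon,
        # written in terms of counts instead of means.
        if abs(heads - n * mu) > epsilon * n:
            exceed += 1
    return exceed / trials


rng = random.Random(42)  # fixed seed, chosen arbitrarily
epsilon = 0.05
estimates = {
    n: deviation_probability(n, epsilon, trials=2000, rng=rng)
    for n in (10, 100, 1000)
}

for n, p in estimates.items():
    print(f"n = {n:5d}: estimated P(|M_n - mu| > {epsilon}) = {p:.3f}")
```

The estimated probability should shrink as $n$ grows, but for each finite $n$ it is the probability of a deviation at that particular $n$ only; it says nothing yet about what happens along a single growing sample.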

Strong Law of Large Numbers

This law states that
\[\mathbf{P}(\lim_{n\to\infty}M_n=\mu)=1\]

Interpretation

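With probability 1, the sequence of sample means $M_1$, $M_2$, $\cdots$ converges to the population mean $\mu$. That is, for almost every realization of the observations and for every $\epsilon>0$, there is an $n_0$ (which may depend on the realization) such that $|M_n-\mu|<\epsilon$ for all $n>n_0$.

Alternatively, the set of outcomes for which the sample means fail to converge to $\mu$ has probability 0.

Alternatively, almost surely, the sample mean gets arbitrarily close to the population mean and stays close as the sample keeps growing. It need not equal $\mu$ exactly for any finite $n$. This kind of convergence is called almost sure convergence.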
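
The strong law is a statement about a single, ever-growing sequence of observations. The sketch below follows one such sequence of fair coin flips (an assumed example with $\mu=0.5$) and reports the largest deviation $|M_n-\mu|$ over all $n>n_0$ up to a finite horizon. A finite simulation cannot verify a limit, so this is only an illustration; the function name `running_means`, the seed, and the horizon are assumptions.

```python
import random


def running_means(n, rng):
    """Return [M_1, ..., M_n] for one sequence of n fair coin flips."""
    means = []
    heads = 0
    for i in range(1, n + 1):
        heads += rng.random() < 0.5
        means.append(heads / i)
    return means


mu = 0.5
rng = random.Random(7)  # fixed seed, chosen arbitrarily
means = running_means(100_000, rng)

# For each n0, the largest deviation seen anywhere after n0 (up to the horizon).
# means[n0:] holds M_{n0+1}, ..., M_100000, i.e. exactly the indices n > n0.
worst_tail = {
    n0: max(abs(m - mu) for m in means[n0:])
    for n0 in (10, 1000, 50_000)
}

for n0, worst in worst_tail.items():
    print(f"largest |M_n - mu| for n > {n0}: {worst:.4f}")
```

The worst-case deviation cannot increase with $n_0$ by construction, since each tail is contained in the previous one. The point of the illustration is that it becomes small: along this one path, the sample mean does not merely visit a neighborhood of $\mu$, it stays there.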

Difference between the two laws

The strong law is stronger than the weak law because almost sure convergence implies convergence in probability, but not the other way around. Both laws require $\epsilon>0$; neither says that $M_n$ equals $\mu$ exactly.
The weak law is a statement about each $n$ taken separately: for each large $n$, the probability that $M_n$ lies outside the interval $(\mu-\epsilon,\mu+\epsilon)$ is small. The strong law is a statement about the entire sequence $M_1$, $M_2$, $\cdots$ at once.
Per the strong law, for any $\epsilon>0$, with probability 1 the deviation $|M_n-\mu|>\epsilon$ occurs for only a finite number of values of $n$. In other words, along almost every sequence of observations, the sample mean eventually enters the $\epsilon$-neighborhood of $\mu$ and never leaves it. Now that is a very strong statement!
Convergence in probability alone, which is all the weak law asserts, does not rule out $|M_n-\mu|>\epsilon$ occurring for an infinite number of values of $n$ along a sequence of observations; it only requires that such a deviation becomes increasingly unlikely at any given large $n$. This is clearly weaker than the previous statement.
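
The two kinds of statement can be compared side by side in a simulation. For many independent sequences of fair coin flips (again an assumed example with $\mu=0.5$ and $\epsilon=0.05$), the sketch below estimates two quantities: the probability of a deviation exactly at $n=n_0$, which is what the weak law controls, and the probability of a deviation at any $n$ from $n_0$ up to a finite horizon, which is a truncated stand-in for what the strong law controls. The function name `compare_deviations`, the horizon of 2000, and the trial count are assumptions.

```python
import random


def compare_deviations(n0, horizon, epsilon, trials, rng):
    """Return (fraction deviating at n0, fraction deviating at some n in [n0, horizon])."""
    mu = 0.5
    at_n0 = 0
    from_n0_on = 0
    for _ in range(trials):
        heads = 0
        dev_at_n0 = False
        dev_later = False
        for n in range(1, horizon + 1):
            heads += rng.random() < 0.5
            # Same event as |M_n - mu| > epsilon, written in terms of counts.
            if n >= n0 and abs(heads - n * mu) > epsilon * n:
                dev_later = True
                if n == n0:
                    dev_at_n0 = True
        at_n0 += dev_at_n0
        from_n0_on += dev_later
    return at_n0 / trials, from_n0_on / trials


rng = random.Random(2012)  # fixed seed, chosen arbitrarily
results = {
    n0: compare_deviations(n0, horizon=2000, epsilon=0.05, trials=300, rng=rng)
    for n0 in (50, 200, 800)
}

for n0, (single, tail) in results.items():
    print(f"n0 = {n0:4d}: at n0 = {single:.3f}, anywhere in [n0, 2000] = {tail:.3f}")
```

The second fraction is never smaller than the first, because a deviation at $n_0$ is also a deviation somewhere in the tail. Controlling the whole tail is therefore the harder requirement, which is the sense in which the strong law is stronger.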

References

Bertsekas, Dimitri and Tsitsiklis, John, Introduction to Probability, 2nd edition.
Wikipedia - Weak law of large numbers
Wikipedia - Strong law of large numbers


Friday, June 15, 2012
