Introduction
In statistics, we use the mean calculated from a sample as an estimate of the mean of the population. For example, if the average height of a random sample of a thousand people from a region is 6 feet, then we estimate that the average height of all people in that region is 6 feet. Why does this work? The weak and strong laws of large numbers provide the theoretical basis. Given below are the two laws and the difference between them, which also justifies their names.
Notation
- Let $X_1$, $X_2$, $\cdots$, $X_n$ be independent and identically distributed random variables. (In layman's terms, $X_1$ is the first observation, $X_2$ is the second, and so on.)
- Let $M_n$ be a random variable denoting the mean of $X_1$, $X_2$, $\cdots$, $X_n$. In other words, \[M_n=\frac{1}{n}\sum_{i=1}^{n}X_i\]So this is the mean of the sample.
- Let $\mu$ be the mean of each of $X_1$, $X_2$, $\cdots$, $X_n$. In other words, $\mathbf{E}(X_i)=\mu$ for each $i$. So $\mu$ is the mean of the population (usually unknown, which is why we want to estimate it!).
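To make the notation concrete, here is a minimal Python sketch of the height example. The population distribution, its spread (0.3 feet), and the seed are all made-up illustrative choices; in practice $\mu$ is unknown, which is exactly why we compute $M_n$:

```python
import random

random.seed(0)

# Hypothetical population: heights drawn from a normal distribution
# whose (in practice unknown) population mean is mu = 6 feet.
mu = 6.0
sample = [random.gauss(mu, 0.3) for _ in range(1000)]

# M_n: the mean of the first n observations.
def sample_mean(xs):
    return sum(xs) / len(xs)

m_n = sample_mean(sample)
print(round(m_n, 2))  # close to 6.0 for a sample this large
```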
Weak Law of Large Numbers
This law states that for any $\epsilon>0$,\[\lim_{n\to\infty}\mathbf{P}(|M_n-\mu|>\epsilon)=0\]
Interpretation
- For any given $\epsilon>0$, the probability that the value of $M_n$ (the sample mean) differs from the population mean $\mu$ by more than $\epsilon$ tends to 0 as $n$ grows. For large values of $n$ (i.e. $n>n_0$ for some $n_0$), this probability can be made as small as we like, though it need not be exactly 0.
- Alternatively, as $n$ grows, essentially all of the probability becomes concentrated in an $\epsilon$-interval around $\mu$.
- Alternatively, for large samples, the sample mean is within an $\epsilon$-neighborhood of the population mean with high probability (not "almost surely" — that is what the strong law adds).
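The weak law can be checked empirically. The sketch below runs a Monte Carlo estimate of $\mathbf{P}(|M_n-\mu|>\epsilon)$ for a fair coin (so $\mu=0.5$) at several sample sizes; the choices of $\epsilon$, the sample sizes, and the trial count are illustrative assumptions, not part of the theorem:

```python
import random

random.seed(1)

# Monte Carlo check of the weak law: estimate P(|M_n - mu| > eps)
# for increasing n, using a fair coin (mu = 0.5). The values of
# eps and trials are illustrative choices.
mu, eps, trials = 0.5, 0.05, 2000

def exceedance_prob(n):
    count = 0
    for _ in range(trials):
        m_n = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(m_n - mu) > eps:
            count += 1
    return count / trials

probs = [exceedance_prob(n) for n in (10, 100, 1000)]
print(probs)  # should decrease toward 0 as n grows
```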
Strong Law of Large Numbers
This law states that\[\mathbf{P}(\lim_{n\to\infty}M_n=\mu)=1\]
Interpretation
- With probability 1, the sequence of sample means $M_1$, $M_2$, $\cdots$ converges to the population mean $\mu$. Equivalently, the set of outcomes for which $M_n$ fails to converge to $\mu$ has probability 0.
- Alternatively, in the limit, all of the probability is concentrated at $\mu$ itself.
- Alternatively, almost surely, the sample mean converges to the population mean as the sample grows. (This does not mean $M_n$ ever equals $\mu$ exactly for a finite sample.)
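The strong law is a statement about a single infinite sequence of observations. The sketch below follows one sample path of running means for a fair die (so $\mu=3.5$); the die, the seed, and the path length are illustrative assumptions. Almost surely, such a path settles down to $\mu$:

```python
import random

random.seed(2)

# One sample path of running means M_1, M_2, ..., M_N for rolls of
# a fair die (mu = 3.5). The strong law says that, with probability 1,
# this single path converges to mu as N grows.
mu, N = 3.5, 100_000
total = 0.0
path = []
for n in range(1, N + 1):
    total += random.randint(1, 6)
    path.append(total / n)

print(path[9], path[999], path[-1])  # drifting toward 3.5
```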
Difference between the two laws
- The strong law is stronger than the weak law because almost-sure convergence implies convergence in probability: if $M_n\to\mu$ with probability 1, then $\mathbf{P}(|M_n-\mu|>\epsilon)\to 0$ for every $\epsilon>0$. The converse does not hold in general.
- Per the strong law, the limit statement $\lim_{n\to\infty}M_n=\mu$ holds along almost every sample path, while the weak law only says that, at each fixed large $n$, most of the probability lies in the interval $(\mu-\epsilon,\mu+\epsilon)$.
- The strong law implies that, for any $\epsilon>0$, almost surely only finitely many of the events $|M_n-\mu|>\epsilon$ occur. In other words, along almost every infinite sequence $X_1$, $X_2$, $\cdots$, the sample mean eventually enters the $\epsilon$-neighborhood of $\mu$ and stays there forever. Now that is a very strong statement!
- The weak law, on its own, allows $|M_n-\mu|>\epsilon$ to happen infinitely often along a sample path, as long as the probability of it happening at step $n$ shrinks to 0. This is clearly weaker than the previous statement.
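The "only finitely many exceedances" picture can be illustrated numerically. The sketch below follows one path of coin flips ($\mu=0.5$) and records the last step at which $|M_n-\mu|>\epsilon$ within the first $N$ steps; $\epsilon$, $N$, and the seed are illustrative choices, and a finite simulation can only suggest, not prove, the almost-sure statement:

```python
import random

random.seed(3)

# Along one sample path of fair-coin flips (mu = 0.5), record the last
# index n <= N at which |M_n - mu| > eps. The strong law implies that,
# almost surely, only finitely many such indices exist, so this "last
# exceedance" is typically small relative to N.
mu, eps, N = 0.5, 0.05, 50_000
heads = 0
last_exceedance = 0
for n in range(1, N + 1):
    heads += random.random() < 0.5
    if abs(heads / n - mu) > eps:
        last_exceedance = n

print(last_exceedance)  # typically small relative to N
```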
References
- Bertsekas, Dimitri and Tsitsiklis, John, Introduction to Probability, 2nd edition, Athena Scientific, 2008.
- Wikipedia - Weak law of large numbers
- Wikipedia - Strong law of large numbers