20: Probability - Convergence Modes, LLN & CLT
🎲 Convergence, Law of Large Numbers, and Central Limit Theorem
Convergence Modes
01 Convergence modes: The vanishing spike
Define \(X_n\) as follows: with probability \(\frac{1}{n}\), \(X_n = n\); otherwise \(X_n = 0\).
- Compute \(E[X_n]\) and \(\text{Var}[X_n]\). What happens as \(n \to \infty\)?
- Show that \(X_n \xrightarrow{P} 0\) (converges in probability to 0) by computing \(P[|X_n| > \epsilon]\) for any \(\epsilon > 0\).
- Does \(X_n \to 0\) almost surely? Hint: Assume the \(X_n\) are independent and consider \(\sum_{n=1}^{\infty} P[X_n \neq 0]\). Borel-Cantelli intuition: if the events \(\{X_n \neq 0\}\) occur “infinitely often” with probability 1, then almost-sure convergence to 0 fails.
- Explain in 2-3 sentences: why can a sequence have vanishing probability of being far from 0, yet still “spike” infinitely often with probability 1?
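To see both behaviors concretely, here is a small NumPy sketch (assuming the \(X_n\) are independent, as in the Borel-Cantelli hint; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Vanishing-spike sequence: X_n = n with probability 1/n, else 0.
N = 100_000
n = np.arange(1, N + 1)
spikes = rng.random(N) < 1.0 / n      # indicator of the event {X_n = n}
X = np.where(spikes, n, 0)

# P[|X_n| > eps] = 1/n -> 0, so X_n -> 0 in probability.
# But sum_n 1/n diverges, so (for independent X_n, as simulated here)
# spikes keep occurring: almost-sure convergence to 0 fails.
print("spikes among the first 100,000 terms:", int(spikes.sum()))
print("largest index that spiked:", int(n[spikes].max()))
```

Typically only a dozen or so spikes appear among \(10^5\) terms (the expected count is the harmonic sum, roughly \(\ln n\)), but they never stop arriving.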
📖 The Borel-Cantelli Lemmas: Infinite Monkeys and Rare Events
The Borel-Cantelli lemmas tell us when rare events happen “infinitely often” (abbreviated i.o.).
Setup: You have infinitely many events \(A_1, A_2, A_3, \ldots\) (like flipping coins forever, or waiting for earthquakes).
Question: What’s the probability that infinitely many of these events occur?
First Borel-Cantelli Lemma (The “Convergent Sum” Lemma)
Statement: If \(\sum_{n=1}^{\infty} P[A_n] < \infty\) (the probabilities sum to a finite number), then \[P[\text{infinitely many } A_n \text{ occur}] = 0.\]
Intuition: If the total probability budget across all events is finite, you can’t afford infinitely many events to happen. Almost surely, only finitely many occur.
Example: Let \(A_n\) = “you win the lottery on day \(n\)” with \(P[A_n] = \frac{1}{n^2}\).
- \(\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} < \infty\) (converges!)
- So with probability 1, you win the lottery only finitely many times (sad news for infinite wealth dreams).
Non-example: Let \(A_n\) = “you get heads on flip \(n\)” with \(P[A_n] = \frac{1}{2}\).
- \(\sum_{n=1}^{\infty} \frac{1}{2} = \infty\) (diverges!)
- The first lemma says nothing here. (Hint: the second lemma will help!)
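A quick numeric check of the two budgets (a sketch using only the standard library):

```python
import math

# The lottery events have a finite total probability budget:
# partial sums of 1/n^2 approach pi^2/6, so BC1 applies.
s = sum(1 / k**2 for k in range(1, 100_001))
print(f"partial sum up to n = 1e5: {s:.6f}")
print(f"pi^2 / 6:                  {math.pi**2 / 6:.6f}")

# The coin-flip budget, in contrast, grows without bound: n/2 after n flips.
print("coin-flip budget after 1e5 flips:", 100_000 / 2)
```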
Second Borel-Cantelli Lemma (The “Independent Events” Lemma)
Statement: If the events \(A_n\) are independent and \(\sum_{n=1}^{\infty} P[A_n] = \infty\) (diverges), then \[P[\text{infinitely many } A_n \text{ occur}] = 1.\]
Intuition: If events are independent and the total probability is infinite, you have an unlimited budget—infinitely many events must happen almost surely.
Example continued: Coin flips with \(P[\text{Heads}] = \frac{1}{2}\) are independent and \(\sum \frac{1}{2} = \infty\).
- So you get heads infinitely often with probability 1. (This is obvious intuitively, but BC makes it rigorous!)
Another example: Toss a fair coin. Let \(A_n\) = “you get 10 heads in a row starting at flip \(n\).”
- \(P[A_n] = \frac{1}{2^{10}} = \frac{1}{1024}\)
- \(\sum_{n=1}^{\infty} \frac{1}{1024} = \infty\)
- The events aren’t independent (runs overlap), but restricting to the non-overlapping blocks \(A_1, A_{11}, A_{21}, \ldots\) gives independent events whose probabilities still sum to \(\infty\): you will see 10 heads in a row infinitely often, almost surely.
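A simulation sketch (one million flips, arbitrary seed) shows how often this “rare” event already occurs in a finite run:

```python
import numpy as np

rng = np.random.default_rng(1)

# Count (overlapping) starting positions of 10 consecutive heads
# in one million fair-coin flips.
flips = rng.integers(0, 2, size=1_000_000)    # 1 = heads
windows = np.lib.stride_tricks.sliding_window_view(flips, 10).all(axis=1)
print("runs of 10 heads found:", int(windows.sum()))
# The expected count is about (1e6 - 9) / 2^10, roughly 977, so even a
# "rare" pattern shows up hundreds of times in a long enough sequence.
```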
Combining the Two: The Traffic Light of Convergence
| \(\sum P[A_n]\) | Events independent? | What happens i.o.? |
|---|---|---|
| \(< \infty\) | Doesn’t matter | Never (a.s.) |
| \(= \infty\) | ❌ No | Unknown (need more info) |
| \(= \infty\) | ✅ Yes | Always (a.s.) |
Key insight for convergence:
- To show \(X_n \to 0\) almost surely, you often define \(A_n = \{|X_n| > \epsilon\}\) and show \(\sum P[A_n] < \infty\) (so “being far away” happens only finitely often).
- If \(\sum P[A_n] = \infty\) but events aren’t independent, a.s. convergence might fail (see Problem 02!).
The Infinite Monkey Theorem
Claim: A monkey typing randomly will eventually type the complete works of Shakespeare, almost surely.
Proof sketch:
- Let \(A_n\) = “the monkey types ‘to be or not to be’ starting at keystroke \(n\).”
- \(P[A_n] = (1/27)^{18} \approx 2 \times 10^{-26}\) (assuming a 27-key typewriter: 26 letters plus a space bar, for an 18-character phrase)
- \(\sum P[A_n] = \infty\) (a sum of identical positive constants diverges)
- As with the 10-heads runs, overlapping windows aren’t independent, but the non-overlapping blocks \(A_1, A_{19}, A_{37}, \ldots\) are independent with the same probability, and their sum still diverges \(\Rightarrow\) BC2 says the phrase gets typed infinitely often!
- The same argument works for the full text of Hamlet: just a longer string with a far smaller \(P[A_n]\), but still a divergent sum.
Reality check: The universe might end before this happens. Borel-Cantelli guarantees it in the limit, not in your lifetime!
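A back-of-envelope sketch for the reality check, assuming a hypothetical 27-key typewriter (26 letters plus a space bar) and a typing rate of 10 keystrokes per second:

```python
# Success probability per starting position for the 18-character phrase
# "to be or not to be" on a 27-key typewriter (an assumption, not canon).
p = (1 / 27) ** 18
expected_keystrokes = 1 / p            # geometric waiting time, roughly
seconds = expected_keystrokes / 10     # at 10 keystrokes per second
years = seconds / (60 * 60 * 24 * 365)
print(f"p per position:      {p:.1e}")
print(f"expected keystrokes: {expected_keystrokes:.1e}")
print(f"expected wait:       {years:.1e} years")
```

The expected wait dwarfs the age of the universe (about \(1.4 \times 10^{10}\) years), which is exactly the gap between “almost surely, in the limit” and “in your lifetime.”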
Law of Large Numbers
02 Sample mean convergence: Visualizing LLN
You roll a fair die repeatedly and compute the running average \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\).
- What is \(E[X_i]\) and \(\text{Var}[X_i]\) for a single roll?
- Compute \(E[\bar{X}_n]\) and \(\text{Var}[\bar{X}_n]\). What happens to the variance as \(n\) increases?
- Use Chebyshev’s inequality to bound \(P[|\bar{X}_n - 3.5| > 0.5]\). For what \(n\) is this probability less than 0.05?
- Sketch (or describe) how you’d expect a plot of \(\bar{X}_n\) vs \(n\) to look for \(n = 1\) to \(n = 100\).
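The simulation and the Chebyshev computation can be sketched as follows (NumPy, arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(42)

# Running average of 100 fair die rolls.
rolls = rng.integers(1, 7, size=100)
running_avg = np.cumsum(rolls) / np.arange(1, 101)
print("running average after 100 rolls:", running_avg[-1])

# Chebyshev: P[|Xbar_n - 3.5| > 0.5] <= Var[Xbar_n] / 0.5**2
#          = (35/12) / (n * 0.25) = 35 / (3n)
var = 35 / 12                          # variance of a single roll
n = np.arange(1, 1001)
bound = var / (n * 0.5**2)
print("smallest n with bound < 0.05:", int(n[bound < 0.05][0]))   # 234
```

Plotting `running_avg` against `n` shows the wild early swings settle toward 3.5, which is the picture the last part of the problem asks for.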
03 When LLN fails: Cauchy counterexample
The Cauchy distribution has PDF \(f(x) = \frac{1}{\pi(1 + x^2)}\) for \(x \in \mathbb{R}\).
- Without computing, explain why \(E[X]\) doesn’t exist for \(X \sim \text{Cauchy}\) by considering the integral \(\int_{-\infty}^{\infty} x \cdot \frac{1}{\pi(1+x^2)} dx\).
- Let \(X_1, \ldots, X_n\) be i.i.d. Cauchy. The sample mean is \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\). Does \(\bar{X}_n\) converge to anything as \(n \to \infty\)?
- Fact: \(\bar{X}_n\) has the same Cauchy distribution as a single observation, for every \(n\). Explain why the LLN conclusion fails here, and which hypothesis of the LLN is not satisfied.
- Simulate (or imagine) 1000 samples from Cauchy and compute their mean. Would you expect a “tight” estimate or wild variability? Why is averaging useless here?
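A simulation sketch contrasting Cauchy and normal sample means (arbitrary seed):

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 sample means, each computed from 1000 draws.
cauchy_means = np.array([rng.standard_cauchy(1000).mean() for _ in range(20)])
normal_means = np.array([rng.standard_normal(1000).mean() for _ in range(20)])

# Xbar_n is standard Cauchy for every n, so the "means" are as wild as a
# single draw; the normal means concentrate near 0 at rate 1/sqrt(n).
print("spread (max - min) of Cauchy means:", np.ptp(cauchy_means))
print("spread (max - min) of normal means:", np.ptp(normal_means))
```

The normal means cluster tightly, while the Cauchy means jump around no matter how much data goes into each one: averaging buys nothing.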
Central Limit Theorem
04 CLT in Python: Simulating the magic
In this problem, you’ll write Python code to demonstrate the Central Limit Theorem empirically.
Setup: Let \(X_i \sim \text{Exp}(1)\) (exponential distribution with rate 1, so \(E[X_i] = 1\), \(\text{Var}[X_i] = 1\)).
Tasks:
- Generate data: For each sample size \(n \in \{1, 5, 10, 30, 50\}\):
- Simulate \(N = 10000\) sample means \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\) (each mean computed from \(n\) i.i.d. \(\text{Exp}(1)\) random variables).
- Store these 10000 sample means.
- Standardize: Compute the standardized version: \[Z_n = \frac{\bar{X}_n - 1}{\sqrt{1/n}} = \sqrt{n} \cdot (\bar{X}_n - 1).\] For each \(n\), compute this for all 10000 samples.
- Visualize: Create a figure with 5 subplots (one for each \(n\)):
- Plot a histogram of the 10000 standardized values \(Z_n\).
- Overlay the standard normal PDF \(N(0, 1)\) as a smooth curve.
- Add a title indicating the sample size \(n\).
- Interpret:
- For which values of \(n\) does the histogram closely match the normal curve?
- The original \(\text{Exp}(1)\) distribution is highly skewed (right tail). Explain in 2-3 sentences why the CLT “fixes” this skewness as \(n\) grows.
Bonus: Repeat for \(X_i \sim \text{Bernoulli}(0.1)\) (a highly skewed discrete distribution). Does the CLT kick in faster or slower than for the exponential? Why?
Python hints:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generating exponentials: np.random.exponential(scale=1, size=n)
# Standard normal PDF:     norm.pdf(x, 0, 1)
```
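One possible implementation of the tasks above (a sketch; figure layout, bin count, and seed are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 10_000                               # sample means per value of n
sizes = [1, 5, 10, 30, 50]

fig, axes = plt.subplots(1, len(sizes), figsize=(18, 3.5), sharey=True)
x = np.linspace(-4, 4, 200)

for ax, n in zip(axes, sizes):
    # N sample means, each from n i.i.d. Exp(1) draws (mean 1, variance 1).
    means = rng.exponential(scale=1.0, size=(N, n)).mean(axis=1)
    z = np.sqrt(n) * (means - 1.0)       # standardized version Z_n
    ax.hist(z, bins=60, density=True, alpha=0.6)
    ax.plot(x, norm.pdf(x), "k-")        # standard normal overlay
    ax.set_title(f"n = {n}")

fig.suptitle("CLT for Exp(1): standardized sample means vs N(0, 1)")
plt.tight_layout()
plt.show()
```

For \(n = 1\) the histogram is just a shifted exponential (sharply skewed); by \(n = 30\) to \(50\) it hugs the normal curve, which is the empirical content of the CLT.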