20: Probability - Convergence Modes, LLN & CLT


🎲 Convergence, Law of Large Numbers, and Central Limit Theorem

Convergence Modes

01 Convergence modes: The vanishing spike

Define \(X_n\) as follows: with probability \(\frac{1}{n}\), \(X_n = n\); otherwise \(X_n = 0\).

    1. Compute \(E[X_n]\) and \(\text{Var}[X_n]\). What happens as \(n \to \infty\)?
    1. Show that \(X_n \xrightarrow{P} 0\) (converges in probability to 0) by computing \(P[|X_n| > \epsilon]\) for any \(\epsilon > 0\).
    1. Does \(X_n \to 0\) almost surely? Assume the \(X_n\) are independent. Hint: Consider \(\sum_{n=1}^{\infty} P[X_n \neq 0]\) and use the Borel-Cantelli intuition: if the spike events happen “infinitely often” with probability 1, then convergence a.s. fails.
    1. Explain in 2-3 sentences: why can a sequence have vanishing probability of being far from 0, yet still “spike” infinitely often with probability 1?
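A quick simulation sketch of one sample path (assuming, as in part 3, that the \(X_n\) are independent across \(n\)). It illustrates both halves of the puzzle: the pointwise probability of a spike vanishes, yet spikes keep recurring along the path because \(\sum 1/n\) diverges.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# One sample path: X_n = n with probability 1/n, else 0
# (assuming the X_n are drawn independently across n).
n = np.arange(1, N + 1)
spikes = rng.random(N) < 1.0 / n          # True wherever X_n = n
X = np.where(spikes, n, 0)

# P[|X_n| > eps] = 1/n -> 0, so X_n -> 0 in probability...
print("P[X_n != 0] at n = 100_000:", 1.0 / N)

# ...but sum(1/n) diverges, so spikes keep happening along the path.
# Up to N, roughly ln(N) + 0.577 ~ 12 spikes are expected.
print("number of spikes seen:", spikes.sum())
print("latest spike at n =", n[spikes][-1] if spikes.any() else None)
```

Rerunning with different seeds, the spike count hovers around \(\ln N \approx 12\) but the latest spike keeps drifting out as \(N\) grows, which is exactly why almost-sure convergence fails.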

📖 The Borel-Cantelli Lemmas: Infinite Monkeys and Rare Events

The Borel-Cantelli lemmas tell us when rare events happen “infinitely often” (abbreviated i.o.).

Setup: You have infinitely many events \(A_1, A_2, A_3, \ldots\) (like flipping coins forever, or waiting for earthquakes).

Question: What’s the probability that infinitely many of these events occur?


First Borel-Cantelli Lemma (The “Convergent Sum” Lemma)

Statement: If \(\sum_{n=1}^{\infty} P[A_n] < \infty\) (the probabilities sum to a finite number), then \[P[\text{infinitely many } A_n \text{ occur}] = 0.\]

Intuition: If the total probability budget across all events is finite, you can’t afford infinitely many events to happen. Almost surely, only finitely many occur.

Example: Let \(A_n\) = “you win the lottery on day \(n\)” with \(P[A_n] = \frac{1}{n^2}\).

  • \(\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} < \infty\) (converges!)
  • So with probability 1, you win the lottery only finitely many times (sad news for infinite wealth dreams).
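The “finite budget” here can be checked numerically: the partial sums of \(1/n^2\) settle at \(\pi^2/6 \approx 1.645\), which also bounds the expected total number of wins over all time.

```python
import math

# Partial sum of 1/n^2 over the first million days: the "probability budget"
# is finite, approaching pi^2/6 ~= 1.645.
partial = sum(1.0 / k**2 for k in range(1, 1_000_001))
print("partial sum:", partial)
print("pi^2 / 6   :", math.pi**2 / 6)

# By linearity of expectation, the expected total number of lottery wins
# over all time is at most pi^2/6 < 2, consistent with BC1's conclusion
# that only finitely many wins occur almost surely.
```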

Non-example: Let \(A_n\) = “you get heads on flip \(n\)” with \(P[A_n] = \frac{1}{2}\).

  • \(\sum_{n=1}^{\infty} \frac{1}{2} = \infty\) (diverges!)
  • The first lemma says nothing here. (Hint: the second lemma will help!)

Second Borel-Cantelli Lemma (The “Independent Events” Lemma)

Statement: If the events \(A_n\) are independent and \(\sum_{n=1}^{\infty} P[A_n] = \infty\) (diverges), then \[P[\text{infinitely many } A_n \text{ occur}] = 1.\]

Intuition: If events are independent and the total probability is infinite, you have an unlimited budget—infinitely many events must happen almost surely.

Example continued: Coin flips with \(P[\text{Heads}] = \frac{1}{2}\) are independent and \(\sum \frac{1}{2} = \infty\).

  • So you get heads infinitely often with probability 1. (This is obvious intuitively, but BC makes it rigorous!)

Another example: Toss a fair coin. Let \(A_n\) = “you get 10 heads in a row starting at flip \(n\).”

  • \(P[A_n] = \frac{1}{2^{10}} = \frac{1}{1024}\)
  • \(\sum_{n=1}^{\infty} \frac{1}{1024} = \infty\)
  • The events aren’t independent (the runs overlap), but restricting to disjoint blocks (flips 1-10, 11-20, ...) gives independent events whose probabilities still sum to \(\infty\). BC2 then applies to the blocks: you will see 10 heads in a row infinitely often, almost surely.
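A quick empirical check (counting overlapping windows, so clustered runs are counted more than once): in a million fair flips, roughly \(10^6/2^{10} \approx 976\) all-heads windows of length 10 should appear.

```python
import numpy as np

rng = np.random.default_rng(1)
flips = rng.integers(0, 2, size=1_000_000)   # 1 = heads

# Count every position n where flips[n : n+10] are all heads
# (overlapping windows, so a run of 12 heads counts 3 times).
windows = np.lib.stride_tricks.sliding_window_view(flips, 10)
starts = np.where(windows.all(axis=1))[0]

print("runs of 10 heads found:", len(starts))
print("expected about        :", (1_000_000 - 9) / 2**10)   # ~976
```

The count grows linearly with the number of flips, a finite-sample shadow of the “infinitely often” conclusion.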

Combining the Two: The Traffic Light of Convergence

| \(\sum P[A_n]\) | Events independent? | What happens i.o.? |
|---|---|---|
| \(< \infty\) | Doesn’t matter | Never (a.s.) |
| \(= \infty\) | ❌ No | Unknown (need more info) |
| \(= \infty\) | ✅ Yes | Always (a.s.) |

Key insight for convergence:

  • To show \(X_n \to 0\) almost surely, you often define \(A_n = \{|X_n| > \epsilon\}\) and show \(\sum P[A_n] < \infty\) (so “being far away” happens only finitely often).
  • If \(\sum P[A_n] = \infty\) but events aren’t independent, a.s. convergence might fail (see Problem 02!).

The Infinite Monkey Theorem

Claim: A monkey typing randomly will eventually type the complete works of Shakespeare, almost surely.

Proof sketch:

  • Let \(A_n\) = “the monkey types ‘To be or not to be’ starting at keystroke \(n\).”
  • \(P[A_n] = (1/27)^{18} \approx 10^{-26}\) (assuming a 27-key typewriter: 26 letters plus a space; the phrase is 18 characters long)
  • \(\sum P[A_n] = \infty\) (a sum of identical positive constants diverges)
  • The \(A_n\) overlap, so they aren’t independent, but the events at disjoint starting positions \(1, 19, 37, \ldots\) are independent with the same divergent sum \(\Rightarrow\) BC2 says the phrase gets typed infinitely often!
  • The same argument works for the full text of Hamlet: a longer string means a far smaller \(P[A_n]\), but the sum of constants still diverges.

Reality check: The universe might end before this happens. Borel-Cantelli guarantees it in the limit, not in your lifetime!
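To see the timescale issue concretely, here is a sketch that waits for a short stand-in target instead of Shakespeare. The target string `"abc"` is an arbitrary choice for illustration; for a pattern with no self-overlap, the expected wait on a 26-key alphabet is exactly \(26^3 = 17{,}576\) keystrokes, and every extra character multiplies that by 26.

```python
import random

random.seed(42)
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
TARGET = "abc"   # a tiny stand-in for Shakespeare: same argument, feasible timescale

def keystrokes_until(target):
    """Type uniformly random letters until `target` appears; return the count."""
    buf = ""
    count = 0
    while not buf.endswith(target):
        # Keep only the last len(target) characters; that's all we need to check.
        buf = (buf + random.choice(ALPHABET))[-len(target):]
        count += 1
    return count

waits = [keystrokes_until(TARGET) for _ in range(100)]
print("mean wait over 100 monkeys:", sum(waits) / len(waits))
print("theory for 'abc'          :", 26**3)   # 17,576 keystrokes
```

Scaling up: an 18-character phrase multiplies the wait by a factor of roughly \(26^{15} \approx 10^{21}\), which is the sense in which Borel-Cantelli is a statement about the limit, not about your lifetime.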


Law of Large Numbers

02 Sample mean convergence: Visualizing LLN

You roll a fair die repeatedly and compute the running average \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\).

    1. What is \(E[X_i]\) and \(\text{Var}[X_i]\) for a single roll?
    1. Compute \(E[\bar{X}_n]\) and \(\text{Var}[\bar{X}_n]\). What happens to the variance as \(n\) increases?
    1. Use Chebyshev’s inequality to bound \(P[|\bar{X}_n - 3.5| > 0.5]\). For what \(n\) is this probability less than 0.05?
    1. Sketch (or describe) how you’d expect a plot of \(\bar{X}_n\) vs \(n\) to look for \(n = 1\) to \(n = 100\).
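A minimal numerical companion to parts 2-4: simulate one running average and evaluate the Chebyshev bound from part 3. For a fair die, \(\text{Var}[X_i] = 35/12\), so Chebyshev gives \(P[|\bar{X}_n - 3.5| > 0.5] \le (35/12)/(0.25\,n)\).

```python
import numpy as np

rng = np.random.default_rng(0)

# One sample path of the running average of fair-die rolls.
rolls = rng.integers(1, 7, size=100)
running_mean = np.cumsum(rolls) / np.arange(1, 101)
print("running mean after   1 roll :", running_mean[0])
print("running mean after 100 rolls:", running_mean[-1])   # should be near 3.5

# Part 3: Chebyshev bound P[|Xbar_n - 3.5| > 0.5] <= Var / (n * 0.5^2).
var = 35 / 12
n_needed = var / (0.25 * 0.05)   # smallest n making the bound < 0.05
print("bound drops below 0.05 once n >=", int(np.ceil(n_needed)))
```

The early part of the path is jagged (a single roll can be any of 1 through 6) and the tail flattens toward 3.5, which is the picture part 4 asks for.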

03 When LLN fails: Cauchy counterexample

The Cauchy distribution has PDF \(f(x) = \frac{1}{\pi(1 + x^2)}\) for \(x \in \mathbb{R}\).

    1. Without computing, explain why \(E[X]\) doesn’t exist for \(X \sim \text{Cauchy}\) by considering the integral \(\int_{-\infty}^{\infty} x \cdot \frac{1}{\pi(1+x^2)} dx\).
    1. Let \(X_1, \ldots, X_n\) be i.i.d. Cauchy. The sample mean is \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\). Does \(\bar{X}_n\) converge to anything as \(n \to \infty\)?
    1. Fact: \(\bar{X}_n\) has exactly the same standard Cauchy distribution for every \(n\). Explain why this does not contradict the LLN: which hypothesis of the theorem fails here?
    1. Simulate (or imagine) 1000 samples from Cauchy and compute their mean. Would you expect a “tight” estimate or wild variability? Why is averaging useless here?
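A sketch of the simulation in part 4, pushed a step further: instead of one mean, draw many sample means and look at their spread. Because each mean is itself standard Cauchy, the interquartile range of the means stays near 2 (the IQR of a single Cauchy draw) no matter how large \(n\) gets.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 sample means, each computed from n = 1000 i.i.d. standard Cauchy draws.
# Since Xbar_n is again standard Cauchy, averaging buys no concentration.
means = rng.standard_cauchy((1000, 1000)).mean(axis=1)

iqr = np.percentile(means, 75) - np.percentile(means, 25)
print("median of the means:", np.median(means))   # the median exists and sits near 0
print("IQR of the means   :", iqr)                # stays ~2, same as one Cauchy draw
print("wildest mean       :", means[np.argmax(np.abs(means))])
```

Contrast with the die-rolling problem, where the IQR of the sample mean shrinks like \(1/\sqrt{n}\): here averaging a thousand observations leaves you exactly where you started.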

Central Limit Theorem

04 CLT in Python: Simulating the magic

In this problem, you’ll write Python code to demonstrate the Central Limit Theorem empirically.

Setup: Let \(X_i \sim \text{Exp}(1)\) (exponential distribution with rate 1, so \(E[X_i] = 1\), \(\text{Var}[X_i] = 1\)).

Tasks:

    1. Generate data: For each sample size \(n \in \{1, 5, 10, 30, 50\}\):
    • Simulate \(N = 10000\) sample means \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\) (each mean computed from \(n\) i.i.d. \(\text{Exp}(1)\) random variables).
    • Store these 10000 sample means.
    1. Standardize: Compute the standardized version: \[Z_n = \frac{\bar{X}_n - 1}{\sqrt{1/n}} = \sqrt{n} \cdot (\bar{X}_n - 1).\] For each \(n\), compute this for all 10000 samples.
    1. Visualize: Create a figure with 5 subplots (one for each \(n\)):
    • Plot a histogram of the 10000 standardized values \(Z_n\).
    • Overlay the standard normal PDF \(N(0, 1)\) as a smooth curve.
    • Add a title indicating the sample size \(n\).
    1. Interpret:
    • For which values of \(n\) does the histogram closely match the normal curve?
    • The original \(\text{Exp}(1)\) distribution is highly skewed (right tail). Explain in 2-3 sentences why the CLT “fixes” this skewness as \(n\) grows.

Bonus: Repeat for \(X_i \sim \text{Bernoulli}(0.1)\) (highly skewed discrete distribution). Does CLT work faster or slower than for exponential? Why?

Python hints:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# For generating exponential: np.random.exponential(scale=1, size=n)
# For standard normal PDF: norm.pdf(x, 0, 1)
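Not a full solution to the tasks above, but a plot-free sanity check you can run before building the histograms. For \(\text{Exp}(1)\) data the skewness of \(\bar{X}_n\) (and hence of \(Z_n\)) is \(2/\sqrt{n}\), so watching the sample skewness shrink toward the normal's 0 is a quick way to see the CLT kick in.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
skews = {}

# For each n: N sample means of n i.i.d. Exp(1) draws, standardized as in task 2.
for n in (1, 5, 10, 30, 50):
    means = rng.exponential(scale=1.0, size=(N, n)).mean(axis=1)
    z = np.sqrt(n) * (means - 1.0)           # Z_n = sqrt(n) * (Xbar_n - 1)
    zc = z - z.mean()                        # center before measuring skewness
    skews[n] = np.mean(zc**3) / np.mean(zc**2) ** 1.5
    print(f"n={n:>2}  sample skewness of Z_n: {skews[n]:+.2f}"
          f"   (theory: {2 / np.sqrt(n):.2f})")
```

If the skewness column tracks \(2/\sqrt{n}\), your simulation machinery is working and the histograms in task 3 should line up with the normal curve by \(n \approx 30\).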