24: Statistics — Sampling Distributions, CIs & Bootstrap

📚 Materials

YouTube links in this section were auto-extracted. If you spot a mistake, please let me know!

Lecture

🏡 Homework


1) Analytical Confidence Intervals

01 ✏️ Election Night Nail-Biter

A poll of \(n = 900\) likely voters finds that 52% support candidate A.

    1. Build a 95% Wald CI for the true proportion \(p\). Based on this interval, can you call the election for candidate A?
    1. Now compute the Wilson CI for the same data. How does it differ from the Wald CI?
    1. Imagine only \(n = 20\) voters were polled and 11 (55%) said A. Recompute both Wald and Wilson CIs. Which one behaves better for small \(n\) (and, more generally, when \(\hat{p}\) is near 0 or 1), and why?
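To check hand calculations for this problem, the two intervals can be sketched in a few lines of Python. This is a minimal sketch, not a reference solution: the function names `wald_ci` and `wilson_ci` are hypothetical helpers introduced here, and the Wilson formula is the standard score interval without continuity correction.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(p_hat, n, level=0.95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_ci(p_hat, n, level=0.95):
    """Wilson score interval (no continuity correction)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    denom = 1 + z**2 / n
    # The center is pulled from p_hat toward 1/2; the pull shrinks as n grows.
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# The two scenarios from the problem: n = 900 with 52%, and n = 20 with 11/20.
for p_hat, n in [(0.52, 900), (11 / 20, 20)]:
    print(n, "Wald:", wald_ci(p_hat, n), "Wilson:", wilson_ci(p_hat, n))
```

With \(n = 900\) the two intervals should be nearly identical; with \(n = 20\) the Wilson interval is recentered toward \(1/2\), which is the effect part 3 asks you to explain.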

02 ✏️ A/B Test: Ship It or Wait?

An e-commerce company runs an A/B test on their checkout page.

  • Control (\(n_1 = 200\)): conversion rate \(\hat{p}_1 = 0.08\) (8%).
  • Treatment (\(n_2 = 200\)): conversion rate \(\hat{p}_2 = 0.11\) (11%).

The product manager says “3% lift — ship it!”

    1. Build a 95% CI for the difference \(p_2 - p_1\).
    Recall: \(\text{SE} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\).
    1. Does the CI contain 0? What does this tell the PM?
    1. How large would each group need to be (equal sizes) so that a 95% CI centered on the true lift would exclude 0, assuming the true lift really is 3%? (Use the sample-size formula for two proportions.)
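The arithmetic for this problem can be sketched as follows. Two sample-size criteria are shown because the problem does not fix one: the first only requires the CI half-width to be smaller than the lift, and the second is the usual power-based formula. The 80% power target is an assumption made here for illustration; it does not come from the problem statement.

```python
from math import ceil, sqrt
from statistics import NormalDist

p1, p2, n1, n2 = 0.08, 0.11, 200, 200
z = NormalDist().inv_cdf(0.975)  # two-sided 95%

# Part 1: CI for p2 - p1 using the unpooled SE given in the problem.
diff = p2 - p1
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - z * se, diff + z * se)
print("95% CI for the lift:", ci)

# Part 3, criterion A: smallest equal n with half-width z * SE below the lift.
# A CI centered exactly on the true lift then just excludes 0 (roughly 50% power).
var_sum = p1 * (1 - p1) + p2 * (1 - p2)
n_halfwidth = ceil(z**2 * var_sum / diff**2)

# Part 3, criterion B: standard two-proportion formula with a power target.
power = 0.80  # ASSUMPTION: not specified in the problem
z_beta = NormalDist().inv_cdf(power)
n_power = ceil((z + z_beta) ** 2 * var_sum / diff**2)

print("n per group (half-width criterion):", n_halfwidth)
print("n per group (80% power, assumed):", n_power)
```

Criterion B asks for roughly twice as many users per group as criterion A, because it also requires the test to detect the lift most of the time rather than only on average.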

2) The Bootstrap

03 🐍 Bootstrap vs Formula: A Head-to-Head Race

Generate \(n = 30\) observations from \(\text{Exp}(\lambda = 1)\) in Python (set np.random.seed(509)).

    1. Compute the analytical SE of \(\bar{X}\) (since \(\sigma = 1/\lambda = 1\), this is \(\sigma / \sqrt{n}\)).
    1. Compute the bootstrap SE with \(B = 10{,}000\) resamples.
    1. Build three 95% CIs for \(\mu = 1/\lambda\):
      1. Normal-theory: \(\bar{X} \pm z_{0.025} \cdot \text{SE}\)
      1. Bootstrap percentile: \([\hat{\theta}^*_{0.025},\; \hat{\theta}^*_{0.975}]\)
      1. \(t\)-interval: \(\bar{X} \pm t_{n-1, 0.025} \cdot \frac{s}{\sqrt{n}}\)
    Compare widths. Do all three contain the true \(\mu = 1\)?
    1. Repeat (a)-(c) for the sample median instead of the mean. For which of the methods is there no simple formula-based SE to plug in?
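One possible way to set up this experiment is sketched below. It uses the legacy `np.random.seed(509)` call named in the problem; the variable names are illustrative, and the critical value `t_crit = 2.045` is the usual table value for \(t_{29,\,0.975}\), hard-coded here to avoid an extra dependency.

```python
import numpy as np
from statistics import NormalDist

np.random.seed(509)
n, B = 30, 10_000
x = np.random.exponential(scale=1.0, size=n)  # Exp(lambda = 1), so mu = sigma = 1

# Analytical SE uses the known sigma = 1; the plug-in SE uses the sample s.
se_analytic = 1.0 / np.sqrt(n)
se_plugin = x.std(ddof=1) / np.sqrt(n)

# Bootstrap: draw B resamples of size n with replacement (one per row).
idx = np.random.randint(0, n, size=(B, n))
boot_means = x[idx].mean(axis=1)
boot_medians = np.median(x[idx], axis=1)
se_boot_mean = boot_means.std(ddof=1)
se_boot_median = boot_medians.std(ddof=1)

z = NormalDist().inv_cdf(0.975)
t_crit = 2.045  # table value for t_{29, 0.975}
ci_normal = (x.mean() - z * se_analytic, x.mean() + z * se_analytic)
ci_percentile = tuple(np.percentile(boot_means, [2.5, 97.5]))
ci_t = (x.mean() - t_crit * se_plugin, x.mean() + t_crit * se_plugin)
ci_median_percentile = tuple(np.percentile(boot_medians, [2.5, 97.5]))

print("SE analytic / plug-in / bootstrap:", se_analytic, se_plugin, se_boot_mean)
print("normal:", ci_normal, "percentile:", ci_percentile, "t:", ci_t)
print("median: bootstrap SE", se_boot_median, "percentile CI", ci_median_percentile)
```

Note that the bootstrap SE of the mean tracks the plug-in value \(s/\sqrt{n}\), not the known-\(\sigma\) value, since the bootstrap only sees the sample.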

04 🐍 Bootstrap the Correlation

Given these \((x, y)\) pairs: \[ (1,2),\; (2,3),\; (3,5),\; (4,4),\; (5,7),\; (6,8),\; (7,6),\; (8,9),\; (9,10),\; (10,12) \]

    1. Compute the Pearson correlation \(r\).
    1. Use the bootstrap (\(B = 5{,}000\)) to build a 95% percentile CI for the population correlation \(\rho\).
    1. Plot the bootstrap distribution of \(r^*\). Is it symmetric? If not, why might that be?
    1. Fisher’s \(z\)-transform gives an analytical CI: transform \(z = \tfrac{1}{2}\ln\!\tfrac{1+r}{1-r}\), build a normal CI using \(\text{SE}(z) = \tfrac{1}{\sqrt{n-3}}\), then back-transform with \(r = \tfrac{e^{2z}-1}{e^{2z}+1}\). Compare with your bootstrap CI.
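A possible skeleton for this problem, resampling \((x, y)\) pairs together so that the dependence between them is preserved. The seed passed to `default_rng` is an arbitrary choice made here, and `np.arctanh` / `np.tanh` are used as equivalent forms of the Fisher transform and its inverse given above.

```python
import numpy as np
from statistics import NormalDist

x = np.arange(1, 11, dtype=float)
y = np.array([2, 3, 5, 4, 7, 8, 6, 9, 10, 12], dtype=float)
n = len(x)
r = np.corrcoef(x, y)[0, 1]

# Pairs bootstrap: the same resampled indices are applied to x and y.
rng = np.random.default_rng(0)  # ASSUMPTION: arbitrary seed for reproducibility
B = 5_000
r_star = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    r_star[b] = np.corrcoef(x[idx], y[idx])[0, 1]

# nanpercentile guards against a (very unlikely) degenerate resample
# in which every drawn pair is identical and r* is undefined.
boot_ci = np.nanpercentile(r_star, [2.5, 97.5])

# Fisher z: arctanh(r) = 0.5 * ln((1 + r) / (1 - r)); tanh is the back-transform.
z = np.arctanh(r)
half = NormalDist().inv_cdf(0.975) / np.sqrt(n - 3)
fisher_ci = (np.tanh(z - half), np.tanh(z + half))

print("r =", r)
print("bootstrap percentile CI:", boot_ci)
print("Fisher z CI:", fisher_ci)
```

A histogram of `r_star` (for example with `matplotlib.pyplot.hist`) is enough for the symmetry question in part 3.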

05 ✏️🐍 When Bootstrap Breaks

Consider \(X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \text{Uniform}(0, \theta)\). The MLE is \(\hat{\theta} = X_{(n)} = \max(X_i)\).

    1. Show that \(n(\theta - X_{(n)}) \xrightarrow{d} \text{Exp}(1/\theta)\), i.e. an exponential distribution with rate \(1/\theta\) (mean \(\theta\)).
    Hint: start from \(P(X_{(n)} \leq x) = (x/\theta)^n\) and substitute \(u = n(\theta - x)\).
    1. Using \(n = 50\) and \(\theta = 1\), simulate 10,000 samples. For each sample, also run \(B = 1{,}000\) bootstrap resamples of \(\hat{\theta}^*\). Compare the true distribution of \(n(\theta - \hat{\theta})\) with the bootstrap distribution of \(n(\hat{\theta} - \hat{\theta}^*)\). Do they match?
    1. Explain why the bootstrap fails here. What is special about the convergence rate of \(\hat{\theta}\)?
    Hint: the bootstrap “works” for smooth, asymptotically normal estimators whose convergence rate is \(\sqrt{n}\). Here the rate is \(n\) and the limit is not normal.
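A reduced version of the simulation can be sketched as below. To keep it fast, it uses 2,000 Monte Carlo samples instead of the 10,000 in the exercise and bootstraps only the first sample; both simplifications, along with the seed, are assumptions of this sketch. The quantity to watch is the share of bootstrap replicates with \(\hat{\theta}^* = \hat{\theta}\): a resample contains the observed maximum with probability \(1 - (1 - 1/n)^n \approx 1 - e^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(0)  # ASSUMPTION: arbitrary seed
n, theta, B = 50, 1.0, 1_000
n_samples = 2_000  # ASSUMPTION: fewer than the 10,000 in the exercise

# "True" sampling distribution of n * (theta - theta_hat) via Monte Carlo.
samples = rng.uniform(0.0, theta, size=(n_samples, n))
theta_hat = samples.max(axis=1)
true_stat = n * (theta - theta_hat)

# Bootstrap distribution of n * (theta_hat - theta_hat_star) for one sample.
x = samples[0]
idx = rng.integers(0, n, size=(B, n))
theta_star = x[idx].max(axis=1)
boot_stat = n * (x.max() - theta_star)

frac_zero_true = np.mean(true_stat == 0)
frac_zero_boot = np.mean(boot_stat == 0)

print("mean of true statistic:", true_stat.mean())
print("P(stat = 0), true vs bootstrap:", frac_zero_true, frac_zero_boot)
print("1 - (1 - 1/n)^n =", 1 - (1 - 1 / n) ** n)
```

The true statistic is continuous and puts no mass at 0, while the bootstrap version has a large atom there, so the two distributions cannot agree however large \(B\) is.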

🎲 38 (01) TODO
