24: Statistics — Sampling Distributions, CIs & Bootstrap
📚 Material
⚠️ Note
YouTube links in this section were auto-extracted. If you spot a mistake, please let me know!
Lecture
🏡 Homework
1) Analytical Confidence Intervals
01 ✏️ Election Night Nail-Biter
A poll of \(n = 900\) likely voters finds that 52% support candidate A.
- Build a 95% Wald CI for the true proportion \(p\). Based on this interval, can you call the election for candidate A?
- Now compute the Wilson CI for the same data. How does it differ from the Wald CI?
- Imagine only \(n = 20\) voters were polled and 11 (55%) said A. Recompute both Wald and Wilson CIs. Which one behaves better near the boundary, and why?
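To check the hand calculations, here is a minimal sketch of both intervals using only the Python standard library. The function names `wald_ci` and `wilson_ci` are illustrative, not from any library. The Wald interval is \(\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}\). The Wilson interval inverts the score test, which shrinks its centre toward 0.5.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def wald_ci(successes, n, z=Z):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_ci(successes, n, z=Z):
    """Wilson score interval: centre is shrunk toward 0.5 and the bounds stay inside [0, 1]."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half


# 52% of 900 is 468 voters; 11 of 20 is the small-poll case.
for successes, n in [(468, 900), (11, 20)]:
    print(n, "Wald:", wald_ci(successes, n), "Wilson:", wilson_ci(successes, n))
```

To see the boundary problem more starkly, try a count close to 0, such as `wald_ci(1, 20)`. The Wald lower bound falls below 0, but the Wilson bound does not.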
02 ✏️ A/B Test: Ship It or Wait?
An e-commerce company runs an A/B test on their checkout page.
- Control (\(n_1 = 200\)): conversion rate \(\hat{p}_1 = 0.08\) (8%).
- Treatment (\(n_2 = 200\)): conversion rate \(\hat{p}_2 = 0.11\) (11%).
The product manager says “3% lift — ship it!”
- Build a 95% CI for the difference \(p_2 - p_1\).
- Does the CI contain 0? What does this tell the PM?
- How large would each group need to be (equal sizes) so that the expected CI half-width (margin of error) is below 3 percentage points, i.e. narrow enough to exclude 0 if the true lift really is 3%? (Use the sample-size formula for two proportions.)
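Here is a minimal sketch of both calculations. It assumes the unpooled (Wald) standard error \(\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}\) for the difference. It also uses the observed rates 0.08 and 0.11 as planning values in the margin-of-error formula \(n = z^2\,[p_1(1-p_1) + p_2(1-p_2)] / E^2\) with \(E = 0.03\). The helper names are hypothetical.

One caveat: setting the half-width equal to the true lift only excludes 0 about half the time. A power-based version replaces \(z\) with \(z_{\alpha/2} + z_{\beta}\).

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # 95% two-sided critical value


def diff_ci(p1, n1, p2, n2, z=Z):
    """Unpooled Wald CI for p2 - p1."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p2 - p1
    return d - z * se, d + z * se


def n_per_group(p1, p2, margin, z=Z):
    """Equal group size so that the CI half-width z * SE is at most `margin`."""
    return math.ceil(z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / margin**2)


lo, hi = diff_ci(0.08, 200, 0.11, 200)
print(f"95% CI for p2 - p1: ({lo:.4f}, {hi:.4f})")
print("n per group for a 3-point margin:", n_per_group(0.08, 0.11, margin=0.03))
```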
2) The Bootstrap
03 🐍 Bootstrap vs Formula: A Head-to-Head Race
Generate \(n = 30\) observations from \(\text{Exp}(\lambda = 1)\) in Python (set np.random.seed(509)).
- Compute the analytical SE of \(\bar{X}\) (since \(\sigma = 1/\lambda = 1\), this is \(\sigma / \sqrt{n}\)).
- Compute the bootstrap SE with \(B = 10{,}000\) resamples.
- Build three 95% CIs for \(\mu = 1/\lambda\):
  - Normal-theory: \(\bar{X} \pm z_{0.025} \cdot \text{SE}\)
  - Bootstrap percentile: \([\hat{\theta}^*_{0.025},\; \hat{\theta}^*_{0.975}]\)
  - \(t\)-interval: \(\bar{X} \pm t_{n-1, 0.025} \cdot \frac{s}{\sqrt{n}}\)
- Repeat the SE and CI steps above for the sample median instead of the mean. Which of the three intervals can you no longer build from a simple formula-based SE?
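Here is a minimal NumPy sketch of the race, seeded as the exercise asks. Note that NumPy's `exponential` takes `scale` \(= 1/\lambda\) rather than the rate. The helper `bootstrap_stats` is a hypothetical name.

The \(t\) critical value comes from `scipy.stats.t.ppf` when SciPy is installed. Otherwise the sketch falls back to the table value for 29 degrees of freedom. Because the three intervals are formula-based only for the mean, they are computed once, while the bootstrap part loops over both statistics.

```python
import numpy as np
from statistics import NormalDist

n, B, lam = 30, 10_000, 1.0
Z = NormalDist().inv_cdf(0.975)  # 97.5% standard normal quantile, about 1.96
try:
    from scipy.stats import t as student_t
    T_CRIT = student_t.ppf(0.975, n - 1)
except ImportError:
    T_CRIT = 2.0452  # t table value for 29 df; only valid when n = 30

np.random.seed(509)
x = np.random.exponential(scale=1 / lam, size=n)  # NumPy parameterises by scale = 1/lambda


def bootstrap_stats(data, stat, B):
    """Recompute `stat` on B resamples of `data` drawn with replacement."""
    idx = np.random.randint(0, len(data), size=(B, len(data)))
    return stat(data[idx], axis=1)


# Mean: formula SE sigma/sqrt(n) with sigma = 1/lambda, and the t-interval with s/sqrt(n).
xbar, s = x.mean(), x.std(ddof=1)
se_analytic = (1 / lam) / np.sqrt(n)
ci_normal = (xbar - Z * se_analytic, xbar + Z * se_analytic)
ci_t = (xbar - T_CRIT * s / np.sqrt(n), xbar + T_CRIT * s / np.sqrt(n))
print("mean:", xbar, "SE(formula):", se_analytic, "normal CI:", ci_normal, "t CI:", ci_t)

# Bootstrap SE and percentile CI work identically for the mean and the median.
results = {}
for name, stat in [("mean", np.mean), ("median", np.median)]:
    boot = bootstrap_stats(x, stat, B)
    results[name] = {
        "estimate": stat(x),
        "se_boot": boot.std(ddof=1),
        "ci_percentile": np.percentile(boot, [2.5, 97.5]),
    }
    print(name, results[name])
```

Plugging `results["median"]["se_boot"]` into \(\hat{\theta} \pm z \cdot \text{SE}\) gives a normal-theory interval for the median that relies on the bootstrap for its SE.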
04 🐍 Bootstrap the Correlation
Given these \((x, y)\) pairs: \[ (1,2),\; (2,3),\; (3,5),\; (4,4),\; (5,7),\; (6,8),\; (7,6),\; (8,9),\; (9,10),\; (10,12) \]
- Compute the Pearson correlation \(r\).
- Use the bootstrap (\(B = 5{,}000\)) to build a 95% percentile CI for the population correlation \(\rho\).
- Plot the bootstrap distribution of \(r^*\). Is it symmetric? If not, why might that be?
- Fisher’s \(z\)-transform gives an analytical CI: transform \(z = \tfrac{1}{2}\ln\!\tfrac{1+r}{1-r}\), build a normal CI using \(\text{SE}(z) = \tfrac{1}{\sqrt{n-3}}\), then back-transform with \(r = \tfrac{e^{2z}-1}{e^{2z}+1}\). Compare with your bootstrap CI.
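Here is a minimal sketch of the pairs bootstrap alongside the Fisher \(z\) interval. It resamples whole \((x_i, y_i)\) rows so that the dependence between \(x\) and \(y\) is preserved. The seed is an arbitrary choice because the exercise does not fix one. Note that `np.arctanh` equals \(\tfrac{1}{2}\ln\tfrac{1+r}{1-r}\) and `np.tanh` equals the back-transform \(\tfrac{e^{2z}-1}{e^{2z}+1}\).

```python
import numpy as np

x = np.arange(1, 11, dtype=float)
y = np.array([2, 3, 5, 4, 7, 8, 6, 9, 10, 12], dtype=float)
n, B = len(x), 5_000
z_crit = 1.959963984540054  # 97.5% standard normal quantile

r = np.corrcoef(x, y)[0, 1]

# Pairs bootstrap: draw row indices, then compute Pearson r for each resample at once.
rng = np.random.default_rng(0)  # arbitrary seed (assumption)
idx = rng.integers(0, n, size=(B, n))
xc = x[idx] - x[idx].mean(axis=1, keepdims=True)
yc = y[idx] - y[idx].mean(axis=1, keepdims=True)
with np.errstate(invalid="ignore", divide="ignore"):
    # A resample that repeats a single pair has zero variance and gives NaN.
    r_star = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))
ci_boot = np.nanpercentile(r_star, [2.5, 97.5])

# Fisher z interval: normal CI on the z scale, then map back with tanh.
z = np.arctanh(r)
se_z = 1 / np.sqrt(n - 3)
ci_fisher = np.tanh([z - z_crit * se_z, z + z_crit * se_z])

print(f"r = {r:.4f}")
print("bootstrap percentile CI:", ci_boot)
print("Fisher z CI:", ci_fisher)
```

For the plot, `matplotlib.pyplot.hist(r_star, bins=50)` is enough. Because \(r^*\) cannot exceed 1, expect a long left tail when \(r\) is close to 1.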
05 ✏️🐍 When Bootstrap Breaks
Consider \(X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \text{Uniform}(0, \theta)\). The MLE is \(\hat{\theta} = X_{(n)} = \max(X_i)\).
- Show that \(n(\theta - X_{(n)}) \xrightarrow{d} \text{Exp}(1/\theta)\), the exponential distribution with rate \(1/\theta\) (i.e. mean \(\theta\)).
- Using \(n = 50\) and \(\theta = 1\), simulate 10,000 samples. For each sample, also run \(B = 1{,}000\) bootstrap resamples of \(\hat{\theta}^*\). Compare the true distribution of \(n(\theta - \hat{\theta})\) with the bootstrap distribution of \(n(\hat{\theta} - \hat{\theta}^*)\). Do they match?
- Explain why the bootstrap fails here. What is special about the convergence rate of \(\hat{\theta}\)?
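The full experiment (10,000 samples, each with 1,000 resamples) is expensive, so here is a lighter sketch. It builds the true sampling distribution from 10,000 samples but bootstraps only a single observed sample. Wrapping the bootstrap part in a loop over `samples` gives the version the exercise describes, and the seed is an arbitrary choice.

The quantity to watch is the atom at 0. The resampled maximum equals \(\hat{\theta}\) whenever the largest observation is drawn at least once, which happens with probability \(1 - (1 - 1/n)^n \to 1 - e^{-1} \approx 0.632\). The true limit, by contrast, is continuous.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed (assumption)
n, theta, n_sims, B = 50, 1.0, 10_000, 1_000

# True sampling distribution of n(theta - theta_hat) over fresh samples.
samples = rng.uniform(0, theta, size=(n_sims, n))
true_stat = n * (theta - samples.max(axis=1))

# Bootstrap distribution of n(theta_hat - theta_hat*) for ONE observed sample.
x = samples[0]
theta_hat = x.max()
boot_max = x[rng.integers(0, n, size=(B, n))].max(axis=1)
boot_stat = n * (theta_hat - boot_max)

print("true:      mean %.3f, P(stat == 0) = %.3f" % (true_stat.mean(), np.mean(true_stat == 0)))
print("bootstrap: mean %.3f, P(stat == 0) = %.3f" % (boot_stat.mean(), np.mean(boot_stat == 0)))
print("predicted bootstrap atom 1 - (1 - 1/n)^n =", 1 - (1 - 1 / n) ** n)
```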
🎲 38 (01) TODO
- ▶️ToDo
- 🔗Random link
- 🇦🇲🎶ToDo
- 🌐🎶ToDo
- 🤌Kargin