25: Statistics — Hypothesis Testing, p-Values & Power
📚 Material
⚠️ Note
YouTube links in this section were auto-extracted. If you spot a mistake, please let me know!
Lecture
🏡 Homework
1) Core Logic & p-Values
01 ✏️ Guilty or Not?
A jury must decide whether a defendant is guilty. Map the hypothesis-testing framework onto the courtroom:
- What plays the role of \(H_0\)? Of \(H_1\)?
- What is a Type I error in this context? A Type II error?
- What is the “significance level” \(\alpha\) analogous to?
- In a criminal trial, which error is considered worse? What about in a medical screening for a serious disease? How does this affect the choice of \(\alpha\)?
02 ✏️ The Suspicious Coin
You flip a coin 100 times and observe 60 heads.
- State \(H_0\) and \(H_1\) (two-sided).
- Compute the \(z\)-statistic.
- Find the p-value. At \(\alpha = 0.05\), do you reject? At \(\alpha = 0.01\)?
- Build a 95% CI for \(p\). Verify that your test decision matches whether \(p_0 = 0.5\) lies inside the CI.
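One way to check the arithmetic for this problem is sketched below, using only the Python standard library. It assumes the normal approximation to the binomial and a Wald interval for the CI (other intervals, such as Wilson's, would give slightly different endpoints); the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n, heads, p0 = 100, 60, 0.5
p_hat = heads / n

# The z-statistic uses the standard error under H0: sqrt(p0 * (1 - p0) / n)
se_null = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null

# Two-sided p-value from the standard normal approximation
std_normal = NormalDist()
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# Wald 95% CI (an assumption): standard error evaluated at p_hat, not at p0
z_crit = std_normal.inv_cdf(0.975)
se_hat = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print(f"reject at 0.05: {p_value < 0.05}, reject at 0.01: {p_value < 0.01}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), p0 inside: {ci[0] <= p0 <= ci[1]}")
```

Note that the test uses the standard error under \(H_0\) while the Wald interval uses the standard error at \(\hat{p}\), so the test and the CI are not guaranteed to agree in every borderline case, even though they usually do.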
03 ✏️ p-Value Misconceptions
A study reports \(p = 0.03\). For each statement below, say TRUE or FALSE and briefly explain.
- “There is a 3% probability that \(H_0\) is true.”
- “If we reject \(H_0\), there is a 3% chance we made an error.”
- “If we repeated the study many times, we would get \(p < 0.05\) at least 97% of the time.”
- “The probability of observing data this extreme (or more), assuming \(H_0\) is true, is 0.03.”
- “The effect is practically important.”
2) Multiple Testing
04 🐍 The p-Hacking Experiment
Simulate the multiple-testing disaster in Python:
- Generate 20 independent datasets, each containing two groups of \(n = 30\) drawn from the same \(N(0, 1)\) (so \(H_0\) is true for all 20). Run a two-sample \(t\)-test on each pair. How many give \(p < 0.05\)?
- Repeat the entire experiment 1,000 times. In what fraction of repetitions do you find at least one significant result (\(p < 0.05\))?
- Compare with the theoretical answer: \(1 - (1 - 0.05)^{20} \approx\;\)?
- Take one of the runs where several tests are “significant.” Apply Bonferroni and Benjamini–Hochberg corrections. How many remain significant after each?
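The experiment above can be sketched as follows. To stay within the standard library, this sketch swaps the two-sample \(t\)-test for a two-sample \(z\)-test with known unit variance (valid here only because the data are simulated from \(N(0, 1)\)); replacing z_test_pvalue with the p-value from scipy.stats.ttest_ind would match the exercise exactly. The seed, constants, and helper names are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

ALPHA, N_TESTS, N_PER_GROUP, N_REPS = 0.05, 20, 30, 1000
rng = random.Random(0)  # arbitrary fixed seed for reproducibility
std_normal = NormalDist()


def z_test_pvalue(a, b):
    """Two-sided two-sample z-test, assuming both groups have known variance 1."""
    z = (fmean(a) - fmean(b)) / sqrt(1 / len(a) + 1 / len(b))
    return 2 * (1 - std_normal.cdf(abs(z)))


def one_experiment():
    """Run N_TESTS null comparisons; both groups come from the same N(0, 1)."""
    pvals = []
    for _ in range(N_TESTS):
        a = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        pvals.append(z_test_pvalue(a, b))
    return pvals


def bonferroni(pvals, alpha=ALPHA):
    """Reject H0_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=ALPHA):
    """Reject the k smallest p-values, k = largest rank with p_(k) <= k * alpha / m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


runs = [one_experiment() for _ in range(N_REPS)]
fwer_sim = sum(min(p) < ALPHA for p in runs) / N_REPS
fwer_theory = 1 - (1 - ALPHA) ** N_TESTS

# Pick the run with the most uncorrected "discoveries" and apply both corrections
worst = max(runs, key=lambda p: sum(x < ALPHA for x in p))
print(f"simulated FWER = {fwer_sim:.3f}, theoretical = {fwer_theory:.3f}")
print("raw:", sum(x < ALPHA for x in worst),
      "Bonferroni:", sum(bonferroni(worst)),
      "BH:", sum(benjamini_hochberg(worst)))
```

Bonferroni controls the family-wise error rate, while Benjamini–Hochberg controls the false discovery rate, so BH never rejects fewer hypotheses than Bonferroni on the same p-values.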
3) Permutation Tests
05 🐍 Build Your Own Permutation Test
Treatment group: \([5.2,\; 6.1,\; 7.3,\; 5.8,\; 6.9]\). Control group: \([4.1,\; 3.9,\; 5.0,\; 4.5,\; 4.3]\).
- Compute the observed difference in means \(\bar{X}_T - \bar{X}_C\).
- Under \(H_0\) (no treatment effect), all \(\binom{10}{5} = 252\) ways to split the 10 values into two groups of 5 are equally likely. Enumerate all 252 permutations and compute \(\bar{X}_T^* - \bar{X}_C^*\) for each.
- What fraction of permutations produce a difference \(\geq\) the observed value? This is your exact one-sided p-value.
- Compare with the Welch \(t\)-test p-value (scipy.stats.ttest_ind with equal_var=False). Do they agree?
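The exact enumeration described above can be sketched with itertools.combinations from the standard library. Variable names are illustrative, and the small tolerance in the comparison is an added safeguard against floating-point round-off rather than part of the method.

```python
from itertools import combinations
from statistics import fmean

treatment = [5.2, 6.1, 7.3, 5.8, 6.9]
control = [4.1, 3.9, 5.0, 4.5, 4.3]
pooled = treatment + control
n_t = len(treatment)

observed = fmean(treatment) - fmean(control)

# Enumerate every way to label n_t of the pooled values as "treatment"
diffs = []
for idx in combinations(range(len(pooled)), n_t):
    chosen = set(idx)
    group_t = [pooled[i] for i in idx]
    group_c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diffs.append(fmean(group_t) - fmean(group_c))

# One-sided exact p-value: share of splits at least as extreme as the observed one
n_extreme = sum(d >= observed - 1e-12 for d in diffs)
p_value = n_extreme / len(diffs)

print(f"observed difference = {observed:.2f}")
print(f"{n_extreme} of {len(diffs)} splits are at least as extreme; p = {p_value:.5f}")
```

For the comparison in the last bullet, a one-sided Welch test should be available as scipy.stats.ttest_ind(treatment, control, equal_var=False, alternative="greater"), assuming a SciPy version recent enough to support the alternative argument.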
📊 Reference Tables
Note: Z-Table & t-Table
Standard Normal (Z) Table
Quick reference: \(z_{0.025} = 1.96\), \(z_{0.005} = 2.576\), \(z_{0.05} = 1.645\) (one-sided)
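The quick-reference values can be reproduced from the inverse normal CDF. This is a minimal sketch using statistics.NormalDist from the standard library; the helper name z_upper is illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_upper(tail):
    """Critical value z with P(Z > z) = tail for a standard normal Z."""
    return std_normal.inv_cdf(1 - tail)


for tail in (0.05, 0.025, 0.005):
    print(f"z_{tail} = {z_upper(tail):.3f}")
```

For a two-sided test at level \(\alpha\), the relevant critical value is \(z_{\alpha/2}\), which is why \(\alpha = 0.05\) pairs with 1.96 and \(\alpha = 0.01\) with 2.576.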
Student’s t-Table