25: Statistics — Hypothesis Testing, p-Values & Power

📚 Materials

YouTube links in this section were auto-extracted. If you spot a mistake, please let me know!

Lecture

🏡 Homework


1) Core Logic & p-Values

01 ✏️ Guilty or Not?

A jury must decide whether a defendant is guilty. Map the hypothesis-testing framework onto the courtroom:

    1. What plays the role of \(H_0\)? Of \(H_1\)?
    2. What is a Type I error in this context? A Type II error?
    3. What is the “significance level” \(\alpha\) analogous to?
    4. In a criminal trial, which error is considered worse? What about in a medical screening for a serious disease? How does this affect the choice of \(\alpha\)?
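Item 4 is about the trade-off between the two error types. A minimal sketch, assuming a one-sided z-test of \(H_0: \mu = 0\) against \(H_1: \mu = \delta > 0\) with known \(\sigma\); the values \(\delta = 0.5\), \(\sigma = 1\), \(n = 20\) and the helper name `type_ii_error` are hypothetical. It uses \(\beta = \Phi(z_{\alpha} - \delta\sqrt{n}/\sigma)\) to show that, with the amount of evidence \(n\) held fixed, shrinking \(\alpha\) (fewer innocent people convicted) inflates \(\beta\) (more guilty defendants go free).

```python
from scipy.stats import norm

def type_ii_error(alpha, effect, sigma, n):
    """Hypothetical helper: beta for a one-sided z-test, H0: mu = 0 vs H1: mu = effect > 0."""
    z_crit = norm.ppf(1 - alpha)          # reject H0 when Z > z_crit
    shift = effect * n ** 0.5 / sigma     # mean of Z when H1 is true
    return norm.cdf(z_crit - shift)       # P(fail to reject H0 | H1 true)

# assumed scenario: effect 0.5, sigma 1, n = 20
for alpha in (0.10, 0.05, 0.01, 0.001):
    beta = type_ii_error(alpha, effect=0.5, sigma=1.0, n=20)
    print(f"alpha={alpha:<6} beta={beta:.3f} power={1 - beta:.3f}")
```

The printed \(\beta\) grows as \(\alpha\) shrinks; only more evidence (larger \(n\)) lowers both error rates at once.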

02 ✏️ The Suspicious Coin

You flip a coin 100 times and observe 60 heads.

    1. State \(H_0\) and \(H_1\) (two-sided).
    2. Compute the \(z\)-statistic.
    3. Find the p-value. At \(\alpha = 0.05\), do you reject? At \(\alpha = 0.01\)?
    4. Build a 95% CI for \(p\). Verify that your test decision matches whether \(p_0 = 0.5\) lies inside the CI.
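A sketch of items 2 to 4 using the normal approximation to the binomial (an assumption; an exact binomial test is an alternative). Note that the z-statistic uses the standard error under \(H_0\), \(\sqrt{p_0(1-p_0)/n}\), while the Wald CI uses \(\sqrt{\hat p(1-\hat p)/n}\), so the test/CI agreement is only approximate here; the Wilson (score) interval is the one that inverts this z-test exactly.

```python
import math
from scipy.stats import norm

n, heads, p0 = 100, 60, 0.5
p_hat = heads / n

# z-statistic: standard error computed under H0 (p = p0)
se0 = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0

# two-sided p-value: P(|Z| >= |z|) under H0
p_value = 2 * norm.sf(abs(z))

# 95% Wald CI: standard error estimated from p_hat
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
z_crit = norm.ppf(0.975)
ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print(f"reject at 0.05: {p_value < 0.05}, reject at 0.01: {p_value < 0.01}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), contains p0: {ci[0] <= p0 <= ci[1]}")
```

With these numbers \(z = (0.60 - 0.50)/0.05 = 2.0\) and the two-sided p-value is about 0.046: reject at \(\alpha = 0.05\) but not at \(\alpha = 0.01\). The Wald interval is roughly (0.504, 0.696), which excludes 0.5, consistent with rejecting at the 5% level.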

03 ✏️ p-Value Misconceptions

A study reports \(p = 0.03\). For each statement below, say TRUE or FALSE and briefly explain.

    1. “There is a 3% probability that \(H_0\) is true.”
    2. “If we reject \(H_0\), there is a 3% chance we made an error.”
    3. “If we repeated the study many times, we would get \(p < 0.05\) at least 97% of the time.”
    4. “The probability of observing data this extreme (or more), assuming \(H_0\) is true, is 0.03.”
    5. “The effect is practically important.”
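Statement 4 is the definition of the p-value. It implies that when \(H_0\) is true (and the test statistic is continuous with a correctly specified null distribution), the p-value is uniform on \([0, 1]\). A small simulation sketch, assuming two-sample \(t\)-tests with \(n = 30\) per group (hypothetical choices), checks that about 3% of null studies give \(p \le 0.03\). What it does not estimate is \(P(H_0 \mid \text{data})\), the quantity statement 1 confuses with \(P(\text{data this extreme} \mid H_0)\).

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_sims, n = 20_000, 30

# both groups come from the same N(0, 1), so H0 is true in every simulated study
a = rng.standard_normal((n_sims, n))
b = rng.standard_normal((n_sims, n))
p_values = ttest_ind(a, b, axis=1).pvalue

# fraction of null studies whose p-value is at most 0.03
frac = np.mean(p_values <= 0.03)
print(f"P(p <= 0.03 | H0) is approximately {frac:.4f}")
```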

2) Multiple Testing

04 🐍 The p-Hacking Experiment

Simulate the multiple-testing disaster in Python:

    1. Generate 20 independent datasets, each containing two groups of \(n = 30\) drawn from the same \(N(0, 1)\) (so \(H_0\) is true for all 20). Run a two-sample \(t\)-test on each pair. How many give \(p < 0.05\)?
    2. Repeat the entire experiment 1,000 times. In what fraction of repetitions do you find at least one significant result (\(p < 0.05\))?
    3. Compare with the theoretical answer: \(1 - (1 - 0.05)^{20} \approx\;\)?
    4. Take one of the runs where several tests are “significant.” Apply Bonferroni and Benjamini–Hochberg corrections. How many remain significant after each?
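A vectorized sketch of the experiment, assuming NumPy and SciPy (`scipy.stats.ttest_ind` with `axis=2` runs all \(1000 \times 20\) tests at once). Both corrections are hand-written for transparency; `bonferroni` and `benjamini_hochberg` are illustrative helper names, not library functions. Bonferroni rejects when \(p_i < \alpha/m\); Benjamini–Hochberg sorts the p-values, finds the largest \(k\) with \(p_{(k)} \le k\alpha/m\), and rejects the \(k\) smallest.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
m, n, reps, alpha = 20, 30, 1_000, 0.05

# shape (reps, m, n): every pair of groups is drawn from the same N(0, 1)
g1 = rng.standard_normal((reps, m, n))
g2 = rng.standard_normal((reps, m, n))
p = ttest_ind(g1, g2, axis=2).pvalue            # shape (reps, m)

frac_any = np.mean((p < alpha).any(axis=1))
theory = 1 - (1 - alpha) ** m
print(f"simulated: {frac_any:.3f}, theoretical: {theory:.3f}")

def bonferroni(pvals, alpha):
    # reject H_i when p_i < alpha / m
    return pvals < alpha / len(pvals)

def benjamini_hochberg(pvals, alpha):
    # step-up rule: largest k with p_(k) <= k * alpha / m, then reject the k smallest
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # 0-based position of the largest such k
        reject[order[: k + 1]] = True
    return reject

# take the repetition with the most raw "discoveries"
worst = np.argmax((p < alpha).sum(axis=1))
pv = p[worst]
print("raw:", (pv < alpha).sum(),
      "Bonferroni:", bonferroni(pv, alpha).sum(),
      "BH:", benjamini_hochberg(pv, alpha).sum())
```

The theoretical value is \(1 - 0.95^{20} \approx 0.64\), so roughly two out of three batches of 20 null tests contain at least one “significant” result. BH never rejects fewer hypotheses than Bonferroni: if \(p_{(j)} \le \alpha/m\), then also \(p_{(j)} \le j\alpha/m\).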

3) Permutation Tests

05 🐍 Build Your Own Permutation Test

Treatment group: \([5.2,\; 6.1,\; 7.3,\; 5.8,\; 6.9]\). Control group: \([4.1,\; 3.9,\; 5.0,\; 4.5,\; 4.3]\).

    1. Compute the observed difference in means \(\bar{X}_T - \bar{X}_C\).
    2. Under \(H_0\) (no treatment effect), all \(\binom{10}{5} = 252\) ways to split the 10 values into two groups of 5 are equally likely. Enumerate all 252 splits (group relabelings) and compute \(\bar{X}_T^* - \bar{X}_C^*\) for each.
    3. What fraction of splits produce a difference \(\geq\) the observed value? This is your exact one-sided p-value.
    4. Compare with the one-sided Welch \(t\)-test p-value (scipy.stats.ttest_ind with equal_var=False and alternative="greater"). Do they agree?
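A sketch of the exact permutation test, using `itertools.combinations` to enumerate which 5 of the 10 pooled values are labeled “treatment.” The small tolerance in the comparison is an implementation choice that keeps the observed split from being lost to floating-point rounding. In the Welch call, `equal_var=False` selects Welch's test (the SciPy default is Student's pooled-variance test) and `alternative="greater"` makes it one-sided, matching the permutation p-value.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ttest_ind

treatment = np.array([5.2, 6.1, 7.3, 5.8, 6.9])
control = np.array([4.1, 3.9, 5.0, 4.5, 4.3])
pooled = np.concatenate([treatment, control])
obs = treatment.mean() - control.mean()

# every way to choose which 5 of the 10 pooled values are called "treatment"
diffs = []
for idx in combinations(range(len(pooled)), len(treatment)):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    diffs.append(pooled[mask].mean() - pooled[~mask].mean())
diffs = np.array(diffs)

# one-sided: count splits at least as extreme as the observed one
n_extreme = np.sum(diffs >= obs - 1e-12)
p_perm = n_extreme / len(diffs)

welch = ttest_ind(treatment, control, equal_var=False, alternative="greater")
print(f"observed difference = {obs:.2f}")
print(f"permutation p = {n_extreme}/{len(diffs)} = {p_perm:.4f}")
print(f"Welch one-sided p = {welch.pvalue:.4f}")
```

Here \(\bar X_T = 6.26\) and \(\bar X_C = 4.36\), so the observed difference is 1.90. The treatment group happens to hold the five largest values, so only the observed split reaches that difference and the exact p-value is \(1/252 \approx 0.004\), the smallest value this design can produce. The Welch p-value also falls below 0.01, so both approaches reject at the usual levels.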

📊 Reference Tables

Standard Normal (Z) Table

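Quick reference (upper-tail critical values): \(z_{0.05} = 1.645\) (one-sided, \(\alpha = 0.05\)), \(z_{0.025} = 1.96\) (two-sided, \(\alpha = 0.05\)), \(z_{0.005} = 2.576\) (two-sided, \(\alpha = 0.01\))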
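These values can be reproduced with SciPy: `norm.isf(a)` (the inverse survival function) returns the upper-tail critical value \(z_a\) satisfying \(P(Z > z_a) = a\).

```python
from scipy.stats import norm

# upper-tail critical values: z_a satisfies P(Z > z_a) = a
for a in (0.05, 0.025, 0.005):
    print(f"z_{a} = {norm.isf(a):.3f}")
```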

Student’s t-Table
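Student's \(t\) critical values can be computed rather than looked up with `scipy.stats.t.isf(a, df)`; the degrees of freedom in this sketch are an arbitrary selection. The values shrink toward \(z_{0.025} = 1.96\) as the degrees of freedom grow, which is why the normal table suffices for large samples.

```python
from scipy.stats import norm, t

# two-sided 95% critical values t_{0.025, df}
for df in (1, 5, 10, 30, 100):
    print(f"df={df:>3}: t_0.025 = {t.isf(0.025, df):.3f}")
print(f"normal limit: z_0.025 = {norm.isf(0.025):.3f}")
```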

