27: Statistics - ANOVA & A/B Testing

📚 Material


Lecture

🏡 Homework


1) One-Way ANOVA

01 ✏️ Three Fertilizers

A farmer tests three fertilizers on crop yield (kg per plot):

  • Fertilizer A: [22, 25, 28, 24, 26]

  • Fertilizer B: [30, 32, 28, 35, 31]

  • Fertilizer C: [25, 27, 29, 24, 26]

    1. State \(H_0\) and \(H_1\).
    1. Compute SSB, SSW, and SST. Build the ANOVA table.
    1. Compute the \(F\)-statistic and p-value. At \(\alpha = 0.05\), are the fertilizers different?
    1. Compute \(\eta^2\) (eta-squared). Is the effect small, medium, or large?
    1. If significant, run Bonferroni-corrected pairwise \(t\)-tests. Which pairs differ?
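
To check a hand calculation of the sums of squares, a minimal sketch using only the standard library might look like the following. The variable names (`groups`, `ssb`, `ssw`, `sst`, `eta_sq`) are illustrative choices rather than part of the problem statement. The p-value is left out here because it needs an \(F\)-distribution CDF.

```python
from statistics import mean

# Crop yields (kg per plot) from the problem statement.
groups = {
    "A": [22, 25, 28, 24, 26],
    "B": [30, 32, 28, 35, 31],
    "C": [25, 27, 29, 24, 26],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Between-group sum of squares: group size times squared distance
# of each group mean from the grand mean.
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

# Within-group sum of squares: squared deviations from each group's own mean.
ssw = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

# Total sum of squares: squared deviations from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

df_between = k - 1
df_within = n_total - k
f_stat = (ssb / df_between) / (ssw / df_within)
eta_sq = ssb / sst  # share of total variation explained by group membership

print(f"SSB={ssb:.3f}  SSW={ssw:.3f}  SST={sst:.3f}")
print(f"df=({df_between}, {df_within})  F={f_stat:.3f}  eta^2={eta_sq:.3f}")
```

The identity SST = SSB + SSW is a quick sanity check on the table before moving on to the p-value.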

02 🐍 ANOVA in Python

Using the data from Problem 01:

    1. Run one-way ANOVA with . Verify your hand calculation.
    1. Run Tukey HSD with . Which groups differ?
    1. Check the equal-variance assumption with Levene’s test ().
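
The text above does not show which functions are meant, so the sketch below rests on an assumption: it uses SciPy's `scipy.stats.f_oneway`, `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later), and `scipy.stats.levene`. Other libraries, such as statsmodels, provide comparable routines.

```python
from scipy import stats

# Data from Problem 01 (kg per plot).
a = [22, 25, 28, 24, 26]
b = [30, 32, 28, 35, 31]
c = [25, 27, 29, 24, 26]

# One-way ANOVA: F should match the hand-computed MSB / MSW.
f_stat, p_value = stats.f_oneway(a, b, c)
print(f"ANOVA: F={f_stat:.3f}, p={p_value:.4f}")

# Tukey HSD: all pairwise comparisons with family-wise error control.
# Assumption: SciPy >= 1.8, where tukey_hsd is available.
tukey = stats.tukey_hsd(a, b, c)
print(tukey)  # table of pairwise differences, p-values, and CIs

# Levene's test for equal variances. SciPy's default centers on the
# median (the Brown-Forsythe variant); pass center="mean" for the classic form.
levene_stat, levene_p = stats.levene(a, b, c)
print(f"Levene: W={levene_stat:.3f}, p={levene_p:.4f}")
```

In the `tukey.pvalue` matrix, entry `[i, j]` is the adjusted p-value for group `i` versus group `j`, in the order the samples were passed.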

2) A/B Testing

03 ✏️🐍 E-Commerce Button Color

An e-commerce site tests a red vs blue “Buy Now” button.

  • Red (control): 1,200 visitors, 96 conversions (8.0%)

  • Blue (treatment): 1,200 visitors, 120 conversions (10.0%)

    1. Test whether the blue button has a higher conversion rate using a two-proportion \(z\)-test.
    1. Compute the 95% CI for the difference in proportions. Is the lift significant?
    1. What sample size per group would be needed to detect a 2% lift with 80% power?
    1. Compute the relative lift (percentage increase over baseline). Is it meaningful?
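
The four parts above can be sketched with the standard library alone. Several choices here are assumptions, not givens: the test is one-sided (blue > red) with a pooled standard error, the CI uses the unpooled standard error, the "2% lift" is read as an absolute 2-percentage-point increase over the 8% baseline, and the sample-size formula is the usual normal approximation with two-sided \(\alpha = 0.05\).

```python
from math import ceil, sqrt
from statistics import NormalDist

norm = NormalDist()  # standard normal

n_c, x_c = 1200, 96    # red (control): visitors, conversions
n_t, x_t = 1200, 120   # blue (treatment): visitors, conversions
p_c, p_t = x_c / n_c, x_t / n_t
diff = p_t - p_c

# Two-proportion z-test with pooled standard error (H0: p_t = p_c).
p_pool = (x_c + x_t) / (n_c + n_t)
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
z = diff / se_pool
p_one_sided = 1 - norm.cdf(z)  # H1: p_t > p_c

# 95% CI for the difference, using the unpooled standard error.
se_unpooled = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
z_crit = norm.inv_cdf(0.975)
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

# Per-group sample size to detect `diff` with 80% power.
# Assumption: two-sided alpha = 0.05, normal approximation.
alpha, power = 0.05, 0.80
z_alpha = norm.inv_cdf(1 - alpha / 2)
z_beta = norm.inv_cdf(power)
p_bar = (p_c + p_t) / 2
n_required = ceil(
    (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
     + z_beta * sqrt(p_c * (1 - p_c) + p_t * (1 - p_t))) ** 2
    / diff ** 2
)

relative_lift = diff / p_c  # increase relative to the control rate

print(f"z={z:.3f}, one-sided p={p_one_sided:.4f}")
print(f"95% CI for difference: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"n per group for 80% power: {n_required}")
print(f"relative lift: {relative_lift:.1%}")
```

Note that a one-sided test at \(\alpha = 0.05\) and a two-sided 95% CI answer slightly different questions, so it is worth stating which one the conclusion rests on.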

04 🐍 The Peeking Problem

Simulate what happens when you “peek” at A/B test results:

    1. Generate two groups of \(n = 500\) from the same distribution \(N(0, 1)\) (no true effect). Run a \(t\)-test after every 10 new observations (so at \(n = 10, 20, 30, \ldots, 500\)). Plot the p-value trajectory. Does it ever dip below 0.05?
    1. Repeat 1,000 times. In what fraction of experiments does the p-value cross 0.05 at any point? (Compare with the nominal 5%.)
    1. Why is this a problem for real A/B tests? What is sequential testing?
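
One way to set up the simulation is sketched below, assuming NumPy and SciPy are available. The function name `peeking_false_positive_rate` and its parameters are hypothetical. The \(t\)-tests are vectorized across experiments (one `ttest_ind` call per look) to keep the run time short, and plotting is left out.

```python
import numpy as np
from scipy import stats


def peeking_false_positive_rate(n_experiments=1000, n_max=500, step=10,
                                alpha=0.05, seed=0):
    """Simulate repeated looks at A/B data with no true effect.

    Returns the look sizes, the matrix of p-values
    (experiments x looks), and the fraction of experiments in which
    the p-value fell below alpha at any look.
    """
    rng = np.random.default_rng(seed)
    # Both groups come from N(0, 1), so every "significant" result
    # is a false positive.
    a = rng.standard_normal((n_experiments, n_max))
    b = rng.standard_normal((n_experiments, n_max))

    looks = np.arange(step, n_max + 1, step)  # n = 10, 20, ..., 500
    pvals = np.empty((n_experiments, len(looks)))
    for j, n in enumerate(looks):
        # Two-sample t-test on the first n observations of each experiment.
        pvals[:, j] = stats.ttest_ind(a[:, :n], b[:, :n], axis=1).pvalue

    crossed = (pvals < alpha).any(axis=1)
    return looks, pvals, crossed.mean()


looks, pvals, rate = peeking_false_positive_rate()
final_rate = (pvals[:, -1] < 0.05).mean()  # testing once, at n = 500 only
print(f"crossed 0.05 at some look: {rate:.1%}")
print(f"significant at final look only: {final_rate:.1%}")
```

A single row of `pvals` plotted against `looks` gives the p-value trajectory for one experiment. Comparing `rate` with `final_rate` shows how much repeated looks inflate the false-positive rate relative to a single pre-planned test.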

