17: Probability - Exp, Var, Inequalities


📚 Materials

YouTube links in this section were auto-extracted. If you spot a mistake, please let me know!

Lecture

Practice Session

1) Expectation & Variance Basics

01 Modified Die: Probability and Moments

Vahe added a dot on the 4 side of the die, making it 5, and then added two dots on the 1 side, making it 3.

    1. What is the probability that the outcome of the die is greater than 4?
    1. Find the expectation and variance of the modified die.

Vahe’s modifications: \(1 \to 3\) (added two dots) and \(4 \to 5\) (added one dot). The faces of the modified die are \(\{2, 3, 3, 5, 5, 6\}\) — each still equally likely since the underlying die is unchanged.

a) Faces strictly greater than \(4\) are \(\{5, 5, 6\}\):

\[\mathbb{P}(X > 4) = \tfrac{3}{6} = \tfrac{1}{2}\]

b) Expectation:

\[\mathbb{E}[X] = \tfrac{2 + 3 + 3 + 5 + 5 + 6}{6} = \tfrac{24}{6} = 4\]

For variance, first compute \(\mathbb{E}[X^2]\):

\[\mathbb{E}[X^2] = \tfrac{4 + 9 + 9 + 25 + 25 + 36}{6} = \tfrac{108}{6} = 18\]

\[\operatorname{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = 18 - 16 = 2\]

Sanity check. For a standard die, \(\operatorname{Var} = 35/12 \approx 2.92\). Vahe’s die has slightly less variance because the original extremes \(1\) and \(4\) both got pulled toward the middle (to \(3\) and \(5\)).
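The hand computation above is easy to double-check exactly with exact rational arithmetic (a minimal sketch; the face list encodes Vahe's modifications):

```python
from fractions import Fraction

# Faces of Vahe's modified die: 1 -> 3 and 4 -> 5
faces = [2, 3, 3, 5, 5, 6]
p = Fraction(1, 6)                            # each face still equally likely

prob_gt_4 = sum(p for f in faces if f > 4)    # P(X > 4) = 1/2
mean = sum(p * f for f in faces)              # E[X] = 4
second = sum(p * f * f for f in faces)        # E[X^2] = 18
var = second - mean ** 2                      # Var(X) = E[X^2] - E[X]^2 = 2
```

Using `Fraction` instead of floats keeps every intermediate value exact, so the check confirms the integers in the solution rather than approximating them.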

02 Die Game: Expected Value

You roll a fair die. If you roll 1, you are paid $25. If you roll 2, you are paid $5. If you roll 3, you win nothing. If you roll 4 or 5, you must pay $10, and if you roll 6, you must pay $15.

    1. Compute the expected payoff.
    1. Do you want to play?

Each outcome has probability \(1/6\):

| roll | payoff ($) |
|------|------------|
| 1    | \(+25\)    |
| 2    | \(+5\)     |
| 3    | \(0\)      |
| 4    | \(-10\)    |
| 5    | \(-10\)    |
| 6    | \(-15\)    |

a) Expected payoff:

\[\mathbb{E}[\text{payoff}] = \tfrac{1}{6}(25 + 5 + 0 - 10 - 10 - 15) = -\tfrac{5}{6} \approx -\$0.83\]

b) Negative expected value — on average you lose about 83 cents per play. So don’t play, at least if you’re an expected-value maximizer with no liquidity constraints.

When pure expected value isn’t enough. Real decisions sometimes deviate from \(\arg\max\mathbb{E}\):

  • Asymmetric utility. If a one-time $25 win would be life-changing and a $15 loss is barely noticed, an expected-utility maximizer might still play.
  • Risk aversion. Conversely, even some positive-EV games aren’t worth playing if their variance is too high.

This is the gap between expected value and expected utility (Bernoulli, 1738) — the same theme that returns in Problem 08 (St. Petersburg).
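The expected payoff can be verified both exactly and by simulation (a sketch; the seed and sample size are arbitrary choices):

```python
import random
from fractions import Fraction

# Payoff table from the problem statement
payoff = {1: 25, 2: 5, 3: 0, 4: -10, 5: -10, 6: -15}

# Exact expected value: each face has probability 1/6
exact_ev = sum(Fraction(1, 6) * v for v in payoff.values())   # -5/6 dollars

# Monte Carlo check: the sample mean should land near -$0.83
random.seed(0)
n = 200_000
sim_ev = sum(payoff[random.randint(1, 6)] for _ in range(n)) / n
```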

03 Uniform Sum Expectation

Let \(X\) and \(Y\) be two continuous random variables with uniform distribution on \((0,2)\).

  • Find \(\mathbb{E}[X+Y]\).

By linearity of expectation:

\[\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]\]

For \(X \sim \mathrm{Uniform}(0, 2)\), \(\mathbb{E}[X] = (0 + 2)/2 = 1\). Same for \(Y\).

\[\mathbb{E}[X + Y] = 1 + 1 = 2\]

No independence assumption needed. Linearity of expectation \(\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]\) holds for any random variables, independent or not. This is one of the most useful identities in probability — you can decompose complicated quantities into sums and pull the expectation through, even when the pieces are correlated.

(Independence is needed for \(\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]\) and for \(\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)\).)
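A quick sketch of the point about dependence: take \(Y = 2 - X\), which is also \(\mathrm{Uniform}(0,2)\) but perfectly anticorrelated with \(X\). Linearity still gives \(\mathbb{E}[X+Y]=2\), while \(\mathbb{E}[XY]\neq\mathbb{E}[X]\mathbb{E}[Y]\) (here \(\mathbb{E}[XY]=2/3\), not \(1\)). The seed and sample size are arbitrary:

```python
import random

# Y = 2 - X: same Uniform(0, 2) marginal, perfect negative dependence
random.seed(1)
n = 100_000
xs = [random.uniform(0, 2) for _ in range(n)]
ys = [2 - x for x in xs]

mean_sum = sum(x + y for x, y in zip(xs, ys)) / n    # linearity: exactly 2
mean_prod = sum(x * y for x, y in zip(xs, ys)) / n   # ~2/3, not E[X]E[Y] = 1
```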


2) LOTUS: Law of the Unconscious Statistician

04 Expectation Without the CDF

Let \(X\sim\mathrm{Uniform}(0,1)\). Define \(Y=\log(1+X)\).

    1. Compute \(\mathbb{E}[Y]\) using LOTUS directly.
    1. Compute \(\operatorname{Var}(Y)\) (you may leave integrals in closed form).

The Law of the Unconscious Statistician (LOTUS): to compute \(\mathbb{E}[g(X)]\) for any function \(g\), you don’t need the distribution of \(Y = g(X)\) — just integrate \(g(x) \cdot f_X(x)\) over the original support:

\[\mathbb{E}[g(X)] = \int g(x)\,f_X(x)\,dx\]

For \(X \sim \mathrm{Uniform}(0, 1)\), \(f_X(x) = 1\) on \([0, 1]\).

a) \(\mathbb{E}[Y] = \mathbb{E}[\log(1 + X)]\).

\[\mathbb{E}[Y] = \int_0^1 \log(1 + x)\, dx\]

Integration by parts with \(u = \log(1+x)\), \(dv = dx\), so \(du = \tfrac{1}{1+x}\,dx\), \(v = x\):

\[\int_0^1 \log(1+x)\, dx = \big[x \log(1+x)\big]_0^1 - \int_0^1 \tfrac{x}{1+x}\, dx\]

The first piece: \(\log 2\). For the second, write \(\tfrac{x}{1+x} = 1 - \tfrac{1}{1+x}\):

\[\int_0^1 \tfrac{x}{1+x}\, dx = 1 - \log 2\]

Combining:

\[\mathbb{E}[Y] = \log 2 - (1 - \log 2) = 2\log 2 - 1 \approx 0.386\]

b) \(\operatorname{Var}(Y)\).

Compute \(\mathbb{E}[Y^2]\) via LOTUS:

\[\mathbb{E}[Y^2] = \int_0^1 \big(\log(1+x)\big)^2\, dx\]

Substitute \(u = \log(1+x)\), so \(x = e^u - 1\) and \(dx = e^u\, du\). Limits: \(x = 0 \to u = 0\), \(x = 1 \to u = \log 2\).

\[\mathbb{E}[Y^2] = \int_0^{\log 2} u^2\, e^u\, du\]

Integration by parts twice (or use the antiderivative \(\int u^2 e^u\, du = (u^2 - 2u + 2)\,e^u\)):

\[\int_0^{\log 2} u^2 e^u\, du = \big[(u^2 - 2u + 2)\,e^u\big]_0^{\log 2}\]

At \(u = \log 2\): \(\big((\log 2)^2 - 2\log 2 + 2\big)\cdot 2 = 2(\log 2)^2 - 4\log 2 + 4\). At \(u = 0\): \(2\).

\[\mathbb{E}[Y^2] = 2(\log 2)^2 - 4\log 2 + 2\]

Now the variance:

\[\operatorname{Var}(Y) = \mathbb{E}[Y^2] - \mathbb{E}[Y]^2 = \big[2(\log 2)^2 - 4\log 2 + 2\big] - (2\log 2 - 1)^2\]

Expanding \((2\log 2 - 1)^2 = 4(\log 2)^2 - 4\log 2 + 1\):

\[\operatorname{Var}(Y) = 1 - 2(\log 2)^2 \approx 0.039\]

So \(\sigma_Y \approx 0.197\) — reasonable for a quantity squeezed into roughly \([0, \log 2] \approx [0, 0.693]\).

Why LOTUS is so valuable. Without it, you’d compute \(f_Y(y)\) first (via the change-of-variable formula or CDF method) and then integrate \(y\,f_Y(y)\). LOTUS skips that detour entirely — work in the original variable’s space, with its known density. Especially useful when finding \(f_Y\) is messy or impossible in closed form.
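Both closed forms can be cross-checked by Monte Carlo, sampling \(X\) and averaging \(g(X)\) directly, which is exactly the LOTUS viewpoint (a sketch; seed and sample size are arbitrary):

```python
import math
import random

# Sample X ~ Uniform(0, 1) and average g(X) = log(1 + X) and its square
random.seed(2)
n = 200_000
ys = [math.log(1 + random.random()) for _ in range(n)]

mc_mean = sum(ys) / n
mc_var = sum(y * y for y in ys) / n - mc_mean ** 2

exact_mean = 2 * math.log(2) - 1          # 2 log 2 - 1 ≈ 0.386
exact_var = 1 - 2 * math.log(2) ** 2      # 1 - 2 (log 2)^2 ≈ 0.039
```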

05 Piecewise Payoff

Let \(X\sim\mathrm{Exp}(\lambda)\). A “refund policy” pays \(g(X)=\min(X,c)\) for fixed \(c>0\).

(For \(\mathrm{Exp}(\lambda)\): \(f(x)=\lambda e^{-\lambda x}\) and \(F(x)=1-e^{-\lambda x}\) for \(x\ge 0\). Aprikyan will cover distributions in the next lesson or the one after.)

    1. Compute \(\mathbb{E}[g(X)]\) using LOTUS.
    1. Compute \(\mathbb{P}(g(X)=c)\).
    1. Find \(\dfrac{d}{dc}\,\mathbb{E}[g(X)]\) and interpret.

a) \(\mathbb{E}[g(X)]\) via LOTUS.

Split at \(x = c\) since \(g(x) = x\) for \(x < c\) and \(g(x) = c\) for \(x \geq c\):

\[\mathbb{E}[g(X)] = \int_0^c x\,\lambda e^{-\lambda x}\, dx + \int_c^\infty c\,\lambda e^{-\lambda x}\, dx\]

The second integral is straightforward: \(c \cdot \mathbb{P}(X \geq c) = c\,e^{-\lambda c}\).

The first integral, by parts (\(u = x\), \(dv = \lambda e^{-\lambda x}\, dx\), \(v = -e^{-\lambda x}\)):

\[\int_0^c x\,\lambda e^{-\lambda x}\, dx = \big[-x e^{-\lambda x}\big]_0^c + \int_0^c e^{-\lambda x}\, dx = -c\,e^{-\lambda c} + \tfrac{1}{\lambda}\big(1 - e^{-\lambda c}\big)\]

Adding the two pieces, the \(-c\,e^{-\lambda c}\) and \(+c\,e^{-\lambda c}\) cancel:

\[\boxed{\mathbb{E}[g(X)] = \tfrac{1}{\lambda}\big(1 - e^{-\lambda c}\big)}\]

Sanity checks. As \(c \to 0\): \(\mathbb{E}[g(X)] \to 0\) (cap is zero, payoff is zero). As \(c \to \infty\): \(\mathbb{E}[g(X)] \to 1/\lambda = \mathbb{E}[X]\) (cap is irrelevant, you get the full claim).

b) \(\mathbb{P}(g(X) = c)\).

The capped value \(g(X)\) equals exactly \(c\) precisely when \(X \geq c\) (otherwise \(g(X) = X < c\)):

\[\mathbb{P}(g(X) = c) = \mathbb{P}(X \geq c) = e^{-\lambda c}\]

c) Marginal value of raising the cap.

Differentiate the answer from (a):

\[\frac{d}{dc}\mathbb{E}[g(X)] = \frac{d}{dc}\!\left[\tfrac{1}{\lambda}(1 - e^{-\lambda c})\right] = e^{-\lambda c}\]

Compare with (b): they’re equal.

\[\frac{d}{dc}\mathbb{E}[g(X)] = \mathbb{P}(g(X) = c) = e^{-\lambda c}\]

Interpretation: the envelope theorem in miniature. Raising the cap by a tiny \(dc\) helps only when the cap is binding — i.e., when \(X \geq c\). The marginal expected value of the cap equals the probability it’s currently active.

This pattern shows up wherever decisions involve constraints:

  • In real-options pricing, the marginal value of a cap/floor equals the probability the option is “in the money.”
  • In inventory management, the marginal value of one more unit of stock equals the probability of stocking out.
  • In gradient clipping for ML, the marginal effect of raising the clip threshold equals the fraction of steps currently being clipped.
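All three answers can be checked numerically at once (a sketch; \(\lambda = 1.5\), \(c = 0.8\), the seed, and the sample size are arbitrary choices):

```python
import math
import random

lam, c = 1.5, 0.8
exact_mean = (1 - math.exp(-lam * c)) / lam     # part (a): E[min(X, c)]
p_cap = math.exp(-lam * c)                      # parts (b), (c): P(X >= c)

random.seed(3)
n = 200_000
xs = [random.expovariate(lam) for _ in range(n)]
sim_mean = sum(min(x, c) for x in xs) / n       # empirical E[g(X)]
sim_frac = sum(x >= c for x in xs) / n          # empirical fraction capped
```

The empirical fraction of capped samples doubles as a finite-difference check on part (c), since it approximates the derivative of the expected payoff in \(c\).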

3) Expectation & Variance (Decision / Games)

06 When to Stop (Secretary-lite)

You see prices of used laptops one by one, i.i.d. \(\mathrm{Uniform}(0,1)\). You can accept one price and stop, or reject and continue; once rejected, it’s gone. You must decide a stopping rule.

    1. Consider the rule: “accept the first price \(\le t\).” Compute the expected accepted price as a function of \(t\) given a maximum of \(N\) offers.
    1. Find (approximately) the best \(t\) for \(N=10\).

We accept the first price below threshold \(t\), and if no price comes in below \(t\) in \(N\) tries, we’re forced to accept the last one.

a) Expected accepted price.

Probability of accepting at position \(k\):

  • For \(k = 1, \ldots, N - 1\): the previous \(k-1\) prices all exceeded \(t\) and the current one is \(\leq t\), giving probability \((1-t)^{k-1}\cdot t\)
  • For \(k = N\) (forced): all previous \(N-1\) exceeded \(t\), giving probability \((1-t)^{N-1}\)

Conditional means:

  • Given \(X_k \leq t\): uniform on \([0, t]\), so \(\mathbb{E}[X_k \mid X_k \leq t] = t/2\)
  • The forced \(X_N\) is independent of the “forced” event, so \(\mathbb{E}[X_N] = 1/2\)

Putting it together:

\[\mathbb{E}[\text{accepted}] = \sum_{k=1}^{N-1}(1-t)^{k-1}\cdot t \cdot \tfrac{t}{2} + (1-t)^{N-1}\cdot\tfrac{1}{2}\]

The geometric sum \(\sum_{k=1}^{N-1}(1-t)^{k-1} = \tfrac{1 - (1-t)^{N-1}}{t}\). Substituting and simplifying:

\[\mathbb{E}[\text{accepted}] = \tfrac{t}{2} + \tfrac{(1-t)^N}{2}\]

b) Optimal \(t\) for \(N = 10\).

Differentiate and set to zero:

\[\frac{d}{dt}\,\mathbb{E} = \tfrac{1}{2}\big[1 - N(1-t)^{N-1}\big] = 0 \;\Rightarrow\; (1-t)^{N-1} = \tfrac{1}{N}\]

For \(N = 10\): \((1-t)^9 = 1/10\), so \(1-t = 10^{-1/9} \approx 0.7743\):

\[\boxed{t^* \approx 0.226}\]

The corresponding minimum expected accepted price:

\[\mathbb{E}(t^*) = \tfrac{0.226}{2} + \tfrac{(0.7743)^{10}}{2} \approx 0.113 + 0.039 \approx 0.152\]

So the optimal rule on 10 offers gives an expected accepted price around \(0.152\) — far better than \(1/2\) (the no-strategy mean).

Why \(t = 1/2\) is wrong. Naïvely you might think “set the threshold to the mean, beat the mean.” But \(t = 1/2\) gives \(\mathbb{E} \approx 0.25\) — much worse than \(0.152\). With many offers, you can afford to be picky early.

This is the classical optimal-stopping problem and underlies job-search models, real-estate negotiations, and dating-search papers.
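The closed form and the optimum can be checked by simulating the rule directly (a sketch; the seed and number of plays are arbitrary):

```python
import random

def expected_price(t, N):
    # Closed form derived above: t/2 + (1 - t)^N / 2
    return t / 2 + (1 - t) ** N / 2

def play(t, N, rng):
    # Accept the first price <= t; if none appears, take the last offer
    for _ in range(N - 1):
        x = rng.random()
        if x <= t:
            return x
    return rng.random()

rng = random.Random(4)
N = 10
t_star = 1 - 10 ** (-1 / 9)          # ≈ 0.226, from (1 - t)^9 = 1/10
sim = sum(play(t_star, N, rng) for _ in range(100_000)) / 100_000
```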

07 Optimal Reroll (Single Reroll Allowed)

You roll a die once; you may choose to keep it or reroll once (then must keep). Goal: maximize expected value.

    1. What threshold rule is optimal?
    1. What is the resulting expected value?
    1. Compute the variance of the final payoff under the optimal strategy.

a) Threshold rule.

Keep the first roll if \(X\) exceeds the expected value of rerolling. The expected value of a fresh die is \(\mathbb{E}[Y] = 3.5\), so:

  • Keep if \(X > 3.5\), i.e., \(X \in \{4, 5, 6\}\)
  • Reroll if \(X < 3.5\), i.e., \(X \in \{1, 2, 3\}\)

b) Expected final value.

\[\mathbb{E}[\text{final}] = \mathbb{P}(\text{keep})\cdot\mathbb{E}[X \mid \text{keep}] + \mathbb{P}(\text{reroll})\cdot\mathbb{E}[\text{reroll}]\]

\[= \tfrac{3}{6}\cdot\tfrac{4+5+6}{3} + \tfrac{3}{6}\cdot\tfrac{7}{2} = \tfrac{1}{2}\cdot 5 + \tfrac{1}{2}\cdot\tfrac{7}{2} = \tfrac{17}{4} = 4.25\]

The reroll strategy raises the expected value from \(3.5\) (no reroll) to \(4.25\).

c) Variance.

\[\mathbb{E}[\text{final}^2] = \tfrac{1}{2}\cdot\tfrac{16+25+36}{3} + \tfrac{1}{2}\cdot\mathbb{E}[Y^2]\]

For a fair die, \(\mathbb{E}[Y^2] = (1+4+9+16+25+36)/6 = 91/6\). Kept-side mean of squares is \(77/3\):

\[\mathbb{E}[\text{final}^2] = \tfrac{77}{6} + \tfrac{91}{12} = \tfrac{154 + 91}{12} = \tfrac{245}{12}\]

\[\operatorname{Var}(\text{final}) = \tfrac{245}{12} - \left(\tfrac{17}{4}\right)^2 = \tfrac{980 - 867}{48} = \tfrac{113}{48} \approx 2.354\]

For comparison, a single die roll has variance \(35/12 \approx 2.917\). The reroll strategy reduces variance — by rerolling bad outcomes you trade randomness for a higher and tighter mean.

The general principle. Optimal stopping with reservation values: replace what you have if its value falls below the expected value of continuing. Same structure as Problem 06, just smaller scale.
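Since the die has only six faces, the strategy's moments can be enumerated exactly rather than simulated (a sketch using exact rational arithmetic):

```python
from fractions import Fraction

# Optimal strategy: keep X in {4, 5, 6}, otherwise reroll and keep Y
p = Fraction(1, 6)
keep = [4, 5, 6]
ey = Fraction(sum(range(1, 7)), 6)                     # E[Y]   = 7/2
ey2 = Fraction(sum(x * x for x in range(1, 7)), 6)     # E[Y^2] = 91/6

mean = sum(p * x for x in keep) + Fraction(1, 2) * ey          # 17/4
second = sum(p * x * x for x in keep) + Fraction(1, 2) * ey2   # 245/12
var = second - mean ** 2                                       # 113/48
```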

08 St. Petersburg Game (Bonus)

A fair coin is tossed until the first Heads appears. If Heads appears on toss \(k\), you get \(2^k\) dollars.

    1. Compute the expected payoff.
    1. Why might people still refuse to pay an “infinite fair price” to play?

a) Expected payoff.

The first Heads occurs on toss \(k\) with probability \((1/2)^k\) (Geometric distribution), and pays \(2^k\):

\[\mathbb{E}[\text{payoff}] = \sum_{k=1}^{\infty} \tfrac{1}{2^k}\cdot 2^k = \sum_{k=1}^{\infty} 1 = \infty\]

Every term is exactly \(1\) — each “branch” of the game contributes one dollar of expected value, and there are infinitely many branches.

b) Why the “infinite fair price” is unrealistic.

  1. Diminishing marginal utility of money. The first $1{,}000 changes your life more than the next $1{,}000{,}000. Bernoulli (1738) proposed using \(\log(\text{wealth})\) instead of wealth — under log-utility the expected utility is finite: \(\sum_k \tfrac{k}{2^k}\log 2 = 2\log 2 = \log 4\), i.e., a certainty equivalent of about $4.

  2. Bounded payoff in practice. No casino can pay \(2^{1000}\) dollars even if Heads holds out that long. Truncating at any realistic upper bound makes the expected payoff finite and small.

  3. Variance is also infinite. Even if EV were the right metric, the risk is unbounded — most plays pay almost nothing, with a tiny chance of an astronomical jackpot.

  4. Time and ergodicity. Realistic players play finite games over a lifetime, not infinite ones. The “ensemble” expected value is much higher than the “time” expected value of a single life of plays.

This is the St. Petersburg paradox, foundational in:

  • Decision theory (utility functions, Allais paradox)
  • Behavioral economics (Kahneman–Tversky prospect theory)
  • Risk management (Kelly criterion for sizing bets)
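Point 2 above (bounded payoffs) is easy to see numerically: truncate the game at \(K\) tosses, paying nothing if Heads never appears. Each of the \(K\) branches contributes exactly $1 of expected value, so the truncated EV is \(K\), only logarithmic in the top prize \(2^K\). A sketch with an arbitrary \(K = 10\) and seed:

```python
import random

def truncated_ev(K):
    # Each term (1/2)^k * 2^k equals 1, so the sum is exactly K
    return sum((1 / 2) ** k * 2 ** k for k in range(1, K + 1))

def play(K, rng):
    # Toss until Heads or until K tosses have passed (then pay nothing)
    for k in range(1, K + 1):
        if rng.random() < 0.5:
            return 2 ** k
    return 0

rng = random.Random(5)
K = 10
sim = sum(play(K, rng) for _ in range(200_000)) / 200_000   # near K = 10
```

Note how noisy the sample mean still is relative to its size: the truncated game keeps a heavy (if now finite) tail, illustrating point 3 as well.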

4) Indicators & Counting (Expectations via Linearity)

09 Coupon Collector: Sticker Packs

A shop gives one random sticker from a set of \(n\) stickers with each purchase (uniform, independent).

    1. Expected number of purchases to collect all \(n\) stickers (you may express the answer using harmonic numbers).
    1. For \(n=50\), give a rough numerical approximation.

a) Expected purchases to collect all \(n\) stickers.

The trick: break \(T\) into stages by the new sticker count. Let \(T_i\) be the number of purchases needed to get the \(i\)-th new sticker, given you already have \(i - 1\) distinct.

With \(i - 1\) distinct stickers, each pack contains a new sticker with probability \(\tfrac{n - (i-1)}{n} = \tfrac{n - i + 1}{n}\). So \(T_i \sim \mathrm{Geometric}\!\left(\tfrac{n-i+1}{n}\right)\), with \(\mathbb{E}[T_i] = \tfrac{n}{n - i + 1}\).

By linearity of expectation:

\[\mathbb{E}[T] = \sum_{i=1}^{n} \tfrac{n}{n - i + 1} = n\sum_{j=1}^{n}\tfrac{1}{j} = n\cdot H_n\]

where \(H_n = 1 + \tfrac{1}{2} + \cdots + \tfrac{1}{n}\) is the \(n\)-th harmonic number.

b) Approximation for \(n = 50\).

Use \(H_n \approx \ln n + \gamma + \tfrac{1}{2n}\), where \(\gamma \approx 0.5772\):

\[H_{50} \approx \ln 50 + 0.577 + 0.01 \approx 4.499\]

\[\mathbb{E}[T] \approx 50 \cdot 4.499 \approx 225\]

About 225 purchases on average to collect all 50 — roughly \(4.5\times\) the number of stickers.

Where the time goes. The hardest sticker to find is the last: \(T_{50} \sim \mathrm{Geometric}(1/50)\) with \(\mathbb{E}[T_{50}] = 50\). So 50 of your 225 expected purchases (>20%) are spent waiting for the very last one — a “long tail” typical of collecting independent draws.
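The harmonic-number formula can be checked against a direct simulation of the collection process (a sketch; the seed and 5 000 replications are arbitrary):

```python
import random

def expected_purchases(n):
    # n * H_n, the exact answer derived above
    return n * sum(1 / j for j in range(1, n + 1))

def collect(n, rng):
    # Buy packs until all n stickers have been seen at least once
    seen, buys = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        buys += 1
    return buys

rng = random.Random(6)
n = 50
exact = expected_purchases(n)                        # ≈ 224.96
sim = sum(collect(n, rng) for _ in range(5_000)) / 5_000
```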

10 Distinct Stickers After \(n\) Packs: \(\mathbb{E}[D]\), \(\operatorname{Var}(D)\)

You buy \(n\) sticker packs; each pack contains one sticker uniformly from \(\{1,\dots,m\}\), independent. Let \(D\) be the number of distinct stickers you have after \(n\) packs.

    1. Find \(\mathbb{E}[D]\).
    1. Find \(\operatorname{Var}(D)\) using indicators and pairwise terms.

The dual of the coupon collector: fix the number of packs \(n\) and ask how many distinct stickers we end up with.

Setup with indicators. For sticker \(j \in \{1, \ldots, m\}\), let \(I_j = 1\) if sticker \(j\) appears at least once in the \(n\) packs, else \(0\). Then \(D = \sum_{j=1}^m I_j\).

The probability sticker \(j\) is missed in any one pack is \(\tfrac{m-1}{m}\). Across \(n\) independent packs:

\[\mathbb{P}(I_j = 0) = \left(\tfrac{m-1}{m}\right)^n \;\Rightarrow\; \mathbb{P}(I_j = 1) = 1 - \left(\tfrac{m-1}{m}\right)^n\]

Let \(p = \tfrac{m-1}{m}\) for shorthand.

a) \(\mathbb{E}[D]\) via linearity.

\[\mathbb{E}[D] = \sum_{j=1}^m \mathbb{E}[I_j] = m(1 - p^n) = \boxed{\,m\left[1 - \left(\tfrac{m-1}{m}\right)^n\right]\,}\]

Intuition checks. \(\mathbb{E}[D] = 0\) at \(n = 0\) ✓; \(\mathbb{E}[D] \to m\) as \(n \to \infty\) ✓; for small \(n\), Taylor-expand \((1 - 1/m)^n \approx 1 - n/m\) giving \(\mathbb{E}[D] \approx n\) — every pack is essentially a new sticker.

b) \(\operatorname{Var}(D)\) — needs covariance.

The indicators \(I_j\) are not independent. Use:

\[\operatorname{Var}(D) = \sum_j \operatorname{Var}(I_j) + \sum_{i \neq j}\operatorname{Cov}(I_i, I_j)\]

Variance terms (each \(I_j\) is Bernoulli with \(\mathbb{P}(I_j = 1) = 1 - p^n\)):

\[\sum_j \operatorname{Var}(I_j) = m\cdot p^n(1 - p^n)\]

Covariance terms. Let \(q = \tfrac{m-2}{m}\). The probability that both stickers \(i\) and \(j\) are missing is \(\mathbb{P}(I_i = 0, I_j = 0) = q^n\). By inclusion–exclusion:

\[\mathbb{P}(I_i = 1, I_j = 1) = 1 - 2p^n + q^n\]

Then:

\[\operatorname{Cov}(I_i, I_j) = (1 - 2p^n + q^n) - (1 - p^n)^2 = q^n - p^{2n}\]

Sign check. Algebra shows \(q < p^2\) (i.e. \(\tfrac{m-2}{m} < \tfrac{(m-1)^2}{m^2}\)), so \(q^n < p^{2n}\) and \(\operatorname{Cov} < 0\) — slightly negative, as expected (if sticker \(i\) never showed, packs were spent on the others, slightly raising the chance \(j\) did show).

Combining:

\[\boxed{\operatorname{Var}(D) = m\,p^n(1 - p^n) + m(m-1)\big(q^n - p^{2n}\big)}\]

Where this shows up in ML. This is the variance analysis for bag-of-features / frequency estimators — how many distinct words appear in a corpus, distinct hash buckets hit by a stream, distinct classes sampled in a minibatch. The negative covariance reflects competition for limited data.
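Both closed forms can be validated against simulation (a sketch; \(m = 20\), \(n = 30\), the seed, and the replication count are arbitrary choices):

```python
import random

def moments(m, n):
    # Closed forms derived above for E[D] and Var(D)
    p = (m - 1) / m
    q = (m - 2) / m
    mean = m * (1 - p ** n)
    var = m * p ** n * (1 - p ** n) + m * (m - 1) * (q ** n - p ** (2 * n))
    return mean, var

def distinct(m, n, rng):
    # Number of distinct stickers in n uniform draws from {0, ..., m-1}
    return len({rng.randrange(m) for _ in range(n)})

rng = random.Random(7)
m, n, reps = 20, 30, 50_000
samples = [distinct(m, n, rng) for _ in range(reps)]
sim_mean = sum(samples) / reps
sim_var = sum(s * s for s in samples) / reps - sim_mean ** 2
exact_mean, exact_var = moments(m, n)
```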


5) Inequalities (Markov & Chebyshev)

11 Markov — “What’s the chance my bill is huge?”

Your monthly electricity bill \(B\ge 0\) has average \(\mathbb{E}[B]=\$80\).

    1. Use Markov’s inequality to bound \(\mathbb{P}(B\ge \$200)\) and \(\mathbb{P}(B\ge \$300)\).
    1. Suppose the provider claims: “the probability of a \(\$300+\) bill is at most 5%.” What average bill \(\mathbb{E}[B]\) would make this statement true by Markov?

Markov’s inequality. For any non-negative random variable \(B\) and any \(a > 0\):

\[\mathbb{P}(B \geq a) \leq \frac{\mathbb{E}[B]}{a}\]

The intuition is “you can’t have too much probability mass too far above the mean, because that mass has to be paid for in the average.”

a) Bounds with \(\mathbb{E}[B] = \$80\).

\[\mathbb{P}(B \geq 200) \leq \tfrac{80}{200} = \tfrac{2}{5} = 0.4\]

\[\mathbb{P}(B \geq 300) \leq \tfrac{80}{300} = \tfrac{4}{15} \approx 0.267\]

b) What \(\mathbb{E}[B]\) would make \(\mathbb{P}(B \geq 300) \leq 5\%\) Markov-provable?

We need \(\mathbb{E}[B]/300 \leq 0.05\):

\[\mathbb{E}[B] \leq 0.05 \cdot 300 = \$15\]

So Markov can guarantee the provider’s claim only if the population mean bill is at most $15.

Markov is loose — its strengths and limits.

  • Markov uses only the mean, not the variance or shape. It works for any non-negative distribution with that mean — including pathological ones.
  • Real distributions usually have much smaller tails. Your actual electricity bill is unlikely to triple, even though Markov allows \(\mathbb{P}(B \geq 3\,\mathbb{E}[B]) \leq 1/3\).
  • If you also know the variance, Chebyshev (Problem 12) gives tighter two-sided bounds.
  • For sums of i.i.d. variables, Hoeffding / Chernoff bounds give exponentially tight tails.

The provider’s claim might be true for their distribution — Markov simply can’t verify it without further assumptions.
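To see how loose Markov is in practice, compare its bound against the true tail of a concrete distribution with the same mean — say (purely for illustration) bills that are Exponential with mean $80:

```python
import math

# Markov bound E[B]/a versus the actual tail of Exp(mean = 80)
mean_bill = 80.0
thresholds = (200.0, 300.0)
markov = {a: mean_bill / a for a in thresholds}               # 0.40, 0.267
actual = {a: math.exp(-a / mean_bill) for a in thresholds}    # 0.082, 0.024
```

For this assumed distribution the true tail probabilities are several times smaller than Markov's guarantees, which is the typical situation.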

12 Chebyshev — “Commute-time reliability”

Commute time \(T\) (minutes) has mean \(\mu=40\) and variance \(\sigma^2=25\) (so \(\sigma=5\)).

    1. Use Chebyshev’s inequality to upper-bound \(\mathbb{P}(T\ge 55)\) and \(\mathbb{P}(T\le 25)\).
    1. How large must a time buffer \(b\) be so that \(\mathbb{P}(T\le \mu+b)\ge 0.95\)?

Chebyshev’s inequality. For any random variable \(T\) with finite variance and any \(k > 0\):

\[\mathbb{P}(|T - \mu| \geq k\sigma) \leq \tfrac{1}{k^2}\]

Stronger than Markov because it uses both mean and variance.

a) Bounds.

With \(\mu = 40\), \(\sigma = 5\), both events \(T \geq 55\) and \(T \leq 25\) correspond to \(|T - 40| \geq 15\). That’s exactly \(3\sigma\), so \(k = 3\):

\[\mathbb{P}(|T - 40| \geq 15) \leq \tfrac{1}{9} \approx 0.111\]

This is the two-sided bound. Each one-sided event sits inside it (their sum is at most \(1/9\)):

\[\mathbb{P}(T \geq 55) \leq \tfrac{1}{9}, \qquad \mathbb{P}(T \leq 25) \leq \tfrac{1}{9}\]

b) Buffer \(b\) for 95% on-time arrival.

We want \(\mathbb{P}(T \leq \mu + b) \geq 0.95\), equivalently \(\mathbb{P}(T > \mu + b) \leq 0.05\).

Set \(b = k\sigma\) and bound by Chebyshev:

\[\mathbb{P}(T > \mu + k\sigma) \;\leq\; \mathbb{P}(|T - \mu| \geq k\sigma) \;\leq\; \tfrac{1}{k^2}\]

We need \(\tfrac{1}{k^2} \leq 0.05\):

\[k^2 \geq 20 \;\Rightarrow\; k \geq \sqrt{20} = 2\sqrt{5} \approx 4.47\]

\[b = k\sigma \geq \sqrt{20}\cdot 5 = 5\sqrt{20} \approx 22.4 \text{ minutes}\]

So a buffer of about 22.4 minutes guarantees (by Chebyshev) on-time arrival with probability \(\geq 95\%\).

Why this is so conservative. Chebyshev makes no shape assumption about \(T\) — its bound has to work for the worst distribution with the given \((\mu, \sigma)\). For a Normal-distributed commute time, \(\mathbb{P}(T > \mu + 1.645\sigma) = 0.05\), so a buffer of just \(1.645 \cdot 5 \approx 8.2\) minutes suffices — nearly 3× tighter than Chebyshev’s answer.

Take-away. If you have only \(\mu\) and \(\sigma\), Chebyshev is the best general bound available. With a known distribution shape (Normal, exponential, etc.), you can do much better. This trade-off — more assumptions ⇒ tighter conclusions — is one of the central themes of statistical inference.
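The assumptions-vs-tightness trade-off can be made concrete: compute Chebyshev's distribution-free buffer, then check by simulation that the much smaller Normal buffer really does achieve 95% — if the commute truly is Normal (an assumption of this sketch, along with the seed and sample size):

```python
import math
import random

mu, sigma = 40.0, 5.0
cheb_buffer = math.sqrt(20) * sigma    # ≈ 22.4 min, valid for ANY (mu, sigma)
normal_buffer = 1.645 * sigma          # ≈ 8.2 min, the Normal 95th percentile

# Simulated lateness rate under a Normal(40, 5) commute with the small buffer
random.seed(8)
n = 200_000
late = sum(random.gauss(mu, sigma) > mu + normal_buffer for _ in range(n)) / n
```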
