19: Probability - Distributions
Don’t forget:
Explore the https://mathlets.org/mathlets/probability-distributions/ site hands-on during the practical session.
Distribution Identification
01 Distribution detective: Which one fits?
Match each scenario to the most appropriate distribution. Justify each choice in one sentence.
- The number of typos on a randomly selected page of a 500-page book, if typos occur randomly at an average rate of 0.5 per page.
- Whether a randomly selected email is spam (yes/no), given 40% of emails are spam.
- The number of heads in 20 coin flips.
- The exact time (in minutes) you wait for the next bus, if buses arrive completely randomly at an average rate of 4 per hour.
- A randomly chosen real number between 0 and 10.
02 Name that distribution
For each scenario, identify the distribution, state its parameter(s), and write the PMF or PDF.
- A call center receives calls at an average rate of 8 per hour. Let \(X\) be the number of calls received between 2:00 PM and 3:00 PM.
- A software update crashes with probability 0.03. An IT department pushes the update to 200 computers independently. Let \(Y\) be the number of computers that crash.
- A sensor measures temperature continuously, but due to manufacturing imprecision its reading can be anywhere between 98.5°C and 101.5°C, with no value more likely than another. Let \(T\) be the measured temperature.
- A quality inspector tests light bulbs one by one. Each bulb independently fails inspection with probability 0.15. Let \(N\) be the number of bulbs tested until the first failure.
- The time between earthquakes in a seismically active region averages 4 months. Let \(W\) be the waiting time (in months) until the next earthquake.
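To check the PMFs and PDFs you write down, it can help to code them directly. Below is a minimal Python sketch using only the standard library. The function names are hypothetical, the exponential uses the rate parameterization (mean \(1/\lambda\)), and the geometric counts trials up to and including the first “success” (here, a failed inspection), which is one of two common conventions.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P[X = k] for X ~ Poisson(lam), k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P[Y = k] for Y ~ Bin(n, p), k = 0, ..., n."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def uniform_pdf(t: float, a: float, b: float) -> float:
    """Density of U(a, b) at t: constant 1/(b - a) on [a, b], zero elsewhere."""
    return 1 / (b - a) if a <= t <= b else 0.0

def geometric_pmf(k: int, p: float) -> float:
    """P[N = k] for N ~ Geo(p): k - 1 non-successes, then a success, k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def exponential_pdf(w: float, lam: float) -> float:
    """Density of Exp(lam) at w (rate lam, mean 1/lam); zero for w < 0."""
    return lam * math.exp(-lam * w) if w >= 0 else 0.0

# Example evaluations with the parameters from the scenarios above.
print(poisson_pmf(8, 8))                # calls: rate 8 per hour, one-hour window
print(binomial_pmf(6, 200, 0.03))       # crashes: 200 machines, p = 0.03
print(uniform_pdf(100.0, 98.5, 101.5))  # temperature: flat on [98.5, 101.5]
print(geometric_pmf(3, 0.15))           # bulbs: first failure on the 3rd test
print(exponential_pdf(2.0, 1 / 4))      # earthquakes: mean 4 months, rate 1/4 per month
```

Summing a PMF over its support (or checking that a PDF integrates to 1) is a quick way to catch a wrong normalizing constant.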
03 Mystery distributions: Identify from data
A researcher collects data from three different experiments and computes summary statistics:
Dataset A: \(n = 500\) observations, all values are either 0 or 1. Sample mean \(\approx 0.23\), sample variance \(\approx 0.177\).
Dataset B: \(n = 1000\) observations, values range from 0 to 47. Sample mean \(\approx 12.1\), sample variance \(\approx 11.8\).
Dataset C: \(n = 800\) observations, values are positive reals ranging from 0.001 to 14.2. Sample mean \(\approx 2.5\), sample variance \(\approx 6.3\).
For each dataset:
- Identify the most likely distribution family.
- Estimate the parameter(s) of that distribution from the summary statistics.
- For Dataset B, the researcher notices that these are counts of customer complaints per day at a call center. Does this context support your answer? What if instead they were counts of “successes” in 50 independent trials per observation?
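The identification step can be checked by moment matching (the method of moments): fit each candidate family’s parameter to the sample mean and compare the variance it implies with the observed one. A rough Python sketch follows. It compares moments only, so you still need to check the support (0/1 values, counts, positive reals) yourself, and the Binomial(50, p) row assumes the 50-trials scenario from the last question.

```python
# Summary statistics from the problem: (sample mean, sample variance).
datasets = {"A": (0.23, 0.177), "B": (12.1, 11.8), "C": (2.5, 6.3)}

def implied_variance(mean: float) -> dict[str, float]:
    """Variance each candidate family would have if its parameter is chosen to match `mean`."""
    out = {
        "Poisson (lambda = mean)": mean,             # Var = lambda
        "Exponential (lambda = 1/mean)": mean ** 2,  # Var = 1/lambda^2 = mean^2
    }
    if 0 <= mean <= 1:
        out["Bernoulli (p = mean)"] = mean * (1 - mean)      # Var = p(1 - p)
    if 0 <= mean <= 50:
        p = mean / 50
        out["Binomial(50, p = mean/50)"] = 50 * p * (1 - p)  # Var = n p (1 - p)
    return out

for name, (mean, var) in datasets.items():
    print(f"Dataset {name}: observed variance {var}")
    for family, v in implied_variance(mean).items():
        print(f"  {family:<30} implied variance {v:.3f}")
```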
Discrete Distributions
04 The “obvious” Bernoulli that isn’t
A weighted die shows 6 with probability \(\frac{1}{3}\) and each of 1–5 with probability \(\frac{2}{15}\).
- Define a Bernoulli random variable \(X\) for “rolling a 6.” State \(p\) and compute \(E[X]\) and \(\text{Var}[X]\).
- Define a different Bernoulli random variable \(Y\) for “rolling an even number.” Compute \(E[Y]\) and \(\text{Var}[Y]\).
- For which event is the variance larger? Explain intuitively why maximum Bernoulli variance occurs at \(p = 0.5\).
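Exact fractions make this arithmetic easy to verify. A small Python sketch with `fractions.Fraction`; `bernoulli_moments` is a hypothetical helper that treats an event as a set of die faces.

```python
from fractions import Fraction

# Weighted die from the problem: face 6 has probability 1/3, faces 1-5 have 2/15 each.
probs = {face: Fraction(2, 15) for face in range(1, 6)}
probs[6] = Fraction(1, 3)
assert sum(probs.values()) == 1  # sanity check: a valid distribution

def bernoulli_moments(event: set[int]) -> tuple[Fraction, Fraction, Fraction]:
    """Return (p, E, Var) for the indicator of `event`."""
    p = sum(probs[f] for f in event)
    return p, p, p * (1 - p)  # E[I] = p, Var[I] = p(1 - p)

p_x, e_x, var_x = bernoulli_moments({6})        # X: "rolling a 6"
p_y, e_y, var_y = bernoulli_moments({2, 4, 6})  # Y: "rolling an even number"
print(f"X: p = {p_x}, E = {e_x}, Var = {var_x}")
print(f"Y: p = {p_y}, E = {e_y}, Var = {var_y}")
```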
05 Memoryless waiting: Geometric intuition
A slot machine pays out with probability \(p = 0.05\) on each play.
- What is the expected number of plays until the first payout?
- You’ve already played 50 times with no payout. What is the expected additional number of plays until you win?
- A gambler says: “I’m due for a win soon because I’ve lost so many times.” In 2–3 sentences, explain why this reasoning is flawed.
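Memorylessness shows up clearly in a quick simulation: generate many geometric waiting times, keep only the runs with no payout in the first 50 plays, and average the remaining plays. A Python sketch (the seed and sample size are arbitrary choices); the two averages should come out close to each other.

```python
import random

P_WIN = 0.05
rng = random.Random(0)  # fixed seed so the run is reproducible

def plays_until_win() -> int:
    """Number of plays up to and including the first payout, i.e. a Geo(P_WIN) draw."""
    n = 1
    while rng.random() >= P_WIN:  # this play lost; keep going
        n += 1
    return n

samples = [plays_until_win() for _ in range(100_000)]
mean_all = sum(samples) / len(samples)

# Condition on "no payout in the first 50 plays" and count only the extra plays.
extra = [s - 50 for s in samples if s > 50]
mean_extra = sum(extra) / len(extra)

print(f"average plays until first payout: {mean_all:.2f}")
print(f"average extra plays after 50 losses: {mean_extra:.2f}")
```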
06 Binomial: Quality control decision
A factory produces chips with defect probability \(p = 0.02\). A batch of \(n = 100\) chips is inspected.
- Let \(X\) be the number of defective chips. State the distribution of \(X\) and compute \(E[X]\) and \(\text{Var}[X]\).
- The batch is rejected if more than 5 chips are defective. Without computing \(P[X > 5]\) exactly, explain why \(P[X > 5]\) is small.
- If \(p\) increases to \(0.10\), recompute \(E[X]\). How does this change the rejection decision intuitively?
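To compare your qualitative argument with the exact tail probability, the binomial PMF can be summed directly. A minimal Python sketch using `math.comb`; `binom_tail` is a hypothetical helper name.

```python
from math import comb

def binom_tail(n: int, p: float, k: int) -> float:
    """P[X > k] for X ~ Bin(n, p), computed as 1 - P[X <= k]."""
    return 1 - sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

n = 100
for p in (0.02, 0.10):
    mean, var = n * p, n * p * (1 - p)
    print(f"p = {p}: E[X] = {mean:.1f}, Var[X] = {var:.2f}, "
          f"P[X > 5] = {binom_tail(n, p, 5):.4f}")
```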
07 Poisson: Rare events approximation
A website has 10,000 visitors per day. Each visitor independently has a \(0.0003\) probability of reporting a bug.
- Let \(X\) be the number of bug reports per day. Which distribution is a good approximation here, and what is the parameter?
- What is the probability of receiving at least one bug report?
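A short Python check (standard library only) of how close the Poisson approximation is to the exact binomial answer here:

```python
import math

n, p = 10_000, 0.0003
lam = n * p  # Poisson parameter: lambda = n p

exact = 1 - (1 - p) ** n     # Binomial: P[X >= 1] = 1 - (1 - p)^n
approx = 1 - math.exp(-lam)  # Poisson:  P[X >= 1] = 1 - e^(-lambda)

print(f"lambda = {lam:.1f}")
print(f"Binomial P[X >= 1] = {exact:.5f}")
print(f"Poisson  P[X >= 1] = {approx:.5f}")
```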
Continuous Distributions
08 Exponential: Memoryless lifetimes
A light bulb’s lifetime (in years) follows \(\text{Exp}(\lambda = 0.5)\).
- Compute \(E[X]\) and the probability that the bulb lasts more than 3 years.
- Given that the bulb has already lasted 2 years, what is the probability it lasts at least 1 more year?
- Compare with the discrete case: if bulb failure each year is Bernoulli with \(p = 0.4\), and \(Y \sim \text{Geo}(0.4)\) counts years until failure, compute \(P[Y > 3 \mid Y > 2]\) and \(P[Y > 1]\). What do you notice?
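The continuous and discrete survival functions can be compared side by side. A Python sketch; `exp_survival` and `geo_survival` are hypothetical helpers, and `geo_survival` uses \(P[Y > k] = (1-p)^k\) for a geometric count of years up to and including the failure.

```python
import math

LAM = 0.5  # Exp(lambda) rate, per year

def exp_survival(t: float) -> float:
    """P[X > t] for X ~ Exp(LAM)."""
    return math.exp(-LAM * t)

def geo_survival(k: int, p: float = 0.4) -> float:
    """P[Y > k] for Y ~ Geo(p): no failure in the first k years."""
    return (1 - p) ** k

print("E[X] =", 1 / LAM)
print("P[X > 3] =", exp_survival(3))
print("P[X > 3 | X > 2] =", exp_survival(3) / exp_survival(2))
print("P[X > 1] =", exp_survival(1))
print("P[Y > 3 | Y > 2] =", geo_survival(3) / geo_survival(2))
print("P[Y > 1] =", geo_survival(1))
```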
09 Uniform: The broken stick problem
A stick of length 1 is broken at a uniformly random point \(X \sim U(0, 1)\).
- What is the expected length of the left piece?
- Let \(Y = X(1 - X)\) be the product of the two piece lengths. Compute \(E[Y]\).
- What break point \(x\) maximizes \(Y = x(1-x)\)? Compare this to \(E[X]\).
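A Monte Carlo sketch in Python that estimates both expectations by sampling break points, plus a crude grid search for the maximizer of \(x(1-x)\). The seed, sample size, and grid resolution are arbitrary; the exact values in the comments follow from \(E[X] = 1/2\) and \(E[X^2] = 1/3\) for \(U(0,1)\).

```python
import random

rng = random.Random(1)
N = 200_000
xs = [rng.random() for _ in range(N)]            # break points X ~ U(0, 1)
mean_left = sum(xs) / N                          # estimate of E[X]
mean_product = sum(x * (1 - x) for x in xs) / N  # estimate of E[X(1 - X)]

# Exact: E[X] = 1/2 and E[X(1 - X)] = E[X] - E[X^2] = 1/2 - 1/3 = 1/6.
print(f"E[X] estimate: {mean_left:.4f}")
print(f"E[Y] estimate: {mean_product:.4f} (1/6 = {1 / 6:.4f})")

# Grid search for the maximizer of y(x) = x(1 - x) on [0, 1].
grid = [i / 1000 for i in range(1001)]
best_x = max(grid, key=lambda x: x * (1 - x))
print("argmax of x(1 - x) on the grid:", best_x)
```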
10 Normal: The 68-95-99.7 rule in action
Human heights in a population follow \(N(170, 100)\) (mean 170 cm, variance 100 cm²).
- What is \(\sigma\)? What proportion of people are between 160 cm and 180 cm tall?
- A person is 2.5 standard deviations above the mean. How tall are they?
- Standardize the height \(X = 155\) cm. Interpret the z-score: is this person unusually short?
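Python’s standard library class `statistics.NormalDist` can check these numbers. Note that it takes the standard deviation, not the variance, so \(N(170, 100)\) becomes `sigma=10`. A minimal sketch:

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=100 ** 0.5)  # variance 100, so sigma = 10

within_one_sd = heights.cdf(180) - heights.cdf(160)
tall = heights.mean + 2.5 * heights.stdev
z_155 = (155 - heights.mean) / heights.stdev  # same as heights.zscore(155) on Python 3.9+

print(f"sigma = {heights.stdev}")
print(f"P[160 < X < 180] = {within_one_sd:.4f}")
print(f"height at +2.5 SD = {tall} cm")
print(f"z-score of 155 cm = {z_155}")
```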
11 Normal: Standardization and comparison
Test A has scores \(\sim N(500, 10000)\) (so \(\sigma = 100\)). Test B has scores \(\sim N(50, 100)\) (so \(\sigma = 10\)).
- Alice scores 680 on Test A. Bob scores 72 on Test B. Compute both z-scores.
- Who performed better relative to their test population?
- Explain why comparing raw scores (680 vs 72) is meaningless without standardization.
- You’re given a coin that shows heads with unknown probability \(p\). You flip it 100 times and observe 65 heads; let \(X\) be the number of heads. If the coin were fair (\(p = 0.5\)), what would \(E[X]\) and \(\text{SD}[X]\) be? How many standard deviations above the mean is 65? What can you conclude about whether \(p = 0.5\)?
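A quick Python sketch for the z-score arithmetic, including the coin question under the fair-coin assumption \(X \sim \text{Bin}(100, 0.5)\); `z_score` is a hypothetical helper name.

```python
import math

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (positive) or below (negative) mu."""
    return (x - mu) / sigma

z_alice = z_score(680, 500, 100)  # Test A: mean 500, sigma 100
z_bob = z_score(72, 50, 10)       # Test B: mean 50, sigma 10

# Coin under the fair-coin hypothesis: X ~ Bin(100, 0.5).
n, p = 100, 0.5
mean = n * p
sd = math.sqrt(n * p * (1 - p))
z_coin = z_score(65, mean, sd)

print(f"Alice z = {z_alice:.2f}, Bob z = {z_bob:.2f}")
print(f"Fair coin: E[X] = {mean}, SD[X] = {sd}, 65 heads is {z_coin:.1f} SDs above the mean")
```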
Connections Between Distributions
12 The Poisson-Exponential connection
Customers arrive at a shop according to a Poisson process with rate \(\lambda = 4\) per hour.
- What distribution does the number of arrivals in 1 hour follow? State its mean and variance.
- What distribution does the time between consecutive arrivals follow? State its mean.
- If no customer has arrived in the last 15 minutes, what is the probability that the next customer arrives within 10 minutes?
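A Python sketch comparing the closed-form answer (via memorylessness) with a simulation that conditions an exponential gap on exceeding 15 minutes. The rate is per hour, so minutes are converted to hours; the seed and sample size are arbitrary.

```python
import math
import random

LAM = 4  # arrivals per hour

# Closed form via memorylessness: P[T <= 10 min] with T ~ Exp(4 per hour), 10 min = 1/6 hour.
exact = 1 - math.exp(-LAM * (10 / 60))

# Simulation: draw inter-arrival gaps, keep those longer than 15 min,
# and count how often the gap ends within the following 10 min.
rng = random.Random(2)
gaps = [rng.expovariate(LAM) for _ in range(200_000)]  # gaps in hours
survivors = [t for t in gaps if t > 15 / 60]
estimate = sum(t <= 25 / 60 for t in survivors) / len(survivors)

mean_gap_min = 60 * sum(gaps) / len(gaps)
print(f"mean gap: {mean_gap_min:.2f} min")
print(f"closed form P = {exact:.4f}, simulated P = {estimate:.4f}")
```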
13 The “inspection paradox”
Buses arrive according to a Poisson process with rate \(\lambda = 6\) per hour (i.e., one every 10 minutes on average). You arrive at the bus stop at a uniformly random time.
- What is the distribution of time between consecutive buses? Compute its expected value.
- Intuitively, would you expect your average wait time to be 5 minutes (half the mean inter-arrival time)?
- The “inspection paradox” says you’re more likely to arrive during a long gap than a short one. Without computing, explain in 2–3 sentences why your expected wait might actually be longer than 5 minutes.
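To see the paradox numerically, a simulation helps: generate a long Poisson process of bus arrivals, drop passengers in at uniformly random times, and compare their average wait with the average gap between buses. A Python sketch under those assumptions (window length, seed, and sample sizes are arbitrary):

```python
import bisect
import random

rng = random.Random(3)
RATE = 6 / 60         # buses per minute (6 per hour)
HORIZON = 1_000_000   # minutes of simulated bus arrivals

# Poisson process: arrival times are cumulative sums of Exp(RATE) gaps.
arrivals = []
t = 0.0
while t < HORIZON:
    t += rng.expovariate(RATE)
    arrivals.append(t)

gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
mean_gap = sum(gaps) / len(gaps)

# Passengers arrive at uniformly random times and wait for the next bus.
waits = []
for _ in range(100_000):
    u = rng.uniform(0, HORIZON - 100)  # stay clear of the end of the window
    next_bus = arrivals[bisect.bisect_right(arrivals, u)]
    waits.append(next_bus - u)
mean_wait = sum(waits) / len(waits)

print(f"mean gap between buses: {mean_gap:.2f} min")
print(f"mean wait for a random arrival: {mean_wait:.2f} min")
```

For a Poisson process the simulated wait should land near the full mean gap rather than half of it, consistent with memorylessness.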
Applications and Critical Thinking
14 The prosecutor’s fallacy: Conditional thinking
In a city of 1 million people, a crime is committed. The suspect’s DNA matches the crime-scene sample, and the random-match probability is 1 in 10,000 (i.e., a randomly chosen person matches with probability 0.0001).
- Model the number of matching individuals in the city as a random variable. What distribution is appropriate? What is its expected value?
- The prosecutor argues: “The probability of a false match is 0.0001, so the defendant is 99.99% certain to be guilty.” Is this reasoning correct?
- If we assume the guilty person is definitely in the city, use Bayes-like reasoning to argue that the suspect’s probability of guilt depends on the expected number of matches.
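A Python sketch of the Bayes-style calculation, under stated assumptions: the guilty person is in the city and matches for sure, the suspect was singled out only because of the match (no other evidence), and every matching person is then equally likely to be the culprit. Under those assumptions the number \(K\) of innocent matches is approximately Poisson, and \(P(\text{guilty}) = E[1/(1+K)]\).

```python
import math

N, P_MATCH = 1_000_000, 0.0001

# Matches among N people: Bin(N, p), approximately Poisson(N p).
expected_matches = N * P_MATCH

# Assumption: the culprit is in the city; the other N - 1 people are innocent
# and each matches independently with probability P_MATCH.
lam_innocent = (N - 1) * P_MATCH

# With no other evidence, each of the 1 + K matching people is equally likely to be
# guilty, so P(guilty) = E[1 / (1 + K)] with K ~ Poisson(lam_innocent).
# For a Poisson K this expectation equals (1 - e^(-lambda)) / lambda.
p_guilty = (1 - math.exp(-lam_innocent)) / lam_innocent
rough = 1 / (1 + lam_innocent)  # plug-in approximation using only E[K]

print(f"expected matches in the city: {expected_matches:.0f}")
print(f"P(guilty | match, no other evidence): {p_guilty:.4f} (rough: {rough:.4f})")
```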
🎲 xx+37 (xx)
- ▶️ToDo
- 🔗Random link ToDo
- 🇦🇲🎶ToDo
- 🌐🎶ToDo
- 🤌Կարգին ToDo