19: Probability - Distributions


📚 Materials


Lecture

Recitation

Note to self

Poke around https://mathlets.org/mathlets/probability-distributions/ during the recitation.

Distribution Identification

01 Distribution detective: Which one fits?

Match each scenario to the most appropriate distribution. Justify each choice in one sentence.

    1. The number of typos on a randomly selected page of a 500-page book, if typos occur randomly at an average rate of 0.5 per page.
    1. Whether a randomly selected email is spam (yes/no), given 40% of emails are spam.
    1. The number of heads in 20 coin flips.
    1. The exact time (in minutes) you wait for the next bus, if buses arrive completely randomly at an average rate of 4 per hour.
    1. A randomly chosen real number between 0 and 10.
  • a) Poisson. Counts of rare independent events with a known average rate per fixed unit. Here \(\lambda = 0.5\) typos per page.
  • b) Bernoulli. A single yes/no trial. \(X = 1\) if spam, \(0\) otherwise, with \(p = 0.4\).
  • c) Binomial. A fixed number \(n = 20\) of independent Bernoulli trials with the same success probability \(p = 0.5\). \(X =\) number of heads.
  • d) Exponential. Continuous waiting time for the next event in a Poisson process. With \(\lambda = 4\) per hour, \(X \sim \text{Exp}(4)\) (in hours) or equivalently \(\text{Exp}(4/60)\) in minutes.
  • e) Uniform. A continuous variable equally likely to land anywhere in a fixed interval: \(X \sim U(0, 10)\).

Pattern to remember. Discrete count of rare events \(\to\) Poisson; fixed \(n\) trials \(\to\) Binomial; one trial \(\to\) Bernoulli; continuous waiting time \(\to\) Exponential; “no value more likely than another” on an interval \(\to\) Uniform.


02 Name that distribution

For each scenario, identify the distribution, state its parameter(s), and write the PMF or PDF.

    1. A call center receives calls at an average rate of 8 per hour. Let \(X\) be the number of calls received between 2:00 PM and 3:00 PM.
    1. A software update crashes with probability 0.03. An IT department pushes the update to 200 computers independently. Let \(Y\) be the number of computers that crash.
    1. A sensor measures temperature continuously, but due to manufacturing imprecision, the true reading is somewhere between 98.5°C and 101.5°C with no value more likely than another. Let \(T\) be the measured temperature.
    1. A quality inspector tests light bulbs one by one. Each bulb independently fails inspection with probability 0.15. Let \(N\) be the number of bulbs tested until the first failure.
    1. The time between earthquakes in a seismically active region averages 4 months. Let \(W\) be the waiting time (in months) until the next earthquake.

a) \(X \sim \text{Poisson}(\lambda = 8)\). PMF:

\[P(X = k) = \frac{e^{-8} 8^k}{k!}, \quad k = 0, 1, 2, \dots\]

b) \(Y \sim \text{Binomial}(n = 200, p = 0.03)\). PMF:

\[P(Y = k) = \binom{200}{k} (0.03)^k (0.97)^{200 - k}, \quad k = 0, 1, \dots, 200\]

c) \(T \sim \text{Uniform}(98.5, 101.5)\). PDF:

\[f_T(t) = \frac{1}{101.5 - 98.5} = \frac{1}{3}, \quad t \in [98.5, 101.5]\]

d) \(N \sim \text{Geometric}(p = 0.15)\) — number of bulbs tested until the first failure. PMF (using the “trials until first success” convention, where a “success” here is a failed inspection):

\[P(N = k) = (0.85)^{k - 1} \cdot 0.15, \quad k = 1, 2, 3, \dots\]

e) \(W \sim \text{Exponential}(\lambda = 1/4)\) months\(^{-1}\) (since \(\mathbb{E}[W] = 1/\lambda = 4\)). PDF:

\[f_W(w) = \tfrac{1}{4} e^{-w/4}, \quad w \geq 0\]

Common pitfall. For the Exponential, \(\lambda\) is a rate (events per unit time), not a mean. Mean is \(1/\lambda\). If a problem gives you the average waiting time, take its reciprocal.
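Quick numeric check. A minimal Python sketch of the rate-vs-mean pitfall, assuming `scipy` is available; note that `scipy.stats.expon` is parameterized by `scale` \(= 1/\lambda\), not by the rate itself.

```python
# Rate vs mean for the Exponential: scipy's expon takes scale = 1/lambda.
from scipy import stats

lam = 1 / 4                      # rate: one earthquake per 4 months on average
W = stats.expon(scale=1 / lam)   # scale = mean waiting time = 4 months

print(W.mean())   # 4.0 -> E[W] = 1/lambda
print(W.sf(4))    # P(W > 4) = e^{-1} ~ 0.368
```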


03 Mystery distributions: Identify from data

A researcher collects data from three different experiments and computes summary statistics:

Dataset A: \(n = 500\) observations, all values are either 0 or 1. Sample mean \(\approx 0.23\), sample variance \(\approx 0.177\).

Dataset B: \(n = 1000\) observations, values range from 0 to 47. Sample mean \(\approx 12.1\), sample variance \(\approx 11.8\).

Dataset C: \(n = 800\) observations, values are positive reals ranging from 0.001 to 14.2. Sample mean \(\approx 2.5\), sample variance \(\approx 6.3\).

For each dataset:

    1. Identify the most likely distribution family.
    1. Estimate the parameter(s) of that distribution from the summary statistics.
    1. For Dataset B, the researcher notices that these are counts of customer complaints per day at a call center. Does this context support your answer? What if instead they were counts of “successes” in 50 independent trials per observation?

The trick: distributions leave fingerprints in their mean–variance relationship.

| Distribution | Mean | Variance | Variance / Mean |
| --- | --- | --- | --- |
| Bernoulli\((p)\) | \(p\) | \(p(1-p)\) | \(1 - p\) |
| Binomial\((n, p)\) | \(np\) | \(np(1-p)\) | \(1 - p\) |
| Poisson\((\lambda)\) | \(\lambda\) | \(\lambda\) | \(1\) |
| Exponential\((\lambda)\) | \(1/\lambda\) | \(1/\lambda^2\) | \(1/\lambda\) (but \(\text{Var}/\text{Mean}^2 = 1\)) |

Dataset A — Bernoulli.

Every observation is \(0\) or \(1\), which rules everything else out. Estimate \(\hat p = \bar X = 0.23\).

Sanity check: \(\hat p (1 - \hat p) = 0.23 \cdot 0.77 = 0.1771\), matches the reported variance \(0.177\). Good.

Dataset B — Poisson.

Counts \(\{0, 1, 2, \dots, 47\}\), mean \(\approx\) variance (\(12.1\) vs \(11.8\)). The Poisson signature is precisely \(\text{mean} = \text{variance}\). Estimate \(\hat\lambda = 12.1\).

A Binomial would also have non-negative integer values, but its variance is \(np(1-p) < np\) — strictly smaller than the mean. The fact that variance \(\approx\) mean here, not \(\ll\) mean, points to Poisson.

Dataset C — Exponential.

Continuous, positive, right-skewed (range starts near 0, mean 2.5, max 14.2 — much further above the mean than below). The Exponential signature: \(\text{Var} = (\mathbb{E}[X])^2\). Check: \(2.5^2 = 6.25 \approx 6.3\). Estimate \(\hat\lambda = 1/\bar X = 1/2.5 = 0.4\).

c) Customer complaints per day: independent rare events accumulating at a roughly constant rate — exactly the Poisson regime. Context confirms.

If instead it were \(50\) independent trials per day, we’d want \(\text{Bin}(50, p)\) with \(\hat p = 12.1 / 50 = 0.242\). Then variance would be \(50 \cdot 0.242 \cdot 0.758 \approx 9.17\), noticeably less than the observed \(11.8\). So the Binomial interpretation is a worse fit — the data really does look Poisson, not Binomial-with-50-trials.

Why this matters in ML. When you fit a count model and the empirical variance is much larger than the mean, that’s “overdispersion” and a sign Poisson is the wrong model — typical fix is Negative Binomial. Mean-variance ratios are the first thing to check.
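To see the fingerprints directly, here is a small simulation sketch (Python with `numpy`; the datasets below are synthetic stand-ins generated with the estimated parameters, not the researcher’s data, and the seed is arbitrary):

```python
# Synthetic datasets with the same parameters as A, B, C: check var/mean fingerprints.
import numpy as np

rng = np.random.default_rng(0)
A = rng.binomial(1, 0.23, size=500)         # Bernoulli(0.23)
B = rng.poisson(12.1, size=1000)            # Poisson(12.1)
C = rng.exponential(scale=2.5, size=800)    # Exponential with mean 2.5

for name, x in [("A", A), ("B", B), ("C", C)]:
    m, v = x.mean(), x.var(ddof=1)
    print(f"{name}: mean={m:.2f} var={v:.2f} var/mean={v/m:.2f} var/mean^2={v/m**2:.2f}")
```

Expect var/mean \(\approx 1 - p\) for A, \(\approx 1\) for B, and var/mean\(^2 \approx 1\) for C.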


Discrete Distributions

04 The “obvious” Bernoulli that isn’t

A weighted die shows 6 with probability \(\frac{1}{3}\) and each of 1–5 with probability \(\frac{2}{15}\).

    1. Define a Bernoulli random variable \(X\) for “rolling a 6.” State \(p\) and compute \(E[X]\) and \(\text{Var}[X]\).
    1. Define a different Bernoulli random variable \(Y\) for “rolling an even number.” Compute \(E[Y]\) and \(\text{Var}[Y]\).
    1. For which event is the variance larger? Explain intuitively why maximum Bernoulli variance occurs at \(p = 0.5\).

a) \(X = \mathbb{1}\{\text{rolled a } 6\}\), so \(p = 1/3\).

\[\mathbb{E}[X] = p = \tfrac{1}{3}, \quad \text{Var}(X) = p(1 - p) = \tfrac{1}{3} \cdot \tfrac{2}{3} = \tfrac{2}{9}\]

b) \(Y = \mathbb{1}\{\text{rolled even}\}\). Even faces are \(\{2, 4, 6\}\):

\[P(Y = 1) = P(2) + P(4) + P(6) = \tfrac{2}{15} + \tfrac{2}{15} + \tfrac{1}{3} = \tfrac{2}{15} + \tfrac{2}{15} + \tfrac{5}{15} = \tfrac{9}{15} = \tfrac{3}{5}\]

So \(Y \sim \text{Bernoulli}(3/5)\):

\[\mathbb{E}[Y] = \tfrac{3}{5}, \quad \text{Var}(Y) = \tfrac{3}{5} \cdot \tfrac{2}{5} = \tfrac{6}{25} = 0.24\]

c) Compare: \(\text{Var}(X) = 2/9 \approx 0.222\) vs \(\text{Var}(Y) = 0.24\). \(Y\) has larger variance.

The function \(p(1-p)\) on \([0, 1]\) is a downward parabola, maximized at \(p = 0.5\) with value \(0.25\). Since \(p_Y = 0.6\) is closer to \(0.5\) than \(p_X = 1/3\), \(Y\)’s variance is closer to the maximum.

Intuition. A Bernoulli’s variance measures uncertainty about which outcome will happen. If \(p = 0.99\), you’re almost sure it’ll be \(1\) — low uncertainty, low variance. Same for \(p = 0.01\). Maximum confusion is at \(p = 0.5\), where the two outcomes are equally likely.
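The parabola in a few lines of Python (a throwaway sketch):

```python
# Bernoulli variance p(1-p): maximized at p = 0.5.
for p in [0.01, 1 / 3, 0.5, 0.6, 0.99]:
    print(f"p={p:.2f}  var={p * (1 - p):.4f}")
# p=0.33 -> 0.2222, p=0.50 -> 0.2500, p=0.60 -> 0.2400; near 0 or 1 -> ~0
```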


05 Memoryless waiting: Geometric intuition

A slot machine pays out with probability \(p = 0.05\) on each play.

    1. What is the expected number of plays until the first payout?
    1. You’ve already played 50 times with no payout. What is the expected additional number of plays until you win?
    1. A gambler says: “I’m due for a win soon because I’ve lost so many times.” In 2–3 sentences, explain why this reasoning is flawed.

Let \(X\) = number of plays until the first payout. \(X \sim \text{Geometric}(p = 0.05)\).

a) \(\mathbb{E}[X] = 1/p = 1/0.05 = 20\) plays.

b) Still 20 plays. The Geometric distribution is memoryless:

\[P(X > 50 + k \mid X > 50) = P(X > k)\]

Conditional on having lost the first \(50\) plays, the number of additional plays needed is again Geometric\((0.05)\), with the same expected value \(20\).

c) Gambler’s fallacy. The slot machine has no memory — each play is an independent Bernoulli trial with probability \(p = 0.05\), regardless of what came before. The probability of winning on play \(51\) is still \(0.05\), exactly as it was on play \(1\). “Being due” would require the past to push future outcomes, which independence directly forbids. (We’ll see the same reasoning error appear as the prosecutor’s fallacy in Problem 14.)
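A Monte Carlo sketch of (b), in Python with `numpy` (seed arbitrary):

```python
# Memorylessness check: after 50 losses, expected *additional* plays is still ~20.
import numpy as np

rng = np.random.default_rng(1)
plays = rng.geometric(0.05, size=1_000_000)   # plays until first payout, Geometric(0.05)

print(plays.mean())              # ~20 = 1/p
losers = plays[plays > 50]       # runs that lost all of the first 50 plays
print((losers - 50).mean())      # ~20 again: the past 50 losses don't help
```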


06 Binomial: Quality control decision

A factory produces chips with defect probability \(p = 0.02\). A batch of \(n = 100\) chips is inspected.

    1. Let \(X\) be the number of defective chips. State the distribution of \(X\) and compute \(E[X]\) and \(\text{Var}[X]\).
    1. The batch is rejected if more than 5 chips are defective. Without computing \(P[X > 5]\) exactly, explain why \(P[X > 5]\) is small.
    1. If \(p\) increases to \(0.10\), recompute \(E[X]\). How does this change the rejection decision intuitively?

a) \(X \sim \text{Binomial}(n = 100, p = 0.02)\).

\[\mathbb{E}[X] = np = 2, \quad \text{Var}(X) = np(1 - p) = 100 \cdot 0.02 \cdot 0.98 = 1.96\]

So \(\text{SD}(X) \approx 1.4\).

b) Rejection threshold \(X > 5\) is more than \(\frac{5 - 2}{1.4} \approx 2.14\) standard deviations above the mean. By Chebyshev (HW 17), \(P(|X - 2| \geq 3) \leq 1.96/9 \approx 0.22\). Even a loose bound says rejection is uncommon, and the true probability is much smaller (around \(0.016\)).

c) With \(p = 0.10\): \(\mathbb{E}[X] = 100 \cdot 0.10 = 10\). Now the expected number of defectives is already double the rejection threshold. So rejection becomes the rule, not the exception. A \(5\times\) jump in defect rate moves the batch from “almost always pass” to “almost always fail” — quality control is sensitive precisely because the threshold sits in the right tail of the distribution.
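The exact tail probabilities behind (b) and (c), via `scipy` (a quick check, not required for the argument):

```python
# P(X > 5) under the two defect rates.
from scipy import stats

print(stats.binom.sf(5, 100, 0.02))   # ~0.0155: rejection is rare at p = 0.02
print(stats.binom.sf(5, 100, 0.10))   # ~0.942:  rejection is the rule at p = 0.10
```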


07 Poisson: Rare events approximation

A website has 10,000 visitors per day. Each visitor independently has a \(0.0003\) probability of reporting a bug.

    1. Let \(X\) be the number of bug reports per day. Which distribution is a good approximation here, and what is the parameter?
    1. What is the probability of receiving at least one bug report?

a) The exact distribution is \(X \sim \text{Binomial}(n = 10000, p = 0.0003)\). But \(n\) is large and \(p\) is small with \(np = 3\) moderate — that’s the Poisson regime:

\[X \approx \text{Poisson}(\lambda = np = 3)\]

The Poisson approximation works because \(\binom{n}{k} p^k (1-p)^{n-k} \to \frac{e^{-\lambda} \lambda^k}{k!}\) as \(n \to \infty\) with \(\lambda = np\) held fixed. In practice, this is excellent whenever \(n \geq 20\) and \(p \leq 0.05\) or so.

b)

\[P(X \geq 1) = 1 - P(X = 0) = 1 - e^{-3} \approx 1 - 0.0498 \approx 0.9502\]

So about a \(95\%\) chance of at least one bug report on any given day.

Why use Poisson at all? The Binomial PMF involves \(\binom{10000}{k}\), which is unwieldy for hand calculation. Poisson collapses everything into one parameter \(\lambda\) and a clean exponential — much easier to reason about, with negligible error in this regime.
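And numerically the error really is negligible, as a `scipy` sketch shows:

```python
# Exact Binomial vs Poisson approximation for P(at least one bug report).
from scipy import stats

n, p = 10_000, 0.0003
print(stats.binom.sf(0, n, p))       # exact: 1 - (1-p)^n  ~ 0.95024
print(stats.poisson.sf(0, n * p))    # approx: 1 - e^{-3}  ~ 0.95021
```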


Continuous Distributions

08 Exponential: Memoryless lifetimes

A light bulb’s lifetime (in years) follows \(\text{Exp}(\lambda = 0.5)\).

    1. Compute \(E[X]\) and the probability that the bulb lasts more than 3 years.
    1. Given that the bulb has already lasted 2 years, what is the probability it lasts at least 1 more year?
    1. Compare with the discrete case: if bulb failure each year is Bernoulli with \(p = 0.4\), and \(Y \sim \text{Geo}(0.4)\) counts years until failure, compute \(P[Y > 3 \mid Y > 2]\) and \(P[Y > 1]\). What do you notice?

a) \(X \sim \text{Exp}(0.5)\):

\[\mathbb{E}[X] = \tfrac{1}{\lambda} = 2 \text{ years}\]

For \(X \sim \text{Exp}(\lambda)\), \(P(X > t) = e^{-\lambda t}\):

\[P(X > 3) = e^{-0.5 \cdot 3} = e^{-1.5} \approx 0.2231\]

b) Memorylessness:

\[P(X > 2 + 1 \mid X > 2) = P(X > 1) = e^{-0.5} \approx 0.6065\]

The bulb’s “remaining” lifetime is a fresh Exp\((0.5)\) regardless of how long it’s already lasted.

c) With \(Y \sim \text{Geo}(0.4)\), \(P(Y > k) = (1 - p)^k = 0.6^k\) (probability of \(k\) consecutive non-failures).

\[P(Y > 3 \mid Y > 2) = \frac{P(Y > 3)}{P(Y > 2)} = \frac{0.6^3}{0.6^2} = 0.6\]

\[P(Y > 1) = 0.6\]

They’re equal. The Geometric distribution is also memoryless — and in fact it’s the only discrete distribution on \(\{1, 2, \dots\}\) with this property, just as Exponential is the only continuous distribution on \([0, \infty)\) with it. Memorylessness is the discrete-continuous bridge between Geometric and Exponential, and it’s the same property that fueled the gambler’s fallacy in Problem 05.
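Both memorylessness identities, checked numerically (Python with `scipy`; recall `expon` uses `scale` \(= 1/\lambda\)):

```python
# P(X > 3 | X > 2) = P(X > 1) for Exp(0.5); same pattern for Geometric(0.4).
from scipy import stats

X = stats.expon(scale=1 / 0.5)
print(X.sf(3) / X.sf(2), X.sf(1))    # both e^{-0.5} ~ 0.6065

Y = stats.geom(0.4)                  # support {1, 2, ...}, P(Y > k) = 0.6^k
print(Y.sf(3) / Y.sf(2), Y.sf(1))    # both 0.6
```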


09 Uniform: The broken stick problem

A stick of length 1 is broken at a uniformly random point \(X \sim U(0, 1)\).

    1. What is the expected length of the left piece?
    1. Let \(Y = X(1 - X)\) be the product of the two piece lengths. Compute \(E[Y]\).
    1. What break point \(x\) maximizes \(Y = x(1-x)\)? Compare this to \(E[X]\).

a) Left piece has length \(X \sim U(0, 1)\):

\[\mathbb{E}[X] = \tfrac{0 + 1}{2} = \tfrac{1}{2}\]

b) Using LOTUS (HW 17) with \(f_X(x) = 1\) on \([0, 1]\):

\[\mathbb{E}[Y] = \mathbb{E}[X(1 - X)] = \mathbb{E}[X] - \mathbb{E}[X^2]\]

For \(X \sim U(0, 1)\): \(\mathbb{E}[X] = 1/2\) and \(\mathbb{E}[X^2] = \int_0^1 x^2\,dx = 1/3\).

\[\mathbb{E}[Y] = \tfrac{1}{2} - \tfrac{1}{3} = \tfrac{1}{6}\]

c) \(\frac{d}{dx} x(1 - x) = 1 - 2x = 0 \implies x^* = 1/2\), with \(Y^* = 1/4\).

So the break point that maximizes the product is exactly \(\mathbb{E}[X] = 1/2\), yet \(\mathbb{E}[Y] = 1/6 < 1/4 = g(\mathbb{E}[X])\), where \(g(x) = x(1-x)\).

Jensen’s inequality. For the concave function \(g(x) = x(1-x)\):

\[\mathbb{E}[g(X)] \leq g(\mathbb{E}[X])\]

That’s exactly \(1/6 \leq 1/4\). The “expected outcome of a function” is generally not the “function of the expected outcome” — a recurring trap when people compute \(f(\bar X)\) thinking it equals \(\overline{f(X)}\).
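A Monte Carlo illustration of the gap (Python with `numpy`, seed arbitrary):

```python
# Jensen for g(x) = x(1-x): E[g(X)] vs g(E[X]) under X ~ U(0,1).
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=1_000_000)

print((x * (1 - x)).mean())         # ~1/6 ~ 0.1667 = E[g(X)]
print(x.mean() * (1 - x.mean()))    # ~1/4 = 0.25   = g(E[X])
```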


10 Normal: The 68-95-99.7 rule in action

Human heights in a population follow \(N(170, 100)\) (mean 170 cm, variance 100 cm²).

    1. What is \(\sigma\)? What proportion of people are between 160 cm and 180 cm tall?
    1. A person is 2.5 standard deviations above the mean. How tall are they?
    1. Standardize the height \(X = 155\) cm. Interpret the z-score: is this person unusually short?

\(X \sim N(170, 100)\) means \(\mu = 170\), \(\sigma^2 = 100\), so \(\sigma = 10\) cm.

a) \(\sigma = 10\) cm. The interval \([160, 180]\) is exactly \([\mu - \sigma, \mu + \sigma]\), so by the 68-95-99.7 rule about \(68\%\) of people are between 160 and 180 cm tall.

b) Height \(= \mu + 2.5\sigma = 170 + 25 = 195\) cm.

c)

\[z = \frac{X - \mu}{\sigma} = \frac{155 - 170}{10} = -1.5\]

This person is \(1.5\) standard deviations below the mean. About \(93\%\) of the population is taller, and about \(7\%\) is shorter (from \(P(Z < -1.5) \approx 0.067\)). Short, but not extreme — you’d see plenty of people this height in a crowd.

Why standardize? The z-score strips off units and the choice of mean/scale, so heights, test scores, and salaries all live on the same ruler. “\(2\) standard deviations below” means the same thing everywhere.
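The same numbers from the Normal CDF directly (a `scipy` sketch):

```python
# (a) and (c) without the 68-95-99.7 shortcut.
from scipy import stats

mu, sigma = 170, 10
print(stats.norm.cdf(180, mu, sigma) - stats.norm.cdf(160, mu, sigma))  # ~0.6827
print(stats.norm.cdf(-1.5))   # ~0.0668: fraction shorter than 155 cm
```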


11 Normal: Standardization and comparison

Test A has scores \(\sim N(500, 10000)\) (so \(\sigma = 100\)). Test B has scores \(\sim N(50, 100)\) (so \(\sigma = 10\)).

    1. Alice scores 680 on Test A. Bob scores 72 on Test B. Compute both z-scores.
    1. Who performed better relative to their test population?
    1. Explain why comparing raw scores (680 vs 72) is meaningless without standardization.
    1. You’re given a coin that shows heads with unknown probability \(p\). You flip it 100 times and observe 65 heads. If the coin were fair (\(p = 0.5\)), what are \(E[X]\) and \(\text{SD}[X]\) for the number of heads? How many standard deviations away from the mean is 65? What can you conclude about whether \(p = 0.5\)?

a) Test A: \(z_A = \frac{680 - 500}{100} = 1.8\). Test B: \(z_B = \frac{72 - 50}{10} = 2.2\).

b) Bob. A higher z-score means he’s further into the right tail of his test’s distribution. About \(\Phi(1.8) \approx 96.4\%\) of test-A takers scored below Alice; about \(\Phi(2.2) \approx 98.6\%\) of test-B takers scored below Bob.

c) Tests A and B use entirely different scales (\(\mu_A = 500\) vs \(\mu_B = 50\)). \(680 > 72\) tells you nothing about relative performance — it’s like comparing temperatures in Celsius vs Fahrenheit. Standardization removes the units and makes the comparison meaningful.

d) If \(p = 0.5\) and \(X \sim \text{Binomial}(100, 0.5)\):

\[\mathbb{E}[X] = np = 50, \quad \text{Var}(X) = np(1-p) = 25, \quad \text{SD}(X) = 5\]

Distance of \(65\) from the mean:

\[z = \frac{65 - 50}{5} = 3\]

\(65\) is \(3\) standard deviations above the mean. By the 68-95-99.7 rule, fewer than \(0.3\%\) of outcomes fall this far from the mean in either direction; the exact upper-tail probability \(P(X \geq 65)\) is about \(0.18\%\). Either we just witnessed a roughly \(1\)-in-\(500\) event, or the coin isn’t fair. Strong evidence to reject \(p = 0.5\).
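The exact tail, if you want it (a `scipy` check of the back-of-envelope number):

```python
# How surprising is 65+ heads in 100 flips of a fair coin?
from scipy import stats

print(stats.binom.sf(64, 100, 0.5))   # P(X >= 65) ~ 0.0018, exact Binomial tail
print(stats.norm.sf(3))               # ~0.0013, the 3-sigma Normal tail, for comparison
```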

That’s hypothesis testing in miniature. Set up a null model (\(p = 0.5\)), compute how surprising the data would be under it, and reject the null when the data sits deep in the tail. We’ll formalize this later under significance levels and p-values.


Connections Between Distributions

12 The Poisson-Exponential connection

Customers arrive at a shop according to a Poisson process with rate \(\lambda = 4\) per hour.

    1. What distribution does the number of arrivals in 1 hour follow? State its mean and variance.
    1. What distribution does the time between consecutive arrivals follow? State its mean.
    1. If no customer has arrived in the last 15 minutes, what is the probability that the next customer arrives within 10 minutes?

a) Number of arrivals in 1 hour: \(N \sim \text{Poisson}(\lambda = 4)\). \(\mathbb{E}[N] = \text{Var}(N) = 4\).

b) Inter-arrival times: \(T \sim \text{Exponential}(\lambda = 4)\), with the rate in events per hour, so \(\mathbb{E}[T] = 1/4\) hour \(= 15\) minutes.

c) By memorylessness, the previous 15 minutes of waiting are irrelevant. Convert \(\lambda\) to “per minute”: \(\lambda = 4/60 = 1/15\).

\[P(T < 10) = 1 - e^{-(1/15) \cdot 10} = 1 - e^{-2/3} \approx 1 - 0.5134 \approx 0.4866\]

About a \(48.7\%\) chance.

Two views, one process. A Poisson process is described equivalently by counting events (“how many in a fixed window?”, Poisson) or by spacing them out (“how long until the next one?”, Exponential). The parameter \(\lambda\) is shared: events per unit time. This duality is what makes Poisson processes such a clean modeling tool — for arrivals, decay, server requests, neuron spikes.
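The duality is easy to see in simulation: generate Exponential gaps, then count arrivals per hour (a `numpy` sketch, seed arbitrary):

```python
# One process, two views: Exp(4) gaps produce Poisson(4) hourly counts.
import numpy as np

rng = np.random.default_rng(3)
gaps = rng.exponential(scale=1 / 4, size=500_000)       # hours between arrivals
times = np.cumsum(gaps)                                  # arrival times

counts = np.bincount(np.floor(times).astype(int))[:-1]  # arrivals per full hour
print(counts.mean(), counts.var())                       # both ~4: the Poisson signature
```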


13 The “inspection paradox”

Buses arrive according to a Poisson process with rate \(\lambda = 6\) per hour (i.e., one every 10 minutes on average). You arrive at the bus stop at a uniformly random time.

    1. What is the distribution of time between consecutive buses? Compute its expected value.
    1. Intuitively, would you expect your average wait time to be 5 minutes (half the inter-arrival time)?
    1. The “inspection paradox” says you’re more likely to arrive during a long gap than a short one. Without computing, explain in 2–3 sentences why your expected wait might actually be longer than 5 minutes.

a) Inter-arrival times for a Poisson process with rate \(\lambda = 6\) per hour are Exponential\((6)\):

\[\mathbb{E}[T] = \tfrac{1}{6} \text{ hour} = 10 \text{ minutes}\]

b) The naive answer is yes, 5 minutes — if buses come every 10 minutes on average and you show up randomly, surely you’d wait half a gap on average. But this intuition is wrong.

c) The inspection paradox.

If you arrive at a uniformly random moment, you’re not sampling a uniformly random gap — you’re sampling a time point, and longer gaps cover more of the timeline. A 20-minute gap is twice as likely to contain your arrival as a 10-minute gap. So the gap you land in is size-biased toward long gaps, not a typical gap.

In fact, for a Poisson process, you can show the expected length of the gap you land in is \(2/\lambda = 20\) minutes, and (by memorylessness of the Exponential!) your expected wait from arrival to the next bus is the full mean inter-arrival time — \(\mathbb{E}[\text{wait}] = 1/\lambda = 10\) minutes, not \(5\).
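A simulation sketch of both claims (Python with `numpy`, working in minutes; seed arbitrary):

```python
# Land at a random time: the surrounding gap averages ~20 min, the wait ~10 min.
import numpy as np

rng = np.random.default_rng(4)
gaps = rng.exponential(scale=10, size=1_000_000)   # minutes between buses
buses = np.cumsum(gaps)                            # bus arrival times

t = rng.uniform(0, buses[-2], size=100_000)        # uniformly random arrival moments
nxt = np.searchsorted(buses, t)                    # index of the next bus after t
print(gaps[nxt].mean())                            # ~20: size-biased gap length
print((buses[nxt] - t).mean())                     # ~10: expected wait, not 5
```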

Where this matters.

  • Survey design. If you ask shoppers “how long is your visit?”, you oversample long visits — a person browsing for an hour is six times more likely to be in your sample than someone making a \(10\)-minute trip.
  • Hospital length-of-stay. A “snapshot” sample of currently admitted patients overrepresents long-stayers, so the average stay you observe is longer than the average over all admissions.
  • Friendship paradox. “Your friends have more friends than you do, on average” is the same length-bias applied to social graphs.

The lesson: sampling matters. Sampling at random times (or via random people, in the friendship case) is not the same as sampling random gaps (or random people uniformly over the population). Mismatch this and you get a systematically biased estimate.


Applications and Critical Thinking

14 The prosecutor’s fallacy: Conditional thinking

In a city of 1 million people, a crime is committed. DNA evidence matches the suspect with a 1-in-10,000 error rate (i.e., a random person matches with probability 0.0001).

    1. Model the number of matching individuals in the city as a random variable. What distribution is appropriate? What is its expected value?
    1. The prosecutor argues: “The probability of a false match is 0.0001, so the defendant is 99.99% certain to be guilty.” Is this reasoning correct?
    1. If we assume the guilty person is definitely in the city, use Bayes-like reasoning to argue that the suspect’s probability of guilt depends on the expected number of matches.

a) Each of \(N = 10^6\) people independently matches with probability \(p = 10^{-4}\). Number of matches \(M \sim \text{Binomial}(10^6, 10^{-4})\). Since \(n\) is huge and \(p\) is tiny with \(np = 100\), this is well-approximated by

\[M \approx \text{Poisson}(\lambda = 100), \quad \mathbb{E}[M] = 100\]

So in expectation, about 100 people in the city match the DNA profile, of whom only one is the actual criminal.

b) No — that is the prosecutor’s fallacy.

The prosecutor confuses two very different conditional probabilities:

  • \(P(\text{match} \mid \text{innocent}) = 0.0001\) — the false-match rate, which is given.
  • \(P(\text{innocent} \mid \text{match}) = ?\) — what the jury actually wants to know.

These are not the same. Reversing them is exactly the same logical error as confusing \(P(A \mid B)\) with \(P(B \mid A)\).

c) Bayesian breakdown.

Assume the true criminal is in the city. Before the DNA test, the suspect is just one of \(N = 10^6\) people, so

\[P(\text{guilty}) = \frac{1}{10^6}\]

After observing the match, condition on the event “this particular person matched”:

\[P(\text{guilty} \mid \text{match}) = \frac{P(\text{match} \mid \text{guilty}) \cdot P(\text{guilty})}{P(\text{match})}\]

Assume \(P(\text{match} \mid \text{guilty}) = 1\). The denominator \(P(\text{match})\) for a uniformly random person is essentially \(1/N + p \approx p = 10^{-4}\) (the prior chance they’re guilty plus the false-match rate). So

\[P(\text{guilty} \mid \text{match}) \approx \frac{1 \cdot 10^{-6}}{10^{-4}} = 10^{-2} = 1\%\]

A different way to see the same number: among the \(\approx 100\) matching people, exactly one is guilty (the criminal), so the chance any given match is the criminal is \(1/100 = 1\%\).
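The same 1% computed straight from Bayes’ rule (plain Python):

```python
# Posterior P(guilty | match) with N = 10^6 people and false-match rate 1e-4.
N, p = 1_000_000, 1e-4

prior = 1 / N                              # one person out of the whole city
evidence = prior * 1 + (1 - prior) * p     # P(match) = guilty-and-match + innocent-and-match
print(prior / evidence)                    # ~0.0099: about 1%, not 99.99%
```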

The prosecutor’s “\(99.99\%\) certain” is off by a factor of about \(100\). The DNA evidence didn’t identify the criminal; it narrowed the suspect pool from \(10^6\) down to about \(100\). To single out one person, you need additional independent evidence (location, motive, witnesses).

The general lesson. When the base rate is tiny (most people are innocent), even a very accurate test produces mostly false positives in absolute numbers. Posterior probabilities can be wildly different from \(1 -\) (test error rate). This is the same trap doctors fall into when interpreting medical screening results, and the same idea behind the gambler’s fallacy in Problem 05 — confusing what the data evidences with what the data implies. Bayes’ rule is the antidote.

Flag Counter