22: Statistics — Estimator Properties, Fisher Info, Cramér-Rao

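📚 Materials

⚠️ Note

YouTube links in this section were auto-extracted. If you spot a mistake, please let me know!

Lecture

📺 Lecture: Estimators: Bias, MSE, Bias-Variance tradeoff (https://youtu.be/lAKPMjqQ6vc)
📺 Lecture: Estimators: Consistency, sufficiency (https://youtu.be/Ye0ZsTDnPx4)
📺 Lecture: Fisher information, Cramér-Rao, Minimax (https://youtu.be/DDdzHnQsyrA)
🎞️ Slides: 03 stat (Lectures/stat/03_stat.pdf), 📝 Notes (Lectures/stat/03_stat_notes.pdf)
🎞️ Slides: 04 stat (Lectures/stat/04_stat.pdf), 📝 Notes (Lectures/stat/04_stat_notes.pdf)

Practical

📺 Practical: Minimal sufficiency, Fisher-Neyman theorem (https://youtu.be/fxYxoHIz-P0)
🛠️🗂️ Practical session PDF (Homeworks/hw_14_stat_1_estimator_properties.pdf)

🏡 Homework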

1) Exponential Family & Sufficiency

01 Poisson Meets the Exponential Family

Let $X$ be a random variable following a Poisson distribution with parameter $\lambda > 0$, i.e., $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \ldots$

a) Show that the Poisson distribution belongs to the exponential family by writing its PMF in the form \[f(x \mid \lambda) = h(x) \exp\!\big(\eta(\lambda)\, T(x) - A(\lambda)\big).\] Identify $h(x)$, $\eta(\lambda)$, $T(x)$, and $A(\lambda)$.
b) Using the exponential family form, what is the sufficient statistic for $\lambda$ based on an i.i.d. sample $X_1, \ldots, X_n$?

Solution

a) Rewrite the PMF by exponentiating the log:

\[P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} = \frac{1}{k!} \exp\!\big(k \log \lambda - \lambda\big).\]

Matching $h(x) \exp(\eta(\lambda) T(x) - A(\lambda))$:

\[h(x) = \frac{1}{x!}, \qquad \eta(\lambda) = \log \lambda, \qquad T(x) = x, \qquad A(\lambda) = \lambda.\]

So Poisson is a one-parameter exponential family with natural parameter $\log \lambda$.

b) For an i.i.d. sample, the joint PMF factors as

\[\prod_{i=1}^n \frac{1}{x_i!} \exp\!\Big(\log \lambda \cdot \sum_i x_i - n\lambda\Big).\]

By the Fisher-Neyman factorization theorem, the sufficient statistic is $\boxed{T(\mathbf{X}) = \sum_{i=1}^n X_i}$.
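The two parts above can be checked numerically. The following is a minimal sketch, not part of the original solution: the function names and the sample values `x`, `y`, `z` are made up for illustration. It confirms that the exponential-family form reproduces the Poisson PMF, and that the likelihood ratio of two samples does not change with $\lambda$ when their sums agree.

```python
import math

def poisson_pmf(k, lam):
    """Poisson PMF written directly: lam^k * e^(-lam) / k!."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def poisson_expfam(k, lam):
    """Same PMF in the form h(x) * exp(eta(lam) * T(x) - A(lam))."""
    h = 1.0 / math.factorial(k)   # h(x) = 1/x!
    eta = math.log(lam)           # natural parameter log(lam)
    t = k                         # T(x) = x
    a = lam                       # A(lam) = lam
    return h * math.exp(eta * t - a)

def joint_pmf(sample, lam):
    """Joint PMF of an i.i.d. Poisson sample."""
    return math.prod(poisson_pmf(k, lam) for k in sample)

def likelihood_ratio(x, y, lam):
    return joint_pmf(x, lam) / joint_pmf(y, lam)

# Illustrative samples (assumed values): x and y share the same sum, z does not.
x = [2, 0, 5, 1]   # sum = 8
y = [3, 3, 1, 1]   # sum = 8
z = [1, 1, 1, 1]   # sum = 4

lams = (0.5, 1.0, 3.0)
ratios_same_sum = [likelihood_ratio(x, y, lam) for lam in lams]
ratios_diff_sum = [likelihood_ratio(x, z, lam) for lam in lams]
print(ratios_same_sum)   # constant across lam
print(ratios_diff_sum)   # changes with lam
```

When the sums agree, the factor $\lambda^{\sum x_i - \sum y_i}$ cancels and only the ratio of the $h$ terms remains, which is why the first list is constant.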

02 Slit Width Estimation

In an experiment, $n$ drops of solution are released uniformly through a slit onto a surface. We model the one-dimensional impact points $X_1, \ldots, X_n$ as i.i.d. $\mathrm{Uniform}(0, d)$, where the unknown slit width $d > 0$ is to be estimated.

a) Write down the joint density $f(\mathbf{x} \mid d)$ for the sample.
b) Using the Fisher–Neyman factorization theorem, show that $X_{(n)} = \max\{X_1, \ldots, X_n\}$ is sufficient for $d$.
c) Is $X_{(n)}$ unbiased for $d$? If not, find an unbiased estimator based on $X_{(n)}$.

Hint for (c): the CDF of $X_{(n)}$ is $F_{X_{(n)}}(x) = (x/d)^n$ for $0 \le x \le d$.

Solution

a) Each $X_i$ has density $\frac{1}{d} \mathbf{1}\{0 \le x_i \le d\}$, so

\[f(\mathbf{x} \mid d) = d^{-n} \prod_{i=1}^n \mathbf{1}\{0 \le x_i \le d\} = d^{-n} \cdot \mathbf{1}\{x_{(1)} \ge 0\} \cdot \mathbf{1}\{x_{(n)} \le d\}.\]

b) Split the joint density into a $d$-dependent piece and a data-only piece:

\[f(\mathbf{x} \mid d) = \underbrace{d^{-n} \mathbf{1}\{x_{(n)} \le d\}}_{g(T(\mathbf{x}), d)} \cdot \underbrace{\mathbf{1}\{x_{(1)} \ge 0\}}_{h(\mathbf{x})}.\]

The $d$-dependent factor depends on the data only through $T(\mathbf{x}) = x_{(n)}$. By the Fisher-Neyman factorization theorem, $X_{(n)}$ is sufficient for $d$.

c) From the hint, $f_{X_{(n)}}(x) = n x^{n-1}/d^n$ for $0 \le x \le d$. Then

\[\mathbb{E}[X_{(n)}] = \int_0^d x \cdot \frac{n x^{n-1}}{d^n} \, dx = \frac{n}{d^n} \cdot \frac{d^{n+1}}{n+1} = \frac{n}{n+1} d.\]

So $X_{(n)}$ is biased (it underestimates $d$). The unbiased estimator is

\[\boxed{\hat{d} = \frac{n+1}{n} X_{(n)}}.\]
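A quick simulation can make the bias in part (c) visible. This is only a sketch under assumed values ($d = 2$, $n = 5$, a fixed seed); the helper `simulate_max_estimators` is hypothetical and not from the course material.

```python
import random

def simulate_max_estimators(d, n, reps, seed=0):
    """Average X_(n) and the corrected (n+1)/n * X_(n) over many simulated samples."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    total_max = 0.0
    total_corrected = 0.0
    for _ in range(reps):
        x_max = max(rng.uniform(0.0, d) for _ in range(n))
        total_max += x_max
        total_corrected += (n + 1) / n * x_max
    return total_max / reps, total_corrected / reps

d, n = 2.0, 5                          # assumed slit width and sample size
mean_max, mean_corrected = simulate_max_estimators(d, n, reps=100_000)

print("average of X_(n):       ", mean_max, " theory:", n / (n + 1) * d)
print("average of corrected est:", mean_corrected, " theory:", d)
```

The first average should sit near $\frac{n}{n+1} d$, below the true width, while the rescaled estimator should average close to $d$.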

03 Normal Variance: Minimal Sufficiency

Let $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$ where $\sigma^2 > 0$ is unknown but $\mu$ is known.

a) Show that $T(\mathbf{X}) = \sum_{i=1}^{n}(X_i - \mu)^2$ is sufficient for $\sigma^2$ using the factorization theorem.
b) Using the likelihood ratio criterion, show that $T(\mathbf{X})$ is minimal sufficient for $\sigma^2$.

Recall: $T$ is minimal sufficient if, for all sample points $\mathbf{x}$ and $\mathbf{y}$, $T(\mathbf{x}) = T(\mathbf{y})$ $\Longleftrightarrow$ $\frac{f(\mathbf{x} \mid \sigma^2)}{f(\mathbf{y} \mid \sigma^2)}$ is free of $\sigma^2$.

Solution

a) The joint density is

\[f(\mathbf{x} \mid \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2\right).\]

Write this as

\[f(\mathbf{x} \mid \sigma^2) = \underbrace{(2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{T(\mathbf{x})}{2\sigma^2}\right)}_{g(T(\mathbf{x}), \sigma^2)} \cdot \underbrace{1}_{h(\mathbf{x})}\]

with $T(\mathbf{x}) = \sum_i (x_i - \mu)^2$ (recall $\mu$ is known, so $T$ is a function of the data alone). By the Fisher-Neyman factorization theorem, $T(\mathbf{X})$ is sufficient for $\sigma^2$.

b) Form the likelihood ratio:

\[\frac{f(\mathbf{x} \mid \sigma^2)}{f(\mathbf{y} \mid \sigma^2)} = \exp\!\left(-\frac{T(\mathbf{x}) - T(\mathbf{y})}{2\sigma^2}\right).\]

This ratio is free of $\sigma^2$ if and only if $T(\mathbf{x}) - T(\mathbf{y}) = 0$, i.e., $T(\mathbf{x}) = T(\mathbf{y})$. By the ratio characterization of minimal sufficiency, $\boxed{T(\mathbf{X}) = \sum_i (X_i - \mu)^2}$ is minimal sufficient for $\sigma^2$.
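The ratio criterion in part (b) can be sketched numerically. The known mean `MU = 0` and the three samples below are assumptions chosen for illustration: `x` and `y` have the same value of $T$, while `z` does not.

```python
import math

MU = 0.0  # known mean (assumed value for this illustration)

def T(sample):
    """Sufficient statistic: sum of squared deviations from the known mean."""
    return sum((xi - MU) ** 2 for xi in sample)

def joint_density(sample, sigma2):
    """Joint N(MU, sigma2) density written through T, as in part (a)."""
    n = len(sample)
    return (2 * math.pi * sigma2) ** (-n / 2) * math.exp(-T(sample) / (2 * sigma2))

def ratio(x, y, sigma2):
    return joint_density(x, sigma2) / joint_density(y, sigma2)

x = [3.0, 4.0]   # T = 25
y = [5.0, 0.0]   # T = 25
z = [1.0, 1.0]   # T = 2

for sigma2 in (0.5, 1.0, 4.0):
    # ratio(x, y) stays at 1 for every sigma2; ratio(x, z) moves with sigma2
    print(sigma2, ratio(x, y, sigma2), ratio(x, z, sigma2))
```

Equal values of $T$ make the exponent vanish, so the first ratio is constant; unequal values leave a factor $\exp\!\big(-(T(\mathbf{x}) - T(\mathbf{z}))/(2\sigma^2)\big)$ that depends on $\sigma^2$.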

04 Binomial Sufficiency and Estimating $\pi^2$

Let $\mathbf{X} = (X_1, \ldots, X_n)^\top$ with $X_i \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(\pi)$, where $\pi \in (0, 1)$. Define $U(\mathbf{X}) = \sum_{i=1}^{n} X_i$.

a) Show that $U(\mathbf{X})/n$ is unbiased for $\pi$.
b) Show that $U(\mathbf{X})$ is minimal sufficient for $\pi$.
c) Now consider the estimator for $\pi^2$: \[V(\mathbf{X}) = \frac{U(\mathbf{X})\left[U(\mathbf{X}) - 1\right]}{n(n-1)}.\] Verify that $V(\mathbf{X})$ is unbiased for $\pi^2$.

Hint for (c): expand $\mathbb{E}[U(U-1)]$ using $\mathbb{E}[U] = n\pi$ and $\operatorname{Var}(U) = n\pi(1-\pi)$.

Solution

a) Each $X_i$ has $\mathbb{E}[X_i] = \pi$, so

\[\mathbb{E}\!\left[\frac{U(\mathbf{X})}{n}\right] = \frac{1}{n} \sum_{i=1}^n \mathbb{E}[X_i] = \pi.\]

b) The joint PMF is

\[f(\mathbf{x} \mid \pi) = \prod_{i=1}^n \pi^{x_i}(1-\pi)^{1-x_i} = \pi^{u}(1-\pi)^{n-u}, \qquad u = \sum_i x_i.\]

This depends on the data only through $u = U(\mathbf{x})$, so by Fisher-Neyman, $U$ is sufficient. For minimality, the ratio

\[\frac{f(\mathbf{x} \mid \pi)}{f(\mathbf{y} \mid \pi)} = \left(\frac{\pi}{1-\pi}\right)^{U(\mathbf{x}) - U(\mathbf{y})}\]

is free of $\pi$ iff $U(\mathbf{x}) = U(\mathbf{y})$. So $\boxed{U(\mathbf{X})}$ is minimal sufficient for $\pi$.

c) Using $\operatorname{Var}(U) = \mathbb{E}[U^2] - \mathbb{E}[U]^2$, we get $\mathbb{E}[U^2] = n\pi(1-\pi) + (n\pi)^2$. Then

\[\mathbb{E}[U(U-1)] = \mathbb{E}[U^2] - \mathbb{E}[U] = n\pi(1-\pi) + n^2\pi^2 - n\pi = n^2\pi^2 - n\pi^2 = n(n-1)\pi^2.\]

Therefore

\[\mathbb{E}[V(\mathbf{X})] = \frac{n(n-1)\pi^2}{n(n-1)} = \pi^2,\]

so $V$ is unbiased for $\pi^2$.
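Because $U \sim \mathrm{Binomial}(n, \pi)$ takes finitely many values, the expectation in part (c) can also be checked exactly by summing over the PMF. A small sketch with exact rational arithmetic follows; the value $\pi = 3/10$ and the function name `expected_V` are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def expected_V(n, pi):
    """Exact E[V] for V = U(U-1)/(n(n-1)) with U ~ Binomial(n, pi)."""
    total = Fraction(0)
    for u in range(n + 1):
        prob = comb(n, u) * pi**u * (1 - pi) ** (n - u)   # P(U = u)
        total += prob * Fraction(u * (u - 1), n * (n - 1))
    return total

pi = Fraction(3, 10)   # assumed success probability for the check
for n in (2, 5, 12):
    # The second and third columns should agree exactly for every n >= 2.
    print(n, expected_V(n, pi), pi**2)
```

Using `Fraction` avoids floating-point rounding, so equality with $\pi^2$ holds exactly rather than approximately.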

2) Fisher Information & Cramér–Rao

05 Fisher Information for the Exponential

Let $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(\lambda)$ with density $f(x \mid \lambda) = \lambda e^{-\lambda x}$ for $x \geq 0$.

a) Compute the score function $s(\lambda) = \frac{\partial}{\partial \lambda} \log f(X \mid \lambda)$.
b) Verify that $\mathbb{E}[s(\lambda)] = 0$.
c) Compute the Fisher information $I(\lambda) = \operatorname{Var}[s(\lambda)]$.
d) Verify your answer in (c) by computing $I(\lambda)$ via the second-derivative formula: $I(\lambda) = -\mathbb{E}\!\left[\frac{\partial^2}{\partial\lambda^2}\log f(X \mid \lambda)\right]$.
e) The MLE for $\lambda$ is $\hat{\lambda} = 1/\bar{X}$, which has $\operatorname{Var}(\hat{\lambda}) \approx \lambda^2/n$ for large $n$. Compare this with the Cramér–Rao lower bound. Is $\hat{\lambda}$ asymptotically efficient?

Solution

a) $\log f(x \mid \lambda) = \log \lambda - \lambda x$. Differentiating with respect to $\lambda$:

\[s(\lambda) = \frac{\partial}{\partial \lambda} \log f(X \mid \lambda) = \frac{1}{\lambda} - X.\]

b) Since $X \sim \mathrm{Exp}(\lambda)$ has $\mathbb{E}[X] = 1/\lambda$,

\[\mathbb{E}[s(\lambda)] = \frac{1}{\lambda} - \mathbb{E}[X] = \frac{1}{\lambda} - \frac{1}{\lambda} = 0.\]

c) Because $\mathbb{E}[s] = 0$, $I(\lambda) = \operatorname{Var}[s(\lambda)] = \operatorname{Var}(1/\lambda - X) = \operatorname{Var}(X) = 1/\lambda^2$. So

\[\boxed{I(\lambda) = \frac{1}{\lambda^2}}.\]

d) From $s(\lambda) = 1/\lambda - X$,

\[\frac{\partial^2}{\partial \lambda^2} \log f(X \mid \lambda) = -\frac{1}{\lambda^2}.\]

Taking $-\mathbb{E}$ of a constant gives $I(\lambda) = 1/\lambda^2$, matching part (c).

e) The Cramér-Rao lower bound for unbiased estimators of $\lambda$ based on $n$ i.i.d. observations is

\[\operatorname{Var}(\hat{\lambda}) \ge \frac{1}{n \cdot I(\lambda)} = \frac{\lambda^2}{n}.\]

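The MLE has $\operatorname{Var}(\hat{\lambda}) \approx \lambda^2/n$, which matches the bound asymptotically. So $\hat{\lambda} = 1/\bar{X}$ is asymptotically efficient. (For finite $n$ the MLE is biased, with $\mathbb{E}[\hat{\lambda}] = \frac{n}{n-1}\lambda$, so the bound for unbiased estimators applies to it only in the large-$n$ limit.)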
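Parts (b) and (c) can be sanity-checked by simulation. The sketch below assumes a true rate $\lambda = 2$ and a fixed seed; the helper names are made up for illustration. It estimates the mean and variance of the score from simulated draws and compares them with $0$ and $1/\lambda^2$.

```python
import random

def score(x, lam):
    """Score of one Exp(lam) observation: derivative of log(lam) - lam * x."""
    return 1.0 / lam - x

lam = 2.0                      # assumed true rate for the illustration
rng = random.Random(0)         # fixed seed so the run is repeatable
scores = [score(rng.expovariate(lam), lam) for _ in range(200_000)]

mean_score = sum(scores) / len(scores)
var_score = sum((s - mean_score) ** 2 for s in scores) / len(scores)

fisher_info = 1.0 / lam**2     # closed form from parts (c) and (d)

def cr_lower_bound(n):
    """Cramér-Rao lower bound 1 / (n * I(lam)) for n i.i.d. observations."""
    return 1.0 / (n * fisher_info)

print("mean of score:    ", mean_score, "(theory: 0)")
print("variance of score:", var_score, "(theory:", fisher_info, ")")
print("CR bound, n = 50: ", cr_lower_bound(50))
```

The simulated values only approximate the theory; the gap shrinks as the number of draws grows.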

06 Cramér–Rao: When Can We Beat $1/n$?

Let $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(p)$.

a) We know $I(p) = \frac{1}{p(1-p)}$. Write down the Cramér–Rao lower bound for any unbiased estimator of $p$.
b) The sample proportion $\hat{p} = \bar{X}$ has $\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}$. Is $\hat{p}$ efficient?
c) Now consider estimating $g(p) = p^2$ instead of $p$. The Cramér–Rao bound for unbiased estimators of $g(\theta)$ is \[\operatorname{Var}(\hat{g}) \geq \frac{[g'(\theta)]^2}{n \cdot I(\theta)}.\] Compute the CR bound for unbiased estimators of $p^2$.
d) We showed in Problem 04(c) that $V(\mathbf{X}) = \frac{U(U-1)}{n(n-1)}$ is unbiased for $p^2$. Compute $\operatorname{Var}(V)$ (at least for large $n$). Does it achieve the CR bound?

Solution

a) With $n$ i.i.d. observations,

\[\operatorname{Var}(\hat{p}) \ge \frac{1}{n \cdot I(p)} = \boxed{\frac{p(1-p)}{n}}.\]

b) $\operatorname{Var}(\hat{p}) = p(1-p)/n$ exactly equals the CR bound, so $\hat{p}$ is efficient (uniformly in $p$, not just asymptotically).

c) Here $g(p) = p^2$, so $g'(p) = 2p$. The CR bound for unbiased estimators of $p^2$ is

\[\operatorname{Var}(\hat{g}) \ge \frac{[g'(p)]^2}{n \cdot I(p)} = \frac{4p^2 \cdot p(1-p)}{n} = \boxed{\frac{4p^3(1-p)}{n}}.\]

d) Note $V = \frac{U(U-1)}{n(n-1)} = \frac{n}{n-1} \hat{p}^2 - \frac{1}{n-1}\hat{p}$. For large $n$, $V \approx \hat{p}^2$.

To get $\operatorname{Var}(\hat{p}^2)$ for large $n$, use a first-order Taylor expansion of $\phi(p) = p^2$ around the true $p$. Since $\hat{p}$ concentrates near $p$ (its variance is $O(1/n)$), we can write

\[\hat{p}^2 \approx p^2 + 2p \cdot (\hat{p} - p),\]

so the random part of $\hat{p}^2$ is approximately $2p \cdot (\hat{p} - p)$. Therefore

\[\operatorname{Var}(\hat{p}^2) \approx (2p)^2 \cdot \operatorname{Var}(\hat{p}) = 4p^2 \cdot \frac{p(1-p)}{n} = \frac{4p^3(1-p)}{n}.\]

This matches the Cramér-Rao bound from (c) asymptotically, so $V$ is asymptotically efficient for $p^2$. (For finite $n$, $V$ does not exactly achieve the bound.)
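The large-$n$ argument in part (d) can be complemented by an exact finite-$n$ computation, since $U$ is binomial. The following sketch sums over the binomial PMF with exact fractions; $p = 1/2$ and the function names `var_V` and `cr_bound` are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def var_V(n, p):
    """Exact Var(V) for V = U(U-1)/(n(n-1)) with U ~ Binomial(n, p)."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for u in range(n + 1):
        prob = comb(n, u) * p**u * (1 - p) ** (n - u)   # P(U = u)
        v = Fraction(u * (u - 1), n * (n - 1))
        mean += prob * v
        second_moment += prob * v * v
    return second_moment - mean**2

def cr_bound(n, p):
    """Cramér-Rao bound for unbiased estimators of p^2, from part (c)."""
    return 4 * p**3 * (1 - p) / n

p = Fraction(1, 2)   # assumed value of p for the comparison
for n in (2, 10, 100, 1000):
    # Ratio Var(V) / bound: above 1 for finite n, approaching 1 as n grows.
    print(n, float(var_V(n, p) / cr_bound(n, p)))
```

A ratio strictly above 1 for each finite $n$ and tending to 1 is consistent with the conclusion that $V$ is asymptotically, but not exactly, efficient.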

3) Admissibility & Minimax

07 Admissibility: A Sketch Exercise

Suppose there exist exactly three estimators $T_1$, $T_2$, and $T_3$ for a parameter $\theta \in [0, 1]$.

a) Sketch an example of the MSE curves $\mathrm{MSE}(T_i, \theta)$ as functions of $\theta$ such that $T_1$ and $T_2$ are admissible, but $T_3$ is not admissible. Explain why your sketch works.
b) Now sketch (possibly different) risk functions for $T_1$, $T_2$, and $T_3$ such that $T_1$ is the minimax estimator. Must $T_1$ have the lowest MSE everywhere?

Solution

a) Plot $\theta$ on the horizontal axis (over $[0, 1]$) and MSE on the vertical axis. A working sketch:

$\mathrm{MSE}(T_1, \theta)$: a curve that is low near $\theta = 0$ and rises monotonically toward $\theta = 1$.
$\mathrm{MSE}(T_2, \theta)$: a curve that is low near $\theta = 1$ and rises monotonically toward $\theta = 0$ (mirror of $T_1$). The two curves cross somewhere in $(0, 1)$.
$\mathrm{MSE}(T_3, \theta)$: a curve that lies strictly above both $\mathrm{MSE}(T_1, \theta)$ and $\mathrm{MSE}(T_2, \theta)$ for every $\theta \in [0, 1]$.

Why it works. Recall: $T'$ dominates $T$ iff $\mathrm{MSE}(T', \theta) \le \mathrm{MSE}(T, \theta)$ for all $\theta$ with strict inequality somewhere; $T$ is admissible iff no $T'$ dominates it.

$T_1$ admissible: $T_2$ doesn’t dominate it (since $T_1$ beats $T_2$ at small $\theta$); $T_3$ is uniformly worse, so it certainly doesn’t dominate. No estimator dominates $T_1$.
$T_2$ admissible: by symmetry, $T_2$ beats $T_1$ at large $\theta$ (near $\theta = 1$), so $T_1$ doesn’t dominate it; $T_3$ is uniformly worse, so it doesn’t dominate either.
$T_3$ inadmissible: $T_1$ dominates $T_3$ by construction (and so does $T_2$).

b) Minimax means smallest worst-case MSE: $T_1$ minimizes $\sup_{\theta} \mathrm{MSE}(T, \theta)$. A sketch:

$\mathrm{MSE}(T_1, \theta)$: roughly flat at a moderate level $M$ across all $\theta$.
$\mathrm{MSE}(T_2, \theta)$: very low for $\theta$ near $0$ but spikes well above $M$ as $\theta \to 1$.
$\mathrm{MSE}(T_3, \theta)$: very low for $\theta$ near $1$ but spikes well above $M$ as $\theta \to 0$.

Then $\sup_\theta \mathrm{MSE}(T_1, \theta) = M$, while both $\sup_\theta \mathrm{MSE}(T_2, \theta)$ and $\sup_\theta \mathrm{MSE}(T_3, \theta)$ exceed $M$. So $T_1$ is minimax.

No - $T_1$ does not need the lowest MSE everywhere. In fact, in this sketch $T_2$ beats $T_1$ for small $\theta$ and $T_3$ beats it for large $\theta$. Minimaxity is purely about worst-case behavior, not pointwise dominance.
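Both sketches can be mimicked with concrete curves on a grid. The risk functions below are invented purely for illustration (linear curves for part (a), a flat line and two parabolas for part (b)); they are one possible choice, not the intended answer.

```python
def dominates(risk_a, risk_b):
    """True if a's risk is <= b's at every grid point and strictly smaller somewhere."""
    pairs = list(zip(risk_a, risk_b))
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)

def admissible(name, risks):
    """An estimator is admissible (within this finite set) if no other one dominates it."""
    return not any(dominates(risks[other], risks[name])
                   for other in risks if other != name)

def minimax(risks):
    """Name of the estimator with the smallest worst-case risk on the grid."""
    return min(risks, key=lambda name: max(risks[name]))

grid = [i / 100 for i in range(101)]   # theta values in [0, 1]

# Part (a): T1 and T2 cross, T3 lies above both everywhere (hypothetical curves).
part_a = {
    "T1": [0.1 + 0.5 * t for t in grid],
    "T2": [0.6 - 0.5 * t for t in grid],
    "T3": [0.8 for _ in grid],
}

# Part (b): T1 is flat, T2 and T3 are low at one end and spike at the other.
part_b = {
    "T1": [0.3 for _ in grid],
    "T2": [0.05 + 0.9 * t**2 for t in grid],
    "T3": [0.05 + 0.9 * (1 - t) ** 2 for t in grid],
}

for name in part_a:
    print(name, "admissible in (a):", admissible(name, part_a))
print("minimax in (b):", minimax(part_b))
```

In part (b) the flat curve wins on worst-case risk even though each of the other two estimators has lower risk than $T_1$ near one end of the interval.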


