22: Statistics - Properties of Estimators
📚 Materials
- 📺 Properties of Estimators: Bias, MSE, Consistency, CR Bound (ToDo), 🎞️ Slides
- 📺 MLE: Maximum Likelihood Estimation (ToDo), 🎞️ Slides
- 🛠️📺 Practical (ToDo)
🏡 Homework
1) Exponential Family & Sufficiency
01 Poisson Meets the Exponential Family
Let \(X\) be a random variable following a Poisson distribution with parameter \(\lambda > 0\), i.e., \(P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}\) for \(k = 0, 1, 2, \ldots\)
- Show that the Poisson distribution belongs to the exponential family by writing its PMF in the form \[f(x \mid \lambda) = h(x) \exp\!\big(\eta(\lambda)\, T(x) - A(\lambda)\big).\] Identify \(h(x)\), \(\eta(\lambda)\), \(T(x)\), and \(A(\lambda)\).
- Using the exponential family form, what is the sufficient statistic for \(\lambda\) based on an i.i.d. sample \(X_1, \ldots, X_n\)?
02 Slit Width Estimation
In an experiment, \(n\) drops of solution are released uniformly through a slit onto a surface. We model the one-dimensional impact points \(X_1, \ldots, X_n\) as i.i.d. \(\mathrm{Uniform}(0, d)\), where the unknown slit width \(d > 0\) is to be estimated.
- Write down the joint density \(f(\mathbf{x} \mid d)\) for the sample.
- Using the Fisher–Neyman factorization theorem, show that \(X_{(n)} = \max\{X_1, \ldots, X_n\}\) is sufficient for \(d\).
- Is \(X_{(n)}\) unbiased for \(d\)? If not, find an unbiased estimator based on \(X_{(n)}\).
Hint for (c): the CDF of \(X_{(n)}\) is \(F_{X_{(n)}}(x) = (x/d)^n\) for \(0 \le x \le d\).
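A quick Monte Carlo sketch can reveal whether \(X_{(n)}\) looks unbiased before you derive anything. The values of `d`, `n`, and the seed below are illustrative choices, not part of the problem statement.

```python
import numpy as np

# Monte Carlo look at the bias of X_(n) for Uniform(0, d).
# d, n, and the seed are illustrative, not part of the problem.
rng = np.random.default_rng(0)
d, n, reps = 2.0, 10, 200_000

samples = rng.uniform(0.0, d, size=(reps, n))
x_max = samples.max(axis=1)   # X_(n) for each replication

# If X_(n) were unbiased, this average would sit near d = 2.0.
print(x_max.mean())
```

Comparing the printed average with `d` suggests the answer to the unbiasedness question; the exact correction factor follows from the CDF in the hint.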
03 Normal Variance: Minimal Sufficiency
Let \(X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)\) where \(\sigma^2 > 0\) is unknown but \(\mu\) is known.
- Show that \(T(\mathbf{X}) = \sum_{i=1}^{n}(X_i - \mu)^2\) is sufficient for \(\sigma^2\) using the factorization theorem.
- Using the likelihood ratio criterion, show that \(T(\mathbf{X})\) is minimal sufficient for \(\sigma^2\).
Recall: \(T\) is minimal sufficient iff \(T(\mathbf{x}) = T(\mathbf{y})\) \(\Longleftrightarrow\) \(\frac{f(\mathbf{x} \mid \sigma^2)}{f(\mathbf{y} \mid \sigma^2)}\) is free of \(\sigma^2\).
04 Binomial Sufficiency and Estimating \(\pi^2\)
Let \(\mathbf{X} = (X_1, \ldots, X_n)^\top\) with \(X_i \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(\pi)\), where \(\pi \in (0, 1)\). Define \(U(\mathbf{X}) = \sum_{i=1}^{n} X_i\).
- Show that \(U(\mathbf{X})/n\) is unbiased for \(\pi\).
- Show that \(U(\mathbf{X})\) is minimal sufficient for \(\pi\).
- Now consider the estimator for \(\pi^2\): \[V(\mathbf{X}) = \frac{U(\mathbf{X})\left[U(\mathbf{X}) - 1\right]}{n(n-1)}.\] Verify that \(V(\mathbf{X})\) is unbiased for \(\pi^2\).
Hint for (c): expand \(\mathbb{E}[U(U-1)]\) using \(\mathbb{E}[U] = n\pi\) and \(\operatorname{Var}(U) = n\pi(1-\pi)\).
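The unbiasedness claim in part (c) is easy to sanity-check by simulation. The values of `pi_true`, `n`, and the seed below are illustrative choices only.

```python
import numpy as np

# Monte Carlo sanity check that V = U(U-1) / (n(n-1)) is unbiased for pi^2.
# pi_true, n, and the seed are illustrative choices.
rng = np.random.default_rng(1)
pi_true, n, reps = 0.3, 20, 200_000

U = rng.binomial(n, pi_true, size=reps)   # U = sum of n Bernoulli(pi) draws
V = U * (U - 1) / (n * (n - 1))

print(V.mean(), pi_true**2)   # the two values should be close
```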
2) Bias, Variance & Risk Functions
05 Logistic Risk Showdown
Let \(X_1, \ldots, X_n\) be i.i.d. from a logistic distribution with parameters \(a \in \mathbb{R}\) (location) and \(b \in \mathbb{R}^+\) (scale). The density is \[f(x) = \frac{\exp\!\left(-\frac{x - a}{b}\right)}{b\left(1 + \exp\!\left(-\frac{x - a}{b}\right)\right)^2}, \quad x \in \mathbb{R},\] with \(\mathbb{E}[X] = a\) and \(\operatorname{Var}(X) = b^2 \pi^2 / 3\).
We want to estimate the location parameter \(a\) using the quadratic loss \(L(\hat{a}, a) = (\hat{a} - a)^2\), and compare two estimators:
- \(T(\mathbf{X}) = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\) (the sample mean),
- \(U(\mathbf{X}, c) = \frac{1}{2}c + \frac{1}{2}\bar{X}\), where \(c \in \mathbb{R}\) is a fixed constant.
Answer the following:
- Compute the risk function \(R(a, T) = \mathbb{E}_a\!\left[(T - a)^2\right]\) for the estimator \(T\).
- Compute the risk function \(R(a, U) = \mathbb{E}_a\!\left[(U - a)^2\right]\) for the estimator \(U\).
- Sketch both risk functions on the same plot for \(a \in [-5, 5]\) with \(b = 3\), \(c = 1\), and \(n = 10\). Which estimator has lower risk near \(a = c\)? Which has lower risk when \(|a|\) is large?
- Which of the two estimators is the minimax estimator? Justify your answer.
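For the sketch in part (c), the risk curves can be estimated by Monte Carlo rather than derived in closed form; plotting `risk_T` and `risk_U` against `a_grid` gives the picture the problem asks about. NumPy's `rng.logistic(loc, scale)` matches the density above; the replication count and seed are illustrative.

```python
import numpy as np

# Monte Carlo estimates of R(a, T) and R(a, U) for b = 3, c = 1, n = 10.
# The replication count and seed are illustrative choices.
rng = np.random.default_rng(2)
b, c, n, reps = 3.0, 1.0, 10, 20_000
a_grid = np.linspace(-5, 5, 21)

risk_T, risk_U = [], []
for a in a_grid:
    xbar = rng.logistic(loc=a, scale=b, size=(reps, n)).mean(axis=1)
    risk_T.append(np.mean((xbar - a) ** 2))                  # R(a, T)
    risk_U.append(np.mean((0.5 * c + 0.5 * xbar - a) ** 2))  # R(a, U)

# Plot risk_T and risk_U against a_grid to compare the two estimators.
print(risk_T[10], risk_U[10])   # estimated risks near a = 0
```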
06 Shrinking the Sample Mean
Let \(X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)\) with \(\sigma^2\) known. Instead of the usual \(\bar{X}\), consider the “shrinkage” estimator \[\hat{\mu}_c = c \cdot \bar{X}, \quad 0 < c < 1.\]
- Find the bias of \(\hat{\mu}_c\).
- Find the variance of \(\hat{\mu}_c\).
- Show that \(\mathrm{MSE}(\hat{\mu}_c) = c^2 \frac{\sigma^2}{n} + (1 - c)^2 \mu^2\).
- For what value of \(c\) is the MSE minimized? Is the optimal estimator biased?
- What is the problem with the “optimal” \(c\) from (d) in practice?
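The MSE expression stated in part (c) can be checked directly against simulation. The values of `mu`, `sigma`, `n`, and the seed below are illustrative choices.

```python
import numpy as np

# Monte Carlo check of the MSE formula from part (c):
#   MSE(mu_hat_c) = c^2 * sigma^2 / n + (1 - c)^2 * mu^2.
# mu, sigma, n, and the seed are illustrative choices.
rng = np.random.default_rng(3)
mu, sigma, n, reps = 2.0, 1.5, 25, 100_000

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
results = []
for c in (0.5, 0.8, 1.0):
    mse_mc = np.mean((c * xbar - mu) ** 2)                     # simulated MSE
    mse_formula = c**2 * sigma**2 / n + (1 - c) ** 2 * mu**2   # part (c) formula
    results.append((c, mse_mc, mse_formula))
    print(c, mse_mc, mse_formula)   # the two columns should agree closely
```

Scanning a finer grid of `c` values with the same code also hints at where the minimum in part (d) lies.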
3) Fisher Information & Cramér–Rao
07 Fisher Information for the Exponential
Let \(X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(\lambda)\) with density \(f(x \mid \lambda) = \lambda e^{-\lambda x}\) for \(x \geq 0\).
- Compute the score function \(s(\lambda) = \frac{\partial}{\partial \lambda} \log f(X \mid \lambda)\).
- Verify that \(\mathbb{E}[s(\lambda)] = 0\).
- Compute the Fisher information \(I(\lambda) = \operatorname{Var}[s(\lambda)]\).
- Verify your answer in (c) by computing \(I(\lambda)\) via the second-derivative formula: \(I(\lambda) = -\mathbb{E}\!\left[\frac{\partial^2}{\partial\lambda^2}\log f(X \mid \lambda)\right]\).
- The MLE for \(\lambda\) is \(\hat{\lambda} = 1/\bar{X}\), which has \(\operatorname{Var}(\hat{\lambda}) \approx \lambda^2/n\) for large \(n\). Compare this with the Cramér–Rao lower bound. Is \(\hat{\lambda}\) asymptotically efficient?
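The large-\(n\) variance claim \(\operatorname{Var}(\hat{\lambda}) \approx \lambda^2/n\) can be checked by simulation before comparing it with the bound. Note NumPy parameterizes the exponential by its scale \(1/\lambda\); the values of `lam`, `n`, and the seed are illustrative.

```python
import numpy as np

# Monte Carlo check that Var(lambda_hat) ~ lambda^2 / n for lambda_hat = 1 / Xbar.
# lam, n, and the seed are illustrative choices; numpy uses scale = 1 / lambda.
rng = np.random.default_rng(4)
lam, n, reps = 2.0, 500, 50_000

xbar = rng.exponential(scale=1.0 / lam, size=(reps, n)).mean(axis=1)
lam_hat = 1.0 / xbar

print(lam_hat.var(), lam**2 / n)   # should be close for n this large
```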
08 Cramér–Rao: When Can We Beat \(1/n\)?
Let \(X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(p)\).
- We know \(I(p) = \frac{1}{p(1-p)}\). Write down the Cramér–Rao lower bound for any unbiased estimator of \(p\).
- The sample proportion \(\hat{p} = \bar{X}\) has \(\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}\). Is \(\hat{p}\) efficient?
- Now consider estimating \(g(p) = p^2\) instead of \(p\). The Cramér–Rao bound for unbiased estimators of \(g(p)\) is \[\operatorname{Var}(\hat{g}) \geq \frac{[g'(p)]^2}{n \cdot I(p)}.\] Compute the CR bound for unbiased estimators of \(p^2\).
- We showed in Problem 04(c) that \(V(\mathbf{X}) = \frac{U(U-1)}{n(n-1)}\) is unbiased for \(p^2\). Compute \(\operatorname{Var}(V)\) (at least for large \(n\)). Does it achieve the CR bound?
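A Monte Carlo estimate of \(\operatorname{Var}(V)\) gives a number to place against the CR bound from part (c). The values of `p_true`, `n`, and the seed below are illustrative choices.

```python
import numpy as np

# Monte Carlo estimate of Var(V) for part (d).
# p_true, n, and the seed are illustrative choices.
rng = np.random.default_rng(5)
p_true, n, reps = 0.4, 100, 200_000

U = rng.binomial(n, p_true, size=reps)
V = U * (U - 1) / (n * (n - 1))

# Compare V.var() with the CR bound from part (c), evaluated at p = p_true.
print(V.mean(), V.var())
```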
4) Admissibility & Minimax
09 Admissibility: A Sketch Exercise
Suppose there exist exactly three estimators \(T_1\), \(T_2\), and \(T_3\) for a parameter \(\theta \in [0, 1]\).
- Sketch an example of the MSE curves \(\mathrm{MSE}(T_i, \theta)\) as functions of \(\theta\) such that \(T_1\) and \(T_2\) are admissible, but \(T_3\) is not admissible. Explain why your sketch works.
- Now sketch (possibly different) risk functions for \(T_1\), \(T_2\), and \(T_3\) such that \(T_1\) is the minimax estimator. Must \(T_1\) have the lowest MSE everywhere?
10 MSE Matrix
Consider the parameter vector \(\boldsymbol{\theta} = (\theta_1, \theta_2)^\top\) and an estimator \(\hat{\boldsymbol{\theta}} = (\hat{\theta}_1, \hat{\theta}_2)^\top\) with \[\mathbb{E}[\hat{\boldsymbol{\theta}}] = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \mathrm{Cov}(\hat{\boldsymbol{\theta}}) = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}.\]
The MSE matrix generalizes scalar MSE to vectors: \(\mathrm{MSE}(\hat{\boldsymbol{\theta}}) = \mathbb{E}\!\left[(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^\top\right]\).
- Show that \(\mathrm{MSE}(\hat{\boldsymbol{\theta}}) = \mathrm{Cov}(\hat{\boldsymbol{\theta}}) + \mathbf{b}\mathbf{b}^\top\), where \(\mathbf{b} = \mathbb{E}[\hat{\boldsymbol{\theta}}] - \boldsymbol{\theta}\) is the bias vector.
- Write out the MSE matrix explicitly in terms of \(\mu_1, \mu_2, \theta_1, \theta_2, \sigma_1^2, \sigma_2^2, \sigma_{12}\).
- The scalar total MSE is defined as \(\mathrm{tr}(\mathrm{MSE}) = \mathrm{MSE}(\hat{\theta}_1) + \mathrm{MSE}(\hat{\theta}_2)\). Write this in terms of the given quantities.
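The identity in part (a) lends itself to a numerical check: simulate an estimator with a known mean and covariance and compare the empirical MSE matrix with \(\mathrm{Cov} + \mathbf{b}\mathbf{b}^\top\). The Gaussian sampling model, parameter values, and seed below are illustrative assumptions.

```python
import numpy as np

# Numeric check of the identity in part (a): MSE = Cov + b b^T.
# theta, the estimator's mean/covariance, and the seed are illustrative choices.
rng = np.random.default_rng(6)
theta = np.array([1.0, -2.0])
mean = np.array([1.2, -1.5])                # E[theta_hat]
cov = np.array([[0.5, 0.2], [0.2, 0.8]])    # Cov(theta_hat)

draws = rng.multivariate_normal(mean, cov, size=500_000)
err = draws - theta                          # theta_hat - theta

mse_mc = err.T @ err / len(draws)            # empirical E[(err)(err)^T]
b = mean - theta                             # bias vector
mse_formula = cov + np.outer(b, b)
print(mse_mc)
print(mse_formula)   # the two matrices should agree closely
```

The trace of either matrix gives the scalar total MSE from part (c).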
🎲 xx+37 (xx)
- ▶️ToDo
- 🔗Random link ToDo
- 🇦🇲🎶ToDo
- 🌐🎶ToDo
- 🤌Կարգին ToDo