22: Statistics - Properties of Estimators

📚 Materials


🏡 Homework


1) Exponential Family & Sufficiency

01 Poisson Meets the Exponential Family

Let \(X\) be a random variable following a Poisson distribution with parameter \(\lambda > 0\), i.e., \(P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}\) for \(k = 0, 1, 2, \ldots\)

    1. Show that the Poisson distribution belongs to the exponential family by writing its PMF in the form \[f(x \mid \lambda) = h(x) \exp\!\big(\eta(\lambda)\, T(x) - A(\lambda)\big).\] Identify \(h(x)\), \(\eta(\lambda)\), \(T(x)\), and \(A(\lambda)\).
    1. Using the exponential family form, what is the sufficient statistic for \(\lambda\) based on an i.i.d. sample \(X_1, \ldots, X_n\)?
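(Optional) This is not a proof, but you can sanity-check the sufficiency claim numerically: two samples with the same \(\sum x_i\) should have a log-likelihood difference that is constant in \(\lambda\), since \(\lambda\) enters only through \(\eta(\lambda)\sum x_i - nA(\lambda)\). The sample values below are arbitrary.

```python
import math

# Poisson log-likelihood of an i.i.d. sample, written in exponential-family
# form: sum over i of [ -log(x_i!) + x_i * log(lam) - lam ],
# i.e. h(x) = 1/x!, eta(lam) = log(lam), T(x) = x, A(lam) = lam.
def poisson_loglik(xs, lam):
    return sum(-math.lgamma(x + 1) + x * math.log(lam) - lam for x in xs)

# Two samples of the same size with the same sufficient statistic sum(x_i) = 6:
x = [1, 2, 3]
y = [0, 2, 4]

# The log-likelihood difference should not depend on lambda at all.
diffs = [poisson_loglik(x, lam) - poisson_loglik(y, lam)
         for lam in (0.5, 1.0, 2.0, 5.0)]
constant = max(diffs) - min(diffs) < 1e-12
```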

02 Slit Width Estimation

In an experiment, \(n\) drops of solution are released uniformly through a slit onto a surface. We model the one-dimensional impact points \(X_1, \ldots, X_n\) as i.i.d. \(\mathrm{Uniform}(0, d)\), where the unknown slit width \(d > 0\) is to be estimated.

    1. Write down the joint density \(f(\mathbf{x} \mid d)\) for the sample.
    1. Using the Fisher–Neyman factorization theorem, show that \(X_{(n)} = \max\{X_1, \ldots, X_n\}\) is sufficient for \(d\).
    1. Is \(X_{(n)}\) unbiased for \(d\)? If not, find an unbiased estimator based on \(X_{(n)}\).

Hint for (c): the CDF of \(X_{(n)}\) is \(F_{X_{(n)}}(x) = (x/d)^n\) for \(0 \le x \le d\).
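(Optional) A quick Monte Carlo check for part (c), with illustrative values \(d = 2\) and \(n = 5\): the simulated mean of \(X_{(n)}\) lands noticeably below \(d\), which is the bias you are asked to correct.

```python
import random

random.seed(0)
d, n, reps = 2.0, 5, 200_000

# Monte Carlo estimate of E[X_(n)] for Uniform(0, d) samples of size n:
mean_max = sum(max(random.uniform(0, d) for _ in range(n))
               for _ in range(reps)) / reps

# The ratio sits clearly below 1, so X_(n) underestimates d on average.
ratio = mean_max / d
```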

03 Normal Variance: Minimal Sufficiency

Let \(X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)\) where \(\sigma^2 > 0\) is unknown but \(\mu\) is known.

    1. Show that \(T(\mathbf{X}) = \sum_{i=1}^{n}(X_i - \mu)^2\) is sufficient for \(\sigma^2\) using the factorization theorem.
    1. Using the likelihood ratio criterion, show that \(T(\mathbf{X})\) is minimal sufficient for \(\sigma^2\).

Recall: \(T\) is minimal sufficient iff \(T(\mathbf{x}) = T(\mathbf{y})\) \(\Longleftrightarrow\) \(\frac{f(\mathbf{x} \mid \sigma^2)}{f(\mathbf{y} \mid \sigma^2)}\) is free of \(\sigma^2\).
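(Optional) A numeric illustration of the likelihood-ratio criterion: the log-ratio below is constant across \(\sigma^2\) exactly when the two samples share the same value of \(T\). The sample values are arbitrary.

```python
import math

mu, n = 0.0, 4

def normal_loglik(xs, sigma2):
    # Log-density of i.i.d. N(mu, sigma2) with mu known; it depends on the
    # data only through T(x) = sum (x_i - mu)^2.
    T = sum((x - mu) ** 2 for x in xs)
    return -n / 2 * math.log(2 * math.pi * sigma2) - T / (2 * sigma2)

x = [1.0, -1.0, 0.5, -0.5]           # T(x) = 2.5
y = [math.sqrt(2.5), 0.0, 0.0, 0.0]  # T(y) = 2.5 as well
z = [2.0, 0.0, 0.0, 0.0]             # T(z) = 4.0, different

# Log ratio = -(T(x) - T(y)) / (2 sigma2): free of sigma2 iff T(x) == T(y).
r_same = [normal_loglik(x, s2) - normal_loglik(y, s2) for s2 in (0.5, 1.0, 3.0)]
r_diff = [normal_loglik(x, s2) - normal_loglik(z, s2) for s2 in (0.5, 1.0, 3.0)]
```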

04 Binomial Sufficiency and Estimating \(\pi^2\)

Let \(\mathbf{X} = (X_1, \ldots, X_n)^\top\) with \(X_i \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(\pi)\), where \(\pi \in (0, 1)\). Define \(U(\mathbf{X}) = \sum_{i=1}^{n} X_i\).

    1. Show that \(U(\mathbf{X})/n\) is unbiased for \(\pi\).
    1. Show that \(U(\mathbf{X})\) is minimal sufficient for \(\pi\).
    1. Now consider the estimator for \(\pi^2\): \[V(\mathbf{X}) = \frac{U(\mathbf{X})\left[U(\mathbf{X}) - 1\right]}{n(n-1)}.\] Verify that \(V(\mathbf{X})\) is unbiased for \(\pi^2\).

Hint for (c): expand \(\mathbb{E}[U(U-1)]\) using \(\mathbb{E}[U] = n\pi\) and \(\operatorname{Var}(U) = n\pi(1-\pi)\).
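(Optional) You can verify the unbiasedness claim in (c) by simulation, e.g. with the arbitrary choices \(\pi = 0.3\) and \(n = 10\):

```python
import random

random.seed(1)
p, n, reps = 0.3, 10, 200_000

# Average of V(X) = U(U-1) / (n(n-1)) over many Binomial(n, p) draws of U:
mean_V = 0.0
for _ in range(reps):
    U = sum(1 for _ in range(n) if random.random() < p)
    mean_V += U * (U - 1) / (n * (n - 1))
mean_V /= reps
# mean_V should be close to p^2 = 0.09.
```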


2) Bias, Variance & Risk Functions

05 Logistic Risk Showdown

Let \(X_1, \ldots, X_n\) be i.i.d. from a logistic distribution with parameters \(a \in \mathbb{R}\) (location) and \(b \in \mathbb{R}^+\) (scale). The density is \[f(x) = \frac{\exp\!\left(-\frac{x - a}{b}\right)}{b\left(1 + \exp\!\left(-\frac{x - a}{b}\right)\right)^2}, \quad x \in \mathbb{R},\] with \(\mathbb{E}[X] = a\) and \(\operatorname{Var}(X) = b^2 \pi^2 / 3\).

We want to estimate the location parameter \(a\) using the quadratic loss \(L(\hat{a}, a) = (\hat{a} - a)^2\), and compare two estimators:

  • \(T(\mathbf{X}) = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\) (the sample mean),
  • \(U(\mathbf{X}, c) = \frac{1}{2}c + \frac{1}{2}\bar{X}\), where \(c \in \mathbb{R}\) is a fixed constant.

Answer the following:

    1. Compute the risk function \(R(a, T) = \mathbb{E}_a\!\left[(T - a)^2\right]\) for the estimator \(T\).
    1. Compute the risk function \(R(a, U) = \mathbb{E}_a\!\left[(U - a)^2\right]\) for the estimator \(U\).
    1. Sketch both risk functions on the same plot for \(a \in [-5, 5]\) with \(b = 3\), \(c = 1\), and \(n = 10\). Which estimator has lower risk near \(a = c\)? Which has lower risk when \(|a|\) is large?
    1. Which of the two estimators is the minimax estimator? Justify your answer.
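(Optional) For part (c), instead of sketching by hand you can estimate both risks by Monte Carlo, sampling the logistic by inverse-CDF; the parameter values are the ones given in the problem.

```python
import math, random

random.seed(4)
b, c, n, reps = 3.0, 1.0, 10, 40_000

def logistic_sample(a):
    # Inverse-CDF sampling: F^{-1}(u) = a + b * ln(u / (1 - u)).
    u = random.random()
    return a + b * math.log(u / (1 - u))

def risks(a):
    # Monte Carlo risk under quadratic loss for T = Xbar and U = (c + Xbar)/2.
    se_T = se_U = 0.0
    for _ in range(reps):
        xbar = sum(logistic_sample(a) for _ in range(n)) / n
        se_T += (xbar - a) ** 2
        se_U += ((c + xbar) / 2 - a) ** 2
    return se_T / reps, se_U / reps

rT_at_c, rU_at_c = risks(c)    # near a = c
rT_far, rU_far = risks(5.0)    # far from c
```

The printed numbers should match the pattern your sketch predicts: \(U\) wins near \(a = c\), \(T\) wins when \(|a - c|\) is large, and the risk of \(T\) does not depend on \(a\).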

06 Shrinking the Sample Mean

Let \(X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)\) with \(\sigma^2\) known. Instead of the usual \(\bar{X}\), consider the “shrinkage” estimator \[\hat{\mu}_c = c \cdot \bar{X}, \quad 0 < c < 1.\]

    1. Find the bias of \(\hat{\mu}_c\).
    1. Find the variance of \(\hat{\mu}_c\).
    1. Show that \(\mathrm{MSE}(\hat{\mu}_c) = c^2 \frac{\sigma^2}{n} + (1 - c)^2 \mu^2\).
    1. For what value of \(c\) is the MSE minimized? Is the optimal estimator biased?
    1. What is the problem with the “optimal” \(c\) from (d) in practice?
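(Optional) A grid search over \(c\) confirms your answer to (d) numerically, using the MSE expression from part (c); the values \(\mu = 2\), \(\sigma^2 = 4\), \(n = 10\) are arbitrary.

```python
mu, sigma2, n = 2.0, 4.0, 10

def mse(c):
    # MSE from part (c): variance term plus squared bias.
    return c * c * sigma2 / n + (1 - c) ** 2 * mu * mu

# Brute-force minimization over a fine grid of c in (0, 1):
grid = [k / 10000 for k in range(1, 10001)]
c_best = min(grid, key=mse)
# Note that c_best beats c = 1 (the unshrunk sample mean) in MSE,
# but computing it required knowing mu -- which is part (e)'s point.
```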

3) Fisher Information & Cramér–Rao

07 Fisher Information for the Exponential

Let \(X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(\lambda)\) with density \(f(x \mid \lambda) = \lambda e^{-\lambda x}\) for \(x \geq 0\).

    1. Compute the score function \(s(\lambda) = \frac{\partial}{\partial \lambda} \log f(X \mid \lambda)\).
    1. Verify that \(\mathbb{E}[s(\lambda)] = 0\).
    1. Compute the Fisher information \(I(\lambda) = \operatorname{Var}[s(\lambda)]\).
    1. Verify your answer in (c) by computing \(I(\lambda)\) via the second-derivative formula: \(I(\lambda) = -\mathbb{E}\!\left[\frac{\partial^2}{\partial\lambda^2}\log f(X \mid \lambda)\right]\).
    1. The MLE for \(\lambda\) is \(\hat{\lambda} = 1/\bar{X}\), which has \(\operatorname{Var}(\hat{\lambda}) \approx \lambda^2/n\) for large \(n\). Compare this with the Cramér–Rao lower bound. Is \(\hat{\lambda}\) asymptotically efficient?
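(Optional) A simulation check of the large-\(n\) approximation \(\operatorname{Var}(\hat{\lambda}) \approx \lambda^2/n\) quoted in (e), with arbitrary values \(\lambda = 2\), \(n = 200\):

```python
import random

random.seed(2)
lam, n, reps = 2.0, 200, 5000

# Monte Carlo distribution of the MLE lambda_hat = 1 / Xbar:
ests = []
for _ in range(reps):
    xbar = sum(random.expovariate(lam) for _ in range(n)) / n
    ests.append(1 / xbar)

mean_est = sum(ests) / reps
var_est = sum((e - mean_est) ** 2 for e in ests) / (reps - 1)

# Ratio of the simulated variance to lambda^2 / n; close to 1 for large n.
ratio = var_est / (lam * lam / n)
```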

08 Cramér–Rao: When Can We Beat \(1/n\)?

Let \(X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(p)\).

    1. We know \(I(p) = \frac{1}{p(1-p)}\). Write down the Cramér–Rao lower bound for any unbiased estimator of \(p\).
    1. The sample proportion \(\hat{p} = \bar{X}\) has \(\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}\). Is \(\hat{p}\) efficient?
    1. Now consider estimating \(g(p) = p^2\) instead of \(p\). The Cramér–Rao bound for unbiased estimators of \(g(\theta)\) is \[\operatorname{Var}(\hat{g}) \geq \frac{[g'(\theta)]^2}{n \cdot I(\theta)}.\] Compute the CR bound for unbiased estimators of \(p^2\).
    1. We showed in Problem 04(c) that \(V(\mathbf{X}) = \frac{U(U-1)}{n(n-1)}\) is unbiased for \(p^2\). Compute \(\operatorname{Var}(V)\) (at least for large \(n\)). Does it achieve the CR bound?
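(Optional) For part (d), you can simulate \(\operatorname{Var}(V)\) and compare it with the bound obtained by plugging \(g(p) = p^2\) and the given \(I(p)\) into the formula quoted in (c); the values \(p = 0.4\), \(n = 50\) are illustrative.

```python
import random

random.seed(3)
p, n, reps = 0.4, 50, 60_000

# Simulate V = U(U-1) / (n(n-1)) over Binomial(n, p) draws of U:
vals = []
for _ in range(reps):
    U = sum(1 for _ in range(n) if random.random() < p)
    vals.append(U * (U - 1) / (n * (n - 1)))

mean_V2 = sum(vals) / reps
var_V = sum((v - mean_V2) ** 2 for v in vals) / (reps - 1)

# Plug into the quoted bound [g'(p)]^2 / (n I(p)) with g(p) = p^2:
g_prime = 2 * p
info = 1 / (p * (1 - p))
crb = g_prime ** 2 / (n * info)
# The simulated variance should sit at or slightly above this bound.
```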

4) Admissibility & Minimax

09 Admissibility: A Sketch Exercise

Suppose there exist exactly three estimators \(T_1\), \(T_2\), and \(T_3\) for a parameter \(\theta \in [0, 1]\).

    1. Sketch an example of the MSE curves \(\mathrm{MSE}(T_i, \theta)\) as functions of \(\theta\) such that \(T_1\) and \(T_2\) are admissible, but \(T_3\) is not admissible. Explain why your sketch works.
    1. Now sketch (possibly different) risk functions for \(T_1\), \(T_2\), and \(T_3\) such that \(T_1\) is the minimax estimator. Must \(T_1\) have the lowest MSE everywhere?
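(Optional) If you prefer to prototype the sketches numerically before drawing them, here is one possible configuration of risk curves; it is an illustration under assumed toy risk functions, not the only valid answer.

```python
# Parameter grid on [0, 1]:
thetas = [k / 100 for k in range(101)]

# Part (a): T1 and T2 cross (neither dominates the other, so both are
# admissible among the three), while T3 is T1's risk plus a constant,
# hence dominated by T1 everywhere and inadmissible.
mse_T1 = [t ** 2 for t in thetas]        # small risk near theta = 0
mse_T2 = [(1 - t) ** 2 for t in thetas]  # small risk near theta = 1
mse_T3 = [t ** 2 + 0.1 for t in thetas]  # uniformly worse than T1

t1_dominates_t3 = all(a < b for a, b in zip(mse_T1, mse_T3))
curves_cross = mse_T1[0] < mse_T2[0] and mse_T1[-1] > mse_T2[-1]

# Part (b): a flat risk curve can be minimax (smallest worst-case risk)
# without having the lowest risk at every theta.
mse_flat = [0.5 for _ in thetas]
flat_is_minimax = max(mse_flat) < min(max(mse_T1), max(mse_T2))
lowest_everywhere = all(f <= a for f, a in zip(mse_flat, mse_T1))
```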

10 MSE Matrix

Consider the parameter vector \(\boldsymbol{\theta} = (\theta_1, \theta_2)^\top\) and an estimator \(\hat{\boldsymbol{\theta}} = (\hat{\theta}_1, \hat{\theta}_2)^\top\) with \[\mathbb{E}[\hat{\boldsymbol{\theta}}] = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \mathrm{Cov}(\hat{\boldsymbol{\theta}}) = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}.\]

The MSE matrix generalizes scalar MSE to vectors: \(\mathrm{MSE}(\hat{\boldsymbol{\theta}}) = \mathbb{E}\!\left[(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^\top\right]\).

    1. Show that \(\mathrm{MSE}(\hat{\boldsymbol{\theta}}) = \mathrm{Cov}(\hat{\boldsymbol{\theta}}) + \mathbf{b}\mathbf{b}^\top\), where \(\mathbf{b} = \mathbb{E}[\hat{\boldsymbol{\theta}}] - \boldsymbol{\theta}\) is the bias vector.
    1. Write out the MSE matrix explicitly in terms of \(\mu_1, \mu_2, \theta_1, \theta_2, \sigma_1^2, \sigma_2^2, \sigma_{12}\).
    1. The scalar total MSE is defined as \(\mathrm{tr}(\mathrm{MSE}) = \mathrm{MSE}(\hat{\theta}_1) + \mathrm{MSE}(\hat{\theta}_2)\). Write this in terms of the given quantities.
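(Optional) The identity in part (a) can be checked by simulation with an arbitrary bias vector and covariance matrix (a NumPy sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)

# True parameter and a deliberately biased, correlated estimator:
theta = np.array([1.0, -2.0])
bias = np.array([0.3, -0.1])
cov = np.array([[1.0, 0.4],
                [0.4, 2.0]])

# Draw estimator realizations: theta_hat ~ N(theta + bias, cov).
draws = rng.multivariate_normal(theta + bias, cov, size=500_000)

# Empirical MSE matrix E[(theta_hat - theta)(theta_hat - theta)^T]:
diff = draws - theta
mse_emp = diff.T @ diff / draws.shape[0]

# Identity to verify: MSE = Cov + b b^T; the scalar total MSE is its trace.
mse_theory = cov + np.outer(bias, bias)
total_mse = np.trace(mse_theory)
```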
