Statistics

Keith A. Lewis

April 25, 2024

Abstract
Observations of outcomes
Probability all the things!

Statistics

Statistics is the study of finding estimators for a statistic.

A statistic is a function s from random variables to the real numbers \bm{R}. Estimators are a collection of functions s_n\colon\bm{R}^n\to\bm{R}, where n\in\bm{N} is a natural number. Given independent random variables X_1,\ldots,X_n, each having the same law as X, how do we find estimators s_n such that the random variable s_n(X_1,\ldots,X_n) approximates the number s(X) as n gets large?

For example, if s(X) = E[X] is the expected value of X, then the arithmetic means m_n(x_1,\ldots,x_n) = (x_1 + \cdots + x_n)/n are estimators. If M_n = m_n(X_1,\ldots,X_n) then E[M_n] = E[X] and, by independence, \operatorname{Var}(M_n) = \operatorname{Var}(X)/n\to 0 as n\to\infty. We say the random variables S_n converge to the number s in mean square when \lim_{n\to\infty}E[(S_n - s)^2] = 0, so M_n converges to E[X] in mean square.
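
The convergence is easy to see numerically. The following is a minimal sketch (not part of the text), assuming X is exponentially distributed with mean 1, so E[X] = 1 and \operatorname{Var}(X) = 1; any law with finite variance behaves the same way.

    # Monte Carlo check that M_n = (X_1 + ... + X_n)/n has E[M_n] = E[X]
    # and mean squared error E[(M_n - E[X])^2] = Var(X)/n -> 0.
    import numpy as np

    rng = np.random.default_rng(0)
    EX = 1.0  # E[X] for an exponential with mean 1

    for n in [10, 100, 1000]:
        X = rng.exponential(scale=1.0, size=(10_000, n))  # 10,000 samples of (X_1, ..., X_n)
        M_n = X.mean(axis=1)                              # the estimator m_n applied to each sample
        mse = np.mean((M_n - EX) ** 2)                    # estimates E[(M_n - E[X])^2]
        print(f"n = {n:4d}  E[M_n] ~ {M_n.mean():.4f}  mse ~ {mse:.5f}  Var(X)/n = {1/n:.5f}")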

If s_n are estimators let S_n = s_n(X_1,\ldots,X_n). Estimators with the property E[S_n] = s(X), n\in\bm{N}, are unbiased.

We could also use the geometric means g_n(x_1,\ldots,x_n) = \sqrt[n]{x_1\cdots x_n} as estimators for E[X] if X is positive. By Jensen's inequality E[\sqrt[n]{X_1\cdots X_n}] \le E[X], with strict inequality unless X is almost surely constant, so the geometric means are biased.
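
The bias shows up clearly in simulation. Another minimal sketch (my own illustration, not from the text), again assuming X is exponential with mean 1, so X > 0 and E[X] = 1:

    # The geometric mean G_n = (X_1 ... X_n)^{1/n} systematically underestimates E[X].
    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    X = rng.exponential(scale=1.0, size=(100_000, n))  # positive, E[X] = 1
    G_n = np.exp(np.log(X).mean(axis=1))               # geometric mean, computed via logs
    M_n = X.mean(axis=1)                               # arithmetic mean for comparison
    print("average G_n:", G_n.mean())  # well below 1: biased
    print("average M_n:", M_n.mean())  # close to 1: unbiased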

Clearly E[M_n\mid X_1 = x_1, \ldots, X_n = x_n] = E[M_n\mid M_n = m_n(x_1, \ldots, x_n)] since both are equal to m_n(x_1, \ldots, x_n). If s_n and t_n are estimators with the property E[S_n\mid X_1 = x_1, \ldots, X_n = x_n] = E[S_n\mid T_n = t_n(x_1,\ldots,x_n)], where T_n = t_n(X_1,\ldots,X_n), then t_n are sufficient for s_n.

Some statistics are better than other statistics.

Bias

If E[s_n(X_1,\ldots,X_n)] = \sigma, where \sigma is the value being estimated, we say s_n is an unbiased estimator of \sigma. The arithmetic mean is an unbiased estimator of the mean. Since E[(X_1\cdots X_n)^{1/n}] \le E[X_1\cdots X_n]^{1/n} = E[X], by Jensen's inequality and independence, the geometric mean is biased.

Efficient

An unbiased statistic s_n is efficient if it has the smallest variance among all unbiased statistics. This leaves open the possibility of biased statistics that have lower variance than efficient statistics.

Complete

A statistic s\colon\bm{R}^n\to\bm{R} is complete if, for every function g, E_\theta[g(s(X_1,\ldots,X_n))] = 0 for all \theta implies g(s(X_1,\ldots,X_n)) = 0 a.s., where E_\theta denotes expectation when the parameter is \theta.

Sufficient

A statistic t\colon\bm{R}^n\to\bm{R} is sufficient if the n conditions X_j = x_j, 1\le j\le n, can be replaced by the single condition t(X_1,\ldots,X_n) = t(x_1,\ldots,x_n), i.e., E_\theta[g(X)\mid X_j = x_j, 1\le j\le n] = E_\theta[g(X)\mid t(X_1,\ldots,X_n) = t(x_1,\ldots,x_n)] for every function g, where X = (X_1,\ldots,X_n).

Statistic

Given two statistics s and t for \sigma, where t is sufficient, let \delta(X) = E[s(X)\mid t(X)] be the improved estimator. The following theorem justifies this name.

Theorem. (Rao–Blackwell–Kolmogorov) E[(\delta(X) - \sigma)^2] \le E[(s(X) - \sigma)^2].

Proof. Since \delta(X) - \sigma = E[s(X) - \sigma\mid t(X)], the conditional Jensen inequality gives (\delta(X) - \sigma)^2 \le E[(s(X) - \sigma)^2\mid t(X)]. Taking expectations and using E[E[\,\cdot\mid t(X)]] = E[\,\cdot\,] yields E[(\delta(X) - \sigma)^2] \le E[(s(X) - \sigma)^2].
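
A concrete illustration of the variance reduction, as a sketch under assumptions not in the text: take X_1,\ldots,X_n iid Bernoulli(p), s(X) = X_1 (unbiased for p), and the sufficient statistic t(X) = X_1 + \cdots + X_n, so that \delta(X) = E[X_1\mid t(X)] = t(X)/n by symmetry. The values p = 0.3 and n = 25 below are arbitrary choices.

    # Rao-Blackwell for Bernoulli(p): both s(X) = X_1 and delta(X) = t(X)/n are
    # unbiased for p, but delta has variance p(1-p)/n instead of p(1-p).
    import numpy as np

    rng = np.random.default_rng(0)
    p, n = 0.3, 25                              # arbitrary illustrative choices
    X = rng.binomial(1, p, size=(100_000, n))   # 100,000 samples of (X_1, ..., X_n)

    s = X[:, 0]              # crude unbiased estimator s(X) = X_1
    delta = X.mean(axis=1)   # improved estimator delta(X) = E[X_1 | t(X)] = t(X)/n

    print("s:     bias ~", s.mean() - p,     " variance ~", s.var())
    print("delta: bias ~", delta.mean() - p, " variance ~", delta.var())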

Theorem. (Lehmann–Scheffé) If t is a complete sufficient statistic and \delta(X) = E[s(X)\mid t(X)] is unbiased for \sigma, then \delta(X) is the (essentially unique) unbiased estimator of \sigma with minimum variance.

See the Lehmann–Scheffé theorem for a proof.

Population

Sampling from a population is not special. It is exactly the setup above: independent random variables X_1,\ldots,X_n having the same law as X.

Hypothesis Testing

Given random variables X_\theta, \theta\in\Theta, and a partition \{\Theta_0,\Theta_1\} of \Theta, how can we decide whether \theta\in\Theta_0 (the null hypothesis) or \theta\in\Theta_1 (the alternative hypothesis)?

This is done by designing tests and collecting data samples. A test is a subset \delta_0\subseteq \bm{R}^n called the critical region. A data sample is a collection of numbers x = (x_1,\ldots,x_n)\in\bm{R}^n. We reject the null hypothesis if the sample belongs to the critical region.

Let X = (X_1,\ldots,X_n), where the X_j are iid random variables with the same law as X_\theta. The power function of a test \delta_0 is \pi(\theta) = P(X\in\delta_0\mid\theta). If \pi = 1_{\Theta_1} then the test determines whether or not \theta\in\Theta_0 with probability 1.

Exercise. Show that if \pi = 1_{\Theta_1} then observing a sample x\in\delta_0 implies \theta\in\Theta_1.

Hint: If \theta\in\Theta_0 then P(X\in\delta_0\mid\theta) = \pi(\theta) = 0, so a sample in \delta_0 cannot occur.

This is usually not possible, so we look for tests whose power function approximates the indicator function of \Theta_1. The size of a test \delta_0 is \alpha = \sup_{\theta\in\Theta_0} \pi(\theta).
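
To make the definitions concrete, here is a sketch of estimating the power function and size by simulation. The setup is my own illustration, not from the text: X_\theta is normal with mean \theta and variance 1, \Theta_0 = (-\infty, 0], \Theta_1 = (0,\infty), and the critical region is \delta_0 = \{x\in\bm{R}^n : m_n(x) > c\}. The helper power and the cutoff c are illustrative names and choices.

    # Monte Carlo estimate of the power function pi(theta) = P(X in delta_0 | theta)
    # and the size alpha = sup_{theta in Theta_0} pi(theta) for the test
    # delta_0 = { x : mean(x) > c } with X_1, ..., X_n iid N(theta, 1).
    import numpy as np

    rng = np.random.default_rng(0)
    n, c, trials = 25, 0.329, 100_000   # c ~ 1.645/sqrt(n), chosen so the size is about 5%

    def power(theta):
        X = rng.normal(loc=theta, scale=1.0, size=(trials, n))
        return np.mean(X.mean(axis=1) > c)   # fraction of samples landing in delta_0

    for theta in [-0.5, -0.1, 0.0, 0.1, 0.5]:
        print(f"theta = {theta:+.1f}  pi(theta) ~ {power(theta):.3f}")

    # pi is increasing in theta, so the supremum over Theta_0 is attained at theta = 0.
    print("size alpha ~", power(0.0))

The estimated power is near 0 well inside \Theta_0, near 1 well inside \Theta_1, and the size is the value at the boundary \theta = 0, so this test approximates 1_{\Theta_1} only away from the boundary.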