Statistics

Keith A. Lewis

January 26, 2025

Abstract
Observations of outcomes
Probability all the things!

Statistics

Statistics is the study of finding estimators for a statistic.

A statistic is a function $s$ from random variables to the real numbers $\bm{R}$. Estimators are a collection of functions $s_n\colon\bm{R}^n\to\bm{R}$, where $n\in\bm{N}$ is a natural number. Given independent $(X_1,\ldots,X_n)$ having the same law as $X$, how do we find estimators $s_n$ such that the random variable $s_n(X_1,\ldots,X_n)$ approximates the number $s(X)$ as $n$ gets large?

For example, if $s(X) = E[X]$ is the expected value of $X$, then the arithmetic means $m_n(x_1,\ldots,x_n) = (x_1 + \cdots + x_n)/n$ are estimators. If $M_n = m_n(X_1,\ldots,X_n)$ then $E[M_n] = E[X]$ and $\operatorname{Var}(M_n) = \operatorname{Var}(X)/n\to 0$ as $n\to\infty$. We say random variables $S_n$ converge to the number $s$ in mean square when $\lim_{n\to\infty}E[(S_n - s)^2] = 0$.
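As a quick sanity check, here is a minimal Monte Carlo sketch, assuming Python with NumPy and, purely for illustration, $X$ exponential with $E[X] = \operatorname{Var}(X) = 1$; the estimated mean squared error $E[(M_n - E[X])^2]$ should track $\operatorname{Var}(X)/n$.

```python
# Monte Carlo check that the arithmetic means converge to E[X] in mean square.
# Illustrative assumption: X ~ Exponential(1), so E[X] = 1 and Var(X) = 1.
import numpy as np

rng = np.random.default_rng(0)
EX, VarX = 1.0, 1.0
trials = 10_000

for n in (10, 100, 1000):
    X = rng.exponential(scale=1.0, size=(trials, n))
    M_n = X.mean(axis=1)                     # arithmetic mean estimator M_n
    mse = np.mean((M_n - EX) ** 2)           # estimate of E[(M_n - E[X])^2]
    print(f"n={n:5d}  E[(M_n - E[X])^2] ~ {mse:.5f}  Var(X)/n = {VarX / n:.5f}")
```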

If $s_n$ are estimators let $S_n = s_n(X_1,\ldots,X_n)$. Estimators with the property $E[S_n] = s(X)$, $n\in\bm{N}$, are unbiased.

We could also use the geometric means $g_n(x_1,\ldots,x_n) = \sqrt[n]{x_1\cdots x_n}$ as estimators for $E[X]$ if $X$ is positive. By Jensen's inequality $E[\sqrt[n]{X_1\cdots X_n}] \le E[X]$, so the geometric means are not unbiased.

Clearly $E[M_n\mid X_1 = x_1, \ldots, X_n = x_n] = E[M_n\mid M_n = m_n(x_1, \ldots, x_n)]$ since both are equal to $m_n(x_1, \ldots, x_n)$. If $s_n$ and $t_n$ are estimators with the property $E[S_n\mid X_1 = x_1, \ldots, X_n = x_n] = E[S_n\mid T_n = t_n(x_1,\ldots,x_n)]$, where $T_n = t_n(X_1,\ldots,X_n)$, then $t_n$ are sufficient for $s_n$.

Some statistics are better than others.

Bias

If $E[s_n(X_1,\ldots,X_n)] = \sigma$ we say $s_n$ is an unbiased estimator of $\sigma$. The arithmetic mean is an unbiased estimator of the mean. Since $E[(X_1\cdots X_n)^{1/n}] \le E[X_1\cdots X_n]^{1/n} = E[X]$ the geometric mean is biased.
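A small simulation, again assuming Python with NumPy and an illustrative positive $X$ (exponential with $E[X] = 1$), makes the bias visible: the geometric means average to something below $E[X]$ while the arithmetic means center on it.

```python
# Compare the arithmetic and geometric means as estimators of E[X] = 1.
# Illustrative assumption: X ~ Exponential(1), which is positive a.s.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 100_000
X = rng.exponential(scale=1.0, size=(trials, n))

arith = X.mean(axis=1)                    # m_n(X_1,...,X_n)
geom = np.exp(np.log(X).mean(axis=1))     # g_n(X_1,...,X_n), computed via logs

print("average of M_n:", arith.mean())    # close to 1 (unbiased)
print("average of G_n:", geom.mean())     # noticeably below 1 (biased low)
```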

Efficient

An unbiased statistic $s_n$ is efficient if it has the smallest variance among all unbiased statistics. This leaves open the possibility of biased statistics that have lower variance than efficient statistics.
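To make "smallest variance" concrete, here is a sketch assuming Python with NumPy and normally distributed observations: the sample mean and the sample median are both unbiased estimators of a normal mean, but the mean has the smaller variance (for the normal family it is the efficient choice, the median's variance being roughly $\pi/2$ times larger).

```python
# Two unbiased estimators of the mean of N(mu, 1): sample mean vs. sample median.
# The sample mean has the smaller variance; Var(median) is about (pi/2) Var(mean).
import numpy as np

rng = np.random.default_rng(0)
mu, n, trials = 3.0, 101, 50_000
X = rng.normal(loc=mu, scale=1.0, size=(trials, n))

means = X.mean(axis=1)
medians = np.median(X, axis=1)

print("Var(mean)   ~", means.var())       # about 1/n
print("Var(median) ~", medians.var())     # about pi/(2 n), larger
```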

Complete

A statistic $s\colon\bm{R}^n\to\bm{R}$ is complete if $E_\theta[g(s(X_1,\ldots,X_n))] = 0$ for all $\theta$ implies $g(s(X_1,\ldots,X_n)) = 0$ a.s.
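As a standard illustration (not specific to this note), suppose the $X_j$ are Bernoulli($\theta$), $0 < \theta < 1$, and $s(x_1,\ldots,x_n) = x_1 + \cdots + x_n$. Then
$$
E_\theta[g(s(X_1,\ldots,X_n))] = \sum_{k=0}^n g(k)\binom{n}{k}\theta^k(1-\theta)^{n-k},
$$
and if this vanishes for every $\theta$ then, dividing by $(1-\theta)^n$, the polynomial $\sum_{k=0}^n g(k)\binom{n}{k}\rho^k$ in $\rho = \theta/(1-\theta)$ vanishes on $(0,\infty)$, so every $g(k) = 0$. Hence the sum is complete.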

Sufficient

A statistic $t\colon\bm{R}^n\to\bm{R}$ is sufficient if the $n$ conditions $X_j = x_j$, $1\le j\le n$, can be replaced by the single condition $t(X_1,\ldots,X_n) = t(x_1,\ldots,x_n)$, i.e., $E_\theta[g(X)\mid X_j = x_j, 1\le j\le n] = E_\theta[g(X)\mid t(X_1,\ldots,X_n) = t(x_1,\ldots,x_n)]$.
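A minimal Monte Carlo sketch of this idea, assuming Python with NumPy and Bernoulli($\theta$) data, where the sum $t(x_1,\ldots,x_n) = x_1 + \cdots + x_n$ is the classic sufficient statistic: conditioning on the sum gives the same conditional behavior whatever $\theta$ is, so the sum loses no information about $\theta$.

```python
# Sufficiency sketch: for Bernoulli(theta) data, the conditional law of the
# sample given the sum t(X) = X_1 + ... + X_n does not depend on theta.
# Check: P(X_1 = 1 | sum = k) is about k/n for two quite different thetas.
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 10, 4, 200_000

for theta in (0.3, 0.7):
    X = rng.binomial(1, theta, size=(trials, n))
    rows = X[X.sum(axis=1) == k]          # keep samples with t(X) = k
    p1 = rows[:, 0].mean()                # estimate of P(X_1 = 1 | sum = k)
    print(f"theta={theta}: P(X_1=1 | sum={k}) ~ {p1:.3f}   (k/n = {k/n})")
```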

Statistic

Given two statistics $s$ and $t$ for $\sigma$ where $t$ is sufficient, let $\delta(X) = E[s(X)\mid t(X)]$ be the improved estimator. The following theorem justifies this name.

Theorem. (Rao–Blackwell–Kolmogorov) $E[(\delta(X) - \sigma)^2] \le E[(s(X) - \sigma)^2]$.

Proof. Since $\delta(X) = E[s(X)\mid t(X)]$, the conditional Jensen inequality gives $(\delta(X) - \sigma)^2 = (E[s(X) - \sigma\mid t(X)])^2 \le E[(s(X) - \sigma)^2\mid t(X)]$. Taking expectations of both sides gives the result.
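A small simulation of the theorem, assuming Python with NumPy, Bernoulli($\theta$) data, the crude unbiased estimator $s(X) = X_1$, and the sufficient statistic $t(X) = X_1 + \cdots + X_n$, for which the improved estimator works out to $\delta(X) = E[X_1\mid t(X)] = t(X)/n$, the sample mean:

```python
# Rao-Blackwell sketch: improve the crude unbiased estimator s(X) = X_1 of
# theta by conditioning on the sufficient statistic t(X) = X_1 + ... + X_n.
# For Bernoulli data E[X_1 | t(X)] = t(X)/n, the sample mean.
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 0.3, 25, 100_000
X = rng.binomial(1, theta, size=(trials, n))

s = X[:, 0]                  # crude estimator: just the first observation
delta = X.mean(axis=1)       # improved estimator E[s(X) | t(X)]

print("MSE of s(X)     ~", np.mean((s - theta) ** 2))      # theta (1 - theta)
print("MSE of delta(X) ~", np.mean((delta - theta) ** 2))  # theta (1 - theta) / n
```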

Theorem. (Lehmann–Scheffé) If $t$ is a complete sufficient statistic and $E[g(t(X))] = \sigma$, then $g(t(X))$ is the (essentially unique) uniformly minimum variance unbiased estimator of $\sigma$.

See Lehmann–Scheffé Theorem

Population

Sampling from a population is not special. One always does this.

Hypothesis Testing

Given random variables $X_\theta$, $\theta\in\Theta$, and a partition $\{\Theta_0,\Theta_1\}$ of $\Theta$, how can we decide if $\theta\in\Theta_0$ (the null hypothesis) or $\theta\in\Theta_1$ (the alternative hypothesis)?

This is done by designing tests and collecting data samples. A test is specified by a subset $\delta_0\subseteq\bm{R}^n$ called the critical region. A data sample is a collection of numbers $x = (x_1,\ldots,x_n)\in\bm{R}^n$. We reject the null hypothesis if the sample belongs to the critical region.

Let $X = (X_1,\ldots,X_n)$ where the $X_j$ are iid random variables with the same law as $X_\theta$. The power function of the test $\delta_0$ is $\pi(\theta) = P(X\in\delta_0\mid\theta)$. If $\pi = 1_{\Theta_1}$ then the test determines whether or not $\theta\in\Theta_0$ with probability 1.
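For instance, and purely as an illustrative assumption not taken from the text, let $X_\theta \sim N(\theta, 1)$, $\Theta_0 = (-\infty, 0]$, $\Theta_1 = (0, \infty)$, and take the critical region $\delta_0 = \{x : m_n(x) > c\}$. Then $\pi(\theta) = P(X\in\delta_0\mid\theta) = 1 - \Phi(\sqrt{n}(c - \theta))$, which the sketch below (Python, standard library only) tabulates; it is near 0 deep inside $\Theta_0$ and near 1 deep inside $\Theta_1$, approximating $1_{\Theta_1}$.

```python
# Power function of the one-sided test "reject when the sample mean exceeds c"
# under the illustrative model X_theta ~ N(theta, 1) with n iid observations.
from math import erf, sqrt

def Phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power(theta: float, n: int, c: float) -> float:
    """pi(theta) = P(sample mean of n observations > c | theta)."""
    return 1.0 - Phi(sqrt(n) * (c - theta))

n, c = 25, 0.33
for theta in (-0.5, -0.1, 0.0, 0.1, 0.5):
    print(f"theta = {theta:+.1f}   pi(theta) = {power(theta, n, c):.3f}")
```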

Exercise. Show that if $\pi = 1_{\Theta_1}$ then $x\in\delta_0$ implies $P(\theta\in\Theta_1) = 1$.

Hint: If $x\in\delta_0$ then $P(X\in\delta_0\mid\theta)\not=0$, but $\pi(\theta) = 0$ for $\theta\in\Theta_0$.

This is usually not possible, so we look for tests that approximate the indicator function of $\Theta_1$. The size of a test $\delta_0$ is $\alpha = \sup_{\theta\in\Theta_0} \pi(\theta)$.
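In the illustrative normal example above, $\pi$ is increasing in $\theta$, so the supremum over $\Theta_0 = (-\infty, 0]$ is attained at $\theta = 0$ and the size is $\alpha = \pi(0) = 1 - \Phi(\sqrt{n}\,c)$. A short sketch, under the same assumptions as before:

```python
# Size of the one-sided test {mean > c}: alpha = sup over Theta_0 of pi(theta),
# which equals pi(0) here because pi is increasing in theta.
from math import erf, sqrt

def Phi(z: float) -> float:
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, c = 25, 0.33
alpha = 1.0 - Phi(sqrt(n) * c)
print(f"size alpha = {alpha:.3f}")   # c = 1.645 / sqrt(n) gives alpha close to 0.05
```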