Statistics is the study of finding estimators of a statistic.
A statistic is a function s from random variables to the real numbers R. Estimators are collections of functions s_n : R^n → R, where n ∈ N is a natural number.
Given independent (X_1, …, X_n) having the same law as X, how do we find estimators s_n such that the random variable s_n(X_1, …, X_n) approximates the number s(X) as n gets large?
For example, if s(X) = E[X] is the expected value of X, then the arithmetic means m_n(x_1, …, x_n) = (x_1 + ⋯ + x_n)/n are estimators. If M_n = m_n(X_1, …, X_n) then E[M_n] = E[X] and Var(M_n) = Var(X)/n → 0 as n → ∞. We say random variables S_n converge to the number s in mean square when lim_{n→∞} E[(S_n − s)^2] = 0. Since E[M_n] = E[X], we have E[(M_n − E[X])^2] = Var(M_n) → 0, so the arithmetic means converge to E[X] in mean square.
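As a quick sanity check, here is a minimal simulation sketch (the exponential distribution and all specific numbers are my own choices, not from the text) in which the Monte Carlo estimate of E[(M_n − E[X])^2] shrinks like Var(X)/n:

```python
# Minimal sketch: estimate E[(M_n - E[X])^2] by simulation for exponential X
# with E[X] = Var(X) = 1 (an assumed choice of distribution).
import numpy as np

rng = np.random.default_rng(0)

def mean_squared_error(n, trials=10_000):
    # Draw `trials` independent copies of (X_1, ..., X_n) and form M_n for each.
    samples = rng.exponential(scale=1.0, size=(trials, n))
    m_n = samples.mean(axis=1)
    return np.mean((m_n - 1.0) ** 2)   # Monte Carlo estimate of E[(M_n - 1)^2]

for n in (1, 10, 100, 1000):
    print(n, mean_squared_error(n))    # roughly Var(X)/n = 1/n
```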
If s_n are estimators let S_n = s_n(X_1, …, X_n). Estimators with the property E[S_n] = s(X), n ∈ N, are unbiased.
We could also use the geometric means g_n(x_1, …, x_n) = (x_1 ⋯ x_n)^{1/n} as estimators of E[X] if X is positive. By Jensen's inequality E[(X_1 ⋯ X_n)^{1/n}] ≤ E[X], with strict inequality unless X is a.s. constant, so the geometric means are not unbiased.
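The bias is easy to see numerically; a minimal sketch (the lognormal distribution and the sample sizes are assumptions of mine, not from the text):

```python
# Minimal sketch: arithmetic vs. geometric mean as estimators of E[X] for a
# positive X; here X is lognormal(0, 1), so E[X] = exp(1/2).
import numpy as np

rng = np.random.default_rng(0)
n, trials = 20, 100_000
x = rng.lognormal(mean=0.0, sigma=1.0, size=(trials, n))

arithmetic = x.mean(axis=1)                 # m_n(X_1, ..., X_n)
geometric = np.exp(np.log(x).mean(axis=1))  # (X_1 ... X_n)^{1/n}

print("E[X]               ", np.exp(0.5))
print("average arithmetic ", arithmetic.mean())  # close to E[X]
print("average geometric  ", geometric.mean())   # well below E[X]
```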
Clearly E[M_n ∣ X_1 = x_1, …, X_n = x_n] = E[M_n ∣ M_n = m_n(x_1, …, x_n)] since both are equal to m_n(x_1, …, x_n). If s_n and t_n are estimators with the property E[S_n ∣ X_1 = x_1, …, X_n = x_n] = E[S_n ∣ T_n = t_n(x_1, …, x_n)], where T_n = t_n(X_1, …, X_n), then the t_n are sufficient for the s_n.
Some statistics are better than other statistics.
Bias
If E[s_n(X_1, …, X_n)] = σ we say s_n is an unbiased estimator of σ. The arithmetic mean is an unbiased estimator of the mean. Since E[(X_1 ⋯ X_n)^{1/n}] ≤ E[X_1 ⋯ X_n]^{1/n} = E[X], the geometric mean is biased.
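Spelled out (this display is my addition; it uses only independence, identical distribution, and Jensen's inequality for the concave function t ↦ t^{1/n} on (0, ∞)):
\[
E[(X_1 \cdots X_n)^{1/n}]
\le \bigl(E[X_1 \cdots X_n]\bigr)^{1/n}
= \bigl(E[X_1] \cdots E[X_n]\bigr)^{1/n}
= \bigl(E[X]^n\bigr)^{1/n}
= E[X].
\]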
Efficient
An unbiased statistic s_n is efficient if it has the smallest variance among all unbiased statistics. This leaves open the possibility of biased statistics that have lower variance than efficient statistics.
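A minimal numerical sketch of the idea (the normal model, the sample size, and the comparison estimator are assumptions of mine): the single observation X_1 and the arithmetic mean M_n are both unbiased for E[X], but M_n has far smaller variance.

```python
# Minimal sketch: two unbiased estimators of E[X] with very different variances.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 30, 100_000
x = rng.normal(loc=2.0, scale=1.0, size=(trials, n))   # assume X ~ N(2, 1)

first_obs = x[:, 0]           # the estimator s_n(x_1, ..., x_n) = x_1
sample_mean = x.mean(axis=1)  # the arithmetic mean m_n

print("bias     ", first_obs.mean() - 2.0, sample_mean.mean() - 2.0)  # both ~ 0
print("variance ", first_obs.var(), sample_mean.var())   # ~ 1 versus ~ 1/30
```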
Complete
A statistic s : R^n → R is complete if E_θ[g(s(X_1, …, X_n))] = 0 for all θ implies g(s(X_1, …, X_n)) = 0 a.s.
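A standard illustration (my addition, not from the text): if X_1, …, X_n are iid Bernoulli(θ) with 0 < θ < 1, then s(x_1, …, x_n) = x_1 + ⋯ + x_n is complete. Writing S = s(X_1, …, X_n),
\[
E_\theta[g(S)]
= \sum_{k=0}^{n} g(k) \binom{n}{k} \theta^k (1-\theta)^{n-k}
= (1-\theta)^n \sum_{k=0}^{n} g(k) \binom{n}{k}
  \Bigl(\frac{\theta}{1-\theta}\Bigr)^{k},
\]
and a polynomial in θ/(1 − θ) that vanishes for every θ in (0, 1) has all coefficients zero, so g(k) = 0 for 0 ≤ k ≤ n and g(S) = 0 a.s.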
Sufficient
A statistic t : R^n → R is sufficient if the n conditions X_j = x_j, 1 ≤ j ≤ n, can be replaced by the single condition t(X_1, …, X_n) = t(x_1, …, x_n), i.e., E_θ[g(X) ∣ X_j = x_j, 1 ≤ j ≤ n] = E_θ[g(X) ∣ t(X_1, …, X_n) = t(x_1, …, x_n)].
Statistic
Given two statistics s and t for σ, where t is sufficient, let δ(X) = E[s(X) ∣ t(X)] be the improved estimator. The following theorem justifies this name.
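The theorem referred to is presumably a form of the Rao–Blackwell theorem: δ(X) has the same expectation as s(X) and no larger variance. In this notation, by the tower property and the conditional variance formula,
\[
E[\delta(X)] = E\bigl[E[s(X) \mid t(X)]\bigr] = E[s(X)],
\]
\[
\operatorname{Var}(s(X))
= E\bigl[\operatorname{Var}(s(X) \mid t(X))\bigr]
+ \operatorname{Var}\bigl(E[s(X) \mid t(X)]\bigr)
\ge \operatorname{Var}(\delta(X)).
\]
Sufficiency of t is what guarantees that δ(X) = E[s(X) ∣ t(X)] does not depend on the unknown parameter, so it is a genuine statistic.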
Sampling from a population is not special. One always does
this.
Hypothesis Testing
Given random variables X_θ, θ ∈ Θ, and a partition {Θ_0, Θ_1} of Θ, how can we decide whether θ ∈ Θ_0 (the null hypothesis) or θ ∈ Θ_1 (the alternative hypothesis)?
This is done by designing tests and collecting data samples. A test is a subset δ_0 ⊆ R^n and is called the critical region. A data sample is a collection of numbers x = (x_1, …, x_n) ∈ R^n. We reject the null hypothesis if the sample belongs to this set.
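A minimal sketch of a test as a set (the normal model, sample size, and cutoff are all illustrative assumptions of mine): test the null hypothesis θ ≤ 0 against the alternative θ > 0 for X_i ~ N(θ, 1), with critical region δ_0 = {x ∈ R^n : mean(x) > c}.

```python
# Minimal sketch: a critical region for a normal-mean test and the decision it
# makes on one data sample.  All specific numbers are illustrative assumptions.
import numpy as np

n, c = 25, 0.33   # sample size and cutoff, chosen by hand for illustration

def in_critical_region(x):
    # The test: the sample x lies in δ_0 exactly when its mean exceeds c.
    return np.mean(x) > c

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=1.0, size=n)   # data generated with θ = 0.5
print("reject the null hypothesis:", bool(in_critical_region(sample)))
```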
Let X = (X_1, …, X_n) where the X_j are iid random variables with the same law as X_θ. The power function of a test δ_0 is π(θ) = P(X ∈ δ_0 ∣ θ). If π = 1_{Θ_1} then the test determines whether or not θ ∈ Θ_0 with probability 1.
Exercise. Show that if π = 1_{Θ_1} then x ∈ δ_0 implies P(θ ∈ Θ_1) = 1.
Hint: If x ∈ δ_0 then P(X ∈ δ_0 ∣ θ) cannot be 0, so π(θ) ≠ 0; since π only takes the values 0 and 1, π(θ) = 1 and θ ∈ Θ_1.
This is usually not possible, so we look for tests that approximate the indicator function of Θ_1. The size of a test δ_0 is α = sup_{θ ∈ Θ_0} π(θ).
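Continuing the illustrative normal-mean test above (same assumed n and c), the power function and the size have closed forms because mean(X) ~ N(θ, 1/n):

```python
# Minimal sketch: power function π(θ) = P(mean(X) > c | θ) and size of the test
# with critical region δ_0 = {x : mean(x) > c}, assuming X_i ~ N(θ, 1).
import numpy as np
from scipy.stats import norm

n, c = 25, 0.33

def power(theta):
    # mean(X) ~ N(θ, 1/n), so P(mean(X) > c | θ) = 1 - Φ((c - θ)·√n).
    return 1.0 - norm.cdf((c - theta) * np.sqrt(n))

for theta in (-0.5, 0.0, 0.33, 0.5, 1.0):
    print(f"power({theta:+.2f}) = {power(theta):.4f}")

# Power is increasing in θ, so the size sup over Θ_0 = (-∞, 0] is attained at θ = 0.
print("size:", round(power(0.0), 4))   # about 0.05 with these choices
```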