> The Weirdest Paradox in Statistics (and Machine Learning). Directed by Mathemaniac, 2022. YouTube, <https://www.youtube.com/watch?v=cUqoHQDinCM>.
# The James-Stein estimator
- For $p \ge 3$ independent unit-variance normal distributions with means $\mu_1, \dots, \mu_p$, each sampled once to give data points $x_1, \dots, x_p$, one of the best estimators of the means is:
$$
\left( 1 - \frac{p-2}{x_1^2 + \dots + x_p^2} \right) \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix}
$$
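A minimal NumPy sketch of this estimator (not from the video; the function name and the example means are my own choices, and unit variance is assumed):

```python
import numpy as np

def james_stein(x: np.ndarray) -> np.ndarray:
    """James-Stein estimate of the means from one observation per mean.

    Assumes each x[i] ~ N(mu[i], 1) independently, with len(x) >= 3.
    """
    p = x.size
    shrinkage = 1.0 - (p - 2) / np.sum(x ** 2)
    return shrinkage * x

# Hypothetical example: one noisy observation of five means.
rng = np.random.default_rng(0)
mu = np.array([1.0, -0.5, 2.0, 0.0, 1.5])   # true means (unknown in practice)
x = mu + rng.standard_normal(mu.size)       # one N(mu_i, 1) sample per mean
print(x, james_stein(x))                    # the estimate is pulled towards the origin
```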
- Why is it better than the ordinary estimator $\begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix}$?
- ==Better in the sense that it has a lower mean squared error (MSE)==
- Estimator A *dominates* estimator B if A's MSE is never higher than B's for any value of the means $\mu_i$, and strictly lower for at least one
- An estimator is *admissible* if no other estimator dominates it
- The ordinary estimator has an MSE of $p$ for all values of the $\mu_i$, while the James-Stein estimator's MSE is always strictly smaller than $p$
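A quick Monte Carlo check of that claim (a sketch, not from the video; $p = 5$, the particular means, and the trial count are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -0.5, 2.0, 0.0, 1.5])   # arbitrary true means, p = 5
p, n_trials = mu.size, 100_000

# One unit-variance observation of each mean, repeated over many trials.
x = mu + rng.standard_normal((n_trials, p))
shrinkage = 1.0 - (p - 2) / np.sum(x ** 2, axis=1, keepdims=True)
js = shrinkage * x

# MSE = expected total squared error over the p coordinates.
mse_ordinary = np.sum((x - mu) ** 2, axis=1).mean()   # close to p = 5
mse_js = np.sum((js - mu) ** 2, axis=1).mean()        # comes out below 5
print(mse_ordinary, mse_js)
```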
- The shrinkage factor $\left( 1 - \frac{p-2}{x_1^2 + \dots + x_p^2} \right)$ is usually between 0 and 1, but can be negative
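- For example (numbers mine, just to illustrate the sign): the factor is negative exactly when $x_1^2 + \dots + x_p^2 < p - 2$, e.g. with $p = 5$ and an observation very close to the origin:
$$
x_1^2 + \dots + x_5^2 = 1 \quad\Rightarrow\quad 1 - \frac{5-2}{1} = -2
$$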
- Only the points that fall in a roughly spherical region between the origin and the true mean end up farther from the mean after shrinkage, while all other points end up closer
- ==The James-Stein estimator is biased but has lower variance than the ordinary estimator==
- It is not admissible either: replacing the shrinkage factor with 0 whenever it is negative gives a better estimator (the positive-part James-Stein estimator)
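A sketch of that improved estimator (the function name is mine; same unit-variance assumption as above):

```python
import numpy as np

def positive_part_james_stein(x: np.ndarray) -> np.ndarray:
    """James-Stein estimate with the shrinkage factor clipped at 0.

    Assumes each x[i] ~ N(mu[i], 1) independently, with len(x) >= 3.
    """
    p = x.size
    shrinkage = max(0.0, 1.0 - (p - 2) / np.sum(x ** 2))
    return shrinkage * x
```

Clipping avoids the sign flip that a negative factor would cause; the clipped version dominates the plain James-Stein estimator.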