> The Weirdest Paradox in Statistics (and Machine Learning). Directed by Mathemaniac, 2022. YouTube, <https://www.youtube.com/watch?v=cUqoHQDinCM>.

# The James-Stein estimator

- For $p$ independent normal distributions with means $\mu_1, \dots, \mu_p$ and unit variance, from which are respectively sampled $p$ data points $x_1, \dots, x_p$, one of the best estimators of the means (for $p \geq 3$) is:
$$ \left( 1 - \frac{p-2}{x_1^2 + \dots + x_p^2} \right) \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix} $$
- Why is it better than the ordinary estimator $\begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix}$?
- ==Better in the sense that it has lower mean squared error==
- Estimator A *dominates* B if A's MSE is no larger than B's for every value of the true means, and strictly smaller for at least one
- A is *admissible* if no other estimator dominates it
- The ordinary estimator has an MSE of $p$ for all values of $\mu_i$, while the James-Stein estimator has an MSE that is always smaller than $p$ (see the simulation sketch below)
- The shrinkage factor $\left( 1 - \frac{p-2}{x_1^2 + \dots + x_p^2} \right)$ is usually between 0 and 1, but sometimes negative
- Only the points in a spherical region between the mean and the origin end up farther from the mean after shrinkage, while all other points end up closer
- ==The James-Stein estimator is biased but has lower variance than the ordinary estimator==
- It is not admissible either: replacing the shrinkage factor by 0 whenever it is negative (the positive-part James-Stein estimator) improves the estimator (see the second sketch below)
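
A minimal Monte Carlo sketch (not from the video) of the domination claim, assuming unit-variance normals and one observation per coordinate; the true means `mu`, the seed, and the trial count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_trials = 5, 200_000
mu = np.array([1.0, -2.0, 0.5, 3.0, 0.0])      # arbitrary true means

# One observation x_i per coordinate, drawn from N(mu_i, 1).
x = rng.normal(loc=mu, scale=1.0, size=(n_trials, p))

# James-Stein: shrink each observed vector toward the origin.
shrink = 1 - (p - 2) / np.sum(x ** 2, axis=1, keepdims=True)
js = shrink * x

def total_mse(estimates):
    """Monte Carlo estimate of the summed MSE, sum_i E[(mu_hat_i - mu_i)^2]."""
    return np.mean(np.sum((estimates - mu) ** 2, axis=1))

print("ordinary estimator MSE:   ", total_mse(x))    # ~= p = 5 whatever mu is
print("James-Stein estimator MSE:", total_mse(js))   # consistently below 5
```

Changing `mu` to any other vector should leave the ordinary estimator's MSE at about $p$ while the James-Stein MSE stays below it, most dramatically when `mu` is close to the origin.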
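
The last bullet can be checked the same way. A small sketch under the same assumptions (unit variance, one observation per coordinate; the zero mean vector and seed are arbitrary), comparing the plain shrinkage factor with the positive-part variant near the origin, where the factor most often goes negative:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_trials = 5, 200_000
mu = np.zeros(p)                        # near the origin negative shrinkage is common
x = rng.normal(loc=mu, scale=1.0, size=(n_trials, p))

shrink = 1 - (p - 2) / np.sum(x ** 2, axis=1, keepdims=True)
js_plain = shrink * x                   # plain James-Stein
js_plus = np.maximum(shrink, 0.0) * x   # positive-part James-Stein: clip negative factors to 0

for name, est in [("plain JS", js_plain), ("positive-part JS", js_plus)]:
    print(name, np.mean(np.sum((est - mu) ** 2, axis=1)))   # positive-part is lower
```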