> Ribeiro, Rita. _Imbalanced Regression and Extreme Value Prediction_.

# Imbalanced Regression and Extreme Value Prediction

## Motivation

- Standard predictive ML assumes uniform importance of values across $\mathcal{Y}$ → the most frequent cases are favoured.
- In imbalanced domains, importance is non-uniform → the cost of an error depends on the target value.
- Examples: minority classes in classification, high energy consumption in regression.
- In regression, the goal is to accurately predict extreme values of a continuous target.
- Two reasons why classification solutions do not transfer directly to regression:
  1. How to define non-uniform preferences over the continuous domain of the target variable?
     - A method based on utility-based regression → a relevance function emphasising extreme values, obtained with an automatic, non-parametric approach.
  2. How to properly evaluate models in an imbalanced regression setting, for model selection and optimisation?
     - A new metric that evaluates the ability of a model to predict extreme values.

## Relevance function

- $\phi: \mathcal{Y} \to [0, 1]$ maps each target value to its relevance (importance).
- The true relevance function is unknown in practice, so an approximation is necessary.
- Piecewise cubic Hermite splines interpolate a given set of control points $(x_i, \phi(x_i))$.
- The adjusted boxplot supplies the control points.
- Non-parametric, based on a robust measure of skewness, and avoids signalling "false" extreme values in skewed distributions.

## Evaluation metric

- Squared Error-Relevance with cutoff $t \in [0, 1]$: $\mathrm{SER}(t) = \sum_{i \in \mathcal{D}_t} (\hat{y}_i - y_i)^2$, where $\mathcal{D}_t = \{ i \in I \mid \phi(y_i) \geq t \}$.
- Squared Error-Relevance Area: $\mathrm{SERA} = \int_0^1 \mathrm{SER}(t) \, \mathrm{d}t$. Note that $\mathrm{SER}(0)$ is the total squared error; cases with higher relevance remain in $\mathcal{D}_t$ for more cutoffs and so weigh more heavily in SERA.
- ROPE (region of practical equivalence) is used for dominance analysis when comparing models.
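The relevance-function construction described above can be sketched in code. This is a minimal illustration, not the paper's exact implementation: it computes the adjusted-boxplot fences (Hubert & Vandervieren's variant, which uses the medcouple as a robust skewness measure, with a naive O(n²) medcouple), places control points at relevance 1 on the fences and 0 at the median, and interpolates with SciPy's `PchipInterpolator` (piecewise cubic Hermite splines). Function names such as `relevance_fn` are illustrative.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def medcouple(y):
    """Naive O(n^2) medcouple: a robust, non-parametric measure of skewness."""
    y = np.sort(np.asarray(y, dtype=float))
    m = np.median(y)
    # h(xi, xj) = ((xj - m) - (m - xi)) / (xj - xi) for xi <= m <= xj, xi != xj
    xi = y[y <= m][:, None]
    xj = y[y >= m][None, :]
    denom = xj - xi
    with np.errstate(invalid="ignore", divide="ignore"):
        h = ((xj - m) - (m - xi)) / denom
    return float(np.median(h[denom > 0]))

def relevance_fn(y):
    """Build phi: Y -> [0, 1] with relevance 0 at the median and 1 at
    (and beyond) the adjusted-boxplot fences."""
    y = np.asarray(y, dtype=float)
    q1, med, q3 = np.percentile(y, [25, 50, 75])
    iqr = q3 - q1
    mc = medcouple(y)
    # Adjusted-boxplot fences: exponential factors depend on the sign of the
    # medcouple, so the whiskers follow the skew of the distribution.
    if mc >= 0:
        lo = q1 - 1.5 * np.exp(-4 * mc) * iqr
        hi = q3 + 1.5 * np.exp(3 * mc) * iqr
    else:
        lo = q1 - 1.5 * np.exp(-3 * mc) * iqr
        hi = q3 + 1.5 * np.exp(4 * mc) * iqr
    # Control points (lo, 1), (med, 0), (hi, 1); PCHIP is monotone between
    # consecutive control points, so phi stays within [0, 1].
    spline = PchipInterpolator([lo, med, hi], [1.0, 0.0, 1.0], extrapolate=False)
    def phi(v):
        out = spline(np.asarray(v, dtype=float))
        return np.where(np.isnan(out), 1.0, out)  # beyond the fences -> 1
    return phi
```

Because the fences adapt to skewness, a long right tail does not get flagged wholesale as "extreme", which is the point of using the adjusted rather than the standard boxplot.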
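The SER/SERA definitions above translate directly into code. The sketch below assumes the relevance values $\phi(y_i)$ have already been computed for the evaluation set, and approximates the integral with a trapezoidal rule over a grid of cutoffs; the grid size and function names are illustrative choices, not part of the original formulation.

```python
import numpy as np

def ser(y_true, y_pred, rel, t):
    """Squared Error-Relevance at cutoff t: squared errors restricted to
    the cases whose relevance phi(y_i) is at least t."""
    y_true, y_pred, rel = (np.asarray(a, dtype=float) for a in (y_true, y_pred, rel))
    mask = rel >= t
    return float(np.sum((y_pred[mask] - y_true[mask]) ** 2))

def sera(y_true, y_pred, rel, steps=1000):
    """Squared Error-Relevance Area: integral of SER(t) over t in [0, 1],
    approximated with the trapezoidal rule on an even grid of cutoffs."""
    ts = np.linspace(0.0, 1.0, steps + 1)
    sers = np.array([ser(y_true, y_pred, rel, t) for t in ts])
    return float(np.sum((sers[:-1] + sers[1:]) / 2) * (1.0 / steps))
```

An error on a high-relevance case is counted at every cutoff up to its relevance, whereas an equally sized error on a low-relevance case drops out of the sum almost immediately, so SERA ranks the model that errs on extreme values as worse even when both models have the same total squared error.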