> Ribeiro, Rita. _Imbalanced Regression and Extreme Value Prediction_.
# Imbalanced Regression and Extreme Value Prediction
## Motivation
- Standard predictive ML assumes uniform importance of values across $Y$ → the most frequent cases are favoured
- In imbalanced domains, importance is non-uniform → the cost of an error depends on the target value
- Examples: minority classes in classification, high energy consumption in regression
- In regression, the challenge is to accurately predict rare, extreme values of a continuous target
- Two challenges that distinguish imbalanced regression from imbalanced classification:
1. How to define non-uniform preferences over continuous domain of the target variable?
- Method based on utility-based regression → relevance function on extreme values with automatic and non-parametric approach
2. How to properly evaluate models in an imbalanced regression setting for model selection and optimisation?
- New metric that evaluates the ability of a model to predict extreme values
## Relevance function
- $\phi: \mathcal{Y} \to [0,1]$ maps each target value to a relevance score
- The true relevance function is usually unknown, so it must be approximated
- Approximation: piecewise cubic Hermite splines (PCHIP) fitted to a set of control points $(y_i, \phi(y_i))$
- Control points are supplied by the adjusted boxplot (based on the medcouple statistic)
- Non-parametric and robust to skewness, so it avoids signalling "false" extreme values in skewed distributions
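The construction above can be sketched in Python. This is a simplified illustration, not Ribeiro's exact procedure: it uses standard boxplot fences (1.5 × IQR) in place of the adjusted, medcouple-based boxplot, and assigns relevance 0 at the median and 1 at both fences, interpolating with SciPy's monotone PCHIP splines. The function name `relevance_function` is my own.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def relevance_function(y):
    """Build an approximate relevance function phi: Y -> [0, 1] from a
    target sample: boxplot fences supply control points with relevance 1,
    the median gets relevance 0, and monotone cubic (PCHIP) interpolation
    fills in between.  Simplification: standard boxplot fences are used
    here instead of the adjusted (medcouple-based) boxplot."""
    q1, med, q3 = np.percentile(y, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Control points: maximal relevance at both fences, zero at the median.
    xs = np.array([lo_fence, med, hi_fence])
    ys = np.array([1.0, 0.0, 1.0])
    spline = PchipInterpolator(xs, ys)

    def phi(v):
        v = np.asarray(v, dtype=float)
        # Beyond the fences relevance saturates at 1; clip guards rounding.
        inside = np.clip(spline(v), 0.0, 1.0)
        return np.where((v < lo_fence) | (v > hi_fence), 1.0, inside)

    return phi
```

PCHIP is shape-preserving, so relevance decreases monotonically from the lower fence to the median and increases monotonically from the median to the upper fence, never overshooting $[0,1]$ between control points.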
## Evaluation metric
- Squared Error Relevance at cutoff $t \in [0,1]$:
$$
\mathrm{SER}(t) = \sum_{i \in \mathcal{D}_t} (\hat{y}_i - y_i)^2
$$
where $\mathcal{D}_t = \{ i \in I \mid \phi(y_i) \geq t \}$ is the subset of cases whose true value has relevance at least $t$
- Squared Error-Relevance Area:
$$
\mathrm{SERA} = \int_0^1 \mathrm{SER}(t) \, \mathrm{d}t
$$
- ROPE (Region of Practical Equivalence) used for dominance analysis when comparing models
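The SER and SERA definitions above translate directly into code. A minimal sketch, assuming `phi` is a precomputed array of relevance values $\phi(y_i)$ and approximating the integral with the trapezoidal rule on an even grid of cutoffs (the function names and `n_steps` parameter are mine):

```python
import numpy as np


def ser(y_true, y_pred, phi, t):
    """Squared Error Relevance at cutoff t: squared errors summed over
    the cases whose true value has relevance at least t."""
    mask = phi >= t
    return float(np.sum((y_pred[mask] - y_true[mask]) ** 2))


def sera(y_true, y_pred, phi, n_steps=100):
    """Squared Error-Relevance Area: the integral of SER(t) over t in
    [0, 1], approximated by the trapezoidal rule."""
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    sers = np.array([ser(y_true, y_pred, phi, t) for t in ts])
    # Trapezoid: average adjacent SER values times the step width.
    return float(np.sum((sers[:-1] + sers[1:]) / 2.0) / n_steps)
```

Note how SERA weights errors by relevance implicitly: an error on a case with $\phi(y_i) = 1$ contributes to SER(t) at every cutoff and so counts fully in the integral, while an error on a low-relevance case drops out as soon as $t$ exceeds its relevance.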