Model Comparison Visualization¶
Comparing the performance of different forecasting or simulation models is a common task in model development and selection. Often, evaluation requires looking at multiple performance metrics simultaneously to understand the trade-offs and overall suitability of each model for a specific application.
The kdiagram.plot.comparison module provides tools specifically
for this purpose, currently featuring radar charts for multi-metric,
multi-model comparisons.
Summary of Comparison Functions¶
Function |
Description |
|---|---|
Generates a radar chart comparing multiple models across various performance metrics (e.g., R2, MAE, Accuracy). |
Detailed Explanations¶
Let’s explore the model comparison function.
Multi-Metric Model Comparison (plot_model_comparison())¶
Purpose: This function generates a radar chart (also known as a spider or star chart) to visually compare the performance of multiple models across multiple evaluation metrics simultaneously. It provides a holistic snapshot of model strengths and weaknesses, making it easier to select the best model based on criteria beyond a single score. Optionally, training time can be included as an additional comparison axis.
Mathematical Concept:
For each model \(k\) (with predictions \(\hat{y}_k\)) and each chosen metric \(m\), a score \(S_{m,k}\) is calculated using the true values \(y_{true}\):
The metrics used can be standard ones (like R2, MAE, Accuracy, F1) or custom functions. If train_times are provided, they are treated as another dimension.
The scores for each metric \(m\) are typically scaled across the models (using scale=’norm’ for Min-Max or scale=’std’ for Standard Scaling) before plotting, to bring potentially different metric ranges onto a comparable radial axis:
Each metric \(m\) is assigned an angle \(\theta_m\) on the radar chart, and the scaled score \(S'_{m,k}\) determines the radial distance along that axis for model \(k\). These points are connected to form a polygon representing each model’s overall performance profile.
Interpretation:
Axes: Each axis radiating from the center represents a different performance metric (e.g., ‘r2’, ‘mae’, ‘accuracy’, ‘train_time_s’).
Polygons: Each colored polygon corresponds to a different model, as indicated by the legend.
Radius: The distance from the center along a metric’s axis shows the model’s (potentially scaled) score for that metric.
Important: By default (scale=’norm’ with internal inversion for error metrics), a larger radius generally indicates better performance (higher score for accuracy/R2, lower score for MAE/RMSE/MAPE/time after inversion during scaling). Check the scale parameter used. If scale=None, interpret radius based on the raw metric values.
Shape Comparison: Compare the overall shapes and sizes of the polygons. A model with a consistently large polygon across multiple desirable metrics might be considered the best overall performer. Different shapes highlight trade-offs (e.g., one model might excel in R2 but be slow, while another is fast but has lower R2).
Use Cases:
- Multi-Objective Model Selection: Choose the best model when
performance needs to be balanced across several, potentially conflicting, metrics (e.g., high accuracy vs. low error vs. fast training time).
- Visualizing Strengths/Weaknesses: Quickly identify which metrics
a particular model excels or struggles with compared to others.
- Communicating Comparative Performance: Provide stakeholders with
an intuitive visual summary of how different candidate models stack up against each other based on chosen criteria.
- Comparing Regression and Classification: Use appropriate default
or custom metrics to compare models for either task type.
Advantages (Radar Context):
- Effectively displays multiple performance dimensions (>2) for
multiple entities (models) in a single, relatively compact plot.
- Allows direct comparison of the profiles of different models
– are they generally good/bad, or strong in some areas and weak in others?
Facilitates the identification of trade-offs between different metrics.
Example: (See the Model Comparison Example in the Gallery) (Note: Ensure the label `_gallery_plot_model_comparison` exists before the corresponding example in your gallery file, likely `gallery/evaluation.rst` or `gallery/comparison.rst` if you create one.)