Feature Importance Visualization
Understanding which input features most significantly influence a model’s predictions is crucial for interpretation, debugging, and building trust in forecasting models. While overall importance scores are useful, visualizing how these importances compare across different contexts (e.g., different models, time periods, spatial regions) can reveal deeper insights [1][2].
k-diagram provides a specialized radar chart, the “Feature Fingerprint,” to visualize and compare these multi-dimensional feature importance profiles.
Summary of Feature-Based Functions
This section currently focuses on the primary function for visualizing feature importance profiles:
| Function | Description |
|---|---|
| plot_feature_fingerprint() | Creates a radar chart comparing feature importance profiles across different groups or layers. |
Detailed Explanations
Let’s explore the Feature Fingerprint plot.
Feature Importance Fingerprint (plot_feature_fingerprint())
Purpose: This function generates a polar radar chart that compares the importance or contribution profiles of multiple features across different groups, conditions, or models (referred to as “layers”). Each layer is drawn as a distinct colored polygon, creating a unique “fingerprint” of feature influence for that layer [3]. The chart makes it easy to identify dominant features, relative-importance patterns, and shifts in influence across the layers being compared. Whether the feature scores come from model-agnostic tools (e.g., permutation importance) or model-specific methods (e.g., gradient- or attention-based scores for TFT), the fingerprint synthesizes those signals into a single comparative view [1][2].
Mathematical Concept: Let \(\mathbf{R}\) be the input importances matrix of shape \((M, N)\), where \(M\) is the number of layers and \(N\) is the number of features.
Angle Assignment: Each feature \(j\) is assigned an axis on the radar chart at an evenly spaced angle:
\[\theta_j = \frac{2 \pi j}{N}, \quad j = 0, 1, \dots, N-1 \tag{1}\]
Radial Value (Importance): For each layer \(i\) and feature \(j\), the radial distance \(r_{ij}\) represents the importance value from the input matrix \(\mathbf{R}\).
Normalization (Optional): If normalize=True, the importances within each layer (row) \(i\) are scaled independently to the range [0, 1]:
\[r'_{ij} = \frac{r_{ij}}{\max_{k}(r_{ik})} \tag{2}\]
If the maximum importance in a layer is zero or less, the normalized values for that layer are set to zero. The radius plotted is then \(r'_{ij}\). If normalize=False, the raw radius \(r_{ij}\) is used.
Plotting: Points \((r, \theta)\) are plotted for each feature and connected to form a polygon for each layer. The shape is closed by connecting the last feature’s point back to the first. The area can optionally be filled (fill=True).
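The steps above can be sketched in a few lines of NumPy. This is an illustrative reimplementation of Eqs. (1) and (2) and of the polygon closure, not the library's internal code; the helper name `fingerprint_coordinates` and the sample matrix are hypothetical, and the [0, 1] range assumes non-negative importances.

```python
import numpy as np


def fingerprint_coordinates(importances, normalize=True):
    """Return closed (angles, radii) arrays for a radar-style fingerprint.

    Hypothetical helper for illustration only; it mirrors the equations
    in the text, not the actual k-diagram implementation.
    """
    R = np.asarray(importances, dtype=float)
    if R.ndim != 2:
        raise ValueError("importances must be a 2D array of shape (M, N)")
    n_features = R.shape[1]

    # Eq. (1): theta_j = 2 * pi * j / N for j = 0, ..., N-1
    angles = 2.0 * np.pi * np.arange(n_features) / n_features

    radii = R.copy()
    if normalize:
        # Eq. (2): divide each row (layer) by its own maximum.
        row_max = R.max(axis=1, keepdims=True)
        # Rows whose maximum is zero or negative are set to zero;
        # safe_max only avoids a division by zero for those rows.
        safe_max = np.where(row_max > 0, row_max, 1.0)
        radii = np.where(row_max > 0, R / safe_max, 0.0)

    # Close each polygon by repeating the first point at the end.
    angles_closed = np.concatenate([angles, angles[:1]])
    radii_closed = np.concatenate([radii, radii[:, :1]], axis=1)
    return angles_closed, radii_closed


# Made-up importances: 3 layers (rows) x 4 features (columns).
R = np.array([
    [0.2, 0.4, 0.1, 0.8],
    [10.0, 5.0, 20.0, 2.5],
    [0.0, 0.0, 0.0, 0.0],  # degenerate layer: maximum is zero
])
angles, radii = fingerprint_coordinates(R, normalize=True)
print(radii.round(3))
```

With normalization on, the first two layers both peak at 1.0 despite their very different raw scales, and the all-zero layer stays at zero rather than producing a division error.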
Interpretation:
Axes: Each angular axis corresponds to a specific input feature.
Polygons (Layers): Each colored polygon represents a different layer (e.g., Model A vs. Model B, or Zone 1 vs. Zone 2).
Radius: The distance from the center along a feature’s axis indicates the importance of that feature for a given layer.
Shape (Normalized View): When normalize=True, compare the shapes of the polygons. This highlights the relative importance patterns. Which features are most important within each layer, regardless of overall magnitude? Do different layers rely on vastly different feature subsets?
Size (Raw View): When normalize=False, compare the overall size of the polygons. A larger polygon indicates that the layer generally assigns higher absolute importance scores across features compared to a smaller polygon (though interpretation depends on the nature of the importance metric).
Dominant Features: Features corresponding to axes where polygons extend furthest are the most influential for those respective layers.
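The shape, size, and dominant-feature readings described above can also be checked numerically. The following is a rough sketch on made-up data: the feature names, layer labels, and the use of mean raw importance as a proxy for polygon "size" are all assumptions for illustration, not part of k-diagram.

```python
import numpy as np

# Hypothetical feature names, layer labels, and importance scores.
features = ["rainfall", "temperature", "wind", "pressure"]
layers = ["Model A", "Model B"]
R = np.array([
    [0.2, 0.4, 0.1, 0.8],
    [10.0, 5.0, 20.0, 2.5],
])

# Shape (normalized view): each row scaled by its own maximum,
# assuming every row has a positive maximum.
shape = R / R.max(axis=1, keepdims=True)

# Size (raw view): mean raw importance as a simple proxy for polygon size.
size = R.mean(axis=1)

# Dominant feature: the axis where each layer's polygon extends furthest.
dominant = [features[j] for j in R.argmax(axis=1)]

for name, dom, s in zip(layers, dominant, size):
    print(f"{name}: dominant feature = {dom}, mean raw importance = {s:.3f}")
```

Here the two layers differ greatly in raw size but can still be compared on shape, which is the situation where normalize=True is most informative.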
Use Cases:
Comparing Model Interpretations: Visualize and contrast feature importance derived from different model types (e.g., Random Forest vs. Gradient Boosting) trained on the same data.
Analyzing Importance Drift: Plot importance profiles calculated for different time periods or spatial regions to see if feature influence changes.
Identifying Characteristic Fingerprints: Understand the typical pattern of feature reliance for a specific system or model setup.
Debugging and Validation: Check if the feature importance profile aligns with domain knowledge or expectations.
Advantages (Polar/Radar Context):
Excellent for simultaneously comparing multiple multi-dimensional profiles (feature importance vectors) against a common set of axes (features).
The closed polygon shape provides a distinct visual “fingerprint” for each layer.
Makes it easy to spot the most dominant features (those axes with the largest radial values) for each layer.
Normalization allows comparing relative patterns effectively, even if absolute importance scales differ significantly between layers.
Example: (See the Gallery section for a runnable code example and plot)
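Independently of the gallery example, the idea behind the plot can be approximated with plain Matplotlib. The sketch below is not the plot_feature_fingerprint() implementation and does not use the k-diagram API; the feature names, layer labels, data, and output filename are hypothetical, and each layer is assumed to have a positive maximum.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical inputs: 2 layers x 5 features.
features = ["rainfall", "temperature", "wind", "pressure", "humidity"]
labels = ["Zone 1", "Zone 2"]
R = np.array([
    [0.30, 0.10, 0.05, 0.40, 0.15],
    [0.10, 0.35, 0.25, 0.05, 0.25],
])

# Per-layer normalization (Eq. 2), assuming each row maximum is positive.
radii = R / R.max(axis=1, keepdims=True)

# Evenly spaced feature axes (Eq. 1), closed by repeating the first angle.
n_features = len(features)
angles = 2.0 * np.pi * np.arange(n_features) / n_features
angles_closed = np.append(angles, angles[0])

fig, ax = plt.subplots(subplot_kw={"projection": "polar"}, figsize=(6, 6))
for label, row in zip(labels, radii):
    closed = np.append(row, row[0])  # connect the last point to the first
    (line,) = ax.plot(angles_closed, closed, label=label)
    ax.fill(angles_closed, closed, color=line.get_color(), alpha=0.2)

ax.set_xticks(angles)
ax.set_xticklabels(features)
ax.set_title("Feature fingerprint (Matplotlib sketch)")
ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1))
fig.savefig("feature_fingerprint_sketch.png", bbox_inches="tight")
```

Drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving if you are working interactively.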
References