kdiagram.plot.feature_based.plot_feature_interaction
- kdiagram.plot.feature_based.plot_feature_interaction(df, theta_col, r_col, color_col, *, statistic='mean', theta_period=None, theta_bins=24, r_bins=10, title=None, figsize=(8, 8), cmap='viridis', show_grid=True, grid_props=None, mask_radius=False, savefig=None, dpi=300)
Plots a polar heatmap of feature interactions.
This function visualizes how a target variable (color_col) changes based on the interaction between two features, one mapped to the angle and one to the radius. It is a powerful tool for discovering non-linear relationships and conditional patterns in the data.
- The angular position (θ) represents the binned values of the first feature (theta_col).
- The radial distance (r) represents the binned values of the second feature (r_col).
- The color of each polar sector represents the aggregated value (e.g., mean) of the target variable (color_col) for all data points that fall into that specific bin.
This plot is useful for identifying “hot spots” where a particular combination of feature values leads to a specific outcome, revealing complex interactions that are not visible from one-dimensional feature importance plots.
- Parameters:
  - df : pd.DataFrame
    The input DataFrame containing the feature and target data.
  - theta_col : str
    The name of the feature to be mapped to the angular axis. This is often a cyclical feature like "month" or "hour".
  - r_col : str
    The name of the feature to be mapped to the radial axis.
  - color_col : str
    The name of the target column whose value will be represented by the color in each bin.
  - statistic : str, default='mean'
    The aggregation function to apply to color_col within each bin (e.g., 'mean', 'median', 'std').
  - theta_period : float, optional
    The period of the cyclical data in theta_col (e.g., 24 for hours, 12 for months). This ensures the data wraps correctly around the polar plot.
  - theta_bins : int, default=24
    The number of bins to create for the angular feature.
  - r_bins : int, default=10
    The number of bins to create for the radial feature.
  - title : str, optional
    The title for the plot. If None, a default is generated.
  - figsize : tuple of (float, float), default=(8, 8)
    The figure size in inches.
  - cmap : str, default='viridis'
    The colormap for the heatmap.
  - show_grid : bool, default=True
    Toggle the visibility of the polar grid lines.
  - grid_props : dict, optional
    Custom keyword arguments passed to the grid for styling.
  - mask_radius : bool, default=False
    If True, hide the radial tick labels.
  - savefig : str, optional
    The file path to save the plot. If None, the plot is displayed interactively.
  - dpi : int, default=300
    The resolution (dots per inch) for the saved figure.
- Returns:
  - ax : matplotlib.axes.Axes
    The Matplotlib Axes object containing the plot.
- Raises:
  - ValueError
    If any of the specified columns are not found in the DataFrame.
See also
- pandas.cut: Bin values into discrete intervals.
- pandas.DataFrame.groupby: Group DataFrame using a mapper or by a Series of columns.
- matplotlib.pyplot.pcolormesh: Create a pseudocolor plot with a non-regular rectangular grid.
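To show how pcolormesh relates to this kind of plot, here is a minimal sketch that draws a grid of per-bin values on polar axes. It is not the library's rendering code: the grid is random placeholder data, and the bin edges and variable names are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Placeholder per-bin values with shape (r_bins, theta_bins); in practice
# these would be the aggregated color_col values for each polar sector.
rng = np.random.default_rng(0)
r_bins, theta_bins = 8, 24
grid = rng.random((r_bins, theta_bins))

# Bin edges: one more edge than bins along each axis (assumed equal-width).
theta_edges = np.linspace(0.0, 2 * np.pi, theta_bins + 1)
r_edges = np.linspace(0.0, 1.0, r_bins + 1)

fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={"projection": "polar"})
# On polar axes the first coordinate is the angle and the second the radius.
mesh = ax.pcolormesh(theta_edges, r_edges, grid, cmap="viridis", shading="flat")
fig.colorbar(mesh, ax=ax, pad=0.1, label="aggregated value")
ax.set_title("Polar heatmap sketch")
```

With shading="flat", each cell is bounded by consecutive edges, so the edge arrays must be one element longer than the corresponding grid dimension.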
Notes
This plot is a novel visualization method developed as part of the analytics framework in [1].
The heatmap is constructed by first binning the 2D polar space defined by theta_col and r_col. For each resulting polar sector, the specified statistic (e.g., mean) is calculated for all data points whose feature values fall within that sector. The resulting aggregate value is then mapped to a color, creating the heatmap effect.

Coordinate Mapping and Binning:

The angular data from theta_col, \(\theta_{data}\), is converted to radians in \([0, 2\pi]\). If a period \(P\) is given, the mapping is:

\[\theta_{rad} = \left( \frac{\theta_{data} \bmod P}{P} \right) \cdot 2\pi \tag{1}\]

The data space is then divided into a grid of \(K_r \times K_{\theta}\) bins, where \(K_r\) is r_bins and \(K_{\theta}\) is theta_bins.

For each bin \(B_{ij}\), the aggregate value \(C_{ij}\) is computed from the target column color_col (\(z\)):

\[C_{ij} = \text{statistic}(\{z_k \mid (r_k, \theta_k) \in B_{ij}\}) \tag{2}\]
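As a rough illustration of Equations (1) and (2), the sketch below computes the grid of aggregate values with pandas.cut and groupby. The helper name polar_bin_statistic is hypothetical and this is not necessarily how the library implements the step. Two choices here are assumptions: equal-width radial bins between the minimum and maximum of r_col, and min-max scaling of the angle when no period is given.

```python
import numpy as np
import pandas as pd


def polar_bin_statistic(df, theta_col, r_col, color_col, *, statistic="mean",
                        theta_period=None, theta_bins=24, r_bins=10):
    """Hypothetical helper: return an (r_bins, theta_bins) array of C_ij."""
    theta = df[theta_col].to_numpy(dtype=float)
    if theta_period is not None:
        # Eq. (1): wrap by the period, then rescale to [0, 2*pi).
        theta_rad = (np.mod(theta, theta_period) / theta_period) * 2 * np.pi
    else:
        # Assumption: without a period, min-max scale onto [0, 2*pi].
        theta_rad = (theta - theta.min()) / (theta.max() - theta.min()) * 2 * np.pi

    r = df[r_col].to_numpy(dtype=float)
    theta_edges = np.linspace(0.0, 2 * np.pi, theta_bins + 1)
    r_edges = np.linspace(r.min(), r.max(), r_bins + 1)  # assumed equal-width

    # Assign each point to a radial bin index i and an angular bin index j.
    binned = pd.DataFrame({
        "i": pd.cut(r, bins=r_edges, labels=False, include_lowest=True),
        "j": pd.cut(theta_rad, bins=theta_edges, labels=False, include_lowest=True),
        "z": df[color_col].to_numpy(),
    })

    # Eq. (2): aggregate z over the points that fall in each bin B_ij.
    agg = binned.groupby(["i", "j"])["z"].agg(statistic)

    # Bins that received no points stay NaN.
    grid = np.full((r_bins, theta_bins), np.nan)
    i_idx = agg.index.get_level_values("i").to_numpy(dtype=int)
    j_idx = agg.index.get_level_values("j").to_numpy(dtype=int)
    grid[i_idx, j_idx] = agg.to_numpy()
    return grid


# Tiny usage example: hour 25 wraps to hour 1 when theta_period=24.
demo = pd.DataFrame({
    "h": [0.0, 1.0, 25.0, 18.0],
    "r": [0.0, 0.0, 0.0, 1.0],
    "z": [1.0, 3.0, 5.0, 10.0],
})
demo_grid = polar_bin_statistic(demo, "h", "r", "z",
                                theta_period=24, theta_bins=2, r_bins=2)
```

In the usage example, the first three rows land in the same sector (inner radius, first half of the circle), so that cell holds their mean; the two sectors with no points remain NaN.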
References
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from kdiagram.plot.feature_based import plot_feature_interaction
>>>
>>> # Simulate solar panel output data
>>> np.random.seed(0)
>>> n_points = 5000
>>> hour = np.random.uniform(0, 24, n_points)
>>> cloud = np.random.rand(n_points)
>>>
>>> # Output depends on the interaction of daylight and cloud cover
>>> daylight = np.sin(hour * np.pi / 24)**2
>>> cloud_factor = (1 - cloud**0.5)
>>> output = 100 * daylight * cloud_factor + np.random.rand(n_points) * 5
>>> output[(hour < 6) | (hour > 18)] = 0  # No output at night
>>>
>>> df_solar = pd.DataFrame({
...     'hour_of_day': hour,
...     'cloud_cover': cloud,
...     'panel_output': output
... })
>>>
>>> # Generate the plot
>>> ax = plot_feature_interaction(
...     df=df_solar,
...     theta_col='hour_of_day',
...     r_col='cloud_cover',
...     color_col='panel_output',
...     theta_period=24,
...     theta_bins=24,
...     r_bins=8,
...     cmap='inferno',
...     title='Solar Panel Output by Hour and Cloud Cover'
... )