kdiagram.plot.feature_based.plot_feature_interaction
- kdiagram.plot.feature_based.plot_feature_interaction(df, theta_col, r_col, color_col, *, statistic='mean', theta_period=None, theta_bins=24, r_bins=10, acov='default', mode='basic', title=None, figsize=(8, 8), cmap='viridis', show_grid=True, grid_props=None, mask_radius=False, savefig=None, edgecolor='none', linewidth=0.0, theta_ticks=None, theta_ticklabels=None, theta_tick_step=None, r_ticks=None, r_ticklabels=None, r_tick_step=None, dpi=300, ax=None)
Plots a polar heatmap of feature interactions.
This function visualizes how a target variable (color_col) changes based on the interaction between two features, one mapped to the angle and one to the radius. It is a powerful tool for discovering non-linear relationships and conditional patterns in the data.
- The angular position (θ) represents the binned values of the first feature (theta_col).
- The radial distance (r) represents the binned values of the second feature (r_col).
- The color of each polar sector represents the aggregated value (e.g., mean) of the target variable (color_col) for all data points that fall into that specific bin.
This plot is useful for identifying “hot spots” where a particular combination of feature values leads to a specific outcome, revealing complex interactions that are not visible from one-dimensional feature importance plots.
- Parameters:
- df
pd.DataFrame The input DataFrame containing the feature and target data.
- theta_col
str The name of the feature to be mapped to the angular axis. This is often a cyclical feature like “month” or “hour”.
- r_col
str The name of the feature to be mapped to the radial axis.
- color_col
str The name of the target column whose value will be represented by the color in each bin.
- statistic
str, default='mean' The aggregation function to apply to color_col within each bin (e.g., 'mean', 'median', 'std').
- theta_period
float, optional The period of the cyclical data in theta_col (e.g., 24 for hours, 12 for months). This ensures the data wraps correctly around the polar plot.
- theta_bins
int, default=24 The number of bins to create for the angular feature.
- r_bins
int, default=10 The number of bins to create for the radial feature.
- acov
{'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default' Angular coverage (span) of the plot:
  - 'default': \(2\pi\) (full circle)
  - 'half_circle': \(\pi\)
  - 'quarter_circle': \(\tfrac{\pi}{2}\)
  - 'eighth_circle': \(\tfrac{\pi}{4}\)
- mode
{'basic', 'annular'}, default='basic' The rendering mode for the plot:
  - 'basic': (Default) Renders a smooth heatmap using pcolormesh.
  - 'annular': Renders discrete, curved wedges (annular sectors) using polar bars. This is often clearer for binned data.
- title
str, optional The title for the plot. If None, a default is generated.
- figsize
tuple of (float, float), default=(8, 8) The figure size in inches.
- cmap
str, default='viridis' The colormap for the heatmap.
- show_grid
bool, default=True Toggle the visibility of the polar grid lines.
- grid_props
dict, optional Custom keyword arguments passed to the grid for styling.
- mask_radius
bool, default=False If True, hide the radial tick labels.
- edgecolor
str, default='none' Edge color for the wedges when mode='annular'.
- linewidth
float, default=0.0 Edge line width for the wedges when mode='annular'.
- theta_ticks
sequence of float, optional Specific locations for angular ticks, specified in the original data units of theta_col (e.g., [0, 6, 12, 18] for hours). If None, ticks are set automatically.
- theta_ticklabels
sequence, mapping, or callable, optional Custom labels for the theta_ticks.
  - sequence[str]: Must match the length of theta_ticks.
  - mapping[float, str]: Maps data values to labels (e.g., {12: "Noon", 16: "Close"}).
  - callable: A function f(value) -> str.
- theta_tick_step
float, optional If theta_ticks is not set, this generates ticks spaced by this step in the original data units (e.g., 1.0 for 1 hour).
- r_ticks
sequence of float, optional Specific locations for radial ticks, specified in the original data units of r_col.
- r_ticklabels
sequence, mapping, or callable, optional Custom labels for the r_ticks. See theta_ticklabels for format options (e.g., {-1: "Bearish", 1: "Bullish"}).
- r_tick_step
float, optional If r_ticks is not set, this generates ticks spaced by this step in the original data units.
- savefig
str, optional The file path to save the plot. If None, the plot is displayed interactively.
- dpi
int, default=300 The resolution (dots per inch) for the saved figure.
- Returns:
- ax
matplotlib.axes.Axes The Matplotlib Axes object containing the plot.
- Raises:
ValueError If any of the specified columns are not found in the DataFrame.
- Parameters:
df (DataFrame)
theta_col (str)
r_col (str)
color_col (str)
statistic (str)
theta_period (float | None)
theta_bins (int)
r_bins (int)
acov (Literal['default', 'half_circle', 'quarter_circle', 'eighth_circle'])
mode (Literal['basic', 'annular'])
title (str | None)
cmap (str)
show_grid (bool)
mask_radius (bool)
savefig (str | None)
edgecolor (str)
linewidth (float)
theta_ticklabels (Sequence[str] | Mapping[float, str] | Callable[[float], str] | None)
theta_tick_step (float | None)
r_ticklabels (Sequence[str] | Mapping[float, str] | Callable[[float], str] | None)
r_tick_step (float | None)
dpi (int)
ax (Axes | None)
See also
pandas.cut: Bin values into discrete intervals.
pandas.DataFrame.groupby: Group DataFrame using a mapper or by a Series of columns.
matplotlib.pyplot.pcolormesh: Create a pseudocolor plot with a non-regular rectangular grid.
Notes
This plot is a novel visualization method developed as part of the analytics framework in [1].
The heatmap is constructed by first binning the 2D polar space defined by theta_col and r_col. For each resulting polar sector, the specified statistic (e.g., mean) is calculated for all data points whose feature values fall within that sector. The resulting aggregate value is then mapped to a color, creating the heatmap effect.
Coordinate Mapping and Binning:
The angular data from theta_col, \(\theta_{data}\), is converted to radians \([0, 2\pi]\). If a period \(P\) is given, the mapping is:
\[\theta_{rad} = \left( \frac{\theta_{data} \bmod P}{P} \right) \cdot 2\pi \tag{1}\]
The data space is then divided into a grid of \(K_r \times K_{\theta}\) bins, where \(K_r\) is r_bins and \(K_{\theta}\) is theta_bins.
For each bin \(B_{ij}\), the aggregate value \(C_{ij}\) is computed from the target column color_col (\(z\)):
\[C_{ij} = \text{statistic}(\{z_k \mid (r_k, \theta_k) \in B_{ij}\}) \tag{2}\]
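The mapping in (1) and the aggregation in (2) can be sketched with pandas.cut and groupby. This is a minimal illustration under stated assumptions, not the library's actual implementation: the helper name polar_bin_aggregate is hypothetical, the angular bins are assumed to divide the full circle evenly, and the radial bins are assumed to span the observed range of r_col uniformly.

```python
import numpy as np
import pandas as pd


def polar_bin_aggregate(df, theta_col, r_col, color_col, *,
                        theta_period, theta_bins=24, r_bins=10,
                        statistic="mean"):
    """Hypothetical helper: return the (r_bins x theta_bins) grid of C_ij."""
    # Equation (1): wrap the angular feature into radians using the period P.
    theta_rad = (df[theta_col].to_numpy() % theta_period) / theta_period * 2 * np.pi

    # Assumed bin edges: uniform in angle, uniform over the observed radial range.
    theta_edges = np.linspace(0.0, 2 * np.pi, theta_bins + 1)
    r_edges = np.linspace(df[r_col].min(), df[r_col].max(), r_bins + 1)

    # Integer bin codes: i indexes the radial bin, j the angular bin.
    binned = pd.DataFrame({
        "i": pd.cut(df[r_col].to_numpy(), r_edges, labels=False, include_lowest=True),
        "j": pd.cut(theta_rad, theta_edges, labels=False, right=False),
        "z": df[color_col].to_numpy(),
    })

    # Equation (2): apply the statistic to z within each bin B_ij.
    grid = (
        binned.groupby(["i", "j"])["z"]
        .agg(statistic)
        .unstack("j")
        .reindex(index=range(r_bins), columns=range(theta_bins))
    )
    # Bins that received no data points stay NaN.
    return grid.to_numpy(), r_edges, theta_edges
```

The returned edges could then be passed, together with the grid, to a polar pcolormesh call; how the library itself handles empty bins or the radial range is not stated in this page.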
References
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from kdiagram.plot.feature_based import plot_feature_interaction
>>>
>>> # Simulate solar panel output data
>>> np.random.seed(0)
>>> n_points = 5000
>>> hour = np.random.uniform(0, 24, n_points)
>>> cloud = np.random.rand(n_points)
>>>
>>> # Output depends on the interaction of daylight and cloud cover
>>> daylight = np.sin(hour * np.pi / 24)**2
>>> cloud_factor = (1 - cloud**0.5)
>>> output = 100 * daylight * cloud_factor + np.random.rand(n_points) * 5
>>> output[(hour < 6) | (hour > 18)] = 0  # No output at night
>>>
>>> df_solar = pd.DataFrame({
...     'hour_of_day': hour,
...     'cloud_cover': cloud,
...     'panel_output': output
... })
>>>
>>> # Generate the plot
>>> ax = plot_feature_interaction(
...     df=df_solar,
...     theta_col='hour_of_day',
...     r_col='cloud_cover',
...     color_col='panel_output',
...     theta_period=24,
...     theta_bins=24,
...     r_bins=8,
...     cmap='inferno',
...     title='Solar Panel Output by Hour and Cloud Cover'
... )
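The three accepted forms of theta_ticklabels and r_ticklabels (sequence, mapping, callable) can be illustrated with a small resolver. The function resolve_ticklabels below is hypothetical and not part of kdiagram; in particular, falling back to str(value) for ticks missing from a mapping is an assumption, and the library may behave differently.

```python
from collections.abc import Mapping


def resolve_ticklabels(ticks, labels=None):
    """Hypothetical helper: turn a labels spec into one string per tick."""
    if labels is None:
        # No spec given: label each tick with its own value.
        return [str(t) for t in ticks]
    if callable(labels):
        # callable: f(value) -> str, applied to every tick.
        return [labels(t) for t in ticks]
    if isinstance(labels, Mapping):
        # mapping: look up each tick; unmapped ticks keep their value (assumption).
        return [labels.get(t, str(t)) for t in ticks]
    # sequence: must match the ticks one-to-one.
    labels = list(labels)
    if len(labels) != len(ticks):
        raise ValueError("a label sequence must match the length of the ticks")
    return labels


hours = [0, 6, 12, 18]
print(resolve_ticklabels(hours, ["Midnight", "6 AM", "Noon", "6 PM"]))
print(resolve_ticklabels(hours, {12: "Noon"}))
print(resolve_ticklabels(hours, lambda h: f"{h:02d}:00"))
```

A mapping is convenient when only a few ticks need names, while a callable suits uniform formatting such as clock times.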