kdiagram.plot.feature_based.plot_feature_interaction

kdiagram.plot.feature_based.plot_feature_interaction(df, theta_col, r_col, color_col, *, statistic='mean', theta_period=None, theta_bins=24, r_bins=10, acov='default', mode='basic', title=None, figsize=(8, 8), cmap='viridis', show_grid=True, grid_props=None, mask_radius=False, savefig=None, edgecolor='none', linewidth=0.0, theta_ticks=None, theta_ticklabels=None, theta_tick_step=None, r_ticks=None, r_ticklabels=None, r_tick_step=None, dpi=300, ax=None)

Plots a polar heatmap of feature interactions.

This function visualizes how a target variable (color_col) changes with the interaction between two features, one mapped to the angle and one to the radius. It helps reveal non-linear relationships and conditional patterns in the data.

  • The angular position (θ) represents the binned values of the first feature (theta_col).

  • The radial distance (r) represents the binned values of the second feature (r_col).

  • The color of each polar sector represents the aggregated value (e.g., mean) of the target variable (color_col) for all data points that fall into that specific bin.

This plot is useful for identifying “hot spots” where a particular combination of feature values leads to a specific outcome, revealing complex interactions that are not visible from one-dimensional feature importance plots.

Parameters:
df : pd.DataFrame

The input DataFrame containing the feature and target data.

theta_col : str

The name of the feature to be mapped to the angular axis. This is often a cyclical feature like “month” or “hour”.

r_col : str

The name of the feature to be mapped to the radial axis.

color_col : str

The name of the target column whose value will be represented by the color in each bin.

statistic : str, default='mean'

The aggregation function to apply to color_col within each bin (e.g., 'mean', 'median', 'std').

theta_period : float, optional

The period of the cyclical data in theta_col (e.g., 24 for hours, 12 for months). This ensures the data wraps correctly around the polar plot.

theta_bins : int, default=24

The number of bins to create for the angular feature.

r_bins : int, default=10

The number of bins to create for the radial feature.

acov : {'default', 'half_circle', 'quarter_circle', 'eighth_circle'}, default='default'

Angular coverage (span) of the plot:

  • 'default': \(2\pi\) (full circle)

  • 'half_circle': \(\pi\)

  • 'quarter_circle': \(\tfrac{\pi}{2}\)

  • 'eighth_circle': \(\tfrac{\pi}{4}\)
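The four options above amount to a lookup from name to angular span. A minimal sketch of that lookup follows; `ACOV_SPAN` and `angular_span` are hypothetical names used for illustration and are not part of kdiagram's API.

```python
import math

# Hypothetical lookup mirroring the documented acov options.
# This is an illustration, not kdiagram's internal code.
ACOV_SPAN = {
    "default": 2 * math.pi,         # full circle
    "half_circle": math.pi,
    "quarter_circle": math.pi / 2,
    "eighth_circle": math.pi / 4,
}


def angular_span(acov="default"):
    """Return the angular span in radians for a documented acov value."""
    try:
        return ACOV_SPAN[acov]
    except KeyError:
        raise ValueError(f"unknown acov value: {acov!r}") from None


span = angular_span("quarter_circle")
print(f"quarter_circle spans {math.degrees(span):.1f} degrees")
```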

mode : {'basic', 'annular'}, default='basic'

The rendering mode for the plot:

  • 'basic': (Default) Renders a smooth heatmap using pcolormesh.

  • 'annular': Renders discrete, curved wedges (annular sectors) using polar bars. This is often clearer for binned data.

title : str, optional

The title for the plot. If None, a default is generated.

figsize : tuple of (float, float), default=(8, 8)

The figure size in inches.

cmap : str, default='viridis'

The colormap for the heatmap.

show_grid : bool, default=True

Toggle the visibility of the polar grid lines.

grid_props : dict, optional

Custom keyword arguments passed to the grid for styling.

mask_radius : bool, default=False

If True, hide the radial tick labels.

edgecolor : str, default='none'

Edge color for the wedges when mode='annular'.

linewidth : float, default=0.0

Edge line width for the wedges when mode='annular'.

theta_ticks : sequence of float, optional

Specific locations for angular ticks, specified in the original data units of theta_col (e.g., [0, 6, 12, 18] for hours). If None, ticks are set automatically.

theta_ticklabels : sequence, mapping, or callable, optional

Custom labels for the theta_ticks.

  • sequence[str]: Must match the length of theta_ticks.

  • mapping[float, str]: Maps data values to labels (e.g., {12: "Noon", 16: "Close"}).

  • callable: A function f(value) -> str.
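To make the three accepted label formats concrete, here is a minimal sketch of how such an argument could be turned into label strings. `resolve_tick_labels` is a hypothetical helper, not a kdiagram function, and the fallback for ticks missing from a mapping is an assumption.

```python
from collections.abc import Mapping, Sequence


def resolve_tick_labels(ticks, labels=None):
    """Hypothetical helper: turn a sequence, mapping, or callable into strings."""
    if labels is None:
        return [str(t) for t in ticks]
    if callable(labels):
        # callable: f(value) -> str, applied to every tick
        return [str(labels(t)) for t in ticks]
    if isinstance(labels, Mapping):
        # Assumption: ticks absent from the mapping keep their plain value.
        return [str(labels.get(t, t)) for t in ticks]
    if isinstance(labels, Sequence) and not isinstance(labels, str):
        if len(labels) != len(ticks):
            raise ValueError("labels must match the length of ticks")
        return [str(label) for label in labels]
    raise TypeError("labels must be a sequence, mapping, or callable")


ticks = [0, 6, 12, 18]
print(resolve_tick_labels(ticks, {12: "Noon"}))
print(resolve_tick_labels(ticks, lambda h: f"{h:02d}:00"))
```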

theta_tick_step : float, optional

If theta_ticks is not set, this generates ticks spaced by this step in the original data units (e.g., 1.0 for 1 hour).
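Because ticks are given in data units, they have to be converted to angles before they can be placed on the polar axis. The sketch below shows one way this could work for a periodic feature on a full circle; `step_ticks` and `ticks_to_radians` are hypothetical helpers, and starting the ticks at 0 is an assumption.

```python
import math


def step_ticks(period, step):
    """Hypothetical helper: ticks at 0, step, 2*step, ... strictly below period."""
    if step <= 0:
        raise ValueError("step must be positive")
    n = math.ceil(period / step)
    return [i * step for i in range(n)]


def ticks_to_radians(ticks, period, span=2 * math.pi):
    """Map data-unit ticks onto the angular axis, wrapping by the period."""
    return [(t % period) / period * span for t in ticks]


ticks = step_ticks(24, 6)              # hours: 0, 6, 12, 18
angles = ticks_to_radians(ticks, 24)   # matching angular positions in radians
for t, a in zip(ticks, angles):
    print(f"{t:>2} h -> {a:.3f} rad")
```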

r_ticks : sequence of float, optional

Specific locations for radial ticks, specified in the original data units of r_col.

r_ticklabels : sequence, mapping, or callable, optional

Custom labels for the r_ticks. See theta_ticklabels for format options (e.g., {-1: "Bearish", 1: "Bullish"}).

r_tick_step : float, optional

If r_ticks is not set, this generates ticks spaced by this step in the original data units.

savefig : str, optional

The file path to save the plot. If None, the plot is displayed interactively.

dpi : int, default=300

The resolution (dots per inch) for the saved figure.

Returns:
ax : matplotlib.axes.Axes

The Matplotlib Axes object containing the plot.

Raises:
ValueError

If any of the specified columns are not found in the DataFrame.

See also

pandas.cut

Bin values into discrete intervals.

pandas.DataFrame.groupby

Group DataFrame using a mapper or by a Series of columns.

matplotlib.pyplot.pcolormesh

Create a pseudocolor plot with a non-regular rectangular grid.

Notes

This plot is a novel visualization method developed as part of the analytics framework in [1].

The heatmap is constructed by first binning the 2D polar space defined by theta_col and r_col. For each resulting polar sector, the specified statistic (e.g., mean) is calculated for all data points whose feature values fall within that sector. The resulting aggregate value is then mapped to a color, creating the heatmap effect.

Coordinate Mapping and Binning:

  1. The angular data from theta_col, \(\theta_{data}\), is converted to radians. If a period \(P\) is given, the values are wrapped onto \([0, 2\pi)\) by the mapping:

    \[\theta_{rad} = \left( \frac{\theta_{data} \bmod P}{P} \right) \cdot 2\pi \tag{1}\]

  2. The data space is then divided into a grid of \(K_r \times K_{\theta}\) bins, where \(K_r\) is r_bins and \(K_{\theta}\) is theta_bins.

  3. For each bin \(B_{ij}\), the aggregate value \(C_{ij}\) is computed from the target column color_col (\(z\)):

    \[C_{ij} = \text{statistic}(\{z_k \mid (r_k, \theta_k) \in B_{ij}\}) \tag{2}\]
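The three steps above can be sketched with pandas.cut and groupby. This is an illustration under stated assumptions, not kdiagram's implementation: `polar_bin_aggregate` is a hypothetical name, a period is required, the plot is assumed to cover the full circle, and radial bin edges are assumed to be equally spaced between the minimum and maximum of r_col.

```python
import numpy as np
import pandas as pd


def polar_bin_aggregate(df, theta_col, r_col, color_col, *, theta_period,
                        statistic="mean", theta_bins=24, r_bins=10):
    """Hypothetical sketch: return an (r_bins x theta_bins) grid of C_ij."""
    # Step 1: wrap by the period and scale to [0, 2*pi), as in Equation (1).
    theta = df[theta_col].to_numpy(dtype=float)
    theta_rad = (theta % theta_period) / theta_period * 2 * np.pi

    # Step 2: build a K_r x K_theta grid of equal-width bins (an assumption).
    r = df[r_col].to_numpy(dtype=float)
    theta_edges = np.linspace(0, 2 * np.pi, theta_bins + 1)
    r_edges = np.linspace(r.min(), r.max(), r_bins + 1)
    binned = pd.DataFrame({
        "theta_bin": pd.cut(theta_rad, theta_edges, labels=False,
                            include_lowest=True),
        "r_bin": pd.cut(r, r_edges, labels=False, include_lowest=True),
        "z": df[color_col].to_numpy(),
    })

    # Step 3: aggregate z within each bin, as in Equation (2).
    grid = (binned.groupby(["r_bin", "theta_bin"])["z"]
            .agg(statistic)
            .unstack("theta_bin"))
    # Empty bins become NaN so the grid always has the full shape.
    return grid.reindex(index=range(r_bins), columns=range(theta_bins))


demo = pd.DataFrame({
    "hour": [1, 1, 13, 25],        # 25 wraps to 1 with a period of 24
    "cloud": [0.0, 0.0, 1.0, 1.0],
    "output": [10, 20, 30, 40],
})
grid = polar_bin_aggregate(demo, "hour", "cloud", "output",
                           theta_period=24, theta_bins=2, r_bins=2)
print(grid)
```

Rows of the returned grid index the radial bins and columns index the angular bins, which is the layout a pcolormesh-style renderer would expect for the color array.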

References

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from kdiagram.plot.feature_based import plot_feature_interaction
>>>
>>> # Simulate solar panel output data
>>> np.random.seed(0)
>>> n_points = 5000
>>> hour = np.random.uniform(0, 24, n_points)
>>> cloud = np.random.rand(n_points)
>>>
>>> # Output depends on the interaction of daylight and cloud cover
>>> daylight = np.sin(hour * np.pi / 24)**2
>>> cloud_factor = (1 - cloud**0.5)
>>> output = 100 * daylight * cloud_factor + np.random.rand(n_points) * 5
>>> output[(hour < 6) | (hour > 18)] = 0 # No output at night
>>>
>>> df_solar = pd.DataFrame({
...     'hour_of_day': hour,
...     'cloud_cover': cloud,
...     'panel_output': output
... })
>>>
>>> # Generate the plot
>>> ax = plot_feature_interaction(
...     df=df_solar,
...     theta_col='hour_of_day',
...     r_col='cloud_cover',
...     color_col='panel_output',
...     theta_period=24,
...     theta_bins=24,
...     r_bins=8,
...     cmap='inferno',
...     title='Solar Panel Output by Hour and Cloud Cover'
... )