kdiagram.datasets.make_fingerprint_data

kdiagram.datasets.make_fingerprint_data(n_layers=3, n_features=8, layer_names=None, feature_names=None, value_range=(0.0, 1.0), sparsity=0.1, add_structure=True, seed=303, as_frame=False)

Generate synthetic feature-importance data for fingerprint plots.

Creates a matrix of feature-importance scores across multiple layers (e.g., models, periods, experimental groups) suitable for visualization with plot_feature_fingerprint(). This is handy for comparing profiles in a compact polar radar view and for testing feature-comparison workflows in forecasting and ML [1][2][3].

Parameters:

n_layers (int, default=3): Number of rows (layers) in the importance matrix. Each row represents a group such as a model or time period.
n_features (int, default=8): Number of columns (features) in the importance matrix.
layer_names (list of str, optional): Names for the layers. If None, generic names like 'Layer_A', 'Layer_B' are generated. Must have length n_layers if provided.
feature_names (list of str, optional): Names for the features. If None, generic names like 'Feature_1', 'Feature_2' are generated. Must have length n_features if provided.
value_range (tuple of (float, float), default=(0.0, 1.0)): Approximate sampling range (min_val, max_val) for raw importance scores. Values are drawn from a uniform distribution before structure and sparsity are applied.
sparsity (float, default=0.1): Fraction in [0, 1] of entries that are set to zero at random, simulating features that are unimportant for some layers.
add_structure (bool, default=True): If True, inject simple patterns to make fingerprints distinct, e.g., emphasizing one feature per layer and de-emphasizing another. If False, the matrix is fully random apart from sparsity.
seed (int or None, default=303): Seed for NumPy’s random generator. If None, a fresh RNG is used.
as_frame (bool, default=False): If False, return a Bunch with metadata and arrays. If True, return only the pandas DataFrame indexed by layers with feature columns.

Returns:

data : Bunch or pandas.DataFrame

If as_frame=False (default), a Bunch with:

importances : ndarray of shape (n_layers, n_features).
frame : pandas DataFrame view of the matrix with layers as index and features as columns.
layer_names : list of layer names.
feature_names : list of feature names.
DESCR : human-readable description.

If as_frame=True, only the pandas DataFrame is returned.
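
The fields described above are related in a simple way: frame is the importances array labeled with layer_names as the index and feature_names as the columns. The sketch below rebuilds that layout with NumPy and pandas on a hand-written array. The numeric values are placeholders chosen for illustration, not output of make_fingerprint_data.

```python
import numpy as np
import pandas as pd

# Placeholder values standing in for the `importances` field.
importances = np.array([[0.9, 0.1, 0.0],
                        [0.2, 0.7, 0.4]])
layer_names = ["Layer_A", "Layer_B"]
feature_names = ["Feature_1", "Feature_2", "Feature_3"]

# Layers become the index and features the columns, as described for `frame`.
frame = pd.DataFrame(importances, index=layer_names, columns=feature_names)

# Row-wise access gives one layer's profile across all features.
print(frame.loc["Layer_B"])

# The labeled frame converts back to the plain array without loss.
print(np.array_equal(frame.to_numpy(), importances))
```

Working with the frame is usually more convenient for inspection, while the raw array is handy when passing values to numerical code.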

Raises:

ValueError: If layer_names or feature_names lengths do not match the specified dimensions, if sparsity is outside [0, 1], or if value_range does not satisfy min_val <= max_val.
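
The conditions listed above amount to a handful of checks on the arguments. A minimal sketch of equivalent validation follows, written as a hypothetical helper named check_fingerprint_args. The library's actual checks, their order, and its error messages may differ.

```python
def check_fingerprint_args(n_layers, n_features, layer_names=None,
                           feature_names=None, value_range=(0.0, 1.0),
                           sparsity=0.1):
    """Raise ValueError for the argument combinations documented above.

    Hypothetical helper for illustration; not part of kdiagram.
    """
    if layer_names is not None and len(layer_names) != n_layers:
        raise ValueError(
            f"layer_names has {len(layer_names)} entries, expected {n_layers}"
        )
    if feature_names is not None and len(feature_names) != n_features:
        raise ValueError(
            f"feature_names has {len(feature_names)} entries, "
            f"expected {n_features}"
        )
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError(f"sparsity must be in [0, 1], got {sparsity}")
    min_val, max_val = value_range
    if min_val > max_val:
        raise ValueError(
            f"value_range must satisfy min_val <= max_val, got {value_range}"
        )


# Two names for three layers triggers the first check.
try:
    check_fingerprint_args(3, 5, layer_names=["L1", "L2"])
except ValueError as err:
    print(err)
```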

Return type:

Bunch | DataFrame

See also

kdiagram.plot.feature_based.plot_feature_fingerprint: Radar-style comparison of multi-feature profiles across layers.

Notes

Generation model. Let \(I \in \mathbb{R}^{L \times F}\) denote the importance matrix with \(L = \texttt{n\_layers}\) and \(F = \texttt{n\_features}\). Raw scores are sampled as

(1) \[I_{k,j}^{(0)} \sim \mathcal{U}(m, M), \qquad m = \texttt{value\_range[0]},\; M = \texttt{value\_range[1]}.\]

If structure is enabled, a layer-specific emphasis and de-emphasis may be applied, producing \(I^{(1)}\); otherwise \(I^{(1)} = I^{(0)}\). Finally, a sparsity mask \(M_{k,j} \sim \text{Bernoulli}(1-s)\) with \(s=\texttt{sparsity}\) is applied elementwise (the mask entries \(M_{k,j}\) are distinct from the scalar upper bound \(M\) in (1)):

(2) \[I_{k,j} \;=\; I_{k,j}^{(1)} \cdot M_{k,j}.\]
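
Equations (1) and (2) can be reproduced with a few lines of NumPy. The sketch below is not the library's implementation: the function name sketch_importances, the choice of which features to emphasize or damp, and the factors 1.5 and 0.5 are assumptions made purely for illustration.

```python
import numpy as np


def sketch_importances(n_layers=3, n_features=8, value_range=(0.0, 1.0),
                       sparsity=0.1, add_structure=True, seed=303):
    """Illustrative version of the generation model in equations (1)-(2)."""
    rng = np.random.default_rng(seed)
    low, high = value_range

    # Equation (1): raw scores drawn uniformly from [low, high).
    scores = rng.uniform(low, high, size=(n_layers, n_features))

    if add_structure:
        # Assumed structure step: boost one feature and damp another in
        # each layer. The indices and factors here are hypothetical.
        for k in range(n_layers):
            scores[k, k % n_features] *= 1.5
            scores[k, (k + 1) % n_features] *= 0.5

    # Equation (2): each entry is kept with probability 1 - sparsity.
    mask = rng.random(size=scores.shape) >= sparsity
    return scores * mask


demo = sketch_importances(n_layers=4, n_features=10, sparsity=0.25, seed=1)
print(demo.shape)
print(float((demo == 0).mean()))  # observed fraction of zeroed entries
```

Because the mask is sampled independently per entry, the observed fraction of zeros fluctuates around the sparsity value rather than matching it exactly in this sketch.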

Scores are left in their original scale; you may normalize per layer or per feature downstream if desired. For practical feature-importance workflows and attribution in forecasting, see Pedregosa et al. [1] and Lim et al. [2]. The fingerprint visualization concept is part of our polar analytics framework (Kouadio [3]).
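
Since scores keep their original scale, layers with larger raw values can dominate a shared radial axis. One common downstream choice is min-max rescaling of each layer (row) or each feature (column) to [0, 1]. A minimal sketch with NumPy follows; minmax_normalize is a hypothetical helper and not part of kdiagram.

```python
import numpy as np


def minmax_normalize(importances, axis=1):
    """Rescale to [0, 1] along an axis: axis=1 per layer, axis=0 per feature."""
    arr = np.asarray(importances, dtype=float)
    lo = arr.min(axis=axis, keepdims=True)
    span = arr.max(axis=axis, keepdims=True) - lo
    # Constant rows or columns have zero span; map them to 0 rather than
    # dividing by zero.
    safe_span = np.where(span == 0, 1.0, span)
    return (arr - lo) / safe_span


raw = np.array([[0.2, 0.0, 0.8],
                [2.0, 4.0, 3.0]])
print(minmax_normalize(raw, axis=1))  # each layer rescaled independently
print(minmax_normalize(raw, axis=0))  # each feature rescaled independently
```

Per-layer scaling compares the shape of each fingerprint, while per-feature scaling compares how layers rank on a given feature; which one is appropriate depends on the question being asked.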

References

Examples

Return a Bunch with arrays and a DataFrame view:

>>> from kdiagram.datasets import make_fingerprint_data
>>> fp = make_fingerprint_data(n_layers=4, n_features=10, seed=1)
>>> fp.importances.shape
(4, 10)
>>> list(fp.frame.index)[:2], list(fp.frame.columns)[:3]
(['Layer_A', 'Layer_B'], ['Feature_1', 'Feature_2', 'Feature_3'])

Return only a DataFrame with custom names:

>>> df = make_fingerprint_data(
...     n_layers=3,
...     n_features=5,
...     layer_names=['L1','L2','L3'],
...     feature_names=['f1','f2','f3','f4','f5'],
...     as_frame=True,
...     seed=2,
... )
>>> df.shape
(3, 5)