kdiagram.datasets.make_fingerprint_data

kdiagram.datasets.make_fingerprint_data(n_layers=3, n_features=8, layer_names=None, feature_names=None, value_range=(0.0, 1.0), sparsity=0.1, add_structure=True, seed=303, as_frame=False)

Generate synthetic feature-importance data for fingerprint plots.

Creates a matrix of feature-importance scores across multiple layers (e.g., models, periods, experimental groups) suitable for visualization with plot_feature_fingerprint(). This is handy for comparing profiles in a compact polar radar view and for testing feature-comparison workflows in forecasting and ML [1][2][3].

Parameters:
n_layers : int, default=3

Number of rows (layers) in the importance matrix. Each row represents a group such as a model or time period.

n_features : int, default=8

Number of columns (features) in the importance matrix.

layer_names : list of str, optional

Names for the layers. If None, generic names like 'Layer_A', 'Layer_B' are generated. Must have length n_layers if provided.

feature_names : list of str, optional

Names for the features. If None, generic names like 'Feature_1', 'Feature_2' are generated. Must have length n_features if provided.
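The default naming scheme can be pictured with a small helper. This is a minimal sketch, assuming layers are lettered from 'Layer_A' and features numbered from 'Feature_1' as in the examples on this page; `default_names` is a hypothetical function, and how the library names more than 26 layers is not documented here, so the sketch simply refuses that case.

```python
import string


def default_names(n_layers, n_features):
    """Hypothetical helper: build generic layer and feature names.

    Assumption: layers are lettered 'Layer_A', 'Layer_B', ... and features
    are numbered 'Feature_1', 'Feature_2', ... (1-based). Only 26 layer
    letters are handled in this sketch.
    """
    letters = string.ascii_uppercase
    if n_layers > len(letters):
        raise ValueError("this sketch only supports up to 26 layers")
    layer_names = [f"Layer_{letters[i]}" for i in range(n_layers)]
    feature_names = [f"Feature_{j + 1}" for j in range(n_features)]
    return layer_names, feature_names


layers, features = default_names(3, 4)
print(layers)    # ['Layer_A', 'Layer_B', 'Layer_C']
print(features)  # ['Feature_1', 'Feature_2', 'Feature_3', 'Feature_4']
```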

value_range : tuple of (float, float), default=(0.0, 1.0)

Approximate range (min_val, max_val) of the raw importance scores. Raw values are drawn from a uniform distribution on this range before the structure and sparsity steps are applied.

sparsity : float, default=0.1

Fraction in [0, 1] of entries that are set to zero at random, simulating unimportant features for some layers.

add_structure : bool, default=True

If True, inject simple patterns to make fingerprints distinct, e.g., emphasizing one feature per layer and de-emphasizing another. If False, the matrix is fully random apart from sparsity.
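The kind of pattern described above can be sketched as follows. This is an illustration only: `add_simple_structure` is a hypothetical function, and the rule for picking features (feature `k % F` boosted, feature `(k + F // 2) % F` damped in layer `k`) and the factors `boost=1.5` and `damp=0.3` are assumptions, not the library's actual structure step.

```python
import numpy as np


def add_simple_structure(raw, boost=1.5, damp=0.3):
    """Hypothetical structure step: emphasise one feature and damp another per layer.

    The feature choice and the boost/damp factors are illustrative
    assumptions; the input array is copied, not modified.
    """
    structured = np.array(raw, dtype=float, copy=True)
    n_layers, n_features = structured.shape
    for k in range(n_layers):
        strong = k % n_features                    # feature emphasised in layer k
        weak = (k + n_features // 2) % n_features  # feature de-emphasised in layer k
        structured[k, strong] *= boost
        if weak != strong:                         # avoid undoing the boost when F == 1
            structured[k, weak] *= damp
    return structured


rng = np.random.default_rng(303)
raw = rng.uniform(0.0, 1.0, size=(3, 8))
print(add_simple_structure(raw).round(2))
```

Shifting the emphasised feature from layer to layer is what makes each row's radar profile visually distinct; with `add_structure=False` only the random draws and the sparsity mask shape the rows.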

seed : int or None, default=303

Seed for NumPy’s random generator. If None, a fresh RNG is used.
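The effect of `seed` can be illustrated with NumPy's `Generator` directly. This sketch assumes the seed is passed to `numpy.random.default_rng`, the standard way to build NumPy's generator; the exact call inside kdiagram is not shown on this page.

```python
import numpy as np

# Assumption: seed -> numpy.random.default_rng(seed).
a = np.random.default_rng(303).uniform(0.0, 1.0, size=(3, 8))
b = np.random.default_rng(303).uniform(0.0, 1.0, size=(3, 8))
# seed=None draws fresh OS entropy, so c will almost surely differ from a.
c = np.random.default_rng(None).uniform(0.0, 1.0, size=(3, 8))

print(np.array_equal(a, b))  # True: same seed, same draws
```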

as_frame : bool, default=False

If False, return a Bunch with metadata and arrays. If True, return only the pandas DataFrame, with layer names as the index and feature names as the columns.

Returns:
data : Bunch or pandas.DataFrame

If as_frame=False (default), a Bunch with:

  • importances : ndarray of shape (n_layers, n_features).

  • frame : pandas DataFrame view of the matrix with layers as index and features as columns.

  • layer_names : list of layer names.

  • feature_names : list of feature names.

  • DESCR : human-readable description.

If as_frame=True, only the pandas DataFrame is returned.
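The relationship between `importances`, `layer_names`, `feature_names`, and `frame` can be sketched with plain pandas. The arrays below are arbitrary stand-ins, not output of `make_fingerprint_data`, and the sketch assumes `frame` is equivalent to wrapping `importances` with the layer names as index and the feature names as columns, as described above.

```python
import numpy as np
import pandas as pd

# Arbitrary stand-ins for the Bunch fields (not library output).
importances = np.array([[0.9, 0.1, 0.4],
                        [0.2, 0.8, 0.0]])
layer_names = ["Layer_A", "Layer_B"]
feature_names = ["Feature_1", "Feature_2", "Feature_3"]

# Layers become the row index, features the columns.
frame = pd.DataFrame(importances, index=layer_names, columns=feature_names)

print(frame.loc["Layer_B", "Feature_2"])  # 0.8
print(frame.to_numpy().shape)             # (2, 3)
```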

Raises:
ValueError

If layer_names or feature_names lengths do not match the specified dimensions, if sparsity is outside [0, 1], or if value_range does not satisfy min_val <= max_val.
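The three documented error conditions can be restated as a small validation function. `validate_fingerprint_args` is hypothetical: it mirrors the checks listed above, but the library's own error messages and the order in which it performs the checks may differ.

```python
def validate_fingerprint_args(n_layers, n_features, layer_names=None,
                              feature_names=None, value_range=(0.0, 1.0),
                              sparsity=0.1):
    """Hypothetical restatement of the documented ValueError conditions."""
    if layer_names is not None and len(layer_names) != n_layers:
        raise ValueError(
            f"layer_names has length {len(layer_names)}, expected {n_layers}"
        )
    if feature_names is not None and len(feature_names) != n_features:
        raise ValueError(
            f"feature_names has length {len(feature_names)}, expected {n_features}"
        )
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError(f"sparsity must lie in [0, 1], got {sparsity}")
    min_val, max_val = value_range
    if min_val > max_val:
        raise ValueError(f"value_range must satisfy min_val <= max_val, got {value_range}")


# Valid arguments pass silently; an out-of-range sparsity is rejected.
validate_fingerprint_args(3, 5, layer_names=["L1", "L2", "L3"])
try:
    validate_fingerprint_args(3, 5, sparsity=1.5)
except ValueError as err:
    print(err)  # sparsity must lie in [0, 1], got 1.5
```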

Return type:

Bunch | DataFrame

See also

kdiagram.plot.feature_based.plot_feature_fingerprint

Radar-style comparison of multi-feature profiles across layers.

Notes

Generation model. Let \(I \in \mathbb{R}^{L \times F}\) denote the importance matrix with \(L = \texttt{n\_layers}\) and \(F = \texttt{n\_features}\). Raw scores are sampled as

\[I_{k,j}^{(0)} \sim \mathcal{U}(a, b), \qquad a = \texttt{value\_range[0]},\; b = \texttt{value\_range[1]}. \tag{1}\]

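If structure is enabled (add_structure=True), a layer-specific emphasis and de-emphasis may be applied, producing \(I^{(1)}\); otherwise \(I^{(1)} = I^{(0)}\). Finally, an elementwise sparsity mask \(M_{k,j} \sim \text{Bernoulli}(1-s)\) with \(s = \texttt{sparsity}\) is applied, so each entry is zeroed with probability \(s\):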

\[I_{k,j} \;=\; I_{k,j}^{(1)} \cdot M_{k,j}. \tag{2}\]
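Equations (1) and (2) translate almost directly into NumPy. This is a minimal sketch, not the library's implementation: `sample_importances` is a hypothetical name, the structure step is skipped (so \(I^{(1)} = I^{(0)}\)), and the use of `numpy.random.default_rng` is an assumption.

```python
import numpy as np


def sample_importances(n_layers=3, n_features=8, value_range=(0.0, 1.0),
                       sparsity=0.1, seed=303):
    """Sketch of Eqs. (1)-(2) with the structure step skipped (I^(1) = I^(0))."""
    rng = np.random.default_rng(seed)
    low, high = value_range
    shape = (n_layers, n_features)
    # Eq. (1): raw scores drawn uniformly from [low, high).
    raw = rng.uniform(low, high, size=shape)
    # Mask M ~ Bernoulli(1 - s): 1 keeps an entry, 0 zeroes it (probability s).
    mask = rng.binomial(1, 1.0 - sparsity, size=shape)
    # Eq. (2): elementwise product of scores and mask.
    return raw * mask


I = sample_importances()
print(I.shape)  # (3, 8)

# On a large matrix the fraction of zeros should land near the sparsity value.
big = sample_importances(n_layers=100, n_features=100, sparsity=0.25, seed=0)
print(float((big == 0).mean()))  # close to 0.25
```

Because the mask is drawn independently per entry, `sparsity` controls the expected fraction of zeros rather than an exact count; a whole row can occasionally come out empty for small `n_features`.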

Scores are left in their original scale; you may normalize per-layer or per-feature downstream if desired. For practical feature-importance workflows and attribution in forecasting, see Pedregosa et al. [1] and Lim et al. [2]. The fingerprint visualization concept is part of our polar analytics framework (Kouadio [3]).
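Downstream normalization of the returned DataFrame can be done with a few pandas operations. The values below are made up, and max-scaling is only one reasonable choice (dividing by the row or column sum is another); the guard matters because sparsity can leave a layer with all-zero scores.

```python
import numpy as np
import pandas as pd

# Made-up importances in the as_frame=True layout (layers x features).
df = pd.DataFrame(
    [[0.2, 0.6, 0.0],
     [1.5, 0.5, 1.0],
     [0.0, 0.0, 0.0]],  # an all-zero layer, as sparsity can produce
    index=["Layer_A", "Layer_B", "Layer_C"],
    columns=["Feature_1", "Feature_2", "Feature_3"],
)

# Per-layer: divide each row by its maximum so its strongest feature scores 1.
row_max = df.max(axis=1).replace(0.0, np.nan)  # avoid 0/0 on all-zero rows
per_layer = df.div(row_max, axis=0).fillna(0.0)

# Per-feature: divide each column by its maximum across layers.
col_max = df.max(axis=0).replace(0.0, np.nan)
per_feature = df.div(col_max, axis=1).fillna(0.0)

print(per_layer.round(3))
print(per_feature.round(3))
```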

References

Examples

Return a Bunch with arrays and a DataFrame view:

>>> from kdiagram.datasets import make_fingerprint_data
>>> fp = make_fingerprint_data(n_layers=4, n_features=10, seed=1)
>>> fp.importances.shape
(4, 10)
>>> list(fp.frame.index)[:2], list(fp.frame.columns)[:3]
(['Layer_A', 'Layer_B'], ['Feature_1', 'Feature_2', 'Feature_3'])

Return only a DataFrame with custom names:

>>> df = make_fingerprint_data(
...     n_layers=3,
...     n_features=5,
...     layer_names=['L1', 'L2', 'L3'],
...     feature_names=['f1', 'f2', 'f3', 'f4', 'f5'],
...     as_frame=True,
...     seed=2,
... )
>>> df.shape
(3, 5)