kdiagram.datasets.make_fingerprint_data
- kdiagram.datasets.make_fingerprint_data(n_layers=3, n_features=8, layer_names=None, feature_names=None, value_range=(0.0, 1.0), sparsity=0.1, add_structure=True, seed=303, as_frame=False)
Generate synthetic feature-importance data for fingerprint plots.
Creates a matrix of feature-importance scores across multiple layers (e.g., models, periods, experimental groups) suitable for visualization with plot_feature_fingerprint(). This is useful for comparing profiles in a compact polar radar view and for testing feature-comparison workflows in forecasting and ML [1][2][3].
- Parameters:
- n_layers : int, default=3
  Number of rows (layers) in the importance matrix. Each row represents a group such as a model or time period.
- n_features : int, default=8
  Number of columns (features) in the importance matrix.
- layer_names : list of str, optional
  Names for the layers. If None, generic names like 'Layer_A', 'Layer_B' are generated. Must have length n_layers if provided.
- feature_names : list of str, optional
  Names for the features. If None, generic names like 'Feature_1', 'Feature_2' are generated. Must have length n_features if provided.
- value_range : tuple of (float, float), default=(0.0, 1.0)
  Approximate sampling range (min_val, max_val) for raw importance scores. Values are drawn from a uniform distribution before structure/sparsity are applied.
- sparsity : float, default=0.1
  Fraction in [0, 1] of entries that are set to zero at random, simulating unimportant features for some layers.
- add_structure : bool, default=True
  If True, inject simple patterns to make fingerprints distinct, e.g., emphasizing one feature per layer and de-emphasizing another. If False, the matrix is fully random apart from sparsity.
- seed : int or None, default=303
  Seed for NumPy's random generator. If None, a fresh RNG is used.
- as_frame : bool, default=False
  If False, return a Bunch with metadata and arrays. If True, return only the pandas DataFrame indexed by layers with feature columns.
- Returns:
- data : Bunch or pandas.DataFrame
  If as_frame=False (default), a Bunch with:
  - importances: ndarray of shape (n_layers, n_features).
  - frame: pandas DataFrame view of the matrix with layers as index and features as columns.
  - layer_names: list of layer names.
  - feature_names: list of feature names.
  - DESCR: human-readable description.
  If as_frame=True, only the pandas DataFrame is returned.
- Raises:
- ValueError
  If layer_names or feature_names lengths do not match the specified dimensions, if sparsity is outside [0, 1], or if value_range does not satisfy min_val <= max_val.
- Return type:
Bunch | DataFrame
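The ValueError conditions listed under Raises can be sketched as a small stand-alone check. This is only an illustration of the documented rules: the helper name check_fingerprint_args and the error messages are hypothetical and are not part of kdiagram.

```python
def check_fingerprint_args(n_layers, n_features, layer_names=None,
                           feature_names=None, value_range=(0.0, 1.0),
                           sparsity=0.1):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    # Name lists, when given, must match the matrix dimensions.
    if layer_names is not None and len(layer_names) != n_layers:
        raise ValueError(
            f"layer_names has {len(layer_names)} entries, expected {n_layers}"
        )
    if feature_names is not None and len(feature_names) != n_features:
        raise ValueError(
            f"feature_names has {len(feature_names)} entries, expected {n_features}"
        )
    # sparsity is a fraction and must lie in [0, 1].
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError(f"sparsity must be in [0, 1], got {sparsity}")
    # value_range must satisfy min_val <= max_val.
    min_val, max_val = value_range
    if not min_val <= max_val:
        raise ValueError(f"value_range must satisfy min_val <= max_val, got {value_range}")
```

A call such as check_fingerprint_args(3, 5, layer_names=['L1', 'L2']) would raise ValueError under these rules, because two names are supplied for three layers.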
See also
kdiagram.plot.feature_based.plot_feature_fingerprint : Radar-style comparison of multi-feature profiles across layers.
Notes
Generation model. Let \(I \in \mathbb{R}^{L \times F}\) denote the importance matrix with \(L = \texttt{n\_layers}\) and \(F = \texttt{n\_features}\). Raw scores are sampled as

\[I_{k,j}^{(0)} \sim \mathcal{U}(m, M), \qquad m = \texttt{value\_range[0]},\; M = \texttt{value\_range[1]}. \tag{1}\]

If structure is enabled, a layer-specific emphasis and de-emphasis may be applied, producing \(I^{(1)}\); otherwise \(I^{(1)} = I^{(0)}\). Finally, a sparsity mask \(M_{k,j} \sim \text{Bernoulli}(1-s)\) with \(s=\texttt{sparsity}\) is applied (the subscripted mask entries \(M_{k,j}\) are distinct from the scalar upper bound \(M\)):

\[I_{k,j} \;=\; I_{k,j}^{(1)} \cdot M_{k,j}. \tag{2}\]

Scores are left in their original scale; you may normalize per-layer or per-feature downstream if desired. For practical feature-importance workflows and attribution in forecasting, see Pedregosa et al. [1] and Lim et al. [2]. The fingerprint visualization concept is part of our polar analytics framework (Kouadio [3]).
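The three steps of the generation model (uniform sampling, optional structure, Bernoulli mask) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the function name sketch_fingerprint_matrix, the use of np.random.default_rng, the choice of which feature is boosted or damped, and the factors 1.5 and 0.5 are all made up here, since the text only says that one feature per layer is emphasized and another de-emphasized.

```python
import numpy as np


def sketch_fingerprint_matrix(n_layers=3, n_features=8, value_range=(0.0, 1.0),
                              sparsity=0.1, add_structure=True, seed=303):
    """Illustrative re-creation of the generation steps (not the library code)."""
    rng = np.random.default_rng(seed)  # assumed RNG; the library may differ
    low, high = value_range

    # Step 1, Eq. (1): raw scores drawn uniformly from the value range.
    scores = rng.uniform(low, high, size=(n_layers, n_features))

    # Step 2 (assumed form): boost one feature and damp another in each layer.
    if add_structure and n_features >= 2:
        for k in range(n_layers):
            boosted = k % n_features
            damped = (k + 1) % n_features
            scores[k, boosted] *= 1.5  # assumed emphasis factor
            scores[k, damped] *= 0.5   # assumed de-emphasis factor

    # Step 3, Eq. (2): keep each entry with probability 1 - sparsity, else zero it.
    keep = rng.random((n_layers, n_features)) >= sparsity
    return scores * keep
```

Because the mask is drawn entry by entry, the share of zeros in any single draw only approximates sparsity; it matches exactly at the extremes sparsity=0 (nothing zeroed) and sparsity=1 (everything zeroed).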
References
Examples
Return a Bunch with arrays and a DataFrame view:

>>> from kdiagram.datasets import make_fingerprint_data
>>> fp = make_fingerprint_data(n_layers=4, n_features=10, seed=1)
>>> fp.importances.shape
(4, 10)
>>> list(fp.frame.index)[:2], list(fp.frame.columns)[:3]
(['Layer_A', 'Layer_B'], ['Feature_1', 'Feature_2', 'Feature_3'])

Return only a DataFrame with custom names:

>>> df = make_fingerprint_data(
...     n_layers=3,
...     n_features=5,
...     layer_names=['L1','L2','L3'],
...     feature_names=['f1','f2','f3','f4','f5'],
...     as_frame=True,
...     seed=2,
... )
>>> df.shape
(3, 5)
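The Notes state that scores are left in their original scale and may be normalized downstream. The sketch below shows two common options on a small hand-written matrix that stands in for the importances array; the values are made up for illustration, and the choice of sum-to-one and max scaling is an assumption, not something the library prescribes.

```python
import numpy as np

# Hand-written importance matrix (2 layers x 3 features), for illustration only.
importances = np.array([[0.2, 0.0, 0.6],
                        [1.0, 3.0, 0.0]])

# Per-layer (row-wise) normalization: each row is divided by its sum,
# so every layer's scores add up to 1. All-zero rows are left at zero.
row_sums = importances.sum(axis=1, keepdims=True)
per_layer = np.divide(importances, row_sums,
                      out=np.zeros_like(importances), where=row_sums > 0)

# Per-feature (column-wise) max scaling: each column is divided by its maximum,
# so the layer with the largest score for a feature gets 1.
col_max = importances.max(axis=0, keepdims=True)
per_feature = np.divide(importances, col_max,
                        out=np.zeros_like(importances), where=col_max > 0)
```

Per-layer normalization makes the shapes of fingerprints comparable across layers, while per-feature scaling highlights which layer relies most on each feature.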