kdiagram.datasets.make_taylor_data

kdiagram.datasets.make_taylor_data(n_samples=100, n_models=3, ref_std=1.0, corr_range=(0.5, 0.99), std_range=(0.7, 1.3), noise_level=0.3, bias_level=0.1, seed=101, as_frame=False)

Generate synthetic data for Taylor diagrams.

Taylor diagrams, introduced by Taylor [1], summarize the correlation, standard deviation, and centered RMS difference between model outputs and a reference. This routine creates one reference series and several model-like series with controllable correlation and spread, suitable for exercising plotting functions such as taylor_diagram(). Practical guidance on verification appears in [2].
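To make the three statistics concrete, here is a minimal standard-library sketch. It is not part of kdiagram: the helper name `taylor_stats` and the sample values are hypothetical, and population (ddof=0) statistics are assumed. It computes the spread, correlation, and centered RMS difference for two short series, then checks the law-of-cosines relation \(E'^2 = \sigma_p^2 + \sigma_r^2 - 2\sigma_p\sigma_r\rho\) that gives the diagram its geometry.

```python
import math


def taylor_stats(reference, prediction):
    """Return (pred std, correlation, centered RMS difference, ref std).

    Hypothetical helper for illustration; uses population statistics.
    """
    n = len(reference)
    r_mean = sum(reference) / n
    p_mean = sum(prediction) / n
    r_c = [x - r_mean for x in reference]    # centered reference
    p_c = [x - p_mean for x in prediction]   # centered prediction
    r_std = math.sqrt(sum(x * x for x in r_c) / n)
    p_std = math.sqrt(sum(x * x for x in p_c) / n)
    corr = sum(a * b for a, b in zip(r_c, p_c)) / (n * r_std * p_std)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(r_c, p_c)) / n)
    return p_std, corr, crmsd, r_std


# Illustrative values only.
reference = [0.0, 1.0, 2.0, 3.0, 4.0]
prediction = [0.5, 0.9, 2.4, 2.8, 4.6]

p_std, corr, crmsd, r_std = taylor_stats(reference, prediction)

# Law of cosines linking the three plotted quantities.
law_of_cosines = p_std**2 + r_std**2 - 2.0 * p_std * r_std * corr
print(math.isclose(crmsd**2, law_of_cosines, rel_tol=1e-9))

# A constant offset does not change any centered statistic.
shifted = [x + 10.0 for x in prediction]
print(all(math.isclose(u, v, rel_tol=1e-9)
          for u, v in zip(taylor_stats(reference, shifted),
                          (p_std, corr, crmsd, r_std))))
```

Because every quantity is computed from centered series, the second check illustrates why a constant bias added to a model series does not move its point on the diagram.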

Parameters:

n_samples (int, default=100): Number of observations in each generated series.
n_models (int, default=3): Number of model (prediction) series to simulate.
ref_std (float, default=1.0): Target standard deviation for the reference series (mean is centered to 0).
corr_range (tuple of (float, float), default=(0.5, 0.99)): Closed interval from which target correlations \(\rho\) for models are sampled uniformly. Values should be in \([0,1]\) for standard Taylor use.
std_range (tuple of (float, float), default=(0.7, 1.3)): Closed interval for multiplicative factors applied to the reference standard deviation to obtain each model’s target spread.
noise_level (float, default=0.3): Standard deviation of the independent noise used to reach the requested spread and correlation. Must be positive if any target correlation is less than 1.
bias_level (float, default=0.1): Maximum absolute bias added to each model series (uniform in [-bias_level, bias_level]). Note that Taylor diagrams are insensitive to overall bias.
seed (int or None, default=101): NumPy random seed. If None, a fresh RNG is used.
as_frame (bool, default=False): If False, return a Bunch with arrays, names, and summary stats. If True, return only a pandas DataFrame with columns for the reference and each model series.

Returns:

data (Bunch or pandas.DataFrame)

If as_frame=False (default), a Bunch with:

frame : pandas DataFrame with 'reference' and model columns.
reference : ndarray of shape (n_samples,).
predictions : list of ndarray predictions.
model_names : list of model labels.
stats : pandas DataFrame with columns 'stddev' and 'corrcoef' vs the reference.
ref_std : actual standard deviation of the reference.
DESCR : human-readable description.

If as_frame=True, only the pandas DataFrame is returned.

Raises:

ValueError: If ranges are invalid, or noise_level is non-positive while a sub-perfect target correlation is requested.
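The documented error conditions can be pictured with a small sketch. This is not the library's code: the function name `check_taylor_args` is hypothetical, and it assumes that an "invalid" range means a lower bound greater than the upper bound, and that a sub-perfect correlation can be requested whenever the lower bound of corr_range is below 1.

```python
def check_taylor_args(corr_range, std_range, noise_level):
    """Hypothetical checks mirroring the documented ValueError cases."""
    # Assumption: a range is invalid when its bounds are out of order.
    for name, (low, high) in (("corr_range", corr_range),
                              ("std_range", std_range)):
        if low > high:
            raise ValueError(
                f"{name} must satisfy low <= high, got ({low}, {high})"
            )
    # Assumption: a target correlation below 1 is possible whenever the
    # lower bound of corr_range is below 1, which then requires noise.
    if corr_range[0] < 1.0 and noise_level <= 0:
        raise ValueError(
            "noise_level must be positive when a target correlation "
            "below 1 can be sampled"
        )


def raises_value_error(*args):
    """Return True if check_taylor_args rejects the given arguments."""
    try:
        check_taylor_args(*args)
    except ValueError:
        return True
    return False


print(raises_value_error((0.5, 0.99), (0.7, 1.3), 0.3))   # defaults
print(raises_value_error((0.9, 0.5), (0.7, 1.3), 0.3))    # reversed range
print(raises_value_error((0.5, 0.99), (0.7, 1.3), 0.0))   # no noise
```

The exact messages and any additional checks performed by make_taylor_data may differ; the sketch only shows the kind of input that the Raises entry describes.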

Return type:

Bunch | DataFrame

See also

kdiagram.plot.evaluation.taylor_diagram: Flexible Taylor diagram from raw arrays or pre-computed stats.
kdiagram.plot.evaluation.plot_taylor_diagram: Standard Taylor diagram from raw arrays.
kdiagram.plot.evaluation.plot_taylor_diagram_in: Taylor diagram with background shading.

Notes

Construction. Let the reference be \(r\) with \(\mathrm{E}[r]=0\) and \(\mathrm{sd}(r)=\sigma_r\) (we target \(\sigma_r=\texttt{ref\_std}\)). For model \(k\), we synthesize

\[p^{(k)} \;=\; a^{(k)} r \;+\; b^{(k)} \epsilon^{(k)} \;+\; \text{bias}^{(k)}, \tag{1}\]

with \(\epsilon^{(k)} \sim \mathcal{N}(0,\sigma_\epsilon^2)\) independent of \(r\), where \(\sigma_\epsilon=\texttt{noise\_level}\). Ignoring bias (centered statistics), the model spread and correlation satisfy

\[\sigma_{p}^{(k)} \;=\; \sqrt{(a^{(k)} \sigma_r)^2 + (b^{(k)} \sigma_\epsilon)^2}, \qquad \rho^{(k)} \;=\; \frac{a^{(k)} \sigma_r}{\sigma_{p}^{(k)}}. \tag{2}\]

We sample a target \(\rho^{(k)} \in \texttt{corr\_range}\) and a target spread factor \(\alpha^{(k)} \in \texttt{std\_range}\), set \(\sigma_p^{(k)} = \alpha^{(k)} \sigma_r\), choose

\[a^{(k)} \;=\; \rho^{(k)} \alpha^{(k)}, \qquad b^{(k)} \;=\; \frac{\sqrt{\left(\sigma_p^{(k)}\right)^2 - \left(a^{(k)} \sigma_r\right)^2}} {\sigma_\epsilon}, \tag{3}\]

and draw a small constant \(\text{bias}^{(k)}\) uniformly from \([-\texttt{bias\_level},\texttt{bias\_level}]\). Centered Taylor statistics are unaffected by bias. See Taylor [1] for interpretation; broader verification context is covered in [2].
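The construction can be checked numerically. The following is a minimal standard-library sketch of the recipe for a single model, not the library's implementation: the helper names `synth_model` and `std_and_corr` are hypothetical, and the target values \(\rho=0.8\) and \(\alpha=1.2\) are fixed here for illustration, whereas the routine samples them from corr_range and std_range. With a finite sample the realized statistics only approximate the targets.

```python
import math
import random


def std_and_corr(reference, prediction):
    """Population std of both series and their correlation (hypothetical helper)."""
    n = len(reference)
    r_mean = sum(reference) / n
    p_mean = sum(prediction) / n
    r_c = [x - r_mean for x in reference]
    p_c = [x - p_mean for x in prediction]
    r_std = math.sqrt(sum(x * x for x in r_c) / n)
    p_std = math.sqrt(sum(x * x for x in p_c) / n)
    rho = sum(u * v for u, v in zip(r_c, p_c)) / (n * r_std * p_std)
    return r_std, p_std, rho


def synth_model(reference, rho, alpha, noise_level, bias, rng):
    """Build p = a*r + b*eps + bias with the coefficients chosen as above."""
    sigma_r, _, _ = std_and_corr(reference, reference)
    sigma_p = alpha * sigma_r                    # target spread
    a = rho * alpha                              # weight on the reference
    b = math.sqrt(sigma_p**2 - (a * sigma_r)**2) / noise_level
    return [a * x + b * rng.gauss(0.0, noise_level) + bias
            for x in reference]


rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(20000)]
model = synth_model(reference, rho=0.8, alpha=1.2,
                    noise_level=0.3, bias=0.1, rng=rng)

sigma_r, sigma_p, corr = std_and_corr(reference, model)
print(f"spread ratio: {sigma_p / sigma_r:.3f} (target 1.2)")
print(f"correlation:  {corr:.3f} (target 0.8)")
```

Note that noise_level cancels out of the product \(b^{(k)}\sigma_\epsilon\), so in this sketch it rescales the noise draw without changing the resulting spread or correlation, provided it is positive.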

References

Examples

>>> # Get arrays and stats as a Bunch:
>>> from kdiagram.datasets import make_taylor_data
>>> ds = make_taylor_data(n_models=2, seed=0)
>>> list(ds.frame.columns)
['reference', 'Model_A', 'Model_B']
>>> set(ds.stats.columns) == {'stddev', 'corrcoef'}
True
>>> # Return only a DataFrame:
>>> df = make_taylor_data(as_frame=True, seed=1)
>>> 'reference' in df.columns
True