kdiagram.datasets.make_taylor_data

kdiagram.datasets.make_taylor_data(n_samples=100, n_models=3, ref_std=1.0, corr_range=(0.5, 0.99), std_range=(0.7, 1.3), noise_level=0.3, bias_level=0.1, seed=101, as_frame=False)

Generate synthetic data for Taylor diagrams.

Taylor diagrams, introduced by Taylor [1], summarize the correlation, standard deviation, and centered RMS difference between model outputs and a reference in a single plot. This routine creates one reference series and several model-like series with controllable correlation and spread, suitable for exercising plotting functions such as taylor_diagram(). Practical guidance on verification appears in [2].

Parameters:
n_samples : int, default=100

Number of observations in each generated series.

n_models : int, default=3

Number of model (prediction) series to simulate.

ref_std : float, default=1.0

Target standard deviation for the reference series (mean is centered to 0).

corr_range : tuple of (float, float), default=(0.5, 0.99)

Closed interval from which target correlations \(\rho\) for models are sampled uniformly. Values should be in \([0,1]\) for standard Taylor use.

std_range : tuple of (float, float), default=(0.7, 1.3)

Closed interval for multiplicative factors applied to the reference standard deviation to obtain each model’s target spread.

noise_level : float, default=0.3

Standard deviation of the independent noise used to reach the requested spread and correlation. Must be positive if any target correlation is less than 1.

bias_level : float, default=0.1

Maximum absolute bias added to each model series (uniform in [-bias_level, bias_level]). Note that Taylor diagrams are insensitive to overall bias.

seed : int or None, default=101

Seed for the NumPy random number generator. If None, a fresh, unseeded RNG is used, so the output is not reproducible between calls.

as_frame : bool, default=False

If False, return a Bunch with arrays, names, and summary stats. If True, return only a pandas DataFrame with columns for the reference and each model series.

Returns:
data : Bunch or pandas.DataFrame

If as_frame=False (default), a Bunch with:

  • frame : pandas DataFrame with 'reference' and model columns.

  • reference : ndarray of shape (n_samples,).

  • predictions : list of ndarray, one array of shape (n_samples,) per model.

  • model_names : list of model labels.

  • stats : pandas DataFrame with columns 'stddev' (standard deviation) and 'corrcoef' (correlation with the reference).

  • ref_std : actual standard deviation of the reference.

  • DESCR : human-readable description.

If as_frame=True, only the pandas DataFrame is returned.

Raises:
ValueError

If ranges are invalid, or noise_level is non-positive while a sub-perfect target correlation is requested.

Return type:

Bunch | DataFrame

See also

kdiagram.plot.evaluation.taylor_diagram

Flexible Taylor diagram from raw arrays or pre-computed stats.

kdiagram.plot.evaluation.plot_taylor_diagram

Standard Taylor diagram from raw arrays.

kdiagram.plot.evaluation.plot_taylor_diagram_in

Taylor diagram with background shading.

Notes

Construction. Let the reference be \(r\) with \(\mathrm{E}[r]=0\) and \(\mathrm{sd}(r)=\sigma_r\) (we target \(\sigma_r=\texttt{ref\_std}\)). For model \(k\), we synthesize

\[p^{(k)} \;=\; a^{(k)} r \;+\; b^{(k)} \epsilon^{(k)} \;+\; \text{bias}^{(k)}, \tag{1}\]

with \(\epsilon^{(k)} \sim \mathcal{N}(0,\sigma_\epsilon^2)\) independent of \(r\), where \(\sigma_\epsilon=\texttt{noise\_level}\). Ignoring bias (centered statistics), the model spread and correlation satisfy

\[\sigma_{p}^{(k)} \;=\; \sqrt{(a^{(k)} \sigma_r)^2 + (b^{(k)} \sigma_\epsilon)^2}, \qquad \rho^{(k)} \;=\; \frac{a^{(k)} \sigma_r}{\sigma_{p}^{(k)}}. \tag{2}\]
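
Both identities follow from the independence of \(r\) and \(\epsilon^{(k)}\). A short derivation, assuming \(a^{(k)} \ge 0\) and dropping the constant bias (which does not change centered moments):

\[
\begin{aligned}
\mathrm{Var}\!\left(p^{(k)}\right) &= \left(a^{(k)}\right)^2 \sigma_r^2 + \left(b^{(k)}\right)^2 \sigma_\epsilon^2, \\
\mathrm{Cov}\!\left(r, p^{(k)}\right) &= a^{(k)} \sigma_r^2, \\
\rho^{(k)} &= \frac{\mathrm{Cov}\!\left(r, p^{(k)}\right)}{\sigma_r \, \sigma_p^{(k)}} \;=\; \frac{a^{(k)} \sigma_r}{\sigma_p^{(k)}}.
\end{aligned}
\]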

We sample a target \(\rho^{(k)} \in \texttt{corr\_range}\) and a target spread factor \(\alpha^{(k)} \in \texttt{std\_range}\), set \(\sigma_p^{(k)} = \alpha^{(k)} \sigma_r\), choose

\[a^{(k)} \;=\; \rho^{(k)} \alpha^{(k)}, \qquad b^{(k)} \;=\; \frac{\sqrt{\left(\sigma_p^{(k)}\right)^2 - \left(a^{(k)} \sigma_r\right)^2}} {\sigma_\epsilon}, \tag{3}\]

and draw a constant \(\text{bias}^{(k)}\) uniformly from \([-\texttt{bias\_level},\texttt{bias\_level}]\). Centered Taylor statistics are unaffected by this bias. See Taylor [1] for interpretation; broader verification context is covered in [2].
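
The construction in Eqs. (1)-(3) can be sketched in a few lines of standard-library Python. This is an illustration only, not the library's implementation: the helper names (center, std, corr, make_reference, make_model) are hypothetical, and the sketch makes one extra assumption that the text above does not state, namely that the noise is made exactly uncorrelated with the reference within the sample, so that the target spread and correlation are reached exactly rather than only in expectation.

```python
import math
import random


def center(x):
    """Subtract the sample mean from a series."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def std(x):
    """Population (1/n) standard deviation."""
    c = center(x)
    return math.sqrt(sum(v * v for v in c) / len(c))


def corr(x, y):
    """Pearson correlation of two equal-length series."""
    cx, cy = center(x), center(y)
    num = sum(u * v for u, v in zip(cx, cy))
    return num / math.sqrt(sum(u * u for u in cx) * sum(v * v for v in cy))


def make_reference(n, ref_std, rng):
    """Zero-mean reference series rescaled to standard deviation ref_std."""
    r = center([rng.gauss(0.0, 1.0) for _ in range(n)])
    s = std(r)
    return [v * ref_std / s for v in r]


def make_model(ref, rho, alpha, noise_level, bias, rng):
    """Hypothetical helper: one model series following Eqs. (1)-(3)."""
    sigma_r = std(ref)
    r_c = center(ref)

    # Assumption (not from the docs): remove the component of the noise that
    # is aligned with the reference, then rescale it to std = noise_level.
    eps = center([rng.gauss(0.0, 1.0) for _ in ref])
    proj = sum(e * r for e, r in zip(eps, r_c)) / sum(r * r for r in r_c)
    eps = [e - proj * r for e, r in zip(eps, r_c)]
    s = std(eps)
    eps = [e * noise_level / s for e in eps]

    # Eq. (3): coefficients that give spread alpha * sigma_r and correlation rho.
    sigma_p = alpha * sigma_r
    a = rho * alpha
    b = math.sqrt(max(0.0, sigma_p ** 2 - (a * sigma_r) ** 2)) / noise_level

    # Eq. (1): scaled reference + scaled noise + constant bias.
    return [a * r + b * e + bias for r, e in zip(ref, eps)]


rng = random.Random(0)
ref = make_reference(200, ref_std=1.0, rng=rng)
model = make_model(ref, rho=0.8, alpha=1.2, noise_level=0.3, bias=0.05, rng=rng)

# The achieved statistics should match the targets (1.2 and 0.8) up to rounding.
print(round(std(model), 6), round(corr(ref, model), 6))
```

Without the orthogonalization step, the sample correlation between the reference and the noise is small but nonzero, so the achieved values would scatter around the targets, more so for small n_samples.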

References

Examples

>>> # Get arrays and stats as a Bunch:
>>>
>>> from kdiagram.datasets import make_taylor_data
>>> ds = make_taylor_data(n_models=2, seed=0)
>>> list(ds.frame.columns)
['reference', 'Model_A', 'Model_B']
>>> set(ds.stats.columns) == {'stddev', 'corrcoef'}
True
>>>
>>> # Return only a DataFrame:
>>>
>>> df = make_taylor_data(as_frame=True, seed=1)
>>> 'reference' in df.columns
True