kdiagram.datasets.make_taylor_data
- kdiagram.datasets.make_taylor_data(n_samples=100, n_models=3, ref_std=1.0, corr_range=(0.5, 0.99), std_range=(0.7, 1.3), noise_level=0.3, bias_level=0.1, seed=101, as_frame=False)
Generate synthetic data for Taylor diagrams.
Taylor diagrams, introduced by Taylor [1], summarize correlation, standard deviation, and centered RMS difference between model outputs and a reference. This routine creates one reference series and several model-like series with controllable correlation and spread, suitable for exercising plotting functions such as taylor_diagram(). Practical guidance on verification appears in [2].
- Parameters:
- n_samples
int, default=100 Number of observations in each generated series.
- n_models
int, default=3 Number of model (prediction) series to simulate.
- ref_std
float, default=1.0 Target standard deviation for the reference series (mean is centered to 0).
- corr_range
tuple of (float, float), default=(0.5, 0.99) Closed interval from which target correlations \(\rho\) for models are sampled uniformly. Values should be in \([0,1]\) for standard Taylor use.
- std_range
tuple of (float, float), default=(0.7, 1.3) Closed interval for multiplicative factors applied to the reference standard deviation to obtain each model’s target spread.
- noise_level
float, default=0.3 Standard deviation of the independent noise used to reach the requested spread and correlation. Must be positive if any target correlation is less than 1.
- bias_level
float, default=0.1 Maximum absolute bias added to each model series (uniform in [-bias_level, bias_level]). Note that Taylor diagrams are insensitive to overall bias.
- seed
int or None, default=101 NumPy random seed. If None, a fresh RNG is used.
- as_frame
bool, default=False If False, return a Bunch with arrays, names, and summary stats. If True, return only a pandas DataFrame with columns for the reference and each model series.
- Returns:
- data
Bunch or pandas.DataFrame If as_frame=False (default), a Bunch with:
  - frame: pandas DataFrame with 'reference' and model columns.
  - reference: ndarray of shape (n_samples,).
  - predictions: list of ndarray predictions.
  - model_names: list of model labels.
  - stats: pandas DataFrame with columns 'stddev' and 'corrcoef' vs the reference.
  - ref_std: actual standard deviation of the reference.
  - DESCR: human-readable description.
If as_frame=True, only the pandas DataFrame is returned.
- Raises:
ValueError If ranges are invalid, or noise_level is non-positive while a target correlation below 1 is requested.
- Return type:
Bunch | DataFrame
See also
kdiagram.plot.evaluation.taylor_diagram: Flexible Taylor diagram from raw arrays or pre-computed stats.
kdiagram.plot.evaluation.plot_taylor_diagram: Standard Taylor diagram from raw arrays.
kdiagram.plot.evaluation.plot_taylor_diagram_in: Taylor diagram with background shading.
Notes
Construction. Let the reference be \(r\) with \(\mathrm{E}[r]=0\) and \(\mathrm{sd}(r)=\sigma_r\) (we target \(\sigma_r=\texttt{ref\_std}\)). For model \(k\), we synthesize
\[p^{(k)} \;=\; a^{(k)} r \;+\; b^{(k)} \epsilon^{(k)} \;+\; \text{bias}^{(k)}, \tag{1}\]
with \(\epsilon^{(k)} \sim \mathcal{N}(0,\sigma_\epsilon^2)\) independent of \(r\), where \(\sigma_\epsilon=\texttt{noise\_level}\). Ignoring bias (centered statistics), the model spread and correlation satisfy
\[\sigma_{p}^{(k)} \;=\; \sqrt{(a^{(k)} \sigma_r)^2 + (b^{(k)} \sigma_\epsilon)^2}, \qquad \rho^{(k)} \;=\; \frac{a^{(k)} \sigma_r}{\sigma_{p}^{(k)}}. \tag{2}\]
We sample a target \(\rho^{(k)} \in \texttt{corr\_range}\) and a target spread factor \(\alpha^{(k)} \in \texttt{std\_range}\), set \(\sigma_p^{(k)} = \alpha^{(k)} \sigma_r\), choose
\[a^{(k)} \;=\; \rho^{(k)} \alpha^{(k)}, \qquad b^{(k)} \;=\; \frac{\sqrt{\left(\sigma_p^{(k)}\right)^2 - \left(a^{(k)} \sigma_r\right)^2}} {\sigma_\epsilon}, \tag{3}\]
and draw a small constant \(\text{bias}^{(k)} \in [-\texttt{bias\_level},\texttt{bias\_level}]\). Centered Taylor statistics are unaffected by bias. See Taylor [1] for interpretation; broader verification context is covered in [2].
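The construction in (1) to (3) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: it uses only the Python standard library rather than NumPy, and the helper names `synth_model` and `centered_stats`, the fixed targets `rho=0.8` and `alpha=1.2`, and the sample size are all chosen for the example.

```python
import math
import random
import statistics


def synth_model(ref, rho, alpha, noise_level, bias, rng):
    """Hypothetical helper: build p = a*r + b*eps + bias as in (1) and (3)."""
    sigma_r = statistics.pstdev(ref)
    sigma_p = alpha * sigma_r                       # target model spread
    a = rho * alpha                                 # weight on the reference
    # Remaining variance is carried by the independent noise term.
    resid_var = max(sigma_p**2 - (a * sigma_r) ** 2, 0.0)
    b = math.sqrt(resid_var) / noise_level
    return [a * r + b * rng.gauss(0.0, noise_level) + bias for r in ref]


def centered_stats(ref, pred):
    """Return (population std of pred, Pearson correlation with ref)."""
    mr, mp = statistics.fmean(ref), statistics.fmean(pred)
    cov = statistics.fmean((r - mr) * (p - mp) for r, p in zip(ref, pred))
    sd_r, sd_p = statistics.pstdev(ref), statistics.pstdev(pred)
    return sd_p, cov / (sd_r * sd_p)


rng = random.Random(0)
ref_std = 1.0

# Reference series: centered to mean 0 and rescaled to the target spread.
raw = [rng.gauss(0.0, 1.0) for _ in range(20000)]
raw_mean, raw_sd = statistics.fmean(raw), statistics.pstdev(raw)
ref = [(x - raw_mean) / raw_sd * ref_std for x in raw]

rho, alpha = 0.8, 1.2
pred = synth_model(ref, rho, alpha, noise_level=0.3, bias=0.1, rng=rng)
sd_p, corr = centered_stats(ref, pred)

# Adding a constant offset leaves the centered statistics unchanged.
shifted = [p + 5.0 for p in pred]
sd_shifted, corr_shifted = centered_stats(ref, shifted)

print(f"stddev={sd_p:.3f} (target {alpha * ref_std}), corrcoef={corr:.3f} (target {rho})")
```

With a finite sample the realized spread and correlation only approximate the targets, because the sampled noise is not exactly uncorrelated with the reference. The shifted series shows why `bias_level` does not move a point on the diagram.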
References
Examples
>>> # Get arrays and stats as a Bunch:
>>> from kdiagram.datasets import make_taylor_data
>>> ds = make_taylor_data(n_models=2, seed=0)
>>> list(ds.frame.columns)
['reference', 'Model_A', 'Model_B']
>>> set(ds.stats.columns) == {'stddev', 'corrcoef'}
True
>>>
>>> # Return only a DataFrame:
>>> df = make_taylor_data(as_frame=True, seed=1)
>>> 'reference' in df.columns
True
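The `stats` table reports `stddev` and `corrcoef`. The third quantity a Taylor diagram displays, the centered RMS difference, follows from those two and the reference spread through the law-of-cosines relation \(E'^2 = \sigma_r^2 + \sigma_p^2 - 2\sigma_r\sigma_p\rho\). The sketch below checks that relation on a small hand-written example. The function `taylor_stats` and the two short lists are hypothetical stand-ins (with kdiagram installed one would pass `ds.reference` and an entry of `ds.predictions` instead), and population normalization (dividing by n) is assumed throughout.

```python
import math
import statistics


def taylor_stats(ref, pred):
    """Hypothetical helper: (stddev, corrcoef, centered RMS difference) of pred vs ref."""
    mr, mp = statistics.fmean(ref), statistics.fmean(pred)
    dr = [r - mr for r in ref]                      # centered reference
    dp = [p - mp for p in pred]                     # centered prediction
    sd_r = math.sqrt(statistics.fmean(x * x for x in dr))
    sd_p = math.sqrt(statistics.fmean(y * y for y in dp))
    corr = statistics.fmean(x * y for x, y in zip(dr, dp)) / (sd_r * sd_p)
    crmsd = math.sqrt(statistics.fmean((y - x) ** 2 for x, y in zip(dr, dp)))
    return sd_p, corr, crmsd


# Stand-in values for illustration only.
reference = [0.2, -1.1, 0.7, 1.5, -0.4, -0.9]
prediction = [0.5, -0.8, 0.4, 1.9, -0.1, -1.2]

sd_p, corr, crmsd = taylor_stats(reference, prediction)
sd_r = taylor_stats(reference, reference)[0]

# Centered RMS difference recovered from stddev and corrcoef alone.
crmsd_from_identity = math.sqrt(sd_r**2 + sd_p**2 - 2 * sd_r * sd_p * corr)
print(crmsd, crmsd_from_identity)
```

The two printed values agree up to floating-point rounding, which is why a stats table holding only spread and correlation is enough to place each model on the diagram.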