kdiagram.datasets.make_cyclical_data

kdiagram.datasets.make_cyclical_data(n_samples=365, n_series=2, cycle_period=365, noise_level=0.5, amplitude_true=10.0, offset_true=20.0, pred_bias=None, pred_noise_factor=None, pred_amplitude_factor=None, pred_phase_shift=None, prefix='model', series_names=None, seed=404, as_frame=False)

Generate synthetic cyclical data for relationship and temporal plots.

Creates a dataset with a single true cyclical signal and one or more prediction series that can differ from the truth in amplitude, phase, bias, and noise. The data is intended for demonstrating and testing k-diagram functions such as plot_relationship() and plot_temporal_uncertainty(), where the behavior of a series over a cycle is what the plot needs to show [1][2][3].

Parameters:
n_samples : int, default=365

Number of time steps to generate. Interpreted as evenly spaced samples over one or more cycles.

n_series : int, default=2

Number of simulated prediction series (e.g., models).

cycle_period : float, default=365

Samples per full cycle \(P\). The angular frequency is \(\omega = 2\pi / P\). Use 365 for daily data over one year, 12 for monthly data over one year, etc.

noise_level : float, default=0.5

Standard deviation of Gaussian noise added to the true signal. Prediction series scale this by pred_noise_factor.

amplitude_true : float, default=10.0

Amplitude of the sinusoidal true signal.

offset_true : float, default=20.0

Vertical offset (mean level) of the true signal.

pred_bias : float or list of float, optional

Additive bias for each prediction series. If a scalar is provided it is broadcast to all n_series. If a list is provided, its length must equal n_series. Defaults to [0.0, 1.5] when None.

pred_noise_factor : float or list of float, optional

Multiplier for noise_level per series. Scalar values are broadcast; lists must match n_series in length. Defaults to [1.0, 1.5] when None.

pred_amplitude_factor : float or list of float, optional

Multiplier of amplitude_true per series (allows under- or over-estimation of the cycle amplitude). Scalar broadcast is supported. Defaults to [1.0, 0.8] when None.

pred_phase_shift : float or list of float, optional

Phase shift (radians) added to each series. Positive values produce a lag relative to the truth. Scalar broadcast is supported. Defaults to [0.0, np.pi / 6] when None.

prefix : str, default='model'

Prefix used to generate prediction column names, e.g., model_A, model_B, …

series_names : list of str, optional

Explicit names for prediction columns. If omitted, names are generated from prefix as prefix_A, prefix_B, … Must have length n_series if provided.

seed : int or None, default=404

Seed for NumPy’s random generator. If None, a fresh RNG is used.

as_frame : bool, default=False

If False, return a Bunch with metadata and arrays. If True, return only the pandas DataFrame.

Returns:
data : Bunch or pandas.DataFrame

If as_frame=False (default), a Bunch with:

  • frame : pandas DataFrame containing 'time_step', 'y_true', and prediction columns.

  • feature_names : ['time_step'].

  • target_names : ['y_true'].

  • target : ndarray of shape (n_samples,) with the true signal.

  • series_names : list of prediction series names.

  • prediction_columns : list of prediction column names.

  • DESCR : human-readable description.

If as_frame=True, only the pandas DataFrame is returned.

Raises:
ValueError

If a provided list for prediction parameters does not match n_series in length.

TypeError

If prediction parameters are not float or list of float.

Return type:

Bunch | DataFrame
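The scalar-or-list rule shared by pred_bias, pred_noise_factor, pred_amplitude_factor, and pred_phase_shift, together with the two exceptions listed under Raises, can be sketched in a few lines. This is a minimal illustration only: the helper name broadcast_param and its error messages are hypothetical and are not part of kdiagram, and the sketch simply raises ValueError if the two-element default does not match n_series, which may not be how the library treats that case.

```python
from numbers import Real


def broadcast_param(value, n_series, name, default):
    """Hypothetical helper: expand a scalar or list to one float per series."""
    if value is None:
        # Fall back to the documented default, e.g. [0.0, 1.5] for pred_bias.
        value = default
    if isinstance(value, Real):
        # A scalar is repeated for every prediction series.
        return [float(value)] * n_series
    if isinstance(value, (list, tuple)):
        if not all(isinstance(v, Real) for v in value):
            raise TypeError(f"{name} must contain only numbers")
        if len(value) != n_series:
            raise ValueError(
                f"{name} has length {len(value)}, expected {n_series}"
            )
        return [float(v) for v in value]
    raise TypeError(f"{name} must be a float or a list of float")


print(broadcast_param(0.5, 3, "pred_bias", [0.0, 1.5]))   # [0.5, 0.5, 0.5]
print(broadcast_param(None, 2, "pred_bias", [0.0, 1.5]))  # [0.0, 1.5]
```

Passing a list of the wrong length, such as `broadcast_param([1.0], 2, "pred_bias", [0.0, 1.5])`, raises ValueError in this sketch, and a non-numeric value such as a string raises TypeError.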

See also

kdiagram.plot.relationship.plot_relationship

Polar relationship scatter for true vs. predictions.

kdiagram.plot.uncertainty.plot_temporal_uncertainty

General-purpose polar series plot; useful for Q10/Q50/Q90 and cyclical visualizations.

Notes

Signal model. Let \(P\) be the cycle period and \(\omega = 2\pi/P\). The true signal at time step \(t \in \{0,\dots,\texttt{n\_samples}-1\}\) is

\[y_{\text{true}}(t) \;=\; \texttt{offset\_true} \;+\; \texttt{amplitude\_true}\,\sin(\omega t) \;+\; \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0,\sigma^2), \;\; \sigma=\texttt{noise\_level}. \tag{1}\]

For series \(k=1,\dots,\texttt{n\_series}\), the prediction is

\[y_{\text{pred}}^{(k)}(t) \;=\; \texttt{offset\_true} \;+\; b_k \;+\; \big(\texttt{amplitude\_true}\,\alpha_k\big) \sin(\omega t + \phi_k) \;+\; \eta^{(k)}_t, \tag{2}\]

with \(\eta^{(k)}_t \sim \mathcal{N}\!\big(0,\, (\sigma\,\gamma_k)^2\big)\). Here \(b_k\) is the bias (pred_bias), \(\alpha_k\) the amplitude factor (pred_amplitude_factor), \(\phi_k\) the phase shift (pred_phase_shift), and \(\gamma_k\) the noise factor (pred_noise_factor). Numerical generation and plotting typically rely on array/scientific and graphics stacks [1][2][3].
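Equations (1) and (2) translate almost directly into NumPy. The following is a minimal sketch under stated assumptions, not the library's implementation: the function name sketch_cyclical_signals is hypothetical, the use of numpy.random.default_rng and the order of the random draws are assumptions, and the per-series values are passed as explicit sequences instead of being broadcast.

```python
import numpy as np


def sketch_cyclical_signals(
    n_samples=24,
    cycle_period=12.0,
    noise_level=0.5,
    amplitude_true=10.0,
    offset_true=20.0,
    pred_bias=(0.0, 1.5),
    pred_noise_factor=(1.0, 1.5),
    pred_amplitude_factor=(1.0, 0.8),
    pred_phase_shift=(0.0, np.pi / 6),
    seed=404,
):
    """Hypothetical re-implementation of equations (1) and (2)."""
    # Assumption: the library's RNG type and draw order may differ.
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)
    omega = 2.0 * np.pi / cycle_period  # angular frequency, omega = 2*pi/P

    # Equation (1): offset + amplitude * sin(omega * t) + Gaussian noise.
    y_true = (
        offset_true
        + amplitude_true * np.sin(omega * t)
        + rng.normal(0.0, noise_level, n_samples)
    )

    # Equation (2): one row per prediction series k.
    preds = []
    for b, gamma, alpha, phi in zip(
        pred_bias, pred_noise_factor, pred_amplitude_factor, pred_phase_shift
    ):
        eta = rng.normal(0.0, noise_level * gamma, n_samples)
        preds.append(
            offset_true + b + amplitude_true * alpha * np.sin(omega * t + phi) + eta
        )
    return t, y_true, np.vstack(preds)


# With the noise switched off, the deterministic part is easy to check by hand.
t, y_true, y_pred = sketch_cyclical_signals(noise_level=0.0)
print(y_true.shape, y_pred.shape)  # (24,) (2, 24)
```

With noise_level=0.0 and P = 12, the true signal at t = 3 is 20 + 10 sin(π/2) = 30, and the second series at t = 0 is 20 + 1.5 + 8 sin(π/6) = 25.5, which gives a quick sanity check on the roles of the bias, amplitude factor, and phase shift.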

References

Examples

Generate a small cyclical dataset as a Bunch:

>>> from kdiagram.datasets import make_cyclical_data
>>> ds = make_cyclical_data(
...     n_samples=24, n_series=2, cycle_period=12, seed=7
... )
>>> ds.frame.columns.tolist()[:3] == ['time_step', 'y_true', ds.prediction_columns[0]]
True

Return only a DataFrame and supply custom names:

>>> df = make_cyclical_data(
...     n_samples=50,
...     n_series=3,
...     series_names=['A', 'B', 'C'],
...     as_frame=True,
...     seed=1
... )
>>> {'time_step', 'y_true'}.issubset(df.columns)
True