kdiagram.datasets.make_cyclical_data
- kdiagram.datasets.make_cyclical_data(n_samples=365, n_series=2, cycle_period=365, noise_level=0.5, amplitude_true=10.0, offset_true=20.0, pred_bias=None, pred_noise_factor=None, pred_amplitude_factor=None, pred_phase_shift=None, prefix='model', series_names=None, seed=404, as_frame=False)
Generate synthetic cyclical data for relationship and temporal plots.
Creates a dataset with a single true cyclical signal and one or more prediction series that can differ in amplitude, phase, bias, and noise relative to the truth. It is intended for demonstrating and testing k-diagram functions such as plot_relationship() and plot_temporal_uncertainty(), where visualizing behavior over a cycle matters [1][2][3].
- Parameters:
- n_samples
int, default=365 Number of time steps to generate. Interpreted as evenly spaced samples over one or more cycles.
- n_series
int, default=2 Number of simulated prediction series (e.g., models).
- cycle_period
float, default=365 Samples per full cycle \(P\). The angular frequency is \(\omega = 2\pi / P\). Use 365 for daily data over one year, 12 for monthly data over one year, etc.
- noise_level
float, default=0.5 Standard deviation of Gaussian noise added to the true signal. Prediction series scale this by pred_noise_factor.
- amplitude_true
float, default=10.0 Amplitude of the sinusoidal true signal.
- offset_true
float, default=20.0 Vertical offset (mean level) of the true signal.
- pred_bias
float or list of float, optional Additive bias for each prediction series. If a scalar is provided, it is broadcast to all n_series. If a list is provided, its length must equal n_series. Defaults to [0.0, 1.5] when None.
- pred_noise_factor
float or list of float, optional Multiplier for noise_level per series. Scalar values are broadcast; lists must match n_series in length. Defaults to [1.0, 1.5] when None.
- pred_amplitude_factor
float or list of float, optional Multiplier of amplitude_true per series (allows under- or over-estimation of the cycle amplitude). Scalar broadcast is supported. Defaults to [1.0, 0.8] when None.
- pred_phase_shift
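float or list of float, optional Phase shift (radians) added to the sine argument of each series. With the \(\sin(\omega t + \phi_k)\) convention of Eq. (2), positive values move the peaks to earlier time steps, so the series leads the truth. Scalar broadcast is supported. Defaults to [0.0, np.pi / 6] when None.
- prefix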
str, default='model' Prefix used to generate prediction column names, e.g., model_A, model_B, …
- series_names
list of str, optional Explicit names for prediction columns. If omitted, names are generated from prefix as prefix_A, prefix_B, … Must have length n_series if provided.
- seed
int or None, default=404 Seed for NumPy’s random generator. If None, a fresh RNG is used.
- as_frame
bool, default=False If False, return a Bunch with metadata and arrays. If True, return only the pandas DataFrame.
- Returns:
- data
Bunch or pandas.DataFrame If as_frame=False (default), a Bunch with:
- frame: pandas DataFrame containing 'time_step', 'y_true', and prediction columns.
- feature_names: ['time_step'].
- target_names: ['y_true'].
- target: ndarray of shape (n_samples,) with the true signal.
- series_names: list of prediction series names.
- prediction_columns: list of prediction column names.
- DESCR: human-readable description.
If as_frame=True, only the pandas DataFrame is returned.
- Raises:
ValueError If a provided list for prediction parameters does not match n_series in length.
TypeError If prediction parameters are not float or list of float.
- Return type:
Bunch | DataFrame
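The scalar-or-list handling described for pred_bias, pred_noise_factor, pred_amplitude_factor, and pred_phase_shift, together with the ValueError and TypeError cases listed under Raises, can be illustrated with a small sketch. The helper name broadcast_param and its signature are hypothetical and are not part of kdiagram; the library's actual validation may differ, in particular in how the two-element defaults are treated when n_series is not 2.

```python
from numbers import Real


def broadcast_param(value, n_series, default, name="param"):
    """Hypothetical helper: expand a scalar-or-list argument to n_series floats.

    Illustrative only; this is not kdiagram's implementation.
    """
    if value is None:
        # Fall back to the documented default (e.g. [0.0, 1.5] for pred_bias).
        value = default
    if isinstance(value, Real):
        # A scalar is repeated once per prediction series.
        return [float(value)] * n_series
    if isinstance(value, (list, tuple)):
        if len(value) != n_series:
            raise ValueError(
                f"{name} has length {len(value)}, expected n_series={n_series}"
            )
        if not all(isinstance(v, Real) for v in value):
            raise TypeError(f"{name} must contain only numbers")
        return [float(v) for v in value]
    raise TypeError(f"{name} must be a float or a list of floats")


# A scalar bias is broadcast to every series.
biases = broadcast_param(0.5, n_series=3, default=[0.0, 1.5], name="pred_bias")

# None selects the default list, which must itself match n_series in this sketch.
default_biases = broadcast_param(None, n_series=2, default=[0.0, 1.5])
```

Here biases holds three copies of 0.5 and default_biases equals the default list. Passing a one-element list with n_series=2 raises ValueError, and passing a string raises TypeError.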
See also
kdiagram.plot.relationship.plot_relationship: Polar relationship scatter for true vs. predictions.
kdiagram.plot.uncertainty.plot_temporal_uncertainty: General-purpose polar series plot; useful for Q10/Q50/Q90 and cyclical visualizations.
Notes
Signal model. Let \(P\) be the cycle period and \(\omega = 2\pi/P\). The true signal at time step \(t \in \{0,\dots,\texttt{n\_samples}-1\}\) is
\[y_{\text{true}}(t) \;=\; \texttt{offset\_true} \;+\; \texttt{amplitude\_true}\,\sin(\omega t) \;+\; \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0,\sigma^2), \;\; \sigma=\texttt{noise\_level}. \tag{1}\]
For series \(k=1,\dots,\texttt{n\_series}\), the prediction is
\[y_{\text{pred}}^{(k)}(t) \;=\; \texttt{offset\_true} \;+\; b_k \;+\; \big(\texttt{amplitude\_true}\,\alpha_k\big) \sin(\omega t + \phi_k) \;+\; \eta^{(k)}_t, \tag{2}\]
with \(\eta^{(k)}_t \sim \mathcal{N}\!\big(0,\, (\sigma\,\gamma_k)^2\big)\). Here \(b_k\) is the bias (pred_bias), \(\alpha_k\) the amplitude factor (pred_amplitude_factor), \(\phi_k\) the phase shift (pred_phase_shift), and \(\gamma_k\) the noise factor (pred_noise_factor). Numerical generation and plotting typically rely on array/scientific and graphics stacks [1][2][3].
References
Examples
Generate a small cyclical dataset as a Bunch:

>>> from kdiagram.datasets import make_cyclical_data
>>> ds = make_cyclical_data(
...     n_samples=24, n_series=2, cycle_period=12, seed=7
... )
>>> cols = ds.frame.columns.tolist()
>>> cols[:2]
['time_step', 'y_true']
>>> cols[2] == ds.prediction_columns[0]
True

Return only a DataFrame and supply custom names:

>>> df = make_cyclical_data(
...     n_samples=50,
...     n_series=3,
...     series_names=['A', 'B', 'C'],
...     as_frame=True,
...     seed=1
... )
>>> set(['time_step', 'y_true']).issubset(df.columns)
True
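The signal model of Eqs. (1) and (2) in the Notes can also be reproduced by hand, which helps when checking what each parameter does. The following is a minimal sketch using only the standard library; the function name cyclical_sketch and its arguments are hypothetical, it handles a single prediction series, and because it uses Python's random module instead of NumPy's generator its noisy values will not match the output of make_cyclical_data for the same seed.

```python
import math
import random


def cyclical_sketch(n_samples=24, cycle_period=12.0, amplitude_true=10.0,
                    offset_true=20.0, noise_level=0.5, bias=1.5,
                    amplitude_factor=0.8, phase_shift=math.pi / 6,
                    noise_factor=1.5, seed=404):
    """Hypothetical stand-in for Eqs. (1)-(2); not the library's code."""
    rng = random.Random(seed)
    omega = 2.0 * math.pi / cycle_period  # angular frequency, 2*pi / P
    y_true, y_pred = [], []
    for t in range(n_samples):
        # Eq. (1): offset + amplitude * sin(omega * t) + Gaussian noise.
        y_true.append(offset_true
                      + amplitude_true * math.sin(omega * t)
                      + rng.gauss(0.0, noise_level))
        # Eq. (2): biased offset, scaled amplitude, shifted phase, scaled noise.
        y_pred.append(offset_true + bias
                      + amplitude_true * amplitude_factor
                      * math.sin(omega * t + phase_shift)
                      + rng.gauss(0.0, noise_level * noise_factor))
    return y_true, y_pred


# Noise-free run over exactly one cycle, so peak positions are unambiguous.
y_true, y_pred = cyclical_sketch(n_samples=12, noise_level=0.0)
peak_true = max(range(12), key=y_true.__getitem__)
peak_pred = max(range(12), key=y_pred.__getitem__)

# A phase shift phi corresponds to phi / omega samples along the time axis.
shift_in_samples = (math.pi / 6) / (2.0 * math.pi / 12.0)
```

With cycle_period=12 and phase_shift=pi/6, the shift works out to one sample: the noise-free truth peaks at time step 3 and the prediction peaks at time step 2, which is the behavior implied by the \(\sin(\omega t + \phi_k)\) form of Eq. (2).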