kdiagram.plot.uncertainty.plot_velocity

kdiagram.plot.uncertainty.plot_velocity(df, q50_cols, theta_col=None, cmap='viridis', acov='default', normalize=True, use_abs_color=True, figsize=(9, 9), title=None, s=30, alpha=0.85, show_grid=True, savefig=None, cbar=True, mask_angle=False, dpi=300, ax=None)[source]

Polar plot visualizing average velocity across locations.

Generates a polar scatter plot where each point represents a unique location or observation from the input DataFrame. The radial distance (r) of each point corresponds to the average rate of change (velocity) of the median prediction (Q50) over consecutive time periods (e.g., years), optionally normalized to [0, 1]. The angular position (theta) represents the location, currently determined by its index in the DataFrame, mapped onto a specified angular coverage. The color of each point provides an additional dimension, representing either the calculated velocity itself or the average absolute magnitude of the Q50 predictions over the considered time periods [1].

This visualization is useful for identifying spatial patterns in the dynamics of a phenomenon, such as locating areas of rapid or slow change (high/low velocity) in land subsidence predictions. Coloring by magnitude helps to contextualize the velocity (e.g., is high velocity occurring in areas of already high subsidence?).

Parameters:
df : pd.DataFrame

Input DataFrame containing the data. Must include the columns specified in q50_cols. Decorator @isdf ensures this is a pandas DataFrame. Decorator @check_non_emptiness ensures it’s not empty.

q50_cols : list of str

An ordered list of column names representing the Q50 (median) predictions for consecutive time steps (e.g., years). The list must contain at least two column names to compute velocity. Example: ['subsidence_2022_q50', 'subsidence_2023_q50', 'subsidence_2024_q50'].
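Because the order of q50_cols determines the sign and size of the computed differences, it can help to build the list programmatically. The following is a minimal sketch, assuming column names follow the `<name>_<year>_q50` pattern used in the example above; the helper `year_of` is hypothetical and not part of kdiagram.

```python
import re

# Hypothetical column names following the '<name>_<year>_q50' pattern;
# other naming schemes would need a different regular expression.
columns = ['subsidence_2024_q50', 'subsidence_2022_q50', 'subsidence_2023_q50']


def year_of(col):
    """Extract the four-digit year embedded in a column name."""
    match = re.search(r'_(\d{4})_q50$', col)
    if match is None:
        raise ValueError(f"No year found in column name: {col!r}")
    return int(match.group(1))


# Sort chronologically so consecutive columns are consecutive time steps.
q50_cols = sorted(columns, key=year_of)
```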

theta_col : str, optional

Name of the column intended to set the angular position (theta) of each location (e.g., ‘latitude’, ‘longitude’, or a spatial index). Note: the current implementation does not use this column. It always maps the DataFrame row index onto the angular range specified by `acov`. Providing `theta_col` triggers a warning but does not affect the plot’s angular axis. Default is None.

cmap : str, default='viridis'

The name of the Matplotlib colormap used to color the scatter points based on color_vals (determined by use_abs_color).

acov : str, default='default'

Angular coverage defining the span of the polar plot’s theta axis. Options are:

  • 'default': Full circle (2π radians or 360 degrees).

  • 'half_circle': Half circle (π radians or 180 degrees).

  • 'quarter_circle': Quarter circle (π/2 radians or 90 degrees).

  • 'eighth_circle': Eighth circle (π/4 radians or 45 degrees).

Any unrecognized value falls back to 'default'.
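The option-to-span mapping and its fallback can be sketched with a dictionary lookup. This is an illustration only: the names `ACOV_SPANS` and `angular_span` are assumptions, and the library's internal structure may differ.

```python
import numpy as np

# Assumed lookup table mirroring the options listed above; the library's
# internal names and structure may differ.
ACOV_SPANS = {
    'default': 2 * np.pi,
    'half_circle': np.pi,
    'quarter_circle': np.pi / 2,
    'eighth_circle': np.pi / 4,
}


def angular_span(acov):
    """Return the span in radians, falling back to the full circle."""
    return ACOV_SPANS.get(acov, ACOV_SPANS['default'])
```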

normalize : bool, default=True

If True, the calculated average velocity values (r) are min-max normalized to the range [0, 1] before plotting radially. This emphasizes relative velocity patterns. If False, the raw average velocity values are used for the radial coordinate.
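A minimal sketch of min-max scaling, assuming a constant input maps to zeros as described in the Notes; `min_max_normalize` is a hypothetical helper, not the library's own function.

```python
import numpy as np


def min_max_normalize(r):
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    r = np.asarray(r, dtype=float)
    span = r.max() - r.min()
    if span == 0:
        return np.zeros_like(r)
    return (r - r.min()) / span


# Velocities may be negative; the most negative one lands at radius 0.
r = np.array([-1.0, 0.5, 2.0])
r_norm = min_max_normalize(r)
```

Note that after scaling, the radius no longer shows the sign of the velocity: the location with the most negative velocity sits at the center, and a radius of 0 does not mean "no change".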

use_abs_color : bool, default=True

Determines the variable used for coloring the points:

  • If True, points are colored based on the average absolute magnitude of the Q50 values across the specified q50_cols. This highlights areas with high overall prediction values.

  • If False, points are colored based on the calculated average velocity (r) itself. This highlights areas of high or low rate of change.
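The two coloring modes can be compared on a toy matrix. The sketch below uses made-up numbers and assumes rows are locations and columns are time steps; it is not taken from the library's source.

```python
import numpy as np

# Two hypothetical locations (rows) observed at three time steps (columns).
Q = np.array([[1.0, -2.0, 3.0],
              [4.0, 5.0, 6.0]])

use_abs_color = True

# Average of consecutive differences along the time axis.
velocity = np.diff(Q, axis=1).mean(axis=1)
# Average absolute magnitude across all time steps.
abs_magnitude = np.abs(Q).mean(axis=1)

color_vals = abs_magnitude if use_abs_color else velocity
```

Here both locations have the same average velocity (1.0) but different magnitudes (2.0 and 5.0), which is the kind of contrast that coloring by magnitude is meant to expose.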

figsize : tuple of (float, float), default=(9, 9)

The width and height of the figure in inches.

title : str, optional

The title displayed above the polar plot. If None, the fixed default title “Normalized Subsidence Velocity” is used; it is not adjusted to the data or to the normalize setting. Default is None.

s : float or int, default=30

The marker size for the scatter points.

alpha : float, default=0.85

The transparency level of the scatter points (0=transparent, 1=opaque). Useful for visualizing dense data.

show_grid : bool, default=True

If True, display the polar grid lines (radial and angular) on the plot.

savefig : str, optional

The file path (including extension, e.g., ‘velocity_plot.pdf’) where the plot image should be saved. If None, the plot is displayed interactively using plt.show(). Default is None.

cbar : bool, default=True

If True, display a color bar alongside the plot indicating the mapping between colors and the values defined by use_abs_color.

mask_angle : bool, default=False

If True, hide the angular tick labels (the degrees/radians around the circumference). This can be useful if the angular position based on index is not inherently meaningful.

Returns:
ax : matplotlib.axes.Axes

The Matplotlib Axes object containing the polar scatter plot. Can be used for further customization.
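As an illustration of such customization, the sketch below applies a few standard Matplotlib polar-axes calls. It uses a plain polar scatter as a stand-in for the Axes returned by plot_velocity, so the data and title are assumptions; the same calls should apply to any polar Axes.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for the Axes returned by plot_velocity: a plain polar scatter.
fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
theta = np.linspace(0, 2 * np.pi, 20)
r = np.linspace(0, 1, 20)
ax.scatter(theta, r, c=r, cmap='viridis', s=30, alpha=0.85)

# Typical follow-up tweaks on a polar Axes.
ax.set_theta_zero_location('N')   # put angle 0 at the top
ax.set_theta_direction(-1)        # increase angles clockwise
ax.set_rlabel_position(135)       # move radial tick labels to 135 degrees
ax.set_title('Customized polar scatter')
```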

Raises:
ValueError

If q50_cols contains fewer than two column names.

See also

numpy.diff

Computes the difference between consecutive elements.

numpy.mean

Computes the arithmetic mean.

matplotlib.pyplot.scatter

Creates scatter plots.

matplotlib.pyplot.polar

Creates polar plots.

kdiagram.plot.uncertainty.plot_uncertainty_drift

Visualizes uncertainty width changes over time.

Notes

  • The function assumes the columns in q50_cols represent equally spaced time steps for the velocity calculation to be meaningful as an average yearly (or per-step) velocity.

  • The average velocity (r) is calculated as the mean of the first-order differences between consecutive columns in q50_cols.

  • Normalization of r uses min-max scaling: \(r' = (r - \min(r)) / (\max(r) - \min(r))\).

  • The angular coordinate theta is currently derived from the DataFrame index, mapped linearly onto the angular range defined by acov. The theta_col parameter is not used for positioning in the current implementation, which might be revised in future versions. A warning is issued if theta_col is provided [2].
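The velocity definition in the notes above can be checked numerically. The sketch below uses made-up values and also shows a consequence of averaging first-order differences: the sum telescopes, so the result equals the last column minus the first, divided by the number of steps.

```python
import numpy as np

# Two hypothetical locations (rows) observed at four time steps (columns).
Q = np.array([[5.0, 7.0, 6.0, 11.0],
              [2.0, 2.0, 2.0, 2.0]])

# Mean of consecutive differences along the time axis.
velocity = np.diff(Q, axis=1).mean(axis=1)

# Equivalent closed form: (last - first) / (number of steps).
endpoint_slope = (Q[:, -1] - Q[:, 0]) / (Q.shape[1] - 1)
```

In other words, the intermediate columns cancel out of the average velocity; they still contribute to the color values when use_abs_color=True.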

Let \(\mathbf{Q}\) be the data matrix extracted from df using columns q50_cols, with shape \((N, M)\), where \(N\) is the number of locations (rows) and \(M\) is the number of time points (columns). Note the transpose compared to the description in plot_feature_fingerprint.

  1. Velocity Calculation: The differences between consecutive time points for each location \(j\) are computed:

    \(\Delta Q_{j,i} = Q_{j, i+1} - Q_{j, i}\) for \(i = 0, \dots, M-2\).

    The average velocity for location \(j\) is:

    \[r_j = \frac{1}{M-1} \sum_{i=0}^{M-2} \Delta Q_{j,i} \tag{1}\]

  2. Radial Normalization (if normalize=True):

    Let \(\mathbf{r} = (r_0, \dots, r_{N-1})\).

    \[r'_j = \frac{r_j - \min(\mathbf{r})}{\max(\mathbf{r}) - \min(\mathbf{r})} \tag{2}\]

    If \(\max(\mathbf{r}) = \min(\mathbf{r})\), \(r'_j = 0\).

  3. Color Value Calculation:

    • If use_abs_color=True: Average absolute magnitude.

      \[c_j = \frac{1}{M} \sum_{i=0}^{M-1} |Q_{j,i}| \tag{3}\]

    • If use_abs_color=False: Use average velocity. \(c_j = r_j\)

  4. Angular Coordinate Calculation:

    Let \(S\) be the angular span in radians determined by acov (e.g., \(2\pi\) for 'default'). The angle for location \(j\) (where \(j\) is the row index from \(0\) to \(N-1\)) is:

    \[\theta_j = \frac{j}{N-1} \times S \tag{4}\]

    The code computes the angles as np.linspace(0, 1, N) multiplied by the angular span. Because np.linspace includes the endpoint by default, the N angles run from 0 to \(S\) inclusive, which gives the \(N-1\) denominator above (for \(N > 1\)). With acov='default', the first and last locations therefore point in the same direction, since the angles 0 and \(2\pi\) coincide.
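The effect of endpoint handling can be seen in a short sketch. The values of `N` and `span` are illustrative; only np.linspace and its `endpoint` argument are standard NumPy.

```python
import numpy as np

N = 5
span = 2 * np.pi  # full circle, as with acov='default'

# Default behaviour: endpoint included, angles are j / (N - 1) * span.
theta_inclusive = np.linspace(0, 1, N) * span

# Alternative spacing: endpoint excluded, angles are j / N * span.
theta_exclusive = np.linspace(0, 1, N, endpoint=False) * span
```

With the endpoint included, the last angle equals the full span, so on a full circle the first and last points share a direction. Excluding the endpoint spaces the points evenly without that overlap.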

References

[1]

Kouadio, K. L., Liu, R., Loukou, K. G. H., Liu, J., & Liu, W. (2025). Analytics Framework for Interpreting Spatiotemporal Probabilistic Forecasts. International Journal of Forecasting. Manuscript submitted.

[2]

Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90-95.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from kdiagram.plot.uncertainty import plot_velocity

1. Random Example:

>>> np.random.seed(0)
>>> N_points = 100
>>> df_random = pd.DataFrame({
...     'location_id': range(N_points),
...     'value_2020_q50': np.random.rand(N_points) * 10,
...     'value_2021_q50': (np.random.rand(N_points) * 10 +
...                        np.linspace(0, 5, N_points)),
...     'value_2022_q50': (np.random.rand(N_points) * 10 +
...                        np.linspace(0, 10, N_points)),
...     'latitude': np.linspace(22, 23, N_points)
... })
>>> q50_cols_random = ['value_2020_q50', 'value_2021_q50',
...                    'value_2022_q50']
>>> ax_random = plot_velocity(
...     df=df_random,
...     q50_cols=q50_cols_random,
...     theta_col='latitude', # Note: currently ignored for pos
...     acov='default',
...     normalize=True,
...     use_abs_color=False, # Color by velocity
...     title='Random Data Velocity Profile',
...     cmap='coolwarm',
...     s=40,
...     cbar=True
... )
>>> # plt.show() is called internally if savefig is None

2. Concrete Example (Subsidence Data - adapted from docstring):

>>> # Dummy data standing in for a loaded subsidence DataFrame
>>> # (here named zhongshan_pred_2023_2026); replace with real data.
>>> n_locs = 150
>>> zhongshan_pred_2023_2026 = pd.DataFrame({
...     'subsidence_2022_q50': np.random.rand(n_locs)*5 + 5,
...     'subsidence_2023_q50': (np.random.rand(n_locs)*6 + 6 +
...                             np.linspace(0, 2, n_locs)),
...     'subsidence_2024_q50': (np.random.rand(n_locs)*7 + 7 +
...                             np.linspace(0, 4, n_locs)),
...     'subsidence_2025_q50': (np.random.rand(n_locs)*8 + 8 +
...                             np.linspace(0, 6, n_locs)),
...     'subsidence_2026_q50': (np.random.rand(n_locs)*9 + 9 +
...                             np.linspace(0, 8, n_locs)),
...     'latitude': np.linspace(22.2, 22.8, n_locs)
... })
>>> subsidence_q50_cols = [
...     'subsidence_2022_q50', 'subsidence_2023_q50',
...     'subsidence_2024_q50', 'subsidence_2025_q50',
...     'subsidence_2026_q50',
... ]
>>> ax_subsidence = plot_velocity(
...     df=zhongshan_pred_2023_2026,
...     q50_cols=subsidence_q50_cols,
...     theta_col='latitude',       # Ignored for pos, triggers warning
...     acov='quarter_circle',      # Focus angular range
...     normalize=True,
...     use_abs_color=True,         # Color by Q50 magnitude
...     title='Subsidence Velocity Across Zhongshan (2022–2026)',
...     cmap='plasma',
...     s=25,
...     cbar=True,
...     mask_angle=True             # Hide angle labels
... )
>>> # plt.show() called internally