seisgo.clustering

The clustering module groups 1-D depth-velocity profiles from 3-D seismic tomography models into coherent spatial clusters using k-means or self-organizing maps (SOM).

Function Summary

`vpcluster_kmean` Parameter Reference

Parameter	Default	Description
`lat`	required	1-D array of latitudes for the model grid.
`lon`	required	1-D array of longitudes for the model grid.
`dep`	required	1-D depth array (km).
`vmodel`	required	3-D velocity array, shape `(n_depth, n_lat, n_lon)`.
`ncluster`	`None`	Number of clusters. If `None`, automatically determined via the elbow method.
`nrange`	`None`	Range of cluster counts to evaluate when `ncluster=None`. Default: 2–20.
`spacing`	`1`	Spatial sub-sampling stride (every Nth lat/lon point).
`zrange`	`None`	`[z_min, z_max]` depth range to use. Default: full model range.
`dz`	`None`	Depth interpolation interval (km). `None` = use model depth grid as-is.
`metric`	`"euclidean"`	Distance metric for k-means. Passed to `TimeSeriesKMeans`.
`max_iter_barycenter`	`100`	Maximum DBA (DTW Barycenter Averaging) iterations.
`random_state`	`0`	Random seed for reproducibility.
`njob`	`1`	Number of parallel jobs.
`plot`	`True`	Plot cluster profiles and map.
`savefig`	`True`	Save figures to PNG.
`figbase`	`"kmean"`	Base name for output figure and pickle files.
`save`	`True`	Save results to a pickle file. If `False`, returns `outdict`.
`evaluate_smooth`	`False`	Smooth the distortion curve before knee detection.
`evaluate_plot`	`True`	Plot the elbow curve when auto-detecting cluster count.

`vpcluster_som` Parameter Reference

Parameter	Default	Description
`lat`, `lon`, `depth`, `v`	required	Same as `vpcluster_kmean` (note: `depth` not `dep`; `v` not `vmodel`).
`grid_size`	`None`	`[som_x, som_y]` SOM grid dimensions. `None` = auto: `ceil(√(√N))²`.
`spacing`	`1`	Spatial sub-sampling stride.
`niteration`	`50000`	Number of SOM training iterations.
`sigma`	`0.3`	Initial neighbourhood radius.
`rate`	`0.1`	Initial learning rate.
`plot`, `savefig`, `figbase`, `save`	same as k-means	Same behaviour as `vpcluster_kmean`.

Output dictionary structure

Both functions return (or save) a dictionary with the following keys:

Key	Description
`method`	`"k-means"` or `"som"`
`source`	User-supplied source label string
`tag`	User-supplied variable label string
`depth`	1-D depth vector used for clustering
`model`	Fitted `TimeSeriesKMeans` or `MiniSom` object
`pred`	List of length `n_clusters`; each element is an array of profiles in that cluster
`para`	Dictionary of algorithm parameters
`cluster_map`	`pandas.DataFrame` with columns `lat`, `lon`, `cluster`

Elbow / Knee detection

vpcluster_evaluate_kmean() fits k-means for each value in nrange and uses the kneed library to locate the knee of the distortion (within-cluster sum of distances) curve.

from seisgo import clustering
import numpy as np
from tslearn.utils import to_time_series_dataset

ts = to_time_series_dataset(all_profiles)
nbest, distortions = clustering.vpcluster_evaluate_kmean(
    ts,
    nrange=np.arange(2, 15),
    smooth=True,
    plot=True,
)
print("Recommended cluster count:", nbest)