.. _api-clustering:

seisgo.clustering
=================

The ``clustering`` module groups 1-D depth-velocity profiles from 3-D seismic tomography
models into coherent spatial clusters using k-means or self-organizing maps (SOM).

.. automodule:: seisgo.clustering
   :members:
   :undoc-members:
   :show-inheritance:

----

Function Summary
----------------

.. autosummary::
   :nosignatures:

   seisgo.clustering.vpcluster_kmean
   seisgo.clustering.vpcluster_evaluate_kmean
   seisgo.clustering.vpcluster_som

----

``vpcluster_kmean`` Parameter Reference
-----------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Default
     - Description
   * - ``lat``
     - *required*
     - 1-D array of latitudes for the model grid.
   * - ``lon``
     - *required*
     - 1-D array of longitudes for the model grid.
   * - ``dep``
     - *required*
     - 1-D depth array (km).
   * - ``vmodel``
     - *required*
     - 3-D velocity array, shape ``(n_depth, n_lat, n_lon)``.
   * - ``ncluster``
     - ``None``
     - Number of clusters. If ``None``, automatically determined via the elbow method.
   * - ``nrange``
     - ``None``
     - Range of cluster counts to evaluate when ``ncluster=None``. Default: 2–20.
   * - ``spacing``
     - ``1``
     - Spatial sub-sampling stride (every Nth lat/lon point).
   * - ``zrange``
     - ``None``
     - ``[z_min, z_max]`` depth range to use. Default: full model range.
   * - ``dz``
     - ``None``
     - Depth interpolation interval (km). ``None`` = use model depth grid as-is.
   * - ``metric``
     - ``"euclidean"``
     - Distance metric for k-means. Passed to ``TimeSeriesKMeans``.
   * - ``max_iter_barycenter``
     - ``100``
     - Maximum DBA (DTW Barycenter Averaging) iterations.
   * - ``random_state``
     - ``0``
     - Random seed for reproducibility.
   * - ``njob``
     - ``1``
     - Number of parallel jobs.
   * - ``plot``
     - ``True``
     - Plot cluster profiles and map.
   * - ``savefig``
     - ``True``
     - Save figures to PNG.
   * - ``figbase``
     - ``"kmean"``
     - Base name for output figure and pickle files.
   * - ``save``
     - ``True``
     - Save results to a pickle file. If ``False``, returns ``outdict``.
   * - ``evaluate_smooth``
     - ``False``
     - Smooth the distortion curve before knee detection.
   * - ``evaluate_plot``
     - ``True``
     - Plot the elbow curve when auto-detecting cluster count.

----

``vpcluster_som`` Parameter Reference
---------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Parameter
     - Default
     - Description
   * - ``lat``, ``lon``, ``depth``, ``v``
     - *required*
     - Same as ``vpcluster_kmean`` (note: ``depth`` not ``dep``; ``v`` not ``vmodel``).
   * - ``grid_size``
     - ``None``
     - ``[som_x, som_y]`` SOM grid dimensions. ``None`` = auto: ``ceil(√(√N))²``.
   * - ``spacing``
     - ``1``
     - Spatial sub-sampling stride.
   * - ``niteration``
     - ``50000``
     - Number of SOM training iterations.
   * - ``sigma``
     - ``0.3``
     - Initial neighbourhood radius.
   * - ``rate``
     - ``0.1``
     - Initial learning rate.
   * - ``plot``, ``savefig``, ``figbase``, ``save``
     - same as k-means
     - Same behaviour as ``vpcluster_kmean``.

----

Output dictionary structure
----------------------------

Both functions return (or save) a dictionary with the following keys:

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Key
     - Description
   * - ``method``
     - ``"k-means"`` or ``"som"``
   * - ``source``
     - User-supplied source label string
   * - ``tag``
     - User-supplied variable label string
   * - ``depth``
     - 1-D depth vector used for clustering
   * - ``model``
     - Fitted ``TimeSeriesKMeans`` or ``MiniSom`` object
   * - ``pred``
     - List of length ``n_clusters``; each element is an array of profiles in that cluster
   * - ``para``
     - Dictionary of algorithm parameters
   * - ``cluster_map``
     - ``pandas.DataFrame`` with columns ``lat``, ``lon``, ``cluster``

----

Elbow / Knee detection
-----------------------

:func:`vpcluster_evaluate_kmean` fits k-means for each value in ``nrange`` and uses the
``kneed`` library to locate the knee of the distortion (within-cluster sum of distances) curve.

.. code-block:: python

   from seisgo import clustering
   import numpy as np
   from tslearn.utils import to_time_series_dataset

   ts = to_time_series_dataset(all_profiles)
   nbest, distortions = clustering.vpcluster_evaluate_kmean(
       ts,
       nrange=np.arange(2, 15),
       smooth=True,
       plot=True,
   )
   print("Recommended cluster count:", nbest)