# Data Formats

SeisGo uses a small set of structured data objects and file formats throughout its pipeline.

## File formats

### ASDF (`.h5`)

The primary on-disk format is **ASDF** (Adaptable Seismic Data Format), a self-describing HDF5
container that stores waveforms alongside station metadata (StationXML). SeisGo reads and writes
ASDF files via [pyasdf](https://seismicdata.github.io/pyasdf/).

ASDF files are used for:

- Raw waveform archives
- FFT intermediate products
- Cross-correlation output
- dv/v monitoring output

### Pickle (`.pk`)

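Lightweight Python serialization for intermediate results and clustering outputs. Useful for quick
experimentation, but not recommended for long-term archival: pickles can break across Python and
library versions, and loading one can execute arbitrary code, so only open files from trusted sources.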
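For reference, round-tripping an arbitrary Python object (for example a dict of clustering results) with the standard-library `pickle` module looks like this. `save_pickle`, `load_pickle`, and the dictionary keys in the usage line are hypothetical, not SeisGo functions or file contents.

```python
import pickle


def save_pickle(obj, path):
    """Hypothetical helper: serialize any picklable object to `path`."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Hypothetical helper: load an object written by save_pickle.

    Only use on trusted files: unpickling can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

Usage: `save_pickle({"labels": [0, 1, 1], "n_clusters": 2}, "clusters.pk")`, then `load_pickle("clusters.pk")` returns an equal dict.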

### CSV (`.csv`)

Plain-text output for anisotropy measurements (beamforming results and anisotropy parameters)
and for cluster map DataFrames.

---

## In-memory data types (`seisgo.types`)

### `FFTData`

Holds the frequency-domain representation of one station–component time series, ready for
cross-correlation.

Key attributes:

| Attribute | Description |
|-----------|-------------|
| `data` | 2-D complex array `(n_windows, n_freqs)` |
| `dt` | Sampling interval (s) |
| `std` | Per-window standard deviation |
| `time` | Window start times (UTCDateTime) |
| `freq` | Frequency vector (Hz) |
| `win_len` | Analysis window length (s) |
| `step` | Step between consecutive window start times (s); windows overlap by `win_len - step` |
| `net`, `sta`, `loc`, `chan` | SEED channel identifiers |
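The shapes of `data`, `freq`, `std`, and `time` follow from slicing a trace into windows of `win_len` seconds whose starts are `step` seconds apart, then transforming each window. The sketch below is a minimal NumPy illustration under stated assumptions: the hypothetical `slice_and_fft` demeans each window, applies a Hann taper, and calls `numpy.fft.rfft` without padding, and `std` is taken literally as each window's standard deviation. SeisGo's own tapering, padding, normalization, and `std` definition may differ.

```python
import numpy as np


def slice_and_fft(trace, dt, win_len, step):
    """Hypothetical helper: window a 1-D trace and FFT each window.

    Returns (spec, freq, std, starts): spec is complex with shape
    (n_windows, n_freqs), freq is in Hz, std holds one value per window,
    and starts are window start offsets (s) from the first sample.
    """
    nwin = int(round(win_len / dt))     # samples per window
    nstep = int(round(step / dt))       # samples between window starts
    if nwin > len(trace):
        raise ValueError("trace is shorter than one window")
    starts = np.arange(0, len(trace) - nwin + 1, nstep)
    windows = np.stack([trace[i:i + nwin] for i in starts])
    windows = windows - windows.mean(axis=1, keepdims=True)  # demean each window
    std = windows.std(axis=1)           # assumed: plain standard deviation
    taper = np.hanning(nwin)            # Hann taper (an assumption)
    spec = np.fft.rfft(windows * taper, axis=1)
    freq = np.fft.rfftfreq(nwin, d=dt)
    return spec, freq, std, starts * dt
```

For a 100 s trace at `dt = 0.01` with `win_len = 20` and `step = 10`, this yields 9 windows, so `spec` has shape `(9, 1001)`.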

### `CorrData`

Stores a time-lag cross-correlation function (or autocorrelation) between a station pair over
multiple time windows.

Key attributes:

| Attribute | Description |
|-----------|-------------|
| `data` | 2-D array `(n_windows, n_lags)` |
| `dt` | Sampling interval (s) |
| `lag` | Maximum lag (s) |
| `dist` | Inter-station distance (km) |
| `cc_comp` | Component pair, e.g. `"ZZ"`, `"RR"` |
| `side` | `"a"` (both), `"n"` (negative), `"p"` (positive) |
| `sta` | `[source_name, receiver_name]` |
| `time` | Window timestamps |
| `misc` | Dictionary of auxiliary metadata |

Useful methods:

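```python
# `corrdata` is an existing seisgo.types.CorrData instance
fmin, fmax = 0.1, 1.0                # example passband (Hz)
outfile = "corrdata_stacked.h5"      # example ASDF output path

corrdata.stack(method="linear")      # collapse the time windows to a single stack
corrdata.filter(fmin, fmax)          # bandpass filter in place
corrdata.plot()                      # quick visualization
corrdata.to_asdf(outfile)            # save to ASDF
```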
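The `lag`, `dt`, and `side` attributes together define the lag axis of `data`. The minimal sketch below assumes (not confirmed here) that a two-sided (`side="a"`) array has `2*round(lag/dt) + 1` samples centred on zero lag; `select_side` is a hypothetical helper, not a `CorrData` method.

```python
import numpy as np


def select_side(data, lag, dt, side="a"):
    """Hypothetical helper: return (lags, data) for one side of a correlation.

    Assumes a two-sided array whose last axis spans -lag..+lag at
    interval dt, i.e. 2*round(lag/dt) + 1 samples with zero lag centred.
    """
    n = int(round(lag / dt))
    lags = np.arange(-n, n + 1) * dt
    if data.shape[-1] != lags.size:
        raise ValueError("last axis does not span -lag..+lag at interval dt")
    if side == "a":
        keep = np.ones(lags.size, dtype=bool)   # both sides
    elif side == "p":
        keep = lags >= 0                        # positive lags (incl. zero)
    elif side == "n":
        keep = lags <= 0                        # negative lags (incl. zero)
    else:
        raise ValueError(f"side must be 'a', 'n' or 'p', got {side!r}")
    return lags[keep], data[..., keep]
```

With `lag = 2.0` and `dt = 0.1`, a two-sided array has 41 lag samples and each one-sided selection keeps 21, including zero lag.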

### `DvvData`

Stores dv/v measurements derived from a `CorrData` object.

Key attributes:

| Attribute | Description |
|-----------|-------------|
| `dvv` | dv/v values array |
| `err` | Measurement error / uncertainty |
| `cc` | Cross-correlation coefficient of each measurement |
| `time` | Time vector of measurements |
| `freq` | Frequency bands |
| `method` | Method used: `"wts"` (wavelet transform stretching) or `"ts"` (time-domain stretching) |
| `cc_comp` | Component pair |

Useful methods:

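```python
# `dvvdata` is an existing seisgo.types.DvvData instance
outfile = "dvv_output.h5"    # example ASDF output path

dvvdata.plot()               # quick visualization
dvvdata.to_asdf(outfile)     # save to ASDF
```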
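The `dvv` and `cc` attributes are easiest to read through the classic stretching technique: test trial values of dt/t, warp the current waveform accordingly, and keep the value with the highest correlation coefficient against the reference. This is a textbook grid-search sketch, not SeisGo's `"ts"` or `"wts"` implementation; `stretching_dvv`, `eps_max`, and `n_eps` are hypothetical names. It assumes a homogeneous velocity change, so that dv/v = -dt/t, and returns dv/v as a fraction rather than a percentage.

```python
import numpy as np


def stretching_dvv(ref, cur, t, eps_max=0.02, n_eps=401):
    """Hypothetical grid-search stretching: returns (dvv, cc).

    ref, cur: reference and current waveforms sampled at lag times t (s).
    For each trial tau = dt/t, the current waveform is evaluated at
    t * (1 + tau) and correlated with the reference.
    """
    taus = np.linspace(-eps_max, eps_max, n_eps)
    ccs = np.empty(n_eps)
    for i, tau in enumerate(taus):
        warped = np.interp(t * (1.0 + tau), t, cur)  # cur sampled at t(1+tau)
        ccs[i] = np.corrcoef(ref, warped)[0, 1]
    best = int(np.argmax(ccs))
    return -taus[best], ccs[best]                    # dv/v = -dt/t
```

If every arrival in the current waveform is delayed by 1 % (dt/t = 0.01), the search recovers dv/v close to -0.01 with a correlation coefficient near 1.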

---

## Xcorr output structure options

When saving cross-correlations with `noise.do_xcorr`, the subdirectory layout is controlled by
the `output_structure` parameter. Available options (from `helpers.xcorr_output_structure()`):

| Option | Short | Layout |
|--------|-------|--------|
| `raw` | `r` | By time chunk, all pairs together |
| `source` | `s` | Subfolder per virtual source |
| `station-pair` | `sp` | Subfolder per station pair |
| `station-component-pair` | `scp` | Nested source / component folders |
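In your own scripts, the table above can be mirrored as a lookup so that either the long or the short form is accepted and typos fail early. `normalize_output_structure` is a hypothetical helper; SeisGo's own handling in `helpers.xcorr_output_structure()` may behave differently.

```python
# Long and short option names, mirrored from the table above.
OUTPUT_STRUCTURES = {
    "raw": "r",
    "source": "s",
    "station-pair": "sp",
    "station-component-pair": "scp",
}


def normalize_output_structure(option):
    """Hypothetical helper: map a long or short option name to its long form."""
    key = option.strip().lower()
    if key in OUTPUT_STRUCTURES:
        return key
    for long_name, short in OUTPUT_STRUCTURES.items():
        if key == short:
            return long_name
    choices = ", ".join(f"{k} ({v})" for k, v in OUTPUT_STRUCTURES.items())
    raise ValueError(f"unknown output_structure {option!r}; choose from {choices}")
```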

---

## Stacking methods

Available methods (from `helpers.stack_methods()`):

`linear`, `pws`, `tf-pws`, `robust`, `acf`, `nroot`, `selective`, `cluster`

See [Stacking API](../api/stacking.rst) for parameter details.
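As an illustration of how two of these methods differ, the sketch below contrasts a linear stack (the mean over windows) with a phase-weighted stack, which scales the linear stack by the coherence of instantaneous phase across windows. It is a generic NumPy version, not SeisGo's `pws` implementation; the function names and the default `power=2.0` are assumptions, and the analytic signal is built with the same FFT construction that `scipy.signal.hilbert` uses.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal along the last axis, built in the frequency domain."""
    n = x.shape[-1]
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=-1) * h, axis=-1)


def linear_stack(data):
    """Mean over windows; data has shape (n_windows, n_lags)."""
    return data.mean(axis=0)


def pws_stack(data, power=2.0):
    """Phase-weighted stack: linear stack times phase coherence**power."""
    phasors = np.exp(1j * np.angle(analytic_signal(data)))  # unit phasors
    coherence = np.abs(phasors.mean(axis=0)) ** power       # between 0 and 1
    return linear_stack(data) * coherence
```

Because the coherence never exceeds 1, the phase-weighted stack is never larger than the linear stack; it equals the linear stack where all windows are in phase and is suppressed where phases are random.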

## Cross-correlation methods

Available methods (from `helpers.xcorr_methods()`):

`xcorr`, `deconv`, `coherency`
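The three options differ in how the cross-spectrum of the two spectra S_A and S_B is normalized. The minimal sketch below uses the textbook forms: `xcorr` uses conj(S_A)·S_B, `deconv` divides by |S_A|², and `coherency` divides by |S_A|·|S_B|. SeisGo's exact normalization (for example, any smoothing of amplitude spectra) may differ; `spectral_cross_correlation` and the water level `eps` are hypothetical. With this convention, a positive lag means the second trace arrives later than the first.

```python
import numpy as np


def spectral_cross_correlation(x, y, method="xcorr", eps=1e-10):
    """Hypothetical helper: correlate two equal-length traces via the FFT.

    Returns (lags_in_samples, ccf) with zero lag at the centre.
    """
    nfft = 2 * len(x)                  # zero-pad to avoid circular wrap-around
    sx = np.fft.rfft(x, nfft)
    sy = np.fft.rfft(y, nfft)
    cross = np.conj(sx) * sy           # cross-spectrum
    if method == "deconv":
        cross = cross / (np.abs(sx) ** 2 + eps)
    elif method == "coherency":
        cross = cross / (np.abs(sx) * np.abs(sy) + eps)
    elif method != "xcorr":
        raise ValueError(f"unknown method: {method!r}")
    ccf = np.fft.fftshift(np.fft.irfft(cross, nfft))
    lags = np.arange(-nfft // 2, nfft // 2)
    return lags, ccf
```

For a trace `y` that is `x` delayed by 10 samples, all three methods peak at a lag of +10 samples; `deconv` and `coherency` produce a sharper, whitened peak than `xcorr`.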