1. Preprocess seismic waveforms

An essential step before detecting and locating earthquakes is the adequate preprocessing of the seismic data. Preprocessing addresses specific needs for a given data set and task and, thus, cannot be straightforwardly generalized to new data sets. Preprocessing workflows must remain as simple as possible. The preprocessing steps taken in this notebook are given as an example and should not be blindly reproduced on other data sets.

Contents

[43]:
import obspy
import os
import tqdm

import matplotlib.pyplot as plt

from glob import glob

Read and destination paths

First, we define the path from which we read the raw data and save the preprocessed data. Also extract the path to all input waveforms.

[44]:
DIRPATH_RAW = "../data/raw/"
DIRPATH_PROCESSED = "../data/processed/"

Again create directory if needed and copy the metadata therein.

[45]:
# Create directory
os.makedirs(DIRPATH_PROCESSED, exist_ok=True)

# Copy meta to destination
!cp {DIRPATH_RAW}*xml {DIRPATH_PROCESSED}

Show an example raw trace

We collect all available waveform files to manipulate them afterward.

[46]:
filepaths_raw = sorted(glob(os.path.join(DIRPATH_RAW, "*.mseed")))

Using obspy, we can visualize the waveforms. In the following cell, the selected waveform has not been preprocessed yet: the amplitudes are in counts (signed integers) and the waveform exhibits low-frequency components, especially with a day-long period. Note that, here, we do not show the earthquake itself to see the ambient noise better.

[47]:
stream = obspy.read(filepaths_raw[0])
stream.plot(size=(600, 250), show=True)
plt.gcf().set_facecolor('w')
../../_images/tutorial_notebooks_1_preprocess_9_0.png
<Figure size 640x480 with 0 Axes>

Preprocess

The next cell loads every waveform and applies the preprocessing workflow, which includes detrending individual segments, filling gaps over the entire day, synchronizing the traces on the requested start and end times, resampling at the target sampling rate and removing the instrument gain. This cell runs in about 3 minutes on a desktop machine.

[48]:
TARGET_SAMPLING_RATE = 25.0
MINIMUM_CHUNK_DURATION_SEC = 1800.
FREQ_MIN = 2.0
FREQ_MAX = 12.0
TAPER_PERCENT = 0.02
TAPER_TYPE = "cosine"
MSEED_ENCODING = "FLOAT64"
TARGET_STARTTIME = obspy.UTCDateTime("2012-07-26")
TARGET_ENDTIME = obspy.UTCDateTime("2012-07-27")

for filepath_waveform in tqdm.tqdm(filepaths_raw, desc="Processing data"):

    # Read trace
    trace = obspy.read(filepath_waveform)[0]

    # Split trace into segments to process them individually
    stream = trace.split()
    for chunk in stream:
        t1 = obspy.UTCDateTime(chunk.stats.starttime.timestamp)
        t2 = obspy.UTCDateTime(chunk.stats.endtime.timestamp)
        if t2 - t1 < MINIMUM_CHUNK_DURATION_SEC:
            # don't include this chunk
            stream.remove(chunk)


    # Apply detrend on segments
    stream.detrend("constant")
    stream.detrend("linear")
    stream.taper(TAPER_PERCENT, type=TAPER_TYPE)

    # Merge traces, filling gaps with zeros and imposing start and end times
    stream = stream.merge(fill_value=0.0)
    trace = stream[0]
    trace.trim(starttime=TARGET_STARTTIME, endtime=TARGET_ENDTIME, pad=True, fill_value=0.0)

    # Resample at target sampling rate
    trace.resample(TARGET_SAMPLING_RATE)

    # Attach instrument response
    filepath_inventory = f"{trace.stats.network}.{trace.stats.station}.xml"
    filepath_inventory = os.path.join(DIRPATH_RAW, filepath_inventory)
    inventory = obspy.read_inventory(filepath_inventory)
    trace.attach_response(inventory)

    # Remove instrument gain
    trace.remove_sensitivity()

    # Detrend
    trace.detrend("constant")
    trace.detrend("linear")
    trace.taper(TAPER_PERCENT, type=TAPER_TYPE)

    # Filter
    trace.filter(
        "bandpass", freqmin=FREQ_MIN, freqmax=FREQ_MAX, zerophase=True
    )
    trace.taper(TAPER_PERCENT, type=TAPER_TYPE)

    # Write processed traces
    _, filename = os.path.split(filepath_waveform)
    filepath_processed_waveform = os.path.join(DIRPATH_PROCESSED, filename)
    trace.write(filepath_processed_waveform, encoding=MSEED_ENCODING)

Processing data:   0%|          | 0/24 [00:00<?, ?it/s]
Processing data: 100%|██████████| 24/24 [01:18<00:00,  3.26s/it]

Compare raw and processed traces

After the preprocessing, we see that the low-frequency components were filtered out as a result of the bandpass filter, and we see that the waveform amplitude is units of velocity (m/s). Impulsive signals are also more visible.

[53]:
# Get a file to read
filepath_raw = filepaths_raw[2]
filename_raw = os.path.basename(filepath_raw)
filepath_processed = os.path.join(DIRPATH_PROCESSED, filename_raw)

# Loop over cases and show
for filepath in (filepath_raw, filepath_processed):
    stream = obspy.read(filepath)
    stream.plot(figsize=(12, 6), show=True)
../../_images/tutorial_notebooks_1_preprocess_13_0.png
../../_images/tutorial_notebooks_1_preprocess_13_1.png

Data are never perfect… Check out station DE07, which came with a huge anomaly shortly before 4am.

[59]:
# Get a file to read
filepath_raw = filepaths_raw[12]
filename_raw = os.path.basename(filepath_raw)
filepath_processed = os.path.join(DIRPATH_PROCESSED, filename_raw)

# Loop over cases and show
for filepath in (filepath_raw, filepath_processed):
    stream = obspy.read(filepath)
    stream = stream.slice(
        starttime=obspy.UTCDateTime("2012-07-26T03:50:00"), endtime=obspy.UTCDateTime("2012-07-26T03:52:00")
    )
    stream.plot(figsize=(12, 6), method="full", show=True)
../../_images/tutorial_notebooks_1_preprocess_15_0.png
../../_images/tutorial_notebooks_1_preprocess_15_1.png