{
"cells": [
{
"cell_type": "markdown",
"id": "6619490b",
"metadata": {},
"source": [
"# Preprocess Seismic Data\n",
"\n",
"In this notebook, we pre-process the seismic data downloaded in the previous section `1_download_data.ipynb`. This step is essential for the successful application of the detection and location workflow."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "0bcab028",
"metadata": {},
"outputs": [],
"source": [
"import BPMF\n",
"import glob\n",
"import numpy as np\n",
"import os\n",
"\n",
"import obspy as obs\n",
"\n",
"from datetime import timedelta\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "42b9c95d",
"metadata": {},
"outputs": [],
"source": [
"DATE = obs.UTCDateTime(\"2012-07-26\")\n",
"dirpath_data = os.path.join(BPMF.cfg.INPUT_PATH, str(DATE.year), DATE.strftime(\"%Y%m%d\"))"
]
},
{
"cell_type": "markdown",
"id": "a3f9c2b9",
"metadata": {},
"source": [
"## Build the data reader\n",
"\n",
"`BPMF` allows any data formats to be used. The price for flexibility is to define a function that reads the data and returns `obspy.Stream`."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "6dbf21a6",
"metadata": {},
"outputs": [],
"source": [
"def data_reader_mseed(\n",
" where,\n",
" network=\"*\",\n",
" station=\"*\",\n",
" channel=\"*\",\n",
" location=\"*\",\n",
" starttime=None,\n",
" endtime=None,\n",
" attach_response=False,\n",
" data_folder=\"\",\n",
" **kwargs\n",
"):\n",
" \"\"\"Data reader for BPMF.\n",
"\n",
" This data reader is specifically designed for the folder tree convention\n",
" that we use in the tutorial. We will use the same data reader at later\n",
" stages of the workflow.\n",
"\n",
" Parameters\n",
" -----------\n",
" where: string\n",
" Path to data file or root data folder.\n",
" network: string or list, optional\n",
" Code(s) of the target network(s).\n",
" station: string or list, optional\n",
" Code(s) of the target station(s).\n",
" channel: string or list, optional\n",
" Code(s) of the target channel(s).\n",
" location: string or list, optional\n",
" Code(s) of the target location(s).\n",
" starttime: string or obspy.UTCDateTime, optional\n",
" Target start time.\n",
" endtime: string or obspy.UTCDateTime, optional\n",
" Target end time.\n",
" attach_response: boolean, optional\n",
" If True, find the instrument response from the xml files\n",
" and attach it to the obspy.Stream output instance.\n",
" data_folder: string, optional\n",
" If given, is the child folder in `where` containing\n",
" the mseed files to read.\n",
"\n",
" Returns\n",
" -------\n",
" traces: obspy.Stream\n",
" The seismic data.\n",
" \"\"\"\n",
" from obspy import Stream\n",
"\n",
" traces = Stream()\n",
" # read your data into traces\n",
" data_files = glob.glob(\n",
" os.path.join(where, data_folder, f\"{network}.{station}.{location}.{channel}*\")\n",
" )\n",
" for fname in data_files:\n",
" traces += obs.read(fname, starttime=starttime, endtime=endtime, **kwargs)\n",
" if attach_response:\n",
" resp_files = glob.glob(\n",
" os.path.join(where, \"resp\", f\"{network}.{station}.xml\")\n",
" )\n",
" invs = list(map(obs.read_inventory, resp_files))\n",
" traces.attach_response(invs)\n",
" return traces\n"
]
},
{
"cell_type": "markdown",
"id": "d74a30de",
"metadata": {},
"source": [
"In general, you do want to make sure you personalize this data reader for your own folder architecture. If you choose to use a similar file architecture as in this tutorial, you may want to use a more complex version of this data reader, available at `BPMF.data_reader_examples.data_reader_mseed`."
]
},
{
"cell_type": "markdown",
"id": "04a865dc",
"metadata": {},
"source": [
"## Read the raw data"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "f7e9af54",
"metadata": {},
"outputs": [],
"source": [
"raw_traces = data_reader_mseed(dirpath_data, attach_response=True, data_folder=\"raw\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "8d028e4b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"24 Trace(s) in Stream:\n",
"YH.DE08..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE07..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE08..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DD06..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC07..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.SAUV..HHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 100.0 Hz, 8740001 samples\n",
"YH.DC07..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.SAUV..HHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 100.0 Hz, 8740001 samples\n",
"YH.DC08..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.SPNC..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.SPNC..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC06..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC06..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC06..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC07..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.SAUV..HHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 100.0 Hz, 8740001 samples\n",
"YH.DC08..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC08..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.SPNC..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE07..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE08..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE07..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DD06..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DD06..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"Channel Response\n",
"\tFrom M/S (Velocity in Meters per Second) to COUNTS (Digital Counts)\n",
"\tOverall Sensitivity: 2.45935e+09 defined at 1.000 Hz\n",
"\t6 stages:\n",
"\t\tStage 1: PolesZerosResponseStage from M/S to V, gain: 2420\n",
"\t\tStage 2: CoefficientsTypeResponseStage from V to COUNTS, gain: 1.01626e+06\n",
"\t\tStage 3: FIRResponseStage from COUNTS to COUNTS, gain: 1\n",
"\t\tStage 4: FIRResponseStage from COUNTS to COUNTS, gain: 1\n",
"\t\tStage 5: FIRResponseStage from COUNTS to COUNTS, gain: 1\n",
"\t\tStage 6: FIRResponseStage from COUNTS to COUNTS, gain: 1\n"
]
}
],
"source": [
"print(raw_traces.__str__(extended=True))\n",
"\n",
"print(raw_traces[0].stats.response)"
]
},
{
"cell_type": "markdown",
"id": "1da95385",
"metadata": {},
"source": [
"## Preprocess the data"
]
},
{
"cell_type": "markdown",
"id": "1307c5a9",
"metadata": {},
"source": [
"First, we call `obspy.Stream.merge()` to group the different segments of a same channel within a single trace."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "1e3651c2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"24 Trace(s) in Stream:\n",
"\n",
"YH.DC06..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"...\n",
"(22 other traces)\n",
"...\n",
"YH.SPNC..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"\n",
"[Use \"print(Stream.__str__(extended=True))\" to print all Traces]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw_traces.merge()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "309caa0b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"24 Trace(s) in Stream:\n",
"YH.DC06..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC06..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC06..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC07..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC07..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC07..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC08..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC08..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DC08..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DD06..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DD06..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DD06..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE07..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE07..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE07..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE08..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE08..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.DE08..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.SAUV..HHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 100.0 Hz, 8740001 samples\n",
"YH.SAUV..HHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 100.0 Hz, 8740001 samples\n",
"YH.SAUV..HHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 100.0 Hz, 8740001 samples\n",
"YH.SPNC..BHE | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.SPNC..BHN | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n",
"YH.SPNC..BHZ | 2012-07-25T23:51:40.000000Z - 2012-07-27T00:08:20.000000Z | 50.0 Hz, 4370001 samples\n"
]
}
],
"source": [
"# there is now a single trace per channel\n",
"print(raw_traces.__str__(extended=True))"
]
},
{
"cell_type": "markdown",
"id": "abc3cb76",
"metadata": {},
"source": [
"Then, we use `BPMF.utils.preprocess_stream` with the below-defined parameters (some parameters were already defined when initializing the project). We do not correct for the full instrument response, as the response is mostly flat in the frequency band of interest. However, we do correct for the instrument sensitivity so that the waveform amplitudes are in physical units [m/s]."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "c5cfe62b",
"metadata": {},
"outputs": [],
"source": [
"# preprocessing parameters\n",
"freqmin = BPMF.cfg.MIN_FREQ_HZ\n",
"freqmax = BPMF.cfg.MAX_FREQ_HZ\n",
"target_SR = BPMF.cfg.SAMPLING_RATE_HZ\n",
"target_starttime = DATE\n",
"target_endtime = DATE + timedelta(days=1.)\n",
"remove_response = False\n",
"remove_sensitivity = True"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "63da4f1b",
"metadata": {},
"outputs": [],
"source": [
"preprocessed_traces = BPMF.utils.preprocess_stream(\n",
" raw_traces,\n",
" freqmin=freqmin,\n",
" freqmax=freqmax,\n",
" target_SR=target_SR,\n",
" target_starttime=target_starttime,\n",
" target_endtime=target_endtime,\n",
" remove_response=remove_response,\n",
" remove_sensitivity=remove_sensitivity,\n",
" antialiasing_filter_order=8, # make it sharp\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "0add8aef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"24 Trace(s) in Stream:\n",
"YH.DC06..BHE | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DC06..BHN | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DC06..BHZ | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DC07..BHE | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DC07..BHN | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DC07..BHZ | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DC08..BHE | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DC08..BHN | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DC08..BHZ | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DD06..BHE | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DD06..BHN | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DD06..BHZ | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DE07..BHE | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DE07..BHN | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DE07..BHZ | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DE08..BHE | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DE08..BHN | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.DE08..BHZ | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.SAUV..HHE | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.SAUV..HHN | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.SAUV..HHZ | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.SPNC..BHE | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.SPNC..BHN | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n",
"YH.SPNC..BHZ | 2012-07-26T00:00:00.000000Z - 2012-07-26T23:59:59.960000Z | 25.0 Hz, 2160000 samples\n"
]
}
],
"source": [
"print(preprocessed_traces.__str__(extended=True))"
]
},
{
"cell_type": "markdown",
"id": "d67c606b",
"metadata": {},
"source": [
"## Plot the preprocessed traces"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "50c34cd1",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"%config InlineBackend.figure_formats = [\"svg\"]\n",
"\n",
"fig = preprocessed_traces.plot(equal_scale=False)"
]
},
{
"cell_type": "markdown",
"id": "ac7d2fdd",
"metadata": {},
"source": [
"## Write the preprocessed data\n",
"\n",
"Save the preprocessed data to the data base."
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "14423ff3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The preprocessed data will be stored at: ../BPMF_data/2012/20120726/preprocessed_2_12\n"
]
}
],
"source": [
"preproc_folder_name = f\"preprocessed_{BPMF.cfg.MIN_FREQ_HZ:.0f}_{BPMF.cfg.MAX_FREQ_HZ:.0f}\"\n",
"print(\"The preprocessed data will be stored at:\", os.path.join(dirpath_data, preproc_folder_name))\n",
"if not os.path.isdir(os.path.join(dirpath_data, preproc_folder_name)):\n",
" os.makedirs(os.path.join(dirpath_data, preproc_folder_name))"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "bf5d2036",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Writing preprocessed data: 100%|██████████| 24/24 [00:05<00:00, 4.32it/s]\n"
]
}
],
"source": [
"for tr in tqdm(preprocessed_traces, desc=\"Writing preprocessed data\"):\n",
" tr.write(os.path.join(dirpath_data, preproc_folder_name, f\"{tr.id}_{DATE.strftime('%Y%m%d')}.mseed\"), encoding=\"FLOAT64\", format=\"mseed\")\n",
"# tip: it is most likely that you don't need the float64 precision. Try converting all tr.data to float32 and use encoding=\"FLOAT32\", you will\n",
"# reduce the memory footprint of your preprocessed data set by 2!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.4 ('hy7_py310')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
},
"vscode": {
"interpreter": {
"hash": "221f0e5b1b98151b07a79bf3b6d0c1d306576197d2c4531763770570a29e708e"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}