{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Build Template Database\n", "\n", "In this notebook, we use the events detected in notebook 5, and further characterized in notebooks 6 ands 7, to build a database of template events. This step consists mostly in building data and metadata files but also allows the selection of events based on quality and the grouping of similar events. We want to avoid having too many similar templates in the database as it means useless, redundant computation in the template matching process (similar templates detect the same events)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "from multiprocessing import cpu_count\n", "n_CPUs = max(cpu_count(), 24)\n", "os.environ[\"OMP_NUM_THREADS\"] = str(n_CPUs)\n", "\n", "import BPMF\n", "import h5py as h5\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from time import time as give_time\n", "from tqdm import tqdm\n", "\n", "from BPMF.data_reader_examples import data_reader_mseed\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# %config InlineBackend.figure_formats = [\"svg\"]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# program parameters\n", "DB_FILENAME = \"reloc_bp.h5\"\n", "DB_PATH_T = \"template_db\"\n", "DATA_FOLDER = \"preprocessed_2_12\"\n", "NETWORK_FILENAME = \"network.csv\"\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# read network metadata\n", "net = BPMF.dataset.Network(NETWORK_FILENAME)\n", "net.read()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read the database of detected events\n", "\n", "We read the database of events detected with backprojection and, with them, create `BPMF.dataset.Template` instances." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# HMAX_LOCATION_UNCERTAINTY_KM: All events with hmax_unc < HMAX_LOCATION_UNCERTAINTY_KM will be selected\n", "# as candidate template events\n", "HMAX_LOCATION_UNCERTAINTY_KM = 5.0\n", "# NOISE_WINDOW_FOR_SNR_SEC: Duration, in seconds, of the window taken before the event's origin time\n", "NOISE_WINDOW_FOR_SNR_SEC = 5.0" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Checking reloc_bp.h5...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 0%| | 0/17 [00:00, ?it/s]" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 12%|█▏ | 2/17 [00:00<00:06, 2.19it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Does not have a `arrival_times` attribute.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 18%|█▊ | 3/17 [00:01<00:05, 2.62it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Does not have a `arrival_times` attribute.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 53%|█████▎ | 9/17 [00:03<00:02, 3.13it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Does not have a `arrival_times` attribute.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 94%|█████████▍| 16/17 [00:05<00:00, 3.07it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Does not have a `arrival_times` attribute.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 17/17 [00:05<00:00, 2.99it/s]\n" ] } ], "source": [ "templates = []\n", "tid = 0\n", "\n", "with h5.File(os.path.join(BPMF.cfg.OUTPUT_PATH, DB_FILENAME), mode=\"r\") as f:\n", " print(f\"Checking {DB_FILENAME}...\")\n", " for key in tqdm(f.keys()):\n", " event = BPMF.dataset.Event.read_from_file(\n", " DB_FILENAME, db_path=BPMF.cfg.OUTPUT_PATH, gid=key\n", " )\n", " if \"NLLoc_reloc\" in event.aux_data:\n", " if event.hmax_unc < HMAX_LOCATION_UNCERTAINTY_KM:\n", " # if event was well relocated with NLLoc, use its location\n", " event.set_moveouts_to_theoretical_times()\n", " event.where = os.path.join(\n", " BPMF.cfg.INPUT_PATH,\n", " str(event.origin_time.year),\n", " event.origin_time.strftime(\"%Y%m%d\"),\n", " )\n", " event.read_waveforms(\n", " BPMF.cfg.TEMPLATE_LEN_SEC,\n", " data_folder=DATA_FOLDER,\n", " data_reader=data_reader_mseed,\n", " n_threads=4\n", " )\n", " # computing the signal-to-noise ratio here may add a lot of \n", " # computation time if you're scanning through a lot of events,\n", " # but it greatly helps with choosing the best stations for\n", " # template matching, in the following\n", " event.set_availability()\n", " event.compute_snr(\n", " noise_window=NOISE_WINDOW_FOR_SNR_SEC,\n", " data_folder=DATA_FOLDER,\n", " data_reader=data_reader_mseed,\n", " )\n", " template = BPMF.dataset.Template.init_from_event(event)\n", " template.id = tid\n", " template.set_aux_data({\"tid\": tid})\n", " templates.append(template)\n", " tid += 1\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Remove redundant templates\n", "\n", "We first create an `BPMF.dataset.TemplateGroup` instance with the templates built in the previous cell. This class offers a number of methods to operate on groups of templates." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "template_group = BPMF.dataset.TemplateGroup(templates, net)\n", "# this trick is necessary to make the class aware that the waveforms are already\n", "# attached to the Template instances (_update_attributes should never be called at\n", "# any other time of the workflow).\n", "template_group._update_attributes.append(\"read_waveforms\")\n", "# normalize the waveforms\n", "template_group.normalize()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# parameters controlling how the inter-template correlation matrix is computed\n", "DISTANCE_THRESHOLD_KM = 5. # in km, criterion on ellipsoid separation\n", "N_CLOSEST_STATIONS = 10 # > larger than total number of stations, i.e. use all stations\n", "MAX_LAG_SAMP = 5 # in samples, maximum time shift allowed when searching for similarity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`TemplateGroup.compute_intertemplate_cc` computes the network-averaged correlation coefficient between templates. To account for possible location errors -- and, therefore, different alignments on the P- and S-wave arrivals -- we search for the maximum correlation within +/- `MAX_LAG_SAMP`." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True ../BPMF_data/2012/intertp_cc.h5 False\n", "Computing the similarity matrix...\n", "Computing the inter-template directional errors...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/ebeauce/software/Seismic_BPMF/BPMF/dataset.py:4575: RuntimeWarning: invalid value encountered in divide\n", " unit_direction /= np.sqrt(np.sum(unit_direction**2, axis=1))[\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Computed inter-template CCs in 0.26sec.\n" ] } ], "source": [ "t1 = give_time()\n", "template_group.compute_intertemplate_cc(\n", " distance_threshold=DISTANCE_THRESHOLD_KM,\n", " n_stations=N_CLOSEST_STATIONS,\n", " save_cc=False,\n", " max_lag=MAX_LAG_SAMP,\n", " compute_from_scratch=True,\n", ")\n", "t2 = give_time()\n", "print(f\"Computed inter-template CCs in {t2-t1:.2f}sec.\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | 0 | \n", "1 | \n", "2 | \n", "3 | \n", "4 | \n", "5 | \n", "6 | \n", "7 | \n", "8 | \n", "9 | \n", "10 | \n", "11 | \n", "12 | \n", "13 | \n", "14 | \n", "15 | \n", "16 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "1.000000 | \n", "0.492278 | \n", "0.000000 | \n", "0.000000 | \n", "0.609729 | \n", "0.332857 | \n", "0.361573 | \n", "0.390337 | \n", "0.301650 | \n", "0.000000 | \n", "0.000000 | \n", "0.754384 | \n", "0.519809 | \n", "0.719606 | \n", "0.370723 | \n", "0.000000 | \n", "0.195433 | \n", "
1 | \n", "0.492278 | \n", "1.000000 | \n", "0.000000 | \n", "0.187213 | \n", "0.565095 | \n", "0.411928 | \n", "0.382834 | \n", "0.453597 | \n", "0.307070 | \n", "0.000000 | \n", "0.000000 | \n", "0.482994 | \n", "0.357882 | \n", "0.489115 | \n", "0.472800 | \n", "0.000000 | \n", "0.178264 | \n", "
2 | \n", "0.000000 | \n", "0.000000 | \n", "1.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.194202 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.180455 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "
3 | \n", "0.000000 | \n", "0.187213 | \n", "0.000000 | \n", "1.000000 | \n", "0.000000 | \n", "0.178173 | \n", "0.172453 | \n", "0.151518 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.176006 | \n", "0.000000 | \n", "0.193301 | \n", "
4 | \n", "0.609729 | \n", "0.565095 | \n", "0.000000 | \n", "0.000000 | \n", "1.000000 | \n", "0.439895 | \n", "0.444787 | \n", "0.487140 | \n", "0.337587 | \n", "0.000000 | \n", "0.000000 | \n", "0.567154 | \n", "0.473492 | \n", "0.536317 | \n", "0.514677 | \n", "0.000000 | \n", "0.214054 | \n", "
5 | \n", "0.332857 | \n", "0.411928 | \n", "0.194202 | \n", "0.178173 | \n", "0.439895 | \n", "1.000000 | \n", "0.442811 | \n", "0.342537 | \n", "0.315133 | \n", "0.161617 | \n", "0.175915 | \n", "0.358923 | \n", "0.274556 | \n", "0.339257 | \n", "0.370628 | \n", "0.186084 | \n", "0.216613 | \n", "
6 | \n", "0.361573 | \n", "0.382834 | \n", "0.000000 | \n", "0.172453 | \n", "0.444787 | \n", "0.442811 | \n", "1.000000 | \n", "0.353616 | \n", "0.314060 | \n", "0.000000 | \n", "0.000000 | \n", "0.367496 | \n", "0.296924 | \n", "0.340311 | \n", "0.361826 | \n", "0.000000 | \n", "0.186617 | \n", "
7 | \n", "0.390337 | \n", "0.453597 | \n", "0.000000 | \n", "0.151518 | \n", "0.487140 | \n", "0.342537 | \n", "0.353616 | \n", "1.000000 | \n", "0.301116 | \n", "0.000000 | \n", "0.000000 | \n", "0.390978 | \n", "0.351354 | \n", "0.345460 | \n", "0.412677 | \n", "0.000000 | \n", "0.161650 | \n", "
8 | \n", "0.301650 | \n", "0.307070 | \n", "0.000000 | \n", "0.000000 | \n", "0.337587 | \n", "0.315133 | \n", "0.314060 | \n", "0.301116 | \n", "1.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.337232 | \n", "0.279402 | \n", "0.293451 | \n", "0.315505 | \n", "0.000000 | \n", "0.166269 | \n", "
9 | \n", "0.000000 | \n", "0.000000 | \n", "0.180455 | \n", "0.000000 | \n", "0.000000 | \n", "0.161617 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "1.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.192087 | \n", "0.000000 | \n", "
10 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.175915 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "1.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "
11 | \n", "0.754384 | \n", "0.482994 | \n", "0.000000 | \n", "0.000000 | \n", "0.567154 | \n", "0.358923 | \n", "0.367496 | \n", "0.390978 | \n", "0.337232 | \n", "0.000000 | \n", "0.000000 | \n", "1.000000 | \n", "0.481450 | \n", "0.817688 | \n", "0.422008 | \n", "0.000000 | \n", "0.204009 | \n", "
12 | \n", "0.519809 | \n", "0.357882 | \n", "0.000000 | \n", "0.000000 | \n", "0.473492 | \n", "0.274556 | \n", "0.296924 | \n", "0.351354 | \n", "0.279402 | \n", "0.000000 | \n", "0.000000 | \n", "0.481450 | \n", "1.000000 | \n", "0.588040 | \n", "0.410869 | \n", "0.000000 | \n", "0.188014 | \n", "
13 | \n", "0.719606 | \n", "0.489115 | \n", "0.000000 | \n", "0.000000 | \n", "0.536317 | \n", "0.339257 | \n", "0.340311 | \n", "0.345460 | \n", "0.293451 | \n", "0.000000 | \n", "0.000000 | \n", "0.817688 | \n", "0.588040 | \n", "1.000000 | \n", "0.463257 | \n", "0.000000 | \n", "0.203328 | \n", "
14 | \n", "0.370723 | \n", "0.472800 | \n", "0.000000 | \n", "0.176006 | \n", "0.514677 | \n", "0.370628 | \n", "0.361826 | \n", "0.412677 | \n", "0.315505 | \n", "0.000000 | \n", "0.000000 | \n", "0.422008 | \n", "0.410869 | \n", "0.463257 | \n", "1.000000 | \n", "0.000000 | \n", "0.189606 | \n", "
15 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.186084 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.192087 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "0.000000 | \n", "1.000000 | \n", "0.000000 | \n", "
16 | \n", "0.195433 | \n", "0.178264 | \n", "0.000000 | \n", "0.193301 | \n", "0.214054 | \n", "0.216613 | \n", "0.186617 | \n", "0.161650 | \n", "0.166269 | \n", "0.000000 | \n", "0.000000 | \n", "0.204009 | \n", "0.188014 | \n", "0.203328 | \n", "0.189606 | \n", "0.000000 | \n", "1.000000 | \n", "