Relocation

In the previous notebook, 5_backprojection, we built a database of events detected with the backprojection technique. Here, we will improve the location of each of these events by using a more elaborated location technique.

[1]:

import os
# choose the number of threads you want to limit the computation to
n_CPUs = 24
os.environ["OMP_NUM_THREADS"] = str(n_CPUs)

import copy
import BPMF
import h5py as h5
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import string

from BPMF.data_reader_examples import data_reader_mseed
from time import time as give_time

# %config InlineBackend.figure_formats = ["svg"]
plt.rcParams["svg.fonttype"] = "none"

[2]:

INPUT_DB_FILENAME = "raw_bp.h5"
OUTPUT_DB_FILENAME = "reloc_bp.h5"
OUTPUT_CSV_FILENAME = "backprojection_catalog.csv"
NETWORK_FILENAME = "network.csv"

[3]:

net = BPMF.dataset.Network(NETWORK_FILENAME)
net.read()

Load the metadata of the events previously detected with backprojection

In the previous notebook, we used BPMF.dataset.Event.write to save the metadata of each detected event to a hdf5 database. Here, we will use BPMF.dataset.Event.read_from_file to read the metadata and build a list of BPMF.dataset.Event instances.

[4]:

backprojection_events = []
with h5.File(os.path.join(BPMF.cfg.OUTPUT_PATH, INPUT_DB_FILENAME), mode="r") as fin:
    print(f"Each event's metadata is stored at a 'subfolder'. The subfolders are: {fin.keys()}")
    print("In hdf5, 'subfolders' are called 'groups'.")
    for group_id in fin.keys():
        backprojection_events.append(
            BPMF.dataset.Event.read_from_file(
                INPUT_DB_FILENAME,
                db_path=BPMF.cfg.OUTPUT_PATH,
                gid=group_id,
                data_reader=data_reader_mseed
            )
        )

Each event's metadata is stored at a 'subfolder'. The subfolders are: <KeysViewHDF5 ['20120726_011024.280000', '20120726_011555.880000', '20120726_015757.600000', '20120726_023502.160000', '20120726_030040.880000', '20120726_044340.160000', '20120726_044840.160000', '20120726_080827.360000', '20120726_100725.640000', '20120726_115536.760000', '20120726_133527.760000', '20120726_134834.840000', '20120726_135200.480000', '20120726_135333.600000', '20120726_135656.000000', '20120726_150621.480000', '20120726_172821.680000']>
In hdf5, 'subfolders' are called 'groups'.

[5]:

initial_catalog = pd.DataFrame(columns=["origin_time", "longitude", "latitude", "depth"])
for i, ev in enumerate(backprojection_events):
    initial_catalog.loc[i, "origin_time"] = ev.origin_time
    initial_catalog.loc[i, "longitude"] = ev.longitude
    initial_catalog.loc[i, "latitude"] = ev.latitude
    initial_catalog.loc[i, "depth"] = ev.depth
initial_catalog["origin_time"] = initial_catalog["origin_time"].values.astype("datetime64[ms]")
initial_catalog["longitude"] = initial_catalog["longitude"].values.astype("float32")
initial_catalog["latitude"] = initial_catalog["latitude"].values.astype("float32")
initial_catalog["depth"] = initial_catalog["depth"].values.astype("float32")
initial_catalog

[5]:

	origin_time	longitude	latitude	depth
0	2012-07-26 01:10:24.280	30.309999	40.720001	12.0
1	2012-07-26 01:15:55.880	30.379999	40.709999	8.0
2	2012-07-26 01:57:57.600	30.209999	40.599998	-1.5
3	2012-07-26 02:35:02.160	30.330000	40.700001	-1.5
4	2012-07-26 03:00:40.880	30.350000	40.709999	9.0
5	2012-07-26 04:43:40.160	30.350000	40.720001	8.5
6	2012-07-26 04:48:40.160	30.389999	40.709999	8.0
7	2012-07-26 08:08:27.360	30.379999	40.709999	8.0
8	2012-07-26 10:07:25.640	30.330000	40.720001	9.5
9	2012-07-26 11:55:36.760	30.270000	40.599998	3.0
10	2012-07-26 13:35:27.760	30.450001	40.799999	-2.0
11	2012-07-26 13:48:34.840	30.309999	40.720001	12.0
12	2012-07-26 13:52:00.480	30.360001	40.709999	7.5
13	2012-07-26 13:53:33.600	30.330000	40.720001	12.0
14	2012-07-26 13:56:56.000	30.330000	40.730000	10.0
15	2012-07-26 15:06:21.480	30.330000	40.599998	-2.0
16	2012-07-26 17:28:21.680	30.430000	40.720001	10.0

Build a BPMF.dataset.Catalog instance from initial_catalog in order to benefit from the BPMF.dataset.Catalog plotting methods.

[6]:

initial_catalog = BPMF.dataset.Catalog.read_from_dataframe(
    initial_catalog,
)

[7]:

initial_catalog.catalog

[7]:

	longitude	latitude	depth	origin_time
0	30.309999	40.720001	12.0	2012-07-26 01:10:24.280
1	30.379999	40.709999	8.0	2012-07-26 01:15:55.880
2	30.209999	40.599998	-1.5	2012-07-26 01:57:57.600
3	30.330000	40.700001	-1.5	2012-07-26 02:35:02.160
4	30.350000	40.709999	9.0	2012-07-26 03:00:40.880
5	30.350000	40.720001	8.5	2012-07-26 04:43:40.160
6	30.389999	40.709999	8.0	2012-07-26 04:48:40.160
7	30.379999	40.709999	8.0	2012-07-26 08:08:27.360
8	30.330000	40.720001	9.5	2012-07-26 10:07:25.640
9	30.270000	40.599998	3.0	2012-07-26 11:55:36.760
10	30.450001	40.799999	-2.0	2012-07-26 13:35:27.760
11	30.309999	40.720001	12.0	2012-07-26 13:48:34.840
12	30.360001	40.709999	7.5	2012-07-26 13:52:00.480
13	30.330000	40.720001	12.0	2012-07-26 13:53:33.600
14	30.330000	40.730000	10.0	2012-07-26 13:56:56.000
15	30.330000	40.599998	-2.0	2012-07-26 15:06:21.480
16	30.430000	40.720001	10.0	2012-07-26 17:28:21.680

Let’s plot the locations that the backprojection method produced.

[8]:

fig = initial_catalog.plot_map(figsize=(10, 10), network=net, s=50, markersize_station=50, lat_margin=0.02)
ax = fig.get_axes()[0]
ax.set_facecolor("dimgrey")
ax.patch.set_alpha(0.15)

../../_images/tutorial_notebooks_6_relocate_11_0.png

The backprojection method offers several advantages for locating earthquakes:

Locations come for free in addition to detecting events.
The “picking” of P and S waves does not involve any single-channel threshold.
Backprojection effectively solves the detection and phase association problems at the same time.

However, there is a number of cons:

The backprojection locations are just as good as the grid is. If a downsampled grid was used for computation efficiency, then the locations definitely need to be refined.
It is hard and computationally expensive to estimate uncertainties (as we will see later in this notebook).

Thus, in the following, we will explore two ways to refine the initial locations.

Relocation 1: PhaseNet picks and NonLinLoc

If you have installed PhaseNet and NonLinLoc, you can use BPMF.dataset.Event methods to easily relocate events with automatic picking and earthquake location. This is a two-step procedure. We first need to pick the P- and S-wave arrivals with PhaseNet before giving these data to NonLinLoc.

We will explore, step by step, what happens to the BPMF.dataset.Event instance when picking the phase arrivals and when relocating with NonLinLoc. First, we read short windows around the P- and S-wave arrivals, as determined by the backprojection locations.

Automated picking and relocation

[9]:

# event extraction parameters

# PHASE_ON_COMP: dictionary defining which moveout we use to extract the waveform
PHASE_ON_COMP = {"N": "S", "1": "S", "E": "S", "2": "S", "Z": "P"}
# OFFSET_PHASE: dictionary defining the time offset taken before a given phase
#               for example OFFSET_PHASE["P"] = 1.0 means that we extract the window
#               1 second before the predicted P arrival time
OFFSET_PHASE = {"P": 1.0, "S": 4.0}
# TIME_SHIFTED: boolean, if True, use moveouts to extract relevant windows
TIME_SHIFTED = True
# DATA_FOLDER: name of the folder where the waveforms we want to use for picking are stored
DATA_FOLDER = "preprocessed_2_12"

[10]:

t1 = give_time()
for i in range(len(backprojection_events)):
    backprojection_events[i].read_waveforms(
        BPMF.cfg.TEMPLATE_LEN_SEC,
        phase_on_comp=PHASE_ON_COMP,
        offset_phase=OFFSET_PHASE,
        time_shifted=TIME_SHIFTED,
        data_folder=DATA_FOLDER,
    )
t2 = give_time()
print(
    f"{t2-t1:.2f}sec to read the waveforms for all {len(backprojection_events):d} detections."
)

3.40sec to read the waveforms for all 17 detections.

Let’s plot these short windows with BPMF.dataset.Event.plot.

[11]:

# play with EVENT_IDX to plot different events
EVENT_IDX = 0
fig = backprojection_events[EVENT_IDX].plot(figsize=(15, 15))

../../_images/tutorial_notebooks_6_relocate_17_0.png

Now, we will use PhaseNet to pick the P- and S-wave arrivals. This is done with the BPMF.dataset.Event.pick_PS_phases method.

[12]:

# PhaseNet picking parameters

# PhaseNet was trained for 100Hz data. Even if we saw that running PhaseNet on 25Hz data
# was good for backprojection, here, picking benefits from running PhaseNet on 100Hz data.
# Thus, we will upsample the waveforms before running PhaseNet.
PHASENET_SAMPLING_RATE_HZ = 100.
UPSAMPLING_BEFORE_PN_RELOCATION = int(PHASENET_SAMPLING_RATE_HZ/BPMF.cfg.SAMPLING_RATE_HZ)
DOWNSAMPLING_BEFORE_PN_RELOCATION = 1

# DURATION_SEC: the duration, in seconds, of the data stream starting at the detection time
#               defined by Event.origin_time. This data stream is used for picking the P/S waves.
DURATION_SEC = 60.0
# THRESHOLD_P: probability of P-wave arrival above which we declare a pick. If several picks are
#              declared during the DURATION_SEC data stream, we only keep the best one. We can
#              afford using a low probability threshold since we already know with some confidence
#              that an earthquake is in the data stream.
THRESHOLD_P = 0.10
# THRESHOLD_S: probability of S-wave arrival above which we declare a pick.
THRESHOLD_S = 0.10
# DATA_FOLDER: name of the folder where the waveforms we want to use for picking are stored
DATA_FOLDER = "preprocessed_2_12"
# COMPONENT_ALIASES: A dictionary that defines the possible channel names to search for
#                    for example, the seismometer might not be oriented and the horizontal channels
#                    might be named 1 and 2, in which case we arbitrarily decide to take 1 as the "N" channel
#                    and 2 as the "E" channel. This doesn't matter for picking P- and S-wave arrival times.
COMPONENT_ALIASES = {"N": ["N", "1"], "E": ["E", "2"], "Z": ["Z"]}
# USE_APRIORI_PICKS: boolean. This option is IMPORTANT when running BPMF in HIGH SEISMICITY CONTEXTS, like
#                   during the aftershock sequence of a large earthquake. If there are many events happening
#                   close to each other in time, we need to guide PhaseNet to pick the right set of picks.
#                   For that, we use the predicted P- and S-wave times from backprojection to add extra weight to
#                   the picks closer to those times and make it more likely to identify them as the "best" picks.
#                   WARNING: If there are truly many events, even this trick might fail. It's because "phase association"
#                   is an intrinsically hard problem in this case, and the picking might be hard to do automatically.
USE_APRIORI_PICKS = True

[13]:

for i in range(len(backprojection_events)):
    print(f"Picking P and S wave arrivals on detection {i:d}")
    # make sure the event's station list has all the stations at which
    # we eventually want travel times after NLLoc's location
    backprojection_events[i].stations = net.stations.astype("U")
    # the following line should only be used in a similar context as here
    # it uses the moveouts from the backprojection location to define the
    # P/S arrival times that we use a "prior information"
    backprojection_events[i].set_arrival_times_from_moveouts(verbose=0)
    backprojection_events[i].pick_PS_phases(
        DURATION_SEC,
        threshold_P=THRESHOLD_P,
        threshold_S=THRESHOLD_S,
        component_aliases=COMPONENT_ALIASES,
        data_folder=DATA_FOLDER,
        upsampling=UPSAMPLING_BEFORE_PN_RELOCATION,
        downsampling=DOWNSAMPLING_BEFORE_PN_RELOCATION,
        use_apriori_picks=USE_APRIORI_PICKS,
        keep_probability_time_series=True,
    )

Picking P and S wave arrivals on detection 0
Picking P and S wave arrivals on detection 1
Picking P and S wave arrivals on detection 2
Picking P and S wave arrivals on detection 3
Picking P and S wave arrivals on detection 4
Picking P and S wave arrivals on detection 5
Picking P and S wave arrivals on detection 6
Picking P and S wave arrivals on detection 7
Picking P and S wave arrivals on detection 8
Picking P and S wave arrivals on detection 9
Picking P and S wave arrivals on detection 10
Picking P and S wave arrivals on detection 11
Picking P and S wave arrivals on detection 12
Picking P and S wave arrivals on detection 13
Picking P and S wave arrivals on detection 14
Picking P and S wave arrivals on detection 15
Picking P and S wave arrivals on detection 16

We are done with the automatic picking! Every instance of BPMF.dataset.Event should now have a new attribute picks.

[14]:

EVENT_IDX = 0
backprojection_events[EVENT_IDX].picks

[14]:

	P_probas	P_picks_sec	P_unc_sec	S_probas	S_picks_sec	S_unc_sec	P_abs_picks	S_abs_picks
stations
DE07	0.665437	20.300467	0.155635	0.860289	22.553024	0.101877	2012-07-26T01:10:24.580467Z	2012-07-26T01:10:26.833024Z
DC07	0.602067	20.334587	0.185503	0.678574	22.202774	0.129679	2012-07-26T01:10:24.614587Z	2012-07-26T01:10:26.482774Z
DC08	0.754658	20.053699	0.115350	0.690027	21.970953	0.112607	2012-07-26T01:10:24.333699Z	2012-07-26T01:10:26.250953Z
DE08	0.241382	19.796843	0.295909	0.366147	21.934898	0.229555	2012-07-26T01:10:24.076843Z	2012-07-26T01:10:26.214898Z
SPNC	0.918361	20.136955	0.094823	0.884857	21.986248	0.096080	2012-07-26T01:10:24.416955Z	2012-07-26T01:10:26.266248Z
DC06	0.826057	20.938963	0.085419	0.886496	23.207859	0.097003	2012-07-26T01:10:25.218963Z	2012-07-26T01:10:27.487859Z
SAUV	0.793898	19.595510	0.104622	0.818875	21.098804	0.107155	2012-07-26T01:10:23.875510Z	2012-07-26T01:10:25.378804Z
DD06	0.456268	20.447830	0.156354	0.805816	22.341667	0.112516	2012-07-26T01:10:24.727830Z	2012-07-26T01:10:26.621667Z

These picks are automatically read when calling BPMF.dataset.Event.plot. Before plotting the waveforms with the newly obtained picks, let’s re-read the waveforms in the same format as before for the sake of easier visualization.

[15]:

t1 = give_time()
for i in range(len(backprojection_events)):
    backprojection_events[i].read_waveforms(
        BPMF.cfg.TEMPLATE_LEN_SEC,
        phase_on_comp=PHASE_ON_COMP,
        offset_phase=OFFSET_PHASE,
        time_shifted=TIME_SHIFTED,
        data_folder=DATA_FOLDER,
    )
t2 = give_time()
print(
    f"{t2-t1:.2f}sec to read the waveforms for all {len(backprojection_events):d} detections."
)

3.42sec to read the waveforms for all 17 detections.

The following plot shows:

the P-/S-wave arrival continuous probabilities (blue=P-wave, red=S-wave) computed by PhaseNet,
the PhaseNet picks with dashed vertical lines (blue=P-wave, red=S-wave),
the arrival times predicted by the backprojection location with solid vertical lines (purple=P-wave, orange=S-wave).

See how, in this tutorial, the backprojection locations already do a good job at predicting the P- and S-wave times… given the limitations of the 1D velocity model.

Note: Try to run the phase picking again with USE_APRIORI_PICKS=False and see how the plot changes.

[16]:

# play with EVENT_IDX to plot different events
EVENT_IDX = 0
fig = backprojection_events[EVENT_IDX].plot(
    figsize=(15, 15), plot_picks=True, plot_predicted_arrivals=False, plot_probabilities=True
    )

../../_images/tutorial_notebooks_6_relocate_26_0.png

We now are all set for using our favorite location software using PhaseNet’s picks. We can use NLLoc with one of BPMF.dataset.Event methods. See http://alomax.free.fr/nlloc/ for more info.

[17]:

# location parameters

LOCATION_ROUTINE = "NLLoc"
# NLLOC_METHOD: string that defines what loss function is used by NLLoc, see http://alomax.free.fr/nlloc/ for more info.
#               Using some flavor of 'EDT' is important to obtain robust locations that are not sensitive to pick outliers.
NLLOC_METHOD = "EDT_OT_WT_ML"
# MINIMUM_NUM_STATIONS_W_PICKS: minimum number of stations with picks to even try relocation.
MINIMUM_NUM_STATIONS_W_PICKS = 3

[18]:

for i, ev in enumerate(backprojection_events):
    # before relocating, let's store the old location for book-keeping
    ev.set_aux_data(
        {"backprojection_longitude": ev.longitude,
         "backprojection_latitude": ev.latitude,
         "backprojection_depth": ev.depth}
    )
    if len(ev.picks.dropna(how="any")) >= MINIMUM_NUM_STATIONS_W_PICKS:
        print(f"Relocating event {i}")
        # first relocation, insensitive to outliers
        ev.relocate(
            stations=net.stations, routine=LOCATION_ROUTINE, method=NLLOC_METHOD,
        )

Relocating event 0
Relocating event 1
Relocating event 2
Relocating event 3
Relocating event 4
Relocating event 6
Relocating event 7
Relocating event 8
Relocating event 9
Relocating event 10
Relocating event 11
Relocating event 12
Relocating event 13
Relocating event 14
Relocating event 15
Relocating event 16

Let’s plot the arrival times predicted by NLLoc’s location! As you can see, a 1D velocity model cannot fit the picks very well. The stations are right in the heterogeneous fault zone.

[19]:

# play with EVENT_IDX to plot different events
EVENT_IDX = 1
fig = backprojection_events[EVENT_IDX].plot(
    figsize=(15, 15), plot_picks=True, plot_probabilities=True, plot_predicted_arrivals=True
    )

../../_images/tutorial_notebooks_6_relocate_31_0.png

It doesn’t look like much has changed… But we now have the location uncertainties. The uncertainty information is encoded in the covariance matrix, which is stored at BPMF.dataset.Event.aux_data["cov_mat"].

[ ]:

EVENT_IDX = 0
ev = backprojection_events[EVENT_IDX]
print(f"The maximum horizontal location uncertainty of event {EVENT_IDX} is {ev.hmax_unc:.2f}km.")
print(f"The azimuth of the maximum horizontal uncertainty is {ev.az_hmax_unc:.2f}degrees.")
print(f"The mainimumhorizontal location uncertainty of event {EVENT_IDX} is {ev.hmin_unc:.2f}km.")
print(f"The azimuth of the minimum horizontal uncertainty is {ev.az_hmin_unc:.2f}degrees.")
print(f"The maximum vertical location uncertainty is {ev.vmax_unc:.2f}km.")

The maximum horizontal location uncertainty of event 0 is 1.73km.
The azimuth of the maximum horizontal uncertainty is 157.27degrees.
The mainimumhorizontal location uncertainty of event 0 is 1.18km.
The azimuth of the minimum horizontal uncertainty is 67.27degrees.
The maximum vertical location uncertainty is 2.64km.

Before plotting the map with the new locations, let’s make sure our locations are not unnecessarily biased by aberrant pick outliers. Even though we used NonLinLoc’s EDT method, which is robust to outliers, we can easily identify outliers and remove them.

[21]:

# we set a maximum tolerable difference, in percentage, between the picked time and the predicted travel time
MAX_TIME_DIFFERENT_PICKS_PREDICTED_PERCENT = 25
for i, ev in enumerate(backprojection_events):
    if "NLLoc_reloc" in ev.aux_data:
        # this variable was inserted into ev.aux_data if NLLoc successfully located the event
        # use predicted times to remove outlier picks
        ev.remove_outlier_picks(max_diff_percent=MAX_TIME_DIFFERENT_PICKS_PREDICTED_PERCENT)
        if len(ev.picks.dropna(how="any")) >= MINIMUM_NUM_STATIONS_W_PICKS:
            print(f"Relocating event {i}")
            # first relocation, insensitive to outliers
            ev.relocate(
                stations=net.stations, routine=LOCATION_ROUTINE, method=NLLOC_METHOD,
            )
        else:
            # after removing the outlier picks, not enough picks were left
            # revert to backprojection location
            del ev.arrival_times
            ev.longitude = ev.aux_data["backprojection_longitude"]
            ev.latitude = ev.aux_data["backprojection_latitude"]
            ev.depth = ev.aux_data["backprojection_depth"]

Relocating event 0
Relocating event 1
Relocating event 4
Relocating event 6
Relocating event 7
Relocating event 8
Relocating event 10
Relocating event 11
Relocating event 12
Relocating event 13
Relocating event 14
Relocating event 15

[22]:

# play with EVENT_IDX to plot different events
EVENT_IDX = 0
fig = backprojection_events[EVENT_IDX].plot(
    figsize=(15, 15), plot_picks=True, plot_predicted_arrivals=True, plot_probabilities=False
    )

../../_images/tutorial_notebooks_6_relocate_36_0.png

Gather the new catalog, clean up possible multiples and plot the new locations

Just like before, we make a new instance of BPMF.dataset.Catalog to use the built-in plotting methods.

[23]:

# --- update event ids now that we have our final locations
for ev in backprojection_events:
    ev.id = str(ev.origin_time)

# --- create a Catalog objet
nlloc_catalog = BPMF.dataset.Catalog.read_from_events(
    backprojection_events,
    extra_attributes=["hmax_unc", "hmin_unc", "az_hmax_unc", "vmax_unc"]
    )
nlloc_catalog.catalog["az_hmax_unc"] = nlloc_catalog.catalog["az_hmax_unc"]%180.
nlloc_catalog.catalog

Class instance does not have a `cov_mat` attribute.

[23]:

	longitude	latitude	depth	origin_time	hmax_unc	hmin_unc	az_hmax_unc	vmax_unc
event_id
2012-07-26T01:10:21.800000Z	30.323047	40.731250	12.015625	2012-07-26 01:10:21.800	1.733366	1.182667	157.271635	2.640251
2012-07-26T01:15:54.160000Z	30.343555	40.716875	9.273438	2012-07-26 01:15:54.160	2.070620	1.108417	90.435851	2.250307
2012-07-26T01:58:19.720000Z	30.210000	40.600000	-1.500000	2012-07-26 01:58:19.720	0.968885	0.833068	11.277525	2.155680
2012-07-26T02:35:01.160000Z	30.330000	40.700000	-1.500000	2012-07-26 02:35:01.160	4.140636	2.593994	72.951755	4.445532
2012-07-26T03:00:39.160000Z	30.324023	40.715625	9.070312	2012-07-26 03:00:39.160	1.812523	1.146382	96.893951	3.431834
2012-07-26T04:43:40.160000Z	30.350000	40.720000	8.500000	2012-07-26 04:43:40.160	15.000000	15.000000	0.000000	15.000000
2012-07-26T04:48:38.640000Z	30.339648	40.716875	9.273438	2012-07-26 04:48:38.640	2.689873	1.357818	77.828681	3.859434
2012-07-26T08:08:25.680000Z	30.335742	40.720625	8.867188	2012-07-26 08:08:25.680	2.457855	1.324990	80.384442	3.851911
2012-07-26T10:07:23.720000Z	30.338672	40.716250	8.765625	2012-07-26 10:07:23.720	1.642315	0.946182	93.504006	2.522272
2012-07-26T11:55:35.520000Z	30.270000	40.600000	3.000000	2012-07-26 11:55:35.520	2.731442	1.437874	152.628773	3.716407
2012-07-26T13:35:26.480000Z	30.295215	40.797812	-1.949219	2012-07-26 13:35:26.480	3.697239	0.998076	88.219160	5.458099
2012-07-26T13:48:32.480000Z	30.315234	40.723750	11.609375	2012-07-26 13:48:32.480	1.429097	1.324683	34.905759	2.865370
2012-07-26T13:51:58.200000Z	30.314258	40.713125	13.335938	2012-07-26 13:51:58.200	1.782372	1.640373	133.233003	3.765628
2012-07-26T13:53:31.400000Z	30.311328	40.718750	12.015625	2012-07-26 13:53:31.400	1.557949	1.486866	36.010911	3.102796
2012-07-26T13:56:54.320000Z	30.337695	40.716875	8.867188	2012-07-26 13:56:54.320	2.875005	1.242650	84.046297	2.869467
2012-07-26T15:06:20.160000Z	30.326221	40.600156	-1.974609	2012-07-26 15:06:20.160	1.502012	0.491035	92.386020	0.880841
2012-07-26T17:28:20.400000Z	30.430000	40.720000	10.000000	2012-07-26 17:28:20.400	3.553327	1.468112	71.712055	5.997913

Remove possible multiple detections:

The backprojection method does not have good inter-event time resolution because of the large uncertainties on the event origin times. There always exists a relatively large trade-off between the origin time and the location so it happens to detect the same event several times in a row if MINIMUM_INTEREVENT_TIME_SEC was not set large enough in 5_backprojection.ipynb. After relocation with NonLinLoc, it is easier to identify such multiple detections.

[24]:

# the criteria that determine whether two events are the same or not are of course somewhat arbitrary
# however, one can play with them and see what values effectively discard multiples
MINIMUM_INTEREVENT_TIME_SEC = 5.
MINIMUM_EPICENTRAL_DISTANCE_KM = 10.

[25]:

# ---------------------------------------------------------
#         Remove multiple detections of same event
# ---------------------------------------------------------

# --- build a dictionary where each entry is an event, for explicit indexing
backprojection_events_dict = {ev.id : ev for ev in backprojection_events}


keep = np.ones(nlloc_catalog.n_events, dtype=bool)
for i in range(1, nlloc_catalog.n_events):
    previous_ev = i - 1
    while not keep[previous_ev]:
        previous_ev -= 1
        if previous_ev < 0:
            break
    if previous_ev < 0:
        # there is not previous event to compare with
        continue
    # -----------------------------------------
    # catalog row indexes and backprojection_events indexes do not always match
    # because catalog is ordered chronologically and some events' chronology
    # might have swapped during NLLoc's location process
    # -----------------------------------------
    event_id = nlloc_catalog.catalog.index[i]
    det = backprojection_events_dict[event_id]
    previous_event_id = nlloc_catalog.catalog.index[previous_ev]
    previous_det = backprojection_events_dict[previous_event_id]

    dt_sec = (
        nlloc_catalog.catalog.origin_time[i]
        - nlloc_catalog.catalog.origin_time[previous_ev]
    ).total_seconds()
    if dt_sec < MINIMUM_INTEREVENT_TIME_SEC:
        # test their epicentral distance
        epi_distance = BPMF.utils.two_point_epicentral_distance(
                nlloc_catalog.catalog.iloc[i]["longitude"],
                nlloc_catalog.catalog.iloc[i]["latitude"],
                nlloc_catalog.catalog.iloc[previous_ev]["longitude"],
                nlloc_catalog.catalog.iloc[previous_ev]["latitude"],
                )
        if epi_distance > MINIMUM_EPICENTRAL_DISTANCE_KM:
            # these two events are close in time but distant in space
            continue
        if (
            "NLLoc_reloc" in det.aux_data
            and "NLLoc_reloc" not in previous_det.aux_data
        ):
            keep[previous_ev] = False
        elif (
            "NLLoc_reloc" not in det.aux_data
            and "NLLoc_reloc" in previous_det.aux_data
        ):
            keep[i] = False
        elif det.hmax_unc < previous_det.hmax_unc:
            keep[previous_ev] = False
        elif det.hmax_unc > previous_det.hmax_unc:
            keep[i] = False
        else:
            # if both det and previous_det were not relocated with NLLoc,
            # then there origin times cannot be closer than MINIMUM_INTEREVENT_TIME_SEC
            continue

nlloc_catalog.catalog = nlloc_catalog.catalog[keep]
nlloc_catalog.catalog["event_id"] = [f"bp_{i}" for i in range(len(nlloc_catalog.catalog))]
nlloc_catalog.catalog

[25]:

	longitude	latitude	depth	origin_time	hmax_unc	hmin_unc	az_hmax_unc	vmax_unc	event_id
event_id
2012-07-26T01:10:21.800000Z	30.323047	40.731250	12.015625	2012-07-26 01:10:21.800	1.733366	1.182667	157.271635	2.640251	bp_0
2012-07-26T01:15:54.160000Z	30.343555	40.716875	9.273438	2012-07-26 01:15:54.160	2.070620	1.108417	90.435851	2.250307	bp_1
2012-07-26T01:58:19.720000Z	30.210000	40.600000	-1.500000	2012-07-26 01:58:19.720	0.968885	0.833068	11.277525	2.155680	bp_2
2012-07-26T02:35:01.160000Z	30.330000	40.700000	-1.500000	2012-07-26 02:35:01.160	4.140636	2.593994	72.951755	4.445532	bp_3
2012-07-26T03:00:39.160000Z	30.324023	40.715625	9.070312	2012-07-26 03:00:39.160	1.812523	1.146382	96.893951	3.431834	bp_4
2012-07-26T04:43:40.160000Z	30.350000	40.720000	8.500000	2012-07-26 04:43:40.160	15.000000	15.000000	0.000000	15.000000	bp_5
2012-07-26T04:48:38.640000Z	30.339648	40.716875	9.273438	2012-07-26 04:48:38.640	2.689873	1.357818	77.828681	3.859434	bp_6
2012-07-26T08:08:25.680000Z	30.335742	40.720625	8.867188	2012-07-26 08:08:25.680	2.457855	1.324990	80.384442	3.851911	bp_7
2012-07-26T10:07:23.720000Z	30.338672	40.716250	8.765625	2012-07-26 10:07:23.720	1.642315	0.946182	93.504006	2.522272	bp_8
2012-07-26T11:55:35.520000Z	30.270000	40.600000	3.000000	2012-07-26 11:55:35.520	2.731442	1.437874	152.628773	3.716407	bp_9
2012-07-26T13:35:26.480000Z	30.295215	40.797812	-1.949219	2012-07-26 13:35:26.480	3.697239	0.998076	88.219160	5.458099	bp_10
2012-07-26T13:48:32.480000Z	30.315234	40.723750	11.609375	2012-07-26 13:48:32.480	1.429097	1.324683	34.905759	2.865370	bp_11
2012-07-26T13:51:58.200000Z	30.314258	40.713125	13.335938	2012-07-26 13:51:58.200	1.782372	1.640373	133.233003	3.765628	bp_12
2012-07-26T13:53:31.400000Z	30.311328	40.718750	12.015625	2012-07-26 13:53:31.400	1.557949	1.486866	36.010911	3.102796	bp_13
2012-07-26T13:56:54.320000Z	30.337695	40.716875	8.867188	2012-07-26 13:56:54.320	2.875005	1.242650	84.046297	2.869467	bp_14
2012-07-26T15:06:20.160000Z	30.326221	40.600156	-1.974609	2012-07-26 15:06:20.160	1.502012	0.491035	92.386020	0.880841	bp_15
2012-07-26T17:28:20.400000Z	30.430000	40.720000	10.000000	2012-07-26 17:28:20.400	3.553327	1.468112	71.712055	5.997913	bp_16

The NLLoc locations are able to resolve the location differences between events much better than the backprojection locations. However, limits remain due to:

Errors in picks.
Approximations in velocity model (1D model whereas the area is hghly heterogeneous!).
Earthquakes are located outside the network, so large uncertainties cannot be avoided.

[26]:

fig = nlloc_catalog.plot_map(
    figsize=(10, 10), network=net, s=50, markersize_station=50, lat_margin=0.02, plot_uncertainties=True
    )
ax = fig.get_axes()[0]
ax.set_facecolor("dimgrey")
ax.patch.set_alpha(0.15)

../../_images/tutorial_notebooks_6_relocate_43_0.png

Save the relocated events

Save all the events that were not identified as multiples to a hdf5 database.

[27]:

nlloc_catalog.catalog.to_csv(
    os.path.join(BPMF.cfg.OUTPUT_PATH, OUTPUT_CSV_FILENAME)
)

[28]:

for i in range(len(nlloc_catalog.catalog)):
    evid = nlloc_catalog.catalog.index[i]
    backprojection_events_dict[evid].write(
            OUTPUT_DB_FILENAME,
            db_path=BPMF.cfg.OUTPUT_PATH,
            gid=backprojection_events_dict[evid].id
            )

Relocation 2 (Bonus): Locating and estimating uncertainties with backprojection on a fine grid

The goal of this section is to explore an alternative to the PhaseNet + NonLinLoc workflow solely based on the use backprojection. If a “sparse” grid was used for the purpose of earthquake detection, then a fine grid can be used here.

In the following, we run a “vanilla” backprojection that backprojects the waveforms’ envelopes (as defined by the Hilbert transform).

[29]:

# PHASES: list of seismic phases to use for backprojection
PHASES = ["P", "S"]
# TT_FILENAME: name of the file with the travel times
TT_FILENAME = "tts.h5"
# N_MAX_STATIONS: the maximum number of stations used for stacking (e.g. the closest to a given grid point)
# note: this parameter is irrelevant in this example with only 8 stations, but it is important when
# using large seismic networks
N_MAX_STATIONS = 10

For backprojection, we need to load the travel time grid and set up the BPMF.template_search.Beamformer instance.

[30]:

tts = BPMF.template_search.TravelTimes(
    TT_FILENAME,
    tt_folder_path=BPMF.cfg.MOVEOUTS_PATH
)
tts.read(
    PHASES, source_indexes=None, read_coords=True, stations=net.stations
)
tts.convert_to_samples(BPMF.cfg.SAMPLING_RATE_HZ, remove_tt_seconds=True)

[31]:

EVENT_IDX = 0

# initialize a new beamformer because, in general, you might want to use a different
# travel-time table for this beamformer (e.g. denser)
bf_loc = BPMF.template_search.Beamformer(
    phases=PHASES,
    moveouts_relative_to_first=True,
)
# here, we need to "trick" the Beamformer instance and give it something similar to the Data instance in 5_backprojection
bf_loc.set_data(backprojection_events[EVENT_IDX])
# attach the Network instance
bf_loc.set_network(net)
# attach the TravelTimes instance
bf_loc.set_travel_times(tts)

# attach weights (the waveform features are the waveforms' envelopes)
# weights_phases = np.ones((net.n_stations, net.n_components, len(PHASES)), dtype=np.float32)
# weights_phases[:, :2, 0] = 0. # P-wave weight = 0 for horizontal components
# weights_phases[:, 2, 1] = 0. # S-wave weight = 0 for vertical components

# uncomment this section to perform the backprojection-based relocation
# with ML waveform features (PhaseNet)
weights_phases = np.ones((net.n_stations, 2, 2), dtype=np.float32)
weights_phases[:, 0, 1] = 0.0  # S-wave weights to zero for channel 0
weights_phases[:, 1, 0] = 0.0  # P-wave weights to zero for channel 1
bf_loc.set_weights_sources(N_MAX_STATIONS)
bf_loc.set_weights(weights_phases=weights_phases)

[32]:

# relocation parameters
LOCATION_ROUTINE = "beam"
RESTRICTED_DOMAIN_FOR_UNC_KM = 100.
DATA_FOLDER = "preprocessed_2_12"

[33]:

# uncomment this section to perform the backprojection-based relocation
# with ML waveform features (PhaseNet)

import torch
import seisbench.models as sbm

beam_relocated_event = copy.deepcopy(backprojection_events[EVENT_IDX])
ml_detector = sbm.PhaseNet.from_pretrained("original")
ml_detector.eval()

beam_relocated_event.read_waveforms(
    120., offset_ot=60., time_shifted=False,
    data_folder="preprocessed_2_12", data_reader=data_reader_mseed
)

data_arr = beam_relocated_event.get_np_array(net.stations)

data_arr_n = BPMF.utils.normalize_batch(data_arr)
closest_pow2 = int(np.log2(data_arr_n.shape[-1])) + 1
diff = 2**closest_pow2 - data_arr_n.shape[-1]
left = diff//2
right = diff//2 + diff%2
data_arr_n = np.pad(
        data_arr_n,
        ((0, 0), (0, 0), (left, right)),
        mode="reflect"
        )


with torch.no_grad():
    PN_probas = ml_detector(
        torch.from_numpy(data_arr_n).float()
    )
# remove channel 0 that is probability of noise
waveform_features = PN_probas[:, 1:, :].detach().numpy()

[34]:

beam_relocated_event = copy.deepcopy(backprojection_events[EVENT_IDX])

# uncomment this section if you are using envelopes (modulus of analytical signal) as waveform features
# beam_relocated_event.relocate(
#     routine=LOCATION_ROUTINE,
#     duration=120.,
#     offset_ot=40.,
#     beamformer=bf_loc,
#     data_folder=DATA_FOLDER,
#     restricted_domain_side_km=RESTRICTED_DOMAIN_FOR_UNC_KM,
#     data_reader=data_reader_mseed,
# )

# uncomment this section if you are using ML waveform features
beam_relocated_event.relocate(
    routine=LOCATION_ROUTINE,
    beamformer=bf_loc,
    data_folder=DATA_FOLDER,
    restricted_domain_side_km=RESTRICTED_DOMAIN_FOR_UNC_KM,
    data_reader=data_reader_mseed,
    waveform_features=waveform_features
)

[35]:

dummy_catalog = BPMF.dataset.Catalog.read_from_events(
    [backprojection_events[EVENT_IDX]],
    extra_attributes=["hmax_unc", "hmin_unc", "az_hmax_unc", "vmax_unc"]
    )
dummy_catalog.catalog["az_hmax_unc"] = dummy_catalog.catalog["az_hmax_unc"]%180.
dummy_catalog.catalog

[35]:

	longitude	latitude	depth	origin_time	hmax_unc	hmin_unc	az_hmax_unc	vmax_unc
event_id
2012-07-26T01:10:21.800000Z	30.323047	40.73125	12.015625	2012-07-26 01:10:21.800	1.733366	1.182667	157.271635	2.640251

The difference between this backprojection and the “detection backprojection” of 5_backprojection is that, here, we stored the value of each beam of the 4D (x, y, z, time) grid! Below, we plot 3 slices of the (x, y, z, time=time of max beam) volume.

[36]:

from cartopy.crs import PlateCarree

scatter_kwargs = {}
scatter_kwargs["edgecolor"] = "k"
scatter_kwargs["linewidths"] = 0.5
scatter_kwargs["s"] = 40
scatter_kwargs["zorder"] = 3
scatter_kwargs["color"] = "C3"



# when using the beam-based relocation routine, the entire grid of beams is stored in memory
fig_beam_reloc = bf_loc.plot_likelihood(figsize=(15, 11), likelihood=bf_loc.likelihood)
axes = fig_beam_reloc.get_axes()
fig_beam_reloc = dummy_catalog.plot_map(
    ax=axes[0], network=net, s=50, markersize_station=50, lat_margin=0.02, plot_uncertainties=True, depth_colorbar=False
    )

# the following is a bit cumbersome and is just to plot the uncertainty ellipses on the depth cross-sections
ax = axes[0]
data_coords = PlateCarree()
projected_coords_nlloc = ax.projection.transform_points(
    data_coords,
    np.array([backprojection_events[EVENT_IDX].longitude]),
    np.array([backprojection_events[EVENT_IDX].latitude]),
)
projected_coords_backproj = ax.projection.transform_points(
    data_coords,
    np.array([beam_relocated_event.longitude]),
    np.array([beam_relocated_event.latitude]),
)

ax.scatter(
    beam_relocated_event.longitude,
    beam_relocated_event.latitude,
    marker="*",
    s=40,
    color="k",
    label="Initial backprojection location",
    transform=data_coords
)

axes[1].scatter(
    projected_coords_nlloc[0, 0], backprojection_events[EVENT_IDX].depth, **scatter_kwargs
)
axes[1].scatter(
    projected_coords_backproj[0, 0],
    initial_catalog.catalog.loc[EVENT_IDX, "depth"],
    marker="*",
    s=40,
    color="k",
)
ew_ellipse_lon, ew_ellipse_lat, ew_ellipse_dep = BPMF.plotting_utils.vertical_uncertainty_ellipse(
    backprojection_events[EVENT_IDX].aux_data["cov_mat"],
    backprojection_events[EVENT_IDX].longitude,
    backprojection_events[EVENT_IDX].latitude,
    backprojection_events[EVENT_IDX].depth,
    horizontal_direction="longitude"
)
projected_ellipse_coords = ax.projection.transform_points(
    data_coords,
    ew_ellipse_lon,
    ew_ellipse_lat,
)
axes[1].plot(
    projected_ellipse_coords[:, 0], ew_ellipse_dep, color=scatter_kwargs["color"]
)

axes[2].scatter(
    backprojection_events[EVENT_IDX].depth, projected_coords_nlloc[0, 1], **scatter_kwargs
)
axes[2].scatter(
    initial_catalog.catalog.loc[EVENT_IDX, "depth"],
    projected_coords_backproj[0, 1],
    marker="*",
    s=40,
    color="k",
)
ns_ellipse_lon, ns_ellipse_lat, ns_ellipse_dep = BPMF.plotting_utils.vertical_uncertainty_ellipse(
    backprojection_events[EVENT_IDX].aux_data["cov_mat"],
    backprojection_events[EVENT_IDX].longitude,
    backprojection_events[EVENT_IDX].latitude,
    backprojection_events[EVENT_IDX].depth,
    horizontal_direction="latitude"
)
projected_ellipse_coords = ax.projection.transform_points(
    data_coords,
    ns_ellipse_lon,
    ns_ellipse_lat,
)
axes[2].plot(
    ns_ellipse_dep, projected_ellipse_coords[:, 1], color=scatter_kwargs["color"]
)

axes[1].set_ylim(20., axes[1].get_ylim()[1])
axes[2].set_xlim(axes[2].get_xlim()[0], 20.)

for i, ax in enumerate(axes[:-1]):
    ax.text(0.02, 0.98, f'({string.ascii_lowercase[i]})', va='top',
            fontsize=20, ha='left', bbox={"facecolor": "white", "edgecolor": "none", "alpha": 0.75},
            transform=ax.transAxes)

../../_images/tutorial_notebooks_6_relocate_57_0.png

The current way of estimating location uncertainties with the backprojection method is to compute a weighted average distance between the maximum beam grid point (indexed by \(k^*\)) and all other grid points. To avoid the result to be grid-size dependent, we limit the weighted average to a restricted rectangular domain taken around the maximum beam grid point.

\[h_{unc} = \dfrac{L_k e_{i,k^*}}{\sum_{k \in \mathcal{R}} L_k},\]

\[v_{unc} = \dfrac{L_k \Delta d_{i,k^*}}{\sum_{k \in \mathcal{R}} L_k},\]

where \(e_{i,k^*}\) is the epicentral distance between grid points \(i\) and \(k^*\) and \(\Delta d_{i,k^*}\) is the depth difference between grid points \(i\) and \(k^*\).

In this case, we assume the uncertainty ellipse to be isotropic, that is, \(h_{max,unc} = h_{min,unc}\).

[ ]:

EVENT_IDX = 0
ev = backprojection_events[EVENT_IDX]
print(f"The maximum horizontal location uncertainty of event {EVENT_IDX} is {ev.hmax_unc:.2f}km.")
print(f"The azimuth of the maximum horizontal uncertainty is {ev.az_hmax_unc:.2f}degrees.")
print(f"The mainimum horizontal location uncertainty of event {EVENT_IDX} is {ev.hmin_unc:.2f}km.")
print(f"The azimuth of the minimum horizontal uncertainty is {ev.az_hmin_unc:.2f}degrees.")
print(f"The maximum vertical location uncertainty is {ev.vmax_unc:.2f}km with {ev._pl_vmax_unc%90:.2f}degrees plunge.")

The maximum horizontal location uncertainty of event 0 is 1.73km.
The azimuth of the maximum horizontal uncertainty is 157.27degrees.
The mainimum horizontal location uncertainty of event 0 is 1.18km.
The azimuth of the minimum horizontal uncertainty is 67.27degrees.
The maximum vertical location uncertainty is 2.64km with 29.70degrees plunge.