Relocation

In the previous notebook, 5_backprojection, we built a database of events detected with the backprojection technique. Here, we will refine the location of each of these events with more elaborate location techniques.

[1]:
import os
# choose the number of threads you want to limit the computation to
n_CPUs = 24
os.environ["OMP_NUM_THREADS"] = str(n_CPUs)

import copy
import BPMF
import h5py as h5
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import string

from BPMF.data_reader_examples import data_reader_mseed
from time import time as give_time

%config InlineBackend.figure_formats = ["svg"]
plt.rcParams["svg.fonttype"] = "none"
[2]:
INPUT_DB_FILENAME = "raw_bp.h5"
OUTPUT_DB_FILENAME = "reloc_bp.h5"
OUTPUT_CSV_FILENAME = "backprojection_catalog.csv"
NETWORK_FILENAME = "network.csv"
[3]:
net = BPMF.dataset.Network(NETWORK_FILENAME)
net.read()

Load the metadata of the events previously detected with backprojection

In the previous notebook, we used BPMF.dataset.Event.write to save the metadata of each detected event to an hdf5 database. Here, we will use BPMF.dataset.Event.read_from_file to read the metadata and build a list of BPMF.dataset.Event instances.

[4]:
backprojection_events = []
with h5.File(os.path.join(BPMF.cfg.OUTPUT_PATH, INPUT_DB_FILENAME), mode="r") as fin:
    print(f"Each event's metadata is stored in a 'subfolder'. The subfolders are: {fin.keys()}")
    print("In hdf5, 'subfolders' are called 'groups'.")
    for group_id in fin.keys():
        backprojection_events.append(
            BPMF.dataset.Event.read_from_file(
                INPUT_DB_FILENAME,
                db_path=BPMF.cfg.OUTPUT_PATH,
                gid=group_id,
                data_reader=data_reader_mseed
            )
        )
Each event's metadata is stored in a 'subfolder'. The subfolders are: <KeysViewHDF5 ['20120726_011024.280000', '20120726_011555.880000', '20120726_015757.600000', '20120726_023502.160000', '20120726_030040.880000', '20120726_044340.160000', '20120726_044840.160000', '20120726_080827.360000', '20120726_100725.640000', '20120726_115536.760000', '20120726_133527.760000', '20120726_134834.840000', '20120726_135200.480000', '20120726_135333.600000', '20120726_135656.000000', '20120726_150621.480000', '20120726_172821.680000']>
In hdf5, 'subfolders' are called 'groups'.
[5]:
initial_catalog = pd.DataFrame(columns=["origin_time", "longitude", "latitude", "depth"])
for i, ev in enumerate(backprojection_events):
    initial_catalog.loc[i, "origin_time"] = ev.origin_time
    initial_catalog.loc[i, "longitude"] = ev.longitude
    initial_catalog.loc[i, "latitude"] = ev.latitude
    initial_catalog.loc[i, "depth"] = ev.depth
initial_catalog["origin_time"] = initial_catalog["origin_time"].values.astype("datetime64[ms]")
initial_catalog["longitude"] = initial_catalog["longitude"].values.astype("float32")
initial_catalog["latitude"] = initial_catalog["latitude"].values.astype("float32")
initial_catalog["depth"] = initial_catalog["depth"].values.astype("float32")
initial_catalog
[5]:
origin_time longitude latitude depth
0 2012-07-26 01:10:24.280 30.309999 40.720001 12.0
1 2012-07-26 01:15:55.880 30.379999 40.709999 8.0
2 2012-07-26 01:57:57.600 30.209999 40.599998 -1.5
3 2012-07-26 02:35:02.160 30.330000 40.700001 -1.5
4 2012-07-26 03:00:40.880 30.350000 40.709999 9.0
5 2012-07-26 04:43:40.160 30.350000 40.720001 8.5
6 2012-07-26 04:48:40.160 30.389999 40.709999 8.0
7 2012-07-26 08:08:27.360 30.379999 40.709999 8.0
8 2012-07-26 10:07:25.640 30.330000 40.720001 9.5
9 2012-07-26 11:55:36.760 30.270000 40.599998 3.0
10 2012-07-26 13:35:27.760 30.450001 40.799999 -2.0
11 2012-07-26 13:48:34.840 30.309999 40.720001 12.0
12 2012-07-26 13:52:00.480 30.360001 40.709999 7.5
13 2012-07-26 13:53:33.600 30.330000 40.720001 12.0
14 2012-07-26 13:56:56.000 30.330000 40.730000 10.0
15 2012-07-26 15:06:21.480 30.330000 40.599998 -2.0
16 2012-07-26 17:28:21.680 30.430000 40.720001 10.0

Build a BPMF.dataset.Catalog instance from initial_catalog in order to benefit from the BPMF.dataset.Catalog plotting methods.

[6]:
initial_catalog = BPMF.dataset.Catalog.read_from_dataframe(
    initial_catalog,
)
[7]:
initial_catalog.catalog
[7]:
longitude latitude depth origin_time
0 30.309999 40.720001 12.0 2012-07-26 01:10:24.280
1 30.379999 40.709999 8.0 2012-07-26 01:15:55.880
2 30.209999 40.599998 -1.5 2012-07-26 01:57:57.600
3 30.330000 40.700001 -1.5 2012-07-26 02:35:02.160
4 30.350000 40.709999 9.0 2012-07-26 03:00:40.880
5 30.350000 40.720001 8.5 2012-07-26 04:43:40.160
6 30.389999 40.709999 8.0 2012-07-26 04:48:40.160
7 30.379999 40.709999 8.0 2012-07-26 08:08:27.360
8 30.330000 40.720001 9.5 2012-07-26 10:07:25.640
9 30.270000 40.599998 3.0 2012-07-26 11:55:36.760
10 30.450001 40.799999 -2.0 2012-07-26 13:35:27.760
11 30.309999 40.720001 12.0 2012-07-26 13:48:34.840
12 30.360001 40.709999 7.5 2012-07-26 13:52:00.480
13 30.330000 40.720001 12.0 2012-07-26 13:53:33.600
14 30.330000 40.730000 10.0 2012-07-26 13:56:56.000
15 30.330000 40.599998 -2.0 2012-07-26 15:06:21.480
16 30.430000 40.720001 10.0 2012-07-26 17:28:21.680

Let’s plot the locations that the backprojection method produced.

[8]:
fig = initial_catalog.plot_map(figsize=(10, 10), network=net, s=50, markersize_station=50, lat_margin=0.02)
ax = fig.get_axes()[0]
ax.set_facecolor("dimgrey")
ax.patch.set_alpha(0.15)
../../_images/tutorial_notebooks_6_relocate_11_0.svg

The backprojection method offers several advantages for locating earthquakes:

- Locations come for free in addition to detecting events.
- The “picking” of P and S waves does not involve any single-channel threshold.
- Backprojection effectively solves the detection and phase association problems at the same time.

However, there are a number of drawbacks:

- The backprojection locations are only as good as the grid: if a downsampled grid was used for computational efficiency, the locations definitely need to be refined.
- Estimating location uncertainties is hard and computationally expensive (as we will see later in this notebook).
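The grid-resolution limitation is easy to quantify. As a rough sketch (the 0.02° spacing and 40.7° latitude below are hypothetical illustrative values, not the actual grid used in this tutorial), the worst-case epicentral error due to grid quantization is half the diagonal of a grid cell:

```python
import numpy as np

KM_PER_DEG_LAT = 111.19        # approximate km per degree of latitude
grid_spacing_deg = 0.02        # hypothetical backprojection grid spacing
lat = 40.7                     # approximate latitude of the study region

dy_km = grid_spacing_deg * KM_PER_DEG_LAT
dx_km = grid_spacing_deg * KM_PER_DEG_LAT * np.cos(np.radians(lat))
# worst case: the true epicenter sits at a cell corner, half a cell
# diagonal away from the nearest grid node
worst_case_km = 0.5 * np.hypot(dx_km, dy_km)
print(f"worst-case grid quantization error: {worst_case_km:.2f} km")
```

With these numbers, the quantization error alone is on the order of a kilometer, which motivates refining the locations.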

Thus, in the following, we will explore two ways to refine the initial locations.

Relocation 1: PhaseNet picks and NonLinLoc

If you have installed PhaseNet and NonLinLoc, you can use BPMF.dataset.Event methods to easily relocate events with automatic picking and earthquake location. This is a two-step procedure. We first need to pick the P- and S-wave arrivals with PhaseNet before giving these data to NonLinLoc.

We will explore, step by step, what happens to the BPMF.dataset.Event instance when picking the phase arrivals and when relocating with NonLinLoc. First, we read short windows around the P- and S-wave arrivals, as determined by the backprojection locations.

Automated picking and relocation

[9]:
# event extraction parameters

# PHASE_ON_COMP: dictionary defining which moveout we use to extract the waveform
PHASE_ON_COMP = {"N": "S", "1": "S", "E": "S", "2": "S", "Z": "P"}
# OFFSET_PHASE: dictionary defining the time offset taken before a given phase
#               for example OFFSET_PHASE["P"] = 1.0 means that we extract the window
#               1 second before the predicted P arrival time
OFFSET_PHASE = {"P": 1.0, "S": 4.0}
# TIME_SHIFTED: boolean, if True, use moveouts to extract relevant windows
TIME_SHIFTED = True
# DATA_FOLDER: name of the folder where the waveforms we want to use for picking are stored
DATA_FOLDER = "preprocessed_2_12"
[10]:
t1 = give_time()
for i in range(len(backprojection_events)):
    backprojection_events[i].read_waveforms(
        BPMF.cfg.TEMPLATE_LEN_SEC,
        phase_on_comp=PHASE_ON_COMP,
        offset_phase=OFFSET_PHASE,
        time_shifted=TIME_SHIFTED,
        data_folder=DATA_FOLDER,
    )
t2 = give_time()
print(
    f"{t2-t1:.2f}sec to read the waveforms for all {len(backprojection_events):d} detections."
)
2.65sec to read the waveforms for all 17 detections.

Let’s plot these short windows with BPMF.dataset.Event.plot.

[11]:
# play with EVENT_IDX to plot different events
EVENT_IDX = 0
fig = backprojection_events[EVENT_IDX].plot(figsize=(15, 15))
../../_images/tutorial_notebooks_6_relocate_17_0.svg

Now, we will use PhaseNet to pick the P- and S-wave arrivals. This is done with the BPMF.dataset.Event.pick_PS_phases method.

[12]:
# PhaseNet picking parameters

# PhaseNet was trained on 100 Hz data. Although we saw that running PhaseNet on 25 Hz data
# worked well for backprojection, picking benefits from running PhaseNet on 100 Hz data.
# Thus, we will upsample the waveforms before running PhaseNet.
PHASENET_SAMPLING_RATE_HZ = 100.
UPSAMPLING_BEFORE_PN_RELOCATION = int(PHASENET_SAMPLING_RATE_HZ/BPMF.cfg.SAMPLING_RATE_HZ)
DOWNSAMPLING_BEFORE_PN_RELOCATION = 1

# DURATION_SEC: the duration, in seconds, of the data stream starting at the detection time
#               defined by Event.origin_time. This data stream is used for picking the P/S waves.
DURATION_SEC = 60.0
# THRESHOLD_P: probability of P-wave arrival above which we declare a pick. If several picks are
#              declared during the DURATION_SEC data stream, we only keep the best one. We can
#              afford using a low probability threshold since we already know with some confidence
#              that an earthquake is in the data stream.
THRESHOLD_P = 0.10
# THRESHOLD_S: probability of S-wave arrival above which we declare a pick.
THRESHOLD_S = 0.10
# DATA_FOLDER: name of the folder where the waveforms we want to use for picking are stored
DATA_FOLDER = "preprocessed_2_12"
# COMPONENT_ALIASES: A dictionary that defines the possible channel names to search for
#                    for example, the seismometer might not be oriented and the horizontal channels
#                    might be named 1 and 2, in which case we arbitrarily decide to take 1 as the "N" channel
#                    and 2 as the "E" channel. This doesn't matter for picking P- and S-wave arrival times.
COMPONENT_ALIASES = {"N": ["N", "1"], "E": ["E", "2"], "Z": ["Z"]}
# USE_APRIORI_PICKS: boolean. This option is IMPORTANT when running BPMF in HIGH SEISMICITY CONTEXTS, like
#                   during the aftershock sequence of a large earthquake. If there are many events happening
#                   close to each other in time, we need to guide PhaseNet to pick the right set of picks.
#                   For that, we use the predicted P- and S-wave times from backprojection to add extra weight to
#                   the picks closer to those times and make it more likely to identify them as the "best" picks.
#                   WARNING: If there are truly many events, even this trick might fail. It's because "phase association"
#                   is an intrinsically hard problem in this case, and the picking might be hard to do automatically.
USE_APRIORI_PICKS = True
[13]:
for i in range(len(backprojection_events)):
    print(f"Picking P and S wave arrivals on detection {i:d}")
    # make sure the event's station list has all the stations at which
    # we eventually want travel times after NLLoc's location
    backprojection_events[i].stations = net.stations.astype("U")
    # the following line should only be used in a similar context as here
    # it uses the moveouts from the backprojection location to define the
    # P/S arrival times that we use as "prior information"
    backprojection_events[i].set_arrival_times_from_moveouts(verbose=0)
    backprojection_events[i].pick_PS_phases(
        DURATION_SEC,
        threshold_P=THRESHOLD_P,
        threshold_S=THRESHOLD_S,
        component_aliases=COMPONENT_ALIASES,
        data_folder=DATA_FOLDER,
        upsampling=UPSAMPLING_BEFORE_PN_RELOCATION,
        downsampling=DOWNSAMPLING_BEFORE_PN_RELOCATION,
        use_apriori_picks=USE_APRIORI_PICKS,
        keep_probability_time_series=True,
    )
Picking P and S wave arrivals on detection 0
Picking P and S wave arrivals on detection 1
Picking P and S wave arrivals on detection 2
Picking P and S wave arrivals on detection 3
Picking P and S wave arrivals on detection 4
Picking P and S wave arrivals on detection 5
Picking P and S wave arrivals on detection 6
Picking P and S wave arrivals on detection 7
Picking P and S wave arrivals on detection 8
Picking P and S wave arrivals on detection 9
Picking P and S wave arrivals on detection 10
Picking P and S wave arrivals on detection 11
Picking P and S wave arrivals on detection 12
Picking P and S wave arrivals on detection 13
Picking P and S wave arrivals on detection 14
Picking P and S wave arrivals on detection 15
Picking P and S wave arrivals on detection 16

We are done with the automatic picking! Every instance of BPMF.dataset.Event should now have a new attribute picks.

[14]:
EVENT_IDX = 0
backprojection_events[EVENT_IDX].picks
[14]:
P_probas P_picks_sec P_unc_sec S_probas S_picks_sec S_unc_sec P_abs_picks S_abs_picks
stations
DE07 0.665437 20.300467 0.155635 0.860289 22.553028 0.101877 2012-07-26T01:10:24.580467Z 2012-07-26T01:10:26.833028Z
DC07 0.602067 20.334587 0.185502 0.678574 22.202778 0.129679 2012-07-26T01:10:24.614587Z 2012-07-26T01:10:26.482778Z
DC08 0.754658 20.053699 0.115350 0.690027 21.970955 0.112607 2012-07-26T01:10:24.333699Z 2012-07-26T01:10:26.250955Z
DE08 0.241382 19.796841 0.295909 0.366147 21.934898 0.229555 2012-07-26T01:10:24.076841Z 2012-07-26T01:10:26.214898Z
SPNC 0.918361 20.136955 0.094823 0.884857 21.986250 0.096080 2012-07-26T01:10:24.416955Z 2012-07-26T01:10:26.266250Z
DC06 0.826056 20.938959 0.085419 0.886496 23.207861 0.097003 2012-07-26T01:10:25.218959Z 2012-07-26T01:10:27.487861Z
SAUV 0.793898 19.595509 0.104621 0.818875 21.098804 0.107155 2012-07-26T01:10:23.875509Z 2012-07-26T01:10:25.378804Z
DD06 0.456268 20.447828 0.156354 0.805816 22.341667 0.112516 2012-07-26T01:10:24.727828Z 2012-07-26T01:10:26.621667Z
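The picks table lends itself to quick sanity checks. For instance, under a homogeneous-velocity assumption (the Vp and Vp/Vs values below are hypothetical, not taken from this tutorial's velocity model), the S-P delay at each station gives a rough hypocentral distance:

```python
import pandas as pd

# two rows copied from the picks table above
picks = pd.DataFrame(
    {
        "P_picks_sec": [20.300467, 20.053699],
        "S_picks_sec": [22.553028, 21.970955],
    },
    index=["DE07", "DC08"],
)

VP_KM_S = 6.0      # hypothetical P-wave speed
VP_OVER_VS = 1.75  # hypothetical Vp/Vs ratio
VS_KM_S = VP_KM_S / VP_OVER_VS

s_minus_p = picks["S_picks_sec"] - picks["P_picks_sec"]
# hypocentral distance: d = (ts - tp) / (1/Vs - 1/Vp)
distance_km = s_minus_p / (1.0 / VS_KM_S - 1.0 / VP_KM_S)
print(distance_km)
```

With these assumed velocities, 1/Vs - 1/Vp = 0.125 s/km, so the distance is simply 8 km per second of S-P delay; the delays of ~2 s above correspond to hypocentral distances of roughly 15-18 km.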

These picks are automatically read when calling BPMF.dataset.Event.plot. Before plotting the waveforms with the newly obtained picks, let’s re-read the waveforms in the same format as before for the sake of easier visualization.

[15]:
t1 = give_time()
for i in range(len(backprojection_events)):
    backprojection_events[i].read_waveforms(
        BPMF.cfg.TEMPLATE_LEN_SEC,
        phase_on_comp=PHASE_ON_COMP,
        offset_phase=OFFSET_PHASE,
        time_shifted=TIME_SHIFTED,
        data_folder=DATA_FOLDER,
    )
t2 = give_time()
print(
    f"{t2-t1:.2f}sec to read the waveforms for all {len(backprojection_events):d} detections."
)
3.00sec to read the waveforms for all 17 detections.

The following plot shows:

- the continuous P-/S-wave arrival probabilities (blue=P-wave, red=S-wave) computed by PhaseNet,
- the PhaseNet picks, shown with dashed vertical lines (blue=P-wave, red=S-wave),
- the arrival times predicted by the backprojection location, shown with solid vertical lines (purple=P-wave, orange=S-wave).

Note how, in this tutorial, the backprojection locations already do a good job of predicting the P- and S-wave arrival times, given the limitations of the 1D velocity model.

Note: Try to run the phase picking again with USE_APRIORI_PICKS=False and see how the plot changes.

[16]:
# play with EVENT_IDX to plot different events
EVENT_IDX = 0
fig = backprojection_events[EVENT_IDX].plot(
    figsize=(15, 15), plot_picks=True, plot_predicted_arrivals=False, plot_probabilities=True
    )
../../_images/tutorial_notebooks_6_relocate_26_0.svg

We are now all set to run our favorite location software on PhaseNet’s picks. Here, we use NLLoc via one of the BPMF.dataset.Event methods. See http://alomax.free.fr/nlloc/ for more info.

[17]:
# location parameters

LOCATION_ROUTINE = "NLLoc"
# NLLOC_METHOD: string that defines what loss function is used by NLLoc, see http://alomax.free.fr/nlloc/ for more info.
#               Using some flavor of 'EDT' is important to obtain robust locations that are not sensitive to pick outliers.
NLLOC_METHOD = "EDT_OT_WT_ML"
# MINIMUM_NUM_STATIONS_W_PICKS: minimum number of stations with picks to even try relocation.
MINIMUM_NUM_STATIONS_W_PICKS = 3
[18]:
for i, ev in enumerate(backprojection_events):
    # before relocating, let's store the old location for book-keeping
    ev.set_aux_data(
        {"backprojection_longitude": ev.longitude,
         "backprojection_latitude": ev.latitude,
         "backprojection_depth": ev.depth}
    )
    if len(ev.picks.dropna(how="any")) >= MINIMUM_NUM_STATIONS_W_PICKS:
        print(f"Relocating event {i}")
        # first relocation, insensitive to outliers
        ev.relocate(
            stations=net.stations, routine=LOCATION_ROUTINE, method=NLLOC_METHOD,
        )
Relocating event 0
Relocating event 1
Relocating event 2
Relocating event 3
Relocating event 4
Relocating event 6
Relocating event 7
Relocating event 8
Relocating event 9
Relocating event 10
Relocating event 11
Relocating event 12
Relocating event 13
Relocating event 14
Relocating event 15
Relocating event 16

Let’s plot the arrival times predicted by NLLoc’s location! As you can see, a 1D velocity model cannot fit the picks very well. The stations are right in the heterogeneous fault zone.

[19]:
# play with EVENT_IDX to plot different events
EVENT_IDX = 1
fig = backprojection_events[EVENT_IDX].plot(
    figsize=(15, 15), plot_picks=True, plot_probabilities=True, plot_predicted_arrivals=True
    )
../../_images/tutorial_notebooks_6_relocate_31_0.svg

It doesn’t look like much has changed… But we now have location uncertainties. The uncertainty information is encoded in the covariance matrix, which is stored in BPMF.dataset.Event.aux_data["cov_mat"].
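The hmax_unc, hmin_unc, and az_hmax_unc attributes used below summarize this covariance matrix. As a sketch of the underlying idea (not necessarily BPMF's exact implementation, and with a made-up covariance matrix), the horizontal uncertainty ellipse follows from an eigendecomposition of the 2x2 horizontal block of the covariance matrix:

```python
import numpy as np

# hypothetical 2x2 horizontal covariance block, in km^2, ordered [east, north]
cov_h = np.array([[1.9, -0.8],
                  [-0.8, 2.6]])

# np.linalg.eigh returns eigenvalues in ascending order for symmetric matrices
evals, evecs = np.linalg.eigh(cov_h)
hmin_unc, hmax_unc = np.sqrt(evals)  # semi-axes of the 1-sigma ellipse, in km
v = evecs[:, 1]                      # direction of largest uncertainty, [east, north]
# azimuth (degrees east of north) of the maximum horizontal uncertainty,
# folded into [0, 180) since an axis direction is defined modulo 180 degrees
az_hmax_unc = np.degrees(np.arctan2(v[0], v[1])) % 180.0
```

The square roots of the eigenvalues give the semi-axis lengths of the uncertainty ellipse, and the eigenvector of the largest eigenvalue gives its orientation.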

[20]:
EVENT_IDX = 0
ev = backprojection_events[EVENT_IDX]
print(f"The maximum horizontal location uncertainty of event {EVENT_IDX} is {ev.hmax_unc:.2f} km.")
print(f"The azimuth of the maximum horizontal uncertainty is {ev.az_hmax_unc:.2f} degrees.")
print(f"The minimum horizontal location uncertainty of event {EVENT_IDX} is {ev.hmin_unc:.2f} km.")
print(f"The azimuth of the minimum horizontal uncertainty is {ev.az_hmin_unc:.2f} degrees.")
print(f"The maximum vertical location uncertainty is {ev.vmax_unc:.2f} km.")
The maximum horizontal location uncertainty of event 0 is 1.74 km.
The azimuth of the maximum horizontal uncertainty is 156.99 degrees.
The minimum horizontal location uncertainty of event 0 is 1.19 km.
The azimuth of the minimum horizontal uncertainty is 66.99 degrees.
The maximum vertical location uncertainty is 2.66 km.

Before plotting the map with the new locations, let’s make sure our locations are not unnecessarily biased by aberrant picks. Even though we used NonLinLoc’s EDT method, which is robust to outliers, we can easily identify outlier picks and remove them.
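The outlier criterion used below is essentially a relative misfit test between picked and predicted times. A minimal sketch of that idea (with made-up travel times; this is not BPMF's exact implementation) looks like this:

```python
import numpy as np

# hypothetical picked and predicted travel times, in seconds after origin time
picked_tt = np.array([3.1, 4.8, 6.2, 14.0])   # the last pick is an outlier
predicted_tt = np.array([3.0, 5.0, 6.0, 6.5])
MAX_DIFF_PERCENT = 25.0

# flag picks whose relative deviation from the predicted time is too large
diff_percent = 100.0 * np.abs(picked_tt - predicted_tt) / predicted_tt
is_outlier = diff_percent > MAX_DIFF_PERCENT
```

Only the last pick deviates by more than 25% from its predicted time and would be removed before relocating.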

[21]:
# we set a maximum tolerable difference, in percentage, between the picked time and the predicted travel time
MAX_TIME_DIFFERENT_PICKS_PREDICTED_PERCENT = 25
for i, ev in enumerate(backprojection_events):
    if "NLLoc_reloc" in ev.aux_data:
        # this variable was inserted into ev.aux_data if NLLoc successfully located the event
        # use predicted times to remove outlier picks
        ev.remove_outlier_picks(max_diff_percent=MAX_TIME_DIFFERENT_PICKS_PREDICTED_PERCENT)
        if len(ev.picks.dropna(how="any")) >= MINIMUM_NUM_STATIONS_W_PICKS:
            print(f"Relocating event {i}")
            # second relocation, now with outlier picks removed
            ev.relocate(
                stations=net.stations, routine=LOCATION_ROUTINE, method=NLLOC_METHOD,
            )
        else:
            # after removing the outlier picks, not enough picks were left
            # revert to backprojection location
            del ev.arrival_times
            ev.longitude = ev.aux_data["backprojection_longitude"]
            ev.latitude = ev.aux_data["backprojection_latitude"]
            ev.depth = ev.aux_data["backprojection_depth"]
Relocating event 0
Relocating event 1
Relocating event 4
Relocating event 6
Relocating event 7
Relocating event 8
Relocating event 10
Relocating event 11
Relocating event 12
Relocating event 13
Relocating event 14
Relocating event 15
[22]:
# play with EVENT_IDX to plot different events
EVENT_IDX = 0
fig = backprojection_events[EVENT_IDX].plot(
    figsize=(15, 15), plot_picks=True, plot_predicted_arrivals=True, plot_probabilities=False
    )
../../_images/tutorial_notebooks_6_relocate_36_0.svg

Gather the new catalog, clean up possible multiples and plot the new locations

Just like before, we make a new instance of BPMF.dataset.Catalog to use the built-in plotting methods.

[23]:
# --- update event ids now that we have our final locations
for ev in backprojection_events:
    ev.id = str(ev.origin_time)

# --- create a Catalog object
nlloc_catalog = BPMF.dataset.Catalog.read_from_events(
    backprojection_events,
    extra_attributes=["hmax_unc", "hmin_unc", "az_hmax_unc", "vmax_unc"]
    )
nlloc_catalog.catalog["az_hmax_unc"] = nlloc_catalog.catalog["az_hmax_unc"]%180.
nlloc_catalog.catalog
Class instance does not have a `cov_mat` attribute.
[23]:
longitude latitude depth origin_time hmax_unc hmin_unc az_hmax_unc vmax_unc
event_id
2012-07-26T01:10:21.800000Z 30.323047 40.731250 12.015625 2012-07-26 01:10:21.800 1.736381 1.185190 156.991406 2.658909
2012-07-26T01:15:54.160000Z 30.343555 40.716875 9.273438 2012-07-26 01:15:54.160 2.069702 1.101945 89.976650 2.302121
2012-07-26T01:58:19.720000Z 30.210000 40.600000 -1.500000 2012-07-26 01:58:19.720 0.969304 0.828467 1.958087 2.184634
2012-07-26T02:35:01.160000Z 30.330000 40.700000 -1.500000 2012-07-26 02:35:01.160 4.120233 2.617063 74.828267 4.698645
2012-07-26T03:00:39.160000Z 30.324023 40.715625 9.070312 2012-07-26 03:00:39.160 1.791514 1.152377 98.957242 3.450490
2012-07-26T04:43:40.160000Z 30.350000 40.720000 8.500000 2012-07-26 04:43:40.160 15.000000 15.000000 0.000000 15.000000
2012-07-26T04:48:38.640000Z 30.339648 40.716875 9.273438 2012-07-26 04:48:38.640 2.723289 1.353023 77.879040 3.955719
2012-07-26T08:08:25.680000Z 30.335742 40.720625 8.867188 2012-07-26 08:08:25.680 2.498276 1.368690 78.911129 3.864549
2012-07-26T10:07:23.720000Z 30.338672 40.716250 8.765625 2012-07-26 10:07:23.720 1.648093 0.947081 92.574287 2.542094
2012-07-26T11:55:35.520000Z 30.270000 40.600000 3.000000 2012-07-26 11:55:35.520 2.728958 1.461620 152.650097 3.774164
2012-07-26T13:35:26.480000Z 30.295215 40.797812 -1.949219 2012-07-26 13:35:26.480 3.689917 0.996875 88.219249 5.571478
2012-07-26T13:48:32.480000Z 30.315234 40.723750 11.609375 2012-07-26 13:48:32.480 1.456039 1.326382 38.952550 2.881104
2012-07-26T13:51:58.200000Z 30.314258 40.713125 13.335938 2012-07-26 13:51:58.200 1.774422 1.626758 134.647549 3.745274
2012-07-26T13:53:31.400000Z 30.311328 40.718750 12.015625 2012-07-26 13:53:31.400 1.563932 1.491863 32.859366 3.106936
2012-07-26T13:56:54.320000Z 30.337695 40.716875 8.867188 2012-07-26 13:56:54.320 2.864502 1.240042 83.687111 2.879827
2012-07-26T15:06:20.160000Z 30.326099 40.600078 -1.987305 2012-07-26 15:06:20.160 1.464562 0.501280 92.486000 1.075268
2012-07-26T17:28:20.400000Z 30.430000 40.720000 10.000000 2012-07-26 17:28:20.400 3.513222 1.443351 71.766505 6.046675

Remove possible multiple detections:

The backprojection method does not have good inter-event time resolution because of the large uncertainties on the event origin times. There is always a relatively large trade-off between the origin time and the location, so the same event may be detected several times in a row if MINIMUM_INTEREVENT_TIME_SEC was not set large enough in 5_backprojection.ipynb. After relocation with NonLinLoc, such multiple detections are easier to identify.

[24]:
# the criteria that determine whether two events are the same or not are of course somewhat arbitrary
# however, one can play with them and see what values effectively discard multiples
MINIMUM_INTEREVENT_TIME_SEC = 5.
MINIMUM_EPICENTRAL_DISTANCE_KM = 10.
[25]:
# ---------------------------------------------------------
#         Remove multiple detections of same event
# ---------------------------------------------------------

# --- build a dictionary where each entry is an event, for explicit indexing
backprojection_events_dict = {ev.id : ev for ev in backprojection_events}


keep = np.ones(nlloc_catalog.n_events, dtype=bool)
for i in range(1, nlloc_catalog.n_events):
    previous_ev = i - 1
    while not keep[previous_ev]:
        previous_ev -= 1
        if previous_ev < 0:
            break
    if previous_ev < 0:
        # there is no previous event to compare with
        continue
    # -----------------------------------------
    # catalog row indexes and backprojection_events indexes do not always match
    # because the catalog is ordered chronologically and the order of some events
    # may have changed during NLLoc's relocation
    # -----------------------------------------
    event_id = nlloc_catalog.catalog.index[i]
    det = backprojection_events_dict[event_id]
    previous_event_id = nlloc_catalog.catalog.index[previous_ev]
    previous_det = backprojection_events_dict[previous_event_id]

    dt_sec = (
        nlloc_catalog.catalog.origin_time[i]
        - nlloc_catalog.catalog.origin_time[previous_ev]
    ).total_seconds()
    if dt_sec < MINIMUM_INTEREVENT_TIME_SEC:
        # test their epicentral distance
        epi_distance = BPMF.utils.two_point_epicentral_distance(
                nlloc_catalog.catalog.iloc[i]["latitude"],
                nlloc_catalog.catalog.iloc[i]["longitude"],
                nlloc_catalog.catalog.iloc[previous_ev]["latitude"],
                nlloc_catalog.catalog.iloc[previous_ev]["longitude"]
                )
        if epi_distance > MINIMUM_EPICENTRAL_DISTANCE_KM:
            # these two events are close in time but distant in space
            continue
        if (
            "NLLoc_reloc" in det.aux_data
            and "NLLoc_reloc" not in previous_det.aux_data
        ):
            keep[previous_ev] = False
        elif (
            "NLLoc_reloc" not in det.aux_data
            and "NLLoc_reloc" in previous_det.aux_data
        ):
            keep[i] = False
        elif det.hmax_unc < previous_det.hmax_unc:
            keep[previous_ev] = False
        elif det.hmax_unc > previous_det.hmax_unc:
            keep[i] = False
        else:
            # if both det and previous_det were not relocated with NLLoc,
            # then their origin times cannot be closer than MINIMUM_INTEREVENT_TIME_SEC
            continue

nlloc_catalog.catalog = nlloc_catalog.catalog[keep]
nlloc_catalog.catalog["event_id"] = [f"bp_{i}" for i in range(len(nlloc_catalog.catalog))]
nlloc_catalog.catalog
[25]:
longitude latitude depth origin_time hmax_unc hmin_unc az_hmax_unc vmax_unc event_id
event_id
2012-07-26T01:10:21.800000Z 30.323047 40.731250 12.015625 2012-07-26 01:10:21.800 1.736381 1.185190 156.991406 2.658909 bp_0
2012-07-26T01:15:54.160000Z 30.343555 40.716875 9.273438 2012-07-26 01:15:54.160 2.069702 1.101945 89.976650 2.302121 bp_1
2012-07-26T01:58:19.720000Z 30.210000 40.600000 -1.500000 2012-07-26 01:58:19.720 0.969304 0.828467 1.958087 2.184634 bp_2
2012-07-26T02:35:01.160000Z 30.330000 40.700000 -1.500000 2012-07-26 02:35:01.160 4.120233 2.617063 74.828267 4.698645 bp_3
2012-07-26T03:00:39.160000Z 30.324023 40.715625 9.070312 2012-07-26 03:00:39.160 1.791514 1.152377 98.957242 3.450490 bp_4
2012-07-26T04:43:40.160000Z 30.350000 40.720000 8.500000 2012-07-26 04:43:40.160 15.000000 15.000000 0.000000 15.000000 bp_5
2012-07-26T04:48:38.640000Z 30.339648 40.716875 9.273438 2012-07-26 04:48:38.640 2.723289 1.353023 77.879040 3.955719 bp_6
2012-07-26T08:08:25.680000Z 30.335742 40.720625 8.867188 2012-07-26 08:08:25.680 2.498276 1.368690 78.911129 3.864549 bp_7
2012-07-26T10:07:23.720000Z 30.338672 40.716250 8.765625 2012-07-26 10:07:23.720 1.648093 0.947081 92.574287 2.542094 bp_8
2012-07-26T11:55:35.520000Z 30.270000 40.600000 3.000000 2012-07-26 11:55:35.520 2.728958 1.461620 152.650097 3.774164 bp_9
2012-07-26T13:35:26.480000Z 30.295215 40.797812 -1.949219 2012-07-26 13:35:26.480 3.689917 0.996875 88.219249 5.571478 bp_10
2012-07-26T13:48:32.480000Z 30.315234 40.723750 11.609375 2012-07-26 13:48:32.480 1.456039 1.326382 38.952550 2.881104 bp_11
2012-07-26T13:51:58.200000Z 30.314258 40.713125 13.335938 2012-07-26 13:51:58.200 1.774422 1.626758 134.647549 3.745274 bp_12
2012-07-26T13:53:31.400000Z 30.311328 40.718750 12.015625 2012-07-26 13:53:31.400 1.563932 1.491863 32.859366 3.106936 bp_13
2012-07-26T13:56:54.320000Z 30.337695 40.716875 8.867188 2012-07-26 13:56:54.320 2.864502 1.240042 83.687111 2.879827 bp_14
2012-07-26T15:06:20.160000Z 30.326099 40.600078 -1.987305 2012-07-26 15:06:20.160 1.464562 0.501280 92.486000 1.075268 bp_15
2012-07-26T17:28:20.400000Z 30.430000 40.720000 10.000000 2012-07-26 17:28:20.400 3.513222 1.443351 71.766505 6.046675 bp_16

The NLLoc locations resolve the differences between event locations much better than the backprojection locations do. However, limitations remain due to:

- errors in the picks,
- approximations in the velocity model (a 1D model, whereas the area is highly heterogeneous!),
- the fact that the earthquakes are located outside the network, so large uncertainties cannot be avoided.

[26]:
fig = nlloc_catalog.plot_map(
    figsize=(10, 10), network=net, s=50, markersize_station=50, lat_margin=0.02, plot_uncertainties=True
    )
ax = fig.get_axes()[0]
ax.set_facecolor("dimgrey")
ax.patch.set_alpha(0.15)
../../_images/tutorial_notebooks_6_relocate_43_0.svg

Save the relocated events

Save all the events that were not identified as multiples to an hdf5 database.

[27]:
nlloc_catalog.catalog.to_csv(
    os.path.join(BPMF.cfg.OUTPUT_PATH, OUTPUT_CSV_FILENAME)
)
[28]:

for i in range(len(nlloc_catalog.catalog)):
    evid = nlloc_catalog.catalog.index[i]
    backprojection_events_dict[evid].write(
        OUTPUT_DB_FILENAME,
        db_path=BPMF.cfg.OUTPUT_PATH,
        gid=backprojection_events_dict[evid].id
    )

Relocation 2 (Bonus): Locating and estimating uncertainties with backprojection on a fine grid

The goal of this section is to explore an alternative to the PhaseNet + NonLinLoc workflow based solely on backprojection. If a “sparse” grid was used for earthquake detection, a finer grid can be used here.

In the following, we run a “vanilla” backprojection that backprojects the waveforms’ envelopes (the modulus of the analytic signal, computed with the Hilbert transform).
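A minimal sketch of how such an envelope is computed, on a synthetic trace (BPMF builds its waveform features internally; the sampling rate and wavelet below are made up for illustration):

```python
import numpy as np
from scipy.signal import hilbert

sr_hz = 25.0  # hypothetical sampling rate, Hz
t = np.arange(0.0, 4.0, 1.0 / sr_hz)
# synthetic trace: a 5 Hz carrier with a Gaussian amplitude modulation
modulation = np.exp(-0.5 * ((t - 2.0) / 0.3) ** 2)
trace = modulation * np.sin(2.0 * np.pi * 5.0 * t)
# the envelope is the modulus of the analytic signal
envelope = np.abs(hilbert(trace))
```

The envelope smoothly tracks the amplitude modulation even where the oscillating trace itself crosses zero, which is what makes it a convenient feature to stack in backprojection.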

[29]:
# PHASES: list of seismic phases to use for backprojection
PHASES = ["P", "S"]
# TT_FILENAME: name of the file with the travel times
TT_FILENAME = "tts.h5"
# N_MAX_STATIONS: the maximum number of stations used for stacking (e.g. the closest to a given grid point)
# note: this parameter is irrelevant in this example with only 8 stations, but it is important when
# using large seismic networks
N_MAX_STATIONS = 10

For backprojection, we need to load the travel time grid and set up the BPMF.template_search.Beamformer instance.

[30]:
tts = BPMF.template_search.TravelTimes(
    TT_FILENAME,
    tt_folder_path=BPMF.cfg.MOVEOUTS_PATH
)
tts.read(
    PHASES, source_indexes=None, read_coords=True, stations=net.stations
)
tts.convert_to_samples(BPMF.cfg.SAMPLING_RATE_HZ, remove_tt_seconds=True)
[31]:
EVENT_IDX = 0

# initialize a new beamformer because, in general, you might want to use a different
# travel-time table for this beamformer (e.g. denser)
bf_loc = BPMF.template_search.Beamformer(
    phases=PHASES,
    moveouts_relative_to_first=True,
)
# here, we need to "trick" the Beamformer instance and give it something similar to the Data instance in 5_backprojection
bf_loc.set_data(backprojection_events[EVENT_IDX])
# attach the Network instance
bf_loc.set_network(net)
# attach the TravelTimes instance
bf_loc.set_travel_times(tts)

# attach the phase weights
# uncomment this section if the waveform features are the waveforms' envelopes
# weights_phases = np.ones((net.n_stations, net.n_components, len(PHASES)), dtype=np.float32)
# weights_phases[:, :2, 0] = 0. # P-wave weight = 0 for horizontal components
# weights_phases[:, 2, 1] = 0. # S-wave weight = 0 for vertical components

# the following weights are for ML waveform features (PhaseNet), where
# channel 0 is the P-wave probability and channel 1 is the S-wave probability
weights_phases = np.ones((net.n_stations, 2, 2), dtype=np.float32)
weights_phases[:, 0, 1] = 0.0  # S-wave weight = 0 for channel 0 (P probability)
weights_phases[:, 1, 0] = 0.0  # P-wave weight = 0 for channel 1 (S probability)
bf_loc.set_weights_sources(N_MAX_STATIONS)
bf_loc.set_weights(weights_phases=weights_phases)
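
The weighting convention can be sanity-checked on its own. In this hedged sketch (the station count is an arbitrary toy value), the array has shape `(n_stations, n_channels, n_phases)`: zeroing `[:, 0, 1]` removes the S phase from channel 0, and `[:, 1, 0]` removes the P phase from channel 1, so each channel contributes to exactly one phase.

```python
import numpy as np

n_stations = 8  # toy station count for this check
weights = np.ones((n_stations, 2, 2), dtype=np.float32)
weights[:, 0, 1] = 0.0  # channel 0 (P probability) ignores the S phase
weights[:, 1, 0] = 0.0  # channel 1 (S probability) ignores the P phase

# summing over the phase axis: each channel is now tied to a single phase
active_phases_per_channel = weights.sum(axis=2)
```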
[32]:
# relocation parameters
LOCATION_ROUTINE = "beam"
RESTRICTED_DOMAIN_FOR_UNC_KM = 100.
DATA_FOLDER = "preprocessed_2_12"

[33]:
# this cell performs the backprojection-based relocation with ML waveform
# features (PhaseNet); skip it if you use the waveforms' envelopes instead

import torch
import seisbench.models as sbm

beam_relocated_event = copy.deepcopy(backprojection_events[EVENT_IDX])
ml_detector = sbm.PhaseNet.from_pretrained("original")
ml_detector.eval()

beam_relocated_event.read_waveforms(
    120., offset_ot=60., time_shifted=False,
    data_folder=DATA_FOLDER, data_reader=data_reader_mseed
)

data_arr = beam_relocated_event.get_np_array(net.stations)

data_arr_n = BPMF.utils.normalize_batch(data_arr)
# pad the time axis to the next power of two, splitting the padding between both sides
next_pow2 = int(np.log2(data_arr_n.shape[-1])) + 1
diff = 2**next_pow2 - data_arr_n.shape[-1]
left = diff // 2
right = diff // 2 + diff % 2
data_arr_n = np.pad(
    data_arr_n,
    ((0, 0), (0, 0), (left, right)),
    mode="reflect",
)


with torch.no_grad():
    PN_probas = ml_detector(
        torch.from_numpy(data_arr_n).float()
    )
# drop channel 0, which is the noise probability; keep the P and S probabilities
waveform_features = PN_probas[:, 1:, :].detach().numpy()
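
The padding step can also be checked in isolation. Below is a standalone sketch of the same arithmetic (the helper name `pad_to_next_pow2` is made up here): reflect-pad the last axis to the next, strictly greater, power of two.

```python
import numpy as np

def pad_to_next_pow2(arr):
    # exponent of the next (strictly greater) power of two
    n = arr.shape[-1]
    exponent = int(np.log2(n)) + 1
    diff = 2 ** exponent - n
    # split the padding between both sides, extra sample on the right
    left, right = diff // 2, diff // 2 + diff % 2
    pad_width = [(0, 0)] * (arr.ndim - 1) + [(left, right)]
    return np.pad(arr, pad_width, mode="reflect")

batch = np.zeros((8, 3, 3000), dtype=np.float32)  # (stations, components, samples)
padded = pad_to_next_pow2(batch)  # time axis becomes 4096 samples
```

Note that a trace whose length is already a power of two still gets doubled, mirroring the behavior of the cell above.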

[34]:
beam_relocated_event = copy.deepcopy(backprojection_events[EVENT_IDX])

# uncomment this section if you are using envelopes (modulus of analytical signal) as waveform features
# beam_relocated_event.relocate(
#     routine=LOCATION_ROUTINE,
#     duration=120.,
#     offset_ot=40.,
#     beamformer=bf_loc,
#     data_folder=DATA_FOLDER,
#     restricted_domain_side_km=RESTRICTED_DOMAIN_FOR_UNC_KM,
#     data_reader=data_reader_mseed,
# )

# this section uses ML waveform features
beam_relocated_event.relocate(
    routine=LOCATION_ROUTINE,
    beamformer=bf_loc,
    data_folder=DATA_FOLDER,
    restricted_domain_side_km=RESTRICTED_DOMAIN_FOR_UNC_KM,
    data_reader=data_reader_mseed,
    waveform_features=waveform_features
)
[35]:
dummy_catalog = BPMF.dataset.Catalog.read_from_events(
    [backprojection_events[EVENT_IDX]],
    extra_attributes=["hmax_unc", "hmin_unc", "az_hmax_unc", "vmax_unc"]
    )
dummy_catalog.catalog["az_hmax_unc"] = dummy_catalog.catalog["az_hmax_unc"]%180.
dummy_catalog.catalog
[35]:
                              longitude  latitude      depth              origin_time  hmax_unc  hmin_unc  az_hmax_unc  vmax_unc
event_id
2012-07-26T01:10:21.800000Z   30.323047  40.73125  12.015625  2012-07-26 01:10:21.800  1.736381   1.18519   156.991406  2.658909

The difference between this backprojection and the “detection backprojection” of 5_backprojection is that, here, we stored the beam value at every point of the 4D (x, y, z, time) grid! Below, we plot three slices of the (x, y, z, time = time of maximum beam) volume.
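
To make this concrete, here is a toy sketch (not BPMF internals) of how a location estimate follows from a stored grid of beams: the spatial index of the global maximum of the (source, time) beam matrix gives the relocated hypocenter, and the corresponding column gives the origin time.

```python
import numpy as np

# toy beam matrix: one row per source grid point, one column per time step
beams = np.zeros((1000, 500), dtype=np.float32)
beams[421, 137] = 1.0  # plant a synthetic global maximum

# unravel the flat argmax into (source index, time index)
src_idx, time_idx = np.unravel_index(np.argmax(beams), beams.shape)
```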

[36]:
from cartopy.crs import PlateCarree

scatter_kwargs = {
    "edgecolor": "k",
    "linewidths": 0.5,
    "s": 40,
    "zorder": 3,
    "color": "C3",
}

# when using the beam-based relocation routine, the entire grid of beams is stored in memory
fig_beam_reloc = bf_loc.plot_likelihood(figsize=(15, 11), likelihood=bf_loc.likelihood)
axes = fig_beam_reloc.get_axes()
fig_beam_reloc = dummy_catalog.plot_map(
    ax=axes[0], network=net, s=50, markersize_station=50, lat_margin=0.02, plot_uncertainties=True, depth_colorbar=False
    )

# the following is a bit cumbersome and is just to plot the uncertainty ellipses on the depth cross-sections
ax = axes[0]
data_coords = PlateCarree()
projected_coords_nlloc = ax.projection.transform_points(
    data_coords,
    np.array([backprojection_events[EVENT_IDX].longitude]),
    np.array([backprojection_events[EVENT_IDX].latitude]),
)
projected_coords_backproj = ax.projection.transform_points(
    data_coords,
    np.array([beam_relocated_event.longitude]),
    np.array([beam_relocated_event.latitude]),
)

ax.scatter(
    beam_relocated_event.longitude,
    beam_relocated_event.latitude,
    marker="*",
    s=40,
    color="k",
    label="Beam-relocated location",
    transform=data_coords
)

axes[1].scatter(
    projected_coords_nlloc[0, 0], backprojection_events[EVENT_IDX].depth, **scatter_kwargs
)
axes[1].scatter(
    projected_coords_backproj[0, 0],
    beam_relocated_event.depth,
    marker="*",
    s=40,
    color="k",
)
ew_ellipse_lon, ew_ellipse_lat, ew_ellipse_dep = BPMF.plotting_utils.vertical_uncertainty_ellipse(
    backprojection_events[EVENT_IDX].aux_data["cov_mat"],
    backprojection_events[EVENT_IDX].longitude,
    backprojection_events[EVENT_IDX].latitude,
    backprojection_events[EVENT_IDX].depth,
    horizontal_direction="longitude"
)
projected_ellipse_coords = ax.projection.transform_points(
    data_coords,
    ew_ellipse_lon,
    ew_ellipse_lat,
)
axes[1].plot(
    projected_ellipse_coords[:, 0], ew_ellipse_dep, color=scatter_kwargs["color"]
)

axes[2].scatter(
    backprojection_events[EVENT_IDX].depth, projected_coords_nlloc[0, 1], **scatter_kwargs
)
axes[2].scatter(
    beam_relocated_event.depth,
    projected_coords_backproj[0, 1],
    marker="*",
    s=40,
    color="k",
)
ns_ellipse_lon, ns_ellipse_lat, ns_ellipse_dep = BPMF.plotting_utils.vertical_uncertainty_ellipse(
    backprojection_events[EVENT_IDX].aux_data["cov_mat"],
    backprojection_events[EVENT_IDX].longitude,
    backprojection_events[EVENT_IDX].latitude,
    backprojection_events[EVENT_IDX].depth,
    horizontal_direction="latitude"
)
projected_ellipse_coords = ax.projection.transform_points(
    data_coords,
    ns_ellipse_lon,
    ns_ellipse_lat,
)
axes[2].plot(
    ns_ellipse_dep, projected_ellipse_coords[:, 1], color=scatter_kwargs["color"]
)

axes[1].set_ylim(20., axes[1].get_ylim()[1])
axes[2].set_xlim(axes[2].get_xlim()[0], 20.)

for i, ax in enumerate(axes[:-1]):
    ax.text(0.02, 0.98, f'({string.ascii_lowercase[i]})', va='top',
            fontsize=20, ha='left', bbox={"facecolor": "white", "edgecolor": "none", "alpha": 0.75},
            transform=ax.transAxes)
../../_images/tutorial_notebooks_6_relocate_57_0.svg

The current way of estimating location uncertainties with the backprojection method is to compute a weighted average of the distances between the maximum beam grid point (indexed by \(k^*\)) and all other grid points. To avoid the result being grid-size dependent, we restrict the weighted average to a rectangular domain \(\mathcal{R}\) taken around the maximum beam grid point.

\[h_{unc} = \dfrac{\sum_{i \in \mathcal{R}} L_i \, e_{i,k^*}}{\sum_{i \in \mathcal{R}} L_i},\]
\[v_{unc} = \dfrac{\sum_{i \in \mathcal{R}} L_i \, \Delta d_{i,k^*}}{\sum_{i \in \mathcal{R}} L_i},\]

where \(e_{i,k^*}\) is the epicentral distance between grid points \(i\) and \(k^*\) and \(\Delta d_{i,k^*}\) is the depth difference between grid points \(i\) and \(k^*\).

In this case, we assume the uncertainty ellipse to be isotropic, that is, \(h_{max,unc} = h_{min,unc}\).
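
As a hedged illustration (synthetic numbers, with the whole toy grid playing the role of the restricted domain \(\mathcal{R}\)), the horizontal term reduces to a likelihood-weighted average of epicentral distances:

```python
import numpy as np

# synthetic likelihood L_i for each grid point i in the restricted domain R,
# and epicentral distances e_{i,k*} to the maximum-beam grid point k*
rng = np.random.default_rng(1234)
likelihood = rng.random(50)
epi_dist_km = rng.random(50) * 10.0

# weighted-average distance = horizontal location uncertainty
h_unc = np.sum(likelihood * epi_dist_km) / np.sum(likelihood)
```

Since `h_unc` is a weighted average, it always lies within the range of the distances it averages.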

[37]:
EVENT_IDX = 0
ev = backprojection_events[EVENT_IDX]
print(f"The maximum horizontal location uncertainty of event {EVENT_IDX} is {ev.hmax_unc:.2f}km.")
print(f"The azimuth of the maximum horizontal uncertainty is {ev.az_hmax_unc:.2f}degrees.")
print(f"The minimum horizontal location uncertainty of event {EVENT_IDX} is {ev.hmin_unc:.2f}km.")
print(f"The azimuth of the minimum horizontal uncertainty is {ev.az_hmin_unc:.2f}degrees.")
print(f"The maximum vertical location uncertainty is {ev.vmax_unc:.2f}km with {ev._pl_vmax_unc%90:.2f}degrees plunge.")
The maximum horizontal location uncertainty of event 0 is 1.74km.
The azimuth of the maximum horizontal uncertainty is 156.99degrees.
The minimum horizontal location uncertainty of event 0 is 1.19km.
The azimuth of the minimum horizontal uncertainty is 66.99degrees.
The maximum vertical location uncertainty is 2.66km with 29.50degrees plunge.