Relocation

In the previous notebook, 5_backprojection, we built a database of events detected with the backprojection technique. Here, we will improve the location of each of these events by using a more elaborated location technique.

[1]:
import os
# choose the number of threads you want to limit the computation to
n_CPUs = 24
os.environ["OMP_NUM_THREADS"] = str(n_CPUs)

import copy
import BPMF
import h5py as h5
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import string

from BPMF.data_reader_examples import data_reader_mseed
from time import time as give_time

%config InlineBackend.figure_formats = ["svg"]
plt.rcParams["svg.fonttype"] = "none"
[2]:
INPUT_DB_FILENAME = "raw_bp.h5"
OUTPUT_DB_FILENAME = "reloc_bp.h5"
OUTPUT_CSV_FILENAME = "backprojection_catalog.csv"
NETWORK_FILENAME = "network.csv"
[3]:
net = BPMF.dataset.Network(NETWORK_FILENAME)
net.read()

Load the metadata of the events previously detected with backprojection

In the previous notebook, we used BPMF.dataset.Event.write to save the metadata of each detected event to a hdf5 database. Here, we will use BPMF.dataset.Event.read_from_file to read the metadata and build a list of BPMF.dataset.Event instances.

[4]:
backprojection_events = []
with h5.File(os.path.join(BPMF.cfg.OUTPUT_PATH, INPUT_DB_FILENAME), mode="r") as fin:
    print(f"Each event's metadata is stored at a 'subfolder'. The subfolders are: {fin.keys()}")
    print("In hdf5, 'subfolders' are called 'groups'.")
    for group_id in fin.keys():
        backprojection_events.append(
            BPMF.dataset.Event.read_from_file(
                INPUT_DB_FILENAME,
                db_path=BPMF.cfg.OUTPUT_PATH,
                gid=group_id,
                data_reader=data_reader_mseed
            )
        )
Each event's metadata is stored at a 'subfolder'. The subfolders are: <KeysViewHDF5 ['20120726_011024.280000', '20120726_011555.960000', '20120726_015757.400000', '20120726_023502.800000', '20120726_030040.920000', '20120726_044340.160000', '20120726_044840.160000', '20120726_080827.360000', '20120726_100725.640000', '20120726_115536.720000', '20120726_133527.760000', '20120726_134834.840000', '20120726_135200.480000', '20120726_135333.600000', '20120726_135656.000000', '20120726_150621.560000', '20120726_172821.640000']>
In hdf5, 'subfolders' are called 'groups'.
[5]:
initial_catalog = pd.DataFrame(columns=["origin_time", "longitude", "latitude", "depth"])
for i, ev in enumerate(backprojection_events):
    initial_catalog.loc[i, "origin_time"] = ev.origin_time
    initial_catalog.loc[i, "longitude"] = ev.longitude
    initial_catalog.loc[i, "latitude"] = ev.latitude
    initial_catalog.loc[i, "depth"] = ev.depth
initial_catalog["origin_time"] = initial_catalog["origin_time"].values.astype("datetime64[ms]")
initial_catalog["longitude"] = initial_catalog["longitude"].values.astype("float32")
initial_catalog["latitude"] = initial_catalog["latitude"].values.astype("float32")
initial_catalog["depth"] = initial_catalog["depth"].values.astype("float32")
initial_catalog
[5]:
origin_time longitude latitude depth
0 2012-07-26 01:10:24.280 30.309999 40.720001 12.0
1 2012-07-26 01:15:55.960 30.370001 40.709999 7.5
2 2012-07-26 01:57:57.400 30.260000 40.630001 -0.5
3 2012-07-26 02:35:02.800 30.330000 40.730000 7.0
4 2012-07-26 03:00:40.920 30.350000 40.709999 9.0
5 2012-07-26 04:43:40.160 30.350000 40.720001 8.5
6 2012-07-26 04:48:40.160 30.389999 40.709999 8.0
7 2012-07-26 08:08:27.360 30.379999 40.709999 8.0
8 2012-07-26 10:07:25.640 30.330000 40.720001 9.5
9 2012-07-26 11:55:36.720 30.290001 40.619999 0.0
10 2012-07-26 13:35:27.760 30.450001 40.799999 -2.0
11 2012-07-26 13:48:34.840 30.309999 40.720001 12.0
12 2012-07-26 13:52:00.480 30.360001 40.709999 7.5
13 2012-07-26 13:53:33.600 30.330000 40.720001 12.0
14 2012-07-26 13:56:56.000 30.330000 40.730000 10.0
15 2012-07-26 15:06:21.560 30.320000 40.599998 -2.0
16 2012-07-26 17:28:21.640 30.430000 40.720001 9.5

Build a BPMF.dataset.Catalog instance from initial_catalog in order to benefit from the BPMF.dataset.Catalog plotting methods.

[6]:
initial_catalog = BPMF.dataset.Catalog.read_from_dataframe(
    initial_catalog,
)
[7]:
initial_catalog.catalog
[7]:
longitude latitude depth origin_time
0 30.309999 40.720001 12.0 2012-07-26 01:10:24.280
1 30.370001 40.709999 7.5 2012-07-26 01:15:55.960
2 30.260000 40.630001 -0.5 2012-07-26 01:57:57.400
3 30.330000 40.730000 7.0 2012-07-26 02:35:02.800
4 30.350000 40.709999 9.0 2012-07-26 03:00:40.920
5 30.350000 40.720001 8.5 2012-07-26 04:43:40.160
6 30.389999 40.709999 8.0 2012-07-26 04:48:40.160
7 30.379999 40.709999 8.0 2012-07-26 08:08:27.360
8 30.330000 40.720001 9.5 2012-07-26 10:07:25.640
9 30.290001 40.619999 0.0 2012-07-26 11:55:36.720
10 30.450001 40.799999 -2.0 2012-07-26 13:35:27.760
11 30.309999 40.720001 12.0 2012-07-26 13:48:34.840
12 30.360001 40.709999 7.5 2012-07-26 13:52:00.480
13 30.330000 40.720001 12.0 2012-07-26 13:53:33.600
14 30.330000 40.730000 10.0 2012-07-26 13:56:56.000
15 30.320000 40.599998 -2.0 2012-07-26 15:06:21.560
16 30.430000 40.720001 9.5 2012-07-26 17:28:21.640

Let’s plot the locations that the backprojection method produced.

[8]:
fig = initial_catalog.plot_map(figsize=(10, 10), network=net, s=50, markersize_station=50, lat_margin=0.02)
ax = fig.get_axes()[0]
ax.set_facecolor("dimgrey")
ax.patch.set_alpha(0.15)
../../_images/tutorial_notebooks_6_relocate_11_0.svg

The backprojection method offers several advantages for locating earthquakes: - Locations come for free in addition to detecting events. - The “picking” of P and S waves does not involve any single-channel threshold. - Backprojection effectively solves the detection and phase association problems at the same time.

However, there is a number of cons: - The backprojection locations are just as good as the grid is. If a downsampled grid was used for computation efficiency, then the locations definitely need to be refined. - It is hard and computationally expensive to estimate uncertainties (as we will see later in this notebook).

Thus, in the following, we will explore two ways to refine the initial locations.

Relocation 1: PhaseNet picks and NonLinLoc

If you have installed PhaseNet and NonLinLoc, you can use BPMF.dataset.Event methods to easily relocate events with automatic picking and earthquake location. This is a two-step procedure. We first need to pick the P- and S-wave arrivals with PhaseNet before giving these data to NonLinLoc.

We will explore, step by step, what happens to the BPMF.dataset.Event instance when picking the phase arrivals and when relocating with NonLinLoc. First, we read short windows around the P- and S-wave arrivals, as determined by the backprojection locations.

Automated picking and relocation

[9]:
# event extraction parameters

# PHASE_ON_COMP: dictionary defining which moveout we use to extract the waveform
PHASE_ON_COMP = {"N": "S", "1": "S", "E": "S", "2": "S", "Z": "P"}
# OFFSET_PHASE: dictionary defining the time offset taken before a given phase
#               for example OFFSET_PHASE["P"] = 1.0 means that we extract the window
#               1 second before the predicted P arrival time
OFFSET_PHASE = {"P": 1.0, "S": 4.0}
# TIME_SHIFTED: boolean, if True, use moveouts to extract relevant windows
TIME_SHIFTED = True
# DATA_FOLDER: name of the folder where the waveforms we want to use for picking are stored
DATA_FOLDER = "preprocessed_2_12"
[10]:
t1 = give_time()
for i in range(len(backprojection_events)):
    backprojection_events[i].read_waveforms(
        BPMF.cfg.TEMPLATE_LEN_SEC,
        phase_on_comp=PHASE_ON_COMP,
        offset_phase=OFFSET_PHASE,
        time_shifted=TIME_SHIFTED,
        data_folder=DATA_FOLDER,
    )
t2 = give_time()
print(
    f"{t2-t1:.2f}sec to read the waveforms for all {len(backprojection_events):d} detections."
)
3.21sec to read the waveforms for all 17 detections.

Let’s plot these short windows with BPMF.dataset.Event.plot.

[11]:
# play with EVENT_IDX to plot different events
EVENT_IDX = 0
fig = backprojection_events[EVENT_IDX].plot(figsize=(15, 15))
../../_images/tutorial_notebooks_6_relocate_17_0.svg

Now, we will use PhaseNet to pick the P- and S-wave arrivals. This is done with the BPMF.dataset.Event.pick_PS_phases method.

[12]:
# PhaseNet picking parameters

# PhaseNet was trained for 100Hz data. Even if we saw that running PhaseNet on 25Hz data
# was good for backprojection, here, picking benefits from running PhaseNet on 100Hz data.
# Thus, we will upsample the waveforms before running PhaseNet.
PHASENET_SAMPLING_RATE_HZ = 100.
UPSAMPLING_BEFORE_PN_RELOCATION = int(PHASENET_SAMPLING_RATE_HZ/BPMF.cfg.SAMPLING_RATE_HZ)
DOWNSAMPLING_BEFORE_PN_RELOCATION = 1

# DURATION_SEC: the duration, in seconds, of the data stream starting at the detection time
#               defined by Event.origin_time. This data stream is used for picking the P/S waves.
DURATION_SEC = 60.0
# THRESHOLD_P: probability of P-wave arrival above which we declare a pick. If several picks are
#              declared during the DURATION_SEC data stream, we only keep the best one. We can
#              afford using a low probability threshold since we already know with some confidence
#              that an earthquake is in the data stream.
THRESHOLD_P = 0.10
# THRESHOLD_S: probability of S-wave arrival above which we declare a pick.
THRESHOLD_S = 0.10
# DATA_FOLDER: name of the folder where the waveforms we want to use for picking are stored
DATA_FOLDER = "preprocessed_2_12"
# COMPONENT_ALIASES: A dictionary that defines the possible channel names to search for
#                    for example, the seismometer might not be oriented and the horizontal channels
#                    might be named 1 and 2, in which case we arbitrarily decide to take 1 as the "N" channel
#                    and 2 as the "E" channel. This doesn't matter for picking P- and S-wave arrival times.
COMPONENT_ALIASES = {"N": ["N", "1"], "E": ["E", "2"], "Z": ["Z"]}
# USE_APRIORI_PICKS: boolean. This option is IMPORTANT when running BPMF in HIGH SEISMICITY CONTEXTS, like
#                   during the aftershock sequence of a large earthquake. If there are many events happening
#                   close to each other in time, we need to guide PhaseNet to pick the right set of picks.
#                   For that, we use the predicted P- and S-wave times from backprojection to add extra weight to
#                   the picks closer to those times and make it more likely to identify them as the "best" picks.
#                   WARNING: If there are truly many events, even this trick might fail. It's because "phase association"
#                   is an intrinsically hard problem in this case, and the picking might be hard to do automatically.
USE_APRIORI_PICKS = True
[13]:
for i in range(len(backprojection_events)):
    print(f"Picking P and S wave arrivals on detection {i:d}")
    # make sure the event's station list has all the stations at which
    # we eventually want travel times after NLLoc's location
    backprojection_events[i].stations = net.stations.values.astype("U")
    # the following line should only be used in a similar context as here
    # it uses the moveouts from the backprojection location to define the
    # P/S arrival times that we use a "prior information"
    backprojection_events[i].set_arrival_times_from_moveouts(verbose=0)
    backprojection_events[i].pick_PS_phases(
        DURATION_SEC,
        threshold_P=THRESHOLD_P,
        threshold_S=THRESHOLD_S,
        component_aliases=COMPONENT_ALIASES,
        data_folder=DATA_FOLDER,
        upsampling=UPSAMPLING_BEFORE_PN_RELOCATION,
        downsampling=DOWNSAMPLING_BEFORE_PN_RELOCATION,
        use_apriori_picks=USE_APRIORI_PICKS,
        keep_probability_time_series=True,
    )
Picking P and S wave arrivals on detection 0
Picking P and S wave arrivals on detection 1
Picking P and S wave arrivals on detection 2
Picking P and S wave arrivals on detection 3
Picking P and S wave arrivals on detection 4
Picking P and S wave arrivals on detection 5
Picking P and S wave arrivals on detection 6
Picking P and S wave arrivals on detection 7
Picking P and S wave arrivals on detection 8
Picking P and S wave arrivals on detection 9
Picking P and S wave arrivals on detection 10
Picking P and S wave arrivals on detection 11
Picking P and S wave arrivals on detection 12
Picking P and S wave arrivals on detection 13
Picking P and S wave arrivals on detection 14
Picking P and S wave arrivals on detection 15
Picking P and S wave arrivals on detection 16

We are done with the automatic picking! Every instance of BPMF.dataset.Event should now have a new attribute picks.

[14]:
EVENT_IDX = 0
backprojection_events[EVENT_IDX].picks
[14]:
P_picks_sec P_probas P_abs_picks S_picks_sec S_probas S_abs_picks
stations
DD06 20.330000 0.427482 2012-07-26T01:10:24.610000Z 22.340000 0.876346 2012-07-26T01:10:26.620000Z
DC08 20.010000 0.759057 2012-07-26T01:10:24.290000Z 22.010000 0.722221 2012-07-26T01:10:26.290000Z
DE08 19.799999 0.422192 2012-07-26T01:10:24.079999Z 21.950001 0.460754 2012-07-26T01:10:26.230001Z
SPNC 20.150000 0.936033 2012-07-26T01:10:24.430000Z 22.010000 0.915193 2012-07-26T01:10:26.290000Z
DC06 20.950001 0.876173 2012-07-26T01:10:25.230001Z 23.209999 0.897276 2012-07-26T01:10:27.489999Z
SAUV 19.660000 0.818781 2012-07-26T01:10:23.940000Z 21.129999 0.774574 2012-07-26T01:10:25.409999Z
DC07 20.410000 0.711142 2012-07-26T01:10:24.690000Z 22.240000 0.678000 2012-07-26T01:10:26.520000Z
DE07 20.430000 0.958840 2012-07-26T01:10:24.710000Z 22.570000 0.861130 2012-07-26T01:10:26.850000Z

These picks are automatically read when calling BPMF.dataset.Event.plot. Before plotting the waveforms with the newly obtained picks, let’s re-read the waveforms in the same format as before for the sake of easier visualization.

[15]:
t1 = give_time()
for i in range(len(backprojection_events)):
    backprojection_events[i].read_waveforms(
        BPMF.cfg.TEMPLATE_LEN_SEC,
        phase_on_comp=PHASE_ON_COMP,
        offset_phase=OFFSET_PHASE,
        time_shifted=TIME_SHIFTED,
        data_folder=DATA_FOLDER,
    )
t2 = give_time()
print(
    f"{t2-t1:.2f}sec to read the waveforms for all {len(backprojection_events):d} detections."
)
3.30sec to read the waveforms for all 17 detections.

The following plot shows: - the P-/S-wave arrival continuous probabilities (blue=P-wave, red=S-wave) computed by PhaseNet, - the PhaseNet picks with dashed vertical lines (blue=P-wave, red=S-wave), - the arrival times predicted by the backprojection location with solid vertical lines (purple=P-wave, orange=S-wave).

See how, in this tutorial, the backprojection locations already do a good job at predicting the P- and S-wave times… given the limitations of the 1D velocity model.

Note: Try to run the phase picking again with USE_APRIORI_PICKS=False and see how the plot changes.

[16]:
# play with EVENT_IDX to plot different events
EVENT_IDX = 1
fig = backprojection_events[EVENT_IDX].plot(
    figsize=(15, 15), plot_picks=True, plot_predicted_arrivals=False, plot_probabilities=True
    )
../../_images/tutorial_notebooks_6_relocate_26_0.svg

We now are all set for using our favorite location software using PhaseNet’s picks. We can use NLLoc with one of BPMF.dataset.Event methods. See http://alomax.free.fr/nlloc/ for more info.

[17]:
# location parameters

LOCATION_ROUTINE = "NLLoc"
# NLLOC_METHOD: string that defines what loss function is used by NLLoc, see http://alomax.free.fr/nlloc/ for more info.
#               Using some flavor of 'EDT' is important to obtain robust locations that are not sensitive to pick outliers.
NLLOC_METHOD = "EDT_OT_WT_ML"
# MINIMUM_NUM_STATIONS_W_PICKS: minimum number of stations with picks to even try relocation.
MINIMUM_NUM_STATIONS_W_PICKS = 3
[18]:
for i, ev in enumerate(backprojection_events):
    # before relocating, let's store the old location for book-keeping
    ev.set_aux_data(
        {"backprojection_longitude": ev.longitude,
         "backprojection_latitude": ev.latitude,
         "backprojection_depth": ev.depth}
    )
    if len(ev.picks.dropna(how="any")) >= MINIMUM_NUM_STATIONS_W_PICKS:
        print(f"Relocating event {i}")
        # first relocation, insensitive to outliers
        ev.relocate(
            stations=net.stations, routine=LOCATION_ROUTINE, method=NLLOC_METHOD,
        )
Relocating event 0
Relocating event 1
Relocating event 2
Relocating event 3
Relocating event 4
Relocating event 5
Relocating event 6
Relocating event 7
Relocating event 8
Relocating event 9
Relocating event 10
Relocating event 11
Relocating event 12
Relocating event 13
Relocating event 14
Relocating event 15
Relocating event 16

Let’s plot the arrival times predicted by NLLoc’s location! As you can see, a 1D velocity model cannot fit the picks very well. The stations are right in the heterogeneous fault zone.

[19]:
# play with EVENT_IDX to plot different events
EVENT_IDX = 1
fig = backprojection_events[EVENT_IDX].plot(
    figsize=(15, 15), plot_picks=True, plot_probabilities=True, plot_predicted_arrivals=True
    )
../../_images/tutorial_notebooks_6_relocate_31_0.svg

It doesn’t look like much has changed… But we now have the location uncertainties. The uncertainty information is encoded in the covariance matrix, which is stored at BPMF.dataset.Event.aux_data["cov_mat"].

[20]:
EVENT_IDX = 0
ev = backprojection_events[EVENT_IDX]
print(f"The maximum horizontal location uncertainty of event {EVENT_IDX} is {ev.hmax_unc:.2f}km.")
print(f"The azimuth of the maximum horizontal uncertainty is {ev.az_hmax_unc:.2f}degrees.")
print(f"The mainimumhorizontal location uncertainty of event {EVENT_IDX} is {ev.hmin_unc:.2f}km.")
print(f"The azimuth of the minimum horizontal uncertainty is {ev.az_hmin_unc:.2f}degrees.")
print(f"The maximum vertical location uncertainty is {ev.vmax_unc:.2f}km.")
The maximum horizontal location uncertainty of event 0 is 1.78km.
The azimuth of the maximum horizontal uncertainty is 161.99degrees.
The mainimumhorizontal location uncertainty of event 0 is 1.31km.
The azimuth of the minimum horizontal uncertainty is 71.99degrees.
The maximum vertical location uncertainty is 2.57km.

Before plotting the map with the new locations, let’s make sure our locations are not unnecessarily biased by aberrant pick outliers. Even though we used NonLinLoc’s EDT method, which is robust to outliers, we can easily identify outliers and remove them.

[21]:
# we set a maximum tolerable difference, in percentage, between the picked time and the predicted travel time
MAX_TIME_DIFFERENT_PICKS_PREDICTED_PERCENT = 25
for i, ev in enumerate(backprojection_events):
    if "NLLoc_reloc" in ev.aux_data:
        # this variable was inserted into ev.aux_data if NLLoc successfully located the event
        # use predicted times to remove outlier picks
        ev.remove_outlier_picks(max_diff_percent=MAX_TIME_DIFFERENT_PICKS_PREDICTED_PERCENT)
        if len(ev.picks.dropna(how="any")) >= MINIMUM_NUM_STATIONS_W_PICKS:
            print(f"Relocating event {i}")
            # first relocation, insensitive to outliers
            ev.relocate(
                stations=net.stations, routine=LOCATION_ROUTINE, method=NLLOC_METHOD,
            )
        else:
            # after removing the outlier picks, not enough picks were left
            # revert to backprojection location
            del ev.arrival_times
            ev.longitude = ev.aux_data["backprojection_longitude"]
            ev.latitude = ev.aux_data["backprojection_latitude"]
            ev.depth = ev.aux_data["backprojection_depth"]
Relocating event 0
Relocating event 1
Relocating event 3
Relocating event 4
Relocating event 5
Relocating event 6
Relocating event 7
Relocating event 8
Relocating event 10
Relocating event 11
Relocating event 12
Relocating event 13
Relocating event 14
Relocating event 15
[22]:
# play with EVENT_IDX to plot different events
EVENT_IDX = 1
fig = backprojection_events[EVENT_IDX].plot(
    figsize=(15, 15), plot_picks=True, plot_predicted_arrivals=True, plot_probabilities=False
    )
../../_images/tutorial_notebooks_6_relocate_36_0.svg

Gather the new catalog, clean up possible multiples and plot the new locations

Just like before, we make a new instance of BPMF.dataset.Catalog to use the built-in plotting methods.

[29]:
# --- update event ids now that we have our final locations
for ev in backprojection_events:
    ev.id = str(ev.origin_time)

# --- create a Catalog objet
nlloc_catalog = BPMF.dataset.Catalog.read_from_events(
    backprojection_events,
    extra_attributes=["hmax_unc", "hmin_unc", "az_hmax_unc", "vmax_unc"]
    )
nlloc_catalog.catalog["az_hmax_unc"] = nlloc_catalog.catalog["az_hmax_unc"]%180.
nlloc_catalog.catalog
[29]:
longitude latitude depth origin_time hmax_unc hmin_unc az_hmax_unc vmax_unc
event_id
2012-07-26T01:10:21.840000Z 30.322070 40.730625 12.117188 2012-07-26 01:10:21.840 1.779872 1.305840 161.988359 2.565296
2012-07-26T01:15:54.200000Z 30.343555 40.716875 9.070312 2012-07-26 01:15:54.200 2.077245 1.130659 90.589373 2.339978
2012-07-26T01:58:19.840000Z 30.260000 40.630000 -0.500000 2012-07-26 01:58:19.840 1.778483 0.702436 175.539454 2.400404
2012-07-26T02:35:01.120000Z 30.341602 40.718125 8.867188 2012-07-26 02:35:01.120 3.878430 1.763533 113.548582 5.734983
2012-07-26T03:00:38.760000Z 30.316211 40.718125 12.117188 2012-07-26 03:00:38.760 1.674063 1.431853 25.259715 3.172909
2012-07-26T04:43:38.160000Z 30.315234 40.713750 12.421875 2012-07-26 04:43:38.160 4.718800 1.679460 122.483911 4.514452
2012-07-26T04:48:38.400000Z 30.359180 40.714375 9.882812 2012-07-26 04:48:38.400 4.091946 1.587564 85.557525 4.705347
2012-07-26T08:08:25.760000Z 30.337695 40.719375 8.664062 2012-07-26 08:08:25.760 3.033962 1.372813 103.115926 3.483848
2012-07-26T10:07:23.760000Z 30.341113 40.715938 8.511719 2012-07-26 10:07:23.760 1.592323 0.957655 90.956024 2.790996
2012-07-26T11:55:36.200000Z 30.290000 40.620000 0.000000 2012-07-26 11:55:36.200 5.414679 0.868935 124.691332 1.100644
2012-07-26T13:35:26.440000Z 30.295215 40.799062 -1.949219 2012-07-26 13:35:26.440 3.780788 0.951556 87.980070 4.216654
2012-07-26T13:48:32.600000Z 30.316211 40.720625 11.304688 2012-07-26 13:48:32.600 1.428168 1.294981 38.195164 2.869099
2012-07-26T13:51:58.360000Z 30.308398 40.715625 12.320312 2012-07-26 13:51:58.360 1.634489 1.439889 29.107871 5.108501
2012-07-26T13:53:31.680000Z 30.324023 40.720625 9.273438 2012-07-26 13:53:31.680 1.547559 1.530669 156.791350 5.334101
2012-07-26T13:56:54.280000Z 30.342578 40.716250 8.765625 2012-07-26 13:56:54.280 2.667188 1.050842 80.353125 5.208330
2012-07-26T15:06:20.080000Z 30.328174 40.600156 -1.974609 2012-07-26 15:06:20.080 0.919964 0.627512 157.405152 0.419150
2012-07-26T17:28:22.080000Z 30.430000 40.720000 9.500000 2012-07-26 17:28:22.080 4.777719 0.886096 48.619974 2.428971

Remove possible multiple detections:

The backprojection method does not have good inter-event time resolution because of the large uncertainties on the event origin times. There always exists a relatively large trade-off between the origin time and the location so it happens to detect the same event several times in a row if MINIMUM_INTEREVENT_TIME_SEC was not set large enough in 5_backprojection.ipynb. After relocation with NonLinLoc, it is easier to identify such multiple detections.

[30]:
# the criteria that determine whether two events are the same or not are of course somewhat arbitrary
# however, one can play with them and see what values effectively discard multiples
MINIMUM_INTEREVENT_TIME_SEC = 5.
MINIMUM_EPICENTRAL_DISTANCE_KM = 10.
[31]:
# ---------------------------------------------------------
#         Remove multiple detections of same event
# ---------------------------------------------------------

# --- build a dictionary where each entry is an event, for explicit indexing
backprojection_events_dict = {ev.id : ev for ev in backprojection_events}


keep = np.ones(nlloc_catalog.n_events, dtype=bool)
for i in range(1, nlloc_catalog.n_events):
    previous_ev = i - 1
    while not keep[previous_ev]:
        previous_ev -= 1
        if previous_ev < 0:
            break
    if previous_ev < 0:
        # there is not previous event to compare with
        continue
    # -----------------------------------------
    # catalog row indexes and backprojection_events indexes do not always match
    # because catalog is ordered chronologically and some events' chronology
    # might have swapped during NLLoc's location process
    # -----------------------------------------
    event_id = nlloc_catalog.catalog.index[i]
    det = backprojection_events_dict[event_id]
    previous_event_id = nlloc_catalog.catalog.index[previous_ev]
    previous_det = backprojection_events_dict[previous_event_id]

    dt_sec = (
        nlloc_catalog.catalog.origin_time[i]
        - nlloc_catalog.catalog.origin_time[previous_ev]
    ).total_seconds()
    if dt_sec < MINIMUM_INTEREVENT_TIME_SEC:
        # test their epicentral distance
        epi_distance = BPMF.utils.two_point_epicentral_distance(
                nlloc_catalog.catalog.iloc[i]["latitude"],
                nlloc_catalog.catalog.iloc[i]["longitude"],
                nlloc_catalog.catalog.iloc[previous_ev]["latitude"],
                nlloc_catalog.catalog.iloc[previous_ev]["longitude"]
                )
        if epi_distance > MINIMUM_EPICENTRAL_DISTANCE_KM:
            # these two events are close in time but distant in space
            continue
        if (
            "NLLoc_reloc" in det.aux_data
            and "NLLoc_reloc" not in previous_det.aux_data
        ):
            keep[previous_ev] = False
        elif (
            "NLLoc_reloc" not in det.aux_data
            and "NLLoc_reloc" in previous_det.aux_data
        ):
            keep[i] = False
        elif det.hmax_unc < previous_det.hmax_unc:
            keep[previous_ev] = False
        elif det.hmax_unc > previous_det.hmax_unc:
            keep[i] = False
        else:
            # if both det and previous_det were not relocated with NLLoc,
            # then there origin times cannot be closer than MINIMUM_INTEREVENT_TIME_SEC
            continue

nlloc_catalog.catalog = nlloc_catalog.catalog[keep]
nlloc_catalog.catalog["event_id"] = [f"bp_{i}" for i in range(len(nlloc_catalog.catalog))]
nlloc_catalog.catalog
[31]:
longitude latitude depth origin_time hmax_unc hmin_unc az_hmax_unc vmax_unc event_id
event_id
2012-07-26T01:10:21.840000Z 30.322070 40.730625 12.117188 2012-07-26 01:10:21.840 1.779872 1.305840 161.988359 2.565296 bp_0
2012-07-26T01:15:54.200000Z 30.343555 40.716875 9.070312 2012-07-26 01:15:54.200 2.077245 1.130659 90.589373 2.339978 bp_1
2012-07-26T01:58:19.840000Z 30.260000 40.630000 -0.500000 2012-07-26 01:58:19.840 1.778483 0.702436 175.539454 2.400404 bp_2
2012-07-26T02:35:01.120000Z 30.341602 40.718125 8.867188 2012-07-26 02:35:01.120 3.878430 1.763533 113.548582 5.734983 bp_3
2012-07-26T03:00:38.760000Z 30.316211 40.718125 12.117188 2012-07-26 03:00:38.760 1.674063 1.431853 25.259715 3.172909 bp_4
2012-07-26T04:43:38.160000Z 30.315234 40.713750 12.421875 2012-07-26 04:43:38.160 4.718800 1.679460 122.483911 4.514452 bp_5
2012-07-26T04:48:38.400000Z 30.359180 40.714375 9.882812 2012-07-26 04:48:38.400 4.091946 1.587564 85.557525 4.705347 bp_6
2012-07-26T08:08:25.760000Z 30.337695 40.719375 8.664062 2012-07-26 08:08:25.760 3.033962 1.372813 103.115926 3.483848 bp_7
2012-07-26T10:07:23.760000Z 30.341113 40.715938 8.511719 2012-07-26 10:07:23.760 1.592323 0.957655 90.956024 2.790996 bp_8
2012-07-26T11:55:36.200000Z 30.290000 40.620000 0.000000 2012-07-26 11:55:36.200 5.414679 0.868935 124.691332 1.100644 bp_9
2012-07-26T13:35:26.440000Z 30.295215 40.799062 -1.949219 2012-07-26 13:35:26.440 3.780788 0.951556 87.980070 4.216654 bp_10
2012-07-26T13:48:32.600000Z 30.316211 40.720625 11.304688 2012-07-26 13:48:32.600 1.428168 1.294981 38.195164 2.869099 bp_11
2012-07-26T13:51:58.360000Z 30.308398 40.715625 12.320312 2012-07-26 13:51:58.360 1.634489 1.439889 29.107871 5.108501 bp_12
2012-07-26T13:53:31.680000Z 30.324023 40.720625 9.273438 2012-07-26 13:53:31.680 1.547559 1.530669 156.791350 5.334101 bp_13
2012-07-26T13:56:54.280000Z 30.342578 40.716250 8.765625 2012-07-26 13:56:54.280 2.667188 1.050842 80.353125 5.208330 bp_14
2012-07-26T15:06:20.080000Z 30.328174 40.600156 -1.974609 2012-07-26 15:06:20.080 0.919964 0.627512 157.405152 0.419150 bp_15
2012-07-26T17:28:22.080000Z 30.430000 40.720000 9.500000 2012-07-26 17:28:22.080 4.777719 0.886096 48.619974 2.428971 bp_16

The NLLoc locations are able to resolve the location differences between events much better than the backprojection locations. However, limits remain due to: - Errors in picks. - Approximations in velocity model (1D model whereas the area is hghly heterogeneous!). - Earthquakes are located outside the network, so large uncertainties cannot be avoided.

[32]:
fig = nlloc_catalog.plot_map(
    figsize=(10, 10), network=net, s=50, markersize_station=50, lat_margin=0.02, plot_uncertainties=True
    )
ax = fig.get_axes()[0]
ax.set_facecolor("dimgrey")
ax.patch.set_alpha(0.15)
../../_images/tutorial_notebooks_6_relocate_43_0.svg

Save the relocated events

Save all the events that were not identified as multiples to a hdf5 database.

[33]:
nlloc_catalog.catalog.to_csv(
    os.path.join(BPMF.cfg.OUTPUT_PATH, OUTPUT_CSV_FILENAME)
)
[35]:

for i in range(len(nlloc_catalog.catalog)): evid = nlloc_catalog.catalog.index[i] backprojection_events_dict[evid].write( OUTPUT_DB_FILENAME, db_path=BPMF.cfg.OUTPUT_PATH, gid=backprojection_events_dict[evid].id )

Relocation 2 (Bonus): Locating and estimating uncertainties with backprojection on a fine grid

The goal of this section is to explore an alternative to the PhaseNet + NonLinLoc workflow solely based on the use backprojection. If a “sparse” grid was used for the purpose of earthquake detection, then a fine grid can be used here.

In the following, we run a “vanilla” backprojection that backprojects the waveforms’ envelopes (as defined by the Hilbert transform).

[36]:
# PHASES: list of seismic phases to use for backprojection
PHASES = ["P", "S"]
# TT_FILENAME: name of the file with the travel times
TT_FILENAME = "tts.h5"
# N_MAX_STATIONS: the maximum number of stations used for stacking (e.g. the closest to a given grid point)
# note: this parameter is irrelevant in this example with only 8 stations, but it is important when
# using large seismic networks
N_MAX_STATIONS = 10

For backprojection, we need to load the travel time grid and set up the BPMF.template_search.Beamformer instance.

[37]:
tts = BPMF.template_search.TravelTimes(
    TT_FILENAME,
    tt_folder_path=BPMF.cfg.MOVEOUTS_PATH
)
tts.read(
    PHASES, source_indexes=None, read_coords=True, stations=net.stations
)
tts.convert_to_samples(BPMF.cfg.SAMPLING_RATE_HZ, remove_tt_seconds=True)
[38]:
EVENT_IDX = 0

# initialize a new beamformer because, in general, you might want to use a different
# travel-time table for this beamformer (e.g. denser)
bf_loc = BPMF.template_search.Beamformer(
    phases=PHASES,
    moveouts_relative_to_first=True,
)
# here, we need to "trick" the Beamformer instance and give it something similar to the Data instance in 5_backprojection
bf_loc.set_data(backprojection_events[EVENT_IDX])
# attach the Network instance
bf_loc.set_network(net)
# attach the TravelTimes instance
bf_loc.set_travel_times(tts)

# attach weights (the waveform features are the waveforms' envelopes)
# weights_phases = np.ones((net.n_stations, net.n_components, len(PHASES)), dtype=np.float32)
# weights_phases[:, :2, 0] = 0. # P-wave weight = 0 for horizontal components
# weights_phases[:, 2, 1] = 0. # S-wave weight = 0 for vertical components

# uncomment this section to perform the backprojection-based relocation
# with ML waveform features (PhaseNet)
weights_phases = np.ones((net.n_stations, 2, 2), dtype=np.float32)
weights_phases[:, 0, 1] = 0.0  # S-wave weights to zero for channel 0
weights_phases[:, 1, 0] = 0.0  # P-wave weights to zero for channel 1
bf_loc.set_weights_sources(N_MAX_STATIONS)
bf_loc.set_weights(weights_phases=weights_phases)
[39]:
# relocation parameters
LOCATION_ROUTINE = "beam"
RESTRICTED_DOMAIN_FOR_UNC_KM = 100.
DATA_FOLDER = "preprocessed_2_12"

[40]:
# uncomment this section to perform the backprojection-based relocation
# with ML waveform features (PhaseNet)

import torch
import seisbench.models as sbm

beam_relocated_event = copy.deepcopy(backprojection_events[EVENT_IDX])
ml_detector = sbm.PhaseNet.from_pretrained("original")
ml_detector.eval()

beam_relocated_event.read_waveforms(
    120., offset_ot=60., time_shifted=False,
    data_folder="preprocessed_2_12", data_reader=data_reader_mseed
)

data_arr = beam_relocated_event.get_np_array(net.stations)

data_arr_n = BPMF.utils.normalize_batch(data_arr)
closest_pow2 = int(np.log2(data_arr_n.shape[-1])) + 1
diff = 2**closest_pow2 - data_arr_n.shape[-1]
left = diff//2
right = diff//2 + diff%2
data_arr_n = np.pad(
        data_arr_n,
        ((0, 0), (0, 0), (left, right)),
        mode="reflect"
        )


with torch.no_grad():
    PN_probas = ml_detector(
        torch.from_numpy(data_arr_n).float()
    )
# remove channel 0 that is probability of noise
waveform_features = PN_probas[:, 1:, :].detach().numpy()

[41]:
beam_relocated_event = copy.deepcopy(backprojection_events[EVENT_IDX])

# uncomment this section if you are using envelopes (modulus of analytical signal) as waveform features
# beam_relocated_event.relocate(
#     routine=LOCATION_ROUTINE,
#     duration=120.,
#     offset_ot=40.,
#     beamformer=bf_loc,
#     data_folder=DATA_FOLDER,
#     restricted_domain_side_km=RESTRICTED_DOMAIN_FOR_UNC_KM,
#     data_reader=data_reader_mseed,
# )

# uncomment this section if you are using ML waveform features
beam_relocated_event.relocate(
    routine=LOCATION_ROUTINE,
    beamformer=bf_loc,
    data_folder=DATA_FOLDER,
    restricted_domain_side_km=RESTRICTED_DOMAIN_FOR_UNC_KM,
    data_reader=data_reader_mseed,
    waveform_features=waveform_features
)
[42]:
dummy_catalog = BPMF.dataset.Catalog.read_from_events(
    [backprojection_events[EVENT_IDX]],
    extra_attributes=["hmax_unc", "hmin_unc", "az_hmax_unc", "vmax_unc"]
    )
dummy_catalog.catalog["az_hmax_unc"] = dummy_catalog.catalog["az_hmax_unc"]%180.
dummy_catalog.catalog
[42]:
longitude latitude depth origin_time hmax_unc hmin_unc az_hmax_unc vmax_unc
event_id
2012-07-26T01:10:21.840000Z 30.32207 40.730625 12.117188 2012-07-26 01:10:21.840 1.779872 1.30584 161.988359 2.565296

The difference between this backprojection and the “detection backprojection” of 5_backprojection is that, here, we stored the value of each beam of the 4D (x, y, z, time) grid! Below, we plot 3 slices of the (x, y, z, time=time of max beam) volume.

[43]:
from cartopy.crs import PlateCarree

scatter_kwargs = {}
scatter_kwargs["edgecolor"] = "k"
scatter_kwargs["linewidths"] = 0.5
scatter_kwargs["s"] = 40
scatter_kwargs["zorder"] = 3
scatter_kwargs["color"] = "C3"



# when using the beam-based relocation routine, the entire grid of beams is stored in memory
fig_beam_reloc = bf_loc.plot_likelihood(figsize=(15, 11), likelihood=bf_loc.likelihood)
axes = fig_beam_reloc.get_axes()
fig_beam_reloc = dummy_catalog.plot_map(
    ax=axes[0], network=net, s=50, markersize_station=50, lat_margin=0.02, plot_uncertainties=True, depth_colorbar=False
    )

# the following is a bit cumbersome and is just to plot the uncertainty ellipses on the depth cross-sections
ax = axes[0]
data_coords = PlateCarree()
projected_coords_nlloc = ax.projection.transform_points(
    data_coords,
    np.array([backprojection_events[EVENT_IDX].longitude]),
    np.array([backprojection_events[EVENT_IDX].latitude]),
)
projected_coords_backproj = ax.projection.transform_points(
    data_coords,
    np.array([beam_relocated_event.longitude]),
    np.array([beam_relocated_event.latitude]),
)

ax.scatter(
    beam_relocated_event.longitude,
    beam_relocated_event.latitude,
    marker="*",
    s=40,
    color="k",
    label="Initial backprojection location",
    transform=data_coords
)

axes[1].scatter(
    projected_coords_nlloc[0, 0], backprojection_events[EVENT_IDX].depth, **scatter_kwargs
)
axes[1].scatter(
    projected_coords_backproj[0, 0],
    initial_catalog.catalog.loc[EVENT_IDX, "depth"],
    marker="*",
    s=40,
    color="k",
)
ew_ellipse_lon, ew_ellipse_lat, ew_ellipse_dep = BPMF.plotting_utils.vertical_uncertainty_ellipse(
    backprojection_events[EVENT_IDX].aux_data["cov_mat"],
    backprojection_events[EVENT_IDX].longitude,
    backprojection_events[EVENT_IDX].latitude,
    backprojection_events[EVENT_IDX].depth,
    horizontal_direction="longitude"
)
projected_ellipse_coords = ax.projection.transform_points(
    data_coords,
    ew_ellipse_lon,
    ew_ellipse_lat,
)
axes[1].plot(
    projected_ellipse_coords[:, 0], ew_ellipse_dep, color=scatter_kwargs["color"]
)

axes[2].scatter(
    backprojection_events[EVENT_IDX].depth, projected_coords_nlloc[0, 1], **scatter_kwargs
)
axes[2].scatter(
    initial_catalog.catalog.loc[EVENT_IDX, "depth"],
    projected_coords_backproj[0, 1],
    marker="*",
    s=40,
    color="k",
)
ns_ellipse_lon, ns_ellipse_lat, ns_ellipse_dep = BPMF.plotting_utils.vertical_uncertainty_ellipse(
    backprojection_events[EVENT_IDX].aux_data["cov_mat"],
    backprojection_events[EVENT_IDX].longitude,
    backprojection_events[EVENT_IDX].latitude,
    backprojection_events[EVENT_IDX].depth,
    horizontal_direction="latitude"
)
projected_ellipse_coords = ax.projection.transform_points(
    data_coords,
    ns_ellipse_lon,
    ns_ellipse_lat,
)
axes[2].plot(
    ns_ellipse_dep, projected_ellipse_coords[:, 1], color=scatter_kwargs["color"]
)

axes[1].set_ylim(20., axes[1].get_ylim()[1])
axes[2].set_xlim(axes[2].get_xlim()[0], 20.)

for i, ax in enumerate(axes[:-1]):
    ax.text(0.02, 0.98, f'({string.ascii_lowercase[i]})', va='top',
            fontsize=20, ha='left', bbox={"facecolor": "white", "edgecolor": "none", "alpha": 0.75},
            transform=ax.transAxes)
../../_images/tutorial_notebooks_6_relocate_57_0.svg

The current way of estimating location uncertainties with the backprojection method is to compute a weighted average distance between the maximum beam grid point (indexed by \(k^*\)) and all other grid points. To avoid the result to be grid-size dependent, we limit the weighted average to a restricted rectangular domain taken around the maximum beam grid point.

\[h_{unc} = \dfrac{L_k e_{i,k^*}}{\sum_{k \in \mathcal{R}} L_k},\]
\[v_{unc} = \dfrac{L_k \Delta d_{i,k^*}}{\sum_{k \in \mathcal{R}} L_k},\]

where \(e_{i,k^*}\) is the epicentral distance between grid points \(i\) and \(k^*\) and \(\Delta d_{i,k^*}\) is the depth difference between grid points \(i\) and \(k^*\).

In this case, we assume the uncertainty ellipse to be isotropic, that is, \(h_{max,unc} = h_{min,unc}\).

[44]:
EVENT_IDX = 0
ev = backprojection_events[EVENT_IDX]
print(f"The maximum horizontal location uncertainty of event {EVENT_IDX} is {ev.hmax_unc:.2f}km.")
print(f"The azimuth of the maximum horizontal uncertainty is {ev.az_hmax_unc:.2f}degrees.")
print(f"The mainimum horizontal location uncertainty of event {EVENT_IDX} is {ev.hmin_unc:.2f}km.")
print(f"The azimuth of the minimum horizontal uncertainty is {ev.az_hmin_unc:.2f}degrees.")
print(f"The maximum vertical location uncertainty is {ev.vmax_unc:.2f}km with {ev._pl_vmax_unc%90:.2f}degrees plunge.")
The maximum horizontal location uncertainty of event 0 is 1.78km.
The azimuth of the maximum horizontal uncertainty is 161.99degrees.
The mainimum horizontal location uncertainty of event 0 is 1.31km.
The azimuth of the minimum horizontal uncertainty is 71.99degrees.
The maximum vertical location uncertainty is 2.57km with 26.27degrees plunge.