BPMF.dataset

class BPMF.dataset.Catalog(longitudes, latitudes, depths, origin_times, event_ids=None, **kwargs)[source]

Bases: object

A class for catalog data and basic plotting.

classmethod concatenate(catalogs, ignore_index=True)[source]

Build a catalog from a list of pandas.DataFrame instances.

Parameters:

catalogs (list of pandas.DataFrame) – List of pandas.DataFrame with consistent columns.
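A minimal pandas sketch of the concatenation. It assumes that each catalog is a pandas.DataFrame with longitude, latitude, depth and origin_time columns (illustrative names), and that ignore_index is forwarded to pandas.concat. Both are assumptions, not something stated above:

```python
import pandas as pd

# Two small catalogs with consistent (illustrative) column names.
cat_a = pd.DataFrame({
    "longitude": [30.1, 30.2],
    "latitude": [40.5, 40.6],
    "depth": [8.0, 10.5],
    "origin_time": pd.to_datetime(["2012-07-26 01:00:00", "2012-07-26 02:30:00"]),
})
cat_b = pd.DataFrame({
    "longitude": [30.3],
    "latitude": [40.7],
    "depth": [12.0],
    "origin_time": pd.to_datetime(["2012-07-27 05:15:00"]),
})

# ignore_index=True discards the original row labels and renumbers the rows 0..n-1.
combined = pd.concat([cat_a, cat_b], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2]
```

With ignore_index=False, the row labels 0, 1, 0 would be kept as they are. Duplicate labels like these make label-based lookups with .loc ambiguous.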

property depth
property latitude
property longitude
property n_events
property origin_time
plot_map(ax=None, figsize=(20, 10), depth_min=0.0, depth_max=20.0, network=None, plot_uncertainties=False, depth_colorbar=True, **kwargs)[source]

Plot epicenters on map.

Parameters:
  • ax (matplotlib.pyplot.Axes, default to None) – If None, create new plt.Figure and plt.Axes instances. If specified by the user, use the provided instance to plot.

  • figsize (tuple of floats, default to (20, 10)) – Size, in inches, of the figure (width, height).

  • depth_min (scalar float, default to 0) – Smallest depth, in km, in the depth colormap.

  • depth_max (scalar float, default to 20) – Largest depth, in km, in the depth colormap.

  • network (BPMF.dataset.Network, optional) – If provided, use information in network to plot the stations.

  • plot_uncertainties (boolean, default to False) – If True, plot the location uncertainty ellipses.

  • depth_colorbar (boolean, default to True) – If True, plot the depth colorbar on the left.

Returns:

fig – The figure with depth color-coded epicenters.

Return type:

matplotlib.pyplot.Figure

plot_space_time(ax=None, figsize=(20, 10), color_coded='longitude', y_axis='latitude', **kwargs)[source]

Plot the space-time event distribution.

Parameters:
  • ax (plt.Axes, default to None) – If None, create new plt.Figure and plt.Axes instances. If specified by the user, use the provided instance to plot.

  • figsize (tuple of floats, default to (20, 10)) – Size, in inches, of the figure (width, height).

  • color_coded (string, default to 'longitude') – Can be either ‘longitude’, ‘latitude’, or ‘depth’. This is the attribute used to define the color scale of each dot.

  • y_axis (string, default to 'latitude') – Can be either ‘longitude’, ‘latitude’, or ‘depth’. This is the attribute used to define the y-axis coordinates.

Returns:

fig – The space-time figure, with events color-coded by the attribute given in color_coded.

Return type:

matplotlib.pyplot.Figure

plot_time_statistics(UTC_local_corr=0.0, figsize=(16, 7), **kwargs)[source]

Plot the histograms of time of the day and day of the week.

Parameters:
  • figsize (tuple of floats, optional) – Size, in inches, of the figure (width, height). Default is (16, 7).

  • UTC_local_corr (float, optional) –

    Apply UTC to local time correction such that:

    local_hour = UTC_hour + UTC_local_corr

Returns:

fig – The created matplotlib Figure object.

Return type:

matplotlib.figure.Figure
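The statistics behind the two histograms can be sketched as follows. The helper name time_statistics is hypothetical and plotting is omitted. Shifting the full timestamp by UTC_local_corr is also an assumption about how the correction is applied: it moves events near midnight to the correct local day of the week.

```python
import numpy as np
import pandas as pd


def time_statistics(origin_times, UTC_local_corr=0.0):
    """Count events per local hour of the day and per day of the week (hypothetical helper).

    local_hour = UTC_hour + UTC_local_corr. The whole timestamp is shifted,
    so the correction also carries over midnight into the day-of-week count.
    """
    local = pd.to_datetime(pd.Series(origin_times)) + pd.Timedelta(hours=UTC_local_corr)
    hour_counts = np.bincount(local.dt.hour.to_numpy(), minlength=24)
    # dt.dayofweek: Monday=0, ..., Sunday=6
    day_counts = np.bincount(local.dt.dayofweek.to_numpy(), minlength=7)
    return hour_counts, day_counts
```

For example, an event at 23:30 UTC on a Monday, with UTC_local_corr=2.0, is counted at 01:00-02:00 local time on Tuesday.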

classmethod read_from_dataframe(dataframe)[source]

Initialize a Catalog instance from a pandas.DataFrame instance.

Parameters:

dataframe (pandas.DataFrame) – pandas.DataFrame from which the Catalog instance will be built.

Returns:

catalog – Catalog instance.

Return type:

Catalog

classmethod read_from_detection_file(filename, db_path='', gid=None, extra_attributes=[], fill_value=nan, return_events=False)[source]

Read all detected events and build catalog.

Parameters:
  • filename (string) – Name of the hdf5 file with the event data base.

  • db_path (string, optional) – Name of the directory where filename is located. Default is config.OUTPUT_PATH.

  • gid (string or float or int, optional) – Name of the group to read within the hdf5 file. If None (default), reads all the groups in the root directory of the hdf5 file.

  • extra_attributes (list, optional) – Default attributes included in the catalog are: longitude, latitude, depth and origin time. Any extra attribute that should be returned in the catalog should be given here as a string in a list. Example: extra_attributes=[‘hmax_inc’, ‘vmax_inc’]

  • fill_value (any, optional) – Value that is returned in the catalog if an attribute from extra_attributes is not found. Default is numpy.nan.

  • return_events (boolean, optional) – If True, returns the list of Event instances in addition to the Catalog instance. Default is False.

Returns:

  • catalog (Catalog) – Catalog instance.

  • events (list, optional) – List of Event instances if return_events=True.

classmethod read_from_events(events, extra_attributes=[], fill_value=nan)[source]

Build catalog from list of Event instances.

Parameters:
  • events (list) – List of Event instances.

  • extra_attributes (list, optional) – Default attributes included in the catalog are: longitude, latitude, depth and origin time. Any extra attribute that should be returned in the catalog should be given here as a string in a list. Example: extra_attributes=[‘hmax_inc’, ‘vmax_inc’]

  • fill_value (any, optional) – Value that is returned in the catalog if an attribute from extra_attributes is not found. Default is numpy.nan.

Returns:

catalog – Catalog instance.

Return type:

Catalog
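The following sketch shows how default and extra attributes could be gathered into a table, with fill_value used for missing extras. For illustration only, it assumes that every attribute can be read from an event with getattr. catalog_table_from_events is a hypothetical helper, and SimpleNamespace objects stand in for Event instances:

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

DEFAULT_ATTRIBUTES = ["longitude", "latitude", "depth", "origin_time"]


def catalog_table_from_events(events, extra_attributes=(), fill_value=np.nan):
    """Build a DataFrame with one row per event (hypothetical helper)."""
    rows = []
    for ev in events:
        row = {attr: getattr(ev, attr) for attr in DEFAULT_ATTRIBUTES}
        for attr in extra_attributes:
            # Extra attributes that the event lacks are replaced by fill_value.
            row[attr] = getattr(ev, attr, fill_value)
        rows.append(row)
    return pd.DataFrame(rows, columns=DEFAULT_ATTRIBUTES + list(extra_attributes))


# Stand-ins for Event instances; only the first one has 'hmax_inc'.
events = [
    SimpleNamespace(longitude=30.1, latitude=40.5, depth=8.0,
                    origin_time="2012-07-26T01:00:00", hmax_inc=1.2),
    SimpleNamespace(longitude=30.2, latitude=40.6, depth=10.5,
                    origin_time="2012-07-26T02:30:00"),
]
table = catalog_table_from_events(events, extra_attributes=["hmax_inc"])
```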

class BPMF.dataset.Data(date, where, data_reader, duration=86400.0, sampling_rate=None)[source]

Bases: object

A Data class to manipulate waveforms and metadata.

get_np_array(stations, components=['N', 'E', 'Z'], component_aliases={'E': ['E', '2'], 'N': ['N', '1'], 'Z': ['Z']}, priority='HH', verbose=True)[source]

Arguments are passed to BPMF.utils.get_np_array.

read_waveforms(trim_traces=True, **reader_kwargs)[source]

Read the waveform time series.

Parameters:

trim_traces (bool, optional) – If True, call trim_waveforms to make sure all traces have the same start time.

Notes

Additional parameters can be provided as keyword arguments **reader_kwargs which are passed to the data_reader method for waveform reading.

set_availability(stations, components=['N', 'E', 'Z'], component_aliases={'E': ['E', '2'], 'N': ['N', '1'], 'Z': ['Z']})[source]

Set the data availability.

A station is considered available if at least one of its channels has non-zero data. The availability is then accessed via the property self.availability.

Parameters:
  • stations (list of strings or numpy.ndarray) – Names of the stations on which we check availability. If None, use self.stations.

  • components (list, optional) – List of component codes to consider for availability check, by default [“N”, “E”, “Z”]

  • component_aliases (dict, optional) – Dictionary mapping component codes to their aliases, by default {“N”: [“N”, “1”], “E”: [“E”, “2”], “Z”: [“Z”]}

Return type:

None
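The availability rule above can be sketched on a (num_stations, num_components, num_samples) array. The array layout and the helper name station_availability are assumptions made for illustration:

```python
import numpy as np


def station_availability(waveforms, stations):
    """Return {station: bool} from a (num_stations, num_components, num_samples) array.

    A station counts as available if at least one of its components
    contains a non-zero sample.
    """
    waveforms = np.asarray(waveforms)
    per_channel = np.any(waveforms != 0.0, axis=-1)  # (num_stations, num_components)
    per_station = np.any(per_channel, axis=-1)       # (num_stations,)
    return dict(zip(stations, per_station.tolist()))
```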

property sr
property time
trim_waveforms(starttime=None, endtime=None)[source]

Trim waveforms.

Adjusts the start and end times of waveforms to ensure all traces are synchronized.

Parameters:
  • starttime (str or datetime, optional) – Start time to use for trimming the waveforms. If None, self.date is used as the start time. (default None)

  • endtime (str or datetime, optional) – End time to use for trimming the waveforms. If None, self.date + self.duration is used as the end time. (default None)

Return type:

None

class BPMF.dataset.Event(origin_time, moveouts, stations, phases, data_filename, data_path, latitude=None, longitude=None, depth=None, component_aliases={'E': ['E', '2'], 'N': ['N', '1'], 'Z': ['Z']}, sampling_rate=None, components=['N', 'E', 'Z'], id=None, data_reader=None)[source]

Bases: object

An Event class to describe any collection of waveforms.

property availability
property availability_per_cha
property availability_per_sta
property az_hmax_unc
property az_hmin_unc
compute_snr(noise_window_sec=5.0, **data_reader_kwargs)[source]
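Compute the signal-to-noise ratio (SNR) of the waveforms.

Parameters:

noise_window_sec (float, optional) – Duration, in seconds, of the noise window. Default is 5.0.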

get_np_array(stations, components=None, priority='HH', verbose=True)[source]

Arguments are passed to BPMF.utils.get_np_array.

get_peak_amplitudes(stations, components)[source]

Get peak waveform amplitudes.

The peak waveform amplitudes are typically used to compute amplitude-based local magnitudes.

Parameters:
  • stations (list of strings) – Names of the stations to include in the output array. Defines the order of the station axis.

  • components (list of strings) – Names of the components to include in the output array. Defines the order of the component axis.

Returns:

peak_amplitudes – (num_stations, num_components) numpy.ndarray with the peak waveform amplitude on each channel.

Return type:

numpy.ndarray
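The following sketches the peak-amplitude table. It assumes that "peak" means the maximum absolute value over the time axis and that missing channels are reported as NaN (both are assumptions). peak_amplitudes is a hypothetical helper:

```python
import numpy as np


def peak_amplitudes(waveforms, trace_stations, trace_components, stations, components):
    """Peak absolute amplitude per (station, component), in the requested order.

    waveforms has shape (len(trace_stations), len(trace_components), num_samples).
    Requested channels that are not present are filled with NaN.
    """
    waveforms = np.asarray(waveforms, dtype=float)
    out = np.full((len(stations), len(components)), np.nan)
    for i, sta in enumerate(stations):
        if sta not in trace_stations:
            continue
        s = trace_stations.index(sta)
        for j, comp in enumerate(components):
            if comp in trace_components:
                c = trace_components.index(comp)
                out[i, j] = np.max(np.abs(waveforms[s, c]))
    return out
```

The stations and components arguments define the output axes. Reordering them therefore reorders the rows and columns of the result, as described above.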

property hmax_unc
property hmin_unc
hor_ver_uncertainties(mode='intersection')[source]

Compute the horizontal and vertical uncertainties on location.

The uncertainties are given by the 68% confidence ellipsoid and are stored as new attributes rather than returned.

Parameters:

mode (str, optional) – Specifies the mode for calculating the horizontal uncertainties (default ‘intersection’).

  • If ‘intersection’, the horizontal uncertainties are the lengths of the semi-axes of the ellipse defined by the intersection between the confidence ellipsoid and the horizontal plane.

  • If ‘projection’, the horizontal uncertainties are the maximum and minimum spans of the confidence ellipsoid in the horizontal directions.

Return type:

None

Raises:

None

New attributes:

  • _hmax_unc (float) – The maximum horizontal uncertainty in kilometers.

  • _hmin_unc (float) – The minimum horizontal uncertainty in kilometers.

  • _vmax_unc (float) – The maximum vertical uncertainty in kilometers.

  • _az_hmax_unc (float) – The azimuth (angle from north) of the maximum horizontal uncertainty in degrees.

  • _az_hmin_unc (float) – The azimuth (angle from north) of the minimum horizontal uncertainty in degrees.

Notes

  • The sum of _hmax_unc and _vmax_unc does not necessarily equal the maximum length of the uncertainty ellipsoid’s semi-axis; the latter represents the longest semi-axis of the ellipsoid.
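The two modes can be illustrated from a 3x3 location covariance matrix. This is a sketch, not BPMF's code. It assumes (east, north, down) coordinates in km and leaves the factor that maps the covariance to the 68% confidence level as a scale parameter. It also takes vmax_unc to be the vertical projection of the ellipsoid:

```python
import numpy as np


def hor_ver_uncertainties(cov, mode="intersection", scale=1.0):
    """Horizontal/vertical uncertainties from a 3x3 covariance (east, north, down), km^2.

    The confidence ellipsoid is x^T cov^{-1} x = scale**2. The value of scale
    that gives the 68% confidence level is left to the caller (assumption).
    """
    cov = np.asarray(cov, dtype=float)
    if mode == "intersection":
        # Cut the ellipsoid with the horizontal plane (down = 0):
        # x_h^T (cov^{-1})_hh x_h = scale^2
        eigvals, eigvecs = np.linalg.eigh(np.linalg.inv(cov)[:2, :2])
        semi_axes = scale / np.sqrt(eigvals)
    elif mode == "projection":
        # Shadow of the ellipsoid on the horizontal plane:
        # x_h^T (cov_hh)^{-1} x_h = scale^2
        eigvals, eigvecs = np.linalg.eigh(cov[:2, :2])
        semi_axes = scale * np.sqrt(eigvals)
    else:
        raise ValueError("mode must be 'intersection' or 'projection'")
    i_max, i_min = np.argmax(semi_axes), np.argmin(semi_axes)

    def azimuth(vec):
        # Angle from north (clockwise, toward east), folded to [0, 180): axes are undirected.
        return np.degrees(np.arctan2(vec[0], vec[1])) % 180.0

    return {
        "hmax_unc": semi_axes[i_max],
        "hmin_unc": semi_axes[i_min],
        # Vertical extent of the ellipsoid (its projection on the depth axis).
        "vmax_unc": scale * np.sqrt(cov[2, 2]),
        "az_hmax_unc": azimuth(eigvecs[:, i_max]),
        "az_hmin_unc": azimuth(eigvecs[:, i_min]),
    }
```

A central cross-section of an ellipsoid always fits inside its shadow. Therefore 'intersection' never yields larger horizontal uncertainties than 'projection'. The two coincide when the horizontal and vertical errors are uncorrelated.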

property location
n_best_SNR_stations(n, available_stations=None)[source]

Adjust self.stations to the n best SNR stations.

This function finds the n best stations based on signal-to-noise ratio (SNR) and modifies the self.stations attribute accordingly. The instance’s properties will also change to reflect the new selection of stations.

Parameters:
  • n (int) – The number of best SNR stations to select.

  • available_stations (list of str, default None) – The list of stations from which to search for the best SNR stations. If provided, only stations in this list will be considered. This can be used to exclude stations that are known to lack data.

n_closest_stations(n, available_stations=None)[source]

Adjust self.stations to the n closest stations.

Finds the n closest stations based on distance and modifies self.stations accordingly. The instance’s properties will also change to reflect the updated station selection.

Parameters:
  • n (int) – The number of closest stations to fetch.

  • available_stations (list of strings, optional) – The list of stations from which the closest stations are searched. If certain stations are known to lack data, they can be excluded from the closest stations selection. (default None)

Return type:

None
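The following sketches the selection step. It assumes that the source-receiver distances are available as a dictionary keyed by station name. n_closest_stations here is a standalone hypothetical function, not the method itself. n_best_SNR_stations presumably follows the same pattern, with stations sorted by decreasing SNR:

```python
import numpy as np


def n_closest_stations(distances_km, n, available_stations=None):
    """Return the n station names with the smallest source-receiver distance.

    distances_km maps station name -> distance in km. If available_stations
    is given, only those stations (that also have a distance) are considered.
    """
    if available_stations is None:
        candidates = list(distances_km)
    else:
        candidates = [sta for sta in available_stations if sta in distances_km]
    dists = np.array([distances_km[sta] for sta in candidates], dtype=float)
    # A stable sort keeps the input order among equidistant stations.
    order = np.argsort(dists, kind="stable")[:n]
    return [candidates[i] for i in order]
```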

pick_PS_phases(duration, threshold_P=0.6, threshold_S=0.6, offset_ot=-10.0, mini_batch_size=126, phase_on_comp={'1': 'S', '2': 'S', 'E': 'S', 'N': 'S', 'Z': 'P'}, component_aliases={'E': ['E', '2'], 'N': ['N', '1'], 'Z': ['Z']}, upsampling=1, downsampling=1, use_apriori_picks=False, search_win_sec=2.0, ml_model=None, ml_model_name='original', keep_probability_time_series=False, phase_probability_time_series=None, **kwargs)[source]

Use PhaseNet (Zhu et al., 2019) to pick P and S waves (Event class).

Parameters:
  • duration (float) – Duration of the time window, in seconds, to process and search for P and S wave arrivals.

  • threshold_P (float, optional) – Threshold on PhaseNet’s probabilities to trigger the identification of a P-wave arrival. (default 0.60)

  • threshold_S (float, optional) – Threshold on PhaseNet’s probabilities to trigger the identification of an S-wave arrival. (default 0.60)

  • offset_ot (float, optional) – Offset in seconds to apply to the origin time. (default cfg.BUFFER_EXTRACTED_EVENTS_SEC)

  • mini_batch_size (int, optional) – Number of traces processed in a single batch by PhaseNet. (default 126)

  • phase_on_comp (dict, optional) – Dictionary defining the seismic phase extracted on each component. For example, phase_on_comp[‘N’] specifies the phase extracted on the north component. (default {“N”: “S”, “1”: “S”, “E”: “S”, “2”: “S”, “Z”: “P”})

  • component_aliases (dict, optional) – Dictionary mapping each component to a list of strings representing aliases used for the same component. For example, component_aliases[‘N’] = [‘N’, ‘1’] means that both the ‘N’ and ‘1’ channels will be mapped to the Event’s ‘N’ channel. (default {“N”: [“N”, “1”], “E”: [“E”, “2”], “Z”: [“Z”]})

  • upsampling (int, optional) – Upsampling factor applied before calling PhaseNet. (default 1)

  • downsampling (int, optional) – Downsampling factor applied before calling PhaseNet. (default 1)

  • use_apriori_picks (bool, optional) – Flag indicating whether to use apriori picks for refining the P and S wave picks. (default False)

  • search_win_sec (float, optional) – Search window size, in seconds, used for refining the P and S wave picks. (default 2.0)

  • ml_model (object, optional) – Pre-trained PhaseNet model object. If not provided, the default model will be loaded. (default None)

  • ml_model_name (str, optional) – Name of the pre-trained PhaseNet model to load if ml_model is not provided. (default “original”)

  • keep_probability_time_series (bool, optional) – If True, the phase probability time series are stored in a new attribute, self.probability_time_series. (default False)

  • phase_probability_time_series (BPMF.dataset.WaveformTransform, optional) – If different from None, the phase probabilities are not re-computed but, instead, are fetched from an existing BPMF.dataset.WaveformTransform instance. (default None)

  • **kwargs (dict, optional) – Extra keyword arguments passed to Event.read_waveforms.

Return type:

None

Notes

  • PhaseNet must be used with 3-component data.

  • Results are stored in the object’s attribute self.picks.
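The following simplified sketch turns a phase probability time series into a pick. It assumes one pick per channel, placed at the probability maximum when that maximum exceeds the threshold (threshold_P or threshold_S). The refinement with a priori picks (use_apriori_picks, search_win_sec) is not reproduced. pick_from_probability is a hypothetical helper:

```python
import numpy as np


def pick_from_probability(prob, sampling_rate, threshold, start_rel_ot=0.0):
    """Return (pick_time, peak_probability); pick_time is NaN below threshold.

    prob: 1-D phase probability time series (e.g. the P or S output of a picker).
    start_rel_ot: time of the first sample relative to the origin time, in s.
    The returned pick time is also relative to the origin time, in s.
    """
    prob = np.asarray(prob, dtype=float)
    i_peak = int(np.argmax(prob))
    peak = float(prob[i_peak])
    if peak < threshold:
        return np.nan, peak
    return start_rel_ot + i_peak / sampling_rate, peak
```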

property pl_vmax_unc
plot(figsize=(20, 15), gain=1000000.0, stations=None, ylabel='Velocity ($\\mu$m/s)', plot_picks=True, plot_predicted_arrivals=True, plot_probabilities=False, **kwargs)[source]

Plot the waveforms of the Event instance.

This function plots the waveforms associated with the Event instance. The waveforms are plotted as subplots, with each subplot representing a station and component combination. The start and end times of the waveforms are determined to set the x-axis limits of each subplot. The picks and theoretical arrival times associated with the Event are overlaid on the waveforms.

Parameters:
  • figsize (tuple, optional) – The figure size in inches, specified as a tuple (width, height). Defaults to (20, 15).

  • gain (float, optional) – The gain factor applied to the waveform data before plotting. Defaults to 1.0e6.

  • stations (list of str, optional) – The list of station names for which to plot the waveforms. If None, all stations associated with the Event are plotted. Defaults to None.

  • ylabel (str, optional) – The label for the y-axis. Defaults to r"Velocity ($\mu$m/s)".

  • **kwargs – Additional keyword arguments that are passed to the matplotlib plot function.

Returns:

fig – The Figure instance produced by this method.

Return type:

matplotlib.figure.Figure

classmethod read_from_file(filename=None, db_path='', hdf5_file=None, gid=None, data_reader=None)[source]

Initialize an Event instance from filename.

Parameters:
  • filename (string, default to None) – Name of the hdf5 file with the event’s data. If None, then hdf5_file should be specified.

  • db_path (string, default to cfg.INPUT_PATH) – Name of the directory where filename is located.

  • gid (string, default to None) – If not None, this string is the hdf5’s group name of the event.

  • hdf5_file (h5py.File, default to None) – If not None, an open file pointing directly at the subfolder of interest.

  • data_reader (function, default to None) – Function that takes a path and optional key-word arguments to read data from this path and returns an obspy.Stream instance. If None, data_reader has to be specified when calling read_waveforms.

Returns:

event – The Event instance defined by the data in filename.

Return type:

Event instance

read_waveforms(duration, phase_on_comp={'1': 'S', '2': 'S', 'E': 'S', 'N': 'S', 'Z': 'P'}, component_aliases={'E': ['E', '2'], 'N': ['N', '1'], 'Z': ['Z']}, offset_phase={'P': 1.0, 'S': 4.0}, time_shifted=True, offset_ot=-10.0, data_reader=None, n_threads=1, **reader_kwargs)[source]

Read waveform data (Event class).

Parameters:
  • duration (float) – Duration of the extracted time windows in seconds.

  • phase_on_comp (dict, optional) – Dictionary defining the seismic phase extracted on each component. For example, phase_on_comp[‘N’] specifies the phase extracted on the north component. (default {“N”: “S”, “1”: “S”, “E”: “S”, “2”: “S”, “Z”: “P”})

  • component_aliases (dict, optional) – Dictionary mapping each component to a list of strings representing aliases used for the same component. For example, component_aliases[‘N’] = [‘N’, ‘1’] means that both the ‘N’ and ‘1’ channels will be mapped to the Event’s ‘N’ channel. (default {“N”: [“N”, “1”], “E”: [“E”, “2”], “Z”: [“Z”]})

  • offset_phase (dict, optional) – Dictionary defining when the time window starts with respect to the pick. A positive offset means the window starts before the pick. Not used if time_shifted is False. (default {“P”: 1.0, “S”: 4.0})

  • time_shifted (bool, optional) – If True (default), the moveouts are used to extract time windows from specific seismic phases. If False, windows are simply extracted with respect to the origin time.

  • offset_ot (float, optional) – Only used if time_shifted is False. Time in seconds taken before origin_time. (default cfg.BUFFER_EXTRACTED_EVENTS_SEC)

  • data_reader (function, optional) – Function that takes a path and optional keyword arguments to read data from this path and returns an obspy.Stream instance. If None (default), this function uses self.data_reader and returns None if self.data_reader=None.

  • n_threads (int, optional) – Number of threads used to parallelize reading. Default is 1 (sequential reading).

  • **reader_kwargs (dict, optional) – Extra keyword arguments passed to the data_reader function.

Return type:

None

Raises:

None

Notes

  • This function populates the self.traces attribute with the waveform data.

  • The duration and component_aliases are stored as attributes in the instance.

  • The phase_on_comp and offset_phase information is stored as auxiliary data in the instance.

  • If reader_kwargs contains the “attach_response” key set to True, traces without instrument response information are removed from self.traces.
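The window start times described by offset_phase, time_shifted and offset_ot can be sketched as below. Times are plain floats in seconds, and moveouts are assumed to be travel times relative to the origin time. extraction_window is a hypothetical helper:

```python
PHASE_ON_COMP = {"N": "S", "1": "S", "E": "S", "2": "S", "Z": "P"}
OFFSET_PHASE = {"P": 1.0, "S": 4.0}


def extraction_window(origin_time, duration, phase_moveouts, component,
                      phase_on_comp=PHASE_ON_COMP, offset_phase=OFFSET_PHASE,
                      time_shifted=True, offset_ot=-10.0):
    """Return (start, end) of the window extracted on one component (hypothetical helper).

    phase_moveouts maps phase -> travel time (s) at the station of interest.
    """
    phase = phase_on_comp[component]
    if time_shifted:
        # A positive offset_phase starts the window before the predicted arrival.
        start = origin_time + phase_moveouts[phase] - offset_phase[phase]
    else:
        # A negative offset_ot starts the window before the origin time.
        start = origin_time + offset_ot
    return start, start + duration
```

With the default dictionaries, a horizontal component is windowed around the S arrival and the vertical component around the P arrival.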

relocate(*args, routine='NLLoc', **kwargs)[source]

Wrapper function for earthquake relocation using multiple methods.

This function serves as an interface for the earthquake relocation process using different relocation routines. The routine to be used is specified by the routine parameter. All keyword arguments are passed to the corresponding routine.

Parameters:

routine (str, optional) –

Method used for the relocation. Available options are:

  • ‘NLLoc’: Calls the relocate_NLLoc method and requires the event object to have the picks attribute.

  • ‘beam’: Calls the relocate_beam method.

Default is ‘NLLoc’.

Notes

  • The relocate function acts as a wrapper that allows for flexibility in choosing the relocation routine.

  • Depending on the specified routine, the function calls the corresponding relocation method with the provided arguments and keyword arguments.

relocate_NLLoc(stations=None, method='EDT', max_epicentral_dist_km_S=None, verbose=0, cleanup_out_dir=True, **kwargs)[source]

Relocate the event using NonLinLoc (NLLoc) based on the provided picks.

Parameters:
  • stations (list of str, optional) – Names of the stations to include in the relocation process. If None, all stations in self.stations are used. Default is None.

  • method (str, optional) – Optimization algorithm used by NonLinLoc. Available options are: ‘GAU_ANALYTIC’, ‘EDT’, ‘EDT_OT’, ‘EDT_OT_WT_ML’. Refer to NonLinLoc’s documentation for more details. Default is ‘EDT’.

  • verbose (int, optional) – Verbosity level of NonLinLoc. If greater than 0, NonLinLoc’s outputs are printed to the standard output. Default is 0.

  • cleanup_out_dir (bool, optional) – If True, NLLoc’s output files are deleted after reading the relevant information. Default is True.

  • **kwargs (dict, optional) – Additional keyword arguments for BPMF.NLLoc_utils.write_NLLoc_control.

Notes

  • The event’s attributes, including the origin time and location, are updated.

  • The theoretical arrival times are attached to the event in Event.arrival_times.

relocate_beam(beamformer, duration=60.0, offset_ot=-10.0, phase_on_comp={'1': 'S', '2': 'S', 'E': 'S', 'N': 'S', 'Z': 'P'}, component_aliases={'E': ['E', '2'], 'N': ['N', '1'], 'Z': ['Z']}, waveform_features=None, uncertainty_method='spatial', restricted_domain_side_km=100.0, device='cpu', **kwargs)[source]

Use a Beamformer instance for backprojection to relocate the event.

Parameters:
  • beamformer (BPMF.template_search.Beamformer) – Beamformer instance used for backprojection.

  • duration (float, optional) – Duration, in seconds, of the extracted time windows. Default is 60.0 seconds.

  • offset_ot (float, optional) – Time in seconds taken before the origin_time. Only used if time_shifted is False. Default is cfg.BUFFER_EXTRACTED_EVENTS_SEC.

  • phase_on_comp (dict, optional) – Dictionary defining which seismic phase is extracted on each component. For example, phase_on_comp[‘N’] gives the phase extracted on the north component. Default is {“N”: “S”, “1”: “S”, “E”: “S”, “2”: “S”, “Z”: “P”}.

  • component_aliases (dict, optional) – Dictionary where each entry is a list of strings. component_aliases[comp] is the list of all aliases used for the same component ‘comp’. For example, component_aliases[‘N’] = [‘N’, ‘1’] means that both the ‘N’ and ‘1’ channels will be mapped to the Event’s ‘N’ channel. Default is {“N”: [“N”, “1”], “E”: [“E”, “2”], “Z”: [“Z”]}.

  • waveform_features (numpy.ndarray, optional) – If not None, it must be a (num_stations, num_channels, num_time_samples) numpy.ndarray containing waveform features or characteristic functions that are backprojected onto the grid of theoretical seismic sources. Default is None.

  • restricted_domain_side_km (float, optional) – The location uncertainties are computed on the full 3D beam at the time when the 4D beam achieves its maximum over the duration seconds. To avoid having grid-size-dependent uncertainties, it is useful to truncate the domain around the location of the maximum beam. This parameter controls the size of the truncated domain. Default is 100.0 km.

  • device (str, optional) – Device to be used for computation. Must be either ‘cpu’ (default) or ‘gpu’.

  • **kwargs (dict, optional) – Additional keyword arguments.

Notes

  • The function updates the origin_time, longitude, latitude, and depth attributes of the event based on the maximum focusing of the beamformer.

  • Location uncertainties are estimated based on the computed likelihood, a restricted domain, and the location coordinates.

  • The estimated uncertainties can be accessed through the Event’s properties Event.hmax_unc, Event.hmin_unc, Event.vmax_unc.
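Backprojection can be illustrated with a generic delay-and-sum beam. This sketch is not the BPMF.template_search.Beamformer implementation. It assumes integer, non-negative moveouts in samples for each candidate source, and it does not compute uncertainties. backproject is a hypothetical helper:

```python
import numpy as np


def backproject(features, moveouts):
    """Shift-and-stack (delay-and-sum) backprojection.

    features: (num_stations, num_samples) waveform features.
    moveouts: (num_sources, num_stations) integer delays in samples (>= 0).
    Returns beam of shape (num_sources, num_times), where beam[s, t] is the
    sum over stations of features[sta, t + moveouts[s, sta]].
    """
    features = np.asarray(features, dtype=float)
    moveouts = np.asarray(moveouts, dtype=int)
    num_sources, num_stations = moveouts.shape
    num_times = features.shape[1] - moveouts.max()
    beam = np.zeros((num_sources, num_times))
    for s in range(num_sources):
        for sta in range(num_stations):
            mv = moveouts[s, sta]
            beam[s] += features[sta, mv:mv + num_times]
    return beam
```

The maximum of the beam over sources and times plays the role of the "maximum focusing" mentioned above: its source index gives the location and its time index gives the origin time.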

remove_distant_stations(max_distance_km=50.0)[source]

Remove picks on stations that are farther than a given distance.

This function removes picks on stations that are located at a distance greater than the specified maximum distance. The distance between the source and each station is computed using the source_receiver_dist attribute. Picks on stations beyond the maximum distance are set to NaN.

Parameters:

max_distance_km (float, optional) – Maximum distance, in kilometers, beyond which picks are considered to be on stations that are too distant. Picks on stations with a distance greater than this threshold will be set to NaN. Default is 50.0.

Notes

  • The function checks if the source_receiver_dist attribute is present in the object. If it is not available, an informative message is printed and the function returns.

  • For each station in stations, the distance between the source and the station is retrieved from the source_receiver_dist attribute.

  • If the distance of a station is greater than the specified maximum distance (max_distance_km), the picks associated with that station are set to NaN.

remove_outlier_picks(max_diff_percent=25.0)[source]

Remove picks that are too far from the predicted arrival times.

Picks whose difference from the predicted arrival time exceeds the specified threshold are considered outliers and are removed.

Parameters:

max_diff_percent (float, optional) – Maximum allowable difference, in percentage, between the picked and predicted arrival times. Picks with a difference greater than this threshold will be considered outliers and removed. Default is 25.0.
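The following sketches the outlier test. It assumes that the percentage is measured relative to the predicted travel time, since the reference is not specified above. It also assumes that removed picks are set to NaN, as in remove_distant_stations. remove_outlier_picks here is a standalone hypothetical function:

```python
import numpy as np


def remove_outlier_picks(picks, predicted, max_diff_percent=25.0):
    """Set to NaN the picks that deviate too much from the predicted times.

    picks, predicted: travel times (s, relative to the origin time), same shape.
    The deviation is expressed as a percentage of the predicted travel time.
    """
    picks = np.array(picks, dtype=float, copy=True)
    predicted = np.asarray(predicted, dtype=float)
    diff_percent = 100.0 * np.abs(picks - predicted) / np.abs(predicted)
    # NaN picks compare False and therefore stay NaN.
    picks[diff_percent > max_diff_percent] = np.nan
    return picks
```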

set_arrival_times_from_moveouts(verbose=1)[source]

Build arrival times assuming at = ot + mv, that is, arrival time = origin time + moveout.
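The following illustrates at = ot + mv with Python's datetime, assuming moveouts in seconds relative to the origin time. The inverse subtraction gives times relative to the origin time. This is the operation described for update_picks and update_travel_times:

```python
from datetime import datetime, timedelta

origin_time = datetime(2012, 7, 26, 1, 0, 0)
# Illustrative moveouts (s) per station and phase.
moveouts_sec = {"STA1": {"P": 2.5, "S": 4.3}, "STA2": {"P": 5.0, "S": 8.7}}

# at = ot + mv
arrival_times = {
    sta: {ph: origin_time + timedelta(seconds=mv) for ph, mv in phases.items()}
    for sta, phases in moveouts_sec.items()
}

# Inverse: absolute times -> times relative to the origin time.
relative_times = {
    sta: {ph: (t - origin_time).total_seconds() for ph, t in phases.items()}
    for sta, phases in arrival_times.items()
}
```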

set_aux_data(aux_data)[source]

Adds any extra data to the Event instance.

Parameters:

aux_data (dictionary) – Dictionary with any auxiliary data.

set_availability(stations=None, components=['N', 'E', 'Z'], component_aliases={'E': ['E', '2'], 'N': ['N', '1'], 'Z': ['Z']})[source]

Set the data availability.

A station is considered available if at least one of its channels has non-zero data. The availability is then accessed via the property self.availability.

Parameters:

stations (list of strings or numpy.ndarray, default to None) – Names of the stations on which we check availability. If None, use self.stations.

set_component_aliases(component_aliases)[source]

Set or modify the component_aliases attribute.

Parameters:

component_aliases (Dictionary) – Each entry of the dictionary is a list of strings. component_aliases[comp] is the list of all aliases used for the same component ‘comp’. For example, component_aliases[‘N’] = [‘N’, ‘1’] means that both the ‘N’ and ‘1’ channels will be mapped to the Event’s ‘N’ channel.
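The following sketches how component_aliases can map channel codes to the Event's components. It assumes that the alias is matched against the last character of a SEED channel code, which is the orientation code (e.g. '1' in 'HH1'). canonical_component is a hypothetical helper:

```python
DEFAULT_ALIASES = {"N": ["N", "1"], "E": ["E", "2"], "Z": ["Z"]}


def canonical_component(channel_code, component_aliases=DEFAULT_ALIASES):
    """Map a SEED channel code (e.g. 'HH1') to the Event's component name.

    The last character of a SEED channel code is the orientation code.
    Returns None if the orientation matches no alias.
    """
    orientation = channel_code[-1]
    for comp, aliases in component_aliases.items():
        if orientation in aliases:
            return comp
    return None
```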

set_components(components)[source]

Set the list of components.

Parameters:

components (list of strings) – The names of the components on which the instance will work after this call to self.set_components.

set_moveouts_to_empirical_times()[source]

Set moveouts equal to picks, if available.

set_moveouts_to_theoretical_times()[source]

Set moveouts equal to theoretical arrival times, if available.

set_source_receiver_dist(network)[source]

Set source-receiver distances using the provided network.

This function calculates the hypocentral and epicentral distances between the event’s source location and the stations in the given network. It stores the distances as attributes of the event object.

Parameters:

network (dataset.Network) – The Network instance containing the station coordinates to use in the source-receiver distance calculation.

Notes

  • The function uses the BPMF.utils.compute_distances function to compute both hypocentral and epicentral distances between the event’s source location and the stations in the network.

  • The computed distances can be accessed with the Event’s properties Event.source_receiver_dist and Event.source_receiver_epicentral_dist.
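The following sketches the two distances under common assumptions: a spherical Earth of radius 6371 km, a haversine epicentral distance and a straight-line hypocentral distance. BPMF.utils.compute_distances may use a different formulation. source_receiver_distances is a hypothetical helper:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def source_receiver_distances(src_lon, src_lat, src_depth_km,
                              sta_lon, sta_lat, sta_elev_km):
    """Epicentral (great-circle) and hypocentral distances in km.

    src_depth_km is positive downward; sta_elev_km is positive upward.
    """
    lon1, lat1 = np.radians(src_lon), np.radians(src_lat)
    lon2 = np.radians(np.asarray(sta_lon, dtype=float))
    lat2 = np.radians(np.asarray(sta_lat, dtype=float))
    # Haversine formula for the great-circle distance.
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    epicentral = 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    # Vertical separation: source depth plus station elevation.
    vertical = src_depth_km + np.asarray(sta_elev_km, dtype=float)
    hypocentral = np.sqrt(epicentral ** 2 + vertical ** 2)
    return epicentral, hypocentral
```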

property snr
property source_receiver_dist
property source_receiver_epicentral_dist
property sr
trim_waveforms(starttime=None, endtime=None)[source]

Trim waveforms to a specified time window.

This function trims the waveforms in the event’s traces attribute to the specified time window. It ensures that all traces have the same start time by adjusting the start time if necessary.

Parameters:
  • starttime (str or datetime, optional) – The start time of the desired time window. If None, the event’s date attribute is used as the start time. Default is None.

  • endtime (str or datetime, optional) – The end time of the desired time window. If None, the end time is set as self.date + self.duration, where self.date is the event’s date attribute and self.duration is the event’s duration attribute. Default is None.

update_aux_data_database(overwrite=False)[source]

Update the auxiliary data in the database with new elements from self.aux_data.

This function adds new elements from the self.aux_data attribute to the database located at path_database. If the element already exists in the database and overwrite is False, it skips the element. If overwrite is True, it overwrites the existing data in the database with the new data.

Parameters:

overwrite (bool, optional) – If True, overwrite existing data in the database. If False (default), skip existing elements and do not modify the database.

Notes

  • If any error occurs during the process, the function removes the lock file and raises the exception.

Raises:

Exception – If an error occurs while updating the database, the function raises the exception after removing the lock file.

update_picks()[source]

Update the picks with respect to the current origin time.

This function updates the picks of each station with respect to the current origin time of the event. It computes the relative pick times by subtracting the origin time from the absolute pick times and updates the picks attribute of the event.

update_travel_times()[source]

Update the travel times with respect to the current origin time.

This function updates the travel times of each station and phase with respect to the current origin time of the event. It adjusts the propagation times by subtracting the origin time from the absolute times and updates the arrival_times attribute of the event.

property vmax_unc
property waveforms_arr

Return the traces as a numpy.ndarray.

write(db_filename, db_path='', save_waveforms=False, gid=None, hdf5_file=None)[source]

Write the event information to an HDF5 file.

This function writes the event information, including waveform data if specified, to an HDF5 file. The event information is stored in a specific group within the file. The function uses the h5py module to interact with the HDF5 file.

Parameters:
  • db_filename (str) – Name of the HDF5 file storing the event information.

  • db_path (str, optional) – Path to the directory where the HDF5 file is located. Defaults to cfg.OUTPUT_PATH.

  • save_waveforms (bool, optional) – If True, save the waveform data associated with the event in the HDF5 file. Defaults to False.

  • gid (str, optional) – Name of the HDF5 group where the event will be stored. If gid is None, the event is stored directly at the root of the HDF5 file. Defaults to None.

  • hdf5_file (h5py.File, optional) – An opened HDF5 file object pointing directly to the subfolder of interest. If provided, the db_path parameter is ignored. Defaults to None.

Notes

  • The function uses the read_write_waiting_list function from the utils module to handle the writing operation in a thread-safe manner. The actual writing is performed by the _write method, which is called with the appropriate parameters.

  • The read_write_waiting_list function handles the synchronization and queueing of write operations to avoid conflicts when multiple processes try to write to the same file simultaneously. The function waits for a lock before executing the func partial function, which performs the actual writing operation. However, the queueing is still not bullet-proof.
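The lock-file idea can be sketched with an atomic file creation (os.O_CREAT | os.O_EXCL). This file_lock helper is hypothetical and is not the read_write_waiting_list implementation. A lock file left behind by a crashed process blocks other writers until someone removes it, which is one way such schemes fall short of being bullet-proof:

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path, poll_sec=0.1, timeout_sec=60.0):
    """Exclusive lock implemented with an atomic lock-file creation (hypothetical)."""
    deadline = time.monotonic() + timeout_sec
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_sec)
    try:
        yield
    finally:
        # Always release the lock, even if the write fails.
        os.close(fd)
        os.remove(lock_path)


# Usage sketch: serialize writes to a shared file (stand-in for the HDF5 write).
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "events.txt")
    lock_path = target + ".lock"
    with file_lock(lock_path):
        lock_was_held = os.path.exists(lock_path)
        with open(target, "w") as f:
            f.write("event data")
    lock_released = not os.path.exists(lock_path)
```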

zero_out_clipped_waveforms(kurtosis_threshold=-1.0)[source]

Find waveforms with anomalous statistics and zero them out.

This function identifies waveforms with a kurtosis value below the specified threshold as anomalous and zeros out their data. The kurtosis of a waveform describes the shape of its amplitude distribution; in its excess (Fisher) form, a value of 0 indicates a Gaussian distribution.

Parameters:

kurtosis_threshold (float, optional) – Threshold below which the kurtosis is considered anomalous. Waveforms with a kurtosis value lower than this threshold will have their data zeroed out. Default is -1.0.

Notes

  • This is an oversimplified technique to find clipped waveforms.
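A minimal sketch of the kurtosis test, operating on a plain NumPy array of shape (n_traces, n_samples) rather than on the Event's traces. The helper zero_out_low_kurtosis is hypothetical and is not BPMF's implementation; it relies on SciPy's Fisher (excess) kurtosis, which is 0 for a Gaussian distribution.

```python
import numpy as np
from scipy.stats import kurtosis


def zero_out_low_kurtosis(traces, kurtosis_threshold=-1.0):
    """Zero out rows of `traces` whose excess kurtosis is below the threshold.

    Hypothetical helper: `traces` is a 2D array of shape (n_traces, n_samples).
    Returns the filtered copy and a boolean mask of the zeroed rows.
    """
    traces = np.array(traces, dtype=float)  # np.array copies by default
    # Fisher definition: excess kurtosis, equal to 0 for a Gaussian distribution
    k = kurtosis(traces, axis=-1, fisher=True)
    anomalous = k < kurtosis_threshold
    traces[anomalous] = 0.0
    return traces, anomalous


rng = np.random.default_rng(0)
noise = rng.standard_normal(5000)
# Simulate severe clipping: most samples saturate at +/-0.1
clipped = np.clip(noise, -0.1, 0.1)
filtered, flags = zero_out_low_kurtosis(np.stack([noise, clipped]))
```

Since a pure sinusoid also has an excess kurtosis of -1.5, strongly monochromatic traces can be flagged by this test as well, which is one reason the technique is oversimplified.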

class BPMF.dataset.EventGroup(events, network)[source]

Bases: Family

A class for a group of events.

Each event is represented by a dataset.Event instance.

events

List of dataset.Event instances constituting the group.

Type:

list

network

The Network instance used to query consistent data across all events.

Type:

dataset.Network

SVDWF_stack(freqmin, freqmax, sampling_rate, expl_var=0.4, max_singular_values=5, wiener_filter_colsize=None)[source]

Apply Singular Value Decomposition Waveform Filtering (SVDWF) and stack the waveforms.

Parameters:
  • freqmin (float) – Minimum frequency in Hz for the bandpass filter.

  • freqmax (float) – Maximum frequency in Hz for the bandpass filter.

  • sampling_rate (float) – Sampling rate in Hz of the waveforms.

  • expl_var (float, default 0.4) – Explained variance ratio threshold for retaining singular values.

  • max_singular_values (int, default 5) – Maximum number of singular values to retain during SVDWF.

  • wiener_filter_colsize (int, default None) – Size of the column blocks used for the Wiener filter.

Return type:

None

Notes

See: Moreau, L., Stehly, L., Boué, P., Lu, Y., Larose, E., & Campillo, M. (2017). Improving ambient noise correlation functions with an SVD-based Wiener filter. Geophysical Journal International, 211(1), 418-426.
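The sketch below illustrates only the SVD truncation and stacking steps on an (n_events, n_samples) array; it omits the bandpass filter and the Wiener filter of Moreau et al. (2017). Interpreting expl_var as a threshold on the cumulative fraction of squared singular values is an assumption, and svd_truncate_and_stack is a hypothetical helper, not BPMF's implementation.

```python
import numpy as np


def svd_truncate_and_stack(waveforms, expl_var=0.4, max_singular_values=5):
    """Low-rank filter of an (n_events, n_samples) array, then average stack.

    Hypothetical helper: keeps the smallest number of singular values whose
    cumulative explained variance reaches `expl_var`, capped at
    `max_singular_values`.
    """
    U, s, Vt = np.linalg.svd(waveforms, full_matrices=False)
    # Fraction of the total energy carried by each singular value
    var_ratio = s**2 / np.sum(s**2)
    n_keep = int(np.searchsorted(np.cumsum(var_ratio), expl_var)) + 1
    n_keep = min(n_keep, max_singular_values, len(s))
    # Reconstruct the waveforms from the retained singular vectors only
    filtered = (U[:, :n_keep] * s[:n_keep]) @ Vt[:n_keep]
    return filtered.mean(axis=0), filtered, n_keep


rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 500)
# 20 events sharing the same waveform with different amplitudes, plus noise
clean = rng.uniform(0.5, 1.5, size=(20, 1)) * np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
stack, filtered, n_keep = svd_truncate_and_stack(noisy)
```

Because the synthetic events share a single waveform shape, the first singular value alone exceeds the 0.4 threshold, and the remaining components, dominated by noise, are discarded before stacking.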

property n_events
read_waveforms(duration, time_shifted=False, progress=False, **kwargs)[source]

Call dataset.Event.read_waveforms on each event.

Parameters:
  • duration (float) – Duration of the waveform to read, in seconds.

  • time_shifted (bool, default False) – Whether to apply time shifting to the waveforms.

  • progress (bool, default False) – Whether to display progress information during waveform reading.

  • **kwargs (dict) – Additional keyword arguments to pass to dataset.Event.read_waveforms.

Return type:

None

class BPMF.dataset.Family[source]

Bases: ABC

An abstract base class shared by several subclasses, such as EventGroup and TemplateGroup.

property components
get_moveouts_arr()[source]
get_waveforms_arr()[source]
property moveouts_arr
normalize(method='rms')[source]

Normalize the template waveforms.

Parameters:

method (string, default to 'rms') – Either ‘rms’ (default) or ‘max’.
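A minimal NumPy sketch of the two normalization options, applied along the last (time) axis of a waveform array. Reading 'max' as the maximum absolute amplitude and leaving all-zero traces untouched are assumptions, and normalize_waveforms is a hypothetical helper.

```python
import numpy as np


def normalize_waveforms(waveforms, method="rms"):
    """Normalize each trace of `waveforms` along its last (time) axis."""
    waveforms = np.asarray(waveforms, dtype=float)
    if method == "rms":
        norm = np.sqrt(np.mean(waveforms**2, axis=-1, keepdims=True))
    elif method == "max":
        norm = np.max(np.abs(waveforms), axis=-1, keepdims=True)
    else:
        raise ValueError("method should be either 'rms' or 'max'")
    # Leave all-zero traces (e.g. missing data) unchanged instead of dividing by 0
    norm = np.where(norm == 0.0, 1.0, norm)
    return waveforms / norm


wf = np.array([[3.0, -4.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]])
out = normalize_waveforms(wf, method="max")  # first row becomes [0.75, -1.0, 0.0, 0.25]
```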

abstract read_waveforms()[source]
set_network(network)[source]

Update self.network to the new desired network.

Parameters:

network (dataset.Network instance) – The Network instance used to query consistent data across all templates.

set_source_receiver_dist()[source]

Compute the source-receiver distances for each template.

property stations
property waveforms_arr
class BPMF.dataset.Network(network_file)[source]

Bases: object

Station metadata.


box(lat_min, lat_max, lon_min, lon_max)[source]

Geographical selection of sub-network.

Parameters:
  • lat_min (scalar, float) – Minimum latitude of the box.

  • lat_max (scalar, float) – Maximum latitude of the box.

  • lon_min (scalar, float) – Minimum longitude of the box.

  • lon_max (scalar, float) – Maximum longitude of the box.

Returns:

subnet – The Network instance restricted to the relevant stations.

Return type:

Network instance
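A minimal pandas sketch of the box selection, assuming station coordinates are held in a DataFrame with 'latitude' and 'longitude' columns and that the bounds are inclusive; box_select and the station names are hypothetical. Note that Network.box takes latitudes first, whereas TemplateGroup.box takes longitudes first.

```python
import pandas as pd


def box_select(stations, lat_min, lat_max, lon_min, lon_max):
    """Return the rows of `stations` located inside the lat/lon box (bounds included)."""
    inside = (
        stations["latitude"].between(lat_min, lat_max)
        & stations["longitude"].between(lon_min, lon_max)
    )
    return stations[inside]


stations = pd.DataFrame(
    {"latitude": [40.5, 41.2, 39.8], "longitude": [29.1, 30.4, 29.6]},
    index=["STA1", "STA2", "STA3"],
)
subnet = box_select(stations, lat_min=40.0, lat_max=41.0, lon_min=29.0, lon_max=30.0)
```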

datelist()[source]
property depth
property elevation
property interstation_distances

Compute the distance between all station pairs.

Returns:

A pandas DataFrame containing the interstation distances.

The rows and columns represent the station names, and the values represent the distances in kilometers.

Return type:

DataFrame
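To illustrate the structure of the returned table, the sketch below builds a station-by-station DataFrame of great-circle (haversine) distances on a spherical Earth of radius 6371 km, ignoring elevation. The exact distance formula used by BPMF is not stated here, so this choice is an assumption; interstation_distances is a hypothetical stand-alone function.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def interstation_distances(names, latitudes, longitudes):
    """Pairwise haversine distances, in km, as a DataFrame indexed by station names."""
    lat = np.radians(np.asarray(latitudes, dtype=float))
    lon = np.radians(np.asarray(longitudes, dtype=float))
    # Broadcast to (n_stations, n_stations) coordinate differences
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (
        np.sin(dlat / 2.0) ** 2
        + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2.0) ** 2
    )
    dist = 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    return pd.DataFrame(dist, index=names, columns=names)


# Two hypothetical stations on the equator, 1 degree of longitude apart
df = interstation_distances(["A", "B"], [0.0, 0.0], [0.0, 1.0])
```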

property latitude
property longitude
property n_components
property n_stations
property networks
plot_map(ax=None, figsize=(20, 10), **kwargs)[source]

Plot stations on map.

Parameters:
  • ax (plt.Axes, default to None) – If None, create a new plt.Figure and plt.Axes instances. If specified by the user, use the provided instance to plot.

  • figsize (tuple of floats, default to (20, 10)) – Size, in inches, of the figure (width, height).

Returns:

fig – The map with seismic stations.

Return type:

plt.Figure

read()[source]

Read the metadata from the file at self.where.

Note: This function can be modified to match the user’s data convention.

property station_indexes
property stations
stations_idx(stations)[source]
subset(stations, components, method='keep')[source]

Build a subnetwork by keeping or discarding the given stations and components.

Parameters:
  • stations (list or array of strings) – Stations to keep or discard, depending on the method.

  • components (list or array of strings) – Components to keep or discard, depending on the method.

  • method (string, default to 'keep') – Should be ‘keep’ or ‘discard’. If ‘keep’, the stations and components provided to this function are what will be left in the subnetwork. If ‘discard’, the stations and components provided to this function are what won’t be featured in the subnetwork.

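Returns:

subnetwork – The Network instance restricted to the selected stations and components.

Return type:

Network instance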
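The keep/discard semantics can be illustrated on plain lists. The helper select_items is hypothetical and ignores the station metadata that a real Network instance carries.

```python
def select_items(all_items, items, method="keep"):
    """Return the elements of `all_items` retained by the 'keep' or 'discard' rule."""
    if method == "keep":
        return [x for x in all_items if x in items]
    if method == "discard":
        return [x for x in all_items if x not in items]
    raise ValueError("method should be either 'keep' or 'discard'")


stations = ["STA1", "STA2", "STA3"]
components = ["N", "E", "Z"]
print(select_items(stations, ["STA2"], method="discard"))  # ['STA1', 'STA3']
print(select_items(components, ["Z"], method="keep"))  # ['Z']
```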

class BPMF.dataset.Stack(stacked_traces, moveouts, stations, phases, latitude=None, longitude=None, depth=None, component_aliases={'E': ['E', '2'], 'N': ['N', '1'], 'Z': ['Z']}, sampling_rate=None, components=['N', 'E', 'Z'], aux_data={}, id=None, filtered_data=None)[source]

Bases: Event

A modification of the Event class for stacked events.

pick_PS_phases_family_mode(duration, threshold_P=0.6, threshold_S=0.6, mini_batch_size=126, phase_on_comp={'1': 'S', '2': 'S', 'E': 'S', 'N': 'S', 'Z': 'P'}, upsampling=1, downsampling=1, ml_model_name='original', ml_model=None, **kwargs)[source]

Use PhaseNet (Zhu et al., 2019) to pick P and S waves.

This method picks P- and S-wave arrivals on every 3-component seismogram in self.filtered_data. As a result, several picks may be found on each station, and all of them are returned in self.picks.

Parameters:
  • duration (float) – Duration, in seconds, of the time window to process to search for P and S wave arrivals.

  • threshold_P (float, optional) – Threshold on PhaseNet’s probabilities to trigger the identification of a P-wave arrival. Default: 0.60

  • threshold_S (float, optional) – Threshold on PhaseNet’s probabilities to trigger the identification of an S-wave arrival. Default: 0.60

  • mini_batch_size (int, optional) – Number of traces processed in a single batch by PhaseNet. This shouldn’t have to be tuned. Default: 126

  • phase_on_comp (dict, optional) – Dictionary defining which seismic phase is extracted on each component. For example, phase_on_comp[‘N’] gives the phase that is extracted on the north component.

  • upsampling (int, optional) – Upsampling factor applied before calling PhaseNet. Default: 1

  • downsampling (int, optional) – Downsampling factor applied before calling PhaseNet. Default: 1

Return type:

None

Notes

  • PhaseNet must be used with 3-comp data.

  • If self.filtered_data does not exist, self.pick_PS_phases is used on the stacked traces.
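The role of threshold_P and threshold_S can be sketched with SciPy's find_peaks: a pick is declared at each local maximum of a phase-probability time series that exceeds the threshold. The probability curve below is synthetic and PhaseNet itself is not run; whether BPMF selects picks exactly this way is an assumption, and picks_from_probability is a hypothetical helper.

```python
import numpy as np
from scipy.signal import find_peaks


def picks_from_probability(prob, sampling_rate, threshold=0.6):
    """Return pick times (s) and peak probabilities where `prob` exceeds `threshold`."""
    peaks, props = find_peaks(prob, height=threshold)
    return peaks / sampling_rate, props["peak_heights"]


sampling_rate = 100.0  # Hz
t = np.arange(0.0, 30.0, 1.0 / sampling_rate)
# Synthetic P-wave probability with two bumps; only the first exceeds 0.6
prob_P = 0.9 * np.exp(-0.5 * ((t - 5.0) / 0.1) ** 2) + 0.4 * np.exp(
    -0.5 * ((t - 20.0) / 0.1) ** 2
)
times, heights = picks_from_probability(prob_P, sampling_rate, threshold=0.6)
```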

read_waveforms(duration, phase_on_comp={'1': 'S', '2': 'S', 'E': 'S', 'N': 'S', 'Z': 'P'}, offset_phase={'P': 1.0, 'S': 4.0}, time_shifted=True, offset_ot=-10.0)[source]

Read waveform data.

Parameters:
  • duration (float) – Duration, in seconds, of the extracted time windows.

  • phase_on_comp (dict, optional) – Dictionary defining which seismic phase is extracted on each component. For example, phase_on_comp[‘N’] gives the phase that is extracted on the north component.

  • offset_phase (dict, optional) – Dictionary defining when the time window starts with respect to the pick. A positive offset means the window starts before the pick. Not used if time_shifted is False.

  • time_shifted (bool, optional) – If True, the moveouts are used to extract time windows from specific seismic phases. If False, windows are simply extracted with respect to the origin time.

  • offset_ot (float, optional) – Only used if time_shifted is False. Time, in seconds, taken before origin_time.

Return type:

None

Notes

The waveforms are read from self.stacked_waveforms and are formatted as obspy.Stream instances that populate the self.traces attribute.
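Following the offset_phase convention described above (a positive offset starts the window before the pick), the start of a phase-specific window when time_shifted is True can be computed as origin time + moveout - offset. The sketch uses Python's datetime instead of obspy.UTCDateTime to stay self-contained; the nested moveouts dictionary and the station name are hypothetical.

```python
from datetime import datetime, timedelta

PHASE_ON_COMP = {"N": "S", "1": "S", "E": "S", "2": "S", "Z": "P"}
OFFSET_PHASE = {"P": 1.0, "S": 4.0}


def window_start(origin_time, moveouts, station, component):
    """Start of the time window on `station`/`component` when time_shifted=True."""
    phase = PHASE_ON_COMP[component]
    # A positive offset makes the window start before the phase arrival
    shift = moveouts[station][phase] - OFFSET_PHASE[phase]
    return origin_time + timedelta(seconds=shift)


origin_time = datetime(2020, 1, 1, 0, 0, 0)
moveouts = {"STA1": {"P": 2.5, "S": 4.4}}  # hypothetical travel times, in seconds
start_Z = window_start(origin_time, moveouts, "STA1", "Z")  # 1.5 s after origin time
start_N = window_start(origin_time, moveouts, "STA1", "N")  # 0.4 s after origin time
```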

class BPMF.dataset.Template(origin_time, moveouts, stations, phases, template_filename, template_path, latitude=None, longitude=None, depth=None, sampling_rate=None, components=['N', 'E', 'Z'], id=None)[source]

Bases: Event

A class for template events.

The Template class is a subclass of the Event class that is specifically designed for template events. It inherits all the attributes and methods of the Event class and adds functionality specific to template events.

Parameters:
  • origin_time (str or datetime) – The origin time of the template event. Can be specified as a string in ISO 8601 format or as a datetime object.

  • moveouts (pandas.DataFrame) – The moveout table containing the travel time information for the template event. The moveouts table should have the following columns: ‘Phase’, ‘Station’, ‘Distance’, and ‘TravelTime’.

  • stations (list of str) – The list of station names associated with the template event.

  • phases (list of str) – The list of phase names associated with the template event.

  • template_filename (str) – The filename of the template waveform file.

  • template_path (str) – The path to the directory containing the template waveform file.

  • latitude (float or None, optional) – The latitude of the template event in decimal degrees. If None, latitude information is not provided. Defaults to None.

  • longitude (float or None, optional) – The longitude of the template event in decimal degrees. If None, longitude information is not provided. Defaults to None.

  • depth (float or None, optional) – The depth of the template event in kilometers. If None, depth information is not provided. Defaults to None.

  • sampling_rate (float or None, optional) – The sampling rate of the template waveforms in Hz. If None, the sampling rate is not provided. Defaults to None.

  • components (list of str, optional) – The list of component names associated with the template waveforms. Defaults to [‘N’, ‘E’, ‘Z’].

  • id (str or None, optional) – The ID of the template event. If None, no ID is assigned. Defaults to None.

distance(longitude, latitude, depth)[source]

Compute the distance between the template and a given location.

This function calculates the distance between the longitude, latitude, and depth of the template and the longitude, latitude, and depth of a target location using the two_point_distance function from the utils module.

Parameters:
  • longitude (float) – The longitude of the target location.

  • latitude (float) – The latitude of the target location.

  • depth (float) – The depth of the target location in kilometers.

Returns:

distance – The distance, in kilometers, between the template and the target location.

Return type:

float
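One way to compute such a 3D distance is to convert each point to Cartesian coordinates on a spherical Earth (radius 6371 km minus depth) and take the straight-line distance. This is an assumption for illustration only, since the formula used by utils.two_point_distance is not documented here; to_cartesian and straight_line_distance are hypothetical helpers.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def to_cartesian(longitude, latitude, depth):
    """Cartesian coordinates (km) of a point `depth` km below a spherical Earth's surface."""
    r = EARTH_RADIUS_KM - depth
    lon, lat = np.radians(longitude), np.radians(latitude)
    return np.array(
        [r * np.cos(lat) * np.cos(lon), r * np.cos(lat) * np.sin(lon), r * np.sin(lat)]
    )


def straight_line_distance(lon1, lat1, depth1, lon2, lat2, depth2):
    """Straight-line distance, in km, between two subsurface points."""
    diff = to_cartesian(lon1, lat1, depth1) - to_cartesian(lon2, lat2, depth2)
    return float(np.linalg.norm(diff))


d = straight_line_distance(30.0, 40.0, 5.0, 30.0, 40.0, 12.0)  # same epicenter: 7 km
```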

find_monochromatic_traces(autocorr_peak_threshold=0.33, num_peaks_criterion=5, taper_pct=5.0, max_lag_samp=None, verbose=True)[source]

Detect traces in seismic waveforms with dominant monochromatic signals.

Parameters:
  • autocorr_peak_threshold (float, optional) – Threshold for peak detection in the autocorrelation function. Default is 0.33.

  • num_peaks_criterion (int, optional) – Minimum number of peaks required above the threshold for a trace to be considered monochromatic. Default is 5.

  • taper_pct (float, optional) – Percentage of the signal to taper at both ends using a Tukey window function. Default is 5.0.

  • max_lag_samp (int or None, optional) – Maximum lag, in samples, to consider in the autocorrelation function. If None, the entire autocorrelation is used. Default is None.

  • verbose (bool, optional) – If True, displays a warning about the experimental nature of the method. Default is True.

Returns:

  • monochromatic (pandas.DataFrame) – A DataFrame indicating whether each trace is monochromatic or not. Rows represent stations and columns represent components.

  • num_peaks_above_threshold (pandas.DataFrame) – A DataFrame containing the number of peaks above the threshold for each trace.
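The autocorrelation criterion can be sketched for a single trace: taper, autocorrelate, normalize by the zero-lag value, then count the peaks above autocorr_peak_threshold at positive lags. Interpreting taper_pct as the percentage tapered at each end (hence a Tukey alpha of 2 * taper_pct / 100) is an assumption, and both helpers are hypothetical rather than BPMF's implementation.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.signal.windows import tukey


def count_autocorr_peaks(trace, autocorr_peak_threshold=0.33, taper_pct=5.0, max_lag_samp=None):
    """Number of autocorrelation peaks, at positive lags, above the threshold."""
    x = np.asarray(trace, dtype=float)
    # Tukey window tapering `taper_pct` percent of the samples at each end
    x = (x - x.mean()) * tukey(len(x), alpha=2.0 * taper_pct / 100.0)
    ac = np.correlate(x, x, mode="full")[len(x) - 1 :]  # lags 0, 1, ..., len(x) - 1
    ac = ac / ac[0]  # normalize so that the zero-lag value is 1
    if max_lag_samp is not None:
        ac = ac[: max_lag_samp + 1]
    peaks, _ = find_peaks(ac, height=autocorr_peak_threshold)
    return len(peaks)


def is_monochromatic(trace, num_peaks_criterion=5, **kwargs):
    """True if at least `num_peaks_criterion` autocorrelation peaks exceed the threshold."""
    return count_autocorr_peaks(trace, **kwargs) >= num_peaks_criterion


rng = np.random.default_rng(2)
n = np.arange(2000)
periodic = is_monochromatic(np.sin(2 * np.pi * n / 50))  # periodic autocorrelation
white_noise = is_monochromatic(rng.standard_normal(2000))  # autocorrelation near 0 off lag 0
```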

classmethod init_from_event(event, attach_waveforms=True)[source]

Instantiate a Template object from an Event object.

This class method creates a Template object based on an existing Event object. It converts the Event object into a Template object by transferring the relevant attributes and data.

Parameters:
  • event (Event instance) – The Event instance to convert to a Template instance.

  • attach_waveforms (boolean, optional) – Specifies whether to attach the waveform data to the Template instance. If True, the waveform data is attached. If False, the waveform data is not attached. Defaults to True.

Returns:

template – The Template instance based on the provided Event instance.

Return type:

Template instance

property moveouts_arr

Return a moveout array given self.components and phase_on_comp.

property moveouts_win
plot_detection(idx, filename=None, db_path=None, duration=60.0, phase_on_comp={'1': 'S', '2': 'S', 'E': 'S', 'N': 'S', 'Z': 'P'}, offset_ot=10.0, **kwargs)[source]

Plot the idx-th detection made with this template.

Parameters:
  • idx (int) – Index of the detection to plot.

  • filename (str, optional) – Name of the detection file. If None, use the standard file and folder naming convention. Defaults to None.

  • db_path (str, optional) – Name of the directory where the detection file is located. If None, use the standard file and folder naming convention. Defaults to None.

  • duration (float, optional) – Duration of the waveforms to plot, in seconds. Defaults to 60.0.

  • phase_on_comp (dict, optional) – Dictionary specifying the phase associated with each component. The keys are component codes, and the values are phase codes. Defaults to {“N”: “S”, “1”: “S”, “E”: “S”, “2”: “S”, “Z”: “P”}.

  • offset_ot (float, optional) – Offset in seconds to apply to the origin time of the event when retrieving the waveforms. Defaults to 10.0.

  • **kwargs – Additional keyword arguments to be passed to the Event.plot method.

Returns:

fig – Figure instance produced by this method.

Return type:

matplotlib.figure.Figure

plot_recurrence_times(ax=None, annotate_axes=True, unique=False, figsize=(20, 10), **kwargs)[source]

Plot recurrence times vs detection times.

Parameters:
  • ax (plt.Axes, optional) – If not None, use this plt.Axes instance to plot the data.

  • annotate_axes (bool, optional) – Whether to annotate the axes with labels. Defaults to True.

  • figsize (tuple, optional) – Figure size (width, height) in inches. Defaults to (20, 10).

  • **kwargs – Additional keyword arguments to be passed to the plt.plot function.

Returns:

fig – Figure instance produced by this method.

Return type:

matplotlib.figure.Figure

read_catalog(filename=None, db_path=None, gid=None, extra_attributes=[], fill_value=nan, return_events=False, check_summary_file=True, compute_return_times=True)[source]

Build a Catalog instance from detection data.

This function builds a Catalog instance by reading detection data from a file specified by filename and db_path, or by checking for an existing summary file. It supports reading additional attributes specified by extra_attributes and provides options to control the behavior of filling missing values with fill_value and returning a list of Event instances with return_events.

Parameters:
  • filename (str, optional) – Name of the detection file. If None, the standard file and folder naming convention will be used. Defaults to None.

  • db_path (str, optional) – Name of the directory where the detection file is located. If None, the standard file and folder naming convention will be used. Defaults to None.

  • gid (str, int, or float, optional) – If not None, this is the HDF5 group where the data will be read from. Defaults to None.

  • extra_attributes (list of str, optional) – Additional attributes to read in addition to the default attributes (‘longitude’, ‘latitude’, ‘depth’, and ‘origin_time’). Defaults to an empty list.

  • fill_value (str, int, or float, optional) – Default value to fill missing target attributes. Defaults to np.nan.

  • return_events (bool, optional) – If True, a list of Event instances will be returned. This can only be True if check_summary_file is set to False. Defaults to False.

  • check_summary_file (bool, optional) – If True, it checks if the summary HDF5 file already exists and reads from it using the standard naming convention. If False, it builds the catalog from the detection output. Defaults to True.

  • compute_return_times (bool, optional) – If True, it computes inter-event times. Because events detected with the same template are co-located, these inter-event times can be called return times. Defaults to True.

Notes

  • If return_events is set to True, check_summary_file must be set to False.

  • If check_summary_file is True, read_catalog first checks whether the summary file exists.

  • The resulting Catalog instance is assigned to self.catalog, and additional attributes such as ‘tid’ and ‘event_id’ are set accordingly.
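Return times can be sketched with pandas as the differences between consecutive origin times of a single template's detections. Assigning each difference to the later event (and NaN to the first one) is an assumption about the convention, and the return_time column name is hypothetical.

```python
import pandas as pd

origin_times = pd.to_datetime(
    ["2020-01-01 00:00:10", "2020-01-01 00:00:00", "2020-01-01 00:01:10"]
)
catalog = pd.DataFrame({"origin_time": origin_times}).sort_values("origin_time")
# Seconds elapsed since the previous detection of the same template
catalog["return_time"] = catalog["origin_time"].diff().dt.total_seconds()
```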

classmethod read_from_file(filename, db_path='', gid=None)[source]

Initialize a Template instance from a file.

This class method reads a file and initializes a Template instance based on the content of the file. It utilizes the Event class method read_from_file to read the file and convert it into an Event instance. The Event instance is then converted into a Template instance using the init_from_event class method.

Parameters:
  • filename (str) – The name of the file to read and initialize the Template instance.

  • db_path (str, optional) – The path to the directory containing the file. Defaults to cfg.INPUT_PATH.

  • gid (str, optional) – The name of the HDF5 group where the template data are stored. Defaults to None.

Returns:

template – The Template instance initialized from the specified file.

Return type:

Template instance

read_waveforms(stations=None, components=None)[source]

Read the waveforms time series.

This function reads the waveforms from the stored data file and initializes the self.traces attribute with the waveform data.

Parameters:
  • stations (list of str, default None) – The list of station names for which to read the waveforms. If None, all stations in self.stations will be read.

  • components (list of str, default None) – The list of component names for which to read the waveforms. If None, all components in self.components will be read.

Notes

  • The waveform data is stored in the self.traces attribute as an obspy.Stream object.

  • The starttime of each trace is set based on the origin time of the event and the moveout values stored in the moveouts_win attribute.

  • The set_availability method is called to update the availability of the waveforms for the selected stations.

property template_idx
property tid
write(db_filename, db_path='', save_waveforms=True, gid=None, overwrite=False)[source]

Write the Template instance to an HDF5 file.

This function writes the Template instance to an HDF5 file specified by db_filename and db_path. It uses the Event.write method to handle the majority of the writing process.

Parameters:
  • db_filename (str) – Name of the HDF5 file to store the Template instance.

  • db_path (str, optional) – Path to the directory where the HDF5 file will be located. Defaults to cfg.OUTPUT_PATH.

  • save_waveforms (bool, optional) – Flag indicating whether to save the waveforms in the HDF5 file. Defaults to True.

  • gid (str, optional) – Name of the HDF5 group under which the Template instance will be stored. If None, the Template instance will be stored directly in the root of the HDF5 file. Defaults to None.

  • overwrite (bool, optional) – Flag indicating whether to overwrite an existing HDF5 file with the same name. If True, the existing file will be removed before writing. If False, the write operation will be skipped. Defaults to False.

Notes

  • The write function sets the where attribute of the Template instance to the full path of the HDF5 file.

  • If overwrite is True and an existing file with the same name already exists, it will be removed before writing the Template instance.

  • The majority of the writing process is handled by the Event.write method, which is called with appropriate arguments.

write_summary(attributes, filename=None, db_path=None, overwrite=True)[source]

Write the summary of template characteristics to an HDF5 file.

This function writes the summary of template characteristics, e.g. the characteristics of the detected events, specified in the attributes dictionary to an HDF5 file specified by filename and db_path. It supports overwriting existing datasets or groups with the overwrite option.

Parameters:
  • attributes (dict) – Dictionary containing scalars, numpy arrays, dictionaries, or pandas dataframes. The keys of the dictionary are used to name the dataset or group in the HDF5 file.

  • filename (str, optional) – Name of the detection file. If None, the standard file and folder naming convention will be used. Defaults to None.

  • db_path (str, optional) – Name of the directory where the detection file is located. If None, the standard file and folder naming convention will be used. Defaults to None.

  • overwrite (bool, optional) – If True, existing datasets or groups will be overwritten. If False, they will be skipped. Defaults to True.

class BPMF.dataset.TemplateGroup(templates, network, source_receiver_dist=True)[source]

Bases: Family

A class for a group of templates.

Each template is represented by a dataset.Template instance.

box(lon_min, lon_max, lat_min, lat_max, inplace=False)[source]

Keep templates inside the requested geographic bounds.

Parameters:
  • lon_min (float) – Minimum longitude in decimal degrees.

  • lon_max (float) – Maximum longitude in decimal degrees.

  • lat_min (float) – Minimum latitude in decimal degrees.

  • lat_max (float) – Maximum latitude in decimal degrees.

  • inplace (bool, default False) – If True, perform the operation in-place by modifying the existing TemplateGroup. If False, create and return a new TemplateGroup with the filtered templates.

Returns:

template_group – The TemplateGroup instance containing templates within the specified geographic bounds. This is returned only when inplace=False.

Return type:

TemplateGroup instance

compute_dir_errors()[source]

Compute the length of the uncertainty ellipsoid in the inter-template direction.

This method calculates the length of the uncertainty ellipsoid in the inter-template direction for each pair of templates in the TemplateGroup. The inter-template direction is defined as the direction linking two templates. The length of the uncertainty ellipsoid is computed based on the covariance matrix of each template.

The computed errors can be read at the self.dir_errors property of the TemplateGroup. It is a pandas DataFrame with dimensions (n_templates, n_templates), where each entry represents the length of the uncertainty ellipsoid in kilometers.

Example: self.dir_errors.loc[tid1, tid2] is the width of template tid1’s uncertainty ellipsoid in the direction of template tid2.

Returns:

The computed errors can be read at the self.dir_errors property of the TemplateGroup instance.

Return type:

None
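Geometrically, if a template's location uncertainty is described by the ellipsoid x^T C^{-1} x = 1 built from its covariance matrix C, the ellipsoid's half-length along a unit vector u is 1/sqrt(u^T C^{-1} u). The sketch below computes that quantity for a hypothetical covariance matrix in km²; the confidence-level scaling of C, and whether "length" means the half-length or the full width, are assumptions not taken from BPMF.

```python
import numpy as np


def directional_error(cov, direction):
    """Half-length, along `direction`, of the ellipsoid x^T cov^{-1} x = 1."""
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    # u^T C^{-1} u, computed without explicitly inverting C
    return 1.0 / np.sqrt(u @ np.linalg.solve(cov, u))


cov = np.diag([4.0, 1.0, 9.0])  # km^2: semi-axes of 2, 1 and 3 km
loc1 = np.array([0.0, 0.0, 0.0])  # hypothetical Cartesian locations, in km
loc2 = np.array([0.0, 0.0, 5.0])  # second template directly below the first
err_12 = directional_error(cov, loc2 - loc1)  # 3 km along the vertical axis
```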

compute_ellipsoid_dist()[source]

Compute the separation between uncertainty ellipsoids in the inter-template direction.

This method calculates the separation between the uncertainty ellipsoids in the inter-template direction for each pair of templates in the TemplateGroup instance. The separation is computed as the difference between the inter-template distances and the directional errors. It can be negative if the uncertainty ellipsoids overlap.

The computed separations can be read at the self.ellipsoid_dist property of the TemplateGroup instance. It is a pandas DataFrame with dimensions (n_templates, n_templates), where each entry represents the separation between the uncertainty ellipsoids in kilometers.

Returns:

The computed separations can be read at the self.ellipsoid_dist property of the TemplateGroup instance.

Return type:

None
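Assuming that the separation subtracts both templates' directional errors from their distance, it reduces to a matrix expression on the two DataFrames. The template IDs and values below are hypothetical.

```python
import pandas as pd

tids = [0, 1]
intertemplate_dist = pd.DataFrame([[0.0, 5.0], [5.0, 0.0]], index=tids, columns=tids)
# dir_errors.loc[i, j]: error of template i in the direction of template j (km)
dir_errors = pd.DataFrame([[0.0, 3.0], [1.5, 0.0]], index=tids, columns=tids)

# Distance minus the errors of both templates along the line joining them;
# negative values mean the uncertainty ellipsoids overlap
ellipsoid_dist = intertemplate_dist - dir_errors - dir_errors.T
```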

compute_intertemplate_cc(distance_threshold=5.0, n_stations=10, max_lag=10, save_cc=False, compute_from_scratch=False, device='cpu', progress=False, output_filename='intertp_cc.h5')[source]

Compute the pairwise template cross-correlations (CCs).

This method computes the pairwise cross-correlations (CCs) between templates in the TemplateGroup.

The CCs measure the similarity between the waveforms of different templates. CCs are computed only for template pairs whose uncertainty ellipsoids are separated by less than distance_threshold. The number of closest stations to include and the maximum lag for searching for the maximum CC are configurable parameters.

Parameters:
  • distance_threshold (float, default to 5.0) – The distance threshold, in kilometers, between two uncertainty ellipsoids under which the CC is computed.

  • n_stations (int, default to 10) – The number of closest stations to each template that are used in the computation of the average CC.

  • max_lag (int, default to 10) – The maximum lag, in samples, allowed when searching for the maximum CC on each channel. This parameter accounts for small discrepancies in windowing that could occur for two templates highly similar but associated with slightly different locations.

  • save_cc (bool, default to False) – If True, save the inter-template CCs in the same folder as the first template (self.templates[0]) with filename ‘output_filename’.

  • compute_from_scratch (bool, default to False) – If True, force the computation of the inter-template CCs from scratch. This is useful when the user knows that the computation is faster than reading a potentially large file.

  • device (str, default to 'cpu') – The device to use for the computation. Can be either ‘cpu’ or ‘gpu’.

  • progress (bool, default to False) – If True, print a progress bar using tqdm.

  • output_filename (str, default to 'intertp_cc.h5') – The filename to use when saving the inter-template CCs.

Returns:

The computed inter-template CCs can be read at the self.intertemplate_cc property of the TemplateGroup instance.

Return type:

None
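The per-channel computation can be sketched as a normalized cross-correlation whose maximum is searched within ±max_lag samples, then averaged over channels. This assumes both templates have windows of equal length and omits the selection of the n_stations closest stations; max_normalized_cc and average_cc are hypothetical helpers.

```python
import numpy as np


def max_normalized_cc(x, y, max_lag=10):
    """Maximum normalized cross-correlation of x and y within +/- max_lag samples."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        return 0.0
    cc = np.correlate(x, y, mode="full") / denom
    zero_lag = len(y) - 1  # index of the zero-lag value in the "full" output
    lo, hi = max(zero_lag - max_lag, 0), zero_lag + max_lag + 1
    return float(cc[lo:hi].max())


def average_cc(waveforms1, waveforms2, max_lag=10):
    """Average over channels of the per-channel maximum CC.

    waveforms1 and waveforms2 have shape (n_channels, n_samples).
    """
    return float(
        np.mean([max_normalized_cc(a, b, max_lag) for a, b in zip(waveforms1, waveforms2)])
    )


t = np.arange(500)
pulse = np.exp(-0.5 * ((t - 250) / 10.0) ** 2)
tp1 = np.stack([pulse, -pulse])
tp2 = np.roll(tp1, 3, axis=-1)  # same waveforms, misaligned by 3 samples
cc_with_lag = average_cc(tp1, tp2, max_lag=10)
cc_no_lag = average_cc(tp1, tp2, max_lag=0)
```

Allowing a few samples of lag recovers the similarity of waveforms that were windowed slightly differently, which is the motivation given for max_lag above.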

compute_intertemplate_dist()[source]

Compute the template-pairwise distances in kilometers.

This method calculates the distances between all pairs of templates in the TemplateGroup. The distances are computed based on the longitude, latitude, and depth of each template.

The computed distances can then be accessed at the self.intertemplate_dist property of the TemplateGroup instance.

Returns:

The computed distances can be accessed at the self.intertemplate_dist property of the TemplateGroup.

Return type:

None

property dir_errors
property ellipsoid_dist
property intertemplate_cc
property intertemplate_dist
n_best_SNR_stations(n, available_stations=None)[source]

Adjust self.stations on each template to the n best SNR stations.

This method calls template.n_best_SNR_stations for each Template instance in self.templates. For each template, it finds the n best SNR stations and modifies template.stations accordingly.

Parameters:
  • n (int) – The number of best SNR stations to keep.

  • available_stations (list of str, optional) – The list of stations among which to search for the best SNR stations. If some stations are known to lack data, the user may choose not to include them in this list. Defaults to None.

Return type:

None

n_closest_stations(n, available_stations=None)[source]

Adjust self.stations on each template to the n closest stations.

This method calls template.n_closest_stations for each Template instance in self.templates. For each template, it finds the n closest stations and modifies template.stations accordingly.

Parameters:
  • n (int) – The number of closest stations to keep.

  • available_stations (list of str, optional) – The list of stations among which to search for the closest stations. If some stations are known to lack data, the user may choose not to include them in this list. Defaults to None.

Return type:

None
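For a single template, the selection can be sketched with a pandas Series of source-receiver distances indexed by station name, optionally restricted to available_stations before taking the n smallest values. The distances, station names and the n_closest helper are hypothetical.

```python
import pandas as pd

source_receiver_dist = pd.Series({"STA1": 12.0, "STA2": 3.5, "STA3": 8.1, "STA4": 1.2})


def n_closest(distances, n, available_stations=None):
    """Names of the n stations with the smallest distances."""
    if available_stations is not None:
        # Only consider stations known to have data
        distances = distances[distances.index.isin(available_stations)]
    return distances.nsmallest(n).index.tolist()


print(n_closest(source_receiver_dist, 2))  # ['STA4', 'STA2']
print(n_closest(source_receiver_dist, 2, available_stations=["STA1", "STA2", "STA3"]))  # ['STA2', 'STA3']
```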

property n_templates
property network_to_template_map
plot_detection(idx, **kwargs)[source]

Plot the idx-th event in self.catalog.catalog.

This method identifies which template detected the idx-th event and calls template.plot_detection with the appropriate arguments.

Parameters:

idx (int) – Event index in self.catalog.catalog.

Returns:

fig – The figure showing the detected event.

Return type:

matplotlib.pyplot.Figure

plot_recurrence_times(figsize=(15, 7), progress=False, **kwargs)[source]

Plot recurrence times vs detection times, template-wise.

This method calls template.plot_recurrence_times for every Template instance in self.templates. Thus, the recurrence time is defined template-wise.

Parameters:
  • figsize (tuple of floats, optional) – Size in inches of the figure (width, height). Defaults to (15, 7).

  • progress (boolean, optional) – If True, print a progress bar with tqdm. Defaults to False.

read_catalog(extra_attributes=[], fill_value=nan, progress=False, **kwargs)[source]

Build a catalog from all templates’ detections.

Parameters:
  • extra_attributes (list of str, optional) – Additional attributes to read in addition to the default attributes of ‘longitude’, ‘latitude’, ‘depth’, and ‘origin_time’.

  • fill_value (str, int, or float, optional) – Default value to use if the target attribute does not exist in a template’s detections.

  • progress (bool, optional) – If True, display a progress bar during the operation.

  • **kwargs – Additional keyword arguments to be passed to the read_catalog method of each template.

Return type:

None

Notes

This method reads the detection information from each template in the TemplateGroup and builds a catalog that contains the combined detections from all templates.

The resulting catalog will include the default attributes of ‘longitude’, ‘latitude’, ‘depth’, and ‘origin_time’, as well as any additional attributes specified in the extra_attributes parameter.

If a template does not have a specific attribute in its detections, it will be assigned the fill_value for that attribute.

The progress parameter controls whether a progress bar is displayed during the operation. Setting it to True provides visual feedback on the progress of reading the catalogs.

classmethod read_from_files(filenames, network, gids=None, **kwargs)[source]

Initialize the TemplateGroup instance given a list of filenames.

Parameters:
  • filenames (list of str) – List of full file paths from which to instantiate the list of dataset.Template objects.

  • network (dataset.Network instance) – The Network instance used to query consistent data across all templates.

  • gids (list of str, optional) – If provided, this should be a list of group IDs where the template data is stored in their HDF5 files.

Returns:

template_group – The initialized TemplateGroup instance.

Return type:

TemplateGroup instance

read_waveforms(n_threads=1, progress=False)[source]

Read waveforms for all templates in the TemplateGroup.

This method calls template.read_waveforms for each Template instance in self.templates.

Parameters:
  • n_threads (int, optional) – The number of threads to use for parallel execution. If set to 1 (default), the waveforms will be read sequentially. If set to a value greater than 1, the waveforms will be read in parallel using multiple threads. If set to 0, None, or “all”, the method will use all available CPUs for parallel execution.

  • progress (bool, optional) – If True, a progress bar will be displayed during the waveform reading process.

Return type:

None
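The n_threads convention can be sketched with the standard library's ThreadPoolExecutor. Here read_one is a hypothetical stand-in for template.read_waveforms, and mapping 0, None and "all" to os.cpu_count() follows the description above.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_n_threads(n_threads):
    """Translate the n_threads argument into a worker count."""
    if n_threads in (0, None, "all"):
        return os.cpu_count() or 1
    return int(n_threads)


def read_all(templates, read_one, n_threads=1):
    """Call read_one(template) on every template, sequentially or in threads."""
    n_workers = resolve_n_threads(n_threads)
    if n_workers == 1:
        return [read_one(tp) for tp in templates]
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # executor.map yields results in the same order as `templates`
        return list(executor.map(read_one, templates))


results = read_all(["tp0", "tp1", "tp2"], read_one=str.upper, n_threads="all")
```

Threads suit this task when reading is dominated by file I/O, which releases Python's global interpreter lock.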

remove_multiples(n_closest_stations=10, dt_criterion=4.0, distance_criterion=1.0, speed_criterion=5.0, similarity_criterion=-1.0, progress=False, **kwargs)[source]

Search for events detected by multiple templates and flag them.

Parameters:
  • n_closest_stations (int, optional) – The number of stations closest to each template used in the calculation of the average cross-correlation (cc).

  • dt_criterion (float, optional) – The time interval, in seconds, under which two events are examined for redundancy.

  • distance_criterion (float, optional) – The distance threshold, in kilometers, between two uncertainty ellipsoids under which two events are examined for redundancy.

  • speed_criterion (float, optional) – The speed criterion, in km/s, below which the inter-event time and inter-event distance can be explained by errors in origin times and a reasonable P-wave speed.

  • similarity_criterion (float, optional) – The template similarity threshold, in terms of average cc, over which two events are examined for redundancy. The default value of -1 means that similarity is not taken into account.

  • progress (bool, optional) – If True, print a progress bar with tqdm.

  • **kwargs – Additional keyword arguments.

Return type:

None

Notes

This method creates a new entry in each Template instance’s catalog where unique events are flagged: self.templates[i].catalog.catalog[‘unique_events’]

set_network_to_template_map()[source]

Compute the map between network arrays and template data.

Template data are broadcast to fit the dimensions of the network arrays. This method computes the network_to_template_map that tells which stations and channels are used on each template. For example: network_to_template_map[t, s, c] = False means that station s and channel c are not used on template t.

Parameters:

None

Return type:

None
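A minimal sketch of such a boolean map with shape (n_templates, n_stations, n_components), assuming each template uses all of its components on all of its stations. The station lists and the dictionary layout of the templates are hypothetical.

```python
import numpy as np

network_stations = np.array(["STA1", "STA2", "STA3"])
network_components = np.array(["N", "E", "Z"])
# Hypothetical stations and components used by each template
templates = [
    {"stations": ["STA1", "STA3"], "components": ["N", "E", "Z"]},
    {"stations": ["STA2"], "components": ["Z"]},
]

network_to_template_map = np.zeros(
    (len(templates), len(network_stations), len(network_components)), dtype=bool
)
for t, tp in enumerate(templates):
    used_sta = np.isin(network_stations, tp["stations"])
    used_comp = np.isin(network_components, tp["components"])
    # Outer AND: channel (s, c) is used only if both station s and component c are
    network_to_template_map[t] = used_sta[:, None] & used_comp[None, :]
```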