BPMF.utils
- BPMF.utils.SVDWF(matrix, expl_var=0.4, max_singular_values=5, freqmin=-10.0, freqmax=-10.0, sampling_rate=-10.0, wiener_filter_colsize=None)[source]
Implementation of the Singular Value Decomposition Wiener Filter (SVDWF) described in Moreau et al 2017.
Note: The SVD-WF-filtered matrix is filtered between freqmin and freqmax before being returned.
- Parameters:
matrix ((n x m) numpy.ndarray) – n is the number of events, m is the number of time samples per event.
expl_var (float, optional) – Fraction of the total variance explained by the retained singular vectors. Defaults to 0.4.
max_singular_values (int, optional) – Number of singular values to retain in the SVD decomposition of matrix. Defaults to 5.
freqmin (float, optional) – Minimum target frequency, in Hz. Defaults to BPMF.cfg.MIN_FREQ_HZ.
freqmax (float, optional) – Maximum target frequency, in Hz. Defaults to BPMF.cfg.MAX_FREQ_HZ.
sampling_rate (float, optional) – Sampling rate of matrix. Defaults to BPMF.cfg.SAMPLING_RATE_HZ.
wiener_filter_colsize (int, optional) – Size of the wiener filter in the outermost dimension, that is, the event axis. Defaults to None, in which case the filter has length n.
- Returns:
filtered_data – The matrix filtered through the SVD procedure.
- Return type:
(n x m) numpy array
- BPMF.utils.SVDWF_multiplets(tid, db_path='', db_path_M='matched_filter_1', db_path_T='template_db_1', best=False, norm_rms=True, max_singular_values=5, expl_var=0.4, freqmin=-10.0, freqmax=-10.0, sampling_rate=-10.0, wiener_filter_colsize=None, attach_raw_data=False, detection_waveforms=None)[source]
- Parameters:
tid (scalar integer,) – Template id.
db_path (string, default to cfg.INPUT_PATH) – Root path of the database.
db_path_M (string, default to 'matched_filter_1') – Name of the folder where matched-filtering results are stored.
db_path_T (string, default to 'template_db_1') – Name of the folder where template files are stored.
best (boolean, default to False) – If True, only use the detections with higher correlation coefficients.
norm_rms (boolean, default to True) – If True, individual event are RMS-normalized before stacking.
max_singular_values (scalar integer, default to 5) – Disregarding how many singular values are needed to reconstruct 100xexp_var% of the variance of the detection matrix, the number of singular values will not be larger than max_singular_values.
max_freq (scalar float, default to cfg.MAX_FREQ_HZ) – The maximum frequency of the data, or maximum target frequency, is used to determined the size in the time axis of the Wiener filter.
wiener_filter_colsize (scalar integer, default to None,) – Size of the wiener filter in the vertical direction (i.e. in the observation axis). If set to None, then it will be equal to the number of rows.
attach_raw_data (boolean, default to False.) – If True, the data extracted during the matched-filter search are returned.
- Returns:
S – Return the SVDWF-processed detections traces in the format of an obspy Stream. Stacked traces can be found in the obspy Traces, and the filtered (and raw, if attach_raw_data is True) data matrix is returned as an attribute. See also all the useful metadata in the attributes.
- Return type:
obspy Stream,
- BPMF.utils.bandpass_filter(X, filter_order=4, freqmin=-10.0, freqmax=-10.0, f_Nyq=-5.0, taper_alpha=0.01, zerophase=True)[source]
- Parameters:
X ((n x m) numpy array,) – Numpy array of n observations of m samples each. Use X.reshape(1, -1) if you want to process a single observation.
filter_order (integer scalar, default to 4,) – Order/number of corners of the bandpass filter.
freqmin (scalar float, default to cfg.MIN_FREQ_HZ,) – Low frequency cutoff.
freqmax (scalar float, default to cfg.MAX_FREQ_HZ,) – High frequency cutoff.
f_Nyq (scalar float, default to cfg.SAMPLING_RATE_HZ/2,) – Nyquist frequency of the data. By definition, the Nyquist frequency is half the sampling rate.
taper_alpha (scalar float, default to 0.01,) – Defines the sharpness of the edges of the taper function. We use the tukey window function. The default value of 0.01 produces sharp windows.
zerophase (bool, optional) – If True, the filter is applied in the forward and reverse direction to cancel out phase shifting.
- Returns:
filtered_X – The input array X after filtering.
- Return type:
(n x m) numpy array,
- BPMF.utils.compute_distances(source_longitudes, source_latitudes, source_depths, receiver_longitudes, receiver_latitudes, receiver_depths, return_epicentral_distances=False)[source]
Fast distance computation between all source points and all receivers.
This function uses cartopy.geodesic.Geodesic to compute pair-wise distances between source points and receivers. It computes both hypocentral distances and, if specified, epicentral distances.
- Parameters:
source_longitudes (numpy.ndarray or list) – Longitudes, in decimal degrees, of the source points.
source_latitudes (numpy.ndarray or list) – Latitudes, in decimal degrees, of the source points.
source_depths (numpy.ndarray or list) – Depths, in kilometers, of the source points.
receiver_longitudes (numpy.ndarray or list) – Longitudes, in decimal degrees, of the receivers.
receiver_latitudes (numpy.ndarray or list) – Latitudes, in decimal degrees, of the receivers.
receiver_depths (numpy.ndarray or list) – Depths, in kilometers, of the receivers. Negative depths indicate receivers located at the surface.
return_epicentral_distances (bool, optional) – Flag indicating whether to return epicentral distances in addition to hypocentral distances. Default is False.
- Returns:
hypocentral_distances (numpy.ndarray) – Array of hypocentral distances between source points and receivers. The shape of the array is (n_sources, n_receivers).
epicentral_distances (numpy.ndarray, optional) – Array of epicentral distances between source points and receivers. This array is returned only if return_epicentral_distances is True. The shape of the array is (n_sources, n_receivers).
- BPMF.utils.cov_mat_intersection(cov_mat, axis1=0, axis2=1)[source]
Compute intersection between covariance matrix and plane.
Note that we assume the following coordinate system: - X: westward - Y: southward - Z: upward
- Parameters:
cov_mat (numpy.ndarray) – The (3x3) covariance matrix returned by Event.relocate(method=’NLLoc’).
axis1 (integer, optional) – Index of the first axis defining the intersecting plane.
axis2 (integer, optional) – Index of the second axis defining the intersecting plane.
- Returns:
max_unc (float) – Maximum uncertainty, in km, of the intersected covariance matrix.
min_unc (float) – Minimum uncertainty, in km, of the intersected covariance matrix.
az_max (float) – “Azimuth”, that is, angle from axis2 of maximum uncertainty.
az_min (float) – “Azimuth”, that is, angle from axis2 of minimum uncertainty.
- BPMF.utils.event_count(event_timings_str, start_date, end_date, freq='1D', offset=0.0, trim_start=True, trim_end=False, mode='end')[source]
- Parameters:
event_timings_str (list of array of str) – Timings of the events given as strings of characters.
start_date (str) – Starting date of the event count time series.
end_date (str) – End date of the event count time series.
freq (str, default to '1D') – Desired frequency of the event count time series. Default is one day.
offset (float, default to 0.) – Fraction of the frequency used for defining the beginning of each bin. For example, offset=0.5 with freq=’1D’ will return daily event counts from noon to noon.
mode (str, default to 'end') – Can be ‘end’ or ‘beginning’. This string defines whether the seismicity counted between time 1 and time 2 is indexed at time 2 (‘end’) or time 1 (‘beginning’).
- Returns:
event_count – Pandas Series with temporal indexes defined by freq and base, and values given by the event count. To get a numpy array from this Pandas Series, use: event_count.values
- Return type:
Pandas Series
- BPMF.utils.extract_colors_from_tree(dendogram, labels, color_singleton)[source]
Routine to build the map from cluster ids to dendogram colors.
- Parameters:
dendogram (dendogram from scipy.hierarchy.dendogram.) –
labels ((n_samples) numpy array,) – Labels, or cluster ids, returned by scipy.hierarchy.fcluster. For each of the initial n_samples samples, this function returns its cluster membership. labels[i] = k, means that data point i belongs to cluster k.
color_singleton (string,) – Color given to the singleton when calling scipy.hierarchy.dendogram.
- Returns:
cluster_colors – Map between cluster id and plotting color. cluster_colors[k] = ‘C0’ means that cluster k is colored in ‘C0’.
- Return type:
dictionary,
- BPMF.utils.fetch_detection_waveforms(tid, db_path_T, db_path_M, db_path='', best_CC=False, max_n_events=0, norm_rms=True, ordering='correlation_coefficients', flip_order=True, selection=None, return_event_ids=False, unique_events=False, catalog=None)[source]
- BPMF.utils.fetch_detection_waveforms_refilter(tid, db_path_T, db_path_M, net, db_path='', best_CC=False, max_n_events=0, norm_rms=True, freqmin=0.5, freqmax=12.0, target_SR=50.0, integrate=False, t0='detection_time', ordering='correlation_coefficients', flip_order=True, **preprocess_kwargs)[source]
- BPMF.utils.find_picks(phase_probability, threshold, **kwargs)[source]
Find phase picks from time series of phase probability.
Phase picks are given by the peaks exceeding threshold in the time series phase_probability. The probability neighborhood around each peak is used as the pdf of a given pick measurement.
Note: Additional key-word arguments are passed to ‘scipy.signal.find_peaks’.
- Parameters:
phase_probability (array-like) – Probability time series of observing the arrival of a given seismic phase.
threshold (float) – Value above which peaks are taken to be candidate phase picks.
- Returns:
peaks_value (numpy.ndarray) – Probability value of the selected peaks.
peaks_mean (numpy.ndarray) – Expected pick timing, in samples.
peaks_std (numpy.ndarray) – Uncertainty on pick timing, in samples.
- BPMF.utils.find_template_clusters(TpGroup, method='single', metric='correlation', criterion='distance', clustering_threshold=0.33, color_singleton='dimgray', ax_dendogram=None)[source]
Find non-overlapping groups of similar templates with the hierarchical clustering package from scipy.
- BPMF.utils.get_moveout_array(tts, stations, phases)[source]
Format the travel times into a numpy.ndarray.
- Parameters:
tts (dictionary) – Output of load_travel_times.
stations (list of strings) – List of station names. Determine the order in which travel times are written in the output array.
phases (list of strings) – List of seismic phases. Determine the order in which travel times are writtem in the output array.
- Returns:
moveout_arr – Numpy array of travel times, in the order specified by stations and phases. At this stage, moveout_arr should still be in units of seconds.
- Return type:
(n_sources, n_stations, n_phases) numpy.ndarray
- BPMF.utils.get_np_array(stream, stations, components=['N', 'E', 'Z'], priority='HH', n_samples=None, component_aliases={'E': ['E', '2'], 'N': ['N', '1'], 'Z': ['Z']}, verbose=True)[source]
Fetch data from Obspy Stream and returns an ndarray.
- Parameters:
stream (Obspy Stream instance) – The Obspy Stream instance with the waveform time series.
stations (list of strings) – Names of the stations to include in the output array. Define the order of the station axis.
components (list of strings, default to ['N','E','Z']) – Names of the components to include in the output array. Define the order of the component axis.
component_aliases (dictionary, optional) – Sometimes, components might be named differently than N, E, Z. This dictionary tells the function which alternative component names can be associated with each “canonical” component. For example, component_aliases[‘N’] = [‘N’, ‘1’] means that the function will also check the ‘1’ component in case the ‘N’ component doesn’t exist.
priority (string, default to 'HH') – When a station has multiple instruments, this string tells which channel to use in priority.
n_samples (scalar int, default to None) – Duration, in samples, of the output numpy.ndarray. Select the n_samples first samples of each trace. If None, take n_samples as the length of the first trace.
verbose (boolean, default to True) – If True, print extra output in case the target data cannot be fetched.
- Returns:
data – The waveform time series formatted as an numpy.ndarray.
- Return type:
(n_stations, n_components, n_samples) numpy.ndarray
- BPMF.utils.get_picks(picks, buffer_length=-20, prior_knowledge=None, search_win_samp=-40)[source]
Select a single P- and S-pick on each 3-comp seismogram.
- Parameters:
picks (pandas.DataFrame) – pandas.DataFrame as formatted in BPMF.dataset.pick_PS_phases.
buffer_length (scalar int, optional) – Picks that are before this buffer length, in samples, are discarded.
prior_knowledge (pandas.DataFrame, optional) – If given, picks that are closer to the a priori pick (for example, given by a preliminary location) will be given a larger weight and will be more likely to be selected. In practice, pick probabilities are multiplied by gaussian weights and the highest modified pick probability is selected.
search_win_samp (scalar int, optional) – Standard deviation, in samples, used in the gaussian weights.
- Returns:
picks – pandas.DataFrame with a single P- and S-wave pick per channel.
- Return type:
pandas.DataFrame
- BPMF.utils.linear_regression(x, y)[source]
cf. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
- Returns:
a (slope)
b (intercept)
r_val (correlation coefficient, usually) – people use the coefficient of determination R**2 = r_val**2 to measure the quality of the fit
p_val (two-sided p-value for a hypothesis test whose null) – hypothesis is that the slope is zero
std_err (standard error of the estimated slope)
- BPMF.utils.load_travel_times(path, phases, source_indexes=None, return_coords=False, stations=None)[source]
Load the travel times from path.
- Parameters:
path (string) – Path to the file with the travel times.
phases (list of strings) – List of the seismic phases to load.
source_indexes ((n_sources,) int numpy.ndarray, default to None) – If not None, this is used to select a subset of sources from the grid.
return_coords (boolean, default to False) – If True, also return the source coordinates.
stations (list of strings, default to None) – If not None, only read the travel times for stations in stations.
- Returns:
tts (dictionary) – Dictionary with one field per phase. tts[phase_n] is itself a dictionary with one field per station.
source_coords (dictionary, optional) – Returned only if return_coords is True. This is a dictionary with three (n_sources,) numpy.ndarray: source_coords[‘latitude’], source_coords[‘longitude’], source_coords[‘depth’].
- BPMF.utils.lowpass_chebyshev_I(X, freqmax, sampling_rate, order=8, max_ripple=5, zerophase=False)[source]
Apply low-pass type I Chebyshev filter.
- Parameters:
X (array-like) – 1D array-like with data to filter.
freqmax (float) – High cut-off frequency, in Hz.
sampling_rate (float) – Sampling rate, in Hz, of X.
order (int, optional) – Order of the filter. Defaults to 8.
max_ripple (int, optional) – Defaults to 5.
zerophase (bool, optional) – If True, the filter is applied in the forward and reverse direction to cancel out phase shifting.
- Returns:
X – 1D array-like with filtered data.
- Return type:
array-like
- BPMF.utils.lowpass_chebyshev_II(X, freqmax, sampling_rate, order=3, min_attenuation_dB=40.0, zerophase=False)[source]
Apply low-pass type II Chebyshev filter.
- Parameters:
X (array-like) – 1D array-like with data to filter.
freqmax (float) – High cut-off frequency, in Hz.
sampling_rate (float) – Sampling rate, in Hz, of X.
order (int, optional) – Order of the filter. Defaults to 3.
min_attenuation_dB (float, optional) – The minimum attenuation required in the stop band, in decibels. Defaults to 40.
zerophase (bool, optional) – If True, the filter is applied in the forward and reverse direction to cancel out phase shifting.
- Returns:
X – 1D array-like with filtered data.
- Return type:
array-like
- BPMF.utils.normalize_batch(seismogram, normalization_window_sample=3000, overlap=0.5)[source]
Apply Z-score normalization in running windows.
Following Zhu et al. 2019, this function applied Z-score normalization in running windows with length normalization_window_sample.
- Parameters:
seismogram (numpy.ndarray) – Three-component seismograms. seismogram has shape (num_traces, num_channels=3, num_time_samples).
normalization_window_sample (integer, optional) – The window length, in samples, over which normalization is applied. Default is 3000 (like in Zhu et al. 2019).
- Returns:
normalized_seismogram – Normalized seismogram with same shape as seismogram.
- Return type:
numpy.ndarray
- BPMF.utils.preprocess_stream(stream, freqmin=None, freqmax=None, target_SR=None, remove_response=False, remove_sensitivity=False, plot_resp=False, target_duration=None, target_starttime=None, target_endtime=None, minimum_length=0.75, minimum_chunk_duration=600.0, verbose=True, SR_decimals=1, decimation_method='simple', allow_oversampling=False, unit='VEL', n_threads=1, **kwargs)[source]
Preprocesses a stream of seismic data.
- Parameters:
stream (obspy.Stream) – A stream of seismic data.
freqmin (float, optional) – The minimum frequency to bandpass filter the data.
freqmax (float, optional) – The maximum frequency to bandpass filter the data.
target_SR (float, optional) – The target sampling rate of the data.
remove_response (bool, optional) – Whether to remove instrument response from the data.
remove_sensitivity (bool, optional) – Whether to remove instrument sensitivity from the data.
plot_resp (bool, optional) – Whether to plot the instrument response of the data.
target_duration (float, optional) – The target duration of the data.
target_starttime (obspy.UTCDateTime, optional) – The start time of the target data.
target_endtime (obspy.UTCDateTime, optional) – The end time of the target data.
minimum_length (float, optional) – The minimum length of the data as a fraction of the target duration.
minimum_chunk_duration (float, optional) – The minimum duration of each data chunk.
verbose (bool, optional) – Whether to print verbose output during processing.
SR_decimals (int, optional) – The number of decimals to round sampling rate values to.
decimation_method (str, optional) – The method used for decimation. Either ‘simple’ or ‘fourier’.
allow_oversampling (bool, optional) – If True, raw traces that have a sampling rate lower than the target sampling rate will be oversampled instead of being discarded.Defaults to False.
unit (str, optional) – The unit of the data.
n_threads (int, optional) – The number of threads over which preprocesing is parallelized.
**kwargs (dict) – Other keyword arguments to pass to the function.
- Returns:
A preprocessed stream of seismic data.
- Return type:
obspy.Stream
- BPMF.utils.read_write_waiting_list(func, path, unit_wait_time=0.2)[source]
Read/write queue to avoid conflicts between jobs.
- BPMF.utils.round_time(t, sr=-10.0)[source]
- Parameters:
t (scalar float,) – Time, in seconds, to be rounded so that the number of meaningful decimals is consistent with the precision allowed by the sampling rate.
sr (scalar float, default to cfg.SAMPLING_RATE_HZ,) – Sampling rate of the data. It is used to round the time.
- Returns:
t – Rounded time.
- Return type:
scalar float,
- BPMF.utils.sec_to_samp(t, sr=-10.0, epsilon=0.2)[source]
Convert seconds to samples taking into account rounding errors.
- BPMF.utils.spectrogram(x, window_duration_sec, overlap, sampling_rate, detrend=False, window='hann', nfft=None, boundary=None, padded=False, scaling='spectrum')[source]
- Parameters:
window_duration_sec (float) – Duration of the sliding window, in seconds, of the short-time Fourier transform. This is later converted to samples to define the nperseg argument for scipy.signal.stft.
overlap (float) – Ratio of overlap, from 0 to 1, between subsequent windows. This is later converted to samples to define the noverlap argument for scipy.signal.stft.
detrend (bool, optional) – If False, no detrending is done. If it is a string or a function, detrending is done (see doc of scipy.signal.stft). Defaults to False.
window (str or tuple or array-like, optional) – The string is the name of the taper window to apply to each segment. See the doc of scipy.signal.stft for more details.
boundary (str or None, optional) – Define how the time series are extended at the boundaries. See the doc of scipy.signal.stft for more details. Defaults to None (only valid segments are used).
padded (bool, optional) – If True, padding occurs after boundary extension. Defaults to False.
scaling (str, optional) – Either ‘spectrum’ or ‘psd’.
- Returns:
frequency (numpy.ndarray) – Frequencies, in Hertz, at which the spectra are estimated.
time (numpy.ndarray) – Times, in seconds, of the sliding windows.
spectrogram (numpy.ndarray)
- BPMF.utils.time_range(start_time, end_time, dt_sec, unit='ms', unit_value={'ms': 1000.0, 'ns': 1000000000.0, 'us': 1000000.0})[source]
Compute a range of datetime64.
- Parameters:
start_time (string or datetime) – Start of the time range.
end_time (string or datetime) – End of the time range.
dt_sec (scalar float) – Time step, in seconds, of the time range.
unit (string, default to 'ms') – Unit in which dt_sec is converted in order to reach an integer number.
unit_value (dictionary, optional) – Dictionary with the value of 1 second in different units.
- Returns:
time_range – The time range computed from the input parameters.
- Return type:
(n_samples,) numpy.ndarray of numpy.datetime64
- BPMF.utils.two_point_distance(lon_1, lat_1, depth_1, lon_2, lat_2, depth_2)[source]
Compute the distance between two points.
- Parameters:
lon_1 (scalar, float) – Longitude of Point 1.
lat_1 (scalar, float) – Latitude of Point 1.
depth_1 (scalar, float) – Depth of Point 1 (in km).
lon_2 (scalar, float) – Longitude of Point 2.
lat_2 (scalar, float) – Latitude of Point 2.
depth_2 (scalar, float) – Depth of Point 2 (in km).
- Returns:
dist – Distance between Point 1 and Point 2 in kilometers.
- Return type:
scalar, float
- BPMF.utils.two_point_epicentral_distance(lon_1, lat_1, lon_2, lat_2)[source]
Compute the distance between two points.
- Parameters:
lon_1 (scalar, float) – Longitude of Point 1.
lat_1 (scalar, float) – Latitude of Point 1.
lon_2 (scalar, float) – Longitude of Point 2.
lat_2 (scalar, float) – Latitude of Point 2.
- Returns:
dist – Distance between Point 1 and Point 2 in kilometers.
- Return type:
scalar, float
- BPMF.utils.weighted_linear_regression(X, Y, W=None)[source]
- Parameters:
X ((n,) numpy array or list) –
Y ((n,) numpy array or list) –
W (default to None, (n,) numpy array or list) –
- Returns:
best_slope (scalar float,) – Best slope from the least square formula
best_intercept (scalar float,) – Best intercept from the least square formula
std_err (scalar float,) – Error on the slope