Deep learning models trained to estimate the probability of seismic P and S phases are rapidly expanding the scale of local event detections. Here, we evaluate the potential for phase detection probabilities output by deep learning models to contribute to event‐type classification, particularly discrimination of single‐fired borehole explosions and earthquakes at local distances (<300 km). Motivated by the empirical success of P/S amplitude ratios, we consider the difference between the P and S pick probabilities output by previously developed phase detection models, Pprob − Sprob, as a discriminant. Test data include ML 1–4 earthquakes and explosions observed by common seismographs in ten geologically diverse localities. Depending on the picking model and training data, binary classification using Pprob − Sprob with at least three stations can achieve classification accuracy approximately equivalent to that of P/S amplitude ratios without requiring any customization. Joint classification with P/S and Pprob − Sprob improves accuracy for most quality control scenarios. Pick probabilities are an efficient attribute to consider in explosion discrimination because they can be automated byproducts of event detection. They also avoid the binary choice of picking or not picking the weakly visible S waves common to explosions.

Data processing advances for seismic monitoring, such as deep learning models for picking P and S phases (Zhu and Beroza, 2019; Mousavi et al., 2020) and the increasing abundance of seismic instrumentation, result in the accelerating growth of local seismic event catalogs (e.g., Liu et al., 2020; Glasgow et al., 2021). Commensurate improvements in the efficiency of source characterization tools are needed to fully benefit from the increased scale of event detection and allow analysts to efficiently identify events that merit customized scrutiny. One societally important and widely studied source characterization problem is the discrimination of explosions from earthquakes, which is particularly challenging for lower magnitude or yield events (M < 3–4) that are best recorded at local distances of less than approximately 300 km (Bowers and Selby, 2009).

Measurements of P/S amplitude ratios are one of the most common explosion discrimination metrics either used directly (O’Rourke et al., 2016; Pyle and Walter, 2019), jointly with other physics‐based metrics (Tibi et al., 2018; Wang, Schmandt, et al., 2021), or integrated into machine learning‐based models (Kong et al., 2022). Use of P/S ratios is motivated by the theoretical expectation that explosion sources preferentially excite P waves compared to earthquakes and it exhibits long‐standing empirical success, especially at regional to teleseismic distances of greater than approximately 300 km (Taylor et al., 1989; Bowers and Selby, 2009). There is growing evidence that P/S ratios are also useful discriminants at local distance, with variability across different localities (e.g., O’Rourke et al., 2016; Wang, Bian, et al., 2021; Pyle and Walter, 2022; Rathnayaka et al., 2024).

An inherent challenge with the use of P/S ratios to discriminate local explosions from earthquakes is that the method relies on the measurement of relatively low S‐wave amplitudes, which may approach the background noise or P‐coda amplitudes for small events. Consequently, different studies make a variety of choices regarding whether both the P and S waves must be visually “picked” or attain a certain signal‐to‐noise ratio (SNR) threshold to be included for analysis (Pyle and Walter, 2019; Rathnayaka et al., 2024). Alternatively, S‐wave amplitude can be extracted from a predicted travel‐time window, but then S‐wave amplitudes could be overestimated if they do not rise above the P coda and background noise (O’Rourke et al., 2016; Wang et al., 2020). Other important choices required for P/S measurement parameters include frequency content, phase window timing and duration, and potential site and/or path corrections, which may need optimization for different localities and/or different distances within the approximately 0–300 km range.

Here, we test whether the use of deep learning models trained for seismic phase picking and event detection with analyst‐labeled earthquake seismograms can efficiently provide complementary insights into the relative excitation of P and S waves by locally observed explosions. The motivations are that the phase probabilities are often available as an automated byproduct of machine learning‐based event detection workflows, they require few parameter choices because they often ingest raw or minimally processed data, and they provide a quantitative metric of seismic phase identification even for modest SNR. The motivation is not to replace other local discrimination metrics but rather to determine whether phase pick probabilities may be another valuable metric to consider as a link between event detection and classification in modern seismic monitoring workflows.

Three new datasets were added to the multiregional database recently developed by Maguire et al. (2024) (Fig. 1a). All data are uniformly formatted and have openly available metadata and waveforms (see Data and Resources, Fig. S1, available in the supplemental material to this article). The new datasets include the Dry Alluvium Geology component of the Source Physics Experiment (e.g., Pyle and Walter, 2022), the SIMA controlled source crustal imaging project in southern Spain (Ayarza et al., 2014) and the RIFSIS crustal imaging project in Morocco (Gil et al., 2014), along with surrounding temporary broadband seismometers in Spain and Morocco (Bezada et al., 2014). The data are chosen for their consistent use of single‐fired shallow borehole explosions and availability of similar magnitude local earthquake recordings from a common set of three‐component broadband seismometers. Collectively, the earthquake and explosion datasets exhibit similar distributions of source–receiver distances (Fig. 1b), although the spatial distribution of different source types varies among the arrays (Fig. S1).

Earthquake and explosion waveforms recorded at the same station and similar distances often reveal distinct patterns. Earthquakes often exhibit higher S‐wave amplitude and lower P‐wave amplitude, whereas explosions tend to show the opposite: higher P‐wave amplitude and lower S‐wave amplitudes that can be challenging to measure among P coda and noise (Fig. 1c). This tendency for P and S relative amplitudes and the difficulty of objectively picking explosion S waves inspire us to explore their relative phase identification probability, Pprob − Sprob, derived from deep learning models as a potential explosion discrimination metric that would be easily automated.

Raw seismic data were downloaded from EarthScope Data Services and saved along with the event and station metadata in a uniform format following Maguire et al. (2024). The total duration of waveforms is 120 s from 10 s before the origin time to 110 s after. Data channels were restricted to three‐component BH (broadband), EH (short period, ∼1–2 Hz), or HH (high sample rate broadband). All the waveforms were resampled to a uniform rate of 40 Hz, which is the lowest common sample rate among the data.

We input the processed adaptable seismic data format (ASDF) files to the deep learning models PhaseNet (Zhu and Beroza, 2019) and EQTransformer (Mousavi et al., 2020), both pretrained on the STanford EArthquake Dataset (STEAD; Mousavi et al., 2019), to produce model output time series of P pick probability (Pprob) and S pick probability (Sprob). A recent quantitative review of the performance of these and other deep learning phase picking models is provided by Münchmeyer et al. (2022). Here, the pick probability estimation process was implemented using SeisBench (Woollam et al., 2022) and multiple models were used to investigate model dependency of the results. The phase pick probability output from these phase picking models is known to vary with small changes in the time window used for analysis (Park et al., 2023). Here, each seismogram segment containing an event was processed three times with a 1 s shift in start time to improve the stability of the estimated peak P and S probabilities for each source–receiver pair. The mean difference between explosion Pprob − Sprob and earthquake Pprob − Sprob remains stable for each of the three event segment start times, so further iterations were not considered necessary (Fig. S2).
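The shifted‐window procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: `picker` is a hypothetical stand‐in for a trained model that returns one probability per input sample, and averaging the three peak values is an assumption, because the text does not state how the three runs were combined.

```python
from statistics import mean


def peak_in_window(prob, times, t_start, t_end):
    """Largest probability whose time stamp falls inside [t_start, t_end]."""
    vals = [p for p, t in zip(prob, times) if t_start <= t <= t_end]
    return max(vals) if vals else 0.0


def shifted_peak(picker, samples, rate_hz, window, shifts_s=(0.0, 1.0, 2.0)):
    """Average the windowed peak probability over several segment start times.

    picker  : hypothetical callable, list of samples -> list of probabilities
              of the same length (stand-in for a trained picking model).
    window  : (t_start, t_end) in seconds, relative to the unshifted start.
    shifts_s: start-time shifts in seconds (three runs, 1 s apart).
    Averaging the peaks is an assumption made for this sketch.
    """
    peaks = []
    for shift in shifts_s:
        skip = int(round(shift * rate_hz))
        prob = picker(samples[skip:])  # re-run the model on the shifted segment
        # Time stamps stay referenced to the unshifted segment start.
        times = [shift + i / rate_hz for i in range(len(prob))]
        peaks.append(peak_in_window(prob, times, *window))
    return mean(peaks)
```

Calling `shifted_peak` once with the P window and once with the S window on the respective probability outputs would give the two peak values whose difference is used as the discriminant.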

Three time windows are defined within each seismogram: P‐wave window, S‐wave window, and pre-event noise window following Wang et al. (2020). The P/S ratio and SNR were computed using the predefined time windows and 6–18 Hz band‐pass‐filtered three‐component waveforms. The P/S ratio (equation 1) was derived from the effective variance of the P and S phases measured on all three components, and the SNR (equation 2) was calculated from the effective variance of the P and pre‐event noise phases (Wang et al., 2020):
in which three‐component P windows (PZ, PR, PT), S windows (SZ, SR, ST), and noise windows (NZ, NR, NT) are considered in calculations. P/S ratios were measured for all source–receiver pairs with P‐wave SNR > 3. Alternate frequency band choices show similar or poorer P/S discrimination performance (Fig. S3).
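As a hedged illustration of how the window quantities named above could be combined, the following assumes that each symbol denotes the root‐mean‐square amplitude of the band‐pass‐filtered trace in the corresponding window and component. This form is an assumption made for illustration; Wang et al. (2020) should be consulted for the exact definition of the effective variance.

```latex
% Assumed form: three-component energy ratio (illustrative only)
\mathrm{P/S} = \sqrt{\frac{P_Z^{2} + P_R^{2} + P_T^{2}}{S_Z^{2} + S_R^{2} + S_T^{2}}},
\qquad
\mathrm{SNR} = \sqrt{\frac{P_Z^{2} + P_R^{2} + P_T^{2}}{N_Z^{2} + N_R^{2} + N_T^{2}}}
```

Under this assumed form, a source–receiver pair passes the quality control when the three‐component P‐window energy is more than nine times the noise‐window energy (SNR > 3).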

To measure Pprob − Sprob, the maximum amplitude from the P pick probability and S pick probability time series was extracted within the respective phase windows. All source–receiver pairs with SNR > 3 from a single earthquake or explosion event were used to calculate the mean Pprob − Sprob and mean P/S ratio for the event. Performance of P/S discrimination can vary with the number of stations recording each event (Pyle and Walter, 2019; Wang et al., 2020). Here, results are shown using all events with at least three source–receiver pairs because that is often a practical minimum for local event detection and location. We use Pprob − Sprob rather than Pprob/Sprob because the ratio behaves erratically for low probabilities.
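The event‐level averaging can be sketched as follows. The dictionary keys are hypothetical names chosen for this example, and the arithmetic mean of P/S is an assumption (the study may average in log space).

```python
from statistics import mean


def event_means(pairs, min_snr=3.0, min_stations=3):
    """Event-level mean P-minus-S pick probability and mean P/S ratio.

    pairs: list of dicts, one per source-receiver pair, with the
           hypothetical keys 'snr', 'p_prob', 's_prob', and 'ps_ratio'.
    Returns None when fewer than min_stations pairs pass the SNR check.
    """
    kept = [p for p in pairs if p["snr"] > min_snr]
    if len(kept) < min_stations:
        return None
    mean_diff = mean(p["p_prob"] - p["s_prob"] for p in kept)
    mean_ps = mean(p["ps_ratio"] for p in kept)  # arithmetic mean (assumed)
    return mean_diff, mean_ps
```

The preference for a difference over a ratio is easy to see numerically: with a P pick probability of 0.5, S pick probabilities of 0.01 and 0.02 give ratios of 50 and 25 but differences of 0.49 and 0.48.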

To derive a balanced set of earthquake and explosion observations with which to estimate classification performance, we randomly select 10 times the minimum number of events between earthquakes and explosions. Receiver operating characteristic (ROC) curves (James et al., 2013) and the area under the ROC curve (AUC) for Pprob − Sprob and P/S ratio were calculated to assess binary classification performance for the two metrics individually. This process was repeated for 1000 iterations of bootstrap resampling, and the mean values of false positive rate, true positive rate, and AUC are reported. For joint classification, we follow the approach used by Wang, Schmandt, et al. (2021) (grid search over slope and intercept) to obtain the line that optimally separates earthquakes and explosions in bivariate scatter plots of P/S and Pprob − Sprob (Fig. S4).
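A minimal sketch of a bootstrap AUC estimate is given below. It assumes that explosions are the positive class and that each iteration draws equal‐size samples with replacement from both classes; the exact balancing scheme used in the study may differ, and the function names are illustrative.

```python
import random


def auc(pos, neg):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney form)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(explosions, earthquakes, n_iter=1000, seed=0):
    """Mean AUC over bootstrap resamples with equal class sizes (assumed)."""
    rng = random.Random(seed)
    n = min(len(explosions), len(earthquakes))
    total = 0.0
    for _ in range(n_iter):
        ex = rng.choices(explosions, k=n)   # sample with replacement
        eq = rng.choices(earthquakes, k=n)
        total += auc(ex, eq)
    return total / n_iter
```

The inputs are event‐level scores for one metric, for example the mean P‐minus‐S pick probability per event. The pairwise form is quadratic in the sample size, which is acceptable for catalogs of a few thousand events.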

Pick probability time series provide a useful illustration of local phase propagation for explosions and earthquakes in each study area. For example, time versus distance images of stacked P and S pick probabilities for the Idaho‐Oregon (IDOR) dataset show that S‐wave pick probabilities tend to be lower for explosions than for earthquakes with both the PhaseNet and EQTransformer models (Fig. 2). Closer inspection shows that PhaseNet demonstrates more contrast between source types with higher P pick probability and lower S pick probability for explosions (Fig. 2b). These observations are consistent with additional statistical results shown in Figures S5 and S6. Both models produce local maxima in S pick probability at distances >150 km, near the time of the P‐wave probability (Fig. 2).

The P/S ratio distribution exhibits a mean value of 0.72 for earthquakes and 1.45 for explosions (Fig. 3a, Fig. S7). Similarly, PhaseNet Pprob − Sprob shows contrasting distributions for earthquakes (mean of 0.07) and explosions (mean of 0.35, Fig. 3b). EQTransformer Pprob − Sprob shows less separation, with mean values of 0.12 for earthquakes and 0.23 for explosions (Fig. 3c). Although the ROC curves and AUC values indicate that PhaseNet Pprob − Sprob exhibits classification performance comparable to the P/S ratio, the two metrics lead to different ROC curve shapes that indicate potentially complementary characteristics (Fig. 3d).

Pprob − Sprob shows greater separation between explosions and earthquakes when implemented with PhaseNet rather than EQTransformer, despite both models using the STEAD training data (Fig. 3d, Figs. S5 and S6). PhaseNet outputs probabilities of P and S phases (Zhu and Beroza, 2019), and EQTransformer outputs probabilities of P, S, and event detection (Mousavi et al., 2020). The event detection output from EQTransformer shows a tendency toward higher probabilities for earthquakes compared to explosions (Fig. 3e).

Joint use of PhaseNet Pprob − Sprob and P/S for explosion discrimination was tested to see if performance can exceed that of either metric individually. Optimized linear discrimination thresholds are shown for an individual study area and the cumulative results across the 10 localities based on the quality control requirement of P‐wave SNR > 3 (Fig. 4). Cumulative results are shown for several alternative quality control scenarios in Table 1, with some quality control based on P‐wave amplitude SNR and others based on PhaseNet P and/or S pick probability. For all but one scenario, joint classification with PhaseNet Pprob − Sprob and P/S results in at least slightly higher balanced accuracy (Table 1). The exception is for a relatively high threshold of PhaseNet P and S pick probability >0.3, in which case joint classification does not help, and P/S classification alone is more accurate. However, this stringent quality control scenario substantially reduces the fraction of the events (requiring at least three stations for an event) or source–receiver pairs that can be used for classification (Table 1).
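The joint linear discrimination can be illustrated with a brute‐force grid search that maximizes balanced accuracy, the mean of the true positive and true negative rates. This is a sketch under stated assumptions: the axis assignment (x for P/S, y for the P‐minus‐S pick probability), the rule that points above the line are labeled explosions, and the grid values are all choices made for this example rather than details taken from Wang, Schmandt, et al. (2021).

```python
def balanced_accuracy(points, slope, intercept):
    """Mean of explosion and earthquake recall for the line
    y = slope * x + intercept; points above the line are called explosions.

    points: list of (x, y, is_explosion) tuples containing both classes.
    """
    tp = fn = tn = fp = 0
    for x, y, is_explosion in points:
        predicted_explosion = y > slope * x + intercept
        if is_explosion:
            if predicted_explosion:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_explosion:
                fp += 1
            else:
                tn += 1
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def grid_search_line(points, slopes, intercepts):
    """Return (best_score, slope, intercept) over the supplied grid."""
    best = (-1.0, None, None)
    for a in slopes:
        for b in intercepts:
            score = balanced_accuracy(points, a, b)
            if score > best[0]:  # keep the first grid point on ties
                best = (score, a, b)
    return best
```

A line parameterized by slope and intercept cannot represent a vertical boundary, so a search intended to cover a P/S‐only threshold would need a different parameterization, such as an angle and an offset.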

Seismic phase picking and source‐type classification are conventionally addressed separately. The results here highlight how these tasks may be related when phase pick probabilities are obtained from deep learning models. Specifically, the differential probability of observed P and S phases appears to encode information about whether the source process is an earthquake or explosion (Fig. 3). Such a result is not surprising based on longstanding use of P/S (or S/P) amplitudes in explosion discrimination (e.g., Taylor et al., 1989) and focal mechanism estimation (e.g., Hardebeck and Shearer, 2003). Similarly, differential magnitude estimates have long been used for explosion discrimination (Bowers and Selby, 2009; Koper et al., 2021) and they too rely on the preferential excitation of seismic phases by different source types. Probabilistic phase picking information averaged across multiple stations is another expression of the preferential excitation of P waves relative to S waves for subsurface explosions, although the probability of observance at any station is also affected by radiation pattern, path, and site effects. To be clear, the results are not considered as evidence to support using Pprob − Sprob independently to discriminate between explosions and earthquakes; rather, the results highlight that Pprob − Sprob might have value as an efficient first‐pass screening tool or among a suite of automatically measured attributes to help handle the increasing scale of local event catalogs.

Performance of Pprob − Sprob is variable across the geologic settings represented, with datasets like Mount St. Helens (MSH) showing clear separation of explosions and earthquakes (Fig. 4a,c), but the cumulative database showing more scattered overlap (Fig. 4b,d). The most extreme cases of potential for misclassification involve explosions that produced Pprob − Sprob < 0 (Fig. 4b, Fig. S8). All of these came from two overlapping southeastern U.S. datasets (Eastern North America Margin and Georgia‐South Carolina, which are labeled B and E, respectively, in Fig. 1a) and most had source–receiver distances >130 km (Table S1). In contrast, most MSH source–receiver distances for explosions are <120 km. A supplementary test using events with shorter source–receiver distances (mean distances <130 km) shows improved classification performance for both Pprob − Sprob and P/S (Fig. S9).

Diminished effectiveness at farther distances may be related to the Pg‐to‐Pn first‐arrival transition (e.g., Davenport et al., 2017; Marzen et al., 2019). The STEAD training data better represent shorter distances where the crustal phase Pg is the first arrival (Mousavi et al., 2019). The influence of the approximately 150 km emergence of the uppermost mantle turning phase Pn as the first arrival (Davenport et al., 2017) also shows up in the false local maxima for S wave pick probability during the P wave (Fig. 2).

A noteworthy limit of Pprob − Sprob is that its ability to distinguish between source types diminishes for strong signals with high P and S pick probabilities (Table 1). A test using only high PhaseNet pick probability source–receiver pairs (i.e., >0.3) results in higher balanced accuracy using P/S alone compared to joint Pprob − Sprob and P/S classification (Table 1, Fig. S10). Thus, Pprob − Sprob seems more useful for weaker signals with greater phase pick ambiguity, which may help with the large number of events detected near the magnitude of completeness for a given local network. The comparison of different quality control scenarios using PhaseNet pick probability thresholds shows the importance of incorporating data with low PhaseNet S pick probability. Only applying a 0.3 probability quality control threshold to PhaseNet P pick probability (no S threshold) results in an increase in balanced accuracy and a greater fraction of events that can be classified (Table 1).

It is important to note that our results indicate a model dependency when using the proposed Pprob − Sprob attribute to help distinguish between earthquakes and explosions. As shown in Figure 3b,c, PhaseNet‐derived Pprob − Sprob separates the two source types better than EQTransformer‐derived Pprob − Sprob, although both models were trained with the same data. However, the probabilistic phase picking from both models demonstrates some degree of discrimination capability, highlighting that averaging phase pick probabilities from multiple stations can be sensitive to the preferential excitation of P waves relative to S waves for explosions. We speculate that the model‐dependent difference in classification performance may originate from EQTransformer’s multitask learning (Mousavi et al., 2020), in which an elevated probability of event detection may give the model more confidence to label S phases in the waveform, even for the explosion waveforms. However, this speculation needs further investigation because many factors influence neural network model performance (e.g., Park and Shelly, 2024).

It is well documented that deep learning model phase picking performance can be challenged by previously unseen noise conditions such as in the ocean bottom versus onshore environments (Bornstein et al., 2024) and different geologic settings such as volcanic systems (Lapins et al., 2021; Zhong and Tan, 2024). Yet, examples like using a model trained on tectonic earthquakes to detect icequakes show that picking models can exhibit considerable source‐type versatility, if conditions like the magnitude range and source–receiver distances remain similar to the training (Peña Castro et al., 2025). The explosion phase detection performance of either model used here could likely be improved with transfer learning (e.g., Lapins et al., 2021), incorporating a smaller batch of explosion seismograms to supplement STEAD (Mousavi et al., 2019). However, training with explosion data might degrade the potential to use Pprob − Sprob as a discriminant because we expect that a model trained with explosion seismograms would have an enhanced ability to pick weak explosion S waves. Alternatively, phase picking could be conducted with two different models, with one trained only on earthquakes and the other augmented with explosion‐based transfer learning. These possibilities are left to future studies, but we hypothesize that the differences in detection and P‐ and S‐wave pick probability between such models could provide valuable explosion discrimination metrics and insights into effective quality control strategies.

With the growing use of machine learning models that output phase pick probabilities, rather than simply the presence or absence of a pick, there are opportunities to efficiently gain information for source classification problems like explosion discrimination. Test results from ten different localities show that differential phase pick probabilities, here Pprob − Sprob, offer a new and highly efficient way to discriminate local explosions and earthquakes, as well as a potential metric for quality control in the application of conventional P/S ratio discrimination. If Pprob − Sprob is used separately for binary classification, it can provide earthquake and explosion classification accuracy similar to that of P/S amplitude ratios without any customization, such as optimizing the choice of frequency band. In most quality control scenarios, classification accuracy can be improved by the joint use of Pprob − Sprob and P/S ratios, suggesting that they carry complementary information.

Raw seismograms were obtained using EarthScope Data Services, which is funded through the Seismological Facilities for the Advancement of Geoscience (SAGE) Award of the National Science Foundation under Cooperative Support Agreement EAR‐1851048. The data used are openly available in uniformly formatted ASDF files through Zenodo repository at doi: 10.5281/zenodo.15490373. The supplemental material includes ten figures (Figs. S1–S10) and one table (Table S1).

The authors acknowledge that there are no conflicts of interest recorded.

This research was supported by Air Force Research Laboratory (AFRL) Contract Numbers FA9453‐21‐02‐0024 and FA9453‐24‐9‐0001. Eli Baker provided helpful feedback during the project. C. D. thanks Andres F. Peña Castro for an introduction to SeisBench. Alex Witsil, an anonymous reviewer, and the editors are thanked for their helpful feedback. This Source Physics Experiment (SPE) research was funded by the National Nuclear Security Administration, Defense Nuclear Nonproliferation Research and Development (NNSA DNN R&D). The authors acknowledge important interdisciplinary collaboration with scientists and engineers from Los Alamos National Laboratory (LANL), Lawrence Livermore National Laboratory (LLNL), Nevada National Security Site (NNSS), Sandia National Laboratories (SNL), and University of Nevada, Reno (UNR). Qingkai Kong’s work was performed under the auspices of the U.S. Department of Energy by LLNL under Contract Number DE‐AC52‐07NA27344. This is LLNL Contribution Number LLNL‐JRNL‐2003743.
