Oklahoma and Kansas experienced unprecedented seismic activity over the past decade due to earthquakes associated with unconventional hydrocarbon development. The modest natural seismicity and incomplete knowledge of the fault network in the region made it difficult to anticipate the locations of earthquakes with larger magnitudes (). Here, we show that monitoring of microearthquakes at regional scale using a pretrained neural phase picker and an earthquake relocation algorithm can illuminate unknown fault structures, and deliver information that can be synthesized for earthquake forecasting. We found that 80% of the larger earthquakes that occurred in the past decade could have been anticipated based on the spatial extent of the seismicity clusters that were formed before these earthquakes occurred. We also found that once a seismicity cluster with a length scale enough to host a larger earthquake was formed, there was a ∼5% chance that it would host one or more larger earthquakes within a year. This probability is nearly an order of magnitude higher than one based on Gutenberg–Richter statistics and preceding seismicity. Applying our approach in practice can provide critical information on seismic hazards for risk management and regulatory decision making.
Oklahoma and Kansas experienced unprecedented seismic activity over the past decade due to earthquakes triggered by unconventional hydrocarbon development (Ellsworth, 2013; Keranen et al., 2014). The region did not have a history of frequent earthquake activity, and little was known about the location of the faults that came to host it. For that reason, seismic hazard forecasts relied on the seismicity rate of the previous years without information on the distribution of regional faults (Petersen et al., 2016, 2017, 2018).
Microearthquake activity can illuminate otherwise unknown faults, particularly when events are detected down to low magnitudes and are located precisely. The resulting seismicity can be treated as a sample of the underlying fault geometry and used to estimate important fault attributes, such as length, segmentation, and orientation. In this study, we apply machine learning algorithms to reanalyze publicly available continuous waveform data in Oklahoma and Kansas from the decade 2010 to 2019, to develop a comprehensive regional earthquake catalog. From this improved catalog, we demonstrate that most of the faults that hosted earthquakes with moment magnitude of 4 or greater () were illuminated by detectable seismicity in advance, such that they could have been better anticipated had real‐time microearthquake monitoring been operational.
Earthquake Catalog Development
Length Scale Comparisons
Earthquakes form clusters that illuminate the activated segments of underlying fault structures. However, clustering the earthquakes is a challenging task due to the nonuniformity in their spatial distribution and density, location uncertainty, and faults that intersect. Clustering algorithms that are solely based on point coordinates are not robust against these conditions. Because most faults in the area are vertical strike slip, and because focal depths are less certain than the epicenter, we ignored the focal depths and used a ridge detection algorithm (Chen et al., 2015) to map the epicenters into ridges. We then estimated the normal vector at each point along the ridge using principal component analysis on the neighboring ridge points within 50 m. Using both the coordinates and the estimated normal vectors of the ridge points, we segmented the ridges and assigned nearby earthquakes to their closest ridge segment. We did not use the earthquakes that could not be confidently relocated using hypoDD (Waldhauser, 2011), because our analysis more strongly depends on precise locations rather than catalog completeness. Finally, we postprocessed the clustered earthquakes by manually joining clusters that were well aligned and removing the clusters with low aspect ratios that did not look like a planar fault or those with less than 30 earthquakes. Every step, especially the last step, involves some level of subjectivity when choosing parameters for the algorithms and deciding on which clusters to merge. However, the statistics we discuss here were counted after finalizing the clustering and thus were not subjectively biased. Figure 3 demonstrates the workflow applied to one of the most challenging areas to cluster (highlighted in Fig. 1).
Illumination of unknown structures
The seismicity in our catalog reveals widespread and previously unknown fault structures in Oklahoma and Kansas, including those that hosted the earthquakes. The trend of seismicity shows clear preferred orientations that are close to optimally oriented with respect to the maximum horizontal stress direction (Alt and Zoback, 2016). Some sequences, particularly the Pawnee sequence in Figure 1, display faulting on multiple structures and at all resolvable scales. Another aspect that emerges are clear seismicity boundaries. The Nemaha ridge (Fig. 1a) has been hypothesized as a barrier to flow (Weingarten et al., 2015), and our result strengthens the case. Specifically, the seismicity in eastern Oklahoma tends to be constrained to the east of the ridge south of 36.25° N but to the west of the ridge farther north. Another example is shown in Figure 1b in which the seismicity is confined to the west of the highlighted line, which is nearly parallel to the mapped faults in the vicinity.
Our catalog shows substantial improvement in the earthquake count and spatial resolution compared to the previous studies that used other workflows for improving upon the routine catalog (Schoenball and Ellsworth, 2017a), including template matching (Skoumal et al., 2019). Figure 2 shows an example of the seismicity from these previous studies and from our catalog in the area highlighted in Figure 1 as well as their cumulative magnitude–frequency distributions. The added microearthquakes in our catalog effectively connect the dots between scattered earthquakes and small clusters formed by larger earthquakes, highlighting more extensive and continuous structures. More zoomed‐in examples are shown in Figures S5–S11. However, the magnitudes in the three catalogs were taken from different sources and/or calculated using different correlations, and thus should not be directly compared.
Preillumination of structures with sufficient length scales
Figure 4 shows the seismicity before and after each of the 60 earthquakes, and compares their moment magnitudes to those calculated from the measured length of the hosting clusters prior to their occurrence. Based on our analysis, 48 of the 60 earthquakes (80%) occurred on clusters that were large enough to host events of at least their size given antecedent seismicity such that they may not have come as a complete surprise. On the other hand, 12 of the 60 earthquakes (20%) occurred on clusters that would not have been flagged as large enough, as no clear cluster with sufficient length scale was identified in advance. This could reflect either a genuine lack of seismic activity before these earthquakes, or it could be due to insufficient station coverage for seismicity monitoring (Fig. S12), which is most likely to be relevant for earthquakes 1 and 2 in Figure 4. By capturing the aftershocks of these “surprising” earthquakes, however, the hosting clusters became more definite over time such that the measured length scales exceeded those of the following earthquakes on those faults. For example, earthquake 34 in Figure 4 was a surprise, but the next seven earthquakes (earthquakes 40, 41, 42, 44, 45, 46, and 51), including the 2016 5.0 Fairview mainshock (earthquake 42), occurred on a clearly defined structure with a sufficient length scale. Such persistently active features may be especially useful for hazard analysis at long‐term injection sites where operating time scales can be years to decades.
Chances for an cluster hosting one or more earthquakes
Similar to the exceedance probabilities used in seismic hazard analysis, this accounting could be interpreted as an empirical probability of having at least one earthquake within the set time interval once a fault activates over a length scale of at least LM4. Based on our results, the precision scores were evaluated as 4.8%, 5.5%, 5.7%, and 6.2% for time intervals of 1, 2, 3, and 4 yr, respectively, in which 4 yr was enough to contain all the 26 clusters that produced at least one earthquake in the past decade. The TP and FP clusters classified based on the one‐year time interval are shown in Figure 5. We are not counting the false negative cases in which a cluster was either unidentified or had a length scale shorter than LM4 before hosting the first earthquake. The result shows that most of the clusters that hosted an earthquake did so within the first year after formation (20 out of 26), and the number of those that took longer was relatively small. It is not clear, however, if this is a feature of cluster evolution, a result of changes in oil and gas operations or a consequence of regulatory restrictions on wastewater disposal.
These statistics were derived from a simple scaling relationship (equation 3), and the clusters we identified do not necessarily represent a simple, planar fault. However, these approaches could still be used to support decision making when structures with concerning length scales are revealed by antecedent seismicity, such as taking steps for risk reduction (Langenbruch and Zoback, 2016). It could also be used to motivate enhanced geophysical monitoring such as monitoring the pressure in the disposal zone or deploying more seismometers around concerning clusters to improve microearthquake monitoring.
Enhanced microearthquake monitoring has become straightforward due to the advances in automated phase picking algorithms and other well‐established methods of precision seismology. We developed a comprehensive regional earthquake catalog for a decade for Oklahoma and Kansas that can be thought of as retrospective monitoring using a workflow suitable for real‐time operations. We used this catalog to demonstrate that the faults that hosted induced seismicity were distinct from the previously mapped faults, and that locations of most of the earthquakes should not have been a complete surprise. Using the correspondence between preilluminated structures and earthquake potential to quantify earthquake probabilities would support improved near real‐time operational and regulatory decision making.
Data and Resources
Continuous waveform data for the 17 public seismic networks were downloaded through https://www.iris.edu/ (last accessed May 2022). The links to each seismic network data are 4H: doi: 10.7914/SN/4H_2014, 9L: doi: 10.7914/SN/9L_2013, GS: doi: 10.7914/SN/GS, N4: doi: 10.7914/SN/N4, NP: doi: 10.7914/SN/NP, NQ: doi: 10.7914/SN/NQ, NX: doi: 10.7914/SN/NX, O2: doi: 10.7914/SN/O2, OK: doi: 10.7914/SN/OK, TA: doi: 10.7914/SN/TA, US: doi: 10.7914/SN/US, XR: doi: 10.7914/SN/XR_2016, Y7: doi: 10.7914/SN/Y7_2016, Y9: doi: 10.7914/SN/Y9_2016, ZD: doi: 10.7914/SN/ZD_2014, ZP: doi: 10.7914/SN/ZP_2016, and ZQ: doi: 10.7914/SN/ZQ_2011. The supplemental material contains the details of the stations used in this study are available in Table S1, the earthquake catalog produced from this study is available in Table S2, the details of the 60 earthquakes are available in Table S3, and details of the 430 identified clusters are available in Table S4.
Declaration of Competing Interests
The authors declare that there are no conflicts of interest recorded.
This work was supported by the Stanford Center for Induced and Triggered Seismicity, and the Department of Energy (Basic Energy Sciences; Award Number DE‐SC0020445). The authors thank Daniel Trugman and an anonymous reviewer for their constructive feedback.