A Detailed Earthquake Catalog for Banda Arc– Australian Plate Collision Zone Using Machine- Learning Phase Picker and an Automated Workflow

Cite this article as Jiang, C., Zhang, P., White, M. C.A., Pickle, R., and Miller, M. S. (2022). A Detailed Earthquake Catalog for Banda Arc–Australian Plate Collision Zone Using Machine-Learning Phase Picker and an Automated Workflow, The Seismic Record. 2(1), 1–10, doi: 10.1785/ 0320210041. Supplemental Material The tectonic setting of Timor–Leste and Eastern Indonesia comprises of a complex transition from oceanic lithosphere subduction to arc-continental collision. To better understand the deformation and convergent-zone structure of the region, we derive a new catalog of earthquake hypocenters andmagnitudes from a temporary deployment of five years of continuous seismic data using an automated processing procedure. This includes a machine-learning phase picker, EQTransformer, and a sequential earthquake association and location workflow. We detect and locate ∼19,000 events during 2014–2018, which demonstrates that it is possible to characterize earthquake sequences from raw seismic data using a well-trained machine-learning picker for a complex convergent plate setting. This study provides the most complete catalog available for the region for the duration of the temporary deployment, which includes a complex pattern of crustal events across the collision zone and into the back-arc, as well as abundant deep slab seismicity.


Introduction
Between the active Sunda and Banda arcs of southeast Asia lies the complex transition from oceanic lithosphere subduction to arc-continental collision. This portion of the plate margin, where the Indo-Australian plate converges with the Eurasian plate, effectively spans the different stages of evolution from arc-oceanic subduction in the west to arc-continent collision in the east. The collision of the Australian continental lithosphere with the volcanic arc results in complex crustal deformation, whereas subduction of oceanic lithosphere is associated with abundant deep slab seismicity (Fig. 1). The transition between oceanic subduction and subsequent continental collision is occurring at different stages along the plate boundary, which can be used to study and assess the spatiotemporal evolution of arc-continent collision. A temporary deployment of 30 broadband seismometers (YS network; see Data and Resources) across this region , aimed at understanding the initiation of continental mountain building and the cessation of island arc volcanism, offers a rare glimpse into a set of processes that have shaped Earth's evolution over geologic time. The data from these instruments and three permanent stations (GE network; see Data and Resources) here are used to create a detailed earthquake catalog to provide unique insights into understanding the deformation and subduction zone structure in the region.
Accurate location of seismicity provides important data to understand deformation not only within the crust but also within the subducting lithospheric plate. Crustal earthquakes can also illuminate the presence of new active faults, and mapping these features is essential for understanding and mitigating seismic hazards. The sparse number of permanent stations with open access data has previously made this region particularly challenging to understand from a seismological perspective. Schöffel and Das (1999) and Das (2004) manually relocated Wadati-Benioff zone earthquakes (>50 km depth) for events from 1962 to 1996, using a combination of first arrivals, depth phases, and core phases. The details of these hypocenter locations are fundamental to addressing questions regarding the distribution of seismicity along this complex subduction zone and its relation to mantle flow and tectonic plate motions.
As previously noted by Schöffel and Das (1999) and Das (2004), most of the preexisting catalogs for the region are not reliable for detailed studies, as few utilize targeted temporary arrays. Most recently, Supendi et al. (2020) used the first two years of continuous YS network data (2014)(2015) to manually derive a catalog of 415 events and created a local P-wave tomography model. However, their manual picking processes and quality-control criteria of requiring recording for at least six stations limit the detected events to only intermediate-tolarge magnitude earthquakes. Utilizing the full 5 yr of YS network data (2014-2019) with new automated earthquake detection techniques provides an opportunity to produce the best possible catalog to date.
In this study, we derived a detailed earthquake catalog using fully automated and state-of-the-art open-source programs that involve machine learning models. Previous studies have demonstrated their promise in picking efficiency, picking accuracy and location robustness (e.g., Ross et al., 2018;Zhu and Beroza, 2019;Liu et al., 2020;Wang et al., 2020;Tan et al., 2021) for crustal event detection. However, very few studies attempt to use this workflow for a subduction zone (e.g., Wang et al., 2019) where pervasive deep slab seismicity dominates seismic activity across a large spatial domain. Despite the combined challenges of enormous searching domain in 3D, sparse island-only station positioning and noisy waveforms , we detected more than 19,000 events with M L magnitudes ranging 0.5-7. The new catalog of earthquake locations, sizes, and arrival-time measurements presented in this study is the region's most detailed to date.

Data and Method
We analyzed continuous broadband seismic data from 30 YS network stations between March 2014 and January 2019 , plus three permanent GE network stations located in our study region ( Fig. 1). At the end of 2016, 22 of the temporary stations were removed, and 12 of the instrument replacements were not continuously functioning for the duration of the experiment . Details can be found in Data and Resources.

Earthquake detection and location
We built the detailed earthquake catalog with an automatic processing procedure adapted from the previous work of White et al. (2019White et al. ( , 2021 to incorporate a new machine-learning earthquake detector and phase picker from Mousavi et al. (2020). The workflow consists of four major steps: (1) detecting earthquakes and picking P and S phases, (2) associating phase picks with earthquake sources and crudely locating earthquakes, (3) refining earthquake locations individually using a 3D location algorithm, and (4) relocating event ensembles using a double-difference relocation algorithm. Each step takes advantage of open-source packages, which has the benefit of promoting reproducibility of the procedure.
First, the machine-learning model for simultaneously detecting earthquakes and picking phase arrivals, EQTransformer (Mousavi et al., 2020), was used to identify P and S arrivals from the continuous, three-component data. Compared to the many other machine-learning models for detecting earthquakes and picking phase arrivals (e.g., PhaseNet; Zhu and Beroza, 2019), EQTransformer was trained with waveforms from a wide variety of tectonic settings,  et al., 1994). The inset shows the geographic location of the study region relative to Australia and southeast Asia. The focal mechanism represents the Global Centroid Moment Tensor (Global CMT) solution (Ekström et al., 2012; see Data and Resources) for the 2015 M w 6.5 Alor earthquake, as referred to in the Results section.
https://www.seismosoc.org/publications/the-seismic-record/ • DOI: 10.1785/0320210041 The Seismic Record though significant proportions of the training waveforms were from small (M <2.5) crustal events. To detect as many picks as possible, we changed the default detection probability, P probability, and S probability, as defined in Mousavi et al. (2020), from 0.1 to 0.01. To assess the uncertainty of the automatic picker, we manually picked five P and five S phases for each of ∼40 random events in 2015 and compared these picks to the automatic picks. The average difference between these methods was 0.16 s and 0.35 s for P and S wave with one standard deviation of 0.23 s and 0.41 s, respectively. This difference is one order of magnitude smaller than the final travel-time residuals for each station as we found out later, indicating that observation errors associated with the arrival-time estimates contribute relatively minor errors to the earthquake locations.
In total, EQTransformer yielded about 1.26 million P and 1.57 million S picks, which are then passed into step two, which uses the Rapid Earthquake Association and Location algorithm (REAL; Zhang et al., 2019) for association and preliminary earthquake location. We use a local 1D model derived from recent regional 3D P-and S-velocity models (Supendi et al., 2020;Zhang and Miller, 2021) (see supplemental material for details). We find that the outputs from REAL are mostly sensitive to the threshold parameter set for the number of associated phases. In particular, only events associated with a minimum of two S-wave arrivals and seven total arrivals (P and S) across four or more stations are retained for later analysis. The 3D grid-search area in REAL spans 3°laterally centered at the station, recording the earliest P pick and extends to a depth of 680 km. To reduce the computational cost and accommodate the smaller velocity variations at depth (>250 km), we implement an adaptive grid search with a 4 km interval between 0 and 80 km depth, 8 km interval between 80 and 240 km depth, and 20 km interval between 240 and 680 km depth. This results in 28,700 associated events located with station azimuthal gaps smaller than 340° (Fig. 2a).
Step three utilizes a probabilistic, global-search algorithm, NonLinLoc (Lomax et al., 2000), to relocate earthquakes with 3D velocity structures. We tested several regional 3D V P and V S models (Fichtner et al., 2009;Supendi et al., 2020;Zenonos et al., 2020) and selected the model from Fichtner et al. (2009) for the final relocation, which resulted in slightly smaller averaged travel-time misfits compared to the rest (see supplemental material). NonLinLoc estimates the absolute location uncertainties by assuming arrival-time uncertainties of 0.1 s for both P and S to be modeled by a Gaussian distribution. All other parameters used in NonLinLoc are listed in Table S1, available in the supplemental material to this article. After the relocation, we further removed those events of either lateral uncertainties larger than 100 km or of travel-time residuals larger than 3.5 s. This additional data culling removes 30% of the REAL outputs resulting in ∼20,000 events (Fig. 2b).
The last step of the workflow includes a high-resolution relative relocation utilizing the double-difference algorithm of hypoDD (Waldhauser and Ellsworth, 2000). For the cluster relocation, we only measured differential travel time using the existing P and S picks from the automatic picker, because we found that the waveform cross-correlation technique did not contribute to the total number of measurements (∼1.7 million). This is because the waveform similarity is generally weak for the cataloged events due to the known low signal-to-noise ratio of the temporary network stations . During the hypoDD relocation, a minimum of five differential travel times was required to relocate each event pair within a maximum distance range of 500 km. Table S2 lists all other parameters used in hypoDD. This step relocated more than 11,500 events, with the rest 7629 events remaining in their NonLinLoc locations (Fig. 2c).
After the automatic procedures described earlier, a final visual inspection was performed for all the 20,034 events by cross checking the raw vertical-component waveforms with the picks generated from EQTransformer. This manual review removed ∼1000 events showing either weak, false, or incoherent signals. This step was necessary considering the low criteria we used for phase picking and the wide extent of the study region covered by the 33 island stations. Eventually, 19,074 events passed this quality control step to form our final catalog (Fig. 2d).

Local magnitude calculation
Local magnitudes were calculated following the Richter's original formula (Richter, 1935), as implemented by the SeisComp software package (see Data and Resources). Only stations <400 km were considered, and we used a trimmed mean average of both the M Lv and M Lh magnitude estimates on the vertical and horizontal components, respectively. Amplitudes were defined after removing instrument response, simulating a Wood-Anderson instrument. Other parameters were kept as the default values in SeisComp. The Seismic Record catalog in both map views and cross sections. The relocation using NonLinLoc (Fig. 2b) helps dramatically tighten the spatial distribution of seismicity and results in a significant reduction of travel-time residuals (Fig. S1). However, minor differences were observed between the NonLinLoc and hypoDD outputs with a mean shift of 7.1 km in the lateral dimension and 7.4 km in the vertical dimension for the overlapped events (Figs. S2, S3). Our final catalog was composed of 19,074 events, including 11,445 hypoDD relocated events and another 7629 NonLinLoc relocated events (not relocated by hypoDD).

Results
The 19,074 events that make up the final catalog are spread widely across the Banda arc and extend from the surface down to 644 km depth (Figs. 2a and 3). Despite only using landbased stations, many offshore events are detected, and they show coherent spatial distributions. 46.9% of the events are shallower than 50 km, 44.8% are between 50 and 200 km depth, and 8.3% are below 200 km depth (Fig. 3). The Seismic Record 4 when more stations are available. The replacement of only a subset of the 22 stations at the end of 2016 resulted in a dramatic decrease of both the detection rates and the continuity of spatial distribution (Fig. 4 and Fig. S4). The catalog contains events with local M L magnitudes ranging from 0.5 to 7, and the proportion of the database with events smaller than M L 2, 3, and 4 are 22.5%, 76.3%, and 95.1%, respectively. In addition, several distinct spikes can be observed in event counts over time. For example, the large spike in November 2015 (Fig. 4) documents the M w 6.5 earthquake on Alor Island and the associated aftershocks. . Surprisingly, our catalog only missed one out of the 14 USGS-identified events (Fig. 4b). This is exceptional, considering that the trained model we used for the automatic picker is known for being more effective at detecting crustal events of small magnitude (Mousavi et al., 2020).

Discussion
Performance evaluation of the automatic workflow To assess the overall performance of our automatic workflow, we compared the preliminary catalogue from REAL with the ISC catalog (2021). The REAL output was used to remove the potential biases from the quality control parameters adopted in the later steps, as those effects are difficult to explore quantitatively. We selected the 2015 events for this benchmark, when the YS array was fully operational (Fig. 4a). The ISC reports The associated focal mechanism of this earthquake is shown in Figure 1. Note that at the end of 2016, 22 of the temporary stations were removed, and 12 replacements were not fully functional through the duration of the experiment , resulting in a dramatic decrease of detection after this array change.
https://www.seismosoc.org/publications/the-seismic-record/ • DOI: 10.1785/0320210041 The Seismic Record 6 626 events in the study region, with magnitudes ranging from 2.0 to 5.0. Although the REAL output contains ∼22,300 events in total, 590 of them have origin time within a 30 s time window of those reported by the ISC. We visually inspected the waveforms for the 36 events not detected with our automated workflow (event details summarized in Table S3). We found that 13 of the 36 events had no clear phases within 5 s of our detected P-and S-wave arrivaltime window. We suspect, these events were potentially mislocated, as some stations indeed show clear phases but are 20-30 s away from the predicted P-and S-wave arrival time that are difficult to explain with structural heterogeneities (Fig. S5). However, the other 23 events do show clear P-and/or S-wave arrivals (Fig. S5), indicating that most of the clear phases were not detected by the automatic picker. In addition, the requirement of at least two S picks in the association step also leads to some events being missed. Therefore, these earthquakes were absent from our preliminary catalog.
We notice that 15 of the 23 missed events have magnitudes greater than 3.4. This is reasonable, considering that large earthquakes are underrepresented in the training dataset (Mousavi et al., 2019). In addition, small earthquakes tend to exhibit waveform similarity, whereas larger earthquakes exhibit less similarity (e.g., Yeck et al., 2021). Therefore, from our current workflow, the missed events (6% rate) are not too surprising.

Catalog completeness
The large number of detected events enabled further statistical analysis of our catalog. We estimate the magnitude of catalog completeness (M c ) of 3.0 by approximating it with the 99th percentile of the Gaussian component of an exponentially modified Gaussian probability density function following White et al. (2019). Detailed procedures are described in the supplemental material. The M c value of 3.0 (Fig. 5) is smaller than estimates from other active subduction zones, such as Mexico and Izu-Mariana (Nishikawa and Ide, 2014).
Based on the M c , we estimated the b-value by accounting for the frequency-magnitude distribution beyond the magnitude of 3.0 (Fig. 5). This results in a b-value of 0.87, which is on the lower end of derived values for the Sumatra-Sunda subduction system (Nishikawa and Ide, 2014).

Implications for tectonic deformation
Our new catalog reveals both deep subduction zone and interesting crustal events along the broad strike of the arc-continent collision zone. The seismicity shows distinct characteristics separated into two regions near ∼123°-124°E. In the east, there are abundant crustal events clustered beneath Alor and Timor (Fig. 2). This area marks a boundary that has been previously identified to separate the Banda orogen with different amounts of crustal shortening (Harris, 2011;Koulali et al., 2016). It is also approximately concordant with a geodetic block boundary (e.g., Nugroho et al., 2009), with the Timor-Wetar block in the east accommodating up to 63% motion of the Australian plate relative to southeast Asia compared to an amount of 21%-41% in the west (Nugroho et al., 2009). A recent ambient noise tomography study also observes contrasting crustal structure on the two sides, with more slower materials observed in the crust beneath the Timor Zhang and Miller, 2021;Miller et al., 2021). Combined, we interpret the abundant crustal seismicity beneath the Alor-Timor segment to reflect the strong and complex crustal shortening and subduction of Australian continental materials. Another transitional boundary of crustal events is observed northeast of Timor, where fewer events exist in the back-arc region, compared to the more continuous distribution along the strike in the west (Figs. 2a and 3).
The dramatic Wadati-Benioff zone seismicity and its distinct distribution along the strike is another prominent feature in the catalog, which highlights the change of slab morphology at depth. To the west of Alor, seismicity defines a continuous north-dipping slab seismicity extending down to ∼600 km The Seismic Record depth, which closely follows the slab 2.0 model (Hayes et al., 2018) (Fig. 3a-d). Beneath the islands of Timor and Alor, we observe fewer deep earthquakes, and most of the observed events are confined to the top 100 km (Fig. 3e). Interestingly, one distinct cluster of deep events is observed at regions further east of the Timor Island (beyond the longitude of 126°E), and they appear to be focused at 100-200 km of depth range (Fig. 3f). More sporadic events are observed at deeper (∼400 km) depths (Fig. 3f), beyond which exceeds the detection limit. Gaps in seismicity at slab depths have been noted previously by multiple studies in the region (e.g., Das, 2004;Ely and Sandiford, 2010). The lack or reduced number of earthquakes around 300-400 km globally has been well established (Kirby et al., 1996), which is evident across the region in our new catalog as well as the Das (2004) catalog. However, Das (2004) additionally noted a gap in the 100-300 km depth range between 120°and 122°E and very few earthquakes between the surface to ∼400 km between 125°and 127°E. They propose these gaps as evidence of two slabs rather than one highly distorted slab, supporting the original idea of Cardwell and Isacks (1978). In contrast, the lack of seismicity located in the Das (2004) dataset to the east of the Savu Sea and to the northeast of Timor has been attributed to active slab rupture during the early stages of arc-continent collision (Ely and Sandiford, 2010).
Compared to the existing relocated catalog of Das (2004Das ( ) that spans 1962Das ( -1996, we detected one order of magnitude more events in only ∼5 yr. Our new catalog has many earthquakes between 100 and 300 km depth along profile 120°E (Fig. 3b) and also further to the east at 124°E (Fig. 3d). Therefore, it appears that this part of the slab is not aseismic, as suggested by Das (2004) and Cardwell and Isacks (1978). Interestingly, the distinct cluster of deep events around 128°E (Fig. 3f) contains the most M 4.0+ events along the Banda arc (Fig. S6). The locations are consistent with the Wadati-Benioff zone seismicity identified by Das (2004), indicating the active sinking of oceanic lithosphere beneath this section of the arc. Therefore, 128°E likely delineates a transition from the arc-continental collision zone beneath the Timor in the west to the normal subduction further in the east.

Limitations
The results of this study provide the most complete catalog available for the region congruent with the temporary YS deployment, despite the reduction of detected events after some of the temporary stations were removed and replaced later in the deployment (Fig. 4b). Nevertheless, several aspects of the workflow could be improved for the future work in similar tectonic settings.
First, in the picking step, only the default model of EqT_model was used, which is trained primarily from small (M <2.5) crustal seismicity (Mousavi et al., 2019). Future studies should consider training a new model specifically for subduction-related seismicity. The presented catalog can provide additional training data to train models explicitly targeting subduction environments.
Second, we used regional velocity models for the 3D relocation, and future studies could use the picks from this study to update the model and refine the hypocenter coordinates of the catalog simultaneously.
Third, the magnitude of M L is likely underestimated for the deep events, as only epicentral distance is used in the calculation. Any future work relying on accurate magnitude information requires more careful calibration to remove the potentially systematic difference between the crust and mantle events of this catalog.

Conclusion
Taking advantage of an automated processing procedure with well-trained machine-learning picker, we detected and relocated ∼19,000 events through analyzing approximately 5 yr of new seismic data in the Timor-Leste and Eastern Indonesia region. The catalog contains M L magnitudes ranging 0.5-7 and provides the most complete catalog available in the region for the duration of the temporary YS deployment. This new catalog significantly increases our understanding of tectonic deformation in the greater Banda Arc, including both the arc-continental collision process beneath the Timor and Alor islands and the complex subduction dynamics along the arc at depth.
The detailed catalog from this study provides a unique opportunity for future work, including earthquake tomography, joint inversion with other datasets, such as surface-wave dispersion and teleseismic arrivals, and detailed focal plane analyses to understand the behavior of subducting slab in the deep mantle. The catalog for in the region could be further trained either for an improved catalog or be an example for catalog development relying on open-source package for other active subduction zones.

Data and Resources
Data from the YS array (Miller, 2014;  . Details about the construction of 1D and 3D velocity models for the catalogue production, estimation of magnitude completeness, the effects of the HypoDD relocation on the seismicity distribution, and parameters used in NonLinLoc and HypoDD are provided in the supplemental material.

Declaration of Competing Interests
The authors acknowledge that there are no conflicts of interest recorded.