## Abstract

Cost‐effective strategies for enhancing seismic velocity models are an active research topic. The recently developed hybridization technique shows promise in improving models used for deterministic earthquake hazard evaluation. We augment the results of Ajala and Persaud (2021) by exploring other hybrid models generated using 13 sets of embedding parameters—taper widths and subvolumes—and summarize their effect on waveform predictions up to a minimum period of 2 s. Our results introduce the notion of compatibility as a consideration by showing that the same basin models embedded into two different regional models can produce notably different outcomes. In contrast to most of our hybrid Harvard models that produce better matching ground motions, only one of the hybrid models generated using the Southern California Earthquake Center model as a regional model gives a closer match to the waveforms. Similar results are obtained at higher frequencies; however, improvements due to hybridization are reduced. A potential explanation for these results may be the limited high spatial frequencies in the travel time tomography basin models and the >5–6 s wavefield‐dominated adjoint regional models. Although the strongly tapered compatible hybrid models tend to produce better results, we find instances of improvements even with merging artifacts.

## Introduction

It is imperative that the seismological community be able to develop large‐scale Earth models containing spatial frequencies that can accurately model ground motions in the natural frequency bands of buildings located in earthquake‐prone areas. Many of the ingredients necessary to make this goal a reality are in place. Imaging solutions like full‐waveform inversion coupled with knowledge of the best implementation practices and the development of robust optimization cost functions (Górszczyk *et al.*, 2021) to assist with inversion complexities have led to successful applications at various scales (Tromp, 2020). Advances in geophysical instrumentation, including the creative adaptation of telecommunication cables (Zhan *et al.*, 2021) and nodal seismic arrays (Wang *et al.*, 2021), provide seismograms at the high spatial density needed for high‐resolution modeling. Recent theoretical studies on using artificial intelligence to accelerate seismic wavefield simulations have produced positive results with noted challenges for realistic scenarios (Moseley *et al.*, 2020). Despite these developments, the main bottleneck that makes the creation of high‐resolution (>1 Hz) regional Earth models currently unfeasible is the sheer amount of computational resources required, which may not be available until we can fully harness the next technological leap in computing (e.g., Madsen *et al.*, 2022). These research frontiers imply the existence of underexploited detailed local models developed from dense datasets closer to exploration‐style surveys (Lin *et al.*, 2013). Therefore, there ought to be a way to introduce the shorter spatial wavelength content in these models and datasets into regional‐to‐global models to enhance them, particularly, in areas of interest such as sedimentary basins, active fault zones, and other high seismic hazard regions. 
Suggested approaches include the Bayesian multiscale inversion framework used to update the Collaborative Seismic Earth Model (Fichtner *et al.*, 2018), model hybridization proposed by Ajala and Persaud (2021) to merge multiscale datasets, and a minimal‐updating level‐set data‐driven scheme tested in the Los Angeles basin (Muir *et al.*, 2022).

We revisit the topic of model merging to provide more insight into the influence of some critical parameters by extending the previous results using more hybrid model examples and analyzing their impact on localized wavefield discrepancies at the minimum period of 2 s. We create these hybrid models by embedding Salton Trough basin models into two Southern California Earthquake Center community velocity models using different subvolumes of the basin models, and boundary smoothness between the basin and regional models. In one instance, hybridization gives overall the lowest misfits and substantial improvements over the community model in most of the cases tested, but only one hybrid model showed improvements for the other community model, which may be due to the compatibility of the merged models. We also find that merging artifacts do not necessarily preclude the hybrid model outperforming its community model. Following Ajala and Persaud (2021), we use the same models for hybridization, a subset of their earthquake data, and a similar verification procedure. In the following sections, we present the dataset and techniques used in the research. Then, we present our validation results by showing some waveform examples in each hybrid model and using central tendency measures to summarize the errors. We finally conclude by discussing some relevant aspects of the work and ideas for future studies.

## Study Area and Dataset

Our simulation domain is Salton Trough (Fig. 1)—a continental rift basin formed by transtensional forces between the Pacific and North American plate boundaries (Elders *et al.*, 1972). The extensive network of active fault systems (Plesch *et al.*, 2007) and the basins filled with sediments deposited by the Colorado River make this a high earthquake hazard region. A rupture scenario for the Big One on the southern end of the San Andreas fault is often referenced to motivate necessary preparations (Jones *et al.*, 2008). The availability of permanent seismic stations deployed in over 20 networks provides sufficient ground‐motion data easily accessed through the Southern California Earthquake Data Center (SCEDC, 2013) and makes the area an attractive natural laboratory for various seismic studies.

### Earthquake waveforms

We use three‐component broadband ground displacement records from five moderate‐magnitude earthquakes (Fig. 1; Yang *et al.*, 2012). Each seismogram is downloaded from the SCEDC, processed, and analyzed for noise content. Processing involves removing the linear trend, mean, and instrument response, followed by filtering in the 2–30 s, 3–30 s, and 6–30 s period bands. A waveform is selected for use if the signal‐to‐noise ratio is at least three on all components.
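The selection rule can be illustrated with a short sketch. This is not the authors' processing code: the signal‐to‐noise definition (ratio of root‐mean‐square amplitudes after and before an onset sample), the function names, and the toy traces are assumptions made for the example. Only the threshold of three on all components comes from the text.

```python
import math

SNR_THRESHOLD = 3.0  # from the text: keep a record if SNR >= 3 on all components


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr(trace, onset_index):
    """Assumed SNR definition: RMS after the onset divided by RMS before it."""
    noise = rms(trace[:onset_index])
    signal = rms(trace[onset_index:])
    return math.inf if noise == 0.0 else signal / noise


def select_record(components, onset_index, threshold=SNR_THRESHOLD):
    """Return True only if every component meets the SNR threshold."""
    return all(snr(tr, onset_index) >= threshold for tr in components.values())


# Toy three-component record: weak noise followed by a stronger arrival.
record = {
    "E": [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0],
    "N": [0.1, -0.1, 0.1, -0.1, 0.5, -0.5, 0.5, -0.5],
    "Z": [0.1, -0.1, 0.1, -0.1, 0.2, -0.2, 0.2, -0.2],  # SNR of about 2
}
print(select_record(record, onset_index=4))  # False, because Z fails the test
```

Requiring the threshold on every component, rather than on an average, means a single noisy channel removes the whole record, which matches the "on all components" wording.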

### Seismic velocity models

In this study, we consider four Earth models: two regional‐scale models and two basin‐scale models. The regional models are the latest versions of the Southern California Earthquake Center (cvms; Lee *et al.*, 2014) and Harvard (cvmh; Tape *et al.*, 2010) Community Velocity Models (CVMs), both developed using low‐frequency (<0.5 Hz) seismograms and adjoint tomography. The basin‐scale models (purple polygons in Fig. 1) are travel time tomographic models created using a combination of borehole‐explosion data and local earthquakes in Imperial Valley (Persaud *et al.*, 2016) and Coachella Valley (Ajala *et al.*, 2019). The basin models have a maximum depth of 10 km in Coachella Valley and 8 km in Imperial Valley (Fig. S1, available in the supplemental material to this article). Because of the spatial coverage of the active source survey and certainty in source locations, the models inherently provide better constraints on the basin structure in the region. *S*‐wave velocity and density for the basin models are empirically determined (Brocher, 2005). SCEC hosts these models, which are queried using the Unified Community Velocity Model (UCVM) program (Small *et al.*, 2017) and retrofitted with geotechnical layering in the top 350 m (Ely *et al.*, 2010) and high‐resolution (∼30 m) topography.
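The text cites Brocher (2005) for the empirical *S*‐wave velocity and density but does not say which of that paper's relations were applied. As an illustration only, the sketch below assumes the widely used polynomial regressions from that paper (the $V_S$ regression fit and the Nafe-Drake density curve, with $V_P$ in km/s and nominally valid for roughly 1.5 to 8 km/s). The function names are hypothetical.

```python
def brocher_vs(vp):
    """S-wave velocity (km/s) from P-wave velocity (km/s), Brocher (2005) regression fit."""
    return (0.7858 - 1.2344 * vp + 0.7949 * vp**2
            - 0.1238 * vp**3 + 0.0064 * vp**4)


def brocher_density(vp):
    """Density (g/cm^3) from P-wave velocity (km/s), Nafe-Drake curve in Brocher (2005)."""
    return (1.6612 * vp - 0.4721 * vp**2 + 0.0671 * vp**3
            - 0.0043 * vp**4 + 0.000106 * vp**5)


# Tabulate the derived parameters for a few representative P-wave velocities.
for vp in (2.0, 4.0, 6.0):
    vs = brocher_vs(vp)
    rho = brocher_density(vp)
    print(f"Vp={vp:.1f} km/s  Vs={vs:.2f} km/s  rho={rho:.2f} g/cm^3")
```

Because both quantities are derived from $V_P$ alone, any error in the tomographic $V_P$ propagates directly into the $V_S$ and density used for the simulations.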

## Model Hybridization

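Let $\mathbb{R}^n$ be the *n*‐dimensional space occupied by a regional model $R_n(x)$ and $\mathbb{L}^n$ be the space of the local model $L_n(x)$ to be embedded into the regional model with $\mathbb{L}^n \subseteq \mathbb{R}^n$. We define the following blending map:

$$W_n(x) = \begin{cases} \prod_{i=1}^{n} w_i(x_i), & x \in \mathbb{L}^n, \\ 0, & x \in \mathbb{R}^n \setminus \mathbb{L}^n, \end{cases}$$

in which $w_i$ are *n* 1D window functions defined to be cosine tapers in this study with taper ratios in [0, 0.5) and larger numbers indicating a smoother boundary between models (Fig. 2). The hybrid model is then generated as

$$H_n(x) = W_n(x)\,L_n(x) + \left[1 - W_n(x)\right] R_n(x).$$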

We use two different volumes for the local models to merge them with the CVMs. The polygons that indicate the spaces are shown in Figure 1; polygon 1 refers to the entire model domain, whereas polygon 2 refers to the irregular volume in which each model is inferred to be well resolved or to have had good ray coverage during its development. The blending maps shown in Figure 2 are used to make 26 hybrid Earth models: 13 cvmh hybrid models (Fig. 3 and Fig. S2) and 13 cvms hybrid models (Fig. 4 and Fig. S3). We consider three levels of tapering: no tapering (a taper ratio of 0), moderate tapering (a taper ratio of 0.2), and strong tapering (a taper ratio of 0.49).
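A minimal sketch of the embedding step follows. It assumes that the hybrid value is the weighted average $W L + (1 - W) R$, where the weight $W$ is a product of 1D cosine windows inside the local volume and zero outside it. The exact taper definition (the fraction of the local extent tapered at each end), the rectangular footprint, and all function names are assumptions for illustration; the irregular polygon 2 volumes would need a different weight construction.

```python
import math


def cosine_taper(u, ratio):
    """Window value at normalized position u in [0, 1] across the local model.

    `ratio` is the fraction of the extent tapered at each end, in [0, 0.5).
    A ratio of 0 gives a boxcar (no tapering, sharp model boundary).
    """
    if not 0.0 <= ratio < 0.5:
        raise ValueError("taper ratio must lie in [0, 0.5)")
    if u < 0.0 or u > 1.0:
        return 0.0  # outside the local model
    edge = min(u, 1.0 - u)  # normalized distance to the nearest boundary
    if ratio == 0.0 or edge >= ratio:
        return 1.0
    return 0.5 * (1.0 - math.cos(math.pi * edge / ratio))


def blend_weight(point, lower, upper, ratio):
    """Product of 1D windows over all dimensions of a box-shaped local model."""
    weight = 1.0
    for x, lo, hi in zip(point, lower, upper):
        weight *= cosine_taper((x - lo) / (hi - lo), ratio)
    return weight


def hybrid_value(local, regional, weight):
    """Weighted average of the local and regional model values."""
    return weight * local + (1.0 - weight) * regional


# Toy 2D example: the local model occupies [0, 10] x [0, 10] km.
lower, upper = (0.0, 0.0), (10.0, 10.0)
for ratio in (0.0, 0.2, 0.49):  # the three tapering levels in the text
    w = blend_weight((1.0, 5.0), lower, upper, ratio)
    print(ratio, round(hybrid_value(2.0, 3.0, w), 3))
```

With a ratio of 0 the local model fully replaces the regional model up to the boundary, which is where merging artifacts can appear. Larger ratios hand control back to the regional model gradually near the edges.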

## Ground‐Motion Verification

To check the suitability of the hybridization technique for enhancing regional models or producing meaningful seismograms, we compare ground‐motion predictions from the hybrid models against those from the pure regional models and against observations from broadband sensors. We compute synthetic seismograms by forward modeling the full earthquake wavefield using the spectral‐element method (Komatitsch and Tromp, 1999). Each of our 140 simulations is performed in anelastic media. We use the Olsen attenuation equation (Olsen *et al.*, 2003) to generate a frequency‐independent shear quality factor model by scaling the *S*‐wave velocities by 0.05. We do not consider anisotropy or source inversions. We enforce an *S*‐wave velocity cutoff of 600 m/s so that our results are globally valid to the smallest period of 2 s. Topography is included in the simulations, although several approaches for ground‐motion modeling without topography exist (Aagaard *et al.*, 2008). The synthetic traces are filtered in the same period intervals as the data.
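The numbers in this setup can be checked with simple arithmetic, sketched below. Two points are assumptions rather than statements from the text: that the 0.05 scaling applies to *S*‐wave velocity expressed in m/s, and that the 140 simulations correspond to 28 models (two pure regional models plus 26 hybrids) each run for the five events. The function names are hypothetical.

```python
VS_MIN = 600.0      # m/s, S-wave velocity cutoff stated in the text
MIN_PERIOD = 2.0    # s, shortest period analyzed
Q_SCALE = 0.05      # Qs = Q_SCALE * Vs; Vs in m/s is an assumption here


def clamp_vs(vs, vs_min=VS_MIN):
    """Apply the minimum S-wave velocity cutoff (m/s)."""
    return max(vs, vs_min)


def shear_q(vs, scale=Q_SCALE):
    """Frequency-independent shear quality factor from Vs (assumed in m/s)."""
    return scale * vs


def min_wavelength(vs_min=VS_MIN, period=MIN_PERIOD):
    """Shortest S wavelength (m) present at the minimum period."""
    return vs_min * period


n_models = 2 * (1 + 13)   # two regional models, each pure plus 13 hybrids
n_events = 5
print(n_models * n_events)        # 140
print(min_wavelength())           # 1200.0
print(shear_q(clamp_vs(350.0)))   # Q at the velocity floor
```

The velocity floor controls the shortest wavelength the mesh must resolve, so raising it reduces cost at the expense of the slowest near‐surface material.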

We quantify the fit between the predicted data *p* and observed data *d* using waveform misfit measures. The first is the normalized squared error.
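The exact normalization is not reproduced here, so the sketch below assumes one common definition of a normalized squared error, $\sum_t [p(t) - d(t)]^2 / \sum_t d(t)^2$. It is paired with a zero‐lag normalized correlation of the kind summarized in the supplemental material. Both formulas and the function names are assumptions for illustration.

```python
import math


def normalized_squared_error(p, d):
    """Sum of squared residuals divided by the observed-data energy (assumed form)."""
    if len(p) != len(d):
        raise ValueError("traces must have the same number of samples")
    energy = sum(di * di for di in d)
    if energy == 0.0:
        raise ValueError("observed trace has zero energy")
    return sum((pi - di) ** 2 for pi, di in zip(p, d)) / energy


def zero_lag_correlation(p, d):
    """Normalized cross-correlation at zero lag, in [-1, 1]."""
    norm = math.sqrt(sum(pi * pi for pi in p) * sum(di * di for di in d))
    if norm == 0.0:
        raise ValueError("a trace has zero energy")
    return sum(pi * di for pi, di in zip(p, d)) / norm


observed = [0.0, 1.0, 0.0, -1.0, 0.0]
synthetic = [0.0, 0.5, 0.0, -0.5, 0.0]   # right shape, half the amplitude
print(normalized_squared_error(synthetic, observed))  # 0.25
print(zero_lag_correlation(synthetic, observed))      # 1.0
```

The toy traces show why two measures are useful: the squared error penalizes the amplitude mismatch, whereas the zero‐lag correlation is insensitive to it and reports a perfect match in shape and phase.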

## Results

Figures 3 and 4 show all pure and hybrid models with the percentage misfit change and horizontal component 6–30 s waveform examples at select stations inside and outside the embedded basin models. The summary of the validation exercise is shown in Figure 5. The results for the cvmh and cvms hybrid models are markedly different, even though the pure cvms model outperforms the pure cvmh model in the three period intervals.

In the 6–30 s range, model 2 (Fig. 3b) is the only hybrid model that underperforms relative to the pure cvmh model, with model 9 (Fig. 3i) being the best hybrid model. At 3–30 s and 2–30 s, models 2 (Fig. 3b) and 4 (Fig. 3d) underperform, with model 13 (Fig. 3m) producing the best ground motions for both period intervals. The waveform examples for event 5 at stations TOR in Coachella Valley, WES in Imperial Valley, and MONP2 in the Peninsular Ranges show that all the models produce reasonable seismograms, apart from notably increased amplification in the pure cvmh model. We also observe some basin resonance at station TOR in the later surface‐wave arrivals of hybrid models 10 (Fig. 3j) and 14 (Fig. 3n) that is absent in the data. For the hybrid cvms models, in all the period intervals, only hybrid model 10 (Fig. 4j), which embeds just the Imperial Valley basin model, outperforms the pure cvms model. The waveforms for event 3 at IDO, SWS, and BAR are reasonable in all the models, although some of the hybrid models, such as model 2 (Fig. 4b), do not match the amplitudes of some surface‐wave content at station IDO as well as the pure model does. Figures S4 and S5 show waveform examples for event 3 in cvmh and event 5 in cvms, and the entire waveform gallery of the exercise is available in the data repository (Ajala and Persaud, 2022) for perusal.
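The comparisons above rest on a percentage misfit change relative to the pure model and on a central tendency summary of the errors. The sketch below shows one way such numbers could be computed. The formula, the choice of the median as the summary statistic, and the misfit values are all hypothetical and are not taken from the study.

```python
from statistics import median


def percent_misfit_change(hybrid_misfit, pure_misfit):
    """Relative change in misfit (%); negative values mean the hybrid fits better."""
    return 100.0 * (hybrid_misfit - pure_misfit) / pure_misfit


def summarize(misfits):
    """Median misfit over all station-component measurements for one model."""
    return median(misfits)


# Hypothetical per-trace misfits for a pure model and one hybrid model.
pure = [0.80, 1.20, 0.50, 2.00, 1.00]
hybrid = [0.60, 1.10, 0.55, 1.00, 0.90]

change = percent_misfit_change(summarize(hybrid), summarize(pure))
print(f"{change:.1f}%")  # -10.0%
```

A median is less sensitive than a mean to a few poorly fit stations, which matters when a single outlier trace could otherwise decide whether a hybrid model appears to outperform its regional model.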

## Discussion

Recent studies of the community models in the Los Angeles basin highlight the importance of accurate shallow crustal structure, among other parameters, in waveform prediction within sedimentary basins (Lai *et al.*, 2020; Jia and Clayton, 2021). Our hybridization technique allows us to directly test the accuracy of the shallow basin structure in the community models relative to the embedded basin models. We acknowledge that some of our modeling assumptions and simplifications, such as our use of empirical relations for some model parameters, a higher minimum *S*‐wave velocity than is recommended for accurate ground motions in the 0–0.5 Hz range (Olsen *et al.*, 2003), and the lack of source inversion and anisotropy, can lead to incorrect interpretation of the misfits. The ideal verification exercise would also involve a complete wavefield misfit analysis rather than the localized waveform errors used in this study, although the former is currently impracticable because it would require sensors almost everywhere.

### High‐frequency results

One may expect that hybridization would offer the largest model improvements at higher frequencies, yet it is clear from Figures 3–5 that although the trends in the different period bands are similar, all pure and hybrid models have a poorer performance at shorter periods (3–30 s and 2–30 s), and the influence of hybridization is reduced compared to the longer period results at 6–30 s. This may be due to the spatial content of the models under interrogation. The cvmh model was developed using earthquake seismograms dominant in the 6–30 s period, whereas the cvms model used both noise correlograms and earthquake waveforms filtered in the 5–50 s period. Travel time tomographic models are also known to contain low spatial frequencies (Treister and Haber, 2017); so the basin models may not be as helpful in improving ground motions at shorter periods. We anticipate that embedding local full‐waveform tomography models developed with high‐frequency data may produce even better hybrid models at higher frequencies and will investigate this in a future study.

### Model compatibility

Our study shows that most hybrid cvmh models outperform the pure cvmh model, whereas only one hybrid cvms model (Fig. 4j) outperforms the pure cvms model. These results imply that the structure represented in the Coachella Valley basin model can improve the cvmh model, but embedding this basin model degrades the original cvms model. Therefore, we can state that the Coachella Valley basin model is incompatible with the cvms model, unlike the Imperial Valley model. This is another reason why domain‐specific misfit analysis (Figs. S1–S6) is essential when using model hybridization, as it gives the misfit contribution from each embedded model as well as its impact outside its volume. Well‐resolved volumes (polygons 2 in Fig. 1) and strong tapering tend to produce better hybrid models (Fig. 5). In addition, the presence of merging artifacts does not necessarily imply that a hybrid model will underperform relative to the pure model. For example, cvmh hybrid model 5 (Fig. 3e) uses polygon 2 for both basin models without tapering and outperforms tapered hybrid models 6 (Fig. 3f) and 8 (Fig. 3h) at low frequencies. Therefore, finding the well‐resolved volume is just as crucial as tapering away merging artifacts, and both parameters should be seriously considered during hybridization. We further note that the adverse effects of merging artifacts may no longer be negligible at >1 Hz, because the tapered hybrid models in the earlier example (6 and 8 in Figs. 3f and h) eventually outperform the hybrid model with no tapering (5 in Fig. 3e) in the 2–30 s period band. In summary, Figure 5 clearly illustrates the importance of smooth hybridization by showing the possibilities of significant improvements in earthquake ground‐motion prediction provided that compatibility criteria are satisfied.
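A domain‐specific misfit analysis of the kind described above could be organized as in the following sketch, which groups station misfits by whether the station lies inside an embedded model footprint or outside all of them. The ray‐casting polygon test is a standard technique, but the footprints, station coordinates, misfit values, and function names are hypothetical and only illustrate the bookkeeping.

```python
from statistics import median


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def misfit_by_domain(stations, domains):
    """Median misfit per domain; stations in no polygon go to 'outside'."""
    groups = {name: [] for name in domains}
    groups["outside"] = []
    for lon, lat, misfit in stations:
        for name, polygon in domains.items():
            if point_in_polygon(lon, lat, polygon):
                groups[name].append(misfit)
                break
        else:
            groups["outside"].append(misfit)
    return {name: median(vals) for name, vals in groups.items() if vals}


# Hypothetical footprints and station misfits (coordinates are illustrative only).
domains = {
    "basin_a": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "basin_b": [(3.0, 0.0), (5.0, 0.0), (5.0, 2.0), (3.0, 2.0)],
}
stations = [(1.0, 1.0, 0.4), (1.5, 0.5, 0.6), (4.0, 1.0, 1.2), (6.0, 1.0, 0.9)]
print(misfit_by_domain(stations, domains))
```

Comparing the per‐domain summaries between a hybrid model and its pure regional model would indicate which embedded basin model drives an improvement or a degradation, which is the information needed to judge compatibility.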

## Conclusions

We revisit model hybridization to document the effect that the embedding volumes and degree of tapering in hybrid models have on earthquake ground‐motion prediction in Salton Trough. To this end, we consider 26 hybrid models using two basin‐scale and two regional models hosted by the Southern California Earthquake Center. Our model verification uses five earthquakes of moderate magnitude simulated using the spectral‐element method and analyzed over three period intervals with the shortest period of 2 s. In general, all regional and hybrid models we evaluate perform better in the longer period band than in the shorter period bands (<6 s), possibly due to the low‐frequency content in the models. Using well‐resolved subsets of the basin‐scale models and strong tapering tends to produce hybrid models with better waveform predictions. However, sharper boundaries in the hybrid model due to less tapering do not necessarily imply underperformance, especially at low frequencies and when using well‐resolved volumes. Furthermore, the same hybridization approach may not produce better hybrid models regardless of the regional model used, and thus subdomains of hybrid models must be evaluated to ensure their model components are compatible.

## Data and Resources

Reproducibility materials including all data needed to evaluate the research are publicly accessible (Ajala and Persaud, 2022). The supplemental material includes additional details of the simulation results including a summary of the zero‐lag correlation misfits.

## Declaration of Competing Interests

The authors acknowledge that there are no conflicts of interest recorded.

## Acknowledgments

The authors thank Editor Keith Koper, the Associate Editor, and two anonymous reviewers for their comments that helped improve the article. This material is based upon work supported by the National Science Foundation (Grant Number 2105320) and the Southern California Earthquake Center (Award Numbers 18074, 19014, 20023, and 21059). The SCEC Contribution Number is 10950. Rasheed Ajala was supported by the merit‐based Society of Exploration Geophysicists Foundation scholarship. Patricia Persaud was supported as a 2020–2021 fellow of the Radcliffe Institute for Advanced Study at Harvard University.