ABSTRACT
We have developed a deep learning framework to simulate the effect of boundary conditions for wave propagation in anisotropic media. To overcome the challenges associated with the stability of conventional implementation of boundary conditions for strongly anisotropic media, we develop an efficient algorithm using deep neural networks. We incorporate the hyperparameter optimization (HPO) workflow in the deep learning framework to automate the network training process. Hyperparameter selection is a crucial step in model building and has a direct impact on the performance of machine learning models. We implement three different HPO techniques, namely random search, Hyperband, and Bayesian optimization, for simulating boundary conditions and compare the strengths and drawbacks of these techniques. We train the network using a few shot locations and time slices enabling the network to learn how to remove boundary reflections and simulate wave propagation for unbounded media. The automated deep learning framework with HPO improves the performance of deep learning models by achieving the optimal minima and significantly improves the efficiency of the workflow. The benefit of this approach is its simple implementation and significant reduction of reflections at the boundaries, especially in the case of tilted transverse isotropic media. We validate our approach by comparing wave propagation at the boundaries using our algorithm with the output obtained using the unbounded media simulated by padding the model. Tests on different models with acoustic and elastic wave propagation verify the effectiveness of our approach.
INTRODUCTION
Given the finite nature of computational memory, a numerical solution for wave propagation can be obtained only at a finite number of points. Therefore, unbounded domains of the wave equation require truncation by introducing boundaries to obtain finite models (Reynolds, 1978; Ionescu and Igel, 2003). The introduction of boundaries in models leads to reflections and wraparound, whereas, ideally, the wave should propagate through the boundaries with no reflections (Reynolds, 1978). The eventual inward propagation of spurious reflections from the edge of a computational grid masks the true solution for the numerical wave propagation in an infinite media (Clayton and Engquist, 1977). To minimize spurious boundary reflections, boundary conditions must be implemented to avoid numerical instabilities by making the grid boundaries transparent to outward-moving waves (Clayton and Engquist, 1977; Ionescu and Igel, 2003). The conditions need to be such that the solution after implementing the boundary conditions remains as close as possible to the solution obtained using the original physical infinite domain (Nataf, 2013).
Several authors have proposed different sets of boundary conditions for acoustic and elastic media. Engquist and Majda (1977), Clayton and Engquist (1977), and Toldi and Hale (1982) propose absorbing boundary conditions (ABCs) based on paraxial approximations to model outward-moving energy to reduce reflections from the boundaries. Israeli and Orszag (1981) propose ABCs with damping using sponge filters. Berenger (1994, 1996) proposes a perfectly matched later (PML) formulation, which is an absorbing region that can provide reflections that are magnitude lower (Nataf, 2013). Yao et al. (2018) propose an effective ABC for acoustic and elastic wave simulation by simplifying the damping term of the split PML boundary condition. Implementation of low-reflecting boundary conditions onto open boundaries in the anisotropic models is a challenging problem (Bécache et al., 2003; Appelö and Kreiss, 2006; Sofronov and Zaitsev, 2008). Lower order ABCs are easy to implement; however, they are not accurate in anisotropic cases because they generate spurious reflections. Higher order ABCs are accurate, but their implementation is complicated, and they significantly increase computational cost. PMLs, on the other hand, are easier to implement and have a lower computational cost than ABCs; however, they are unstable for different classes of anisotropic media, especially tilted transverse isotropic (TTI) media (Bécache et al., 2003). In addition, the standard implementations of ABCs and PMLs fail in the presence of inverse modes. This is because, for an outgoing inverse-mode approaching the boundary, energy is propagating out with group velocity pointing out; however, the phase velocity is pointing in. The ABC identifies this as an incoming wave and therefore does not let it out. On the other hand, PMLs are not able to select the right outgoing solution. The standard implementations lead to instability in the presence of inverse modes.
In this work, we propose a novel workflow to implement boundary conditions using a deep learning framework. Deep learning algorithms have demonstrated high performance in research and commercial applications (Yu and Zhu, 2020). Different deep learning models are suitable for different types of problem statements and data sets. During the model tuning process, different model configuration arguments known as hyperparameters are specified by the user to tune the network for a given problem statement. Hyperparameters cannot be directly estimated from the data and must be set prior to model training. Selecting the best hyperparameters for models has a direct impact on the performance of the model (Yang and Shami, 2020). Conventionally, hyperparameters are manually tuned by using an iterative trial and error-based approach (Zöller and Huber, 2021). However, manual tuning is inefficient because of a large number of hyperparameters in complex models, nonlinear hyperparameter interactions, and time-consuming model evaluations (Yang and Shami, 2020). These factors have inspired research on the automated tuning of hyperparameters known as hyperparameter optimization (HPO).
HPO has multiple advantages over the manual hyperparameter tuning process (Kaur et al., 2021a,2022). It reduces the considerable human effort to manually tune a large number of hyperparameters, especially for complex models and large data sets. It improves the accuracy and efficiency of the models and makes the models and research more reproducible (Yang and Shami, 2020; Yu and Zhu, 2020). HPO problems need specialized optimization techniques because HPO can sometimes be nonconvex and nondifferentiable; therefore, traditional optimization techniques, such as gradient descent, may result in local instead of global minima (Luo, 2016). In addition, HPO algorithms also should be able to effectively handle different categories of hyperparameters, such as discrete (e.g., whether to use early stopping), continuous (e.g., learning rate), categorical (e.g., type of optimizer), etc.
Several studies have proposed different categories of HPO methods, such as decision-theoretic methods, Bayesian optimization models, and multifidelity optimization techniques. These methods have their own advantages and limitations. Decision-theoretic methods (Bergstra et al., 2011; Bergstra and Bengio, 2012) define a hyperparameter search space and determine the best-performing hyperparameter combination by performing either an exhaustive search using grid-search methods or a random selection using random search methods. These methods treat each parameter configuration independently and may require considerable time and space, especially the exhaustive grid-search methods. Multifidelity optimization algorithms (Li et al., 2017), such as Hyperband, address the limited resource issue and, in each iteration, eliminate poorly performing hyperparameter combinations to save computation time. However, similar to decision-theoretic approaches, Hyperband also performs each evaluation independently of the previous evaluations and can thus perform unnecessary function evaluations in poorly performing regions. This issue can be mitigated by using Bayesian optimization (Eggensperger et al., 2013), which uses the history of previous evaluations to determine the next configuration. Bayesian models belong to the category of sequential models and are difficult to parallelize; however, they can usually converge to optimal hyperparameter combinations within a few iterations.
In this paper, we propose an HPO workflow for a deep learning framework to simulate boundary conditions for wave propagation. We perform a comparative analysis among three different HPO algorithms, namely random search methods, which are decision-theoretic approaches; Hyperband optimization, which is a multifidelity optimization algorithm; and Bayesian optimization. We demonstrate that Bayesian optimization is more efficient and leads to a better performance in the testing phase than random search or Hyperband optimization. We train the network using a few shot locations and time slices to learn a mapping between the bounded model with boundary reflections and the model with expanded boundaries to simulate the effects of boundary conditions. Once the network learns to attenuate the boundary reflections and wrap-around events, we test the proposed approach using different shot locations and time slices of different models that are not a part of training. Using numerical models of increasing complexity, we demonstrate that the proposed approach can effectively simulate the physical behavior of unbounded media for acoustic and elastic wave propagation.
HPO
To achieve an optimal performance with a neural network-based workflow, two different types of parameters need to be tuned. The first category is the model-based parameters, such as weights and biases, which can be initialized and updated through the training process. The second category is the hyperparameters related to the model architecture, which cannot be directly estimated from the data and needs to be set before training the network. Hyperparameters have a direct impact on the neural network training, and their optimization is crucial for achieving an optimal performance. We optimize two different categories of hyperparameters, such as design hyperparameters related to the construction of a deep learning model (number of filters, activation function, etc.) and optimizer hyperparameters related to the training process of the model (learning rate, dropout rate, minibatch size, number of epochs, etc.). For the design-based hyperparameters, we provide several filters and activation functions, and the optimization process selects the best combination based on the complexity of the problem. Next, we specify the search space for the following optimizer hyperparameters.
Learning rate — This is one of the most important hyperparameters in a deep learning model, which defines step size for the convergence of the iterative process. A high learning rate can not only accelerate the training process but also may oscillate the gradients and hinder convergence, whereas a low learning rate leads to smooth convergence with a longer training time. We provide a range of learning rates, as shown in Table 1, and the optimization algorithm selects the appropriate learning rate to converge the objective function to global minima.
Filters | Activation functions | Batch size | Dropout rate | Learning rate |
---|---|---|---|---|
32–64 | Sigmoid ReLU, tanh | 8–16 | 0–0.5 | 0.001–0.01 |
Filters | Activation functions | Batch size | Dropout rate | Learning rate |
---|---|---|---|---|
32–64 | Sigmoid ReLU, tanh | 8–16 | 0–0.5 | 0.001–0.01 |
Dropout rate — Dropout is a method of stochastic regularization which prevents overfitting and enables the neural network to learn robust features (Srivastava et al., 2014). It is a computationally efficient method to simulate an ensemble of neural networks by randomly dropping neurons during the training process. We use a dropout range of 0.0–0.5 with a step size of 0.05, and the optimization algorithm finds the optimal dropout rate of the neural network training.
Batch size — Batch size is the number of training samples used to estimate the gradient. The selection of batch size impacts training stability and generalization performance (Bengio, 2012). Smaller batch size offers a regularization effect and lowers generalization error. We incorporate different batch sizes in the search space for the optimization algorithm, and the workflow automatically selects the optimal batch size for training.
The number of epochs — This is an important hyperparameter because too many epochs can lead to overfitting of the model during the training, whereas too few epochs result in an underfitted model. Early stopping is an optimization technique that helps to mitigate overfitting without compromising model accuracy. We apply an early stopping strategy to train the network until the point when the loss function update becomes small and the validation error is still decreasing.
Activation function — The activation function is a critical part of the neural network design. It defines the mapping of the weighted sum of input from different layers of the network to an output and adds nonlinearity to the network. It is crucial to select the correct activation function in the hidden layers as well as in the output layer for an optimal network performance.
We initiate the optimization process using the previously defined search space, and we optimize the hyperparameters using three optimization algorithms:
Decision-theoretic approach — The first algorithm that we implement is the random search, which is a type of decision-theoretic method. The random search algorithm randomly selects the hyperparameter configurations in the search space defined by the user and trains these candidate values until the defined trials are exhausted.
It can cover a large search space; however, it treats each evaluation independently and does not keep track of the previously well-performing regions. The computation complexity of the random search algorithm is . We implement a random search with 20 trials, and we discuss the test results of the best-performing model in the next section.
Multifidelity optimization — We further improve the efficiency of the random search algorithm by using multifidelity optimization. One of the issues with the random search method is that, with a larger hyperparameter configuration space, the longer execution time may not let the algorithm converge to optimal minima if the resources are limited. To overcome this limitation, we implement the Hyperband algorithm, which is a type of multifidelity optimization. It achieves a trade-off between the allocated resources (B) and the number of hyperparameter configurations (n) by dividing the budget into n parts and allocating these parts to different configurations where . Successive halving is then applied to each of these configurations to discard the poorly performing configurations in each iteration and pass only well-performing configurations to the next iteration (Li et al., 2017; Yang and Shami, 2020).
Given the number of data points, minimum and maximum budget constraints are determined as and , respectively, along with the number of instances for training. Next, on the basis of and , the number of configurations n and budget size for each configuration are calculated. The algorithm begins with the most aggressive bracket in which the bracket is referred to each run of successive halving within the Hyperband. Based on n and b, the configurations are sampled and passed to the successive halving model in which each bracket performs a random search operation. The poorly performing configurations are discarded by the successive halving method and well-performing configurations are passed-on to the next iteration. This process is repeated until an optimal hyperparameter configuration is achieved. The incorporation of the successive halving method makes the computational complexity of the Hyperband method . We compare the test results in the next section.
Bayesian optimization — Previously mentioned algorithms independently process each hyperparameter configuration and may thus end up wasting a significant amount of computational time in poorly performing regions of search space. To circumvent this issue, we implement Bayesian optimization. Unlike previously mentioned algorithms, Bayesian optimization determines the next hyperparameter configuration based on previously tested configurations and thus, given appropriate resources, it assures convergence toward the optimal configuration (Yang and Shami, 2020). Bayesian optimization has two key components: a surrogate model that fits observed points into the objective function and an acquisition function that provides a trade-off between exploration and exploitation regions. We first obtain the predictive distribution for the probabilistic surrogate model. Next, we use the acquisition function for exploration, in which we sample instances in regions that have not been previously sampled, as well as exploitation, in which we sample the most promising regions with a higher likelihood of global optimum. The balance between the exploration and exploitation regions helps the Bayesian optimization to detect the optimal regions and, at the same time, avoid missing the better configurations in unexplored regions. Bayesian optimization can detect the optimal hyperparameter configuration based on the previously tested values; however, dependence on the previously tested values makes the model sequential and difficult to parallelize. The computational complexity of the Bayesian optimization with the Gaussian process is .
DEEP LEARNING FRAMEWORK
For acoustic wave propagation, we implement the HPO workflow using a deep learning framework with a simplified U-Net architecture, as shown in Figure 1, which is a modified version of the architecture from Ronneberger et al. (2015). U-Net architecture is an encoder-decoder architecture that is U-shaped. It consists of the following elements:
Encoder network — The left side of the network is the encoder block, which also is known as the contraction path. It is a feature extractor that learns the abstract representation of the input image through a sequence of encoder blocks. The contraction path consists of convolution layers followed by rectified linear units (ReLUs) and max-pooling operations. The ReLU activation function incorporates nonlinearity into the network, and the output of ReLU is connected with the corresponding decoder block using skip connections. At each contraction step, the number of feature channels is doubled.
Decoder network — The right side of the network is the decoder block or the expansion path. During the expansion process, the network takes the abstract representation from the encoder and generates the output. It upsamples the feature map and concatenates it with the feature map of the contraction path using skip connections. We do not include the upconvolution layer in the upsampling path as used in the original U-Net framework by Ronneberger et al. (2015).
Skip connections — Skip connections help to pass features from the contraction path to the expansion path and help to recover the spatial information lost during the downsampling operation. The spatial information provided by skip connection helps the decoder to effectively recover fine-grained details when producing the output.
We train the network in two steps: HPO and parameter tuning. We create training labels using different isotropic and anisotropic models with different source wavelet frequencies using 2500 time slices from five shot locations at the corner and in the center of the model. We train the network to learn the mapping between the bounded model consisting of spurious reflections and the padded model used to simulate the unbounded domain, as shown in Figure 2a and 2b. We first define hyperparameters for the deep learning model, along with the search space (Table 1). Next, we implement HPO workflow using random search, Hyperband, and Bayesian optimization workflows with 20 trials. After optimizing the hyperparameters, we pick the best hyperparameter configuration from the leaderboard of models, as shown by the green line in Figure 3, and we use it to tune the parameter for network training. We incorporate K-fold cross validation (CV) in the training process to analyze the generalization ability of the proposed model. We train the network for 25 epochs selected using an early stopping strategy. During training, the network learns to attenuate the spurious boundary reflections, and we test the trained network with the best hyperparameter configuration using the three optimization algorithms and perform a comparative analysis during the testing phase in the examples shown in the next section.
For elastic wave propagation, we implement a deep learning framework using generative adversarial networks (GANs) (Goodfellow et al., 2014). The network consists of two deep neural networks: a generator (G) and a discriminator (D) trained in an adversarial way to solve the minimax problem. The G samples the input distribution and maps it to the target distribution. The generated distribution, along with real labels, is fed to D, which attempts to distinguish how close the two distributions are. The two networks improve their respective abilities until the equilibrium point is reached, where D is no longer able to distinguish between real and synthesized images. In the GAN framework, D informs G to adjust the synthesized output toward the more realistic target labels. The elements of G and D are explained as follows.
The G consists of three different blocks, encoder, transformation block, and decoder. The encoder block consists of multiple convolution layers, in which each layer extracts progressively higher level features from the source domain image. Next, the transformation block takes the features from the encoder block and maps the features from the source domain to the target domain. It consists of ResNet blocks, which adds the residue of input to the output and ensures that the output does not deviate much from the original input by retaining the characteristics of the input, such as the shape and size of the object (Kaur et al., 2020). Finally, the decoder block builds the low-level features back from the feature vector by upsampling.
We train the network with isotropic, transverse isotropic (TI), and TTI models with different source wavelet frequencies using five shot locations (one in the center and four on the corners of the model) and 1250 time slices from each model. We use different time slices of the wave propagation in the bounded model with spurious reflections from boundaries as an input to the network, and the target labels are corresponding time slices in a padded model used to simulate the effect of unbounded media. During the training phase, the network learns to remove the boundary reflections and make the grid perimeter transparent to the outgoing wave propagation. We test the trained network using different time slices and shot locations of models that are not a part of the training.
NUMERICAL EXAMPLES
Gradient velocity model
We first use a synthetic model with large velocity variations, as shown in Figure 4a. The model is discretized on a 400 × 400 grid with a 5 m spacing along the vertical and horizontal directions (Sun et al., 2016). We simulate the wavefield using a low-rank extrapolation operator from Fomel et al. (2013b) with a 20 Hz Ricker wavelet and a time step of 2 ms. A blind-test slice is shown in Figure 4b, and the ground truth (not seen by the network) is shown in Figure 4c. We test the trained network to analyze the output using the three HPO algorithms, namely random search, Hyperband, and Bayesian optimization, as shown in Figure 4d–4f with a constrained number of trials. The test output using the random search algorithm needs further improvement, as shown in regions marked by the red arrow in Figure 4d. Next, we analyze the test results using the Hyperband algorithm, as shown in Figure 4e. Hyperband eliminates poorly performing hyperparameter configurations in each run; therefore, in a given number of trials, it can converge to near-optimal minima, as opposed to the random search method. However, we still have some remnant spurious reflections, as shown by red arrows in Figure 4e. Next, we test the Bayesian optimization model, as shown in Figure 4f. In the same number of trials as the other two methods, the Bayesian optimization model converges to an optimal minima and the output is close to the ground truth shown in Figure 4c. To quantify the output using the three optimization methods, we compute the structural similarity index (SSIM) (Wang et al., 2004) and correlation coefficient (), as shown in Table 2 for the test cases. SSIM and values indicate that the Bayesian optimization determines a more optimal hyperparameter configuration because it takes into account previous evaluations, and with each iteration, it moves closer to the optimal minima. We discuss the performance metrics in detail in the subsequent section.
Metrics | Random search | Hyperband | Bayesian |
---|---|---|---|
SSIM | 0.87 | 0.95 | 0.99 |
0.87 | 0.96 | 0.98 |
Metrics | Random search | Hyperband | Bayesian |
---|---|---|---|
SSIM | 0.87 | 0.95 | 0.99 |
0.87 | 0.96 | 0.98 |
Marmousi model
We test the proposed algorithm on the Marmousi model (Versteeg, 1994) using an acoustic wave propagation. The model is based on the North Quenguela trough in the Cuanza Basin and is an interesting model for analyzing the complexities in wave propagation (Fomel et al., 2013a). The model is discretized on a 376 × 384 grid with a spacing of 25 m along the vertical and horizontal directions (Kaur et al., 2019b,2021a). We simulate the wavefield using a low-rank extrapolation operator (Fomel et al., 2013b; Sun et al., 2016) with a 15 Hz Ricker wavelet and a time step of 2 ms. The test wavefields with spurious reflections for two different shot locations are shown in Figure 5a and 5d. The output using the Bayesian optimization workflow is shown in Figure 5b and 5e, which removes boundary reflections and wraparounds and is close to the output using the unbounded media simulated by the padded model, as shown in Figure 5c and 5f.
Two-layer TI model
For the elastic wave-propagation case, we start with a two-layer TI model previously used by Cheng and Fomel (2014) and Kaur et al. (2019a, 2021b). The model is discretized on a 400 × 400 grid. The top layer of the model is the VTI medium with , , , and . The bottom layer is a TTI medium with , , , and and a tilt of . We simulate wave propagation using a 15 Hz Ricker wavelet and low-rank extrapolation operator. We test the trained network with different time slices and source location that are not a part of the training. Figure 6a and 6d shows one of the test slices of the horizontal and the vertical components of the wavefield with spurious reflections from the boundaries. The network applies learned weights during the training process to simulate the effect of boundary conditions and output (Figure 6b and 6e). The output of the proposed method is close to the output using unbounded media (simulated by padding the bounded media), as shown in Figure 6c and 6f.
BP 2007 TTI model
Our last example of simulating boundary conditions using elastic wave propagation is the BP 2007 TTI model. We have shown the application of the proposed method using a part of the BP 2007 TTI model consisting of salt. The velocity model, along with the Thomsen coefficients, is shown in Figure 7a–7d. The model is discretized on a 400 × 400 grid with a vertical and horizontal spacing of 6.25 m. This example is interesting because the conventional low-order implementation of ABCs and PMLs is challenging for TTI media. In the TTI case, instabilities may appear in layers with PMLs owing to exponentially increasing modes, which eventually degrades the reverse time migration output. Therefore, the proposed algorithm is especially important for TTI cases. We input horizontal and vertical components of the elastic wavefield shown in Figure 8a and 8d with boundary reflections into the network. The trained network applies the boundary conditions and outputs (Figure 8b and 8e) with the removal of reflections from the boundary. The output using the proposed method is close to the reference output using the unbounded media simulated by padding the model, as shown in Figure 8c and 8f. We use two different metrics — the SSIM and correlation coefficient (Table 3) — to show that the output of the proposed algorithm is close to the reference time slices for unbounded media simulated by padding the model.
Models | (SSIM) | () | (SSIM) | () |
---|---|---|---|---|
Marmousi | 0.999 | 0.962 | 0.999 | 0.962 |
Two-layer TI | 0.985 | 0.995 | 0.989 | 0.997 |
BP TTI | 0.981 | 0.995 | 0.985 | 0.995 |
Models | (SSIM) | () | (SSIM) | () |
---|---|---|---|---|
Marmousi | 0.999 | 0.962 | 0.999 | 0.962 |
Two-layer TI | 0.985 | 0.995 | 0.989 | 0.997 |
BP TTI | 0.981 | 0.995 | 0.985 | 0.995 |
Marmousi model output is with acoustic wave propagation; therefore, and are similar in the table.
PERFORMANCE METRIC
To quantify the performance of the proposed workflow, we compute different performance metrics during the training and the test phase. During the training phase, we compute CV values (Hastie et al., 2009) for HPO. It is an evaluation technique that provides a measure of how well the algorithm will generalize on the unseen data. It also serves as a resampling technique to evaluate the model with limited data samples. Traditionally, we remove a part of the training data for validation, but with limited data this may pose an underfitting issue. This issue can be addressed by CV, especially K-fold CV, which ensures that every sample from the original data set has the chance of appearing in the training data. We split the data into K folds. For this work, we choose , which, as shown by Kuhn and Johnson (2013), has low bias and modest variance. We fit the model using folds and validate the data with the remaining Kth fold. We repeat this process till every Kth fold serves as the test set, and we note the error for each fold. The mean of errors from all of the iterations gives us the CV performance metric given as . We obtain values of 0.0027, 0.0023, 0.0035, 0.0027, and 0.0029 for different folds with an average value of 0.0028, which indicates that we do not have a split bias and the network generalizes well on the unseen data.
CONCLUSION
We have developed an automated hyperparameter optimized deep neural network-based workflow for the application of boundary conditions during wave propagation in anisotropic media. The idea is to train the network with a few shot locations and time slices such that the network learns to attenuate boundary reflections and Fourier wraparounds on model boundaries. We train the network using the best picked model from the HPO workflow and perform a comparative analysis between three different optimization algorithms: random search, Hyperband, and Bayesian optimization. We demonstrate that, for a given search space and computational resources, Bayesian optimization performs better than random search or Hyperband because it takes into account the previous evaluation for each hyperparameter configuration, which helps convergence within a few iterations. The trained network with optimal hyperparameter configuration can then be applied to test shots with different shot locations and models that are not a part of training. The proposed approach is stable and simulates the effect of higher order ABCs at a fraction of the cost of conventional implementation in strongly anisotropic media, especially TTI media, thus having a great potential for application in reverse time migration. The proposed algorithm reduces manual input into the training process, optimizes values of hyperparameters for a deep learning framework, and thus maximizes the predictive accuracy of deep learning models. We show the application of the proposed workflow for simulating the boundary condition for wave propagation; however, the proposed HPO workflow can easily be integrated into different deep neural networks for different problem statements.
ACKNOWLEDGMENTS
We thank the sponsors of Texas Consortium of Computational Seismology (TCCS) for financial support.
DATA AND MATERIALS AVAILABILITY
Data associated with this research are available and can be obtained by contacting the corresponding author.
REFERENCES
Biographies and photographs of the other authors are not available.