## ABSTRACT

We have recast the forward pass of a multilayered convolutional neural network (CNN) as the solution to the problem of sparse least-squares migration (LSM). The CNN filters and feature maps are shown to be analogous, but not equivalent, to the migration Green’s functions and the quasi-reflectivity distribution, respectively. This provides a physical interpretation of the filters and feature maps in a deep CNN in terms of the operators for seismic imaging. Motivated by the connection between sparse LSM and CNNs, we adopt the neural network version of sparse LSM. Unlike the standard LSM method, which finds the optimal reflectivity image, neural network LSM (NNLSM) finds both the optimal quasi-reflectivity image and the quasi-migration Green’s functions. These quasi-migration Green’s functions, which are similar to migration Green’s functions, are also denoted as the convolutional filters in a CNN. The advantages of NNLSM over standard LSM are that its computational cost is significantly lower and that it can be used to remove coherent and incoherent noise from migration images. Its disadvantage is that the NNLSM quasi-reflectivity image is only an approximation to the actual reflectivity distribution. However, the quasi-reflectivity image can be used as an attribute image for high-resolution delineation of geologic bodies.

## INTRODUCTION

Deep convolutional neural networks (CNNs) have been recently used for solving geophysical problems, such as seismic first-arrival picking (Lu and Feng, 2018; Yuan et al., 2018; Hu et al., 2019), seismic interpretation (Shi et al., 2019; Wu et al., 2019a, 2019b), and seismic imaging and inversion (Xu et al., 2019; Kaur et al., 2020; Sun et al., 2020). Interpretation examples include salt classification (Waldeland et al., 2018; Shi et al., 2019), fault detection (Huang et al., 2017; Xiong et al., 2018; Wu et al., 2019b; Zheng et al., 2019), reservoir characterization (Karimpouli et al., 2010; Cao and Roy, 2017; Zhu et al., 2017), and seismic lithofacies classification (Ross and Cole, 2017; Liu et al., 2019) with semisupervised learning (Di et al., 2020). Other applications include the use of neural networks for well-log interpolation (Saggaf and Nebrija, 2003; Salehi et al., 2017; Pham et al., 2020), seismic-data interpolation (Mandelli et al., 2018, 2019; Wang et al., 2020), velocity-model building (Araya-Polo et al., 2018; Richardson, 2018), well-log ties (Bader et al., 2019), synthetic well-data generation (Rolon et al., 2009), autoencoders for unsupervised facies analysis (Qian et al., 2018), and supervised horizon tracking (Peters et al., 2019a, 2019b). Recently, an unsupervised autoencoder method with regularization was developed by Shi et al. (2020) to track target horizons.

There are many types of neural network or machine-learning methods, including generative adversarial networks for seismic interpolation (Siahkoohi et al., 2018), residual networks for traces missing at regular intervals (Wang et al., 2019), Monte Carlo and support vector regression for data interpolation (Jia et al., 2018), autoencoders for tracking target horizons (Shi et al., 2020; Wang et al., 2020), and recurrent neural networks for well-log interpolation (Pham et al., 2020). The design of geophysical CNN architectures is largely based on empirical evidence from computer vision research, insights from the principles of artificial intelligence, and heuristic experimentation. Heuristic experimentation is most often used to set the architecture parameters: the number of CNN layers, the number of filters per layer, the size of the filters, the type of activation functions, the number of skip layers, and whether a layer acts as an encoding or decoding operation. This approach has merits and liabilities. The merit is that trial and error with different architecture parameters is likely to give excellent results for a particular data set; the liability is that the resulting architecture might not be the best one for a different data set. This shortcoming in using empirical tests for parameter selection largely results from the absence of a rigorous mathematical foundation (Papyan et al., 2016, 2017a) for neural networks in general, and CNNs in particular.

To partly mitigate this problem for CNN-based imaging algorithms, we now present a physical interpretation of CNN filters and feature maps in terms of physics-based operators for seismic imaging. With such an understanding, we can use a physics-based rationale for the better design of CNN architectures in seismic migration and inversion.

The sparse constraint problem defined in equation 1 is commonly seen in geophysics, e.g., least-squares migration (LSM) with a sparsity constraint. LSM is an important seismic-imaging technique that produces images with better balanced amplitudes, fewer artifacts, and better resolution than standard migration (Lailly, 1983; Tarantola, 1987; Schuster, 1993; Chavent and Plessix, 1999; Nemeth et al., 1999; Duquet et al., 2000; Feng and Schuster, 2017; Schuster and Liu, 2019). The sparsity constraint is one of the important regularization terms used in solving the ill-conditioned least-squares problem (Sacchi and Ulrych, 1995; De Roeck, 2002; Kühl and Sacchi, 2003; Wang and Sacchi, 2005), and sparse LSM (SLSM) has been demonstrated to be effective for mitigation of aliasing artifacts and crosstalk in LSM (Wang and Sacchi, 2007; Herrmann et al., 2009; Dutta, 2017; Witte et al., 2017; Li and Gao, 2018). Image-domain SLSM finds the optimal reflectivity $m^*$ with sparse constraints to minimize the objective function $\|\Gamma m - m^{mig}\|_2^2$, where $\Gamma$ is the Hessian matrix and $m^{mig}$ is the migration image. Here, we can see that SLSM shares the same problem as that defined in equation 1. Following the work of Papyan et al. (2016), we show that the sparse solution to the LSM problem reduces to the forward modeling operations of a multilayered neural network. The CNN filters and feature maps are shown to be analogous, but not equivalent, to the migration Green’s functions (Hessian) and the reflectivity distribution.

The standard SLSM algorithm needs to solve the wave equation, which is time-consuming. Motivated by the connection between SLSM and CNN, we propose the neural network version of sparse LSM, which does not need to solve the wave equation for the backpropagation of residuals and is therefore faster than standard SLSM. Instead of just finding the optimal reflectivity $m^*$, we optimize for both the quasi-reflectivity $m$ and the quasi-migration Green’s functions $\Gamma$. These quasi-migration Green’s functions approximate the role of the migration Green’s function (Schuster and Hu, 2000) and are denoted as the convolutional filters in a CNN. As discussed in Appendix A, the migration Green’s function is the point scatterer response of the migration operator. The final image is denoted as the neural network least-squares migration (NNLSM) estimate of the quasi-reflectivity distribution that honors the $L_1$ sparsity condition.
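To make the SLSM objective concrete, the sketch below solves $\min_m \|\Gamma m - m^{mig}\|_2^2 + \lambda \|m\|_1$ with the iterative soft-thresholding algorithm (ISTA). This is an illustration with a generic dense $\Gamma$ and an illustrative step size, not the implementation used in this paper:

```python
import numpy as np

def soft_threshold(x, tau):
    # Soft thresholding: the proximal operator of the L1 norm
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def slsm_ista(Gamma, m_mig, lam=0.1, n_iter=200):
    """Minimize ||Gamma m - m_mig||_2^2 + lam*||m||_1 by ISTA."""
    t = 1.0 / np.linalg.norm(Gamma, 2) ** 2   # step size from the spectral norm
    m = np.zeros(Gamma.shape[1])
    for _ in range(n_iter):
        r = Gamma @ m - m_mig                 # image-domain residual
        m = soft_threshold(m - t * (Gamma.T @ r), t * lam)
    return m
```

With $\Gamma$ equal to the identity, the iterate reduces to soft thresholding of the migration image itself, which illustrates how the sparsity constraint shrinks weak artifacts toward zero.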

The next section shows the connection between the multilayer neural network and the solution to the multilayer NNLSM problem. This is followed by numerical examples with synthetic models and field data from the North Sea.

## THEORY OF NEURAL NETWORK LSM

The theory of standard image-domain LSM is first presented to establish the benchmark solution in which the optimal reflectivity function minimizes the image misfit under the $L_2$ norm. This is then followed by the derivation of the SLSM solution for a single-layer network. The final two subsections derive the NNLSM solution for single-layer and multilayer networks, respectively.

### Least-squares migration

### SLSM

### NNLSM

We now propose the neural network version of SLSM that finds the $\Gamma^*$ and $m^*$ that minimize equation 5, which is equivalent to the convolutional sparse coding problem. We denote the optimal solution $m^*$ as the NNLSM image. Here, we assume that the migration image $m^{mig}$ can be decomposed into components that have the form $\Gamma_1 m_1$, where $m_1$ represents a sparse quasi-reflectivity structure for the first CNN layer in Figure 1 and $\Gamma_1$ has a convolutional structure. The solution can be found by the alternating direction method of multipliers, either in the Fourier domain (Heide et al., 2015) or in the space domain (Papyan et al., 2017b), which alternates between finding $\Gamma^*$ (the dictionary-learning problem) and then finding $m^*$ (the sparse-pursuit problem). The key difference between least-squares sparse inversion and NNLSM is that $\Gamma$ is not computed by numerical solutions to the wave equation. Instead, the coefficients of $\Gamma$ are computed by the usual gradient-descent learning algorithm of a CNN. For this reason, we denote $m^*$ as the quasi-reflectivity distribution and $\Gamma^*$ as the quasi-migration Green’s function.
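The alternation can be sketched for a 1D image with a single filter. The proximal-gradient and least-squares steps below stand in for the ADMM solvers cited above, and all variable names, step sizes, and iteration counts are illustrative assumptions rather than the authors' code:

```python
import numpy as np

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def nnlsm_1d(m_mig, filt_len=7, lam=0.05, n_outer=100, lr=0.05):
    """Toy alternating descent for the single-filter NNLSM objective
    min_{gamma, m} ||conv(gamma, m) - m_mig||_2^2 + lam * ||m||_1."""
    rng = np.random.default_rng(0)
    gamma = rng.standard_normal(filt_len)
    gamma /= np.linalg.norm(gamma)                # unit-norm filter
    m = np.zeros(m_mig.size)
    for _ in range(n_outer):
        # Sparse-pursuit step: one proximal-gradient (ISTA) update of m
        r = np.convolve(m, gamma, mode="same") - m_mig
        m = soft(m - lr * np.convolve(r, gamma[::-1], mode="same"), lr * lam)
        # Dictionary-learning step: least-squares fit of the filter taps,
        # using the shifted copies of m as the regression basis
        A = np.stack([np.convolve(m, np.eye(filt_len)[k], mode="same")
                      for k in range(filt_len)], axis=1)
        g_new, *_ = np.linalg.lstsq(A, m_mig, rcond=None)
        scale = np.linalg.norm(g_new)
        if scale > 1e-12:
            gamma = g_new / scale                 # renormalize the filter and
            m = m * scale                         # rescale m so conv(m, gamma) is unchanged
    return gamma, m
```

The filter is constrained to unit norm to remove the scale ambiguity between $\Gamma$ and $m$; the learned `gamma` plays the role of the quasi-migration Green's function and `m` the quasi-reflectivity.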

Appendix C provides the general solution for NNLSM for a single-layer neural network, where the optimal $\Gamma^*$ is composed of the quasi-migration Green’s functions, which are denoted as convolutional filters in machine-learning terminology (Liu and Schuster, 2018, 2019). Each filter is used to compute a feature map that corresponds to a subimage of quasi-reflection coefficients in the context of LSM.


The advantage of NNLSM is that only inexpensive matrix-vector multiplications are used and no expensive solutions to the wave equation are needed for backward and forward wavefield propagation. As will be seen later, convolutional filters that appear to be coherent noise can be excluded for denoising the migration image.

### Multilayer NNLSM

Multilayer NNLSM is a natural extension of single-layer NNLSM. For NNLSM, the migration image $m^{mig}$ can be expressed as $m^{mig} = \Gamma_1 m_1$ (Figure 2a), where there are $k_0$ filters in $\Gamma_1$ and $k_0$ sub-quasi-reflectivity images in $m_1$. Following Sulam et al. (2018), we can cascade this model by imposing a similar assumption on the sparse representation $m_1$, i.e., $m_1 = \Gamma_2 m_2$, for a corresponding convolutional matrix $\Gamma_2$ with $k_1$ local filters and a sparse sub-quasi-reflectivity image $m_2$, as depicted in Figure 2b. In this case, the filter size is $n_1 \times k_0$ and there are $k_1$ sub-quasi-reflectivity images in $m_2$.
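The cascade has a simple algebraic consequence: because each $\Gamma_i$ acts by convolution, the effective two-layer basis functions are convolutions of the layer filters (the "molecules" built from "atoms" in Sulam et al.'s terminology). A 1D toy check, with made-up filters and spike positions:

```python
import numpy as np

# Toy two-layer convolutional model: m_mig = Gamma1 (Gamma2 m2),
# where each Gamma_i acts by convolution with a local filter.
gamma1 = np.array([1.0, -2.0, 1.0])            # layer-1 "atom"
gamma2 = np.array([0.5, 1.0, 0.5])             # layer-2 filter
m2 = np.zeros(32)
m2[10], m2[22] = 1.0, -0.7                     # sparse deepest-layer code

m1 = np.convolve(m2, gamma2, mode="full")      # intermediate quasi-reflectivity
m_mig = np.convolve(m1, gamma1, mode="full")   # reconstructed migration image

# Equivalently, the effective basis function is the convolution gamma1 * gamma2:
molecule = np.convolve(gamma1, gamma2, mode="full")
m_mig_direct = np.convolve(m2, molecule, mode="full")
assert np.allclose(m_mig, m_mig_direct)        # associativity of convolution
```

The single effective filter `molecule` is longer than either layer filter, which is why deeper layers capture larger scale structures.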

## NUMERICAL RESULTS

We now present numerical simulations of NNLSM. Instead of only determining the optimal reflectivity $m$ as computed by SLSM, the NNLSM method computes both the quasi-reflectivity $m$ and the elements of the Hessian matrix $\Gamma = L^T L$. Each block of $\Gamma$ is interpreted as the segment spread function (SSF) of the migration operator rather than the point spread function (PSF). If the actual Green’s functions are used to construct $\Gamma$, then each column of the Hessian matrix is the point scatterer response of the migration operator (Schuster and Hu, 2000). In contrast, the NNLSM Hessian is composed of blocks, where each block is the segment scatterer response of the migration operator. An example will be shown later where reflections from a segment of the reflector are migrated to give the migration segment response of the migration operator. The computational cost for computing SSFs is several orders of magnitude less than that for PSFs because no solutions to the wave equation are needed. However, the penalty is that the resulting solution $m$ is not the true reflectivity, but a sparse representation of it, which we denote as the quasi-reflectivity distribution.

Using the terminology of neural networks, we can also denote the sparse sub-quasi-reflectivity images as feature maps. Each block in $\Gamma$ will be denoted as a filter. Therefore, the vector output of $\Gamma m$ can be interpreted as a sum of filter vectors $\gamma_i$ weighted by the coefficients in $m$, where $\gamma_i$ is the $i$th column vector of $\Gamma$.
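This weighted-sum interpretation is just the column expansion of a matrix-vector product. A small numerical check with arbitrary numbers:

```python
import numpy as np

Gamma = np.array([[1.0, 0.0],
                  [2.0, 1.0],
                  [0.0, 3.0]])   # columns gamma_1, gamma_2 act as the "filters"
m = np.array([0.5, -1.0])        # quasi-reflectivity coefficients

# Gamma @ m equals the sum of the columns gamma_i weighted by m_i:
expansion = sum(m[i] * Gamma[:, i] for i in range(m.size))
assert np.allclose(Gamma @ m, expansion)
```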

### Three-layer velocity model

The interpretation of feature maps and filters can be understood by computing them for the Figure 3a model. The grid size of the model is $101 \times 101$, and the grid interval is 10 m in the *x*- and *z*-directions. There are 26 shots evenly spaced at a distance of 40 m on the surface, and each shot is recorded by 101 receivers with a sampling interval of 10 m. Figure 3b shows the reverse time migration (RTM) image.

The first test is for a 1D model in which we extract the image located at $X = 0.5$ km, which is displayed as the blue curve in Figure 3c. The red curve in Figure 3c represents the reflectivity model. Assume that there is only one filter in $\Gamma$ and that it extends over a depth range of 400 m (41 grid points). We now compute the NNLSM image by finding the optimal $m$ and $\Gamma$ with the two-step iterative procedure denoted as the alternating-descent method (Liu and Schuster, 2018; Liu et al., 2018). The computed filter $\gamma_i$, shown in Figure 4a, has a nonzero phase. If we use a filter with a nonzero time lag to calculate its feature map $m$, the phases of the feature map and the true reflectivity $m$ will differ, so we need to correct the time lag and polarity of the basis function $\tilde{\Gamma}_i$. The modified basis function, i.e., filter, is shown in Figure 4b, and its coefficients are displayed as the blue curve in Figure 4c. Compared with the true reflectivity $m$ (the red curve in Figure 4c), the feature map recovers the correct positions of the reflectivity distribution but can give incorrect values.

Next, we perform a 2D test in which the input is the 2D migration image in Figure 3b. Three $35 \times 35$ (grid point) filters are learned (see Figure 5a). The modified filters are shown in Figure 5b. Appendix D describes how we align the filters by using the crosscorrelation method. The feature maps of these three filters are displayed in Figure 6a–6c. Figure 6d shows the sum of these three feature maps. It is evident that the stacked feature maps can estimate the correct locations of the reflectivity spikes.

### SEG/EAGE salt model

The multilayer NNLSM procedure (see equation 12) is now applied to the migration image associated with the 2D SEG/EAGE salt velocity model in Figure 7a. The grid size of the model is 101 grid points in the *z*- and *x*-directions. The grid interval is 40 m in the *x*-direction and 20 m in the *z*-direction. Figure 7b shows the RTM image. The multilayer NNLSM consists of three convolutional layers: the first contains 15 basis functions, i.e., filters, of size $11 \times 11$ grid points; the second consists of 15 basis functions with dimensions $11 \times 11 \times 15$; and the third contains 15 basis functions with dimensions $11 \times 11 \times 15$. Equation 12 is solved for $m_i$ and $\Gamma_i$ ($i \in \{1, 2, 3\}$) by the iterative alternating-descent method. The multilayered structure is shown in Figure 8, where the black dots in $m_i$ represent the nonzero values of the quasi-reflectivity distribution. The effective basis functions computed for these layers are shown in Figure 7c–7e, where the yellow, red, and green boxes indicate the sizes of the basis functions, which can be considered as quasi-migration Green’s functions. These figures indicate that the basis functions of the first layer $\Gamma_1$ contain very simple small-dimensional edges, which are called “atoms” by Sulam et al. (2018). The nonzero coefficients of the second group of basis functions $\Gamma_2$ combine a few atoms from $\Gamma_1$ to create slightly more complex edges, junctions, and corners in the effective basis functions in $\Gamma^{(2)}$. Finally, $\Gamma_3$ combines molecules from $\Gamma^{(2)}$ to reconstruct the more complex parts of the migration image. The corresponding stacked coefficient images, also known as feature maps, are shown in Figure 7f–7h, which give the quasi-reflectivity distributions. The reconstructed migration images are shown in Figure 7i–7k.

For comparison, we computed the standard LSM image using the deblurring method described in Chen et al. (2017, 2019). Here, the deblurring filter size is $17 \times 17$ grid points (the black boxes in Figure 9) computed for a $50 \times 50$ grid (the red boxes in Figure 9) of evenly spaced point scatterers with the same migration velocity model as that used for the data migration in Figure 7a. The standard LSM images for the 1st and 50th iterations are shown in Figure 10b and 10c, respectively, next to the NNLSM image in Figure 10d. It is clear that the NNLSM image is better resolved than the LSM image, although there are discontinuities in some of the NNLSM interfaces not seen in the LSM image. Some of the detailed geology is lost in the NNLSM image, as seen in the wiggly interface in the red rectangular area of Figure 10. The practical application of the NNLSM image is that it might serve as a superresolved attribute image that can be combined with other attributes to delineate the geology. For example, combining the depth slice of the NNLSM image with a spectral decomposition image (Aarre, 2016) can help delineate the lithologic edges of meandering channels.

NNLSM can filter out random and coherent noise in the migration image by eliminating the noisy learned basis functions and their coefficients in the NNLSM image. For example, Figure 11a shows the RTM image with a sparse acquisition geometry so that the image contains a strong acquisition footprint. The reconstructed migration image in Figure 11b shows significant mitigation of this noise in Figure 11a. However, the migration swing noise is still prominent near the red arrows in Figure 11b. Such noise is reconstructed from the noisy basis function shown in Figure 12a and the coefficients in Figure 12b. Figure 12c is the image reconstructed by correlating the basis function in Figure 12a with the coefficients in Figure 12b. After filtering out the basis functions from the noise, the reconstructed image is shown in Figure 11c, which is free from aliasing noise at the locations indicated by the red arrows.
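The denoising step described above amounts to leaving the noisy filter's term out of the reconstruction sum. A 1D sketch with hypothetical filters and feature maps, where `exclude` holds the indices of basis functions judged to be noise:

```python
import numpy as np

def reconstruct(filters, feature_maps, exclude=()):
    """Sum conv(gamma_i, m_i) over all filters except the excluded indices."""
    out = None
    for i, (gam, fmap) in enumerate(zip(filters, feature_maps)):
        if i in exclude:
            continue            # drop the contribution of a noisy basis function
        term = np.convolve(fmap, gam, mode="same")
        out = term if out is None else out + term
    return out
```

Calling `reconstruct(filters, feature_maps, exclude=(k,))` reproduces the migration image without the contribution of filter `k`, which is how the aliasing noise in the example above is removed.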

### North Sea data

We apply the NNLSM method to field data collected in the North Sea (Schroot and Schüttenhelm, 2003), where the time migration image is shown in Figure 13a. The time axis is gridded with 213 evenly spaced points, and there are 301 grid points along the *x*-axis. We compute 21 $13 \times 5$ (grid point) convolutional basis functions, i.e., filters $\gamma_i$ ($i = 1, 2, \ldots, 21$), by the NNLSM procedure (see Figure 13b). These filters approximate the dip-filtered migration Green’s functions, and the basis functions are marked as the yellow boxes in Figure 13a and 13b. The stacked feature maps (quasi-reflectivity distribution) are displayed in Figure 13c. It is evident that the stacked feature maps can provide a high-resolution migration image. The migration image reconstructed from the learned filters and feature maps is shown in Figure 13d and contains less noise.

Finally, we apply NNLSM to a time slice of the migration image, which is shown in Figure 14a, where the image size is 301 by 301 grid points. Figure 14b shows the 21 $13 \times 5$ filters estimated by the NNLSM procedure. The stacked feature map is displayed in Figure 14c, which could be used as an attribute image for high-resolution delineation of geologic bodies. The reconstructed migration image is shown in Figure 14d, and we can see that there is less noise.

## DISCUSSION

The forward modeling for a multilayered neural network is shown to be equivalent to a single-iterate solution of a multilayered LSM problem. This assumes positivity and shrinkage constraints on the soft thresholding operation, so the activation function reduces to the ReLU function. This equivalence relates the physics of seismic imaging to architectural features of the neural network.
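The reduction noted above can be stated directly: with a nonnegativity (positivity) constraint, the soft-thresholding operator $S_\lambda(x) = \mathrm{sign}(x)\max(|x| - \lambda, 0)$ becomes $\max(x - \lambda, 0)$, i.e., a ReLU applied to an input biased by $-\lambda$. A minimal numerical check of this identity, not the paper's code:

```python
import numpy as np

def soft_threshold(x, lam):
    # S_lam(x) = sign(x) * max(|x| - lam, 0)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def soft_threshold_nonneg(x, lam):
    # Soft thresholding restricted to nonnegative outputs
    return np.maximum(x - lam, 0.0)

def relu(x, bias=0.0):
    return np.maximum(x + bias, 0.0)

x = np.linspace(-2.0, 2.0, 401)
lam = 0.3
# With a positivity constraint, soft thresholding is a ReLU with bias -lam:
assert np.allclose(soft_threshold_nonneg(x, lam), relu(x, bias=-lam))
```

The threshold $\lambda$ thus plays the role of the (negative) bias term of the neural network layer.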

The size of the filters in the first layer should be approximately the same size as the migration Green’s function for that model. Experiments with numerical models suggest that this size is approximately one to two wavelengths. In this case, the filter is interpreted as an approximation to the migration Green’s function, except it is that for a reflecting segment. Thus, we interpret the approximate migration Green’s function as a migration SSF rather than a migration PSF. Sulam et al. (2018) classify each feature in the first layer as an atom that takes on the role of an SSF.

The output of the first layer provides the small-scale, i.e., high-wavenumber, features associated with the input data. For an input migration image, the feature maps of the first layer resemble sub-quasi-reflectivity maps of the subsurface. Adding the sub-quasi-reflectivity maps together gives a close approximation to the actual reflectivity model as shown in Figures 6d and 7f.

The output of the second layer is a weighted sum of the first-layer features, which creates sparser feature maps. Sulam et al. (2018) classify the concatenation of the filters from the first and second layers as analogous to the creation of molecules from different atoms (see equation 14). In the migration problem, the resulting filters are SSFs for even larger segments of the original reflector boundaries. The feature maps of the third layer are a weighted sum of the second layer’s features and produce even sparser feature maps. For migration, the final feature maps are very sparse, whereas the concatenated filters are associated with large-scale reflectivity structures in the migration image.

The computational cost of computing NNLSM images is significantly less than that for LSM images because no solutions of the wave equation are needed. For example, we consider the 2D FDTD forward modeling of the acoustic wave equation with an eighth-order scheme in space and a second-order scheme in time, and its computational complexity is $O(n_t n^2)$ for one shot, where $n$ is the number of grid points in one direction and $n_t$ is the number of time steps. According to Plessix (2007), the number of time steps $n_t$ is approximately equal to $30n$ in order to satisfy the stability condition and to make sure that there is enough recording time when $v_{max}/v_{min} \approx 3$; here, $v_{max}$ and $v_{min}$ are the maximum and minimum velocities, respectively. So, the complexity of 2D FDTD forward modeling of the acoustic wave equation is $O(n_s n^3)$, where $n_s$ is the number of shots. The computational complexity of LSRTM is $O(6 n_s N_{iter} n^3)$ (Schuster, 2017), where $N_{iter}$ is the number of iterations. For NNLSM, the complexity is $O(N_{iter} n^2 \log n)$ according to Heide et al. (2015) and can be reduced to $O(N_{iter} n^2)$ if a local block coordinate-descent algorithm is used (Zisselman et al., 2019).
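Plugging illustrative numbers into these complexity estimates shows the scale of the savings. The values of $n$, $n_s$, and $N_{iter}$ below are assumptions, and the constants hidden by the $O(\cdot)$ notation are ignored, so the ratios are order-of-magnitude indications only:

```python
import math

n, n_s, n_iter = 1000, 100, 30   # grid points per dimension, shots, iterations

cost_lsrtm = 6 * n_s * n_iter * n**3         # O(6 n_s N_iter n^3)
cost_nnlsm = n_iter * n**2 * math.log(n)     # O(N_iter n^2 log n), Fourier domain
cost_nnlsm_local = n_iter * n**2             # O(N_iter n^2), block coordinate descent

print(f"LSRTM/NNLSM cost ratio ~ {cost_lsrtm / cost_nnlsm:.1e}")
```

For these values the estimated cost ratio is on the order of $10^4$ to $10^5$, consistent with the claim that NNLSM avoids the dominant wave-equation cost.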

The 1D NNLSM can be interpreted as a blind deconvolution (BD) problem in seismic data processing (Kaaresen and Taxt, 1998; Bharadwaj et al., 2018). It can be seen in Figure 5 that the filter of NNLSM plays the role of the source wavelet in BD and that the coefficients of NNLSM are the quasi-reflectivity coefficients of BD. However, NNLSM can have more than one filter, and the filters can be 2D or 3D. We show the relationship among BD, SLSM, NNLSM, and CNN in Figure 15.

NNLSM is an unsupervised learning method. Compared to a supervised learning method, it does not depend on a huge amount of labeled training data. However, it may require human inspection of the quality of the quasi-reflectivity images, and the filters must be aligned to obtain a more accurate estimate of the reflectivity distribution.

In Figure 14, we apply 2D NNLSM to a time slice of the 3D North Sea data. The 2D NNLSM method can be extended to a 3D implementation to learn a set of 3D filters. Incorporating the third dimension of information from 3D filters will likely lead to a better continuity of the reflectors in Figure 14c. However, the computational cost will increase by more than a multiplicative factor of $n$.

## CONCLUSIONS

NNLSM finds the optimal quasi-reflectivity distribution and quasi-migration Green’s functions that minimize a sum of migration misfit and sparsity regularization functions. The advantages of NNLSM over standard LSM are that its computational cost is significantly less than that for LSM and that it can be used for filtering coherent and incoherent noise in migration images. A practical application of the NNLSM image is as an attribute map that provides superresolution imaging of the layer interfaces. This attribute image can be combined with other attributes to delineate the structure and lithology in depth/time slices of migration images. Its disadvantage is that the NNLSM quasi-reflectivity image is only an approximation to the actual reflectivity distribution.

A significant contribution of our work is that we show that the filters and feature maps of a multilayered CNN are analogous to migration Green’s functions and reflectivity distributions. We now have a physical interpretation of the filters and feature maps in deep CNN in terms of the operators for seismic imaging. Such an understanding has the potential to lead to better architecture design of the network and extend its application to waveform inversion. In answer to Donoho’s plea for more rigor, NNLSM represents a step forward in establishing the mathematical foundation of CNN in the context of LSM.

## ACKNOWLEDGMENTS

The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia. We are grateful to the sponsors of the Center for Subsurface Imaging and Modeling Consortium for their financial support. For computer time, this research used the resources of the Supercomputing Laboratory at KAUST and the IT Research Computing Group. We thank them for providing the computational resources required for carrying out this work.

## DATA AND MATERIALS AVAILABILITY

Data associated with this research are confidential and cannot be released.

### MIGRATION GREEN’S FUNCTION

*η* represents model parameters and constants. Here, we implicitly assume a normalized source wavelet in the frequency domain, and $D_{model}$ and $D_{data}$ represent the sets of coordinates in the model and data spaces, respectively. The term $G(x'|x) = e^{i\omega \tau_{xx'}}/\|x - x'\|$ is the Green’s function for a source at $x$ and a receiver at $x'$ in a smoothly varying medium (if the source and receiver are coincident at $x$, then the zero-offset trace is represented by the squared Green’s function $G(x|x')^2$). The traveltime $\tau_{xx'}$ is for a direct arrival to propagate from $x$ to $x'$.
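For a homogeneous medium with velocity $c$, the traveltime is simply $\tau_{xx'} = \|x - x'\|/c$, so the asymptotic Green's function above can be evaluated directly. A sketch with an assumed velocity and frequency:

```python
import numpy as np

def greens_function(x_src, x_rec, omega, c=2000.0):
    """Asymptotic Green's function G(x'|x) = exp(i*omega*tau)/||x - x'|| for a
    homogeneous medium, where tau = ||x - x'||/c is the direct-arrival traveltime."""
    r = np.linalg.norm(np.asarray(x_rec, float) - np.asarray(x_src, float))
    return np.exp(1j * omega * (r / c)) / r

def zero_offset_trace(x, x_scat, omega, c=2000.0):
    # Zero-offset response of a point scatterer: the squared Green's function
    return greens_function(x, x_scat, omega, c) ** 2
```

The $1/r$ amplitude and the phase $\omega\tau_{xx'}$ are the two ingredients the migration operator uses to focus reflections back to their scattering points.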

### SOFT THRESHOLDING FUNCTION

### NEURAL NETWORK LEAST-SQUARES MIGRATION

The NNLSM algorithm in the image domain is defined as solving for the basis functions $\tilde{\Gamma}(x_i|x_j)$ and $\tilde{m}_j$ that minimize the objective function defined in equation 5. In contrast, SLSM finds only the LSM image in the image domain and uses the precomputed migration Green’s functions that solve the wave equation.

- 1. Update the coefficients $\tilde{m}_j$ for fixed basis functions (the sparse-pursuit step).
- 2. Update the basis functions $\tilde{\Gamma}(x_i|z_j)$ by inserting equation C-3 into the gradient-descent formula to get $\tilde{\Gamma}(x_{i'}|z_{j'})^{(k+1)} = \tilde{\Gamma}(x_{i'}|z_{j'})^{(k)} - \alpha \frac{\partial \epsilon}{\partial \tilde{\Gamma}(x_{i'}|z_{j'})} = \tilde{\Gamma}(x_{i'}|z_{j'})^{(k)} - \alpha \left( \left[ \sum_j \tilde{\Gamma}(x_{i'}|z_j) \tilde{m}_j \right] - m_{i'}^{mig} \right) \tilde{m}_{j'}.$ (C-5)

It is tempting to think of $\tilde{\Gamma}(x|x')$ as the migration Green’s function and $\tilde{m}_i$ as a component of the reflectivity. However, we must resist this temptation and consider, unlike in the SLSM algorithm, that $\tilde{\Gamma}(x|x')$ is a sparse basis function and $\tilde{m}_i$ is its coefficient. To get the true reflectivity, we should equate equation A-3 to $\sum_j \tilde{\Gamma}(x_i, x_j) \tilde{m}_j$ and solve for $m_j$.

### ALIGNMENT OF THE FILTERS

Figure 18a shows the learned filters with a size of $17 \times 9$ from the migration image of the SEG/EAGE salt model. Filter No. 7 (the yellow box) is chosen as the target filter. The aligned filters are shown in Figure 18b, and the stacked feature maps from the original and aligned filters are displayed in Figure 18c and 18d, respectively. It is evident that the reflector interfaces from the aligned filters are more continuous, especially in the red box, compared with those from the original filters.
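The alignment can be sketched in 1D: the lag and polarity of each filter are taken from the peak of its crosscorrelation with the chosen target filter. This is a hypothetical helper, using a circular shift as an approximation that is adequate for zero-padded filters:

```python
import numpy as np

def align_filter(f, target):
    """Shift and flip the polarity of filter f so that it best matches target,
    using the peak of the full crosscorrelation."""
    xc = np.correlate(target, f, mode="full")
    k = int(np.argmax(np.abs(xc)))
    lag = k - (len(f) - 1)                 # lag of the best-matching shift
    sign = 1.0 if xc[k] >= 0 else -1.0     # polarity from the correlation peak
    return sign * np.roll(f, lag), lag, sign
```

Applying such a correction to every learned filter before stacking the feature maps removes the relative time lags between filters, which is what makes the reflector interfaces from the aligned filters more continuous.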