We have developed a method that combines unsupervised and supervised deep-learning approaches for seismic ground roll attenuation. The method consists of three components, each with a physical meaning and motivation. The first component is a convolutional neural network (CNN) that separates a seismic record into ground roll and signal by minimizing the residual between the input seismic record and the sum of the signal and ground roll generated by two subnetworks. The second component maximizes the separation of signal and ground roll in the f-k domain by training a supervised classifier. The third component is a CNN that maps signal to ground roll, which overcomes the problem of finding appropriate masks in traditional methods. Each component is closely related to, and motivated by, the wave characteristics of ground roll. Test results on field seismic records demonstrate the effectiveness of combining these components to prevent signal leakage and remove ground roll from seismic data.

Ground roll is a type of coherent seismic noise characterized by high amplitudes, low frequencies, and low velocities; the low velocities sometimes cause spatial aliasing. It corrupts shallow reflections at short offsets and deep reflections at larger offsets, so it needs to be removed during processing. Because ground roll has relatively low-frequency content, band-pass filtering is a common way to remove it from seismic records. However, band-pass filtering can distort the desired reflections when the frequency content of the ground roll and the signal overlaps. Several methods have been proposed to overcome this issue. Fomel (2002) attenuates ground roll in seismic records by using 1D prediction-error filters to separate events in the frequency domain and single-dip plane-wave destruction filters to separate them according to slope. Chen et al. (2015) use local orthogonalization after band-pass filtering to improve the separation. Liu and Fomel (2013) use the local time-frequency transform (LTFT) to create a better separation between the signal and ground roll. Chiu (2019) attenuates aliased ground roll in regularly and irregularly sampled data with eigenimage filtering in the frequency domain. These signal processing techniques rest on predefined and relatively simple assumptions about the characteristics of signal and ground roll, and they may not perform well in realistic scenarios. Perkins and Zwaan (2000) and Meur and Traonmilin (2008) propose a coherent noise attenuation method called adaptive ground roll attenuation (AGORA), in which ground roll is modeled strictly as a series of dispersive linear events.

Machine learning and deep learning are increasingly used in geophysical problems such as detecting faults, channels, and salt bodies (Li, 2018; Pham et al., 2019; Shi et al., 2019; Liu et al., 2020; Wu et al., 2020). They have also begun to show promising results in seismic processing and migration (Oliveira et al., 2018, 2019; Kaur et al., 2020b, 2021). Kaur et al. (2020a) propose a CycleGAN algorithm for attenuating ground roll. Their training samples come from small portions of the data, and the labels are created with a series of processing methods: LTFT (Liu and Fomel, 2013), cascaded plane-wave destruction filters with 1D frequency prediction-error filters (Fomel, 2002), and adaptive subtraction with regularized nonstationary regression (Fomel, 2009). However, the CycleGAN architecture has no explicit physical meaning and is hard to interpret. Guo et al. (2020) design an unsupervised autoencoder network, with frequency filtering applied to the network outputs, to decompose a seismic record into two meaningful components: signal and ground roll. The method relies on choosing a frequency threshold to separate the signal and ground roll in the f-k domain. Jia et al. (2019) use the low-frequency parts of the ground roll and reflection components, combined with a discriminative loss, to eliminate ground roll with a CNN.

We propose a method for ground roll suppression by designing deep-learning blocks that are related to the characteristics of ground roll and can be interpreted with wave-physics intuition. Inspired by an unsupervised machine-learning method for image decomposition (Gandelsman et al., 2019), Guo et al. (2020) create a 2D CNN to separate the ground roll and the signal. We add two blocks to this design: one imposes surface-wave (SW) constraints, and the other improves the separation of signal and noise with a supervised classifier. Our method overcomes the difficulty associated with creating masks in traditional separation methods and removes the need to choose frequency thresholds. Each component of our design is related to the physical characteristics of ground roll and signal: their different spectra in the frequency-wavenumber domain and the relationship between the signal and the ground roll. In this paper, we are interested in ground roll with general characteristics that are not necessarily well modeled as linear events, including ground roll with strong curvature, which would be problematic for AGORA. We test our approach on two field records: one with aliased ground roll and another that contains both ground roll and random noise.

Conventional seismic processing methods

Following Kaur et al. (2020a), we create training data and labels with a series of cascaded processing methods. First, we use the differences in frequency and slope to separate the signal and ground roll by cascading single-dip plane-wave destruction filters with local 1D three-coefficient prediction-error filters (Fomel, 2002). Second, to handle the variability in ground roll amplitudes, we apply adaptive subtraction with regularized nonstationary regression to the outputs of the first step, designing a nonstationary matching filter with 25 coefficients (Fomel, 2009). Third, to restore the useful signal that leaked into the noise model, we locally orthogonalize the signal and noise estimated in the previous steps (Chen et al., 2015). Finally, we apply f-x regularized nonstationary autoregression (Liu et al., 2012) to remove the remaining random noise. These steps require an initial ground roll model, which in turn requires an appropriate mask and a low-pass filter.
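The orthogonalization step can be illustrated with a minimal NumPy sketch. It is a simplified stand-in, not the implementation of Chen et al. (2015): their shaping-regularized division is replaced by a ratio of box-smoothed products, and the helper names (`box_smooth`, `local_orthogonalize`), the window sizes `rect_t` and `rect_x`, and the stabilizer `eps` are our own hypothetical choices. Panels are assumed to be arrays indexed (time, offset).

```python
import numpy as np


def box_smooth(a, rect_t=20, rect_x=5):
    """Separable moving-average smoothing along time (axis 0) and offset (axis 1).

    A crude stand-in for shaping regularization; the window sizes are
    illustrative assumptions, not values from the paper.
    """
    kt = np.ones(rect_t) / rect_t
    kx = np.ones(rect_x) / rect_x
    a = np.apply_along_axis(lambda tr: np.convolve(tr, kt, mode="same"), 0, a)
    a = np.apply_along_axis(lambda row: np.convolve(row, kx, mode="same"), 1, a)
    return a


def local_orthogonalize(signal0, noise0, rect_t=20, rect_x=5, eps=1e-8):
    """Move signal that leaked into the noise estimate back into the signal.

    The local weight w approximates the smoothed ratio <noise0, signal0> / <signal0, signal0>,
    so w * signal0 is the part of noise0 that is locally correlated with signal0.
    The sum signal + noise is preserved exactly.
    """
    num = box_smooth(noise0 * signal0, rect_t, rect_x)
    den = box_smooth(signal0 * signal0, rect_t, rect_x)
    w = num / (den + eps)
    leaked = w * signal0
    return signal0 + leaked, noise0 - leaked
```

Because the same amount is added to the signal and subtracted from the noise, the decomposition still sums to the input record; the smoothing windows control how local the correlation estimate is.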

We can use these steps to create training data for a classifier, which constrains the unsupervised deep-learning method so that signal and noise are maximally separated in the f-k domain. We observe that the shallow reflection events change their locations together with the ground roll from one shot/receiver to another. Therefore, shallow reflections can help predict the location of ground roll. We propose to add an SW mapping block that learns the relationship between the signal and the ground roll and avoids the need to choose masks, which can be difficult in real data.

Frequency-based unsupervised method

Guo et al. (2020) use two neural networks to separate the input seismic data into two output components: the signal and the ground roll. The formulation requires that the sum of the two components equals the input seismic data. Because many different decompositions satisfy this requirement, Guo et al. (2020) impose constraints based on the low-frequency character of the ground roll, minimizing the low-frequency component of the estimated signal and the high-frequency component of the estimated ground roll. These constraints are implemented as sums of spectral amplitudes below and above a cutoff frequency, respectively, along the frequency axis in the f-k domain. This formulation is the same as least-squares spectral-domain filter design (Soewito, 1991), except that it is implemented in a nonlinear CNN framework instead of with a linear finite impulse response filter. The total loss function is
$$L_{\text{total}} = L_{\text{data}} + \lambda_1 L_{\text{low}} + \lambda_2 L_{\text{high}}, \qquad (1)$$
where Ldata is the misfit between the input noisy data and the sum of the two estimated components, and Llow and Lhigh are the frequency constraints on the signal and ground roll components, respectively. Ldata, Llow, and Lhigh are defined in equations 2, 3, and 4, respectively. To prevent leakage of low-frequency signal, λ1 should be small.
We add more elements to improve this approach. From a noisy seismic record Z, two subnetworks, CNN1 and CNN2, estimate the signal X and the ground roll Y (Figure 1a). CNN1 and CNN2 are encoder-decoder CNNs with one skip connection between the first encoder layer and the last decoder layer. Each subnetwork has an initial convolutional layer with a filter size of 7×21, two encoders, a bottleneck, two decoders, additional convolutional layers, and an output convolutional layer (Figure 2). Each encoder and decoder has three convolutional layers with a filter size of 3×9 in CNN1 (Figure 2) and two convolutional layers with the same filter size in CNN2 (Figure 3). Each encoder is followed by a down-sampling step consisting of a convolutional layer with a filter size of 3×9 and a max-pooling layer. Each decoder is preceded by an up-sampling layer followed by a convolutional layer with a filter size of 3×9. After the last decoder, there are three convolutional layers in CNN1 (Figure 2) and two in CNN2 (Figure 3). The misfit between the sum Z1 = X + Y and Z is given by
$$L_{\text{data}} = \frac{1}{N}\sum \left(Z_1 - Z\right)^2, \qquad (2)$$
where N is the total number of pixels in the seismic data. Frequency constraints on X and Y are Llow and Lhigh expressed as
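$$L_{\text{low}} = \frac{1}{WP}\sum_{k=1}^{P}\sum_{f=0}^{W}\left|X(k,f)\right|, \qquad (3)$$

$$L_{\text{high}} = \frac{1}{N-WP}\sum_{k=1}^{P}\sum_{f=W+1}^{Q}\left|Y(k,f)\right|, \qquad (4)$$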
where X(k,f) and Y(k,f) are the 2D Fourier transforms of X and Y, respectively; k = 1, …, P and f = 1, …, Q are the discrete wavenumber and frequency sample points, respectively; and W is the low-pass window size in the frequency domain. The frequency-based unsupervised method is simple and can be effective at removing ground roll. However, when the signal and noise occupy the same frequency range, it is hard to choose a suitable threshold. Moreover, low-frequency content is only one characteristic of ground roll; the series of conventional methods that we use to create training labels accounts for both the dip and the frequency characteristics of ground roll. Therefore, we propose to add an SW constraint and an f-k classifier to separate the signal and the ground roll more effectively in realistic situations where their characteristics are more complex.
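The loss terms in equations 1 through 4 can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' code: panels are arrays indexed (time, offset); the f-k transform uses a real FFT over time so that only nonnegative frequencies appear; W is obtained by rounding the cutoff frequency divided by the frequency spacing 1/(nt·dt); and each constraint is normalized by the number of samples in its region, a close variant of the normalizations in equations 3 and 4. All function names are hypothetical. In actual training, these terms would be computed on the CNN outputs in a differentiable framework.

```python
import numpy as np


def fk_spectrum(d):
    """f-k transform of a (time, offset) panel: real FFT over time, complex FFT over offset.

    Rows index nonnegative frequencies f, and columns index wavenumbers k.
    """
    return np.fft.rfft2(d, axes=(1, 0))


def cutoff_index(f_cut_hz, nt, dt):
    """Convert a cutoff frequency in Hz to the low-pass window size W in frequency samples."""
    return int(round(f_cut_hz * nt * dt))


def l_data(x, y, z):
    """Equation 2: mean squared misfit between X + Y and the input record Z."""
    return np.mean((x + y - z) ** 2)


def l_low(x, w):
    """Equation 3 (sketch): mean |X(k, f)| over frequency samples 0..W."""
    return np.mean(np.abs(fk_spectrum(x)[: w + 1, :]))


def l_high(y, w):
    """Equation 4 (sketch): mean |Y(k, f)| over frequency samples above W."""
    return np.mean(np.abs(fk_spectrum(y)[w + 1 :, :]))


def total_loss(x, y, z, w, lam1, lam2):
    """Equation 1: data misfit plus weighted frequency constraints."""
    return l_data(x, y, z) + lam1 * l_low(x, w) + lam2 * l_high(y, w)
```

With 400-sample patches, 4 ms sampling, and the 20 Hz cutoff used in the first field example, the frequency spacing is 0.625 Hz and W = 32 frequency samples.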

Addition of the SW constraint

Conventional seismic processing methods often rely on masks to locate the ground roll when generating the initial ground roll model. Choosing the correct shape and size of these masks is difficult, and poor choices can lead to signal leakage into the estimated noise or to remaining noise in the estimated signal. For the data sets used in this paper, as we move from one shot gather to another, the shallow reflection events move with the ground roll in a related pattern. Part of this observation can be attributed to the acquisition: for instance, moving from inline to crossline gathers with progressively larger source distances from the receiver line has a similar effect on the location and shape of the shallow reflection events and of the ground roll, depending on the acquisition geometry and the subsurface configuration. Based on this observation, we use the shallow reflection events as soft guidance on the location and shape of the ground roll.

The goal of adding an SW constraint is to enforce the relationship between the ground roll and shallow reflections. This relationship changes across shot/receiver configurations, so the mapping must be adaptive. Analogous to SW modeling with a near-surface velocity model and dispersion curves, we take a data-driven surrogate approach that transforms signal into ground roll without explicitly generating near-surface velocity and dispersion models. This transformation is performed by CNN3 (Figure 1b), which has the same structure as CNN2 and takes the estimated signal X from CNN1 as input to produce Yrecover. The misfit Lrecoverdata between Yrecover and Y is given by
$$L_{\text{recoverdata}} = \frac{1}{N}\sum \left(Y_{\text{recover}} - Y\right)^2. \qquad (5)$$
We also minimize the high-frequency component in Yrecover with Lrecoverhigh, which is similar to equation 4 but with Yrecover instead of Y. The total loss function is
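$$L_{\text{total}} = L_{\text{data}} + \lambda_1 L_{\text{low}} + \lambda_2 L_{\text{high}} + \lambda_3 L_{\text{recoverdata}} + \lambda_4 L_{\text{recoverhigh}}. \qquad (6)$$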
There are four hyperparameters in the preceding loss function. Including CNN3 can introduce signal leakage into the noise and noise residuals into the signal if the coefficient of each loss term is not chosen properly. In our experience, signal leakage into the noise occurs more easily, so λ2 should be the largest of the frequency weights (λ1, λ2, and λ4). To keep low-frequency reflection events, λ1 should be small, although larger than λ1 in the frequency-based method, which leads to a small λ4. We set λ3 to one.
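Equation 6 can be sketched as a single NumPy function that takes the outputs of CNN1, CNN2, and CNN3 as precomputed arrays. This is an illustrative assumption-laden sketch, not the authors' implementation: `sw_total_loss` and `_fk_abs` are hypothetical names, panels are (time, offset) arrays, the frequency terms are normalized by the number of samples in each region, and the default weights are the values reported for the first field example.

```python
import numpy as np


def _fk_abs(d):
    # |f-k spectrum| of a (time, offset) panel; rows are nonnegative frequencies.
    return np.abs(np.fft.rfft2(d, axes=(1, 0)))


def sw_total_loss(z, x, y, y_recover, w, lam=(0.002, 0.004, 1.0, 0.0025)):
    """Equation 6 with the SW constraint, given the outputs of CNN1, CNN2, and CNN3.

    z: input record; x: CNN1 signal; y: CNN2 ground roll; y_recover: CNN3 applied to x.
    lam holds (lambda1, lambda2, lambda3, lambda4); w is the low-pass window size.
    """
    lam1, lam2, lam3, lam4 = lam
    l_data = np.mean((x + y - z) ** 2)                 # equation 2
    l_low = np.mean(_fk_abs(x)[: w + 1])               # equation 3 on the signal
    l_high = np.mean(_fk_abs(y)[w + 1 :])              # equation 4 on the ground roll
    l_rec_data = np.mean((y_recover - y) ** 2)         # equation 5
    l_rec_high = np.mean(_fk_abs(y_recover)[w + 1 :])  # equation 4 applied to Y_recover
    return (l_data + lam1 * l_low + lam2 * l_high
            + lam3 * l_rec_data + lam4 * l_rec_high)
```

Because λ3 = 1 while the frequency weights are of order 10⁻³, the mapping misfit of CNN3 dominates the SW terms, and the frequency penalties act as gentle regularizers.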

Classifier in the f-k domain

The frequency-based penalty terms in Guo et al. (2020) amount to a simple filtering procedure. The frequency threshold can be hard to choose when the signal and the ground roll overlap significantly in frequency. In addition, low frequency is only one of many ground roll characteristics. Therefore, we propose a supervised classifier as a potentially more effective way to separate signal from ground roll, even when their frequencies overlap. A small portion of the data, processed with the aforementioned series of conventional methods, is transformed into the f-k domain and used as training data (Xfk and Yfk) for the classifier. The classifier network labels each f-k domain input as either signal or ground roll (Figure 1c). It separates the estimated signal and ground roll in terms of both frequency and wavenumber, instead of relying on a manually chosen frequency threshold. It also uses supervised classification to guide the unsupervised workflow.

Our classifier is a CNN with one initial convolutional layer with a filter size of 3×3; three down-sampling layers, each consisting of a convolutional layer with a filter size of 3×3 followed by a max-pooling layer; and three fully connected layers (Figure 4). We also add a dropout layer after the last down-sampling layer. The classifier takes the real and imaginary parts of the f-k spectrum as input and outputs label 1 for signal and 0 for ground roll. It is pretrained with the training data and labels and is then used in the unsupervised framework to calculate the cross-entropy losses Lsignalclass, Lnoiseclass, and Lrecovernoiseclass between the outputs of CNN1, CNN2, and CNN3, respectively, and the corresponding signal or ground roll labels. The total loss function is
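The classifier input and the three cross-entropy terms can be sketched as follows. This is a minimal illustration, not the authors' network: the pretrained classifier is replaced by an arbitrary callable `classify` that maps two-channel f-k input to a probability of "signal", and `fk_channels`, `binary_cross_entropy`, and `classifier_losses` are hypothetical names. Averaging over a batch of patches is omitted.

```python
import numpy as np


def fk_channels(d):
    """Stack the real and imaginary parts of the f-k spectrum as two input channels."""
    spec = np.fft.rfft2(d, axes=(1, 0))  # real FFT over time, complex FFT over offset
    return np.stack([spec.real, spec.imag], axis=0)  # shape (2, nf, nk)


def binary_cross_entropy(p, label, eps=1e-12):
    """Cross-entropy between predicted P(signal) and a 0/1 label (1 = signal, 0 = ground roll)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(label * np.log(p) + (1 - label) * np.log(1 - p)))


def classifier_losses(classify, x, y, y_recover):
    """Classification terms of equation 7 for the outputs of CNN1, CNN2, and CNN3.

    `classify` stands in for any pretrained model mapping f-k channels to P(signal).
    """
    l_signal_class = binary_cross_entropy(classify(fk_channels(x)), 1)
    l_noise_class = binary_cross_entropy(classify(fk_channels(y)), 0)
    l_recover_noise_class = binary_cross_entropy(classify(fk_channels(y_recover)), 0)
    return l_signal_class, l_noise_class, l_recover_noise_class
```

Feeding real and imaginary parts, rather than amplitude only, keeps the phase information of the f-k spectrum available to the classifier.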
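$$L_{\text{total}} = L_{\text{data}} + \lambda_1 L_{\text{signalclass}} + \lambda_2 L_{\text{noiseclass}} + \lambda_3 L_{\text{recoverdata}} + \lambda_4 L_{\text{recovernoiseclass}}. \qquad (7)$$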
The pretrained classifier separates the signal and ground roll in the f-k domain more effectively than a hard threshold chosen for every shot/receiver. We can use it either alone or together with the SW constraint. However, the signal training labels contain low-frequency components. Therefore, when combined with the SW constraint, the classifier can guide CNN1 to keep some ground roll in the signal so that the signal is more easily transformed into ground roll. To solve this problem, we propose either to use the classifier without the SW constraint or to use the ground roll labels (Ytrue) and signal labels (Xtrue) as targets for the misfit terms, which avoids crosstalk. The total loss function becomes
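$$L_{\text{total}} = L_{\text{signal}} + L_{\text{groundroll}} + \lambda_1 L_{\text{signalclass}} + \lambda_2 L_{\text{noiseclass}} + \lambda_3 L_{\text{recoverdata}} + \lambda_4 L_{\text{recovernoiseclass}}, \qquad (8)$$

where $L_{\text{groundroll}} = \frac{1}{N}\sum \left(Y_{\text{true}} - Y\right)^2$ and $L_{\text{signal}} = \frac{1}{N}\sum \left(X_{\text{true}} - X\right)^2$.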
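A short sketch makes explicit which targets enter equation 8: the unsupervised data misfit is replaced by two supervised misfits against the labels, while the CNN3 mapping misfit and the classification terms remain. The function name `eq8_total_loss` is hypothetical, the classification terms are assumed to be precomputed by the pretrained classifier, and the weights are left to the caller.

```python
import numpy as np


def eq8_total_loss(x, y, y_recover, x_true, y_true, class_terms, lam):
    """Equation 8: supervised misfits replace L_data; class_terms come from the classifier.

    class_terms = (L_signalclass, L_noiseclass, L_recovernoiseclass)
    lam = (lambda1, lambda2, lambda3, lambda4)
    """
    l_signal = np.mean((x_true - x) ** 2)           # CNN1 output against the signal label
    l_groundroll = np.mean((y_true - y) ** 2)       # CNN2 output against the ground roll label
    l_recover_data = np.mean((y_recover - y) ** 2)  # equation 5, CNN3 mapping misfit
    l_sig_cls, l_noise_cls, l_rec_noise_cls = class_terms
    lam1, lam2, lam3, lam4 = lam
    return (l_signal + l_groundroll
            + lam1 * l_sig_cls + lam2 * l_noise_cls
            + lam3 * l_recover_data + lam4 * l_rec_noise_cls)
```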

We examine the aforementioned workflows on six receiver lines of a 3D land shot gather with 96 offsets, an offset interval of 50 m, 500 time samples, and a time sampling interval of 4 ms. The gather is contaminated with 3D ground roll that appears hyperbolic in cross section. The ground roll is aliased, which makes it hard to separate the signal and the ground roll in the f-k domain.
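A quick calculation shows why ground roll aliases at this sampling. An event with apparent velocity v along the receiver line has wavenumber k = f/v, and it aliases once k exceeds the spatial Nyquist wavenumber 1/(2Δx). The 500 m/s velocity below is a hypothetical, typical ground roll value, not a number from this data set, and the helper names are our own.

```python
def nyquist_wavenumber(dx_m):
    """Spatial Nyquist wavenumber (cycles per meter) for trace spacing dx_m."""
    return 1.0 / (2.0 * dx_m)


def nyquist_frequency(dt_s):
    """Temporal Nyquist frequency (Hz) for sampling interval dt_s."""
    return 1.0 / (2.0 * dt_s)


def alias_onset_frequency(velocity_mps, dx_m):
    """Lowest frequency at which an event with apparent velocity v aliases: f = v * k_N."""
    return velocity_mps * nyquist_wavenumber(dx_m)


# First data set: 50 m offset interval, 4 ms sampling.
k_nyq = nyquist_wavenumber(50.0)          # 0.01 cycles/m
f_nyq = nyquist_frequency(0.004)          # 125 Hz
f_alias = alias_onset_frequency(500.0, 50.0)  # hypothetical 500 m/s ground roll
```

Under that assumed velocity, ground roll components above about 5 Hz already wrap around in wavenumber, far below the 125 Hz temporal Nyquist frequency, so a frequency cut alone cannot isolate them. For hyperbolic ground roll, the apparent velocity varies with offset, so the onset frequency varies along the event.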

We first apply the conventional methods mentioned in the “Method” section to produce a baseline and to create training data for the classifier approach. The signal and ground roll estimated with this cascade of conventional methods are shown in Figure 5. The conventional methods work reasonably well, but some ground roll residuals remain and the aliased parts are not completely removed (the red arrows in Figure 5).

We train the deep-learning approaches on the first five receiver lines and leave the last receiver line, whose ground roll is more linear and more aliased, for testing. For the frequency-based unsupervised method, we choose a frequency threshold of 20 Hz, λ1=0.0009, and λ2=0.0025. We create rectangular patches with 400 time samples and 64 offsets, so that the network sees both the shallow reflection events and the deeper ground roll. Our convolutional filters are also rectangular, with shapes proportional to the dimensions of the patches, so that they capture relationships over larger areas along the time dimension. The patches are created by a nonstationary patching algorithm (Claerbout, 2014). We use 90% of the patches for training and 10% for validation. We train the network for 200 epochs, with early stopping once the validation loss has increased over 50 epochs. A model trained on one survey needs to be retrained before it is applied to a different survey, because the new data set likely has somewhat different statistics and other characteristics from the data set used for the original training. The model can be retrained with a subset of the new field data and then applied to the rest of the data. The estimated signal of the test receiver line is shown in Figure 6b, and the estimated ground roll is shown in Figure 6g. There is signal leakage into the noise domain (the blue arrow in Figure 6g), and there are noise residuals in the signal domain (the red arrow in Figure 6b).
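The data preparation and stopping rule can be sketched as follows. This is a simplified illustration under stated assumptions: regular overlapping windows with hypothetical steps `step_t` and `step_x` replace the nonstationary patching of Claerbout (2014), and the early-stopping rule is read as "stop after `patience` consecutive epochs without a new best validation loss", one common interpretation of the criterion above. All function names are hypothetical.

```python
import numpy as np


def extract_patches(gather, nt_patch=400, nx_patch=64, step_t=50, step_x=16):
    """Cut a (time, offset) gather into overlapping rectangular patches.

    Regular windows with fixed steps; a simplified stand-in for nonstationary patching.
    """
    nt, nx = gather.shape
    patches = [
        gather[it:it + nt_patch, ix:ix + nx_patch]
        for it in range(0, nt - nt_patch + 1, step_t)
        for ix in range(0, nx - nx_patch + 1, step_x)
    ]
    return np.stack(patches)


def train_val_split(patches, val_fraction=0.1, seed=0):
    """Randomly assign 90% of the patches to training and 10% to validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(patches))
    n_val = int(round(val_fraction * len(patches)))
    return patches[idx[n_val:]], patches[idx[:n_val]]


def early_stop_epoch(val_losses, max_epochs=200, patience=50):
    """Return the epoch at which training stops: after `patience` epochs without a new best."""
    best, since_best = np.inf, 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return min(len(val_losses), max_epochs)
```

For a 500-sample, 96-offset receiver line, these default steps give nine 400×64 patches; overlapping windows let each event appear in several patches with different context.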

We then add the SW constraint and train the three subnetworks jointly. We keep the frequency threshold of 20 Hz and choose λ1=0.002, λ2=0.004, λ3=1, and λ4=0.0025. The network design and training data are the same as for the frequency-based method. As before, a model trained on one survey is retrained with a subset of a new survey before being applied to the remaining data. There are fewer noise residuals in the estimated signal (the red arrow in Figure 6c) and less signal leakage in the estimated ground roll (the blue arrow in Figure 6h).

There is less signal leakage when the hard frequency threshold is replaced with a classifier in the f-k domain (the blue arrow in Figure 6i), and more of the ground roll and of the random noise in the aliased areas is removed (the red arrow in Figure 6i). For this approach, we choose λ1=10 and λ2=5. Combining the f-k classifier with the SW constraint is tricky, because together they may keep low-frequency ground roll instead of low-frequency reflection events, which causes ground roll residuals in the estimated signal.

Therefore, when we use the classifier together with the SW constraint, we change the loss function to equation 8. As with the two previous approaches, a model trained on one survey is retrained with a subset of a new survey before being applied to the rest of the new data. The loss is higher than for the other approaches because we incorporate both the f-k classifier and the SW constraint. There is less signal leakage in the estimated ground roll (the blue arrow in Figure 6j), and there are fewer ground roll residuals in the estimated signal (the red arrow in Figure 6e). The results obtained using only the true ground roll and signal to calculate the misfit terms are worse, which supports the effectiveness of the SW constraint and the classifier. Figure 7 shows that CNN3 effectively transforms the estimated signal into ground roll.

The training and validation losses for the models in Figures 1a, 1b, and 1c are shown in Figures 8a, 8b, and 8c, respectively.

We then calculate the f-k spectra of the results estimated for the last receiver line (Figures 9 and 10). The comparison shows that the SW constraint and the classifier help keep the high-frequency parts in the estimated ground roll and the low-frequency parts in the signal component (frequencies higher than 20 Hz in Figures 9c to 9e). Using the true signal and ground roll to calculate the misfits helps the network better preserve the low-frequency parts of the signal (frequencies lower than 12 Hz in Figure 10e).
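An f-k amplitude spectrum with physical axes, of the kind compared here, can be computed with NumPy as sketched below. The function name `fk_amplitude` is hypothetical; the sketch assumes (time, offset) panels and keeps only nonnegative frequencies, with wavenumbers in cycles per meter sorted from negative to positive. The sign of the wavenumber for a given dip depends on the FFT convention.

```python
import numpy as np


def fk_amplitude(d, dt, dx):
    """Amplitude f-k spectrum of a (time, offset) panel with physical axes.

    Returns (freqs_hz, wavenumbers_per_m, amplitude); amplitude has shape (nf, nk),
    with nonnegative frequencies only and wavenumbers sorted from negative to positive.
    """
    nt, nx = d.shape
    spec = np.fft.rfft2(d, axes=(1, 0))   # real FFT in time, complex FFT in offset
    spec = np.fft.fftshift(spec, axes=1)  # put k = 0 in the middle column
    freqs = np.fft.rfftfreq(nt, d=dt)     # Hz
    wavenumbers = np.fft.fftshift(np.fft.fftfreq(nx, d=dx))  # cycles per meter
    return freqs, wavenumbers, np.abs(spec)
```

For the first data set (500 samples at 4 ms, 96 traces at 50 m), the frequency axis runs from 0 to 125 Hz in 0.5 Hz steps, and the wavenumber axis spans ±0.01 m⁻¹.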

We apply the trained network parameters with the SW constraint and the f-k classifier to all receiver lines (Figure 11a). The estimated signal is shown in Figure 11b, and the estimated ground roll is shown in Figure 11c. The ground roll is removed in all receiver lines with little signal leakage. When applied to many receivers, our method is faster than the conventional methods: we only need to train on some receivers and can then apply the trained parameters to the remaining receivers without modifying parameters or creating masks. Moreover, compared with the results of the conventional methods for the last receiver line, our results have fewer ground roll residuals in the signal domain, and more ground roll is removed in the aliased area (the red arrows in Figure 5 and the blue arrows in the rightmost panel of Figure 11b).

The second data set has 36 shot gathers with 432 offsets, an offset interval of 35.3 m, 1500 time samples, and a time sampling interval of 4 ms. The ground roll appears hyperbolic on the lines where the receivers are farthest from the sources, and it appears closer to the surface (at earlier times) with a more linear trend on the lines where the receivers are closer to the sources. To illustrate the results of the conventional processing methods mentioned in the “Method” section, we show a shot gather after ground roll removal (stage I in Figure 12). In the second stage, we apply f-x regularized nonstationary autoregression to the estimated signal from stage I to remove random noise. The f-x method causes some signal leakage into the noise domain (stage II in Figure 12), which affects our results when training with a classifier.

We then apply the frequency-based unsupervised method with a frequency threshold of approximately 12 Hz, λ1=0.0009, and λ2=0.005. The training data are created from eight shots; each patch has 256 offsets and 1200 time samples. The estimated signal for one test shot is shown in Figure 13b, and the estimated ground roll for this shot is shown in Figure 13g. An f-x deconvolution can be applied to the estimated signal to remove random noise.

With the SW constraint (λ1=0.001, λ2=0.003, λ3=1, and λ4=0.0015), the estimated signal does not differ much from the frequency-based result (Figure 13h). The SW constraint reduces the signal leakage in the estimated ground roll (the red arrow in the third panel from the left of Figure 13b).

Besides signal and ground roll, this data set also contains random noise, whereas the SW constraint relates only the signal and the ground roll. Therefore, we train a classifier with three classes: signal, ground roll, and other noise. We then incorporate the trained classifier into the unsupervised workflow in place of the frequency threshold, and we add another subnetwork with the same structure as CNN2 to estimate the other noise. We use the true ground roll, true signal, and true random noise to calculate the misfit terms. The estimated signal is shown in Figure 13d; the ground roll and random noise are removed.

When we combine the SW constraint and the f-k classifier with misfit terms calculated from the true ground roll and true signal, the amplitudes of the ground roll are estimated more accurately (Figure 13j). The method with the SW constraint and the f-k classifier recovers more signal than the method with only the f-k classifier (the orange arrows in Figures 13d and 13e). Methods with hard frequency thresholding cannot separate random and blended noise from the signal because they do not impose wavenumber thresholds. Adding the SW constraint and using true labels to calculate the misfit terms helps recover the high-frequency parts of the ground roll more accurately (frequencies higher than 20 Hz in Figure 14). To further improve the results, we could use different conventional methods that remove random and blended noise from the training data with less signal leakage. The f-k spectrum of the estimated signal (Figure 15) shows that the classifier effectively removes ground roll (frequencies lower than 15 Hz) and other noise (wavenumbers with absolute values higher than 0.005 m⁻¹).

We apply the trained network parameters with the SW constraint and the f-k classifier to all of the receiver lines (Figure 16a). The estimated signal is shown in Figure 16b, and the estimated ground roll is shown in Figure 16c. Ground roll is suppressed, and the signal is recovered with accurate amplitudes. The approaches with the f-k classifier have the potential to remove not only ground roll but also other types of noise. Moreover, compared with the results of the conventional methods mentioned in the “Method” section, the method with the SW constraint and the frequency threshold can remove some of the random noise (the blue arrow in Figure 13h).

We propose a deep-learning workflow for attenuating ground roll in seismic records by successively adding constraint blocks, based on the characteristics of ground roll, to an unsupervised method. The unsupervised method is built on two CNNs that estimate the signal and the ground roll with the objective of minimizing the misfit between their sum and the noisy seismic data. The differences between the frequency-wavenumber spectra of the ground roll and the signal make them potentially distinguishable in the f-k domain, either by hard thresholding or by an f-k classifier pretrained on the training data. We add an SW constraint that transforms the signal into ground roll and infers the location of the ground roll. Results on two field data sets show that adding these constraints to the unsupervised workflow is effective in removing ground roll from the signal while also recovering the high-frequency parts of the ground roll and the low-frequency parts of the signal. The f-k classifier has the potential to remove other noisy events besides ground roll. Because the results depend on the training data, more effective conventional methods for creating true labels could further improve the network performance.

The first data set used in the manuscript can be requested from the authors.


Freely available online through the SEG open-access option.