The calculation of the gradient in full-waveform inversion (FWI) usually involves crosscorrelating the forward-propagated source wavefield and the back-propagated data residual wavefield at each time step. In the real earth, propagating waves are typically attenuated due to the viscoelasticity, which results in an attenuated gradient for FWI. Replacing the attenuated true gradient with a Q-compensated gradient can accelerate the convergence rate of the inversion process. We have used a phase-dispersion and an amplitude-loss decoupled constant-Q wave equation to formulate a viscoacoustic FWI. We used this wave equation to generate a Q-compensated gradient, which recovers amplitudes while preserving the correct kinematics. We construct an exact adjoint operator in a discretized form using the low-rank wave extrapolation technique, and we implement the gradient compensation by reversing the sign of the amplitude-loss term in the forward and adjoint operators. This leads to a Q-dependent gradient preconditioning method. Using numerical tests with synthetic data, we demonstrate that the proposed viscoacoustic FWI using a constant-Q wave equation is capable of producing high-quality velocity models, and our Q-compensated gradient accelerates its convergence rate.

Full-waveform inversion (FWI) is a data-fitting procedure used to construct high-resolution quantitative subsurface models by exploiting the full information in the observed data, which has been extensively developed in recent years (Virieux and Operto, 2009). To solve FWI in the framework of the local optimization approach, a model update needs to be computed at each iterative step and each model update involves calculating the gradient of the misfit function with respect to the model parameters (Tarantola, 2005). Due to the nonlinearity of FWI in practice, FWI may converge slowly and cause the algorithm to get trapped in a local minimum. To accelerate the convergence rate of FWI, one can calculate a more accurate model update by applying an approximate Hessian inverse to its gradient (Ma and Hale, 2012; Qin et al., 2015; Hou and Symes, 2016). Another more common approach is to precondition the gradient, for instance, using the energy of the source wavefield to normalize the gradient (Gauthier et al., 1986), weighting the gradient by depth-dependent functions (Shipp and Singh, 2002; Wang and Rao, 2009), or smoothing/filtering the gradient in the spatial frequency or other transform domains (Guitton et al., 2012; Xue et al., 2017). Because seismic waves propagating in viscoacoustic media are attenuated, a natural gradient preconditioning approach is to compensate for the energy loss caused by attenuation. Here, we limit our focus on intrinsic attenuation, although seismic scattering attenuation caused by heterogeneity also complicates the amplitude analysis (Browaeys and Fomel, 2009).

The classic approach of superposition of several standard linear solids (SLSs) has been widely used in the time-domain finite-difference method to approximate the constant-Q seismic wave propagation for a specific frequency band (Robertsson et al., 1994). However, compensating Q effects with SLSs is challenging because reversing the sign of memory variables that account for attenuation does not correctly compensate for the phase (Zhu, 2014, 2016; Guo and McMechan, 2015). To overcome this issue, a better choice is to compensate for amplitude loss by using a modified wave equation, which has separate control over amplitude loss and phase dispersion effects (Zhu and Harris, 2014). This idea has been applied previously in reverse time migration (RTM) to get more balanced illumination in seismic images (Zhang et al., 2010; Suh et al., 2012; Bai et al., 2013; Zhu et al., 2014; Sun et al., 2015; Zhu and Sun, 2017), but its application to velocity model building remains to be investigated.

In this paper, the viscoacoustic wave equation proposed by Zhu and Harris (2014) is adopted to formulate a preconditioned approach to Q-compensated FWI. This equation involves fractional Laplacians and has separate control over phase dispersion and amplitude loss effects. In this way, compensating for the amplitude loss while preserving the correct kinematics can be done by reversing the sign of the amplitude-loss factor and keeping the sign of the dispersion factor unchanged (Zhu, 2016). This property was used recently by Sun et al. (2016b) to accelerate the convergence rate of least-squares RTM. We use it in this paper to accelerate the convergence rate of FWI. In the following, we first present the theory related to the forward operator, adjoint operator, FWI gradient, and Q-compensated gradient. Then, we demonstrate the effectiveness of our preconditioning strategy for FWI using a synthetic example.

Forward operator

To describe the intrinsic attenuation behavior in subsurface media, we choose the constant-Q model (Kjartansson, 1979) that is mathematically simple, and it is widely used for applications in exploration seismology. Based on this model, Zhu and Harris (2014) propose a constant-Q viscoacoustic wave equation with decoupled attenuation effects as follows:
1c22Pt2=2P+β1{η(2)γ+12}P+β2{τt(2)γ+1/2}P,
(1)
where
γ(x)=arctan(1/Q(x))/π,
(2)
c2(x)=c02(x)cos2(πγ(x)/2),
(3)
η(x)=c02γ(x)ω02γ(x)cos(πγ(x)),
(4)
τ(x)=c02γ(x)1ω02γ(x)sin(πγ(x)).
(5)

The function P(x,t) indicates the pressure wavefield, Q(x) is the quality factor, and c0(x) stands for the acoustic velocity model defined at a reference frequency ω0. Note that γ(x) defined in equation 2 ranges from 0 to 1/2 and is a dimensionless parameter. The term starting with β1 parameter in equation 1 characterizes the velocity dispersion, and the term starting with β2 describes the amplitude loss. Thus, β1 and β2 take values of ±1 and act like on/off switches to control phase dispersion and amplitude loss effects, respectively.

Equation 1 involves fractional Laplacians and has separate controls over phase dispersion and amplitude loss, which is a big advantage as opposed to other formulations of viscoacoustic wave equation. As shown by Zhu and Harris (2014), this equation is also valid for very low Q values, such as 10–20. In our numerical example, we set the reference frequency ω0 to be very high and assume that the intrinsic attenuation will delay the propagation of the seismic wave.

Setting β1 and β2 to one, equation 1 leads to the following viscoacoustic dispersion relation with fractional powers of the wavenumber:
ω2c2=η|k|2γ+2iωτ|k|2γ+1,
(6)
from which we can solve for the angular frequency ω as follows (Sun et al., 2015):
ω=ip1+p22,
(7)
where
p1=τc2|k|2γ+1,
(8)
p2=τ2c4|k|4γ+24ηc2|k|2γ+2.
(9)
Because the first term under the square root in p2 does not affect the amplitude of wave propagation and only has a relatively small effect on the phase due to the small value of τ (Sun et al., 2015), equation 9 can be approximated as
p24ηc|k|γ+1.
(10)
In heterogeneous media, the form of ω in equation 7 can be used to define the phase shift symbol φ (Zhang and Zhang, 2009). With the use of a one-step extrapolation method, the pressure wavefield P(x,t) becomes complex and satisfies the following first-order partial differential equation:
(t+iφ)P(x,t)=0,
(11)
which can be solved using the following mixed-domain time marching operator (Sun et al., 2016a):
P(x,t+Δt)=P^(k,t)eiϕ1(x,k,Δt)dk,
(12)
where P^ is the spatial Fourier transform of P and the phase function ϕ1(x,k,Δt) is defined as
ϕ1(x,k,Δt)=x·k+φΔt=x·k+ip1+p22Δt.
(13)
To implement the wave propagation numerically, we use the low-rank approximation method of Fomel et al. (2013b) to decompose the wave extrapolation matrix ei[ϕ1(x,k,Δt)x·k] into the following separated representation by selecting a set of N representative spatial locations and M representative wavenumbers:
ei[ϕ1(x,k,Δt)x·k]m=1Mn=1NW(x,km)amnW(xn,k).
(14)
The computation of P(x,t+Δt) then becomes
P(x,t+Δt)m=1MW(x,km){n=1Namn[eix·kW(xn,k)P^(k,t)dk]},
(15)
whose computational cost equals that of applying N inverse fast Fourier transforms per time step, where N is the approximation rank. We can rewrite equation 15 in an algebraically compact form as
P(x,t+Δt)=AP(x,t).
(16)
Correspondingly, the full forward-modeling process incorporating the source term f(x,t) can be expressed as
[IAIAIAI][P(x,0)P(x,Δt)P(x,2Δt)P(x,nΔt)]=[f(x,0)f(x,Δt)f(x,2Δt)f(x,nΔt)].
(17)
The absorbing boundary condition is used to eliminate boundary effects during wave propagation, and it has been intrinsically defined in the wave-propagation matrix (Sun et al., 2016a).

Adjoint operator

Based on equation 17, the adjoint operator (conjugate transpose) can be formulated as
[IA*IA*IA*I][P˜(x,0)P˜(x,Δt)P˜(x,2Δt)P˜(x,nΔt)]=[f˜(x,0)f˜(x,Δt)f˜(x,2Δt)f˜(x,nΔt)],
(18)
where * denotes the complex conjugate transpose of a matrix. The adjoint wavefield can be recursively calculated with the following backward time marching scheme (Sun et al., 2016b):
P˜(x,t)=A*P˜(x,t+Δt)+f˜(x,t+Δt).
(19)
The operator A* carries out the following calculation:
P˜(k,t)n=1NW*(xn,k){m=1Mamn*[eix·kW*(x,km)P˜(x,t+Δt)dx]}.
(20)
Equations 15 and 20 differ from each other in the multiplication order of low-rank decomposed small matrices. In addition, the low-rank matrices used in equation 20 are the complex conjugate transpose of the ones in equation 15. The relative error of the dot-product tests using our implementation of the forward and adjoint operators is approximately 1e-6.

FWI gradient

The least-squares objective function of standard FWI can be written as
J(c0)=12dobsdsyn(c0)22,
(21)
where dobs is the observed waveform data and dsyn is the synthetic data.
According to the adjoint-state method (Tarantola, 2005; Plessix, 2006), the FWI gradient can be expressed as
g(x)=tP(x,t)·Fc0·P˜(x,t),
(22)
where the state variable P(x,t) can be obtained by solving equation 17, the adjoint variable P˜(x,t) is calculated by solving equation 18 with data residual dobsdsyn as the source term f˜(x,t), and F is the forward operator, whose discretized form can be expressed as the square matrix in equation 17. Using equations 7 and 11, F can also be expressed as
F=(t+iφ)=(t+p1+ip22).
(23)
With the definitions of p1 and p2 in equations 8 and 10, respectively, we can derive the gradient of F with respect to velocity c0 as follows:
Fc0=2γ+12c02γω02γsin(πγ)cos2(πγ2)|k|2γ+1+i(γ+1)c0γω0γcos(πγ)cos(πγ2)|k|γ+1,
(24)
which is a function of x, and can be substituted into equation 22 to calculate the velocity gradient.

Q-compensated gradient

The gradient construction in FWI requires the forward-propagated source wavefield and the backward-propagated data residual wavefield. Without compensation, the source wavefield will be attenuated, and the data residual wavefield will be attenuated once again, which means that the crosscorrelation of the source wavefield and data residual wavefield will suffer from the intrinsic attenuation along the entire wave propagation path twice.

To compensate for attenuation, we can simply set β2 in equation 1 to be 1 and keep β1 unchanged. In this way, the amplitude can be amplified based on the Q model and the dispersion effects will be counteracted (Zhu et al., 2014). Thus, the attenuation-compensated constant-Q wave equation corresponds to the following dispersion relation:
ω2c2=η|k|2γ+2+iωτ|k|2γ+1,
(25)
which defines a new phase function (Sun et al., 2015):
ϕ2(x,k,Δt)=x·k+ip1+p22Δt,
(26)
where ϕ2(x,k,Δt) is the complex conjugate of ϕ1(x,k,Δt). For consistency with the gradient amplitude in acoustic media, we need to use ϕ2(x,k,Δt) to compute the source wavefield and data residual wavefield. Although the wavefield for calculating the data residual is attenuated along the whole propagation path, the source and data residual wavefields for calculating the velocity gradient are compensated to accumulate the correct compensation factor along the entire propagation path. This resembles the Q-compensated RTM. When the crosscorrelation imaging condition is applied, attenuation compensation should be applied to the forward and backward wave propagations; when the deconvolution imaging condition is used, attenuation compensation should be applied just to backward wave propagation (Zhu, 2016).

We use a portion of a modified Marmousi velocity model (Figure 1) and a built Q model (Figure 1) to test the Q-compensated FWI. The models include a water layer from the surface down to 200 m. Figure 1 shows the initial velocity model, which is obtained by using a Gaussian function to convolve the original velocity model with a radius of 450 m, but keeping the water layer unchanged. A fixed-spread acquisition survey is generated, which consists of 26 shots spaced every 200 m. The source wavelet for generating the synthetic data is a Ricker wavelet centered at 10 Hz. The record length of the synthetic data is 2.2 s with a time step of 2 ms. To test the forward operator used in FWI, we compare the acoustic shot record generated by a finite-difference method and the shot record generated by our viscoacoustic low-rank one-step operator with Q=10,000. The comparison in Figure 2 demonstrates that the low-rank one-step operator can simulate the same shot record even though it is based on a different formulation of the seismic wave equation.

We perform two FWI experiments: one using the conventional velocity gradient and the other one using the proposed Q-compensated gradient. The multiscale technique (Bunks et al., 1995) is used to avoid the cycle-skipping problem. The seismic data are filtered into three frequency groups of increasing frequency content: 2–5, 2–9, and 2–15 Hz. The optimization method is nonlinear conjugate gradients. We performed 20 iterations for each of the frequency groups successively. Without Q-compensation, source and data residual wavefields get attenuated. Figure 3 shows the (source and data residual) wavefield snapshot and the corresponding gradient comparisons without and with Q-compensation. These wavefields have the same traveltime of 1 s and are extracted from the same iteration of FWI. It is apparent that the wavefields with Q-compensation have a larger amplitude, especially for the deeper part, in which the accumulated attenuation effects are stronger. With the compensated wavefields, we can construct a Q-compensated gradient, which has the same phase as the compensation-free gradient, but a Q-dependent amplitude recovery.

Figure 4 shows the FWI results after the inversions with and without Q-compensation for the three frequency groups. Comparing Figure 4 and 4, we observe the obvious improvement in the central and deep parts of the velocity model brought by the Q-compensated gradient. This phenomenon is easy to understand. As indicated in Figure 3, Q-compensated gradient has a larger amplitude, especially for the deep part. In the subsequent two stages, Q-compensated FWI keeps recovering more accurate velocity models at the cost of the same iteration numbers. Figure 4 and 4 shows the final inverted velocities without and with using Q-compensated gradient. The rock cap (indicated by arrows) in the velocity obtained by Q-compensated FWI has a larger value, and it is closer to the true model. Figure 5 shows the data misfit convergence comparison of FWI with and without Q-compensation at the three stages. At the initial several iterations of the first stage, Q-compensated FWI has a faster convergence rate, and then it keeps the smaller misfit as the iterations continue. Comparing the misfit difference through the three stages, we can find that the relative misfit difference becomes bigger as the data frequencies increase. This implies that the attenuation effect is more obvious for high frequencies, and therefore the Q-compensated gradient can play a more significant role for high-frequency data inversion.

We use the proposed method of preconditioning to accelerate the convergence rate of time-domain FWI for velocity reconstruction. The synthetic example shows that the Q-compensated gradient can effectively accelerate the rate of convergence, although the low-frequency components of seismic data, which are less sensitive to attenuation, are mainly used in FWI. Previous studies about time-domain FWI in the presence of attenuation have shown that considering attenuation as a smooth background modeling parameter can significantly improve the velocity reconstruction (Kurzmann et al., 2013; Bai et al., 2014). Compared with those studies, our method not only incorporates the attenuation effects in the wave simulation process but it also aims to construct an amplitude-compensated and phase-preserved gradient based on the attenuation model. Similarly, the attenuation model used in this study is assumed to be known and it is not an inversion parameter. FWI inversion strategies for simultaneous and hierarchical velocity and attenuation inversion have been investigated by Kamei and Pratt (2013). Yang et al. (2016) provide a theoretical review on the formulation of 3D viscoelastic multiparameter FWI. Qu et al. (2017) implement viscoacoustic anisotropic FWI and test their method’s sensitivity to velocity, Q, and anisotropic parameters. A background Q model can be estimated using either data-domain or image-domain tomographic techniques (Brzostowski and McMechan, 1992; Quan and Harris, 1997; Hu et al., 2011; Zhou et al., 2011; Valenciano and Chemingui, 2012; Shen et al., 2014, 2015; Shen and Zhu, 2015; Dutta and Schuster, 2016).

We have implemented a viscoacoustic FWI using phase and amplitude decoupled constant-Q wave equation. By reversing the sign of amplitude-loss term in this equation, we have constructed the Q-compensated gradient that has the correct phase information and a Q-dependent amplitude recovery. The low-rank wave extrapolation technique is used to formulate the exact adjoint operator in the construction of the FWI gradient. Our testing results show that the FWI convergence rate can be notably accelerated by the Q-compensated gradient, which helps to rapidly construct the deep structures of the velocity model.

We thank the assistant editor D. Draganov, the anonymous associate editor, reviewer J. Hou, and one anonymous reviewer for providing valuable suggestions. We thank the sponsors of the Texas Consortium for Computational Seismology for financial support. Z. Xue and J. Sun were additionally supported by the Statoil Fellows Program at the University of Texas at Austion. We also thank the Texas Advanced Computing Center for providing the computational resources used in this study. All computations presented in this paper are reproducible using the Madagascar software package (Fomel et al., 2013a).

Freely available online through the SEG open-access option.