## ABSTRACT

Full-waveform inversion deals with estimating physical properties of the earth’s subsurface by matching simulated to recorded seismic data. Intrinsic attenuation in the medium leads to the dispersion of propagating waves and the absorption of energy — media with this type of rheology are not perfectly elastic. Accounting for that effect is necessary to simulate wave propagation in realistic geologic media, leading to the need to estimate intrinsic attenuation from the seismic data. That increases the complexity of the constitutive laws leading to additional issues related to the ill-posed nature of the inverse problem. In particular, the joint estimation of several physical properties increases the null space of the parameter space, leading to a larger domain of ambiguity and increasing the number of different models that can equally well explain the data. We have evaluated a method for the joint inversion of velocity and intrinsic attenuation using semiglobal inversion; this combines quantum particle-swarm optimization for the estimation of the intrinsic attenuation with nested gradient-descent iterations for the estimation of the P-wave velocity. This approach takes advantage of the fact that some physical properties, and in particular the intrinsic attenuation, can be represented using a reduced basis, substantially decreasing the dimension of the search space. We determine the feasibility of the method and its robustness to ambiguity with 2D synthetic examples. The 3D inversion of a field data set for a geologic medium with transversely isotropic anisotropy in velocity indicates the feasibility of the method for inverting large-scale real seismic data and improving the data fitting. The principal benefits of the semiglobal multiparameter inversion are the recovery of the intrinsic attenuation from the data and the recovery of the true undispersed infinite-frequency P-wave velocity, while mitigating ambiguity between the estimated parameters.

## INTRODUCTION

Seismic full-waveform inversion (FWI) (Tarantola, 1984) is a method for estimating physical properties of the subsurface from seismic recorded data. Even though the velocity of seismic waves has been the chief estimated property in applications of FWI (Sirgue et al., 2010; da Silva et al., 2016; Routh et al., 2017; Yao and Wu, 2017; Yao et al., 2018a), the method can also be used for the estimation of other physical properties provided that the required property affects the seismic data in a defined way, at least in principle (Tarantola, 1988). The most common type of constitutive law used in commercial 3D FWI considers the earth to be an acoustic medium. More recently, it has become commonplace also to account for seismic anisotropy (Plessix and Cao, 2011; Warner et al., 2013; da Silva et al., 2016). Seismic anisotropy is particularly important to account for the different scales of heterogeneity in the medium, when compared with the signal bandwidth, as well as preferred alignment of crystals, cracks, and layers (Thomsen, 1986). Seismic anisotropy is normally indispensable for fitting seismic field data at long and short offsets (Plessix and Cao, 2011). In addition, internal friction, crystal-defect sliding, grain-boundary processes, thermoelastic effects, and/or fluid-filled cracks are responsible for anelastic attenuation of propagating seismic waves; elastic scattering from subwavelength heterogeneities can also mimic closely the effects of true anelastic loss.

Anelastic loss and subwavelength scattering attenuation are generally described by a parameter quality factor $Q$. It quantifies the quantity of energy that is lost per cycle (Aki and Richards, 2002). This effect can be accounted for by correcting the amplitude and phase of the recorded signals (Agudo et al., 2018), prior to carrying out the inversion of data. Alternatively, intrinsic attenuation can be accounted for in the constitutive laws. Because attenuating media is frequency dependent, the dependency on the frequency can be explicitly introduced in the constitutive law when using frequency-domain simulation of seismic waves (Song et al., 1995; Liao and McMechan, 1993, 1996; Hicks and Pratt, 2001; Ribodetti and Hanyga, 2004). The frequency-dependent nature of the intrinsic attenuation translates into a convolution operation in the time domain between the anelasticity and strain tensors describing hysteresis in the medium (Aki and Richards, 2002). That is, at any given instant, the state of deformation, and consequently energy exchange in the system, depends upon all of the previous states. This relation between stress and strain needs to be addressed efficiently when carrying out numerical simulations in the time domain to avoid computational burdens. Examples of methods to deal efficiently with numerical simulation in attenuating media, and in the time domain, are the use of Padé approximants (Day and Minster, 1984), the method of memory variables with the standard linear solid (SLS) (Carcione et al., 1988; Robertsson et al., 1994), the pseudospectral method (Liao and McMechan, 1993), and the temporal fractional derivative with Kjartansson’s $Q$-theory (Kjartansson, 1979; Zhu and Carcione, 2014; Zhu and Harris, 2015; Yao et al., 2017).

Song et al. (1995) and Hicks and Pratt (2001) report simultaneous inversion for velocity and intrinsic attenuation in frequency-domain FWI. Watanabe et al. (2004) introduce joint cascading inversion for the real part of the velocity using phase information and inverting the imaginary part of the velocity using amplitude information. Rao and Wang (2015) introduce inversion of velocity and intrinsic attenuation by sequential updates of velocity and intrinsic attenuation. Bai et al. (2014) discuss time-domain viscoacoustic FWI. A particularity of using frequency-domain FWI is the fact that the parameterization for the inversion can be formulated explicitly in terms of $Q$ for each frequency or in terms of the real and imaginary parts of the velocity or slowness (Pratt et al., 2004, Kamei and Pratt, 2013, Rao and Wang, 2015).

Song et al. (1995), Malinowski et al. (2011), Operto et al. (2015), and Plessix et al. (2016) present examples of the application of FWI to data sets affected by intrinsic attenuation. Those previous works focus on the minimization of the objective function by updating the model parameters iteratively with the gradient of the objective function with respect to velocity and intrinsic attenuation. First-order local methods are generally preferred to implement feasible and efficient inversion algorithms. One of the key drawbacks of that type of method, for multiparameter inversion, is that the estimated properties are affected by crosstalk as a result of ambiguity in the relation between perturbations in the model parameters and perturbations in the data. In other words, those jointly estimated properties are affected by a larger null space. That has been reported when jointly inverting for velocity and intrinsic attenuation (Malinowski et al., 2011; Kamei and Pratt, 2013; Plessix et al., 2016) with local inversion methods.

The application of global inversion methods is generally hindered by the need to perform a large number of model-space searches (Sen and Stoffa, 2013) because the size of this search space increases rapidly with the dimensionality of the model space. Nonetheless, some physical properties can be represented by smooth models, or models that have very weak spatial variation. Models with that type of property can be represented in a space of basis functions with low dimension. This concept has been used for the estimation of smooth starting velocity models for FWI (Datta and Sen, 2016; Diouane et al., 2016) and for seismic anisotropy (Afanasiev et al., 2014; Debens et al., 2015). Ji and Singh (2005) combine a genetic algorithm (GA) with the conjugate-gradient method for the estimation of the elasticity tensor coefficients in one dimension.

The quality factor is in a class of physical properties that can be approximated by a reduced basis. That is, it can be sufficient to use a smooth model of $Q$ for predicting seismic waves that have the correct kinematics and dynamics. This statement is validated later using a numerical example. Thus, global inversion of intrinsic attenuation becomes computationally feasible because the dimensionality of the search space for this parameter can be reduced.

We introduce and discuss a method for the joint estimation of seismic intrinsic attenuation and seismic velocity. This method combines conventional FWI and quantum particle-swarm optimization (QPSO) (Sun et al., 2004a, 2004b; Debens et al., 2015). The semiglobal inversion algorithm nests local gradient-descent iterations within an outer QPSO global iteration. The velocity model is represented on a dense grid, and it is updated only at the nested gradient-descent iterations. The model of the intrinsic attenuation is represented on a sparse basis, and it is estimated only at the outer global iterations. We note that such an approach is not limited to parameterizations with velocity and intrinsic attenuation only; the wave equation can describe wave propagation in more complex rheology, and in our examples, the medium includes the anisotropy of velocity.

The paper is structured as follows. First, we review the theory for the simulation of seismic waves in anisotropic viscoacoustic media. We then introduce the semiglobal algorithm and we demonstrate its effectiveness with synthetic examples, discussing the accuracy of a sparse representation of intrinsic attenuation, as well as the mitigation of crosstalk between the estimated parameters. Finally, we show an application of the global inversion scheme to a real data set acquired in the North Sea, demonstrating that the proposed approach is accurate and computationally feasible for the inversion of a large-scale field seismic data, leading to improved velocity recovery and a more-complete characterization of the subsurface.

## FORWARD MODELING

## INVERSION ALGORITHM

Tarantola and Valette (1982) set the ambitious goal that the solution of the inverse problem requires a full probabilistic description of the solution space. This, however, is generally not attainable in full large-scale 3D geophysical inversion because the computational load associated with the simulation of the data can become prohibitive if full-space searches are carried out. Consequently, the solution of large-scale geophysical inverse problems is generally estimated with local optimization methods. This class of methods relies upon a priori information, such as, for instance, obtained by determining a good starting model using seismic tomography. This approach aims to place successive local iterations within the global basin of attraction. Nonetheless, convergence toward the global minimum is not guaranteed.

### Local inversion methods — Gradient descent

### Semiglobal inversion

#### On the choice of the global optimization method

This work is focused on introducing an approach for the practical estimation of intrinsic attenuation and velocity from real data, rather than discussing the computational efficiency of optimization algorithms. We point out that similar semiglobal approaches could be obtained, in principle, combining different algorithms. For example, conjugate gradients or the L-BFGS method (Nocedal and Wright, 2006) could be used for the nested iterations. However, the semiglobal inversion algorithm introduced herein is focused on the estimation of parameters over one or two frequency bands, and for a relatively small number of iterations. In such a case, the full benefit of these methods might not be achieved, while introducing computational overhead as a consequence of increasing the number of arithmetic operations. In addition, alternative global optimization methods can also be considered. Monte Carlo methods are relatively inefficient because they require a large number of realizations in the search space. Simulated annealing (SA) mitigates this problem. However, it depends upon a cooling schedule. For a successful application of SA, the cooling schedule needs to be sufficiently slow to avoid convergence toward local minima (equivalent to the process of annealing producing a structure with defects). If that cooling schedule is too slow, then the number of space searches also becomes relatively high. In addition, SA can also require model-space searches that effectively do not produce an update. Other efficient classes of metaheuristic methods include GA and particle swarm optimization (PSO). The former encompasses a vast domain of different approaches based mainly upon natural selection. The latter is based upon the dynamics of populations or physical systems (Kennedy and Eberhart, 1995, 2001). Each element of that population, or swarm, is called a particle. Particles move in the multidimensional space evolving toward an optimum position (e.g., minimum energy). The position and velocity define the states of a particle. The evolution of each particle is determined by its own current and past states as well as the past states of the whole swarm. The position of a particle is effectively the optimization variable. Despite the need to store a memory of the system, PSO algorithms are computationally very efficient and are not memory demanding. QPSO was introduced extending the concept of swarm optimization to systems with quantum behavior (Sun et al., 2004a, 2004b). A key advantage of QPSO over PSO is its superior computational efficiency because it does not need information on the velocity vector of each particle. Hence, fewer parameters need to be adjusted. In quantum systems, the dynamics of particles is defined by a quantum potential. That potential sets an attractor toward a global minimum. The $\delta $-well (Levin, 2001) is the most commonly used quantum potential in QPSO. That is also the potential used in our implementation. Note that the state of a quantum particle is defined by a potential, and the optimization variable is the position. This sets a clear difference between PSO and QPSO as in the former the state depends explicitly on the position. In QPSO, the information on the position of each particle is obtained collapsing the potential with the Monte Carlo method (for a thorough explanation, see Sun et al., 2004a, 2004b).

Debens (2015) reports a higher efficiency of QPSO over GA. In addition, as one can immediately conclude, the dimension of the population (the number of particles in the swarm) drives the computational cost of QPSO. However, because the state of each particle can be assessed independently, this makes QPSO very suitable for parallelization over the number of particles. That is an important feature because computing model realizations can be very intensive. Parallelization is not as trivial with SA or Monte Carlo algorithms. Hence, QPSO has significant advantages over other global optimization algorithms regarding its efficiency, and for that reason it is our method of choice. We refer to Press et al. (2007) and Sen and Stoffa (2013) for a thorough description on global optimization methods.

#### The semiglobal inversion algorithm

Algorithm 1 describes a QPSO algorithm combined with local nested iterations of steepest descent. It is the combination of those two methods that gives the semiglobal algorithm introduced herein. Effectively that algorithm can be used in any inversion application changing the function to carry out the model realizations. The position of a particle $k$ in the swarm, at the global iteration $n$, is denoted $\eta k,n$. Note that effectively $\eta k,n$, represents the model of attenuation, $m2$, associated to a particle $k$ at the global iteration $n$. One can then define the set of all the positions of the particles in the swarm at iteration $n$ as, $Mn={\eta 1,n,\eta 2,n,\u2026,\eta Np,n}$. The object $M0$ defines the set of the initial position of the particles in the swarm. The elements of $M0$ must be uniformly distributed. Each position $\eta k,n$ has a respective level of energy $Jk,n$. The objective of the QPSO optimization is minimizing the overall energy of the system measured by an objective function. The objective function can be arbitrary. However, in the context of geophysical inversion and herein, the objective function is the $L$_{2}-norm of the data misfit, $Jk,n(m1,\eta k,n)=12\Vert p(m1,\eta k,n)\u2212d\Vert 22$, where $d$ denotes the data and $p(m1,\eta k,n)$ is a model realization. The set $Jn={J1,n,J2,n,\u2026,JNp,n}$ contains the corresponding level of misfit for each element of $Mn$. The set of the best overall position for each particle is denoted $M*={\eta 1*,\eta 2*,\u2026,\eta Np*}$. Each element of $M*$ is the position for each particle at which the lowest value of the objective function was reached, between the first and a given iteration. The corresponding set of data misfits is denoted $J*={J1*,J2*,\u2026,JM*}$. Finally, the best overall position of the swarm $\eta g*$ corresponds to the minimum of $M*$, denoted as $Jg*$. Formally, $Jg*=Ja,b$ with $a,b=argmina\u2208{1,..,Np},b\u2208{1,..,n}(Jn)$, and $\eta g*=\eta a,b*$. The update of each particle’s position depends on the diagonal matrices $\phi $ and $\varphi $, with random numbers uniformly distributed on their diagonals, and on $r$ (0,1), which denotes a random number following a normal distribution with zero mean and unit variance.

The parameter $\beta $ is the contraction-expansion coefficient. Sun et al. (2012) report that $\beta =0.75$ is a suitable value for unimodal problems (only one global minimum and possibly several local minima). We did not test the effect of changing this parameter because it has been investigated previously and the value pointed out yielded good convergence in all the examples in the outline.

We point out that the semiglobal algorithm is described in a general fashion because it effectively finds applications with several classes of parameters (e.g., seismic anisotropy or elastic parameters). In this scope, the position of a particle is effectively a model of $Q$ represented over a sparse basis and parameterized with the logarithm of base 10.

### Basis functions

The dimension of the basis functions is related to the scale of the wavelengths that are present in the distribution of a given model parameter in space. That is, the smaller the wavelength of the anomalies, the higher the dimension of the space of the basis functions has to be. Choosing the space of the basis functions appropriately allows reducing the dimension of the discretization space. For example, wavelets are remarkably flexible to represent large- and small-scale anomalies with a small number of basis functions (Daubechies, 1992). That fact has been exploited for devising model-compression schemes in second-order Gauss-Newton inversion (Abubakar et al., 2012).

For large-scale global (or semiglobal) inversion methods, and when the estimate of the states of the system is computationally intensive (as in the case of using finite-difference or finite-element methods), the use of global (or semiglobal) inversion is feasible for up to a few tens of parameters. Thus, that requires two factors to be considered — either the parameter to be estimated has a smooth variation in space, or not having, its smooth representation yields a model-response very close to that of a full representation of all its wavelength components.

Here, we take advantage of the fact that for some classes of physical properties, it is sufficient to know their long-wavelength components to obtain a relatively accurate simulation of seismic data or at least be accurate enough to carry out inversion. Examples of physical properties that can often be represented with a low-dimensional basis are the background P-wave velocity, seismic anisotropy parameters, $VP$-to-$VS$ ratio (da Silva et al., 2018a), and P-wave intrinsic attenuation. Here, we estimate models of intrinsic attenuation in a low-dimensional space, henceforth referred to as the reduced-space or sparse, grid, whereas the acoustic velocity is estimated in a conventional grid, henceforth referred to as the full-space grid. This approach requires a mapping between reduced and full-space grids. Here, we perform such mappings with B-splines (Schumaker, 2015). The mapping approach consists in estimating the values of an unknown parameter over the nodes of a sparser grid. Then, that parameter is determined in the full-space grid prior to carrying out the steepest-descent iterations. Note that we do not regularize the estimates of the model of the intrinsic attenuation in the case of semiglobal inversion.

## AMBIGUITY BETWEEN THE ESTIMATES OF $VP$ AND $Q$

In this example, we investigate numerically the different behavior of the gradient-descent and the semiglobal inversion methods when dealing with ambiguity. We consider a velocity model with a positive anomaly superimposed on a background defined by a positive gradient of velocity, as depicted in Figure 1a. The $Q$ model is depicted in Figure 2a, and it has a homogeneous background with an anomaly superimposed with relatively low $Q$. The background of the $Q$ model does not absorb energy. The region of the low-$Q$ anomaly absorbs energy.

The anomalies of the P-wave velocity and $Q$ are not correlated in space. Their spatial dimension is chosen such that both of them can be represented over a sparse grid. This allows testing the robustness of the semiglobal algorithm to ambiguity between the estimates of the P-wave velocity and $Q$.

The semiglobal inversion estimates the coefficients of the basis functions with B-spline interpolation at the nodes marked with circles in Figure 2a, 2c, and 2d. The semiglobal inversion has three degrees of freedom. Two of the unknowns are the values of the coefficients at the red and black nodes. The third degree of freedom is assigned to all the white nodes. The black node overlaid to the velocity model in Figure 1a has the same position as that in Figure 2. If the semiglobal inversion algorithm is robust to crosstalk, then the estimated coefficient of the basis function, located at the position of the black node, is close to the value of the $Q$ of the background.

We generated data for 345 shots, spaced at 24 m, and at a depth of 6 m. The temporal dependency of the source wavelet is given by a Ricker wavelet with a peak at 8 Hz. The receiver geometry is fixed with 720 receivers, spaced at 12 m, and at 12 m of depth. Estimating parameters from data contaminated with noise is a key issue in inverse problems as a result of poorer conditioning. Hence, the existence of noise in the data is an important factor to consider when testing the robustness of an inversion algorithm. We then generated a second synthetic data set contaminating the synthetically generated data with noise. The noise is generated with a random variable following a Gaussian distribution with zero mean and unit variance. The noise is convolved with the source wavelet to match its spectrum with that of the data. We set a loss of 5% in the signal-to-noise ratio, when adding noise to the data.

We carried out a series of tests inverting the data with FWI and with the semiglobal inversion combined with FWI. The local FWI inverts the data over four frequency bands with cut-off filters applied at 2.5, 3, 3.5, and 4 Hz. The updates are iterated 12 times in the first frequency band and six times in all the subsequent bands of frequency. The semiglobal inversion is carried out at 2.5 Hz only, with six outer global iterations and two nested local gradient-descent iterations for velocity only. Conventional FWI is then carried out with the estimated P-wave and $Q$ models at 3, 3.5, and 4 Hz, iterating six times in each frequency band. The overall amount of P-wave velocity iterations is the same in both cases. FWI inverts all the data, whereas each nested local iteration of the semiglobal inversion fits one sixth of the data. That means that after completing the semiglobal inversion all the data have been used twice, but at a fraction of the cost. The starting model, for each parameter, for all the local inversions is the background model. The semiglobal inversion generates starting models for $Q$ randomly, setting the range between 1.2 and 5 on a logarithmic scale of basis 10.

The jointly inverted velocity and $Q$ models, with FWI, are depicted in Figures 1b and 2b, respectively. The data used in this example are noise-free. One can observe that the algorithm did not recover the high-velocity anomaly. Even though the starting model of $Q$ is the exact background, the final estimated model has very large errors. It diverged toward a completely different background model with low values of $Q$. In addition, it shows a hint of correlation with the high-velocity anomaly, as it presents an artifact matching closely the top of this velocity anomaly.

Figure 1c and 1d depicts the velocity models estimated with the semiglobal inversion, with noise-free and noisy data, respectively. The respective estimated $Q$ models are depicted in Figure 2c and 2d. One can observe that the high-velocity anomaly is very well-recovered when the data are noise-free. The existence of noise in the data affects the estimate of the velocity anomaly. However, it is clear that the algorithm updates correctly the velocity anomaly, increasing its magnitude at the correct position in space. It is also important to note that the nested local iterations of semiglobal inversion use a reduced data set per iteration, hence decreasing redundancy and degrading the signal-to-noise ratio. Therefore, a stronger effect of the noise over the estimates at the nested iterations is expected. The corresponding $Q$ models are estimated with a very good accuracy regarding its background, positioning of the anomaly, and their respective magnitudes. The small-scale high-$Q$ anomalies in the vicinity of the true low-$Q$ anomaly are artifacts generated by the interpolator. These have no impact overall because these values of $Q$ are effectively very high and they have a smooth variation yielding effectively the same response as the background. One can observe that the presence of the high-velocity anomaly did not influence the estimated $Q$, resulting from semiglobal inversion. In other words, there is no evidence of ambiguity between the P-wave velocity and $Q$. Those velocity and $Q$ models are then used to carry out conventional FWI from 3 to 4 Hz, updating for velocity only. The resulting velocity models are depicted in Figure 1e for noise-free data and in Figure 1f for noisy data. Those pictures show a further improvement of the inverted velocity anomaly. The corresponding inverted velocity models, estimated with FWI only, assuming that the $Q$ model is known, are depicted in Figure 1g (noise-free data) and 1h (noisy data). These results show that the semiglobal inversion algorithm can effectively determine a model of $Q$ that is comparable with the true model of $Q$, leading to improved inversion results. The semiglobal inversion clearly outperformed the joint local inversion for velocity and $Q$ with conventional FWI regarding accuracy and suppression of ambiguity, due to the trade-off between estimates.

### A further analysis on ambiguity

As a last remark, in the case of parameterizing $Q$ with the logarithm of base 10, equation 16b becomes, $\delta C(m,\omega )=(\u2202C/\u2202vU)\delta vU+(\u2202C/\u2202Q)(\u2202Q/\u2202(log10\u2009Q))\delta (log10\u2009Q)$. Hence, the same conclusions regarding the trade-off between estimates hold.

## SYNTHETIC EXAMPLES

### Comparison of shot records obtained with true and background $Q$

We show with a numerical example, that in the case of $Q$, it is sufficient at least in some practical settings, to represent the model with the correct long wavelength, thus demonstrating the relevance and feasibility of the method outlined in this paper. We first generate an acoustic shot record for different combinations of velocity and intrinsic attenuation. The velocity model is the Marmousi model (Figure 4a), and synthetic shot gathers are generated without $Q$, and with the $Q$ models depicted in Figure 4b and 4d. The intrinsic attenuation model depicted in Figure 4d is the long-wavelength component of the model of $Q$ shown in Figure 4b. The model of velocity anisotropy is kept fixed in all the synthetic examples. The Thomsen’s parameters $\epsilon $ and $\delta $ defining the model of anisotropy are depicted in Figure 5a and 5b, respectively. The receiver geometry is fixed with a 12 m horizontal receiver separation, 720 receivers in total, at 12 m depth. The source is placed at 6 m depth and at 1440 m from the leftmost boundary of the model. The temporal history of the source function is given by a Ricker wavelet with a peak frequency at 8 Hz. A free-surface boundary condition is imposed at the top of the model, and absorbing boundaries (Yao et al., 2018b) are used at the lateral and bottom boundaries.

Figure 6a depicts a shot gather generated without intrinsic attenuation. Figure 6a and 6b depicts shot gathers generated with the intrinsic attenuation models shown in Figure 4b and 4d, respectively. Comparing the highlighted events, the effect of intrinsic attenuation in the synthetic data can be seen clearly by observing the stronger amplitude of the events in the shot record that is not affected by intrinsic attenuation (Figure 6a). Figure 6d depicts the difference between the shot gathers in Figure 6b and 6c. One can see that the difference between those shot gathers is residual. To compare these data more closely, we compare two traces from each one of the shot gathers. We selected trace number 300, with a relatively short offset (Figure 7a) and trace number 700 with a larger offset (Figure 7c). One can see that the traces generated with attenuating models are different from the trace generated without intrinsic attenuation in the model. In addition, one cannot detect any noticeable difference between the traces generated with the different models of intrinsic attenuation. Because the intrinsic attenuation affects the frequency response recorded in the seismic data, it is important to carry out a comparison between the spectra of these traces. Figure 7b compares the spectrum of the traces in Figure 7a, and Figure 7d compares the spectrum of the traces in Figure 7c. One can see that the spectrum of the data generated with intrinsic attenuation is different from the spectrum of the data generated without intrinsic attenuation, in both cases. In addition, one can also make the observation that the spectrum remained unchanged despite ignoring the high-wavenumber component of the model of intrinsic attenuation at both selected source-receiver offsets. We emphasize that the background model of $Q$ is correct in both cases. This is a clear demonstration of a potential ambiguity in the model space because two different models have a very close response. In the presence of noise, this issue becomes more critical. One can then conclude that data generated, and possibly recorded, with similar settings (geometry and bandwidth) to those in this example, may not contain enough information to reconstruct the high wavenumbers of the model of intrinsic attenuation. This example also shows that the high wavenumbers in the true $Q$ model can be largely ignored when estimating the P-wave velocity, provided that the background model of $Q$ is sufficiently accurate.

### Inversion with the true and background Q models

To test the hypothesis formulated in the previous sentence, we carried out the inversion for velocity only using FWI generating a full data set using the Marmousi model (Figure 4a) and the model of $Q$ with long- and short-wavelength perturbations (Figure 4b). The synthetic data are generated for 345 shots spaced 24 m along the horizontal and placed at 6 m depth. The receiver geometry and temporal dependency of the source wavelet are the same as in the previous example.

The inversion is carried out using the starting velocity model depicted in Figure 4c, and the model of intrinsic attenuation is left unchanged throughout the inversion. Thus, only the velocity model is updated. The inversion is set to run in blocks of six iterations per frequency band. After completing each block of iterations, the frequency band is widened 1 Hz. The first frequency band is selected applying a cut-off filter at 3 Hz. Then, the process is repeated up to a band of frequencies limited at 16 Hz. This means that we run 84 iterations in total. Inversions are carried out leaving $Q$ unchanged throughout the inversion, and using the models depicted in Figure 4b and 4d. The corresponding inverted velocity models are depicted in Figure 8a and 8b, respectively. By inspection of the two figures, one can conclude that the differences between the two inverted models are negligible. That result demonstrates that P-wave velocity can be estimated accurately inverting surface seismic data if the background $Q$ model is known. The latter can be represented in a sparse basis. Then, a sparse representation of $Q$ is relevant, with the proviso that the long wavelengths of the perturbations of $Q$ are appropriately represented in that reduced basis.

### Conventional FWI versus semiglobal inversion

#### Inversion with fixed $Q$

In this example, the medium is viscoacoustic and it has velocity anisotropy. The data are generated using the Marmousi model (Figure 4a) and the model of $Q$ in Figure 4b. We use those models in all the following synthetic examples. The model of seismic anisotropy is defined by the Thomsen’s parameters depicted in Figure 5a and 5b. The source-receiver configuration is the same as that used in the previous section.

Conventional FWI is carried out over the generated synthetic data set using the same setting as that in the example in the previous section. First, it is assumed that the model of $Q$ is known, leading to the inverted velocity model in Figure 9a. As expected, the inversion recovers most parts of the structures quantitatively and qualitatively. When the intrinsic attenuation is ignored throughout the inversion, the inverted velocity model shows strong errors as shown in Figure 9b. The effect of energy absorption in the upper left region of the model due to the presence of an anomaly with small values of $Q$ is clear. The effect of the error in the model of intrinsic attenuation (as it is disregarded) becomes especially visible in the region deeper than 1.2 km, where the high-velocity sharp structures become completely smeared (highlighted with white arrows). In addition, the average estimated velocity is lower than that of the true velocity model. That is most visible in the central and deeper part of the model. The error in the macrovelocity model impedes the correct positioning and reconstruction of the high-wavenumber anomalies of the P-wave velocity. Those inaccuracies happen because in the presence of intrinsic attenuation, propagating seismic waves suffer from dispersion, meaning that the lower frequency components of the signal propagate at a slower rate than that of the higher frequency components (Aki and Richards, 2002). This implies that the overall envelope of energy, or group energy, propagates more slowly. Because the generated synthetic data are affected by intrinsic attenuation, when inverting this data set without accounting for intrinsic attenuation will drive the estimated velocity to be less than the true one to correctly match the traveltime and recorded phases at the receiver positions.

#### Joint local inversion without regularization

A third example involves jointly inverting for velocity and intrinsic attenuation. The starting model for $Q$ is homogeneous and with $Q=105$. This inversion example does not include the regularization term. Hence, the inversion is carried out minimizing only the data-misfit term. In this case, the inverted velocity model, depicted in Figure 9c, shows short-wavelength artifacts and a very poor reconstruction especially in the central region of the model. In addition, some of the anomalies are smeared, as in the previous example. The corresponding inverted model of $Q$ is shown in Figure 10a. One can observe that the inverted $Q$ (with conventional FWI) contains short- and long-wavelength perturbations. The background model of the inverted model is on average close to the background of the true model. That is, the algorithm estimated a model with a background that quantitatively approached the true values of $Q$. Nonetheless, this estimate of the background of $Q$ is still inaccurate and does not have the same distribution in space as that of the true model (Figure 4b). In addition, the inverted model of $Q$ shows short-wavelength anomalies, which do not exist in the true model. These artifacts clearly correlate with interfaces of the anomalies in the true models of velocity and $Q$, and they result significantly from a trade-off between the estimation of velocity and $Q$.

The trade-off between the estimated parameters results from the fact that anomalies of velocity and $Q$ scatter energy with a similar dependency on direction of the radiation envelope, as pointed out earlier. Then, it is very difficult to determine which perturbed physical property has originated a perturbation in the data. That statement is supported by the radiation patterns of scattered energy (Figure 3).

#### Joint local inversion with regularization

As demonstrated, the correct estimation of the long wavelengths of the model of $Q$ is sufficient for a relatively accurate reconstruction of the P-wave velocity model. In addition, the previous example shows that joint local estimation of velocity and intrinsic attenuation is not suitable for carrying out this type of inversion, as a consequence of the strong ambiguity in the estimated models. However, in this case, all the wavelength components of the $Q$ model are estimated throughout the inversion. Hence, a question that is naturally raised is what happens if the local inversion penalizes the short-wavelength components of the model of intrinsic attenuation. Smoothing the update of $Q$ with a local scheme is conceptually equivalent to the effect of using a reduced basis. Malinowski et al. (2011) and Kamei and Pratt (2013) report smoothing regularization over $Q$. The former applied a smoothing operator over the gradient for the model parameter(s), whereas the latter regularized the update of $Q$ with a smoothing penalty term.

As pointed out earlier, the joint estimation of P-wave velocity and $Q$ suffers from ambiguity in the whole range of wavenumbers. Hence, smoothing the successive updates of $Q$ can at best eliminate ambiguities with a high wavenumber. For the low wavenumbers, this issue is significantly ameliorated as long as the long wavelengths of the starting velocity model are estimated correctly. This condition is generally met because it is necessary to carry out FWI successfully.

As in the previous example, the starting model of $Q$ is homogeneous and with $Q=105$. The inverted velocity model, depicted in Figure 9d, shows a relatively good reconstruction of the long- and short-wavelength anomalies. However, high-wavenumber artifacts are still observed. Even though the inverted velocity model improved in comparison with the one in Figure 9c, its reconstruction is not as good as the one obtained when the true model of $Q$ is known (Figure 9a). The corresponding inverted model of $Q$ is shown in Figure 10b. One can observe the clear effect of the regularization term because only the long wavelengths of $Q$ are estimated. The values of the inverted model of $Q$ are on average close to the background model (Figure 4d). However, qualitatively, one can observe that the models have very different structure. The inaccuracy of the estimated model of $Q$ introduces traveltime errors and errors in the interfering pattern of the modeled wavefield. This introduces overall errors in the simulated recorded phases that will be translated into errors in the estimated velocity model. Hence, one can conclude that the smoothing regularization term is effective in eliminating high-wavenumber artifacts in the inverted $Q$ model and mitigating some of the high-wavenumber artifacts in the velocity model resulting from a trade-off between estimates (comparing Figure 9c and 9d). However, it is not effective in estimating a sufficiently accurate background $Q$ model. As a result the jointly inverted velocity model is also inaccurate.

#### Semiglobal inversion

The semiglobal inversion scheme jointly updates velocity and $Q$ at 3 Hz, using a swarm of 20 particles, performing 21 global iterations each with three nested local iterations for velocity only. Only each sixth shot is used in a nested local iteration. The whole set of data was used upon completing the semiglobal inversion. The starting velocity model is the same as that used in conventional FWI (Figure 4c). The starting models for $Q$ are generated randomly setting a range of variation between 10 and 1000 for this parameter. The search space of $Q$ is parameterized with its logarithm of base 10. The values of $Q$ are estimated at the position of the nodes (in white) overlaid to the true $Q$ model (Figure 10f), except at the ones at the top row. These are forced to match the value of $Q$ in the seawater. The resulting inverted velocity and $Q$ models are depicted in Figures 9e and 10c, respectively. The $Q$ model estimated with the semiglobal inversion is very close to the background of the true model. Effectively the semiglobal inversion method estimated an accurate background $Q$ model, concerning its structure in space and the magnitude of the recovered anomalies. Those inverted models are used to carry out the conventional FWI inversion subsequently. The local FWI starts at 4 Hz, and the frequency band is widened 1 Hz after the completion of blocks of six iterations up to 16 Hz.

The final inverted model of velocity is depicted in Figure 9f. One can observe that this model is very similar to the inverted velocity model when the true model of $Q$ is used to invert for the velocity only (Figure 9a). There is no evidence of an existing trade-off between the estimated velocity and the estimated intrinsic attenuation. It is important to note that errors in the estimated models of $Q$ affect the estimates of the velocity model, as one can conclude from the previous examples. There is no evidence of the existence of these errors affecting the estimate of the velocity model (Figure 9f) because this model is comparable with that when the true model of intrinsic attenuation is known (Figure 9a).

#### Projection of the gradient of $Q$ onto a sparse basis

As we demonstrated above, the regularized inversion example does not lead to an accurate update of the long wavelengths of $Q$. On the other hand, the semiglobal inversion algorithm was very effective in estimating a $Q$ model that is very close to the true model. However, one could argue that such discrepancy is because the regularization term behaves much differently than estimating the components of the long wavelengths in a sparse basis. In this example, we test that hypothesis by projecting the updates of $Q$ over the same sparse basis as the one used in the semiglobal inversion algorithm. See Appendix C for details on the projection onto a sparse basis.

The inversion is carried out jointly updating the velocity and the intrinsic attenuation over the same bandwidth used in the previous local inversion examples. The number of local gradient-descent iterations is also the same. The key difference in this example is the projection of the gradient of $Q$ onto the sparse basis. That projection is carried out at each local iteration after computing the gradient of $Q$. The projected gradient is then used to compute the update of $Q$. The starting model of $Q$ is homogeneous with $Q=105$. Figure 9g depicts the resulting inverted velocity model. One can see that the main features are reconstructed. However, this model shows high-wavenumber artifacts. In addition, the velocity anomalies in the central part of the model are smeared. The inverted model of velocity is comparable with that obtained regularizing the update of $Q$ with a smoothing constraint. The corresponding model of inverted $Q$ is depicted in Figure 10d. The updates of $Q$ converged toward a model that is almost homogeneous, and it does not show any of the anomalies present in the background model. However, the values of $Q$ are very low; therefore, in a sense the local inversion updates approximated an attenuating medium, which is the main feature of the true model of $Q$. We carried out several inversion tests with different starting $Q$ models leading to similar results.

#### Inversion of data affected with noise

The noise is generated with a random variable with Gaussian distribution with zero mean and unit variance. The spectrum of the noise is then matched to that of that data, convolving the noise with the source wavelet. Finally, we contaminated the synthetically generated data with the generated noise setting a loss of 5% in the signal-to-noise ratio. One should notice that FWI carried out subsequently to the semiglobal inversion relies upon estimates of velocity and $Q$ models that are affected by noise. Hence, in this section, we compare our results against the effect of noise when the $Q$ model is known a priori.

We first inverted for velocity only with the same number of iterations and frequency bands as used in the previous examples. The model of $Q$ is the true model and is unchanged throughout the iterations. The resulting inverted velocity model is depicted in Figure 9h. One can identify some degradation of the reconstructed velocity model when it is compared with the noise-free data example (Figure 9a). More noticeable is the less accurate reconstruction of the high-velocity layer highlighted with the white arrow, and the small-scale structures highlighted with the black ellipse. Nonetheless, the inverted model is still relatively accurate overall.

We then carried out the semiglobal inversion with subsequent FWI. We used the same setting as that used previously in the noise-free examples. The final inverted velocity and $Q$ models are depicted in Figures 9i and 10e, respectively. First, one can observe that the quality of the inverted velocity model is comparable with that of the model inverted when the true model of $Q$ is known (Figure 9h). That is highlighted with the white arrow and the black ellipse. The inverted model of $Q$ is less accurate than that estimated with noise-free data. This is most visible as the low $Q$ anomaly in the upper-left region has a longer extension than that of the true model. In addition, the overall values of $Q$ are smaller in the upper-right region and bottom-left region than those estimated with noise free data and than those of the true model. However, overall, the model is still very accurate, and the final estimate of the velocity model is comparable with that when the true model of $Q$ is known. Therefore, there are no inaccuracies related to $Q$ being introduced in the estimates of P-wave velocity.

## REAL-DATA EXAMPLE

The field data set was acquired in shallow water in the Norwegian North Sea. This field is a gas-condensate discovery, and the reservoir is a fractured chalk formation sitting at the crest of an anticline. This reservoir sits in a seismically obscured region due to the presence of a gas hosted in an overlying formation of interbedded sandstone and silt (Granli et al., 1999; Warner et al., 2013).

A full-azimuth, 3D, ocean-bottom cable survey, was carried out over this field. The cables were 6 km long with inline spacing of 25 m, and the crossline spacing between adjacent cables was 300 m. Figure 11 shows the acquisition geometry with respect to the overlying gas cloud and a well drilled in the area. The shooting was carried out orthogonally to the receiver cables, with 75 m crosstrack and 25 m along-track separation, representing a total of 96,000 sources for 5760 4C receivers, covering an area of $180\u2009\u2009km2$ approximately. The dark-blue lines, labeled IL for the inline, and XL for the crossline, represent the position of selected sections for plotting the velocity and $Q$ models in the following examples.

### Setting the inversion framework

Low-frequency content in the data is crucial for the successful application of FWI (Bunks et al., 1995; Sirgue and Pratt, 2004). However, real data are affected by noise thus limiting the region of the spectrum that can be used and consequently imposing a restriction on the lowest frequency that can be used in the inversion. The band of frequencies that can be used in the signal is determined from the signal-to-noise ratio. The starting frequency is determined from the phase variation within common-receiver gathers by identifying the lowest frequency with a coherent signal. For this data set, it was determined that the starting frequency is 3 Hz.

The data were acquired with four components, from which three components are geophones, which measure the particle velocity, and one component is a hydrophone, which measures pressure. In this viscoacoustic inversion, we invert only the hydrophone data. Source and receiver ghosts, the source bubble, surface and internal multiples, precritical and postcritical reflections, and wide-angle refractions are all included in the inversion.

Determining a wavelet that simulates the seismic data with significant accuracy over the frequency range of interest is important for inverting data with FWI. The source signature was derived from the direct arrival, and the source and receiver ghosts and water-bottom multiples were deconvolved. This source wavelet has to be free of ghosts and multiples because the modeling of seismic data includes an explicit free surface, which will regenerate all of these arrivals.

FWI is computationally intensive, and it scales with the dimensions of the grid and the number of time steps with $O(n4)$, where $n$ is the number of cells. The number of shots to be simulated then multiply this computational cost, representing a large computational load if the volume of data to be inverted is large. This burden is decreased dramatically by considering reciprocity, that is, interchanging receivers with sources and vice-versa. Thus, the number of effective shots in the simulation of data is reduced to 1440 sources (because only the hydrophone components are being used) at each iteration with 96,000 reciprocal receivers. Because the data set covers the area very densely, a sparse subset of the shots can be used, which further reduces the computational cost (van Leeuwen and Herrmann, 2013). This sparse subset of reciprocal sources is different at each iteration. However, all the data are used after completing each block of iterations for each selected frequency band.

A good starting model has to accurately account for the kinematics of wave propagation and guarantee that the misfit between the recorded and simulated data is within half a cycle. Early tests carried out to determine a starting velocity model demonstrated that it is not possible to match the short and long offsets using an isotropic model. This indicated that seismic anisotropy has to be incorporated. In fact, the existence of anisotropy would be expected from the geologic characteristics of the area, due to the fine layering of silt and sand in the overlying gas-cloud region, as well as potentially in the fractured chalk in the reservoir.

Figure 12 shows the starting velocity model. This model is built from reflection tomography by picking residual moveout on the prestack time-migrated volume. The parameters of anisotropy $\epsilon $ and $\delta $ (Figure 13a and 13b) are estimated by matching the stacked time-migrated gathers to well information and by matching longer-offset moveout. The starting velocity model matches the kinematics of the real data, presenting an anomaly with low velocity at the center of the model matching the region of the gas cloud. This low-velocity anomaly was inserted in the model to match prior geologic information from a sonic log. This sonic log was acquired in a vertical well that crosses the section along the edge of the gas cloud, in the direction orthogonal to that section, down to the reservoir, at the crest of the anticline and crossing the deeper carbonates with higher velocity.

A close correlation can be observed between the sonic log and the variation of the velocity profile along the vertical, where the velocity of the sonic log decreases in the region of the overlaying gas cloud and an increase in the velocity correlating with the crest of the carbonates structure. A comprehensive description of this seismic data set preprocessing, as well as the starting velocity- and anisotropy-model building is outlined in Warner et al. (2013).

In this example, inversion is carried out up to 6.5 Hz. The models of anisotropy are smooth and match the kinematics of the waves only, rather than the dynamics. These approximations for velocity anisotropy are valid in the selected frequency range where the dynamics plays a lesser role than the kinematics. The density is derived from Gardners’ law. The presence of gas in the overlying interbedded silt and sand creates a low-velocity anomaly, and it is responsible for strong intrinsic attenuation of the seismic energy that propagates through this region. This has an impact in the inversion of the seismic data and resulting inverted models if not taken into account.

The inversion was carried out for six frequency bands with a cut-off low-pass filter applied at 3.0, 3.5, 4.1, 4.8, 5.6, and 6.5 Hz. The order of the shots is randomized prior to running the inversion to avoid generating coherent interference patterns. The seismic anisotropy is left fixed throughout the inversion.

### Inversion without intrinsic attenuation — Conventional FWI without $Q$

FWI without $Q$ is carried out for velocity only, the medium is assumed to be VTI, and intrinsic attenuation is not considered in the constitutive relation. The inversion is carried out iterating 18 times for each frequency band, up to 5.6 Hz, and using every 18th shot. In the last frequency band, the number of iterations is 36, and every 18th shot is used. This means that the overall number of local gradient-descent iterations is 126, and that all the data have been used upon completing each block of iterations in each frequency band. Each trace is thus used just seven times during the inversion.

### Semiglobal inversion followed by conventional FWI with $Q$

The semiglobal inversion of the seismic data assumes a medium with anisotropy and intrinsic attenuation. In the first stage, the inversion updates the velocity and $Q$ with the outlined semiglobal inversion algorithm. Then, conventional FWI is used inverting for velocity with local gradient-descent updates, while keeping the $Q$ model fixed. The seismic anisotropy is not updated in any of the stages. The same starting velocity model is used, and a population of 12 particles with random models is generated. The random values of $Q$ in the starting models range between 10 and 2000. The values of $Q$ are estimated in the gas cloud, sediments and carbonates regions over the grid nodes of the sparse basis as depicted in Figure 12. However, we do not estimate a different value of $Q$ for each node because this would represent an insurmountable computational cost. Instead, we use the structure of the starting velocity model as a prior for defining three distinct regions. We then group the nodes lying within each one of these regions, and the same value of $Q$ is assigned to all nodes within that same group. We consider three principal distinct regions in the model. The first is the gas-cloud (black nodes), the second is the carbonates formation (gray nodes), and the third is the background (white nodes). As in the case of the synthetic example, $Q$ is not estimated at the node positions in the first row. At these positions, we set $Q$ fixed and equal to that of the seawater ($Q=105$). Effectively, the dimension of the sparse basis is three; i.e., only three values of $Q$ are estimated. Semiglobal inversion is carried out with six semiglobal iterations at 3.0 Hz and six semiglobal iterations at 3.5 Hz, each with three nested local iterations. This means that upon completing the semiglobal inversion, the algorithm iterated the velocity model 36 times with 18 iterations at 3.0 and 3.5 Hz. Thus, the velocity model was updated as many times as in the case of the conventional FWI after completing two bands of frequency. We also further reduced the amount of data used in each local nested iteration, using only every 36th shot. This means that all of the data have been used after completing the 36 semiglobal iterations. This procedure did not produce artifacts in the estimated velocity model with the semiglobal inversion because the survey is very dense, the inversion frequency band in the signal is relatively low, and the shot positions are randomized prior to inversion. After completing the semiglobal iterations, we switched to the setting used in the previous example with conventional FWI, holding fixed the $Q$ model estimated with the semiglobal inversion. The FWI inversion starts at 4.1 Hz and runs up to 6.5 Hz.

Figure 14a and 14b shows inline sections of the inverted velocity models using conventional FWI, without $Q$, at 3.5 and 6.5 Hz, respectively. One can note that at 3.5 Hz, the inversion has changed the starting model significantly. Most visibly is the sharpening of the gas cloud and channels as well as significant updates of the velocity in the region of the anticline. At 6.5 Hz, the gas cloud becomes larger and the velocity of the structure at the bottom decreases. The inversion clearly is forcing an estimate of velocity that is relatively more homogeneous when compared with that at 3.5 Hz. There is significantly less energy propagating closer to the edges, which implies that the estimates of velocity are less constrained toward the edges.

Figure 14c shows the velocity model after carrying out the semiglobal inversion up to 3.5 Hz. As pointed out earlier, the overall number of nested local iterations is the same as that when carrying out FWI at 3.5 Hz. The resolution of the velocity model depicted in Figure 14c is the same as that of the velocity model shown in Figure 14a. The velocity model depicted in Figure 14c is used as a starting model to carry out conventional FWI, with $Q$, starting at 4 Hz and up to 6.5 Hz. Figure 14d depicts the final inverted velocity model resulting from semiglobal inversion followed by conventional FWI with $Q$. Comparing Figure 14b and 14d, one can identify that both inversion approaches lead to comparable velocity models. Both show a sharpening of the gas cloud and of the lateral channels through which gas has traveled. In both cases, the structure at the top of the anticline, corresponding to the reservoir of fractured chalk, is resolved. An evident difference between the resulting inverted velocity models is the fact that the gas cloud shows higher heterogeneity and is sharper when intrinsic attenuation is considered in the inversion. In particular, the bottom region of the gas cloud is now clearer, and is almost detached from the main gas cloud. We show the corresponding velocity models inverted with FWI and the semiglobal method at 6.5 Hz, along the crossline direction, in Figure 15a and 15b, respectively.

The overall magnitude of the P-wave velocity estimated without taking intrinsic attenuation into account (Figures 14b and 15a) is higher than that estimated when intrinsic attenuation is taken into account (Figures 14d and 15b). That is because when considering intrinsic attenuation in the modeling, the group velocity is slower; thus, the velocity has to increase to match the recorded phases in the data because these are invariant in the two examples, with and without intrinsic attenuation. This is especially evident in the formation of carbonates (deeper region in the model), where the velocity is clearly higher when considering intrinsic attenuation (comparing Figure 14b and 14d with Figure 15a and 15b).

The resulting model of intrinsic attenuation along the selected crossline is depicted in Figure 16. The estimated model of $Q$ is a macromodel because it is represented over a sparse support. The inverted model of $Q$ shows three main regions. In the central part of the model is estimated an anomaly that matches the position of the gas cloud with a low value of $Q$, thus correlating with the physical properties of this region. The presence of gas is responsible for strong intrinsic attenuation and absorption of energy, thus demonstrating a clear matching between the estimated values of $Q$ from the real data and the physical phenomena that occurs in this region. The other two regions are formed by sediments ($Q\u2248200$) and by the carbonates ($Q\u22481000$) in the deeper part. The estimated values of $Q$ in these two regions are also in agreement with the geologic conditions because the carbonates form a large macroscopic region that is relatively homogeneous, whereas the sediments are less consolidated than the carbonates. Thus, it is expected that the formations with sediments exert stronger intrinsic attenuation than the carbonates. However, the model of $Q$ does not have enough resolution to capture the fractured chalk at the top of the crest where stronger intrinsic attenuation should in principle occur due to the presence of these fractures.

Figure 17 compares the objective function when carrying out FWI only, and semiglobal inversion prior to FWI. The values of the objective function of the semiglobal inversion are compared at each third iteration of FWI because the value of the objective function is obtained after completing three nested local iterations. It is important to note that, in this example, the semiglobal inversion uses less data per iteration than conventional FWI. Then, the values of the objective function over the first 36 iterations are not directly comparable. One can observe that carrying out FWI after the semiglobal inversion method, and accounting for $Q$ yields an improved data fitting.

Figure 18 compares the data misfit using the semiglobal inversion and conventional FWI after completing the inversion, for each 75th trace in a receiver-gather (the position of the red star in Figure 11). The black bars represent the data fitting when inverting for velocity only and disregarding $Q$. The gray bars represent the resulting data fitting when inverting for velocity and $Q$ using the semiglobal inversion scheme. The range of misfit in the histogram is normalized by the maximum overall value of the misfit. One can see a consistent improvement of the data fitting for each trace in the receiver gather, as a result of our semiglobal inversion method. This resulting improvement in the data misfit results principally from computing waveforms with improved amplitude as a result of considering attenuation.

The difference between traces computed with and without the estimated Q model is mainly noticeable when comparing traces individually. Figure 19 compares traces for the same receiver gather. The traces are selected at the positions denoted with red circles and labeled $a$ to $d$ in Figure 11. The relative shot position is that of the red star. The amplitude of each trace has been scaled to facilitate the comparison. The traces on the left are real data, the traces at the center are computed considering the inverted $Q$ (with the corresponding velocity model), and the traces on the right are computed disregarding $Q$, using the velocity model resulting from conventional FWI. One can observe how the waveforms computed with the estimated viscoacoustic model became closer to those of the real data. Hence, the semiglobal inversion algorithm determined a model of the subsurface with a response closer to that recorded in the real data.

We further justify our result with a sonic log. Figure 20 compares the vertical profiles of the velocity extracted at the location of the well (denoted by the black circle in Figure 11) with a sonic log. One can see an overall good agreement between the two estimated models. The difference between the inverted models is not significant down to approximately 2 km of depth corresponding to the bottom of the gas cloud. With increasing depth, the effect of correcting the inversion with intrinsic attenuation becomes more noticeable as the errors in the modeling accumulated when not accounting for intrinsic attenuation, and the difference is more evident, demonstrating an overall better agreement between the estimated velocity model using the combined semiglobal scheme and the sonic log, when compared with the inverted velocity model not accounting for intrinsic attenuation. The estimate of the velocity in the layer with carbonates is also significantly more accurate when considering intrinsic attenuation. As mentioned above, the velocity model estimated with the semiglobal inversion scheme shows a second smaller region with gas separated from the main overlying gas deposit. This is also validated with the sonic log, showing that the vertical profile of the velocity has a better match with respect to the sonic log just less than 2500 m of depth, where this separation occurs.

## COMPUTATIONAL ASPECTS

In this section, we compare the computational cost of the semiglobal inversion, $O(G)$, against that of conventional FWI, $O(L)$. This comparison, as outlined herein, is valid whenever the nested local iterations of the semiglobal algorithm have the same computational complexity as that of the FWI implementation, regardless of the order of the local minimization algorithm (first- or second-order).

The overwhelming computational load of local minimization algorithms, in large-scale applications, comes from the solution of the forward problem. In this case, it comes from the computation of wavefields. These wavefields are computed several times per local iteration because the adjoint-state method requires the solution of the state-variable (wavefields) and of the adjoint-variable over a grid. The dimension of these grids can range between several hundred thousand, in 2D applications, and several million or thousands of million in typical large-scale 3D applications. In addition, a large amount of memory needs to be allocated to store those wavefields, as well as the physical parameters, and respective updates. Any other operations and memory storage are insignificant, when compared with propagating the wavefields over a grid.

On the other hand, the key operations carried out at each QPSO iteration are generating a set of random numbers, carrying out the update of the position of each particle, keeping track of the best position for each particle, and keeping track of the evolution of the cost function. The computational load of those operations is detrimental when compared with that of carrying out adjoint-state operations. Hence, the computational cost of a local nested iteration is the key factor driving that of the semiglobal inversion algorithm. Table 1 compares the computational cost of the conventional FWI with that of the semiglobal inversion method. One can immediately observe that the number of particles $Np$ is the differentiating factor in driving the computational cost of the semiglobal inversion, when compared with conventional FWI. The total number of local iterations $NL\u2032$ is determined by the number of local nested iterations $Nn$, the number of outer global iterations $Ng$, and the number of frequency bands $Nf\u2032$. If the number of global and nested global iterations remain constant over each frequency band, then the total number of local iterations carried out by the semiglobal inversion algorithm is $NL\u2032=NgNnNf\u2032$.

Then, we can get the relative computational costs of the examples outlined. The semiglobal synthetic data example runs a total of $NL\u2032=63$ local iterations distributed over 21 global iterations and one frequency band, and each iteration uses one sixth of the data; that is, $NS\u2032=NS/6$. The relative cost of the semiglobal inversion is then $O(G)=NpNLNS63/504$, or $O(G)=2.5O(L)$, showing that the computational cost of the semiglobal inversion method is competitive to that of typical applications of FWI.

In the 3D real-data inversion example, the local inversion completed a total of $NL=126$ local iterations using $Ns$ shots per iteration. The semiglobal inversion carried out a total of $NL\u2032=36$ iterations and using $NS\u2032=NS/2$, per iteration, and $Np=12$. Then, the relative cost of the semiglobal inversion method is $O(G)=NLNS\xd7(12\xd736)/(126\xd72)$ or $O(G)\u22481.7O(L)$. One can see that the computational cost of the semiglobal inversion is very comparable with that of conventional FWI over a typical band of frequencies. It is important to note that FWI was carried out in a relatively narrow band of frequency. If the FWI were upscaled up to 10 Hz, this relation would become even more favorable for the semiglobal inversion method, approaching a relation close to $O(G)\u223cO(L)$.

## CONCLUSION

We introduced a semiglobal inversion method for the joint estimation of P-wave velocity and intrinsic attenuation. The method relies upon the use of local iterations updating velocity only, nested within a loop of global iterations for the estimation of quality factor $Q$ on a reduced basis. We have demonstrated the feasibility of using a reduced basis when estimating $Q$ with a numerical example, showing that a good estimate of the macromodel of intrinsic attenuation suffices for the estimation of a velocity model from seismic data affected by intrinsic attenuation. The accuracy of that estimated velocity model is of the order of that when the true intrinsic attenuation model is known. Synthetic examples demonstrated that unlike pure gradient-descent methods, semiglobal inversion does not suffer from noticeable crosstalk between estimated parameters, and it works in complex rheology in the presence of anisotropy. In addition, we also demonstrated that smoothing regularization combined with local optimization, or updating $Q$ over a sparse basis with a local descent method, is not suitable for estimating the long wavelengths of the model of $Q$. Hence, the outer global update of intrinsic attenuation clearly yields improved joint estimates of intrinsic attenuation and P-wave velocity, outperforming any of the approaches outlined based exclusively on local-search methods.

A real-data example demonstrated the feasibility of the method for the inversion of data recorded over large 3D surveys. This example demonstrated that the semiglobal algorithm estimated a model from the real data that matches the known geologic conditions of the area, and it improved the match between the estimated velocity model and a sonic log acquired in a well drilled in the area. In addition, the waveforms of the predicted data match more closely those of the recorded data. This demonstrates that the semiglobal inversion method outlined here improved the estimation of the models of the subsurface — for the intrinsic attenuation and the intrinsic attenuation-corrected P-wave velocity.

## ACKNOWLEDGMENTS

The authors are grateful for the comments and insights of assistant editor D. Draganov, associate editor P. Routh, R. Kamei, M. Vyas, and an anonymous reviewer who helped to improve the outline of this paper significantly. The authors are also thankful to the sponsors of the FULLWAVE Game Changer research consortium, under the IFT research program.

### COMPUTATION OF THE GRADIENT OF THE OBJECTIVE FUNCTION WITH THE ADJOINT-STATE METHOD

**A**is the discretized forward-modeling operator. The angle brackets denote an inner product defined over the space of source functions $S$, and $T$ in the superscript denotes the transpose. When the constraint is satisfied $L(p(m),\lambda ,m)=J(p(m))$, and $\u2207mL(p(m),\lambda ,m)=\u2207mJ(p(m))$. Finally, forcing the condition $\u2207pL(p,\lambda ,m)=0$, yields

### RADIATION PATTERN IN A VISCOACOUSTIC ANISOTROPIC MEDIUM

### PROJECTION ONTO A SPARSE BASIS

In this appendix, we outline the approach for projecting the gradient of the objective function with respect to $Q$ onto a sparse basis. From a general point of view it can be applied to any other examples requiring a projection of a vector over a different basis. This includes any quantity that takes a discrete form (discrete gradient or discrete physical property).