ABSTRACT

The determination of background velocity by full-waveform inversion (FWI) is known to be hampered by the local minima of the data misfit caused by the phase shifts associated with background perturbations. Attraction basins for the underlying optimization problems can be computed around any nominal velocity model, and they guarantee that the misfit functional has only one (global) minimum. The attraction basins are further associated with tolerable error levels representing the maximal allowed distance between the (observed) data and the simulations (i.e., the acceptable noise level). The estimates are defined a priori, and they only require the computation of (possibly many) the first- and second-order directional derivatives of the (model to synthetic) forward map. The geometry of the search direction and the frequency influence the size of the attraction basins, and the complex frequency can be used to enlarge the basins. The size of the attraction basins for the perturbation of background velocities in classic FWI (global model parameterization) and the data-space reflectivity reformulation (migration-based traveltime [MBTT]) are compared: The MBTT reformulation substantially increases the size of the attraction basins (by a factor of 4–15). Practically, this reformulation compensates for the lack of low-frequency data. Our analysis provides guidelines for a successful implementation of the MBTT reformulation.

INTRODUCTION

The full-waveform inversion (FWI) approach to seismic inversion consists of minimizing the misfit between the recorded and synthetic data with respect to the earth parameters. It uses the full waveform of the recorded waves as data, which makes it free of horizon picking, and hence a good candidate for automatic inversion. It was first introduced by Bamberger et al. (1977, 1979) for the 1D wave equation, and it was followed, in two dimensions, by the work of Lailly (1983) and Tarantola (1984). In particular, Lailly (1983) links the gradient of the misfit functional, computed by the adjoint-state method, with the migration imaging operator used in geophysics. Initially carried out in the time domain, FWI can also be used in the frequency domain, as promoted by Pratt and Worthington (1990), Pratt and Goulty (1991), and Pratt et al. (1996). The use of complex frequencies was later introduced by Shin and Cha (2008, 2009), giving rise to the so-called Laplace and Laplace-Fourier domain FWI. Early numerical experimentations by Tarantola (1987) and Gauthier et al. (1986) already show the strength and the weakness of FWI: It can produce very clear and detailed images of the subsubsurface (strength), under the imperative condition that a good background velocity model is available, i.e., a priori information (weakness). In fact, the misfit functional with respect to background parameters shows local minima due to phase shift/cycle skipping, which hamper the reconstruction by local algorithms, unless the minimization starts inside the attraction basin.

In recent years, with the increase in computational power, the solution of the FWI approach to seismic imaging by local optimization algorithms has been extensively developed. The iterative minimization is usually conducted with Newton-type algorithms (Pratt et al., 1998). To avoid the formation of the dense Hessian (and its inversion), multiscale Newton methods are studied by Akcelik et al. (2002). For similar reasons, the pseudo-Hessian is studied by Shin et al. (2001) and the iterative conjugate gradient approach is implemented by Métivier et al. (2013). Estimating the step length is another challenge for the iterative minimization problem, and it is often conducted using the line search method (Nocedal and Wright, 2006). Alternatives for a more precise step exist, such as the trust region method (Eisenstat and Walker, 1994) or the maximum projected curvature method of Chavent (2010), but with increased computational complexity. The performance of FWI is further compared in the context of inverse scattering by Barucq et al. (2018), depending on the choice of search direction and the line search method.

However, none of these local optimization methods can avoid local minima. The determination of the full velocity model, including the low spatial frequency background, requires very low-frequency data (as unrealistic as 1 Hz) in marine seismic applications, coupled with a resolution from low- to high-frequency content to avoid “cycle skipping” (Bunks et al., 1995; Sirgue and Pratt, 2004). To overcome the presence of local minima when solving the FWI problem by local optimization algorithms, alternative parameterizations have been proposed. They are based on a decomposition of the earth parameter into background and reflectivity parameters, resulting in an increase of the computational complexity. The differential semblance optimization by Symes and Carazzone (1991) extends the depth reflectivity model to account for the various illuminations in the data, defining a semblance criterion whose minimization produces the desired background model; the migration-based traveltime (MBTT) approach of Chavent and Clément (1992), Clément et al. (2001), and Chavent et al. (2015) parameterizes the depth reflectivity by a data-space reflectivity, which is used to store the reflectivity during the update of the background. Note that the so-called reflection FWI (Xu et al., 2013) retains from MBTT the migration-demigration process, but it differs in that it does not use the data-space reflectivity concept and it uses the gradient of the classical FWI data misfit as the search direction.

In this paper, we investigate the optimizability of FWI in the frequency domain, using the geometric tools set up by Barucq et al. (2019b). We analyze the size of the attraction basin, in which the misfit functional is guaranteed to have no local minimum, and the associated maximal distance allowed between the data and the attainable set (i.e., the allowed noise level).

We first present the FWI minimization problem for a time-harmonic acoustic model, and its MBTT reformulation. The second section analyzes the influence of (complex) frequency and search geometry on the size of attraction basins and tolerable error levels for the classical FWI formulation. We also demonstrate the accuracy with experiments of FWI reconstruction. Then, we compare the optimizability of classic FWI and the MBTT reformulation. We quantify the improvement obtained with the MBTT approach which, thanks to the migration-demigration that eliminates phase shift/cycle-skipping problems, leads to larger attraction basins. We further indicate the guidelines for an efficient implementation of MBTT.

TIME-HARMONIC INVERSE WAVE PROBLEM

Propagation of time-harmonic acoustic waves

We consider the seismic inverse problem for acoustic media. In our numerical applications, we focus on a 2D domain Ω, but note that the methodology extends similarly to 3D problems. The propagation of time-harmonic waves in acoustic media is based on the Euler equations. Assuming homogeneous density, it coincides with the Helmholtz equation, where the pressure field p is the solution to
ω2c2(x)p(x)Δp(x)=g(x),
(1)
with boundary conditions being specified below. We have introduced the (possibly complex) angular frequency ω and the source term g. The wavespeed, which governs the propagation, is c(x).
Regarding the boundary conditions, we separate the upper free surface (say Γ1) from the numerical boundary because we do not want to simulate the entire earth (Γ2). First, we have a free surface condition on Γ1:
p(x)=0.
(2)
Second, we implement perfectly matched layers (Bérenger, 1994) on the other boundary (Γ2) to ensure that entering waves are not reflected back to the domain. For this, we follow the work of Turkel and Yefet (1998) and Wang et al. (2011). Note that in the following, we not only study this acoustic situation but we also offer some analysis for the isotropic elastic situation (Barucq et al., 2019a), as well as for other types of boundary conditions.
The inverse problem aims to recover the wavespeed c(x) in equation 1, using observations of the waves. We consider that measurements of the pressure field are recorded by receivers positioned on a portion Σ of the domain Ω (which is near the surface). The data contain the direct arrivals and reflections. However, to image the deep structures of the earth, we remove the direct arrivals from the recorded data before we proceed with the reconstruction. Therefore, the forward operator (that maps the model to the data), at frequency ω for the source g, is defined by
Fω(g)(c2)=R(pω(g)(x))R(p0,ω(g)(x)),
(3)
where R denotes the restriction to the receivers’ location Σ. The field pω(g) is the solution to equation 1 using the velocity model c and p0,ω(g) is the solution associated with a smooth velocity model, say c0, which generates direct arrivals only. It is important to remember that c0 possibly depends on c. In the following, we use m=c2 for simplicity, and we consequently write the forward operator Fω(g)(m). Furthermore, we use the vector notation to account for all sources and frequencies:
F(m)={Fω(g)(m),for all  (g,ω)}.
(4)

Reconstruction via classic FWI

The FWI method performs an iterative minimization of the difference between the observations and simulations, to quantitatively estimate the velocity model (Tarantola, 1984; Pratt and Worthington, 1990; Virieux and Operto, 2009). Assuming some observed data d, it consists of solving the following nonlinear minimization problem:
minmMJ(m)=12F(m)dD2.
(5)
Note that we consider the least-squares misfit functional, but alternative norms can be used (Fichtner et al., 2008; Bozdağ et al., 2011; Métivier et al., 2016; Yang et al., 2018; Alessandrini et al., 2019; Faucher et al., 2019). We introduce the following notation:
  • The term M is the model space, which is a subset of the parameter space E=Rn. The model is represented via n coefficients (dimension of the unknown). In our application, it has a piecewise constant representation that follows the space discretization.

  • The term D is the data space. The data consist of a vector of q complex numbers and D=Cq, with q=nfrequency×nsource×nreceiver, using the number of frequencies, sources, and receivers, respectively.

For the minimization, the model is updated iteratively from the computation of a search direction and a scalar step length α with
m(k+1)=m(k)αs(k).
(6)
Here, the search direction at iteration ks(k) depends only on the gradient of the misfit functional (i.e., the Hessian is not computed or inverted as in the full Newton method), which is written as
J(m)=DF(m)*(F(m)d),
(7)
where DF stands for the Fréchet derivative of F and * is the adjoint. More precisely, we rely on a nonlinear conjugate gradient method for the search direction (Nocedal and Wright, 2006). The specificity of FWI is to compute the gradient using the adjoint-state method, which avoids the complete Jacobian matrix (Chavent, 1974, 2010; Plessix, 2006). Note that in the frequency domain, due to complex-valued fields, the adjoint-state method requires careful steps that are prescribed by Barucq et al. (2019b) in their Appendix A. Then, the step length in equation 6 is usually computed with a backtracking line-search algorithm (Nocedal and Wright, 2006).

The main difficulty with FWI is the determination of the background velocity part of m because the misfit functional suffers from many local minima caused by the phase shifts between the observed and simulated signals. These can only be avoided if the data contain extremely low frequencies, as illustrated in our experiments below. As an alternative, the following MBTT reformulation of the problem is designed to alleviate these difficulties.

Background/data-space reflectivity decomposition for MBTT-FWI

In the MBTT approach, the (global) model m is decomposed into a smooth background p and the depth reflectivity r such that
m=p+r.
(8)
The essence of MBTT is to parameterize this depth reflectivity r by a new parameter s that is less dependent than r on the background p. This is naturally done by choosing for s an object in the data space, where the arrival times of the events are known and, hence, do not change with the background. Then, s is related to r by a migration operator associated to the current propagator p (Clément et al., 2001):
r=WDF(p)*s,
(9)
where W is a scaling operator meant to compensate for the lower amplitude of deep migrated events (Plessix et al., 1995). It is taken to be proportional to the square root of the depth in our experiment.

We shall refer to s as the data-space reflectivity, in opposition to the depth reflectivity r. The data-space reflectivity s is one of the unknowns that we search for, together with p, in the MBTT formulation. At the optimum, s can be understood as a demultiplied and deconvolved version of the data themselves, which hence provides a first approximation of s.

The forward map of equation 4 associated to the MBTT formulation is then
F(p,s)=defF(m),with  mgivenbyequations  8  and  9.
(10)
When this decomposition is used, it is natural to choose p as the smooth model m0=c02 that only generates direct arrivals in equation 3, so that F satisfies
F(p,0)=F(p)=F(m0)0.
(11)

As indicated above, the motivation for parameterizing the reflectivity by an element s of the data space is to eliminate the phase shifts induced in the simulations by changes in the background p: The events in the synthetics are obtained from the data-space reflectivity s by migration, followed by simulation with the same kinematics. Hence, they are expected to have the same phase as those of s, as illustrated in our experiments. In addition to controlling the phase, the stack involved in the migration turns, for a fixed data-space reflectivity, the data misfit into a coherency measure for the current background (Chavent, 2017). On the other hand, the MBTT model decomposition induces greater computational cost because of the dependency on DF(p) in equation 9. Evaluations are derived from adjoint-state formula, and we refer to Barucq et al. (2019b) in their Appendix C for more details.

As a consequence, the MBTT-FWI approach results in a minimization problem with respect to the unknown parameters p, which belongs to the set of admissible smooth backgrounds MsM, and s, which belongs to the data space:
minpMs,sDJ(p,s)=12F(p,s)dD2.
(12)

The iterative reconstruction has been shown successful by Chavent et al. (2015), starting with 5 Hz data, and it follows alternative updates of the two model components:

  1. 1)

    Start the reconstruction with the recovery of the reflectivity part s, using a fixed p.

  2. 2)

    Fix the recovered reflectivity s, and iterate to reconstruct the background part of the model p.

  3. 3)

    Continue with alternate updates, in a succession of increasing frequency.

The optimizability study that we provide below quantifies the enlargement of the attraction basins provided by the MBTT decomposition, in particular for the second step.

Estimates of attraction basins and tolerable data error, methodology

We review the properties that allow us to represent the optimizability of least-squares problems such as in equations 5 and 12. We refer to optimizability as the absence of local minimum in the misfit functional, which ensures the convergence of the minimization algorithm toward the appropriate (hence global) minimum, provided that the data (observations) are “not too far” from the attainable set F(M). The precise analysis can be found in Barucq et al. (2019b) (section 2). Here, we want to give a comprehensive overview of the tools that we use to analyze least-squares problems. It is important to note that the estimates to quantify optimizability are defined a priori; i.e., they only depend on the forward problem (not on the data) and can be computed without — and preferably before — performing any inversion.

In seismic applications, the model space M does not usually contain much information because subsurface characterization is mostly unknown. In particular, the data phase shifts resulting from an error in the long spatial wavelength of the initial guess produce local minima for the problem stated in equation 5, hampering the search for the global minimum. Therefore, our idea is to analyze, from an initial model m0, what is, for a given direction u, the largest interval in which the misfit functional is guaranteed to contain no parasitic stationary points — and hence no parasitic local minima — beside the global minimum, provided that the error in the data is smaller than some tolerable level. We refer to such an interval [m0Δm0uu,m0+Δm0uu]as a directional attraction basin of size Δm0u centered around m0 in the normalized direction u, i.e., with u=1.

In Figure 1, we give an intuitive presentation, in the case of a 2D data space, of the results of Chavent (2010) on strictly quasiconvex sets for the determination of attraction basins. Consider the curve or path P of the data space, which is the image, by the forward map F, of the interval [m0Δu,m0+Δu] in the model parameter space:
P:t[0,1]F(m0+(2t1)Δu),
(13)
where t is a dimensionless parameter. Let d be a point of the data space such that the “distance to d” (i.e., the data misfit) has, beside the global minimum over P, one “parasitic” local minimum. It is illustrated in Figure 1 with the global minimum in t=1 and a parasitic minimum for a value tt. The same reasoning works if t and t are interior or endpoints of P. Of course, the distance to d also has the same global and parasitic local minima over the subpath Ptt. Hence, d necessarily belongs to normal half-spaces N(t) and N(t) at the endpoints of Ptt (the dashed areas). The following inequalities can be observed in Figure 1:
dist(d,P)=P(1)d=dist(d,Ptt)dist(P(t),N(t)N(t))=defRG(t,t),
(14)
where RG(t,t) is called the global radius of curvature of Ptt at t seen from t.
Consider now the worst case over all t,t[0,1], and define
RG(P)=definft,t[0,1],ttRG(t,t).
(15)
Then, equation 14 implies that
dist(d,P)<RG(P)t[0,1]dP(t)  has no parasitic local minimum
(16)
or in terms of an attraction basin
RG estimate:
RG(P)>0[m0Δm0uu,m0+Δm0uu]is an attractionbasin with a tolerable error level  RG(P).
(17)

The numerical estimation of RG(P) will be the basis for the a priori determination of attraction basins in the next sections.

We now present the relation between the global radius of curvature RG(P) and the usual one R(t), which is given by the formulas (Barucq et al., 2019b)
1R(t)=A(t)DV(t)D2sin(A(t),V(t)).
(18)
where V(t) and A(t) are the velocity and acceleration vectors along P:
{V(t)=P(t)=2ΔDF(m0+(2t1)Δu)(u),A(t)=P(t)=4Δ2D2F(m0+(2t1)Δu)(u,u),
(19)
where (u) denotes the directional derivative, following the notation of Cartan (1971).
When t is close to t, equation 14 for the definition of RG(t,t) simplifies to the distance of P(t) to the intersections of the two normals to P at t and t, whose limit when tt is the usual radius of curvature R(t):
limttRG(t,t)=R(t)  so one can define  RG(t,t)=defR(t),
(20)
and
RG(P)=inft,t[0,1]RG(t,t)inft[0,1]R(t)=defR(P).
(21)
One can prove that equality holds in equation 21 as soon as the deflection Θ(P) along the path P (to be defined below) is smaller than π/2:
RG(P)=R(P)  as soon as  Θ(P)π/2.
(22)
This gives a simpler formula for the determination of RG(P) because it involves only the infimum of R over t instead of the infimum of RG over t and t. Moreover, when the path P is smooth, as is the case in seismic problems, the smallest radius of curvature along P is strictly positive, so that the condition RG(P)=R(P)>0 is always satisfied and equation 17 gives Θ estimate:
Θ(P)π/2[m0Δm0uu,m0+Δm0uu]  is an attraction basinwith a tolerable error level  R(P),
(23)
where the deflection along the path, Θ(P) is defined by (Figure 1)
Θ(P)=defsupt,t[0,1]Θ(t,t)whereΘ(t,t)=defangle between  V(t)  and  V(t).
(24)
where Θ(P) can be computed exactly by evaluating the infimum with respect to t and t in equation 24, but it also can be estimated using the radius of curvature R(t). For this, we split the interval [0,1] into small intervals of length dt, and let dν be the arc length and dΘ the deflection (counted positive clockwise; for example, remember P is a plane curve in our illustration as it lives in R2) of the corresponding piece of arc of P. Then, let tΘ and tΘ be the points of P where the deflection is maximum:
Θ(P)=tΘtΘdΘtΘtΘ|dΘ|01|dΘ|where  |dΘ|V(t)R(t)dt.
(25)
When dt0, this suggests the following upper bound for the deflection of P:
Θ(P)01V(t)R(t)dt=01A(t)DV(t)Dsin(A(t),V(t)).
(26)
It is valid in a data space of any dimension, and it is sharp: Equality holds as soon as P is a plane curve that turns always in the same direction, which corresponds to the fact that all dΘ in equation 25 have the same sign. But it can also be very pessimistic; for example, when P is a plane curve such that the dΘ in equation 25 changes sign steadily. Both situations will be met in the seismic applications below.

All the above formulas have been illustrated with a plane curve P. They remain true when P is a curve of a Hilbert data space (at the exception of equation 25).

Computation of estimates of attraction basins and tolerable data error

We apply the methodology to the a priori determination of directional attraction basins [m0Δm0uu,m0+Δm0uu], which will allow us to analyze the behavior of seismic inverse problems and to compare formulations: The size Δ of a directional attraction basin in a descent direction tells us how far we can move away in this direction without being stopped by parasitic local minima. Our objective is hence to determine (see the illustration in Figure 2)

  1. 1)

    The sizeΔm0u of the directional attraction basin centered at m0. The larger Δm0u is, the better the least-squares problem is amenable to minimization by local algorithm because we allow a larger area for investigation. In our numerical experiments, we shall scale the estimate with the norm of the nominal model m0E to provide the relative (to the model) quantity.

  2. 2)
    The associated tolerable error level RG,m0u. It is the largest tolerable error on the data d, which ensures the absence of parasitic local minima for the least-squares objective function
    t[0,1]12F(m0+(2t1)Δm0uu)dD2,
    (27)
    over [0,1]. The larger RG,m0u is, the better is the robustness of the minimization procedure to noise in the data. In our numerical experiments, we divide the estimates with the norm of the synthetic data d0=F(m0) to provide relative (to the data) quantity.

We shall use two types of estimates:

  1. 1)

    Θ estimates of Δm0u, based on equation 23, where optimizability is obtained by ensuring the sufficient condition Θ(P)π/2. In this case, RG(P)=R(P), so the tolerable error level is given by the minimum over the [0,1] interval of R(t) given in equation 18.

  2. 2)

    RG estimates of Δm0u, based on equation 17, where optimizability is obtained by satisfying the necessary and sufficient condition RG>0. In this case, the associated tolerable error level RG has to be computed by numerically evaluating the infimum of RG(t,t) in equation 15.

Exact Θ and RG estimates of Δm0u and associated tolerable error levels

The exact determination in equations 17 and 23 of the Θ and RG estimates of the attraction basin centered at m0 in a direction u inside an interval [m0Δm0uu,m0+Δm0uu] of given size Δ requires the numerical computation of the deflection Θ(t,t) and the global radius of curvature RG(t,t) between any two points of path P:
F(m0+tu),ΔtΔandF(m0+tu),ΔtΔ.
(28)
The deflection Θ(t,t) defined in equation 24 is given by
Θ(t,t)=arccosV(t),V(t)DV(t)DV(t)D[0,π[,
(29)
and the global radius of curvature RG(t,t), defined in equation 14, is given by the formulas (Chavent, 2010)
RG(t,t)={N+/Dif  V(t),V(t)D0,N+if  V(t),V(t)D0,
(30)
and
{N+=max(N,0)where  N=sign(tt)P(t)P(t),V(t)D,D=(1V(t)V(t)D,V(t)V(t)DD2)1/2.
(31)
The directional derivatives V(t), V(t), A(t), and A(t) are computed by differentiation of the Helmholtz equation 1 twice in the direction u (Barucq et al., 2019b, Appendix A), at a cost of the order of four forward simulations (see Algorithm 1).

To compute the exact estimates, deflection maps and global radius maps are computed, which display the values of Θ(t,t) and RG(t,t) between the points t,t of [Δ,+Δ]×[Δ,+Δ]. On the diagonal of the maps, where t=t, RG is not defined in equations 30 and 31, and we indicate instead the values of R(t) given in equations 18 and 19, which represent the limits of RG(t,t) when tt. We review the different steps for the computation of the maps in Algorithm 1.

One can then read on these maps (see the numerical experiments below):

  1. 1)

    The exact Θ estimate of the attraction basin size Δm0u, given by the largest square centered at (0,0) where Θ(t,t)π/2 for all t, t.

  2. 2)

    The exact RG estimate of the attraction basin size Δm0u, given by the largest square centered at (0,0) where RG(t,t)>0 for all t, t.

During this process, when the size Δ of the investigated square increases from zero to the exact Θ estimate, the associated exact tolerable error Rm0u=infΔtΔR(t) decreases from its value R0 at m0 to the tolerable error Rm0u of the Θ-attraction basin. When the size Δ of the square increases further to the exact RG estimate, the tolerable error RG=infΔt,tΔRG(t,t) continues to decrease, until it reaches the value zero, which gives the limit of the RG-attraction basins. The exact estimates are relatively demanding because they require evaluation of RG and Θ for many couples (t,t), and we now introduce the cheaper local estimates.

Local Θ estimate for Δm0u and tolerable error level Rm0u

A computationally inexpensive — but possibly very conservative, see the comments after equation 26 — upper bound to the deflection Θ(P) can be derived from equation 26 using only the velocity V(1/2) and acceleration A(1/2)atm0 in the direction u. We use for that a rectangle approximation of the integral in equation 26, which gives the approximate upper bound Θm0u to the deflection Θ(P):
Θ(P)01A(t)DV(t)DdtA(1/2)DV(1/2)D=2ΔD2F(m0)(u,u)DDF(m0)(u)D=defΘm0u.
(32)
Then, equation 23 leads to the local Θ estimate of the size Δ of an attraction basin at m0 in the direction u:
Δm0u=π4DF(m0)(u)DD2F(m0)(u,u)D.
(33)
This estimate is an approximate (because of the rectangle approximation of the integral) lower bound (because it is based on the worst-case upper bound given in equation 26) to the size of the largest attraction basins at m0 in the direction u, but it is computationally efficient.
The associated tolerable error level Rm0u is the minimum of the radius of curvature along P, which is again approximated by its value at m0, that is R(t=1/2) given in equation 18:
Rm0u=V(1/2)D2A(1/2)D|sin(A(1/2),V(1/2))|.
(34)

FWI: NUMERICAL DETERMINATION OF CONVERGENCE BASINS AND TOLERABLE ERROR LEVELS

Our objective in this section is to give a comprehensive understanding of the behavior of the seismic FWI problem equation 5 with respect to parameters such as the frequency, the geometry of the target, and the parameterization. We select a nominal model m0 and compute for different (normalized) directions u:

  1. 1)

    The size Δm0u of an attraction basin, which represents the interval, in the model space, where the misfit functional is guaranteed to have only one minimum.

  2. 2)

    The associated tolerable error level Rm0u, which is the maximal distance allowed between the data and the attainable set (i.e., the maximal tolerable error level).

When m0 is smooth, the direction u can be seen as the search direction or the geometry of the unknown parameter to be reconstructed in the framework of inversion.

We consider the 2D geophysical setup for the acoustic Helmholtz equation 1, and we take the squared slowness mc2 as unknown parameter, expressed in s2m2. In our experiments, we consider a 9.2×3  km domain, and we take nsrc=19 sources and nrcv=183 receivers associated with each source. Both are located near the surface: The sources are at 10 m depth, and the receivers, which remain in the same position for all sources, are at 40 m in depth.

Influence of frequency ω and search direction u

We first investigate individual frequency, where the forward map Fω at frequency ω for parameter m is a vector of C(nrcv×nsrc). The nominal model m0 is the smooth background velocity pictured in Figure 3, in which the velocity varies linearly with depth, according to the fact that no information is known on the subsurface structures to reconstruct.

We investigate the behavior of attraction basins for different directions u, which are chosen to be based on straight reflectors, or extracted from the Marmousi model, a synthetic wavespeed model designed by the Institut Français du Pétrole in the late 1980s; we refer to Versteeg (1994) for additional details. They are shown in Figure 4: u1 has a single, slim reflector (Figure 4), u2 has two slim reflectors (Figure 4), u3 has a single large reflector (Figure 4), and um is obtained by high-pass filtering from the Marmousi model of Figure 4. The amplitude for the directions is selected such that uk is normalized: uk=1, for all k.

Given a perturbation size Δ, one can think of m1=m0+Δu as being the target velocity, which one tries to retrieve starting from the smooth model m0, by minimization of the FWI misfit function in direction u by a local algorithm (it is analogous to the update formula given in equation 6 in Newton-type algorithm). The question is: How large can Δ be while still guaranteeing that the algorithm will not stop before the global minimum, provided that the error in the data is smaller than some R or RG?

Local Θ estimates of Δm0u and Rm0u

The determination of Δm0u and Rm0u with the local Θ estimate, equations 33 and 34, and Rm0u with equation 34, only requires the computation of the first- and second-order derivatives in the direction u, obtained by differentiation of the Helmholtz equation (Barucq et al., 2019b, Appendix A). Figure 5 shows the evolution of Δm0u (Figure 5) and Rm0u (Figure 5) for frequencies between 0.5 and 15.0 Hz, for the four directions u of Figure 4.

We observe that, in all cases, the size Δm0u of the directional attraction basin decreases with increasing frequency, meaning that the largest basin of attraction is obtained for the lowest frequencies. This is the expected behavior that indicates that lower frequencies give a better chance for the iterative minimization to converge, especially when no prior information is known for the initial model.

Concerning the effect of the perturbation direction on attraction basins, Figure 5 shows that the size of the basins decreases in the order u1u2umu3. This can be understood as follows: Directions u1, u2, and um are essentially reflective and contain no low-frequency components. Hence, increasing Δ will not change the phase of backscattered data, but because of multiple reflections, some nonlinearity will occur, which limits the size of the attraction basins. The directions u1 and u2 with fewer multiple reflections have the largest basins of attraction. Then, comes um, which has many reflectors and hence many multiples. On the contrary, direction u3 has a strong low-frequency component besides its reflectivity. Hence, increasing Δ will generate a phase shift in the backscattered data, which is likely to create local minima more quickly, and one expects the size Δm0u of the attraction basin in the direction u3 to be much smaller than in the three other directions.

Concerning the largest tolerable error, Figure 5 shows that Rm0u increases with the frequency, meaning that the low frequencies are more likely to be affected in the presence of noise. We notice the exception of u3 for which the allowed error remains relatively similar (and worse than the others) with frequency.

These results clearly show that it is harder to converge toward the salt dome than it is to recover Marmousi-like structures. Because of its low spatial frequency component, the former reduces the size of the attraction basin and is less tolerant to noise in the data. This situation is further illustrated in the context of model reconstruction in the subsection below. Note that the order of magnitude in Figure 5 and 5 is approximately 101 of the norms of the model and data, respectively.

Exact Θ and RG estimates of Δm0u and Rm0u

As an alternative to confirm these preliminary observations, we compute Δm0u using the exact Θ and RG estimate. It requires the computation of deflection and global radius maps on the investigated interval, from which the estimate is extracted. In Figures 6 and 7, we show the resulting maps in the direction of the Marmousi reflectors um of Figure 4 and the single large reflector u3 of Figure 4, using frequencies of 4 and 7 Hz. In Table 1, we show the numerical values of the interval size centered around m0, depending on the estimation.

The maps confirm the results of the previous experiment: The exact Θ estimates of Δm0u (Figure 6 and 6) are of the same order of magnitude, but they are slightly larger than the local Θ estimates of Figure 5 (see Table 1). Similarly, the exact RG estimates of Δm0u (Figure 6 and 6) are larger (Table 1), but at the price of a smaller tolerable error on the data. We also note a much more consistent pattern in the deflection for um compared to u3. Comparing the formulas, we observe that the inexpensive local Θ estimates already provide accurate values for classic FWI problems, close to the more refined (and computationally more intensive) exact Θ and RG estimates. Therefore, we shall only use the local Θ estimate in the remainder of this section.

Influence of the initial model

We now investigate the influence of the initial model m0 on the size of attraction basins. We consider two initial models: the acoustic Marmousi medium mm and a model mc encompassing objects of high contrast. These are pictured, together with their respective local Θ estimates Δmju given in equation 33, in Figure 8, for the two directions u1 (one reflector) and u3 (one large reflector). We see that Δmju is barely affected by the choice of initial model for the computation. We still observe the decrease in the size of the interval with increasing frequency, and the deterioration in the direction of the large reflector u3. Clearly, when comparing amplitudes of our estimates, the direction u has much more influence than the initial model m0.

From this first set of experiments, we clearly see that the frequency should evolve from low to high values to facilitate the convergence. When no prior information is known for the initial model, one should take advantage of the larger interval size Δm0u given by low frequencies. Here, we motivate the well-known frequency progression of, e.g., Bunks et al. (1995) and Sirgue and Pratt (2004), from quantitative estimates of the size of the basin of attraction, which allows the quantification of the benefits. However, the low frequencies require accurate data, i.e., close to the F(M), as illustrated with the estimates of Rm0u. In addition, we mention the fact that seismic data are particularly affected by noise at low frequencies, which are sometimes unusable. This highlights where resides the difficulty: The low frequencies that are required for convergence may be unusable due to the noise, or the inaccuracy may violate the curvature condition.

Frequency bandwidth data

We have seen that sequential progression in frequency must be conducted from the low to the high regime. We now investigate the case of frequency bandwidth: when the forward problem encompasses not only one, but several frequencies. This approach is particularly natural for the time domain inverse problem in which several frequencies are simultaneously contained in the data. The forward problem is now a vector of C(nrcv×nsrc×nω), where nω is the number of frequencies taken in the subgroup.

We remain with the smooth background of Figure 3 for m0. We select groups of 10 consecutive frequencies. The initial group encompasses the frequencies from 0.1 to 1.0 Hz using 0.1 Hz increment. Similarly, we design 15 groups so that the last one is from 14.1 to 15.0 Hz, using an identical increment. The local Θ estimate of the attraction basin size is shown in Figure 9, together with the largest tolerable error Rm0u. We investigate two directions: the Marmousi one of Figure 4 and the large reflector of Figure 4. We picture curves and histograms that correspond to sequential or group of frequencies, respectively. Numerical values for the attraction basis are prescribed in Table 2.

The comparisons provided in Figure 9 and 9 and Table 2 highlight that the size of the attraction basin is dictated by the largest frequency in the group when using the group of frequencies. Therefore, it is better to use the lowest frequency sequentially to have the largest size possible. For instance, using the single frequency 0.1 Hz provides a size of interval one order of magnitude larger than when including frequencies up to 1.0 Hz. Then, at a fixed frequency, taking into account lower frequencies does not affect the size of the interval.

This preference on sequential over multiple frequency is actually advocated by Brenders et al. (2012), motivated, as the frequency progression (Bunks et al., 1995), by the intuition that it reduces the cycle-skipping effect. Here, we justify this choice by a quantitative estimation of the size of the attraction basin. Moreover, we show that once the lowest frequencies are taken individually, incorporating them in higher frequency content does not modify the size of the attraction basin.

Regarding the largest tolerable error on data, Figure 9 shows that adding frequency increases the robustness in the direction um, except for low frequency. Then, for the direction of the large contrasting object u3, we do not observe any improvement, and the allowed distance remains relatively stable, as it was observed in Figure 5.

Selection of complex frequencies

The use of complex frequencies (also referred to as the Laplace-Fourier domain) has shown advantageous behavior when used during reconstruction algorithm. In this case, the angular frequency ω in equation 1 is written as
ω2=(σ+2iπf)2,iω=σ+2iπf,
(35)
where f is the “Fourier” component of the frequency (in Hz) and σ is a damping factor (in s1), representative of the Laplace component. Here, we are interested in deciding how to select the progression of those coefficients to obtain the best convergence properties. Applications have been carried out with a progression that appears mostly intuitive in the literature. In the case of zero-Fourier frequency (f=0  Hz), it is proposed from low to high damping coefficients (Shin and Cha, 2008). However, for the complex frequency progression (f0  Hz and σ0s1), the evolution of the damping parameter at a fixed f is proposed to be from high to low damping (Shin et al., 2010; Petrov and Newman, 2014).

We compute the estimation of Δm0u (using the local Θ estimate) and Rm0u with complex frequencies to see how it affects the basin of attraction and robustness to noise. We consider m0 the smooth background velocity. The estimates are shown in Figures 10 and 11, in which we compare the behaviors on a logarithmic scale for the 2D complex frequency plane. Here, f and σ vary from 0 to 15, with the frequency given in Hz and the Laplace component in s1, excluding the case in which f and σ are zero.

It is clear that the incorporation of a damping coefficient in the frequency is beneficial for the convergence because it increases the size of the attraction basin of several orders of magnitude. We observe

  1. 1)

    For a fixed f, the interval size increases with increasing damping, with the exception of the case f=0  Hz (see Figure 10 and 10).

  2. 2)

    At fixed damping σ, the interval size decreases with increasing f.

  3. 3)

    Following a selection from the large to small attraction basin gives

    • Initial iterations should use f=0  Hz (the largest interval), and a damping coefficient progression from low to high (see Figure 10 and 10).

    • Fix f0  Hz as small as possible, the damping coefficient should evolve from high to low (see Figure 10 and 10), and then f increases progressively.

    • Finally, one can take σ=0s1 and f evolves from low to high values.

Regarding the largest tolerable data error Rm0u, we see that at fixed σ, increasing f allows an increase in the distance between the data and the attainable set. Then, the pattern strongly depends on the type of the search direction geometry because it has already been observed in the previous experiment (and contrary to Δm0u). The distance is allowed to increase with σ for the direction um, but it decreases in the direction of the strong reflection, u3. Also in the direction of the Marmousi reflectors, Figure 11, we observe a band in which the allowed distance decreases, around σ=1s1. On the other hand, for both directions, the case in which σ=0s1 appears the most robust (i.e., where the allowed distance between the data and the attainable set is maximal). In terms of the magnitude, we clearly see that the high contrast object has globally much less allowance, except when the damping is close to zero.

It is important to notice that the use of complex frequencies, especially when f=0  Hz, improves the convergence by increasing notably the size of the attraction basin. Therefore, it provides a fundamental benefit for the reconstruction, in particular when no information is initially known. Note that one can also use very low frequencies (actually, the magnitude of the estimates for complex frequencies eventually matches the very low-frequency ones; we can see a continuation in the magnitudes). On the other hand, such frequencies are more sensitive to noise and require accuracy in the data. We refer to Shin and Cha (2008), Shin et al. (2010), Petrov and Newman (2014), and Faucher (2017) for illustrations of the reconstruction.

Experiments of FWI reconstruction

We illustrate the influence of the subsurface geometry with an acoustic FWI experiment, in which we demonstrate the behavior observed in the previous analysis (Figures 5 and 10). In particular, we have shown that a high contrast object complicates the process of reconstruction compared to a Marmousi-like subsurface, by reducing the size of the attraction basin. In addition, we have seen that the very low or complex frequencies, by increasing the size of the basin of attraction, provide benefits on the reconstruction.

Therefore, we perform FWI for 2D geophysical experiments in the acoustic settings. We consider a domain of size 9.2×3  km, which is the size of the models presented in Figure 8. We use the algorithm to recover the wavespeed c in the Helmholtz equation 1, from measurements of pressure fields (see equation 3). For the seismic acquisition, we consider 91 sources and 183 receivers per source, which are both located at 20 m depth. The measurements consist of synthetic frequency-domain data with added noise.

We apply the subsurface reconstruction FWI procedure, for two targeted subsurface media of different geometries. We consider the reconstruction of the Marmousi model of Figure 8, and the reconstruction of the medium encompassing salt domes of Figure 8. For both experiments, we take the same initial (starting) model, which does not assume any information on the subsurface (see Figure 12).

Comparison between Marmousi and salt geometries

The iterative minimization is conducted using frequencies from 2 to 10 Hz, with 1 Hz increment. There are 20 iterations per frequency, which are taken sequentially (from low to high). In Figure 12, we compare the reconstructions of the Marmousi model and the salt dome, using a similar setup: the same model size, the same frequency progression, and the same initial model with no a priori information (actually, the starting model shown in Figure 12 is relatively close to the background of the salt medium shown in Figure 8 in terms of amplitude).

We observe that only the experiment with the Marmousi medium is able to accurately capture the subsurface structures, in which only the deepest parts are missing in Figure 12. On the other hand, none of the contrasting salt domes are retrieved, see Figure 12, with only artifacts in the reconstruction. It demonstrates the results of the previous analysis that, by reducing the size of the basin of attraction, high-contrast objects are more complicated to reconstruct than Marmousi-like media when using a similar frequency range in the data. Note that to obtain a better reconstruction, one could use an initial model less challenging than the one of Figure 12, i.e., by incorporating a priori information.

Low or complex frequencies compensation

It has been shown in Figure 5 that low frequencies enlarge the size of the basin. As an alternative, complex frequencies produce the same behavior (see Figure 10). To demonstrate the validity of these observations, we repeat the experiment of salt dome reconstruction, incorporating low or complex frequencies:

  1. 1)

    In Figure 13, we show the reconstruction that starts with lower frequencies, using the sequence {0.5,0.75,1,2,3,4,5,6,7,8,9,10}Hz, hence incorporating three (unrealistically) low frequencies compared to the reconstruction of Figure 12.

  2. 2)

    In Figure 13, we first use the following set of complex frequencies (labeled as (σ,f) in equation 35): {(7,2),(5,2),(2,2),(1,2),(7,3),(5,3),(2,3),(1,3)}, with f given in Hz and σ in s1, before using the same sequence as for Figure 12: from 2 to 10 Hz (where σ=0s1). Note that our selection of complex frequencies follows our deduction (low to high for f, high to low for σ).

The FWI algorithm benefits from the incorporation of low or complex frequencies, and it is now able to accurately recover the contrasting objects, as expected. We note that the complex frequency experiment produces less accurate results; in particular, it misses the deepest boundary of the salt dome. However, it does not use a frequency of less than than 2 Hz. Then, both experiments show drastic improvement compared with the results of Figure 12.

We have used synthetic experiments to validate the basin of attraction estimates and its dependency on the geometry of the target and on the frequency. The use of very low frequencies (as in the reconstruction of Figure 13) is efficient but unrealistic in seismic application. The complex frequency alternative is probably a better candidate, but also relies on efficient data preprocessing (e.g., noise removal).

To overcome the issue of low-frequency requirement, the MBTT model decomposition has been introduced. The decomposition is expected to ameliorate the robustness with respect to missing low frequencies, in particular for the recovery of the background velocity. In the next section, we investigate its performance, by the means of attraction basin size, and we quantify the benefits of this approach.

OPTIMIZABILITY: MBTT VERSUS CLASSIC FWI

In this section, we apply the optimizability analysis tools to compare the MBTT formulation of the FWI presented in equation 12 (where the unknown model is parameterized by a smooth background p and a data-space reflectivity s), with the original FWI problem stated in equation 5 (where the unknown model is the squared slowness m). The MBTT approach was originally designed to overcome the phase shift problem that hampers classic FWI, so the optimizability study of this section is expected to quantify the gain — if any — of the MBTT formulation over classic FWI in this respect.

Choice of the model

To compare the two approaches on the same model, we first select the MBTT decomposition p0 and s0 and then we define the nominal model m0=m(p0,s0) using equations 8 and 9. We select here p0 and s0 such that
p0=m0,  s0(ω)=Fω(mm)Fω(m0),  ω,
(36)
where m0 is the smooth background of Figure 3 and mm is the Marmousi model of Figure 8. Note that if we assume that m0=p0 generates only the first arrivals as shown in equation 11, we have s0(ω)=Fω(mm), so that in our computations, the nominal s0(ω) coincides with the Marmousi data — but the nominal depth reflectivity r0(ω) differs from the true Marmousi reflectivity.
In the following, we restrict ourselves in studying single-frequency models, in which the reflectivity is computed from one frequency only:
m0(ω)=m(p0,s0(ω))=p0+W(ω)DFω*(p0)s0(ω)=p0+r0(ω).
(37)
We illustrate the models r0(ω) in Figure 14 at 4 and 7 Hz.
For the weight W(ω), it is defined as the square root of depth, as indicated after equation 9, and we incorporate a coefficient to control the model reflectivity level β at frequency ω, which we define by
β(ω)=defr0(ω)/p0.
(38)
When β is small, the forward-modeling part of the MBTT approach becomes close to the Born approximation. At first, we fix 1% of the reflectivity level for all frequencies, and next, we investigate the influence of β. Finally, we provide guidelines for the implementation of MBTT.

Choice of perturbation directions for MBTT decomposition

For the perturbation of the background part of the velocity model, we consider the direction of a 1D ramp u, given in Figure 15. In Figure 16, we show the forward problem (i.e., the solution of the Helmholtz equation at the receivers’ location, deprived from the direct arrivals) corresponding to unperturbed and perturbed models, for a single, centrally located source at 4 Hz. The figure shows the effect of applying the perturbation onto the global model (as used in classic FWI) or onto the background unknown p of the MBTT formulation: When the perturbation is applied to the global model m, the phases of the signal are shifted, and the amplitudes are incorrect, but when the perturbation is applied to the background MBTT variable p, the phases of the signal are preserved, and only the amplitudes are modified.

For the perturbation direction us associated with the data-space reflectivity s, we select a random vector. Note that we expect large basins of attraction with respect to s in the MBTT formulation because the FWI misfit functional is known to be nearly quadratic with respect to the reflectivity, i.e., to the high spatial frequency part, and consequently for the MBTT misfit functional with respect to the data-space reflectivity s.

Comparison of local Θ estimates

The local Θ estimates associated with the MBTT forward problem F(p,s) can be easily derived from equation 33 given for F(m) in FWI. The sizes of the attraction basins for p and s at p0, s0 are, respectively, denoted by Δp0u and Δs0us:
{Δp0u=π4DpF(p0,s0)(u)Dp2F(p0,s0)(u,u),Δs0us=π4DsF(p0,s0)(us)Ds2F(p0,s0)(us,us),
(39)
where we omit the frequency index for clarity in the notation (note that we have chosen to work with a single frequency in the reflectivity in equation 37). The evolution of the local Θ estimates with frequency is shown in Figure 17.

From Figure 17, we observe that the size of the attraction basins decreases with frequency, and it is only slightly larger when the perturbation is applied onto p (MBTT) instead of m (classic FWI). The gain appears minimal, and it is not up to the expectations raised as the MBTT parameterization allows us to overcome the phase-shift problem, as shown in Figure 16. Similarly, the attraction basin for s is relatively small compared to what is expected. However, we shall recall that these are only local Θ estimates that can represent a coarse approximation. Therefore, we compute exact Θ estimates in the following, to investigate if those preliminary observations still hold.

Comparison of exact Θ and RG estimates

To compute the exact Θ and RG estimates, the perturbation of the model is either applied onto the global model m (FWI) or onto the parameters p, s of the MBTT decomposition. Therefore, the deflection and global radius of the curvature maps are computed between the following points, for all Δt,tΔ:

  1. 1)

    FWI (attraction basin for m): F(m0+tu) and F(m0+tu);

  2. 2)

    MBTT (attraction basin for p): F(p0+tu,s0) and F(p0+tu,s0);

  3. 3)

    MBTT (attraction basin for s): F(p0,s0+tus) and F(p0,s0+tus).

In Figures 18 and 19, we show the deflection and global radius of curvature maps at 7 Hz, for m, p, and s. The corresponding values for the estimates are provided in Table 3, in which we also give the results for the frequency 4 Hz.

We now observe a clear difference between the MBTT parameterization and the global model one, and it shows the gain (in terms of attraction basin size) of the former approach (see Table 3). In the context of global model FWI (Figures 17 and 18) the local and exact estimates are of the same size, as observed in the above section. In sight of the upper bound equation 32, on which the local Θ estimate is based, one can think that the FWI formulation corresponds to the “worst” situation, in which the image of a segment in the background space is a curve close to an arc of circle in the data space. However, in the MBTT formulation (Figures 17 and 18), the exact Θ estimate is approximately 10 times larger than its local Θ estimate. Therefore, the exact estimates are needed for a proper comparison of the FWI and MBTT approaches. Then, we observe that the values of deflection are much lower when the background perturbation u is applied to p (MBTT) rather than to m (FWI): Few portions reach π/2 with MBTT (Figure 18), whereas it is rapidly attained with classic FWI (Figure 18). Consequently, the MBTT formulation produces larger Θ-attraction basins than the standard FWI formulation, roughly by a factor of 10 (Table 3). Regarding the global radius of curvature RG, it is strictly positive for the complete investigated interval in the case of MBTT, whereas, in classic FWI, it rapidly decreases to zero outside the diagonal. Therefore, the RG-attraction basins are much larger for MBTT, by a factor of 4–8 (Table 3). In addition, the amplitude of RG, whose minimum over the attraction basin gives a tolerable error level, is, near the diagonal, larger for FWI than for MBTT.

Regarding the data-space reflectivity s, Figure 19, the exact Θ-attraction basin is particularly large (23–54 times the norm of s0 depending on frequency), and it is 105 times larger than its local estimate. We observe the expected behavior because the forward map F is nearly linear with respect to s, hence resulting in large attraction basins. It also confirms the necessity of exact estimates for accurate characterization of attraction basins in the context of MBTT parameterization.

To summarize, MBTT significantly extends the size of attraction basins with respect to background perturbations, explaining its success with relatively high-frequency content in the data (Chavent et al., 2015). However, it costs a reduction in the admissible error level in the data.

Influence of the reflectivity level

The previous comparisons have been performed on a nominal model that has a reflectivity level β of 1% (see equation 38). We repeat the same computations, at 7 Hz, changing only the reflectivity of the model, which is fixed to a low level, 0.2% in Figure 20 (close to the Born approximation), and to a stronger value, 5% (close to Marmousi reflectivity) in Figure 21. Table 4 summarizes the sizes and corresponding tolerable error of the attraction basins.

Increasing the reflectivity increases the relative importance of multiple reflections — whose phases are not controlled by the migration/demigration included in the MBTT approach — with respect to primary reflections, whose phases are controlled. It is then expected that the performance of MBTT, when applied to a full-wave model as is the case here, will deteriorate when the reflectivity level increases. This is confirmed in Table 4: The MBTT attraction basin for a reflectivity level of 5% is three times smaller than at 1%, which itself is two times smaller than at 0.2%. On the other hand, the FWI attraction basin remains essentially unaffected by the reflectivity level, and the MBTT basins remain always larger, by a factor of 4–15 with respect to the Θ estimates and a factor of 8 for the RG estimates.

Influence of background smoothness

Here, we illustrate an important condition that has to be satisfied for the MBTT decomposition to work. When a background perturbation is applied to p0, whereas s0 is maintained fixed, the phase shifts of the reflections are controlled, provided these reflections are generated only by s0. Any backscattered energy generated by the updated background p0+tu — or worse, by p0 itself — will lead to reflections whose phase is not controlled by the migration-demigration process included in MBTT, thus annihilating its benefits. This appears already in Figure 18, in which one sees that the deflection reaches π/2 (respectively, the global radius of curvature equates zero) much closer to the diagonal at the top-right corner than at the bottom-left one. This zone (upper right corner of Figure 18) corresponds to large positive values of t, t, which lead to velocity models (p0+tu and p0+tu) made of strongly increasing background, consequently generating backscattered energy. At the opposite corner of the diagrams (bottom left), which corresponds to mildly increasing velocities, the amount of scattered energy is smaller and the limiting values are further apart from the diagonal. We have considered for simplicity only attraction basins that are centered at the nominal model m0, but on most of the deflection and curvature maps, the attraction basin could, in fact, be extended on the side of negative t (where backgrounds are less steep).

In the case in which the unperturbed background scatters back some energy, as the one of Figure 22, the situation is even worse, as illustrated in the corresponding deflection and curvature maps of Figure 22, computed at 4 Hz. These maps show no enlargement of the attraction basin in favor of MBTT; in fact, it is even smaller than for FWI. That is why it is fundamental to ensure the smoothness of the background in the MBTT decomposition by using an adapted parameterization, e.g., spline functions are used by Chavent et al. (2015) to represent p.

Behavior of the data misfit

We illustrate in this section the behavior of the FWI and MBTT misfit functions in the model space direction u at the nominal model m0=m(p0,s0) for some seismic data d, where p0 is the smooth background of Figure 3. The cost functions are, respectively, J(m0+tu) and J(p0+tu,s0) for FWI and MBTT, according to equations 5 and 12. This will allow us to verify the theoretical analysis: If the minimum of the misfit function is smaller than the tolerable error RGu associated to an attraction basin, the misfit functional should not have any local minimum over the basin. However, when the data exhibit an error larger than the tolerable error, local minima may but do not necessarily appear.

To define a seismic data d to build the misfit, we first denote by d0=F(m0)=F(p0,s0) the synthetic data associated to the nominal model. Note that due to the choices made in the construction of the nominal model, d0 is nothing but a migrated-demigrated version of the Marmousi data deprived of the direct arrivals. We define a scaled Marmousi data dm by scaling the synthetic Marmousi data dM=F(mm) (where mm is given in Figure 8) in such a way that dm=d0. Then, the data that enter in the FWI and MBTT misfit functionals are defined by
d=(1η)d0+ηdm,η[0,1].
(40)

When η increases from zero to one, the distance of d to the attainable set increases from zero to dmd0, which is larger than the tolerable error associated to the attraction basin of interest (FWI or MBTT).

The computations are done for the nominal model with a 1% reflectivity level, and the background perturbation direction u of Figure 15. Figure 23 shows the cost functions for FWI and MBTT at 7 Hz for different values of η. We also show by vertical lines the exact RG-attraction basin associated to zero tolerable error, and we show by horizontal lines the levels 1/2RG2 corresponding to the tolerable error level associated to the exact Θ-attraction basin. In Figure 23, we observe that the FWI and MBTT attraction basins are essentially independent of the error level η in the data. The size of the FWI RG-attraction basins is slightly larger than the one anticipated by Δm0u. When η=0, d=d0 is attainable, with a zero error level, and one sees that the first local minimum is just after the right limit of the exact RG-attraction basin; hence, it is consistent with our analysis. Note that the tolerable error level RG,p0u on d for the Θ-attraction basin is attained at η=0.1. Clearly, when one moves away from the nominal value t=0, the local minima appear much earlier with FWI than with MBTT and the size of intervals in which there is no local minimum is well anticipated by the estimates Δu. In the numerical setting considered here, the MBTT approach appears to be robust with respect to data error levels much larger than the tolerable level because we do not see the local minimum appearing inside the attraction basin.

Strategy for MBTT inversion

The numerical experiments lead to the following recommendations, for implementing the MBTT reformulation of FWI to obtain large attraction basins for the background reconstruction.

First, the background space for p has to be chosen smooth enough and the reflectivity level β=r/p large enough so that the energy backscattered by p is negligible compared to that backscattered by r. On the other hand, the reflectivity level β has to be chosen small enough so that the importance of multiples remains small compared to primary reflections in the synthetics. This can be achieved by using an adapted parameterization for the background p (e.g., spline functions) and by tuning the reflectivity level β to adjust the migration weight W in equation 37. We are reminded that the relevance of the chosen settings can be checked before performing any iterative optimization algorithm, by computing the size of the Θ and RG-attraction basins and the associated tolerable error. In addition, we have noted that the attraction basins tend to be larger in the direction of slowly increasing (with depth) backgrounds, which suggests underestimating the initial slope of the background velocity with respect to depth.

The dependency of the size of attraction basins with the reflectivity level suggests the following organization for MBTT waveform inversion. First, we need to scale down the given seismic data so that they can be resimulated by a reflectivity level β of approximately 0.2%–1.0%, which amounts to bring the forward model close to the Born approximation. Second, we perform alternate minimization with respect to s and p, starting with s first. Third, one can rescale (up) the processed seismic data to get closer to the original given data. Then, we pursue the alternate updates of s and p. Eventually, we continue until the processed and given original data coincide. Note that all of the difficulties related to the choice of the background and the reflectivity level disappear if one uses a linearized model (the Born approximation) as the forward model.

SUMMARY OF RESULTS

We have first studied the classic FWI problem for the acoustic wave equation in the time-frequency domain, and we computed directional attraction basins that give quantitative insight into optimizability of FWI. Unsurprisingly, the size of attraction basins becomes smaller when the frequency increases. In our tests, it is only of ±2% of the norm of the background at 4 Hz and ±1% at 7 Hz. Clearly, the use of complex frequencies including damping enlarges the attraction basin. However, the largest attraction basins obtained for low (or complex) frequency usually go with smaller tolerable error levels. Therefore, low and complex frequencies suffer more from noise.

We have observed that solving the problem for a sequence of increasing single frequency data is better in terms of the attraction basin than using an increasing sequence of frequency-bandwidth data. As expected, media with salt domes produce smaller attraction basins, and hence they are harder to recover.

We have then studied the MBTT approach, which uses a background/data-space reflectivity parameterization of the velocity model. In a direction of background perturbation, the largest attraction basin for MBTT shows an increase in size by a factor of 4–15 compared to FWI, and it is only barely reduced when the frequency increases. Correlatively, the tolerable error for MBTT is divided approximately by 10 compared to FWI. Nonetheless, the comparison of misfit functions has shown that the local minimum does not appear when the tolerable error is exceeded. In the direction of data-space reflectivity perturbations, in which the forward map is linear up to the multiple reflections, the attraction basin is very large, more than ±20 times the norm of the nominal data-space reflectivity. Furthermore, we have identified the critical parameters for an efficient implementation of MBTT: the background smoothness and the reflectivity level.

CONCLUSION

In this paper, we provide an a priori analysis of the optimizability of nonlinear least-squares minimization problems. For this purpose, we compute the size of the attraction basin around nominal models and the associated tolerable error level, such that the misfit functional is guaranteed to have only one (hence global) minimum in the attraction basin, provided the noise on the data is below the tolerable error level. These estimates are defined a priori, and they depend only on the first- and second-order directional derivatives of the forward problem.

A first application is carried out with the seismic FWI problem, in which the misfit functional is known to have local minima in the directions associated with low spatial frequencies perturbation (background velocity). Yet, note that our methodology is totally applicable for other least-squares minimization problems. Our findings confirm the inherent difficulty of FWI, which requires unrealistically low-frequency data for the determination of the full velocity model and which we illustrate with experiments of reconstruction.

To see if these limitations could be overcome, the methodology is then applied to compare with the data-space reflectivity MBTT reformulation, originally designed to alleviate the phase shift problem. The results show a substantial increase of the attraction basins for MBTT, which can indeed overcome the requirement for low frequencies in the data.

As a side product of this analysis, we finally provide some guidelines for an efficient implementation of the MBTT reformulation of FWI.

ACKNOWLEDGMENTS

The authors would like to thank the editor, associate editor, and the three anonymous referees for their valuable comments that have contributed to improve the quality of the paper. F. Faucher acknowledges funding from the Austrian Science Fund (FWF) under the Lise Meitner fellowship M 2791-N. The research of F. Faucher is also supported by the Inria-TOTAL strategic action DIP. The research of H. Barucq has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement no. 777778.

DATA AND MATERIALS AVAILABILITY

Data associated with this research are confidential and cannot be released.

Freely available online through the SEG open-access option.