This is the second part of a three-part tutorial series on full-waveform inversion (FWI) in which we provide a step-by-step walk through of setting up forward and adjoint wave equation solvers and an optimization framework for inversion. In Part 1 (Louboutin et al., 2017), we showed how to use Devito (http://www.opesci.org/devito-public) to set up and solve acoustic wave equations with (impulsive) seismic sources and sample wavefields at the receiver locations to forward model shot records. Here in Part 2, we will discuss how to set up and solve adjoint wave equations with Devito and, from that, how we can calculate gradients and function values of the FWI objective function.
Introduction
The gradient of FWI is most commonly computed via the adjoint-state method, by crosscorrelating forward and adjoint wavefields and summing the contributions over all time steps (Plessix, 2006). Calculating the gradient for one source location consists of three steps:
- (1)
Solve the forward wave equation to create a shot record. The time varying wavefield must be stored for use in step 3; techniques such as subsampling can be used to reduce the storage requirements.
- (2)
Compute the data residual (or misfit) between the predicted and observed data.
- (3)
Solve the corresponding discrete adjoint model using the data residual as the source. Within the adjoint (reverse) time loop, crosscorrelate the second time derivative of the adjoint wavefield with the forward wavefield. These crosscorrelations are summed to form the gradient.
We start with the definition and derivation of the adjoint wave equation and its Devito stencil and then show how to compute the gradient of the conventional least-squares FWI misfit function. As usual, this tutorial is accompanied by all the code you need to reproduce the figures. Go to github.com/seg/tutorials-2018 and follow the links.
A simple experiment
To demonstrate the gradient computation in the simplest possible way, we perform a small seismic transmission experiment with a circular imaging phantom, i.e., a constant velocity model with a circular high-velocity inclusion in its center, as shown in Figure 1. For a transmission experiment, we place 21 seismic sources on the left-hand side of the model and 101 receivers on the right-hand side.
We will use the forward propagator from Part 1 to independently model the 21 “observed” shot records using the true model. As the initial model for our gradient calculation, we use a constant velocity model with the same velocity as the true model, but without the circular velocity perturbation. We will then model the 21 predicted shot records for the initial model, calculate the data residual and gradient for each shot, and sum them to obtain the full gradient.
The adjoint wave equation
The adjoint acoustic wave equation is equivalent to the forward equation with the exception of the damping term Η(t, x, y) = η(x, y)dv(t, x, y)/dt, which contains a first time derivative and therefore has a change of sign in its adjoint. (A second derivative matrix is the same as its transpose, whereas a first derivative matrix is equal to its negative transpose and vice versa.)
In this demonstration, there is no real data. Instead we will generate the “observed” data via forward modeling with the true model model. The synthetic data is generated from the initial model model0. The resulting data, and their difference, are shown in Figure 2.
In contrast to forward modeling, we do not record any measurements at the surface since we are only interested in the adjoint wavefield itself. The full script for setting up the adjoint wave equation, including an animation of the adjoint wavefield is available in adjoint_modeling.ipynb.
Computing the FWI gradient
The inner sum time = 1,…, nt runs over the number of computational time steps nt and denotes the second temporal derivative of the adjoint wavefield v. Computing the gradient of equation 3, therefore corresponds to performing the point-wise multiplication (denoted by the symbol ⊙) of the forward wavefields with the second time derivative of the adjoint wavefield and summing over all time steps.
To avoid the need to store the adjoint wavefield, the FWI gradient is calculated in the reverse time loop while solving the adjoint wave equation. To compute the gradient g for the current time step v[time]:
The second time derivative of the adjoint wavefield is computed with a second-order finite-difference stencil and uses the three adjoint wavefields that are kept in memory during the adjoint time loop (equation 2).
In Devito, we define the gradient as a Function since the gradient is computed as the sum over all time steps and therefore has no time dependence:
The update for the gradient as defined in equations 4 and 5 is then:
Now we must add the gradient update expression to the adjoint propagator op_grad. This yields a single symbolic expression with update instructions for both the adjoint wavefield and the gradient:
Solving the adjoint wave equation by running the following now computes the FWI gradient for a single source. Its value is stored in grad.data.
Now we can iterate over all the shot locations, running the same sequence of commands each time.
This gradient can then be used for a simple gradient descent optimization loop, as illustrated at the end of the notebook adjoint_modeling.ipynb. After each update, a new gradient is computed for the new velocity model until sufficient decrease of the objective or chosen number of iteration is reached. A detailed treatment of optimization and more advanced algorithms will be described in the third and final part of this tutorial series.
Conclusions
We need the gradient of the FWI objective function in order to find the optimal solution. It is computed by solving adjoint wave equations and summing the point-wise product of forward and adjoint wavefields over all time steps. Using Devito, the adjoint wave equation is set up in a similar fashion as the forward wave equation, with the main difference being the (adjoint) source, which is the residual between the observed and predicted shot records.
With the ability to model shot records and compute gradients of the FWI objective function, we are ready to demonstrate how to set up more gradient-based algorithms for FWI in Part 3 next month.
Acknowledgments
This research was carried out as part of the SINBAD II project with the support of the member organizations of the SINBAD Consortium. This work was financially supported in part by EPSRC grant EP/L000407/1 and the Imperial College London Intel Parallel Computing Centre.