## Abstract

We present the Seismic Laboratory for Imaging and Modeling/Monitoring open-source software framework for computational geophysics and, more generally, inverse problems involving the wave equation (e.g., seismic and medical ultrasound), regularization with learned priors, and learned neural surrogates for multiphase flow simulations. By integrating multiple layers of abstraction, the software is designed to be both readable and scalable, allowing researchers to easily formulate problems in an abstract fashion while exploiting the latest developments in high-performance computing. The design principles and their benefits are illustrated and demonstrated by means of building a scalable prototype for permeability inversion from time-lapse crosswell seismic data, which, aside from coupling of wave physics and multiphase flow, involves machine learning.

## Motivation

Advancements in high-performance computing techniques have led to giant leaps in computational (exploration) geophysics over the past decades. These developments have led, for instance, to the adoption of wave-equation-based inversion technologies such as full-waveform inversion (FWI) and reverse time migration (RTM) that, due to their adherence to wave physics, have resulted in superior imaging in complex geologies. While these techniques rank among the most sophisticated imaging technologies, their implementation relies with few exceptions — most notably iWave++ (Sun and Symes, 2010), Julia Devito Inversion framework (JUDI.jl) of the Seismic Laboratory for Imaging and Modeling (SLIM) (Witte et al., 2019a; Louboutin et al., 2023), and Chevron's COFII (Washbourne et al., 2021) — on monolithic low-level (C/Fortran) implementations. As a consequence, due to their lack of abstraction and modern programming constructs, these low-level implementations are difficult and costly to maintain, especially when performance considerations prevail over best software practices. A noteworthy attempt at modernizing wave-equation inversion frameworks is Deepwave (Richardson, 2018), which implements FWI using PyTorch (Paszke et al., 2019). Despite state-of-the-art examples and applications for 2D inversion, this work is limited by the aforementioned pitfalls as it relies on handwritten low-level C/Cuda code, reducing the flexibility and extensibility to new physics and three-dimensional problems. It also does not integrate machine learning with FWI as advocated in this work. While these implementation design choices lead to performant code for specific problems, such as FWI, they often hinder the implementation of new algorithms, e.g., based on different objective functions or constraints, as well as coupling existing code bases with external software libraries. For instance, combining wave-equation-based inversion with machine learning frameworks or coupling wave physics with multiphase fluid-flow solvers is considered challenging and costly. Thus, our industry runs the risk of losing its ability to innovate, a situation exacerbated by the challenges we face due to the energy transition.

In this work, we present a flexible and agile software framework that aims to resolve these challenges and is designed to be scalable, differentiable, and interoperable. We first introduce the design principles of our software framework, followed by a concrete usage scenario for time-lapse seismic monitoring of geologic carbon storage. This illustrative and didactic example involves the integration of multiple software modules for different types of physics with machine learning techniques such as learned deep priors and neural surrogates. For each module, we explain the choices we made and how these modules are connected through software abstractions and overarching high-level programming language constructs. The advocacy of our proposed framework is demonstrated on a preliminary 2D case study involving the realistic Compass model (Jones et al., 2012). We conclude by discussing remaining challenges and future work directions.

## Design principles

To address the shortcomings of current software implementations that impede progress, we have embarked on the development of a performant software framework. For instance, our wave propagators, implemented in Devito (Louboutin et al., 2019; Luporini et al., 2020), are used in production by contractors and oil and gas majors while enabling rapid, low-cost, scalable, and interoperable algorithm development for multiphysics and machine learning problems that run on a variety of chipsets (e.g., ARM, Intel, POWER) and graphics accelerators (e.g., NVIDIA, AMD, Intel). To achieve this, we adopt contemporary software design practices that include high-level abstractions, software design principles, and utilization of modern programming languages such as Python (Rossum and Drake, 2009) and Julia (Bezanson et al., 2017). We also make use of abstractions provided by domain-specific languages (DSLs) such as the Rice Vector Library (Padula et al., 2009) and the Unified Form Language (Alnaes et al., 2015; Rathgeber et al., 2016) and adopt reproducible research practices introduced by the trailblazing open-source initiative Madagascar (Fomel et al., 2013), which made use of version control and an abstraction based on the software construction tool SCons.

To meet the challenges of modern software design in a performance-critical environment, we adhere to three key principles — in addition to the fundamental principle of separation of concerns. First, we adopt mathematical language to inform our abstractions. Mathematics is concise, unambiguous, well understood, and leads to natural abstractions for the

wave physics, through partial differential equations as put to practice by Devito, which relies on Symbolic Python (SymPy) (Meurer et al., 2017) to define partial differential equations. Given the symbolic expressions, Devito automatically generates highly optimized, possibly domain-decomposed, parallel C code that targets the available hardware with near-optimal performance for 3D acoustic, tilted-transverse-isotropic, or elastic wave equations;

linear algebra, through matrix-free linear operators, as in JUDI.jl (Witte et al., 2019a; Louboutin et al., 2023) — a high-level linear algebra DSL for wave-equation-based modeling and inversion. These ideas date back to SPOT (van den Berg and Friedlander, 2009) with more recent implementations JOLI.jl (Modzelewski et al., 2023) in Julia and PyLops in Python (Ravasi and Vasconcelos, 2020); and

optimization, through definition of objective functions, also known as loss functions, that need to be minimized — via SlimOptim.jl (Louboutin et al., 2022c) — subject to mathematical constraints, which can be imposed through SetIntersectionProjection.jl (Peters and Herrmann, 2019; Peters et al., 2022).

Second, we exploit hierarchy within wave-equation-based inversion problems that naturally leads to a separation of concerns. At the highest level, we deal with linear operators, specifically matrix-free Jacobians of wave-based inversion, with JUDI.jl and parallel file input/output with SegyIO.jl (Lensink et al., 2023) on premise, or in the cloud (Azure) via JUDI4Cloud.jl (Louboutin et al., 2022b) and CloudSegyIO.jl (Modzelewski and Louboutin, 2022). At the intermediate and lower level, we make extensive use of Devito (Louboutin et al., 2019; Luporini et al., 2020) — a just-in-time compiler for stencil-based time-domain finite-difference calculations, the development of which SLIM has been involved in over the years.

Third, we build on the principles of differentiable programming as advocated by Innes et al. (2019) and intrusive automatic differentiation introduced by D. Li et al. (2020) to integrate wave physics with machine learning frameworks and multiphase flow. Specifically, we employ automatic differentiation (AD) through the use of the chain rule, including abstractions that allow the user to add derivative rules, as in ChainRules.jl (White et al., 2022, 2023).

During the Federal University of Rio Grande do Norte's inaugural FWI workshop in 2015, we at SLIM started articulating these design principles (Lin and Herrmann, 2015), which over the years cumulated in scalable parallel software frameworks for time-harmonic FWI (Silva and Herrmann, 2019), for time-domain RTM and FWI (Witte et al., 2018, 2019a; Louboutin et al., 2023), and for abstracted FWI (Louboutin et al., 2022a) allowing for connections with machine learning. Aside from developing software for wave-equation-based inversion, we have been involved more recently in the development of scalable machine learning solutions, including the Julia package InvertibleNetworks.jl (Witte et al., 2023), which implements memory-efficient invertible deep neural networks such as (conditional) normalizing flows (NFs) (Rezende and Mohamed, 2015), and scalable distributed Fourier neural operators (FNOs) (Z. Li et al., 2020) in the dfno software package (Grady et al., 2022a, 2022b). All of these will be described in more detail later in this paper.

To illustrate how these design principles can lead to solutions of complex learned coupled inversions, we consider in the ensuing sections end-to-end inversion of time-lapse seismic data for the spatial permeability distribution (D. Li et al., 2020). As can be seen from Figure 1, this inversion problem is rather complex, and its solution arguably benefits from our three design principles listed earlier. In this formulation, the latent representation for the permeability is taken via a series of nonlinear operations all the way to the time-lapse seismic data. In the remainder of this exposition, we will detail how the different components in this learned inversion problem are implemented so that the coupled inversion can be carried out. The results presented are preliminary, representing a snapshot on how research is conducted according to the design principles.

## Learned time-lapse end-to-end permeability inversion

Combating climate change and dealing with the energy transition call for solutions to problems of increasing complexity. Building seismic monitoring systems for geologic CO_{2} and/or H_{2} storage falls in this category. To demonstrate how math-inspired abstractions can help, we consider inversion of permeability from crosswell time-lapse data (see Figure 2 for experimental setup) involving (1) coupling of wave physics with two-phase (brine/CO_{2}) flow using Jutul.jl (Møyner et al., 2023) state-of-the-art reservoir modeling software in Julia; (2) learned regularization with NFs with InvertibleNetworks.jl; and (3) learned surrogates for the fluid-flow simulations with FNOs. This type of inversion problem is especially challenging because it involves different types of physics to estimate the past, current, and future saturation and pressure distributions of CO_{2} plumes from crosswell data in saline aquifers. In the subsequent sections, we demonstrate how we invert time-lapse data using the separate software packages listed in Figure 1.

** Wave-equation-based inversion**. Due to its unmatched ability to resolve CO

_{2}plumes, active-source time-lapse seismic is arguably the preferred imaging modality when monitoring geologic storage (Ringrose, 2020). In its simplest form for a single time-lapse vintage, FWI involves minimizing the ℓ

_{2}-norm misfit/loss function between observed and synthetic data — i.e., we have

In this formulation, the symbol **F**(**m**) represents the forward modeling operator (wave physics), parameterized by the squared slowness **m**. This forward operator acting on the sources consists of the composition of source injection operator **P**_{s}^{T}, with ^{T} denoting the transpose operator, solution of the discretized wave equation via **A**(**m**)^{−1}, and restriction to the receivers via the linear operator **P**_{r}. The vector **q** represents the seismic sources, and the vector **d** contains single-vintage seismic data collected at the receiver locations. Thanks to our adherence to the math, the corresponding Julia code to invert for the unknown squared slowness **m** with JUDI.jl reads

To obtain this concise and abstract formulation for FWI, we utilized hierarchical abstractions for propagators in Devito and linear algebra tools in JUDI.jl, including matrix-free implementations for F and its Jacobian J. While the preceding stand-alone implementation allows for (sparsity-promoting) seismic (Louboutin and Herrmann, 2017; Louboutin et al., 2018; Herrmann et al., 2019; Witte et al., 2019b; Rizzuti et al., 2020, 2021; Siahkoohi et al., 2020a, 2020b, 2020c; Yang et al., 2020; Yin et al., 2021, 2023) and medical (Yin et al., 2020; Orozco et al., 2021, 2023a, 2023b) inversions, it relies on hand-derived implementations for the adjoint of the Jacobian J' and for the derivative of the loss function. Although this approach is viable, relying solely on hand-derived derivatives can become cumbersome when we want to utilize machine learning models or when we need to couple the wave equation to the multiphase flow equations.

To allow for this situation, we make use of Julia's differentiable programming ecosystem that includes tools to use AD and to add differentiation rules via ChainRules.jl. Using this tool, the AD system can be taught how to differentiate JUDI.jl via the following differentiation rule for the forward propagator:

In this rule, the pullback function takes as input the data residual, dy, and outputs the gradient of F * q with respect to the operator * (no gradient), the model parameters, and the source distribution. With this differentiation rule, the above gradient descent algorithm can be implemented as follows:

Compared to the original implementation, this code only needs F(m) and the function loss(m). With the help of the above rrule, Julia's AD system^{8} is capable of computing the gradients (line 5). Aside from remaining performant — i.e., we still make use of the adjoint-state method to compute the gradients — the advantage of this approach is that it allows for much more flexibility, e.g., in situations where the squared slowness is parameterized in terms of a pretrained neural network or in terms of the output of multiphase flow simulations. In the next section, we show how trained NFs can serve as priors to improve the quality of FWI.

** Deep priors and NFs.** NFs are generative models that take advantage of invertible deep neural network architectures to learn complex distributions from training examples (Dinh et al., 2016). The term “flow” refers to the transformation of data from a complex distribution to a simple one. The term “normalizing” refers to the standard Gaussian (normal) target distribution that the network learns to map images to. For example, in seismic inversion applications, we are interested in approximating the distribution of earth models to use as priors in downstream tasks. NFs learn to map samples from the target distribution (i.e., earth models) to zero-mean unit standard deviation Gaussian noise using a sequence of trainable nonlinear invertible layers. Once trained, one can resample new Gaussian noise and pass it through the inverse sequence of layers to obtain new generative samples from the target distribution. NFs are an attractive choice for generative models in seismic applications (Zhang and Curtis, 2020, 2021; Siahkoohi and Herrmann, 2021; Siahkoohi et al., 2021, 2022, 2023; Zhao et al., 2021) because they provide fast sampling and allow for memory-efficient training due to their intrinsic invertibility, which eliminates the need to store intermediate activations during backpropagation. Memory efficiency is particularly important for seismic applications due to the 3D volumetric nature of the seismic models. Thus, our methods need to scale well in this regime.

To illustrate the practical use of NFs as priors in seismic inverse problems, we trained an NF on slices from the Compass model (Jones et al., 2012). The training of an NF is laid out in Figure 5 where, for illustrative purposes, we demonstrate a training run on small (64 × 64) slices of the Compass model. Each row shows the normalization (image **m** transformed to **Z**_{m} intended to be white zero-mean standard deviation one Gaussian noise) during training and its generative inverse (white noise **z** ∼ $N$(0,1) to image $m~$) during each epoch. From Figure 5, we clearly observe the intended behavior. As the training proceeds, the NFs transform the true model better toward white noise while its inverse progressively generates more realistic looking generative velocity models. To perform a comparison with traditional FWI, we train an NF on full model size slices (512 × 256 grid points). In Figure 5, we compare generative samples from the NF with the slices used to train the model shown in Figure 4. Although there are still irregularities, the model has learned important qualitative aspects of the model that will be useful in inverse problems. To demonstrate this usefulness, we test our prior on an FWI inverse problem. Because our NF prior is trained independently, it is flexible and can be plugged into different inverse problems easily.

Our FWI experiment includes ocean-bottom nodes, Ricker wavelet with no energy below 4 Hz, and additive colored Gaussian noise that has the same bandwidth as the noise-free data. For FWI with our learned prior, we minimize

where $G\theta $ is a pretrained NF with weights θ*. After training, the inverse of the NF maps realistic Compass-like earth samples to white noise — i.e., $G\theta \u22121.(m)=z\u223cN(0,I)$. Because the NFs are designed to be invertible, the action of the pretrained NF, $G\theta $, on Gaussian noise **z** produces realistic samples of earth models (see Figure 5). We use this capability in equation 2 where the unknown model parameters in **m** are reparameterized by $G\theta .(z)$. The regularization term, $\lambda 2\Vert z\Vert 22$, penalizes the latent variable **z** with large ℓ_{2}-norm, where λ balances the misfit and regularization terms. Consequently, this learned regularizer encourages FWI results that are more likely to be realistic earth models (Asim et al., 2020). However, notice that the optimization routine now requires differentiation through both the physical operator (wave physics, **F**) and the pretrained NF ($G\theta $), and only a true invertible implementation like ours, with minimal memory imprint for both training and inference, can provide scalability.

Due to the JUDI.jl's rrule for F and InvertibleNetworks.jl's rrule for G, integration of machine learning with FWI becomes straightforward involving replacement of m by G(z) on line 6. Minimizing the objective function in equation 2 now translates to

In Figure 6, we compare the results of FWI with our learned prior against unregularized FWI. Because our prior regularizes the solution toward realistic models, we obtain a velocity estimate that is closer to the ground truth. To measure the performance of our method, we use peak signal-to-noise ratio (PS/N) and see an increase from 12.98 dB with traditional FWI to 14.77 dB with the learned prior.

Through this simple example, we demonstrated the ability to easily integrate our state-of-the-art wave-equation propagators with Julia's differentiable programming system. By applying these design principles to other components of the end-to-end inversion, we design a seismic monitoring framework for real-world applications in subsurface reservoirs.

** Fluid-flow simulation and permeability inversion.** As stated earlier, our goal is to estimate the permeability from time-lapse crosswell monitoring data collected at a CO

_{2}injection site (cf. Figure 2). Compared to conventional seismic imaging, time-lapse monitoring of geologic storage differs because it aims to image time-lapse changes in the CO

_{2}plume while obtaining estimates for the reservoir's fluid-flow properties. This involves coupling wave modeling operators to fluid-flow physics to track the CO

_{2}plumes underground. The fluid-flow physics models the slow process of CO

_{2}partly replacing brine in the pore space of the reservoir, which involves solving the multiphase flow equations. For this purpose, we need access to reservoir simulation software capable of modeling two-phase (brine/CO

_{2}) flow. While several proprietary and open-source reservoir simulators exist, including MRST (Lie and Møyner, 2021), GEOSX (Settgast et al., 2022), and Open Porous Media (Rasmussen et al., 2021), few support differentiation of the simulator's output (CO

_{2}saturation) with respect to its input (the spatial permeability distribution

**K**in Figure 1). We use the recently developed external Julia package JutulDarcy.jl that supports Darcy flow and serves as a front-end to Jutul.jl (Møyner et al., 2023), which provides accurate Jacobians with respect to

**K**. Jutul.jl is an implicit solver for finite-volume discretizations that internally uses AD to calculate the Jacobian. It has a performance and feature set comparable to commercial multiphase flow simulators and accounts for realistic effects (e.g., dissolution, interphase mass exchange, compressibility, capillary effects) and residual trapping mechanisms. It also provides accurate sensitivities through an adjoint formulation of the subsurface multiphase flow equations. To integrate the Jacobian of this software package into Julia's differentiable programming system, we wrote the light “wrapper package” JutulDarcyRules.jl (Yin and Louboutin, 2023) that adds an rrule for the nonlinear operator $S$(

**K**), which maps the permeability distribution,

**K**, to the spatially varying CO

_{2}concentration snapshots, $c={ci}i=1nv$, over

*n*

_{v}monitoring time steps (cf. Figure 1). Addition of this rrule allows these packages to interoperate with other packages in Julia's AD ecosystem. The following shows a basic example where the ADAM algorithm is used to invert for subsurface permeability given the full history of CO

_{2}concentration snapshots:

During each iteration of the preceding loop, Julia's machine learning package Flux.jl (Innes et al., 2018; Innes, 2018b) uses the custom gradient defined by the aforementioned rrule, calling the high-performance adjoint code from JutulDarcy.jl. Our adaptable software framework also facilitates effortless substitution of deep learning models in lieu of the numerical fluid-flow simulator. In the next section, we introduce distributed FNOs and discuss how this neural surrogate contributes to our inversion framework.

** Fourier neural operator surrogates.** While the integration of multiphase flow modeling into the Julia differentiable programming ecosystem opens the way to carry out end-to-end inversions (as explained later), fluid-flow simulations are computationally expensive — a notion compounded by the fact that these simulations have to be done many times during inversion. For this reason, we switch to a data-driven approach where a neural operator is first trained on simulation examples, pairs {

**K**,$S$(

**K**)}, to learn the mapping from permeability models,

**K**, to the corresponding CO

_{2}snapshots,

**c**: = $S$(

**K**). After incurring initial offline training costs, this neural surrogate provides a fast alternative to numerical solvers with acceptable accuracy. FNOs (Z. Li et al., 2020), a neural network architecture based on spectral convolutions that capture the long-range correlations rather than localized spatial convolutions, have been introduced recently as a surrogate for elliptic partial differential equations such as the Darcy or Burgers equation. This spectral architecture has been applied successfully to simulate two-phase flow during geologic CO

_{2}storage projects (Wen et al., 2022). Independently, Yin et al. (2022) used a trained FNO to replace the fluid-flow simulations as part of end-to-end inversion and showed that AD of Julia's machine learning package can be used to compute gradients with respect to the permeability using Flux.jl's reverse-mode AD system Zygote.jl (Innes, 2018a). After training, the above permeability inversion from concentration snapshots,

**c**, is carried out by simply replacing $S$ by $S$

**w*** with

**w*** being the weights of the pretrained FNO. Thanks to the AD system, the gradient with respect to

**K**is computed automatically. Thus, after loading the trained FNO and redefining the operator $S$, the aforementioned code remains exactly the same. For implementation details on the FNO and its training, we refer to Yin et al. (2022) and Grady et al. (2022b).

## Putting it all together

As a final step in our end-to-end permeability inversion, we introduce a nonlinear rock-physics model, denoted by ℛ. Based on the patchy saturation model (Avseth et al., 2010), this model nonlinearly maps the time-lapse CO_{2} saturations to decreases in the seismic properties (compressional wavespeeds, $v={vi}i=1nv$) within the reservoir with the Julia code

We map the changes in the wavespeeds to time-lapse seismic data, $d={di}i=1nv$, via the blockdiagonal seismic modeling^{9} operator $F(v)=diag({Fi(vi)qi}i=1nv)$). In this formulation, the single-vintage forward operators **F**^{i} and corresponding sources, **q**^{i}, are allowed to vary between vintages.

With the fluid-flow (surrogate) solver, $S$, the rock-physics module, ℛ, and wave-physics module, ℱ, in place, along with regularization via reparametrization using $G\theta *$, we are now in a position to formulate the desired end-to-end inversion problem as

where the inverted permeability can be calculated by $K*=G\theta .(z*)$ with z^{*} the latent space minimizer of equation 3. As illustrated in Figure 1, we obtain the nonlinear end-to-end map by composing the fluid-flow, rock, and wave physics, according to ℱ ∘ ℛ ∘ $S$. The corresponding Julia code reads

This end-to-end inversion procedure, which utilizes a learned deep prior and a pretrained FNO surrogate, was successfully employed by Yin et al. (2022) on a simple stylistic blocky high-low permeability model. The procedure involves using AD, with rrule for the wave and fluid physics, in combination with innate AD capabilities to compute the gradient of the objective in equation 3, which incorporates fluid-flow, rock, and wave physics. To follow, we share early results from applying the proposed end-to-end inversion in a more realistic setting derived from real data (cf. Figure 2).

## Preliminary inversion results

While initial results by Yin et al. (2022) were encouraging and showed strong benefits from the learned prior, the permeability model and fluid-flow simulations considered in their study were too simplistic. To evaluate the proposed end-to-end inversion methodology in a more realistic setting, we consider the permeability model plotted in Figure 7a, which we derived from a slice of the Compass model (Jones et al., 2012) shown in Figure 2. To generate realistic CO_{2} plumes in this model, we generate immiscible and compressible two-phase flow simulations with JutulDarcy.jl over a period of 18 years with five snapshots plotted at years 10, 15, 16, 17, and 18. These CO_{2} snapshots are shown in the first row of Figure 8. Next, given the fluid-flow simulation, we use the patchy saturation model (Avseth et al., 2010) to convert each CO_{2} concentration snapshot, **c**^{i}, *i* = 1 … *n*_{v} to corresponding wavespeed model, *v*_{i}, *i* = 1 … *n*_{v} with **v** = ℛ(**c****)**. We then use JUDI.jl to generate synthetic time-lapse data, **d**^{i}, i = 1 … *n*_{v}, for each vintage.

During the inversion, the first 15 years of time-lapse data, **d**^{i}, i = 1 … 15, from the aforementioned synthetic experiment are inverted with permeabilities within the reservoir initialized by a single reasonable value as shown in Figure 7b. Inversion results obtained after 25 passes through the data for the physics-driven two-phase flow solver and its learned neural surrogate approximation are included in Figure 7c and Figure 7d, respectively. Both results were obtained with 200 iterations of the preceding code block. Each time-lapse vintage consist of 960 receivers and 32 shots. To limit the number of wave-equation solves, gradients were calculated for only four randomly selected shots with replacement per iteration. While these results obtained without learned regularization are somewhat preliminary, they lead to the following observations. First, both inversion results for the permeability follow the inverted cone shape of the CO_{2}. This is to be expected because permeability can only be inverted where CO_{2} has flown over the first 15 years. Second, the inverted permeability follows trends of this strongly heterogeneous model. Third, as expected, details and continuity of the results obtained with the two-phase flow solver are better. In part, this can be explained by the fact that there are no guarantees that the model iterations remain with the statistical distribution on which the FNO was trained. Fourth, the implementation of this workflow benefited greatly from the aforementioned software design principles. For instance, the use of abstractions made it trivial to replace physics-driven two-phase flow solvers with their learned counterparts.

Despite being preliminary, the inversion results show that this framework is conducive to producing current CO_{2} plume estimates and near-future forecasts. As described by Yin et al. (2022), these capabilities can be achieved through use of the physics simulator or the trained FNO surrogate. The 18-year CO_{2} simulations in both inverted permeability models are reasonable when comparing the true plume development plotted in the top row of Figure 8 with plumes simulated from the inverted models plotted in rows three and four of Figure 8. While certain details are missing in the estimates for the past, current, and predicted CO_{2} concentrations, the inversion constitutes a considerable improvement compared to plumes generated in the starting model for the permeability plotted in the second row of Figure 8. An early version of the presented workflow can be found in the Julia package Seis4CCS.jl. As the project matures, updated workflows and codes will be pushed to GitHub.

## Remaining challenges

We hope we have been able to convince the reader that working with abstractions has its benefits. Due to the math-inspired abstractions, which naturally lead to modularity and separation of concerns, we were able to accelerate the research and development cycle for the end-to-end inversion. As a result, we created a development environment that allowed us to include machine learning techniques. Relatively late in the development cycle, it also gave us the opportunity to swap out the original 2D reservoir simulation code for a much more powerful and fully featured industry-strength 3D code developed by a national lab. What we unfortunately have not yet been able to do is demonstrate our ability to scale this end-to-end inversion to 3D, while both the Devito-based propagators and Jutul.jl's fluid-flow simulations both have been demonstrated on industry-scale problems. Unfortunately, lack of access to large-scale computational resources makes it challenging in an academic environment to validate the proposed methodology on 4D synthetic and field data, even though the computational toolchain presented in this paper is fully differentiable and, in principle, capable of scale-up. Most components have been tested separately and verified on realistic 3D examples (Grady et al., 2022b; Louboutin and Herrmann, 2022, 2023; Møyner and Bruer, 2023) and efforts are underway to remove fundamental memory and other bottlenecks.

** Scale-up NFs.** Generative models, and NFs included, call for relatively large training sets and large computational resources for training. While efforts have been made to create training sets for more traditional machine learning tasks, no public-domain training set exists that contains realistic 3D examples. A positive is that NFs (Rezende and Mohamed, 2015) have a small memory footprint compared to diffusion models (Song et al., 2020), so training this type of network will be feasible when training sets and compute become available. In our laboratory, we were already able to successfully train and evaluate NFs on 256 × 256 × 64 models. In some cases where geophysicists might not have enough samples for velocity/permeability models, one could use in-house legacy models to train the NFs as a preparation step for inverting the seismic data. We leave the potential investigation to future studies.

** Scale-up neural operators.** Since the seminal paper by Z. Li et al. (2020), there has been a flurry of publications on the use of FNOs as neural surrogates for expensive multiphase fluid-flow solvers used to simulate CO

_{2}injection as part of geologic storage projects (Wen et al., 2022, 2023). While there is good reason for this excitement, challenges remain when scaling this technique to realistic 3D problems. In that case, additional measures must be taken. For instance, by nesting FNOs Wen et al. (2023) were able to divide 3D domains into smaller hierarchical subdomains centered around the wells — an approach that is only viable when certain assumptions are met. Because of this nested decomposition, these authors avoid the large memory footprint of 3D FNOs and report a speedup of many orders of magnitude. Given the potential impact of irregular CO

_{2}flow, e.g., leakage, we try as much as possible to avoid making assumptions on the flow behavior and propose an accurate distributed FNO structure based on a domain decomposition of the network's input and network weights (Grady et al., 2022b). By using DistDL (Hewett and Grady II, 2020), a software package that supports “model parallelism” in machine learning, our dfno software package partitions the input data and network weights across multiple GPUs such that each partition is able to fit in the memory of a single GPU. As reported by Grady et al. (2022b), our work demonstrated validity of dfno on a realistic problem and reasonable training set (permeability/CO

_{2}concentration pairs) sizes for permeability models derived from the Sleipner benchmark model (Furre et al., 2017). On 16 timesteps and models of size 64 × 118 × 263, we reported from our perspective a more realistic speedup of more than 1300× compared to the simulation time on Open Porous Media (Rasmussen et al., 2021), one of the leading open-source reservoir simulators. These results confirm a similar indepedent approach advocated by Witte et al. (2022). Even though we are working with our industrial partners and Extreme Scale Solutions to further improve these numbers, we are confident that distributed FNOs are able to scale to 3D with a high degree of parallel efficiency.

** Toward scalable open-source software.** In addition to allowing for reproduction of published results, we are advocates of pushing out scalable open-source software to help with the energy transition and with combating climate change. As observed in other fields, most notably in machine learning, open-source software leads to accelerated rates of innovation, a feature needed in industries faced with major challenges. Despite the exposition on our experiences implementing end-to-end permeability inversion, this work constitutes a snapshot of an ongoing project. However, many of the software components listed in Table 1 are in an advanced stage of development and to a large degree are ready to be tested in 3D and ultimately on field data. For instance, all of our software supports large-scale 3D simulation and AD. In addition, we are in an advanced state of development to support GPU for all codes. For those curious about future developments, we also include the Julia package ParametricOperators.jl, which is designed to allow for high-dimensional parallel tensor manipulations in support of future Julia-native implementations of distributed FNOs.

The work presented in this paper would not have been possible without open-source efforts from other groups, most notably by researchers at the UK's Imperial College London, who spearheaded the development of Devito, and researchers at Norway's SINTEF. By integrating these packages into Julia's agile differentiable programming environment, we believe that we are well on our way to arrive at a software environment that is much more viable than the sum of its parts. We welcome readers to check https://github.com/slimgroup for the latest developments.

## Conclusions

In this work, we introduced a software framework for geophysical inverse problems and machine learning that provides a scalable, portable, and interoperable environment for research and development at scale. We showed that through carefully chosen design principles, software with math-inspired abstractions can be created that naturally leads to desired modularity and separation of concerns without sacrificing performance. We achieve this by combining Devito's automatic code generation for wave propagators with Julia's modern highly performant and scalable programming capabilities, including differentiable programming. These features enabled us to quickly implement a prototype, in principle scalable to 3D, for permeability inversion from time-lapse crosswell seismic data. Aside from the use of proper abstractions, our approach to solving this relatively complex multiphysics problem relied extensively on Julia's innate algorithmic differentiation capabilities, supplemented by auxiliary performant derivatives for the wave/fluid-flow physics and for components of the machine learning. Because of these design choices, we developed an agile and relatively easy to maintain compact software stack where low-level code is hidden through a combination of math-inspired abstractions, modern programming practices, and automatic code generation.

## Acknowledgments

This research was carried out with the support of the Georgia Research Alliance and industrial partners of the ML4Seismic Center. The authors thank Henryk Modzelewski (University of British Columbia) and Rishi Khan (Extreme Scale Solutions) for constructive discussions. This work was supported in part by the U.S. National Science Foundation grant OAC 2203821 and Department of Energy grant no. DE-SC0021515.

## Data and materials availability

Our software framework is organized into registered Julia packages, all of which can be found on the SLIM GitHub page (https://github.com/slimgroup). The software packages described in this paper are all open source and released under the MIT license for use by the community.