ABSTRACT
Geologic carbon storage (GCS) must be safe and profitable. To achieve these goals for gigaton-scale GCS operations, decision-making in the presence of uncertainty is required. Geophysical monitoring methods can inform such decisions, given their sensitivity to the spatiotemporal changes in the subsurface during and after injection. We investigate a novel framework for the optimal control of GCS operations using geophysical monitoring. We refer to this decision-making tool as “geophysical control” and develop sequential decision-making models trained using digital twins of GCS operations and the corresponding geophysical monitoring signals. In particular, we obtain these models via deep reinforcement learning (DRL) and specifically focus on two types of uncertainty: geophysical noise and uncertainty in the subsurface petrophysical model. Our objective is to demonstrate how each source of stochasticity affects the decision-making process when one seeks to maximize profit while minimizing the risk of induced seismicity through an optimal policy that determines the annual target CO2 injection rate. We train a suite of DRL agents with different geophysical observations (surface time-lapse gravity, surface seismic amplitude-variation-with-offset (AVO), and combined gravity and AVO surveys), different signal-to-noise ratio levels, and with/without petrophysical model uncertainties. A comparison of the learning behavior of these independent DRL agents shows that (1) the DRL framework has the capacity to learn optimal CO2 injection policies; (2) training performance degrades with increasing geophysical noise (especially in the seismic AVO case); and (3) the combination of AVO and gravity enhances decision-making, especially in the presence of geophysical noise. Our results show that the use of multigeophysical measurements and the incorporation of subsurface model uncertainties are critical in developing robust injection control agents using DRL.
INTRODUCTION
The use of digital twins to study geologic carbon storage (GCS) problems has been significantly enhanced by advancements in digital oilfield techniques (Wanasinghe et al., 2020), facilitating more efficient, accurate, and cost-effective monitoring and management of GCS. The key components of this digital transformation include the use of sensing technologies, big data analytics, the internet of things, and data processing and interpretation algorithms based on conventional and/or machine-learning (ML) methods (White, 2019; Wanasinghe et al., 2020; Herrmann, 2023; Um et al., 2023). These technologies are pivotal in collecting, processing, and analyzing the vast amounts of data generated from GCS operations, including geophysical monitoring measurements, which provide critical information used for decision-making during and after CO2 injection. Multigeophysical techniques, such as those combining seismic (White et al., 2017), electromagnetic (EM) (Böhm et al., 2015), and gravity (Appriou et al., 2020) data can be used to track CO2 plume migration for monitoring, measurement, and verification (MMV) purposes. Given a digital twin of a GCS field, continuously recorded geophysical monitoring data enable operators to adjust the development and injection plans using a combination of reservoir, geomechanical, and geostatistical simulations.
A GCS operator may often seek to increase profit by maximizing the amount of injected CO2 but must also be careful to avoid induced seismicity. To address such a constrained optimization problem, effective decision-making in the presence of subsurface uncertainty is required. The optimization of GCS operations has been extensively studied using reservoir and geomechanical simulations and depends on the availability of geoscientific data, which varies with the phase of GCS projects. A relatively well-established approach is commonly referred to as planning (or design) via simulation-based methods. Given a reservoir model (consisting, for example, of porosity and permeability distribution) and its associated uncertainties, one can simulate a series of development and/or operational scenarios. Various methods, such as traditional ensemble-based data assimilation techniques (Chen and Oliver, 2010), dynamic modeling (Leach et al., 2011), genetic algorithms (Safi et al., 2016), or Bayesian optimization (Lu et al., 2022), can be used to determine injection rates or development plans through (geostatistical) reservoir and/or geomechanical simulations (Zheng et al., 2021). The outcomes of the optimization can then serve as an initial plan for the development or operation of a GCS project; revised subsurface models updated via history matching can be used for further optimization (Zhang et al., 2022).
Another more novel approach to optimization is to continuously record geophysical and other monitoring data (including operational data recorded in wells) and make “online” GCS operation and/or development decisions based on this new evidence as it becomes available. Online optimization then becomes a sequential decision-making problem, which, under certain conditions, can be described through a (partially observable) Markov decision process (MDP) (Bellman, 1957). The need for sequential decision-making occurs in many applications, such as finance (Almgren and Chriss, 2001), healthcare (Murphy, 2003), and robotics (Kober et al., 2013). One tool to solve such complex MDPs is an ML method referred to as reinforcement learning (RL) (Sutton and Barto, 2018). Mnih et al. (2015) demonstrate the first successful use of a powerful combination of deep learning and RL (deep reinforcement learning [DRL]), allowing artificial intelligence (AI) to achieve human-level performance in a suite of classic Atari video games. Subsequent work includes applications to other complex control problems, such as medical diagnoses (Yu et al., 2023a), autonomous driving (Kiran et al., 2021), and nuclear fusion (Degrave et al., 2022). The first application of DRL to optimize GCS management is reported in Sun (2020), wherein the DRL inputs are CO2 saturations and pore pressures that completely describe the dynamic changes within the target formation. More recent work on similar decision-making algorithms via DRL includes reservoir engineering studies by Dawar (2021), Chen et al. (2024), and Nasir and Durlofsky (2024).
Integrating the capabilities of DRL into sequential decision-making with geophysical monitoring, Noh and Swidinsky (2024) were the first to investigate whether DRL agents could “geophysically control” GCS operations. Their results show that the combination of time-lapse gravity and well pressure monitoring data could be used by a DRL agent to learn behavior that maximizes profit while minimizing the risk of induced seismicity. A similar recent study approaches such geophysically guided decision-making problems using sequential Bayesian inference and digital twins (Gahlot et al., 2024). The use of time-lapse geophysical measurements for GCS control is, in our opinion, novel, as geophysical data are conventionally collected for MMV purposes in carbon storage applications. By providing field-measurable quantities as input to the DRL agent, the approach becomes a “what-to-do-given-an-observation” method, whereas optimized planning or scheduling — including the work in Sun (2020) — can be considered a “what-to-do-when” method. In other words, the DRL approach proposed in Noh and Swidinsky (2024) relies on training agents using digital twins of the subsurface (such as reservoir and engineering models) and applying the resulting policy using field measurements (such as geophysical and well monitoring data).
DRL has the ability to learn control policies that incorporate stochasticity (i.e., uncertainty) through repeated episodes of experience. Such uncertainties may be endogenous or exogenous, terms that are often used in economics, finance, and decision theory (Alison et al., 2015; Zhang et al., 2021) but rarely mentioned in applied geoscience. Endogenous uncertainty originates from the system being studied, whereas exogenous uncertainty comes from outside the system (meaning that such uncertainty is not influenced by actions applied to the system). When geophysical measurements are used for decision-making, there are two major sources of stochasticity — geophysical noise and subsurface model uncertainty: the former is part of each new measurement and is therefore endogenous, whereas the latter is external to such measurements and is therefore exogenous. In Sun (2020) and Noh and Swidinsky (2024), a suite of permeability models was realized via geostatistical simulation and provided to the DRL agent during training. The resulting control policy then provided optimal GCS operation parameters given the input data while accounting for this exogenous uncertainty of the static subsurface model. Endogenous uncertainties, such as measurement noise, were not considered in either study. Furthermore, Noh and Swidinsky (2024) only considered a single geophysical monitoring measurement — noise-free surface gravity — as input to the control policy.
In this work, we seek to understand the effects of the endogenous and exogenous uncertainties on GCS control policies. We compare policies learned using two different geophysical monitoring methods with corresponding different sensitivities to the subsurface: time-lapse gravity and seismic amplitude variation with offset (AVO). We shall see that such multigeophysical measurements improve GCS control policies learned via DRL in the presence of endogenous (i.e., geophysical noise) and exogenous (i.e., subsurface model) uncertainties.
METHODOLOGY
The RL framework
Environment: Coupled geostatistical, reservoir, and geophysical simulations
We train our DRL agent through repeated interactions with a simulated geophysical digital twin of a GCS operation, where the environment shown in Figure 1 contains geostatistical, reservoir, and geophysical simulators. Figure 2 shows the layout of a hypothetical GCS operation wherein CO2 is injected at a variable rate through a single, centrally located well into a brine aquifer consisting of a single layer. We further simulate the extraction of preexisting fluid (i.e., brine) through producers to reduce pressure buildup, a process often referred to as pressure management (Birkholzer et al., 2012); in our model, two brine producers are connected to the corners of the target formation (the blue lines in Figure 2) and operate at a fixed flow rate until CO2 production is detected in place of brine.
Figure 3a and 3b shows the “true” permeabilities and porosities, respectively, of the single-layer target formation used in this study. To incorporate exogenous subsurface model uncertainty given limited petrophysical samples from exploratory wells, we simulate a new permeability model at the beginning of each new episode of training (i.e., a new lifecycle of the hypothetical GCS operation) while keeping a fixed porosity model. The open circles in Figure 3a indicate the location of the 25 gridded permeability samples, and the models in Figure 3c show examples of the realized permeability models at various episodes. We recognize that such a dense distribution of sampling via pattern-drilled wells may not be practically available in real-world scenarios. However, a key aspect of our methodology lies in the ability to incorporate subsurface static model uncertainty through a new geostatistical realization at each episode, regardless of how such a realization is generated (i.e., one could combine information from a limited number of wells and 3D seismic reservoir characterization studies as a more realistic approach to geostatistical simulation). In addition, because the RL method accommodates stochasticity and our approach uses a new geostatistical realization for each episode, the use of a perfect (or imperfect) grid of samples does not significantly impact the RL process. Instead, the overall level of stochasticity and the accuracy of the realized model relative to the unknown true model are more critical to the process. The porosity and permeability models then serve as static inputs for reservoir simulation; subsurface pressure and saturation distributions are subsequently used for geophysical simulation (specifically to calculate time-lapse AVO and gravity responses). Table 1 summarizes the parameters and conditions used for the reservoir simulation.
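To illustrate the per-episode realization step, the following is a minimal, purely illustrative sketch of drawing a new log-normal permeability field from a correlated Gaussian random field via spectral synthesis. It is a stand-in for a geostatistical simulator such as GSTools; the grid size, correlation length, and 100 mD reference value are assumptions, and the kriging-based conditioning to the 25 well samples used in the paper is omitted for brevity.

```python
import numpy as np

def gaussian_random_field(n=64, len_scale=8.0, sigma=1.0, seed=None):
    """Unconditional 2D Gaussian random field via spectral synthesis.

    Filters white noise with the square root of a Gaussian spectral
    density; the field is rescaled to the target standard deviation.
    """
    rng = np.random.default_rng(seed)
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k2 = kx**2 + ky**2
    # sqrt of the Gaussian-covariance spectral density (normalization
    # is irrelevant here because the field is rescaled below)
    amp = np.exp(-((2 * np.pi * len_scale) ** 2) * k2 / 4.0)
    noise = np.fft.fft2(rng.standard_normal((n, n)))
    field = np.real(np.fft.ifft2(amp * noise))
    return field * sigma / field.std()

# A new permeability realization (hypothetically around 100 mD, log-normal)
# would be drawn at the start of each training episode:
log_k = gaussian_random_field(seed=42)
perm_md = 100.0 * np.exp(log_k)
```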
In this work, we use the open-source geostatistical software GeoStatTools (GSTools; Müller et al., 2022) and the open-source reservoir simulator Open Porous Media (OPM; Rasmussen et al., 2021) to simulate the GCS operations. Gravity and AVO simulations are performed using rockphypy (Yu et al., 2023b), an open-source rock-physics package (see section “States: Time-lapse AVO and gravity”).
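The coupling of these simulators into an RL environment can be sketched with a gym-style interface. The class below is schematic: the `_reservoir_step` and `_observe` methods are placeholders for the OPM flow simulation and the rockphypy AVO/gravity forward modeling, and every number in it is illustrative rather than taken from the paper.

```python
import numpy as np

class GCSEnv:
    """Schematic sketch of the environment in Figure 1.

    The simulator calls are placeholders for GSTools (geostatistics),
    OPM (reservoir flow), and rockphypy (rock physics).
    """

    def __init__(self, max_years=25, seed=0):
        self.max_years = max_years
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # New permeability realization at the start of each episode
        self.perm = self.rng.lognormal(np.log(100.0), 1.0, size=(64, 64))
        self.year = 0
        return self._observe()

    def step(self, action_idx):
        rate_mtpa = 0.2 + 0.1 * action_idx          # nine actions: 0.2-1.0 MTPA
        pressure = self._reservoir_step(rate_mtpa)  # placeholder OPM call
        reward = rate_mtpa                          # placeholder for the profit reward
        self.year += 1
        done = self.year >= self.max_years or pressure > 1.0
        return self._observe(), reward, done

    def _reservoir_step(self, rate):
        # Stand-in for a flow simulation returning a normalized pore pressure
        return rate * self.year / self.max_years

    def _observe(self):
        # Stand-in for time-lapse AVO and gravity forward modeling
        return np.zeros((2, 64, 64))

# One interaction: reset starts a new episode; each step is one injection year
env = GCSEnv()
state = env.reset()
state, reward, done = env.step(0)   # inject at 0.2 MTPA for one year
```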
Action: Target CO2 injection rate
Our action space consists of the target CO2 injection rate, ranging from 0.2 to 1.0 million tonnes per annum (MTPA) at 0.1 MTPA intervals, leading to nine discrete actions. This range is set based on the actual CO2 injection rates from real-world projects at Aquistore (White, 2019) and Quest with an adjustment to our target reservoir. Because injection decisions are made annually for up to 25 years, there are 9^25 possible strategies, and the optimal solution, therefore, cannot be found via an exhaustive search.
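The size of this decision space is easy to verify directly: nine rates chosen once per year over a 25-year horizon give 9^25 (roughly 7 × 10^23) candidate strategies.

```python
import numpy as np

# Nine discrete target injection rates: 0.2 to 1.0 MTPA in 0.1 MTPA steps
actions_mtpa = np.round(np.arange(0.2, 1.01, 0.1), 1)
assert len(actions_mtpa) == 9

# One decision per year over a 25-year project rules out exhaustive search
n_strategies = 9 ** 25
print(f"{n_strategies:.2e} possible strategies")
```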
Reward: Profit given induced seismicity risk
States: Time-lapse AVO and gravity monitoring data
In RL theory, a state is a snapshot of the environment at time step t, containing the information the agent needs to make a decision about its next action. We consider two geophysical monitoring methods that are widely used or considered effective for GCS: time-lapse seismic and gravity monitoring.
Fluid substitution and AVO forward models
For simplification, we make several approximations for AVO forward modeling. First, we assume a homogeneous overburden with constant elastic moduli on top of the target formation. Second, we assume constant temperature and salinity for the fluid mixture, and constant overburden pressure. Third, we assume a single rock-physics model for the entire target formation. Under these three assumptions, we use the fluid substitution model based on Gassmann’s equation to estimate the impacts of changes in effective pressure and CO2 saturation on the elastic moduli of the target formation.
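A minimal sketch of this fluid substitution step follows, combining Gassmann's equation with a Reuss (Wood's) average for the brine-CO2 mixture. All moduli and the porosity are illustrative placeholders, not the paper's calibrated rock-physics values.

```python
import numpy as np

def gassmann_ksat(k_dry, k_min, k_fluid, phi):
    """Saturated bulk modulus (GPa) from Gassmann's equation.

    In Gassmann's model the shear modulus is unaffected by the pore fluid,
    so only the bulk modulus needs updating after fluid substitution.
    """
    num = (1.0 - k_dry / k_min) ** 2
    den = phi / k_fluid + (1.0 - phi) / k_min - k_dry / k_min**2
    return k_dry + num / den

def wood_fluid_modulus(k_brine, k_co2, s_co2):
    """Reuss (Wood's) average bulk modulus of a brine-CO2 mixture."""
    return 1.0 / (s_co2 / k_co2 + (1.0 - s_co2) / k_brine)

# Illustrative values: dry-rock and mineral moduli (GPa), porosity
k_dry, k_min, phi = 8.0, 37.0, 0.25
k_fl_base = wood_fluid_modulus(k_brine=2.8, k_co2=0.08, s_co2=0.0)   # pre-injection
k_fl_mon = wood_fluid_modulus(k_brine=2.8, k_co2=0.08, s_co2=0.3)    # 30% CO2
dk = gassmann_ksat(k_dry, k_min, k_fl_base, phi) - gassmann_ksat(k_dry, k_min, k_fl_mon, phi)
# Injected CO2 softens the pore fluid, so the saturated bulk modulus drops (dk > 0),
# which is the time-lapse effect the AVO responses are sensitive to.
```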
Time-lapse gravity monitoring
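For illustration, the vertical gravity change recorded at surface stations can be approximated by summing point-mass contributions from the density change in each reservoir cell (CO2 replacing denser brine). This is a crude sketch of the physics, not the forward modeling used in the paper, and the depth, density change, and cell volume below are assumptions.

```python
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def delta_gz(stations_xy, cells_xyz, delta_rho, cell_volume):
    """Time-lapse vertical gravity (microGal) at surface stations.

    Each cell with density change delta_rho (kg/m^3) is treated as a
    point mass at depth dz below the (flat) survey surface.
    """
    gz = np.zeros(len(stations_xy))
    dm = delta_rho * cell_volume                 # mass change per cell (kg)
    for i, (sx, sy) in enumerate(stations_xy):
        dx = cells_xyz[:, 0] - sx
        dy = cells_xyz[:, 1] - sy
        dz = cells_xyz[:, 2]                     # depth below station (m)
        r3 = (dx**2 + dy**2 + dz**2) ** 1.5
        gz[i] = G * np.sum(dm * dz / r3)         # m/s^2, positive downward
    return gz * 1e8                              # 1 m/s^2 = 1e8 microGal

# Hypothetical single cell at 1 km depth where CO2 lowers the bulk density:
stations = [(0.0, 0.0)]
cells = np.array([[0.0, 0.0, 1000.0]])
gz = delta_gz(stations, cells, delta_rho=np.array([-200.0]), cell_volume=1e7)
# A density decrease yields a negative, microGal-scale time-lapse anomaly
```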
Noise contamination
One of the main goals of this work is to understand how DRL agents perform in the presence of geophysical noise (i.e., endogenous uncertainty in the state). Figure 4 shows examples of pore pressure and CO2 saturation distributions, together with the corresponding noise-free and noisy (with noise levels specified as a signal-to-noise ratio [S/N] in dB) time-lapse AVO and gravity signals. The resulting maps show that the reflectivity changes are sensitive to changes in CO2 saturation and pore pressure P, whereas the time-lapse gravity signal is sensitive to changes in CO2 saturation (and indirectly sensitive to pore pressure via the Batzle-Wang equation for the density of the fluid mixture and saturated brine).
One fundamental difference between the AVO reflectivities and the time-lapse gravity signal is that although the changes in the reflection coefficients originate at the lithologic boundary, the time-lapse gravity signal is measured at the gravity meter stations. This difference, combined with differences in the governing physics between elastic and potential fields and the fact that gravity responds only to changes in density, results in a variable sensitivity to reservoir changes (i.e., less sensitivity in the gravity measurements compared with AVO).
We recognize that directly using these reflectivity and gravity maps instead of the raw data recorded from a 3D grid of geophones or gravity stations assumes that a suite of data processing workflows has been conducted after each monitoring survey. We also note that gravimeter noise is considered to be additive (e.g., 3 μGal for land-based gravity surveys; Krahenbuhl et al., 2011; Van Camp et al., 2017); an S/N of 30 dB corresponds to gravity noise of approximately 3 μGal after approximately 5 million tonnes of CO2 injection.
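Contaminating a simulated state at a prescribed S/N is straightforward: scale white Gaussian noise so that the signal-to-noise power ratio matches the target in dB. The snippet below is a generic sketch of this step; the test signal is arbitrary.

```python
import numpy as np

def add_noise_snr_db(signal, snr_db, rng):
    """Add white Gaussian noise so the result has the target S/N in dB."""
    p_signal = np.mean(signal**2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(p_noise), signal.shape)

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 4 * np.pi, 4096))   # arbitrary test signal
noisy = add_noise_snr_db(clean, 30.0, rng)

# The measured S/N of the contaminated signal is close to the 30 dB target
snr_est = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean) ** 2))
```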
Agent: Double deep Q-network
We use a double deep Q-network (double DQN; DDQN) (Van Hasselt et al., 2016) as the agent that learns the optimal GCS operating policy for the problem at hand. In brief, through trial and error, the DDQN method determines the value of each action that can be taken in each state to develop its behavioral policy; this action-value function, denoted Q(s, a), represents the expected cumulative reward an agent can achieve starting from state s when taking action a. This function can be approximated by convolutional neural networks (CNNs) to work with image-like inputs, such as maps of time-lapse gravity and AVO monitoring data. In other words, the Q-network outputs the expected cumulative reward (i.e., the cumulative normalized profit of GCS operation — see equations 1–3) for each target injection rate (action) based on the observed geophysical monitoring data (state). Thus, the objective of DQN/DDQN training is to learn an accurate representation of the Q-network, which can subsequently be used to select the optimal action corresponding to the largest Q-value. Following standard deep-learning practice, we also normalize the input features. Table 3 provides the DRL parameters used for this study, whereas the specific CNN architectures are described in Appendix A. For further description of the underlying theory and the detailed implementation of DDQN, we refer to Van Hasselt et al. (2016). When training the DDQN network, there are two main computational loads: one due to the gathering of experiences (states and rewards after an action is taken) via reservoir simulation and subsequent geophysical simulation (see Figure 1) and another due to the update of the neural network using these experiences. We use multiple central processing units (CPUs) in parallel for experience-gathering and graphics processing units (GPUs) for updating the neural network (see Figure 5).
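The core double-DQN idea from Van Hasselt et al. (2016) — the online network selects the next action while the target network evaluates it, reducing the overestimation bias of vanilla DQN — can be written in a few lines. The sketch below replaces the CNNs with arbitrary arrays of Q-values; batch contents are hypothetical.

```python
import numpy as np

def ddqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double-DQN bootstrap targets.

    `next_q_*` have shape (batch, n_actions); in this paper n_actions = 9
    injection rates. The online net SELECTS the greedy next action, and
    the target net EVALUATES it.
    """
    a_star = np.argmax(next_q_online, axis=1)                 # selection
    q_eval = next_q_target[np.arange(len(a_star)), a_star]    # evaluation
    return rewards + gamma * q_eval * (1.0 - dones)

# Toy batch of two transitions over nine actions; the second transition
# terminates the episode, so its target reduces to the immediate reward.
rng = np.random.default_rng(0)
targets = ddqn_targets(
    rewards=np.array([1.0, 0.5]),
    next_q_online=rng.normal(size=(2, 9)),
    next_q_target=rng.normal(size=(2, 9)),
    dones=np.array([0.0, 1.0]),
)
```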
The application of our proposed stochastic geophysical control approach is depicted in Figure 6, where a trained DDQN agent (with its corresponding Q-network approximated by a CNN) takes P and S reflectivity and the time-lapse gravity of the target formation as input images (state). This neural network will then output the action-values Q(s, a) via a forward pass. The target injection rate (action) corresponding to the largest Q-value is chosen at each time step via a greedy algorithm.
TRAINING AND APPLICATION EXAMPLES
We aim to investigate the effect of two types of stochasticity within the geophysical control of GCS operations: the presence of geophysical noise and uncertainty in the subsurface static model. We independently train a DDQN agent for the following cases: (1) deterministic (assumed to be known petrophysical models) or stochastic (geostatistical models as shown in Figure 3c) permeability models; (2) states defined by time-lapse gravity measurements, time-lapse seismic AVO measurements, and the combination of these two methods; and (3) multiple degrees of noise contamination, including the noise-free case. Table 4 provides training and evaluation parameters for each DDQN agent considered in this work. The rest of this section is divided into two parts. The first half describes the training results of the various agents, whereas the second half evaluates these agents from the perspective of geophysical noise and static model uncertainty.
Effect of stochasticity on training
True deterministic reservoir model with/without geophysical noise
Figure 7 compares the cumulative reward as a function of episode for the deterministic permeability model cases (meaning the true permeability in Figure 3a is always used while training). The training parameters used for the training of the corresponding agents (Agents 1 to 8) are shown in Table 4. Figure 7a shows the total reward (equation 1 using a discount factor of unity) for varying state definitions when no geophysical noise is present, together with the total reward from the optimal constant injection scenario (the constant rate being chosen from the nine possible actions). The corresponding monetary values in millions of USD (the sum of the profits in equation 2) are shown on the right-hand-side y-axis. As expected, the agents learn different policies with corresponding differences in performance. Using gravity monitoring alone (Agent 1) yields the lowest total reward at the end of the training, followed by using AVO alone (Agent 4). The multigeophysical monitoring case (Agent 7) shows the highest total reward at the end of the training. Such ordering is expected given the higher resolution of AVO compared with gravity and the naturally higher information content when two complementary methods are combined. However, it is interesting to note that the differences in the total rewards are not large (especially compared with the stochastic subsurface model cases that we shall subsequently consider). These results suggest that the difference in the value of subsurface information from different geophysical methods may not be critical to DRL when a (correct) deterministic subsurface model is available and the input state is free of noise. In other words, agents can learn good policies with minimal observations, given no exogenous or endogenous uncertainties. For a detailed comparison between commercial optimization solutions and geophysical control using noise-free gravity alone, we refer to Noh and Swidinsky (2024).
Figure 7b shows the training results from gravity monitoring with different S/N levels of 50 dB (Agent 2) and 30 dB (Agent 3), together with the noise-free case (note that the noise-free gravity case [Agent 1] is shown in both Figure 7a and 7b [in black] for clarity). These three learning curves show a similar level of total reward at the end of the training, which, we believe, is due to the deterministic nature of the environment; however, the case with the highest amount of noise (an S/N of 30 dB in red) somewhat underperforms. Figure 7c shows the training results from the time-lapse seismic AVO data with different S/Ns (Agents 5 and 6), along with the multigeophysics training result using a 30 dB S/N (Agent 8). Note again that the noise-free AVO case (Agent 4) is shown in Figure 7a (in blue) and Figure 7c (in black) for clarity. The training result using seismic AVO data alone with a 50 dB S/N shows learning behavior similar to the noise-free case. In addition, the total reward at the end of the training surpasses the results from all gravity cases (noisy or noise-free) in Figure 7b. However, the training progress using seismic AVO data with a 30 dB S/N shows delayed improvement after approximately the 5000th episode and a lower total reward than any gravity case at the end of the training. Considering the higher subsurface resolution in the time-lapse AVO data compared with the time-lapse gravity data, such a result suggests that severe noise on an input channel with significant information content may degrade DRL performance. The training results from the multigeophysical input with an S/N of 30 dB on both data sets (shown in green in Figure 7c) show that the total reward at the end of the training is comparable to the noise-free and 50 dB AVO cases. When compared with the 30 dB AVO case, a clear advantage of using multigeophysical inputs in the presence of severe noise is evident.
Stochastic reservoir model with/without geophysical noise
Figure 8 shows the training results from the stochastic permeability models intended to mimic a real-world situation wherein the true subsurface static model is not exactly known. Compared with the deterministic training curves in Figure 7, the stochastic results generally show a high variance at later episodes when the exploration rate is low (see Table 3). This behavior is expected, given that new permeability models are realized at the start of each episode. Figures 8a and 8b compare the results for the respective time-lapse gravity (Agents 9 and 10) and time-lapse AVO (Agents 11 to 14) cases. The time-lapse gravity case results in Figure 8a generally show a similar total reward at the end of the training despite the presence of noise. The results from the 50 and 40 dB cases, although not displayed, show the same behavior. The noise-free time-lapse AVO case shows higher total rewards than the noise-free and noisy time-lapse gravity cases. However, as the noise level of the time-lapse AVO data increases, the training performance in Figure 8b is degraded, in marked contrast to the gravity case. In addition, the noise-free and 50 dB AVO cases have similar maximum total rewards, suggesting that a certain level of noise can be handled by the DRL agent even when the subsurface model is stochastically realized during training. Such a finding is similar to the results for Agent 5, shown in blue in Figure 7c, trained in the absence of exogenous uncertainty. When comparing the noise-free time-lapse gravity data (black in Figure 8a; Agent 9) to the time-lapse AVO data (black in Figure 8b) for the stochastic model case, the time-lapse AVO data shows a higher total reward at the end of the training. However, the results from the time-lapse gravity data, with a relatively high noise level (an S/N of 30 dB) in Figure 8c, show a comparable, if not higher, total reward compared with the 30 dB AVO result (blue in Figure 8c; Agent 14).
Considering that AVO data have a generally higher level of subsurface information content in contrast to gravity data, this result suggests that “strong noise” affects the agents trained to use AVO data more severely than those trained to use gravity data (at least in our implementation). However, the 30 dB S/N multigeophysics result in Figure 8c (shown in red) not only surpasses the noise-contaminated single-method results but is also comparable to the training performance using noise-free AVO (shown in black in Figure 8b) and the noise-free multigeophysics result (shown in green). The advantage of using DRL agents trained to use multigeophysical input in the presence of geophysical noise and subsurface uncertainty is remarkable, especially considering the potentially performance-degrading effect of geophysical noise.
Evaluation of the application of the control agent
Deterministic model case
Figure 9 shows the behavior of a DRL agent trained with and applied to the true (deterministic) permeability model. In this case, noise-free multigeophysical measurements were used for training and application (Agent 7 in Table 4), with the corresponding learning curve shown in black in Figure 7c. Figure 9a compares the results derived from the DRL agent with two constant CO2 injection rates, whereas the corresponding (unnormalized) cumulative profits are shown in Figure 9b. The result from the DRL agent (Agent 7) shows generally high injection rates during the early stage of the operation (years 1 and 2), followed by a gradual decrease (years 3 and 4) and a low injection period (years 8 to 9). Starting at year 10, the DRL agent (Agent 7) decides to alternate between increasing and decreasing the injection rates. We believe that such behavior is intended to periodically release the pressure, maximizing the project lifespan and the corresponding cumulative profit. The DRL solution obtains a higher cumulative profit than the two constant injection scenarios, even though it has a shorter overall period of operation compared with the optimal constant injection scenario.
Stochastic model case
Figure 10 shows the behavior of a DRL agent trained with model uncertainty and noise-free multigeophysical data (Agent 15) and applied to a suite of 100 evaluation models. To summarize the results across these 100 models, the average injection rate for each year was plotted as a solid line, whereas the distribution of the injection rate for each year is shown with box plots (that illustrate the minima and maxima via whiskers, and the 25th and 75th percentiles via boxes). Note that due to the stochastic nature of the evaluation (via the geostatistical realizations) and the terminal pore pressure condition, the project lifespan varies across the evaluation models, with fewer actions taken in later years as individual operations terminate. The result in Figure 10 shows a general trend of high injection rates at the start of the operation, followed by a gradual decrease. During the first two years, the injection rate has a low variance across the evaluation models (i.e., the boxes are not visible). As the average injection rate gradually decreases between years 3 and 9, the variance of the injection rate increases. Such behavior indicates that the trained agent selects a relatively consistent injection rate, independent of the geostatistical realization of the permeability model during the first two years, which subsequently varies depending on the observations (i.e., states) generated as a result of these early actions. From year 7 to year 16, the agent shows a wider range of actions that span the minimum to maximum injection rates, although these rates are generally low on average. The occasional high injection rate in later years corresponds to similar behavior shown in the deterministic case in Figure 9.
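Summarizing the per-year injection-rate distribution across evaluation models, as in the box plots described above, amounts to per-column statistics with missing values for terminated projects. The data below are synthetic stand-ins, not the paper's evaluation results.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical injection rates (MTPA) for 100 evaluation models over
# 25 years; NaN marks years after a model's project has terminated.
rates = rng.uniform(0.2, 1.0, size=(100, 25))
rates[rng.random((100, 25)) < 0.2] = np.nan

mean_rate = np.nanmean(rates, axis=0)                  # solid line
q25, q75 = np.nanpercentile(rates, [25, 75], axis=0)   # box edges
lo, hi = np.nanmin(rates, axis=0), np.nanmax(rates, axis=0)  # whiskers
```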
In Figure 11, the probability density functions illustrating the financial performance of the two DRL agents trained with and without subsurface uncertainty are shown. The probability density results were calculated using Gaussian kernel density estimation with Scott’s rule (Scott, 1992) for bandwidth selection. In both cases, the agents were trained on and applied using noise-free multigeophysical measurements (Agents 7 and 15). The profit distribution for the agent that is trained with the true deterministic model (Agent 7) but applied to a suite of geostatistical evaluation models is shown in red, whereas the distribution for the agent trained with model uncertainty (Agent 15) is shown in blue. The profit distribution for the optimal constant injection scenario is shown for reference in black; this constant rate was chosen based on the average cumulative profit over the 100 evaluation models, considering the nine possible injection actions. The constant injection scenario and the agent trained without subsurface model uncertainty generate almost the same average cumulative profit, which increases when the agent is trained with subsurface model uncertainty. Furthermore, compared with the optimal constant injection case, the DRL agent trained with the deterministic permeability model generates more instances of cumulative profit of over 400 million USD, but it also generates more instances of profit below 270 million USD. Such unfavorable outcomes arise from the high injection rate decisions made by the DRL agent leading to early project termination due to excessive pore pressure buildup. This possibility is undesirable as it negatively affects the profitability of the operation and increases induced seismicity risk. 
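The density estimation step can be sketched directly: for 1D data, Scott's rule sets the kernel bandwidth to h = sigma * n^(-1/5). The snippet below is a minimal stand-in for scipy.stats.gaussian_kde, and the profit samples are illustrative, not the paper's evaluation results.

```python
import numpy as np

def kde_scott(samples, grid):
    """Gaussian kernel density estimate with Scott's-rule bandwidth.

    For 1D data Scott's rule gives h = sigma * n**(-1/5) (Scott, 1992).
    """
    n = len(samples)
    h = samples.std(ddof=1) * n ** (-1.0 / 5.0)
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# Hypothetical cumulative-profit samples in millions of USD
rng = np.random.default_rng(0)
profits = rng.normal(350.0, 40.0, size=100)
grid = np.linspace(150.0, 550.0, 401)
density = kde_scott(profits, grid)
# Over a wide enough grid the estimated density integrates to ~1
```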
In contrast, the agent trained with the subsurface model stochasticity shows no instances of cumulative profit less than 320 million USD, demonstrating the advantage of considering the exogenous uncertainty of the petrophysical models used in reservoir simulation when searching for an optimal GCS operating policy.
In Figure 12, the financial performance of DRL agents trained to consider subsurface uncertainty is compared across varying levels of noise during training and evaluation, as well as across different combinations of geophysical measurements. Figure 12a compares the results generated by the DRL agents trained on multigeophysical measurements in the presence and absence of noise (Agents 15 and 16) and subsequently evaluated using either noise-free or noisy data. We observe that the agent trained without noise but applied using noisy measurements performs poorly, obtaining a lower mean cumulative profit than the optimal constant injection scenario. However, an agent trained with noise obtains a mean cumulative profit that exceeds the constant injection case, approaching the performance of a DRL agent trained and evaluated on noise-free data. Figure 12b shows cumulative profit distributions for DRL agents trained and evaluated on noisy time-lapse AVO data (Agent 14) or noisy time-lapse gravity data alone (Agent 10), along with the distribution generated by an agent trained and evaluated on both types of noisy data (Agent 16). As expected, the mean profit obtained using a combination of monitoring data exceeds the mean profit generated using individual geophysical methods, indicating that multigeophysical monitoring strategies add value to GCS optimal control.
DISCUSSION
We have demonstrated that a combination of time-lapse gravity and AVO measurements improves DRL-derived control policies, especially in the presence of geophysical noise. We, therefore, speculate that using more diverse measurements will achieve better outcomes. For example, EM monitoring using controlled sources, which are well known to be sensitive to pore-filling fluids (Commer et al., 2022), could be considered as part of the state signal to enhance performance further. Well-based measurements using EM, gravity, or distributed acoustic sensing similarly have the potential to provide complementary information to the agent (Freifeld et al., 2009; Pevzner et al., 2021). Other possible observations that can be included in the state definition include microseismic monitoring, which can directly detect fractures and/or fault (re)activation (Verdon, 2011), and ground deformation information via interferometric synthetic aperture radar, an informative proxy for the subsurface pressure status (Yang et al., 2015). The inclusion of geomechanical simulation (or coupled flow and geomechanical simulation) as part of the environment should also be considered to more completely describe risks, such as induced seismicity (Tillner et al., 2014). The DRL framework provides a natural platform to integrate multidisciplinary geoscientific methods such as geophysics, flow simulation, and geostatistics with the common goal of sequential decision-making in GCS projects. We believe that this natural integration property is one of the most important features of DRL in the context of subsurface fluid management.
Generally speaking, DRL agents trained on a digital twin of a GCS operation provide a framework for evaluating operational parameters through explicitly defined state-reward-action pairs derived from the digital twin. We refer to this concept as “geophysical control” using DRL-AI. Beyond the novel concept of geophysical control, many open research questions remain to be answered before our approach can be applied in real-world scenarios. First, additional challenges related to subsurface uncertainty are expected. In Noh and Swidinsky (2024), the impact of such exogenous uncertainty on the performance of a policy applied to the unseen true subsurface was demonstrated. Their results show that application performance improves when a more accurate subsurface model is used for training. However, given that a complete reservoir description can never be obtained, future work will focus on tackling this problem so that DRL can eventually be used in the real world. Furthermore, 3D reservoir models should be used to describe real-world complex geologic systems, and the computational cost for simulation and training in such cases will increase significantly. Surrogate modeling approaches using physics-informed neural networks (Shokouhi et al., 2021) or neural operators (Wen et al., 2022) should be considered to reduce potential bottlenecks. Finally, techniques that can effectively distill raw geophysical monitoring data into a usable state should be developed to provide more robust inputs for DRL training and application. For example, inverse models, processed images, and other types of postprocessed information intended to meaningfully extract and reveal the content of the raw data could potentially be used to generate better control policies. However, it remains an open question whether these additional layers of complexity (such as regularization in geophysical inversion) will enhance or degrade DRL performance.
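As an illustration of the state-reward-action formulation, the following is a minimal toy environment in the style of a Gym interface. The class name, rate grid, carbon price, pressure model, and penalties are all hypothetical stand-ins for the digital twin described above, not its actual dynamics:

```python
import numpy as np

class GCSControlEnv:
    """Toy sketch of a geophysical-control environment (all numbers illustrative).

    State:  a noisy synthetic monitoring vector standing in for time-lapse
            gravity/AVO measurements of the growing CO2 plume.
    Action: index into a discrete set of annual target injection rates.
    Reward: injection revenue minus a penalty that grows with pore pressure,
            as a proxy for induced-seismicity risk.
    """

    RATES = np.linspace(0.1, 1.0, 9)   # nine candidate injection rates (Mt/yr)
    PRICE = 50.0                       # credit per tonne (illustrative)
    P_CAP = 10.0                       # pore-pressure cap ending the project

    def __init__(self, horizon=20, seed=0):
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.pressure = 0.0
        self.stored = 0.0
        return self._observe()

    def _observe(self):
        # Stand-in for a geophysical survey: plume signal plus sensor noise.
        return self.stored + 0.01 * self.rng.standard_normal(4)

    def step(self, action):
        rate = self.RATES[action]
        self.stored += rate
        self.pressure += 0.4 * rate    # toy pressure-buildup model
        self.t += 1
        reward = self.PRICE * rate - 2.0 * self.pressure
        done = self.t >= self.horizon or self.pressure >= self.P_CAP
        if self.pressure >= self.P_CAP:
            reward -= 100.0            # early-termination penalty
        return self._observe(), reward, done

# Rollout of a constant mid-range injection policy, for illustration.
env = GCSControlEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    obs, reward, done = env.step(4)
    total_reward += reward
```

Any DRL algorithm that consumes such `reset`/`step` interfaces (DDQN in this study) can then be trained against the digital twin without modification, which is what makes the framework a natural integration point for flow simulation, geostatistics, and geophysical forward modeling.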
CONCLUSION
The long-term goal of our work is to optimize (and potentially automate) the control of GCS operations using a combination of geophysical monitoring data and AI. As a small step toward real-world applications, we investigated the sensitivity of DRL — a sequential decision-making technique — to two different types of stochasticity: an uncertain subsurface model (exogenous uncertainty) and noise in the monitoring data (endogenous uncertainty). In addition, we investigated whether multigeophysical measurements could improve control policies under such circumstances. We independently trained and compared the learning behavior and application performance of various agents provided with or without petrophysical model uncertainties, using different geophysical measurement types (surface time-lapse seismic AVO, surface time-lapse gravity, and combined AVO and gravity monitoring data) and different levels of measurement noise.
When the learning behaviors of the DRL agents were compared, we observed that the agents using more information (gravity monitoring, seismic AVO, and multigeophysics, in ascending order) obtained higher total rewards at the end of training when noise levels were low. Learning performance deteriorates with increasing noise, as one would expect. However, a counterintuitive result is that this effect is more apparent for seismic AVO than for gravity monitoring, indicating that DRL is more sensitive to noisy measurements that provide high-resolution subsurface information. As expected, the use of multigeophysical monitoring significantly improves learning behavior in the presence of noise, an observation that holds for the deterministic and stochastic model cases.
The application of various agents to a suite of evaluation models demonstrates that: (1) training DRL with geostatistical models is critical due to the uncertainty in true subsurface conditions, as policies learned from incorrect deterministic models can lead to poor decisions in practice; (2) training with noise-contaminated geophysical measurements is also important, reflecting the practical reality of geophysical noise; and (3) using multiple geophysical measurements yields better policies in scenarios where both types of uncertainty are present. Future work will investigate how our research can be applied to the digital twins of real GCS operations, including additional considerations such as uncertainties in carbon pricing.
ACKNOWLEDGMENTS
This work was supported by the NSERC Discovery Grant RGPIN-2021-02528 and enabled, in part, by high-performance computing support from the Digital Research Alliance of Canada (alliancecan.ca). The authors also appreciate the GSTools, OPM, and rockphypy communities for providing the open-source geostatistical, reservoir simulation, and rock-physics modeling tools used in this study.
DATA AND MATERIALS AVAILABILITY
Data associated with this research are available and can be obtained by contacting the corresponding author.
APPENDIX A NEURAL NETWORK ARCHITECTURES FOR THE AVO, GRAVITY, AND MULTIGEOPHYSICAL STATE DEFINITIONS WITHIN THE DDQN MODEL
The three state definitions (equations 7, 10, and 11) require different CNN architectures as function approximators within the DDQN model, given the different shapes of the input tensors. When time-lapse seismic AVO and gravity monitoring are combined, each input is processed by its own hidden layers, and the resulting features are concatenated for sensory fusion of the information from each measurement. The detailed topologies of these three architectures are provided in Table A-1.
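The concatenation-based fusion can be illustrated without a deep-learning framework. The following pure-NumPy sketch passes two hypothetical inputs (an AVO image and a gravity map, with made-up shapes and untrained random weights) through separate convolutional branches and concatenates the flattened features before a linear Q-value head over the nine injection actions; the actual architectures are those in Table A-1:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_relu_valid(x, k):
    """2D valid cross-correlation of a single-channel image with one kernel,
    followed by a ReLU activation."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return np.maximum(out, 0.0)

# Hypothetical input tensors: an AVO image and a coarser gravity map.
avo = rng.standard_normal((16, 16))
grav = rng.standard_normal((8, 8))

# Each measurement passes through its own (random, untrained) branch.
avo_feat = conv_relu_valid(avo, rng.standard_normal((3, 3))).ravel()   # 14*14
grav_feat = conv_relu_valid(grav, rng.standard_normal((3, 3))).ravel() # 6*6

# Sensory fusion: concatenate branch features before the Q-value head.
fused = np.concatenate([avo_feat, grav_feat])
W_q = 0.01 * rng.standard_normal((9, fused.size))  # nine injection actions
q_values = W_q @ fused
```

Treating each measurement with its own branch lets the network learn resolution-appropriate features for each modality before they compete in the shared Q-value head.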
Biographies and photographs of the authors are not available.