Much like artists use their skills of abstraction, simplification, and idealization to capture the essence of a landscape, the articles in this issue of *Elements* on reactive transport modeling describe how theories and assumptions about the subsurface world can be translated into constructs of a mathematical world. Such theories may encompass the actions of micro- and macroorganisms, solute and gas transport, the speciation of both solid phases and surfaces, and their myriad interactions. The resulting mathematical world, built up over decades, is then distilled and interrogated numerically using computer models made up of advanced algorithms that tenaciously step through time and space. Although the computer world is inherently multilingual, a large fraction of the existing reactive transport models (RTMs) were developed when Fortran was the dominant language for scientific computing. Ultimately, the job of the computers is to mediate between the abstracted conceptual world of the scientists and the striking, but hidden, complexity of the subsurface world. The RTMs are, hence, diplomats in an ever-expanding quest to understand the past, present, and future of the subsurface. How can we apply these algorithms to test and advance our theories about the subsurface world?

## BUILDING A SITE CONCEPTUAL MODEL

The most common objective in the application of RTMs is to create a description of the subsurface (see Fig. 1) that we can use in two ways: (1) to test and expand our understanding of processes by comparing computational results to the information we derive from real-world measurements; (2) to codify this understanding by expanding the libraries of the mathematical world. How do we decide how big or how detailed the virtual subsurface should be? Despite ever-growing computational power, most reactive transport models are optimized to work over domains larger than ~1 cm (or the lower limit of Darcy's law) but smaller than ~1,000 m. What are the key spatial dimensions and the key transport and (bio)geochemical processes at work within such a domain? An example might be a 100 meter long by 20 meter wide contaminant plume within a 10 meter thick aquifer. Depending on the flow rate and contaminant chemistry, the zone of contamination may extend several meters upgradient and several tens of meters downgradient. A key topic of investigation might be the heterogeneity of the chemical and hydrologic properties of the aquifer and how these properties affect contaminant migration. In this case, we might rely on well logs, geophysical images, pressure gradients, and spatial and temporal variations in contaminant concentrations. Ideally, one set of field measurements will be obtained to address the initial parameterization of the model (*input files*), while another set will be used to test the predictions (*output files*) that it produces.

## BUILDING THE MATHEMATICAL WORLD

To translate a conceptual world into an RTM, the mathematics has to be defined. Thus, we need to develop a quantitative description of the interacting geochemical processes that we infer are at work.

For a contaminant plume, this starts by defining the relevant chemical components that will be the foundation of the underlying reaction network. What are the relevant aqueous complexes and surface complexes? Which minerals can sorb the contaminant? Is calcite dissolution controlling the pH? Are microbes controlling oxygen levels? What are the kinetics of each of these reactions? The stoichiometric relationships, equilibrium constants, types of surface complexes, and the representation of sorption (e.g., type of isotherm) must all be specified quantitatively. Some of these quantities are well known (e.g., equilibrium constants), whereas others, such as sorption capacity and microbial kinetics, must be determined experimentally. For an introduction to quantitative biogeochemical representations associated with reactive transport modeling, see Bethke (2008).
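As a rough illustration of what "a mathematical description of sorption" and "microbial kinetics" can look like, the sketch below writes out two widely used textbook forms: a Langmuir isotherm and a Monod rate law. These are assumed here purely as examples; the article does not prescribe a particular isotherm or rate law, and the function and parameter names are hypothetical.

```python
def langmuir_sorbed(conc, q_max, k_l):
    """Sorbed amount from a Langmuir isotherm (illustrative choice of isotherm).

    conc  : dissolved concentration
    q_max : sorption capacity (typically determined experimentally)
    k_l   : Langmuir affinity constant (1 / concentration units)
    """
    return q_max * k_l * conc / (1.0 + k_l * conc)


def monod_rate(substrate, mu_max, k_half, biomass):
    """Substrate-limited microbial rate from a Monod law (illustrative choice).

    mu_max : maximum specific rate
    k_half : half-saturation constant (same units as substrate)
    """
    return mu_max * biomass * substrate / (k_half + substrate)


# Hypothetical parameter values, chosen only to show the saturating behavior.
for c in (0.1, 1.0, 10.0, 100.0):
    print(c, langmuir_sorbed(c, q_max=0.4, k_l=0.5))
```

Both expressions saturate: sorption approaches `q_max` and the microbial rate approaches `mu_max * biomass` at high concentration, which is why the capacity and half-saturation terms usually have to be constrained by experiment.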

Another key issue to be addressed during reactive transport modeling is how flow and transport will be calculated. Will there be discrete lenses of fine-grained, low-conductivity material, or will uniform conditions be applied throughout the domain? The majority of models use so-called *operator splitting*, where the transport calculations are performed separately from the geochemical calculations, a process that requires smaller time steps for smaller grid cells. As a result, physically resolving subsurface heterogeneity may require a smaller domain or appropriate upscaling techniques. Thus, alternative frameworks have been developed to account for subsurface heterogeneity, including the use of stagnant zones for alluvial aquifers and a concept called *multiple interacting continua* for modeling fracture systems.
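The idea of operator splitting, and the link between grid cell size and time step, can be sketched for a deliberately simple case. The example assumes a 1-D column, explicit upwind advection, first-order decay as the only reaction, and a sequential (non-iterative) split; production RTMs use far richer chemistry and numerics, and all names below are hypothetical.

```python
import math


def max_stable_dt(dx, velocity):
    """Largest time step allowed by the Courant condition for explicit upwind."""
    return dx / velocity


def step_transport(conc, inflow_conc, courant):
    """Advect one time step with an explicit upwind scheme (fixed inflow)."""
    new = conc[:]
    upstream = inflow_conc
    for i, c in enumerate(conc):
        new[i] = c - courant * (c - upstream)
        upstream = c
    return new


def step_reaction(conc, k, dt):
    """React one time step: first-order decay solved exactly in each cell."""
    return [c * math.exp(-k * dt) for c in conc]


def simulate(n_cells, length, velocity, k, inflow_conc, t_end, courant=1.0):
    """Sequential operator splitting: transport step, then reaction step."""
    dx = length / n_cells
    dt = courant * max_stable_dt(dx, velocity)
    conc = [0.0] * n_cells          # initial condition: clean column
    for _ in range(round(t_end / dt)):
        conc = step_transport(conc, inflow_conc, courant)
        conc = step_reaction(conc, k, dt)
    return conc


# Hypothetical 100 m column, 1 m/day velocity, decay constant 0.01 per day.
profile = simulate(n_cells=10, length=100.0, velocity=1.0,
                   k=0.01, inflow_conc=1.0, t_end=200.0)
print(profile)
```

Because `max_stable_dt` scales with `dx`, halving the cell size in this sketch halves the allowed time step and so roughly quadruples the work for a 1-D problem, which is one reason fine-scale heterogeneity is expensive to resolve.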

A comprehensive mathematical description must also include both initial and boundary conditions. A model needs to know what is in the system at time zero and what will be allowed to enter and leave the simulation volume. Initial conditions might include mineral distributions and their surface areas, microbial biomass, and hydraulic conductivity. Boundary conditions tell the model about the entry and exit points for material and for energy fluxes. Where is water entering and what is its composition? Where can it leave? These are represented by mathematical equations that the computer world needs.

## COMMUNICATING WITH THE COMPUTER WORLD

The computer world is looking for a very specific suite of parameters that must be provided in the form of input files that summarize the mathematical representation. There are some 22 different RTMs that are commonly used today. But which one is best suited to achieving your research goal? Table 1 of Steefel et al. (2015) provides a comparison of the different mathematical formalisms. For example, if your plume is supercritical CO₂, then you will need the multiphase flow capabilities provided by a simulator like TOUGHREACT (https://tough.lbl.gov/software/toughreact/). If you want flexible descriptions of sorption, then a variety of other models will work perfectly well. Another option is to look at how the model has been used previously; the user community can be an important resource. Ultimately, selecting a model hinges strongly on the conceptual and mathematical worlds you build, as well as on the realism and precision you need for understanding your virtual subsurface.

When submitting jobs to the computer world it is best to start simple and to build complexity over time. An excellent example of systematic model building is given by Postma and Appelo (2000), who carried out detailed simulations of a column experiment investigating the interactions between iron and manganese in aquifer sediments. By using a stepwise approach that builds up the complexity, the conceptual model can be refined and, ultimately, streamlined.

The majority of RTMs are free to use and are open source. This means that you can download your own version and modify it as you see fit. Most models will come with example cases that you can build from and benchmarks where the inputs and test data are provided. Increasingly, models are moving towards parallelization and extreme “interoperability”, meaning that codes can work together, thereby profiting from the particular strengths of individual models. The move towards cloud-based computing is also increasing the accessibility of high-performance computing by eliminating the need for access to a physical cluster. These approaches can be very useful when solving large-scale and complex problems.

## INTERROGATING THE SUBSURFACE WORLD

The models described above will do their best to create a virtual subsurface world for you to explore. Snapshots in time of selected concentrations, mineral phases, adsorbed species, and other chemical components can be compared to values measured in the real world via 2-D and 3-D contour plots (Fig. 1). Most models lack built-in visualization tools, so people often use the tools of Matlab, R, or Python to manipulate and visualize their results. VisIt is a freely available 3-D visualization and animation tool (https://wci.llnl.gov/simulation/computer-codes/visit) that can handle a wide variety of data formats. In addition to multidimensional images, models allow you to specify observation points, such as concentrations at a multilevel sampling well. At these locations, the model will predict the transient evolution of select species, revealing, for example, how a contaminant plume has evolved over time.
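When a model writes only spatial snapshots, a time series at an observation point can be assembled in post-processing. The sketch below assumes a hypothetical, simplified output layout (a dictionary mapping each output time to a 1-D concentration profile); real RTM output formats differ from code to code, and the function names are illustrative.

```python
def nearest_cell(x_obs, cell_centers):
    """Index of the grid cell whose center lies closest to the observation point."""
    return min(range(len(cell_centers)),
               key=lambda i: abs(cell_centers[i] - x_obs))


def breakthrough_curve(snapshots, cell_centers, x_obs):
    """Time-ordered (time, concentration) pairs at one observation point.

    snapshots : dict mapping output time -> list of concentrations per cell
                (an assumed layout, not any particular model's format)
    """
    i = nearest_cell(x_obs, cell_centers)
    return [(t, profile[i]) for t, profile in sorted(snapshots.items())]


# Made-up snapshots for three cells, for illustration only.
centers = [0.5, 1.5, 2.5]
snaps = {0.0: [0.0, 0.0, 0.0],
         10.0: [0.2, 0.1, 0.0],
         20.0: [0.5, 0.3, 0.1]}
for t, c in breakthrough_curve(snaps, centers, x_obs=1.4):
    print(t, c)
```

The resulting list can be passed directly to whatever plotting tool is at hand and overlaid on concentrations measured at the corresponding sampling well.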

What we ultimately want to know is how well the computer-generated subsurface world corresponds to real-world measurements, given that both have intrinsic biases. To address this challenge, new toolsets are under development that capitalize on machine learning algorithms to assess how well models correlate with physical observations (Scheidt et al. 2018). Because they require fewer simulations to arrive at uncertainty quantification, these new approaches provide an alternative, for large and complex models, to more established tools such as inverse modeling using programs like PEST (http://pesthomepage.org/). Indeed, advanced algorithms are increasingly being used to diagnose and rank processes and parameters in complex models in order to identify the key input parameters, to prioritize data collection efforts, and to guide model simplification.
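The simplest form of model-to-data comparison is a misfit statistic, and the simplest form of inverse modeling is to pick the parameter value that minimizes it. The toy sketch below assumes a steady-state profile for first-order decay along a flow path and a brute-force search over candidate decay constants. It is not how PEST or the machine learning approaches cited above work; it only illustrates the underlying idea, and all names and values are hypothetical.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between simulated and observed values."""
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed must have the same length")
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed))
                     / len(observed))


def steady_profile(k, positions, velocity=1.0, c0=1.0):
    """Assumed forward model: C(x) = c0 * exp(-k * x / velocity)."""
    return [c0 * math.exp(-k * x / velocity) for x in positions]


def calibrate_k(observed, positions, candidates):
    """Return the candidate decay constant with the lowest misfit."""
    return min(candidates,
               key=lambda k: rmse(steady_profile(k, positions), observed))


# Synthetic "observations" generated from a known k, for illustration only.
positions = [0.0, 25.0, 50.0, 100.0]
observed = steady_profile(0.02, positions)
print(calibrate_k(observed, positions, [0.005, 0.01, 0.02, 0.04]))
```

A single misfit number hides the intrinsic biases in both model and measurements, which is part of the motivation for the uncertainty quantification tools discussed above.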

## LEARNING THE LANGUAGE OF THE MATHEMATICAL AND COMPUTER WORLDS

Most of us are quite adept at building real-world conceptual models, but using the languages of the mathematical and computer worlds may be more foreign. Fortunately, there are a growing number of resources becoming available. One option is to take a short course from a developer, such as the outstanding course offered by Geochemist's Workbench (www.gwb.com), which also makes their slides publicly available. The TOUGHREACT developers also offer a yearly short course. Professor Li Li, at Pennsylvania State University (USA), has also developed an open access online course that explains RTM principles (https://www.e-education.psu.edu/png550). Finally, most tertiary education institutions offer courses in reactor modeling (usually in civil, environmental, or chemical engineering departments), fluid mechanics (most engineering programs), hydrogeology, and quantitative geochemistry.

Ultimately, employing the computer world to create an idealized and accessible representation of the subsurface is a process that requires one to draw on many different skills, from conceptualization to basic knowledge of model structure. Yet, much as an artist builds to mastery by practicing skills and learning from others, the growing galaxy of models, combined with ever-expanding cloud computing resources, is expanding geoscience's access to codes and computational power, online resources, in-person training opportunities, and model benchmarks.

**Readers have free access to the “free” material, but MSA holds the rights.**