## Abstract

Three-dimensional (3D) microstructure reconstruction is a key approach to exploring the relationship between pore characteristics and physical properties. Viewing the training image as a prior model, multiple-point statistics (MPS) focus on reproducing spatial patterns in the simulation grid. However, it is challenging to efficiently generate 3D nonstationary models with varying microstructures. In this work, we propose column-oriented simulation (ColSIM) to achieve the stochastic reconstruction of 3D porous media. A heterogeneous system is understood as a spatially evolving process that consists of frequent transitions of small magnitude and abrupt changes of large magnitude. First, a training image selection step is suggested to find representative microstructures. Our program applies modified Hausdorff distance, *t*-distributed stochastic neighboring embedding, and spectral clustering to organize two-dimensional (2D) candidate images. The medoid of each group is applied to guide the following programs. Second, we introduce column-oriented searching into MPS. To save simulation time, a subset of conditioning points is checked to find desired instances. Our program suggests an early stopping strategy to address complex microstructures. Third, a contrastive loss term is designed to create 3D models from 2D slice. To automatically calibrate the volume fraction and simplify parameter specification, the computer consistently monitors the difference between the present model and the target. The performance of ColSIM is examined by 3D multiphase material modeling and 3D heterogeneous shale simulation. To achieve quantitative evaluation, we compute various statistical functions and physical descriptors on simulated realizations. The proposed ColSIM exhibits competitive performance in terms of calculation efficiency, microstructure reproduction, and spatial uncertainty.

## 1. Introduction

Microstructure models are increasingly recognized as a fundamental component in various geological and geophysical applications. Recent research in computational mineral physics has highlighted the need for high-quality realizations to express the microstructure of interest [1]. Compared with the traditional experiment analysis, computational methods provide a low-cost way to build the bridge between spatial structure and physical property [2]. In petroleum exploration, three-dimensional (3D) digital rock becomes an essential tool to study the pore connectivity and permeability [3]. With the development of imaging devices, computed tomography (CT), magnetic resonance imaging (MRI), and focused ion beam have become important sources for directly presenting 3D microstructure of porous media [4, 5]. However, randomness and uncertainty are key concepts in the subsurface system. Only a few accurate models cannot completely represent the variability during microstructure synthesis and evolution. Therefore, stochastic microstructure reconstruction is gaining popularity [6]. Based on a prior image, the computer creates an ensemble of high-quality models. In general, there are two basic steps within the stochastic modeling. At first, a characterization step focuses on representing microstructures with several descriptors and functions. Next, a group of realizations that exhibit comparable property with the target are generated by the reconstruction program.

Statistical functions are widely used to explain geometric characteristics and spatial correlation. Yeong and Torquato characterized 3D materials with the two-point correlation function and achieved reconstruction by a simulated annealing optimization method [7]. The simulated realization is iteratively updated until its statistical difference from the target is lower than a threshold. To improve characterization accuracy, a set of high-order functions is introduced. On one hand, the statistical function is improved to express the complicated microstructure. Lineal path function and two-point cluster correlation function are reported to measure the topological connectivity of porous media [6]. On the other hand, physical descriptors are applied to characterize two-dimensional (2D) and 3D microstructures [8]. For instance, the volume fraction explains the proportion of each material phase. Particle size distribution is effective in describing the porous media [9]. The advantage of physical descriptors lies in explicitly revealing the effect of microstructure on material properties. Furthermore, numerous researchers are devoted to improving optimization algorithms. Genetic algorithms, pixel-swapping heuristics, and model initialization are suggested to save computation time [10, 11]. Apparently, the choice of microstructure descriptors has a substantial influence on reconstruction accuracy and efficiency. A combination of objective functions becomes a feasible way to describe complex microstructures [12]. However, incorporating many functions into the optimization program leads to a high computational burden and the local optima problem.

The limitations of statistical functions and physical descriptors motivate the application of image processing and geostatistics in the field of microstructure reconstruction. The representative methods include texture synthesis (TS), Markov Chain Monte Carlo (MCMC), and multiple-point statistics (MPS) [13, 14]. Instead of a characterization function, a training image (TI) is used to express the microstructure under consideration. To extract morphological features, the computer focuses on identifying the relationship between neighboring pixels. Next, a reconstruction procedure is launched. The program scans the simulation grid (SG) pixel-by-pixel along a specified or random path. The unknown point is predicted by pasting a matching structure from TI to SG. With the objective to improve modeling quality, numerous machine learning techniques are incorporated into TS, MCMC, and MPS frameworks. A range of methods vary in their choices of neighborhood geometry, similarity measurement, searching strategy, and prediction method [15, 16]. Further technical improvements include high-dimensional modeling, anisotropic structures, multiphase materials, and multiresolution analysis [17, 18]. Moreover, the development of deep generative model has attracted considerable attention [19, 20]. The core idea of generative adversarial network (GAN) is the zero-sum game. Based on a large amount of TIs, the generator network attempts to create new models with the desired features. In comparison, a discriminator network focuses on flagging synthetic images. After numerous backpropagation iterations, two networks converge to Nash equilibrium. Moreover, substantial efforts are made to help GAN to honor the conditioning data [21, 22]. On one hand, the geostatistical models become TIs to express the spatial patterns of the subsurface system. On the other hand, the generator network incorporates the well measurement and the seismic interpretation to create an ensemble of realistic subsurface models [23]. The growing applications of GAN contain sandstone [24], shale [25], geological facies [26, 27], subsurface structures [28], and so forth.

Though many improvements have been achieved, reconstructing the heterogeneous microstructure is a challenging task. On one hand, a major limitation is the stationary and uniform assumption in TS, MCMC, and MPS. According to prior knowledge, the modeling method concentrates on simulating description functions or spatial patterns. However, there are diverse structures in a nonstationary material system. It is difficult to reproduce microstructures at the appropriate position. Unrealistic models lead to a bias in the property prediction. On the other hand, the technical limitations of MPS have a negative impact on reconstruction speed and quality. The existing row-oriented dataset is not efficient when a complicated query is provided. As a widely used method to create 3D model, probability aggregation cannot adaptively tune its weight vector. In order to build a model with desired property, the user has to consume considerable time to find the optimal parameters.

In this paper, we explore a way to perform MPS on a heterogeneous system. A group of 3D models with accurate microstructure has a positive effect on investigating the relationship between pore characteristics and physical property. The core idea of our column-oriented simulation (ColSIM) is to refer to a nonstationary material as a stochastic process consisting of frequently small changes and rarely abrupt shifts. First, we propose a TI selection method to find representative microstructures. Aiming at measuring the similarity between candidate images, modified Hausdorff distance (MHD) is applied to calculate the distance between edge points. The image similarity becomes a measure of morphological variation in the heterogeneous system. Our program conducts *t*-distributed stochastic neighboring embedding (tSNE) and spectral clustering to organize 2D slices. The medoid of each group has a high rank in the subsequent simulation program. Second, column-oriented searching is suggested to accelerate stochastic reconstruction. Given conditioning data, we efficiently retrieve matching structures from MPS dataset. The recursive searching in the previous MPS is removed by our early stopping strategy. Third, the proposed program fulfills 3D modeling from 2D TI. With the intention of simplifying parameter specification, a contrastive loss term is introduced in the probability aggregation framework. The volume fraction of each material phase is automatically calibrated.

As the first application, a porous medium with three material phases is adopted to test our method. Based on a 2D slice, the proposed ColSIM generates an ensemble of 2D and 3D models. Next, we examine the performance of our method in a heterogeneous shale system. Fourteen representative slices are selected from a 3D model of size 128 × 128 × 128 pixel^{3}. A group of 3D shale realizations are rapidly created to express uncertainty in the petroleum reservoir. The computer checks reconstructed models with not only morphological characteristics but also physical properties. Statistical function, analysis of distance (ANODI), electrical conductivity, and pore volume distribution are applied to comprehensively evaluate the reconstruction quality. The modeling results indicate that the proposed ColSIM exhibits impressive performance in terms of computational efficiency, microstructure reproduction, and uncertainty quantification. The microstructure in the TI is well preserved and regenerated in 3D realizations.

## 2. Related Work

In this section, we provide a brief review on MPS microstructure reconstruction. At first, the previous MPS dataset and point simulation are discussed in detail. Given a TI as prior model, the computer applies a row-oriented dataset to collect spatial patterns. During the reconstruction step, the matching structure is found from the dataset and pasted into SG. The searching speed plays a decisive role in MPS running time. Next, we explain the probability aggregation framework. Based on several conditional probabilities, a global probability is generated to predict an unknown point in 3D model. The weight vector in the probability aggregation becomes a contributing factor to the reconstruction quality.

### 2.1. Row-Oriented Dataset in MPS

Dataset construction and pattern retrieval are key steps in many MPS programs. As the first practical MPS program, single normal equation simulation (SNESIM) applies a search tree to collect patterns [15]. To alleviate memory issue, the list structure is employed by improved parallelization (IMPALA) [16]. Figure 1 provides a conceptual example to explain the dataset construction. As shown in Figure 1(a), the prior knowledge comprises a TI and an SG. Based on a predefined template in Figure 1(b), the computer visits a point in TI and captures a pattern. During the pattern extraction, template points are sorted in accordance with their distances with the center $u$. For instance, a pattern centered at the point $a1$ is $Pa1=a1+u1,a1+u2,a1+u3,a1+u4=(3,3,2,2)$. Moreover, the program records the value of the template center with the occurrence vector. Since there are three phases in TI, $(O1,O2,O3)$ becomes the basic form. In this case, we create $O1,O2,O3=(0,0,1)$ because the state of the point $a1$ is 3. Based on the preceding procedure, the computer visits every available point in TI. Twenty-five patterns are found. The occurrence vector is updated by gathering the center of each pattern. After the pattern extraction procedure, IMPALA ranks training patterns according to the number of benchmark phases. In this example, the state 3 is chosen because it is the majority in TI.

After dataset construction, MPS launches a reconstruction program to generate realizations. Along a random path, the program visits an unknown point in SG. As shown in Figure 2(a), an unknown point $b1$ is examined. With the template used above, the program creates a conditioning pattern $Pb1=(3,0,1,0)$. We denote the value of unknown point as 0. Thus, the possible number of reference state 3 in matching patterns ranges from 1 to 3. Next, a searching step is conducted in Figure 2(b). The program compares the query $Pb1$ and every candidate instance. In this case, the program obtains two compatible patterns, $P6=(3,2,1,2)$ and $P8=(3,3,1,1)$. The corresponding occurrence vectors are $(0,2,0)$ and $(1,0,0)$. Therefore, the conditional probabilities of three phases are 0.33, 0.67, and 0, respectively. On the basis of those probabilities, the program produces a random sample as the result of MPS point simulation.

One issue in the preceding step is that there is no matching pattern in the dataset. For example, the compatible pattern does not exist when a query $Pb2=(3,2,3,2)$ is provided. Thus, a pruning strategy is applied by SNESIM and IMPALA. The program removes the farthest known point and creates a new conditioning pattern. In this case, a new instance $(3,2,3,0)$ is input into the searching procedure. Nevertheless, the program does not find a matching pattern. This pruning strategy is consistently performed until the number of compatible patterns is larger than a threshold. In this case, the program discards two informed points and upgrades the conditioning pattern into $(3,2,0,0)$. $P6=(3,2,1,2)$ and $P7=(3,2,2,1)$ are output as the searching result. In the pruning strategy, a fundamental concept is that the point close to the template center plays a substantial role. There is no distance calculation within the pattern retrieval program. Though it has three matching elements, $P11=(3,3,3,2)$ is not regarded as a compatible instance with $Pb2=(3,2,3,2)$. Furthermore, the number of template points is a key factor to searching speed. With the intention of extracting complex patterns, an extending template is broadly employed in practical MPS applications. Aiming at finding a completely matching pattern, the computer has to repeatedly launch the pruning and retrieval procedures. This recursive searching strategy results in time consumption during microstructure reconstruction.

Moreover, a noticeable characteristic of the existing MPS dataset is that there is an exponential relationship between searching time and the number of unknown points in the conditioning pattern. A growing number of unknown points leads to a dramatic increase in searching scope. Since there are few known points in SG, the speed issue becomes serious at the beginning of reconstruction. For example, the program visits a point $b3$ and produces a conditioning pattern $Pb3=(0,2,0,0)$ in Figure 2(a). Therefore, the possible number of phase 3 in matching patterns is in the range [0, 3]. As Figure 2(d) displays, eleven training patterns are involved in the searching step. The program outputs $P2$, $P6$, and $P7$ as the result.

### 2.2. Probability Aggregation Method to Create 3D Models from a 2D Slice

In Earth science, 3D microstructures have an important influence on physical properties. Various imaging devices, such as CT and MRI, are able to explicitly observe 3D structures. However, the tradeoff between image resolution and receptive field brings difficulty to practical applications. It is costly to directly generating a high-resolution image of big size. In comparison, collecting a high-resolution 2D slice is relatively convenient. Therefore, researchers propose a variety of MPS programs to produce 3D models using 2D cross-sectional image. In particular, probability aggregation has attracted attention [29]. The advantages include easy implementation, high simulation quality, and multiresolution modeling.

The core idea of probability aggregation is to approximate a 3D probability by combining several conditional probabilities that are computed according to 2D patterns. First, MPS visits an unknown point in 3D grid. Second, three 2D conditioning patterns are individually created along perpendicular directions. Third, the computer independently launches searching programs and calculates three conditional probabilities. Fourth, a probability aggregation expression is employed. As a broadly used metric, Bordley formula is defined as follows:

where $\Theta G$ is the odds of 3D conditional probability $pG$. There are four probabilities on the right side of Bordley formula. $p0(k)$ is the prior probability of the phase $k$. In general, the proportion of each phase in TI becomes the prior knowledge. By contrast, $p1$, $p2$, and $p3$ are conditional probabilities calculated from three 2D patterns. Based on the first 2D pattern, $p1(k)$ denotes the possibility that the unknown point is the phase $k$. Moreover, the weight $wi$ is a predefined parameter to determine the contribution of every term. Based on the 3D probability, Monte Carlo sampling is activated to predict the value of the unknown point. The preceding program is repeated until there is no uninformed point in SG.

Weight specification is a labor-intensive step within the existing probability aggregation method. By convention, the same contribution is associated with each 2D plane. The weights $w2$ and $w3$ are identical to $w1$. Therefore, $w1$ becomes a fine-tuned parameter. Since weights in Bordley formula are predefined and fixed, the program cannot adaptively calibrate the volume fraction or other properties during the reconstruction. The weight vector has to be iteratively adjusted until a desired 3D model is created. For example, the Berea sandstone reconstruction implies that the weight setting has a significant impact on porosity as well as connectivity [29]. Eleven different values of $w1$ are separately examined to synthesize 3D models. Considerable time is spent on the parameter configuration step.

## 3. Proposed Method

In this work, we explore a way to efficiently generate 3D heterogeneous microstructure models. First, a TI selection is presented. Our program organizes 2D candidate slices according to their similarities. The medoid of each group is given top priority in the downstream simulation program. Second, we propose column-oriented searching to accelerate MPS modeling. Rather than the row-based dataset in previous methods, our column-oriented dataset is able to rapidly find matching patterns and remove redundant calculations. Third, a contrastive loss term is introduced into probability aggregation framework. In order to simplify parameterization, our program adaptively tunes the 3D conditional probability on the basis of the difference between the present model and the target.

### 3.1. TI Selection with MHD, tSNE, and Spectral Clustering

Heterogeneity and nonstationarity are common scenarios in the 3D porous material research. In general, heterogeneity means there are diverse microstructures in one model. Therefore, only one image cannot provide sufficient prior knowledge to inspire MPS simulation. There are three broadly used strategies to handle heterogeneous structures. The first method is to manually divide the heterogeneous TI or SG into several stationary subareas [30]. Each subarea has its specified characteristics. However, the partition strategy heavily relies on the understanding of the microstructure system of interest. Furthermore, discontinuities between local areas become a key drawback. As the second way, a spatially contiguous map becomes the auxiliary variable to help the simulation program [31, 32]. Viewed as the primary variable, the original TI focuses on expressing the morphological characteristics under investigation. By comparison, the auxiliary variable defines the occurring locations and trend of spatial structures. Despite numerous applications, one major challenge of the auxiliary variable is the weighted aggregation. The contributions of primary and auxiliary variables are case-dependent in the real-world scenario. The limitation of preceding two methods inspires the development of TI selection [33-35]. Based on several machine learning techniques, researchers concentrate on reproducing the spatial and temporal evolution of morphology. The representative structures are used to guide the following programs. One key advantage of TI selection method is the parameter specification. Compared with domain partition and auxiliary variable, there are few user-defined parameters in the TI selection method.

Figure 3 displays a 3D heterogeneous shale system with a resolution of 2.8 µm. There are two material phases: pore and grain. Our program views a 3D model as a 2D image sequence. Let $TIz$ denote an individual image whose z-coordinate is $z$ in this porous medium. The total number of 2D images is 128, while only five samples are depicted. It is clear that there are intensive variabilities between $TI0$ and $TI64$. A long-range connected component exists in $TI0$. The fracture becomes a key path to water flow. By comparison, there are many isolated pores in $TI32$. Furthermore, $TI64$, $TI96$, and $TI127$ share similar porosity and pore size. In these images, the big pores are surrounded by a set of small pores.

Aiming at reproducing heterogeneous structures and quantifying uncertainty, we introduce TI selection into 3D microstructure reconstruction. A heterogeneous material is equivalent to a series of small-scale transitions and abrupt shifts. As shown in Figure 4, there are three major steps in our TI selection program. At first, a distance metric is applied to distinguish 2D candidate slices. In this work, we apply MHD to calculate the geometrical similarity between two images. A large value indicates that there is a change of large magnitude between two slices. On the other hand, a smooth transition occurs when the distance is small. The workflow of MHD includes the following components: (1) edge detection becomes necessary to preprocess 2D slices. The edge of pore structures is identified by the Roberts operator. Let $E(1)$ denote the edge set in the first image, and $ei(1)$ denote the $i$-th edge point. It is worth noting that there is no repetitive element in the point set. (2) The program calculates the distance $Dei(1),E(2)$ between one individual point $ei(1)$ and a set $E(2)$. This distance is defined as the minimum:

In this work, we apply the Euclidean distance as the measure of distance between two edge points. (3) The distance $DE(1),E(2)$ between two point sets is computed by:

where $N1$ is the number of edge points in set $E(1)$. (4) The proposed method conducts maximization to enable the distance to become symmetrical. It should be noted that the distance $DE(1),E(2)$ is not equal to the distance $DE(2),E(1)$ in equation (4). These two measures do not satisfy the axiom of distance. Thus, the resulting MHD between two images is defined as:

MHD provides an effective and efficient way to measure the similarity between two slices. The technical details of MHD are discussed by Scheidt et al. [33]. In a 3D system, our method compares every two slices. Figure 4(b) provides the distance matrix between five example images. The deep blue indicates a high value of distance. Apparently, $TI0$ and $TI32$ exhibit large differences with other images. By contrast, there is a close affinity between $TI64$, $TI96$, and $TI127$. Since there are 128 candidate images, a distance matrix of size 128 × 128 is produced in this shale case. Then, the dimension reduction technique is applied to visualize the relationship between candidate images. In this work, we use tSNE. As an emerging manifold learning method, tSNE is devised to project each instance in a low-dimensional map. In the feature space, the similar examples are close, while the mismatching instances have a large distance. The main advantages of tSNE include (1) detecting nonlinear relationship and (2) preserving the local structure between instances. Within the dimension reduction framework, compromise is an inevitable part when the computer converts high-dimensional data into the low-dimensional space. The proximity between 128 images may be distorted. Aiming at facilitating the downstream procedures, tSNE focuses on preserving the local structure but may deform the global relationship. In other words, the program is capable of representing the distance between similar images. By contrast, the similarity between distant instances may be corrupted. In order to balance the local and global relationship, the perplexity becomes a key parameter in tSNE. A low value of perplexity indicates that a small neighborhood is considered. The program pays close attention to preserve the local structure. By comparison, the computer incorporates more mismatching instances when high perplexity is given. The global trend is well conserved. In our program, we specify the perplexity as thirteen, which is 10% of the number of 2D images.

In addition, a notable aspect of tSNE is the randomness in its output. In order to conserve the similarity between data points, tSNE minimizes the Kullback-Leibler (KL) divergence between the joint probabilities of the low-dimensional feature vectors and the high-dimensional data. Since the loss function is nonconvex, the gradient descent program in tSNE converges to the different solutions depending on the initial conditions. With the aim of creating a desirable result, the tSNE program is carried out twenty times. Our program provides the compressed features with the minimum KL divergence to the downstream tasks. Based on the MHD distance matrix, tSNE visualization result is shown in Figure 4(c). Each node represents a 2D image. The color map indicates the $z$-coordinate of each candidate slice. Since there is low similarity, $TI32$ has a large distance with other examples in the feature space. By contrast, $TI64$, $TI96$, and $TI127$ are located on the right side. This figure provides strong evidence that there is morphological evolution in the shale system.

Based on the tSNE result, our method focuses on grouping shale images. As an unsupervised learning technique in the field of machine learning, the clustering program is adopted to partition training instances into several groups. Similar examples get assigned to the same group. By comparison, instances in different groups have high disagreement with one another. In this work, we employ the spectral clustering method. Compared with other techniques, spectral clustering benefits from the connectivity between instances. There is no strong assumption on the statistics of the clusters. Moreover, one key parameter in spectral clustering is the number of groups. Based on the observation in Figure 3(c), there is strong morphological evolution within the heterogeneous shale system. Significant differences are found between $TI0$, $TI32$, and $TI64$ . $TI0$ is characterized by the long-distance connected component. Many isolated pores of small size exist in $TI32$. In contrast, $TI64$, $TI96$, and $TI127$ share the comparable porosity and pore sizes. Therefore, it can be assumed that there are three groups within the shale system. The spectral clustering result and medoid images are shown in Figure 4(d). In this work, we define the medoid as the 2D slice closest to the geometrical center of each cluster. According to the calculation result, the computer assigns the image associated with long-range microstructures to the red group. By comparison, the blue cluster accounts for isolated pores of small size. Images that contain pore structures of modest size belong to the green cluster. The program groups $TI64$, $TI96$, and $TI127$ together.

In order to simplify our method, we apply Silhouette score to find the optimal parameter in the spectral clustering. The program individually computes the mean intracluster distance $Dintra$ and mean nearest-cluster distance $Dnear$ for each sample. The former implies the average distance between each instance within the same cluster. On the other hand, the latter is equal to the smallest mean distance between a sample and all examples in the other groups. Silhouette score is defined as:

The value of Silhouette score ranges from −1 to 1. The best value 1 indicates that clusters are well separated and clearly distinguished. Therefore, we test a range of group numbers in the spectral clustering. The clustering program with the maximum Silhouette score is output as the optimal configuration.

In this case, fourteen image groups are identified based on MHD, tSNE, and spectral clustering. Next, our program selects the medoid of each group as a representative. Furthermore, we divide the SG according to the clustering result. The shale system is partitioned into fourteen subareas. The morphological characteristics of each subarea are defined by the corresponding medoid image.

### 3.2. Column-Oriented Dataset to Accelerate Pattern Searching

As mentioned in section 2.1, dataset is an important component within MPS. Writing and searching are two major functions of MPS dataset. At first, MPS scans TI along a predefined path and stores patterns in the dataset. There is a dramatic expansion during dataset construction. Next, the modeling program concentrates on finding desired patterns from the dataset. It is worth noticing that there is no writing task during the MPS simulation.

The characteristics of MPS dataset motivate us to introduce column-oriented searching into the MPS framework. In the big data community, there are two ways to organize relational databases [36]. On one hand, row-oriented databases are designed to achieve online transactional processing. The core task is to quickly write an individual record into dataset. As a horizontally partitioned structure, row-oriented dataset views every instance as an operating unit. However, one limitation of row-oriented dataset is the searching efficiency. In order to find a desired instance, the row-oriented dataset has to read a single row of data and test its attributes. On the other hand, column-oriented databases have gained popularity in the context of online analytical processing. Different from the row-oriented, the column-oriented dataset is divided vertically. Therefore, one key benefit is the retrieval speed. Rather than the whole instance, only a subset of attributes is checked by the searching program.

Figure 5 provides a conceptual example to discuss column-oriented searching within MPS framework. To promote reasonable comparison, three query instances in Figure 5 are the same as the conditioning patterns in Figure 2. Like the previous method, the template point with a small distance to the center is given top priority in the pattern lookup. Compared with row-oriented searching in section 2.1, there are three key points in the proposed method. (1) The searching direction is vertical. (2) Our program tests one attribute at a time. (3) Only known points in the conditioning pattern are examined. Based on the conditioning pattern $Pb1=(3,0,1,0)$, the main steps of the column-oriented searching are shown in Figure 5(a). Since the first known point in $Pb1=(3,0,1,0)$ is the phase 3, the first five training examples are eliminated. Next, the searching program focuses on the second known point. Thus, compatible patterns $P6$ and $P8$ are found by the searching program. Compared with Figure 2(b), column-oriented searching has the same result as the row-oriented searching program. Our method is more efficient because only a subset of attributes is examined.

Our column-oriented searching becomes more powerful than the existing method when there are many unknown points in the query. As mentioned in section 2.1, the searching time of row-oriented method has an exponential relationship with the number of unknown points. However, there are few informed points at the beginning stage of microstructure reconstruction. Missing data brings a large searching scope to the row-oriented dataset. By comparison, our column-oriented program benefits from unknown points. For example, $Pb3=(0,2,0,0)$ is captured from Figure 2(a). Because there is one informed point, the program only checks the second column in Figure 5(b). It is unnecessary to test other attributes. Consequently, $P2$, $P6$, and $P7$ are output as the searching result.

With the modeling program proceeds, the number of informed points has increased in SG. However, growing known points may result in the shortage of completely matching patterns. As section 2.1 states, the existing program applies a pruning strategy. The computer discards the farthest point and creates an updated query. The searching procedure is repeatedly performed until the number of compatible patterns meets a predefined threshold. In our method, we insist on finding the completely matching pattern. An early stopping strategy is proposed to tackle complex query and avoid redundant calculation. Instead of removing the farthest point in the pruning strategy, our program focuses on checking the close points. Figure 5(c) provides an example. Suppose that the threshold to define the number of matching patterns is 1. Based on the known points in $Pb2=(3,2,3,2)$, our program has gradually shrunk the searching scope. The first conditioning point in $Pb2$ enables the computer to concentrate on the last seven patterns. Next, $P6$ and $P7$ are selected because the state of the second informed point is 2. However, the candidate set becomes empty after the program examines the third column. Thus, the matching patterns found by the second column searching are output. There is no pattern update step in our method. Similar to the pruning strategy, the sequence of template points is a decisive factor in our early stopping strategy. The point close to the template center has more importance than the distant points. Despite there are three identical points, $P11=(3,3,3,2)$ is eliminated by our program because its second element is not consistent with the query $Pb2=(3,2,3,2)$.

It is worth noting that the checking sequence is a key component in our column-oriented searching program. Prior to the right side, the left columns are inspected by the proposed method. In other words, the computer allocates high rank to the conditioning points that are close to the template center. When the candidate set becomes empty, our early stopping strategy outputs the searching result. The rest of conditioning points are not visited. A basic idea behind this design is that there is an intensive correlation between template center and its close neighbors. During the simulation, template points with small distances are more important than long-range points.

The preceding three examples provide strong evidence that column-oriented searching is efficient to find matching patterns from the dataset. In addition, there is no quality degradation within the proposed dataset retrieval step. Given a query pattern, both row-oriented and the column-oriented searching programs output identical results. Consequently, the proposed program has comparable accuracy to the previous method while significantly saving the running time.

### 3.3. Probability Aggregation with Contrastive Loss to Reproduce 3D Microstructure

Based on the proposed TI selection and column-oriented dataset, our program finds the representative microstructures and speeds up the pattern searching. The next important technical problem is to reproduce 3D structures according to 2D slices. As stated in section 2.2, Bordley formula is widely used to aggregate a 3D probability from several 2D probabilities. However, one major limitation is the parameter specification. The program has to iteratively adjust weights until desirable realizations are yielded.

In this work, we introduce the contrastive loss term into the Bordley formula. The motivation comes from the concept of contrastive learning and contrastive loss function in the deep learning community [37]. Aiming at upgrading trainable weights in convolutional neural network, contrastive learning is reported to generate visual features without annotated labels [38]. During network training, two or more input images are encoded into feature vectors. The core idea of contrastive learning is to pull similar images together in the embedding space while pushing apart mismatch instances. As the objective function, contrastive loss is reported to measure the consistency between two feature vectors. The deep learning program attempts to minimize the contrastive loss and learn high-level features of images. The feature vector of an input image exhibits agreement with its counterpart.

Considering its significant influence on physical property, the volume fraction plays an essential role in our probability aggregation step. The conditional probability is calibrated in accordance with the difference between the volume fraction of SG and the target. Suppose that there are $K$ phases in the material of interest. Let $pSG(k)$ denote the proportion of the $k$-th phase in the present grid. On the other hand, $pt(1),pt(2),\u2026,pt(K)$ becomes the target distribution. Accordingly, the contrastive loss term is defined as follows:

As the only parameter, $\tau $ is referred to as the temperature normalization factor. A smaller value indicates that there is a strong constraint on the volume fraction. In this paper, we fix $\tau $ as 0.005. Based on equation (7), our program obtains a probability set $pc=pc(1),pc(2),\u2026,pc(K)$. Next, the odds $\Theta c$ is calculated in accordance with equation (2). Finally, we define the Bordley formula with contrastive loss term as follows:

where $p1$, $p2$, and $p3$ are conditional probabilities calculated by 2D patterns. Moreover, it is worth noting that the weight vector exists. In our formula, we specify each weight as 0.25. In other words, four terms in probability aggregation share the identical contributions.

## 4. Practical Applications of the Proposed Method

### 4.1. 3D Multiphase Material Modeling

In this section, we examine the proposed method in the context of multiphase material system. Asphalt mixes are essential materials in many practical applications [39]. Recent investigations have indicated that the physical property of porous media is highly dependent on 3D microstructure. With the intention of optimizing material design, it is necessary to create a collection of 3D microstructure models and gather their properties. As shown in Figure 6(a), CT is applied to explicitly express the microstructure of an asphalt mixture sample. Then, a thresholding program is conducted to detect three material phases: pore, cement, and matrix. In the pavement engineering community, these three phases are referred to as void, mortar, and aggregate, respectively. In order to investigate the randomness and uncertainty, we generate numerous 3D models on the basis of one 2D slice. Moreover, the asphalt mixture is manufactured by mixing heated asphalt and aggregate. Thus, this material meets the homogeneous and stationary assumption. The TI selection is not tested in this section.

As the first examination, we compare row- and column-oriented datasets. Based on a TI of size 128 × 128, an unconditional simulation is launched to create hundred 2D realizations. SG is of the same size as TI. To achieve extensive comparison, we test the performance of row- and column-oriented methods for setting the template size to 3 × 3, 5 × 5, 7 × 7, and 9 × 9. Moreover, the multigrid strategy with $G$ = 3 is employed to fulfill multiresolution reconstruction. The first simulated realization of each template is shown in Figures 6(c) and (d). Figure 7(a) provides the average running time for all cases. As the template expands, the computational efficiency of row-oriented programs has considerably decreased. By comparison, our column-oriented dataset does not lead to a substantial growth of running time. These observations indicate that the proposed ColSIM is efficient in reconstructing complicated structures with a big template.

The searching speeds of two datasets are further investigated. On one hand, we check the effect of unknown points on the simulation time. There are three basic steps. (1) Based on a template of size 7 × 7, our program randomly extracts 1000 patterns from TI. (2) According to the specified number of conditioning points, known points are arbitrarily removed to produce 1000 query instances. (3) Row- and column-oriented datasets are independently launched to find matching patterns. Since every query comes from TI, there is no need to conduct the pruning strategy in the row-oriented framework. The computational time is shown in Figure 7(b). It is clear that there is an exponential relationship between running time and the number of unknown points when the row-oriented dataset is adopted. In contrast, the proposed column-oriented program is faster to deal with an increasing number of missing data. On the other hand, the generalization ability of searching method is inspected. It is common that the query pattern does not appear in TI. Therefore, we create the 1000 conditioning patterns with random sampling. The proportion of each phase is the same as TI. Two searching programs are activated to tackle those unseen instances. Figure 7(c) records the searching time. The limitations of row-oriented searching embody two aspects. (1) The shortage of informed points leads to a large searching scope. Given few conditioning points, the program takes a lot of time to find compatible patterns. (2) The growing number of known points bring a computational cost. Based on the pruning strategy, the row-oriented method repeatedly discards conditioning points and creates new patterns. The searching program is iteratively performed to address updated query. By comparison, the proposed column-oriented searching method is efficient in addressing not only query instances with few conditions but also patterns associated with abundant data. The searching time is significantly saved.

Moreover, we compare the speed of our ColSIM with a variety of MPS programs. Aiming at accomplishing scientific test, all reconstruction programs are developed by Python. The computer configuration is Windows 10 with Intel I9-11900 of 2.5 GHz. At first, we implement IMPALA [16]. Based on a list-structured dataset, the computer activates row-oriented searching to find desired patterns. The parallel computation is not applied. Next, direct sampling (DS) [40], tree-based direct sampling (TDS) [32], and nearest neighbor simulation (NNSIM) [41] are launched to reconstruct 2D models. To achieve comprehensive comparison, various templates are used by IMPALA, NNSIM, and the proposed ColSIM. We separately employ a template of size 3 × 3, 5 × 5, 7 × 7, and 9 × 9. A multigrid strategy with $G=3$ is applied to detect microstructures across different scales. Moreover, we inspect the influence of conditioning points within DS and TDS frameworks. The number of template points is individually configured as 20, 30, and 40. With the purpose of balancing accuracy and efficiency, the distance threshold and the searching fraction are set as 0.1 and 0.25. Based on parameters stated above, MPS programs are carried out to generate hundred 2D models. Figure 7(d) provides the simulation time. Apparently, our ColSIM exhibits competitive performance in terms of computational efficiency. In this multiphase reconstruction scenario, the pattern lookup is considerably accelerated by the proposed column-oriented searching strategy.

Then, we apply two evaluation methods to check the reconstruction quality. At first, statistical functions are employed. Based on 2D images, our program independently conducts two-point correlation function $S2(r)$ and lineal path function $L(r)$. As a widely used descriptor, two-point correlation function accounts for the probability that both two points belong to the target phase in a microstructure image. By comparison, lineal path function concentrates on the connectedness along a straight line. From a probabilistic point of view, $L(k)(r)$ defines the possibility of tossing a line on the microstructure image, and the entire line is on phase $k$. Figure 8 exhibits statistical functions on 2D realizations created by two dataset searching programs and a template 9 × 9. Moreover, we calculate the average Euclidean distance between the statistical functions. A small distance indicates that there is a high consistency between TI and simulated realizations. According to Figures 8(a)–8(c), the average distances between TI and column-oriented searching models are 0.04, 0.12, and 0.37, respectively. By comparison, 0.04, 0.14, and 0.43 are yielded by comparing the red and blue curves. On the other hand, we test the difference between lineal path functions. Based on realizations created by column-oriented searching, the computer outputs three distances as 0.03, 0.04, and 0.24. The difference between TI and row-oriented models is quantified as 0.03, 0.04, and 0.26. Apparently, our column-based simulation method has similar statistical characteristics to TI as well as existing MPS programs.

In addition, we inspect the statistical characteristics of all MPS asphalt models. The average Euclidean distance becomes a measure of difference between TI and MPS realizations. Figures 9 and 10 provide the computation results. The finding embodies two aspects. On one hand, the simulation quality of IMPALA and ColSIM is not heavily dependent on the template size. The distance smoothly decreases with increasing template size. An extending template is beneficial to reproduce complicated structures. On the other hand, the number of template points is a sensitive parameter in DS, TDS, and NNSIM. As shown in Figures 9(b) and 9(c), the Euclidean distance has grown with the development of template. The similar phenomenon can be observed in Figures 10(b) and 10(c). By contrast, templates of size 5 × 5 and 9 × 9 have a negative effect on NNSIM simulation. Based on a template of size 7 × 7, NNSIM obtains comparable quality to IMPALA and ColSIM. The main reason of the preceding observations lies in the pattern similarity metric. In IMPALA, the pruning strategy discards the farthest point when there is no matching instance in the dataset. The early stopping strategy in ColSIM focuses on the points which are close to the template center. These mechanisms enable the computer to concentrate on the template points with high correlation. In comparison, Hamming distance is employed by DS, TDS, and NNSIM. Every conditioning point shares the same weight. An inappropriate template setting may introduce uncorrelated and redundant information to the reconstruction program.

The second way to assess 2D stochastic models is ANODI. From the perspective of geostatistics, there are two variabilities within the stochastic modeling: spatial uncertainty and morphological similarity. The former is measured by the average distance between two simulated realizations. In contrast, the latter represents the consistency between TI and reconstructed models. A competitive modeling method not only enlarges the distance between two models but also reduces the difference with TI. Using two microstructure model sets $A$ and $B$ as input, ANODI outputs three ratios. (1) The ratio $rA,Buncertainty$ compares two sets in terms of spatial uncertainty. $rA,Buncertainty\xa7amp;gt;1.00$ reveals that the set $A$ has larger diversity than set $B$. (2) The morphological similarity is quantified by $rA,Bsimilarity$. A small value indicates that $A$ is powerful to reproduce spatial structures in TI. (3) $rA,Boverall=rA,Buncertainty/rA,Bsimilarity$ summarizes previous two aspects. $rA,Boverall\u22481.00$ indicates two methods have similar reconstruction quality. The program regards $A$ as a better program when the value of $rA,Boverall$ is larger than 1.00. The technical details of ANODI are elaborated by Tan et al. [42]. Based on 2D models, ANODI result is shown in Tables 1 and 2. In the first test, column-oriented searching becomes the method $A$, while the existing row-oriented program is input as method $B$. Since every ratio is close to 1, the column-oriented searching program has comparable accuracy to the previous method. In Table 2, the proposed ColSIM is the method $A$, while other MPS programs are viewed as the method $B$. In accordance with the finding in Figures 9 and 10, we use DS and TDS realizations which is created by a template with twenty points. A template of size 7 × 7 is applied by IMPALA, NNSIM, and ColSIM. As Table 2 displays, the similarity ratios of IMPALA and ColSIM are close to 1. This phenomenon implies that microstructures in TI are well preserved and pasted into the SG. According to the preceding evaluation metrics, we can conclude that our column-oriented program significantly improves the computational efficiency while yielding reliable microstructure models.

Template size | Uncertainty ratio | Similarity ratio | Overall ratio |
---|---|---|---|

3 × 3 | 0.96:1.00 | 1.00:1.00 | 0.96:1.00 |

5 × 5 | 1.01:1.00 | 0.96:1.00 | 1.05:1.00 |

7 × 7 | 0.97:1.00 | 0.99:1.00 | 0.98:1.00 |

9 × 9 | 1.01:1.00 | 1.01:1.00 | 1.00:1.00 |

Template size | Uncertainty ratio | Similarity ratio | Overall ratio |
---|---|---|---|

3 × 3 | 0.96:1.00 | 1.00:1.00 | 0.96:1.00 |

5 × 5 | 1.01:1.00 | 0.96:1.00 | 1.05:1.00 |

7 × 7 | 0.97:1.00 | 0.99:1.00 | 0.98:1.00 |

9 × 9 | 1.01:1.00 | 1.01:1.00 | 1.00:1.00 |

Method A | Method B | Uncertainty ratio | Similarity ratio | Overall ratio |
---|---|---|---|---|

ColSIM | IMPALA | 0.97:1.00 | 0.99:1.00 | 0.98:1.00 |

ColSIM | DS | 0.90:1.00 | 0.67:1.00 | 1.35:1.00 |

ColSIM | TDS | 0.89:1.00 | 0.64:1.00 | 1.39:1.00 |

ColSIM | NNSIM | 0.86:1.00 | 0.84:1.00 | 1.02:1.00 |

Method A | Method B | Uncertainty ratio | Similarity ratio | Overall ratio |
---|---|---|---|---|

ColSIM | IMPALA | 0.97:1.00 | 0.99:1.00 | 0.98:1.00 |

ColSIM | DS | 0.90:1.00 | 0.67:1.00 | 1.35:1.00 |

ColSIM | TDS | 0.89:1.00 | 0.64:1.00 | 1.39:1.00 |

ColSIM | NNSIM | 0.86:1.00 | 0.84:1.00 | 1.02:1.00 |

Then, we focus on the effectiveness of the contrastive loss term. On one hand, the proposed method is employed to generate ten 3D realizations. As the only one parameter, the temperature factor is specified as 0.005. Four weights in the probability aggregation formula are fixed as 0.25. Accordingly, the contrastive loss term and three conditional probabilities share the same contribution in the 3D reconstruction. On the other hand, Bordley formula with prior probability term is performed to create 3D models from a 2D slice. At first, the computer creates ten models with the weight $w1=0.25$. In other words, the importance is uniformly assigned into four terms. Therefore, the only difference between the previous and the proposed method is the first term in probability aggregation. Next, a range of weights are tested to generate models. We independently configure the weight $w1$ as 0.4, 0.5, 0.6, 0.7, and 0.8. The computer produces two realizations with a given parameter. Thus, ten 3D asphalt mixture models are produced. It should be noted that three conditional probabilities in equation (1) have the same weight. The values of $w2$ and $w3$ are always equal to $w1$. The weight $w0$ of the prior probability $p0$ is computed by $w0=1-\u2211i=13wi$. With the aim of saving time, we use column-oriented searching to find matching instances. The reconstruction results are shown in Figure 11. Moreover, we scan the asphalt mixture sample with CT. The 3D model becomes a benchmark to validate our simulated realizations. According to the visual inspection, the realizations created by the proposed contrastive loss term have comparable microstructure to the CT model. By comparison, the value of weight $w1$ has a substantial influence on the simulation quality. There are significant differences between five models in Figure 11(d). Considerable time is consumed to find the suitable parameter which reproduces the similar structures to TI.

Three evaluation methods are launched to test our simulation method. First, we check the volume fraction of each material phase. In Figure 12(a), the horizontal and vertical axes separately represent the proportion of pore and cement. Second, the statistical functions are performed. As shown in Figures 12(b) and 12(c), our program individually calculates $S2(r)$ and $L(r)$ on the basis of 3D microstructure images. Third, we investigate the effect of pore characteristics on the physical property. The electrical behavior is predicted based on 3D asphalt mixture models. In this work, we adopt the finite element method (FEM) program developed by the National Institute of Standard and Technology (NIST) [43]. There are three basic steps: (1) The computer specifies the electrical conductivity of each material phase. In this case, the properties of pore, cement, and matrix are individually set as 0, 3 × 10^{−9}, and 1 × 10^{−3} S/m. (2) According to the change of the dielectric displacement, an adaptive meshing technique is conducted to partition 3D models. Linear interpolation is activated to calculate the potential distribution. (3) Based on the potential distribution, the program computes effective electrical conductivity by minimizing the electrical energy using a fast conjugate-gradient solver. Using thirty MPS models as the input, thirty electrical conductivity samples are predicted. In order to quantify uncertainty, we conduct kernel density estimation to create two probability distributions. Furthermore, the electrical behavior measured by the CT model becomes the ground truth. The calculation results are shown in Figure 12(d).

Based on the evaluation result, weight specification in the previous Bordley formula has a considerable influence on the reconstruction quality. Finding the proper parameter is a time-demanding and labor-intensive step. By comparison, our contrastive loss term automatically calibrates the volume fraction. In Figure 12, the models created by our contrastive loss term have comparable characteristics to TI. The statistical and physical properties of ColSIM realizations are consistent with TI and the CT model.

### 4.2. 3D Heterogeneous Shale Reconstruction

Digital rock is an important material to study the nonconventional petroleum reservoir. 3D shale characterization and reconstruction play an integral role in permeability prediction, flow simulation, and reservoir evaluation [44, 45]. With the objective to explore the interaction between pore characteristics and physical property, it is beneficial to create a collection of 3D stochastic models. However, reproducing heterogeneous and nonstationary microstructure is a challenging task. In this section, we focus on the performance of the proposed TI selection method. As mentioned in section 3.1, our program carries out MHD, tSNE, and spectral clustering to group 2D candidate images. Fourteen representative images are selected to inspire MPS simulation. In addition, our program divides SG according to the clustering results. On the basis of Figure 4, the simulation domain is partitioned into fourteen subareas in this shale case. For each subarea, the computer builds an independent dataset to collect microstructures. During the shale reconstruction, the morphological characteristics are reproduced in the specified subarea. For example, the first cluster encompasses 2D slices whose $z$-coordinate ranges from 0 to 6. $TI2$ is selected as the representative. Therefore, our program extracts microstructures from $TI2$ and pastes suitable patterns into the first seven layers of SG. It is worth noting that we insist on predicting an unknown point in 3D domain in accordance with 2D TI. Column-oriented searching and contrastive loss term are adopted to save running time. Other parameters include a template of size 5 × 5, a multigrid strategy with $G=4$ and a temperature factor with 0.005. The proposed heterogeneous MPS method creates ten shale models. The first realization is shown in Figure 13(b).

Moreover, the 3D MPS modeling without the proposed TI selection is performed. In this section, we name this method as the stationary MPS. At first, the program attempts to generate 3D models from a 2D slice. In order to control the size of pattern dataset and facilitate efficiency, one TI is randomly sampled from 128 candidate slices. According to the homogeneous assumption, the computer produces 3D realizations with column-oriented searching and probability aggregation framework. Second, all 128 slices are employed as TIs. MPS program uniformly reproduces microstructures in the 3D domain. However, the involvement of multiple TIs leads to the dramatic growth of pattern dataset. The preceding two programs are individually performed to create ten models. Parameters are the same as the heterogeneous MPS program. Third, 3D TI becomes prior knowledge. A template of size 5 × 5 × 5 pixel^{3} is used to extract 3D patterns from the CT model. The computer synthesizes ten shale models in accordance with 3D microstructure. The reconstructed models are shown in Figures 13(c)–13(e).

Next, we conduct a range of evaluation methods to check the modeling accuracy. Prior to quantitative analyses, we manually inspect the nonuniformity and morphological evolution in the shale system. The 2D slices whose $z$-coordinate is 0, 32, 64, 96, and 127 are separately displayed. As shown in Figure 14, our heterogeneous MPS model matches with the CT model. For the slice whose index is 0, the proposed ColSIM reproduces the pore of big size. By contrast, the small pore presents in $z$ = 32. Furthermore, our method reconstructs the pore of moderate size in the range [64, 127]. However, there is no significant variation within stationary MPS realizations. The pore microstructure is uniformly dispersed.

The heterogeneity within shale models is further investigated by the statistical function. In this case, we view a 3D model as a stack of 2D images. Two-point correlation function and lineal path function are measured based on each 2D slice. Accordingly, two multivariate functions $S2(r,z)$ and $L(r,z)$ are created. Here, $z$ denotes the $z$-coordinate of the 2D image under investigation. $r$ is the distance between two points. The computer specifies the pore as the target phase. Figures 15 and 16 show the calculation results on the first MPS model. There are twofold phenomena. On one hand, several peaks and valleys are presented in CT sample, and the models created by the heterogeneous MPS. The fluctuation indicates that there is distinct nonstationarity in statistical characteristics. On the other hand, large flat areas exist in the last three models in Figures 15 and 16. The pore structure is evenly recreated by the stationary MPS.

In addition, we independently inspect the short-distance correlation as well as long-range connectivity. $S2(r=3,z)$, $S2(r=12,z)$, $L(r=3,z)$, and $L(r=12,z)$ are individually displayed in Figure 17. We display the average behavior of ten 3D models. The purple curves are consistent with the red. In contrast, the cyan, blue, and green lines have different behaviors. These findings provide evidence that our heterogeneous MPS is capable of reproducing spatially evolving microstructures.

With the purpose of investigating the morphological variation, we calculate statistical functions in the direction of heterogeneity. Figure 18 provides the computation results. To avoid confusion, the average values of two-point correlation functions and lineal path functions are displayed. Moreover, we compute the mean distance between TI and realization functions. It is evident that realizations created by our method have the smallest distance to TI. The proposed TI selection program has an ability to reproduce the heterogeneous system.

Next, the physical descriptor is applied to examine our method. In petroleum engineering, pore size distribution contributes to the reservoir permeability and transport competence. In this work, we employ the maximum ball algorithm to categorize microstructure into two classes: pore and throat. The pore focuses on storing fluid, while the throat connects two pore bodies. Figure 19 displays the histogram of pore volume. For each MPS method, the average behavior is provided according to ten shale models. To highlight the big storages, the pore bodies whose volume is larger than 400 and 1,000 pixel^{3} are recorded. Moreover, MDS is implemented to visualize the difference between two distributions. In the feature space, two close points reveal that there is an intensive similarity between two models. Two phenomena are observed in these diagrams. (1) The pore volume distributions of CT and heterogeneous MPS are different from those of stationary MPS. There are numerous pore bodies of small size in the stationary MPS realizations. (2) The pore volumes of CT and our method are comparable. The large pore is presented in the CT model as well as heterogeneous MPS realizations. The spatial dependency within the heterogeneous microstructure is well conserved by the proposed program.

Finally, we check the influence of microstructures on the electrical property. In order to observe anisotropy in this shale system, FEM is performed to separately measure the effective electrical conductivity along $x$-direction, $y$-direction, and $z$-direction. Therefore, the computer outputs three predictions based on one 3D shale model. Aiming at emphasizing the connected component, we presume that the pore space is full of water. Therefore, the pore becomes an intensive conductor to current. The electrical conductivities of rock and water are configured as 1 × 10^{−3} and 1 S/m, respectively. Next, kernel density estimation is performed to calculate the probability distributions. Moreover, the electrical conductivity measured based on the CT model becomes the benchmark. The calculation results are shown in Figure 20. The prediction of heterogeneous MPS agrees with CT model. The peak of purple distribution is close to the red line. By contrast, three stationary MPS programs do not provide viable estimations.

In summary, we perform heterogeneous shale reconstruction in this section. Multivariate statistical function, pore volume distribution, and electrical property are used to quantitatively evaluate the reconstruction quality. The computational results indicate that our realizations have comparable properties to the benchmark model. The proposed ColSIM program has the ability to conserve the morphological variation in the TIs.

## 5. Conclusions

In this paper, we explore a computational simulation method to fulfill stochastic reconstruction of 3D heterogeneous microstructure. Based on the proposed ColSIM, a group of porous medium models are efficiently generated. High-quality models play an essential role in identifying the relationship between pore structure and physical property. First, a TI selection method is proposed to find representative structures. Our program explains the nonstationary system as a spatial evolving process consisting of frequent transitions and abrupt shifts. MHD, tSNE, and spectral clustering are used to organize candidate images. Second, we introduce column-oriented searching into MPS framework. Compared with the previous row-oriented searching, the proposed method rapidly addresses the query pattern associated with missing data. In order to deal with complicated instances, our program adopts the early stopping strategy to improve the generalization ability. Third, a contrastive loss term is suggested into the probability aggregation framework. During the stochastic reconstruction, our program constantly monitors the difference between the present model and the target. The volume fraction of each phase is automatically calibrated.

We test the performance of our method in the context of a multiphase material and a heterogeneous shale system. The proposed ColSIM creates a set of 2D and 3D microstructure models. Compared with existing methods, our program significantly saves running time and simplifies parameter specification. The reconstruction results are quantitatively evaluated by statistical functions, ANODI, electrical conductivity, and pore volume distribution. The calculation results indicate the proposed ColSIM exhibits competitive performance in terms of calculation efficiency, microstructure reproduction, and spatial uncertainty. The morphological characteristics and physical properties are in agreement with the target. Considering its efficiency and accuracy, we believe that the proposed method has the potential to be applied to a broad range of porous media.

It is recommended that future research could be undertaken in the following directions: First, the template configuration is an essential component in MPS simulation. In our ColSIM, we specify the sequence of template points according to their proximity. Our program concentrates on points that have a small distance from the template center. Therefore, a natural progression of this work is to calculate the point-checking sequence according to the spatial dependency of TI. Second, the proposed ColSIM is applicable for the microstructure reconstruction of categorical variables. Further investigation into the simulation of continuous variable is strongly encouraged.

## Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

## Acknowledgments

The work was supported in part by National Science Foundation of China (NSFC) [41874140 and 52108395], Natural Science Foundation of Shaanxi Province [2022JQ-227 and 2021JQ-289], and Fundamental Research Funds for the Central Universities of China [300102341308].