Lithostratigraphic modeling holds a vital role in mineral resource exploration and geological studies. In this study, we introduce a novel approach for automating pseudo-lithostratigraphic modeling in the deep subsurface, leveraging inversed geophysical properties. We propose a three-dimensional convolutional neural network with adaptive moment estimation (3D Adam-CNN) to achieve this objective. Our model employs 3D geophysical properties as input features for training, concurrently reconstructing a 3D geological model of the shallow subsurface for lithostratigraphic labeling purposes. To enhance the accuracy of pseudo-lithostratigraphic modeling during the model training phase, we redesign the 3D CNN framework, fine-tuning its parameters using the Adam optimizer. The Adam optimizer ensures controlled parameter updates with minimal memory overhead, rendering it particularly well-suited for convolutional learning involving huge 3D datasets with multi-dimensional features. To validate our proposed 3D Adam-CNN model, we compare the performance of our approach with 1D and 2D CNN models in the Qingniandian area of Heilongjiang Province, Northeastern China. By cross-matching the model’s predictions with manually modeled shallow subsurface lithostratigraphic distributions, we substantiate its reliability and accuracy. The 3D Adam-CNN model emerges as a robust and effective solution for lithostratigraphic modeling in the deep subsurface, utilizing geophysical properties.

Litho-strata manifest a broad range of rock properties arising from distinct geological processes, for example, weathering, compaction, metamorphism, and deformation. These intricate processes, when combined, yield complex representations of various litho-strata. Furthermore, lithostratigraphy dictates the physical and chemical attributes of the litho-strata, the distribution of which is intimately intertwined with mineral resource distribution. Traditional lithostratigraphic identification methods, for example, borehole drilling and trenching, necessitate manual interpretation of field-collected or borehole core samples. While direct, these methods prove costly, time-consuming, and inadequate for identifying deep subsurface litho-strata in a large area. Additionally, human subjectivity and experience wield considerable influence over these conventional approaches’ outcomes. Hence, there exists a need for more reliable lithostratigraphic identification methods using physical or chemical properties. Three-dimensional geological modeling plays a vital role in portraying subsurface spatial characteristics, for example, litho-strata, fault networks, physical and chemical properties, and quantitative mineralization [1-3]. With the advancements in deep-penetration geophysical and geochemical exploration, the prospect of studying geological bodies’ physical and chemical properties from a 3D perspective becomes feasible [4-6]. These methodologies deepen the comprehension of the lithostratigraphy’s spatial distribution patterns and guide mineral resource exploration.

In recent years, advancements in computing capacity have empowered geologists to pursue more intelligent lithostratigraphic identification. Lithostratigraphic identification can be framed as a classification problem, and a variety of machine-learning methods have been employed for litho-stratigraphic classifications of 1D well-logging and 2D geological maps or images. Support vector machines, capable of transforming linearly indistinguishable samples into distinguishable ones, encounter practical challenges with large-scale training samples [7, 8]. Decision tree-based ensemble methods, for example, random forests [9-11] and gradient-boosting decision tree [12-14], overlook cross-correlation between different attributes in the dataset despite being less affected by category imbalance. Generally, convolutional neural networks (CNNs), as a deep learning method, can enhance the efficiency and scalability of 1D and 2D lithostratigraphic identification [15-17]. Based on neural networks, CNNs can preserve the input’s neighborhood correlations and spatial localities in their latent higher-level feature representations. Meanwhile, feature extraction using CNNs enables users to achieve the feature extraction without fully understanding the specific features. CNN techniques can significantly improve 2D lithostratigraphic identification accuracy [18]. The visual geometry group (VGG) network, a CNN with a deep network structure and small convolution kernels, excels in capturing detailed features of 2D remote-sensing images while controlling computational parameters to manage workload [19].

Three-dimensional convolution was initially proposed for learning spatiotemporal features [20], in which each convolution kernel can obtain a 3D feature. The 3D kernel is not constrained to the length and width of 2D but slides across all dimensions of the input feature map, producing a 3D output tensor instead of a 2D matrix [21, 22]. Compared with a 2D CNN, a 3D CNN can be implemented by combining 3D convolution kernels into a cube formed by stacking multiple slices, thereby capturing 3D spatial features from the 3D dataset [23]. In subsurface geological pseudo-modeling, 3D CNNs have been instrumental in detecting objects from 3D seismic images. For instance, Shi et al. [24] proposed a 3D CNN model for automatic salt body detection in a 3D single-channel seismic amplitude by analyzing learned visual features [24]. Wu et al. [25] introduced a 3D CNN model for detecting and extracting faults and horizons from 3D seismic images by analyzing structural features learned from simulated 3D structural models and synthetic seismic images [25]. However, limited research has applied 3D CNNs to other geophysical properties inversed from gravitational, magnetic, and electromagnetic explorations.

This study aims to extend VGG’s capabilities to 3D pseudo-modeling for identifying lithostratigraphy in deep subsurface space, with a maximum depth of −2000 m, using a 3D dataset featuring multichannel inversed gravitational, magnetic, and electromagnetic properties in the Qingniandian area of Heilongjiang Province, NE China. Our approach involves learning the nonlinear mapping relationship between the variations of multiple physical properties and a manually constructed lithostratigraphic model in the shallow subsurface, with a maximum depth of −200 m. The experimental results validate the reliability of the proposed 3D CNN method in lithostratigraphic identification, enabling automatic identification of related deep subsurface lithostratigraphy in a large area.

In this study, we employ three geophysical properties (density contrast, magnetic susceptibility contrast, and resistivity) to train a 3D CNN-based pseudo-lithostratigraphic classification model, capable of identifying fourteen types of rocks or strata in deep 3D subsurface. We use these three geophysical properties as the input feature channels in the training blocks, which are partitioned from the shallow subsurface using marching cubes, and we consider the lithostratigraphy of the central voxel of the block as the block’s label. During the training process, the lithostratigraphic labels play a significant role in constraining the weight coefficients of the convolutional layer.

2.1. Architecture

While a CNN can directly learn end-to-end mappings and has achieved excellent results on 2D images [26], the training of 3D CNNs for 3D dataset demands increased computation complexity compared with 2D imagery datasets [27]. To assess recognition accuracies and training efficiencies of 1D, 2D, and 3D convolution kernels, we redesign the architecture of the 1D (along vertical direction), 2D (in horizontal plane), and 3D CNNs to suit pseudo-lithostratigraphic modeling, as shown in Figure 1. The parameter counts of the three models are detailed in online supplementary Table S1. For instance, the 3D CNN model comprises five 3D convolutional layers with the number of convolutional kernels gradually increasing from 64 to 512. Increasing the architecture depth with small convolutional filters significantly improves the training and recognition performance of convolutional networks [28-30]. For 3D CNNs, we utilize the convolution layer with a receiving field size of 3 × 3 × 3, which is the smallest filter that can capture direct neighborhood characteristics, and the convolution stride was set to 1. The smaller convolution kernel can achieve the same scale-receptive field as the larger convolution kernel by stacking multiple layers. We also employ three stacked 3 × 3 × 3 convolution kernels to approximate the effect of a 7 × 7 × 7 convolution kernel and two stacked 3 × 3 × 3 convolution kernels to approximate the effect of a 5 × 5 × 5 convolution kernel in the neural network structure. This reduces the number of network parameters, deepens the network, and improves network capacity. We apply batch normalization [31, 32] after each convolution layer to standardize the output characteristic into a standard normal distribution, ensuring a consistent input distribution for the next convolution layer and avoiding the vanishing gradient problem. After the calculations of the convolution layers are complete, a max-pooling layer [33] is added to replace the feature of a single voxel with the characteristic statistics of its neighboring voxels. The max-pooling cube size is set to 2 × 2 × 2, and the step size for pooling is 2. The pooling layer down-samples the output characteristics of the convolution layer into the characteristic of a single voxel, allowing for a wider range of characteristics from its neighborhood. We introduce the rectified linear unit (ReLU) activation function between the convolutional layer and the pooling layer to enhance the nonlinearity of the neural network model. To prevent overfitting and improve model generalization, we employ a dropout layer that randomly deactivates neurons after the activation function, with a dropout rate set to 0.2. Additionally, we use a fully connected layer to integrate the output characteristics of the convolutional and pooling layers to make full use of the existing high-order characteristics. After each fully connected layer, we apply the ReLU activation function and a dropout layer with a dropout rate of 0.5. Finally, the output of the fully connected layer is passed into the SoftMax layer [34, 35] to obtain the classification result, providing the predicted probability for each class.

2.2. Adam Optimizer

Since Adam optimizer can handle sparse gradient and nonstationary targets, requires less memory, and is also suitable for large datasets and high-dimensional spaces [36], we use the Adam optimizer to optimize our training network. The optimizer’s purpose is to minimize the loss function by updating the weight parameters, typically to minimize the difference between the predicted and actual values [37]. The network weight θ is initialized according to a Gaussian distribution and adjusted during the iterations. The loss function was increasingly used to optimize θ during continuous iterative tuning. The Adam optimizer is an optimization algorithm based on a random gradient to assign adaptive individual learning rates to each neural network parameter [36, 38, 39]. The formula for calculating the adaptive learning rate of parameters using Adam is as follows:

E[g2]t=βE[g2]t1+(1β)gt2
(1)
θt+1=θtηEgt2+εgt
(2)

where gt represents the gradient on the current mini-batch, t is the current iteration number, the coefficient β is usually set to 0.9 based on experiences, θ denotes the model weight, η is the step size, and the smoothing term ε typically defaults to 10-6.

For Adam optimizer, setting the learning rate too low results in slow training convergence due to tiny adjustments of model updates in each iteration and may cause overfitting due to its sensitivity to noises and details in the training dataset. Conversely, a high learning rate can improve training but may lead to the problem of exploding gradient, where substantial parameter updates due to rapidly increasing gradients result in unstable model training. Hence, we set an adjustable learning rate to avoid exploding gradients and expedite training. We initially set a high learning rate to accelerate the training process in the training phase. As the number of iterations increases, the learning rate decreases. The specific formula is as follows:

lrt+1=lrt11+tdecay
(3)

where lr is the learning rate and decay is the decay rate.

The Adam optimizer compares the predicted output values with the actual real values and evaluates them using multiple indicators (primarily accuracy and loss). Subsequently, the Adam optimizer calculates the difference between the predicted and actual values, updates the parameters of the network, and then iteratively trains the 3D CNNs until satisfactory results are achieved.

2.3. Model Evaluation

To evaluate the model’s performance, we consider accuracy, the categorical cross-entropy loss function, and the confusion matrix. We plotted accuracy and categorial cross-entropy loss curves for the training and testing dataset, utilizing true lithostratigraphic labels extracted from the shallow 3D geological model to ensure that the models do not overfit. Accuracy, representing overall precision, is calculated as the ratio of correctly predicted sampling voxels to the total number of sampling voxels:

Accuracy=TP+TNTP+TN+FP+FN
(4)

where TP, TN, FP, and FN denote the numbers of true-positive, true-negative, false-positive, and false-negative sampling voxels in the training or testing datasets, respectively. A higher accuracy value indicates better model performance.

The categorical cross-entropy is a loss function for multiple categorical probabilities [40, 41], where the loss equals zero only when the actual label and the predicted label match; otherwise, it is a positive value. The formula for calculating categorical cross-entropy is as follows:

loss=1Ni=1NyilnPii=1N(1yi)ln(1Pi)
(5)

where N represents the number of samples, yi is a binary indicator function detecting whether the ith training pattern belongs to the true category, and Pi represents the probability that the sample is predicted to be its true label by the SoftMax layer.

A confusion matrix, also known as an error matrix, is a standard format for classification precision. The value of each element in the confusion matrix is precision, calculated as follows:

Ci,j=Pi,jClassi×100%
(6)

where Classi represents the total number of the ith class corresponding to each row of the confusion matrix, and Pi,j represents the number of samples belonging to the ith class that are predicted as the jth class. Therefore, the on-diagonal element Ci,i of the confusion matrix represents the proportion of samples belonging to the ith class that are correctly predicted, while the off-diagonal element Ci,j(ij) represents the proportion of samples belonging to the ith class that are incorrectly predicted as the jth class.

3.1. Study Area

The Qingniandian area, situated in the southwestern part of Duobaoshan–Daxintun metallogenic belt, is one of the significant gold and polymetallic metallogenic belts in Heilongjiang Province, China [42-44] (Figure 2). The structural complexity in this area and its adjacent regions began to develop during the Mesozoic period. The primary structural features include NNE-trending faults followed by NW-trending faults, which exert significant structural influence on the Mesozoic-related stratigraphic units. NNE-trending faults generally exhibit straight and have zigzag structural features, serving as the primary ore-controlling structures in this area. The main period of fault formation occurred during the Mesozoic period. Hydrothermal fluids, associated with Early Cretaceous volcanic rocks, predominantly migrated along these faults. Moreover, geochemical anomalies and mineralization alterations are mainly distributed along these fault zones.

This metallogenic belt encompasses medium- to large-sized representative deposits, including the Zhengguang gold deposit, Handa gold deposit, Wudaogou gold deposit, and Tongshan copper deposit. These deposits exhibit variations in types, ranging from gold-bearing quartz vein type to fractured and altered rock type [45-47]. Their metallogenesis is also diverse, including magmatic-hydrothermal, volcanic, and subvolcanic-hydrothermal deposits.

The lithostratigraphy of the Qingniandian area and adjacent regions mainly comprises Early Carboniferous granodiorite (γδC1), Early Carboniferous monzogranite (ηγC1), Early Carboniferous syenogranite (ξγC1), Lower Carboniferous Keluo complex (C1kl), Late Carboniferous alkali feldspar granite (κγC2), Upper Carboniferous to Lower Permian Baoligaomiao Formation (C2P1bl), Lower Cretaceous Jiufengshan Formation (K1j), Lower Cretaceous Ganhe Formation (K1g), Lower Cretaceous Gushanzhen Formation (K1gs), Holocene Quaternary Harbin Formation (Qp3h), Holocene Quaternary valley sediments (Qh2), and a very small area of quartz vein (q), diorite porphyrite (δμ), and Andesite dike (σK1), as shown in Figure 3. These various litho-strata exhibit distinct petrographic compositions, as shown in Table 1, leading to different geophysical features.

The intrusive rocks in the study area predominantly consist of Early Carboniferous granite complex rocks and dynamic metamorphic deep intrusive rocks that typically manifest as strip and eyeball structures. The granite complex rocks are mostly mylonitic granite and granitic mylonite, with mylonitic foliation direction oriented in the northeast direction. Silicification, pyrite, and chalcopyrite are observed in granitic mylonites, which bear indicative significance for gold and copper mineral explorations. The Early Cretaceous period in the study area witnessed strong volcanic eruptive activities, primarily characterized by basic and intermediate basic volcanic activities during the Ganhe Period. These volcanic rocks were basaltic and andesitic in composition, and some underwent alteration processes involving clayization or ferritization. The hydrothermal fluid emanating from volcanic magma served as the primary thermal fluid responsible for the formation of low-temperature minerals, bearing indicative significance for polymetallic mineral explorations. Hence, recognizing the spatial distribution of these litho-strata based on the geophysical explorations in the study area is imperative.

3.2. Datasets

3.2.1. Shallow Subsurface 3D Geological Model

To reconstruct a geological model of the shallow subsurface, ranging from ground surface (approximately 300 m) to −200 m in the study area, we initially obtained 3D fault and lithostratigraphy boundaries from the planar geological map’s map-cut cross-sections. Subsequently, we tiled lithostratigraphy boundaries into surfaces using triangular facets. After extending and refining the lithostratigraphy boundary surfaces, we obtained a topologically consistent geological solid model, ensuring there are no gaps or intersections between different boundary surfaces. Finally, the geological voxel model (Figure 4) was generated through rasterization from the solid model.

3.2.2. Inversed Geophysical Properties

In this study, we employed three types of inversed geophysical properties, that is, density contrast, magnetic susceptibility contrast, and resistivity, as predictors for lithostratigraphy. The 3D geophysical properties, inversed from geophysical gravitational, magnetic, and electromagnetic explorations by the Shenyang Center of China Geological Survey, serve as crucial features for lithostratigraphy recognition and ore-forming process analysis in the deep subsurface. Distinct litho-strata exhibit variations in the signals obtained from various geophysical explorations, which are reflected in the reversed geophysical features. As depicted in online supplementary Figures S1–S3, the geophysical properties of different litho-strata, as labeled by the shallow subsurface 3D geological model of the study area, fall within distinct ranges. Variations in density, magnetism, and resistivity are attributed to the different mineral compositions of litho-strata, impacting geophysical explorations. Therefore, we explored the nonlinear mapping relationship between inversed geophysical features and lithostratigraphy to enable lithostratigraphy recognition in deep subsurface.

Although the spatial distribution of lithostratigraphy is primarily known from the 3D geological model resulting from cross-sections in the shallow subsurface of the study area, we employed the 3D shallow subsurface geological model and inversed geophysical model as training and testing datasets, as shown in Figure 5. This involves utilizing the inversed geophysical density, magnetism, and resistivity as input features and the lithostratigraphy from the geological model as a label. The model was subsequently utilized to predict the distribution of lithostratigraphy in the deep subsurface of the study area.

Each voxel in the 3D grid model contains seven attributes, that is, central coordinates (x, y, and z), density contrast, magnetic susceptibility contrast, resistivity, and lithostratigraphic label. We chose a 3D grid measuring 570 × 689 × 33 voxels from the shallow subsurface of the study area, with each voxel measuring 35 × 35 × 35 m. These voxels were divided into training (9,720,068 voxels) and testing (3,240,022 voxels) datasets, maintaining a 3:1 ratio. Employing a marching cube algorithm, we partitioned the 3D grid into a series of 1D (along the Z-axis), 2D (in the XY plane), and 3D blocks, serving as the samples, with a clipping stride of 1 voxel. This strategy allows us to obtain a larger number of samples. The lithostratigraphy, labeled using the shallow subsurface 3D geological model, for the central voxel of each block created by marching cubes serves as the label for the entire block (Figure 6). We utilized one-hot encoding to convert the lithostratigraphy label into a binary encoding format, facilitating processing by machine learning algorithms.

To assess the models’ generalization, we divided the available samples into training and testing datasets, with a ratio of 3:1 between them. The training process involves three stages: model training using the training dataset, evaluating model performance using the testing dataset, and using the trained model to predict lithostratigraphy in the deep subsurface of the study area.

We experimented with different optimizers, marching cube sizes, and training epochs, resulting in the design of seven distinct 3D CNN-based models for lithostratigraphy recognition. Through a comprehensive evaluation encompassing overall accuracy, classification precision, loss function, and prediction results, we identified an optimal 3D CNN-based model with a specific parameter combination.

4.1. Optimizer

The choice of optimizer significantly influences training outcomes. We tested three different optimizers, that is, stochastic gradient descent (SGD) [48, 49], AdaGrad [50, 51], and Adam, aiming to expedite training and enhance accuracy. As indicated in Table 2, the model employing the Adam optimizer demonstrates higher accuracy and lower loss values in both training and testing datasets. Figures 7 and 8 illustrate that the Adam-based model converges quickly and achieves superior training results. Notably, the lower performance on the training set than that on the testing set arises from our use of the dropout option during training. The confusion matrices presented in online supplementary Figure S4 indicate that the SGD- and AdaGrad-based models struggle to accurately identify certain litho-strata, for example, quartz vein and Andesite dike in the testing dataset, which have limited training samples. However, the Adam-based model excels in identifying these litho-strata.

We compared the predicted lithostratigraphy results with the actual lithostratigraphy in the training area, as depicted in online supplementary Figure S5. The SGD- and AdaGrad-based models differ significantly from the actual distribution, whereas the Adam-based model not only reconstructs the actual distribution but also defines clear lithostratigraphic boundaries. From the predicted results in the deep subsurface, as shown in online supplementary Figure S6, the SGD- and AdaGrad-based models wrongly identify newly deposited Cretaceous lithostratigraphy in the deep subsurface, which contradicts geological laws. In contrast, the Adam-based model achieves better results in lithostratigraphy prediction in the deep subsurface of the study area.

4.2. Marching Cube Size

Using a larger marching cube size on the samples provides more neighbors correlated to the central voxel of the block. Consequently, we tested three different marching cube sizes, that is, 17 × 17 × 17, 19 × 19 × 19, and 21 × 21 × 21, to optimize performance. As depicted in Table 3, increasing the cube size from 17 × 17 × 17 to 19 × 19 × 19 improves the model’s accuracy and reduces the loss on both the training and testing datasets. This enhancement is attributed to the larger marching cube size capturing more neighborhood characteristics and subsequently optimizing the model’s performance. However, when the cube size is increased from 19 × 19 × 19 to 21 × 21 × 21, the model’s performance shows only marginal improvement. The testing curve also indicates that increasing the marching cube size expedites model convergence during training, as shown in Figures 9 and 10.

The confusion matrices of the three models, as shown in online supplementary Figure S7, indicate that increasing the cube size from 17 × 17 × 17 to 19 × 19 × 19 significantly improved the model’s accuracy. While the model with a marching cube size of 21 × 21 × 21 displays superior performance metrics, its predicted complex lithological boundaries lack sufficient rationality. Nonetheless, increasing the cube size from 19 × 19 × 19 to 21 × 21 × 21 does not yield a noticeable improvement in model accuracy. Compared with the model with a marching cube size of 17 × 17 × 17, the prediction result of the model with a marching cube size of 19 × 19 × 19 exhibits better consistency with the actual lithostratigraphy (online supplementary Figure S8). The models with marching cube sizes of 17 × 17 × 17 and 21 × 21 × 21 incorrectly identify artifacts where several Cretaceous and Quaternary litho-strata appear in the deep subsurface of the study area (online supplementary Figure S9). Hence, the optimal result was obtained with the model with a marching cube size of 19 × 19 × 19.

4.3. Training Epochs

We designed three models with training epochs of 10, 25, and 50. Increasing the training epochs from 10 to 25 gradually improved model performance (Table 4, Figures 11, 12, and S10, 10). However, beyond twenty-five epochs, further increases do not significantly enhance model performance, as the model has already converged.

As training epochs increased from 10 to 25, the model’s lithostratigraphy prediction results increasingly aligned with the actual lithostratigraphy distribution (online supplementary Figures S11 and S12). Nevertheless, when the training epochs reach fifty, the prediction results in the deep subsurface appear inconsistent with geological principles. Furthermore, an increased number of training epochs inevitably leads to longer model training times.

5.1. Comparison of 1D, 2D, and 3D CNN Models

The convolutional layers play a pivotal role in extracting characteristics and learning the mapping correlation between geophysical properties and lithostratigraphic model. When the CNNs process input data, they capture spatial correlation characteristics within the study area, creating a large receptive field. By employing 3D convolution kernels on the samples, a larger set of correlated neighbors around the central voxel of the block is utilized compared with 2D and 1D convolution kernels. We evaluated the performance of three distinct convolution kernels, that is, 1D, 2D, and 3D. As depicted in Table 5, the 3D CNN model exhibits the highest accuracy and lowest loss values across both training and testing datasets. This best performance stems from the 3D convolution kernel’s capacity to capture a more comprehensive set of neighborhood characteristics. As illustrated in Figures 13 and 14, after twenty-five epochs, further iterations do not significantly enhance the performance of the 2D and 3D CNN models as they have already converged. In contrast, the 1D CNN model converges slowly, requiring 200 epochs, and achieves the lowest performance. Moreover, an increased number of training epochs leads to longer training times for the 1D CNN model. The confusion matrices, as shown in online supplementary Figure 13, highlight that the 3D CNN markedly enhances the model’s accuracy. Conversely, the 1D and 2D CNN models exhibit inferior performance metrics, with their predicted lithological boundaries displaying insufficient rationality. As shown in Figure 15, a comparison between the predicted lithostratigraphy results with the actual lithostratigraphy in the training area reveals significant discrepancies between the 1D and 2D CNN models. Conversely, the 3D CNN model not only reconstructs the actual distribution but also delineates distinct lithostratigraphic boundaries, demonstrating better alignment with the actual lithostratigraphy.

Examining the predicted results in the deep subsurface, as shown in Figure 16, both the 1D and 2D CNN models inaccurately identify newly deposited Cretaceous lithostratigraphy in the deep subsurface, a contradiction to geological laws. Conversely, the optimal result in the deep subsurface is achieved by the 3D CNN model with its ability to accurately capture subsurface lithostratigraphic characteristics.

5.2. Validations of 3D CNN Model

In this study, we applied the 3D Adam-CNN model to classify fourteen types of litho-strata. The experimental results demonstrate that the model using the Adam optimizer fits the actual lithostratigraphic conditions better and achieves convergence within a relatively short training period. The model with a marching cube size of 19 × 19 × 19 is sufficiently large to capture neighborhood-inversed geophysical features for optimal performance. Furthermore, increasing the number of training epochs does not significantly improve overall accuracy and prediction precision after reaching 25.

After parameter tuning and comparisons, the model utilizing the Adam optimizer, a marching cube size of 19 × 19 × 19, and twenty-five training epochs yields the best prediction results. We compared the statistical distribution difference of three geophysical properties in several actual and predicted litho-strata using histograms and fitted normal distribution curves (online supplementary Figure 14). The comparison results indicate that the distribution of three geophysical properties in several actual and predicted litho-strata is roughly consistent, particularly in the Cretaceous and Carboniferous litho-strata.

We selected a validation profile in the study area and compared the real lithostratigraphic distributions in the shallow subsurface on the profile with the predicted lithostratigraphic distributions by the model and the corresponding inversed geophysical properties on the profile, as shown in Figure 17. The validation profile indicates that the predicted lithostratigraphy results are relatively reliable and closely match the actual lithostratigraphic distribution, especially in the early Carboniferous granite, which is a crucial indicator for metallogenic prediction. In the validation profile, the lithostratigraphic distributions of deep and shallow subsurface spaces in the study area exhibit good continuity, with clear and smooth boundaries between various litho-strata, conforming to the sequence of geological sedimentation. The models achieve high accuracy in predicting granite and Cretaceous strata due to their unique mineral compositions and geophysical features. However, the models have high classification error rates in quartz veins and often misidentify them as granites due to two primary reasons: (1) the inversed geophysical properties of these quartz veins are similar to those of granite, making it difficult to distinguish the two litho-strata; and (2) these quartz veins only exist in a few areas with low thickness and a small number of samples, limiting the model’s ability to capture sufficient corresponding characteristics to fit the nonlinear relationship between geophysical properties and lithostratigraphy. Our 3D Adam-CNN model successfully identifies and analyzes the differences in three inversed geophysical features among different litho-strata, learning the mapping relationships between them to achieve correct lithostratigraphic modeling. Consequently, it infers the lithostratigraphic distribution in the deep subsurface of the study area. Given the lithostratigraphic distribution is closely linked to the mineral deposits, particularly certain litho-strata (e.g., γδC1, ηγC1, and ξγC1), considering their contact zones as prospecting indicators, the lithostratigraphic prediction results obtained through the proposed 3D Adam-CNN method can serve as essential indicators for mineral prospecting.

This study focuses on pseudo-lithostratigraphic modeling using inversed geophysical data based on the 3D CNN model, achieving satisfactory experimental results with high performance (e.g., an overall accuracy of 96.7%). The model’s predictions in the study area align with the geological chronological sequence, demonstrating the proposed 3D Adam-CNN model’s robust generalization ability. In conclusion, this study contributes as follows:

  1. Our approach leverages 3D geophysical properties as input features for training while employing the shallow subsurface 3D geological model as labels, which stands as a reliable pathway for pseudo-lithostratigraphic modeling in the deep subsurface.

  2. Compared with the 1D and 2D CNN models, our proposed 3D CNN model significantly elevates the accuracy of pseudo-lithostratigraphic modeling, providing valuable guidance for mineral prospecting in our study area.

  3. By comparing the predicted lithostratigraphic distributions with actual lithostratigraphic distributions in the validation profile, our 3D CNN model exhibits strong agreement with the actual lithostratigraphic model in the shallow subsurface dataset, effectively discerning boundaries and trends among different litho-strata in the deep subsurface.

There are several limitations that need to be addressed. A fundamental assumption of our proposed approach is that the same set of litho-strata distributes in both shallow subsurface and deep subsurface. However, the inherent inhomogeneity and anisotropy of the Earth’s subsurface weaken this generalization, particularly when training a model using a shallow subsurface dataset to predict the deep subsurface where litho-strata vary significantly. While our proposed 3D CNN framework displays applicability to lithostratigraphic modeling in other countries and regions, the specific workload required for labeling and training also diminishes its practicality. Furthermore, concerns about the interpretability of CNN approaches [52] and the need to elucidate the workings of CNNs to enhance algorithmic transparency of pseudo-lithostratigraphic modeling have been raised. Efforts to visualize the internal mechanism of CNNs and improve the understanding of the nonlinear mapping between lithostratigraphy and inversed geophysical properties are imperative. Moreover, integrating deep-level features and neighborhood correlations in the 3D CNN model has aimed to reduce subjectivity and reliance on simplistic statistical features in lithostratigraphic identification using inversed geophysical properties characterized by multiple interpretations. However, future validation should also consider borehole data and corresponding geomechanical properties to validate the predicted results and assess the model’s generalizability.

The dataset of the current study is not publicly available due to a data privacy agreement we signed with the Shenyang Center of China Geological Survey but is available from the corresponding author upon reasonable request.

We confirm that there is no conflict of interest.

The authors express their gratitude to Dr. Zhang Chun-peng and Dr. Liu Bao-shan (Shenyang Center of China Geological Survey) and Dr. Liu Wen-yu (East China University of Technology) for their kind assistance with the data collection. Special thanks are extended to the MapGIS Laboratory Co-Constructed by the National Engineering Research Center for Geographic Information System of China and Central South University for providing the MapGIS software (Wuhan Zondy Cyber-Tech Co. Ltd., Wuhan, China).

Exclusive Licensee GeoScienceWorld. Distributed under a Creative Commons Attribution License (CC BY 4.0).

Supplementary data