The ant-tracking algorithm is commonly used to extract faults from geological structures. However, obtaining 3D ant-tracking data requires computing various volume attributes, and achieving satisfactory results demands iterating over the parameters of these attributes until reasonable continuity is reached. Moreover, because of the numerous parameters involved, the algorithm can produce different outputs with each execution. In this study, we aimed to enhance the performance of the ant-tracking algorithm by combining it with a basic U-Net structure. The inputs and corresponding labels for the model are cube-shaped 3D volumes segmented from the original 3D seismic volume, which also facilitates cross-validation across distinct regions. To minimize operator bias, we constructed a single ant-tracking label by executing ant-tracking with several different parameter sets and executors and then averaging the results. An evaluative comparison of three loss functions (MAE, RMSE, and MSE) was conducted to identify the optimal function for training the model; the MSE function performed best on five of the six metrics, leading to its adoption. In addition, a significant number of misinterpreted faults led us to propose a post-processing algorithm named "Dual-Threshold Iteration" (DTI). Originally used to extract fine blood vessels branching from large vessels in medical image segmentation, it was adapted in our work to preserve a high level of continuity while discarding worthless noise. Comparisons using the F1 score and the number of 3D-connected components confirmed that the proposed method generates less-biased, smoothly connected fault structures.

Fault structures are crucial patterns for tracking the accumulation and movement of hydrocarbons in underground reservoirs. Thus, fault mapping from three-dimensional seismic data is an essential interpretation step for reservoir evaluation. Despite the importance of faults, especially in areas with numerous layers, their quantitative interpretation remains a time-consuming and challenging task. Additionally, because individual operators select faults manually, the result varies with the evaluator, and the task consistently requires expertise to ensure reliability. This emphasizes the need for techniques that progress from manually digitizing faults to applying dedicated algorithms. Therefore, considerable effort has been devoted to developing algorithms and methods that enhance fault detection performance.

Various volume attributes, including semblance [1], variance [2], and coherency [3], are commonly employed for fault detection. However, seismic attributes are vulnerable to both seismic noise and complex stratigraphic features, introducing a potential for significant misinterpretation [4]. Unlike single volume attributes, the ant-tracking algorithm starts from a fault attribute and utilizes "ant agents" that follow assumed fault features. The algorithm is usually applied to the variance attribute with a predefined set of parameters, such as orientation ranges and step sizes. This allows faults to be extracted while limiting the influence of random noise [5].

However, despite these advantages, ant-tracking has several limitations. The most serious drawback is that it requires a considerable number of seismic attributes to derive results, which increases the number of hyperparameters that must be adjusted. Until satisfactory results are obtained, the operator must iteratively tune the parameters of each attribute, and predicting the final result during intermediate processes is difficult. This reduces ant-tracking to a trial-and-error approach, with executors differing in their interpretations of what is or is not a noise artifact. Consequently, the executor's bias is reflected in the outcomes to an increasing extent [6]. Other disadvantages include excessive false positives and sensitivity to seismic noise that a manual approach could have eliminated, all of which highlight the need to improve the algorithm's efficiency [7].

Recently, methods based on machine learning (ML) have found successful applications in the geoscience field. In particular, several convolutional neural network (CNN) methods have been proposed to extract faults from multiple seismic attributes. Wu et al. [7] implemented fault segmentation using a conventional U-Net trained with synthetic seismic images. Liu et al. [8] utilized a Residual U-Net (ResUNet), trained on artificial data, for more accurate fault structure prediction than conventional U-Net-based algorithms. Cunha et al. [9] proposed a transfer learning approach with a base model trained on synthetic data, aiming to reduce tuning and training time. Like conventional auto-detection methods, ML allows for faster execution than manual approaches. Furthermore, once a robust edge detection model is established, results can be obtained without parameter adjustments or human evaluation. However, ML-based methods are inherently dependent on their training data and can produce uncertain results when analyzing geological structures they have not seen before. Synthetic training data are often ineffective because they mostly lack the complexity of real geological structures, so outcomes can vary depending on the specific synthetic dataset employed. Training with manually selected data does not differ significantly from training with synthetic data, as such data often fail to capture intricate details [10]. Additionally, curating a large amount of reliable manual data is time-consuming and challenging.

Our goal is to imitate the overall, less-biased ant-tracking workflow with a deep learning model. To mitigate bias, we aggregate ant-tracking results obtained from various executors using different parameter combinations, each executor following their own standards. The initial seismic volume and corresponding ant-tracking results were cropped into smaller 3D volumes, followed by several data augmentation processes. We employed a U-Net model with a newly introduced post-processing algorithm named Dual-Threshold Iteration (DTI) to enhance continuity and model performance. Combining the U-Net-based model and the post-processing algorithm constitutes our comprehensive approach to fault detection in seismic volumes.

2.1. Ant-Tracking Seismic Attribute

Supervised learning requires labeled input data. To acquire validly labeled images for learning, it is crucial to address the bias issue when the ant-tracking algorithm is executed. There is no guarantee that the results obtained by a single operator accurately mimic the "true faults" in the seismic volume, and even the criteria for satisfactory results vary with the operator's tendencies. Figure 1 depicts fault volumes identified by different executors and parameters, highlighting the significant variations in fault trends.

The exact volume attributes that operators use to generate ant-tracking data are introduced in Figure 2, along with their adjustable parameters. Initially, structure smoothing was performed before calculating the variance from the initial seismic volume. We performed data smoothing by local averaging with a Gaussian filter; the filter's variance determines the degree of blur and thereby controls how much noise is removed from the initial data. Subsequently, ant-tracking was applied to the smoothed structure after calculating the variance, and this procedure was repeated. When computing the variance, a vertically moving triangular smoothing window and a variance filter size must be selected. The parameters were adjusted aggressively in the first ant-tracking step and passively in the next. "Possible illegal steps" is the expanded search range used when an ant agent cannot find an edge at its current location. "Legal steps required" is the number of valid steps demanded after an illegal step: if the ant agent steps illegally while searching for the fault structure, it must then encounter at least that many edge pixels. A larger possible-illegal-step value lets the agent search for faults aggressively, and a lower legal-steps-required value leads to less restrictive, more drastic connections. Following this, structure smoothing was performed once more, and values below a certain threshold were cut off to create the baseline value of −1.
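For intuition, the smoothing-then-variance step can be sketched with standard tools. This is a hedged numpy/scipy approximation, not the Petrel attribute implementation, and `sigma` and `window` are illustrative values rather than the parameters used in the study:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def smooth_then_variance(volume, sigma=1.0, window=3):
    """Gaussian structure smoothing followed by a local variance attribute.

    The Gaussian filter's sigma controls the degree of blur (noise removal);
    the variance is computed over a sliding `window`-sized neighborhood as
    E[x^2] - (E[x])^2.
    """
    smoothed = gaussian_filter(volume, sigma=sigma)
    mean = uniform_filter(smoothed, size=window)
    mean_sq = uniform_filter(smoothed ** 2, size=window)
    return np.clip(mean_sq - mean ** 2, 0.0, None)  # clip tiny negatives

rng = np.random.default_rng(0)
cube = rng.standard_normal((32, 32, 32))
var_attr = smooth_then_variance(cube)
```

High local variance marks discontinuities (candidate fault surfaces), which is why the variance volume serves as the starting attribute for the ant agents.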

As explained above, ant-tracking is not a fixed procedure with predefined steps and formulas; rather, it involves the appropriate adjustment of parameters by an operator as the work proceeds. Therefore, using a single labeled ant-tracking dataset will likely yield biased outcomes. To address this, diverse datasets, each with its own standard, were constructed by six different operators who adjusted the parameters in their own ways. Additionally, data considered too inaccurate were removed through quality control before the remaining data were averaged into a single ant-tracking label dataset. Figure 2 presents a simple workflow of the procedure.

2.2. U-Net Model Architecture

U-Net, a CNN-type architecture, is particularly effective for image segmentation tasks [11]. As shown in Figure 3, it has a distinctive U-shaped structure: the encoder (contracting path), shown on the left, captures context, and the decoder (expanding path), shown symmetrically on the right, enables precise localization. A notable feature of U-Net is its skip connections, which link layers of equal resolution in the encoder and decoder. These connections help the network propagate context information to higher-resolution layers, facilitating precise segmentation. However, because the ant-tracking algorithm is based on seismic attributes analyzed in a three-dimensional context, it is more reasonable to build the model with three-dimensional training volumes. Therefore, we used a 3D U-Net to imitate the averaged ant-tracking labels. Figure 3 illustrates the architecture of the 3D U-Net used in this research; its depth is one level less than the 2D version, a modification necessitated by GPU memory constraints.
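A minimal runnable sketch of the 3D U-Net idea (encoder, bottleneck, decoder, one skip connection) is shown below. The actual model in this study is deeper and wider, so treat the channel counts and depth here as illustrative only:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # Two 3x3x3 convolutions with ReLU, as in a standard U-Net stage.
    return nn.Sequential(
        nn.Conv3d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet3D(nn.Module):
    """Two-level 3D U-Net with a single skip connection (illustrative)."""
    def __init__(self, cin=1, base=8):
        super().__init__()
        self.enc1 = conv_block(cin, base)          # contracting path
        self.pool = nn.MaxPool3d(2)
        self.bott = conv_block(base, base * 2)     # bottleneck
        self.up = nn.ConvTranspose3d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)     # expanding path
        self.head = nn.Conv3d(base, 1, 1)          # continuous-valued output

    def forward(self, x):
        e1 = self.enc1(x)
        b = self.bott(self.pool(e1))
        # Skip connection: concatenate encoder features with upsampled ones.
        d1 = self.dec1(torch.cat([self.up(b), e1], dim=1))
        return self.head(d1)

net = TinyUNet3D()
y = net(torch.zeros(1, 1, 32, 32, 32))  # output keeps the input resolution
```

Because the output head is a plain 1×1×1 convolution with no sigmoid, the network regresses continuous ant-tracking values rather than class probabilities, matching the regression losses discussed next.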

Considering that our labels have continuous values, we used loss functions ordinarily applied in regression rather than cross-entropy, so as to follow the tendency of fault existence. The details are discussed together with the evaluation metrics. Three regression losses were considered: Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE), defined as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - t_i\right|$$

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - t_i\right)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - t_i\right)^2}$$
where yi and ti denote the output and label and N indicates the batch size.
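In code, the three losses over a batch of outputs y and labels t reduce to the following straightforward numpy sketch:

```python
import numpy as np

def mae(y, t):
    # Mean Absolute Error: equal weight to every residual.
    return float(np.mean(np.abs(y - t)))

def mse(y, t):
    # Mean Squared Error: squaring amplifies large residuals (outliers).
    return float(np.mean((y - t) ** 2))

def rmse(y, t):
    # Root Mean Squared Error: MSE restored to the label's units.
    return float(np.sqrt(mse(y, t)))

y = np.array([0.2, 0.8, 0.5])
t = np.array([0.0, 1.0, 0.5])
```

The squaring in MSE/RMSE is what makes them outlier-sensitive relative to MAE, which matters later when the losses are compared on the sparse ant-tracking labels.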

2.3. Dual-Threshold Iteration

The label data range from 0 to 1 after min-max normalization. Accordingly, the model's output values also lie approximately between 0 and 1. However, when analyzing the results, the interpreter must determine which data pertain to a fault and which do not; thus, it is natural to think in binary terms (fault or not) rather than in terms of a continuous interpretation. In other words, although the model was trained to fit continuous values, fault extraction is treated as image segmentation. Therefore, at the interpretation stage, values above a "certain value" are treated as true and the remainder as false by binarizing them to 1 and 0.

This approach raises the question of how to determine the appropriate "certain value." In most binary segmentation tasks, data are divided into two classes based on 0.5 as the threshold since the label is binary, and any value exceeding 0.5 suggests a higher likelihood of being classified as 1. However, the value of 1 in the ant-tracking data, being the result of normalization, does not represent a probability of 100%. Therefore, setting a threshold of 0.5 as a basis for division at 50% was deemed inappropriate for this study. Instead, it was considered preferable to interpret only values above a "certain upper percent" as faults in the entire structure. Unfortunately, the result contains noise, making it challenging to achieve a small number of false positives with high continuity and details.
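The top-percent binarization described above can be expressed as follows (a sketch; the 5% default mirrors the paper's empirically chosen criterion and is data-dependent):

```python
import numpy as np

def binarize_top_percent(volume, percent=5.0):
    """Keep only the top `percent` of values as fault (1); the rest are 0."""
    threshold = np.percentile(volume, 100.0 - percent)
    return (volume >= threshold).astype(np.uint8)

rng = np.random.default_rng(0)
vol = rng.random((20, 20, 20))
mask = binarize_top_percent(vol, percent=5.0)  # ~5% of voxels set to 1
```

Unlike a fixed 0.5 cutoff, the threshold here adapts to each volume's value distribution, which is the point made above about normalized values not being probabilities.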

Figure 4 illustrates that adjustment of the percentage yields varying outputs. A low threshold provides detailed information but also includes noise, as marked with the green circle. A high threshold case captures significant signals but lacks continuity and subtle faults, as is marked with the yellow circle. To overcome this trade-off, this study introduces the "Dual-Threshold Iteration" algorithm [12].

This algorithm was originally developed to extract fine blood vessels branching out from large vessels in retinal imaging. When vessels are extracted from such images, thick vessels are observed well, but fine vessels next to a main vessel may not be adequately connected or extracted. To address this, DTI is applied iteratively, progressively extending the segmented region. Initially, a high threshold is applied to eliminate noise signals that may confuse the model, so that only the main vessels remain. Subsequently, pixels that are adjacent to true values and exceed the low threshold are updated to true values. This iteration is repeated, progressively expanding the scope of adjacent pixels, until no further updates occur.

Figure 5 presents the algorithm's pseudocode with some modifications from the original version. With the target percentage defined as θ, we set a high bound value γh corresponding to the high threshold's percentage θh, and similarly a low bound value γl corresponding to the low threshold's percentage θl. The algorithm initializes to 1 all pixels that exceed the high bound γh. While iterating, pixels adjacent to true values that exceed the low bound γl are converted to true values until the target percentage θ is reached. If no pixels remain to update even though the target percentage is not satisfied, the algorithm generously expands the range of neighboring pixels to find additional fault candidates.
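Under the description above (and the pseudocode in Figure 5), DTI can be sketched roughly as follows. This is an approximation that uses scipy's binary dilation for the neighbor search and omits the paper's neighborhood-widening fallback for brevity:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def dual_threshold_iteration(vol, theta=0.05, theta_h=0.03, theta_l=0.055):
    """Seed with the strict top-theta_h cutoff (gamma_h), then repeatedly
    absorb neighbors above the loose top-theta_l cutoff (gamma_l) until
    the target fraction theta is reached or growth stops."""
    gamma_h = np.quantile(vol, 1.0 - theta_h)  # high bound value
    gamma_l = np.quantile(vol, 1.0 - theta_l)  # low bound value
    mask = vol >= gamma_h                      # reliable seeds, noise removed
    target = theta * vol.size
    while mask.sum() < target:
        # Neighbors of current true pixels that clear the low bound.
        candidates = binary_dilation(mask) & ~mask & (vol >= gamma_l)
        if not candidates.any():
            break  # the original widens the neighborhood here instead
        mask |= candidates
    return mask

rng = np.random.default_rng(1)
vol = rng.random((30, 30, 30))
mask = dual_threshold_iteration(vol)
```

Because updates can only grow outward from seeds, isolated high-valued noise below the strict cutoff never enters the mask, which is how DTI keeps continuity without admitting scattered false positives.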

3.1. Data Preparation and Implementation

We used a seismic volume (14.905 km × 12.155 km × 1844 ms) with 55 m spacing and 4 ms sampling intervals (271 [inline] × 221 [crossline] × 461 [vertical]) in this study. Owing to limited GPU resources, the model received cropped seismic data as input. First, the 3D volume was divided into sixteen smaller cubes by cutting it in the middle along the inline and crossline directions and into four parts along the vertical direction (135 × 110 × 115). Cropping into small cubes was necessary not only to fit within GPU memory but also to define distinct regions for cross-validation.
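The cropping step might look like the following sketch (dimensions as stated above; the remainder of one sample along the inline and vertical axes is truncated):

```python
import numpy as np

def crop_into_blocks(volume, splits=(2, 2, 4)):
    """Split a 3D volume into splits[0] x splits[1] x splits[2] equal blocks,
    truncating any remainder (e.g. 271 -> 2 x 135, 461 -> 4 x 115)."""
    si, sx, sz = (n // k for n, k in zip(volume.shape, splits))
    blocks = []
    for i in range(splits[0]):
        for j in range(splits[1]):
            for k in range(splits[2]):
                blocks.append(volume[i*si:(i+1)*si,
                                     j*sx:(j+1)*sx,
                                     k*sz:(k+1)*sz])
    return blocks

vol = np.zeros((271, 221, 461), dtype=np.float32)  # inline x crossline x time
blocks = crop_into_blocks(vol)  # sixteen 135 x 110 x 115 cubes
```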

Given the limited quantity of possible training data, these data were considered insufficient to train the model. Hence, data augmentation was performed to increase the diversity of the dataset. Although various augmentation schemes exist, such as shearing and blurring, only a few were used in this study: because the input is seismic data containing stratigraphic features, augmentations that create unrealistic patterns were avoided so that the network would not learn irrelevant structures. Consequently, the dataset was augmented exclusively through mirror flipping and rotation around the vertical axis, expanding the number of cubes from 16 to 128. Additionally, we randomly cropped the seismic data to a side length of ninety-six pixels in the data loader, since several factor-of-two pooling operations follow; this also enhanced the dataset's randomness and mitigated overfitting.
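A hedged sketch of the augmentation and random-crop steps is given below. The rotation is in the horizontal plane, i.e., around the vertical axis; with a non-square horizontal footprint the 90° rotations swap the inline and crossline extents, which is harmless here because a 96³ patch is cropped afterward:

```python
import numpy as np

def augment(cube):
    """Mirror flip plus 0/90/180/270-degree rotations about the vertical
    axis: 8 variants per cube (16 cubes -> 128)."""
    variants = []
    for flipped in (cube, np.flip(cube, axis=0)):
        for k in range(4):
            variants.append(np.rot90(flipped, k, axes=(0, 1)))
    return variants

def random_crop(cube, side=96, rng=None):
    """Random side^3 crop, as done in the data loader."""
    if rng is None:
        rng = np.random.default_rng()
    i, j, k = (int(rng.integers(0, s - side + 1)) for s in cube.shape)
    return cube[i:i+side, j:j+side, k:k+side]

cube = np.zeros((135, 110, 115), dtype=np.float32)
variants = augment(cube)
patch = random_crop(cube, rng=np.random.default_rng(0))
```

Both transforms preserve the vertical (stratigraphic) ordering of the data, which is the stated reason for excluding shearing and blurring.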

Despite the data augmentation strategies, the training data were still deemed insufficient to assess the model's fitness; hence, cross-validation was employed. From the sixteen distinct 135 × 110 × 115 areas, one was held out for validation and another for testing, with the remaining fourteen forming the training set. After augmentation, 112 3D volumes were used for training, with 8 volumes each allocated to the validation and testing sets. Cross-validation was conducted three times, each time training the 3D U-Net with a different combination of areas to evaluate its performance.

The experiments consist of two parts: (1) a loss function comparison between MAE, RMSE, and MSE, and (2) the application of DTI to the output of the trained model. Training was executed for up to 100 epochs on multiple NVIDIA GeForce RTX 3090 GPUs (24 GB of memory each) using a distributed data parallel package. We selected the model with the lowest validation loss as the final model. The batch size was 8 for all datasets, the learning rate was 0.0005, and the AdamW optimizer was used with the CosineAnnealingWarmRestarts learning rate scheduler [13, 14].
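The optimizer and scheduler setup can be reproduced as follows (a minimal PyTorch sketch; the `T_0` restart period and the stand-in one-layer model are assumptions not specified in the text):

```python
import torch

model = torch.nn.Conv3d(1, 1, 3, padding=1)  # stand-in for the 3D U-Net
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10)  # T_0 (epochs per restart) is an assumed value

x = torch.zeros(2, 1, 8, 8, 8)  # toy batch
for epoch in range(3):  # the paper trains for up to 100 epochs
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), torch.zeros_like(x))
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine-annealed learning rate with warm restarts
```

The warm-restart schedule periodically resets the learning rate to its initial value, which can help the model escape shallow minima during long training runs.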

3.2. Evaluation Metrics

The ultimate purpose of this study was to emulate the "reliable" overall ant-tracking process, which produces values ranging approximately from 0 to 1, using a deep learning model. For this reason, the evaluation metrics needed to be slightly more complex. For ML-based models trained on synthetic or manually picked fault data, the label values are binary, and evaluating performance is straightforward: the logits are binarized at 0.5, and the model is assessed with metrics such as Accuracy, Precision, or F1 score.

However, one minor drawback of ant-tracking is that its result does not directly represent the probability of a fault's presence at that location. Thus, as briefly mentioned in Section 2.2, the task should be interpreted as an image regression problem that mimics continuous labels. Yet, as noted in Section 2.3, it is ultimately necessary to decide whether a structure is a fault, which requires interpretation as an image segmentation problem as well. Consequently, this study evaluates both image regression (where the label is continuous) and image segmentation (after binarizing the label).

For the numerical analysis of performance, we used seven metrics. The MAE, RMSE, Root Mean Squared Log Error (RMSLE), and R² score were compared before binarizing the target, while the F1, Optimal Dataset Scale (ODS), and Optimal Image Scale (OIS) scores were measured on the binarized target. The formulas for the newly introduced metrics are as follows:

$$\mathrm{RMSLE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\log(1+y_i) - \log(1+t_i)\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - t_i\right)^2}{\sum_{i=1}^{N}\left(t_i - \bar{t}\right)^2}$$

$$\mathrm{Precision}_{\gamma} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall}_{\gamma} = \frac{TP}{TP + FN}$$

$$F1_{\gamma} = \frac{2 \cdot \mathrm{Precision}_{\gamma} \cdot \mathrm{Recall}_{\gamma}}{\mathrm{Precision}_{\gamma} + \mathrm{Recall}_{\gamma}}$$

$$\mathrm{ODS} = \max_{\gamma} F1\!\left(y_{\gamma},\, t_{\mathrm{bin}}\right)$$

$$\mathrm{OIS} = \frac{1}{M}\sum_{m=1}^{M}\max_{\gamma} F1\!\left(y_{m,\gamma},\, t_{m,\mathrm{bin}}\right)$$
where yi and ti denote the output and target and N indicates the batch size. The value t̄ in the R² score is the mean of the target data. An R² score of 1 means that every output pixel matches the target, and a score of 0 indicates performance equivalent to predicting every pixel as the target mean.

Equations (8)-(10) are expressed in terms of the confusion matrix after binarizing with threshold γ, where yi,γ denotes the binarized output. True Positive (TP) and True Negative (TN) denote the numbers of correctly segmented fault and nonfault pixels, respectively. False Positive (FP) and False Negative (FN) denote the numbers of pixels incorrectly predicted as faults and nonfaults. The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 (perfect precision and recall) and its worst at 0, and provides a balanced measure of performance under class imbalance or excessive outliers. ODS and OIS are standard methods for evaluating edge detection performance with imbalanced classes [15]. Using a binarized target, these two metrics evaluate the model by binarizing its output at different thresholds. tbin denotes the target ant-tracking label binarized at the upper 5%. The target label was binarized at the top 5% simply because this visually appeared to be the point that best distinguishes fault from nonfault regions; the value must be determined empirically for the given data environment, and the 5% criterion may vary with the executor's perception or the target dataset. Figure 4 shows the difference between various percentage criteria.
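Concretely, ODS shares one binarization threshold across the whole dataset while OIS picks the best threshold per image. A hedged sketch:

```python
import numpy as np

def f1_score(pred_bin, target_bin):
    # F1 from the confusion matrix: 2TP / (2TP + FP + FN).
    tp = np.logical_and(pred_bin, target_bin).sum()
    fp = np.logical_and(pred_bin, ~target_bin).sum()
    fn = np.logical_and(~pred_bin, target_bin).sum()
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def ods_ois(outputs, targets_bin, thresholds):
    """ODS: best single threshold shared across the dataset.
    OIS: per-image best threshold, averaged over images."""
    ods = max(np.mean([f1_score(o >= g, t)
                       for o, t in zip(outputs, targets_bin)])
              for g in thresholds)
    ois = np.mean([max(f1_score(o >= g, t) for g in thresholds)
                   for o, t in zip(outputs, targets_bin)])
    return float(ods), float(ois)

# Toy example: a perfect prediction yields ODS = OIS = 1.
target = np.zeros((8, 8), dtype=bool)
target[2:5, 2:5] = True
ods, ois = ods_ois([target.astype(float)], [target],
                   thresholds=[0.25, 0.5, 0.75])
```

By construction OIS ≥ ODS, since letting each image choose its own threshold can only match or improve the shared-threshold score.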

4.1. Determination of Loss Functions

We compared models trained with three loss functions: MAE, RMSE, and MSE. The metrics measured for each of the three cross-validation cases, along with their averages, are organized in Table 1. Excluding the MAE as an evaluation index, the models trained with MSE loss outperformed the others in most cases. Models trained with MAE loss naturally achieved a particularly low MAE value, as they were trained to reduce that loss; however, the other metrics measuring overall performance did not improve relative to the other loss functions. The RMSE loss function performed well in several cases, but on average, MSE was dominant. A drawback of using deep learning models for fault extraction is the occurrence of noisy, dot-like structures that are recognized as faults despite not being faults, producing numerous false positives in the numerical interpretation [11]. From this perspective, the improved ODS and OIS scores, which reward fewer false positives, suggest that the model is converging in a positive direction.

MAE is robust against outliers because it allocates equal weight to each error. This can help prevent overfitting, since learning on outliers would cause the model to perform poorly on unseen data. In other words, when outliers are few and uninformative, MAE is appropriate for building a model that is less affected by them. MSE and RMSE, by contrast, are sensitive to outliers, so the loss is strongly influenced by the error they introduce and the model generalizes by taking them into account. Neither behavior is appropriate in all cases: sometimes it is essential to ignore outliers with MAE loss, and sometimes it is necessary to account for them with MSE or RMSE loss [16]. Figure 6 shows the data distribution of the ant-tracking result for the entire volume and a randomly selected slice. Because both plots have a log-scaled y-axis, the linear decrease observed as the value increases implies an exponential decrease on a linear scale. In fact, using the ant-tracking result directly as a training label is challenging because of the low percentage of meaningful faults: fault structures occupy a small proportion of the entire 3D volume, making it difficult for the model to learn patterns from the seismic data. It can therefore be inferred that MSE loss enabled effective learning on ant-tracking data in which significant values form a small proportion of the overall structure. RMSE also performed notably, since it likewise accounts for outlier patterns, but on average MSE was superior.

One instance each of the input, ground truth, and model results with the MAE, RMSE, and MSE losses for the different cross-validation cases is plotted in Figure 7. The figure shows a single slice of the 3D cube, where fault structures are extracted more clearly in the MSE case. The MSE loss function improves the continuity of significant signals, as indicated by the green circles, and yields results more similar to the label. The blue circles in cross-validation case 2 show that the MSE loss function can capture intermediate-level signals that the others cannot. In Figure 8, however, some blurred seismic noise, indicated by the red circles, still remains in the MSE cases. This appears to stem from MSE's emphasis on outliers, as mentioned above. Such noise necessitates alternative methods for its removal.

As mentioned earlier, the 5% criterion was selected based on visual appearances. Therefore, if the ODS and OIS metrics rely too heavily on the target binarization threshold (tbin), it could compromise the reliability of these metrics. To address this concern, we conducted an additional experiment using different criteria (top 8% and 2%) to demonstrate the effectiveness of the ODS and OIS metrics. The results, as presented in Table 2, empirically show that varying the criterion does not significantly alter the trend of the scores.

4.2. Dual-Threshold Iteration Application

We used DTI to reduce the noise while enhancing the connectivity of fault structures. We applied DTI to the MSE-loss case, which showed the best overall performance. To implement DTI, low and high thresholds were set around the top-5% target percentage. The experiment compares two scenarios of DTI application: one with the top 5.5% and top 3% as the low and high thresholds, and another with the top 6% and top 2%. The exact threshold percentages were determined empirically, guided by the visually assessed 5% criterion. A simple guideline for choosing them: a stricter high threshold initially focuses on the most reliable fault structures, effectively reducing noise in the intermediate stages, but may terminate the iteration loop before the target percentage is reached. Conversely, a generous low threshold easily finds additional true pixels but may terminate the loop too quickly, possibly within one or two iterations. This understanding motivates pairing thresholds around the benchmark of 5, such as 6/2 and 5.5/3, where extreme values are balanced against each other.

Table 3 compares these two DTI cases with the simple upper-5% binarized output using the F1 score. Generally, applying DTI nearly doubled the F1 score compared to not applying it. Based on equation (8), the change in F1 score indicates an improvement in precision resulting from a significant decrease in false positives, addressing the main challenge of automated fault detection.

Additionally, to understand the extent to which DTI enhanced the connectivity while reducing the amount of negligible noise, the number of 3D-connected components was compared. This measure counts the total number of fault clusters that can be considered as one group within the binarized 3D cube, revealing that the number of fault clusters was reduced by more than half in the cases where DTI was applied, as seen in Table 4. This suggests that using DTI somewhat compensated for the phenomenon where structures that likely would have been connected in the output were interrupted. Overall, cases in which DTI was applied delivered improved performance compared to those that did not.
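The 3D-connected-components count used here can be computed with scipy's labeling (a sketch; the paper does not state which connectivity was used, so the default 6-connectivity is an assumption):

```python
import numpy as np
from scipy.ndimage import label

def count_components(mask):
    """Number of connected fault clusters in a binarized 3D cube."""
    _, n_clusters = label(mask)  # default: 6-connectivity in 3D
    return n_clusters

mask = np.zeros((10, 10, 10), dtype=bool)
mask[1:3, 1:3, 1:3] = True  # one 2x2x2 cluster
mask[7, 7, 7] = True        # one isolated noise voxel
```

A lower count after DTI indicates that fragments have been merged into continuous faults and isolated noise voxels removed, which is exactly the effect reported in Table 4.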

Figures 9-11 display, for each of the three cross-validation cases, the seismic input, label, model output, high-threshold-only case, low-threshold-only case, and DTI result. Note that there are two significantly different objectives: "reduction of noise" and "similarity to the label." Table 3 compares the F1 score, which relates to "similarity to the label" by accounting for false-positive pixels; Table 4 compares the number of 3D-connected components as an indirect indicator of "reduction of noise." Figures 9-11 illustrate both objectives via 2D slices for each cross-validation case. While both metrics matter, maximizing both simultaneously is impossible because the label itself contains noise, as shown in parts (b) and (c) of Figures 9-11: a result leaving only the main stem with zero noise would not resemble the label, while a perfectly imitated result would inevitably contain noise. Thus, achieving high but not extreme values on both dimensions is essential, since both objectives are valuable. Although this paper aims to mimic less-biased ant-tracking data, extracting faults while excluding abnormal noise is also critical. Given this dilemma, prescribing a single DTI percentage combination is not appropriate: applied moderately, DTI converges toward the label while adequately reducing noise; applied aggressively, it mitigates noise significantly but departs slightly from the label.

The aim of our study was to enhance the performance of the existing ant-tracking method, which has two major drawbacks. First, applying numerous attributes individually allowed excessive operator bias into the output and made the process overly complex. Second, an excessive number of FPs were obtained, meaning that structures were interpreted as seismic faults when they were not. To overcome these drawbacks, we constructed a 3D U-Net model trained on a single ground truth, an average of the various ant-tracking results that could arise from operators' actions, thereby minimizing bias as much as possible. This approach eliminated unnecessary tasks and enabled the acquisition of less-operator-biased fault structures. We built this model with the MSE loss function, which showed better performance across various metrics in cross-validation tests. Furthermore, to lower the number of FPs, we applied the DTI algorithm to decrease intermediate-level noise and enhance the connectivity of faults; the reduction of noise signals was illustrated by comparing F1 scores and 3D-connected components. Consequently, we demonstrated that an ML-based model can function effectively as a fault detector without training on numerous simple synthetic datasets: to replace synthetic data, we used stacked ant-tracking results with minimized operator bias, and to improve precision, we utilized DTI to reduce the quantity of excessive false positives.

In terms of the general applications of our study, directly applying our trained model may not be suitable because its training was from a single area without validation across different regions. We suggest that a more beneficial approach would be to adopt the model’s architecture and our method of generating complex labels through ant-tracking. Conversely, our DTI algorithm is versatile and can be integrated into any fault extraction methodology during the post-processing phase. Further research is needed to employ the model training with additional distinctive data and apply the trained model when seismic data from entirely different regions are used as input. Additionally, beyond the use of the ant-tracking algorithm, other seismic attributes like chaos or semblance could also serve as labels for training the model, offering the potential for a broader range of applications.

Authors do not have permission to share the original data.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

We thank SLB for providing the Petrel license. This research was partially supported by the Korea National Oil Company.

Exclusive Licensee GeoScienceWorld. Distributed under a Creative Commons Attribution License (CC BY 4.0).