Seismic interpretations are, by definition, subjective and often require significant time and expertise from the interpreter. We are convinced that machine-learning techniques can help address these problems by performing seismic facies analyses in a rigorous, repeatable way. For this purpose, we use state-of-the-art 3D broadband seismic reflection data of the northern North Sea. Our workflow includes five basic steps. First, we extract seismic attributes to highlight features in the data. Second, we perform a manual seismic facies classification on 10,000 examples. Third, we use some of these examples to train a range of models to predict seismic facies. Fourth, we analyze the performance of these models on the remaining examples. Fifth, we select the “best” model (i.e., highest accuracy) and apply it to a seismic section. As such, we highlight that machine-learning techniques can increase the efficiency of seismic facies analyses.

Seismic reflection data are a key source of information in numerous fields of geoscience, including sedimentology and stratigraphy (e.g., Vail, 1987; Posamentier, 2004), structural geology (Baudon and Cartwright, 2008; Jackson et al., 2014), geomorphology (e.g., Posamentier and Kolla, 2003; Cartwright and Huuse, 2005; Bull et al., 2009), and volcanology (e.g., Hansen et al., 2004; Planke et al., 2005; Magee et al., 2013). However, the often subjective and nonunique interpretation of seismic reflection data has led to longstanding debates based on contrasting geologic interpretations of the same or similar data sets (e.g., Stewart and Allen, 2002; Underhill, 2004). Moreover, seismic interpretations require significant amounts of time, experience, and expertise from interpreters (e.g., Bond et al., 2012; Bond, 2015; Macrae et al., 2016). We believe that machine-learning techniques can help interpreters reduce some of these problems associated with seismic facies analyses.

Machine learning describes a set of computational methods that are able to learn from data to make accurate predictions. Previous applications of machine learning to seismic reflection data focus on the detection of geologic structures, such as faults and salt bodies (e.g., Hale, 2013; Zhang et al., 2014; Guillen et al., 2015; Araya-Polo et al., 2017; Huang et al., 2017) and unsupervised seismic facies classification, in which an algorithm chooses the number and types of facies (e.g., Coléou et al., 2003; de Matos et al., 2006). Although early studies primarily used clustering algorithms to classify seismic data (e.g., Barnes and Laughlin, 2002; Coléou et al., 2003), recent studies focus on the application of artificial neural networks (e.g., de Matos et al., 2006; Huang et al., 2017). To demonstrate the strength of these advanced algorithms, this study compares 20 different classification algorithms (e.g., K-nearest neighbor, support vector machines, and artificial neural networks).

Although unsupervised classification algorithms are, in theory, able to identify the main seismic facies in a given data set, in practice, it can be difficult to correlate these automatically classified facies to existing geologic units. This correlation can be done by visualizing facies in a lower-dimensional space (e.g., Gao, 2007) or with self-organizing maps (e.g., Coléou et al., 2003). As an alternative, we introduce a simple supervised machine-learning workflow, in which the user defines the number and type of seismic facies used for classification. This approach avoids the correlation step because the user can adapt the workflow to a given problem and base the seismic facies on existing geologic units.

To demonstrate the advantages of this approach, we describe the application of supervised machine learning to a seismic facies analysis using 3D broadband seismic reflection data of the northern North Sea. Our workflow consists of five basic steps. First, we extract features from the data by calculating 15 seismic attributes. Second, we generate training data by manually sorting 10,000 examples into four facies. Third, we train 20 models to classify seismic facies using some of these examples. Fourth, we assess the performance of these models using the remaining examples. Fifth, we select the best model based on its performance and apply it to a seismic section. Our results demonstrate that machine-learning algorithms are able to perform seismic facies analyses, which are crucial for mapping sedimentary sequences, structural elements, and fluid contacts.

This study uses state-of-the-art 3D broadband seismic reflection data (CGG Broadseis™) of the northern North Sea (Figure 1). The data cover an area of 35,410 km² and were acquired using a series of up to 8-km-long streamers towed 40 m deep. The recording extends to 9 s with a time sampling of 4 ms. Broadseis data cover a wide range of frequencies, from 2.5 to 155 Hz (Firth et al., 2014). The bin size was 12.5 × 18.75 m. The data were 3D true-amplitude Kirchhoff prestack time migrated. The seismic volume was zero-phase processed with SEG normal polarity; i.e., a positive reflection (white) corresponds to an acoustic-impedance increase with depth.

Supervised machine learning requires a subset of the data for training and testing models. We therefore select a reasonable number (10,000) of examples to perform a manual seismic facies classification. This number represents a trade-off between the time required for manual classification and the achieved model accuracy (>0.95). After testing different sizes (from 10×10 to 500×500 samples), we selected an example size of 100×100 samples (Figure 2), which results in high model accuracies (>0.95). The classification follows standard schemes for seismic facies developed based on numerous studies (e.g., Brown, 2004; Bacon et al., 2007; Kearey et al., 2009). The four facies (i.e., classes) that we use for classification are: (A) continuous, horizontal reflectors, (B) continuous, dipping reflectors, (C) discontinuous, crisscrossing reflectors, and (D) discontinuous, chaotic reflectors (Figure 2). These four are probably the most common basic seismic facies. Because almost all geologic structures show at least one of these facies in seismic reflection data, classifying them accurately would allow us to map a wide range of structures.

To classify seismic facies, we apply a typical machine-learning workflow (e.g., Abu-Mostafa et al., 2012). The basic idea of this workflow is to “teach” a model to identify seismic facies in a seismic section. Our workflow includes the following steps: (1) feature extraction, (2) training, (3) testing, (4) model selection, and (5) application (Figure 3).

Feature extraction

Feature extraction aims to obtain as much information as possible about the object of investigation. For this purpose, we extract so-called features, i.e., properties that describe the object we study. Here, this object is a seismic section (Figure 1) and the features are statistical properties of seismic attributes inside a moving window. Because seismic attributes have been specifically designed to highlight certain characteristics of seismic reflection data (see Randen and Sønneland, 2005; Chopra and Marfurt, 2007), they are well-suited features. After examining all seismic attributes available in Schlumberger Petrel 2015©, we extract 15 attributes that allow accurate seismic facies predictions (see Table 1; Figure 4). Seismic-attribute extraction typically involves nonlinear transformations (e.g., Hilbert transformation) of the original seismic data. As such, we can describe these calculations by:
$$X_i = T_i(X_0) \tag{1}$$
where $X_0$ is the original data, $T_i$ are the transformations, and $X_i$ are the resulting seismic attributes, which we normalize.
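As a minimal sketch of one such transformation, the snippet below computes the envelope (instantaneous amplitude) of each trace with a Hilbert transform and normalizes the result. The attributes in this study were extracted in Petrel; the SciPy call, the synthetic section, the function names, and the zero-mean, unit-variance normalization are illustrative assumptions, not the implementation used here.

```python
import numpy as np
from scipy.signal import hilbert


def envelope_attribute(section):
    """Envelope (instantaneous amplitude) of each trace via the Hilbert transform."""
    # hilbert() returns the analytic signal; axis 0 is assumed to be the time axis.
    analytic = hilbert(section, axis=0)
    return np.abs(analytic)


def normalize(attribute):
    """Rescale an attribute to zero mean and unit variance (one possible choice)."""
    return (attribute - attribute.mean()) / attribute.std()


# Synthetic stand-in for a seismic section: 256 time samples x 64 traces.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((256, 64))

raw_envelope = envelope_attribute(X0)   # plays the role of T_i(X0)
X1 = normalize(raw_envelope)            # normalized attribute X_i
print(X1.shape)
```

Any other attribute could replace the envelope here as long as it maps the section to an array of the same shape.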
Although this process provides a value at each point of the data, the nature of the seismic data requires an additional processing step. The seismic data, and therefore its attributes, contain numerous small-scale variations, which only in combination form a seismic facies. This phenomenon is captured by calculating a series of statistics inside a moving window (100×100 samples) from these attributes. These statistics are the features that we use for machine learning. Mathematically, we can describe this process as a deconstruction of the seismic attribute matrices ($X_i$) into a large number of matrices ($X_{ij}$) for each window:
$$X_i \rightarrow \{X_{ij}\} \tag{2}$$
In each window, we calculate a series of statistics, i.e., the features ($f_{ij}$):
$$f_{ij} = S(X_{ij}) \tag{3}$$
where $S$ stands for the set of statistics. The statistics that we use include the (1) 20th percentile, (2) 80th percentile, (3) mean, (4) standard deviation, (5) standard error of the mean, (6) skewness, and (7) kurtosis.
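The window statistics can be sketched as follows, assuming the attributes are stored as a NumPy array of shape (attributes, samples, traces). The function names, the `step` parameter, the sample (`ddof=1`) standard deviation, and the excess-kurtosis convention are illustrative assumptions; the text does not specify them.

```python
import numpy as np
from scipy import stats


def window_features(window):
    """Return the seven statistics listed above for one window of one attribute."""
    values = window.ravel()
    return np.array([
        np.percentile(values, 20),   # (1) 20th percentile
        np.percentile(values, 80),   # (2) 80th percentile
        values.mean(),               # (3) mean
        values.std(ddof=1),          # (4) standard deviation (sample estimate)
        stats.sem(values),           # (5) standard error of the mean
        stats.skew(values),          # (6) skewness
        stats.kurtosis(values),      # (7) kurtosis (excess kurtosis by default)
    ])


def extract_features(attributes, size=100, step=50):
    """Slide a size x size window over every attribute and stack the statistics.

    attributes: array of shape (n_attributes, n_samples, n_traces).
    Returns an array of shape (n_windows, 7 * n_attributes).
    """
    n_attr, n_rows, n_cols = attributes.shape
    features = []
    for r in range(0, n_rows - size + 1, step):
        for c in range(0, n_cols - size + 1, step):
            per_attr = [window_features(a[r:r + size, c:c + size]) for a in attributes]
            features.append(np.concatenate(per_attr))
    return np.array(features)


rng = np.random.default_rng(1)
attributes = rng.standard_normal((3, 200, 300))   # three synthetic attributes
F = extract_features(attributes)
print(F.shape)   # (15, 21): 3 x 5 window positions, 3 attributes x 7 statistics
```

With 15 attributes instead of three, the same layout would give 105 features per window if every statistic is computed for every attribute.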


Using a large number of features can result in overfitting whereby an overly complex model describes random errors or noise in the data. To avoid overfitting, we regularize our models, when possible, during training. Training during machine learning usually involves the minimization of the in-sample error, i.e., the difference between the predicted (f(X)) and the actual (y) result:
$$\min_f \, \lvert y - f(X) \rvert \tag{4}$$
Regularization introduces an additional constraint on the set of models:
$$\min_f \, \lvert y - f(X) \rvert + \lambda R(f) \tag{5}$$
where $\lambda$ is the regularization parameter and $R(f)$ is the penalty function. The regularization parameter was selected based on a trade-off between model accuracy and simplicity during training (see Figure 5). Although we conduct no explicit feature selection, regularization can be regarded as an implicit method to constrain features.
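The trade-off governed by the regularization parameter can be explored with a simple scan, sketched below on synthetic data. In scikit-learn's `SVC`, the parameter `C` is inversely proportional to the regularization strength, so a small `C` corresponds to a large λ. The data set, the candidate values, and the fivefold split are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic four-class problem standing in for the facies features.
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Small C = strong regularization (simpler model); large C = weak regularization.
results = {}
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    scores = cross_val_score(SVC(kernel="rbf", C=C), X, y, cv=5)
    results[C] = scores.mean()
    print(f"C = {C:>6}: accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```

A value near the point where accuracy stops improving is one reasonable compromise between accuracy and simplicity.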


During training, we fit 20 models to classify seismic facies using training data from our manual interpretation. Training itself involves the minimization of the in-sample error, i.e., the difference between the predicted (f(X)) and the known result (y) (see equation 4). Because we distinguish between four seismic facies, we conduct a multiclass classification in which the model output comprises four discrete classes (A, B, C, and D). Although some classifiers can inherently handle multiclass problems, binary classifiers require one-versus-all or one-versus-one strategies to predict more than two classes. By covering the most common algorithms used for multiclass classification (see Table 2), we are able to compare their performance on this data set (Figure 5).
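The two strategies for binary classifiers can be illustrated with scikit-learn's multiclass wrappers. This is a sketch on synthetic data; the choice of `LinearSVC` as the binary classifier and the random labels A to D are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
facies = np.array(["A", "B", "C", "D"])[y]   # four discrete classes

# One-versus-all: one binary classifier per class.
ovr = OneVsRestClassifier(LinearSVC(max_iter=5000)).fit(X, facies)
# One-versus-one: one binary classifier per pair of classes.
ovo = OneVsOneClassifier(LinearSVC(max_iter=5000)).fit(X, facies)

print(len(ovr.estimators_))   # 4
print(len(ovo.estimators_))   # 6 = 4 * 3 / 2
```

With four facies, one-versus-all trains four binary models and one-versus-one trains six, so the cost difference is small here but grows with the number of classes.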

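To improve the performance, we explore different kernels for some of the algorithms (see Table 2). Classification problems often become easier when we transform a feature ($x \in \mathbb{R}^n$) into a high-dimensional space ($\varphi(x) \in \mathbb{R}^m$). However, explicit feature transformations ($\varphi: \mathbb{R}^n \rightarrow \mathbb{R}^m$) can be computationally expensive. Kernel functions ($K$) allow an implicit use of these high-dimensional spaces by calculating inner products between feature pair images:
$$K(x, x') = \langle \varphi(x), \varphi(x') \rangle \tag{6}$$
As such, kernels allow us to use high-dimensional feature spaces without specifying them or the explicit transformation. Here, we use polynomial ($K_{\mathrm{pol}}$) and radial basis kernel functions ($K_{\mathrm{rbf}}$):
$$K_{\mathrm{pol}}(x, x') = (\gamma \langle x, x' \rangle + r)^d \tag{7}$$
$$K_{\mathrm{rbf}}(x, x') = \exp(-\gamma \lVert x - x' \rVert^2) \tag{8}$$
with polynomial degree $d$, scale parameter $\gamma$, and offset $r$, in combination with support vector machines, Gaussian process, and neural network classifiers (see Table 2).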
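A small numerical check makes the implicit use of the feature space concrete. The sketch below assumes the common parameterization of the two kernels with a scale `gamma`, an offset `r`, and a degree `d` (these symbols and the two-dimensional example are assumptions for illustration). It shows that a quadratic polynomial kernel equals the inner product of explicitly transformed features.

```python
import numpy as np


def poly_kernel(x, z, gamma=1.0, r=0.0, d=2):
    """Polynomial kernel (gamma * <x, z> + r) ** d."""
    return (gamma * np.dot(x, z) + r) ** d


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))


def phi(x):
    """Explicit feature map of the homogeneous quadratic kernel in two dimensions."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

implicit = poly_kernel(x, z)          # evaluated in the original 2D space
explicit = np.dot(phi(x), phi(z))     # evaluated in the transformed 3D space
print(implicit, explicit)             # both equal 1.0 up to rounding
print(rbf_kernel(x, z))               # exp(-13)
```

A cubic kernel corresponds to `d=3` in this parameterization; its explicit feature space is larger, which is where the implicit evaluation saves work.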


During validation, we determine the model performances on yet unseen data. A simple holdout validation splits the data ($X$) into two subsets: one for training ($X_{\mathrm{train}}$) and one for testing ($X_{\mathrm{test}}$). This approach, however, leads to a dilemma because we would like to maximize both subsets: the training set to generate well-constrained models and the test set to obtain reliable estimates of model performance. This dilemma is resolved by cross-validation, i.e., splitting the data multiple times and averaging performance estimates between folds. We apply a tenfold stratified cross-validation in which the data are split into a training set (90% of the data) and a test set (10% of the data) 10 times while preserving the percentage of examples of each class. To visualize model performances, we calculate an average confusion matrix of each model (Figure 6).
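A tenfold stratified cross-validation with an averaged confusion matrix might look like the sketch below. The synthetic data, the cubic support vector machine, the shuffling, and the row-wise normalization of the confusion matrix are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Ten folds, each keeping the class proportions of the full data set.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
matrices, accuracies = [], []
for train_idx, test_idx in cv.split(X, y):
    model = SVC(kernel="poly", degree=3).fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])
    # Rows are true classes; normalize="true" makes every row sum to 1.
    cm = confusion_matrix(y[test_idx], y_pred, labels=[0, 1, 2, 3], normalize="true")
    matrices.append(cm)
    accuracies.append(np.mean(y_pred == y[test_idx]))

mean_cm = np.mean(matrices, axis=0)
print(np.round(mean_cm, 2))
print(f"accuracy = {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")
```

Each example appears in exactly one test fold, so every example contributes once to the averaged matrix.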

To quantify the model performance, we calculate (1) precision, (2) recall, and (3) f1 score of each class, and their averages for each model (see Table 3). Precision is the fraction of examples assigned to a class that truly belong to it; recall (or sensitivity) is the fraction of examples of a class that the classifier finds; the f1 score is the equally weighted harmonic mean of precision and recall; and support is the number of examples of each class. Furthermore, we calculate the average accuracy of each model and determine its standard deviation between folds (see Table 3). Regularization, training, and cross-validation were implemented in Python using the scikit-learn package (Pedregosa et al., 2011).
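These metrics follow directly from a confusion matrix of counts, as the sketch below shows. The numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this study.

```python
import numpy as np

# Hypothetical confusion matrix of counts: rows = true facies, columns = predicted.
cm = np.array([
    [48,  2,  0,  0],   # A
    [ 3, 45,  1,  1],   # B
    [ 0,  1, 47,  2],   # C
    [ 0,  2,  3, 45],   # D
])

tp = np.diag(cm)                      # correctly classified examples per class
precision = tp / cm.sum(axis=0)       # correct / all examples predicted as the class
recall = tp / cm.sum(axis=1)          # correct / all examples truly in the class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)              # number of true examples per class
accuracy = tp.sum() / cm.sum()

for name, p, r, f, s in zip("ABCD", precision, recall, f1, support):
    print(f"{name}: precision = {p:.3f}, recall = {r:.3f}, f1 = {f:.3f}, support = {s}")
print(f"accuracy = {accuracy:.3f}")   # 0.925
```

In this hypothetical case, Facies A has a recall of 48/50 = 0.96 but a precision of 48/51 ≈ 0.94, because three Facies B examples were assigned to it.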

Model selection

Model selection is based on the generalization performance of trained models on test data. In our case, the model using a support vector machine with a cubic kernel function shows the highest accuracy, precision, and recall of all models (see Table 3; Figure 7). This means that this model not only classifies seismic facies most accurately but is also the best at avoiding incorrect classifications (Figure 6).
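Selecting the model with the highest cross-validated accuracy can be sketched as follows. The three candidates and the synthetic data are illustrative assumptions; the study itself compares 20 models.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

candidates = {
    "k-nearest neighbors": KNeighborsClassifier(),
    "support vector machine (cubic)": SVC(kernel="poly", degree=3),
    "support vector machine (rbf)": SVC(kernel="rbf"),
}

# Use the same folds for every candidate so the scores are comparable.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
summary = {name: cross_val_score(model, X, y, cv=cv)
           for name, model in candidates.items()}

for name, scores in summary.items():
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

best_name = max(summary, key=lambda name: summary[name].mean())
print("selected:", best_name)
```

Which candidate wins on this synthetic data has no bearing on the ranking in Table 3; the point is only the selection mechanism.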


After the model selection, it is recommended to train the best model again using the entire data set available, i.e., training plus test data (Abu-Mostafa et al., 2012). This final model is subsequently applied to the entire seismic section (Figure 7).
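The final step can be sketched as refitting the selected model on all labeled examples and predicting one facies per window position of the section. The labeled data, the grid of window positions, and the random section features below are synthetic placeholders, and all variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# All labeled examples (training plus test data), here synthetic.
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Refit the selected model on the entire labeled data set.
final_model = SVC(kernel="poly", degree=3).fit(X, y)

# Hypothetical section: one feature vector per window position on a 30 x 40 grid.
rng = np.random.default_rng(0)
n_rows, n_cols = 30, 40
section_features = rng.standard_normal((n_rows * n_cols, X.shape[1]))

# Predict a class index per window and map it to the facies labels A to D.
predicted = final_model.predict(section_features)
facies_map = np.array(["A", "B", "C", "D"])[predicted].reshape(n_rows, n_cols)
print(facies_map.shape)   # (30, 40)
```

Note that the refitted model has no held-out data left, so its expected accuracy is the cross-validated estimate obtained before refitting.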


Test results obtained during cross-validation include confusion matrices and descriptive metrics. Confusion matrices visualize the precision of models for each class (Figure 6). When the model predicts the correct class, the sample contributes to the diagonal of the confusion matrix. When the model predicts the wrong class, the sample contributes to an off-diagonal cell. The first element of the confusion matrix (top left) shows the precision for the first class (i.e., horizontal), the second element (first row, second column) shows the percentage of samples classified into the second class (i.e., dipping) despite belonging to the first class (i.e., horizontal), and so on. As such, confusion matrices show how well each model predicts each class. In general, the observed variations between models and classes are minor (<0.1), with the exception of the Gaussian process (cubic) model (Figure 6).

To quantify these differences, we calculate a set of metrics for each model and each class (see Table 3). On the class level, these metrics include (1) precision (i.e., ability of correct classification), (2) recall (i.e., ability of complete classification), and (3) f1 score (i.e., harmonic mean of the former two). These metrics are lowest for Facies B (i.e., continuous, dipping reflections) and Facies D (i.e., discontinuous, chaotic reflections) for almost all models (Table 3). To assess the overall model performance, we calculate averages of these metrics for each model as well as the accuracy and its standard deviation between folds for each model. Table 3 shows all trained models sorted from highest (0.983) to lowest accuracy (0.735). These results are consistent because the model with the highest accuracy (support vector machine (cubic)) also shows the highest precision, recall, and f1 score.


Applying the best model (support vector machine (cubic)) to the entire seismic section produces the final results shown in Figure 7. The results show that the model is, to first order, able to identify the seismic facies that we use for training: (A) continuous, horizontal reflectors; (B) continuous, dipping reflectors; (C) discontinuous, crisscrossing reflectors; and (D) discontinuous, chaotic reflectors in the seismic section. In general, the model is able to distinguish the sedimentary succession (Facies A and B) from the basement (Facies C and D). Within the sedimentary succession, the model is also able to identify some details, such as (1) the vertical artifact on the left side of the section, (2) dipping reflectors associated with the artifact, (3) clastic remobilizations, and (4) fault-propagation folds of the major tectonic faults. Within the basement, the model succeeds in identifying areas in which strong reflectors crisscross each other (Facies C).

This study demonstrates the applicability of machine-learning techniques to seismic facies analyses. By applying a supervised machine-learning workflow to state-of-the-art seismic reflection data, we are able to classify the main seismic facies in a seismic section (Figure 7). In contrast to previous studies, which often analyze facies at each point of a data set (see the review by Zhao et al., 2015), our classification is based on a window around each point. This process is equivalent to a deconstruction of the data set into a large number of matrices, each corresponding to one point in the data set. This approach mimics manual facies interpretations, which consider patterns within a certain area, rather than at a specific point. Although this approach might not resolve all details, it is probably more robust to noise.

Although previous studies focused on unsupervised machine learning (e.g., Coléou et al., 2003; de Matos et al., 2006), this study applies supervised learning, i.e., a classification based on predefined seismic facies. This approach has several advantages when it comes to the final geologic interpretation. The seismic facies derived by unsupervised learning can often be difficult to associate with existing geologic units or structures. In fact, the nature of these automatically extracted facies can remain completely unknown to the interpreter, introducing ambiguities and doubts into the interpretation. In contrast, supervised learning allows us to decide on the number and types of seismic facies to include. This allows us to adapt our analysis to map a wide range of geologic units and structures, such as salt, sand, and magmatic bodies as well as folds, faults, and shear zones.

One may argue that the supervised learning approach reintroduces an element of subjectivity to the analysis. However, this argument presupposes that unsupervised learning is entirely objective, which is not the case: even unsupervised learning requires the selection of certain criteria (e.g., the number of classes, choice of objective function). A more accurate objection is therefore that supervised learning increases the degree of subjectivity in the analysis. Although this appears to be the case, subjectivity is not a problem per se; in many cases, it is desirable to include expert input as long as the applied workflow remains repeatable. As long as other researchers are able to replicate a workflow and reach the same results, we maintain repeatability. This is a significant improvement over conventional seismic interpretations, in which multiple researchers can reach different conclusions applying seemingly the same method.

Another advantage of machine learning over conventional seismic interpretation is its ability to obtain quantitative metrics, such as model accuracy. More precisely, we are able to quantify the prediction accuracy on yet unseen data (see Table 3). This is done by cross-validation, where a data set is split several times into training and test data. Applying cross-validation in this study yields prediction accuracies ranging from 0.735 to 0.983, depending on the algorithm used. Given this information, we can (1) select the model with the highest accuracy (support vector machine (cubic)), (2) train this model using the entire data set (training plus test data), and (3) apply it to the seismic section (Figure 7).

A qualitative analysis of the final model results demonstrates that the model is able to identify the four main seismic facies (Figure 7), which implies that the model can, to a certain degree, distinguish between sedimentary (Facies A and B) and basement rocks (Facies C and D). Within the sedimentary succession, the model is even able to identify some details, such as (1) the vertical artifact on the left side of the section, (2) dipping reflectors associated with the artifact, (3) clastic remobilizations, and (4) fault-propagation folds of the major tectonic faults. This suggests that the algorithm could be further optimized to detect any of these geologic or geophysical features. Within the basement, the model succeeds in identifying areas where strong reflectors crisscross each other (Facies C) — a signature that has been used by conventional seismic interpretations of basement rocks (e.g., Phillips et al., 2016; Fazlikhani et al., 2017). Future work can focus on mapping seismic facies within the basement to a greater level of detail, identifying key rock units and structures in the northern North Sea (e.g., the Western Gneiss Region and the Hardangerfjord Shear Zone).

Finally, it may be argued that an experienced seismic interpreter can obtain a much more detailed seismic interpretation and, thus, a deeper geologic understanding of the seismic section above than our best model. First, it is worth remembering that a greater level of detail does not imply that the interpretation is necessarily correct. Seismic interpreters are prone to project geologic structures that they are familiar with onto new data (Bond et al., 2007). Second, our approach could also be adapted to highlight additional geologic details. This would only require the definition of additional seismic facies. Third, the aim of this study is to support, rather than compete with, conventional seismic interpretations. We believe that machine learning can become a valuable tool to seismic interpreters aiming to raise prediction accuracy and to reduce uncertainty.

We demonstrate the applicability of supervised machine learning to seismic facies analyses. The basis of this study is state-of-the-art broadband 3D seismic reflection data of the northern North Sea rift, to which we apply a typical machine-learning workflow including (1) feature extraction, (2) training, (3) testing, (4) model selection, and (5) application. This workflow allows us to generate models that predict seismic facies with accuracies of up to 0.983 ± 0.004. The model with the highest accuracy uses a regularized support vector machine to predict seismic facies. Applying this model to an entire seismic section demonstrates that it is able to provide an effective seismic facies analysis. This highlights that machine learning has the potential to change the way we analyze seismic reflection data in the future.

First, we would like to thank the journal editors (Deyan Draganov, Valentina Socco, and Sergio Chávez-Pérez) and the reviewers (Brendan Hall, Rocky Roden, and two anonymous reviewers). We also thank The Norwegian Academy of Science and Letters (VISTA) and The University of Bergen for supporting this research. We are very grateful to CGG for supplying seismic data and allowing us to publish this work. In particular, the support of Stein Åsheim and Marit Stokke Bauck is greatly appreciated. Schlumberger is thanked for providing the software Petrel 2015©. We thank the developers of Python and scikit-learn, which were used to implement this workflow, and we thank Leo Zijerveld for IT support. Finally, we would like to thank the members of the MultiRift project, in particular Antje Lenhart and Tom Phillips, for numerous discussions leading to the development of this study.

Freely available online through the SEG open-access option.