Despite significant developments in the past few years in the application of machine learning algorithms for the lithologic classification of rock samples, publicly available labeled data sets are very scarce. We open source a fully labeled data set containing more than 16,000 scanning electron microscopy (SEM) images of drill cutting samples—mounted on thin sections—from a low-permeability reservoir in western Canada. We develop a simplified image processing workflow to segment and isolate the rock chips into individual SEM images, which in turn are used to identify, classify, and quantify rock types based on textural characteristics. In addition, using this data set, we explore the use of convolutional neural networks (CNNs) as a baseline tool for acceleration and automatization of rock-type classification. Without significant modifications to popular CNN models, we obtain an accuracy of approximately 90% for the test set. Results demonstrate the potential of CNN as a fast approach for lithologic classification in low-permeability siltstone reservoirs. In addition to making the data set publicly available, we believe our workflow to segment and isolate drill cutting samples in individual images of rock chips will facilitate future research of drill cuttings properties (e.g., lithology, porosity, and particle size) using machine learning algorithms.

Identifying lithology is important for the exploration and extraction of natural resources, such as hydrocarbons, minerals, or geothermal energy, from the subsurface. As a complement to well logs and well cores, drill cuttings provide an alternative source of lithologic data that can be used for multiple purposes, such as the construction of well correlations, assessment of reservoir quality, and validation of well logs. In addition, when analyzed promptly, drill cuttings can be used for completion optimization and prevention of wellbore stability problems, such as mud losses, tight holes, and stuck pipe (Tiainen et al., 2002; Carugo et al., 2013; Reyes et al., 2015). However, despite often being the only available rock samples from wells, and their importance as lithology indicators, the use of drill cuttings has been limited to qualitative and sparse observations. This latter fact is due, in part, to time constraints for processing and analyzing the numerous samples and due to their small size and need for specialized imaging equipment.

Typically, lithology identification and classification from drill cuttings is performed by geologists using stereo microscopes at the drilling site. However, a significant amount of human bias may be introduced when different geologists perform classifications. Moreover, after qualitative lithologic descriptions have been performed—often over a limited time window—the drill cutting samples are stored and rarely examined again. Therefore, most of the geoscience personnel employed by an operating company seldom have the chance to observe the drill cutting samples, instead relying on well-site reports. When quantitative analysis from drill cuttings is required, point counting from petrographic thin sections is typically performed (e.g., Bradbury et al., 2007; Ali and Ghiniwa, 2016). This conventional approach is time consuming, and results may vary due to the experience bias among different petrographers. Furthermore, petrographic thin sections often do not resolve the lithologic heterogeneity of fine-grained reservoirs. For that purpose, scanning electron microscopy (SEM) images provide a higher resolution alternative to optical microscopy.

Digital techniques have been developed for the interpretation of lithology from drill cuttings, such as automated mineralogy from SEM coupled with energy-dispersive X-ray spectra (EDS) (Gottlieb et al., 2000). With such a technique, drill cuttings can be accurately classified into user-defined rock types based on mineral associations. The most widely known proprietary tool that implements SEM-EDS to generate mineralogical images is the QEMSCAN®, an acronym for Quantitative Evaluation of Materials by Scanning Electron Microscopy. Indeed, the SEM-EDS technology has been extensively applied to classify rock types from drill cuttings in oil and gas exploration and production (e.g., Edwards and Butcher, 1999; Sliwinski, 2010; Oliver et al., 2013; Ly et al., 2014). SEM-EDS also has been used for lithologic classification in geothermal (e.g., Ayling et al., 2011, 2012) and mining studies (Haberlah et al., 2011; Goodall and Butcher, 2012). A pertinent review of the application of automated mineralogy for various disciplines through time is presented by Sandmann (2015).

In recent years, facilitated by the quantitative aspect of SEM-EDS mineralogical images, lithology classification of drill cuttings using machine learning algorithms has evolved (e.g., Taylor and Ly, 2021). However, to the best of our knowledge, the use of machine learning for lithologic classification of drill cuttings using raw images (i.e., not SEM-EDS) is limited to only four previous studies. Wade and Arnesen (2018) develop a proprietary tool that relies on a deep neural network to interpret 10–15 lithologies using photographs of washed-and-dried drill cutting samples. The model has been previously trained on a labeled catalog of cuttings photographs from offshore oil and gas wells. Similarly, Kathrada and Adillah (2019) investigate the use of a support vector machine (SVM) and convolutional neural networks (CNNs) to classify the lithology of drill cutting photographs (washed-and-dried drill cutting samples). Caja et al. (2019a) implement an SVM model to classify four lithology classes from high-resolution thin section images of drill cutting samples. Peña et al. (2019) use several machine learning models available in the open-source image processing software ImageJ (Rasband, 1997) to classify four lithologies from thin section images of drill cuttings.

Most previous studies of raw images have used either white-light photographs or petrographic thin section photomicrographs of drill cuttings. In this study, SEM images from drill cuttings mounted on thin sections are used. Here, a simplified image processing workflow is proposed to isolate the rock chips in SEM images of thin sections from drill cuttings into individual images that can be later classified into rock types, either manually or using machine learning algorithms. Notably, the isolation of the SEM images from individual rock chips could serve as a basis for performing more quantitative analysis that requires SEM images as input (e.g., Zhang et al., 2015; Buckman et al., 2017, 2020).

Then, the use of CNNs as a tool for the acceleration and automatization of rock-type classification using SEM images of drill cuttings is explored in this work. The potential of CNN as a fast approach to obtain lithology classification is therefore demonstrated. Finally, an open-source data set of 16,700 fully labeled SEM images of individual rock chips from thin sections of 14 drill cutting samples is provided. This is important because, despite significant recent developments in the application of machine learning algorithms for lithology classification of rock samples (e.g., Pires de Lima et al., 2019; Baraboshkin et al., 2020; Fan et al., 2020; Zhang et al., 2021), open-source data sets of geoscience images that can be used to test these applications are very scarce (e.g., OpenGeoscience from the British Geological Survey). When available, these data sets are frequently not fully labeled mainly because the process is time consuming and requires a certain level of expertise. Consequently, researchers must label their data sets, limit their work to unsupervised techniques, or use supervised techniques with limited training sets. Such limitations hinder the ability of researchers to compare the results from different approaches, undermining the potential that machine learning could have in the geoscience community.

Backscattered electron (BSE) SEM images from 14 thin sections of drill cuttings (Figure 1), collected across 2 km of a horizontal well in western Canada, have been selected for this study. The well targets the Montney Formation, one of the most prolific unconventional plays in North America. The Montney is a low-permeability siltstone reservoir, regarded as having a complex lithology that shows subtle changes in grain size and rock composition that resulted from an arid climate environment and a subsequent strong diagenetic overprint (Davies, 1997; Davies et al., 1997). The workflow applied in this study is comprised of four steps (Figure 2): (1) sample preparation and image acquisition, (2) rock chip segmentation and isolation, (3) manual geologic classification, and (4) CNN classification. These steps are described in the following sections.

Sample preparation and SEM image acquisition

The collected drill cutting samples were washed and oven-dried, after which they were sieved through 20, 35, and 60 US mesh sizes. Approximately 2 g of the 20–35 mesh fraction (0.5–0.8 mm) were examined under a stereo microscope to manually remove drilling mud contaminants, such as wood fiber, metal shavings, and polymer beads. Thin section preparation involved vacuum impregnation with blue-dyed epoxy, polishing, and carbon coating. The surface area of the thin section covered by rock chips was approximately 2.5 cm × 2.5 cm (Figure 1).

The 14 thin sections were analyzed using an FEI Quanta FEG 250 environmental field emission SEM. The microscope was equipped with a BSE detector that produces images with atomic number contrasts (Goldstein et al., 2017). The entire thin section area was imaged through the automated collection of consecutive tiles using the Maps® software from FEI (Figure 1). On average, 550 tiles or images were acquired for each thin section, each image with a resolution of 500 nm per pixel.

Rock chip segmentation and isolation

Several studies have investigated different approaches for the segmentation of geologic features in thin section images (e.g., rock fragments, mineral grains, and porosity), ranging from those based purely on image processing techniques (e.g., Samet et al., 2012; Caja et al., 2019a) to those that rely on machine learning techniques (e.g., Asmussen et al., 2015; Budennyy et al., 2017). Our proposed methodology for chip identification and isolation is more straightforward and relies on two main assumptions. First, the drill cutting chips in the thin section are distributed so that individual chips do not touch each other, meaning that there is always some background space (epoxy) around each of the chips. Second, the SEM images obtained from thin sections have sufficient resolution to differentiate the background from rock chips correctly. Under these assumptions, our segmentation consists of six consecutive steps: (1) downscale SEM images, (2) define intensity thresholds for the background and the chips, (3) fill “holes” in chips, (4) remove small chips, (5) identify and enumerate chips, and (6) save individual images of chips (Figure 3).

As a presegmentation step, and to speed up the process, the SEM images are downscaled by a factor of two, changing the original resolution of 500 nm to 1 μm per pixel. Then, the actual segmentation process begins by defining the background and foreground threshold values using the pixel intensity. The intensity thresholds have been defined interactively through the analysis for an arbitrary image sample. The background threshold is set to 10 (0–255 range) and corresponds to the black background in the SEM images (Figure 3). The foreground threshold is set to 30 (0–255 range) and corresponds to potential rock chips (Figure 3). As a result, all pixels with intensity values equal to or smaller than 10 (background threshold) are flagged as background, and those with values larger than 30 (foreground threshold) are flagged as possible rock chips.
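The downscaling and thresholding described above can be sketched with NumPy. This is a minimal illustration, not the published implementation: the text does not state how the images were downscaled, so averaging 2 × 2 pixel blocks is an assumption, and the function names and the synthetic tile are hypothetical. Only the two threshold values (10 and 30) come from the text.

```python
import numpy as np

BACKGROUND_MAX = 10   # intensities <= 10 are flagged as background (epoxy)
FOREGROUND_MIN = 30   # intensities > 30 are flagged as possible rock chips

def downscale_by_two(image):
    """Halve the resolution by averaging non-overlapping 2 x 2 pixel blocks."""
    rows = image.shape[0] - image.shape[0] % 2  # drop an odd trailing row
    cols = image.shape[1] - image.shape[1] % 2  # drop an odd trailing column
    blocks = image[:rows, :cols].astype(float).reshape(rows // 2, 2, cols // 2, 2)
    return blocks.mean(axis=(1, 3))

def threshold_masks(image):
    """Return boolean masks (background, candidate chips) from pixel intensity."""
    background = image <= BACKGROUND_MAX
    chips = image > FOREGROUND_MIN
    return background, chips

# Synthetic 4 x 4 tile: dark epoxy on the left, bright chip pixels on the right.
tile = np.array([[0, 4, 200, 210],
                 [2, 6, 190, 220],
                 [0, 0, 180, 180],
                 [8, 8, 20, 20]], dtype=np.uint8)
small = downscale_by_two(tile)
background, chips = threshold_masks(small)
```

Pixels with intensities between the two thresholds belong to neither mask in this sketch; how such pixels were handled in the original workflow is not stated.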

However, a caveat is that some features within the actual rock chips that typically present low-intensity pixel values, including pores, microfractures, or organic compounds, could be detected as holes and therefore erroneously flagged as background due to the similarity in pixel intensity (≤10). Similarly, small rock fragments or chips, which are most likely a byproduct of the thin section preparation, can be flagged as rock chips due to their intensity value (>30). These fragments are too small to support a reliable lithologic interpretation and are statistically negligible. To address these issues, standard morphological image processing operations have been implemented to remove small holes and objects (Figure 3). The threshold values for the holes to be filled are based on the area of the object. To fill the holes inside the chips, an area threshold of 4 × 10⁴ μm² is selected, equivalent to the area of a square with a side length of 0.2 mm. Similarly, the same technique is used to remove the small chips in the thin sections (Figure 3). The threshold value for the small chips is set to 2 × 10⁴ μm² for samples 2 and 14, and 4 × 10⁴ μm² for the remaining 12 samples. The area thresholds have been defined interactively through the analysis of an arbitrary image sample and refined for samples 2 and 14.
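The area-based removal of small holes and small objects can be sketched as follows. The sketch uses `scipy.ndimage.label` as a stand-in for the morphological operations of the image processing package used in the study; the function names, the 4-connectivity, and the toy area thresholds (expressed in pixels) are assumptions made for illustration only.

```python
import numpy as np
from scipy import ndimage

def remove_small_regions(mask, min_area):
    """Drop connected True regions covering fewer than min_area pixels."""
    labels, count = ndimage.label(mask)          # 4-connected regions, 1..count
    areas = np.bincount(labels.ravel(), minlength=count + 1)
    keep = areas >= min_area
    keep[0] = False                              # label 0 is not a region
    return keep[labels]

def fill_small_holes(mask, max_area):
    """Fill False regions of the mask that cover fewer than max_area pixels."""
    # Holes are small connected regions of the inverted mask; the large
    # epoxy background survives the removal and therefore stays unfilled.
    return ~remove_small_regions(~mask, max_area)

# Toy mask: one 5 x 5 chip with a 1-pixel pore, plus a 1-pixel fragment.
mask = np.zeros((8, 8), dtype=bool)
mask[1:6, 1:6] = True
mask[3, 3] = False       # pore that would otherwise be flagged as background
mask[7, 7] = True        # fragment too small to interpret
cleaned = remove_small_regions(fill_small_holes(mask, max_area=4), min_area=4)
```

Filling holes before removing small objects keeps a heavily fractured chip from being split into fragments that fall below the size threshold.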

The next step consists of identifying and assigning a consecutive numeric label to all of the chips present in the thin section (Figure 3). The rock chip identification has been conducted using a region-labeling algorithm, which identifies connected regions of an image. Two pixels are considered connected if they are neighbors and have the same value (Fiorio and Gustedt, 1996; Wu et al., 2005). As a final step in the workflow, the segmented chips are saved as individual images (Figure 3). To automate the segmentation workflow, the image processing package scikit-image (Van der Walt et al., 2014) is used. The total computation time to run the complete segmentation workflow is approximately 5 min for a sample with approximately 1000 identified chips using a laptop with an i5-7300HQ CPU at 2.50 GHz.
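A minimal sketch of the labeling and isolation steps is given below. It assumes `scipy.ndimage.label` and `scipy.ndimage.find_objects` as stand-ins for the region-labeling routine used in the study; the function name `isolate_chips` and the synthetic image are hypothetical. Writing each crop to disk is omitted.

```python
import numpy as np
from scipy import ndimage

def isolate_chips(image, chip_mask):
    """Return one cropped image per connected region of chip_mask."""
    labels, count = ndimage.label(chip_mask)     # consecutive labels 1..count
    chips = []
    for index, box in enumerate(ndimage.find_objects(labels), start=1):
        crop = image[box].copy()
        crop[labels[box] != index] = 0           # blank pixels of other regions
        chips.append(crop)
    return chips

# Synthetic image with two chips separated by background.
image = np.zeros((5, 6), dtype=np.uint8)
image[0:2, 0:2] = 120    # first chip, 2 x 2 pixels
image[3:5, 3:6] = 200    # second chip, 2 x 3 pixels
chips = isolate_chips(image, image > 30)
```

Blanking the pixels that belong to other labels matters when the bounding boxes of neighboring chips overlap, so that each saved image contains a single chip.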

Rock chip classification

After segmentation, the images of individual rock chips are classified using two methods, including a manual classification based on geologic characteristics and a supervised classification using a CNN model. The two methods are described in detail next.

“Manual” geologic classification

A total of 16,700 individual SEM images from isolated rock chips were manually classified by dragging the image files into individual folders (Figure 2). The classification into five rock classes was based on visual characteristics, such as grain size, pyrite content, and porosity (Table 1). Rock classes comprised (Figure 4): (1) organic-rich mudstone (OR_M), (2) heterolithic dolomitic siltstone (HD_Slt), (3) dolomite-cemented siltstone (DC_Slt), (4) porous dolomitic siltstone (PD_Slt), and (5) chips with evidence of drill bit metamorphism (DBM). In addition, two more categories were included, labeled as (6) contaminants and (7) touching chips (Table 1). Once the manual classification was completed, the SEM images of the entire thin section were optionally saved in an HTML file that can be opened in most modern web browsers so that users can check the assigned labels of rock chips. The HTML file supports interactive inspection, such as zooming and panning (refer to the “Data and materials availability” section).

Supervised CNN classification

A CNN is a deep learning algorithm designed to automatically and adaptively exploit the locality, stationarity, and compositionality characteristics of grid pattern data, such as images. For a detailed review of the theory and methods of CNN models, the reader is referred to LeCun et al. (2015). Much of the performance gain achieved by CNN models is due to the development of different CNN architectures. Some of the most popular architectures include VGG (Simonyan and Zisserman, 2015), GoogLeNet (Szegedy et al., 2014), InceptionV3 (Szegedy et al., 2015), ResNet (He et al., 2016), and DenseNets (Huang et al., 2017).

In this study, ResNet is selected as the neural network architecture (He et al., 2016). ResNet has been chosen because of its efficient strategy to avoid the gradient vanishing problem using residual (or skip) connections that facilitate the gradient flow from the output of the network toward the input of the network (He et al., 2016). Specifically, ResNet-18, which is composed of 18 trainable layers, either convolutional or fully connected (FC), is selected. The ResNet architecture ends with a global average pooling layer, followed by an FC layer (Figure 5). Typically, ResNet trained on the ImageNet database has the final FC layer with 1000 neurons (FC 1000), one for each class of ImageNet (Deng et al., 2009). Here, the FC 1000 is randomized, and another FC layer is added with five neurons, one for each rock class (Figure 5).

To train the ResNet architecture, two training methods are used: transfer learning (TL) and randomly initialized weights (RIWs). In TL, a model trained on a primary task (e.g., ImageNet) is repurposed to a secondary task, usually composed of a smaller data set. TL generally facilitates training and helps the models achieve better performance. The expectation in TL is that the weights learned by the model in the primary task are useful for the secondary task (Yosinski et al., 2014). In the case of TL, as applied in this study, only the FC layers are updated in the first five epochs of training. Then, the remaining layers are unfrozen, and all of the weights are updated for the remaining epochs (Figure 5). Both training methods, TL and RIW, use the same architecture. The main difference between the methods is the set of weights assigned to the convolutional layers before training starts. As the name implies, models trained with RIW have all of their weights randomly assigned, whereas models trained with TL start with previously learned weights. A similar strategy and architecture are implemented by Pires de Lima and Duarte (2021), and more details can be found therein.

For TL, the weights learned from models primarily trained for the classification of the ImageNet data set (Deng et al., 2009; Russakovsky et al., 2015) are used. Because ImageNet is a data set with color images, the input for the model expects images with three channels (RGB or red-green-blue). However, SEM images are single channels; thus, this difference is addressed by repeating the single SEM channel into three channels. This approach negligibly increases the necessary complexity of the models by increasing the number of the channels in the first layer but facilitates the use of TL from models previously trained on ImageNet.
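Repeating the single SEM channel can be sketched with NumPy. The channel-first layout (channels, height, width) is assumed because it is the convention expected by PyTorch models; the function name is hypothetical.

```python
import numpy as np

def to_three_channels(gray):
    """Stack one grayscale image (H, W) into a channel-first (3, H, W) array."""
    return np.repeat(gray[np.newaxis, :, :], 3, axis=0)

sem = np.arange(6, dtype=np.uint8).reshape(2, 3)   # stand-in for an SEM image
rgb_like = to_three_channels(sem)
```

All three channels carry identical values, so no new information is introduced; the repetition only matches the input shape expected by the pretrained first layer.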

The labeled data set is randomly split into training and test sets (Table 2; Figure 6). The split is purposely constrained so that at least one chip image for each rock class from each thin section is present in both the training and test sets. The contaminants and the touching chips are not included in the split. There are relatively few contaminants in the samples (163 chips, Table 3), and they are not useful for the lithologic classification. The touching chips can include two or more rock classes and are an undesired product of the segmentation technique.
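One way to obtain such a split is to shuffle and divide the chips within each (thin section, rock class) group. The sketch below is an assumption about how this could be done, not the authors' procedure; the test fraction, the record layout, and the file names are hypothetical.

```python
import random
from collections import defaultdict

def split_by_group(records, test_fraction, seed=0):
    """Split (sample_id, rock_class, path) records within each (sample, class) group."""
    groups = defaultdict(list)
    for record in records:
        groups[(record[0], record[1])].append(record)
    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(groups):
        members = groups[key]
        rng.shuffle(members)
        # At least one test chip, while always leaving one chip for training;
        # a group with a single chip goes entirely to the training set.
        n_test = min(max(1, round(len(members) * test_fraction)), len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Hypothetical records: 2 thin sections x 2 rock classes x 5 chips each.
records = [(sample, rock, f"{sample}_{rock}_{i}.png")
           for sample in ("S1", "S2")
           for rock in ("PD_Slt", "DC_Slt")
           for i in range(5)]
train, test = split_by_group(records, test_fraction=0.2)
```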

All SEM images of individual rock chips are rescaled to 224 × 224 pixels to be processed by the model, and 20% of the training set is selected as the validation set. Image augmentation is used during training, including horizontal flip, vertical flip, and rotation limited to ±5°, all with 50% probability. The augmentation is used in the training set only. The validation loss is evaluated during training, and training continues if the validation loss continues decreasing. A patience of 10 epochs is set, and training is interrupted if the validation loss does not improve. Patience is a hyperparameter that expresses how long the model continues training if there are no improvements in some observed metric, in this case, the validation loss.
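The patience rule can be illustrated with a short, framework-independent sketch. The function name and the loss values are hypothetical; in practice this logic is typically provided by the training framework.

```python
def early_stopping_epoch(validation_losses, patience=10):
    """Return the 0-based epoch at which training would be interrupted."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0       # improvement resets the counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch                     # patience exhausted
    return len(validation_losses) - 1            # ran through all epochs

losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]
stopped_short_patience = early_stopping_epoch(losses, patience=3)
stopped_long_patience = early_stopping_epoch(losses, patience=10)
```

With a patience of 3, training stops at epoch 5 and never reaches the later improvement at epoch 6, which shows why a larger patience (10 in this study) tolerates temporary plateaus in the validation loss.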

For the results presented here, all computing is done using Python 3.8.10. The implementation of the CNN models and analysis uses the PyTorch framework (Paszke et al., 2019), following PyTorch Lightning structures (Falcon, 2019). Adam (Kingma and Ba, 2015) or RMSprop (Tieleman and Hinton, 2012) are used as optimizers. Other hyperparameters are provided in the “Results” section.

As a result of segmentation and isolation, the number of individual chips identified for each thin section ranges from 787 to 3441, with a total count of 16,700 chips for all samples (Table 3). On average, less than 5% of the isolated images correspond to touching chips. This validates the assumptions for segmentation previously described: that the drill cutting chips in the thin section do not touch each other and that the SEM images have sufficient resolution so that the background is correctly differentiated from rock chips.

Manual geologic classification

The isolation of individual chips has allowed the identification of seven classes consistently present in the 14 thin sections (Table 1). Five of these classes represent rock types, and the remaining two classes are contaminants and touching rock chips (Table 1). The five rock types are (1) OR_M, (2) HD_Slt, (3) DC_Slt, (4) PD_Slt, and (5) chips with DBM. Table 3 summarizes the chip count for each of the interpreted classes.

The OR_Ms are easily recognized by the fine-grained matrix and the abundant pyrite content that appears as bright colors in the SEM images (Figure 4). No interparticle porosity is observed in the OR_Ms; only microfractures are observed, which are most likely induced during the drilling process or thin section preparation (Figure 4). The HD_Slts are characterized by a silty matrix with fine-grained laminations, which in the SEM are highlighted by abundant pyrite (Figure 4). The PD_Slts are recognized by visible interparticle porosity appearing as black spots among grains (Figure 4). The DC_Slts display negligible visible interparticle porosity, and therefore, this is the main criterion used to define this rock class (Figure 4).

Rock chips that exhibit DBM are characterized by a sheared appearance and often display convex or concave shapes (Figure 4). As documented in the literature (Taylor, 1983; Wenger et al., 2009), DBM chips result from the use of polycrystalline diamond bits, which was the drill bit choice for the well from which the drill cutting samples were collected in this work. The unaltered lithology of the DBM chips is identified mainly as OR_M and HD_Slt due to the abundant pyrite content (Figure 4). Contrary to the DBM chips, the chips from the remaining four rock types do not seem to display a particular shape that would help to differentiate them (Figure 4).

The rock chips in the contaminant class (CONTAM) are generally recognized by unconsolidated aggregates that contain pieces of mud additives (e.g., barite) or metal shavings from the drilling system mixed with ground rock particles (Figure 7). The number of chips classified as contaminants was less than 2% for 12 samples and 3.6% and 3.1% for samples 1 and 14, respectively (Table 3).

The last class (TOUCH) was assigned to chips that touch a neighboring chip—frequently from a different rock type—and were erroneously isolated as single rock chips (Figure 8). Usually, the touching chips are comprised of only two chips, but there were a few cases in which three chips were found to be touching each other (Figure 8). A total of 848 touching chips were identified, with percentages between 3% and 5% in 13 samples and only one sample with 11% of the chips that could not be adequately isolated (Table 3). Even though the rock type of the touching chips could have been easily recognized from the SEM images, they were not included in the manual or CNN classifications. Potential solutions for separating the touching chips have been provided by Faessel and Courtois (2009) and Tan et al. (2019).

An important application for the classification of rock types from drill cutting samples is the analysis of lithologic variation along the length of a horizontal well (Figure 9). For this study, despite sample depths being randomized, the normalized abundance of the five rock types in the 14 samples shows significant lateral variability. Figure 9 demonstrates that PD_Slt and DC_Slt range between 6%–49% and 7%–48%, respectively. In contrast, the OR_Ms and HD_Slts exhibit relatively smaller variations among the samples, with a range between 4% and 21% for both rock types. Notably, among the five rock types, the DBM chips display the most significant variation in the samples, ranging from 16% to 92% (Figure 9; Table 3).

Supervised CNN classification

After manually labeling all of the previously identified rock chips, the performance of CNN models with respect to the classification of the isolated rock chips is now evaluated. Hyperparameter tuning is implemented, changing the batch size, the optimizer, and the learning rate. As previously described, the effects of two different training methods are implemented: TL and RIWs. To evaluate the performance of the models during training, the accuracy and loss curves for both training methods are plotted in Figure 10. The results of models with best-performing hyperparameters are provided. Both methods show similar curves, although the performance of the models in the training set improves smoothly compared with the validation set (Figure 10).

Compared with the RIW model, the TL model trained for longer, and the validation accuracy curve between epochs 24 and 32 is smoother (Figure 10). However, the performance of the TL model for the validation set varies again after epoch 32 (Figure 10). The TL model is trained with a batch size of 32, RMSprop optimizer, with a learning rate of 1e−3. The RIW model is trained with a batch size of 32, Adam optimizer, with a learning rate of 1e−3. Using a laptop GeForce GTX 1050 GPU, the RIW model requires approximately 70 min to train for 24 epochs. With the same hardware configuration, the TL model requires approximately 110 min to train for 36 epochs.

To evaluate the accuracy of the CNN classification, Figure 11 shows the confusion matrix for the test set. The main diagonal indicates the number of times that the model predicted the same rock class as the one previously assigned by manual classification. Off-diagonal elements indicate a difference between the rock class predicted by the model and the class into which the chip was manually assigned. In general, both training methods have similar performance (Figure 11). The overall test accuracy of the TL method is slightly better (91%) compared with the RIW method (88%).
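For reference, a confusion matrix of this kind and the corresponding overall accuracy can be computed as sketched below. Placing the manual labels on the rows and the predictions on the columns is an assumption, and the five example labels are invented for illustration.

```python
import numpy as np

CLASSES = ["OR_M", "HD_Slt", "DC_Slt", "PD_Slt", "DBM"]

def confusion_matrix(manual, predicted, classes=CLASSES):
    """Count chips per (manual class, predicted class) pair."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for true_label, predicted_label in zip(manual, predicted):
        matrix[index[true_label], index[predicted_label]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of chips on the main diagonal."""
    return np.trace(matrix) / matrix.sum()

manual = ["OR_M", "DBM", "PD_Slt", "DC_Slt", "PD_Slt"]
predicted = ["OR_M", "DBM", "DC_Slt", "DC_Slt", "PD_Slt"]
matrix = confusion_matrix(manual, predicted)
overall = accuracy(matrix)
```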

The main confusion for both models was between PD_Slt and DC_Slt chips (Figure 11). TL incorrectly predicted 93 DC_Slt chips as PD_Slt and 69 PD_Slt chips as DC_Slt. Similarly, RIW incorrectly predicted 123 DC_Slt chips as PD_Slt and 57 PD_Slt chips as DC_Slt. RIW also confused HD_Slt chips with all other possible rock classes: 75 chips were predicted to be DBM, 28 to be DC_Slt, 30 to be OR_M, and 47 to be PD_Slt. Remarkably, TL improved the classification for HD_Slt and OR_M, which are the rock classes with the lowest representation (i.e., lowest number of samples) in the training data set (Figure 11; Table 2). TL also misclassified some HD_Slt chips: 4 as DBM, 14 as DC_Slt, 26 as OR_M, and 45 as PD_Slt. Notably, neither model confused OR_M with DC_Slt or OR_M with PD_Slt (Figure 11).

To evaluate the performance of the CNN models trained with TL and RIW, the standard metrics precision, recall, and F1 score are calculated for each rock class using the test set (Table 4). In addition, a global weighted average is calculated to account for the variations in the number of rock chips per class (Table 4). The precision reflects the number of rock chips correctly predicted in a class over the total number of rock chips predicted for that class. The recall represents the number of correctly predicted rock chips in a class over the total number of rock chips in that class. The F1 score combines the precision and recall metrics using the harmonic mean. The TL method demonstrates a better overall performance with higher global average precision (90.7%), recall (90.7%), and F1 score (90.7%) compared with the RIW model (Table 4).
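These definitions translate directly into operations on a confusion matrix. The sketch below assumes manual labels on the rows and predictions on the columns, and uses a made-up two-class matrix rather than the values in Table 4; the function names are hypothetical.

```python
import numpy as np

def per_class_metrics(confusion):
    """Precision, recall, and F1 per class (rows: manual, columns: predicted)."""
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)
    precision = correct / confusion.sum(axis=0)  # over chips predicted as the class
    recall = correct / confusion.sum(axis=1)     # over chips labeled as the class
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

def weighted_average(values, confusion):
    """Average a per-class metric weighted by the number of chips per class."""
    support = np.asarray(confusion).sum(axis=1)
    return float(np.average(values, weights=support))

# Made-up example: 10 chips per class, 3 misclassifications in total.
confusion = [[8, 2],
             [1, 9]]
precision, recall, f1 = per_class_metrics(confusion)
global_recall = weighted_average(recall, confusion)
```

The support-weighted average of the recall equals the overall accuracy (0.85 in this toy case). A class that is never predicted would produce a division by zero in the precision and needs separate handling.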

Finally, Figure 12 illustrates the weights (7 × 7) of the first convolutional layer for RIW and TL trained models. Although a thorough model interpretability is out of the scope of this work, such weights are a helpful tool to understand whether the trained models can be easily generalized to other data sets and what features are highlighted when images are processed by the trained models. The expectation is that generic filters, such as color blobs and edge detectors, are better for the generalization of the model. The weights shown in Figure 12a (RIW weights) indicate that the first layer learned to identify high-intensity marks and edge detectors. Examples of high-intensity marks are the white pixels in the first row and first column (1, 1), first row and eighth column (1, 8), and others [e.g., (6, 4), (6, 8)]. Examples of edge detectors are filters observed in (7, 1) and (4, 4). However, the RIW weights are to some extent random. This is a contrast to the TL weights shown in Figure 12b. The weights adapted from the model pretrained on ImageNet show more localized color blobs [e.g., (1, 7), (4, 5)], which are artificial remainders from the original ImageNet weights because there are no differences in the colors for the chips. However, many of them are edge detectors, for example, filters (1, 1), (2, 1), (2, 3), (2, 4), and most filters in the seventh and eighth rows (Figure 12b).

Manual geologic classification versus CNN classification

As anticipated, there are discrepancies among classes predicted by the CNN models and the manual classification (Figure 11). Results illustrate that the most confusing class for training models (TL and RIW) is the HD_Slt. Approximately 10% of the HD_Slt rock chips are erroneously classified as either OR_M or PD_Slt (Figure 11). This confusion is interpreted to be caused by the fact that the HD_Slt is defined as siltstones (either dolomite-cemented or porous) that have fine-grained pyrite-rich laminae (similar to the OR_M) (Figure 4), thus sharing lithologic features with other classes that likely are too subtle for the CNN models. Similarly, models (TL and RIW) also confuse some (approximately 13%) DC_Slt and PD_Slt (Figure 11). Because the distinction between PD_Slt and DC_Slt is based on visual porosity (interparticle or from microfractures), the confusion among these rock classes could be related to the presence of artificial microfractures in the DC_Slt. It is worth noting that the overall proportions of rock types in the drill cutting samples from the manually labeled full data set were not altered after the CNN classification was performed on the test set (Figure 13). In other words, the hierarchy of rock types for each of the drill cutting samples remains constant (Figures 9 and 13). For instance, the order of abundance for the rock types in sample 4 is the same in the manual and CNN classifications: DBM > OR_M > HD_Slt > PD_Slt > DC_Slt (Figure 13).

Application of rock classification based on SEM images from drill cuttings

The classification of drill cutting samples selected from different sections of a horizontal well (2 km) suggests significant variations in the abundance of rock types present at a particular sampled depth (Figure 9). The analysis of such variations facilitates the lithologic association of drill cutting samples. To illustrate this, three lithology groups are qualitatively defined based on the proportion of rock types (and their associated rock properties) for each sample. Group 1 represents depths that display relatively good reservoir quality as determined by the visible amount of porosity of the PD_Slt chips (samples 4, 6, 7, 10, 11, and 13). Group 2 represents rock samples with a high content of dolomite cement, implying poor reservoir quality (samples 1, 2, and 3). Group 3 corresponds to rock samples that are more prone to DBM (samples 5, 8, 9, 12, and 14). Figure 13 presents a graphical comparison of these lithology groups, in which the differences in lithology resulting from the labeled rock chips are more evident when compared with white-light photographs (stereo microscope). Because rock types show very similar aspects when imaged with stereo or petrographic microscopes, lithology identification can be difficult. For instance, it is challenging to differentiate among heterolithic, dolomite-cemented, and porous siltstones because they share similar grain size, color, and mineralogy (Figures 1 and 14). Similarly, the chips that exhibit DBM have a black appearance similar to OR_M chips (Figures 1 and 14); this is because most of the DBM chips are an altered version of the OR_M chips. It is anticipated that this challenge also could be found in other fine-grained formations due to their intrinsic homogeneity (e.g., Marcellus and Woodford shales). To address this issue, we recommend working with SEM images from thin sections instead of photographs of washed-and-dried drill cutting samples or photomicrographs from thin sections.

Comparison to previous lithologic classification work

Because many popular deep learning frameworks provide tools that expect the data to be organized as individual files, having individual chips facilitates the application of CNNs for image classification. Training ResNet-18 for the classification of the chips resulted in models that are approximately 90% accurate on the test set of our data set (TL with 91% and RIW with 88% accuracy). In comparison, Peña et al. (2019) use optical thin section images from drill cuttings and achieve 98.7% accuracy for the classification of four lithologies. Although Peña et al. (2019) use a different type of image data (thin section photomicrographs), their higher accuracy may be caused by the choice of the object being labeled. Peña et al. (2019) classify the pixels of the images, considering the background of the thin section (i.e., epoxy) as a class to label (that accounts for 50% of the thin section), whereas the current study classifies the individual rock chips. Their workflow uses image processing to generate feature descriptors, such as texture and edge detector filters, to feed a random forest model, among other models.
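To see how a dominant background class can affect a pixel-level score, consider a back-of-the-envelope decomposition of the overall accuracy into background and non-background parts. This is only an illustration under stated assumptions: the 50% background fraction is the approximate value quoted above, whereas the background accuracy of 100% is a hypothetical input and not a number reported by Peña et al. (2019).

```python
def foreground_accuracy(overall_acc, background_fraction, background_acc):
    """Accuracy on non-background pixels implied by an overall pixel accuracy.

    Solves overall = f_bg * acc_bg + (1 - f_bg) * acc_fg for acc_fg.
    """
    if not 0.0 <= background_fraction < 1.0:
        raise ValueError("background_fraction must be in [0, 1)")
    foreground_fraction = 1.0 - background_fraction
    return (overall_acc - background_fraction * background_acc) / foreground_fraction


# Assumption: background (epoxy) pixels are classified perfectly.
implied = foreground_accuracy(overall_acc=0.987,
                              background_fraction=0.5,
                              background_acc=1.0)
```

Under these assumptions, the implied accuracy on rock pixels is about 97.4%. Pixel-level and chip-level accuracies remain different quantities, so the comparison is indicative only.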

In general, random forests are easier to train than CNNs and can also achieve high performance on several tasks (e.g., Fernández-Delgado et al., 2014). However, random forests are often used to generate pixel-level classifications and depend on the input features. Manually engineered feature descriptors give the user better control than filters generated by CNN models because the user can easily visualize and understand which features each chosen descriptor highlights. A downside of such filters is that they are, in many cases, dependent on the window size. As a simplified example, an edge detection algorithm with a large window size can smear edges onto flat regions of the image, whereas a window size that is too small can highlight noise. Noise can be attenuated with, for example, Gaussian filters, but the user often still needs to compromise when choosing the window size. Moreover, the window size determines the field of view of the feature descriptors. This field of view grows when the result of one filter is used as input to another, something that, in practice, happens in all CNN models. In summary, the capacity of CNN models to generate their own filters (as shown in Figure 12) and have a larger field of view helps the model classify more complex objects, such as rock chips with laminations or with large pores inside.
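The growth of the field of view when filters are chained can be made concrete with a short calculation. The sketch below uses the standard recurrence for stacked convolution-like layers along one image axis; the function name and the example layer lists are illustrative assumptions and do not describe the ResNet-18 configuration used in this study. Dilation and padding effects are ignored.

```python
def receptive_field(layers):
    """Field of view (in input pixels, along one axis) of stacked filters.

    `layers` is a sequence of (kernel_size, stride) pairs, first layer first.
    """
    field = 1  # a single input pixel
    jump = 1   # spacing, in input pixels, between adjacent outputs
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


one_filter = receptive_field([(3, 1)])
two_filters = receptive_field([(3, 1), (3, 1)])
with_stride = receptive_field([(3, 2), (3, 1)])
```

In this example, a single 3-pixel window sees 3 pixels, two chained 3-pixel windows see 5 pixels, and a stride of 2 in the first layer extends the view of the second layer to 7 pixels.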

TL versus RIWs for model training

In contrast to other popular machine learning methods (e.g., random forests), the weights of pretrained CNN models are frequently used as a starting point for secondary tasks through the implementation of TL. With more than 11,000 individual images available in the training set (Table 2), the results show that the TL training mode only marginally increases the performance of the model (Figure 11). Although other studies have presented a more significant improvement when using TL compared with the RIW training method (e.g., Cunha et al., 2020; Pires de Lima and Duarte, 2021), the results here indicate, as expected, that a sufficiently large data set tends to reduce the improvements achieved by the TL methodology. It is anticipated that the creation of large geologic image databases will facilitate the evaluation and adoption of other ML models. In particular, the workaround implemented here, triplicating a single SEM channel to create a pseudo-RGB image, would not be necessary if the geoscience community had easy access to models trained on SEM data sets. Moreover, TL is expected to increase the number of pretrained models available as starting points: the models trained with the data set provided here can now be fine-tuned for other SEM data sets from other rock types.
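The channel-triplication workaround mentioned above can be sketched in a few lines. This is a minimal, dependency-free illustration that assumes the image is stored as rows of grayscale intensities; the function name and the example patch are hypothetical, and the scripts released with this paper may implement the step differently.

```python
def to_pseudo_rgb(gray_image):
    """Copy a single-channel image into three identical channels.

    `gray_image` is a list of rows, each row a list of intensities.
    The result has the same rows, with each pixel given as (R, G, B).
    """
    return [[(value, value, value) for value in row] for row in gray_image]


# Hypothetical 2 x 3 grayscale patch with 8-bit intensities.
gray = [[0, 128, 255],
        [12, 34, 56]]
rgb = to_pseudo_rgb(gray)
```

With an array library, the same operation is usually written as a repeat along a new channel axis, for example `np.repeat(img[..., None], 3, axis=-1)` in NumPy.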

Strengths of the proposed methodology

One of the key strengths of the current work is the simple methodology used to isolate the individual rock chips from the SEM image of the whole thin section. Isolating the rock chips and working with individual images provides greater control for the geologist, allowing them to quickly evaluate several individual chips (i.e., side by side) rather than find them in a larger image. In addition, it can be less overwhelming to drag the individual images of isolated chips into their class folder than to label the chips in the full SEM mosaic image. This perception can be related to the cognitive abilities called selective and divided attention. The classification of individual images corresponds to selective attention, and the classification of the rock chips in a thin section corresponds to divided attention. Compared with selective attention, divided attention is associated with an increased demand for cognitive processing and may reduce efficiency and accuracy (Duncan, 1979; Pashler, 1994).

Moreover, performing quality control on the class folders is also easier than revisiting large SEM mosaic files. Experience has shown that geologists seldom revisit their first interpretation of rock chips in thin section images. This is likely because it is inconvenient to evaluate biases or mistakes when several chips cannot be compared quickly at once. Isolating the chips and saving them into individual image files also facilitates the training of the CNN model by avoiding the labeling of a subset of chips (i.e., training set) using the full SEM image. Furthermore, using individual images of rock chips rather than a full thin section forces a unique class for each chip, regardless of whether a geologist or an ML model makes the assignment. Without this constraint, a single chip can receive several labels; examples are highlighted by Peña et al. (2019) and Caja et al. (2019a), in which two to four classes (out of four) were assigned to some of the rock chips in the thin sections. Such errors in the classification could potentially hinder the quantification of rock classes.
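The class-folder layout described above maps naturally onto a label index in which every image file has exactly one class, given by the name of its parent folder. The sketch below is one possible way to build such an index with the Python standard library; the function name, the file extensions, and the file names in the small demonstration are assumptions and not part of the released scripts.

```python
import tempfile
from pathlib import Path


def index_class_folders(root, extensions=(".png", ".jpg", ".jpeg", ".tif", ".tiff")):
    """Map each image file under root/<class_name>/ to its class label."""
    root = Path(root)
    index = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.is_file() and image_path.suffix.lower() in extensions:
                index[image_path] = class_dir.name
    return index


# Demonstration on a temporary folder tree with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for class_name, file_name in [("PD_Slt", "chip_001.png"), ("OR_M", "chip_002.png")]:
        class_dir = Path(tmp) / class_name
        class_dir.mkdir()
        (class_dir / file_name).touch()
    labels = sorted(index_class_folders(tmp).values())
```

Because the label is derived from the folder, moving a file to another folder during quality control is enough to change its class, and a chip cannot carry two labels at once.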

Suggestions for further study

Arguably, the most widely adopted use of CNNs is for classification tasks, in which the output for an image is a single class label. However, CNNs are also used for many other tasks, including segmentation. One of the main challenges of implementing CNNs for segmentation is assembling a labeled database. The data set provided here addresses this issue because it contains pairs of input images and labels. Therefore, future work could explore the application of CNNs, such as the U-Net architecture (Ronneberger et al., 2015), to improve the segmentation task.

However, it is worth noting that, in this study, the training data set was split proportionally to the number of samples in each rock class (rather than equally), which means that there was an inherent imbalance during training (Table 2). Such an imbalance may potentially introduce a bias toward the more represented classes, which could result in a higher misclassification rate in the least-represented classes (He and Garcia, 2009). Therefore, future work should be conducted to investigate the effects of different sampling methods on the performance of the classification models. Several methods to partially compensate for imbalanced data can be found in the literature (Batista et al., 2004; He and Garcia, 2009; Chawla et al., 2011; Krawczyk, 2016).
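One simple way to partially compensate for such an imbalance is to weight each class inversely to its frequency when computing the training loss. The sketch below shows this common heuristic; it is not a method evaluated in this study, the function name is hypothetical, and the class counts are invented for illustration and do not correspond to Table 2.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by n_total / (n_classes * n_class)."""
    if any(count <= 0 for count in class_counts.values()):
        raise ValueError("every class needs at least one sample")
    n_total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: n_total / (n_classes * count)
            for name, count in class_counts.items()}


# Hypothetical training-set counts (not the values in Table 2).
counts = {"PD_Slt": 6000, "OR_M": 3000, "DBM": 1000}
weights = inverse_frequency_weights(counts)
```

With these weights, the least-represented class contributes as much to the total loss as the most represented one. Resampling strategies, such as those reviewed in the references above, are an alternative.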

In addition to lithologic classification, the isolated SEM images of rock chips also can be used as input for other image-based geologic studies, including 2D porosity (e.g., Buckman et al., 2017; Caja et al., 2019b; Landry et al., 2020; Tian et al., 2021), particle size, shape, sorting (e.g., Guzman, 1999), and cementation (e.g., Vocke et al., 2018).

Conclusion

A new labeled data set for rock classification based on SEM images of drill cutting samples mounted on thin sections is described and made publicly available. This data set includes 16,700 pairs of input SEM images from individual rock chips and labels. We hope that this data set will facilitate more research into image applications in the geoscience community.

A manual geologic classification of lithology, as well as an approach based on CNN, is provided. For the latter, two baseline CNN models for rock chips classification are presented, both based on a ResNet-18 architecture. The two models are differentiated based on different training methodologies: TL and RIWs. Training the ResNet-18 for the classification of the rock chips has resulted in models that are approximately 90% accurate in the test set of our data set (TL with 91% and RIW with 88% accuracy).

The results here demonstrate the potential of CNN as a fast approach for the automatic classification of SEM images from drill cutting samples and, therefore, the lithologic quantification in a highly complex low-permeability reservoir. Further work will be focused on increasing the geologic database for network training. Ultimately, such automation does not replace the expert geologist but enables more rapid and efficient classification tasks, freeing up time and expertise to explore more complex interpretations and concepts.

Acknowledgments

We thank C. Debur from the University of Calgary for his assistance with the acquisition of SEM images. C. Clarkson would like to thank Ovintiv and Shell for sponsoring his Chair in Unconventional Gas and Light Oil research in the Department of Geoscience, University of Calgary. The sponsors of the Tight Oil Consortium are also acknowledged for their support.

Data and materials availability

Data associated with this research are available and can be accessed via the following URL: Most of the Python scripts used for the analysis of the data set are available at: (for chip segmentation and isolation) and (for baseline CNN models).

Freely available online through the SEG open-access option.