Long-wave infrared (LWIR) spectra can be interpreted using a Random Forest machine learning approach to predict mineral species and abundances. In this study, hydrothermally altered carbonate rock core samples from the Fourmile Carlin-type Au discovery, Nevada, were analyzed by LWIR and micro-X-ray fluorescence (μXRF). Linear programming-derived mineral abundances from quantified μXRF data were used as training data to construct a series of Random Forest regression models. The LWIR Random Forest models produced mineral proportion estimates with root mean square errors of 1.17 to 6.75% (model predictions) and 1.06 to 6.19% (compared to quantitative X-ray diffraction data) for calcite, dolomite, kaolinite, white mica, phlogopite, K-feldspar, and quartz. These results are comparable to the error of proportion estimates from linear spectral deconvolution (±7–15%), a commonly used spectral unmixing technique. Having a mineralogical and chemical training data set makes it possible to identify and quantify mineralogy and provides a more robust and meaningful LWIR spectral interpretation than current methods that rely on a spectral library or spectral end-member extraction. Using the method presented here, LWIR spectroscopy can overcome the limitations inherent in the use of short-wave infrared (SWIR) in fine-grained, low-reflectance rocks. This new approach can be applied to any deposit type, improving the accuracy and speed of infrared data interpretation.


Short-wave infrared (SWIR) spectroscopy techniques are increasingly utilized in mining and mineral exploration to recognize and classify various mineral species of significance for exploration and mineral processing (Ahmed, 2010; Browning, 2014; Maydagán et al., 2016; Bedell et al., 2017). Short-wave infrared spectroscopy is commonly used in porphyry (Pour and Hashim, 2015; Han et al., 2018; Neal et al., 2018) and epithermal exploration (Crósta et al., 2003; Hooper et al., 2018). Recent efforts have been made to apply infrared spectroscopy techniques such as handheld and benchtop infrared analyzers (Ahmed, 2010; Bradford, 2008; Ahmed et al., 2009; Mateer, 2010; Browning, 2014), and, most recently, infrared core scanning technologies (Barker, 2017; Barker and Ridley, 2020) to Carlin-type gold deposits in Nevada. The utility of SWIR for Carlin-type gold deposits, however, has been limited by the low reflectivity of the samples, which often produces flat, undiagnostic spectra.

To overcome the lack of reflectance and the difficulty of distinguishing minerals such as quartz and feldspars in SWIR, LWIR spectroscopy has been implemented by hyperspectral core logging systems such as the Hylogger-3 (Mauger et al., 2012; Arne et al., 2016) and SisuROCK (Tappert et al., 2015). However, many minerals contained within these fine-grained samples have characteristic peaks that overlap within the spectral range used in this study (7,500–12,000 nm, Salisbury, 2020). In addition, the impact of volume scattering in fine-grained rocks (Ramsey and Christensen, 1998; Zaini et al., 2012; Laukamp et al., 2018), grain orientation (McDowell et al., 2009; Tappert et al., 2013), and grain size (Zaini et al., 2012; Laukamp et al., 2018) on LWIR spectra has hampered our ability to interpret LWIR for mineralogy and mineral chemistry. In this contribution, we demonstrate the use of micro-X-ray fluorescence (μXRF) mapping, supported by machine learning, to provide robust, quantitative analysis of LWIR spectra to predict mineral abundances within LWIR-scanned rock samples. Previous attempts at quantifying LWIR spectral mixtures often involved the use of linear spectral mixture analysis (Gillespie, 1992; Ramsey and Christensen, 1998; Feely and Christensen, 1999; Ramsey, 2004), which requires a complete spectral library of all minerals present in the system. A reference library with a disproportionate number of end-member spectra relative to the mineral system in question can lead to high error and overfitting (Rogge et al., 2006; Hecker et al., 2012).

In this study, μXRF data were matched to LWIR spectra, and machine learning approaches were used to train models to predict minerals present within each LWIR image pixel. This method is similar to that presented in Hecker et al. (2012), where petrographic point counts were used as training data for a partial least-squares regression algorithm. The method for infrared mineral identification presented here circumvents the need to interpret the tremendous possible variations in spectral mixtures and obviates the need for a spectral library to be produced for each individual geologic domain. This approach could be broadly applied to any deposit type, potentially improving the speed and accuracy of spectral data interpretation by removing the need for custom libraries or relying on SWIR and LWIR spectra alone for mineralogical interpretation.

The rock samples in this study are fine-grained, carbonaceous, calcareous, hydrothermally altered sedimentary rocks from the Fourmile Carlin-type gold discovery, Nevada (Fig. 1). Typical samples may contain carbonate minerals, phyllosilicates, feldspars, and quartz, all of which have been shown to have characteristic features in the LWIR spectral range (Nash and Salisbury, 1991; Lane and Christensen, 1997; Yitagesu et al., 2011).

Data and Methods


Infrared scanning using the SisuROCK core imaging system provided by the company TerraCore was completed on 2,030 m of HQ drill core from seven drill holes, which passed through hydrothermally altered and unaltered sedimentary carbonate and minor low-grade metamorphic host rocks in a transect across the Fourmile Au discovery (Fig. 1). One hundred core samples ≤30 cm in length were collected from the infrared-scanned drill core for μXRF and whole-rock geochemical analysis. Of the 100 μXRF samples, 12 were used in this study (Table 1). These samples were selected to represent the full range of lithologies and alteration types found in the IR-scanned drill core. Mineralogy of the 12 selected samples was determined by XRD analysis of powders taken from the same samples.

Micro-XRF mineralogy

There are two primary reasons a μXRF chemical rastering technique was selected to inform LWIR spectra. First, the similarity of the rastering techniques between μXRF and infrared imaging produces data that are spatially referenced and can be related to one another using an image registration technique. Second, μXRF provides a way to estimate mineralogy independent of infrared techniques and produces a significant number of sample points (or pixels, ~1 × 10⁶ points per μXRF image) for robust statistical analysis. The μXRF geochemical maps were produced on a Bruker Tornado μXRF (Bruker, 2018c), using a 100-μm step size (pixel size) and 25-μm spot size with standard conditions of analyses at 10 msec per pixel dwell time, two frame counts, and 50-kV acceleration voltage at the AuTec Laboratory in Vancouver, Canada. Quantitative chemical results were derived using the Bruker M4 (Bruker, 2018a) QMap fundamental parameter standardless quantification tool (Flude et al., 2017).

A linear programming algorithm was used to calculate mineralogy from quantified μXRF geochemistry. This method uses mineral formulas obtained through electron microprobe analysis to calculate mineralogy from fundamental parameter quantification of μXRF chemical maps (see Barker et al., 2020, for complete method description). Quantification from the fundamental parameter method using Bruker M4 software was completed on a 9- × 9-pixel grid, giving a final resolution of ~0.9 mm per pixel (aggregation of eighty-one 100-μm pixels).
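The 9 × 9 aggregation described above can be illustrated with a minimal sketch. It averages already-quantified pixel values over non-overlapping windows, which is a simplification: the quantification software may instead combine the raw X-ray spectra before quantifying. The function name `block_mean` and the example grid are hypothetical.

```python
def block_mean(grid, block=9):
    """Average a 2-D grid over non-overlapping block x block windows.

    Trailing rows and columns that do not fill a whole window are
    dropped (an assumption made for this sketch).
    """
    n_rows = len(grid) // block
    n_cols = len(grid[0]) // block
    out = []
    for bi in range(n_rows):
        row = []
        for bj in range(n_cols):
            total = 0.0
            for i in range(bi * block, (bi + 1) * block):
                for j in range(bj * block, (bj + 1) * block):
                    total += grid[i][j]
            row.append(total / (block * block))
        out.append(row)
    return out


# Hypothetical 18 x 18 map of 100-um pixels: left half 0%, right half 90%.
fine = [[0.0] * 9 + [90.0] * 9 for _ in range(18)]
coarse = block_mean(fine)  # 2 x 2 grid of ~0.9-mm pixels
```

Each output cell summarizes eighty-one 100-μm pixels, so an 18 × 18 input yields a 2 × 2 output.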

Infrared spectroscopy

All hyperspectral scanning was completed using the SisuROCK core imaging system provided by the company TerraCore for VNIR, SWIR, LWIR, and RGB. The RGB camera produces 160-μm spatial resolution visible light images. For this project, the FENIX VNIR (350- to 1,000-nm range, 3.5-nm spectral resolution) and SWIR (1,000- to 2,500-nm range, 12-nm spectral resolution) camera produced coregistered images at 1.2-mm spatial resolution and a total of 410 bands. The OWL LWIR camera provided 1.2-mm spatial resolution images with 96 bands (7,500–12,000 nm) and 100-nm spectral resolution.

Preprocessing of hyperspectral data was completed by TerraCore. TerraCore uses the empirical line calibration method (ELC; Smith and Milton, 1999) for conversion of raw spectral data to reflectance. The ELC method directly compares image data and real spectra by using spectrally uniform light and dark pixels to draw a line-fitting algorithm to convert the raw digital numbers produced from the spectral cameras into physical units of reflectance (Bedell and Coolbaugh, 2009). In the case of TerraCore, the white and dark references are collected for each image during the scanning process. The white and dark references are then used to derive a linear adjustment between the raw digital number and the actual reflectance measured from the reference (Bedell and Coolbaugh, 2009). This is done on each box of scanned core to ensure consistent data collection.
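As a rough illustration of the two-reference linear adjustment described above, the sketch below maps raw digital numbers to reflectance for a single band. The function name, the reference reflectance values, and the digital numbers are all hypothetical; the actual TerraCore processing chain is not documented here.

```python
def empirical_line(dn_values, dark_dn, white_dn, dark_refl=0.0, white_refl=1.0):
    """Convert raw digital numbers (one band) to reflectance.

    A straight line is drawn through the dark and white reference
    measurements and applied to every pixel in the band.
    """
    gain = (white_refl - dark_refl) / (white_dn - dark_dn)
    offset = dark_refl - gain * dark_dn
    return [gain * dn + offset for dn in dn_values]


# Hypothetical digital numbers for one band of one core-box image.
reflectance = empirical_line([100, 2100, 4100], dark_dn=100, white_dn=4100)
```

In practice the gain and offset would be derived separately for every band and every scanned box, because the references are collected with each image.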

For all LWIR data, a continuum removal was applied using the linear interpolation and division normalization method (Clark and Roush, 1984) in the R statistical programming language (R Core Team, 2017).
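A minimal sketch of continuum removal by linear interpolation and division is given below. It assumes the continuum is the upper convex hull of the spectrum, which is a common reading of the Clark and Roush (1984) approach but may differ in detail from the R implementation used in this study. Function names and the example spectrum are hypothetical.

```python
def upper_hull(points):
    """Upper convex hull of (x, y) points already sorted by increasing x."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # A non-negative cross product means the middle point lies on
            # or below the chord from hull[-2] to p, so it is discarded.
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def continuum_removed(wavelengths, reflectance):
    """Divide each reflectance value by the interpolated hull continuum."""
    points = list(zip(wavelengths, reflectance))
    hull = upper_hull(points)
    out = []
    k = 0
    for x, y in points:
        while hull[k + 1][0] < x:  # advance to the hull segment containing x
            k += 1
        (x1, y1), (x2, y2) = hull[k], hull[k + 1]
        continuum = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
        out.append(y / continuum)
    return out


# Hypothetical five-band spectrum (wavelengths in nm).
wl = [7500, 8500, 9500, 10500, 11500]
refl = [0.80, 0.40, 0.90, 0.45, 0.80]
removed = continuum_removed(wl, refl)
```

Bands that touch the hull take the value 1, and all other bands fall below 1 in proportion to their depth beneath the continuum.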

Image registration

Image registration was completed for 11 of the 12 μXRF images, each of which was registered to an LWIR image of the same sample. Image registration is the process of aligning two images of the same object to compare images from multiple sources, sensor types, and spatial resolutions. Images from this study were registered by manually selecting tie points on the base and warp images in the R statistical programming language (R Core Team, 2017). The tie points were used to calculate a homography and apply an affine transformation using the R packages “raster 2.8-4” (Hijmans, 2018), and “imager 0.41.1” (Barthelme, 2018). Resampling of pixels, or the creation of new pixels in the warp image, was completed using a nearest neighbor resampling method (Canty, 2014). The final products were evaluated for accuracy by measuring the distance from each tie point to its predicted location on the warp image. The predicted location is based on a first-order polynomial transform that is generated from the tie points on the base image. A total root mean square error (RMSE) is produced for each image and is taken as a measurement of the overall fit of the tie points when combined with a visual assessment of overlaid base and warp images (Jin, 2017). Table 1 summarizes the number of tie points and RMSE for each sample.
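The tie-point RMSE check can be sketched as follows. The code fits a first-order polynomial (affine) transform to tie points by least squares and reports the RMSE between predicted and picked locations. It is a stand-alone illustration with hypothetical function names and coordinates, not the R workflow used in the study.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(3):
        pivot = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = s / m[r][r]
    return x


def fit_first_order(base, target):
    """Least-squares coefficients of t = c0 + c1*x + c2*y."""
    rows = [(1.0, x, y) for x, y in base]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(3)]
    return solve3(ata, atb)


def tie_point_rmse(base, warp):
    """RMSE between picked warp tie points and their predicted locations."""
    cx = fit_first_order(base, [p[0] for p in warp])
    cy = fit_first_order(base, [p[1] for p in warp])
    sq = 0.0
    for (x, y), (u, v) in zip(base, warp):
        pu = cx[0] + cx[1] * x + cx[2] * y
        pv = cy[0] + cy[1] * x + cy[2] * y
        sq += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(sq / len(base))


# Hypothetical tie points: the warp image is a scaled, shifted base image.
base_pts = [(0, 0), (10, 0), (0, 10), (10, 10)]
warp_exact = [(2.0, 1.0), (7.0, 1.0), (2.0, 6.0), (7.0, 6.0)]
warp_noisy = [(2.0, 1.0), (7.0, 1.0), (2.0, 6.0), (7.4, 6.0)]
rmse_exact = tie_point_rmse(base_pts, warp_exact)
rmse_noisy = tie_point_rmse(base_pts, warp_noisy)
```

A perfectly affine set of tie points gives an RMSE near zero, and a mispicked point raises it, which is why the RMSE is read together with a visual overlay check.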

Random Forest classification and regression

Random Forest is an example of an ensemble algorithm, which is designed to combine multiple weak, independently trained models, in this case decision trees, to make an overall prediction (Breiman, 2001). Decision trees generate classification or regression predictions by constructing a model from available training data (Breiman et al., 1984; Quinlan, 1986). Decisions are made at nodes in the tree structure based on splitting criteria that iteratively pass observations down the tree until a prediction decision is made for the given input. Decision trees can be sensitive to changes in the training sets such that different subsets of data can result in vastly different outcomes. Random Forest attempts to alleviate this problem by taking random subsamples from the training data to construct individual decision trees. The final prediction is based on a majority class vote for classification, or the average value for regression (Brownlee, 2018).
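The bootstrap-and-average idea behind Random Forest regression can be illustrated with a deliberately small sketch. Each "tree" here is a single-split stump on one feature, trained on a bootstrap resample, and the ensemble prediction is the mean of the stump predictions. A real Random Forest grows deep trees and also samples a random subset of variables at each split, which this one-feature toy does not show. All names and data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Find the single threshold split that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: always predict the mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_bagged(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_bagged(forest, x):
    """Regression prediction: the average over all stumps."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)


# Hypothetical data: one band intensity versus mineral abundance (%).
band = [i / 20 for i in range(20)]
abundance = [10.0 if b < 0.5 else 40.0 for b in band]
forest = fit_bagged(band, abundance)
low = predict_bagged(forest, 0.1)
high = predict_bagged(forest, 0.9)
```

For classification, the final averaging step would be replaced by a majority vote over the individual tree labels.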

The R package “randomForest” (v4.6-14; Liaw and Wiener, 2002) was used to construct regression models and assess model performance. For each Random Forest model, the number of trees grown (ntree) was left at the default value of 500. The number of variables randomly sampled as candidates at each split (mtry) was also left at the default values of √p for classification and p/3 for regression, where p is the number of variables in the dataset. These defaults were retained after a series of ntree and mtry values was evaluated using the R package “caret” (v6.0-85; Kuhn, 2020) for each classification and regression model, with a range from 250 to 3,000 for ntree and 1 to 60 for mtry. No significant changes in classification accuracy or regression RMSE were observed across different ntree or mtry values.
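For reference, the default mtry rule described above can be written out as a small helper. The rounding down and the floor of one for regression follow the documented behavior of the R randomForest package. The function name is hypothetical, and the variable counts are those reported for the mask and regression models in this study.

```python
import math


def default_mtry(p, task):
    """Default number of candidate variables per split for p variables."""
    if task == "classification":
        return math.floor(math.sqrt(p))
    if task == "regression":
        return max(p // 3, 1)
    raise ValueError("task must be 'classification' or 'regression'")


mtry_mask = default_mtry(96, "classification")  # 96 LWIR bands
mtry_regression = default_mtry(102, "regression")  # 96 bands + 6 features
```

Both defaults fall inside the 1 to 60 range that was explored during tuning.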

A Random Forest binary classification model was used to generate a mask for the LWIR images to reduce computational time, limit false identification of core box pixels, and improve the appearance of false color images. Regions of interest were manually selected from five core-box images and labelled as “rock” or “box.” The resulting Random Forest classifier was created with a total of 52,342 samples (12,973 box and 39,369 rock), 36,639 of which were training and 15,703 test samples, and 96 variables (LWIR bands). The resulting mask model produced results with an accuracy (proportion of correctly identified pixels to total number of pixels) of 0.98. The mask model was run on each box image prior to mineral identification. Pixels that resulted in a box label were given a null value.
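The accuracy measure and the masking step can be sketched as below. The sample counts are those reported in the text, while the pixel labels and values are hypothetical and the function names are illustrative only.

```python
def accuracy(true_labels, predicted_labels):
    """Proportion of pixels whose predicted label matches the true label."""
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)


def apply_mask(values, labels, null=None):
    """Keep values for 'rock' pixels and give 'box' pixels a null value."""
    return [v if lab == "rock" else null for v, lab in zip(values, labels)]


# Sample counts reported in the text.
n_box, n_rock = 12973, 39369
n_train, n_test = 36639, 15703
train_fraction = n_train / (n_train + n_test)  # close to 0.70

# Hypothetical labels and values for five pixels.
truth = ["rock", "rock", "box", "rock", "box"]
predicted = ["rock", "rock", "box", "box", "box"]
acc = accuracy(truth, predicted)
masked = apply_mask([12.5, 30.1, 8.8, 40.2, 3.3], predicted)
```

The reported training and test counts sum to the total sample count and correspond to a 70/30 split.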

Random Forest regression models were created by registering μXRF-derived mineralogy raster images to LWIR spectral images as training data for the following minerals: calcite, dolomite, kaolinite, white mica, potassium feldspar, phlogopite, and quartz. The total number of data points for the infrared regression models is shown in Table 2. Each data point contained 102 variables, 96 of which were LWIR band intensities (LWIR spectra) and six were spectral features that were extracted from the LWIR spectra, such as peak and trough wavelength positions and peak intensity ratio (Table 3).

Performance of models was measured by doing a 70/30 train-test split and calculating the root mean square error (RMSE) and the R² (Pearson), slope, and intercept of the line of best fit (least-squares method) between test values (mineralogy from μXRF) and predicted values (mineralogy from LWIR). Both RMSE and R² are measurements of how well a dataset fits to a line that is produced from the input and predicted values. To verify that the predictions are not over- or underestimated, the slope of the line should be close to one and the intercept should be near zero. Therefore, the slope and intercept are used in conjunction with RMSE and R² for a more complete performance measurement of the model.
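The four performance measures can be computed as in the sketch below. It assumes the measured (μXRF) values are placed on the x-axis of the least-squares fit and that RMSE is taken directly between measured and predicted values; the function name and example numbers are hypothetical.

```python
import math


def fit_metrics(measured, predicted):
    """RMSE, squared Pearson correlation, and least-squares slope/intercept."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    rmse = math.sqrt(sum((y - x) ** 2 for x, y in zip(measured, predicted)) / n)
    return {"rmse": rmse, "r2": r2, "slope": slope, "intercept": intercept}


# Hypothetical test set: every prediction is one percentage point too high.
measured = [0.0, 10.0, 20.0, 30.0, 40.0]
predicted = [1.0, 11.0, 21.0, 31.0, 41.0]
metrics = fit_metrics(measured, predicted)
```

The example shows why slope and intercept are reported alongside R²: a constant bias leaves R² at one and the slope at one, and only the intercept and RMSE reveal the offset.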

External validation

The LWIR-derived mineralogy results were aggregated to multielement geochemistry and assay intervals (5, 10, and 20 ft) for the same 2,030 m of drill core samples. The multielement geochemistry and assay, provided by Barrick Gold Exploration Inc., were analyzed at ALS, Elko, Nevada for Au and Ag assay and 48 element, 4-acid digest geochemistry (Li, Be, Na, Mg, Al, P, S, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Ge, As, Se, Rb, Sr, Y, Zr, Nb, Mo, Ag, Cd, In, Sn, Sb, Te, Cs, Ba, La, Ce, Hf, Ta, W, Re, Hg, Tl, Pb, Bi, Th, and U). Predicted elemental concentrations (from Random Forest models) were calculated using the mineralogy predictions and compared to multielement geochemistry from each interval for Al, Ca, K, and Mg. Carbon was estimated from multielement geochemistry by calculating CO2 from carbonate in the following equation:
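A simplified sketch of converting predicted mineral proportions to element concentrations is shown below. It assumes the mineral proportions are weight percentages and uses ideal end-member formulas for calcite and dolomite, whereas the study used mineral formulas from electron microprobe analysis. The dictionaries and function names are hypothetical.

```python
ATOMIC_MASS = {"Ca": 40.078, "Mg": 24.305, "C": 12.011, "O": 15.999}

# Ideal stoichiometry (an assumption; not the microprobe-derived formulas).
FORMULAS = {
    "calcite": {"Ca": 1, "C": 1, "O": 3},
    "dolomite": {"Ca": 1, "Mg": 1, "C": 2, "O": 6},
}


def element_fraction(mineral, element):
    """Mass fraction of an element in a mineral of ideal composition."""
    formula = FORMULAS[mineral]
    total = sum(ATOMIC_MASS[e] * n for e, n in formula.items())
    return ATOMIC_MASS[element] * formula.get(element, 0) / total


def element_wt_pct(mineral_wt_pct, element):
    """Element concentration implied by a set of mineral proportions."""
    return sum(
        pct * element_fraction(mineral, element)
        for mineral, pct in mineral_wt_pct.items()
    )


def interval_mean(values):
    """Aggregate per-pixel values to a single assay-interval value."""
    return sum(values) / len(values)


# Hypothetical interval-averaged prediction: 50% calcite, 20% dolomite.
prediction = {"calcite": 50.0, "dolomite": 20.0}
ca = element_wt_pct(prediction, "Ca")
mg = element_wt_pct(prediction, "Mg")
```

The resulting Ca and Mg values can then be compared with the 4-acid digest concentrations for the same interval.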


where CO2Dol is the proportion of CO2 in dolomite (0.458), CaODol is the proportion of CaO in dolomite (0.719), and CO2Cal is the proportion of CO2 in calcite (1.274). This method cannot account for organic C, so it must be assumed that organic C is minor compared to C from carbonate. This assumption is considered reasonable because of the low organic content of samples from the unaltered Roberts Mountains Formation (~0.25%; Wells and Mullens, 1973; Clark Maroun et al., 2017) and the altered and unaltered Wenban Formation (<0.05–0.25%; Wells et al., 1969) of the neighboring Cortez Hills Au district. Silica for 4-acid ICPMS analyses (which do not provide Si concentrations) was calculated by summing all oxides, including the calculated carbonate, in the equation:


which assumes that the missing chemical component is SiO2. This approach for estimating C and Si was validated by comparing the predicted results to those obtained from the 100 whole-rock geochemical laboratory analyses. Samples used to verify the Si and C calculation method were characterized for whole-rock geochemistry by ALS Laboratory in Reno, Nevada, using a “complete whole rock characterization package,” which includes major elements measured by XRF and ICP-AES on lithium borate fused disks, trace elements by ICPMS on fused beads, and total carbon and sulfur measured by infrared combustion (LECO) analysis.
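The silica-by-difference idea described above can be sketched as follows. This is one plausible reading of the text (SiO2 taken as the remainder after all oxides and the calculated CO2 are summed to 100%), not a reproduction of the published equation. The function name and oxide values are hypothetical.

```python
def sio2_by_difference(oxides_wt_pct, co2_wt_pct):
    """Estimate SiO2 as the part of 100 wt% not accounted for elsewhere.

    Assumes the only missing component is SiO2; negative results from
    analytical totals above 100% are clipped to zero.
    """
    accounted = sum(oxides_wt_pct.values()) + co2_wt_pct
    return max(100.0 - accounted, 0.0)


# Hypothetical oxide concentrations converted from 4-acid digest data.
oxides = {"CaO": 30.0, "MgO": 5.0, "Al2O3": 4.0, "K2O": 1.0}
sio2 = sio2_by_difference(oxides, co2_wt_pct=28.0)
```

Any component left out of the oxide sum is attributed to SiO2, which is why the estimate was checked against whole-rock analyses.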


Long-wave infrared (LWIR) was chosen for this study due to the lack of reflectivity of the Carlin-type gold deposit host rocks in the short-wave infrared (SWIR) range. Figure 2 is an example of the Carlin-type gold deposit rocks used in this study with a comparison of the SWIR and LWIR spectra from five points taken throughout the box of core. It shows that SWIR spectra from these points are flat and undiagnostic, whereas the LWIR spectra have characteristic peaks that can be used for mineral identification. A scatter plot of peak intensity relative to measured albedo for SWIR and LWIR data (Fig. 3) shows the lack of SWIR response in low albedo samples compared to that of LWIR, which demonstrates the limitation of SWIR in dark lithologies. In addition, a plot of the same samples shows that the intensity of carbonate peaks in the LWIR range correlates with variations in Ca content derived from multielement geochemistry, whereas there is no obvious association in the SWIR due to the lack of reflectance (Fig. 4).

LWIR spectra

Sample LWIR spectra from individual pixels show multiple spectral absorption features indicating a mixture of minerals (Fig. 5). In most cases, the spectrum from each pixel likely contains more than just the target mineral because of the fine-grained nature of the rocks. The resulting characteristic peak overlap and volume scattering (Ramsey and Christensen, 1998) make spectral library matching for quantification difficult (Fig. 5).

LWIR Random Forest regression models: predicted mineral proportions

Random Forest regression models for mineral identification were constructed using LWIR spectra from LWIR raster images, trained on μXRF-derived quantitative mineralogy. Samples (pixels) were split in a 70/30 ratio where 70% were used to train the models and 30% were used to test the predictions of the models. The RMSE for model predictions (measured versus predicted) ranged between 1.17 and 6.75%, with R² values of 0.82 to 0.94, and regression lines with slopes near one (1.01 to 1.09) and intercepts near zero (–0.70 to –0.09) for calcite, dolomite, white mica, kaolinite, K-feldspar, phlogopite, and quartz (Fig. 6).

A test μXRF sample was withheld from model training for the purpose of validation of the textural and geologic significance of the Random Forest results. Figures 7 and 8 show side-by-side comparisons of μXRF-derived quantified mineralogy and LWIR Random Forest quantified mineralogy results for sample M180080 for the minerals calcite, dolomite, quartz, and white mica. For descriptive purposes, sections of the sample are numbered in Figures 7 and 8 according to textural significance where (1) is the relatively unaltered host rock, (2) is the larger calcite + quartz veins in the unaltered host rock, (3) shows the silicification front, (4) is the zone of pervasive silicification, (5) shows a thin dolomite vein, and (6) is a zone of high calcite in the lower right corner of the image.

Large calcite veins can be identified in the LWIR Random Forest results (2), as well as calcite in the host rock (1), along the silicification front (3), and in the lower right corner of the image (6). Calcite veins seen in the μXRF image, which are smaller than the spatial resolution of the LWIR image, cannot be clearly recognized in the LWIR image. Some variation in calcite quantity is predicted in the host rock (1) that is not apparent in the μXRF image. It appears the Random Forest model predicted the left side of the image to be slightly higher in calcite abundance than the right (Fig. 7A, B).

Dolomite (Fig. 7C, D) in the predicted LWIR image has similar general textures in the host rock (1) as in the μXRF. Much of the textural detail is lost in the LWIR image, such as the small grains of dolomite that can be seen in the μXRF, but the general presence of dolomite in the host rock (1), and its absence from the silicified section (3, 4) and the calcite + quartz vein (2), is similar. Many of the small dolomite veins (5), which are thinner than the spatial resolution of the LWIR, are also not visible in the LWIR image. There also appears to be a small amount of calcite predicted as dolomite in the lower portion of Figure 7D.

The quartz mineral maps (Fig. 8A, B) show similar abundances and most of the same textures, such as a quartz vein (2), silicification front (3), pervasive silicification (4), and lack of silicification (6) in the carbonate-rich zones. As with calcite, there appears to be variation in the quartz quantity within the host rock (1) of the LWIR image that is not seen in the μXRF, except that the increase in quantity is seen on the right side.

The white mica mineral maps (Fig. 8C, D) have very similar quantities between the LWIR and μXRF. General textures between the two images appear to agree in the region of the silicification front (3), pervasive silicified zone (4), and calcite zone (6). There appears to be some error in the prediction of the host-rock composition (1). The μXRF shows variable white mica from ~5 to 15% with the highest quantities around the larger calcite + quartz veins. The Random Forest has predicted the host rock to have a more homogeneous white mica distribution (possibly because of the coarse resolution of the LWIR images) and to be generally higher in abundance, with slightly less mica around the calcite + quartz veins.

Visually, the predicted mineralogy results have textures and mineralogical patterns similar to those of the μXRF-derived mineralogy map of the test sample for calcite, dolomite, quartz, and white mica (Figs. 7, 8). However, minor differences can be seen in some of the mineral distributions, especially a loss of detail caused by the change in image resolution between μXRF and LWIR.

Variable importance

The variable importance is a measure of how much a variable increases accuracy when included or decreases accuracy when omitted. It is a typical method for understanding and interpreting the inner workings of a Random Forest model, as it provides a record of the variables that are most diagnostic for a prediction. Figure 9 shows the variable importance displayed as a function of wavelength and thus shows which wavelengths were most important in the creation of each mineral model. In some cases, such as dolomite and quartz, characteristic features were the primary variables used in creating the models. Other models, such as kaolinite, gained the most accuracy by utilizing peak wavelengths such as the ~8,000- to 9,000-nm feature position, as well as some other LWIR bands that do not appear to be related to recognized spectral features present on reference spectra. In these other cases, the important variable may be a lack of characteristic feature or the approach to a peak (i.e., a peak shoulder).
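One common way to obtain this kind of measure is permutation importance: shuffle one variable at a time and record how much the prediction error grows. The sketch below applies the idea to an arbitrary prediction function. It is a simplified stand-in, since Random Forest implementations typically compute it per tree on out-of-bag samples, and the toy model and data are hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a prediction function over a dataset."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, seed=0):
    """Increase in MSE after shuffling each variable in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    scores = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        scores.append(mse(model, shuffled, targets) - baseline)
    return scores


def toy_model(row):
    """Hypothetical model: abundance depends only on the first band."""
    return 50.0 * row[0]


# Hypothetical two-band spectra; the second band carries no information.
rows = [[i / 8, (7 - i) / 8] for i in range(8)]
targets = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, targets)
```

Plotting such scores against band wavelength gives a display of the kind described for Figure 9.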

Regression model results compared to multielement geochemistry

The Random Forest mineral models were run on LWIR images of 2,030 m of drill core from seven drill holes. The mineralogy results were converted to chemistry and aggregated to multielement geochemistry data intervals at 5, 10, and 20 ft for external validation of Random Forest model results (Fig. 10). Multielement proportions for Al, Ca, K, and Mg were measured via the 4-acid digest method, and C and Si were estimated from the 4-acid digest data (see “Methods”) because neither C nor Si is measured in the 4-acid digest analysis. The calculated Si and C were compared to the measured values for Si and C (geochemistry results from this study using whole-rock XRF for Si and LECO analyses for total C) to validate this estimation method (Fig. 11), which produced results with absolute error of ±4.5% for CO2 and ±4.1% for SiO2. The comparison of Random Forest-predicted concentrations to measured values produced RMSE values of 1.06 (Al), 4.55 (Ca), 1.65 (K), and 2.19 (Mg). Compared to calculated concentrations, RMSE values were 1.24 (C) and 6.19 (Si). Elements Al, C, Ca, and Si have regression lines with slopes close to one and intercepts near zero; R² values across all six elements range from 0.55 (Mg) to 0.94 (C and Ca). Potassium and magnesium appear to be slightly overestimated by the models overall, and both have lower R² values.


LWIR regression models

Feely and Christensen (1999) used a linear spectral deconvolution algorithm to quantify mineral mixtures for major rock-forming minerals using LWIR. They reported residual errors of ±7 to 15% for minerals such as feldspar, quartz, calcite, and dolomite, as compared to our Random Forest predictive RMSE of 1.17 to 6.75% and external RMSE of 1.06 to 6.19% (compared to QXRD) for calcite, dolomite, kaolinite, white mica, K-feldspar, phlogopite, and quartz.

Whereas the error of each method is comparable, the methods for mineral identification are very different. The spectral unmixing method requires a complete library of characteristic spectra that represent the minerals present within the samples. Feely and Christensen (1999) also reported on results from coarse-grained rocks, whereas the samples from this study contain very fine-grained minerals. The utility of linear deconvolution is limited in fine-grained rocks due to additional spectral features caused by volume scattering, which is difficult to account for in spectral libraries (Hubbard et al., 2018). The benefit of linear spectral deconvolution is that an accurately defined spectral library can be transferable to other field areas without additional data (as long as the samples have sufficiently coarse grain sizes). However, an incomplete library, or a library that contains characteristic spectra of minerals not present in the system, can lead to large increases in error or compensation by misidentification of a combination of other minerals (Rogge et al., 2006; Hecker et al., 2012). In contrast, the Random Forest regression modeling approach presented here does not require a spectral library to be produced for each individual geologic domain. Furthermore, the approach adopted in this study provides a built-in baseline (whereby other analytical results are used for training data), with error measurement to evaluate the robustness of the results.

The visual comparison of calcite, quartz, and dolomite μXRF-derived mineralogy (Figs. 7A, C, 8A, C, using the methods presented in Barker et al., 2020) and the results of the LWIR regression model (LWIR interpreted mineralogy) on a test sample (M180080), which was omitted from the training data (see Figs. 7, 8), shows that the two produce a similar geometric distribution of mineralogy. Figure 7B has textural and mineralogical results consistent with a carbonate host rock that has been decalcified and later crosscut by calcite veins. The resulting image is interpreted to represent a decalcification front with the image top being relatively unaltered and the bottom almost completely decalcified in some areas. Figure 8B shows the silica replacement that often occurs during, or after, decalcification of carbonate rocks in Carlin-type systems (Cline et al., 2005). It also shows quartz veins and the distribution of primary quartz in the unaltered silty carbonate. Due to the difference in spatial resolution (100 μm for μXRF and 1.2 mm for LWIR), much of the finer scale detail has been lost in the LWIR images. Veins with a width of <1 mm are not identifiable in the Random Forest-predicted LWIR images. There are also small discrepancies (error) between the two image types, especially in the fine-grained host rock. Two of the LWIR images (calcite and quartz) show what may be a “shadow” effect (Figs. 7B, 8B). This may be due to slightly uneven heating of the sample during analysis. Visually, the LWIR-predicted mineralogy images are comparable to the μXRF-derived mineralogy maps, with similar mineral abundances, and provide details that are beneficial in understanding the alteration that has occurred in this sample.

A final external validation showed that the Random Forest regression models produced results comparable to multielement geochemistry over the scanned 2,030 m of drill core (at 5-, 10-, and 20-ft intervals). The relatively low RMSE (1.06–6.19%) suggests that despite the difference in data type, scale, and source (surface scans versus whole-rock analyses from split core from the same intervals), the Random Forest models produced mineral abundance estimates with chemical components that are consistent with the multielement geochemistry. The low R² values and shallow slopes of the regression lines for K and Mg suggest that these elements have been overestimated by the Random Forest models. The difficulty of distinguishing white mica from K-feldspar and phlogopite in the μXRF training data likely contributes to error in K, whereas the high detection limit of Mg in the μXRF method may have contributed to error in identifying Mg-bearing minerals. However, the results for Al, C, and Ca suggest that the total aluminosilicate and carbonate mineral predictions are accurate. With the development of more sensitive μXRF analyzers such as the Bruker Tornado Plus (Bruker, 2018b), which have lower detection limits for the light elements, Mg predictions may be improved.

The strength of the Random Forest regression models is in their ability to estimate (with relative accuracy and precision) the mineral proportions of complicated spectral mixtures in fine-grained mineralogical systems such as Carlin-type gold deposits. This method is especially useful for data-rich projects, such as those typically managed by the mining industry. Such projects often have multiple datasets readily available that can be used for training LWIR spectra. An additional advantage of this method is the turnaround time. This method can be completed in-house by workers with an understanding of the particular geologic system in question by collecting samples containing representative mineralogy and creating a training dataset using a commercially available analytical rastering technique such as μXRF (as described in Barker et al., 2020) or quantitative SEM (e.g., MLA, QEMSCAN; Gottlieb, 2008). Being able to describe the mineralogical variations quantitatively and with more confidence, potentially in “real time” as core scanning proceeds, may lead to the discovery of new exploration vectors for Carlin-type and other mineralization styles, particularly in fine-grained sedimentary rocks. Use of this method can provide detailed logs of mineralogy that can be used by geologists to better understand paragenesis and alteration. It may also provide value to metallurgists who can understand the relative abundance of quartz, carbonate, and clay mineralogy, which could affect hardness and mineral processing characteristics (Escolme et al., 2019).


In this study, LWIR was used to predict mineral abundances using Random Forest regression models. The results indicate that training Random Forest models on μXRF-derived mineralogy can produce relatively accurate and precise estimates of mineral proportions from LWIR spectra. The RMSE values from this study (1.17–6.75% internal and 1.06–6.19% external) are comparable to the error of proportion estimates of a linear spectral deconvolution algorithm (±7–15%), a commonly used spectral unmixing method.


We would like to thank Peter Reutemann and Dale Fletcher from the Computing and Mathematical Sciences Department at the University of Waikato for their tutelage in machine learning. Thank you to Dave Browning of TerraCore for help with infrared-related questions and technical support. Finally, we would like to thank Barrick Gold Exploration Inc. and the University of Waikato Doctoral Scholarship program for sponsoring this research.

Rocky Barker obtained his M.Sc. degree in geoscience from Colorado State University in 2017. He has worked as a geoscientist and data science consultant in the mineral industry for companies such as Barrick Gold Exploration Inc., Nevada Gold Mines, and AusSpec. He is currently a third-year Ph.D. candidate at the University of Waikato in Hamilton, New Zealand. Rocky’s Ph.D. work focuses on the prediction of mineralogy and mineral chemistry in a Carlin-type gold deposit using hyperspectral infrared data. His study involves the integration of geochemical datasets collected at various scales, supported by machine learning methods.

Gold Open Access: This paper is published under the terms of the CC-BY 3.0 license.