Faced with ongoing depletion of near-surface ore deposits, geologists are increasingly required to explore for deep deposits or those lying beneath surface cover. The result is increased drilling costs and a need to maximize the value of the drill hole samples collected. Laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS) analysis of pyrite is one tool that is showing promise in deep exploration. Since the trace element content of pyrite reflects both the composition of the fluid from which it precipitated and the crystallization mechanism, the trace element characteristics can be used to predict the type of deposit with which a pyritic sample is associated. This possibility, however, is complicated by overlapping trace element abundances for many deposit types. The solution lies with simultaneous comparison of multiple trace elements through rigorous statistical analysis. Specifically, we used LA-ICP-MS pyrite trace element data and Random Forests, an ensemble machine learning supervised classifier, to distinguish barren sedimentary pyrite and five ore deposit categories: iron oxide copper-gold (IOCG), orogenic Au, porphyry Cu, sedimentary exhalative (SEDEX), and volcanic-hosted massive sulfide (VHMS) deposits. The preferred classifier utilizes in situ Co, Ni, Cu, Zn, As, Mo, Ag, Sb, Te, Tl, and Pb measurements to train the Random Forests. Testing of the Random Forests classifier using additional data from the same deposits and sedimentary basins (test data set) yielded an overall accuracy of 91.4% (94.9% for IOCG, 78.8% for orogenic Au, 81.1% for porphyry Cu, 93.6% for SEDEX, 97.2% for sedimentary pyrite, 91.8% for VHMS).
Similarly, testing of the Random Forests classifier using data from deposits and sedimentary basins that did not have analyses in the training data set yielded an overall accuracy of 88.0% (81.4% for orogenic Au, 95.5% for SEDEX, 90.0% for sedimentary pyrite, 73.9% for VHMS; insufficient data were available to perform a blind test on porphyry Cu and IOCG). The performance of the classifier was further improved by instituting a criterion (at least 40% of total votes from the Random Forests needed for a conclusive identification) to remove uncertain or inconclusive classifications, increasing the classifier’s accuracy to 94.5% for the test data (94.6% for IOCG, 85.8% for orogenic Au, 87.8% for porphyry Cu, 95.4% for SEDEX, 98.5% for sedimentary pyrite, 94.6% for VHMS) and 93.9% for the blind test data (85.5% for orogenic Au, 96.9% for SEDEX, 96.7% for sedimentary pyrite, 84.6% for VHMS).

The Random Forests classification models for pyrite trace element data can be used as a predictive modeling tool in greenfield terrains by providing an accurate indication of ore deposit type. This advance will assist mineral explorers by allowing early implementation of predictive ore deposit models when prospecting for ore deposits. Furthermore, the ability of the classifier to accurately identify pyrite of sedimentary origin will allow researchers interested in paleoenvironmental conditions of ancient oceans to effectively screen prospective samples that are affected by a hydrothermal overprint.

The correct classification of ore deposits in the early stage of an exploration project can greatly enhance the efficiency of exploration, as it allows for the early application of predictive geologic models. This improvement is especially important when exploring beneath cover due to the increased costs of drilling deep drill holes and when the surface geology or geochemistry fails to reveal details about the deposit at depth. For example, minor disseminated pyrite in a sericite alteration zone intersected in a drill hole under cover could be related to a porphyry Cu outer halo, a volcanic-hosted massive sulfide (VHMS) system footwall alteration zone, a high-sulfidation epithermal Au zone, or barren pyrite unrelated to an ore system. Each of these mineralization types demands a different approach to exploration. Knowing which type is present can save exploration time and money.

Laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS) allows for the determination of the trace element content of individual minerals. These data are useful because different ore deposit types have different fluid sources, metal sources, and depositional mechanisms, all of which can significantly affect the trace element content of the minerals that precipitate from them (Gregory et al., 2014; Tardani et al., 2017). Furthermore, these trace elements can be preserved in their mineral hosts during successive hydrothermal and metamorphic events. In this study we focus on pyrite because it is present in many different types of ore deposits, its trace element content can be preserved up to midgreenschist facies (Large et al., 2009), and there are large data sets available that provide (background) trace element contents of pyrite formed in sedimentary environments without hydrothermal inputs (Large et al., 2014, 2015a; Gregory et al., 2015a). To achieve our objective, LA-ICP-MS analyses of pyrite from a series of different deposit types (iron oxide copper-gold [IOCG], orogenic Au, porphyry Cu, sedimentary exhalative [SEDEX], VHMS deposits, and barren sedimentary pyrite) were used to train a Random Forests classifier to predict deposit type from pyrite LA-ICP-MS analyses. The utility extends to the paleoceanography community, because the presence or absence of hydrothermal overprints/contributions is often unclear (Gregory et al., 2017), which erodes confidence in reconstructions of ancient conditions in the oceans.

Random Forests, a supervised classification algorithm, has proven to be an ideal choice for accurately predicting categories from multivariate input features across a wide range of data sets (Fernández-Delgado et al., 2014), but it has only rarely been applied to economic geology problems. While notable exceptions exist, such as identifying zones of hydrothermal alteration and host-rock types (Cracknell et al., 2014) and modeling of mineral prospectivity (e.g., Rodriguez-Galiano et al., 2014; Carranza and Laborte, 2015), many other opportunities remain untested. Additionally, only one previous study (O’Brien et al., 2015) used Random Forests analysis of the trace element contents of individual mineral phases (i.e., gahnite), despite the large amount of multielement geochemistry data generated in recent years by LA-ICP-MS. In this contribution we provide a proof of concept—that is, we show how the Random Forests method can be used to classify ore deposit type both as an exploration tool and as a means of identifying samples most representative of primary marine conditions uncompromised by secondary overprints.

Supervised classification

The concept of supervised classification can be thought of as linking input features to target classes via a discrimination function y = f(x). Input features x are represented as m vectors of the form {x_1, …, x_m}, and y is a finite set of c class labels {y_1, …, y_c}. Given N instances of x and y, supervised classification attempts to train a classification model f based on a limited number of training samples (Gahegan, 2000; Hastie et al., 2009; Kovacevic et al., 2009).

In general, there are three stages to supervised classification: (1) data preprocessing, (2) classifier training, and (3) prediction evaluation (Cracknell and Reading, 2014). Data preprocessing involves compiling, correcting, and transforming inputs to a representative set of features containing information relevant to the classification problem (Guyon, 2008; Hastie et al., 2009). Classifier training usually requires the adjustment and selection of one or more parameters, specific to a given supervised classifier, that optimize performance on a given set of input features and target classes (Guyon, 2009). The selection of relevant features necessarily reduces the dimensionality of the input data, thus speeding up processing time while also facilitating interpretations of the relationships between categories and features (Cracknell et al., 2014). Prediction evaluation is vital for assessing the validity of classification outcomes and is typically carried out using a test data set not previously seen by the classifier. An assessment of test data and blind test classifications through a confusion matrix and standard classification metrics—such as overall accuracy, recall, and precision—provides an unbiased indication of the performance of trained classifiers (Congalton and Green, 1998).
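The evaluation metrics named above can be illustrated with a minimal sketch. The labels below are hypothetical and chosen only for illustration; they are not data from this study, and the function names are our own rather than those of any particular software package.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

def overall_accuracy(cm):
    """Fraction of all samples whose predicted class equals the true class."""
    correct = sum(n for (t, p), n in cm.items() if t == p)
    return correct / sum(cm.values())

def recall(cm, cls):
    """Fraction of samples truly in `cls` that were predicted as `cls`."""
    actual = sum(n for (t, _), n in cm.items() if t == cls)
    return cm[(cls, cls)] / actual if actual else float("nan")

def precision(cm, cls):
    """Fraction of samples predicted as `cls` that truly belong to `cls`."""
    predicted = sum(n for (_, p), n in cm.items() if p == cls)
    return cm[(cls, cls)] / predicted if predicted else float("nan")

# Hypothetical labels, for illustration only.
y_true = ["VHMS", "VHMS", "SEDEX", "SEDEX", "SEDEX", "IOCG"]
y_pred = ["VHMS", "SEDEX", "SEDEX", "SEDEX", "VHMS", "IOCG"]
cm = confusion_matrix(y_true, y_pred)

print(overall_accuracy(cm))    # 4 of 6 correct
print(recall(cm, "SEDEX"))     # 2 of 3 true SEDEX samples recovered
print(precision(cm, "SEDEX"))  # 2 of 3 SEDEX predictions correct
```

Recall answers "how much of a class did the classifier find," whereas precision answers "how trustworthy is a prediction of that class." The two can differ substantially when classes are unevenly represented.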

Random Forests

Random Forests (Breiman, 2001) is an ensemble supervised classifier that generates predictions based on a majority vote cast by multiple randomized decision trees, known as a forest. Randomness is introduced by randomly subsetting a number of input features to split at each node of a decision tree and by bagging (bootstrap aggregation). Bagging (Breiman, 1996) generates training data for a single decision tree by sampling, with replacement, a number of samples equal to the number of instances in the training data. The Gini index is used by Random Forests to determine a best split threshold at each node of a decision tree. The Gini index is defined as

$$\mathrm{Gini}(t) = \sum_{c=1}^{j} g_c\,(1 - g_c), \tag{1}$$

where $j$ is the number of classes and $g_c$ is the probability or relative frequency of class $c$ at node $t$, given by

$$g_c = \frac{n_c}{n}, \tag{2}$$

where $n_c$ is the number of samples belonging to class $c$, and $n$ is the total number of samples within a particular node. For each candidate split, the threshold that defines maximum reduction in class heterogeneity of the resulting child nodes is selected (Breiman, 1984; Waske et al., 2009).
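As a worked illustration of equations 1 and 2, the sketch below computes the Gini index of a node and searches one feature for the threshold that gives the lowest sample-weighted Gini index in the two child nodes. It is a simplified sketch, not the implementation used in this study: the Co values and class labels are hypothetical, and the use of midpoints between sorted values as candidate thresholds is an assumption.

```python
from collections import Counter

def gini(labels):
    """Gini index of a node: sum over classes of g_c * (1 - g_c), with g_c = n_c / n."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((n_c / n) * (1 - n_c / n) for n_c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted child Gini) for the most homogeneous split.

    Candidate thresholds are midpoints between consecutive distinct sorted
    values (an assumption made for this sketch).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # start from the unsplit node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical Co concentrations (ppm) and deposit classes, for illustration only.
co_ppm = [5.0, 8.0, 20.0, 600.0, 900.0, 1700.0]
classes = ["VHMS", "VHMS", "VHMS", "IOCG", "IOCG", "IOCG"]

print(gini(classes))                    # two equally frequent classes
print(best_threshold(co_ppm, classes))  # threshold separating the two classes
```

In this toy case the parent node has a Gini index of 0.5, and a threshold between 20 and 600 ppm produces two pure child nodes, each with a Gini index of zero.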

In addition to a label indicating a predicted class for a given sample, Random Forests produces class membership probabilities. These occur in the form of a vector p comprising probabilities for individual predictions representing the proportion of decision trees that predict candidate classes.
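The vote proportions, and the 40% vote criterion for a conclusive identification described in the abstract, can be sketched as follows. The per-tree votes are hypothetical and the function names are our own; this is an illustration of the idea rather than the code used in this study.

```python
from collections import Counter

def vote_proportions(tree_predictions):
    """Proportion of decision trees voting for each class (the vector p)."""
    total = len(tree_predictions)
    return {cls: n / total for cls, n in Counter(tree_predictions).items()}

def classify(tree_predictions, min_share=0.40):
    """Return (label, vote share); the label is 'inconclusive' when the
    leading class receives less than `min_share` of the votes."""
    p = vote_proportions(tree_predictions)
    winner = max(p, key=p.get)
    if p[winner] >= min_share:
        return winner, p[winner]
    return "inconclusive", p[winner]

# Hypothetical votes from a 10-tree forest, for illustration only.
votes_clear = ["SEDEX"] * 7 + ["VHMS"] * 2 + ["sedimentary"]
votes_split = ["SEDEX"] * 3 + ["VHMS"] * 3 + ["sedimentary"] * 2 + ["IOCG"] * 2

print(classify(votes_clear))  # leading class holds 70% of the votes
print(classify(votes_split))  # leading class holds only 30% of the votes
```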

Data preprocessing was primarily executed in standard spreadsheet software (Microsoft Excel), with Random Forests classifier training and prediction evaluation conducted in the open source data mining software platform Orange version 3.18 (Demsar et al., 2013).

LA-ICP-MS data sources and preprocessing

This project arose from two major programs of pyrite analysis funded by the Geological Survey of Western Australia (Belousov et al., 2016) and the Geological Survey of South Australia (D. Gregory, unpub. report, 2015), where pyrite from a large number of ore deposits in both states was analyzed. Additional pyrite from various ore deposits has been analyzed subsequently, leading to the current database of 3,579 pyrite analyses (Figs. 1, 2). LA-ICP-MS data were provided from a number of different sources, including published peer-reviewed manuscripts (Maslennikov et al., 2009, 2017; Large et al., 2014, 2015b; Revan et al., 2014; Gregory et al., 2015a, b, 2016, 2017; Gadd et al., 2016), project reports (G. Davidson, unpub. report, 2005; D. Gregory, unpub. report, 2015), Ph.D. theses (Maier, 2011), and new, previously unreported data from the Chalkidiki porphyry Cu district, Greece, and the Lady Loretta SEDEX deposit, Australia.

Fig. 1.

Sample location map for the entire data set.

Fig. 2.

Sample location map for samples from Australia.

All pyrite analyses except those taken from Gadd et al. (2016) were conducted at the LA-ICP-MS facility located at the University of Tasmania, Australia; however, spot size and the number of standards varied. Detailed analytical procedures are available in the references in Table 1. All samples (except for the Gadd et al., 2016, data, which lacked Te and Au) were analyzed for Co, Ni, Cu, Zn, As, Mo, Ag, Sb, Te, Au, Tl, and Pb, and these are the elements emphasized here. When analyses were below detection limits, either half the detection limit was used or the value was inserted from the referred literature source. Because Gadd et al. (2016) did not report Te or Au, we used average values for these elements from the Lady Loretta SEDEX deposit. These data were assumed to be reasonable estimates, as these elements are commonly below detection in SEDEX deposits. Analyses were conducted on 2.5-cm-diameter polished laser mounts.
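The half-detection-limit substitution described above can be sketched as follows. The detection limits and concentrations are hypothetical placeholders, not values from this study, and representing a below-detection result as `None` is an assumption about how the data are stored.

```python
def substitute_below_detection(value_ppm, detection_limit_ppm):
    """Return half the detection limit when a value is missing or below detection."""
    if value_ppm is None or value_ppm < detection_limit_ppm:
        return detection_limit_ppm / 2
    return value_ppm

# Hypothetical analysis and detection limits (ppm), for illustration only.
analysis = {"Co": 120.0, "Te": None, "Au": 0.004}
limits = {"Co": 0.05, "Te": 0.2, "Au": 0.01}

cleaned = {
    element: substitute_below_detection(value, limits[element])
    for element, value in analysis.items()
}
print(cleaned)
```

Substituting a fixed fraction of the detection limit keeps every analysis usable by the classifier, at the cost of introducing a cluster of identical values for elements that are commonly below detection.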

Table 1.

Sample Location and Number of Samples Used for Random Forests Test, Training, and Blind Test Data Sets

| Location | Deposit type | Number of training analyses | Number of test analyses | Number of blind test analyses | Reference |
|---|---|---|---|---|---|
| Manxman, Australia | IOCG | 95 | 31 | | D. Gregory, unpub. report, 2015 |
| Punt Hill, Australia | IOCG | 25 | 8 | | D. Gregory, unpub. report, 2015 |
| Darlot, Australia | Orogenic gold | 8 | 5 | | Belousov et al., 2016 |
| East Repulse, Australia | Orogenic gold | 8 | 37 | | Gregory et al., 2016 |
| Fortnum, Australia | Orogenic gold | 8 | 22 | | Belousov et al., 2016 |
| Golden Mile, Australia | Orogenic gold | 8 | 35 | | Belousov et al., 2016 |
| Granny Smith, Australia | Orogenic gold | 8 | 4 | | Belousov et al., 2016 |
| Lancefield, Australia | Orogenic gold | 8 | 2 | | Belousov et al., 2016 |
| Mars, Australia | Orogenic gold | 8 | 9 | | Belousov et al., 2016 |
| Meekathara, Micky Doolan, Australia | Orogenic gold | 8 | 2 | | Belousov et al., 2016 |
| Meekathara, Prohibition, Australia | Orogenic gold | 8 | 2 | | Belousov et al., 2016 |
| Minjar, Australia | Orogenic gold | 8 | 2 | | Belousov et al., 2016 |
| Nathans Labouchre, Australia | Orogenic gold | 8 | 3 | | Belousov et al., 2016 |
| Paddington, Western Australia | Orogenic gold | 8 | 2 | | Belousov et al., 2016 |
| Songvang, Australia | Orogenic gold | 8 | 5 | | Belousov et al., 2016 |
| Sunrise Dam, Australia | Orogenic gold | 8 | 4 | | Belousov et al., 2016 |
| Victory, Australia | Orogenic gold | 8 | 64 | | Gregory et al., 2016 |
| Cadia, Australia | Porphyry | 60 | 40 | | |
| Chalkidiki, Greece | Porphyry | 60 | 256 | | |
| Don, Canada | SEDEX | 24 | 97 | | Gadd et al., 2016 |
| HYC, Australia | SEDEX | 24 | 316 | | Maier, 2011 |
| Lady Loretta, Australia | SEDEX | 24 | 25 | | |
| Pelly North, Canada | SEDEX | 24 | 76 | | Gadd et al., 2016 |
| XY deposit, Canada | SEDEX | 24 | 163 | | Gadd et al., 2016 |
| Aralka Armadeus basin, Australia | Sedimentary pyrite | 10 | 12 | | Large et al., 2014; Gregory et al., 2015a |
| Barney Creek Formation, Australia | Sedimentary pyrite | 10 | 5 | | Large et al., 2014; Gregory et al., 2015a |
| Canning basin, Australia | Sedimentary pyrite | 10 | 197 | | Large et al., 2014; Gregory et al., 2015a |
| Carbondale, USA | Sedimentary pyrite | 10 | 19 | | Large et al., 2014; Gregory et al., 2015a |
| NW Shelf, Late Jurassic, Australia | Sedimentary pyrite | 10 | 5 | | Large et al., 2014; Gregory et al., 2015a |
| Hamersley basin, Australia | Sedimentary pyrite | 10 | 125 | | Large et al., 2014; Gregory et al., 2015a, b |
| Perth basin, Australia | Sedimentary pyrite | 10 | 23 | | Large et al., 2014; Gregory et al., 2015a |
| Woody Island siltstone, Australia | Sedimentary pyrite | 10 | 12 | | Large et al., 2014; Gregory et al., 2015a |
| Canarvon basin, Australia | Sedimentary pyrite | 10 | 8 | | Large et al., 2014; Gregory et al., 2015a |
| Salmon River siltstone, Australia | Sedimentary pyrite | 10 | 6 | | Large et al., 2014; Gregory et al., 2015a |
| Satkinskaya Suite, Russia | Sedimentary pyrite | 10 | 19 | | Large et al., 2014; Gregory et al., 2015a |
| Selwyn basin, Canada | Sedimentary pyrite | 10 | 221 | | Large et al., 2014; Gregory et al., 2015a |
| Kutlular, Turkey | VHMS | 15 | 6 | | Revan et al., 2014 |
| Kyzilkaya, Turkey | VHMS | 13 | | | Revan et al., 2014 |
| Lahanos, Turkey | VHMS | 15 | | | Revan et al., 2014 |
| Jaguar, Australia | VHMS | 10 | | | Belousov et al., 2016 |
| Golden Grove, Australia | VHMS | 12 | | | Belousov et al., 2016 |
| Scuddles, Australia | VHMS | 9 | | | Belousov et al., 2016 |
| Yaman-Kasy deposit, Russia | VHMS | 46 | 310 | | Maslennikov et al., 2009, 2017 |
| Hill 50, Australia | Orogenic gold | | | 22 | Belousov et al., 2016 |
| Wallaby, Australia | Orogenic gold | | | 23 | Belousov et al., 2016 |
| Wiluna, Australia | Orogenic gold | | | 62 | Belousov et al., 2016 |
| Youanmi, Australia | Orogenic gold | | | 11 | Belousov et al., 2016 |
| Anniversary deposit central, Canada | SEDEX | | | 44 | Gadd et al., 2016 |
| Anniversary deposit east, Canada | SEDEX | | | 15 | Gadd et al., 2016 |
| OP, Canada | SEDEX | | | 7 | Gadd et al., 2016 |
| Alum shale, Sweden | Sedimentary pyrite | | | 28 | Large et al., 2014; Gregory et al., 2015a |
| Cowrie siltstone, Australia | Sedimentary pyrite | | | 28 | Large et al., 2014; Gregory et al., 2015a |
| Curnamona, Australia | Sedimentary pyrite | | | 37 | Large et al., 2014; Gregory et al., 2015a |
| Dead Bullock Formation, Australia | Sedimentary pyrite | | | 25 | Large et al., 2014; Gregory et al., 2015a |
| Doushantuo Formation, China | Sedimentary pyrite | | | 106 | Gregory et al., 2017 |
| Gordon Group, Australia | Sedimentary pyrite | | | 13 | Large et al., 2014; Gregory et al., 2015a |
| Jet Rock Formation, UK | Sedimentary pyrite | | | 28 | Large et al., 2014; Gregory et al., 2015a |
| Johnson Cairn Formation, Australia | Sedimentary pyrite | | | 17 | Large et al., 2014; Gregory et al., 2015a |
| Liuchapo Formation, China | Sedimentary pyrite | | | 10 | Gregory et al., 2017 |
| Lower Keppel Creek Formation, Australia | Sedimentary pyrite | | | 48 | Large et al., 2014; Gregory et al., 2015a |
| Oxford J3, Russia | Sedimentary pyrite | | | 29 | Large et al., 2014; Gregory et al., 2015a |
| Armadeus basin, Australia | Sedimentary pyrite | | | 9 | Large et al., 2014; Gregory et al., 2015a |
| Posidonia shale, Germany | Sedimentary pyrite | | | 20 | Large et al., 2014; Gregory et al., 2015a |
| Que River shale, Australia | Sedimentary pyrite | | | 14 | Large et al., 2014; Gregory et al., 2015a |
| Railway shale, Australia | Sedimentary pyrite | | | 15 | Large et al., 2014; Gregory et al., 2015a |
| Valkyrie Formation, Australia | Sedimentary pyrite | | | 9 | Large et al., 2014; Gregory et al., 2015a |
| Yeneena basin, Australia | Sedimentary pyrite | | | 15 | Large et al., 2014; Gregory et al., 2015a |
| Chaely deposit, Turkey | VHMS | | | 4 | Revan et al., 2014 |
| DeGrussa, Australia | VHMS | | | 32 | Belousov et al., 2016 |
| Kilik, Ural, Turkey | VHMS | | | 10 | Revan et al., 2014 |

Beam size varied from 10 to 100 μm, depending on the size of pyrite analyzed and the goals of the relevant study. For each analysis, background was measured for 30 s prior to a 40- to 60-s laser ablation period. The analyses were conducted in a pure He atmosphere, and Ar was added to the gas stream prior to injection into the ICP-MS to improve aerosol transport. No correction was applied for doubly charged species, because these species were kept at low levels (below 0.2%). Standards were analyzed at the start and end of each sample change and approximately every 25 analyses in between. The standard STDGL2b2 (Danyushevsky et al., 2011) was used to analyze the elements of interest (except for the analyses taken from Gadd et al., 2016).

The locations, pertinent references, and number of analyses used for Random Forests training, testing, and blind testing are given in Table 1. To limit the influence of trace elements from microinclusions of other minerals that might be included during the ablation of pyrite, the data were screened to ensure that no analyses had higher than 1% Zn, 2% As, 1% Cu, 1% Ni, or 2% Co. Also, for analyses on which matrix corrections were performed, samples with higher than 20% matrix were removed. This combination of newly acquired and compiled data yielded a total of 3,579 analyses from 70 different deposits and sedimentary units. Of these, 2,898 analyses from 43 individual deposits/sedimentary formations were used to train and initially test the Random Forests classifier to identify five distinct ore deposit types: IOCG, orogenic Au, porphyry Cu, SEDEX, and VHMS. In addition to these mineral deposit types, barren sedimentary pyrite was included as a class in the training data in an attempt to avoid misclassifying nonmineralized pyrite as coming from an ore deposit.
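The microinclusion screen described above can be expressed as a simple filter. The sketch below assumes concentrations are stored in ppm, so 1 wt % corresponds to 10,000 ppm; the example analyses are hypothetical, and the separate 20% matrix criterion is not covered.

```python
PPM_PER_PERCENT = 10_000  # 1 wt % = 10,000 ppm

# Upper limits from the screening criteria in the text, converted to ppm.
MAX_PPM = {
    "Zn": 1 * PPM_PER_PERCENT,
    "As": 2 * PPM_PER_PERCENT,
    "Cu": 1 * PPM_PER_PERCENT,
    "Ni": 1 * PPM_PER_PERCENT,
    "Co": 2 * PPM_PER_PERCENT,
}

def passes_screen(analysis_ppm):
    """True if no screened element exceeds its limit (missing elements count as 0)."""
    return all(analysis_ppm.get(el, 0.0) <= limit for el, limit in MAX_PPM.items())

# Hypothetical analyses (ppm), for illustration only.
analyses = [
    {"id": "a1", "Zn": 95.0, "As": 770.0, "Cu": 495.0, "Ni": 420.0, "Co": 80.0},
    {"id": "a2", "Zn": 25_000.0, "As": 600.0, "Cu": 300.0, "Ni": 50.0, "Co": 10.0},
    {"id": "a3", "Zn": 10.0, "As": 19_500.0, "Cu": 12.0, "Ni": 900.0, "Co": 1_700.0},
]

retained = [a["id"] for a in analyses if passes_screen(a)]
print(retained)  # a2 is rejected because Zn exceeds 1%
```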

The remaining 681 analyses from 27 different deposits/sedimentary formations were used as blind tests of the trained classifier. These data are referred to as blind because analyses from these deposits/sedimentary formations were not present in the training or test data sets.

Data distributions

The geometric mean, multiplicative standard deviation, median, and median absolute deviation (MAD) values of element concentrations for the different ore deposit types from the training and total data sets are provided in Tables 2 and 3. The geometric mean and the median are both presented, because they provide robust summaries of the data, depending on their distributions. Where data are log-normally distributed, the geometric mean and multiplicative standard deviation provide a more useful summary. However, when data are not log-normally distributed, the median and MAD are more appropriate (Reimann and Filzmoser, 2000).
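For readers who wish to compute comparable summaries for their own data, the four statistics can be sketched with the Python standard library. Two choices here are assumptions, because they are not specified in the text: the multiplicative standard deviation uses the sample (n − 1) standard deviation of the log-transformed values, and the MAD is left unscaled. The input values are hypothetical.

```python
import math
import statistics

def geometric_mean(values):
    """exp of the arithmetic mean of the natural logs (values must be > 0)."""
    return math.exp(statistics.fmean([math.log(v) for v in values]))

def multiplicative_sd(values):
    """exp of the sample standard deviation of the natural logs (assumed n - 1)."""
    return math.exp(statistics.stdev([math.log(v) for v in values]))

def median_absolute_deviation(values):
    """Median of absolute deviations from the median (unscaled; an assumption)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

# Hypothetical concentrations (ppm), for illustration only.
values = [1.0, 10.0, 100.0]

print(geometric_mean(values))             # approximately 10
print(multiplicative_sd(values))          # approximately 10
print(statistics.median(values))          # 10.0
print(median_absolute_deviation(values))  # 9.0
```

For log-normally distributed data, roughly two-thirds of values fall between the geometric mean divided by the multiplicative standard deviation and the geometric mean multiplied by it, which is why the pair is reported together.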

Table 2.

Summary of Statistics for Data Set Used in Training the Random Forests

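| Deposit | Statistic | Co (ppm) | Ni (ppm) | Cu (ppm) | Zn (ppm) | As (ppm) | Mo (ppm) | Ag (ppm) | Sb (ppm) | Te (ppm) | Au (ppm) | Tl (ppm) | Pb (ppm) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IOCG | n | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
| | Median | 1,735.3 | 67.94 | 3.08 | 0.50 | 2.15 | 0.03 | 0.04 | 0.04 | 1.48 | 0.01 | 0.01 | 1.35 |
| | MAD | 1,680.3 | 67.58 | 3.00 | 0.26 | 2.11 | 0.03 | 0.03 | 0.03 | 0.65 | 0.00 | 0.01 | 1.34 |
| | GM | 740.26 | 54.77 | 2.75 | 0.74 | 3.94 | 0.05 | 0.06 | 0.07 | 2.18 | 0.01 | 0.03 | 0.76 |
| | MSD | 12.88 | 14.89 | 20.16 | 3.72 | 28.14 | 14.54 | 11.44 | 14.78 | 3.04 | 5.36 | 25.85 | 46.18 |
| Orogenic Au | n | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
| | Median | 208.25 | 92.61 | 5.37 | 0.94 | 163.53 | 0.02 | 0.25 | 2.89 | 1.13 | 0.16 | 0.01 | 15.18 |
| | MAD | 205.61 | 89.38 | 5.10 | 0.83 | 162.43 | 0.01 | 0.24 | 2.88 | 1.07 | 0.16 | 0.01 | 14.62 |
| | GM | 69.00 | 99.22 | 8.15 | 1.92 | 183.38 | 0.04 | 0.29 | 1.67 | 1.22 | 0.25 | 0.02 | 7.86 |
| | MSD | 16.99 | 10.07 | 17.74 | 15.61 | 20.41 | 9.03 | 17.04 | 31.31 | 12.09 | 20.15 | 10.96 | 17.03 |
| Porphyry | n | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
| | Median | 590.40 | 514.04 | 4.01 | 1.39 | 53.36 | 0.17 | 0.14 | 0.16 | 2.10 | 0.03 | 0.01 | 1.10 |
| | MAD | 537.86 | 445.61 | 3.61 | 1.01 | 46.13 | 0.16 | 0.13 | 0.14 | 1.83 | 0.03 | 0.01 | 1.07 |
| | GM | 452.13 | 336.52 | 6.51 | 2.05 | 59.76 | 0.14 | 0.18 | 0.25 | 2.02 | 0.05 | 0.02 | 1.33 |
| | MSD | 7.69 | 6.82 | 16.65 | 6.76 | 8.14 | 9.08 | 9.88 | 8.57 | 8.06 | 8.59 | 8.27 | 11.90 |
| SEDEX | n | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
| | Median | 80.00 | 421.71 | 495.49 | 95.95 | 769.63 | 23.38 | 23.88 | 67.85 | 0.27 | 0.02 | 24.56 | 963.86 |
| | MAD | 61.84 | 404.66 | 410.26 | 74.14 | 593.03 | 20.88 | 17.25 | 48.94 | 0.00 | 0.00 | 23.56 | 622.56 |
| | GM | 54.39 | 256.92 | 427.17 | 131.75 | 623.05 | 22.94 | 16.06 | 61.86 | 0.28 | 0.02 | 21.47 | 846.17 |
| | MSD | 4.92 | 8.30 | 4.39 | 4.71 | 3.86 | 6.42 | 3.84 | 3.90 | 2.02 | 1.33 | 9.72 | 3.44 |
| Sedimentary | n | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
| | Median | 62.49 | 215.88 | 182.24 | 23.45 | 639.42 | 28.38 | 2.21 | 16.13 | 0.21 | 0.01 | 5.45 | 217.16 |
| | MAD | 52.55 | 128.25 | 131.85 | 20.84 | 486.89 | 24.11 | 2.06 | 13.82 | 0.10 | 0.01 | 4.78 | 149.69 |
| | GM | 52.03 | 262.26 | 179.83 | 23.42 | 536.71 | 22.16 | 2.41 | 20.20 | 0.38 | 0.02 | 5.82 | 192.64 |
| | MSD | 6.19 | 3.31 | 4.04 | 6.41 | 4.17 | 5.28 | 6.42 | 5.03 | 5.35 | 4.19 | 5.33 | 3.38 |
| VHMS | n | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
| | Median | 21.34 | 5.86 | 1,002.6 | 180.02 | 660.48 | 0.98 | 22.00 | 24.96 | 5.17 | 0.41 | 2.16 | 320.41 |
| | MAD | 21.33 | 4.96 | 786.00 | 178.33 | 616.23 | 0.96 | 21.12 | 24.21 | 5.07 | 0.39 | 2.16 | 306.95 |
| | GM | 8.23 | 3.89 | 654.64 | 140.00 | 570.45 | 0.89 | 14.90 | 21.27 | 5.36 | 0.56 | 1.21 | 202.38 |
| | MSD | 47.82 | 7.57 | 7.89 | 15.10 | 10.42 | 18.21 | 11.04 | 13.23 | 22.60 | 8.78 | 30.39 | 12.93 |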

GM = geometric mean, MAD = median absolute deviation, MSD = multiplicative standard deviation

Table 3.

Summary of Statistics for Entire Data Set

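| Deposit | Statistic | Co (ppm) | Ni (ppm) | Cu (ppm) | Zn (ppm) | As (ppm) | Mo (ppm) | Ag (ppm) | Sb (ppm) | Te (ppm) | Au (ppm) | Tl (ppm) | Pb (ppm) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IOCG | n | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 | 159 |
| | Median | 1,739.9 | 62.20 | 2.89 | 0.50 | 2.51 | 0.03 | 0.03 | 0.04 | 1.55 | 0.01 | 0.01 | 1.27 |
| | MAD | 1,694.4 | 61.48 | 2.83 | 0.24 | 2.47 | 0.03 | 0.03 | 0.03 | 0.68 | 0.00 | 0.01 | 1.27 |
| | GM | 786.16 | 49.89 | 2.50 | 0.69 | 4.39 | 0.05 | 0.05 | 0.07 | 2.24 | 0.01 | 0.03 | 0.75 |
| | MSD | 12.03 | 14.67 | 19.07 | 3.59 | 28.07 | 14.77 | 10.39 | 16.03 | 3.04 | 5.29 | 24.15 | 49.42 |
| Orogenic Au | n | 436 | 436 | 436 | 436 | 436 | 436 | 436 | 436 | 436 | 436 | 436 | 436 |
| | Median | 104.88 | 106.72 | 5.42 | 0.92 | 106.01 | 0.02 | 0.23 | 1.59 | 1.16 | 0.22 | 0.01 | 9.99 |
| | MAD | 100.50 | 96.99 | 5.38 | 0.89 | 105.85 | 0.02 | 0.22 | 1.59 | 1.10 | 0.22 | 0.01 | 9.95 |
| | GM | 82.36 | 108.05 | 5.25 | 1.46 | 106.62 | 0.03 | 0.19 | 0.95 | 1.06 | 0.23 | 0.02 | 3.80 |
| | MSD | 9.22 | 6.24 | 24.73 | 15.64 | 38.35 | 13.32 | 18.33 | 38.62 | 11.36 | 29.98 | 11.91 | 24.20 |
| Porphyry | n | 416 | 416 | 416 | 416 | 416 | 416 | 416 | 416 | 416 | 416 | 416 | 416 |
| | Median | 452.81 | 256.89 | 3.37 | 1.65 | 64.62 | 0.31 | 0.17 | 0.16 | 1.57 | 0.05 | 0.01 | 0.96 |
| | MAD | 410.08 | 238.39 | 2.87 | 1.22 | 58.34 | 0.29 | 0.15 | 0.14 | 1.34 | 0.05 | 0.01 | 0.92 |
| | GM | 264.53 | 176.06 | 5.77 | 2.11 | 83.17 | 0.24 | 0.18 | 0.24 | 1.57 | 0.05 | 0.02 | 1.21 |
| | MSD | 9.93 | 8.74 | 15.23 | 5.73 | 8.32 | 7.42 | 7.54 | 9.13 | 6.82 | 6.41 | 5.37 | 14.36 |
| SEDEX | n | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 | 863 |
| | Median | 77.91 | 409.00 | 394.48 | 82.21 | 898.62 | 22.57 | 11.03 | 57.63 | 0.27 | 0.02 | 20.24 | 716.55 |
| | MAD | 65.44 | 371.44 | 308.78 | 61.03 | 644.74 | 18.35 | 8.38 | 45.11 | 0.00 | 0.00 | 19.02 | 506.11 |
| | GM | 72.82 | 347.80 | 428.97 | 95.10 | 765.34 | 20.57 | 11.47 | 52.44 | 0.31 | 0.02 | 20.86 | 675.64 |
| | MSD | 5.24 | 5.88 | 4.07 | 4.75 | 3.55 | 5.55 | 3.92 | 3.95 | 2.13 | 1.60 | 9.90 | 3.56 |
| Sedimentary | n | 1,223 | 1,223 | 1,223 | 1,223 | 1,223 | 1,223 | 1,223 | 1,223 | 1,223 | 1,223 | 1,223 | 1,223 |
| | Median | 99.48 | 401.88 | 199.31 | 27.72 | 429.01 | 19.99 | 2.01 | 23.25 | 0.48 | 0.03 | 3.51 | 181.79 |
| | MAD | 90.02 | 293.91 | 154.39 | 24.93 | 343.69 | 18.37 | 1.89 | 19.55 | 0.35 | 0.02 | 2.60 | 140.74 |
| | GM | 77.28 | 354.70 | 165.28 | 28.06 | 383.47 | 18.16 | 1.87 | 24.92 | 0.70 | 0.10 | 3.94 | 143.34 |
| | MSD | 6.68 | 3.75 | 4.81 | 7.08 | 4.63 | 7.11 | 8.88 | 5.77 | 5.58 | 4.48 | 4.54 | 4.71 |
| VHMS | n | 482 | 482 | 482 | 482 | 482 | 482 | 482 | 482 | 482 | 482 | 482 | 482 |
| | Median | 5.37 | 3.00 | 1,011.7 | 322 | 987.27 | 1.83 | 24.64 | 35 | 24.08 | 1.19 | 3.49 | 448.06 |
| | MAD | 5.33 | 2.94 | 841.51 | 314.29 | 734.17 | 1.79 | 23.37 | 33.98 | 23.94 | 1.12 | 3.47 | 433.85 |
| | GM | 4.35 | 1.69 | 605.88 | 190.88 | 771.80 | 1.05 | 18.03 | 30.08 | 18.33 | 0.94 | 1.21 | 245.20 |
| | MSD | 31.01 | 12.60 | 7.93 | 11.93 | 7.06 | 15.33 | 12.46 | 12.56 | 22.03 | 10.11 | 23.55 | 11.85 |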

GM = geometric mean, MAD = median absolute deviation, MSD = multiplicative standard deviation
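The four summary statistics abbreviated above can be sketched in a few lines. This is a minimal illustration and not the authors' code: it assumes the MAD is the unscaled median of absolute deviations from the median, and that the MSD is the exponential of the sample standard deviation of the log-transformed values; the function name `summarize` and the example values are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Median, MAD, geometric mean (GM), and multiplicative SD (MSD).

    Assumes strictly positive concentrations (e.g., ppm), because the
    GM and MSD are computed on natural logarithms.
    """
    med = statistics.median(values)
    # Unscaled MAD: median of absolute deviations from the median (assumption).
    mad = statistics.median([abs(v - med) for v in values])
    logs = [math.log(v) for v in values]
    gm = math.exp(statistics.fmean(logs))
    # MSD as exp of the sample SD of the logs (assumption: n - 1 denominator).
    msd = math.exp(statistics.stdev(logs))
    return {"median": med, "MAD": mad, "GM": gm, "MSD": msd}


# Hypothetical concentrations spanning two orders of magnitude.
stats = summarize([1.0, 10.0, 100.0])
```

Because the MSD is a ratio rather than a difference, a value of 10 here means that one standard deviation corresponds to multiplying or dividing the GM by 10.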

With the exception of the VHMS and IOCG deposits, the training data set used equal numbers of analyses from each deposit. The training data set is therefore less biased by the number of analyses performed on the different deposits; without this balancing, the classifier would skew toward picking the deposit that has more data points in the training set. VHMS and IOCG deposits did not have sufficient analyses from a variety of deposits to allow equal numbers of analyses from each deposit in the training data set. Additionally, of the reported statistics, we assert that the medians of trace element content for the different ore deposit types from the training data set, rather than the total data set statistics, should be used for comparisons in future studies. This is because the training set geometric mean and median attempt to represent equal contributions from the different deposits instead of being overly representative of one deposit from which we have more data.

Random Forests training and evaluation

To train and test the Random Forests classifier, we used a total of 3,579 analyses of pyrite that passed the screening process: 159 IOCG, 436 orogenic Au, 416 porphyry Cu, 863 SEDEX, 1,223 sedimentary pyrite, and 482 VHMS. The pyrite trace element data were then split into three groups for classifier training, testing, and blind testing. The 681 analyses used for the blind test were removed (Table 1): orogenic Au (118 from four deposits), SEDEX (66 from three deposits), sedimentary pyrite (451 from 17 formations/basins), and VHMS (46 from three deposits). From the remaining data, a total of 120 analyses from each ore deposit type were used to train Random Forests. To avoid bias toward classes with more analyses, an equal number of analyses from each deposit were randomly selected, except for VHMS and IOCG deposits, because some deposits lacked a sufficient number of analyses to have equal numbers of analyses (Table 1). The remaining data (2,178 analyses) were used as the initial test of the classifier. A total of 500 trees were used, and splitting was halted if there were five or fewer instances in the resulting child node.
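The partition sizes quoted above can be checked directly, and the per-deposit balancing can be sketched as a round-robin draw. The sketch below is an assumption about one way to select equal numbers of analyses per deposit; the function `balanced_training_sample`, the record layout, and the toy deposits are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict

# Partition sizes reported in the text.
total = 159 + 436 + 416 + 863 + 1223 + 482  # all screened analyses
blind = 118 + 66 + 451 + 46                 # analyses from held-out deposits
train = 6 * 120                             # 120 analyses per class
test = total - blind - train                # remainder used as test data


def balanced_training_sample(records, per_class, seed=0):
    """Pick per_class analyses per deposit type, spread evenly over deposits.

    records: iterable of (deposit_type, deposit_name, analysis_id).
    """
    rng = random.Random(seed)
    by_type = defaultdict(lambda: defaultdict(list))
    for dep_type, deposit, analysis_id in records:
        by_type[dep_type][deposit].append(analysis_id)

    training = {}
    for dep_type, deposits in by_type.items():
        pools = []
        for deposit in sorted(deposits):
            ids = list(deposits[deposit])
            rng.shuffle(ids)
            pools.append(ids)
        chosen = []
        # Round-robin: one analysis per deposit per pass, so deposits
        # contribute equally until a small deposit runs out of analyses.
        while len(chosen) < per_class and any(pools):
            for pool in pools:
                if pool and len(chosen) < per_class:
                    chosen.append(pool.pop())
        training[dep_type] = chosen
    return training


# Toy example: deposit "c" has too few analyses to contribute equally.
toy = (
    [("VHMS", "deposit_a", f"a{i}") for i in range(10)]
    + [("VHMS", "deposit_b", f"b{i}") for i in range(10)]
    + [("VHMS", "deposit_c", f"c{i}") for i in range(2)]
)
sample = balanced_training_sample(toy, per_class=12)["VHMS"]
counts = {d: sum(s.startswith(d) for s in sample) for d in "abc"}
```

In the toy case the small deposit contributes everything it has and the two larger deposits share the remainder equally, which mirrors the situation described for the VHMS and IOCG classes.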

The mean decrease in Gini index, a measure of the contribution of a given variable to correctly classifying the training data, was used to determine the relevance of different elements during Random Forests classifier training (Fig. 3). This measure of variable importance is the average total decrease in node impurity (based on the Gini index) from splitting on a given variable, weighted by the proportion of samples in that node. Nickel, As, and Co generated the lowest mean decrease in Gini index values (0.069, 0.069, and 0.062, respectively). To assess whether any or all of these elements could be excluded from classifier training, Random Forests classifiers were also trained with different combinations of these elements removed (Co, Ni, As, Co-Ni, Co-As, Ni-As, and Co-Ni-As). The classifier was also tested with Te and Au removed, because these elements have significant numbers of analyses below detection limits, which could bias classifier training through correlations between detection limits and the analyses of individual deposits. In the end, Co, Ni, Cu, Zn, As, Mo, Ag, Sb, Te, Tl, and Pb were chosen as the preferred input variables.
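The quantity that is averaged to obtain the mean decrease in Gini index can be illustrated for a single split. The sketch below shows only the impurity decrease at one node; how the decreases are accumulated and normalized across the 500 trees depends on the software used, which is not specified here, so the function names and the toy labels are illustrative assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease from splitting parent into left and right children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy node with two equally represented classes.
parent = ["SEDEX"] * 4 + ["IOCG"] * 4
perfect = gini_decrease(parent, ["SEDEX"] * 4, ["IOCG"] * 4)
useless = gini_decrease(
    parent,
    ["SEDEX", "SEDEX", "IOCG", "IOCG"],
    ["SEDEX", "SEDEX", "IOCG", "IOCG"],
)
```

An element that frequently produces splits like the first case accumulates a large mean decrease; one that mostly produces splits like the second accumulates little, which is the sense in which Ni, As, and Co rank lowest.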

Fig. 3.

Gini decrease for elements used in the ore deposit type Random Forests classifier.

Random Forests generates class predictions based on a majority of votes cast by all decision trees. Associated class membership probabilities provide an opportunity to evaluate the confidence of individual classifications (Cracknell and Reading, 2013). To assess the effectiveness of the trained classifier with respect to ambiguous classifications, a range of class membership probability thresholds for the winning class were tested: >33, >40, and >50% of votes. Higher probability thresholds remove increasingly uncertain predictions (this is the proportion of votes needed for a single analysis to be classified conclusively). Additionally, rather than requiring a 50% or greater proportion of analyses from a given deposit to consider a deposit correctly identified, we require that ≥65% of analyses be classified as the correct deposit type for a conclusive identification (this is the proportion of analyses from a deposit that must be correctly identified). When ≤35% of the analyses from a deposit are classified as the correct deposit type, the identification is termed incorrect, and when the proportion is between 35 and 65%, it is termed inconclusive.
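The two criteria described above operate at different levels: one on the votes for a single analysis, the other on the set of analyses from a deposit. A minimal sketch follows, assuming that the deposit-level proportion is computed over conclusive analyses only and that an analysis is conclusive when the winning class holds at least 40% of the votes; the function names and the example vote fractions are hypothetical.

```python
def classify_analysis(vote_fractions, min_share=0.40):
    """Return the winning class, or None if its vote share is below min_share."""
    winner = max(vote_fractions, key=vote_fractions.get)
    if vote_fractions[winner] < min_share:
        return None  # inconclusive analysis, removed from further tallies
    return winner


def deposit_verdict(predictions, true_type, conclusive=0.65, incorrect=0.35):
    """Apply the deposit-level criteria to a list of per-analysis predictions."""
    kept = [p for p in predictions if p is not None]
    if not kept:
        return "inconclusive"
    share = sum(p == true_type for p in kept) / len(kept)
    if share >= conclusive:
        return "correct"
    if share <= incorrect:
        return "incorrect"
    return "inconclusive"


# Analysis level: hypothetical vote fractions from the trees.
ambiguous = classify_analysis({"SEDEX": 0.38, "Sedimentary": 0.35, "VHMS": 0.27})
confident = classify_analysis({"SEDEX": 0.55, "Sedimentary": 0.30, "VHMS": 0.15})

# Deposit level: 6 of 10 conclusive analyses correct falls between 35 and 65%.
six_of_ten = deposit_verdict(["Orogenic Au"] * 6 + ["Porphyry"] * 4, "Orogenic Au")
twelve_of_eighteen = deposit_verdict(["Orogenic Au"] * 12 + ["Porphyry"] * 6, "Orogenic Au")
none_of_seven = deposit_verdict(["Porphyry"] * 7, "Orogenic Au")
```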

Ore deposit type classification

Mineralization type classification outcomes for the test data are summarized in Table 4. The Random Forests classifier was run 10 times with different random selections of training data to assess how sensitive the classifier is to the random selection of training data (App. 1). Random Forests correctly identified the ore deposit type from pyrite trace element analyses with an overall accuracy of 91.0 ± 0.8%. Recall statistics for individual ore deposit types range from 76.9 ± 5.4 to 95.0 ± 0.8%. IOCG, orogenic Au, and porphyry Cu test data samples were predicted with recalls of 86.9 ± 5.0, 76.9 ± 5.4, and 84.2 ± 2.4%, respectively. SEDEX, sedimentary pyrite, and VHMS test data were predicted with noticeably better recalls of 95.0 ± 0.8, 92.8 ± 2.1, and 94.4 ± 1.8%, respectively. These results show that the different random selections for the training data produce similar results. As such, the same training data set (the one that produced the results in Table 4) was used for the following experiments.

Table 4.

Confusion Matrix for Random Forests Classification of Test Data

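| Actual (rows) / Predicted (columns) | IOCG | Orogenic Au | Porphyry | SEDEX | Sedimentary | VHMS | Sum | % correct |
|---|---|---|---|---|---|---|---|---|
| IOCG | 37 | | 1 | | 1 | | 39 | 94.9 |
| Orogenic Au | 9 | 156 | 26 | | 1 | 6 | 198 | 78.8 |
| Porphyry | 11 | 30 | 240 | | 6 | 9 | 296 | 81.1 |
| SEDEX | | 4 | 7 | 634 | 29 | 3 | 677 | 93.6 |
| Sedimentary | | 3 | 6 | 5 | 634 | 4 | 652 | 97.2 |
| VHMS | 2 | 11 | 2 | 4 | 7 | 290 | 316 | 91.8 |
| Sum | 59 | 204 | 282 | 643 | 678 | 312 | 2,178 | |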

Table 5 indicates that by removing a small percentage (7.8%) of Random Forests predictions with class membership probabilities of less than 40%, ambiguous classifications can be eliminated. This correction results in an increase in overall accuracy of 3.1% to a total of 94.5%. Similarly, the recalls for individual ore deposit types changed by between –0.3 and 7.0%, with IOCG, orogenic Au, porphyry Cu, SEDEX, sedimentary pyrite, and VHMS deposits having adjusted individual recalls of 94.6, 85.8, 87.8, 95.4, 98.5, and 94.6%, respectively.
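The overall accuracy and class recalls quoted for Tables 4 and 5 follow directly from the confusion matrix counts, as the following sketch shows. The counts are transcribed from the two tables, with blank cells taken to be zero (an assumption); the helper names are illustrative.

```python
CLASSES = ["IOCG", "Orogenic Au", "Porphyry", "SEDEX", "Sedimentary", "VHMS"]

# Rows are actual classes, columns are predicted classes (order as in CLASSES).
TABLE_4 = [
    [37, 0, 1, 0, 1, 0],
    [9, 156, 26, 0, 1, 6],
    [11, 30, 240, 0, 6, 9],
    [0, 4, 7, 634, 29, 3],
    [0, 3, 6, 5, 634, 4],
    [2, 11, 2, 4, 7, 290],
]
TABLE_5 = [  # same test data after dropping analyses with <40% of votes
    [35, 0, 1, 0, 1, 0],
    [7, 145, 16, 0, 0, 1],
    [5, 18, 216, 0, 4, 3],
    [0, 0, 3, 623, 25, 2],
    [0, 0, 1, 4, 598, 4],
    [1, 7, 1, 4, 3, 279],
]


def n_analyses(matrix):
    return sum(sum(row) for row in matrix)


def overall_accuracy(matrix):
    """Share of analyses on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / n_analyses(matrix)


def recalls(matrix):
    """Per-class recall: diagonal count divided by the row sum."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


acc_before = round(100 * overall_accuracy(TABLE_4), 1)
acc_after = round(100 * overall_accuracy(TABLE_5), 1)
recall_after = dict(zip(CLASSES, (round(100 * r, 1) for r in recalls(TABLE_5))))
removed_fraction = 1 - n_analyses(TABLE_5) / n_analyses(TABLE_4)
```

With these counts, `removed_fraction` evaluates to 171 of 2,178 analyses, or just under 8%, which is the cost paid for the gain in accuracy.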

Table 5.

Confusion Matrix for Random Forests Classification of Test Data when Samples with Less Than 40% of the Votes Are Removed

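| Actual (rows) / Predicted (columns) | IOCG | Orogenic Au | Porphyry | SEDEX | Sedimentary | VHMS | Sum | % correct |
|---|---|---|---|---|---|---|---|---|
| IOCG | 35 | | 1 | | 1 | | 37 | 94.6 |
| Orogenic Au | 7 | 145 | 16 | | | 1 | 169 | 85.8 |
| Porphyry | 5 | 18 | 216 | | 4 | 3 | 246 | 87.8 |
| SEDEX | | | 3 | 623 | 25 | 2 | 653 | 95.4 |
| Sedimentary | | | 1 | 4 | 598 | 4 | 607 | 98.5 |
| VHMS | 1 | 7 | 1 | 4 | 3 | 279 | 295 | 94.6 |
| Sum | 48 | 170 | 238 | 631 | 631 | 289 | 2,007 | |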

Blind test results indicate the Random Forests classifier generated predictions with an overall accuracy of 88% with class-dependent recalls between 73.9 and 95.5% (Table 6). The orogenic Au, SEDEX, sedimentary pyrite, and VHMS samples were classified with proportions of correct classification of 81.4, 95.5, 90.0, and 73.9%, respectively. More accurate results were again obtained when excluding predictions with maximum class membership probabilities of less than 40% (Table 7). Increases in recall ranged from 1.4 to 10.7%, resulting in class recalls for orogenic Au of 85.5%, SEDEX of 96.9%, sedimentary pyrite of 96.7%, and VHMS of 84.6%.

Table 6.

Confusion Matrix for Random Forests Classification of Blind Test Data

| Actual (rows) / Predicted (columns) | IOCG | Orogenic Au | Porphyry | SEDEX | Sedimentary | VHMS | Sum | % correct |
|---|---|---|---|---|---|---|---|---|
| IOCG | | | | | | | | NA |
| Orogenic Au | 3 | 96 | 12 | | | 7 | 118 | 81.4 |
| Porphyry | | | | | | | | NA |
| SEDEX | | 1 | | 63 | 2 | | 66 | 95.5 |
| Sedimentary | 14 | 4 | 14 | 9 | 406 | 4 | 451 | 90.0 |
| VHMS | 3 | 4 | 1 | 2 | 2 | 34 | 46 | 73.9 |
| Sum | 20 | 105 | 27 | 74 | 410 | 45 | 681 | |

Table 7.

Confusion Matrix for Random Forests Classification of Blind Test Data when Samples with Less Than 40% of the Votes Are Removed

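| Actual (rows) / Predicted (columns) | IOCG | Orogenic Au | Porphyry | SEDEX | Sedimentary | VHMS | Sum | % correct |
|---|---|---|---|---|---|---|---|---|
| IOCG | | | | | | | 0 | NA |
| Orogenic Au | 2 | 94 | 8 | | | 6 | 110 | 85.5 |
| Porphyry | | | | | | | 0 | NA |
| SEDEX | | | | 62 | 2 | | 64 | 96.9 |
| Sedimentary | 1 | | 5 | 6 | 378 | 1 | 391 | 96.7 |
| VHMS | 1 | 3 | 1 | 1 | | 33 | 39 | 84.6 |
| Sum | 4 | 97 | 14 | 69 | 380 | 40 | 604 | |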

Different class membership thresholds were trialed (33, 40, and 50%). The 40% threshold was chosen because it led to an increase in overall recall of 3.1% (importantly, this includes an increase in orogenic Au recall of 7.0% and porphyry Cu recall of 6.7%) while preserving approximately 92.2% of the original analyses in the test data. The 33% threshold increased the recall by only 1.1%, whereas the 50% threshold increased the recall by 5.2% but required removal of 18.3% of the data. The results of these experiments are in Appendix 2.

A series of Random Forests classifications were rerun with different combinations of Te, Co, As, and Ni removed from the data. This exercise was included because Te has several analyses below detection limits and the data set from Gadd et al. (2016) did not include Te. The value of Co, As, and Ni was tested because these elements had the lowest mean decreases in the Gini index (Fig. 3). While the removal of one of these elements did not cause large changes in the overall ability of the Random Forests classifier to predict deposit type, it did significantly affect the ability to classify individual deposit types; thus, all these elements were included in the preferred classifier.

It has been proposed that pyrite trace element content is reset when trace elements are forced out of the pyrite lattice at metamorphic grades higher than midgreenschist facies (Large et al., 2009; Thomas et al., 2011). To test this assertion, we put LA-ICP-MS pyrite trace element data (n = 93) from orogenic gold deposits that have been metamorphosed to greater than midgreenschist facies through the classifier (Belousov et al., 2016). This returned only 67.7% correct identifications, 11.1% less than the lower metamorphic-grade orogenic gold deposits. Similarly, when inconclusive (less than 40% of votes) analyses are removed, this only increases to 70.9% correct identifications, 14.9% less than the lower metamorphic-grade orogenic gold deposits (Table 8).

Table 8.

Confusion Matrix for Random Forests Classification of High Metamorphic-Grade Pyrite

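| Deposit | Deposit type | % correct | Number correct | Number incorrect |
|---|---|---|---|---|
| Big Bell | Orogenic gold | 80.0 | 8 | 2 |
| Chalice | Orogenic gold | 90.0 | 9 | 1 |
| Hunt | Orogenic gold | 84.6 | 11 | 2 |
| Junction | Orogenic gold | 0.0 | | 7 |
| Kanowna Belle | Orogenic gold | 84.6 | 11 | 2 |
| Maybell | Orogenic gold | 62.5 | 5 | 3 |
| Porphyry | Orogenic gold | 55.0 | 11 | 9 |
| Redeemer | Orogenic gold | 37.5 | 3 | 5 |
| Total | | 65.2 | 58 | 31 |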

One of the drawbacks to using Random Forests is that it will always give an answer, even if the actual class of an unknown pyrite sample is not within the training data set. To test how the classifier reacts to pyrite that does not fit the types included in the training data, we attempted to classify the data presented by Gregory et al. (2016) from the St. Ives Au district, which include four types of pyrite not included in the training data (note that the orogenic Au pyrite from this study has been included in the training and test data sets of the classifier). Gregory et al. (2016) presented LA-ICP-MS analyses of sedimentary pyrite (py1 and py2; n = 143), nonmineralization-related hydrothermal pyrite (py3, py4, and py5; n = 37, 8, and 17, respectively), orogenic Au pyrite (py6; n = 117), and greenstone-related pyrite (py7; n = 20). Of these, sedimentary pyrite and orogenic Au pyrite had 97.5 and 84.9% of the analyses correctly identified, and only 16 and 9% of their analyses, respectively, were removed as inconclusive (received less than 40% of the votes). Py5 had 76% of its analyses removed as inconclusive, and for Py3 and Py7 the most common classification accounted for only 62.5% of the conclusive analyses. Py4 had 38% of its analyses removed as inconclusive, and 80% of the conclusive analyses were identified as orogenic Au. These results are summarized in Table 9.

Table 9.

Confusion Matrix for Random Forests Classification of St. Ives Pyrite Data Set, Including Non-Ore-Related Hydrothermal Pyrite

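| Pyrite type | % inconclusive | Inconclusive analyses | Conclusive analyses | % most common classification | Most common classification |
|---|---|---|---|---|---|
| Sedimentary | 16 | 23 | 120 | 97.5 | Sedimentary |
| Py3 | 14 | 5 | 32 | 62.5 | Orogenic Au |
| Py4 | 38 | 3 | 5 | 80.0 | Orogenic Au |
| Py5 | 76 | 13 | 4 | 100.0 | Porphyry |
| Orogenic Au | 9 | 11 | 106 | 84.9 | Orogenic Au |
| Py7 | 20 | 2 | 8 | 62.5 | Porphyry |
| Total | 17 | 57 | 275 | | |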

Conventional X-Y element scatter plots

Conventional element scatter plots of pyrite chemistry have been used with some degree of success to differentiate pyrite from different ore types. However, X-Y scatter plots are less useful when discriminating among pyrite from more than two deposit types. Examples are given in Figure 4 for the pyrite training data set from this study. In general terms, pyrite in the ore zones from medium- to low-temperature hydrothermal deposit types (VHMS and SEDEX) tends to contain higher concentrations of most trace elements compared to pyrite from higher-temperature hydrothermal deposit types (porphyry Cu, IOCG, and orogenic Au). This relationship is illustrated in Figure 4A through C and F (Zn-Cu, Mo-As, Ag-Pb, and Tl-Sb scatter plots). Sedimentary pyrite also contains high concentrations of most trace elements and plots in the same vicinity as data for SEDEX and VHMS deposits. Porphyry Cu, IOCG, and orogenic Au pyrites by comparison generally contain lower levels of Zn, Cu, Mo, Ag, Pb, Tl, and Sb. Commonly, the data for different deposit types overlap so strongly that it is virtually impossible to distinguish ore type based on simple trace element scatter plots (e.g., Fig. 4D).

Fig. 4.

Scatter plots of trace elements in pyrite used in training data set for the ore deposit type Random Forests classifier: A) Zn versus Cu, B) Mo versus As, C) Ag versus Pb, D) Te versus Au, E) Co versus Ni, and F) Tl versus Sb.

By simultaneously using several different elements, Random Forests allows us to go beyond what is possible with traditional X-Y plots, but visualization of the distinctions can be challenging. By assessing the overall element concentrations of classified ore deposit types, however, some of the Random Forests decision boundaries can be depicted. For this discussion, we use training data median values, as they are less affected by imbalances in the number of samples from each deposit type compared to complete or test data sets, and they provide a reasonable estimate of the central tendency of populations that are not normally distributed. Copper and Zn can be used to separate SEDEX (medians of 495.49 ppm for Cu and 95.95 ppm for Zn) and VHMS (medians of 1,002.64 ppm for Cu and 180.02 ppm for Zn) deposits from the other deposit types, as they are one to two orders of magnitude more enriched in these elements (Fig. 4). Conversely, distinctly low As values (median 2.15 ppm) can be used to separate IOCG and, to a lesser extent, porphyry Cu mineralization (median 53.36 ppm). Enrichments in molybdenum are known to occur in a number of sedimentary settings, particularly when euxinic conditions are present (Lyons et al., 2003; Tribovillard et al., 2006; Scott et al., 2008; Lyons et al., 2009). Therefore, it follows that high Mo can be used to identify SEDEX and sedimentary pyrite (medians of 23.38 and 28.38 ppm Mo, respectively), both of which formed in marine settings. Similarly, VHMS deposits have low but above detection Mo (median of 0.98 ppm), presumably due to the association of VHMS deposits with seawater and deposition at or near the sea floor. SEDEX (medians of 23.88 ppm for Ag and 963.86 ppm for Pb) and VHMS (medians of 22.00 ppm for Ag and 320.41 ppm for Pb) pyrite is enriched in silver and Pb.

Interestingly, Co and Ni in sulfide minerals, which have long been used to determine pyrite source (Loftus-Hills and Solomon, 1967), were among the lowest ranked elements in terms of mean decrease in Gini index (Fig. 3). Nevertheless, porphyry Cu-related pyrite is enriched in Ni compared to the other deposit types (median of 590.40 ppm), and IOCG is very enriched in Co (median of 1,735.28 ppm). Even Au, which was left out of the favored Random Forests classifier due to concerns about the number of analyses that were below detection limits, is potentially significant for identifying orogenic Au (median 0.16 ppm) and VHMS (median 0.41 ppm) deposit types. However, the strength of the Random Forests method lies with its ability to combine all observations rapidly.

Ore deposit type predictions

The results of Random Forests predictions for test (91.4% correct predictions) and blind test (88%) data (Tables 4, 6) demonstrate the efficacy of Random Forests analyses of pyrite databases to predict ore deposit type. The classification can be further refined by removing the analyses that did not meet the threshold of obtaining 40% or more of the votes from the Random Forests. This adjustment increased the accuracy of experiments with Au removed to 94.5% with 7.8% of data removed for the test data and to 93.9% with 11.3% of data removed for the blind test data (Tables 5, 7).

The very high proportion of correct predictions (98.5% for test data and 96.7% for blind test data) for sedimentary pyrite is particularly important. Specifically, those data represent the only nonmineralized pyrite samples investigated in this study, suggesting that Random Forests classification is able to accurately discriminate pyrite formed from mineralized systems from that formed at low temperature in the water column and in shallow marine sediments. There is often disagreement in the paleoceanographic community in discussions about whether hydrothermal overprints or ocean conditions are responsible for metal enrichments in the rock record. The Random Forests classifier developed here may facilitate the identification of hydrothermal overprints on sedimentary pyrite in future studies.

As there is a disparate number of analyses from different deposit types, it is possible that the classifier only works well for the deposits that have larger amounts of data. To test whether this is the case, we checked the individual results of the classifier (with a >40% vote threshold) for each deposit from the test and blind test data sets (Tables 10, 11). Of these, all but one of the deposits were conclusively (greater than 65%) and correctly identified. The deposit that was inconclusive, the Youanmi orogenic Au deposit, still had 60% of its analyses correctly classified and had only 10 analyses to classify, so it may be that the pyrite trace element content was not accurately represented by the sample. This demonstrates that the Random Forests classifier can identify analyses from the deposits used in developing the classifier.

Table 10.

Test Results of Individual Ore Deposits

| Deposit | Deposit type | % correct | Number correct | Number incorrect |
|---|---|---|---|---|
| Manxman | IOCG | 100.0 | 29 | 0 |
| Punt Hill | IOCG | 75.0 | 6 | 2 |
| Darlot | Orogenic Au | 100.0 | 4 | 0 |
| East Repulse | Orogenic Au | 77.4 | 24 | 7 |
| Fortnum | Orogenic Au | 75.0 | 15 | 5 |
| Golden Mile | Orogenic Au | 82.6 | 19 | 4 |
| Granny Smith | Orogenic Au | 75.0 | 3 | 1 |
| Lancefield | Orogenic Au | 100.0 | 1 | 0 |
| Mars | Orogenic Au | 77.8 | 7 | 2 |
| Meekatharra, Prohibition | Orogenic Au | 100.0 | 2 | 0 |
| Meekatharra, Micky Doolan | Orogenic Au | 100.0 | 2 | 0 |
| Minjar | Orogenic Au | 100.0 | 2 | 0 |
| Nathans Labouchere | Orogenic Au | 100.0 | 3 | 0 |
| Paddington | Orogenic Au | 100.0 | 2 | 0 |
| Songvang | Orogenic Au | 100.0 | 5 | 0 |
| Sunrise Dam | Orogenic Au | 100.0 | 2 | 0 |
| Victory | Orogenic Au | 91.5 | 54 | 5 |
| Chalkidiki | Porphyry | 80.0 | 28 | 7 |
| Cadia | Porphyry | 89.1 | 188 | 23 |
| Don | SEDEX | 100.0 | 93 | 0 |
| HYC | SEDEX | 91.9 | 284 | 25 |
| Lady Loretta | SEDEX | 100.0 | 25 | 0 |
| Pelly North | SEDEX | 100.0 | 72 | 0 |
| XY | SEDEX | 96.8 | 149 | 5 |
| Armadeus basin | Sedimentary pyrite | 100.0 | 12 | 0 |
| Barney Creek Formation, McArthur basin | Sedimentary pyrite | 100.0 | 4 | 0 |
| Woody Island siltstone | Sedimentary pyrite | 91.7 | 11 | 1 |
| Canning basin | Sedimentary pyrite | 98.3 | 174 | 3 |
| Carbondale | Sedimentary pyrite | 100.0 | 18 | 0 |
| Late Jurassic NW Shelf | Sedimentary pyrite | 100.0 | 5 | 0 |
| Perth basin | Sedimentary pyrite | 100.0 | 23 | 0 |
| Satkinskaya Suite | Sedimentary pyrite | 100.0 | 19 | 0 |
| Hamersley basin | Sedimentary pyrite | 96.5 | 110 | 4 |
| Selwyn basin | Sedimentary pyrite | 99.5 | 208 | 1 |
| Salmon River siltstone | Sedimentary pyrite | 100.0 | 6 | 0 |
| Canarvon basin | Sedimentary pyrite | 100.0 | 8 | 0 |
| Kutlular | VHMS | 80.0 | 4 | 1 |
| Total | | 94.8 | 1,896 | 111 |

Table 11.

Blind Test Results of Individual Ore Deposits

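| Deposit | Deposit type | % correct | Number correct | Number incorrect |
|---|---|---|---|---|
| Hill 50 | Orogenic Au | 66.7 | 12 | 6 |
| Wallaby | Orogenic Au | 80.0 | 16 | 4 |
| Wiluna | Orogenic Au | 96.8 | 60 | 2 |
| Youanmi | Orogenic Au | 60.0 | 6 | 4 |
| Anniversary Central | SEDEX | 95.2 | 40 | 2 |
| Anniversary East | SEDEX | 100.0 | 15 | 0 |
| SEDEXOP | SEDEX | 100.0 | 7 | 0 |
| Curnamona | Sedimentary | 94.3 | 33 | 2 |
| Alum shale | Sedimentary | 100.0 | 22 | 0 |
| Doushantuo Formation | Sedimentary | 93.1 | 67 | 5 |
| Jet Rock | Sedimentary | 100.0 | 26 | 0 |
| Armadeus basin | Sedimentary | 100.0 | 9 | 0 |
| Posidonia | Sedimentary | 100.0 | 20 | 0 |
| Railway shale | Sedimentary | 92.3 | 12 | 1 |
| Liuchapo Formation | Sedimentary | 83.3 | 5 | 1 |
| Gordon Group | Sedimentary | 100.0 | 13 | 0 |
| Que River shale | Sedimentary | 100.0 | 9 | 0 |
| Oxford J3 | Sedimentary | 100.0 | 29 | 0 |
| Rocky Cape Group Cowrie Siltstone | Sedimentary | 100.0 | 28 | 0 |
| Valkyrie Formation, McArthur basin | Sedimentary | 100.0 | 9 | 0 |
| Dead Bullock Formation | Sedimentary | 100.0 | 25 | 0 |
| Togari Group | Sedimentary | 95.8 | 46 | 2 |
| Yeneena basin | Sedimentary | 100.0 | 8 | 0 |
| Yerrida Group | Sedimentary | 100.0 | 17 | 0 |
| VHMS Chaely | VHMS | 75.0 | 3 | 1 |
| VHMS DeGrussa | VHMS | 80.0 | 20 | 5 |
| VHMS Kilik | VHMS | 100.0 | 10 | 0 |
| Total | | 94.2 | 567 | 35 |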

Effects of metamorphic grade on classifier predictions

To assess how a high-grade metamorphic overprint affects the ability of Random Forests to identify ore deposit type, we used analyses from Belousov et al. (2016) that were from upper greenschist or higher-grade metamorphic facies. These data resulted in a decrease of more than 10% in the effectiveness of the classifier (Table 8) and, importantly, resulted in 50% of the deposits being inconclusively classified or misclassified using the initial results, or 37.5% after inconclusive analyses (analyses that received less than 40% of the votes) were removed. This suggests that pyrite trace element content can give spurious results in high metamorphic-grade settings. The exact reason for this variation in trace element content is beyond the scope of this study; however, it is interesting to note that the Ni median is higher (258 ppm) and the Sb median is lower (0.49 ppm; Table 12) in the high metamorphic-grade orogenic gold deposits, making them more similar to high-temperature pyrite varieties such as those from porphyry deposits (Table 2; Franchini et al., 2015). This may reflect pyrite dissolution and reprecipitation or recrystallization of the pyrite at high temperatures, imparting a chemistry more indicative of magmatic processes.

Table 12.

Median and MAD for High Metamorphic-Grade Pyrite (from Belousov et al., 2016)

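| Deposit | Statistic | Co (ppm) | Ni (ppm) | Cu (ppm) | Zn (ppm) | As (ppm) | Mo (ppm) | Ag (ppm) | Sb (ppm) | Te (ppm) | Au (ppm) | Tl (ppm) | Pb (ppm) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| High metamorphic-grade orogenic gold | Median | 258.01 | 299.62 | 15.40 | 1.84 | 137.30 | 0.02 | 1.65 | 0.49 | 6.49 | 0.34 | 0.03 | 12.26 |
| | MAD | 217.57 | 173.76 | 15.07 | 1.69 | 136.23 | 0.01 | 1.53 | 0.48 | 6.14 | 0.34 | 0.03 | 9.21 |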

MAD = median absolute deviation

Identification of pyrite that has a source not included in the classifier

One of the limitations of using Random Forests to predict unknowns in a geologic setting is that it will always give an answer that corresponds with the input designations of the training data set. Because there is a wide variety of different deposits and pyrite sources not associated with economic mineral deposits, there is a risk that the classifier will assess everything as coming from a mineralized deposit. To check how a classifier will respond to barren, nonsedimentary pyrite, we used data for sedimentary pyrite, orogenic gold-related pyrite, and four pyrite generations unrelated to mineralization from the St. Ives Au district (Gregory et al., 2016). The sedimentary and orogenic Au pyrite were conclusively and correctly identified (note that the orogenic Au pyrite was included in the training data set earlier), while three of the nonmineralized pyrites returned inconclusive results (Table 9). The fourth was conclusively but incorrectly identified as orogenic Au. This shows that most barren pyrite can be identified correctly by calculating the proportion of analyses that are inconclusive and by establishing criteria for how many inconclusive identifications are present in a given sample or set of samples. At the same time, it serves as a reminder that this classifier still needs a large number of analyses from many of the deposit types listed, deposit types currently not represented in the classifier, and other types of nonmineralized pyrite before it can be confidently utilized in the mineral exploration industry. Furthermore, it also shows that the classifier has the potential to be used as one of several tools when making decisions regarding priority of drill targets but not as a replacement for traditional tools, such as petrography, when determining the paragenesis of an ore deposit.

Caveats and future work

The pyrite data investigated in this study were obtained from analyses collected over 10 years as part of a number of different projects with contrasting objectives. In addition, the LA-ICP-MS technology has continued to develop over this time, and detection limits for all trace elements vary significantly. This has resulted in a range of detection limits throughout the data, including SEDEX deposits with anomalously high limits for Se, Cd, Au, and Te (Maier, 2011). In the case of the data from Gadd et al. (2016), some of these elements were not analyzed (or reported). Cadmium and Se results were omitted from our training data for this reason but should be included in future analyses, as both these elements accumulate in pyrite and could be useful for discriminating ore deposit type.

Similarly, the optimal Random Forests classifier was refined to exclude Au. Tellurium, however, was not omitted from this classifier despite the lack of Te data from SEDEX deposits, because the classifier has difficulty identifying orogenic Au mineralization without it; Te is commonly associated with Au mineralization (Belousov et al., 2016). Because the Random Forests classifier requires all trace elements in the table to contain nonmissing values, the averages from the single SEDEX deposit that had good-quality Te and Au data (Lady Loretta) were used for all the SEDEX analyses. This has probably led to an overestimate of the ability of the classifier to identify SEDEX analyses, because the same Te value was used for all the SEDEX samples. However, because SEDEX pyrite also has distinctly higher Cu, Mo, Sb, Tl, and Pb concentrations compared to most other deposits, it is thought that Te is not particularly important for SEDEX classification. Furthermore, concentrations of Te in SEDEX samples only differ significantly from those in orogenic Au, porphyry Cu, and VHMS samples. To further test this reasoning, the favored classifier (omitting Au) was retrained and rerun on the test data with Te also excluded. The results are summarized in Table 13. This experiment showed that the SEDEX results were indeed enhanced by the substituted Te values; however, the SEDEX analyses without Te were still correctly identified most of the time, with a recall of 74.2% (for test data). The classifier will be strengthened by the addition of new SEDEX analyses with viable Te data, but until those data are available, the average Te concentration from Lady Loretta is used for SEDEX analyses with high detection limits or missing data.
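The substitution described above amounts to filling every missing Te value with a single constant, which leaves the filled element with no spread within the class. The sketch below illustrates that side effect; the function name, the record layout, and all concentrations are hypothetical and are not values from the data set.

```python
import statistics


def fill_missing_with_donor_mean(rows, element, donor_values):
    """Return copies of rows with missing values of element set to the donor mean."""
    fill = statistics.fmean(donor_values)
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input records are left untouched
        if new_row.get(element) is None:
            new_row[element] = fill
        filled.append(new_row)
    return filled


# Hypothetical Te values (ppm) from the one deposit with usable Te data.
donor_te = [0.25, 0.75]

# Hypothetical analyses from other deposits of the same class, lacking Te.
analyses = [
    {"Cu": 400.0, "Te": None},
    {"Cu": 350.0, "Te": None},
    {"Cu": 500.0},
]

filled = fill_missing_with_donor_mean(analyses, "Te", donor_te)
te_spread = statistics.pstdev([row["Te"] for row in filled])
```

Because every filled analysis carries the same Te value, a tree can isolate the class with a single split on Te, which is why a test without Te gives a more conservative estimate of SEDEX recall.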

Table 13. Confusion Matrix for Random Forests Classification of Test Data with No Te in Training Data Set

| Actual \ Predicted | IOCG | Orogenic Au | Porphyry | SEDEX | Sedimentary | VHMS | Sum | % correct |
|---|---|---|---|---|---|---|---|---|
| IOCG | 36 | 1 | 1 | 0 | 1 | 0 | 39 | 92.3 |
| Orogenic Au | 19 | 147 | 23 | 3 | 2 | 4 | 198 | 74.2 |
| Porphyry | 11 | 32 | 236 | 0 | 9 | 8 | 296 | 79.7 |
| SEDEX | 7 | 6 | 11 | 502 | 128 | 23 | 677 | 74.2 |
| Sedimentary | 12 | 6 | 5 | 71 | 554 | 4 | 652 | 85.0 |
| VHMS | 5 | 9 | 2 | 3 | 3 | 294 | 316 | 93.0 |
| Sum | 90 | 201 | 278 | 579 | 697 | 333 | 2,178 | |
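The "% correct" column of Table 13 is the per-class recall: the number of correctly classified analyses divided by the row sum for that class. As a worked check, the sketch below recomputes these values from the counts in Table 13. The function names are illustrative, and the overall accuracy it prints is derived here from the table counts rather than quoted from the text.

```python
labels = ["IOCG", "Orogenic Au", "Porphyry", "SEDEX", "Sedimentary", "VHMS"]

# Rows are actual classes, columns are predicted classes (counts from Table 13).
confusion = [
    [36, 1, 1, 0, 1, 0],
    [19, 147, 23, 3, 2, 4],
    [11, 32, 236, 0, 9, 8],
    [7, 6, 11, 502, 128, 23],
    [12, 6, 5, 71, 554, 4],
    [5, 9, 2, 3, 3, 294],
]


def per_class_recall(matrix):
    """Percentage of each actual class (row) that was predicted correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Percentage of all analyses on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


recall = dict(zip(labels, per_class_recall(confusion)))
overall = overall_accuracy(confusion)

for name, value in recall.items():
    print(f"{name}: {value:.1f}%")  # SEDEX prints 74.2%, matching Table 13
print(f"Overall: {overall:.1f}%")
```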

Tin and W may be useful discriminators, as has been shown for VHMS (high Sn) and orogenic Au deposits (high W; Belousov et al., 2016). These elements were not included in the classifier because of a general lack of data in some data sources. As W and Sn have been proven effective for discriminating between some deposit types, future pyrite analyses should include W and Sn to further assess their utility.

A further weakness of the current classifier is the variability in the number of deposits for which data are available and in the amount of data from those sites. Data are available from only two porphyry Cu districts and two IOCG deposits. This limited coverage may mean that the pyrite trace element concentrations for those deposit types do not fully represent the ranges likely to be found in mineralized systems. Additional data from porphyry Cu and IOCG deposits therefore need to be collected so that the variability between different deposits of the same type is better represented.

While an attempt was made to include as many different deposits as possible, we concede that several important deposit types were missing, such as epithermal Au, Carlin-type Au, and Ni/platinum group element deposits. Future iterations of this classification experiment should include these and other deposit types. Similarly, in its current state, the classifier only includes one type of barren pyrite—sedimentary pyrite. Future work should include barren metamorphic and igneous pyrite.

The Random Forests classifier developed here, based on the concentrations of Co, Ni, Cu, Zn, As, Mo, Ag, Sb, Te, Tl, and Pb in pyrite, correctly classified most analyses in both the test data and the blind test data, with overall accuracies of 94.5 and 93.9%, respectively, when inconclusive analyses (less than 40% of votes) are not considered. We conclude that Random Forests classifiers developed from microanalyses of individual minerals are potentially useful for identifying ore deposit type and should be considered a viable geochemical exploration tool. However, this should be regarded as a preliminary positive result: before the approach can be widely applied in mineral exploration, additional ore-related and non-ore-related pyrite varieties need to be added to the classifier. Furthermore, we stress that it should be regarded as one of many tools rather than a single stand-alone classification method. Parties who are interested in using the classifier on their own data sets are encouraged to contact the lead author, who can arrange the processing of LA-ICP-MS pyrite data.
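The 40% vote criterion can be sketched as follows, assuming that the fraction of Random Forests votes for each class is already available for an analysis. The function names and the vote fractions in the example are hypothetical; they are not output from the study's classifier, and the sketch is not the software used in the study.

```python
def classify_with_vote_threshold(vote_fractions, threshold=0.40):
    """Return the class with the most votes, or "inconclusive" if that class
    received less than `threshold` of the total votes."""
    if not vote_fractions:
        raise ValueError("vote_fractions is empty")
    best = max(vote_fractions, key=vote_fractions.get)
    if vote_fractions[best] < threshold:
        return "inconclusive"
    return best


def accuracy_excluding_inconclusive(predicted, actual):
    """Fraction of conclusive predictions that match the actual class.

    Returns None when every prediction is inconclusive.
    """
    pairs = [(p, a) for p, a in zip(predicted, actual) if p != "inconclusive"]
    if not pairs:
        return None
    return sum(p == a for p, a in pairs) / len(pairs)


# Hypothetical vote fractions for two analyses.
uncertain = {"SEDEX": 0.38, "Sedimentary": 0.35, "VHMS": 0.27}
confident = {"SEDEX": 0.62, "Sedimentary": 0.30, "VHMS": 0.08}

print(classify_with_vote_threshold(uncertain))  # inconclusive
print(classify_with_vote_threshold(confident))  # SEDEX
```

Removing inconclusive analyses before scoring raises the reported accuracy but also reduces the number of analyses that receive an identification, so both figures are worth reporting together.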

By testing how well the classifier can identify ore deposit type on pyrite that has passed through the midgreenschist facies metamorphic window, we have found that at least in some areas the trace element composition of pyrite has been significantly altered such that the classifier can no longer identify the original pyrite type conclusively. This supports the assertion that pyrite chemistry can be altered at these metamorphic grades.

These results are also important for fields of geology not interested in ore deposits or exploration for ore deposits. The high degree of effectiveness of the classifier for identifying sedimentary pyrite not associated with hydrothermal fluids has created an additional opportunity for recognizing hydrothermal overprints on sedimentary deposits included in paleoceanographic studies.

We would like to acknowledge the Western Australia and South Australia geological surveys for their support of the initial studies that accumulated much of the initial data that this project arose from. We also thank the University of Western Australia Centre for Exploration Targeting (UWA CET) for providing a sample set from Western Australia orogenic gold deposits. Funding for the compilation of additional data and the refining of the classifier was provided by the National Science Foundation Frontiers in Earth System Dynamics (NSF FESD) program and the National Aeronautics and Space Administration (NASA) Astrobiology Institute under cooperative agreement NNA15BB03A issued through the Science Mission Directorate. This study also benefited from data collected as part of the Australian Mineral Industry Research Association (AMIRA) International project P1060, Enhanced Geochemical Targeting in Magmatic-Hydrothermal Systems. The authors gratefully acknowledge Alan Goode and Adele Seymon (AMIRA International) and all the industry sponsors of P1060 for their generous sponsorship of this research. We also thank Artur Deditius and Denis Fougerouse for valuable suggestions on the manuscript.

1. Belousov, I., Large, R., Meffre, S., Danyushevsky, L., Steadman, J., and Beardsmore, T., 2016, Pyrite compositions from VHMS and orogenic Au deposits in the Yilgarn craton, Western Australia: Implications for gold and copper exploration: Ore Geology Reviews, v. 79, p. 474–499.
2. Breiman, L., 1984, Classification and regression trees: New York, Routledge, 368 p.
3. Breiman, L., 1996, Stacked regressions: Machine Learning, v. 24, p. 49–64.
4. Breiman, L., 2001, Random Forests: Machine Learning, v. 45, p. 5–32.
5. Carranza, E.J.M., and Laborte, A.G., 2015, Data-driven predictive mapping of gold prospectivity, Baguio district, Philippines: Application of Random Forests algorithm: Ore Geology Reviews, v. 71, p. 777–787.
6. Congalton, R.G., and Green, K., 1998, Assessing the accuracy of remotely sensed data: Principles and practices, 1st ed.: Boca Raton, Florida, Lewis Publications, 179 p.
7. Cracknell, M.J., and Reading, A.M., 2013, The upside of uncertainty: Identification of lithology contact zones from airborne geophysics and satellite data using Random Forests and support vector machines: Geophysics, v. 78, p. WB113–WB126.
8. Cracknell, M.J., and Reading, A.M., 2014, Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data, and the use of explicit spatial information: Computers and Geosciences, v. 63, p. 22–33.
9. Cracknell, M.J., Reading, A.M., and McNeill, A.W., 2014, Mapping geology and volcanic-hosted massive sulfide alteration in the Hellyer-Mt. Charter region, Tasmania, using Random Forests™ and self-organising maps: Australian Journal of Earth Sciences, v. 61, p. 287–304.
10. Danyushevsky, L., Robinson, P., Gilbert, S., Norman, M., Large, R., McGoldrick, P., and Shelley, M., 2011, Routine quantitative multi-element analysis of sulphide minerals by laser ablation ICP-MS: Standard development and consideration of matrix effects: Geochemistry: Exploration, Environment, Analysis, v. 11, no. 1, p. 51–60.
11. Demsar, J., Curk, T., Erjavec, A., Gorup, C., Hocevar, T., Milutinovic, M., Mozina, M., Polajnar, M., Toplak, M., Staric, A., Stajdohar, M., Umek, L., Zagar, L., Zbontar, J., Zitnik, M., and Zupan, B., 2013, Orange: Data mining toolbox in Python: Journal of Machine Learning Research, v. 14, p. 2349–2353.
12. Fernández-Delgado, M., Cernadas, E., Barro, S., and Amorim, D., 2014, Do we need hundreds of classifiers to solve real world classification problems?: Journal of Machine Learning Research, v. 15, p. 3133–3181.
13. Franchini, M., McFarlane, C., Maydagán, L., Reich, M., Lentz, D.R., Meinert, L., and Bouhier, V., 2015, Trace metals in pyrite and marcasite from the Agua Rica porphyry-high sulfidation epithermal deposit, Catamarca, Argentina: Textural features and metal zoning at the porphyry to epithermal transition: Ore Geology Reviews, v. 66, p. 366–387.
14. Gadd, M.G., Layton-Matthews, D., Peter, J.M., and Paradis, S.J., 2016, The world-class Howard’s Pass SEDEX Zn-Pb district, Selwyn basin, Yukon. Part I: Trace element compositions of pyrite record input of hydrothermal, diagenetic, and metamorphic fluids to mineralization: Mineralium Deposita, v. 51, no. 3, p. 319–342.
15. Gahegan, M., 2000, On the application of inductive machine learning tools to geographical analysis: Geographical Analysis, v. 32, p. 113–139.
16. Gregory, D.D., Meffre, S., and Large, R.R., 2014, Comparison of metal enrichment in pyrite framboids from a metal-enriched and metal-poor estuary: American Mineralogist, v. 99, p. 633–644.
17. Gregory, D.D., Large, R.R., Halpin, J.A., Baturina, E.L., Lyons, T.W., Wu, S., Danyushevsky, L., Sack, P.J., Chappaz, A., Maslennikov, V.V., and Bull, S.W., 2015a, Trace element content of sedimentary pyrite in black shales: Economic Geology, v. 110, no. 6, p. 1389–1410.
18. Gregory, D.D., Large, R.R., Halpin, J.A., Steadman, J.A., Hickman, A.H., Ireland, T.R., and Holden, P., 2015b, The chemical conditions of the late Archean Hamersley basin inferred from whole rock and pyrite geochemistry with Δ33S and δ34S isotope analyses: Geochimica et Cosmochimica Acta, v. 149, p. 223–250.
19. Gregory, D.D., Large, R.R., Bath, A.B., Steadman, J.A., Wu, S., Danyushevsky, L., Bull, S.W., Holden, P., and Ireland, T.R., 2016, Trace element content of pyrite from the Kapai slate, St. Ives gold district, Western Australia: Economic Geology, v. 111, no. 6, p. 1297–1320.
20. Gregory, D.D., Lyons, T.W., Large, R.R., Jiang, G., Stepanov, A.S., Diamond, C.W., Figueroa, M.C., and Olin, P., 2017, Whole rock and discrete pyrite geochemistry as complementary tracers of ancient ocean chemistry: An example from the Neoproterozoic Doushantuo Formation, China: Geochimica et Cosmochimica Acta, v. 216, p. 201–220.
21. Guyon, I., 2008, Practical feature selection: From correlation to causality, in Fogelman-Soulié, F., Perrotta, D., Piskorski, J., and Steinberger, R., eds., Mining massive data sets for security: Advances in data mining, search, social networks and text mining, and their applications to security: NATO Science for Peace and Security Series D: Information and Communication Security: Amsterdam, IOS Press, p. 27–43.
22. Guyon, I., 2009, A practical guide to model selection: Machine Learning Summer School, Canberra, Australia, January 26–February 6, 2009, Proceedings, p. 37.
23. Hastie, T., Tibshirani, R., and Friedman, J.H., 2009, The elements of statistical learning: Data mining, inference and prediction, 2nd ed., Springer series in statistics: New York, Springer, 745 p.
24. Kovacevic, M., Bajat, B., Trivic, B., and Pavlovic, R., 2009, Geological units classification of multispectral images by using support vector machines: International Conference on Intelligent Networking and Collaborative Systems, Institute of Electrical and Electronics Engineers (IEEE), Barcelona, Spain, November 4–6, 2009, Conference Presentation, p. 267–272.
25. Large, R.R., Danyushevsky, L., Hollit, C., Maslennikov, V., Meffre, S., Gilbert, S., Bull, S., Scott, R., Emsbo, P., Thomas, H., Singh, B., and Foster, J., 2009, Gold and trace element zonation in pyrite using a laser imaging technique: Implications for the timing of gold in orogenic and Carlin-type sediment-hosted deposits: Economic Geology, v. 104, no. 5, p. 635–668.
26. Large, R.R., Halpin, J.A., Danyushevsky, L.V., Maslennikov, V.V., Bull, S.W., Long, J.A., Gregory, D.D., Lounejeva, E., Lyons, T.W., and Sack, P.J., 2014, Trace element content of sedimentary pyrite as a new proxy for deep-time ocean-atmosphere evolution: Earth and Planetary Science Letters, v. 389, p. 209–220.
27. Large, R.R., Gregory, D.D., Steadman, J.A., Tomkins, A.G., Lounejeva, E., Danyushevsky, L.V., Halpin, J.A., Maslennikov, V., Sack, P.J., and Mukherjee, I., 2015a, Gold in the oceans through time: Earth and Planetary Science Letters, v. 428, p. 139–150.
28. Large, R.R., Halpin, J.A., Lounejeva, E., Danyushevsky, L.V., Maslennikov, V.V., Gregory, D., Sack, P.J., Haines, P.W., Long, J.A., and Makoundi, C., 2015b, Cycles of nutrient trace elements in the Phanerozoic ocean: Gondwana Research, v. 28, p. 1282–1293.
29. Loftus-Hills, G., and Solomon, M., 1967, Cobalt, nickel and selenium in sulphides as indicators of ore genesis: Mineralium Deposita, v. 2, no. 3, p. 228–242.
30. Lyons, T.W., Werne, J.P., Hollander, D.J., and Murray, R.W., 2003, Contrasting sulfur geochemistry and Fe/Al and Mo/Al ratios across the last oxic-to-anoxic transition in the Cariaco basin, Venezuela: Chemical Geology, v. 195, no. 1–4, p. 131–157.
31. Lyons, T.W., Anbar, A.D., Severmann, S., Scott, C., and Gill, B.C., 2009, Tracking euxinia in the ancient ocean: A multiproxy perspective and Proterozoic case study: Annual Review of Earth and Planetary Sciences, v. 37, p. 507–534.
32. Maier, R.C., 2011, Pyrite trace element haloes to northern Australian SEDEX deposits: Ph.D. thesis, Hobart, Australia, University of Tasmania, 217 p.
33. Maslennikov, V.V., Maslennikova, S.P., Large, R.R., and Danyushevsky, L.V., 2009, Study of trace element zonation in vent chimneys from the Silurian Yaman-Kasy volcanic-hosted massive sulfide deposit (Southern Urals, Russia) using laser ablation-inductively coupled plasma mass spectrometry (LA-ICPMS): Economic Geology, v. 104, no. 8, p. 1111–1141.
34. Maslennikov, V.V., Maslennikova, S.P., Large, R.R., Danyushevsky, L.V., Herrington, R.J., Ayupova, N.R., Zaykov, V.V., Lein, A.Y., Tseluyko, A.S., Melekestseva, I.Y., and Tessalina, S.G., 2017, Chimneys in Paleozoic massive sulfide mounds of the Urals VMS deposits: Mineral and trace element comparison with modern black, grey, white and clear smokers: Ore Geology Reviews, v. 85, p. 64–106.
35. O’Brien, J.J., Spry, P.G., Nettleton, D., Xu, R., and Teale, G.S., 2015, Using Random Forests to distinguish gahnite compositions as an exploration guide to Broken Hill-type Pb-Zn-Ag deposits in the Broken Hill domain, Australia: Journal of Geochemical Exploration, v. 149, p. 74–86.
36. Reimann, C., and Filzmoser, P., 2000, Normal and lognormal data distribution in geochemistry: Death of a myth. Consequences for the statistical treatment of geochemical and environmental data: Environmental Geology, v. 39, no. 9, p. 1001–1014.
37. Revan, M.K., Genç, Y., Maslennikov, V.V., Maslennikova, S.P., Large, R.R., and Danyushevsky, L.V., 2014, Mineralogy and trace-element geochemistry of sulfide minerals in hydrothermal chimneys from the Upper Cretaceous VMS deposits of the eastern Pontide orogenic belt (NE Turkey): Ore Geology Reviews, v. 63, p. 129–149.
38. Rodriguez-Galiano, V.F., Chica-Olmo, M., and Chica-Rivas, M., 2014, Predictive modelling of gold potential with the integration of multisource information based on random forest: A case study on the Rodalquilar area, southern Spain: International Journal of Geographical Information Science, v. 28, no. 7, p. 1336–1354.
39. Scott, C., Lyons, T., Bekker, A., Shen, Y., Poulton, S., Chu, X., and Anbar, A., 2008, Tracing the stepwise oxygenation of the Proterozoic ocean: Nature, v. 452, no. 7186, p. 456–459.
40. Tardani, D., Reich, M., Deditius, A.P., Chryssoulis, S., Sánchez-Alfaro, P., Wrage, J., and Roberts, M.P., 2017, Copper-arsenic decoupling in an active geothermal system: A link between pyrite and fluid composition: Geochimica et Cosmochimica Acta, v. 204, p. 179–204.
41. Thomas, H.V., Large, R.R., Bull, S.W., Maslennikov, V., Berry, R.F., Fraser, R., Froud, S., and Moye, R., 2011, Pyrite and pyrrhotite textures and composition in sediments, laminated quartz veins, and reefs at Bendigo gold mine, Australia: Insights for ore genesis: Economic Geology, v. 106, no. 1, p. 1–31.
42. Tribovillard, N., Algeo, T.J., Lyons, T., and Riboulleau, A., 2006, Trace metals as paleoredox and paleoproductivity proxies: An update: Chemical Geology, v. 232, no. 1, p. 12–32.
43. Waske, B., Benediktsson, J.A., Árnason, K., and Sveinsson, J.R., 2009, Mapping of hyperspectral AVIRIS data using machine-learning algorithms: Canadian Journal of Remote Sensing, v. 35, no. 1, p. 106–116.

Daniel Gregory is an assistant professor in economic geology at the University of Toronto, Canada. He worked as an exploration geologist in the Yukon Territory, Canada, before he moved to Australia to complete his Ph.D. degree in economic geology and geochemistry at the Centre for Ore Deposit and Earth Sciences (CODES), Tasmania. Daniel held postdoc positions at CODES and the National Aeronautics and Space Administration (NASA) Astrobiology Institute at the University of California Riverside (UCR) investigating basin-scale whole-rock geochemistry and mineral chemistry using macro- and nanoanalytical techniques. He focuses on in situ trace element analyses to understand the fluids related to ore deposit formation. Dan is testing machine learning techniques to identify ore deposit style and vector toward economic mineralization.

Gold Open Access: This article is published under the terms of the CC-BY 3.0 license.