The Eastern Goldfields of Western Australia is one of the world’s premier gold-producing regions; however, large areas of prospective bedrock are under cover and lack detailed lithologic mapping. Away from the near-mine environment, exploration for new gold prospects requires mapping geology using the limited data available with robust estimates of uncertainty. We used the machine learning algorithm Random Forests (RF) to classify the lithology of an underexplored area adjacent to the historically significant Junction gold mine, using geophysical and remote-sensing data, with no geochemical sampling available at this reconnaissance stage. Using a sparse training sample, 1.6% of the total ground area, we produce a refined lithologic map. The classification is stable, despite including parts of the study area with later intrusions and variable cover depth, and it preserves the stratigraphic units defined in the training data. We assess the uncertainty associated with this new RF classification using information entropy, identifying those areas of the refined map that are most likely to be incorrectly classified. We find that information entropy correlates well with inaccuracy, providing a mechanism for explorers to direct future expenditure toward areas most likely to be incorrectly mapped or geologically complex. We conclude that the method can be an effective additional tool available to geoscientists in a greenfield, orogenic gold setting when confronted with limited data. We determine that the method could be used either to substantially improve an existing map, or produce a new map, taking sparse observations as a starting point. It can be implemented in similar situations (with limited outcrop information and no geochemical data) as an objective, data-driven alternative to conventional interpretation with the additional value of quantifying uncertainty.

With the increasing cost and difficulty of new discovery in areas with substantial amounts of cover, there is a need for improved approaches to mineral exploration. In the Eastern Goldfields of Australia, very detailed geologic, geochemical, and geophysical data sets exist near mines. There is, however, a sharp transition into adjacent greenfield areas where such data are not available and the geology is significantly less well-constrained. Geophysical and remote-sensing data are widely available at a reasonable resolution either in the form of government or multiclient data sets or as a first-pass acquisition performed by explorers when new ground is acquired. Machine learning presents an attractive way forward, facilitating the use of these data to improve a preliminary lithology map or to produce a starting map from limited observations: in each case, improving an explorer’s ability to identify targets. Previous studies, however (e.g., Waske et al., 2009; Cracknell et al., 2014; Harris and Grunsky, 2015), have primarily used a richer and more diverse set of data inputs such as geochemistry or additional spectral information or made use of a different algorithm, such as, for example, support vector machines (SVMs) (Yu et al., 2012) or artificial neural networks (Barnett and Williams, 2009). In this study, we assess the ability of the machine learning algorithm (MLA) Random Forests (RF) to produce a geologic classification using only those geophysical and remote-sensing data that would be available to an explorer in a greenfield, early stage exploration environment.

Geologic setting

The Heron South project area is located approximately 15 km east of the Junction gold mine, in the St. Ives Goldfield of the Yilgarn Craton, Western Australia (Figures 1 and 2). The St. Ives camp is estimated to contain in excess of 300 t of gold, with orogenic and (to a lesser extent) intrusion-related gold deposits hosted throughout the entire local stratigraphy, making it one of Australia’s largest gold-producing districts (Crawford, 2011). The Archean (2.7–2.6 Ga) bedrock stratigraphy comprises a series of mafic-ultramafic volcanic and intrusive units, volcanoclastic sediments, and felsic intrusions crosscut by Proterozoic-age basaltic dikes. The region has undergone pervasive, regional greenschist to lower amphibolite metamorphism. The St. Ives Goldfield is bound to the west and east by the Merougil and Boulder-Lefroy Fault Zones, respectively.

The region was subject to several distinct phases of deformation between 2675 and 2620 Ma. Until recently, the deformational framework for the region had largely focused on compressional events (e.g., Nguyen, 1997; Swager, 1997; Connors et al., 2002). The more recent study by Blewett et al. (2010) includes events related to extension, which are important in understanding the formation of the younger volcano-sedimentary units of the region (Squire et al., 2010). The revised framework proposed by Blewett et al. (2010) is as follows: D1 is characterized by east–northeast/west–southwest extension. The D2 period represents a phase of east–northeast/west–southwest contraction that caused regional north–northwest-trending folds and reactivation of faults produced during D1 as thrusts. This was followed by D3, a period of extension on the same orientation. The D4a period of contraction tightened existing north–northwest folds and was followed by D4b, a period of sinistral transpression. During this deformation event, existing structures, such as the Boulder-Lefroy Fault Zone, which passes through Heron South, were reactivated as sinistral strike-slip faults. Localized deflections, step-overs, and local higher order structures produced during D4b are associated with the main mineralizing event in the region (Cox and Ruming, 2004; Blewett et al., 2010; Miller et al., 2010). The D5 period of dextral transtension produced north–northeast-trending, high-angle strike-slip faults. These structures may also be associated with a gold mineralizing event at various sites in the region (Connors et al., 2002; Ruming, 2006; Blewett et al., 2010; Miller et al., 2010). The D6 period is not documented in the St. Ives Goldfield. The D7 period of contraction is associated with the emplacement of dominantly east–northeast-trending Proterozoic dikes, which occur in abundance in the study area.
The Heron South project area is proximal to the Boulder-Lefroy Fault Zone, which passes through the southwest of the project in a north–northwest orientation.

The geology of the study area is split by the Boulder Lefroy Fault Zone into a western and an eastern region. The western area forms part of the main St. Ives sequences and contains thick successions of Paringa Basalt and Black Flag Group volcano-sedimentary sequences. The eastern area contains north−south-striking, steeply dipping packages of mafic-ultramafic and sedimentary units impinged between larger granitoid bodies (Figure 2). It is anticipated that these units are correlates of the main stratigraphic sequence mapped at St. Ives, however, this has not yet been confirmed. For the purpose of this study, these units have been defined by the interpreted geologic map of the St. Ives Goldfield (Figure 3), as stratigraphically distinct. Stratigraphic labels can be assigned to these units as geochemical and geochronological information becomes available allowing these units to be amalgamated or subdivided as required at a future date.

RF

RF (Breiman, 2001, pp. 5–32) is a supervised ensemble classification algorithm and an extension of the decision tree method. This classifier constructs a “forest” comprising many decision trees (Figure 4), allowing for superior performance and lower sensitivity to over-fitting compared with single classifiers (Hastie et al., 2009, pp. 587–604). Randomness is introduced at two stages during implementation of the algorithm. First, a process of bootstrap aggregation, known as bagging (Breiman, 1996, pp. 123–124), is used to modulate the training data (Ta) available to each decision tree. Bagging obtains for each tree, via random sampling with replacement, a subset of Ta equal in size to Ta. This duplicates some samples and omits others. On average, approximately 63.2% of instances are included in each training subset, whereas the remaining, or “out-of-bag,” samples (approximately 36.8%) are used for validation. The second form of randomization involves the selection of variables available to the classifier to split each node. At each node, a random subset of input variables is selected from all available input variables. The number of variables in this subset is predefined and consistent across the forest. At every node, the randomly selected variables are then ranked by their ability to produce a split threshold that maximizes the homogeneity of child nodes (Figure 4) relative to the parent node. The decrease in the Gini index (equation 1), as implemented by Breiman et al. (1984), provides this measure. The Gini index is an expression of information purity given by
$$\mathrm{Gini}(t) = \sum_{c=1}^{j} g_c\,(1 - g_c), \tag{1}$$
where gc is an expression of the relative frequency of each class c, of a set comprising j classes, at a given node t; gc is given by
$$g_c = \frac{n_c}{n}, \tag{2}$$
where nc is the number of samples comprising class c at a given node and n is the total number of samples comprising that node. Using this measure, the variable that produces the greatest improvement in homogeneity in child nodes relative to the parent node is used to split the node at the threshold that produced the best split. This is repeated at every node until sufficient depth is reached to produce nodes with complete homogeneity (or approached to within a defined tolerance). The class assigned by RF to each sample is determined by a majority vote compiled from the output of all classification trees (Breiman, 2001, p. 6).
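The split criterion in equations 1 and 2 can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names `gini`, `gini_decrease`, and `best_threshold` are hypothetical, the child nodes are weighted by their share of the parent samples (an assumption in line with common practice), and candidate thresholds are taken as midpoints between sorted unique values of a single variable.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: sum over classes of g_c * (1 - g_c), g_c = n_c / n."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((k / n) * (1 - k / n) for k in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Decrease in Gini index from the parent node to its size-weighted children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


def best_threshold(values, labels):
    """Scan midpoints between unique values of one variable.

    Returns (threshold, decrease) for the split with the largest Gini decrease,
    or (None, 0.0) if no split improves on the parent node.
    """
    pairs = list(zip(values, labels))
    uniq = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(uniq, uniq[1:]):
        thr = (lo + hi) / 2
        left = [c for v, c in pairs if v <= thr]
        right = [c for v, c in pairs if v > thr]
        dec = gini_decrease(labels, left, right)
        if dec > best[1]:
            best = (thr, dec)
    return best
```

In an RF, this search would be repeated at each node over the randomly selected subset of variables, and the variable with the largest decrease would be used to split the node.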

Many studies have noted a point of diminishing returns: a forest must be grown until a stable error minimum is approached, beyond which additional trees are redundant (e.g., Waske et al., 2009; Cracknell et al., 2014; Rodriguez-Galiano et al., 2014; Harris and Grunsky, 2015). RF has been shown to achieve accuracy equal to or better than that of other classification algorithms, with the advantage that parameter selection is relatively straightforward (e.g., Hastie et al., 2009; Cracknell and Reading, 2013). The process of RF training can be performed on any PC with specifications readily commercially available at the time of this study and does not require specialized equipment. In this study, combined training and cross validation of an RF for any given set of parameters required between 15 and 40 s on a Dell Precision T7610 with an Intel Xeon e2630 processor and 32 GB RAM. This is ideal for uptake by geoscientists because requirements for specialized computing skills and equipment are minimal.

RF has been increasingly applied to the problem of lithologic classification. Waske et al. (2009) compare RF and another popular MLA, SVM (Vapnik, 1995, 1998), in the context of mapping lithology using hyperspectral imagery. They conclude that RF and SVM achieve significantly more accurate results than standard classifiers. Although in that instance, SVM marginally outperformed RF, it is noted by the authors that RF remains an attractive option due to its high accuracy and ease of use. Cracknell and Reading (2014) compare RF with four other MLAs: SVM, Naïve Bayes, k-nearest neighbors, and artificial neural networks; as applied to lithologic mapping. In their study, RF marginally outperformed other MLAs. Although there were only small differences in accuracy, Cracknell and Reading (2014) demonstrate that RF was able to produce accurate results with simpler input parameters and at less computational cost than other algorithms evaluated. Another study by Cracknell and Reading (2013) assesses RF and SVM for lithology mapping and identification of lithologic contacts and zones of structural complexity. They discover that RF, in addition to an excellent overall performance, produced more usable outputs. Unlike for SVMs, high uncertainty was spatially associated with incorrect classification and proximal to geologic boundaries and zones of high structural complexity. Cracknell and Reading (2014) note that with increasingly spatially dispersed training data, the comparative performance of RF improved further, widening the gap over other MLAs.

Cracknell and Reading (2014) demonstrate that RF was able to identify and redefine incorrectly mapped features in western Tasmania using 2% of the surface area as training samples. Harris and Grunsky (2015) use a similar approach, applying RF to geologic mapping in northern Canada. They test two Ta selection scenarios: one based on lake sediment geochemical sample locations and another based on field-mapping observations. Both approaches produced meaningful results with the authors concluding that RF is of value as a first-pass mapping tool or as a means of focusing effort into areas where there is a mismatch between predicted geology and legacy maps.

Information entropy

There has been increasing effort in the field of mineral exploration to quantify the uncertainty associated with mapping and prediction. One such method, information entropy (H) (Shannon, 1948), is defined as
$$H = -k \sum_{i=1}^{n} p_i \log p_i, \tag{3}$$
where pi is the class membership probability at location i, n is the number of candidate classes, and k is an arbitrary positive constant. The k and the base of the logarithm can be selected by the user to define the scale. Information entropy has been used to great effect in a “per-voxel” setting to demonstrate how uncertainty is distributed spatially (Wellmann and Regenauer-Lieb, 2012).

In the process of producing a final classification, RF calculates the class-membership probabilities. These are defined as the proportion of trees in a RF that voted for a given candidate class (Hastie et al., 2009). RF class-membership probabilities can be used in equation 3 to calculate H for each classified instance. The properties of H for a two-class, binary system are such that a value of 0 corresponds to a 100% probability of one class occurring and a value of 1 corresponds to an equal probability of both represented classes being present. Information entropy in its general form preserves monotonicity such that an increase in the number of candidate classes results in a higher H. For the purpose of this study, a normalized version of H has also been used to account for the number of candidate classes by dividing H by the logarithm of the number of classes present, such that H assigned to each pixel represents, on a scale of 0–1, the range of minimum to maximum possible H for that pixel. As such, all pixels are comparable with regard to how close they each internally approach their minimum or maximum possible H. For example, for a pixel with two possible and equally probable classes and a pixel with four possible and equally probable classes, they shall both be described as H being equal to 1.
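The behavior of H and its normalized form described above can be checked with a minimal sketch. The function names `entropy` and `normalized_entropy` are hypothetical; a base-2 logarithm and k = 1 are assumed so that a two-class system with equal probabilities gives H = 1, and classes with zero probability are treated as not present.

```python
import math


def entropy(probs, base=2.0, k=1.0):
    """Information entropy H = -k * sum(p_i * log(p_i)).

    Classes with zero probability contribute nothing to the sum.
    """
    return -k * sum(p * math.log(p, base) for p in probs if p > 0)


def normalized_entropy(probs, base=2.0):
    """H divided by the logarithm of the number of classes present (p > 0).

    The result lies between 0 and 1; a single present class gives 0.
    """
    m = sum(1 for p in probs if p > 0)
    if m <= 1:
        return 0.0
    return entropy(probs, base) / math.log(m, base)
```

Under these assumptions, a pixel with two equally probable classes and a pixel with four equally probable classes have H of 1 and 2, respectively, and both have a normalized H of 1.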

Data

In this study, 16 geophysical and remote-sensing data sets were used (Figure 5), each interpolated at a grid cell size of 20%–25% of its acquisition line spacing (Table 1). Landsat Thematic Mapper (Landsat Program, 2003) and Shuttle Radar Topography Mission (SRTM) products (United States Geological Survey, 2003) were procured in raster format, and their original point separation specifications were preserved (United States Geological Survey [2003] and United States Geological Survey [2006], respectively). Each data set was resampled to a 30 m grid to populate a matrix where each line takes the form: x, y, p1, p2, …, pn, where x and y are the spatial coordinates and p are the various measured properties at each pixel. At the extent of the study area, this comprised approximately 56,000 samples. The compiled data were split into subsets comprising training (Ta) and test (Tb) data through a process of stratified spatially random sampling. One hundred samples were taken from each of the eight lithologic classes comprising the study area. These 800 samples comprising Ta represent approximately 1.6% of the total data set (Figure 6). The remaining 98.4% of data Tb were not shown to the classifier during the training process.
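The stratified split into Ta and Tb can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used here: the function name `stratified_split` and the fixed seed are hypothetical, and pixels are drawn uniformly at random within each class without any further spatial constraint.

```python
import random
from collections import defaultdict


def stratified_split(labels, per_class=100, seed=0):
    """Draw `per_class` random pixel indices from each class as training data.

    Returns (train_idx, test_idx); every index not drawn goes to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    train = []
    for c in sorted(by_class):
        idx = by_class[c]
        # Classes with fewer than `per_class` pixels contribute all of them.
        train.extend(rng.sample(idx, min(per_class, len(idx))))
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test
```

With eight classes and 100 samples per class, the training set contains 800 indices regardless of how unevenly the classes are distributed across the grid, which prevents large units from dominating the training data.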

Variable ranking and selection

RF facilitates several means of ranking the importance of input variables. In this instance, each variable was permuted and the effect on the out-of-bag classification accuracy was measured. Those variables which, when permuted, produced the greatest change to classification accuracy were ranked highest (Table 2). Given the relatively small number of data sets used in this study, none of the starting input variables were sufficiently well-correlated with one another (as defined by a threshold Pearson’s correlation coefficient of 0.85) to warrant removal for duplication of information prior to ranking. To optimize the speed and interpretability of results, redundant variables were screened at this stage. Using Ta, variables were successively added to the classification according to their ranked importance established in the prior step. Accuracy was assessed using a forest comprising 500 classification trees, via 10-fold cross validation (Table 2). The cross-validation accuracy improved with the input of additional variables, albeit at a diminishing rate, until a peak cross-validation accuracy of 79% was achieved via the inclusion of variables ranked one to eight (Table 2). Beyond this point, no increase in cross-validation accuracy was observed through the inclusion of additional variables; as such, the Landsat data, ranked 9th to 15th, were omitted. This is logical given the sensitivity of reflectance methods to the immediate surface in an area heavily influenced by transported cover. Easting and northing were omitted at this stage to avoid overfitting the classification to position.
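The permutation-based ranking can be illustrated with a minimal, classifier-agnostic sketch. It differs from the procedure above in one respect that should be noted: accuracy is measured on whatever labeled sample is supplied rather than on the out-of-bag samples of each tree. The names `accuracy` and `permutation_importance` are hypothetical, `predict` stands for any trained classifier exposed as a function of one row, and each row is assumed to be a list of variable values.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted class matches the label."""
    hits = sum(1 for r, y in zip(rows, labels) if predict(r) == y)
    return hits / len(labels)


def permutation_importance(predict, rows, labels, seed=0):
    """Drop in accuracy when each variable (column) is shuffled in turn.

    A larger drop means the classifier relies more heavily on that variable.
    """
    rng = random.Random(seed)
    base = accuracy(predict, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        rng.shuffle(col)
        # Rebuild the rows with only column j permuted.
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, col)]
        drops.append(base - accuracy(predict, permuted, labels))
    return drops
```

Sorting the variables by the returned drop in accuracy gives a ranking of the kind used for the successive addition of variables.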

Classification and uncertainty

Eight hundred samples comprising 100 from each of the lithologic units defined above (Figure 6) were used to train an RF classifier. Each sample was attributed with the eight nonredundant variables identified during variable ranking. We used an RF comprising 500 trees with no limits on individual tree depth or subsequent pruning. The RF produced under these parameters required 12 s to train. Subsequently, the remaining data comprising Tb, which do not have an associated class, were shown to the trained classifier and a class prediction for each was made. Class-membership probabilities, describing the proportion of trees voting for each class, were retained for the calculation and assessment of H.
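The relationship between tree votes, the predicted class, and the class-membership probabilities can be sketched as follows. The function names `class_membership` and `majority_class` are hypothetical, and the votes are assumed to be supplied as one class label per tree.

```python
from collections import Counter


def class_membership(votes, classes):
    """Proportion of trees in the forest voting for each candidate class."""
    counts = Counter(votes)
    return {c: counts.get(c, 0) / len(votes) for c in classes}


def majority_class(votes):
    """Class receiving the most votes (ties go to the class encountered first)."""
    return Counter(votes).most_common(1)[0][0]
```

For example, if 300 of 500 trees vote for G, 150 for D2, and 50 for B, the pixel is assigned to G with class-membership probabilities of 0.6, 0.3, and 0.1, and any class receiving no votes has a probability of 0.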

RFs produced a new version of the geologic map (Figure 7), correctly predicting mapped geology in 76.8% of Tb instances. The remainder of the samples can be categorized either as incorrect predictions or as showing new information not previously mapped, or incorrectly mapped, in the starting product. When plotted, class probabilities produced by RF (e.g., Figure 7) show the spatial distribution of lithology-dependent class-membership probabilities. Areas where one class has a very high probability and other classes are unlikely to be present, such as the central zone of D2 (Figure 7), are apparent. There are, however, regions where multiple classes compete, such that the class that is ultimately predicted displays only a marginally higher probability than its competitors (e.g., Figure 7).

The confusion matrix in Table 3 indicates, on a per-class basis, the distribution of correct and incorrect classification percentages with respect to all other classes. Several classes, namely, the basaltic and granitic units, have been predicted with a high degree of accuracy. One of the doleritic units (D2) is commonly classified by RF as basalt or high-MgO basalt. This suggests that either the classification was incorrect in this instance or, alternatively, areas mapped as dolerite are in fact basalt. There is spatial control on classification accuracy with misclassification more likely when units with similar petrophysical properties occur adjacent to one another. The overlapping petrophysical signals of these classes, particularly in the case of potential field data due to smooth transitions as opposed to sharp boundaries, may be contributing to a reduced ability to make accurate predictions. This is particularly notable where these classes occupy the same areas of the map suggesting that the similarity of properties and spatial proximity are factors.
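A per-class confusion matrix of this kind can be computed with a short sketch. The layout is an assumption (rows as the mapped class, columns as the RF-predicted class, each row expressed as percentages of that mapped class), and the function name `confusion_matrix` is hypothetical.

```python
def confusion_matrix(mapped, predicted, classes):
    """Row-normalized confusion matrix in percent.

    Rows follow the mapped (reference) class and columns the predicted class,
    both in the order given by `classes`.
    """
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for m, p in zip(mapped, predicted):
        counts[index[m]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A class with no mapped samples yields a row of zeros.
        matrix.append([100.0 * v / total if total else 0.0 for v in row])
    return matrix
```

The diagonal of the resulting matrix gives the per-class agreement with the starting map, and the off-diagonal entries show which classes each unit is most often confused with.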

The spatial distribution of H (calculated using equation 3) shows very few examples in which a candidate class has a 0 probability of occurrence in a given pixel. By definition, this means that nearly every class must be included as a term in the calculation of H, which obscures the monotonic increase in H that additional possible classes impose. As such, a threshold probability of 2% was selected, below which a class can be considered, for this purpose, to be not present in that pixel. The calculation of H with this parameter imposed was used to produce a map of the spatial distribution of H (Figure 8). Areas in the central north and southwest of the project display the highest H, indicating that these areas are characterized by a high level of uncertainty across multiple classes that display a relatively high probability of being predicted. Conversely, areas in the east and west of the project extent that are classified as granite coincide with low H, indicating that RF classifications there can be treated with a high degree of confidence because no other classes have a high probability of being present. When normalized for the number of possible classes, H represents the relative minimum to maximum possible H on a per-pixel basis (Figure 8). There is a direct relationship between normalized H and the observed discrepancies between the interpretation map and that produced by RF. This correlation can be qualitatively observed in a visual comparison of the panels of Figure 8, and it was confirmed quantitatively by Kuhn et al. (2016), who demonstrate statistically distinct populations of H corresponding to correctly and incorrectly classified sample groups. Both H and normalized H can potentially form the basis of the assessment of the quality of RF predictions in the absence of a starting map with which to compare.
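The effect of the 2% threshold can be illustrated with a minimal sketch. Two choices here are assumptions that the text does not specify: the probabilities of the retained classes are rescaled to sum to 1 after the low-probability classes are dropped, and a base-2 logarithm is used. The function name `thresholded_entropy` is hypothetical, and at least one class is assumed to reach the threshold, which always holds for eight candidate classes.

```python
import math


def thresholded_entropy(probs, threshold=0.02, base=2.0):
    """H and normalized H after dropping classes below `threshold`.

    Returns (H, normalized H, number of classes retained).
    """
    kept = [p for p in probs if p >= threshold]
    total = sum(kept)
    # Assumption: rescale the retained probabilities so that they sum to 1.
    kept = [p / total for p in kept]
    h = -sum(p * math.log(p, base) for p in kept)
    m = len(kept)
    h_norm = h / math.log(m, base) if m > 1 else 0.0
    return h, h_norm, m
```

Under these assumptions, a pixel with probabilities of 0.49, 0.49, 0.01, and 0.01 is treated as a two-class pixel with H = 1 and normalized H = 1, rather than as a four-class pixel.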

In the absence of the information that indicates orogenic gold mineralization directly, the ability to map and interpret geology accurately is a key feature in target identification and the establishment of priority areas for exploration. We have demonstrated in this study that RF was able to classify lithology with an accuracy of approximately 76% relative to an existing interpreted geologic map using only 2% of the available data as training samples. These results are comparable with those achieved by Cracknell et al. (2014), who use a similar approach, achieving 78% accuracy, and they compare favorably with similar implementations using SVM, such as by Yu et al. (2012), who achieve a consistency with the geology map of between 62.2% with a modal convolution filter applied. It is important, however, to note that different data and geologic conditions were encountered in each case. Nevertheless, the results of this study compare positively with similar applications in different settings.

Looking beyond bulk similarities, there is a wide range in performance with regard to predictive power of the RF as applied to individual classes. As shown in Table 3, the VS and D1 classes produced accuracies with respect to the starting geologic map, in the order of 59%, whereas the PB class exceeded 98%. It is likely that this excellent result is due to the spatially discrete and small area defined by the PB class, resulting in a very well-constrained class signature. The poor performance of the VS class is likely due to a highly variable class signature, the result of a wide range of sample locations and, potentially, misidentification in the original map. The D1 class was commonly confused with B (16.5% of instances) and HMgOB (14.9%), which is logical, given the compositional similarity of these mafic units. The D2 class, however, while quite accurately captured at 81.8% was confused most commonly with the THK class at a rate of 10.5% indicating the possibility of unmapped ultramafic material interspersed in the region mapped as D1, or conversely, doleritic intrusions in the THK. Alternatively, this could indicate erroneous mapping of these units in the original geologic interpretation map. The G class was most often confused with the D2 class. This is explained by erroneous mapping in the starting map being repartitioned into the D2 class, which RF extends further to the west, supported by the expression of H in that region (Figure 8).

An important component of these results was the observation that RF was able to preserve class labels defined from stratigraphic relationships and distinguishes between equivalent lithologies. In this case, the stratigraphic sequence is not well-characterized and geochemical data were not available to resolve this distinction. Geologic interpretations indicate that multiple dolerites and basalts are present in this region. The contrast between greenstone, felsic to intermediate intrusive bodies, and sedimentary packages is well-expressed in the gravity and magnetic data sets facilitating mapping using these variables via machine learning. It is, however, difficult to distinguish between units of similar composition using these data sets alone. Nevertheless, RF is able to capture this distinction, to the extent that it was present in the training data, and produce a map retaining stratigraphy and not simply amalgamating by rock type. Results produced by RF do not indicate a large-scale revision to the mapping or understanding of the structure in the area. Updates to lithologic boundaries could form the basis of an adjustment to the position of faults subparallel to stratigraphy and those that offset stratigraphy. Knowledge of the position within the stratigraphic column is important in an exploration context given that several models for the stratigraphic position of favorable host units, relative to the timing of gold deposition, have been identified. Again, this is contingent on the congruency of the sampled region. We suggest that when using geophysical data, the accuracy of RF lithologic predictions cannot be assumed to apply to adjacent terrains. Potential field data in particular are influenced by effects such as cover depth, or the response of deeper sources can produce a shift in absolute signal amplitude, not related to geology as mapped at the surface. 
As such, the rules defined by RF are only reliably applicable to the domain from which they were derived. Radiometric data are indicative of surficial features and may be mirrored in adjacent or distant domains; however, it is also likely that these data are influenced by weathering and vegetation that differ from those of the study area. In any event, it is not anticipated that radiometric data alone would be sufficient to propagate mapping to greater distances beyond the sampled region. Our approach is designed as a pragmatic workflow; however, further insights might be gained by more geostatistical- or computer-science-oriented practitioners (e.g., Grunsky and Kjaarsgaard, 2016).

It is important to note that regardless of the physical response, elevation, depth to source, or height of the sensor of a method, RF will preferentially use whichever variables allow the algorithm to most accurately solve the given problem, in this case, lithology. The data sets which are ranked highest and the associated frequency response are entirely determined by those that allow RF to discriminate between the lithologies.

The topographic (SRTM) data set ranked highly among the available input data. Given the contiguity and dominant strike of the geology relative to topography, it is possible that topography is, in fact, serving as a proxy for lithologic position in the landscape. It is also probable that rock composition is one of the controlling factors in preferential weathering and hence topography, although this relationship is not always obvious in the region.

The Bouguer anomaly and reduced-to-pole (RTP) total magnetic intensity data sets were both ranked as more important to the classification than their first vertical derivatives. The most plausible interpretation of this result is that the potential field data are more closely related to rock composition at the scale of this study. The respective derivatives may define detailed features of the units that could reflect structural or compositional variability. This information is of immense value in accurately mapping and interpreting the regional and within-unit structural complexity of the area, but it does not necessitate a change to the lithologic class at any given location. Should the mapping area be expanded, the effects of regional trends would become more significant; derivatives, as a form of high-pass filter, would then be required to mitigate the influence of these trends and would likely be ranked of higher importance. It is possible that the introduction of additional textural data derived from those data sets could have improved results. It is worth noting, however, that the ease of use of the method by geoscientists is of key importance, and as such we consider this a good demonstration of the method using readily available data sets, accessible to most projects without additional prerequisite knowledge of geographic information systems (GIS) operations.

The value H provided an indication of those areas where an operator can be confident of accurate mapping and those areas where they are more likely to be incorrect. Consistent with prior research (Cracknell and Reading, 2013; Cracknell, 2014), high uncertainty was generally observed in proximity to lithologic boundaries and areas of geologic complexity. Kuhn et al. (2016) demonstrate statistically, through examination of the distribution of normalized H of correctly and incorrectly classified (Figure 8) samples that H provides a good, albeit imperfect, proxy for inaccuracy. As such, H is a valuable tool when mapping in unknown areas and where validation against a known result is not possible. Performing any exploration activity requiring fiscal expenditure through a decision unknowingly underpinned by a type II statistical error in classification has a greater consequence than performing additional study on an area that in fact was mapped correctly. Displaying H highlights areas that require additional data collection, such that geoscientists can further validate these areas to within the scope of reasonable due diligence prior to additional expenditure. Conversely, areas producing low H do not require the same level of attention, and, as such, effort need not be expended here and can be diverted to those areas of higher uncertainty. We believe that H is therefore a valuable mechanism for quantifying uncertainty given that in addition to a normalized product, the purest form of H preserves monotonicity and provides a measure of the absolute uncertainty present throughout the classification.

The presence of highly magnetic Proterozoic dikes often confounds the ability to interpret Archean stratigraphy. A manual interpreter may opt to attempt to see past these features in a somewhat subjective manner. The presence of dikes does, however, prohibit the use of absolute levels in the classification of individual data sets, such as aeromagnetic imagery: when analyzing only that property, dikes are indistinguishable from other mafic units on a pixel-by-pixel basis. In this instance, our randomly selected training data included several samples of various rock units in the locations where they were intruded by Proterozoic dikes. Because this interaction was represented in the training data, RF was able to consistently map the underlying geologic class and was largely immune to the presence of these features. Looking at H, we can see that uncertainty does consistently increase by up to approximately 20% (Figure 8) in areas where dikes intrude other lithologies; however, the correct decision has still been obtained.

We assume that a classification produced by RF is incorrect whenever it does not conform to the geologic map. An interpreted geologic map, however, is a constantly evolving product. Its accuracy and level of detail improve as data of higher resolution and accuracy and better interpretation techniques become available. In a greenfield setting, where a geologic map is based on limited outcrop and the interpretation of potential field data sets, it is entirely plausible that the map contains errors and/or oversimplifications.

When RF produces a result that differs from the geologic map from which the training data are sourced, H provides a means to assess whether the RF output or the reference information is more likely to be incorrect. In this instance (Figure 8), we can see that the western boundary of the greenstone package is moved to the west relative to its position in the interpreted map. Low H at the original boundary suggests that RF predicted with high certainty that this was in fact an area of greenstone. In addition, H increases toward the predicted contact, suggesting greater uncertainty as the transition between rock types is approached and the potential field signals “smear” (e.g., gravity decreasing toward the granitoid body). The relationship observed between the RF uncertainty and the distance to geologic boundaries is consistent with prior observations (e.g., Cracknell et al., 2014). A high H value is also observed in the southeast region of the study area. It is not possible to determine whether the interpreted map is incorrect there; however, the RF classifications and high H suggest that this rock unit is significantly more complex than is shown. This is a clear example of the benefit of analyzing the RF classification in conjunction with its uncertainty, and it may serve to optimize ongoing field efforts, whether outcrop mapping or drilling as appropriate.

This study demonstrates that RF may be applied to reconnaissance-type geophysical data, in the absence of geochemistry, to produce sound lithologic predictions. There are two obvious applications of RF in early-stage geologic mapping. The first is the refinement of an existing geologic map. The second is the production of a first-pass geologic map from a limited number of observations. Sparse outcrop or a broad drilling campaign could provide such starting observations, provided that their spatial distribution adequately samples the project area.

In this demonstration study, RF was able to preserve class labels, i.e., the stratigraphic context, where more than one class comprised the same lithology. This is an important outcome because the timing relationships between mineralization and the various stratigraphic units are vital information for mineral prospecting. Proterozoic dikes, which are petrophysically indistinguishable from Archean mafic rocks in the study area, confuse aeromagnetic interpretation. RF, using a higher dimensional data space, can deal with this complication, provided that examples of the dikes overprinting the older stratigraphy are sampled in the training data.

Information entropy (H) provides valuable insight into the classification results. The highest H denotes areas of geologic/geometric complexity and proximity to lithologic boundaries. Where a predicted lithologic boundary differs significantly from the reference map, the behavior of H proximal to the interpreted and predicted boundaries indicates which position is more probable. Statistically distinct populations in H correlate with correctly and incorrectly classified samples. Understanding the distribution of H for these two populations allows a user to define an acceptable trade-off between discarding the maximal number of incorrectly classified samples and retaining a more complete, albeit potentially less accurate, map. The chosen trade-off will reflect the tolerance for risk of each individual explorer or company. The combination of RF classification and uncertainty appraisal allows explorers to critique the validity of map outputs quantitatively, a quality control measure not available in conventional mapping.
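The trade-off described above can be explored with a simple threshold on H. The Python sketch below is illustrative only: it assumes a set of validation samples for which both the normalized H and the correctness of the classification are known, and it reports, for each candidate threshold, the fraction of correct samples retained and the fraction of incorrect samples discarded. The function name, the thresholds, and the toy values are hypothetical and are not results from this study.

```python
def tradeoff_table(h_values, is_correct, thresholds):
    """Summarize the effect of discarding samples whose H exceeds a threshold.

    h_values   : normalized entropy of each validation sample
    is_correct : True where the predicted class matches the reference class
    thresholds : candidate cut-offs; samples with H <= cut-off are retained
    """
    pairs = list(zip(h_values, is_correct))
    n_correct = sum(1 for _, ok in pairs if ok)
    n_incorrect = len(pairs) - n_correct
    if n_correct == 0 or n_incorrect == 0:
        raise ValueError("need both correct and incorrect validation samples")

    rows = []
    for t in thresholds:
        kept_correct = sum(1 for h, ok in pairs if ok and h <= t)
        dropped_incorrect = sum(1 for h, ok in pairs if not ok and h > t)
        rows.append({
            "threshold": t,
            "correct_retained": kept_correct / n_correct,
            "incorrect_discarded": dropped_incorrect / n_incorrect,
        })
    return rows


# Made-up validation samples, for illustration only.
toy_h = [0.05, 0.10, 0.20, 0.35, 0.50, 0.60, 0.75, 0.90]
toy_correct = [True, True, True, True, False, True, False, False]

for row in tradeoff_table(toy_h, toy_correct, thresholds=[0.4, 0.7]):
    print(row)
```

A lower threshold yields a more accurate but less complete map, and a higher threshold the reverse; where to set it depends on the risk tolerance of the user.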

We would like to thank Gold Fields Ltd. for access to data for the purpose of this study. S. Kuhn is supported by an Australian Postgraduate Award Scholarship from the University of Tasmania. This research was conducted in collaboration with the ARC Industrial Transformation Research Hub for Transforming the Mining Value Chain (project number IH130200004) at the Centre of Excellence in Ore Deposits, University of Tasmania. The views expressed herein are those of the authors and are not necessarily those of the Australian Research Council. We used the Orange software package (Demsar et al., 2013) for RF classification. Preprocessing, interpolation, and plotting were performed using Geosoft Oasis Montaj and ESRI ArcGIS. D. Doutch is thanked for his input on the geologic and structural setting of the project. We thank the assistant editor, associate editor, and three reviewers for their suggestions, which have significantly improved the manuscript.

Freely available online through the SEG open-access option.