Zircon is a common mineral in igneous rocks, which is resistant to both chemical weathering and physical abrasion. Its chemistry can potentially be used to distinguish ore-forming porphyry magmas from barren magma systems. This study compiles >23,000 zircon analyses from >30 porphyry deposits, barren intrusions, and rivers to determine the principal geochemical characteristics of fertile zircons using predictive modeling, and compares them with traditional geochemical thresholds. The results show that the Eu/Eu* and Dy/Yb ratios, P content, and the curvature at the end of rare earth element (REE) patterns (λ3) are the most diagnostic characteristics of fertile zircons. The use of geochemical thresholds, as Boolean conditions, reach their maximum performance for Eu/Eu* and Dy/Yb (sensitivity [sens] = 0.73, specificity [spec] = 0.90), but it is outperformed by the random forest model (sens = 0.91, spec = 0.93) in the testing set. Explanatory analysis of the models shows that the fertility signal in zircons becomes stronger as the porphyry system evolves and is accompanied by an overall decrease in the middle to light REE and P content, characteristics that are absent in barren zircons. We attribute the observed difference in λ3 to the cocrystallization of other accessory phases, suggesting that the changes in the zircon Ce anomaly is controlled by the depletion of light and middle REE. The low P content in fertile zircons is caused by extensive crystallization of apatite. Fertile zircons have an excess of (REE + Y)3+, which we attribute to charge-balance by H+ in hydrous magmas. Simple machine learning algorithms outperform the traditional geochemical discriminators in their predictions and provide insights into characteristics that have not previously been considered for evaluating porphyry copper fertility using zircon geochemistry. We propose simplified methods that can be easily incorporated into exploration workflows.

Zircon (ZrSiO4) is a widespread accessory mineral found in a range of geologic environments, including sedimentary, igneous, and metamorphic rocks (Finch and Hanchar, 2003). It incorporates diagnostic elements and isotopes into its structure, which—in addition to its remarkable resilience to hydrothermal alteration, chemical weathering, and physical abrasion—makes zircon the mineral of choice for a range of geochemistry and geochronology applications (Finch and Hanchar, 2003, and references therein). For example, detrital zircons have been used to date the oldest known fragments of Earth (Froude et al., 1983; Compston and Pidgeon, 1986) and to identify periods of major mountain building throughout Earth’s history (e.g., Zhu et al., 2022).

Zircon is present as an accessory mineral in most silicate rocks from intermediate to felsic composition (Lu et al., 2016, and references therein) and, in igneous rocks, it contains a significant fraction of the U, Th, Hf, and rare earth elements (REEs). These elements, and in some cases their isotopes, provide insights into petrogenetic processes (Hoskin and Schaltegger, 2003, and references therein). In addition, the incompatibility of Pb in the zircon lattice, and the relative high abundance of U and Th, make zircon a perfect mineral for U-Th-Pb geochronology (Hoskin and Schaltegger, 2003; Ireland and Williams, 2003). The concentrations of Ti, Ce, and U can also be used to calculate intensive parameters such as the crystallization temperature from the Ti-in-zircon geothermometer (Watson et al., 2006; Ferry and Watson, 2007) and oxygen fugacity (Smythe and Brenan, 2016; Loucks et al., 2020).

Several studies have attempted to use the chemistry of zircon to distinguish magmas associated with porphyry copper deposits from those that are not (Ballard et al., 2002; Dilles et al., 2015; Lu et al., 2016; Loader et al., 2017; Pizarro et al., 2020; Lee et al., 2021; Leslie et al., 2021). The main discriminators that have been used are Eu/Eu*, Ce4+/Ce3+ (or CeIV/CeIII), and Ce/Ce*, which are interpreted to indicate the oxidation state of mid-crustal magmas (Ballard et al., 2002), which in turn controls the sulfur speciation in the melt (Jugo et al., 2010; Richards, 2015). The stable form of sulfur in an oxidized magma is S6+, which is more soluble than S2– in silicate melts (Jugo et al., 2010). The higher solubility of S in oxidized magmas delays or prevents the precipitation of immiscible sulfide melts, which sequester the chalcophile elements, including Cu and Au. In the absence of sulfide saturation these elements are incompatible and concentrate in the melt during fractional crystallization (Jugo et al., 2010; Yang, 2012). However, recent studies on the timing of sulfide saturation have shown that most magmas from porphyry copper deposits became sulfide saturated before the ore formation—including the two mega-deposits of El Teniente and Rio Blanco porphyry copper deposits—which suggests that sulfide saturation is a second-order control for ore formation, provided the amount of sulfide is not excessive (e.g., Lowczak et al., 2018; Cajal Contreras, 2023).

Europium anomalies are traditionally calculated as the EuN/Eu*, where Eu* = (SmN × GdN)0.5, with N denoting chondrite-normalized values. Values of Eu/Eu* over 0.4 in zircon have been used to identify grains formed in oxidized magmas and have been generally accepted as a discriminator for magma fertility (Ballard et al., 2002; Burnham and Berry, 2012; Trail et al., 2012; Dilles et al., 2015; Loader et al., 2017; Pizarro et al., 2020). Other studies suggest limits of >0.3 to >0.35 for Eu/Eu* (Lu et al., 2016; Leslie et al., 2021).

The magnitude of the Ce anomaly (as Ce4+/Ce3+ or Ce/Ce*), which could be used to discriminate fertility in porphyry systems, is still debated. Previous studies have suggested that zircons associated with porphyry copper deposits have a Ce4+/Ce3+ >300 (Ballard et al., 2002) or >100 (Liang et al., 2006). However, the main disadvantage of this method is that it requires knowledge of the composition of the parental magma from which the zircon crystallized, which limits its use to cases where the whole-rock or melt compositions are known. Consequently, Ce4+/Ce3+ cannot be calculated for detrital zircons. Another problem with the Ce4+/Ce3+ method is that it assumes that Ce concentrations in whole-rock analyses are representative of their concentrations in the melt (Dilles et al., 2015), and it does not consider the effect that cocrystallizing phases can have on the zircon REE content (e.g., titanite; Loader et al., 2017). Finally, it has been shown that the Ce4+/Ce3+ ratios are not only sensitive to redox changes but also to fractionation and cooling (Loucks et al., 2018; Loader et al., 2022), which can lead to misleading interpretations.

Ce* is traditionally calculated relative to the geometric mean of the adjacent REEs, that is Ce* = (LaN × PrN)0.5. The use of Ce anomaly sensu stricto has generally been avoided because of the analytical challenge of accurately measuring La and Pr in zircon crystals, which are often present at concentrations below the analytical detection limits. Furthermore, La and Pr are also susceptible to contamination by light REE (LREE)-rich inclusions in zircon, such as apatite or monazite (Dilles et al., 2015; Loader et al., 2017; Zou et al., 2019; Zhong et al., 2019; Burnham, 2020). Loader et al. (2017) suggested using NdN = (Ce* × SmN)0.5 as an estimation of Ce*, because Nd and Sm concentrations are more easily determined in zircon than La and Pr. However, as does the Ce/Ce* sensu stricto, this method assumes that the REE pattern of zircon is linear and, because it is not, it overestimates the Ce* values (Zhong et al., 2019; Loader et al., 2022; Carrasco-Godoy and Campbell, 2023). Zhong et al. (2019), on the other hand, used the logarithmic relationship between the REE atomic numbers and the chondrite normalized REE concentrations to provide a better approximation to calculate Ce*. However, when this method is compared to lattice strain estimates of Ce* and La, the former underestimated the values for Ce* and La, with the magnitude of the underestimation increasing at lower values (Carrasco-Godoy and Campbell, 2023). Recently, Carrasco-Godoy and Campbell (2023) presented a new method to calculate Ce/Ce* in magmatic zircons, which is based on Onuma diagrams (Onuma et al., 1968) but uses chondrite-normalized values instead of partition coefficients. This new method allows Ce/Ce* to be calculated when data are missing, and it was calibrated against lattice strain estimates.

Pizarro et al. (2020) studied zircons from porphyry copper deposits and barren suites in northern Chile. They compiled several geochemical indicators from the literature and proposed the term porphyry indicator zircon (PIZ). They attribute a zircon with Hf >8,750 ppm, Ce/Nd >1, Eu/Eu* >0.4, 10,000 × Eu/Eu*/Y >1, (Ce/Nd)/Y >0.01, Dy/Yb <0.3, and Th/U between 0.1 and 1 as having crystallized from an oxidized, water-rich magma and being an indicator of fertility. Leslie et al. (2021) suggested that zircons with Gd/Yb ratios <0.07 are indicative of porphyry fertility. Many of the discriminators that are used to identify zircons from oxidized or fertile magmas are defined on an ad hoc basis, using a mix of geochemical knowledge and empirical observations.

Loucks et al. (2020) suggested an empirical oxybarometer based on the Ce, Ti, and U content of zircon. The sensitivity of the Ti content in zircon to temperature changes (Ferry and Watson, 2007), together with the capacity of Ce and U to register redox changes (Ballard et al., 2002; Loucks et al., 2018), allowed them to calibrate a zircon oxybarometer {ΔFMQ = 2.28 + 4.00 × log10[Ce × (U × Ti)0.5] (± 0.64)} from ΔFMQ –6 to 4. This oxybarometer helps to reach a better understanding of redox changes in evolving magmas and can be used in conjunction with other variables as a prospective tool in porphyry copper exploration.

In this study, we evaluate the hypothesis that if the conditions of formation of an economic porphyry copper deposit produce zircons with a unique trace element geochemistry, then the likelihood that a detrital zircon originated from a porphyry copper deposit can be predicted. Previous studies (e.g., Lu et al., 2016; Pizarro et al., 2020; Leslie et al., 2021) have defined geochemical discriminators using a mix of understanding of the geologic processes involved and elements’ behavior during porphyry copper deposit formation. However, in all previous studies, the discriminators have been arbitrarily defined or based on descriptive statistics, which are not optimal for predictive purposes. We have curated and used a database of more than 23,000 zircons from porphyry copper deposits and barren sources. We have used a data-driven process and machine learning methods to define discrimination boundaries for fertile zircons that can be benchmarked according to their predictive power through metric statistics and compared to those suggested by previous studies. Finally, we have trained simple machine learning algorithms, including logistic regression, classification tree, and random forest models, which not only greatly outperform the traditional geochemical discriminators in their predictions, but also give insights into new characteristics that have not previously been considered for evaluating porphyry copper fertility using zircon geochemistry.

A zircon database

A database containing zircon trace element data from mineral deposits, barren suites, and detrital river zircon grains was compiled from the available literature. This database was used to define a set of limits capable of distinguishing zircons associated with ore-bearing magmas from those that are not, and to train and test machine-learning algorithms.

The data selected were carefully reviewed with consideration of the following:

  1. Cerium and Eu were analyzed.

  2. At least three REEs were analyzed, other than Ce, Eu, and La. This is the minimum amount of data required to impute the missing REE and to obtain Ce and Eu anomalies using the method of Carrasco-Godoy and Campbell (2023).

  3. Zircons were omitted if the source indicated that they contained inclusions, were inherited, or had lost lead.

  4. Ideally, P and Ti data were available to screen for possible inclusions.

In total, data from 23,314 zircon grains were compiled and divided into four subsets:

Deposit data set: This data set contains 8,366 zircons that were carefully compiled from literature considering the factors stated above as well as the analytical method used and data quality. A total of 31 major porphyry copper deposits or districts including Cu-dominant, Cu-Mo, Cu-Au, and Cu-Au-Mo are included and summarized in Figure 1. Details and references are online in Appendix 1, Table A1. These zircons are labeled as fertile (~35% of the complete data set). In addition, each zircon was temporally classified, with respect to the deposit’s main mineralization event, as precursor, premineralization, synmineralization or postmineralization to constrain its geologic relationship to the deposit. The temporal labels were assigned according to published information, where available, or constrained between mineralization and zircon ages if not. An additional class called ore-related magmatism, refers to units temporally and spatially associated with ore deposits that could not be classified as one of the other classes with the available information.

Barren data set: This data set contains 823 zircons that were included in the same publications as the deposits subset but were identified as barren within the deposit districts. These are zircons that are in igneous units that are spatially or temporally associated with the ore deposits, but not associated with mineralization. The S- and I-type granite zircons from Burnham and Berry (2017) and Zhu et al. (2020) were also included in this subset because they are not associated with any known deposit.

The river database fromZhu et al. (2020): This data set contains 7,401 detrital zircons from the world’s major rivers. Porphyry copper deposits are extremely rare (Richards, 2013; Richards, 2022); therefore, it is reasonable to assume that most of the river zircons are not associated with porphyry copper deposits and can be considered barren. This data set does not report Ti concentrations.

GEOROC data set: These zircons are from the Geochemistry of Rocks of the Oceans and Continents (GEOROC) database (Lehnert et al., 2000), which contains more than 300,000 zircon analyses (accessed in May 2021). The data were filtered according to the parameters mentioned above, and only zircons of magmatic origin were considered. After filtering the data for zircons with the required information, 6,661 zircon analyses remained. This data set was treated with care, filtering most, if not all, of the possible zircons that are out of context for this work, because it is impossible to evaluate the quality of every single publication in this aggregated database. The zircons in this data set were also classified as barren after removing three of the 72 entries (4.16%) in the GEOROC database that are from porphyry copper deposits.

Several published data sets lack a complete set of REE analyses, which makes comparison difficult, and sometimes impossible. For this reason, the method of Carrasco-Godoy and Campbell (2023) was used to impute missing REE and Y data when this information was not available and to calculate La, Pr, and Ce/Ce*.

A second data set containing 2,207 zircon analyses was also prepared for external validation of the predictive modeling algorithms. This considers data that were not used for model training or testing. It includes data from rivers in Europe (n = 631; Zhu et al., 2023), the anorogenic suite from the Wichita igneous province (n = 466; Wall et al., 2020) that is considered to be barren, the Tampakan porphyry Cu-Au deposit (n = 288; Parra-Avila et al., 2022), the Quellaveco district (n = 746; Nathwani et al., 2021), and several deposits from Central Asia (n = 76; Shen et al., 2015). There are some zircons from the Tampakan deposit in the main zircon database, but the number is small (n = 51, ≈0.25% of the data set; Lu et al., 2016).

Selecting clean zircons

Evaluating data quality is a major challenge when working with large data sets. Therefore, criteria were established to exclude outliers (e.g., inclusions, analytical issues, or mistyped entries). Zircons containing anomalously high Ti or P and La suggest the presence of titanite or apatite inclusions, which result in analysis with abnormally high light REEs, P, or Ti. Quantile-quantile plots were used to determine boundaries to identify outlier zircon populations.

Figure 2A, B, and C show the Ti, La, and P concentrations in quantile-quantile plots for the whole data set. These plots are built on the distribution of the concentrations’ logarithm, compared with a theoretical perfect normal distribution calculated from the data. Changes in slope show subsets of zircons belonging to different populations (e.g., zircon with and without inclusions, Campbell et al., 2006). Fu et al. (2008) suggested that igneous zircons normally have less than 20 ppm Ti, whereas zircons from more mafic, higher-temperature magmas have higher Ti concentrations. Other authors have used a higher cutoff limit of 40 to 80 ppm (Burnham and Berry, 2017; Zhu et al., 2020). Figure 2A shows a significant change in slope at a Ti concentration of ca. 60 ppm. This concentration is consistent with other values presented in the literature, and we have used it as a cutoff limit for inclusion-free zircon.

Similarly, the La slope changes at ca. 1.5 ppm and there is a step at ~10 ppb (Fig. 2B). It has been suggested that micro-inclusions of LREE-rich minerals (e.g., allanite, apatite, titanite) or surface contamination might cause high La concentrations in zircon (Zhong et al., 2019; Zou et al., 2019; Burnham, 2020). There is no consensus as to the maximum La concentration of an inclusion-free zircon, but modelling of partition coefficient data suggests that the zircon lattice can accommodate La concentrations within the range of 2 to 10 ppb (Claiborne et al., 2017; Zhong et al., 2019; Zou et al., 2019; Burnham, 2020), which is considerably lower than the abundances obtained during routine LA-ICP-MS analyses. Other authors have accepted zircons with values <320 ppb to <1 ppm as being inclusion-free during the data filtering (e.g., Burnham and Berry, 2017; Zhu et al., 2020). We have used a higher concentration of 1.5 ppm as the cutoff limit for our database, based on the change in slope in Figure 2B. A higher cutoff has the benefit of increasing the availability of data for analysis but comes at the risk of including LREE-contaminated analyses. However, we have used calculated values for La and Pr following the method of Carrasco-Godoy and Campbell (2023) in preference of analyzed values, which are considered unreliable. An additional filter at 2.25 ppm calculated La was applied to further discard compromised analyses. The choice of a higher threshold for inclusion-free zircon based on calculated La is because the Carrasco-Godoy and Campbell (2023) method produces La values that are half an order of magnitude higher than lattice strain estimates.

The quantile-quantile plot for P shows a major change in slope at ca. 80 ppm (Fig. 2C). If we take this value as a cutoff for zircon with inclusions, nearly 88% of the data set would be discarded. The two populations can be attributed to differences in the P substitution mechanism. Charge balance for P5+ in the higher P population is by (REE, Y)3+ substituting for Zr4+ in the zircon lattice, known as the xenotime substitution, where (REE, Y)3+ + P5+ = Zr4+ + Si4+ (Finch et al., 2001; Burnham and Berry, 2017). This effect is stronger for S-type granites than for I-type granites because S-type granite melts contain more P (Burnham and Berry, 2017; Zhu et al., 2020). Belousova et al. (2002) studied igneous zircons in different geologic environments and found that P concentrations in zircons from granitic rocks can vary from hundreds to thousands of parts per million, increasing to even weight percent level in metamict zircons. Therefore, following Zhu et al. (2020), a cutoff limit of 2,000 ppm of P was used to classify zircons as inclusion-free.

Most of the high-La and high-Ti zircons came from the GEOROC data set. In contrast, for most of the data from river zircons, those associated with ore deposits and those associated with barren intrusions were screened by the authors prior to publication, whereas the quality of the GEOROC data set was not assessed. Quantile-quantile plots were also used to identify and remove isolated outliers when a single zircon or small group (n <10) were an order of magnitude higher than the rest of the population. This resulted in the removal of one zircon from the GEOROC database with Lu >2,500 ppm; nine zircons from the river database with Eu/Eu* >10; and six zircons from the deposit data set with Al >4,000 ppm, or As or Sr >10 ppm. After filtering the data for outliers, a total of 20,019 zircon grains remained.

Code and methods

The data analysis was completed using the R programming language (v4.2.1) and the tidyverse (v2.0.0) metapackage (Wickham et al., 2019; R Core Team, 2022). The machine learning algorithms were fitted and tuned using the tidymodels (v1.1.1) metapackage (Kuhn and Wickham, 2020; Kuhn and Silge, 2022), using the stats (v4.2.1) package for the logistic regression (R Core Team, 2022), rpart (v4.1.19) for the classification trees (Therneau and Atkinson, 2022), and the ranger (v0.15.1) package for the random forest classification (Wright and Ziegler, 2017). Lambda parameters were calculated using the pyrolite library for python (Williams et al., 2020; Anenburg and Williams, 2022) based on the work of O’Neill (2016).

The predictive modeling workflow considered a feature engineering and selection stage to reduce the dimension of the data and identify the most relevant variables, necessary for nonparametric models. The models were trained using a repeated (n = 5) 10-fold cross validation and a calibration curve on 80% of the data, and then the models were validated using the remaining 20% as a test set. The split into test and training set considered a stratified random sampling by deposit to ensure that all the deposits and barren suites are represented in each set. A second test set with data excluded from the modeling was used for further evaluation of the model. More details of the methods are included in the relevant sections.

A common aim in geochemistry, when applied to economic geology, is developing methods that can be used to discriminate ore-bearing from barren suites of igneous rocks based on their chemical composition or modal mineralogy (e.g., Winchester and Floyd, 1977; Loucks, 2014). This is usually attempted by studying variables that can be used to separate the different groups based on the behavior of selected elements during petrogenetic processes. The procedure starts with known data to define boundaries, which can then be applied to unknown data sets. These boundaries are often empirical and sometimes arbitrary, and new variables can be created in the form of ratios that are visually represented in plots where the different groups can be distinguished by occupying different regions.

The field of predictive modeling (also known as machine learning or statistical learning), and in particular, supervised learning, uses mathematical methods and algorithms to predict the outcome of unknown data based on known observations (Kuhn and Johnson, 2013; James et al., 2021; Kuhn and Silge, 2022). These predictions can be numeric, which are called regression models (e.g., the lattice strain theory), or they can sort observations into categories (in this case fertile or barren zircons) that are called classification models. Classification problems are common in geochemistry, and predictive modeling can help to discriminate between different groups of unknown data based on known data sets.

Classification models use a set of variables to estimate the class membership probability of each observation. The most basic approach to predictive modeling is a limit discriminating between classes. For example, if the value of a specific variable is higher than the boundary condition, then it is one class; whereas if it is not, it is the other class (Irizarry, 2020). Moreover, benchmark metrics in predictive modeling allow us to measure how effective each variable is in discriminating among different classes of a given data set (Kuhn and Johnson, 2013; Irizarry, 2020; James et al., 2021; Kuhn and Silge, 2022).

In this study, we use the tools from machine learning to define discriminating boundaries for fertile and barren zircons from their geochemistry and compare them with current limits in the literature (e.g., Pizarro et al., 2020).

A simple predictive model

The definition of a limit represents the simplest case of a binary classification problem (Irizarry, 2020). If we consider only Eu/Eu*, we expect zircon with an Eu/Eu* >0.4 to be classified as a zircon associated with a porphyry copper deposit (probability of being fertile = 1), whereas those with a value ˂0.4 will be assigned as barren (probability of being fertile = 0). Then, the prediction can be compared against the actual observation in a matrix, which is called a confusion matrix (Table 1; Kuhn and Johnson, 2013; James et al., 2021). In this matrix, the correctly discriminated data, given the classification criteria (Eu/Eu* >0.4) and reference level (fertile zircon), are defined as true-positive (fertile zircon predicted as fertile), true-negative (barren zircon predicted as barren), false-positive (barren zircon predicted as fertile), and false-negative (fertile zircon predicted as barren).

A set of metrics can be extracted from the confusion matrix to evaluate the performance of a classification model (Altman and Bland, 1994a; Altman and Bland, 1994c; Kuhn and Johnson, 2013; Kuhn, 2020; James et al., 2021). The metrics include:

  1. Accuracy = (TP + TN)/(TP + FP + TN + FN) or the proportion of correctly classified cases as barren and fertile.

  2. Sensitivity (sens) = TP/(TP + FN) or true positive rate, which is the proportion of positive cases correctly identified among the positive cases (e.g., the proportion of fertile zircons correctly classified as fertile among the fertile zircons).

  3. Specificity (spec) = TN/(TN + FP) or true negative rate, which is the proportion of negative cases correctly identified among the negative cases (e.g., the proportion of barren zircon correctly classified as barren among the barren zircons).

Table 2 shows the Eu confusion matrix for the zircon database considering zircons with Eu/Eu* >0.4. Using this limit gives an accuracy of 0.81, a sensitivity of 0.80, and a specificity of 0.83. From this information, it is possible to infer that a value of 0.4 as a boundary correctly classifies barren slightly better than fertile zircons, with 80% of the cases classified correctly. This process can be repeated by changing the boundary to Eu/Eu* >0.45, which has an accuracy, sensitivity, and specificity of 0.80, 0.68, and 0.83, respectively. Using this threshold provides almost identical accuracy, but it incorrectly classifies 11% more fertile zircons while increasing the specificity by 0.05. These differences are caused by the imbalance between the class memberships (~7:12 fertile-to-barren zircon ratio), which favors the dominant class. Here, the balanced accuracy, which is the average of the sensitivity and specificity, is more informative with values of 0.81 and 0.77 for the Eu/Eu* thresholds of 0.4 and 0.45, respectively.

Sensitivity and specificity can be obtained for every Eu/Eu* value that can be defined as a limit to discriminate fertile from barren zircons. A plot of the sensitivity vs. 1-specificity (equivalent to the false positive rate) can be constructed (Fig. 3). This type of plot is known as the receiver operating characteristic curve (ROC curve; Altman and Bland, 1994b; Brown and Davis, 2006; Fawcett, 2006; Kuhn and Johnson, 2013). If all the zircons were randomly classified as fertile or barren, they would follow the identity line (1:1 line or “no skill line” in Fig. 3). On the other hand, a variable or model that is a perfect classifier for the reference level would have a sensitivity and specificity of 1, and it will be represented by an ROC curve that goes from coordinates (0,0), to (0,1), to (1,1) (red segmented line, Fig. 3). Then, a perfect classifier has an area under the ROC curve (AUC) of 1, whereas a random classifier would have an AUC of 0.5. These values have a probabilistic meaning: it is equivalent to the probability that the model or classifier will rank a random positive level (fertile zircons) higher than a randomly chosen negative instance (barren zircons; Fawcett, 2006).

Figure 3 shows the receiving operating characteristic for the Eu/Eu* and Lu concentrations in zircon. Lu is an example case of a bad predictor (purple line). The ROC AUC are 0.881 and 0.562 for the Eu anomaly and Lu, respectively, which indicates that the former is a good predictor for fertile zircons, whereas the latter is no better than a random classification. Therefore, the ROC AUC can be used to benchmark the predictive performance of each variable (Fawcett, 2006; Kuhn and Johnson, 2013; James et al., 2021; Kuhn and Silge, 2022).

Identifying the best predictors of fertile zircons

ROC AUC values can be used to quantify the relative merits of selected trace elements for discriminating between zircons from fertile and barren intrusions. The results indicate that zircons associated with fertile porphyries have low P (AUC = 0.847) and low middle REE (Sm to Dy, excluding Eu; AUC between 0.787 to 0.772), compared with zircons from barren porphyries. The heavy REE (HREE) and LREE are moderate predictors of fertile zircons (ROC AUC 0.637–0.755), except for La and Lu, that in addition to Th and U are not useful discriminators of fertile zircons with ROC AUC values close to 0.5. We also found that fertile zircons have somewhat lower Hf content than those from barren intrusions, which contradicts the previous suggestion by Pizarro et al. (2020) that fertile zircons have >8,750 ppm Hf. However, the ROC AUC value of 0.548 for Hf suggests that it is not a strong predictor of zircon fertility and is most likely controlled by other processes.

O’Neill (2016) suggested that an orthogonal polynomial regression can reduce the dimension of chondrite normalized REE patterns, in which each term is independent of the other. The regression coefficients (λ) quantify the different characteristics of the REE patterns as follows: the average ΣREEs concentration (λ0), the slope (λ1), the curvature (λ2) and the inflections at the extremes of the pattern (λ3). These parameters were calculated for each zircon, and their AUC was evaluated. The coefficients λ0, λ1, λ2, and λ3 yield an AUC of 0.748, 0.546, 0.772, and 0.816, respectively (Fig. 4A). These results suggest that differences in the REE patterns’ slope (λ1) is a weak predictor for zircon fertility. However, λ0 and λ2 have moderate fertility predictive power, whereas λ3 is a good fertility predictor.

ROC AUC results for Eu and Ce anomalies and selected elemental ratios show that most are good predictors of zircon fertility with ROC AUC values ranging from 0.881 for Eu/Eu* to 0.826 for Ce/Nd, and P is the element that best distinguishes between ore-associated and barren zircons (AUC = 0.847). The exception is Th/U, which gives a ROC AUC of 0.499, which is no better than a random classification. Ratios of 10,000 × (Eu/Eu*)/Y and (Ce/Nd)/Y have been suggested as zircon discriminators by previous authors (Lu et al., 2016; Pizarro et al., 2020). However, we have found that 10,000 × (Eu/Eu*)/Y is no better than Eu/Eu* and that Ce/Nd and (Ce/Nd)/Y are almost identical (AUC of 0.826 and 0.838, respectively). Fertile zircons are characterized by higher Eu/Eu*, Ce/Ce*, and Ce/Nd and lower Dy/Yb and Gd/Yb than zircons from barren intrusions, which is consistent with previous studies. We also evaluated the Ce/Ce* ratio calculated using the method of Loader et al. (2017), which gives an AUC of 0.725 ± 0.004 (1 standard error [SE]), identical to Ce/Ce* anomaly (AUC = 0.720 ± 0.004, 1 SE) calculated using the method of Carrasco-Godoy and Campbell (2023).

Defining fertility limits

It is apparent from Figure 3 that there is no single threshold for any of the variables considered that perfectly discriminates between fertile and barren zircons. Instead, each element or element ratio yields a different classification performance for distinguishing both classes. Although the ROC AUC and ROC curve provide a method to compare how well a variable performs as a predictor, it does not define the value that provides the best predictions.

There is a tradeoff between specificity and sensitivity when selecting an appropriate limit (as illustrated by the ROC curve in Fig. 3). For example, in the case of Eu/Eu*, increasing the sensitivity will also increase the number of false positives. This uncertainty can be addressed by using indicators that summarize the information contained in the confusion matrix. Some of these metrics are accuracy, balanced accuracy, the Matthews correlation coefficient (MCC), or the F-score (Buckland and Gey, 1994; Jurman et al., 2012; Irizarry, 2020; Kuhn and Silge, 2022). We present the limits defined by accuracy and balanced accuracy.

As with the ROC curve example, we can compare the values of accuracy and balanced accuracy to consider all the possible limits for the Eu/Eu* anomaly. Figure 5 shows the different values of Eu/Eu* vs. different metrics for different boundaries of the Eu anomaly. The highest values for the metrics are reached at 0.36 and 0.40 Eu/Eu* threshold for a balanced accuracy of 0.823 (spec = 0.778, sens = 0.868) and accuracy of 0.821 (spec = 0.809, sens = 0.827; Fig. 5), respectively. We considered balanced accuracy as the best metric to define limits because it considers the classification performance for each class, which means that each limit performs similarly when distinguishing fertile from barren zircons.

Therefore, to compare with previous studies, we report the limits for Eu/Eu*, Dy/Yb, Ce/Ce* (calculated using both the Carrasco-Godoy and Campbell, 2023, and Loader et al., 2017, methods), Ce/Nd, and Gd/Yb ratios; P, Sm, and Gd concentrations; and λ3 (using calculated values for Pr and La). We also present suggested limits for Hf, 10,000 × (Eu/Eu*)/Y and (Ce/Nd)/Y. Table 3 shows the limits for the variables considered to be the best discriminators and their best limits according to their balanced accuracy and accuracy.

The limits we propose are generally in agreement with those previously published (Table 3), but there are several exceptions. First, our suggested Ce/Nd values are 10 times higher than the previously suggested value of >1. The use of Ce/Nd >1 yield sensitivity and specificity of 1 and 0.05 (95% of barren zircon classified as fertile), which contrast with our suggested limits of Ce/Nd >12.125 that has a sensitivity and specificity of 0.76 and 0.73, respectively.

We also suggest that (Ce/Nd)/Y, 10,000*(Eu/Eu*)/Y, and Hf are not useful predictors. Eu/Eu* >0.363 as a discriminator of zircon fertility has a balanced accuracy of 0.824. Using 10,000*(Eu/Eu*)/Y >1 has a balanced accuracy of 0.64, whereas using our suggested limit of >4.4 has a balanced accuracy of 0.796. When both discriminators are used together to define a fertile zircon the balanced accuracy is 0.819, which is no better than using Eu/Eu*. The normalization by Y concentrations (ROC AUC = 0.727) does not improve the discrimination over Eu/Eu*. The reason is that both variables are similar, with a Spearman correlation coefficient of 0.85. Likewise, normalization of Ce/Nd by Y gives a result that is similar to Ce/Nd [Spearman correlation coefficient of 0.9 between Ce/Nd and (Ce/Nd)/Y]. Therefore, we recommend using Eu/Eu* and Ce/Nd as predictors of fertility, and we do not recommend ratios based on elements that are already strong fertility indicators. The ROC AUC for Hf is 0.548, which indicates that it is a poor predictor.

From the analysis of the area under the ROC curve, we also found that zircons from fertile intrusions have low P concentrations, middle REE, and λ3 when compared to those from barren intrusions. Phosphorus concentrations and λ3 have not been reported as indicators of fertility in the literature before and will be discussed further in later sections.

Redefining the porphyry indicator zircon

Pizarro et al. (2020) made a compilation of fertility indicators and proposed the term porphyry indicator zircon (PIZ), which they define as having: Hf >8,750 ppm, Ce/Nd >1, Eu/Eu* >0.4, 10,000 × Eu/Eu*Y >1, (Ce/Nd)/Y >0.01, Dy/Yb <0.3, and Th/U >0.1 to <1. These discriminators, when used together as a Boolean conditional for fertile zircons, give a sensitivity of 0.55 (~45% of fertile zircons are classified as barren), specificity of 0.96 (~4% of barren zircons fall within the fertile range) and a balanced accuracy of 0.76 when applied to our zircon database. Removing the uninformative variables Hf and Th/U from the list of criteria increased the balanced accuracy to 0.82, identifying 15% more fertile zircons with a minimal drop (2%) in specificity, supporting our earlier conclusion that Hf and Th/U are not good fertility predictors. When we test the same data against our suggested limits for the same variables, excluding Th/U and Hf (Table 3), we obtain a balanced accuracy of 0.75, a result that is similar to the limits of Pizarro et al. (2020), and in both cases, nearly a half of zircons associated with porphyry deposits are discarded as barren (sens = 0.53, spec = 0.97). However, if we consider fewer discriminators, for example the two ratios with the highest ROC AUC (Eu/Eu* and Dy/Yb), the sensitivity rises to 0.72 while keeping the specificity at 0.90 and balanced accuracy of 0.81. This is a good example of fewer variables performing better than more.

As the number of variables increases, it becomes increasingly challenging to define the limits of what constitutes a fertile zircon. In the technique presented above, each variable is tested individually, and they are not considered collectively. Several machine learning algorithms have been developed that can be applied to geologic problems to extract knowledge from known data. In the following section, we will use machine learning models to predict fertile zircons.

Supervised machine learning models use known data to build regression (numeric outcome) or classification (categoric outcome as in fertile or barren zircon) models. In this section, we show the results of three machine learning algorithms to distinguish fertile and barren zircons: a logistic regression (a parametric model), classification trees, and random forests (nonparametric models).

Considerations for machine learning

If all the data available is used to train a model, it is not possible to assess how it will perform when applied to new data. The accepted technique used to overcome this problem is the use of a validation set, which considers dividing the data into a training set (~80% of the data) used to select and optimize the model, and a test set (~20% of the data), used once to evaluate how the model performs when applied to an unknown data set (Hastie et al., 2009; Irizarry, 2020; James et al., 2021; Kuhn and Silge, 2022). We have also used an external test set with data from deposits and barren sources that are not present in the main data set for further validation.

Resampling methods, such as bootstrap, are commonly used in the training set to compare and select across different models. The most common method used for large data sets is repeated V-fold (or K-fold) cross-validation (James et al., 2021; Kuhn and Silge, 2022). This technique splits the training set into V-folds (usually 5 or 10), and then one subset (Vi) will be withheld for testing while the remaining (V1-Vi-1) sets are used for the model training. This is iterated for each of the V-folds until all the sets have been used for testing within the training set. This process can be repeated by splitting the training set into different V-folds sets “n” times. This allows accounting for the variability of each model as it is trained with a different subset for each iteration.

There are model parameters that are obtained directly during the modeling process, such as the regression coefficients in linear regressions. However, some parameters cannot be estimated directly from the models, such as the optimal number of trees in a random forest model. These are called hyperparameters and they are usually obtained using iterative techniques and optimized during model training (Kuhn and Silge, 2022).

Bias in machine learning is the difference between the true pattern in the data and the one the model can emulate, whereas variance is how much changing the data to train the model can alter the model predictions (James et al., 2021; Kuhn and Silge, 2022). Feature engineering (variable selection and transformation; Kuhn, 2020), model tuning (determination of parameters and hyperparameters), and cross-validation are some tools that help to evaluate and reduce the model bias and variance (Nathwani et al., 2022).

Details on feature engineering, handling of missing data, hyperparameter tuning, and more information on the logistic regression fit can be found in Appendix 1.

Tree-based models

Classification trees: Regression and classification trees split the data into smaller groups that are more homogeneous with respect to the variable to be predicted. The tree will select the variable that best splits the data (S) into two groups (S1, S2) that are as homogeneous as possible (Breiman, 1984; James et al., 2021). The process is then repeated (the tree “grows”) for each resulting group. Each split creates more homogeneous groups, therefore, if this process is not stopped the tree will memorize the training set. Thus, to avoid overfitting, the tree grows until a stopping criterion is reached. The stopping criteria are hyperparameters that cannot be estimated directly from the model. This includes the minimal node size, tree depth, and cost complexity (Kuhn and Silge, 2022). The minimal node size indicates the minimum number of observations in a group required for splitting. The tree depth indicates the maximum number of splitting iterations at which the tree should stop growing. The cost complexity is a penalty applied to avoid a tree growing too complex, commonly called tree pruning.

Figure 6 shows the classification tree for the zircon data considering a maximum tree depth of four and trained with all of the data. This is not the final tree used for modeling because as the tree depth increases visualization becomes difficult, but it illustrates the principle behind a decision tree. The classification criteria defined by the tree in Figure 6 recall those in Table 3, except that they are here applied to each new group that meets the previous criteria.

Random forest: Decision trees are easy to interpret and computationally inexpensive. In addition, their nonparametric nature does not make a priori assumptions about the data used for modeling, and uninformative variables are discarded during the modeling process (Kuhn, 2020). However, classification and regression trees, in general, produce suboptimal predictions when compared to other models. They are high-variance models, which means that small changes in the data can completely change the splits for the tree. Also, they are biased to select variables with varied values (Kuhn and Johnson, 2013, and references therein).

Ensemble models are those that are created from several, usually weak, models that are combined to build a single strong model (Hastie et al., 2009; Kuhn and Johnson, 2013). Bagging (short for bootstrap aggregated; Breiman, 1996) and random forest (Breiman, 2001) are tree-based ensemble models developed to overcome the weaknesses of decision trees.

Bagged methods help to reduce the variance of a model through resampling. It is often, but not exclusively, used for classification and regression trees. A sample from the training set is used to fit a complete decision tree n times, then the predictions are classified in each of these trees and the outcome is an aggregate of the results of each model. The main drawback of bagged trees is that each tree is not completely independent of the others. Because the same variables are used for each bootstrap iteration, the trees will have a similar structure, particularly toward the top of the tree where strong predictors are favored.

Random forests introduce a small modification to bagged methods to decorrelate trees during bagging. A small subset of m predictors is randomly available to be selected in each split of the tree. The addition of this randomness during the bootstrap process helps to build independent trees and further reduces the variance of the models. As with bagging, the outcome of the model is an aggregate of the result of each tree.

Importance rankings: Tree-based models place better variables higher in the trees, whereas in the random forest model, the best variables for each iteration are repeated more times at the top of each tree. This allows the calculation of the importance ranking, which indicates the variables that have a greater effect on the desired model prediction. Figure 7A and B show the Gini-corrected impurity (Nembrini et al., 2018) and permutation importance rankings (Breiman, 2001) for all the variables. The results are similar for both methods, which shows that Eu/Eu* and λ3 have the greatest effect on the model predictions, with P falling below Dy/Yb and Ce/Nd in the permutation ranking (Fig. 7B). The importance might be underestimated in the case of strongly correlated variables in which competition at each split can lead to a diluted importance value (Gregorutti et al., 2017). This affects the importance of the REEs, specifically the middle REEs (Sm to Dy, excluding Eu) that dominate the central part of the ranking. However, when compared with the univariate ROC analysis (Fig. 4A), which does not consider each variable independently and is insensitive to correlation, the results of both importance rankings are similar to the results presented in Figure 4A and B. The most useful predictors are therefore Eu/Eu*, λ3, P, Dy/Yb, and Ce/Nd, followed by middle REE, but λ3 ranks higher compared to the results shown in Figure 4.

Predictive modeling results

Cross-validation and training set results:Table 4 shows the five times repeated 10-fold cross-validation results for the tested models arranged by their ROC AUC performance metric. Each model was trained with and without upsampling to account for class imbalance, which yielded almost identical results in the overall performance (ROC AUC and accuracy values). However, when class imbalance is considered, upsampled models have lower specificity and higher sensitivity than the models without upsampling. In the worst model (logistic regressions), 50% more zircons were correctly identified as fertile, with an ~5% drop in specificity compared to the porphyry indicator zircon (sens = 0.55, spec = 0.96) of Pizarro et al. (2020).

Random forest performs better than traditional methods when classifying fertile zircons. In addition, the tree-based methods can give insights into the most important variables when studying a classification problem (Fig. 7). The best results are obtained from the random forest with upsampling that gives a high sensitivity of 0.89, which is ~1.6 times better than using traditional univariate discriminators, while keeping the specificity within the same range (0.95). Therefore, random forest is the best model for distinguishing between fertile and barren zircons.

Model probabilities calibration: During the modeling, the probability of a zircon belonging to a class is calculated. These probabilities predicted by the model do not necessarily reflect the true probability of a zircon being fertile. Therefore, the predicted probabilities are usually calibrated according to the real observations obtained from the data. A method to evaluate the model probabilities is to use a calibration plot to compare the average probability predicted by the model against the proportion of fertile zircon for each decile or more quantiles (Kuhn and Johnson, 2013). These probabilities can be calibrated by fitting a model to the predicted probabilities using cross validation in the training set.

The use of an additional set follows the same principles as using a testing and training set because the fertile zircon proportion is not known in unknown data. Figure 8A shows the calibration curve for the random forest training data. Overall, the model is well calibrated, except for low probabilities, where it tends to overestimate the proportion observed in the data. Figure 8B shows the calibration curve after fitting a logistic regression on the model probabilities for calibration (known as Platt scaling; Platt et al., 1999), where the predicted probabilities are closer to the observed probabilities in the population as the result approaches the 1:1 line.

Test set results

During the model training, 20% of the data was held as a testing set to evaluate the model performance when applied to data that the model has not seen. The results of the testing set are shown in Table 5. The test set was predicted using the best setting for each model, which considers random forest and logistic regression with upsampling and decision tree without upsampling. The results of the test set are consistent with the results during cross-validation for all the models.

Figure 9 shows a normalized histogram for the predicted probability of a zircon being fertile in the test set as determined by the random forest model after calibration. Only the deposits data set includes fertile zircons. By default, if the probability is >0.5 (dark red dashed vertical line in Fig. 9), the algorithm will predict the zircon as fertile. However, this cutoff can be adjusted depending on the objective. In the case of mineral exploration, it might be desirable to decrease the chance of misclassifying a zircon from a barren source as fertile. Changing the probability cutoff to 0.8 increases the specificity to 0.98, but the trade-off is that sensitivity drops to 0.74. Nearly all the barren zircons are correctly discarded, however, 17% fewer fertile zircons are correctly classified.

Misclassification evaluation

A detailed analysis of where the model fails to identify fertile and barren zircons can help to better understand the reliability of the model. Table 6 shows the percentage of misclassification according to their temporality for fertile zircons and source for barren zircons, respectively. There is a decrease in the number of misclassified zircons as they become closer to the mineralization events, where precursor intrusions have the highest rate of misclassifications with 24.3% of the zircons considered barren whereas only 3% of postmineralization porphyries are misclassified. This suggests that the fertility signature in zircons increases as the system evolves. In contrast, a higher number of misclassifications for barren zircons occurs in the I-type granites data (18.5%) and those grains that are spatially or temporally associated with porphyry districts but considered barren (12.2%) followed by zircons from the GEOROC data set.

External validation set

The data in the testing set is a random split of the whole data set. Therefore, there are zircons from the same deposits or the same barren sources in both the training and testing sets. An external validation set was also prepared with data that was not used by the model during the training process to further understand the performance of the models on unknown data. This includes the data from rivers in Europe (Zhu et al., 2023) and the anorogenic suite from the Wichita igneous province, which is considered to be barren (Wall et al., 2020), as well as zircons from the Tampakan porphyry Cu-Au deposit, the Quellaveco Cu district, and several porphyry deposits from the Central Asian Orogenic Belt (Shen et al., 2015; Nathwani et al., 2021; Parra-Avila et al., 2022).

The metrics for the external validation are shown in Table 7. The performance of each model is similar to the original testing set. However, there is an overall increase in specificity and a decrease in sensitivity for all of the models. The external testing set confirms that random forest is the best of the tested models. Most of the misclassified fertile zircons in the external data set are from the Yarabamba monzonite and Toquepala granodiorite (110 out of 131 misclassified), which are considered as fertile, the oldest plutonic units in the Quellaveco district (Nathwani et al., 2021). Figure 10 shows the probability of the zircons being fertile, obtained from the random forest model, plotted against the age, for the intrusive units of the Quellaveco district. The zircons from the Yarabamba monzonite show a probability of being fertile that is close to zero, whereas most of the Toquepala monzonite zircons fall below 50% (Fig. 10). If the probability of a zircon being fertile is interpreted as a measure of the fertility of the system, Figure 10 shows that there is an increase in zircons associated with fertile magmas with time. The exclusion of the Yarabamba monzonite and Toquepala granodiorite from the external data set gives a ROC AUC of 0.977, sensitivity of 0.848 and specificity of 0.955 for the random forest models, which is consistent with the results in the test set.

We have shown that fertile zircons have high Eu and Ce anomalies, lower content of the middle REE, higher λ2, and lower λ3 and Dy/Yb than barren zircons. In this section, we will discuss the overall REE pattern of fertile and barren zircons and possible causes for their differences.

Cerium and Eu anomalies have been widely discussed in literature (Ballard et al., 2002; Liang et al., 2006; Burnham and Berry, 2012; Trail et al., 2012; Dilles et al., 2015; Lu et al., 2016; Loader et al., 2017; Pizarro et al., 2020; Nathwani et al., 2021; Loader et al., 2022). Here, we summarize the observations on Eu and Ce anomalies but focus our discussion on the zircon REE content of fertile magmas and its control on the Eu/Eu* and Ce/Ce*. The Eu anomaly has been associated with changes in oxidation state, fractionation of hornblende ± garnet, and suppression of plagioclase crystallization (Müntener et al., 2001; Nandedkar et al., 2014; Ulmer et al., 2018) at deep crustal levels driven by water-rich magmas (>4 wt % H2O; Richards and Kerrich, 2007; Richards, 2015; Pizarro et al., 2020). Although changes in oxidation state can influence Eu/Eu*melt we suggest that it is a second-order effect compared with the competition between plagioclase and amphibole fractionation for Eu, the former producing a negative Eu anomaly in the melt and the latter producing a positive anomaly. Therefore, increased amphibole fractionation in wet, ore-producing magmas, in conjunction with a decrease in the rate of plagioclase fractionation, results in a reduced negative Eu anomaly in the melt and, in rare cases, a positive anomaly depending on the timing and relative proportions of plagioclase and amphibole crystallization. Nathwani et al. (2021) studied the variation of the relationship between the Eu/Eu*whole-rock and Eu/Eu*zircon and suggested that the Eu anomaly is a proxy of the Eu in the melt. Consequently, high Eu/Eu*zircon is the highest-ranked fertility indicator. It ranks well ahead of Dyzircon/Ybzircon, Hozircon, Yzircon, and other more obvious amphibole indicators. The Ce anomaly, including the Ce/Nd and Ce4+/Ce3+ proxies, has mainly been attributed to changes in the oxidation state of the magma (Ballard et al., 2002; Burnham and Berry, 2012; Trail et al., 2012). However, a recent study by Loader et al. (2022) showed that oxygen fugacity stays constant as Ce/Ce* increases, and the authors suggest other mechanisms that can affect the Ce anomaly such as cooling or other cocrystallizing phases like amphibole, apatite, or titanite. Here, we propose that a decrease in La and Pr concentrations—rather than increases in Ce—have a major influence on the variation of the Ce anomaly.

The low λ3 coefficient and Dy/Yb, and high λ2 of fertile zircons is caused by a preferential depletion in the middle to light REE (App. Fig. A2). Figure 11 shows a boxplot of the chondrite normalized REE concentrations for fertile and barren zircons grouped by their temporal relationship within the main mineralization event of each deposit. There is an overall decrease in the zircon chondrite-normalized REE concentrations from barren to premineralization to synmineralization and finally to postmineralization porphyries. This difference increases from Lu, which shows no difference (Fig. 11B), to La where the difference in the median chondrite-normalized values between precursor and postmineralization porphyries is an order of magnitude (Fig. 11A). Ytterbium content is similar for all zircons, whereas Dy concentrations are lower for fertile zircons (Fig. 11 and App. Fig. A2), which causes fertile zircons to have low Dy/Yb. Barren zircons overlap with the range of La and Pr concentrations of the fertile zircons (gray boxplot in Fig. 11A), and therefore, register low importance in the random forest feature ranking. We would expect higher importance in a model that aims to predict the temporality among fertile zircons.

Two processes could lower the LREE content of zircons: (1) a larger decrease in the partition coefficient of the LREE relative to the HREE in zircon as the magma evolves, or (2) a depletion of the LREE relative to the HREE in the melt owing to the crystallization of an LREE-enriched phase.

Figure 12 shows the average partition coefficients (Ds) obtained from experimental and natural zircons for different temperature ranges (Rubatto and Hermann, 2007; Luo and Ayers, 2009; Burnham and Berry, 2012; Taylor et al., 2015; Claiborne et al., 2017; missing Ds calculated by Miller et al., 2022). Overall, the HREE partition coefficients increase by a factor of 10, and LREE increases by a factor of 5, as temperature decreases from >1,000° to <800°C for both natural and experimental zircons. All partition coefficients for La and Pr are below 1. The observed changes from pre- to postmineralization zircon grains show a decrease in La/Lu while the HREE remain constant, therefore, the variation in partition coefficients with cooling cannot explain the variation of the Ce and Eu anomalies.

Another possibility is that the zircons crystallized from a melt that became increasingly LREE-depleted as the system evolved. We evaluated the REE variation by calculating the median values with a 95% confidence interval for each category using bootstrap resampling statistics and repeated 1,000 times (results for all REE + Y are summarized in App. 1, Fig. A3). Figure 13 shows the median chondrite normalized values for each category for LaN, CeN, and PrN. There is an overall decrease for LaN and PrN (Fig. 13A, C) from precursor to postmineralization zircons. The variation of these elements is a factor of 8.7 for LaN and 5.8 for PrN between precursor and postmineralization zircons. In contrast, there is little change for CeN (Fig. 13B) with the variation being less than a factor of 1.5.

We consider two scenarios, one in which CeN remains constant and the other where LaN and PrN are constant. If we use the observed variation in Figure 13, and if La and Pr remain constant, the maximum variation for the Ce anomaly between precursor and postmineralization porphyries would be 1.5 times. In contrast, if the Ce concentration remains constant and LaN and PrN decrease, the Ce anomaly would increase by a factor of 7.1. When the two effects are combined the median varies by a factor of ~10.6. This suggests that the variation in LREE has a greater influence on the changes in the Ce anomaly than the variation of the Ce concentrations. However, these factors should be considered only as indicative because calculated Pr and La values were used. The real magnitude of this effect is difficult to measure with the current technology. Similarly, the depletion of GdN and SmN might contribute to the variation in the Eu anomaly, but the effect is much smaller than it is for the Ce anomaly.

The content of REE in zircon is controlled by the availability of REE in the melt, which in turn is controlled by the initial REE content of the melt and the nature of the fractionating phases, provided there is no magma recharge or mixing. Zhong et al. (2018) used Rayleigh fractionation to model the effect of cocrystallizing apatite, titanite, xenotime, allanite, monazite, and K-feldspar on zircon REE geochemistry. Here, we have repeated the modeling but tested the effect of hornblende and garnet, minerals that are commonly associated with the formation of porphyry systems, instead of K-feldspar and xenotime.

Figure 14A to F shows the effect of the cocrystallization of zircon with different phases using a Rayleigh fractionation model. The model assumes 80% total crystallization with zircon fractionating at a constant rate of 0.25%, and a second phase whose nature and rate of fractionation are specified in Figure 14. Partition coefficients are assumed to be constant and independent of pressure, temperature (T), or composition. The tested minerals can cause an increase (hornblende, titanite, garnet) or decrease of Ce in zircon (monazite, allanite) or keep it near constant (apatite). The models, which include hornblende or garnet as the second fractionating phase, result in the zircons becoming progressively depleted in the middle and/or HREEs, with little effect on the LREEs. In contrast, apatite and titanite crystallization cause the zircon REE pattern to become depleted from Pr to Lu. Hornblende, garnet, apatite, and titanite can have an impact on the Eu anomaly by depleting Sm and Gd relative to Eu, as noted by Loader et al. (2022). Based on our modeling, only allanite and monazite can deplete La in the melt and cause the decrease in the LREE observed in the fertile zircon patterns without affecting the HREE. Crystallization of a LREE-enriched phase can influence the Ce/Ce* ratio. Measured Ce in zircon is a mixture of Ce3+ and Ce4+ dominated by Ce4+, which partitions much more strongly into zircon than Ce3+ (Hoskin and Schaltegger, 2003), whereas Ce* is calculated by projecting the chondrite normalized REE pattern between La and Pr. Of course, Ce* is not Ce3+. If an LREE-enriched phase crystallizes it will deplete the melt of La, Pr, and Ce3+, but Ce3+ will be relatively less depleted because some of the Ce is Ce4+, which will be incompatible in the LREE-rich phase unless it can also incorporate Ce4+. Consequently, the calculated Ce* will be less than the Ce3+ concentration in the zircon giving an increase in Ce/Ce* that is independent of the melt fO2. The true Ce4+/Ce3+ of the melt and zircon will be controlled by the fO2 of the melt, which in turn is governed by its ferric/ferrous ratio (Hoskin and Schaltegger, 2003; Burnham and Berry, 2012, 2014). The apparent Ce4+/Ce3+ of the zircon, as measured by its Ce/Ce*, will be controlled by a mixture of the melt’s fO2 and competing LREE-enriched phase(s) (e.g., Loader et al., 2022). This makes allanite more likely to be the offending LREE-enriched phase because monazite normally has a high Th content, showing that it can incorporate a tetravalent ion (Bregiroux et al., 2004), whereas allanite does not have high Th content. Magmatic allanite has been reported in some porphyry copper deposits as a cocrystallizing phase or as inclusion within zircons (Shen et al., 2015; Buret et al., 2017). This effect is stronger for zircons from postmineralization intrusions because they come from the most evolved felsic magmas, which have undergone extensive fractional crystallization, resulting in low allanite solubility (Were and Keppler, 2021). As Loader et al. (2022) point out, variations in Ce, and possibly Eu anomalies, in zircons from igneous rocks may not be solely attributed to changes in the melt redox conditions because multiple mechanisms can control the Ce, Eu, and the total REE budget in zircon. Variations in temperature and melt composition may also be important. Further research is required to understand REE fractionation in zircons related to porphyry copper deposits.

We have found that fertile zircons have low P concentrations when compared to barren zircons. Previous studies have shown that zircons that formed from I-type granites have lower P content than those that formed from S-type granite magmas. For example, Mo et al. (2023) report ~900 and ~300 ppm median concentrations in S-type and I-type granites, respectively, similar to the values suggested by Burnham and Berry (2017; S-type = 1,100 ppm, I-type = 370 ppm). However, we found that zircons that formed in magmas associated with porphyry copper deposits have a median P concentration (123 ± 74 ppm, 1σ median absolute deviation [mad]) that is nearly three times lower than the I-type granites median. According to our analyses, P is the best element to distinguish between fertile and barren zircons in porphyry systems, a detail that has not previously been recognized.

Figure 15 shows a violin plot for the P distributions for barren and fertile zircons according to their temporality. The GEOROC and river subsets are not divided into subclasses. The blue line at 176.4 ppm indicates the best limit for discriminating barren from fertile zircons according to the balanced accuracy (Table 3). The figure shows that GEOROC, river, barren, I-type granite and precursor zircons have similar distributions. As noted previously, ore-associated zircons have lower median values than those from barren sources. The P content of zircons from postmineralization to synmineralization porphyries is similar, but it is higher for zircons from premineralization intrusions (Fig. 15).

Apatite is the most important host of P in magmas and has low solubility in intermediate to felsic calc-alkaline magmas (Piccoli and Candela, 2002; Webster and Piccoli, 2015), which are the main precursor magmas to ore-related porphyries (Seedorff et al., 2005; Richards, 2009; Cooke et al., 2014). Zircons from porphyry copper deposits have low P because they crystallize near the end of a fractional crystallization cycle, well after the start of apatite fractionation. The P depletion might be further enhanced by the crystallization of other phosphate phases, especially monazite, but apatite is likely to be the dominant contributor to low P in evolved porphyry systems because of its abundance. The trend of phosphorus depletion continues to the zircons from postmineralization porphyries, which have the lowest median P content (Fig. 15), showing that the melts from which they crystallized underwent even more extreme fractionation than the synmineralization porphyries.

The REE + Y that enters the zircon structure can be charge balanced through the xenotime coupled substitution of (REE+Y)3+ for Zr4+ and P5+ for Si4+ (Speer, 1980). Therefore, zircons that are charge balanced by this mechanism are expected to have a 1:1 ratio between molar P and REE + Y. Figure 16 shows a molar mass plot of P against REE + Y. Zircons that are charge balanced by the xenotime substitution follow the 1:1 line, as is the case for most S-type granites (Fig. 16A; Burnham and Berry, 2017; Zhu et al., 2020). However, for both fertile and barren grains, most zircons have (REE + Y)3+ that plot above the 1:1 line, which indicates excess (REE + Y)3+ that is not charge balanced by P. Recent studies have shown that H+ can enter the zircon structure through the substitution of (REE + Y)3+ + H+ = Zr4+ (De Hoog et al., 2014; Cui et al., 2023). Porphyry copper deposits are associated with hydrous magmas (>4 wt % H2O; Richards and Kerrich, 2007; Richards, 2011; Chiaradia and Caricchi, 2017), and therefore it is reasonable to attribute the excess of (REE + Y)3+ to charge balance by H+.

Zircon classification using machine learning requires the collection and processing of adequate data. We propose a simplified model considering a subset of variables: P, Eu/Eu*, Ce/Nd, Dy/Yb, and the middle REEs. Although λ3 is an excellent predictor, it requires additional calculations. However, the latter can be overcome using the BLamdaR application (Anenburg and Williams, 2022) to calculate the λ parameters of O’Neill (2016) though a user-friendly interface. Similarly, the ImputeREEapp (ccarr.shinyapps.io/ImputeREEapp/) from Carrasco-Godoy and Campbell (2023) can be used to calculate missing REE values and Y from a subset of REE, as well as Ce and Eu anomalies from incomplete REE patterns.

We have tuned and fitted a simplified decision tree with a of depth of five that can be printed or easily applied in spreadsheets software (ESM3). The ROC AUC, sensitivity, and specificity of this model are 0.921, 0.83, and 0.90 for the training set and 0.911, 0.83, and 0.89 for the test set, respectively. This model correctly identifies 10% more fertile zircons than a model that uses only Eu/Eu* and Dy/Yb. Alternatively, it is possible to fit a decision tree using the zircon fertility models web application at https://ccarr.shinyapps.io/Zircon_fertility_models/ or run the app locally using the code at https://github.com/cicarrascog/Zircon_fertility_models. This app allows for more flexibility to adjust models to the data that exploration geologists currently have already available.

One of the obvious limitations of using zircon to predict the fertility of porphyry copper deposits is that not all magmas produce zircon. In this context, porphyries associated with silica under-saturated alkaline magmatism (e.g., Galore Creek; Enns et al., 1995) do not crystallize zircon, and consequently, zircon cannot be used as an exploration tool.

Another consideration is the sampling bias toward economic porphyries. The publicly available data are biased toward economic deposits. Subeconomic prospects are often discarded, and the collected data are not published, which creates a gap in the data used to train the model. Regional or genetic geochemical variations also should not be discarded. The deposit data set included in this study should help exploration geologists and researchers to build regional or deposit-type–specific subsets for their analyses.

Finally, most porphyries are young (<100 Ma; Seedorff et al., 2005). We have excluded individual zircon ages from the models because it would create a model that would predict fertility for the geologic periods where most porphyries formed. Therefore, the predictions of these models should be used as an additional exploration tool, not as a substitute for temporal and geologic context, which are fundamental, especially in brownfield terrains.

In this study, we have evaluated traditional geochemical thresholds using classification metrics applied in machine learning. We have concluded that the use of several variables as independent predictors is detrimental to the prediction of fertile zircons and that the use of Eu/Eu* and Dy/Yb ratios explain most of the fertility signature of zircons associated with porphyry copper deposits (sens = 0.67, spec = 0.91). Machine learning models, in particular random forest, increase the classification success to 89% for fertile zircons and 94% for barren zircons in the test set, which outperform geochemical thresholds.

We have also found that random forest importance rankings and the univariate ROC curve analysis identify relevant features for zircon geochemistry applied to porphyry copper deposit fertility. Eu/Eu and Dy/Yb are the most important variables, but we also identified that the curvature at the extremes of the REE pattern (low λ3) and the low P content are important characteristics of fertile zircons that have not been previously considered. The curvature of the zircon REE pattern is caused by the cocrystallization of other accessory phases, in particular, LREE-rich minerals such as allanite or monazite toward the end of the magmatic cycle when postmineralization porphyries are emplaced. The low P content of zircons associated with porphyry copper deposits is the result of their crystallization well after the onset of the apatite fractionation, meaning they crystallize from low-P magmas. Finally, low-P zircons show an excess of REE + Y, which is charge balanced by the abundant water content of porphyry-associated magmas.

We have also proposed simplified methods and built an application that can be easily incorporated into exploration programs that can be applied to new and existing data.

We thank Professor Peter Hollings, Dr. Yongjun Lu, and Dr. Chetan Nathwani for their constructive reviews. We also thank Dr. Lawrence Meinert and Pepe Perelló for the editorial handling that greatly improved the content and readability of the manuscript. Carlos Carrasco-Godoy’s studies at the Australian National University were funded by the National Agency for Research and Development (ANID)/Scholarship Program/DOCTORADO BECAS CHILE/2019–72200364. We are grateful to Helen Cocker, Hongda Hao, and Ziyi Zhu for providing zircon trace element data from the Grasberg district, Northparkes, and river zircons, respectively. We appreciate the suggestions from Dr. Christopher Lawley in the early stages of this study.

The data from the deposit subset used in this work was compiled from the following works: Ballard et al. (2001), Ballard et al. (2002), Valente (2008), Cocker et al. (2015), Buret et al. (2016), Cocker (2016), Lu et al. (2016), Buret et al. (2017), Loader et al. (2017), Large et al. (2018), Pizarro et al. (2020), Carrasco Godoy (2020), Large et al. (2020), Hao et al. (2021), Large et al. (2021), Lee et al. (2021), Carter et al. (2022), and Cajal Contreras (2023).

Additional information to establish the geologic context of zircons when required was obtained from: Ossandon et al. (2001), Richards et al. (2001), Lickfold (2002), Proffett (2003), Redmond and Einaudi (2010), Wainwright et al. (2011), Leys et al. (2012), Rivera et al. (2012), Mpodozis and Cornejo (2012), Porter et al. (2012), Trautman (2013), Barra et al. (2013), Wafforn (2017), Mudd and Jowitt (2018), Hao et al. (2019), and Sulaksono et al. (2021). See Appendix 1, Table A1 for a summary of the data sources.

The data from the barren subset and the external validation subset are indicated within the text. Appendix 2 includes the references for the GEOROC data set.

The compiled data sets are available at: doi: 10.5281/zenodo.10343461 and github.com/cicarrascog/Zircon_Fertility_Data.

The code used for the Zircon fertility models web app is available at doi: 10.5281/zenodo.10272719 or at the development site: github.com/cicarrascog/Zircon_fertility_models, and can be downloaded and set up to run locally.

Conceptualization: (CC-G), (IHC); methodology: (CC-G); formal analysis: (CC-G); investigation: (CC-G), (YC); writing, original draft preparation: (CC-G); data curation: (CC-G), (YC); writing-review and editing: (IHC), (YC), (CC-G); supervision: (IHC); visualization: (CC-G), software: (CC-G).

Carlos Carrasco-Godoy is a final-year Ph.D. candidate at the Research School of Earth Sciences of the Australian National University (ANU). He is currently working on the use of zircon geochemistry to identify fertile magmatism associated with porphyry copper deposits. He obtained his geology degree at the Universidad de Concepcion, Chile, and his M.Phil. at ANU. Carlos has worked in porphyry copper deposit exploration and has research experience in a range of analytical techniques. His current research interests are the application of geochemistry, data science and machine learning to the study of ore deposits and solving geologic problems.

Supplementary data