The geographic provenance of minerals provides key insights into a range of geologic problems, including the source of gem materials. The tourmaline supergroup is unparalleled in its ability to record and preserve extensive chemical signatures of its formational environment. To evaluate the likelihood that tourmalines of similar compositions from separate geographic localities could be differentiated, a multivariate statistical approach was applied to two complementary chemical data sets for copper-bearing “Paraíba” tourmaline: one acquired with laser-induced breakdown spectroscopy (LIBS) and one with electron microprobe analysis (EMP).

Fifty-four samples of copper-bearing tourmalines from known source locations in Brazil (São José da Batalha of Paraíba state and the neighboring Rio Grande do Norte state), Mozambique, and Nigeria were analyzed using LIBS, with a subset of these samples analyzed by EMP. Data sets obtained by each method were evaluated with multivariate statistics (PCA, PLSR). Although the sample set is limited, sequential PLSR modeling of the spectra clearly distinguished the four localities with high success: >95% for LIBS and >87% for EMP. The statistical analyses of the two techniques, LIBS and EMP, suggest that each technique emphasizes different elements for discrimination when considered in the context of the available data. The elements Cu, Mn, Fe, Mg, Ti, Zn, K, H, Co, and V were significant in LIBS chemometric models. Statistically significant elements in EMP models were Mn, Cu, Al, Ca, K, and F. Each technique results in a robust determination for geographic provenance of tourmalines with comparable compositions. The significant distinguishing chemical elements reflect geochemical distinctions in each host environment that are imparted on the tourmaline. Multivariate statistics applied to LIBS and EMP data provide an effective tool for provenance discrimination of Paraíba tourmalines, distinguishing Brazilian-sourced samples from African-sourced materials. These data provide new methods for separating the geographic origin of minerals with very similar composition, as demonstrated here for copper-bearing tourmalines.

The determination of the geographic origin (provenance) of minerals separated from their original host rock can provide significant insights into various geological processes. Provenance studies can relate to a geographic origin or locality, which may be associated with a spatially restricted geologic unit or to a host rock environment. For example, provenance elucidates shifting patterns of modern and ancient sedimentation (e.g., Morton et al. 2005), provides key information on paleogeographic/tectonic reconstructions (e.g., von Eynatten and Gaupp 1999), establishes a basis for identification of valuable minerals mined in conflict zones (e.g., Hark et al. 2012; McManus et al. 2020) or the likely sources of some gemstones (e.g., Palke et al. 2018) and refines exploration strategies key to identifying sources of needed critical materials (e.g., Lohmeier et al. 2021). Additionally, geographic origin of gem materials is a complex and important problem in the world economy as companies and organizations strive to maintain and certify a supply chain free of conflict minerals. In other cases, substantial price differences of gemstones result from their different geographic origins. Commonly, mineral chemistry is utilized to provide provenance information. This chemical distinction is challenging when differences among possible source areas are subtle or exhibit considerable overlap in chemical parameters or when age criteria alone are insufficient.

Many minerals retain chemical signatures of their formational environment, but no mineral embeds the range of chemical fingerprints better than the minerals of the tourmaline supergroup. Even during a complex, multistage geologic history that can include crystallization, weathering, reburial, metamorphism, regrowth, and deformation, tourmaline retains textural and chemical signatures of its earlier evolutionary history (e.g., Henry and Guidotti 1985; Henry and Dutrow 1996; van Hinsberg et al. 2011a, 2011b). Tourmaline’s utility as a petrogenetic indicator stems, in part, from its (1) complex crystal chemistry, providing structural and chemical flexibility to incorporate a wide range of chemical constituents of multiple valence states and sizes to imprint a signature of its chemical environment of formation; (2) stability over an extensive range of pressures (P) and temperatures (T) encompassing nearly all crustal and upper-mantle conditions; (3) ability to form in widely varying rock and fluid compositions; and (4) minimal volume diffusion such that its imprinted chemical signature remains intact [see summaries by Henry and Dutrow (1996), Dutrow and Henry (2016), and van Hinsberg et al. (2011b)].

The rich chemical signatures, coupled with its mechanical and chemical stability, make tourmaline a unique target for establishing new methodologies for provenance studies. In some instances, chemical distinctions among sources are subtle, yet critical to define. An excellent test case, and one of economic interest, is the sourcing of copper-bearing tourmalines. Determining their geographic origin, or provenance, is challenging and has important financial implications.

Copper-bearing elbaitic or liddicoatitic tourmaline is widely prized as a gemstone due to its vivid, saturated, “neon” blue hues that are caused by the incorporation of Cu2+ as a chromophore (Fig. 1; e.g., Rossman et al. 1991). Originally discovered in the 1980s in Brazil near the São José da Batalha Mine in the state of Paraíba (Koivula and Kammerling 1989) and later in the 1990s in the nearby state of Rio Grande do Norte (e.g., Fritsch et al. 1990; Shigley et al. 2001), these exquisite Cu-bearing specimens became known as Paraíba tourmalines (Fig. 1). Subsequently, other localities hosting similarly colored Cu-bearing tourmalines were found as elbaitic tourmaline in Nigeria in 2001 (Smith et al. 2001) and Mozambique in 2004 (Wentzell 2004; Abduriyim and Kitawaki 2005; Laurs et al. 2008; Katsurada and Sun 2017). The African tourmalines were found originally in secondary alluvial deposits. Chemically, all of these tourmalines are classified as elbaite or fluor-elbaite species, with a general formula of Na(Li1.5Al1.5)Al6(Si6O18)(BO3)3(OH)3(OH) or F replacing one OH [for species nomenclature see Henry et al. (2011) and Henry and Dutrow (2018)] and Cu2+ substituting into the octahedral site that typically accommodates Li-Al. In 2017, Cu-bearing fluor-liddicoatites—Ca(Li2Al)Al6(Si6O18)(BO3)3(OH)3(F)—were discovered and were attributed to a locality in Mozambique (Katsurada and Sun 2017). The varietal name, “Paraíba” tourmaline, is used to refer to any of the saturated blue, green, and violet tourmalines containing Cu2+ ± Mn2+ as chromophores (LMHC 2023). Paraíba tourmaline sources for gemstones are difficult, if not impossible, to distinguish based on color alone. Yet, the Brazilian material from the original mine area can command prices that are 5–10 times higher than those of their African counterparts of comparable quality and size. Consequently, provenance is an essential component of the tourmaline’s value as a gemstone.

Major-element tourmaline “environmental” diagrams such as the Al-Fe-Mg ternary (Henry and Guidotti 1985) are not effective for determination of Paraíba tourmaline sources because most have elbaitic composition except for the liddicoatitic tourmalines which are easily distinguished based on their elevated Ca contents. Consequently, this necessitates the use of other criteria such as minor and trace elements to potentially fingerprint the likely source of Paraíba tourmalines. For gemmy Paraíba tourmaline, most attempts at provenance evaluations rely on quantities of a limited number of trace and minor element constituents (e.g., Cu, Zn, Ga, Sr, Sn, Pb), obtained via LA-ICP-MS, or isotopes, obtained via Secondary Ion Mass Spectrometry (Ludwig et al. 2011), that are plotted in simple binary or ternary diagrams or in a serial combination of these diagrams as a means to deconvolute the overlapping chemical signatures distinctive of a source (e.g., Abduriyim et al. 2006; Peretti et al. 2009; Palke et al. 2018; Okrusch et al. 2016; see review by Katsurada et al. 2019). Although these types of provenance diagrams have met with varying degrees of success, they do not holistically consider the entire range of Paraíba tourmaline chemistry available for provenance evaluation.

This contribution explores the use of a multivariate statistical approach for enhanced provenance determination that considers a wider spectrum of chemical information available from two distinctly different, but complementary, newly acquired chemical analytical data sets of Paraíba tourmaline: laser-induced breakdown spectroscopy (LIBS) spectra and electron microprobe (EMP) chemical analyses. The purpose of this study is to determine whether multivariate statistics can reveal if one or both data sets are effective or, at least, complementary provenance indicators for minerals with very similar compositions.

The LIBS analytical sample set consists of 54 copper-bearing tourmalines with known provenance from four distinct localities (Fig. 2). Samples were obtained from highly reputable gem dealers specializing in Paraíba tourmaline (see Online Materials Appendix 1 for sample information). Representing Brazil are 24 grains from two localities: São José da Batalha, Paraíba state [SJdB; the original Paraíba locality; 6 grains, 5.93 carats (ct), color-zoned blue, purple] and Rio Grande do Norte state (RGdN; 15 grains, color-zoned blue, purple; Figs. 2a and 2b). In addition, three samples displaying neon-blue colors are identified as from “Brazil” but with unknown specific localities; two of these samples are in matrix and one is a single crystal. Mozambique (Moz) is represented by 24 tourmaline grains with a spectrum of colors including pink, blue, purple, and green (total weight of 51.73 ct; see Fig. 2c). Nigeria (Nig) is represented by 11 grains (totaling 28.28 ct; Fig. 2d). Nigerian grains are largely green to blue-green. Most rough crystals measured less than 1 cm in size and were without the matrix material.

LIBS analyses

LIBS is a relatively recent analytical technique that is finding utility in the geosciences [e.g., see reviews by Fabre (2020) and Harmon and Senesi (2021)]. The information-rich spectra contain signatures of all elements in concentrations above detection limits (e.g., Cremers and Radziemski 2013), molecular emissions, select isotopic ratios (e.g., Smith et al. 2002; Doucet et al. 2011; Russo et al. 2011), and some structural information (Serrano et al. 2015), resulting in a detailed chemical fingerprint of the material analyzed. To take advantage of the rich chemical data set embedded in tourmaline, this LIBS study uses the spectrum of relative peak intensities of each tourmaline rather than absolute quantities of individual elements within the tourmaline.

Minimal sample preparation is required for LIBS [see, e.g., McMillan et al. (2018, 2019) for additional information]. Rough samples were cleaned with isopropyl alcohol to remove oils and surface residue and air-dried. Most tourmalines are individual grains or clusters of grains. Originally, samples were mounted on a plexiglass sheet with BlueTac to secure the grains; later, the BlueTac was eliminated. The sheet was placed into the sample holder in the LIBS instrument chamber. LIBS data were acquired prior to EMP data analyses to avoid any possible contamination from EMP sample preparation, such as polishing and carbon coating of the grains.

Tourmalines were analyzed with an Applied Spectra J200 LIBS instrument at Materialytics, Inc., fitted with a Q-switched Quantel ULTRA 100 Big Sky Nd:YAG laser operated at a wavelength of 266 nm (the fourth harmonic of the 1064 nm Nd:YAG fundamental) and <6 ns pulse width. The instrument utilized an Andor Mechelle ME 5000 spectrograph (λ/Δλ = 5000) and an Andor iStar ICCD (intensified charge-coupled device) camera, model DH334T-18F-03. Analytical conditions were a laser pulse energy of 150 mJ, with a delay of 0.5 microseconds (µs) between the time of the laser shot and light collection, a gate width (time of light collection) of 10 µs, and a nominal spot size of 50 µm (subsequent analyses demonstrated a larger ablation pit of nearly 80 µm). Spectra were obtained at 1 atm at room temperature in an argon atmosphere to confine the LIBS plasma and thus enhance emission intensity. Where grain size allowed, 64 shots were obtained per sample in an 8 × 8 grid with a spacing of 100 µm between shots, covering an area of about 1 × 1 mm. An ancillary study suggested that 64 shots were optimal for characterizing the samples (McMillan et al. 2019). At each analytical location, a cleaning shot was done prior to the analytical shot. The spectral emission was collected over the 26 000+ channels of the detector/spectrometer system to assemble the spectrum in the wavelength range from 200–1000 nm for each analytical shot. Spectra were truncated at 771 nm, which preserves the potassium peaks at 766.5 and 769.9 nm but masks the primary argon peaks at higher wavelengths. The spectra from the multiple shots per sample were averaged and normalized to the mean peak intensity to produce a single spectrum per sample. Averaging LIBS spectra helps mitigate variations caused by inherent shot-to-shot variability (McMillan and Dutrow 2024). Background correction was not applied. Intensities were converted to log values for modeling purposes. Where necessary, identification of LIBS peak positions utilized the online NIST database of optical emission lines (Kramida et al. 2022).
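The spectral preprocessing described above (truncation at 771 nm, shot averaging, normalization, and log conversion) can be sketched as follows. This is a minimal illustration rather than the instrument software's procedure: the function name, the order of operations, the choice of base-10 logarithms, the small intensity floor, and the reading of "mean peak intensity" as the mean intensity of the averaged spectrum are all assumptions.

```python
import numpy as np

def preprocess_shots(wavelengths_nm, shots, cutoff_nm=771.0, floor=1e-12):
    """Collapse all shots from one sample into a single log-intensity spectrum.

    wavelengths_nm : 1-D array of channel wavelengths.
    shots          : 2-D array, one row per laser shot, one column per channel.
    """
    wavelengths_nm = np.asarray(wavelengths_nm, dtype=float)
    shots = np.asarray(shots, dtype=float)

    # Truncate at the cutoff: keeps the K lines near 766.5 and 769.9 nm
    # and drops the argon lines at longer wavelengths.
    keep = wavelengths_nm <= cutoff_nm

    # Average the shots channel by channel to damp shot-to-shot variability.
    mean_spectrum = shots[:, keep].mean(axis=0)

    # Normalize to the mean intensity of the averaged spectrum (assumed reading).
    normalized = mean_spectrum / mean_spectrum.mean()

    # Log-transform; the floor (an assumption) avoids log(0) on empty channels.
    return wavelengths_nm[keep], np.log10(np.maximum(normalized, floor))
```

With a 64-shot grid, `shots` would have 64 rows, and the function returns one spectrum for the sample.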

Acquisition of such a large data set requires statistical methods and/or machine-learning techniques for data analyses and interpretation. This study employs the multivariate statistical techniques principal component analysis (PCA) (Esbensen 2004) and partial least-squares regression (PLSR) (Wold et al. 2001; Esbensen 2004) to quantitatively classify spectra with reference to the geographical source of the tourmaline. The strong emission response of some major elements required the masking of select peaks from the spectra to allow subtler chemical variations to be enhanced. For these tourmalines, masking of peaks for the elements Si, Al, Li, Na as well as the Ca peaks at 393.3, 396.8, and 422.7 nm resulted in improved models. While other multivariate statistical techniques may be advantageous, for this test case, methods used previously were followed (e.g., McMillan et al. 2018).
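Peak masking of the kind described here can be illustrated by removing narrow wavelength windows before modeling. In this sketch, the three Ca line positions come from the text; the half-width of each window and the function name are assumptions, and the lines used to mask Si, Al, Li, and Na are not listed in the text, so they are not filled in here.

```python
import numpy as np

# Ca emission lines named in the text (nm).
CA_LINES_NM = (393.3, 396.8, 422.7)

def mask_peaks(wavelengths_nm, spectra, centers_nm, half_width_nm=0.5):
    """Drop every channel within half_width_nm of any listed line center.

    half_width_nm is an illustrative value, not one reported in the study.
    """
    wavelengths_nm = np.asarray(wavelengths_nm, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))

    keep = np.ones(wavelengths_nm.shape, dtype=bool)
    for center in centers_nm:
        keep &= np.abs(wavelengths_nm - center) > half_width_nm

    # Return the surviving channels and the matching columns of every spectrum.
    return wavelengths_nm[keep], spectra[:, keep]
```

Removing the channels outright, rather than zeroing them, keeps masked lines from contributing any variance to the later PCA and PLSR steps.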

Multivariate statistical modeling

PCA is a dimension-reducing multivariate technique that calculates linear regressions, or Principal Components (PCs), through the data set in multivariate space (24 350 variables). A PCA score plot (sample analyses in n-dimensional space projected onto the plane of two principal components, e.g., PC 1 and PC 2) displays the spectral/compositional relationships of the data set in the two directions of the principal components. This comparison is used to determine the order in which the geographic localities (SJdB, RGdN, Moz, Nig) are modeled beginning with the compositionally most distinct group modeled first (Multari et al. 2010; Kochelek et al. 2015; McMillan et al. 2018).
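As an illustration of how score-plot coordinates arise, the sketch below computes PCA scores and loadings from a mean-centered data matrix using a singular value decomposition. It is a generic textbook formulation, not the software routine used in the study, and the function name is hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return scores, loadings, and explained-variance fractions for X.

    X has one row per sample (spectrum) and one column per variable.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each variable

    # SVD of the centered data: rows of Vt are the principal directions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # variable weights per PC
    explained = (s ** 2) / np.sum(s ** 2)             # variance fraction per PC
    return scores, loadings, explained[:n_components]
```

Plotting the first column of `scores` against the second gives a PC 1 versus PC 2 score plot; groups that plot far from the others are candidates for the first model in the sequence.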

PLSR models were used to quantitatively discriminate between the samples of the locality of interest and all other localities. PLSR is similar to PCA but includes the value of a response (dependent) variable, in this case, the Provenance Variable (PV), in the regression. Spectra of samples from the locality of interest were assigned a PV value of 1; spectra of samples from all other localities were assigned a PV value of 0. To calibrate the model, 50% of the spectra from the geographic localities were selected; spectra from the remaining 50% of samples were used for test-set validation in a later step. Because the database contained one spectrum per sample, no individual sample was present in both the calibration and validation sets, although samples from a given geographic locality were present in both sets. Statistical modeling was accomplished using the Unscrambler software by Camo. The nonlinear iterative partial least squares (NIPALS) algorithm was applied with 15 PLSR components; no weighting was applied to variables. All models are mean-centered [see also McMillan et al. (2018) for further discussion].
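The one-versus-rest coding and the regression step can be sketched with a compact single-response PLS routine in the NIPALS style. The study used the Unscrambler software; the code below is an independent, minimal illustration with hypothetical function names, not a reproduction of that package.

```python
import numpy as np

def provenance_variable(labels, target):
    """Code the locality of interest as 1 and every other locality as 0."""
    return np.array([1.0 if label == target else 0.0 for label in labels])

def pls1_fit(X, y, n_components):
    """Fit a mean-centered, unweighted PLS model for a single response."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # residual matrices, deflated below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                        # weight: covariance of X with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left for X to explain
            break
        w = w / norm
        t = E @ w                          # scores for this component
        tt = t @ t
        p = E.T @ t / tt                   # X loadings
        c = (f @ t) / tt                   # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - c * t                      # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X space
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    """Return calculated PV values for new spectra."""
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
```

In use, `pls1_fit` would be called on the calibration half of the spectra with `provenance_variable(labels, "SJdB")` as the response and `n_components=15`, and `pls1_predict` would then be applied to the held-out half.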

To quantitatively assign a spectrum to a locality group, a numerical value that separates calculated Provenance Variable values for the two groups in the calibration set is defined: the value of apparent distinction (VAD) (Kochelek et al. 2015). The VAD is calculated as the value that gives the highest number of correctly assigned samples during calibration. Any sample with a calculated PV value greater than or equal to the VAD is classified as a tourmaline within the group of interest; those with calculated PV values less than the VAD are classified as belonging to the group of the remaining localities. Once a VAD is assigned, it does not change during validation.
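The VAD selection rule can be sketched as a simple threshold search over the calibration set. The text defines the VAD only as the value giving the most correct assignments; the candidate thresholds (midpoints between neighboring calculated PV values) and the tie-breaking rule (the lowest qualifying threshold wins) are assumptions made for this illustration.

```python
def value_of_apparent_distinction(pv_calculated, pv_true):
    """Pick the threshold that classifies the most calibration samples correctly.

    pv_calculated : calculated PV values from the calibration model.
    pv_true       : assigned PV values (1 = locality of interest, 0 = others).
    """
    values = sorted(set(pv_calculated))
    # Candidates: below everything, between neighbors, and above everything.
    candidates = [values[0]]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates += [values[-1] + 1.0]

    best_vad, best_correct = None, -1
    for vad in candidates:
        correct = sum((pv >= vad) == (truth == 1)
                      for pv, truth in zip(pv_calculated, pv_true))
        if correct > best_correct:          # strict: first best threshold wins
            best_vad, best_correct = vad, correct
    return best_vad, best_correct

def classify(pv_calculated, vad):
    """True where a sample is assigned to the locality of interest."""
    return [pv >= vad for pv in pv_calculated]
```

Because the VAD is fixed after calibration, `classify` is the only function applied to validation spectra.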

PLSR models were validated using test-set validation. PV values are calculated for tourmaline spectra not used to calibrate the PLSR model. The VAD determined during calibration is used to predict whether each spectrum in the validation set belongs to the locality of interest or the group of the remaining localities. The prediction accuracy is calculated as the percent of correctly assigned test-set spectra for which locality information is known. For example, Model 1 evaluates São José da Batalha (SJdB) samples. Applying the VAD of 0.45 to the spectra not used in the calibration set, all of the São José da Batalha samples are predicted to be from this locality, as well as one African sample, and the other samples are predicted to belong to the group of remaining samples (Fig. 3). Thus, Model 1 is 96% successful, with one sample miscategorized. Once validated, the decision tree of PLSR plots is developed for each remaining group of samples (RGdN, Moz, Nig).

Each PLSR model identifies spectra that belong to one group (i.e., the geographic locality). After a group is distinguished, those samples are removed from the data set and all subsequent models. In this case, São José da Batalha samples in Model 1 are removed. The order of the models may be critical to obtaining sufficient separation of samples. Each model is determined by choosing the compositionally most distinct group at each step, as defined by the relationships on a PCA score plot. Because the most distinct group is always eliminated, the samples near the final decision tree are those with the most compositional similarities. Typically, samples in those groups are indistinguishable from each other when modeled in the presence of the other samples, but the small differences between them can be extracted and used to separate these groups when they are modeled in isolation after the other groups are removed.
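The sequential logic described above amounts to walking a list of one-versus-rest models in order and stopping at the first one that claims the sample. The sketch below uses the three VAD values reported for the LIBS models; the stand-in predictors, the dictionary keys, and the choice of Nigeria as the PV = 1 group in the last model are assumptions made only to give a runnable example.

```python
def classify_sequential(sample, models, fallback):
    """Assign a sample to the first locality whose model scores it >= VAD.

    models   : ordered list of (locality, predict_pv, vad) tuples.
    fallback : locality assigned when no model claims the sample.
    """
    for locality, predict_pv, vad in models:
        if predict_pv(sample) >= vad:
            return locality          # sample leaves the tree at this step
    return fallback

# Stand-in predictors: each reads a precomputed PV value from the sample.
# In practice each would be a fitted PLSR model applied to the spectrum.
models = [
    ("SJdB", lambda s: s["pv_model1"], 0.45),
    ("RGdN", lambda s: s["pv_model2"], 0.50),
    ("Nig",  lambda s: s["pv_model3"], 0.52),   # PV = 1 group assumed
]

example = {"pv_model1": 0.10, "pv_model2": 0.80, "pv_model3": 0.90}
print(classify_sequential(example, models, fallback="Moz"))  # prints "RGdN"
```

The example sample scores highly in Model 3 as well, but it never reaches that model because Model 2 has already removed it, which mirrors the removal of classified groups from later models.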

Electron microprobe analysis (EMP)

To test the applicability of the multivariate statistical approach on widely available tourmaline compositional data from EMP, a subset of 15 tourmaline samples for which LIBS data were obtained (Figs. 4a–4c; 5 grains Brazil; 6 Mozambique; 4 Nigeria; and two additional samples) were analyzed by wavelength-dispersive spectrometry using the JEOL 8230 electron microprobe at LSU. Quantitative compositional analyses for major and minor elements were obtained at an accelerating potential of 15 kV and a 10 nA beam current using a 2 µm spot size, with Na analyzed first. Natural minerals and synthetic materials were used as standards including andalusite (Al), diopside (Ca, Mg, Si), fayalite (Fe), chromite (Cr), kaersutite (Ti), rhodonite (Mn), willemite (Zn), chalcopyrite (Cu), galena (Pb), albite (Na), sanidine (K), fluorite or fluor-phlogopite (F), tugtupite (Cl) with synthetic Bi2Te3 (Bi), V-diopside glass (V), and GaAs (Ga). EMP detection limits are given in the Online Materials. Li, H, and B cannot be effectively analyzed by the EMP and were not included in the data modeled. Two well-characterized elbaite tourmalines served as secondary standards. Count times for major elements were 10 s on the peak, 20 s on the background, and for minor and trace elements 60 s peak, 30 s background. Analytical precision is estimated to be ±1% relative for the major elements and ±5% for the minor elements. Where color zoning is apparent, analytical traverses were made across the samples; in other cases, 10–30 analytical spots per grain were randomly selected.

Mineral formulas were normalized following the recommended procedures of Henry et al. (2011), permitting B, H, and Fe3+ to be calculated based on stoichiometry and charge balance, with Li estimated by the procedures of Pesquera et al. (2016). Calculating atoms per formula unit (apfu) served as an additional quality check for EMP data, but the normalized data are not used for the statistical analysis. To avoid calculation artifacts, oxide weight percentages of measured elements were used for multivariate statistical modeling and are given in the Online Materials.

Evaluating the efficacy of multivariate statistical models for separating the provenance of Paraíba tourmaline using EMP data followed the same methodology as for separating the LIBS data. However, only 18 variables per chemical analysis are available for modeling. Although the data set comprised 295 analyses, only 15 samples were analyzed. All analyses for each sample were restricted to either the calibration or the validation set to ensure that the models focused on fundamental characteristics of the tourmalines rather than simply identifying analyses from the same sample. Because of the low number of samples, the calibration set comprised analyses from 4 (Mozambique), 3 (Brazil), and 2 (Nigeria) samples, and the validation set comprised analyses from 2 samples from each country.
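Keeping every analysis of a sample on one side of the calibration/validation divide is a grouped split. A minimal sketch follows, assuming each analysis is stored as a (sample identifier, values) pair; the identifiers and numbers in the usage example are invented placeholders, not data from the study.

```python
def split_by_sample(analyses, validation_samples):
    """Split analyses so that no sample contributes to both sets.

    analyses           : list of (sample_id, values) pairs, one per EMP spot.
    validation_samples : sample identifiers reserved for validation.
    """
    validation_samples = set(validation_samples)
    calibration = [a for a in analyses if a[0] not in validation_samples]
    validation = [a for a in analyses if a[0] in validation_samples]
    return calibration, validation

# Placeholder data: three spots on sample "A", two on sample "B".
spots = [("A", [0.1]), ("A", [0.2]), ("B", [0.3]), ("A", [0.4]), ("B", [0.5])]
calibration, validation = split_by_sample(spots, validation_samples=["B"])
print(len(calibration), len(validation))  # prints "3 2"
```

Splitting at random by analysis instead would let spots from the same grain appear in both sets, which tends to inflate the apparent validation accuracy.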

Copper-bearing tourmalines analyzed in this study included elbaite or fluor-elbaite species; no samples of the rare Cu-bearing fluor-liddicoatite species were included. Representative EMP analyses for each geographic locality are given in Table 1. Cu-bearing fluor-liddicoatites from Mozambique (Katsurada and Sun 2017) are Ca-dominant, and their geographic origin is easily determined on that basis.

Multivariate statistics using LIBS data

LIBS spectra (unmasked) for the Cu-bearing elbaites display prominent Na, Al, Si, Li, and B peaks, in addition to Cu and Mn peaks as expected (Fig. 5). In several samples, LIBS detected minor and trace elements such as K, Mg, Bi, Zn, Ga, and Sr. The presence of these elements was confirmed by previous LA-ICP-MS analyses of Paraíba tourmaline (Z. Sun, personal communication). Although Ca and Mg are minor components, the high intensity of these emission lines reflects the relatively low ionization energy of the alkaline earth elements (Cremers and Radziemski 2013).

The decision tree for these sample suites consists of three models (Fig. 6; Dutrow et al. 2019). In an initial PCA that includes all the tourmaline spectra from the four localities (São José da Batalha, Brazil, SJdB; Rio Grande do Norte, Brazil, RGdN; Mozambique; and Nigeria), no single group clustered tightly and the groups overlapped in PC1-PC2 space (Fig. 7). The SJdB spectra were chosen as the first group to model because the São José da Batalha, Brazil PLSR model had the highest success rate of all possible first models. Model 1, which classifies spectra as either belonging to the SJdB group or to the group of all other tourmalines, is excellent (Fig. 6), despite the overlap of groups in PCA space (Fig. 7). The calibration shows separation between the groups with a VAD of 0.45 (Fig. 3). The validation is 96% successful, correctly classifying 25 of 26 samples. The one false positive is a sample of Nigerian tourmaline classified as SJdB.

The spectra of SJdB tourmalines were removed from all subsequent models. Model 2 classifies spectra as belonging to RGdN or to the group of all other tourmalines (Mozambique and Nigeria). There is a clear separation between the two groups in the calibration of Model 2 (Fig. 3), which used a VAD of 0.50. The validation is 96% successful, correctly predicting the provenance of 22 of 23 samples (Figs. 3 and 6). Again, one Nigerian sample yielded a false positive result. This is the same sample that was incorrectly classified as SJdB in Model 1.

Finally, Model 3 discriminates between tourmaline spectra from Nigeria and Mozambique (Fig. 6). Spectra are well separated in the calibration with a VAD = 0.52 (Model 3; Fig. 3). The validation is 94% successful, correctly classifying 16 of 17 samples. One Nigerian sample was misclassified as belonging to the Mozambique group; however, it is a different sample than the false positive sample in Models 1 and 2. The consistent misclassification of Nigerian samples suggests that the sample set is too small to be representative of the actual dispersion of compositions. Alternatively, on visual examination, this sample has a saw mark, which may have left surface contamination or altered the surface texture of the sample, affecting plasma properties. Overall, the decision tree correctly classified 63 of 66 spectra (one spectrum per sample), resulting in a cumulative prediction accuracy of 95%. The overall true positive rate (only considering the location assigned to PV 1) is 94% (16 of 17; Fig. 3).
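The cumulative figure follows directly from the per-model counts quoted in the text (25 of 26, 22 of 23, and 16 of 17). The short tally below, with a hypothetical helper name, reproduces the arithmetic.

```python
def cumulative_accuracy(model_results):
    """Pool (correct, total) counts from each model in the decision tree."""
    correct = sum(c for c, _ in model_results)
    total = sum(t for _, t in model_results)
    return correct, total, 100.0 * correct / total

# (correct, total) for Models 1, 2, and 3, as quoted in the text.
libs_counts = [(25, 26), (22, 23), (16, 17)]
correct, total, percent = cumulative_accuracy(libs_counts)
print(f"{correct} of {total} = {percent:.1f}%")  # prints "63 of 66 = 95.5%"
```

The pooled value of 95.5% corresponds to the 95% cumulative prediction accuracy reported for the LIBS decision tree.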

Based on the success of the previous geographic modeling, the geographic origin of two unknown Brazilian samples was predicted. Using the LIBS decision tree developed, both unknown samples are classified as being from the Rio Grande do Norte, Brazil, locality.

Multivariate statistics using EMP data

A more widely used analytical technique for characterizing tourmaline mineral chemistry is electron probe microanalysis (EMP). As such, this multivariate statistical approach was developed using an EMP analytical data set obtained for a subset of the tourmalines for which LIBS data had been acquired (see Online Materials for all oxide weight percentages used for multivariate statistics). Importantly, in addition to the major elements, Cu, Mn, and F are present in the tourmalines in amounts readily analyzed by the EMP. F is not easily detected by LIBS but is with EMP. V, Cr, and Pb are at, or below, EMP detection limits (Online Materials).

Modeling EMP data with multivariate statistics followed similar procedures as the modeling for the LIBS data. Because of the smaller sample set size, both Brazilian localities were combined. The character of the EMP data set is different than the LIBS data set, in which each sample is represented by a single spectrum. For EMP data, 10–30 points were analyzed for each of the 15 tourmaline samples (Brazil: 5; Mozambique: 6; Nigeria: 4), resulting in a total of 295 analyses. This data set captures the variability within each sample well, but there are too few samples to be representative of the variability within each country of origin.

A PCA score plot for the calibration EMP analyses in the models shows good clustering for analyses of each tourmaline sample but lacks distinct clustering of samples from each country (Fig. 8). Some relationships are consistent with those found via LIBS. For example, the Brazilian samples plot at negative values of PC2 over a large range of PC1 values. The Nigerian and Mozambican samples cover broad areas that intersect near the origin of the score plot. Analysis of more samples could help the PCA discern different relationships that might provide better separation of the groups.

PLSR is a supervised method where the variables (EMP analyses) are correlated with known provenance variables (PV). Because of this, PLSR models can be successful regardless of messy relationships in PCA. Model 1 in the EMP decision tree (Fig. 9) separates Brazilian tourmaline analyses from the group of Mozambican and Nigerian samples. The calibration is 96% successful, correctly predicting the origin of 173 of 180 calibration analyses with a VAD of 0.51 (Fig. 10). The validation is also 96% successful, correctly predicting the origin of 110 of 115 analyses. Five Brazilian analyses are predicted to belong to the group of all others; there are no false positives.

Model 2 is more complex. The calibration (Fig. 10) establishes relatively consistent Provenance Variable values for Mozambican calibration analyses with an average near 1 (average = 0.91; range = 0.36–1.36; standard deviation = 0.18). In contrast, the PV values calculated for Nigerian samples, while less than 1, are different from each other. One sample clusters at an average of 0.45 and the other at an average of 0.03 (Fig. 10). Because one Nigerian sample has relatively high calculated PV values, the VAD that results in the best model success is 0.62. This VAD value results in a calibration accuracy of 97% (112 correct predictions of 115), with two false negatives and one false positive. However, this VAD value is not the best choice for the validation (Fig. 10). A higher VAD would have yielded a higher success rate, as all of the Mozambican validation analyses have comparably high calculated PV values, as do 19 of the 40 Nigerian validation analyses. This results in a prediction accuracy of 75% for this model (Figs. 9 and 10; 56 of 75 analyses). More samples with analytical data are needed to calculate more successful models. Overall, the EMP decision tree correctly predicts the country of origin of 87% of the analyses.

These combined results underscore the utility of multivariate analyses for separating likely geographic source localities of compositionally similar minerals, as demonstrated by elbaitic tourmalines. Significantly, these outcomes result in the separation of geographic localities using considerably different mineral chemical acquisition techniques. For both techniques, the high prediction accuracy of modeling suggests that even with a limited data set, subtle variations in chemical components, when taken as a whole, can provide important signatures of the source region. While the power of the data-rich LIBS spectra coupled with multivariate statistics has been previously demonstrated for separating locality information (e.g., Hark et al. 2012; McMillan et al. 2012; Schenk and Almirall 2010; Kochelek et al. 2015; Gyftokostas et al. 2020), multivariate statistics has not been demonstrated as a useful tool for separating localities using the widely available EMP data. For the LIBS technique, intensity of the emission lines reflects a combination of the elemental abundance and the emissivity properties. Separating localities in this data set required masking peaks from select major elements in part, because they hid more subtle and meaningful chemical variations. In contrast, for the quantitative EMP data, subtle differences in minor elements facilitated separation of geographic localities. However, because of the smaller sample suite, only broad categories could be distinguished. More EMP data from additional samples of each locality would further refine this procedure.

Loading plots (Fig. 11) exhibit the influence of each variable (elemental concentration for EMP and wavelength intensity for LIBS) on the direction of the principal component (PC) through the data set. Variables with values close to zero have minimal impact on the PC and are approximately the same in the samples as in the model. Variables with high positive values strongly influence the direction of the PC, have different values in the samples, and increase in concentration/intensity in the positive direction of the PC on a score plot. Variables with high negative values behave similarly, except that they increase in concentration/intensity in the negative direction of the PC. PCA models for each pair of localities were calculated for both LIBS and EMP data sets (Figs. 7 and 8, respectively); representative loading plots are presented in Figure 11. For the EMP data, score plots of PCA data indicated a more significant influence of the elements Mn, Cu, Al, Ca, and F, with a lesser influence of K, when separating sources (Fig. 11). Previous Paraíba provenance determinations typically use the quantities of six elements (Cu, Zn, Ga, Sr, Sn, Pb), obtained by LA-ICP-MS, for discrimination of geographic source (e.g., Katsurada et al. 2019). While Cu-Zn-Pb is more readily acquired by EMP and Ga to some degree, Pb and Sn are generally below EMP detection limits (typically <0.001 wt% oxide). Although these elements are below detection in EMP data sets, they are detectable by LIBS. The dominant elements in LIBS loading plots are Cu, Mn, Fe, Mg, Ti, Zn, K, H, Co, V, Li, and Na. Interestingly, Ca, Sr, Sn, and Pb were not observed in loading plots for the LIBS data, suggesting these elements did not exert a major influence on the separation of localities for the tourmalines studied here (Fig. 11). This implies not that something is missing from the LIBS data but, perhaps, that different elements (e.g., K, Bi, Mn, F) may enhance geographic discrimination. Such information allows the development of alternative diagrams for facilitating provenance determination with compositions determined by LA-ICP-MS. Additionally, this study indicates that statistical analyses of the two techniques, LIBS and EMP, emphasize different elements. Even with the analytical limitations of each technique, robust results for geographic provenance are attained.
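The reading of loading plots described above can be illustrated with a small, self-contained Python sketch. It computes the first principal component of mean-centered data by power iteration on the covariance matrix, a simple stand-in for the chemometric software used in this study, which is not specified here. The element labels and concentrations are invented for illustration and are not measurements from this work.

```python
import math


def first_pc(rows, n_iter=200):
    """Return (loadings, scores) for PC1 of mean-centered data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Start from the axis of the highest-variance variable, then iterate;
    # power iteration converges to the eigenvector of the largest eigenvalue.
    start = max(range(p), key=lambda j: cov[j][j])
    v = [1.0 if j == start else 0.0 for j in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Fix the arbitrary sign: make the largest-magnitude loading positive.
    k = max(range(p), key=lambda j: abs(v[j]))
    if v[k] < 0:
        v = [-x for x in v]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Hypothetical concentrations (columns: Cu, Mn, Al) for two invented groups.
elements = ["Cu", "Mn", "Al"]
data = [
    [1.8, 0.4, 40.1], [2.0, 0.5, 40.0], [2.2, 0.3, 39.9],  # group A
    [0.6, 2.4, 40.0], [0.5, 2.6, 40.1], [0.7, 2.8, 39.9],  # group B
]

loadings, scores = first_pc(data)
for name, value in zip(elements, loadings):
    print(f"{name}: loading = {value:+.3f}")
print("scores:", [round(s, 2) for s in scores])
```

In this toy example the nearly constant variable (Al) receives a loading close to zero, whereas Cu and Mn receive loadings of opposite sign, so the two hypothetical groups plot at opposite ends of the PC on a score plot.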

While the high success for discriminating provenance of remarkably similar tourmaline compositions is encouraging, there are caveats. The Paraíba sample set analyzed here is relatively small with limited variability, in part due to the rarity and cost of materials. No Ca-dominant Cu-bearing tourmalines were among those analyzed, although these are straightforward to distinguish chemically by their Ca concentrations. Not all Cu-bearing tourmalines analyzed display the characteristic “neon” blue hue of the prized Paraíba tourmalines (Fig. 1). Green, greenish-blue, and violet hues were included in the sample sets to capture the likely range of chemical variability for Cu-bearing tourmalines. Additionally, a large area is needed for the optimal number of LIBS analyses coupled with the 80 µm spot size. If the sample is zoned, the LIBS analytical spot can include overlapping chemical zones, unlike data obtained with the EMP. Other multivariate techniques, such as Bayesian statistics (e.g., McManus et al. 2018) or machine-learning algorithms, might enhance the discrimination further.

Overall, these data demonstrate that spectra obtained by LIBS can be used to provide provenance discrimination when coupled with multivariate statistics. Analyses are rapid, with minimal required sample preparation. Loading plots facilitate identification of important elements in discriminating sample localities and can be used to decipher potentially new criteria for provenance determination. Moreover, multivariate analyses of EMP data also allow categories to be differentiated based on more readily obtained chemical data. Application of the multivariate statistics to EMP data suggests that K, Bi, Mn, and F may be additional provenance discriminators. Together these data elucidate elements most useful for geographic discrimination of localities and the sourcing of Paraíba tourmaline.

Determining the provenance of mineral grains separated from their host rock has, for example, revolutionized paleogeographic reconstructions and provided new data on uplift histories and drainage basin development. While many provenance studies rely on zircon ages, expanding the types of detrital minerals used for provenance determination adds new, unexpected opportunities for past geologic reconstructions: the tourmaline source rock types and, for some compositions, the geographic locality can be distinguished. Additionally, in this time of conflict minerals, it is critically important to be able to source conflict gems and metals. This work provides a case study for new methods that allow minerals of very similar compositions to be separated based on chemical parameters. It shows, for the first time, the power of multivariate statistics applied to EMP data for separating tourmaline localities. Multivariate statistics applied to LIBS and EMP data provide a robust tool for provenance discrimination of Paraíba tourmalines, distinguishing Brazilian-sourced samples from African-sourced materials. Accurate sourcing of gemstones has economic implications, as does the sourcing of conflict stones, particularly when economic sanctions may be in place.

Accepted manuscript online December 20, 2023
Manuscript handled by William H. Peck

Shoshauna Farnsworth-Pinkerton and Janelle Hansen facilitated data acquisition. Brian Cook supplied additional information on the Brazilian localities; his insights are appreciated. Matt Wortel, University of Iowa, is thanked for excellent craftsmanship in making and polishing grain mounts. Insightful reviews by Russ Harmon and Beatrice Celata helped clarify and improve the paper and are gratefully appreciated. Funding for this project was provided through NSF-IF 1551434 (to Dutrow and Henry) and 1551415 (to McMillan) and is gratefully acknowledged. Robert Wagner-Beija Flor Gems donated select Brazilian samples from São José da Batalha, Paul Wild donated the Nigerian and Mozambican materials, and Brendan Laurs facilitated this connection for African material; all are thanked for their generosity. Materialytics kindly allowed usage of their LIBS instrument, with special thanks to Catherine McManus and her team.

Deposit item AM-24-69164. Online Materials are free to all readers. Go online, via the table of contents or article view, and find the tab or link for supplemental materials.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC-BY-NC-ND 4.0 License, which permits users to copy and redistribute the work, provided this is not done for commercial purposes and further does not permit distribution of the work if it is changed or edited in any way, and provided that the user gives appropriate credit, provides a link to the license, and that the licensor is not represented as endorsing use of the work.
