Abstract

Groundwater data-sets containing pH and major cation–anion chemistry are widely available, but data-sets that also include trace metals are much rarer. This paper examines two data imputation methods for predicting U concentrations from pH, major cations, anions and F in a data-set where some of the U concentrations are missing. The methods evaluated were self-organizing maps (SOM) and expectation maximization (EM). Evaluations were made using a groundwater data-set of 187 samples from NSW and Victoria, which contained a wide range of U concentrations up to 225 μg/l. Tests in which 25% and 50% of the U concentrations were set to missing showed that, at 25% missing, SOM gave reasonable estimates and identified all the samples with higher U, whereas EM did not clearly identify the higher samples. At 50% missing, neither method could accurately identify the higher U concentrations. Thus, imputation using samples with missing data included in the training data-set does not appear to be practical. However, a SOM pre-trained on a data-set with no missing U concentrations may be used to impute U concentrations for samples with 100% missing U data. Training on the original data-set and then imputing concentrations for a second set of 360 samples showed that the samples with higher measured U concentrations could generally be identified, but that other samples were also estimated to be U-rich. This method could substantially reduce the number of samples in a large data-set that require further investigation.

The performance of imputation for U reflects the complex interaction of water chemistry, geology and mineralogy that actually determines the U concentrations. Imputation is a useful method for improving estimates of data statistics. SOM, through its model-free approach, is a useful addition to the numerical analysis toolbox for geochemists.
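
The pre-trained SOM approach described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the map size, learning schedule, function names (`train_som`, `impute`) and the synthetic three-variable toy data are all assumptions made for illustration, and the variables are assumed to be already scaled to comparable ranges. The sketch trains a small map on complete samples, then fills a missing value (marked `None`, standing in for U) from the best-matching node found using only the observed variables.

```python
import math
import random

def masked_dist2(x, w):
    """Squared Euclidean distance over the observed (non-None) components of x."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w) if xi is not None)

def best_matching_unit(x, codebook):
    """Index of the node closest to x on the observed components."""
    return min(range(len(codebook)), key=lambda k: masked_dist2(x, codebook[k]))

def train_som(data, rows=3, cols=3, epochs=60, seed=0):
    """Train a small rectangular SOM; None entries are skipped in every update."""
    rng = random.Random(seed)
    dim = len(data[0])
    observed = [[s[j] for s in data if s[j] is not None] for j in range(dim)]
    # Initialise node weights uniformly within the observed range of each variable.
    codebook = [[rng.uniform(min(o), max(o)) for o in observed]
                for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                            # decaying learning rate
        radius = 0.5 + 0.5 * max(rows, cols) * (1.0 - frac)  # shrinking neighbourhood
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            br, bc = grid[best_matching_unit(x, codebook)]
            for k, (r, c) in enumerate(grid):
                # Gaussian neighbourhood weight on the map grid.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
                for j, xj in enumerate(x):
                    if xj is not None:
                        codebook[k][j] += rate * h * (xj - codebook[k][j])
    return codebook

def impute(x, codebook):
    """Replace None entries of x with the weights of its best-matching node."""
    w = codebook[best_matching_unit(x, codebook)]
    return [wj if xj is None else xj for xj, wj in zip(x, w)]

# Synthetic toy data (NOT groundwater measurements): two predictor variables and
# a third variable, standing in for U, that rises and falls with them.
toy_rng = random.Random(1)

def toy_sample(level):
    return [level + toy_rng.gauss(0.0, 0.03) for _ in range(3)]

training = [toy_sample(0.1) for _ in range(30)] + [toy_sample(0.9) for _ in range(20)]
som = train_som(training)  # trained on samples with no missing values

low = impute([0.1, 0.1, None], som)   # third variable 100% missing
high = impute([0.9, 0.9, None], som)
print(low[2], high[2])
```

Because the distance is computed only over observed components, the same `train_som` function also accepts training samples that contain `None`, which corresponds to the 25% and 50% missing tests. Real groundwater variables would need scaling or transformation before use; the choice of transformation is left open here.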
