Geochemical data are generally derived from government and industry geochemical surveys that cover areas at various spatial resolutions. These survey data are difficult to assemble and integrate because of their heterogeneous mixture of media, size fractions, methods of digestion and analytical instrumentation. The assembled datasets often contain thousands of observations and 50 or more elements. Although the assembly of these data is a challenge, the resulting integrated datasets provide an opportunity to discover a wide range of geochemical processes that are associated with underlying geology, alteration, landscape modification, weathering and mineralization. The use of data analysis and statistical visualization methods, combined with geographical information systems, provides an effective environment for process identification and pattern discovery in these large sets of data.
Modern methods of evaluating data for associations, structures and patterns are grouped under the term ‘data mining’. Data mining includes the application of multivariate data analysis and statistical techniques, combined with geographical information systems, and can significantly assist the task of data interpretation and subsequent model building. Geochemical data require special handling when measures of association are required. Because geochemical data are compositional, logratio transformations are required to eliminate the effects of closure. Exploratory multivariate methods include: scatterplot matrices (SPLOM), adjusting for censored and missing data, detecting atypical observations, computing robust means, correlations and covariances, principal component analysis, cluster analysis and knowledge-based indices of association. Modelled multivariate methods include discriminant analysis, analysis of variance, classification and regression trees, neural networks and related techniques. Many of these topics are covered with examples to demonstrate their application.
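As an illustration of the logratio step mentioned above, the following is a minimal sketch of the centred logratio (clr) transform, one common choice among several logratio transforms. The text does not state which logratio is used, so the choice of clr is an assumption here, as are the function name `clr` and the sample values, which are hypothetical and not taken from any survey.

```python
import numpy as np


def clr(compositions):
    """Centred logratio transform, applied row by row.

    Each row is one observation; each column is one component (element).
    The result is log(x) minus the row mean of log(x), which equals the
    log of each part divided by the geometric mean of its row.
    """
    x = np.asarray(compositions, dtype=float)
    if np.any(x <= 0):
        # Zeros and censored values must be handled before taking logs.
        raise ValueError("clr requires strictly positive values")
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)


# Hypothetical three-part compositions (illustrative values only).
# The second row is the first row rescaled to different units.
samples = np.array([
    [10.0, 20.0, 70.0],
    [1.0, 2.0, 7.0],
])

transformed = clr(samples)
print(transformed)
```

Because the transform depends only on ratios between parts, the two rows give the same clr values even though they are expressed on different scales, and each transformed row sums to zero. Measures of association such as correlations or covariances would then be computed on the transformed values rather than on the raw concentrations.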