Compositional Data Analysis in the Geosciences: From Theory to Practice
Since Karl Pearson wrote his paper on spurious correlation in 1897, much has been said about the statistical analysis of compositional data, mainly by geologists such as Felix Chayes. A solution appeared in the 1980s, when John Aitchison proposed the use of logratios. Since then, the approach has expanded greatly, building mainly on the idea of the ‘natural geometry’ of the sample space. Statistics is expected to make sense of our perception of the natural scale of the data, and logratios make this possible for compositional data. This publication will be a milestone in this process.
This book will be of interest to geologists using statistical methods. It gives an intuitive justification of the methodology, illustrates it convincingly through case studies and presents user-friendly software. It also includes a section for those who need to see the proof of the mathematical consistency of the methods used.
Compositional data and their analysis: an introduction
Published: January 01, 2006
V. Pawlowsky-Glahn, J. J. Egozcue, 2006. "Compositional data and their analysis: an introduction", Compositional Data Analysis in the Geosciences: From Theory to Practice, A. Buccianti, G. Mateu-Figueras, V. Pawlowsky-Glahn
Compositional data are those which contain only relative information. They are parts of some whole. In most cases they are recorded as closed data, i.e. data summing to a constant such as 100%; whole-rock geochemical data are classic examples. Compositional data have important and particular properties that preclude the application of standard statistical techniques to such data in raw form. Standard techniques are designed to be used with data that are free to range from −∞ to +∞. Compositional data are always positive and, when given in closed form, range only from 0 to 100 or to whatever other constant is used. If one component increases, others must, perforce, decrease, whether or not there is a genetic link between these components. This means that the results of standard statistical analysis of the relationships between raw components or parts in a compositional dataset are clouded by spurious effects. Although such analyses may give apparently interpretable results, they are, at best, approximations and need to be treated with considerable circumspection. The methods outlined in this volume are based on the premise that it is the relative variation of components which is of interest, rather than absolute variation. Log-ratios of components provide the natural means of studying compositional data. In this contribution the basic terms and operations are introduced using simple numerical examples to illustrate their computation and to familiarize the reader with their use.
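As a rough illustration of the ideas in this abstract, the sketch below closes a set of parts to 100% and then takes log-ratios of the closed parts. It is not taken from the chapter: the function names `closure` and `clr` and the sample values are assumptions made for this example, and the centred log-ratio (each part divided by the geometric mean of all parts, then logged) is only one of several log-ratio transforms that could be used here.

```python
import math


def closure(parts, total=100.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


# Hypothetical three-part sample in arbitrary units (not data from the chapter).
raw = [12.0, 3.0, 45.0]

closed = closure(raw)                    # parts expressed as percentages
rescaled = closure([10 * p for p in raw])  # same sample recorded in other units

clr_raw = clr(raw)
clr_closed = clr(closed)

print(closed)      # closed data sum to 100
print(rescaled)    # identical to `closed`: only relative information survives
print(clr_closed)  # log-ratio coordinates, free to take any real value
```

Two properties are worth noting in this sketch. Multiplying every part by the same factor leaves the closed composition unchanged, which is what is meant by data carrying only relative information. The log-ratio values are likewise the same whether they are computed from the raw or the closed parts, and they are not confined to the interval from 0 to 100.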