In a machine-learning workflow, data normalization is a crucial step that compensates for the large variation in data ranges and averages associated with different types of input measured in different units. However, most machine-learning implementations do not provide data normalization beyond the z-score algorithm, which subtracts the mean from the distribution and then scales the result by dividing by the standard deviation. Although the z-score converts data with Gaussian behavior to have the same shape and size, many of our seismic attribute volumes exhibit log-normal, or even more complicated, distributions. Because many machine-learning applications are based on Gaussian statistics, we have evaluated the impact of more sophisticated data normalization techniques on the resulting classification. To do so, we provide an in-depth analysis of data normalization in machine-learning classification by formulating and applying a logarithmic data transformation scheme to the unsupervised classifications (including principal component analysis, independent component analysis, self-organizing maps, and generative topographic mapping) of a turbidite channel system in the Canterbury Basin, New Zealand, as well as implementing a per-class normalization scheme for the supervised probabilistic neural network (PNN) classification of salt in the Eugene Island minibasin, Gulf of Mexico. Compared with simple z-score normalization, a single logarithmic transformation applied to each input attribute significantly increases the spread of the resulting clusters (and the corresponding color contrast), thereby enhancing subtle details in projection and unsupervised classification. However, this same uniform transformation produces less-confident results in supervised classification using PNNs. We find that more accurate supervised classifications result from applying class-dependent normalization to each input attribute.
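To illustrate the contrast the abstract draws between z-score normalization and a logarithmic transformation, the sketch below compares the two on a synthetic log-normal "attribute." The sign-preserving `log1p` transform shown here is an illustrative assumption, not necessarily the exact scheme formulated in the paper; it is one common way to compress a heavy-tailed attribute that may take negative values before z-scoring.

```python
import numpy as np

def zscore(x):
    """Standard z-score normalization: zero mean, unit standard deviation."""
    return (x - x.mean()) / x.std()

def log_zscore(x):
    """Illustrative logarithmic normalization (an assumption, not the
    paper's exact transform): a sign-preserving log1p compresses the
    heavy tail of a log-normal attribute, and the result is z-scored."""
    return zscore(np.sign(x) * np.log1p(np.abs(x)))

def skew(x):
    """Sample skewness, used here to quantify departure from symmetry."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

# Synthetic heavy-tailed attribute standing in for a seismic attribute
# with log-normal behavior.
rng = np.random.default_rng(0)
attr = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# The z-score recenters and rescales but cannot remove skewness, so the
# distribution stays heavy-tailed; the log transform makes it far more
# symmetric, which better suits Gaussian-based classifiers.
print("z-score skewness:    ", skew(zscore(attr)))
print("log z-score skewness:", skew(log_zscore(attr)))
```

Because the z-score is an affine map, it leaves the shape of the distribution unchanged; the logarithmic step is what actually reshapes a log-normal attribute toward the Gaussian behavior that many clustering and projection methods assume.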
