GEOREF RECORD

Monte Carlo simulation of selected binomial similarity coefficients; II, Effect of sparse data

Christopher G. Maples and Allen W. Archer
Palaios (February 1988) 3 (1): 95-103

Abstract

Binary similarity coefficients are applied widely to many types of paleontological data. Unfortunately, many of these paleontological data sets are "sparse" (i.e., contain substantially more than 50% "zeros," or absences, in the data matrix). We have run a series of Monte Carlo simulations with sparse data (10% "1"s, or p = 0.10) on a series of selected coefficients that have been tested previously using 50% "1"s, or p = 0.50 (Archer and Maples, in press). In decreasing order of ability to approximate the mean of the sparse-data binomial (0.10) and general utility, we rank the coefficients tested as Dice, Braun-Blanquet, Simpson, Jaccard, Baroni-Urbani and Buser, Simple Matching, and Hamann. This ranking is generally the opposite of that for Monte Carlo simulations using 50% "1"s, or p = 0.50 (Archer and Maples, in press). Because the mean has such a low value, Dice, Braun-Blanquet, Simpson, Jaccard, and Baroni-Urbani and Buser all truncate the lower zones of significance. Therefore, statistically meaningful comparisons of the differences between any two samples cannot be made using these coefficients. This does not render these particular coefficients useless, as long as it is realized that significantly similar linkages will occur at what would normally be interpreted as abnormally low values, especially in comparison with data in which a greater percentage of "1"s is present. In contrast, the Simple Matching and Hamann coefficients greatly overestimate the mean because the mutual absence of a character or taxon is considered to be a trait in common between any two samples. Because of the shifted mean, lower zones of significance are present; however, all linkages are high, even those that are significantly dissimilar. The mutual absence of one or more characters between any two samples is usually meaningless for most paleontological problems; coefficients should therefore be selected with this in mind.
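
The behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' program: the abstract gives neither the formulas nor the simulation design, so the coefficient definitions below are the commonly cited textbook forms, and the function names, the number of characters per sample (n_chars), the number of trials, and the seed are all assumptions made for illustration. For two presence/absence samples, a counts characters present in both, b and c count characters present in only one, and d counts characters absent from both.

```python
import math
import random


def match_counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 sequences."""
    a = sum(1 for i, j in zip(x, y) if i and j)        # present in both
    b = sum(1 for i, j in zip(x, y) if i and not j)    # present in x only
    c = sum(1 for i, j in zip(x, y) if j and not i)    # present in y only
    d = len(x) - a - b - c                             # absent from both
    return a, b, c, d


def _ratio(num, den):
    # Undefined (0/0) comparisons are reported as NaN rather than guessed.
    return num / den if den else float("nan")


def coefficients(x, y):
    """Commonly cited forms of the seven coefficients (assumed, not from the paper)."""
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    root = math.sqrt(a * d)
    return {
        "Dice": _ratio(2 * a, 2 * a + b + c),
        "Braun-Blanquet": _ratio(a, max(a + b, a + c)),
        "Simpson": _ratio(a, min(a + b, a + c)),
        "Jaccard": _ratio(a, a + b + c),
        "Baroni-Urbani and Buser": _ratio(root + a, root + a + b + c),
        "Simple Matching": _ratio(a + d, n),
        "Hamann": _ratio((a + d) - (b + c), n),
    }


def simulate(p=0.10, n_chars=100, trials=2000, seed=0):
    """Mean of each coefficient over random sample pairs with P(1) = p."""
    rng = random.Random(seed)
    totals, used = {}, {}
    for _ in range(trials):
        x = [1 if rng.random() < p else 0 for _ in range(n_chars)]
        y = [1 if rng.random() < p else 0 for _ in range(n_chars)]
        for name, value in coefficients(x, y).items():
            if not math.isnan(value):                  # skip undefined pairs
                totals[name] = totals.get(name, 0.0) + value
                used[name] = used.get(name, 0) + 1
    return {name: totals[name] / used[name] for name in totals}


if __name__ == "__main__":
    for name, mean in simulate(p=0.10).items():
        print(f"{name:25s} {mean:.3f}")
```

For independent samples, the expected Simple Matching value is p^2 + (1 - p)^2, which is 0.82 at p = 0.10, and Hamann equals twice Simple Matching minus one. Both are inflated by the d term (mutual absences), which the other coefficients either ignore or, in the case of Baroni-Urbani and Buser, down-weight. Rerunning simulate with p=0.50 gives a rough comparison with the non-sparse case mentioned in the abstract.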


ISSN: 0883-1351
Serial Title: Palaios
Serial Volume: 3
Serial Issue: 1
Title: Monte Carlo simulation of selected binomial similarity coefficients; II, Effect of sparse data
Affiliation: Kans. Geol. Surv., Univ. Kans., Lawrence, KS, United States
Pages: 95-103
Published: 198802
Text Language: English
Publisher: Society of Economic Paleontologists and Mineralogists, Tulsa, OK, United States
References: 14
Accession Number: 1988-053962
Categories: General paleontology
Document Type: Serial
Bibliographic Level: Analytic
Illustration Description: illus. incl. 1 table
Secondary Affiliation: AZTeL, USA, United States
Country of Publication: United States
Secondary Affiliation: GeoRef, Copyright 2017, American Geosciences Institute. Reference includes data supplied by SEPM (Society for Sedimentary Geology), Tulsa, OK, United States
Update Code: 1988