The fossil record is notoriously imperfect and biased in representation, hindering our ability to place fossil specimens into an evolutionary context. For groups with fossil records mostly consisting of disarticulated parts (e.g., vertebrates, echinoderms, plants), the limited morphological information preserved sparks concerns about whether fossils retain reliable evidence of phylogenetic relationships and lends uncertainty to analyses of diversification, paleobiogeography, and biostratigraphy in Earth's history. To address whether a fragmentary past can be trusted, we need to assess whether incompleteness affects the quality of phylogenetic information contained in fossil data. Herein, we characterize skeletal incompleteness bias in a large dataset (6585 specimens; 14,417 skeletal elements) of fossil squamates (lizards, snakes, amphisbaenians, and mosasaurs). We show that jaws + palatal bones, vertebrae, and ribs appear more frequently in the fossil record than other parts of the skeleton. This incomplete anatomical representation in the fossil record is biased against regions of the skeleton that contain the majority of morphological phylogenetic characters used to assess squamate evolutionary relationships. Despite this bias, parsimony- and model-based comparative analyses indicate that the most frequently occurring parts of the skeleton in the fossil record retain similar levels of phylogenetic signal as parts of the skeleton that are rarer. These results demonstrate that the biased squamate fossil record contains reliable phylogenetic information and support our ability to place incomplete fossils in the tree of life.

The fossil record is a fundamental natural historical archive for understanding evolution and Earth's history. Without information from fossils, we would have considerably less knowledge concerning the extinction and emergence of lineages (Sun et al. 1998; Valentine et al. 1999; Field et al. 2020), the nature of biotic crises (Sun et al. 2012; Lyson et al. 2019; Petsios et al. 2019), and major phenotypic innovations in Earth's flora and fauna (Daeschler et al. 2006; Murdock and Donoghue 2011; Stein et al. 2012). But behind the illuminating biological information contained in the fossil record is the reality that fossil data are inherently incomplete (Darwin 1859; Foote and Sepkoski 1999; Smith 2001; Kidwell and Holland 2002), due to geological factors (Raup 1976; Smith and McGowan 2005), factors related to fossil preservation (e.g., taphonomic bias; Sansom et al. 2010), and asymmetrical research interest and sampling intensity among workers (e.g., sampling bias; Smith 1994, 2001). These biases have been characterized in the fossil record of a variety of organismal groups (Crane et al. 2004; Brocklehurst et al. 2012; Dean et al. 2016; Driscoll et al. 2019), and a growing number of sampling proxies can be used to account for the multitude of factors that distort paleobiological data. As a result, our ability to explore major questions related to biodiversity in the fossil record is enhanced, but methodological problems remain (Dean et al. 2016; Sakamoto et al. 2017). Furthermore, the patterns of biases associated with a majority of fossil groups are uncharacterized, and the association between biased records and phylogenetic data content remains underexplored (Sansom et al. 2010, 2017; Sansom 2015; Brocklehurst and Benevento 2020).

Taphonomic and sampling biases still represent consistent barriers to reconstructing the phylogenetic relationships of fossil organisms (Patterson 1981; Sansom et al. 2010, 2017; Sansom 2015; Brocklehurst and Benevento 2020), which are essential to the evolutionary frameworks that synthetic studies of biodiversity (Raup 1979), biostratigraphy (Norell and Novacek 1992), and paleobiogeography (Benson et al. 2013) rely upon (Sakamoto et al. 2017). Incomplete fossils with missing morphological data have been shown to decrease the accuracy of inferred phylogenies (Guillerme and Cooper 2016; Vernygora et al. 2020), but this effect is lessened when fossil taxa are scored alongside extant taxa and more morphological characters are coded (Guillerme and Cooper 2016) or when the number of taxa included in the analysis is increased (Vernygora et al. 2020).

The most straightforward way to minimize the negative effects of missing fossil morphological data on inferring evolutionary relationships is by utilizing the most complete and best-preserved fossil specimens. Most animal and plant fossil records, however, are made up of fragmentary remains and disarticulated parts (Crane et al. 2004; Brocklehurst et al. 2012; Dean et al. 2016), and by focusing primarily on complete specimens for phylogenetic analyses, the majority of extinct biodiversity and available fossil data in natural history collections is often excluded. Consequently, important fossil taxa known from fragmentary material remain underutilized in phylogenetic analyses, and we are left with considerable uncertainty in the evolutionary relationships of most known fossil organisms, which obscures our understanding of major evolutionary patterns in Earth's history (Sansom et al. 2010; Dornburg et al. 2015; Sansom 2015).

Maximizing the vast amount of evolutionary information available in fossil collections necessitates a quantitative characterization of the biases present in this record. Furthermore, if we wish to accurately assess the evolutionary relationships of incompletely preserved fossil specimens, we need to know whether the phylogenetic data preserved in these incomplete samples are reliable and consistent. We compiled a large dataset from natural history collections (6585 specimens; 14,417 skeletal elements) to characterize skeletal representation bias in the extensive and diverse fossil record of squamates (lizards, snakes, amphisbaenians, and mosasaurs). We then applied parsimony- and model-based estimates of phylogenetic signal to test a major question fundamental to paleobiology: Does the observed fossil record contain reliable phylogenetic information?

Sampling the Fossil Squamate Skeleton in Natural History Collections

Despite their biases, natural history collections are the most data-rich record of the evolutionary history of squamates and are the basis for all morphological and most occurrence-based analyses of their fossil record. Although natural history collections have been used to assess the nature of bias and asymmetrical preservation in the squamate fossil record, these assessments are either limited to general observations (Nydam 2013; Rage 2013) or restricted taxonomically to mosasaurs (Driscoll et al. 2019). In this study, we quantify which regions of the squamate skeleton are more prevalent in natural history collections and, by extension, their observed fossil record.

To understand which regions of the fossil squamate skeleton appear more frequently in museum collections, we divided the skeleton into seven discrete regions: (1) jaws and palatal bones in the skull (all bones in the mandible, premaxilla, maxilla, vomer, palatine, pterygoid, and teeth); (2) posterior cranial bones (nasal, frontal, parietal, jugal, quadrate, braincase, etc.); (3) axial elements (i.e., vertebrae and ribs); (4) pectoral girdle; (5) pelvic girdle; (6) appendicular elements; and (7) dermal elements (Fig. 1, Supplementary Fig. S1). Using these criteria, we sampled for the number of occurrences of fossil squamate skeletal elements belonging to each region. We surveyed 6585 fossil squamate specimens and tallied 14,417 occurrences of individual skeletal elements across three major fossil squamate body plans: giant, fully aquatic/marine squamates (mosasaurs); small to medium-sized, four-limbed, terrestrial lizards; and small to medium-sized, limb-reduced/limbless squamates (e.g., snakes, amphisbaenians). Our surveyed specimens range in age from the Late Jurassic to the late Pleistocene (Fig. 2A).

This study used in-person observations of fossil squamate collections and surveyed readily available digital collections databases. In-person sampling (institutions: American Museum of Natural History [AMNH], New York, N.Y., U.S.A.; Denver Museum of Nature & Science [DMNH], Denver, Colo., U.S.A.; and Yale Peabody Museum of Natural History [YPM], New Haven, Conn., U.S.A.) was carried out by identifying squamate fossils to the lowest taxonomic level and with as specific anatomical terminology as possible. Taphonomic bias and preservation mode played key roles in the precision of taxon identification and anatomical assignment. Incomplete specimens for which not enough diagnostic information was preserved were often restricted to identification at order/suborder/family taxonomic levels, using generalized anatomical terminology (e.g., “partial jawbone,” “trunk vertebra,” “metapodial”). More complete specimens could be identified at genus/species taxonomic levels, with specific anatomical terminology (e.g., “anterior right maxilla,” “proximal left femur”).

To broadly characterize fossil squamate morphological data available in museum collections globally, we sampled online electronic databases from six institutions on three continents (Institute for Vertebrate Paleontology and Paleoanthropology [IVPP], Beijing, China; Natural History Museum London [NHMUK], London, U.K.; Natural History Museum of Los Angeles County [LACM], Los Angeles, Calif., U.S.A.; the Smithsonian National Museum of Natural History [USNM], Washington, D.C., U.S.A.; University of Florida Museum of Natural History [UFMNH], Gainesville, Fla., U.S.A.; and YPM). Our criteria for selecting these institutions was dependent on readily available digital vertebrate paleontology collections databases on the institutions’ websites, with collections data that could be downloaded as .csv files or efficiently converted to a dataframe. The compilation of each dataframe was achieved on each institution's website by a simple keyword search for “Squamata” in the taxonomy category. An example of this workflow is given in Supplementary Figure S2, using YPM's website.

Once the dataframe was downloaded, we utilized the catalogued specimen labels to discern which elements of the fossil squamate skeleton were included under a given catalogue number. This was accomplished using keyword searches (Command/Control+F) for each bone in the squamate skeleton (Supplementary Data S1–S14) in Microsoft Excel. Because multiple skeletal elements are frequently included under a single specimen catalogue number, our search criterion was “number of occurrences” of a skeletal element on the specimen labels, rather than the number of individual elements with a catalogue number. If specific counts for a given skeletal element were indicated on a specimen label (e.g., vertebra [×35]), that counted number was included in the total specimens from the collection. If no count was included on a specimen label, then we counted the occurrence of that skeletal element as a single occurrence. We designed the keyword searches to be as inclusive as possible regarding the anatomical language on each specimen label. Some specimens had thorough anatomical descriptions (e.g., “posterior portion of the left maxilla”), while others were more general (e.g., “jaw fragment”). Regardless of level of detail on the label, if an element could be assigned to one of the seven anatomical regions we designated, we included it in the survey. Because our anatomical binning of phylogenetic characters for the assessments of phylogenetic signal uses the same seven anatomical regions (see “Parsimony-based Phylogenetic Signal”), this allowed us to include the specimen data with less specific labels in our characterization of bias in museum collections.

Measurement of Phylogenetic Signal

In our survey of fossil squamate collections, we expect that taphonomic and anthropogenic sampling biases will lead to some regions of the skeleton being more frequently represented compared with others (Supplementary Fig. S3). In a phylogenetic context, this imbalanced representation of the skeleton may not align with the regions of the skeleton that are more heavily emphasized for morphological character selection in phylogenetic analyses. For example, the vertebrate skull is usually the most heavily emphasized and character-dense skeletal region in phylogenetic analyses, whereas other regions of the skeleton, such as the spinal column or appendicular regions, are comparatively less emphasized and less character dense. If a group of vertebrates has a fossil record in which few or no skull material is preserved/collected and instead is mostly represented by vertebrae, then the vertebrae are “overrepresented,” given the smaller number of phylogenetic characters they can be scored for. Conversely, the skull material, or lack thereof, is “underrepresented” in the fossil record, given the larger quantity of phylogenetic characters it can be scored for. In this scenario, the accuracy of phylogenetic analyses for this fossil group is dependent on the fidelity of the overrepresented vertebral character evolution to the hypothesized topology of that group. To make sure these characters are reliable for reconstructing phylogeny, it is necessary to use measurements of phylogenetic signal to assess the quality of morphological character data both in overrepresented and underrepresented parts of the skeleton made available by the biased fossil record.

The presence of competing morphology-based hypotheses for the higher-level evolutionary relationships of squamates (Gauthier et al. 2012; Simões et al. 2018) represents an opportunity to explore patterns of phylogenetic signal in the extensive fossil record (>242 Myr; Simões et al. 2018) of a major component of the modern vertebrate fauna (>11,182 extant species; Uetz et al. 2021). Broadly, phylogenetic signal describes the tendency of closely related species to resemble each other in a given trait, as the result of shared evolutionary history (Pagel 1999; Blomberg et al. 2003; Borges et al. 2019). Phylogenetic signal is measured by tracking how closely the evolution of a character trait aligns with a given evolutionary hypothesis. We used two topologically incongruent morphological phylogenetic datasets, one from Gauthier et al. (2012) (GEA) and the other from Simões et al. (2018) (SEA), to assess the reliability of phylogenetic data present in the squamate fossil record.

The GEA and SEA character matrices and .tre files of the strict consensus trees were downloaded from a publicly available phylogenetic database on G. T. Lloyd's website (Lloyd 2019). The time-calibrated Bayesian Majority-Rule Consensus tree from the SEA dataset used for calculation of the δ-statistic was obtained upon request from T. R. Simões. All phylogenetic datasets used in this study are available in Supplementary Data S15–S18 and S24–S29. To minimize the impact of highly incomplete fossils on estimates of phylogenetic signal, fossil taxa were removed from the GEA and SEA datasets using Mesquite (Maddison and Maddison 2018) and the drop.tip function in the ape package (Paradis and Schliep 2019) in R (R Core Team 2013), and only the extant taxa included in GEA and SEA datasets were considered herein. To effectively calculate measurements of character evolution and phylogenetic signal, morphological characters that remained in a constant state (i.e., zero changes across the entire evolutionary tree with dropped fossil taxa) were removed before carrying out analyses.

Parsimony-based Phylogenetic Signal

To assess phylogenetic signal of skeletal elements in a parsimony-based framework, we calculated character consistency index (CI; Kluge and Farris 1969) and retention index (RI; Farris 1989) values corresponding to the seven anatomical bins utilized in our sampling of fossil squamate collections. We used these values to measure differences in homoplasy (CI) and retained synapomorphy (RI) for characters corresponding to the seven regions of the fossil squamate skeleton. Individual CI and RI values for the 572 pruned GEA characters and 222 pruned SEA characters were calculated using the ape, TreeSearch (Smith 2018), and phangorn (Schliep 2011) packages in R (Supplementary Data S19–S22). The characters were binned according to anatomical region in our sampling of museum collections (Fig. 1, Supplementary Fig. S1), and the distributions (see Supplementary Figs. S4, S5) were then compared with one another using two nonparametric statistical tests: (1) the Mann-Whitney U-test; and (2) the Kolmogorov-Smirnov test (Supplementary Data S23). Because we performed multiple statistical comparisons of CI and RI values across all anatomical regions (GEA: n = 37; SEA: n = 29), statistical tests were run using a Bonferroni correction on the α value. Additionally, because the GEA and SEA datasets contain 30 overlapping operational taxonomic units (OTUs), we swapped the character matrices (Supplementary Data S76, S77) and their corresponding topologies (Supplementary Data S80, S81) to compare phylogenetic signal of the same character partitions using a fundamentally different topology (Supplementary Figs. S6–S8).

Model-based Phylogenetic Signal

Recently, the δ-statistic (Borges et al. 2019) was developed as a means of utilizing model-based phylogenetic comparative methods to assess phylogenetic signal within categorical character data and has been recently applied in addressing paleobiological questions (e.g., Deline et al. 2020). Calculating the δ-statistic for a given morphological character relies on ancestral character state probabilities at each node in a phylogeny. The model operates under the expectation that the better a phylogeny is associated with a given trait, the better it is able to infer ancestral states with minimal uncertainty. The ancestral state probabilities simulate “entropy” at a given node: the higher the uncertainty in a given character state (i.e., less phylogenetic signal), the higher the entropy value at that node (for further explanation, see Borges et al. 2019). δ is a measurement of the ratio of the distribution of higher entropy values to the distribution of lower entropy values for a given character across all nodes in a phylogeny. δ is higher when the distribution of entropies favors lower values (i.e., less ancestral state uncertainty) over higher values (i.e., more ancestral state uncertainty). Ancestral states for both the GEA and SEA characters were reconstructed using the R wrapper BTW (Griffin 2018) for the BayesTraits (Meade and Pagel 2016) program (Supplementary Fig. S9). Trees for the GEA dataset did not have branch lengths, so we time-calibrated the tree with a dataset of all fossil ages using the minimum branch length method (Laurin 2004). We dated 78 internal nodes dated using a previously published time-calibrated version of the GEA tree (Pyron 2017) and minimum branch lengths of 3, 5, and 7 Myr with the BinTimePaleoPhy function in the R package paleotree (Bapst 2012).

We initially ran these analyses using all non-constant characters in the GEA and SEA datasets, as we had done for our parsimony-based measurements of phylogenetic signal (Supplementary Figs. S10–S13). However, in contrast to our parsimony-based measurements, we found a strong negative relationship between the percentage of missing/nonapplicable scores and the δ-statistic value for a given character (higher percentage of missing/nonapplicable scores correlates with lower δ-statistic values) (Supplementary Figs. S14–S16, Supplementary Discussion, Supplementary Data S47). For the analyses presented herein, we only considered characters for which all taxa were scored. Additionally, the extreme morphological disparities between limbed and limbless squamate taxa contributed to a large amount of missing/nonapplicable scorings when considering the two body plans together. Because of this, we ran our analyses separately for limbed and limbless squamate taxa for the GEA dataset (Supplementary Figs. S17–S19, S21) and limbed squamates for the SEA dataset (Supplementary Fig. S20). Because there are only 8 extant legless taxa in the SEA dataset, we could not analyze δ-statistic values for this subset of data; δ-statistic model sensitivity drastically decreases using <20 taxa (Borges et al. 2019).

We used the drop.tip function in the ape package (Paradis and Schliep 2019) in R (R Core Team 2013) to remove the appropriate taxa for each analysis. We then calculated node probabilities for each character state for the non-constant, completely scored GEA characters and SEA characters. The δ-statistic was then calculated for each character using the set of resulting node probabilities for the GEA and SEA datasets for each character using the R script from Borges et al. (2019) (GitHub branch: mrborges23/delta_statistic) (Supplementary Data S30–S45, S52–S71). Bootstrap tests for differences between median δ-statistic values among anatomical distributions of phylogenetic characters were carried out using 10,000 samples with replacement in R (Supplementary Data S46). Because we performed multiple comparisons of δ-statistic values across all anatomical regions (GEA limbed: n = 37; SEA limbed: n = 29; GEA limbless: n = 11; GEA snakes: n = 16), bootstrap tests were run using a Bonferroni correction on the α value. Similarly to our parsimony-based assessments of phylogenetic signal, we swapped character matrices (Supplementary Data S78, S79) and topologies (Supplementary Data S80, S81) using 23 overlapping legged OTUs among the GEA and SEA datasets. Code to repeat parsimony- and model-based analyses of phylogenetic signal is available at the GitHub branch chwoolle/PhylogeneticSignal.

Characterizing Bias in the Squamate Fossil Record

Similar patterns of skeletal region representation are found in the fossil record of the three major squamate body plans (Fig. 2B–D). Among mosasaurs (Fig. 2B), 83.41% of the observed fossil elements came from the axial (65.47%, n = 2544) and jaws + palatal (17.94%, n = 697) regions of the skeleton. Among lizards (Fig. 2C), 77.8% of the observed fossil elements came from the axial (14.77%, n = 1050) and jaws + palatal (63.03%, n = 4308) regions of the skeleton. The overwhelming majority (96.95%, n = 2983) of fossil legless squamate skeletal elements (Fig. 2D) belong to the axial region of the skeleton, whereas jaws + palatal elements (2.67%, n = 82) make up almost all of the remaining occurrences. These results reveal that the majority of fossil squamate morphological data (84.28%; Supplementary Fig. S3A) in museum collections is from the axial and jaws + palatal regions of the skeleton, whereas other regions of the skeleton (posterior cranial, dermal, pectoral girdle, pelvic girdle, and appendicular elements) do not occur as frequently. Additionally, notable discrepancies in morphological data are present between specimens sampled in person and specimens sampled using electronic collections databases, with in-person sampling yielding a higher proportion of dermal and appendicular skeletal elements than sampled in electronic databases (Supplementary Fig. S3A). These differences emphasize the importance of increased physical access to natural history collections and caution against overreliance on databases as a panacea for paleobiological analyses (see Supplementary Text).

We interpret the distribution of fossil squamate skeletal data using the following terminology: the jaws + palatal and axial regions of the fossil squamate skeleton are comparatively overrepresented in natural history collections, whereas the posterior cranial region of the skull, pectoral and pelvic girdles, appendicular elements, and dermal elements are comparatively underrepresented. For example, the low occurrence of squamate posterior cranial skeletal elements in collections (4.74% of skeletal element occurrences) contrasts with an outsized proportion of morphological characters sourced from posterior cranial skeletal elements in the GEA and SEA datasets (43.61% and 34.29%, respectively; Supplementary Fig. S3B). Therefore, in both raw numbers and within the context of phylogenetic data, the posterior cranial region of the skull is underrepresented in the squamate fossil record. Similarly, the axial region of the skeleton, which makes up the largest portion of fossil skeletal data (46.79%) but a small portion of phylogenetic characters in the GEA and SEA datasets (4.10% and 14.99%, respectively), is overrepresented in the squamate fossil record. This decoupling of skeletal partitions represented in fossil data from the partitions emphasized in phylogenetic datasets necessitates a test of whether overrepresented regions in the fossil record contain as much, more, or less phylogenetically informative character data than underrepresented regions.

Analyses of Phylogenetic Signal

Parsimony-based Assessments of Phylogenetic Signal

Overall, we found no statistically significant difference between the distributions of CI and RI values (Fig. 3B,C,E,F) among characters corresponding to either overrepresented or underrepresented skeletal regions (Mann-Whitney U, Kolmogorov-Smirnov tests, all p-values > α; Supplementary Data S23). If we parse out the character CI and RI distribution per squamate anatomical bin (Supplementary Figs. S4, S5, Supplementary Data S15–S23), we observe that distribution shapes and median CI values (SEA dataset) and RI values (GEA dataset) of characters corresponding to the pectoral girdle are statistically significantly different from other regions of the skeleton. Among characters in the other anatomical bins (jaws + palate, axial, posterior cranial, pelvic, appendicular, dermal, other characters), the amount of homoplasy and retained synapomorphy of characters is not significantly different when compared with one another in both GEA and SEA datasets (Supplementary Data S23).

With character matrices and topologies swapped (Supplementary Fig. S6), we found no statistically significant differences in median values and distribution shapes of overrepresented and underrepresented skeletal regions at-large (Mann-Whitney U, Kolmogorov-Smirnov tests, all p-values > α; Supplementary Data S23). However, when we parse out the GEA matrix versus SEA topology results by anatomical region (Supplementary Fig. S7), we observe statistically significant differences in median value and distribution shape of the RI values of characters corresponding to the pectoral girdle and appendicular elements are statistically significantly different from other regions of the skeleton (Supplementary Data S23). The median and distribution shape of CI values of GEA matrix versus SEA topology characters corresponding to the pelvic girdle were statistically significantly different from values for other regions of the skeleton (Supplementary Data S23). For the SEA matrix versus GEA topology, there were no statistically significant differences among any of the anatomical regions (all p-values > α; Supplementary Data S23).

Model-based Assessments of Phylogenetic Signal

For both the GEA and SEA datasets, characters with 0% missing data that correspond to overrepresented regions of the squamate skeleton in the fossil record exhibit: (1) similar overall variation in δ-statistic values and (2) similar median δ-statistic values compared with characters with 0% missing data that correspond to underrepresented regions of the squamate skeleton in the fossil record (Figs. 4, 5). Two-sample bootstrap tests comparing 10,000 resamples with replacement of the median δ-statistic values of overrepresented and underrepresented characters, using a Bonferroni-corrected α (Supplementary Dataset S46), reveal that the differences are not statistically significant, regardless of evolutionary hypothesis (GEA limbed: p = 0.4275; SEA limbed: p = 0.6978; GEA limbless: p = 0.1997; GEA snakes: p = 0.475; Supplementary Dataset S46). Sensitivity analyses using each of the minimum branch lengths were run to account for branch length uncertainty in the GEA dataset, though results were broadly the same (Supplementary Figs. S17–S19, Supplementary Data S30–S45). The SEA topology was already time-calibrated (Simões et al. 2018), therefore, sensitivity analyses concerning branch lengths were not run for that topology. Additionally, to assess the stability of our results under different model parameters, we ran a number of additional sensitivity analyses using different Uniform prior distributions (U) on character transition rates (U: 0–0.001; U: 0–0.1; U: 0–1.0). Results were largely insensitive to differences in model parameters (Supplementary Figs. S17–S20). When we swapped the GEA and SEA character matrices with their topologies (overlapping limbed squamate OTUs only), we found no statistically significant differences in the median δ-statistic values between any of the anatomical regions (all p-values > α; Supplementary Data S46, Supplementary Figs. S26–S28).

Biases in the Squamate Fossil Record

Considering that jaw + palatal bones in the skull and the axial region of the squamate skeleton contain dozens to hundreds of individual elements (e.g., vertebrae, ribs, isolated teeth), it is unsurprising that we see the highest raw numbers of skeletal elements from these regions among the fossil specimens surveyed (Fig. 2B–D, Supplementary Fig. S3A). In fact, the relative occurrences of all seven skeletal regions in the fossil records of mosasaurs (Fig. 2B) and legless squamates (Fig. 2D) appear to superficially coincide with the number of potentially “fossilizable” skeletal elements in these animals. The major anomaly, from a preservation/collecting perspective, comes from the fossil record of lizards, which disproportionately favors the jaw + palatal region of the skull (Fig. 2C). This may be due to the fact that tooth enamel is more resilient to taphonomic factors and that isolated lizard jaws can generally be easily distinguished from other small vertebrate teeth and tooth-bearing elements (combination of pleurodonty and heterodonty). As a result, based on raw numbers, the only clearly discernible taphonomic/collecting bias in the squamate fossil record comes from the portion pertaining to lizards. Future work utilizing established fossil completeness metrics (Brocklehurst et al. 2012; Dean et al. 2016; Driscoll et al. 2019; Woolley et al. 2021) would allow us to further characterize the taphonomic and/or collecting biases that lead to the distribution of the fossil squamate skeleton observed herein.

If we are to consider squamate evolution as a whole, and our ability to place fossil squamates into broad, higher-level phylogenies alongside extant taxa, then the squamate fossil record is biased against the region of the skeleton containing the highest amount of phylogenetic data (the posterior cranial region of the skull, 43.61% of GEA characters and 34.29% of SEA characters; Supplementary Fig. S3B). Additionally, our survey shows that the majority of morphological phylogenetic characters available to score (GEA: 67.87%; SEA: 56.48%; Supplementary Fig. S3B) belong in squamate skeletal regions that are underrepresented in museum collections. This means that even though taphonomic and collecting biases largely align with the material in the skeleton available to fossilize (with the exception of fossil lizards), the observed record of fossil squamates is still biased against the regions of the skeleton that contain the majority of phylogenetic characters. Because the squamate fossil record is clearly missing a large quantity of useful phylogenetic data, it is imperative to assess the quality of the data that remain in characters associated with jaws + palatal elements and axial elements.

Homoplasy and Retained Synapomorphy in the Fossil Record

Results from our comparisons of CI and RI distributions among the GEA and SEA datasets demonstrate that overrepresented phylogenetic character data from the squamate fossil record are not any more likely to provide misleading evidence of phylogenetic relationships than character data from the rest of the skeleton. The lack of a significant difference in observed homoplasy between characters more prevalent in the squamate fossil record suggests that the phylogenetic placements of incomplete and fragmentary fossil taxa are neither more nor less reflective of evolutionary convergence than those of more complete, extant taxa. Similarly, the lack of significant difference in retained synapomorphy among characters in overrepresented versus underrepresented fossil squamate skeletal elements suggests that the phylogenetic placements of fragmentary fossil taxa are neither more nor less reflective of shared derived character states than more complete, extant taxa. Critically, this parsimony-based result is recovered regardless of hypothesis of squamate higher-level evolutionary relationships.

It is unclear what the exact cause is behind the lower CI and RI values for characters sourced from the pectoral girdle in the GEA phylogenetic dataset (Supplementary Fig. S4). Comparisons between CI/RI value and percentage of missing data per character (Supplementary Text, Supplementary Data S47) showed no meaningful relationship, unlike our model-based measurement of phylogenetic signal. This suggests that missing/non-applicable character data do not account for the patterns in CI/RI values among pectoral characters in the GEA dataset. It is possible that major ecomorphological transitions in squamate evolution, such as limb loss and/or specialized pectoral/limb morphologies may play a role in these differences. Alternatively, lower overall sample sizes of phylogenetic characters scored from this skeletal region (Supplementary Figs. S4, S5) could factor into these CI/RI distribution differences. Regardless of cause, the lack of significant differences in the distributions of overrepresented and underrepresented CI/RI values in the large samples sizes of phylogenetic characters in the skull (jaws + palatal and posterior cranial regions) appear to be the most influential on our results.

These results are generally consistent even when swapping character matrices and topologies (Supplementary Fig. S6), with the exception of the RI values of GEA appendicular characters mapped onto the SEA topology (Supplementary Fig. S7, Supplementary Data S23). The higher median RI value and higher variation in the range of RI values in the GEA appendicular region (Supplementary Fig. S4) are eradicated when mapped onto the SEA topology (Supplementary Fig. S7). This could be due to the possibility that GEA appendicular characters demonstrate high support for crownward nodes on the GEA topology, but because of the fundamental differences in evolutionary hypotheses, the characters do not support major nodes on the SEA tree. Further tests, including running separate phylogenetic analyses using each anatomical partition (e.g., Wencker et al. 2021), could be used in the future to understand the causes behind these differences in more detail.

Model-based Analyses of Phylogenetic Signal

Among GEA and SEA characters with no missing data, regions of the squamate skeleton that are underrepresented in the fossil record exhibit levels of median phylogenetic signal similar to overrepresented regions as measured by the δ-statistic. This is true when we consider only legged squamate taxa (Figs. 4, 5), only legless squamate taxa (Fig. 6), and snake taxa in the GEA dataset (Supplementary Fig. S21). By running separate analyses of legged taxa and legless taxa, we were able to consider a portion of characters corresponding to the pectoral girdle, pelvic girdle, and appendicular elements in legged squamates. However, even though we were able to survey representative characters for each of the seven anatomical regions, the strong negative correlation of missing data to δ-statistic values meant that we could only include roughly one-third of the total amount of characters (195 GEA characters, or 31.97% of total; 126 SEA characters, or 36.31% of total). In the absence of an appropriate benchmark study stating the minimum number of characters that can be used for the δ-statistic, it is difficult to tell how much our estimates of phylogenetic signal using the δ-statistic are impacted by low character sample size. However, our δ-statistic results are consistent with our parsimony-based results, which: (1) are derived from a much larger sample of characters (572 GEA characters, or 93.77% of total; 222 SEA characters, or 63.97% of total) and (2) have no discernible relationship with the amount of missing data (Supplementary Data S47). This suggests that smaller character sample sizes may not affect the sensitivity of the δ-statistic analyses as much as, for example, the number of included taxa (Borges et al. 2019). Further work to more rigorously assess the effect of small sample sizes on the δ-statistic is needed, but at present, we conclude that our analyses in this study are still an appropriate use of the method.

For both the GEA and SEA datasets, appendicular characters showcased the highest median δ-statistic value, in addition to an interquartile range with the highest δ-statistic values, among all anatomical bins that contained >2 characters (Fig. 5, Supplementary Figs. S22, S23, S27). Possible explanations for this pattern among appendicular characters vary according to phylogenetic dataset. For the GEA dataset, a potential explanation for the high median δ-statistic values in appendicular characters has to do with the unique limb structure of the chamaeleonid taxa (Brookesia brygooi and Chamaeleo laevigatus) that were scored differently from all other limbed taxa. Because chamaeleonids are monophyletic in the GEA hypothesis, this results in higher δ-statistic values for the appendicular character bin, even though the characters do not illuminate relationships for any group of lizards beyond Chamaeleonidae. The character selection and taxa used in the SEA dataset are different, and the one sampled chamaeleonid taxon (Trioceros jacksonii) does not appear to be influencing the δ-statistic as strongly as the chamaeleonids in the GEA dataset. However, for both datasets, there are unique differences between the squamate limb and the limb of the out-group taxon, Sphenodon punctatus (e.g., the presence of epiphyses on long bones), that polarize the characters and result in high δ-statistic values but are not necessarily informative on the interrelationships among limbed squamate taxa. In sum, the higher phylogenetic signal for appendicular characters is probably driven by the monophyly of a small subset of unique taxa (Chamaeleonidae), as well as the in-group/out-group distinction among all squamates and S. punctatus.

Even with the character matrices and topologies swapped (Supplementary Figs. S27, S28), appendicular characters exhibit the highest median δ-statistic values and an interquartile range with the highest δ-statistic values among all anatomical regions. Why this differs from our appendicular RI values is unclear, especially given the median RI value of GEA appendicular characters was significantly lowered when mapped onto the SEA topology (Supplementary Fig. S7B). It is possible that the relatively smaller sample size of GEA characters in the appendicular region for the δ-statistic analyses excluded key characters that contributed to the lower median RI value and greater interquartile range. These conflicting results could also demonstrate greater plasticity in squamate limb morphology, where, apart from a few clades (e.g., Chamaeleonidae), limb morphological character evolution carries lower phylogenetic signal across the squamate tree of life.

δ-statistic values for characters associated with axial region of the squamate skeleton have variable distributions according to dataset and taxon subsets used. For both the GEA and SEA datasets in which only legged taxa were sampled, the axial characters have the lowest (SEA) or third-lowest (GEA) median δ-statistic value out of all sampled anatomical regions (Fig. 5, Supplementary Figs. S22, S23, S27, S28). The same results hold for the swapped character matrices and topologies (Supplementary Figs. S27, S28). For the GEA dataset in which only snakes were sampled, the axial characters had higher median δ-statistic values than other regions (Supplementary Fig. S25). This suggests that, at least for the GEA dataset, axial characters are comparably more phylogenetically informative for snake taxa than they are for legged taxa or all legless taxa within the dataset. Within the context of the fossil record of snakes and other legless squamates (Fig. 2D), in which the vast majority of identified elements are from the axial region of the skeleton, this adds confidence to our ability to include incomplete fossil snakes into broader phylogenetic analyses.

As was the case with our parsimony-based measurements of phylogenetic signal, the anatomical bins with the largest sampling of characters (jaws + palatal and posterior cranial elements of the skull) appear to be influencing the results of our statistical comparisons the most. Bootstrap tests (Supplementary Data S46) indicate that there is no statistically significant difference between the median δ-statistic value of jaws + palatal and posterior cranial elements of the skull (all p-values > α), which correspond to overrepresented and underrepresented regions of the fossil squamate skeleton, respectively. This means that jaws + palatal and posterior cranial elements of the skull, which happen to be the most character-rich regions of the squamate skeleton, showcase levels of phylogenetic signal that are consistent with one another (Fig. 5, Supplementary Figs. S22–S25, S27, S28). It is important to consider this result alongside the observed fossil record of lizards (Nydam 2013; Rage 2013; Fig. 2C), which disproportionately preserves jaws + palatal elements of the skull (in particular, dentaries and maxillae). By demonstrating that these skeletal regions retain similar levels of phylogenetic signal as posterior cranial elements of the skull and other underrepresented regions of the fossil squamate skeleton, we can be more confident that including incomplete fossil lizard taxa in phylogenetic analyses will not provide misleading evidence of evolutionary relationships.

The arrival at similar assessments of phylogenetic signal using two non-independent but distinct phylogenetic datasets (GEA and SEA) gives us confidence that we are describing an actual natural phenomenon in the preservation of fossil data. Our parsimony- and model-based analyses show that the parts of the squamate skeleton most ubiquitously preserved and collected in the fossil record (jaws + palatal bones, vertebrae, and ribs) retain the same level of phylogenetic signal as other parts of the skeleton. Critically, these results are recovered regardless of hypothesis of squamate higher-level evolutionary relationships. This joint assessment of bias and phylogenetic signal in our sample of morphological data adds confidence to our ability to accurately infer the evolutionary relationships of fossil organisms that preserve disarticulated parts. Incorporating the world's abundance of incomplete, but potentially trustworthy fossils of vertebrates (Norell and Novacek 1992), echinoderms (Thompson and Denayer 2017), plants (Crane et al. 2004), and more into an evolutionary framework can bolster our ability to explore patterns of diversification, paleobiogeography, and biostratigraphy in deep time and can provide critical historical data for understanding today's worsening biotic crises (Ceballos et al. 2017). Using a quantitative phylogenetic framework to establish the reliability of fragmentary fossil material harnesses the full power of the fossil record in our effort to address major outstanding questions in Earth's biological history.

We would like to thank J. J. W. Sertich and K. Mackenzie in the Earth Sciences Vertebrate Paleontology collections at the Denver Museum of Natures & Science for access to specimens, collections resources and research opportunities. We are also grateful to D. Brinkman and J. A. Gauthier in the Vertebrate Paleontology Collections at Yale Peabody Museum of Natural History for specimen access and thoughtful discussion. We would also like to thank C. Mehling in the Fossil Amphibians, Reptiles, and Birds Collections at the American Museum of Natural History for specimen access. Additionally, we would like to thank M. Walsh, S. McLeod, and B. Mertz at the Natural History Museum of Los Angeles County for digital specimen data access. We would also like to thank Paleobiology managing editor J. Kastigar for logistical support with revisions. Finally, we would like to thank Paleobiology editor C. K. Boyce, associate editor M. Friedman, an anonymous reviewer, and reviewer N. Brocklehurst for extremely helpful feedback and comments that greatly augmented the quality of this article. C.H.W. was supported by the Richard Estes Memorial Grant from the Society of Vertebrate Paleontology, the Theodore Roosevelt Memorial Grant from the American Museum of Natural History, as well as National Science Foundation EAR 1647841 and ANT 1341475 (to N.D.S.). J.R.T. was supported by a Royal Society Newton International Fellowship and a Leverhulme Trust Early Career Fellowship.

Data available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.m63xsj43b. C. H. Woolley, J. R. Thompson, Y. H. Wu, D. J. Bottjer, and N. D. Smith. 2021. Supplementary information for “A Biased Fossil Record Can Preserve Reliable Phylogenetic Signal.”

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.