Abstract
Over the last 50 years, access to new data and analytical tools has expanded the study of analytical paleobiology, contributing to innovative analyses of biodiversity dynamics over Earth's history. Despite—or even spurred by—this growing availability of resources, analytical paleobiology faces deep-rooted obstacles that stem from the need for more equitable access to data and best practices to guide analyses of the fossil record. Recent progress has been accelerated by a collective push toward more collaborative, interdisciplinary, and open science, especially by early-career researchers. Here, we survey four challenges facing analytical paleobiology from an early-career perspective: (1) accounting for biases when interpreting the fossil record; (2) integrating fossil and modern biodiversity data; (3) building data science skills; and (4) increasing data accessibility and equity. We discuss recent efforts to address each challenge, highlight persisting barriers, and identify tools that have advanced analytical work. Given the inherent linkages between these challenges, we encourage discourse across disciplines to find common solutions. We also affirm the need for systemic changes that reevaluate how we conduct and share paleobiological research.
Introduction
Paleobiological research practices are evolving. Advances in computational power, modeling, and databases have equipped paleobiologists with new tools to analyze the fossil record. These advances have given rise to analytical paleobiology as a research topic within paleontology. Analytical paleobiology comprises paleobiological research that uses analytical (primarily quantitative) methods, including database-driven analyses, meta-analyses, and primary data analyses (Signor and Gilinsky 1991). Although analytical methods have long been used in paleontology, analytical paleobiology crystallized in the 1970s and 1980s following pivotal computational work that examined past biodiversity dynamics (e.g., Valentine 1969; Raup 1972; Raup et al. 1973; Sepkoski et al. 1981; Raup and Sepkoski 1982). Since then, it has matured both by adapting methods from other disciplines and by developing new methods specific to analyzing the fossil record (Raup 1991; Liow and Nichols 2010; Silvestro et al. 2014; Alroy 2020; Warnock et al. 2020). Analytical paleobiology has now grown to touch most subfields within paleontology. For example, analytical tools have been used to document macroevolutionary patterns, evaluate the causes and consequences of ecosystem change, and predict biotic responses to the current biodiversity and climate crises (Condamine et al. 2013; Finnegan et al. 2015; Muscente et al. 2018; Yasuhara et al. 2020). The demand for workshops on these topics, such as the Analytical Paleobiology Workshop (https://www.cnidaria.nat.uni-erlangen.de/shortcourse/index.html) and Paleontological Society Short Courses at the Geological Society of America annual meeting (https://www.paleosoc.org/short-courses), indicates that this research frontier is set to grow.
Although analytical paleobiology has been firmly established as a research topic, it continues to face challenges related to data analysis, synthesis, and accessibility. Some of these challenges are long-standing (Seddon et al. 2014), while others have been recently illuminated or even amplified by analytical advances (Raja et al. 2022). In response, many paleobiologists—particularly early-career researchers—have advocated for more collaborative, interdisciplinary, and open science. Their willingness to embrace new research practices has already begun to permeate the broader paleontological community. However, the guidelines and community buy-in that are needed to standardize these practices are still developing. As both the challenges that face analytical paleobiology and our capacity to tackle them evolve, it can be productive to monitor progress and reflect on how this research topic might continue to mature.
As one of the most recent cohorts to graduate from the Analytical Paleobiology Workshop (2019), we present this synthetic survey to signpost obstacles in analytical paleobiology from an early-career perspective and map them onto emerging solutions. We outline four interconnected challenges (Table 1), highlight recent progress, and collate a list of tools that have pushed analytical paleobiology in new directions (Supplementary Tables 1, 2). By surveying a wide range of topics, we aim to link disparate advances and provide readers with entry points for engagement with each challenge, while directing them to comprehensive discourse on each. We also echo calls for more consistent and equitable approaches to data production, synthesis, and sharing within analytical paleobiology.
Challenge 1: Measuring Biodiversity across Space and Time
The fossil record provides an invaluable but imperfect time capsule to explore how and why biodiversity has changed over Earth's history. Early studies of deep-time biodiversity interpreted the fossil record at face value, but these interpretations are now widely documented to be confounded by a combination of geological, taphonomic, and sampling biases (Raup 1972, 1976; Sepkoski et al. 1981; Benton 1995; Smith and McGowan 2011; Walker et al. 2020). These biases can distort biodiversity estimates and hinder meaningful comparisons of fossil assemblages across space and time (Close et al. 2020a; Benson et al. 2021). In recent years, quantitative methods have accrued to alleviate some of these limitations, improving our ability to quantify true biodiversity patterns (Supplementary Table 2). However, researchers now face the challenge of creating transparent, reproducible workflows to navigate this landscape of resources as they prepare their raw data for analysis (Fig. 1). Here, we focus on four aspects of this workflow: taxonomic resolution, sampling standardization, spatial standardization, and time series analysis.
Estimates of taxonomic diversity are influenced by the resolution at which specimens are identified. Deep-time biodiversity patterns have long been quantified using counts of higher taxa, such as families (Sepkoski 1981; Labandeira and Sepkoski 1993) or genera (Sepkoski 1997; Alroy et al. 2008; Cleary et al. 2018). Genera are often preferred, because they are typically easier to identify, more robust to stratigraphic binning, and more taxonomically stable than fossil species (Allmon 1992; Foote 2000), such that they are considered to be a good substitute for biodiversity (Jablonski and Finarelli 2009). However, genera are not perfect proxies for species, which are more directly shaped by evolutionary and ecological processes (Hendricks et al. 2014). Nor are they immediately comparable with ecological data, which are often collected at the species level and are increasingly delineated using genetics (Pinzón et al. 2013; Zamani et al. 2022) (Fig. 1A). Authors have therefore called for greater transparency when analyzing genus-level patterns (e.g., justifying the use of genera as well as reporting species-to-genus ratios) and discussing their implications for species (Hendricks et al. 2014). At the same time, the taxonomic work that underpins specimen identification remains chronically undervalued (Zeppelini et al. 2021; Gorneau et al. 2022; although see Costello et al. 2013). To preserve taxonomic knowledge, efforts could be made to invest in taxonomy courses (e.g., Smithsonian Training in Tropical Taxonomy), grants that fund curation and systematics (e.g., Paleontological Society Arthur James Boucot Research Grants), and taxonomy databases (Costello et al. 2013; Fawcett et al. 2022; Grenié et al. 2023). Investments in systematics might, in turn, encourage stronger connections between genus- and species-level analyses when studying biodiversity through time.
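As a minimal illustration of the reporting practice mentioned above, the following Python sketch computes a species-to-genus ratio from a list of binomial names. The function name, the input list, and the rule for skipping open-nomenclature entries are assumptions made for this example, not a published protocol.

```python
from collections import defaultdict


def species_to_genus_ratio(binomials):
    """Return (n_species, n_genera, ratio) for an iterable of 'Genus species' names.

    Entries without a usable species epithet (e.g., 'Porites sp.') are skipped,
    so a genus known only from indeterminate species is not counted here.
    """
    species_by_genus = defaultdict(set)
    for name in binomials:
        parts = name.split()
        if len(parts) < 2:
            continue  # genus-only entry
        genus, epithet = parts[0], parts[1]
        if epithet in {"sp.", "spp.", "indet."}:
            continue  # open nomenclature (assumed list of markers)
        species_by_genus[genus].add(epithet)
    n_genera = len(species_by_genus)
    n_species = sum(len(epithets) for epithets in species_by_genus.values())
    ratio = n_species / n_genera if n_genera else float("nan")
    return n_species, n_genera, ratio


# Hypothetical identification list, for illustration only.
names = [
    "Acropora cervicornis",
    "Acropora palmata",
    "Acropora palmata",
    "Porites astreoides",
    "Porites sp.",
]
n_species, n_genera, ratio = species_to_genus_ratio(names)
```

Reporting the three values together, rather than the ratio alone, lets readers judge how much species-level information a genus-level analysis discards.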
Biodiversity estimates are also sensitive to sampling. In the last two decades, numerous quantitative methods have been developed to compare numbers of taxa (taxonomic richness) among assemblages while accounting for variation in sampling. Yet there is still no one-size-fits-all approach, leaving researchers to weigh the trade-offs between different methods (Close et al. 2018; Alroy 2020; Roswell et al. 2021) or use multiple complementary methods (e.g., Allen et al. 2020). Richness estimators are a popular sampling standardization method (Alroy 2020). One example is shareholder quorum subsampling (Alroy et al. 2008; Alroy 2010a,b,c), which standardizes samples based on a measure of sample completeness, or coverage. This approach is mathematically similar to coverage-based rarefaction, which is commonly used in ecology to standardize samples when measuring species diversity (Chao and Jost 2012; Chao et al. 2020, 2021; Roswell et al. 2021). Other popular methods focus on macroevolutionary rates (e.g., origination and extinction). These range from relatively straightforward equations (Kocsis et al. 2019) to more complex Bayesian frameworks (PyRate; Silvestro et al. 2014) and models that incorporate phylogenetic information (fossilized birth–death process; Heath et al. 2014; Warnock et al. 2020). Ecological methods, such as capture–mark–recapture (Liow and Nichols 2010), can also be used to infer biodiversity dynamics from incomplete samples but have not been as widely applied in paleobiology. The diversity of available methods underscores the complexity of measuring biodiversity but also presents an opportunity to establish best practices that fine-tune their usage. As consensus forms, paleobiologists and ecologists could collaborate to consolidate sampling standardization methods across disciplines (Challenge 2).
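The idea of standardizing by coverage rather than by sample size can be sketched in a few lines of Python. The example below computes Good's u (one minus the proportion of specimens that belong to singleton taxa) and then draws specimens at random until the taxa drawn so far account for a target share, or quorum, of the sample. It is a deliberately simplified sketch: it omits the corrections used in published implementations of shareholder quorum subsampling, and the function names and abundance counts are hypothetical.

```python
import random
from collections import Counter


def goods_u(counts):
    """Good's u: 1 - (number of singleton taxa / number of specimens)."""
    n = sum(counts.values())
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / n


def quorum_subsample(counts, quorum, trials=200, seed=0):
    """Mean richness when specimens are drawn until the summed sample
    frequencies of the taxa drawn so far reach `quorum` (between 0 and 1)."""
    rng = random.Random(seed)
    n = sum(counts.values())
    freqs = {taxon: c / n for taxon, c in counts.items()}
    pool = [taxon for taxon, c in counts.items() for _ in range(c)]
    richness = []
    for _ in range(trials):
        rng.shuffle(pool)
        seen = set()
        covered = 0.0
        for taxon in pool:
            if taxon not in seen:
                seen.add(taxon)
                covered += freqs[taxon]
            if covered >= quorum:
                break
        richness.append(len(seen))
    return sum(richness) / trials


# Hypothetical abundance counts for one assemblage.
assemblage = Counter({"a": 5, "b": 3, "c": 1, "d": 1})
coverage = goods_u(assemblage)
richness_at_half = quorum_subsample(assemblage, quorum=0.5)
```

Comparing assemblages at the same quorum, rather than at the same number of specimens, is what distinguishes coverage-based approaches from classical rarefaction.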
Although sampling standardization corrects for differences in sample completeness, it does not consider the geographic distribution of samples. Biodiversity patterns in the fossil record have traditionally been interpreted at global scales, yet these inferences are affected by the fossil record's spatial structure (Bush and Bambach 2004; Vilhena and Smith 2013; Close et al. 2020b). If spatial variation in sampling is not addressed, apparent changes in biodiversity might reflect heterogeneity in depositional, environmental, or climatic conditions rather than genuine patterns (Shaw et al. 2020; Benson et al. 2021). Additionally, global analyses can mask local- or regional-scale variation in biodiversity (Benson et al. 2021). Researchers are increasingly using spatially explicit approaches to track biodiversity changes at nested spatial scales (Cantalapiedra et al. 2018; Womack et al. 2021). A variety of procedures have been developed in recent years to account for the spatial distribution of samples. Some are relatively simple metrics, such as the convex-hull area (Close et al. 2017) and number of occupied equal-area grid cells (Womack et al. 2021). Others are more complex, such as kernel density estimators (Chiarenza et al. 2019), summed minimum spanning tree length (Jones et al. 2021; Womack et al. 2021), and spatial subsampling procedures (Antell et al. 2020; Close et al. 2020b; Flannery-Sutherland et al. 2022). Some of the newer statistical approaches have been released with reproducible code or as R packages to allow updates from community members, providing an example of how methods in analytical paleobiology might mature (Challenge 3). Next steps could include efforts to establish incentive structures for contributing to this codebase, guidelines that compare methods, and workflows that link these packages.
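Two of the simpler spatial metrics named above can be sketched with the Python standard library. The example counts occupied equal-area cells (cells of equal width in longitude and in the sine of latitude cover equal areas on a sphere) and sums the length of a minimum spanning tree over great-circle distances. The function names, grid resolution, spherical Earth radius, and coordinates are assumptions for illustration; the cited studies use their own grids and implementations, and fossil localities would normally be rotated to paleocoordinates first.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere


def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def occupied_equal_area_cells(points, n_lon=36, n_band=18):
    """Count occupied cells in a grid with equal steps in lon and sin(lat)."""
    cells = set()
    for lat, lon in points:
        col = min(int((lon + 180.0) / 360.0 * n_lon), n_lon - 1)
        row = min(int((math.sin(math.radians(lat)) + 1.0) / 2.0 * n_band),
                  n_band - 1)
        cells.add((row, col))
    return len(cells)


def mst_length_km(points):
    """Summed edge length of a minimum spanning tree (Prim's algorithm)."""
    if len(points) < 2:
        return 0.0
    # Cheapest known connection from the growing tree to each outside point.
    best = {i: haversine_km(points[0], points[i]) for i in range(1, len(points))}
    total = 0.0
    while best:
        nxt = min(best, key=best.get)
        total += best.pop(nxt)
        for i in best:
            d = haversine_km(points[nxt], points[i])
            if d < best[i]:
                best[i] = d
    return total


# Hypothetical (lat, lon) localities.
localities = [(10.0, 21.0), (10.5, 22.0), (-40.0, 100.0)]
n_cells = occupied_equal_area_cells(localities)
tree_km = mst_length_km(localities)
```

Both metrics summarize spatial extent only; they would typically be paired with a subsampling step so that assemblages are compared over similar extents.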
Many paleobiological studies aim to quantify biodiversity through time, yet such analyses are complicated by variation in the fossil record's temporal resolution and quality (Fig. 1A). Because stratigraphic sequences are irregularly arranged in time and variably time-averaged, many common approaches to time series analysis (such as autoregressive integrated moving average, or ARIMA, models) cannot be readily applied (Kidwell and Holland 2002; Yasuhara et al. 2017; Simpson 2018; Fraser et al. 2021). Additionally, biodiversity dynamics can be scale dependent (Levin 1992; McKinney and Drake 2001; Lewandowska et al. 2020; Yasuhara et al. 2020) or can interact over different scales to yield emergent patterns (Mathes et al. 2021). Recent efforts to analyze biodiversity trends have been aided by advances in geochronology and age–depth modeling that provide more robust age control as well as models of depositional processes (Tomašových and Kidwell 2010; Kidwell 2015; Tomašových et al. 2016; Hohmann 2021; McKay et al. 2021). Progress has also been made by implementing analyses that can accommodate observations from different types of stratigraphic sequences while accounting for age-model uncertainty. In particular, generalized additive models (Simpson 2018), causal analyses like convergent cross mapping (Hannisdal and Liow 2018; Runge et al. 2019; Doi et al. 2021), multivariate rate-of-change analyses (Mottl et al. 2021), and machine learning methods (Karpatne et al. 2019) are changing research norms from describing temporal change to estimating statistical trends and making causal inferences among paleobiological time series. These approaches are still gaining momentum but will likely become more mainstream as they are incorporated into stratigraphic paleobiology and paleoecology training programs (Birks et al. 2012; Patzkowsky and Holland 2012; Holland and Loughney 2021).
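One simple way to carry age-model uncertainty into a trend estimate is Monte Carlo resampling, sketched below in Python. Each sample's age is redrawn from within its bounds, a least-squares slope is fitted to the irregularly spaced series, and the spread of slopes is summarized. This is not one of the methods cited above: the uniform age distribution, the linear trend, the function names, and the data are all assumptions, and the sketch does not enforce stratigraphic order among redrawn ages.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_under_age_uncertainty(age_bounds, values, draws=1000, seed=1):
    """Return (2.5%, median, 97.5%) of slopes across redrawn sample ages.

    age_bounds: list of (min_age, max_age) per sample, in the same time unit.
    values: one observation (e.g., standardized richness) per sample.
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(draws):
        ages = [rng.uniform(lo, hi) for lo, hi in age_bounds]
        slopes.append(ols_slope(ages, values))
    slopes.sort()
    lower = slopes[int(0.025 * (draws - 1))]
    median = slopes[draws // 2]
    upper = slopes[int(0.975 * (draws - 1))]
    return lower, median, upper


# Hypothetical irregularly spaced samples with uncertain ages.
bounds = [(0.0, 0.5), (0.8, 1.6), (2.5, 3.5), (6.0, 8.0)]
richness = [10, 12, 16, 24]
low, mid, high = slope_under_age_uncertainty(bounds, richness)
```

A wide interval between `low` and `high` signals that the apparent trend depends heavily on the age model, which is worth reporting alongside the point estimate.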
As we highlighted earlier, paleobiological data often require extensive cleaning and standardization before they can be meaningfully analyzed. Open-source tools are being developed to streamline this workflow (e.g., Jones et al. 2022), typically in the R programming environment (Supplementary Table 2). Moving forward, this ecosystem of tools might encourage more reproducible data processing workflows within analytical paleobiology (Challenge 3). Nevertheless, quantitative methods cannot mitigate all biases, particularly those influencing the extent of the sampled fossil record. For example, variation in the preservational potential or environmental types represented by samples eludes simple statistical corrections (Purnell et al. 2018; Walker et al. 2020; Benson et al. 2021; de Celis et al. 2021). Socioeconomic disparities can also exacerbate taphonomic or geological biases by fueling differences in sampling effort across countries (Amano and Sutherland 2013; Guerra et al. 2020; Moudrý and Devillers 2020; Raja et al. 2022) (Challenge 4). Although quantitative methods can help illuminate the potential severity of these biases, they cannot fill sampling gaps. As such, understanding the context in which samples were collected and communicating how they were interpreted will remain critical aspects of analytical paleobiology.
Challenge 2: Integrating Fossil and Modern Biodiversity Data
Studies that link data from ancient and modern ecosystems offer holistic insight into processes spanning long timescales. For example, time series of taxon occurrences and environmental conditions in the fossil record can complement real-time monitoring to disentangle drivers of community assembly (Lyons et al. 2016), assess extinction risk (Raja et al. 2021), evaluate how ecosystems respond to disturbances (Buma et al. 2019; Tomašových et al. 2020; Dillon et al. 2021), and inform conservation decisions (Dietl et al. 2015; Kiessling et al. 2019). However, despite becoming more intertwined over the last decade, paleontology and ecology continue to progress as separate disciplines (Willis and Birks 2006; Goodenough and Webb 2022). Here, we outline four obstacles that impede the synthesis of paleobiological and ecological data, although these extend to other multiproxy work.
A first obstacle is data acquisition. Recent years have seen advances in data archiving as well as funding for projects that aggregate fossil and modern biodiversity data. Databases and museum collections, especially when digitized (Allmon et al. 2018), have promoted data discovery (Supplementary Table 1). In turn, application programming interfaces and web interfaces have facilitated data downloads. Examples include the paleobioDB R package, which extracts data from the Paleobiology Database (Varela et al. 2015), and the EarthLife Consortium (https://earthlifeconsortium.org), which queries the Paleobiology Database, Neotoma Paleoecology Database, and Strategic Environmental Archaeology Database (Uhen et al. 2021). As these tools have gained traction, there have been calls to standardize archiving and formatting protocols to increase database interoperability (Guralnick et al. 2007; Morrison et al. 2017; König et al. 2019; Wüest et al. 2020; Heberling et al. 2021; Nieto-Lugilde et al. 2021; Huang et al. 2022) as well as maintain interdisciplinary funding structures (e.g., Past Global Changes, https://pastglobalchanges.org) to ensure their future accessibility (Challenge 4).
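Programmatic downloads of this kind can be sketched in Python without any third-party package. The example builds a query URL for the Paleobiology Database data service and parses the CSV response. The endpoint path and the parameter names (`base_name`, `interval`, `show`) are written from memory of the version 1.2 data service and should be checked against its current documentation before use; the taxon and interval are arbitrary examples.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for occurrence lists in CSV format (verify before use).
PBDB_OCCS = "https://paleobiodb.org/data1.2/occs/list.csv"


def build_query(base_name, interval):
    """Build an occurrence query URL for a taxon and a named time interval."""
    params = {"base_name": base_name, "interval": interval, "show": "coords"}
    return PBDB_OCCS + "?" + urlencode(params)


def fetch_occurrences(url, timeout=60):
    """Download a CSV response and return its rows as dictionaries."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


query_url = build_query("Canidae", "Miocene")
# rows = fetch_occurrences(query_url)  # requires network access
```

Recording the exact query URL and download date alongside the data is a small step toward the reproducible workflows discussed under Challenge 3.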
A second obstacle stems from the practical aspects of integrating paleobiological and ecological data. Integrative analyses involve combining datasets with different units, scales, resolutions, biases, and uncertainties (e.g., paleoclimate proxies aligned with taxon occurrences; Fig. 1). These disparate data properties can hinder their inclusion in statistical models, which typically require consistent inputs that meet certain conditions (Yasuhara et al. 2017; Su and Croft 2018). In recent years, data synthesis has been streamlined by efforts to: (1) develop analyses that can accommodate heterogeneous datasets (Challenge 3); (2) calibrate complementary methods (Vellend et al. 2013; Buma et al. 2019); (3) standardize data harmonization protocols (König et al. 2019; Rapacciuolo and Blois 2019; Nieto-Lugilde et al. 2021); and (4) support interdisciplinary work (Ferretti et al. 2014). As integrative analyses become more common, best practices could be formalized to describe data properties, processing workflows, and boundaries of inference (e.g., Bennington et al. 2009; McClenachan et al. 2015; Wilke et al. 2016; Lendemer and Coyle 2021). One potential path forward is through frameworks that guide the practice of integration and provide conceptual scaffolding for new analytical techniques (Price and Schmitz 2016; Kliskey et al. 2017; Rapacciuolo and Blois 2019; Napier and Chipman 2022).
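A first practical step in combining datasets of different resolution is to place them on a common set of time bins. The Python sketch below assigns taxon occurrences and proxy measurements to shared bins and reports richness and the mean proxy value for each. The bin boundaries, data, and function name are hypothetical, and the sketch ignores age uncertainty and sampling standardization (Challenge 1), both of which a real analysis would need to address.

```python
def summarize_by_bin(bins, occurrences, proxy_samples):
    """Align occurrences and proxy samples on shared time bins.

    bins: list of (name, younger_age, older_age), ages in Ma.
    occurrences: list of (taxon, age).
    proxy_samples: list of (age, value).
    A sample belongs to a bin when younger_age <= age < older_age.
    """
    rows = []
    for name, younger, older in bins:
        taxa = {taxon for taxon, age in occurrences if younger <= age < older}
        values = [v for age, v in proxy_samples if younger <= age < older]
        mean_proxy = sum(values) / len(values) if values else None
        rows.append({
            "bin": name,
            "richness": len(taxa),
            "n_proxy": len(values),
            "mean_proxy": mean_proxy,  # None flags a bin with no proxy data
        })
    return rows


# Hypothetical inputs for illustration.
time_bins = [("A", 0.0, 5.0), ("B", 5.0, 10.0)]
occs = [("x", 1.0), ("y", 2.0), ("x", 3.0), ("z", 7.0)]
proxy = [(0.5, 10.0), (4.0, 14.0)]
table = summarize_by_bin(time_bins, occs, proxy)
```

Keeping the number of proxy samples per bin in the output makes gaps explicit rather than hiding them behind interpolated values.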
Conceptual barriers to data integration pose a third obstacle. These barriers often arise from differences between discipline histories, research goals, or methods (Szabó and Hédl 2011; Sievanen et al. 2012; Yasuhara et al. 2017). Process-, function-, or trait-based metrics offer a potential workaround. These metrics can help align datasets over multiple scales and identify common currencies that are grounded in ecological or evolutionary theory (Eronen et al. 2010; Ezard et al. 2011; Mouillot et al. 2013; Wolkovich et al. 2014; Yasuhara et al. 2016; Pimiento et al. 2017, 2020; Spalding and Hull 2021). This paradigm moves away from conventional attempts to explore an ecological or evolutionary process within the bounds of a single discipline, instead encouraging interaction among researchers who approach the same process from different angles. For example, resilience concepts from the ecological literature are already being applied to the fossil record (Davies et al. 2018; Scarponi et al. 2022). Moving forward, we echo existing calls to improve interdisciplinary communication (Benda et al. 2002; Boulton et al. 2005; Eigenbrode et al. 2007), which could help design meaningful metrics that are comparable between fossil and modern datasets.
Finally, the paleontological and ecological communities remain siloed despite their complementarity. They ask similar questions but use different terminology and tools over different timescales (Rull 2010). Interdisciplinary networks, conferences, departments, journals, and training programs can facilitate cross talk between these disciplines. Many examples already exist that provide blueprints for future partnerships. These include the Oceans Past Initiative (https://oceanspast.org), Conservation Paleobiology Network (https://conservationpaleorcn.org), Crossing the Palaeontological-Ecological Gap meeting (https://www.cpegberlin.com) and journal issue (Dunhill and Liow 2018), and the PaleoSynthesis Project (https://www.paleosynthesis.nat.fau.de). Collectively, such efforts could increase institutional support for interdisciplinary research and gradually change the culture of interdisciplinarity (Ferretti et al. 2014; Price and Schmitz 2016; Yasuhara et al. 2017). We could also learn from other interdisciplinary work such as social-ecological systems research, which links insights across the natural and social sciences (Schoon and van der Leeuw 2015). Ultimately, the high buy-in from early-career researchers in these initiatives bodes well for their longevity and impact.
Challenge 3: Building Data Science Skills to Analyze the Fossil Record
Paleobiology is embracing “big data.” Not only are there more ways to collect high-resolution data (Olsen and Westneat 2015; del Carmen Gomez Cabrera et al. 2019; Goswami et al. 2019) and automate analyses using machine learning (Peters et al. 2014; Hsiang et al. 2018, 2019; Kopperud et al. 2019; Muñoz and Price 2019; Beaufort et al. 2022), but there are also new opportunities to tap into online databases (Alroy 2003; Brewer et al. 2012) (Fig. 1B). These advances have contributed to the volume, velocity, and variety of datasets that characterize big data (LaDeau et al. 2017). However, with this accumulating information (Supplementary Table 1) comes the need for more awareness of quantitative tools (Supplementary Table 2) and best practices for data analysis. Data science training programs paired with proactive efforts to collaborate with environmental data scientists could aid the transition toward more quantitative research.
There is a growing need for paleobiologists to learn statistical and coding skills. These skills are needed to analyze large heterogeneous datasets, implement reproducible coding practices (Nosek et al. 2015; Lowndes et al. 2017), and streamline analytical workflows (Wilson et al. 2017; Bryan 2018) (Challenges 1 and 2). Training could take the form of community-based discussions (Lowndes et al. 2019) and meetups (e.g., TidyTuesday), formal courses (e.g., Software Carpentry, https://software-carpentry.org), or independent instruction through coding tutorials (e.g., Coding Club, https://ourcodingclub.github.io/course.html). Additionally, data science topics could continue to be incorporated into paleobiology degree programs or taught as stand-alone analytical paleobiology courses. These training opportunities would provide a foundation for paleobiologists to use existing quantitative methods and create new software to analyze the fossil record.
As more paleobiologists run analyses in R, Python, and other coding languages, they could benefit from engagement with data scientists as well as with other disciplines that interface with data science, such as ecology and environmental science. Building computational skills might seem daunting, but there is no need to reinvent the wheel. Tools and infrastructure already exist (Sandve et al. 2013; Michener 2015; Hart et al. 2016; Lowndes et al. 2017; Wilson et al. 2017; Filazzola and Lortie 2022) that can be adapted to paleobiology (e.g., Barido-Sottani et al. 2020). Working groups at synthesis centers such as the National Center for Ecological Analysis and Synthesis (which produced the Paleobiology Database) and online communities like LinkedEarth (https://linked.earth) have already begun to foster data-driven collaborations in paleontology, foreshadowing how quantitative research agendas might progress.
Challenge 4: Increasing Data Accessibility and Equity
Paleobiological data and computing resources are more accessible now than ever, but access to them is not equitable among researchers. Many financial, technological, institutional, and socioeconomic factors determine who participates in research as well as how paleobiological data are collected, interpreted, and shared (Núñez et al. 2020; Valenzuela-Toro and Viglino 2021) (Fig. 2). Advancing equity in the context of analytical paleobiology entails acknowledging that access to analytical resources is unequal and allocating them in relation to researchers’ needs to achieve fairer outcomes (CSSP 2019). Here, we discuss barriers pertaining to the access of paleobiological data and resources. These are by no means exhaustive but represent several broadscale challenges for which solutions have been proposed.
Fossil specimens and their associated morphological, geographic, and stratigraphic information underpin research in analytical paleobiology. Data collection often involves visiting museums or gathering digital data from publications and repositories. However, these data are not always accessible. Visiting museums to study specimens can be logistically, financially, or politically infeasible—or even impossible. Travel grants (e.g., John W. Wells Grants-in-Aid of Research Program at the Paleontological Research Institution) can help offset transportation costs, but they cannot alleviate visa issues or other travel restrictions. Likewise, data underlying publications might be buried in supplementary files or locked behind paywalls or might lack consistent metadata or formatting—if they are even made available. As such, emphasis could be placed on finding alternative ways to make paleobiological data more open, particularly for researchers who historically have had less access.
One major step forward is digitization. For example, many museums have committed to digitizing their collections (Nelson and Ellis 2019; Bakker et al. 2020; Hedrick et al. 2020; Sandramo et al. 2021). However, only a fraction of these “dark data” have been mobilized given the substantial time, money, and effort required (Nelson et al. 2012; Paterson et al. 2016; Marshall et al. 2018). If paleobiology continues to value digital data, financial and logistical support could be expanded for online databases and museum digitization efforts as well as resources for researchers to access those data.
Open-data practices do not end with digitization, however, as digital assets must also be maintained. In 2016, the FAIR Guiding Principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management and stewardship were published to enhance data discovery and reuse (Wilkinson et al. 2016). Additionally, the TRUST Principles (Transparency, Responsibility, User focus, Sustainability, and Technology) were developed to demonstrate the trustworthiness of digital repositories (Lin et al. 2020). Although the biological sciences have embraced these principles, paleontology still lags behind (Stuart et al. 2018; Kinkade and Shepherd 2021). To encourage better data management practices, paleontological journals could require authors to archive their data, metadata, and code in centralized online repositories instead of only in supplementary files (Kaufman and PAGES 2k Special-Issue Editorial Team 2018). Unique dataset identifiers could, in turn, be adopted to track data reuse and credit the authors (Pierce et al. 2019). Normalizing these practices begins with data stewardship training to highlight resources (e.g., https://fairsharing.org) and community standards (e.g., Biodiversity Information Standards, https://www.tdwg.org) when managing paleobiological data (Koch et al. 2018; Seltmann et al. 2018; Stall et al. 2018; Krimmel et al. 2021).
As analytical paleobiology moves toward a future of open data, concerns regarding data ownership, representation, and control have been rekindled, particularly in relation to Indigenous communities and lands (Kukutai and Taylor 2016; Jennings et al. 2018; Rainie et al. 2019; McCartney et al. 2022). In response, the CARE Principles of Indigenous Data Governance (Collective Benefit, Authority to Control, Responsibility, and Ethics) were created to complement the FAIR Guiding Principles and promote the ethical use and reuse of Indigenous data (Carroll et al. 2020, 2021). Methods for implementing the FAIR Guiding Principles and CARE Principles in tandem (Rainie et al. 2019; Carroll et al. 2020, 2021) should be incorporated into analytical paleobiology courses to train researchers how to work with Indigenous data and partners without perpetuating entrenched power imbalances (Liboiron 2021; Monarrez et al. 2021).
Another dimension of access pertains to the language used to communicate information. Studies in analytical paleobiology rely heavily on information published in English (Raja et al. 2022). Although having a shared language of science can facilitate global collaboration, it also selectively excludes voices (Tardy 2004). For example, non-English publications are frequently omitted from data compilations, which might bias results from literature reviews (Amano et al. 2016, 2021; Nuñez and Amano 2021; Raja et al. 2022) and meta-analyses (Konno et al. 2020). To help alleviate language biases, researchers could conduct literature searches and disseminate their findings in multiple languages, advocate for translation or English proofing services at journals, and be considerate of non-native English speakers (Márquez and Porras 2020; Ramírez-Castañeda 2020; Amano et al. 2021; Gaynor et al. 2022; Steigerwald et al. 2022). Creating space for multilingual collaborations in analytical paleobiology would welcome knowledge, perspectives, and skills that might otherwise be overlooked due to language barriers.
Paleontology's history has left an indelible imprint on how research in the field is conducted today, contextualizing the challenges we highlight throughout this article. Knowledge production in analytical paleobiology, like other natural sciences, depends in part on socioeconomic factors such as wealth, education, and political stability, as well as colonial legacy (Boakes et al. 2010; Amano and Sutherland 2013; Hughes et al. 2021; Monarrez et al. 2021; Trisos et al. 2021; Raja et al. 2022). Consequently, sampling effort is not equally distributed across the world. For example, 97% of fossil occurrence data recorded in the Paleobiology Database over the last 30 years were generated by higher-income countries, particularly those in western Europe and North America (Raja et al. 2022). These socioeconomic factors intensify other geographic biases in the fossil record and warp biodiversity estimates (Challenge 1). As such, efforts to obtain a representative view of biodiversity across space and time are not disconnected from efforts to advance equity, inclusion, and ethics in analytical paleobiology. Recent publications have spotlighted actions that individuals and institutions should take to change research norms, urging our community to not only reflect on its past but forge a new path forward (Cronin et al. 2021; Liboiron 2021; Theodor et al. 2021; Cisneros et al. 2022; Dunne et al. 2022; Mohammed et al. 2022; Raja et al. 2022).
Conclusion
Analytical paleobiology has grown in available data, computational power, and community interest over the last half century. Notably, progress in quantitative methods, conceptual frameworks, interdisciplinary partnerships, and data stewardship has contributed to more open and reproducible paleobiological research. These advances have expanded our ability to account for biases in the fossil record, accommodate different data types in models, integrate insights across disciplines, and pursue innovative research questions. Early-career researchers in particular, despite facing precarious employment and career prospects, are embracing these evolving research practices. However, there is still a need to increase acceptance of these practices among the broader paleontological community, establish best practices, and dismantle systemic inequities in how paleobiological data have historically been generated, shared, and accessed. Fortunately, we are not alone in facing these issues, and we can learn a great deal from solutions proposed by other disciplines. Great opportunity lies in both individual and institutional action to transform the future of how we study the past.
Acknowledgments
We thank the Analytical Paleobiology Workshop organizing committee, who indirectly catalyzed this paper by bringing us together as the Class of 2019. We also thank our wonderful instructors, whose teaching and insight shaped our perspectives on the four challenges we present. We thank G. Mathes, N. Raja, and Á. Kocsis for their invaluable feedback, and K. Anderson for their insight into museum collections. We also thank W. Kiessling, M. Yasuhara, and an anonymous reviewer whose detailed comments greatly improved the article. Finally, we thank the University of California for covering the publication fees. E. M. Dillon was supported by a University of California Santa Barbara Chancellor's Fellowship. E. M. Dunne was supported by a Leverhulme Research Project Grant (RPG-2019-365). A.I. was supported by the Austrian Science Fund (FWF; P31592-B25). M.K. was supported by a Royal Society of Science Grant (RGF\EA\180318). S.V.R. was supported by the University of Calgary Faculty of Graduate Studies Eyes High Doctoral Recruitment Scholarship. This paper was composed during the COVID-19 pandemic, and the authors wish to acknowledge the widespread and profound political, economic, and personal effects that this event has had, and continues to have, on the early-career researcher community.
Declaration of Competing Interest
The authors declare no competing interest.
Data Availability Statement
Supplementary Tables are available from the Zenodo Digital Repository: https://doi.org/10.5281/zenodo.7340036.