Uneven spatial sampling distorts reconstructions of Phanerozoic seawater temperature

Paleotemperature proxy records are widely used to reconstruct the global climate throughout the Phanerozoic and to test macroevolutionary hypotheses. However, the spatial distribution of these records varies through time. This is problematic because heat is unevenly distributed across Earth’s surface. Consequently, heterogeneous spatial sampling of proxy data has the potential to bias reconstructed temperature curves. We evaluated the spatiotemporal evolution of sampling using a compilation of Phanerozoic δ 18 O data. We tested the influence of variable spatial coverage on global estimates of paleotemperature by sampling a steep “modern-type” latitudinal temperature gradient and a flattened “Eocene-type” gradient, based on the spatial distribution of δ 18 O samples. We show that global paleotemperature is overestimated in ∼ 70% of Phanerozoic stages. Perceived climatic trends for some intervals might be artifactually induced by shifts in paleolatitudinal sampling, with equatorward shifts in sampling concurring with warming trends, and poleward shifts concurring with cooling trends. Yet, the magnitude of some also be the be due to in Our suggest for deep is to improving these


INTRODUCTION
The geological record provides critical context for understanding past, current, and future climate change (Burke et al., 2018) and its effect on biodiversity (e.g., Mannion et al., 2015). To date, measurements of the ratio of stable isotopes 18 O and 16 O (δ 18 O) from fossils provide the most comprehensive paleotemperature record of the Phanerozoic (Veizer and Prokoph, 2015;Song et al., 2019;Grossman and Joachimski, 2020). Typically, δ 18 O values are derived from the calcitic shells or skeletons of foraminifera, brachiopods, belemnites, and bivalves (Jones and Quitmyer, 1996;Veizer and Prokoph, 2015), though phosphatic skeletal elements from conodonts are commonly used for the Paleozoic and Triassic (Wenzel et al., 2000;Trotter et al., 2015).
For studying the evolution of temperature on Earth, δ 18 O measurements are regularly collated to produce global or regional paleotemperature reconstructions (e.g., Veizer and Prokoph, 2015;Song et al., 2019;Grossman and Joachimski, 2020). However, global reconstructions are potentially biased by heterogeneous spatial sampling. Today, temperature varies considerably with latitude, declining steeply from the tropics to the poles. Incomplete and heterogeneous sampling of this gradient can lead to misleading reconstructions of global paleotemperature, and temporal variation in the spatial distribution of samples might produce artifactual temperature trends. This issue is further complicated by the strength of the latitudinal temperature gradient varying over geological time scales, with a flattened gradient recognized for some intervals, such as in the late Mesozoic and early Paleogene (Zhang et al., 2019).
We evaluate how heterogeneous spatial sampling might impact reconstructions of global paleotemperature and the perceived temporal evolution of temperature on Earth. Using an extensive compilation of Phanerozoic δ 18 O data (Veizer and Prokoph, 2015), we quantified the spatiotemporal evolution of sampling. Subsequently, we tested the impact of variable paleolatitudinal sampling by generating artificial global paleotemperature estimates for the past ∼500 m.y. through sampling models of latitudinal temperature gradients. This approach reveals the direction and possible magnitude of temperature offset in reconstructions of Phanerozoic paleotemperature.

MATERIALS AND METHODS Oxygen Isotope Data Set
Phanerozoic δ 18 O values were extracted from a compilation of marine carbonate isotopes (Veizer and Prokoph, 2015). This data set and derivatives thereof have been used extensively in a range of studies (e.g., Eichenseer et al., 2019;Mills et al., 2019;Song et al., 2019). For our analyses, we restricted the Phanerozoic data compilation to shallow-water entries (<300 m depth) and temporally binned the data into stage-level bins (Table S1 in the Supplemental Material 1 ), based on The Geological Time Scale 2012 (Gradstein et al., 2012) age estimates provided by Veizer and Prokoph (2015). The sensitivity of our results to binning protocol was tested using 10 m.y. bins (see the Supplemental Material). Following previous work (Reddin et al., 2018;Eichenseer et al., 2019;Mathes et al., 2021), we converted δ 18 O values to seawater temperature using the δ 18 O (‰, Peedee belemnite) to temperature (T; degrees Celsius) transfer function from Veizer and Prokoph (2015), including a Phanerozoic trend of increasing seawater δ 18 O, with t denoting the age in millions of years before present: δ O ‰ ‰ ‰ (1) Veizer and Prokoph (2015) inferred this trend using low-latitude data, but a similar Phanerozoic trend emerges with the data from mid-to high latitudes (Fig. S1 in the Supplemental Material). To enable our spatial analyses, we geocoded the δ 18 O data, assigning present-day coordinates to all data (see the Supplemental Material). Approximately 10% of the data lacked any reference to a locality and were removed from the data set. The effect of removing a further 7% of the data for which only the country of origin was known was also evaluated (Figs. S2 and S4-S5). Using the assigned coordinates from geocoding, each data point was paleorotated via the PALEOMAP plate model (Scotese and Wright, 2018). The final data set contains 20,093 δ 18 O samples (Fig. 1).

Spatial Sampling Metrics
To quantify the spatiotemporal evolution of sampling, we calculated three metrics within each temporal bin: (1) the absolute paleolatitudinal median of δ 18 O data; (2) the number of occupied equal-area grid cells (100 km spacings) gen-erated via the R package dggridR (Barnes et al., 2020); and (3) the summed minimum-spanning tree (MST) length between δ 18 O samples; i.e., the minimum total distance of segments capable of connecting all samples in their paleorotation. The number of occupied equal-area grid cells provides a measure of overall sampling coverage, whereas the summed MST length provides a measure of the spatial extent of sampling.

Global Paleotemperature Estimates
To evaluate the influence of heterogeneous spatial sampling on global reconstructions of paleotemperature, we extracted temperature values from two modeled latitudinal temperature gradients: (1) a "modern-type" steep gradient, and (2) an "Eocene-type" flattened gradient ( Fig. 2B; Fig. S3). These models represent two distinct climate states throughout Earth's history ("icehouse" and "greenhouse", respectively) and were generated via a generalized additive model, with temperature regressed against absolute (paleo-)latitude (see the Supplemental Material). For the modern-type model, we used the Bio-ORACLE global grid of present-day mean annual sea-surface temperature (https://www. bio-oracle.org; Tyberghein et al., 2012). The Eocene-type model was produced using a late Paleocene-early Eocene compilation of paleotemperature proxies from Zhang et al. (2019). To generate stage-level mean temperature estimates, based purely on sampling, we extracted temperature values from the gradients using the respective paleolatitudes of stage-binned δ 18 O samples. This mean was then compared to the "known" global temperature from the moderntype and Eocene-type models. The global temperature from latitudinal temperature gradient models was calculated as the mean of the average temperatures within all 1° latitudinal bands, weighted by surface area within each latitudinal band (modern: 17.84 °C; Eocene: 27.58 °C). To evaluate whether changes in the paleolatitudinal median of sampling explain observed changes in the Phanerozoic δ 18 O temperature curve, we implemented ordinary least-squares regression of the global mean temperature against the absolute paleolatitudinal median of sampling, with both variables first differenced.
The number of stage-level occupied equal-area cells, a measure of the paleogeographic coverage, is also variable through time (μ = 16.9 ± 13.6; Fig. 3B). A maximum of 61 occupied cells is observed for the Campanian and Maastrichtian (Late Cretaceous), whereas most Cambrian stages as well as the Induan (Early Triassic) lack any samples. Coverage is substantially higher from the Jurassic onward (μ = 26.4 ± 13.7) than before the Jurassic (μ = 8.3 ± 5.3). Likewise, stagelevel paleogeographic extent, measured by summed MST length, varies through time (μ = 21,473 ± 17,841 km; Fig. 3C). Summed MST length scores vary throughout the Paleozoic and early to mid-Mesozoic and are lowest in the Silurian, Triassic, and earliest Jurassic prior to a long-term increase (Fig. 3C). The substantial rise in summed MST length and paleolatitudinal range (Fig. 3A) from the Albian onward coincides with an increase in the number of samples originating from ocean drilling sites (Fig. 3B), which tend to be less geographically clustered than non-ocean drilling site data (marine fossils collected on land).

Temperature Curves
We evaluated the validity of the Phanerozoic paleotemperature curve (Fig. 4A) by sampling the modern and Eocene latitudinal temperature gradient models using the paleolatitudes of δ 18 O samples. Due to incomplete and heterogeneous spatial sampling, stage-level averages of extracted temperatures differ considerably from the empirical modern and Eocene global mean temperature in most stages (Figs. 4B and 4C). The magnitude of this variation is smaller in the Eocene model due to a flatter temperature gradient, yet similar temporal patterns persist, with the direction of temperature bias consistent in 84 of 88 stages (Figs. 4B and 4C). Sampling

modern-type" and "Eocene-type" latitudinal temperature gradients. (C) The mean of all extracted temperature values (blue and red points) for modern and Eocene gradients are compared to the "known" (from models in B) global mean temperature of the modern and Eocene (blue and red lines).
in low paleolatitudes results in extracted stagelevel global mean temperatures that are above the "known" global mean temperature in ∼70% of stages (modern: 64 of 88; Eocene: 60 of 88; Figs. 4B and 4C). Overall, the Phanerozoic δ 18 O temperature curve tracks the sampling-based extracted temperature curves across some, but not all, intervals (Fig. 4D). Regression analysis demonstrates that changes in the δ 18 O temperature curve are partially explained by changes in the median of paleolatitudinal sampling (R 2 = 0.172, P < 0.001), after removing the Early Triassic outlier (Fig. S8).

DISCUSSION
Our results suggest that the spatial distribution of δ 18 O samples influences our understanding of the Phanerozoic evolution of global temperature. The paleolatitudinal median of sampling shifts substantially over the past ∼500 m.y. (Fig. 3A), inducing potential artifactual temperature fluctuations with a magnitude of up to 20 °C (Fig. 4). This temperature offset is greater under a steep, "modern-type" latitudinal temperature gradient, but both the modern-type and a flatter "Eocene-type" gradient suggest that stage-level global mean temperatures might be overestimated in ∼70% of stages. While latitudinal temperature gradients have changed substantially over geological time (e.g., Zhang et al., 2019; Alberti et al., 2020), our analysis provides an estimate of the potential range of this offset under "icehouse" and "greenhouse" conditions.
Comparison of the Phanerozoic δ 18 O temperature curve with the extracted, samplingbased temperature curves enables evaluation of perceived temporal changes in global temperature. For example, the major Ordovician cooling trend observed from the δ 18 O temperature curve appears to be genuine, with minimal temporal shifts in extracted modern and Eocene temperatures. In fact, the observed equatorward shift of sampling toward the end of the Ordovician suggests that cooling might be even more extreme than previously considered (e.g., Shields et al., 2003;Trotter et al., 2008). By contrast, Late Cretaceous warming may be overestimated due to an equatorward shift in the median of paleolatitudinal sampling. Yet, the fact that δ 18 O temperature still exceeds the Eocene extracted temperature suggests that this interval might be genuinely warmer than even the early Paleogene, supporting the hypothesized "Cretaceous Thermal Maximum" (Huber et al., 2002(Huber et al., , 2018Friedrich et al., 2012). Continental drift, in tandem with preferential sampling of some continents, might also have influenced observed trends in the Phanerozoic δ 18 O temperature curve. For example, the poleward shift of Europe and North America from the Middle Triassic to Late Jurassic is tracked by a poleward shift in the paleolatitudinal median of sampling as well as a long-term cooling trend (Fig. S9). This mechanism has also been shown to influence perceived trends in global and paleolatitudinal biodiversity (Allison and Briggs, 1993;Jones et al., 2021). While the influence of sampling bias is well documented in the fossil record (e.g., Vilhena and Smith, 2013;Close et al., 2020), it has been largely overlooked in the oxygen isotope record. This may have affected the interpretations of previous studies investigating the influence of climatic drivers on climatically sensitive organisms and ecosystems in deep time.
Overall, our results show that global paleotemperature estimates are not reliable if the incompleteness and spatial heterogeneity of the underlying data are disregarded. An alternative means to study the coevolution of climate and life on Earth is given by general circulation models (e.g., Jones et al., 2019;Saupe et al., 2019), which provide spatiotemporally explicit data and fill proxy-data-deficient gaps (Haywood et al., 2019). Nevertheless, compilations of temperature proxies-such as δ 18 O measurements-are vital for constraining general circulation models and still offer considerable opportunity for reconstructing regional or local temperature changes, provided that geographical data (i.e., coordinates) are well documented. The extensive paleolatitudinal range of sampling from the Albian onwards suggests that reliable global paleotemperature estimates may be obtained for the past ∼110 m.y. This could be achieved with subsampling protocols, to control for uneven proxy sampling intensity, and by constructing latitudinal temperature gradient models to account for latitudinal variation. These measures, along with greater synthesis of a range of paleotemperature proxies (e.g., Zhang et al., 2019), may ultimately produce reliable, proxy-based estimates of global paleotemperature across substantial parts of the Phanerozoic.