Marine planktonic microfossils have provided some of the best examples of evolutionary rates and patterns on multi-million-year time scales, including many instances of gradual evolution. Lineage splitting as a result of speciation has also been claimed, but all such studies have used subjective visual species discrimination, and interpretation has often been complicated by relatively small sample sizes and oceanographic complexity at the study sites. Here we analyze measurements on a collection of 10,200 individual tests of the Eocene planktonic foraminifer Turborotalia in 51 stratigraphically ordered samples from a site within the oceanographically stable tropical North Pacific gyre. We use novel multivariate statistical clustering methods to test the hypothesis that a single evolutionary species was present from 45 Ma to its extinction ca. 34 Ma. After identification of a set of biologically relevant traits, the protocol we apply does not require a prior assignment of individuals to species. We find that for most of the record, contemporaneous specimens form one morphological cluster, which we interpret as an evolving species that shows quasi-continuous but non-directional gradual evolutionary change (anagenesis). However, in the upper Eocene from ca. 36 to ca. 34 Ma there are two clusters that persistently occupy distinct areas of morphospace, from which we infer that speciation (cladogenesis) must have occurred.

Marine planktonic microfossils have provided some of the best examples of evolution in the fossil record because they can be obtained in large numbers from stratigraphically continuous seafloor sediments. Whereas it is claimed that evolution in some types of organisms typically occurs in small geographic isolates with stasis elsewhere (Gould 2002: pp. 765–773), this appears not to be the case in marine plankton which are dispersed and mixed over wide geographic areas by circulating ocean currents (Norris 2000; Lazarus 2011). Detailed stratigraphic and morphometric sampling of microfossil lineages was pioneered by Hays (1970) and Kellogg (1975, 1976) on Pacific and Southern Ocean radiolarians, revealing examples of gradual size change and apparent lineage branching (speciation). Malmgren and Kennett (1981), Arnold (1983), and Malmgren et al. (1983) used a similar approach to investigate a variety of species of planktonic foraminifera, tracing gradual evolution over multi-million-year time scales. These and other subsequent studies are summarized in the Appendix, which compares the type of organism, number of specimens and stratigraphic horizons examined, and the type and number of traits measured on each microfossil. Studies to date have revealed widespread evidence for morphological evolution (Backman and Hermelin 1986; Hunter et al. 1988; Wei and Kennett 1988; Biolzi 1991; Wei 1994; Lazarus et al. 1995; Malmgren et al. 1996; Kucera and Malmgren 1998) and some examples of apparent lineage branching which is interpreted as speciation (Kellogg 1976, Lazarus 1986; Lazarus et al. 1985, 1995; Sorhannus et al. 1988, Norris et al. 1996; Knappertsbusch 2000, 2007; Hull and Norris 2009).

A frequent complicating factor in previous studies is that changes in ocean circulation patterns can affect the interpretation of a given down-core record. For example, movement of oceanic fronts over a study site might mean that the stratigraphic record imprints an element of geographic variability in the measured traits, which may be difficult to disentangle from evolutionary change (Scott 1982; Lohmann and Malmgren 1983; Scott et al. 2007). Another issue for those studies that claim to resolve phyletic branching patterns is that species are distinguished by the eye of a specialist. Although this may allow very fine species-level discrimination, the resulting evidence is not objective because it is always possible that a worker might divide a single population arbitrarily on size or shape criteria, whereupon subsequent measurements on the two groups would merely support this artificial subdivision. Another limitation of some studies is that the relatively small sampling size may limit the evolutionary patterns that can be resolved (see Appendix).

We attempt to minimize these problems in a new study of the Eocene planktonic foraminifer Turborotalia. Planktonic foraminifera are sexually reproducing protists that secrete a calcite test built from successively added chambers, with an external aperture (Hemleben et al. 1989). These tests are commonly one of the main constituents of seafloor sediment, where they can accumulate gradually over millions of years. We have chosen a core location (Ocean Drilling Program Site 865) that has been, for its entire history, within the tropical North Pacific gyre (Longhurst 2007), one of the two largest and most stable oceanographic provinces on Earth. Continuous rotation of water and lack of internal geographic barriers make the gyre a single coherent and constantly mixed biological province with continuous connection to the world ocean. Sedimentological and geochemical evidence (Bralower et al. 1995; Coxall et al. 2000; Tripati and Elderfield 2004) indicates an oligotrophic and well-stratified water column through the study interval. The absence of siliceous microfossils such as radiolarians confirms that the site was never influenced by the tropical upwelling belt. Every sample that we examined contains abundant specimens of Turborotalia, showing that it was continuously present in large numbers for the entire ∼11 Myr of its existence and can essentially be sampled “at will.” Although it is necessary for a specialist to manually separate the general target morphology from the sediment, including other types of planktonic foraminifera, the distinction is fairly easy and we did not subsequently attempt to classify or discriminate potential species by eye. Instead we developed a multivariate protocol that uses the traits that are routinely used by stratigraphers and taxonomists to assign individuals to species (Ezard et al. 2010), allowing us to test our initial hypothesis that just one evolutionary lineage can be detected through the entire record (Toumarkine and Bolli 1970; Pearson 1993).

Sampling and Measurement .—

Ocean Drilling Program Site 865 (18°26′N, 179°33′W; paleodepth 1300–1500 m) was drilled on Allison Guyot, a seamount in the equatorial central North Pacific (Shipboard Scientific Party 1993). The hole penetrated 140 m of unconsolidated pelagic carbonates (mainly foraminifer-nannofossil ooze) of Paleocene to Pleistocene age. Age control for the expanded middle and late Eocene part of the succession (Fig. 1) is provided by a series of biostratigraphic datums based on nannofossils (Bralower and Mutterlose 1995) and foraminifera (H. K. Coxall and P. N. Pearson unpublished data). Although always within the gyre, the site has been drifting north with the Pacific Plate since the Paleocene.

The extinct Turborotalia cerroazulensis group of morphospecies is very abundant in the interval ca. 45–34 Ma. For detail of the morphology we provide a 360° rotation movie of a typical test in Supplemental Information Item 1. Our initial qualitative observations indicated that the pattern of test shape change at Site 865 is similar to that previously described from sediments worldwide which has been used in biostratigraphic schemes to split the group into a series of time-restricted morphospecies (e.g., Bolli 1957; Blow and Banner 1962; Toumarkine and Bolli 1970; Toumarkine 1975, 1978; Blow 1979; Toumarkine and Luterbacher 1985; Premoli Silva and Boersma 1988; Pearson et al. 2006; Wade et al. 2011). Accordingly, tests are relatively rounded in the middle Eocene (T. frontosa), becoming more angular upsection (T. possagnoensis, T. pomeroli) and, in the upper Eocene, acquire a faint rim or keel around the periphery (T. cerroazulensis) or become morphologically flattened (T. cocoaensis, T. cunialensis). Other observable changes include a general increase in size, an increase in the number of chambers per whorl, and a change in the predominant coiling direction from dextral to sinistral.

As is often the case, the species-level classification discussed above arose historically from the efforts of taxonomists to distinguish Turborotalia in sporadic samples from different ages and places (reviewed by Pearson et al. 2006). The existence of a set of morphospecies, even if they are of proven biostratigraphic use, does not imply that they necessarily represent separate biological species (Trueman 1930; Pearson 1998; Ezard et al. 2012). Even if morphospecies coexist for many millions of years, we are not entitled to assume that they therefore represent distinct biospecies as recently argued by Strotz and Allen (2013) (see also Aze et al. 2013). Indeed, when stratigraphically continuous records were first examined in detail (Toumarkine and Bolli 1970; Toumarkine 1975, 1978), it was suggested that the above six morphospecies actually represent transitional stages in the evolution of a single globally distributed evolutionary species. If so, this group of morphospecies would be one of the clearest examples of evolutionary transition in the fossil record and provide important insights into the rate and pattern of evolutionary processes in the ocean plankton.

We took 51 × 10 cm3 samples through a ∼50 m part of the succession in the middle and upper Eocene at intervals of ∼1 m, representing an average sample spacing of ∼200 Kyr, although minor coring gaps and variations in sedimentation rate mean that some parts of the record are more densely sampled than others. Sediment mixing by seafloor currents and bioturbation means that each sample is mixed over at least several thousand years, but adjacent samples in our set are far enough apart to be well separated from each other in geological time. We washed sediment samples over a 63-μm sieve and manually separated the first 200 tests of the Turborotalia cerroazulensis group under the microscope, using the taxonomy of Pearson et al. (2006) (this group comprises the morphospecies T. frontosa, T. possagnoensis, T. pomeroli, T. cerroazulensis, T. cocoaensis, and T. cunialensis). We avoided other morphospecies that are not considered part of this group (T. altispiroides, T. ampliapertura, and T. increbescens, and all other planktonic foraminifera). We also avoided apparently reworked specimens (e.g., iron-stained tests or obviously out of place specimens in levels with known reworking). Tests were manually picked and permanently mounted sequentially in edge view on cardboard slides. Fine orientation of each specimen was achieved by using a universal stage, upon which we took a photograph and 13 measurements to generate ten taxonomically informative traits (Fig. 2; orientation and measurements were made by a single individual to ensure consistency). The aim was to capture the main aspects of morphological variation that are used by taxonomists in the discrimination of morphospecies, which is generally achieved in edge view (Pearson et al. 2006). Measurements were based on those selected by Malmgren and Kennett (1981) for a morphologically similar case study. The entire collection of 10,200 individuals is deposited in the National Museum of Wales, Cardiff (accession number: 2013.47G). All data are provided in Supplemental Information Item 2.

Statistical Methods .—

Our approach required the adoption of several recently developed statistical techniques (see Ezard et al. 2010 for full discussion). First we tested, separately for each stratigraphic sample, whether one or more discrete clusters in multivariate space better represent the data without assuming the number, size, or shape of such clusters. This test consists of four steps: (1) obtain orthogonal components that are robust to non-normal variance (Croux and Ruiz-Gazen 2005); (2) reduce dimensionality to only those components with significant explanatory power (Peres-Neto et al. 2005); (3) identify the optimal number, shape, and orientation of clusters within the rotated, dimension-reduced data (Fraley and Raftery 2002); and (4) perform model diagnostics to assess the effect of outliers on the rotation and clustering (Filzmoser et al. 2008). All calculations were performed in the R environment (v. 2.11.1, R Development Core Team 2010) and used the pcaPP (v. 1.8, Filzmoser et al. 2005), “mvoutlier” (v. 1.4, Filzmoser et al. 2008), and “mclust” (v. 3.4.5, Fraley and Raftery 2002) packages, as well as custom functions based on the “paran” package (v. 1.4.3, Dinno 2009). A full description of the steps we took is given in Ezard et al. (2010), which also contains an R script with self-contained example, but we discuss some key considerations here.

The abundant data lets us test and, potentially, relax some restrictive assumptions, which facilitates assignment of individuals to clusters without prior specification. A particularly relevant assumption for biospecies identification during the early stages of divergence is the assumption of normal (Gaussian) distributions (or joint normal distributions in multiple dimensions). Principal components are independent (orthogonal) only if the data are joint normally distributed. Data collected from natural populations will often fail this criterion; hence we have explored ways to obtain components robustly (Ezard et al. 2010). In particular, outliers can, potentially, exert a large effect on the orientation of the principal components (Croux and Ruiz-Gazen 2005) and rotated axes are more likely to pass through the center of the multivariate data cloud when obtained using robust covariance estimators (Li and Chen 1985). Populations containing incipient biospecies are very unlikely to be characterized optimally by means and standard deviations. Over time, the putative, incipient biospecies may diverge entirely and the sample distribution will be bimodal. During divergence, it is harder to describe the distribution, which may have a wide plateau or be strongly skewed, parametrically, making a robust approach that does not assume Gaussian distributed traits preferable to a “standard” principal components analysis.

Because the robust estimation is based on algorithmic identification of the most variable axis, we generated 100 dimension-reduced data sets for each stratigraphic sample. Each dimension-reduced population was clustered using a Bayesian model-based methodology (Fraley and Raftery 2002), which defines clusters by the volume, shape, and orientation of the data structure to test, for example, whether round clusters fit better than elliptical ones. The likelihood of particular cluster arrangements is obtained by iterative expectation-maximization methods for maximum likelihood. We tested all structures for their dependence on outlying data points (Filzmoser et al. 2008), discarding 100 of our 10,200 individuals for this reason in the dimension-reduced data sets. The choice between competing volumes, shapes and orientations of cluster models was made using the Bayesian (or Schwarz) Information Criterion, which provides a compromise between variance explained and the number of parameters used (Burnham and Anderson 2002). The model with most support was selected using model weights, which can be interpreted as the probability that a particular model provides the optimal description of the data given a set of candidates. If fewer than 80 of the 100 dimension-reduced data sets support models with more than one well-separated cluster, we argue that this support is not sufficiently strong to advocate retention over simpler models that suggest homogeneous data, i.e., no species differentiation within the sample.

Having found our clusters, we perform a final step called stratophenetic linkage (after Gingerich 1990) by comparing stratigraphically adjacent populations. If there is strong overlap in morphospace between the clusters in respective samples, we assume they are linked in an ancestral-descendant relationship. The clustering protocol will occasionally result in Type I errors (false positives) where, for example, two clusters are found instead of one, or three instead of two. To minimize these we interpret the clusters as well-separated species only if (1) similar clusters are found in at least one stratigraphically adjacent sample, and/or (2) the level of statistical support is not marginal.

Our approach assumes that different biological species are morphologically distinct. We acknowledge that discrimination of closely similar taxa may be problematic and any amount of morphologically invisible “cryptic” speciation may have occurred. Genetic studies of modern planktonic foraminifera (e.g., Huber et al. 1997; de Vargas et al. 1999; Darling and Wade 2008) suggest that cryptic genotypes are abundant in the modern ocean although this is not true of all morphospecies (André et al. 2013). However, if a lineage shows statistically significant morphological change over time, barring parallel cryptic evolution we would expect that even initially cryptic species, if sufficiently long lived, would eventually diverge morphologically and be resolved by our methods (Alizon et al. 2008).

A typical example of a set of univariate traits (a variety of lengths, ratios, and angles) from a single population is shown in Figure 3. There is always a strong overlap in the trait distributions in stratigraphically adjacent samples. This is well demonstrated by a set of figures similar to Figure 3 for all 51 samples, which is available as Supplemental Information Item 3.

Two examples of multivariate cluster analyses are contrasted in Figure 4, one showing support for just one cluster and the other for two. A metric illustrating the degree of statistical support for the presence of two or more clusters is plotted against geological time in Figure 5. Whereas most (41) of the stratigraphic samples constitute one morphocluster, ten samples showed significant support for two or more clusters in all of the 100 dimension-reduced data sets. Of those, two have much weaker support than the others and are stratigraphically isolated in the record (at 39.0 and 42.1 Ma) and consequently are rejected. In contrast, the remaining eight form a successive series in the upper Eocene younger than 36 Ma with strong support for at least two clusters, and weak support for a third cluster in two instances (at 34.7 and 35.6 Ma). The three-cluster cases show no clear stratigraphic pattern. Hence we conservatively regard our record as consisting of one cluster for most stratigraphic levels and then two for the last eight samples. A detailed breakdown of the results for each stratigraphic sample, including the number of principal components retained, the number of clusters found, the best model cluster shape (one dimensional, spherical, ellipsoidal, diagonal), and the number (if any) of outliers removed, is available as Supplemental Information Item 4.

For the eight successive cases where two clusters were found, in each case cluster separation was based on similar traits (i.e., the clusters can be linked stratophenetically). In Figure 6 we show the means, 95% confidence intervals, and total range of values of the stratigraphically more variable traits. This diagram illustrates the general pattern of evolution. A movie showing a 3-D point cloud of three important traits is also available as Supplemental Information Item 5. The movie conveys the style of morphologic and variance change from level to level throughout the record in a way that a static figure does not, emphasizing the stratigraphic patterns and how the axis of divergence between clusters in the eight successive two-cluster cases is broadly consistent.

For convenience we divide our discussion of the evolutionary patterns into intervals before and after 36 Ma. From 45 to 36 Ma there is always very strong overlap in all traits between stratigraphically adjacent samples, although over millions of years the samples become highly divergent such that specimens from the beginning, middle, and end of the record are very easily distinguished from one another, even if this is not necessarily clear from any single trait. The record contains many statistically significant short-term reversals among adjacent samples. In general, there is more lasting evolutionary change in the earlier part of the record, 45–40 Ma, than in the period 40–36 Ma. The change in morphology through time (chronocline) might in principle be explained either by evolution or as a local response such as a geographic cline moving over the study site, or a combination of such factors. However, in this case, the gyre setting argues against significant local complexities, as does the fact that the morphological changes are the same as have long been used in worldwide biostratigraphic correlation (Pearson et al. 2006). Hence we interpret the main chronocline as representing evolution although we cannot rule out local peculiarities that only broader geographic sampling will uncover.

The most parsimonious explanation of the data is that successive populations are directly linked in a chain of ancestry and descent in the gyre, with as yet unstudied connections to the rest of the world ocean. In other words, the fossils represent an evolutionary lineage in the sense of Simpson (1961: p. 153; see also Aze et al. 2011 and Ezard et al. 2012, for further discussion of this concept). To some paleontologists, arguing from a strictly cladistic standpoint, claims of ancestry and descent from the fossil record are inherently inadmissible (e.g., Gee 2000: p. 147), so it is important to define what we mean by ancestry and descent. Of course we can never know whether an individual fossil, a, in our collection was literally an ancestor of another, b, from a higher stratigraphic level (later time), because we cannot assess the reproductive success or otherwise of individual a or its possible descendants over the next few generations. So the hypothesis of “hard” ancestry between fossils a and b is virtually untestable. It is, however, well known that any individual in genetically well-mixed sexual populations becomes literally ancestral to the whole population after a small number of generations (Fisher 1930). Because the populations are ∼200 Kyr apart, it becomes trivial for most purposes (for example in morphometric studies of evolution) whether individual a is a hard ancestor to b or a close cousin to such an ancestor. We call this “soft” ancestry and emphasize that (1) it is amenable to study in the fossil record and (2) it is a very different concept to that of the sister-group relationship between Operational Taxonomic Units, which is axiomatic in cladistic methodologies. We also contend that given any reasonable population or quantitative genetic model, or indeed phylogenetic model involving cryptic species, our collection must actually include many hard ancestors of all others from higher stratigraphic levels even though they are indistinguishable, given our data, from those individuals that are not hard ancestors.

Accepting that this part of our record is an ancestral-descendant lineage, we applied a series of models (Hunt 2006) to test which mode of evolution (stasis, random walk, or directional trend) best fits our data (Table 1). For nine of the ten traits considered, the model with most support is an unbiased random walk, indicating significant anagenetic change without a consistent directional trend over the whole time period. The exception is test chirality (coiling direction), which shows a significant trend toward sinistral coiling from an initial dextral dominance. At present we offer no explanation for why this apparently neutral trait should show directional change over such a long time interval.

Having previously followed Toumarkine and Bolli (1970) in regarding the group as a probable single lineage (Pearson 1993), we were surprised to find consistent support for two clusters throughout the interval 36–34.8 Ma. One cluster comprises generally smaller tests that are more dorsoventrally compressed and have more equidimensional apertures than average (illustrated graphically in Fig. 4), although in most other respects the tests in both clusters are similar. Expressed as morphospecies, the more compressed forms are more similar to the holotype of Turborotalia cocoaensis (Cushman 1928) and the less compressed forms to that of T. cerroazulensis (Cole 1928). We interpret these clusters as representing two separate evolutionary lineages that share a common ancestor in the single lineage that existed before this time (Fig. 7). In most traits except the number of chambers in the final whorl, the T. cerroazulensis-like cluster is more similar to the ancestral lineage.

The divergence presumably occurred sometime in the early part of the late Eocene (Fig. 6); at least, that is when it becomes apparent in our data. With only a single sample site examined so far it is not possible to determine whether Site 865 documents the inferred time of speciation itself or if it occurred in some distant area followed by migration into the North Pacific. We note, however, that compressed morphologies of the T. cocoaensis type are common worldwide in the upper Eocene, but not in the middle Eocene (Pearson et al. 2006); hence, speciation by sympatry involving reproductive timing (Lazarus et al. 1995; Pearson et al. 1997) or perhaps habitat depth-related parapatry (Hull and Norris 2009) in globally connected populations ca. 36–37 Ma remains a likely explanation. Note that the extinction level of these species is not recorded in our study because of a sedimentation gap, but both became extinct simultaneously near the beginning of a period of rapid global cooling ca. 33.8 Ma (Wade and Pearson 2008; Wade et al. 2011).

In principle the evolutionary shape changes seen in our study could either be “neutral” or “adaptive.” It is impossible to prove adaptation by selection in the fossil record because selective events themselves generally leave no trace. Assuming adaptation can lead to instances of lazy or circular thinking that have been criticized as “Just So Stories” (Gould 1978) or “adaptationism” (Gould and Lewontin 1979). With this limitation in mind, we nevertheless presume that test morphology must have affected feeding, protection, and other factors that were important for fitness, at least in subtle ways, in which case test shape would have been subject to selection pressure. Applying Wright's (1931) classic metaphor of an “adaptive landscape” as developed by Simpson (1944) for morphospace (see Arnold et al. 2001 for discussion of these concepts), the multi-million-year time scale of our study and an average gap between samples of just over 200 Kyr preclude the suggestion that the Turborotalia population could have been gradually evolving toward some distant adaptive peak for most of the record. Instead we envisage individuals as always being well adapted on the time scales of our study, in which case the morphological variability we observe describes changes, usually slow but sometimes abrupt, in the shape of the adaptive landscape over time.

Such changes are likely to have been driven by environmental and ecological conditions in the gyre. The predominantly gradual change that is observed in the measured traits may be linked to the long-term deep-water cooling and increased thermal stratification that characterize the middle and upper Eocene oceans (Zachos et al. 2008). More compressed shapes may contribute to lower overall buoyancy; hence, as has been argued for Neogene Globoconella (Wei and Kennett 1988), a trend toward greater test compression could have been caused by water-mass cooling in which an incumbent species continually adapts to subtle increases in density in its environment. Extending this model, the division into more and less compressed species ca. 36 Ma could reflect speciation into different depth / density habitats in the water column. These suggestions could be explored through geochemical analysis of the tests, although we note that the limited data available to date for depth-differentiation of late Eocene Turborotalia morphospecies are inconclusive (Poore and Matthews 1984; Boersma et al. 1987; Coxall et al. 2000; Pearson et al. 2001; Wade and Pearson 2008).

Examples of evolutionary transition in the fossil record are rare because of the well-known problems of incompleteness first highlighted by Darwin (1859: Chap. 9). Marine microfossils have provided some of the best examples of evolution across time scales of millions of years. Our study is one of the largest morphometric investigations of a fossil lineage to date. By measuring ten traits on 200 individuals in each of 51 populations, and using recently developed statistical methods that differentiate species from variation in data rather than by subjective prior classification, we are able to test for morphological clustering at each stratigraphic level. Our study shows that for most of the ∼11-Myr record just one evolutionary lineage can be detected, which appears to evolve gradually over time, although with many “reversals” in the measured traits and without any consistent morphological trend. Toward the end of the record we consistently detect two such lineages, which are distinguished by both size and shape, indicating that speciation has occurred. Our case study is supported by a variety of movies and other resources online.

We thank M. Carroll, J. Gregson, S. Knott, R. Laver, and J. Retter for help in assembling the collection. H. Coxall contributed biostratigraphic data. A. Purvis, H. Coxall, T. Aze, and B. Wade contributed to discussion and interpretation. We thank C. Watling for collaboration on visualization tools. The manuscript benefited from detailed reviews by D. Lazarus and B. Hannisdal for which we are grateful. This work was supported by NERC grant NE/E018165/1 to A. Purvis and P. N. Pearson, NERC Grant NE/I005870/1 to G. Foster and P. N. Pearson, and NERC Fellowship NE/J018163/1 to T. H. G. Ezard.

This article is licensed under a Creative Commons Attribution 3.0 U.S. License.