Large-scale biogeographic provinces of Cretaceous ammonoids, as currently defined in the literature, were delimited using qualitative assessments of taxonomic inventories. Using aggregated species occurrences in the Paleobiology Database, we generated a geographic network to quantify connectivity of Albian epicontinental basins and used the flow-based Infomap algorithm to delineate bioprovinces. Despite taxonomic, stratigraphic, and geographic limitations of the data, the Infomap bioprovinces are largely concordant with the traditional, qualitatively derived biogeographic model, including the Boreal-Pacific Subrealm, Arctic Subrealm, Tethyan Realm, and Austral Realm. An agglomerative hierarchical cluster analysis applied to the same occurrence data failed to replicate the Infomap bioprovinces or reproduce the traditional qualitative model. The observed asymmetrical distribution of the Infomap bioprovinces is consistent with the known hemispheric differences in paleogeographic and oceanographic features of the Albian Earth. The geographic network derived from ammonoid data is twice as dense as the one derived for Albian benthic marine invertebrates and thus more effective in delineating global biogeographic units. The network-based approach establishes a reproducible quantitative framework for delineating geographic boundaries of marine bioprovinces, tracking biogeographic changes over evolutionary time scales, and identifying biotic and abiotic factors that influence global partitioning of marine biodiversity.

Biogeographic studies of fossil organisms have contributed to our current understanding of the relationships between plate tectonics and evolution of life (Lieberman, 2005). However, the impact of those studies extends beyond basic biogeographic questions to include, for example, conservation biology and climate change (Perrin and Kiessling, 2012; García-Molinos et al., 2015). The network theory has brought significant advances to our understanding of biogeographic patterns in fossil organisms (Brayard et al., 2007; Dera et al., 2011; Sidor et al., 2013; Vilhena et al., 2013; Dunhill et al., 2016). One of the most relevant features of network analysis is its ability to detect community structure, i.e., natural partitioning of network nodes into densely connected subgroups (Newman and Girvan, 2004). The network-based approach has been demonstrated to be a powerful tool in modern biogeography (Vilhena and Antonelli, 2015; Edler et al., 2017). Here we employ a network approach to examine the biogeography of Albian (mid-Cretaceous) ammonoids using data from the Paleobiology Database (, a major geoinformatics initiative aimed at providing fossil occurrence data across all taxa retrievable from the geological record (Peters and McClennen, 2016).

Large-scale biogeographic regions for the Cretaceous have been delimited by ammonoid experts using qualitative assessments of taxonomic inventories (e.g., Kennedy and Cobban, 1976; Jagt-Yazykova, 2011). This approach in ammonoid research has deep historical roots and remains unverified by quantitative assessments (Ifrim et al., 2015). Because of the high dispersal potential of ammonoids, their biogeographic partitioning is likely to reflect large-scale physical, climatic, and/or biotic environmental changes (Bengtson and Kakabadze, 1999). We focused on the Albian because this stage is represented by the largest number of ammonoid records in the Paleobiology Database compared to the other Cretaceous stages. We explicitly test the qualitatively derived biogeographic model for the Albian ammonoids as defined in the biogeographic synthesis of Lehmann et al. (2015): the Boreal Realm (subdivided into Boreal-Atlantic, Boreal-Pacific, and Arctic subrealms), Tethyan Realm, and Austral Realm. These bioprovinces have been consistently recognized in studies on the distribution of ammonite taxa (e.g., Owen, 1973; Kennedy and Cobban, 1976; Page, 1996).


Occurrence data of Albian ammonoids were downloaded from the Paleobiology Database on 20 January 2016. The search was restricted to occurrences with species-level resolution, and those with uncertain or provisional taxonomic identification (i.e., qualified by aff., cf.,, sensu lato, or quotation marks) were excluded. The following data fields were downloaded: collection number, genus name, species name, geologic formation, and present-day latitude and longitude. The reduced data set contained a total of 1795 occurrences that met the filtering criteria. Paleogeographic coordinates were determined using the PointTracker Software (Scotese, 2010) and rotated occurrences were plotted on the plate tectonic configuration from the PALEOMAP Project (Scotese, 2013). The stage-level resolution used in this analysis reflects data limitations: the number of species occurrences currently available is insufficient to conduct a meaningful network analysis at finer stratigraphic resolution.

Network Construction and Partitioning

The network analysis implemented here is a four-step process (Fig. DR1 in the GSA Data Repository1). First, we aggregate the rotated occurrence data into a geographical grid of 5 × 5 ° cells. This spatial resolution is widely used in studies of mid-Cretaceous climate and paleogeography (e.g., Fluteau et al., 2007). Second, aggregated data are used to generate a bipartite network (G) between species (S) and grid cells (P) denoted G = (V, E), where V is the node set of two disjoint subsets (S, P) (Fig. 1). EP × S is the edge set that links P and S subsets. The incidence matrix (B) (Table DR1) representing the connections between S and P has the elements Bij such that
Because no edges are possible within node subsets P and S, the proportion of all possible edges that are actually present (i.e., density) in the bipartite network G was calculated as |V| / (|P| × |S|). Third, we performed a weighted projection from the bipartite network G onto the node subset P (Alzahrani and Horadam, 2016). The projection procedure generates a geographic network GP = (P, EP) in which two grid cells k and lP are linked together if they have at least one common taxon in S (Fig. 2A). The adjacency matrix A representing the connections between grid cells in the projected network GP has the elements Akl such that
For a grid cell kP, let Γ(k) denote the set of neighbors of k. Then, the connection strength (CS) between distinct grid cells k and l in P is given by
where C(k) and C(l) are the total number of collections recorded at grid cells k and l. This common neighbors index, standardized using the number of collections, takes into account variations in sampling effort. We also performed a projection from the bipartite network G onto the node subset S and generated a network GS = (S, ES) in which two species are linked if they occur together at least in one grid cell (Fig. DR2). Fourth, we applied the Infomap clustering algorithm (Rosvall and Bergstrom 2008; Lambiotte and Rosvall, 2012) to partition the grid cells within the projected network GP into bioprovinces (Edler et al., 2017). Infomap finds the modular structure of the network with respect to flow by using random walks. The algorithm calculates the theoretical limit of how concisely we can describe the trajectory of a random walker on the network and selects the partition that gives the shortest description length. The analyses were performed using the R-package igraph 0.6 (Csardi and Nepusz, 2006).

The bipartite occurrence network of Albian ammonoids (G) comprises a total of 540 nodes partitioned into 78 grid cells (P) and 462 taxa (S). This network is fairly sparse, comprising 859 edges that represent only 2% of all possible connections (density 0.02). The number of edges per node of the subset P is highly variable (mean degree 11.0 ± 12.5 standard deviation, SD) and correlated with the number of sampled formations per grid cell in the data set (r = 0.7, n = 78, p < 0.01). Ammonoid species (subset node S) are connected with ∼2 grid cells on average (1.9 ± 1.6 SD) (Fig. 1). The emergent projection GP is a spatially explicit arrangement of 78 nodes connected by 433 weighted edges. The GP projection covers most of the Albian epicontinental basins and quantifies the strength of the connectivity across most marine regions (Fig. 2A). This geographic network is fairly sparse (density 0.14), and the number of edges per node approximates a power law distribution (Fig. DR3). The maximum node to node distance (diameter) of the network GP, as measured by number of edges, equals 5. The numbers of mutual connections between GP network nodes range from 1 to 43, with ∼12 mutual connections on average. A small fraction of nodes (∼6%) are disconnected from all remaining nodes.

The Infomap algorithm applied to the geographic network GP partitioned data into four non-overlapping bioprovinces. The number of nodes per Infomap bioprovince ranged from 9 to 40. Disconnected nodes were disregarded because they did not contain meaningful information about the overall network structure. The Infomap bioprovinces match closely the traditional qualitatively established biogeographic units: Boreal-Pacific Subrealm, Arctic Subrealm, Tethyan Realm, and Austral Realm. The modularity score of this division of the network GP, i.e., the fraction of edges within the given Infomap bioprovince minus the fraction expected if edges were distributed at random, is relatively low (Q = 0.24), but indicates the presence of a significant community structure (Newman, 2006a). To find out how stable this division of the network GP is, we compared the results obtained by Infomap to six different partitioning procedures using the Normalized Mutual Information similarity score (NMI) (Table DR2). According to the NMI, the Label Propagation procedure (Raghavan et al., 2007) produces a grouping most closely aligned with the Infomap output. The increasingly less concordant (although still generally consistent) groupings were derived by Walktrap (Pons and Latapy, 2006), Multilevel (Blondel et al., 2008), Fastgreedy (Clauset et al., 2004), and Leading Eigenvector (Newman, 2006b) partitioning procedures. The Edge Betweenness partitioning algorithm (Girvan and Newman, 2002) produced numerous small-sized communities. Because two partitioning algorithms can reach a similar level of performance (NMI) but produce different grouping structures (Orman et al., 2011), we implemented a qualitative comparison of the different partitions using the largest node overlap across procedures. Despite some variations, the grouping structure of the network GP was relatively consistent regardless of the algorithm and most nodes were within the corresponding Infomap community. However, some of those algorithms failed to distinguish Boreal subrealms, whereas some algorithms further subdivided the Tethyan Realm (Fig. 3). The Infomap bioprovinces differ from those obtained by an agglomerative hierarchical clustering method (Unweighted Pair Group Method with Arithmetic Mean, UPGMA) in which a few of the nodes clustered out of their geographic context (Fig. DR4). The network analysis was also implemented for Albian benthic invertebrates in the Paleobiology Database (Table DR3). The construction procedure resulted in a geographic network with a similar size, but less dense than that derived from ammonoids (Fig. DR5). The clustering procedure resulted in a network partition with a higher modularity for the benthic invertebrates compared to the ammonoids (Table DR4), and failed to reproduce the ammonoid Infomap bioprovinces.

The partitioning of the geographic network GP into four groups using the Infomap algorithm resulted in an analytical outcome that is largely concordant with the traditional biogeographic model of Albian ammonoids. The most obvious difference between the qualitative and network-based biogeographic model is that Infomap failed to delineate the Boreal-Atlantic Subrealm, either reflecting limitations of our data or inaccuracies of the traditional biogeographic model. This Boreal-Atlantic Subrealm has been, however, recognized in qualitative studies based on non-ammonoid taxa (e.g., Iba et al., 2011). Our outcome may also reflect the resolution limits of the Infomap algorithm implemented in igraph, which uses standard teleportation; a procedure used in random walk-based methods that allows walkers to randomly teleport across the network on any node (Lambiotte and Rosvall, 2012). Because the primary goal of this study was the global-scale test of major biogeographic bioprovinces, this limitation is not critical. The results presented here offer tentative evidence that the Boreal Realm was partitioned into the Arctic and Boreal-Pacific subrealms. The hierarchical cluster UPGMA failed to replicate the Infomap bioprovinces regardless of the level of similarity at which grid cells are joined (Fig. DR4), pointing to the advantage of Infomap for detection of biogeographic structure.

The location of the Austral-Tethyan boundary in our biogeographic model differs substantially from that suggested by Lehmann et al. (2015). In our model, the Austral Realm is restricted to the Australian epicontinental sea. However, a node representing the Northern Territory of Australia clustered with the Tethyan Realm (Fig. 2B). This configuration is driven by the occurrence of widespread forms in Australian borderlands and endemism of the Great Artesian Basin (Wright, 1963; Henderson, 1990). The Tethyan Realm in our biogeographic model extends from the open waters of the Tethyan Ocean, the Central Atlantic, and the paleo-Caribbean to paleolatitude ∼60°S, and includes the northernmost Antarctic Peninsula region. This asymmetric pattern may be controlled by the paleogeographic position of both South America and Africa, which were still relatively close and largely located in the Southern Hemisphere. Such a geographic configuration appears to have provided a number of relatively well connected epicontinental basins suitable for ammonoids. In addition, southern latitudes were relatively warm and the northernmost Antarctic Peninsula region maintained relatively strong marine connections with South America and Africa (Bice and Norris, 2002; Martin and Hartnady, 1986). Overall, the spatial distribution of the Infomap bioprovinces is reminiscent of the known large-scale paleogeographic and oceanographic features of the Albian Earth: restricted Arctic Ocean, relatively open Tethyan region, and partly isolated South Atlantic Ocean (Sewall et al. 2007). The latitudinal distribution of the Infomap bioprovinces (Fig. 4) appears to support the hypothesis that latitude-related factors also played an important role in shaping the biogeographic partitioning of the mid-Cretaceous ammonoids by reducing dispersal and constraining bioprovinces in space (Reboulet, 2001; Ifrim et al., 2015). However, these interpretations may partly depend on geographic and stratigraphic resolutions of analyzed data, and may thus be inapplicable for analyses carried out at finer observational scales. Our results shed new light on the controversial affinities of the Albian ammonoid faunas from the Netherlands Antilles (see Owen and Mutterlose, 2006). For multiple network partitioning procedures, the grid cell containing this tropical region clustered with the Arctic Subrealm (Fig. 2B). The spatial relationship among grid cells of the Infomap bioprovince representing the Boreal Realm in our model suggests that a Boreal affinity for areas proximal to the paleoequator is unlikely. This outcome is consistent with the Tethyan affinity suggested for this region by Owen and Mutterlose (2006). Centrality measurements on the projected network GS (Fig. DR2; Table DR5) allows us to identify important or highly connected species.

Despite having a similar size, the geographic network derived from ammonoid data is twice as dense as the network derived for Albian benthic marine invertebrates. The overall comparison indicates that benthic invertebrates do not replicate the biogeographic patterns observed in the ammonoids, and thus confirms the premise that ammonoid biogeographic partitioning is more likely to reflect large-scale environmental changes (Bengtson and Kakabadze, 1999). The approach utilized in this study establishes an alternative, objective standard framework for studying the spatiotemporal dynamics of the marine paleobioprovinces over evolutionary time scales (e.g., origination, extinctions, expansions, contractions, and migrations). It is also a powerful methodological framework for comparative biogeographic studies within and across taxa. This approach complements qualitative studies by providing a spatially unambiguous and reproducible strategy for quantitative delineation of bioprovinces.

We thank the contributors to the Paleobiology Database ( who collected Albian ammonoid data, especially Austin Hendy, Matthew Clapham, and Loïc Villier. We thank Martin Rosvall (Umeå University, Sweden) for his comments on the Infomap, Christopher Scotese ( for providing the PALEOMAP and PointTracker Software, and James E.T. Channell (University of Florida, Gainesville) for his comments on an earlier version of the manuscript. We thank Wolfgang Kiessling, Dieter Korn, an anonymous reviewer, and editor Judith Totman Parrish for helpful comments and constructive criticism that greatly improved this manuscript. Rojas is grateful to Elizabeth Sandoval for her continuous support. Rojas was supported by a Jon L. and Beverly A. Thompson Endowment Fund. Publication of this article was funded in part by the University of Florida Open Access Publishing Fund.

1GSA Data Repository item 2017213, Figures DR1–DR6 and Tables DR1–DR5, is available online at or on request from
Gold Open Access: This paper is published under the terms of the CC-BY license.