Using network analysis to trace the evolution of biogeography through geologic time: A case study

The biogeographic distribution of organisms has continuously changed through Earth’s history as plate tectonics changed the configurations of land masses, ocean basins, and climate zones. Yet, methods to investigate this dynamic through geologic time are limited. Here, network analysis is used to explore and to visualize the biogeographic history of brachiopods through the entire Triassic period. Many previously recognized biogeographic provinces are found, and in addition, the stratigraphic ranges of these provinces were identified. Provinces in the Tethys Ocean show the lowest degree of connectedness, which can be linked to higher evolutionary rates in this tropical ocean basin and possibly also to higher habitat heterogeneity. Stratigraphically, the Tethyan provinces are separated largely along the boundaries of the Early, Middle, and Late Triassic. This suggests that the events resulting in faunal changes among the index fossils used to define these sub-periods also affected the brachiopods. However, through the ~50 m.y. of the Triassic period, geographic proximity played a greater role in producing faunal similarity than proximity in geologic age. Thus network analysis is a viable tool to better understand the dynamic evolution of biogeography through geologic time.


INTRODUCTION
Many systems of interacting units in the biological, social, and technological world can be seen as complex networks.Obvious examples include food webs, neural and social networks, the Internet, power grids, and many more (Albert et al., 1999;Dunne et al., 2002;Newman and Park, 2002;Greicius et al., 2003;Pagani and Aiello, 2013).Understanding the nature of these networks is crucial for a broad range of applications, like assessing the vulnerability of critical infrastructures to failure, or predicting and preventing the spread of epidemics (Rinaldi et al., 2001;Pastor-Satorras et al., 2015).Of particular interest are methods to detect large-scale structures such as communities and modules in networks (Newman, 2012), and these approaches have recently begun to be used to address biogeographic questions (Moalic et al., 2012;Kougioumoutzis et al., 2014;Kiel, 2016).
Present-day biogeographic distributions have a historical basis, and the fossil record is a key to understand both this historical basis and the longterm evolution of biogeography (Jablonski et al., 2006;Renema et al., 2008).Biogeographic analyses through geologic time often use presenceabsence matrices and are typically done in a stepwise fashion, whereby each geologic time interval is analyzed separately and this succession is then described in a narrative way (Brayard et al., 2007;Neubauer et al., 2015;Dunhill et al., 2016).But this approach does not fully capture the dynamics in biogeographic evolution.Here, occurrence data of the phylum Brachiopoda from the entire Triassic period (~50 m.y.) are analyzed as a single complex network.The aims are to (1) delimit biogeographic provinces and their range through time, (2) evaluate the relative importance of time and geography in delimiting those provinces, and (3) assess regional differences in connectivity.

DATA
The basis is a recently published data set of late Permian to Late Triassic brachiopod occurrences (Ke et al., 2016).This data set was compiled from published sources, covers all brachiopod clades, and includes only taxa that had been systematically described and illustrated; the taxonomic unit used herein is the genus.The geographic unit used in the original study was called a "paleogeographic analysis unit" and represents a set of collections from a certain region delimited by geographic or tectonic borders (Ke et al., 2016); information on the sampled facies were not given.These units span 1.9°-4.6° of longitude on average, and values >12° (up to 41°) were only reached in very high latitudes (i.e., northern Siberia); in latitude, they span 1°-2.5° on average, with a maximum extension of 10° (the Qiangtang block during the Carnian).These units are also used herein and are referred to as localities or nodes.Excluded from the original data set were the Permian records and the records of Dienerian age (late Induan) because of the very low number of records, and nodes with fewer than three genera.
For a subsequent assessment of the impact of evolutionary rates on the longevity of the biogeographic provinces, I determined for each node how many of its genera have their first appearance (FA) in the geologic stage to which the respective node belongs.Permian occurrences were added back into the data set when counting the number of first occurrences to avoid inflated origination in the oldest Triassic stage.To analyze the relationship between faunal similarity between two nodes on the one hand, and their geographic and temporal distance on the other, the great-circle distance between the paleocoordinates of any two nodes was calculated using the haversine formula (the paleo-coordinates were originally calculated using PointTrack version 7.0 [Ke et al., 2016]).The time difference between any two nodes was calculated for the midpoints of their respective geologic stages, using the International Commission on Stratigraphy International Chronostratigraphic Chart version 2016/12 (Cohen et al., 2013).

NETWORK ANALYSIS
The present approach uses weighted networks, in which the localities represent the nodes, and the links between them indicate not only the presence of a connection, but are assigned a value (the weight) indicating the strength of the connection.These weights were calculated using the Bray-Curtis distance measure.They range from zero when the two nodes are identical in faunal composition, to 1 when they share no taxa, resulting in the counterintuitive situation that a link with a high weight indicates low similarity between the nodes, and vice versa.
A thresholding approach was used to illustrate and quantitatively analyze relationships between nodes (Moalic et al., 2012;Kivelä et al., 2015;Kiel, 2016).The threshold is the maximum weight for the links used to construct the network; links beyond that threshold (indicating lower similarity) are not considered (Kivelä et al., 2015).The result is that at low thresholds, the network is broken up into components, which have strong links within each component but only weak links to the remaining network.This pattern is characteristic for communities within networks (Newman, 2012), and the components were therefore used as the foundation to define paleobiogeographic provinces with both a spatial and a temporal dimension.With increasing thresholds, i.e., with the addition of successively weaker links, the components will "grow" by the addition of further nodes and will merge with other components.The thresholding approach thus allows looking at how patterns change along a sliding scale of threshold values, and networks at different thresholds can provide different insights into biogeographic relationships (i.e., Fig. 1).When all components merge into one giant component, the so-called percolation point is reached (Kivelä et al., 2015), and networks at thresholds beyond this point are typically of limited value for delimiting paleobiogeographic provinces.
Construction and most analyses of the networks were performed using the software package EDENetworks version 2.18 (Kivelä et al., 2015); additional analyses were performed using PAST version 2.17c (Hammer et al., 2001).

Paleobiogeographic Provinces
The biogeographic provinces derived from the thresholding approach (Fig. 1) are listed in Table 1, and many of them correspond to previously recognized provinces (Ke et al., 2016).In addition, the network analysis retrieved their durations in geologic time as described in the original study (Ke et al., 2016).This is regarded here as "proof of concept" that network analysis is a viable tool to investigate biogeography through geologic time.Figure 2 shows the network plotted onto a series of paleogeographic maps.It illustrates the distinctiveness and longevity of the Austrazea province compared to all other provinces, and shows that the main breaks in time between the Tethyan provinces are along the boundaries of the Early, Middle, and Late Triassic.These boundaries were originally defined based on ammonites and microfossils from low latitudes (Visscher, 1992;Kozur, 2003).Finding this zonation also among Tethyan brachiopod provinces suggests that the mechanisms resulting in faunal changes among ammonites and the microfossils also affected the brachiopods.

Impact of Geography and Geologic Age on Faunal Similarity
Adding stratigraphic duration to the properties of biogeographic provinces raises the question of whether geography or geologic age is the main force shaping paleobiogeographic pattern.To address this question, the impacts of each factor were first analyzed individually, and then relative to each other.Regarding age, there is a strong positive correlation between the weights of the links and their geologic ranges (ordinary least-square test, p < 0.0001; Fig. 3, left panel).In other words, the closer the localities are in age, the more similar they are in faunal composition.Regarding geography, there is a strong positive correlation between the weights of the links and the geographic distance of their nodes (p < 0.0001; Fig. 3, right panel).Thus the closer two nodes are geographically, the greater their faunal similarity.Because geographic distance was calculated as great-circle distance and thus without taking geographic barriers such as land bridges into account, a second comparison was made using ocean basins to define geographic proximity.Mean and median weights of links between nodes from the same ocean basin are in almost all cases lower than those of links between nodes from different ocean basins (Table 2), confirming the result from the comparison to geographic distance.
In summary, both geography and age influenced faunal similarity among Triassic brachiopods, so which factor had a greater impact?I compared the fractions of links "within ocean basins" to those of links "within geologic stages".Because different numbers of ocean basins and geologic stages are compared (five ocean basins versus seven geologic stages), these fractions need to be compared to some expected values.The expected values are derived from the fully connected network (a hypothetical version of the network in which all nodes are connected to all others) and represent the fractions of links "within ocean basins" and "within geologic stages" in this fully connected network.In the actual networks, there are consistently more links "within ocean basins" compared to the expected value than there are links "within geologic stages" (Table DR1 in the GSA Data  Repository 1 ).This indicates that geographic proximity has a stronger impact on faunal similarity than proximity in geologic age.Network topology supports this conclusion.At thresholds around the percolation point, three geographic realms are evident: (1) Gondwana realm, 1 GSA Data Repository item 2017229, Table DR1 (summaries of the links by geologic stage and ocean basin) and Table DR2 (correlations between sampling intensity and latitude), is available online at http:// www.geosociety.org/datarepository/2017/or on request from editing@geosociety.org.
(2) northern Panthalassa and the Boreal Ocean, and (3) the tropical Tethyan realm (Fig. 1).If geologic age had a stronger impact on faunal similarity, then the major divisions at these high thresholds should have been along stratigraphic boundaries.

Regional Differences in Connectedness
At a threshold of 0.99, the average weights of links within and across ocean basins show that localities in the southern Panthalassic Ocean have by far the highest average degree of connectedness, whereas those in the Tethys Ocean have the lowest (Table 2).This indicates that localities in the Tethys Ocean are the most heterogeneous in faunal composition.This pattern could have various causes, including sampling biases, varying evolutionary rates, habitat heterogeneity, or other physical or biological heterogeneities.Higher evolutionary rates in the   tropics (Jablonski et al., 2006;Kiessling et al., 2010) could result in lower biogeographic connectedness simply because newly evolved taxa are unlikely to have a wide geographic distribution.The average numbers of FAs per ocean basin are in the opposite order as the average weights of the links within those basins: Tethyan sites have the highest average number of FAs, whereas the southern Panthalassic sites have the lowest (Tethys = 6.03; western Panthalassa = 5.33; eastern Panthalassa = 2.85; Boreal = 1.75; southern Panthalassa = 1.6).This indicates that higher evolutionary rates indeed result in lower biogeographic connectedness.
The tropical Tethys Ocean could potentially have had a higher habitat diversity than the other ocean basins: reefs and carbonate platforms were common in the Tethys Ocean, but were largely absent from the other regions (Martindale et al., 2015).Rigorously testing this hypothesis might be possible by compiling data on habitat types at the investigated localities, but this was beyond the scope of the present paper.Uneven sampling can potentially distort the results of any tive approach to biogeography, and network analysis is no exception to this.I tested the data set for latitudinal sampling disparities that could be inflating origination rates in the Tethys Ocean relative to other regions.The nodes showed no correlation between their latitude and the number of individual sites per node (Table DR2).Thus sampling bias is unlikely to have affected the regional differences in biogeographic connectedness.

CONCLUSIONS
Network analysis was used here to investigate the evolution of Triassic brachiopod biogeographic provinces through geologic time.The method retrieved many previously recognized paleobiogeographic provinces; in addition, it (1) recognized the stratigraphic ranges of these provinces, which had only been outlined in a narrative way in the original study (Ke et al., 2016), and (2) allowed an intuitive visualization of the results.Furthermore, the approach allowed the detection and quantification of regional differences in duration and connectedness of the paleobiogeographic provinces, showing that provinces in the tropical Tethys Ocean were short lived and had low connectedness, whereas the highest connectedness and longevity were found in the southern Panthalassic ocean.This pattern is linked to evolutionary rates: FAs of genera were highest in the Tethys and tropical western Panthalassa, and lowest in high latitudes.
Perhaps the most remarkable property of the Triassic brachiopod paleobiogeographic network is that through ~50 m.y. of time, geographic proximity had a stronger impact on faunal similarity than geologic age.It would be interesting to investigate (1) how much geologic time it takes until geologic age plays a greater role (if at all), (2) which role mass extinction events might play in this context, and (3) which effect the taxonomic level (species, genus, family) might have on these patterns.
The strong impact of geographic proximity on faunal similarity emphasizes that present-day biogeographic distributions also have a strong historical basis, and that investigating this historical basis will lead to a better understanding of the biogeography of the world today (Sanmartín and Ronquist, 2004;Krug et al., 2010).Other future research directions in network paleobiogeography include the exploration of directed networks (Leicht and Newman, 2008), modeling of ideal scenarios and comparing them to real world examples, assessing the impact of sampling biases, and many more.I hope that the approach outlined here will be explored further and will result in a better understanding of the dynamic evolution of biogeography through geologic time.
Figure 1.Triassic brachiopod paleobiogeographic network at two different thresholds.Color coding of nodes is by ocean basin; thickness and color coding of links is by weight (thicker and lighter = higher faunal similarity); numbers correspond to those in Table 1.Node letters indicate their geologic age: G-Griesbachian; O-Olenekian; A-Anisian; L-Ladinian; C-Carnian; N-Norian; R-Rhaetian.

Figure 3 .
Figure 3. Relationships between weights of links in Triassic brachiopod network and their geologic ranges (left panel) and geographic distance they span (right panel), at threshold of 0.99.

Figure 2 .
Figure 2. Triassic brachiopod network thresholded at 0.65 plotted on a series of paleogeographic maps (adapted from Scotese, 2004); node and line colors as in Figure 1.