Abstract
The U.S. Geological Survey has constructed a paleontological database for the Great Basin physiographic province that can be served over the World Wide Web for data entry, queries, displays, and retrievals. It is similar to the web-database solution that we constructed for Alaskan paleontological data (www.alaskafossil.org). The first phase of this effort, recently completed, was to compile a paleontological bibliography for Nevada and portions of adjacent states in the Great Basin. We are also compiling paleontological reports (known as E&R reports) of the U.S. Geological Survey, which are another extensive source of legacy data for this region. Initial population of the database benefited from a recently published conodont data set and is otherwise focused on Devonian and Mississippian localities because strata of this age host important sedimentary exhalative (sedex) Au, Zn, and barite resources and enormous Carlin-type Au deposits. In addition, these strata are the most important petroleum source rocks in the region, and record the transition from extension to contraction associated with the Antler orogeny, the Alamo meteorite impact, and biotic crises associated with global oceanic anoxic events.
The finished product will provide an invaluable tool for future geologic mapping, paleontological research, and mineral resource investigations in the Great Basin, making paleontological data acquired over nearly the past 150 yr readily available over the World Wide Web. The structure of the database and the web interface developed for this effort are described herein. This database is being used as a model for a National Paleontological Database (which we are currently developing for the U.S. Geological Survey) as well as for other paleontological databases now being developed in other parts of the globe.
INTRODUCTION
Paleontological data are one of the fundamental pillars upon which geologic maps and stratigraphic columns are constructed. At present very few practicing stratigraphic paleontologists are working in the U.S., making it extremely difficult to get new field collections identified and assessed in the context of previously acquired paleontological data. To remedy this problem, it is imperative to preserve and archive previously generated paleontological data into databases that facilitate entry, retrieval, use, and updating of the data by geologists and other interested parties. This is an extremely time-consuming and almost impossible task for an individual to accomplish using traditional methods, but is feasible using modern information technology.
We modeled the Great Basin paleontological database after our previous work to generate a web-based paleontological database for Alaska (Zhang and Blodgett, 2003; Zhang et al., 2003; Blodgett and Zhang, 2007). Data compilation in Alaska was slightly easier due to the existence of previously published bibliographies on various aspects of the fossil fauna found there (Dutro, 1956; Addicott, 1971; Wilson, 1981). For more modern references in Alaska, an extensive literature search was required, utilizing such sources as Georef and the Zoological Record, as well as contacting paleontologists who have actively worked in Alaska. In this regard, we have likewise contacted many paleontologists working in the Great Basin to acquire similar lists of publications pertaining to their work in this region. These publications include not only systematic papers, but also stratigraphic papers; the latter often contain a wealth of paleontologic information in the form of lengthy faunal and/or floral lists. This database, as well as the preexisting Alaska Paleontological Database, is being used as a model for a National Paleontological Database (which we are currently developing for the U.S. Geological Survey [USGS]) as well as for other paleontological databases now being developed in other regions of the globe.
The information compiled to populate the Great Basin paleontological database is derived from published articles, unpublished theses, unpublished USGS fossil reports (known also as E&R [Evaluation and Report] reports), as well as released industry data. Acquisition of data from the literature is straightforward. While acquisition of theses is slightly more problematic, interlibrary loans have allowed us to obtain many references of interest. The aforementioned articles and theses are organized by geologic age and taxonomic group in the paleontological bibliography (Blodgett et al., 2007). The only previous fossil compilation in this region (Coats, 1986) dealt solely with Elko County.
The vast collection of E&R reports for the Great Basin remains to be indexed. They were prepared by members of the Branch of Paleontology and Stratigraphy, which was disbanded in 1995 as part of a reorganization of the USGS. The Branch of Paleontology and Stratigraphy personnel were primarily situated in three locations: USGS National Headquarters in Reston, Virginia (formerly based in Washington, D.C.); Denver, Colorado; and Menlo Park, California. The only complete set of E&R reports is in Reston; however, some copies are also available in Denver, notably for reports completed by paleontologists based there. We recently were granted access to the archive of E&R reports in Reston and made copies of a significant portion of those relevant to the Great Basin.
Initial data entry is focused on Devonian and Mississippian fossil localities in the Great Basin because strata of this age host sedimentary exhalative (sedex) Au, Zn, and barite deposits (Emsbo, 2000; Koski and Hein, 2004) and Mississippi Valley–type (MVT) Zn-Pb deposits and related occurrences of hydrothermal dolomite (Diehl et al., 2005), as well as enormous Carlin-type Au deposits (Cline et al., 2005). The USGS Metallogeny of the Great Basin Project (Hofstra and Wallace, 2006) used such fossil data to better constrain the ages and depositional environments of the rock units that host these deposits. Fossil localities in Devonian–Mississippian rocks also are of interest because they are the most important petroleum source rocks in the region (Sandberg and Poole, 1978), track the transition from extension to contraction associated with the Antler orogeny (e.g., Dickinson, 2006, and references therein), contain breccias associated with the Alamo meteorite impact (Morrow et al., 2005), and record biotic crises related to global oceanic anoxic events (Bond and Wignall, 2005). We also added information from a new database on Cambrian–Triassic conodonts in the state of Nevada (Harris and Crafford, 2007). An enormous amount of paleontological data has been gathered that remains to be entered.
DATABASE DESCRIPTION
Our database deals with both legacy and current data. E&R reports, published literature, and released industry reports do not have a standard format or structure. These reports contain data with highly variable spatial or age resolution. Some reports have ambiguous or outdated taxonomic, age, or stratigraphic assignments. Nonetheless, these reports contain a wealth of geological information and are the result of years of field exploration efforts. The database must therefore be designed to take into account problems inherent in the data sources. The goal is to preserve the original data, including its inconsistencies and inaccuracies, while allowing for updates and queries in the context of current knowledge. Below, we discuss how a modular design approach allowed us to construct a database structure that is flexible enough to accommodate many different types of data.
Database Structure
Figure 1 provides an overview of our database structure. The database is composed of seven modules: Locality, Taxon, Age, Formation, Environment, Image, and Reference. Each of the seven modules consists of one or more tables that store a set of related data. The modular design has several advantages. It is easier to develop conceptually because each module represents a coherent set of information. Each module is a self-contained minidatabase that can be developed independently. Once developed, each module becomes a reusable object that can be plugged into other geological databases built with the modular approach. Database functionality can be increased by adding different modules as needed. For example, the Image and Environment modules are late additions to our database. Not all modules need to be implemented as a database object. For example, images can be stored either as binary large objects (BLOBs) in a database table or as image files in a file system. There is an ongoing debate among database developers on the merits of either approach. We store the image index system in the database, but the images are saved as TIFF (tagged image file format) files in the Windows file system to avoid having a bloated database. Additional modules, such as web services and account management, were planned. However, they are not included in the current implementation because of funding shortfalls.
The database was first implemented in the Microsoft SQL Server Desktop Engine (MSDE) and later upgraded to Microsoft SQL Server 2005 Express. The database structure can be created in any SQL compatible relational database management system (RDBMS). Complete lists of modules and tables of the database are provided in Appendix tables A and B, respectively. In the following sections, the Locality and Age modules are described in detail to show how they are constructed and how we solve some of the common problems in designing a paleontological database.
Locality Module
The Locality module forms the core structure of our database. Figure 2 illustrates the Locality module and its relationship with the Taxon, Age, and Reference modules. Many paleontological databases have adopted a structure similar to that depicted in Figure 2 (Markwick and Lupia, 2002). However, there are some meaningful differences. Most databases have a “one-to-many” relationship between the age table and the locality table, limiting a locality to a single age assignment. In reality, it is not uncommon to have multiple reports for a locality and consequently different age assignments for the same locality. In order to track who assigned what age to a locality, it is necessary to have a many-to-many relationship between these two tables. The same relationship is also applicable to locality comments and taxon assignments. Please note the difference between the description field of the Locality table and the comment field of the Locality_comment table. The description field records the field observation, such as location and outcrop description, of a locality. One locality should only have one description. If a locality has different descriptions from separate reports, they should be combined into a single description field. On the other hand, the comment field contains interpretations made in each report regarding a locality. It is important for users to see the reporting history of a locality with interpretations by various authors. Tables such as Locality_age and Locality_comment also help to keep an audit trail of any updates. When new data or interpretations become available, they can be appended to the locality as a new comment or age assignment by a new reference. Tracking changes in interpretation is an important feature that encourages online collaboration (Johnson et al., 2005).
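The many-to-many relationship described above can be sketched with a small in-memory example. This is only an illustration under stated assumptions, not the schema of the actual database: the table names Locality, Locality_age, and Locality_comment come from the text, but every column name, the simplified Reference table, the use of a plain age name instead of a key into the Age module, and the sample rows are hypothetical. SQLite stands in for SQL Server so that the sketch is self-contained.

```python
import sqlite3


def build_locality_schema(conn):
    """Create simplified tables; all column names are assumptions."""
    conn.executescript("""
        CREATE TABLE Locality (
            locality_id  INTEGER PRIMARY KEY,
            field_number TEXT NOT NULL,
            description  TEXT              -- one description per locality
        );
        CREATE TABLE Reference (
            reference_id INTEGER PRIMARY KEY,
            citation     TEXT NOT NULL
        );
        -- Link table: one row per (locality, reference) age assignment,
        -- so a locality can carry several ages, each tied to its source.
        CREATE TABLE Locality_age (
            locality_id  INTEGER NOT NULL REFERENCES Locality(locality_id),
            reference_id INTEGER NOT NULL REFERENCES Reference(reference_id),
            age_name     TEXT NOT NULL,
            PRIMARY KEY (locality_id, reference_id, age_name)
        );
        -- Interpretive comments are likewise stored per reference.
        CREATE TABLE Locality_comment (
            locality_id  INTEGER NOT NULL REFERENCES Locality(locality_id),
            reference_id INTEGER NOT NULL REFERENCES Reference(reference_id),
            comment      TEXT NOT NULL
        );
    """)


def age_history(conn, field_number):
    """Return (citation, age) pairs showing who assigned which age."""
    return conn.execute("""
        SELECT r.citation, la.age_name
        FROM Locality l
        JOIN Locality_age la ON la.locality_id = l.locality_id
        JOIN Reference r ON r.reference_id = la.reference_id
        WHERE l.field_number = ?
        ORDER BY r.reference_id
    """, (field_number,)).fetchall()


conn = sqlite3.connect(":memory:")
build_locality_schema(conn)
# Hypothetical sample rows, for illustration only.
conn.execute(
    "INSERT INTO Locality VALUES (1, '67ABa-10a', 'Hypothetical outcrop')")
conn.executemany(
    "INSERT INTO Reference VALUES (?, ?)",
    [(1, "Report A (hypothetical)"), (2, "Report B (hypothetical)")])
conn.executemany(
    "INSERT INTO Locality_age VALUES (?, ?, ?)",
    [(1, 1, "Devonian"), (1, 2, "Emsian")])

history = age_history(conn, "67ABa-10a")
```

Because each age assignment is a separate row keyed to a reference, a later reinterpretation is appended rather than overwriting the earlier one, which is what preserves the reporting history.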
Unique Locality Identifier
Lehnert et al. (2000) discussed the importance of unique identification of samples in a database. Data integration depends on unambiguous identification of samples. The term locality in our database is synonymous with the term sample. We chose “locality” over “sample” because the former is commonly used in the literature from which we derived our data. Localities are commonly identified in the field with a field number. Additional numbers, such as a USGS locality number or museum number, are often added to a locality as different investigators analyze the data. Although none of the numbers are guaranteed to be unique, they represent the only available identifiers for localities. Among them, the field number is the most commonly applied. A field number from E&R reports is often written in the form “year + initials + a number,” such as “67ABa-10a.” The same field number can appear in several variant forms depending on how it is delimited. For example, the field number “67ABa-10a” can be written in the following ways: “67ABa10a,” “67 ABa-10-a,” and “67-ABa 10(a).” Humans can recognize that these variants represent the same number, but it is difficult for a machine to do so. A special program has to be written for the database to enable it to recognize the common variants of a field number. After normalization, the field number provides the most reliable identifier for localities in our database.
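A minimal sketch of such a normalization step is given here. It is not the program used for the database; it simply assumes that every character other than a letter or digit is a delimiter and that letter case is meaningful and should be kept. The function name is hypothetical.

```python
import re


def normalize_field_number(raw):
    """Drop delimiters (spaces, hyphens, parentheses, and so on).

    Assumption: any non-alphanumeric character is a delimiter, and
    letter case is preserved because it may distinguish collectors.
    """
    return re.sub(r"[^0-9A-Za-z]", "", raw)


# The variants listed in the text all reduce to the same key.
variants = ["67ABa-10a", "67ABa10a", "67 ABa-10-a", "67-ABa 10(a)"]
normalized = {normalize_field_number(v) for v in variants}
```

Storing the normalized key alongside the field number as originally written would allow matching on the key while still displaying the original form.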
Age Module
The GeoWhen Database (www.stratigraphy.org/geowhen) provides the most comprehensive geological time scale with absolute age assignments for every age unit ever published. Although many of these age assignments are tentative and will likely be modified continuously, they help to anchor age unit boundaries in absolute numbers that are clear to everyone. Unless very high resolution is required, GeoWhen offers a convenient tool for most paleontological applications. Using the system provided by the GeoWhen Database, any age assignment to a locality can be calibrated to a time interval. This makes legacy data integration a straightforward process.
To adapt the time scale from GeoWhen to our database, the text version of the Geological Timeline data set was downloaded from the GeoWhen Database Web site. The content of the downloaded file was then parsed and imported into three tables: an Age_unit table that stores age unit names; an Age_time table that stores age boundaries in Ma; and an Age_unit_time table that establishes the lower and upper boundaries of an age unit via foreign keys, i.e., “age_id” from the unit table, “time_start_id” and “time_end_id” from the time table. The Age_version and the Age_rank tables were added to store the versions of time scale and unit ranking systems, respectively. The “age_version_id” and “age_rank_id” foreign keys in the Age_unit_time table help to track the rank and the boundary time assignments of a particular unit according to different versions of geological time scale. Not all units are given the same rank from version to version. The Region table is added for the purpose of filtering the list of unit names during data entry for a region.
The Age module as illustrated in Figure 3 is very versatile. It allows any legacy record to be entered with the original age assignment even if the age unit is obsolete. Searches based on the current time scale will bring up any related records whether or not the original age assignment is based on a local or obsolete age name. A search based on a beginning and an ending stage will include all records that are assigned to stages bracketed between the two bounding stages. The system also allows for easy update for (1) an age boundary that affects one or more age units, (2) correlation between two units, (3) parent-child relations (e.g., a stage is reassigned to another period), and (4) the hierarchical rank of an age unit. Any update on an age unit can be made in one place but affects all the records assigned to the age unit. This age module can be integrated into any geological database that requires a time scale.
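The behavior described above can be illustrated with a reduced version of the three core tables. The sketch below makes several assumptions: it omits the Age_version, Age_rank, and Region tables, the column names other than age_id, time_start_id, and time_end_id are hypothetical, the Locality_age table is simplified, SQLite stands in for SQL Server, and the boundary ages are rounded placeholder values rather than values taken from GeoWhen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Age_unit (age_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Age_time (time_id INTEGER PRIMARY KEY, ma REAL NOT NULL);
    CREATE TABLE Age_unit_time (
        age_id        INTEGER REFERENCES Age_unit(age_id),
        time_start_id INTEGER REFERENCES Age_time(time_id),
        time_end_id   INTEGER REFERENCES Age_time(time_id)
    );
    CREATE TABLE Locality_age (
        locality TEXT NOT NULL,
        age_id   INTEGER REFERENCES Age_unit(age_id)
    );
""")
# Placeholder boundaries in Ma (rounded, illustrative only).
conn.executemany("INSERT INTO Age_time VALUES (?, ?)",
                 [(1, 419.0), (2, 408.0), (3, 393.0), (4, 359.0), (5, 323.0)])
conn.executemany("INSERT INTO Age_unit VALUES (?, ?)",
                 [(1, "Devonian"), (2, "Emsian"), (3, "Mississippian"),
                  (4, "Local stage X (hypothetical)")])
# Units share boundary rows; the local stage is correlated with the Emsian.
conn.executemany("INSERT INTO Age_unit_time VALUES (?, ?, ?)",
                 [(1, 1, 4), (2, 2, 3), (3, 4, 5), (4, 2, 3)])
conn.executemany("INSERT INTO Locality_age VALUES (?, ?)",
                 [("loc-1", 2), ("loc-2", 4), ("loc-3", 3)])

BOUNDS_SQL = """
    SELECT s.ma, e.ma
    FROM Age_unit u
    JOIN Age_unit_time ut ON ut.age_id = u.age_id
    JOIN Age_time s ON s.time_id = ut.time_start_id
    JOIN Age_time e ON e.time_id = ut.time_end_id
    WHERE u.name = ?
"""


def unit_bounds(conn, name):
    """Return (start_ma, end_ma) for a named age unit."""
    return conn.execute(BOUNDS_SQL, (name,)).fetchone()


def localities_within(conn, name):
    """Localities whose assigned unit falls inside the named unit."""
    start_ma, end_ma = unit_bounds(conn, name)
    rows = conn.execute("""
        SELECT DISTINCT la.locality
        FROM Locality_age la
        JOIN Age_unit_time ut ON ut.age_id = la.age_id
        JOIN Age_time s ON s.time_id = ut.time_start_id
        JOIN Age_time e ON e.time_id = ut.time_end_id
        WHERE s.ma <= ? AND e.ma >= ?
        ORDER BY la.locality
    """, (start_ma, end_ma)).fetchall()
    return [r[0] for r in rows]


devonian = localities_within(conn, "Devonian")

# Revising one shared boundary row updates every unit that uses it.
conn.execute("UPDATE Age_time SET ma = 360.0 WHERE time_id = 4")
devonian_bounds = unit_bounds(conn, "Devonian")
mississippian_bounds = unit_bounds(conn, "Mississippian")
```

In this sketch, a search on the Devonian retrieves the locality recorded under the hypothetical local stage name, because the comparison is made on calibrated boundaries rather than on unit names. A single boundary update likewise propagates to both units that share it.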
Expanding Database Functionality
Modules can be added to the database to expand its function. The Environment module was created after the database was already constructed. The Environment module enables the assignment of sedimentary environment of deposition to a locality based on either Cook's carbonate platform model (Cook and Corboy, 2004) or Boucot's benthic assemblages (Boucot, 1975). One pitfall we have encountered in this regard is that many paleontological papers from the Great Basin are sorely lacking in complementary sedimentological data or interpretation.
Other modules such as a geochemical module can be developed for the database if there is a need. A future goal is to expand the fossil database into a comprehensive sedimentary rock database that can be linked to other subject data sets to enable geologists to gain a better understanding of the relationships between various geological features and associated mineral resources in the Great Basin.
WEB APPLICATION DESCRIPTION
The Great Basin Paleontological Database is built as a web application based on a three-tiered architecture: (1) a back-end relational database, (2) a middle tier of business logic, and (3) a front-end web application. The web application is developed using ASP.NET technology and includes a geographic information system (GIS) module that was developed using AspMap (www.vdstech.com/aspmap.htm). The web application comprises a business layer and a presentation layer. Business logic is implemented through a number of classes in the business layer, while all the web pages are generated from the presentation layer. The site is hosted at www.alaskafossil.org during its development. It will be transferred to a USGS server upon completion.
User Interface—Data Entry
When possible, data are batch loaded into database tables. For example, a short program was written to import over 2600 records from the conodont color alteration index (CAI) data set of Harris and Crafford (2007). However, most legacy data in publications and E&R reports require manual entry. Eight data entry forms have been developed for manual data entry into the following modules: Locality, Reference, Taxon, and Image. Entry forms associated with other modules will be developed later.
Forms authentication is used to guard against unauthorized access to data entry forms. Once a user logs in with the proper password, data entry menu items become visible. Because there can be simultaneous data entries by different users, it is important to prevent users from entering duplicate records. Each form has a built-in function to warn users about a possible duplication during the data entry process. For example, when entering a reference, the author, publication date, and the reference title are used together to check whether a reference record already exists in the database. The check for duplicate records is carried out automatically as soon as a user finishes the required field. The locality entry form relies on the locality field number to check for possible duplicate records. As a user types in a field number, the system automatically checks if the same number already exists in the database. Once the system detects a duplicate field number during a data entry session, a popup list displays the locality number and the associated reference source. There are three possible cases when a duplicate field number is encountered. (1) The same record from the same reference source is already entered. The record should not be entered again. (2) Although the same locality is entered, it is from another reference. In this case, a relevant entry from the popup list mentioned above should be selected. The selection automatically copies the description and location data of the locality into the entry form. The interpretive data, such as the fossil occurrence list and age assignment, can then be added to the locality. (3) For multiple localities that share the same field number, a check box labeled “Stop dup check” can be selected; thus a new locality can be entered with the same field number.
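The reference duplicate check can be sketched as a lookup on a composite key built from the three fields named above. The sketch is written in Python rather than ASP.NET for brevity and is not the production code. The class and function names are hypothetical, the sample reference is invented, and the case and whitespace normalization is an assumption, since the text states only that author, date, and title are used together.

```python
def _normalize(text):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def reference_key(author, year, title):
    """Composite key from author, publication year, and title."""
    return (_normalize(author), int(year), _normalize(title))


class ReferenceRegistry:
    """In-memory stand-in for the Reference table."""

    def __init__(self):
        self._ids_by_key = {}
        self._next_id = 1

    def find_duplicate(self, author, year, title):
        """Return the existing record id, or None if the key is new."""
        return self._ids_by_key.get(reference_key(author, year, title))

    def add(self, author, year, title):
        """Insert unless a duplicate exists; return (id, was_created)."""
        existing = self.find_duplicate(author, year, title)
        if existing is not None:
            return existing, False
        new_id = self._next_id
        self._ids_by_key[reference_key(author, year, title)] = new_id
        self._next_id += 1
        return new_id, True


registry = ReferenceRegistry()
# Hypothetical reference, entered twice with different case and spacing.
first = registry.add("Example, A.B.", 2007,
                     "A hypothetical conodont data set")
second = registry.add("example,  a.b.", "2007",
                      "A Hypothetical Conodont Data Set")
other = registry.find_duplicate("Example, A.B.", 2006,
                                "A hypothetical conodont data set")
```

In a multiuser setting, the same key would more safely be enforced by a unique constraint in the database itself, so that two simultaneous entries cannot both pass the check.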
In entry form design, there is a trade-off between how quickly a record can be entered versus how easily the data entry process can be followed. Our data entry forms are designed for speed. A single page form is preferred over multipage or multitab forms. Data entry fields are organized in sections on a page. The locality entry form (Figs. 4 and 5) has four sections: description and location; age, formation, and environment assignment; locality type; and fossil occurrence. The latter two sections are expandable sections. When the well or measured section is selected, the locality type section expands to display additional fields for entering such data as interval or name for a measured section. A click on the “Enter Fossils” button expands the fossil occurrence section. Three dropdown lists, group, genera, and species, allow the user to quickly pick a taxon from the lists. The fourth dropdown list contains such terms as “cf.” or “taxon?” to help define the confidence level of a fossil identification.
User Interface—Browsing and Searching
We strove to design an intuitive user interface. Web pages are organized in an easy-to-follow workflow. Top menu items such as Locality, Age, Taxon, or Map allow users to browse the complete data set in either a tabular form or a map view. Hot links and dropdown lists are provided on each page, helping users to either carry out quick queries or jump to a relevant page. Users can often find what they are looking for in a few clicks. The Search page provides advanced search capability. Here data can be queried by a combination of criteria, including author, collector, age, formation, taxon, location, or locality number.
The interactive map is both a query and visualization tool. The map filter panel allows users to view a plot of paleontological data by age, taxon, environment, or conodont CAI on a map. Figure 6 provides an example of the database query and display capability and shows the distribution of Mississippian conodont CAI values from Harris and Crafford (2007). Users can vary the time period to identify patterns and trends of data sets on a map. Better base maps will be added later to enhance the visualization experience. Data points on the map are selectable and they are linked to the Locality Detail page, which displays the complete information entered for a locality. If the locality is a part of a core or measured section, its stratigraphic position in the core or section is shown in a column along with related localities (Fig. 7).
Query results on localities are downloadable. Once a user narrows the locality list by one or more search criteria, a download button appears at the bottom of the locality list on the Locality page. Clicking the button leads to the Download page. The default download data set includes eight fields: field locality number, USGS locality number, county, quadrangle, latitude, longitude, age, and formation. Users can choose to include other fields such as the associated authors or fossils for each locality. The data set is downloaded as a tab-delimited text file, which can be imported into a spreadsheet program. Because the coordinates are given in decimal degrees, the locality data can also be easily incorporated into a GIS application such as ArcGIS to create a paleontological data layer on a user's machine.
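As an illustration of how a downloaded file might be prepared for a GIS layer, the sketch below reads tab-delimited text and builds a GeoJSON-style point collection. The column headers are assumptions, because the actual header names of the download file are not given here, and the sample row is invented.

```python
import csv
import io

# Assumed header names for the eight default download fields.
COLUMNS = ["field_number", "usgs_number", "county", "quadrangle",
           "latitude", "longitude", "age", "formation"]


def localities_to_geojson(tsv_text):
    """Convert tab-delimited locality rows to a GeoJSON-style dict."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    features = []
    for row in reader:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [float(row["longitude"]),
                                float(row["latitude"])],
            },
            "properties": {k: v for k, v in row.items()
                           if k not in ("latitude", "longitude")},
        })
    return {"type": "FeatureCollection", "features": features}


# Invented sample row, for illustration only.
sample_row = ["67ABa-10a", "0000-SD", "Example County",
              "Example Quadrangle", "39.5", "-116.0",
              "Emsian", "Example Formation"]
sample = "\t".join(COLUMNS) + "\n" + "\t".join(sample_row) + "\n"
collection = localities_to_geojson(sample)
```

The resulting dictionary can be written out with `json.dump` and loaded as a point layer by GIS software that reads GeoJSON.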
The description above offers an overview of our web application. Interested readers are encouraged to visit our web site to experience and examine the user interface design in detail.
APPLICATIONS OF THE GREAT BASIN PALEONTOLOGICAL DATABASE
The utility of the Great Basin paleontological database is multifaceted. Primarily one can conceive of it as providing basic ground truth for fundamental future geologic mapping at any scale within the region. However, many other uses are also obvious. Many land management agencies (e.g., U.S. Park Service, Bureau of Land Management, U.S. Forest Service) have a great interest in preserving and protecting paleontological resources within their domains, and this database would provide them with a wealth of information to which they otherwise have no access.
Another important aspect of the database is its application to the determination of Paleozoic paleogeographies of the Great Basin region, including recognition of differing carbonate shelf morphologies (i.e., homoclinal ramp versus distally steepened ramp versus rimmed platform), an important issue for mineral and oil exploration workers. A critical study of the communities represented digitally on map projections should identify shelf-to-basin transitions and their character, e.g., in the case of the Silurian and Lower Devonian, by plotting shelfal shelly megafossil associations (communities) against those of more offshore graptolitic facies community associations. Being in the initial phase of data entry, we focused on data from the Lower Devonian, because it is an interval for which considerable published information now exists (e.g., rugose corals: Merriam, 1973, 1974; Pedder and Murphy, 1998, 2003, 2004; brachiopods: Johnson, 1970, 1986, 1990; Johnson and Kendall, 1976). Other faunal groups of considerable paleoecological significance also are well represented in this interval, but have not yet been entered (e.g., ostracodes: Berdan, 1977, 1986; Kennedy, 1977; gastropods: Blodgett et al., 1988; trilobites: Haas, 1969). Figure 8 shows the distribution of paleoenvironments (based in large part on paleocommunity data) we currently recognize for the Emsian Stage (upper Lower Devonian). According to the data we have entered, we recognize both platform and slope environments, but no platform margin environments can be recognized, in accordance with the absence of coral-stromatoporoid reef complexes during the Emsian of the Great Basin. In contrast, Emsian age coral-stromatoporoid buildups are readily recognized along the platform margin of the North American craton in both Alaska and Yukon Territory (Clough and Blodgett, 1984, 1988a, 1988b). The absence of Emsian coral buildups in the Great Basin was confirmed by A.E.H. Pedder (2007, personal phone communication), a Devonian coral specialist with significant experience with Great Basin coral faunas. Currently we believe the carbonate depositional model that best approximates the data of Emsian strata of the Great Basin is that of a homoclinal ramp, or, less likely, a distally steepened ramp. This is in contrast with the Lochkovian (lowermost stage of the Lower Devonian), when a rimmed platform model was applicable to the Great Basin (portions of the Lone Mountain Dolomite representing a reef complex). The absence of coral-stromatoporoid buildups in post-Lochkovian Lower Devonian strata of the Great Basin is possibly attributable to reduced seawater temperatures during that time (Pedder and Murphy, 2004), compared to warmer temperatures before and after this interval.
Paleobiogeographers can also benefit from the database, as it will provide them with faunal (or floral) lists from a voluminous data set that would otherwise take a researcher many years to accumulate and catalog. In summary, this database and its web interface for data entry, queries, display, and retrieval will provide an invaluable tool for future geological investigations in the Great Basin. Such a tool could ultimately be developed to facilitate utilization of fossil data in investigations in other regions in the U.S.
ACKNOWLEDGMENTS
The initial efforts to gather data and construct the web interface greatly benefited from the advice of many geologists and paleontologists who have an active working interest in the geology of the Great Basin. Discussions with William R. Page, A. Elizabeth J. Crafford, Arthur J. Boucot, Norman J. Silberling, George D. Stanley, Jr., Forrest G. (Barney) Poole, Jared D. Morrow, and Charles A. Sandberg were especially fruitful.