Development of an integrated paleontological database and Web site of Florissant collections, taxonomy, and publications
Published: January 01, 2008
Herbert W. Meyer, Matthew S. Wasson, Brent J. Frakes, 2008. "Development of an integrated paleontological database and Web site of Florissant collections, taxonomy, and publications", Paleontology of the Upper Eocene Florissant Formation, Colorado, Herbert W. Meyer, Dena M. Smith
A detailed survey of collections and publications for the Florissant fossil beds (Colorado, USA) forms the basis for developing a new relational database and Web site that documents information that had become widely scattered following 130 years of scientific study at Florissant. More than 1700 species that remain valid, mostly of plants, insects, and spiders, had been described in more than 300 publications, and these published specimens had been dispersed among ∼15 museums. Some of these specimens were not well documented in original publications and many of the type specimens had never been illustrated.
Catalog data were compiled on-site at museums, specimens were photographed, and all of the publications referring to Florissant specimens were located. Taxonomic classification of the fossils was updated to modern concepts. A relational database incorporates the data into five core tables for specimens, bibliography, references to specimens in publication, taxonomy, and images. The database allows for complex searches to interrelate these categories, enabling new research and facilitating collections management. Examples show that the largest number of scientific publications and new species descriptions appeared from 1890 to 1920 and that most of the originally described insect species, but only about half of the plant species, still remain valid and unrevised. Digital images of the fossils and digital files for pre-1923 publications form an archive that is linked to the data records. A Web site makes the database publicly accessible for technical use, and also provides a less complex application for the layperson as well as a new college-level curriculum.
The paleontological site at Florissant, Colorado, is world renowned for its high taxonomic diversity, primarily of fossil plants, insects, and spiders. The fossils occur in lacustrine, fluvial, and lahar deposits of the Florissant Formation, which is dated at 34.07 Ma and represents the end of the late Eocene (Evanoff et al., 2001). Tremendous scientific attention by numerous paleontologists for more than a century resulted in the description of hundreds of species, but this information was documented in publications and collections so widely dispersed, both in the literature and among different museums, that it became very difficult for researchers to access.
In order to locate all published specimens and to create a taxonomic inventory that facilitates specimen-based research at Florissant, the National Park Service (NPS) initiated a project in 1995 to survey all of the existing museum collections and bibliographic references and to integrate these into a single relational database that includes museum collection, publication, and updated taxonomic information (Meyer, 1998; Meyer et al., 2002; Wasson and Meyer, 2004). Scanned photographic images for almost all of the published specimens, and digitized copies of many of the relevant publications, are also included. This database is accessible as a public Web site (http://planning.nps.gov/flfo) that primarily serves the needs of researchers, although other applications are being developed to better accommodate educators, students, and the layperson (once completed, these Web sites will be available by link from the Web site for Florissant Fossil Beds National Monument at www.nps.gov/flfo).
This paper discusses the methods by which the Florissant database was developed, as well as its applications to research, education, and interpretation. It is hoped that this will provide a guideline for similar projects that seek to develop databases for other significant fossil sites with similarly complex publication and collection histories.
BACKGROUND AND PURPOSE
Most of the existing collections of Florissant fossils were made during the late 1800s and early 1900s, long before the establishment of Florissant Fossil Beds National Monument in 1969 (Meyer, 2003; Veatch and Meyer, this volume). These collections consist of at least 40,000 specimens that are now accessioned into more than 25 museums, ∼15 of which include the published type specimens that define new species. About 1700 megafossil species that are still considered valid in the literature have been described, and many more species were described but later placed into synonymy in subsequent publications. These specimens have been referenced in more than 300 publications. Almost all of the species represented by these type specimens are unique to Florissant. As the size and dispersal of collections, the number of described taxa, and the body of scientific publications grew over the decades, so too did the complexity and challenge of easily finding this information.
There were many reasons for developing an integrated database to document fully the museum data, citations in publications, and taxonomic placement of the fossils:
A complete compilation of all published specimens listing the museums in which they were housed did not exist, and often the original or subsequent publication of a specimen did not indicate the repository and/or catalog number.
Only a partial bibliography documenting the publication of Florissant specimens had been compiled.
Many specimens had been cited in multiple publications, with the later publications sometimes assigning new taxonomic combinations. These later publications included taxonomic monographs that were not specific to Florissant, yet contained one or more citations of Florissant specimens.
Some museums lacked database programs and relied upon old paper catalog records that had no search capability.
Museum records or specimen labels often included erroneous data, but such inconsistencies frequently could be resolved by comparison with publication data.
Entire collections had moved from one museum to another, and new catalog numbers had been assigned in the process. For example, the entire Princeton University collection from Florissant, which included many type specimens, had moved to the National Museum of Natural History and Yale Peabody Museum.
Corresponding halves (i.e., part and counterpart) of the same specimen in many cases had become disassociated and were in different museums with different catalog records and no cross-referencing to document the existence of the other part.
Half of the 5190 published specimens had never been illustrated in the original publications, including 444 holotypes and 1179 syntypes (although these numbers were unknown until the database was completed). Those that had been illustrated often were depicted only with line drawings or artistic renditions. Photographic documentation was needed in order to provide a consistent format for illustration, including close-up details.
No complete species list had been compiled to assess taxonomic diversity, which is fundamental to understanding the composition of the paleocommunity.
Many taxonomic concepts had changed since the most recently published taxonomic assignments for some of the Florissant species, yet there had been no compilation to update this information. For example, many of the genera and some of the families to which Florissant fossil species were assigned decades ago subsequently had been subsumed under other names, based upon the taxonomic classification of extant organisms, yet this was not documented in the literature pertaining specifically to Florissant.
These various factors had resulted in a widely scattered body of data, both throughout the literature and among different museums, making it difficult for paleontologists to easily locate, access, and correlate the basic information about Florissant's paleontology.
Although many databases were in use by different repositories at the time we began this project, none of these had a structural format that could be easily adapted for our effort. For that reason, it was necessary to create a customized structure for the Florissant database.
DEVELOPMENT OF THE DATABASE
Development of the database involved two separate phases. The first of these was to acquire data and photographs at the museums where the Florissant collections are housed, and to locate and copy the publications in which those specimens had been referenced. A later, second phase was to develop a relational database in order to simplify the organization and housing of these data and to serve this information via the internet.
Acquisition of Data
Compilation of Specimen and Publication Data
The purpose of the database was defined at the outset to include all of the previously published specimens from Florissant, including type specimens and specimens that had been figured or specifically mentioned in publication. Also included was a selection of unpublished specimens that were unusual or particularly well preserved, as well as specimens that could potentially represent currently undescribed new species.
Development of the museum specimens table for the database involved on-site collection inventories at all of the museums known to house published Florissant specimens in order to compile collection data and complete the photography of all published specimens. This collections-first approach was used instead of surveying the literature as the primary step because, in many instances, the publications did not indicate the whereabouts of the specimens that were described therein. In most cases, a copy of each publication in which a located specimen had been referenced was obtained while on-site (usually in the museum's or university's library), and specimens were compared in-hand with the information and illustrations in those publications. Photocopies of these publications were added to the library at Florissant Fossil Beds National Monument, and many of these were later digitized for the database archive.
Each specimen was correlated to every scientific publication in which it had been referenced, and in many cases specimens had been referenced in multiple publications. These were differentiated by defining a field for publication status (see “status” under Reference Table in appendix), which indicates the sequence of a specimen's publication history by the terms original (indicating that the record shows the first publication in which the specimen was cited), most recent (indicating that the record shows the most recently published taxonomic assignment for the specimen), and intermediate (indicating that the record shows a published treatment of the specimen between its original and most recent publications). Frequently, each of these various publications assigned the specimen to a different taxonomic name, either by placing it into synonymy or by creating a new combination. In some cases, it was necessary to follow the International Codes of Botanical and Zoological Nomenclature in order to resolve particular nomenclatural problems that resulted from inconsistencies in the publication history for a specimen.
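The publication-status terms can be illustrated with a short sketch. It assumes that a specimen's citations can be ordered by publication year and that a specimen cited only once is labeled original; the class name, field names, taxon names, and years are hypothetical placeholders and are not taken from the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    year: int                 # publication year (hypothetical field name)
    taxon_as_published: str   # name used for the specimen in that publication


def assign_status(citations):
    """Label each citation of one specimen as original, intermediate, or most recent."""
    ordered = sorted(citations, key=lambda c: c.year)
    last = len(ordered) - 1
    labeled = []
    for position, citation in enumerate(ordered):
        if position == 0:
            status = "original"        # first publication citing the specimen
        elif position == last:
            status = "most recent"     # latest published taxonomic assignment
        else:
            status = "intermediate"    # any treatment in between
        labeled.append((citation, status))
    return labeled


# Placeholder publication history for a single specimen.
history = assign_status([
    Citation(1953, "Genus_b species_x"),
    Citation(1890, "Genus_a species_x"),
    Citation(1916, "Genus_a species_x"),
])
statuses = [status for _, status in history]
```

In this sketch the three placeholder citations receive the labels original, intermediate, and most recent in chronological order, regardless of the order in which they were entered.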
Specific data fields were defined for compiling the data (see appendix), and records were made on-site using paper worksheets. Each specimen was assigned an inventory number as a unique identifier to provide a means for sequentially tracking specimens in the database and in the photo log. This method proved to be more effective than using catalog numbers for several reasons: (1) some specimens already had multiple catalog numbers; (2) some catalog numbers needed to be changed as problems were discovered during the course of the inventory for this project; and (3) some specimens had different non-sequential numbers for parts and counterparts of the same specimen in the same museum, yet the objective of the inventory was to combine these. Only when corresponding halves of the same specimens were in different museums were separate inventory numbers assigned.
The collection size for each museum was estimated by extrapolating from specimen counts in several representative drawers that were categorized according to variations in specimen sizes and arrangement. This estimate totals ∼40,000 specimens. Estimates for each museum are shown in Table 1.
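The extrapolation can be sketched as a simple stratified estimate. The exact formula used for the inventory is not given here, so the sketch assumes that each category of drawer contributes its number of drawers multiplied by the mean count from the sampled drawers in that category. The function name and all of the numbers are hypothetical.

```python
def estimate_collection_size(strata):
    """Estimate a collection size from drawer categories.

    strata: list of (drawers_in_category, counts_from_sampled_drawers) pairs.
    """
    total = 0.0
    for drawers_in_category, sampled_counts in strata:
        if not sampled_counts:
            raise ValueError("each category needs at least one counted drawer")
        mean_per_drawer = sum(sampled_counts) / len(sampled_counts)
        total += drawers_in_category * mean_per_drawer
    return round(total)


# Hypothetical museum: 10 drawers of large slabs (two counted)
# and 5 drawers of small specimens (one counted).
example = estimate_collection_size([
    (10, [40, 60]),
    (5, [100]),
])
```

With these placeholder counts the estimate is 10 × 50 + 5 × 100 = 1000 specimens.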
The organizational arrangement of collections varied from one museum to another in the way that specimens were grouped. For example, published specimens in the fossil plant collections at the U.S. National Museum of Natural History (USNM) and the University of California Museum of Paleontology (UCMP), and the fossil insect collection at the American Museum of Natural History (AMNH), were arranged according to the publication in which those specimens had been first described and included both type and other referenced specimens. The Museum of Comparative Zoology at Harvard (MCZ) had a collection of fossil insects that grouped all published specimens and was arranged taxonomically; in many instances, however, this secondarily resulted in a correlation with publications that had dealt with particular taxonomic groups, and although these were not always the original publications in which the fossils had been first described, it usually was easy to trace back from these secondary publications to locate the original description. The University of Colorado Museum (UCM) and the National Museum of Natural History had type collections of fossil insects that were arranged alphabetically by genus with no correlation to publication, yet other published (non-type) specimens were difficult to recognize because they were intermixed and unmarked in the general collections. Some collections, such as the fossil plant type collection at the National Museum of Natural History, had specimen labels that traced an individual specimen through multiple publications, and this greatly facilitated the inventory process. Because of the variability in collection organization, the approach to inventorying the Florissant collections had to be customized for each particular museum. 
In some cases the on-site surveys resulted in an inventory that was closely associated with the publication histories of the specimens, and in other cases the surveys resulted in data entries for genus, species, and type status, but the correlation with publications remained unknown.
After all of the specimens from the collection inventory had been entered into the developing database, the publications were checked again in order to correlate the unmatched type specimens (i.e., those that had been entered only as genus, species, and type status) to the appropriate publications in which they had been described, and to search for specimens that had not been located on-site. Those specimens that were documented in publication but had not been located in the collections were incorporated into the database and are indicated in the field for catalog number as “unknown [specimen has not been located]” and in the field for object status as “apparently missing [specimen has not been located in any of the museum surveys].” In most cases, these were probably originally in the collections of the same museums that were surveyed, but have since been lost. We recognize that our method could have missed some specimens and/or publication records altogether, although considering the wide scope of the museum inventory and the number of museums included, it is unlikely that more than a few such specimens still exist among major museum collections.
The inventory of collections was done primarily by the first author between 1995 and 2003, and involved a total of ∼250 days spent at 17 museums to examine and photograph nearly 5200 specimens and to record accompanying catalog and accession records (i.e., at a rate of ∼20 specimens per day). An additional 250 specimens that are referenced in publication but that are currently missing from collections or could not be located were also included in the database. Further, ∼250 palynomorphs were included based upon previous photographic documentation in publication, and ∼500 unpublished specimens were included. Currently, the database includes a total of 5663 specimens (Table 1), ∼5200 of which have been referenced in publication. Of these, more than 3750 had been designated as various type specimens (e.g., holotype, lectotype, syntype, paratype) in the original publications, but ∼325 of these were subsumed by synonymy in subsequent publications.
Compilation of Bibliography and Library
At the outset of the project, a substantial bibliography of many of the scientific publications dealing with Florissant's paleontology already had been compiled from sources such as the GeoRef database and the archive collection of the late F. Martin Brown (housed at Florissant Fossil Beds National Monument). Photocopies of these references were acquired through interlibrary loans or from the library of the U.S. Geological Survey in Denver. As the museum survey progressed, other relevant publications were located and copied from various sources, including (1) information on specimen labels or reprints filed with collections; (2) citations in various related publications, such as the Treatise on Invertebrate Paleontology (Carpenter, 1992); (3) the late Frank M. Carpenter's taxonomic card file and reprint collection, as well as part of the reprint collection of the late Samuel H. Scudder, in the MCZ at Harvard University; (4) research in the Ernst Mayr Library at Harvard University; and (5) reprints and specialized knowledge of the literature provided by museum curators. Whenever possible, these reprints were in-hand as a source of verification when each individual fossil specimen was examined.
Compilation of Updated Taxonomy
Data from the survey of collections and publications provided the basis for compiling the first complete species list for Florissant. This list was derived from the taxonomic treatment of type specimens in the most recent publications in which they had been cited, although many of these most recent publications were more than a century old, and many taxonomic concepts, particularly at the levels from genus to family, had been modified over the years. The taxonomic component of the database was developed to update the taxonomic classification into more contemporary, consistent terminology, and in many cases this supersedes the taxonomic information from the older publications.
All of the fossil plants described by early workers such as Lesquereux (1883) were thoroughly revised by MacGinitie (1953), and many of these revisions were examined again by Manchester (2001). Because these treatments were relatively recent, almost no updating of generic names was needed for the plants. Generic names were verified according to Mabberley (1997), and compilation of names for higher taxonomic ranks followed Takhtajan (1997) for the flowering plants. By contrast, many of the fossil insects and spiders had not been treated in publication since some of the early workers such as Scudder (1890, 1893, 1900). Taxonomic classification of many of the insects at the generic level and higher had been revised since these early works, and some of the names had been changed on the basis of more recent taxonomic studies of extant members of these groups. It was therefore necessary to examine each generic assignment for the insects (Boyce Drummond, unpublished report for the National Park Service) and to update some of these generic names according to sources such as Nomina Insecta Nearctica (Poole and Gentili, 1996–1997; www.nearctica.com/nomina/main.htm), and higher taxonomic ranks according to Borror et al. (1989). In such cases, the names are based on revisions of extant material, yet the Florissant fossils were not individually reevaluated to determine with certainty whether or not they necessarily possess the characters that conform to those generic changes. Such work on the fossil insects, comparable to MacGinitie's (1953) revision of the plants, will take years of detailed research to complete. The resulting updated taxonomic list for Florissant incorporating these changes was reported previously (Meyer, 2003). Our conservative approach to higher taxonomic rankings in the database follows the widely recognized classifications of Takhtajan (1997) and Borror et al. (1989), although of course it is probable that the database will be revised in the future to incorporate newer phylogenetic approaches to systematics as these become better established.
Compilation of Digital Archive
At the time the project began in 1995, the technology for digital photography was still new and developing rapidly, and it was decided instead that all images would be taken as photographic transparencies using Kodak Kodachrome film. Kodachrome 40, a tungsten balanced film, was used initially, but when Kodak discontinued this product, Kodachrome 64 was substituted in conjunction with an indoor tungsten light filter. Kodachrome was selected because of its known longevity for color stability, which far exceeds that of other transparency films. All photographs were taken in duplicate, and in some cases, the duplicates were taken on separate rolls of film as a security measure, which proved to be beneficial. All of the original transparencies from this project, totaling ∼12,000 mounted slides, including the set of duplicates, are in the collection at Florissant Fossil Beds National Monument. All of these were scanned with a Polaroid Sprintscan® slide scanner, and these digital images were incorporated into the database archive. Because of limitations on the capacity for storage of numerous large digital files at the time the scanning was done, most of the images were scanned at low resolution and are ∼500 × 300 pixels.
As required by several of the individual museums, permission was needed to use images of the specimens and to make the collection data available on a Web site. For many specimens, this permission does not extend to reuse of images or data by users of the Web site. For that reason, it always remains the legal responsibility of the Web site user to contact the individual museum for any subsequent use of an image or data, including use in scientific publications. A clear statement of warning to this effect appears on every page within the Web site. Inclusion of all museums was critical, because if even one museum's specimens were excluded, it would compromise the integrity of the database and limit its utility. It has been mutually beneficial, however, and the museums have been able to access their own data from the Florissant database in ways that their own databases may not have provided. Exports of the Florissant data and images have been provided to several museums to aid them in developing or expanding their own databases.
Almost all of the publications in which the Florissant specimens were referenced were obtained as originals or copies for the library at Florissant Fossil Beds National Monument. As a component project for the Colorado Digitization Project, more than 186 publications were digitized (∼4000 of the 12,000 pages of relevant publications about Florissant paleontology), and these range in size from large monographs such as those of Scudder (1890) and MacGinitie (1953) to the many very short contributions of Cockerell. The publications that were incorporated into the database archive are those that were free of copyright restrictions (primarily those pre-dating 1923) or that were used by permission. A large portion of the scientific work at Florissant was done in the late 1800s and early 1900s, however, and hence the archive captures many of the most significant publications. It is noteworthy that these pre-1923 publications included more than 1500 original descriptions of Florissant's new species, although many of these species were subsumed by synonymy in the later publications.
Design of the Database
Structural Design and Functionality
In order to synthesize the widely dispersed information about Florissant specimens and publications and make it more accessible to researchers, a relational database management system (RDBMS) was developed. This database was designed to facilitate the kinds of complex searches that are needed to document the collections and their associated literature, which in turn provides a census of the taxonomic diversity at Florissant.
In this context, an RDBMS is defined as a series of data tables with defined relationships among the tables, and among the queries, forms, and programming code that are used for data entry, management (e.g., quality assurance and control), analysis, and reporting. Readers are referred to Hernandez (2003) and Riordan (1999) for an introduction and overview of such systems. This section focuses primarily on the underlying tables and their relationships, and describes how this information is served over the Internet. The tables are the core of the system, whereas the Web interface is what most users will see. The desktop forms and code that are used for data entry are of lesser importance because they frequently change to accommodate hardware, software, and user needs.
The Florissant database addresses five information elements: (1) museum specimens, (2) the bibliography of publications, (3) references to specific citations of specimens in those publications, (4) current taxonomy, and (5) a digital archive of the specimen photographs and copies of scientific publications. By housing all of the five elements in one system, it is possible to link and manage all of the information efficiently. The design of this relational database was entirely customized to accommodate these information needs, and therefore was not modeled after any other system that existed at the time this project began in 1995.
The table structure of a relational database for a particular application could potentially take various forms, although the ideal database design is normally one that closely reflects the real-world system that it is attempting to describe. The design and success of this database system therefore depended on five core tables that we believe provide a sound conceptual and functional model for the information elements that were needed to make this an effective paleontological database. These tables, their attributes (or fields), and the relationships between them are shown in Figure 1.
The database framework illustrated by Figure 1 utilizes basic relational database theory, design, and symbology. This framework emphasizes that the power and efficiency of a relational database come in part from storing each information element in only one location. Tables are linked together using primary keys (unique identifiers) and foreign keys (references to the unique identifiers), and the symbols “1” and “∞” mark the “one” and “many” sides, respectively, of each one-to-many relationship between tables.
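The roles of primary keys, foreign keys, and one-to-many cardinality can be sketched with two of the tables. The sketch uses SQLite through Python rather than the Microsoft Access® system of the project, and only the table names tblSpecimen and tblImage come from the database; every column name and value is a hypothetical placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE tblSpecimen (
    inventory_no INTEGER PRIMARY KEY,   -- primary key: the "1" side
    museum       TEXT NOT NULL,
    catalog_no   TEXT
);
CREATE TABLE tblImage (
    image_id     INTEGER PRIMARY KEY,
    inventory_no INTEGER NOT NULL
                 REFERENCES tblSpecimen(inventory_no),  -- foreign key: the "many" side
    file_path    TEXT NOT NULL
);
""")

# One specimen record, stored once, with two related images.
conn.execute("INSERT INTO tblSpecimen VALUES (1, 'Museum A', 'A-001')")
conn.executemany(
    "INSERT INTO tblImage (inventory_no, file_path) VALUES (?, ?)",
    [(1, "images/0001_a.jpg"), (1, "images/0001_b.jpg")],
)

# An image that points to a nonexistent specimen is rejected.
try:
    conn.execute(
        "INSERT INTO tblImage (inventory_no, file_path) VALUES (99, 'images/0099.jpg')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

image_count = conn.execute(
    "SELECT COUNT(*) FROM tblImage WHERE inventory_no = 1"
).fetchone()[0]
```

Because the museum name and catalog number are stored only in the specimen record, a correction to either is made in one place and is reflected for every linked image.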
Contents of the Tables
Specimens. The museum specimens table (tblSpecimen) contains information inherent to each individual specimen as an object, such as the name of the museum where it is housed, current and past catalog numbers, accession number, collector, locality, and object status. These data are similar to those in other paleontological collections databases, including the University of California Museum of Paleontology (http://bscit.berkeley.edu/ucmp/, February 2006) and the Florida Museum of Natural History (http://www.flmnh.ufl.edu/databases/, February 2006). The unique inventory number that was assigned during the museum collections inventory was used as the primary key in the specimens table, and as a linked foreign key in the images and references tables (Fig. 1).
Bibliography. The bibliography table (tblBibliography) contains standard bibliographic information for almost all of the scientific publications pertaining to Florissant. These fields include author, year, title, journal name, volume number, book reference, and page numbers. A list of topical keywords is included to facilitate search capabilities.
References to specimens in publications. The reference table (tblReference) serves as a linking table between the standalone specimen and bibliography tables (Fig. 1). As such, it contains foreign keys that link to the primary keys in both of these other tables. It also describes the many-to-many relationship between the specimen and bibliography tables, where one specimen can be referenced in multiple scientific publications, and/or one publication can refer to many different specimens. The table holds information that documents the description or citation of each specimen in a particular publication, the taxonomic classification that is used in that publication, and the status of the specimen in that publication (i.e., original, intermediate, or most recent). Each specimen is correlated to every scientific publication in which it has been referenced, and the table then links to these publications in the bibliography table. Many specimens are referenced in multiple publications, and often these various publications assign the specimen to a different taxonomic name.
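A minimal sketch of the linking table follows, again using SQLite through Python as a stand-in for the project's Access® tables. The table names are those given above, but the column names, taxon names, authors, and years are hypothetical placeholders, and a specimen cited in only one publication is labeled original here by assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblSpecimen (inventory_no INTEGER PRIMARY KEY, museum TEXT);
CREATE TABLE tblBibliography (bib_id INTEGER PRIMARY KEY, author TEXT, year INTEGER);
CREATE TABLE tblReference (
    ref_id             INTEGER PRIMARY KEY,
    inventory_no       INTEGER REFERENCES tblSpecimen(inventory_no),
    bib_id             INTEGER REFERENCES tblBibliography(bib_id),
    taxon_as_published TEXT,
    status             TEXT CHECK (status IN ('original', 'intermediate', 'most recent'))
);
INSERT INTO tblSpecimen VALUES (1, 'Museum A'), (2, 'Museum B');
INSERT INTO tblBibliography VALUES (10, 'Author X', 1890), (11, 'Author Y', 1953);
INSERT INTO tblReference (inventory_no, bib_id, taxon_as_published, status) VALUES
    (1, 10, 'Genus_a species_x', 'original'),
    (1, 11, 'Genus_b species_x', 'most recent'),
    (2, 10, 'Genus_c species_y', 'original');
""")


def publications_for(inventory_no):
    """All publications that reference one specimen, oldest first."""
    rows = conn.execute(
        """SELECT b.author, b.year, r.taxon_as_published, r.status
           FROM tblReference r
           JOIN tblBibliography b ON b.bib_id = r.bib_id
           WHERE r.inventory_no = ?
           ORDER BY b.year""",
        (inventory_no,),
    )
    return rows.fetchall()


def specimens_in(bib_id):
    """All specimens referenced in one publication."""
    rows = conn.execute(
        "SELECT inventory_no FROM tblReference WHERE bib_id = ? ORDER BY inventory_no",
        (bib_id,),
    )
    return [row[0] for row in rows]
```

Calling publications_for(1) traces one specimen through both placeholder publications, and specimens_in(10) lists both specimens cited in the first publication, which are the two directions of the many-to-many relationship.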
Taxonomy. The content of the taxonomy data table is conceptual in nature and is based on updated classification schemes as discussed previously. The table contains hierarchical names for various taxonomic ranks of the Florissant organisms, and it is linked to the actual specimens in the specimen data table (Fig. 1). These names sometimes differ from those to which a specimen may be linked in the reference table if the updated taxonomic name is more recent, and previous taxonomic names to which a specimen was referred in publication must be searched from the reference table. The taxonomy data table is linked to all of the museum specimens, and each specimen has one taxonomic record.
Digital archive. The digital archive is composed of two types of digital files: images of specimens, and copies of the publications. Each specimen can have one or many related images. To accommodate this, the image table (tblImage) contains the name of each image, the file path to where it is located, and information about when the photo was created and by whom. Images are stored in a subdirectory on the same server as the database. The images that are accessible by Web site are jpeg files at low resolution, and this is intended to limit third parties from using them for high-quality reproduction, in accordance with the conditions of the permission from some of the museums whose specimens are included. Researchers can request a high-resolution image from Florissant Fossil Beds National Monument once they have obtained permission for use from the appropriate museum. In total, the database contains more than 6000 photographic images of fossils.
Digital copies of scientific publications, as documented in the bibliography, are stored in portable document format (pdf). There is always a one-to-one relationship between the bibliographic citations and the digital document (except for one very large document that needed to be divided into three separate parts), and therefore the link to the digital document is maintained in the bibliography table.
Keeping Pace with Advances in Technology
One obstacle to the optimization of the database during its development, particularly with the desktop application used for data entry, was the rapid change in software, hardware, and NPS standards regarding information systems. Various programs were used throughout the development of the desktop database application owing to the changing needs of the project and in particular the internet compatibilities of the different programs. Initially, the database was assembled using Blackwell Idealist® because of its ease of data entry and simple database structure. As the information needs became more complex and the data were to be served over the internet, the data were migrated to Microsoft Access® using a customized modification of a natural resources management system application known as IRMS (Integrated Resource Management System). Although this application proved to be useful through the inception of the Web site in 2002, changes in versions of Access and the discontinued support of the IRMS system by the NPS forced another revision of the database to its current relational data structure in Access. Further modifications are currently under way, as discussed below.
APPLICATIONS OF THE DATABASE
The database has been adapted into several levels to allow for a variety of users. The desktop version of the database accommodates technical users at the NPS while the Web interface (http://planning.nps.gov/flfo/) is available to the general public. This Web site provides tools for simple searches to obtain data for research and collections management, and it has been instrumental in developing a non-technical interpretive Web site and an educational curriculum (which will be available by link from www.nps.gov/flfo). The database also provided the basis for a book about Florissant paleontology (Meyer, 2003). It should be noted that the tabulation of data for tables and figures presented in this paper is based on the January 2006 version of the database.
This section describes our current configuration of the database system. It is intended to provide the reader with a basic understanding of our desktop application and Web interface, realizing that this configuration is dynamic and will evolve as technology and software continue to change.
The desktop version of the database has two primary functions: data entry, and specialized searching and reporting. Data entry is facilitated through Microsoft Access® forms, coded using Visual Basic for Applications (VBA), which enables the user to add information about publications, specimens, and taxonomy. The most versatile part of the desktop application is the ability to accommodate highly focused, multilevel queries. These are capable of showing complex interrelationships between the various fields, including the specimens and their cataloging data, the scientific literature that cites these specimens, the varying taxonomic treatment of these specimens in multiple publications over time, and the naming of these specimens into currently recognized taxonomic ranks. These complex queries are not preprogrammed and require a basic knowledge of how to use the Microsoft Access® query interface.
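A multilevel query of the kind described here might look like the following sketch, which traces the names applied to one specimen across successive publications and pairs each with the currently recognized name. It is written in Python with SQLite rather than the Access query interface, and all table names, column names, and placeholder records are assumptions made for illustration.

```python
import sqlite3

# Hypothetical tables and invented placeholder records; none of these names
# are taken from the actual Florissant database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE specimen    (specimen_id INTEGER PRIMARY KEY, museum TEXT, catalog_no TEXT);
CREATE TABLE publication (pub_id INTEGER PRIMARY KEY, author TEXT, year INTEGER);
CREATE TABLE taxon       (taxon_id INTEGER PRIMARY KEY, current_name TEXT);
CREATE TABLE specimen_reference (
    specimen_id INTEGER REFERENCES specimen(specimen_id),
    pub_id      INTEGER REFERENCES publication(pub_id),
    taxon_id    INTEGER REFERENCES taxon(taxon_id),
    name_used   TEXT
);
INSERT INTO specimen VALUES (1, 'Museum A', 'A-100');
INSERT INTO publication VALUES (1, 'Author One', 1900), (2, 'Author Two', 1950);
INSERT INTO taxon VALUES (1, 'Genus_b species_x');
INSERT INTO specimen_reference VALUES
    (1, 2, 1, 'Genus_b species_x'),
    (1, 1, 1, 'Genus_a species_x');
""")

# Join all four tables to follow one specimen through the literature,
# oldest publication first.
history = conn.execute(
    """
    SELECT p.year, p.author, r.name_used, t.current_name
    FROM specimen_reference AS r
    JOIN specimen    AS s ON s.specimen_id = r.specimen_id
    JOIN publication AS p ON p.pub_id = r.pub_id
    JOIN taxon       AS t ON t.taxon_id = r.taxon_id
    WHERE s.catalog_no = ?
    ORDER BY p.year
    """,
    ("A-100",),
).fetchall()

for row in history:
    print(row)
```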
The Web site is currently served from an NPS server in Washington, D.C. Presently, the database tables (Fig. 1) are housed in Microsoft Access® and served using Web pages coded in Macromedia Cold Fusion®. This Access-Cold Fusion combination is suitable for small-scale Web applications that rarely experience more than a few concurrent requests for information. Since its inception in 2002, the first page of the Florissant Web site has received ∼20 hits per day.
Unlike the desktop application, which has the ability to create customized queries as needed, search options on the Web interface had to be pre-programmed. Thus, the current Web site design provides three search portals based on taxonomy, bibliographic citations, and museum specimens. Following an initial search, the output can be refined further by using more detailed search criteria, and it also allows the user to link between the taxonomy, museum specimens, and bibliography databases (Fig. 2). This provides a flexible framework that helps to accommodate more focused questions, although not to the same level of complex refinement that the desktop database enables. This is the first implementation of the Web site, and future modifications may allow Web users to conduct even more detailed searches. The current Web site address is http://planning.nps.gov/flfo/, or it can also be located by a link from the Web site for Florissant Fossil Beds National Monument.
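The difference between free-form desktop queries and pre-programmed Web searches can be illustrated with a short sketch. It assumes a design in which each of the three portals maps to one fixed, prewritten query and the user supplies only a search term. The code is Python with SQLite, not the Cold Fusion pages actually used, and the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# Each portal maps to exactly one prewritten query (hypothetical table and
# column names). Users choose a portal and supply a search term only.
PORTAL_QUERIES = {
    "taxonomy": "SELECT name FROM taxon WHERE name LIKE ? ORDER BY name",
    "bibliography": "SELECT citation FROM publication WHERE citation LIKE ? ORDER BY citation",
    "specimens": "SELECT catalog_no FROM specimen WHERE catalog_no LIKE ? ORDER BY catalog_no",
}


def search(conn, portal, term):
    """Run the fixed query for one portal, matching the term as a substring."""
    if portal not in PORTAL_QUERIES:
        raise ValueError(f"unknown portal: {portal}")
    # The term is bound as a parameter, never pasted into the SQL text.
    rows = conn.execute(PORTAL_QUERIES[portal], (f"%{term}%",)).fetchall()
    return [row[0] for row in rows]


# Invented placeholder data for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon       (name TEXT);
CREATE TABLE publication (citation TEXT);
CREATE TABLE specimen    (catalog_no TEXT);
INSERT INTO taxon VALUES ('Genus_a species_x'), ('Genus_b species_y');
INSERT INTO publication VALUES ('Author One, 1900'), ('Author Two, 1950');
INSERT INTO specimen VALUES ('A-100'), ('B-200');
""")

print(search(conn, "taxonomy", "Genus_a"))
print(search(conn, "bibliography", "1950"))
```

Fixing the queries in advance limits what Web users can ask, which is the trade-off the text describes, but it also keeps user input out of the query structure itself.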
Currently, plans are underway to migrate the data into Specify, which is a customizable collections-based research database (Specify Software Project, 2007). This conversion will enable the development of a revised Web site that will be hosted as a partnership between the NPS and the University of Colorado at Boulder.
Data Sharing with Other Paleontology Databases
Since the inception of the Florissant database, similar databases such as Paleoportal (www.paleoportal.org), CHRONOS (www.chronos.org), and the Paleobiology Database (paleodb.org/cgi-bin/bridge.pl) have come into existence, beginning around 2000. These projects are joint ventures among various scientific institutions and organizations, and all share the objective of compiling and synthesizing paleontologic, geologic, and biologic data about fossil species into a single, easily searchable Web site. The approaches to these Web sites vary from creation of large databases housing information provided to them by researchers and professionals, to creation of electronic infrastructures that provide access to a multitude of other internet databases. The latter provide a single portal for searching the wealth of information held by these other sites.
We are pursuing the potential for integrating with these other paleontology data clearinghouses. In addition, we will consider alternative means for sharing and integrating the data, including the use of extensible markup language (World Wide Web Consortium, 2006), Web services, and service oriented architectures (He, 2003). Connecting or incorporating data from the Florissant database would help to minimize the need for these other databases to acquire the information independently from widely scattered and poorly updated sources. Paleoportal already recognizes Florissant as an example of “famous floras and faunas,” and although the Paleoportal Web site includes a brief explanation of Florissant and a link to the Florissant database Web site, it does not yet serve the full body of Florissant collection data.
Potential problems in sharing the database with these sites include providing additional server space for the link, extending the permission for use from the various museums, and creating a filtering mechanism by which the sites could avoid duplications of data in those cases where particular museums with Florissant collections are already linked to these sites. An additional problem in serving so many diverse databases through one portal is that the ability of these larger, servicing databases to provide data continually can depend entirely on the sustained functionality of the databases to which they link. Whatever the solution, we hope to minimize user confusion and reduce the time spent searching, yet still address concerns regarding data compatibility, sensitivity, and ownership. One promising option is to use the Distributed Generic Information Retrieval (DiGIR; Specify Software Project, 2007) protocol, which is one type of XML-based Web service developed specifically for the sharing of taxonomic information. DiGIR allows numerous database systems to be queried simultaneously from one Web interface and overcomes the issues that arise when attempting to merge incompatible database systems.
Applications to Research
The database and Web site provide ready access to information and materials that previously were difficult to find because they were widely dispersed among different museums and publications. The specimen component of the database enables direct searches of collections and specimen data including links to all of the publications in which a specimen was referenced. The associated archive of specimen photographs provides images that can be used to examine the general morphology of the fossil organism and assess its state of preservation. Many of these fossils, even the type specimens, were never illustrated in publication and are available exclusively in the Florissant database (Fig. 3). These images can aid a researcher in deciding which specimens would be the most useful to request on loan, and in some instances can provide the needed research information in themselves.
The taxonomic component of the database provides a consistent conceptual framework that updates the taxonomic position for the fossil organisms and in some cases supersedes the older taxonomic information from the most recent publication of the fossil specimens. Researchers with an interest in certain taxonomic groups can use the database to readily compile and tabulate an inventory of these groups and to locate the museums in which the type specimens are reposited. The database provides the first comprehensive census of the taxonomic diversity at Florissant, thus enabling more accurate reconstructions of the paleo-community and its paleoenvironment. A summary count of the major taxonomic groups is shown in Table 2.
The bibliographic component of the database readily facilitates literature searches to locate publications in which the fossils were described, generate lists of species for each publication, and document nomenclatural changes. Many of these publications are instantly available in the database's digital library archive, which includes many holdings of older, obscure publications. Some of these are difficult to obtain otherwise, even from large libraries.
The combined components of the database also enable research into the history of paleontology at Florissant (Veatch and Meyer, this volume). Figure 4, for example, shows the number of publications about Florissant during each decade; Figure 5 shows the number of new species described in these works per decade; and Table 3 shows the publications in which most of these specimens were referenced. This clearly shows the emphasis of the earlier publications on describing new species, particularly during the period from 1890 to 1920. It also reflects the fact that new species descriptions were concentrated into a few lengthy monographs during the 1890s (e.g., Scudder, 1890, 1893), whereas the numerous publications of Cockerell, Rohwer, Brues, and Wickham during the 1900s and 1910s each included fewer descriptions of new species.
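Tabulations by decade, such as those plotted in Figures 4 and 5, amount to grouping publication records by the decade of their year. The following Python sketch shows the arithmetic under stated assumptions: the records are invented placeholders, not counts from the database, and the pairing of a year with a number of new species is a hypothetical simplification of the real tables.

```python
from collections import Counter


def decade(year):
    """Return the first year of the decade containing `year` (1893 -> 1890)."""
    return (year // 10) * 10


# Invented placeholder records: (publication year, new species described).
records = [(1890, 3), (1893, 2), (1906, 1), (1908, 1), (1913, 1)]

pubs_per_decade = Counter()
species_per_decade = Counter()
for year, new_species in records:
    pubs_per_decade[decade(year)] += 1
    species_per_decade[decade(year)] += new_species

for start in sorted(pubs_per_decade):
    print(start, pubs_per_decade[start], species_per_decade[start])
```

Comparing the two tallies for the same decade is what separates a few long monographs (few publications, many new species) from many short papers (many publications, fewer new species each).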
The database contributes information for broader research topics such as assessing global biodiversity through time. For example, because the Florissant database tracks the taxonomic treatment of particular specimens through different publications, it is capable of documenting nomenclatural changes to taxonomy and therefore of contributing to an assessment of the “flux ratio” that compares historical rates of invalidation and revalidation (Alroy, 2002). One simple assessment of this is shown in Figure 6, which compares the number of species originally described at Florissant with the number that are still valid in the most recent publications. An analysis of the species described from Florissant during the 1800s and early 1900s (see Fig. 5) shows that many of the original binomial names for the plant megafossils have been invalidated or placed into synonymy (Fig. 6); the work of MacGinitie (1953) alone reduced the number of names by nearly 50%. A similar analysis for the Florissant insects indicates that many of the original names remain unchanged because subsequent critical studies are lacking. Thus the diversity of insect species at Florissant, based on the previously published names documented in the database as most recent, is probably misleading and may be significantly inflated if the insect names are invalid or need to be synonymized to a degree comparable to that of the plants. If this is true, and if the Florissant database is incorporated along with other databases into a global paleontological database, then the biodiversity of the Florissant fossil insects, as estimated from previously described specimens, could be considerably overestimated relative to that of the fossil plants, particularly for those insect genera that the original workers split heavily into different species.
Applications to Collections Management
Curators at museums with Florissant collections often can find more complete and more searchable information from the Florissant Web site than in their own records. In part this is because the compilation of the database resolved various catalog inconsistencies, correlated all specimens with publications, and provided photographic documentation. For example, the database is used by the University of Colorado Museum (UCM) to (1) confirm type specimens and match them with their references, (2) determine whether specimens indicated in publications as UCM types are in fact held elsewhere, (3) locate specimens in the collection that are the counterparts of type specimens in other museums, (4) update old taxonomic names on labels to the currently valid name, (5) locate bibliographic citations for specimens that have been lost over the years and are no longer in the collection, and (6) taxonomically identify non-type specimens in the general collection by comparing them to photographs in the database (Amy Moe, January 2006, written commun.). The American Museum of Natural History (AMNH) uses the database to compare actual specimens against the database photographs in order to establish a specimen's condition when it goes out on loan and when it returns, and also to refer potential collection users to a source for browsing the AMNH collection (Bushra Hussaini, January 2006, written commun.). The Natural History Museum, London (NHM) has used the database to access data and images for particular specimens in its collection, because the Florissant specimens are not in its own database (Andrew Ross, January 2006, written commun.). The MCZ at Harvard University houses the largest number of specimens included in the database (Table 1), yet during the time of our collections survey, the museum did not have its own database for managing these collections. An MCZ fossil insect database was developed subsequently, and many of the data fields were populated using an export from the Florissant database (Phil Perkins, February 2006, written commun.).
Applications to Education and Interpretation
A Virtual Museum for the Layperson
Although the database Web site provides the comprehensive documentation and search capability that is needed by scientific researchers and museum curators, its complexity may quickly seem overwhelming to a layperson. To provide information for less-specialized users, the database was used to derive a digital photographic gallery that includes condensed interpretive information (to be available as a link from www.nps.gov/flfo).
This “online museum” offers users a choice of two portals: (1) a simple slide show of some of the most impressive fossils, and (2) a series of Web pages that discuss Florissant's geology and paleontology in more detail. Some of the most impressive specimens were selected, and these are accompanied by descriptions of the fossil organisms and an illustrated overview of the geologic history of the Florissant fossil beds. A succession of Web pages leads users through various levels of the major taxonomic groups, including common names, and culminates with examples at the genus level.
Utilizing the Database to Develop an Undergraduate Curriculum
Another application currently under development will provide an educational curriculum for undergraduate students. Once completed, this can be located by link from www.nps.gov/flfo. This curriculum will serve as a stand-alone laboratory supplement for paleontology courses, and as an introductory exercise that could be used by the many field geology classes that visit Florissant during the summer. It will use a subset of selected leaf macrofossils from the database, and students will be able to obtain random samples that they can analyze to identify characteristic taxa and develop hypotheses about paleoecology and paleoclimate. This curriculum has several objectives: (1) to expose students to the functionality of the database, (2) to develop skills in understanding taxonomy and identifying fossil leaves, (3) to analyze physiognomic characters of fossil leaves as a basis for climate reconstruction, and (4) to create a virtual geologic map based upon outcrop and rock photographs.
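Drawing a random sample of leaf macrofossils for a student exercise could be done along the following lines. This is a hedged Python sketch, not a description of how the curriculum is implemented: the function name, the identifier format, and the optional seed (which would let an instructor reproduce a given sample) are all assumptions.

```python
import random


def draw_sample(specimen_ids, size, seed=None):
    """Return `size` distinct specimen identifiers chosen at random.

    Passing the same seed reproduces the same sample, which could let an
    instructor regenerate the set of specimens a student was given.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(specimen_ids, size))


# Invented identifiers standing in for a curated subset of leaf macrofossils.
leaf_subset = [f"LEAF-{n:03d}" for n in range(1, 51)]

student_sample = draw_sample(leaf_subset, 10, seed=2008)
print(student_sample)
```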
Developing Other Media
The accessibility to information that resulted from the compilation of the database was critical in developing the first publication that listed the entire taxonomic diversity and bibliography for Florissant (Meyer, 2003). This made it possible to summarize concisely some of the most important aspects of Florissant's paleontology in the format of a book that was intended for a broad audience, ranging from amateurs to professional paleontologists. The database was invaluable for accomplishing this and for locating some of Florissant's most impressive, well-preserved specimens for illustration. Key components of the database were synthesized for use as appendices, including a listing of museums with Florissant collections, and a taxonomic compilation that lists the authors of all species, the year in which each species was described, and the location of the primary type specimen (holotype or syntypes). The database also provided the source for publishing a complete bibliography in the book.
New exhibit designs are currently under way for Florissant Fossil Beds National Monument and the database is providing a source for locating unique specimens that will be incorporated into exhibits as photographs to help illustrate some of the important interpretive themes. The database Web site, including the general user's “online museum,” will provide an interactive kiosk exhibit to help visitors understand Florissant's paleontology.
Problems and Limitations
Understanding the Complexity of the Data and the User Requirements
To obtain meaningful results from more complex database queries, users must have a thorough understanding about the nature of the data content as well as concepts of paleontology, museum cataloging practices, taxonomy, and nomenclatural procedures. Seemingly simple searches often can become much more complex than what was anticipated.
For example, a user might search the taxonomy table as a simple means for determining the diversity at Florissant (Table 2), yet the resulting hit list of 1994 total records would be misleading. In part, this is because two fundamentally different approaches to naming palynomorphs have been used at Florissant, resulting in duplicate names that cannot be correlated easily between these publications. Further, many of these palynomorph taxa are organ duplicates of leaf and fruit species known from the macrofossil record. Consideration of such anomalies results in a total taxonomic estimate of ∼1700 species (Meyer, 2003). As another example, a query to determine how many species were originally described in a particular monograph requires combining multiple syntype specimens into a single species category during the search. Users must formulate their queries carefully, and presentations of data tabulations frequently require some degree of qualification and explanation.
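The syntype example can be made concrete with a short sketch: counting specimen records overstates the number of species whenever one species is represented by several syntypes, so the query has to count distinct species instead. The code below uses Python with SQLite, and the table, columns, and records are invented placeholders rather than the database's actual structure or content.

```python
import sqlite3

# Hypothetical table and invented records: one species with three syntypes
# and one species with a single holotype, all from the same monograph.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE type_specimen (
    catalog_no  TEXT PRIMARY KEY,
    species     TEXT NOT NULL,
    type_status TEXT NOT NULL,
    monograph   TEXT NOT NULL
);
INSERT INTO type_specimen VALUES
    ('A-1', 'Genus_a species_x', 'syntype',  'Monograph 1'),
    ('A-2', 'Genus_a species_x', 'syntype',  'Monograph 1'),
    ('A-3', 'Genus_a species_x', 'syntype',  'Monograph 1'),
    ('A-4', 'Genus_b species_y', 'holotype', 'Monograph 1');
""")

# COUNT(*) counts specimen records; COUNT(DISTINCT species) collapses the
# syntypes of one species into a single entry.
specimen_rows, species_count = conn.execute(
    """
    SELECT COUNT(*), COUNT(DISTINCT species)
    FROM type_specimen
    WHERE monograph = ?
    """,
    ("Monograph 1",),
).fetchone()

print(specimen_rows, species_count)
```

With these placeholder records the naive count is four type specimens, whereas the number of species described is two.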
In designing a database, user requirements ideally should be well defined before the actual design begins. Unfortunately, database design and data collection often occur without a clear vision about how the information ultimately will be used. As users refine their needs, the ability to answer complex questions may be inhibited or prohibited by the relationships among tables or the fields within the tables. Ultimately, this limitation can be addressed through the addition or modification of tables or fields, although this increases the risk of adding ad hoc patches to the database, which may add unnecessary confusion and require more overhead in maintenance.
The Florissant database is designed to serve as a long-lasting digital repository that provides public access to a broad spectrum of information about the most important fossils from Florissant and their treatment in various publications. Site-specific databases such as this synthesize collection-level documentation of paleontological diversity that has broad applications to the science. Collectively, such databases can provide valuable tools for assessing previously published literature in order to provide synthesized information about biodiversity trends through time, patterns of evolution, the biogeography of particular taxonomic groups, and the nature of regional and global community evolution through time, and for defining the specific needs for new field- and specimen-based research (Alroy, 2003). The Florissant database contributes to this by completely documenting our knowledge of this diversity at one of the world's most productive paleontological sites.
Development of this database helps to fulfill the objective of Florissant Fossil Beds National Monument to provide information that will stimulate ongoing, innovative paleontological research at Florissant. Such information contributes to understanding Florissant's significance as a unique locality in relation to the global paleoecosystem of the late Eocene and provides a basis for comparison with other paleontological sites worldwide. It is hoped that the methods used for inventorying the Florissant collections and for developing this database and Web site can be adapted to other significant paleontological sites that share with Florissant such characteristics as very high taxonomic diversity, a rich publication history, and collections widely dispersed among numerous repositories.
The U.S. National Park Service and the Colorado Digitization Project provided funding for this project. Many of the interns and seasonal staff in paleontology at Florissant Fossil Beds National Monument were involved in data entry, acquisition of publications, and digitization of images, including Melissa Barton, Amanda Cook, Michelle Dooley, Melissa Hicks, Tobin Hieronymus, Scotty Hudson, Trudy Kernan, April Kinchloe Roberts, Cayce Lillesve, Rebecca Lincoln, Beth Simmons, Yinan Wang, and Marie Worley. Boyce Drummond completed a project for the National Park Service to reevaluate the higher taxonomic classification for the insect portion of the taxonomic database. John Fraser completed about half of the collection inventory at Harvard University. The Geological Society of America, through the GeoCorps America™ program, provided funding for an intern to assemble the “online museum” for the general public, and that project was completed by Joseph Hall. David Pillmore, Michael Young, Ted Fremd, Jeff Brown, and Kaisa Barthuli provided technical assistance at various stages in the development of the database. The curatorial staffs at all of the museums that are included in the database were very supportive in providing access to their collections, including U.S. National Museum of Natural History (Scott Wing, Bill DiMichele, Conrad Labandeira, Jann Thompson, Mark Florence, and Robert Purdy); Harvard Museum of Comparative Zoology (Phil Perkins, Brian Farrell, Laura Leibensperger, and Raymond Paynter); American Museum of Natural History (Bushra Hussaini, Ivy Rutzky, Neil Landman, David Grimaldi, and Malcolm McKenna); Yale Peabody Museum (Leo Hickey, Linda Klise, Tim White, and Mary Ann Turner); University of California Museum of Paleontology (Howard Schorn and Diane Erwin); University of Colorado Museum of Natural History (Peter Robinson, Paul Murphey, Amy Moe, and Dena Smith); Denver Museum of Nature & Science (Kirk Johnson and Logan Ivy); Carnegie Museum of Natural History (Albert Kollar, Ilona Wyers, and Elizabeth Hill); Paul R. Stewart Museum of Waynesburg College (James Randolph); Field Museum of Natural History (Jenny McElwain, Peter Wagner, and Lance Grande); Florida Museum of Natural History (Steve Manchester and Roger Portell); San Diego Museum of Natural History (Tom Deméré); Milwaukee Public Museum (Peter Sheehan and Paul Mayer); U.S. Geological Survey (Doug Nichols); The Natural History Museum, London (Andrew Ross, Tiffany Foster, Peter Forey, and Cedric Shute); and National Museums of Scotland (Liz Hide). The manuscript was improved during the review process thanks to comments from Ted Fremd, Larry Gall, Bushra Hussaini, and Dena Smith.