The geologic time scale is a complex data structure composed of abstract elements that represent time intervals and instants and their relationships with specific concrete representations in the geologic record as well as the observations made of those concrete representations. The International Union of Geological Sciences' International Commission on Stratigraphy guidelines recommend a very precise usage of the relationships between these components in order to establish a standard time scale for use in global correlations. However, this has been primarily described in text. Here, we present a formal representation of the model using the Unified Modeling Language (UML). The model builds on existing components from standardization of geospatial information systems. The use of a formal notation enforces precise definition of the relationships between the components. The UML platform also supports a direct mapping to an eXtensible Markup Language (XML)–based file format, which may be used for the exchange of stratigraphic information using Web-service interfaces.
The goal of this paper is to provide an information model for the geologic time scale. A formal notation is used so as to provide a rigorous description of the various elements required to describe the structure and calibration of the time scale in a manner that allows the logical consistency of the model to be evaluated. This is important, since although stratigraphic methodology is one of the most rigorously studied aspects of geological practice, it has evolved throughout the era of historical geology. There have been significant changes in best practice, in particular in the shift from characterizing units to defining the boundaries between them. Nevertheless, the time scale itself remains based on named units and eras. Other residues of earlier practice remain visible in the description of the time scale, particularly where agreement on the application of current practices is incomplete. In this context, a rigorous characterization of the relationships between the elements of the time scale, the evidence in the geologic record, and the application of specific procedures to effect numeric calibration of the scale is essential.
A secondary goal is to develop a machine-processable format for the exchange of information related to the time scale. The modeling notation selected (Unified Modeling Language [UML]) is a commonly used software engineering standard, so the model can be implemented easily on various standard platforms. In particular, an eXtensible Markup Language (XML) document schema can be derived directly from the model. The XML implementation supports lossless data transfer using standard Web protocols, and it provides a foundation for formal encoding and computer processing of the information on which geologic time scales are defined.
Structure of the Paper
This paper is structured as follows. An introduction and summary of key aspects of stratigraphic methodology is provided in the first section. Next, we briefly introduce information standardization activities that provide the modeling framework and notation used in this study. We then describe a general framework for temporal reference systems and develop a formal model for the geologic time scale and its calibration within that framework. This conceptual model is used as the basis for an XML implementation of the model, presented using example documents describing the International Union of Geological Sciences (IUGS)'s International Commission on Stratigraphy (ICS) time scale with global stratotype section and point (GSSP) references. Some theoretical issues arising from the models are discussed. A summary of UML notation is given in 1
THE GEOLOGIC TIME SCALE
Units, Boundaries, and Stratotypes
The conventional geologic time scale is a reference system defined by a contiguous sequence of time intervals, each identified with a name. These are recursively subdivided, resulting in a hierarchy composed of intervals of various ranks. The units in the scale are ordered, so the relative temporal positions of geologic objects and events may be recorded or asserted, denoted by the names of units from the scale.
As originally conceived, units of the geologic time scale identify intervals, each corresponding to the time during which a particular sequence of rocks (stratigraphic unit) was deposited. Historically, the units were chosen because, within the region where they were defined, they could be recognized through uniform lithological and biostratigraphic characteristics, which correlated with a relatively consistent geological environment in a single period. The representative object or prototype in a stratigraphy defined in this way is a type section for the geologic unit of interest, formally called a unit stratotype. This approach supports stratigraphic practice in which assignment of strata to a unit is based on the presence of characteristics that match the strata to the type section for the unit. The basis for the matching, or correlation, may be lithologic, paleontologic, or geochronologic, defining lithostratigraphic, biostratigraphic, and chronostratigraphic units.
Construction of a consistent time scale using the unit stratotype approach depends critically on stratigraphic completeness and the ability to geochronologically correlate bodies of rock globally. However, the corollary of a model based on continuity within units is that the boundaries between units correspond with changes in the geological environment and discontinuities in the stratigraphic record. Such discontinuities, and the incongruity of lithostratigraphic and biostratigraphic units with chronostratigraphic units, result in inconsistent and ambiguous correlation in the vicinity of chronostratigraphic unit boundaries.
For these reasons, the complementary approach to specification of the time scale is now preferred, focusing on the boundaries rather than the intervals. Within this model, the focus is on a representative point within the geologic horizon corresponding to the boundary of interest, formally called a boundary stratotype and often known as the “golden spike.” Observations made at the point provide a basis for characterizing the boundary. For example, estimates of the precise age of the boundary may be made using geochronologic techniques on specimens sampled from the stratotype point. These observations support progressive chronometric calibration of the time scale one boundary at a time.
Boundary stratotypes are defined, wherever possible, within sections where the geologic record is continuous across the boundary. The material above and below the point provides context for the boundary, showing evidence relating to the geological history adjacent to the boundary. With enough stratotype points, each representing boundaries of units of sufficiently fine rank, the partial units represented in the sections containing the points overlap to provide a basis for characterizing the complete time scale. However, a complete compilation is possible only based on sections that explicitly include boundaries. Implicitly, the shift from unit stratotypes to boundary stratotypes recognizes that, although the time scale is based on time intervals covering the domain, its logical consistency is contingent on a globally recognizable, unambiguously ordered sequence of events.
Governance and Calibration
The guidelines of the ICS formalize this practice (Remane et al., 1996). These are used by the GSSP project. The GSSP provides a forum for the specification of boundary stratotypes used for correlation on a global scale. Local stratigraphic definitions will still be used for local stratigraphic correlations, but global stratotypes provide the ultimate reference for inter-regional correlation. Local stratigraphic schemes must be related to the GSSP stratotypes through a coherent chain of correlations in order to be connected to the global time scale. Alternatively, geochronometric methods may be used for objects within those parts of the time scale where chronometric values for the boundaries have been accepted.
Physical changes in the rock record observed at the boundary horizon are inferred to result from a geologic event or events. Ideally, these are manifested globally by similar or related physical changes in sediment deposited at the same time. Correlation of the boundary horizons by correlation of physical changes in other stratigraphic sections is the basis for establishing the relative age of strata throughout Earth.
A GSSP is a particular sequence of stratified rocks defined to contain a particular, physically located stratigraphic point that serves as the reference object to define the boundary between units in the time scale. In chronologic terms, each point corresponds to a time instant at the boundary between time intervals that compose the time scale, while in stratigraphic terms, the stratigraphic point defines the lower boundary of rocks formed during a named interval.
In the context of the GSSP project, boundary stratotypes provide the ultimate definition of elements of the time scale from the beginning of the Ediacaran up to but (probably) not including the beginning of the Holocene. For the earlier parts of the time scale, boundaries between intervals are defined chronometrically by assigning a numerical age to the boundary. These numerically defined boundaries are called global standard stratigraphic ages (GSSA).
Ages recorded using named units from the geologic time scale allow ordering without requiring numeric values. However, while the assignment of numeric values is not necessary for use of the time scale, it is convenient to calibrate the time scale against a time line. For those boundaries defined by a stratotype, the stratotype provides a locality from which specimens may be collected, the age of which may then be measured using quantitative techniques. When the material in the stratotype is unsuitable or insufficient for estimating the numerical age, then specimens from locations that can be correlated with the stratotype may be used instead. The experimentally determined dates from such specimens provide an estimate of the chronologic position (numeric coordinate on a time line) of the boundary.
This overview of the construction and calibration of the geologic time scale refers to a variety of concepts, instances of which are related to each other in various ways. Effective use of the time scale for stratigraphic correlation requires that these concepts and their interrelationships be precisely understood. There is a significant body of literature that describes the system, an introduction to which is provided by Walsh et al. (2004) and references therein, in the ICS guidelines (Remane et al., 1996), and in the overview to the International Stratigraphic Chart (International Commission on Stratigraphy, 2004).
However, the model for the construction and calibration of the geologic time scale has been described primarily in prose. This leaves open the possibility of ambiguity and omission. The most formal representation of the conceptual model is provided by the schemata (column headings) of various tabulations of the time scale and associated components. In this paper, we attempt to improve on this by providing a description of the system in a standard framework used for modeling technical information, using a standard symbolic notation. The framework is object-oriented (OO) modeling and analysis, and the notation that we use is the class diagram from the UML (Object Management Group, 2004).
We use the UML notation for three reasons: (1) it provides a rich description of concepts and their interrelationships that allows us to capture most of the nuances required, while its graphical nature allows these to be communicated to readers of various levels of expertise; (2) it is commonly used in software engineering, so models defined in UML can be easily converted into software representations, including XML (Yergeau et al., 2004), Java, C++, C#, etc.; and (3) it is the notation used by the leading organizations involved in standardization of geospatial information systems. This allows us to construct specialized software components for the geosciences that leverage existing technology.
A particular advantage of the OO approach is that the key information can be partitioned into different objects, and relationships between objects can be described independent of the level of detail known about each of the objects. While it is out of scope in this paper to provide an introduction to OO modeling and analysis (there are many introductory books), a brief summary of UML as used in this report is provided in 1
General Geospatial Information
Most investigations in geology have a strong geospatial flavor, and database and geographic information system (GIS) technologies are commonly used for the management and display of geologic data. It is therefore both convenient and efficient to develop models and encodings for geologic information that leverage developments in geospatial standards. The standardization framework that we use also addresses temporal issues in a manner that allows us to combine the concerns in a single model and encoding.
The methodology used in this report follows procedures and notation used in specifications issued by the International Organization for Standardization Technical Committee 211 (ISO/TC 211) and by the Open Geospatial Consortium (OGC). These organizations have been active since the mid-1990s, working to standardize models, representations, and processing services for geographic information.
ISO/TC 211 is responsible for around 40 international standards and reports in the ISO 19100 series. These specifications primarily describe conceptual or abstract models and approaches. OGC plays a complementary role in developing and testing specifications for geospatial data access and processing services, and implementation encodings for some of the information models developed through ISO. Of interest to us in the context of this report are the following:
ISO 19109, Rules for Application Schema, formally recognizes the existence of communities with needs for thematically specific information models and provides a “feature model” that serves as the basic framework for classifying the items of interest in the application domain;
ISO TS 19103, Conceptual Schema Language, lays out the UML profile used as the notation across the ISO 19100 series;
ISO 19108, Temporal Schema, describes a consistent model for temporal objects and reference systems; and
ISO DIS 19136, Geography Markup Language (GML), is an XML schema for components for geographic information.
GML (Cox et al., 2004) was developed by OGC, and includes an implementation of the temporal schema from ISO 19108. In this report, we lean heavily on the theoretical framework provided by ISO 19108 and its XML implementation in GML. The models shown in this report utilize the UML notation defined in ISO 19103.
A model and encoding for observations and measurements (O&M) has also been published through OGC (Cox, 2003). This provides a basis for describing specialized date measurements used for calibration of the time scale.
In addition to these activities that standardize generic geospatial information, there are several related projects directly addressing the geosciences. These include:
eXploration and Mining Markup Language (XMML) (http://xmml.arrc.csiro.au)—GML-based XML encoding for mineral exploration data. This includes components related to observations and measurements that are used in this report.
The North American Data Model (NADM) (http://nadm-geo.org)—conceptual models for information associated with geologic maps, presented using UML. This includes models for earth materials, geologic units, genesis, geologic structure, etc., which define a basic framework for representing geoscience information in computer information systems. NADM was developed by a group of geologists and information specialists from state and federal geological surveys in Canada and the United States.
GeoSciML—a project under the auspices of the IUGS Commission for Geoscience Information, to develop a GML-based XML encoding for geosciences. The principal stakeholders are statutory data custodians, and the focus is on the information required to support the maintenance of geologic maps. However, this is being done through a high-level conceptual model that has many components that will be of interest to the broader geoscience community.
GeoSciML is building on earlier work, in particular, that of the XMML and NADM projects. The geologic time scale encoding described in this report is likely to be incorporated in GeoSciML.
STANDARDS COMPONENTS USED IN THE TIME SCALE MODEL
Most of the classes shown in the following sections and diagrams inherit some standard properties from common base classes. The base classes are taken from ISO 19108 and from the UML representation of GML 3, GeoSciML, and O&M. Important base classes referred to in this report include:
Definition carries a mandatory “name” property, plus an unlimited set of aliases, and has an optional “description,” which carries the text of the definition or a link to a source.
Observation is an event, producing a “result” describing the value of some property of its target (e.g., a specimen). The result may take many forms: a Measurement is a special case that results in a numeric value with unit-of-measure. Every Observation has a featureOfInterest and uses a Procedure.
Procedure is a description of a process, such as an instrument, sensor, sample preparation, calculation, simulation, etc.
Section, Station are identifiable locations with a shape corresponding to a curve and point, respectively, usually associated with making observations and retrieving specimens.
Specimen is material retrieved by sampling at a location or on another feature, typically used as the subject of an observation or measurement.
Temporal Reference Systems
Figure 1 shows components concerning generic temporal objects and reference systems. The classes prefixed “TM_” are taken directly from ISO 19108, while the other classes represent the extensions required for the model in this paper.
All temporal reference systems derive from a common TM_ReferenceSystem. This has a mandatory property, domainOfValidity, which describes the spatiotemporal scope of the reference system (e.g., "common era, global" or "Cambrian, Australia"). Four concrete specializations are defined as follows:
TM_Calendar is a reference system based on years, months, and days.
TM_Clock is based on hours, minutes, and seconds in a particular day.
TM_CoordinateSystem provides the basis for numerically describing temporal position. This uses two properties to define a time line: an origin, which ties the scale to an external temporal reference position, and interval, which provides the basic unit or precision, such as seconds or millions of years.
TimeOrdinalReferenceSystem (TORS) provides the system required for a time scale based on named intervals. A TORS is composed of an ordered sequence of one or more component TimeOrdinalEra (TOE) elements. A TOE may be recursively decomposed into ordered member TOE elements, thus allowing a hierarchical system to be constructed. Each era is characterized by its name (inherited from Definition). Note that the term “era” is used generically, and is used for component intervals of all ranks.
In the definition of TOE, we have found it necessary to introduce a variation to the ISO definition. In the standard model (ISO 19108: 2003), the limits of TM_OrdinalEra are defined precisely by attributes of type DateTime. However, in historical and archaeological contexts, and certainly in the geologic time scale, while the order of eras within a TORS is known, the positions of the boundaries are often not precisely known and can only be estimated. We suggest that standard practice is better represented by a model using an explicit TimeOrdinalEraBoundary (TOEB) element to carry information concerning the transition between two TOEs. The temporal position of the era boundary is given by an associated TimeInstant, but the TOEB exists in its own right even if its position is not known. In the context of the geological time scale, the TOEB is central since it is the temporal concept that is associated with the boundary stratotype.
Finally, the TM_Position data type carries a “frame” property, which indicates which reference frame is used, usually an instance of one of the temporal reference systems.
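The ordinal reference system described in this section can be illustrated with a minimal Python sketch. It is not the normative model: the class names follow the text, but the attribute names (position_ma, members, components) and the precedes helper are assumptions introduced only for illustration. The sketch shows two points made above: eras nest recursively, and a boundary exists as an object in its own right even when its numeric position is unknown, so ordering does not depend on calibration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TimeOrdinalEraBoundary:
    name: str
    # Position on a time line; None means "not yet estimated".
    # The attribute name and the unit (Ma) are assumptions for this sketch.
    position_ma: Optional[float] = None


@dataclass
class TimeOrdinalEra:
    name: str
    start: TimeOrdinalEraBoundary
    end: TimeOrdinalEraBoundary
    # Ordered member eras (oldest first); empty for the finest rank.
    members: List["TimeOrdinalEra"] = field(default_factory=list)

    def leaves(self) -> List["TimeOrdinalEra"]:
        """Return the finest-rank eras under this era, in order."""
        if not self.members:
            return [self]
        return [leaf for m in self.members for leaf in m.leaves()]


@dataclass
class TimeOrdinalReferenceSystem:
    name: str
    domain_of_validity: str
    components: List[TimeOrdinalEra]

    def precedes(self, earlier: str, later: str) -> bool:
        """Compare two finest-rank eras by ordinal position only."""
        order = [leaf.name for era in self.components for leaf in era.leaves()]
        return order.index(earlier) < order.index(later)


# Boundaries are shared objects: the end of one era is the start of the next.
base_cambrian = TimeOrdinalEraBoundary("base of Cambrian")
base_ordovician = TimeOrdinalEraBoundary("base of Ordovician")
base_silurian = TimeOrdinalEraBoundary("base of Silurian")

cambrian = TimeOrdinalEra("Cambrian", base_cambrian, base_ordovician)
ordovician = TimeOrdinalEra("Ordovician", base_ordovician, base_silurian)
parent = TimeOrdinalEra(
    "demo parent era", base_cambrian, base_silurian, [cambrian, ordovician]
)

scale = TimeOrdinalReferenceSystem("demo scale", "global", [parent])
print(scale.precedes("Cambrian", "Ordovician"))
```

No boundary in this example carries a numeric position, yet the relative order of the two eras can still be asserted, which is the behavior the TOEB variation is intended to support.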
MODEL FOR GEOLOGIC TIME SCALE
In the portrayal of the model in this report, the information is partitioned between several diagrams for convenience (Figs. 2–5). The union of these describes a general model for time scales and the relationships with evidence in the geologic record. It is intended to include all components necessary to describe the practice specified in the ICS guidelines (Remane et al., 1996), but it also includes elements and relationships that relate it to other methodologies, in particular, for defining local or regional time scales. In order to illustrate the application of the model to the GSSP, a summary diagram is provided in which relationships that are not required by the ICS guidelines are omitted (Fig. 5).
The Geologic Time Scale
Figure 2 shows the way the generic components are extended for geochronologic purposes.
GeologicTimescale is a kind of TORS that is composed of one or more ordered TOEs together with two or more TOEBs. GeologicTimescale is, thus, a temporal complex that includes both the eras and the boundaries composing the reference system as first-order elements. Geochronologic specializations of each are provided.
GeochronologicEra is a kind of TimeOrdinalEra with boundaries defined by geologic evidence. It specializes TimeOrdinalEra by adding a “rank” attribute, whose value is one of the standard terms, such as eon, era, period, etc. Some additional properties are discussed in a following section.
Two specializations of TOEB are introduced. GeochronologicBoundary represents an era boundary defined with reference to some geologic evidence. Its properties are discussed in detail in the following section. NumericEraBoundary is provided for those boundaries defined chronometrically. It has no additional associations, but a “status” attribute is provided.
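As a rough illustration of these specializations, the following Python sketch encodes the cardinalities stated above (one or more eras, two or more boundaries) as a constructor check. The attribute names, the stratotype identifier field, the contents of the RANKS tuple beyond the terms named in the text, and the status value are assumptions made for the sketch rather than parts of the published model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The text names "eon, era, period, etc."; the remaining terms are an assumption.
RANKS = ("eon", "era", "period", "epoch", "age")


@dataclass
class TimeOrdinalEraBoundary:
    name: str
    position_ma: Optional[float] = None  # hypothetical attribute name


@dataclass
class GeochronologicBoundary(TimeOrdinalEraBoundary):
    # Defined with reference to geologic evidence; here only an identifier
    # of the stratotype point is kept (hypothetical simplification).
    stratotype_id: Optional[str] = None


@dataclass
class NumericEraBoundary(TimeOrdinalEraBoundary):
    # Defined chronometrically; carries only a "status" attribute.
    status: str = "unspecified"


@dataclass
class GeochronologicEra:
    name: str
    rank: str
    start: TimeOrdinalEraBoundary
    end: TimeOrdinalEraBoundary
    members: List["GeochronologicEra"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.rank not in RANKS:
            raise ValueError(f"unknown rank: {self.rank!r}")


@dataclass
class GeologicTimescale:
    name: str
    eras: List[GeochronologicEra]
    boundaries: List[TimeOrdinalEraBoundary]

    def __post_init__(self) -> None:
        # Cardinalities from the text: 1..* eras and 2..* boundaries.
        if len(self.eras) < 1:
            raise ValueError("a time scale needs at least one era")
        if len(self.boundaries) < 2:
            raise ValueError("a time scale needs at least two boundaries")


# Illustrative instances; identifiers and status values are placeholders.
lower = NumericEraBoundary("lower boundary (chronometric)", status="placeholder")
upper = GeochronologicBoundary("upper boundary (stratotype)", stratotype_id="point-1")
era = GeochronologicEra("demo era", "era", lower, upper)
timescale = GeologicTimescale("demo time scale", [era], [lower, upper])
print(len(timescale.eras), len(timescale.boundaries))
```

Both boundary kinds can bound the same era, which mirrors the treatment of eras and boundaries as first-order elements of the temporal complex.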
Geologic Evidence and the Time Scale
Figure 3 shows relationships of the conceptual objects with certain physically realizable geologic feature types, including units and sections, samples of which may play roles as stratotypes. Both eras and boundaries are represented in the geologic record.
A ChronostratigraphicUnit is the (notional) feature composed of all the rocks formed during the associated GeochronologicEra. Both ChronostratigraphicUnit and the commonly used LithostratigraphicUnit are kinds of GeologicUnit. Similarly, a ChronostratigraphicBoundary is the (notional) compound surface marking the upper or lower bound of a unit. Both ChronostratigraphicBoundary and LithostratigraphicBoundary are kinds of StratigraphicBoundary. In practice, the complete shape of any ChronostratigraphicUnit and ChronostratigraphicBoundary instance will not be precisely known, so while the existence of a unit and its boundaries is a fact, they will never be fully described. The ChronostratigraphicUnit is the complete body of rock formed during (formedDuring) a GeochronologicEra, and the ChronostratigraphicBoundary correlates with (correlatesWith) a GeochronologicBoundary. A ChronostratigraphicUnit carries a “rank” attribute, whose value is one of the standard terms such as system, stage, zone, etc.
Conventional samples of both units and boundaries may be defined, with a spatial dimensionality two orders less than the parent. For a solid unit, this sample is a section (whose shape is a curve), while for a surface the sample is a point. These are shown in the model as StratigraphicSection and StratigraphicPoint, respectively. A StratigraphicPoint is always contained within a StratigraphicSection, which is its hostSection. In principle, a StratigraphicSection may host any number of StratigraphicPoints. While an unlimited number of samples of the concrete geological unit or boundary may be made, a single instance must provide the reference locality or stratotype for the associated era or era-boundary, respectively.
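The hosting relationship and the single-stratotype constraint can be sketched in Python as follows. This is an illustrative reading of the paragraph above, not the published model: the height_m attribute, the add_point helper, and the StratotypeRegistry class are hypothetical devices used to show that a point always has exactly one host section, that a section may host many points, and that only one point may act as the stratotype for a given boundary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StratigraphicPoint:
    name: str
    height_m: float  # position along the section; hypothetical attribute
    # Excluded from repr/compare to avoid following the circular reference.
    host_section: "StratigraphicSection" = field(repr=False, compare=False)


@dataclass
class StratigraphicSection:
    name: str
    points: List[StratigraphicPoint] = field(default_factory=list)

    def add_point(self, name: str, height_m: float) -> StratigraphicPoint:
        """Create a point that is hosted by this section."""
        point = StratigraphicPoint(name, height_m, host_section=self)
        self.points.append(point)
        return point


class StratotypeRegistry:
    """Maps each boundary name to the single point acting as its stratotype."""

    def __init__(self) -> None:
        self._by_boundary: Dict[str, StratigraphicPoint] = {}

    def assign(self, boundary_name: str, point: StratigraphicPoint) -> None:
        if boundary_name in self._by_boundary:
            raise ValueError(f"{boundary_name} already has a stratotype")
        self._by_boundary[boundary_name] = point

    def stratotype_for(self, boundary_name: str) -> StratigraphicPoint:
        return self._by_boundary[boundary_name]


# Placeholder names and heights, not real localities.
section = StratigraphicSection("Section A")
p1 = section.add_point("point 1", 2.5)
p2 = section.add_point("point 2", 14.0)

registry = StratotypeRegistry()
registry.assign("boundary X", p1)
print(registry.stratotype_for("boundary X").host_section.name)
```

Any number of further points may be added to the section as ordinary samples; only the assignment of the stratotype role is restricted.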
A further important concept is the StratigraphicEvent. In general, events are associated with time primitives of either zero or a finite extent (i.e., time instants or time periods). However, in the context of the geological time scale, a useful event has negligible duration, and is associated with a boundary and characterized by a StratigraphicPoint.
Calibration of the Time Scale
The time scale is calibrated by estimating the position or time-coordinate of boundaries within it (Gradstein and Ogg, 2004). Determination of geologic age fundamentally relies on isotopic dating of mineral phases that can be related to the age of the enclosing rock (see Faure, 1977), or to the correlation of calculated changes in Earth's orbital parameters as a function of time to patterns of physical property variations related to those parameters in stratigraphic sequences (Laskar, 1999; Shackleton et al., 1999). Thus, estimation of the position of a boundary is based on observations made on specimens collected from stratigraphic sections that contain, or may be correlated with, a boundary stratotype, or observations made concerning the position of a boundary stratotype within patterns displayed in its host section. Figure 4 introduces classes supporting the process of estimating the numeric position of a boundary.
DateMeasurement is a kind of measurement whose result is a (numeric) value with reference to a TimeCoordinateSystem (Fig. 1). In common with all observations, it relates to a physical target or featureOfInterest, usually either (1) a specimen, or (2) a sampling site such as a StratigraphicPoint in its context within its host StratigraphicSection. The measurement uses a DatingProcedure, preferably a precise numeric method such as one of the radiometric methods or based on astronomical cycles. If these are not suitable for the physical evidence, then less precise methods are used.
The GeochronSpecimen is some material that samples a site. The site will strictly be a small interval bracketing a point of interest (i.e., a short section), but may often be treated as a point at the scale of interest. If the material at the stratotype itself is unsuitable for date determination, then the featureOfInterest related to the actual measurement may sample a different locality that is correlated with the stratotype, or with another known relationship with the stratotype.
Finally, StratigraphicDateEstimate represents an identified interpretation of temporal position and is substitutable for TimeInstant. The observationalBasis of the StratigraphicDateEstimate may be one or more observations (e.g., DateMeasurements). Thus, the association labeled “Geometry” between TimeInstant and TimeOrdinalEraBoundary usually refers to a StratigraphicDateEstimate when the GeochronologicBoundary descendant is involved.
Note the various cardinalities on the associations. A measurement is associated with a single procedure and a single target object. Specimens may be associated with a point or interval. Measurements may be made on a target. Some of the associations are only traversable in one direction: a procedure does not know about all the measurements made using it; a DateMeasurement does not know if it is used as the basis for a StratigraphicDateEstimate.
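The calibration classes and the direction of their associations can be sketched in Python as below. The sketch assumes simplified attribute names (result_ma, uncertainty_ma, quality_ma) and uses placeholder numbers that are not real measurements. It deliberately does not compute the estimate from the measurements, because the model records the basis of an estimate without prescribing how the values are combined.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class DatingProcedure:
    name: str


@dataclass(frozen=True)
class StratigraphicPoint:
    name: str
    host_section: str


@dataclass(frozen=True)
class GeochronSpecimen:
    identifier: str
    sampled_site: str  # a point or short interval, given here by name only


@dataclass(frozen=True)
class DateMeasurement:
    # Exactly one target and one procedure per measurement.
    feature_of_interest: Union[GeochronSpecimen, StratigraphicPoint]
    procedure: DatingProcedure
    result_ma: float        # position on the time line (negative = before origin)
    uncertainty_ma: float   # hypothetical attribute


@dataclass
class StratigraphicDateEstimate:
    position_ma: float
    quality_ma: float  # estimated error; hypothetical attribute name
    # One-way association: the estimate knows its basis, but a
    # DateMeasurement holds no reference back to estimates that use it.
    observational_basis: List[DateMeasurement] = field(default_factory=list)


# Placeholder values for illustration only.
radiometric = DatingProcedure("radiometric method (unspecified)")
specimen = GeochronSpecimen("specimen-1", "point 1")
m1 = DateMeasurement(specimen, radiometric, result_ma=-100.2, uncertainty_ma=0.4)
m2 = DateMeasurement(specimen, radiometric, result_ma=-99.9, uncertainty_ma=0.5)

estimate = StratigraphicDateEstimate(
    position_ma=-100.0, quality_ma=0.5, observational_basis=[m1, m2]
)
print(len(estimate.observational_basis))
```

Because the navigation is one-way, finding every estimate that relies on a given measurement would require a search over estimates rather than a lookup on the measurement.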
Representation of the ICS Model
The model shown in Figures 1–4 provides a description of a relatively comprehensive set of relationships between objects involved in the definition and calibration of the geologic time scale. This includes a number of associations that reflect relationships between objects in the system, but which are not required or are deprecated in modern stratigraphic practice, as defined in the guidelines of the ICS (Remane et al., 1996).
The diagram in Figure 5 shows a complete model, constructed by combining elements introduced in Figures 1–4, but suppressing classes and associations that either conflict with or are not used by the practice described in the guidelines (Remane et al., 1996). Furthermore, in this diagram most of the required class attributes are shown. For example, a number of attributes describe details of points and sections, some of which are inherited from parent classes as indicated by the annotation.
We may summarize the story told in this model as follows.
GeologicTimescale is a specialized TORS and is composed of an ordered sequence of TOE elements, along with the TOEB elements that act as reference points. TOE elements are recursively nested and assigned a rank within a standard hierarchy. GeochronologicEra and GeochronologicBoundary are specializations of the standard eras and boundaries.
One StratigraphicPoint plays the role of stratotype for a GeochronologicBoundary, which records a GeochronologicEvent. The GeochronologicBoundary corresponds with the initiation of rock formation during the GeochronologicEra for which the StratigraphicPoint is the lower boundary-stratotype. Under ICS guidelines there is no corresponding association of a unique stratigraphic section with a GeochronologicEra. Unit stratotypes may be used for regional and local purposes, but their use is deprecated for specification of the global time scale.
DateMeasurements are made on either (1) a StratigraphicPoint in its context (e.g., for determinations based on astronomical cycles), or (2) on a GeochronSpecimen (e.g., for radiometric date determinations). Specimens may be sampled in the stratotype section, or another StratigraphicSection that is correlated with the stratotype. A StratigraphicDateEstimate provides the preferred value of the position of the GeochronologicBoundary. The estimate is usually based on one or more DateMeasurements, but may be derived from some other basis.
A StratigraphicDateEstimate has a quality associated with it, which allows the estimated error to be recorded. StratigraphicDateEstimate along with both StratigraphicPoint and StratigraphicSection have status attributes that can be used to record whether these are ratified through GSSP.
A suitable StratigraphicPoint has an association with one or more StratigraphicEvents, which are associated with observable evidence in the section that defines the point, such as the appearance or disappearance of particular fossil taxa, or the beginning or end of some climatic phenomenon. Note, however, that in the ICS approach, it is the StratigraphicPoint itself (the golden spike) that provides the ultimate reference for the boundary, so its position will remain unchanged even if new evidence modifies the interpretation of the stratigraphic event (Walsh et al., 2004).
StratigraphicEvent inherits from the event class (not shown) an eventTime association with a notional time object. However, as used here, it is assumed that the position of a StratigraphicEvent is not available directly, but may be recovered by tracing the association with a boundary or prototype point, for which estimates of the position are available.
The ICS guidelines (Remane et al., 1996) provide a set of information that must be supplied for a proposed GSSP. 01 shows how these are implemented in the UML model presented here. All the required information has suitable slots in the model, so this means that the record of a submission to ICS could take the form of a document structured according to this model.
XML Document Format
UML is a convenient means to represent an information model. To make use of the model, we require an implementation that allows instances that conform to the model to be expressed. Some software development environments support automatic configuration and code generation of data structures and representations based on a UML model. The instances may take various forms, including tables and messages.
In this work we focus on a message format using XML (Yergeau et al., 2004). XML is a text-based method for serializing structured and semi-structured data. It was developed primarily for transfer using Web protocols, but may also be used to define file formats for persistent storage. A particular advantage of the plain-text encoding is that it allows inspection and modification using basic text-processing tools to supplement processing using specialized software. However, it is important to note that XML is not a presentation format, and the information in an XML document should be transformed and formatted for human consumption. This might include transformations of values from mathematical formats into conventional presentation forms. For example, latitude and longitude appear as signed decimal degrees in a GML document, but cartographic practice prefers a sexagesimal (degrees-minutes-seconds-hemisphere) representation; geological dates are negative numbers relative to the standard origin, but are usually viewed as positive numbers corresponding to age (see further discussion below).
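The two presentation transformations mentioned above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the use of millions of years (Ma) as the unit for dates is an assumption made for the example.

```python
def to_dms(value: float, is_latitude: bool) -> str:
    """Format signed decimal degrees as degrees-minutes-seconds-hemisphere."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    # Work in whole seconds so that rounding cannot produce "60" seconds.
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{}°{:02d}'{:02d}\"{}".format(degrees, minutes, seconds, hemisphere)


def position_to_age(position: float) -> float:
    """Convert a signed date (negative = before the origin) to an unsigned age.

    Assumption: the position is expressed in Ma relative to the origin.
    """
    if position > 0:
        raise ValueError("a position after the origin has no geologic age")
    return -position


print(to_dms(-35.25, is_latitude=True))     # 35°15'00"S
print(to_dms(149.125, is_latitude=False))   # 149°07'30"E
print(position_to_age(-251.0))              # 251.0
```

The stored values stay in their signed, machine-friendly form; only the display layer applies these conversions.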
As discussed above, XML representation of data using GML is at the core of various Web-service interfaces defined for access to geospatial data by the Open Geospatial Consortium, such as Web Feature Service (Vretanos, 2005). GML is being ratified through ISO/TC 211 and currently has the status of Draft International Standard 19136.
A pattern for XML serialization is provided by ISO 19118 and GML 3 (Cox et al., 2004). This depends on the model using the UML profile defined in ISO 19103 and GML 3, mentioned above. In summary, the serialization method implements the metamodel in the following ways:
1. The structure of the XML document corresponds to a view of the UML model as a tree rooted at the class of interest;
2. Both classes and properties (UML attributes and associations) appear as XML elements in the XML instance document;
3. The XML element name matches the class or property name (i.e., UML attribute names, or in the case of associations, the rolename at the target class);
4. Where the value of a property has a complex structure (i.e., shown as a class rather than a data type in the UML model) it may be given either as a structure of sub-elements nested within the property element (“inline”), or via a reference to a value elsewhere using an “xlink:href” attribute on the property element; and
5. Generalization is implemented as substitution group affiliation in the XML schema.
Rule 1 is concerned with accommodating the fact that, in general, a UML model is a graph of links between classes, while XML's nested element pattern embodies a set of relationships that have a tree form.
Rules 2 and 3 mean that the resulting instance document is “striped,” with nested elements alternating between class and property names, and where the appearance of a class as a descendant of another class is always mediated by a container element corresponding to a property.
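The striping that results from the encoding rules can be illustrated with a small sketch. The Python classes below are hypothetical stand-ins for two classes of the model, namespaces are omitted for brevity, and the position value is illustrative; the sketch covers only inline values, not references or substitution groups.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class TimeInstant:
    position: str


@dataclass
class NumericBoundary:
    name: str
    position: TimeInstant


def serialize(obj) -> ET.Element:
    """Emit a class element whose children are property elements.

    A property with a complex value wraps a nested class element,
    so class and property names alternate down the tree.
    """
    class_element = ET.Element(type(obj).__name__)
    for prop in fields(obj):
        value = getattr(obj, prop.name)
        property_element = ET.SubElement(class_element, prop.name)
        if is_dataclass(value):
            property_element.append(serialize(value))  # inline complex value
        else:
            property_element.text = str(value)         # simple value
    return class_element


boundary = NumericBoundary(name="Base of Proterozoic",
                           position=TimeInstant(position="-2500"))
xml_text = ET.tostring(serialize(boundary), encoding="unicode")
print(xml_text)
```

Reading the output from the root downward gives NumericBoundary (class), position (property), TimeInstant (class), position (property), which is the alternation described above.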
The method uses an intermediate World Wide Web Consortium (W3C) XML Schema (Fallside et al., 2004) that implements the model and supports schema-validation of instance documents.
Following the model and encoding rules described above, a representation of (parts of) the geologic time scale and its calibration following the ICS guidelines is shown in Listings 1–5.
Listing 1 represents the complete geologic time scale, though only the three eras of rank Eon are shown, along with descriptions of the two intermediate boundaries. An illustration of the finer decomposition of parts of the Phanerozoic and Late Permian is shown in Listing 2.
Listing 1. The geologic time scale decomposed to eon level.
<?xml version="1.0" encoding="UTF-8"?>
<gml:description>The geologic timescale, as defined by ICS—decomposed to eons only</gml:description>
<gml:name>ICS Geologic Timescale—eons only</gml:name>
<!-- Positions of Global Stratotype Points as given in http://www.stratigraphy.org/gssp.htm 2004-04-25 -->
<gml:description>Three different estimates of the position of this boundary are included.</gml:description>
<gml:name>Base Lower Cambrian</gml:name>
Note that in this and the subsequent examples, the XML document is composed of elements from several different namespaces (Bray et al., 1999). The components that are specific to the geologic time scale are in a namespace for which the “gt” prefix is used. Components from other parts of GeoSciML use the prefix “gsml.” General components inherited from GML are indicated by the “gml” namespace prefix. Some supporting elements come from the “meta” namespace, and components relating to observations and measurements and to sampling come from the “om” and “sa” namespaces, respectively.
The time scale is contained within a GeologicTimeScale element, and is composed of a set of GeochronologicEra, NumericBoundary, and GeochronologicBoundary elements.
The role of each GeochronologicEra in the GeologicTimeScale is indicated by its container component element. GeochronologicEra carries name, start, end, and rank properties. Ordering of the GeochronologicEras is encoded in the sequence of elements in the XML document. Following a standard GML pattern (Cox et al., 2004), the values of the start and end properties are given via references that link to definitions of boundaries available elsewhere. The value of each link is a URIReference (Berners-Lee et al., 1998), pointing to a fragment in a document identified by a URI. In the examples shown here, many of the references are internal to the same document, so the short form of pointer is used, comprising a pound symbol followed by the handle of the target element.
Both the NumericBoundary and GeochronologicBoundary play the same role, as referencePoints in the GeologicTimeScale, and as the values of start or end properties of the relevant eras. Note that these have multiple names, which are equivalent. The Archean-Proterozoic boundary is a NumericBoundary whose position is given as a TimeInstant. On the other hand, the Proterozoic-Phanerozoic boundary is a GeochronologicBoundary, for which the position is a StratigraphicDateEstimate, based on several (links to) DateMeasurements. For a more complete illustration of StratigraphicDateEstimate and DateMeasurement, see Listing 3 below. The boundary is associated with an event, and has a stratotype, whose value is a link to a description of a StratigraphicPoint. For a more complete illustration of StratigraphicEvent, and StratigraphicPoint, see the discussion of Listing 4 below.
Note that each of the elements representing distinct identifiable objects (i.e., those that instantiate classes shown in Fig. 5) carries a “gml:id” attribute. The value of this is unique within the document, and provides a handle for the document element and its contents, which supports cross-references to this component. Although it is not necessary for the handle to have any semantic significance, here we use the standard symbols for eras as a mnemonic device, and for boundaries we concatenate the symbols for the two adjoining eras with an underscore. The one exception is that G is used instead of ε for the Cambrian Period, which allows encoding using the reduced character set available on most standard keyboards.
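The way a same-document reference is followed from an “xlink:href” value to the element carrying the matching “gml:id” can be sketched as below. The fragment is a reduced, hypothetical instance: the “gt” namespace URI, the handles, and the exact container element names are assumptions for illustration and are not taken from the published schema.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
XLINK = "http://www.w3.org/1999/xlink"
GT = "urn:example:gt"  # hypothetical namespace URI for the "gt" prefix

DOC = f"""
<gt:GeologicTimeScale xmlns:gt="{GT}" xmlns:gml="{GML}" xmlns:xlink="{XLINK}">
  <gt:component>
    <gt:GeochronologicEra gml:id="PR">
      <gml:name>Proterozoic</gml:name>
      <gt:start xlink:href="#A_PR"/>
      <gt:end xlink:href="#PR_PH"/>
    </gt:GeochronologicEra>
  </gt:component>
  <gt:referencePoint>
    <gt:NumericBoundary gml:id="A_PR">
      <gml:name>Base of Proterozoic</gml:name>
    </gt:NumericBoundary>
  </gt:referencePoint>
  <gt:referencePoint>
    <gt:GeochronologicBoundary gml:id="PR_PH">
      <gml:name>Base of Phanerozoic</gml:name>
    </gt:GeochronologicBoundary>
  </gt:referencePoint>
</gt:GeologicTimeScale>
""".strip()


def resolve(root: ET.Element, href: str):
    """Return the element whose gml:id matches a '#handle' reference, or None."""
    if not href.startswith("#"):
        raise ValueError("this sketch only follows same-document references")
    handle = href[1:]
    for element in root.iter():
        if element.get(f"{{{GML}}}id") == handle:
            return element
    return None


root = ET.fromstring(DOC)
era = root.find(f".//{{{GT}}}GeochronologicEra")
start = resolve(root, era.find(f"{{{GT}}}start").get(f"{{{XLINK}}}href"))
end = resolve(root, era.find(f"{{{GT}}}end").get(f"{{{XLINK}}}href"))
print(start.find(f"{{{GML}}}name").text)  # Base of Proterozoic
print(end.find(f"{{{GML}}}name").text)    # Base of Phanerozoic
```

A reference to another document (a URI before the pound symbol) would require fetching and parsing that document first, which is outside the scope of this sketch.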
Listing 2 shows an expansion of the Phanerozoic eon. This has three member GeochronologicEra elements, describing the Paleozoic, Mesozoic, and Cenozoic eras. The conventional decomposition of the Paleozoic is shown by five member elements carrying links (to descriptions of eras not shown here) and a final member element which contains a GeochronologicEra element describing the Permian period. The latter is decomposed further into GeochronologicEra elements representing (a subset of) the relevant epochs and ages. Elements describing subage and chron ranks are not shown.
Listing 2. The Phanerozoic era, decomposed to age level (Late Permian only).
<gml:description> Paleozoic Era.
Note that this era definition contains references to some eras that are not yet described here:
viz. G, O, S, D, C.</gml:description>
<gt:member xlink:title="Cambrian" xlink:href="#G"/>
<gt:member xlink:title="Ordovician" xlink:href="#O"/>
<gt:member xlink:title="Silurian" xlink:href="#S"/>
<gt:member xlink:title="Devonian" xlink:href="#D"/>
<gt:member xlink:title="Carboniferous" xlink:href="#C"/>
<gml:description> Permian-Carboniferous time scale is derived from calibrating a master composite section to selected radiometric ages</gml:description>
<gml:description> Mesozoic Era
Note that this era definition contains references to some eras that are not yet described here:
viz. T, J, K.</gml:description>
<gt:member xlink:title="Triassic" xlink:href="#T"/>
<gt:member xlink:title="Jurassic" xlink:href="#J"/>
<gt:member xlink:title="Cretaceous" xlink:href="#K"/>
<gml:description> Cenozoic Era
Note that this era definition contains references to some eras that are not yet described here:
viz. Pg, Ng.</gml:description>
<gt:member xlink:title="Paleogene" xlink:href="#Pg"/>
<gt:member xlink:title="Neogene" xlink:href="#Ng"/>
Listing 2 primarily illustrates how the component GeochronologicEra elements are nested, following the structure of TimeOrdinalReferenceSystem given by ISO 19108.
Listing 3 shows the details of two GeochronologicBoundary elements, which delimit the Changhsingian age shown in Listing 2. The estimate of the time position of each is carried by a StratigraphicDateEstimate element, which in turn points to its observationalBasis in the form of the DateMeasurements shown in Listing 4. The structure of DateMeasurement follows the Observations and Measurements model (Cox, 2003) to capture various metadata about the details of the measurement. In the cases shown here, the target of all DateMeasurements is indicated simply as the stratotype, but in general a feature such as a GeochronSpecimen may be indicated, supporting a full record of the details of the experimental process. Each boundary has event and stratotype elements, which carry links to a StratigraphicEvent and a StratigraphicPoint, respectively.
Listing 3. Two GeochronologicBoundary descriptions and the associated StratigraphicDateEstimate and DateMeasurement elements relating to one of the eras shown in Listing 2.
<gml:name>Base of Changhsingian</gml:name>
<gml:name>Base of Mesozoic</gml:name>
<gml:name>Base of Triassic</gml:name>
<gml:name>Base of Lower Triassic</gml:name>
<gml:name>Base of Induan</gml:name>
<gt:status>GSSP Ratified 2001</gt:status>
Listing 4 contains the details of DateMeasurements that are the basis for StratigraphicDateEstimates. This listing is referred to as dates.xml in Listing 3.
Listing 4. Details of DateMeasurements that are the basis for StratigraphicDateEstimates. This listing is referred to as dates.xml in Listing 3.
<gml:description>Calibration of a master composite section to selected radiometric ages</gml:description>
<gml:description>U-Pb ages bracket GSSP</gml:description>
<gml:name>Bowring et al., 1998</gml:name>
Finally, Listing 5 illustrates the structure of the descriptions of StratigraphicEvent and StratigraphicPoint elements referred to by the event and stratotype properties of the GeochronologicBoundary elements in Listing 3. Note that since a StratigraphicPoint element potentially describes a golden spike in the calibration of the time scale, this has a status property to indicate if it has been ratified through the GSSP program.
Listing 5. The StratigraphicPoint and StratigraphicEvent elements associated with the boundaries shown in Listing 3.
<gml:description>Leading candidates are in China</gml:description>
<gt:offset uom="m" xsi:nil="true"/>
<gml:description>Base of Bed 27c, Meishan, Zhejiang, China</gml:description>
<gt:hostSection xlink:title="Bed 27c, Meishan, Zhejiang, China"/>
<gt:additionalCorrelationProperty>Termination of major negative carbon-isotope excursion</gt:additionalCorrelationProperty>
<gt:additionalCorrelationProperty>About 1 myr after peak of Late Permian extinctions.</gt:additionalCorrelationProperty>
<gt:status>GSSP Ratified 2001</gt:status>
<gml:description>Near lowest occurrence of conodont Clarkina wangi</gml:description>
<gml:description>Conodont, lowest occurrence of Hindeodus parvus</gml:description>
The examples shown in Listings 1–5 show a representative subset of a time scale. A complete time scale would reuse the patterns shown here for the full set of eras of all ranks and their associated boundaries.
SOME THEORETICAL IMPLICATIONS
Reference Systems and Time Scales
There has been some discussion of the relationship between the geologic time scale and other measurement systems (Walsh et al., 2004). The model for temporal reference systems summarized above provides a useful framework for this. Broadly, there are three kinds of reference system or scales involved here:
Ratio or absolute scale: a value on a ratio scale describes the “amount of” something. The amount, or measure, is given as an unsigned number that is scaled by some unit of measure. This may be expressed in arbitrary precision (though the value is not necessarily accurate or meaningful). Mass, length, and concentration are measured on ratio scales. In a temporal context, the length of a time interval or the age of an object may be given as a number of seconds, years, etc.
Interval scale or coordinate system: a value on an interval scale describes position relative to a datum or origin. The distance from the datum is given as an amount scaled by a unit of measure, in arbitrary precision. The position of the origin of an interval scale is arbitrary, so positions on both sides of a datum are possible, hence the value must be signed. The value of a potential must be expressed using an interval scale. In a temporal context, a position or date may be expressed as a numeric value relative to a time coordinate system. In geochronology, the conventional origin for numeric scales is 1950, though this is only distinguishable from “the present” for very high precision dating methods dealing with the relatively recent past.
Ordinal reference system or ordinal scale: values are given as an ordinal unit or classifier, denoted by a symbol such as a word or code. Relative sizes or positions may be described using an ordinal reference system, with a fixed precision determined by the extent of the ordinal unit, which may vary across the scale. Ordinals may be used for classification of absolute values (e.g., the well-known grain-size classifications in sedimentology) as well as position (the geologic time scale), so the ordinal scale may be calibrated against either a coordinate system or a ratio scale. It is important to note that while an ordinal system depends on the ordering of the events that define the boundaries between units in the system, the positions of these boundary events are not necessarily known. Walsh et al. (2004) refer to ordinal units as classificatory pigeonholes.
The differences between the types of scale are also shown by the operations that are valid on values using them and their results. The relative quantities of two measures may be determined by subtraction or division, with the result being a measure or a ratio, respectively. The relative separation of two positions on an interval scale may only be determined by subtraction, with the result being an amount on a ratio scale.
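The distinction between the operations valid on each kind of scale can be made concrete with two small types. This is a sketch under stated assumptions: the class names Measure and Position are hypothetical, the numeric values are illustrative, and the separation of two positions is returned as an unsigned amount, which discards the direction of the difference.

```python
from dataclasses import dataclass


def _same_unit(a, b) -> None:
    if a.unit != b.unit:
        raise ValueError("operands must share a unit of measure")


@dataclass(frozen=True)
class Measure:
    """An amount on a ratio scale: an unsigned number scaled by a unit."""
    value: float
    unit: str

    def __post_init__(self):
        if self.value < 0:
            raise ValueError("a measure on a ratio scale is unsigned")

    def __sub__(self, other: "Measure") -> "Measure":
        _same_unit(self, other)
        return Measure(self.value - other.value, self.unit)

    def __truediv__(self, other: "Measure") -> float:
        _same_unit(self, other)
        return self.value / other.value  # a dimensionless ratio


@dataclass(frozen=True)
class Position:
    """A signed position on an interval scale, relative to an origin."""
    value: float
    unit: str

    def __sub__(self, other: "Position") -> Measure:
        _same_unit(self, other)
        # Subtracting positions yields an amount on a ratio scale.
        return Measure(abs(self.value - other.value), self.unit)
    # No __truediv__: the ratio of two positions is not meaningful.


later = Position(-251.0, "Ma")
earlier = Position(-542.0, "Ma")
duration = later - earlier
print(duration)                         # Measure(value=291.0, unit='Ma')
print(duration / Measure(97.0, "Ma"))   # 3.0
```

Attempting `later / earlier` raises a TypeError, which mirrors the statement that only subtraction is valid for positions on an interval scale.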
The common practice of giving geologic age as an unsigned number is consistent with considering age to describe the “amount of years” in an object. Age and temporal position are often used interchangeably in geochronology, with little confusion, because of the practice of setting the datum as the present. In the context of the encoding shown here, the position of a boundary is given as a signed number on an interval scale, while the result of an age determination should be a measure on a ratio scale. However, in order to utilize the standard structures provided by ISO 19108:2003 and GML, the StratigraphicDateEstimate inherits the position property from TM_Instant, and thus gives the value as a (signed) position on an interval scale.
Ordinal Reference System versus Constrained Topology
The GeologicTimeScale described here is structured as a TimeComplex, composed of eras and boundaries corresponding to the time edges and time nodes of a temporal topology complex (ISO 19108:2003). There are two issues with expressing the time scale as a topology complex, however.
The first is that this would require multiple-inheritance, with the TORS class deriving from both TM_ReferenceSystem and TM_TopologyComplex. While useful in principle, multiple inheritance is notoriously problematic in practice, and alternatives such as interfaces are commonly used. Thus, in ISO 19108 the concepts of ordinal reference system and topology complex are kept separate. This reflects a preference for single-inheritance in the model, with the ordinal reference system grouped with reference systems rather than topology complexes.
The second issue concerns the constraints that must be imposed so that the complex can fulfill the requirements of a reference system. These are as follows. The ordinal eras and ordinal era boundaries must form a connected, covering network or complex for the domain of the reference system. Furthermore, the complex must be constrained such that each era may only be subdivided once by a set of eras of a lower rank. In terms of the topology complex, the set of edges that either starts or ends at any node must include exactly one of each rank between the highest and lowest rank represented. The single hierarchy that results ensures there is no ambiguity in the relative positions of eras. We might term this an UnambiguousTimeTopologyComplex.
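Part of these constraints can be checked mechanically. The sketch below is a simplified, hypothetical representation rather than the model itself: each era holds the handles of its start and end boundaries and a single ordered list of members, so the "subdivided only once" rule is enforced by construction, and the function tests only that the members form a gap-free chain covering the parent era.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Era:
    name: str
    start: str   # handle of the boundary at which the era begins
    end: str     # handle of the boundary at which the era ends
    members: List["Era"] = field(default_factory=list)


def is_valid_decomposition(era: Era) -> bool:
    """True if every decomposed era is exactly covered by its members."""
    if not era.members:
        return True
    # The first and last members must share the parent's boundaries.
    if era.members[0].start != era.start or era.members[-1].end != era.end:
        return False
    # Adjacent members must meet at a common boundary (no gaps or overlaps).
    for earlier, later in zip(era.members, era.members[1:]):
        if earlier.end != later.start:
            return False
    return all(is_valid_decomposition(member) for member in era.members)


# Boundary handles here are illustrative only.
good = Era("B", "A_B", "B_C", [
    Era("B1", "A_B", "B1_B2"),
    Era("B2", "B1_B2", "B2_B3"),
    Era("B3", "B2_B3", "B_C"),
])
gap = Era("B", "A_B", "B_C", [
    Era("B1", "A_B", "B1_B2"),
    Era("B3", "B2_B3", "B_C"),   # B2 is missing, so the chain is broken
])
print(is_valid_decomposition(good))  # True
print(is_valid_decomposition(gap))   # False
```

A fuller check against the topology complex would also inspect the rank of every edge meeting at each node, which this sketch does not attempt.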
For example, in the temporal topological complex shown in Figure 6, we show edges representing eras as arrows, between nodes representing boundaries shown as filled circles. The eras have various ranks implied by the thickness of the line, and are labeled B, C1, etc. Some of the nodes are labeled B_C, B4_B5, etc.
The parts of the graph colored green represent a valid ordinal reference system. For example, a feature assigned the age B22 is unambiguously earlier than a feature of age B4, and is during the life of a feature of age B. These relationships are clear even if the numerical positions of the end points of some or all of the eras are not known, or not known precisely.
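The relationships just described can be derived from the ordering and nesting of eras alone, without any numeric calibration. The sketch below assumes a hierarchy in which B is decomposed into B1 to B5 and B2 into B21 to B24; this structure is a guess at the arrangement in Figure 6, adopted only for illustration, and every decomposed era is assumed to have at least two members.

```python
def build_spans(era, spans, next_leaf=0):
    """Record, for each era, the range of lowest-rank eras it covers.

    `era` is a (name, members) pair; members are listed in temporal order.
    Returns the next unused leaf index.
    """
    name, members = era
    first = next_leaf
    if members:
        for member in members:
            next_leaf = build_spans(member, spans, next_leaf)
    else:
        next_leaf += 1
    spans[name] = (first, next_leaf - 1)
    return next_leaf


def relation(spans, a, b):
    """Describe the position of era `a` relative to era `b`."""
    if a == b:
        return "equal"
    a_first, a_last = spans[a]
    b_first, b_last = spans[b]
    if a_last < b_first:
        return "before"
    if b_last < a_first:
        return "after"
    if b_first <= a_first and a_last <= b_last:
        return "during"
    return "contains"


scale = ("B", [
    ("B1", []),
    ("B2", [("B21", []), ("B22", []), ("B23", []), ("B24", [])]),
    ("B3", []),
    ("B4", []),
    ("B5", []),
])
spans = {}
build_spans(scale, spans)
print(relation(spans, "B22", "B4"))  # before
print(relation(spans, "B22", "B"))   # during
```

An era name that belongs to a different reference system is simply absent from `spans`, so no relation can be computed for it.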
The parts of the graph colored blue contain an alternative primary decomposition of era B, labeled b1, b2, etc, where the elements of the decomposition have the same rank as the elements in the existing decomposition. Note that, unless the positions of the nodes are precisely calibrated on a numeric scale, it is not possible to determine the relative temporal positions of features whose ages are b3 and B24. The order of components is ambiguous, so the complex including both does not qualify as a valid reference system.
The blue subset may, however, comprise a different reference system for era B, for example, having a different (spatial) domain of validity. Note that the temporal relationship between objects characterized using different reference systems is in general indeterminate.
This describes the common situation in stratigraphy where it may not be possible to determine the relative age of objects from different regions if local time scales are in use. Correlation projects attempt to resolve this by discovering, or asserting, relationships between elements of time systems defined originally for different domains of validity. If successful, this may result in a merging of different systems to form a single system (hierarchy) with a domain of validity that is the union of the domains of the contributing systems.
We have presented an integrated model for the geologic time scale, its formal definition using type localities according to ICS guidelines, and the measurements involved in calibrating it against a numeric scale. The model is represented using a formal notation, the UML Class Diagram, which is widely used in software engineering and business-process analysis. Furthermore, we have used a profile of UML that allows us to generate an XML encoding compatible with geospatial standards from ISO and OGC. The latter means that information related to the time scale may be transferred using standard Web-service interfaces, such as Web Feature Service.
The UML model and XML schema, and example instances described in this report, are available online from https://www.seegrid.csiro.au/subversion/xmml/trunk/GeoSciML/draft/model/, https://www.seegrid.csiro.au/subversion/xmml/trunk/GeoSciML/draft/schema/, and https://www.seegrid.csiro.au/subversion/xmml/trunk/GeoSciML/draft/instances/geoTime/.
APPENDIX 1. INTRODUCTION TO UML CLASS DIAGRAMS
The UML (Object Management Group, 2001) is a well-known notation, and is described in many introductory and advanced books (e.g., Fowler and Scott, 2000). It may be used to model various technical, social, and natural systems, and is commonly used for analysis of business processes and in software design, particularly of interfaces.
The UML includes several diagram types. In this report we use only class diagrams (see Figs. 1–5). These are superficially similar to the entity-relationship (E-R) notation used in data modeling for relational database design. However, the UML includes refinements to support the description of systems according to object-oriented principles. In particular, the relationships between concepts are classified in various ways, indicated on the diagram by different line and arrow styles with annotations. In addition to attributes, other kinds of properties may be specified for each concept.
Furthermore, we use the capabilities of class diagrams in a constrained way, broadly corresponding to the profile described in ISO 19103 and in Annex E of the GML specification (Cox et al., 2003). The key elements used are summarized in the following paragraphs.
Each concept of interest is represented as a class, and shown on the class diagram as a multi-compartment box. The top compartment holds the name of the classifier, optionally preceded by the name of the package it belongs to. An instance of a class is called an object, with an “is a” relationship with the classifier (e.g., Abby is a person). In the case of abstract classes, which exist to support a coherent class hierarchy but will never supply instances, the name is shown in italics. Attributes of the class are listed in the second compartment, each by an entry of the form “name:Type,” with optional cardinality. Operations, responsibilities, constraints, tags, etc., are shown in additional compartments. We are primarily interested in class attributes.
Relationships between classes are indicated in the diagram by lines of various styles. In this study, we use two types of relationship: generalization and association.
Association is denoted by a line that may be ornamented with various arrowheads and labels at either or both ends. These indicate “has a” relationships between instances of the classes (e.g., Abby (a person) owns Iko (a cat)). Almost all relationships shown on an E-R diagram are comparable to UML associations. However, in the UML, these relationships may be named, and each end of the association may also carry a rolename. Cardinality may be expressed as an integer or a range, where, for example, “2..*” implies that at least two instances of the association are required but an unlimited number may be provided. No cardinality constraint implies exactly one. The association may be directed, shown by a stick arrowhead (→), such that an instance of the class at only one end knows about instances of the class at the other end. Filled and open diamond-arrowheads may be used to indicate tight and loose association (known as composition and aggregation), but are mostly not used here.
Specialization and/or generalization is denoted by a line with an open arrowhead (▿) adjacent to the generalized class. This indicates a relationship at the model level, where the child class bears an “is a type of” relationship to a parent (e.g., a cat is a type of animal). Specialization usually adds attributes and relationships to those inherited from the parent class, but may involve other constraints. As well as inheritance of properties, generalization usually also implies polymorphism, such that instances of the child class are considered to be instances of the parent. Thus, an association with a class implies a potential association with any of its descendants (e.g., if a person owns an animal, this might be a dog, cat, fish, or hamster, etc.). This last feature is particularly important and is used extensively in the model here.
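For readers more familiar with code than with class diagrams, the running example of people and animals can be restated in an object-oriented language. The following is an informal analogy only, with hypothetical class and attribute names; it is not generated from the model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Animal:
    """Generalized class; corresponds to the box at the arrowhead end."""
    name: str


@dataclass
class Cat(Animal):
    """Specialization: inherits `name` and adds an attribute of its own."""
    indoor: bool = True


@dataclass
class Person:
    name: str
    # Directed association with rolename "owns" and cardinality 0..*:
    # a Person knows about its Animals, but not the reverse.
    owns: List[Animal] = field(default_factory=list)


abby = Person("Abby")
abby.owns.append(Cat("Iko"))   # polymorphism: a Cat is accepted as an Animal

for pet in abby.owns:
    print(pet.name, isinstance(pet, Animal), type(pet).__name__)
# Iko True Cat
```

The association is declared against Animal, yet an instance of any descendant class can fill it, which is the feature relied on throughout the model.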
It is important to understand that the diagram is merely a representation of an underlying model. Furthermore, one diagram will usually not show the entire model, but rather just a view of a selection of related classes, perhaps with only certain properties displayed. This is convenient, since it means that unnecessary detail can be suppressed in order to allow a diagram to illustrate particular points. But in order to understand the entire model, it is necessary to combine the information from several diagrams.
Some classes will appear in more than one diagram, describing a different subset of relationships with other classes in each diagram. These provide the joining points between the subsets of the model shown in different diagrams. The complete set of properties of a particular class is the union of properties shown where it appears in the various diagrams, together with other information that may not be shown on any diagram.
Thus, while UML diagrams may be constructed with generic drawing tools (including paper and pencil), professional UML tools maintain an abstract representation of the model, and use that to ensure consistency between different views.
Following the usage prescribed by ISO 19109 and used in GML, class attributes and associations are referred to collectively as properties, with the attribute name or association rolename providing the name of the property. Rolenames are required on the traversable ends of associations. Furthermore, following a lexical rule prescribed in GML 3, classnames are in UpperCamelCase, while attribute and rolenames use lowerCamelCase as far as possible.
This study was initiated as a contribution to the Chronos project. The work has been improved as a result of discussions with Cinzia Cervato, Morishige Ota, Ilene Rex and Charles Roswell, and comments by reviewer Peter Sadler. Cox's contributions were supported by the XMML consortium, CSIRO, and the Predictive Mineral Discovery Cooperative Research Center.