Abstract

Complications exist when describing the dimensionality of geoscientific data sets. One difficulty is that there are a number of different, valid ways to consider dimensionality. Unlike traditional methods of field data capture, modern digital methods typically record the position of every sample point relative to a three-dimensional (3D) coordinate system, even for simple measurement strategies such as 1D line sampling. Critically, the best way to describe the dimensionality of a data set will depend on the context in which the data are presented. Terms such as “2½D” are generally inappropriate for nonspecialist audiences. Because ambiguity and inconsistency are already widespread, it is usually advisable to explain clearly the nature of each data set, the method used to capture the data, and particularly whether data acquisition was restricted to the outcrop surface or includes sampling of the subsurface.

INTRODUCTION

We live in a three-dimensional (3D) world, and there is common agreement that methods of geospatial data acquisition and analysis aim to investigate the geological world in three dimensions. However, at a recent Penrose Conference (“Unlocking 3D Earth Systems—Harnessing New Digital Technologies to Revolutionize Multi-Scale Geologic Models”; Durham, 17–21 September 2006), vigorous discussion developed concerning the terminology needed to describe different kinds of geospatial data. Although the debate began over the validity of one specific term (“2½D”), plenary discussion revealed a surprising lack of consensus regarding the fundamental semantics of dimensionality in relation to geospatial data. This note is a summary of key points raised during the discussion sessions at the conference, together with recommendations that are aimed toward a standard nomenclature for the description of dimensionality of spatial data in the geosciences.

DIMENSIONALITY—DEFINITIONS AND COMMON USAGE

One of the main contributing factors underlying current terminological ambiguity is that although there are a number of precise mathematical definitions of dimensionality, common usage in the physical sciences is generally more varied and less exact.

Dimension and Spatial Position

Within a traditional mathematical framework based on Euclidean geometry (e.g., Abbott, 1884), a single point has a topological dimension of zero. Similarly, a line or curve connecting two points is one-dimensional, a plane or surface is two-dimensional, and a volume is three-dimensional. From this perspective, the dimensionality of an object is independent of whether the object's location in space is known. For example, a point may have X, Y, and Z attributes with known values that define its position relative to a given coordinate system, but this does not alter the point's Euclidean dimension of zero. So from a mathematical standpoint, dimension can be defined in relation to topology, and dimensionality does not directly equate to spatial position. However, in common scientific usage, this distinction is less clear cut, and dimensionality is often taken to be synonymous with the number of parameters (or coordinate axes) needed to describe the position of any point within a given space.

Coordinate Reference Frames

While dimensionality is an intrinsic topological property of a geometric object, the position of the object is defined in relation to an extrinsic spatial reference frame (or coordinate system). Reference frames used to describe geospatial data are typically based on spherical or geoidal coordinate systems (e.g., with locations given as values of latitude, longitude, and elevation), although Cartesian (e.g., X, Y, Z values relative to three orthogonal axes) and cylindrical coordinate systems (e.g., azimuth, radial distance, and height) are also widely used for small areas or volumes. Because the choice of coordinate system is arbitrary, a position within one coordinate system can be mapped (i.e., transformed) to a different coordinate system, without any change in dimension.
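
The mapping between reference frames can be illustrated with a minimal sketch. It assumes a perfectly spherical Earth (real geodetic work would use an ellipsoid or geoid model), and the function names and sample coordinates are hypothetical, chosen only for illustration. The point is described by three numbers before and after the transformation, and it remains a zero-dimensional point throughout.

```python
import math

def geographic_to_cartesian(lat_deg, lon_deg, radius):
    """Map (latitude, longitude, radius) on an assumed sphere to Cartesian X, Y, Z."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return (x, y, z)

def cartesian_to_geographic(x, y, z):
    """Inverse mapping: Cartesian X, Y, Z back to (latitude, longitude, radius)."""
    radius = math.sqrt(x * x + y * y + z * z)
    lat_deg = math.degrees(math.asin(z / radius))
    lon_deg = math.degrees(math.atan2(y, x))
    return (lat_deg, lon_deg, radius)

# Illustrative values only: an approximate position and a nominal Earth radius in meters.
point_geographic = (54.77, -1.58, 6371000.0)
point_cartesian = geographic_to_cartesian(*point_geographic)
round_trip = cartesian_to_geographic(*point_cartesian)
```

Both representations use three coordinates, and the round trip recovers the original values to within floating-point error.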

Dimensionality of Multiple Data Points

In many branches of science, it is common to describe the dimensionality of a data set as a whole, not just the dimension of each individual measurement within the set. For example, a data set is described as “1D” when the data collectively represent a line, even if each constituent measurement in the set is a zero-dimensional data point. Thus in general usage, borehole data are usually considered to be 1D, cross sections and maps represent 2D data, and 3D seismic data are clearly classed as 3D. Many traditional methods of gathering field data are also consistent with this terminology.

Traditional and Geospatial Sampling Strategies

Prior to the development of digital mapping and surveying methodologies, most geological field data were not precisely spatially located (McCaffrey et al., 2005). For example, a traditional approach to line sampling (and section logging) would record the position of individual measurements in terms of distance along the sample line (i.e., a one-dimensional position recorded as length along a tape measure). Similarly, box-counting methods and planetable mapping would sample a surface (two-dimensional) and record position relative to only two axes (e.g., X and Y, but not Z). These data sets, although internally accurate in relation to their own reference axes, were generally not precisely located with respect to an external 3D coordinate system. Hence with traditional methods such as these, the dimensionality of the resultant data sets corresponds directly with the number of axes used to record the spatial position of each measurement. In contrast, a modern approach to line sampling or surface mapping, using digital methods of geospatial data acquisition, will typically record the full XYZ position of every sample point (Fig. 1). Nevertheless, a data set collected by sampling along a line using a differential global positioning system (dGPS) can still be considered as one-dimensional, even if there is no longer a direct match between the dimensionality of the bulk data set and the number of coordinate axes used to record the spatial position of each point along the sample line. Similarly, modern drilling methods in the hydrocarbon industry allow the spatial position of the drill bit to be recorded, so the 3D geometry of the entire 1D borehole is typically known in detail. This emphasizes that in common usage the overall dimensionality of a data set generally reflects the dimensionality of the space that is sampled by the data set as a whole, irrespective of both the dimensionality and spatial location of individual measurements in the set.
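
The relationship between the two approaches to line sampling can be sketched as follows. The example assumes that the sample points are stored in order along the line, with coordinates in meters; the function name and the point values are hypothetical. Each point carries a full XYZ position, yet the whole set can still be reduced to a single parameter, distance along the line, which is what a tape measure would have recorded.

```python
import math

def distance_along_line(points):
    """Return the cumulative 3D distance from the first point to each ordered XYZ point."""
    chainage = [0.0]
    for previous, current in zip(points, points[1:]):
        chainage.append(chainage[-1] + math.dist(previous, current))
    return chainage

# Hypothetical dGPS positions (X, Y, Z in meters) recorded along one sample line.
line_sample = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
chainage = distance_along_line(line_sample)
```

Here three coordinate axes are used to store each position, but one number per point is enough to order and locate the measurements within the sampled line.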

Data Sampling and Sample Density

The dimensionality of a data set is also independent of whether the density of data sampling is high or low. Many types of spatial data (including new digital methods of data capture such as airborne and terrestrial laser scanning) consist of sets of many individual points that collectively sample a given region of space. The proportion of a region that is sampled by a data set (i.e., the sample density) can often be controlled by the geologist at the time of data capture, albeit within the inherent limits of the method and equipment used. For example, with terrestrial laser scanning (TLS), choosing a fine angular resolution will result in closely spaced points that give a dense sampling of the scanned surface; a coarser resolution will make scanning quicker, but will not sample the surface as densely. In either case, however, data capture will be limited to surface measurements of the region scanned, since TLS is a nonpenetrative method. By comparison, penetrative volumetric methods such as reflection seismics and ground-penetrating radar (GPR) are able to sample larger proportions of a given space; i.e., they have the potential to capture geological data sets of higher dimensionality. TLS, 3D seismic surveys, and GPR can all produce rich data sets recorded in a 3D reference frame; however, the penetrative methods can capture fully 3D geological architectures, while TLS can only capture the topographic surface of the outcrop.
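
The effect of angular resolution on sample density can be approximated with a short calculation. This is a simplified sketch that assumes the scanned surface is perpendicular to the laser beam and uses the arc-length relation (spacing = range × angular step in radians); the function name and the numerical values are illustrative and are not taken from any particular instrument.

```python
import math

def point_spacing(range_m, angular_step_deg):
    """Approximate spacing between adjacent scan points on a surface facing the scanner."""
    return range_m * math.radians(angular_step_deg)

# Hypothetical scan settings at a range of 50 m.
fine_spacing = point_spacing(50.0, 0.01)    # fine angular resolution
coarse_spacing = point_spacing(50.0, 0.1)   # coarse angular resolution

# Approximate number of points per square meter for each setting.
fine_density = 1.0 / fine_spacing ** 2
coarse_density = 1.0 / coarse_spacing ** 2
```

A tenfold change in angular step changes the point density by a factor of about one hundred, but in both cases the points still sample only the scanned surface.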

Connectivity of Data Points

Data sampling methods often have implicit ways to connect individual points in a data set to recreate the space sampled by the data. For example, it is straightforward to recreate a line sample simply by sequentially connecting the points in the data set; i.e., in order of increasing distance along the line (Figs. 2A, 2B). Similarly, a raw stream of laser-scan points can be restored to create a mesh of the scanned surface (Figs. 2C, 2D). The way in which individual points in a data set are connected to each other reflects the topological dimensionality of the space sampled by that data set. In a one-dimensional data set, data points can only be connected sequentially; a point can be connected to no more than two other points (hence branches are precluded), and there should be no closed loops or line segments that cross. In a two-dimensional data set, points can be connected to other points to form patches, although no more than two patches can share a common edge, and patches cannot form closed surfaces. In a three-dimensional data set, more than two patches can share a common edge, and patches interconnect to form a lattice that encloses a volume (Figs. 2E, 2F).
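
Some of these connectivity rules can be expressed as simple checks. The sketch below is an assumption-laden illustration rather than a complete classifier: it represents points as integer identifiers, lines as pairs of identifiers, and patches as triangles, and it does not test for crossing line segments or for closed surfaces. The function names are hypothetical.

```python
from collections import Counter

def is_one_dimensional(edges):
    """True if the edges form simple chains: no branches and no closed loops."""
    degree = Counter()
    parent = {}

    def find(vertex):
        # Union-find lookup used to detect edges that would close a loop.
        parent.setdefault(vertex, vertex)
        while parent[vertex] != vertex:
            parent[vertex] = parent[parent[vertex]]
            vertex = parent[vertex]
        return vertex

    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # both ends already connected: a closed loop
        parent[root_a] = root_b
    # A point connected to more than two others would be a branch.
    return all(count <= 2 for count in degree.values())

def max_patches_per_edge(triangles):
    """Largest number of triangular patches that share any single edge."""
    counts = Counter()
    for a, b, c in triangles:
        for edge in ((a, b), (b, c), (c, a)):
            counts[tuple(sorted(edge))] += 1
    return max(counts.values())

chain = [(0, 1), (1, 2), (2, 3)]
branch = [(0, 1), (1, 2), (1, 3)]
surface = [(0, 1, 2), (1, 3, 2)]
lattice = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]
```

Under these assumptions, a value of at most two from `max_patches_per_edge` is consistent with a surface, whereas a larger value indicates patches meeting as they would in a volumetric lattice.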

Fractional Dimensions

Whereas a traditional mathematical approach based on Euclidean n-space restricts dimensions to positive integers, Mandelbrot (1982) described a framework in which fractals provided a link between dimensionality and the proportion of space occupied by a shape or object. In this way, while a plane is still regarded as 2D, a more irregular surface (traditional topological dimension also 2D) covers more of 3D space than the plane, but does not completely fill a 3D volume: therefore its fractal dimension is somewhere between 2 and 3. Typical topographic surfaces on Earth have a fractal dimension of ∼2.2–2.7 (Mark and Aronson, 1984). Hence in the context of geospatial data, the concept of fractional dimensionality can be useful, because it helps to highlight the mismatch between the topological dimension of a natural surface, and the number of coordinate axes needed to record the position of any point on the surface. That is, although the topological dimension of the surface of an outcrop is two, we will need three axes to be able to describe fully the geometry of the surface (irrespective of the position and orientation of our chosen coordinate reference frame).
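
One common way to estimate a fractal dimension from point data is box counting: the space is divided into boxes of decreasing size, and the dimension is taken as the slope of log(number of occupied boxes) against log(1/box size). The sketch below illustrates that idea only; it is not necessarily the estimator used in the studies cited above, and the function name, synthetic data, and box sizes are assumptions made for illustration.

```python
import math

def box_counting_dimension(points, box_sizes):
    """Estimate dimension as the least-squares slope of log N(s) versus log(1/s)."""
    log_inverse_size, log_count = [], []
    for size in box_sizes:
        # Identify each occupied box by the integer index of the box along every axis.
        occupied = {tuple(math.floor(c / size) for c in point) for point in points}
        log_inverse_size.append(math.log(1.0 / size))
        log_count.append(math.log(len(occupied)))
    n = len(box_sizes)
    mean_x = sum(log_inverse_size) / n
    mean_y = sum(log_count) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(log_inverse_size, log_count))
    variance = sum((x - mean_x) ** 2 for x in log_inverse_size)
    return covariance / variance

# Synthetic test data inside a unit cube: a flat plane and a straight line.
n = 64
plane = [((i + 0.5) / n, (j + 0.5) / n, 0.5) for i in range(n) for j in range(n)]
line = [((i + 0.5) / n, 0.5, 0.5) for i in range(n)]
sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16]

plane_dimension = box_counting_dimension(plane, sizes)
line_dimension = box_counting_dimension(line, sizes)
```

For these idealized inputs the estimates are close to 2 for the plane and 1 for the line; a rough natural surface sampled densely enough would be expected to give a value between 2 and 3.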

OTHER COMMON USAGE

“2½D” to Describe Irregular Surface Data

The recognition, based on the concept of fractal geometry, that there are different ways to consider dimensionality has prompted use of the term “2½D” (or “2.5D,” “2.nD,” “2.cD,” or “2.xD”) to describe irregular topographic and geological surfaces, such as the uneven surface of an outcrop. In particular, use of the term in this sense helps to emphasize that an irregular surface will occupy a greater proportion of a space than a 2D plane, but that surface data are inherently more limited than 3D volumetric data that sample inside the outcrop.

“2½D” to Describe Digital Elevation Models and Surface Drapes

Most types of digital elevation model (DEM) consist of a regularly spaced grid of elevation data. Such data sets are usually stored as an ordered stream of height values, with space in the data structure to store only a single elevation at any XY position (e.g., Bonham-Carter, 1994). This has led some users to describe DEM data sets as “2½D,” and DEM height values are used to extend a 2D aerial image (photo or satellite data) along the Z axis to give a “2½D drape.” A key factor used to explain the use of “2½D” in this context is that common DEM data structures are often not able to accommodate multiple Z values. This is also a traditional limitation with some geographic information system (GIS) programs, many of which were not originally designed to be able to handle data sets with multiple Z values. This is not a serious limitation for most topographic data sets, since so little of the Earth's surface (presumably <<0.001%) is overhanging, but is clearly a major drawback if the full 3D geometry of folded or thrusted geological layers is to be represented.
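
The single-elevation limitation can be demonstrated with a minimal sketch of a gridded data structure. It assumes a simple dictionary keyed by grid cell and assumes that, where several points fall in one cell, the highest elevation is kept (as an aerial view would record); the function name, cell size, and point values are hypothetical.

```python
def points_to_dem(points, cell_size):
    """Bin XYZ points into a grid that stores one elevation (the highest) per XY cell."""
    dem = {}
    for x, y, z in points:
        cell = (int(x // cell_size), int(y // cell_size))
        dem[cell] = max(z, dem.get(cell, z))
    return dem

# Hypothetical points across an overhang: three elevations at one XY position,
# plus one ordinary ground point in the neighboring cell.
overhang_points = [
    (0.2, 0.3, 10.0),  # top of the overhanging ledge
    (0.2, 0.3, 6.0),   # underside of the ledge
    (0.2, 0.3, 2.0),   # ground beneath the ledge
    (1.4, 0.3, 2.5),   # open ground
]
dem = points_to_dem(overhang_points, cell_size=1.0)
```

Four surveyed points reduce to two stored elevations, and the underside of the ledge and the ground beneath it are lost, which is the limitation described above for folded or thrusted layers.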

Usage of “2½D” in this context is unrelated to the topological or fractal dimension of the surface, or to the mismatch between either of these measures of dimensionality and the fact that the position of every point in the data set is known in 3D. Rather, it relates to a limitation of some aerial acquisition methods and/or the data structure of the formats commonly used to store aerially captured data sets.

Computer Visualization

Within the realm of computer graphics, 3D visualization refers to the capability of hardware and software to render a scene that is composed of objects constructed from graphics primitives (points, lines, planes), all with known XYZ positions (Foley et al., 1990). Each frame that is rendered is a snapshot of the 3D scene, projected onto the 2D viewing plane (e.g., the computer screen or head-mounted display), so the appearance of each object depends on its location relative to the viewing position. Hence in computer visualization, “3D” refers to the general nature of the scene (the “world”) and the coordinate system used to store, manipulate, and render objects within it. In this context, all types of geospatial data can be considered as 3D; the same visualization hardware and low-level 3D graphics software libraries are used for the display of light detection and ranging (lidar) point cloud data, object-based geomodels, and volumetric 3D seismic data alike.

“2½D” in Computer Aided Design and Computer Games

Used in the context of computer graphics, “2.5D” is unrelated to mathematical definitions of dimension, and is used to denote reduced graphics capability relative to fully 3D systems. In computer aided design (CAD) software, a 2.5D surface is one in which coordinate points can be extruded along an axis perpendicular to the main 2D viewing plane (directly analogous to a “2½D” GIS drape as described above). Since no “overhangs” are allowed with a 2.5D extrusion, the resultant objects are easier (and generally cheaper) to transform from a CAD model to a machined physical prototype, though they are limited to less complex forms.

In computer gaming, “2.5D” is a catchall term to describe a number of different programming techniques that use 2D graphics algorithms to mimic 3D appearance. These range from the depiction of 2D images at different depths in a scene, to zooming a 2D image of an object so it appears to be moving closer, to restricting the view position so that it is always in front of (not above) the 2D facades that compose the scene, among many others. All methods are aimed at increasing realism by using fast 2D rendering methods to produce pseudo-3D visualization.

CONCLUSIONS AND RECOMMENDATIONS

Differences exist regarding the description of the dimensionality of geospatial data sets. This has arisen mainly because common scientific usage of dimensionality differs from a precise mathematical definition of dimension based on topology. Mathematically, dimension and spatial position are different concepts; a point is zero-dimensional, although we use a 3D geospatial reference frame to define the position of the point in 3D space. For irregular geological and topographical surfaces, there is generally a discrepancy between topological dimension (= 2), and the number of coordinate axes needed to fully describe the geometric complexity of the surface (= 3). The concept of fractal dimension can help to reconcile this discrepancy by defining dimension in terms of the proportion of 3D space that an irregular surface occupies.

Many modern methods used to capture geospatial data, including terrestrial laser scanning and dGPS area sampling, record XYZ position for every measured point on a topographic surface. In describing the resultant data sets as 3D, we are choosing to emphasize that all points in the data set have known spatial positions within a 3D coordinate system, and that the full geometric detail of the data can only be described if we use all three coordinate axes. Alternatively, if we describe the same data set as 2½D (or “2.cD,” etc.), we are choosing to emphasize that the data are sampling an irregular surface, and that the data set is not as dimensionally rich as full 3D volumetric data acquired using a penetrative method. Furthermore, if the same data set is presented in map form (i.e., projected onto a 2D plane), we would generally describe it as a 2D representation, even if the underlying data set has higher dimensionality. In short, the dimensionality of geospatial data depends on the terminology we choose.

Since there are a large number of mathematical definitions that allow dimensionality to be considered in precise, though different, ways (e.g., Lebesgue, Hausdorff, Fourier, and Krull dimensions), and since common scientific usage is variable and often inconsistent, it is unrealistic to expect that a single definition of dimensionality will be adequate to describe geospatial data in every situation. Consequently, the most appropriate description of a data set will inevitably depend on the context in which the data set is presented, and upon the expectation and understanding of the target audience (even though this situation does nothing to reduce terminological inconsistency, and can mean that the same data set may be assigned a different dimensionality for different uses and/or users).

Based on the above considerations, our recommendations for future usage are as follows, with examples of recommended use given in Table 1.

  • For many situations involving the practical application of geospatial data, it can often be counterproductive to discuss dimensionality in terms of topology (or other areas of mathematics), except for specific purposes within a specialist group. To most people who are customers (or end users), rather than providers of geospatial data, a precise definition that distinguishes between different mathematical representations of dimensionality is of academic interest only.

  • When a geospatial data set is presented in a way that disregards precise 3D spatial location data, it is usually most appropriate to approximate the dimensionality to match the way the data set is represented: i.e., follow the current common usage of describing line samples and borehole depth logs as 1D, and maps and sections as 2D. Where necessary, additional clarification can be given to emphasize that the position of each constituent data point is known within a 3D coordinate system.

  • Our preference is for geospatial data sets that capture the geometry of irregular surfaces (including, e.g., ground-based and airborne lidar data) to be described as 3D, rather than unqualified use of “2½D” or similar. This suggestion is partly because three coordinate axes (i.e., dimensions, sensu lato) are needed to record the geometry of such data, and partly because the actual fractal dimension of the surface is rarely measured, but mostly for the pragmatic reason that in our experience, the term “2½D” is unintuitive and confusing to a majority of end users, and usually provokes an unfavorable response. However, the distinction between surface and volumetric data sets is crucially important to most end users, and we therefore have a strong preference that these data types are distinguished through use of terms such as “surface 3D dataset” and “volume 3D dataset.” We also recommend that wherever possible, additional clarification is given that highlights the nature of the data set in this regard. For example: “This is a surface 3D dataset captured by terrestrial laser scanning, in which the precise geometry of the outcrop is measured in detail by recording the 3D position of many millions of points across the surface of the exposure. This method is nonpenetrative, and therefore the raw dataset contains no data from within the outcrop….”

  • Similarly, the use of “2½D” to describe topographic surface drapes is unnecessarily confusing for many nonspecialist end users, and 3D is a preferable term. In rare cases where there is overhanging topography that will not have been captured by aerial imagery or DEM data, this can be explained in more detail.

  • Volumetric data sets should clearly also be described as 3D, but can be distinguished from surface data sets by emphasizing that the method used for data acquisition is penetrative, and so is able to sample the subsurface.

  • We suggest that the use of fractional dimensions to describe geospatial data sets is more appropriate for specialist research environments, particularly where fundamental aspects of dimensionality are important, including studies that measure the actual fractal geometry of topographic and geological surfaces, and those looking at, for example, scaling relationships.

  • Within a specialized environment in which a more in-depth discussion of dimensionality may be appropriate, care is needed to explain conceptually what you mean by dimensionality. For example, inconsistency and misunderstanding can be reduced by explaining how the surface of an irregular outcrop can have a topological dimension of 2, a fractal dimension between 2 and 3, and yet be measured relative to a 3D coordinate system.

ACKNOWLEDGMENTS

We thank Dogan Seber and Randy Keller for useful reviews and editorial assistance. Dogan Seber suggested the terms “surface 3D dataset” and “volume 3D dataset,” which we have adopted. Thanks to participants at the Penrose Conference in Durham for lively and informative debate on the dimensionality of geospatial data.