Abstract

For better understanding of geologic and geomorphic processes responsible for the formation of ridges and valleys widely distributed over the Earth’s surface, accurate delineation of the features is very important. This paper introduces an automated method for accuracy assessment of ridge and valley polylines using high-resolution digital elevation models (DEMs) derived from light detection and ranging (LiDAR) data and software implementation of a Python add-in toolbar for Esri’s ArcGIS1. Compared with existing methods that use two input polyline layers for comparison, the new method automatically extracts reference points from the DEM on-the-fly. In addition to producing statistical results of positional accuracy following the National Standard for Spatial Data Accuracy, the method also identifies locations where inaccuracies occur, so that the polylines can be edited and reevaluated. Two data sets from New Mexico and Colorado are also used as samples to demonstrate the application of the geographic information system (GIS) add-in. The results suggest that the free GIS add-in can be widely used by geoscientists.

INTRODUCTION

Ridges and valleys are very common landforms on the surface of the Earth. In addition to the most common ridges that are associated with stream drainage valleys, ridges can form as a result of glacial activity (such as moraines and drumlins [Spagnolo et al., 2010], arêtes and eskers [Shreve, 1985]); eolian processes (such as dune ridges; Ewing and Kocurek, 2010); stratigraphic evolution (such as the Valley and Ridge Appalachians); and volcanic eruption and asteroid strike (such as circular ridges around volcanic craters and impact craters). Similarly, most valleys are caused by water erosion, while some others are caused by glacial erosion (such as glacial valleys) and tectonic activity (such as rift valleys). Accurate delineation of ridges and valleys are essential for a better understanding of geologic and geomorphic processes responsible for the formation of these features. Therefore, developing an automated method for assessing the positional accuracy of ridges and valleys can be useful for studies involving these features. Ideally, such a method should be able to report statistical results of accuracy assessment and to identify locations where inaccuracies occur.

Applications of geospatial technology have produced many topographic features in the past decades, particularly from digital elevation models (DEMs) (e.g., Sagar et al., 2003; Yang et al., 2011; Martha et al., 2012). Although accuracy assessment of DEMs has been well documented (e.g., Muller, 1999; Reusser and Bierman, 2007; Aguilar and Mills, 2008; Höhle and Höhle, 2009), few studies have been carried out for automated assessment of linear topographic features such as ridges and valleys, which can be derived from DEMs or mapped using other methods. Positional accuracy of these features needs to be defined by measuring differences between points on the features (“tested source”) and corresponding points determined with higher accuracy (“reference source”) (Goodchild and Hunter, 1997). It should be noted that the tested source and reference source are different from the ground truth. However, if the accuracy of the reference source is sufficiently high, the difference between it and the ground truth can be ignored (Goodchild and Hunter, 1997). For positional accuracy assessment of linear features, Goodchild and Hunter (1997) proposed a procedure in a geographic information system (GIS) by computing the percentage of the length of the tested source lying within given buffer distances of the reference source. In some cases, a reference source is not as readily available as linear features in GIS but could be obtained on-the-fly from other data sources such as high-resolution DEMs. With increasing availability of LiDAR data, it becomes feasible for positional accuracy assessment of ridges and valleys using LiDAR-derived DEMs.

This paper introduces an automated method for accuracy assessment of ridge and valley polylines using high-resolution DEMs derived from LiDAR data, and software implementation of a Python add-in toolbar for Esri’s ArcGIS [see footnote 1]. Following the method and software implementation section, two data sets from the White Sands Dune Field (WSDF), New Mexico, and Green Mountain near Boulder, Colorado, are used to showcase the application of the GIS add-in. The proposed method is based on two assumptions: (1) The accuracy of LiDAR-derived DEMs is sufficiently high compared with DEMs created from other data sources, as reported in many studies (Schumann et al., 2008; Sexton et al., 2009; Tarolli, 2014); and (2) random points on the tested source can be used for comparison between ridge and valley locations on the polylines and ridge and valley locations on the DEM, in lieu of predefined points on the polylines and matching ground points, which can be difficult or expensive to obtain. The tested source and reference source usually should be from independent data sources. However, if ridge and valley features are manually or digitally extracted from the DEM used as reference source, the proposed method should also be applicable to the ridge and/or valley features because manual digitizing and geoprocessing (such as thinning and smoothing) may induce positional errors.

METHOD AND SOFTWARE IMPLEMENTATION

Method

Given a ridge or valley polyline RS (Fig. 1), a series of random points can be generated on the polyline through point construction in ArcPy, Esri’s ArcGIS Python module. If the length of RS is 1, the distance between R and a random point on RS can be represented by a number between 0 and 1. The random point is close to R if the number is close to 0, while the point is close to S if the number is close to 1. Therefore, random points on a polyline can be created by generating a series of random numbers between 0 and 1.

For any random point p(x, y) on the polyline (Fig. 1), if a straight line p1p2 can be constructed so that p1p2 is normal to curve RS at point p, the straight line p1p2 can be used as a profile line to extract elevations from the input DEM. To construct the straight line p1p2, two points pb(xb, yb) and pa(xa, ya) can be created immediately before and after point p(x, y) with a curve length of 2ε using positionAlongLine() method of a geometry object. If the ε is small enough (for example ε = 10–10), curve pbpa can be considered a straight line pbpa, and p0(x0, y0), the midpoint of the profile line p1p2, can be considered the same point as p(x, y). Using p1p2 as the profile line, elevations can be extracted from the DEM (Fig. 2). Once a profile is created, the position (d) of the maximum (for ridges) or minimum (for valleys) value of the profile can be compared with the polyline position d0 to calculate the residual |dd0| (d0 is the buffer size set by the user). The above process is repeated for each random point on each polyline. Finally, the mean absolute error (MAE) and root-mean-square error (RMSE) are calculated as statistical measures for accuracy assessment. Comparisons and discussions of MAE and RMSE can be found in Chai and Draxler (2014). The MAE and RMSE are defined as:
graphic
graphic
The above definition of RMSE does not separate RMSEs in x and y directions (RMSEx and RMSEy) and can be considered circular RMSE, or RMSEr. Since (did0)2 = (xix0)2 + (yiy0)2, if RMSEx = RMSEy, RMSEx and RMSEy can be calculated from RMSEr:
graphic
graphic
Based on the United States National Map Accuracy Standards (NMAS) (U.S. Bureau of the Budget, 1947) and the American Society for Photogrammetry and Remote Sensing (ASPRS) Accuracy Standards for Large-Scale Maps (ASPRS Specifications and Standards Committee, 1990), the Federal Geographic Data Committee (FGDC) developed the National Standard for Spatial Data Accuracy (NSSDA) in 1998. In NSSDA, positional accuracy is estimated using root-mean-square error (RMSE) and is reported in ground distances at the 95% confidence level, which means that 95% of the positions in the data set will have an error that is equal to or smaller than the reported accuracy value. Assuming systematic errors have been eliminated as well as possible and error is normally distributed and independent, the NSSDA accuracy, Accuracyr, can be calculated using the following formula (Federal Geographic Data Committee, 1998):
graphic

Software Implementation

Following the method described above, a Python add-in toolbar named “Ridge/Valley Accuracy” is developed for ArcGIS 10.4.1 using the Python programming language and ArcPy (Fig. 3). The add-in needs licenses for ArcGIS Spatial Analyst and 3D Analyst Extensions. The light-weight add-in is freely available (see footnote 1), and can be easily installed by ArcGIS users.

The pseudocode and the items on the toolbar are explained below.

Pseudocode:

  • Set the following parameters: workspace, DEM raster, ridge/valley option, polyline feature class (PFC) for ridges or valleys, buffer size (d0), and minimum polyline length (MPL) for generating random points. Set ε = 10–10.

  • Check out Spatial Analyst and 3D Analyst Extensions.

  • Create a new polyline feature class profLines for profile lines.

  • Create a new point feature class Samples for random points, and add a field (double data type) named “RESIDUAL.”

For each feature f in PFC:

If (length of f) ≥ MPL:

  • Calculate number of random points for f: N = int(length of f) / int(MPL).

  • Generate a list rdmList for N random numbers between 0 and 1.

  • For each random number rdm in rdmList:

    • Generate a random point p(x, y) on feature f using rdm.

    • Generate two points pb(xb, yb) and pa(xa, ya) immediately before and after p using ε.

  • Calculate dx = xa – xb, dy = ya – yb.

  • Calculate the slope of the line from pb to pa.

  • Generate profile line segment p1p2 within the DEM extent, and save the line segment to profLines (p1p2 is perpendicular to pbpa and has a length of 2d0).

  • Extract DEM values using p1p2, and save the results to a temporary table tmpTable.

  • If option = = “Ridge”:

    • Find the maximum elevation Zmax and the corresponding distance d in tmpTable.

  • If option = = “Valley”:

    • Find the minimum elevation Zmin and the corresponding distance d in tmpTable.

  • Insert point p(x, y) to Samples.

  • Calculate residual |dd0| and save it to field “RESIDUAL” in Samples.

  • Calculate root-mean-square error (RMSE) and mean absolute error (MAE).

  • Clear variables in memory.

Description of items on the toolbar:

  • (1) Workspace: an empty folder for output files; (2) DEM: input raster for digital elevation model; (3) Ridge/Valley: option for ridge or valley; (4) Feature Layer: polyline feature class for ridges or valleys; (5) Buffer: distance from the polyline features (buffer size is half of the profile length); (6) MPL: minimum polyline length for random sampling points; if the MPL is 30 m, no random points will be generated on polylines shorter than 30 m; otherwise, the number of random points is calculated using integer division between polyline length and MPL. For example, a polyline of 100 m long has 100/30 = 3 random points. (7) OK: after clicking OK, a point shapefile for random sampling points and a polyline shapefile for profile segments will be created in the workspace folder. The point shapefile has a field “RESIDUAL” showing the value of |dd0|. In addition, the total processing time, the root-mean-square error (RMSE) and the mean absolute error (MAE) will be displayed in a message box. Note that the profiles in Figure 2 are for illustration purposes only. In actual implementation, the profiles are handled in computer memory rather than being displayed on screen.

APPLICATION DEMONSTRATION

The application of the Ridge/Valley Accuracy Python add-in toolbar for ArcGIS is demonstrated using the following two data sets: (1) Dune crestline polylines and a LiDAR-derived 1-m resolution DEM from a study area of 401 m by 802 m in the White Sands Dune Field (WSDF), New Mexico, USA (Fig. 4). (2) Valley polylines and a LiDAR-derived 1-m resolution DEM from a study area of 1 km by 1 km in Green Mountain near Boulder, Colorado, USA (Fig. 5) (Anderson et al., 2012). For the WSDF data set (“ridge dataset”), the buffer value is 10 m, the MPL value is 40 m, and 53 random points are generated. For the Green Mountain data set (“valley dataset”), the buffer value is 10.16 m, the MPL value is 30 m, and 99 random points are generated. It takes ∼5 seconds and 9 seconds to obtain results from the ridge data set and the valley data set using a laptop computer with 2.90 GHz dual-core processor and 8.0 GB RAM. The residual values, MAE, RMSE, and Accuracyr are listed in Tables 1 and 2. Since a shapefile for random sampling points is automatically saved for each data set along with residual values at each point, the add-in not only produces the accuracy values, but also shows where inaccuracies occur so that polylines can be manually edited to improve data accuracy. For example, the red points in Figure 5 show locations where the residual is equal to or greater than 10 m. These locations are listed in Table 2 with residual values close or equal to 10.16 m (the buffer value). The valley polylines at these locations can be manually edited, and a new accuracy assessment report can be produced.

CONCLUSIONS

This paper introduced an automated method for accuracy assessment of ridge and valley features using high-resolution DEMs such as those derived from airborne LiDAR data. The development of a Python add-in toolbar for automated accuracy assessment of ridge and valley features in ArcGIS was also presented. Compared with some existing methods for positional accuracy assessment of linear features, the new method has two major features: (1) As reference points are obtained from a high-resolution DEM on-the-fly, a reference source does not need to be readily available as a linear feature layer in GIS; and (2) the new method not only produces statistical results of positional accuracy following the National Standard for Spatial Data Accuracy but also identifies locations where inaccuracies occur. The second feature allows users to run accuracy assessment, identify and correct the errors, and rerun accuracy assessment until satisfactory accuracy results are produced. The initial experiments of the Python add-in using LiDAR-derived DEMs from two study areas show that it takes only seconds to complete the whole process described in the pseudocode for ∼100 random points.

Some limitations of the method are discussed here. Firstly, the scale and resolution of the tested source and reference should be taken into account. For example, comparing regional-scale (small-scale) ridge and valley features with high-resolution (less than 5 m) DEMs may not be suitable because minor changes in the tested source (such as polyline generalization) may create large positional errors when compared with high-resolution DEMs. Secondly, even for large-scale valley polylines, the method works better for V-shaped valleys than U-shaped valleys, because U-shaped valleys may have multiple local minima which make automated assessment difficult. Further study is needed to address the second limitation.

ACKNOWLEDGMENTS

The White Sands LiDAR data acquisition and processing was completed by the National Center for Airborne Laser Mapping (NCALM: http://www.ncalm.org). The Green Mountain LiDAR data acquisition and processing were completed by the National Center for Airborne Laser Mapping (NCALM: http://www.ncalm.org), with project director Qinghua Guo. Collection for Dr. Suzanne Anderson (University of Colorado/INSTAAR, Boulder) as part of the Boulder Creek Critical Zone Observatory (CZO: http://czo.colorado.edu/), funded by National Science Foundation (NSF-0724960). NCALM funding was provided by NSF’s Division of Earth Sciences, Instrumentation and Facilities Program, EAR-1043051. LiDAR data used in this study are based on data services provided by the OpenTopography Facility with support from the National Science Foundation under NSF Award Numbers 1226353 and 1225810. This project was supported by the National Natural Science Foundation of China (No. 41371434) and the State Key Laboratory of Remote Sensing Science, Chinese Academy of Sciences (Project OFSLRSS201607).

1Supplemental Materials. Python add-in for ArcGIS 10.4.1(and later versions), test data zip file, and user’s guide. Please visit http://doi.org/10.1130/GES01477.S1 or the full-text article on www.gsapubs.org to view the Supplemental Materials.
5 figures; 2 tables; 1 supplemental item.

Supplementary data