Abstract

With the advent of digital elevation models (DEMs) and geographic information systems (GIS), several methods have been proposed to extract channels from raster DEMs. Light detection and ranging (lidar) can produce high-resolution DEMs and poses new challenges to existing methods for channel extraction. This paper introduces a semi-automated method for extracting stream channels and channel profiles from high-resolution DEMs using image processing techniques. Based on user-specified approximate locations of start and end points and a few simple parameters, the method implements five automated steps: (1) channel detection using a local minimum value search; (2) channel delineation using Bresenham’s line algorithm and mathematical morphological operation; (3) vectorization; (4) profile generation; and (5) accuracy assessment. The method is implemented as an ArcGIS Python add-in toolbar named Channel Extraction. The application of the toolbar is demonstrated using a lidar-derived DEM in a study area along the San Andreas fault in California, USA. The software and test data are freely available for download (see Supplemental Files1). The demonstrated samples suggest that this new semi-automated method for extracting channels and channel profiles is flexible and user-friendly and can produce accurate results to support geomorphic studies.

INTRODUCTION

Stream channels are critical components in hydrology, fluvial geomorphology, and tectonic geomorphology, and have long been studied qualitatively and quantitatively (Horton, 1932, 1945; Hack, 1957, 1973; Broscoe, 1959; Strahler, 1964; Shreve, 1966; Sieh and Jahns, 1984; Montgomery and Buffington, 1997; Ouchi, 2005; Ambili and Narayana, 2014; Luirei et al., 2015; Mudd et al., 2018; Yadav and Hatfield, 2018). Length, drainage area, slope, and cross section are some of the important measurements for stream channels. These measurements reflect the dynamics between tectonics, climate, and surface processes, and in early studies, they were generally made on topographic maps, aerial photographs, or in the field (Schumm, 1956; Hack, 1957; Morisawa, 1957; Broscoe, 1959). Such measurements were time-consuming and subject to considerable inaccuracy due to many local irregularities (Hack, 1957). With the advent of digital elevation models (DEMs) and geographic information systems (GIS), many methods have been proposed to extract channels from raster DEMs (Montgomery and Dietrich, 1988; Tarboton et al., 1991; Montgomery and Foufoula-Georgiou, 1993; Costacabral and Burges, 1994; Tarboton, 1997; Giannoni et al., 2005; Pelletier, 2013; Bhowmik et al., 2015). Some of the more recent software tools include TecDEM (Shahzad and Gloaguen, 2011), TauDEM (http://hydrology.usu.edu/taudem/taudem5/index.html), and GeoNet (Sangireddy et al., 2015).

A typical workflow of the channel extraction methods is: (1) filling pits in the DEM; (2) calculating flow direction for each cell; (3) calculating flow accumulation for each cell; (4) identifying basin boundaries and outlets; and (5) extracting channels using a flow accumulation threshold or terrain curvature. A major limitation of the above workflow is that different channels usually have different contributing areas, and a single threshold for flow accumulation may not be able to extract desired channels (McNamara et al., 2006). In addition, filling pits in the DEM poses another problem for the traditional workflow for channel extraction because pit filling modifies the original DEM and may induce unexpected changes to important topographic features. Some recent methods are able to extract more accurate stream longitudinal profiles from unfilled DEMs (e.g., Byun and Seong, 2015).

Light detection and ranging (lidar) has provided unprecedented details of Earth’s surface features in the past two decades (Woolard and Colby, 2002; Staley et al., 2006; Arrowsmith and Zielke, 2009; Dong, 2012, 2015; Dong and Chen, 2018). Many studies have been carried out specifically for channel analysis using lidar data (Cavalli et al., 2008; Passalacqua et al., 2010a, 2010b; Hunt and Royall, 2013; Hooshyar et al., 2015). For example, Cavalli et al. (2008) studied the performance of lidar data in the recognition channel-bed morphology; Passalacqua et al. (2010a) developed an automated method for channel network extraction from lidar databased on nonlinear diffusion; Hunt and Royall (2013) analyzed changes in stream channel cross sections in an urban-rural land-use boundary based on lidar data; and Hooshyar et al. (2015) investigated extraction of wet channel networks using lidar intensity and elevation data. In morphotectonic studies, Zielke and Arrowsmith (2012) developed MATLAB-based software tools for back-slipping of offsets along active faults and visual assessment of channel reconstruction. While lidar data are being increasingly used in tectonic geomorphology (Lin et al., 2009, 2013; Dong, 2015; Dong and Chen, 2018), few studies have been conducted for the analysis of channel longitudinal profiles derived from lidar data to support morphotectonic studies. This can be partly attributed to the following reasons: (1) Some lidar data for morphotectonic studies are usually acquired in a limited area along active faults and do not include complete drainage areas or channel heads, prohibiting the use of traditional methods for channel extraction; (2) geologists often need to select one major channel and obtain longitudinal profiles and measurements from the main channel efficiently (for example, Ouchi [2005] measured ten offset channels across the San Andreas fault by walking along each channel and taking measurements for analysis of channel development); yet a flexible and accurate tool is not available in current GIS software packages.

The objective of this study is to develop a semi-automated GIS tool with minimal user input to (1) extract channels from lidar-derived DEMs; (2) extract longitudinal profiles of the channels; and (3) calculate quantitative measures including positional accuracies of the channel and slopes of the longitudinal profile segments. The method is semi-automated because the user needs to manually define the approximate start and end points of the channel to be extracted and specify a few simple parameters. Once the input parameters are specified, the next steps are fully automated, and output files are saved in the user-specified folder. The methods, application demonstrations, discussion, and conclusions are presented in the following sections.

METHODS

Methodology Flow Chart

The proposed method is semi-automated and only needs a few simple inputs from the user: output folder, DEM raster, search window size (w), length of cross profiles, number of cross profiles (optional), and on-screen digitizing of the start and end points near the channel. The methodology flow chart is shown in Figure 1, and several important steps are described in the following sections.

Channel Detection

The basic idea behind our channel detection is to connect the local minimum cells in the DEM starting from a point in the channel down to the lower part of the channel. The start point Pa(i, j) and end point Pz(i, j) are provided by the user through on-screen digitizing near the channel. These two points do not need to be located at the bottom of the channel but should be located as close as possible to the channel. The output of channel detection is a binary raster, with 1s for channel cells and 0s for non-channel cells. The lowest cell Pb(i, j) around the start point is automatically identified in the w × w search window and set a value of 1; then the program will search for a new local minimum Pc(i, j) in a 3 × 3 window around Pb(i, j) and set the cell value to 1. If Pc(i, j) is not found in the 3 × 3 window, the window size will increase to 5 × 5, 7 × 7, …, k × k (k < w). Figure 2A is an example showing the channel detection process. Starting from cell (i0, j6) that has an elevation of 8.17, a local minimum of 8.11 is found at (i1, j5) in the 3 × 3 neighborhood window around (i0, j6). Using (i1, j5) as the new center for the search window, a new local minimum of 8.03 is found at (i2, j4) in the 3 × 3 window around (i1, j5); then the search window is centered at (i2, j4). Note that a local minimum of 8.03 is also found at (i1, j4), but the local minimum of 8.03 at (i2, j4) is the last local minimum found in the 3 × 3 window from top to bottom. Situations such as this will be handled by mathematical morphological operations explained in the next section. At the new center (i2, j4), no values are lower than 8.03 in the 3 × 3, 5 × 5, and 7 × 7 windows around (i2, j4), until the window size changes to 9 × 9, where a new local minimum of 7.89 is found at (i4, j0). Note that cell values around (i2, j4) can be higher than 8.03 (Fig. 2B), particularly in high-resolution DEMs such as lidar-derived DEMs. The black cells in Figure 2 are detected channel cells.

A problem in Figure 2 is that the detected channel cells (black cells) are not all connected because local minimum cells are not found in the 3 × 3, 5 × 5, and 7 × 7 windows around (i2, j4). In situations such as this, Bresenham’s line algorithm (Bresenham, 1965) is employed to fill the gaps. The Bresenham’s line algorithm is one of the earliest algorithms in computer graphics. Originally proposed to draw two-dimensional (2D) lines, the algorithm has been extended to three-dimensional (3D) applications (Liu and Cheng, 2002; Au and Woo, 2011). In 2D cases, the Bresenham’s line algorithm determines which pixels should be used to best approximate a straight line between two points (x0, y0) and (x1, y1): 
graphic

Figure 3 is a 12 × 6 raster and a straight line with a start point at (1, 1) and an end point at (11, 5). For any pixel (xk, yk) on the line between the start point and the end point, Bresenham’s line algorithm determines if the next pixel is (xk+1, yk) or (xk+1, yk+1) based on which pixel is closer to the line at xk+1. More details of the algorithm can be found in Bresenham (1965) and Abrash (1997). The gray cells in Figures 2 and 3 are identified by Bresenham’s line algorithm.

Mathematical Morphological Operations

Mathematical morphology is a theory and technique for processing geometrical structures (Serra, 1982) and has been widely used for image analysis (Haralick et al., 1987; Wang et al., 1995; Dong, 1997). The basic idea in binary mathematical morphology is to process the image using a pre-defined shape (structuring element) and get results based on how the shape fits or misses the shape in the image. Let EN be an N-space, A a binary image in EN, and B a structuring element. The four basic operations in mathematical morphology are defined as (Haralick et al., 1987): 
graphic
 
graphic
 
graphic
 
graphic

As can be seen from Equations (4) and (5), an opening operation is an erosion followed by a dilation, and a closing operation is a dilation followed by an erosion. Figure 4 shows a binary image and four mathematical morphological operations of the image by a 3 × 3 square structuring element. It can be seen in Figure 4 that dilation can bridge the gaps between objects, and erosion will remove objects that are smaller than the structuring element.

The channel detection method described in the Channel Detection section may create unnatural channel cells such as right-angle turns, because it is possible to have several local minimum values in a search window, but only the last local minimum value is selected as a channel cell. To remove such irregularities, a closing operation with a 3 × 3 square structuring element is applied to the channel cells.

Vectorization and Profile Generation

After the mathematical morphological closing operation described in the previous section, the channel may have a width of more than one cell in some sections. It is necessary to reduce the width of the channel to one cell so that the channel cells can be vectorized to produce a polyline feature. The extra channel cells are removed using a thinning method in ArcGISTM (Zhan, 1993) to produce a binary raster (thin.tif) in the output folder. The thinned channel raster is then converted to a polyline shapefile (long.shp) in the output folder. The vectorized channel feature (long.shp) is used to extract a longitudinal profile from the DEM and produce an ExcelTM spreadsheet (long.xls) in the output folder. Optionally, if cross profiles are needed, the cross profiles will be generated automatically using a method developed by Dong (2017). The locations of the cross profiles are saved in a shapefile (cross.shp) in the output folder, along with individual ExcelTM spreadsheets (cross1.xls, cross2.xls, etc.). Figure 5 shows some intermediate outputs for channel detection.

The longitudinal profile spreadsheet (long.xls) has two columns for distance (FIRST_DIST) and elevation (FIRST_Z) values (Fig. 6). To help users better interpret the longitudinal profile, some extra columns are calculated based on the distance and elevation values. It is noticed that profiles generated in ArcGIS do not follow a specific direction (high elevation to low elevation or low elevation to high elevation), even though the polyline direction is from the end point to the start point of the channel. Therefore, reverse distance (REVERSE_D) and reverse elevation (REVERSE_Z) values are calculated and added to the spreadsheet, so that users can create longitudinal profiles using FIRST_DIST and FIRST_Z or REVERSE_D and REVERSE_Z. The longitudinal slope (si) values are calculated for the “TROR” column, which stands for “total rise over run,” to show the variation of the longitudinal profile slopes: 
graphic
where zi is the elevation (FIRST_Z) at location i, z0 is the first elevation value in the “FIRST_Z” column, and di is the distance from the first “FIRST_DIST” value to the current “FIRST_DIST” value. Similarly, “REVERSE_S” values are calculated using values from the “REVERSE_Z” column and the “REVERSE_D” column.
The variation of the longitudinal profile slopes can also be analyzed using segmented slope (ti) (Equation 7) and segmented linear regression (Equation 8): 
graphic
 
graphic
where k is the segment length, zi is the elevation at location i along the longitudinal profile, di is the distance reading at location i, m is the slope of the best-fit straight line, and b is the y-intercept. To determine whether the segmented slopes and linear regressions are based on the “FIRST_DIST” and “FIRST_Z” values or “REVERSE_D” and “REVERSE_Z” values, the first value z0 in the “FIRST_Z” column is compared with the last value zk in the “FIRST_Z” column. If z0 < zk, the “FIRST_DIST” and “FIRST_Z” values are used for segmented slope calculation and linear regression; otherwise, the “REVERSE_D” and “REVERSE_Z” values are used for segmented slope calculation and linear regression. Using a segment length of 50, distance and elevation values from the first row and the last row are used to calculate segmented slope ti (the “S_SLOPE” column in Fig. 6); then the first 50 rows of distance and elevation values are used to create a best-fit straight line. The m value of the line is written to the first row of the “F_SLOPE” column, and the coefficient of determination (R2) is written to the first row of the “R2” column. Then the start point of the segment moves down one row to get another 50 rows of data and calculate values of ti, m, and R2. This process is repeated until the segment reaches the end of the table.

Accuracy Assessment

Here accuracy assessment is not for a DEM as studied by many authors (e.g., Muller, 1999; Reusser and Bierman, 2007; Aguilar and Mills, 2008; Höhle and Höhle, 2009) but for channel features extracted from a DEM. Positional accuracy of features usually needs to be defined by measuring differences between points on the features (“tested source”) and corresponding points determined with higher accuracy (“reference source”) (Goodchild and Hunter, 1997). It is assumed that the positional accuracy of a lidar-derived DEM is sufficiently high, and the difference between the DEM and the ground truth can be ignored. Factors affecting the positional accuracy of channels include the channel detection algorithm, mathematical morphological operation and thinning of channel cells, and raster to vector conversion. Using the channel feature as the “tested source” and the high-resolution DEM as the “reference source,” a method developed by Dong (2017) is employed to calculate the mean absolute error (MAE), the root-mean-square error (RMSE), and the National Standard for Spatial Data Accuracy (NSSDA) developed by the Federal Geographic Data Committee (FGDC, 1998). These three measures are calculated from 100 random sampling points along the channel feature. The MAE, RMSE, and NSSDA are defined as: 
graphic
 
graphic
 
graphic
where n is the number of random sampling points, di is the position of the minimum value of the cross profile at point i, and d0 is the half-length of the cross profile. The 100 random sampling points are saved in a shapefile “samples.shp” in the output folder with an “ERROR” field in the attribute table for graphic.

Software Implementation

Following the methodology flow chart (Fig. 1) and methods described above, a Python add-in toolbar named “Channel Extraction” is developed for ArcGIS 10.6.1 using the Python programming language and ArcPy (Fig. 7). The add-in needs licenses for ArcGIS Spatial Analyst and 3D Analyst Extensions. The open-source N-dimensional array package (NumPy) and the fundamental library for scientific computing (SciPy) are also used. The add-in and test data are freely available (see footnote 1).

The items on the toolbar user interface (Fig. 7) are briefly described here: “Workspace” is an empty folder for output files; “DEM” is the DEM raster layer selected from the ArcMapTM table of contents; “Window Size” is the search window size (in the same measurement unit as the input DEM) used for channel detection; “P-Length” is the length of cross profiles (in the same measurement unit as the input DEM); and “P-Number” is the number of cross profiles that can be zero. A digitizing tool before the OK button on the toolbar is designed for on-screen input of the approximate locations of the start and end points of the channel. The items on the toolbar have tool tips and descriptions to help users. When “OK” is clicked, the program will run to produce output files in the output folder and generate a summary report “results.txt.” Figure 8 is a sample summary report.

APPLICATION DEMONSTRATION

Channel Detection

Figure 9 shows the stream channels derived from a 1-m-resolution, lidar-derived DEM near central San Andreas fault, California (USA) based on flow accumulation values calculated in ArcGISTM. As can be seen in Figure 9A, many small channels can be extracted using a flow accumulation threshold of 100, but substantial user editing is needed to remove the small channels and keep the major channels. If a flow accumulation threshold of 1000 is used (Fig. 9B), some major channels were not extracted because of limited contributing areas in the lidar-derived DEM.

For comparison purposes, the same study area in Figure 9 was selected and processed using the Channel Extraction toolbar. Four channels (C1, C2, C3, and C4) were extracted after defining the approximate locations of the start and end points for each channel on screen. Figure 10A shows the four channels extracted from the 1-m-resolution DEM with a backdrop of a hillshaded DEM. Figure 10B shows the four channels draped over a Google EarthTM image. Table 1 lists the MAE, RMSE, and NSSDA accuracies of the four channels. Both visual and quantitative analyses show that the Channel Extraction toolbar can produce very accurate results.

Longitudinal Profile Analysis

As shown in the methodology flow chart in Figure 1, cross and longitudinal profiles are generated after the channels are extracted. The profiles are saved as an Excel spreadsheet in the output folder. Figure 11 shows an offset channel extracted from lidar-derived 1-m-resolution DEM at Wallace Creek across the San Andreas fault near Carrizo Plain in California, USA. The green lines are cross profile locations, and the white dots are automatically generated sampling points for accuracy assessment. The cross profiles near the start and end points of the channel polyline are shown in Figure 12, and the longitudinal profile is shown in Figure 13. It should be noted that the extracted polyline represents the current location of the offset channel. The old offset channel is visible on the DEM near the northwest corner of the study area (Sieh and Jahns, 1984).

A trendline of the longitudinal profile is added to Figure 13 to better display the general shape of the longitudinal profile. It can be seen that the longitudinal profile has a slightly concave shape in general. Causes of the concavity in longitudinal profiles of river channels can be found in Sinha and Parker (1996). Using Equation 6, the longitudinal slopes can be calculated using the “rise over run” method (Fig. 14). The initial values of longitudinal slope at lower elevations range from 0.020 to 0.037 because of the relatively short distance values. From 80 m to 240 m along the channel, the longitudinal slope remains at slightly over 0.030. After 240 m along the channel, the longitudinal slope changes from 0.031 to 0.035, in line with the concave shape of the longitudinal profile in Figure 13. A relatively rapid increase in channel elevation near 270 m along the channel can be seen in both Figures 13 and 14.

Since the longitudinal slope in Figure 14 was calculated using “rise over run,” it may not be very sensitive to local variations in slope at longer distances (for example, after 200 m from the end of the channel). Therefore, segmented slope (Equation 7) and trendline slope (Equation 8) were automatically calculated using segments of 50 records (50 points) in the longitudinal profile spreadsheet (long.xls) (Fig. 15). As can be seen in Figure 15, both segmented slope and trendline slope values are relatively stable from 0 to 200 m. After 200 m (the turning point in Fig. 16), there is a rapid increase in both segmented slope and trendline slope; this increase is caused by the increase in channel elevation near 270 m along the channel. Other local variations from 300 m to 1100 m are also visible in Figure 15.

Discussion

As mentioned in the Introduction section, Ouchi (2005) measured ten offset channels across the San Andreas fault by walking along each channel and taking measurements for analysis of channel development. It was hoped that a semi-automated method based on high-resolution DEMs can greatly improve the efficiency and accuracy of channel measurement. Unlike the traditional methods for channel detection such as the methods based on flow direction and flow accumulation, the proposed channel detection method does not need to process all cells in the DEM, and the input DEM does not need preprocessing to fill the sinks. Once a DEM is stored in computer memory, the algorithm will search for local minimum cells based on the user-specified start and end points and a few simple parameters, thereby avoiding unnecessary calculations for cells not in the neighborhood of the channel. Also, the proposed method does not involve drainage areas, which is especially useful for processing lidar-derived DEMs, because lidar data collection may not cover large areas, and in some cases, only covers a limited zone such as neighboring areas along a fault. Users have the flexibility in extracting individual channels and obtaining cross and longitudinal profiles along with several slope measurements. These features are advantages of the new method compared with some existing methods.

As for the channel extraction examples, it should be noted that the four channels in Figure 10 were selected from a lidar-derived, high-resolution DEM in an area without major human disturbances. To test whether the Channel Extraction tool also works for medium- to low-resolution DEMs, a 90-m-resolution Shuttle Radar Topography Mission (SRTM) DEM near Misha River in Jianchuan County, Yunnan Province, China was selected for channel extraction (Fig. 17). The RMSE of the extracted polyline is ∼100 m, comparable to the 90 m resolution of the DEM. In fact, the channels in low-resolution DEMs could be arguably called “valleys,” which might include flood plains and channels. Because structures such as reservoirs and dams created by human activities may affect channel detection, more tests are needed to determine the applicability of the Channel Extraction tool in such scenarios.

Although the application examples show that the channels can be extracted and measured efficiently and accurately, there are two limitations of the current version of the Channel Extraction tool: (1) While the tool provides flexibility by allowing users to choose approximate locations of the start and end points of a channel on screen, only one channel can be extracted at a time. This limitation can be removed in a future version by using an input file for start and end points of desired channels, so that the channels can be extracted in a batch mode. (2) For DEMs covering large areas, a computer may not be able to read the whole DEM into the memory. The algorithm needs to be improved so that the input DEM can be dynamically partitioned into blocks during data processing. Although these limitations do not affect the use of the tool in many applications, comparison of the method with some existing methods (e.g., Passalacqua et al., 2010a, 2010b; Pelletier, 2013; Clubb et al., 2014) would be helpful for improving the method.

The application of the Channel Extraction tool in tectonic geomorphology needs further evaluations. In the study by Ouchi (2005) based on manual measurements in Wallace Creak across the San Andreas fault, three segments of 50–100 m in length were selected in the channel similar to the 0–400 m section on the horizontal axis in Figure 15, and segmented slopes of 0.032, 0.038, and 0.032 were reported. These values are comparable to the values in Figure 15, but Figure 15 provides continuous values to show much more detail of local variation in segmented and trendline slopes. Figure 15 also suggests that care should be taken when comparing segmented or trendline slopes of the longitudinal profile, because segmented or trendline slopes can be quite different along the longitudinal profile of the channel. However, more in-depth investigation on the relationship between the longitudinal profile slopes of Wallace Creek and neotectonic movement is beyond the scope of this paper.

CONCLUSIONS

This paper introduces a semi-automated method for extracting stream channels and channel profiles from lidar-derived, high-resolution DEMs. The method is implemented as an ArcGIS Python add-in toolbar named Channel Extraction involving the Spatial Analyst, 3D Analyst, and Python programming with ArcPy, NumPy, and SciPy. Based on user-specified approximate locations of the start and end points and a few simple parameters, the tool executes five automated steps: (1) channel detection, (2) mathematical morphological operation, (3) vectorization, (4) profile generation, and (5) accuracy assessment. The output includes a thinned channel raster, shapefiles for cross and longitudinal profiles of the channel, spreadsheet files for cross and longitudinal profiles, a shapefile for random sampling points along the channel for accuracy assessment, and a summary text file for output results. The proposed channel detection method does not need to process all cells in the DEM, and the input DEM does not need preprocessing to fill the sinks. Since the proposed method does not involve drainage areas, it is especially useful for working with lidar-derived DEMs covering limited areas.

Channel detection was demonstrated using a lidar-derived 1-m-resolution DEM in a study area along the San Andreas fault in California, USA. Visual and quantitative analyses suggest that the method can produce accurate channel polylines. Automated extraction of channel profiles was demonstrated using a lidar-derived DEM at Wallace Creek across the San Andreas fault in California, USA. Compared with a previous study in Wallace Creek, the tool can provide continuous values of segmented and trendline slopes along the longitudinal profile. The results from Wallace Creek also suggest that care should be taken when comparing segmented profile slopes due to the sensitivity of slope values along the longitudinal profile. Although there are two limitations that need further improvement, the demonstrated samples show that the new semi-automated method for extracting channels and channel profiles is efficient and user-friendly and can produce accurate results to support geomorphic studies.

ACKNOWLEDGMENTS

The first author would like to thank Beijing Advanced Innovation Center for Imaging Theory and Technology (Capital Normal University) and Yunnan University and Department of Science and Technology of Yunnan Province for providing summer research funds (C176240107 and C176240210019). This work is supported by the National Natural Science Foundation of China (grants 41371434 and 41461103).

1Supplemental Files. Channel Extraction add-in for ArcGIS 10.6 and 10.7, user’s manual, and test data. Please visit https://doi.org/10.1130/GES02188.S1 or access the full-text article on www.gsapubs.org to view the Supplemental Files. Files also available at http://geography.unt.edu/∼pdong/software/channel/.
Science Editor: Shanaka de Silva
Gold Open Access: This paper is published under the terms of the CC-BY-NC license.