Terrestrial light detection and ranging (LiDAR) data can be acquired from either static or mobile platforms. The latter presents some challenges in terms of resolution and accuracy, but the opportunity to cover a larger region and repeat surveys often prevails in practice. This paper presents a machine learning algorithm (MLA) for automated lithological classification of individual points within LiDAR point clouds based on intensity and geometry information. Two example data sets were collected by static and mobile platforms in an oil sands pit mine and the MLA was trained to distinguish sandstone and mudstone laminations. The type of approach presented here has the potential to be developed and applied for geological mapping applications such as reservoir characterization or underground excavation face mapping.

Light detection and ranging (LiDAR) is a relatively new technology that is now a prominent aspect of many applications in geology and engineering, including, but not limited to, tunneling (Van Gosliga et al., 2006), hazard evaluation (Jaboyedoff et al., 2009; Abellan et al., 2011) and hydrogeological studies (Harpold et al., 2015). The application of terrestrial LiDAR in the geosciences can be broadly grouped into three main categories: (1) surveying, (2) monitoring, and (3) characterization. Surveying applications typically dictate the need for raw geometry and relationships of interest for the target surface. Examples include the creation of digital elevation and/or terrain models for rockfall monitoring (Jaboyedoff et al., 2012), recording as-built conditions (Su et al., 2006), or performing quality control evaluations (Fekete et al., 2009). Most surveying applications require a very high level of local accuracy as well as control used for georeferencing within either a regional or global reference system. Monitoring applications introduce the fourth dimension of time and evaluate repeat measurements (scans) of the same surface or environment in order to identify changes between epochs. Examples include snow thickness evaluation (Deems et al., 2013), shotcrete thickness evaluation (Lato and Diederichs, 2014), tunnel deformation monitoring (Walton et al., 2014; Nuttens et al., 2014), or rockfall and landslide monitoring (Maerz et al., 2012; Jaboyedoff et al., 2012; Royán et al., 2014; Hutchinson et al., 2015; Kromer et al., 2015). For these types of applications, a fundamental understanding of the underlying process inducing the surface change is required as well as an established understanding of the limitations of the observations (scanner) such that monitored changes can be correctly attributed to a physical phenomenon or process.

Characterization involves the extraction of features and/or attributes from a scanned surface (or more precisely, a LiDAR point cloud) using either manual identification or automated postprocessing algorithms, such as machine learning algorithms (MLA). MLA applications can be computationally expensive and often require strict and carefully selected training data sets as well as calibrated classification schemes. Perhaps the most common use of LiDAR for rock outcrop characterization is the mapping of planar fracture surfaces (Lato et al., 2009a; Pate and Haneberg, 2011; Otoo et al., 2011; Maerz et al., 2012, 2015; Gigli et al., 2014). The automated extraction of fracture attributes from point clouds has been discussed extensively in the literature (Roncella and Forlani, 2005; Ferrero et al., 2009; Gigli and Casagli, 2011; Lato and Vöge, 2012; Maerz et al., 2013; Riquelme et al., 2014).

This study is particularly focused on the development and testing approach for automated lithological characterization of a rock outcrop using LiDAR. The development of automated characterization approaches has the potential to improve mapping efficiency and allow for quantitative study of geological attributes. For example, the ability to rapidly and quantitatively describe the size, shape, and distribution of geological units of interest has potential applications within the field of petroleum reservoir characterization and for face mapping in underground excavations.

In Mills and Fotopoulos (2013), the roughness or morphology of an outcrop surface was utilized to distinguish between distinct lithologies. In such cases, as illustrated in Figure 1, it is easy to see that such an approach for lithology classification has the potential to be effective.

Another approach is to utilize the reflection intensity values as an indicator of lithology (Fig. 2 illustrates a case where this type of approach is clearly applicable). Burton et al. (2011) demonstrated the promise of such an application by correlating LiDAR intensity values from core to the associated downhole gamma-ray log. Fowler et al. (2011) used intensity values to interpret rock face weathering and to distinguish a coal layer from surrounding (more reflective) rocks. Inocencio et al. (2014) used a k-means algorithm to segregate points into distinct interpreted lithologies based on intensity values.

In this study we examine the use of reflection intensity and geometry information for distinguishing two distinct lithologies at an oil sands pit mine in Canada (Fig. 2 represents one example data set collected at this site). In particular, mudstone laminae are separated from the surrounding sandstone in an effort to allow the characteristics of the laminae to be examined. Both static terrestrial LiDAR data and mobile LiDAR data were used for the study, and recommendations on the application of these two data collection approaches are provided.

In the following we provide a comparison of the static and mobile data collection methods; we then present rock type classifications obtained using the MLA, and conclusions and recommendations for future work.

LiDAR scanners are capable of collecting high-density spatial coordinates by emitting a laser and recording its reflected return. The position of any point that returns the laser is recorded as a set of x, y, z coordinates, along with the intensity of the reflected wave. These coordinates can either be kept in a relative reference frame, or georeferenced to a datum (Shan and Toth, 2008).

There are two main types of LiDAR scanners: time-of-flight and phase based. Time-of-flight scanners emit laser pulses, and for each pulse, the delay between the transmission of the pulse and the return of the pulse from a reflective surface is used to calculate the distance to the surface, given that the speed of light is known. Phase-based scanners continuously emit waves across a variety of wavelengths; by recording the phase shift of each wavelength when reflected energy returns to the scanner, a unique range can be identified based on where the different wave phases align. Because their data acquisition is not limited by pulse frequency, phase-based scanners tend to acquire data more quickly and with higher accuracy than time-of-flight scanners, but their scanning range is limited by the maximum wavelength used due to issues of aliasing (Slob, 2010; Zhang and Arditi, 2013).

Aside from advances in data collection and processing workflows specific to various areas of study, there have been significant developments in scanner technology; modern scanners typically have measurement ranges of several hundreds of meters (or longer for time-of-flight scanners), fields of view covering 360° horizontally and nearly 360° vertically, and measurement rates of 1 × 106 points/s (Puttonen et al., 2013; Faro, 2016). With respect to data quality, the single point range accuracies quoted by manufacturers have reached as low as ±2 mm. As such, LiDAR systems are beginning to rival traditional surveying techniques with respect to accuracy. There are numerous advantages of LiDAR over traditional data collection methods including (Ferrero et al., 2009; Lato et al., 2012):

  1. Data are collected for a fully three-dimensional area, rather than at specific points or along specific scanlines.

  2. Data can be collected in areas that are inaccessible or dangerous to access.

  3. An objective record of site conditions is established and available for reinterpretation in the future.

  4. Information can be extracted from high-density position data through postprocessing that might not otherwise be possible to obtain from conventional surveying.

  5. Automated data processing combined with rapid data collection can save time.

When designing a LiDAR survey, data considerations (positional accuracy, orientation bias, point spacing) must be balanced with operational constraints (time required for data collection, and consequently cost). Each of these considerations is addressed with respect to mobile and static LiDAR data collection here.

Positional Accuracy

Positional accuracy is influenced by several factors, including type of instrument, data collection speed, distance from the scanner, the accuracy of scanner positioning in the case of mobile scanning, and any repeated measurement averaging performed by the scanner. The scanners used for this study were the Z+F Imager 5010 and Riegl VMX-450 for static and mobile deployment, respectively. The data collection settings were 1 × 106 points/s for both the static and mobile scanners. Both the static and mobile surveys were performed with the scanners at similar distances from the outcrop of interest (∼30 m). The network accuracy of the mobile system is poorer than that of the static system (±20 mm versus ±5 mm) (RIEGL, 2015; Z+F USA, Inc., 2015). For the desired application, however, the accuracy of the mobile data was deemed acceptable because finer scale geometrical attributes of the outcrop were not necessary for the determination of lithology.

Orientation Bias

LiDAR is a line-of-sight technology, meaning surfaces oriented parallel to the ray connecting them to the scanner system will not be imaged. By extension, surfaces that are subparallel will return a signal to the scanner from a very small number of points relative to their size when compared to surfaces that are perpendicular to the line of sight (see Fig. 3, where the horizontal plane subparallel to the instrument’s line of sight has a lower point density than the subvertical plane, which is nearly perpendicular to the line of sight). This point density bias was studied extensively by Lato et al. (2010). To minimize this bias, multiple scanning locations can be used (Kemeny and Turner, 2008; Lato et al., 2012; Mills, 2015). In the case of static surveys, this can significantly increase the total scanning time, whereas for mobile surveys, multiple lines of sight are employed by virtue of the mobility of the scanner. Nonetheless, subhorizontal surfaces will still tend to be undersampled.

Point Density

The mobile and static data were collected with typical point densities of ∼1 point/20 mm (2500 points/m2) and 1 point/5 mm (40,000 points/m2), respectively. One weakness of mobile data is that variability in mobile platform speed can lead to mobile data striping, where stripes of higher or lower point density appear in the point cloud (see Fig. 4). The resulting irregularities in point density can be extreme and detrimental to the application of algorithms that require relatively uniform point densities (e.g., the clustering algorithm employed by Riquelme et al., 2014). To address this issue, subsampling of the original point cloud to achieve a uniform spacing can be performed (e.g., Hou et al., 1990); this approach is only appropriate, however, if the minimum point spacing of the point cloud exceeds the desired resolution.

Effective Resolution

Point density is a primary determinant of LiDAR data resolution. Below a certain threshold, however, the limiting factor influencing effective resolution of the data instead becomes the laser beam width. Lichti and Jamtsho (2006) suggested that a point spacing of 86% of the expected beam width is optimal; at smaller point spacings, features are effectively smoothed between points. Point density is influenced by the speed of scanning (user defined), surface orientation biases (see preceding), and scanner movement speed (in the case of mobile scanning). Typically mobile data will have a larger point spacing than static data (although this is not necessarily always the case); the beam widths of the static and mobile surveys at the outcrop in this study were ∼12 mm and 16 mm, respectively.

As mobile data is collected while the vehicle is in motion, data acquisition times are correspondingly shorter than those associated with static data collection for large-scale projects. A typical single static terrestrial scan requires as few as 3 min to complete, but typically requires ∼10 min per scan position when equipment setup time is considered. In addition, static data collection requires survey of exterior reference targets if accurate georeferencing of cloud data is required. In contrast, mobile data are automatically georeferenced as the trajectory of the vehicle is determined with respect to an operating global positioning system base station with known coordinates. As a result, as the scale of data collection programs and the associated number of required static scan positions increase, mobile surveys become increasingly time efficient in comparison to static scans.

This study attempts to distinguish mudstone laminae from a surrounding sandstone unit using both mobile and static LiDAR data. Figure 5 shows a photograph of the area of interest along with a visualization of the static data set analyzed in this study. Although it is not quite as clear as in Figure 2, it can still be seen that there is a notable intensity contrast between the mudstone and sandstone units.

Preliminary attempts to extract points corresponding to the mudstone layers focused on a sequential filtering approach. For example, points with low intensity values [scaled to the interval (0, 255)] could be removed first, and then the remaining points (ideally corresponding primarily to mudstone layers) could be tested to ensure that they belonged to a linear, subhorizontal neighborhood of points. These approaches were unsuccessful, primarily because the intensity values were not sufficient parameters for (first pass) filtering. Even relatively low intensity cutoffs (60) had the effect of removing some portions of mudstone data, while higher intensity cutoffs (120) still left significant areas of noise in the data (see Fig. 6). One such source of noise is vertical excavation marks, which show anomalously high intensity in the sandstone unit (this type of issue was noted in a similar study by Hartzell et al., 2014).

In lieu of the sequential filtering approach, an MLA was adopted. The MLA consists of a supervised classifier used with a set of hand-picked mathematical functions that encode localized properties of point cloud geometry and intensity into a numerical feature vector.

The extra-trees classifier (Geurts et al., 2006) was employed to distinguish between mudstone and sandstone points based on a vector of numerical features (see Fig. 7). Previous work (e.g., Mills, 2015) indicates that while the extra-trees classifier generally yields good performance for this type of analysis involving large point cloud data sets, other supervised classifiers are equally suited and have been used successfully in similar studies (e.g., Brodu and Lague, 2012).

The classifier is initially provided with a set of training data (manually classified). Each data point has an associated feature vector, with each element of the vector representing an individual feature of the data point. These features can be simple (e.g., the reflection intensity associated with the data point) or complex (e.g., an eigenvalue calculated from the spatial positions of all data points within a certain distance of the point of interest). When using the extra-trees classifier, subsets of training data are randomly selected to develop decision trees based on these features (x1 and x2 in Fig. 7). Each training subset is considered individually, and a random subset of its features is selected to define thresholds that most accurately split the training subset into the manually assigned categories. Once a large number of decision trees have been established based on the training data, the remainder of the data are classified by evaluating each of the decision trees separately; the probability of a given point belonging to a given classification is assigned as the proportion of decision trees that selected that classification (Breiman, 2001; Strobl et al., 2007).

Because intensity is a key visual indicator that can be used in the manual distinction of mudstone and sandstone layers within the data, an initial feature vector was developed based purely on intensity data. Individual point intensity depends on several factors other than intrinsic rock properties (i.e., albedo), however, such as angle of incidence and distance from the sensor. To attempt to mitigate the impact of these confounding factors, features were selected based on the fact that geological factors are likely to vary more discontinuously than factors associated with data acquisition. For example, intensity may decrease gradually as a function of distance, but within a given region of a data set that contains points at similar distances from the sensor, a point with a high intensity relative to its neighbors is more likely to correspond to mudstone. Therefore, features were selected to incorporate information about each point as well as its neighbors.

Each feature calculation is based on a subset of data (referred to as a neighborhood) centered on the point of interest (referred to as the query point) and returns a vector or scalar value. The outputs of these calculations are inherently dependent on the size of the neighborhood used. Therefore, 6 analysis scales were used ranging from 0.15 m to 0.6 m; these values were selected to coincide with the approximate thickness range of the mudstone laminae in the data. Points within the neighborhoods bounded by spheres of radius equivalent to the analysis scales for each query point were identified. The query point’s composite feature vector was then formed by concatenating the output of each function applied to each of the query point’s neighborhoods (Brodu and Lague, 2012; Mills, 2015).

The first feature set incorporates four scalar-valued functions and is purely intensity based (referred to as i4). Computing these four features at all 6 analysis scales resulted in a 24 element composite feature vector. Its functions are described as (1) mean neighborhood intensity; (2) mean neighborhood intensity − query point intensity; (3) query point intensity − minimum neighborhood intensity; and (4) maximum neighborhood intensity − query point intensity.

Preliminary testing on the static LiDAR data set showed that the i4 feature set did not provide satisfactory results. Therefore, an alternative feature vector was developed. The so-called g8 feature vector contains eight features, and incorporates information about neighborhood geometry in addition to information about neighborhood intensity. Computing these 8 features at 6 analysis scales yielded a 48 element vector; g8 consists of 2 scalar-valued and 2 vector-valued functions: (1) the sum of the neighborhood intensity values divided by the neighborhood volume (intensity weighted density); (2) the ratio of the intensity weighted density to the point density (defined as the number of points in the neighborhood divided by the neighborhood volume); (3) the two largest normalized eigenvalues of the weighted covariance matrix (computed by scaling each point’s (x, y, z) vector from the geometric center of the neighborhood by its intensity value); (4) the x and y (horizontal) components of the normalized eigenvectors corresponding to the two largest eigenvalues.

A training set consisting of ∼45,000 mudstone points and 45,000 sandstone points was selected from the static data set, while a training set of ∼60,000 points from each class was selected from the mobile data set. In each of the three classifications (i4 with static data, g8 with static data, g8 with mobile data), the extra-trees classifier was used to assign a probability Pm to every point in the point cloud, describing its likelihood of belonging to a mudstone lamination.

An initial qualitative inspection was performed to assess the relative performance of the two feature vectors. Points with a Pm value of 0.8 or greater were assigned to the mudstone class, while points with a Pm of 0.2 or less were assigned to the sandstone class. The remaining points were ignored in the initial visualizations, shown in Figures 8 and 9.

From a qualitative perspective, the g8 feature vector performed more favorably than the i4 feature vector, in that both fewer sandstone and mudstone points appear to be misclassified (see Fig. 8). There are still some areas where noise is prevalent, however; for example, below the main mudstone layer, the influence of mechanical excavation marks on the data intensity has a notable effect on the output of the classifier, even when using g8.

Comparing the results from this feature vector to a manually interpreted result (see Fig. 9) allowed for a more quantitative comparison of the two feature sets (see Table 1). The g8 feature vector misclassifies notably fewer points than i4 at a Pm threshold of 0.8 (3.7% of the data set misclassified versus 5.6%); g8 also tended to provide less confident results overall, however, with the Pm threshold of 0.8 resulting in 56% of the data set remaining unclassified (versus 51.4% for i4).

To quantitatively examine performance for a broader range of Pm thresholds, two metrics were utilized: “recall” represents the fraction of the total number of manually interpreted mudstone points designated as mudstone by the MLA, whereas “precision” represents the fraction of MLA-designated mudstone points that were manually interpreted to be mudstone (Raghavan et al., 1989). For each of the three cases tested (g8 static, i4 static, g8 mobile), recall and precision values were determined through comparison of MLA results to manually interpreted classifications at a variety of Pm thresholds; for the determination of these values, all points not assigned to the mudstone class were assigned to the sandstone class (unlike in Fig. 9, uncertain points are not excluded). The effect of the Pm threshold on the precision and recall of each case is shown in Figure 10, as is the precision versus recall plot, which shows the set of attainable precision-recall pairs for all Pm thresholds.

Similar to the conclusions obtained from qualitative visual comparison (Fig. 9) and initial quantitative comparison (Table 1), the g8 approach outperforms the i4 approach based on the results in Figure 10. As seen in Figure 10C, at all levels of recall, the g8 static results have a level of precision greater than or equal to the i4 static results. The g8 mobile results show a very similar trend to the g8 static results, although with slightly lower precision levels. If precision and recall are valued equally as performance indicators, the overall performance of the MLA in each case can be quantified using the area under the corresponding curve. These area values (Fig. 10C legend) show that the g8 static results outperform the g8 mobile results, and that the i4 static results are the poorest.

In addition to considering the overall performance of the MLA in each case, the variability of the MLA performance within each case should be addressed. This variability can best be visualized by plotting mudstone points correctly classified by the MLA as well as both false positives (sandstone classified as mudstone) and false negatives (mudstone classified as sandstone). This visualization is presented in Figure 11.

When comparing the g8 results for the static and mobile data sets (Figs. 11A, 11C), the biggest differences are associated with the discrepancy in the resolution of these data sets. In the g8 mobile result, several closely spaced mudstone laminae are merged in the mobile data (shown as red bands between white laminae in Fig. 11C); given the coarser spatial resolution of the mobile data set, this result is hardly surprising. In the g8 static result, merging of mudstone laminae is less prevalent, but resolution still plays a key role, in that many of the laminae missed by the MLA (mudstone classified as sandstone) are the thinnest. These results seem to suggest that the effective resolution of the MLA appears to be slightly poorer than that of the data sets it analyses; this effective resolution of the MLA may be related to the analysis scales used.

When considering the results shown in Figure 11, it is important to remember that these have been generated based on a comparison of the MLA classifications to manual interpretations of the same data sets collected by the same sensors, not based on a comparison to ground truth. For example, mudstone laminations that have a relatively low intensity contrast with the surrounding sandstone points due to nonintrinsic factors such as those mentioned here (distance from sensor, angle of incidence) may be difficult for either a human or MLA to recognize. This can cause classifications such as those labeled 1 and 2 in Figure 11. In the case of label 1, the MLA has labeled coherent sets of subhorizontal points as mudstone. Although these points may correspond to mudstone laminations, they are considered misclassified because they were not noted during manual interpretation. In the case of label 2, the same lamination was classified by the MLA as mudstone when analyzing the static data set, but classified as sandstone when analyzing the mobile data set; however, the manual interpretations in both cases were reversed. These cases highlight the limitation of this analysis, that the manually interpreted results used a baseline for comparison to the MLA are not fully accurate, with the potential for errors in interpretation being greatest in areas with low overall intensity such as near the edges and bottoms of the data sets.

The other major limitation of the proposed approach is that despite the design of feature vectors to minimize the effects of nonintrinsic intensity variations, the MLA is generally less reliable in areas with lower overall intensity (much like the manual interpretation). This is illustrated by the area labeled 3 in Figure 11, where a relatively significant mudstone package was largely missed by the MLA at the edge of the data set. Although further optimization of the feature vectors used could mitigate this limitation to some degree, nonintrinsic intensity influences will have some effect on the classifications of an MLA that relies on intensity information.

Both static and mobile terrestrial LiDAR platforms and observations show immense promise for applications requiring semiautomated rock mass characterization. In terms of rock type delineation, the use of intensity and geometry information incorporated through an MLA for point classification represents an advance over more direct approaches that we attempted (such as sequential filtering based on raw intensity values). As with any interpretive technique used with remotely sensed data, the importance of data quality to the ground-truth performance of this approach cannot be understated.

Generally speaking, a mobile scan (potentially downsampled to a uniform point cloud) is suitable for larger scale investigations, whereas a series of static scans can provide enhanced accuracy and resolution in cases where fine detail and geometric accuracy are a priority. Because distinct data-acquisition systems will allow for the collection of data sets of differing quality, it is most important to ensure the acquisition of data with an appropriate resolution (considering both point spacing and beam size) relative to the scale of the lithological units of interest. MLA-derived classifications may have a slightly lower resolution than the data being analyzed. Future studies should be performed to examine the potential impact of the selected analysis scales on the effective MLA resolution.

In order to improve upon the results obtained in this study, there are many other options for future work. For example, tests could be performed to evaluate the sensitivity of the classifier output to the selected training sets. Likewise, other features could be tested and implemented to improve the performance of the classifier. Perhaps the most valuable approach to pursue, however, will be to incorporate hyperspectral data into the classification workflow, as preliminary studies using hyperspectral data for rock identification at the outcrop scale suggest that these data add significant amounts of information for characterization purposes; this is particularly true when these data are calibrated based on spectroradiometer reflectance measurements (Kurz et al., 2012; Hartzell et al., 2014).

We thank Suncor Energy Inc. for allowing us to publish the results of their data collection campaign. The authors would also like to thank the reviewers and editorial team at Geosphere for their contributions to the improvement of this manuscript throughout the peer-review and publication process.