The application of structure from motion–multiview stereo (SfM-MVS) photogrammetry to map metric- to hectometric-scale exposures facilitates the production of three-dimensional (3-D) surface reconstructions with centimeter resolution and range error. In order to be useful for geospatial data interrogation, models must be correctly located, scaled, and oriented, which typically requires the geolocation of manually positioned ground control points with survey-grade accuracy. The cost and operational complexity of portable tools capable of achieving such positional accuracy and precision is a major obstacle in the routine deployment of SfM-MVS photogrammetry in many fields, including geological fieldwork. Here, we propose a procedure to overcome this limitation and to produce satisfactorily oriented models, which involves the use of photo orientation information recorded by smartphones. Photos captured with smartphones are used to: (1) build test models for evaluating the accuracy of the method, and (2) build smartphone-derived models of outcrops, used to reference higher-resolution models reconstructed from image data collected using digital single-lens reflex (DSLR) and mirrorless cameras. Our results are encouraging and indicate that the proposed workflow can produce registrations with high relative accuracies using consumer-grade smartphones. We also find that comparison between measured and estimated photo orientation can be successfully used to detect errors and distortions within the 3-D models.
The application of structure from motion–multiview stereo (SfM-MVS) photogrammetry for generating three-dimensional (3-D) surface reconstructions of rock outcrops (virtual outcrop models, VOMs) has enjoyed rapid proliferation over the past decade (e.g., Sturzenegger and Stead, 2009; Favalli et al., 2012; Bemis et al., 2014; Bistacchi et al., 2015; Bisdom et al., 2016; Seers and Hodgetts, 2016; Tavani et al., 2016; Fleming and Pavlis, 2018; Hansman and Ring, 2019). The fidelity of VOMs built using SfM-MVS photogrammetry now compares favorably with that of models generated by terrestrial laser scanning (also known as terrestrial lidar) (Harwin and Lucieer, 2012; Nocerino et al., 2014), with relatively low cost and highly portable digital single-lens reflex (DSLR) or mirrorless cameras enabling the construction of models with resolutions down to a few tens of microns (Corradetti et al., 2017). However, two major limitations of SfM-MVS photogrammetry still prevent its routine use in field geology. The first one is the common requirement for model registration during post-processing. The spatial registration of metric- to hectometric-scale outcrops, done within either a local or a global coordinate frame, typically requires the placement of ground control points (e.g., Javernick et al., 2014; James et al., 2017; Martínez-Carricondo et al., 2018) with centimeter to sub-centimeter (i.e., survey-grade) accuracy. Such levels of accuracy and precision can be achieved with a total station or using real-time kinematic differential global navigational satellite system (RTK-DGNSS) receivers (Carrivick et al., 2016). Such tools, however, do not form part of the standard equipment of the field geologist, and are impractical to deploy, as they are expensive, cumbersome, and require specialist operation. 
The second limitation of SfM-MVS models in virtual outcrop geology is the occurrence of errors emanating from scene reconstruction (James and Robson, 2014), which cannot be determined a priori. Such errors are readily detectable when dealing with simple planar surfaces. However, for topographically complex surfaces, it is commonly more arduous to determine errors. The identification of errors in such cases requires the known positions of several ground control points, again necessitating survey-grade tools, which negates many of the advantages in terms of portability and the low cost that SfM-MVS photogrammetry offers for geospatial data collection.
In this work, we explore the feasibility of utilizing camera attitude information from smartphone magnetometer and inclinometer measurements during image capture as a means to orient SfM-MVS photogrammetry–derived 3-D models, thus providing a pragmatic alternative to ground control points. We propose that the presented workflow will open the door for the routine use of photogrammetric surveys in many fields, including but not limited to geological fieldwork. In addition to facilitating model registration, we also investigate the application of smartphone-derived camera pose information to quantify errors within the generated 3-D model.
Three-dimensional reconstruction via SfM-MVS photogrammetry is based upon the collinearity equation (Fig. 1A), which defines the intersection between (1) the ray joining the camera’s optical center (hereinafter named camera position) and a given point in the object space and (2) a plane (i.e., the photo plane) lying at a given distance (i.e., focal length) from the camera position. The two-dimensional coordinates of the point of intersection on the photo plane (xi, yi), which represent the input for SfM-MVS photogrammetry reconstruction, depend upon (Fig. 1A; Table 1): (1) the camera position and the point location; (2) the orientation of the photo plane (the photo view direction), defined by the photo plane–normal unit vector ξ (the camera attitude); (3) the distance between the camera’s optical center and the photo plane (i.e., the focal length); and (4) the reference system within the photo plane, defined by the roll angle, which measures in the photo plane the angle between the horizontal and the long axis of the photo (defined by the unit vector ρ). Solving the collinearity equation for different photos provides the 3-D coordinates of each point detected in two or more photos. However, the full solution requires camera pose information (i.e., the camera’s extrinsic parameters). When this information is unknown, the collinearity equation can be solved using an arbitrary 3-D reference frame, with the resultant 3-D reconstruction being output as an unreferenced point cloud. In order to georeference the scene, a further similarity transform (roto-translation, uniform scale) must be performed, which requires the known position of at least three non-collinear cameras (e.g., Turner et al., 2014) or ground control points (e.g., Carrivick et al., 2016). 
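As an illustration of the geometry described above, the following minimal Python sketch projects an object-space point onto the photo plane. It assumes an ideal pinhole camera without lens distortion, a photo plane placed in front of the optical center, and photo-plane axes given by ρ and ξ × ρ. The function name `project` and all numerical values are hypothetical and are not taken from PhotoScan.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def project(point, camera_position, xi, rho, focal_length):
    """Photo-plane coordinates (x_i, y_i) of an object-space point.

    xi and rho are assumed to be orthogonal unit vectors.
    """
    ray = tuple(p - c for p, c in zip(point, camera_position))
    depth = dot(ray, xi)  # distance measured along the view direction
    if depth <= 0:
        raise ValueError("point lies behind the camera")
    eta = cross(xi, rho)  # second in-plane axis (assumed convention)
    return (focal_length * dot(ray, rho) / depth,
            focal_length * dot(ray, eta) / depth)

# Hypothetical camera at the origin looking along +y, long axis along +x.
x_i, y_i = project((1.0, 10.0, 0.0), (0.0, 0.0, 0.0),
                   (0.0, 1.0, 0.0), (1.0, 0.0, 0.0), focal_length=0.026)
```

SfM-MVS software solves the inverse of this problem: from many observed (x_i, y_i) pairs, it recovers both the point coordinates and the camera parameters.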
Conversely, deriving scaling factors requires knowing the distance between only two key points within both the real-world and arbitrary reference frames, which can be achieved with varying degrees of accuracy using rudimentary tools, such as laser distance meters in the case of small outcrops, or by measuring the distance between two objects on orthophotos for larger (i.e., hundreds of meters wide) exposures. Translation is not always required in geoscience applications, especially where only the relative orientations of geologic structures (e.g., faults, fractures, bedding planes) are required (Tavani et al., 2014). Assuming that the model is accurately scaled and rotated, a coarse georegistration can be achieved by matching a single point in the arbitrary coordinate frame to the equivalent location manually identified from georeferenced remote sensing imagery and/or digital terrain models (e.g., within Google Earth).
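The scaling step can be sketched in a few lines of Python. The key-point coordinates and the 12.5 m field measurement below are hypothetical; the only assumption is that the same two key points can be identified in both the model and the real world.

```python
import math

def scale_factor(p_model, q_model, real_distance_m):
    """Uniform scale that converts model units to meters."""
    model_distance = math.dist(p_model, q_model)
    if model_distance == 0:
        raise ValueError("key points coincide in the model frame")
    return real_distance_m / model_distance

# Two hypothetical key points in the arbitrary frame, 5 model units apart,
# whose real-world separation was measured as 12.5 m.
s = scale_factor((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 12.5)

# Every vertex of the point cloud is multiplied by the same factor.
scaled_point = tuple(s * c for c in (1.0, 2.0, 2.0))
```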
It is clear from the above discussion that orienting the model poses the greatest challenge when attempting to spatially rectify 3-D reconstructions of real-world scenes. In practice, orienting a 3-D model is typically achieved by multiplying the 3-D coordinates of each point with a 3 × 3 rotation matrix (i.e., by rotating the model about a given axis by a given rotation angle). Determining the rotation matrix requires the known locations of at least three non-collinear points, both in the arbitrary reference system and in the target reference frame. In the field, this ostensibly trivial problem is exacerbated by the poor portability and/or high cost of local and global positioning systems capable of achieving survey-grade measurement accuracies. Indeed, recognizing the position of three non-collinear points in 3-D virtual scenes is relatively simple, whereas determining their positions in the north-east-up reference system requires centimeter to sub-centimeter accuracy, achievable with a total station or RTK-DGNSS receivers.
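For readers unfamiliar with the axis-angle form of a rotation, the following Python sketch builds the 3 × 3 matrix with the standard Rodrigues formula and applies it to a single point. The function names are illustrative, and a right-handed rotation sense is assumed.

```python
import math

def rotation_matrix(axis, angle_deg):
    """3 x 3 matrix for a right-handed rotation of angle_deg about axis."""
    n = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / n for a in axis)  # normalize the axis
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    v = 1.0 - c
    return [
        [c + x * x * v,     x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v,     y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]

def rotate(matrix, point):
    """Multiply a 3 x 3 matrix by a 3-D point."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(3))
                 for i in range(3))

# A 90 degree rotation about the third axis carries the first axis
# onto the second one.
R = rotation_matrix((0.0, 0.0, 1.0), 90.0)
p = rotate(R, (1.0, 0.0, 0.0))
```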
The alternative workflow for orienting models explored in this work consists of taking attitude data tagged to smartphone images, rather than the position of ground control points, for determining the 3-D model’s rotational transform. In summary, our procedure consists of three simple steps: (1) acquire smartphone photos with the AngleCam app for Android, a software application which records and stores camera attitude data associated with individual photographs; (2) build a model using the smartphone photos and extract the estimated unit vector ξ and roll angle (and the associated unit vector ρ; Fig. 1A) of the photos as defined in the arbitrary reference system; and (3) determine the rotation matrix using the measured and estimated values of ξ and roll.
The 3-D models presented in this work were constructed using Agisoft PhotoScan software (Verhoeven, 2011; Plets et al., 2012), version 1.4.4, Professional Edition, a commercially available SfM-MVS photogrammetric tool chain. Photo-alignment in PhotoScan allows for the estimated direction of photos (estimated ξ and estimated ρ) to be derived, whereas measured ξ and measured roll angle are provided by the AngleCam app (the roll angle is then transformed into the measured ρ). The rule adopted herein is that the trend of ξ is the direction of view with respect to north, and the plunge for both ξ and ρ is positive looking downward. The trend of the ρ unit vector is taken, looking in the same direction and sense as the ξ direction, on the right side of the ρ direction.
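The conversion from trend, plunge, and roll to the unit vectors ξ and ρ can be sketched as follows. This is an illustration under stated assumptions rather than the AngleCam or OpenPlot implementation: components are ordered (east, north, up), trend is measured clockwise from north, plunge is positive downward, and a positive roll is assumed to tilt ρ downward. The sign convention of the roll angle is not specified in the text and should be checked against the app output.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def xi_from_trend_plunge(trend_deg, plunge_deg):
    """View direction as (east, north, up) components."""
    t, p = math.radians(trend_deg), math.radians(plunge_deg)
    return (math.sin(t) * math.cos(p),
            math.cos(t) * math.cos(p),
            -math.sin(p))

def rho_from_roll(trend_deg, plunge_deg, roll_deg):
    """Long axis of the photo, lying in the photo plane."""
    xi = xi_from_trend_plunge(trend_deg, plunge_deg)
    t = math.radians(trend_deg)
    # Horizontal in-plane direction, to the right of the view direction.
    h = (math.cos(t), -math.sin(t), 0.0)
    # In-plane direction pointing downward (assumed positive roll sense).
    d = cross(xi, h)
    r = math.radians(roll_deg)
    return tuple(math.cos(r) * hi + math.sin(r) * di
                 for hi, di in zip(h, d))

# Hypothetical measurement: looking toward 240, plunging 10, rolled 5.
xi = xi_from_trend_plunge(240.0, 10.0)
rho = rho_from_roll(240.0, 10.0, 5.0)
```

By construction, ξ and ρ are orthogonal unit vectors, which provides a quick sanity check on any alternative sign convention.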
The four unit vectors (i.e., measured ξ, estimated ξ, measured ρ, and estimated ρ) are required by the presented workflow to orient a virtual outcrop model (their values for the photos of the five models presented herein are provided in the Supplementary Material). Specifically, the rotation axis (Rax) and rotation angle (Ran) of each model are derived by adopting a procedure of minimization of the residual sum of squares (RSS). Given two unit vectors A and B, all of the rotation axes that permit the transformation of A to B lie on the plane γ orthogonal to the vector joining A and B (hereinafter named vector J) and passing through the origin of the coordinate frame (Fig. 1B). If a second vector pair (A′ and B′) is added to the system, along with its J′ vector and γ′ plane (Fig. 1C), the intersection between γ and γ′ provides the rotation axis that allows simultaneous rotation of the two vector pairs. For each unit vector pair (i.e., measured and estimated ξ and measured and estimated ρ), the Jξ and Jρ vectors (which join the measured and estimated unit vectors; see Table 1) are computed, as well as the planes perpendicular to these vectors. For each model, the optimal rotation axis is provided by the maximum intersection of these planes. The plane normal vectors (J vectors) are transformed into a second-order symmetric tensor (e.g., Whitaker and Engelder, 2005): the eigenvector corresponding to the minimum eigenvalue is the direction of minimum concentration of J, which is the direction of maximum concentration of intersections between planes (i.e., the optimal rotation axis, Rax). Having defined the axis of rotation, the estimated ξ and ρ of each photo are rotated around Rax using 0.1° increments. Using the entire photographic data set, the RSS between the rotated ξ and ρ and the measured ξ and ρ of each photo is computed. The angle generating the minimum RSS is taken as the Ran. This procedure is implemented in OpenPlot software (Tavani et al., 2011).
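The procedure can be sketched in Python as follows. This is not the OpenPlot code: it assumes that each J vector is normalized before entering the tensor, that the RSS is taken over the vector components, and that the minimum-eigenvalue eigenvector is obtained by power iteration on a shifted tensor (a simple stand-in for a full eigen-decomposition). All function names and the synthetic data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

def rotate(v, axis, angle_deg):
    """Rodrigues rotation of v about a unit axis."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    k = dot(axis, v) * (1.0 - c)
    w = cross(axis, v)
    return tuple(v[i] * c + w[i] * s + axis[i] * k for i in range(3))

def rotation_axis(estimated, measured, iterations=500):
    """Rax: direction of minimum concentration of the J vectors."""
    T = [[0.0] * 3 for _ in range(3)]
    for e, m in zip(estimated, measured):
        j = tuple(mi - ei for mi, ei in zip(m, e))
        if dot(j, j) < 1e-18:
            continue  # pair already aligned, so it defines no plane
        j = unit(j)
        for a in range(3):
            for b in range(3):
                T[a][b] += j[a] * j[b]
    trace = T[0][0] + T[1][1] + T[2][2]
    if trace == 0.0:
        raise ValueError("all vector pairs coincide; no rotation needed")
    # The dominant eigenvector of (trace * I - T) is the eigenvector of T
    # with the minimum eigenvalue.
    M = [[(trace if a == b else 0.0) - T[a][b] for b in range(3)]
         for a in range(3)]
    v = unit((1.0, 0.7, 0.4))  # arbitrary starting vector
    for _ in range(iterations):
        v = unit(tuple(dot(row, v) for row in M))
    return v

def rotation_angle(estimated, measured, axis, step_deg=0.1):
    """Ran: angle (0 to 360 degrees) giving the minimum RSS about axis."""
    best_angle, best_rss = 0.0, float("inf")
    for k in range(int(round(360.0 / step_deg))):
        angle = k * step_deg
        rss = 0.0
        for e, m in zip(estimated, measured):
            r = rotate(e, axis, angle)
            rss += sum((ri - mi) ** 2 for ri, mi in zip(r, m))
        if rss < best_rss:
            best_angle, best_rss = angle, rss
    return best_angle, best_rss

# Synthetic check: "measured" vectors are "estimated" vectors rotated
# by 40 degrees about a known axis.
axis_true = unit((1.0, 2.0, 2.0))
estimated = [unit(v) for v in [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                               (0.0, 0.0, 1.0), (1.0, 1.0, -1.0)]]
measured = [rotate(v, axis_true, 40.0) for v in estimated]
rax = rotation_axis(estimated, measured)
ran, rss = rotation_angle(estimated, measured, rax)
```

Because an eigenvector is defined only up to sign, the recovered axis may point opposite to the true one; searching the full 0° to 360° range absorbs this ambiguity, since a rotation of θ about an axis equals a rotation of 360° − θ about its opposite.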
Two test models of a 200-m-long segment of the “Acquedotto Felice” in the Park of Aqueducts in the city of Rome (Italy) were constructed using 12 Mpx (megapixel) images, captured using a Xiaomi MiA1 smartphone (Fig. 2A). AngleCam, developed for the Android mobile operating system (http://anglecam.derekr.com), was used to obtain the camera attitude associated with each survey photo in the form of trend, plunge, and roll angle (Fig. 2B). First, the handset was set to airplane mode to reduce electromagnetic interference between the magnetometer and the smartphone’s computing hardware (the reader should note that recent findings indicate that having the airplane mode off does not significantly affect orientation measurements; Novakova and Pavlis, 2017). Moreover, the handset’s integrated compass and accelerometer were both calibrated using the provided calibration tool. A 51-photograph data set was acquired at a distance of ∼30 m from the aqueduct. Photos were acquired approximately perpendicular to the aqueduct, and at two opposing oblique angles (∼50°) to its strike.
The first test model was constructed using the entire photo data set (model 1; Fig. 2C) and resulted in a point cloud of nearly 4 × 10⁶ points. The second model consisted of a point cloud of 3 × 10⁶ vertices and was constructed only using photographs that were approximately perpendicular to the aqueduct, exhibiting poor image overlap. The survey regimen of the latter data set was designed to enhance the doming effect of the reconstructed scene in order to produce an intentionally deformed model (model 2; Fig. 2D). The four unit vectors of each photo (i.e., estimated and measured ξ and estimated and measured ρ) were used to determine the rotation matrix for each model. After rotating the estimated ξ and ρ, we obtain the estimated-and-rotated ξ and ρ of each photo.
Ideally, when measurement errors and model distortions do not occur, the estimated-and-rotated ξ and ρ for each photo should exactly coincide with the measured ξ and ρ. Therefore, differences between measured and estimated-and-rotated parameters provide an indirect estimation of model quality. Accordingly, the angular differences between the measured and transformed ξ and between the measured and transformed ρ, hereafter named Δξ and Δρ, respectively, were computed. The average of the absolute values of both parameters is nearly 2° for model 1, whereas it is ∼7° for model 2 (i.e., the model with induced geometric distortion) (Fig. 3A). In Figure 3, we also plot Δξ and Δρ versus the position of the photograph along the survey path and the measured photo direction (measured ξ) (Fig. 3B). These plots evidence a remarkable difference between the two models.
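A minimal Python sketch of this comparison is given below. It returns the unsigned angle between each measured and estimated-and-rotated vector and their average; the sign convention behind the signed values of Δξ and Δρ plotted in Figure 3 is not reproduced here, and the input vectors are hypothetical.

```python
import math

def angle_between_deg(a, b):
    """Unsigned angle between two 3-D vectors, in degrees."""
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    sin_term = math.sqrt(cx * cx + cy * cy + cz * cz)
    cos_term = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    # atan2 remains accurate for the small angles expected here.
    return math.degrees(math.atan2(sin_term, cos_term))

def mean_abs_delta(measured, rotated):
    """Average angular mismatch over all photos, in degrees."""
    deltas = [angle_between_deg(m, r) for m, r in zip(measured, rotated)]
    return sum(deltas) / len(deltas)

# Two hypothetical photos whose rotated directions are off by 2 and 4 degrees.
measured_xi = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
rotated_xi = [
    (math.sin(math.radians(2.0)), math.cos(math.radians(2.0)), 0.0),
    (math.sin(math.radians(4.0)), math.cos(math.radians(4.0)), 0.0),
]
average_delta_xi = mean_abs_delta(measured_xi, rotated_xi)
```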
In model 1, both Δξ and Δρ show poor correlation with the position along the survey path, these parameters being nearly −0.5° and 0.5° at the two ends of the survey path respectively. Moreover, for model 1, the line of best fit has a low R² (<0.02), indicating low residual error between AngleCam-measured and transformed SfM-estimated orientation of cameras. In model 1, both Δξ and Δρ correlate with the measured ξ, with the R² of the linear fit being >0.6. The measured ξ ranges between 200° and 280° and, in this 80°-wide interval, Δξ and Δρ pass from nearly −4° to 4°, with a slope of the line of best fit being ∼0.1°. For model 2, both Δξ and Δρ increase with increasing (measured) ξ, with a slope of the line of best fit being 0.3°, thus the difference between the measured and the estimated-and-rotated directions is more sensitive to the photo direction. However, R² is <0.3 for model 2. Although model 2 is more sensitive to photo direction than model 1, the distortions are still mostly dependent on survey position. Indeed, Δξ and Δρ correlate strongly with the position along the survey path, with an R² of the best-fit line of 0.9 and a slope of 26° (implying values of about −13° and 13° at the two ends of the survey path, respectively). Also, the linear regression of Δξ and Δρ versus position along the survey path shows a remarkable fit with the measured distortion of model 2 (indicated with red circular markers in Fig. 3B). In detail, the measured distortion represents the angular difference between the reconstructed and real-world scene. A similar analysis has not been conducted for model 1, as the distortion for this model is ostensibly <0.5° along the entire survey path, and the red markers in Figure 3B would all lie at y = 0.
In summary: (1) Δξ and Δρ have similar trends in all plots; (2) the “distorted” model (model 2) has higher average values for the absolute value of Δξ and Δρ, higher slopes of the best-fit linear regressions for the Δξ and Δρ versus measured ξ plots, and higher slopes of the best-fit linear regression for the plots of Δξ and Δρ versus position along the survey path; (3) the graph relating Δξ and Δρ to the position along the survey path (which is displayed only for model 2) overlaps the measured distortions along the model.
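The slopes and R² values discussed above come from ordinary least-squares fits, which can be sketched in Python as follows. The position along the survey path is assumed here to be normalized between 0 and 1, which is consistent with a slope of 26° corresponding to about −13° and 13° at the two ends; the data points are synthetic.

```python
def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept, with its R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2
                 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # R squared is undefined when y is constant; report 0 in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r2

# Synthetic domed model: the mismatch grows linearly along the path.
position = [0.0, 0.25, 0.5, 0.75, 1.0]
delta_xi = [26.0 * p - 13.0 for p in position]
slope, intercept, r2 = linear_fit(position, delta_xi)
```

The same function applies to the plots of Δξ versus measured ξ by passing the measured trend, in degrees, as the x values.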
In Figure 4, we show how well model 1 conforms to the geometry of the mapped scene. Figure 4A displays an orthophoto of the area of the Acquedotto Felice. Figure 4B displays the same orthophoto with topographic contours, and with the reconstructed 3-D model (green markers) seen in orthographic nadir view. Note that the reconstructed model shows excellent agreement with the northeasterly facing exterior wall of the Acquedotto Felice, with an angular deviation of <0.5°. This value becomes ∼2.5° when considering the magnetic declination of the area (∼3° east). Figure 4C displays the frontal view of the model using orthographic projection, with the slope computed between two points of known altitude (topographic contours displayed in Fig. 4B). The difference between the computed (2.11°) and real (1.07°) slope is close to 1°. Finally, using OpenPlot (Tavani et al., 2011), we computed the dip (i.e., the measure of how vertical the aqueduct wall is) of a 60-m-wide segment of the aqueduct, which is 89.5° (only the near-vertical portions of the aqueduct were used for this purpose) and serves to quantify the angular difference around a third axis nearly orthogonal to the previous two (i.e., the vertical axis and the axis parallel to the view direction of Fig. 4C). In summary, in order to fully fit the real geometry of the aqueduct, the reconstructed model 1 must be rotated ∼2° around a vertical axis and ∼1° about two mutually perpendicular horizontal axes.
Field models consist of three models of geological exposures from the Oman Mountains (also known as Al-Hajar Mountains; eastern Arabian Peninsula), where hundred-meter-wide poorly vegetated exposures were mapped using a Xiaomi MiA1 smartphone (12 Mpx; sensor size, 5.11 × 3.84 mm; focal length [35 mm equivalent], 26 mm) and a Nikon D5300 DSLR camera (24 Mpx; sensor size, 23.5 × 15.6 mm; focal length [35 mm equivalent], 27 mm). For model 3, no compass calibration was carried out before image capture, with airplane mode switched off during image acquisition. Conversely, for models 4 and 5, the smartphone was set to airplane mode and compass calibration was performed prior to photo acquisition.
Models of the three exposures were independently built from the smartphone and DSLR data sets using Agisoft PhotoScan (Fig. 5). We attempted to merge the two photographic data sets (i.e., smartphone and DSLR) within a single model. However, this led to point clouds with lower vertex densities when compared with models generated exclusively from DSLR images. Once constructed, the point clouds were imported into CloudCompare (https://www.danielgm.net/cc/), an open-source point cloud processing and analysis software tool, where the smartphone and DSLR models were manually aligned using a minimum of six keypoints. For each model, and for the overlapping area between smartphone- and DSLR-derived models, the vertex-to-vertex distance between compared models was computed. Deviations between the transformed models are displayed in Figure 5 as percentage of the exposure width. For all models, this difference is below ∼0.1%–0.2%. It is worth noting that such a value incorporates both geometric differences between the models and divergence related to the alignment procedure. An optimized alignment could potentially reduce the calculated disparities between the compared models. However, at this stage our main purpose is to establish the equivalence of the DSLR and smartphone models in order to demonstrate the reproducibility of results from each data-capture modality.
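The deviation metric can be illustrated with a brute-force nearest-neighbor search, shown below in Python. This is only a sketch of the idea and not the CloudCompare implementation, which is far more efficient for clouds of millions of vertices. The averaging of the distances and the tiny point clouds are assumptions made for illustration.

```python
import math

def mean_nearest_distance(cloud_a, cloud_b):
    """Mean distance from each vertex of cloud_a to its nearest
    neighbor in cloud_b (brute force, suitable for small clouds only)."""
    total = 0.0
    for p in cloud_a:
        total += min(math.dist(p, q) for q in cloud_b)
    return total / len(cloud_a)

def deviation_percent(cloud_a, cloud_b, exposure_width):
    """Deviation expressed as a percentage of the exposure width."""
    return 100.0 * mean_nearest_distance(cloud_a, cloud_b) / exposure_width

# Hypothetical aligned clouds of a 100-m-wide exposure.
smartphone_cloud = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
dslr_cloud = [(0.0, 0.0, 0.1), (100.0, 0.0, 0.3)]
deviation = deviation_percent(smartphone_cloud, dslr_cloud, 100.0)
```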
For the reorientation of smartphone models, the same procedure described for test models 1 and 2 was repeated. The measured and estimated camera orientation parameters were used to derive the rotation matrix and values of Δξ and Δρ for each photo. The average value of the modulus of Δξ and Δρ for models 4 and 5 is extremely low (<2°), while for model 3 it is 4.6° for both parameters. Due to the similar behavior of Δξ and Δρ, as seen in Figure 3, for these three field models, we plot only Δξ versus the position of the photo along the survey path and the measured photo direction ξ (Fig. 5). For all models, Δξ versus position along the survey path is characterized by linear regressions having a slope <3.2 and R² <0.31, which means that the maximum distortion between the edges of the models is ∼3°, indicating that the doming effect is negligible (especially for models 4 and 5). For models 4 and 5, the slope of the best-fit linear regression between Δξ and measured ξ is <0.06, whereas for model 3 it is 0.7. The derived rotation matrix was also used to rotate the three smartphone models in CloudCompare, obtaining oriented smartphone models. Oriented DSLR models (with higher resolution and improved noise characteristics) were then obtained by alignment to these previously oriented smartphone models.
Oriented DSLR-derived models, seen from above in orthographic projection mode (here termed DSLR orthophotos), are shown in Figure 5, along with the same area as seen in orthophotos from Google Maps. Some key features (e.g., vertical cliffs, large boulders, buildings, trees) have been traced on both Google Maps and DSLR orthophotos to scale the DSLR model and later estimate the angular difference between the oriented DSLR models and georeferenced aerial photography. Rotations around the vertical axis of <2° had to be applied to models 4 and 5 to match Google’s orthophotos (which remained <2° when considering the present-day magnetic declination of the area, 1.5° east). In contrast, model 3 is strongly misoriented, as indicated by a rotation >40° being required to match features observed in the DSLR orthophoto and Google’s orthophoto. It should be noted that user errors introduced during the tracing procedure do not appear to significantly affect these values: considering the resolution of both Google’s orthophotos and our DSLR models (∼1 m), errors resulting from the manual picking of objects are on the order of a couple of meters, which, for the studied 0.5-km-wide region of interest, translates into maximum admissible angular errors of <0.3°.
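The final estimate can be checked with simple trigonometry. The Python sketch below assumes that "a couple of meters" means 2 m of picking error acting perpendicular to a 500 m baseline; both values are interpretations of the text rather than measured quantities.

```python
import math

def angular_error_deg(picking_error_m, baseline_m):
    """Angle subtended by a transverse picking error over a baseline."""
    return math.degrees(math.atan2(picking_error_m, baseline_m))

# A 2 m picking error across a 0.5-km-wide region of interest.
error_deg = angular_error_deg(2.0, 500.0)
```

Under these assumptions, the result is roughly 0.23°, in line with the <0.3° bound quoted above.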
DISCUSSION AND BEST PRACTICE
The potential application of orientation of photographs, rather than the position of known features, to accurately orient SfM-MVS photogrammetry–derived models was recently suggested by Fleming and Pavlis (2018). Here, we have expanded upon this proposition by using the orientation parameters associated with smartphone-captured imagery to produce accurately oriented intermediate-resolution models. These were later used to successfully orient higher-resolution DSLR derived models. We have also computed the difference between measured and estimated camera orientation parameters, namely Δξ and Δρ, which provides an indirect indication of the quality of the reconstruction.
Both systematic and random errors occur in the smartphone measurements (Allmendinger et al., 2017), and SfM-MVS models always include distortions and range-error artifacts (Carrivick et al., 2016). Systematic measurement errors for smartphones are undetectable without an external control tool (e.g., the Google Maps orthophotos used in Figs. 4 and 5). Conversely, random measurement errors in the smartphone camera attitude or in the extrinsic camera calibration are expected to produce a mismatch between the measured and estimated camera orientation. Anecdotally, our case study results are in agreement with the above, whereby lower values of Δξ and Δρ correspond to models with null observable global geometric distortions. The average values of Δξ and Δρ for models 1, 4, and 5 are ∼2°, and, as observed in Figure 4, this corresponds to a mismatch between the reconstructed model and the real-world scene of ∼2°. The two models with distortions (i.e., models 2 and 3) are characterized by average values of Δξ and Δρ ranging between 4.6° and 7°. The source of error for these two models is different, as depicted by plots of Δξ versus position along the survey path and measured ξ. Figures 4 and 5 show that for models 1, 3, 4, and 5, no remarkable doming effect occurs. For all of these models, the graph of Δξ versus position along the survey path is characterized by a best-fit linear regression with low R² (<0.31) and low slope (ranging between −0.5 and 3.1). Conversely, for model 2, where we induced a doming effect, the slope value is ∼26 and the value of Δξ is essentially controlled by the position of the photograph along the survey path, as evidenced by the high R² (0.93). Also, it is worth noting that in this graph for model 2, the Δξ overlaps the measured distortion of the model. Accordingly, Δξ versus position along the survey path successfully describes the model’s geometric distortion associated with the doming effect.
The source of error for model 3 is instead associated with errors in the smartphone inclinometer and magnetometer measurement, which we attribute to the correct operating procedure not being applied prior to and during data capture (note that the compass and the accelerometer were not calibrated and airplane mode was not activated before photo acquisition for this model). This results in a model registration based upon unreliable orientation data. This is in line with previous work on the use of smartphones as measurement tools (Allmendinger et al., 2017; Novakova and Pavlis, 2017), indicating that recalibration of sensors before acquiring images for building a VOM should be practiced to avoid orientation measurement errors. The measurement errors in model 3 are manifested not only by the high average value of the modulus of Δξ and Δρ, but also by plots of Δξ versus measured ξ. Notably, the slope of the linear regression of Δξ versus measured ξ for model 3 is 0.9, whereas it is <0.1 for the optimized models (i.e., models 1, 4, and 5). To summarize, Δξ provides two additional parameters, which are the slopes of the best-fit lines of (1) Δξ versus measured ξ and (2) Δξ versus position along the survey path. These two derivative parameters serve to define the quality of a model and, in the case presented here, to understand the source of the model’s distortion. However, the Δξ parameter can also be used independently to discriminate between geometrically accurate and distorted models: anecdotally, we find that average values of Δξ <2° provide an indication of models reconstructed with a high degree of geometric fidelity.
A final note concerns the accuracy of smartphone sensors (i.e., magnetometer and accelerometer), which forms the most conspicuous potential source of error within the presented workflow. We have provided strong evidence supporting the high degree of accuracy of smartphone-based attitudinal measurements (i.e., errors <1° for the test model 1). This is not surprising, as smartphones have previously been successfully employed as a compass during many field campaigns. During these campaigns, comparison with data acquired by means of a Silva compass consistently indicated that errors are typically <2°. To quantify these discrepancies more accurately, we suggest that users conduct stability and accuracy analyses before employing specific smartphones for orientation data collection, as individual models of handsets may have error characteristics that are different from those of the model used within this study.
In this work, we have proposed a workflow to produce properly oriented 3-D models by means of terrestrial SfM-MVS photogrammetry, which involves the use of smartphone photos. Our results are encouraging and indicate that high-precision registrations of 3-D reconstructed scenes can be achieved in a few simple steps. Orientation parameters of smartphone photos can be used with the twofold purpose of orienting high-resolution DSLR models and providing input for estimating model quality. We have identified three parameters defining the quality of a model: (1) the average value of the modulus of Δξ and Δρ, which, roughly, is 2° or less for high-quality models with null geometric distortion, and >4° for models having noticeable distortion; (2) the slope of the best-fit line of a plot of Δξ versus position along the survey path, which in high-quality models lies between −3 and 3; and (3) the slope of the best-fit line of a plot of Δξ versus measured ξ, which in high-quality models is ≤0.1.
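These three criteria can be gathered into a simple screening function, sketched below in Python. The thresholds are the approximate, anecdotal values quoted above and should not be read as strict limits; the function and key names are hypothetical, and taking the absolute value of the second slope is an interpretation of the text.

```python
def assess_model(mean_abs_delta_deg, slope_vs_position, slope_vs_measured_xi):
    """Screen a model against the three indicative quality criteria."""
    return {
        # (1) average modulus of the angular mismatch
        "low_mean_delta": mean_abs_delta_deg <= 2.0,
        "noticeable_distortion": mean_abs_delta_deg > 4.0,
        # (2) trend along the survey path (doming effect)
        "no_doming_trend": abs(slope_vs_position) < 3.0,
        # (3) dependence on the photo direction (sensor errors)
        "direction_independent": abs(slope_vs_measured_xi) <= 0.1,
    }

# Values reported for the intentionally distorted model 2.
model_2 = assess_model(7.0, 26.0, 0.3)

# Hypothetical values for a well-behaved model.
good_model = assess_model(2.0, 1.0, 0.1)
```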
Comments by Reuben Hansman, Zachariah Fleming, and Terry Pavlis greatly helped us to improve the original version of the paper.