Studies of active fault zones have flourished with the availability of high-resolution topographic data, particularly where airborne light detection and ranging (lidar) and structure from motion (SfM) data sets provide a means to remotely analyze submeter-scale fault geomorphology. To determine surface offset at a point along a strike-slip earthquake rupture, geomorphic features (e.g., stream channels) are measured days to centuries after the event. Analysis of these and cumulatively offset features produces offset distributions for successive earthquakes that are used to understand earthquake rupture behavior. As researchers expand studies to more varied terrain types, climates, and vegetation regimes, there is an increasing need to standardize and uniformly validate measurements of tectonically displaced geomorphic features. A recently compiled catalog of nearly 5000 earthquake offsets across a range of measurement and reporting styles provides insight into quality rating and uncertainty trends from which we formulate best-practice and reporting recommendations for remote studies. In addition, a series of public and beginner-level studies validate the remote methodology for a number of tools and emphasize considerations to enhance measurement accuracy and precision for beginners and professionals. 
Our investigation revealed that (1) standardizing remote measurement methods and reporting quality rating schemes is essential for the utility and repeatability of fault-offset measurements; (2) measurement discrepancies often involve misinterpretation of the offset geomorphic feature and are a function of the investigator’s experience; (3) comparison of measurements made by a single investigator in different climatic regions reveals systematic differences in measurement uncertainties attributable to variation in feature preservation; (4) measuring more components of a displaced geomorphic landform produces more consistently repeatable estimates of offset; and (5) inadequate understanding of pre-event morphology and post-event modifications represents a greater epistemic limitation than the aleatoric limitations of the measurement process.

The geomorphic expression of active fault zones contains valuable information about earthquake surface ruptures, including offset amounts and their distribution along and across a fault. Where a dominant sense of slip persists, horizontally and vertically offset geomorphic features can be used to constrain cumulative offset after initiation and quasi-stabilization of the landforms (e.g., Wallace, 1968, 1990; Burbank and Anderson, 2001; Cowgill, 2007; McCalpin, 2009) (Fig. 1). In addition to developing long-term slip histories, offset features can be coupled with paleoseismic and geochronologic constraints to reconstruct surface offset distributions of successive earthquake events. Such information is essential for estimation of paleo-earthquake extents and magnitudes and for evaluation of conceptual models for earthquake recurrence (e.g., Shimazaki and Nakata, 1980; Sieh and Jahns, 1984; Schwartz and Coppersmith, 1984; Field et al., 2014).

Several recent studies have highlighted the scientific potential of high-resolution topographic data sets for reconstruction of strike-slip surface offsets, formulation and evaluation of earthquake recurrence models, and earthquake forecasts (e.g., Hudnut et al., 2002; Haugerud et al., 2003; Grant Ludwig et al., 2010; Zielke et al., 2010, 2012; Salisbury et al., 2012). Direct reconstruction of slip in earthquakes using these data will come from full three-dimensional differencing of data sets that are recorded before and after great earthquakes (where they exist; e.g., Borsa and Minster, 2012; Nissen et al., 2012, 2014; Oskin et al., 2012). Where full displacement fields are unattainable (i.e., for past earthquakes), surface slip accumulation patterns come from reconstruction of preserved offset landforms measured in the field or with high-resolution topography derived either from lidar scans or structure from motion (SfM) photogrammetry.

The increasing availability of high-resolution topographic data and resources for offset reconstructions is a provocative prospect, but models of slip accumulation are only as reliable as the individual slip measurements on which they are based. In practice, making reliable measurements in the field or with remotely assessed high-resolution topographic data is not a trivial task, in part because the initial conditions (i.e., shape) of the channel or marker are not known, and because the geomorphic modification of an offset feature is often not well understood or constrained. In addition, the ability to reliably assess offset landforms is controlled by user experience and is one of many factors (e.g., climatic calibrations, geomorphic evolution) whose influences we must research as we pursue these types of studies.

In this paper we examine the influence of operator decisions on remote offset measurements to provide a framework for remotely analyzing strike-slip offset. Although there are many similarities, dip-slip faults produce a fundamentally different type of offset landforms with somewhat different sources of uncertainty. We provide best-practice recommendations for making remote measurements of tectonically offset geomorphic features, provide information regarding the best way to report measurement data for fault behavior analysis, and provide insight into common challenges faced when making remote measurements. This work represents a critical step toward enhancing consistency of analyses based on high-resolution digital topography and establishes community protocols for future work.

We used a database of meter-scale slip measurements compiled from numerous paleoseismic and tectono-geomorphic studies of active faults in California (Uniform California Earthquake Rupture Forecast version 3, UCERF3; Madden et al., 2013; Field et al., 2014) to summarize the existing field- and remotely based measurements of offset geomorphic features, compare the existing field-based measurements with new remote offset measurements, and investigate the benefits of standardizing offset measurement methods and reporting schemes. With a series of public surveys, we explored the influence of investigator experience, offset quality, and measurement tools on the repeatability of remotely measured fault-offset geomorphic features and factors that can affect measurement accuracy.

The early studies of offset geomorphic features were conducted in the field with the support of aerial photographs (e.g., Wallace, 1968; Sieh, 1978; Lienkaemper and Sturm, 1989). However, comprehensive field examinations are often impractical or impossible because of temporal, financial, and land access limitations. In such conditions, remotely sensed data sets alone can provide additional coverage and are useful for the identification of new faults and displaced geomorphic features. Aerial imagery and high-resolution (i.e., <1 m/pixel) topographic data sets increasingly supplement and sometimes replace conventional field studies. Measuring offsets is not trivial, and the measurements must be made with care because they directly influence the resulting slip accumulation patterns. It is therefore important to estimate the reliability and repeatability of these remote-sensing measurement approaches.

We restrict this discussion to high-resolution topography and assess the effectiveness and repeatability of tools that are currently used for purely remote measurements, given variable user skill levels. For a complete description of those tools and underlying methods, please see “Measuring Earthquake-Generated Surface Offsets from High-Resolution Digital Topography” in Supplemental File 1. A recent review (Zielke et al., 2015) also summarized aspects of this activity. In this paper we explore in depth the validation of offset measurements. We briefly introduce the UCERF3 data set as a basis for our study, the major assumptions inherent to studies of offset features, and the generally accepted methods for making offset measurements.

Uniform California Earthquake Rupture Forecast 3

The Working Group on California Earthquake Probabilities recently updated databases describing active faults and paleoseismicity within California in a major effort known as the Uniform California Earthquake Rupture Forecast 3 (UCERF3; Field et al., 2013, 2014). The UCERF3 offset database focuses on California’s fast-slipping strike- and dip-slip faults, combining historic, prehistoric, paleoseismic, and geomorphic data for single and multievent offsets. The database (UCERF3, Appendix R; Madden et al., 2013) represents the best available compilation of fault offset data from a variety of investigators, faults, environments, base maps, and quality rating schemes (Fig. 2), and provides an excellent opportunity to examine a large number of results from paleoseismic and tectono-geomorphic studies. However, this data diversity was also challenging because it meant we had to regularize the different data sets to make them comparable.

Madden et al. (2013) standardized the quality rating schemes used to rank offsets in existing studies by assigning each UCERF3 database entry a rank from 1 to 3, where 1 is high (best) quality, and 3 is low (worst) quality. Offset measurements lacking a quality rating (predominantly from historic ruptures) were assigned a quality of 1 for the UCERF3 compilation. For potentially hazardous fast-slipping faults that did not have existing work, Madden et al. (2013) generated new measurements of meter-scale offsets from analyses of high-resolution topography data sets using a standard measurement protocol. In this study we treat this subset of new measurements as its own data set as we explore methodologies and build upon the reporting standards proposed by Madden et al. (2013). We use the UCERF3 offset database as part of our examination of repeatability of surface offset measurements from high-resolution topography. An examination of field-based offset measurements with similar intentions was conducted by Scharer et al. (2014). A review of data types and methodologies of some recent studies with considerations for earthquake recurrence models was provided in Zielke et al. (2015).

Offset Channel Measurements

Inherent to the reconstruction of lateral slip in past earthquakes using offset landforms are four major assumptions: (1) offset along faults occurs coseismically and postseismically (with no significant interseismic contribution by creep); (2) deformation is focused along the fault with little to no off-fault deformation; (3) the frequency of erosional and depositional events that sculpt landforms is such that sufficient markers are generated between successive earthquakes (if geomorphic markers are altered less frequently than earthquake recurrence, even the smallest discernable offsets may represent multiple earthquake ruptures); and (4) offset in successive earthquakes is large enough to be uniquely recognized in an offset landform. These assumptions may not always hold in the reconciliation of inferred offset sequences from landforms and rupture sequences from paleoseismology (e.g., Zielke et al., 2010, 2015; Akciz et al., 2010; Grant Ludwig et al., 2010).

The preservation of offset markers is dependent upon a variety of conditions; the fidelity of landforms to record tectonic offset depends not only on the original shape and orientation with respect to the fault, but also on the climatically controlled postoffset erosional and depositional modifications to the feature. Following tectonic displacement, pre-earthquake patterns of aggradation (or degradation) can be altered (Haddad et al., 2012), and in some instances, streams with high transport capacity may bury or erode tectonic offsets completely. Most important, the relationship between the size of a geomorphic feature and the magnitude of tectonic offset will ultimately dictate whether an earthquake will leave a distinct mark in surficial geomorphology (Cowgill, 2007). Given that surface-rupturing strike-slip earthquakes typically produce surface offsets of 1–10 m (Wells and Coppersmith, 1994), ephemeral channels of 1–100 m width provide the best opportunity to measure past offsets and are thus the most common for developing slip reconstructions (San Andreas fault [SAF]: Wallace, 1968; Sieh, 1978; Lienkaemper, 2001; Zielke et al., 2010, 2012; Garlock fault [GF]: McGill and Sieh, 1991; San Jacinto fault [SJF]: Salisbury et al., 2012; Elsinore fault [EF]: Rockwell and Pinault, 1986; Rockwell, 1990; Talas Fergana fault: Trifonov et al., 1992; Altyn Tagh fault: Washburn et al., 2001; Fuyun fault: Awata et al., 2010; Klinger et al., 2011; North Anatolian fault: Kondo et al., 2005, 2010; Bocono fault: Audemard et al., 2008; Denali 2002 earthquake: Haeussler et al., 2004; see also reviews by McCalpin, 2009; Yeats et al., 1997; Burbank and Anderson, 2001).

An offset measurement typically contains multiple parts: the quantitative measurement of tectonic offset, the quantitative, aleatoric uncertainty of that measurement, and an assessment of epistemic quality associated with the measurement. Measuring the tectonically offset features requires delineation of several geomorphic components, including the fault trace orientation and width, offset landform elements (e.g., the channel margins or thalweg), and the projection lines of landform elements into the surface fault trace (e.g., the piercing line). The along-fault distance between landform element projections is the offset measurement (Sieh, 1978; Lienkaemper and Sturm, 1989; Lindvall et al., 1989; Lienkaemper, 2001) (Fig. 1). Quantitative (aleatoric) uncertainty of the measurement typically comes from assessment of minimum and maximum credible offset reconstructions (e.g., Lienkaemper, 2001). The acceptable offset range is dependent on the scale of geomorphic features versus magnitudes of offset, the clarity of landform features, and the precision of particular measurement tools.
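The geometry described above can be illustrated with a minimal sketch. It assumes a locally straight fault trace and straight piercing lines in map coordinates (meters); the function names and the sign convention (downstream position minus upstream position) are illustrative assumptions, not part of any published tool.

```python
import math

def along_fault_position(fault_pt, fault_dir, p1, p2):
    """Along-fault coordinate (m) where the line through p1 and p2
    intersects a straight fault trace through fault_pt with strike
    direction fault_dir. All inputs are (x, y) tuples in meters."""
    dx, dy = fault_dir
    norm = math.hypot(dx, dy)
    dx, dy = dx / norm, dy / norm          # unit strike vector
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]  # piercing-line direction
    denom = dx * ey - dy * ex              # 2-D cross product d x e
    if abs(denom) < 1e-12:
        raise ValueError("piercing line is parallel to the fault trace")
    wx, wy = p1[0] - fault_pt[0], p1[1] - fault_pt[1]
    # Solve fault_pt + s*d = p1 + t*e for s
    return (wx * ey - wy * ex) / denom

def lateral_offset(fault_pt, fault_dir, upstream, downstream):
    """Signed along-fault distance between the projections of the
    upstream and downstream landform elements (each a pair of points)."""
    s_up = along_fault_position(fault_pt, fault_dir, *upstream)
    s_down = along_fault_position(fault_pt, fault_dir, *downstream)
    return s_down - s_up

# Hypothetical example: fault along the x-axis, thalweg displaced 4 m
offset_m = lateral_offset(
    fault_pt=(0.0, 0.0), fault_dir=(1.0, 0.0),
    upstream=((0.0, 10.0), (0.0, 2.0)),
    downstream=((4.0, -2.0), (4.0, -10.0)),
)
```

Repeating the calculation with the minimum and maximum credible projections of each landform element would bracket the offset in the manner described for the aleatoric uncertainty.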

The quality rating is an assessment made by the geologist and depends on the simplicity of landform projections and fault trace delineations. This rating has been conducted several ways, but typically high-quality measurements are made from obvious fault-normal piercing lines that are offset by narrow, well defined fault zones; low-quality measurements are made from less-obvious, ambiguous, poorly preserved, or highly oblique piercing lines that are offset by a broad, poorly defined fault trace (Sieh, 1978; Lienkaemper, 2001; Madden et al., 2013). We represent these two important quality controls with a bivariate rubric that compares obliquity between the offset feature elements and the fault zone with fault zone width (as an indicator of structural complexity) (Fig. 3).

One of the primary controls on measurement accuracy stems from the difficulty of remotely interpreting the evolutionary history of a landform (both before and after tectonic perturbations). These types of epistemic uncertainties directly control the soundness of quantitative, aleatoric uncertainties and are often difficult to unravel without field excavations (Scharer et al., 2014). However, a rating scheme of some sort helps to systematize what is discernable in the topography and it can be useful for subsequent data compilation and stacking, effectively emphasizing highly reliable measurements and deemphasizing questionable ones (e.g., McGill and Sieh, 1991; McCalpin, 2009; Zielke et al., 2010, 2015; Klinger et al., 2011; Salisbury et al., 2012; Madden et al., 2013).

There are significant advantages to using imagery and high-resolution topography to measure surface offsets. Aerial views of offset features preclude some of the foreshortening associated with human perspectives on the ground and in some instances (e.g., in dense vegetation) can provide a more representative view of the offset geomorphic feature (e.g., Lienkaemper, 2001; Salisbury et al., 2012). Furthermore, the ability to change lighting direction (hillshade rendering) helps to illuminate features in complex terrain (Oskin et al., 2007). Klinger et al. (2011) used the aerial perspective to assess the quality of an aggregate of channels after a single restorative back-slipping step. Similar to Lienkaemper and Sturm (1989), Zielke and Arrowsmith (2012) utilized recently acquired high-resolution topography to define channel shape for automatic detection of piercing lines with minimal subjective user input with a program called LaDiCaoz (lateral displacement calculator, by O. Zielke).

Reporting Offset Measurements

Historically, individual offset measurements have been reported using a range of approaches. Offset is usually presented as a single measurement (typically the offset reconstruction preferred by the scientist) with uncertainties on that measurement (e.g., Lienkaemper, 2001; Sieh, 1978). Most of the literature does not discuss which probability distribution should be used to describe the measurement. Exceptions include McGill and Sieh (1991), who assumed that a Gaussian probability distribution was appropriate and used the preferred measure and uncertainties as the mean and 2σ uncertainties, respectively (Fig. 4). Subsequent studies have experimented with alternative probability distribution shapes. In instances where offset reconstructions are less clear and preferred offset estimates span several meters, a rectangular or trapezoidal (boxcar) distribution is useful (Brooks et al., 2013; Fig. 4). Alternatively, triangular probability density functions (PDFs) provide a simple representation of measurement data, particularly when measurement uncertainties are asymmetric (Madden et al., 2013; Fig. 4).
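The three distribution shapes discussed above can be written out as a minimal sketch. The function names and parameterization are illustrative assumptions: the Gaussian takes the preferred offset as its mean and the reported uncertainty as 2σ, the rectangular (boxcar) form is uniform between the minimum and maximum offsets, and the triangular form uses the minimum, preferred, and maximum offsets.

```python
import math

def gaussian_pdf(x, preferred, two_sigma):
    """Gaussian density with mean = preferred offset and the reported
    uncertainty interpreted as 2 sigma."""
    sigma = two_sigma / 2.0
    z = (x - preferred) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def boxcar_pdf(x, lo, hi):
    """Rectangular density: uniform between minimum and maximum offset."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def triangular_pdf(x, lo, mode, hi):
    """Triangular density; lo, mode, hi may be asymmetric about the mode."""
    if x < lo or x > hi:
        return 0.0
    if x <= mode:
        if mode == lo:                      # peak sits at the lower bound
            return 2.0 / (hi - lo)
        return 2.0 * (x - lo) / ((hi - lo) * (mode - lo))
    return 2.0 * (hi - x) / ((hi - lo) * (hi - mode))

def integrate(pdf, a, b, n=20000):
    """Trapezoid-rule check that a density integrates to about 1."""
    h = (b - a) / n
    total = 0.5 * (pdf(a) + pdf(b))
    total += sum(pdf(a + i * h) for i in range(1, n))
    return total * h

# Hypothetical measurement: preferred 3.0 m, range 2.5 to 4.5 m
area_tri = integrate(lambda x: triangular_pdf(x, 2.5, 3.0, 4.5), 0.0, 10.0)
```

The triangular form accommodates the asymmetric case directly, whereas the Gaussian form requires a single symmetric uncertainty.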

Several recent studies use LaDiCaoz, a Matlab script, to determine the offset (Chen et al., 2015; Salisbury et al., 2012; Zielke and Arrowsmith, 2012). This program determines the offset by improving the goodness of fit between two cross-feature profiles upstream and downstream of the fault; from this, the user determines preferred offset and a range of offsets. When these are assumed to be Gaussian PDFs, the best estimate of offset magnitude is the mode, and plus-minus estimates (aleatoric uncertainties) represent ±2σ uncertainties (black curves, Fig. 4).
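The profile-matching idea can be illustrated with a simplified sketch. This is not the LaDiCaoz code and makes strong assumptions: both profiles are sampled at the same uniform spacing along the fault, only a horizontal shift is tested, and goodness of fit is taken to be the mean squared elevation difference over the overlapping samples. The function name and the minimum-overlap threshold are hypothetical.

```python
def best_backslip(upstream, downstream, spacing_m, max_lag, min_overlap=10):
    """Return (offset_m, misfit) for the horizontal shift that minimizes
    the mean squared elevation difference between two fault-parallel
    profiles. Profiles are lists of elevations at uniform spacing."""
    best = None
    for lag in range(-max_lag, max_lag + 1):
        # Pair each upstream sample with the downstream sample 'lag' steps away
        pairs = [
            (upstream[i], downstream[i + lag])
            for i in range(len(upstream))
            if 0 <= i + lag < len(downstream)
        ]
        if len(pairs) < min_overlap:
            continue  # too little overlap to give a meaningful misfit
        misfit = sum((u - d) ** 2 for u, d in pairs) / len(pairs)
        if best is None or misfit < best[1]:
            best = (lag * spacing_m, misfit)
    return best

# Synthetic V-shaped channel, displaced by 6 samples (3.0 m at 0.5 m spacing)
up_profile = [abs(i - 20) * 0.1 for i in range(50)]
down_profile = [abs(i - 26) * 0.1 for i in range(50)]
offset_m, misfit = best_backslip(up_profile, down_profile, 0.5, max_lag=15)
```

Inspecting how the misfit varies with the shift, rather than only its minimum, is one way a user might judge a credible range of offsets.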

Representing offset magnitudes as distributions offers an intuitive method of combining individual PDFs along strike for cumulative offset probability distributions (COPDs) for a fault reach (over length of 10²–10³ m). COPDs may reveal groups of similarly offset geomorphic features that represent slip in individual ground-rupturing events, a technique pioneered by McGill and Sieh (1991), among others. For this step, the individual PDFs can be scaled according to their qualitative ranking to create weighted COPDs, thereby emphasizing offsets with low epistemic uncertainties and deemphasizing those with high epistemic uncertainties. Each style of measurement representation has distinct advantages and disadvantages in terms of true representation of the epistemic and aleatoric uncertainties and the generation of the COPD. An additional review on COPDs was provided in Zielke et al. (2015), including their construction and interpretation.
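A weighted COPD of this kind can be sketched as a sum of individual PDFs evaluated on a common offset grid. The sketch assumes Gaussian PDFs with the reported uncertainty treated as 2σ; the quality weights in `QUALITY_WEIGHT` are hypothetical values chosen only for illustration, and the sample measurements are invented.

```python
import math

# Hypothetical weights: quality 1 (best) counts fully, 3 (worst) least
QUALITY_WEIGHT = {1: 1.0, 2: 0.5, 3: 0.25}

def gaussian_pdf(x, preferred, two_sigma):
    """Gaussian density with the reported uncertainty taken as 2 sigma."""
    sigma = two_sigma / 2.0
    z = (x - preferred) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def weighted_copd(measurements, x_min, x_max, step):
    """Sum quality-weighted PDFs on a regular grid of offsets (m).
    measurements: iterable of (preferred_m, two_sigma_m, quality)."""
    n = int(round((x_max - x_min) / step)) + 1
    grid = [x_min + i * step for i in range(n)]
    density = [
        sum(QUALITY_WEIGHT[q] * gaussian_pdf(x, pref, unc)
            for pref, unc, q in measurements)
        for x in grid
    ]
    return grid, density

# Invented offsets clustering near 3 m and 6 m along one fault reach
sample = [(3.0, 0.6, 1), (3.2, 0.8, 1), (2.9, 1.0, 2),
          (6.1, 1.0, 2), (6.4, 2.0, 3)]
grid, density = weighted_copd(sample, 0.0, 10.0, 0.05)
peak_offset = grid[density.index(max(density))]
```

Because the high-quality measurements carry more weight and have narrower PDFs, the dominant peak in this invented example falls in the lower cluster; changing the weights changes the relative prominence of the peaks, which is why the weighting scheme should be reported with the COPD.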

We report what factors ultimately control the overall accuracy of a measurement and what level of precision is achievable by users of different skill levels with tools of varying complexity based on analysis of the UCERF3 offset feature database and our own controlled experiments.

Analysis of the UCERF3 Compilation

We utilized the UCERF3 database to summarize traits of existing measurements of geomorphic features offset by as much as 20 m. We noted the number of component measurements (i.e., individual horizontal and vertical offset measurements) in addition to the number of unique geographic measurement sites. In many instances, multiple measurements were made at the same location (horizontal and vertical offsets recorded by the channel thalweg and one or two of the channel margins) or multiple measurements were made using different methods (lidar, aerial photographs, field measurements) for the same feature. For an investigation of method reliability, we assessed the consistency of replicate measurements made at a point with different tools. We mined the database to compare existing field-based measurements with lidar-derived offset measurements where both exist for particular landforms, and we analyzed new lidar-derived measurements made specifically for the UCERF3 effort.

Offset Measurement Validation Experiment

We explored validation of offset measurements by inviting the participation of students, colleagues, geoscience community members, and the general public to measure 10 predefined geomorphic offsets using high-resolution topography as a base. Our experiment consisted of two major components: an online public survey element (conducted fall 2012–fall 2013; n = 55 participants) and a classroom-based hardcopy element (conducted fall 2012–spring 2014; n = 102 participants). The setup for both was the same: we chose 10 different offset features from major active faults in western North America and asked people to measure them. The materials used in this study are provided in Supplemental Files 2 and 3. We focused primarily on major strike-slip faults where geomorphic features that developed roughly normal to fault strike are horizontally offset by single or repeated surface-rupturing earthquakes. Most fault-offset features we chose are of fluvial origin (e.g., channel walls, margins, or thalwegs) and are composed of elements that can be projected to the fault plane and used as piercing lines to estimate fault slip. Features vary in estimated age from several to hundreds of years old and are of poor to excellent quality. Participants were told that offsets were along northwest-striking right-lateral faults, but in general there was no annotation of the figure to indicate the fault or offset. Site locations are shown as yellow stars in Figure 2. Survey responses (including mapped fault traces and piercing lines) were anonymously submitted to an online database or the document was filled out by hand and mailed to us.

In addition to the measurement results from the surveys, we collected information about experience levels of participants with three questions. The first question asked about general experience level: (1) I have no prior experience whatsoever. (2) I am familiar with the basic geologic principles and/or high-resolution topographic data. (3) I have measured offset geomorphic features in the field or with high-resolution topography/imagery. (4) I have extensive experience measuring offset features in the field or with high-resolution topography/imagery.

The second question gathered information about data types that one may have previously used to measure offset features (field methods, aerial photography, high-resolution digital elevation models) and how measurements were made (tape measure or ruler, total station, Google Earth, geographic information systems [GIS]). The third question asked whether one had taken or taught field geology, geomorphology, earthquake geology, Quaternary geology, tectonic geomorphology, or GIS.

We selected three primary methods by which to complete the survey in order to reflect the range of work styles and experience of current researchers. The different tools included a paper image and scale, the Google Earth ruler tool, and a Matlab GUI (graphical user interface) for calculating backslip required to properly restore tectonic deformation (LaDiCaoz; Zielke and Arrowsmith, 2012). In one subexperiment we used a simpler variant of LaDiCaoz for classroom studies, allowing the fault restoration to be determined by progressively backslipping images of topography without a corresponding explicit goodness-of-fit determination.

The paper-based survey was designed to be suitable for classroom dissemination, but some individual participants also used it. The survey was used in undergraduate geology classes at San Diego State University, Arizona State University, and the University of Potsdam, Germany. Each image consisted of a combination of three lidar-derived products: an opaque hillshade, a semitransparent digital elevation model (DEM), and a contour map. We used both EarthScope (Prentice et al., 2009) and B4 lidar (Bevis et al., 2005) data, the latter of which were manually filtered to remove vegetation using a multiscale curvature classification algorithm (Evans and Hudak, 2007). Map scales and contour intervals ranged from 1:175 to 1:800 and 10 to 100 cm, respectively. Participants were asked to delineate the fault and geomorphic features (e.g., channel thalweg, channel margins, bar crest) used to estimate tectonic offset. Each page had a scale bar on the bottom right corner that was torn off and used for measuring. Participants were asked to report the measurements and uncertainties and to rate the quality of the offset using the provided rubric.

The Google Earth–based measurement survey was popular because of convenience. We saved georeferenced map images from the paper survey as *.kmz files and provided them for download from the survey webpage. It was therefore possible to zoom to each site, view topographic imagery and contextual image data, delineate features, and measure offsets with little GIS experience. Survey instructions included step by step text as well as short YouTube video tutorials on the use of the Google Earth application for this purpose. For each site and/or image, the participants (1) zoomed to the site, (2) defined the fault and offset features as paths for at least one offset (but they were encouraged to use multiple offset landscape elements), (3) measured the offset features using the ruler tool, and (4) saved the result from the measurement with a title corresponding to the analyzed feature (e.g., “channel thalweg measurement”). In addition, we asked for any other comments to be included with the measurement path description.

The resulting measurements and line work were saved as a location file in Google Earth (*.kmz) and anonymously uploaded to our database upon completion of the experience survey.

The LaDiCaoz graphical user interface allows for direct interaction with DEMs to measure and record horizontal offsets (Zielke and Arrowsmith, 2012). We provided raw, small-scale DEM files for each of the ten sites and assumed that participants had experience using LaDiCaoz to measure offset features. We circled targets on the topographic images to ensure that participants measured the same offsets because the small-scale maps contained several offset features. LaDiCaoz allows users to save preferred offset measurements, the measurement uncertainties, and the quality ratings. These results were anonymously uploaded to our server upon completion of the experience survey.

While the available measurement methods spanned a range of complexity, most submitted responses were generated in Google Earth. We sifted results manually and compiled offsets, measurement uncertainties, and quality ratings, grouping measurements by particular geomorphic features as some sites contained multiple offset stream channels. In the case of our Google Earth results, we collected traces that participants used to delineate fault zones and offset landforms for graphical comparison.

The UCERF3 database and our experimental survey provide a rich suite of data on which to build our understanding of offset measurements. We are also able to explore the controls of measurement accuracy for different groups of investigators. We start our presentation of results with exploration of the UCERF3 database and measures of offset magnitude, uncertainty, and quality, as measured from different physiographic settings by different investigators using lidar and field-based approaches. Transitioning from the UCERF3 examination, the final presented results come from our measurement experiment survey that included both the online public and the classroom hardcopy elements of this study.

Analysis of the UCERF3 Offset Database

There are 4918 component measurements (individual horizontal and vertical slip measurements) made at 1522 geographic locations along UCERF3-defined fault strands (Figs. 2 and 5; Madden et al., 2013). Of the total component measurements, 2759 are from historic earthquake ruptures (22 UCERF3 segments) and 2159 are of prehistoric offsets (40 UCERF3 segments). Most measurements in the UCERF3 database are of the highest quality rating (1) (Fig. 2), principally because many existing measurements had no initial quality rating and were assigned a high quality rank in the UCERF3 compilation (Madden et al., 2013). Most of them are also from twentieth century California earthquakes so the presumption of high quality preservation is reasonable.

Measurement methods differ significantly between the historic and prehistoric offset groups. Historic surface rupture measurements are dominated by field measurements, whereas the majority of prehistoric earthquake slip measurements are a combination of field- and/or lidar-based measurements (Figs. 5A, 5C). Measurements for more than half of the studied faults (16 of 25 strands) are exclusively from paleoseismic excavations with relatively few offset measurements (Fig. 2A). Slip measurements in these cases are made from subsurface channel or structural reconstructions with a wide range of uncertainties (Fig. 5D).

In most cases, only one measurement is made for each location, but for some faults there are significantly more measurements than measurement sites (Figs. 5B, 5D). This is particularly apparent along the prehistoric San Jacinto fault and Garlock fault ruptures, where there exist both field and lidar measurements for the same set of features. For historic ruptures such as along the Emerson fault (1992 Landers earthquake), there are many sites with both horizontal and vertical measurements made for the same geomorphic feature.

We note a crude logarithmic relation between offset magnitude and associated measurement uncertainty (Fig. 6). For the smallest field-based measurements (millimeter- and centimeter-scale historic earthquake ruptures), there are often no measurement uncertainties assigned. Where it can be determined, field-based historic earthquake measurements tend to have lower uncertainties for a given offset than prehistoric earthquake measurements, the majority of which involve more degraded geomorphic features analyzed using aerial photographs or high-resolution topography. Many different investigators made these measurements using a variety of methods in a range of site conditions.

In contrast, the new remote measurements compiled and generated for the UCERF3 database all used similar methods and reporting schemes, albeit by different investigators along different faults (Elsinore, Garlock, Owens Valley, creeping, Cholame, Carrizo, Big Bend, Mojave, and Coachella portions of the SAF and the Clark strand of the SJF). Note that the SJF and GF have accompanying field studies, and the 2004 Parkfield earthquake rupture measurements are not included. We categorize these new measurements according to the same subjective, semiquantitative quality ranking scheme (described in the discussion of methods), where 1 is for the highest quality and 3 is for the lowest quality; we use these quality ratings to compare other measurement attributes (Figs. 3 and 7). We define the difference between maximum and minimum estimated offsets for a feature as the acceptable offset range (AOR). For experienced investigators, AORs correlate with offset magnitude. The SAF is special in several ways: it has significantly more measurements, the largest average offset magnitudes per fault segment (>10 m) with correspondingly large AORs, and is the only fault system with increasing AORs and worsening quality ratings.
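The AOR definition lends itself to a short worked sketch. The records below are invented for illustration and the function names are hypothetical; the only element taken from the text is that the AOR is the maximum minus the minimum estimated offset, summarized here by quality rating.

```python
from collections import defaultdict

def acceptable_offset_range(min_offset_m, max_offset_m):
    """AOR: difference between maximum and minimum estimated offsets (m)."""
    if max_offset_m < min_offset_m:
        raise ValueError("maximum offset is smaller than minimum offset")
    return max_offset_m - min_offset_m

def mean_aor_by_quality(records):
    """Average AOR for each quality rating (1 = best, 3 = worst).
    records: iterable of (min_offset_m, max_offset_m, quality)."""
    grouped = defaultdict(list)
    for lo, hi, quality in records:
        grouped[quality].append(acceptable_offset_range(lo, hi))
    return {q: sum(v) / len(v) for q, v in sorted(grouped.items())}

# Invented records: (minimum offset, maximum offset, quality rating)
records = [(2.5, 3.5, 1), (4.0, 5.0, 1), (3.0, 6.0, 2), (8.0, 14.0, 3)]
summary = mean_aor_by_quality(records)
```

Tabulating AOR against both offset magnitude and quality rating in this way is one simple means of checking whether a data set shows the trends described for the new remote measurements.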

Several individual sites include field- and lidar-based measurements for the same set of offset landforms. In Zielke et al. (2010), new lidar measurements were compared with Sieh’s (1978) field-based measurements along the SAF. A comparison of field and two different lidar-derived measurements for numerous targets was presented in Salisbury et al. (2012) (Fig. 8A). Madden et al. (2013) compared lidar measurements from the Garlock fault with McGill and Sieh’s (1991) field measurements (Fig. 8B). In general, repeated observations are well correlated within the error of individual measurements. Salisbury et al. (2012) showed that in some cases field measurements were systematically lower than those from lidar surveys, and attributed this to the synoptic perspective available from a remote view of the bare Earth (e.g., Lienkaemper, 2001).

Offset Measurement Validation Experiment

Our measurement survey results are divided into two categories: the online public survey element (conducted fall 2012–fall 2013), and the classroom-based hardcopy element (conducted fall 2012–spring 2014) (see Supplemental File 4).

For our online public survey, we received 55 anonymous responses (consisting of experience level, mapped fault traces, and piercing lines) from individuals of all experience levels. Of the 55 online responses, 28 participants used Google Earth, and we emphasize them in the following discussion. Even though we provided a simple quality-based rating scheme (Fig. 3), few of the participants reported measurement uncertainties or estimates of measurement quality. In some cases, the only quality descriptions were general, rather than guided by the scheme.

For comparison, we split responses into two groups: experienced users (levels 3 and 4) and inexperienced users (levels 1 and 2). The difference in experience level is manifest in the polygon that spans the faults mapped by each group; inexperienced users generally had a wider area that encompassed parts of the terrain for which there was no geomorphic evidence of a fault (e.g., scarps, hillside benches, offset topography) (Fig. 9). Reported offset measurements are predominantly in agreement with one another, with experienced users determining a slightly lower mean offset than inexperienced users in 7 of the 10 cases. Offsets 1, 3, and 6 have the best correlation between groups. In several instances, inexperienced users have more variable responses, either due to fault mislocation or fault azimuth variability (offsets 5, 8, 9, and 10). Offsets 2 and 4 represent significantly poorer interpretations by inexperienced users due to fault mislocation and fault-strike uncertainty, respectively. The standard deviation of the site measurements increases with larger feature size and total displacement (Fig. 9B).
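The grouping described above (levels 1 and 2 versus levels 3 and 4) can be sketched as follows. This is an illustrative sketch, not the procedure used for Figure 9; the function name and the response values are hypothetical.

```python
from statistics import mean, stdev


def summarize_by_experience(responses):
    """Group (experience_level, offset_m) pairs and return mean and standard deviation.

    Levels 1-2 are treated as inexperienced and levels 3-4 as experienced.
    Groups with fewer than two responses are skipped because the sample
    standard deviation is undefined for them.
    """
    groups = {"inexperienced": [], "experienced": []}
    for level, offset_m in responses:
        key = "experienced" if level >= 3 else "inexperienced"
        groups[key].append(offset_m)
    return {k: (mean(v), stdev(v)) for k, v in groups.items() if len(v) >= 2}


# Hypothetical responses for a single offset feature (level, offset in meters).
responses = [(1, 6.0), (2, 8.0), (3, 5.0), (4, 6.0), (4, 7.0)]
summary = summarize_by_experience(responses)
print(summary)
```

Running the same summary for each of the 10 offsets would give the per-feature means and standard deviations needed to compare the two groups.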

We have ∼100 paper-based surveys from beginner-level participants in upper-level undergraduate geology classrooms at San Diego State University, Arizona State University, and the University of Potsdam. Many of these surveys were only partially completed, however, and the number of individual measurements for each of the 10 features is highly variable (where n ranges from 33 to 101). In addition to the overall group of paper-based surveys, we isolate two subgroups. In the first subgroup (group A, n = 9), geomorphology students completed the paper-based survey on two occasions: once prior to a lecture on neotectonics and strike-slip faulting and a second time one week later after receiving specialized instruction on how to recognize and measure offsets. In the second subgroup (group B, n = 14), students used the aforementioned slimmed-down version of LaDiCaoz at the University of Potsdam in Germany (supervised by O. Zielke).

Subgroup A did not show a marked change in mean offset measurements before and after the introductory lecture. Of 10 measured channels, averaged offset estimates of half increased and the other half decreased. However, the average of reported uncertainties for 8 of the 10 offset features significantly increased after the lecture, which we interpret as the students’ increased attention to subtleties of the geomorphology. In general, average quality estimates remained the same before and after the lecture. Subgroup B underestimated offset magnitudes in comparison to the measurements completed on paper by other groups, all of whom consistently underestimated offset magnitudes compared to those of the authors.

Together, the ∼100 beginner surveys represent a statistically significant population of measurement estimates. Extreme measurement outliers have been excluded, as we assume these discrepancies to be less associated with measurement variability and more associated with improper interpretation of offset features themselves (epistemic uncertainty). Figure 10 summarizes paper-based classroom survey responses. Average offset estimates and average AORs (for survey participants and authors) are depicted at arbitrary y-axis positions. Note that the geomorphic features are not necessarily depicted at the same scale at which they were measured (see paper-based survey in Supplemental File 2). These results show that while there is considerable spread among the beginning users, the measurement modes consistently are within the AOR defined by us.

There are a number of factors, both external and internal, that dictate an individual’s ability to get the “right answer” (Bond et al., 2007, 2011). In most scenarios, actual amounts of offset at a particular location are unknown and we consider our most agreed-upon measurement to be the correct answer. A correct measurement must ultimately begin with a proper interpretation of the geomorphic feature in question. Here we discuss the factors that control measurement repeatability for all experience levels.

Epistemic versus Aleatoric Uncertainty

Epistemic uncertainty relates to the overall interpretation of the geomorphic feature (its evolutionary history both before and after tectonic perturbations). Epistemic uncertainty, therefore, is intrinsic to all measurements and governs the validity of aleatoric uncertainty, a statistical uncertainty associated with the measurement process (black curves, Fig. 4). The tall narrow black curve in Figure 4 represents low aleatoric uncertainty and the short wide black curve represents high aleatoric uncertainty.

The results suggest that when a person examines and interprets the topography, and from this develops a model of the offset (i.e., what features to correlate across the fault), the difference in experience level among practitioners (a proxy for epistemic uncertainty, as experienced practitioners can better interpret tectonic versus geomorphic contributions to an offset) contributes a larger share of variability to the final measured offset than does discrete measurement error (aleatoric uncertainty). Particularly for the inexperienced user, it is likely that in some cases epistemic uncertainty will swamp aleatoric uncertainty. This is consistent with other studies (e.g., Gold et al., 2012; Scharer et al., 2014) that established that major discrepancies in offset estimates are usually attributable to improper feature interpretation rather than poor measurement practices. In particular, Gold et al. (2012) presented single-operator assessments of measurement error and uncertainties using high-resolution terrestrial-based lidar point clouds; they showed that while high-resolution data sets are fundamental to remotely measuring offset features, fine topographic data cannot necessarily reduce below a certain level the epistemic uncertainty associated with reconstructing the geomorphic features.

Operator Biases

Our interest in validation of geological measurement methods is not new. Bond et al. (2007, 2011) conducted a similar study focusing on interpretations of reflection seismic data by interpreters with various levels of expertise. In an attempt to quantify the subjectivity of seismic interpretation, Bond et al. (2007, 2011) defined conceptual uncertainty as the acceptable range of concepts that geoscientists apply to a single data set. Bond et al. (2007) argued that conceptual (epistemic) uncertainty must be incorporated into resulting geologic models because it represents fundamental unknowns that outweigh individual measurement uncertainties (aleatoric). Bond et al. (2007) concluded that a range of factors influence how an individual’s prior knowledge will affect interpretations, but that particular biases are as pervasive for those with 15 or more years of experience as they are for those with very little experience. In particular, two types of biases are nearly unavoidable: anchoring and confirmation biases.

An anchoring bias is failure to depart from initial ideas, whereas confirmation bias involves actively seeking facts to support one’s own hypotheses (while actively disregarding conflicting observations). In fact, investigators with more experience are likely to ask for confirmation biases, or some sort of a starting point (e.g., where in the world is this?; i.e., what fault am I looking at and what is the geomorphic setting). Weldon et al. (1996, p. 295) similarly cautioned “… bias could be derived from the unconscious choice of a best match of uncertain features that is consistent with previous choices. This statement is not meant to suggest any impropriety in the data collection, but to acknowledge that it is extremely difficult to avoid bias where measurements of ‘matches’ involves interpretation of the exact location of the feature being measured. From experience we know that after one finds several convincing offsets, one’s eye is keyed to looking for matches in that range, so that one will often overlook or misinterpret offsets that are unexpected…”. In the case of offset channels and our experiments, novices had less of a trained eye to locate and interpret features but also had fewer obvious existing biases. Conversely, experts can more readily identify and interpret features but they also have more preexisting expectations that lead to operator biases. Implementing a blind measurement approach, where the actual offset value is provided to the interpreter only after the measurement is completed, was suggested in Zielke et al. (2015); simple adaptation for field studies, as well as use of a modified version of LaDiCaoz, was proposed.

New user performance suffers due to inexperience. We noted behavioral peculiarities common to beginning users. Several comments provided by classroom participants suggest that professionals take for granted the ability to intuitively work with aerial perspective DEMs and high-resolution topography (e.g., ∼10 cm contour intervals). The Google Earth interface allows for some terrain familiarization and is typically preferred over the static, paper-based surveys because the zooming allows one to get a better overall view of the fault and feature orientation, whereas the paper-based surveys were all large-scale topographic maps, making feature delineation difficult. Beginners lack self-confidence in assigning uncertainties (in that they frequently omit aleatoric measurement uncertainties) and nearly always use symmetrical, Gaussian-style distributions around preferred offset values. Furthermore, aleatoric measurement uncertainties typically remain the same magnitude regardless of total amount of offset. For example, it is common for an uncertainty value of ±1.5 m (3 m AOR) to accompany a 5 m offset as well as a 25 m offset, even though the preservation and expression of such a large feature might be substantially inferior. After receiving detailed instruction about making measurements and assigning uncertainties, most beginners included larger AORs around preferred measurements of all magnitudes.
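The ±1.5 m example above can be made concrete with a short calculation of relative uncertainty. The function name is a hypothetical label for illustration; the numbers are those quoted in the text.

```python
def relative_uncertainty_percent(offset_m: float, half_range_m: float) -> float:
    """Return a symmetric uncertainty (± half_range_m) as a percentage of the offset."""
    if offset_m <= 0:
        raise ValueError("offset must be positive")
    return 100.0 * half_range_m / offset_m


# The same ±1.5 m (3 m AOR) attached to a 5 m offset and to a 25 m offset.
small_offset = relative_uncertainty_percent(5.0, 1.5)
large_offset = relative_uncertainty_percent(25.0, 1.5)
print(small_offset, large_offset)  # 30.0 6.0
```

A fixed ±1.5 m therefore implies 30% uncertainty on the 5 m offset but only 6% on the 25 m offset, which is difficult to justify if the larger, older feature is the more degraded of the two.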

Identifying the Appropriate Fault Strike

Location and orientation of a fault strand along which a feature is measured can have a substantial impact on the interpretation of offset geomorphic features and subsequent measurement values. In addition, we see much higher measurement uncertainties when fault strike is ambiguous for a reach. Ease of fault delineation is controlled by expression and preservation of the local fault trace and in many instances, portions of prehistoric ruptures are no longer visible. One pitfall of only making remote measurements in the office is a tendency to focus on individual offsets and to search for them in intuitive locations (along an idealized linear fault trace). In contrast, field geologists are able to rely upon subtle geomorphological evidence of active fault traces to locate sequences of offset features. Where meter-scale faulting is not evident, a common practice is to resort to the regional-scale fault fabric and orientation for along-strike measurements. We think that this is a suitable substitute when microgeomorphology is no longer preserved or is below the resolution of available lidar data sets.

In Rockwell and Klinger (2013), it was shown that for the 1940 Imperial fault rupture (4–6 m of offset), making measurements with either a regional fault azimuth or with varying local azimuths (at the scale of tens of meters) will yield roughly the same reach-averaged estimate of offset. A range of measurements is typically acceptable (offset and symmetric and/or asymmetric offset uncertainty) for a given reach, and using a consistent approach to defining the fault strike can minimize the overall spread of offsets.
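To illustrate why the choice of fault strike affects an individual measurement, the sketch below projects the separation between two piercing points onto a chosen fault azimuth. It assumes planar map coordinates in meters and an azimuth measured clockwise from north; the function name, coordinates, and azimuths are hypothetical and do not reproduce the analysis of Rockwell and Klinger (2013).

```python
import math


def strike_parallel_offset(p_a, p_b, azimuth_deg):
    """Project the separation of two piercing points onto the fault strike.

    p_a, p_b    : (easting, northing) of the piercing points on either side
                  of the fault, in meters
    azimuth_deg : fault strike in degrees clockwise from north
    Returns the signed along-strike component of the separation, in meters.
    """
    az = math.radians(azimuth_deg)
    ux, uy = math.sin(az), math.cos(az)  # unit vector along strike (east, north)
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    return dx * ux + dy * uy


# Hypothetical piercing points measured with a regional and a local azimuth.
upstream, downstream = (0.0, 0.0), (3.0, 4.0)
regional = strike_parallel_offset(upstream, downstream, 37.0)
local = strike_parallel_offset(upstream, downstream, 45.0)
print(regional, local)
```

The two azimuths give slightly different offsets for the same pair of piercing points, which is why applying one convention consistently along a reach helps limit the spread of reported values.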

Fault Zone Width and Complexity

The width of faulting and the distance over which features are projected into and across fault zones have a significant effect on the accuracy of a measurement and associated uncertainty. Narrow (localized) fault zones offsetting clearly defined features require little or no projection and aleatoric measurement uncertainty is low. In contrast, a broad fault zone (as much as several meters wide) may lead to large aleatoric measurement uncertainty (regardless of preservation quality or linearity of geomorphic features). Similarly, we note an increase in measurement variation and user uncertainty as features deviate from the ideal fault-normal orientation. In cases where features require lengthy projections, measuring more feature components (e.g., thalweg, margins) results in an offset estimate closer to the collective mean (across various users) and a more robust estimate of measurement uncertainty.

Complications arise as fault zone width and complexity increase. Wide zones of coseismic deformation are often recognized as several discrete fault strands and it can be difficult to determine synchronicity of activity on adjacent strands. As this depends on the spatial scale of individual fault strands, geologic substrate, and subsequent feature preservation, these issues must be dealt with on an individual basis. Where ruptures are relatively young, the degree of scarp degradation can indicate relative ages of activity. For older ruptures this may not be possible. Typically, offsets along neighboring (parallel) fault strands are summed if geomorphic features appear to be roughly the same age, and this summed value is used as an estimate of slip at a point along strike. Reported uncertainties should acknowledge the largest possible range of offset in these cases; choice of a PDF should be guided by the assurance that the user has in the allocation of slip across a fault zone.
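The summing of offsets across parallel strands can be sketched as below. This assumes that the "largest possible range" is obtained by adding the strand minima and the strand maxima separately, which is one conservative reading of the recommendation above; the function name and values are hypothetical.

```python
def sum_strand_offsets(strands):
    """Sum offsets across parallel fault strands judged to be the same age.

    strands : list of (minimum, preferred, maximum) offsets in meters,
              one tuple per strand
    Returns (total_minimum, total_preferred, total_maximum) in meters.
    Adding minima and maxima separately yields the widest plausible range.
    """
    total_min = sum(s[0] for s in strands)
    total_pref = sum(s[1] for s in strands)
    total_max = sum(s[2] for s in strands)
    return total_min, total_pref, total_max


# Hypothetical offsets of one channel across two parallel strands.
strands = [(1.0, 1.5, 2.0), (2.0, 2.5, 3.5)]
total = sum_strand_offsets(strands)
print(total)  # (3.0, 4.0, 5.5)
```

The summed AOR (2.5 m here) equals the sum of the strand AORs, so the reported range widens with every strand included.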

Natural Lateral Variability of Slip in Surface Ruptures

There are now several studies demonstrating significant lateral variability in offsets along historical surface ruptures. Using long fence lines and orchards of planted trees, 20%–30% variability in offsets over short distances (10–100 m) was shown along the 1999 İzmit and 1999 Düzce ruptures (Rockwell et al., 2002). Similar variations were observed along the 2010 El Mayor–Cucapah rupture in Mexico, using Cosi Corr (Co-registration of Optically Sensed Images and Correlation) technology (see Leprince et al., 2011), with kilometer-scale and 15-km-scale systematic variability. In a reassessment of the 1940 Imperial fault rupture (Rockwell and Klinger, 2013) hundreds of closely spaced crop rows and orchard tree alignments were used to measure lateral displacement, and ∼30% lateral variability over dimensions of tens to hundreds of meters was noted. All of these observations are consistent with earlier mapping along historical surface ruptures, but in previous cases it was commonly assumed that the variability was due to the inability to measure the full field of deformation. In contrast, the measurements using long crop rows that extended tens to hundreds of meters from the rupture trace show that these lateral variations in displacement are real and significant.

New studies of lateral variability of surface rupture slip have a direct impact on results of our study from several perspectives. First, if an observer locks into an offset magnitude because of high-quality measurements along a stretch of rupture (anchoring bias), there may be a tendency to repeat this offset value, even though the actual displacement has increased or decreased. Second, the magnitude of offset can be biased by the choice of local fault strike versus regional fault strike if measurements are not made consistently (Rockwell and Klinger, 2013). Both of these factors can have a significant influence on the perception of overall, average, and maximum displacement for an event, factors that are very important for earthquake hazard analysis.

Geomorphic Modification

As offset features age, surface processes modify fault traces and piercing lines and it becomes less likely that features will preserve true tectonic offset. In settings where fluvial modification is significant, there is a high probability that features will be obliterated, reoccupied, or buried. In areas where fluvial modification is low, however, a feature may persist for many successive earthquakes and perhaps even for multiple earthquake cycles. The subsequent modification of an offset feature plays a large role in whether the feature is a useful indicator of (actual) tectonic offset, and whether the feature is recognizable.

A simple proxy for the geomorphic diffusion, or smoothing, of offsets can be mean annual precipitation (MAP; e.g., Hanks, 2000). While climate has varied over the last millennium in California, spatial variation in decadal MAP may provide a useful relative gauge of the vigor of geomorphic smoothing in the UCERF3 database. Figure 11 shows a plot of measurement uncertainty (as a percentage of total offset magnitude) for a suite of offsets as a function of MAP along a corresponding fault reach. The SJF points refer to the Clark strand, divided into two segments to the northwest and southeast of Burnt Valley, and the SAF refers to the Cholame, Carrizo, Big Bend, and Mojave segments. The data suggest a weak trend of increased uncertainty where precipitation is higher (see following).

Klinger et al. (2011) suggested an exponential decrease of cumulative offset probability distribution (COPD) peaks with increasing offset magnitude (i.e., age) along the Fuyun fault. This phenomenon of decreasing COPD signal strength with increasing offset was attributed to the increasing number of successive earthquake events to which the offset has been exposed, and to the amount of fluvial modification and in situ geomorphic diffusion to which the offset has been subjected (see discussion in Zielke et al., 2015). A modest increase in single-investigator uncertainty with increasing MAP could suggest that geomorphic conditions associated with wetter sites are less likely to sharply preserve offset features. The resilience of a geomorphic feature is therefore a complex combination of internal and external factors at a particular location.
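The weakening of COPD peaks with increasing offset can be illustrated with a toy calculation. The sketch assumes that a COPD is formed by summing the probability density functions of individual offsets and that each PDF is Gaussian; the measurements are hypothetical and are not from the Fuyun fault or any other data set.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Gaussian probability density evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def copd(measurements, grid):
    """Sum the individual offset PDFs at every grid value.

    measurements : list of (preferred_offset_m, sigma_m)
    grid         : list of offset values (m) at which to evaluate the sum
    """
    return [sum(gaussian_pdf(x, m, s) for m, s in measurements) for x in grid]


# Hypothetical offsets: a well-preserved cluster near 5 m and a more
# degraded cluster near 10 m with larger uncertainties.
measurements = [(5.0, 0.5), (5.2, 0.6), (4.8, 0.4), (10.1, 1.5), (9.7, 1.8)]
grid = [i * 0.1 for i in range(201)]  # 0 to 20 m in 0.1 m steps
curve = copd(measurements, grid)
peak_offset = grid[curve.index(max(curve))]
print(round(peak_offset, 1))
```

Because the larger offsets carry wider PDFs, their summed peak is lower and broader than the peak for the smaller offsets, which is the pattern of decreasing COPD signal strength described above.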

In this study we asked participants to measure features embodying a range of preservation states (some that would normally be avoided because of large epistemic uncertainties). This may help explain the variability in user-submitted responses for the more challenging sites. The tendency to avoid older, diffuse features predisposes studies to include only those features more recently offset, ultimately exacerbating the natural trend noted by Klinger et al. (2011) and limiting the age of earthquakes to which we can apply these methods.

Offset Quality

Of major importance to hazard models is the quality rating of a measurement. What level of quality is associated with a determination and what level of emphasis should a particular measurement receive? In the course of this study, we have seen two approaches to the quality rating. The first, simpler approach was a qualitative, gut reaction rating (set to some arbitrary numerical scale) that seeks to encompass several variables, such as the understanding of preoffset morphology, preservation of the feature, as well as fault trace and feature complexities (see Supplemental File 2). This intuitive approach is highly subjective, however, and is dependent upon experience in the field and with high-resolution topography. For this method to be most effective, a clear set of criteria must be defined prior to measuring offsets (e.g., Sieh, 1978), and some lower limit of acceptability must be established, below which offset measurements would not be used.

The second approach, a semiquantitative rating rubric (Fig. 3), is more systematic (and less subjective than the gut reaction rating), but it is insufficient for adequate offset feature classification because obliquity of the features and the fault zone width are not necessarily the only controls on reconstruction quality. Even when we choose criteria by which to rate offset quality there will be some subjectivity involved with the process. In some cases, offsets received high or otherwise acceptable quality ratings according to our rubric, but user gut reactions were negative (e.g., if the tectonic nature of the offset was ambiguous). It is interesting that new measurements made for this study are predominately medium (2) quality measurements, with highest (1) quality ratings being the least common (Fig. 7A).

Styles of interpretation vary and depend on prior field experience. Investigators tend to subconsciously define quality thresholds for geomorphic features in question: if features fail to meet these often implicit criteria, then they will be ignored and measurements will not be made. We argue that, particularly for lidar studies, it is important to make measurements of all potential features; some features can then be discarded or given low weight at a later date if necessary. Ignoring particular features can preclude one from discovering small-magnitude offsets, or along-strike variability that indicates multiple offsets. We suggest initially making as many credible measurements as possible, using a set of criteria to assign a quality rating, and then disregarding or shifting emphasis away from particularly low rated offsets later depending on the purpose of the study.

One interesting complication associated with quality ratings is how practitioners choose to treat lidar-based versus field-based measurements. Presumably we would approach both types of data sets in the same way (via numerical rating or rubric of some sort), but should field measurements inherently be more highly regarded for hazard calculations, or vice versa? In areas where fault zone width and rupture complexity are high, the synoptic view afforded by lidar or other remote sensing data is extremely useful for capturing full fault deformation. Conversely, where dense vegetation obstructs the ground surface such that high-quality bare-earth DEMs are unobtainable, field investigation is particularly advantageous. Where possible, a summary set of best measurements characterized both remotely and in the field is preferred.

The majority of UCERF3 offset measurements are from historic surface rupture studies and are dominated by field measurements (Madden et al., 2013). Driven by the desire to better understand faulting in the upper crust and to better inform earthquake hazard forecasts, there is an increasing trend toward lidar and other remotely based studies that utilize high-resolution topography to analyze historic and prehistoric ruptures.

This work examines key challenges faced when remotely measuring fault-offset geomorphic features. The ability of investigators to perform tasks (making measurements, assigning uncertainties and quality ratings) is highly dependent on the geomorphic quality (i.e., preservation) of offset features and digital representation of the features, and on the investigator’s previous experience with neotectonic principles, measurement tools, and fault-specific characteristics (that may introduce biases). Furthermore, fluvial channels in tectonically active regions are prone to change, and degradation begins immediately after feature formation. The longer lived and larger the geomorphic feature and associated offset, the greater the uncertainty becomes, making offset estimates far into the past more difficult to interpret. Consequently, the applicability of older offsets to fault rupture evaluation and estimating slip accumulation patterns also diminishes. The following conclusions can be made based upon our study.

1. Offset features (particularly those from prehistoric earthquakes) require significant interpretive work for a complete understanding of the effects of local climate, geologic substrate, microtectonic setting and other factors on the validity of slip measurements. This understanding is preferably verified in the field when possible, but practical limitations may prevent field studies in some places.

2. Direct comparison of field- and lidar-based measurements for the same geomorphic features (made by experienced investigators) shows that high-resolution topography techniques are a suitable means to investigate fault-offset geomorphic features. Standardizing remote measurement methods and reporting schemes that fully describe the uncertainties are crucial to the utility and repeatability of such studies.

3. Accurate and repeatable performance correlates well with experience. For all participants in our survey, major measurement discrepancies are typically due to different interpretations of the overall geologic features and history (epistemic uncertainty). However, we found that this more often occurs in the least-experienced populations; beginners have more issues with epistemic uncertainties (i.e., understanding topography, consideration of preoffset channel orientation and form, geomorphic evolution of offset features post offset) than experienced individuals. The bulk of our results, however, suggests that the measurement methods among both groups are sound.

4. Single-investigator comparison of measurements made in different climatic regions reveals systematic differences in measurement uncertainties. Climate, in this case, can be used as a crude proxy for geomorphic modification of offset features in general and warrants further investigation to increase the utility of studies that target older offsets and uninvestigated surface ruptures in different climate regimes.

5. For both remote and field studies, making measurements of all potentially offset geomorphic features is crucial. As we continue to investigate along-strike slip variability, it is important that we avoid biases by preselecting features to measure. Furthermore, measuring more components of an individual geomorphic feature (e.g., channel thalweg, margins) produces more consistently repeatable estimates of fault offset for a particular feature. For rating the quality of offset measurements, we suggest that a clear set of objective criteria be defined prior to measuring offsets, and all potentially offset geomorphic markers are addressed; even features that are later deemed to not be offset might tell us something about geomorphic processes at a point.

6. For experienced users, particular styles of offset representation (i.e., Gaussian normal, boxcar) become increasingly important because they provide valuable information regarding epistemic and aleatoric uncertainties associated with particular estimates of displacement. An inadequate understanding of pre-event morphology and post-event modifications (epistemic) represents a greater limitation than feature condition and subsequent representation in the field or computer laboratory, so in general we find that the uncertainties or the PDF should be generous rather than restrictive.

While field validation is useful for familiarization of fault zone characteristics, in many cases it can be impractical because of temporal, financial, and land access limitations. For these reasons, the use of lidar and other remote sensing–based studies of active fault zones is becoming pervasive and is something that practitioners must explore with a range of available tools. In this study we suggested preferred measurement and reporting protocols, a crucial first step toward enhancing consistency of high-resolution topography based analyses of active faults and establishing community protocols for future work.

Discussions with many colleagues have helped to focus our thinking on the problems identified in this paper. We thank the many participants in our surveys and Tim Dawson, Suzanne Hecker, and two anonymous reviewers for constructive comments. This work was supported by the U.S. Geological Survey National Earthquake Hazards Reduction Program (G11AP20029 and G11AP20020). The Uniform California Earthquake Rupture Forecast version 3 (UCERF3) was supported by the California Earthquake Authority, U.S. Geological Survey, and the Southern California Earthquake Center. The topographic data presented here were gathered by the National Center for Airborne Laser Mapping and processed and delivered by OpenTopography.

We received an exempt status from the Arizona State University Office of Research Integrity and Assurance Institutional Review Board for our research involving the use of educational tests with human subjects, Federal law 45 CFR 46.101(b) exempt category 7.2. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Supplemental File 1. Methods recommendations. See the full-text article to view Supplemental File 1.
Supplemental File 2. Classroom survey. See the full-text article to view Supplemental File 2.
Supplemental File 3. Offset locations and digital elevation models. See the full-text article to view Supplemental File 3.
Supplemental File 4. Classroom and online results tables. See the full-text article to view Supplemental File 4.