Freely available online through the SEG open-access option.


Geologic models are based on the interpretation of spatially sparse and limited resolution data sets. Nonunique interpretations often exist, resulting in commercial, safety, and environmental risks. We surveyed 444 experienced geoscientists to assess the validity of their interpretations of a seismic section for which multiple concepts honor the data. The most statistically influential factor in improving interpretation was writing about geologic time. A randomized controlled trial identified for the first time a significant causal link between being explicitly requested to describe the temporal geologic evolution of an interpretation and increased interpretation quality. These results have important implications for interpreting geologic data and communicating uncertainty in models.


Geology is an interpretation-based science (Frodeman, 1995). Waste disposal, carbon capture and storage, and hydrocarbon exploitation require an increasingly nuanced understanding of the subsurface. Subsurface understanding is often based on remotely sensed geophysical imagery, combined with spatially limited borehole data. Interpretation is required for the value of such data to be realized through the creation of geologic models.

Traditional quantification of uncertainty in geologic models considers a single “base case” conceptual interpretation, within which the locations of interpreted horizons and/or model parameters are represented probabilistically (Harbaugh and Bonham-Carter, 1970; Mann, 1993). An inherent weakness of this type of approach is that it only explores part of the uncertainty space (Chamberlin, 1890; Bentley and Smith, 2008): Only one geologic concept is considered, whereas many could honor the data (Bond et al., 2008). Bond et al. (2008) propose that multiple models should be created in an interpretation work flow and then assessed for validity against accepted geologic rules. These rules are determined by the geometric relationships and physical properties of geologic units and their evolution through time (Bond, 2015). Expert elicitation may be used to assess the likelihood of each interpretation (Polson and Curtis, 2010). However, there remains a need to improve work flows to increase the number of structurally valid seismic interpretations (Bond et al., 2012).

Factors affecting interpretation

We conducted a survey to quantify the factors affecting geoscience interpretation, targeting experts from academia and the oil and gas industry. Bond et al. (2007, 2012) use a synthetic seismic image constructed from a single forward-modeled geologic cross section. We built on this by using real seismic reflection data (Stewart, 2007) for which more than one interpretation was valid, and we increased the sample size and elicited more detailed information on individuals’ training and experience. Critically, we then tested our findings to demonstrate that a change in work flow increased interpretation quality.

Four hundred and forty-four experienced geoscientists interpreted a real 2D seismic image by hand (Figure 1a). An accompanying questionnaire included 21 questions that captured respondents’ backgrounds (i.e., education, work environments, and experience). The questionnaire and the resulting data sets used in the analyses are openly available (see Macrae et al., 2016). An experienced geoscientist was defined as one who was greater than 21 years old, had a university degree, had more than two years of experience after the completion of their highest degree, and had experience of seismic interpretation and structural geology. Structural geology experience was included as part of our definition of an experienced geoscientist because our previous research (Bond et al., 2007, 2012) indicated that it affected 2D seismic interpretation quality. To avoid influencing their interpretation, respondents were not told where the seismic data were from and the only instruction was “please interpret the whole seismic image.”

The age and gender of respondents were analyzed for demographic representation by combining the 2009 membership lists of AAPG, AGU, EAGE, and Geological Society of London to serve as a proxy for the underlying geoscientist population. Compared with the pooled membership lists, our sample had less than a 13% absolute difference in each of the questionnaire’s age categories and less than a 3% absolute difference in the gender categories.

Because the survey used real seismic data, there was no “correct” interpretation. To assess respondents’ interpretations, they were compared with the interpretations of five reference experts (REs). The REs were contacted separately and asked to spend at least 30 min interpreting the seismic image without collaboration. The REs had a median of 24.5 years of experience since completion of their highest degree, came from different technical backgrounds and they were acknowledged leading experts in seismic interpretation, structural geology, sedimentology, and tectonics.

The response variable in the analysis was the similarity of respondents’ interpretations to at least one of the REs’ interpretations. The five REs were asked to provide key geologic features (“those geologic features that helped to define the tectonic setting and/or stratigraphic setting of the interpretation”) that were integral to their interpretation (Figure 1b). The REs provided between six and nine key features each. All five RE interpretations differed from each other, but honored the data and were geologically valid, i.e., met accepted geologic rules. Interpretations ranged from a detached listric normal fault interpretation to a transtensional fault interpretation; all but one expert interpreted an extensional tectonic regime. All experts interpreted salt with a detachment horizon, but only four experts chose salt as a key feature, indicating disagreement on its importance relative to other parts of their interpretations.

Respondents’ interpretations were scored against the key features, identified via visual inspection by author E. J. Macrae, with a sample independently verified by coauthors C. E. Bond and Z. K. Shipton. We define a high-quality interpretation as one with a high RE score; i.e., the respondent’s interpretation captured most of the key features identified by one (or more) of the experts. During the analysis of respondents’ interpretations, we did not detect any other type of interpretation that was geologically valid and not at least partially represented by the five RE interpretations. We do not consider that any one RE interpretation is best; we allow that each could be correct.
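The scoring rule described above can be illustrated with a minimal sketch. It assumes that a respondent's score is the largest number of key features matched against any single RE; the text supports this reading, but the exact tallying procedure is not spelled out. The feature labels and the `re_score` function are hypothetical and are not taken from the study's data.

```python
# Hypothetical key-feature lists for two reference experts (REs).
# The labels are illustrative only; the real REs supplied six to nine features each.
REFERENCE_EXPERTS = {
    "RE_A": {"listric normal fault", "salt body", "detachment horizon",
             "growth strata", "rollover anticline", "unconformity"},
    "RE_B": {"transtensional fault", "salt body", "detachment horizon",
             "flower structure", "growth strata", "basement high"},
}


def re_score(found_features, reference_experts):
    """Return the best match count against any single reference expert.

    Assumption: the score is the maximum, over REs, of the number of that
    RE's key features present in the respondent's interpretation.
    """
    return max(len(set(found_features) & features)
               for features in reference_experts.values())


# A respondent who marked three features, all of them on RE_A's list.
example_score = re_score(
    {"listric normal fault", "salt body", "growth strata"},
    REFERENCE_EXPERTS,
)
```

Taking the maximum rather than a sum reflects the statement that no single RE interpretation is treated as best: a respondent is rewarded for agreeing closely with any one valid concept.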

Respondents’ questionnaire responses yielded 28 background factors that characterized their training and experience. A further 17 interpretational factors were derived from the techniques used to interpret the seismic image (Table 1). These factors captured whether or not respondents had each type of experience (or the amount of experience) and whether each technique had been used. Use of the techniques was determined via visual inspection. We statistically analyzed the data to determine which type of respondents (in terms of their backgrounds) performed best and what interpretational techniques were most effective. Multivariate ordinal logistic regression with the proportional odds model (McCullagh, 1980), a form of generalized linear modeling, was used to determine which factors (if any) were associated with high scores, indicating better interpretations. Factors were assessed against the response variable individually, and nonsignificant factors were not progressed to the multivariate analysis. The multivariate analysis allowed the influence of each factor to be assessed relative to the simultaneous influence of all other factors in the statistical model. The analysis started with the inclusion of all individually significant factors. Factors were then iteratively removed (one at each step) until only the significant factors were left, i.e., using a manually applied backward stepwise regression procedure (Draper et al., 1966).
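The backward stepwise procedure can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' implementation, which was applied manually. The `fit_pvalues` argument is a hypothetical stand-in for refitting the ordinal regression on the retained factors, and the p-values in the toy example are invented for demonstration.

```python
def backward_eliminate(factors, fit_pvalues, alpha=0.05):
    """Drop the least significant factor one step at a time.

    fit_pvalues(kept) must return a dict mapping each retained factor to its
    p-value in a model fitted with exactly those factors (hypothetical hook).
    The loop stops once every retained factor has p < alpha.
    """
    kept = list(factors)
    while kept:
        pvalues = fit_pvalues(kept)
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break  # all remaining factors are significant
        kept.remove(worst)
    return kept


# Toy stand-in: fixed, made-up p-values instead of a real model refit.
TOY_PVALUES = {
    "structural_geology_experience": 0.001,
    "geologic_time": 0.0005,
    "time_spent": 0.40,
    "justified_interpretation": 0.20,
}


def toy_fit(kept):
    return {name: TOY_PVALUES[name] for name in kept}


selected = backward_eliminate(TOY_PVALUES, toy_fit)
```

In a real analysis the p-values change at every step because the model is refitted after each removal; the fixed values here only show the control flow.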

During the factor selection process, the statistical significance of each factor was determined by the p-value: a measure of the strength of the evidence provided in the data for a relationship between the response variable and the factor. In the multivariate analysis, we considered factors to be statistically significant when p-values were less than 0.05; smaller values were interpreted as being highly significant. Odds ratios, a measure of effect size, were then used to rank the factors in the final model. In our application, the odds ratio quantified, for each factor, how likely respondents in one category of the factor were to have higher scores than those respondents who were in another category. Factors with higher odds ratios had a greater positive effect on respondents’ scores than factors with lower odds ratios. Confidence intervals were also calculated for the odds ratios; these quantify the precision with which our results generalize to the underlying population, had other geoscientists been sampled.
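For readers less familiar with odds ratios, the sketch below shows how one is typically obtained from a fitted proportional odds coefficient, together with a Wald-type 95% confidence interval. The function name, the standard error, and the use of a Wald interval are assumptions made for illustration; the source does not state how its intervals were computed, and some software reports coefficients with the opposite sign convention.

```python
import math


def odds_ratio_with_ci(beta, std_err, z=1.96):
    """Convert a log-odds coefficient into an odds ratio with a Wald interval.

    beta    : fitted coefficient on the log-odds scale
    std_err : its standard error (hypothetical value in the example below)
    z       : normal quantile, 1.96 for an approximate 95% interval
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * std_err)
    upper = math.exp(beta + z * std_err)
    return odds_ratio, lower, upper


# A coefficient of ln(4.46) corresponds to the odds ratio reported for
# "geologic time"; the standard error of 0.4 is invented for illustration.
ratio, ci_low, ci_high = odds_ratio_with_ci(math.log(4.46), 0.4)
```

An interval that excludes 1 corresponds to a factor that is significant at roughly the 5% level, which is why the odds ratio and its interval are usually reported together.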

What affects interpretation quality?

The median number of key features identified by the 444 geoscientists was three (mean 2.83). The standard deviation of the distribution was 1.5 key features. One respondent achieved a score of eight.

What was the typical training and experience of respondents?

Respondents had a median of 10 years of experience after completion of their highest degree. The median number of geographical locations where respondents had worked was six, and the four most common specialist technical areas were seismic interpretation (47.7%), structural geology (46.6%), geophysics (32.2%), and stratigraphy (27.9%). Most respondents had experience in multiple work environments (e.g., academia, consultancy, oil and gas industry, or service companies) and specialist technical areas. Table 2 gives further information on respondents’ education, experience, technical ability, and tectonic experience.

What were the typical interpretational techniques deployed?

Respondents used simpler techniques (e.g., marking stratigraphic horizons) more frequently than techniques that required substantial geologic reasoning (Table 1). Only five out of 444 respondents (1.1%) considered the “geologic evolution” of their interpretation, and 254 of the 444 respondents (57.2%) used no writing techniques (i.e., no annotations, labels, descriptions, or explanations). The median self-reported time spent interpreting the seismic image was 10 min.

What leads to a high-quality seismic interpretation?

Four background factors and five interpretational techniques proved important in producing a high-quality interpretation. Of the background factors, the “level of experience in structural geology,” “how often seismic images are interpreted or used,” “background in a super-major or major oil company,” and the “number of geographical locations” in which respondents had worked were all significant. The “length of time spent interpreting the seismic image” was not. Bond et al. (2012) also find experience in structural geology to be significant but did not collect data on the other significant factors.

The significant factors (Table 3) are ranked by their odds ratios. The most influential background factor was experience in structural geology: “specialists” were 3.25 times more likely to produce better interpretations than respondents with a “basic working knowledge,” regardless of their backgrounds and the techniques used. To our knowledge, this is the first study to demonstrate that structural geologic experience is significant over and above experience in seismic interpretation.

In general, the interpretational techniques used (Table 3) were more influential in producing high-quality interpretations than respondents’ background experience. The most influential technique was “geologic time.” Regardless of the other significant background and technique factors, those respondents who wrote about geologic time were 4.46 times more likely to gain higher scores than those who did not. The next most influential technique was “drawing cartoons” that explained part of the interpretation, followed by writing about “geologic processes.” The “justified interpretation” technique was not significant, indicating that generalized explanations were less beneficial; and the technique of “geologic evolution” was not significant, probably because so few respondents (five) used it. We defined “geologic time” to include local-scale features, such as the timing of a sedimentary package with respect to a fault, whereas “geologic evolution,” a subset of geologic time, involved attempting (but not necessarily succeeding) to explain the evolution of the geology in the whole seismic image.

Although “geologic time” was identified as the most effective technique, it was not clear from this data set whether this technique caused geoscientists to produce better interpretations or whether it was just a natural consequence of being a good interpreter. This is important because if this technique produces better interpretations regardless of the individual, then current practice can be improved.

To investigate whether “geologic time” causes individuals to make better interpretations, four identical workshops were conducted. The experimental design was based on a randomized controlled clinical trial (Amberson, 1931). In total, 49 experienced geoscientists, who had not taken part in the survey, were recruited from four oil and gas companies. In each workshop, managers were asked to randomly allocate participants into two groups (a control group and a test group) and to keep the distributions of experience approximately equal while taking no other factors into account. The managers did not know the hypothesis being tested, and the geoscientists were told that they had been allocated randomly. All participants were given the same seismic image to interpret as the survey respondents, but unknown to the control group, the test group was given different written instructions. Because all other experimental factors were equal, the experiment tested for a causal link between these instructions and interpretational quality. A two-sample Poisson’s rates statistical test (Przyborowski and Wilenski, 1940) was then used to determine whether the mean scores of the groups were significantly different.

The workshop control groups, which had a total of 24 participants and a median of 20.5 years of experience, were given the same instructions as the survey respondents, whereas the test group of 25 participants, which had a lower median of 14 years of experience, was instructed to: “interpret the whole seismic image. Please focus your interpretation on the geologic evolution of the section” and was asked to “summarize the geologic evolution below.” All groups were given 35 min to complete the exercise; the median time taken for the control group was 30 min, whereas it was 22.5 min for the test group.

The workshops’ results proved extremely strong; the quality of the two groups’ interpretations was different (Figure 2). Despite having an average of 6.5 years less experience, the test group attained scores that were, on average, 62% higher than the control group (4.12/2.54 = 1.62). The control group scores did not differ significantly from the survey respondents. The statistical test demonstrated that the 62% increase in mean score was highly significant (p=0.002), thus establishing a causal link between “focusing on and stating” the geologic evolution and producing high-quality interpretations when compared with geoscientists who are not given any direction.
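A two-sample comparison of Poisson rates can be sketched with the exact conditional (binomial) form of the test. This is an illustration, not a reproduction of the authors' calculation: the total scores of 103 and 61 are back-calculated from the reported group means (4.12 × 25 and 2.54 × 24, rounded), the two-sided p-value is taken as twice the smaller tail, and the function name is hypothetical. The result therefore need not match the reported p-value exactly.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_rates_test(count_a, n_a, count_b, n_b):
    """Two-sided exact conditional test for equality of two Poisson rates.

    Conditional on the combined count, count_a is binomial with success
    probability n_a / (n_a + n_b) when the two per-person rates are equal.
    """
    total = count_a + count_b
    p0 = n_a / (n_a + n_b)
    upper = sum(binom_pmf(k, total, p0) for k in range(count_a, total + 1))
    lower = sum(binom_pmf(k, total, p0) for k in range(0, count_a + 1))
    return min(1.0, 2 * min(upper, lower))


# Back-calculated totals (assumption): test group 103 features from 25 people,
# control group 61 features from 24 people.
p_value = poisson_rates_test(103, 25, 61, 24)
rate_ratio = (103 / 25) / (61 / 24)
```

Conditioning on the combined count removes the unknown common rate, so the comparison reduces to asking whether the test group's share of all identified key features is larger than its share of participants.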

To gain a deeper understanding of the consideration of geologic evolution during interpretation, qualitative data were collected from workshop participants via a postinterpretation questionnaire and by structured group discussions. Before the groups were told what hypothesis was being tested, all participants were individually asked in a questionnaire whether considering the geologic evolution was beneficial to producing a valid interpretation; all agreed, apart from five who did not provide a response, but 36 out of 43 indicated that they found considering the geologic evolution to be “challenging” (28) or “moderately challenging” (8). Even in the test group, who were explicitly instructed to write about the geologic evolution, only 14 of the 25 participants did, and in doing so, achieved a mean score of 4.71; the remaining 11 participants explained or described their interpretations instead and achieved a lower mean score of 3.36. This difference was not statistically significant, possibly due to the smaller sample size.

The managers we surveyed said that they prompted staff to consider the geologic evolution, and 90.9% of participants said it was part of their normal work flow. In the postinterpretation questionnaire, 83.3% of the control group stated that they considered the geologic evolution of their interpretation, but according to our definition only one had. This demonstrates that even though individuals might think they are considering the geologic evolution, it is the explicit process of having to write their concept down that leads to a better interpretation. Some workshop geoscientists believed they had insufficient time to consider the geologic evolution; however, the test group proved this was untrue given that they produced better interpretations than the control group in less than 23 min, despite (coincidentally) being less experienced.

Improving interpreter performance

The results show that even though experience is important, particularly in structural geology, interpretational techniques have the greatest impact on quality. We present statistical evidence that being instructed to “focus on and state” the geologic evolution causes geoscientists to produce better interpretations of seismic data. It is possible that some participants considered geologic evolution in their interpretation but left no evidence of doing so. The statistics, however, are clear: It is writing about geologic evolution that significantly increases quality when measured against experts. Our research implies that geoscientists should be required to draw sketches with written explanations to justify the geologic evolution of their interpretations. Not only does this lead to better interpretations, it also enables improved knowledge transfer between colleagues and allows future interpreters to understand the rationale for decisions. Consideration of geologic evolution may also mitigate overconfidence because the technique can be challenging to apply. Furthermore, multiple interpretations of the same data set can be tested, and as more data become available, some can be rejected. Our results support the theoretical proposition that effective seismic interpretation must be an investigation of structural evolutionary concepts involving geologic reasoning (Bond et al., 2015) rather than a simple stratigraphic correlation of faults and horizons. We suggest that companies would benefit from conducting controlled trials, using more complex in-house 2D and 3D data sets, with mixed teams of geologists and seismic interpreters: This approach would allow for optimization of industry work flows and would enable identification of best practice within a specific commercial environment.


We surveyed 444 experienced academic and industrial geoscientists and statistically analyzed their interpretations of a 2D seismic image with respect to their experience, qualifications, and the interpretational techniques used. Building on previous research, but using real seismic data for which there is no correct interpretation, we show that explicit consideration of temporal structural evolution of a section is rare among geoscientists, but it is the most influential factor in improving interpretation quality. We go on to show, through the use of controlled trials, that if interpreters are explicitly asked to describe the geologic evolution of a section, they produce significantly better interpretations. Furthermore, not only will the incorporation of written descriptions of geologic evolution within industry work flows lead to better interpretations, it will also improve knowledge transfer between colleagues and allow future interpreters to understand the rationale for historical decisions.

Our findings have implications for the interpretation of all remotely sensed data sets in which the data are sparsely distributed (e.g., gravity, magnetics, resistivity, LiDAR, and photogrammetry) and for the creation of any interpretation-based models (e.g., geologic maps). New work flows for geologic interpretation and model building, focused on evolutionary thinking, should be introduced as standard procedure to increase interpretation quality and hence reduce commercial, safety, and environmental risks.


E. Macrae was funded by an NERC Open CASE Ph.D. award (NE/F013728/1) with Midland Valley Exploration Ltd. as the industry partner. We thank 763 geoscientists for their participation, and in particular, the REs who gave their time freely to the project. M. Scott (University of Glasgow, UK) is thanked for assisting with the statistical analysis. Four reviewers are thanked for their constructive comments that improved the manuscript.

Euan Macrae received a degree (2008) in mathematics and statistics from the University of Strathclyde and then received a multidisciplinary Ph.D. as detailed in this paper. His “Freyja” data collection questionnaire was distributed all over the world; he enjoyed traveling and discussing his research with all the geoscientists he was lucky enough to meet. From 2012, he worked as a decision and risk analyst with LR Senergy in Aberdeen. As well as completing decision analysis training in Houston, Texas, he was able to attend the University of Aberdeen’s exploHUB training center for three months to study hydrocarbon exploration. Since 2014, he has been working as a data scientist for Outplay Entertainment, a leading developer of innovative games for smartphones, tablets, and social networks, based in Dundee.


Clare Bond received a first class degree from the University of Leeds followed by a Ph.D. from the University of Edinburgh. She is a structural geologist with interests in interpretational uncertainty and structural modeling. She then worked across a range of sectors and interests, from conservation and policy to technical consulting and academia. She has spent time in industry working for Midland Valley; she initially led their North American and Scandinavian portfolios before managing their client-facing knowledge team and has provided geologic input into the development of the structural modeling software, Move. Since 2010, she has held a senior lectureship position at the University of Aberdeen. Her current projects include seismic interpretation, fracture modeling, and the use of LiDAR and photogrammetry to create and interpret virtual outcrop models. She is the chair of the Tectonic Studies Group of the Geological Society of London and she codirects the Fold-Thrust Research Group. She is an advocate of public engagement with science.


Zoe Shipton is a professor of geologic engineering at the University of Strathclyde. She works on the link between faulting and fluid flow in applications, such as hydrocarbons, CO2 storage, radioactive waste disposal, and geothermal energy, as well as the structure of modern and exhumed earthquake faults. She also conducts research into the quantification of geologic uncertainties and the perception and communication of risk and uncertainty. She has advised UK and Scottish governments in the area of shale gas. She is a member of the UK Sense about Science Energy Panel and actively engages with the media and public groups to communicate the underpinning science of risk from fracking and unconventional hydrocarbons. She was elected as a fellow of the Royal Society of Edinburgh in 2016.


Becky Lunn is a professor of engineering geosciences at the University of Strathclyde. Her research focuses on the development of new technologies in ground engineering, in particular engineered barriers for nuclear decommissioning and waste disposal. In 2014, she was elected as a fellow of the Royal Society of Edinburgh and a fellow of the Institution of Civil Engineers. She leads two multipartner Engineering and Physical Sciences Research Council research consortia in nuclear waste disposal and decommissioning: “Biogeochemical Applications in Nuclear Decommissioning and Disposal” (BANDD) and “SAFE Barriers.” In 2015, she was elected as an Outstanding Woman of Scotland by the Saltire Society for her contributions to research and for her support of women in engineering. Her current research interests include the development of new monitoring technologies for nuclear waste disposal and geologic carbon storage sites and design of novel grouts for injection as ground barriers and for sealing fractures in rocks.