We have developed a Bayesian statistical framework for quantitative geosteering in real time. Two types of contemporary geosteering approaches, model based and stratification based, are introduced. The latter is formulated as a Bayesian optimization procedure: The log from a pilot reference well is used as a stratigraphic signature of the geologic structure in a given region; the observed log sequence acquired along the wellbore is projected into the stratigraphic domain given a proposed earth model and directional survey; the pattern similarity between the converted log and the signature is measured by a correlation coefficient; then stochastic searching is performed on the space of all possible earth models to maximize the similarity under constraints of the prior understanding of the drilling process and target formation; finally, an inference is made based on the samples simulated from the posterior distribution using stochastic approximation Monte Carlo in which we extract the most likely earth model and the associated credible intervals as a quantified confidence indicator. We extensively test our method using synthetic and real geosteering data sets. Our method consistently achieves good performance on synthetic data sets with high correlations between the interpreted and the reference logs and provides similar interpretations as the geosteering geologists on four real wells. We also conduct a reliability performance test of the method on a benchmark set of 200 horizontal wells randomly sampled from the Permian Basin. Our Bayesian framework informs geologists with key drilling decisions in real time and helps them navigate the drilling bit into the target formation with confidence.
Geosteering is the iterative process of navigating the bottom hole assembly (BHA) in a given geologic setting to achieve prespecified targets. To guide the directional drilling process, directional survey and logging-while-drilling (LWD) sensor measurements are used to estimate the BHA position and the lateral changes of the geologic structure. Geologists usually refer to the geologic structure as the formation model or earth model. Accurate geosteering is crucial to ensure short-term production and long-term reservoir access of a well (Griffiths, 2009).
Geosteering is also an application of horizontal drilling. Before drilling a horizontal well, a pilot vertical well is drilled penetrating through all formations of interest to acquire a stratigraphic log. Geologists also refer to this vertical well as the typewell and its corresponding log as the typelog. The typelog is a stratigraphic signature of the geology over a region of interest. The gamma-ray (GR) log is a commonly used stratigraphic measurement in geosteering because it is relatively insensitive to fluid saturation and porosity variation (Griffiths, 2009, chapter 5.8). GR measurements are counts of the number of GRs emitted from the three nature isotopes commonly found in the earth formations: thorium (Th), uranium (U), and potassium (K). Because GR has a limited depth of investigation (DOI), from a few inches to a few feet (PetroWiki, 2012; Kullawan et al., 2014), it cannot be used to map the boundary of the formations. However, GR is sometimes used as the only stratigraphic measurement when there is a sufficient difference of response between formation layers. Here, we use the GR log from the typewell as a stratigraphic signature of the region and we introduce two techniques to geosteer a horizontal well.
Two types of geosteering approaches exist: model based and stratification based. Griffiths (2009, chapter 1.5.1) summarizes the procedure of a model-based method as “model, compare, and update.” Given a formation model and wellbore trajectory, the expected log response is calculated based on a typelog and compared with the real-time measured log. The formation model is then iteratively updated until the expected responses match the actual log. The stratification-based approach was introduced by Stoner (2007): Based on a formation model, the measured logs are converted from the lateral to the stratigraphic domain and then correlated with the typelog. The formation model is again iteratively updated until there is a good match of the patterns between the converted log and the typelog. Geosteerers implement either approach by iteratively updating the formation model and then visually matching the patterns from the typelog and measured log. Clearly, human-based geosteering interpretations are easily affected by individual opinions and prone to various errors. Therefore, statistical machine-learning algorithms are urgently needed to avoid individual variation and standardize the geosteering process.
In this paper, we use stochastic clustering and pattern matching (SCPM), a computational method based on Bayesian optimization, to predict the distance from the BHA to the target formation top and model the lateral changes of the target geologic structure in real time. In particular, we formulate the stratigraphic-based geosteering approach as a Bayesian optimization problem, cluster the noisy GR measurements to ameliorate the impact from potential outliers, and use a variety of similarity metrics to match patterns with the stratigraphic signature from the typelog. The proposed optimization is solved using a stochastic approximation Monte Carlo (SAMC) (Liang et al., 2007).
Bayesian methods have been widely used for quantitative geosteering. These methods can be roughly categorized into three types, depending on different LWD logs: (1) seismic log-based, (2) nonazimuthal log-based, and (3) azimuthal log-based. The GR LWD log used in this paper is collected by a nonazimuthal sensor. Kullawan et al. (2014) compare the DOI of different LWD tools. Eidsvik and Hokstad (2006) use seismic borehole data and wellbore position estimated from the gravitation and magnetism LWD to assess the drilling-bit position, distance to the geologic marker, and the parameters of the velocity model. They develop a Bayesian dynamic model integrating prior knowledge, forward modeling, and observation data. However, very limited amount of seismic LWD data can be transferred to the surface in real time by an acoustic mud-pulse telemetry system (Griffiths, 2009, chapter 1.6.2) and one even has to stop drilling to acquire the vertical seismic profiling data that slow down the horizontal drilling process. Kullawan et al. (2014) use the deep directional resistivity (DDR) (Griffiths, 2009, chapter 1.5.3) log to measure the distance to bed boundaries (DTBBs). They apply a Bayesian approach with informative priors to obtain the posterior of the DTBB. Yuan et al. (2015) and Qin et al. (2017) use the azimuthal GR log to predict the relative dip angle, and then they calculate the distance from the bit to the boundary surface. DDR and azimuth LWD logs are more expensive to obtain than the nonazimuth log, and they are used less often in the convectional drilling context.
Perhaps the most comparable approach in our methodology is described by Winkler (2017), who proposes a Bayesian network to infer the joint marginal distribution of the well path and earth model given the instrument prior information, directional survey, and GR LWD log. However, a simplified earth model was considered, and only GR LWD logs at survey stations were used, which much reduced the problem complexity. In addition, Winkler (2017) uses discretized random variables to further save computations and requires careful specification of several hyperparameters. This discretization contradicts the fact that variables such as formation dip and wellbore inclination are continuous, and it leads to extremely wide posterior credible intervals as evidenced by the interpretations of three legacy wells in Fierstien et al. (2018). In comparison, our method is far more general: We treat the geosteering as a 3D problem and provide interpretation for each GR LWD data point. Our method uses continuous random variables and only has two or three hyperparameters to be specified. Furthermore, our proposed sampling algorithm converges fast and provides satisfactory interpretations with narrow posterior credible intervals in real time.
The rest of the paper is structured as follows: We first introduce a geometric model based on the Setchell equation and a stochastic clustering step to alleviate noises from the LWD GR log. Then, we propose a Bayesian optimization formulation of the stratigraphic-based approach to geosteering and use SAMC to solve the optimization problem. We illustrate our method using synthetic data sets and compare its performance on some real horizontal wells with interpretations from geosteering geologists. We conclude the paper with a short discussion about some avenues for future investigation.
To formulate stratigraphic-based geosteering as a statistical model, we need to first define a valid geometric model to project the lateral GR log into the stratigraphic domain, and then we aggregate the raw GR log that falls into a grid of prespecified stratigraphic intervals to correlate with the typelog in the same domain.
We represent the target formation, usually the hydrocarbon-bearing layer, as two 3D planar parallel surfaces. The top of the formation is a surface parameterized by a true dip and a true dip direction azimuth . The bottom of the formation is another surface at some offset distance parallel to the top of the formation.
We introduce three terms used to describe various thicknesses of a formation layer in Figure 1. True stratigraphic thickness (TST) is the thickness measured by a well penetrating perpendicularly to the layer. True vertical thickness (TVT) is the thickness measured by a vertical well through the layer. Measured depth thickness (MDT) is the thickness measured along the trajectory of a deviated well that intersects the layer.
Given the directional surveys of the inclination, azimuth, and MD, we can estimate the trajectory of the wellbore in the 3D Cartesian coordinates in terms of true vertical depth (), east (E), and north (N) at the measured depth of the survey stations (usually between 15.24 m [50 ft] and 30.48 m [100 ft]). For geosteering applications, the MD sampling frequency for GR is typically 0.15 m (0.5 ft) or 0.3 m (1 ft). For depths that do not coincide with the depth of the survey stations, we use the minimum curvature method () to interpolate the trajectory positions between survey stations (Sawaryn and Thorogood, 2005). assumes that the well path between two adjacent survey stations follows a minimum curvature trajectory, and it is a circular arc. The assumption of is not always valid, but it is still recognized as a standard method for trajectory position estimation. We provide the details about the minimum curvature interpolation (MCM) interpolation in Appendix A.
The cost function, in equation 13, achieves a balance between the similarity metric and the prior knowledge about the target formation and horizontal drilling process. In particular, it encourages the converted stratigraphic LWD GR log to match the pattern from the stratigraphic signature . It also penalizes deviations of the wellbore inclination from and deviations of the local formation dip from the global dip angle . The hyperparameters and control the strength of our belief that we are drilling a horizontal well in a region with global target formation dip . If the region dip of the target formation is unknown, can be set to zero, which corresponds to a flat target formation.
Stochastic approximation Monte Carlo
Sampling from high-dimensional distributions is challenging due to the likely existence of multiple modes. Markov chain Monte Carlo (MCMC) samplers, such as the Metropolis-Hastings (MH) algorithm (Hastings, 1970; Chib and Greenberg, 1995), are prone to becoming trapped in the local mode.
A remarkable feature of SAMC is its self-adjusting mechanism, which operates based on past samples. This mechanism penalizes the over-visited subregions and rewards the under-visited subregions, and thus it enables the system to escape from local traps very quickly. Mathematically, if a subregion is visited at time , will be updated to a larger value such that this subregion has a smaller probability to be visited in the next iteration. On the other hand, for those regions not visited this iteration, will decrease to a smaller value such that the chance to visit these regions will increase in the next iteration. This mechanism is very effective for sampling from high-dimensional systems with multiple modes.
We have parameters such as wellbore inclination angle , wellbore azimuth direction angle , and local formation dipping angle at each sampling step from the LWD data. Given this relatively high-dimensional problem, SAMC is an ideal sampler to acquire samples from the posterior distribution. If the performance of the sampler is still not satisfactory due to the multiple modes of the posterior landscape, one possible solution is to run multiple MCMC chains with different initial values. Using the parallel computing framework such as OpenMP (OpenMP Architecture Review Board, 2008), we can simultaneously run several chains and aggregate the results from each chain. Our proposed SCPM is summarized in Algorithm 1.
We use synthetic data to demonstrate the performance of our method and compare different correlation metrics. First, we construct a simulation study to show that our algorithm is robust to different characterizations of the LWD logs. Then, we use our method to interpret four real horizontal wells and compare our results with the manual interpretation from geosteering geologists.
Simulator and synthetic data
Using Algorithm 1, we obtain posterior samples and discard the initial samples as the burn-in. From the remaining samples, we select the sample that maximizes our cost function in equation 13 and treat it as our maximum a posteriori (MAP) estimator . Given these samples, we can also compute the posterior 95% pointwise credible interval and MAP estimator for each and denote them as and , correspondingly. Those credible intervals are considered as a quantified confidence indicator for the estimated formation marker .
We applied our SCPM algorithm to the synthetic data sets with various correlation metrics, signal noise parameters and fault displacement variances . When fitting our model, we chose and to allow for a larger variation for the local dip than the wellbore inclination . We obtained 105,000 RSD trajectory samples and discarded the initial 5000 as burn-in. Our sampler is written in an efficient algebra library called Eigen (Guennebaud and Jacob, 2010) in C/C++. On average it takes approximately 1 min to interpret a 1524 m (5000 ft) horizontal well using an Intel i7-5820 K 3.3 GHz processor. Because the chain converges fast, one can use the Gelman-Rubin convergence diagnostic test (Brooks and Gelman, 1998) to early stop the sampler and further reduce the computational time.
Figure 3 illustrates the MAP estimation of the formation marker overlying with the 95% pointwise credible intervals (the red intervals) from the posterior samples for four simulation examples with different and . These credible intervals can be regarded as a confidence indicator of our model. We can visually assess the performance of SCPM by comparing our MAP estimator (the green line) with the true known structure (the dashed red line). In Table 1, we reported the performance metrics using equation 27 with thresholds and , the EC probabilities defined in equation 28 for the posterior credible intervals, the Pearson’s correlation coefficient, and time to complete in seconds.
The proposed algorithm consistently achieves 97.78% within 0.3 m (1 ft) and 100% within 1.52 m (5 ft) on average from the true target for the no fault simulation case or . The EC probability for the posterior credible intervals is approximately 94.47%. For the fault case of , the algorithm still accomplishes approximately 81.89% within 0.3 m (1 ft) and 100% within 1.52 m (5 ft) from the true target. The EC probability decreases to approximately 79.48%, which is still an acceptable result because our algorithm is not designed to take the faults into account. Therefore, the posterior credible intervals are shown to be good for the no-fault setting, with some under coverage in the presence of faults.
Comparing Figure 3b with 3a, we observe that the posterior credible intervals (the red band) expand when increases from 1 to 10. This conforms with the fact that the confidence level of our model decreases when the observation noise increases for the GR LWD sensor. For the small fault setting shown in Figure 3a and 3b, our estimated structure notably misses the second and third faults, but the fitted GR log (the green line) for this MD interval is still close to the real log (the blue line).
Geosteering data case study
To demonstrate SCPMs performance on real well data, we interpret four horizontal wells A to D and compare our results with the interpretation from the geosteering geologist. Although we anonymized the well names, we provided the other information sufficient for the geosteering purpose. Similar as the synthetic data, we used the MCM to acquire the wellbore trajectory from the directional survey. We used the same LWD GR log and the GR typelog as the geologist. We show that our SCPM MAP estimator provides an accurate estimation of the target formation and the associated credible intervals are a good indicator of the uncertainties of the MAP estimate.
Horizontal well A
The target formation for well A, Wolfcamp W1 Shale, has an estimated bed thickness of 3.35 m (11 ft). The expected global dip angle of the formation is 0.6°. We used the same hyperparameters and as our synthetic example. Instead of using the expected global dip as , we set to demonstrate that SCPM works even without the prior knowledge about the global dip. We also assumed that there is no fault in this area, which is also hinted at by the continuity of LWD GR log in Figure 4aD.
The MAP estimator from SCPM (the green line) is very close to the interpretation (the dashed red line) in Figure 4aC. In addition, we note that SCPM is not sensitive to the variation of the GR sensor noises from MD 3780 m (12,400 ft) to 3840 m (12,600 ft). The credible interval from the posterior trajectory samples is very thin, indicating that SCPM is very confident about its estimation. In Table 2, we reported the geosteering performance via equation 27 using the human’s interpretation as the ground truth. Our MAP estimator is on average 59.6% within 1 ft and 100% within 1.52 m (5 ft) away from the interpretation by a human.
Horizontal well B
The target formation for well B, third Bone Springs D, has an estimated bed thickness of 5.79 m (19 ft). The expected global dip angle of the formation is . We used the same hyperparameters and prior global dip of 0° as well A, and we assumed that there is no fault throughout the horizontal drilling.
Our interpretation from SCPM (the green line) is again very close to the ground truth (the dashed red line) in Figure 4bC. We pointed out that the LWD GR log is polluted by the outliers from MD 4480 m (14,700 ft) to the total depth. However, via the stochastic clustering step, SCPM is able to disregard those anomaly sensor errors. We also observed that the credible interval for well B is much larger than well A. The increasing credible intervals indicate the accumulative effect of various noise from the data, reflecting a cone of uncertainty for the estimated formation. However, our MAP estimator is still very close to the ground truth as indicated in Table 2. We achieve on average 59% within 0.3 m (1 ft) and 100% within 1.5 m (5 ft) away from the geosteering geologists’ interpretation.
Horizontal wells C and D
The target formations for wells C and D are Wolfcamp W Sand and share the same estimated bed thickness of 15.24 m (50 ft). The expected global formation dip angle for well C is , whereas the angle for well D is 0.11°. We used the same hyperparameters, prior dip 0°, and no-fault assumption as our previous examples. According to Figure 5a and 5b, we again achieve good performance even when the target formation thickness is large and the formation type is sand. For well C, despite that the GR signal is polluted and the raw log is quite noisy from 3048 m (10,000 ft) to 3658 m (12,000 ft), our method is still able to perform as good as the human. Furthermore, our method is insensitive to the large and abrupt changes in the RSD as the drilling bit shifts approximately 7.6 m (25 ft) from 4084 m (13,400 ft) to 4237 m (13,900 ft). For well D, the Pearson’s correlation coefficient score of our interpretation is 0.98, which is considerably higher than the score of the human’s interpretation with a value of 0.27. It indicates that our interpreted RSD sequence gives better pattern matching between the interpreted GR log and the typelog, and this is further confirmed by the fitting of the raw GR log in Figure 5bD. The performance for other correlation metrics for wells C and D is also summarized in Table 2: Our RSD estimation is on average 52% within 0.3 m (1 ft) and 100% within 1.5 m (5 ft) away from the human for well C, whereas it is on average 34% within 0.3 m (1 ft) and 100% within 1.5 m (5 ft) away from the human for well D when the geosteering geologist’s interpretations are used as the ground truth.
We applied the proposed SCPM algorithm to the interpretation of 200 horizontal wells randomly sampled from the Permian Basin in Texas, USA. The range of MD of the horizontal part for the benchmark wells is from 656 m (2151 ft) to 3167 m (10,392 ft). The global formation dips of wells also differ from one to another. We aim to show the reliability of our model under various horizontal well lengths and geologic settings.
We experimented SCPM algorithms with three choices of correlation metrics: Spearman, Pearson, and cosine, and we also considered the corresponding parallel version. Specifically, we run eight parallel SCPM algorithms with different initial values and we average the top three RSD trajectories ranked according to the cost function in equation 13. We evaluated the performance of each algorithm based on the interpretation time (in seconds) per 30 m (100 ft) and the Pearson’s correlation score between the interpreted stratigraphic log and the typelog.
We showed two boxplots in Figure 6, which summarize the distributions of the running time and the correlation score between the interpreted log and the typelog, respectively. Figure 6a shows that the cosine is the fastest correlation metric for single and parallel SCPM and that the Spearman is the slowest because it requires converting the raw values to rank before computing the Pearson’s correlation. Furthermore, the parallel version is approximately five times slower than that of the single version due to the potential communication overhead among processors, but approximately 10 s on average per 30 m (100 ft) is still sufficiently fast for geosteering purpose because it usually takes days to drill the horizontal part of a well. Figure 6b indicates that the cosine is again the best metric because it performs consistently well with low variation across different wells. We also notice that the performance of a parallel SCMP is generally better than its single version. This is because the density landscape of the posterior cost shown in equation 13 is often multimodal. Running multiple chains independently in parallel reduces the chance of SCPM being trapped into a local optimum and leads to more robust performance of the method. Compared to the geosteering geologists’ interpretations, ParSCPMs with cosine and Pearson metrics perform better and show consistently high correlation scores.
The proposed Bayesian model is sufficiently flexible to incorporate prior information during the exploration of the region of interest. For example, we might drill several horizontal wells in a drilling pad. Our understanding about the reservoir formation such as fault locations, global formation dip , and dip azimuth is updated constantly from the predrilled wells. This prior knowledge can be crucial for a successful interpretation of a new well.
We illustrated this idea via another simulation with several large fault displacements, or , in the target formation. Although our model does not take the fault into consideration, if we roughly know the location of each fault from the historical wells or seismic image data (Eidsvik and Hokstad, 2006), we can carry out another stochastic search on the possible displacement . Figure 3c illustrates the performance of such a modified SCPM on a synthetic data set with large faults, and their displacements are (), (), 2.08 m (6.83 ft), and (). The MAP estimator is still good enough achieving 71.58% within 0.3 m (1 ft) and 98.13% within 1.52 m (5 ft). Compared with Figure 3a, due to larger faults the posterior credible intervals in Figure 3c increase considerably when encountering the first fault at MD 2898 m (9509 ft). Despite missing the second and third faults, the fitted GR (the green line) is again close to the LWD GR log (the light blue line). On the other hand, without prior knowledge on faults, stochastic searching for the fault location is computationally expensive.
We have developed a Bayesian method to estimate the distance from the BHA to the target formation top, or RSD, and the associated credible intervals in real time. We have adopted the Setchell equation and formulated the stratigraphic-based geosteering approach as a Bayesian optimization. Stochastic clustering was performed after projecting the MD-based GR LWD log to RSD-based log to avoid the possible effect of outliers. The similarity of signal patterns from the typelog and projected RSD log was assessed using a correlation metric. A cost function was given to balance between the pattern similarity and our prior belief about the target formation and drilling process. Finally, the SAMC algorithm was used to obtain the posterior samples from the cost function. Besides the GR log, some horizontal wells can also be interpreted using a different nonazimuth log. The proposed algorithm is also applicable to these geosteering settings with another nonazimuthal log such as resistivity and porosity. In general, a typelog, a directional survey, and a nonazimuthal LWD log are sufficient for our algorithm to provide geosteering interpretation. However, a successful interpretation requires that the typelog is informative and that it can be considered as a representative signature of the formations.
Although the proposed algorithm is generally appropriate for most geosteering settings, there are some exceptions. As shown in our discussion, our method is able to estimate fault displacement given the prior location of the possible faults. It is also possible to detect potential faults, but searching for fault locations is computationally intensive for our model and might render the algorithm harder to run in real-time applications. Because LWD sensors are at some distance behind the drilling bit, the data gathered from LWD tools represent the formation that has already been drilled. Our cost function does not take the sequential nature of the geosteering data into account. Therefore, our model is not able to predict the future RSD at the bit and its corresponding uncertainty. However, both issues can be efficiently solved using Bayesian sequential models, which we plan to address in future work.
Our proposed method is based on a Bayesian machine-learning approach, as it defined an appropriate cost function for geosteering and then applies stochastic optimization to search for the optimal earth model. Other machine-learning approaches could be used to build upon our model formation and cost function. For example, based on the historical data, a machine-learning approach could be used to optimize certain parameters of the cost function or to set their values in an adaptive manner. Our Bayesian view of the stratigraphic-based geosteering approach integrates the prior understanding about the drilling and target formation with the real-time LWD GR log, and provides geologist with accurate estimation of RSD in real time. Our algorithm performs fast inference of RSD. For a 1524 m (5000 ft) horizontal well with 0.3 m (1 ft) per LWD GR data point, the sampling process normally completes within 1 min using a single CPU core. The method is robust to different formation types and various global regional dips; it achieves consistent good results with small faults in the target formation and outliers in the LWD GR log; it performs as well as the geosteering geologist on real horizontal wells with informative typelogs. Our model also readily incorporates the expert knowledge and other external information such as predrilled wells and seismic profile. The posterior credible intervals from the obtained samples quantify the uncertainty of the MAP estimator and help geologists make the right decisions confidently.
The authors thank the editor, associate editor, and two referees for their comments that have led to significant improvement of this paper. The authors thank Royal Dutch Shell for permission to publish this paper.
DATA AND MATERIALS AVAILABILITY
Data associated with this research are confidential and cannot be released.