This contribution is an introduction to the thematic collection ‘Digitally enabled geoscience workflows: Unlocking the power of our data’. The goal of the collection is to show how advances in data science are transforming the process of scientific research and fuelling a new generation of energy geoscience workflows. These workflows are providing game-changing advances in terms of time saved on complex tasks, improved consistency and repeatability of interpretation, and more effective use of scarce experienced geoscientists. Eight articles have been accepted for publication as part of this thematic collection, five in Petroleum Geoscience and three in Geoenergy. We provide a short summary of each of these contributions and hope that this collection will provide inspiration and examples of the breadth of workflows that can be transformed by embracing the coming wave of digital technologies.

This thematic collection resulted from an open call for papers on the theme of ‘Digitally enabled geoscience workflows: Unlocking the power of our data’. Eight contributions have been accepted for publication, five in Petroleum Geoscience and three in Geoenergy. Although the energy geoscience industry typically employs statistical workflows that are highly data-intensive, it has been relatively slow to adopt modern data-science technologies. This is a result of historical reliance on established methods, the cost and complexity of adopting new technologies, and cultural and organizational challenges. However, with improved computing power and growing interest in data science, this is now changing rapidly, with the development and application of data-driven workflows now an active area of research in energy geoscience. It is expected that the publication of research contributions in this area will continue to accelerate, with this collection providing a useful summary of the current key emerging themes.

Thematic collection: This article is part of the Digitally enabled geoscience workflows: unlocking the power of our data collection available at: https://www.lyellcollection.org/topic/collections/digitally-enabled-geoscience-workflows

The goal of this collection is to showcase how advances in data science are transforming the process of scientific research and fuelling a new generation of workflows, including:

  • Ingestion, clean-up and interpretation of large geological datasets, such as subsurface well logs and markers

  • Data analytics and visualization, e.g. dashboards to summarize prospect evaluation data or field production

  • Application of Industrial Internet of Things (IIoT) and edge computing for remote monitoring of equipment and maintenance scheduling

  • Time-series analysis, e.g. for automated history matching or event detection in industrial components

  • Computer vision techniques to automate processing and interpretation of seismic datasets; or analysis, classification and segmentation of thin section images or core photographs

  • Generative artificial intelligence (GenAI) and large language models (LLMs) for extracting and summarizing useful information from unstructured text sources such as drilling reports, as sketched below
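To make the last of these concrete, the sketch below shows how an LLM might be prompted to pull structured events out of a drilling-report snippet. It assumes the openai Python client; the model name, report text and output schema are purely illustrative assumptions, not details from any paper in this collection.

```python
# Minimal sketch: extracting structured events from a drilling-report
# snippet with an LLM. Model name and schema are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

report_snippet = (
    "06:00 POOH to 2450 m due to tight hole. "
    "08:30 Circulated hi-vis pill; losses estimated at 15 bbl/hr."
)

prompt = (
    "Extract drilling events from the report below as JSON: an object with "
    'one key, "events", holding a list of {time, event_type, depth_m, '
    "remarks} records.\n\n" + report_snippet
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},  # request parseable JSON
)

events = json.loads(response.choices[0].message.content)["events"]
print(events)
```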

These techniques are providing game-changing advances in subsurface workflows in terms of time saved on complex tasks, improved consistency and repeatability of interpretation, and more effective use of scarce experienced geoscientists.

Papers included in this collection describe a diverse range of digitally enabled applications, including automation, machine learning, numerical modelling and intelligent data processing, and their relevance to the field of energy geosciences.

Pantaleo et al. (2024) estimate CO2 saturation using a deep-learning method in which synthetic seismic data are used to train the model. The research proposes a new method that combines deep learning with feature extraction, aiming to enhance the accuracy of predicted CO2 saturation maps. Furthermore, the two-dimensional continuous wavelet transform is applied to analyse the seismic shot gathers at different scales, and a transfer-learning approach is adopted to improve network performance in noisy conditions.
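For readers unfamiliar with the wavelet step, the minimal Python sketch below applies a continuous wavelet transform, trace by trace, to a synthetic shot gather. It assumes the PyWavelets (pywt) package and illustrates only the multi-scale decomposition idea, not the authors' network or training scheme.

```python
# Multi-scale analysis of a synthetic seismic shot gather with a CWT.
import numpy as np
import pywt

nt, ntraces, dt = 512, 48, 0.002           # samples, traces, sample rate (s)

# Synthetic gather: a linearly dipping arrival in weak random noise.
gather = 0.05 * np.random.default_rng(0).standard_normal((nt, ntraces))
for ix in range(ntraces):
    it = int((0.2 + 0.001 * ix) / dt)      # arrival time increases with offset
    gather[it, ix] += 1.0

scales = np.array([4, 8, 16, 32])          # dyadic scales (in samples)
coeffs = np.empty((len(scales), nt, ntraces))
for ix in range(ntraces):
    c, _ = pywt.cwt(gather[:, ix], scales, "mexh", sampling_period=dt)
    coeffs[:, :, ix] = c

# coeffs[k] is the gather filtered at scale k: small scales emphasize
# sharp arrivals, large scales emphasize low-frequency background.
print(coeffs.shape)  # (4, 512, 48)
```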

Stewart (2024) proposes an innovative workflow for mapping hydrodynamic traps in areas where well pressure data are sparse or even absent. The workflow uses a probabilistic approach in which all reasonable combinations of hydraulic gradient, azimuth and tilt amplification factor are searched. The method can be implemented manually but is also suitable for automation, and can handle uncertainty in pressures and fluid properties in underexplored settings where detailed pressure and structural information may be lacking.
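The exhaustive, probabilistic flavour of such a search can be sketched in a few lines of Python. The parameter ranges, trap geometry and closure test below are illustrative assumptions, not values or logic taken from Stewart (2024):

```python
# Sweep over hydrodynamic trapping parameters (all values hypothetical).
import itertools
import numpy as np

gradients = np.linspace(0.5, 5.0, 10)      # hydraulic gradient (m/km)
azimuths = np.arange(0, 360, 30)           # flow azimuth (degrees)
tafs = np.linspace(1.0, 5.0, 9)            # tilt amplification factor

crest_xy = np.array([0.0, 0.0])            # hypothetical trap crest (km)
spill_xy = np.array([2.0, 0.0])            # hypothetical spill point (km)
relief_m = 40.0                            # structural relief crest->spill

trapped = 0
combos = list(itertools.product(gradients, azimuths, tafs))
for grad, az, taf in combos:
    # Contact tilt (m/km) = hydraulic gradient x amplification factor,
    # with the contact dipping along the given flow azimuth.
    tilt = grad * taf
    flow_dir = np.array([np.cos(np.radians(az)), np.sin(np.radians(az))])
    # Rise of the tilted contact between crest and spill point (m).
    contact_rise = tilt * flow_dir @ (spill_xy - crest_xy)
    # Simplistic test: closure survives if the contact rise stays below
    # the structural relief (negative rise trivially passes).
    if contact_rise < relief_m:
        trapped += 1

print(f"Trap retained in {trapped / len(combos):.0%} of combinations")
```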

Sahu et al. (2024) demonstrate a classical supervised machine-learning model for propagating image-based rock classes derived from core photographs to well intervals that lack drill core. Image preprocessing is applied to core photographs and 2D computed tomography (CT) scan images to extract image-based rock classes. A random-forest classifier is trained on the rock classes and conventional wireline log data, with k-fold cross-validation used to tune the model hyperparameters. The resulting trained model is applied to classify conventional well logs from well intervals lacking core. When compared with rock classifications obtained by conventional methods, the authors report that their model achieves an average accuracy of 94%. This workflow shows great promise in reducing interpretation time and minimizing the need for costly core recovery and imaging.
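A minimal scikit-learn sketch of this propagation step is given below; the log names, class labels and hyperparameter grid are hypothetical stand-ins for the authors' data:

```python
# Propagating image-derived rock classes to uncored intervals with a
# cross-validated random forest (synthetic stand-in data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
# Training data: wireline logs (e.g. GR, RHOB, NPHI, DT) from cored
# intervals, labelled with image-derived rock classes 0-3.
X_cored = rng.standard_normal((600, 4))
y_classes = rng.integers(0, 4, 600)

# k-fold cross-validation to tune the forest's hyperparameters.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_cored, y_classes)

# Propagate classes to uncored intervals using their logs alone.
X_uncored = rng.standard_normal((200, 4))
predicted_classes = search.best_estimator_.predict(X_uncored)
print(search.best_params_, predicted_classes[:10])
```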

Sahu and Roy (2023, 2024) analyse fracture clustering in outcrop analogues and integrate those results with streamline flow simulation using TRACE3D to generate insights into the influence of fracture network parameters on subsurface fluid flow. They describe a ‘lacunarity’ parameter that quantifies the scale-dependent clustering of fractures in terms of the distribution of spaces or gaps in a pattern as a function of scale. This parameter can distinguish between fracture networks that have a similar fractal dimension, allowing prediction of connectivity and flow behaviour in fractured reservoirs. This has implications for the important task of calibrating discrete fracture network (DFN) models against subsurface fluid flow behaviour.
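The lacunarity idea can be illustrated with the standard gliding-box estimator, in which lacunarity(r) = <m^2>/<m>^2 and m is the occupied-pixel count in a box of side r. The Python sketch below uses this generic estimator on a synthetic binary fracture map; it is not necessarily the authors' exact implementation:

```python
# Gliding-box lacunarity of a binary fracture map (generic estimator).
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lacunarity(binary_map: np.ndarray, box_size: int) -> float:
    # All positions of an r x r gliding box over the map.
    windows = sliding_window_view(binary_map, (box_size, box_size))
    mass = windows.sum(axis=(2, 3)).ravel()   # box "mass" m per position
    # <m^2>/<m>^2 = var(m)/mean(m)^2 + 1
    return mass.astype(float).var() / mass.mean() ** 2 + 1.0

# Synthetic map: clustered vs. uniformly scattered fracture pixels can
# share a fractal dimension yet differ in lacunarity.
rng = np.random.default_rng(2)
fracture_map = (rng.random((256, 256)) < 0.05).astype(int)

for r in (2, 4, 8, 16, 32):
    print(f"box {r:2d}: lacunarity = {lacunarity(fracture_map, r):.3f}")
```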

Fathi et al. (2024) apply a physics-informed machine-learning approach to well placement and completion optimization in an unconventional reservoir. Several model types are tested, with an extra trees regressor displaying the best performance. The model is used to identify optimal parameters for maximizing gas production, to rank wells and flag poorly performing ones that could be candidates for intervention, and to identify optimal well spacing. Ultimately, the study shows that an approach using shorter stage lengths and higher sand-to-water ratios can increase cumulative gas production by up to 8% in the target reservoir, while reducing human bias in data analysis and evaluation.
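The surrogate-model optimization loop at the heart of such studies can be sketched as follows; the completion features, ranges and toy response below are hypothetical, not the authors' dataset:

```python
# Fit an extra trees regressor to completion parameters vs. cumulative
# gas, then sweep candidate designs (all data synthetic).
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(3)
n = 400
stage_length = rng.uniform(30, 90, n)          # m
sand_water_ratio = rng.uniform(0.5, 2.0, n)    # dimensionless
well_spacing = rng.uniform(150, 500, n)        # m
X = np.column_stack([stage_length, sand_water_ratio, well_spacing])
# Toy response: shorter stages and higher sand/water help; spacing is
# deliberately given no effect in this synthetic example.
y = 10 - 0.05 * stage_length + 2.0 * sand_water_ratio + rng.normal(0, 0.5, n)

model = ExtraTreesRegressor(n_estimators=300, random_state=0).fit(X, y)

# Sweep candidate designs and pick the one the surrogate rates highest.
candidates = np.array(
    [[sl, swr, ws]
     for sl in np.linspace(30, 90, 7)
     for swr in np.linspace(0.5, 2.0, 7)
     for ws in np.linspace(150, 500, 7)]
)
best = candidates[model.predict(candidates).argmax()]
print("surrogate-optimal (stage length, sand/water, spacing):", best)
print("feature importances:", model.feature_importances_)
```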

Siler (2023) outlines new methods to predict the locations of deep-circulation, fault-hosted geothermal systems in the Great Basin. The approach identifies eight types of structural discontinuity that are expected to act as loci of hydrothermal upwelling: fault bends, horsetail fault terminations, stepovers, fault intersections, inward- and outward-dipping accommodation zones, displacement transfer zones and transtensional pull-aparts. Boundary element modelling, implemented in MATLAB, is applied to synthetic 3D models of these structural discontinuity types to track the distribution of strain in the surrounding crust. The results show that fault stepovers and terminations host the largest number of hydrothermal systems in the Great Basin, and have the largest and most localized stress and strain concentrations associated with them, providing process-based explanations for the observed distribution of exploitable hydrothermal systems.
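The underlying principle, that stress and strain concentrate near fault tips and within stepovers, can be illustrated with a drastically simplified 2D antiplane analogue, in which each slipping fault segment is represented by a dipole of analytical screw dislocations and the stresses superpose. This is a conceptual Python sketch only and does not reproduce Siler's 3D MATLAB workflow:

```python
# Stress concentration around an en echelon fault stepover, using
# analytical screw-dislocation (antiplane) fields as a crude analogue of
# boundary element superposition. Geometry and constants are illustrative.
import numpy as np

MU, B = 30e9, 1.0   # shear modulus (Pa) and slip magnitude (m)

def screw_stress(x, y, x0, y0, b):
    """Antiplane stresses (s_xz, s_yz) of a screw dislocation at (x0, y0)."""
    dx, dy = x - x0, y - y0
    r2 = dx**2 + dy**2 + 1e-6          # small offset avoids the singularity
    k = MU * b / (2 * np.pi)
    return -k * dy / r2, k * dx / r2

# Two right-stepping en echelon faults (tip coordinates in metres).
faults = [((-3000.0, 0.0), (-250.0, 0.0)),
          ((250.0, 500.0), (3000.0, 500.0))]

xs = np.linspace(-4000, 4000, 161)
ys = np.linspace(-2000, 2500, 91)
X, Y = np.meshgrid(xs, ys)

sxz = np.zeros_like(X)
syz = np.zeros_like(X)
for (xa, ya), (xb, yb) in faults:
    # Uniform slip on a segment == +b dislocation at one tip, -b at the other.
    for (x0, y0), b in (((xa, ya), +B), ((xb, yb), -B)):
        a, c = screw_stress(X, Y, x0, y0, b)
        sxz += a
        syz += c

shear = np.hypot(sxz, syz)  # peaks near fault tips and within the stepover
iy, ix = np.unravel_index(shear.argmax(), shear.shape)
print(f"max shear-stress concentration at x={xs[ix]:.0f} m, y={ys[iy]:.0f} m")
```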

Osah and Howell (2023) apply machine learning to the large, multivariate problem of predicting oil field performance, using a database of 60 oil fields from the UK continental shelf. Using principal component analysis (PCA), the large number of potentially influential parameters is reduced to five key ones: gross depositional environment, average permeability, net-to-gross, gas/oil ratio and number of wells. The two outcomes examined are recovery factor and maximum field rate. Five different machine-learning algorithms were tested, with support vector regression producing good results for both outcome measures, depending on the kernel function selected. The approach shows the potential for statistical prediction methods to be applied to multivariate problems in field performance.
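As a flavour of the reduce-then-regress workflow, the sketch below standardizes a synthetic field database, projects it onto five principal components and cross-validates kernel SVR models. Note that the paper reduces the inputs to five key measured parameters, whereas this simpler sketch regresses on principal components; all data here are synthetic stand-ins for the UKCS database:

```python
# PCA reduction followed by kernel support vector regression, with the
# kernel choice compared by cross-validation (synthetic data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.standard_normal((60, 12))   # 60 fields, 12 candidate parameters
recovery_factor = (
    0.3 + 0.05 * X[:, 0] - 0.03 * X[:, 3] + rng.normal(0, 0.02, 60)
)

for kernel in ("rbf", "linear", "poly"):
    model = make_pipeline(
        StandardScaler(), PCA(n_components=5), SVR(kernel=kernel)
    )
    score = cross_val_score(
        model, X, recovery_factor, cv=5, scoring="r2"
    ).mean()
    print(f"{kernel:>6s} kernel: mean CV R^2 = {score:.2f}")
```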

We hope that this collection will provide inspiration and examples of the breadth of workflows that can be transformed by embracing the coming wave of digital technologies. With the rapid development of new technologies from data acquisition to analysis, we hope to see a ‘Cambrian Explosion’ of contributions to the field and look forward to reaping the benefits and new insights this will reveal.

DA: writing – original draft (equal), writing – review & editing (equal); RF: writing – original draft (equal), writing – review & editing (equal); PW: writing – original draft (equal), writing – review & editing (equal).

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.