Abstract
Like many other types of data in the energy industry, electronically stored well data fall into two categories: (1) highly structured data held in relational or object databases and (2) data contained in documents of various formats (TIFF, JPG, PDF, XLS, etc.) that are usually gathered in folders in semistructured or unstructured form. Typically, these data break down into 20% structured data versus 80% semistructured or unstructured data, a split in line with what is observed for other types of data across the industry. This situation limits the ability to make informed decisions, because geoscientific software and risk-assessment analytic systems operate only on structured data. Current practices for extracting data and metadata from unstructured documents are mainly manual and costly. Data model limitations of the most prevalent databases further hinder the capture of unstructured data. We discuss a feasibility study on accessing the 11,500 well headers and 450,000 documents from the United Kingdom Continental Shelf (UKCS) that were released by Common Data Access Limited (CDAL, a wholly owned subsidiary of Oil and Gas UK, funded by 55 operators to share subsurface E&P data) as part of its 2016 Unstructured Data Challenge initiative. A cost-effective solution based on emerging machine learning technology, “taught” and guided by data-management experts, can support the reliable indexing and cataloging of these forms of data, paving the way for much more reliable E&P business decisions in the future.