Abstract
Causal factors of induced seismicity in the Permian Basin are investigated by collecting and processing data on reported earthquakes, hydraulic fracture operations, and salt water disposal (SWD). Data are collected from five online sources: (1) the TexNet Earthquake Catalog, which provides earthquake data for Texas; (2) the TexNet Injection Volume Reporting Tool, which provides daily SWD data for select Texas wells; (3) the FracFocus Chemical Disclosure Registry, which provides hydraulic fracture data to the public; and (4) B3 Insight and (5) IHS Enerdeq Browser, which are proprietary database services that provide current and historical well data through paid subscriptions. TexNet makes their data available to the public on dynamic map websites. We automate data processing and data management using Python and ArcGIS Pro tools. The workflow produces quick, reliable, consistent, and reproducible output. We developed a Python script for each collected data table to filter records, select fields, and write a new table. We created ArcGIS Pro Model Builder models for each new table to control format properties at import to the geodatabase. Further models contain customized ArcToolbox tools arranged to run geospatial, quality assurance, and quality control processing steps. In addition to discussing the source data and general workflow, we also review the results of the automated data processing. To illustrate our method, we create areas of investigation around the magnitude 5.4 Coalson earthquake and collect and process the available data to produce maps, charts, and data products for use in subsequent analysis. We make our Python scripts available on GitHub.
Introduction
Induced seismicity is actively studied in oil and gas fields across Texas and beyond. The Permian Basin of west Texas and southeastern New Mexico is a region of unconventional oil and gas production experiencing ongoing induced seismicity, for which the potential causal factors are hydraulic fracture operations and salt water disposal (SWD). Studies are initiated and mitigation efforts are enacted in response to high-magnitude events such as the magnitude 5.4 Coalson earthquake that occurred in Reeves County on 16 November 2022. This earthquake was located in the Delaware Basin, described by Ewing (2019) as a deep sedimentary basin with complex internal structure within the Permian Basin. A mix of horizontal drilling, hydraulic fracturing, and SWD is occurring in the Permian Basin. Mitigation efforts are supported by investigating oil and gas-related causal factors across time and within geologic context.
Scientists study induced seismicity by analyzing space-time relationships among earthquakes, well operations, and well activities using advanced software and modeling tools. To carry out these spatiotemporal analyses, researchers need accurate data relating to earthquake timing, location, magnitude, and depth. In addition, they need data pertaining to well location, depth, pressure, injected fluid volumes, and timing of injection into the subsurface for hydraulic fracturing operations and SWD.
The data collected and processed to investigate induced seismicity meet the criteria suggested by Ma and Mei (2021) as the four V’s of big data: volume, velocity, variety, and veracity. Voluminous and detailed well data are collected from two proprietary subscription-based databases, B3 Insight (B3) and IHS Enerdeq Browser (IHS). The earthquake data reported by TexNet are accurate, updated daily, and available to the public online. Hydraulic fracture well data for the United States are available to the public as a direct download from the FracFocus Chemical Disclosure Registry (FracFocus) website. TexNet manages a database of injection wells that provides daily injection volumes for select injection wells in Texas. Well operators voluntarily report well description information, injection volumes, and pressure readings dating back to the year 2016. This database contains a sample of injection wells operating in Texas during this time frame, and many of the participating wells are in the vicinity of ongoing seismic activity.
The situation in the field is always changing; data are constantly generated, and databases are continually updated. Scientists want to analyze the most current records, so updated data are routinely collected from the various sources and processed for further use. Mohammadpoor and Torabi (2020) suggest that powerful technologies are needed to effectively search and efficiently assemble large amounts of complex data from multiple sources. Python is applied first to the collected data tables because scripts are fast, accurate, and relatively easy to develop. Python scripts solve data issues related to file size by selecting columns and filtering records. In some instances, Python scripts automate the coding of records with new data elements that are later used to filter the table or classify the features.
We import the Python-generated files into the geographic information system (GIS) environment to continue data processing using prearranged models stored in ArcToolbox. Imported tables and resulting spatial data are stored in a geodatabase, and the files are managed using ArcCatalog. As Shafapourtehrany et al. (2023) explained, GIS is a powerful mapping and modeling environment that imports and exports many data formats. ArcGIS Pro Model Builder is used to automate data processing by creating custom models organized to run in a particular order. These models define multistep geospatial, quality assurance, and quality control (QA/QC) processing specific to the source, data table, and future use case. Data sets exported from the ArcGIS Pro environment are designed to meet specifications defined by the scientist conducting the spatiotemporal analysis. For example, data sets are designed for Savvaidis et al. (2020) to conduct space-time clustering analysis and probabilistic associations based on distance and time. Other data sets are designed for Grigoratos et al. (2022) for use in hindcasting models.
To solve problems and overcome challenges faced when preparing large amounts of data, we develop data processing workflows that use Python and ArcGIS Pro tools. In this paper, we highlight how this automated approach efficiently produces quality, consistent, and reproducible results. We describe using scripts and models to customize data table production, geospatial processing, QA/QC, and final output production. The automated workflow developed for these recurrent tasks ensures timely and reliable data sets and materials are provided to scientists for further analysis.
Data
Earthquake data, hydraulic fracture data, and SWD data are collected from five online sources: (1) the TexNet Earthquake Catalog, which provides earthquake data for Texas; (2) the TexNet Injection Volume Reporting Tool, which provides daily injection data for select Texas wells; (3) the FracFocus Chemical Disclosure Registry, which provides hydraulic fracture data to the public; and (4) B3 Insight and (5) the IHS Enerdeq Browser, which are proprietary database services that provide current and historical well data through paid subscriptions. TexNet makes their data available to the public on dynamic map websites.
Data are collected from the source websites in comma-separated values (CSV) format. Each website has different dynamic mapping functions, database query styles, and download options. Each data download is unique in contents, size, accuracy, completeness, and structure. The workflow includes collecting updated source data, processing collected files using Python, and importing Python-generated tables into the GIS environment for further processing. The workflow also includes the production of updated output, such as maps and data sets, designed for further investigation and analysis. Individual scripts and models are developed for each file we use. The complexity of the workflow depends on what data are collected and how the data are processed. The amount of time spent on data collection and processing depends on the data source, data structure, size of the area of investigation (AOI), and length of the date range.
The TexNet Earthquake Catalog dynamic map website provides information and data downloads to the public on Texas earthquakes and seismic stations used for locating earthquakes in Texas. TexNet began reporting in January 2017. Event table fields that are used for the investigation of induced seismicity include event ID, date, time, magnitude, latitude, latitude error, longitude, longitude error, depth, and depth uncertainty. Station table fields of interest include station name, network, elevation, latitude, longitude, date deployed, and date decommissioned. The Python script developed for the seismic station table adds station type classifications that are used in maps (Figure 1).
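For illustration, a minimal pandas sketch of the station classification step follows; the file name, column names, and network-to-type mapping are assumptions rather than the actual TexNet scheme.

```python
import pandas as pd

# Illustrative sketch of the station classification step; the
# file name, column names, and network-to-type mapping are
# assumptions, not the actual TexNet classification scheme.
stations = pd.read_csv("texnet_stations.csv")

network_to_type = {
    "TX": "TexNet backbone",   # hypothetical labels used for
    "4T": "TexNet temporary",  # map symbology in ArcGIS Pro
    "US": "National network",
}
stations["StationType"] = (stations["Network"]
                           .map(network_to_type)
                           .fillna("Other"))

stations.to_csv("texnet_stations_classified.csv", index=False)
```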
The TexNet High Resolution Catalog is a data set that contains relocated earthquakes in the Delaware Basin of Texas. These data are available to the public to view or download from the dynamic map website. Select earthquakes from the TexNet Earthquake Catalog were relocated using waveform cross-correlations and the GrowClust software of Trugman and Shearer (2017). This data set is periodically updated. The high-resolution catalog is contained in a single CSV file. We map these data points and classify the events by year to show how seismicity changes over time (Figure 2).
The Railroad Commission of Texas (RRC) Oil and Gas Division is the regulatory agency for the exploration, production, and transportation of oil and natural gas. They manage multiple databases with online platforms on which the public can search for data related to production, drilling permits, well records, and more. In response to high-magnitude seismicity, the RRC defines a seismic response area (SRA) and requests action by well operators within the defined area boundaries. The RRC may require or request more frequent reporting, or they may limit or suspend injection into the subsurface. The RRC webpage Seismicity Response provides data and information about notices and SRA boundaries. Currently, there are three areas in Texas where such actions have been taken: Gardendale (Midland-Odessa), North Culberson-Reeves (NCR), and Stanton.
The TexNet Injection Data Capture System allows injection well operators to share detailed information about injection wells in Texas. Data of interest include well ID, injection zone top depth, injection zone bottom depth, daily injection volumes, and daily pressure readings. The TexNet Injection Volume Reporting Tool makes these data available to the public on a dynamic map website. Data collection began in November 2021 after the RRC requested that well operators voluntarily provide daily data for disposal wells. This database is updated often as new daily records are uploaded to the data capture system and as new wells join the voluntary program. This data set contains a sample of injection wells from across Texas. As of January 2024, 490 injection wells contribute to this database. The daily injection records span from February 2016 to present. Some wells providing information to the TexNet daily database are located in the Delaware Basin and operate within the NCR SRA boundary (Figure 3). Sharing these data helps the public understand seismicity mitigation efforts.
The subscription-based data service B3 provides data on current and historical injection wells in Texas and New Mexico. The B3 database is updated frequently by technicians and contains well injection records going back many decades. Texas and New Mexico injection well data are available as direct-download zip files that each contain a set of related tables in CSV format. A routine task with B3 data is to build a data set containing well headers and monthly injection information within a defined date range and AOI. The injection well record contains well ID, latitude, longitude, injection type, and permitted injection interval depths. The monthly injection record contains the well ID, date of injection, and associated volume. The well table is joined to the monthly volume table using the unique well ID. Mapping the wells from the B3 database shows the extent of disposal wells that have existed in the Delaware Basin (Figure 3).
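A minimal sketch of this routine join, assuming illustrative file and column names rather than the actual B3 schema:

```python
import pandas as pd

# Minimal sketch of the routine B3 join; file and column names
# are illustrative, not the actual B3 schema.
wells = pd.read_csv("b3_injection_wells_tx.csv")
volumes = pd.read_csv("b3_monthly_volumes_tx.csv",
                      parse_dates=["InjectionDate"])

# Restrict monthly records to the date range of interest.
in_range = volumes["InjectionDate"].between("2017-01-01", "2023-12-31")
volumes = volumes.loc[in_range]

# Join monthly volumes to the well header on the unique B3 well ID.
joined = volumes.merge(
    wells[["WellID", "Latitude", "Longitude", "InjectionType"]],
    on="WellID", how="left",
)
joined.to_csv("b3_wells_monthly_joined.csv", index=False)
```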
The FracFocus website provides data on hydraulic fracture jobs completed in the United States beginning in 2011. These data are available to the public as a zip file download containing a set of tables in CSV format. Well operators in Texas are required by the RRC to report hydraulic fracture job information and ingredients to FracFocus. Well operators are responsible for uploading this information into the system. Hydraulic fracture job data include well American Petroleum Institute (API) number, latitude, longitude, total vertical depth, job start date, job end date, and the volume of fluid injected into the subsurface. Mapping the wells contained in the FracFocus database shows the extent of hydraulic fracture operations that have occurred in the Permian Basin (Figure 4).
The IHS platform is a subscription-based data service providing access to large volumes of historical and current oil and gas well information. This database is managed by technicians. A web-based platform is used to query the data and submit data requests. Well data are posted for download as a zip file containing a set of related CSV tables. Geospatial data describing the well architecture are available for download in shapefile format. Three shapefiles showing the well surface, well bottom, and wellbore are provided as 3D features, with limited well information in the attribute table. The spatial data are added to the GIS environment, and the Coalson earthquake is buffered to create an AOI (Figure 5).
Methods
For illustrative purposes, a radius around the magnitude 5.4 Coalson earthquake in Reeves County, west Texas, is defined as the AOI for which to collect and compile updated earthquake and well data. The source CSV files are collected, processed with Python, and imported to the ArcGIS Pro environment. There, geospatial and QA/QC processing tasks are automated in customized models, and the processing information is documented in the geoprocessing history. Final data tables exported from the GIS environment are designed to meet end-user specifications.
We develop Python scripts for source data tables to select columns of interest, filter records, and change field headings to meet best practices while writing a new file designed for easy import into other software environments. For the initial step, each data table is processed using a specific Python script to prepare a new CSV table for import to ArcGIS Pro. The new tables are imported to the geodatabase using automated models stored within ArcToolbox. Models also define additional automated tasks, such as geospatial functions and QA/QC processing.
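The following sketch shows the general shape of such a script, assuming hypothetical file and column names; the actual scripts are available on GitHub (see Data and materials availability).

```python
import pandas as pd

# A minimal sketch of the per-table preparation step; the file
# and column names are placeholders, not the actual source schema.
usecols = ["EventID", "OriginDate", "Magnitude",
           "Latitude", "Longitude", "Depth"]
events = pd.read_csv("source_download.csv", usecols=usecols)

# Filter records, e.g., keep events of magnitude 1.0 and above.
events = events[events["Magnitude"] >= 1.0]

# Rename headers to short, GIS-friendly field names (no spaces
# or special characters) before writing the import-ready table.
events = events.rename(columns={"OriginDate": "Origin_Date"})
events.to_csv("prepared_for_arcgis.csv", index=False)
```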
Collecting and processing TexNet event and seismic station data is straightforward. The two CSV files available for download are relatively small and easy to filter. For this reason, requests from scientists pertaining to these data are typically for maps or map packages showing events with complex symbology and seismic stations classified by type. Python processing includes adding new data elements to the separate tables. Time period codes representing earthquake year and quarter are added to the event records. Station type classifications are added to the records in the station table. Python-created tables are imported to the ArcGIS Pro geodatabase using models that contain the path to the input files and define table properties upon import to GIS. Geospatial models store information and instructions for spatial processing. Table data are converted into spatial data points by mapping the latitude and longitude in the defined coordinate system. Another model splits TexNet event point features by the Python-coded time period to facilitate complex symbology in which the event time period sets the symbol color and magnitude sets the symbol size.
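These geospatial steps can be sketched with arcpy as follows; the geodatabase path, table and field names, and the WGS84 coordinate system are assumptions for illustration.

```python
import arcpy

# Sketch of the geospatial model steps; the geodatabase path,
# table and field names, and WGS84 are illustrative assumptions.
arcpy.env.workspace = r"C:\projects\seismicity\seismicity.gdb"

# Convert the imported event table to points using the latitude
# and longitude fields in the defined coordinate system.
arcpy.management.XYTableToPoint(
    "texnet_events", "texnet_event_points",
    x_field="Longitude", y_field="Latitude",
    coordinate_system=arcpy.SpatialReference(4326),  # WGS84
)

# Split the event points by the Python-coded time period field so
# each period can carry its own symbol color in the map.
arcpy.analysis.SplitByAttributes(
    "texnet_event_points", arcpy.env.workspace, ["TimePeriod"]
)
```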
FracFocus is our primary source for downloading hydraulic fracture job data. Python processing here prepares the table while merging the three separate tables that make up the total data set. Columns selected for use from the source data table include the well API number, latitude, longitude, fluid volume, total vertical depth, job start date, and job end date. Because of known issues with typos and missing data in the early records of this database, an automated method was developed to identify and flag issues for wells located in Texas and New Mexico. To fix the flagged issues, substitute data elements are found in other databases or through RRC online query platforms. Some corrections can be automated but some cannot, so we developed a workflow to retain corrected records through each update. The hydraulic fracture job data set update begins with downloading the source data and Python processing. Using models, the Python-generated table is imported to the ArcGIS Pro geodatabase and converted to spatial data points according to the coordinate system defined in the record. There are three defined coordinate systems in this database, so three feature classes are created. Models transform the point features into a standardized coordinate system and merge the parts into one feature class. Models identify and flag potential issues in the records. For example, we created a model to QA/QC the job start date and job end date fields. One function of the model selects records with date issues and codes them with a note to check the record. Another function checks the two date fields and transfers the date from one field to the other for selected records. Additional models compare versions of this data set to identify and isolate records that are newly listed in the FracFocus download. Issues are identified in the new records and corrected where possible before the new records are appended to our updated version of the data set.
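A hedged sketch of the date QA/QC step follows; the flagging rules and file names are illustrative, not the exact model logic.

```python
import pandas as pd

# Hedged sketch of the date QA/QC logic; the flagging rules and
# file names are illustrative, not the exact model definition.
jobs = pd.read_csv("fracfocus_prepared.csv",
                   parse_dates=["JobStartDate", "JobEndDate"])
jobs["QAQC_Note"] = ""

# Flag records where the end date precedes the start date.
bad_order = jobs["JobEndDate"] < jobs["JobStartDate"]
jobs.loc[bad_order, "QAQC_Note"] = "check record: end date before start date"

# Where the end date is missing, transfer the start date so the
# record is retained rather than dropped by later date filters.
missing_end = jobs["JobEndDate"].isna() & jobs["JobStartDate"].notna()
jobs.loc[missing_end, "JobEndDate"] = jobs.loc[missing_end, "JobStartDate"]
jobs.loc[missing_end, "QAQC_Note"] = "end date filled from start date"

jobs.to_csv("fracfocus_qaqc.csv", index=False)
```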
Texas injection wells and New Mexico injection wells are direct-download packages from B3. The data derive from different state regulatory agencies, but the same type of information is reported. The Texas and New Mexico zip files contain two main CSV tables of interest: injection well and monthly volumes. The CSV files are processed with Python and imported to the ArcGIS Pro geodatabase using models that define the table properties at import. Geospatial and QA/QC processing models map the injection wells using coordinates found in the table and then join the well table to the monthly volume records using a unique ID assigned by B3.
Similarly, data tables downloaded from IHS are processed using Python and imported to geodatabases using models. From the IHS Well Workbook, the header table contains well description and location information. Other data of interest are contained in the survey point table and the test treatment table. The survey point table gives a detailed description of the wellbore in lateral space and depth. The test treatment table records wellbore treatments involving fluids, including historical hydraulic fracture jobs. Data related to well activities, including produced fluids and fluid volumes injected into the subsurface, are contained in the IHS Production Workbook. Related tables are joined using a 14-digit numbering system that creates unique identifiers based on the 10-digit well API number and any wellbore changes.
The header table from the IHS Well Workbook can be joined to well data sets from other sources using the well API as a unique ID. This allows manual and automated comparisons between data sets. The joined header table may contain information to correct errors and replace null values identified in the records from the other data sources. Models are designed to make table joins, select records, and define data operations. Data values are systematically transferred from one data table to another, leading to a more complete data set.
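A minimal pandas sketch of this value transfer, assuming hypothetical file, field, and ID names, with the 10-digit API number as the join key:

```python
import pandas as pd

# Minimal sketch of the value transfer; file, field, and ID names
# are hypothetical, with the 10-digit API number as the join key.
other = pd.read_csv("other_source_wells.csv", dtype={"API10": str})
ihs = pd.read_csv("ihs_well_header.csv", dtype={"API10": str})

merged = other.merge(
    ihs[["API10", "Latitude", "Longitude", "TotalDepth"]],
    on="API10", how="left", suffixes=("", "_ihs"),
)

# Transfer IHS values only where the original field is null,
# then drop the helper columns created by the join.
for field in ["Latitude", "Longitude", "TotalDepth"]:
    merged[field] = merged[field].fillna(merged[field + "_ihs"])
    merged = merged.drop(columns=field + "_ihs")

merged.to_csv("wells_completed.csv", index=False)
```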
Results
Python scripts developed for use with TexNet Earthquake Catalog data save time by automating the addition of descriptive data elements necessary for creating seismic station and event symbology in the GIS project and on exported maps. To facilitate splitting and grouping the events contained in the table, the script adds a time period code and parses the date into separate fields for the month, day, and year (Table 1).
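A sketch of these additions in pandas follows; the field names and the year-quarter code format are assumptions for illustration.

```python
import pandas as pd

# Sketch of the descriptive elements added to the event table;
# field names and the year-quarter code format are assumptions.
events = pd.read_csv("texnet_events.csv", parse_dates=["Origin_Date"])

# Parse the origin date into separate month, day, and year fields.
events["Year"] = events["Origin_Date"].dt.year
events["Month"] = events["Origin_Date"].dt.month
events["Day"] = events["Origin_Date"].dt.day

# Add a year-quarter time period code used later to split and
# group the event points for symbology.
events["TimePeriod"] = (events["Year"].astype(str) + "Q"
                        + events["Origin_Date"].dt.quarter.astype(str))

events.to_csv("texnet_events_coded.csv", index=False)
```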
Python scripts developed for use with TexNet data downloads are available on GitHub. Also available is a Python script developed to calculate and convert the TexNet Injection Volume Reporting Tool data from daily volumes to monthly volumes. This converted monthly data can be compared or compiled with monthly injection data from other data sources.
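A minimal sketch of such a daily-to-monthly conversion in pandas; the published script on GitHub is the authoritative version, and the names here are placeholders.

```python
import pandas as pd

# Minimal sketch of the daily-to-monthly conversion; file and
# column names are placeholders for the reporting tool download.
daily = pd.read_csv("texnet_daily_injection.csv",
                    parse_dates=["InjectionDate"])

# Sum daily barrels into calendar-month totals per well so the
# result is comparable with monthly volumes from B3 or IHS.
monthly = (
    daily.groupby(["WellID", daily["InjectionDate"].dt.to_period("M")])
    ["VolumeBBL"].sum()
    .reset_index()
    .rename(columns={"InjectionDate": "InjectionMonth"})
)
monthly.to_csv("texnet_monthly_injection.csv", index=False)
```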
Tableau Desktop is a data-driven visual analytics platform that creates many types of charts and graphs. To compare multiple data sets over time, we make dual-axis charts with time synchronized on the x-axis and two measures plotted on the left and right y-axes. We use the automated workflow to prepare data for import to Tableau. Within this workflow, TexNet events are spatially joined to areal features, and the area name is added to the event record. Tableau uses the area name field to classify the event data within the chart (Figure 6).
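The spatial join step might be sketched with arcpy as follows; the feature class names are assumptions, and the table export assumes ArcGIS Pro 3.x, where arcpy.conversion.ExportTable is available.

```python
import arcpy

# Sketch of the Tableau preparation step; feature class names and
# paths are assumptions, and ExportTable assumes ArcGIS Pro 3.x.
arcpy.env.workspace = r"C:\projects\seismicity\seismicity.gdb"

# Spatially join events to areal features so each event record
# carries the name of the area it falls within.
arcpy.analysis.SpatialJoin(
    target_features="texnet_event_points",
    join_features="investigation_areas",
    out_feature_class="events_with_area",
    join_operation="JOIN_ONE_TO_ONE",
    match_option="WITHIN",
)

# Export the joined attribute table for import to Tableau.
arcpy.conversion.ExportTable("events_with_area",
                             r"C:\projects\events_with_area.csv")
```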
Python scripts and ArcGIS Pro models developed for the hydraulic fracture update save time and minimize concern about lost records. During QA/QC processing, a short description of any identified issue is coded into our data set, making the issue easy to address. Substitute values can be found in other databases or through RRC online queries. To date, approximately 100 records for wells located in Texas have been identified as containing issues. These issues could lead to those records being dropped because of errors in longitude and latitude or other fields commonly used for filtering the database. Two hydraulic fracture records located within a 30 km radius of the Coalson event were found to have a typo in the longitude field (Figure 7). The two records identified here are for hydraulic fracture jobs from the year 2012. The corrected records will remain in our version of the data set through future updates.
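One way to catch such coordinate typos is a bounding-box check, sketched below; the bounds are rough approximations of the Texas-New Mexico extent, and the column names are illustrative.

```python
import pandas as pd

# Illustrative bounding-box check for coordinate typos; the bounds
# roughly approximate the Texas-New Mexico extent, and the column
# names are placeholders.
jobs = pd.read_csv("fracfocus_prepared.csv")

out_of_bounds = ~(jobs["Longitude"].between(-109.1, -93.5)
                  & jobs["Latitude"].between(25.8, 37.0))
jobs.loc[out_of_bounds, "QAQC_Note"] = "check record: coordinates out of bounds"

# Review the flagged records, e.g., a longitude missing its sign.
print(jobs.loc[out_of_bounds, ["APINumber", "Latitude", "Longitude"]])
```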
Discussion
Developing Python scripts and automated tools for processing data and exporting data sets used to investigate induced seismicity in Texas is warranted. High-quality and up-to-date data sets are necessary for conducting advanced research and analysis of ongoing seismicity. Creating large, complex data sets over a large AOI and date range involves numerous steps and takes significant time to complete. Doing this work manually requires great care to avoid missteps or mistakes.
Automating the steps of this workflow using Python scripts and ArcGIS Model Builder models saves time and produces consistent results. Data exported from ArcGIS are designed to meet defined specifications, and the results from the automated data processing are reliable and reproducible. Our proposed methodology, using automated tools to customize these workflows, ensures the produced data set will be compatible with the intended use case. This data processing and data set production work is efficiently carried out using these powerful tools, so scientists are able to conduct an accurate investigation, analysis, and assessment of induced seismicity.
Conclusion
Using Python to strip unnecessary or unwanted columns and records from tables while creating new universally compatible field headings immediately solves challenges met when importing data. The GIS environment can easily manage large data sets related to well architecture, operations, and activities. ArcGIS Pro contains powerful tools for automating complex data processing workflows using Model Builder and ArcToolbox.
This method was developed to meet the need to produce quality maps and data sets consistently and efficiently. Python and the ArcGIS Pro environment are used together in this organized workflow to effectively process the large amount of data that are required for induced seismicity studies across Texas. Data collected from the various sources are swiftly processed using these automation tools, resulting in quality map and data set production.
Acknowledgments
The authors would like to thank the State of Texas for funding this research through TexNet. The authors would like to thank the RRC and FracFocus for providing data to the public. Also, the authors would like to thank B3 and IHS for providing access to their databases through subscriptions. The authors would like to acknowledge ESRI for creating and supporting the ArcGIS Pro software and Salesforce for Tableau Desktop.
Data and materials availability
Data associated with this research are available and can be accessed via the following URLs:
B3 Insight — https://www.b3insight.com/
FracFocus Chemical Disclosure Registry — https://fracfocus.org/data-download
IHS Enerdeq Browser — https://my.ihs.com/energy
TexNet Earthquake Catalog — https://catalog.texnet.beg.utexas.edu/
TexNet High Resolution Catalog — https://hirescatalog.texnet.beg.utexas.edu/
TexNet Injection Data Capture System — https://injection.texnet.beg.utexas.edu/
Texas Railroad Commission Oil and Gas records — https://www.rrc.texas.gov/oil-and-gas/research-and-statistics/obtaining-commission-records/oil-and-gas-well-records/
Texas Railroad Commission Seismicity Response — https://www.rrc.texas.gov/oil-and-gas/applications-and-permits/injection-storage-permits/oil-and-gas-waste-disposal/injection-disposal-permit-procedures/seismicity-review/seismicity-response/
GitHub repository for TexNet-related Python scripts (py4_texnet_eqcat) — https://github.com/ut-beg/py4_texnet_eqcat
Texas Railroad Commission Notice to Oil and Gas Operators — https://www.rrc.texas.gov/media/zwnay4sj/2021-nto-texnet-volume-reporting-tool-11-03-2021.pdf
Biographies and photographs of the authors are not available.