GEOTRACES is an international program involving scientists from 35 nations. Its primary goal is an improved understanding of biogeochemical cycles and the large-scale distribution of trace elements and their isotopes (TEIs) in the ocean (www.geotraces.org) (Lam and Anderson 2018 this issue). Some trace elements are critical for marine life and influence the functioning of ocean ecosystems and the global carbon cycle (Lohan and Tagliabue 2018 this issue); others are of concern as contaminants or toxins (Hatje et al. 2018 this issue). Thus, understanding the role of trace elements in the marine environment has become a top geoscience priority.
This Toolkit describes the ongoing efforts to harmonise the global, high-quality geochemical datasets produced through GEOTRACES. The ultimate goal of the GEOTRACES initiative is to provide large data products that can be used across multiple disciplines in Earth science. It is hoped that the methodologies we describe here might serve as a model for other programs that seek to collect, standardise, aggregate and release data to the science community. The authors of this Toolkit served as co-chairs of the Data Management Committee of the GEOTRACES program between 2013 and 2017.
GEOTRACES has been releasing a series of Intermediate Data Products (IDPs) during the active observational phase of the program (2010–2025). The first IDP was released in 2014 (Mawji et al. 2015). It included data from 15 research cruises undertaken by 7 nations in the Atlantic, Arctic, Southern and Indian Oceans between 2007 and 2012. The second IDP, released in 2017 (Schlitzer et al. 2018), nearly doubled the volume of data contained in the previous release and expanded observations to the Pacific Ocean. The 2017 data release (IDP2017) also included biological, aerosol and rain parameters, which should be useful for characterising internal cycling and atmospheric sources of TEIs.
Both the 2014 and 2017 GEOTRACES Data Products consist of two parts: (1) a digital data compilation of discrete sample datasets, including TEIs and other parameters; (2) an electronic atlas that provides plots and animated 3-D scenes of the data for different ocean regions. The discrete sample datasets in IDP2017 include data for 458 parameters measured at 1,810 stations visited during 39 cruises undertaken between 2007 and 2014 (Fig. 1). In total, the 2017 discrete data compilation is the result of 46,794 unique sample analyses. An additional feature of the IDP2017 release is that it provides data quality flags, which use the simple International Oceanographic Data and Information Exchange quality flagging scheme (www.iode.org/mg54_3), and 1σ data uncertainty values, where available. Both can be used for data filtering. Furthermore, the IDP2017 digital data are accompanied by two useful sets of metadata: (1) full cruise reports, with detailed documentation of ship operations, sampling equipment and procedures; (2) information about data originators, data sources, sample processing and analytical methods, including a searchable reference database that links to the original publications of the data. To clarify how the product was developed, a flow chart outlines the processes used for constructing IDP2017 and the roles of the different GEOTRACES groups and committees that were involved in building it (Fig. 2). It is anticipated that the next GEOTRACES IDP will be released in 2021.
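As a rough illustration of how the quality flags and 1σ uncertainties might be used for filtering, the following Python sketch keeps only values flagged as good and, optionally, drops values whose relative uncertainty exceeds a chosen threshold. The record layout, field names, example values and threshold are all hypothetical and do not reflect the actual IDP2017 file structure. The flag meanings (1 = good, 2 = not evaluated, 3 = questionable, 4 = bad, 9 = missing) are assumed from the IODE scheme and should be checked against the documentation shipped with the data product.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Flag meanings assumed from the IODE primary-level scheme; verify before use.
GOOD, NOT_EVALUATED, QUESTIONABLE, BAD, MISSING = 1, 2, 3, 4, 9


@dataclass
class Measurement:
    """One discrete sample value (hypothetical layout, not the IDP schema)."""
    station: str
    depth_m: float
    value: float
    flag: int
    sigma: Optional[float] = None  # 1-sigma uncertainty, if reported


def filter_measurements(
    records: Iterable[Measurement],
    accepted_flags: frozenset = frozenset({GOOD}),
    max_rel_sigma: Optional[float] = None,
) -> List[Measurement]:
    """Keep records with an accepted flag and an acceptable relative uncertainty.

    Records without a reported sigma are kept, because uncertainties are
    only provided where available.
    """
    kept = []
    for rec in records:
        if rec.flag not in accepted_flags:
            continue
        if max_rel_sigma is not None and rec.sigma is not None and rec.value != 0:
            if abs(rec.sigma / rec.value) > max_rel_sigma:
                continue
        kept.append(rec)
    return kept


# Made-up example values, for illustration only.
samples = [
    Measurement("St01", 10.0, 0.25, GOOD, 0.01),
    Measurement("St01", 100.0, 0.40, QUESTIONABLE, 0.02),
    Measurement("St01", 500.0, 0.60, GOOD, 0.20),
    Measurement("St01", 1000.0, 0.70, GOOD),
]
good = filter_measurements(samples, max_rel_sigma=0.10)
```

Whether to retain values that lack a reported uncertainty, or values flagged as not evaluated, is a choice that depends on the application; the sketch keeps the former and drops the latter by default.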
Active data management is essential for promoting data sharing and collaboration amongst the broader oceanographic and Earth science community. Oversight of IDP production is coordinated by the Data Management Committee, with the compilation of data and the associated metadata from GEOTRACES cruises carried out primarily by the GEOTRACES International Data Assembly Centre (www.bodc.ac.uk/geotraces). GEOTRACES investigators can submit their data either directly to the GEOTRACES Data Centre or via one of four national data centres in the US, Japan, France or the Netherlands. An important contribution to data quality is the GEOTRACES requirement that data be intercalibrated in accordance with rules set by the Standards and Intercalibration Committee. Such intercalibration is fundamental for assuring the robustness of the IDPs because it allows direct comparison of TEI data produced by different investigators, in different regions and at different times. Formalising the names and definitions of GEOTRACES parameters and cruises was also a key step in building the IDPs. The work of both the Standards and Intercalibration Committee and the Parameter Definition Committee has proved crucial in preparing the data products.
A key aspect of GEOTRACES is the participation of many nations. However, because individual basin-scale cruises (‘sections’) (Fig. 1) use different sampling equipment and produce analyses in different laboratories from around the world, it is essential to ensure that the data generated are precise, accurate and comparable. Thus, sampling methods for dissolved and particulate constituents must take representative (of the water depth/water mass) and uncontaminated samples. Such samples must be stored (or immediately analysed) in a fashion that preserves their chemical compositions and speciation, and the analyses of these samples must yield accurate data for such parameters as concentration, activity, isotopic composition and chemical speciation. Procedures for acquiring data on TEIs have been documented in the GEOTRACES Cruise and Methods Manual (the ‘Cookbook’). Because methods continually evolve, the GEOTRACES Standards and Intercalibration Committee monitors advances, as validated by the intercalibration activities, and then modifies procedures accordingly. GEOTRACES coordinated two intercalibration cruises early in the program: one in the subtropical Atlantic Ocean and the other in the subtropical Pacific Ocean (see suite of papers in Limnology and Oceanography Methods, 2012, vol. 10 issue 6). From these cruises, a series of reference materials were generated that could be distributed to laboratories to aid with method development and validation, as well as enabling quality control and traceability of sample data. GEOTRACES is continuing with active intercalibration exercises for specific types of data.
All data in the GEOTRACES IDPs adhere to specified standards. An intercalibration report must accompany data submissions, documenting the sampling, sample processing, measurement and intercalibration procedures that were used. Each report is examined to ensure samples were collected according to the relevant ‘Cookbook’ protocols and that methods, blanks and detection limits were all documented. External evaluation is achieved by analyses of suitable certified reference materials, or consensus materials. Importantly, most GEOTRACES sections use ‘crossover’ stations whereby two cruises occupy the same station, thereby allowing for a direct intercalibration of data. Where this is not possible (e.g. a second nation has not yet completed their section), groups collect a replicate set of samples to be analysed in a second, independent laboratory.
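To make the idea of a crossover comparison concrete, the sketch below interpolates one cruise's profile onto the sample depths of another cruise at the same station and reports the percentage difference at each overlapping depth. This is only an illustration under simplifying assumptions, not the procedure used by the Standards and Intercalibration Committee: the profiles are made up, the function names are hypothetical, and the comparison is made on depth using linear interpolation, whereas a real assessment might compare on density surfaces and would take analytical uncertainties into account.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

# A profile is a list of (depth in m, concentration) pairs sorted by
# strictly increasing depth.
Profile = Sequence[Tuple[float, float]]


def interpolate(profile: Profile, depth: float) -> Optional[float]:
    """Linearly interpolate a profile at `depth`; None if outside its range."""
    depths = [d for d, _ in profile]
    if not depths or depth < depths[0] or depth > depths[-1]:
        return None
    i = bisect_left(depths, depth)
    if depths[i] == depth:
        return profile[i][1]
    (d0, c0), (d1, c1) = profile[i - 1], profile[i]
    return c0 + (c1 - c0) * (depth - d0) / (d1 - d0)


def crossover_offsets(cruise_a: Profile, cruise_b: Profile) -> List[Tuple[float, float]]:
    """Return (depth, percent difference of A relative to B) at A's sample depths."""
    offsets = []
    for depth, conc_a in cruise_a:
        conc_b = interpolate(cruise_b, depth)
        if conc_b is None or conc_b == 0:
            continue  # no overlap at this depth, or ratio undefined
        offsets.append((depth, 100.0 * (conc_a - conc_b) / conc_b))
    return offsets


# Made-up profiles from two hypothetical cruises occupying the same station.
cruise_a = [(100.0, 0.22), (500.0, 0.50), (1500.0, 0.66)]
cruise_b = [(50.0, 0.10), (150.0, 0.30), (1000.0, 0.60)]
offsets = crossover_offsets(cruise_a, cruise_b)

for depth, pct in offsets:
    print(f"{depth:7.1f} m  {pct:+.1f} %")
```

Depths sampled by the first cruise that fall outside the depth range of the second are skipped rather than extrapolated, since extrapolated values would not be a fair basis for comparison.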
The assessment of external intercomparability differs for each group of TEIs, depending on the type of TEI, the current state of the art of the available analytical techniques, the variability of the signal as a function of time, and the nature of data acquisition (e.g. sensor vs. bottle). Participation of groups in other international programs (e.g. GO-SHIP, CLIVAR and SOLAS, each of which has its own intercalibration procedures) is also taken into consideration when assessing data submissions. All data for TEIs in IDP2017 have undergone rigorous quality checks by carefully following the intercalibration and crossover station procedures (pathway 1; http://www.geotraces.org/dp/submit-data/flow-chart).
During the production of GEOTRACES IDP2017, a strict convention for naming and defining each of the 400+ parameters was imposed. Parameter names follow a six-token format, in which each token describes one of six attributes: 1_Element/compound, [2_Oxidation State], [3_Atomic Mass], 4_Phase, 5_Data Type, and 6_Sampling System; tokens 2 and 3 are optional. This structure allows for the intuitive searching of data and provides a mechanism for expanding the list of parameters to accommodate the more than 1,000 parameters that are expected by the time the GEOTRACES Final Data Product is compiled around 2025. For IDP2017, the parameter names supplied by contributing investigators were modified to be consistent with the convention (pathway 2; http://www.geotraces.org/dp/submit-data/flow-chart).
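The six-token structure lends itself to simple programmatic parsing, as the following Python sketch suggests. It rests on several assumptions that are not stated above and should be checked against the IDP documentation: that tokens are separated by underscores, that the oxidation state is written as a Roman numeral, and that the atomic mass is a single run of digits (isotope-ratio parameters, which may carry more than one mass, are not handled). The example names are illustrative strings written in the style of the convention, and the function and class names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ParameterName(NamedTuple):
    element: str
    oxidation_state: Optional[str]  # token 2, optional
    atomic_mass: Optional[str]      # token 3, optional
    phase: str
    data_type: str
    sampling_system: str


# Roman numerals I to VIII (assumed notation for the oxidation state).
ROMAN = re.compile(r"I{1,3}|IV|VI{0,3}")


def parse_parameter_name(name: str, sep: str = "_") -> ParameterName:
    """Split a parameter name into its six tokens; tokens 2 and 3 may be absent."""
    tokens = name.split(sep)
    if not 4 <= len(tokens) <= 6:
        raise ValueError(f"expected 4 to 6 tokens, got {len(tokens)}: {name!r}")
    element, optional = tokens[0], tokens[1:-3]
    phase, data_type, sampling_system = tokens[-3:]
    oxidation = mass = None
    for tok in optional:
        if tok.isdigit() and mass is None:
            mass = tok
        elif ROMAN.fullmatch(tok) and oxidation is None and mass is None:
            oxidation = tok  # must come before the atomic mass
        else:
            raise ValueError(f"unexpected token {tok!r} in {name!r}")
    return ParameterName(element, oxidation, mass, phase, data_type, sampling_system)


# Illustrative names only; the token vocabularies are assumptions.
for example in ("Fe_D_CONC_BOTTLE", "Fe_II_D_CONC_BOTTLE", "Pb_206_D_CONC_BOTTLE"):
    print(parse_parameter_name(example))
```

Because the last three tokens are always present, the parser reads them from the end of the name and treats whatever remains between the element and the phase as the optional tokens, which keeps the logic independent of how many optional tokens a given name carries.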
The IDP2017 digital data are provided as an open-access resource via www.bodc.ac.uk/geotraces/data/idp2017. Data can be downloaded as a full package or as customised data subsets, in any of four formats: ASCII, Excel, netCDF and Ocean Data View. The eGEOTRACES electronic atlas (www.egeotraces.org) gives a quick overview of many of the geochemically relevant parameters and is particularly useful for teaching, outreach and policy initiatives.
The GEOTRACES IDPs bring together many different threads from major, largely independent, international science initiatives. The push to standardise datasets will lead to the release of more coherent data directly back to the scientific community. In doing so, the program will facilitate scientific efforts and outreach far beyond the primary goal of GEOTRACES. The work we present here is the result of a great amount of effort by many members of the GEOTRACES network, in particular the many individuals in the various GEOTRACES subcommittees. We thank Maeve Lohan, Reiner Schlitzer and Walter Geibert for their useful comments on this paper. GEOTRACES gratefully acknowledges the financial support of the Scientific Committee on Oceanic Research through grants from many international agencies.