
Managing scientific data; from data integration to scientific workflows

Bertram Ludaescher, Kai Lin, Shawn Bowers, Efrat Jaeger-Frank, Boyan Brodaric and Chaitan Baru
Managing scientific data; from data integration to scientific workflows (in Geoinformatics; data to knowledge, A. Krishna Sinha (editor))
Special Paper - Geological Society of America (2006) 397: 109-129


Scientists are confronted with significant data management problems due to the large volume and high complexity of scientific data. In particular, the latter makes data integration a difficult technical challenge. In this paper, we describe our work on semantic mediation and scientific workflows and discuss how these technologies address integration challenges in scientific data management. We first give an overview of the main data integration problems that arise from heterogeneity in the syntax, structure, and semantics of data. Starting from a traditional mediator approach, we show how semantic extensions can facilitate data integration in complex, multiple-world scenarios, where data sources cover different but related scientific domains. Such scenarios are not amenable to conventional schema integration approaches. The core idea of semantic mediation is to augment database mediators and query evaluation algorithms with appropriate knowledge representation techniques to exploit information from shared ontologies. Semantic mediation relies on semantic data registration, which associates existing data with semantic information from an ontology. The KEPLER scientific workflow system addresses the problem of synthesizing, from existing tools and applications, reusable workflow components and analytical pipelines to automate scientific analyses. After presenting core features and example workflows in KEPLER, we present a framework for adding semantic information to scientific workflows. The resulting system is aware of semantically plausible connections between workflow components as well as between data sources and workflow components. This information can be used by the scientist during workflow design, and by the workflow engineer, for creating data transformation steps between semantically compatible but structurally incompatible analytical steps.
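The distinction the abstract draws between semantically compatible and structurally incompatible workflow steps can be illustrated with a minimal sketch. It is not KEPLER's actual API or the authors' algorithm: the toy ontology `IS_A`, the helper `is_subconcept`, the function `connection_status`, and the structural type strings are all hypothetical names introduced here for illustration. The sketch assumes that each component port has been registered with an ontology concept and a structural type, and that a connection is semantically plausible when the output concept is subsumed by the input concept.

```python
# Hypothetical toy ontology: child concept -> parent concept (is-a edges).
# These concepts are illustrative assumptions, not taken from the paper.
IS_A = {
    "Granite": "IgneousRock",
    "IgneousRock": "Rock",
    "Sandstone": "SedimentaryRock",
    "SedimentaryRock": "Rock",
}


def is_subconcept(concept, ancestor):
    """Return True if `concept` equals `ancestor` or reaches it via is-a edges."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)  # None once the root has been passed
    return False


def connection_status(out_sem, out_struct, in_sem, in_struct):
    """Classify a proposed link from an output port to an input port.

    Each port carries a semantic type (an ontology concept) and a
    structural type (here just a string naming a data format).
    """
    if not is_subconcept(out_sem, in_sem):
        return "semantically incompatible"
    if out_struct != in_struct:
        # Semantically plausible, but a data transformation step is needed.
        return "needs transformation step"
    return "directly connectable"


if __name__ == "__main__":
    print(connection_status("Granite", "csv", "Rock", "xml"))
    print(connection_status("Granite", "xml", "Rock", "xml"))
    print(connection_status("Granite", "xml", "SedimentaryRock", "xml"))
```

In this sketch the middle outcome corresponds to the case the abstract assigns to the workflow engineer: the link makes sense semantically, so a transformation step between the two structural types is worth building.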

ISSN: 0072-1077
EISSN: 2331-219X
Serial Title: Special Paper - Geological Society of America
Serial Volume: 397
Title: Managing scientific data; from data integration to scientific workflows
Title: Geoinformatics; data to knowledge
Affiliation: University of California, Department of Computer Science, Davis, CA, United States
Affiliation: Virginia Polytechnic Institute and State University, Blacksburg, VA, United States
Pages: 109-129
Published: 2006
Text Language: English
Publisher: Geological Society of America (GSA), Boulder, CO, United States
ISBN: 0-8137-2397-3
References: 52
Accession Number: 2006-043275
Categories: Miscellaneous
Document Type: Serial
Bibliographic Level: Analytic
Illustration Description: illus. incl. sketch maps
Secondary Affiliation: San Diego Supercomputer Center, USA, United States; Natural Resources Canada, CAN, Canada; University of California at San Diego, USA, United States
Country of Publication: United States
Source: GeoRef, Copyright 2017, American Geosciences Institute.
Update Code: 200624