Abstract

Scientists sometimes have the idea that “data are eternal,” i.e., that our scientific observations long outlive our hypotheses and the ideas based on them. In this Editorial, we make use of work by science historians Chang (2004), Danielson and Graney (2014), and others to show that data have a limited period of usefulness, a date of expiration so to speak. Beyond that date (usually unknown beforehand), data are either (1) no longer of interest, because the problems that motivated their collection are resolved or no longer of concern, or (2) rendered obsolete by new technologies. Scientific progress in any era is thus defined as the art of making observations that are “good enough” to develop the “middle-level” and other theories that, as discussed by Chang (2004), appear to have lasting value.

Earlier this year Science published an Editorial titled “Data Eternal” (McNutt 2015); the premise is not uncommonly expressed—you’ve probably heard it many times: ideas come and go, but data endure well after our initial thoughts on a problem have vanished. If the premise is true, it has profound implications regarding what should be published, and even how we do science. If nothing else, publication policies should at least be modestly attuned to editorial philosophy.

So are data “eternal”? Of course not. The premise, though, is deceptively seductive: all scientists have had the experience of viewing a set of observations in one particular way, then later changing their view without any accompanying change in the data. Such experiences would seem to put data, or observations, on a much stronger footing than ideas; and in the very short term, this is often true. However, this comparison mistakes weak hypotheses for strong hypotheses, and strong hypotheses for theory. In the long term, data have a natural expiration date, like that ham sandwich you left in your refrigerator last weekend. It’s not easy to predict the half-life of decay, and data decay occurs for different reasons. But by contrast, and with luck, our ideas can have lasting value, by supporting or proposing either strong hypotheses or “middle-level” theories (Chang 2004), while the data themselves fade into irrelevance. Scientific progress, then, can be defined as making observations that are good enough, for the time being, to address whatever questions interest us at a given moment.

The Copernican hypothesis of a heliocentric solar system provides an interesting example. Danielson and Graney (2014) nicely illustrate that resistance to a heliocentric model in the 16th and 17th centuries had little to do with religious prejudice and a lot to do with data, and with two observations in particular. First, by an optical quirk, distant stars can appear to have a measurable diameter, and Tycho Brahe, the greatest astronomer of that age, made precisely such measurements. Second, even Copernicus recognized that stars exhibited no parallax when viewed at different times of the year. Copernicus argued that the diameter of his proposed terrestrial orbit was trivial compared to the distance of the stars. But the sensible counter-argument, given the star-diameter data at that time, was that if the stars are immeasurably distant, their sizes would then be remarkably vast, much greater than the sun. So while the Copernican model might simplify calculations of planetary positions, it created problems as well: the stellar data led many to conclude that (1) stars are not that far away, and thus (2) with no observable parallax, Earth does not move. Copernicus could not explain the star-size issue, which was not resolved until much better optical instruments were developed. [As a fascinating aside, Danielson and Graney (2014) further show that in light of such seemingly incontrovertible star-size data, Copernicus’ supporters were consigned to calling upon an “infinite creator” to provide for large stars at vast distances; so much for a religion vs. science war.] Today, the astronomical data of that era are of historical interest only. The planetary positional data of Arab astronomers used by Copernicus are unlikely to be used to plan a flight path to Mars, or create a map of the solar system for elementary school textbooks. The garbage star-size observations are a lesson in humility: some data have a useful lifespan of zero. But Copernicus’ model was on its way to changing our view of the solar system.

Boyle’s and Charles’ laws provide other excellent examples: ideas that, in the form of equations, have lasted for centuries, while the data on which they were based were useful for decades at most. Boyle’s law states that the volume of a gas (V) is inversely proportional to its pressure (P), and so is commonly written as V ∝ 1/P, or as PV = k1, where k1 is a constant (at a given temperature). Charles’ law provides another part of what would become the ideal gas law, i.e., that V is directly proportional to T, or V = k2T. The data of Charles were never published [we might think this a more gracious era, but then we have Isaac Newton’s treatment of Robert Hooke (Inwood 2004) to ruin that thought], but John Dalton and Gay-Lussac performed similar experiments. None of these data are used today to obtain equations of state for air. The experiments performed at Robert Boyle’s laboratory are quite interesting and clever for the time. It is often said that Boyle noticed that P1V1 = P2V2, at constant temperature, regardless of the values of P1 and P2. But nothing of the sort is true. He noticed that P1V1 and P2V2 were similar, similar enough to posit that P1V1 = P2V2 if there were no experimental error, which, of course, is impossible. In fact, one of the philosophical advances of Boyle’s era, of which his experiments provide an example, is the recognition of random error in observational data. As to John Dalton, let us defer to his contemporary Humphry Davy, who said, “He was a very coarse Experimenter & almost always found the results he required.” It is not clear if this was meant as a compliment. In any case, Boyle’s and Charles’ laws would lead to the ideal gas law, which is in turn used to derive the familiar relationship: ln Keq = −ΔG°/RT. Boyle’s and Dalton’s experiments, based on fundamentally flawed and quickly outdated data, formed the foundation of modern physical chemistry.

Numerous examples occur in geology as well. Norman Bowen’s (1928) experiments on various binary and ternary systems laid the foundation for his theoretical approach to fractional crystallization and opened up a cottage industry of phase equilibria experiments on synthetic systems. His data have been little improved upon since. But none of Bowen’s data are used in MELTS (Ghiorso et al. 2002), and many of his data are ignored by most other quantitative petrologic models, which tend to focus on natural compositions. More recently, Sobolev et al. (2007) showed that analyzing olivine grains at high beam currents greatly increases precision on minor elements, like Mn and Ni. Petrologists interested in using olivine to understand mantle processes now consider low beam current data (e.g., almost everything published pre-2007) obsolete. Similarly, field geologists now ignore older K-Ar dates in favor of newer and more accurate Ar-Ar dating techniques (though the earlier, preliminary timescales and stratigraphic ideas survive in their approximate form), and advances in zircon single-crystal age-dating have caused something of a minor revolution in the geochronology of granitic rocks, leaving whole-rock U-Pb and Rb-Sr dates nearly obsolete. And advances in TEM, AFM, and related sample preparation methods appear poised to revolutionize our understanding of chemical bonding and crystalline structures. Many more examples of data with limited lifespans are outlined in Naomi Oreskes’ (1999) excellent book on the history of plate tectonics.

Nonetheless, some scientists fall mostly on one side of a debate that is more than 2000 years old. Plato distrusted observation, as he knew how easily the senses can be deceived. Plato thus raised reason (ideas detached from observation) to the highest plane of knowledge. Descartes agreed, while Bacon preferred experience over axioms and syllogism. But David Hume showed them both to be in error. Where does this leave us? Chang (2004), making use of ideas by Feigl (1974), posits that “middle-level theories” (Snell’s Law, Archimedes’ Law of Levers, fixed points in thermometry, etc.) have much greater and more lasting value than the sense data from which they are derived, and they may provide a remarkably secure ladder upon which science is elevated. But this also means that data, no matter how carefully collected, are very unlikely to withstand technological and theoretical advances. Eventually, either the problems of interest to us today will be solved to collective satisfaction, or new technologies, or even new hypotheses, will provide a better means to address those questions that remain of interest. For these reasons and others, the last geologic map of the Grand Canyon has yet to be drawn, and the last geochemical study of Hawaii (nearly 1000 studies since 1914) is yet to be performed.

If we accept that scientific progress occurs by collecting data that are “good enough” to create useful or interesting ideas, or “middle-level theories,” then what does this mean for science publications? As a journal, we participate in the “norms” outlined by Robert Merton (1942), of universalism, organized skepticism, disinterestedness, and communalism (“communism”), which is a sociologist’s way of saying that we universally share, criticize, and build on one another’s ideas and observations. These activities happen outside a journal to be sure, but the journal review and publication process manifests all four norms. New observations (and even bad ones) can catalyze new ideas, which in turn catalyze and direct the collection of new data. This is not to say that there is no role for the publication of a “data journal,” i.e., a journal dedicated to the publication of observations unconnected to ideas, problems, or scientific hypotheses. And certainly, the data archival issues noted in the Science Editorial (which were really the key topic there) are important. But our vital role is to bring to the attention of readers data and ideas that potentially act as catalysts: those that attempt to solve problems, or that propose, validate, refine, or overturn existing ideas. There is room for papers focused on ideas or data. But to focus on one to the exclusion of the other is to ignore half the scientific mission, or to make a bold attempt at fathoming scientific germaneness at some future date, with a dim and dubious hope of relevance.