Is Computational Reproducibility Data-Driven Journalism’s Second Source?

[Image: cover of The Best of the Journal of Irreproducible Results]

Fact checking is basic journalism. A source tells you something, you verify it, often with a second trustworthy source.

Data-driven journalism makes that harder: the reporter personally analyzes and interprets the data instead. There’s nobody to verify the reporter’s data is right, the software used to slice and dice the data is right, the software was used correctly, and the results were interpreted correctly.

I like the idea of accompanying data-driven news reports, infographics, and even financial statements with the supporting material. Publish the data sets, methodology, and scripts you used. Publish enough of the raw material and your tools so others can reproduce your results, confirming or building on your work, or find your mistakes and correct the public record. 

Because we need to trust what we read. We need that trust in peer-reviewed economics papers, where Excel errors can slip through. We need this kind of transparency from companies communicating their capabilities, from governments reporting their condition, and from reporters telling the first draft of history.

So when you’re setting up terms for access to data sets for your research, make sure you negotiate some rights to repost that data as a supplement to your report. When you write a Python script, an Excel macro, or an R script to analyze the data, be sure you can post it on GitHub.

Because computational reproducibility is the new second source. 
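One small, practical step toward that: when you post your data and scripts, post a checksum manifest alongside them, so a reader can confirm they are running the same files you ran. What follows is only a minimal sketch of the idea, not a standard. The function names and the MANIFEST.json file name are my own assumptions, and it uses nothing beyond Python's standard library.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Map each file name to its size in bytes and its SHA-256 digest."""
    manifest = {}
    for path in map(Path, paths):
        manifest[path.name] = {
            "bytes": path.stat().st_size,
            "sha256": sha256_of_file(path),
        }
    return manifest


def write_manifest(paths, out_path="MANIFEST.json"):
    """Write the manifest as sorted, indented JSON and return it.

    The default output name is an assumption, not a convention.
    """
    manifest = build_manifest(paths)
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

With hypothetical file names, a reporter might call write_manifest(["survey.csv", "analysis.py"]) and publish the resulting MANIFEST.json next to those files. Anyone who downloads them can rebuild the manifest and compare. Matching digests only show that the files are identical; whether the analysis itself is right is still for the reader to check.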

See also: 

Science Code Manifesto for researchers pursuing reproducibility. 

Replicating Research: Austerity and Beyond by Nancy Folbre. 

On May 4, 2013, kdmcBerkeley will host its first Complexity & Context Data Journalism Symposium.

Phil Wolff

Phil Wolff is a product manager with Code for America’s Open Oakland brigade. Phil helped personal data startups at PDEC, the Personal Data Ecosystem Consortium, a Small Data NGO. Wolff was a director of the DataPortability Project and co-author of the project’s model Portability Policy. He’s had management, technology, and marketing roles at Adecco SA, LSI Logic, Bechtel National, Wang Laboratories, Compaq Computer, the City of Long Beach, the State of California, and the U.S. Navy Supply Systems Command. He is on LinkedIn, has ORCID 0000-0002-7815-4750, and was a Quora Top Writer of 2012. Phil lives in Adams Point, Oakland, California.
