Fact checking is basic journalism. A source tells you something, you verify it, often with a second trustworthy source.
Data-driven journalism makes that harder, because the reporter personally analyzes and interprets the data. There’s nobody else to verify that the reporter’s data is right, that the software used to slice and dice the data is right, that the software was used correctly, and that the results were interpreted correctly.
I like the idea of accompanying data-driven news reports, infographics, and even financial statements with their supporting material. Publish the data sets, methodology, and scripts you used. Publish enough of the raw material and your tools that others can reproduce your results, confirm or build on your work, or find your mistakes and correct the public record.
Because we need to be able to trust what we read. We need it for peer-reviewed economics papers undone by Excel errors. We need this kind of transparency from companies communicating their capabilities, from governments reporting their condition, and from reporters writing the first draft of history.
So when you’re negotiating terms of access to data sets for your research, make sure you secure the right to repost that data as a supplement to your report. When you write a Python script or an Excel macro or an R script to analyze the data, make sure you can post it on GitHub.
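The artifact this argues for can be as small as a single self-contained script: the data (or a pointer to it), the computation, and the published number all in one file anyone can re-run. A minimal sketch in Python, with an invented dataset and column names purely for illustration:

```python
# Minimal reproducible-analysis sketch: the inline sample stands in for
# a published dataset (e.g. a spending.csv you would post alongside the
# story). The figures and field names here are hypothetical.
import csv
import io
import statistics

RAW = """year,spending
2010,4.1
2011,4.4
2012,4.0
"""

def mean_spending(raw: str) -> float:
    """Parse the CSV text and return the mean of the spending column."""
    rows = csv.DictReader(io.StringIO(raw))
    values = [float(row["spending"]) for row in rows]
    return statistics.mean(values)

if __name__ == "__main__":
    # The number that appears in the story, reproducible by any reader.
    print(round(mean_spending(RAW), 2))
```

Posting something like this next to the story lets a reader re-run the exact computation behind a published figure instead of taking it on faith.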
Because computational reproducibility is the new second source.
Related: the Science Code Manifesto, for researchers pursuing reproducibility, and “Replicating Research: Austerity and Beyond” by Nancy Folbre.