Data journalism, also called computational journalism or computer-assisted reporting, is the art of using digital data sets to illuminate something about the world that non-experts need or want to know. As more data is put online and the tools for managing and visualizing it improve, data journalism has surged in importance in contemporary newsrooms.
As sketched by its leading practitioners, the major steps in data journalism are:
- Finding data sets that may contain stories or prove useful. This may include filing Freedom of Information Act (FOIA) requests with government agencies, which can take a long time, sometimes years.
- “Cleaning” the data. Data sets almost always contain errors, imperfections and small glitches that have to be corrected or removed before the real work can start. New tools are making this a lot easier, but it can still be tedious and time-consuming.
- Analysis: finding significant trend lines or stories in the data that may be worth reporting.
- Verification: do the findings make sense? Do they cohere with what else we know? Can we check them against on-the-ground reporting? Are we making a mistake?
- Visualization: making the data come alive for users with charts, tables, maps, illustrations and the like.
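The cleaning, analysis and verification steps above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample data, the column names (`district`, `year`, `response_minutes`) and the function names are hypothetical, and a real project would more likely use a spreadsheet or a dedicated data library. The point is only to show the shape of the workflow, including keeping the dropped rows for review rather than silently discarding them.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw data standing in for a downloaded or FOIA-obtained file.
RAW_CSV = """district,year,response_minutes
North,2021,7.5
north ,2021,8.5
South,2021,12.0
South,2021,n/a
East,2021,-3
East,2021,9.0
"""


def clean(rows):
    """Cleaning: normalize labels and set aside values that cannot be right."""
    cleaned, dropped = [], []
    for row in rows:
        district = row["district"].strip().title()  # "north " -> "North"
        try:
            minutes = float(row["response_minutes"])
        except ValueError:
            dropped.append(row)  # unparseable, e.g. "n/a"
            continue
        if minutes < 0:
            dropped.append(row)  # a negative duration is impossible
            continue
        cleaned.append(
            {"district": district, "year": int(row["year"]), "minutes": minutes}
        )
    return cleaned, dropped


def analyze(cleaned):
    """Analysis: average response time per district."""
    by_district = defaultdict(list)
    for record in cleaned:
        by_district[record["district"]].append(record["minutes"])
    return {district: mean(values) for district, values in by_district.items()}


def verify(raw_rows, cleaned, dropped, min_records=2):
    """Verification: basic sanity checks before anything is reported."""
    problems = []
    if len(cleaned) + len(dropped) != len(raw_rows):
        problems.append("row counts do not reconcile")
    counts = defaultdict(int)
    for record in cleaned:
        counts[record["district"]] += 1
    for district, n in sorted(counts.items()):
        if n < min_records:
            problems.append(f"{district}: only {n} record(s), too thin to report")
    return problems


raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned, dropped = clean(raw_rows)
averages = analyze(cleaned)
flags = verify(raw_rows, cleaned, dropped)

print(averages)
print(f"{len(dropped)} rows set aside for manual review")
for flag in flags:
    print("CHECK:", flag)
```

The checks in `verify` are deliberately crude. They do not replace the questions in the verification step, such as whether a finding coheres with on-the-ground reporting; they only catch the mechanical mistakes that are cheap to rule out first.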
Sarah Cohen, editor of the New York Times computer-assisted reporting team, cautions: “Just like most of your notes never end up in a story, or most of the photographs taken don’t end up published, most of these [visualizations] are used only for our own understanding, not for publication. If it works, that’s fine. If not, well, it’s just part of the process.”
Often in data journalism the final product is an “interactive”: a published feature that lets users explore the data themselves by clicking through it or entering personal information such as a postal code.