Has Data Journalism Delivered?
Data journalism has been around since the 1950s, but it’s only just started to become ubiquitous in reporting. How do we define data journalism today? Who’s putting data to good use? Read on for these answers and more.
It was not so long ago that data journalism was the buzzword du jour. From Nate Silver’s FiveThirtyEight blog, which had political pundits on the edge of their seats during the 2016 U.S. election, to the AP’s big 2015 investment in the future of data journalism, there were plenty of media strategists who thought data (and, perhaps more specifically, data visualization) was the key to saving journalism. But has data journalism actually delivered on that promise? Let’s explore.
What is data journalism?
Part of a journalist’s job has always been to collect and analyze data, or to find sources who can, but in the digital age reporting in general, and data journalism in particular, has been transformed. As defined by TechTarget, “Data journalism is an approach to writing for the public in which the journalist analyzes large data sets to identify potential news stories.” This change has been enabled by technology. Artificial intelligence (AI) and machine learning make it possible for journalists to find patterns and stories in big data dumps. Meanwhile, the digital nature of the web makes it possible for journalists and designers to make interactive data visualizations that engage and delight readers.
Examples of data journalism in action
For a deeper understanding of data journalism, one need look only to the wide variety of use cases. As part of its quest to explore the professed “future of journalism,” The Bureau of Investigative Journalism identified several compelling examples of data journalism at work in the United Kingdom, such as:
- BuzzFeed News — BuzzFeed Contributor Maeve McClenaghan, in conjunction with BBC Radio 5 Live, used data from over 140 Freedom of Information requests to reveal a large increase in missing children who were seeking asylum.
- Chronicle Live — Sam Houlison and Charles Boutaud-WO analyzed data from prisons across England and Wales to reveal that HMP Durham had erroneously released at least one inmate every year since 2008.
- Digital Times — Megan Lucero helped analyze over 72,000 financial transactions from the Department for International Development, uncovering that a staggering £3.4 billion in foreign aid was being spent on “consultancy work.”
Datajournalism.com cites several additional examples of data journalism in action, including The Guardian’s visualization of glacial melting over the last 40 years, Al Jazeera’s map displaying the effects of 20 years of war, and the explosive Pandora Papers from the International Consortium of Investigative Journalists (ICIJ), a collaborative uncovering of extensive offshore dealings by public officials and heads of state.
How does data journalism work?
Data journalism can take many forms, but at its core it deals with the selection, aggregation, cleaning, and analysis of chosen datasets.
Online Journalism Blog describes the typical data journalist’s process as an “inverted pyramid” made up of five steps:
- Compile — As Online Journalism Blog puts it, “Data journalism begins in one of two ways: either you have a question that needs data, or a dataset that needs questioning. Whichever it is, the compilation of data is what defines it as an act of data journalism.”
- Clean — Just because you’ve identified a dataset doesn’t mean it’s ready for analysis. A critical component of data journalism is cleaning data, which usually means finding and correcting human errors, converting the data to a consistent format, or both.
- Context — Data is only as valuable as its context, so the methodology behind a dataset must be verified just as it would be for any other source. This step is also when any specialist terminology must be understood or translated.
- Combine — “Good stories can be found in a single dataset, but often you will need to combine two together,” says Online Journalism Blog. “After all, given the choice between a single-source story and a multiple-source one, which would you prefer?”
- Communicate — Once your clean datasets have been verified and combined, it’s time for the journalism part of data journalism. One popular way of communicating data journalism is through visuals such as infographics or maps.
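As a rough illustration of the clean, combine, and communicate steps, here is a minimal Python sketch using only the standard library. The council names, figures, and helper functions (`clean_name`, `clean_amount`, `load_spending`, `load_population`, `spend_per_head`) are all invented for this example and do not come from any of the stories or sources mentioned above.

```python
import csv
import io

# Hypothetical raw extracts with inconsistent formatting (invented data).
SPENDING_CSV = """council,spend_gbp
 Northtown ,"1,200,000"
southvale,850000
NORTHTOWN,"300,000"
"""

POPULATION_CSV = """council,population
Northtown,50000
Southvale,20000
"""


def clean_name(raw):
    """Clean: normalize whitespace and capitalization so names match."""
    return raw.strip().title()


def clean_amount(raw):
    """Clean: convert strings such as '1,200,000' to an integer."""
    return int(raw.replace(",", "").strip())


def load_spending(text):
    """Clean each row and total the spending per council."""
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = clean_name(row["council"])
        totals[name] = totals.get(name, 0) + clean_amount(row["spend_gbp"])
    return totals


def load_population(text):
    """Read the second dataset, applying the same name cleaning."""
    return {
        clean_name(row["council"]): int(row["population"])
        for row in csv.DictReader(io.StringIO(text))
    }


def spend_per_head(spending, population):
    """Combine: join the two datasets on the cleaned council name."""
    return {
        name: spending[name] / population[name]
        for name in spending
        if name in population
    }


spending = load_spending(SPENDING_CSV)
population = load_population(POPULATION_CSV)
per_head = spend_per_head(spending, population)

# Communicate: a plain-text summary; a chart or map would be the next step.
for name, value in sorted(per_head.items()):
    print(f"{name}: GBP {value:.2f} per resident")
```

Note that the three differently formatted "Northtown" rows only add up correctly because the names are cleaned before the datasets are combined, which is why cleaning sits ahead of combining in the pyramid.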
Essential data journalism tools
As much as data journalism is driven by reporters with big ideas and sharp research skills, it is also the result of advances in technology. Most data journalists have come to rely on a suite of technology-driven tools to process and analyze large volumes of data, some of which are humbler than you might expect. When Datajournalism.com polled data journalists about data analysis and visualization tools for its 2021 survey, 74% said they used Excel and 58% said they used Google Sheets to manage data.
Invest in Tech cites OpenRefine (formerly Google Refine) as a popular open-source tool for data cleaning, and the German data visualization tool Datawrapper for aggregating “imported and formatted data from different sources” into compelling charts, graphs, and maps.
But when it comes to processing large or complex datasets, data journalists are increasingly turning to coding. Datajournalism.com found, “The most commonly used programming language is Python (63%), followed by HTML/CSS (51%) and R (46%).”
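To give a sense of what that coding can look like, the following is a small Python sketch that totals a transaction log by category and ranks the results. The data, column names, and the `totals_by_category` helper are hypothetical and chosen purely for illustration; a real project would read from a file and would likely use `decimal.Decimal` rather than floats for money.

```python
import csv
import io
from collections import defaultdict

# Hypothetical transaction log (invented data); normally read from a file.
TRANSACTIONS_CSV = """date,category,amount_gbp
2020-01-05,consultancy,1200.50
2020-01-09,equipment,300.00
2020-02-11,consultancy,800.00
2020-02-20,travel,150.25
"""


def totals_by_category(lines):
    """Read rows one at a time and sum the amounts per category."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row["category"]] += float(row["amount_gbp"])
    return dict(totals)


totals = totals_by_category(io.StringIO(TRANSACTIONS_CSV))
grand_total = sum(totals.values())

# Rank categories by total spend, largest first.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for category, amount in ranked:
    share = 100 * amount / grand_total
    print(f"{category}: {amount:.2f} ({share:.1f}% of total)")
```

Because `csv.DictReader` consumes its input row by row, the same function can be pointed at an open file handle, so a log far larger than this toy example never has to be held in memory all at once.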
The present — and future — of data journalism
The term “data journalism” has been in circulation since the 1950s, but with the proliferation of data over the past decade, it has finally expanded from a niche reporting style to standard practice among most major publications.
Never has this been more apparent than during the height of the COVID-19 pandemic, when the public suddenly relied on data-driven reporting to stay safe. A 2021 survey conducted by Datajournalism.com found “25% of respondents first got into data journalism because of the pandemic.” Moreover, 46% of respondents found that COVID “strengthened data journalism,” and 43% believed it “increased audience data literacy.” Reuters’ COVID-19 Vaccination Tracker and The New York Times article “What It Takes to Understand a Variant” are two high-profile examples of cutting-edge data journalism being used to report on the coronavirus.
Looking ahead, it seems inevitable that data journalism will continue to be a staple of major news organizations. Technology will only help this cause. Emerging tools powered by AI and machine learning will allow journalists to process larger volumes of data even faster, and will enable more publications to bring an objective, data-driven edge to their reporting.