Data-powered journalism

eNCA // Johannesburg

Tyler Dukes // @mtdukes

"Data journalism is not new."

Simon Rogers
Founder, Guardian Datablog

The Manchester Guardian, May 1821

No document of a similar nature has yet been laid before the public.

At all times, such information as it contains is valuable; because, without knowing the extent to which education, and particularly the education of the labouring classes, prevails, the best opinions which can be formed of the condition and future progress of society must be necessarily incorrect.

N.H., 1821

Puzzles and secrets

It has
never before
been easier for journalists
to seek, analyze and integrate data
into stories.

Data has limits

"We tend to think of data as immutable truth. But we forget that data and data-collection systems are created by people."
Meredith Broussard, The Atlantic

Why Poor Schools Can’t Win at Standardized Testing

The Atlantic // United States

Broussard found the textbook data was so untrustworthy, it was pretty much unusable by public officials.

A document state of mind

What each of us needs is a mindset
not a list
a mindset that says,
"The information I need
is out there somewhere,
and I'm going to find it."

Pat Stith
Investigative reporter, The News & Observer

Start small

Start smart

Truth testing

Authorities in Kenya say preteen girls in rural areas drop out of school for reasons of tribal tradition: This is the age they help parents with housework and chores.

Toilets and Grades

NTV Kenya // Kenya

The story

Young girls in rural Kenya are dropping out of school because of a lack of sanitation facilities.

How they did it

After ruling out medical problems from records, reporter Irene Choge used Kenya Open Data water information to show schools with the worst academic records were also the ones with the worst physical infrastructure.
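A minimal sketch of that kind of comparison: join two datasets on school name, then compare average scores. The school names, scores, ratios and the threshold of 50 are all made up for illustration; they are not Kenya Open Data figures, and this is not necessarily how the reporter did the analysis.

```python
from statistics import mean

# Hypothetical sample rows; not taken from Kenya Open Data.
exam_scores = {"School A": 212, "School B": 305, "School C": 188, "School D": 290}
pupils_per_toilet = {"School A": 95, "School B": 30, "School C": 120, "School D": 41}

def join_on_school(scores, toilets):
    """Keep only schools that appear in both datasets."""
    return {
        name: (scores[name], toilets[name])
        for name in scores
        if name in toilets
    }

def average_score_by_crowding(joined, threshold):
    """Average exam score for schools above and at-or-below a pupils-per-toilet threshold."""
    crowded = [score for score, ratio in joined.values() if ratio > threshold]
    adequate = [score for score, ratio in joined.values() if ratio <= threshold]
    return mean(crowded), mean(adequate)

joined = join_on_school(exam_scores, pupils_per_toilet)
crowded_avg, adequate_avg = average_score_by_crowding(joined, threshold=50)
print(crowded_avg, adequate_avg)
```

A gap between the two averages is a lead to report out, not proof of cause.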

Civic participation

GotToVote Kenya

Code for Africa // Kenya

The story

Civic hackers found that Kenya's Independent Electoral and Boundaries Commission published voter registration centers only in PDF format - and decided to build an app to help people find out how to vote.

How they did it

With about 24 hours and $500, the Code4Kenya team turned the PDF data into an interactive spreadsheet. They then launched a site that helped people find where to vote based on where they live.
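A rough sketch of the same idea in code, assuming the PDF has already been converted to plain text with columns separated by two or more spaces. The column names and rows are hypothetical, and this is not the Code4Kenya team's actual method.

```python
import csv
import io
import re

# Hypothetical text as it might come out of a PDF converter.
pdf_text = """\
COUNTY     WARD        CENTRE
County A   Ward One    Example Primary School
County A   Ward Two    Sample Secondary School
County B   Ward One    Placeholder Community Hall
"""

def parse_rows(text):
    """Split each line on runs of two or more spaces and pair values with the header."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = [name.lower() for name in re.split(r"\s{2,}", lines[0])]
    return [dict(zip(header, re.split(r"\s{2,}", line))) for line in lines[1:]]

def to_csv(rows):
    """Write the rows as CSV text that any spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def find_centres(rows, county, ward):
    """Case-insensitive lookup of registration centres for one county and ward."""
    return [
        row["centre"]
        for row in rows
        if row["county"].lower() == county.lower() and row["ward"].lower() == ward.lower()
    ]

rows = parse_rows(pdf_text)
csv_text = to_csv(rows)
print(find_centres(rows, "county a", "ward two"))
```

Real PDF output is rarely this tidy, so expect to spot-check the rows by hand.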

Environmental protection

Rhino poachers

Oxpeckers // South Africa

The story

Reporters for the Oxpeckers Centre for Investigative Environmental Journalism found that South Africa rarely secures convictions for crimes related to rhino poaching. In 2010, the conviction rate was 2.6 percent.

How they did it

The group reused a data platform from InfoAmazonia to map and analyze Police Ministry data on arrests and prosecutions since 2010.
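A conviction rate is convictions divided by arrests. The sketch below computes it per year from case records. The records and field names are invented for illustration and are not the Police Ministry data.

```python
from collections import defaultdict

# Hypothetical case records; values are made up for illustration.
cases = [
    {"year": 2010, "convicted": False},
    {"year": 2010, "convicted": False},
    {"year": 2010, "convicted": True},
    {"year": 2010, "convicted": False},
    {"year": 2011, "convicted": True},
    {"year": 2011, "convicted": False},
]

def conviction_rate_by_year(records):
    """Return {year: percent of arrests that ended in a conviction}."""
    arrests = defaultdict(int)
    convictions = defaultdict(int)
    for record in records:
        arrests[record["year"]] += 1
        if record["convicted"]:
            convictions[record["year"]] += 1
    return {
        year: round(100 * convictions[year] / arrests[year], 1)
        for year in sorted(arrests)
    }

rates = conviction_rate_by_year(cases)
print(rates)
```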

Public infrastructure

Frozen pipes

CBC News // Canada

The story

Reporters identified - for the first time - 5,171 properties in the city of Winnipeg in Canada that could develop frozen pipes - 70 percent of those at risk.

How they did it

After being told that Winnipeg officials would only release data one address at a time, CBC scraped the data for more than 190,000 properties and did the analysis on their own.
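The pattern behind a scrape like this is a loop: ask for one address, save the answer, move to the next. The sketch below shows only that loop. The `lookup_address` function and its records are hypothetical stand-ins for a real web request, and this is not CBC's code.

```python
import time

# Hypothetical stand-in for a one-address-at-a-time lookup service.
FAKE_CITY_RECORDS = {
    "1 Example St": {"at_risk": True},
    "2 Example St": {"at_risk": False},
    "3 Example St": {"at_risk": True},
}

def lookup_address(address):
    """Return the record for one address, or None if nothing is found."""
    return FAKE_CITY_RECORDS.get(address)

def scrape_addresses(addresses, lookup, delay_seconds=0.0):
    """Query every address in turn, pausing between requests, and collect the hits."""
    results = []
    for address in addresses:
        record = lookup(address)
        if record is not None:
            results.append({"address": address, **record})
        time.sleep(delay_seconds)  # be polite to the server in a real scrape
    return results

def count_at_risk(results):
    """Count the scraped properties flagged as at risk."""
    return sum(1 for row in results if row["at_risk"])

addresses = ["1 Example St", "2 Example St", "3 Example St", "4 Example St"]
results = scrape_addresses(addresses, lookup_address)
print(len(results), count_at_risk(results))
```

In a real scrape, `lookup` would make a web request, and the delay would be set well above zero.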

Public safety

The Child Exchange

Reuters // United States

The story

Through a loophole in the country's adoption rules, children traded in an underground market were abused and neglected in a practice called "rehoming."

How they did it

Reuters scraped and analyzed 5,029 posts over a five-year period on message boards used by parents seeking to re-home children, using them as a starting point for their stories.


The Migrants Files // International

The story

A team of 10 journalists from six countries built a comprehensive database to track deaths of migrants in the Mediterranean Sea amid an international discussion of how to prevent these tragedies.

How they did it

No database of this information existed. So journalists built their own, combining data from government sources and carefully curated news sources to track 13,718 migrants.
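Combining sources means the same incident can show up twice. A minimal sketch of merging and de-duplicating, assuming each record has a date, a location and a death count. The records and the matching rule are hypothetical, not The Migrants Files' actual method.

```python
# Hypothetical records from two kinds of sources; values are made up.
official_reports = [
    {"date": "2014-01-05", "location": "Example Strait", "deaths": 12},
    {"date": "2014-02-10", "location": "Sample Bay", "deaths": 3},
]
news_reports = [
    {"date": "2014-01-05", "location": " example strait ", "deaths": 12},
    {"date": "2014-03-01", "location": "Placeholder Coast", "deaths": 7},
]

def incident_key(record):
    """Treat records with the same date and normalised location as one incident."""
    return (record["date"], record["location"].strip().lower())

def merge_sources(*sources):
    """Combine sources, keeping the first record seen for each incident."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(incident_key(record), record)
    return list(merged.values())

incidents = merge_sources(official_reports, news_reports)
total_deaths = sum(record["deaths"] for record in incidents)
print(len(incidents), total_deaths)
```

Matching on date and place alone will miss some duplicates and wrongly merge others, so borderline cases still need a human check.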

Following the money

Secrecy for sale

ICIJ // International

The story

In a massive project, a huge team of journalists probed the use of offshore tax havens, exposing government officials, wealthy citizens and Ponzi schemers who hid money for financial gain.

How they did it

In November 2012, investigators at La Nacion Costa Rica anonymously received a storage device with millions of data points spread over 320 spreadsheets in multiple formats and no data dictionary. They spent months rebuilding and analyzing the data, collaborating across newsrooms to figure out what it contained.

Puzzle or secret?

Types of tools

Two major categories


Allow visualization of stories and sharing with audiences

  • Often provide embed code
  • Can be maps, audio elements or other interactives
  • Often let your audience explore further


Help find the seed of a story by spotting patterns, trends

  • Convert documents into easier-to-handle formats
  • Clean dirty data
  • Allow deeper analysis by reporters to suss out leads
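As a small illustration of cleaning dirty data: the same place name typed several ways will be counted as several places until it is normalised. The names below are hypothetical, and real cleanup usually needs more rules than this.

```python
import re
from collections import Counter

# Hypothetical messy entries, as they might appear in a hand-typed spreadsheet.
raw_names = ["Nairobi", " nairobi", "NAIROBI  ", "Mombasa", "mombasa"]

def clean_name(raw):
    """Collapse repeated whitespace, trim the ends and standardise capitalisation."""
    collapsed = re.sub(r"\s+", " ", raw).strip()
    return collapsed.title()

counts_before = Counter(raw_names)
counts_after = Counter(clean_name(name) for name in raw_names)
print(len(counts_before), len(counts_after))
```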


Price: FREE

Requirements: None

  • Organization of time-based events
  • Visualization of long-term chronology
  • Makes searching for patterns/trends easier

Google Drive

Price: FREE

Requirements: Google account

  • Document storage, collaboration & sharing
  • Limited optical character recognition
  • Powerful Excel-like spreadsheet tools


Price: FREE (premium version available)

Requirements: None

  • PDF conversion to Excel, txt, html and more
  • File sharing (expires)
  • Optical character recognition (premium)


Price: FREE

Requirements: None


Price: FREE

Requirements: Must be a working journalist

  • Automatic optical character recognition
  • Document annotation
  • Entity extraction
  • Collaboration and sharing

Surveillance catalog

Fusion Tables

Price: FREE

Requirements: Google account

  • Simple joins with existing datasets
  • Allows embeddable mapping of data

State, Feds Move to Increase Hospital Price Transparency


Price: FREE

Requirements: None

  • Computer-assisted categorization of large document sets
  • Allows reporters to quickly read and tag docs
  • Syncs with DocumentCloud account

Putting it all together



Contact me


Get this presentation: