craig-silverman
Accuracy Tips for Data Journalism
I’m Not a Data Journalist, So…
I spoke to three really good ones:
• James Ball, special projects editor, The Guardian
• Tasneem Raja, interactives editor, Mother Jones
• Sarah Cohen, CAR team editor, The New York Times
This is What They Told Me
“There is a huge difference between records and statistics … I really want to work with records — I don’t want to work with statistics.” — Sarah Cohen
Interview the Data
• “The first thing [I’m] looking for is completeness. Do I have everything I was supposed to get? If I know [there were] 3 million deportations over X number of years, then there should be 3 million rows.” — Cohen
• Compare it to the form it was generated from.
• “Look for typos if things have been manually entered.” — Ball
• Are there blank rows or columns? Are there duplicate rows or columns? Have dates/times been rendered correctly?
• Are the data consistent within rows/columns?
Interview the Data cont…
• “Looking for missing things as opposed to things that are there.” — Cohen
• Does it answer the question you had when you decided to ask for the data?
• Find an expert. “There are people out there who know datasets really well and what they should be saying. They’re going to be able to tell you whether you made some fundamental flaw in your logic.” — Cohen
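The checks on these two slides (completeness, blank rows, duplicates, date rendering) can be sketched in a few lines of Python. This is only an illustration, not a method any of the interviewees described: the function name `interview_rows`, its parameters, the ISO date format, and the sample rows are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime


def interview_rows(rows, expected_count=None, date_column=None, date_format="%Y-%m-%d"):
    """Run basic completeness checks on a list of dict rows.

    Returns a dict of findings; the rows themselves are not changed.
    """
    findings = {}

    # Completeness: does the row count match what the source promised?
    if expected_count is not None:
        findings["missing_rows"] = expected_count - len(rows)

    # Blank rows: every value is None, empty, or whitespace.
    findings["blank_rows"] = [
        i for i, row in enumerate(rows)
        if all(v is None or str(v).strip() == "" for v in row.values())
    ]

    # Duplicate rows: identical values in every column.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    findings["duplicate_rows"] = sum(n - 1 for n in counts.values() if n > 1)

    # Dates: collect non-empty values that do not parse in the expected format.
    bad_dates = []
    if date_column is not None:
        for i, row in enumerate(rows):
            value = (row.get(date_column) or "").strip()
            if not value:
                continue  # empty cells are left to the blank-row check
            try:
                datetime.strptime(value, date_format)
            except ValueError:
                bad_dates.append((i, value))
    findings["bad_dates"] = bad_dates
    return findings


# Hypothetical sample data, invented for illustration only.
sample = [
    {"id": "1", "date": "2013-01-05"},
    {"id": "2", "date": "05/01/2013"},  # date in an unexpected format
    {"id": "2", "date": "05/01/2013"},  # exact duplicate of the row above
    {"id": "", "date": ""},             # blank row
]
report = interview_rows(sample, expected_count=5, date_column="date")
```

The function only reports what it finds. Deciding whether a duplicate or an odd date is a real error still takes a comparison against the original form or a call to someone who knows the dataset.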
Beware of Spreadsheet Limits
• Excel 2003: 65,536 rows by 256 columns.
• Excel 2007/2010: 1,048,576 rows by 16,384 columns.
• Google Docs: 256 columns.
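A quick way to act on these limits is to compare a table's dimensions against them before opening the file, since a tool that silently drops rows defeats the completeness check. The sketch below uses only the figures listed on this slide; the function name `tools_that_would_truncate` and the example dimensions are assumptions, and current versions of these tools may have different limits.

```python
# Limits as listed on the slide; newer versions of these tools may differ.
LIMITS = {
    "Excel 2003": {"rows": 65_536, "columns": 256},
    "Excel 2007/2010": {"rows": 1_048_576, "columns": 16_384},
    "Google Docs": {"rows": None, "columns": 256},  # the slide gives no row limit
}


def tools_that_would_truncate(n_rows, n_columns):
    """Return the tools whose listed limits the table exceeds."""
    too_small = []
    for tool, limit in LIMITS.items():
        over_rows = limit["rows"] is not None and n_rows > limit["rows"]
        over_columns = n_columns > limit["columns"]
        if over_rows or over_columns:
            too_small.append(tool)
    return too_small


# Hypothetical table: 3 million rows and 40 columns.
risky = tools_that_would_truncate(3_000_000, 40)
```

With these inputs, `risky` names both Excel versions, so a 3-million-row file would need a database or a scripting tool instead.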
Rules of Thumb
• “Having a good set of rules of thumb and applying them can prevent a lot of horror shows.” — Ball
• Collect credible, relevant data to compare with what you’ve gathered.
Standardize the Data
• “We’ve trained all of our reporters to use Google spreadsheets … We need a method of collecting data that’s not a barrier to entry for non-programmer people like fact checkers, copy editors and editors.” — Raja
Look for Outliers
• Visualize your data in multiple ways (bar, line, pie, etc.) to see it with fresh eyes.
• “I’ll make a visualization in 30 different ways and I’ll look at them on different scales…. Try to look at it in so many different ways that it’s not possible for you to have just gotten tunnel vision.” — Cohen
• What doesn’t look right?
• What’s not the way you expected?
• “Anytime you see a big outlier or something counterintuitive assume it’s probably because something has gone wrong.” — Ball
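Charts are one way to spot outliers; a numeric screen can complement them. The sketch below applies Tukey's interquartile-range rule, which is a common convention and not something the interviewees prescribed. The function name `flag_outliers`, the multiplier `k=1.5`, and the sample counts are assumptions for illustration.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical monthly counts; 1020 may be a keying error for 102.
monthly_counts = [102, 98, 110, 95, 105, 1020, 99, 101]
suspects = flag_outliers(monthly_counts)
```

Flagged values are leads to check against the source records, in line with Ball's advice to assume something has gone wrong; they should not be deleted automatically.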
Visualize with Care
• “Be really careful on visualizing data. It loses all the ambiguity. Most look at a graphic and very few read the small print at the bottom. If it’s in a graphic it gives [a] sense of urgency.” — Ball
Annotate/Describe
• “I tend to go back over all my notes and the programs and the interviews and make sure there is nothing I had internalized but forgot I should tell people about it” — Cohen
• “Nerd box.”
• Try to get your data details/caveats into the main piece, too.
Data Diary
• “Essentially a reporter’s notebook that describes the whole process of where does this data live, how did we clean it and merge it, what APIs did we use, what calls did we send out to the API ...” — Raja
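A data diary can be as simple as a text file, but a structured log is easier to search later. The sketch below is one possible shape, not the format Raja's team uses: the field names, the JSON Lines layout, the helper names `diary_entry` and `append_entry`, and the example file name are all assumptions.

```python
import json
from datetime import datetime, timezone


def diary_entry(step, detail, source=None):
    """Build one data-diary record as a plain dict."""
    return {
        "when": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "step": step,      # e.g. "obtain", "clean", "merge", "api-call"
        "detail": detail,  # what was done, in plain language
        "source": source,  # where the data lives (file, URL, API endpoint)
    }


def append_entry(path, entry):
    """Append the entry to a JSON Lines file, one record per line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")


# Hypothetical entry, invented for illustration only.
entry = diary_entry(
    step="clean",
    detail="Removed 12 exact duplicate rows before merging.",
    source="deportations.csv",
)
```

Calling `append_entry("data_diary.jsonl", entry)` after each step leaves a timestamped trail that can later feed the methodology notes published with the story.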
Publish the Data
• “There’s a big risk of being a black box and saying, ‘We have great data and here’s the core result’ and then not putting it up.” — Ball
• More eyes = better information.
• Helps you get more data.
• Make corrections.
General Tips
• Be aware of the tendency to oversimplify or magnify data.
• Data are just like any other facts: you have to verify them.
• Establish credible rules of thumb for your data.
• Visualize data in multiple ways to get a sense of what you have.
• What you do with data also has to be verified.
• Be aware of the tools you use, and how they can affect accuracy.
• Explain your methods and the data, and share the data.
Additional Reading
All here: http://bitly.com/bundles/silverman/1
• https://github.com/nikeiubel/data-smells
• https://source.opennews.org/en-US/learning/times-regrets-programmer-error/
• http://onlinejournalismblog.com/2013/09/20/ethics-in-data-journalism-automation-feeds-and-a-world-without-gatekeepers/
• https://source.opennews.org/en-US/learning/statistically-sound-data-journalism/
Accuracy Fundamentals
• Don’t assume, verify.
• Mistakes are natural and a byproduct of how journalists/humans work.
• Just because someone official tells you/gives you something, it doesn’t mean it’s accurate.
• Develop and consult credible sources.
• Verification is a team sport.