open-data-institute
8/10/2019 How to polish your (open) data - introducing Open Refine
November, 2014 | Ulrich Atz @statshero
How to polish your data
Introductions
Your name
Where you've come from
Role
Your aims/expectations for this session.
Explore tools to validate and clean data
ensure high quality of data and its usage
Understand and identify the most common errors that occur in data.
Clean datasets ready for further processing.
Perform some simple analysis of data using off-the-shelf tools.
Data is messy
Discussion
What are common problems in datasets?
Some common problems
Dates: e.g. British and American date formats (7/1 and 12/31/2012)
Multiple representations: e.g. Vice-President Marketing and VP Marketing
Duplicates: identical records (rows) appear more than once
Summation records: e.g. a row of column sums
Mixed use of scales: e.g. hours and days in a timesheet
Missing data: empty cells, invalid strings, missing records
Numeric ranges: e.g. 18-25 years, or equally date ranges
Spelling errors: e.g. "pubic inquiry"
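A few of the problems above can be checked for in a handful of lines before a dataset goes anywhere near a cleaning tool. The following is a minimal sketch in Python using only the standard library. The function names, the list of missing-value markers and the two date formats tried are all assumptions made for illustration; they are not taken from the workshop material.

```python
from collections import Counter
from datetime import datetime

# Assumed markers for "missing" cells; real datasets may use others.
MISSING_MARKERS = {"", "n/a", "na", "null", "-"}


def find_duplicates(rows):
    """Return each row (as a tuple) that appears more than once."""
    counts = Counter(tuple(row) for row in rows)
    return [row for row, n in counts.items() if n > 1]


def is_missing(value):
    """True for None, empty cells and the assumed placeholder strings."""
    return value is None or value.strip().lower() in MISSING_MARKERS


def date_readings(text):
    """Return the distinct dates a slash-separated string could mean.

    Tries a British (day/month/year) and an American (month/day/year)
    reading. Two results mean the value is ambiguous; none means it
    is not a date in either format.
    """
    readings = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):
        try:
            readings.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # not valid under this format
    return sorted(readings)
```

For example, date_readings("7/1/2012") gives two candidate dates (7 January and 1 July), which flags the value as ambiguous, while date_readings("12/31/2012") gives only one, because there is no 31st month.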
OPEN REFINE
Introducing Open Refine
https://code.google.com/p/google-refine/
1. Go to http://bit.ly/odi-stories2
2. Task 3: Import UK property prices, price paid (September 2014)
3. Work your way through the list of Things to Ex
1. Keep a copy of the original dataset
2. Record your steps
3. Allow enough time for data cleaning and validation
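When cleaning is done in a script rather than in a tool, the first two points can be sketched roughly as follows. This is an illustrative Python example using only the standard library. The helper names (backup_original, StepLog) and the ".original" file naming are assumptions, not part of the workshop material, and tools such as Open Refine keep their own undo/redo history that can serve the same purpose.

```python
import json
import shutil
from pathlib import Path


def backup_original(path):
    """Copy the dataset to '<name>.original<ext>' once, and never overwrite it."""
    src = Path(path)
    dst = src.with_name(src.stem + ".original" + src.suffix)
    if not dst.exists():
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
    return dst


class StepLog:
    """A minimal, hypothetical record of cleaning steps in the order applied."""

    def __init__(self):
        self.steps = []

    def record(self, description, **details):
        # Number the steps so they can be replayed or reviewed later.
        self.steps.append({
            "step": len(self.steps) + 1,
            "description": description,
            "details": details,
        })

    def save(self, path):
        Path(path).write_text(json.dumps(self.steps, indent=2), encoding="utf-8")
```

A typical use would be to call backup_original on the input file (for instance a hypothetical prices.csv) before touching it, call log.record("Removed duplicate rows", removed=3) after each change, and save the log next to the cleaned file.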
Leading
Thank you!