claude-g-theoret
Every minute 8-10 months ago:
• 48 hours of video are uploaded to YouTube
• 320 new accounts and 98,000 tweets appear on Twitter
• 168,000,000 emails are sent
• 20,000 new posts on Tumblr
• 6,600 photos appear on Flickr
• Over 20% of all websites run on a CMS (WordPress, etc.)
Every minute today:
• 60 hours of video are uploaded to YouTube
• ??? new accounts and 236,000 tweets appear on Twitter
• 204,000,000 emails are sent
• 28,000 new posts on Tumblr
• 1,600 photos appear on Flickr!!! No shit!
But…
• Facebook has lost 1.5 million users in Canada and 6 million in the United States
• Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05%)
@cgtheoret
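The two per-minute snapshots above can be compared with a quick percent-change calculation. This is only a rough sketch: the numbers are copied from the slides, the labels and the `percent_change` helper are illustrative names, and the email figures assume the slides meant 168 million and 204 million (the ratio is the same either way). Twitter's new accounts are left out because today's figure is not given.

```python
# Per-minute figures copied from the two slides: (8-10 months ago, today).
# Twitter new accounts are omitted because today's figure is unknown ("???").
snapshots = {
    "YouTube hours of video": (48, 60),
    "Tweets": (98_000, 236_000),
    "Emails": (168_000_000, 204_000_000),  # assumes "million" was doubled on the slide
    "Tumblr posts": (20_000, 28_000),
    "Flickr photos": (6_600, 1_600),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


changes = {name: percent_change(b, a) for name, (b, a) in snapshots.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

Everything grows except Flickr, which comes out strongly negative.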
In a lot of ways “Big Data” is like Oil…
• Difficult and expensive to extract
• Difficult and expensive to store and distribute
• Cheapest in its unrefined form
• Can’t be used by consumers unless refined
• More expensive at every step of refinement
• Produces a plethora of derived products
• And it’s actually quite “dirty”!!!!
Social Data is one of the reasons why IBM added a 4th V to the Big Data Definition
VERACITY
6 factors affect Data Veracity …
1. Accuracy: Is it true?
2. Precision: If true, what is the error margin?
3. Reliability: Is it there all the time?
4. Provenance: Can you trace the source?
5. Fidelity: Did it change from the source?
6. Permission: Can you use it in this context?
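One way to make the six factors concrete is a simple checklist record per data item. The sketch below is an assumption, not a method from the talk: the `VeracityCheck` class, its field names, and the equal weighting of the six factors are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class VeracityCheck:
    """One yes/no answer per factor (hypothetical structure, equal weights assumed)."""
    accuracy: bool     # Is it true?
    precision: bool    # Is the error margin known and acceptable?
    reliability: bool  # Is it there all the time?
    provenance: bool   # Can you trace the source?
    fidelity: bool     # Is it unchanged from the source?
    permission: bool   # Can you use it in this context?

    def failed_factors(self):
        """Names of the factors answered 'no', in slide order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def score(self):
        """Fraction of the six factors that passed (0.0 to 1.0)."""
        total = len(fields(self))
        return (total - len(self.failed_factors())) / total


# Example: a social post that looks true but has no error margin or traceable source.
post = VeracityCheck(
    accuracy=True, precision=False, reliability=True,
    provenance=False, fidelity=True, permission=True,
)
print(post.failed_factors())
print(round(post.score(), 2))
```

In practice some factors, such as permission, would probably act as hard gates rather than contribute to an average.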
The Big Guys are now doing Veracity …
MuraliKrishnam <[email protected]>
Thank you!