Infopresse veracity

Chapter 2

The 4th V: Data Veracity

@cgtheoret

Every minute 8-10 months ago:

• 48 hours of video are uploaded to YouTube

• 320 new accounts and 98,000 tweets appear on Twitter

• 168 million emails are sent

• 20,000 new posts on Tumblr

• 6,600 photos appear on Flickr

• Over 20% of all websites run on a CMS (WordPress, etc.)

Every minute today:

• 60 hours of video are uploaded to YouTube

• ??? new accounts and 236,000 tweets appear on Twitter

• 204 million emails are sent

• 28,000 new posts on Tumblr

• 1,600 photos appear on Flickr (down from 6,600!)
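
To make the two snapshots easier to compare, here is a minimal sketch that computes the percent change for each figure that appears on both slides. The dictionary keys and the `percent_change` helper are illustrative names, not part of the original deck, and the Twitter new-accounts figure is left out because the "today" value is unknown.

```python
# Per-minute figures copied from the two slides: (8-10 months ago, today).
per_minute = {
    "YouTube hours of video": (48, 60),
    "Twitter tweets": (98_000, 236_000),
    "Emails sent": (168_000_000, 204_000_000),
    "Tumblr posts": (20_000, 28_000),
    "Flickr photos": (6_600, 1_600),
}


def percent_change(then, now):
    """Relative change from `then` to `now`, in percent."""
    return (now - then) / then * 100


# Round to one decimal place for display.
changes = {
    name: round(percent_change(then, now), 1)
    for name, (then, now) in per_minute.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

Flickr is the only series with a negative change, which is the surprise the last bullet points at.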

But…

• Facebook has lost 1.5 million users in Canada and 6 million in the United States

• Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05%)
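
As a quick sanity check on the Yahoo figures, the sketch below works out the total number of accounts that the two numbers imply when taken together. This total is derived from the slide's arithmetic only; it is not a figure quoted from the study, and the variable names are illustrative.

```python
from fractions import Fraction

# Figures from the slide.
elite_accounts = 20_000
elite_share = Fraction(5, 10_000)  # 0.05% written as an exact fraction

# If 20,000 accounts are 0.05% of the total, the total follows by division.
implied_total_accounts = elite_accounts / elite_share

print(f"Implied total accounts: {int(implied_total_accounts):,}")
```

Under these numbers, half of what people read comes from a sliver of roughly 40 million accounts.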

Gartner is predicting an explosion in Social Media Analytics IT spending

In a lot of ways “Big Data” is like Oil…

• Difficult and expensive to extract

Difficult and expensive to store and distribute

Cheapest (and least useful) when it's unrefined

In a lot of ways “Big Data” is like Oil…

• Can’t be used by consumers unless refined

• More expensive at every step of refinement

The market is producing a plethora of derived, higher-value data products

In a lot of ways “Big Data” is like Oil…

• Difficult and expensive to extract

• Difficult and expensive to store and distribute

• Cheapest in its unrefined form

• More expensive at every step of refinement

• Produces a plethora of derived products

• And it’s actually quite “dirty”!

Social Data Analytics = Oil Refineries

Social Data is one of the reasons why IBM added a 4th V to the Big Data Definition

VERACITY

6 factors affect Data Veracity …

1. Accuracy: Is it true?

2. Precision: If true, error margin?

3. Reliability: Is it there all the time?

4. Provenance: Can you trace the source?

5. Fidelity: Did it change from the source?

6. Permission: Can you use it for the context?
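
One way to put the six factors to work is as a checklist attached to each data source. The sketch below is only an illustration: the `VeracityChecklist` class, its field names, and the example source are hypothetical, and real checks would need measurements rather than hand-set booleans.

```python
from dataclasses import dataclass, fields


@dataclass
class VeracityChecklist:
    """One yes/no answer per veracity factor (hypothetical structure)."""
    accuracy: bool     # Is it true?
    precision: bool    # If true, is the error margin known and acceptable?
    reliability: bool  # Is it there all the time?
    provenance: bool   # Can you trace the source?
    fidelity: bool     # Is it unchanged from the source?
    permission: bool   # Can you use it for the context?

    def failed_factors(self):
        """Names of the factors answered 'no', in the order listed above."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Made-up example: a brand's follower feed that is traceable and usable,
# but whose truthfulness and error margin are in doubt.
follower_feed = VeracityChecklist(
    accuracy=False,
    precision=False,
    reliability=True,
    provenance=True,
    fidelity=True,
    permission=True,
)

print(follower_feed.failed_factors())
```

Keeping the factors separate, instead of folding them into one score, shows which kind of refinement a source still needs.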

Black Hat SEO : Blogs

Black Hat Social Marketing : Twitter

Twitter: 50% of brand followers are bots

Or in some cases over 90 %…
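
To see what those bot shares do to a headline follower count, here is a small sketch. The follower count of 100,000 is a made-up example and `non_bot_followers` is an illustrative helper; only the 50% and 90% shares come from the slides.

```python
def non_bot_followers(followers, bot_percent):
    """Followers left after removing the given whole-number percentage of bots."""
    return followers * (100 - bot_percent) // 100


followers = 100_000  # hypothetical brand account

typical = non_bot_followers(followers, 50)     # the 50% figure from the slide
worst_case = non_bot_followers(followers, 90)  # the "over 90%" cases, taken at 90%

print(f"At 50% bots: {typical:,} real followers")
print(f"At 90% bots: {worst_case:,} real followers")
```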

Disappearing Romney: Facebook as well…

Trying to solve the Veracity problem …

The Big Guys are now doing Veracity …
