Big Data and Big Analytics – Why, what and how


DESCRIPTION

Webinar presentation from 2013-08-30. An introduction to what Big Data and Big Analytics can be used for and why they are relevant for you. Includes real-life examples and ideas, and concludes with a look at InfiniDB.


Page 1: Big Data and Big Analytics - Why, what and how

Big Data and Big Analytics - Why, what and how

Page 2: Big Data and Big Analytics - Why, what and how

Agenda

• Big Data and Big Analytics – What is it?

• Big Analytics vs. the Data Warehouse?

• Big Analytics examples

• Database technologies for Big Analytics

• Questions and Answers

Page 3: Big Data and Big Analytics - Why, what and how

What is Big Data?

• Big Data is data that is not immediately related to my own business

• Big Data is largely unstructured

• Big Data consists of data from many different sources, such as Facebook, Twitter, web pages, blogs and any other source you can find

• Big Data is all about volume and analysis!

Page 4: Big Data and Big Analytics - Why, what and how

Because you want to grow your business!

• You can get customers from your competitors

– The data on these customers is not in your CRM!

– Why did they choose someone else over you? Your Data Warehouse has few answers to this!

• You can grow the market

– Those new customers are not in your CRM or Data Warehouse either, to a large extent!

• You can do both of these!

Page 5: Big Data and Big Analytics - Why, what and how

Why do I need all this data?

• “My Data Warehouse tells me all I ever want to know, in gruesome detail, about my customers, what more do I need?”

• “I get much more data from my CRM system than I do from friggin’ Facebook!”

• “Why would I need all those pictures from Facebook and all those Twitter texts, they tell me nuthin’!”

Page 6: Big Data and Big Analytics - Why, what and how

What is Big Analytics?

• To get insights from Big Data, you need a more powerful analysis: Big Analytics

• Big Analytics often cannot rely on simple BTREE indexes

• Big Analytics provides exponentially better accuracy the more data you have

Page 7: Big Data and Big Analytics - Why, what and how

What is Big Analytics useful for?

• For getting information on things in the “outside world”

– My competitors

– My competitors’ customers

• For foreseeing trends

– What will be “the next big thing” in my business?

– What new markets are developing?

– What is happening in my current market?

Page 8: Big Data and Big Analytics - Why, what and how

Big Data, Analytics and Insights!

Big Data

Big Analytics

Big Insights!

Page 9: Big Data and Big Analytics - Why, what and how

Big Analytics use cases

• The higher the volume of your business, the more useful Big Data becomes

– If you have very few customers, Big Data might be less useful

• Retail is a common use case, but there are many more

– Finance – Big Data trend analysis

– Intelligence – Analysis of new and unknown trends and loosely tied groups

– Politics – What is my competition up to?

Page 10: Big Data and Big Analytics - Why, what and how

Big Analytics vs. Data Warehouse

• Your Data Warehouse is very focused and contains high-quality information on low-level data: “John Doe bought Chocko Chocolate Chip Cookies for $3.61 on Jan 12 2013”

• Big Data provides much more data, but each information item has less detail: “Chocko Chocolate Chip Cookies suck!” or “An increasing number of people tweet about Chocolate Chip Cookies”

Page 11: Big Data and Big Analytics - Why, what and how

Big Analytics vs. Data Warehouse

• What Big Analytics lacks in terms of data item correctness can be compensated for by:

– Volume: If more than 200,000 tweets agree that our Chocko cookies suck, then we should probably look into it.

– Proper analysis: Images can be analyzed for content and for things you didn’t think about: Maybe “Ma Cookies” brand cookies have an edge on us because their packaging looks more pleasing? Do we see “Ma Cookies” being eaten in unexpected places or at unexpected times?

Page 12: Big Data and Big Analytics - Why, what and how

Big Analytics - Linguistic analysis

• This is for tweets, blogs, Facebook and similar. Proper linguistic analysis is complex:

– Sentiment: “Ma Cookies might seem like they suck, but they are actually quite tasty”

– Temporal: “In January 2011 we wrote that Chocko Cookies used to taste like manure in 2008, but that they have improved since then”

– Ranking

– Really complex for larger blocks of text
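To make the sentiment problem concrete, here is a minimal sketch of a naive keyword-based scorer. The word lists and the function name `naive_sentiment` are hypothetical, chosen only for this illustration; real linguistic analysis is far more involved. The sketch shows how simple word counting handles a blunt tweet but cannot tell that "but" reverses the meaning of the mixed sentence from the slide.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
POSITIVE = {"tasty", "great", "love", "improved"}
NEGATIVE = {"suck", "sucks", "awful", "manure"}


def naive_sentiment(text: str) -> int:
    """Return (positive word count - negative word count) for the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


clear = "Chocko Chocolate Chip Cookies suck!"
mixed = "Ma Cookies might seem like they suck, but they are actually quite tasty"

print(naive_sentiment(clear))  # negative score: the blunt tweet is easy
print(naive_sentiment(mixed))  # zero: "suck" and "tasty" cancel out
```

A score of zero for the second sentence is wrong: a human reads it as positive. Handling negation, contrast and time references is what makes proper linguistic analysis complex.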

Page 13: Big Data and Big Analytics - Why, what and how

Other types of Big Analytics

• Image analysis is a fast-developing field, where we find new and interesting use cases

– What are the most popular colors?

– What color are people’s clothes?

– How long has that suitcase been standing on the floor at the airport?

• Location analysis

– Where did this happen?

– In what city is that? What country?

• Temporal analysis

– When did this happen? When was it published?

Page 14: Big Data and Big Analytics - Why, what and how

New Visualizations for New Insights

• Visualizing data as a report with columns and rows isn’t always effective

• With new and diverse types of data, we need new ways of visualizing data

– Location on maps

– Timelines

– Sentiments

• Even with traditional Data Warehouse data, new visualizations can provide new insights!

• Interactive visualizations

Page 15: Big Data and Big Analytics - Why, what and how

Big Analytics and Visualization examples

Page 16: Big Data and Big Analytics - Why, what and how

What is Mitt Romney talking about?

Page 17: Big Data and Big Analytics - Why, what and how

Map Visualization – Android or iOS

Visualizations by MapBox

• Smartphone OS metadata in Geography view

– iPhone is Red, Android is Green

– Based on data from Verizon passed to NSA

Page 18: Big Data and Big Analytics - Why, what and how

Big Analytics database issues

• Big Analytics is complex!

• Big Analytics doesn’t always allow the “analyze-once-find-later” attribute of a classic index!

• Big Analytics is compute intensive

• Big Analytics needs some programming. Yikes!

Page 19: Big Data and Big Analytics - Why, what and how

Map-Reduce to the rescue

• Map-Reduce allows distributed processing on large amounts of data

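– Map – Function run in parallel on the data held by each node, emitting intermediate key/value results

– Reduce – Function that aggregates the intermediate results from the nodes

• Hadoop is the best-known and most widely used Map-Reduce framework

• The Map and Reduce functions must still be written by the developer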

• But we still need some kind of database
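The Map-Reduce pattern can be sketched in a single Python process. This is an illustration under stated assumptions, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and the grouping step stands in for the work a framework does when it moves intermediate results between nodes.

```python
from collections import defaultdict


def map_phase(tweet):
    """Map: emit a (word, 1) pair for every word in one input record."""
    words = (token.strip(".,!?").lower() for token in tweet.split())
    return [(word, 1) for word in words if word]


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values emitted for one key."""
    return key, sum(values)


def word_count(tweets):
    """Run map over every record, group by key, then reduce each group."""
    pairs = [pair for tweet in tweets for pair in map_phase(tweet)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


tweets = [
    "Chocko cookies suck!",
    "Ma Cookies are tasty",
    "cookies cookies cookies",
]
counts = word_count(tweets)
print(counts["cookies"])  # 5
```

Because each call to `map_phase` touches only one record and each call to `reduce_phase` touches only one key, both phases can be spread across many nodes, which is what makes the pattern suitable for large data volumes.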

Page 20: Big Data and Big Analytics - Why, what and how

So, what we need is an Analytical Database

• Support for complex analysis

• Support for distributed, parallel processing (Map-Reduce for example)

• Support for storing and processing massive amounts of data

• Some kind of cool index technology that works with big data, for both reads and writes

– Or maybe. A scary idea just came to me…

Page 21: Big Data and Big Analytics - Why, what and how

No indexes! Because you don’t need or want them!

• What! What’s wrong with good old BTREEs?

– They are not well suited to Big Data!

– Their usefulness declines as data grows

– Updates slow down significantly as the tree grows!

– They don’t work well with skewed data

• SPATIAL? FREETEXT? HASH? BITMAP?

– These are either too specialized or lack the functionality we need

Page 22: Big Data and Big Analytics - Why, what and how

Calpont InfiniDB

Real-time, Consistent Query Performance

Linear Scale for Massive Data

Removes Limits to Dimensions and Granularity

Easy to Deploy and Maintain

Page 23: Big Data and Big Analytics - Why, what and how

Tiered Query Execution

• User Module – Processes SQL Requests

• Performance Module – Executes the Queries

• Single Server or MPP

Page 24: Big Data and Big Analytics - Why, what and how

Map-Reduce for Powerful Analytics

SQL Operations are mapped to Performance Module threads

• Parallel/Distributed Data Access

• Parallel/Distributed Joins (Inner, Outer)

• Parallel/Distributed Sub-queries (From, Where, Select)

• Parallel/Distributed Group By, Distinct, and Aggregation

• Extensible with Parallel/Distributed User Defined Functions

Results are returned to User Module in Reduce Phase

Map Reduce
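As a rough illustration of mapping a SQL aggregation onto parallel workers, the sketch below computes the equivalent of `SELECT product, SUM(amount) FROM sales GROUP BY product` over partitioned data. It is not InfiniDB code: the table name `sales`, the sample rows, and the functions `partial_sum` and `grouped_sum` are assumptions made for this example, and Python threads stand in for Performance Module threads.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows of (product, amount in cents), already split into
# partitions the way a distributed store might spread a table over workers.
PARTITIONS = [
    [("chocko", 361), ("ma", 250)],
    [("chocko", 361), ("chocko", 361)],
    [("ma", 250)],
]


def partial_sum(rows):
    """Map step: SUM(amount) GROUP BY product for a single partition."""
    totals = Counter()
    for product, amount in rows:
        totals[product] += amount
    return totals


def grouped_sum(partitions):
    """Run the map step on worker threads, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # reduce step: add up the per-partition sums
    return dict(merged)


result = grouped_sum(PARTITIONS)
print(result["chocko"], result["ma"])  # 1083 500
```

Each worker only needs its own partition, and the final merge handles one small partial result per worker rather than every row. Sums, counts, minimums and maximums all merge this way; an average would need the partial sum and the partial count carried separately.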

Page 25: Big Data and Big Analytics - Why, what and how

Calpont InfiniDB

• Support for Amazon EC2

– Full EBS support

– Prepackaged AMIs for ease of provisioning

• Hadoop connector

• Multiple parallel load options

• Available now!

Page 26: Big Data and Big Analytics - Why, what and how

If you think you have all the right answers, you haven’t asked all the right questions

• This is true of analytics in general, but particularly true when working with Big Analytics

• The more data you have, the more relevant questions you can ask

• The more questions you ask, the more you know

• The more you know, the more questions you can ask

• The wider the range of data you have, the wider the range of questions you can ask

Page 27: Big Data and Big Analytics - Why, what and how

Questions? Answers!

The question is not “What is the answer?”, the question is “What is the question?”.

Henri Poincaré