Introduction to Big Data

Big Data: So What?

12 October 2016

Who am I?

• Software guy

• Technology leader with experience in software development, as CTO and as development manager of mid-sized teams.

• Doing big data hands-on since 2009

• Running http://meetup.com/bigdatabe since 2011 (1700 members!)

@wimvanleuven wim@bigboards.io

What is big data?

“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures.”

–Edd Dumbill, O’Reilly

http://radar.oreilly.com/2012/01/what-is-big-data.html

… too big …

… moves too fast …

… doesn’t fit …

What is Big Data not?

• not a delivery model (on-premise vs hosted vs cloud vs IaaS/PaaS/SaaS vs serverless)

• not a deployment model (private, public, hybrid)

• not a revenue model (license vs subscription vs Pay-as-you-Go)

• not software architecture

“We don’t do Hadoop because we have Big Data; we do Big Data because we have Hadoop.”

–Unknown developer, Facebook

What is Big Data? — revisited

New tools and technologies to capture and process data on a cluster of commodity hardware so that the system acts as one, is resilient to failures and scales linearly.

Big Data is no panacea

• First decide what problem you want to solve; pick a real business problem to add immediate value

• Start small, the technology is made for linear scalability (a 3-node cluster is a cluster!)

• Then become lean: learn through experimentation

Big Data challenges

• Beware of hype, Big Data-washing and fads

• Tech infancy

• IT | Biz

• Data is hard

• Lack of skills!

Benefits

• Scalability, of course

• Collect more and more data

• Robustness inherent to the setup

• More predictable performance

Questions?

Co-existence

(Diagram: Big Data alongside the View, ESB, App and ETL components)

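DFS

(Diagram: a distributed file system splits a file into blocks 1 to 5 and stores each block on three of the five nodes)

• Node A: blocks 2, 4, 5
• Node B: blocks 1, 2, 5
• Node C: blocks 1, 3, 4
• Node D: blocks 2, 3, 5
• Node E: blocks 1, 3, 4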
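
The block layout on the DFS slide (five blocks, three copies of each, spread over five nodes) can be sketched as a toy, in-memory Python model. This is an illustration under stated assumptions, not how HDFS or any real distributed file system works: the tiny `BLOCK_SIZE`, the simple round-robin placement (so the node assignments differ from the diagram) and the helper names `split_into_blocks`, `store_file` and `read_file` are all hypothetical. It also shows where the robustness comes from: with three replicas per block, any two nodes can fail and the file can still be read back.

```python
# Toy model of a distributed file system: fixed-size blocks replicated across nodes.
# BLOCK_SIZE, the round-robin placement and all helper names are hypothetical.
BLOCK_SIZE = 4      # bytes per block; real systems use far larger blocks
REPLICATION = 3     # copies of each block, as in the diagram
NODES = ["A", "B", "C", "D", "E"]


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut the file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_file(data, nodes=NODES, replication=REPLICATION):
    """Place every block on `replication` distinct nodes, round-robin.

    Returns the number of blocks and a per-node map of block_id -> bytes.
    """
    if replication > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    blocks = split_into_blocks(data)
    storage = {node: {} for node in nodes}
    for block_id, block in enumerate(blocks):
        for k in range(replication):
            node = nodes[(block_id + k) % len(nodes)]
            storage[node][block_id] = block
    return len(blocks), storage


def read_file(num_blocks, storage, alive):
    """Reassemble the file from any surviving replica of each block."""
    parts = []
    for block_id in range(num_blocks):
        replica = next(
            (storage[node][block_id] for node in alive if block_id in storage[node]),
            None,
        )
        if replica is None:
            raise RuntimeError(f"block {block_id} lost: every replica is on a failed node")
        parts.append(replica)
    return b"".join(parts)


num_blocks, storage = store_file(b"hello big data world!")
# Nodes D and E fail; every block still has a live replica on A, B or C.
print(read_file(num_blocks, storage, alive={"A", "B", "C"}))  # b'hello big data world!'
```

With `alive={"A", "B"}` instead, block 2 (stored on C, D and E in this placement) has no live replica and `read_file` raises, which is the limit of three-way replication on five nodes.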

MapReduce

(Diagram: blocks 1 to 5 flowing through the Map, Shuffle and Reduce phases across Nodes A to E; labels x, y, z)
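
The three phases on the MapReduce slide can be illustrated with the classic word-count example. The sketch below is a single-process Python simulation under stated assumptions: the function names are hypothetical, and in a real cluster each map call would run on the node holding its input block, with the shuffle moving intermediate pairs between nodes over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, so each reducer sees one key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for a single key into one result."""
    return key, sum(values)


def word_count(records):
    mapped = chain.from_iterable(map_phase(record) for record in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Each record stands in for a block processed on a different node.
print(word_count(["x y z", "x y", "x"]))  # {'x': 3, 'y': 2, 'z': 1}
```

Because each map call only looks at its own record and each reduce call only at one key's values, both phases can run in parallel on many nodes; only the shuffle needs to move data between them.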

λ (Lambda architecture)
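
Assuming the λ slide refers to the Lambda architecture, its core idea is that a batch layer periodically recomputes a view over the complete, append-only master dataset, a speed layer covers the events that arrived since the last batch run, and queries merge the two. The toy Python class below sketches that idea for counting events; the class and method names are hypothetical, and the batch run here is instantaneous, whereas in practice it takes much longer and the speed layer must keep covering the events that arrive while it runs.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda architecture that counts events per key (hypothetical names)."""

    def __init__(self):
        self.master_dataset = []        # immutable, append-only record of all events
        self.batch_view = Counter()     # recomputed from scratch by the batch layer
        self.realtime_view = Counter()  # incremental counts from the speed layer

    def ingest(self, event):
        """New data goes to both layers."""
        self.master_dataset.append(event)
        self.realtime_view[event] += 1

    def run_batch(self):
        """Batch layer: recompute the view over all data, then reset the speed layer.

        Resetting is only safe because this toy batch run is instantaneous.
        """
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view = Counter()

    def query(self, key):
        """Serving layer: merge the batch and real-time views."""
        return self.batch_view[key] + self.realtime_view[key]


counter = LambdaCounter()
for event in ["click", "click", "view"]:
    counter.ingest(event)
counter.run_batch()
counter.ingest("click")          # arrives after the batch run
print(counter.query("click"))    # 3: 2 from the batch view + 1 from the speed layer
```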

κ (Kappa architecture)
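
Assuming the κ slide refers to the Kappa architecture, it drops the separate batch layer: all processing is a stream job reading from one durable, replayable log, and changing the processing logic means starting a new job that replays the log from the beginning. A toy Python sketch, in which a plain list stands in for such a log (for example a Kafka topic) and `StreamJob` and `key_fn` are hypothetical names:

```python
from collections import Counter


class StreamJob:
    """Toy stream processor reading a shared, append-only log by offset."""

    def __init__(self, log, key_fn):
        self.log = log          # stand-in for a durable, replayable event log
        self.key_fn = key_fn    # the job's processing logic
        self.offset = 0         # position of the next unread event
        self.view = Counter()   # materialized view that queries read

    def poll(self):
        """Process every event appended since the last poll."""
        for event in self.log[self.offset:]:
            self.view[self.key_fn(event)] += 1
        self.offset = len(self.log)


log = ["Home", "home", "About"]
v1 = StreamJob(log, key_fn=lambda page: page)   # original logic: case-sensitive
v1.poll()
log.append("HOME")
v1.poll()                                       # only the new event is processed

# New logic: start a second job that replays the whole log from offset 0.
v2 = StreamJob(log, key_fn=str.lower)
v2.poll()
print(v1.view["home"], v2.view["home"])  # 1 3
```

Once the new job has caught up with the head of the log, queries can be switched to its view and the old job stopped, so only one code path has to be maintained.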

Q&A
