Big Data: hype or reality?

DESCRIPTION

Big Data is a hype. You hear everyone waving it around as the Big Thing of today and tomorrow. Despite the buzz, it is increasingly a reality for us technical people, and it will soon have a permanent place in our toolbox. In this session we look at what Big Data really is and what you need to know to answer your customer's Big Data questions on a technical level. Besides the meaning, the various disciplines, an overview and the architecture, we also take a brief, close look at a number of technologies:

• Hadoop, the computing engine, its environment and all its satellites.

• Neo4j, the graph database.

• ElasticSearch, the search database.

Agenda

• What is Big Data?

• Technology Radar

• Technologies in scope.

• Architecture

• Wanted!

• Next steps.

The world of data is changing.

Data has a chaotic nature.

Big Data <> Big Data

Big Data == Big in Data.

Big Data = 4 V’s.

Volume = Dealing with the size.

Variety = Handling the multiplicity of types, sources and formats.

Velocity = Reacting to the flood of information in the time required by the application.

Veracity = Coping with uncertainty, imprecision, missing values or untruths.

Big Data 1.0 = Building the capabilities to process large data in support of current operations (efficiency improvement).

Big Data 2.0 = What can I now do that I couldn’t do before, or do better than I could before?

Polyglot persistence

• Relational databases are not dead.

• Enterprises should expect multiple data-storage technologies for different applications.

• Even for a single application, polyglot persistence is good.

• Do not replace one database solution with another to expect wonders.
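To make the "single application, several stores" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `OrderService` class, its field names, and the use of plain dictionaries as stand-ins for a search index and a graph store. Only the relational part uses a real engine (the in-memory SQLite database from the standard library).

```python
import sqlite3


class OrderService:
    """Hypothetical service that writes each kind of data to the store that suits it."""

    def __init__(self):
        # Relational store: transactional facts such as order totals.
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
        )
        # Stand-in for a search index: one free-form document per order.
        self.search_docs = {}
        # Stand-in for a graph store: item -> items bought together with it.
        self.also_bought = {}

    def place_order(self, order_id, customer, items):
        """items is a list of (name, price) pairs; returns the order total."""
        names = [name for name, _ in items]
        total = sum(price for _, price in items)
        with self.sql:  # commits on success, rolls back on error
            self.sql.execute(
                "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer, total)
            )
        self.search_docs[order_id] = {"customer": customer, "items": names}
        for name in names:
            others = {n for n in names if n != name}
            self.also_bought.setdefault(name, set()).update(others)
        return total


service = OrderService()
order_total = service.place_order(1, "acme", [("disk", 80.0), ("cable", 5.5)])
stored_total = service.sql.execute("SELECT total FROM orders WHERE id = 1").fetchone()[0]
```

The relational table stays the source of truth for the order itself, while the other two structures answer questions (text search, "bought together") that a relational schema handles less naturally. In a real system each stand-in would be a separate product, and keeping them consistent becomes a design concern of its own.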

Technologies in the picture

• Hadoop and technologies built on top of it.

• ElasticSearch.

• Neo4j.

Hadoop

• Apache Software Foundation

• Commercial solutions

• Hortonworks

• Cloudera

• MapR

And many more...

ElasticSearch

• Based on Lucene.

• ElasticSearch is also the name of the company.

• Search, analyze and index in real time.

• Distributed.

• High availability.

• Document-oriented.

• Schema free.

• RESTful API.
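The "document-oriented" and "RESTful API" points can be sketched as plain HTTP requests carrying JSON. The Python snippet below only builds the requests and does not send them, so it runs without a cluster. The host and port (`localhost:9200`), the `talks` index, the document fields and the helper names are assumptions for illustration; the `_doc` and `_search` paths follow recent ElasticSearch versions, and older releases use a different URL layout.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def index_request(index, doc_id, document):
    """Build (but do not send) a request that stores one JSON document."""
    return Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """Build (but do not send) a full-text 'match' query on one field."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# No schema is declared up front: the document is just a JSON object.
put = index_request("talks", "1", {"title": "Big data hype or reality", "tags": ["hadoop", "neo4j"]})
find = search_request("talks", "title", "big data")
print(put.get_method(), put.full_url)
print(find.get_method(), find.full_url)
```

Passing either object to `urllib.request.urlopen` would send it to a running node. In practice an official client library is the more common choice, but the raw requests show that nothing beyond HTTP and JSON is required.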

Neo4j

• Graph database.

• Ideal for metadata and relationships.

• Not for large content.

• Not for large graphs.
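As a rough picture of what "metadata and relationships" means in a graph database, here is a small property-graph sketch in plain Python. It is not Neo4j's API: the `TinyGraph` class, the `KNOWS` and `WROTE` relationship types and the `content_ref` property are all made up for illustration. Nodes carry small metadata only, and large content is referenced rather than stored, in line with the bullets above.

```python
from collections import defaultdict, deque


class TinyGraph:
    """Hypothetical in-memory property graph: nodes with properties, typed edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = defaultdict(list)  # node id -> [(relationship type, target id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))

    def reachable(self, start, rel_type, max_hops):
        """Ids reachable from start over rel_type edges within max_hops (breadth-first)."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for kind, target in self.edges[node]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    found.append(target)
                    queue.append((target, hops + 1))
        return found


graph = TinyGraph()
graph.add_node("alice", kind="person")
graph.add_node("bob", kind="person")
graph.add_node("carol", kind="person")
# Metadata only: the report body lives elsewhere and is merely referenced here.
graph.add_node("report", kind="document", content_ref="store://reports/42")
graph.relate("alice", "KNOWS", "bob")
graph.relate("bob", "KNOWS", "carol")
graph.relate("alice", "WROTE", "report")

within_two_hops = graph.reachable("alice", "KNOWS", max_hops=2)
within_one_hop = graph.reachable("alice", "KNOWS", max_hops=1)
```

The traversal follows relationships directly from node to node instead of joining tables, which is the kind of query a graph database is built for.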

Next steps.

Learning and case study group.

I need datastores:

• OpenStreetMap.

• NASA

• ....

Q&A