A peek into the future

A PEEK INTO THE FUTURE OF DATA

ORM • NoSQL • Big Data

Presented by: PRATEEK CHAUHAN (10ESKCS738)

BEFORE STARTING….

• Are relational tables the most efficient way to manage data?

• Do companies like Facebook and Twitter really use a traditional relational DBMS to manage their data?

PART 1

ORM: OBJECT RELATIONAL MAPPING

WAYS TO ACCESS DATABASE

• Using a GUI-based DBMS

• Using a console-based DBMS

• Using a database embedded in an application (most important)

THE BRIDGE?

The bridge between an application and the DATABASE is an APPLICATION PROGRAMMING INTERFACE (API).

THE BRIDGE: JDBC

• A standard Java API for database-independent connectivity between the Java programming language and a wide range of databases.

• JDBC provides a flexible architecture for writing database-independent applications that can run on different platforms and interact with different DBMSs without modification.

• JDBC includes APIs for each of the tasks commonly associated with database usage:
– Making a connection to a database
– Creating SQL statements
– Executing SQL queries in the database
– Viewing and modifying the resulting records
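The four tasks listed above can be sketched end to end. JDBC itself is a Java API; the snippet below is only an analogue using Python's standard DB-API module `sqlite3`, and the in-memory database, the `users` table, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# 1. Make a connection (an in-memory SQLite database, chosen only for illustration).
conn = sqlite3.connect(":memory:")

# 2. Create SQL statements (the `users` table and its columns are hypothetical).
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
insert_sql = "INSERT INTO users (name, age) VALUES (?, ?)"

# 3. Execute the statements, passing values as parameters rather than
#    concatenating them into the SQL string.
conn.executemany(insert_sql, [("Asha", 21), ("Ravi", 25)])
conn.commit()

# 4. View and modify the resulting records.
rows = conn.execute("SELECT name, age FROM users ORDER BY id").fetchall()
conn.execute("UPDATE users SET age = age + 1 WHERE name = ?", ("Asha",))
updated = conn.execute("SELECT age FROM users WHERE name = ?", ("Asha",)).fetchone()
conn.close()
```

In JDBC the same steps would roughly correspond to obtaining a connection, preparing a statement, executing it, and iterating over a result set.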

JDBC

Pros of JDBC

• Clean and simple SQL processing

• Good performance with small data

• Very good for small applications

• Simple syntax, so easy to learn

Cons of JDBC

• Complex when used in large projects

• Large programming overhead

• No encapsulation

• Hard to implement the MVC concept

• Queries are DBMS-specific

The Problem

• Mapping member variables to columns

• Mapping relationships

• Handling data types (esp. Boolean)

• Managing changes to object state

Object Relational Mapping!

Saving without ORM

• Database configuration

• The model object

• Service method to create the model object

• Database design

• DAO method to save the object using SQL queries
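A minimal sketch of these five pieces written by hand, assuming a hypothetical `Student` model and an in-memory SQLite database (neither comes from the slides). Note how the column list and the Boolean-to-integer conversion have to be spelled out manually in the DAO.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Student:
    """The model object (hypothetical)."""
    roll_no: str
    name: str
    active: bool


def create_student(roll_no, name):
    """Service method that creates the model object."""
    return Student(roll_no=roll_no, name=name, active=True)


# Database design: the table has to be designed and created by hand.
SCHEMA = "CREATE TABLE student (roll_no TEXT PRIMARY KEY, name TEXT, active INTEGER)"


class StudentDao:
    """DAO that saves and loads the object using hand-written SQL."""

    def __init__(self, conn):
        self.conn = conn

    def save(self, student):
        # Member variables are mapped to columns manually; bool becomes 0/1.
        self.conn.execute(
            "INSERT INTO student (roll_no, name, active) VALUES (?, ?, ?)",
            (student.roll_no, student.name, int(student.active)),
        )
        self.conn.commit()

    def find(self, roll_no):
        row = self.conn.execute(
            "SELECT roll_no, name, active FROM student WHERE roll_no = ?",
            (roll_no,),
        ).fetchone()
        return None if row is None else Student(row[0], row[1], bool(row[2]))


# Database configuration: here just an in-memory connection.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
dao = StudentDao(conn)
dao.save(create_student("S001", "Asha"))
loaded = dao.find("S001")
```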

The ORM Way

• JDBC database configuration – ORM-specific configuration

• The model object – annotations

• Service method to create the model object – use the ORM framework API

• Database design – not needed!

• DAO method to save the objects using SQL queries – not needed!
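To make the "not needed" items concrete, here is a toy mapper that derives the table and the SQL from the model's field declarations. It is not a real ORM framework: `TinyMapper`, `SQL_TYPES`, and the `Student` model are hypothetical names, and Python dataclass type hints stand in for the annotations an ORM framework would read.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Assumed mapping from Python field types to SQLite column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", bool: "INTEGER", float: "REAL"}


class TinyMapper:
    """Toy object-relational mapper; illustrative only."""

    def __init__(self, conn):
        self.conn = conn

    def create_table(self, cls):
        # The database design is generated from the model's fields.
        cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {cls.__name__.lower()} ({cols})"
        )

    def save(self, obj):
        # The INSERT statement is generated, so no hand-written DAO SQL.
        names = [f.name for f in fields(obj)]
        marks = ", ".join("?" for _ in names)
        self.conn.execute(
            f"INSERT INTO {type(obj).__name__.lower()} "
            f"({', '.join(names)}) VALUES ({marks})",
            astuple(obj),
        )

    def all(self, cls):
        names = ", ".join(f.name for f in fields(cls))
        rows = self.conn.execute(f"SELECT {names} FROM {cls.__name__.lower()}")
        # Convert each column back to its declared type (e.g. 1 -> True).
        return [
            cls(*(f.type(v) for f, v in zip(fields(cls), row))) for row in rows
        ]


@dataclass
class Student:
    roll_no: str
    name: str
    active: bool


mapper = TinyMapper(sqlite3.connect(":memory:"))
mapper.create_table(Student)
mapper.save(Student("S001", "Asha", True))
students = mapper.all(Student)
```

The application code only touches `Student` objects and the mapper API; the schema and the queries are derived from the model.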

THE ONLY DISADVANTAGE

• Boilerplate code
=> XML configuration files
=> XML system files
=> Extra classes such as POJOs, etc.

PART 2

NoSQL: THE NAME

• SQL: In general, “Traditional Relational DBMS”.

• Past decade: a traditional RDBMS is not always the best solution.

• NoSQL: “No SQL” => not using a traditional RDBMS

ISSUES WITH RDBMS

• Primary issue: an RDBMS is a big package with all the features, but sometimes we don’t need all of them.

COMPROMISES

• Convenient

• Multi-user

SIMILAR

• Safety

• Persistent

BOOSTS

• Reliable

• MASSIVE (big data)

• Efficient

NoSQL SYSTEMS

Alternative to traditional RDBMS

Pros

• Flexible schema

• Quicker/cheaper to set up

• Massive scalability: handles big data

• Relaxed consistency: higher performance and availability

Cons

• No declarative query language: more programming

• Relaxed consistency: fewer guarantees

Example: Social-Network Graph

• Each record: User ID1, User ID2 …

• Separate records: User ID, name, age, gender …

[Figure: social-network graph with nodes A to L]

Example: Social-Network Graph

• TASK: Find all friends of a given user.

• TASK: Find all friends of friends of a given user.

• TASK: Find all women friends of men friends of a given user.

• TASK: Find all friends of friends of … friends of a given user.
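The four tasks can be sketched over an in-memory adjacency map. The edges and genders below are made up for illustration and are not taken from the slide's figure; here "friends of friends" excludes only the user themself, which is one possible reading of the task.

```python
from collections import deque

# Hypothetical friendship records (symmetric) and per-user attributes.
friends = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
gender = {"A": "F", "B": "M", "C": "F", "D": "F", "E": "M"}


def friends_of(user):
    """Task 1: direct friends."""
    return set(friends.get(user, ()))


def friends_of_friends(user):
    """Task 2: people exactly two hops away (the user is excluded)."""
    result = set()
    for f in friends_of(user):
        result |= friends_of(f)
    result.discard(user)
    return result


def women_friends_of_men_friends(user):
    """Task 3: a two-hop query that also filters on user attributes."""
    result = set()
    for f in friends_of(user):
        if gender[f] == "M":
            result |= {g for g in friends_of(f) if gender[g] == "F"}
    result.discard(user)
    return result


def all_connected(user):
    """Task 4: friends of friends of ... friends, via breadth-first search."""
    seen = {user}
    queue = deque([user])
    while queue:
        for neighbor in friends_of(queue.popleft()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(user)
    return seen
```

The last task has no fixed number of hops, which is what makes it awkward to express as a fixed number of joins over relational tables.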

INCARNATIONS OF NoSQL

• MapReduce Framework: OLAP (big operations)

• Key-Value Store: OLTP (small operations)

• Document Stores

• Graph database systems

MapReduce Framework

• Originally from Google; open-source implementation: Hadoop.

• Two main functions:
1. Map: divides the problem into subproblems.
2. Reduce: operates on the subproblems and combines their outputs to produce the output records.

• Higher-level layers built on Hadoop:
1. Hive: SQL-like language
2. Pig: statement language
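The map and reduce steps can be illustrated with a single-process word count. This is only a sketch of the programming model, not Hadoop's API: the function names are hypothetical, and a real framework would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one input record into (key, value) pairs."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key (done by the framework in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into an output record."""
    return key, sum(values)


def map_reduce(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = map_reduce(["big data is big", "data is data"])
```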

Graph Database Systems

• Data model: nodes and edges.

• Nodes may have properties.

• Edges may have labels or roles.

• Examples: Neo4j, FlockDB, Pregel

[Figure: example graph with nodes ID: 1, ID: 2, and ID: 3 connected by “Likes” and “Friends” edges]
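A minimal sketch of this data model, assuming hypothetical class names (`Node`, `Edge`, `PropertyGraph`) and made-up properties. Which nodes the "Likes" and "Friends" edges actually connect in the slide is not recoverable, so the edges below are an assumption; this is not the API of Neo4j or any other graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)  # nodes may have properties


@dataclass(frozen=True)
class Edge:
    source: int
    target: int
    label: str  # edges may have labels or roles


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, label))

    def neighbors(self, node_id, label):
        """Follow outgoing edges with the given label."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and e.label == label
        ]


g = PropertyGraph()
g.add_node(1, name="Asha")
g.add_node(2, name="Ravi")
g.add_node(3, title="Some page")
g.add_edge(1, 2, "Friends")
g.add_edge(2, 1, "Friends")
g.add_edge(1, 3, "Likes")
g.add_edge(2, 3, "Likes")
```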

PART 3

AGAIN, SOME QUESTIONS…

• What is the largest file size you’ve dealt with so far?

• What is the maximum download speed you get?

• How much time is required just to transfer the data?

What is Big Data?

• Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.

• From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data.

• In 2011, the same amount was created every two days.

• In 2013, the same amount of data is created every 10 minutes.

THIS IS “BIG DATA”

What is Big Data? FINALLY…

• ‘Big data’ is similar to ‘small data’, but bigger.

• But bigger data requires different approaches:
– Techniques, tools, architecture

• With an aim to solve new problems
– Or old problems in a better way

Types of Data

• Relational data (tables/transactions/legacy data)

• Text data (Web)

• Semi-structured data (XML)

• Graph data
– Social network, Semantic Web (RDF), …

• Streaming data
– You can only scan the data once
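The "scan once" constraint on streaming data means statistics have to be updated incrementally rather than by re-reading the data. A small sketch, assuming a hypothetical stream of numeric readings:

```python
def one_pass_stats(stream):
    """Compute count, mean, and maximum in a single pass over the stream."""
    count = 0
    mean = 0.0
    largest = None
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental mean, no stored history
        if largest is None or x > largest:
            largest = x
    return count, mean, largest


# An iterator can be consumed only once, like a real stream.
readings = iter([4, 8, 6, 2])
stats = one_pass_stats(readings)
```

Only three running values are kept in memory, however long the stream is.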

What to do with these data?

• Aggregation and statistics
– Data warehouse and OLAP

• Indexing, searching, and querying
– Keyword-based search
– Pattern matching (XML/RDF)

• Knowledge discovery
– Data mining
– Statistical modeling

MARKET SIZE

Big Data Analytics Technologies

• NoSQL: non-relational database solutions such as HBase, Cassandra, MongoDB, Riak, CouchDB, and many others.

• Hadoop: an ecosystem of software packages, including MapReduce, HDFS, and many others.

Summarizing…

• Key enablers for the appearance and growth of ‘Big Data’ are:
+ Increase in storage capabilities
+ Increase in processing power
+ Availability of data

THANK YOU
