DESCRIPTION
The presentation is divided into three parts: (1) ORM (Object Relational Mapping), (2) NoSQL, and (3) Big Data.
A PEEK INTO THE FUTURE OF DATA
ORM • NoSQL • Big Data
Presented by: PRATEEK CHAUHAN (10ESKCS738)
BEFORE STARTING…
• Are relational tables the most efficient way to manage data?
• Do companies like Facebook and Twitter really use a traditional relational DBMS to manage their data?
PART 1
ORM: OBJECT RELATIONAL MAPPING
WAYS TO ACCESS A DATABASE
• Using a GUI-based DBMS
• Using a console-based DBMS
• Using a database embedded in an application (most important)
THE BRIDGE?
[Diagram: the Application Programming Interface (API) is the bridge to the DATABASE]
THE BRIDGE: JDBC
• A standard Java API for database-independent connectivity between the Java programming language and a wide range of databases.
• JDBC provides a flexible architecture for writing database-independent applications that can run on different platforms and interact with different DBMSs without modification.
• JDBC includes APIs for each of the tasks commonly associated with database usage:
  – Making a connection to a database
  – Creating SQL statements
  – Executing SQL queries in the database
  – Viewing & modifying the resulting records
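The four tasks above can be sketched with the standard `java.sql` API. This is a minimal illustration, not the presenter's code: the `users` table, its columns, and the JDBC URL passed on the command line are hypothetical, a matching JDBC driver must be on the classpath, and updatable result sets (`CONCUR_UPDATABLE`) are an optional JDBC feature that not every driver supports.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class JdbcTasks {
    // Hypothetical table: users(id INT PRIMARY KEY, name VARCHAR(100))
    static final String QUERY = "SELECT id, name FROM users";

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            System.out.println("usage: java JdbcTasks <jdbc-url>");
            return;
        }
        // 1. Make a connection; DriverManager picks the driver from the URL prefix
        try (Connection conn = DriverManager.getConnection(args[0]);
             // 2. Create a statement that asks for an updatable result set
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
             // 3. Execute the query
             ResultSet rs = stmt.executeQuery(QUERY)) {
            // 4. View and modify the resulting records
            while (rs.next()) {
                int id = rs.getInt("id");
                String name = rs.getString("name");
                System.out.println(id + ": " + name);
                if (name != null && !name.equals(name.trim())) {
                    rs.updateString("name", name.trim()); // change the value in the current row
                    rs.updateRow();                       // write the change back to the database
                }
            }
        }
    }
}
```

Note that the SQL text itself is passed through untouched, which is why the query stays DBMS-specific.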
JDBC
Pros of JDBC
• Clean and simple SQL processing
• Good performance with small data
• Very good for small applications
• Simple syntax, so easy to learn

Cons of JDBC
• Complex if used in large projects
• Large programming overhead
• No encapsulation
• Hard to implement the MVC concept
• Queries are DBMS-specific
The Problem
• Mapping member variables to columns
• Mapping relationships
• Handling data types (esp. Boolean)
• Managing changes to object state
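To make these chores concrete, here is a hand-written mapping for one hypothetical class. The `Employee` class, the column names, and the `'Y'`/`'N'` encoding of booleans are all assumptions for illustration (some databases lack a native BOOLEAN column type, so a character flag is a common workaround); relationships are omitted to keep it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

class ManualMapping {
    // Hypothetical model object
    static class Employee {
        private final int id;
        private String name;
        private boolean manager;
        private boolean dirty; // change tracking must be coded by hand

        Employee(int id, String name, boolean manager) {
            this.id = id;
            this.name = name;
            this.manager = manager;
        }

        void setName(String name) {
            if (!Objects.equals(this.name, name)) {
                this.name = name;
                dirty = true; // remember that the row must be updated later
            }
        }

        boolean isManager() { return manager; }
        boolean isDirty() { return dirty; }
    }

    // Member variables -> columns, written out by hand for every class
    static Map<String, Object> toRow(Employee e) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("EMP_ID", e.id);
        row.put("EMP_NAME", e.name);
        row.put("IS_MANAGER", e.manager ? "Y" : "N"); // boolean stored as a character flag
        return row;
    }

    // Columns -> member variables, again by hand
    static Employee fromRow(Map<String, Object> row) {
        return new Employee(
                (Integer) row.get("EMP_ID"),
                (String) row.get("EMP_NAME"),
                "Y".equals(row.get("IS_MANAGER")));
    }
}
```

Every new class or column means touching both `toRow` and `fromRow`, which is exactly the repetitive work an ORM aims to remove.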
The solution: Object Relational Mapping!
Saving without ORM
• Database configuration
• The model object
• Service method to create the model object
• Database design
• DAO method to save the object using SQL queries
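The five pieces above can be sketched in plain JDBC as follows. All names here (`User`, `save`, `register`, the `users` schema) are hypothetical, and the configuration is reduced to a URL argument; it is meant only to show how much is written by hand, not to reproduce the presenter's project.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class SaveWithoutOrm {
    // Database design: the table must be created by hand beforehand (hypothetical schema)
    static final String CREATE_TABLE =
            "CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100), active CHAR(1))";
    // Hand-written SQL with one placeholder per mapped field
    static final String INSERT_SQL = "INSERT INTO users (id, name, active) VALUES (?, ?, ?)";

    // The model object: a plain Java class
    static final class User {
        final int id;
        final String name;
        final boolean active;

        User(int id, String name, boolean active) {
            this.id = id;
            this.name = name;
            this.active = active;
        }
    }

    // DAO method: copies each field into the statement by position
    static void save(Connection conn, User u) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setInt(1, u.id);
            ps.setString(2, u.name);
            ps.setString(3, u.active ? "Y" : "N");
            ps.executeUpdate();
        }
    }

    // Service method: creates the model object and hands it to the DAO
    static User register(Connection conn, int id, String name) throws SQLException {
        User u = new User(id, name, true);
        save(conn, u);
        return u;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            System.out.println("usage: java SaveWithoutOrm <jdbc-url>");
            return;
        }
        // Database configuration: here just a URL taken from the command line
        try (Connection conn = DriverManager.getConnection(args[0])) {
            register(conn, 1, "Alice");
        }
    }
}
```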
The ORM Way
• JDBC database configuration – ORM-specific configuration
• The model object – annotations
• Service method to create the model object – use the ORM framework API
• Database design – not needed!
• DAO method to save the objects using SQL queries – not needed!
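The core idea behind "annotations instead of SQL" can be shown with a toy mapper that reads annotations via reflection and generates the INSERT statement itself. This is not Hibernate or JPA: the `@Table` and `@Column` annotations below are defined inside the sketch (JPA has similarly named ones in `jakarta.persistence`), and the `User` class is hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.StringJoiner;

class ToyOrm {
    // Toy annotations defined here for the sketch
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Table { String value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column { String value(); }

    // The model object: annotations replace hand-written mapping code
    @Table("users")
    static class User {
        @Column("id") int id;
        @Column("user_name") String name;
        @Column("is_active") boolean active;
        String note; // not annotated, so never persisted

        User(int id, String name, boolean active) {
            this.id = id;
            this.name = name;
            this.active = active;
        }
    }

    // Annotated fields, sorted by column name because getDeclaredFields() has no guaranteed order
    static List<Field> mappedFields(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(Column.class)) {
                fields.add(f);
            }
        }
        fields.sort(Comparator.comparing((Field f) -> f.getAnnotation(Column.class).value()));
        return fields;
    }

    // Generates the INSERT statement the developer would otherwise write by hand
    static String insertSql(Class<?> type) {
        Table table = type.getAnnotation(Table.class);
        if (table == null) {
            throw new IllegalArgumentException(type.getName() + " has no @Table annotation");
        }
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner placeholders = new StringJoiner(", ");
        for (Field f : mappedFields(type)) {
            columns.add(f.getAnnotation(Column.class).value());
            placeholders.add("?");
        }
        return "INSERT INTO " + table.value() + " (" + columns + ") VALUES (" + placeholders + ")";
    }

    // Reads field values in the same order as the placeholders
    static List<Object> values(Object entity) {
        List<Object> result = new ArrayList<>();
        for (Field f : mappedFields(entity.getClass())) {
            try {
                result.add(f.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return result;
    }
}
```

A real ORM layers much more on top (connection handling, relationships, caching, dirty checking, optional schema generation), but the annotation-driven SQL generation is the piece that removes the hand-written DAO queries.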
THE ONLY DISADVANTAGE
• Boilerplate code:
  => XML configuration files
  => XML system files
  => Extra classes like POJOs, etc.
PART 2
NoSQL: THE NAME
• SQL: In general, “Traditional Relational DBMS”.
• Past decade: an RDBMS isn’t always the best solution.
• NoSQL: “No SQL” (often read as “Not Only SQL”) => not using a traditional RDBMS
ISSUES WITH RDBMS
• Primary issue: an RDBMS is a big package that has all the features, but sometimes we don’t need all of them. The result is a compromise:
• COMPROMISES: Convenient, Multi-user
• SIMILAR: Safety, Persistent
• BOOSTS: Reliable, MASSIVE (big data), Efficient
NoSQL SYSTEMS: Alternative to traditional RDBMS
Pros
• Flexible schema
• Quicker/cheaper to set up
• Massive scalability: handles big data
• Relaxed consistency: higher performance & availability
Cons
• No declarative query language: more programming
• Relaxed consistency: fewer guarantees
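Two of these points, flexible schema and "no declarative query language: more programming", can be illustrated with an in-memory list of maps standing in for a document store. This does not model any particular product's API; the documents and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DocumentQuery {
    // Flexible schema: documents in the same collection need not share fields
    static final List<Map<String, Object>> USERS = List.of(
            Map.of("id", 1, "name", "Asha", "age", 31),
            Map.of("id", 2, "name", "Ben"),                          // no age field at all
            Map.of("id", 3, "name", "Chen", "age", 22, "city", "Pune"));

    // SQL equivalent: SELECT name FROM users WHERE age > ?
    // Without a declarative query language, the filter is written by hand
    // and must cope with documents that lack the field.
    static List<String> namesOlderThan(List<Map<String, Object>> docs, int minAge) {
        List<String> names = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object age = doc.get("age");
            if (age instanceof Integer && (Integer) age > minAge) {
                names.add((String) doc.get("name"));
            }
        }
        return names;
    }
}
```

Adding the `city` field required no schema change, but every query now has to decide what to do when a field is missing.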
Example: Social-Network Graph
• Each record: User ID1, User ID2 …
• Separate records: User ID, name, age, gender …
[Figure: sample friendship graph among users A to L]
• TASK: Find all friends of a given user.
• TASK: Find all friends of friends of a given user.
• TASK: Find all women friends of men friends of a given user.
• TASK: Find all friends of friends of … friends of a given user.
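The "friends of friends of … friends" tasks amount to a breadth-first search over the friendship records. The sketch below assumes friendships are stored as an in-memory adjacency map with hypothetical user IDs; "at distance k" means users whose shortest friendship path has exactly k links, so direct friends are not counted again as friends of friends. In a relational design, each extra hop typically means another self-join of the friendship table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FriendGraph {
    // Friendship records (UserID1, UserID2), stored in both directions
    private final Map<String, Set<String>> friends = new HashMap<>();

    void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Users exactly `hops` links away (1 = friends, 2 = friends of friends, ...)
    Set<String> atDistance(String user, int hops) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(user, 0);
        queue.add(user);
        Set<String> result = new TreeSet<>();
        while (!queue.isEmpty()) {
            String current = queue.poll();
            int d = dist.get(current);
            if (d == hops) {
                result.add(current);
                continue; // do not expand past the requested depth
            }
            for (String next : friends.getOrDefault(current, Set.of())) {
                if (!dist.containsKey(next)) { // BFS: first visit is the shortest distance
                    dist.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return result;
    }
}
```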
INCARNATIONS OF NoSQL
• MapReduce Framework: OLAP (big operations)
• Key-Value Store: OLTP (small operations)
• Document Stores
• Graph database systems
MapReduce Framework
• Originally from Google; open-source implementation: Hadoop.
• Two main functions:
  1. Map: divides the problem into subproblems.
  2. Reduce: operates on the outputs of the subproblems and combines them into the final records.
• Higher-level languages built on top of it:
  1. Hive: SQL-like language (HiveQL)
  2. Pig: statement-based language (Pig Latin)
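The Map and Reduce functions can be sketched in a single process with the classic word-count example (chosen for illustration, not taken from the slides). This is not Hadoop's API, which uses `Mapper` and `Reducer` classes in `org.apache.hadoop.mapreduce`; the "shuffle" step that groups pairs by key is done here with a `TreeMap`, whereas a real framework does it across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountMapReduce {
    // Map: one input record (a line) -> list of (word, 1) pairs
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce: one key plus all of its values -> one output record
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group intermediate pairs by key
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) { // map() calls are independent, so they could run in parallel
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> output = new TreeMap<>();
        grouped.forEach((word, counts) -> output.put(word, reduce(word, counts)));
        return output;
    }
}
```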
Graph Database Systems
• Data model: nodes and edges.
• Nodes may have properties.
• Edges may have labels or roles.
• Examples: Neo4j, FlockDB; related graph-processing system: Pregel
[Figure: nodes ID: 1, ID: 2 and ID: 3 connected by edges labeled “Friends” and “Likes”]
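The nodes-with-properties, labeled-edges model can be sketched in memory and used for the earlier "women friends of men friends" task. This is a hypothetical toy, not the API of Neo4j or FlockDB (Neo4j queries are usually written in its Cypher language); here a friendship is stored as a directed edge for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PropertyGraph {
    // A node: an ID plus arbitrary key/value properties
    static final class Node {
        final int id;
        final Map<String, String> props = new HashMap<>();
        Node(int id) { this.id = id; }
    }

    // A directed edge carrying a label such as "Friends" or "Likes"
    static final class Edge {
        final int from;
        final int to;
        final String label;
        Edge(int from, int to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    final Map<Integer, Node> nodes = new HashMap<>();
    final List<Edge> edges = new ArrayList<>();

    // keyValues alternates property names and values
    Node addNode(int id, String... keyValues) {
        Node n = new Node(id);
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            n.props.put(keyValues[i], keyValues[i + 1]);
        }
        nodes.put(id, n);
        return n;
    }

    void addEdge(int from, int to, String label) {
        edges.add(new Edge(from, to, label));
    }

    // Follow edges with the given label from any start node; if key is non-null,
    // keep only targets whose property `key` equals `value`
    Set<Integer> neighbors(Set<Integer> start, String label, String key, String value) {
        Set<Integer> result = new TreeSet<>();
        for (Edge e : edges) {
            if (start.contains(e.from) && e.label.equals(label)) {
                Node target = nodes.get(e.to);
                if (key == null || value.equals(target.props.get(key))) {
                    result.add(e.to);
                }
            }
        }
        return result;
    }
}
```

The query becomes two chained traversals: first follow "Friends" edges to men, then follow "Friends" edges from those nodes to women.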
PART 3
AGAIN, SOME QUESTIONS…
• What is the maximum file size you’ve dealt with so far?
• What is the maximum download speed you get?
• How much time is required just to transfer the data?
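The last question is simple arithmetic: transfer time is data size divided by bandwidth. With hypothetical figures of 1 TB (taken as 10^12 bytes) over a sustained 100 Mbit/s link, ignoring protocol overhead:

```latex
t = \frac{\text{data size}}{\text{bandwidth}}
  = \frac{1\ \text{TB}}{100\ \text{Mbit/s}}
  = \frac{8 \times 10^{12}\ \text{bit}}{10^{8}\ \text{bit/s}}
  = 8 \times 10^{4}\ \text{s} \approx 22.2\ \text{hours}
```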
What is Big Data?
• Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
• From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data.
• In 2011, the same amount was created every two days.
• In 2013, the same amount was created every 10 minutes.
THIS IS “BIG DATA”
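The units in these figures can be checked using decimal SI prefixes (1 GB = 10^9 bytes, 1 EB = 10^18 bytes, 1 quintillion = 10^18). The second line shows that the daily rate in the first bullet is consistent with the 2011 figure of 5 exabytes every two days:

```latex
5 \times 10^{9}\ \text{GB} \times 10^{9}\ \tfrac{\text{B}}{\text{GB}} = 5 \times 10^{18}\ \text{B} = 5\ \text{EB}

2.5 \times 10^{18}\ \tfrac{\text{B}}{\text{day}} \times 2\ \text{days} = 5 \times 10^{18}\ \text{B} = 5\ \text{EB}
```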
What is Big Data? Finally…
• ‘Big Data’ is similar to ‘small data’, but bigger.
• But bigger data requires different approaches:
  – Techniques, tools, architecture
• With the aim of solving new problems
  – Or old problems in a better way
Types of Data
• Relational data (tables/transactions/legacy data)
• Text data (Web)
• Semi-structured data (XML)
• Graph data
  – Social network, Semantic Web (RDF), …
• Streaming data
  – You can only scan the data once
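"You can only scan the data once" means statistics must be updated as each value arrives, without storing the stream. A standard technique for this is Welford's online algorithm for the mean and variance; the sketch below is a generic illustration, not something the slides prescribe.

```java
class RunningStats {
    private long count;
    private double mean;
    private double m2; // sum of squared deviations from the current mean
    private double max = Double.NEGATIVE_INFINITY;

    // Called once per arriving value; no values are kept, so one pass is enough
    void accept(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean); // uses the updated mean (Welford's update)
        max = Math.max(max, x);
    }

    long count() { return count; }
    double mean() { return mean; }
    double variance() { return count > 0 ? m2 / count : 0.0; } // population variance
    double max() { return max; }
}
```

Memory use is constant regardless of how long the stream runs, which is the property streaming data demands.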
What to do with these data?
• Aggregation and statistics
  – Data warehouse and OLAP
• Indexing, searching, and querying
  – Keyword-based search
  – Pattern matching (XML/RDF)
• Knowledge discovery
  – Data mining
  – Statistical modeling
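Keyword-based search is usually built on an inverted index, which maps each term to the documents containing it. The following is a minimal in-memory sketch with AND semantics for multi-word queries; the tokenization rule (split on non-word characters, lowercase) and the sample documents are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndex {
    // term -> IDs of documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Documents containing every query keyword (intersection of posting sets)
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        if (result == null) {
            return new TreeSet<>();
        }
        return result;
    }
}
```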
MARKET SIZE
Big Data Analytics Technologies
• NoSQL: non-relational database solutions such as HBase, Cassandra, MongoDB, Riak, CouchDB, and many others.
• Hadoop: an ecosystem of software packages, including MapReduce, HDFS, and a whole host of others.
Summarizing…
• Key enablers for the appearance and growth of ‘Big Data’ are:
  + Increase in storage capabilities
  + Increase in processing power
  + Availability of data
THANK YOU