Gaurav Kohli
Xebia
Breaking with Relational DBMS and Dating with Hbase
About me
Gaurav Kohli
[email protected]
Consultant, Xebia IT Architects
Agenda
Why are we here?
Something about RDBMS
Limitations of RDBMS
Why Hbase or any NoSQL solution
Overview of Hbase
Specific use cases
Paradigm shift in schema design
Architecture of Hbase
Hbase interfaces: Java API, Thrift
Conclusion
Relational Databases
Relational databases have a lot of limitations
Limitations
Data sets going into petabytes
RDBMS don't scale inherently: scale up / scale out (load balancing + replication)
Hard to shard / partition
High read and write throughput together is not possible: transactional / analytical databases
Specialized hardware is very expensive: Oracle clustering
Replication: Master-Slave
[Diagram: a master replicating to a slave]
Scaling Out: Master - Many Slaves
When scaling out MySQL, the master becomes a problem
All slaves must have the same write capacity as the master
Single point of failure, no easy failover
[Diagram: writes go to the master, reads go to the slave nodes]
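To make the write-capacity point concrete, here is a minimal sketch of a master with several slaves. It is a toy in-memory model, not a MySQL client; the `Node` and `MasterSlaveCluster` names are made up for illustration. Every write is replayed on every slave, so adding slaves spreads reads but never reduces the write load per node.

```python
class Node:
    """One database node; counts how many writes it had to apply."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.writes_applied = 0

    def apply(self, key, value):
        self.data[key] = value
        self.writes_applied += 1


class MasterSlaveCluster:
    """Hypothetical toy cluster: one master, many read-only slaves."""

    def __init__(self, slave_count):
        self.master = Node("master")
        self.slaves = [Node(f"slave-{i}") for i in range(slave_count)]
        self._next_reader = 0

    def write(self, key, value):
        # All writes go to the single master and are replayed on every slave.
        self.master.apply(key, value)
        for slave in self.slaves:
            slave.apply(key, value)

    def read(self, key):
        # Reads are spread round-robin across the slaves.
        slave = self.slaves[self._next_reader % len(self.slaves)]
        self._next_reader += 1
        return slave.name, slave.data.get(key)


cluster = MasterSlaveCluster(slave_count=3)
for i in range(100):
    cluster.write(f"user:{i}", i)

reads = [cluster.read("user:42") for _ in range(3)]
print([slave.writes_applied for slave in cluster.slaves])
print(reads)
```

Each slave ends up applying as many writes as the master, which is why write throughput does not improve with this layout.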
Dual Master
[Diagram: two masters replicating to each other, with a slave attached]
NoSQL
Background
2006.11  Google releases paper on BigTable
2007.2   Initial HBase prototype created as Hadoop contrib
2007.10  First usable HBase
2008.1   Hadoop becomes an Apache top-level project and HBase becomes a subproject
2010.5   HBase becomes an Apache top-level project
2010.6   HBase 0.20.5 released
2010.10  HBase 0.89.2010092 third developer release
Hbase is a
Distributed (uses HDFS for storage)
Column-Oriented
Multi-Dimensional (versions)
High-Availability
High-Performance
Storage System
Hbase is Not
A SQL database: no joins, no query engine, no datatypes, no SQL
No schema
Denormalized data
Wide and sparsely populated data structure (key-value)
No DBA needed
Use Case
Bigness: big data, big number of users, big number of computers
Massive write performance: Facebook needs 135 billion messages a month, Twitter stores 7 TB of data per day
Fast key-value access
Write availability
No single point of failure
Specific Use Case
Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, etc.
Real-time inserts, updates, and queries
Fraud detection by comparing transactions to known patterns in real time
Analytics: use MapReduce, Hive, or Pig to perform analytical queries
Storage Model
Column-oriented database
Tables are sorted by row key
Table schema only defines column families; a column family can have any number of columns
Each cell value has a timestamp
Storage Model
SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))
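The nested map above can be sketched in a few lines of Python. This is an illustrative in-memory model only, not how Hbase stores data on disk; `put`, `get_latest` and `scan` are hypothetical helper names. Python dicts are not sorted, so the sketch sorts keys when scanning.

```python
from collections import defaultdict

# table[row_key][column_key] -> list of (timestamp, value), newest first
table = defaultdict(lambda: defaultdict(list))


def put(row_key, column_key, timestamp, value):
    """Add one version of a cell, keeping versions ordered newest first."""
    versions = table[row_key][column_key]
    versions.append((timestamp, value))
    versions.sort(reverse=True)


def get_latest(row_key, column_key):
    """Return the newest value of a cell, or None if the cell does not exist."""
    versions = table.get(row_key, {}).get(column_key, [])
    return versions[0][1] if versions else None


def scan():
    """Yield cells ordered by row key, then by column key."""
    for row_key in sorted(table):
        for column_key in sorted(table[row_key]):
            yield row_key, column_key, table[row_key][column_key]


put("1", "info:name", 1273516197868, "Gaurav")
put("1", "info:age", 1273871823022, "34")
put("1", "info:age", 1273871824184, "28")
put("2", "info:name", 1273863723227, "Harsh")

print(get_latest("1", "info:age"))
for cell in scan():
    print(cell)
```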
A Big Sorted Map
Row Key + Column Key + Timestamp => Value

Row Key   Column Key   Timestamp       Value
1         info:name    1273516197868   Gaurav
1         info:age     1273871824184   28
1         info:age     1273871823022   34
1         info:sex     1273746281432   Male
2         info:name    1273863723227   Harsh
3         info:name    1273822456433   Raman

Row 1 holds two versions of info:age
The timestamp is a long value
In info:name, "info" is the column family and "name" is the column qualifier (name)
Cells are sorted by row key and column key
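The "row key + column key + timestamp => value" view can also be sketched as one flat sorted list. This is an assumption-laden toy, not the Hbase client API: `put` and `get` are hypothetical helpers, and the timestamp is stored negated so that the newest version of a cell sorts first.

```python
import bisect

# Sorted list of ((row_key, column_key, -timestamp), value).
cells = []


def put(row_key, column_key, timestamp, value):
    """Insert a cell version while keeping the whole list sorted."""
    bisect.insort(cells, ((row_key, column_key, -timestamp), value))


def get(row_key, column_key, max_versions=1):
    """Return up to max_versions (timestamp, value) pairs, newest first."""
    # Seek to the first entry for this row and column.
    start = bisect.bisect_left(cells, ((row_key, column_key, float("-inf")),))
    result = []
    for (row, column, negated_ts), value in cells[start:]:
        if (row, column) != (row_key, column_key) or len(result) == max_versions:
            break
        result.append((-negated_ts, value))
    return result


put("1", "info:name", 1273516197868, "Gaurav")
put("1", "info:age", 1273871823022, "34")
put("1", "info:age", 1273871824184, "28")
put("1", "info:sex", 1273746281432, "Male")
put("2", "info:name", 1273863723227, "Harsh")
put("3", "info:name", 1273822456433, "Raman")

print(get("1", "info:age"))
print(get("1", "info:age", max_versions=2))
```

Because the list is sorted, reading one cell is a seek followed by a short sequential read, which is the access pattern the slide is describing.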
Schema Design: Student table
Every row has a row key; rows are stored sorted by row key
A table may have one or more column families; it is common to have a small number of column families
Column families should rarely change
A column family can have any number of columns
Each cell has a timestamp and can have multiple versions
Schema Design: Example of a Student and Subject
[Diagram: entity-relationship model]
Student table: id (PK), name, age, sex
Subject table: id (PK), title, introduction, teacher_id
Student-Subject table: student_id, subject_id, type
Student and Subject are in an m:n relationship
Schema Design: RDBMS
Three tables

Student table
key   name     age   sex
1     Gaurav   28    Male

Subject table
id   title   introduction    teacher_id
1    Hbase   Hbase is cool   10

Student-Subject table
student_id   subject_id   type
1            1            elective
Schema Design: Student-Subject schema - Hbase
Only two tables

Student table
Row Key      Column family   Column Keys
student_id   info            name, age, sex
student_id   subjects        Subject ids as qualifier (key)

Subject table
Row Key      Column family   Column Keys
subject_id   info            title, introduction, teacher_id
subject_id   students        Student ids as qualifier (key)
Schema Design: Student-Subject schema - Hbase
Only two tables

Student table
key   info                                       subjects
1     info:name=Gaurav, info:age=28, info:sex=Male   subjects:1=elective, subjects:2=main

Subject table
key   info                                                                  students
1     info:title=Hbase, info:introduction=Hbase is cool, info:teacher_id=10   students:1, students:2
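The two-table design can be sketched with plain dictionaries to show how the m:n relationship is answered without a join. This is only a model of the layout on the slide; `subjects_of`, `students_of` and `enroll` are hypothetical helpers, and the empty string used as the value of `students:<id>` is an assumption, since the slide shows no value there.

```python
# row key -> column family -> qualifier -> value
student_table = {
    "1": {
        "info": {"name": "Gaurav", "age": "28", "sex": "Male"},
        "subjects": {"1": "elective", "2": "main"},
    },
}
subject_table = {
    "1": {
        "info": {"title": "Hbase", "introduction": "Hbase is cool", "teacher_id": "10"},
        "students": {"1": "", "2": ""},
    },
}


def subjects_of(student_id):
    """Subject ids are the qualifiers of the 'subjects' family: one row read."""
    return sorted(student_table[student_id]["subjects"])


def students_of(subject_id):
    """Student ids are the qualifiers of the 'students' family: one row read."""
    return sorted(subject_table[subject_id]["students"])


def enroll(student_id, subject_id, kind):
    """Denormalized write: the relationship is stored once in each table."""
    student = student_table.setdefault(student_id, {"info": {}, "subjects": {}})
    student["subjects"][subject_id] = kind
    subject = subject_table.setdefault(subject_id, {"info": {}, "students": {}})
    subject["students"][student_id] = ""


enroll("3", "1", "elective")
print(subjects_of("1"))
print(students_of("1"))
```

The trade-off is visible in `enroll`: reads become single-row lookups, but every relationship has to be written twice and kept consistent by the application.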
Column family attributes

Attribute     Possible Values          Default
COMPRESSION   NONE, GZ, LZO            NONE
VERSIONS      1+                       3
TTL           1-2147483647 (seconds)   2147483647
BLOCKSIZE     1 byte - 2 GB            64k
IN_MEMORY     true, false              false
BLOCKCACHE    true, false              true
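As a rough illustration of what VERSIONS and TTL mean for a cell, here is a small sketch. It is not Hbase code: the `ColumnFamily` class is hypothetical, timestamps are plain seconds rather than the millisecond longs Hbase uses, and the exact expiry boundary is an assumption.

```python
class ColumnFamily:
    """Toy column family that keeps a bounded, expiring history per cell."""

    def __init__(self, versions=3, ttl=2147483647):
        self.versions = versions  # maximum versions kept per cell
        self.ttl = ttl            # time to live, in seconds
        self.cells = {}           # (row, qualifier) -> [(timestamp, value)], newest first

    def put(self, row, qualifier, timestamp, value):
        history = self.cells.setdefault((row, qualifier), [])
        history.append((timestamp, value))
        history.sort(reverse=True)
        del history[self.versions:]  # drop everything beyond the newest N versions

    def get(self, row, qualifier, now):
        """Return the versions that are still within the TTL at time 'now'."""
        history = self.cells.get((row, qualifier), [])
        return [(ts, value) for ts, value in history if now - ts < self.ttl]


family = ColumnFamily(versions=3, ttl=60)
for timestamp, value in [(0, "a"), (10, "b"), (20, "c"), (30, "d")]:
    family.put("1", "age", timestamp, value)

print(family.get("1", "age", now=35))
print(family.get("1", "age", now=75))
```

With `versions=3` the oldest of the four writes is discarded at write time, and with `ttl=60` the version written at second 10 is no longer returned at second 75.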
Regions
Region: a contiguous set of lexicographically sorted rows
A region splits when it grows past hbase.hregion.max.filesize (default: 256 MB)
Regions are hosted by Region Servers
Each table is partitioned into regions
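Since each region covers a contiguous, sorted range of row keys, finding the region for a row is a binary search over the region start keys. The sketch below assumes made-up start keys and server names (`rs1`, `rs2`) and a hypothetical `locate` helper; it is not the real client lookup.

```python
import bisect

# Each region covers [its start key, the next region's start key).
region_start_keys = ["", "row200", "row500"]
region_servers = ["rs1", "rs2", "rs1"]  # hypothetical hosting assignment


def locate(row_key):
    """Return (region index, hosting server) for a row key."""
    index = bisect.bisect_right(region_start_keys, row_key) - 1
    return index, region_servers[index]


for key in ["row1", "row1000", "row200", "row350", "row500"]:
    print(key, locate(key))
```

Note that "row1000" lands in the first region: keys are compared lexicographically as strings, not numerically, which matters when designing row keys.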
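Regions and Splitting
[Diagram: before the split, one region holds row1 to row200 and a second holds row201 to row500; a new row arrives]
[Diagram: after the split, the regions hold row1 to row200, row201 to row350, and row351 to row501]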
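The split shown in the diagrams can be sketched as follows. This is a simplification under stated assumptions: the threshold here counts rows (`MAX_ROWS_PER_REGION`, a made-up stand-in for the size-based hbase.hregion.max.filesize), and `insert` is a hypothetical helper.

```python
import bisect

MAX_ROWS_PER_REGION = 4  # stand-in for the size-based split threshold


def insert(regions, row_key):
    """Insert a row key into the right region and split that region if too big.

    'regions' is a list of sorted, non-overlapping lists of row keys.
    """
    # Pick the last region whose first key is <= row_key (else the first region).
    target = 0
    for index, region in enumerate(regions):
        if region and region[0] <= row_key:
            target = index
    bisect.insort(regions[target], row_key)

    rows = regions[target]
    if len(rows) > MAX_ROWS_PER_REGION:
        middle = len(rows) // 2
        # Replace the oversized region with its two halves, preserving order.
        regions[target:target + 1] = [rows[:middle], rows[middle:]]
    return regions


regions = [["row1", "row2", "row3", "row4"]]
insert(regions, "row5")   # fifth row pushes the region over the limit
insert(regions, "row25")  # sorts between row2 and row3, no split needed
print(regions)
```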
Architecture
Master
Zookeeper
RegionServers
HDFS
MapReduce
Java API, Thrift... Tools
Java
Thrift (Ruby, PHP, Python, Perl, C++...)
REST
Groovy DSL
MapReduce
Hbase Shell

Java API operations
Get
Put
Delete
Scan
IncrementColumnValue
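To show what these five operations do, here is a toy in-memory table. It is not the Hbase Java client and does not talk to a cluster: `ToyTable` and its method names are hypothetical, versions and column families are left out, and the exclusive stop row in `scan` is how I understand Hbase scans to behave.

```python
class ToyTable:
    """In-memory stand-in for a table, keyed by row key."""

    def __init__(self):
        self.rows = {}  # row_key -> {column_key: value}

    def put(self, row_key, column_key, value):
        self.rows.setdefault(row_key, {})[column_key] = value

    def get(self, row_key):
        return dict(self.rows.get(row_key, {}))

    def delete(self, row_key):
        self.rows.pop(row_key, None)

    def scan(self, start_row="", stop_row=None):
        """Yield rows in row-key order from start_row up to, not including, stop_row."""
        for row_key in sorted(self.rows):
            if row_key < start_row:
                continue
            if stop_row is not None and row_key >= stop_row:
                break
            yield row_key, dict(self.rows[row_key])

    def increment_column_value(self, row_key, column_key, amount=1):
        """Add 'amount' to a counter cell and return the new value."""
        row = self.rows.setdefault(row_key, {})
        row[column_key] = row.get(column_key, 0) + amount
        return row[column_key]


students = ToyTable()
students.put("1", "info:name", "Gaurav")
students.put("2", "info:name", "Harsh")
students.put("3", "info:name", "Raman")

scanned = list(students.scan(start_row="1", stop_row="3"))
students.increment_column_value("1", "stats:logins")
logins = students.increment_column_value("1", "stats:logins")
students.delete("2")

print(scanned)
print(students.get("1"), logins)
```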
Conclusion
Hbase v/s RDBMS: not a replacement
Hbase solves only a small subset (~5%)

Where SQL makes life easy
Joining
Secondary indexing
Referential integrity (updates)
ACID

Where Hbase makes life easy
Dataset scale
Read/write scale
Replication
Batch analysis
References & Credit
Hbase Apache (http://hbase.apache.org/)
Hbase Wiki (wiki.apache.org/hadoop/Hbase)
Hbase blog (blog.hbase.org)
Images from Google Search
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html