HBASE
V.Hariharaputhran
o Fourteen years in Oracle Development / DBA / Big Data / Cloud Technologies
o All India Oracle Users Group (AIOUG) Evangelist
o Passion to learn and share
o Blog: www.puthranv.com
Harish P
o Eight-plus years in Oracle DBA
o Big Data / Cloud Technologies / RAC Specialist
o All India Oracle Users Group (AIOUG) Evangelist
o Passion to learn and share
Agenda
• Big Data Introduction
• Hadoop Components
• HBase Overview
• HBase in Hadoop
• Why HBase
• HBase Architecture
• HBase Read and Write
Data Data Data…Lots of Data
Google keeps track of you
World Population
Banking/Telecom/Energy…every industry contributes
No Data Archiving Logic
I am always online
Internet of People to Internet of Things
• Quality & consistency
• Maintain & repair
• Smart shopping
• Monitor pollution levels
• Wildlife protection
• Farming
• Energy
Devices TALK to each other as they become SMART & generate DATA
Hadoop Components
HDFS – Distributed File system
MapReduce – Distributed Data Processing Model
Hive – Provides SQL-Based Query Language
HBASE – Distributed column-based database
Pig – Data Flow Execution
HDFS – Daemon / Background Process
• Name Node (NN)
• Secondary Name Node (SNN)
• Data Node (DN)
[Diagram: a cluster with NN, SNN, and data nodes DN1 to DN4]
MapReduce – Daemon / Background Process
• Job Tracker
• Task Tracker
[Diagram: a cluster with NN, SNN, and data nodes DN1 to DN3]
HBase – Daemon / Background Process
• HBase Master (HM)
• Region Server (RS)
[Diagram: a cluster with HM, SNN, and region servers RS1 to RS3]
SQL vs NoSQL
Relational table (rows and columns):
EMPID  NAME      SALARY
100    Karthick  50000
101    Shiva     40000
The same data as HBase cells:
100  CF – Name    Timestamp  value = Karthick
100  CF – Salary  Timestamp  value = 50000
101  CF – Name    Timestamp  value = Shiva
101  CF – Salary  Timestamp  value = 40000
Adding a CITY value for employee 100:
EMPID  NAME      SALARY  CITY
100    Karthick  50000   CHENNAI
101    Shiva     40000
100  CF – City  Timestamp  value = Chennai
Changing the CITY value for employee 100:
EMPID  NAME      SALARY  CITY
100    Karthick  50000   DELHI
101    Shiva     40000
100  CF – City  Timestamp  value = Delhi
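The SQL vs NoSQL comparison above can be sketched in a few lines of Python. This is a toy in-memory model, not the HBase API: the class name `VersionedCells`, its methods, and the integer timestamps are all assumptions made for illustration. It shows that a put adds a new timestamped version instead of overwriting, and that a row without a CITY value simply has no cell for it.

```python
from collections import defaultdict


class VersionedCells:
    """Toy model of timestamped cells (hypothetical, not the HBase client API)."""

    def __init__(self):
        # (rowkey, column) -> list of (timestamp, value)
        self._versions = defaultdict(list)

    def put(self, rowkey, column, value, timestamp):
        # A put never overwrites in place; it appends a new version.
        self._versions[(rowkey, column)].append((timestamp, value))

    def get(self, rowkey, column):
        # Return the newest value, or None when the cell was never written.
        versions = self._versions.get((rowkey, column), [])
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]

    def history(self, rowkey, column):
        # All stored versions, newest first.
        versions = self._versions.get((rowkey, column), [])
        return sorted(versions, key=lambda version: version[0], reverse=True)


cells = VersionedCells()
cells.put("100", "Name", "Karthick", timestamp=1)
cells.put("100", "Salary", "50000", timestamp=1)
cells.put("101", "Name", "Shiva", timestamp=1)
cells.put("101", "Salary", "40000", timestamp=1)
cells.put("100", "City", "Chennai", timestamp=2)  # new column, only for row 100
cells.put("100", "City", "Delhi", timestamp=3)    # "update" = newer version

print(cells.get("100", "City"))      # newest version wins
print(cells.history("100", "City"))  # the older version is still there
print(cells.get("101", "City"))      # no cell stored for row 101
```

Row 101 needs no placeholder for CITY, which is the point of the sparse layout on the slide.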
NoSQL Databases
• Document databases
• Key-value stores
• Wide-column stores
HBase Keys & Column Families
         Personal Data     Demographic
Rowkey   Name   Address    DOB          Gender
100      Tom    SFO        01-01-1960   M
101      Mike   SFO        01-01-1970   M
Each row has a Key
Each record is divided into Column Families
Each column family consists of one or more Columns
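One way to picture the three statements above is a nested mapping: rowkey, then column family, then column. The sketch below uses the values from the slide; the helper name `read_cell` is hypothetical and this is plain Python, not an HBase client.

```python
# rowkey -> column family -> column -> value (values taken from the slide)
table = {
    "100": {
        "Personal Data": {"Name": "Tom", "Address": "SFO"},
        "Demographic": {"DOB": "01-01-1960", "Gender": "M"},
    },
    "101": {
        "Personal Data": {"Name": "Mike", "Address": "SFO"},
        "Demographic": {"DOB": "01-01-1970", "Gender": "M"},
    },
}


def read_cell(table, rowkey, family, column):
    """Return one cell value, or None if the row, family, or column is missing."""
    return table.get(rowkey, {}).get(family, {}).get(column)


print(read_cell(table, "101", "Demographic", "DOB"))
print(sorted(table["100"]["Personal Data"]))  # columns inside one family
```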
HBase Overview
• Scalable, distributed data store
• Open source avatar of Google’s Bigtable
• Sparse
• Tightly integrated with Hadoop
• Not an RDBMS
HBase is
• Column family oriented database
  • Tables consisting of rows and columns
• Persisted Map
  • Sparse
  • Multi-dimensional
  • Sorted
  • Indexed by rowkey, column and timestamp
• Key-value store
  • [rowkey, col family, col qualifier, timestamp] -> cell value
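The "sorted, multi-dimensional map" idea can be illustrated with an ordinary Python dictionary whose keys are `(rowkey, col family, col qualifier, timestamp)` tuples. This is a minimal sketch: the sample cells and the helper names `sort_key` and `latest` are made up for illustration. The ordering shown (ascending by rowkey, family, and qualifier, newest timestamp first) mirrors how HBase orders cells.

```python
def sort_key(cell_key):
    rowkey, family, qualifier, timestamp = cell_key
    # Ascending by rowkey, family, qualifier; newest timestamp first.
    return (rowkey, family, qualifier, -timestamp)


# Sparse map: only cells that were written exist at all.
cells = {
    ("row2", "cf1", "cq1", 5): "val2",
    ("row1", "cf1", "cq2", 7): "val3",
    ("row1", "cf1", "cq1", 3): "val1",
    ("row1", "cf1", "cq1", 9): "val1-new",
}

ordered = sorted(cells.items(), key=lambda item: sort_key(item[0]))


def latest(cells, rowkey, family, qualifier):
    """Return the newest value of one column, or None if it was never written."""
    versions = [
        (ts, value)
        for (r, f, q, ts), value in cells.items()
        if (r, f, q) == (rowkey, family, qualifier)
    ]
    return max(versions, key=lambda v: v[0])[1] if versions else None


for key, value in ordered:
    print(key, "->", value)
print(latest(cells, "row1", "cf1", "cq1"))
```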
HBase is not..
• A relational database
• No built-in SQL query language
• No joins
• No built-in secondary indexing
• No multi-row transactions (operations are atomic per row)
When to use HBase
• Data volume
• Application types
• Hardware environment
• No requirement for relational features
• Quick access to data
HBase Features
• Scalability
• Sharding
• Distributed storage
• Failover support
• API support
• MapReduce support
• Backup support
HBase vs RDBMS
HBase Shell
bin/hbase shell
• Create table
  • create 'mytable', 'cf1'
• List tables
  • list
• Describe table
  • describe 'mytable'
HBase Shell Cont
• Put a row
  • put 'mytable', 'row1', 'cf1:cq1', 'val1'
• Get a row
  • get 'mytable', 'row1'
• Put more
  • put 'mytable', 'row2', 'cf1:cq1', 'val2'
  • put 'mytable', 'row1', 'cf1:cq2', 'val3'
• Get a row
  • get 'mytable', 'row1'
• Scan table
  • scan 'mytable'
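To reason about what the second `get` and the final `scan` return without a running cluster, the shell session above can be mimicked in memory. The `MiniTable` class below is a hypothetical stand-in written for this sketch, not the shell or any HBase client library; it only mirrors the semantics of `create`, `put`, `get`, and `scan` for a single version per cell.

```python
class MiniTable:
    """In-memory imitation of the shell session (hypothetical helper)."""

    def __init__(self, name, *families):
        self.name = name
        self.families = set(families)
        self.rows = {}  # rowkey -> {"family:qualifier": value}

    def put(self, rowkey, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            # Column families must be declared when the table is created.
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(rowkey, {})[column] = value

    def get(self, rowkey):
        # All columns of one row, ordered by column name.
        return dict(sorted(self.rows.get(rowkey, {}).items()))

    def scan(self):
        # Every cell of the table, ordered by rowkey and then column name.
        for rowkey in sorted(self.rows):
            for column, value in sorted(self.rows[rowkey].items()):
                yield rowkey, column, value


mytable = MiniTable("mytable", "cf1")        # create 'mytable', 'cf1'
mytable.put("row1", "cf1:cq1", "val1")       # put 'mytable', 'row1', ...
first_get = mytable.get("row1")              # get 'mytable', 'row1'
mytable.put("row2", "cf1:cq1", "val2")
mytable.put("row1", "cf1:cq2", "val3")
second_get = mytable.get("row1")
scanned = list(mytable.scan())               # scan 'mytable'

print(first_get)
print(second_get)
print(scanned)
```

The second `get` returns two columns for `row1` because the later put added `cf1:cq2` to the same row; columns are created by writing to them.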
Demo
HBase – Column Families Cont
Key Value Pair
Rowkey  ColumnFamily  Column  Timestamp  Value
1       CF1           COL1    123        INDIA
1       CF1           COL1    124        27
1       CF1           COL2    126        AIOUG
1       CF1           COL2    127        NI
1       CF2           COL3    123        12.6
1       CF2           COL3    128        ORACLE
Row Format
Row Key  Timestamp  CF1:COL1  CF1:COL2  CF2:COL3
1        123        INDIA               12.6
1        124        27
1        126                  AIOUG
1        127                  NI
1        128                            ORACLE
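The two layouts above hold the same cells. The sketch below takes the key-value pairs from the slide and derives the row format from them, plus the newest value per column. The function names `row_format` and `latest_versions` are hypothetical, and all values are kept as strings.

```python
# (rowkey, family, column, timestamp, value), as listed on the slide
key_values = [
    ("1", "CF1", "COL1", 123, "INDIA"),
    ("1", "CF1", "COL1", 124, "27"),
    ("1", "CF1", "COL2", 126, "AIOUG"),
    ("1", "CF1", "COL2", 127, "NI"),
    ("1", "CF2", "COL3", 123, "12.6"),
    ("1", "CF2", "COL3", 128, "ORACLE"),
]


def row_format(key_values):
    """Group cells by (rowkey, timestamp): one sparse row per timestamp."""
    rows = {}
    for rowkey, family, column, ts, value in key_values:
        rows.setdefault((rowkey, ts), {})[f"{family}:{column}"] = value
    return dict(sorted(rows.items()))


def latest_versions(key_values):
    """Keep only the newest value of each column."""
    newest = {}
    for rowkey, family, column, ts, value in key_values:
        key = (rowkey, f"{family}:{column}")
        if key not in newest or ts > newest[key][0]:
            newest[key] = (ts, value)
    return {key: value for key, (ts, value) in newest.items()}


for (rowkey, ts), columns in row_format(key_values).items():
    print(rowkey, ts, columns)
print(latest_versions(key_values))
```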
HBase Read and Write
HBase Catalog Tables
• Keeps track of where the .META table is present
• Keeps track of all tables and regions that are present
Meta Table
HBase – Region and Region Servers
[Diagram: table TBL, with row keys a through p, is divided into Region1 to Region4. Region servers RS1210, RS1230, and RS1260 host the regions of TBL (Regions 1 to 4) alongside regions of other tables (Table T, Region 240; Table A, Region 500).]
HBase Region
• A table can be divided horizontally into one or more regions. A region contains a contiguous, sorted range of rows between a start key and an end key
• Each region can grow to about 1 GB (the configured maximum region size) before it is split
• A region of a table is served to the client by a RegionServer
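Because each region covers a contiguous, sorted key range, finding the region for a rowkey is a binary search over the region start keys. The sketch below assumes the table TBL with keys a to p is split evenly into four regions (a to d, e to h, i to l, m to p); the slides do not state the actual split points, so those boundaries and the name `find_region` are assumptions.

```python
from bisect import bisect_right

# Assumed split points. Each region covers [start key, next start key);
# the first region starts at the empty key and the last one is unbounded.
region_start_keys = ["", "e", "i", "m"]
region_names = ["Region1", "Region2", "Region3", "Region4"]


def find_region(rowkey):
    """Return the name of the region whose key range contains rowkey."""
    index = bisect_right(region_start_keys, rowkey) - 1
    return region_names[index]


for key in ["a", "d", "e", "l", "p"]:
    print(key, "->", find_region(key))
```

The empty start key on the first region guarantees that every rowkey falls into some region, so the lookup never fails.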
HBase Client – Locate Data
HBase Client – Read / Locate Data
[Diagram: the Client, ZooKeeper (which holds the META location), region servers hosting META and DATA on top of the data nodes, and the client's META cache]
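The lookup flow suggested by the diagram labels can be sketched as follows: the client asks ZooKeeper where META lives, reads META to find the region server for a rowkey, caches the answer, and then reads the data from that region server. Everything in the sketch is a hypothetical stand-in (`FakeCluster`, `Client`, the server names RS1 to RS3, and the sample rows); it only illustrates why the cache saves round trips.

```python
class FakeCluster:
    """Stand-in for ZooKeeper, the META table, and region servers (hypothetical)."""

    def __init__(self):
        self.meta_server = "RS1"
        # META rows: (region start key, region end key, region server)
        self.meta_rows = [("", "m", "RS2"), ("m", None, "RS3")]
        self.rows = {"RS2": {"apple": "red"}, "RS3": {"pear": "green"}}
        self.zookeeper_calls = 0
        self.meta_calls = 0

    def ask_zookeeper_for_meta(self):
        self.zookeeper_calls += 1
        return self.meta_server

    def read_meta(self, server, rowkey):
        self.meta_calls += 1
        for start, end, region_server in self.meta_rows:
            if start <= rowkey and (end is None or rowkey < end):
                return (start, end, region_server)
        raise KeyError(rowkey)

    def read_row(self, server, rowkey):
        return self.rows[server].get(rowkey)


class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.meta_server = None   # cached META location
        self.region_cache = []    # cached (start, end, server) entries

    def locate(self, rowkey):
        # 1. Try the client-side cache first.
        for start, end, server in self.region_cache:
            if start <= rowkey and (end is None or rowkey < end):
                return server
        # 2. Ask ZooKeeper for the META location (only once).
        if self.meta_server is None:
            self.meta_server = self.cluster.ask_zookeeper_for_meta()
        # 3. Read META to find the region, then cache it.
        entry = self.cluster.read_meta(self.meta_server, rowkey)
        self.region_cache.append(entry)
        return entry[2]

    def get(self, rowkey):
        # 4. Read the row from the region server that hosts it.
        return self.cluster.read_row(self.locate(rowkey), rowkey)


cluster = FakeCluster()
client = Client(cluster)
print(client.get("apple"), cluster.zookeeper_calls, cluster.meta_calls)
print(client.get("avocado"), cluster.zookeeper_calls, cluster.meta_calls)
print(client.get("pear"), cluster.zookeeper_calls, cluster.meta_calls)
```

The second read hits a region that is already cached, so neither ZooKeeper nor META is contacted again.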
Where does your data reside?
HBase Region Server Components
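HBase Write
[Diagram: the Client, HMaster, and Region Server 102. Inside the region server, the written values 100, 1, and 50 appear in the WAL and the Memstore, with an HFile below the Memstore; an ACK goes back to the client.]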
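A minimal sketch of the write path shown in the diagram, assuming the usual order: append to the WAL, buffer in the Memstore, acknowledge the client, and later flush the Memstore to an HFile. The `RegionServer` class, the explicit `flush()` call, and the sample values are hypothetical; in a real cluster flushes are triggered by the region server itself.

```python
class RegionServer:
    """Toy write path: WAL -> Memstore -> HFile (hypothetical model)."""

    def __init__(self):
        self.wal = []       # append-only log, kept in arrival order
        self.memstore = {}  # in-memory buffer of recent writes
        self.hfiles = []    # each flush produces one immutable, sorted file

    def put(self, rowkey, value):
        self.wal.append((rowkey, value))  # 1. log the edit for recovery
        self.memstore[rowkey] = value     # 2. buffer it in memory
        return "ACK"                      # 3. acknowledge the client

    def flush(self):
        # Write the buffered cells out sorted by rowkey, then clear the buffer.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


server = RegionServer()
acks = [server.put(key, f"value-{key}") for key in ["100", "1", "50"]]
server.flush()

print(acks)
print([key for key, _ in server.wal])        # arrival order
print([key for key, _ in server.hfiles[0]])  # sorted order
```

Row keys are compared as strings here, which is why "100" sorts before "50", similar to the byte-wise ordering of rowkeys in HBase.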
How Data is Stored in HFile
Demo
HBase Delete
• When a delete command is issued, the actual data is not deleted immediately
• A tombstone marker is set
• HBase removes the deleted cells, along with their tombstone markers, during major compactions
Tombstone Marker
-> Version delete marker
Marks a single version of a column for deletion
-> Column delete marker
Marks all versions of a column for deletion
-> Family delete marker
Marks all versions of all columns for a column family for deletion
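The three marker types can be sketched as a filter applied at read time, with the physical removal deferred to compaction. This is a simplified model: the tuple layouts and the names `is_deleted`, `read`, and `major_compact` are hypothetical. It assumes, as HBase does, that column and family markers cover versions written at or before the marker's timestamp.

```python
# cell: (family, qualifier, timestamp, value)
# tombstone: (kind, family, qualifier, timestamp); qualifier is None for "family"


def is_deleted(cell, tombstones):
    family, qualifier, ts, _ = cell
    for kind, t_family, t_qualifier, t_ts in tombstones:
        if kind == "version":
            # Masks exactly one version of one column.
            if (family, qualifier, ts) == (t_family, t_qualifier, t_ts):
                return True
        elif kind == "column":
            # Masks all versions of one column up to the marker's timestamp.
            if (family, qualifier) == (t_family, t_qualifier) and ts <= t_ts:
                return True
        elif kind == "family":
            # Masks all versions of all columns in the family.
            if family == t_family and ts <= t_ts:
                return True
    return False


def read(cells, tombstones):
    """Reads skip masked cells, but the data is still stored."""
    return [cell for cell in cells if not is_deleted(cell, tombstones)]


def major_compact(cells, tombstones):
    """Compaction drops masked cells and the markers themselves."""
    return read(cells, tombstones), []


cells = [
    ("cf1", "cq1", 1, "a"),
    ("cf1", "cq1", 2, "b"),
    ("cf1", "cq2", 1, "c"),
    ("cf2", "cq3", 1, "d"),
]
tombstones = [
    ("version", "cf1", "cq1", 2),  # delete one version of cf1:cq1
    ("family", "cf2", None, 5),    # delete everything in cf2
]

print(read(cells, tombstones))     # what a client sees
print(len(cells))                  # what is still stored before compaction
cells, tombstones = major_compact(cells, tombstones)
print(cells, tombstones)
```

Deleting only version 2 of `cf1:cq1` makes the older version 1 visible again, which is the behavior of a version delete marker.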