Distributed Networks & Systems Lab Distributed Networks and Systems(DNS) Lab, Department of Electronics and Computer Engineering Chonnam National University HBase Tutorial Sungmin Hwang




HBase

Column-oriented data store
Distributed
  Designed for large tables
Scalable
NoSQL DB
  No SQL-based access
  Not tied to the relational model for storage
Based on Google’s Bigtable
Built on top of HDFS


HBase

Used by
  Facebook, Twitter, Yahoo, Netflix, Adobe
When to use?
  Compared to an RDBMS, HBase has a very simple and limited API
  Suitable for large amounts of data
    • Large data
    • Large numbers of clients/requests
  If the data set is too small, all the records end up on a single node
Bad for
  Relational analytics such as join and group by
  Text-based search access


Data model

Tables contain rows
  Rows are referenced by a unique key (string, long, …)
Rows are made of columns, which are grouped into column families
Data is stored in cells
  Identified by row, column family, and column
Columns are grouped into families
Family definitions are static

Movie
  family     columns
  info       title, director, date
  content    story
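
The data model above can be pictured as nested maps. The following is a plain-Python illustration, not HBase code: the `Table` class and its `put`/`get` methods are hypothetical names chosen for this sketch. It also keeps a timestamp with every value, which the later slides on versions assume.

```python
from collections import defaultdict


class Table:
    """Toy model of a table: row -> family -> column -> {timestamp: value}."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # family definitions are static
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, column, value, ts):
        """Store a value in the cell addressed by row and 'family:column'."""
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Columns inside a known family need no prior declaration.
        self.rows[row][family][qualifier][ts] = value

    def get(self, row, column):
        """Return the newest value stored in the cell, or None."""
        family, qualifier = column.split(":", 1)
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        return versions[max(versions)] if versions else None


movie = Table("Movie", ["info", "content"])
movie.put("movie-1", "info:title", "AboutTime", ts=1)
movie.put("movie-1", "info:director", "Richard Curtis", ts=1)
movie.put("movie-1", "content:story", "Time traveler story.", ts=1)
```

Writing to a family that was not declared when the table was created fails, while new columns inside an existing family can be added at any time.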


Region – a range of rows stored together
Master server – daemon that manages the region servers
HBase stores its data in HDFS
  HFile – key-value map
  WAL (write-ahead log) – when data is added, it is also written to the WAL
  When in-memory data exceeds a maximum size, it is flushed to an HFile
HBase uses ZooKeeper for distributed coordination
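
The write path described above (append to the WAL, buffer in memory, flush to an HFile) can be sketched in a few lines of Python. This is a simplified illustration only: `RegionStore` and `flush_threshold` are hypothetical names, the threshold counts entries, whereas HBase measures the in-memory store in bytes, and the "HFiles" here are plain sorted lists rather than files in HDFS.

```python
class RegionStore:
    """Toy write path: WAL append, in-memory buffer, flush to a sorted 'HFile'."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # hypothetical limit, in entries
        self.wal = []       # write-ahead log: every edit is appended here
        self.memstore = {}  # in-memory buffer of recent edits
        self.hfiles = []    # immutable key-sorted lists standing in for HFiles

    def put(self, key, value):
        self.wal.append((key, value))
        self.memstore[key] = value
        if len(self.memstore) > self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffered edits out as one sorted file and clear the buffer."""
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key):
        """Check memory first, then the flushed files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for stored_key, value in hfile:
                if stored_key == key:
                    return value
        return None


store = RegionStore(flush_threshold=2)
store.put("movie-2", "b")
store.put("movie-1", "a")
store.put("movie-3", "c")  # third entry exceeds the threshold and triggers a flush
store.put("movie-4", "d")  # stays in memory until the next flush
```

Because every edit is in the WAL before it is acknowledged, the contents of the in-memory buffer can be rebuilt after a crash by replaying the log.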


Accessing HBase

HBase shell
Native Java API
HBql
RESTful API


Continued from Hadoop assignment

Same environment as the previous Hadoop assignment
  Pseudo-distributed mode
  Hadoop 1.0.3
  Java 1.7
  Ubuntu 12.04 LTS


Installation guide - download

Download HBase
  Recent version – 0.98.7
  http://www.interior-dsgn.com/apache/hbase/hbase-0.98.7/
  In the folder, there are 2 builds
    • hbase-0.98.7-hadoop1-bin.tar.gz
    • hbase-0.98.7-hadoop2-bin.tar.gz
    • The number in the file name indicates the Hadoop version
    • Since we use Hadoop 1.0.3, we download the hadoop1 build

$ wget http://www.interior-dsgn.com/apache/hbase/hbase-0.98.7/hbase-0.98.7-hadoop1-bin.tar.gz


Installation guide - environment

Extract
  $ tar zxvf hbase-0.98.7-hadoop1-bin.tar.gz
  $ cd hbase-0.98.7-hadoop1
Configure the JAVA_HOME directory
  $ vim conf/hbase-env.sh
  In hbase-env.sh, uncomment the line that sets JAVA_HOME and point it to the Java installation path


Configure – hbase-site.xml

Edit hbase-site.xml
  $ vim conf/hbase-site.xml
Configuration for pseudo-distributed mode
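
For pseudo-distributed mode, hbase-site.xml typically needs two properties: `hbase.rootdir`, pointing at a directory in HDFS, and `hbase.cluster.distributed` set to `true`. The Python sketch below renders such a file. The HDFS URL `hdfs://localhost:9000/hbase` is an assumption and must match the NameNode address configured in the earlier Hadoop assignment; `render_hbase_site` is a hypothetical helper, not part of HBase.

```python
from xml.sax.saxutils import escape

# Assumed values for a single-machine, pseudo-distributed setup.
PSEUDO_DISTRIBUTED = {
    # Assumption: host and port must match fs.default.name in Hadoop's core-site.xml.
    "hbase.rootdir": "hdfs://localhost:9000/hbase",
    "hbase.cluster.distributed": "true",
}


def render_hbase_site(properties):
    """Render a name -> value mapping in the Hadoop-style configuration XML layout."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


hbase_site_xml = render_hbase_site(PSEUDO_DISTRIBUTED)
print(hbase_site_xml)
```

The printed text can be pasted into conf/hbase-site.xml after checking the HDFS address against the local Hadoop configuration.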


Starting HBase

To start HBase
  $ ./bin/start-hbase.sh
To stop HBase
  $ ./bin/stop-hbase.sh


Web-based management

Both the Master and the Region servers run a web server
  Master: http://localhost:60010


Web-based management

Both the Master and the Region servers run a web server
  Region server: http://localhost:60030


HBase shell

$ ./bin/hbase shell

hbase> help "command"
  Shows detailed usage of a command
  Example: hbase> help "get"


HBase shell

JRuby IRB (Interactive Ruby shell) + HBase commands
Quote all names
  Table and column names
  Single quotes for text
    • hbase> create 'test', 'cf'
  Double quotes for binary
    • Use the hexadecimal representation of that binary value
Specifying parameters
  {'key1' => 'value1', 'key2' => 'value2', …}
  Example:
    • hbase> get 'UserTable', 'userId1', {COLUMN => 'address:str'}


HBase shell - Commands

General
  status, version
DDL
  alter, create, describe, disable, drop, enable, exists, list
DML
  count, delete, deleteall, get, get_counter, incr, put, scan, truncate
Cluster administration
  balancer, close_region, move, split, …


Exercise – Creating Table

Create a table called 'Movie' with the following schema
  2 families
    • 'info' with 3 columns: 'title', 'director', and 'date'
    • 'content' with 1 column: 'story'

hbase> create 'Movie', {NAME=>'info'}, {NAME=>'content'}

hbase> put 'Movie', 'movie-1', 'info:title', 'AboutTime'
hbase> put 'Movie', 'movie-1', 'info:director', 'Richard Curtis'
hbase> put 'Movie', 'movie-1', 'info:date', '2013'
hbase> put 'Movie', 'movie-1', 'content:story', 'Time traveler story.'

Movie
  family     columns
  info       title, director, date
  content    story


Exercise – get

Select a single row
  hbase> get 'table', 'row_id'
Select specific columns
  hbase> get 'table', 'row_id', {COLUMN=>['c1', 'c2']}


Exercise - get

Select a specific timestamp or time range
  hbase> get 'table', 'row_id', {TIMERANGE=>[ts1, ts2]}
Modifying the maximum number of versions
Select more than one version
  hbase> get 'table', 'row_id', {VERSIONS=>3}
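
The effect of VERSIONS and TIMERANGE on a single cell can be illustrated with a short Python sketch. It assumes a cell is a map from timestamp to value and treats the time range as start-inclusive and end-exclusive; `get_versions` and the sample timestamps are hypothetical, and the sketch ignores the per-family limit on how many versions HBase retains.

```python
def get_versions(cell, versions=1, timerange=None):
    """Return up to `versions` (timestamp, value) pairs, newest first.

    `cell` maps timestamp -> value. `timerange` is (min_ts, max_ts), with the
    start inclusive and the end exclusive.
    """
    stamps = sorted(cell, reverse=True)
    if timerange is not None:
        min_ts, max_ts = timerange
        stamps = [ts for ts in stamps if min_ts <= ts < max_ts]
    return [(ts, cell[ts]) for ts in stamps[:versions]]


# Made-up history of one cell: three writes at timestamps 100, 200 and 300.
title = {100: "draft title", 200: "AboutTime", 300: "About Time"}

latest = get_versions(title)                        # newest version only
history = get_versions(title, versions=3)           # up to three versions
window = get_versions(title, timerange=(100, 300))  # newest version inside the range
```

Without VERSIONS, only the newest matching version is returned, which is why a time range alone still yields a single value per cell.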


Exercise - Scan

Scan an entire table
  hbase> scan 'table_name'
Limit the number of results
  hbase> scan 'table_name', {LIMIT=>1}
Scan a range
  hbase> scan 'Movie', {STARTROW=>'startRow', STOPROW=>'stopRow'}
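
A range scan walks the rows in sorted key order, starting at the start row (inclusive) and ending before the stop row (exclusive). The Python sketch below mimics that behaviour on an in-memory dictionary; the `scan` function and the sample rows are hypothetical, and Python's string ordering stands in for HBase's byte ordering, which is equivalent for ASCII keys.

```python
from bisect import bisect_left


def scan(rows, startrow=None, stoprow=None, limit=None):
    """Return (row_key, row) pairs in key order; stoprow is exclusive."""
    keys = sorted(rows)
    begin = bisect_left(keys, startrow) if startrow is not None else 0
    end = bisect_left(keys, stoprow) if stoprow is not None else len(keys)
    selected = keys[begin:end]
    if limit is not None:
        selected = selected[:limit]
    return [(key, rows[key]) for key in selected]


# Made-up rows: note that 'movie-10' sorts before 'movie-2'.
rows = {
    "movie-10": {"info:title": "J"},
    "movie-1": {"info:title": "A"},
    "movie-2": {"info:title": "B"},
    "movie-3": {"info:title": "C"},
}

in_range = [key for key, _ in scan(rows, startrow="movie-1", stoprow="movie-2")]
first_only = [key for key, _ in scan(rows, limit=1)]
```

Because keys are compared as strings, numeric row ids are often zero-padded (for example `movie-0010`) so that the scan order matches the numeric order.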


Exercise - Scan

Applying filters to a scan
  Some filters are included in HBase
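
Conceptually, a filter is a predicate that each row is tested against as the scan proceeds. HBase's built-in filters include, for example, a row-key prefix filter (`PrefixFilter`). The Python sketch below shows the idea only: `prefix_filter`, `column_value_filter`, `filtered_scan`, and the sample rows are hypothetical and are not HBase's filter API.

```python
def prefix_filter(prefix):
    """Accept rows whose key starts with `prefix`."""
    return lambda row_key, columns: row_key.startswith(prefix)


def column_value_filter(column, expected):
    """Accept rows whose `column` holds exactly `expected`."""
    return lambda row_key, columns: columns.get(column) == expected


def filtered_scan(rows, *filters):
    """Return the rows, in key order, that are accepted by every filter."""
    return [
        (key, rows[key])
        for key in sorted(rows)
        if all(accept(key, rows[key]) for accept in filters)
    ]


# Made-up sample rows for illustration.
rows = {
    "movie-1": {"info:director": "Richard Curtis", "info:date": "2013"},
    "movie-2": {"info:director": "Someone Else", "info:date": "2013"},
    "actor-1": {"info:name": "Example Actor"},
}

movies = filtered_scan(rows, prefix_filter("movie-"))
curtis = filtered_scan(
    rows,
    prefix_filter("movie-"),
    column_value_filter("info:director", "Richard Curtis"),
)
```

Passing several filters keeps only the rows that satisfy all of them.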


Delete

Delete a cell by providing the table, row id, and column coordinates
  hbase> delete 'table', 'row_id', 'column'
  Deletes all versions of that cell
Delete only versions before a certain timestamp
  hbase> delete 'table', 'row_id', 'column', timestamp


Drop table

A table must be disabled before it can be dropped
  hbase> disable 'table'
  hbase> drop 'table'


Thank you