The Hadoop Ecosystem & HBase

Kai Voigt, Cloudera Inc.
Warsaw Hadoop User Group, July 11th, 2012

Friday, July 13, 2012

A Hadoop Cluster

Part 1: Hadoop Ecosystem

Hadoop ecosystem overview (component: interface)

• HDFS: Java
• MapReduce: Java
• HBase: Java
• Hive: SQL
• Pig: Script
• Mahout: Java
• Sqoop: RDBMS
• Flume: Events
• hadoop fs: CmdLine
• FUSE: Posix
• Streaming: Script
• Oozie
• Whirr
• Hue
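The "Streaming: Script" entry refers to Hadoop Streaming, where the mapper and reducer are ordinary scripts that read lines and write tab-separated key/value records. The following is a minimal local sketch of that contract for a word count, assuming nothing beyond the Python standard library. The function names `mapper`, `reducer`, and `run_local` are hypothetical, and `run_local` only simulates the map, sort, and reduce phases in one process; it does not talk to Hadoop.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a Streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts per word. Input must already be sorted by key,
    which is how records arrive at a Streaming reducer."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Hypothetical helper: simulate map -> sort -> reduce without a cluster."""
    return list(reducer(sorted(mapper(lines))))
```

In a real Streaming job the two functions would live in separate scripts that read `sys.stdin` and print to `sys.stdout`; the in-process version here only shows the record format.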

CDH 4.0

• Cloudera's Distribution Including Hadoop

• http://www.cloudera.com/

• Packages and Virtual Machines

• True Apache

Part 2: Apache HBase

Data Model

Columns: RowID, Col1, Col2, Col3, Col4, Col5

• Row 62: aaa, bbb, ccc
• Row 89: ddd, eee, 111
• Row 121: fff, 222
• Row 219: ggg, hhh
• Row 328: iii, jjj, kkk, lll, 333
• Row 342: mmm, nnn

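Regions

Columns: RowID, Col1, Col2, Col3, Col4, Col5

Region with rows 62 to 121:
• Row 62: aaa, bbb, ccc
• Row 89: ddd, eee, 111
• Row 121: fff, 222

Region with rows 219 to 342:
• Row 219: ggg, hhh
• Row 328: iii, jjj, kkk, lll, 333
• Row 342: mmm, nnn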
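The split of rows into regions can be sketched as a lookup over sorted split keys. This is a simplified illustration, not HBase's implementation: the split key 219 is an assumption read off the slide (it is the first row of the second region), row keys are treated as integers although HBase stores them as byte arrays, and the names `SPLIT_KEYS`, `region_index`, and `split_into_regions` are hypothetical.

```python
from bisect import bisect_right

# Assumed split point: rows below 219 go to region 0, rows from 219 up to region 1.
SPLIT_KEYS = [219]


def region_index(row_id, split_keys=SPLIT_KEYS):
    """Return the index of the region whose key range contains row_id."""
    return bisect_right(split_keys, row_id)


def split_into_regions(row_ids, split_keys=SPLIT_KEYS):
    """Group row ids into contiguous, sorted key ranges (one list per region)."""
    regions = [[] for _ in range(len(split_keys) + 1)]
    for row_id in sorted(row_ids):
        regions[region_index(row_id, split_keys)].append(row_id)
    return regions
```

With the row IDs from the slide, `split_into_regions([62, 89, 121, 219, 328, 342])` yields the same two groups of three rows.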

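Column Families

Rows 62 to 121, columns Col1 and Col2:
• Row 62: aaa, bbb
• Row 89: ddd
• Row 121: (none)

Rows 62 to 121, columns Col3, Col4 and Col5:
• Row 62: ccc
• Row 89: eee, 111
• Row 121: fff, 222

Rows 219 to 342, columns Col1 and Col2:
• Row 219: ggg
• Row 328: iii, jjj
• Row 342: mmm

Rows 219 to 342, columns Col3, Col4 and Col5:
• Row 219: hhh
• Row 328: kkk, lll, 333
• Row 342: nnn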
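Grouping columns into families, as on this slide, can be sketched as partitioning each row's cells by a column-to-family mapping. This is only an illustration under stated assumptions: the family names `cf1` and `cf2` are hypothetical (the slide does not name the families), and `split_by_family` is a made-up helper, not an HBase call.

```python
# Hypothetical family names; the slide only shows which columns are grouped together.
FAMILIES = {
    "cf1": ["Col1", "Col2"],
    "cf2": ["Col3", "Col4", "Col5"],
}


def split_by_family(row, families=FAMILIES):
    """Partition a flat {column: value} row into {family: {column: value}}.
    Columns missing from the row are simply absent (sparse storage)."""
    return {
        family: {col: row[col] for col in cols if col in row}
        for family, cols in families.items()
    }


# Row 328 from the slide has a value in every column.
row_328 = {"Col1": "iii", "Col2": "jjj", "Col3": "kkk", "Col4": "lll", "Col5": "333"}
row_328_by_family = split_by_family(row_328)
```

Each family can then be stored and read separately, which is the point of the four tables on the slide.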

Multiple Versions

RowID: 627
ColumnName: Col7

• 21:09: Foo
• 22:34: Bar
• 23:12: 'DEL'

(RowID, Columnname, Timestamp) -> Value
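The mapping (RowID, Columnname, Timestamp) -> Value can be sketched as a nested dictionary in which a delete is just another timestamped entry. This is a simplified model and not HBase's storage or exact delete semantics: timestamps are the "HH:MM" strings from the slide (HBase uses numeric timestamps), and the names `VersionedCells` and `DELETE_MARKER` are hypothetical.

```python
DELETE_MARKER = object()  # stands in for the 'DEL' entry on the slide


class VersionedCells:
    """Toy store keyed by (row_id, column), holding one value per timestamp."""

    def __init__(self):
        self._cells = {}  # (row_id, column) -> {timestamp: value}

    def put(self, row_id, column, timestamp, value):
        self._cells.setdefault((row_id, column), {})[timestamp] = value

    def delete(self, row_id, column, timestamp):
        # A delete does not erase history; it records a marker as the newest version.
        self.put(row_id, column, timestamp, DELETE_MARKER)

    def get(self, row_id, column):
        """Return the newest value, or None if the cell is missing or deleted."""
        versions = self._cells.get((row_id, column))
        if not versions:
            return None
        latest = versions[max(versions)]
        return None if latest is DELETE_MARKER else latest

    def history(self, row_id, column):
        """Return all versions, oldest first, as (timestamp, value) pairs."""
        return sorted(self._cells.get((row_id, column), {}).items(),
                      key=lambda item: item[0])
```

Replaying the slide: put "Foo" at 21:09, put "Bar" at 22:34, then delete at 23:12. After the last step `get(627, "Col7")` returns `None`, while `history` still lists all three entries.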

Simple API

• PUT 'table', 'rowid', 'column', 'value'

• GET 'table', 'rowid', 'column'

• GET 'table', 'rowid'

• DELETE 'table', 'rowid', 'column'

• DELETE 'table', 'rowid'

• SCAN 'table'
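The six operations above can be mimicked with a small in-memory table to show what each one returns. This is a sketch under stated assumptions, not the HBase client API: `ToyTable` is a hypothetical class, the table object takes the place of the `'table'` argument, versions are ignored, and row IDs are plain strings scanned in sorted order.

```python
class ToyTable:
    """In-memory stand-in for one table: row_id -> {column: value}."""

    def __init__(self):
        self._rows = {}

    def put(self, row_id, column, value):
        self._rows.setdefault(row_id, {})[column] = value

    def get(self, row_id, column=None):
        """With a column, return that value (or None); without, return the whole row."""
        row = self._rows.get(row_id, {})
        return dict(row) if column is None else row.get(column)

    def delete(self, row_id, column=None):
        """With a column, delete that cell; without, delete the whole row."""
        if column is None:
            self._rows.pop(row_id, None)
        elif row_id in self._rows:
            self._rows[row_id].pop(column, None)
            if not self._rows[row_id]:
                del self._rows[row_id]  # drop rows that have no cells left

    def scan(self):
        """Yield (row_id, row) pairs in sorted row_id order."""
        for row_id in sorted(self._rows):
            yield row_id, dict(self._rows[row_id])


table = ToyTable()
table.put("row2", "Col1", "x")
table.put("row1", "Col1", "a")
table.put("row1", "Col2", "b")
```

After these three puts, `list(table.scan())` returns `row1` before `row2`, and `table.delete("row1", "Col2")` leaves only `Col1` in `row1`.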

Additional Features

• MapReduce Input/Output Format

• Hive Interface

• Thrift API

• RESTful API

• Sqoop Connector

• Flume Sink

Thank You!
