Hooking up Flume with HBase LA-HUG Aug’11 -Dani Abel Rayan



Who am I?

• Big Data Ninja at Riot Games
• Flume Contributor
• Cloudera Intern Alum
• Graduated with a Master's in CS from Georgia Tech

What am I presenting here?

• Flume event model
• HBase data model
• Compelling reasons to hook 'em up
• Configuration examples
• What are the new upcoming Sinks?
• How to write a new Flume Sink

What is needed before we start?

• Understanding of Flume's architecture
• Usage of Flume's abstractions such as Plugins, Events, Sources, Sinks, Escape Sequences, and Decorators
• Understanding of HBase and Hadoop
• Regex
• That's it! http://archive.cloudera.com/cdh3/flume/UserGuide/index.html

A Quick Glance …

Flume Event Model

• A Flume event has these six main fields: Unix timestamp, nanosecond timestamp, priority, source host, body, and a metadata table with an arbitrary number of attribute-value pairs.
• The body is the raw log entry. By default the body is truncated to a maximum of 32KB per event; this limit is configurable.
• One can custom-bucket attributes with the help of escape sequences.
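To make the six fields concrete, here is a minimal Python sketch of the event model described above. It is only an illustration: `FlumeEvent`, its field names, and the default values are hypothetical stand-ins and not Flume's actual Java classes.

```python
from dataclasses import dataclass, field
import time

# Default truncation limit from the slide (32KB); configurable in Flume.
MAX_BODY_BYTES = 32 * 1024


@dataclass
class FlumeEvent:
    """Illustrative stand-in for a Flume event (hypothetical, not the real class)."""
    body: bytes                                   # raw log entry
    timestamp: int = field(default_factory=lambda: int(time.time()))  # Unix timestamp
    nanos: int = field(default_factory=time.monotonic_ns)             # nanosecond timestamp
    priority: str = "INFO"                        # assumed default priority
    host: str = "localhost"                       # source host
    attrs: dict = field(default_factory=dict)     # metadata table of attribute-value pairs

    def __post_init__(self):
        # Truncate oversized bodies, mirroring the 32KB default.
        self.body = self.body[:MAX_BODY_BYTES]


event = FlumeEvent(body=b"nr_free_pages 594693", attrs={"colname": "nr_free_pages"})
```

The metadata table is the part the HBase sinks rely on: decorators add attributes to it, and the sinks read them back.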

HBase Data Model

What is a Flume Sink?

Reasons For HBase Sink

• Near real-time aggregation of streaming data
• Low-latency access to the aggregated data
• Offline Big Data analytics

Types of Flume HBase Sink

1. hbase(): Highly expressive.
   hbase(table, rowkey, cf1, c1, val1[, cf2, c2, val2, ...], writeBufferSize=int, writeToWal=true|false)

2. attr2hbase(): Flexible and powerful semantics, but could be confusing (at first glance).
   attr2hbase(table[, sysFamily[, writeBody[, attrPrefix[, writeBufferSize[, writeToWal]]]]])

How to Use a Plugin

• Compile, then add the jar with the new plugin classes to Flume's classpath.
• In flume-site.xml, add the class names of the new sources, sinks, and/or decorators to the flume.plugin.classes property.
• Restart the Flume nodes (including the Master).
• To verify that your plugin is loaded, check that it is displayed on this page: http://flume-master:35871/masterext.jsp

hbase()

Source: tail("/proc/vmstat")

nr_free_pages 594693
nr_inactive_anon 1392
nr_active_anon 45259
nr_inactive_file 107132
nr_active_file 141458

Sink: regexAll("(\w+)\s+(\w+)", "colname", "value")

Flume Events:

timestamp: 24353455, colname: nr_free_pages, value: 594693
timestamp: 24353456, colname: nr_inactive_anon, value: 1392
timestamp: 24353457, colname: nr_active_anon, value: 45259
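The extraction step above can be sketched in Python. This is a simulation of what regexAll does to each tailed line, not Flume code: `regex_all` is a hypothetical helper, and the sample text is the /proc/vmstat output from the slide.

```python
import re

VMSTAT_SAMPLE = """nr_free_pages 594693
nr_inactive_anon 1392
nr_active_anon 45259"""


def regex_all(body, pattern, *names):
    """Assign each capture group of `pattern` to the attribute name in the same position."""
    match = re.search(pattern, body)
    if match is None:
        return {}  # no match: no attributes are added
    return dict(zip(names, match.groups()))


# One event per tailed line; each gets "colname" and "value" attributes.
events = [
    regex_all(line, r"(\w+)\s+(\w+)", "colname", "value")
    for line in VMSTAT_SAMPLE.splitlines()
]
```

Each resulting dictionary plays the role of the event's attribute table, which the sink later reads when building the HBase row.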

hbase()

• hbase("tablename", "%s", "stats", "%{colname}", "%{value}")
• Use %{nanos} instead of %s if you want a nanosecond timestamp.

Rowkey    Timestamp  Column Family: stats
24353455  T1         nr_free_pages = 594693
24353456  T2         nr_inactive_anon = 1392
24353457  T3         nr_active_anon = 45259
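As a rough illustration of how the hbase() arguments turn an event into a row, the sketch below expands the two escape forms used on this slide, `%s` (event time in seconds) and `%{name}` (attribute lookup). The helpers `expand` and `to_put` are hypothetical, and only this subset of escape sequences is handled.

```python
import re

# Matches either %{attribute} or %s; other Flume escapes are not handled here.
_ESCAPE = re.compile(r"%\{(\w+)\}|%s")


def expand(template, timestamp_s, attrs):
    """Expand %s and %{name} in a single pass over the template."""
    def repl(match):
        name = match.group(1)
        if name is None:            # the %s alternative matched
            return str(timestamp_s)
        return attrs.get(name, "")  # assumption: missing attributes become ""
    return _ESCAPE.sub(repl, template)


def to_put(timestamp_s, attrs, rowkey, family, qualifier, value):
    """Build a dictionary describing the cell the sink would write."""
    return {
        "row": expand(rowkey, timestamp_s, attrs),
        "family": family,
        "qualifier": expand(qualifier, timestamp_s, attrs),
        "value": expand(value, timestamp_s, attrs),
    }


put = to_put(
    24353455,
    {"colname": "nr_free_pages", "value": "594693"},
    "%s", "stats", "%{colname}", "%{value}",
)
```

With the first event from the slide, this yields row key 24353455 and the cell stats:nr_free_pages = 594693, matching the first table row.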

hbase()

• Thus the FDL syntax would be:
• node : tail("/proc/vmstat") | regexAll("(\w+)\s+(\w+)", "colname", "value") collector(300000) hbase("table", "%s", "stats", "%{colname}", "%{value}")

Demo

attr2hbase()

• You don't have to list all possible event attributes you want to store in HBase along with their destination column families and qualifiers.
• Sources and/or decorators can produce any (reasonable) number of attributes with dynamic names (e.g., depending on the values), and they will be written into HBase.

attr2hbase

• attr2hbase(table[, sysFamily[, writeBody[, attrPrefix[, writeBufferSize[, writeToWal]]]]])
• sysFamily holds the name of the column family that is used to store "system" data (event timestamp, host, priority).
• If this parameter is absent or equals "", the sink doesn't write "system" data.

attr2hbase

• writeBody indicates whether the event body should be written along with the other "system" data. By default (when this parameter is absent or equals ""), the body is not written.
• This parameter should have the "column-family:qualifier" format in order for the sink to write the body to the specific column family and qualifier.

attr2hbase

• attrPrefix defines which attributes will be written to HBase: every attribute whose name is prefixed with attrPrefix's value is written. The attribute key should be in the following format to be properly written into HBase: "<attrPrefix><colfam>:<qual>"
• The default value of attrPrefix is "2hb_". This means that all attributes with names "2hb_<colfam>:<qual>" should be written to HBase.
• The attribute with key "<attrPrefix>" must contain the row key for the Put; otherwise, if no row key can be extracted, the event is skipped and no record is written to the HBase table.
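The attrPrefix rules can be sketched as a small Python function. This is an illustration of the described semantics rather than the sink's real code: `attrs_to_put` is a hypothetical name, a colon is assumed to separate column family and qualifier, and silently ignoring malformed keys is also an assumption.

```python
def attrs_to_put(attrs, attr_prefix="2hb_"):
    """Return (row_key, {(family, qualifier): value}), or None if the event is skipped."""
    row_key = attrs.get(attr_prefix)
    if not row_key:
        return None  # no row key can be extracted: the event is skipped

    cells = {}
    for key, value in attrs.items():
        if key == attr_prefix or not key.startswith(attr_prefix):
            continue  # the row-key attribute itself, or an unrelated attribute
        family, sep, qualifier = key[len(attr_prefix):].partition(":")
        if not sep:
            continue  # assumption: keys without "colfam:qual" are ignored
        cells[(family, qualifier)] = value
    return row_key, cells


result = attrs_to_put({
    "2hb_": "pgpgin1313244007",   # row key
    "2hb_stat:value": "985543",   # written to column family "stat", qualifier "value"
    "colname": "pgpgin",          # no prefix, so not written
})
```

The point of the design is that upstream decorators decide what gets stored simply by naming attributes, so the sink configuration never has to enumerate columns.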

attr2hbase example

• node : tail("/proc/vmstat") | regexAll("(\w+)\s+(\w+)", "colname", "value") value("2hb_", "%{colname}%s", escape=true) value("2hb_stat:value", "%{value}", escape=true) attr2hbase("table-attr2hbase", "system", "body:contents")

Rowkey            Timestamp  Column Family: stat
pgpgin1313244007  t1         value=985543
pgpgin1313244008  t2         value=985543
pgpgin1313244009  t3         value=985543

Demo Time

What are the New Plugins?

• https://cwiki.apache.org/FLUME/flume-plugins.html
• I pushed the OpenTSDB Sink just a few weeks back.

How to Contribute a new Plugin

• Extend EventSink.Base.
• Override open(): set up your connections to the store.
• Override append(): every new event gets processed here; do the "Puts" into the store.
• Override close(): Yay! Clean up the connections, flush, etc. to the store.
• Implement a SinkBuilder builder().
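The lifecycle above can be sketched in Python to show the order of calls. Real Flume sinks are written in Java against Flume's own classes; `InMemoryStoreSink`, the list used as a store, and the `builder` function here are hypothetical stand-ins that only mirror the open/append/close shape.

```python
class InMemoryStoreSink:
    """Hypothetical sink mirroring the open/append/close lifecycle (not Flume's Java API)."""

    def __init__(self, store):
        self.store = store    # stand-in for a real datastore client
        self.buffer = []      # events buffered until flush
        self.is_open = False

    def open(self):
        # Set up connections to the store.
        self.is_open = True

    def append(self, event):
        # Every new event is processed here; a real sink would build a Put.
        if not self.is_open:
            raise RuntimeError("sink is not open")
        self.buffer.append(event)

    def close(self):
        # Flush buffered writes, then release the connection.
        self.store.extend(self.buffer)
        self.buffer.clear()
        self.is_open = False


def builder(store):
    """Plays the role of the SinkBuilder: returns a configured sink instance."""
    return InMemoryStoreSink(store)


store = []
sink = builder(store)
sink.open()
sink.append({"colname": "nr_free_pages", "value": "594693"})
sink.close()
```

Buffering in append() and flushing in close() is one possible choice; a sink could also write through on every append at the cost of more round trips.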

My Contacts

• drayan@riotgames.com
• dr@verticalengine.com
• Twitter: @rayanandi

P.S. We are hiring!

GOOD LUCK, HAVE FUN

Play Free: http://www.leagueoflegends.com

Page 2: Flume HBase

Who am I

bull Big Data Ninja at Riot Gamesbull Flume Contributorbull Cloudera Intern Alumbull Graduated with Masters CS from Georgia Tech

What am I presenting here

bull Flume event modelbull HBase data modelbull Compelling reasons to hook lsquoem up bull Configuration examplesbull What are the new upcoming Sinks bull How to write new Flume-Sink

What is needed before we start

bull Understanding of Flumersquos architecturebull Usage of Flumersquos abstractions such as Plugins Events Sources Sinks Escape Sequences and Decoratorsbull Understanding of HBase and Hadoopbull Regexbull Thatrsquos it httparchiveclouderacomcdh3flumeUserGuideindexhtml

A Quick Glance hellip

Flume Event Model

bull A Flume event has these six main fields Unix timestamp Nanosecond timestamp Priority Source host Body and a Metadata table with an arbitrary number of attribute value pairs

bull The body is the raw log entry body The default is to truncate the body to a maximum of 32KB per event This is a configurable

bull One can custom bucket attributes with help of escape sequences

HBase Data Model

What is a Flume Sink

Reasons For HBase Sink

bull Near Real-Time aggregation of Streaming Databull Low Latency access to the aggregated databull Offline Big Data Analytics

Types of Flume HBase Sink

1 hbase() Highly expressive hbase(table rowkey cf1 c1 val1[cf2 c2 val2 ] writeBufferSize=int writeToWal=true|false)

2 attr2hbase() Flexible and powerful semantics but could be confusing (at first glance)attr2hbase(table[sysFamily[writeBody[attrPrefix[writeBufferSize [writeToWal]]]]])

How to Use a Plugin

bull Compile Add the jar with the new plugin classes to flumersquos classpath

bull In flume-sitexml add the class names of the new sources sinks andor decorators to the flumepluginclasses property

bull Restart the Flume nodes (Including Master)bull Verify that your plugin is loaded is to check if it

is displayed on this page httpflume-master35871masterextjsp

hbase()Source tail(ldquoprocvmstatrdquo)

nr_free_pages 594693nr_inactive_anon 1392nr_active_anon 45259nr_inactive_file 107132nr_active_file 141458

Sink regexAll(ldquow+)s+(w+)rdquordquocolnamerdquordquovalue) Flume Events

timestamp 24353455

colname nr_free_pages

value 594693

timestamp 24353456

colname nr_inactive_anon

value 1392

timestamp 24353457

colname nr_active_anon

value 45259

hbase()bull hbase(tablename rdquos rdquostats rdquocolname rdquovalue)use nanos instead of s if you want nano-second timestamp

Rowkey Timestamp Column Family stats

24353455 T1 nr_free_pages = 594693

24353456 T2 nr_inactive_anon = 1392

24353457 T3 nr_active_anon = 45259

hbase()

bull Thus the FDL syntax would be

bull node tail(rdquoprocvmstat) | regexAll((w+)s+(w+) rdquocolname rdquovalue) collector(300000) hbase(table rdquos rdquostats rdquocolname value)

Demo

attr2hbase()

bull Donrsquot have to list all possible event attributes you want to store in HBase along with their destination column families and qualifiers

bull Source andor decorators can produce any

(reasonable) number of attributes with dynamic names (eg depending on the values) and they will be written into HBase

attr2hbase

bull attr2hbase(table[sysFamily[writeBody[attrPrefix[writeBufferSize [writeToWal]]]]])

bull sysFamily holds the name of the column family that is used to store ldquosystemrdquo data (event timestamp host priority)

bull In case this parameter is absent or equals ldquordquo the sink doesnrsquot write ldquosystemrdquo data

attr2hbase

bull writeBody indicates whether event body should be written with other ldquosystemrdquo data By default (when this parameter is absent or equals rdquordquo) the attribute body is not written

bull This parameter should have the ldquocolumn-familyqualifierrdquo format in order for the sink to write the body to the specific column-familyqualifier

attr2hbase

bull attrPrefix defines which attributes will be written to HBase every attribute with the name prefixed with attrPrefix parameterrsquos value is written The attribute key should be in the following format to be properly written into HBase

ldquoltattrPrefixgtltcolfamgtltqualgtrdquobull The default value of attrPrefix is ldquo2hb_rdquo This means that

all attributes with names ldquo2hb_ltcolfamgtltqualgtrdquo should be written to HBase

bull Attribute with key ldquoltattrPrefixgtrdquo must contain row key for Put otherwise if no row can be extracted the event is skipped and no record is written to the HBase table

attr2hbase examplebull node tail(procvmstatrdquo) | regexAll((w+)s+(w+)

colnamevalue) value(2hb_colnames escape=true) value(2hb_statvalue value escape=true) attr2hbase(table-attr2hbasesystembodycontents)]

Rowkey Timestamp Column Family stat

pgpgin1313244007 t1 value=985543

pgpgin1313244008 t2 value=985543

pgpgin1313244009 t3 value=985543

Demo Time

What are the New Plugins

bull httpscwikiapacheorgFLUMEflume-pluginshtml

bull I pushed OpenTSDB Sink just few weeks back

How to Contribute a new Plugin

bull Extend EventSinkBasebull Override Open() Have your connections

setup to the Storebull Override Append() Every new Event gets

processed here Doing the ldquoPutsrdquo into Storebull Override Close () Yay Cleanup the

connections and flushing etc to the Storebull Implement a SinkBuilder builder()

My Contacts

bull drayanriotgamescombull drverticalenginecombull Twitter rayanandi

PS We are Hiring

GOOD LUCKHAVE FUN

Play Freehttpwwwleagueoflegendscom

  • Hooking up Flume with HBase LA-HUG Augrsquo11
  • Who am I
  • What am I presenting here
  • What is needed before we start
  • A Quick Glance hellip
  • Flume Event Model
  • HBase Data Model
  • What is a Flume Sink
  • Reasons For HBase Sink
  • Types of Flume HBase Sink
  • How to Use a Plugin
  • hbase()
  • hbase() (2)
  • hbase() (3)
  • Demo
  • attr2hbase()
  • attr2hbase
  • attr2hbase (2)
  • attr2hbase (3)
  • attr2hbase example
  • Demo Time
  • What are the New Plugins
  • How to Contribute a new Plugin
  • My Contacts
  • GOOD LUCK HAVE FUN
Page 3: Flume HBase

What am I presenting here

bull Flume event modelbull HBase data modelbull Compelling reasons to hook lsquoem up bull Configuration examplesbull What are the new upcoming Sinks bull How to write new Flume-Sink

What is needed before we start

bull Understanding of Flumersquos architecturebull Usage of Flumersquos abstractions such as Plugins Events Sources Sinks Escape Sequences and Decoratorsbull Understanding of HBase and Hadoopbull Regexbull Thatrsquos it httparchiveclouderacomcdh3flumeUserGuideindexhtml

A Quick Glance hellip

Flume Event Model

bull A Flume event has these six main fields Unix timestamp Nanosecond timestamp Priority Source host Body and a Metadata table with an arbitrary number of attribute value pairs

bull The body is the raw log entry body The default is to truncate the body to a maximum of 32KB per event This is a configurable

bull One can custom bucket attributes with help of escape sequences

HBase Data Model

What is a Flume Sink

Reasons For HBase Sink

bull Near Real-Time aggregation of Streaming Databull Low Latency access to the aggregated databull Offline Big Data Analytics

Types of Flume HBase Sink

1 hbase() Highly expressive hbase(table rowkey cf1 c1 val1[cf2 c2 val2 ] writeBufferSize=int writeToWal=true|false)

2 attr2hbase() Flexible and powerful semantics but could be confusing (at first glance)attr2hbase(table[sysFamily[writeBody[attrPrefix[writeBufferSize [writeToWal]]]]])

How to Use a Plugin

bull Compile Add the jar with the new plugin classes to flumersquos classpath

bull In flume-sitexml add the class names of the new sources sinks andor decorators to the flumepluginclasses property

bull Restart the Flume nodes (Including Master)bull Verify that your plugin is loaded is to check if it

is displayed on this page httpflume-master35871masterextjsp

hbase()Source tail(ldquoprocvmstatrdquo)

nr_free_pages 594693nr_inactive_anon 1392nr_active_anon 45259nr_inactive_file 107132nr_active_file 141458

Sink regexAll(ldquow+)s+(w+)rdquordquocolnamerdquordquovalue) Flume Events

timestamp 24353455

colname nr_free_pages

value 594693

timestamp 24353456

colname nr_inactive_anon

value 1392

timestamp 24353457

colname nr_active_anon

value 45259

hbase()bull hbase(tablename rdquos rdquostats rdquocolname rdquovalue)use nanos instead of s if you want nano-second timestamp

Rowkey Timestamp Column Family stats

24353455 T1 nr_free_pages = 594693

24353456 T2 nr_inactive_anon = 1392

24353457 T3 nr_active_anon = 45259

hbase()

bull Thus the FDL syntax would be

bull node tail(rdquoprocvmstat) | regexAll((w+)s+(w+) rdquocolname rdquovalue) collector(300000) hbase(table rdquos rdquostats rdquocolname value)

Demo

attr2hbase()

bull Donrsquot have to list all possible event attributes you want to store in HBase along with their destination column families and qualifiers

bull Source andor decorators can produce any

(reasonable) number of attributes with dynamic names (eg depending on the values) and they will be written into HBase

attr2hbase

bull attr2hbase(table[sysFamily[writeBody[attrPrefix[writeBufferSize [writeToWal]]]]])

bull sysFamily holds the name of the column family that is used to store ldquosystemrdquo data (event timestamp host priority)

bull In case this parameter is absent or equals ldquordquo the sink doesnrsquot write ldquosystemrdquo data

attr2hbase

bull writeBody indicates whether event body should be written with other ldquosystemrdquo data By default (when this parameter is absent or equals rdquordquo) the attribute body is not written

bull This parameter should have the ldquocolumn-familyqualifierrdquo format in order for the sink to write the body to the specific column-familyqualifier

attr2hbase

bull attrPrefix defines which attributes will be written to HBase every attribute with the name prefixed with attrPrefix parameterrsquos value is written The attribute key should be in the following format to be properly written into HBase

ldquoltattrPrefixgtltcolfamgtltqualgtrdquobull The default value of attrPrefix is ldquo2hb_rdquo This means that

all attributes with names ldquo2hb_ltcolfamgtltqualgtrdquo should be written to HBase

bull Attribute with key ldquoltattrPrefixgtrdquo must contain row key for Put otherwise if no row can be extracted the event is skipped and no record is written to the HBase table

attr2hbase examplebull node tail(procvmstatrdquo) | regexAll((w+)s+(w+)

colnamevalue) value(2hb_colnames escape=true) value(2hb_statvalue value escape=true) attr2hbase(table-attr2hbasesystembodycontents)]

Rowkey Timestamp Column Family stat

pgpgin1313244007 t1 value=985543

pgpgin1313244008 t2 value=985543

pgpgin1313244009 t3 value=985543

Demo Time

What are the New Plugins

bull httpscwikiapacheorgFLUMEflume-pluginshtml

bull I pushed OpenTSDB Sink just few weeks back

How to Contribute a new Plugin

bull Extend EventSinkBasebull Override Open() Have your connections

setup to the Storebull Override Append() Every new Event gets

processed here Doing the ldquoPutsrdquo into Storebull Override Close () Yay Cleanup the

connections and flushing etc to the Storebull Implement a SinkBuilder builder()

My Contacts

bull drayanriotgamescombull drverticalenginecombull Twitter rayanandi

PS We are Hiring

GOOD LUCKHAVE FUN

Play Freehttpwwwleagueoflegendscom

  • Hooking up Flume with HBase LA-HUG Augrsquo11
  • Who am I
  • What am I presenting here
  • What is needed before we start
  • A Quick Glance hellip
  • Flume Event Model
  • HBase Data Model
  • What is a Flume Sink
  • Reasons For HBase Sink
  • Types of Flume HBase Sink
  • How to Use a Plugin
  • hbase()
  • hbase() (2)
  • hbase() (3)
  • Demo
  • attr2hbase()
  • attr2hbase
  • attr2hbase (2)
  • attr2hbase (3)
  • attr2hbase example
  • Demo Time
  • What are the New Plugins
  • How to Contribute a new Plugin
  • My Contacts
  • GOOD LUCK HAVE FUN
Page 4: Flume HBase

What is needed before we start

bull Understanding of Flumersquos architecturebull Usage of Flumersquos abstractions such as Plugins Events Sources Sinks Escape Sequences and Decoratorsbull Understanding of HBase and Hadoopbull Regexbull Thatrsquos it httparchiveclouderacomcdh3flumeUserGuideindexhtml

A Quick Glance hellip

Flume Event Model

bull A Flume event has these six main fields Unix timestamp Nanosecond timestamp Priority Source host Body and a Metadata table with an arbitrary number of attribute value pairs

bull The body is the raw log entry body The default is to truncate the body to a maximum of 32KB per event This is a configurable

bull One can custom bucket attributes with help of escape sequences

HBase Data Model

What is a Flume Sink

Reasons For HBase Sink

bull Near Real-Time aggregation of Streaming Databull Low Latency access to the aggregated databull Offline Big Data Analytics

Types of Flume HBase Sink

1 hbase() Highly expressive hbase(table rowkey cf1 c1 val1[cf2 c2 val2 ] writeBufferSize=int writeToWal=true|false)

2 attr2hbase() Flexible and powerful semantics but could be confusing (at first glance)attr2hbase(table[sysFamily[writeBody[attrPrefix[writeBufferSize [writeToWal]]]]])

How to Use a Plugin

bull Compile Add the jar with the new plugin classes to flumersquos classpath

bull In flume-sitexml add the class names of the new sources sinks andor decorators to the flumepluginclasses property

bull Restart the Flume nodes (Including Master)bull Verify that your plugin is loaded is to check if it

is displayed on this page httpflume-master35871masterextjsp

hbase()Source tail(ldquoprocvmstatrdquo)

nr_free_pages 594693nr_inactive_anon 1392nr_active_anon 45259nr_inactive_file 107132nr_active_file 141458

Sink regexAll(ldquow+)s+(w+)rdquordquocolnamerdquordquovalue) Flume Events

timestamp 24353455

colname nr_free_pages

value 594693

timestamp 24353456

colname nr_inactive_anon

value 1392

timestamp 24353457

colname nr_active_anon

value 45259

hbase()bull hbase(tablename rdquos rdquostats rdquocolname rdquovalue)use nanos instead of s if you want nano-second timestamp

Rowkey Timestamp Column Family stats

24353455 T1 nr_free_pages = 594693

24353456 T2 nr_inactive_anon = 1392

24353457 T3 nr_active_anon = 45259

hbase()

bull Thus the FDL syntax would be

bull node tail(rdquoprocvmstat) | regexAll((w+)s+(w+) rdquocolname rdquovalue) collector(300000) hbase(table rdquos rdquostats rdquocolname value)

Demo

attr2hbase()

bull Donrsquot have to list all possible event attributes you want to store in HBase along with their destination column families and qualifiers

bull Source andor decorators can produce any

(reasonable) number of attributes with dynamic names (eg depending on the values) and they will be written into HBase

attr2hbase

bull attr2hbase(table[sysFamily[writeBody[attrPrefix[writeBufferSize [writeToWal]]]]])

bull sysFamily holds the name of the column family that is used to store ldquosystemrdquo data (event timestamp host priority)

bull In case this parameter is absent or equals ldquordquo the sink doesnrsquot write ldquosystemrdquo data

attr2hbase

bull writeBody indicates whether event body should be written with other ldquosystemrdquo data By default (when this parameter is absent or equals rdquordquo) the attribute body is not written

bull This parameter should have the ldquocolumn-familyqualifierrdquo format in order for the sink to write the body to the specific column-familyqualifier

attr2hbase

bull attrPrefix defines which attributes will be written to HBase every attribute with the name prefixed with attrPrefix parameterrsquos value is written The attribute key should be in the following format to be properly written into HBase

ldquoltattrPrefixgtltcolfamgtltqualgtrdquobull The default value of attrPrefix is ldquo2hb_rdquo This means that

all attributes with names ldquo2hb_ltcolfamgtltqualgtrdquo should be written to HBase

bull Attribute with key ldquoltattrPrefixgtrdquo must contain row key for Put otherwise if no row can be extracted the event is skipped and no record is written to the HBase table

attr2hbase examplebull node tail(procvmstatrdquo) | regexAll((w+)s+(w+)

colnamevalue) value(2hb_colnames escape=true) value(2hb_statvalue value escape=true) attr2hbase(table-attr2hbasesystembodycontents)]

Rowkey Timestamp Column Family stat

pgpgin1313244007 t1 value=985543

pgpgin1313244008 t2 value=985543

pgpgin1313244009 t3 value=985543

Demo Time

What are the New Plugins

bull httpscwikiapacheorgFLUMEflume-pluginshtml

bull I pushed OpenTSDB Sink just few weeks back

How to Contribute a new Plugin

bull Extend EventSinkBasebull Override Open() Have your connections

setup to the Storebull Override Append() Every new Event gets

processed here Doing the ldquoPutsrdquo into Storebull Override Close () Yay Cleanup the

connections and flushing etc to the Storebull Implement a SinkBuilder builder()

My Contacts

bull drayanriotgamescombull drverticalenginecombull Twitter rayanandi

PS We are Hiring

GOOD LUCKHAVE FUN

Play Freehttpwwwleagueoflegendscom

  • Hooking up Flume with HBase LA-HUG Augrsquo11
  • Who am I
  • What am I presenting here
  • What is needed before we start
  • A Quick Glance hellip
  • Flume Event Model
  • HBase Data Model
  • What is a Flume Sink
  • Reasons For HBase Sink
  • Types of Flume HBase Sink
  • How to Use a Plugin
  • hbase()
  • hbase() (2)
  • hbase() (3)
  • Demo
  • attr2hbase()
  • attr2hbase
  • attr2hbase (2)
  • attr2hbase (3)
  • attr2hbase example
  • Demo Time
  • What are the New Plugins
  • How to Contribute a new Plugin
  • My Contacts
  • GOOD LUCK HAVE FUN
Page 5: Flume HBase

A Quick Glance hellip

Flume Event Model

bull A Flume event has these six main fields Unix timestamp Nanosecond timestamp Priority Source host Body and a Metadata table with an arbitrary number of attribute value pairs

bull The body is the raw log entry body The default is to truncate the body to a maximum of 32KB per event This is a configurable

bull One can custom bucket attributes with help of escape sequences

HBase Data Model

What is a Flume Sink

Reasons For HBase Sink

bull Near Real-Time aggregation of Streaming Databull Low Latency access to the aggregated databull Offline Big Data Analytics

Types of Flume HBase Sink

1 hbase() Highly expressive hbase(table rowkey cf1 c1 val1[cf2 c2 val2 ] writeBufferSize=int writeToWal=true|false)

2 attr2hbase() Flexible and powerful semantics but could be confusing (at first glance)attr2hbase(table[sysFamily[writeBody[attrPrefix[writeBufferSize [writeToWal]]]]])

How to Use a Plugin

bull Compile Add the jar with the new plugin classes to flumersquos classpath

bull In flume-sitexml add the class names of the new sources sinks andor decorators to the flumepluginclasses property

bull Restart the Flume nodes (Including Master)bull Verify that your plugin is loaded is to check if it

is displayed on this page httpflume-master35871masterextjsp

hbase()Source tail(ldquoprocvmstatrdquo)

nr_free_pages 594693nr_inactive_anon 1392nr_active_anon 45259nr_inactive_file 107132nr_active_file 141458

Sink regexAll(ldquow+)s+(w+)rdquordquocolnamerdquordquovalue) Flume Events

timestamp 24353455

colname nr_free_pages

value 594693

timestamp 24353456

colname nr_inactive_anon

value 1392

timestamp 24353457

colname nr_active_anon

value 45259

hbase()bull hbase(tablename rdquos rdquostats rdquocolname rdquovalue)use nanos instead of s if you want nano-second timestamp

Rowkey Timestamp Column Family stats

24353455 T1 nr_free_pages = 594693

24353456 T2 nr_inactive_anon = 1392

24353457 T3 nr_active_anon = 45259

hbase()

bull Thus the FDL syntax would be

bull node tail(rdquoprocvmstat) | regexAll((w+)s+(w+) rdquocolname rdquovalue) collector(300000) hbase(table rdquos rdquostats rdquocolname value)

Demo

attr2hbase()

bull Donrsquot have to list all possible event attributes you want to store in HBase along with their destination column families and qualifiers

bull Source andor decorators can produce any

(reasonable) number of attributes with dynamic names (eg depending on the values) and they will be written into HBase

attr2hbase

bull attr2hbase(table[sysFamily[writeBody[attrPrefix[writeBufferSize [writeToWal]]]]])

bull sysFamily holds the name of the column family that is used to store ldquosystemrdquo data (event timestamp host priority)

bull In case this parameter is absent or equals ldquordquo the sink doesnrsquot write ldquosystemrdquo data

attr2hbase

bull writeBody indicates whether event body should be written with other ldquosystemrdquo data By default (when this parameter is absent or equals rdquordquo) the attribute body is not written

bull This parameter should have the ldquocolumn-familyqualifierrdquo format in order for the sink to write the body to the specific column-familyqualifier

attr2hbase

bull attrPrefix defines which attributes will be written to HBase every attribute with the name prefixed with attrPrefix parameterrsquos value is written The attribute key should be in the following format to be properly written into HBase

ldquoltattrPrefixgtltcolfamgtltqualgtrdquobull The default value of attrPrefix is ldquo2hb_rdquo This means that

all attributes with names ldquo2hb_ltcolfamgtltqualgtrdquo should be written to HBase

bull Attribute with key ldquoltattrPrefixgtrdquo must contain row key for Put otherwise if no row can be extracted the event is skipped and no record is written to the HBase table

attr2hbase examplebull node tail(procvmstatrdquo) | regexAll((w+)s+(w+)

colnamevalue) value(2hb_colnames escape=true) value(2hb_statvalue value escape=true) attr2hbase(table-attr2hbasesystembodycontents)]

Rowkey Timestamp Column Family stat

pgpgin1313244007 t1 value=985543

pgpgin1313244008 t2 value=985543

pgpgin1313244009 t3 value=985543

Demo Time

What are the New Plugins

bull httpscwikiapacheorgFLUMEflume-pluginshtml

bull I pushed OpenTSDB Sink just few weeks back

How to Contribute a new Plugin

bull Extend EventSinkBasebull Override Open() Have your connections

setup to the Storebull Override Append() Every new Event gets

processed here Doing the ldquoPutsrdquo into Storebull Override Close () Yay Cleanup the

connections and flushing etc to the Storebull Implement a SinkBuilder builder()

My Contacts

bull drayanriotgamescombull drverticalenginecombull Twitter rayanandi

PS We are Hiring

GOOD LUCKHAVE FUN

Play Freehttpwwwleagueoflegendscom

  • Hooking up Flume with HBase LA-HUG Augrsquo11
  • Who am I
  • What am I presenting here
  • What is needed before we start
  • A Quick Glance hellip
  • Flume Event Model
  • HBase Data Model
  • What is a Flume Sink
  • Reasons For HBase Sink
  • Types of Flume HBase Sink
  • How to Use a Plugin
  • hbase()
  • hbase() (2)
  • hbase() (3)
  • Demo
  • attr2hbase()
  • attr2hbase
  • attr2hbase (2)
  • attr2hbase (3)
  • attr2hbase example
  • Demo Time
  • What are the New Plugins
  • How to Contribute a new Plugin
  • My Contacts
  • GOOD LUCK HAVE FUN
Page 6: Flume HBase

Flume Event Model

bull A Flume event has these six main fields Unix timestamp Nanosecond timestamp Priority Source host Body and a Metadata table with an arbitrary number of attribute value pairs

bull The body is the raw log entry body The default is to truncate the body to a maximum of 32KB per event This is a configurable

bull One can custom bucket attributes with help of escape sequences

HBase Data Model

What is a Flume Sink

Reasons For HBase Sink

bull Near Real-Time aggregation of Streaming Databull Low Latency access to the aggregated databull Offline Big Data Analytics

Types of Flume HBase Sink

1 hbase() Highly expressive hbase(table rowkey cf1 c1 val1[cf2 c2 val2 ] writeBufferSize=int writeToWal=true|false)

2 attr2hbase() Flexible and powerful semantics but could be confusing (at first glance)attr2hbase(table[sysFamily[writeBody[attrPrefix[writeBufferSize [writeToWal]]]]])

How to Use a Plugin

bull Compile Add the jar with the new plugin classes to flumersquos classpath

bull In flume-sitexml add the class names of the new sources sinks andor decorators to the flumepluginclasses property

bull Restart the Flume nodes (Including Master)bull Verify that your plugin is loaded is to check if it

is displayed on this page httpflume-master35871masterextjsp

hbase()Source tail(ldquoprocvmstatrdquo)

nr_free_pages 594693nr_inactive_anon 1392nr_active_anon 45259nr_inactive_file 107132nr_active_file 141458

Sink regexAll(ldquow+)s+(w+)rdquordquocolnamerdquordquovalue) Flume Events

timestamp 24353455

colname nr_free_pages

value 594693

timestamp 24353456

colname nr_inactive_anon

value 1392

timestamp 24353457

colname nr_active_anon

value 45259

hbase()bull hbase(tablename rdquos rdquostats rdquocolname rdquovalue)use nanos instead of s if you want nano-second timestamp

Rowkey Timestamp Column Family stats

24353455 T1 nr_free_pages = 594693

24353456 T2 nr_inactive_anon = 1392

24353457 T3 nr_active_anon = 45259

hbase()

bull Thus the FDL syntax would be

bull node tail(rdquoprocvmstat) | regexAll((w+)s+(w+) rdquocolname rdquovalue) collector(300000) hbase(table rdquos rdquostats rdquocolname value)

Demo

attr2hbase()

bull Donrsquot have to list all possible event attributes you want to store in HBase along with their destination column families and qualifiers

bull Source andor decorators can produce any

(reasonable) number of attributes with dynamic names (eg depending on the values) and they will be written into HBase

attr2hbase

• attr2hbase("table"[, "sysFamily"[, "writeBody"[, "attrPrefix"[, "writeBufferSize"[, "writeToWal"]]]]])

• sysFamily holds the name of the column family that is used to store "system" data (event timestamp, host, priority).

• If this parameter is absent or equals "", the sink doesn't write "system" data.

attr2hbase

• writeBody indicates whether the event body should be written along with the other "system" data. By default (when this parameter is absent or equals ""), the body is not written.

• This parameter should have the "column-family:qualifier" format in order for the sink to write the body to the specific column-family:qualifier.

attr2hbase

• attrPrefix defines which attributes will be written to HBase: every attribute whose name is prefixed with the attrPrefix parameter's value is written. The attribute key should be in the following format to be properly written into HBase: "<attrPrefix><colfam>:<qual>"

• The default value of attrPrefix is "2hb_". This means that all attributes with names "2hb_<colfam>:<qual>" will be written to HBase.

• The attribute with key "<attrPrefix>" must contain the row key for the Put; otherwise, if no row key can be extracted, the event is skipped and no record is written to the HBase table.
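The attrPrefix rules can be sketched in Python as follows. This models the selection logic only, assuming attribute keys of the form prefix + colfam:qual with the default prefix 2hb_; the function name attrs_to_put is hypothetical, and ignoring prefixed keys without a colon is an assumption of this sketch rather than documented behaviour.

```python
def attrs_to_put(attrs, attr_prefix="2hb_"):
    """Turn event attributes into a row key plus columns, or None to skip."""
    row = attrs.get(attr_prefix)
    if not row:
        return None  # no row key can be extracted: the event is skipped
    columns = {}
    for key, value in attrs.items():
        if key == attr_prefix or not key.startswith(attr_prefix):
            continue  # the row-key attribute itself, or an unrelated attribute
        colfam, sep, qual = key[len(attr_prefix):].partition(":")
        if not sep:
            continue  # assumption: keys without "colfam:qual" are ignored
        columns[(colfam, qual)] = value
    return {"row": row, "columns": columns}


attrs = {
    "2hb_": "pgpgin1313244007",   # row key
    "2hb_stat:value": "985543",   # goes to family "stat", qualifier "value"
    "colname": "pgpgin",          # no prefix, so not written
}
put = attrs_to_put(attrs)
skipped = attrs_to_put({"2hb_stat:value": "985543"})
print(put, skipped)
```

Because the sink discovers columns from the attribute names, upstream decorators can add or rename attributes without any change to the sink's arguments.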

attr2hbase example

• node : tail("/proc/vmstat") | regexAll("(\\w+)\\s+(\\w+)", "colname", "value") value("2hb_", "%{colname}%s", escape=true) value("2hb_stat:value", "%{value}", escape=true) attr2hbase("table-attr2hbase", "system", "body:contents")

Rowkey             Timestamp  Column Family: stat
pgpgin1313244007   t1         value=985543
pgpgin1313244008   t2         value=985543
pgpgin1313244009   t3         value=985543

Demo Time

What are the New Plugins

• https://cwiki.apache.org/FLUME/flume-plugins.html

• I pushed the OpenTSDB Sink just a few weeks back.

How to Contribute a new Plugin

• Extend EventSink.Base.
• Override open(): set up your connections to the Store.
• Override append(): every new Event gets processed here; do the "Puts" into the Store.
• Override close(): Yay! Clean up the connections, flush, etc. to the Store.
• Implement a SinkBuilder builder().
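Real Flume sinks are written in Java, so the following Python is only a language-neutral model of the lifecycle listed above: open, then append once per event, then close, with a builder that turns string arguments into a configured sink. The class InMemoryStoreSink, the list standing in for the Store, and the buffer-size argument are all inventions for this sketch.

```python
class InMemoryStoreSink:
    """Toy sink that buffers events and writes them to a list-backed store."""

    def __init__(self, store, buffer_size=2):
        self.store = store
        self.buffer_size = buffer_size
        self.buffer = []
        self.is_open = False

    def open(self):
        # A real sink would set up its connection to the store here.
        self.is_open = True

    def append(self, event):
        # Called once per event; writes are batched like a client write buffer.
        if not self.is_open:
            raise RuntimeError("sink is not open")
        self.buffer.append(event)
        if len(self.buffer) >= self.buffer_size:
            self._flush()

    def _flush(self):
        self.store.extend(self.buffer)
        self.buffer.clear()

    def close(self):
        # Flush whatever is still buffered, then release the connection.
        self._flush()
        self.is_open = False


def builder(store, *args):
    """Build a sink from string arguments, as a config-driven builder would."""
    buffer_size = int(args[0]) if args else 2
    return InMemoryStoreSink(store, buffer_size)


store = []
sink = builder(store, "2")
sink.open()
for event in ("e1", "e2", "e3"):
    sink.append(event)
pending_before_close = list(sink.buffer)
sink.close()
print(store)
```

The point of the model is the ordering: events still sitting in the buffer are lost unless close() flushes them, which is why cleanup and flushing belong together in that method.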

My Contacts

• drayan@riotgames.com
• dr@verticalengine.com
• Twitter: @rayanandi

PS: We are Hiring!

GOOD LUCK HAVE FUN

Play Free: http://www.leagueoflegends.com

Page 13: Flume HBase

hbase()bull hbase(tablename rdquos rdquostats rdquocolname rdquovalue)use nanos instead of s if you want nano-second timestamp

Rowkey Timestamp Column Family stats

24353455 T1 nr_free_pages = 594693

24353456 T2 nr_inactive_anon = 1392

24353457 T3 nr_active_anon = 45259

Page 14: Flume HBase

hbase()

• Thus the FDL syntax would be:

• node : tail("/proc/vmstat") | regexAll("(\w+)\s+(\w+)", "colname", "value") collector(300000) hbase("table", "%s", "stats", "%{colname}", "%{value}")
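
As a rough illustration of what this pipeline does to each /proc/vmstat line, the Python sketch below applies the same regular expression, names the two capture groups colname and value, and builds one put per line. This is only a sketch under assumptions: regex_all and to_puts are hypothetical helpers, the timestamps are supplied by hand, and the collector(300000) batching step is not modelled.

```python
import re

# Same pattern as in the regexAll decorator: "<counter name> <number>".
VMSTAT_LINE = re.compile(r"(\w+)\s+(\w+)")

def regex_all(body, names=("colname", "value")):
    """Return the capture groups of the first match as named attributes."""
    match = VMSTAT_LINE.search(body)
    if match is None:
        return {}
    return dict(zip(names, match.groups()))

def to_puts(lines, timestamps):
    """Build one put description per parsable line (row key = timestamp)."""
    puts = []
    for body, ts in zip(lines, timestamps):
        attrs = regex_all(body)
        if not attrs:
            continue  # the line did not look like "<name> <number>"
        puts.append({"row": str(ts), "family": "stats",
                     "qualifier": attrs["colname"], "value": attrs["value"]})
    return puts

sample = ["nr_free_pages 594693", "nr_inactive_anon 1392", "nr_active_anon 45259"]
puts = to_puts(sample, [24353455, 24353456, 24353457])
for put in puts:
    print(put)
```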

Page 15: Flume HBase

Demo

Page 16: Flume HBase

attr2hbase()

• You don't have to list all possible event attributes you want to store in HBase along with their destination column families and qualifiers.

• Sources and/or decorators can produce any (reasonable) number of attributes with dynamic names (e.g. depending on the values), and they will be written into HBase.

Page 17: Flume HBase

attr2hbase

• attr2hbase(table[, sysFamily[, writeBody[, attrPrefix[, writeBufferSize[, writeToWal]]]]])

• sysFamily holds the name of the column family that is used to store "system" data (event timestamp, host, priority).

• If this parameter is absent or equals "", the sink doesn't write "system" data.

Page 18: Flume HBase

attr2hbase

• writeBody indicates whether the event body should be written along with the other "system" data. By default (when this parameter is absent or equals ""), the event body is not written.

• This parameter should have the "column-family:qualifier" format in order for the sink to write the body to that specific column-family:qualifier.

Page 19: Flume HBase

attr2hbase

• attrPrefix defines which attributes will be written to HBase: every attribute whose name is prefixed with the attrPrefix value is written. The attribute key should be in the following format to be properly written into HBase: "<attrPrefix><colfam>:<qual>"

• The default value of attrPrefix is "2hb_". This means that all attributes with names of the form "2hb_<colfam>:<qual>" are written to HBase.

• The attribute with key "<attrPrefix>" must contain the row key for the Put; otherwise, if no row key can be extracted, the event is skipped and no record is written to the HBase table.
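
The attribute naming rules above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not the sink's actual implementation: attrs_to_put is a hypothetical helper, the attribute dictionary stands in for a Flume event's metadata table, and silently ignoring malformed keys is an assumption.

```python
def attrs_to_put(attrs, attr_prefix="2hb_"):
    """Map prefixed event attributes to a row key plus (family, qualifier) cells."""
    row = attrs.get(attr_prefix)
    if not row:
        return None  # no row key can be extracted: the event is skipped
    cells = {}
    for key, value in attrs.items():
        if key == attr_prefix or not key.startswith(attr_prefix):
            continue  # the row-key attribute itself, or an unrelated attribute
        family, sep, qualifier = key[len(attr_prefix):].partition(":")
        if not sep or not family:
            continue  # assumption: keys without "<colfam>:<qual>" are ignored
        cells[(family, qualifier)] = value
    return {"row": row, "cells": cells}

# Hypothetical attributes, as a value() decorator chain might have set them.
attrs = {"2hb_": "pgpgin1313244007", "2hb_stat:value": "985543", "colname": "pgpgin"}
put = attrs_to_put(attrs)
skipped = attrs_to_put({"2hb_stat:value": "985543"})
print(put)
print(skipped)
```

Only the attribute named exactly "2hb_" supplies the row key; attributes without the prefix, such as colname here, never reach HBase.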

Page 20: Flume HBase

attr2hbase example

• node : tail("/proc/vmstat") | regexAll("(\w+)\s+(\w+)", "colname", "value") value("2hb_", "%{colname}%s", escape=true) value("2hb_stat:value", "%{value}", escape=true) attr2hbase("table-attr2hbase", "system", "body:contents")

Rowkey             Timestamp   Column Family: stat
pgpgin1313244007   t1          value = 985543
pgpgin1313244008   t2          value = 985543
pgpgin1313244009   t3          value = 985543

Page 21: Flume HBase

Demo Time

Page 22: Flume HBase

What are the New Plugins

• https://cwiki.apache.org/FLUME/flume-plugins.html

• I pushed the OpenTSDB sink just a few weeks back.

Page 23: Flume HBase

How to Contribute a new Plugin

• Extend EventSink.Base

• Override open(): have your connections set up to the store.

• Override append(): every new event gets processed here, doing the "Puts" into the store.

• Override close(): yay! Clean up the connections, flush, etc. to the store.

• Implement a SinkBuilder builder().
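
The open/append/close lifecycle listed above can be sketched as follows. Real Flume plugins are written in Java against Flume's own classes; this Python sketch only mirrors the lifecycle, and InMemoryStore, BufferedSink and buffer_size are hypothetical names standing in for a real store client and its write buffer.

```python
class InMemoryStore:
    """Stand-in for a real store client such as an HBase table handle."""
    def __init__(self):
        self.rows = []
        self.connected = False

    def connect(self):
        self.connected = True

    def put_all(self, puts):
        self.rows.extend(puts)

    def disconnect(self):
        self.connected = False


class BufferedSink:
    """Sink lifecycle sketch: open() connects, append() buffers, close() flushes."""
    def __init__(self, store, buffer_size=2):
        self.store = store
        self.buffer_size = buffer_size
        self.buffer = []

    def open(self):
        self.store.connect()            # set up connections to the store

    def append(self, event):
        self.buffer.append(event)       # every new event is processed here
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        self.store.put_all(self.buffer)
        self.buffer = []

    def close(self):
        self.flush()                    # write anything still buffered
        self.store.disconnect()         # then clean up the connection


store = InMemoryStore()
sink = BufferedSink(store)
sink.open()
for body in ["pgpgin 985543", "pgpgout 1203", "pswpin 0"]:
    sink.append({"body": body})
rows_before_close = len(store.rows)
sink.close()
print(rows_before_close, len(store.rows), store.connected)
```

The third event only reaches the store when close() flushes the buffer, which is why the cleanup step matters for a buffered sink.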

Page 24: Flume HBase

My Contacts

• drayan@riotgames.com
• dr@verticalengine.com
• Twitter: @rayanandi

PS: We are Hiring

Page 25: Flume HBase

GOOD LUCK, HAVE FUN

Play Free: http://www.leagueoflegends.com
