Heka Unified Data Processing

Heka - Rob Miller

Page 1: Heka - Rob Miller

Heka: Unified Data Processing

Page 2: Heka - Rob Miller

So. Much. Data.

Page 3: Heka - Rob Miller

So. Much. Data.

•Server level ops data

•Process level data

•Ops data / metrics

•Business data

•Logging output

•Error reports / tracebacks

Page 4: Heka - Rob Miller

So. Many. Tools.

•collectd / tcollector

•statsd / graphite / etc.

•[r]syslog[-ng]

•Logstash

•Riemann / Esper / other CEP

•Nagios / Zenoss

Page 5: Heka - Rob Miller

One Basic Pattern

•Acquire data

•Transform and/or Transport data

•Output data
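The acquire / transform / output pattern above can be sketched as three chained stages. This is only an illustration of the pattern, not Heka code; the function names and the `name=value` record format are assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def acquire(lines: Iterable[str]) -> Iterator[str]:
    """Acquire: yield raw records from a source (here, any iterable of strings)."""
    for line in lines:
        yield line


def transform(records: Iterable[str]) -> Iterator[dict]:
    """Transform: turn each hypothetical 'name=value' record into a dict."""
    for record in records:
        name, _, value = record.partition("=")
        yield {"name": name.strip(), "value": value.strip()}


def output(events: Iterable[dict], sink: List[dict]) -> None:
    """Output: deliver events to a destination (here, an in-memory list)."""
    for event in events:
        sink.append(event)


def run_pipeline(lines: Iterable[str]) -> List[dict]:
    """Wire the three stages together and return what reached the sink."""
    sink: List[dict] = []
    output(transform(acquire(lines)), sink)
    return sink
```

Swapping any one stage (a socket instead of a list, a different parser, a different sink) leaves the other two untouched, which is the point of the pattern.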

Page 6: Heka - Rob Miller

One Multi-Tool?

What would it be like to build a tool to tackle this in the general case?

Wins:

•Fewer processes to manage

•Increased client / configuration consistency

•Processing shared across domains

Page 7: Heka - Rob Miller

One Multi-Tool?

Requirements:

•Lightweight

•Flexible and configurable

•Easily extended

Page 8: Heka - Rob Miller

I know, I know...

Page 9: Heka - Rob Miller

BUT!

Replacing even two services on each box is a net ops win.

SCIENCE!

Page 10: Heka - Rob Miller

How Heka Is Put Together

Page 11: Heka - Rob Miller

Inputs

•Listen or fetch

•Concerned only with the low-level transport

Page 12: Heka - Rob Miller

Splitters

•Slice Inputs' raw data streams into discrete events

•Text or binary protocols

•Decouple protocols from their transports
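As a rough illustration of the splitter idea, the sketch below slices an arbitrary byte stream into delimiter-terminated records, independent of how the bytes arrived. The class name `DelimiterSplitter` and its `feed` method are hypothetical, not Heka's API.

```python
class DelimiterSplitter:
    """Slice a raw byte stream into records ending in a delimiter.

    The splitter knows nothing about the transport: callers feed it
    whatever chunks arrive (from a file, a TCP socket, a UDP packet...).
    """

    def __init__(self, delimiter: bytes = b"\n"):
        self.delimiter = delimiter
        self._buffer = b""  # holds a trailing partial record between feeds

    def feed(self, chunk: bytes) -> list:
        """Add a chunk and return every complete record seen so far."""
        self._buffer += chunk
        # Everything before the last delimiter is complete; the remainder
        # stays buffered until more data arrives.
        *records, self._buffer = self._buffer.split(self.delimiter)
        return records
```

For example, feeding `b"one\ntw"` yields one record and keeps `b"tw"` buffered until the rest of that record arrives in a later chunk.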

Page 13: Heka - Rob Miller

Decoders

•Parse event data to populate a metadata envelope for all event types

•Extract structure from unstructured data...

•... or just wrap a blob

•Sandbox-able (Lua)
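A decoder along these lines might look like the sketch below. It is written in Python for illustration only (the slide says real decoders can be sandboxed Lua). The envelope field names mirror those used in the router examples (Type, Logger, Severity, Payload, Fields); the `LEVEL logger: text` line format and the severity numbers are assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Message:
    """A common metadata envelope shared by all event types."""
    type: str
    logger: str = ""
    severity: int = 7
    payload: str = ""
    fields: dict = field(default_factory=dict)


# Hypothetical log line format: "ERROR marketplace: payment failed"
LINE_RE = re.compile(r"^(?P<level>[A-Z]+) (?P<logger>[\w.]+): (?P<text>.*)$")
# Assumed syslog-style severity numbers.
SEVERITIES = {"ERROR": 3, "WARNING": 4, "INFO": 6, "DEBUG": 7}


def decode(raw: str) -> Message:
    """Extract structure when the line parses, otherwise just wrap the blob."""
    m = LINE_RE.match(raw)
    if m is None or m.group("level") not in SEVERITIES:
        return Message(type="raw", payload=raw)
    return Message(
        type="applog",
        logger=m.group("logger"),
        severity=SEVERITIES[m.group("level")],
        payload=m.group("text"),
        fields={"level": m.group("level")},
    )
```

Either way the result is the same kind of object, so everything downstream can treat structured and unstructured events uniformly.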

Page 14: Heka - Rob Miller

Router

Simple, efficient grammar for matching messages:

Type == "counter" && Payload == "1"

Type == "applog" && Logger == "marketplace"

Type == "alert" && (Severity == 7 || Payload == "emergency")

Type == "myapp.metric" && Fields[name] =~ /.*\.stat/
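To make the semantics of these expressions concrete, here is a sketch that restates the four matchers as Python predicates and routes a message to every consumer whose predicate matches. It does not parse the matcher grammar itself; the consumer names are hypothetical, and treating `=~` as an unanchored regex search is an assumption.

```python
import re

# Each route pairs a hypothetical consumer name with a predicate that
# mirrors one of the matcher expressions above.
ROUTES = [
    ("counter_filter",
     lambda m: m.get("Type") == "counter" and m.get("Payload") == "1"),
    ("marketplace_log",
     lambda m: m.get("Type") == "applog" and m.get("Logger") == "marketplace"),
    ("alerter",
     lambda m: m.get("Type") == "alert"
     and (m.get("Severity") == 7 or m.get("Payload") == "emergency")),
    ("stat_filter",
     lambda m: m.get("Type") == "myapp.metric"
     and re.search(r".*\.stat", m.get("Fields", {}).get("name", "")) is not None),
]


def route(msg: dict) -> list:
    """Return the names of all consumers whose matcher accepts the message."""
    return [name for name, matches in ROUTES if matches(msg)]
```

A message can match several routes or none; the router only decides who sees it.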

Page 15: Heka - Rob Miller

Filters

•Watch flowing data

•Generate output messages

•Sandbox-able (Lua)
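A filter in this sense could be sketched as follows: it watches matching messages as they flow past and periodically generates new messages of its own. This is a Python illustration rather than a real sandboxed Lua filter; the class name and the `logger.count` message type are made up for the example.

```python
from collections import Counter


class CountingFilter:
    """Count messages per Logger and emit summary messages on each tick."""

    def __init__(self):
        self._counts = Counter()

    def process_message(self, msg: dict) -> None:
        """Called once for every message routed to this filter."""
        self._counts[msg.get("Logger", "")] += 1

    def timer_event(self) -> list:
        """Called periodically; returns new messages and resets the counts."""
        out = [
            {"Type": "logger.count", "Logger": logger, "Payload": str(n)}
            for logger, n in sorted(self._counts.items())
        ]
        self._counts.clear()
        return out
```

The generated messages would go back through the router, so other filters and outputs can consume them like any other event.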

Page 16: Heka - Rob Miller

Outputs

•Deliver to external service...

•... and/or to upstream Heka...

•... and/or directly to Heka Dashboard UI

•Configurable reconnect
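The slide does not say how reconnect is configured, so the following is only one plausible shape for it: retry a failed delivery a configurable number of times with a configurable delay. The function name and the `max_retries` / `delay` parameters are assumptions, not Heka settings.

```python
import time


def deliver_with_reconnect(send, payload, max_retries=3, delay=0.5,
                           sleep=time.sleep):
    """Call send(payload), retrying on ConnectionError.

    Returns the number of retries that were needed. Re-raises the last
    ConnectionError once max_retries is exhausted. The sleep function is
    injectable so the behaviour can be exercised without real waiting.
    """
    attempt = 0
    while True:
        try:
            send(payload)
            return attempt
        except ConnectionError:
            if attempt >= max_retries:
                raise
            attempt += 1
            sleep(delay)
```

Here `send` stands in for whatever actually writes to the external service or the upstream Heka.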

Page 17: Heka - Rob Miller

Sandboxes Are Fun!

•Dynamically added to a running Heka with no config changes and no restart

•CPU cycles and RAM usage are monitored

•Misbehaving plugins are shut off

Page 18: Heka - Rob Miller

Sandboxes Are Fun!

•LPeg (parsing expression grammar) & JSON libraries for data parsing

•Circular buffer library for time series data
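To show why a circular buffer suits time series data, here is a minimal sketch: a fixed number of rows, each covering a fixed time interval, with the oldest rows overwritten as time advances. It is not the sandbox library's API; the class and method names are invented for illustration.

```python
class CircularBuffer:
    """Fixed-size time series: `rows` slots, each `seconds_per_row` wide."""

    def __init__(self, rows: int, seconds_per_row: int):
        self.rows = rows
        self.seconds_per_row = seconds_per_row
        self._values = [0.0] * rows
        self._newest = None  # absolute slot number of the newest row

    def add(self, timestamp: int, value: float) -> bool:
        """Add value to the row covering timestamp; False if it is too old."""
        slot = timestamp // self.seconds_per_row
        if self._newest is None:
            self._newest = slot
        if slot <= self._newest - self.rows:
            return False  # falls before the window the buffer still holds
        if slot > self._newest:
            # Time moved forward: zero every row that is being reused.
            first = max(self._newest + 1, slot - self.rows + 1)
            for s in range(first, slot + 1):
                self._values[s % self.rows] = 0.0
            self._newest = slot
        self._values[slot % self.rows] += value
        return True

    def series(self) -> list:
        """Return the rows from oldest to newest."""
        if self._newest is None:
            return []
        start = self._newest - self.rows + 1
        return [self._values[s % self.rows] for s in range(start, self._newest + 1)]
```

Memory use is constant no matter how long the process runs, and `series()` is already in the shape a graph needs.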

Page 19: Heka - Rob Miller

Sandboxes Are Fun!

Circular buffers auto-generate dashboard graphs

Page 20: Heka - Rob Miller

Try It Out

https://github.com/mozilla-services/heka

http://hekad.readthedocs.org

https://mail.mozilla.org/listinfo/heka

irc.mozilla.org, #heka

[email protected]