Eron Wright @eronwright HTM & Apache Flink Extending Flink for Anomaly Detection with Hierarchical Temporal Memory (HTM)

HTM & Apache Flink (2016-06-27)


Page 1: HTM & Apache Flink (2016-06-27)

Eron Wright (@eronwright)

HTM & Apache Flink
Extending Flink for Anomaly Detection with Hierarchical Temporal Memory (HTM)

Page 2: HTM & Apache Flink (2016-06-27)

What is HTM?


Page 3: HTM & Apache Flink (2016-06-27)

Hierarchical Temporal Memory (HTM) is a theory of computation for the neocortex.

Page 4: HTM & Apache Flink (2016-06-27)

History


2005 – 2009
• HTM theory
• First generation algorithms
• Hierarchy and vision problems
• Vision Toolkit

2009 – 2012
• Cortical Learning Algorithms
• SDRs, sequence memory, continuous learning
• Applications exploration

2013 – 2015
• Continued HTM development
• NuPIC open source project
• Grok for anomaly detection

2014 –
• Sensorimotor
• Goal directed behavior
• Sequence classification

Other years marked on the timeline: 2002, 2004, 2005

http://www.slideshare.net/numenta/why-neurons-have-thousands-of-synapses-a-model-of-sequence-memory-in-the-brain

Page 5: HTM & Apache Flink (2016-06-27)

Computational Properties
• Online, Unsupervised Learning
• High-order Representations
  • For example: sequences “ABCD” vs “XBCY”
• Multiple Simultaneous Predictions
  • For example: “BC” predicts both “D” and “Y”
• Anomaly Scores
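The “ABCD” vs “XBCY” example can be made concrete with a toy model. The sketch below is not HTM and not part of NuPIC or HTM.java; it is a plain lookup table of contexts, with the hypothetical class name `HighOrderSequenceModel`, written only to show what "high-order" and "multiple simultaneous predictions" mean for those two sequences.

```python
from collections import defaultdict


class HighOrderSequenceModel:
    """Toy variable-order sequence memory (not HTM): remembers what followed each context."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        # Maps a context tuple, e.g. ("A", "B", "C"), to the set of symbols seen after it.
        self.transitions = defaultdict(set)

    def learn(self, sequence):
        for i in range(1, len(sequence)):
            # Record the next symbol under every context length up to max_order.
            for order in range(1, min(i, self.max_order) + 1):
                context = tuple(sequence[i - order:i])
                self.transitions[context].add(sequence[i])

    def predict(self, history):
        # Prefer the longest context that has been seen before.
        for order in range(min(len(history), self.max_order), 0, -1):
            context = tuple(history[-order:])
            if context in self.transitions:
                return set(self.transitions[context])
        return set()


model = HighOrderSequenceModel()
model.learn("ABCD")
model.learn("XBCY")

print(sorted(model.predict("BC")))   # ambiguous context: both endings are predicted
print(sorted(model.predict("ABC")))  # the leading "A" disambiguates
print(sorted(model.predict("XBC")))  # the leading "X" disambiguates
```

With only “BC” as context the model returns both “D” and “Y”; once the first symbol is known, a single prediction remains.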

Page 6: HTM & Apache Flink (2016-06-27)

Implementations of HTM
Numerous Implementations
• NuPIC – official reference library (Python/C++)
• HTM.java – community-supported library (Java)
Evolving Rapidly
• Tracking the theory!

Page 7: HTM & Apache Flink (2016-06-27)

NuPIC learns the time-based patterns in data, predicts future values, and detects anomalies.
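One way to picture how predictions turn into anomaly detection is the raw anomaly score: the fraction of currently active bits that the model did not predict at the previous step. The function below is a minimal sketch of that idea over plain Python sets. The name `raw_anomaly_score` and the choice to return 0.0 for an empty input are assumptions of this sketch, not a quotation of the NuPIC API.

```python
def raw_anomaly_score(active, predicted):
    """Fraction of currently active bits that were not predicted at the previous step."""
    active, predicted = set(active), set(predicted)
    if not active:
        return 0.0  # nothing active, so nothing unexpected (a choice made for this sketch)
    return 1.0 - len(active & predicted) / len(active)


active_now = {1, 2, 3, 4}
print(raw_anomaly_score(active_now, {1, 2, 3, 4}))  # fully predicted
print(raw_anomaly_score(active_now, {1, 2, 9}))     # half of the active bits were predicted
print(raw_anomaly_score(active_now, set()))         # nothing was predicted
```

A score of 0.0 means the input was fully expected and 1.0 means it was entirely unexpected.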

Page 8: HTM & Apache Flink (2016-06-27)


Introducing Flink-HTM

Page 9: HTM & Apache Flink (2016-06-27)

flink-htm provides HTM-based learning operators for the Flink DataStream API, based on HTM.java.

Page 10: HTM & Apache Flink (2016-06-27)

Benefits
Good fit for Apache Flink
• Automated model-building
• Continuous learning
• Temporal awareness

Contrast with: github.com/StephanEwen/flink-demos/tree/master/streaming-state-machine

Page 11: HTM & Apache Flink (2016-06-27)

Benefits (cont'd)
Good fit for HTM
• Integration with data pipeline
• Data connectivity
  • e.g. Kafka, Twitter, HDFS, AWS Kinesis
• DSL for stream pre- and post-processing
  • e.g. aggregation, transformation
• Distributed, reliable processing
• Event-Time Awareness

Page 12: HTM & Apache Flink (2016-06-27)

Features
`Learn` Operator
• Feeds input data to an HTM model
• Emits predictions and anomaly scores
• Supports keyed and non-keyed streams
Checkpoint Integration
• Models are serialized
• Facilitates exactly-once processing
Numenta RiverView Connector
• Public-domain temporal datasets
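The keyed-stream and checkpoint features can be illustrated without Flink. The sketch below keeps one model per key and serializes all models into a snapshot that a fresh operator can restore. Everything here is hypothetical: `RunningMeanModel` is a stand-in for an HTM network, `KeyedLearnOperator` is not the flink-htm class, and JSON is used only because the stand-in state is two numbers.

```python
import json


class RunningMeanModel:
    """Stand-in for an HTM network: scores a value by its distance from the running mean."""

    def __init__(self, count=0, mean=0.0):
        self.count = count
        self.mean = mean

    def compute(self, value):
        score = abs(value - self.mean) if self.count else 0.0
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return score

    def state(self):
        return {"count": self.count, "mean": self.mean}


class KeyedLearnOperator:
    """One model per key, created lazily, with snapshot/restore of all models."""

    def __init__(self):
        self.models = {}

    def process(self, key, value):
        model = self.models.setdefault(key, RunningMeanModel())
        return key, model.compute(value)

    def snapshot(self):
        # Keys must be strings here because the snapshot is a JSON object.
        return json.dumps({key: m.state() for key, m in self.models.items()})

    def restore(self, blob):
        self.models = {key: RunningMeanModel(**s) for key, s in json.loads(blob).items()}


op = KeyedLearnOperator()
op.process("sensor-a", 10.0)
op.process("sensor-b", 100.0)
checkpoint = op.snapshot()       # taken at a checkpoint
op.process("sensor-a", 500.0)    # processed after the checkpoint

recovered = KeyedLearnOperator()
recovered.restore(checkpoint)
print(recovered.process("sensor-a", 12.0))  # scored against the checkpointed mean of 10.0
```

After recovery the model for `sensor-a` has no memory of the value 500.0, so replaying the records that followed the checkpoint reproduces the same results, which is the property that exactly-once processing relies on.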

Page 14: HTM & Apache Flink (2016-06-27)


Page 15: HTM & Apache Flink (2016-06-27)

General Approach
1. Define Input Type
2. Add Data Source
3. Apply Learn Operator
  • with HTM Network Definition
  • with Field Encoders
4. Define Select Function
  • Process the inference data (predictions & anomaly scores)
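The four steps above can be sketched end to end in plain Python. This is an illustration of the shape of the pipeline only: `Reading`, `Inference`, `learn` and `select` are hypothetical names, the `encode` argument plays the role of a field encoder, and the "model" simply predicts that the next value repeats the current one.

```python
from typing import NamedTuple


class Reading(NamedTuple):            # 1. input type
    timestamp: int
    consumption: float


def source():                         # 2. data source
    for t, value in enumerate([10.0, 11.0, 10.5, 42.0, 10.25]):
        yield Reading(t, value)


class Inference(NamedTuple):
    record: Reading
    prediction: float                 # predicted value for the next step
    anomaly_score: float              # error of the previous step's prediction


def learn(stream, encode):            # 3. learn operator (stand-in model)
    """Predicts that the next encoded value repeats the current one; scores by the error."""
    previous = None
    for record in stream:
        value = encode(record)
        score = 0.0 if previous is None else abs(value - previous)
        yield Inference(record, prediction=value, anomaly_score=score)
        previous = value


def select(inferences, fn):           # 4. select function
    return (fn(inference) for inference in inferences)


results = list(select(
    learn(source(), encode=lambda r: r.consumption),
    lambda inf: (inf.record.timestamp, inf.anomaly_score),
))
print(results)
```

The select function is where the application decides what to keep from each inference, here a `(timestamp, anomaly_score)` pair.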

Page 16: HTM & Apache Flink (2016-06-27)


Page 17: HTM & Apache Flink (2016-06-27)


Page 18: HTM & Apache Flink (2016-06-27)

Advanced Topics
`Reset` Function
• Indicates the start of a temporal sequence
• For example: A,B,C,D,E, (reset), A,B,C,D,E
Stateful Functions
• Use `mapWithState` to store predictions for the future
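Both topics can be shown with a small stand-in model. In the sketch below, `FirstOrderModel` (hypothetical, not HTM) learns which symbol follows which; a reset predicate marks records that start a new sequence so that the transition from “E” back to “A” is never learned. The `map_with_state` helper is a plain-Python imitation of the idea behind `mapWithState`: it carries the prediction made at one step forward so that it can be checked against the next record.

```python
class FirstOrderModel:
    """Stand-in for an HTM network: learns which symbol follows which."""

    def __init__(self):
        self.next_symbols = {}
        self.previous = None

    def reset(self):
        # Forget the temporal context, but keep everything learned so far.
        self.previous = None

    def compute(self, symbol):
        if self.previous is not None:
            self.next_symbols.setdefault(self.previous, set()).add(symbol)
        self.previous = symbol
        return set(self.next_symbols.get(symbol, ()))  # prediction for the next step


def train(symbols, starts_sequence):
    model = FirstOrderModel()
    for symbol in symbols:
        if starts_sequence(symbol):
            model.reset()
        model.compute(symbol)
    return model


with_reset = train("ABCDEABCDE", lambda s: s == "A")
without_reset = train("ABCDEABCDE", lambda s: False)
print(with_reset.next_symbols.get("E"), without_reset.next_symbols.get("E"))


def map_with_state(stream, fn, state=None):
    """Threads a state value through fn, which returns (output, new_state)."""
    for item in stream:
        output, state = fn(item, state)
        yield output


def check_prediction(inference, pending):
    """inference is (actual, predicted_next); pending is the prediction from one step earlier."""
    actual, predicted_next = inference
    hit = None if pending is None else actual in pending
    return (actual, hit), predicted_next


model = FirstOrderModel()
inferences = ((s, model.compute(s)) for s in "ABABAB")
hits = list(map_with_state(inferences, check_prediction))
print(hits)
```

Without the reset, the model treats the end of one sequence and the start of the next as a learned transition; with it, “E” has no successor.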

Page 19: HTM & Apache Flink (2016-06-27)


Page 20: HTM & Apache Flink (2016-06-27)


Extending Flink

Page 21: HTM & Apache Flink (2016-06-27)

Streaming API/DSL
Java
1. Static Entrypoint, then
2. Intermediate Representation (e.g. HTMStream), then
3. DataStream!
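The entrypoint, intermediate representation, then stream pattern listed above can be sketched as a small fluent API. This is an illustration of the pattern in plain Python, not the flink-htm API: `DataStream` here is a minimal stand-in class, and `reset_on`, `DeltaModel` and the method signatures are assumptions of the sketch.

```python
class DataStream:
    """Minimal stand-in for a stream: wraps an iterable of records."""

    def __init__(self, records):
        self.records = records

    def map(self, fn):
        return DataStream(fn(r) for r in self.records)

    def collect(self):
        return list(self.records)


class DeltaModel:
    """Stand-in model: the score is the absolute change from the previous value."""

    def __init__(self):
        self.previous = None

    def reset(self):
        self.previous = None

    def compute(self, value):
        score = 0.0 if self.previous is None else abs(value - self.previous)
        self.previous = value
        return score


class HTMStream:
    """2. Intermediate representation: collects configuration until select() is called."""

    def __init__(self, input_stream, model_factory):
        self._input = input_stream
        self._model_factory = model_factory
        self._reset = lambda record: False

    def reset_on(self, predicate):
        self._reset = predicate
        return self

    def select(self, fn):
        def run():
            model = self._model_factory()
            for record in self._input.records:
                if self._reset(record):
                    model.reset()
                yield fn(record, model.compute(record))
        return DataStream(run())      # 3. back to an ordinary stream


class HTM:
    @staticmethod
    def learn(input_stream, model_factory):   # 1. static entrypoint
        return HTMStream(input_stream, model_factory)


scores = (
    HTM.learn(DataStream([1.0, 2.0, 4.0, 1.0]), DeltaModel)
    .reset_on(lambda value: value == 1.0)
    .select(lambda record, score: (record, score))
    .map(lambda pair: pair[1])
    .collect()
)
print(scores)
```

The intermediate object exists so that options can be chained before the operator is built; once `select` returns a regular stream, the rest of the pipeline uses the ordinary stream operations.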

Page 22: HTM & Apache Flink (2016-06-27)

Streaming API/DSL (cont'd)
Scala
1. `RichDataStream` extensions
2. Scala Functions
3. Scala-Specific TypeInformation
Other
• Serialization Hooks
• Clean your closures!

Page 23: HTM & Apache Flink (2016-06-27)

Learn Operator
Implement `AbstractStreamOperator`
Respect Flink’s type system
• Use the `TypeInformation` class
Use the State Handle abstraction
• Keyed streams only
Instrument your code
• Accumulators

Page 24: HTM & Apache Flink (2016-06-27)

RiverView Connector
Extend `RichParallelSourceFunction`
• Parallelism is user-defined
• Must handle partition assignment
Mix in `Checkpointed`
• Synchronize on checkpoint lock
Support cancel/stop
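The three responsibilities of a parallel, checkpointed source can be sketched without Flink. In the illustration below, each subtask takes a share of the partitions, emits records and advances its offsets under a lock so that a snapshot never sees a half-finished step, and stops when cancelled. The class `ParallelSourceSubtask`, the modulo assignment rule and the partition names are assumptions of this sketch, not the connector's actual code.

```python
import threading


class ParallelSourceSubtask:
    """One parallel instance of a source reading a shared set of partitions."""

    def __init__(self, partitions, subtask_index, parallelism):
        # Partition assignment: each subtask takes every parallelism-th partition.
        self.assigned = {
            name: records
            for i, (name, records) in enumerate(sorted(partitions.items()))
            if i % parallelism == subtask_index
        }
        self.offsets = {name: 0 for name in self.assigned}
        self.checkpoint_lock = threading.Lock()
        self.running = True

    def run(self, emit):
        while self.running:
            progressed = False
            for name, records in self.assigned.items():
                if not self.running:
                    break
                offset = self.offsets[name]
                if offset < len(records):
                    # Emit and advance the offset atomically with respect to snapshots.
                    with self.checkpoint_lock:
                        emit(records[offset])
                        self.offsets[name] = offset + 1
                    progressed = True
            if not progressed:
                break  # the bounded demo data is exhausted

    def snapshot(self):
        with self.checkpoint_lock:
            return dict(self.offsets)

    def restore(self, offsets):
        self.offsets = dict(offsets)

    def cancel(self):
        self.running = False


partitions = {"river-0": [1, 2, 3], "river-1": [10, 20], "river-2": [100]}

first = ParallelSourceSubtask(partitions, subtask_index=0, parallelism=2)
seen = []


def emit_then_cancel(record):
    seen.append(record)
    if len(seen) == 2:
        first.cancel()


first.run(emit_then_cancel)
checkpoint = first.snapshot()

resumed = ParallelSourceSubtask(partitions, subtask_index=0, parallelism=2)
resumed.restore(checkpoint)
resumed.run(seen.append)
print(seen)
```

Because the offsets in the snapshot match exactly the records already emitted, the resumed subtask continues without repeating or skipping a record.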

Page 25: HTM & Apache Flink (2016-06-27)


Closing

Page 26: HTM & Apache Flink (2016-06-27)

Help Wanted!

Issues: github.com/htm-community/flink-htm/issues

Follow: @ApacheFlink, @dataArtisans, @Numenta

Info: http://numenta.org/