HTM & Apache Flink (2016-06-27)


Eron Wright (@eronwright)

HTM & Apache Flink: Extending Flink for Anomaly Detection with Hierarchical Temporal Memory (HTM)

What is HTM?


Hierarchical Temporal Memory (HTM) is a theory of computation for the neocortex.

History


2005 – 2009
• HTM theory
• First-generation algorithms
• Hierarchy and vision problems
• Vision Toolkit

2009 – 2012
• Cortical Learning Algorithms
• SDRs, sequence memory, continuous learning
• Applications exploration

2013 – 2015
• Continued HTM development
• NuPIC open source project
• Grok for anomaly detection

2014 –
• Sensorimotor
• Goal-directed behavior
• Sequence classification

(Timeline axis labels: 2002, 2004, 2005)

Source: http://www.slideshare.net/numenta/why-neurons-have-thousands-of-synapses-a-model-of-sequence-memory-in-the-brain

Computational Properties

• Online, Unsupervised Learning
• High-order Representations
  • For example: sequences “ABCD” vs “XBCY”
• Multiple Simultaneous Predictions
  • For example: “BC” predicts both “D” and “Y”
• Anomaly Scores
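The high-order and multiple-prediction properties can be illustrated with a toy model. The sketch below is not HTM (no SDRs, no cortical learning algorithm); it is a plain variable-order context table, and `ToySequenceMemory` and its methods are hypothetical names chosen for illustration. It only reproduces the behaviour of the “ABCD” vs “XBCY” example.

```python
from collections import defaultdict


class ToySequenceMemory:
    """Toy variable-order sequence model (a stand-in, NOT the HTM algorithm)."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        # Maps a context tuple (the last 1..max_order symbols) to the set of
        # symbols that have been observed right after it.
        self.table = defaultdict(set)

    def learn(self, sequence):
        """Record every context of length 1..max_order and its successor."""
        for i in range(1, len(sequence)):
            for order in range(1, self.max_order + 1):
                if i - order < 0:
                    break
                context = tuple(sequence[i - order:i])
                self.table[context].add(sequence[i])

    def predict(self, history):
        """Predict next symbols using the longest context seen in training."""
        for order in range(min(self.max_order, len(history)), 0, -1):
            context = tuple(history[-order:])
            if context in self.table:
                return set(self.table[context])
        return set()

    def anomaly_score(self, history, actual):
        """Return 0.0 if `actual` was among the predictions, else 1.0."""
        return 0.0 if actual in self.predict(history) else 1.0


memory = ToySequenceMemory()
memory.learn("ABCD")
memory.learn("XBCY")

print(sorted(memory.predict("BC")))      # ambiguous context: two predictions
print(sorted(memory.predict("ABC")))     # the leading "A" disambiguates
print(memory.anomaly_score("ABC", "Y"))  # "Y" after "ABC" was never seen
```

With only “BC” as context the model holds both continuations at once; adding the earlier symbol narrows the prediction, which is the point of a high-order representation.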


Implementations of HTM

• Numerous Implementations
  • NuPIC – official reference library (Python/C++)
  • HTM.java – community-supported library (Java)
• Evolving Rapidly
  • Tracking the theory!


NuPIC learns the time-based patterns in data, predicts future values, and detects anomalies.
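As a rough illustration of how a prediction turns into an anomaly score, the sketch below computes the fraction of currently active columns that were not predicted at the previous step. This follows the raw anomaly score as it is commonly described for NuPIC; the function name and the plain-set representation are assumptions made here, not NuPIC's API.

```python
def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were NOT predicted (0.0 to 1.0).

    Assumption: columns are given as iterables of integer indices.
    """
    active = set(active_columns)
    if not active:
        # No activity means there is nothing that could be unexpected.
        return 0.0
    overlap = len(active & set(predicted_columns))
    return (len(active) - overlap) / len(active)


print(raw_anomaly_score([1, 2, 3, 4], [1, 2, 3, 4]))  # fully predicted
print(raw_anomaly_score([1, 2, 3, 4], [1, 2]))        # half predicted
print(raw_anomaly_score([1, 2], []))                  # nothing predicted
```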


Introducing Flink-HTM


flink-htm provides HTM-based learning operators for the Flink DataStream API, based on HTM.java.

Benefits

• Good fit for Apache Flink
  • Automated model-building
  • Continuous learning
  • Temporal awareness


Contrast with: github.com/StephanEwen/flink-demos/tree/master/streaming-state-machine

Benefits (cont’d)

• Good fit for HTM
  • Integration w/ data pipeline
  • Data connectivity
    • e.g. Kafka, Twitter, HDFS, AWS Kinesis
  • DSL for stream pre- and post-processing
    • e.g. aggregation, transformation
  • Distributed, reliable processing
  • Event-Time Awareness


Features

• `Learn` Operator
  • Feeds input data to an HTM model
  • Emits predictions and anomaly scores
  • Supports keyed and non-keyed streams
• Checkpoint Integration
  • Models are serialized
  • Facilitates exactly-once processing
• Numenta RiverView Connector
  • Public-domain temporal datasets
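To see why serializing the model helps with exactly-once processing, consider the sketch below. It is a plain-Python simulation, not Flink or flink-htm code: `CountingModel` is a hypothetical stand-in for a learned model, and the checkpoint is simply a pair of source offset and serialized model bytes. After a simulated failure, the model is restored and the source is replayed from the recorded offset, so each record affects the model state exactly once.

```python
import json


class CountingModel:
    """Stand-in for a learned model; its whole state is two running numbers."""

    def __init__(self, count=0, total=0.0):
        self.count = count
        self.total = total

    def compute(self, value):
        """Learn from `value` and return the running mean as the 'prediction'."""
        self.count += 1
        self.total += value
        return self.total / self.count

    def snapshot(self):
        """Serialize the model state to bytes for a checkpoint."""
        state = {"count": self.count, "total": self.total}
        return json.dumps(state).encode("utf-8")

    @classmethod
    def restore(cls, blob):
        """Rebuild a model from bytes produced by `snapshot`."""
        state = json.loads(blob.decode("utf-8"))
        return cls(state["count"], state["total"])


values = [1.0, 2.0, 3.0, 4.0]

model = CountingModel()
for v in values[:2]:
    model.compute(v)
checkpoint = (2, model.snapshot())  # (next source offset, model bytes)

model.compute(values[2])            # processed after the checkpoint...
del model                           # ...then lost in a simulated failure

offset, blob = checkpoint
recovered = CountingModel.restore(blob)
for v in values[offset:]:           # replay the source from the saved offset
    last = recovered.compute(v)

print(recovered.count, last)
```

The recovered model ends in the same state as an uninterrupted run, because the record processed after the checkpoint is replayed rather than counted twice.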


General Approach

1. Define Input Type
2. Add Data Source
3. Apply Learn Operator
  • w/ HTM Network Definition
  • w/ Field Encoders
4. Define Select Function
  • Process the inference data (predictions & anomaly scores)
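The four steps can be sketched end to end in plain Python. This is not the flink-htm API: `Reading`, `Inference`, `LastValueModel`, `source`, and `learn` are hypothetical names, and the model is a trivial last-value predictor standing in for an HTM network with its field encoders. The sketch keeps one model per key, as a keyed stream would.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


# Step 1: define the input type (hypothetical record layout).
@dataclass
class Reading:
    sensor_id: str
    value: float


@dataclass
class Inference:
    prediction: Optional[float]  # value expected for the NEXT record
    anomaly_score: float         # 0.0 = expected, 1.0 = fully unexpected


class LastValueModel:
    """Stand-in for an HTM network: predicts that the value repeats."""

    def __init__(self, tolerance: float = 1.0):
        self.tolerance = tolerance
        self.predicted: Optional[float] = None

    def compute(self, value: float) -> Inference:
        if self.predicted is None:
            score = 0.0  # nothing was predicted yet, so nothing to violate
        else:
            score = min(1.0, abs(value - self.predicted) / self.tolerance)
        self.predicted = value
        return Inference(prediction=value, anomaly_score=score)


# Step 2: add a data source (here just an in-memory generator).
def source() -> Iterator[Reading]:
    for v in [10.0, 10.1, 10.0, 25.0, 10.0]:
        yield Reading("s1", v)


# Steps 3 and 4: apply the learn step, then the select function per record.
def learn(stream, key_of, model_factory, select):
    models = {}  # one model per key, as on a keyed stream
    for record in stream:
        key = key_of(record)
        if key not in models:
            models[key] = model_factory()
        yield select(record, models[key].compute(record.value))


results = list(
    learn(
        source(),
        key_of=lambda r: r.sensor_id,
        model_factory=LastValueModel,
        select=lambda r, inf: (r.value, round(inf.anomaly_score, 2)),
    )
)
print(results)
```

The select function is the only place where the caller decides what to keep from the inference, which keeps the learn step generic.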


Advanced Topics

• `Reset` Function
  • Indicates the start of a temporal sequence
  • For example: A,B,C,D,E, (reset), A,B,C,D,E
• Stateful Functions
  • Use `mapWithState` to store predictions for the future
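Both ideas can be shown with a small simulation. The sketch below is plain Python, not Flink: `FirstOrderModel`, `RESET`, and `score_stream` are hypothetical names. The reset marker stops the model from learning a transition across the sequence boundary, and the `pending` variable plays the role of the per-key state that a function such as `mapWithState` would carry, holding each prediction until the next event arrives to be checked against it.

```python
RESET = object()  # sentinel marking the start of a new temporal sequence


class FirstOrderModel:
    """Toy model: remembers which symbol followed which (not HTM)."""

    def __init__(self):
        self.transitions = {}  # symbol -> set of successors seen so far
        self.previous = None

    def reset(self):
        """Forget the current context; learned transitions are kept."""
        self.previous = None

    def compute(self, symbol):
        """Learn from `symbol`; return the set predicted to follow it."""
        if self.previous is not None:
            self.transitions.setdefault(self.previous, set()).add(symbol)
        self.previous = symbol
        return set(self.transitions.get(symbol, ()))


def score_stream(events, model):
    """Yield (symbol, was_predicted), holding each prediction for one step."""
    pending = None  # state carried between events
    for event in events:
        if event is RESET:
            model.reset()
            pending = None
            continue
        was_predicted = pending is not None and event in pending
        pending = model.compute(event)
        yield event, was_predicted


model = FirstOrderModel()
events = list("ABCDE") + [RESET] + list("ABCDE")
results = list(score_stream(events, model))
print(results[5:])
print("E" in model.transitions)
```

On the second pass every symbol after “A” is predicted, and because of the reset the model never learns a spurious “E” to “A” transition.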


Extending Flink

Streaming API/DSL

• Java
  1. Static Entrypoint, then
  2. Intermediate Representation (e.g. HTMStream), then
  3. DataStream!
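The three-step shape of such a DSL can be sketched independently of Flink. The Python below is a hedged illustration of the pattern only: the `HTM` and `HTMStream` classes here are hypothetical stand-ins written for this example, an ordinary iterable plays the role of a DataStream, and the model is any callable.

```python
class HTMStream:
    """Hypothetical intermediate representation: holds the input and model
    until `select` turns the pair back into an ordinary stream."""

    def __init__(self, records, model):
        self._records = records
        self._model = model

    def select(self, fn):
        # Step 3: leave the intermediate representation and return a plain
        # stream (here, a generator) of whatever `fn` produces.
        return (fn(record, self._model(record)) for record in self._records)


class HTM:
    @staticmethod
    def learn(records, model):
        # Step 1: static entry point.
        # Step 2: return the intermediate representation, not a stream yet.
        return HTMStream(records, model)


out = list(
    HTM.learn([1, 2, 3], model=lambda x: x * 10)
       .select(lambda record, inference: (record, inference))
)
print(out)
```

The intermediate type exists so that DSL-specific methods such as `select` do not have to be added to the general stream type.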


Streaming API/DSL (cont’d)

• Scala
  1. `RichDataStream` extensions
  2. Scala Functions
  3. Scala-Specific TypeInformation
• Other
  • Serialization Hooks
  • Clean your closures!


Learn Operator

• Implement `AbstractStreamOperator`
• Respect Flink’s type system
  • Use the `TypeInformation` class
• Use the State Handle abstraction
  • * keyed streams only
• Instrument your code
  • Accumulators


RiverView Connector

• Extend `RichParallelSourceFunction`
  • Parallelism is user-defined
  • Must handle partition assignment
• Mix in `Checkpointed`
  • Synchronize on checkpoint lock
• Support cancel/stop
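Partition assignment in a parallel source usually means that each subtask deterministically picks its own share of the input. The sketch below shows one common scheme, round-robin over a sorted list; it is an assumption for illustration, not necessarily what the connector does, and the stream names are made up. In Flink the two numbers would come from the runtime context (`getIndexOfThisSubtask()` and `getNumberOfParallelSubtasks()`).

```python
def assign_partitions(partitions, subtask_index, parallelism):
    """Return the partitions owned by one subtask (round-robin over sorted input).

    Sorting first makes the result independent of the order in which each
    subtask happens to list the partitions.
    """
    if parallelism < 1 or not 0 <= subtask_index < parallelism:
        raise ValueError("subtask_index must be in [0, parallelism)")
    ordered = sorted(partitions)
    return [p for i, p in enumerate(ordered) if i % parallelism == subtask_index]


# Hypothetical stream names, for illustration only.
streams = ["stream-d", "stream-a", "stream-c", "stream-b", "stream-e"]
parallelism = 2

assignment = [assign_partitions(streams, i, parallelism) for i in range(parallelism)]
print(assignment)
```

Every partition is owned by exactly one subtask, and a subtask with no share simply gets an empty list, which matters when the user sets the parallelism higher than the number of partitions.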


Closing

Help Wanted!


Issues: github.com/htm-community/flink-htm/issues

Follow: @ApacheFlink, @dataArtisans, @Numenta

Info: http://numenta.org/