Eron Wright (@eronwright)
HTM & Apache Flink
Extending Flink for Anomaly Detection with Hierarchical Temporal Memory (HTM)
What is HTM?
Hierarchical Temporal Memory (HTM) is a theory of
computation for the neocortex.
History
2005 – 2009
• HTM theory
• First-generation algorithms
• Hierarchy and vision problems
• Vision Toolkit
2009 – 2012
• Cortical Learning Algorithms
• SDRs, sequence memory, continuous learning
• Applications exploration
2013 – 2015
• Continued HTM development
• NuPIC open source project
• Grok for anomaly detection
2014 –
• Sensorimotor
• Goal-directed behavior
• Sequence classification
Timeline axis markers: 2002, 2004, 2005
http://www.slideshare.net/numenta/why-neurons-have-thousands-of-synapses-a-model-of-sequence-memory-in-the-brain
Computational Properties
• Online, Unsupervised Learning
• High-order Representations
  • For example: sequences “ABCD” vs “XBCY”
• Multiple Simultaneous Predictions
  • For example: “BC” predicts both “D” and “Y”
• Anomaly Scores
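The “ABCD” vs “XBCY” example can be made concrete with a toy sketch. This is not HTM and not the NuPIC or HTM.java API: `ContextPredictor` is a hypothetical class that only counts which symbol followed each context of a fixed length. With two symbols of context, “BC” predicts both “D” and “Y”; with three symbols, the prediction after “ABC” is unambiguous.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy order-k predictor (hypothetical): remembers which symbols followed each k-symbol context. */
final class ContextPredictor {
    private final int order;
    private final Map<String, Set<Character>> followers = new HashMap<>();

    ContextPredictor(int order) {
        this.order = order;
    }

    /** Learn every (context -> next symbol) pair found in one sequence. */
    void learn(String sequence) {
        for (int i = order; i < sequence.length(); i++) {
            String context = sequence.substring(i - order, i);
            followers.computeIfAbsent(context, k -> new TreeSet<>()).add(sequence.charAt(i));
        }
    }

    /** All symbols that were seen after the last `order` symbols of `history`. */
    Set<Character> predict(String history) {
        if (history.length() < order) {
            return new TreeSet<>();
        }
        String context = history.substring(history.length() - order);
        return followers.getOrDefault(context, new TreeSet<>());
    }

    public static void main(String[] args) {
        ContextPredictor low = new ContextPredictor(2);
        ContextPredictor high = new ContextPredictor(3);
        for (String s : new String[] {"ABCD", "XBCY"}) {
            low.learn(s);
            high.learn(s);
        }
        System.out.println(low.predict("ABC"));  // [D, Y]
        System.out.println(high.predict("ABC")); // [D]
    }
}
```

A fixed context length is the simplification here; HTM sequence memory is described as learning high-order context without a preset order.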
Implementations of HTM
• Numerous Implementations
  • NuPIC – official reference library (Python/C++)
  • HTM.java – community-supported library (Java)
• Evolving Rapidly
  • Tracking the theory!
NuPIC learns the time-based patterns in data, predicts future values, and
detects anomalies.
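As a rough illustration of how prediction and anomaly detection connect, the sketch below computes a raw anomaly score as the fraction of currently active columns that were not predicted at the previous step. This follows the formula commonly given for HTM's raw anomaly score; the class and method names are hypothetical, and real libraries may post-process the raw value further.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: raw anomaly score from active vs. previously predicted columns. */
final class RawAnomaly {
    private RawAnomaly() {}

    /** Returns 0.0 when every active column was predicted, 1.0 when none was. */
    static double score(Set<Integer> active, Set<Integer> predicted) {
        if (active.isEmpty()) {
            return 0.0; // nothing active, nothing unexpected
        }
        Set<Integer> hit = new HashSet<>(active);
        hit.retainAll(predicted); // active columns that were predicted
        return (active.size() - hit.size()) / (double) active.size();
    }

    public static void main(String[] args) {
        System.out.println(score(Set.of(1, 2, 3, 4), Set.of(1, 2, 3, 4))); // 0.0
        System.out.println(score(Set.of(1, 2, 3, 4), Set.of(3, 4, 9)));    // 0.5
        System.out.println(score(Set.of(1, 2, 3, 4), Set.of()));           // 1.0
    }
}
```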
Introducing Flink-HTM
flink-htm provides HTM-based learning operators for the Flink
DataStream API, based on HTM.java.
Benefits
• Good fit for Apache Flink
  • Automated model-building
  • Continuous learning
  • Temporal awareness
Contrast with: github.com/StephanEwen/flink-demos/tree/master/streaming-state-machine
Benefits (cont’d)
• Good fit for HTM
  • Integration w/ data pipeline
  • Data connectivity
    • e.g. Kafka, Twitter, HDFS, AWS Kinesis
  • DSL for stream pre- and post-processing
    • e.g. aggregation, transformation
  • Distributed, reliable processing
  • Event-Time Awareness
Features
• `Learn` Operator
  • Feeds input data to an HTM model
  • Emits predictions and anomaly scores
  • Supports keyed and non-keyed streams
• Checkpoint Integration
  • Models are serialized
  • Facilitates exactly-once processing
• Numenta RiverView Connector
  • Public-domain temporal datasets
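The keyed-stream and checkpoint features can be pictured with a small sketch. It is plain Java, not the flink-htm or Flink API: `KeyedModels` and `ToyModel` are hypothetical stand-ins that keep one model per key and serialize all models into a byte array, which is then restored after a simulated failure. The byte layout is an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-key model store with snapshot/restore. */
final class KeyedModels {
    /** Stand-in for an HTM model: remembers how many records it saw and the last value. */
    static final class ToyModel {
        long seen;
        double last;

        void learn(double value) {
            seen++;
            last = value;
        }
    }

    private final Map<String, ToyModel> models = new HashMap<>();

    /** Route a record to the model that belongs to its key, creating it on first use. */
    void learn(String key, double value) {
        models.computeIfAbsent(key, k -> new ToyModel()).learn(value);
    }

    long seen(String key) {
        ToyModel m = models.get(key);
        return m == null ? 0 : m.seen;
    }

    /** Serialize every model: entry count, then (key, seen, last) per entry. */
    byte[] snapshot() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(models.size());
            for (Map.Entry<String, ToyModel> entry : models.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue().seen);
                out.writeDouble(entry.getValue().last);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Replace the current models with the ones stored in a snapshot. */
    void restore(byte[] snapshot) {
        models.clear();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(snapshot))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                ToyModel m = new ToyModel();
                m.seen = in.readLong();
                m.last = in.readDouble();
                models.put(key, m);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        KeyedModels state = new KeyedModels();
        state.learn("sensor-a", 1.0);
        state.learn("sensor-a", 2.0);
        byte[] checkpoint = state.snapshot(); // checkpoint taken here
        state.learn("sensor-a", 3.0);         // processed after the checkpoint
        state.restore(checkpoint);            // simulated failure and recovery
        System.out.println(state.seen("sensor-a")); // 2
    }
}
```

After the restore, the model has forgotten the third record, so the source must replay it; restoring model state together with replay is what lets each record affect the model exactly once.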
NYC Traffic Example
http://data.numenta.org/nyc-traffic/meta.html
General Approach
1. Define Input Type
2. Add Data Source
3. Apply Learn Operator
  • w/ HTM Network Definition
  • w/ Field Encoders
4. Define Select Function
  1. Process the inference data (predictions & anomaly scores)
Advanced Topics
• `Reset` Function
  • Indicates the start of a temporal sequence
  • For example: A,B,C,D,E, (reset), A,B,C,D,E
• Stateful Functions
  • Use `mapWithState` to store predictions for the future
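Both ideas can be sketched in a few lines of plain Java. This is not Flink's `mapWithState` or the flink-htm reset function: `PredictionTracker` is a hypothetical class whose `pending` field plays the role of the stored prediction, and whose `reset()` marks a sequence boundary so that no transition is learned across it. The first-order transition table is a toy stand-in for an HTM model.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stateful function: stores a prediction and checks it against the next input. */
final class PredictionTracker {
    private final Map<String, String> next = new HashMap<>(); // learned transitions
    private String previous; // last symbol seen in the current sequence
    private String pending;  // prediction stored for the upcoming symbol

    /** Start of a new temporal sequence: forget the position, keep what was learned. */
    void reset() {
        previous = null;
        pending = null;
    }

    /** Returns "hit", "miss", or "n/a" for the stored prediction, then stores a new one. */
    String process(String symbol) {
        String verdict = pending == null ? "n/a" : (pending.equals(symbol) ? "hit" : "miss");
        if (previous != null) {
            next.put(previous, symbol); // learn the transition previous -> symbol
        }
        previous = symbol;
        pending = next.get(symbol); // may be null if nothing is known yet
        return verdict;
    }

    boolean hasTransition(String from) {
        return next.containsKey(from);
    }

    public static void main(String[] args) {
        PredictionTracker tracker = new PredictionTracker();
        String[] sequence = {"A", "B", "C", "D", "E"};
        for (String s : sequence) {
            tracker.process(s); // first pass: nothing to predict yet
        }
        tracker.reset(); // sequence boundary
        StringBuilder verdicts = new StringBuilder();
        for (String s : sequence) {
            verdicts.append(tracker.process(s)).append(' ');
        }
        System.out.println(verdicts.toString().trim()); // n/a hit hit hit hit
        System.out.println(tracker.hasTransition("E")); // false
    }
}
```

Without the `reset()` call, the tracker would learn a spurious E-to-A transition at the boundary between the two sequences.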
Extending Flink
Streaming API/DSL
• Java
  1. Static Entrypoint, then
  2. Intermediate Representation (e.g. HTMStream), then
  3. DataStream!
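The three-step shape can be sketched without Flink on the classpath. Everything here is a hypothetical stand-in: `ToyDsl.learn` is the static entrypoint, `LearnedStream` is the intermediate representation, and `java.util.stream.Stream` takes the place of Flink's `DataStream`. The scoring function is a placeholder for a real model.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical DSL; `Stream` stands in for Flink's DataStream. */
final class ToyDsl {
    private ToyDsl() {}

    /** 1. Static entrypoint: wraps the input and returns the intermediate representation. */
    static <T> LearnedStream<T> learn(Stream<T> input, Function<T, Double> scorer) {
        return new LearnedStream<>(input, scorer);
    }

    /** Result of applying the (toy) model to one record. */
    static final class Inference<T> {
        final T input;
        final double score;

        Inference(T input, double score) {
            this.input = input;
            this.score = score;
        }
    }

    /** 2. Intermediate representation: exposes only DSL-specific operations. */
    static final class LearnedStream<T> {
        private final Stream<T> input;
        private final Function<T, Double> scorer;

        LearnedStream(Stream<T> input, Function<T, Double> scorer) {
            this.input = input;
            this.scorer = scorer;
        }

        /** 3. Back to an ordinary stream, so the usual operators apply again. */
        <R> Stream<R> select(Function<Inference<T>, R> fn) {
            return input.map(x -> fn.apply(new Inference<>(x, scorer.apply(x))));
        }
    }

    public static void main(String[] args) {
        List<String> out = ToyDsl.learn(Stream.of(1.0, 2.0, 50.0), v -> v > 10 ? 1.0 : 0.0)
                .select(inf -> inf.input + ":" + inf.score)
                .collect(Collectors.toList());
        System.out.println(out); // [1.0:0.0, 2.0:0.0, 50.0:1.0]
    }
}
```

Returning an intermediate type keeps DSL-only operations such as `select` off the general stream type, while the final step hands control back to the standard API.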
Streaming API/DSL (cont’d)
• Scala
  1. `RichDataStream` extensions
  2. Scala Functions
  3. Scala-Specific TypeInformation
• Other
  • Serialization Hooks
  • Clean your closures!
Learn Operator
• Implement `AbstractStreamOperator`
• Respect Flink’s type system
  • Use the `TypeInformation` class
• Use the State Handle abstraction
  • Keyed streams only
• Instrument your code
  • Accumulators
RiverView Connector
• Extend `RichParallelSourceFunction`
  • Parallelism is user-defined
  • Must handle partition assignment
• Mix in `Checkpointed`
  • Synchronize on checkpoint lock
• Support cancel/stop
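The connector concerns listed above can be illustrated with a plain-Java sketch. It does not use Flink's classes: `ToySource` is hypothetical, the round-robin assignment is one possible scheme (an assumption, not necessarily what the connector does), and a local lock object stands in for the checkpoint lock that a Flink source obtains from its source context.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parallel source: partition assignment, lock-guarded emission, cancellation. */
final class ToySource {
    /** Round-robin assignment: subtask s owns partitions s, s + parallelism, s + 2 * parallelism, ... */
    static List<Integer> assign(int partitions, int parallelism, int subtask) {
        List<Integer> owned = new ArrayList<>();
        for (int p = subtask; p < partitions; p += parallelism) {
            owned.add(p);
        }
        return owned;
    }

    private final Object checkpointLock = new Object(); // stand-in for the framework's lock
    private final List<String> emitted = new ArrayList<>();
    private long offset; // index of the next record to read
    private volatile boolean running = true;

    /** Emit a record and advance the offset as one atomic step with respect to snapshots. */
    void run(List<String> records) {
        while (running && offset < records.size()) {
            synchronized (checkpointLock) {
                emitted.add(records.get((int) offset));
                offset++;
            }
        }
    }

    /** A snapshot taken under the same lock never sees a record emitted but not counted. */
    long snapshotOffset() {
        synchronized (checkpointLock) {
            return offset;
        }
    }

    void cancel() {
        running = false;
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, 2, 0)); // [0, 2]
        System.out.println(assign(4, 2, 1)); // [1, 3]
        ToySource source = new ToySource();
        source.run(List.of("r0", "r1", "r2"));
        System.out.println(source.snapshotOffset()); // 3
    }
}
```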
Closing
Help Wanted!
Issues: github.com/htm-community/flink-htm/issues
Follow: @ApacheFlink, @dataArtisans, @Numenta
Info: http://numenta.org/