Streaming with Heron on the Mesos/Aurora Stack
Ron Wilcom - Chief Engineer
August 30, 2017
Agenda
● Cloud Analytics - Why Streaming?
● Heron Advantages and Concepts
● Deployment/Architecture - Mesos/Aurora/Heron
● Developing a Topology in Heron
● Heron Failover and Elastic Scaling
Streaming With Heron - Ron Wilcom, Chief Engineer, Next Century Corporation - 2017
Cloud Analytics - Types
● Batch : distributed large batches, analyze history using map/reduce, RDD - stateless (slow)
● Micro-Batch : distributed small batches, RDD, allows for windowed Tx - stateful (medium?)
● Streaming : real-time data, analyze and act; continuous queries; windowed Tx - stateful (fast)
Big Data - 3 Vs
● Volume - terabytes, petabytes
● Velocity - realtime (or near)
● Variety - social, blogs, logs, sensors, etc
These may overlap… when ‘velocity’ is a factor you probably want to use streaming.
What is Streaming Analytics good for?
“Operational Intelligence” (OI) for immediate action
○ sales lead generation - real time sales promotions
○ live geolocation movement - co-mingling, travel, alerts, etc
○ detection of system problems
○ bank transactions - stock market
○ point of sale - inventory, reorder, trends
○ radar and signals events
Complementary to Batch - but some argue it’s good for everything !?
Streaming Analytics (and Processing)
● Live sensors/alerting/collection - a “stream” … hmm? (~ tuple/objects)
● Real time events and results (sec to min)
● Transactional / Windowed - stream ‘sources’ and ‘workers’ make a “topology” (DAG)
● DAGs - Directed Acyclic Graphs (stateful)
e.g. Apache Storm, IBM InfoSphere Streams, Twitter Heron, TIBCO Streambase
Streaming - Heron
What led me to research Heron
… and Aurora/Mesos?
credit: https://github.com/cncf/landscape
Streaming - Heron
● Developed by Twitter (led by Karthik Ramasamy and Sanjeev Kulkarni)
● “Rewrite” of Apache Storm (Storm also developed by Twitter)
○ Fully backwards compatible with Storm code
● Production use by Twitter since 2014 (~500m tweets per day)
● Twitter, Google, Microsoft, Machine Zone … more
○ Storm: Yahoo, Groupon, TWC, WebMD, Spotify, Yelp, Verisign, more
● Backed by Streamlio (streaml.io)
● Incubation application has been submitted to Apache
● Free and open source (compare to IBM, Tibco…)
Streaming - Heron Advantages
● isolated processing - clean mapping logical to physical (behavior/perf/profiling)
● specific resource allocation configurations per topology
● shares hardware resources with ANY other system (lowers infrastructure cost)
● no single point of failure - automatic failure restarts/reallocation
● ‘backpressure’ mechanism automatically stabilizes throughput
● elastic scaling of nodes based on performance thresholds (new!)
● higher throughput at scale with less resource usage (~10x better vs Storm)
● modular and agnostic to resource mgrs and schedulers (*somewhat)
● fits well with ‘Java shops’ - jobs run as JVM containers - Java, Scala, Groovy, Clojure, JRuby (Python)
Streaming - Heron (== Apache Storm concepts)
● the “topology” (a DAG)
○ logical plan (vs physical)
● sources = “spouts”
○ streaming feed / direction
○ backpressure management
○ “replay” point
○ HDFS, Kafka, NiFi, etc
● workers = “bolts”
○ transform, enrich, filter, and join
○ data stays live^
○ pushes back out to storage/memory
● flow = “stream groupings”
○ shuffle* = load balanced
○ field* = determined by field values
○ direct = one way
○ all = replicate to many
○ global = join/narrow to one
^overhead of fault tolerance and metrics
credit: https://twitter.github.io/heron/docs/concepts/architecture/
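A minimal sketch of how the “stream groupings” above could choose a downstream bolt instance. This is not Heron's implementation: the class `GroupingSketch`, its method names, round-robin for shuffle, and hash-modulo for fields are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: picks a target task index for a few grouping styles. */
class GroupingSketch {
    private int nextTask = 0; // round-robin cursor used by the shuffle sketch

    /** Shuffle grouping: spread tuples evenly (assumed here to be round-robin). */
    int shuffle(int numTasks) {
        int task = nextTask;
        nextTask = (nextTask + 1) % numTasks;
        return task;
    }

    /** Fields grouping: the same field value always lands on the same task. */
    static int fields(Object fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    /** All grouping: every task receives a copy of the tuple. */
    static List<Integer> all(int numTasks) {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(i);
        }
        return tasks;
    }

    /** Global grouping: everything narrows to one task (task 0 in this sketch). */
    static int global() {
        return 0;
    }

    public static void main(String[] args) {
        GroupingSketch grouping = new GroupingSketch();
        for (String country : new String[] {"France", "Spain", "France"}) {
            System.out.println(country
                    + " -> shuffle task " + grouping.shuffle(4)
                    + ", fields task " + fields(country, 4));
        }
    }
}
```

The point to notice: both “France” tuples get the same fields task but different shuffle tasks, which is why fields grouping suits per-key state and shuffle suits stateless work.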
Streaming - Heron
Now we have a “topology” - let's distribute it across our cluster/cloud .. 5x, 25x, 100x, 500x! How do we:
● Deploy all of those!?
● Manage server resources?!
● Monitor performance?
● Deal with node or topology failures?
Streaming - Heron
“We” DON’T ... instead think of it this way ...
“Pets vs Cattle”
Streaming - Heron - Stack of Products
● HDFS (Apache Hadoop Distributed File System)
● Zookeeper (Apache Distributed Coordination)
● Mesos (Apache Distributed Resource Manager)
● Aurora (Apache/Twitter Distributed Scheduler)
● Heron (Twitter Distributed Stream Processing)
Why choose this over others? How do these fit together?! Configure each - ouch!!
Note: this is the “prescribed stack” - but there are many combinations (modular)!
Heron is Modular … so other examples:
● SCP/Mesos/Marathon/Heron
● HDFS/YARN/Heron
● SCP/Slurm/Heron
● GlusterFS/Kubernetes/Heron
● … and more?
Mesos/Aurora/Heron - Responsibilities
● Aurora - job scheduling, deployment, and monitoring
● Mesos - resource management, task deployment
● Aurora Thermos - process execution
● Heron - streaming, processing, analytics, metrics, self health reporting
credit: http://aurora.apache.org/documentation
Streaming - Heron - Physical Deployment
“Leader” VM (.. repeat for ‘Standby’ instances)
● Zookeeper
● HDFS Master
● Mesos Master
● Aurora Scheduler
● Aurora Client
● Heron Binaries
● Topology Package
● Heron Tracker/UI
VM#x … VM#xxx
● HDFS Data Node
● Mesos Agent
● Aurora Thermos
Streaming - Heron - Architecture
(standby instances not represented)
● Zookeeper - Used as the registry for all services to synchronize their configurations and coordinate addresses, etc.
● HDFS - Used to distribute Heron binaries and *your* topology packages to available resources.
Streaming - Heron - Architecture
(standby instances not represented)
[Diagram: Zookeeper, HDFS, Mesos Master, and three Mesos Agents, each offering 8 CPU, 16G RAM, 40G Disk]
● Mesos Master - Central service to manage available resources across the cluster - tracks what system resources are available for jobs.
● Mesos Agent - One to many agents report to the master what they are offering up as resources (CPUs, RAM, Disk).
Mesos Monitoring Tools
Streaming - Heron - Architecture
(standby instances not represented)
[Diagram: Zookeeper, HDFS, Mesos Master, three Mesos Agents, Aurora Scheduler, and an Aurora Thermos executor on each agent]
● Aurora Scheduler - Schedules jobs to Mesos for execution based on requested resources - continual retry on deployment and will monitor/redeploy the job (aka a Mesos ‘framework’).
● Aurora Thermos - Executor for tasks derived from the scheduled jobs. Directed by Mesos/Aurora Scheduler.
Streaming - Heron - Architecture
(standby instances not represented)
[Diagram: Zookeeper, HDFS, Mesos Master, three Mesos Agents, Aurora Scheduler, three Aurora Thermos executors, Heron Binaries / Topology Package]
LOTS OF CONFIGURATION …. then ….
Submit Heron topology to Aurora …
$ heron submit {cluster}/{role}/{prod/stag/devel} my-topology.jar com.my.MyTopology MyTopology
… the Heron configuration is setup for Aurora … and the topology requests: resources (CPU, RAM, Disk) and the number of instances.
Constructed job initial configuration...
[aurora, job, create, --wait-until, RUNNING, --bind, TOPOLOGY_NAME=AckingTopology, --bind, SANDBOX_SYSTEM_YAML=./heron-conf/heron_internals.yaml, --bind, COMPONENT_RAMMAP=exclaim1:1073741824,word:1073741824, --bind, SANDBOX_METRICS_YAML=./heron-conf/metrics_sinks.yaml, --bind, INSTANCE_JVM_OPTS_IN_BASE64="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg==", --bind, ROLE=tester, --bind, SANDBOX_PYTHON_INSTANCE_BINARY=./heron-core/bin/heron-python-instance, --bind, ENVIRON=devel, --bind, SANDBOX_SCHEDULER_CLASSPATH=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*, --bind, SANDBOX_INSTANCE_CLASSPATH=./heron-core/lib/instance/*, --bind, ISPRODUCTION=false, --bind, TOPOLOGY_CLASSPATH=heron-examples.jar, --bind, CLUSTER=tcop, --bind, SANDBOX_EXECUTOR_BINARY=./heron-core/bin/heron-executor, --bind, STATEMGR_CONNECTION_STRING=10.10.20.26:2181, --bind, COMPONENT_JVM_OPTS_IN_BASE64="", --bind, TOPOLOGY_ID=AckingTopology830a100b-45f6-46de-9497-3e6c4b2034ef, --bind, TOPOLOGY_PACKAGE_URI=/heron/topologies/tcop/AckingTopology-tester-tag-0--4824306064619316036.tar.gz, --bind, SANDBOX_STMGR_BINARY=./heron-core/bin/heron-stmgr, --bind, CORE_PACKAGE_URI=/heron/dist/heron-core.tar.gz, --bind, SANDBOX_METRICSMGR_CLASSPATH=./heron-core/lib/metricsmgr/*, --bind, TOPOLOGY_PACKAGE_TYPE=jar, --bind, RAM_PER_CONTAINER=6442450944, --bind, SANDBOX_TMASTER_BINARY=./heron-core/bin/heron-tmaster, --bind, TOPOLOGY_BINARY_FILE=heron-examples.jar, --bind, TOPOLOGY_DEFINITION_FILE=AckingTopology.defn, --bind, NUM_CONTAINERS=2, --bind, CPUS_PER_CONTAINER=5.0, --bind, SANDBOX_SHELL_BINARY=./heron-core/bin/heron-shell, --bind, DISK_PER_CONTAINER=17179869184, --bind, STATEMGR_ROOT_PATH=/heron, --bind, HERON_SANDBOX_JAVA_HOME=/opt/jdk1.8.0_91, tcop/tester/devel/AckingTopology, /home/rwilcom/.heron/conf/tcop/heron.aurora, --verbose]
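Several bindings in the command above are hard to read at a glance: sizes are given in bytes and the JVM options are Base64-encoded. A small sketch that decodes them, using only the JDK; the class name `JobBindingDecoder` and its helper names are made up for this example, while the input values are copied from the command.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative helper for reading the --bind values in the generated Aurora command. */
class JobBindingDecoder {
    static final long MIB = 1024L * 1024L;

    /** Decodes a Base64 binding such as INSTANCE_JVM_OPTS_IN_BASE64. */
    static String decodeJvmOpts(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    /** Converts a byte count (e.g. RAM_PER_CONTAINER) to whole mebibytes. */
    static long bytesToMiB(long bytes) {
        return bytes / MIB;
    }

    public static void main(String[] args) {
        System.out.println("INSTANCE_JVM_OPTS  = "
                + decodeJvmOpts("LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg=="));
        System.out.println("RAM_PER_CONTAINER  = " + bytesToMiB(6442450944L) + " MiB");
        System.out.println("DISK_PER_CONTAINER = " + bytesToMiB(17179869184L) + " MiB");
        System.out.println("exclaim1 / word    = " + bytesToMiB(1073741824L) + " MiB each");
    }
}
```

Decoded, the request is 6144 MiB (6G) RAM and 16384 MiB (16G) disk per container, 1024 MiB per component, and the JVM option -XX:+HeapDumpOnOutOfMemoryError.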
Constructed job final configuration...
JobConfiguration(instanceCount=2, cronSchedule=None, cronCollisionPolicy=0, key=JobKey(environment=u'devel', role=u'tester', name=u'AckingTopology'), taskConfig=TaskConfig(isService=True, priority=0, taskLinks={}, tier=None, executorConfig=ExecutorConfig(data='{"environment": "devel", "health_check_config": {"initial_interval_secs": 15.0, "health_checker": {"http": {"expected_response_code": 0, "endpoint": "/health", "expected_response": "ok"}}, "interval_secs": 10.0, "timeout_secs": 1.0, "max_consecutive_failures": 0}, "name": "AckingTopology", "service": true, "max_task_failures": 1, "cron_collision_policy": "KILL_EXISTING", "enable_hooks": false, "cluster": "tcop", "task": {"processes": [{"daemon": false, "name": "fetch_heron_system", "ephemeral": false, "max_failures": 1, "min_duration": 5, "cmdline": "/opt/hadoop/hadoop/bin/hdfs dfs -get /heron/dist/heron-core.tar.gz heron-core.tar.gz && tar zxf heron-core.tar.gz", "final": false}, {"daemon": false, "name": "fetch_user_package", "ephemeral": false, "max_failures": 1, "min_duration": 5, "cmdline": "/opt/hadoop/hadoop/bin/hdfs dfs -get /heron/topologies/tcop/AckingTopology-tester-tag-0--4824306064619316036.tar.gz topology.tar.gz && tar zxf topology.tar.gz", "final": false}, {"daemon": false, "name": "launch_heron_executor", "ephemeral": false, "max_failures": 1, "min_duration": 5, "cmdline": "./heron-core/bin/heron-executor {{mesos.instance}} AckingTopology AckingTopology830a100b-45f6-46de-9497-3e6c4b2034ef AckingTopology.defn 10.10.20.26:2181 /heron ./heron-core/bin/heron-tmaster ./heron-core/bin/heron-stmgr \\"./heron-core/lib/metricsmgr/*\\" \\"LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg==\\" \\"heron-examples.jar\\" {{thermos.ports[port1]}} {{thermos.ports[port2]}} {{thermos.ports[port3]}} ./heron-conf/heron_internals.yaml exclaim1:1073741824,word:1073741824 \\"\\" jar heron-examples.jar /opt/jdk1.8.0_91 {{thermos.ports[http]}} ./heron-core/bin/heron-shell {{thermos.ports[port4]}} tcop tester devel 
\\"./heron-core/lib/instance/*\\" ./heron-conf/metrics_sinks.yaml \\"./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*\\" \\"{{thermos.ports[scheduler]}}\\" ./heron-core/bin/heron-python-instance", "final": false}, {"daemon": false, "name": "discover_profiler_port", "ephemeral": false, "max_failures": 1, "min_duration": 5, "cmdline": "echo {{thermos.ports[yourkit]}} > yourkit.port", "final": false}], "name": "setup_and_run", "finalization_wait": 30, "max_failures": 1, "max_concurrency": 0, "resources": {"disk": 17179869184, "ram": 6442450944, "cpu": 5.0}, "constraints": [{"order": ["fetch_heron_system", "fetch_user_package", "launch_heron_executor", "discover_profiler_port"]}]}, "production": false, "role": "tester", "announce": {"primary_port": "http", "portmap": {"aurora": "http"}}, "lifecycle": {"http": {"graceful_shutdown_endpoint": "/quitquitquit", "port": "health", "shutdown_endpoint": "/abortabortabort"}}, "priority": 0}', name='AuroraExecutor'), metadata=frozenset([]), requestedPorts=set([u'http', u'yourkit', u'scheduler', u'port4', u'port2', u'port3', u'port1']), jobName=u'AckingTopology', environment=u'devel', ramMb=6144, job=JobKey(environment=u'devel', role=u'tester', name=u'AckingTopology'), production=False, diskMb=16384, owner=Identity(role=u'tester', user='rwilcom'), container=Container(docker=None, mesos=MesosContainer()), maxTaskFailures=1, contactEmail=None, numCpus=5.0, constraints=set([])), owner=Identity(role=u'tester', user='rwilcom'))
Streaming - Heron - Architecture
(standby instances not represented)
[Diagram: Zookeeper, HDFS, Mesos Master, three Mesos Agents, Aurora Scheduler, three Aurora Thermos executors, Heron Binaries / Topology Package; two agents are each marked 5 CPU, 6G RAM, 16G Disk]
Aurora submits the job to Mesos - (e.g.) asking for 2 instances that each require at least 5 CPUs, 6G RAM, and 16G of Disk.
Meanwhile, Heron Binaries and the Topology are pushed to HDFS so they are made available to the resources.
Aurora Monitoring Tools
Mesos Monitoring Tools
Streaming - Heron - Architecture
(standby instances not represented)
[Diagram: Zookeeper, HDFS, Mesos Master, three Mesos Agents, Aurora Scheduler, three Aurora Thermos executors, Heron Binaries / Topology Package; two agents each run a Heron Topology with 5 CPU, 6G RAM, 16G Disk]
Mesos checks its agent offerings - finding 2 instances that can provide the resources required - the job is passed to those agents as tasks and they are launched within their own sandbox (VM) via the Aurora Thermos executor … the topology is now running.
Aurora keeps a handle on the health of each process and will redeploy on failures. Failure rates and metrics can also be monitored by the Heron UI.
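The offer check described above can be pictured as a simple first-fit match. This is a toy sketch, not how Mesos or Aurora actually allocate: `OfferMatchSketch`, `Resources`, and `place` are hypothetical names, and it assumes at most one instance per agent. The numbers come from the slides (agents offering 8 CPU / 16G / 40G, a job asking for 2 instances of 5 CPU / 6G / 16G).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: first-fit matching of a job's request against agent offers. */
class OfferMatchSketch {
    /** A resource bundle, used both for what an agent offers and what a task needs. */
    static final class Resources {
        final double cpus;
        final int ramGb;
        final int diskGb;

        Resources(double cpus, int ramGb, int diskGb) {
            this.cpus = cpus;
            this.ramGb = ramGb;
            this.diskGb = diskGb;
        }

        /** True when this offer is at least as large as the request in every dimension. */
        boolean covers(Resources need) {
            return cpus >= need.cpus && ramGb >= need.ramGb && diskGb >= need.diskGb;
        }
    }

    /** Returns the indexes of the chosen agents, or an empty list if the job cannot be placed. */
    static List<Integer> place(List<Resources> offers, Resources perInstance, int instances) {
        List<Integer> chosen = new ArrayList<>();
        for (int i = 0; i < offers.size() && chosen.size() < instances; i++) {
            if (offers.get(i).covers(perInstance)) {
                chosen.add(i);
            }
        }
        return chosen.size() == instances ? chosen : new ArrayList<>();
    }

    public static void main(String[] args) {
        List<Resources> offers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            offers.add(new Resources(8, 16, 40)); // each agent: 8 CPU, 16G RAM, 40G Disk
        }
        Resources perInstance = new Resources(5.0, 6, 16); // per-container request
        System.out.println("Placed on agents: " + place(offers, perInstance, 2));
    }
}
```

With three identical agents the two containers land on the first two; asking for four instances would return an empty list, which in the real stack is where Aurora's continual retry comes in.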
Streaming - Heron - Architecture
● Topology Master
● Container
○ Stream Manager
○ Metrics Manager
○ “I1” … == Instances (Spouts and Bolts)
credit: https://twitter.github.io/heron/docs/concepts/architecture/
Streaming - Heron - Architecture
● Topology Master
● Container
○ Stream Manager
○ Metrics Manager
○ “S*”/“B*” == Spouts and Bolts
credit: https://twitter.github.io/heron/docs/concepts/architecture/
Heron Monitoring Tools (uses the Heron Tracker API)
Heron - Developing Topologies
● “Heron Test” Topology
● Tuples
● Spouts/Bolts
● Configuring the Topology
● Guaranteed Delivery - Tuple Tree / ACKs
○ Spout is the ‘replay’ point
○ Exactly-Once Semantics (new!)
tip: use ‘Storm’ documentation!
Heron - Developing Topologies
“Heron Test” Topology
[Diagram components: OpenSky Air Traffic Data Source; AirTraffic (spout); Normalizer, PunchingBag, Router (bolts); LocationEnrich, LocationAlert (bolts); ActivityEnrich, ActivityAlert (bolts); AlertPublisher (bolt); Redis]
Goals of this test topology:
● Use live data feed
● MICRO services
● Enrich / Alert
● Try Guaranteed Delivery
● Test adverse conditions
● Split and Join streams
● Publish live alerts
● Deployment environment
● TODO: elastic scaling
● TODO: exactly-once delivery
Heron - Developing Topologies - Tuple
import com.twitter.heron.api.tuple.Fields;
import com.twitter.heron.api.tuple.Values;

/* primitive list */
Fields tupleSchema = new Fields("ID", "Country", "isFlying", "onTime", "Color");
Values myTuple = new Values("XTF123", "France", true, false, "Blue");

OR

/* serialized object */
Fields tupleSchema = new Fields("AirplaneObj");
byte[] airplaneAsBytes = /* convert object to byte array */;
Values myTuple = new Values(airplaneAsBytes);
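One possible way to fill in the “convert object to byte array” step is plain Java serialization, sketched below. The `Airplane` class and the `AirplaneCodec` helper are hypothetical and not taken from the test topology; a production topology might well choose a different serializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Illustrative only: round-trips a payload object through a byte array. */
class AirplaneCodec {
    /** Hypothetical payload; fields mirror part of the primitive tuple above. */
    static final class Airplane implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String country;
        final boolean isFlying;

        Airplane(String id, String country, boolean isFlying) {
            this.id = id;
            this.country = country;
            this.isFlying = isFlying;
        }
    }

    /** Serializes the object so it can travel as a single tuple field. */
    static byte[] toBytes(Airplane plane) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(plane);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the object in a downstream bolt. */
    static Airplane fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Airplane) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] airplaneAsBytes = toBytes(new Airplane("XTF123", "France", true));
        Airplane copy = fromBytes(airplaneAsBytes);
        System.out.println(copy.id + " / " + copy.country + " / flying=" + copy.isFlying);
    }
}
```

The trade-off between the two tuple styles: a primitive list keeps individual fields available for fields grouping, while a serialized object hides them inside one opaque field.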
Heron - Developing Topologies - Spout
public class AirTrafficSpout extends com.twitter.heron.api.spout.BaseRichSpout {
    private SpoutOutputCollector collector;

    public void open(Map<String, Object> map, TopologyContext tc, SpoutOutputCollector soc) {
        collector = soc;
        /* prep data source */
    }

    public void close() { … }

    public void nextTuple() {
        /* your code here */
        com.twitter.heron.api.metric.GlobalMetrics.incr("AirTrafficSpout_emit"); /* use Heron metrics */
        collector.emit(/* your tuple here */, msgId /* ack/fail message ID */);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) { /* define tuple schema - Fields */ }
    public void ack(Object msgId) { /* if guaranteed delivery, clear tuple */ }
    public void fail(Object msgId) { /* if guaranteed delivery, manually replay tuple */ }
}
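The ack/fail comments in the spout imply some bookkeeping: the spout has to remember what it emitted until the tuple tree is acked, and re-emit on failure. A minimal sketch of that bookkeeping, independent of the Heron API; `ReplayBufferSketch` and its method names are hypothetical, and a real spout would call these from nextTuple(), ack(), and fail().

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Illustrative only: tracks emitted payloads so a spout can replay failed ones. */
class ReplayBufferSketch {
    private final Map<Long, String> pending = new HashMap<>();  // emitted, not yet acked
    private final Queue<Long> replayQueue = new ArrayDeque<>(); // failed, waiting for re-emit
    private long nextId = 0;

    /** From nextTuple(): remember the payload and return the message ID to emit with. */
    long track(String payload) {
        long id = nextId++;
        pending.put(id, payload);
        return id;
    }

    /** From ack(msgId): the tuple tree completed, so the payload can be dropped. */
    void ack(long msgId) {
        pending.remove(msgId);
    }

    /** From fail(msgId): queue the payload for replay if it is still known. */
    void fail(long msgId) {
        if (pending.containsKey(msgId)) {
            replayQueue.add(msgId);
        }
    }

    /** From nextTuple(), before reading new data: a payload to re-emit, or null if none. */
    String pollReplay() {
        Long id = replayQueue.poll();
        return id == null ? null : pending.get(id);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplayBufferSketch buffer = new ReplayBufferSketch();
        long first = buffer.track("XTF123");
        long second = buffer.track("ABC999");
        buffer.ack(first);   // delivered end to end
        buffer.fail(second); // timed out or failed downstream
        System.out.println("replay: " + buffer.pollReplay()
                + ", still pending: " + buffer.pendingCount());
    }
}
```

This is what “spout is the replay point” means in practice: bolts only ack or fail, and the decision to resend lives in the spout.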
Heron - Developing Topologies - Bolt
public class NormalizerBolt extends com.twitter.heron.api.bolt.BaseRichBolt {
    private OutputCollector collector;

    public void prepare(Map<String, Object> map, TopologyContext tc, OutputCollector oc) {
        collector = oc;
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) { /* define tuple schema */ }

    public void execute(Tuple tuple) {
        /* your code here */

        // emit the results down the stream (note: anchor to the input tuple to build a 'tuple tree' for guaranteed delivery)
        collector.emit(tuple, /* your new tuple here - Values */);

        // if done with *this* tuple then ack it - or don't if you are carrying it forward
        collector.ack(tuple);
    }
}
Heron - Developing Topologies - Builder
public static void main(String[] args) throws Exception {
    /* note: simplified from full topology defined in previous slides */
    com.twitter.heron.api.topology.TopologyBuilder builder =
        new com.twitter.heron.api.topology.TopologyBuilder();
    builder.setSpout("AIRTRAFFIC_DATA_SOURCE", new AirTrafficSpout(), 1 /*# instances*/);
    builder.setBolt("NORMALIZER_NODE", new NormalizerBolt(), 4).shuffleGrouping("AIRTRAFFIC_DATA_SOURCE");
    builder.setBolt("ROUTER_NODE", new RoutingBolt(), 2).shuffleGrouping("NORMALIZER_NODE");
    ...
    builder.setBolt("LOCATION_ENRICH_NODE", new LocationEnrichBolt(), 2)
        .shuffleGrouping("ROUTER_NODE", "LOCATION_STREAM"); /*split stream - name it*/
    builder.setBolt("ALERT_ENRICH_NODE", new AlertEnrichBolt(), 2)
        .shuffleGrouping("ROUTER_NODE", "ALERT_STREAM"); /*split stream - name it*/
    ...
    BoltDeclarer bdAlertPublisher = builder.setBolt("ALERT_PUBLISHER_NODE", new AlertPublisherBolt(), 4);
    bdAlertPublisher.shuffleGrouping("LOCATION_ENRICH_NODE", "LOCATION_STREAM");
    bdAlertPublisher.shuffleGrouping("ACTIVITY_ENRICH_NODE", "ACTIVITY_STREAM");
    /* see next slide */
    ...
}
Heron - Developing Topologies - Builder (continued)
public static void main(String[] args) throws Exception {
    ... /* see previous slide */
    com.twitter.heron.api.Config conf = new com.twitter.heron.api.Config();
    ...
    conf.setEnableAcking(true); //turns on guaranteed delivery
    conf.setNumStmgrs(5); //number of stream managers == number of containers
    conf.setComponentRam("AIRTRAFFIC_DATA_SOURCE", ByteAmount.fromMegabytes(500));
    conf.setComponentRam("NORMALIZER_NODE", ByteAmount.fromMegabytes(200));
    …
    conf.setComponentRam("ALERT_ENRICH_NODE", ByteAmount.fromMegabytes(200));
    ...
    conf.setContainerDiskRequested(ByteAmount.fromGigabytes(1)); /*whole container setting*/
    conf.setContainerCpuRequested(2); /*whole container setting*/
    com.twitter.heron.api.HeronSubmitter
        .submitTopology(args[0] /*passed in topology name*/, conf, builder.createTopology());
}
Heron - Node Failures/Adversity
● Heron Reaction to Failures or other problems?
○ “PunchingBagBolt” - FAILED ACKs
○ “PunchingBagBolt” - Memory Failure
○ “PunchingBagBolt” - Slow Processing / Performance
Heron - Elastic Scaling
● Health Monitor (“Dhalion”) (new!)
○ Self Healing/Regulating - unique to Heron!
○ Under Provisioned, Performance, Data Skew
○ Configured Thresholds
○ Prevents Backpressure Scenario
○ Invasive (dynamically alter topology) or Non-Invasive (alert about topology)
white paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/06/p1218-floratou.pdf
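To make “configured thresholds” and “invasive vs non-invasive” concrete, here is a toy policy sketch. It is not Dhalion's algorithm or API: `HealthPolicySketch`, the `Action` values, the backpressure-fraction metric, and the proportional sizing rule are all assumptions for illustration.

```java
/** Illustrative only: a threshold check that either alerts or proposes a scale-up. */
class HealthPolicySketch {
    enum Action { NONE, ALERT, SCALE_UP }

    /**
     * @param backpressureFraction share of time (0.0 to 1.0) a component spent in backpressure
     * @param threshold configured limit above which the component counts as under provisioned
     * @param invasive true to alter the topology, false to only raise an alert
     */
    static Action decide(double backpressureFraction, double threshold, boolean invasive) {
        if (backpressureFraction <= threshold) {
            return Action.NONE;
        }
        return invasive ? Action.SCALE_UP : Action.ALERT;
    }

    /** Hypothetical sizing rule: grow parallelism in proportion to time lost to backpressure. */
    static int scaledParallelism(int current, double backpressureFraction) {
        return (int) Math.ceil(current * (1.0 + backpressureFraction));
    }

    public static void main(String[] args) {
        double observed = 0.5; // half the interval spent in backpressure
        Action action = decide(observed, 0.2, true);
        System.out.println(action + " -> parallelism " + scaledParallelism(4, observed));
    }
}
```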
Streaming - Heron - code examples
Heron Java, Python, Scala Topology examples (developed by Heron)
https://github.com/twitter/heron/tree/master/heron/examples/src
Heron Java Air Traffic w/ Redis example (developed by Me)
https://github.com/rwilcom/herontest
Passionately committed to making a significant, tangible, positive difference in the security and
well-being of our country and our allies.
Ron Wilcom, Chief Engineer
nextcentury.com
Thoughts or Questions?
Links
● https://twitter.github.io/heron/ (documentation)
● https://blog.twitter.com/2015/flying-faster-with-twitter-heron (article)
● http://www.infoworld.com/article/3078134/analytics/had-it-with-apache-storm-heron-swoops-to-the-rescue.html (article)
● https://dzone.com/articles/getting-started-with-heron-on-apache-mesos-and-apa (article)
● https://pdfs.semanticscholar.org/e847/c3ec130da57328db79a7fea794b07dbccdd9.pdf (Twitter - original whitepaper)
● https://twitter.com/heronstreaming (Twitter feed)
Streaming Analytics (streaming and micro-batch) - Notable Open Source and Commercial
● Spark Streaming (Apache/Hadoop)
● Storm (Apache/Twitter)
● Heron (Twitter - direct Storm migration)
● Samza (Apache/LinkedIn)
● S4 (Yahoo/Apache)
● Flink (Apache/Hadoop - unify batch and streaming)
● Kinesis^ (Amazon - plug in Spark, etc)
● Data Torrent^ : Apache Apex
● Esper^
● IBM InfoSphere Streams^
● Tibco Streambase^
^ commercial