Scaling Apache Storm - Hadoop Summit 2014


DESCRIPTION

Slides from my Hadoop Summit presentation on scaling Apache Storm. Presented at Hadoop Summit 2014, San Jose.


  • Scaling Apache Storm P. Taylor Goetz, Hortonworks @ptgoetz
  • About Me: Member of Technical Staff / Storm Tech Lead @ Hortonworks. Storm Committer / PPMC Member / Release Mgr. @ Apache. Volunteer Firefighter since 2004.
  • 1M+ messages/sec. on a 10-15 node cluster. How do you get there?
  • How do you fight fire?
  • Put the wet stuff on the red stuff. Water, and lots of it.
  • When you're dealing with big fire, you need big water.
  • Water Sources: Lakes, streams, reservoirs, pools, ponds.
  • Data Hydrant You heard it here first.
  • How does this relate to Storm?
  • Little's Law: L = λW. The long-term average number of customers in a stable system, L, is equal to the long-term average effective arrival rate, λ, multiplied by the average time a customer spends in the system, W; or expressed algebraically: L = λW. http://en.wikipedia.org/wiki/Little's_law
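  • A quick worked example (illustrative numbers, not from the deck): at a sustained arrival rate of λ = 1,000,000 tuples/sec and an average in-system time of W = 5 ms, L = 1,000,000 × 0.005 = 5,000 tuples are in flight at any moment. If a slow bolt doubles W, the in-flight population doubles with it; that is exactly how buffers fill up.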
  • Batch vs. Streaming
  • Batch Processing: Typically operates on data at rest. Velocity is a function of performance. Poor performance costs you time.
  • Stream Processing: At the mercy of your data source. Velocity fluctuates over time. Poor performance…
  • Poor performance bursts the pipes: Buffers fill up and eat memory. Timeouts trigger replays. Sink systems get overwhelmed.
  • What can developers do?
  • Keep tuple processing code tight. Worry about execute(), not prepare():

        public class MyBolt extends BaseRichBolt {
            public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
                // initialize task (runs once per task; cost here is cheap)
            }
            public void execute(Tuple input) {
                // process input QUICKLY! (runs for every tuple; this is the hot path)
            }
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // declare output
            }
        }
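  • To make that concrete, here is a minimal sketch of paying one-time costs in prepare() so execute() stays tight. LookupClient and the lookup() call are hypothetical stand-ins for any expensive-to-create resource; they are not from the deck:

        import java.util.Map;
        import backtype.storm.task.OutputCollector;
        import backtype.storm.task.TopologyContext;
        import backtype.storm.topology.OutputFieldsDeclarer;
        import backtype.storm.topology.base.BaseRichBolt;
        import backtype.storm.tuple.Fields;
        import backtype.storm.tuple.Tuple;
        import backtype.storm.tuple.Values;

        public class EnrichBolt extends BaseRichBolt {
            private OutputCollector collector;
            private transient LookupClient client; // hypothetical expensive-to-create client

            public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
                // One-time setup: connections, caches, config parsing all belong here.
                this.client = new LookupClient("lookup-service:8080");
            }

            public void execute(Tuple input) {
                // Hot path: no blocking setup work, just per-tuple processing.
                String enriched = client.lookup(input.getString(0));
                collector.emit(input, new Values(enriched));
                collector.ack(input);
            }

            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("enriched"));
            }
        }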
  • Know your latencies (https://gist.github.com/jboner/2841832):
        L1 cache reference: 0.5 ns
        Branch mispredict: 5 ns
        L2 cache reference: 7 ns (14x L1 cache)
        Mutex lock/unlock: 25 ns
        Main memory reference: 100 ns (20x L2 cache, 200x L1 cache)
        Compress 1K bytes with Zippy: 3,000 ns
        Send 1K bytes over 1 Gbps network: 10,000 ns (0.01 ms)
        Read 4K randomly from SSD*: 150,000 ns (0.15 ms)
        Read 1 MB sequentially from memory: 250,000 ns (0.25 ms)
        Round trip within same datacenter: 500,000 ns (0.5 ms)
        Read 1 MB sequentially from SSD*: 1,000,000 ns (1 ms, 4x memory)
        Disk seek: 10,000,000 ns (10 ms, 20x datacenter round trip)
        Read 1 MB sequentially from disk: 20,000,000 ns (20 ms, 80x memory, 20x SSD)
        Send packet CA->Netherlands->CA: 150,000,000 ns (150 ms)
  • Use a cache. Guava is your friend.
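  • As an illustration, a Guava LoadingCache fronting a slow lookup. The fetchFromDatabase() method, the 10,000-entry bound, and the 10-minute TTL are placeholders, not recommendations:

        import java.util.concurrent.TimeUnit;
        import com.google.common.cache.CacheBuilder;
        import com.google.common.cache.CacheLoader;
        import com.google.common.cache.LoadingCache;

        public class UserCache {
            private final LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                    .maximumSize(10_000)                    // bound memory use
                    .expireAfterWrite(10, TimeUnit.MINUTES) // keep entries fresh
                    .build(new CacheLoader<String, String>() {
                        @Override
                        public String load(String key) throws Exception {
                            return fetchFromDatabase(key); // slow path, taken only on a miss
                        }
                    });

            public String get(String key) {
                return cache.getUnchecked(key); // fast path: a memory reference, not a network round trip
            }

            private String fetchFromDatabase(String key) {
                // hypothetical slow lookup (network / disk)
                return "value-for-" + key;
            }
        }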
  • Expose your knobs and gauges. DevOps will appreciate it.
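  • One way to expose such gauges in Storm 0.9.x is the built-in metrics API; a minimal sketch (the metric name and 60-second reporting interval are illustrative):

        import java.util.Map;
        import backtype.storm.metric.api.CountMetric;
        import backtype.storm.task.OutputCollector;
        import backtype.storm.task.TopologyContext;
        import backtype.storm.topology.OutputFieldsDeclarer;
        import backtype.storm.topology.base.BaseRichBolt;
        import backtype.storm.tuple.Tuple;

        public class MeteredBolt extends BaseRichBolt {
            private transient CountMetric processed;
            private OutputCollector collector;

            public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
                // Register a counter; Storm publishes it to metrics consumers every 60 seconds.
                processed = context.registerMetric("tuples-processed", new CountMetric(), 60);
            }

            public void execute(Tuple input) {
                processed.incr(); // cheap counter bump on the hot path
                collector.ack(input);
            }

            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // no output streams in this sketch
            }
        }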
  • What can DevOps do?
  • How big is your hose?
  • Find out!
  • Performance testing is essential!
  • How to deal with small pipes? (i.e. When your output is more like a garden hose.)
  • Parallelize slow sinks.
  • Parallelism == Manifold: Take input from one big pipe and distribute it to many smaller pipes. The bigger the size difference, the more parallelism you will need.
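  • In topology code, the manifold is a parallelism hint plus a stream grouping. A sketch, where FirehoseSpout, SlowSinkBolt, and the counts are illustrative placeholders:

        import backtype.storm.topology.TopologyBuilder;

        TopologyBuilder builder = new TopologyBuilder();
        // One big pipe in: a few spout instances reading the source.
        builder.setSpout("firehose", new FirehoseSpout(), 4);
        // Many smaller pipes out: shuffle grouping spreads tuples evenly
        // across 40 instances of the slow sink bolt.
        builder.setBolt("slow-sink", new SlowSinkBolt(), 40)
               .shuffleGrouping("firehose");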
  • Sizeup: Initial assessment.
  • Every fire is different.
  • Every Storm use case is different.
  • Sizeup, Fire: What are my water sources? What GPM (gallons per minute) can they support? How many lines (hoses) will I need? How much water will I need to flow to put this fire out?
  • Sizeup, Storm: What are my input sources? At what rate do they deliver messages? What size are the messages? What's my slowest data sink?
  • There is no magic bullet.
  • But there are good starting points.
  • Numbers: Where to start.
  • 1 Worker / Machine / Topology: Keeps unnecessary network transfer to a minimum.
  • 1 Acker / Worker: The default in Storm 0.9.x.
  • 1 Executor / CPU Core: Optimizes thread/CPU usage. This is the baseline for CPU-bound use cases; multiply by 10x-100x for I/O-bound use cases.
  • Example: 10 worker nodes, 16 cores per machine: 10 * 16 = 160 parallelism units available. Subtract one acker per worker: (10 * 16) - 10 = 150 units (multiply by 10-100x if I/O bound). Distribute these among the tasks in the topology: higher for slow tasks, lower for fast tasks.
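  • A sketch of that arithmetic in topology code. The component names and the 30/40/80 split are illustrative; the totals are what matter:

        import backtype.storm.Config;
        import backtype.storm.topology.TopologyBuilder;

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new MySpout(), 30);                                  // 30 units
        builder.setBolt("fast-bolt", new FastBolt(), 40).shuffleGrouping("spout");     // 40 units
        builder.setBolt("slow-bolt", new SlowBolt(), 80).shuffleGrouping("fast-bolt"); // 80 units
        // 30 + 40 + 80 = 150 units = (10 machines * 16 cores) - 10 ackers

        Config conf = new Config();
        conf.setNumWorkers(10); // 1 worker / machine / topology
        conf.setNumAckers(10);  // 1 acker / worker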
  • This is just a starting point. Test, test, test. Measure, measure, measure.
  • Internal Messaging: Handling backpressure.
  • Internal Messaging (Intra-worker)
  • Turn knobs slowly, one at a time.
  • Don't mess with settings you don't understand.
  • Storm ships with sane defaults. Override only as necessary.
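  • If you do need to override them, the internal queue sizes are plain topology config in Storm 0.9.x. A sketch, shown together for illustration only; in practice, turn one knob at a time (these back Disruptor ring buffers, so sizes must be powers of two):

        import backtype.storm.Config;

        Config conf = new Config();
        conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384); // per-executor incoming queue
        conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);    // per-executor outgoing queue
        conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);             // batches moved from network to executor queues
        conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32);            // per-worker outbound transfer queue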
  • Hardware Considerations
  • Minimum Hardware Requirements
  • CPU Cores: More is usually better. The more you have, the more threads you can support (i.e. parallelism). Storm potentially uses a LOT of threads.
  • Memory: Highly use-case specific. How many workers (JVMs) per node? Are you caching and/or holding in-memory state? Tests/metrics are your friends.
  • Network: Use bonded NICs if necessary. Keep nodes close.
  • Other performance considerations
  • Don't pancake! Separate concerns.
  • Keep this guy happy. He has big boots and a shovel. He will hurt you if you piss him off.
  • Shameless Plug: http://www.packtpub.com/storm-distributed-real-time-computation-blueprints/book
  • Thanks! Questions? Storm BoF Session 3:30 Room 230A