
Presenter: 李旺龙

Discretized Streams: Fault-Tolerant Streaming Computation at Scale

Matei Zaharia UC Berkeley AMPLab

SOSP 2013 (ACM Symposium on Operating Systems Principles)

Streaming

BREAD PPT DESIGN

Contents

1 Introduction

2 Background

3 Implementation

4 Experiment

5 Conclusion

Database and Knowledge Engineering Laboratory

www.dbke.sinaapp.com


Introduction


Much of “big data” is received in real time, and is most valuable at its time of arrival

A social network may wish to detect trending conversation topics in minutes

An e-commerce website may wish to model which users visit a new page

A service operator may wish to monitor program logs to detect failures in seconds


Introduction

To enable these low-latency processing applications, there is a need for streaming computation models that scale transparently to large clusters

Most distributed streaming systems, including Storm, TimeStream, MapReduce Online, and streaming databases, are based on a continuous operator model:

long-running, stateful operators receive each record, update their internal state, and send new records.


Introduction

Major Problems : Faults & Stragglers

The continuous operator model performs recovery through two approaches:

• Replication, where there are two copies of each node: costs 2× the hardware
• Upstream backup, where nodes buffer sent messages and replay them to a new copy of a failed node: takes a long time to recover


Introduction

Major Problems : Faults & Stragglers

Neither approach handles stragglers:

• Replication: because synchronization protocols coordinate the replicas, a straggler slows down both copies
• Upstream backup: a straggler must be treated as a failure, incurring a costly recovery


Introduction

This paper presents a new stream processing model, discretized streams (D-Streams), that overcomes these challenges

Instead of managing long-lived operators, the idea in D-Streams is to structure a streaming computation as a series of stateless, deterministic batch computations on small time intervals
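The core idea can be sketched in plain Scala (no Spark dependency), assuming each record carries an arrival timestamp in milliseconds. `Record`, `discretize`, and `runBatches` are hypothetical names for illustration, not Spark Streaming's API; intervals that receive no records are simply skipped in this sketch.

```scala
// Sketch of discretizing a stream into small, fixed-length intervals and
// running the same stateless, deterministic batch function on each one.
// All names here are hypothetical; this is not Spark Streaming's API.
object MicroBatchSketch {
  final case class Record(timeMs: Long, value: String)

  // Interval i covers [i * intervalMs, (i + 1) * intervalMs).
  def discretize(records: Seq[Record], intervalMs: Long): Map[Long, Seq[String]] =
    records.groupBy(_.timeMs / intervalMs).map { case (i, rs) => i -> rs.map(_.value) }

  // Because batchFn is deterministic, any interval's output can be
  // recomputed from that interval's input after a failure.
  def runBatches[B](records: Seq[Record], intervalMs: Long)(batchFn: Seq[String] => B): Seq[(Long, B)] =
    discretize(records, intervalMs).toSeq.sortBy(_._1).map { case (i, vs) => i -> batchFn(vs) }
}
```

For example, with 1-second intervals, records at 100 ms and 900 ms fall into interval 0, and a record at 1200 ms falls into interval 1.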


Introduction

Challenge 1: low latency. We use a data structure called Resilient Distributed Datasets (RDDs), which keeps data in memory and can recover it without replication by tracking the lineage graph of operations that were used to build it.

Challenge 2: quick recovery from faults and stragglers. Parallel recovery: when a node fails, each node in the cluster works to recompute part of the lost node's RDDs, resulting in significantly faster recovery than upstream backup without the cost of replication.
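Lineage-based recovery can be sketched with a toy dataset model in plain Scala. `Source`, `Mapped`, `compute`, and `Cache` are hypothetical names, not Spark's RDD classes. A lost in-memory partition is rebuilt by replaying deterministic transformations from reliably stored input, so no replica of the derived data is kept.

```scala
// Toy model of lineage: a dataset is either reliably stored input or a
// deterministic transformation of a parent dataset. Hypothetical names.
object LineageSketch {
  sealed trait Dataset { def numPartitions: Int }
  final case class Source(parts: Vector[Vector[Int]]) extends Dataset {
    def numPartitions: Int = parts.size
  }
  final case class Mapped(parent: Dataset, f: Int => Int) extends Dataset {
    def numPartitions: Int = parent.numPartitions
  }

  // Recompute partition p by walking the lineage back to the source.
  def compute(ds: Dataset, p: Int): Vector[Int] = ds match {
    case Source(parts)     => parts(p)
    case Mapped(parent, f) => compute(parent, p).map(f)
  }

  // In-memory partitions; a lost entry is rebuilt from lineage on demand.
  final class Cache(ds: Dataset) {
    private val store = scala.collection.mutable.Map.empty[Int, Vector[Int]]
    def get(p: Int): Vector[Int] = store.getOrElseUpdate(p, compute(ds, p))
    def lose(p: Int): Unit = store.remove(p) // simulate losing a node's copy
  }
}
```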


Introduction

We have implemented D-Streams in a system called Spark Streaming, based on the Spark engine.

The system can process over 60 million records/second on 100 nodes at sub-second latency, and can recover from faults and stragglers in sub-second time.


Introduction

Spark Streaming’s per-node throughput is comparable to commercial streaming databases, while offering linear scalability to 100 nodes, and is 2–5× faster than the open source Storm and S4 systems, while offering fault recovery guarantees that they lack.

Because D-Streams use the same processing model and data structures (RDDs) as batch jobs, a powerful advantage of our model is that streaming queries can seamlessly be combined with batch and interactive computation.


Background

Review Spark

Doug Cutting, the creator of Hadoop, says "the use of MapReduce engine for Big Data projects will decline, replaced by Apache Spark"


Background

Review Spark

The Spark Stack

• Spark SQL: relational operators
• MLlib: machine learning
• GraphX: graph processing
• Spark Streaming: real-time processing
• Spark Runtime
• Cluster managers: YARN, Mesos, AWS; data sources: HDFS, S3, Cassandra …

A fast and general engine for large-scale data processing


Background

Review Spark

Resilient distributed datasets (RDDs) enable efficient data reuse in a broad range of applications.

RDDs are fault-tolerant, parallel data structures that let users:
• explicitly persist intermediate results in memory
• control their partitioning
• manipulate them using a rich set of operators


Background

Review Spark

Example (join on key): (1, 10), (2, 11) joined with (1, jack), (2, tom) gives (1, (10, jack)), (2, (11, tom))
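The join example can be reproduced with plain Scala collections as a simple hash join. `JoinSketch.join` is a hypothetical helper; its output shape, (key, (left value, right value)), mirrors that of Spark's pair-RDD `join`, and keys without a match on both sides are dropped, as in an inner join.

```scala
// Inner hash join on key over plain Scala collections (hypothetical helper).
object JoinSketch {
  def join[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] = {
    // Build side: index the right input by key.
    val index: Map[K, Seq[(K, B)]] = right.groupBy(_._1)
    // Probe side: emit one output pair per matching right record.
    for {
      (k, a) <- left
      (_, b) <- index.getOrElse(k, Seq.empty)
    } yield (k, (a, b))
  }
}
```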


Background

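In the past, "everyone is a product manager"; in the last two years, "everyone is a big data expert"; in two more years, "everyone will be a film director"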

Review MapReduce


data block → split → (key, value) → map → (key, value) → shuffle/partition → (key, value_list) → reduce → (key, value)

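The school motto analysis by MapReduce

Input (key, line):
0 自强弘毅求是拓新 (Wuhan University)
1 明德厚学求是创新 (Huazhong University of Science and Technology)
0 求实创新进取团结 (Dalian University of Technology)
1 严谨求实团结创新 (Tongji University)

map: emit (word, 1) for every two-character word: 自强 1, 弘毅 1, 求是 1, 拓新 1, 明德 1, 厚学 1, 求是 1, 创新 1, 求实 1, 创新 1, 进取 1, 团结 1, 严谨 1, 求实 1, 团结 1, 创新 1

shuffle/partition (two reducers): 自强 1, 求是 1 1, …, 明德 1 | 弘毅 1, 严谨 1, …, 创新 1 1 1

reduce: 自强 1, 求是 2, …, 明德 1 | 弘毅 1, 严谨 1, …, 创新 3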
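The three phases of the motto example can be simulated in a single process with plain Scala collections. This is an illustrative sketch, assuming every motto splits into two-character words; `MottoWordCount` and its methods are hypothetical names. Running it counts 创新 three times and 求是 twice, matching the reduce output above.

```scala
// Single-process simulation of map, shuffle, and reduce for word count.
object MottoWordCount {
  val mottos: Seq[String] =
    Seq("自强弘毅求是拓新", "明德厚学求是创新", "求实创新进取团结", "严谨求实团结创新")

  // Map: split a motto into two-character words and emit (word, 1).
  def mapPhase(line: String): Seq[(String, Int)] =
    line.grouped(2).map(w => (w, 1)).toList

  // Shuffle: group emitted pairs by word into (word, value_list).
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (w, ps) => w -> ps.map(_._2) }

  // Reduce: sum each word's value list.
  def reducePhase(grouped: Map[String, Seq[Int]]): Map[String, Int] =
    grouped.map { case (w, ones) => w -> ones.sum }

  def run(lines: Seq[String]): Map[String, Int] =
    reducePhase(shuffle(lines.flatMap(mapPhase)))
}
```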


Background

Review MapReduce

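import org.apache.spark.{SparkConf, SparkContext}

// Batch word count on Spark (local mode, 2 threads)
val sc = new SparkContext(new SparkConf().setAppName("WordCount").setMaster("local[2]"))
val file = sc.textFile("src/main/resources/abc")
val counts = file.flatMap(line => line.split(" ")) // split each line into words
  .map(word => (word, 1))                            // emit (word, 1)
  .reduceByKey((a, b) => a + b)                      // shuffle, then sum per word
counts.saveAsTextFile("src/main/resources/out")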


Background

Our work targets applications that need to run on tens to hundreds of machines and tolerate a latency of several seconds. Some examples are:
• Site activity statistics
• Cluster monitoring
• Spam detection

For these applications, we believe that the 0.5–2 second latency of D-Streams is adequate, as it is well below the timescale of the trends monitored. We purposely do not target applications with latency needs below a few hundred milliseconds, such as high-frequency trading.


Implementation

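import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Count words arriving on a TCP socket, in 3-second batches.
// local[2]: at least two threads, since one is taken by the socket receiver.
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(3))
val lines = ssc.socketTextStream("203.195.218.212", 10000)
val words = lines.flatMap(line => line.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey((a, b) => a + b) // counts within each batch
wordCounts.print()
ssc.start()            // start receiving and processing
ssc.awaitTermination() // block until the computation is stopped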


Implementation

The window operation groups all the records from a sliding window of past time intervals into one RDD

Windowing: words.window("5s") yields a D-Stream of RDDs containing the words in intervals [0, 5), [1, 6), [2, 7), …
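The window semantics can be sketched in plain Scala, assuming 1-second intervals indexed from 0 and a window length measured in intervals; `WindowSketch.window` is a hypothetical helper. Spark Streaming's released DStream API exposes a comparable `window(windowDuration, slideDuration)` operator.

```scala
// batches(i) holds the records of interval [i, i + 1) seconds.
// Each output element is the union of `length` consecutive intervals,
// i.e. the windows [0, 5), [1, 6), [2, 7), ... for length = 5.
object WindowSketch {
  def window[A](batches: IndexedSeq[Seq[A]], length: Int, slide: Int = 1): Seq[Seq[A]] =
    (length - 1 until batches.size by slide).map { end =>
      (end - length + 1 to end).flatMap(i => batches(i))
    }
}
```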


Implementation

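Incremental aggregation:

pairs.reduceByWindow("5s", (a, b) => a + b)
pairs.reduceByWindow("5s", (a, b) => a + b, (a, b) => a - b)

The second form also takes an inverse function (a - b), so each window's result is updated from the previous window instead of re-reducing all five intervals.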
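A single-key sketch contrasting the two forms, assuming each interval has already been reduced to one sum. The names are hypothetical; the incremental version adds the interval entering the window and applies the inverse to the interval leaving it, which is only valid when the reduce function has an inverse, as + and - do. Spark Streaming's released API offers a similar per-key operator, `reduceByKeyAndWindow(reduceFunc, invReduceFunc, windowDuration, slideDuration)`.

```scala
object IncrementalWindow {
  // Naive: re-reduce every interval in each window.
  def naive(batchSums: IndexedSeq[Int], length: Int): Seq[Int] =
    (length - 1 until batchSums.size).map(end => batchSums.slice(end - length + 1, end + 1).sum)

  // Incremental: prev + entering interval, then invert the leaving interval.
  // Assumes batchSums.size >= length.
  def incremental(batchSums: IndexedSeq[Int], length: Int)(
      op: (Int, Int) => Int, invOp: (Int, Int) => Int): Seq[Int] = {
    val first = batchSums.take(length).reduce(op)
    (length until batchSums.size).scanLeft(first) { (prev, end) =>
      invOp(op(prev, batchSums(end)), batchSums(end - length))
    }
  }
}
```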


Implementation

State tracking: e.g., how many sessions have a bitrate above X?

One could count the active sessions from a stream of (ClientID, Event) pairs using the proposed track operator:

sessions = events.track(
  (key, ev) => 1,                               // initialize function
  (key, st, ev) => if (ev == Exit) null else 1, // update function
  "30s")                                        // timeout
counts = sessions.count()                       // a stream of ints
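The semantics of `track` can be approximated interval by interval in plain Scala. This is a sketch under stated assumptions, not the paper's implementation: the timeout is counted in intervals rather than given as "30s", `None` plays the role of `null`, and `Event`, `Enter`, `Exit`, `Entry`, and `step` are hypothetical names. Spark Streaming's released API provides per-key state through `updateStateByKey`.

```scala
object TrackSketch {
  sealed trait Event
  case object Enter extends Event // hypothetical event types
  case object Exit extends Event

  // Per-key state plus the interval in which the key was last updated.
  final case class Entry[S](state: S, lastSeen: Int)

  // Process one interval of (key, event) records:
  //  init:    state for a key seen for the first time
  //  update:  new state, or None to drop the key
  //  timeout: keys not updated for `timeout` intervals are dropped
  def step[K, S](prev: Map[K, Entry[S]], batch: Seq[(K, Event)], now: Int, timeout: Int)(
      init: (K, Event) => S, update: (K, S, Event) => Option[S]): Map[K, Entry[S]] = {
    val updated = batch.foldLeft(prev) { case (m, (k, ev)) =>
      m.get(k) match {
        case None => m.updated(k, Entry(init(k, ev), now))
        case Some(e) =>
          update(k, e.state, ev) match {
            case Some(s) => m.updated(k, Entry(s, now))
            case None    => m - k
          }
      }
    }
    updated.filter { case (_, e) => now - e.lastSeen < timeout }
  }
}
```

The number of active sessions after each interval is then simply the size of the returned map.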


Implementation

Unification with Batch & Interactive Processing

Spark Streaming provides several powerful features to unify streaming and batch processing:
• D-Streams can be combined with static RDDs computed using a standard Spark job
• Users can run a D-Stream program on previous historical data using a "batch mode"
• Users can run ad-hoc queries on D-Streams interactively by attaching a Scala console to their Spark Streaming program and running arbitrary Spark operations on the RDDs there, e.g.:

counts.slice("21:00", "21:05").topK(10)
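The ad-hoc query can be sketched over plain Scala maps, assuming per-interval counts are stored by minute of the day (21:00 = 1260, 21:05 = 1265) and that the slice end is exclusive. `SliceTopK`, `slice`, and `topK` are hypothetical stand-ins for the operators on the slide; ties in `topK` are broken alphabetically.

```scala
object SliceTopK {
  // counts(t) = per-key counts for the interval starting at minute t.
  // Merge all intervals in [fromMin, untilMin) into one count map.
  def slice(counts: Map[Int, Map[String, Int]], fromMin: Int, untilMin: Int): Map[String, Int] =
    counts.collect { case (t, m) if t >= fromMin && t < untilMin => m }
      .foldLeft(Map.empty[String, Int]) { (acc, m) =>
        m.foldLeft(acc) { case (a, (k, v)) => a.updated(k, a.getOrElse(k, 0) + v) }
      }

  // The k keys with the highest counts, highest first.
  def topK(merged: Map[String, Int], k: Int): Seq[(String, Int)] =
    merged.toSeq.sortBy { case (key, c) => (-c, key) }.take(k)
}
```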


Implementation

• Master: tracks the D-Stream lineage graph and schedules tasks to compute new RDD partitions
• Worker nodes: receive data, store the partitions of input and computed RDDs, and execute tasks
• Client library: used to send data into the system


Implementation

New data is replicated across two worker nodes before sending an acknowledgement to the client library, because D-Streams require input data to be stored reliably to recompute results. If a worker fails, the client library sends unacknowledged data to another worker.
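The acknowledgement rule can be sketched as a small in-memory simulation, assuming a fixed list of workers in which a dead worker simply rejects writes. `Worker` and `send` are hypothetical names; a real client library would also deal with network timeouts and retries.

```scala
object ReliableIngest {
  // A worker that stores the blocks it receives, unless it is down.
  final class Worker(val id: Int, var alive: Boolean = true) {
    val blocks = scala.collection.mutable.ArrayBuffer.empty[String]
    def store(b: String): Boolean = { if (alive) blocks += b; alive }
  }

  // Try workers in order until two live ones hold the block; only then
  // acknowledge (Some). Dead workers are skipped, so unacknowledged data
  // ends up on other workers. None means the block is not yet safe.
  def send(block: String, workers: Seq[Worker]): Option[Seq[Int]] = {
    val stored = workers.iterator.filter(_.store(block)).take(2).map(_.id).toList
    if (stored.size == 2) Some(stored) else None
  }
}
```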


Implementation

Spark Streaming relies on Spark's existing batch scheduler within each timestep, and performs many of the optimizations found in batch processing systems:

• It pipelines operators that can be grouped into a single task, such as a map followed by another map.
• It places tasks based on data locality.
• It controls the partitioning of RDDs to avoid shuffling data across the network.


Implementation

Optimizations for Stream Processing

• Network communication: asynchronous I/O
• Timestep pipelining: submitting tasks from the next timestep before the current one has finished
• Task scheduling: smaller control messages, so that parallel jobs of many tasks can be launched every few hundred milliseconds
• Storage layer: because RDDs are immutable, they can be checkpointed over the network without blocking computations on them and slowing jobs
• Lineage cutoff: forget lineage after an RDD has been checkpointed
• Master recovery: needed because streaming applications run 24/7


Implementation

Memory Management

Each node's block store manages RDD partitions in an LRU fashion.

Users can set a maximum history timeout, after which the system will simply forget old blocks without doing disk I/O.

The memory required by Spark Streaming is not onerous, because the state within a computation is typically much smaller than the input data.
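An LRU block store of this kind can be sketched with `java.util.LinkedHashMap` in access order, which evicts the least recently accessed entry through its `removeEldestEntry` hook. Measuring capacity in blocks rather than bytes is a simplifying assumption, and `LruBlockStore` is a hypothetical name.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Keeps at most `capacity` blocks; evicts the least recently accessed one.
final class LruBlockStore[K, V](capacity: Int) {
  // accessOrder = true: iteration runs from least to most recently accessed.
  private val blocks = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      size() > capacity
  }

  def put(key: K, value: V): Unit = blocks.put(key, value)
  def get(key: K): Option[V] = Option(blocks.get(key)) // a hit refreshes recency

  // Keys from least to most recently accessed.
  def keys: List[K] = {
    val it = blocks.keySet.iterator()
    val out = List.newBuilder[K]
    while (it.hasNext) out += it.next()
    out.result()
  }
}
```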


Implementation

Parallel Recovery

The system periodically checkpoints some of the state RDDs by asynchronously replicating them to other worker nodes. When a node fails, the system detects all missing RDD partitions and launches tasks to recompute them from the last checkpoint. Many tasks can be launched at the same time to compute different RDD partitions, allowing the whole cluster to partake in recovery.
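An idealized sketch of how parallel recovery spreads the work, assuming every lost partition costs the same to recompute from the last checkpoint and each worker handles one partition per round. `assign` and `rounds` are hypothetical helpers. Under this model, doubling the workers halves the number of rounds, whereas recovery on a single replacement node would take one round per lost partition.

```scala
object ParallelRecovery {
  // Spread the lost partitions round-robin over all surviving workers.
  def assign(lost: Seq[Int], workers: Seq[String]): Map[String, Seq[Int]] =
    lost.zipWithIndex
      .groupBy { case (_, i) => workers(i % workers.size) }
      .map { case (w, ps) => w -> ps.map(_._1) }

  // Rounds needed if each worker recomputes one partition per round.
  def rounds(numLost: Int, numWorkers: Int): Int =
    (numLost + numWorkers - 1) / numWorkers
}
```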


Implementation

Parallel Recovery

[Figure: parallel recovery; labels: amount of data to recover, recovery time at full load, new data]


Implementation

Straggler Mitigation

D-Streams also let us mitigate stragglers like batch systems do, by running speculative backup copies of slow tasks. Such speculation would be difficult in a continuous operator system, as it would require launching a new copy of a node, populating and synchronizing its state, and overtaking the slow copy. We use a simple threshold: whenever a task runs more than 1.4× longer than the median task in its job stage, we mark it as slow. More refined algorithms could also be used.
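The 1.4× median rule can be sketched as follows, assuming the runtimes of all tasks in a stage are already known; a real scheduler would compare the elapsed time of still-running tasks against finished ones. `StragglerCheck` is a hypothetical name.

```scala
object StragglerCheck {
  // Median of a non-empty sequence.
  def median(xs: Seq[Double]): Double = {
    val s = xs.sorted
    val n = s.size
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2
  }

  // Indices of tasks that ran more than `factor` times the stage median;
  // these are the candidates for speculative backup copies.
  def slowTasks(runtimes: Seq[Double], factor: Double = 1.4): Seq[Int] = {
    val m = median(runtimes)
    runtimes.zipWithIndex.collect { case (t, i) if t > factor * m => i }
  }
}
```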


Implementation

Master Recovery

Recovering the master requires:
• writing the state of the computation reliably when starting each timestep
• having workers connect to a new master and report their RDD partitions to it when the old master fails

The D-Stream metadata stored in HDFS includes the D-Stream graph, function objects, the time of the last checkpoint, and the RDDs updated since then.

A 100-node cluster resumes work in 12 seconds.


Experiment

Setup: Amazon EC2 m1.xlarge nodes (4 cores, 15 GB RAM), 100-byte input records
• 1 s latency target → 500 ms input intervals
• 2 s latency target → 1 s intervals


Experiment

Per-node throughput:
• Spark Streaming: 640,000 records/s for Grep and 250,000 records/s for TopKCount on 4-core nodes
• Oracle CEP: 1 million records/s on 16 cores
• StreamBase: 245,000 records/s on 8 cores
• Esper: 500,000 records/s on 4 cores

While there is no reason to expect D-Streams to be slower or faster per-node, the key advantage is that Spark Streaming scales nearly linearly to 100 nodes


Experiment

S4 was limited in the number of records/second it could process per node, which made it almost 10× slower than Spark and Storm.

Storm is still adversely affected by smaller record sizes


Experiment

Setup:
• 1-second batches with input data residing in HDFS
• 20 MB/s/node for WordCount, 80 MB/s/node for Grep
• Checkpoint interval of 10 seconds
• 20 four-core nodes


Experiment

Doubling the number of nodes cuts the recovery time in half.


Experiment

We tried slowing down one of the nodes instead of killing it, by launching a 60-thread process that overloaded the CPU


Conclusion

We have proposed D-Streams, a new model for distributed streaming computation that:
• enables fast recovery from both faults and stragglers without the overhead of replication
• forgoes conventional streaming wisdom by batching data into small timesteps
• supports a wide range of operators and can attain high per-node throughput, linear scaling to 100 nodes, sub-second latency, and sub-second fault recovery
• composes seamlessly with batch and interactive queries


Work Progress

• Thesis work: stream data mining
• Internship work: Mobile QQ quality data processing; Spark survey


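Internship Work

Monitoring more than ten quality metrics and about 50 events of Mobile QQ (手Q):
• sending/receiving images, sending/receiving messages, sending/receiving files, login, page switching, etc.
• group images, discussion-group images, user-to-user images, etc.
• image-receiving logs per day: 1.3 billion from iPhone, 6 billion from Android; about 80,000 records/s, 80 MB/s, 7 TB/day
• Java + Python + Hive + Pig + PostgreSQL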


Internship Work: Sample Data

2014080108, 中国 , 浙江省 , 中国电信 ,unknown,unknown,10.157.89.36 2014-08-01 07:59:59.949,INFO,0S200MNJT807V3GE,5.0.0.146,beacon,1.8.0,H30-U10;Android 4.2.2,level 17,122.242.114.5,122.242.114.5,wifi,actGroupPicSmallDownV1,true, 4397,66,A2=000000000000000&A1=1085779492&A4=000000000000000&param_NetworkInfo=2&A3=000000000000000_00:0c:e7:30:13:cf&A6=20:08:ed:07:c6:8d&param_step=1_1_1_0_65;2_-1_0_0_0;3_-1_0_0_0&serverip=61.151.234.34&A7=7638540fc92e0c2e &param_groupPolicy=1&param_uuid={2115FC55-2DA4-4A73-3570-FA89969A3C17}.jpg &param_uinType=1&A67=com.tencent.mobileqq:MSF&QQ=&A28=122.242.114.5&A27=4397&param_FailCode=0&A26=66&A25=true&A23=2017&param_DownMode=1&param_ProductVersion=537039093&param_NetworkOperator= 中国移动&param_SsoServerIp=14.17.42.23:8080&param_runStatus=0&A19=wifi&param_grpUin=213478033&param_GatewayrIp=122.242.114.5&param_Server=61.151.234.34,2014-08-01 07:59:15,2014-08-01 07:59:59,,1085779492,Android,4.2.2,1085779492,0, 000000000000000_00:0c:e7:30:13:cf,0,,20:08:ed:07:c6:8d,7638540fc92e0c2e,,,,,,,,,,,,wifi,,,,2017,,true,66,4397,122.242.114.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,com.tencent.mobileqq:MSF,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20140801080


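Internship Work: Mobile QQ Quality Data Processing Flow

[Figure: processing flow. Labels: Beacon (灯塔) database tables; TDW, HDFS, PG; import, compute, export; hourly and daily tables for received images; hourly and daily tables for sent images; ……; hourly statistics summary table for all metrics, aggregated into a daily summary table]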


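Mobile QQ Quality Data Processing: Moving from SQL to Pig

[Figure: the processing flow before and after the change. Before: the SQL-based flow with Beacon (灯塔) database tables, TDW, HDFS, PG, hourly and daily tables for received and sent images, ……, and the per-metric summary tables. After: received-image statistics, sent-image statistics, …; TDW; Pig on HDFS; import; transfer]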


After moving from SQL to Pig, the daily cost of Mobile QQ quality data processing dropped from 3000+ to about 1000.


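Why moving Mobile QQ quality data processing from SQL to Pig paid off: analysis

• SQL: parses strings such as 0_1_0_12;1_1_0_372;2_1_1_2245 over and over to extract col1 through col15
• Pig: define one UDF

PS: Hive also supports UDFs, but TDW SQL does not.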
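The parse-once idea can be sketched in Scala; the original UDF was written for Pig, so this is an illustrative translation, not that UDF. It assumes steps are separated by `;` and fields by `_`, and pads or truncates the result to a fixed number of columns. With the `param_step` value from the sample log, 3 steps of 5 fields give exactly 15 columns. `StepParser.parse` is a hypothetical name.

```scala
object StepParser {
  // Split once into steps (';') and fields ('_'), then return exactly
  // `width` columns; missing trailing columns become empty strings.
  def parse(step: String, width: Int = 15): Vector[String] = {
    val fields = step.split(';').toVector.flatMap(_.split('_').toVector)
    fields.take(width).padTo(width, "")
  }
}
```

Every column is then available from a single parse, instead of re-parsing the string once per extracted column.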


Current State of Spark

Counterparts: Hive → Spark SQL, Storm → Spark Streaming, Mahout → MLlib, Giraph → GraphX

Written in Scala; supports Python, Scala, and Java

Internship Work


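Internship Work

• A theoretical innovation from academia for industry: RDD vs. MapReduce
• Supports not only MapReduce but also multiple paradigms such as Pregel
• Makes full use of memory and supports DAGs, with less serialization, I/O, and network traffic
• Partitioning is controllable when data is loaded
• Controllable multi-level in-memory persistence; interactive queries
• Lineage-based fault tolerance, with pipeline-like execution
• Clear speed advantage, but high memory consumption
• Supports SQL, streaming data, offline data, graph data, machine learning, etc.
• After learning basic Spark usage, the other frameworks are easy to pick up
Not just fast

Spark vs. Hadoop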


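Internship Work

The Future of Spark (Spark Summit 2014, San Francisco, June 30 - July 2, 2014): Spark SQL

• Optimization: code generation, faster joins, etc.
• Language extensions: SQL92 will be supported
• Better integration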


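Internship Work

The Future of Spark (Spark Summit 2014, San Francisco, June 30 - July 2, 2014): MLlib

• The number of supported algorithms will roughly double, from 15 to about 30, covering descriptive statistics such as sampling, correlation, estimation, and hypothesis testing, as well as machine learning algorithms such as NMF, sparse SVD, and LDA
• SparkR goes live and is integrated with MLlib
Streaming: more data sources will be supported
GraphX: optimization and API stabilization
Features contributed by industry

Stop MapReduce, move to Spark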


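Internship Work

Spark meetup in China

• Aug 9, 2014 @ Beijing (1st): Intel, AsiaInfo, Databricks
• Aug 31, 2014 @ Hangzhou: Huawei, Alibaba
• Sep 6, 2014 @ Beijing (2nd): traintracks.io, Microsoft, JD.com
• Sep 21, 2014 @ Shenzhen: Huawei, Tencent
• Oct 26, 2014 @ Beijing (3rd): Intel, Alibaba, Microsoft, Meituan, NJU

On August 9, the first Spark-User Beijing Meetup was successfully held at the R&D center building of AsiaInfo's headquarters. The event drew 121 participants from 32 companies, universities, and financial institutions, including Baidu, Sina, JD.com, Tibco, Wandoujia, Douban, Weibo, Xiaomi, Huawei, iQIYI, Meituan, 58.com, 海星, Sogou, CBSI, 神舟泰岳, Datang Telecom, Talking Data, 安达佳, TravelSky, Tsinghua University, Beijing University of Posts and Telecommunications, and the banking sector.

星火燎原 ("a single spark can start a prairie fire")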


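Progress
• Studied several PhD dissertations on stream data mining and gained a basic understanding of its challenges and common solutions
• Read introductory material on Storm, a relatively mature stream processing system in industry
• Read the documentation of MOA (Massive Online Analysis), a stream data mining tool, and tested some examples

Plan
• Study stream data mining algorithms in depth
• Complete the experiments and writing of a short paper
• Decide on the specific topic of the graduation thesis

Thesis Work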


Thank You!
