Presenter: 李旺龙 (Li Wanglong)
Discretized Streams: Fault-Tolerant Streaming Computation at Scale
Matei Zaharia UC Berkeley AMPLab
SOSP 2013, ACM Symposium on Operating Systems Principles
Streaming
BREAD PPT DESIGN
CONTENTS
1 Introduction
2 Background
3 Implementation
4 Experiment
5 Conclusion
Database and Knowledge Engineering Lab
www.dbke.sinaapp.com
Introduction
Much of “big data” is received in real time, and is most valuable at its time of arrival
A social network may wish to detect trending conversation topics within minutes
An e-commerce website may wish to model which users visit a new page
A service operator may wish to monitor program logs to detect failures within seconds
To enable these low-latency processing applications, there is a need for streaming computation models that scale transparently to large clusters
Most distributed streaming systems, including Storm, TimeStream, MapReduce Online, and streaming databases, are based on a continuous operator model, in which long-running, stateful operators receive each record, update internal state, and send new records.
Major Problems: Faults & Stragglers
The continuous operator model performs recovery through two approaches:
• Replication: there are two copies of each node, which costs 2× the hardware
• Upstream backup: nodes buffer sent messages and replay them to a new copy of a failed node, which takes a long time to recover
Neither approach handles stragglers:
• Replication: the synchronization protocols that coordinate the replicas let a single straggler slow them all down
• Upstream backup: a straggler must be treated as a failure, incurring a costly recovery
This paper presents a new stream processing model, discretized streams (D-Streams), that overcomes these challenges
Instead of managing long-lived operators, the idea in D-Streams is to structure a streaming computation as a series of stateless, deterministic batch computations on small time intervals
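As a rough illustration of this idea, the pure-Scala sketch below (no Spark dependency; `Record`, `discretize`, `runBatches`, and `wordCount` are hypothetical names) groups timestamped records into fixed intervals and runs the same deterministic batch function on each interval. Because the batch function is stateless and deterministic, any interval's result can be recomputed from that interval's input alone.

```scala
// Sketch of the D-Stream idea: cut a stream into fixed time intervals and run
// the same stateless, deterministic batch job on each interval's records.
// All names are hypothetical; no Spark classes are used.
object DiscretizeSketch {
  final case class Record(timeSec: Double, word: String)

  /** Assign each record to interval floor(timeSec / intervalSec). */
  def discretize(records: Seq[Record], intervalSec: Double): Map[Long, Seq[Record]] =
    records.groupBy(r => math.floor(r.timeSec / intervalSec).toLong)

  /** Run a deterministic batch function on every interval, in time order. */
  def runBatches[A](batches: Map[Long, Seq[Record]])(job: Seq[Record] => A): Seq[(Long, A)] =
    batches.toSeq.sortBy(_._1).map { case (interval, recs) => (interval, job(recs)) }

  /** Example batch job: word count within a single interval. */
  def wordCount(recs: Seq[Record]): Map[String, Int] =
    recs.groupBy(_.word).map { case (w, rs) => (w, rs.size) }

  def main(args: Array[String]): Unit = {
    val stream = Seq(Record(0.2, "a"), Record(0.7, "b"), Record(1.1, "a"), Record(1.9, "a"))
    runBatches(discretize(stream, 1.0))(wordCount).foreach(println)
  }
}
```

With 1-second intervals, the first interval sees one "a" and one "b", and the second sees two "a"s; rerunning `wordCount` on either interval always gives the same answer.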
Challenge 1: low latency. We use a data structure called Resilient Distributed Datasets (RDDs), which keeps data in memory and can recover it without replication by tracking the lineage graph of operations that were used to build it.
Challenge 2: quick recovery from faults and stragglers. Parallel recovery: when a node fails, each node in the cluster works to recompute part of the lost node's RDDs, resulting in significantly faster recovery than upstream backup without the cost of replication.
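A minimal sketch of lineage-based recovery follows. It uses a toy model rather than Spark's actual RDD classes: a derived dataset records its source partitions and the deterministic steps applied to them, so a lost partition is rebuilt by replaying those steps. `Source`, `Lineage`, and `recover` are hypothetical names.

```scala
// Toy model of lineage-based recovery: a derived dataset keeps its source
// partitions plus the deterministic steps applied to them, so a lost partition
// is rebuilt by replaying those steps instead of being restored from a replica.
// Names are hypothetical; this is not Spark's RDD implementation.
object LineageSketch {
  type Partition = Vector[Int]

  /** Reliably stored input, already split into partitions. */
  final case class Source(partitions: Vector[Partition])

  /** Lineage = source + ordered list of deterministic per-partition steps. */
  final case class Lineage(source: Source, steps: List[Partition => Partition]) {
    def map(f: Int => Int): Lineage = copy(steps = steps :+ ((p: Partition) => p.map(f)))
    def filter(pred: Int => Boolean): Lineage = copy(steps = steps :+ ((p: Partition) => p.filter(pred)))

    /** Recompute partition i by replaying every step on source partition i. */
    def computePartition(i: Int): Partition =
      steps.foldLeft(source.partitions(i))((data, step) => step(data))
  }

  /** Rebuild each lost partition independently. Because no partition depends on
    * another, a cluster could hand these recomputations to many nodes at once. */
  def recover(lineage: Lineage, lost: Seq[Int]): Map[Int, Partition] =
    lost.map(i => i -> lineage.computePartition(i)).toMap

  def main(args: Array[String]): Unit = {
    val lin = Lineage(Source(Vector(Vector(1, 2, 3), Vector(4, 5, 6))), Nil)
      .filter(_ % 2 == 0)
      .map(_ * 10)
    println(recover(lin, Seq(0, 1)))
  }
}
```

The independence of partitions is the point: in the paper's setting, each surviving node can take a share of the lost partitions, which is what makes recovery parallel.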
We have implemented D-Streams in a system called Spark Streaming, based on the Spark engine.
The system can process over 60 million records/second on 100 nodes at sub-second latency, and can recover from faults and stragglers in sub-second time.
Spark Streaming’s per-node throughput is comparable to commercial streaming databases, while offering linear scalability to 100 nodes, and is 2–5× faster than the open source Storm and S4 systems, while offering fault recovery guarantees that they lack.
Because D-Streams use the same processing model and data structures (RDDs) as batch jobs, a powerful advantage of our model is that streaming queries can seamlessly be combined with batch and interactive computation.
Background
Review Spark
Hadoop creator Doug Cutting says “the use of MapReduce engine for Big Data projects will decline, replaced by Apache Spark”
The Spark Stack
• Libraries: Spark SQL (relational operators), MLlib (machine learning), GraphX (graph processing), Spark Streaming (real-time)
• Spark Runtime
• Cluster managers: YARN, Mesos, AWS
• Data sources: HDFS, S3, Cassandra …
A fast and general engine for large-scale data processing
Resilient distributed datasets (RDDs) enable efficient data reuse in a broad range of applications. RDDs are fault-tolerant, parallel data structures that let users:
• explicitly persist intermediate results in memory
• control their partitioning
• manipulate them using a rich set of operators
[Figure: hash partitioning of the keys jack, arthur, and tom; records with the same key are sent to the same partition]
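The partitioning in the figure can be sketched as follows. To our understanding, the rule used here (a non-negative modulo of the key's `hashCode`, with null keys going to partition 0) mirrors what Spark's `HashPartitioner` does, but the object and method names are hypothetical and the code does not use Spark.

```scala
// Sketch of hash partitioning. The rule (non-negative hashCode modulo the number
// of partitions, null keys to partition 0) is meant to mirror Spark's
// HashPartitioner, but this object is a hypothetical stand-alone illustration.
object HashPartitionSketch {
  /** x mod n mapped into [0, n), even when x (a hash code) is negative. */
  def nonNegativeMod(x: Int, n: Int): Int = {
    val raw = x % n
    if (raw < 0) raw + n else raw
  }

  def partitionOf(key: Any, numPartitions: Int): Int =
    if (key == null) 0 else nonNegativeMod(key.hashCode, numPartitions)

  /** Bucket keys by partition; equal keys always land in the same bucket. */
  def partition[K](keys: Seq[K], numPartitions: Int): Map[Int, Seq[K]] =
    keys.groupBy(k => partitionOf(k, numPartitions))

  def main(args: Array[String]): Unit =
    println(partition(Seq("jack", "arthur", "tom", "jack", "arthur", "tom"), 2))
}
```

Controlling partitioning this way is what lets a later join or reduceByKey on the same key find all matching records already co-located.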
[Figure: join of the RDDs {(1, 10), (2, 11)} and {(1, jack), (2, tom)}, yielding {(1, (10, jack)), (2, (11, tom))}]
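The join in the figure can be reproduced with an in-memory sketch of pair-RDD join semantics. `JoinSketch.join` is a hypothetical helper over plain Scala collections, not Spark's `join`.

```scala
// In-memory sketch of pair-RDD join semantics over plain Scala collections.
// JoinSketch.join is a hypothetical helper, not Spark's join.
object JoinSketch {
  /** Inner join: every value of key k on the left is paired with every value of k on the right. */
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
    // Index the right side by key, as a hash join would.
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, kws) => (k, kws.map(_._2)) }
    left.flatMap { case (k, v) =>
      rightByKey.getOrElse(k, Seq.empty[W]).map(w => (k, (v, w)))
    }
  }

  def main(args: Array[String]): Unit =
    println(join(Seq(1 -> 10, 2 -> 11), Seq(1 -> "jack", 2 -> "tom")))
}
```

Keys present on only one side are dropped, as in an inner join.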
In the past, “everyone is a product manager”; these last two years, “everyone is a big data expert”; two years from now, “everyone will be a film director”
Review MapReduce
School motto analysis with MapReduce
[Figure: MapReduce data flow: data block → split → map emits (key, value) → shuffle/partition groups into (key, value_list) → reduce emits (key, value). Input: the mottos 自强 弘毅 求是 拓新 (Wuhan University), 明德 厚学 求是 创新 (HUST), 求实 创新 进取 团结 (Dalian University of Technology), and 严谨 求实 团结 创新 (Tongji University). Map emits (word, 1) for every word; reduce outputs counts such as 自强 1, 求是 2, 明德 1, 弘毅 1, 严谨 1, 创新 3.]
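The map, shuffle, and reduce phases of the figure can be simulated in memory on the same four mottos. This is a sketch using plain Scala collections; `MottoWordCount` and its phase functions are hypothetical names chosen to match the figure.

```scala
// Simulates the figure's split -> map -> shuffle -> reduce phases in memory.
// MottoWordCount and its phase functions are hypothetical names.
object MottoWordCount {
  val mottos: Seq[String] = Seq(
    "自强 弘毅 求是 拓新", // Wuhan University
    "明德 厚学 求是 创新", // HUST
    "求实 创新 进取 团结", // Dalian University of Technology
    "严谨 求实 团结 创新"  // Tongji University
  )

  /** map: emit a (word, 1) pair for every word of one input line. */
  def mapPhase(line: String): Seq[(String, Int)] =
    line.split(" ").toSeq.filter(_.nonEmpty).map(word => (word, 1))

  /** shuffle/partition: gather all values of a key into (key, value_list). */
  def shufflePhase(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, kvs) => (word, kvs.map(_._2)) }

  /** reduce: collapse each value list into a single (key, value). */
  def reducePhase(grouped: Map[String, Seq[Int]]): Map[String, Int] =
    grouped.map { case (word, ones) => (word, ones.sum) }

  def wordCount(lines: Seq[String]): Map[String, Int] =
    reducePhase(shufflePhase(lines.flatMap(mapPhase)))

  def main(args: Array[String]): Unit =
    wordCount(mottos).toSeq.sortBy(-_._2).foreach { case (w, c) => println(s"$w $c") }
}
```

The shuffle output for 创新 is the value list (1, 1, 1), which the reduce phase turns into 创新 3, matching the figure.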
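import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed before Spark 1.3

val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
val file = sc.textFile("src/main/resources/abc")
val counts = file.flatMap(line => line.split(" ")) // split each line into words
  .map(word => (word, 1))                          // map: emit (word, 1) pairs
  .reduceByKey((a, b) => a + b)                    // shuffle + reduce: sum per word
counts.saveAsTextFile("src/main/resources/out")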
Our work targets applications that need to run on tens to hundreds of machines and can tolerate a latency of several seconds. Some examples are:
• Site activity statistics
• Cluster monitoring
• Spam detection
For these applications, we believe that the 0.5–2 second latency of D-Streams is adequate, as it is well below the timescale of the trends monitored. We purposely do not target applications with latency needs below a few hundred milliseconds, such as high-frequency trading.
Implementation
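import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits; needed before Spark 1.3

// local[2]: one thread runs the socket receiver, the other processes batches
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(3))           // 3-second batch interval
val lines = ssc.socketTextStream("203.195.218.212", 10000)  // D-Stream of text lines
val words = lines.flatMap(line => line.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey((a, b) => a + b)         // word counts per batch
wordCounts.print()
ssc.start()            // start receiving and processing
ssc.awaitTermination() // block until the computation is stopped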
The window operation groups all the records from a sliding window of past time intervals into one RDD
Windowing: words.window("5s") yields a D-Stream of RDDs containing the words in intervals [0, 5), [1, 6), [2, 7), …
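The window semantics can be sketched on plain collections, assuming one batch per 1-second interval so that a 5-second window spans 5 consecutive batches. `WindowSketch.window` is hypothetical and emits only full windows; in Spark Streaming itself the corresponding operator is `DStream.window(windowDuration, slideDuration)`.

```scala
// Sliding-window sketch on plain collections: batches(i) holds the records that
// arrived in second [i, i + 1), so a window of w batches ending at t covers
// [t - w + 1, t + 1). WindowSketch is hypothetical; only full windows are emitted.
object WindowSketch {
  def window[A](batches: IndexedSeq[Seq[A]], w: Int): IndexedSeq[Seq[A]] = {
    require(w >= 1, "window must span at least one batch")
    (w - 1 until batches.size).map(t => batches.slice(t - w + 1, t + 1).flatten)
  }

  def main(args: Array[String]): Unit = {
    val perSecond = IndexedSeq(Seq("a"), Seq("b"), Seq("c"), Seq("d"), Seq("e"), Seq("f"), Seq("g"))
    // 5-second windows: [0, 5), [1, 6), [2, 7)
    window(perSecond, 5).foreach(println)
  }
}
```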
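Incremental aggregation
pairs.reduceByWindow("5s", (a, b) => a + b)
pairs.reduceByWindow("5s", (a, b) => a + b, (a, b) => a - b)
The first form recomputes each window from all of its intervals; the second also takes an inverse function, so each window is updated from the previous one by adding the newest interval and subtracting the interval that has left the window.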
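The sketch below compares recomputing every window from scratch with the incremental form, on per-interval counts in plain Scala. It assumes the reduce function has an inverse (here `+` and `-`), and the object and method names are hypothetical. Spark Streaming's `reduceByKeyAndWindow` offers a variant that takes an inverse reduce function in the same spirit.

```scala
// Compares recomputing each window from scratch with the incremental form that
// adds the interval entering the window and subtracts the one leaving it.
// Works only when the reduce function has an inverse (here + and -).
// Object and method names are hypothetical.
object IncrementalWindowSketch {
  /** Naive: re-reduce all w intervals of every window, O(w) work per step. */
  def naiveSums(perInterval: IndexedSeq[Int], w: Int): IndexedSeq[Int] =
    (w - 1 until perInterval.size).map(t => perInterval.slice(t - w + 1, t + 1).sum)

  /** Incremental: O(1) work per step using the inverse function (subtraction). */
  def incrementalSums(perInterval: IndexedSeq[Int], w: Int): IndexedSeq[Int] =
    if (perInterval.size < w) IndexedSeq.empty
    else {
      val first = perInterval.take(w).sum
      (w until perInterval.size).scanLeft(first) { (acc, t) =>
        acc + perInterval(t) - perInterval(t - w) // add newest, subtract oldest
      }
    }

  def main(args: Array[String]): Unit = {
    val counts = IndexedSeq(3, 1, 4, 1, 5, 9, 2, 6)
    println(naiveSums(counts, 3))
    println(incrementalSums(counts, 3))
  }
}
```

Both functions produce the same sequence of window sums; the incremental one does constant work per slide regardless of the window length.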
State tracking: how many sessions have a bitrate above X?
For example, one could count the active sessions from a stream of (ClientID, Event) pairs:
sessions = events.track(
  (key, ev) => 1,                         // initialize function
  (key, st, ev) => ev == Exit ? null : 1, // update function
  "30s")                                  // timeout
counts = sessions.count()                 // a stream of ints
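The `track` semantics can be approximated in plain Scala as below. This is a sketch under assumptions: the paper's `null` result is modeled as `None`, time is counted in whole intervals, and `TrackSketch`, `Entry`, and `step` are hypothetical names. Spark Streaming's closest built-in operator is `updateStateByKey`.

```scala
// Plain-Scala approximation of the paper's track operator. Assumptions: the
// paper's null result is modeled as None, time is counted in whole intervals,
// and TrackSketch, Entry and step are hypothetical names.
object TrackSketch {
  sealed trait Event
  case object Enter extends Event
  case object Exit extends Event

  /** Per-key state plus the interval in which the key last saw an event. */
  final case class Entry[S](state: S, lastSeen: Int)

  /** Apply one interval's events, then drop keys idle for longer than `timeout`. */
  def step[K, S](
      states: Map[K, Entry[S]],
      events: Seq[(K, Event)],
      now: Int,
      timeout: Int,
      init: (K, Event) => S,
      update: (K, S, Event) => Option[S]): Map[K, Entry[S]] = {
    val updated = events.foldLeft(states) { case (acc, (key, ev)) =>
      acc.get(key) match {
        case None => acc + (key -> Entry(init(key, ev), now)) // first event: initialize
        case Some(entry) =>
          update(key, entry.state, ev) match {
            case Some(next) => acc + (key -> Entry(next, now)) // keep tracking
            case None       => acc - key                       // e.g. Exit: forget the key
          }
      }
    }
    updated.filter { case (_, entry) => now - entry.lastSeen <= timeout } // timeout
  }

  def main(args: Array[String]): Unit = {
    val init = (k: String, ev: Event) => 1
    val update = (k: String, s: Int, ev: Event) => if (ev == Exit) None else Some(1)
    var sessions = Map.empty[String, Entry[Int]]
    sessions = step(sessions, Seq("c1" -> Enter, "c2" -> Enter), 0, 30, init, update)
    sessions = step(sessions, Seq("c1" -> Exit), 1, 30, init, update)
    println(sessions.size) // active sessions, like sessions.count()
  }
}
```

With 1-second intervals, a timeout of 30 corresponds to the paper's "30s": a client that sends nothing for more than 30 intervals is dropped even without an Exit event.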
Unification with Batch & Interactive Processing
Spark Streaming provides several powerful features to unify streaming and batch processing:
• D-Streams can be combined with static RDDs computed using a standard Spark job
• Users can run a D-Stream program on previous historical data using a “batch mode”
• Users can run ad-hoc queries on D-Streams interactively by attaching a Scala console to their Spark Streaming program and running arbitrary Spark operations on the RDDs there, e.g.
counts.slice("21:00", "21:05").topK(10)
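A rough in-memory model of that query follows. It assumes per-interval word counts keyed by interval start time in seconds since midnight and a half-open time range; `slice`, `topK`, and `hhmm` here are hypothetical helpers, not the D-Stream methods themselves.

```scala
// Rough in-memory model of counts.slice("21:00", "21:05").topK(10).
// Assumptions: per-interval word counts keyed by interval start time in seconds
// since midnight, and a half-open range [from, to). Names are hypothetical.
object SliceTopKSketch {
  type Timeline = Map[Int, Map[String, Int]]

  /** "HH:MM" to seconds since midnight. */
  def hhmm(s: String): Int = {
    val parts = s.split(":")
    parts(0).toInt * 3600 + parts(1).toInt * 60
  }

  /** Keep the per-interval results whose start time falls in [from, to). */
  def slice(tl: Timeline, from: String, to: String): Seq[Map[String, Int]] = {
    val (lo, hi) = (hhmm(from), hhmm(to))
    tl.toSeq.filter { case (t, _) => t >= lo && t < hi }.sortBy(_._1).map(_._2)
  }

  /** Merge the sliced counts and return the k most frequent words. */
  def topK(batches: Seq[Map[String, Int]], k: Int): Seq[(String, Int)] = {
    val merged = batches.flatten.groupBy(_._1).map { case (w, wcs) => (w, wcs.map(_._2).sum) }
    merged.toSeq.sortBy { case (w, c) => (-c, w) }.take(k) // ties broken alphabetically
  }

  def main(args: Array[String]): Unit = {
    val tl: Timeline = Map(hhmm("21:00") -> Map("a" -> 2, "b" -> 1), hhmm("21:01") -> Map("b" -> 3))
    println(topK(slice(tl, "21:00", "21:05"), 10))
  }
}
```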
Spark Streaming consists of three components:
• A master that tracks the D-Stream lineage graph and schedules tasks to compute new RDD partitions
• Worker nodes that receive data, store the partitions of input and computed RDDs, and execute tasks
• A client library used to send data into the system
New data is replicated across two worker nodes before sending an acknowledgement to the client library, because D-Streams require input data to be stored reliably to recompute results. If a worker fails, the client library sends unacknowledged data to another worker.
Spark Streaming relies on Spark's existing batch scheduler within each timestep, and performs many of the optimizations of batch systems:
• It pipelines operators that can be grouped into a single task, such as a map followed by another map.
• It places tasks based on data locality.
• It controls the partitioning of RDDs to avoid shuffling data across the network.
Optimizations for Stream Processing
• Network communication: asynchronous I/O
• Timestep pipelining: submitting tasks from the next timestep before the current one has finished
• Task scheduling: smaller control messages, so that more tasks can be launched
• Storage layer: because RDDs are immutable, they can be checkpointed over the network without blocking computations on them and slowing jobs
• Lineage cutoff: forget an RDD's lineage after it has been checkpointed
• Master recovery: needed for the system to run 24/7
Memory Management
Each node's block store manages RDD partitions in an LRU fashion.
Users can set a maximum history timeout, after which the system simply forgets old blocks without doing disk I/O.
The memory required by Spark Streaming is not onerous, because the state within a computation is typically much smaller than the input data.
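An LRU block store with a history timeout can be sketched with `java.util.LinkedHashMap` in access order, which lets us evict the least recently used entry once a capacity is exceeded. `LruBlockStore`, its capacity, and its timestamps are hypothetical; this is not Spark's block manager.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Sketch of a node-local block store: LRU eviction once `capacity` blocks are
// held, plus a history timeout that simply forgets old blocks (no disk I/O).
// LruBlockStore is hypothetical; it is not Spark's block manager.
final class LruBlockStore[K, V](capacity: Int) {
  // accessOrder = true: iteration runs from least to most recently used.
  private val blocks: JLinkedHashMap[K, (V, Long)] =
    new JLinkedHashMap[K, (V, Long)](16, 0.75f, true) {
      override protected def removeEldestEntry(eldest: JMap.Entry[K, (V, Long)]): Boolean =
        size() > capacity // evict the least recently used block
    }

  def put(key: K, value: V, createdAt: Long): Unit = { blocks.put(key, (value, createdAt)); () }

  /** Reading a block also marks it as recently used. */
  def get(key: K): Option[V] = Option(blocks.get(key)).map(_._1)

  /** Forget every block created before now - historyTimeout. */
  def forgetOlderThan(now: Long, historyTimeout: Long): Unit = {
    val it = blocks.entrySet().iterator()
    while (it.hasNext) {
      if (it.next().getValue._2 < now - historyTimeout) it.remove()
    }
  }

  def keys: Set[K] = {
    val out = Set.newBuilder[K]
    val it = blocks.keySet().iterator()
    while (it.hasNext) out += it.next()
    out.result()
  }
}

object LruBlockStoreDemo {
  def main(args: Array[String]): Unit = {
    val store = new LruBlockStore[String, Int](2)
    store.put("rdd1", 1, 0L)
    store.put("rdd2", 2, 1L)
    store.get("rdd1")        // touch rdd1, so rdd2 becomes least recently used
    store.put("rdd3", 3, 2L) // evicts rdd2
    println(store.keys)
  }
}
```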
Parallel Recovery
The system periodically checkpoints some of the state RDDs by asynchronously replicating them to other worker nodes. When a node fails, the system detects all missing RDD partitions and launches tasks to recompute them from the last checkpoint. Many tasks can be launched at the same time to compute different RDD partitions, allowing the whole cluster to partake in recovery.
[Figure: parallel recovery; labels: amount of state to recover, recovery time under full load, new data]
Straggler Mitigation
D-Streams also let us mitigate stragglers like batch systems do, by running speculative backup copies of slow tasks. Such speculation would be difficult in a continuous operator system, as it would require launching a new copy of a node, synchronously populating its state, and overtaking the slow copy.
Whenever a task runs more than 1.4× longer than the median task in its job stage, we mark it as slow. More refined algorithms could also be used.
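The 1.4× median rule can be sketched as a function over observed task durations in one stage. `StragglerSketch` is a hypothetical name, and for simplicity the sketch compares finished durations rather than the elapsed time of tasks that are still running.

```scala
// Sketch of the 1.4x-median straggler rule over one stage's task durations.
// StragglerSketch is hypothetical; for simplicity it compares observed
// durations instead of the elapsed time of tasks that are still running.
object StragglerSketch {
  def median(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "need at least one task duration")
    val s = xs.sortWith(_ < _)
    val n = s.size
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }

  /** Tasks that ran more than `factor` times longer than the stage median:
    * candidates for a speculative backup copy. */
  def slowTasks(durations: Map[String, Double], factor: Double = 1.4): Set[String] = {
    val m = median(durations.values.toSeq)
    durations.collect { case (id, d) if d > factor * m => id }.toSet
  }

  def main(args: Array[String]): Unit = {
    val stage = Map("t1" -> 1.0, "t2" -> 1.1, "t3" -> 0.9, "t4" -> 1.0, "t5" -> 2.0)
    println(slowTasks(stage)) // only t5 exceeds 1.4 x the median of 1.0
  }
}
```

Using the median rather than the mean keeps a single very slow task from raising the threshold and hiding itself.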
Master Recovery
Recovering from a master failure requires:
• writing the state of the computation reliably when starting each timestep
• having workers connect to a new master and report their RDD partitions to it when the old master fails
D-Stream metadata is stored in HDFS: the graph, function objects, the checkpoint time, and the updated RDDs.
A 100-node cluster resumes work in 12 seconds.
Experiment
Amazon EC2 m1.xlarge nodes: 4 cores and 15 GB RAM
• 1 s latency target → 500 ms input intervals
• 2 s latency target → 1 s input intervals
• 100-byte input records
Spark Streaming's per-node throughput: 640,000 records/s for Grep and 250,000 records/s for TopKCount on 4-core nodes. For comparison, commercial single-node systems report:
• Oracle CEP: 1 million records/s on 16 cores
• StreamBase: 245,000 records/s on 8 cores
• Esper: 500,000 records/s on 4 cores
While there is no reason to expect D-Streams to be slower or faster per-node, the key advantage is that Spark Streaming scales nearly linearly to 100 nodes
S4 was limited in the number of records/second it could process per node, which made it almost 10× slower than Spark and Storm.
Storm is still adversely affected by smaller record sizes
• 1-second batches with input data residing in HDFS
• Data rate: 20 MB/s/node for WordCount, 80 MB/s/node for Grep
• Checkpoint interval of 10 seconds
• 20 four-core nodes
Doubling the number of nodes cuts the recovery time in half.
We tried slowing down one of the nodes instead of killing it, by launching a 60-thread process that overloaded the CPU
Conclusion
We have proposed D-Streams, a new model for distributed streaming computation that:
• enables fast recovery from both faults and stragglers without the overhead of replication
• forgoes conventional streaming wisdom by batching data into small timesteps
• supports a wide range of operators and can attain high per-node throughput, linear scaling to 100 nodes, sub-second latency, and sub-second fault recovery
• composes seamlessly with batch and interactive queries
Work Progress
• Thesis work: stream data mining
• Internship work: Mobile QQ (手Q) quality data processing; Spark survey
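Internship Work
Monitoring of more than ten quality metrics and about 50 events in Mobile QQ:
• sending/receiving images, messages, and files; login; page switching; etc.
• group images, discussion-group images, user-to-user images, etc.
Daily image-receiving logs: 1.3 billion records from iPhone and 6 billion from Android, about 80,000 records/s, 80 MB/s, 7 TB/day
Tools: Java + Python + Hive + Pig + PostgreSQL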
Internship Work: Sample Data
2014080108, 中国 , 浙江省 , 中国电信 ,unknown,unknown,10.157.89.36 2014-08-01 07:59:59.949,INFO,0S200MNJT807V3GE,5.0.0.146,beacon,1.8.0,H30-U10;Android 4.2.2,level 17,122.242.114.5,122.242.114.5,wifi,actGroupPicSmallDownV1,true, 4397,66,A2=000000000000000&A1=1085779492&A4=000000000000000&param_NetworkInfo=2&A3=000000000000000_00:0c:e7:30:13:cf&A6=20:08:ed:07:c6:8d&param_step=1_1_1_0_65;2_-1_0_0_0;3_-1_0_0_0&serverip=61.151.234.34&A7=7638540fc92e0c2e&param_groupPolicy=1&param_uuid={2115FC55-2DA4-4A73-3570-FA89969A3C17}.jpg&param_uinType=1&A67=com.tencent.mobileqq:MSF&QQ=&A28=122.242.114.5&A27=4397&param_FailCode=0&A26=66&A25=true&A23=2017&param_DownMode=1&param_ProductVersion=537039093&param_NetworkOperator= 中国移动&param_SsoServerIp=14.17.42.23:8080&param_runStatus=0&A19=wifi&param_grpUin=213478033&param_GatewayrIp=122.242.114.5&param_Server=61.151.234.34,2014-08-01 07:59:15,2014-08-01 07:59:59,,1085779492,Android,4.2.2,1085779492,0, 000000000000000_00:0c:e7:30:13:cf,0,,20:08:ed:07:c6:8d,7638540fc92e0c2e,,,,,,,,,,,,wifi,,,,2017,,true,66,4397,122.242.114.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,com.tencent.mobileqq:MSF,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20140801080
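Internship Work: Mobile QQ Quality Data Processing Pipeline
[Figure: processing pipeline. Labels: Beacon (灯塔) and Beacon database tables; TDW, HDFS, PG; ingest, export, compute; image-receive hourly and daily tables; image-send hourly and daily tables; …; an hourly statistics summary table for all metrics, rolled up from hourly into a daily summary table]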
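Mobile QQ Quality Data Processing: Moving from SQL to Pig
[Figure: the same pipeline with the statistics computed by Pig on HDFS. Labels: Beacon and Beacon database tables; TDW, HDFS, PG; ingest, export, compute; image-receive hourly and daily tables; image-send hourly and daily tables; image-receive statistics; image-send statistics; Pig, HDFS; ingest; transfer; …]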
After moving from SQL to Pig, the daily processing cost of Mobile QQ quality data dropped from 3000+ to about 1000
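Moving Mobile QQ quality data processing from SQL to Pig: why it paid off
• SQL: strings such as 0_1_0_12;1_1_0_372;2_1_1_2245 are parsed repeatedly to fill col1 through col15
• Pig: define a single UDF for the parsing
PS: Hive also supports UDFs, but TDW SQL does not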
Spark: Current Status
[Figure: Spark SQL, Spark Streaming, MLlib, and GraphX shown alongside Hive, Storm, Mahout, and Giraph]
Written in Scala; supports Python, Scala, and Java
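Spark vs. Hadoop
• A theoretical innovation from academia applied to industry: RDD vs. MapReduce
• Supports not only MapReduce but also multiple paradigms such as Pregel
• Makes full use of memory and supports DAGs, with less serialization, I/O, and network traffic
• Partitioning is controllable when data is loaded
• Multi-level in-memory persistence is controllable, enabling interactive queries
• Lineage-based fault tolerance, with pipeline-like execution
• Clear speed advantage, but high memory consumption
• Supports SQL, streaming data, offline data, graph data, machine learning, etc.
• Once you learn basic Spark usage, its other frameworks are easy to pick up
More than just fast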
The Future of Spark (Spark Summit 2014, San Francisco, June 30 to July 2, 2014)
Spark SQL
• Optimization: code generation, faster joins, etc.
• Language extensions: SQL92 will be supported
• Better integration
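MLlib
• The number of supported algorithms will roughly double, from 15 to about 30, covering descriptive statistics (sampling, correlation, estimation, hypothesis testing, etc.) and machine learning algorithms such as NMF, Sparse SVD, and LDA
• SparkR launches and is integrated with MLlib
Streaming
• Support for more data sources
GraphX
• Optimization and API stabilization
Features contributed by industry
Stopping MapReduce, moving to Spark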
Spark meetup in China
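• August 9, 2014 @ Beijing (1st): Intel, AsiaInfo, Databricks
• August 31, 2014 @ Hangzhou: Huawei, Alibaba
• September 6, 2014 @ Beijing (2nd): traintracks.io, Microsoft, JD.com
• September 21, 2014 @ Shenzhen: Huawei, Tencent
• October 26, 2014 @ Beijing (3rd): Intel, Alibaba, Microsoft, Meituan, NJU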
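On August 9, the first Spark-User Beijing Meetup was successfully held in the R&D center building at AsiaInfo's headquarters. The event drew 121 participants from 32 companies, universities, and financial institutions, including Baidu, Sina, JD.com, Tibco, Wandoujia, Douban, Weibo, Xiaomi, Huawei, iQIYI, Meituan, 58.com, 海星, Sogou, CBSI, 神舟泰岳, Datang Telecom, Talking Data, 安达佳, TravelSky (中航信), Tsinghua University, Beijing University of Posts and Telecommunications, and the banking sector.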
A single spark can start a prairie fire
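Thesis Work
Progress
• Read several PhD dissertations on stream data mining and gained a basic understanding of its challenges and common solutions
• Looked up introductory material on Storm, a relatively mature stream processing system in industry
• Read the documentation of MOA (Massive Online Analysis), a stream data mining tool, and tested some examples
Plan
• Study stream data mining algorithms in depth
• Complete the experiments and writing for a short paper
• Settle on the specific topic of the graduation thesis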
Thank You!