Streaming architecture zx_dec2015

Page 1: Streaming architecture zx_dec2015

From Zero to Hundreds of Billions of Real-Time Events: A Brief Talk on "Streamifying" Your Application Architecture

Page 2: Streaming architecture zx_dec2015

– https://netflix.github.io/

– http://www.oschina.net/project/netflix

Links

Page 3: Streaming architecture zx_dec2015

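• Netflix's platform for processing hundreds of billions of events per day
• A brief history of big data technology
• A deep dive into streaming architecture and its technical foundations
• Why your app can go "streaming" too!

Topics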

Page 4: Streaming architecture zx_dec2015

Let's recall:

How would you implement a hash table using only the most basic data structures?

Before we start...

Page 5: Streaming architecture zx_dec2015

Let's think about this:

Why do "some people" always tell you not to use global variables?

Before we start...

Page 6: Streaming architecture zx_dec2015

● Processes 700 billion events / 1+ PB of data per day
● Peaks at 10 million events / 20+ GB per second
● 3000+ Kafka brokers, 12 clusters in 3 regions
● 10,000+ Docker containers deployed

We help Produce, Store, Process, Move Events @ Cloud scale

Netflix Keystone Pipeline

Page 7: Streaming architecture zx_dec2015

Keystone Architecture

[Architecture diagram; labeled components: Event Producer, KSProxy, Fronting Kafka, Samza Router, Consumer Kafka, Stream Consumers, EMR, Control Plane]

Page 8: Streaming architecture zx_dec2015

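● Horizontally scalable architecture
● Built entirely on AWS cloud infrastructure
● At-least-once delivery guarantee
● Tolerates back pressure and unstable underlying cloud services
● Sink level isolation
● Supports failover both within a data center and across intercontinental data centers
● High availability, scalability & durability
● Streaming Architecture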

Netflix Keystone Pipeline
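As an illustration of what an at-least-once guarantee implies for consumers, here is a minimal sketch. It is not Keystone's implementation: the function `consume`, the `crash_before_commit_at` parameter, and the message IDs are all hypothetical. The offset is committed only after a message has been processed, so a crash between the two steps causes redelivery, and the consumer has to tolerate duplicates (here by keying on a message ID).

```python
def consume(messages, committed_offset, sink, crash_before_commit_at=None):
    """Process messages starting at committed_offset; commit only after processing.

    Returns the new committed offset. When a crash is simulated at some offset,
    that message has been processed but not committed, so it is redelivered
    on the next run.
    """
    offset = committed_offset
    while offset < len(messages):
        sink(messages[offset])             # step 1: process the message
        if offset == crash_before_commit_at:
            return committed_offset        # simulated crash before the commit
        offset += 1
        committed_offset = offset          # step 2: commit progress
    return committed_offset


messages = [("id0", "a"), ("id1", "b"), ("id2", "c")]
delivered = []

# First run crashes after processing the message at offset 1.
committed = consume(messages, 0, delivered.append, crash_before_commit_at=1)
# Restart from the last committed offset: offset 1 is delivered a second time.
committed = consume(messages, committed, delivered.append)

# An idempotent sink (keyed by message ID) absorbs the duplicate.
deduplicated = {}
for message_id, payload in delivered:
    deduplicated[message_id] = payload
```

Nothing is lost, but `delivered` contains `id1` twice; only the ID-keyed dictionary ends up with exactly one entry per message.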

Page 9: Streaming architecture zx_dec2015

Big Data History

Why use a Streaming Architecture?

Page 10: Streaming architecture zx_dec2015

Big Data History

Page 11: Streaming architecture zx_dec2015

Big Data History

Page 12: Streaming architecture zx_dec2015

Big Data History

Page 13: Streaming architecture zx_dec2015

Real-World Demand for Streaming Data

● Explosive data growth

Page 14: Streaming architecture zx_dec2015

Real-World Demand for Streaming Data

● Explosive data growth
● Changing requirements for data processing patterns

Page 15: Streaming architecture zx_dec2015

How do you implement a hash table?

Page 16: Streaming architecture zx_dec2015

The textbook says:

How do you implement a hash table?
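A common textbook answer, assumed here because the slide text does not spell one out, is separate chaining: an array of buckets, each holding a list of key-value pairs. A minimal Python sketch follows; the class name `ChainedHashTable` and its methods are illustrative only.

```python
class ChainedHashTable:
    """Hash table built from plain lists: an array of buckets with chaining."""

    def __init__(self, num_buckets=8):
        self._buckets = [[] for _ in range(num_buckets)]
        self._size = 0

    def _bucket(self, key):
        # Map the key's hash onto one of the buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._resize()                 # keep the chains short

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def _resize(self):
        # Double the bucket array and redistribute every entry.
        items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for key, value in items:
            self._buckets[hash(key) % len(self._buckets)].append((key, value))

    def __len__(self):
        return self._size
```

Note that `put` mutates buckets in place, which is the design choice that later slides contrast with an append-only log.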

Page 17: Streaming architecture zx_dec2015

How do you implement a hash table?

Page 18: Streaming architecture zx_dec2015

How do you implement a hash table?

Page 19: Streaming architecture zx_dec2015

How do you implement a hash table?

Page 20: Streaming architecture zx_dec2015

Commit log

The commit log is at the core of many distributed systems:
● Database Replication
● Paxos Consensus
● Kafka
● … …
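To make the idea concrete, here is a minimal in-memory sketch of a commit log: an append-only sequence of records addressed by offset, from which any reader can rebuild key-value state by replaying. The names `CommitLog`, `append`, `read_from`, and `replay` are hypothetical and do not correspond to Kafka's API or to any real database's replication protocol.

```python
class CommitLog:
    """Append-only log; records are never modified and are identified by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Yield (offset, record) pairs starting at the given offset."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


def replay(log, from_offset=0, state=None):
    """Apply ('set' | 'delete', key, value) records in order.

    Returns the resulting state and the next offset to read from, so a
    reader can resume later without reprocessing earlier records.
    """
    state = {} if state is None else state
    next_offset = from_offset
    for offset, (op, key, value) in log.read_from(from_offset):
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        next_offset = offset + 1
    return state, next_offset


log = CommitLog()
log.append(("set", "a", 1))
log.append(("set", "b", 2))
log.append(("set", "a", 3))
log.append(("delete", "b", None))
state, next_offset = replay(log)
```

After the replay, `state` holds only the latest value per key, while the log still contains the full history of changes.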

Page 21: Streaming architecture zx_dec2015

1. Traditional Application Architecture: Starting from Zero

Page 22: Streaming architecture zx_dec2015

1. Traditional Application Architecture: Starting from Zero

Page 23: Streaming architecture zx_dec2015

2. Traditional Application Architecture - Scale up DB!

Page 24: Streaming architecture zx_dec2015

3. Traditional Application Architecture - Caching!

res = cache.get(key)
if (!res) {
    // Cache miss: read from the database and populate the cache
    res = db.get(key)
    cache.put(key, res)
}
return res;

Page 25: Streaming architecture zx_dec2015

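3. Traditional Application Architecture - Caching!

Hard problems of distributed caching:
● Cache coherence
● Cache invalidation
● Consistency issues
● Cold start / bootstrapping

Why?
● Network latency in a distributed system is always greater than zero
● Race conditions
● The source of truth and what the client sees can always be inconsistent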
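The race condition can be illustrated deterministically. The sketch below uses plain dictionaries as stand-ins for the database and the cache (both hypothetical) and replays, by hand, one unlucky interleaving of a cache-aside reader and a writer.

```python
db = {"user:1": "old"}   # stand-in for the source of truth
cache = {}               # stand-in for the cache layer

# Reader: cache miss, so it reads the current value from the database...
value_seen_by_reader = db["user:1"]

# Writer: meanwhile updates the database and invalidates the cache entry.
db["user:1"] = "new"
cache.pop("user:1", None)

# Reader: ...and only now populates the cache with the value it read earlier.
cache["user:1"] = value_seen_by_reader

# The cache now disagrees with the source of truth until the entry
# expires or is invalidated again.
stale = cache["user:1"] != db["user:1"]
```

Each step is correct in isolation; the inconsistency comes purely from the ordering, which non-zero network latency makes possible.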

Page 26: Streaming architecture zx_dec2015

3. Traditional Application Architecture - Caching!

Page 27: Streaming architecture zx_dec2015

4. Traditional Application Platform Architecture - multi-layered!

Page 28: Streaming architecture zx_dec2015

4. Traditional Application Architecture - multi-layered!

Reconciliation protocols between layered components:
● Eventual consistency
● Polling protocol
● Materialized view
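A minimal sketch of how these three ideas fit together, under the assumption of an upstream layer that exposes changes by position. The classes `OrderStore` and `TotalsView` are hypothetical. The downstream layer keeps a materialized view (total per customer) and reconciles by polling, so it is only eventually consistent with the upstream data.

```python
class OrderStore:
    """Upstream layer: an append-only list of (customer, amount) orders."""

    def __init__(self):
        self._orders = []

    def add(self, customer, amount):
        self._orders.append((customer, amount))

    def since(self, position):
        """Return the orders added at or after the given position."""
        return self._orders[position:]


class TotalsView:
    """Downstream layer: a materialized view refreshed by polling."""

    def __init__(self, store):
        self._store = store
        self._position = 0      # how far into the store this view has read
        self.totals = {}        # customer -> running total

    def poll(self):
        """Fetch new orders, fold them into the view, return how many were applied."""
        new_orders = self._store.since(self._position)
        for customer, amount in new_orders:
            self.totals[customer] = self.totals.get(customer, 0) + amount
        self._position += len(new_orders)
        return len(new_orders)


store = OrderStore()
view = TotalsView(store)
store.add("alice", 10)
store.add("bob", 5)
totals_before_poll = dict(view.totals)   # still empty: the view lags behind
view.poll()
totals_after_poll = dict(view.totals)
```

Between polls the view is stale by design; the polling interval bounds how long the layers may disagree.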

Page 29: Streaming architecture zx_dec2015

Why must the database that holds state always sit at the very bottom of the architecture?

Page 30: Streaming architecture zx_dec2015

Introducing Streaming Architecture

Page 31: Streaming architecture zx_dec2015

Introducing Streaming Architecture

Page 32: Streaming architecture zx_dec2015

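Characteristics of streaming architecture:
● Data immutability
● Ordering may matter
● Real time & Reactive
● Request / Response ⇒ Subscribe / Notify
● Precomputed caches
● Streaming architectures compose iteratively
● The same data stream can produce different materialized views
● Delivery guarantee
● Stream everywhere!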
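Several of these properties can be sketched in a few lines: immutable events, ordered delivery, subscribe/notify instead of request/response, and two different materialized views derived from the same stream. The `Stream` class and the two view functions are hypothetical and purely in-memory; a production system would use a durable log such as Kafka.

```python
class Stream:
    """In-memory stream: events are appended once and pushed, in order, to every subscriber."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def subscribe(self, callback, replay=True):
        """Register a callback; optionally replay past events so it can catch up."""
        if replay:
            for event in self._events:
                callback(event)
        self._subscribers.append(callback)

    def publish(self, event):
        self._events.append(event)
        for callback in self._subscribers:
            callback(event)


views_per_page = {}      # materialized view 1: page -> number of views
last_page_by_user = {}   # materialized view 2: user -> most recent page


def count_views(event):
    _user, page = event
    views_per_page[page] = views_per_page.get(page, 0) + 1


def track_last_page(event):
    user, page = event
    last_page_by_user[user] = page   # ordering matters: the last event wins


stream = Stream()
stream.subscribe(count_views)
stream.publish(("u1", "/home"))      # events are immutable tuples
stream.publish(("u2", "/home"))
stream.subscribe(track_last_page)    # late subscriber catches up via replay
stream.publish(("u1", "/cart"))
```

Both dictionaries act as precomputed caches: readers look up answers that were computed when the events arrived rather than querying at request time.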

Page 33: Streaming architecture zx_dec2015

Core Implementation Details

+

* Samza can be replaced by other stream processing frameworks.

Page 34: Streaming architecture zx_dec2015

Core Implementation Details

Why are Docker and stream processing a natural match?

Page 35: Streaming architecture zx_dec2015

Core Implementation Details

Page 36: Streaming architecture zx_dec2015

Streaming Architecture

Questions?