Towards an adaptive and eventually self-healing framework for geo-distributed real-time data ingestion

Angad Singh, InMobi
The problem domain
Scale
● 15 billion events per day (post filtering)
● 1.5+ billion users, 200 million per day
● 4 geographically distributed data centers (DCs)
● a user's request may land on a non-local DC

Ingestion requirements
● multiple tenants, multiple schemas per tenant
● batch, stream, micro-batch and on-demand ingestion
● 20+ streams, 100+ data types
● need to ingest, transform, validate and aggregate this data
● need to ingest streaming data in real time (<1 min) for ad-serving/targeting use cases (strict SLA)
The problem domain
Usage/serving requirements
● need to pivot this data by user, activity type and other primary keys
● serve an aggregated view (profile) at the end in <5ms p99 latency
● need both real-time serving of the view
● as well as batch summaries for analytics, inference algorithms, feedback loops
● need to be resilient to failure; absolutely no room for data loss/lag in ingestion

Data arrival, volume and velocity
● data may be received out of order, or duplicated
● data can arrive in periodic batches, in real time/streaming, or once in a while
● data may arrive in bursts or trickle slowly in some streams (autoscale)
● user data may be received in any DC, but needs to be collectively available in a single DC
The problem domain
Multi-tenancy
● Quotas
● Rate limiting/SLAs (a rate-limiting sketch follows this list)
● Isolation

Manageability
● need to be self-serve, flexible for specific changes in the flow, easily deployable
● may need online migration, reprocessing, etc. of data
● hassle-free schema evolution across the stack
● monitoring, visibility and operability for all of the above
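The per-tenant quota and rate-limiting requirement could be met with token buckets. A minimal sketch, assuming Guava's RateLimiter; the tenant ids, quota values and method names are invented for illustration, since the deck does not show the framework's actual mechanism:

import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-tenant rate limiting via token buckets (Guava RateLimiter).
public class TenantRateLimiter {
    private final ConcurrentHashMap<String, RateLimiter> limiters = new ConcurrentHashMap<>();

    // eventsPerSecond would come from the tenant's configured quota/SLA
    public void registerTenant(String tenantId, double eventsPerSecond) {
        limiters.put(tenantId, RateLimiter.create(eventsPerSecond));
    }

    // Returns false when the tenant is over quota; the caller can then
    // shed, buffer, or demote the event to batch ingestion.
    public boolean tryIngest(String tenantId) {
        RateLimiter limiter = limiters.get(tenantId);
        return limiter != null && limiter.tryAcquire();
    }
}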
The architecture
[Diagram: the serving layer is an Aerospike cluster (user store) fronted by an API with rate limiting/quotas; incoming writes are deduped, aggregated and passed through business rules, and ad serving reads the result in <5ms at 99.95% success. The user store emits notifications to pubsub (Kafka), consumed by notification listeners (Storm) for real-time enrichment on user engagement, and takes periodic dumps into an offline snapshot store (HDFS) that feeds batch inference jobs (MR/Spark) and an analytics engine (cubes, lens). Beneath it, the ingestion layer chains adaptors → routers → sinks (MR/Storm) over upstream batch and streaming ingestion sources, spanning local, remote and global DCs, all orchestrated/managed by the Ingestion Service.]
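To make the adaptor → router → sink chain concrete, here is a minimal Storm topology skeleton, assuming 0.9.x-era backtype.storm packages; AdaptorSpout, RouterBolt and SinkBolt are hypothetical stand-ins for the pluggable components the Ingestion Service would wire up from a flow definition:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class IngestionFlowTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // adaptor: pulls raw events from an upstream source (e.g. a Kafka topic)
        builder.setSpout("adaptor", new AdaptorSpout(), 4);
        // router: validates/transforms events and decides where they should go
        builder.setBolt("router", new RouterBolt(), 8)
               .shuffleGrouping("adaptor");
        // sink: writes to the serving store / HDFS; fields grouping keeps one
        // user's events on the same sink task
        builder.setBolt("sink", new SinkBolt(), 8)
               .fieldsGrouping("router", new Fields("userid"));

        Config conf = new Config();
        conf.setNumWorkers(4);
        StormSubmitter.submitTopology("ingestion-flow", conf, builder.createTopology());
    }
}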
Architecture
[Diagram: DC1 (global) runs adaptors and routers on MR; DC2 and DC3 (slaves) run them on Storm. Each DC holds user-colo metadata in Aerospike, kept in sync here by custom replication. Incoming data contains a userid, and sinks resolve its home DC with getColo(userid). A global colo tagger (Storm) handles lookups: if a tag is found, the data already has an owning DC; if not, the tagger writes a new tag. Tagged data is shipped to its owning DC by a Kafka-data-replicator (Storm topology). Each DC's user store exposes an API over History and Profile data, with profiles replicated across DCs via XDR.]
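The tagging decision in the diagram reduces to a read-check-write against the user-colo metadata. A sketch of that logic, where ColoMetadataStore, getColo and writeColo are assumed names standing in for the Aerospike-backed lookup, not the real API:

public class ColoTagger {
    interface ColoMetadataStore {
        String getColo(String userId);              // null => tag not found
        void writeColo(String userId, String colo); // "write tag"
    }

    private final ColoMetadataStore store;
    private final String localColo;                 // e.g. "dc1"

    ColoTagger(ColoMetadataStore store, String localColo) {
        this.store = store;
        this.localColo = localColo;
    }

    // Returns the DC that owns this user; assigns the local DC on first sight.
    public String tag(String userId) {
        String colo = store.getColo(userId);        // "tag found" path
        if (colo == null) {
            colo = localColo;                       // first sighting: adopt this DC
            store.writeColo(userId, colo);          // replicated to the other DCs
        }
        // the Kafka-data-replicator topology then ships tagged data to `colo`
        return colo;
    }
}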
Cross-DC architecture
[Diagram: same topology as above, except the user-colo metadata in Aerospike is replicated across DCs via XDR rather than the custom replication path; the tagger flow (getColo(userid), tag found / tag not found / write tag), the Kafka-data-replicator (Storm topology) and the per-DC user stores (API, History, Profile, XDR for profiles) are unchanged.]
Comparison to map-reduce

[Diagram mapping the ingestion pipeline onto map-reduce stages: Mapper, Partitioner, Shuffler, Reducer.]
The ingestion layer
Current Features
Business-agnostic APIs
● Built on simple RESTful APIs: Schema, Feed, Sink, Source, Flow, Driver, Data router, Adaptor
● Unified APIs for batch, streaming and micro-batch ingestion
● Self-serve system that provides rule validation, metrics, etc., and makes expressing sources, sinks and flows easy with a custom DSL (a hypothetical registration call is sketched below)
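Since the APIs are RESTful, registering a flow could look roughly like the following; the endpoint, host and JSON shape are invented for illustration and are not the framework's actual contract:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FlowRegistration {
    public static void main(String[] args) throws Exception {
        // Hypothetical flow spec tying a registered source and sink to a schema version
        String flowSpec = "{"
            + "\"name\": \"clicks-to-userstore\","
            + "\"source\": \"clicks-stream\","
            + "\"sink\": \"user-store\","
            + "\"schema\": \"click-event:v3\""
            + "}";
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://ingestion-service/api/flows").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(flowSpec.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Flow API responded: " + conn.getResponseCode());
    }
}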
Platform-agnostic Flow Execution
● Pluggable execution engine (storm, hadoop, spark) - provides a Driver API
● Uses falcon for batch scheduling, in-built scheduler for streaming drivers (storm, etc.)
Serialization support
● Pluggable schema serde support (thrift, avro)
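A pluggable serde layer typically reduces to a small contract like the one below; this interface is a sketch of what such a contract could look like, not the framework's actual API:

import java.io.IOException;

// Hypothetical serde contract; thrift and avro would each provide an implementation.
public interface SchemaSerde<T> {
    byte[] serialize(T record) throws IOException;
    T deserialize(byte[] payload) throws IOException;
    String schemaId();   // ties the payload back to a registered schema version
}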
Current Features
Schema management
● Schema is a first-class citizen
● Contracts between source, sink and flow are all based on and validated against schema
● Schema versioning and compatibility checks (illustrated after this list)
● Error-free schema evolution across data flows
● Clean abstractions to centrally manage all the schemas, data sources/feeds, sinks (key-value store, HDFS, etc.) and data flows (storm topologies, MR jobs) that are part of the ingestion pipelines
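For the avro case, version-compatibility checks can lean on Avro's built-in checker. A minimal sketch of such a gate (the framework's own check isn't shown in the deck):

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

public class SchemaGate {
    // True if data written with writerSchema can still be read with readerSchema,
    // i.e. the proposed evolution is backward compatible.
    public static boolean canEvolve(Schema readerSchema, Schema writerSchema) {
        SchemaCompatibility.SchemaPairCompatibility result =
                SchemaCompatibility.checkReaderWriterCompatibility(readerSchema, writerSchema);
        return result.getType() == SchemaCompatibilityType.COMPATIBLE;
    }
}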
Manageability, operability
● All entities - schemas, sinks and flows - can be updated online without any downtime
● Retries, error handling, metrics, orchestration hooks, etc. come standard
Out-of-the-box support for
● Cross-colo flow chaining
● Data routing
● Transformation, validation, conversion
● All based on pluggable code
The problems we’ve seen
Storm
● as usual, a lot of knobs to tune based on a lot of metrics: workers, threads, tasks, acks, max spout pending, buffer sizes, xmx, num slots, execute/process/ack latency, capacity, etc. (see the config sketch after this list)
● debugging storm topologies isn't easy: threads, workers, shared logs, shuffling of data between workers, netty, the ack system, etc.
● storm (0.9.x) doesn't like heterogeneous load: unbalanced distribution between supervisors; heavy topologies can choke each other; rebalancing is not fully resource-aware (1.x tries to solve this)
● no rolling upgrades; supervisor failures cause unrecoverable errors
● zookeeper issues: too many executors leads to worker heartbeat update failures to zk
● storm-kafka issue: the storm-kafka spout is unaware of purging (earliestOffset update)
● storm-kafka issue: invisible data loss
● retries should be done cautiously
● etc.
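For reference, several of these knobs map onto a 0.9.x-era backtype.storm Config as shown below; the values are placeholders for illustration, not tuning recommendations:

import backtype.storm.Config;

public class TuningKnobs {
    static Config tunedConfig() {
        Config conf = new Config();
        conf.setNumWorkers(4);                                         // workers / num slots used
        conf.setNumAckers(2);                                          // the ack system
        conf.setMaxSpoutPending(1000);                                 // caps in-flight tuples
        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx2g");          // xmx per worker
        conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384); // buffer sizes
        conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 60);            // replay/ack timeout
        return conf;
    }
}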
Kafka
● topic deletion asynchronous, slow
● tuning num partitions manually
● bad consumers can cause excessive logging on brokers
Features under development
● Autoscaling flows - rebalance a storm topology based on spout lag, priority and current throughput (or bolt capacity), driven by runtime metrics or linear regression on historical metrics (decision loop sketched after this list)
● Streaming and batch compaction/dedup of data based on domain-specific rules
● Automatic fallback from streaming to batch ingestion in case of huge backlogs, for low-priority ingestions
● Dynamic rerouting/sharding of data between DCs for load balancing cross-DC flows
● Eventual self-correction of data based on validations on the aggregated view (data received from multiple streams)
● Data lineage/auditing
● Backfill management
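A sketch of the lag-based autoscaling decision. Metrics and Rebalancer are hypothetical seams: the first stands in for the metric source (e.g. Kafka log-end offsets minus spout-committed offsets), the second for a call to Storm's rebalance facility; the thresholds and doubling/halving policy are placeholders:

public class FlowAutoscaler {
    interface Metrics { long fetchSpoutLag(String topology); }
    interface Rebalancer { void requestRebalance(String topology, int executors); }

    private final Metrics metrics;
    private final Rebalancer rebalancer;

    FlowAutoscaler(Metrics metrics, Rebalancer rebalancer) {
        this.metrics = metrics;
        this.rebalancer = rebalancer;
    }

    // Periodically called per flow: scale out when the backlog grows,
    // scale back in when the spout has fully caught up.
    void check(String topology, int currentExecutors, long lagThreshold) {
        long lag = metrics.fetchSpoutLag(topology);
        if (lag > lagThreshold) {
            // scale out with the backlog (in practice capped by cluster slots)
            rebalancer.requestRebalance(topology, currentExecutors * 2);
        } else if (lag == 0 && currentExecutors > 1) {
            rebalancer.requestRebalance(topology, currentExecutors / 2);
        }
    }
}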