Reliable Multicast for Time-Critical Systems Mahesh Balakrishnan Ken Birman Cornell University


Reliable Multicast for Time-Critical Systems

Mahesh Balakrishnan, Ken Birman

Cornell University


Mission-Critical Datacenters

COTS Datacenters
  Online e-tailers, search engines, corporate applications
  Web-services

Mission-Critical Apps
  Need: Scalability, Availability, Fault-Tolerance
  … Timeliness!


The Time-Critical Datacenter

Migrating time-critical applications to commodity datacenters…

… conversely, providing datacenter web-services with time-critical performance.


What’s a Time-Critical System?

Not ‘real time’, but ‘real fast’!

Financial calculators, military command and control… air traffic control (ATC)

… foobooks.com!

Technology Gap: Real-Time focuses on determinism, scale-up architectures


The French ATC System

Mid to Late 90’s

Teams of 3-5 air traffic controllers on a cluster of desktop consoles

50-200 of these console clusters in an air traffic control center

Why study the French ATC?


ATC Subsystems

Radar Image
Weather Alert
Track Updates
Updates to Flight Plans
Console to Console State Updates
System Management and Monitoring
ATC center to center Updates

Multicast ubiquitous…


Two Kinds of Multicast

Virtually Synchronous Multicast: very reliable, not particularly fast

Unreliable Multicast: very fast, not particularly reliable

Nothing in between!


Two Kinds of Subsystems

Category 1: Complete reliability (virtual synchrony), e.g. routing decisions

Category 2: Careful application design + natural hardware properties + management policies, e.g. radar


Multicast in the French ATC

Engineering Lessons:
  Structure application to tolerate partial failures
  Exploit natural hardware properties

Can we generalize to modern systems?

Research Direction: Time-Critical Reliability
  Can we design communication primitives that encapsulate these lessons?


Anatomy of a Cloned Service

RACS

Updates multicast to whole group

Queries unicast to single nodes
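The update/query split on this slide can be sketched in a few lines. This is only an in-process illustration: the class and method names (`Replica`, `ClonedService`, `update`, `query`) are hypothetical, and the loop over replicas stands in for a real network multicast.

```python
import random


class Replica:
    """One clone of the service, holding a full copy of the state."""

    def __init__(self):
        self.state = {}

    def apply_update(self, key, value):
        self.state[key] = value

    def query(self, key):
        return self.state.get(key)


class ClonedService:
    """Hypothetical stand-in for a group of identical replicas."""

    def __init__(self, n_replicas, rng=None):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.rng = rng or random.Random()

    def update(self, key, value):
        # Stands in for a multicast: every replica receives the update.
        for replica in self.replicas:
            replica.apply_update(key, value)

    def query(self, key):
        # Stands in for a unicast: exactly one replica answers.
        return self.rng.choice(self.replicas).query(key)


svc = ClonedService(3)
svc.update("sku-1", 7)
print(svc.query("sku-1"))
```

Because any replica may answer a query, a replica that misses an update serves stale data, which is why the reliability and speed of the update multicast matter.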


Services

An Amazon web-page is constructed by 100s of co-operating services*

Multicast is used for:
  Updating Cloned Services
  Publish-Subscribe / Eventing
  Datacenter Management/Monitoring

* Werner Vogels, CTO of amazon.com, at SOSP 2005


Multicast in the Datacenter

A node is in many multicast groups:
  One for each service it hosts
  One for each topic it subscribes to
  One or more administration groups

Large Numbers of Overlapping Groups!


Service Semantics

Product Popularity Service

Shipping Scheduler

Store Inventory

User History Service

Product Recommendations

User Profile Data

Data Store Services: stale data can result in overselling / underselling, a loss of real-world dollars

Cache Services: updated periodically by back-end data-stores


The Challenge

Datacenter Blades are failure-prone:
  Crash failures
  Byzantine behavior
  Bursty Packet Loss: end-host kernels drop packets when subjected to traffic spikes.


A New Reliability Model

Rapid delivery is more important than perfect reliability

Probabilistic Timeliness

Graceful Degradation
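One way to read "probabilistic timeliness" is as a statement about the distribution of delivery latencies rather than a hard bound. The sketch below shows that reading under stated assumptions: the helper names are hypothetical, and the sample latencies are made up for illustration, not measurements from any system.

```python
def fraction_within(latencies_ms, deadline_ms):
    """Empirical probability that a message is delivered by the deadline."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    on_time = sum(1 for x in latencies_ms if x <= deadline_ms)
    return on_time / len(latencies_ms)


def percentile(latencies_ms, p):
    """Nearest-rank percentile for an integer p with 0 < p <= 100."""
    ordered = sorted(latencies_ms)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up latency samples in milliseconds, for illustration only.
samples = [5, 8, 12, 20, 35, 40, 55, 70, 90, 400]
print(fraction_within(samples, 50))
print(percentile(samples, 90))
```

An application designed this way would pick its timeout from the curve (for example, a high percentile) instead of from a conservative worst-case limit.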


Wanted: a multicast primitive that

1. Scales to large numbers of arbitrarily overlapping multicast groups

2. Delivers multicasts quickly

3. Tolerates datacenter failure modes – bursty packet loss, node failures

4. Offers probabilistic properties

5. ‘Gives up’ on lost data after a threshold period
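Requirement 5 can be illustrated with a small bookkeeping structure: track each missing sequence number from the moment the gap is noticed, and abandon it once a threshold has elapsed. This is a minimal sketch, not the mechanism of any particular protocol; `LossTracker`, its methods, and the 100 ms threshold are all hypothetical.

```python
class LossTracker:
    """Tracks missing sequence numbers and abandons them after a deadline."""

    def __init__(self, give_up_after_ms):
        self.give_up_after_ms = give_up_after_ms
        self.missing = {}  # seq -> time (ms) at which the gap was first noticed

    def note_missing(self, seq, now_ms):
        # Keep the earliest detection time if the gap is reported twice.
        self.missing.setdefault(seq, now_ms)

    def note_recovered(self, seq):
        self.missing.pop(seq, None)

    def expire(self, now_ms):
        """Return, in order, the sequence numbers we stop waiting for."""
        expired = sorted(
            seq for seq, seen in self.missing.items()
            if now_ms - seen >= self.give_up_after_ms
        )
        for seq in expired:
            del self.missing[seq]
        return expired


tracker = LossTracker(give_up_after_ms=100)
tracker.note_missing(5, now_ms=0)
tracker.note_missing(7, now_ms=80)
print(tracker.expire(now_ms=100))
```

Giving up bounds how long the application can be stalled by a single lost message; what the application does about the abandoned data is left to its own design.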


Ricochet: Lateral Error Correction

Receivers exchange error correction XORs of multicast traffic

Works very well with multiple groups: scales up to a thousand groups per node

Probabilistic Timeliness: probability distribution of delivery latencies
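The core idea of exchanging XORs can be sketched as follows: one receiver XORs together several packets it has received and sends the result to a peer, and a peer that is missing exactly one of those packets can rebuild it from the XOR and the packets it does hold. This is a minimal sketch of XOR-based repair only. Ricochet's actual packet format, how it chooses which packets to combine, and how it handles multiple groups are not described on this slide, and all function names here are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\x00"), b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets: dict[int, bytes]) -> tuple[frozenset[int], bytes]:
    """Build a repair: the packet ids it covers plus the XOR of their payloads."""
    return frozenset(packets), reduce(xor_bytes, packets.values(), b"")


def recover(repair, received: dict[int, bytes]):
    """Rebuild the single missing packet covered by a repair, if possible.

    Returns (packet_id, payload), or None when zero or more than one of the
    covered packets is missing. Payloads are assumed to be of equal length;
    otherwise the recovered payload carries zero padding.
    """
    ids, xor = repair
    missing = [i for i in ids if i not in received]
    if len(missing) != 1:
        return None
    for i in ids:
        if i in received:
            xor = xor_bytes(xor, received[i])
    return missing[0], xor


# Receiver A got all three packets and sends a repair to receiver B.
at_a = {1: b"aaaa", 2: b"bbbb", 3: b"cccc"}
repair = make_repair(at_a)

# Receiver B lost packet 2 and recovers it from the repair.
at_b = {1: b"aaaa", 3: b"cccc"}
print(recover(repair, at_b))
```

A single XOR repairs only one loss among the packets it covers, so a burst that drops two covered packets is not recoverable from that repair alone.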


Predictive Total Ordering (Plato)

Delivers messages to applications with no ordering delay in most cases

Orders messages only if there is a high probability of out-of-order delivery across different nodes

Probabilistic Timeliness: probability distribution of ordered delivery latency


Performance

SRM takes seconds to recover lost packets

Ricochet recovers almost all packets within ~70 milliseconds


Conclusion

Move from R/T to T/C yields huge benefits!

Ricochet is faster… slashes latency… scalable…

Clean delivery delay curve is a powerful design tool, replacing traditional hard (but conservative) limits

We’re open for business:
  Software and detailed paper available for download
  Give it a try… tell us what you think!

www.cs.cornell.edu/projects/quicksilver/ricochet.html