Ariel Tseitlin Chaos Monkey & The Simian Army

AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Page 1: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Ariel Tseitlin

Chaos Monkey & The Simian Army

Page 2: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

About Netflix

With more than 30 million streaming members in the United States, Canada, Latin America, the United Kingdom, Ireland and the Nordics, Netflix, Inc. (NASDAQ: NFLX) is the world's leading internet subscription service for enjoying movies and TV programs [1].

[1] http://ir.netflix.com/

Page 3: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Personalization Engine

User Info

Movie Metadata

Movie Ratings

Similar Movies

API

Reviews

A/B Test Engine

2B requests per day into the Netflix API

12B outbound requests per day to API dependencies
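As a quick sanity check on the two request figures above, their ratio gives the average fan-out per inbound API request. This is only a back-of-the-envelope sketch: the variable names are illustrative, the inputs are the rounded numbers from the slide, and the per-second rates are daily averages, not peak load.

```python
# Rounded figures from the slide; the variable names are illustrative.
inbound_per_day = 2_000_000_000     # requests into the Netflix API
outbound_per_day = 12_000_000_000   # requests from the API to its dependencies

SECONDS_PER_DAY = 24 * 60 * 60

# Average number of dependency calls triggered by one inbound request.
fan_out = outbound_per_day / inbound_per_day

# Daily averages only; real traffic peaks well above the mean.
inbound_per_second = inbound_per_day / SECONDS_PER_DAY
outbound_per_second = outbound_per_day / SECONDS_PER_DAY

print(f"average fan-out: {fan_out:.0f} dependency calls per API request")
print(f"average inbound rate: {inbound_per_second:,.0f} requests/s")
print(f"average outbound rate: {outbound_per_second:,.0f} requests/s")
```

On these numbers each inbound request fans out to about six dependency calls on average, which is why a slow or failing dependency matters so much to the API tier.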

Page 4: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

A complex distributed system

Page 5: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Growth is good (and scary)

30x growth in two years!

Page 6: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Growth is good (and scary)

Page 7: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Things will break

Page 8: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army
Page 9: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Chaos Monkey taught us…

• State is bad

• Clusters are good

• Surviving instance failure is a low bar
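The lessons above can be illustrated with a toy simulation. This is a minimal sketch and not Netflix's Chaos Monkey: the `Cluster` class, the `unleash_chaos` function, and the instance names are all hypothetical, and instances are modeled as plain strings in memory. It terminates one random instance per cluster and then checks which clusters can still serve.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for a group of interchangeable, stateless instances."""
    name: str
    instances: set[str] = field(default_factory=set)

    def is_serving(self) -> bool:
        # The cluster keeps serving as long as at least one instance is alive.
        return len(self.instances) > 0


def unleash_chaos(clusters: list[Cluster], rng: random.Random) -> list[tuple[str, str]]:
    """Terminate one randomly chosen instance in every non-empty cluster."""
    terminated = []
    for cluster in clusters:
        if not cluster.instances:
            continue
        # Sort first so the choice depends only on the seed, not on set ordering.
        victim = rng.choice(sorted(cluster.instances))
        cluster.instances.remove(victim)
        terminated.append((cluster.name, victim))
    return terminated


clusters = [
    Cluster("api", {"api-1", "api-2", "api-3"}),
    Cluster("ratings", {"ratings-1"}),  # a "cluster" of one: no redundancy
]
killed = unleash_chaos(clusters, random.Random(0))

for cluster in clusters:
    status = "still serving" if cluster.is_serving() else "DOWN"
    print(f"{cluster.name}: {status}")
```

The three-instance cluster survives the loss of any one member, while the single-instance service goes down, which is the sense in which clusters are good and a lone instance is a liability.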

Page 10: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

The Sick and Wounded

Page 11: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Latency Monkey

Page 12: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Latency Monkey taught us…

• Startup resiliency is often missed

• An ongoing unified approach to runtime dependency management is important (visibility & transparency get missed otherwise)

• Know thy neighbor (unknown dependencies)
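One way to picture latency injection is to wrap a dependency call in an artificial delay and see whether the caller degrades gracefully. The sketch below is an assumption-laden illustration, not how Latency Monkey is implemented: `with_injected_latency`, `call_with_fallback`, and `fetch_similar_movies` are hypothetical names, the dependency is a local function instead of a network call, and the delays are kept short so the example runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def with_injected_latency(func, delay_seconds):
    """Wrap a dependency call so that every invocation is artificially delayed."""
    def slow(*args, **kwargs):
        time.sleep(delay_seconds)
        return func(*args, **kwargs)
    return slow


def call_with_fallback(func, timeout_seconds, fallback, pool):
    """Call a dependency, returning `fallback` if it does not answer in time."""
    future = pool.submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return fallback


def fetch_similar_movies():
    # Hypothetical dependency; a real one would be a remote service call.
    return ["movie-a", "movie-b"]


with ThreadPoolExecutor(max_workers=2) as pool:
    # Healthy dependency: answers well within the timeout.
    healthy = call_with_fallback(fetch_similar_movies, 0.5, [], pool)
    # Degraded dependency: injected delay exceeds the caller's timeout.
    degraded = call_with_fallback(
        with_injected_latency(fetch_similar_movies, 0.3), 0.05, [], pool
    )

print("healthy:", healthy)
print("degraded:", degraded)
```

A caller written this way returns an empty list of similar movies when the dependency is slow instead of hanging. A dependency that is called without a timeout or fallback, possibly one nobody knew about, is exactly what this kind of test tends to expose.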

Page 13: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Clutter happens

Page 14: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Janitor Monkey taught us…

• Label everything

• Clutter builds up
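The two points above suggest a simple cleanup pass: flag anything without an owner label, and flag labeled resources that have sat idle too long. The following is a minimal sketch under stated assumptions, not Janitor Monkey's actual rules: the `Resource` record, the `owner` label, the 30-day idle threshold, and the fixed date are all hypothetical and chosen only to keep the example deterministic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Resource:
    """Hypothetical cloud resource record; the fields are illustrative."""
    resource_id: str
    last_used: datetime
    labels: dict[str, str] = field(default_factory=dict)


def find_cleanup_candidates(resources, now, max_idle, required_label="owner"):
    """Return (resource_id, reason) pairs for resources that look like clutter."""
    candidates = []
    for res in resources:
        if required_label not in res.labels:
            # Unlabeled resources cannot be traced back to anyone.
            candidates.append((res.resource_id, f"missing '{required_label}' label"))
        elif now - res.last_used > max_idle:
            candidates.append((res.resource_id, "idle too long"))
    return candidates


now = datetime(2012, 11, 28)  # arbitrary fixed date so the output is stable
resources = [
    Resource("vol-1", now - timedelta(days=2), {"owner": "api-team"}),
    Resource("vol-2", now - timedelta(days=90), {"owner": "api-team"}),
    Resource("vol-3", now - timedelta(days=1)),  # nobody knows whose this is
]
candidates = find_cleanup_candidates(resources, now, max_idle=timedelta(days=30))

for resource_id, reason in candidates:
    print(resource_id, "->", reason)
```

Without labels, the only safe options are to keep everything or to guess, which is why labeling comes first on the slide.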

Page 15: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Ranks of the Simian Army

• Chaos Monkey

• Chaos Gorilla

• Latency Monkey

• Janitor Monkey

• Conformity Monkey

• Circus Monkey

• Doctor Monkey

• Howler Monkey

• Security Monkey

• Chaos Kong

• Efficiency Monkey

Page 16: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Big impact on availability

• Results of the monkeys

Page 17: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

Open

Page 18: AWS Re:Invent 2012 - Chaos Monkey & The Netflix Simian Army

We are sincerely eager to hear your feedback on this presentation and on re:Invent.

Please fill out an evaluation form when you have a chance.