SSL Certificate Expiration and Howler Monkey's Inception

@royrapoport rsr@netflix.com

SSL* Certificate Reporting

BayLISA

March 21st, 2013

Friday, March 22, 13

This is the story of how we went from SSL certificates expiring without notice in production to deploying Security Monkey (later renamed Howler Monkey) and permanently eliminating SSL certificate expiration as a production-class issue.

Technology Overview

• SoA, REST, Mostly Java

• Simple overall architecture:

Culture Overview

• Freedom and Responsibility

• Distributed Operations

• Get out of the way of Developers

We hire very smart people, give them all the context and situational awareness they want, and set them free. We design our environment, our systems, and our teams to be empowered to make decisions without requiring slow approval processes, cumbersome formal communication, or any other unnecessary friction.

So Certificates ...

• Dozens of Certificates

• Different kinds of places

• Datacenter/private

• Datacenter/public/LB

• ELBs

• EC2

• Source Control

• EIPs

• Totally Distributed Design

So Certificates ...

• Some Certificates Weren’t[sic]

Some certificates weren’t even SSL certificates -- we have certificates we get from a partner that cannot be accessed via SSL, and for which answering the question “when does this expire?” requires scraping a web page.

So Certificates ...

• SSL Certificates expire

• Millions of people can’t stream

• Hilarity ensues

• Standard Ways to Solve This

• Excel worksheets

• Wiki documents

• Events on public calendars

Obviously, the ‘standard ways to solve this’ label is somewhat facetious, but these are, in fact, the ways most organizations try to keep up with SSL certificate expirations.

Let’s Do This Thing

Cassandra

Certificate

Start with a very simple model -- a Certificate entity, which is really just a combination of name, expiration date, and a series of locations where we can find this. It’d be trivial to feed this thing from my todo list, if I wanted to (but given the state of my todo list, probably a bad idea)
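A minimal Python sketch of that model, using only the three fields named above. The class name, the `days_left` helper, and the example host and location strings are illustrative assumptions, not the schema actually stored in Cassandra.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Certificate:
    """One tracked certificate: a name, an expiration date, and where it was seen."""
    name: str
    expires: datetime
    locations: set[str] = field(default_factory=set)  # hypothetical "kind:identifier" strings

    def days_left(self, now: datetime) -> int:
        """Whole days until expiration (negative once the certificate has expired)."""
        return (self.expires - now).days


# Hypothetical certificate found in two places.
cert = Certificate("api.example.com", datetime(2013, 6, 1, tzinfo=timezone.utc))
cert.locations.add("elb:api-frontend")
cert.locations.add("ec2:i-0123abcd")
```

Keeping locations as a set means several spiders can report the same certificate without creating duplicate entries.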

ELB

Then start building location-aware spiders -- e.g. this spider that knows how to probe all our ELBs to see if they listen on 443 and gets their certificate if they do.
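The probing step might look roughly like this in Python, using only the standard library. This is a sketch under assumptions, not the actual spider: the `probe` and `parse_not_after` names are made up here, and enumerating the ELBs themselves (which would go through the AWS API) is left out. One caveat worth stating loudly: a verifying TLS context rejects certificates that are already expired or untrusted, so this version simply reports nothing for them, and a real spider would have to handle that case.

```python
import socket
import ssl
from datetime import datetime, timezone


def parse_not_after(not_after: str) -> datetime:
    """Convert a certificate 'notAfter' string such as 'Jun  1 12:00:00 2013 GMT' to UTC."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)


def probe(host: str, port: int = 443, timeout: float = 5.0):
    """Return (common_name, expiration) for the certificate served at host:port.

    Returns None if nothing listens there or the TLS handshake fails
    (including failure because the certificate has already expired).
    """
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except OSError:  # covers refused connections, timeouts, and ssl.SSLError
        return None
    # The subject is a tuple of RDNs, each a tuple of (key, value) pairs.
    subject = dict(pair for rdn in cert.get("subject", ()) for pair in rdn)
    return subject.get("commonName", host), parse_not_after(cert["notAfter"])
```

Each result could then be recorded against the Certificate entity as one more location.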

EC2 Instance

Or this spider that knows how to talk to a specific kind of EC2 instance we have with some certificates.

IP Range

etc ...

Filesystem

DNS

Once you have all this information, you can easily generate a web page showing certificates, where they are, and when they expire
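As a rough illustration of how little is needed for that page, here is a Python sketch that renders an HTML table sorted by soonest expiration. The function name, the column choice, and the `(name, expires, locations)` tuple shape are assumptions made for this example, not a description of the real page.

```python
from datetime import datetime, timezone
from html import escape


def render_report(certs, now):
    """Render an HTML table from (name, expires, locations) tuples, soonest expiration first."""
    rows = []
    for name, expires, locations in sorted(certs, key=lambda c: c[1]):
        days = (expires - now).days
        rows.append(
            f"<tr><td>{escape(name)}</td><td>{expires:%Y-%m-%d}</td>"
            f"<td>{days}</td><td>{escape(', '.join(sorted(locations)))}</td></tr>"
        )
    header = "<tr><th>Certificate</th><th>Expires</th><th>Days left</th><th>Locations</th></tr>"
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"
```

Sorting by expiration puts the certificates that need attention at the top of the page.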

And send out emails, too -- once we built the capability for teams to subscribe to emails for a given certificate and specify how many days before expiration they should start getting notified
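The subscription check could be sketched like this in Python. The `Subscription` fields and the `due_notifications` function are hypothetical names chosen for the example; actually sending the email is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Subscription:
    email: str        # who to notify
    cert_name: str    # which certificate they care about
    days_before: int  # start notifying this many days before expiration


def due_notifications(subscriptions, expirations, now):
    """Return sorted (email, cert_name, days_left) for subscriptions whose warning window has opened.

    expirations maps certificate name to its expiration datetime.
    """
    due = []
    for sub in subscriptions:
        expires = expirations.get(sub.cert_name)
        if expires is None:
            continue  # certificate not known to any spider yet
        days_left = (expires - now).days
        if days_left <= sub.days_before:
            due.append((sub.email, sub.cert_name, days_left))
    return sorted(due)
```

Run once a day, a check like this keeps notifying a team from the first day of its window until the certificate is replaced.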

Since Then

• No Production Emergencies due to SSL certificate expiration

• Validated Design

• Better Subscription Capabilities

We validated the design by continuing to iterate on it -- recently, when building the DNS spider component, that work took only about 15 minutes to implement. We also expanded subscription capabilities so teams could subscribe to certificate expiration warnings based on certificate name regular expressions.
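A regular-expression subscription can be sketched in a few lines of Python. Whether the real tool requires the pattern to match the whole certificate name or only part of it is not stated here; this example assumes a full match, and the function name is made up.

```python
import re


def matching_certificates(pattern, cert_names):
    """Return the certificate names that the subscription pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in cert_names if regex.fullmatch(name)]


# Hypothetical certificate names.
names = ["api.example.com", "api-test.example.com", "www.example.com"]
api_certs = matching_certificates(r"api.*\.example\.com", names)
```

A team subscribed this way picks up new certificates that fit its naming convention without having to subscribe again.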

Soon ...

• Customized, automated alerting

• Automated renewal

• Telling you a problem is about to happen: Good

• Preventing the problem automatically: Priceless

• Open Source

We should be able to figure out who owns a certificate, most of the time, and alert them directly even if they don’t set up a subscription.

Remember ...

• Be Lazy

• Help Others Be Lazy

• Computers Are Better Than Humans

• For some things

• Don’t compete on their terms

Questions?
