Operations-Driven Web Services: A Case Study of Service Evolution at Rent the Runway. Camille Fournier, Head of Engineering (@skamille); Carlo Barbara, Senior Systems Engineer (@CarloBarbara)

Operations-Driven Web Services at Rent the Runway


From a meetup on 8/26: an overview of Rent the Runway's operations-driven service infrastructure, including why we use Dropwizard

Page 1: Operations-Driven Web Services at Rent the Runway

Operations-Driven Web Services: A Case Study of Service Evolution at Rent the Runway

Camille Fournier, Head of Engineering @skamille

Carlo Barbara, Senior Systems Engineer @CarloBarbara

Page 2: Operations-Driven Web Services at Rent the Runway

In The Beginning, There Was Drupal

Product Details

Filtering

View

Users

Product Creation

Order Management

Reservations

Login

Page 3: Operations-Driven Web Services at Rent the Runway

There were also all of these folks…

Page 4: Operations-Driven Web Services at Rent the Runway
Page 5: Operations-Driven Web Services at Rent the Runway

View

Product Creation

Order Management

Reservations

Filtering

Product Details

Users

Login

Can’t Just Burn the World Down

Page 6: Operations-Driven Web Services at Rent the Runway

View

Product Creation

Order Management

Reservations

Filtering

Product Details

Users

Login

Hollow It Out!

Page 7: Operations-Driven Web Services at Rent the Runway

View

Product Creation

Order Management

Filtering

Product Details

Users

Login

Hollow It Out!

Page 8: Operations-Driven Web Services at Rent the Runway

View

Product Creation

Order Management

Filtering

Users

Login

Hollow It Out!

Page 9: Operations-Driven Web Services at Rent the Runway

View

Product Creation

Order Management

Users

Login

Hollow It Out!

Page 10: Operations-Driven Web Services at Rent the Runway

Complexity

[Chart: Number of Services in Production, Dec-11 through Jun-13; y-axis from 0 to 14]

Page 11: Operations-Driven Web Services at Rent the Runway

Operations first…

Availability and performance of our services are critical to running our business

The software we develop has to make delivering on our SLAs possible

How (besides sane design):

• Healthchecks + Nagios

• Measurements

• Historical Data with Graphs

Page 12: Operations-Driven Web Services at Rent the Runway

Metrics

Gauges – instantaneous value

Counters – counter with +/-

Meters – rate over time (mean, plus 1-, 5-, & 15-minute moving averages)

Histograms – distribution of data (mean, median, max, std. dev., 75th, 90th, 95th, 98th, 99th, & 99.9th percentiles)

Timers – Meter of requests & Histogram of duration (frequency & latency)
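To make the metric types above concrete, here is a minimal sketch in plain Java. It does not use the Metrics library; the class and method names (`MetricTypesSketch`, `Counter`, `Histogram`, `percentile`) are hypothetical, and the histogram keeps every sample, which a production library would typically avoid by sampling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Stdlib-only illustration of the metric types; every name here is hypothetical
// and this is not the Metrics library API.
class MetricTypesSketch {

    /** Counter: a number that can be incremented and decremented. */
    static final class Counter {
        private final AtomicLong value = new AtomicLong();
        void inc() { value.incrementAndGet(); }
        void dec() { value.decrementAndGet(); }
        long count() { return value.get(); }
    }

    /** Histogram: distribution of recorded values (stores all samples; fine for a sketch). */
    static final class Histogram {
        private final List<Long> samples = new ArrayList<>();

        synchronized void update(long v) { samples.add(v); }

        synchronized double mean() {
            if (samples.isEmpty()) return 0.0;
            long sum = 0;
            for (long s : samples) sum += s;
            return (double) sum / samples.size();
        }

        /** Nearest-rank percentile for p in (0, 100]. */
        synchronized long percentile(double p) {
            if (samples.isEmpty()) throw new IllegalStateException("no samples");
            List<Long> sorted = new ArrayList<>(samples);
            Collections.sort(sorted);
            int rank = (int) Math.ceil(p * sorted.size() / 100.0);
            rank = Math.min(Math.max(rank, 1), sorted.size());
            return sorted.get(rank - 1);
        }
    }

    public static void main(String[] args) {
        // Gauge: an instantaneous value, read on demand.
        List<String> queue = new ArrayList<>();
        Supplier<Integer> queueDepth = queue::size;
        queue.add("job-1");
        System.out.println("gauge queueDepth = " + queueDepth.get());

        Counter activeSessions = new Counter();
        activeSessions.inc();
        activeSessions.inc();
        activeSessions.dec();
        System.out.println("counter activeSessions = " + activeSessions.count());

        Histogram responseSizes = new Histogram();
        for (long i = 1; i <= 100; i++) responseSizes.update(i);
        System.out.println("histogram mean = " + responseSizes.mean());
        System.out.println("histogram p95 = " + responseSizes.percentile(95));
    }
}
```

A gauge is just a callback that is read when someone asks, while counters and histograms accumulate state between reads; that difference is why only the latter need thread-safe updates here.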

Page 13: Operations-Driven Web Services at Rent the Runway

Metrics - Healthchecks

Verify that your service is running correctly

Page 14: Operations-Driven Web Services at Rent the Runway

Metrics - Reporting

HTTP

JMX

Graphite

Page 15: Operations-Driven Web Services at Rent the Runway

Dropwizard: What is it?

Quality open source Java webservice components glued together in a modular way

Eliminates the need for picking a platform stack; it’s all there

It’s opinionated. If you don’t like a Dropwizard core component, that’s too bad; don’t use Dropwizard

Developers focus on business logic, not framework

It’s easy, maintainable, and it works!

Page 16: Operations-Driven Web Services at Rent the Runway

A Few Words from Coda…

“I had no one I had to toss a WAR to. I had no one to stand up a Tomcat server and fiddle with it until their eyes bled. I had no one who didn't trust me to spin up my own threads or connection pools. So I wrote something which worked as simply and in as straight-forward a manner as possible because my own ass was on the line if it didn't work.”

Page 17: Operations-Driven Web Services at Rent the Runway

Dropwizard: The Ingredients

Jersey for REST

Jackson for JSON

Jetty for a webserver

Metrics for measuring

YAML for configuring

Dropwizard for weaving everything together

Page 18: Operations-Driven Web Services at Rent the Runway

Dropwizard – Healthchecks

Register hooks that check the health of your app

An HTTP endpoint that iterates over all the hooks

The meaning of “healthy” is decided by you (e.g. database connections, client connections, deadlock count)
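The hook-and-iterate idea above can be sketched in plain Java as follows. This is not Dropwizard's healthcheck API; `HealthCheckSketch`, `Check`, `Result`, and `runAll` are hypothetical names, and the HTTP endpoint is left out so the example stays self-contained. The deadlock check uses the JDK's `ThreadMXBean.findDeadlockedThreads()`.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry of health hooks; not the Dropwizard API.
class HealthCheckSketch {

    static final class Result {
        final boolean healthy;
        final String message;
        Result(boolean healthy, String message) {
            this.healthy = healthy;
            this.message = message;
        }
    }

    /** A hook: you decide what "healthy" means for your service. */
    interface Check {
        Result check() throws Exception;
    }

    private final Map<String, Check> checks = new LinkedHashMap<>();

    void register(String name, Check check) {
        checks.put(name, check);
    }

    /** Runs every registered hook; a hook that throws is reported as unhealthy. */
    Map<String, Result> runAll() {
        Map<String, Result> results = new LinkedHashMap<>();
        for (Map.Entry<String, Check> e : checks.entrySet()) {
            try {
                results.put(e.getKey(), e.getValue().check());
            } catch (Exception ex) {
                results.put(e.getKey(), new Result(false, String.valueOf(ex.getMessage())));
            }
        }
        return results;
    }

    boolean allHealthy() {
        for (Result r : runAll().values()) {
            if (!r.healthy) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        HealthCheckSketch registry = new HealthCheckSketch();
        registry.register("deadlocks", () -> {
            long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
            int count = (ids == null) ? 0 : ids.length;
            return new Result(count == 0, count + " deadlocked threads");
        });
        for (Map.Entry<String, Result> e : registry.runAll().entrySet()) {
            Result r = e.getValue();
            System.out.println(e.getKey() + ": " + (r.healthy ? "OK" : "FAIL") + " (" + r.message + ")");
        }
    }
}
```

An endpoint built on this would call `runAll()` per request and return an error status when any result is unhealthy, which is what lets a monitor such as Nagios poll a single URL.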

Page 19: Operations-Driven Web Services at Rent the Runway

Dropwizard + Metrics

Dropwizard has lots of platform instrumentation baked in using Metrics, and it happens for free (e.g. Jetty, JVM, log counts)

Ability to add Timers to your endpoints with @Timed

Ability to add arbitrary metrics as you see fit
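As a rough picture of what timing an endpoint records, the sketch below wraps a call and tracks how often it ran and how long it took. It is plain Java, not the `@Timed` annotation or the Metrics `Timer` class; `TimedSketch`, `time`, and `meanNanos` are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical stand-in for what a timer tracks: call count plus durations.
class TimedSketch {
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong maxNanos = new AtomicLong();

    /** Runs the work and records its duration, even when it throws. */
    <T> T time(Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            calls.incrementAndGet();
            totalNanos.addAndGet(elapsed);
            maxNanos.accumulateAndGet(elapsed, Math::max);
        }
    }

    long count() { return calls.get(); }

    long maxNanos() { return maxNanos.get(); }

    double meanNanos() {
        long n = calls.get();
        return n == 0 ? 0.0 : (double) totalNanos.get() / n;
    }

    public static void main(String[] args) {
        TimedSketch productLookup = new TimedSketch();
        String body = productLookup.time(() -> "{\"id\": 1}"); // stands in for an endpoint body
        System.out.println("response = " + body);
        System.out.println("calls = " + productLookup.count());
        System.out.println("mean ns = " + productLookup.meanNanos());
    }
}
```

Recording in a `finally` block means failed requests are timed too, so latency graphs do not quietly exclude the calls that went wrong.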

Page 20: Operations-Driven Web Services at Rent the Runway

Other Frameworks

Play

• 1.X abandonware for Play 2.X, which was still beta

• Magic

Glassfish

• OSGI hell

• “standards”

Spring

• Everything and the kitchen sink

• Also I hate XML

Page 21: Operations-Driven Web Services at Rent the Runway

What do I get out of it? Dev agenda

Story telling: causation & correlation

Integral piece of the operational excellence puzzle

State of the world – Dashboards

Developers focus on features, operations is mostly free lunch

Code review & demo

Disclaimer: You need Graphite to really harness the value

Page 22: Operations-Driven Web Services at Rent the Runway

Story telling

The grid is slow. Why? Is it load? Is it dependent service latency? How does that compare to yesterday?

The JVM throws out of memory. What’s the problem? What does the GC jigsaw look like? When did it change? Is it correlated with increased load?

How is that new ‘performance’ tweak? If you never measured, then you didn’t tune. True story! What does my 5XX graph look like?

Page 23: Operations-Driven Web Services at Rent the Runway

Operational Excellence: The ingredients

Application Instrumentation (Dropwizard)

Time Series Data & Graphing (Graphite, D3)

Centralized logging & log parsing (Rsyslog, Logstash, Nagios)

Automated alerting & escalation (PagerDuty)

DW & Graphite will get you very far, but if you want total control & visibility you need the rest. This is the stack that RTR is moving towards, rather than relying on basic Java logging SMTP appenders

Page 24: Operations-Driven Web Services at Rent the Runway

OMG, we are on GMA, are we OK?

10+ services

Each service runs in a cluster behind a load balancer (LB)

‘OK’ is somewhat service specific

Basically you need a lot of info at your fingertips. Pictures are worth a thousand words. Get yourself some dashboards!

Page 25: Operations-Driven Web Services at Rent the Runway

Graphite Dashboard

Page 26: Operations-Driven Web Services at Rent the Runway

Tasseo dashboard (D3)

• Red, Yellow, & Green Lights

• Realtime

• Endless cool things: Graphite + D3

If we see yellow or red, start diagnosing

Page 27: Operations-Driven Web Services at Rent the Runway

Free Lunch? Not really

DB connection pool monitoring

HTTP client connection pool monitoring

JVM Heap & GC info

HTTP Server response counts

HTTP Server connection info

Endpoint duration & throughput stats

Page 28: Operations-Driven Web Services at Rent the Runway

Where do I sign up?

You install Graphite, one time hit + some TLC. Medium Difficulty

You annotate your endpoints and maybe add finer telemetry. Easy

You configure your service to feed into Graphite, hopefully consistently across services, via a ‘Bundle’. Easy
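For a sense of what "feeding into Graphite" means on the wire, here is a sketch of Graphite's plaintext format: one `path value timestamp` line per data point, sent over TCP (port 2003 by default). In practice the reporting shipped with Metrics would do this for you; the class `GraphiteLineSketch`, the `rtr.example` prefix, and the host name in the comment are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing Graphite's plaintext line format.
class GraphiteLineSketch {

    /** Builds "prefix.name value epochSeconds\n"; whitespace in the name becomes "_". */
    static String line(String prefix, String name, double value, long epochSeconds) {
        String path = prefix + "." + name.trim().replaceAll("\\s+", "_");
        return path + " " + Double.toString(value) + " " + epochSeconds + "\n";
    }

    /** Sends one line to a Graphite plaintext listener (not called from main). */
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(line.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        // A shared prefix per service keeps metric names consistent across services.
        String dataPoint = line("rtr.example", "requests.count", 12.5, now);
        System.out.print(dataPoint);
        // To actually send it, assuming a listener on a hypothetical host:
        // send("graphite.example.internal", 2003, dataPoint);
    }
}
```

Putting the prefix and reporter setup in one shared place is the sort of thing a common bundle is useful for, since dashboards depend on every service naming its metrics the same way.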

Page 29: Operations-Driven Web Services at Rent the Runway

Demo

Show a simple Dropwizard codebase

Do some curls

Show the admin endpoints

Page 30: Operations-Driven Web Services at Rent the Runway

References

dropwizard.codahale.com

metrics.codahale.com

graphite.wikidot.com

Page 31: Operations-Driven Web Services at Rent the Runway

Presenters

@CarloBarbara (www.cabkata.com)

@Skamille (whilefalse.blogspot.com)

Rent The Runway is hiring! (renttherunway.com/careers)