
AWS re:Invent 2016: Case Study: How Spokeo Improved Web Application Response Times with Amazon EFS (STG206)


Page 1: AWS re:Invent 2016: Case Study: How Spokeo Improved Web Application Response Times with Amazon EFS (STG206)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Case Study: How Spokeo Improved Web Application Response Times with Amazon EFS

December 2, 2016

STG206

Austin Fonacier, Spokeo

Sajee Mathew, AWS Principal Solutions Architect

Page 2

What to Expect from the Session

• Overview of Amazon EFS

• How Spokeo uses EFS

• What we do at Spokeo

• Spokeo Tech Stack

• Our challenge

• Off-the-shelf CDN

• Writing our own reverse proxy

• Back ends

• Populating EFS at scale

• Lessons learned

Page 3

AWS Storage Overview

File: Amazon EFS

Block: Amazon EBS (persistent), Amazon EC2 Instance Store (ephemeral)

Object: Amazon S3, Amazon Glacier

Batches and Streams: Direct Connect; Snowball, Snowmobile; 3rd Party Connectors; Transfer Acceleration; Storage Gateway; Amazon Kinesis Firehose

Page 4

Operating shared file storage today is a pain

Roles: app owners and developers, business managers, IT administrators

• Estimate demand

• Procure, set up, and maintain hardware & space

• Provide demand forecasts/business case

• Limited flexibility and agility

• CAPEX & over-buy

• Constant upgrade/refresh cycle

Page 5

What if you could…

Roles: app owners and developers, business managers, IT administrators

• Eliminate management & maintenance

• Scale

• Migrate code, apps, tools

• Build new cloud-native apps

• Predict cost & eliminate CAPEX

• Increase agility

• Spend less time managing the file system

Page 6

What is Amazon EFS?

Fully managed file system for EC2

File system access semantics that work with standard OS APIs

Sharable across thousands of clients

Grows elastically to petabyte scale

Highly available and durable

Strong consistency

Page 7

Amazon EFS is simple

Fully managed

- No hardware, network, file layer

- Create a scalable file system in seconds!

Seamless integration with existing tools and apps

- NFS v4.1—widespread, open

- Standard file system access semantics

- Works with standard OS file system APIs

Simple pricing = simple forecasting

Page 8

Amazon EFS is elastic

File systems grow and shrink automatically as you add and remove files

No need to provision storage capacity or performance

You pay for only the storage space you use, with no minimum fee

Page 9

Amazon EFS is scalable

File systems can grow to petabyte scale

Throughput and IOPS scale automatically as file systems grow

Consistent low latencies regardless of file system size

Support for thousands of concurrent NFS connections

Page 10

Highly durable and highly available

Designed to sustain Availability Zone (AZ) offline conditions

Resources aggregated across multiple AZs

Superior to traditional NAS availability models

Appropriate for Production / Tier 0 applications

Page 11

Highly durable and highly available

Every file system object (directory, file, and link) is redundantly stored across multiple Availability Zones in a region

[Diagram: Amazon EFS spanning Availability Zone 1, Availability Zone 2, and Availability Zone 3 within a region]

Page 12

Example use cases

Big data analytics

Media workflow processing

Web serving

Content management

Home directories

Page 13

The AWS Management Console, CLI, and SDK each enable you to perform a variety of management tasks

Create a file system

Create and manage mount targets

Tag a file system

Delete a file system

View details on file systems in your AWS account

Page 14

Setting up and mounting a file system takes under a minute

1. Create a file system

2. Create a mount target in each Availability Zone from which you want to access the file system

3. Enable the NFS client on your instances

4. Run the mount command
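
The last step is an ordinary NFS mount. As a rough sketch that is not taken from the slides, the helper below assembles such a command in Python. The file system ID, region, and mount point are placeholder values, `efs_dns_name` and `build_mount_command` are hypothetical names, and the DNS name pattern and NFS options are assumptions based on commonly documented EFS conventions, so check them against the current EFS documentation before use.

```python
from typing import List

# Assumed NFS options; commonly recommended for EFS, not stated on the slide.
NFS_OPTIONS = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"


def efs_dns_name(file_system_id: str, region: str) -> str:
    """Build the regional DNS name of a file system (assumed naming pattern)."""
    return f"{file_system_id}.efs.{region}.amazonaws.com"


def build_mount_command(file_system_id: str, region: str, mount_point: str) -> List[str]:
    """Return the argv for mounting the file system root at mount_point."""
    source = f"{efs_dns_name(file_system_id, region)}:/"
    return ["sudo", "mount", "-t", "nfs4", "-o", NFS_OPTIONS, source, mount_point]


if __name__ == "__main__":
    # Placeholder ID and region; substitute the values of a real file system.
    print(" ".join(build_mount_command("fs-12345678", "us-west-2", "/mnt/efs")))
```

Returning the command as an argument list rather than a single string keeps it safe to hand to `subprocess.run` without shell quoting.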

Page 15

Setting up and mounting a file system

Page 16

Two performance modes designed to support this broad spectrum of use cases

General Purpose mode (default): Optimized for latency-sensitive applications and general-purpose, file-based workloads. The best option for the majority of use cases.

Max I/O mode: Can scale to higher levels of aggregate throughput, with a tradeoff of slightly higher latencies for file operations.

Use Amazon CloudWatch to determine whether your application can benefit from Max I/O mode; if not, you’ll get the best performance in General Purpose mode

Page 17

EFS provides a throughput bursting model that scales as a file system grows

As a file system gets larger, it needs access to more throughput

Many file workloads are spiky, with peak throughput well above average levels

The Amazon EFS scalable bursting model is designed to make performance available when you need it

Page 18

Throughput bursting model based on earning and spending “bursting credits”

• File systems earn credits at a “baseline rate” of 0.05 MB/s per GB stored and use credits by performing file system operations; file systems can drive throughput at the “baseline rate” indefinitely

• File systems with a positive bursting credit balance can “burst” to higher levels for periods of time: 100 MB/s for file systems 1 TB or smaller, 100 MB/s per TB for file systems larger than 1 TB

• New file systems start with a full credit balance

Page 19

Bursting model examples

File system size | Read/write throughput

100 GB | Drive up to 5 MB/s continuously, or burst to 100 MB/s for up to 72 minutes each day

1 TB | Drive up to 50 MB/s continuously, or burst to 100 MB/s for up to 12 hours each day

10 TB | Drive up to 500 MB/s continuously, or burst to 1 GB/s for up to 12 hours each day
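
The example figures follow from the baseline and burst rates of the bursting credit model. The sketch below reproduces the arithmetic under two simplifying assumptions that are mine, not the slides': decimal units (1 TB = 1000 GB), which match the round numbers quoted, and a daily burst budget equal to one day of credits earned at the baseline rate. The function names are hypothetical.

```python
BASELINE_MBPS_PER_GB = 0.05     # from the slides: 0.05 MB/s of credit per GB stored
SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_TB = 1000                # assumption: decimal units


def baseline_mbps(size_gb: float) -> float:
    """Throughput a file system of this size can sustain indefinitely."""
    return size_gb * BASELINE_MBPS_PER_GB


def burst_mbps(size_gb: float) -> float:
    """Burst ceiling: 100 MB/s up to 1 TB, then 100 MB/s per TB."""
    size_tb = size_gb / GB_PER_TB
    return 100.0 if size_tb <= 1 else 100.0 * size_tb


def burst_hours_per_day(size_gb: float) -> float:
    """Hours per day of bursting paid for by one day of baseline credits."""
    credits_mb = baseline_mbps(size_gb) * SECONDS_PER_DAY
    return credits_mb / burst_mbps(size_gb) / 3600


if __name__ == "__main__":
    for size_gb in (100, 1000, 10000):
        print(size_gb, baseline_mbps(size_gb), burst_mbps(size_gb),
              burst_hours_per_day(size_gb))
```

Under these assumptions a 100 GB file system earns enough credit for 1.2 hours (72 minutes) of bursting per day, and the 1 TB and 10 TB cases each work out to 12 hours.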

Page 20

In Which Regions Can I Use EFS Today?

US East (N. Virginia) – us-east-1

US East (Ohio) – us-east-2

US West (Oregon) – us-west-2

EU (Ireland) – eu-west-1

More coming soon!

Page 21

Simple and predictable pricing

With EFS, you pay for only the storage space you use

• No minimum commitments or upfront fees

• No need to provision storage in advance

• No other fees, charges, or billing dimensions

EFS price:

• $0.30/GB/month (N. Virginia, Ohio, Oregon)

• $0.33/GB/month (Ireland)

Customers within their first 12 months on AWS can use up to 5 GB/month for free
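
Because storage is the only billing dimension listed, a monthly estimate is a single multiplication. The sketch below uses the 2016 prices from this slide; the region codes come from the region list earlier in the deck, `monthly_cost_usd` is a hypothetical name, and the free tier is ignored for simplicity.

```python
# 2016 prices from the slide, in USD per GB-month, keyed by region code.
PRICE_PER_GB_MONTH = {
    "us-east-1": 0.30,  # N. Virginia
    "us-east-2": 0.30,  # Ohio
    "us-west-2": 0.30,  # Oregon
    "eu-west-1": 0.33,  # Ireland
}


def monthly_cost_usd(stored_gb: float, region: str) -> float:
    """Estimate the monthly bill for stored_gb of data (free tier ignored)."""
    return stored_gb * PRICE_PER_GB_MONTH[region]


if __name__ == "__main__":
    # 37.4 TB is the data size Spokeo quotes later in the deck; the region
    # (a $0.30 one) and decimal units (1 TB = 1000 GB) are assumptions.
    print(round(monthly_cost_usd(37.4 * 1000, "us-west-2"), 2))
```

With those assumptions the estimate comes to about $11,220 per month, in the same range as the $11,000/month figure quoted for EFS later in the talk.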

Page 22

Introduction

Austin Fonacier

Lead Software Architect at Spokeo

[email protected]

@austinrfnd

http://github.com/austinrfnd

Page 23

Spokeo

People search engine

Headquartered in beautiful Pasadena, CA

200+ employees

18,000,000 unique visitors a month

8.5 billion people records

30,000,000 bot hits per 24-hour period

Page 24

Spokeo the product

Search for people by any intersection of data: first name, last name, email, age, address, phone, or relative name

Page 25

Email/username and address search

Page 26

Spokeo tech stack

Page 27

High-level Spokeo tech stack

Page 28

Our challenge: SEO pages

3,000,000,000 SEO pages

≈ 37.4 terabytes of data

≈ 30,000,000 crawls per day

Page 29

SEO pages: compatibility

[Diagram labels: Crawlers, Users]

Page 30

The importance of page speed

● Page speed affects abandonment rate

● Google utilizes page speed for search ranking

● Studies show a direct relationship between page speed and conversion rate

Page 31

How to get faster?

Page 32

Ninety-ninety rule

“The first 90 percent of the code accounts for the first 10 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.”

- Tom Cargill, Bell Labs

Page 33

How to get faster?

Page 34

Switch away from Ruby on Rails

● Ton of effort

● Ton of time

● Ton of money

● Unmeasurable performance gains

Page 35

How to get faster?

Page 36

Reverse Proxy

● Low effort (some code/header tweaks)

● Immediate measurable performance gains

Page 37

Over the counter reverse proxy

● Fast

● Easy (CDN & header changes)

● Global delivery system

Page 38

The LRU

Page 39

The LRU

Page 40

The LRU

Page 41

The LRU

Page 42

Back to the drawing board

“Always serve Google the fastest possible page”

- Mike Daly, Spokeo CTO

Google is the toughest critic. By making the site faster for Google, we are making our customers happy

Page 43

Cache requirements

As fast as reasonably possible

Cost efficient

Scalable

Fault tolerant

Failover/availability

Page 44

Ruby on Rails cache

Rails penalty of going through the framework

“Always serve Google the fastest possible page”

- Mike Daly, Spokeo CTO

Page 45

Two-part project

Reverse proxy

Backend

Page 46

Proposed topology

Page 47

Off-the-shelf reverse proxies

Page 48

Reverse proxy options

● All have in-memory mapping of keys and values

● Nginx and Varnish are expensive

● Apache Traffic Server doesn’t notify other nodes on writes

● All are huge code bases

Page 49

Reverse proxy options

Option | Cons | Lines of code | Cost

Nginx | Memory mapping of all keys | 164,978 | $187,573

Varnish | Memory mapping of all keys | 220,813 | $495,999

Apache Traffic Server | Memory mapping of all keys; writes don’t propagate between nodes | 889,824 | $6,771

Assuming 45 c4.xlarge instances per month

Page 50

Write our own: MassCache

● In-house expert knowledge

● Very simple use case

● Inexpensive to run (thin Node.js app)

● No in-memory mapping

Page 51

MassCache
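
The slides do not show MassCache's internals, and the real proxy is a Node.js app. Purely as an illustration of how a cache can work with no in-memory mapping of keys, the Python sketch below derives each page's file path from a hash of its URL, so a lookup is just a file read on the shared file system. Every name here (`cache_path`, `read_cached`, `write_cached`), the hashing scheme, and the directory layout are assumptions, not Spokeo's design.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def cache_path(root: Path, url: str) -> Path:
    """Map a URL to a file path; two directory levels keep directories small."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / digest


def read_cached(root: Path, url: str) -> Optional[bytes]:
    """Return the cached body, or None on a cache miss."""
    try:
        return cache_path(root, url).read_bytes()
    except FileNotFoundError:
        return None


def write_cached(root: Path, url: str, body: bytes) -> None:
    """Store a body by writing a temp file and renaming it into place."""
    path = cache_path(root, url)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(body)
    tmp.replace(path)  # rename, so readers do not see a half-written file


if __name__ == "__main__":
    # A temporary directory stands in for the EFS mount point.
    with tempfile.TemporaryDirectory() as tmp_dir:
        root = Path(tmp_dir)
        write_cached(root, "/people/example", b"<html>cached</html>")
        print(read_cached(root, "/people/example"))
        print(read_cached(root, "/people/missing"))
```

Because the path is a pure function of the URL, any number of proxy nodes sharing the same mount can serve or refresh the same page without coordinating with each other.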

Page 52

Back ends

Page 53

Back ends

Back end | Cost* | Performance | Cons

Amazon S3/Amazon CloudFront | $6,000 | 10-1000 ms | CloudFront LRU

Amazon DynamoDB | $11,000 | 20-30 ms |

Amazon ElastiCache | $90,000 | Fast | Not data-persistent

Amazon EBS volumes | - | - | EBS mounting limitations**

Amazon EFS | $11,000/month | 17 ms reads, 30 ms writes (Max I/O mode, more details next slide) |

Page 54

EFS

Price: $11,000/month

Performance: 17 ms for reads, 30 ms for writes

• Latencies in Max I/O mode (General Purpose mode has lower latencies)

• Writes: Node.js open, write, and close

• Reads: Node.js file descriptor, file stats, and reading the contents

• 30 kb files and peak EFS size is 2.3 GB

Built-in data redundancy

Built-in scalability
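
The latency figures above cover whole call sequences: open, write, and close for a write, and open, stat, and read for a read. The sketch below times the equivalent sequences in Python rather than Node.js, as a way to see what is being measured. It is not Spokeo's benchmark; `timed_write` and `timed_read` are hypothetical names, and the demo writes to a local temporary directory, so point it at a real EFS mount to get comparable numbers.

```python
import os
import tempfile
import time
from typing import Tuple


def timed_write(path: str, data: bytes) -> float:
    """Open, write, and close a file; return elapsed milliseconds."""
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:          # os.write may write only part of the buffer
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
    return (time.perf_counter() - start) * 1000


def timed_read(path: str) -> Tuple[bytes, float]:
    """Open, stat, read, and close a file; return (contents, elapsed ms)."""
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)
    try:
        remaining = os.fstat(fd).st_size
        chunks = []
        while remaining > 0:          # os.read may return fewer bytes than asked
            chunk = os.read(fd, remaining)
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks), (time.perf_counter() - start) * 1000


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:   # stand-in for an EFS mount
        target = os.path.join(tmp_dir, "page.html")
        write_ms = timed_write(target, b"x" * 30_000)
        _, read_ms = timed_read(target)
        print(f"write: {write_ms:.2f} ms, read: {read_ms:.2f} ms")
```

On EFS each of these system calls is a network round trip, which is why the full sequence rather than the read or write alone determines the observed latency.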

Page 55

EFS costs

Page 56

Spokeo tech stack now

Page 57

Populating EFS: Cacheup

● Actively populate EFS as fast as possible

● EFS doesn’t shy away from 250,000 requests/second

● Populate 3,000,000,000 files in one week

● Cache invalidation: sending requests with a special header
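
The one-week target implies a sustained average write rate that can be checked with simple arithmetic. The sketch below works it out and compares it with the 250,000 requests/second figure quoted above; `required_files_per_second` is a hypothetical helper, and the comparison assumes an even rate across the week.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_files_per_second(total_files: int, days: float) -> float:
    """Average rate needed to write total_files within the given days."""
    return total_files / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    rate = required_files_per_second(3_000_000_000, 7)
    print(f"average rate needed: {rate:.0f} files/second")
    print(f"headroom vs 250,000 requests/second: {250_000 / rate:.0f}x")
```

The required average is roughly 5,000 files per second, about one fiftieth of the quoted request rate, which suggests the limiting factor during warming is more likely the stack that generates pages than the file system.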

Page 58

Cacheup: dynamic throttling

Dynamic throttling based on key metrics of our stack:

● Application performance index scores

● Response times

● Database load
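
The slides name the inputs to throttling but not the control rule. As one possible sketch, the Python below raises the warming rate in small steps while all three metrics look healthy and halves it as soon as any of them degrades. The thresholds, step sizes, and names (`StackMetrics`, `next_rate`) are illustrative assumptions, not Cacheup's actual logic; the `apdex` helper uses the standard Apdex formula.

```python
from dataclasses import dataclass


@dataclass
class StackMetrics:
    apdex: float        # application performance index, 0.0 to 1.0
    response_ms: float  # recent response time of the web stack
    db_load: float      # database utilization, 0.0 to 1.0


def apdex(satisfied: int, tolerating: int, total: int) -> float:
    """Standard Apdex score: (satisfied + tolerating / 2) / total."""
    return (satisfied + tolerating / 2) / total if total else 1.0


def next_rate(current_rate: float, metrics: StackMetrics,
              min_rate: float = 10.0, max_rate: float = 5000.0) -> float:
    """Pick the next requests/second for the warmer (thresholds are assumed)."""
    healthy = (metrics.apdex >= 0.9
               and metrics.response_ms <= 500
               and metrics.db_load <= 0.7)
    if healthy:
        return min(max_rate, current_rate + 50.0)   # ramp up gently
    return max(min_rate, current_rate / 2)          # back off quickly


if __name__ == "__main__":
    rate = 1000.0
    rate = next_rate(rate, StackMetrics(apdex=0.95, response_ms=200, db_load=0.4))
    print(rate)
    rate = next_rate(rate, StackMetrics(apdex=0.80, response_ms=900, db_load=0.9))
    print(rate)
```

Increasing slowly and backing off fast is a common way to keep a background job from competing with live traffic, since a brief overshoot is corrected within one adjustment interval.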

Page 59

Benefits after a year with EFS, MassCache, and Cacheup

● Costs to serve a cached page

● Horizontally scalable

• 37.4 TB

• 3,000,000,000 files

• 30,000,000 requests per day

● Active warming taught us about bottlenecks in our web stack

● Site redundancy

● Built-in DDoS protection

● Google webmaster dashboard numbers are steady

Page 60

EFS is the cloud

EFS to us feels like an infinitely scalable resource

● Fast

● Easy

● Cheap

● Data redundant

● Goldilocks solution for us

Page 61

EFS gotchas

● Writes are slower than reads

● Writing a file is slightly slower than updating a file

● Improvements have been made since the preview a year ago and will continue to occur, including support for NFSv4.1

● Any access to EFS looks like a file access but is actually a network call!

● General Purpose (GP) ≠ Max I/O

Page 62

DDoS/site redundancy

Page 63

Crawler spike protection

Page 64

Related Sessions

• STG202 - Deep Dive on Amazon Elastic File System (recorded on Wednesday)

• STG207 - Case Study: How Atlassian Uses Amazon EFS with JIRA to Cut Costs and Accelerate Performance (Friday 12:30pm)

• STG208 - Case Study: How Monsanto Uses Amazon EFS with Their Large-Scale Geospatial Data Sets (Friday 11:00am)

Page 65

Thank you!

We’re hiring

http://spokeo.com/jobs

Page 66

Remember to complete your evaluations!

Page 67

Questions?