Open-source Infrastructure at Lyft


Constance Caramanolis

Daniel Hochman July 2017

Agenda

- Overview of Lyft Architecture

- Open-source Infrastructure Projects

  - Confidant

  - Discovery

  - Ratelimit

  - Envoy

- Q&A

Architecture (simplified)

[Diagram: Front Envoy, Application, Envoy, Discovery, Confidant, Ratelimit, >100 Clusters]

lyft / confidant

Your secret keeper. Stores secrets in DynamoDB, encrypted at rest.

Python | 1,105 stars | 12 contributors | November 2015

How is a service configured?

lyft / location-service (private)

environment.yaml

    common:
      PORT: 8080
      TIMEOUT_MS: 15000
    development:
      USE_AUTH: False
    staging:
      API_KEY: secret_key_igjq3i494fqq234qbc
    production:
      API_KEY: secret_key_ojajf823jj49ij8h
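A minimal sketch of how a layered file like environment.yaml might be resolved, assuming the `common` section is overlaid by the section for the active environment. The slide does not say how the merge works, so `resolve_settings` and its overlay rule are hypothetical. It also makes the problem visible: any secret placed in the file ends up in plain text in the resolved settings.

```python
def resolve_settings(config, environment):
    """Overlay one environment's section on top of the common section.

    Hypothetical helper: the overlay rule is an assumption, not something
    the slides specify.
    """
    settings = dict(config.get("common", {}))
    settings.update(config.get(environment, {}))
    return settings


# The same structure as environment.yaml, written as a Python dict.
CONFIG = {
    "common": {"PORT": 8080, "TIMEOUT_MS": 15000},
    "development": {"USE_AUTH": False},
    "staging": {"API_KEY": "secret_key_igjq3i494fqq234qbc"},
    "production": {"API_KEY": "secret_key_ojajf823jj49ij8h"},
}

production_settings = resolve_settings(CONFIG, "production")
```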

Confidant to the rescue!

Service: location-service

Credential: api_key: password123

Behind the scenes

[Diagram: Application, IAM Role, EC2 Instance, Credential (api_key: password123), KMS, DynamoDB, Confidant]

    api_key = os.getenv('CREDENTIAL_API_KEY')
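On the application side, the slide shows the credential arriving as an environment variable named `CREDENTIAL_API_KEY`. A minimal sketch of reading it defensively follows, assuming the `CREDENTIAL_` prefix plus the upper-cased key name is the general convention (the slide shows only this one variable). `load_credential` is a hypothetical helper.

```python
import os


def load_credential(name, environ=None):
    """Read a credential from the environment and fail fast if it is missing.

    The CREDENTIAL_<NAME> naming is inferred from the single example on the
    slide and is an assumption.
    """
    environ = os.environ if environ is None else environ
    variable = "CREDENTIAL_" + name.upper()
    value = environ.get(variable)
    if value is None:
        raise RuntimeError("missing credential: " + variable)
    return value


# Usage with an explicit mapping, so the example does not depend on the
# real process environment.
api_key = load_credential("api_key", {"CREDENTIAL_API_KEY": "password123"})
```

Failing at startup when a credential is absent tends to be easier to debug than a `None` that surfaces later in a request path.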

Server-blind secrets

Highly sensitive secrets are encrypted and decrypted by the end-users.

Confidant stores but can't read them.

[Diagram: Confidant, KMS, IAM Role, EC2 Instance]

lyft / discovery

Provides a REST interface for querying the list of hosts that belong to a microservice.

Python | 54 stars | 6 contributors | August 2016

    POST /v1/registration/location-service
    {
      "ip": "10.0.0.1",
      "port": 80,
      "revision": "da08f35b",
      "tags": {
        "id": "i-910203",
        "az": "us-east-1a",
        "canary": true
      }
    }
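A minimal sketch of how a host might build the registration call shown above, using only the standard library. The base URL `http://discovery.example` is a placeholder, and sending the body as JSON is an assumption taken from how the slide renders it; the real service may expect a different encoding. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_registration(discovery_url, service, ip, port, revision, tags):
    """Build (but do not send) a POST /v1/registration/<service> request."""
    payload = {"ip": ip, "port": port, "revision": revision, "tags": tags}
    return urllib.request.Request(
        discovery_url + "/v1/registration/" + service,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_registration(
    "http://discovery.example",  # placeholder host, not a real endpoint
    "location-service",
    ip="10.0.0.1",
    port=80,
    revision="da08f35b",
    tags={"id": "i-910203", "az": "us-east-1a", "canary": True},
)
# Sending it would be: urllib.request.urlopen(request, timeout=2)
```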

Tracking hosts

* * * * *

- Hosts are stored in DynamoDB

- Storage support is abstract

- Hosts are removed if they have not reported since now - HOST_TTL

- Ecosystem designed to tolerate eventual consistency, unlike Zookeeper, etcd, Consul

- Pair with active healthchecks
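The expiry rule in the list above (drop hosts that have not reported since now - HOST_TTL) can be sketched as a simple filter. The `last_check_in` field name and the 600-second TTL value are assumptions for illustration; the slides give neither.

```python
HOST_TTL = 600  # seconds; illustrative value, not taken from the slides


def live_hosts(hosts, now, ttl=HOST_TTL):
    """Keep only hosts whose last report is no older than now - ttl."""
    cutoff = now - ttl
    return [host for host in hosts if host["last_check_in"] >= cutoff]


hosts = [
    {"ip": "10.0.0.1", "last_check_in": 1000},  # reported recently
    {"ip": "10.0.0.2", "last_check_in": 300},   # stale, will be dropped
]
current = live_hosts(hosts, now=1100)
```

Because expiry is time-based, a dead host can linger in the list for up to HOST_TTL, which is why the last bullet pairs discovery with active healthchecks.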

[Diagram: Storage (DynamoDB)]

    GET /v1/registration/<service>
    {
      "hosts": [
        {
          "ip": "10.0.0.1", "port": 80, "revision": "da08f35b",
          "tags": {"id": "i-910203", "az": "us-east-1a", "canary": true}
        },
        ...
        {
          "ip": "10.0.0.2", "port": 80, "revision": "da08f35b",
          "tags": {"id": "i-121286", "az": "us-east-1d"}
        }
      ]
    }

Fetching hosts
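A minimal sketch of consuming the GET /v1/registration/<service> response, assuming the body has exactly the shape shown on the slide. `parse_hosts` and `canary_addresses` are hypothetical helpers, and treating a missing `canary` tag as false is an assumption.

```python
import json


def parse_hosts(body):
    """Return the list of host records from a registration response body."""
    return json.loads(body)["hosts"]


def canary_addresses(hosts):
    """Return (ip, port) pairs for hosts tagged as canary."""
    return [
        (host["ip"], host["port"])
        for host in hosts
        if host.get("tags", {}).get("canary", False)
    ]


SAMPLE_BODY = """
{"hosts": [
  {"ip": "10.0.0.1", "port": 80, "revision": "da08f35b",
   "tags": {"id": "i-910203", "az": "us-east-1a", "canary": true}},
  {"ip": "10.0.0.2", "port": 80, "revision": "da08f35b",
   "tags": {"id": "i-121286", "az": "us-east-1d"}}
]}
"""

all_hosts = parse_hosts(SAMPLE_BODY)
canaries = canary_addresses(all_hosts)
```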

Services list the hosts they want to talk to!

Envoy per-service configuration

location-service/envoy.yaml

    internal_hosts:
      - jobscheduler
      - roads
    external_hosts:
      - dynamodb_iad
      - kinesis_iad

/etc/envoy.conf (on the box)

Active Healthcheck

[Diagram: location-service (Application, Envoy), Discovery, jobscheduler, roads; GET /healthcheck]

Every host healthchecks every host in a destination cluster.
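A minimal sketch of the active healthcheck idea: given the hosts of a destination cluster, probe GET /healthcheck on each and keep the ones that answer. This illustrates the pattern only and is not Envoy's implementation. The timeout and the rule that only HTTP 200 counts as healthy are assumptions.

```python
import urllib.request


def http_healthcheck(ip, port, timeout=1.0):
    """Return True if GET /healthcheck on the host answers with HTTP 200."""
    url = "http://%s:%d/healthcheck" % (ip, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def healthy_hosts(hosts, check=http_healthcheck):
    """Filter a destination cluster down to hosts that pass the check."""
    return [host for host in hosts if check(host["ip"], host["port"])]


# Usage with a stubbed check so the example needs no network access.
cluster = [{"ip": "10.0.0.1", "port": 80}, {"ip": "10.0.0.2", "port": 80}]
up = healthy_hosts(cluster, check=lambda ip, port: ip == "10.0.0.1")
```

Since every source host checks every destination host, the number of probes per interval grows with the product of the two cluster sizes.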

lyft / ratelimit

Go/gRPC service designed to enable generic rate limit scenarios

Go | 224 stars | 6 contributors | January 2017

Why rate limit?

- Control flow

- Protect against attacks

- Bad actors

- Accidents happen (oops!)

Rate Limit Service

- Written in Go

- Enable generic rate limit scenarios

- Decisions based on a domain and set of descriptors

- Settings configured at runtime

- Backed by Redis

[Diagram: Ratelimit, INCR to Redis]

Domains and descriptors

Domain

Defines a container for a set of rate limits

Globally unique

e.g. "envoy_front"

Descriptors

Ordered list of key/value pairs

Case sensitive

e.g. ("destination_cluster", "location-service"), ("user_id", "1234")

Limit definition

Runtime Setting

Defines the requests per unit for a descriptor.

Request flow example

Definition

    domain: test_domain
    key: user_id
    rate_limit:
      unit: hour
      requests_per_unit: 1

Rq1: ("user_id", "1234")

Redis state: user_id_1234: 1

Rs1: RateLimitResponse_OK

Rq2: ("user_id", "9876")

Redis state: user_id_1234: 1, user_id_9876: 1

Rs2: RateLimitResponse_OK

Rq3: ("user_id", "1234")

Redis state: user_id_1234: 2, user_id_9876: 1

Rs3: RateLimitResponse_OVER_LIMIT
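The request flow in this section can be reproduced with a small fixed-window counter. This is a sketch under stated assumptions, not the ratelimit service's code: a Python dict stands in for Redis and its INCR, the counter key layout is invented, and `InMemoryRateLimiter` is a hypothetical name.

```python
import time

OK = "OK"
OVER_LIMIT = "OVER_LIMIT"
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


class InMemoryRateLimiter:
    """Fixed-window counter keyed by domain, descriptor, and time window."""

    def __init__(self, limits, clock=time.time):
        # limits maps (domain, key) -> (unit, requests_per_unit)
        self.limits = limits
        self.clock = clock
        self.counters = {}  # stand-in for Redis

    def should_rate_limit(self, domain, descriptor):
        key, value = descriptor
        limit = self.limits.get((domain, key))
        if limit is None:
            return OK  # no limit configured for this descriptor
        unit, requests_per_unit = limit
        window = int(self.clock() // UNIT_SECONDS[unit])
        counter_key = "%s_%s_%s_%d" % (domain, key, value, window)
        # Equivalent of a Redis INCR on the counter key.
        self.counters[counter_key] = self.counters.get(counter_key, 0) + 1
        if self.counters[counter_key] > requests_per_unit:
            return OVER_LIMIT
        return OK


limiter = InMemoryRateLimiter(
    {("test_domain", "user_id"): ("hour", 1)},
    clock=lambda: 0,  # frozen clock keeps all requests in one window
)
responses = [
    limiter.should_rate_limit("test_domain", ("user_id", "1234")),
    limiter.should_rate_limit("test_domain", ("user_id", "9876")),
    limiter.should_rate_limit("test_domain", ("user_id", "1234")),
]
```

With a limit of one request per hour, the second request for user 1234 in the same window is the one that goes over the limit, matching Rs3.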

Ratelimit Client

    from lyft_idl.client.ratelimit.ratelimit_client import RateLimitClient

    ratelimit_client = RateLimitClient(settings.LYFT_API_USER_AGENT)

    # Determines whether or not to limit jsonp_messages_post according to ratelimit service.
    def should_allow_jsonp_messages_post(ip_address, phone_number):
        domain = settings.get('RATE_LIMIT_DOMAIN')
        ip_descriptors = [(('jsonp_messages_post_from_ip_address', ip_address), )]
        phone_descriptors = [(('jsonp_messages_post_from_phone_number', phone_number), )]
        return (
            ratelimit_client.is_request_allowed(domain, ip_descriptors) and
            ratelimit_client.is_request_allowed(domain, phone_descriptors)
        )

lyft / envoy

Front/service L7 proxy

C++ | 1,924 stars | 62 contributors | September 2016

Why Envoy?

Service Oriented Architecture

- Many languages and frameworks

- Protocols (HTTP/1, HTTP/2, databases, caching, etc…)

- Partial implementation of SOA best practices (retries, timeouts, …)

- Observability

- Load balancers (AWS, F5)

What is Envoy?

The network should be transparent to applications.

When network and application problems do occur, it should be easy to determine the source of the problem.

What is Envoy?

- Modern C++11

- Runs alongside applications

- Service discovery integration

- Rate Limit integration

- HTTP/2 first (get gRPC!)

- Acts as front/edge proxy

- Stats, Stats, Stats

- Logging

Observability: Global Health

Observability: Service to Service

Envoy Client in Python (internal)

    from lyft.api_client import EnvoyClient

    switchboard_client = EnvoyClient(
        service='switchboard'
    )

    switchboard_client.post(
        "/v2/messages",
        data={
            'template': 'welcome'
        },
        headers={
            'x-lyft-user-id': 12345647363394
        }
    )
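The internal `EnvoyClient` is not public, so the following is only a guess at the general pattern behind such a client: the application sends the request to an Envoy running alongside it and names the destination service in the Host header. The local address `127.0.0.1:9001`, the JSON body encoding, and the `build_envoy_request` helper are all assumptions for illustration. The request is constructed but not sent.

```python
import json
import urllib.request

LOCAL_ENVOY = "http://127.0.0.1:9001"  # assumed address of the local Envoy


def build_envoy_request(service, path, data=None, headers=None):
    """Build a request addressed to the local Envoy for a named service."""
    body = json.dumps(data).encode("utf-8") if data is not None else None
    request = urllib.request.Request(
        LOCAL_ENVOY + path,
        data=body,
        method="POST" if body is not None else "GET",
    )
    # The Host header tells the proxy which service should receive the call.
    request.add_header("Host", service)
    if body is not None:
        request.add_header("Content-Type", "application/json")
    for name, value in (headers or {}).items():
        request.add_header(name, str(value))
    return request


envoy_request = build_envoy_request(
    "switchboard",
    "/v2/messages",
    data={"template": "welcome"},
    headers={"x-lyft-user-id": 12345647363394},
)
```

Routing through a local proxy keeps discovery, retries, and timeouts out of application code, which is the point of the "network should be transparent" goal stated earlier.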

Envoy deployment @Lyft

- > 100 services

- > 10,000 hosts

- > 2,000,000 RPS

- All service to service traffic (REST and gRPC)

- MongoDB, DynamoDB, Redis proxy

- External service proxy (AWS and other partners)

- Kibana/Elasticsearch for logging

- LightStep for tracing

- Wavefront for stats

Architecture Revisited

[Diagram: Front Envoy, Application, Envoy, Discovery, Confidant, Ratelimit, >100 Clusters]

Done!

- Lyft is hiring. If you want to work on large-scale problems in a fast-moving, high-growth company, visit lyft.com/jobs

- Visit github.com/lyft

- Slides available at slideshare.net/danielhochman

- Q&A