Open-source Infrastructure at Lyft
Constance Caramanolis, Daniel Hochman
July 2017
Agenda
Overview of Lyft Architecture
Open-source Infrastructure Projects
- Confidant
- Discovery
- Ratelimit
- Envoy
Q&A
Architecture (simplified)
Front Envoy Application
Envoy
Discovery
Confidant
>100 Clusters
Ratelimit
lyft / confidant
Your secret keeper. Stores secrets in Dynamo, encrypted at rest.
1,105
12 contributors
Python
November 2015
How is a service configured?
lyft / location-service Private
common:
  PORT: 8080
  TIMEOUT_MS: 15000
development:
  USE_AUTH: False
staging:
  API_KEY: secret_key_igjq3i494fqq234qbc
production:
  API_KEY: secret_key_ojajf823jj49ij8h
environment.yaml
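The layout of environment.yaml suggests that each environment's keys are layered on top of the `common` block. The sketch below illustrates that idea only; the merge rule and the `resolve_settings` helper are assumptions, not something the slides confirm. The values are copied from the slide, and they show the underlying problem: API keys sit in plain text in the service repository.

```python
from typing import Any, Dict

# Mirrors the structure of environment.yaml (values copied from the slide).
CONFIG: Dict[str, Dict[str, Any]] = {
    "common": {"PORT": 8080, "TIMEOUT_MS": 15000},
    "development": {"USE_AUTH": False},
    "staging": {"API_KEY": "secret_key_igjq3i494fqq234qbc"},
    "production": {"API_KEY": "secret_key_ojajf823jj49ij8h"},
}


def resolve_settings(config: Dict[str, Dict[str, Any]], environment: str) -> Dict[str, Any]:
    """Hypothetical helper: overlay one environment's keys on top of `common`."""
    settings = dict(config.get("common", {}))
    settings.update(config.get(environment, {}))
    return settings


staging = resolve_settings(CONFIG, "staging")
```

Under this assumed rule, `staging` carries the shared `PORT` and `TIMEOUT_MS` plus its own `API_KEY`, and nothing from the other environments.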
Service: location-service
Confidant to the rescue!
Credential
api_key: password123
Behind the scenes
Application
IAM Role
EC2 Instance
Credential
api_key: password123
api_key = os.getenv('CREDENTIAL_API_KEY')
KMS
DynamoDB
Confidant
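From the application's point of view, the credential simply appears as an environment variable. A minimal sketch of reading it defensively follows; the variable name comes from the slide, while the `load_api_key` helper and its error message are illustrative assumptions.

```python
import os


def load_api_key(env_var: str = "CREDENTIAL_API_KEY") -> str:
    """Return the credential from the environment, failing loudly if it is absent."""
    api_key = os.getenv(env_var)
    if not api_key:
        # Hypothetical error handling; the slides only show the os.getenv call.
        raise RuntimeError(f"{env_var} is not set")
    return api_key
```

Failing at startup when the variable is missing tends to be easier to debug than passing `None` to a downstream client.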
Server-blind secrets
Highly sensitive secrets are encrypted and decrypted by the end-users.
Confidant stores but can't read them.
Confidant
KMS
IAM Role
EC2 Instance
lyft / discovery
Provides a REST interface for querying the list of hosts that belong to a microservice
54
6 contributors
Python
August 2016
POST /v1/registration/location-service
{
  "ip": "10.0.0.1",
  "port": 80,
  "revision": "da08f35b",
  "tags": {
    "id": "i-910203",
    "az": "us-east-1a",
    "canary": true
  }
}
Tracking hosts
* * * * *
- Hosts are stored in DynamoDB
- Storage support is abstract
- Hosts are removed if they have not reported since now - HOST_TTL
- Ecosystem designed to tolerate eventual consistency, unlike ZooKeeper, etcd, or Consul
- Pair with active healthchecks
Storage
DynamoDB
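The "now - HOST_TTL" rule can be sketched with an in-memory store standing in for DynamoDB. Everything here is an assumption made for illustration: the `InMemoryRegistry` class, its method names, and the TTL value are not taken from the discovery codebase.

```python
import time
from typing import Dict, List, Optional, Tuple

HOST_TTL = 600  # seconds; illustrative value, not taken from the slides


class InMemoryRegistry:
    """Toy stand-in for the DynamoDB-backed host store."""

    def __init__(self, host_ttl: float = HOST_TTL) -> None:
        self.host_ttl = host_ttl
        # (service, ip, port) -> (registration payload, last report timestamp)
        self._hosts: Dict[Tuple[str, str, int], Tuple[dict, float]] = {}

    def register(self, service: str, payload: dict, now: Optional[float] = None) -> None:
        """Record (or refresh) a host; called each time the host reports in."""
        now = time.time() if now is None else now
        self._hosts[(service, payload["ip"], payload["port"])] = (payload, now)

    def sweep(self, now: Optional[float] = None) -> int:
        """Drop hosts whose last report is older than now - host_ttl."""
        now = time.time() if now is None else now
        cutoff = now - self.host_ttl
        stale = [key for key, (_, seen) in self._hosts.items() if seen < cutoff]
        for key in stale:
            del self._hosts[key]
        return len(stale)

    def hosts(self, service: str) -> List[dict]:
        """Return the payloads of all hosts currently registered for a service."""
        return [payload for (svc, _, _), (payload, _) in self._hosts.items() if svc == service]
```

Because a host that stops reporting lingers until its TTL expires, callers can see stale entries for a while, which is why the slide pairs this scheme with active healthchecks.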
GET /v1/registration/<service>
{
  "hosts": [
    {
      "ip": "10.0.0.1", "port": 80, "revision": "da08f35b",
      "tags": {"id": "i-910203", "az": "us-east-1a", "canary": true}
    },
    ...
    {
      "ip": "10.0.0.2", "port": 80, "revision": "da08f35b",
      "tags": {"id": "i-121286", "az": "us-east-1d"}
    }
  ]
}
Fetching hosts
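A consumer of the GET response might group hosts by availability zone or pick out canaries using the tags. The sketch below parses a sample body containing only the two hosts shown on the slide; the helper names `hosts_by_az` and `canary_hosts` are hypothetical.

```python
import json
from typing import Dict, List

# Sample body with the two hosts from the slide (the elided entries are left out).
RESPONSE_BODY = """
{
  "hosts": [
    {"ip": "10.0.0.1", "port": 80, "revision": "da08f35b",
     "tags": {"id": "i-910203", "az": "us-east-1a", "canary": true}},
    {"ip": "10.0.0.2", "port": 80, "revision": "da08f35b",
     "tags": {"id": "i-121286", "az": "us-east-1d"}}
  ]
}
"""


def hosts_by_az(body: str) -> Dict[str, List[str]]:
    """Group "ip:port" addresses by the `az` tag of each host."""
    grouped: Dict[str, List[str]] = {}
    for host in json.loads(body)["hosts"]:
        az = host["tags"].get("az", "unknown")
        grouped.setdefault(az, []).append(f'{host["ip"]}:{host["port"]}')
    return grouped


def canary_hosts(body: str) -> List[str]:
    """Return the IPs of hosts tagged as canaries; a missing tag counts as False."""
    return [h["ip"] for h in json.loads(body)["hosts"] if h["tags"].get("canary", False)]
```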
Services list the hosts they want to talk to!
internal_hosts:
- jobscheduler
- roads
external_hosts:
- dynamodb_iad
- kinesis_iad
Envoy per-service configuration
location-service/envoy.yaml
/etc/envoy.conf (on the box)
Active Healthcheck
Application
Envoy
Discovery
jobscheduler
roads
GET /healthcheck
Application
Envoy
GET
GET
Every host healthchecks every host in a destination cluster
location-service
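In the deck, Envoy itself performs the active healthchecks. The following is only a rough Python illustration of the idea of probing every host in a destination cluster with GET /healthcheck; the function names, the timeout, and the "HTTP 200 means healthy" rule are all assumptions.

```python
import urllib.request
from typing import Callable, Dict, Iterable


def http_probe(address: str, timeout: float = 1.0) -> bool:
    """Return True if GET http://<address>/healthcheck answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"http://{address}/healthcheck", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, and timeouts
        return False


def check_cluster(addresses: Iterable[str],
                  probe: Callable[[str], bool] = http_probe) -> Dict[str, bool]:
    """Probe every address in a destination cluster and report its health."""
    return {address: probe(address) for address in addresses}
```

Passing the probe as a parameter keeps the sketch testable without a network, for example `check_cluster(["10.0.0.1:80"], probe=lambda address: True)`.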
lyft / ratelimit
Go/gRPC service designed to enable generic rate limit scenarios
224
6 contributors
Go
January 2017
Why rate limit?
- Control flow
- Protect against attacks
- Bad actors
- Accidents happen (oops!)
Rate Limit Service
- Written in Go
- Enable generic rate limit scenarios
- Decisions based on a domain and set of descriptors
- Settings configured at runtime
- Backed by Redis
Ratelimit
?
INCR
Domains and descriptors
Domain
Defines a container for a set of rate limits
Globally unique
e.g. "envoy_front"
Descriptors
Ordered list of key/value pairs
Case sensitive
e.g. ("destination_cluster", "location-service"), ("user_id", "1234")
Limit definition
Runtime Setting
Defines the requests per unit for a descriptor.
Request flow example
Rq1: (“user_id”, “1234”)
Redis state: user_id_1234: 1
Rs1: RateLimitResponse_OK
Rq2: (“user_id”, “9876”)
Redis state: user_id_1234: 1, user_id_9876: 1
Rs2: RateLimitResponse_OK
Rq3: (“user_id”, “1234”)
Redis state: user_id_1234: 2, user_id_9876: 1
Rs3: RateLimitResponse_OVER_LIMIT
Definition
domain: test_domain
key: user_id
rate_limit:
  unit: hour
  requests_per_unit: 1
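The request flow can be replayed with a small in-memory counter playing the role of Redis. This is a sketch under stated assumptions, not the ratelimit service's implementation: the `ToyRateLimiter` class is hypothetical, the counter key format simply copies the slide (`user_id_1234`), and the hourly window reset is ignored.

```python
from typing import Dict, Tuple

OK = "RateLimitResponse_OK"
OVER_LIMIT = "RateLimitResponse_OVER_LIMIT"

# Limit from the definition: one request per hour for the user_id key.
LIMITS: Dict[Tuple[str, str], int] = {("test_domain", "user_id"): 1}


class ToyRateLimiter:
    """In-memory stand-in for the Redis-backed counters (no time window handling)."""

    def __init__(self, limits: Dict[Tuple[str, str], int]) -> None:
        self.limits = limits
        self.counters: Dict[str, int] = {}  # plays the role of Redis state

    def should_rate_limit(self, domain: str, descriptor: Tuple[str, str]) -> str:
        key, value = descriptor
        counter_key = f"{key}_{value}"  # e.g. user_id_1234, as on the slide
        # Increment first, then compare, mirroring an INCR followed by a check.
        self.counters[counter_key] = self.counters.get(counter_key, 0) + 1
        limit = self.limits.get((domain, key))
        if limit is not None and self.counters[counter_key] > limit:
            return OVER_LIMIT
        return OK


limiter = ToyRateLimiter(LIMITS)
responses = [
    limiter.should_rate_limit("test_domain", ("user_id", "1234")),  # Rq1
    limiter.should_rate_limit("test_domain", ("user_id", "9876")),  # Rq2
    limiter.should_rate_limit("test_domain", ("user_id", "1234")),  # Rq3
]
```

After the three requests, `limiter.counters` matches the final Redis state on the slide, and only the third response is over the limit.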
Ratelimit Client
from lyft_idl.client.ratelimit.ratelimit_client import RateLimitClient

ratelimit_client = RateLimitClient(settings.LYFT_API_USER_AGENT)


# Determines whether or not to limit jsonp_messages_post according to ratelimit service.
def should_allow_jsonp_messages_post(ip_address, phone_number):
    domain = settings.get('RATE_LIMIT_DOMAIN')
    ip_descriptors = [(('jsonp_messages_post_from_ip_address', ip_address), )]
    phone_descriptors = [(('jsonp_messages_post_from_phone_number', phone_number), )]
    return (
        ratelimit_client.is_request_allowed(domain, ip_descriptors) and
        ratelimit_client.is_request_allowed(domain, phone_descriptors)
    )
lyft / envoy
Front/service L7 proxy
1,924
62 contributors
C++
September 2016
Why Envoy?
Service Oriented Architecture
- Many languages and frameworks
- Protocols (HTTP/1, HTTP/2, databases, caching, etc…)
- Partial implementation of SOA best practices (retries, timeouts, …)
- Observability
- Load balancers (AWS, F5)
What is Envoy?
The network should be transparent to applications.
When network and application problems do occur, it should be easy to determine the source of the problem.
What is Envoy?
- Modern C++11
- Runs alongside applications
- Service discovery integration
- Rate Limit integration
- HTTP/2 first (get gRPC!)
- Act as front/edge proxy
- Stats, Stats, Stats
- Logging
Observability: Global Health
Observability: Service to Service
Envoy Client in Python (internal)
from lyft.api_client import EnvoyClient

switchboard_client = EnvoyClient(
    service='switchboard'
)

switchboard_client.post(
    "/v2/messages",
    data={
        'template': 'welcome'
    },
    headers={
        'x-lyft-user-id': 12345647363394
    }
)
Envoy deployment @Lyft
- > 100 services
- > 10,000 hosts
- > 2,000,000 RPS
- All service to service traffic (REST and gRPC)
- MongoDB, DynamoDB, Redis proxy
- External service proxy (AWS and other partners)
- Kibana/Elasticsearch for logging
- LightStep for tracing
- Wavefront for stats
Architecture Revisited
Front Envoy
Application
Envoy
Discovery
Confidant
>100 Clusters
Ratelimit
Done!
- Lyft is hiring. If you want to work on large-scale problems in a fast-moving, high-growth company, visit lyft.com/jobs
- Visit github.com/lyft
- Slides available at slideshare.net/danielhochman
- Q&A