AWS Lambda from the trenches


AWS LAMBDA
from the TRENCHES

what you should know before you go to production

hi, I’m Yan Cui

AWS user since 2009

April 2016

hidden complexities and dependencies

low utilisation to leave room for traffic spikes

EC2 scaling is slow, so scale earlier

lots of cost for unused resources

up to 30 mins for deployment

deployment required downtime

“lead time to someone saying thank you is the only reputation metric that matters.”

- Dan North

“what would good look like for us?”

DEPLOYMENTS SHOULD...
be small
be fast
have zero downtime
have no lock-step

FEATURES SHOULD...
be deployable independently
be loosely-coupled

WE WANT TO...
minimise cost for unused resources
minimise ops effort
reduce tech mess
deliver visible improvements faster

November 2016

170 Lambda functions in prod

1.2 GB deployment packages in prod

95% cost saving vs EC2

15x no. of prod releases per month

[Timeline: “is Lambda a good fit?” → 1st function in prod → 170 functions… WOOF!]

as the number of functions grew, so did the production concerns:

ALERTING
CI / CD
TESTING
LOGGING
MONITORING
SECURITY
DISTRIBUTED TRACING
CONFIG MANAGEMENT

evolving the PLATFORM

rebuilt search

[Architecture: Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch; search queries served via Amazon API Gateway → AWS Lambda → Amazon CloudSearch]

new analytics pipeline

[Architecture: Legacy Monolith → Amazon Kinesis → AWS Lambda → Google BigQuery]

1 developer, 2 days from design to production
(his 1st serverless project)

“nothing ever got done this fast at Skype!”

- Chris Twamley

“lead time to someone saying thank you is the only reputation metric that matters.”

- Dan North

[Screenshots: features rebuilt with Lambda, backed by BigQuery and grapheneDB]

getting PRODUCTION READY

CHOOSE A DEPLOYMENT FRAMEWORK

http://serverless.com

http://apex.run

https://github.com/claudiajs/claudia

https://github.com/Miserlou/Zappa

http://gosparta.io/

TESTING

amzn.to/29Lxuzu

Level of Testing

1. Unit: do our objects do the right thing? are they easy to work with?
2. Integration: does our code work against code we can't change?

test by invoking the handler
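For example, a bare-bones integration test might look like the sketch below; the handler path, event shape, and assertion are illustrative assumptions rather than the talk's actual code:

// integration test: invoke the handler directly with a stubbed event
const assert = require('assert');
const { handler } = require('./handlers/search');   // hypothetical module under test

const testEvent = {
  Records: [{
    kinesis: {
      // Kinesis delivers payloads base64-encoded
      data: Buffer.from(JSON.stringify({ yublId: '42', text: 'hello' })).toString('base64')
    }
  }]
};

// 2017-era Node.js handlers take (event, context, callback)
handler(testEvent, { functionName: 'search-indexer' }, (err, result) => {
  assert.ifError(err);
  console.log('handler completed with', result);
});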

Level of Testing

3. Acceptance: does the whole system work?

Level of Testing

[Diagram: unit → integration → acceptance; feedback is fastest at the unit level, confidence is highest at the acceptance level]

“…We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise.

The mess in such tests is telling us that the design isn’t right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test…”

Don’t Mock Types You Can’t Change

“…The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do…

Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…”

Don’t Mock Types You Can’t Change

Don’t Mock Types You Can’t Change → Don’t Mock Services You Can’t Change

“…Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code. An end-to-end test interacts with the system only from the outside: through its interface…”

Testing End-to-End

[End-to-end test: feed Test Input into Amazon Kinesis at the front of the pipeline (Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch), then Validate the results from the outside through Amazon API Gateway → AWS Lambda]
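A sketch of that flow, assuming a stream called legacy-events and a public search endpoint behind API Gateway (both placeholders, not the talk's actual names):

// acceptance test: push a test record into Kinesis, then validate through the public API
const AWS = require('aws-sdk');
const https = require('https');
const kinesis = new AWS.Kinesis({ region: 'us-east-1' });

const testInput = { yublId: 'test-123', text: 'end-to-end test post' };

async function run() {
  // 1. feed the Test Input into the stream the monolith normally writes to
  await kinesis.putRecord({
    StreamName: 'legacy-events',            // placeholder stream name
    PartitionKey: testInput.yublId,
    Data: JSON.stringify(testInput)
  }).promise();

  // 2. give the Lambda + CloudSearch pipeline a moment to index it
  await new Promise(resolve => setTimeout(resolve, 10000));

  // 3. validate from the outside, through API Gateway
  https.get('https://api.example.com/search?q=end-to-end', res => {   // placeholder URL
    let body = '';
    res.on('data', chunk => body += chunk);
    res.on('end', () => {
      if (!body.includes('test-123')) throw new Error('test record not found in search results');
      console.log('acceptance test passed');
    });
  });
}

run().catch(err => { console.error(err); process.exit(1); });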

CI + CD PIPELINE

“the earlier you consider CI + CD, the more time you save in the long run”

- me

“…We prefer to have the end-to-end tests exercise both the system and the process by which it’s built and deployed… This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software’s lifetime…”

Testing End-to-End

“deployment scripts that only live on the CI box are a disaster waiting to happen”

- me

Jenkins build config deploys and tests:
unit + integration tests → deploy → acceptance tests
(auto, auto, manual)

build.sh allows repeatable builds on both local & CI

LOGGING

2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?

UTC timestamp | API Gateway request ID | your log message
(the log group and log stream also carry the function name, date and function version)

LOG OVERLOAD

CENTRALISE LOGS
MAKE THEM EASILY SEARCHABLE

the ELK stack

[Pipeline: CloudWatch Logs → AWS Lambda → ELK stack, wired up with CloudWatch Events]
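A minimal sketch of the shipping function in that pipeline, assuming it is triggered by a CloudWatch Logs subscription and writes to an Elasticsearch bulk endpoint (the host, index name, and missing auth are placeholder assumptions):

// ships CloudWatch Logs events to Elasticsearch (the "E" in ELK)
const zlib = require('zlib');
const https = require('https');

exports.handler = (event, context, callback) => {
  // CloudWatch Logs delivers a base64-encoded, gzipped payload
  const payload = Buffer.from(event.awslogs.data, 'base64');
  const parsed = JSON.parse(zlib.gunzipSync(payload).toString('utf8'));

  const docs = parsed.logEvents.map(e => ({
    '@timestamp': new Date(e.timestamp).toISOString(),
    functionName: parsed.logGroup.replace('/aws/lambda/', ''),
    message: e.message
  }));

  // bulk-index into Elasticsearch (placeholder host; auth omitted for brevity)
  const body = docs
    .map(d => JSON.stringify({ index: { _index: 'lambda-logs' } }) + '\n' + JSON.stringify(d))
    .join('\n') + '\n';

  const req = https.request({
    host: 'elk.example.com',
    path: '/_bulk',
    method: 'POST',
    headers: { 'Content-Type': 'application/x-ndjson' }
  }, res => callback(null, res.statusCode));

  req.on('error', callback);
  req.end(body);
};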

DISTRIBUTED TRACING

“my followers didn’t receive my new post!”

- a user

where could the problem be?

correlation IDs*

* e.g. request-id, user-id, yubl-id, etc.

ROLL YOUR OWN CLIENTS
(kinesis client, http client, sns client)

X-RAY
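A sketch of the "roll your own clients" idea for the HTTP case: a thin wrapper that forwards whatever correlation IDs the current invocation captured. The correlation-ids module, header prefix, and API are illustrative assumptions, not the talk's actual library:

// correlation-ids.js: a simple per-invocation registry (illustrative)
let ids = {};
exports.set = newIds => { ids = Object.assign({}, newIds); };
exports.get = () => Object.assign({}, ids);

// http-client.js: wraps https and forwards correlation IDs as headers
const https = require('https');
const correlationIds = require('./correlation-ids');

exports.get = (host, path, done) => {
  const headers = {};
  const current = correlationIds.get();
  Object.keys(current).forEach(key => {
    headers[`x-correlation-${key}`] = current[key];   // e.g. x-correlation-request-id
  });

  https.get({ host, path, headers }, res => {
    let body = '';
    res.on('data', chunk => body += chunk);
    res.on('end', () => done(null, res.statusCode, body));
  }).on('error', done);
};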

MONITORING + ALERTING

“where do I install monitoring agents?”

you can’t

CloudWatch:
• invocation count
• error count
• latency
• throttling
• granular to the minute
• supports custom metrics

Datadog:
• same metrics as CloudWatch
• better dashboards
• supports custom metrics

https://www.datadoghq.com/blog/monitoring-lambda-functions-datadog/

“how do I batch up and send logs in the background?”

you can't (kinda)

console.log("hydrating yubls from db…");

console.log("fetching user info from user-api");

console.log("MONITORING|1489795335|27.4|latency|user-api-latency");

console.log("MONITORING|1489795335|8|count|yubls-served");

format: MONITORING | timestamp | metric value | metric type | metric name

[CloudWatch Logs → AWS Lambda: logs go to the ELK stack, metrics go to CloudWatch]
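A sketch of both halves of this convention, assuming a "yubl" metric namespace (a placeholder): a helper that writes metrics in the MONITORING|… format shown above, and the parsing a log-shipping function could do to republish them as CloudWatch custom metrics:

// metrics.js: record custom metrics as specially-formatted log lines
exports.record = (name, value, type) => {
  const timestamp = Math.floor(Date.now() / 1000);
  console.log(`MONITORING|${timestamp}|${value}|${type}|${name}`);
};

// in the log-shipping Lambda: turn those lines into CloudWatch custom metrics
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

exports.publish = logMessage => {
  const [, timestamp, value, type, name] = logMessage.trim().split('|');
  return cloudwatch.putMetricData({
    Namespace: 'yubl',                                   // placeholder namespace
    MetricData: [{
      MetricName: name,
      Timestamp: new Date(Number(timestamp) * 1000),
      Value: parseFloat(value),
      Unit: type === 'latency' ? 'Milliseconds' : 'Count'
    }]
  }).promise();
};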

DASHBOARDS

SET ALARMS

TRACK APP-LEVEL METRICS

Not Only CloudWatch

“you really don't want your monitoring system to fail at the same time as the system it monitors”

- me

CONFIG MANAGEMENT

easily and quickly propagate config changes

CENTRALISED CONFIG SERVICE
+ CLIENT LIBRARY
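A sketch of what such a client library might look like, assuming the config service is a simple HTTPS endpoint returning JSON; the host, path, and cache TTL are illustrative assumptions:

// config-client.js: fetch config from the centralised config service and cache it briefly
const https = require('https');

const CACHE_TTL_MS = 3 * 60 * 1000;   // re-fetch every few minutes so changes propagate quickly
let cache = null;
let loadedAt = 0;

function fetchConfig() {
  return new Promise((resolve, reject) => {
    https.get({ host: 'config.example.com', path: '/prod/config' }, res => {   // placeholder endpoint
      let body = '';
      res.on('data', chunk => body += chunk);
      res.on('end', () => resolve(JSON.parse(body)));
    }).on('error', reject);
  });
}

exports.get = async key => {
  if (!cache || Date.now() - loadedAt > CACHE_TTL_MS) {
    cache = await fetchConfig();
    loadedAt = Date.now();
  }
  return cache[key];
};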

sensitive data should be encrypted in-flight and at rest

(credentials, connection strings, etc.)

role-based access + KMS

[Diagram: sensitive config values are encrypted with KMS before being stored (encrypted at rest), served by the config API over HTTPS (encrypted in-flight), and decrypted with KMS on read; access to the config API can be controlled with IAM roles*]

* http://amzn.to/2mxTOyH
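A sketch of the KMS round-trip for one sensitive value, using standard aws-sdk calls; the CMK alias is a placeholder and the IAM policies (role-based access) are assumed to be in place:

// encrypt a sensitive value before storing it in the config service,
// and decrypt it again when a function reads it back
const AWS = require('aws-sdk');
const kms = new AWS.KMS();

async function encrypt(plaintext) {
  const res = await kms.encrypt({
    KeyId: 'alias/config',          // placeholder CMK alias; who can call this is controlled by IAM
    Plaintext: plaintext
  }).promise();
  return res.CiphertextBlob.toString('base64');   // safe to store at rest
}

async function decrypt(ciphertextBase64) {
  const res = await kms.decrypt({
    CiphertextBlob: Buffer.from(ciphertextBase64, 'base64')
  }).promise();
  return res.Plaintext.toString('utf8');
}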

FRAMEWORK PLUG-INS

serverless-plugin-kmsvariables
serverless-secrets
serverless-meta-sync

PRO TIPS

MAP TIMEOUTS TO HTTP 504
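One way this could be done (a sketch, not the talk's exact implementation): race the real work against a timer slightly shorter than the remaining execution time, and return a 504 yourself rather than letting the request die with a generic error. doRealWork is a stand-in for the actual business logic:

// API Gateway proxy handler that returns a 504 shortly before Lambda would time out
const doRealWork = async event => ({ ok: true });   // stand-in for the real business logic

exports.handler = async (event, context) => {
  const budget = context.getRemainingTimeInMillis() - 500;   // leave headroom to respond

  let timer;
  const timeout = new Promise(resolve => {
    timer = setTimeout(() => resolve({ statusCode: 504, body: 'upstream timed out' }), budget);
  });

  const work = doRealWork(event)
    .then(result => ({ statusCode: 200, body: JSON.stringify(result) }));

  const response = await Promise.race([work, timeout]);
  clearTimeout(timer);
  return response;
};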

AVOID 128MB FOR PRODUCTION

128 MB also means the least CPU; a function that runs too slowly can time out, and a stream-based function that keeps retrying the same batch can end up in a continuous timeout loop…

AVOID COLD STARTS

functions are unloaded if idle for a while

noticeable cold start time (package size matters)

[Diagram: a scheduled CloudWatch Event pings the AWS Lambda functions periodically to keep them warm]
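On the receiving side, the function needs to recognise the ping and return early; a sketch, assuming the default payload of a scheduled CloudWatch Event:

// short-circuit scheduled keep-warm pings before doing any real work
exports.handler = (event, context, callback) => {
  if (event && event.source === 'aws.events' && event['detail-type'] === 'Scheduled Event') {
    // a keep-warm ping from the CloudWatch Event schedule: do nothing
    return callback(null, 'pinged');
  }

  // ...real processing goes here...
  callback(null, 'done');
};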

HEALTH CHECKS?

even then…

functions are recycled every 4 hours

https://www.iopipe.com/2016/09/understanding-aws-lambda-coldstarts/

Coldstarts happen, with few exceptions, 4 hours from the creation of a host VM.

AVOID HARD ASSUMPTIONS ABOUT FUNCTION LIFETIME

USE STATE FOR OPTIMISATION
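For example (a sketch; loadConfig is a stand-in for any expensive initialisation):

// module-level state survives across invocations on a warm container,
// but may vanish at any time - treat it purely as a cache
let cachedConfig = null;

const loadConfig = async () => ({ tableName: 'yubls' });   // stand-in for an expensive lookup

exports.handler = async (event) => {
  if (!cachedConfig) {
    cachedConfig = await loadConfig();   // cold start or recycled container: pay the cost once
  }
  // ...use cachedConfig, but never assume it was already there...
  return { statusCode: 200, body: JSON.stringify(cachedConfig) };
};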

CLEAN UP OLD PACKAGES

max 50 MB deployment package size
max 75 GB total deployment package size*
* limit is per AWS region

Janitor Monkey → Janitor Lambda

http://bit.ly/2nOAzlt
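A sketch of what a Janitor Lambda could do with standard SDK calls: keep $LATEST and any version an alias points to, and delete the rest. The retention policy and error handling are simplified:

// delete old, unreferenced versions of a function to stay under the 75 GB regional limit
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

async function cleanUp(functionName) {
  const aliases = await lambda.listAliases({ FunctionName: functionName }).promise();
  const keep = new Set(aliases.Aliases.map(a => a.FunctionVersion));
  keep.add('$LATEST');

  let marker;
  do {
    const page = await lambda.listVersionsByFunction({
      FunctionName: functionName,
      Marker: marker
    }).promise();

    for (const version of page.Versions) {
      if (!keep.has(version.Version)) {
        await lambda.deleteFunction({
          FunctionName: functionName,
          Qualifier: version.Version
        }).promise();
      }
    }
    marker = page.NextMarker;
  } while (marker);
}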

USE RECURSION FOR LONG-RUNNING TASKS

max 5 mins execution time
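A sketch of the recursive pattern: work through items until the remaining time runs low, then invoke yourself asynchronously with a cursor so the next invocation carries on. The event shape and processItem are illustrative:

// process a long task in slices; recurse via async self-invocation before the 5-minute limit
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

const processItem = async i => console.log(`processing item ${i}`);   // stand-in unit of work

exports.handler = async (event, context) => {
  let cursor = event.cursor || 0;

  while (cursor < event.totalItems) {
    await processItem(cursor);
    cursor += 1;

    if (context.getRemainingTimeInMillis() < 10000) {
      // running out of time: hand the remaining work to a fresh invocation
      await lambda.invoke({
        FunctionName: context.functionName,
        InvocationType: 'Event',          // fire-and-forget
        Payload: JSON.stringify({ totalItems: event.totalItems, cursor })
      }).promise();
      return `recursed at cursor ${cursor}`;
    }
  }
  return 'done';
};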

CONSIDER PARTIAL FAILURES

“AWS Lambda polls your stream and invokes your Lambda function. Therefore, if a Lambda function fails, AWS Lambda attempts to process the erring batch of records until the time the data expires…”

http://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html

should function fail on partial/any failures?

use local state to facilitate partial retries

DLQ after max attempts
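Putting those three ideas together, a sketch (not the talk's code) of a Kinesis handler that remembers what it has already processed, retries only the failures, and parks records on a DLQ after a few attempts; the queue URL, retry count, and handleRecord are placeholders:

// Kinesis handler: use local state so a retried batch only re-processes failed records
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

const processed = new Set();       // survives batch retries on the same warm container
const attempts = new Map();
const MAX_ATTEMPTS = 3;

const handleRecord = async payload => console.log('handled', payload);   // stand-in

exports.handler = async (event) => {
  let failed = false;

  for (const record of event.Records) {
    if (processed.has(record.eventID)) continue;   // already done on a previous attempt

    try {
      await handleRecord(JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString()));
      processed.add(record.eventID);
    } catch (err) {
      const n = (attempts.get(record.eventID) || 0) + 1;
      attempts.set(record.eventID, n);

      if (n >= MAX_ATTEMPTS) {
        // give up on this record: park it on a DLQ and move on
        await sqs.sendMessage({
          QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/dlq',   // placeholder
          MessageBody: JSON.stringify({ eventID: record.eventID, error: err.message })
        }).promise();
        processed.add(record.eventID);
      } else {
        failed = true;   // throwing below makes Lambda retry the whole batch
      }
    }
  }

  if (failed) throw new Error('batch had failures, retrying');
};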

PROCESS SQS WITH RECURSIVE FUNCTIONS

http://bit.ly/2npomX6
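Before SQS became a native event source, that pattern looked roughly like this sketch: receive, process, delete, and re-invoke yourself before the timeout; the queue URL and handleMessage are placeholders:

// poll SQS in a recursive Lambda: receive, process, delete, then re-invoke
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();
const lambda = new AWS.Lambda();
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/tasks';   // placeholder

const handleMessage = async body => console.log('processing', body);   // stand-in

exports.handler = async (event, context) => {
  while (context.getRemainingTimeInMillis() > 30000) {
    const res = await sqs.receiveMessage({
      QueueUrl: QUEUE_URL,
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: 10          // long polling
    }).promise();

    for (const msg of res.Messages || []) {
      await handleMessage(JSON.parse(msg.Body));
      await sqs.deleteMessage({ QueueUrl: QUEUE_URL, ReceiptHandle: msg.ReceiptHandle }).promise();
    }
  }

  // keep the polling loop alive by invoking a fresh copy of ourselves
  await lambda.invoke({
    FunctionName: context.functionName,
    InvocationType: 'Event',
    Payload: JSON.stringify({})
  }).promise();
};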

AVOID HOT KINESIS STREAMS

“Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.”

http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html

“If your stream has 100 active shards, there will be 100 Lambda functions running concurrently. Then, each Lambda function processes events on a shard in the order that they arrive.”

http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html

when the no. of processors goes up…

you can have too many Kinesis read operations (ReadProvisionedThroughputExceeded)

and unpredictable spikes in read ‘latency’ (ReadRecords.IteratorAge)

…you can kinda work around it

@theburningmonk | theburningmonk.com | github.com/theburningmonk

Yubl’s journey to Serverless
part 1: overview http://theburningmonk.com/2016/12/yubls-road-to-serverless-architecture-part-1/
part 2: test + CI/CD http://theburningmonk.com/2017/02/yubls-road-to-serverless-architecture-part-2/
part 3: ops http://theburningmonk.com/2017/03/yubls-road-to-serverless-architecture-part-3/

QUESTIONS?