BUILDING A MACHINE LEARNING APPLICATION WITH AWS LAMBDA
Ludi [email protected]
Silicon Valley Big Data Science Meetup, March 17, 2016
(+ help from Tom and Prithvi)
Q: What is AWS Lambda?
A: AWS Lambda is a compute service that runs code (a Lambda function) on demand. It simplifies the process of running code in the cloud by managing compute resources automatically.

Offloads DevOps tasks related to VMs:
• Server and operating system maintenance
• Capacity provisioning
• Scaling
• Code monitoring and logging
• Security patches
MAJOR STEPS
Step 1: Identify problem to solve
Step 2: Train model on data
Step 3: Export the model as a POJO
Step 4: Write code for Lambda handler
Step 5: Build deployment package (.zip file) and upload to Lambda
Step 6: Map API endpoint to Lambda function
Step 7: Embed endpoint in application
A CONCRETE USE CASE: DOMAIN NAME CLASSIFICATION
Malicious domains
• Carry out malicious activity: botnets, phishing, malware hosting, etc.
• Names are generated by algorithms to defeat security systems

Goal: Classify domains as legitimate vs. malicious

Legitimate     Malicious
h2o            zyxgifnjobqhzptuodmzov
zen-cart       c3p4j7zdxexg1f2tuzk117wyzn
fedoraforum    batdtrbtrikw
FEATURES
• String length
• Shannon entropy
  o Measure of uncertainty in a random variable
• Number of substrings that are English words
• Proportion of vowels
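The four features above can be sketched in plain Python. This is a minimal illustration, not the app's actual code: the function names `entropy`, `p_vowels`, and `num_valid_substrings` follow the names used in the Jython slide, but their bodies, the `min_len` parameter, and the tiny word set in the usage note are assumptions made here.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def entropy(s):
    """Shannon entropy, in bits per character, of the characters in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def p_vowels(s):
    """Proportion of characters in s that are vowels."""
    if not s:
        return 0.0
    return sum(ch in VOWELS for ch in s.lower()) / len(s)


def num_valid_substrings(s, words, min_len=2):
    """Count substrings of s (by position) found in the word set.

    min_len is an assumed cutoff so that single letters are not counted.
    """
    s = s.lower()
    count = 0
    for i in range(len(s)):
        for j in range(i + min_len, len(s) + 1):
            if s[i:j] in words:
                count += 1
    return count


def features(domain, words):
    """Feature vector for the part of the domain before the first dot."""
    name = domain.split(".")[0]
    return [len(name), entropy(name), p_vowels(name),
            num_valid_substrings(name, words)]
```

For example, `features("zen-cart.com", {"zen", "cart", "car", "art"})` returns the length, entropy, vowel proportion, and word-substring count for `zen-cart`. In the real app the word set would be loaded from the English word list described under DATA.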
DATA
• Domains and whether they are malicious
  o http://datadrivensecurity.info/blog/data/2014/10/legit-dga_domains.csv.zip
  o 133,927 rows
• English words
  o https://raw.githubusercontent.com/dwyl/english-words/master/words.txt
  o 354,985 rows
MODEL INFORMATION
Malicious Domain Model
Algorithm: GLM
Model family: Binomial
Regularization: Ridge
Threshold (max F1): 0.4935

Confusion matrix on validation data (rows: actual class, columns: predicted class)

Actual \ Predicted    0        1        Error
0                     15889    315      FPR 0.0194
1                     346      10043    FNR 0.0333
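The error rates in the confusion matrix can be checked by hand. The sketch below recomputes them from the four counts on the slide, reading each row as an actual class; precision, recall, and F1 are derived here for illustration and do not appear on the slide.

```python
# Counts from the slide; each row is an actual class, each column a prediction.
tn, fp = 15889, 315    # actual 0 (legitimate): predicted 0, predicted 1
fn, tp = 346, 10043    # actual 1 (malicious):  predicted 0, predicted 1

fpr = fp / (fp + tn)   # false positive rate: legitimate domains flagged
fnr = fn / (fn + tp)   # false negative rate: malicious domains missed

# Derived metrics (not on the slide).
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"FPR={fpr:.4f} FNR={fnr:.4f} F1={f1:.4f}")
```

Rounded to four decimals, `fpr` and `fnr` match the 0.0194 and 0.0333 shown in the Error column.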
WORKFLOW FOR THIS APP
Input domain name → Get predictions → Malicious domain?
• Yes → Malicious
• No → Legitimate → Visit web page
APP ARCHITECTURE DIAGRAM
JavaScript App → HTTPS POST (domain name) → REST endpoint → Lambda
Inside Lambda: Lambda Function Handler → Jython Feature Munging → H2O Model POJO → Prediction
Lambda → JSON with prediction → JavaScript App
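The request side of this diagram can be sketched from any client, not only the JavaScript app. The example below is a minimal sketch: the JSON field name `domain` is taken from `request.domain` in the handler slide, while the endpoint URL, the function names, and the shape of the JSON response are assumptions.

```python
import json
import urllib.request


def build_request(endpoint_url, domain):
    """Build an HTTPS POST carrying the domain name as a JSON body."""
    body = json.dumps({"domain": domain}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_prediction(endpoint_url, domain, timeout=10):
    """Send the request and decode the JSON reply (response shape assumed)."""
    request = build_request(endpoint_url, domain)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `get_prediction("https://<your-api-id>.execute-api.<region>.amazonaws.com/<stage>/<resource>", "h2o.ai")`, with the placeholders filled in from your own API Gateway deployment.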
LAMBDA FUNCTION HANDLER
public static ResponseClass myHandler(RequestClass request, Context context) throws PyException {
    PyModule module = new PyModule();

    // Prediction code is in pymodule.py
    double[] predictions = module.predict(request.domain);
    return new ResponseClass(predictions);
}
JYTHON FEATURE MUNGING
def predict(domain):
    domain = domain.split('.')[0]
    row = RowData()
    functions = [len, entropy, p_vowels, num_valid_substrings]
    eval_features = [f(domain) for f in functions]
    names = NamesHolder_MaliciousDomainModel().VALUES
    beta = MaliciousDomainModel().BETA().VALUES
    feature_coef_product = [beta[len(beta) - 1]]
    for i in range(len(names)):
        row.put(names[i], float(eval_features[i]))
        feature_coef_product.append(eval_features[i] * beta[i])

    # prediction
    model = EasyPredictModelWrapper(MaliciousDomainModel())
    p = model.predictBinomial(row)
H2O MODEL POJO
static final class BETA_0 implements java.io.Serializable {
    static final void fill(double[] sa) {
        sa[0] = 1.49207826021648;
        sa[1] = 2.8502716978560194;
        sa[2] = -8.839804567200542;
        sa[3] = -0.7977065034624655;
        sa[4] = -14.94132841574946;
    }
}
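To make the role of these coefficients concrete, here is a rough sketch of how a binomial GLM turns a feature vector into a class label. It rests on several assumptions: that the last coefficient is the intercept (as `beta[len(beta) - 1]` in the Jython code suggests), that the first four apply to raw features in the order length, entropy, vowel proportion, word-substring count, that the link function is the logit, and that a score at or above the max-F1 threshold of 0.4935 means "malicious". The generated POJO remains the authoritative scorer.

```python
import math

# Coefficients copied from the POJO slide; the last one is assumed to be the intercept.
BETA = [1.49207826021648, 2.8502716978560194, -8.839804567200542,
        -0.7977065034624655, -14.94132841574946]
THRESHOLD = 0.4935  # max-F1 threshold from the model information slide


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, beta=BETA):
    """Probability of class 1 under an assumed logit link."""
    *coefs, intercept = beta
    if len(features) != len(coefs):
        raise ValueError("expected %d features" % len(coefs))
    eta = intercept + sum(b * x for b, x in zip(coefs, features))
    return sigmoid(eta)


def classify(features, threshold=THRESHOLD):
    """Return 1 (malicious) when the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features) >= threshold else 0
```

Under these assumptions, longer and higher-entropy names push the score toward "malicious", while a higher vowel proportion and more English-word substrings pull it toward "legitimate", which matches the signs of the coefficients.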
HANDS-ON DEMONSTRATION
STEP 1: Build
$ git clone https://github.com/h2oai/app-malicious-domains
$ cd app-malicious-domains
$ gradle wrapper
$ ./gradlew build

STEP 2: Create Lambda function and set API endpoint
See instructions and screenshots in README.md

STEP 3: Use the app in a web browser
$ ./gradlew jettyRunWar -x generateModel
http://localhost:8080
TROUBLESHOOTING
• Common Py errors
  o Another H2O is already running
    • Py script can't find the data in h2o.import_file()
• Common Java errors
  o Java not installed at all
    • Also, must install a JDK (Java Development Kit) so that the Java compiler is available (JRE is not sufficient)
  o Not connected to the internet
    • Gradle needs to fetch some dependencies from the internet
• Common Lambda errors
  o Error in uploading .zip file
    • Check if the function already exists and, if not, try again. For slower internet connections, try uploading the .zip file with an S3 link.
  o Timeout error when testing Lambda function
    • Go to advanced settings and increase the Timeout value
  o Gateway Timeout (504 error)
    • This is Lambda's cold start behavior. Keep trying; eventually Lambda kicks in
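The "keep trying" advice for 504 errors can be automated on the client side. The sketch below retries a call with exponential backoff when it fails with an HTTP 504; the function names, attempt count, and delays are illustrative assumptions, not part of the app.

```python
import time
import urllib.error


def is_gateway_timeout(err):
    """True for an HTTP 504 raised by urllib."""
    return isinstance(err, urllib.error.HTTPError) and err.code == 504


def call_with_retries(call, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep, retryable=is_gateway_timeout):
    """Run call(), retrying retryable failures with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as err:
            if not retryable(err) or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Any zero-argument callable that performs the HTTPS POST can be passed as `call`; errors other than a 504 are raised immediately rather than retried.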
CAVEATS
• Stateless
  o Can access stateful data by calling other web services, such as Amazon S3 or Amazon DynamoDB.
• Cold start behavior
  o Containers are instantiated and reused after the first request and stay active for a window of time (10-20 minutes)
  o "The longer I leave it between invocations, the longer the function takes to warm up"
• API Gateway timeout of 10 secs
  o Can request longer timeout
CONFIGURING LAMBDA FUNCTIONS
• Memory
  o Allocates proportional CPU power, network bandwidth, and disk I/O
  o Easy single-dial solution
  o Log shows how much memory was used, for tuning and cost savings
• Timeout
LAMBDA RESOURCE LIMITS
Resource                                               Default Limit
Memory                                                 512 MB
Number of file descriptors                             1,024
Number of processes and threads (combined total)       1,024
Maximum execution duration per request                 300 seconds
Invoke request body payload size                       6 MB
Invoke response body payload size                      6 MB
Concurrent executions per region                       100

Item                                                   Default Limit
Lambda function deployment package size                50 MB
(.zip/.jar file)
Size of code/dependencies that you can zip into a      250 MB
deployment package (uncompressed zip/jar size)
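The two package-size limits can be checked locally before uploading. This is a small sketch that assumes the limits are counted in binary megabytes; `package_sizes` and `fits_lambda_limits` are hypothetical helper names, and the limits themselves are the values listed on this slide.

```python
import os
import zipfile

MB = 1024 * 1024          # assumption: limits counted in binary megabytes
MAX_ZIPPED = 50 * MB      # deployment package size (.zip/.jar file)
MAX_UNZIPPED = 250 * MB   # uncompressed size of code and dependencies


def package_sizes(zip_path):
    """Return (zipped_bytes, unzipped_bytes) for a .zip or .jar file."""
    zipped = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        unzipped = sum(info.file_size for info in archive.infolist())
    return zipped, unzipped


def fits_lambda_limits(zip_path):
    """True when both the zipped and uncompressed sizes are within limits."""
    zipped, unzipped = package_sizes(zip_path)
    return zipped <= MAX_ZIPPED and unzipped <= MAX_UNZIPPED
```

Running `fits_lambda_limits` on the .zip produced by the build gives an early warning before the upload step fails.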
LAMBDA PRICING
• Lambda
  o Requests
    • First 1 million per month are free
    • $0.20 per 1 million requests thereafter
  o Duration
    • First 400,000 GB-seconds of compute time per month are free
    • $0.00001667 for every GB-second thereafter
• API Gateway
  o $3.50 per million API calls received plus data transfer costs
• Estimate for Malicious Domain Application:
  • Lambda: $0.37/hour with 10 threads after free tier
  • API Gateway: $0.71/hour
  • Total: ~$1/hr
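The pricing rules above reduce to simple arithmetic. The sketch below applies the listed rates to a given number of calls, ignoring the free tier and data transfer; the call volume and billed duration behind the slide's hourly estimate are not stated, so they are left as parameters here, and the function name is hypothetical.

```python
REQUEST_PRICE = 0.20 / 1_000_000    # dollars per Lambda request
GB_SECOND_PRICE = 0.00001667        # dollars per GB-second of compute
API_CALL_PRICE = 3.50 / 1_000_000   # dollars per API Gateway call


def cost(calls, memory_mb, billed_seconds):
    """Return (lambda_dollars, api_gateway_dollars) for a number of calls.

    Ignores the monthly free tier and API Gateway data transfer costs.
    billed_seconds is the billed duration of one invocation.
    """
    gb_seconds = calls * (memory_mb / 1024) * billed_seconds
    lambda_cost = calls * REQUEST_PRICE + gb_seconds * GB_SECOND_PRICE
    api_cost = calls * API_CALL_PRICE
    return lambda_cost, api_cost
```

For an hourly figure, pass `calls = calls_per_second * 3600`. The billed duration is generally larger than the measured median latency when durations are rounded up to a billing increment.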
LAMBDA PERFORMANCE
Memory (MB)  Threads  Loops   Samples  Median (ms)  Min (ms)  Max (ms)  % Error  Throughput (calls/sec)
512          1        10000   10000    102          85        2137      0        8.4
512          10       1000    10000    102          85        30330     0.18     44
512          100      100     10000    149          85        30307     0.43     168
LAMBDA SCALING
• Automatically scales to support the rate of incoming requests
• "No limit to the number of requests your code can handle"
• Starts as many instances of the Lambda function as needed
RELATED EXAMPLES
• H2O Generated Model POJO in a Java Servlet container
  o GitHub: h2oai/app-consumer-loan
• H2O Generated Model POJO in a Storm bolt
  o GitHub: h2oai/h2o-world-2015-training
  o tutorials/streaming/storm
• H2O Generated Model POJO in Spark Streaming
  o GitHub: h2oai/sparkling-water
  o examples/src/main/scala/org/apache/spark/examples/h2o/CraigslistJobTitlesStreamingApp.scala
RESOURCES ON THE WEB
• Slides
  o GitHub: h2oai/h2o-tutorials/tree/master/tutorials/aws-lambda-app
• Source code
  o GitHub: h2oai/app-malicious-domains
• Latest stable H2O for Python release
  o http://h2o.ai/download/h2o/python
• Generated POJO model Javadoc
  o http://h2o-release.s3.amazonaws.com/h2o/rel-turan/3/docs-website/h2o-genmodel/javadoc/index.html
• AWS Lambda
  o http://docs.aws.amazon.com/lambda/latest/dg/welcome.html