Spark Summit EU talk by Shaun Klopfenstein and Neelesh Shastry

Elastic Streaming

Spark Streaming + Dynamic Provisioning + Dynamic Allocation

Neelesh Shastry, Architect
Shaun Klopfenstein, CTO

The Vision

Requirements

Marketo Proprietary and Confidential | ©Marketo, Inc. | 10/30/16

Business Requirements

• Near real-time activity processing
• Billions of activities per customer per day
• Improve cost efficiency of operations while scaling up
• Global enterprise-grade security and governance

SaaS Requirements

• Customers are added and removed
• Fairness and throttling per customer
• Strict sequential event processing for some applications
• Temporarily suspend a customer when errors occur

Technology Selection

Use Cases

• React to activities
  • Send an email when someone visits a web page
  • Change the score when someone fills out a form
• Replicate data
  • Build Solr indexes in near real time
  • Update DataXChange, an internal lead cache
  • Sync to/from CRM systems
• Analytics
  • Incrementally update email reports
  • Enrich activities and feed them to Druid for advanced email/web reports

Why Spark Streaming?

• Micro-batching provides sink-side efficiencies
• Great integration with Kafka
• No strict real-time processing requirements
• Great community and industry adoption
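
One way to read "sink-side efficiencies": a micro-batch hands the sink many records at once, so writes can be grouped into a few bulk requests instead of one request per event. The minimal sketch below assumes a hypothetical batch of `(destination, record)` pairs (for example, one destination per Solr collection or Druid datasource); in a real Spark job this grouping would typically happen inside `foreachRDD` / `foreachPartition`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_for_bulk_write(batch: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group one micro-batch of (destination, record) pairs so that each
    destination receives a single bulk write (hypothetical sink model)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for destination, record in batch:
        grouped[destination].append(record)  # keeps arrival order per destination
    return dict(grouped)


if __name__ == "__main__":
    batch = [
        ("solr-leads", "lead-1"), ("druid-web", "visit-1"), ("solr-leads", "lead-2"),
        ("druid-web", "visit-2"), ("solr-leads", "lead-3"), ("druid-web", "visit-3"),
    ]
    writes = group_for_bulk_write(batch)
    print(f"{len(batch)} records -> {len(writes)} bulk writes")
```

With per-event processing the same six records would cost six sink round trips; here they cost two.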

Challenges with Spark + Kafka

• No way to add/remove topics on the fly
• No out-of-the-box support for sequencing RDDs
• No support for turning off topics under errors
• Does not play well with scaling Kafka partitions up/down when ordering is required

Challenges - Stragglers

• A batch can't complete until its slowest operation finishes
  • Many of our batches include slow operations
  • These sometimes don't complete within the batch time
• Batches are multitenant
  • One customer's operation can delay processing for other customers in the same batch
• Severe impact on utilization and batch delay
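
The straggler effect can be put in numbers with a toy model. The sketch below assumes each customer's work in a batch runs as one task on its own slot and all tasks start together. These are simplifying assumptions, not how Spark actually places tasks. The batch then ends when the slowest task ends.

```python
from typing import Dict


def batch_stats(task_seconds: Dict[str, float], batch_interval: float) -> Dict[str, float]:
    """Toy model: one task per customer, one slot per task, all starting at t=0.
    The batch completes only when its slowest task completes."""
    if not task_seconds:
        return {"completion": 0.0, "delay": 0.0, "utilization": 1.0}
    completion = max(task_seconds.values())
    busy = sum(task_seconds.values())           # slot-seconds doing real work
    slot_time = completion * len(task_seconds)  # slot-seconds held by the batch
    return {
        "completion": completion,
        # Time this batch spills past its interval, delaying the next batch.
        "delay": max(0.0, completion - batch_interval),
        "utilization": busy / slot_time if slot_time else 1.0,
    }


if __name__ == "__main__":
    tasks = {"cust-a": 1.0, "cust-b": 1.0, "cust-c": 1.0, "cust-d": 8.0}
    print(batch_stats(tasks, batch_interval=5.0))
```

Three customers finish in 1 s, but customer D's 8 s operation holds the whole batch: it overruns a 5 s interval by 3 s, and slot utilization is 11/32, about 34%.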

Architecture & Design

Marketo Activity Architecture

Kafka Topics Organization

• One topic per use case, with data from all customers
  • Easy to manage
  • A single customer can create backlogs for others during activity storms
  • Fairness/throttling is hard to implement
• One topic per use case, per customer
  • Storms are isolated to the customer
  • Fairness/throttling is easy to control by tweaking the topic
  • More pressure on Kafka's ZooKeeper (so far not a problem)
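
Why per-customer topics make throttling easy: when each customer has its own topic, the consumer can cap how much it reads from each topic per batch, so a storm on one customer's topic cannot starve the others. The sketch below is illustrative only; the `use_case.customer_id` naming scheme and the per-topic caps are hypothetical.

```python
from typing import Dict


def topic_for(use_case: str, customer_id: str) -> str:
    """Hypothetical naming scheme: one topic per use case, per customer."""
    return f"{use_case}.{customer_id}"


def records_for_batch(lag: Dict[str, int], caps: Dict[str, int], default_cap: int) -> Dict[str, int]:
    """Number of pending records to read from each topic in this batch.
    A per-topic cap throttles one customer without affecting the others."""
    return {topic: min(pending, caps.get(topic, default_cap)) for topic, pending in lag.items()}


if __name__ == "__main__":
    storm, quiet = topic_for("activity", "cust-1"), topic_for("activity", "cust-2")
    lag = {storm: 1_000_000, quiet: 50}
    print(records_for_batch(lag, caps={storm: 5_000}, default_cap=20_000))
```

With a single shared topic, cust-2's 50 records would sit in the same partitions behind cust-1's million-record backlog. Here they are read in the very next batch.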

Solutions

Dynamic provisioning capacity

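[Diagram: driver components (JobGenerator, DAGScheduler) and executors (Executor1, Executor2) alongside the Multitenant Kafka DStream and its OffsetManager, the Provisioning Framework, and the Customer Registry. Labeled interactions: Add/Remove; Check & Pull Changes; compute#Get new offsets; Generate RDD; Submit Job; Schedule Tasks.]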
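
The per-batch flow in the diagram (check and pull registry changes, then get new offsets for the batch's RDD) can be sketched as one pure function. This is an illustration, not Marketo's code. It assumes the registry is simply a set of topic names, and it assumes that topics with no committed offset start at offset 0.

```python
from typing import Dict, List, Set, Tuple

TopicPartition = Tuple[str, int]
OffsetRange = Tuple[str, int, int, int]  # (topic, partition, from_offset, until_offset)


def compute_offset_ranges(
    active: Set[str],
    registry: Set[str],
    committed: Dict[TopicPartition, int],
    latest: Dict[TopicPartition, int],
) -> Tuple[Set[str], Set[str], List[OffsetRange]]:
    """One batch step: pull registry changes, then build offset ranges
    for every active topic partition that has new data."""
    added = registry - active    # customers provisioned since the last batch
    removed = active - registry  # customers de-provisioned since the last batch
    ranges: List[OffsetRange] = []
    for (topic, partition), until in sorted(latest.items()):
        if topic not in registry:
            continue
        start = committed.get((topic, partition), 0)  # assumption: new topics start at 0
        if until > start:
            ranges.append((topic, partition, start, until))
    return added, removed, ranges


if __name__ == "__main__":
    added, removed, ranges = compute_offset_ranges(
        active={"activity.c1", "activity.c3"},
        registry={"activity.c1", "activity.c2"},
        committed={("activity.c1", 0): 10},
        latest={("activity.c1", 0): 15, ("activity.c2", 0): 3, ("activity.c3", 0): 7},
    )
    print(added, removed, ranges)
```

Because the topic set is recomputed every batch, customers can be added or removed without restarting the streaming application.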

Marketo Offset Manager

• Tracks multitenancy
  • Streaming jobs process data for many customers
  • Each job accesses multiple Kafka topics and partitions
• Adds new topics
• Removes, deactivates, or suspends topics
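
A minimal sketch of the topic lifecycle an offset manager like this might track. The state names mirror the bullets, but their exact semantics are assumptions: here "suspended" and "deactivated" both stop reads while keeping committed offsets (suspension being the temporary, error-driven case), and "remove" forgets the topic entirely.

```python
from enum import Enum
from typing import Dict, List, Tuple


class TopicState(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"      # assumption: temporary, e.g. while a customer's errors are handled
    DEACTIVATED = "deactivated"  # assumption: stopped indefinitely, offsets retained


class OffsetManager:
    """Hypothetical sketch of topic lifecycle and offset tracking."""

    def __init__(self) -> None:
        self._state: Dict[str, TopicState] = {}
        self._committed: Dict[Tuple[str, int], int] = {}

    def add_topic(self, topic: str) -> None:
        self._state[topic] = TopicState.ACTIVE

    def suspend(self, topic: str) -> None:
        self._set(topic, TopicState.SUSPENDED)

    def deactivate(self, topic: str) -> None:
        self._set(topic, TopicState.DEACTIVATED)

    def resume(self, topic: str) -> None:
        self._set(topic, TopicState.ACTIVE)

    def remove(self, topic: str) -> None:
        """Forget the topic entirely, including its committed offsets."""
        self._state.pop(topic, None)
        for tp in [tp for tp in self._committed if tp[0] == topic]:
            del self._committed[tp]

    def commit(self, topic: str, partition: int, offset: int) -> None:
        self._require(topic)
        self._committed[(topic, partition)] = offset

    def committed(self, topic: str, partition: int) -> int:
        return self._committed.get((topic, partition), 0)

    def topics_to_read(self) -> List[str]:
        """Only active topics contribute data to the next batch."""
        return sorted(t for t, s in self._state.items() if s is TopicState.ACTIVE)

    def _set(self, topic: str, state: TopicState) -> None:
        self._require(topic)
        self._state[topic] = state

    def _require(self, topic: str) -> None:
        if topic not in self._state:
            raise KeyError(f"unknown topic: {topic}")
```

Keeping offsets for suspended topics means a customer can be resumed exactly where processing stopped.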

Multitenant DStream

• Enables efficient multitenant RDDs
• Controlled sequencing of RDDs
• Coalesces Kafka partitions
  • Bin-packing for efficiency
• Maintains partition lineage for offset management
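
Coalescing many small Kafka partitions into fewer tasks is a bin-packing problem. The sketch below uses first-fit decreasing, a standard heuristic chosen here for illustration (the talk does not say which packing algorithm is used). Each coalesced partition keeps the exact offset ranges it covers, which is the lineage needed to commit offsets afterwards. The `capacity` in records per task is a hypothetical tuning knob.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class OffsetRange:
    topic: str
    partition: int
    from_offset: int
    until_offset: int

    @property
    def count(self) -> int:
        return self.until_offset - self.from_offset


@dataclass
class CoalescedPartition:
    # Lineage: exactly which Kafka offset ranges this task reads.
    ranges: List[OffsetRange] = field(default_factory=list)

    @property
    def load(self) -> int:
        return sum(r.count for r in self.ranges)


def coalesce(ranges: List[OffsetRange], capacity: int) -> List[CoalescedPartition]:
    """First-fit decreasing: pack offset ranges into as few tasks as possible.
    Ranges are never split, so per-Kafka-partition ordering is preserved;
    a range larger than `capacity` gets a task of its own."""
    bins: List[CoalescedPartition] = []
    for r in sorted(ranges, key=lambda r: (-r.count, r.topic, r.partition)):
        for b in bins:
            if b.load + r.count <= capacity:
                b.ranges.append(r)
                break
        else:
            bins.append(CoalescedPartition([r]))
    return bins
```

For example, ranges of 700, 400, 300, 200 and 100 records with a capacity of 1,000 pack into two tasks (700 + 300 and 400 + 200 + 100) instead of five.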

Provisioning

• Manages the allocation of customers to a Spark Streaming application
  • Round robin + resource affinity
  • Enables rebalancing of customers across Spark Streaming jobs
• Oozie-based framework
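
A minimal sketch of round-robin assignment with affinity, plus a simple rebalancing pass. It assumes "resource affinity" means a customer may be pinned to a preferred application; the function names and the pinning model are hypothetical.

```python
from itertools import cycle
from typing import Dict, List, Set


def assign(customers: List[str], apps: List[str], affinity: Dict[str, str]) -> Dict[str, str]:
    """Round robin over apps, except customers with an affinity go to that app."""
    rr = cycle(apps)
    result: Dict[str, str] = {}
    for c in customers:
        preferred = affinity.get(c)
        result[c] = preferred if preferred in apps else next(rr)
    return result


def rebalance(assignment: Dict[str, str], apps: List[str], pinned: Set[str]) -> Dict[str, str]:
    """Move unpinned customers from the busiest app to the idlest one
    until customer counts differ by at most one (or nothing can move)."""
    assignment = dict(assignment)
    while True:
        load = {a: 0 for a in apps}
        for app in assignment.values():
            load[app] += 1
        busiest = max(apps, key=lambda a: load[a])
        idlest = min(apps, key=lambda a: load[a])
        if load[busiest] - load[idlest] <= 1:
            return assignment
        movable = sorted(c for c, a in assignment.items() if a == busiest and c not in pinned)
        if not movable:
            return assignment
        assignment[movable[0]] = idlest


if __name__ == "__main__":
    apps = ["app-a", "app-b"]
    first = assign(["c1", "c2", "c3", "c4"], apps, affinity={"c3": "app-a"})
    print(first)                                 # app-a ends up with three customers
    print(rebalance(first, apps, pinned={"c3"}))  # c1 moves to app-b
```

Pinned customers never move, so rebalancing only shuffles customers whose placement is free.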

Dynamic Resource Allocation

• SPARK-12133: dynamic allocation for Spark Streaming
  • Goal: “make processing time infinitely close to duration”
  • Assumes tasks are roughly similar
• Stragglers throw this goal off
• What we really want:
  • DRA + safe concurrent job execution
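
Why stragglers defeat a processing-time-based policy can be shown with two small functions. The first is a ratio rule in the spirit of SPARK-12133: compare average batch processing time to the batch duration and add or remove executors. The thresholds here are illustrative, not Spark's code. The second estimates when a batch finishes given N executors, using a greedy longest-task-first schedule.

```python
import heapq
import math
from typing import List


def scaling_decision(proc_times_ms: List[float], batch_duration_ms: float,
                     up_ratio: float = 0.9, down_ratio: float = 0.3) -> int:
    """Ratio-based policy (illustrative thresholds).
    Returns executors to add (> 0), -1 to remove one, or 0 to hold."""
    if not proc_times_ms:
        return 0
    ratio = (sum(proc_times_ms) / len(proc_times_ms)) / batch_duration_ms
    if ratio >= up_ratio:
        return max(math.floor(ratio + 0.5), 1)  # round half up
    if ratio <= down_ratio:
        return -1
    return 0


def makespan_ms(task_ms: List[float], executors: int) -> float:
    """Greedy longest-task-first: each task goes to the least-loaded executor."""
    loads = [0.0] * executors
    heapq.heapify(loads)
    for t in sorted(task_ms, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)


if __name__ == "__main__":
    tasks = [8000.0] + [500.0] * 8  # one straggler plus eight quick tasks
    print(scaling_decision([8000.0, 8000.0], batch_duration_ms=5000.0))
    print(makespan_ms(tasks, 2), makespan_ms(tasks, 8))
```

With 8 s batches on a 5 s interval the rule asks for two more executors. However, the batch still takes 8 s with 2 or 8 executors, because a single straggler task cannot be split. More capacity alone does not help; running other work concurrently does.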

Results so far

• ~10 different use cases
• >100 Spark executors
• >1000 Kafka partitions
• Processing latencies < 5 s (99th percentile)
• Rolled out to ~20% of customers

Future Work

Application Scheduling

• Scheduling within an application to handle stragglers
  • spark.streaming.concurrentJobs
  • Exploring scheduler pools
  • Changes to StreamingJobScheduler to execute multiple RDDs safely
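
"Execute multiple RDDs safely" means running work concurrently without breaking the strict per-customer ordering required earlier. The sketch below uses a plain Python thread pool, not Spark's scheduler. Different customers' batches run in parallel, while each customer's batches run one after another in arrival order. The `(customer, batch_id)` job model is an assumption for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_batches(jobs: List[Tuple[str, int]], work: Callable[[str, int], str],
                max_workers: int = 4) -> Dict[str, List[str]]:
    """Run different customers' batches concurrently while keeping each
    customer's batches strictly sequential, in arrival order."""
    per_customer: Dict[str, List[int]] = defaultdict(list)
    for customer, batch_id in jobs:
        per_customer[customer].append(batch_id)

    def run_chain(customer: str) -> List[str]:
        # One worker drains this customer's queue in order, so a slow
        # customer delays only its own later batches.
        return [work(customer, b) for b in per_customer[customer]]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {c: pool.submit(run_chain, c) for c in per_customer}
        return {c: f.result() for c, f in futures.items()}


if __name__ == "__main__":
    jobs = [("a", 1), ("b", 1), ("a", 2), ("b", 2), ("a", 3)]
    print(run_batches(jobs, lambda c, b: f"{c}-{b}"))
```

Raising a concurrency setting alone would let batches overlap but would not, by itself, guarantee this per-customer ordering; the ordering has to come from how jobs are grouped.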

Scaling Up Kafka Partitions

• Our customers grow in size over time
  • Ordering requirements mean we cannot alter a topic's partitions on the fly
  • Coordination is required on both the producer and consumer sides
• Enhance the provisioner to manage partition scaling up/down
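
Why adding partitions breaks ordering: with key-hash partitioning, a key's partition is `hash(key) % num_partitions`, so changing the partition count moves some keys. Events for a moved key that were written before the resize sit in the old partition, while new events land in another. A consumer reading both partitions independently can then process newer events before the older backlog drains. Kafka's default partitioner hashes with murmur2; the sketch uses CRC32 only as a stand-in hash.

```python
import zlib
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Key-hash partitioning with CRC32 as a stand-in for Kafka's murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys: List[str], old: int, new: int) -> Dict[str, Tuple[int, int]]:
    """Keys whose partition changes when a topic grows from `old` to `new` partitions."""
    moves: Dict[str, Tuple[int, int]] = {}
    for k in keys:
        before, after = partition_for(k, old), partition_for(k, new)
        if before != after:
            moves[k] = (before, after)
    return moves


if __name__ == "__main__":
    keys = [f"lead-{i}" for i in range(1000)]
    moves = moved_keys(keys, old=4, new=8)
    print(f"{len(moves)} of {len(keys)} keys change partition when going from 4 to 8")
```

Doubling the partition count keeps every key either in place or in one specific new partition (`old + 4` here). This predictability is what makes a coordinated producer/consumer handover feasible.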

Move to 2.x and Open Source!

We’re Hiring! Http://Marketo.Jobs

Q & A

Architecture Requirements

• Maximize utilization of hardware
• Multitenancy support with fairness
• Encryption, authorization, and authentication
• Applications must scale horizontally

Deploying It

Running It
