manuel-bernhardt
Slides of my talk at Scala.io 2014 about large-scale data migration with Akka.
@ELMANU
manuel BERNHARDT
BACK < & future: ACTORS AND > PIPES <
using akka for large-scale data migration
AGENDA
• BACKGROUND STORY
• FUTURES > PIPES < ACTORS
• LESSONS LEARNED
who is speaking?
• freelance software consultant based in Vienna
• Vienna Scala User Group
• web, web, web
• writing a book on reactive web applications
BACKGROUND STORY
talenthouse
• www.talenthouse.com
• based in Los Angeles
• connecting brands and artists
• 3+ million users
BACKGROUND STORY
• old, slow (very slow) platform
• re-implementation from scratch with Scala & Play
• tight schedule, a lot of data to migrate
SOURCE SYSTEM
DISCLAIMER:
What follows is not meant to bash the source system. It is a necessary explanation of the complexity that system brings to the data migration.
MIGRATION schedule
• basically, one weekend
• big-bang kind of migration
• if possible, incremental migration beforehand
FUTURES > PIPES < ACTORS
FUTURES
• scala.concurrent.Future[T]
• eventually holds a value of type T
• can either fail or succeed
FUTURES: HAPPY PATH
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global

// the block runs asynchronously on the global ExecutionContext
val futureSum: Future[Int] = Future { 1 + 1 }
futureSum.map { sum =>
  println("The sum is " + sum)
}
FUTURES: SAD PATH
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val futureDiv: Future[Int] = Future { 1 / 0 }
val futurePrint: Future[Unit] = futureDiv.map { div =>
  println("The division result is " + div)
}
Await.result(futurePrint, 1.second)
Avoid blocking if possible
FUTURES: SAD PATH
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val futureDiv: Future[Int] = Future { 1 / 0 }
futureDiv.map { div =>
  println("The division result is " + div)
}
Await.result(futureDiv, 1.second)

scala> Await.result(futureDiv, 1.second)
java.lang.ArithmeticException: / by zero
  at $anonfun$1.apply$mcI$sp(<console>:11)
  at $anonfun$1.apply(<console>:11)
  at $anonfun$1.apply(<console>:11)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
  at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
FUTURES: SAD PATH
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val futureDiv: Future[Int] = Future { 1 / 0 }
val futurePrint: Future[Unit] = futureDiv.map { div =>
  println("The division result is " + div)
}.recover {
  case a: java.lang.ArithmeticException =>
    println("What on earth are you trying to do?")
}
Await.result(futurePrint, 1.second)
Be mindful of failure
FUTURES: SAD PATH
• Exceptions are propagated along the chain: map and flatMap on a failed Future produce the same failure
• Without recover (or a failure callback), there is no guarantee that a failure will ever get noticed!
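The following is a minimal sketch of both points. It assumes Scala 2.13 and the global ExecutionContext, and the names divide and laterStageRan are illustrative rather than taken from the slides. The map stages after the failing division never run. recover turns the failure back into a value, and .failed exposes the exception itself.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FailurePropagation {
  // Set only if a stage after the failing one is ever executed.
  val laterStageRan = new AtomicBoolean(false)

  def divide(a: Int, b: Int): Int = a / b

  // The first stage fails; the map stages after it are skipped and the
  // ArithmeticException travels along the chain unchanged.
  def chain(): Future[String] =
    Future(divide(1, 0))
      .map { n => laterStageRan.set(true); n * 2 }
      .map(n => s"result: $n")

  // recover turns a matching failure back into a successful value.
  def recovered(): Future[String] =
    chain().recover { case _: ArithmeticException => "division by zero" }

  // .failed exposes the exception itself as a successful Future[Throwable].
  def cause(): Future[Throwable] = chain().failed
}
```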
COMPOSING FUTURES
val futureA: Future[Int] = Future { 1 + 1 }
val futureB: Future[Int] = Future { 2 + 2 }

val futureC: Future[Int] = for {
  a <- futureA
  b <- futureB
} yield {
  a + b
}
COMPOSING FUTURES
val futureC: Future[Int] = for {
  a <- Future { 1 + 1 }
  b <- Future { 2 + 2 }
} yield {
  a + b
}
This runs in sequence: the second Future is only created once the first one has completed
Don’t do this
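This small timing sketch shows why the placement matters. slowValue is a hypothetical call that takes 300 ms, and the elapsed times it reports are approximate and depend on the machine. When both futures are created before the for-comprehension, they run concurrently. When they are created inside it, they are chained one after the other.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequentialVsConcurrent {
  // Hypothetical slow call; blocking { } lets the global pool add a thread for it.
  def slowValue(n: Int): Future[Int] = Future {
    blocking(Thread.sleep(300))
    n
  }

  // Created inside the for-comprehension: slowValue(4) is only called after
  // slowValue(2) has completed, so the two delays add up.
  def sequentialSum(): Future[Int] =
    for {
      a <- slowValue(2)
      b <- slowValue(4)
    } yield a + b

  // Created before the for-comprehension: both calls are already running.
  def concurrentSum(): Future[Int] = {
    val futureA = slowValue(2)
    val futureB = slowValue(4)
    for {
      a <- futureA
      b <- futureB
    } yield a + b
  }

  // Demo helper only: waits for the result and reports elapsed milliseconds.
  def timed(run: () => Future[Int]): (Int, Long) = {
    val start = System.nanoTime()
    val result = Await.result(run(), 5.seconds)
    (result, (System.nanoTime() - start) / 1000000L)
  }
}
```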
FUTURES: CALLBACKS
import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global

val futureDiv: Future[Int] = Future { 1 / 0 }
// onSuccess / onFailure are deprecated since Scala 2.12; foreach and failed.foreach replace them
futureDiv.foreach { result =>
  println("Result: " + result)
}
futureDiv.failed.foreach { t =>
  println("Oh no!")
}
using FUTURES
• a Future { … } block that doesn't do any I/O is a code smell
• use them with an ExecutionContext set up for the work they do (e.g. a separate pool for blocking I/O)
• when you have blocking operations, wrap them in a blocking { … } block
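Here is one possible "right" set-up for blocking I/O, sketched under a few assumptions. It uses a dedicated fixed-size pool (the size of 4 is arbitrary) and a hypothetical slowQuery that stands in for a JDBC call. Keeping the blocking calls on their own pool stops them from starving the default ExecutionContext.

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{ExecutionContext, Future, blocking}

object BlockingIo {
  // Daemon threads so the demo JVM can exit without an explicit shutdown.
  private val threadFactory = new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "blocking-io")
      t.setDaemon(true)
      t
    }
  }

  // Dedicated pool for blocking calls; size 4 is an assumption and should match
  // what the database (or legacy system) can actually sustain.
  implicit val ioContext: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(4, threadFactory))

  // Hypothetical stand-in for a blocking JDBC query.
  def slowQuery(id: Int): Int = {
    Thread.sleep(50)
    id * 10
  }

  // blocking { } is kept as a marker: it only changes behaviour on pools that
  // support it (like the global one); this fixed pool simply runs the code.
  def fetch(id: Int): Future[Int] = Future(blocking(slowQuery(id)))

  def fetchAll(ids: Seq[Int]): Future[Seq[Int]] = Future.traverse(ids)(fetch)
}
```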
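using FUTURES
import scala.concurrent.{Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import play.api.Play.current // implicit Application required by DB (Play 2.3-era API)
import play.api.db.DB
import anorm._

Future {
  blocking {
    DB.withConnection { implicit connection =>
      val query = SQL("select * from bar")
      query()
    }
  }
}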
naming FUTURES
“Say eventuallyMaybe one more time!”
ACTORS
• lightweight objects
• send and receive messages (mailbox)
• can have children (supervision)
ACTORS
(Diagram: actors akka://application/user/georgePeppard and akka://application/user/audreyHepburn, plus a child actor akka://application/user/audreyHepburn/cat; each actor has its own mailbox. georgePeppard sends the message “Holly, I'm in love with you.” to audreyHepburn's mailbox.)
So what?
GETTING AN ACTOR
import akka.actor._

class AudreyHepburn extends Actor {
  def receive = { ... }
}

val system: ActorSystem = ActorSystem()
val audrey: ActorRef = system.actorOf(Props[AudreyHepburn])
SENDING AND RECEIVING MESSAGES
case class Script(text: String)

class AudreyHepburn extends Actor {
  def receive = {
    case Script(text) => read(text)
  }
}

audrey ! Script(breakfastAtTiffany)
“tell”: fire-and-forget
ASK PATTERN
import akka.pattern.ask
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

implicit val timeout: akka.util.Timeout = akka.util.Timeout(1.second)
// ? returns a Future[Any]; mapTo casts the reply (the Future fails if the type does not match)
val maybeAnswer: Future[String] = (audrey ? "Where should we have breakfast?").mapTo[String]
“ask”
SUPERVISION
import akka.actor._
import akka.routing.RoundRobinRouter

class UserMigrator extends Actor {
  // Akka 2.3-era API; newer versions use RoundRobinPool(100).props(Props[UserMigrationWorker])
  lazy val workers: ActorRef =
    context.actorOf(Props[UserMigrationWorker].withRouter(RoundRobinRouter(nrOfInstances = 100)))
}
actor context
many children
router type
SUPERVISION
import akka.actor._
import akka.actor.SupervisorStrategy.Restart
import akka.routing.RoundRobinRouter

class UserMigrator extends Actor with ActorLogging {
  lazy val workers: ActorRef =
    context.actorOf(Props[UserMigrationWorker].withRouter(RoundRobinRouter(nrOfInstances = 100)))

  override def supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3) {
      case t: Throwable =>
        log.error(t, "A child died!") // cause first, then message
        Restart
    }
}
PIPES
CECI EST UNE PIPE
• Akka pattern to combine Futures and Actors
• Sends the result of a Future to an Actor (a failed Future arrives as an akka.actor.Status.Failure message)
• Be careful with error handling
CECI EST UNE PIPE
class FileFetcher extends Actor {
  import akka.pattern.pipe
  import context.dispatcher // ExecutionContext needed by map and pipeTo

  def receive = {
    case FetchFile(url) =>
      val originalSender = sender()
      val download: Future[DownloadedFile] = WS.url(url).get().map { response =>
        DownloadedFile(url, response.ahcResponse.getResponseBodyAsBytes)
      }
      download pipeTo originalSender
  }
}
• This is how you pipe: download pipeTo originalSender
• Keep a reference to the original sender: what follows is a Future, and by the time it completes sender() may point to a different actor
• Wrap your result into something you can easily match against (here, DownloadedFile)
Will this work?
PIPES AND error handling
class FileFetcher extends Actor {
  import akka.pattern.pipe
  import context.dispatcher

  def receive = {
    case FetchFile(url) =>
      val originalSender = sender()
      val download = WS.url(url).get().map { response =>
        DownloadedFile(...)
      } recover {
        case t: Throwable => DownloadFileFailure(url, t)
      }
      download pipeTo originalSender
  }
}
Don’t forget to recover!
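Here is the same idea sketched without Akka. The trait DownloadResult and the httpGet stub are hypothetical, while DownloadedFile and DownloadFileFailure mirror the slide. Because the result is mapped and recovered into one sealed type, an actor that receives it through pipeTo always gets a message it can pattern-match, rather than an akka.actor.Status.Failure.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.control.NonFatal

object RecoverToMessage {
  // One sealed result type the receiving actor can match on. DownloadResult is a
  // hypothetical name; the two case classes mirror the slide's messages.
  sealed trait DownloadResult
  final case class DownloadedFile(url: String, bytes: Array[Byte]) extends DownloadResult
  final case class DownloadFileFailure(url: String, cause: Throwable) extends DownloadResult

  // Hypothetical stand-in for the WS call: fails for URLs ending in ".broken".
  def httpGet(url: String): Future[Array[Byte]] = Future {
    if (url.endsWith(".broken")) throw new java.io.IOException(s"cannot fetch $url")
    url.getBytes("UTF-8")
  }

  // Success and failure both end up as a DownloadResult, so the Future itself
  // never fails and nothing reaches the actor as Status.Failure.
  def download(url: String): Future[DownloadResult] =
    httpGet(url)
      .map(bytes => DownloadedFile(url, bytes): DownloadResult)
      .recover { case NonFatal(t) => DownloadFileFailure(url, t) }

  def describe(result: DownloadResult): String = result match {
    case DownloadedFile(url, bytes)  => s"$url: ${bytes.length} bytes"
    case DownloadFileFailure(url, t) => s"$url failed: ${t.getMessage}"
  }
}
```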
SUMMARY
• Futures: manipulate and combine the results of asynchronous operations
• Actors: organise complex asynchronous flows, deal with failure via supervision
• Pipes: deal with the results of asynchronous computations inside actors
LESSONS LEARNED
design according to YOUR DATA
(Diagram: a User migrator actor with a set of Worker children)
design according to YOUR DATA
(Diagram, design A: an Item migrator with one User item migrator per user, each User item migrator having its own Item migration workers)
Not all users have the same number of items
design according to YOUR DATA
(Diagram, design B: an Item migrator, User item migrators and Item migration workers, together with File fetcher, File uploader and Soundcloud worker actors)
Pools of actors
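An idealised model helps show why design B copes better with uneven data. The model is an assumption, not a measurement: every item takes one time unit, the workers in design A serve only their own user, and design B's item migration workers are assumed to form one shared pool of the same total size.

```scala
object MigrationDesigns {
  // Idealised model (assumption): each item takes one time unit and an idle
  // worker immediately takes the next item it is allowed to process.

  private def ceilDiv(a: Int, b: Int): Int = (a + b - 1) / b

  // Design A: every user has its own workers, so the user with the most
  // items determines the total duration.
  def designA(itemsPerUser: Seq[Int], workersPerUser: Int): Int =
    itemsPerUser.map(items => ceilDiv(items, workersPerUser)).max

  // Design B: the same total number of workers forms one shared pool, so
  // workers freed by small users keep pulling items from large ones.
  def designB(itemsPerUser: Seq[Int], workersPerUser: Int): Int =
    ceilDiv(itemsPerUser.sum, itemsPerUser.size * workersPerUser)
}
```

Take items per user Seq(1000, 10, 10, 5) with 2 workers per user. Design A needs 500 time units, because the largest user dominates. Design B needs 129, because 1025 items are spread over 8 workers.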
KNOW THE limits OF THY SOURCE SYSTEM
DATA MIGRATION SHOULD not BE A RACE
• Your goal is to get the data, not to be as fast as possible
• Be gentle with the legacy system(s)
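This is one way to be gentle with the source system, shown as a sketch. It bounds how many requests hit the system at once by migrating in fixed-size batches, and it never blocks a thread between batches. migrate is a hypothetical stand-in for one record's migration, and the batch size is an assumption to be tuned against the legacy system.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global

object GentleMigration {
  private val inFlight = new AtomicInteger(0)
  val maxInFlight = new AtomicInteger(0)

  // Hypothetical migration of one record; it records the highest number of
  // calls that were running at the same time.
  def migrate(id: Int): Future[Int] = Future {
    val now = inFlight.incrementAndGet()
    maxInFlight.accumulateAndGet(now, (a: Int, b: Int) => math.max(a, b))
    blocking(Thread.sleep(20)) // simulated request to the legacy system
    inFlight.decrementAndGet()
    id
  }

  // Each batch runs concurrently, but the next batch only starts once the
  // previous one has completed: at most batchSize requests are in flight.
  def migrateAll(ids: Seq[Int], batchSize: Int): Future[Seq[Int]] =
    ids.grouped(batchSize).foldLeft(Future.successful(Vector.empty[Int])) { (done, batch) =>
      done.flatMap(migrated => Future.traverse(batch)(migrate).map(migrated ++ _))
    }
}
```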
CLOUD API STANDARDS
• ISO-28601 Data formats in REST APIs
• ISO-28700 Response times and failure communication of REST APIs
• ISO-28701 Rate limits in REST APIs and HTTP error codes
DREAM ON
NO STANDARDS!
• The cloud is heterogeneous
• Response times, rate limits and error codes all differ
• Don’t even try to treat all systems the same
RATE limits
• Read the docs: most cloud API docs will warn you about them
• Design your actor system so that requests can be queued if necessary
• Keep track of migration status
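Keeping track of migration status can be as simple as a ledger keyed by source ID. With such a ledger, a run that was interrupted by rate limits can resume with only the remaining IDs. This in-memory version is a hypothetical sketch, and a real migration would persist the ledger.

```scala
import scala.collection.concurrent.TrieMap

object MigrationStatus {
  sealed trait Status
  case object Pending extends Status
  case object Migrated extends Status
  final case class Failed(reason: String) extends Status

  // Thread-safe, in-memory ledger keyed by source ID (hypothetical; a real
  // migration would persist it so a stopped run can resume).
  final class Ledger {
    private val statuses = TrieMap.empty[String, Status]

    // Registering again never resets an ID that is already tracked.
    def register(ids: Seq[String]): Unit = ids.foreach(id => statuses.putIfAbsent(id, Pending))
    def markMigrated(id: String): Unit = statuses.update(id, Migrated)
    def markFailed(id: String, reason: String): Unit = statuses.update(id, Failed(reason))

    // IDs still to be (re)tried: never attempted, or failed, e.g. after hitting a rate limit.
    def remaining: Seq[String] =
      statuses.collect { case (id, Pending | Failed(_)) => id }.toSeq.sorted
  }
}
```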
RATE limits
• Example: Soundcloud API
• 500 Internal Server Error after a seemingly random number of requests
import play.api.Play.current
import play.api.libs.ws.WS

WS.url("http://api.soundcloud.com/resolve.json")
  .withHeaders("User-Agent" -> "FOOBAR") // the magic ingredient that opens the door to Soundcloud
Magic User-Agent
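The slides fixed Soundcloud's 500s with a User-Agent header. For APIs that fail intermittently, a complementary and generic technique (not one from the talk) is to retry with exponential backoff. The sketch below uses only the standard library. The retry count and initial delay are assumptions, and the scheduler delays each retry without blocking a thread.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.control.NonFatal

object Retry {
  // Single daemon thread used only to schedule delayed retries.
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "retry-scheduler")
        t.setDaemon(true)
        t
      }
    })

  // Runs `op` after `delay` and mirrors its outcome, without blocking a thread while waiting.
  def after[T](delay: FiniteDuration)(op: => Future[T]): Future[T] = {
    val promise = Promise[T]()
    scheduler.schedule(new Runnable {
      def run(): Unit =
        promise.completeWith(try op catch { case NonFatal(e) => Future.failed(e) })
    }, delay.toMillis, TimeUnit.MILLISECONDS)
    promise.future
  }

  // Calls `op`; on failure waits `delay`, then retries with twice the delay,
  // giving up after `retries` extra attempts and returning the last failure.
  def withBackoff[T](retries: Int, delay: FiniteDuration)(op: () => Future[T]): Future[T] =
    op().recoverWith {
      case NonFatal(_) if retries > 0 =>
        after(delay)(withBackoff(retries - 1, delay * 2L)(op))
    }
}
```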
BLOCKING
seriously, do not BLOCK
• Blocking from time to time seems innocent at first
• An OutOfMemoryError eight hours into a migration run is not very funny
• You will end up rewriting all of your code to be async anyway
MISC
• Unstable primary IDs in the source system
• Build a lot of small tools, be pragmatic
• sbt-tasks (http://yobriefca.se/sbt-tasks/)
THE END
QUESTIONS?