aragozin
Slide deck from Moscow CloudCamp 2012
High Performance Computing
Cloud point of view
Alexey [email protected]
Apr 2012
Massively parallel computing
I/O bound workload
• Data mining / machine learning / indexing
• Focus: Do not move data, process in place
CPU bound
• Complex simulations / complex math models
• Focus: Keep all cores busy
Latency bound
• Physical process simulations (e.g. weather forecast)
• Focus: Minimize communication latencies
CPU bound task
Stream-like workload
• Independent tasks
• Random continuous stream of tasks
• E.g. video conversion, crawling
Structured batch jobs
• A single batch is split into subtasks for parallel execution
• Tasks may have data dependencies on each other
• Tasks may be generated during batch execution
• E.g. portfolio risk calculation
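A minimal sketch of how a structured batch with data dependencies could be ordered for parallel execution, using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+). The task names, the dependency graph and the `run_batch` helper are hypothetical and only illustrate the idea; they are not taken from the talk.

```python
from graphlib import TopologicalSorter


def run_batch(dependencies, execute):
    """Run tasks wave by wave; tasks within one wave have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Everything returned here could be dispatched to workers in parallel.
        ready = sorted(sorter.get_ready())
        for task in ready:
            execute(task)
        sorter.done(*ready)
        waves.append(ready)
    return waves


# Hypothetical risk batch: each key depends on the tasks listed in its set.
deps = {
    "price_A": {"load_market_data"},
    "price_B": {"load_market_data"},
    "aggregate_risk": {"price_A", "price_B"},
}
waves = run_batch(deps, execute=lambda task: None)
```

Each wave is a group of subtasks that a scheduler could hand to the worker pool at the same time. Tasks generated during execution would need a dynamic scheduler, which this sketch does not cover.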
Handling task stream in Cloud
[Diagram: incoming tasks, task queue, worker pool, controller. The controller reads queue metrics and adjusts the pool size based on them.]
Simple pattern. Exploits the “elasticity” of the cloud. Cost effective.
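The controller logic can be sketched as a small sizing rule. This is only an illustration under stated assumptions: the function name, the choice of metrics (queue length and average task duration) and the pool limits are hypothetical, and a real controller would also account for worker start-up time and billing granularity.

```python
import math


def desired_pool_size(queue_length, avg_task_seconds, target_drain_seconds,
                      min_workers=1, max_workers=100):
    """Workers needed to drain the current backlog within the target time."""
    backlog_seconds = queue_length * avg_task_seconds
    needed = math.ceil(backlog_seconds / target_drain_seconds)
    # Clamp to the assumed pool limits.
    return max(min_workers, min(max_workers, needed))


# 600 queued tasks of about 30 s each, to be drained within 10 minutes.
size = desired_pool_size(queue_length=600, avg_task_seconds=30,
                         target_drain_seconds=600)
```

The controller would call such a function periodically and start or stop cloud instances to match the result.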
Structured batch jobs in cloud
Batches are usually more sporadic
• e.g. end of day risk calculations
Tasks may have cross dependencies
• scheduler should be “cloud-aware”
Supplying tasks with data
• data delivery delay is critical
• worker pool is generally very large
• data sets could also be very large
Data delivery strategy
Push approach
• scheduler controls data delivery
• worker expects data to be available locally
• more opportunities for optimization
• complex
Pull approach
• worker pulls required data from central service
• scheduler is unaware of data sets
• requires scalable data service
• much simpler
What kind of data do we have?
Working set
• working set is divided between jobs
• each portion of the working set is processed by a single job
• jobs often produce the working set for the next computation stage
Reference data
• exactly the same data is shared by multiple/all jobs
• usually a static data set
Data distribution problem
Working set
• Spiky workload, especially at the start
• Hard to predict where a piece of data will be required
• Caching is ineffective
Reference data set
• Naïve approach will produce a huge volume of redundant transfers; smart caching required
• Spiky workload
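A small sketch of why caching helps reference data but not the working set. It assumes a pull-style worker with a local cache; the `CachingWorker` class, the `fetch` callable and the data set names are hypothetical.

```python
class CachingWorker:
    """Worker that pulls data sets from a central service and caches them locally."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable: data_id -> data
        self._cache = {}
        self.transfers = 0       # number of pulls that actually hit the service

    def get(self, data_id):
        if data_id not in self._cache:
            self._cache[data_id] = self._fetch(data_id)
            self.transfers += 1
        return self._cache[data_id]


def run_jobs(worker, jobs):
    """Each job needs its own working-set slice plus the shared reference data."""
    for working_id, reference_id in jobs:
        worker.get(working_id)
        worker.get(reference_id)


jobs = [(f"slice-{i}", "market-data") for i in range(10)]
worker = CachingWorker(fetch=lambda data_id: f"<contents of {data_id}>")
run_jobs(worker, jobs)
```

Without the cache, the ten jobs would trigger 20 transfers. With it, the shared reference data set is pulled once, while every working-set slice still has to be transferred because it is only used by a single job.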
Private grid practice
[Diagram: HPC grid, data grid, RDBMS or data warehouse]
Data grid, what is it?
• Key/Value storage
• Data distributed across a cluster of servers
• RAM is usually used as storage
• Redundant copies provide a level of fault tolerance / durability
• No single point of failure
• Automatic rebalancing of data when servers are added to or removed from the grid
• Capacity and throughput scale linearly
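One common way to get key distribution, redundant copies and cheap rebalancing is consistent hashing. The sketch below is an illustration of that technique, not a description of any particular data grid product (real products may use fixed partition tables instead). The `HashRing` class, the server names and the number of virtual nodes are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring mapping keys to a primary server plus backups."""

    def __init__(self, servers, vnodes=64):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, server) points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, server):
        # Several virtual nodes per server spread its share around the ring.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def owners(self, key, copies=2):
        """Return up to `copies` distinct servers: primary first, then backups."""
        start = bisect.bisect(self._ring, (self._hash(key), ""))
        result = []
        for i in range(len(self._ring)):
            server = self._ring[(start + i) % len(self._ring)][1]
            if server not in result:
                result.append(server)
                if len(result) == copies:
                    break
        return result


ring = HashRing(["node-1", "node-2", "node-3"])
primary, backup = ring.owners("trade-42")
```

When a server is added, only the keys that fall into its new ring segments change their primary owner, which is what keeps rebalancing traffic proportional to the added capacity.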
Data service for cloud HPC
• Block storage service
Azure Drive / Amazon EBS
– Lack of shared access to data
• Key / Value storage
Azure Tables / Amazon SimpleDB
– Pricing: volume + usage
• Blob store
Azure Blob storage / Amazon S3
– Pricing: volume + transactions
– Good read scalability
Use case for caching
Avoid storage of data in cloud
• Upload data once per batch and cache in cloud
Reduce storage cost by reducing number of operations
Save IO bandwidth for shared data
• Edge caching
• Routing overlays
Routing overlays
• Each node knows (communicates with) only a subset of the network.
• A request to an unknown node is routed via the neighbor that is closest to the destination.
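Greedy routing over such an overlay can be sketched on a ring of numbered nodes. The neighbor rule used here (links at power-of-two offsets, similar to Chord) and all function names are assumptions chosen for illustration; the slides do not prescribe a specific overlay.

```python
def ring_distance(a, b, size):
    """Clockwise distance from node a to node b on a ring of `size` ids."""
    return (b - a) % size


def neighbors(node, size):
    """Assumed neighbor set: nodes at power-of-two offsets from `node`."""
    result = []
    step = 1
    while step < size:
        result.append((node + step) % size)
        step *= 2
    return result


def route(source, destination, size):
    """Greedy routing: always forward to the neighbor closest to the destination."""
    path = [source]
    current = source
    while current != destination:
        current = min(neighbors(current, size),
                      key=lambda n: ring_distance(n, destination, size))
        path.append(current)
    return path


path = route(source=0, destination=11, size=16)
```

Each node keeps only about log2(size) links, yet every hop strictly reduces the remaining distance, so the request reaches the destination in a logarithmic number of hops.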
Task stealing
Task stealing – alternative scheduling approach
Task stealing is widely used for in-process multi-core concurrency
Why use it for cluster task scheduling
• Stochastic and adaptive
• Can use cost models that account for internal cloud topology
• Decently solves the problem of data delivery, without additional caching
• Unproven for cluster computation, so far
Task stealing
[Diagram: Worker 1 task stack, showing forked tasks and the task being processed]
Work backlog is organized as a stack
• Tasks are generated recursively
• Top of stack – fine grained tasks
• Bottom of stack – coarse grained tasks
• Execution from top of stack
• Stealing – bottom of stack
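The stack discipline above can be sketched as a single-threaded simulation with two workers. It is only a sketch: the `Worker` class, the round-robin `run` loop and the use of integer ranges as tasks are hypothetical, and a real cluster implementation would need concurrency control and network transfers.

```python
from collections import deque


class Worker:
    def __init__(self, name):
        self.name = name
        self.backlog = deque()   # left end = bottom (coarse), right end = top (fine)
        self.processed = []

    def push(self, task):
        self.backlog.append(task)

    def next_task(self, victims):
        if self.backlog:
            return self.backlog.pop()            # own work: top of stack
        for victim in victims:
            if victim.backlog:
                return victim.backlog.popleft()  # steal: bottom of victim's stack
        return None


def run(workers, threshold):
    """Round-robin simulation; a task is a (lo, hi) range, split recursively."""
    busy = True
    while busy:
        busy = False
        for worker in workers:
            task = worker.next_task([v for v in workers if v is not worker])
            if task is None:
                continue
            busy = True
            lo, hi = task
            if hi - lo > threshold:
                mid = (lo + hi) // 2
                worker.push((mid, hi))   # fork
                worker.push((lo, mid))   # fork
            else:
                worker.processed.append(task)   # small enough: process it


w1, w2 = Worker("Worker 1"), Worker("Worker 2")
w1.push((0, 16))
run([w1, w2], threshold=2)
```

Because the thief takes from the bottom of the stack, it receives a coarse-grained task that keeps it busy for a while, so steals (and the data movement they imply) stay rare.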
Task stealing
[Diagram: task stacks of Worker 1 and Worker 2, with tasks labeled fork, processing and done; a steal moves a task from one worker's stack to the other worker]
IO bound workload in cloud
Dawn of Map/Reduce
- high bandwidth interconnects are expensive
- network storage is expensive
- cheap servers and local processing for keeping costs low
“Cloud” reality
- network bandwidth is cheap
- disks are already “networked”
- RAM is abundant
Hadoop is cloud unfriendly
Assume I have a 50-node Hadoop cluster in the cloud
What will I gain by adding another 50 nodes?
- Not much, until they are populated with data.
What if I shut these 50 nodes down afterward?
- The effort to populate them with data will be wasted.
Hadoop couples execution and storage services together: you have to pay for both even if you use only one.
How should cloud M/R look?
• Use cloud storage service and persistent storage
• Streaming M/R processing
• Aggressive use of memory for intermediate data
Peregrine – storeless M/R framework
http://peregrine_mapreduce.bitbucket.org/
Spark – in-memory M/R framework
http://www.spark-project.org/
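A toy sketch of streaming map/reduce with intermediate data held in memory. It does not use the Peregrine or Spark APIs; `map_reduce`, the mapper and the reducer are hypothetical names, and a real framework would partition the intermediate groups across many nodes.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Stream records through the mapper, group in memory, then reduce."""
    groups = defaultdict(list)          # intermediate data never touches disk
    for record in records:              # `records` may be a generator or stream
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    return sum(values)


# Input could equally be streamed from a cloud blob store.
lines = iter(["Cloud storage is cheap", "RAM is abundant", "cloud RAM"])
counts = map_reduce(lines, word_count_mapper, sum_reducer)
```

Input is consumed as a stream and nothing is written to local disk between the map and reduce phases, so compute nodes can be added or removed without repopulating them with data.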
Looking into future
Highly anticipated features
• Scheduler as a Service
– Azure HPC
• Simple middleware for organizing caches and routing overlays
– Existing solutions are far from simple
• Cloud friendly map/reduce frameworks