Adaptive Request Scheduling for Parallel Scientific Web Services
Heshan Lin, Xiaosong Ma, Jiangtian Li,
Ting Yu, Nagiza Samatova
North Carolina State University / Oak Ridge National Laboratory
Xiaosong Ma, NCSU/ORNL, SSDBM 2008 2
“Data Rich” Scientific Applications
Scientists routinely need to process hundreds of GBs or TBs of data
Biology, cosmology, climate
Public science data grow rapidly
E.g., GenBank has grown by more than 5 orders of magnitude over the last two decades
Storing and analyzing such data is beyond the capacity of personal computers
Scientific Web Services
Increasingly popular to address data growth
Efficient sharing of public data repositories and high-end computing resources
Hiding parallel job management overhead
New Scheduling Context
Characteristics of scientific requests
Compute-intensive: require processing on multiple processors
Data-intensive: access GBs to TBs of data
Related scheduling studies
Content-serving cluster web servers: focus on data locality
Space-sharing parallel job scheduling: focuses on parallel efficiency
Needed: computation- and data-aware scheduling algorithms
Our Contributions
Two-level adaptive scheduling framework for scientific web services
Goal: improve average request response time
Takes into account both data locality and parallel efficiency
Automatically adapts to system loads and request patterns
Case study: genomic sequence similarity search (BLAST) web server
Performance evaluation on real cluster
Road Map
Introduction
Background
Scheduling design
Experiment results
Conclusions
BLAST
Routinely used in biomedical research
Searches for similarities between query sequences and those in a sequence database
Predict structures and functions of new sequences
Verify experimental and computational results
Analogous to web search engines (e.g., Google)

                 Web Search Engine    BLAST
Input            Keyword(s)           Query sequence(s)
Search space     Internet             Known sequence database
Output           Related web pages    DB sequences similar to the query
Sorted by        Closeness & rank     Score (similarity)
Parallel BLAST
Partition large DBs across multiple processors
mpiBLAST [Darling03, Lin05, Gardner06, Lin08]
Road Map
Introduction
Background
Scheduling algorithm design
Experiment results
Conclusions
System Architecture
Front-end node
Receives requests and makes scheduling decisions
Backend nodes
Perform parallel BLAST jobs
[Figure: incoming queries (e.g., TGACGTCATC…) arrive over the Internet at the front-end node, which maintains a query queue (q3, q4, q5, …); queries q1 and q2 are being serviced on backend nodes 1–6, which cache database fragments (partition 1: DB11–DB14; partition 2: DB21–DB24) from a shared file system; alignment results (Query/Sbjct matches) are returned to the user.]
Overview
Scheduling problem: find a partition of the cluster to service each request
How many processors to allocate?
And on which processors?
Which database fragment(s) to search on each processor?
Scheduling techniques
Efficiency-oriented scheduling
Data-oriented scheduling
Challenge: to automatically adapt to system loads and query patterns
Efficiency-Oriented Scheduling
Response time = wait time + service time
Intuition
As partition size grows, speedup increases but efficiency decreases
When load is light, use large partitions -> reduce service time
When load is heavy, use small partitions -> reduce wait time
MAP [Dandamudi99]
Computes partition size from S, the number of jobs being serviced, and f, an adjustable parameter (0 <= f <= 1)
partition size = Max(1, ceil(total processors / (queue length + 1 + f * S)))

(Efficiency = speedup / #procs)
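As a minimal sketch (Python; function and parameter names are illustrative, not from the paper), the MAP sizing rule can be written as:

```python
import math

def map_partition_size(total_procs, queue_length, jobs_in_service, f=0.5):
    """MAP sizing rule [Dandamudi99]: shrink partitions as load grows.

    f (0 <= f <= 1) weights S, the number of jobs currently in service.
    """
    denom = queue_length + 1 + f * jobs_in_service
    return max(1, math.ceil(total_procs / denom))
```

Under light load (empty queue, nothing in service) the rule hands the whole cluster to one request; as the queue grows, partitions shrink so more requests run concurrently.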
Our Solution: RMAP
Define a range of partition sizes [Pmin, Pmax] for each DB
Pmin: smallest # of processors whose aggregate memory can hold the database
Pmax: saturation point of the speedup curve
[Figures: speedup and efficiency curves for nr and nt searches on 1–32 processors, with linear speedup shown for reference; speedup flattens and efficiency falls as the processor count grows.]
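A sketch of the RMAP idea (Python; names are illustrative): take MAP's load-adaptive size and clamp it to the per-database range [Pmin, Pmax].

```python
import math

def rmap_partition_size(total_procs, queue_length, jobs_in_service,
                        p_min, p_max, f=0.5):
    """RMAP sketch: MAP's suggested size, clamped to [p_min, p_max].

    p_min: fewest processors whose aggregate memory holds the database.
    p_max: saturation point of the database's speedup curve.
    """
    suggested = max(1, math.ceil(
        total_procs / (queue_length + 1 + f * jobs_in_service)))
    return min(p_max, max(p_min, suggested))
```

For example, under heavy load MAP may suggest 2 processors for an nt search, but with Pmin = 8 RMAP raises the allocation to 8 so the database still fits in aggregate memory.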
Data-Oriented Scheduling
Given partition size p, which processors should search the next query?
Naïve approach FA (First Available): similar to batch job scheduling
Orders processors by rank and picks the first p idle processors
Does not consider data locality
LARD algorithm for cluster web servers [Pai98]
Intuition: assigns an object request to the processor that recently serviced it
Considers both data locality and load balance
Data-Oriented Scheduling (cont.)
What's new here?
Servicing a query requires co-scheduling of multiple nodes
A processor can only serve one query at a time
Our solution: PLARD
Multiple queues and processor pools, on a per-database basis
Query assignment and load balancing among processor pools
Assigns and migrates processors in groups
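A toy sketch of PLARD-style placement (Python; class and function names are illustrative, not the paper's code): prefer processors already in a database's pool, since they recently searched that database's fragments, and migrate free processors into the pool as a group only when the pool runs short.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    pid: int
    idle: bool = True

def assign_query(db, p, pools, free_procs):
    """Pick p processors for a query against database `db`.

    Prefers idle processors in db's pool (data locality); if the pool
    is short, migrates a group of free processors into it at once
    rather than one at a time.
    """
    chosen = [proc for proc in pools[db] if proc.idle][:p]
    shortfall = p - len(chosen)
    if shortfall > 0:
        group = free_procs[:shortfall]   # migrate processors as a group
        del free_procs[:shortfall]
        pools[db].extend(group)
        chosen.extend(group)
    for proc in chosen:                  # a processor serves one query at a time
        proc.idle = False
    return chosen
```

A real scheduler would also rebalance pools when one database's queue grows; this sketch only shows the locality-first, group-migration placement step.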
PLARD + RMAP
Two-level scheduling decisions
Inter-DBPool: dynamically adjusts DB pool sizes, guided by RMAP on global stats
Inner-DBPool: RMAP on local (per-DB) stats
[Figure: incoming queries flow from a global queue (q1, q2, q3) into per-database queues (DB1 queue: q4, q5; DB2 queue: q6); RMAP and PLARD divide processors P1–P6 between the DB1 and DB2 pools, and RMAP sizes partitions within each pool.]
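The two-level decision could be sketched like this (Python; all names are illustrative assumptions): the outer level sizes the target database's pool via RMAP on global queue statistics, and the inner level picks this query's partition via RMAP on that database's local statistics.

```python
import math
from dataclasses import dataclass

@dataclass
class DBPool:
    p_min: int      # fewest procs whose aggregate memory holds the DB
    p_max: int      # saturation point of the DB's speedup curve
    size: int = 0   # processors currently assigned to this pool

def rmap(total, queue_len, in_service, p_min, p_max, f=0.5):
    """MAP sizing rule clamped to [p_min, p_max]."""
    s = max(1, math.ceil(total / (queue_len + 1 + f * in_service)))
    return min(p_max, max(p_min, s))

def schedule(db, pools, queue_lens, in_service, cluster_size):
    """Two-level decision for the next query against database `db`."""
    # Inter-DBPool: resize db's pool with RMAP on global stats.
    pools[db].size = rmap(cluster_size,
                          sum(queue_lens.values()),
                          sum(in_service.values()),
                          pools[db].p_min, pools[db].p_max)
    # Inner-DBPool: pick the partition size with RMAP on local stats.
    return rmap(pools[db].size, queue_lens[db], in_service[db],
                pools[db].p_min, pools[db].p_max)
```

Both levels reuse the same RMAP rule, just fed with different statistics, which is what lets the combined policy adapt to both overall load and per-database query patterns.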
Road Map
Introduction
Background
Scheduling design
Experiment results
Conclusions
Experiment Setup
Input data
5 NCBI sequence databases
Synthesized query trace, Poisson arrivals
1000 randomly sampled query sequences (proportional to DB size)
Backend cluster
32 Xeon processors, Linux OS, Gigabit Ethernet
DB Name    Type  Raw Size  Formatted Size  Pmin  Pmax
env_nr     P     1.7GB     2.5GB           2     32
nr         P     2.6GB     3.0GB           4     32
est_mouse  N     2.8GB     2.0GB           2     16
nt         N     21GB      6.5GB           8     32
gss        N     16GB      9.1GB           8     32
PLARD Impacts
[Figures: average # page faults (left) and average response time (log secs, right) vs. system loads 0.2–1.0, comparing FIX-S vs. FIX-S-PLARD (small partition, top row) and FIX-M vs. FIX-M-PLARD (medium partition, bottom row).]
PLARD Impacts (cont.)
[Figures: # of searched queries per processor (IDs 1–31) at loads 0.2–1.0, without PLARD (left) and with PLARD (right).]
Count the # of searched queries on each processor
PLARD results in more balanced loads across processors
Adaptive to Fixed Arrival Rates
Static policies work well only for certain workloads
RMAP wins across the board
[Figure: average response time (log secs) vs. system loads 0.2–1.0 for FIX-S-PLARD, FIX-M-PLARD, FIX-L-PLARD, and RMAP-PLARD.]
Adaptive to Mixed Arrival Rates
Two traces with mixed arrival rates
Trace 1: 0.2 + 0.4 + 0.6 + 0.8
Trace 2: 0.2 + 0.8 + 0.4 + 1.0
[Figure: average response time (secs) on Trace 1 and Trace 2 for FIX-S-PLARD, FIX-M-PLARD, FIX-L-PLARD, and RMAP-PLARD.]
Road Map
Introduction
Background
Scheduling design
Experiment results
Conclusions
Conclusions
Scientific web service request scheduling is not well studied
“Moldable jobs” realized
Two-level adaptive scheduling framework
RMAP: parallel-efficiency aware
PLARD: data-locality aware
Combined adaptive policy autonomically adapts to system loads and query patterns
References
[Dandamudi99] S. Dandamudi and H. Yu. Performance of adaptive space sharing processor allocation policies for distributed-memory multicomputers. JPDC, 58(1), 1999.
[Gardner06] M. Gardner, W. Feng, J. Archuleta, H. Lin, and X. Ma. Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications. Supercomputing, 2006.
[Lin05] H. Lin, X. Ma, P. Chandramohan, A. Geist, and N. Samatova. Efficient data access for parallel BLAST. IPDPS, 2005.
[Lin08] H. Lin, P. Balaji, R. Pool, C. Sosa, X. Ma, and W. Feng. Massively parallel genomic sequence search on the BlueGene/P architecture. To appear, Supercomputing, 2008.
[Pai98] V. Pai, M. Aron, G. Banga, M. Svendsen, P. Druschel, W. Zwaenepoel, and E. Nahum. Locality-aware request distribution in cluster-based network servers. ASPLOS-VIII, 1998.
Thank You
Questions?