
A File Assignment Strategy Towards Minimized Response Time for Parallel Storage Systems


IEEE TRANSACTIONS ON MAGNETICS, VOL. 49, NO. 6, JUNE 2013 2459


Yang Yu, Yongqing Zhu, Willie Ng, Juniarto Samsudin, and Zhixiang Li

Data Centre Technology Division, Data Storage Institute, 138632 Singapore

Prompt response to data access requests is the key performance concern of parallel storage systems. Sort partition (SP) is one of the most promising file allocation solutions for the static scenario due to its sorted partition and placing mechanism. The online variant of SP, hybrid partition (HP), however, does not perform well in the dynamic scenario. The recently proposed balanced allocation with sort (BAS) and its online version, balanced allocation with sort for batch (BASB), try to compete with SP and HP by providing better load balancing together with sorting. BAS's performance is close to that of SP, and BASB greatly outperforms HP. In this paper, we propose new file placement solutions for parallel storage systems: Optimized Sort Partition (OSP) for the static scenario and Optimized Sort Partition Online (OSPOnline) for the dynamic scenario. By eliminating the drawbacks of SP/HP and BAS/BASB in file assignment, OSP and OSPOnline can achieve optimized system response time. Simulations show that OSP and OSPOnline steadily outperform their competitors under various conditions.

Index Terms—Balanced allocation with sort (BAS), balanced allocation with sort for batch (BASB), file placement, hybrid partition (HP), optimized sort partition (OSP), optimized sort partition online (OSPOnline), parallel I/O, sort partition (SP).

I. INTRODUCTION

Nowadays, digital data generated by users and applications are increasing dramatically. As the IT infrastructure becomes faster and more powerful, users place more stringent requirements on digital data access. Hence, prompt responses to users' data access requests become essential to guarantee the quality of service (QoS) claimed by users [1]. Large-scale parallel storage systems, like redundant arrays of inexpensive disks (RAID), are deployed to store digital data for real-world applications, as they can partition data across multiple disks and access the data in parallel to achieve fast response to user requests [2]. Data should be allocated properly to the multiple disks of a parallel storage system before being accessed in order to optimize the system response time. Such a data allocation problem, when each file is considered as an entirety without further partition, is often defined as the file assignment problem (FAP) [3].

Algorithms for allocating files to a parallel disk storage system have been investigated intensively [1], [3]–[10]. In general, there are two categories of file assignment strategies: offline ones and online ones. While offline file assignment schemes require complete file knowledge in advance, including file access rates and file sizes, online strategies have no such requirements, and files are assigned in real time as they are written into the storage system [3], [6]. To the best of our knowledge, sort partition (SP) [6] is one of the best existing solutions in terms of response time for static scenarios. Its online variant, hybrid partition (HP) [6], however, does not perform well enough. Balanced allocation with sort (BAS) [9] tries to compete with SP, and its performance in terms of system response time is close to that of SP. Balanced allocation with sort for batch (BASB) [9], the online version of BAS, outperforms HP in the dynamic scenario.

Many promising existing solutions, including SP/HP and BAS/BASB, try to reduce the load variance across disks or the file service time variance on each disk in order to reduce system response time; however, none of them perfectly achieves the two objectives simultaneously. Hence, we believe that the system performance in terms of response time can be further improved through optimization. With this motivation, we propose a new file assignment solution in this paper. The solution includes an offline version and an online version. The offline algorithm, optimized sort partition (OSP), is applied to static file assignment scenarios, in which the characteristics of all files are known in advance. The corresponding online algorithm, optimized sort partition online (OSPOnline), is used in the dynamic scenario, where files arrive in batches and file-related information is only known for files in previous and current batches.

The rest of this paper is organized as follows: Section II reviews related prior work. Section III presents the system model and performance metrics mathematically. Section IV describes the OSP and OSPOnline algorithms in detail. Simulations are conducted and evaluated in Section V. Finally, Section VI concludes the paper.

Manuscript received November 28, 2012; revised March 05, 2013; accepted March 07, 2013. Date of current version May 30, 2013. Corresponding author: Y. Yu (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMAG.2013.2252157

0018-9464/$31.00 © 2013 IEEE

II. RELATED WORKS

With the evolution of parallel and distributed storage systems, solutions have been proposed to solve new problems, including data staging and management for large-scale storage systems [11]–[13], data object replica placement, and data reallocation [14], [15]. Essentially, these problems are either directly derived from or closely related to FAP [1]. Hence, it is still worth putting effort into providing better solutions for FAP.

Since fast response is the most important performance measure of a parallel storage system, many heuristic algorithms, like Greedy [1], [16], try to achieve shorter response time by minimizing the disks' maximal utilization. As pointed out in [6] and [9], algorithms like Greedy only take care of disk load balancing but neglect the service time variance of each disk, which actually has a great impact on the queuing delay of a disk and consequently on the mean response time of the system. SP tries to reduce the service time variance on each disk by sorting files according to their sizes before file assignment. It then assigns the smallest contiguous set of sorted files to a disk such that the corresponding disk load just reaches the average disk load. The same procedure repeats until the last disk, to which all the remaining files are assigned. The obvious drawback of SP is that the last disk is much more lightly loaded than the other disks, so disk load balancing is not strictly followed, which in turn may adversely affect the system response time.

HP is the corresponding online version of SP and can be deployed in the dynamic scenario. In HP, files are assigned to disks batch by batch. For each batch of sorted files, HP selects the disk with the minimum accumulated load and allocates files to it until the disk load reaches a predefined threshold. The same procedure repeats until all files in the batch are assigned to some disk. However, such a mechanism mixes large files and small files of different batches on the same disk, which severely violates the sorted partition rule. Also, because a predefined threshold guides the targeted disk load, it is difficult for HP to achieve good load balancing. Hence the performance of HP is far from satisfactory.

To overcome SP's drawback of imperfect load balancing across disks, BAS introduces a Greedy-like mechanism into the file allocation procedure. Similar to SP, BAS first sorts files by their sizes and assigns a contiguous set of sorted files to each disk. However, BAS does not allow the load assigned to each disk to exceed a given limit, so some files are left unassigned after this round. BAS then uses the Greedy mechanism to allocate these unassigned files one by one to the disk with minimum load. In this manner, BAS can achieve better disk load balancing. However, the greedy file assignment procedure only uses disk load as the criterion for file allocation. Thus it partially violates the fundamental sorted partition rule and may affect the overall system response time accordingly. BASB is the extension of BAS for the dynamic scenario, and it executes the same procedures as BAS on each batch of files. BASB can outperform HP because it observes the sorted partition rule across batches much better than HP.

Recently, Dong et al. proposed a static file assignment algorithm named minimum I/O contention probability (MinCP) [17]. MinCP defines the I/O contention probability of accessing two files as the file consecutive access probability multiplied by the I/O overlapping probability [17]. It concludes that the I/O contention probability between popular files is higher than that between unpopular files, and hence that files should be sorted according to their access rates instead of their service time and assigned to disks in a round-robin fashion. When file access rates follow a Zipfian distribution and are inversely related to the file size distribution, which can be observed in many real applications and indicates that popular files are small and large files are normally unpopular, MinCP reaches almost the opposite conclusion to SP and BAS about how to allocate files for optimum system response time. After careful study, we think that MinCP neglects some situations that may also affect the I/O contention probability: nonconsecutive access requests can also cause I/O contention with each other, and overlapping access can occur due to the queuing of access requests.
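The SP procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference implementation from [6]: the `(service_time, access_rate)` tuple layout and the names `heat` and `sort_partition` are hypothetical, service time stands in for file size, and the largest-first order follows the description of SP given in Section IV.

```python
from typing import List, Tuple

# Hypothetical file record: (service_time, access_rate).
File = Tuple[float, float]


def heat(f: File) -> float:
    """Heat of a file: service time multiplied by access rate."""
    return f[0] * f[1]


def sort_partition(files: List[File], m: int) -> List[List[File]]:
    """Sketch of SP: one contiguous run of size-sorted files per disk."""
    ordered = sorted(files, key=lambda f: f[0], reverse=True)  # largest first
    avg_load = sum(heat(f) for f in ordered) / m
    disks: List[List[File]] = [[] for _ in range(m)]
    i = 0
    for k in range(m - 1):
        load = 0.0
        # Take the fewest files whose combined heat reaches the average load.
        while i < len(ordered) and load < avg_load:
            disks[k].append(ordered[i])
            load += heat(ordered[i])
            i += 1
    disks[m - 1] = ordered[i:]  # the last disk receives all remaining files
    return disks


files = [(4.0, 0.75), (3.0, 1.0), (2.0, 1.5), (1.0, 1.0)]  # heats 3, 3, 3, 1
layout = sort_partition(files, m=2)
print([sum(heat(f) for f in d) for d in layout])  # [6.0, 4.0]
```

In this toy input the first disk overshoots the average load of 5 and the last disk ends up lighter, which is the imbalance the text attributes to SP.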

III. SYSTEM MODEL

File assignment algorithms can be implemented inside a logical-to-physical space mapper like a logical volume manager (LVM), a file system, or a disk array controller [9], [18]. User-generated data are then stored on disks according to some file assignment algorithm and accessed by users subsequently. In this paper, file partitioning and replication are not considered for simplicity, and each file as an entirety is assigned to one disk [1], [6].

The parallel storage system is modeled as $m$ stand-alone disks, in which the queuing delay on the buses or controllers of disks is negligible compared with that on the disks themselves [6]. Also, we consider a homogeneous system with each disk having the same characteristics. A file assignment algorithm allocates a set of $n$ files, $F = \{f_1, \ldots, f_n\}$, to $m$ disks, $D = \{d_1, \ldots, d_m\}$. When a file is requested later, the controller determines which disk holds the file and forwards the access request to that disk. Disk access to a file $f_i$ is modeled as a Poisson process with a mean access rate $\lambda_i$, which is known to the system when the file arrives [1], [6], [9]. A fixed service time $s_i$ is assumed for $f_i$ and is defined by the following equation:

$$s_i = t_{\mathrm{seek}} + t_{\mathrm{rot}} + t_{\mathrm{scan}}(f_i) \tag{1}$$

where $t_{\mathrm{seek}}$ and $t_{\mathrm{rot}}$ are the constant seek time and rotation time, respectively, and $t_{\mathrm{scan}}(f_i)$ is the time to sequentially scan the entire file $f_i$, which is proportional to the size of $f_i$.

The file popularity, represented by the file access rate, follows a Zipfian distribution and is inversely correlated with the file size distribution. These assumptions are supported by observations in many real systems that popular files are relatively small and large files are generally less popular [19], [20]. The file access requests of the system can be considered as requests to each disk independently, and the file access requests for each disk can be modeled as an M/G/1 queue as shown in Fig. 1. M indicates that request (job) arrivals follow a Poisson process, G indicates that the service time obeys a general, independent distribution, and 1 is the number of processors for each disk.

Fig. 1. Queuing model for data access scheduling.

Since each disk is modeled as an M/G/1 queue, the Pollaczek-Khinchine (P-K) formula can be used to calculate the mean queue length $L_k$ of a disk $d_k$:

$$L_k = \rho_k + \frac{\rho_k^2 + \lambda_k^2 \,\mathrm{Var}(S_k)}{2(1-\rho_k)} \tag{2}$$

where $\lambda_k$ is the arrival rate of disk $d_k$, $E(S_k)$ is the mean of the service time distribution, $\rho_k = \lambda_k E(S_k)$ is the disk load, and $\mathrm{Var}(S_k)$ is the variance of the service time distribution. For the mean queue length of a disk to be finite, it is necessary that $\rho_k < 1$.

Using $W_k$ as the mean time that a request spends in a disk $d_k$, we have

$$W_k = W_{q,k} + \frac{1}{\mu_k} \tag{3}$$

where $W_{q,k}$ is the mean waiting time (time spent in the waiting area as shown in Fig. 1) and $\mu_k = 1/E(S_k)$ is the service rate of disk $d_k$. Little's law states that

$$L_k = \lambda_k W_k \tag{4}$$

so we have

$$W_k = \frac{L_k}{\lambda_k} = E(S_k) + \frac{\lambda_k\left(E(S_k)^2 + \mathrm{Var}(S_k)\right)}{2(1-\rho_k)} \tag{5}$$

Assuming perfect load balancing across disks,

$$\rho_k = \sum_{f_i \in d_k} h_i = \frac{1}{m}\sum_{i=1}^{n} h_i = \rho, \qquad h_i = \lambda_i s_i \tag{6}$$

where $\rho$ is a constant, and $\lambda_i$, $s_i$, and $h_i$ are the access rate, the service time, and the heat of file $f_i$, respectively. Based on (5), we can calculate the overall mean response time $W$ across the whole system by the following equation:

$$W = \sum_{k=1}^{m} \frac{\lambda_k}{\lambda} W_k = \frac{1}{\lambda}\left[ m\rho + \frac{m\rho^2}{2(1-\rho)} + \frac{1}{2(1-\rho)} \sum_{k=1}^{m} \lambda_k^2 \,\mathrm{Var}(S_k) \right] \tag{7}$$

where $\lambda = \sum_{k=1}^{m} \lambda_k$ is the aggregate access rate.

We can conclude that, in order to minimize the system mean response time, the cost function $\sum_{k} \lambda_k^2 \,\mathrm{Var}(S_k)$ in (7) needs to be minimized, and this cost is directly related to the service time variance on each disk. Detailed analysis in [6] shows that sorted partition according to file sizes, together with perfect load balancing across disks, can result in optimized system response time $W$. Hence, sorted partition according to file sizes and disk load balancing should be the key guidelines in file placement algorithm design. In addition, we can see from (7) that the system response time is not directly related to the distribution of the files' access request rates, which casts some doubt on MinCP [17].
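To make the role of the service time variance concrete, the following Python sketch evaluates the P-K formula per disk and averages the per-disk response times weighted by arrival rate. It is a minimal illustration under assumptions that are not taken from the paper: the `(service_time, access_rate)` tuple layout and the function names are hypothetical, and the service time distribution seen by a disk is assumed to be the mix of its files' fixed service times weighted by their access rates.

```python
from typing import List, Tuple

# Hypothetical file record: (service_time, access_rate).
File = Tuple[float, float]


def disk_response_time(files: List[File]) -> float:
    """Mean response time of one disk modeled as an M/G/1 queue."""
    lam = sum(rate for _, rate in files)  # arrival rate of the disk
    if lam == 0.0:
        return 0.0
    # Assumed mixture: a request targets file i with probability rate_i / lam.
    mean_s = sum(rate * s for s, rate in files) / lam
    second_moment = sum(rate * s * s for s, rate in files) / lam
    var_s = max(second_moment - mean_s ** 2, 0.0)  # clamp rounding noise
    rho = lam * mean_s  # disk load
    if rho >= 1.0:
        raise ValueError("disk load must be below 1 for a finite queue")
    # Pollaczek-Khinchine mean queue length, then Little's law.
    queue_len = rho + (rho ** 2 + lam ** 2 * var_s) / (2.0 * (1.0 - rho))
    return queue_len / lam


def system_response_time(disks: List[List[File]]) -> float:
    """Per-disk response times averaged with arrival-rate weights."""
    total = sum(rate for d in disks for _, rate in d)
    return sum(
        sum(rate for _, rate in d) / total * disk_response_time(d) for d in disks
    )


small, large = (1.0, 0.3), (3.0, 0.1)  # both files have heat 0.3
sorted_layout = [[small, small], [large, large]]
mixed_layout = [[small, large], [small, large]]
print(round(system_response_time(sorted_layout), 3))  # 2.625
print(round(system_response_time(mixed_layout), 3))   # 3.0
```

Both layouts give every disk a load of 0.6, so the gap between the two results comes only from the service time variance on each disk, which is the effect the cost function in the text captures.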

IV. PROPOSED SOLUTIONS

A. Optimized Sort Partition (OSP)

OSP is motivated by and originates from SP. Unlike SP, which starts file assignment from the largest file, OSP sorts files in ascending order of their service times and starts the file allocation procedure from the smallest files in the sorted file list. As the allocation of extremely large files may greatly affect the load of the corresponding disks and the subsequent file assignment, we believe it is desirable to assign large files last to minimize their impact on the overall file allocation. As shown in Fig. 2, OSP first initializes the load of all disks to 0, strictly follows the sorted partition rule by sorting files according to their service time first, and then always assigns a contiguous set of sorted files to each disk.

To achieve optimum load balancing, OSP recalculates the targeted disk load $\rho_t$ of a disk based on the unassigned files before allocating files to each disk, according to (8):

$$\rho_t = \frac{1}{m'} \sum_{i=1}^{n'} h_i \tag{8}$$

where $m'$ is the number of disks that have not been assigned files, $h_i$ is the heat of the $i$-th unassigned file, and $n'$ is the number of unassigned files. With this mechanism, the targeted disk load is always adjusted to the real situation, and the obviously heavier or lighter disk that results from SP does not occur for OSP. When assigning a contiguous set of sorted files to a disk, OSP checks how much the resulting actual disk load differs from the targeted disk load due to assigning the last file in this set to the disk. OSP regards the last file in each assignment as a “sensitive” file and carefully decides its disk location, as illustrated in Fig. 2. We believe that even a single file's disk location, especially that of a file with large heat due to either its large size or its large access rate, may affect the file assignment on multiple disks and thus should be carefully decided. By using such fine-tuning mechanisms, OSP can achieve optimized load balancing without violating the fundamental sorted partition rule. Consequently, we expect improved system response time for OSP as compared with SP and BAS.
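The description above can be turned into a short Python sketch. It is not the pseudocode of Fig. 2, which is not reproduced in this text, and it rests on assumptions: the `(service_time, access_rate)` tuple layout and the function names are hypothetical, and the rule used for the “sensitive” file (keep it on the current disk only if that leaves the load closer to the target than moving it to the next disk) is one plausible reading of "carefully decides its disk location".

```python
from typing import List, Tuple

# Hypothetical file record: (service_time, access_rate).
File = Tuple[float, float]


def heat(f: File) -> float:
    return f[0] * f[1]


def optimized_sort_partition(files: List[File], m: int) -> List[List[File]]:
    """Sketch of OSP: ascending sort, per-disk target recomputation."""
    ordered = sorted(files, key=lambda f: f[0])  # smallest service time first
    disks: List[List[File]] = [[] for _ in range(m)]
    i = 0
    for k in range(m):
        disks_left = m - k
        if disks_left == 1:
            disks[k] = ordered[i:]  # last disk takes whatever remains
            break
        # Target load: heat of unassigned files over disks still to be filled.
        target = sum(heat(f) for f in ordered[i:]) / disks_left
        load = 0.0
        while i < len(ordered):
            h = heat(ordered[i])
            if load + h > target:
                # "Sensitive" file (ASSUMED rule): keep it here only if the
                # overshoot is smaller than the undershoot without it.
                if (load + h - target) < (target - load):
                    disks[k].append(ordered[i])
                    i += 1
                break
            disks[k].append(ordered[i])
            load += h
            i += 1
    return disks


files = [(1.0, 2.0), (2.0, 0.75), (3.0, 0.5), (4.0, 0.25)]  # heats 2, 1.5, 1.5, 1
layout = optimized_sort_partition(files, m=2)
print([sum(heat(f) for f in d) for d in layout])  # [3.5, 2.5]
```

Because every disk receives a contiguous slice of the sorted list, the sorted partition rule is preserved regardless of how the sensitive file is placed.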

B. Optimized Sort Partition Online (OSPOnline)

In applications such as computing and web applications, files arrive at the system in batches, and these applications require the system to perform real-time file assignment based only on existing files. Hence, we propose OSPOnline to achieve minimized system response time for such dynamic scenarios. The procedures of OSPOnline are elaborated in Fig. 3.

We can see from Fig. 3 that OSPOnline is an extension of OSP for the dynamic scenarios. OSPOnline allocates files on the fly as they arrive at the storage system batch by batch and performs the file assignment procedures of OSP on each batch of files. As shown in Fig. 3, OSPOnline calculates the targeted disk load for a disk differently from OSP: it always performs the calculation based on the heats of unassigned files in the newly arrived batch and the current load of disks that have not been assigned any files in this round.

OSPOnline may not perform as well as OSP because it only has partial file information when performing the file allocation. However, it inherits the merits of OSP: it neither mixes large files and small files of different batches on the same disks, as HP does, nor uses the Greedy-like mechanism adopted by BASB to achieve load balancing. Hence its performance is more dependable than that of HP and BASB.
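A per-batch step in the spirit of this description might look as follows in Python. This is a loose sketch, not the pseudocode of Fig. 3: the exact target formula is not given in the text, so the one used here (heat of the batch's unassigned files plus the current load of the disks not yet visited in this round, divided by the number of such disks) is an assumption, as are the fixed disk visiting order, the tuple layout, and the function names.

```python
from typing import List, Tuple

# Hypothetical file record: (service_time, access_rate).
File = Tuple[float, float]


def heat(f: File) -> float:
    return f[0] * f[1]


def osp_online_assign_batch(batch: List[File], loads: List[float]) -> List[List[File]]:
    """Place one batch; `loads` holds accumulated disk loads and is updated."""
    m = len(loads)
    ordered = sorted(batch, key=lambda f: f[0])  # smallest service time first
    placed: List[List[File]] = [[] for _ in range(m)]
    i = 0
    for k in range(m):
        if k == m - 1:
            placed[k] = ordered[i:]  # last disk takes the rest of the batch
            loads[k] += sum(heat(f) for f in ordered[i:])
            break
        pending = sum(heat(f) for f in ordered[i:])
        # ASSUMED target: level the pending heat over the unvisited disks.
        target = (pending + sum(loads[k:])) / (m - k)
        while i < len(ordered):
            h = heat(ordered[i])
            over = loads[k] + h - target
            # Leave the "sensitive" file for the next disk when adding it
            # would miss the target by at least as much as skipping it.
            if over > 0 and over >= target - loads[k]:
                break
            placed[k].append(ordered[i])
            loads[k] += h
            i += 1
            if over > 0:
                break
    return placed


loads = [0.0, 0.0]
osp_online_assign_batch([(1.0, 1.0), (2.0, 0.5), (3.0, 0.5), (4.0, 0.25)], loads)
second = osp_online_assign_batch([(1.0, 1.0), (4.0, 0.25)], loads)
print(second, loads)  # [[(1.0, 1.0)], [(4.0, 0.25)]] [3.0, 3.5]
```

Visiting disks in the same order for every batch keeps small files on low-index disks and large files on high-index disks across batches, which is how this sketch avoids the cross-batch mixing attributed to HP.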

V. PERFORMANCE EVALUATION

A. Simulation Parameters

We model the parallel storage system and implement the compared file assignment algorithms by using an event-driven simulator [21]. The disk model refers to the Seagate Cheetah disk ST39205LC [1], with identical characteristics as shown in Table I.

System characteristics have a great impact on the performance of a file assignment algorithm. Table II gives the system parameters and their corresponding values used in the simulations.

Fig. 2. Pseudocode of OSP algorithm.

Coverage: the percentage of the entire file set that is actually accessed by requests. It is set to 100%, indicating that all files assigned to disks are accessed at least once in the simulation.

File access frequency: a Zipfian distribution for the file access requests with a skew parameter of $\theta = \log(X/100)/\log(Y/100)$, where $X$ percent of all access requests are directed to $Y$ percent of files [6].

File size distribution: inversely correlated with the distribution of file access rates, with the same skew parameter $\theta$. This assumption is supported by studies of real systems [19].
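A workload of this kind could be generated as in the following Python sketch. Several pieces are assumptions rather than details from the paper: the skew is taken as $\theta = \log(X/100)/\log(Y/100)$, the access probability of the file of rank $i$ is taken as proportional to $1/i^{1-\theta}$ (a common Zipf-like form), and the inverse correlation is modeled by simply reversing the ranking for sizes. All function and parameter names are hypothetical.

```python
import math
from typing import List, Tuple


def zipf_skew(x_percent: float, y_percent: float) -> float:
    """Skew such that X percent of requests target Y percent of files."""
    return math.log(x_percent / 100.0) / math.log(y_percent / 100.0)


def zipf_like_probabilities(n: int, theta: float) -> List[float]:
    """Access probabilities by rank; theta = 1 yields a uniform distribution."""
    weights = [1.0 / (rank ** (1.0 - theta)) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def make_workload(
    n: int, aggregate_rate: float, theta: float, max_size: float
) -> List[Tuple[float, float]]:
    """Return (size, access_rate) pairs, most popular file first."""
    probs = zipf_like_probabilities(n, theta)
    rates = [aggregate_rate * p for p in probs]
    # ASSUMED inverse correlation: the most popular file gets the smallest
    # size, using the same Zipf-like shape with the ranking reversed.
    sizes = [max_size * probs[n - 1 - i] / probs[0] for i in range(n)]
    return list(zip(sizes, rates))


theta = zipf_skew(70.0, 30.0)  # 70% of requests go to 30% of files
workload = make_workload(n=5, aggregate_rate=50.0, theta=theta, max_size=1000.0)
for size, rate in workload:
    print(f"size={size:8.1f}  rate={rate:6.2f}")
```

With this construction the ratio between the largest and the smallest file is fixed by `n` and `theta`; a simulation that varies the file size ratio independently would need an additional scaling step.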

B. Simulation Results and Discussion

In this section, the performance of OSP is compared against other offline file placement algorithms, including Greedy, SP, BAS, and MinCP, and the performance of OSPOnline, together with that of HP and BASB, is studied in dynamic scenarios. We discuss how the compared algorithms perform in varied scenarios and how the system parameters, including aggregate access rate, file size distribution, and batch size, affect the performance of those algorithms.

1) Offline Algorithm Comparison: Fig. 4 shows the mean response time versus aggregate access rate for Greedy, SP, BAS, MinCP, and OSP. The mean response time for Greedy and MinCP is much worse than that of SP, BAS, and OSP, especially when the aggregate access rate is larger than 130. This is because neither Greedy nor MinCP follows the sorted partition rule. Greedy only focuses on disk load balancing by assigning a consecutive set of files to each disk according to their arrival sequence to the system. MinCP sorts files by their access rates and uses a round-robin fashion to evenly distribute popular files and unpopular files to each disk. When the file access rate follows a Zipfian distribution and is inversely related to file size, the file allocation of MinCP results in large files and small files being mixed on each disk. The mean response time for SP, BAS, and OSP is very close when the access rate is comparatively low. This is because all of them manage to minimize the load variance across all disks and the file service time variance on each disk simultaneously to obtain optimum system response time. When the access rate reaches 220, OSP outperforms SP and BAS by 7% because it provides better load balancing than SP and observes the sorted partition rule more strictly than BAS. Thus, its superiority over the other two becomes obvious when the system is more heavily loaded.

Fig. 5 shows that the disk load variance increases with aggregate access rate for all five compared offline algorithms. It can be seen that Greedy provides the best disk load balancing, as this is its sole objective. OSP performs better than SP and BAS, which shows that its targeted disk load adjustment in each file assignment round and its fine tuning of the location of “sensitive” files are effective in achieving better load balancing across disks. MinCP also shows satisfactory disk load balancing; however, this cannot help it achieve good system response time, as it completely violates the sorted partition rule based on file size.

We vary the file size ratio, which is defined as the ratio of file size between the largest file and the smallest file, to investigate the impact of file size distribution on system performance. At the same time, we set different aggregate access rates for different file size ratios so that the system load, measured by the aggregated file heat defined in (6), is held constant across this simulation. Table III shows the access rates for file sets with different file size ratios.

Fig. 6 shows the mean response time versus file size ratio for the five offline algorithms. Greedy and MinCP perform significantly worse than the other three for the same reason as explained for Fig. 4. As the file size ratio increases to 5400, the performance of BAS deteriorates dramatically. This is because a few files are exceptionally large at a file size ratio of 5400 and the system is almost fully loaded, with an average disk load of almost 1 for a 16-disk parallel system. For BAS, the Greedy-like mechanism is not executed on the disk containing the largest files, with the original intention of not blocking access to regular files while large files are served [9]. However, in such an extreme case, the disk containing the largest files ends up with a significantly lower load while most other disks are seriously overloaded (with disk load exceeding 1), and BAS performs poorly as a result. OSP outperforms SP and BAS by 16% and 62%, respectively, when the file size ratio reaches 5400 because OSP provides better load balancing even under this extreme situation.

Fig. 3. Pseudocode of OSPOnline algorithm.

TABLE I. DISK PARAMETERS

TABLE II. SYSTEM AND WORKLOAD PARAMETERS

Fig. 4. Mean response time versus aggregate access rate for offline algorithms.

Fig. 5. Disk load variance versus aggregate access rate for offline algorithms.

TABLE III. AGGREGATE ACCESS RATE VS FILE SIZE RATIO

Fig. 6. Mean response time versus file size ratio for offline algorithms.

Fig. 7. Mean response time versus aggregate access rate for online algorithms.

Fig. 8. Disk load variance versus aggregate access rate for online algorithms.

2) Online Algorithm Comparison: Since the mean response time of Greedy and MinCP in the static scenario is significantly worse than that of SP, BAS, and OSP, and MinCP does not have an online version, we focus on how HP, BASB, and OSPOnline perform in the dynamic scenarios.

Fig. 7 shows the mean response time of the three online algorithms with different access rates. HP performs much worse than BASB and OSPOnline for all access rates. This is because the sorted partition rule is only observed within a batch but not across batches in HP. OSPOnline outperforms BASB by 5%–8% for different access rates because it observes the sorted partition rule more strictly.

Fig. 8 shows the disk load variance versus aggregate access rate. It can be seen that the disk load variance of HP decreases with access rate, whereas that of BASB and OSPOnline increases with access rate. For BASB and OSPOnline, the increase of aggregate access rate, which results from the increase of the access rate of each individual file, does not affect file allocation when other system parameters are fixed. Thus the disk load imbalance increases with aggregate access rate, as the access rate and heat of individual files increase proportionally. HP uses a constant parameter called “overflow” [6] to calculate the maximum workload (threshold) that can be assigned to a disk in each allocation round. When the aggregate access rate is low, indicating a small system workload, this calculation likely yields a threshold much higher than the average disk workload and consequently leads to a more severe disk load imbalance.

From Fig. 9 we can see that the mean response time increases with the file size ratio for all three algorithms. With an increased file size ratio, disk load balancing is more difficult to achieve and the file service time variance on each disk likely increases, so the system mean response time is adversely affected. On average, OSPOnline outperforms HP and BASB by 36% and 30%, respectively, due to its optimization mechanisms.

Fig. 9. Mean response time versus file size ratio for online algorithms.

Fig. 10. Mean response time versus batch size for online algorithms.

Fig. 10 shows that the mean response time decreases as the batch size increases for all three online schemes. With a smaller batch size, files of very different sizes from different batches are more likely to be assigned to the same disk, so the file service time variance on each disk increases significantly and the mean response time is adversely affected. OSPOnline outperforms HP and BASB by up to 64% and 55%, respectively. Again, this result demonstrates the efficiency of OSPOnline in minimizing the system response time.

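The link between per-disk service time variance and mean response time, used in the discussion of Figs. 9 and 10, can be illustrated with the Pollaczek–Khinchine formula. The sketch below assumes that each disk behaves as an M/G/1 queue with Poisson arrivals and a fixed service time per file. This is an assumption made for illustration, and the four files are hypothetical. Both placements give every disk the same load, so the difference in response time comes only from mixing small and large files on one disk.

```python
def mg1_response_time(files):
    """Mean response time of one disk modelled as an M/G/1 queue.

    files: list of (access_rate, service_time) pairs stored on the disk.
    """
    lam = sum(rate for rate, _ in files)                   # total arrival rate
    mean_s = sum(rate * s for rate, s in files) / lam      # E[S]
    mean_s2 = sum(rate * s * s for rate, s in files) / lam # E[S^2]
    rho = lam * mean_s                                     # disk utilization
    if rho >= 1.0:
        raise ValueError("disk is overloaded (utilization >= 1)")
    # Pollaczek-Khinchine: service time plus mean waiting time in queue.
    return mean_s + lam * mean_s2 / (2.0 * (1.0 - rho))


def system_response_time(disks):
    """Mean response time over all disks, weighted by request rate."""
    total_rate = sum(rate for files in disks for rate, _ in files)
    weighted = sum(
        sum(rate for rate, _ in files) * mg1_response_time(files)
        for files in disks
    )
    return weighted / total_rate


# Hypothetical files as (access_rate, service_time); each has heat 0.2.
small = (20.0, 0.01)
large = (4.0, 0.05)

sorted_placement = [[small, small], [large, large]]  # similar sizes together
mixed_placement = [[small, large], [small, large]]   # sizes mixed per disk

sorted_time = system_response_time(sorted_placement)
mixed_time = system_response_time(mixed_placement)
print(sorted_time, mixed_time)
```

Under these assumptions the mixed placement gives the larger mean response time even though both placements are perfectly load balanced, which matches the qualitative argument above.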
VI. CONCLUSION

In this paper, we have proposed the OSP and OSPOnline file assignment algorithms to minimize the mean response time of a parallel I/O system. The performance of OSP and OSPOnline has been studied thoroughly through simulations with varied system parameters, including the aggregate access rate, the file size distribution, and the batch size, and has been compared with that of other solutions, including Greedy, SP, BAS, MinCP, HP, and BASB. The simulation results show that OSP and OSPOnline deliver satisfactory performance under all simulated scenarios and can outperform their competitors in many extreme and severe conditions, such as heavy system load, a wide file size distribution, and a small batch size. This is because OSP and OSPOnline strictly observe the sorted partition rule and provide optimal disk load balancing by finely tuning the targeted disk load for each disk and carefully deciding the disk location of “sensitive” files.
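As a reminder of what the sorted partition rule means, the following is a simplified sketch of a basic sorted-partition assignment in the spirit of SP [6]. It is not the OSP algorithm: the fine tuning of the targeted disk load and the handling of “sensitive” files are not reproduced. The function name, the input format, the rule for moving to the next disk, and the example files are assumptions made for illustration. Service time is assumed proportional to file size, and heat is taken as size times access rate.

```python
def sorted_partition(files, num_disks):
    """Assign files to disks in contiguous runs of the size-sorted order.

    files: list of (size, access_rate) pairs.
    Returns one list of file indices per disk.
    """
    # Heat of a file, assuming service time is proportional to its size.
    heats = [size * rate for size, rate in files]
    target = sum(heats) / num_disks  # average load per disk

    # Visit files in ascending size so each disk holds similar-sized files.
    order = sorted(range(len(files)), key=lambda i: files[i][0])

    disks = [[] for _ in range(num_disks)]
    disk, load = 0, 0.0
    for i in order:
        # Move to the next disk once the current one has reached the target.
        if load >= target and disk < num_disks - 1:
            disk += 1
            load = 0.0
        disks[disk].append(i)
        load += heats[i]
    return disks


# Hypothetical files as (size, access_rate).
files = [(5, 1.0), (1, 6.0), (3, 2.0), (2, 3.0), (4, 1.5), (6, 1.0)]
placement = sorted_partition(files, 3)
print(placement)  # [[1, 3], [2, 4], [0, 5]]
```

Each disk receives a contiguous segment of the size-sorted file list, which keeps the service time variance on each disk small while the loads stay close to the average.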

In the future, we will extend OSP and OSPOnline to data assignment by considering file replication and partitioning. Another possible direction is to study online data migration schemes that handle dynamic changes in file access characteristics.

REFERENCES

[1] T. Xie and Y. Sun, “A file assignment strategy independent of workload characteristic assumptions,” ACM Trans. Storage, vol. 5, no. 3, Nov. 2009.

[2] P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson, “RAID: High-performance, reliable secondary storage,” ACM Comput. Surv., vol. 26, no. 2, pp. 145–185, 1994.

[3] W. Dowdy and D. Foster, “Comparative models of the file assignment problem,” ACM Comput. Surv., vol. 14, no. 2, pp. 287–313, 1982.

[4] P. Scheuermann, G. Weikum, and P. Zabback, “Data partitioning and load balancing in parallel disk systems,” VLDB J., vol. 7, no. 1, pp. 48–66, 1998.

[5] J. Wolf, P. Yu, J. Turek, and D. Dias, “A parallel hash join algorithm for managing data skew,” IEEE Trans. Parallel Distrib. Syst., vol. 4, no. 12, Dec. 1993.

[6] L. W. Lee, P. Scheuermann, and R. Vingralek, “File assignment in parallel I/O systems with minimal variance of service time,” IEEE Trans. Comput., vol. 49, no. 2, pp. 127–140, 2000.

[7] D. K. Madathil, R. B. Thota, P. Paul, and T. Xie, “A static data placement strategy towards perfect load-balancing for distributed storage clusters,” in Proc. IEEE IPDPS’08, Apr. 2008, pp. 1–8.

[8] A. Verma and A. Anand, “On store placement for response time minimization in parallel disks,” in Proc. IEEE ICDCS’06, Jul. 2006, p. 31.

[9] Y. Zhu, Y. Yu, W. Y. Wang, S. S. Tan, and T. C. Low, “A balanced allocation strategy for file assignment in parallel I/O systems,” in Proc. IEEE NAS’10, Feb. 2010, pp. 257–266.

[10] N. Yao, J. Chen, and S. Cai, “A non-partitioning file assignment scheme with approximating average waiting time in parallel I/O system,” J. Software, vol. 8, no. 2, pp. 302–309, 2013.

[11] G. Alvarez, E. Borowsky, S. Go, T. H. Romer, R. Becker-Szendy, R. Golding, A. Merchant, M. Spasojevic, A. Veitch, and J. Wilkes, “Minerva: An automated resource provisioning tool for large-scale storage systems,” ACM Trans. Comput. Syst., vol. 23, no. 4, pp. 483–518, 2001.

[12] S. A. Weil, S. A. Brandt, E. L. Miller, and C. Maltzahn, “CRUSH: Controlled, scalable, decentralized placement of replicated data,” in Proc. ACM/IEEE Conf. Supercomput., Nov. 2006, p. 31.

[13] F. Isaila, J. G. Blas, J. Carretero, R. Latham, and R. Ross, “Design and evaluation of multiple-level data staging for blue gene systems,” IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 6, pp. 946–959, 2011.

[14] J. Tjioe, R. Widjaja, A. Lee, and T. Xie, “DORA: A dynamic file assignment strategy with replication,” in Proc. IEEE ICPP’09, Sep. 2009, pp. 148–155.

[15] T. Xie and Y. Sun, “Dynamic data reallocation in hybrid disk arrays,” IEEE Trans. Parallel Distrib. Syst., vol. 21, no. 9, pp. 1330–1341, 2010.

[16] R. L. Graham, “Bounds on multiprocessing timing anomalies,” SIAM J. Appl. Math., vol. 17, no. 2, pp. 416–429, 1969.

[17] B. Dong, X. Li, L. Xiao, and L. Ruan, “A file assignment strategy for parallel I/O system with minimum I/O contention probability,” Grid Distrib. Comput., vol. 261, pp. 445–454, 2011.

[18] A. Verma and A. Anand, “General store placement for response time minimization in parallel disks,” J. Parallel Distrib. Comput., vol. 67, no. 12, pp. 1286–1300, 2007.

[19] V. Almeda, M. Cesario, R. Fonseca, W. Meira, Jr., and C. Murta, “Analyzing the behaviour of a proxy server,” in Proc. 3rd Int. WWW Caching Workshop, 1998.

[20] Q. Zoll, Y. Zhu, and D. Feng, “A study of self-similarity in parallel I/O workloads,” in Proc. IEEE Symp. MSST’10, May 2010, pp. 1–6.

[21] [Online]. Available: http://omnetpp.org/