

658 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 5, NO. 4, AUGUST 1993

Continuous Retrieval of Multimedia Data Using Parallelism

Shahram Ghandeharizadeh, Member, IEEE, and Luis Ramos, Member, IEEE

Abstract-Multimedia information systems have emerged as an essential component of many application domains ranging from library information systems to entertainment technology. This is because these systems utilize a variety of human senses to provide an effective means of communicating information. However, most implementations of these systems (based on a workstation) cannot support a continuous display of high resolution audio and video data and suffer from frequent disruptions and delays termed hiccups. This is due to the low I/O bandwidth of the current disk technology, the high bandwidth requirement of multimedia objects, and the large size of these objects which requires them to be almost always disk resident. In this paper, we describe a parallel multimedia information system and the key technical ideas that enable it to support a real-time display of multimedia objects. These techniques are as follows. First, we decluster a multimedia object across several disk drives, enabling the system to utilize the aggregate bandwidth of multiple disks to retrieve an object in real-time. Second, the workload of an application is distributed evenly across the disk drives in order to maximize the processing capability of the system. To support simultaneous display of several multimedia objects for different users, we describe two alternative approaches. The first approach multitasks a disk drive among several requests while the second replicates the data and dedicates resources to each individual request. We investigate the trade-offs associated with each approach using a simulation model. Our results demonstrate the superiority of the replication approach.

Index Terms-Data placement, declustering, multimedia, multitasking, pipelining, shared-nothing architecture.

I. INTRODUCTION

DURING the past decade, information technology has evolved to store, retrieve, and manipulate multimedia data (e.g., text, audio, and video). Multimedia information systems utilize a variety of human senses to provide an effective means of conveying information. Already, these systems play a major role in educational applications, entertainment technology, and library information systems. As an example, consider Compton's Multimedia Encyclopedia from Britannica Software. According to the publisher [7], it includes the full text of the 19-volume, 5200-article, 8 784 000-word 1989 edition of the Compton's Encyclopedia; 15 800 pictures, maps, and diagrams; 60 minutes of recorded voice and sound; 45 animated sequences; Webster's Intermediate Dictionary; and Josten's word processing program. This is a large volume of information and there are many ways to query it. A typical query would involve the traversal of a hierarchical topic tree which establishes a connection between the relevant pieces of information. So, for example, one can traverse the historical time line (which is one of several topic trees) to hear John F. Kennedy's "Ask not what your country can do for you" speech.

Manuscript received June 1, 1992; revised December 1, 1992. This work was supported in part by the NSF under Grants IRI-9110522 and IRI-9203389, and by the NYI under Award IRI-9258362.

The authors are with the Department of Computer Science, University of Southern California, Los Angeles, CA 90089.

IEEE Log Number 9209926.

As suggested by this example, once the information is stored in a multimedia system, it almost always becomes a read-only database. Such a database can be compared to a library managing multiple books and users, where once a book is registered with the library, each user accesses the system to retrieve that book. While updates are rare, multimedia objects should be displayed in “real-time”. By “real-time”, we mean a continuous retrieval of an object at the bandwidth required by its media type. This is a challenging task because certain media types, in particular video, require very high bandwidths. For example, the bandwidth required by NTSC¹ for “network-quality” video is about 45 mb/s [16]. Recommendation 601 of the International Radio Consultative Committee (CCIR) calls for a 216-mb/s bandwidth for video objects [11]. Moreover, video objects are typically very large and almost always disk resident. Due to the low bandwidth of the current disk technology (typically rated at 10 mb/s), the stand-alone implementation of these systems (based on a workstation) suffers from frequent delays and disruptions, termed hiccups [27], while the system is displaying an object.

There are two reasons why this limitation is not likely to be resolved by itself. First, the speed of the magnetic disk drive,² which typically serves as the secondary storage device for these systems, is not expected to improve significantly in the near future [22].³ Second, the future calls for multimedia objects with a higher resolution, and hence, higher bandwidth requirements. For example, a video object based on the HDTV (High Definition Television) quality images requires approximately a 700 mb/s bandwidth.

Currently, there are several techniques to support a real-time display of these objects: 1) sacrifice the quality of the data by using either a lossy compression technique (e.g., predictive [3], frequency oriented [18], importance oriented [15], etc.) or a low resolution device, 2) employ the aggregate bandwidth of several disk drives by declustering an object across multiple disks [13], and 3) use a combination of the first two techniques. Lossy compression techniques encode data into a form that consumes a relatively small amount of space; however, when the data is decoded, it yields only a representation similar to the original (some data is lost). While lossy compression is effective, there are applications that cannot tolerate loss of data. As an example, consider the video signals collected from space. This information may not be compressed using a lossy compression technique (or a low resolution device). Otherwise, the scientists who later uncompress and analyze the data run the risk of either observing phenomena that do not exist due to a slight change in the data or missing important observations due to some loss of data.⁴

¹The US standard established by the National Television System Committee.

²In this paper, we focus on magnetic disk drives as the secondary storage device for multimedia systems. However, the concepts described in this paper are applicable to other secondary storage devices. We focus on magnetic disk drives because: 1) they are commonplace, 2) there are no misconceptions on how they operate.

³The raw seek time has only improved at a rate of 7% each year since 1980 [22].

1041-4347/93$03.00 © 1993 IEEE

GHANDEHARIZADEH AND RAMOS: CONTINUOUS RETRIEVAL

Fig. 1. A shared-nothing architecture. [Figure omitted: display stations.]

Fig. 2. Batch-order display of objects. [Figure omitted: User A and User B; Disks 1 to 5.]

This study focuses on parallel multimedia information systems that support real-time display for applications that cannot tolerate any loss in the quality of their data. In order to simplify the discussion, we assume a shared-nothing architecture [25] or a multicomputer architecture as our hardware platform. However, the algorithms described in this paper can be extended to other architectures (e.g., shared-disk or shared-memory). Briefly, a shared-nothing architecture consists of a number of processors interconnected by a high speed communication network such as a hypercube or a ring. A processor consists of a CPU, a disk drive, and some random access memory. Processors do not share disk drives or random access memory and can only communicate with one another by sending messages using an interconnection network. Furthermore, we assume that the stations used to display objects are independent of the backend processors that contain the data, as shown in Fig. 1. In this study, we focus on the I/O bottleneck phenomenon, and assume that the bandwidth of both the network and the network device driver exceeds the bandwidth requirement of an object. This assumption is justified considering the current technological trends.

Briefly, our techniques to support a continuous retrieval of multimedia objects are as follows. Assuming a fixed bandwidth for each disk ($B_{Disk}$) in the system, in order to support a real-time display of an object with bandwidth requirement $B_t$, the object is declustered into $M$ pieces where $M = \lceil B_t / B_{Disk} \rceil$ [13]. The fragments of an object are constructed using a round-robin assignment of its blocks to each processor, enabling the system to overlap the display of an object with its retrieval (multi-input pipelining).

⁴There are also lossless compression techniques (e.g., Huffman, Lempel-Ziv, etc.). While a good estimate for reduction in size with these techniques is anywhere from a factor of 2 to 15, with lossy techniques it ranges from a factor of 50 to 1000 [11]. Consequently, it is generally accepted that lossless compression techniques cannot support a real-time display of video objects.

Fig. 3. Disk multitasking. [Figure omitted: User A and User B; Disks 1 to 5.]

In the presence of multiple users, two users may request objects whose fragments reside on an overlapping set of processors (see Fig. 2). In this case, the display of one object is delayed until the display of the other object is completed (batch-order display). This paper describes two alternative approaches to enable the system to support simultaneous display of multiple objects. The first approach, termed disk multitasking, configures the system to support a fixed number of users ($U_{MAX}$) or simultaneous displays. It multitasks a disk drive by requiring it to retrieve a portion of each requested fragment (say a track) in one scan of the disk drive (instead of the entire fragment as with batch-order). Obviously, this introduces disk seeks and reduces the bandwidth observed by each request. To compensate for this, an object is declustered across a larger number of processors (see Fig. 3). In order to ensure that the bandwidth provided by each disk drive does not exceed the bandwidth requirement of an object in the presence of fewer than $U_{MAX}$ requests, the system can either introduce dummy requests or employ a dataflow control mechanism [23].

The second approach replicates the fragments of an object across several processors so that the system can use a fragment of an object that resides on an idle processor to service a request (see Fig. 4). This paper quantifies the trade-offs associated with the replication and disk multitasking approaches. Briefly, the obtained results are as follows. Disk multitasking wastes the bandwidth of each disk drive by causing it to perform expensive seek operations. The replication technique does not incur this overhead. Consequently, it can support a higher number of simultaneous displays. However, this technique requires the system to provide a large amount of disk storage in order to store multiple copies of the data. When the disk capacity of the system limits the number of replicas constructed, disk multitasking may outperform replication.

When the database consists of a mix of multimedia objects where each object has a different bandwidth requirement, there are several strategies for replicating the data. This paper describes three alternative replication strategies and quantifies their tradeoffs. The rest of this paper is organized as follows. Section II describes the work related to this study. Section III describes how the degree of declustering for an object is computed and introduces an algorithm to uniformly distribute the workload of an application across the processors. In Section IV, we describe the disk multitasking and the replication approach in more detail. In addition, this section describes three alternative strategies for replicating the data. Section V compares disk multitasking with the replication strategy, and evaluates the three replication strategies. Our conclusions and future research directions are contained in Section VI.

Fig. 4. Data replication. [Figure omitted: User A; Disks 1 to 5.]

II. RELATED WORK

In addition to the compression techniques, to the best of our knowledge, there exist two other possible approaches to support a continuous retrieval of multimedia data. This section provides a description of each and differentiates them from this study.

The first technique organizes objects of a multimedia application across the surface of a disk drive in order to maximize its bandwidth when retrieving objects [6], [10], [21], [28], [26]. The central idea is to reduce seek costs by clustering frequently accessed data together in locations on the storage device that are physically close. The complexity of approximating an optimal placement depends on the physical characteristics of a device (e.g., finding a solution is more complex for an optical disk as compared to a magnetic disk drive). This technique is effective only when the maximum bandwidth of a secondary storage device is higher than that required by multimedia objects (not the case with high resolution video data).

The second technique is to use a Redundant Array of Inexpensive Disks (RAID) [22] as a high bandwidth secondary storage device. The central idea behind RAID is to construct an I/O subsystem which consists of a disk controller and an array of disk drives. The controller distributes a file or a multimedia object across all the drives in order to provide a high data rate upon its retrieval (ideal for supercomputer applications). In order to minimize the probability of the data becoming unavailable in the presence of disk failures, RAID has introduced a taxonomy of five different organizations of disk arrays, beginning with mirrored disks [5] and progressing through a variety of alternatives with different performance and reliability characteristics.

For multimedia systems, a limitation of RAID is its lack of control on the placement of the data. Consequently, one cannot control the rate at which the I/O subsystem produces data. For example, consider a 2-h video object with a 45 mb/s bandwidth requirement (this object is 40 gigabytes in size). Ideally, the I/O subsystem should produce this object at a continuous bandwidth of 45 mb/s for a period of two hours. However, if a RAID I/O subsystem consists of 100 disks, each disk with a bandwidth of 10 mb/s, then the data will be produced at a rate of 1000 mb/s, overflowing the memory buffers of a display station. Finally, it is unclear whether RAID can support a real-time display of several objects simultaneously. While the approaches presented in this paper are targeted toward a shared-nothing architecture, they can be extended to RAID to enable it to support a multimedia information system.

Fig. 5. Round-robin partitioning of object x. [Figure omitted.]
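The arithmetic behind the RAID example can be checked with a short sketch. It assumes that "mb/s" denotes megabits per second and that a gigabyte is 1000 megabytes; the function names are illustrative and do not come from the paper.

```python
# Sketch of the RAID example's arithmetic.
# Assumptions (not from the paper): "mb" means megabit, 1 gigabyte = 1000 megabytes.

def object_size_gigabytes(bandwidth_mbps, duration_hours):
    """Size of an object displayed at bandwidth_mbps for duration_hours."""
    megabits = bandwidth_mbps * duration_hours * 3600
    return megabits / 8 / 1000


def aggregate_rate_mbps(num_disks, disk_bandwidth_mbps):
    """Rate produced when every disk of the array transfers at full speed."""
    return num_disks * disk_bandwidth_mbps


video_size = object_size_gigabytes(45, 2)   # roughly 40 gigabytes
raid_rate = aggregate_rate_mbps(100, 10)    # 1000 mb/s
oversupply = raid_rate / 45                 # how far the array overshoots the display rate

print(f"{video_size:.1f} GB, {raid_rate} mb/s, {oversupply:.1f}x the required rate")
```

Under these assumptions the array delivers data more than twenty times faster than the display station consumes it, which is the buffer overflow problem the text describes.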

III. SINGLE USER DISPLAY OF MULTIMEDIA DATA

In order to retrieve objects of an application continuously at their required bandwidth, we decluster [20], [24] each object into several fragments (the number of fragments is termed the degree of declustering) and distribute these fragments across multiple processors. By combining the I/O bandwidth of several processors, the system can provide a retrieval rate equivalent to that required to display a multimedia object in real-time. Assuming that the bandwidth ($B$) required to display an object $x$ which belongs to media type $t$ is $B_t$, and the bandwidth of each disk drive is $B_{Disk}$, object $x$ is declustered across $M$ processors in order to match the required bandwidth, where $M$ is defined as:

$$M = \left\lceil \frac{B_t}{B_{Disk}} \right\rceil \tag{1}$$

Note that M is a function of the media type that an object belongs to (i.e., objects of the same media type have an identical degree of declustering). As long as the degree of declustering of each media type does not exceed the number of processors (P), the system can display all objects of an application at the required bandwidth in a single user environment.
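As a small illustration of (1), the sketch below computes the degree of declustering for the three video bandwidths quoted in the Introduction, using the 10 mb/s disk rating also quoted there. The function names and the processor count of 32 are assumptions made for the example only.

```python
import math


def degree_of_declustering(b_t, b_disk):
    """Equation (1): smallest number of disks whose combined bandwidth covers b_t."""
    if b_t <= 0 or b_disk <= 0:
        raise ValueError("bandwidths must be positive")
    return math.ceil(b_t / b_disk)


def displayable(b_t, b_disk, num_processors):
    """True when the degree of declustering does not exceed the processor count P."""
    return degree_of_declustering(b_t, b_disk) <= num_processors


B_DISK = 10  # mb/s, the disk rating quoted in the Introduction
for name, b_t in [("NTSC", 45), ("CCIR 601", 216), ("HDTV", 700)]:
    m = degree_of_declustering(b_t, B_DISK)
    # num_processors=32 is a hypothetical system size
    print(name, m, displayable(b_t, B_DISK, num_processors=32))
```

With these inputs, an HDTV object would need more disks than the hypothetical 32-processor system provides, so it could not be displayed in real-time even for a single user.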

Once the degree of declustering ($M$) for an object $x$ is determined, its fragments are formed using a round-robin partitioning strategy as illustrated in Fig. 5. With this strategy, an object is partitioned into blocks of disk page size and fragments are constructed by round-robin assignment of these blocks. This partitioning strategy allows a display station to construct and display a portion of $x$ simultaneously with the retrieval of its remaining portion from the $M$ processors (i.e., pipelining). Once a fragment has been formed, it is assigned to a processor and stored contiguously on its disk in order to minimize the number of seeks required to retrieve the fragment.
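The round-robin strategy can be sketched in a few lines. This is an illustrative model only: blocks are represented by integers, and the names `decluster` and `pipelined_display` are hypothetical.

```python
def decluster(blocks, m):
    """Round-robin partitioning: block k goes to fragment k mod m."""
    return [blocks[k::m] for k in range(m)]


def pipelined_display(fragments):
    """Yield blocks in display order, one round of up to m blocks at a time.

    Each round needs only the next block of every fragment, so display can
    start before the fragments have been read in full.
    """
    longest = max(len(fragment) for fragment in fragments)
    for round_number in range(longest):
        for fragment in fragments:
            if round_number < len(fragment):
                yield fragment[round_number]


blocks = list(range(11))          # 11 page-sized blocks of a hypothetical object x
fragments = decluster(blocks, 3)  # M = 3
restored = list(pipelined_display(fragments))
```

Because consecutive blocks live on different processors, the display station consumes one block from each fragment per round, which is what allows retrieval and display to overlap.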

Note that lossless compression techniques can be used to minimize the number of processors required to display a multimedia object (i.e., compression complements parallelism). An object $x$ can first be compressed and then declustered. Upon its retrieval, a display station can reconstruct a piece of $x$, uncompress it, and then display it. Compression techniques reduce the continuous bandwidth required to display an object (i.e., they reduce $B_t$ for object $x$), and hence, minimize its degree of declustering ($M$). However, it is generally accepted that for high quality video and animated sequences, the current lossless compression techniques cannot minimize $M$ to one processor.

3.1. Assignment of Fragments to Processors

The objects of an application should be assigned to the processors such that: 1) the fragments of each object are assigned to different processors, i.e., for each object $x$ declustered into $M$ fragments $(x_1, x_2, \ldots, x_i, \ldots, x_M)$, $location(x_i) \neq location(x_j)$ for distinct $i$ and $j$, and 2) each processor has approximately the same workload, i.e., $workload(P_i) \approx workload(P_j)$, for distinct $i$ and $j$. The second requirement maximizes the processing capability of a partition by avoiding the formation of hot spots and bottleneck processors [12]. Assuming that the frequency of access and the size of each object $x$ (termed $heat(x)$ and $size(x)$ respectively [8]) are provided, we define the work imposed on a processor by this object as

$$work(x) = heat(x) \times size(x). \tag{2}$$

The work imposed by each fragment of $x$ is:

$$work(x_i) = heat(x) \times size(x_i) = heat(x) \times \frac{size(x)}{\left\lceil B_t / B_{Disk} \right\rceil}. \tag{3}$$

The workload of a processor is defined as the total work of the fragments (say $N$) assigned to it (i.e., $workload(P_i) = \sum_{j=1}^{N} heat(fragment_j) \times size(fragment_j)$). The assignment algorithm of Fig. 6 satisfies the first objective and approximates an optimal solution for the second objective. In this algorithm, the assignment of the $M$ fragments of an object to the first $M$ processors in list $L$ ensures that the fragments are assigned to different processors. After the algorithm updates the workload of these processors, it inserts each one into the list $L$ in a sorted order to approximate a uniform distribution of workload across the processors.

IV. SIMULTANEOUS DISPLAYS OF SEVERAL MULTIMEDIA OBJECTS

In the presence of multiple users, two users may request objects whose fragments reside on an overlapping set of processors (see Fig. 2). This section describes disk multitasking and data replication as two alternative approaches to support simultaneous retrieval of multimedia objects.

WorkOut(V, L): (* Obtain an assignment of the objects in set V across the processors in list L. *)
(1) initialize the workload of each processor in L to the total work of the fragments already assigned to it, if any;
(2) sort the processors in list L based on their workload, with the lightest processor appearing first;
(3) for each object x in V do
        decluster x into M fragments;
        remove the first M processors from list L;
        assign the M fragments of x to these processors;
        adjust the workload of the M processors based on the work imposed by fragment x_i;
        insertion sort each processor back into list L based on its workload;
    end do;

Fig. 6. An algorithm for the assignment of fragments to processors.
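The assignment algorithm of Fig. 6 can be rendered as a runnable sketch. It is an interpretation rather than the authors' implementation: it assumes no fragments are preassigned, represents the list L as sorted (workload, processor) tuples, breaks workload ties by processor number, and uses hypothetical object tuples of the form (name, heat, size, B_t).

```python
import bisect
import math


def assign_fragments(objects, num_processors, b_disk):
    """Greedy assignment in the spirit of Fig. 6.

    objects: iterable of (name, heat, size, b_t) tuples (a hypothetical format).
    Returns (placement, workload): the processors holding each object's
    fragments, and the final workload of every processor.
    """
    # Step 1: assume nothing is preassigned, so every workload starts at zero.
    workload = [0.0] * num_processors
    # Step 2: list L, lightest processor first.
    candidates = sorted((workload[p], p) for p in range(num_processors))
    placement = {}
    # Step 3: place each object on the M lightest processors.
    for name, heat, size, b_t in objects:
        m = math.ceil(b_t / b_disk)
        if m > num_processors:
            raise ValueError(f"object {name!r} needs {m} processors")
        chosen, candidates = candidates[:m], candidates[m:]
        work_per_fragment = heat * size / m      # equation (3)
        placement[name] = [p for _, p in chosen]
        for load, p in chosen:
            workload[p] = load + work_per_fragment
            bisect.insort(candidates, (workload[p], p))  # sorted reinsertion
    return placement, workload


objects = [("a", 1.0, 10.0, 20), ("b", 2.0, 6.0, 30), ("c", 1.0, 4.0, 10)]
placement, workload = assign_fragments(objects, num_processors=4, b_disk=10)
```

Removing the first M entries of the sorted list before assigning guarantees that the fragments of one object land on distinct processors, while the sorted reinsertion keeps the lightest processors at the front for the next object.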

4.1. Disk Multitasking

This approach configures the system to support a fixed number of users ($U_{MAX}$) or simultaneous displays by multitasking each disk drive with these requests (see Fig. 3). Obviously, this introduces disk seeks and reduces the bandwidth observed by each request. In order to compensate for this, an object is declustered across a larger number of processors. In the presence of fewer than $U_{MAX}$ requests, the system introduces dummy requests in order to ensure a steady bandwidth from each disk drive which does not exceed the bandwidth requirement of an object.

In order to compute the degree of declustering for each object we need to compute the bandwidth provided by each disk drive when it is multitasked with $U_{MAX}$ requests (this bandwidth is termed $B_U$). This computation is dependent on the scheduling policy used by the disk drive to service the pending requests. We assume a Round-Robin CSCAN disk scheduling policy. With this scheduling policy, the disk head scans the entire disk in one direction starting from the outermost cylinder. As the head encounters a pending request for a fragment, it reads a portion (say a track) of this fragment and continues scanning the disk to service other requests. Upon reaching the innermost cylinder, the disk head is reset to the outermost cylinder and another scan begins. Thus, a request for a fragment is serviced by multiple scans of a disk drive. We chose this scheduling policy primarily because it minimizes the variance in the data retrieval rate of a disk drive.
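A minimal sketch of the Round-Robin CSCAN policy follows, under simplifying assumptions that are not from the paper: each request is pinned to the cylinder where its fragment starts, cylinder 0 is the outermost cylinder, and exactly one track is read per request per scan. The request format is hypothetical.

```python
def round_robin_cscan(requests):
    """Simulate scans of one disk.

    requests: dict mapping a request name to (start_cylinder, tracks_to_read).
    Returns a list of scans; each scan lists the requests serviced, in the
    order the head meets them while moving from cylinder 0 inward.
    """
    remaining = {name: tracks for name, (_, tracks) in requests.items()}
    head_order = sorted(requests, key=lambda name: requests[name][0])
    scans = []
    while any(remaining.values()):
        serviced = []
        for name in head_order:
            if remaining[name] > 0:
                remaining[name] -= 1      # read one track of this fragment
                serviced.append(name)
        scans.append(serviced)            # head resets to cylinder 0
    return scans


schedule = round_robin_cscan({"a": (700, 2), "b": (100, 3), "c": (400, 1)})
```

Every pending request receives the same amount of data in each scan, which is why the policy keeps the variance of the retrieval rate low.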

With this scheduling policy, the bandwidth of a multitasked disk with $U_{MAX}$ requests is estimated by:

$$B_U = \frac{PT}{t_{scan}} \tag{4}$$

where $PT$ is the portion of a fragment transferred in each scan of the disk drive per request, and $t_{scan}$ is the time to perform one scan of the disk. If the locations of the requested fragments are equally spaced, the time to service each request is approximately the same. The exception is the first request, which resides on the outermost cylinder: its seek takes significantly longer because the disk head is reset from the innermost to the outermost cylinder. Consequently, $t_{scan}$ is estimated by

$$t_{scan} = (U_{MAX} - 1)\left[ seek\_time\!\left(\frac{C}{U_{MAX} - 1}\right) + settle\_time + rotate\_time + transfer\_time \right] + \left[ seek\_time(C) + settle\_time + rotate\_time + transfer\_time \right] \tag{5}$$

where $seek\_time(C)$ is the time to seek $C$ cylinders, $settle\_time$ is the time required by the head to stabilize its position after performing a seek, $rotate\_time$ is the disk rotation time needed to bring the start of the data underneath the disk head, and $transfer\_time$ is the time to transfer the data stored on one track of the disk. References [4] and [14] estimate the time to seek $C$ cylinders to be:

$$seek\_time(C) = seek\_factor \times \sqrt{C} \tag{6}$$

where $seek\_factor$ is a device dependent parameter that can either be provided by the manufacturer or obtained by experimentation. We use this formula to approximate the seek times in (5):

$$t_{scan} = (U_{MAX} - 1) D + seek\_time(C) \sqrt{U_{MAX} - 1} + \left( D + seek\_time(C) \right) \tag{7}$$

where

$$D = settle\_time + rotate\_time + transfer\_time.$$

The degree of declustering for a multitasked disk is defined as:

$$M_U = \left\lceil \frac{B_t}{B_U} \right\rceil$$

The number of processors ($P$) is an upper bound on $M_U$. If $M_U$ exceeds $P$, then $U_{MAX}$ should be reduced in order to ensure that $M_U$ is less than or equal to $P$.
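The chain from (4) to the degree of declustering can be traced numerically. The sketch below evaluates both the expanded form (5) and the simplified form (7) of the scan time so that their agreement can be checked. All disk parameters are hypothetical values chosen for illustration, with times in seconds and data in megabits; none of them come from the paper.

```python
import math


def seek_time(cylinders, seek_factor):
    """Equation (6): seek time grows with the square root of the distance."""
    return seek_factor * math.sqrt(cylinders)


def scan_time_expanded(u_max, c, seek_factor, d):
    """Equation (5); d = settle_time + rotate_time + transfer_time."""
    first = seek_time(c, seek_factor) + d       # head reset across the whole disk
    if u_max == 1:
        return first                            # the (u_max - 1) term vanishes
    spaced = seek_time(c / (u_max - 1), seek_factor) + d
    return (u_max - 1) * spaced + first


def scan_time_simplified(u_max, c, seek_factor, d):
    """Equation (7)."""
    full = seek_time(c, seek_factor)
    return (u_max - 1) * d + full * math.sqrt(u_max - 1) + (d + full)


def multitasked_degree(b_t, pt, t_scan):
    """Equation (4) followed by M_U = ceil(B_t / B_U)."""
    b_u = pt / t_scan
    return math.ceil(b_t / b_u)


# Hypothetical disk: 900 cylinders, 0.5 ms seek factor, 2 ms settle,
# 8 ms rotate, 16 ms to transfer one track of 0.32 megabits.
D = 0.002 + 0.008 + 0.016
t5 = scan_time_expanded(4, 900, 0.0005, D)
t7 = scan_time_simplified(4, 900, 0.0005, D)
m_u = multitasked_degree(b_t=45, pt=0.32, t_scan=t5)
```

For this hypothetical disk, multitasking four requests raises the degree of declustering of a 45 mb/s object well above the single-user value given by (1), which illustrates the cost of the additional seeks.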

In the presence of fewer than $U_{MAX}$ requests, we introduce dummy requests in order to bring the number of pending requests to $U_{MAX}$ and ensure a constant bandwidth from each disk drive. Dummy requests waste system resources; an alternative might be to employ a dataflow control mechanism to control the bandwidth of a disk drive without wasting it [1], [2]. However, this topic is not our focus of attention and we chose the simplest possible approach in order to demonstrate the tradeoffs associated with disk multitasking. Note that this decision does not impact the performance results presented in Section 5.3.

4.1.1. Dummy Requests: If the system is to support $U_{MAX}$ active requests, $U_{MAX}$ dummy requests $(0, 1, \ldots, U_{MAX} - 1)$ are activated on each disk at system generation time. These requests are uniformly distributed across the surface of a disk drive. If a disk has $C$ tracks, the $i$th dummy request will retrieve the data residing at track number $\lfloor \frac{C}{U_{MAX} - 1} (i - 1) \rfloor$. Once this dummy request is serviced, its data is discarded and a new dummy request is regenerated for that same address while the head proceeds to service the next dummy request.
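The placement rule for dummy requests can be sketched as follows. The sketch assumes that the index i runs from 1 to the maximum number of requests, so that the first dummy request lands on track 0; this indexing is an interpretation of the formula, and the function name is hypothetical.

```python
import math


def dummy_request_tracks(num_tracks, u_max):
    """Track numbers of the dummy requests, evenly spread over the disk.

    Assumes i = 1..u_max in floor(C / (u_max - 1) * (i - 1)).
    """
    if u_max < 2:
        raise ValueError("the spacing formula needs at least two requests")
    spacing = num_tracks / (u_max - 1)
    return [math.floor(spacing * (i - 1)) for i in range(1, u_max + 1)]


tracks = dummy_request_tracks(num_tracks=1000, u_max=5)
```

Under this reading the last dummy request falls on track number C itself, so an implementation that numbers tracks from 0 to C - 1 would presumably need to clamp that final value.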

When a genuine request is issued for a fragment whose starting address is track number j, the system replaces the dummy request that retrieves the track closest to j with this request. Since this request is serviced by several scans of the disk drive, a dummy request is not activated until the requested fragment is retrieved in its entirety. If upon the arrival of a request, there are no pending dummy requests (i.e., the disk is utilized by UMAX requests), this request waits in a queue until a dummy request becomes available (described in more detail in Section 5.1).
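The placement of dummy requests and the substitution rule for a genuine request can be sketched as follows. This is a minimal illustration, not the simulator's code: the function names are hypothetical, and the sketch assumes that i ranges over 1, ..., UMAX so that the first dummy request sits at track 0.

```python
import math


def dummy_tracks(num_tracks, u_max):
    """Track addresses of the UMAX dummy requests, spread over the disk.

    Uses floor((C / (UMAX - 1)) * (i - 1)) with i = 1 .. UMAX
    (assumed indexing); requires u_max >= 2.
    """
    spacing = num_tracks / (u_max - 1)
    return [math.floor(spacing * (i - 1)) for i in range(1, u_max + 1)]


def closest_dummy(tracks, start_track):
    """Index of the dummy request whose track is nearest to start_track."""
    return min(range(len(tracks)), key=lambda k: abs(tracks[k] - start_track))


# Hypothetical disk with 1000 tracks configured for five active requests.
tracks = dummy_tracks(1000, 5)
victim = closest_dummy(tracks, 620)  # a genuine request starting at track 620
```

With these hypothetical numbers the dummy requests are spaced 250 tracks apart, and the genuine request replaces the dummy request at track 500.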

The replication approach partitions the processors into groups and assigns a replica of the fragments of an object to each group. Assuming a system with P processors and a database consisting of one media type with M as its degree of declustering, the processors can be partitioned into RMAX = ⌊P/M⌋ groups, with a replica of the database assigned to each partition, enabling the system to support RMAX simultaneous displays of the objects.
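As a quick check of this definition, the following sketch computes RMAX for the two configurations evaluated in Section 5.3 (P = 70 with M = 14, and P = 70 with M = 4). The function name is an assumption; rounding down when P is not a multiple of M matches the value RMAX = 17 reported for the second configuration.

```python
def max_replicas(num_processors, degree_of_declustering):
    """RMAX: number of disjoint groups of M processors that fit in P."""
    if degree_of_declustering <= 0:
        raise ValueError("degree of declustering must be positive")
    return num_processors // degree_of_declustering


r_first = max_replicas(70, 14)   # first experiment of Section 5.3
r_second = max_replicas(70, 4)   # second experiment of Section 5.3
```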

The disk capacity of a partition may not be large enough to store a copy of the database. In this case, the processors are partitioned into fewer groups. This reduces the number of processor partitions and simultaneous displays of multimedia objects. In order to maximize the throughput of each such partition, the objects of an application can be assigned to it using the algorithm described in Fig. 5.

When the database consists of a mix of media types with a different bandwidth requirement (and degree of declustering) for each media type, there are three alternative techniques for replicating the data: 1) Media Independent Partitioning (MIP), 2) Hierarchical Partitioning (HiP), and 3) MEdia Type partitioning (MET). These strategies result in a different number of replicas and degree of interaction among the objects of different media types. Below, we describe each strategy in detail assuming that the storage capacity of a partition is large enough to store a copy of the database.⁵ For illustration purposes, we describe how each strategy organizes the objects of two different media types A and B across a 16 processor system with MA = 5 and MB = 2.

4.2.1. Media Independent Partitioning (MIP): Assuming that the database consists of n media types, this strategy analyzes the database and groups objects of the same media type together. Next, it partitions the processors using the degree of declustering of each media type and replicates the objects of that media type across each processor group. This strategy constructs the maximum number of possible replicas per media type. It also results in the highest degree of interaction among the objects of the different media types and suffers from the following limitation. The display of objects of one media type B may render two replicas of another media type A useless when MA > MB, resulting in external processor fragmentation.⁶ In our example, MIP organizes objects of media type A and B as shown in Fig. 7(a). Group B3 overlaps two replicas of media type A, and if it is used to service a request, then groups A1 and A2 are rendered useless.⁷

⁵ Investigating the trade-offs of each approach when this assumption is violated is a part of our future research direction.

⁶ This concept is similar to the external memory fragmentation that occurs in operating systems.

⁷ While a subset of the idle processors (1, 2, 3, 4, 7, 8, 9, and 10) may contain a replica of objects of media type A, our current design cannot make use of this replica. This is because we assume that the system maintains the availability of each replica of a media type (and NOT the fragments). If it maintained the availability of each fragment and its replicas, then it might have been able to utilize some of these idle processors to service a request for media type A. Section VI contains a discussion of such a technique, its overheads, and its impact on the obtained results.


Fig. 7. Three replication strategies: (a) MIP, (b) HiP, (c) MET.

When P is not a multiple of the degree of declustering of a media type, MIP results in residual processors. In our example, there is one residual processor for media type A which is incorporated into A3. The workload of an application can be distributed evenly across the processors of this partition using the algorithm of Fig. 6. Note that residual processors do not impact the degree of declustering of an object, only the placement of its fragments.
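The MIP layout of the running example (16 processors, MA = 5, MB = 2) and the source of external processor fragmentation can be reproduced with a short sketch. The helper names are hypothetical, and the sketch assumes that processors are numbered 1 to 16 and that residual processors are folded into the last group, as the text does for A3.

```python
def partition(processors, m):
    """Split a list of processors into consecutive groups of size m.

    Residual processors (fewer than m) are folded into the last group.
    """
    count = len(processors) // m
    groups = [processors[k * m:(k + 1) * m] for k in range(count)]
    residual = processors[count * m:]
    if groups and residual:
        groups[-1] = groups[-1] + residual
    return groups


def straddling_groups(small_groups, large_groups):
    """1-based numbers of the small groups that span two large groups."""
    owner = {p: idx for idx, g in enumerate(large_groups) for p in g}
    result = []
    for idx, group in enumerate(small_groups):
        if len({owner[p] for p in group}) > 1:
            result.append(idx + 1)
    return result


processors = list(range(1, 17))       # 16 processors, numbered 1..16
groups_a = partition(processors, 5)   # MA = 5
groups_b = partition(processors, 2)   # MB = 2
fragmenting = straddling_groups(groups_b, groups_a)
```

Under these assumptions only B3 (processors 5 and 6) spans two replicas of media type A, which is the group the text singles out.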

4.2.2. Hierarchical Partitioning (HiP): The Hierarchical Partitioning (HiP) strategy eliminates external processor fragmentation by avoiding the overlap of media type B across two replicas of media type A when MA > MB. This is achieved as follows. First, it ranks the media types according to their degree of declustering. Next, starting with the media type that has the highest degree of declustering, HiP partitions the processors into groups (termed set A) and replicates each object of this media type across each group. Subsequently, it partitions the processors of each group in set A for the next media type and replicates the objects of that media type in each subgroup. This process is repeated recursively until all objects of the different media types have been replicated and assigned. Fig. 8 presents a recursive function that implements this strategy. Assuming a system with P processors and an application with MAXMEDIA types of objects organized using an array termed MEDIAMIX in a decreasing order of degree of declustering, an initial call is made with HiP(1, P), where 1 refers to the first element of the array. In our example, HiP forms three groups for media type A, and for each of these groups, it forms as many subgroups as possible for media type B (see Fig. 7(b)).

HiP does not suffer from external processor fragmentation; however, it suffers from a complementary limitation which we term internal processor fragmentation. This limitation arises when the degree of declustering for one media type is not a multiple of that of another, resulting in residual processors for the second media type. In our example, groups A1 and A2 are each partitioned into two groups for media type B, each leaving one residual processor which has been incorporated into a group for media type B. The residual processors cause HiP to construct fewer replicas of objects of media type B as compared to the MIP strategy.


HiP(t, P): (* Replicate the objects of the media type with rank t over the processors in set P. *)
    if t > MAXMEDIA then
        return;
    else begin
        M_t = degree of declustering of objects of MEDIAMIX[t];
        form a set V_t consisting of all objects with a degree of declustering M_t;
        compute the number of replicas R = ⌊|P| / M_t⌋;
        partition the processors in P into R groups P_1, P_2, ..., P_i, ..., P_R;
        for each group P_i, 1 ≤ i ≤ R do
            WorkDist(V_t, P_i);
            HiP(t + 1, P_i);
        end do
    end if

Fig. 8. HiP strategy.
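The recursion of Fig. 8 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the array index is 0-based instead of 1-based, the call to WorkDist is replaced by simply recording the group, residual processors are folded into the last group of each partition, and all names are hypothetical.

```python
def partition(processors, m):
    """Split processors into groups of size m; residuals join the last group."""
    count = len(processors) // m
    groups = [processors[k * m:(k + 1) * m] for k in range(count)]
    residual = processors[count * m:]
    if groups and residual:
        groups[-1] = groups[-1] + residual
    return groups


def hip(t, processors, media_mix, placement):
    """Recursive HiP sketch.

    media_mix is a list of (name, degree_of_declustering) pairs sorted in
    decreasing order of degree of declustering; t is a 0-based index.
    """
    if t >= len(media_mix):
        return
    name, m = media_mix[t]
    for group in partition(processors, m):
        # Recording the group stands in for WorkDist(V_t, P_i).
        placement.setdefault(name, []).append(group)
        hip(t + 1, group, media_mix, placement)


placement = {}
hip(0, list(range(1, 17)), [("A", 5), ("B", 2)], placement)
```

For the 16-processor example this yields three groups for media type A and seven for media type B, one fewer than the eight groups of two processors that MIP can form, which illustrates internal processor fragmentation.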

4.2.3. MEdia Type Partitioning (MET): This strategy eliminates interactions between the different media types by partitioning the processors into disjoint groups and dedicating each group to the retrieval of a single media type (see Fig. 7(c)). The number of processors allocated to a media type is proportional to its frequency of access. By allocating more processors to the most frequently accessed media type, MET can construct more replicas of this media type and provide a lower average response time. However, a media type with a high bandwidth requirement has a higher degree of declustering, and hence, requires a larger number of processors (regardless of its frequency of access); consequently, the degree of declustering for a media type is the lower bound on the number of processors dedicated to that media type. (Otherwise, the allocated processors would be useless as they cannot support a continuous retrieval of that object.) Thus, the number of processors assigned to media type i (1 ≤ i ≤ n) is defined as

(9)

where H_i represents the total heat of the objects of media type i (i.e., the sum of heat(x) over every object x of media type i). Consequently, the number of replicas for objects of media type i is:

$$R_i = \left\lfloor \frac{P_i}{M_i} \right\rfloor \qquad (10)$$

If P_i is not a multiple of M_i, then the formation of processor groups for media type i will result in P_i mod M_i residual processors. MET combines the residual processors from each media type to form additional processor groups. The total number of residual processors is:

$$\text{Total Residue} = \sum_{i=1}^{n} (P_i \bmod M_i) \qquad (11)$$

An algorithm for assigning media types to these residual processors is shown in Fig. 9. This algorithm allocates all the residual processors to the media types with the highest frequency of access.
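The replica count, the total residue, and the grouping of residual processors can be sketched together. The sketch takes the per-media-type processor counts as given inputs, since it does not attempt to reproduce (9). The function names, the numbers in the example, and the guard that stops the loop once no media type remains in the list are assumptions introduced for illustration.

```python
from collections import deque


def replicas_and_residue(p, m):
    """Replicas per media type and the total number of residual processors.

    p[i] is the number of processors allocated to media type i and m[i]
    is its degree of declustering.
    """
    replicas = [pi // mi for pi, mi in zip(p, m)]
    residue = sum(pi % mi for pi, mi in zip(p, m))
    return replicas, residue


def assign_residue(q, m, heat):
    """Sketch of Fig. 9: turn q residual processors into extra groups.

    Returns the number of additional groups granted to each media type.
    """
    extra = [0] * len(m)
    m_min = min(m)
    # Media types in descending order of total heat.
    order = deque(sorted(range(len(m)), key=lambda i: heat[i], reverse=True))
    while q >= m_min and order:
        i = order.popleft()
        if q >= m[i]:
            q -= m[i]
            extra[i] += 1
            order.append(i)  # revisit after the other media types
        # Otherwise media type i no longer fits and is dropped from the list.
    return extra


# Hypothetical allocation: 33 and 27 processors, with M = 5 and M = 4.
replicas, residue = replicas_and_residue([33, 27], [5, 4])
extra = assign_residue(residue, [5, 4], heat=[0.6, 0.4])
```

In this hypothetical case each media type receives six replicas, six processors are left over, and the hotter media type gains one additional group of five.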

Assign-Residue(Q): (* Form additional groups of processors from the set of residual processors Q. *)
    M_min := the degree of declustering of the media type with the lowest degree of declustering;
    compute the total heat H_i for each media type i, 1 ≤ i ≤ n;
    form an ordered list L of media types sorted in a descending order of H_i;
    repeat
        let i := first media type in list L;
        if |Q| ≥ M_i then
            remove media type i from L and insert at the end of list L;
            remove M_i processors from Q and allocate to media type i;
        else
            remove media type i from L;
        end if
    until |Q| < M_min

Fig. 9. An algorithm for grouping residual processors.

V. PERFORMANCE EVALUATION

We compared disk multitasking with the replication approach and quantified the performance trade-offs associated with the alternative replication approaches using a simulation model of a parallel information system. This model is an extended version of a verified simulation model of the Gamma [9] database machine with an Intel iPSC/2 hypercube as its underlying hardware platform [17]. It is implemented using the DeNet [19] simulation language. We start by describing the simulation model. Next, we compare disk multitasking with replication. Finally, we evaluate the alternative replication strategies.

Fig. 10. Simulation model.

5.1. Simulation Model

Fig. 10 presents the overall organization of the simulation model. The rectangles correspond to the four main elements of the multimedia system: the Display Station, Centralized Scheduler, Interconnection Network, and Processor. Each Display Station, Scheduler and Processor consists of a Network Interface Manager. In addition, each Processor consists of a Disk Manager, and a Memory Manager. Each display station consists of a Terminal which generates the workload of the system; and a Device Driver, which groups fragments and displays them.

TABLE I
SIMULATION PARAMETERS

Network
    Packet size                                   8 kilobytes
    Avg network latency for 100 byte packet       0.6 ms
    Avg network latency for 8 kilobyte packet     5.6 ms

Disk
    Average settle time                           2 ms
    Average rotational latency                    8.33 ms
    Seek factor                                   0.78 ms
    Disk page size                                18 kilobytes
    Disk transfer rate                            14.4 mb/s

CPU
    Instructions/second                           4 000 000
    Read 18 kilobyte disk page                    32 800 instructions
    Transmit/receive 100 byte packet              160 instructions
    Transmit/receive 8 kilobyte packet            13 120 instructions
    Xfer one disk page from SCSI to memory        9000 instructions

Media
    Device driver bandwidth                       45 mb/s
    Video scanning rate                           30 frames per second

The centralized scheduler consists of an Object Manager which maintains the individual multimedia objects and the identity of the processors containing their relevant fragments. Each Display Station's request is first dispatched to the centralized scheduler. The centralized scheduler is in charge of activating a request across the processors with the relevant data. It uses a First Come First Serve (FCFS) policy to activate requests with: 1) disk multitasking, 2) batch-order, and 3) the data replication approach when the database consists of a single media type.

The Network Interface Manager enforces an FCFS policy for access to the global communications network. The Network module currently models a fully connected network and distinguishes between two types of packets: 1) control packets (for requesting an object), and 2) normal data packets (containing a portion of a fragment). Control packets are generally less than 100 bytes long while the data packets are fairly large. The Intel iPSC/2 hypercube can transmit small packets faster than large packets (using fewer CPU instructions) [9]. The ratings for the network bandwidth and the network interface are based on the specification of the hypercube (see Table I).

5.2. System Characteristics

The values for the disk settle time, latency time, transfer rate, and disk capacity are based on the MAXTOR 4380 (5 1/4 in) disk drive. We chose the characteristics of this disk drive because it is common with the Intel iPSC/2 hypercube. The disk seek time is approximated by multiplying the seek factor by the square root of the number of tracks to seek [4], [14]. We use a disk page size of 18 kilobytes. In order to accurately reflect the iPSC/2 hypercube, the Disk Manager interrupts the CPU when there are bytes to be transferred from the I/O channel's FIFO buffer to memory or vice versa.


Fig. 11. Performance of the alternative approaches for a single media database: (a) response time (seconds); (b) throughput (objects displayed per minute).

The CPU module enforces an FCFS nonpreemptive scheduling paradigm on all requests.

The microprocessor used by the processor nodes of the hypercube can execute instructions at a rate of 4 million machine instructions/s. The resource managers at the Processor Node require CPU cycles to perform their functions. Reading an 18-kilobyte disk page requires about 32 800 instructions, while transmitting or receiving a 100-byte and an 8-kilobyte packet requires 160 and 13 120 instructions, respectively. These characteristics are based on the measurements from the Intel iPSC/2 hypercube.

A user employs a display station to request the display of an object. Throughout the rest of this paper, the term user is used interchangeably with display station. We assume that a display station can display only one object at a time. In our experiments, we varied the number of display stations from one to 40 in order to vary the system load. We assumed a closed system where once a display station issues a request, it does not issue another until the first one is serviced. We also assume a zero think time between the requests. This parameter was chosen in order to stress the system and investigate its limitations. We assume a uniform distribution of access to the objects in the system.

5.3. Replication versus Disk Multitasking

In these experiments, the system consisted of seventy processors (P = 70) and the database consisted of a single media type with a degree of declustering of 14 (M = 14). We assumed a 12-s display time for this media type and a uniform distribution of access to the objects in the database.

With the batch-order approach, the database is distributed across all 70 processors (using the algorithm of Fig. 5). The replication approach forms five storage groups each consisting of 14 processors (RMAX = 5). With the disk multitasking approach, the overhead associated with disk seeks is so significant that when each object is declustered across all 70 processors, the system can support at most two simultaneous displays of this object (UMAX = 2) with no hiccups.

Fig. 11 presents the response time and throughput of the alternative approaches as a function of the number of users. The replication approach provides the best overall response time, throughput, and disk utilization. It provides a relatively low and invariant response time when the number of users in the system is less than or equal to five because the system can service a request almost immediately at these multiprogramming levels. Beyond a multiprogramming level of five, additional requests are queued at the centralized scheduler for an available replica, causing: 1) the throughput of the system to level off, and 2) the response time of the system to increase as a function of the average display time of an object and the multiprogramming level of the system. At these multiprogramming levels, the disks are 100% utilized.

With the batch-order approach, the response time of the system is much higher than with both replication and disk multitasking because multiple requests collide and compete for access to processors. Every time two requests collide and compete for a subset of the required 14 processors, some processors remain idle, causing a low throughput and disk utilization for this approach. The highest disk utilization observed with this approach was approximately 40%.

With disk multitasking, the response time is invariant when the number of users is less than or equal to two (U ≤ UMAX). This is because these requests are serviced within one scan of the disk. In the presence of more than UMAX requests, the additional requests are queued at the centralized scheduler causing the throughput of the system to level off. At these multiprogramming levels, each disk is 100% utilized. Approximately 55% of the disk utilization was due to disk seeks introduced by this approach. This should come as no surprise because the seek time constitutes a significant portion of the disk service time.

The fundamental reason why the replication approach outperforms disk multitasking is that it reserves resources for each request and services that request as efficiently as possible (no seeks). While the batch-order approach does not incur the overhead of seeks, its performance is lower because it causes some of the processors to remain idle every time two or more users request objects whose fragments reside across an overlapping set of processors.

The replication approach requires a large amount of disk storage as compared to both the batch-order and the disk multitasking approaches. The database may be so large that one may not be able to replicate it RMAX times. In this case, fewer replicas (R) are constructed with each partition containing more processors. The system can use a batch-order approach for servicing multiple requests for different objects within such a partition. Indeed, batch-order represents the worst case scenario where the disk storage is so scarce that the replication approach cannot construct an additional replica of the database.

In a second experiment, we characterized the performance of the replication approach as a function of the number of replicas constructed and compared it to the disk multitasking approach. As before, the system consisted of 70 processors; however, each object in the database had a lower bandwidth requirement and required four processors to support its continuous display for a single user (M = 4). The disk multitasking approach can support 13 simultaneous displays of this media type (UMAX = 13) by declustering each object across all 70 processors. Fig. 12 presents the response time of the system when the replication approach constructed 1 (batch-order), 3, 7, 13, and 17 (RMAX) processor partitions. It outperforms disk multitasking (DM) when the degree of replication is 13 or higher. With 13 replicas, disk multitasking is outperformed by a constant margin because it incurs the following overheads: 1) activation and termination of each request across 70 processors (as compared to 4 with replication), 2) the overhead of managing the dummy requests at each site, and 3) one scan of the disk before the retrieval of the requested object is started. With fewer than 13 replicas, disk multitasking provides a better response time because replication incurs the overhead associated with multiple users competing and colliding for the processors of a replica.

Fig. 12. Sensitivity to the degree of replication.

Fig. 13. FAR scheduling policy.

In [23], we characterized the performance of these three approaches as a function of: 1) the size of each object, and 2) the bandwidth requirement of each object. In all experiments, the replication strategy outperformed disk multitasking as long as R (its number of replicas) is equal to or greater than UMAX (the number of users the disk multitasking approach is configured to support).

5.4. Evaluation of Alternative Replication Strategies

In these experiments, the system consisted of 60 processors (P = 60) and the database consisted of two media types A and B, with MA = 5 and MB = 4, respectively. In order to make this evaluation systematic, we chose a constant display time for the objects of each media type (12 s). In addition, we assumed the same number of objects for each media type and a uniform distribution of access to the objects (50% frequency of access to each media type). With these parameters, MIP constructs 10 replicas of media type A and 15 copies of media type B. MET creates six copies of each media type. HiP constructs ten replicas of each media type A and B. While MIP and MET result in no residual processors, HiP results in 10 residual processors for media type B.

When the database consists of a mix of media types, with MIP and HiP, the centralized scheduler can activate requests across the different replicas of each media type using either a First Come First Serve (FCFS) or a First Available Resource (FAR) policy. With FAR, a request is activated as soon as a partition containing a replica of the relevant object is available to service it. To illustrate these policies, consider the assignment obtained by MIP in Fig. 7(a). Assume that the system is currently servicing two requests for objects of media type A (using A1 and A2) and two requests for objects of media type B (using B6 and B7). Furthermore, assume that one request for an object of media type A is pending. If a second request for an object of media type B arrives, the FCFS policy will not activate this request because the request for media type A is pending, causing two processors to remain idle. However, FAR will activate this request using B8, causing: 1) all resources in the system to become fully utilized, and 2) potentially a longer wait time for the request accessing media type A. When the data is replicated using MET, the centralized scheduler should not use an FCFS policy across the requests for different media types because MET allocates processors to each media type and eliminates interactions between the objects of different media types.
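The difference between the two policies in this scenario can be sketched as follows. This is a simplified illustration, not the scheduler of the simulation model: the function names are hypothetical, the replica layout is the MIP assignment of the 16-processor example with processors numbered 1 to 16, and availability is tracked per replica by checking whether any of its processors is busy.

```python
def find_replica(media, replicas, busy):
    """First replica (processor group) of `media` with no busy processor."""
    for group in replicas[media]:
        if not busy.intersection(group):
            return group
    return None


def activate(queue, replicas, busy, policy):
    """Activate pending requests and return the (media, group) pairs started.

    FCFS stops at the first request that cannot be served; FAR skips it
    and keeps scanning the queue.
    """
    started, waiting = [], []
    blocked = False
    for media in queue:
        group = None if blocked else find_replica(media, replicas, busy)
        if group is None:
            waiting.append(media)
            blocked = blocked or policy == "FCFS"
        else:
            busy.update(group)
            started.append((media, group))
    queue[:] = waiting
    return started


replicas = {
    "A": [list(range(1, 6)), list(range(6, 11)), list(range(11, 17))],
    "B": [[k, k + 1] for k in range(1, 16, 2)],
}
# A1, A2, B6, and B7 are in use, so processors 1..14 are busy.
fcfs_started = activate(["A", "B"], replicas, set(range(1, 15)), "FCFS")
far_started = activate(["A", "B"], replicas, set(range(1, 15)), "FAR")
```

Under FCFS nothing is started because the pending request for media type A blocks the queue, leaving processors 15 and 16 idle. Under FAR the request for media type B is started on B8.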

Fig. 13 presents the throughput of the system as a function of the number of users for the alternative replication strategies. In this experiment, the centralized scheduler activated requests using a FAR policy.

With an underutilized (6 ≤ U ≤ 10) system load, HiP outperforms both MIP and MET (not visible in Fig. 13) because it constructs 10 replicas of each media type and can service all 10 different requests almost immediately. MET constructs six replicas of each media type and when the mix of requests is unevenly distributed across the two media types (due to the random nature of the workload) some resources sit idle. On the other hand, MIP suffers from external processor fragmentation, which causes some requests for media type A to wait for a longer interval of time while some resources sit idle.

With an overcommitted system load (U > 10), MET and MIP provide comparable performance and outperform HiP because they can utilize the full processing capability of the system. HiP cannot utilize all resources because it constructs ten residual processors for media type B and when it services these requests, it causes some processors to remain idle. If MA were an exact multiple of MB, MIP and HiP would have provided an identical performance [23].

While at first glance, MIP and MET appear to provide an identical performance, an analysis of the results revealed that MIP results in a higher response time for the objects of media type A (see Fig. 14).⁸ With MET, the response time for both media types is similar because the number of processors dedicated to each media type is the same (i.e., 6).

⁸ HiP exhibits a behavior similar to MIP.


Fig. 14. Response time for each media type.

Fig. 15. FCFS scheduling policy.

Fig. 16. MIP, FCFS versus FAR.

Fig. 17. Varying access patterns.

With MIP, requests for objects that belong to media type B are serviced earlier because of: 1) the larger number of replicas of these objects, which is due to their lower degree of declustering, 2) the external processor fragmentation caused by MIP, and 3) the competition for resources among requests for objects of different media types, in which the FAR scheduling policy favors those requests that retrieve objects with a lower degree of declustering due to a higher availability of processors that contain these objects.

5.5. FCFS Scheduling Policy

Fig. 15 presents the performance of the alternative replication strategies when the centralized scheduler services requests for objects of different media types using an FCFS policy. This policy is not appropriate for the MET replication strategy because MET eliminates interactions between requests for objects of different media types. Consequently, the performance of MET remains unchanged from the previous experiment. At high multiprogramming levels, MET outperforms both MIP and HiP because it can utilize all sixty processors while the other two strategies cause some of the processors to remain idle. This is because the FCFS policy prevents the centralized scheduler from allocating resources to requests for objects of media type B unless a previous request for an object of media type A has already been serviced. With this scheduling policy, as we increased the ratio between MA and MB, MET outperformed MIP and HiP by a wider margin.

Fig. 16 presents the average response time of requests for objects of different media types for the MIP strategy (for both FCFS and FAR). While the FCFS policy eliminates the variance between the response time of requests for objects of different media types, it results in a higher response time for both media types (except media type A at multiprogramming levels higher than 20), and a higher average response time for MIP because the FCFS policy causes some of the processors to remain idle.

5.6. Varying Frequency of Access

MET utilizes the frequency of access to the different media types in order to construct the appropriate number of replicas of each media type. The previous section assumed a perfect knowledge of the frequency of access to the objects of the different media types. However, this may not be a realistic assumption. In this experiment, we assume a 50% frequency of access to each media type and investigate the sensitivity of the alternative replication strategies when this assumption is violated.

Violating the assumed frequency of access has a significant impact at high multiprogramming levels when all the system resources have been committed. Fig. 17 presents the response time of the replication strategies when the frequency of access to media type B is varied from 0% to 100% at a multiprogramming level of 40 using a FAR scheduling policy. The response time of HiP is a constant because it constructs ten replicas of each media type (and we assumed a constant display time for each media type). The response time of MIP decreases as the frequency of access to media type B increases because it has more replicas of this media type (fifteen as compared to ten for media type A). MET provides the best response time when each media type has a 50% frequency of access (the assumed value for constructing the appropriate number of replicas). With any other mix of requests, MET causes the processors dedicated to the least frequently accessed media type to remain idle, resulting in a higher average response time.


VI. CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS

In this paper, we described techniques to support a continuous retrieval of multimedia data using parallelism by declustering an object across multiple processors. In order to support simultaneous retrieval of an object for different users, two alternative approaches were described: 1) disk multitasking, and 2) data replication. When the database consists of a mix of multimedia objects, there are several techniques to replicate the data. This paper described three alternative replication strategies: 1) MIP, 2) HiP, and 3) MET.

We compared disk multitasking with data replication and quantified the performance tradeoffs associated with the replication strategies using a simulation model and a synthetic workload. Our conclusions are as follows. The replication approach outperforms disk multitasking as long as the storage capacity of the system can support the maximum number of replicas constructed by this strategy. The fundamental reason for this is that disk multitasking introduces seeks at each disk drive, wasting the disk bandwidth. The replication approach services each user as efficiently as possible using a replica of the requested object.

When the storage capacity of the system limits the number of replicas constructed by the replication approach, disk multitasking may outperform the replication strategy. When UMAX (the number of simultaneous displays that disk multitasking is configured for) is greater than R (the number of replicas constructed by the replication strategy), disk multitasking will outperform replication. When UMAX is equal to RMAX, replication continues to provide a better performance because disk multitasking incurs the following overheads: 1) activation and termination of each request across a larger number of processors, 2) the overhead of managing the dummy requests at each site, and 3) on average, a wait of one scan of the disk drive before the retrieval of the requested data is started.

We drew the following conclusions from evaluating the alternative replication strategies. When the frequencies of access to the different media types are known and the number of users is expected to exceed the number of replicas in the system, MET is a superior replication strategy because: 1) it maximizes the performance of the system, and at the same time, constructs fewer replicas of each media type as compared to MIP and HiP, i.e., has a lower disk space requirement, and 2) it does not result in starvation.

When the frequency of access to the different media types is unknown or is expected to vary significantly, either MIP or HiP would be a more appropriate replication strategy. This is because MET dedicates processors to each media type, causing those processors dedicated to the media type with the least frequency of access to remain idle. The choice between MIP and HiP depends on the number of residual processors constructed by HiP. If there are no residual processors, HiP is more appropriate because it eliminates external processor fragmentation. Otherwise, MIP is a better strategy because it maximizes the number of replicas (and the processing capability of the system).
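The guidance in the two paragraphs above can be written as a short selection routine. This is a hedged Python sketch of the stated conclusions only; the function and its parameters are hypothetical names introduced for illustration. The text draws no conclusion for the case in which access frequencies are known and stable but the number of users does not exceed the number of replicas, so the sketch returns `None` there.

```python
from typing import Optional


def choose_replication_strategy(
    frequencies_known: bool,
    frequencies_stable: bool,
    users_exceed_replicas: bool,
    hip_residual_processors: int,
) -> Optional[str]:
    """Pick MET, MIP, or HiP following the conclusions stated in the text."""
    if not frequencies_known or not frequencies_stable:
        # MET would leave processors of rarely accessed media types idle.
        if hip_residual_processors == 0:
            return "HiP"  # no external processor fragmentation
        return "MIP"      # maximizes the number of replicas
    if users_exceed_replicas:
        return "MET"      # fewer replicas per media type, no starvation
    return None           # case not covered by the stated conclusions
```

As a usage example, `choose_replication_strategy(False, True, True, 3)` yields `"MIP"` because HiP would leave three residual processors.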

When the number of users exceeds the number of replicas, both MIP and HiP become sensitive to the scheduling policy. With the FAR policy, both strategies favor requests that access the media type with the lowest degree of declustering due to a higher availability of processors containing these objects, causing requests for objects with a higher degree of declustering to starve for service. When MIP results in external processor fragmentation, the starving requests wait longer than they do with HiP because external fragmentation increases the availability of processors containing a media type with a low degree of declustering. A FCFS scheduling policy eliminates the starvation behavior for both replication strategies. However, it reduces the overall performance of the system because it causes some processors to remain idle.
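The tradeoff between starvation and idle processors can be illustrated with a toy request queue. The Python sketch below is not the simulation model used in this study. It assumes one independent counter of free replicas per media type, which ignores the processor sharing between media types under MIP and HiP, and it uses a hypothetical "skip_ahead" policy (activate any queued request whose media type has a free replica) as a stand-in for a policy that is driven by availability rather than arrival order.

```python
from typing import Dict, List


def activate(queue: List[str], free: Dict[str, int], policy: str) -> List[int]:
    """Return queue positions of the requests that can start now.

    queue: requested media types in arrival order.
    free: free replicas per media type (simplifying assumption).
    policy: "fcfs" stops at the first request that cannot be served;
            "skip_ahead" passes over it and keeps scanning the queue.
    """
    if policy not in ("fcfs", "skip_ahead"):
        raise ValueError("unknown policy: " + policy)
    remaining = dict(free)  # work on a copy so the caller's state is untouched
    started = []
    for position, media_type in enumerate(queue):
        if remaining.get(media_type, 0) > 0:
            remaining[media_type] -= 1
            started.append(position)
        elif policy == "fcfs":
            break  # strict arrival order: later requests must wait
    return started


waiting = ["A", "B", "B"]          # the request for A arrived first
free_replicas = {"A": 0, "B": 2}   # every replica of A is busy
fcfs_started = activate(waiting, free_replicas, "fcfs")
skip_started = activate(waiting, free_replicas, "skip_ahead")
```

In this example FCFS starts no request, so both free replicas of B stay idle, whereas the skip-ahead policy starts both requests for B while the request for A keeps waiting.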

One assumption of this study is that the system activates requests based on the availability of the different processor partitions for each media type. An alternative might have been to maintain the availability of each individual fragment and its replicas. We believe such a technique is unrealistic due to the significant amount of data that the centralized scheduler would have had to maintain. For example, every time a processor is activated, this technique would have required the centralized scheduler to change the status of all fragments that reside on this processor to unavailable. Regardless of this and for the sake of completeness, consider the impact of such a technique on the evaluation of the alternative replication strategies. While it would have had no impact on MET, it would have improved the performance of MIP and HiP with the FCFS policy by minimizing the number of idle processors. To illustrate, consider the placement obtained by MIP in Fig. 6. When the system uses B3 to service a request, this technique enables the system to use a subset of the remaining processors from A1 and A2 to service a request for media type A. It would have had no impact on the performance of MIP with FAR because all resources become 100% utilized with FAR. However, it would have reduced the delay observed by the requests for media type A because it would have increased the availability of this media type.

We intend to extend this study in several ways. First, we intend to analyze the disk storage requirements of each replication strategy and quantify their performance tradeoffs in the presence of limited disk space. Second, this study assumes a centralized scheduler that activates requests by maintaining a list of available replicas for each media type. For a system with thousands of processors, this site might become a bottleneck. Currently, we are investigating a distributed scheduling mechanism for activating requests.⁹ Third, with data replication, the failure of a site (typically due to a disk failure) may cause the remaining processors of a partition to become useless. We intend to introduce techniques that can utilize the remaining processors of such a partition by incorporating them into other partitions. In addition, we are interested in quantifying the sensitivity of each replication strategy to such failures.

⁹ With such a mechanism, enforcing a FCFS policy among requests might become very hard (if not impossible).


REFERENCES

[1] D. Anderson and G. Homsy, “A continuous media I/O server and its synchronization,” IEEE Comput., Oct. 1991.

[2] D. Anderson, Y. Osawa, and R. Govindan, “Real-time disk storage and retrieval of digital audio and video,” UCB/ERL Tech. Rep. M91/646, Univ. Calif. Berkeley, 1991.

[3] D. R. Benigni, Ed., Digital Video in the PC Environment. New York: McGraw-Hill, 1991.

[4] D. Bitton and J. Gray, “Disk shadowing,” in Proc. 1988 VLDB Conf., Sept. 1988.

[5] A. Borr, “Transaction monitoring in Encompass: Reliable distributed transaction processing,” in Proc. 1981 VLDB Conf., 1981.

[6] S. Christodoulakis, “Performance analysis and fundamental performance tradeoffs for CLV optical disks,” Tech. Rep. CS-88-06, Univ. Waterloo, Waterloo, ON, 1988.

[7] Britannica Inc., Compton’s Multimedia Encyclopedia. Encyclopedia Britannica, Inc., 1991.

[8] G. Copeland, W. Alexander, E. Boughter, and T. Keller, “Data placement in Bubba,” in Proc. 1988 SIGMOD Conf., June 1988.

[9] D. DeWitt, S. Ghandeharizadeh, D. Schneider, A. Bricker, H. Hsiao, and R. Rasmussen, “The Gamma database machine project,” IEEE Trans. Knowl. Data Eng., vol. 1, Mar. 1990.

[10] D. Ford and S. Christodoulakis, “Optimizing random retrievals from CLV format optical disks,” in Proc. 1991 VLDB Conf., Sept. 1991.

[11] E. A. Fox, “Advances in interactive digital multimedia systems,” IEEE Comput., Oct. 1991.

[12] S. Ghandeharizadeh and D. DeWitt, “A multiuser performance analysis of alternative declustering strategies,” in Proc. Database Eng. Conf., 1990.

[13] S. Ghandeharizadeh, L. Ramos, Z. Asad, and W. Qureshi, “Object placement in parallel hypermedia systems,” in Proc. 1991 VLDB Conf., 1991.

[14] J. Gray, B. Horst, and M. Walker, “Parity striping of disc arrays: Low-cost reliable storage with acceptable throughput,” in Proc. 1990 VLDB Conf., Aug. 1990.

[15] J. Green, “The evolution of DVI system software,” Commun. ACM, vol. 35, Jan. 1992.

[16] B. Haskell, “International standards activities in image data compression,” in Proc. Scientific Data Compression Workshop, 1989, pp. 4 3 9 4 9.

[17] H. Hsiao, “Availability in multiprocessor database machines,” Ph.D. dissertation, Univ. Wisconsin, Madison, WI, 1990.

[18] A. Lippman and W. Butera, “Coding image sequences for interactive retrieval,” Commun. ACM, vol. 32, July 1989.

[19] M. Livny, “Denet user’s guide,” Tech. Rep., Univ. Wisconsin, Madison, WI, 1989.

[20] M. Livny, S. Khoshafian, and H. Boral, “Multi-disk management algorithms,” in Proc. 1987 SIGMETRICS Conf., May 1987.

[21] M. McKusick, W. Joy, S. Leffler, and R. Fabry, “A fast file system for UNIX,” ACM Trans. Comput. Syst., Aug. 1984.

[22] D. Patterson, G. Gibson, and R. Katz, “A case for redundant arrays of inexpensive disks (RAID),” in Proc. 1988 SIGMOD Conf., June 1988.

[23] L. Ramos, “Real-time retrieval of continuous multimedia data using parallelism,” Ph.D. dissertation, Univ. Southern California, 1992.

[24] D. Ries and R. Epstein, “Evaluation of distribution criteria for distributed database systems,” UCB/ERL Tech. Rep. M78/22, UC Berkeley, May 1978.

[25] M. R. Stonebraker, “The case for shared-nothing,” in Proc. 1986 Data Eng. Conf., 1986.

[26] C. Wong, W. Joy, S. Leffler, and R. Fabry, “Minimizing expected head movement in one-dimensional and two-dimensional mass storage systems,” Computing Surveys, June 1980.

[27] C. Yu, W. Sun, D. Bitton, Q. Yang, R. Brunno, and J. Tullis, “Efficient placement of audio on optical disks for real-time applications,” Commun. ACM, July 1989.

[28] P. Yue and C. Wong, “On the optimality of the probability ranking scheme in storage applications,” J. ACM, Oct. 1973.

Shahram Ghandeharizadeh received the B.S., M.S., and Ph.D. degrees in computer sciences, in 1985, 1987, and 1990, respectively, from the University of Wisconsin, Madison.

In September 1990, he joined the Computer Science Department at the University of Southern California, where he is currently an Assistant Professor. One of his current research programs involves the design and implementation of a highly parallel object-oriented database machine. This project is studying alternative implementations of next-generation database constructs, declustering algorithms for data intensive applications, and techniques to support applications with irregularly structured data (e.g., multimedia information systems, CAD/CAM, scientific databases). To support this research, the project has implemented the Omega database machine on a cluster of workstations. One of his main research objectives is to demonstrate the feasibility of the techniques presented in this paper using Omega.

Dr. Ghandeharizadeh is a member of the Association for Computing Machinery and the IEEE Computer Society. He received an NSF Young Investigator Award in 1992.

Luis Ramos received the B.S. degree in applied math in 1982 from the University of the Philippines, the M.S. degree in computer science in 1985 from Ateneo de Manila University, and the Ph.D. degree in computer science in 1992 from the University of Southern California.

He has been with Object Design, Inc. since 1992. His research interests include design of novel algorithms for multimedia and hypermedia information systems.

Dr. Ramos is a member of the Association for Computing Machinery and the IEEE Computer Society.