
Mass Storage System Reference Model: Version 4 (May, 1990)

Developed by the IEEE Technical Committee on Mass Storage Systems and Technology

Edited by:
Sam Coleman, Lawrence Livermore National Laboratory
Steve Miller, SRI International

All rights reserved by the Technical Committee on Mass Storage Systems and Technology, Institute of Electrical and Electronics Engineers, Inc.

This is an unapproved draft subject to change and cannot be presumed to reflect the position of the Institute of Electrical and Electronics Engineers, Inc.


Contents

1. Preface
   1.1 Transparencies
   1.2 Requirements
2. Introduction
   2.1 Background
   2.2 Motivation
   2.3 Reference Model Architecture
       2.3.1 Abstract Objects
       2.3.2 Client/Server Properties
       2.3.3 Reference Model Modules
3. Detailed Description of the Reference Model
   3.1 Bitfile Client
   3.2 Name Server
   3.3 Bitfile Server
       3.3.1 Bitfile Server Commands
       3.3.2 Bitfile Request Processor
       3.3.3 Bitfile Descriptor Manager and Descriptor Table
       3.3.4 Bitfile ID Authenticator
       3.3.5 Migration Manager
       3.3.6 Space Limit Manager
   3.4 Storage Server
       3.4.1 Physical Storage System
       3.4.2 Physical Device Manager
       3.4.3 Logical-to-Physical Volume Manager
   3.5 Physical Volume Repository
   3.6 Communication Service
   3.7 Site Manager
       3.7.1 Storage Management (Requirements; Tools Needed)
       3.7.2 Operations (Requirements; Tools Needed)
       3.7.3 Systems Maintenance (Requirements; Tools Needed)
       3.7.4 Software Support (Requirements; Tools Needed)
       3.7.5 Hardware Support (Requirements; Tools Needed)
       3.7.6 Administrative Control (Requirements; Tools Needed)
4. References
5. Glossary


1. Preface

The purpose of this reference model is to identify the high-level abstractions that underlie modern storage systems. The information to generate the model was collected from major practitioners who have built and operated large storage facilities, and represents a distillation of the wisdom they have acquired over the years. The model provides a common terminology and set of concepts to allow existing systems to be examined and new systems to be discussed and built. It is intended that the model and the interfaces identified from it will allow and encourage vendors to develop mutually-compatible storage components that can be combined to form integrated storage systems and services.

The reference model presents an abstract view of the concepts and organization of storage systems. From this abstraction will come the identification of the interfaces and modules that will be used in IEEE storage system standards. The model is not yet suitable as a standard: it does not contain implementation decisions, such as how abstract objects should be broken up into software modules or how software modules should be mapped to hosts; it does not give policy specifications, such as when files should be migrated; it does not describe how the abstract objects should be used or connected; and it does not refer to specific hardware components. In particular, it does not fully specify the interfaces.

A storage system is the portion of a computing facility responsible for the long-term storage of large amounts of information. It is usually viewed as a shared facility and has traditionally been organized around specialized hardware devices. It usually contains a variety of storage media that offer a range of tradeoffs among cost, performance, reliability, density, and power requirements. The storage system includes the hardware devices for storing information, the communication media for transferring information, and the software modules for controlling the hardware and managing the storage.

The size and complexity of this software is often overlooked, and its importance is growing as computing systems become larger and more complex. Large storage facilities tend to grow over a period of years and, as a result, must accommodate a collection of heterogeneous equipment from a variety of vendors. Modern computing facilities are putting increasing demands on their storage facilities. Often, large numbers of workstations as well as specialized computing machines such as mainframes, mini-supercomputers, and supercomputers are attached to the storage system by a communication network. These computing facilities are able to generate both large numbers of files and large files, and the requirements for transferring information to and from the storage system often overwhelm the networks.

The type of environment described above is the one that places the greatest strain on a storage system design, and the one that most needs a storage system. The abstractions in the reference model were selected to accommodate this type of environment. While they are also suitable for simpler environments, their desirability is perhaps best appreciated when viewed from the perspective of the most complicated environment.

There is a spectrum of system architectures, from storage services being supplied as single nodes specializing in long-term storage to what is referred to as “fully distributed systems”. The steps in this spectrum are most easily distinguished by the transparencies that they provide, where they are provided in the site configuration, and whether they are provided by a site administrator or by system management software. The trend toward distributed systems is appealing because it allows all storage to be viewed in the same way, as part of a single, large, transparent storage space that can be globally optimized. This is especially important as systems grow more complex and better use of storage is required to achieve satisfactory performance levels. Distributed systems also tend to break the dependence on single, powerful storage processors and may increase availability by reducing reliance on single nodes.

1.1 Transparencies

Many aspects of a distributed system are irrelevant to a user of the system. As a result, it is often desirable to hide these details from the user and provide a higher-level abstraction of the system. Hiding details of system operation or behavior from users is known as providing transparency for those details. Providing transparency has the effect of reducing the complexity of interacting with the system and thereby improving the dependability, maintainability, and usability of applications. Transparency also makes it possible to change the underlying system because the hidden details will not be embedded in application programs or operating practices.

The disadvantage of using transparency is that some efficiency can be lost in resource usage or performance. This occurs because the mechanism that provides the transparency masks semantic information and causes the system to be used conservatively. High-performance data base systems, for example, may need to organize disk storage directly and schedule disk operations to gain performance, rather than depend on lower-level file systems with their own structure, scheduling, and policies for caching and migration.

There is a range of support that can be provided for distributed systems in a computer network. A system with few transparencies is often called a networked system. The simplest kind of networked system provides utilities to allow a program to be started on a specified host and information to be transferred between specified storage devices. Examples include TELNET and FTP, respectively. This type of system rarely provides support for heterogeneity. At the other end of the spectrum are fully distributed systems that provide many transparencies. An example is LOCUS. In distributed systems, a goal is for workstations to appear to have unlimited storage and processing capacities.

System and application designers must think carefully about what transparencies will be provided and whether they will be mandatory. It is possible for applications to provide certain transparencies and not others. Fundamental transparencies can be implemented by the system, saving each user from re-implementing them. A common implementation will also improve the likelihood that the transparency will be implemented efficiently.

The common transparencies are:

Access
Clients do not know if objects or services are local or remote.

Concurrency
Clients are not aware that other clients are using services concurrently.

Data representation
Clients are not aware that different data representations are used in different parts of the system.

Execution
Programs can execute in any location without being changed.

Fault
Clients are not aware that certain faults have occurred.

Identity
Services do not make use of the identity of their clients.

Location
Clients do not know where objects or services are located.

Migration
Clients are not aware that services have moved.

Naming
Objects have globally unique names which are independent of resource and accessor location.

Performance
Clients see the same performance regardless of the location of objects and services (this is not always achievable unless the user is willing to slow down local performance).

Replication
Clients do not know if objects or services are replicated, and services do not know if clients are replicated.

Semantic
The behavior of operations is independent of the location of operands and the type of failures that occur.

Syntactic
Clients use the same operations and parameters to access local and remote objects and services.

Some of the transparencies overlap or include others.
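To make the access, location, and syntactic transparencies concrete, the following sketch (in Go; all type and function names are invented for illustration, since the model prescribes no language or interface syntax) gives local and remote storage the same client-visible operation, so calling code cannot tell where an object lives.

    package main

    import "fmt"

    // Store is the one interface a client sees; access and syntactic
    // transparency mean local and remote implementations look identical.
    type Store interface {
        Read(objectID string, offset, length int64) ([]byte, error)
    }

    // localStore would read from media attached to the client's own node.
    type localStore struct{}

    func (localStore) Read(id string, off, n int64) ([]byte, error) {
        return make([]byte, n), nil // placeholder for a local device read
    }

    // remoteStore would forward the same operation to a distant server.
    type remoteStore struct{ serverAddr string }

    func (r remoteStore) Read(id string, off, n int64) ([]byte, error) {
        return make([]byte, n), nil // placeholder for a request/reply exchange
    }

    func main() {
        // The caller's code is identical either way (location transparency).
        for _, s := range []Store{localStore{}, remoteStore{serverAddr: "srv-1"}} {
            data, _ := s.Read("object-42", 0, 16)
            fmt.Println(len(data))
        }
    }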

With this in mind, it is incumbent upon the Storage System Standards Working Group to identify interfaces and modules that are invariant from single storage nodes to fully distributed systems. Many sites are not likely to embrace fully distributed systems in a single step. Rather, they are likely to evolve gradually as growing system size and complexity dictate and as vendors make available products supporting fully distributed systems.

1.2 Requirements

Modern computing facilities are large and complex. They contain a diverse collection of hardware connected by communication networks, and are used by a wide variety of users with a spectrum of often-conflicting requirements. The hardware includes a range of processors from personal computers and workstations to mainframes and supercomputers, and many types of storage devices such as magnetic disks, optical disks, and magnetic tapes. This equipment is typically supplied by a variety of vendors and, as a result, is usually heterogeneous. Both the hardware characteristics and the user requirements make this type of facility extremely complicated.

To ensure that the reference model applies to many computer environments, the IEEE Technical Committee on Mass Storage Systems and Technology identified the following requirements:

• The model should support both centralized and distributed hierarchical, multi-media file systems.

• The model should support the simplest randomly addressable file abstraction out of which higher level file structures can be created (e.g., a segment of bits or bytes and a header of attributes).

• Where the defined services are appropriate, the model should use national or international standard protocols and interfaces, or subsets thereof.

• The model should be modular such that it meets the following needs:

- The modules should make sense to produce commercially.

- It should be reasonable to integrate modules from two or more vendors.

- The modules should integrate with each other and existing operating systems (centralized and distributed), singly or together.

- It should be possible to build hierarchical centralized or distributed systems from the standard modules. The hierarchy might include, for example, solid state disks, rotating disks (local and remote), an on-line library of archival tape cartridges or optical disks, and an off-line, manually-operated archival vault.

- Module interfaces should remain the same even though implementations may be replaced and upgraded over time.

- Modules should have standardized interfaces hiding implementation details. Access to module objects should only be through these interfaces. Interfaces should be specified by the abstract object data structures visible at those interfaces.

- Module interfaces should be media independent.

• File operations and parameters should meet the following requirements:

- Access to local and remote resources should use the same operations and parameters.

- Behavior of an operation should be independent of operand location.

- Performance should be as independent of location as possible.

- It should be possible to read and write both whole files and arbitrary-sized, randomly-accessible pieces of files.

- The model should separate policy and mechanism such that it supports standard as well as vendor- or site-specific policy submodules and interfaces for access control, accounting, allocation, site management, security, and migration.

- The model should provide for debugging, diagnostics, and maintenance.

- The model should support a request/reply (transaction) oriented communication model.

- Request and data communication associations should be separated to support high-speed direct source to destination data channels.

- Transformation services (e.g., translation, check-summing, encryption) should be supported.

• The model should meet the following naming requirements:

- Objects should have globally unique, machine-oriented names which are independent of resource and access location.

- Each operating system or site environment may have a different human-oriented naming system; therefore, human- and machine-oriented naming should be clearly separated.

- Globally unique, distributively generated, opaque file identifiers should be used at the client-to-storage-system interface (one possible generation scheme is sketched after this list).

• The model should support the following protection mechanism requirements:

- System security mechanisms should assume mutual suspicion between nodes and networks.

- Mechanisms should exist to establish access rights independent of location.

- Access list, capability, or other site-, vendor-, or operating-system-specific access control should be supportable.

- Security or privacy labels should exist for all objects.

• The model should support appropriate lock types for concurrent file access.

• Lock mechanisms for automatic migration and caching (i.e., multiple copies of the same data or files) should be provided.

• The model should provide mechanisms to aid recovery from network, client, and server crashes, and protection against network or interface errors. In particular, except for file locks, the file server should be stateless (e.g., no state maintained between “open” and “close” calls).

• The model should support the concept of fixed and removable logical volumes as separate abstractions from physical volumes.

• It should be possible to store one or many logical volumes on a physical volume, and one logical volume should be able to span multiple physical volumes.
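As promised under the naming requirements above, here is one plausible way (a sketch only; the model mandates no ID format) to mint globally unique, distributively generated, opaque file identifiers: each node concatenates its own ID, a timestamp, and a local sequence number, so no inter-node coordination is needed, and clients receive the result as an uninterpreted byte string.

    package main

    import (
        "encoding/binary"
        "fmt"
        "time"
    )

    // FileID is opaque to clients: 4-byte node ID, 8-byte timestamp, and a
    // 4-byte sequence number. Only the issuing server interprets the layout.
    type FileID [16]byte

    var seq uint32 // per-node counter; a real server would protect it with a lock

    // newFileID mints a globally unique ID without coordination: uniqueness
    // follows from the node ID plus the local (time, sequence) pair.
    func newFileID(nodeID uint32) FileID {
        var id FileID
        binary.BigEndian.PutUint32(id[0:4], nodeID)
        binary.BigEndian.PutUint64(id[4:12], uint64(time.Now().UnixNano()))
        seq++
        binary.BigEndian.PutUint32(id[12:16], seq)
        return id
    }

    func main() {
        fmt.Printf("%x\n", newFileID(7)) // clients treat this as opaque bytes
    }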


2. Introduction

2.1 Background

From the early days of computers, “storage” has been used to refer to the levels of storage outside the central processor. If “memory” is differentiated to be inside the central processor and “storage” to be outside (i.e., requiring an input-output channel to access), the first level of storage is called “primary storage” (Grossman 89). The predominant technology for this level of storage has been magnetic disk, or solid-state memory configured to emulate magnetic disks, and will remain so for the foreseeable future in virtually every size of computer system from personal computers to supercomputers. Magnetic disks connected directly to I/O channels are often called “local” disks while magnetic disks accessed through a network are referred to as “remote” or “central” disks. Sometimes a solid-state cache is interposed between the main memory and primary storage. Because networks have altered the access to primary storage we will use the terms “local storage” and “remote storage” to differentiate the different roles of disks.

The next level of data storage is often a magnetic tape library. Magnetic tape has also played several roles:

• On-line archive known as “long term storage” (e.g., less active storage than magnetic disk),

• off-line archival storage (possibly off-site),

• backup for critical files, and

• as an I/O medium (transfer to and from other systems).

Magnetic tape has been used in these roles because it has enjoyed the lowest cost-per-bit of any of the widely used technologies. As an I/O medium, magnetic tape must conform to standards such that the tape can be written on one system and read on another. This is not necessarily true for archival or backup storage roles, where nonstandard tape sizes and formats can be used, even though there are potential disadvantages if standards are not used even for these purposes.

In the early 1970s nearly every major computer vendor, a number of new companies, and vendors not otherwise in the computer business, developed some type of large peripheral storage device. Burroughs and Bryant experimented with spindles of 4-ft diameter magnetic disks. Control Data experimented with 12 in. wide magnetic tape wrapped nearly all the way around a drum with a head per track. The tape was moved to an indexed location and stopped while the drum rotated for operation. (Davis 82 presents an interesting comparison of devices that actually got to the marketplace.)

Examples of early storage systems are the Ampex Terabit Memory (TBM) (Wildmann 75), IBM 1360 Photostore (Kuehler 66), Braegan Automated Tape Library, IBM 3850 Mass Storage System (Harris 75, Johnson 75), Fujitsu M861, and the Control Data 38500. One of the earliest systems to employ these devices was the Department of Defense Tablon system (Gentile 71), which made use of both the Ampex TBM and the IBM Photostore. Much was learned about the software requirements from this installation.

The IBM 1360, first delivered in the late 1960s, used write-once, read-many (WORM) chips of photographic film. Each chip measured 1.4 x 2.8 in. and stored 5 megabits of data. “File modules” were formed of either 2250 or 4500 cells of 32 chips each. The entire process of writing a chip, photographically developing it, inserting the chip in a cell, and a cell in a file module, storing and retrieving for read, etc., was managed by a control processor similar to an IBM 1800. The complex chemical and mechanical processing required considerable maintenance expertise and, while the Photostore almost never lost data, the maintenance cost was largely responsible for its retirement. A terabit system could retrieve a file in under 10 seconds.

The TBM, first delivered in 1971, was a magnetic tape drive that used 2-inch-wide magnetic tape in large 25,000-foot reels. Each reel of tape had a capacity of 44 gigabits and a file could be retrieved, on the average, in under 17 seconds. With two drives per module, a ten module (plus two control modules) system provided a terabit of storage. The drive was a digital re-engineering of broadcast video technology. The drive connected to a channel through a controller, and cataloging was the responsibility of the host system.

The Braegan Automated Tape Library was first delivered in the mid 1970s and consisted of special shelf storage housing several thousand half-inch magnetic tape reels, a robotic mechanism for moving reels between shelf storage and self-threading tape drives, and a control processor. This conceptually simple system was originally developed by Xytecs, sold to Calcomp, and then to Braegan. In late 1986, the production rights were acquired by Digital Storage Systems, Longmont, Colorado. Sizes vary, but up to 8,000 tape reels (9.6 terabits) and 12 tape drives per cabinet are fairly common.

The IBM 3850 (Johnson 75, Harris 75) used a cartridge with a 2.7-inch-wide, 770-in. long magnetic tape. A robotic cartridge handler moved cartridges between their physical storage location (sometimes called the honeycomb wall) and read/write devices. Data accessed by the host was staged to magnetic disk for host access. De-staging the changed pages (about 2 megabits) occurred when those pages became the least recently used pages on the staging disks. Staging disks consisted of a few real disk devices, which served as buffers to the entire tape cartridge library. The real disks were divided into pages and used to make up many virtual disk devices that could appear to be on-line at any given time.

Manufactured by Fujitsu and marketed in this country by MASSTOR and Control Data, the M861 storage module uses the same data cartridge as the IBM 3850; however, it is formatted to hold 175 megabytes per cartridge. The M861 holds up to 316 cartridges and provides unit capacity of 0.44 terabits. The physical cartridges are stored on the periphery of a cylinder, where a robotic mechanism picks them for the read-write station. The unit achieves about 12-second access time and 500 mounts/dismounts per hour.

A spectrum of interconnection mechanisms was described (Howie 75) that included:

• The host being entirely responsible for special hardware characteristics of the storage system device,

• the device characteristics being translated (by emulation in the storage system) to a device known by the host operating system, and

• the storage system and host software being combined to create a general system.

This has sometimes been termed moving from tightly coupled to loosely coupled systems. Loosely coupled systems use message passing between autonomous elements.

The evolution of the architectural view of what constitutes a large storage system has been shaped by the growth in sheer size of systems, more rapid growth of interactive rather than batch processing, the growth of networks, distributed computing, and the growth of personal computers, workstations, and file servers.

Many commercial systems have tracked growth rates of 60-100% per year over many years. As systems grow, a number of things change just because of size. It becomes difficult for large numbers of people to handle tape reels, so automating the fetching and returning and the mounting and dismounting of reels becomes important. As size increases, it also becomes more difficult for humans to decide which devices to use for load balancing.


Because of this growth, early users of storage systems were forced to do much of the systems integration in their own site environments. Large portions (software and hardware) of many existing systems (Gentile 71, Penny 73, Fletcher 75, Collins 82, Coleman 84) were developed by user organizations that were faced with the problem of storing, retrieving, and managing trillions of bits and cataloging millions of files. The sheer size of such storage problems meant that only organizations such as government laboratories, which possessed sufficient systems engineering resources and talent to complete the integration, initially took on the development task. These individualized developments and integrations resulted in storage systems that were heavily intertwined with other elements of each unique site.

These systems initiated an evolution in storage products in which three stages are readily recognizable today. During the first stage, a storage system was viewed as a very large peripheral device serving a single system attached to an I/O channel on a central processor in the same manner as other peripheral devices. Tasks to catalog the files and free space of the device, manage the flow of data to and from it, take care of backup and recovery, and the many other file management tasks, were added as application programs within the systems environment. Many decisions, such as when to migrate a file, were left to the user or to a manual operator. If data was moved from the storage system to local disk, two host channels (one for each device) were required plus a significant amount of main memory space and central processing capability (Davis 82).

During this stage, the primary effort in design was machine-room automation to reduce the need to manually mount and dismount magnetic tapes.

The second stage (late 1970s to present) has been characterized by centralized shared service that takes advantage of the economies of scale and provides file server nodes to serve several, perhaps heterogeneous, systems (Svobodova 84).

This stage of the storage system evolution is the one that is most prevalent today. The storage system node entails using a control processor to perform the functions of the reference model in a storage hierarchy (O'Lear 82). The cost of the storage system makes it desirable to share these centralized facilities among several computational systems rather than provide a storage system for each computational system. This is especially true when supercomputers become a part of the site configuration.

This approach to providing storage has several advantages:

• The number of processors that have access to a file is larger than that which can share a peripheral device. (This type of access is not the same as sharing a file, which implies concurrent access.)

• Multiple read-only copies of data can be provided, circumventing the need for a large number of processors having access to common storage devices.

• Processors of different architectures can have access to common storage, and therefore to common data, if they are attached to a network and use a common protocol for bit-stream transfer.

• The independence between the levels of storage allows the inclusion of new storage devices as they become commercially available.

Some of the earliest systems in this shared service stage were at the Lawrence Berkeley National Laboratory (Penny 70) and Lawrence Livermore National Laboratory (LLNL) (Fletcher 75, Watson 80).

The Los Alamos Common File System (Collins 82, McLarty 84) and the system at the National Center for Atmospheric Research (Nelson 87, O'Lear 82) are more recent examples of shared, centralized storage system nodes.

The third stage is the emerging distributed system. An essential feature of a distributed system (Enslow 78, Watson 81a, 84, 88) is that the network nodes are autonomous, employing cooperative communication with other nodes on the network. The control processors of storage systems developed during this stage provide this capability.

The view of a storage system as a distributed storage hierarchy is neither a device nor a single service node, but is the integration of distributed computing systems and storage system architectures with the elements that provide the storage service distributed throughout the system. The distributed computing community has been very interested in the problems of providing file management services, albeit generally on smaller systems (Almes 85, Birrell 82, Brownbridge 82, Donnelley 80, Leach 82, Svobodova 84, Watson 81a). Probably the best known example at the workstation level is the SUN Microsystems “network file server” (Sandberg 85).

Several elements are necessary for a system to be classed as “distributed” (Enslow 78):

• A multiplicity of general-purpose resource elements,

• the distribution of these elements, logically and physically,

• a distributed (network) operating system,

• system transparency (service requests by name only), and

• cooperative communication among elements (nodes).

Achieving all of these elements sounds difficult and expensive. The motivations most often cited are extensibility, availability, and costly resource sharing (LeLann 81). Readily extensible systems permit the “hot wiring” necessary in large systems that can no longer afford downtime for cabling in new elements. Extensibility also means that individual elements can be upgraded without disrupting the entire system. System availability is obtained by replicating system elements in a way that permits graceful degradation. Sharing costly elements occurs through communications and networking.

The issues involved in designing distributed systems with the characteristics outlined above were discussed by Dr. Richard W. Watson of the Lawrence Livermore National Laboratory at the Eighth IEEE Mass Storage Symposium in Tucson, Arizona, May 1987. He stated (Watson 87) that the long-range goal is to design systems in which “mainframes, minicomputers, workstations, networks, multiple levels of storage, and input/output systems are viewed as elements of a logically single distributed computer whose resources are managed by and accessed through a single distributed operating system.”

Individual operating systems have their own way of handling files. One reason for requiring a distributed operating system is to provide a single logical file and naming system. This distributed file system should be accessed by name only; that is, the naming and heterogeneous features of different component parts should be transparent to the user. Logically, the distributed storage system should have infinite capacity and unlimited file size. This is obtained through the use of migration among the distributed storage elements that make up the storage hierarchy. The different levels of storage probably have different storage characteristics and costs.

Other design goals include high reliability and availability, high performance (low delay and high throughput), mandatory and discretionary access control, file sharing and safe concurrent access (Lantz 85), and accounting and administrative controls (Mullender 84).

It now appears that an attainable goal is to design interconnected systems, whose subsystems can be produced by a number of vendors, such that the file service is uniform from the user's local level through all levels of the on-line hierarchy to shelf storage. Internally, the distributed hierarchical storage system will consist of multiple levels of storage such as bulk semiconductor memories, magnetic disks, magnetic tape, optical disks, automated media libraries, and manual vaults. Such a system is currently under development at Lawrence Livermore National Laboratory (Coleman 84, Foglesong 90, Gary 90, Hogan 90).


2.2 Motivation

The central architectural features of the reference model and the motivation for them can be summarized as follows:

• An object-oriented description allows the identification of a modular set of standard services and standardized client/server interfaces. The reference model servers are potentially viable commercial products and are building blocks for higher-level services and recursive integration in centralized, shared, or distributed hierarchical storage systems. This integration can be done within single-vendor systems, by third-party, value-added system integrators, or by end-user organizations. The object-oriented modularity hides implementation details, allowing many possible implementations in support of the standard abstract objects and interfaces (Booch 86).

• For the storage system to be integrated with applications and operating systems supporting many different internal file structures, the abstract object visible to storage-system clients is an uninterpreted string of bits and a set of attributes.

• The separation of human-oriented naming from machine-oriented file identifiers allows integration with current and future operating systems and site-dependent naming systems. This implies separation of the name server as a separate module associated with the reference model (Watson 81b).

• The separation of access rights control as a site-specific module with a standard interface to the storage system accommodates the many operating systems and site-dependent access control mechanisms in existence.

• The separation of the request and data communication paths supports existing practices and the need for third-party control of transfers between two entities by direct data transfers from source to sink, as well as data transfer redirection and pipelining through such data transformations as encryption, compression, and check-summing.

• The separation of the site manager allows site-dependent policies and status to be managed. Provision is made for standard site-management interface functionality.

• Inclusion of a migration server within the file server allows each file server to be self-contained and file-migration policies for each server to be established separately. It also facilitates building a hierarchical storage system supporting automatic migration between servers. The general goal is to cache the most active data on the fastest storage servers and the least active on storage servers with the lowest cost-per-bit medium.

It is envisioned that the modules of the reference model can be integrated in various combinations to support a variety of storage needs from single storage systems to distributed, hierarchical systems supporting automatic file migration. Vendors can build and market individual standard modules or integrated systems supporting standard interfaces and functionality. Hopefully, the development of standards will increase markets and lead to modules and systems manufactured in larger numbers, thus reducing costs as a result of mass production economies.

To better understand the modularization and the requirements placed on interfaces but not to force a particular design philosophy, the discussion in this document does not restrict itself to external interfaces and services as might be expected of a reference model. The intent is not to standardize the internal structure of modules, since this is implementation- and vendor-specific, but to provide additional understanding to aid the model building, interface standardization, and implementation processes.


2.3 Reference Model Architecture

2.3.1 Abstract Objects

To follow the description of the reference model, there are several concepts that should first be established. These concepts employ the properties of abstract objects (Watson 81a), which have been succinctly listed as:

• Objects are instances of a type (file, process, directory, account, etc.). As such, an object type is defined by:

- An identifier.

- A logical representation visible at an interface (e.g., a logical representation of a file is a set of attributes and a data segment of uninterpreted bits).

- A set of operations or functions and associated parameters presented at the interface to create, destroy, or manipulate the object.

- Specification of sequences of operations that are allowed.

• Objects are managed by servers. There can be many servers for a given type (e.g., there can be many file managers).

• Objects are of two basic classes, active and passive. To be manipulated, passive objects (such as files, directories, or accounts) must be acted on by requests from active objects presented at the server interfaces. Active objects, which are mainly processes, can directly change aspects of their own representation. Active objects can play either or both of two roles, a client role accessing a service, and a server role providing a service or managing a type of object.

• Objects are named via an identification scheme with a machine-oriented name that is unique throughout an environment. This identification scheme may be used in conjunction with protection and resource management schemes. Human-oriented naming is implemented by separate name servers that associate mnemonic, human-oriented names with the machine-oriented object identifiers. Higher-level file services might integrate the name and file services.

• Access to objects is controlled by the server through access lists, capabilities, or other techniques.
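The defining elements just listed (identifier, visible representation, operations, and server-mediated access) map naturally onto a typed interface. The sketch below is a hypothetical Go rendering, not part of the model: a bitfile is a passive object, and clients manipulate it only through the operations its server presents.

    package main

    // Bitfile is a passive object: an identifier, a set of attributes (state
    // fields), and a data segment of uninterpreted bits, visible only at the
    // server interface.
    type Bitfile struct {
        ID         [16]byte          // machine-oriented, globally unique name
        Attributes map[string]string // e.g. length, owner, timestamps
        Data       []byte            // uninterpreted bit string
    }

    // BitfileServer is the set of operations defined for the type; requests
    // from active objects (clients) are presented here, never to the Bitfile
    // itself.
    type BitfileServer interface {
        Create(attrs map[string]string) ([16]byte, error)
        Store(id [16]byte, data []byte) error
        Retrieve(id [16]byte) ([]byte, error)
        Destroy(id [16]byte) error
    }

    func main() {}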

2.3.2 Client/Server Properties

The client/server model (Watson 81a) is an object-oriented paradigm. Simply stated, both clients and servers are active abstract objects in which the client requests services from the server through a specified interface. The word client is used to mean the program that accesses some service. The word user is reserved to mean the human at the terminal. A client is an agent of a user. The server is a provider of a service. Access to server-supported objects or services is only through defined server interfaces, thus hiding implementation details to provide transparency. Both the client and the server may be processes or collections of processes. These processes are not necessarily associated with any particular host machine. We describe the client/server interactions in terms of messages, but it is understood that local or remote procedure calls (Birrell 82) or other communications paradigms are possible.

A server may be thought of as a collection of one or more tasks or processes (concurrently executing instruction streams). A server may include request processing and other tasks supporting concurrent handling of requests from many clients. Clients may also be constructed as many cooperating, concurrent tasks.

Client and server processes interact by sending each other messages, in the form of requests and replies. A message is the smallest unit of data that can be sent and received between a pair of correspondents for a meaningful action to take place. A client process accesses a resource by sending requests containing the operation specification and appropriate parameters from one of its ports to a server port. A given process can operate in both server and client roles at different times (Watson 84). For example, during the migration of files, a file server that manages magnetic disks can play the role of client to a file server that manages magnetic tapes. Another example is a name server that stores its catalogs in a file server.
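A minimal rendering of this request/reply exchange might look like the sketch below (Go, with invented message fields; the model specifies no wire format): the request carries the operation specification and parameters, the reply carries the success/fail indication, and the server runs as its own task handling requests concurrently with the client.

    package main

    import "fmt"

    // Request carries an operation specification and parameters from a
    // client port to a server port.
    type Request struct {
        Op     string // e.g. "RETRIEVE"
        Params map[string]string
    }

    // Reply reports the outcome; a transaction ID (elided here) would let a
    // client match replies to outstanding requests.
    type Reply struct {
        OK   bool
        Info string
    }

    // serve is a toy server task: an independent instruction stream that
    // handles requests as they arrive.
    func serve(in <-chan Request, out chan<- Reply) {
        for req := range in {
            out <- Reply{OK: true, Info: "handled " + req.Op}
        }
    }

    func main() {
        in, out := make(chan Request), make(chan Reply)
        go serve(in, out)
        in <- Request{Op: "RETRIEVE"}
        fmt.Println(<-out)
        close(in)
    }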

A distinction is drawn between the words server and service. A service may include several servers (Svobodova 84). For example, a directory service might be implemented by having separate name servers for objects such as files and for other objects such as users' addresses or printers. On the other hand, one might implement a universal directory to provide the whole directory service (Lantz 85), or one might choose to implement the file service defined by the ISO-OSI Virtual Filestore (DIS 8571), where this reference model serves the unstructured file segment. Thus a complete file service will likely consist of name servers and multiple file servers.

2.3.3 Reference Model Modules

The primary reference model modules, shown in Figure 1, are:

• The bitfile* server, which handles the logical aspects of bitfile storage and retrieval,

• the storage server, which handles the physical aspects of bitfile storage, and

• the physical volume repository, which provides manually or robotically retrievable shelf storage of physical media volumes.

Closely related to these modules are:

• The bitfile client, which is the programmatic agent of the user required to convert user desires into bitfile requests to the bitfile server and data transfer commands to the bitfile mover,

• the bitfile mover, which provides the components and protocols for high-speed data transfer,

• the name server, which provides the retention of bitfile IDs and the conversion of human-oriented names to bitfile IDs, and

• the site manager, who monitors operations, collects statistics, and establishes policy and exerts control over policy parameters and site operation.

* The word "bitfile" was coined by the IEEE-CS Technical Committee on Mass Storage Systems and Technology to refer to a bit string that is completely unconstrained by size and structure; it was coined to relieve those who worked on the model from being bound by any particular file management system.

These modules are not directly associated with any particular hardware or software products.

The modularity of the reference model defines a virtual store for bitfiles. The storage system can be implemented with many levels of storage hierarchy, including a physical volume repository. The structure of the model permits standard interfaces and multiple instances of modules, and thus should facilitate more economical implementation of many forms of storage architectures. There may be many different instances of bitfile server and storage server combinations in which storage servers need not be of the same technology and can form a hierarchy.

The bitfile client represents the program object or agent that accesses bitfiles. This agent is not the application but acts for the application. The bitfile client can take many forms depending on how the storage system is implemented and integrated into a particular user environment; it might be one or more application programs or be functionally supported within an operating system to facilitate access to storage. The bitfile client may run on personal computers, workstations, or on larger machines. The bitfile client can also be a part of a data acquisition system needing the services of a storage system. The bitfile client can locate bitfiles by use of a name server. The user's human-oriented bitfile names are mapped to bitfile IDs and bitfile server addresses by the name server.

[Figure 1. The Storage System Reference Model: a block diagram in which the application interface feeds the bitfile client; the bitfile client exchanges user names and bitfile IDs with the name server, client requests and replies with the bitfile server, and move commands, move completions, and bitfile transfers with a bitfile mover; the bitfile server exchanges SS requests and replies with the storage server, whose own bitfile mover carries bitfile transfers; the storage server exchanges PV requests, PV replies, and media volumes with the physical volume repository; and the site manager exchanges site control and site monitor traffic with all modules.]

It is the interaction of the bitfile client with the bitfile server, the bitfile server interaction with the storage servers, and the storage server interaction with the physical volume repository that are of particular interest. There may be any number of bitfile clients in the general system environment of a site. Furthermore, bitfile servers or other entities of the total storage environment, such as name servers, site managers, or migration modules, may operate in the role of bitfile clients when they need storage service themselves.

A bitfile server handles the logical aspects of the storage and retrieval of bitfiles. The bitfile server's abstract object is the bitfile, identified by a globally unique, machine-oriented bitfile ID. A bitfile is a set of attributes (state fields) and an uninterpreted, logically contiguous segment of data bits.

A bitfile server may keep track of the bitfiles stored in one or more storage servers. A single bitfile server may control a hierarchy and need the services of several storage servers. As an alternative, a single bitfile server may handle the bitfiles in a single level of the storage hierarchy or a single storage technology; multiple bitfile server-storage server pairs simplify extensibility and evolution.

The bitfile server accepts requests from bitfile clients to create, destroy, store, and retrieve bitfiles, and to modify and interrogate the bitfile attributes needed for system management. The bitfile server contains a request processor to parse the requests and control the sequence of actions necessary to fulfill the requests. Before permitting access to a bitfile, the bitfile server authenticates the access rights of the requestor.

The bitfile server communicates action commands to the storage server. Each bitfile server contains a migration manager to prevent overflow of the storage space for which it is responsible. The migration manager knows which bitfile server is used to offload bitfiles, as established by the site manager and by migration and caching policies.
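A migration manager of this kind often reduces to a watermark policy. The sketch below is one assumed scheme, not a model requirement (the model deliberately leaves migration policy to the site): when occupied space crosses a high-water mark, the least recently used bitfiles are offloaded to the designated lower-level bitfile server until a low-water mark is reached.

    package main

    import "fmt"

    // migrate offloads bitfiles, least recently used first, until usage falls
    // below the low-water mark; offload stands in for a store request to the
    // bitfile server one level down the hierarchy and returns the bits freed.
    func migrate(usage, capacity int64, lruOrder []string, offload func(string) int64) int64 {
        const hiWater, loWater = 0.90, 0.75
        if float64(usage) < hiWater*float64(capacity) {
            return usage // still below the high-water mark; nothing to do
        }
        for _, id := range lruOrder {
            usage -= offload(id)
            if float64(usage) < loWater*float64(capacity) {
                break
            }
        }
        return usage
    }

    func main() {
        free := func(id string) int64 { fmt.Println("offloading", id); return 300 }
        fmt.Println(migrate(950, 1000, []string{"bf-1", "bf-2"}, free))
    }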

The storage server handles the physical aspects of bitfile storage and retrieval, and presents the image of perfect media to the bitfile server. (The capacity of the media, influenced by imperfections, may be visible to the bitfile server.) The storage server's abstract objects, as seen by the bitfile server, are logical volumes made up of an ordered set of bit string segments.

Physical volumes and volume serial numbers are not visible to the client. The site manager, however, may have privileged storage server commands where physical volumes and devices are treated as visible objects of the storage server. Volumes and devices are identified by volume and device IDs.

Bit stream segments are identified by segment descriptors. These segments represent how the storage server has allocated space for the bitfile data blocks. Each segment is identified by a descriptor generated by the storage server, and the ordered set is retained as a bitfile attribute by the bitfile server. The storage server must internally map bit string segments to real physical volumes (removable or not) and addresses where the bitfiles are stored, provide read/write (and some error management) of those volumes, and be able to access a bitfile mover to transmit and receive bitfile data blocks. One or more logical volumes may be mapped to a given physical volume, or one logical volume may be mapped to several physical volumes. Thus, a storage server contains storage devices, device-specific controllers that map bitfiles to bit string segments on physical volumes, and a means for handling and managing physical volumes on the storage devices.
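The logical volume abstraction can be pictured as a table of segment descriptors. In this sketch (the field layout is an assumption for illustration), the storage server maps each bit string segment of a logical volume to a physical volume ID and an address range; both many-logical-to-one-physical and one-logical-to-many-physical mappings reduce to rows of this table.

    package main

    import "fmt"

    // SegmentDescriptor records where the storage server placed one bit
    // string segment; the ordered set is retained by the bitfile server as a
    // bitfile attribute.
    type SegmentDescriptor struct {
        PhysicalVolumeID string // e.g. a tape cartridge or disk pack
        Offset, Length   int64  // address range of the segment on that volume
    }

    // LogicalVolume presents perfect media to the bitfile server: an ordered
    // set of segments that may span several physical volumes or share one
    // physical volume with other logical volumes.
    type LogicalVolume struct {
        ID       string
        Segments []SegmentDescriptor
    }

    func main() {
        lv := LogicalVolume{
            ID: "LV-009",
            Segments: []SegmentDescriptor{
                {PhysicalVolumeID: "PV-A", Offset: 0, Length: 1 << 20},
                {PhysicalVolumeID: "PV-B", Offset: 4096, Length: 1 << 19}, // spans volumes
            },
        }
        fmt.Println(lv.ID, "maps to", len(lv.Segments), "physical segments")
    }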

The physical volume repository server manages a library that stores physical volumes such as tape reels, tape cartridges, or optical disk platters. Physical volumes, identified by physical volume IDs, are its abstract objects. A physical volume repository server can be used by one or more storage servers.

The site manager is a client process that can generate and send ordinary and privileged requests to the other servers to set policy parameters, install logical and physical volumes (import, export), obtain statistics and status, run diagnostics, etc.

The various clients and servers are interconnected through a communication service, which must handle all of the inter-process communications involved in requests and replies, as well as the high-speed transport of bitfiles. The elements of the communications service may be distributed through the many physical processors of the site. The reference model does not specify the details of this service but does assume its existence, whether by procedure call, remote procedure call, or message passing. In particular, the model does not require homogeneity of this communication service. The communication service can include movement across I/O channels as well as across networks. The model assumes the ability to separate data movement from request movement. All that is required is that entities that must communicate are, in fact, able to do so and that standard interfaces are supported. The model has separated authentication details to allow each site to install the form appropriate to that site.

Existing storage systems often include a high-performance data path of some type, often specially designed, to handle the high-volume, high-speed data flow between the bitfile client and the storage server. In the reference model, the need for a high-speed data path is incorporated as part of the communication service referred to as the bitfile mover. This path has been separated from the request path to correspond to existing practice; to facilitate third-party control of transfers; to facilitate insertion of data transformations such as encryption, compression, and check-summing; and to support transfer redirection. If a general network is used for the communication service, it has to account for this need for high transmission speed of bitfiles as well as the communication of requests and replies.


3. Detailed Description of the Reference Model

This description of the reference model discusses the entities shown in Figure 1 in more detail. In describing the functions accepted by each module, input parameters that are common to all functions are deleted for clarity. These parameters include the identification of the client making the request or other access-control information, accounting information, and a transaction identifier. Similarly deleted are the transaction identifier and the success/fail indication common to all responses. If functions fail, error information will replace the expected responses. Optional parameters that can be defaulted are enclosed in square brackets (Miller 88a, 88b).

3.1 Bitfile Client

The bitfile client is the collection of hardware and software at a user node within the site that permits the node to use the storage system. Bitfile clients are responsible for providing the storage system interface to users at the terminal or to application processes by:

• Translating user and application requests for storage services into bitfile server requests, and

• providing communication with the appropriate bitfile servers and movers as determined by the name server mapping.

The bitfile client may be library routines within the application, an interface process (local or remote), or routines within an operating system kernel, and it may combine the services provided by multiple bitfile servers, bitfile movers, and name servers to form the higher-level abstract objects of an integrated storage service. The syntax and semantics of messages that flow between the bitfile client and the bitfile server should be the same regardless of the type of bitfile client. These messages identify the bitfile to be acted upon and specify which of the basic commands and parameters are to be processed.

When a bitfile is created, the bitfile ID is generated by the bitfile server and passed back to the bitfile client for retention. When the user accesses an existing bitfile by name, the bitfile client obtains the bitfile ID from a name server or some other data base system.
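
To make this division of labor concrete, the following minimal sketch shows a create-then-bind sequence in Python; the class and method names (BitfileServer.create, NameServer.bind, resolve) are illustrative assumptions, not interfaces defined by the model.

    # Illustrative only: the bitfile server generates the bitfile ID at
    # create time; a name server separately maps a human-oriented name to it.
    import uuid

    class BitfileServer:
        def __init__(self):
            self.descriptors = {}

        def create(self):
            bitfile_id = uuid.uuid4().hex   # stand-in for a server-generated ID
            self.descriptors[bitfile_id] = {"length": 0}
            return bitfile_id

    class NameServer:
        def __init__(self):
            self.directory = {}

        def bind(self, name, bitfile_id):
            self.directory[name] = bitfile_id

        def resolve(self, name):
            return self.directory[name]

    server = BitfileServer()
    names = NameServer()
    bid = server.create()                   # ID passed back for retention
    names.bind("/project/results", bid)     # human-oriented name bound separately
    assert names.resolve("/project/results") == bid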

Several arrangements of the bitfile client are possible. In the first arrangement, the “kernel view”, all bitfiles are logically local. The bitfile client is a program in a processor with its own operating system and local storage. The storage and site-management capabilities of the local operating system are what the user sees with respect to how his bitfiles are handled. To the kernel of the operating system is added code that permits the operating system to determine if a bitfile being requested is locally stored or remote (Sandberg 85). If it is remote, this special code in the operating system kernel makes up the messages for the bitfile server and possibly for the name server. Alternatively, the local/remote distinction can be made in a library routine, and the library code can act as the bitfile client (Brownbridge 82). In either case, the bitfile is put into a local user buffer or is cached in an operating system buffer or a local bitfile and, except for a possible delay, the remote transfer is transparent to the user.

In a pure “client/server” view, all bitfiles are logically remote. Here, all references to bitfiles are translated into messages for a bitfile server, using routines in a library or other run-time support facility, such as a remote invocation system. The bitfile server might be local or remote; in a diskless workstation, for example, the bitfile server is remote. Mapping human-oriented names to machine-oriented bitfile IDs is a separate, explicit step or is hidden in the run-time support facility (Svobodova 84, Watson 84).


In a third view, the systems that create and store bitfiles are separate from the systems that retrieve and process the data contained in the bitfiles. Such might be the relationship between a data acquisition system that puts bitfiles into the storage system and the systems in a data processing center that use them. While there is no name server per se, some means must be provided to retain bitfile IDs when they are returned from the bitfile server and to pass them in some understandable way (e.g. using a data base management system) to the processing systems. Such systems must take care to back up their records; if the bitfile IDs are lost, the bitfiles become lost objects in the storage system.

3.2 Name Server

The development of distributed systems has prompted a much more in-depth look at schemes for identifying objects. While the advent of distributed systems brought this about, the requirements recognized are not restricted to distributed systems. They apply to all systems, especially those that grow in size. Dealing properly with naming is central to achieving the location transparency needed in a distributed system. Thus, it is advantageous to look at some of the properties of identification schemes.

There are many possible ways to designate a desired object (Watson 81b):

• by an explicit name or address (object x or object at address x),

• by content (object with value or value expression x),

• by source (all my files),

• by broadcast identifier (all objects of a class),

• by group identifiers (participants in class x),

• by route (all objects found at the end of path x),

• by relationship to a given identifier (all previous sequence numbers), etc.

A useful informal distinction between three important classes of identifiers widely used in system design—names, addresses, and routes—is (Shoch 78):

• The name of a resource is what we seek,

• an address indicates where it is, and

• a route tells how to get there.

One should not examine such informal definitions too closely. Names, addresses, and routes can occur at all levels of the architecture. Names used in the inter-process communication layer have often been given such names as ports, or logical or generic addresses. A human-oriented chain or path name can be thought of as a “route” through a directory. An important idea is that identifiers at different levels of the architecture referring to the same object must be bound together in contexts, statically or dynamically. Later they must be resolved using these contexts to locate the named objects.

There are important system benefits if every bitfile ID is unique (Leach 82). Less obvious are the system-wide ramifications of the total naming system, especially the choice of the particular mechanism used to create unique bitfile IDs and the mechanism to associate application-dependent, human-oriented names with them. Of the many goals and implications of identification schemes enumerated by (Watson 81b), the goals that are the most pertinent to the reference model are abstracted and discussed below.

The naming system should:

• Support at least two levels of identifiers, one convenient for people and one convenient for machines. The latter is the bitfile identifier. The former will be handled by site or operating system specification of the name servers or by imbedding a name service in a higher-level file service.

The separation of identifier levels is very important because a storage system must be integrated with many types of heterogeneous applications and operating and storage systems (centralized and distributed), each supporting its own form of human-oriented naming scheme. The reference model provides a clean separation of mechanism for these two levels of identifiers and allows their easy integration. (When the client is responsible for the use of the bitfile ID, there is the potential to create lost objects in a system, and thus mechanisms must also be included to assist the system in identifying them so that the resources they use can be reclaimed.)

• Support distributed generation of machine-oriented, globally-unique bitfile identifiers. A variety of mechanisms are available to support this need (Leach 82, Mullender 84, Watson 81b). One mechanism, sketched after this list, is to include both a bitfile server ID and a time stamp in the identifier. This structure, containing node or server boundary information, is at most a hint to applications as to where to send access requests and should not restrict migration. A machine-oriented identifier is a bit pattern easily manipulated and stored by machines and may be directly useful with protection, resource management, and other mechanisms. A human-oriented identifier, on the other hand, is generally a character string that is readable by humans and that has mnemonic value. Directory path names are a common mechanism (McLarty 84).

• Provide a storage system viewed as a global space of identified objects rather than as a space of identified host computers containing locally-identified objects. Similarly, the identification mechanism should be independent of the physical connectivity or topology of the system. That is, the boundaries of physical components and the connection among them as a network, while technologically and administratively real, are invisible in object identifiers. Further, an object's name should be independent of client or server location. Users should be able to discover or influence an object's location.

• Support relocation of objects. The implication here is that there be at least two lower levels of identifiers and that the mapping and binding between them be dynamic. For example, bitfiles are expected to migrate. Therefore, the bitfile IDs should not contain storage addresses, and there must be mechanisms for updating the appropriate context (e.g. directories and tables) when objects are moved.

• Support use of multiple copies of the same object. For example, a file may be cached on disks at one or more hosts, on staging disks, or it may be stored on an archival volume. If the value of the object is only going to be read or interrogated, one set of constraints is imposed. If values are to be written or modified, tougher constraints must be imposed to achieve consistency between the contents of the copies. Policies of enforcement of such constraints are handled using the basic locking services specified by the reference model.

• Allow multiple local, user-defined (human-oriented) names for the same object by allowing multiple mappings of a given bitfile identifier within the services of one or more name servers.

• Support two or more resources sharing a single instance of a storage object without identifier conflicts.

• Minimize the number of independent identification systems needed across and within architectural levels.
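
As an illustration of the distributed-generation goal above, the sketch below builds globally-unique, machine-oriented identifiers from a bitfile server ID, a time stamp, and a sequence number; the field widths are assumptions chosen for the example, not part of the model.

    # Illustrative ID generator: 48-bit millisecond time stamp and 16-bit
    # sequence number, with the server ID in the high-order bits. The
    # embedded server ID is at most a hint for routing access requests and
    # must not restrict migration.
    import time

    _sequence = 0

    def generate_bitfile_id(server_id: int) -> int:
        global _sequence
        _sequence = (_sequence + 1) & 0xFFFF           # disambiguates same-tick IDs
        timestamp = int(time.time() * 1000) & ((1 << 48) - 1)
        return (server_id << 64) | (timestamp << 16) | _sequence

    bid = generate_bitfile_id(server_id=7)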

3.3 Bitfile Server

A bitfile server (Falcone 88) handles the logical aspects of bitfiles that are physically stored in one or more storage servers of the storage system. As shown in Figure 2, the major components of a bitfile server are a bitfile server request processor, a bitfile descriptor manager and its descriptor table, a migration manager, a bitfile ID authenticator, and a space limit manager and its space limit table.


Figure 2
The Bitfile Server

(The figure shows the bitfile server request processor connected to the bitfile descriptor manager and its bitfile descriptor tables, the migration manager, the bitfile ID authenticator exchanging authentication requests and replies, and the space limit manager with its space limit table. External paths carry client requests and replies, storage server requests and replies, and connections to the site manager, the account server, and other bitfile servers.)

The bitfile server accepts requests for service on bitfiles from the bitfile client, site managers, migration manager, and other bitfile servers. A discussion of the operations that bitfile clients can request of the bitfile server follows. The function parameters are shown in Table 1.

3.3.1 Bitfile Server Commands

Abort
The client requests that a previous request be aborted.

Create
This request is used to establish a new entry in the bitfile server's descriptor table. The requestor must prove his right to do so, and when this is established, he receives a new bitfile ID from the bitfile server, which may then be saved for use when the bitfile is accessed later.

Destroy
The client requests that a bitfile descriptor be removed from the bitfile descriptor table. The space allocated to the bitfile within a storage server is deallocated and, if the medium can be rewritten, the storage server returns it to the free-space list.

Erase
This request erases data on a storage server with the specified erasure pattern. If only a segment is to be erased, then the client must specify the length of the segment and an optional offset field displacing the length into the bitfile.


Lock
The client requests that a bitfile be locked for read access or for read/write access.

Modify
The client requests a change in one or more attributes of a bitfile contained in the bitfile descriptor table.

Query
The client requests information about a bitfile or a bitfile's attributes from the bitfile descriptor table.

Retrieve
The client requests that a bitfile or a segment of a bitfile be moved from the storage server medium to the bitfile client or application buffer. If only a segment is to be moved, then the internal starting bit address (offset) and the bit string length of the segment to be transferred must be specified. (The data transfer is on a separate logical, and perhaps physical, path from the request/response path; the data block itself is not part of the response.)

Status
The client requests the status of a previous request made by the client. Although the way in which status is implemented is site dependent, generally a transaction ID must be generated to support the status request.

Store
The client requests that a bitfile data block be moved from a bitfile client or application buffer to the storage server medium. This request may include the ability to append a segment at the end of an existing bitfile or to update a physically specified segment of an existing bitfile. (The data block itself is not part of the request.)

Unlock
The client requests that any locks held be released.

The interface on which this list of commands can be sent is shown in Figure 1 as bitfile client requests and bitfile server replies.

The site manager will have additional privileged requests to control allocated space limits, examine all bitfile directory fields, set access control and migration policy parameters, etc.

3.3.2 Bitfile Request Processor

The bitfile server request processor accepts, parses, and executes request messages from:

• The bitfile clients, to store, retrieve, and manage bitfiles,

• the site manager, to provide monitoring and control,

• the migration manager, to move bitfiles to other bitfile servers, and

• other bitfile servers, to support migration.

The request processor is therefore essentially the sequencer and controller of actions within the bitfile server and the interface to the other storage system modules. Requests must be scheduled to provide the best possible response to the bitfile clients while optimizing the use of the available resources. Client-specified priority, bitfile size, and storage server availability may affect the request scheduling. These actions require a significant amount of processing. In executing a request, the request processor may interact with the bitfile descriptor manager to retrieve, create, or update bitfile attribute information, and with a storage server to allocate or release logical volume space for bitfile storage and to store and retrieve bitfiles. To select a storage server when a bitfile is created, the request processor must have information about the bitfile (bitfile size, the response desired, the protection and reliability desired, the type of storage desired, etc.) and must match this information with the characteristics of the available storage servers. Bitfile clients might be able to specify a specific storage server or logical volume as well.

Table 1
Bitfile Server Functions

Function    Parameters                          Response

Abort       Transaction ID

Create      [Initial length in bits]            Bitfile ID
            [Maximum length in bits]
            [Attribute name/value pair list]

Destroy     Bitfile ID

Erase       Bitfile ID                          Number of bits erased
            [Offset]
            [Length]
            Erasure pattern

Lock        Bitfile ID
            Lock type

Modify      Bitfile ID
            New attribute name/value pairs

Query       Bitfile ID                          Attribute name/value pair list
            Attribute name list

Retrieve    Bitfile ID                          Number of bits transferred
            [Offset]
            [Length]
            Data transfer sink ID

Status      Transaction ID                      Transaction status

Store       Bitfile ID                          Number of bits transferred
            [Offset]
            [Length]
            Data transfer source ID

Unlock      Bitfile ID

The request processor is responsible, using the bitfile ID authenticator, for the security and integrity of the access to bitfiles, and for synchronizing the sharing of bitfiles through its locking services. The bitfile request processor collects accounting data from all affected sources regarding each transaction and sends them to the account service. The request processor also communicates with the space limit manager to determine that the resources assigned to a particular account are not overdrawn.
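
The command set of Table 1 can be pictured as an abstract request interface. The sketch below renders a few of the functions in Python, with the bracketed optional parameters of the table shown as optional arguments; the signatures are illustrative assumptions, not a normative API.

    # Illustrative rendering of part of Table 1.
    from abc import ABC, abstractmethod

    class BitfileServerInterface(ABC):
        @abstractmethod
        def create(self, initial_length=None, maximum_length=None,
                   attributes=None) -> str:
            """Establish a new descriptor table entry; returns the bitfile ID."""

        @abstractmethod
        def store(self, bitfile_id: str, source_id: str,
                  offset=None, length=None) -> int:
            """Move a data block to the storage server medium; returns bits moved."""

        @abstractmethod
        def retrieve(self, bitfile_id: str, sink_id: str,
                     offset=None, length=None) -> int:
            """Move a bitfile or segment to a client buffer; returns bits moved."""

        @abstractmethod
        def destroy(self, bitfile_id: str) -> None:
            """Remove the descriptor and deallocate storage server space."""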

3.3.3 Bitfile Descriptor Manager and Descriptor Table

State and attribute information for each bitfile is kept in records in a descriptor table. Each record is called a bitfile descriptor. A descriptor manager provides an interface for the request processor to store, retrieve, and update bitfile descriptors. Bitfile descriptors are accessed using a bitfile ID as a key, which is assigned by the descriptor manager when the bitfile descriptor is created.


A convenient way to classify bitfile descriptors is by origin and usage. Typical bitfile descriptor classes and some examples are:

• Created and used by the bitfile client.

- comment
- bitfile format

• Created by the bitfile client and used by the bitfile server.

- access-control information
- account ID
- bitfile lifetime
- desired level of redundancy
- family attribute
- maximum bitfile length
- priority
- security level
- service class
- storage class
- type of storage desired

• Created by the bitfile server and used by both the bitfile server and the bitfile client.

- access statistics
- accounting statistics
- bitfile allocated length
- bitfile ID
- bitfile length
- creation time

• Created and used by the bitfile server.

- last backup time
- last migration time
- location of backup copy
- lock information
- previous location

• Created by the storage server and used by the bitfile server.

- last device to write bitfile
- location of bitfile

The importance of descriptor tables necessitates that backup and recovery be supported by the descriptor manager.
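
A bitfile descriptor can be pictured as a record grouping such attributes. The sketch below selects a few fields from the classes above; the field set and representations are illustrative assumptions.

    # Illustrative descriptor record; the descriptor manager keys the
    # descriptor table by bitfile ID.
    from dataclasses import dataclass, field

    @dataclass
    class BitfileDescriptor:
        bitfile_id: str                       # created by the bitfile server
        account_id: str                       # created by the bitfile client
        security_level: int = 0               # client-created, server-used
        bitfile_length: int = 0               # in bits, maintained by the server
        allocated_length: int = 0
        lock_info: dict = field(default_factory=dict)
        segments: list = field(default_factory=list)  # from the storage server

    descriptor_table: dict[str, BitfileDescriptor] = {}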

3.3.4 Bitfile ID Authenticator

The bitfile ID authenticator implements a mechanism, such as an access list or DES encryption used in a capability system, which protects the bitfile ID from being forged. It may also enforce security policy based on the security level of the bitfile, the request message, or the client. The authenticator is called by the descriptor manager when the bitfile ID is created to support the authentication mechanism and, when a request for access to the bitfile is received, to authenticate the bitfile ID presented by the client. If the access control is via an access control list, an identifier of the accessing entity (principal ID) must accompany the request and be checked against an access list that is kept, at least logically, in the bitfile descriptor. If access control is via a capability system, the bitfile ID may be encrypted along with some redundant and access-right information within the capability, and decrypted by the authenticator and compared against information in the descriptor when the bitfile is accessed. It is assumed that the authentication module can be added by a site or systems integrator since access control mechanisms and security policies are site-dependent (Jones 79b, Donnelley 80, Mullender 84).
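
As one concrete illustration of the capability approach, the sketch below protects a bitfile ID and its access rights with a keyed message authentication code; HMAC stands in here for the DES-based encryption mentioned above, and the key name and capability layout are assumptions.

    # Illustrative capability check, not the model's prescribed mechanism.
    import hashlib
    import hmac

    SERVER_KEY = b"site-installed-secret"    # hypothetical site-specific key

    def make_capability(bitfile_id: str, rights: str) -> str:
        tag = hmac.new(SERVER_KEY, f"{bitfile_id}:{rights}".encode(),
                       hashlib.sha256).hexdigest()
        return f"{bitfile_id}:{rights}:{tag}"

    def authenticate(capability: str):
        """Return (bitfile_id, rights) if the capability is genuine."""
        bitfile_id, rights, tag = capability.rsplit(":", 2)
        expected = hmac.new(SERVER_KEY, f"{bitfile_id}:{rights}".encode(),
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(tag, expected):
            raise PermissionError("forged bitfile ID capability")
        return bitfile_id, rights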

3.3.5 Migration Manager

No single storage server now available can provide both the performance and large capacity often needed by a large storage system. A successful strategy is for a number of bitfile servers and their associated storage servers to be operated as a storage hierarchy.

A migration manager is associated with each bitfile server. The migration manager is responsible for maintaining enough free space on the logical volumes managed by its bitfile server (e.g. disk storage) to ensure that requests for new bitfiles can be honored. When the migration manager initiates a migration procedure, it first queries the bitfile descriptor manager for information about all of the bitfiles that might be migrated. This information might include the bitfile priority, size, locks, activity, idle time, and client-desired response. Bitfile clients may be given different degrees of control, by various site management policies, over the placement of their bitfiles in the storage hierarchy. Using policy set by the site manager, the migration manager determines which bitfiles should be moved. Finally, the migration manager sends requests to the bitfile server request processor to move the bitfiles to a bitfile server “lower” in the storage hierarchy. (Bitfiles move “up” in the hierarchy, toward higher-performance servers, as they are accessed; this movement is orchestrated by the bitfile server request processor.)
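
A minimal sketch of that decision loop follows, assuming a free-space goal as the site policy and hypothetical query_candidates and move_down interfaces on the descriptor manager and request processor.

    # Illustrative migration pass: free space is recovered by moving the
    # least-recently-used, unlocked bitfiles down the hierarchy.
    def run_migration(descriptor_mgr, request_processor, volume, free_space_goal):
        candidates = descriptor_mgr.query_candidates(volume)   # hypothetical call
        candidates = [c for c in candidates if not c.locked]
        candidates.sort(key=lambda c: c.idle_time, reverse=True)
        freed = 0
        for c in candidates:
            if volume.free_space + freed >= free_space_goal:
                break
            request_processor.move_down(c.bitfile_id)          # hypothetical call
            freed += c.size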

An alternate configuration may permit the migration manager to act as a third-party controller to initiate the request for a move. The separate request and data paths in the reference model allow data to move directly from a source storage server to a sink storage server, even though a third party initiated the transfer between the two bitfile servers. A request path may span two or more bitfile servers until the bitfile is located. To increase performance during retrieval, it may be desirable to establish a direct data transfer path, bypassing some storage servers, once the bitfile has been located. Such might be the case when bitfiles are accessed on very rare occasions and it is not economic to bring them back up the hierarchy.

3.3.6 Space Limit Manager

The space limit manager checks to see what logical volumes a given account, user, or user group is allowed to use; it controls space allocations, the number of bitfiles allowed, or other policy parameters associated with space resource management that a given site may wish to enforce. The space limit table has entries for each account or principal ID for maximum and current space and bitfile limits.
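
The check itself is simple; a minimal sketch, assuming a table keyed by account or principal ID, is:

    # Illustrative space limit table and allocation check.
    from dataclasses import dataclass

    @dataclass
    class SpaceLimits:
        max_bits: int
        current_bits: int
        max_bitfiles: int
        current_bitfiles: int

    space_limit_table: dict[str, SpaceLimits] = {}

    def may_allocate(account_id: str, requested_bits: int) -> bool:
        """True if the account can hold one more bitfile of requested_bits."""
        limits = space_limit_table[account_id]
        return (limits.current_bits + requested_bits <= limits.max_bits
                and limits.current_bitfiles + 1 <= limits.max_bitfiles)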

3.4 Storage Server

A storage server (Savage 88) may best be visualized as an intelligent storage controller and its suite of storage devices. A storage server consists of a physical storage system (containing the physical bitfile-storage medium), a logical-to-physical volume manager, a physical device manager, a means of command authentication (unless it is a trusted component of the storage control processor), and some part of or intimate connection to the bitfile mover. A diagram of the storage server is shown in Figure 3.

The abstract objects of the storage server that are visible to the bitfile server are logical volumes and bit string segment descriptors. The space occupied by a bitfile forms an ordered set of bit string segments identified by descriptors, each of which contains the logical volume ID, the starting point of the segment on the logical volume, and the length of the segment. The bit string segment descriptors are created by the storage server and stored in the descriptor tables of the bitfile server.
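
A segment descriptor can be sketched as follows; the record layout is an illustrative assumption consistent with the fields just named.

    # Illustrative bit string segment descriptor.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class SegmentDescriptor:
        logical_volume_id: str
        start_bit: int       # starting point of the segment on the logical volume
        length_bits: int

    # A bitfile's space is an ordered set of such segments, retained in the
    # bitfile server's descriptor tables.
    segments = [
        SegmentDescriptor("LV-0042", start_bit=0, length_bits=8_000_000),
        SegmentDescriptor("LV-0042", start_bit=16_000_000, length_bits=2_000_000),
    ]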

Each logical volume is considered to be a logical image of flawless media usable for storing bitfile data blocks, thus providing the separation of physical and logical space. Separation of logical and physical volumes supports segment relocation when media fail, when new storage devices are introduced, and when space utilization or transfer rate are optimized. Any media area that is unavailable for data storage because of flaw areas, formatting, control tracks, etc., is excluded from representation in the logical volume by the logical-to-physical mapping function.

The operations supported by the storage server are listed in Table 2. The site manager has a number of privileged operations, including create, destroy, modify, and query of logical volumes, physical volumes, and physical devices.

3.4.1 Physical Storage System

A physical storage system consists of the devices used to read and write volumes and the drivers to control those devices (to position heads properly in relation to the media before reading or writing, etc.). The available physical storage systems cover a broad spectrum of characteristics in terms of random or sequential access, rewritable or write-once media, capacity, and performance.

Figure 3
The Storage Server

(The figure shows the storage server request processor, which receives store/retrieve requests from the bitfile server and the site manager through a request authenticator and returns replies; the logical-to-physical volume manager with its logical-to-physical map, logical volume tables, and physical volume tables; the physical device manager with its physical device descriptor table; the physical storage system; and the bitfile mover. The physical device manager communicates with one or more physical volume repositories, which hold the physical volumes.)

3.4.2 Physical Device Manager

The physical device manager communicates with drivers in the physical storage system to load, unload, and position media volumes (it is the bitfile mover that controls the actual transfer of data).

Physical device managers vary from simple modules associated with fixed-media devices, such as Winchester disks, to complex modules that deal with manually mounted volumes, as in systems with standard magnetic tape drives, or automatically mounted volumes, such as in the IBM 3850 and the STK 4400 Automated Cartridge Library. In automated systems, the physical device manager communicates with a physical volume repository to request that physical volumes be mounted. The physical device manager maintains a mounted volume table to optimize mount requests. It schedules and executes requests in a manner that attempts to give the desired response to its clients while at the same time making the best use of the storage system and communication resources. For example, it may be desirable to give higher priority to transfers for which volumes are already mounted or to small bitfile transfers, to limit the number of concurrent large bitfile transfers, or to use a client-specified priority.

Table 2
Storage Server Functions

Function        Parameters                      Response

SS-Allocate     logical volume IDs              new segment descriptors
                existing segment descriptors
                desired length

SS-Deallocate   existing segment descriptors

SS-Retrieve     segment descriptors             number of bits transferred
                starting offset
                bit string length
                sink descriptor

SS-Store        segment descriptors             number of bits transferred
                starting offset
                bit string length
                source descriptor

3.4.3 Logical-to-Physical Volume Manager

The logical-to-physical volume manager maintains descriptors of attributes for each logical and physical volume and a set of tables to permit mapping the bit string segment descriptors of the logical volumes onto physical space in one or more physical volumes. The bit string descriptors include the volume serial number, starting point address, and attributes for each logical and physical volume. Examples of attributes are creation time, size, security level, and physical volume attributes.

The logical-to-physical volume manager understands the characteristics of the actual physical volumes used by the storage server. Its main functions are to allocate and deallocate space and to convert logical bit string segments to physical bit string segments for a bitfile. The logical-to-physical volume manager also maintains a flaw map to map, for example, defective tracks on a magnetic disk to spare tracks (some device controllers maintain flaw maps, making duplicate maps in the volume manager unnecessary). Similarly, it maintains a map of disk tracks or magnetic tape block numbers that are used for control and formatting and that are thus unavailable for data storage. When data is moved within the storage system because errors start to occur or new physical devices or volumes are introduced, the map must be changed.
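
The mapping step can be sketched as a walk over the usable physical extents backing a logical volume; flawed or control areas simply never appear in the extent list. The structures here are illustrative assumptions.

    # Map a logical bit offset to (physical volume ID, physical bit offset).
    def to_physical(extents, logical_offset):
        remaining = logical_offset
        for pv_id, phys_start, length in extents:
            if remaining < length:
                return pv_id, phys_start + remaining
            remaining -= length
        raise ValueError("offset beyond end of logical volume")

    # Physical bits 5_000..8_999 are flawed, so they are absent from the map.
    extents = [("PV-9", 1_000, 4_000), ("PV-9", 9_000, 4_000)]
    assert to_physical(extents, 4_500) == ("PV-9", 9_500)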

A map of the free and used space is maintained for each logical and physical volume. Space summary information for each volume may be retrieved to aid in the volume selection process. This volume information is retained in the storage tables, which must have the same reliability and performance as the directory in the bitfile server, i.e., it must be backed up and recoverable or it must be possible to build the information from other records. All of this information is available to the site manager interface.


3.5 Physical Volume Repository

The physical volume repository (Coleman 88, Savage 85), shown in Figure 4, stores physical volumes. It may be manual or mechanical.

The physical volume repository is responsible for managing the storage of media volumes and for mounting these volumes onto drives managed by the physical device manager. Volumes may be stored in an automated library that includes a robot capable of mounting the volumes, or stored in a vault and mounted manually.

The architecture of the physical volume repository is that of a server that manages abstract objects called physical volumes. A physical volume consists of a media volume and a volume descriptor. (A physical volume is similar to a bitfile in that both include a resource and a resource descriptor.) The volume descriptor contains at least the following fields:

• The current physical location of the media volume. The volume might be in a vault, in a storage cell of an automated device, mounted on a drive, or held by a robot.

• A human-readable label by which an operator can identify the volume.

• The media type. One physical volume repository might manage both magnetic and optical media, different varieties of magnetic tape, etc.

• Information to identify the owner of the volume.

• Access-control information to validate requests. In a capability-based system, this information might be an encryption key. In other systems, this information might be a list of clients authorized to access the volume.

• Various statistics associated with the volume, such as the number of times the volume has been mounted and the time of the last mount.

Associated with each physical volume is a volume identifier. This identifier, when included in a request, allows the physical volume repository to locate the descriptor for the media volume and, in a capability-based system, proves that the client is authorized to access the volume. The format of the volume identifier is not specified by the reference model. If the medium is optical disk and only one side of a physical disk can be read at a time, there may be a unique volume identifier associated with each side of a disk.
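
Taken together, these fields suggest a descriptor record along the following lines; the representations are illustrative assumptions.

    # Illustrative volume descriptor for the physical volume repository.
    from dataclasses import dataclass

    @dataclass
    class VolumeDescriptor:
        volume_id: str
        location: str           # e.g. "vault", "cell 12", "drive 3", "robot"
        label: str              # human-readable external label
        media_type: str
        owner: str
        access_control: bytes   # site-dependent: key or authorization list
        mount_count: int = 0
        last_mount_time: float = 0.0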

The physical volume repository maintains the volume descriptors on a storage device to which it has access. The physical volume repository cannot maintain the volume descriptor on the volume itself because:

• The reference model does not specify the format of the data on a volume. In some implementations, the physical device manager may be able to read volume labels (using a bitfile mover), but if unlabeled volumes are allowed, only the bitfile client or the ultimate application can interpret the contents of the volume.

• Most types of archival media do not support “update in place”, preventing the physical volume repository from maintaining dynamic information, such as the time of last access, on the volume itself. Information on WORM optical disks, once written, cannot be modified. Some volumes, such as CD-ROMs, cannot be written at all.

• One of the important pieces of information in the volume descriptor is the physical location of the volume. One can hardly access the volume to determine where it is!

Figure 4
The Physical Volume Repository

(The figure shows the repository request processor, which receives requests from the storage servers and the site manager through a request authenticator; the volume descriptor manager with its volume descriptor table; the automated library driver; and the operator interface. The repository manages the physical volumes themselves.)

The client interface consists of the operations necessary to allow the physical device manager to mount and dismount volumes and to allow the site manager to query and change the state of the repository. The operations and parameters that are unique to the physical volume repository are listed in Table 3 and described in the following paragraphs.

PVR-Dequeue
Any queued request for the specified volume with the specific write-protect mode that includes the specified drive as an acceptable drive is cancelled.

The dequeue function is routinely used by the physical device manager to remove requests for manually mounted volumes. Even though the physical volume repository maintains the queue of requested volumes, the physical device manager may be the only module able to detect that a volume has been mounted on a drive not accessible to the physical volume repository. If an operator inserts a requested volume into an automated library, the physical volume repository will mount the volume on an available drive; if the physical volume repository can identify the volume by reading an external label, and a request for this volume is queued, the physical volume repository will choose a drive acceptable for that request. Otherwise, the physical volume repository will choose any drive capable of handling the volume. In any event, the physical volume repository will not remove the request from the queue until it receives a dequeue command from the physical device manager.

Table 3
Physical Volume Repository Functions

Function        Parameters                          Response

PVR-Dequeue     volume ID
                write-protect mode
                drive ID

PVR-Dismount    volume ID
                drive ID

PVR-Eject       volume ID
                reason

PVR-Locate      volume ID                           current volume location

PVR-Mount       volume ID                           drive ID or queued for manual mount
                write-protect mode
                list of acceptable drives

PVR-ReadQueue   queue offset
                maximum number of entries to send

PVR-ReadStatus  device ID                           device status
                type of status desired

PVR-SetStatus   device ID
                type of status
                desired value

PVR-Dismount
The media volume on the specified drive is dismounted and stored in a location selected by the physical volume repository.

The volume identifier is included in the dismount command to allow the physical volume repository to update its records; if the physical volume repository has a mechanism to identify the volume itself (by reading an external label), the volume identifier serves to confirm the physical volume repository's records and to detect anomalies.

PVR-Eject
The volume is removed from the domain of the physical volume repository. The “reason” parameter is an optional string to be sent to the operator explaining why the volume is being ejected.

In an automated system, the Eject command will probably result in the volume being moved to a port accessible by the operator.

PVR-Locate
The PVR-Locate function is used to determine volume locations when, for example, an automated library has failed and volumes are being accessed manually.

PVR-Mount
A media volume is mounted on a drive. Volumes queued for manual mounting are displayed on an operator console if the physical volume repository controls such a device (remotely controlled consoles can use the PVR-ReadQueue command described below). Some physical volume repository implementations may allow concurrent requests in which no volumes of a group are mounted until all can be mounted, or requests with a choice of volumes.

PVR-ReadQueue
For each request queued for a manual mount, the volume identifier, list of acceptable drives, and the write-protect mode are returned.

Providing a queue offset and a maximum number of entries to send in the PVR-ReadQueue command allows the client to receive only the number of entries that it can handle. This function also supports operator displays not under the control of the physical volume repository.

PVR-ReadStatus
The amount and type of status information is dependent upon the devices controlled by the physical volume repository and upon their configuration. Status information might include the on-line status of the device, the volume identifier of the volume mounted on the device, current or previous error information, configuration information, etc.

PVR-SetStatus
The particular status values are dependent on the devices controlled by the physical volume repository. This function is used to bring devices on-line, take them off-line, set diagnostic or manual modes, etc.

3.6 Communication Service

The communication service includes the capability to communicate request messages as well as the bitfile mover (Kitts 88) for high-speed transfer of bitfile data blocks (Allen 83).

A bitfile mover is a set of modules that move data from one source/sink to another. A storage system includes at least two bitfile movers (Figure 1), one controlled by the bitfile client and the other controlled by the storage server. Additional movers may be required for more complex routing. Figure 5 shows the control and data paths necessary to move data from source to sink.

A source or sink can be defined as:

• A memory buffer, local or remote,

• a media extent, such as on a local or remote disk, or

• a channel interface connected to a device.

These definitions do not limit the methods of data transport used by the bitfile mover or the ability to transform data during the move. Because the mover's source and sink interfaces depend on the devices, network interfaces, and network protocols used by the site, the reference model does not specify them. The bitfile mover's control interface to the source or sink manager, however, can be specified.

The Move operation supported by the mover is shown in Table 4.

The source and sink descriptors may describe network interfaces, buffer addresses, or device descriptors (device addresses, block information, etc.). One descriptor is sent by the bitfile client; the other is provided by the storage server.

The transformation description may specify translation, compaction, compression, encryption, and/or check-summing to be performed by the mover.

The site manager interface can, through privileged commands, interrogate channel status and other mover statistics.
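
The Move operation of Table 4 (below) can be sketched as a copy loop with an optional transformation pipeline between source and sink; the callable-based descriptors here are illustrative assumptions, since the model leaves the source and sink interfaces unspecified.

    # Illustrative mover: returns (source bits moved, sink bits moved), which
    # differ when a transformation such as compression is applied.
    import zlib

    def move(read_source, write_sink, transformations=()):
        src_bits = sink_bits = 0
        for block in read_source():
            src_bits += len(block) * 8
            for transform in transformations:
                block = transform(block)
            write_sink(block)
            sink_bits += len(block) * 8
        return src_bits, sink_bits

    out = []
    moved = move(lambda: iter([b"x" * 4096]), out.append,
                 transformations=(zlib.compress,))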

Figure 5
The Mover

(The figure shows the bitfile mover and a source/sink, each under the control of a source/sink manager; control paths run from the manager to the mover and to the source/sink, while the data path runs between the source/sink and the bitfile mover.)

Table 4
Mover Function

Function   Parameters                  Response

Move       direction                   number of source bits moved
           source descriptor           number of sink bits moved
           sink descriptor
           transformation descriptor

3.7 Site Manager

Site management (Collins 88) is the collection of functions that are primarily concerned with the control, performance, and utilization of the storage system. These functions are often site-dependent, involve human decision making, and span multiple servers. The functions may be implemented as stand-alone programs, may be integrated with the other storage system software, or may be policy.

Site management attempts to allocate the resources of the storage system to the best use for the overall benefit of the site. Policies for the site must be set, and the manual and automatic procedures must be developed to implement those policies. The procedures must be adaptable because the requirements will change as time progresses and because the same software may be run at a number of different sites.


For this discussion, the site management functions are grouped into seven areas: storage management, operations, systems maintenance, software support, hardware support, administrative control, and bitfile management. The format will be to state the requirements and then discuss the tools needed to satisfy these requirements for each area.

3.7.1 Storage Management

3.7.1.1 Requirements

The storage management function is concerned with providing good performance and reliability for user storage and access needs, while utilizing the storage servers in an efficient and cost-effective manner. The specific goals are to optimize the overall performance of the storage servers, to control placement of bitfiles in the storage hierarchy, to maintain sufficient free space in each storage server, to control fragmentation of space on volumes, to add and delete volumes, to recover data from bad volumes, to implement data backup policies, to enforce space allocation policies, and to determine the need for equipment. Most of the activities should be automated to the extent that the task of the human system administrator is primarily one of monitoring summary reports and using reports for planning purposes.

3.7.1.2 Tools Needed

The key to storage management for any storage system is the ability to gather and utilize information about the current state of the storage servers and statistics on their transaction histories. The total space, total free space, and distribution of free space on individual volumes define the state of a storage server. Information should be extracted from the transaction log of each storage server to give the number of bitfile accesses, amount of data transmitted, average and mean response times, average and mean data transfer rates, and patterns of access by bitfile age, activity, and size. Performance monitor programs are needed to provide information such as the average wait and response times, resource utilization, demand and contention, and queue lengths for storage system components.

The migration manager component of the bitfile server is the primary tool for implementing storage management policies. The migration manager uses guidelines set by the system administrator as well as historical data and current state data to determine the amount of free space to keep available for each storage server, to decide what bitfiles (by activity, size, etc.) should be stored on each storage device, and to determine which bitfiles to move within the hierarchy to keep active bitfiles readily accessible.

The storage system should have automatic fragmentation control. Information about the amount of free space and allocated space can be used to determine when to re-pack volumes. This function may be performed by the migration manager or by some other storage server module.

Programs to analyze, initialize, and label volumes should be provided, along with storage server commands to add and delete volumes. Two distinct types of volumes exist. Volumes like fixed magnetic disks are not demountable and are usually defined to the operating system at system generation time. Demountable volumes such as tapes or optical disks are not subject to these constraints.

Storage servers and physical volume repositories should manage their demountable volumes largely without human intervention. The only human activities involved are occasional monitoring and revision of control parameters and supplying empty volumes to the storage server or physical volume repository when needed.

Two areas of storage management require the direct involvement of knowledgeable persons. The first is the recovery of data from bad volumes. Programs are needed to analyze, display, and modify information on a volume, and to copy the entire contents of a volume to another volume without changing bitfile locations, skipping bad data which cannot be read after a number of retries. Data recovery may use the migration manager to evacuate a bad volume by moving individual bitfiles. The decision as to which type of recovery should be used in a particular case must be left to an experienced person.


System planning also requires human involvement. As new products are developed and old ones discontinued, changes in the storage servers are needed. In addition, changes in the network or user environment may require changes in the storage management policy or implementation. Statistical information may be used to decide when storage servers/devices should be enhanced, acquired, and phased out. Changing patterns seen in usage information may be the best indicator that changes are needed in the policies of storage management.

3.7.2 Operations

3.7.2.1 Requirements

The operations functions are concerned with making sure that the storage system operates continuously and ensuring that user requests are being satisfied in a timely and reliable manner. The system must be monitored and controlled to identify and resolve problems, to load/unload off-line volumes, and to verify that site management jobs have run correctly.

3.7.2.2 Tools Needed

An intelligent operations control center spanning the complete storage system is needed. Console displays need to show active requests for each server, requests queued for each server, volumes mounted for each storage server, space summary information (total number of volumes, number of empty volumes, free space, etc.) for each storage system in the hierarchy, resource status (processors, storage controllers, storage devices, communication links, volumes, etc.), and a special display for resources that are suspect or unavailable. Operator commands should be available for each server to restart or abort requests, and to set resources available or unavailable.

Storage system log information is needed. Messages that require action, such as volume mounts and error messages, should go to a display and/or hard copy console. All messages should be kept in a data base where they can be easily retrieved and displayed.

Job summary information is needed. Successful completion messages and error messages for system jobs should be written to a data base where they can be reviewed. When an important system job fails, such as backup of the bitfile descriptor tables, an operator action message should be issued.

The operational means to recover from temporary and permanent failures is needed. This includes the ability to isolate equipment which is failing or needs preventive maintenance (e.g. tape drives needing cleaning) and the ability to switch to backup equipment.

Automation of operations is needed to maximize the performance and reliability, and to minimize the manual effort. This includes automation of volume loading using a physical volume repository and automation of the decision-making process to minimize human errors and human delays.

3.7.3 Systems Maintenance

3.7.3.1 Requirements

The systems maintenance functions strive to maintain the performance, reliability, and availability of the storage system, and the integrity of the stored data. Performance is supported by monitoring the individual components and devices as well as the overall storage for failing components or out-of-balance conditions. The key to reliability and availability is the preservation of critical system information in an environment of possible hardware errors, software errors, and system crashes. This information includes name server directories, bitfile descriptor tables, space limit tables, physical volume tables, physical device descriptor tables, logical-to-physical maps, logical volume tables, network configuration tables, and transaction logs.

3.7.3.2 Tools Needed

System programmers must have the ability to quickly make changes in operating system or storage system parameters that affect the performance of the system. These tuning parameters may be available at execution and/or compile time. A “super-user” capability is needed so that a system programmer can execute all commands and have access to all system data.


Tools to maintain the integrity of information are needed. Programs must be available to back up bitfiles and volumes, and to restore information from the backups. Additional tools must be available for important, dynamic tables and data bases where a backup quickly becomes out-of-date. One technique is to keep a secondary copy of dynamic information in addition to the primary copy. If either the primary or secondary fails, a new copy is immediately made of the good copy. Another technique is to keep a journal of the important transactions. If a failure occurs, the journal is applied to a backup to restore the information to the current level. The recovery programs needed to restore a backup to the current level following a crash must be available and well tested. Several persons should be familiar with the procedures required.

A checkpoint capability is needed to restore critical storage system tables and data bases to a consistent state if a crash occurs during a transaction that makes multiple changes (such as saving a bitfile, which makes a new bitfile descriptor, updates the directory that points to the descriptor, and may update the accounting data base as well). During restart following an abnormal termination, the checkpoints are used to either complete or back off requests so that the tables and data bases will be consistent.

Verification programs are needed to check the consistency of storage system information. These programs should be designed to run in parallel with the system so that operation may continue while verification is done.

Tools to help with problem determination are needed. These include trace capabilities, breakpoint capabilities, selected printing of formatted and unformatted dumps of data and programs, and dump analysis programs. Tools are needed to modify and repair storage system information.

3.7.4 Software Support

3.7.4.1 Requirements

For sites that develop new storage system software, facilities must be available to develop, maintain, and test that software.

For customer sites, a test facility is required to verify that a new version satisfies local security and other requirements before production use.

3.7.4.2 Tools Needed

An environment must be provided to test software changes and enhancements without disrupting the production use of the storage system. The ability to run a test version and the production version of the software simultaneously is necessary. The test software may run on the same processors as the production software or run on other processors. It may share devices such as the communication systems and the physical storage systems, but it should have its own tables, data bases, volumes, etc. Instead of running a complete test version of the storage system software, a test version of a particular component (e.g., a bitfile server) could be run using components of the production system for the rest of the system.

A regression testing capability should be available so that a comprehensive set of tests can be run at any time against the production or test system to verify security, integrity, and performance. Both the running and checking of the regression tests should be automated.

3.7.5 Hardware Support

3.7.5.1 Requirements

The hardware support functions are concerned with the display, diagnosis, and correction of hardware problems, and the configuration and installation of the hardware. Hardware failures and the time needed to repair failures must be minimized, especially for those failures that bring down the storage system. It must be easy to reconfigure the system hardware, and to install and remove equipment.

3.7.5.2 Tools Needed

Programs to report hardware errors are needed. These programs should be able to give a detailed time history of hardware errors, and show correlated summaries of both temporary and permanent errors by error type, device type, specific device, and volume, over specified time intervals. The ability to recognize the beginning of a problem before it becomes permanent is especially important when dealing with storage devices/volumes, where permanent errors generally mean the loss of data.

Programs to exercise and diagnose all hardware components of the storage system are needed. These programs should be able to analyze the errors and recommend the corrective action. Storage devices with mechanical parts, such as magnetic disk, optical disk, magnetic tape, and especially physical volume repositories, have a much higher error rate than strictly electronic hardware, so diagnostic and exercise programs play an important role in storage systems.

The system should be redundantly configured so that components and paths can be isolated, removed for repair, and upgraded with a minimal impact upon operation and performance. Dynamic reconfiguration capabilities, including the switching of the production software to a backup processor, should be available.

3.7.6 Administrative Control

3.7.6.1 Requirements

Administrative control covers the security, accounting, and management policies of the storage system. The security requirements are to implement the security policies of the site and to recognize if policy violations are being attempted. The accounting requirements are to gather usage information, to charge for use of resources, and to control the resources. The management requirements are to present summary information concerning the operation and performance of the system that can be used to justify operational and equipment expenditures and to set high-level policy.

3.7.6.2 Tools Needed

The storage system must implement the particular security policies of each site by building the policies into the programs or through the use of replaceable modules. In general, the policies involve verification that a user has access to the requested resources of a server. Access or capability information is stored with the resource and checked against similar information in the request. Some sites require that classification and partition levels be associated with bitfiles and requests, and that access be controlled based on certain classification and partition rules. A security log must be available that contains all security violations (as determined by the site) and all transactions above a specified security classification level. A log of all transactions should be kept to help diagnose anomalous situations.
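
The fragment below sketches one plausible form of this check, with classification levels, partition sets, and capability lists all assumed for illustration.

    # Hypothetical access check: capability information stored with the
    # resource is compared with the request, and classification/partition
    # rules are enforced.  All structures and rules are assumptions.

    def access_allowed(request, resource):
        if request["classification"] < resource["classification"]:
            return False                    # requestor cleared too low
        if not resource["partitions"] <= request["partitions"]:
            return False                    # requestor lacks a partition
        return request["capability"] in resource["capabilities"]

    request  = {"classification": 2, "partitions": {"crypto"},
                "capability": "read"}
    resource = {"classification": 1, "partitions": {"crypto"},
                "capabilities": {"read", "query"}}
    print(access_allowed(request, resource))    # True for these toy values
    # A denied request would also be appended to the site's security log.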

The storage system needs a resource-charging mechanism. Charges may be incurred for the following resources and services: amount of data stored, number of bitfiles stored, data transferred, bitfiles accessed, and any of the requests made to the bitfile servers. These charges may vary for the different bitfile servers, depending on the level of performance and the class of storage used. Requests made to the storage system should contain an account code to which the charge is to be made. An account code can be stored in each bitfile descriptor along with the storage space used, the length of time stored, and a reference time for accounting purposes. An accounting program obtains the storage and bitfile charge information from the bitfile descriptors, obtains the access, data transfer, and request charge information from the transaction logs, accumulates and sorts the charge information, and writes the charge information to an accounting file. Another accounting program applies the resource charging rates and calculates the bills. Since account codes often change, an automatic means of updating the bitfile descriptors is needed. One approach is to have a central data base of accounts from which an accounting program can update the bitfile descriptors. This data base can also be used to validate users and to show what resources they can use.
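
The sketch below illustrates the first accounting program's role under assumed record layouts and charging rates; it accumulates per-account charges from descriptors and transaction logs.

    # Hypothetical accounting pass; record layouts, rates, and the toy
    # clock are all illustrative assumptions.

    RATES = {"byte_days": 1e-9, "request": 0.01, "byte_moved": 1e-8}

    def charges(descriptors, transactions, now):
        # Accumulate charges per account code from both sources.
        bill = {}
        for d in descriptors:
            days = (now - d["reference_time"]) / 86400.0
            cost = d["length"] * days * RATES["byte_days"]
            bill[d["account"]] = bill.get(d["account"], 0.0) + cost
        for t in transactions:
            cost = RATES["request"] + t.get("bytes", 0) * RATES["byte_moved"]
            bill[t["account"]] = bill.get(t["account"], 0.0) + cost
        return bill

    now = 1_000_000     # seconds on a toy clock
    print(charges(
        [{"account": "A42", "length": 10**9, "reference_time": now - 86400}],
        [{"account": "A42", "bytes": 10**6}],
        now))           # {'A42': 1.02}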

The summary information used by management to set high-level policy needs to be extracted from all the other site management reports and data bases, and presented in a useful (usually graphic) manner. A number of vendor products are available that can be used to extract, process, and display such information.



4. Glossary

Authentication Request/Reply
    The command to test the access rights of the requestor to a particular service.

Bitfile
    A collection of data that can be created on, read from, written into, and deleted from a storage system. These data are treated as a string of bits without any particular structure.

Bitfile Authenticator
    The process that checks the access rights of a requestor for service.

Bitfile Descriptor
    The bitfile attribute information that is stored as an entry in the bitfile descriptor table.

Bitfile Descriptor Manager
    The process that manages the bitfile descriptor table.

Bitfile Descriptor Table
    The data store where the bitfile descriptors are stored.

Bitfile ID
    A machine-oriented, globally unique identifier of a bitfile.

Bitfile Mover
    The processes, including the high-level protocols, that control the movement of bitfiles.

Bitfile Server
    The set of processes that control the creation, destruction, and access to the many bitfiles under its control.

Bitfile Server Request Processor
    The portion of the bitfile server that acts upon requests and controls the request/reply communications with internal modules as well as other processes and servers.

Bitfile Transfer
    The high-speed movement of bitfile data blocks.

Client Request/Reply
    The list of permitted commands from a client to a server and the resulting responses.

Create
    The bitfile client request to form a bitfile descriptor record in the bitfile descriptor table. The bitfile attributes to be contained in the bitfile descriptor are specified in the request.

Destroy
    The bitfile client request to remove a bitfile descriptor from the bitfile descriptor table. The space allocated to the bitfile is deallocated and, if the media can be rewritten, returned to the free space list.

Lock
    The bitfile client request to establish a lock for a bitfile in preparation for one or more stores or retrieves of the bitfile.

Modify
    The bitfile client request to change one or more attributes of a bitfile as contained in the bitfile descriptor table.

Monitor Information
    Status information from storage system modules used by the site manager to assist in the management and control of the storage system.

Move Command
    The request to move a bitfile between specified devices.

Name Server
    The server that converts between the human-oriented name for a bitfile and the machine-oriented name for the same bitfile.


Physical Volume
    A bounded unit of storage media that is used to store bitfiles.

Physical Volume Move
    The physical movement of a volume between the volume repository and a storage server, or its return.

Physical Volume Repository
    The place where physical volumes are stored when they are not at a read/write station.

Principal ID
    Identification of the agent requesting service from the bitfile server.

PVR-Dismount
    A request sent to the physical volume repository to remove a physical volume from a drive.

PVR-Mount
    A physical volume ID sent to the physical volume repository with the request to mount it on a particular storage device in the storage server.

Query
    The bitfile client request to obtain information about a bitfile or its attributes from the bitfile descriptor table.

Retrieve
    The bitfile client request to move a bitfile or a segment of a bitfile from a storage server to the bitfile client. If only a segment is to be moved, then the internal starting bit address and the bit string length must be specified.

Site Control
    Commands from the site manager for initial set up, operations, and management of the storage system.

SS-Allocate
    The request to a storage server to make logical space available for bitfile storage.

SS-Deallocate
    The request to a storage server to remove a bitfile from physical storage and return the space to the free space list.

SS-Retrieve
    The request from a bitfile server to move a bitfile from a storage server to a bitfile client.

SS-Store
    The request from a bitfile server to move a bitfile from a bitfile client buffer to storage server media.

Status
    The bitfile client request for the status of the bitfile server or of a previous request made by the bitfile client.

Store
    The client request to move a bitfile data block from the bitfile client to a bitfile server medium. This request may include the ability to append a segment at the end of an existing bitfile or to update a physically specified segment of an existing bitfile.

Unlock
    The client request to release the lock held on a bitfile.