
Digital Asset Assessment Tool – Guidance Manual (part I) -Version 1 – 2006-09-29

University of London Computer Centre
Digital Asset Assessment Tool (DAAT) Project

D-PAS Guidance Manual (part I)

Version number 1.0
Release date 29 September 2006

This Guidance Manual is published as a deliverable under the Digital Asset Assessment Tool (DAAT) project. The project was funded by the Joint Information Systems Committee (JISC) under the Supporting Digital Preservation and Asset Management in Institutions 4/04 Programme.

Project website: http://www.ulcc.ac.uk/daat.html
Programme website: http://www.jisc.ac.uk/index.cfm?name=programme_404

Project contact: Ed Pinsent, [email protected], 0207 692 1345

© 2006 University of London Computer Centre
20 Guilford Street
London WC1N 1DZ


Table of contents

SECTION 1. Introduction

1.1 Introduction: origins and background
1.2 Applying the NPO Model to digital assets
1.3 Who can use D-PAS
1.4 Outcomes
1.5 Number of items to be surveyed
1.6 Working in house

SECTION 2. How the D-PAS Survey works

2.1 Planning
2.2 Sampling
2.3 Target Collection Areas
2.4 Assessment
2.5 Data Entry

SECTION 3. Planning and preparation

3.1 Staff and Roles
3.1.1 Project manager and survey team
3.1.2 Teams of Staff carrying out the assessments
3.1.3 Agreed Terminology
    Collection Areas
    Size of target collection
    Relative humidity and temperature
    Significance
3.1.4 Who does sampling?
3.1.5 Who does the data entry?
3.2 How long will the survey take to complete?
3.3 Planning the work pattern
3.4 What equipment will be needed?
3.4.1 Sampling
3.4.2 Assessing
3.4.3 Data Entry

SECTION 4. Sampling

4.1 What is sampling and why is it important?
4.2 Which sampling method?
4.3 Simple random sampling
4.4 Systematic Sampling
4.5 Stratified sampling

SECTION 5. Assessment

5.1 Stages
5.2 Completing the questionnaire: collection assessments
5.3 Completing the questionnaire: condition assessments


SECTION 1. Introduction

1.1 Introduction: origins and background

In 1998 the British Library Research and Innovation Centre published the results of a research project which included a draft model for assessing preservation needs in libraries. In 1999 the National Preservation Office (NPO) undertook a number of pilot studies to test the model. This led to the library and archive preservation assessment survey (L-PAS) tool, which has been used extensively for paper-based collections since 2001. The M-PAS module, for museums, followed.

1.2 Applying the NPO Model to digital assets

The Digital Preservation Assessment Survey (D-PAS) is intended to provide a general overview of the digital preservation requirements of a whole organisation, or of an individual collection or department, as opposed to yielding detailed information about each digital object. This method enables a survey of a large volume of material to be carried out in a realistic amount of time.

D-PAS will be a practical tool for the collection manager. The tool should address the needs of entire institutions or of groups with identifiable collections within those institutions. It should be usable by librarians and archivists as well as research group leaders and IT professionals; all of these groups may have, or believe themselves to have, responsibility for digital assets in some form.

Results from the D-PAS tool can provide information about the preservation needs of an organisation, and can be translated into guidance for strategic development. The tool will not provide specific treatment recommendations for digital objects, nor enable specific treatment costs to be calculated, but it can be used as a basis for future development of these elements if required.

D-PAS should be:

• a standard sample-based survey methodology
• applicable to any collection of digital objects
• capable of providing quantifiable and comparable data on the preservation needs of digital collections

1.3 Who can use D-PAS

D-PAS is targeted at the needs of the UK Higher Education, Further Education and research sectors, but it should be capable of deployment in other sectors such as national libraries, archives and museums and national and local government. The tool could be used by any institution or organisation that holds collections of digital assets and objects. This includes Universities, Academic Libraries, Public Libraries, University Archives, Museums, Data Centres, Computer Services, service providers, and Research Teams.

1.4 Outcomes

The results of the D-PAS survey can enable institutions:

• to identify, understand, and quantify the risks facing their digital asset collections
• to prioritise their own preservation work
• to justify and support applications to funding bodies for preservation and conservation projects

Originally, the results of surveys carried out using this model were intended to be collated and used to help the NPO to develop a national picture of preservation needs. This information was used to enable resources to be targeted at appropriate funding and training programmes. It was also meant to generate information on the basis of which national or regional strategies might be planned and funded.


The D-PAS tool, however, may be deployed on a different basis. It is not known whether the National Preservation Office will act as a centralised information point for the compilation and consolidation of digital preservation needs. Nor is it anticipated that ULCC will take on such a role. It is more likely that D-PAS will provide universities and other organisations with a stand-alone report on their collections of digital assets.

1.5 Number of items to be surveyed

L-PAS and M-PAS agreed that a sample of 400 items could be used as an efficient and accurate way of surveying libraries, archives or museums with holdings of more than 5000 items. D-PAS will work on the same basis.

With the D-PAS tool, we have attempted to combine automation with sampling methods adapted from L-PAS and M-PAS. Automation can be used to discover and enumerate large quantities of digital objects held on a server, and to prepare an inventory of them. For further information, see Section 4 of this Manual.
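As an illustration only, an automated inventory of this kind might be produced with a short script such as the one below. This is a sketch, not part of the D-PAS tool: the function names and the three recorded fields (path, extension, size) are assumptions made for the example, and the file extension is only a rough proxy for file format.

```python
import csv
import os
from pathlib import Path


def build_inventory(root):
    """Walk `root` and return one record per file found beneath it."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            records.append({
                "path": str(path),
                # Assumption: the extension stands in for the file format.
                "extension": path.suffix.lower(),
                "size_bytes": path.stat().st_size,
            })
    # Sort so that the inventory order is stable between runs.
    records.sort(key=lambda record: record["path"])
    return records


def write_inventory(records, csv_path):
    """Save the inventory as CSV so that it can be numbered and sampled."""
    fieldnames = ["path", "extension", "size_bytes"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

A systems specialist would normally run something like this against the agreed target collection area; the resulting list is the population from which the sample is drawn.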

1.6 Working in house

The method is designed to be carried out in-house, although external assistance could be used if necessary. Working in-house has several clear benefits:

• It allows staff to become familiar with the preservation status of their collections
• It allows staff to become more familiar with the whereabouts of their collections, particularly if digital assets are scattered around the organisation and not held in one place
• It can provide training and experience in assessment of collection care issues
• It can encourage team working and a corporate approach to collection care
• It is economical, in that no substantial additional funding is likely to be necessary, although staff time may need to be reallocated


SECTION 2. How the D-PAS Survey works

The D-PAS Survey consists of several phases:

• Planning
• Building a project team
• Sampling
• Assessment
• Data entry
• Analysis and generation of reports

2.1 Planning

D-PAS offers an institution the opportunity to identify its collections of digital assets, ascertain their whereabouts, review these collections, and look at the policies in place for their protection and preservation. For this to be a useful exercise yielding reliable data, the process must be planned carefully. The planning phase involves logistical decisions, and decisions about resources, such as:

• who will carry out the survey
• what sort of technical and external support will be needed
• when the survey will take place
• whether the whole of the institution’s digital assets are to be surveyed, or individual collection areas
• the definition of target collection areas

2.2 Sampling

The survey uses a sample of approximately 400 objects selected from the institution as a whole, or from within a defined collection area or 'population' within the institution's holdings. This is also called a ‘target collection area’. The unit of sample could be:

• A single digital object, usually identified by its file format, comprising an asset
• A single piece of storage media, such as a CD, DVD, disk or tape, containing one or many assets
• A single asset made up from a complex arrangement of several digital objects

Methods of defining collections/populations and of identifying the digital objects are explained in Section 4.

There are several methods that can be used for selecting the sample. The choice depends on the size of the institution's holdings, the ways in which it is divided into collections, and the nature of its documentation.

The digital assets might be held on centralised servers in-house, on externalised storage (like DSpace [1]), or even on the individual hard drives of key staff members. They might also be held or preserved on various storage media, like disks and tapes.

At this stage you will need to identify the Target Collection Areas, as this will affect the way the survey is carried out.

[1] DSpace is Open Archives Initiative (OAI)-compliant open-source software released by MIT for archiving eprints and other kinds of academic content. The DSpace digital repository system captures, stores, indexes, preserves, and distributes digital research material. See http://www.dspace.org/

2.3 Target Collection Areas

Firstly, identify the target collection. A target collection is a discrete ‘container’ for a set of digital assets. So a target collection could be:

• A single server
• A related collection of carriers
• A single tape or disc
• A drive on a single PC
• An external hard drive (including storage drives, mobile phones and digital cameras)

Secondly, describe the attributes of the agreed target collection.

• Mostly one file format = all or most of the assets match a single file format, e.g. they are all TIF files.
• Mixed file format = one or more of the assets are a mixture of file formats.
• Files with complex relationships = one or more of the assets are a mixture of file formats and have complex / dynamic relationships between them (for example web pages).
• Multiple systems = the assets require multiple systems to manage them (which may be managed in different ways). This means that even though the servers may be stored in the same location, the content or format of the assets means that there are different management regimes (e.g. databases, directory storage, website management).

Thirdly, keep records of the physical location of the target collections, noting such things as site, building, room number, and serial numbers (if appropriate).
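The three steps above (container, attributes, physical location) could be captured in a simple record per target collection. The sketch below is illustrative only: the class and field names are assumptions made for the example and do not reflect the structure of the D-PAS database.

```python
from dataclasses import dataclass
from enum import Enum


class CollectionAttribute(Enum):
    """The four attribute options described for a target collection."""
    MOSTLY_ONE_FORMAT = "mostly one file format"
    MIXED_FORMAT = "mixed file format"
    COMPLEX_RELATIONSHIPS = "files with complex relationships"
    MULTIPLE_SYSTEMS = "multiple systems"


@dataclass
class TargetCollectionArea:
    """One agreed target collection (hypothetical field names)."""
    name: str
    container: str                 # e.g. "single server", "single tape or disc"
    attribute: CollectionAttribute
    site: str
    building: str
    room: str
    serial_number: str = ""        # recorded only if appropriate


# Example record with invented values.
area = TargetCollectionArea(
    name="Image archive",
    container="single server",
    attribute=CollectionAttribute.MOSTLY_ONE_FORMAT,
    site="Main campus",
    building="Library",
    room="B12",
)
```

Keeping these details in one place for each target collection makes it easier to apply the same definitions consistently across the survey team.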

2.4 Assessment

When the sample has been identified, and more importantly the Target Collection Areas have been agreed, the questionnaires are completed.

D-PAS proposes surveys targeted at digital assets held on servers, also called collections of 'virtual objects'; and surveys targeted at digital assets held on storage media.

Similar to the approaches used by L-PAS and M-PAS, the D-PAS survey assesses a Collection and its Condition. However, D-PAS is not as clearly divided into these two discrete parts as L-PAS.

Collection Assessment requires a number of 'tick-box' questions to be completed (very often with YES/NO answers). These questions cover key areas which affect preservation: institutional management; location; intellectual control; system management; accommodation; preservation actions; usage, and value and importance.

Collection Assessment aims to provide information on the object's context, for example the storage conditions for digital media; accommodation for servers; the environment, monitoring and protection provided for both media and servers.

There are also questions on management of the system or systems for the digital assets in each collection area. These are generic questions about what happens to assets within systems.

There is a section which addresses preservation strategy, bit preservation, fixity, and transformation of digital assets. This assumes the organisation has a set of procedures for carrying out preservation actions, which will commonly involve transformation of the digital asset; and that the organisation has access to checksum tools and the basic technical skills required to carry out checksum checking.
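For readers unfamiliar with checksum checking, the sketch below shows the basic idea using Python's standard hashlib module. It is an illustration rather than a D-PAS procedure: the function names are invented for the example, and the choice of SHA-256 is an assumption (an organisation may already use another algorithm such as MD5).

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hexadecimal checksum of a file, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum, algorithm="sha256"):
    """Compare a file's current checksum with a previously recorded one."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

If the checksum recorded when an asset was stored no longer matches the checksum computed today, the file has changed at the bit level and its fixity can no longer be assumed.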

The results of the Collection Assessment side will provide an organisation with a certain amount of quantitative information to guide their policies and strategic planning.

Condition Assessment requires a more detailed assessment of the item's condition.

In the case of storage media, Condition Assessment aims to identify the media type, obtain results from two basic readability tests, and broadly assess any damage that the item has suffered.

In the case of 'virtual objects' held on servers, D-PAS proposes not so much a 'condition' assessment as a detailed assessment of risks facing file formats, software and hardware and their continued support.


Once completed, this will provide an indication of the preservation and curation solutions and skills needed to remedy damage to media. More importantly, it might identify problems associated with file formats, software applications, hardware, and vendors used by the institution.

The results should give a picture of the kinds of preservation problems that are present and, combined with the results from the other parts of the survey, may highlight whether these problems are linked to inappropriate accommodation and/or insufficient environmental control; or system management, or software applications, or hardware issues, or even a gap in the overall institutional management of assets.

2.5 Data Entry

Answers to questions are entered into a Microsoft Access database. Weighted scores are attached to the results, providing a 'preservation priority rating' for each object. These can be summarised for each collection to provide an overall picture of preservation need.
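The principle of weighted scoring can be sketched as follows. The question names and weights here are entirely hypothetical and are labelled as such in the code; the actual questions and weightings are those held in the D-PAS database.

```python
# Hypothetical risk questions and weights, for illustration only.
WEIGHTS = {
    "no_backup_copy": 5,
    "format_unsupported": 4,
    "media_unreadable": 5,
    "no_checksum_recorded": 2,
}


def priority_rating(answers):
    """Sum the weights of every risk answered True for one object."""
    return sum(WEIGHTS[question] for question, flagged in answers.items() if flagged)


def collection_summary(ratings):
    """Average the object ratings to summarise one collection."""
    return sum(ratings) / len(ratings) if ratings else 0.0
```

In this sketch a higher rating means a greater preservation need; summarising by average is one possible choice, and totals or distributions could equally be reported.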

A variety of reports can be generated from the database to provide a picture of the current condition of the collection. The NPO originally provided institutions with a standard range of reports. ULCC hope to develop more sophisticated reports over time. These would include 'what if?' reports, which model the effects of potential improvements in digital curation and protection of assets.


SECTION 3. Planning and preparation

3.1 Staff and Roles

3.1.1 Project manager and survey team

Carrying out a D-PAS survey will entail the participation of a number of people across an institution and will involve several phases of work.

The role of project manager should be assigned to someone who can oversee all phases of the process, ideally a person who has experience of conducting surveys. They should have a clear overview of the aims of the survey and of any project for which the results will be used. The project manager will be involved with planning the survey and should be responsible for co-ordinating the team, seeing that essential decisions are made and that the survey is completed within the required time frame. The project manager should keep a project log in which decisions are recorded.

All members of staff who will be involved in the survey process, whether as surveyors, suppliers, or as users of the information that will be generated, should meet to plan the survey and to discuss parameters and survey methodology. It is important that everyone involved understands the purpose of the D-PAS and what it is, and isn't, designed to do. For example, the D-PAS will yield an overview of risks faced by digital assets and digital preservation needs but it will not give a detailed condition report on individual objects, nor can it be used directly to develop a treatment programme.

The survey team should discuss how to achieve consistent results in the object assessments. This is of vital importance when it comes to defining software and hardware, and the issues involved in 'complex' digital objects.

Individual departments within the institution may wish to use their own methods for assessing condition and recognising damage types.

Those organising the survey within the institution should ensure that all staff involved have an appropriate level of experience and training to carry out assessments. If members of the assessment team do not customarily handle media storage units, it may be necessary to provide training in safe handling procedures.

Before you begin the D-PAS survey, you must consider:

• Which parts of the collection will be surveyed
• How the target collection areas will be determined and identified
• What the unit of sample will be
• Whether automated methods will be used in determining the target collection areas, unit of sample, or generation of an inventory
• Which members of IT and other technical staff will be needed as suppliers to the survey, or simply to help answer questions
• Which members of staff will select the sample
• Which members of staff will complete the questionnaires
• What other information you need, besides that provided by staff, to help complete the survey – for example policies and standards, technical manuals, building reports, organisational charts, record and document surveys, software registry information
• Whether you need to gain access to secure IT storage areas and/or offsite IT storage areas, and if so whether you have staff with sufficient clearance to enable such access
• When the survey is to take place
• The impact this will have on other workloads and service levels

3.1.2 Teams of Staff carrying out the assessments

The mix of staff for the assessment will depend on circumstances. The essential elements are:

• knowledge of the collection's value and importance
• knowledge of the collection's whereabouts
• knowledge of preservation of digital assets
• knowledge of IT systems in place
• knowledge of file formats and software

The survey will work very much better if several people work together.

Skills needed for the survey

The skills and knowledge of a Systems Administrator will facilitate informed assessments as to IT systems, file formats and software.

A Repository Administrator could help extract data from the system to generate an inventory of the assets.

An IT manager in charge of backups, preservation and safe storage should be on the team; and, if appropriate, someone familiar with the local storage and server locations, and offsite storage arrangements.

An electronic records manager, departmental records officer or archivist may have conducted an information survey that assists with the locations of important collections around the institution.

The team should include someone familiar with the intellectual content of the collection, for example an archivist or records manager. This person is likely to be in the best position to make judgements as to the 'importance' of, and demand for access to, an item.

The combination of knowledge ensures that balanced assessments are made.

The guidance notes and publications cited in this document are designed to assist staff without technical IT skills in making assessment decisions, and the project co-ordinators can provide advice by telephone or email.

The possibilities of institutions collaborating in their assessment work - establishing local or regional teams of personnel - should also be considered.

The use of appropriately informed and supervised students on placements from relevant higher education courses should also be given serious consideration. Several students were involved in the library pilots and worked very effectively whilst gaining valuable work experience. It is important however to recognise that accuracy and consistency of response determine the validity of the results.

3.1.3 Agreed Terminology

There are a number of fields in the survey database that involve issues of terminology and consistency. These should be discussed and agreed by the teams in advance. For example:

Collection Areas

As part of the strategy, you may need to identify and work on discrete representative areas from the entire collection of digital assets. Identify the collection of which the object is a part by department, or by another category, defined according to the institution's own organisational/geographical structure. To identify risks comprehensively, it would help to include areas where you are less certain of the content of the assets, or of whether they are well organised.

D-PAS is also proposing collection areas based on the options below:

• Collection area = mostly one file format
• Collection area = mixed file format
• Collection area = files with complex relationships
• Collection area = multiple systems


Size of target collection

Define the size of the collection to be surveyed, and how it will be measured - in numbers of objects, different file formats, in megabytes...

Relative humidity and temperature

Define what constitutes 'routine' monitoring, or what is meant by 'normally kept' for temperature conditions.

Significance

Discuss the issue of significance and how each significance category will be defined within the organisation.

The survey team should agree the definition of an object to be surveyed. In digital terms, this can be extremely complicated, involving factors such as file formats, relationships between digital objects, multiple system management, and whether relationships between objects are discernible. This problem may be partially addressed by defining the parameters of the collection area correctly.

The survey team should agree on a protocol for backing up the data that will be generated on the database. This should be done at the end of each surveying session and should form part of the duties of those members of staff involved with inputting data. Data can be saved to a CD, a DVD or to the organisation's own network server.
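One way to make the end-of-session backup routine is a small script that copies the database file to a backup location under a date-stamped name. The sketch below assumes the survey database is a single file; the function name and the naming pattern are illustrative assumptions, not part of D-PAS.

```python
import shutil
from datetime import datetime
from pathlib import Path


def back_up_database(database_path, backup_dir):
    """Copy the database file to `backup_dir` under a timestamped name."""
    source = Path(database_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # The timestamp keeps each session's backup instead of overwriting it.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

The backup directory could be a folder on the organisation's network server; copies destined for CD or DVD would still need to be written to disc separately.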

3.1.4 Who does sampling?

This will depend on your strategy. If it's feasible, you may decide to survey the entire collection rather than produce a sample of 400 items. See Section 4 for detailed guidance.

As regards the sampling methods:

Simple random sampling - for a collection of objects on a server, it will need the involvement of a system specialist to generate an automated inventory of digital objects to sample. Simple random sampling could also be used for selecting media.

Systematic sampling - this will need someone familiar with the location of the items within the chosen population. In the context of digital assets, 'location' can mean the location of servers in a computer room, or location of assets on an individual's hard drive; and it can also refer to the physical location of media, whether in dedicated storage or elsewhere.

Stratified sampling - this method may require both of the above skills.
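For teams using an automated inventory, the three methods named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the inventory is a Python list, the function names are invented for the example, and the stratified version allocates the sample in proportion to the size of each stratum, which is only one possible allocation rule.

```python
import random


def simple_random_sample(inventory, n, seed=None):
    """Draw n distinct items at random from the whole inventory."""
    rng = random.Random(seed)
    return rng.sample(inventory, min(n, len(inventory)))


def systematic_sample(inventory, n, seed=None):
    """Take every k-th item after a random start, where k = len // n."""
    if n <= 0:
        return []
    if n >= len(inventory):
        return list(inventory)
    step = len(inventory) // n
    start = random.Random(seed).randrange(step)
    return inventory[start::step][:n]


def stratified_sample(strata, n, seed=None):
    """Sample each stratum in proportion to its share of the population.

    `strata` maps a stratum name to its list of items. Because of
    rounding, the total drawn may differ slightly from n.
    """
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        share = round(n * len(items) / total)
        sample[name] = simple_random_sample(items, share, seed)
    return sample
```

For example, `systematic_sample(inventory, 400)` on an inventory of 5,000 objects selects every twelfth object from a random starting point.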

3.1.5 Who does the data entry?

The D-PAS database has many more questions, and a more complex relational structure, than the L-PAS or M-PAS databases. It is really only appropriate for a user with some computer experience. It requires some understanding of navigation to move from screen to screen and from tab to tab. Therefore it is more than just a matter of simple keyboard input. Data entry could be done by a member of the institution's project team, but not by a word processing department within your institution, or by a commercial bureau. Section 6 of this Manual provides step-by-step guidance to each screen and section, with navigational tips.

3.2 How long will the survey take to complete?

It is difficult to suggest a standard duration for the whole survey. There will be many variables because each organisation and collection is different. The sampling work can often take up the most time unless a detailed site map already exists. The time taken for the assessments can be reduced by good preparation and commitment to the provision of sufficient staff time. As a very rough guide, it would be advisable to allow at least two person-weeks for the assessment stage, but it is important to include staff time for all the stages, from planning through to the generation of final reports.

There is no minimum or maximum duration for surveys. It is important to remember, however, that the survey is meant to provide a picture of collection needs at a given point in time. Where possible, the aim should be to complete surveys within a few weeks. A longer time frame may introduce distorting effects due to variables such as staff changes and seasonal variations.

The large number of questions in the D-PAS survey is also likely to be a major factor in the length of the survey.

3.3 Planning the work pattern

Although it should in theory be possible to survey for 7 hours a day, 5 days a week, this is likely to be undesirable for a variety of reasons. Consideration should be given to factors such as fatigue, boredom, and health and safety. Staff should feel comfortable when carrying out the surveying work, and regular breaks should be taken away from the storage areas as with any physical labour. Fatigue from continuous surveying activity could lead to a decline in the reliability of assessment decisions. Staff should not be expected to work at a pace that brings health and safety risks. It may also be undesirable to work continuously in a chilled or dirty environment. Students and volunteers may require training in handling carriers.

3.4 What equipment will be needed?

3.4.1 Sampling

• Inventories of the collections
• Building plans and storage locations
• The D-PAS Guidance Manual
• A random number generator
• Markers (Post-It notes or similar) to identify shelves or other storage locations

3.4.2 Assessing

• A trolley and/or table
• Chairs for those sitting at the table and/or using the computer
• Laptop computer or sufficient copies of the survey form
• Clipboards for use with paper forms
• Pencils
• Notebook
• An extension lead

Practical experience from this and other surveying exercises suggests that paper recording and subsequent data entry are preferable to direct entry on a laptop computer.

3.4.3 Data Entry

A computer loaded with Microsoft Access 2003 is required.


SECTION 4. Sampling

4.1 What is sampling and why is it important?

Sampling is a statistical method for drawing conclusions about a population by testing a subset of it. In the context of D-PAS, the population is a collection of digital objects. Assessing every single object is usually not feasible, because digital collections typically contain very large numbers of objects.

Sampling is particularly necessary at this stage of the project's development. The D-PAS assessment process is not automated, and may not be for some time. Because assessments are carried out manually, limiting the size of the sample is crucial. However, the sample must also be large enough to produce statistically valid results.

4.2 Which sampling method?

The sampling method used will depend to some extent on the size of the organisation's collection and its general business processes. If it is feasible, you may decide to survey the entire collection rather than draw a sample. Otherwise, NPO recommends a sample of 400 items, which is also the minimum sample size needed to generate statistically valid results (for reasons that are not covered here). For the sake of consistency a sample size of 400 should be used whenever possible. However, where the collection to be assessed is small (fewer than 2,000 objects) the sample size may be reduced to 350 items, and to 225 if the population is 500 or fewer.
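The thresholds above can be expressed as a small lookup. The following Python sketch is illustrative only: the function name is hypothetical, and capping the result at the population size (so that a very small collection is simply surveyed in full) is an assumption rather than something this manual specifies.

```python
def recommended_sample_size(population: int) -> int:
    """Return a sample size following the thresholds quoted in section 4.2."""
    if population < 1:
        raise ValueError("population must be at least 1")
    if population <= 500:
        target = 225      # population of 500 or fewer
    elif population < 2000:
        target = 350      # small collection: fewer than 2,000 objects
    else:
        target = 400      # the standard sample size
    # Assumption: a collection smaller than the target is surveyed in full.
    return min(population, target)
```

For example, a collection of 12,000 objects gives 400, a collection of 1,000 gives 350, and a collection of 500 gives 225.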

There are three general sampling methods that can be used for selecting the sample from the collection. These are:

(1) Simple random

(2) Systematic

(3) Stratified

Each is explained briefly below, and a hypothetical example included to provide some idea of how each method might work in practice.

Whichever sampling method is used, it is essential to decide before the survey begins what is to be included within the boundaries of a population. This requires decisions about identifying the population, understanding where the population is located, and how it is ordered. Once made, these decisions are expressed and recorded in D-PAS as the Target Collection Area. The survey assumes you may need to nominate several Target Collection Areas, and assess each of them in turn.

In the context of digital assets, 'location' can mean the location of servers in a computer room, or location of assets on an individual's hard drive; and it can also refer to the physical location of media, whether held in dedicated storage, or elsewhere.

The sampling process is crucial and may take longer than completing the assessments. The validity of the data produced is dependent on the accuracy of the sample selection and therefore this expenditure of time is justified.

4.3 Simple random sampling

Simple sampling gives equal weight and importance to all members of the population. In the case of digital objects there is an underlying assumption that all objects are worth assessing, if they constitute an organisational asset, or form part of an asset.

Random sampling means that each member of the population has an equal chance of being chosen for assessment. To reach a true random sample will involve the use of some form of random number generator. This could take the form of a look-up table or a software pseudo-random number generator (which will be good enough for using D-PAS - see http://www.random.org/nform.html).

Example: a repository with 12,000 digital objects stored on a single server. The unit of sample is a single digital object. First, an inventory of the objects will be required; such an inventory should be automated, to minimise human intervention. Second, some form of random number system will be necessary in order to choose the sample set from the inventory.

Random numbers can be generated by using an online service such as http://www.random.org/nform.html. This requires the input of the population size by stating the smallest and largest values (this defines the population size - so here 1 and 12000 are the required values) and the number of random integers required (that is, the sample size - in this case, 400).

This service generates a set of random numbers which are listed in random order, and therefore not particularly useful for making a D-PAS assessment. So the list of random numbers needs to be converted to some useable format, such as comma separated values, which can be opened in a spreadsheet and ordered properly. (Random numbers can also be found in printed lookup tables, although it is doubtful that anyone would have access to these.)

Summary of steps:

(1) Identify the target collection as a large number of digital objects, all stored on one server.

(2) Create an automated inventory of them.

(3) Enter the size of the inventory (as the smallest and largest values) and the required sample size into the online random number generator.

(4) Convert the results into a CSV file and sort them in a spreadsheet.
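Where an online service is not convenient, the same steps can be sketched with a software pseudo-random number generator. The Python example below is an illustration, not part of D-PAS: the function names, the CSV column heading and the output file name are all hypothetical, and it assumes the inventory is numbered from 1 to the population size.

```python
import csv
import random


def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct inventory numbers between 1 and population_size."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # random.sample draws without replacement, so no number is repeated.
    picks = rng.sample(range(1, population_size + 1), sample_size)
    # Sort the numbers so they can be worked through in inventory order.
    return sorted(picks)


def write_sample_csv(picks, handle):
    """Write one inventory number per row to an open text file."""
    writer = csv.writer(handle)
    writer.writerow(["inventory_number"])  # hypothetical column heading
    for number in picks:
        writer.writerow([number])


if __name__ == "__main__":
    picks = draw_simple_random_sample(12000, 400)
    with open("dpas_sample.csv", "w", newline="") as handle:
        write_sample_csv(picks, handle)
```

The resulting file can be opened in a spreadsheet and matched against the inventory.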

Simple random sampling could also be used to sample collections of storage media, particularly if they are held in one place and in a shelving system. This method assumes that the number of storage units is known, or can be counted.

Example: a repository with 8,000 media objects (CDs, tapes, etc) kept in dedicated storage. The unit of sample is the entire CD or tape. Select the 4th object on every nth unit and mark the location, and, where feasible, also mark the shelf/rack/drawer etc.

Counting of units must be done in a systematic and consistent manner within a population. For example, you could start at the left of the door as you enter a storage room and then count from the top of a bay of shelves. Selecting the 4th object within a unit should also be done in a consistent way, e.g., the 4th CD from the left, starting from the front, or the 4th tape on a shelf, starting from the left. If there are fewer than 4 objects in a unit, choose the last one, starting from the left.

4.4 Systematic Sampling

Systematic sampling is the selection of every nth element from a population, where n is calculated by dividing the total population by the number in the sample. For example, in a population of 10,000 where we require a sample size of 500, n is 10,000/500 = 20. This means every 20th object would be selected for assessment (i.e. objects 20, 40, 60, 80...).

This method differs from simple random sampling only in the manner in which objects are chosen for assessment.

Example: a repository containing 12,000 digital objects. Since we need a minimum sample set of 400, systematic sampling would select every 30th object (12,000/400 = 30), starting with number 30. If we wanted to assess a sample of 600 objects we would test every 20th object (12,000/600 = 20), and so on.
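The calculation can be sketched in a few lines of Python. This is a minimal illustration with a hypothetical function name. It assumes the objects are numbered from 1, and it rounds the interval down when the population is not an exact multiple of the sample size, which is an assumption not covered by this manual.

```python
def systematic_sample(population_size, sample_size):
    """Return every nth object number, where n = population // sample."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    interval = population_size // sample_size  # e.g. 12000 // 400 = 30
    # Start at the interval itself (object 30), then 60, 90, and so on.
    return [interval * step for step in range(1, sample_size + 1)]
```

For the example above, `systematic_sample(12000, 400)` begins 30, 60, 90 and ends at object 12,000.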

4.5 Stratified sampling

Stratified sampling is a process of grouping the population into relatively homogeneous sub-groups. For example, a collection of digital objects could be grouped by format type, such as all JPEGs, all DOCs, all PDFs etc. Within each of the sub-groups, sampling can be either random or systematic.

Example: a collection of 12,000 digital objects scattered across servers, hard drives, and media such as CDs and DVDs. In this example the collection consists of:

3,000 files in .doc format

2,000 files each in .jpg and .pdf formats

1,000 files each in .tiff, .html and .csv formats

500 files each in .gif, .txt, .png and .xml formats

These objects are stored on two servers, four hard drives and 600 CDs. Each of these storage types represents a homogeneous sub-set of the whole population.

Stratification can be done on format types, producing 10 subsets of formats as per the list above; or on the basis of storage media, producing 3 subsets.

We can then sample each of these homogeneous sub-sets by testing 400 each of the .doc, .jpg and .pdf files, 350 each of the .tiff, .html and .csv files, and 225 each of the remaining 4 format types; or by testing 400 objects stored on servers, 400 on hard drives, and 400 stored on CDs. We can choose either random sampling within each subset or systematic sampling, as described above. If our concern is with the storage media themselves, we could test 225 of the 600 CDs for media problems.
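The per-format figures follow from applying the sample-size thresholds in section 4.2 to each sub-set in turn. The Python sketch below reproduces that arithmetic for stratification by format; the names used are hypothetical and the file counts are those of the example collection.

```python
# File counts for the example collection, grouped by format.
FORMAT_COUNTS = {
    ".doc": 3000,
    ".jpg": 2000, ".pdf": 2000,
    ".tiff": 1000, ".html": 1000, ".csv": 1000,
    ".gif": 500, ".txt": 500, ".png": 500, ".xml": 500,
}


def sample_size_for(population):
    """Apply the section 4.2 thresholds to one sub-set."""
    if population <= 500:
        return 225
    if population < 2000:
        return 350
    return 400


# One sample size per format sub-set.
plan = {fmt: sample_size_for(count) for fmt, count in FORMAT_COUNTS.items()}
total_to_assess = sum(plan.values())
```

Stratifying by format therefore means assessing 3,150 objects in total (3 x 400, 3 x 350 and 4 x 225), considerably more than a single sample of 400 drawn from the whole collection.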


SECTION 5. Assessment

You are now ready to start the actual 'assessment', i.e. the survey of the collection. The collection has been 'mapped', the target collection areas have been identified, the sampling calculations have been made, and the sample of 400 items has been identified and marked in some way.

Part II of the Guidance Manual is your guide to using the D-PAS database.

5.1 Stages

You must create and complete a database record for each object in the sample. Unlike the NPO approach, making paper records on copies of questionnaire forms is NOT an option.

The D-PAS database is structured to make the task of assessment as easy as possible. Its hierarchical structure and use of linking fields will mean that top-level information cascades down and ‘attaches’ itself to each relevant record. This minimises the work of data entry.

Broadly it works in 5 levels:

Levels 1-2 contain questions which only need to be answered once, as they apply to the entire organisation.

Level 3 identifies the target collection areas. You may need to create several of these, but once completed the profile information for each target collection area will attach itself to each collection assessed within it.

Level 4 is the most detailed part of the survey. Here, you will be answering several series of questions about each collection. However, you are still making statements applicable to the entire collection in each case, and so you are not required to state and restate the same information for each of the 400 items in the sample. Level 4 splits into two discrete areas, one for objects held on servers and one for objects held on media.

Level 5 requires you to report on all the individual items in the collection. This is not as daunting as it may appear, as you are simply reporting on the unique characteristics of each digital object. Remember, all the profile information completed in Levels 1-4 is ‘attached’ to each item.
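The way information cascades down the levels can be pictured as a chain of linked records. The Python sketch below only illustrates the idea and is not the structure of the D-PAS database itself: every class name, field name and sample value is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Organisation:
    """Levels 1-2: answered once for the whole organisation."""
    name: str


@dataclass
class TargetCollectionArea:
    """Level 3: one record per target collection area."""
    organisation: Organisation
    label: str


@dataclass
class CollectionAssessment:
    """Level 4: answers that apply to the whole collection."""
    area: TargetCollectionArea
    held_on: str  # "server" or "media"


@dataclass
class ItemRecord:
    """Level 5: only the characteristics unique to one sampled item."""
    collection: CollectionAssessment
    identifier: str

    def full_profile(self) -> dict:
        """Follow the links upwards to gather the inherited information."""
        area = self.collection.area
        return {
            "organisation": area.organisation.name,
            "target_collection_area": area.label,
            "held_on": self.collection.held_on,
            "identifier": self.identifier,
        }


org = Organisation("Example Archive")
area = TargetCollectionArea(org, "Server room A")
collection = CollectionAssessment(area, "server")
item = ItemRecord(collection, "object-0030")
```

Calling `item.full_profile()` returns the organisation, target collection area and storage type alongside the item's own identifier, even though only the identifier was entered at Level 5.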

Tick the ‘Not Applicable’ (N/A) box for any question which does not apply to your organisation’s way of working. Doing so means the question will not be scored, although the number of N/A boxes ticked could be totalled and included in a report.

5.2 Completing the questionnaire: collection assessments

For assets held on servers, the purpose of level 4 is:

To assess the state of the hardware they’re held on

To assess the management of systems (including backup and migration of assets)

To assess the accommodation of the servers they’re stored on

To assess your preservation actions for the assets (in some detail)

For assets held on media, the purpose of level 4 is:

To assess the condition of the media

To assess the management of systems, insofar as they affect media

To assess the handling of the media

To assess the accommodation and storage conditions of the media

A systems administrator or other IT manager will be needed, to give advice or additional information.


The condition of each object in the sample should be recorded, even if just to indicate that it is undamaged or stable. Unlike L-PAS/M-PAS, it is not really relevant to think in terms of ‘damage’ to digital objects, nor to differentiate between ‘types’ of damage to storage media. This is where D-PAS has been rethought.

As noted above, there is no need to differentiate the sorts of damage present on a CD, CD-ROM, DVD, hard disk or tape, as there are no real ‘degrees’ of damage that are relevant for our purposes. The survey merely suggests a checklist of types of damage to look for. There is also a simple readability test applicable to CD or DVD storage media.

5.3 Completing the questionnaire: condition assessments

For assets held on servers, the purpose of level 5 is:

To log some basic identification / profile information relevant to the individual asset

To rate its value and importance to the organisation

To assess the stability (rather than the ‘condition’) of the file format of the individual asset, and the software used to create / access / preserve it

For assets held on media, the purpose of level 5 is:

To log some basic identification / profile information relevant to the media

To rate the value and importance to the organisation of the asset(s) held on the media
