Managing large and complex data sets: … THE CHALLENGES OF ARCHIVING AND ONLINE DELIVERY CATHERINE HARDMAN




Presentation given by Catherine Hardman of the Archaeology Data Service in York. The presentation was given at the 'Managing Archaeology Data' event on Monday 7th March 2011 at the University of Glasgow.


Page 1: Managing large and complex data sets

Managing large and complex data sets:

… THE CHALLENGES OF ARCHIVING AND ONLINE DELIVERY

CATHERINE HARDMAN

Page 2

The problem… in 1996

My lithics report: here, on floppy disc

Page 3

The ADS: some ancient history

The Archaeology Data Service:
• set up in 1996
• one of five AHDS subject centres
• based within the University of York

Funding:
• initially funded by the Arts and Humanities Research Council (AHRC) and the Joint Information Systems Committee (JISC)
• presently receives core funding from the AHRC alongside cross-sectoral, project-based funding

Page 4

What do we do?

Our remit:

“To support research, learning and teaching with high quality and dependable digital resources.”

In practice this means three key things:
• the ADS collects and preserves datasets
• we allow full, easy and free access to them
• we also provide guidance and support to data creators

Page 5

No need for digital preservation

Domesday Book: Publisher: William of Normandy (1086) – still readable

Page 6

Where’s preservation when you need it?

Domesday Disc: Publisher: BBC (1986) – nearly lost

Page 7

Why is it important?

Page 8

What’s the problem? Information entropy

Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B. and Stafford, S.G. 1997. Nongeospatial Metadata for the Ecological Sciences. Ecological Applications 7: 330–342.

Page 9

The scale of the problem in the 1990s

Strategies for protecting physical media:
• None: 47%
• Humidity control: 8%
• Heat control: 7%
• Fire-resistant container: 23%
• Anti-magnetic: 10%
• Anti-static protected: 5%

Source: Findings and Recommendations from ‘Digital Data in Archaeology: A Survey of User Needs’, Condron et al. 1999

Page 10

Protecting physical media

…never the twain

Page 11

The scale of the problem in the 1990s

The popularity of storage options:
• Hard disc: 28%
• Tape: 22%
• CD-ROM: 14%
• Network: 13%
• Floppy disc: 23%

Source: Findings and Recommendations from ‘Digital Data in Archaeology: A Survey of User Needs’, Condron et al. 1999

Page 12

8" Floppy

3.5" Floppy

5.25" Floppy

12" Optical Disk

5.25" Optical Disk

CD-ROM

Sparq Disk Cartridge

Zip Disk

Clik!

DVD-ROM

Jaz Disk

Floptical Disk

Punch Tape

Rectangular Hole Punch Card

IBM 3480

DLT Tape

DG90M Tape

DC4_120

8mm D-eight

QIC DC600

G2000 Tape

4mm Tape

Ditto Max

9-Track Reel

Cassette tape

Memory Stick

MultiMedia Card

SD Memory Card

xD Picture Card

Smart Media

CompactFlash

Travan

Page 13

Why is it all so difficult?

• Deterioration of the storage medium
• Obsolescence of the storage medium
• Failure to document the format adequately
• Obsolescence of the software
• Obsolescence of the hardware
• Long-term management

Page 14

How do we do it?

Open Archival Information System (OAIS)

Page 15

But that’s people…

Page 16

Migration-based approach & controlled ingest

We aim to connect with data producers early in their project lifecycles, to ensure that preservation planning is a key consideration during the project rather than an afterthought.

Page 17


Guides to help you do all that.

Page 18

It hasn’t really got much easier

The goal posts keep moving!

Page 19

Bar chart: The size of digital archives held by different types of archaeological bodies. Horizontal axis: archive size (1–5 MB, 5–10 MB, 10–50 MB, 50–100 MB, 100–1,000 MB, >1 GB); vertical axis: number of archiving bodies (0–40); series: national body, local government archaeology, field archaeology, HEI, museum, consultancy.

Archaeology Data Service, http://ads.ahds.ac.uk/

Page 20

Big Data Project: roughly how much data would be generated by a single project?

Pie chart: average project size (estimated). Size bands: over 200 GB, 150–200 GB, 100–150 GB, 50–100 GB, under 50 GB; shares shown: 19%, 3%, 3%, 25%, 50%.

Page 21

Which of these data collection techniques do you carry out?

Pie chart: technologies used. Techniques: 3D laser scanning, sidescan sonar, multibeam scanning, single beam scanning, geophysics, acoustic tracking, sub-bottom profiling, geographic (e.g. GIS), lidar, digital video, video movie clips, still images, CAD (2D or 3D), other; shares shown: 12%, 4%, 4%, 3%, 8%, 1%, 3%, 11%, 9%, 9%, 7%, 14%, 3%, 12%.

Page 22

What are the main software packages you use?

Pie chart: software (noted more than once). Packages: 3D Studio Max, ArcGIS, AutoCAD, BAE SOCET SET, CODA, ENVI / IDL, ERDAS Imagine, Golden Software Surfer, Leica Cyclone, MicroStation, Pointools, Polyworks, RapidForm, TerraScan, Trimble Realworks, custom software, MySQL; individual shares shown range from 4% to 12%.

Page 23

Do you have an archiving policy for the data sets / types in question?

Pie chart: archival policy? Responses: yes, no, no response; shares shown: 48%, 27%, 25%.

Page 24

back-up

Page 25

When you start a new project … would you consider using existing datasets?

• Yes: 28
• Not answered: 2

Page 26

This is the opportunity!

Page 27
Page 28

Making the inaccessible accessible

The aim: to make unpublished fieldwork reports available in an easily retrievable fashion. There are currently 8,018 reports available, and this number is increasing steadily through the OASIS project in England and Scotland.

Page 29

Blurring the distinction… between publication and archives…

Page 30

Making the LEAP…

Page 31
Page 32

What does that mean for you?

Plan for reuse

Page 33

How do you do that?

• Include a data management plan (use the Digital Curation Centre's resources)
• Order your data
• File naming strategy
• Version control
• Back-up (in the field)
• Consider your file formats
• Dissemination plan (and its longevity)
• What does the long term look like?
• Discuss requirements with an archive
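The checklist names a file naming strategy and version control but leaves the details to each project. As a minimal sketch, assuming a hypothetical convention of the form PROJECT_description_vNN.ext (an upper-case project code, a hyphenated lower-case description, a two-digit version number and a lower-case extension), a short Python check can flag non-conforming names before data are deposited and generate the name for the next version. The pattern, the example project code GLA11 and the function names `parse_name` and `next_version` are illustrative assumptions, not an ADS requirement.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical naming convention, for illustration only:
#   <PROJECT>_<description>_v<NN>.<ext>
#   PROJECT:     3-8 upper-case letters or digits (a project code)
#   description: lower-case words joined by hyphens, no spaces
#   NN:          two-digit version number
#   ext:         lower-case file extension
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Z0-9]{3,8})"
    r"_(?P<desc>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"_v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


class ParsedName(NamedTuple):
    project: str
    description: str
    version: int
    extension: str


def parse_name(filename: str) -> Optional[ParsedName]:
    """Return the parts of a conforming file name, or None if it breaks the convention."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return ParsedName(
        project=match["project"],
        description=match["desc"],
        version=int(match["version"]),
        extension=match["ext"],
    )


def next_version(filename: str) -> str:
    """Build the file name for the next version of a conforming file."""
    parts = parse_name(filename)
    if parts is None:
        raise ValueError(f"not a conforming file name: {filename!r}")
    if parts.version >= 99:
        raise ValueError("version number would exceed two digits")
    return f"{parts.project}_{parts.description}_v{parts.version + 1:02d}.{parts.extension}"


if __name__ == "__main__":
    for name in ["GLA11_trench3-plan_v01.tif", "Trench 3 plan FINAL.tif"]:
        print(name, "->", "ok" if parse_name(name) else "does not conform")
    print(next_version("GLA11_trench3-plan_v01.tif"))
```

Because every name carries its version explicitly, a new version never overwrites an earlier file; the particular pattern matters less than agreeing one at the start of the project and recording it in the data management plan.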

Page 34

We’re here to help

http://archaeologydataservice.ac.uk/

http://guides.archaeologydataservice.ac.uk/

[email protected]