Presentation given by Catherine Hardman of the Archaeology Data Service in York. The presentation was given at the 'Managing Archaeology Data' event on Monday 7th March 2011 at the University of Glasgow.
Managing large and complex data sets:
… THE CHALLENGES OF ARCHIVING AND ONLINE DELIVERY
CATHERINE HARDMAN
My lithics report here, on floppy disc
The problem… in 1996
The Archaeology Data Service:
• set up in 1996
• one of five AHDS subject centres
• based within the University of York
Funding:
• Initially received funding from:
  • Arts and Humanities Research Council (AHRC)
  • Joint Information Systems Committee (JISC)
• Presently receives core funding from AHRC alongside cross-sectoral, project-based funding.
The ADS: some ancient history
Our remit:
“To support research, learning and teaching with high quality and dependable digital resources.”
In practice this means three key things:
• That ADS collect and preserve datasets
• That we allow full, easy and free access to these
• And that we additionally provide guidance and support to data creators
What do we do?
No need for digital preservation
Domesday Book: Publisher: William of Normandy (1086) – still readable
Where’s preservation when you need it?
Domesday Disc: Publisher: BBC (1986) – nearly lost
Why is it important?
Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B. and Stafford, S.G. 1997. Nongeospatial Metadata for the Ecological Sciences. Ecological Applications. 7: 330-342.
What’s the problem? Information Entropy
The scale of the problem in the 1990s
None: 47%
Humidity control: 8%
Heat control: 7%
Fire-resistant container: 23%
Anti-magnetic: 10%
Anti-static protected: 5%
Strategies for protecting physical media
Findings and Recommendations from ‘Digital Data in Archaeology: A Survey of User Needs’ Condron et al 1999
Protecting Physical media
…never the twain
The scale of the problem in the 1990s
Hard disc: 28%
Tape: 22%
CD-ROM: 14%
Network: 13%
Floppy disc: 23%
The popularity of storage options
Findings and Recommendations from ‘Digital Data in Archaeology: A Survey of User Needs’ Condron et al 1999
8" Floppy
3.5" Floppy
5.25" Floppy
12" Optical Disk
5.25" Optical Disk
CD-ROM
Sparq Disk Cartridge
Zip Disk
Click!
DVD-ROM
Jaz Disk
Floptical Disk
Punch Tape
Rectangular Hole Punch Card
IBM 3480
DLT Tape
DG90M Tape
DC4_120
8mmD-eight
QIC DC600
G2000 Tape
4mm Tape
Ditto Max
9-Track Reel
Cassette tape
Memory Stick
MultiMedia Card
SD Memory Card
xD Picture Card
Smart Media
CompactFlash
Travan
Why is it all so difficult?
• Deterioration of the storage medium
• Obsolescence of the storage medium
• Failure to document the format adequately
• Obsolescence of the software
• Obsolescence of the hardware
• Long-term management
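One common way to detect deterioration of a storage medium is a periodic fixity check: record a checksum for every file, then recompute and compare later. The following is a minimal sketch of that idea and is not taken from the presentation; the function names `sha256_of`, `build_manifest` and `changed_files` are hypothetical, and SHA-256 is an assumed choice of checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files from the old manifest that are now missing or altered."""
    return sorted(name for name, digest in old.items() if new.get(name) != digest)
```

A manifest built at deposit time can be stored alongside the data; rebuilding it later and passing both to `changed_files` flags anything that has silently changed or disappeared. It does not address format, software or hardware obsolescence, which need documentation and migration rather than checksums.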
How do we do it?
Open Archival Information System (OAIS)
But that’s people…
Migration based approach & controlled ingest
Aim to connect with data producers early on in their project lifecycles to ensure that preservation planning is a key consideration during the project rather than an afterthought.
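As an illustration of what a controlled ingest step with migration might look like, the sketch below sorts deposited files into those accepted as they are, those to be migrated to another format, and those to be queried with the depositor. It is a hedged example only: the format tables `ACCEPTED_AS_IS` and `MIGRATION_TARGETS` and the function `plan_ingest` are hypothetical placeholders, not the ADS's actual policy.

```python
from pathlib import PurePath

# Hypothetical tables for illustration; a real archive publishes its own lists.
ACCEPTED_AS_IS = {".pdf", ".csv", ".tif", ".txt"}
MIGRATION_TARGETS = {".doc": ".pdf", ".xls": ".csv", ".bmp": ".tif"}


def plan_ingest(filenames):
    """Sort files into accept / migrate / query buckets by file extension."""
    plan = {"accept": [], "migrate": [], "query": []}
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        if ext in ACCEPTED_AS_IS:
            plan["accept"].append(name)
        elif ext in MIGRATION_TARGETS:
            # Record the file together with its assumed target format.
            plan["migrate"].append((name, MIGRATION_TARGETS[ext]))
        else:
            # Unknown or missing extension: raise it with the data producer.
            plan["query"].append(name)
    return plan
```

Running a check like this early in a project, rather than at deposit, is what makes it useful: files in the "query" bucket can still be re-exported or documented while the software that created them is at hand.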
Guides to help you do all that.
It hasn’t really got much easier
The goal posts keep moving!
The size of digital archives held by different types of archaeological bodies
[Bar chart. Vertical axis: number of archiving bodies (0 to 40). Horizontal axis: archive size (1-5Mb, 5-10Mb, 10-50Mb, 50-100Mb, 100-1,000Mb, >1Gb). Series: National body, Local gov. archaeology, Field archaeology, HEI, Museum, Consultancy.]
http://ads.ahds.ac.uk/
Archaeology Data Service
Big Data Project
Roughly how much data would be generated by a single project?
Average project size (estimated)
over 200GB: 19%
150 - 200GB: 3%
100 - 150GB: 3%
50 - 100GB: 25%
under 50GB: 50%
Which of these data collection techniques do you carry out?
Technologies used
3D Laser Scanning: 12%
Sidescan Sonar: 4%
Multibeam Scanning: 4%
Single Beam Scanning: 3%
Geophysics: 8%
Acoustic Tracking: 1%
Sub bottom profiling: 3%
Geographic (eg GIS): 11%
Lidar: 9%
Digital Video: 9%
Video Movie Clips: 7%
Still Images: 14%
CAD (2D or 3D): 3%
Other: 12%
What are the main software packages you use ?
Software (noted more than once)
3D Studio Max: 4%
ArcGIS: 10%
AutoCAD: 12%
BAE SOCETSET: 4%
CODA: 4%
ENVI / IDL: 4%
ERDAS Imagine: 6%
Golden Software Surfer: 6%
Leica Cyclone: 10%
MicroStation: 4%
Pointools: 4%
Polyworks: 4%
RapidForm: 8%
TerraScan: 6%
Trimble Realworks: 4%
Custom software: 4%
MySQL: 4%
Do you have an archiving policy for the data sets / types in question?
Archival policy?
Yes: 48%
No: 27%
No response: 25%
back-up
When you start a new project …would you consider using existing datasets?
Yes: 28
Not answered: 2
This is the opportunity!
Making the inaccessible accessible
The aim is to make unpublished fieldwork reports available in an easily retrievable fashion. There are currently 8018 reports available, and this number is increasing steadily through the OASIS project in England and Scotland.
Blurring the distinction …
… between publication and archives …
Making the LEAP…
What does that mean for you?
Plan for reuse. Plan for reuse. Plan for reuse. Plan for reuse.
How do you do that?
• Include a data management plan (use the DCCs)
• Order your data
• File naming strategy
• Version control
• Back-up (in the field)
• Consider your file formats
• Dissemination plan (and its longevity)
• What does the long term look like?
• Discuss requirements with an archive
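To make the file naming and version control points concrete, here is a small sketch that checks file names against a convention and picks out the latest version of each file. The convention itself (`project_description_YYYYMMDD_vNN.ext`, lowercase, no spaces) and the names `NAME_PATTERN`, `parse_name` and `latest_versions` are assumptions made for illustration; the presentation does not prescribe a particular scheme.

```python
import re

# Hypothetical convention: project_description_YYYYMMDD_vNN.ext
NAME_PATTERN = re.compile(
    r"(?P<project>[a-z0-9]+)_"
    r"(?P<description>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)"
)


def parse_name(filename):
    """Return the name's components as a dict, or None if it breaks the convention."""
    match = NAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None


def latest_versions(filenames):
    """Keep the highest-numbered version of each conforming file."""
    latest = {}  # (project, description, ext) -> (version number, filename)
    for name in filenames:
        parts = parse_name(name)
        if parts is None:
            continue  # non-conforming names are skipped, not guessed at
        key = (parts["project"], parts["description"], parts["ext"])
        version = int(parts["version"])
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, name)
    return sorted(name for _, name in latest.values())
```

For example, `parse_name("Lithics Report FINAL.doc")` returns `None`, which is the kind of name a convention is meant to rule out. Whatever scheme is chosen, agreeing it at the start of a project and documenting it in the data management plan matters more than the exact pattern.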
We’re here to help
http://archaeologydataservice.ac.uk/
http://guides.archaeologydataservice.ac.uk/