NCSU Libraries
Tools Development and Demonstration: North Carolina Geospatial Data Archiving Project
Jim Tuttle
North Carolina State University Libraries
Process Overview
• Data transfer
• Threat and format analysis, validation
• Archive package organization
• Selective format migration
• Metadata normalization and supplementation
• Source metadata translation
• Statistics collection
• Extra-repository AIP management
Data Transfer
• Python Md5sum comparison
• 'Transfer set' metadata capture in 'Seed file'
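A checksum comparison of this kind might look like the following minimal sketch. It assumes a manifest that maps relative paths to expected MD5 digests; the manifest layout and the function names `md5_of_file` and `verify_transfer` are hypothetical and are not taken from the project's scripts.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(manifest, root):
    """Compare recorded checksums against the files found under root.

    manifest: dict of relative path -> expected MD5 hex digest
              (a hypothetical layout, assumed for illustration).
    Returns a list of (relative_path, problem) tuples; an empty list
    means every listed file arrived with a matching checksum.
    """
    problems = []
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif md5_of_file(target) != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

Reading in chunks keeps memory use flat for large raster files. MD5 is adequate for detecting transfer corruption, though it is not a defence against deliberate tampering.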
Threat and format analysis, validation
Python wrappers for the following:
• Virus – ClamAV
• Compressed files (tar, zip, gzip, bzip)
• Geodatabases (extension and size)
• Executable files (magic numbers)
• Jhove validation
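As an illustration of the magic-number checks listed above, here is a small sketch that classifies a file by its leading bytes. The signature table covers only a few well-known formats, and the names `SIGNATURES`, `classify_bytes`, and `classify_file` are hypothetical; the project's actual wrappers may work differently.

```python
# Well-known leading byte signatures; this table is illustrative, not exhaustive.
SIGNATURES = [
    (b"\x7fELF", "executable (ELF)"),
    (b"MZ", "executable (DOS/Windows)"),
    (b"PK\x03\x04", "compressed (zip)"),
    (b"\x1f\x8b", "compressed (gzip)"),
    (b"BZh", "compressed (bzip2)"),
]


def classify_bytes(header):
    """Return a label for the first matching signature, or None."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    # POSIX tar archives carry the string "ustar" at byte offset 257.
    if header[257:262] == b"ustar":
        return "archive (tar)"
    return None


def classify_file(path):
    """Classify a file on disk from its first 512 bytes."""
    with open(path, "rb") as handle:
        return classify_bytes(handle.read(512))
```

Short signatures such as `MZ` can also match ordinary text files that happen to start with those letters, so a hit is best treated as a flag for review rather than a verdict.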
Archive package organization
• ESRI ArcGIS toolbar for selected formats
Archive package organization
• Rule-based Python logic
  – filestem – extension relationships (multi-file format validation)
  – directory structure
• Manual intervention
  – metadata.doc
• NOID assignment
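The filestem and extension rule can be sketched as follows. The example uses the shapefile case, where a `.shp` file needs `.shx` and `.dbf` files with the same stem. The rule table `REQUIRED_SIBLINGS` and the function `missing_components` are hypothetical names chosen for illustration, and a real rule set would cover more formats.

```python
from collections import defaultdict
from pathlib import PurePath

# Illustrative rule table: primary extension -> extensions that must share its stem.
REQUIRED_SIBLINGS = {".shp": {".shx", ".dbf"}}


def missing_components(filenames):
    """Map each incomplete multi-file dataset to its missing extensions.

    Files are grouped by (directory, lowercased stem); comparison is
    case-insensitive, so reported names are lowercased.
    """
    by_stem = defaultdict(set)
    for name in filenames:
        path = PurePath(name)
        by_stem[(str(path.parent), path.stem.lower())].add(path.suffix.lower())

    problems = {}
    for (parent, stem), extensions in by_stem.items():
        for primary, required in REQUIRED_SIBLINGS.items():
            if primary in extensions:
                missing = required - extensions
                if missing:
                    key = str(PurePath(parent) / stem) + primary
                    problems[key] = sorted(missing)
    return problems
```

For example, a listing containing `parcels.shp` and `parcels.dbf` but no `parcels.shx` would be reported as incomplete, which is the kind of case that would then go to manual intervention.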
Selective Format Migration
• Conversions using ArcGIS toolbar
  – e00 interchange to coverage to shapefile
  – geodatabase to raster, shapefile, etc.
• Original files retained
Metadata Normalization & Supplementation
• Agency-specific XML templates in ArcCatalog with synchronization flags
• Provenance and curation metadata scripted
Source Metadata Translation
• Hub-and-spoke model a la Echo Depository
  – repository agnostic
  – modular conversion hub
  – facilitate repository software migration & inter-archive exchange
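The hub-and-spoke idea can be illustrated with a toy translator in which every format maps to and from one shared hub vocabulary, so adding a format needs one new mapping instead of one per format pair. The field mappings below are heavily simplified assumptions and the names `SPOKES`, `to_hub`, `from_hub`, and `translate` are hypothetical; this is not the project's conversion code.

```python
# Hypothetical spokes: each format maps its native field names to hub names.
SPOKES = {
    "fgdc": {"title": "title", "origin": "creator", "pubdate": "date"},
    "dc": {"dc:title": "title", "dc:creator": "creator", "dc:date": "date"},
}


def to_hub(record, fmt):
    """Convert a native record to hub field names, dropping unmapped fields."""
    mapping = SPOKES[fmt]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def from_hub(hub_record, fmt):
    """Convert a hub record to the native field names of the given format."""
    reverse = {hub: native for native, hub in SPOKES[fmt].items()}
    return {reverse[key]: value for key, value in hub_record.items() if key in reverse}


def translate(record, source, target):
    """Translate between any two registered formats by way of the hub."""
    return from_hub(to_hub(record, source), target)
```

With N formats this design needs N mappings rather than N × (N − 1) pairwise converters, which is what makes migration between repository systems and exchange between archives easier to support. Fields with no hub equivalent are silently dropped here; a production translator would need to record such losses.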
Statistics Collection
• Python scripted statistics generation:
  – number of files by format
  – cumulative size by format
  – mean file size
  – collection size
  – agency contribution
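The statistics listed above could be gathered with a sketch along these lines. It assumes that file format can be approximated by extension and that the first path component names the contributing agency; both are assumptions made for illustration, and `scan` and `collect_statistics` are hypothetical names.

```python
import os
from collections import Counter


def scan(root):
    """Yield (relative_path, size_in_bytes) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root), os.path.getsize(full)


def collect_statistics(entries):
    """Summarize an iterable of (relative_path, size_in_bytes) pairs."""
    files_by_format = Counter()
    bytes_by_format = Counter()
    bytes_by_agency = Counter()
    for rel_path, size in entries:
        # Assumption: the extension stands in for the file format.
        ext = os.path.splitext(rel_path)[1].lower() or "(none)"
        files_by_format[ext] += 1
        bytes_by_format[ext] += size
        # Assumption: the first path component names the contributing agency.
        agency = rel_path.replace("\\", "/").split("/")[0]
        bytes_by_agency[agency] += size
    total_files = sum(files_by_format.values())
    total_bytes = sum(bytes_by_format.values())
    return {
        "files_by_format": dict(files_by_format),
        "bytes_by_format": dict(bytes_by_format),
        "bytes_by_agency": dict(bytes_by_agency),
        "collection_bytes": total_bytes,
        "mean_file_bytes": total_bytes / total_files if total_files else 0.0,
    }
```

A typical call would be `collect_statistics(scan("/path/to/collection"))`. Separating the directory walk from the tallying keeps the summary logic usable with any source of path and size pairs, such as a transfer manifest.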
Extra-repository AIP management
• Workflow Management Database populated as a spoke on the metadata/ingest hub
• External tracking of NOID, Handle, ISO keywords, other metadata for interaction with other systems
Questions?
Jim Tuttle
Geospatial Data Librarian & Project Coordinator
NCGDAP
NCSU Libraries
jim_tuttle at ncsu dot edu
http://www.lib.ncsu.edu/ncgdap/