WGS Data management course
Try-out
2012-09-24, Hugo Besemer
Short-term storage: file and path names
MS Windows and Mac OS allow very long names, but ...
Are your filenames descriptive?
Are your filenames unique?
The 8.3 convention (12345678.abc) matters, e.g. when burning CDs or DVDs
Avoid spaces in the names of files that may go on the web
Avoid punctuation such as ( ) \ / : * ? " < > ' as these characters may be reserved in operating systems or programming languages
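The naming advice above can be checked mechanically. A minimal Python sketch, assuming the reserved set is exactly the characters listed and using a deliberately conservative reading of 8.3 (letters, digits, `_` and `-` only); `name_issues` and `duplicate_names` are hypothetical helper names:

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Characters the slide advises against (assumed to be exactly this list).
RESERVED = set('()\\/:*?"<>\'')

# Conservative 8.3 check: 1-8 safe characters, optionally a dot and 1-3 more.
EIGHT_DOT_THREE = re.compile(r"^[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9]{1,3})?$")


def name_issues(filename: str) -> list[str]:
    """Return the convention problems found in a single file name."""
    issues = []
    if " " in filename:
        issues.append("contains spaces")
    bad = sorted(set(filename) & RESERVED)
    if bad:
        issues.append("reserved characters: " + "".join(bad))
    if not EIGHT_DOT_THREE.match(filename):
        issues.append("not 8.3")
    return issues


def duplicate_names(paths: list[str]) -> dict[str, list[str]]:
    """Map each file name that occurs in more than one folder to its paths."""
    seen = defaultdict(list)
    for p in paths:
        seen[PurePosixPath(p).name].append(p)
    return {name: ps for name, ps in seen.items() if len(ps) > 1}
```

For example, `name_issues("field data (2012).xlsx")` reports the spaces, the reserved characters `()` and a name that is not 8.3, while `SOIL12.CSV` passes all three checks.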
Short-term storage: descriptive file names
[Figure: descriptive file names in a folder structure, compared with names that are not unique across folders and names that are unique across folders]
This works for relatively small numbers of files. If large numbers of files are produced automatically, non-descriptive file names may be used. You then need something else (a DAMS, "digital asset management system") to keep track of what is what.
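At its core, a DAMS is an external index from file names to descriptions. The toy `Catalogue` class below is hypothetical (not any real DAMS product): it generates non-descriptive 8.3-style names such as `IMG00001.tif` and keeps the descriptive metadata in a separate table that can be exported as CSV:

```python
import csv
import io
import itertools


class Catalogue:
    """Toy stand-in for a DAMS: maps generated file names to descriptions.

    Real systems also handle search, rights and versions; this only shows
    the idea of keeping descriptions outside the file names.
    """

    def __init__(self, prefix: str = "IMG", extension: str = ".tif"):
        self.prefix = prefix
        self.extension = extension
        self._counter = itertools.count(1)
        self.records: dict[str, dict[str, str]] = {}

    def register(self, **metadata: str) -> str:
        """Generate the next non-descriptive name and record its metadata."""
        name = f"{self.prefix}{next(self._counter):05d}{self.extension}"
        self.records[name] = metadata
        return name

    def find(self, **criteria: str) -> list[str]:
        """Return names whose metadata match all given key/value pairs."""
        return [name for name, md in self.records.items()
                if all(md.get(k) == v for k, v in criteria.items())]

    def to_csv(self) -> str:
        """Serialise the index so it can be stored next to the files."""
        keys = sorted({k for md in self.records.values() for k in md})
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["filename", *keys])
        writer.writeheader()
        for name, md in self.records.items():
            writer.writerow({"filename": name, **md})
        return buf.getvalue()
```

With `cat = Catalogue()`, calling `cat.register(site="Ede", date="2012-09-24")` returns `IMG00001.tif`, and `cat.find(site="Ede")` finds it again even though the name itself says nothing.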
Short-term storage: version control
Questions and best practices
●Are you working alone or with others?
●Do you store files at different locations? (synchronisation)
●Keep track of ‘master files’ and ‘milestone files’ and store them in a single location (Dropbox?)
Identifying versions
●Use a naming convention that includes a date or a version number (..._v1, ..._v2)
●Your software may be able to do (part of) the job
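A version-number convention is easy to automate. A sketch, assuming the `stem_vN.suffix` pattern from the slide; `next_version` and `dated_name` are hypothetical helpers, and the list of existing names could come from e.g. `[p.name for p in Path("drafts").iterdir()]`:

```python
import re
from datetime import date


def next_version(existing: list[str], stem: str, suffix: str) -> str:
    """Return stem_vN.suffix with N one higher than any existing version."""
    pattern = re.compile(rf"^{re.escape(stem)}_v(\d+){re.escape(suffix)}$")
    versions = [int(m.group(1)) for name in existing
                if (m := pattern.match(name))]
    return f"{stem}_v{max(versions, default=0) + 1}{suffix}"


def dated_name(stem: str, suffix: str, day: date) -> str:
    """Alternative convention: an ISO date (YYYY-MM-DD) sorts chronologically."""
    return f"{stem}_{day.isoformat()}{suffix}"
```

Comparing version numbers as integers matters: sorted as text, `_v10` would come before `_v2`.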
Backups
Stick to the agreed way of working within your group (if there is one).
The next slides present some points of view from the Wageningen UR IT department (FB-IT).
Backups: IT Data storage Continuity
Versus
• Data centre: secure against fire, power incidents and burglary
• 2 data centres in case of disaster
• The equipment is fail-safe
• 500 TB reserved, 300 TB in use, 1 PB available
Backups: ICT Data Products & Services
Service | Application | Price per GB | Backup | Reliability/availability | Speed | Minimal supply
------- | ----------- | ------------ | ------ | ------------------------ | ----- | --------------
Bronze | Volatile or static data | €3.25 | Week | Good | Good | 50 GB
Silver | Databases or research data | €5 without backup, €7 with backup | Month | High | Fast | 50 GB
Gold | Critical data | €15 | Month + history (max. 1 year) | Very high | Fast | 1 GB
Massive | Mass-reproducible data | €520 per TB | No | Good | Good | 1 TB
Massive double | Same as Massive, high availability | €1000 per TB | No | High | Good | 1 TB
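To compare tiers for a given data volume, a rough cost calculation helps. The slide does not state the billing period or how volumes are rounded, so this sketch assumes that the price applies per billing period, that usage is billed in whole units (GB, or TB for the Massive tiers, with 1 TB = 1000 GB) rounded up, and that the minimal supply is the smallest amount billed. Prices are kept in euro cents to avoid floating-point rounding:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cents_per_unit: int  # price per billing unit, in euro cents
    unit_gb: int         # 1 for per-GB tiers, 1000 for per-TB tiers (assumed)
    minimum_units: int   # "minimal supply", expressed in billing units


# Prices copied from the table; billing rules are assumptions.
TIERS = {
    "Bronze": Tier("Bronze", 325, 1, 50),
    "Silver (no backup)": Tier("Silver (no backup)", 500, 1, 50),
    "Silver (backup)": Tier("Silver (backup)", 700, 1, 50),
    "Gold": Tier("Gold", 1500, 1, 1),
    "Massive": Tier("Massive", 52000, 1000, 1),
    "Massive double": Tier("Massive double", 100000, 1000, 1),
}


def cost_cents(tier: Tier, volume_gb: float) -> int:
    """Cost per period: whole units, rounded up, never below the minimum."""
    units = max(math.ceil(volume_gb / tier.unit_gb), tier.minimum_units)
    return units * tier.cents_per_unit
```

Under these assumptions, 10 GB on Bronze is billed as the 50 GB minimum, i.e. 16250 cents (€162.50) per period.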
Backups: Better alignment
(% = percentage of all responses at this score or one point above or below)
Subject | Importance | Score (FB-IT)
------- | ---------- | -------------
Ease of use | 9 (85%) | 8 (64%)
Backup/Restore | 9 (64%) | 7 (28%), very diverse
Share (internal) | 9 (79%) | 9 (71%)
Share (external) | 6 (28%), very diverse | 5 (21%), many n/a
Archive function | 8 (50%), many n/a | 5 (14%)
Findable | 9 (79%) | 7 (28%)
Price | 9 (86%) | 4 (28%), very diverse
Speed of data transfer | 9 (72%) | 5 (21%), very diverse
Availability | 9 (79%) | 8 (64%)
Flexibility | 8 (78%) | 6 (28%)
Security | 7 (50%), very diverse | 8 (57%)
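One way to read this table is to rank subjects by the gap between importance and the FB-IT score. The numbers below are copied from the table (percentages omitted); the ranking rule is an illustration, not part of the survey:

```python
# (importance, FB-IT score) per subject, copied from the table
SCORES = {
    "Ease of use": (9, 8),
    "Backup/Restore": (9, 7),
    "Share (internal)": (9, 9),
    "Share (external)": (6, 5),
    "Archive function": (8, 5),
    "Findable": (9, 7),
    "Price": (9, 4),
    "Speed of data transfer": (9, 5),
    "Availability": (9, 8),
    "Flexibility": (8, 6),
    "Security": (7, 8),
}


def alignment_gaps(scores: dict[str, tuple[int, int]]) -> list[tuple[str, int]]:
    """Return (subject, importance - score) pairs, largest gap first.

    sorted() is stable, so subjects with equal gaps keep their table order.
    """
    gaps = [(subject, imp - sc) for subject, (imp, sc) in scores.items()]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)
```

Price (gap 5), speed of data transfer (4) and the archive function (3) come out on top; security is the only subject where the score exceeds the importance.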
Backups: Data storage workshop conclusions
Enhancement requests:
1. Lower the price
2. Set up a corporate-wide ("concern") policy for information security
3. Higher flexibility (request period, use period, costing, etc.)
4. Access for external people
5. Deliver an archiving product
6. Higher throughput (data rate)
What is the next step?
● Building a roadmap for IT storage and products
Long-term storage: metadata
Metadata are structured data that provide a short summary of any information resource, print or electronic, and facilitate the location, identification, or discovery of that resource.
Content metadata: subject terms, titles
Context metadata: creator, place, time, project
Metadata serve different purposes:
● Location. Metadata can indicate where an information resource is located, either physically or virtually.
● Identification. Metadata can distinguish one information resource from another without describing the entire collection of information resources.
● Resource discovery. Metadata can link a user's queries about a particular subject to information resources about the same subject.
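The three purposes map directly onto fields of a metadata record. A minimal sketch with hypothetical field names (not a standard schema such as Dublin Core): an identifier supports identification, a location field supports location, and subject terms support resource discovery:

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    identifier: str   # identification: distinguishes this resource
    location: str     # location: where the resource can be found
    title: str        # content metadata
    subjects: set[str] = field(default_factory=set)  # content metadata
    creator: str = ""  # context metadata
    place: str = ""    # context metadata
    time: str = ""     # context metadata
    project: str = ""  # context metadata


def discover(records: list[MetadataRecord], query_terms: list[str]) -> list[str]:
    """Resource discovery: identifiers of records sharing a subject term."""
    wanted = {t.lower() for t in query_terms}
    return [r.identifier for r in records
            if wanted & {s.lower() for s in r.subjects}]
```

A query for "Soil" then finds a record tagged with the subject term "soil" without anyone having to open the dataset itself.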
Long-term storage: metadata and datasets
DANS: Dutch national repository for datasets
Unique ID
Long-term storage: metadata, datasets and preservation
It’s as open as you want it to be
In a sustainable format, independent of software and software versions
With proper documentation for re-use
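A plain-text format such as CSV, plus a short README describing each column, is one common way to meet the last two points. A sketch using only Python's standard `csv` module; the file names `data.csv` and `README.txt` and the helper names are assumptions:

```python
import csv
import io
from pathlib import Path


def to_csv_text(columns: list[str], rows: list[list]) -> str:
    """Render rows as plain CSV, readable without the software that made them."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def readme_text(title: str, column_docs: dict[str, str]) -> str:
    """Minimal documentation for re-use: one line per column (meaning, unit)."""
    lines = [title, "", "Columns:"]
    lines += [f"- {name}: {doc}" for name, doc in column_docs.items()]
    return "\n".join(lines) + "\n"


def save_dataset(folder: str, columns: list[str], rows: list[list],
                 title: str, column_docs: dict[str, str]) -> None:
    """Write data.csv and README.txt as UTF-8 text into folder."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    (target / "data.csv").write_text(to_csv_text(columns, rows), encoding="utf-8")
    (target / "README.txt").write_text(readme_text(title, column_docs), encoding="utf-8")
```

Both files can be opened in any text editor, so neither depends on a particular program or program version.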
Long-term storage: selection
Practical
● Easy to reproduce
● Cost of documentation / conversion acceptable
● File size
Origin
● Reliable
● Authentic
● Is it stored elsewhere?
Status
● Required for verification
● Required for legal purposes
Subject content
● Re-usable
● General interest
● (WUR) mission
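The selection criteria can be turned into a checklist to fill in per dataset. A sketch, assuming a grouping of the criteria under the four categories Practical, Origin, Status and Subject content, with each criterion rephrased as a yes/no question (the question wording is an assumption). It only collects answers and lists open questions; the keep-or-discard decision stays with you, since some answers (e.g. "easy to reproduce") argue against keeping data:

```python
# Hypothetical checklist; grouping and wording are assumptions.
CHECKLIST = {
    "Practical": ["Easy to reproduce?", "Cost of documentation/conversion acceptable?",
                  "File size manageable?"],
    "Origin": ["Reliable?", "Authentic?", "Stored elsewhere?"],
    "Status": ["Required for verification?", "Required for legal purposes?"],
    "Subject content": ["Re-usable?", "Of general interest?", "Fits the (WUR) mission?"],
}


def review(answers: dict[str, bool]) -> dict[str, list[str]]:
    """Group answered questions by category and list unanswered ones under 'Open'."""
    report: dict[str, list[str]] = {}
    open_questions = []
    for category, questions in CHECKLIST.items():
        lines = []
        for q in questions:
            if q in answers:
                lines.append(f"{q} {'yes' if answers[q] else 'no'}")
            else:
                open_questions.append(q)
        report[category] = lines
    report["Open"] = open_questions
    return report
```

The "Open" list shows which questions still need an answer before the selection can be recorded in a data management plan.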
What does all this mean for your data management plan?