AFS-Workshop 2008
Dr. Stephan Wonczak
© 2008 ZAIK/RRZK
Dr. Stephan Wonczak
Abteilung Systeme
Zentrum für Angewandte Informatik der Universität zu Köln (ZAIK/RRZK)
Organizing Data
How to manage large, randomly growing data sets in AFS
Contents
The Problem
Wrong paths
Meeting the user
The ‘Right Way’™ - A case study
The Problem
A user comes along... usually by email
“I want to put up some data. Lots of it”
“I don’t want any management overhead”
“I want everything in a single directory”
Users usually do not know their real needs
Data structure?
Data size (MB? GB? TB?)
Access?
Wrong paths
What the user wants is seldom what he needs
Do not simply give in to demands
Do not build a quick-and-dirty solution
It WILL come back and bite you!
“Why didn’t you tell me before this was a bad idea?”
“Why didn’t you tell me about this limitation?”
Meeting the User
Talking to the User is essential
Ask about the type of data
Is AFS really suited for the data?
Databases? High Performance Data Access?
Talk about AFS’s Limits:
Bandwidth, file size, number of clients...
Ask about data structure
Look for ‘natural divisions’ in the data set
If needed, create something usable
Having manageable units is essential
Meeting the User
Talking to the User is essential
Talk about access models
Who will actually work with the data?
Confidentiality of the data
Talk about Backup
Ask about the rate of change
Create a backup schedule
Getting xx TB to/from tape takes time!
Discuss worst-case scenarios
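To make "takes time" concrete when talking to the user, a rough estimate can be computed from the data size and the sustained tape throughput. The sketch below is only a back-of-the-envelope aid: the function name `transfer_hours` is hypothetical, the throughput is a value you have to measure for your own backup system, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
def transfer_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Estimate the hours needed to move size_tb terabytes to or from tape.

    Assumptions (not from the slides): decimal units and a constant,
    sustained throughput without tape changes or retries.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = size_tb * 1_000_000          # TB -> MB (decimal)
    seconds = size_mb / throughput_mb_s    # MB / (MB/s) = s
    return seconds / 3600                  # s -> h


if __name__ == "__main__":
    # Example with an assumed (not measured) throughput of 100 MB/s.
    print(f"{transfer_hours(10, 100):.1f} hours for 10 TB")
```

Real restores are usually slower than this idealised figure, which is one more reason to discuss the schedule with the user in advance.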
Meeting the User
Talking to the User is essential
Talk about money
Who will pay for all the storage?
How much effort is this going to take?
(i.e., who will pay for the people doing the job?)
Talk about responsibilities and guarantees
Who will manage what part of the job?
Talk about MTBF and reaction time to problems
The ‘Right Way’™
Arachne Archive – A case study
Original situation: 1 Server, 1 Disk (1 TB RAID)
Locally managed by the user
Local access only
User wishes
User was concerned about availability
Projected needs exceeded their means
Easy external access was needed
The ‘Right Way’™
Arachne Archive – A case study (continued)
1st Version: 32 GB volumes mounted in a single directory
Start: 170 volumes
End: 390 volumes with ~5.5 TB of data
Paths to data are kept in a database
Problems with this scheme:
User wanted a fixed order of the data
Inserting Data was a big problem:
Volume size, Mountpoints, symbolic links
The ‘Right Way’™
Arachne Archive – A case study (continued)
2nd Version
Analyzing the data led to the following:
10,000 volumes of 5 GB each (50 TB total quota)
Data folders are named numerically: 0000 – 9999
Organisation in 2 tiers
Order criterion: last 4 digits of the object number
Growth is possible
Structure is easily extensible
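The ordering rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scripts used in the project: the function names and the root `archiv` as a relative path are hypothetical, trailing non-digit characters (as in `BNC530101c`) are assumed to be ignored when taking the last 4 digits, and the first tier is assumed to be the 4-digit key rounded down to the nearest hundred, as the folder names 0000, 0100, ..., 9900 suggest.

```python
import re
from pathlib import PurePosixPath

ARCHIVE_ROOT = PurePosixPath("archiv")  # hypothetical root of the archive


def folder_key(object_number: str) -> str:
    """Return the last four digits contained in the object number."""
    digits = re.sub(r"\D", "", object_number)  # drop letters such as 'BNC' or 'c'
    if len(digits) < 4:
        raise ValueError(f"need at least 4 digits in {object_number!r}")
    return digits[-4:]


def object_path(object_number: str, root: PurePosixPath = ARCHIVE_ROOT) -> PurePosixPath:
    """Map an object number to <root>/<tier 1>/<tier 2>/<object number>."""
    key = folder_key(object_number)   # tier 2: one of 0000..9999
    tier1 = key[:2] + "00"            # tier 1: assumed to be the hundreds bucket
    return root / tier1 / key / object_number


if __name__ == "__main__":
    for number in ("0100340101", "BNC530101c", "A40330101"):
        print(object_path(number))
```

Because the key is derived from the object number alone, a person can locate a data directory by hand without consulting a database.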
The ‘Right Way’™
Arachne Archive – A case study (continued)
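[Figure: Data structure of the archaeology AFS space under the directory arachne]
archiv
First tier: 0000, 0100, 0200, ..., 9900
Second tier (e.g. within 0100): 0100, 0101, 0102, ..., 0199
Data directories (labelled DatenVerzeichnis in the figure), e.g. 0100340101, BNC530101c, A40330101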
The ‘Right Way’™
Arachne Archive – A case study (continued)
Tools for the User and Admin
Scripts to automatically sort in the data
Scripts to readjust quota
Scripts to balance volumes between partitions
Advantages
Human-readable ‘Hash’: Data can be easily located
Inserting data manually is still possible
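As an illustration of what a balancing script might compute, here is a minimal greedy planner. It is a sketch under assumptions, not the tool used for the Arachne archive: the function name `plan_moves`, the partition and volume names, and the input format (used size per volume, in GB) are all hypothetical, and it only proposes moves. Carrying them out would be left to the usual AFS volume administration commands such as `vos move`.

```python
def plan_moves(partitions):
    """Propose volume moves that even out the used space per partition.

    partitions: {partition name: {volume name: used size in GB}}
    Returns a list of (volume, source partition, target partition).
    Greedy rule: repeatedly move one volume from the fullest to the
    emptiest partition, picking the volume closest to half the gap.
    """
    # Work on a copy so the caller's data is left untouched.
    layout = {part: dict(vols) for part, vols in partitions.items()}
    moves = []
    if not layout:
        return moves
    while True:
        totals = {part: sum(vols.values()) for part, vols in layout.items()}
        src = max(totals, key=totals.get)
        dst = min(totals, key=totals.get)
        gap = totals[src] - totals[dst]
        # Only a volume smaller than the gap reduces the imbalance.
        candidates = {v: s for v, s in layout[src].items() if 0 < s < gap}
        if not candidates:
            return moves
        volume = min(candidates, key=lambda v: abs(candidates[v] - gap / 2))
        layout[dst][volume] = layout[src].pop(volume)
        moves.append((volume, src, dst))


if __name__ == "__main__":
    example = {
        "vicepa": {"v0000": 5, "v0001": 4, "v0002": 3},
        "vicepb": {"v0003": 2},
    }
    print(plan_moves(example))
```

Each accepted move strictly reduces the imbalance between the two partitions involved, so the loop terminates. Many small volumes of similar size, as in the second version of the archive, make this kind of balancing easy.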
The ‘Right Way’™
More Projects
Prometheus Picture Archive (failed)
CEEC Picture Database
Opera Project (scanned scores)
TR32DB (Project document archive)
Shared Folders for Institutes
Department of Biology
Department of ‘HW’ (in preparation)
Thank you for listening!
Questions?
That concludes the talk.
Are there any questions?