
AFS-Workshop 2008

Dr. Stephan Wonczak

wonczak@rrz.uni-koeln.de

© 2008 ZAIK/RRZK 1

Dr. Stephan Wonczak

Abteilung Systeme

Zentrum für Angewandte Informatik der Universität zu Köln (ZAIK/RRZK)

Organizing Data: How to manage large, randomly growing data sets in AFS


Contents

The Problem

Wrong paths

Meeting the user

The ‘Right Way’™ - A case study


The Problem

A user comes along, usually by e-mail:

“I want to put up some data. Lots of it”

“I don’t want any management overhead”

“I want everything in a single directory”

Users usually do not know their real needs

Data structure?

Data size (MB? GB? TB?)

Access?


Wrong paths

What the user wants is seldom what he needs

Do not simply give in to demands

Do not do a quick-and-dirty solution

It WILL come back and bite you!

“Why didn’t you tell me beforehand that this was a bad idea?”

“Why didn’t you tell me about this limitation?”


Meeting the User

Talking to the User is essential

Ask about the type of data

Is AFS really suited for the data?

Databases? High Performance Data Access?

Talk about AFS’s limits:

Bandwidth, file size, number of clients...

Ask about data structure

Look for ‘natural divisions’ in the data set

If there are none, create a usable division

Having manageable units is essential


Meeting the User

Talking to the User is essential

Talk about access models

Who will actually work with the data?

Confidentiality of the data

Talk about Backup

Ask about the rate of change

Create a backup schedule

Getting xx TB to/from tape takes time!

Discuss worst-case scenarios
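To put the tape warning into numbers: moving data to or from tape takes roughly its size divided by the sustained throughput. The Python sketch below assumes decimal units (1 TB = 10^12 bytes) and a constant streaming rate; the rates in the loop are hypothetical placeholders, not measurements, and a real restore can take longer because of tape changes and file-server overhead. The 5.5 TB figure is the final size of the first Arachne layout from the case study.

```python
def transfer_hours(size_tb: float, throughput_mb_s: float) -> float:
    """Rough wall-clock hours to move size_tb terabytes at a sustained
    throughput of throughput_mb_s megabytes per second (decimal units)."""
    size_bytes = size_tb * 1e12
    seconds = size_bytes / (throughput_mb_s * 1e6)
    return seconds / 3600


# Hypothetical sustained tape rates; real drives and setups vary widely.
for rate in (40, 80, 120):
    print(f"5.5 TB at {rate} MB/s: {transfer_hours(5.5, rate):.1f} h")
```

Even at the optimistic end this is most of a working day, which is why the backup schedule and worst-case restore time belong in the first conversation with the user.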


Meeting the User

Talking to the User is essential

Talk about money

Who will pay for all the storage?

How much effort is this going to take?

(i.e., who will pay for the people doing the job?)

Talk about responsibilities and guarantees

Who will manage what part of the job?

Talk about MTBF and reaction time to problems


The ‘Right Way’™

Arachne Archive – A case study

Original situation: 1 server, 1 disk (1 TB RAID)

Locally managed by the user

Local access only

User wishes

User was concerned about availability

Projected needs exceeded their means

Easy external access was needed


The ‘Right Way’™

Arachne Archive – A case study (continued)

1st version: 32 GB volumes mounted in a single directory

Start: 170 volumes

End: 390 volumes with ~5.5 TB of data

Paths to data are kept in a database

Problems with this scheme:

User wanted a fixed order of the data

Inserting data was a big problem:

Volume size, mount points, symbolic links


The ‘Right Way’™

Arachne Archive – A case study (continued)

2nd Version

Analyzing the data led to the following:

10,000 volumes, each with a 5 GB quota (50 TB total quota)

Data folders are named numerically: 0000 – 9999

Organisation in 2 tiers

Order criterion: last 4 digits of the object number

Growth is possible

Structure is easily extensible
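The order criterion can be sketched as a small path function. This Python sketch assumes that the last four digits of the object number name the second-tier folder and that the first tier is the first two of those digits followed by 00, which matches the directory tree of the archive (objects ending in 0101 live under 0100/0101). Identifiers containing letters, such as BNC530101c, are handled by ignoring non-digit characters; that rule is an assumption. The function names are hypothetical, and `sort_in` deliberately refuses to create missing folders, on the assumption that in AFS each second-tier folder is a pre-created volume mount point.

```python
import re
import shutil
from pathlib import Path


def archive_subpath(object_id: str) -> Path:
    """Return the relative two-tier location, e.g. 0100/0101/<object_id>.

    Assumption: only the digits of the identifier count, so a letter
    suffix such as the trailing 'c' in 'BNC530101c' is ignored.
    """
    digits = re.sub(r"\D", "", object_id)
    if len(digits) < 4:
        raise ValueError(f"need at least four digits in {object_id!r}")
    tier2 = digits[-4:]          # 0000..9999, one folder (volume) each
    tier1 = tier2[:2] + "00"     # 0000, 0100, ..., 9900
    return Path(tier1) / tier2 / object_id


def sort_in(src: Path, archive_root: Path) -> Path:
    """Move one data directory from a staging area into the archive."""
    dest = archive_root / archive_subpath(src.name)
    # Do not create tier folders here: in AFS they are assumed to be
    # pre-created volume mount points, and mkdir would only create a
    # plain directory inside the parent volume.
    if not dest.parent.is_dir():
        raise FileNotFoundError(f"missing tier folder {dest.parent}")
    if dest.exists():
        raise FileExistsError(dest)
    shutil.move(str(src), str(dest))
    return dest


print(archive_subpath("A40330101").as_posix())  # 0100/0101/A40330101
```

Because the location is derived from the object number alone, the same function also answers "where is object X?", which is what makes the layout usable as a human-readable hash.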


The ‘Right Way’™

Arachne Archive – A case study (continued)

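[Figure: Data structure of the archaeology data in AFS under the directory arachne. The folder archiv holds the first tier 0000, 0100, 0200, …, 9900; folder 0100 holds the second tier 0100, 0101, 0102, …, 0199; a second-tier folder such as 0101 holds the data directories, e.g. 0100340101, BNC530101c, A40330101.]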
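Setting up 10,000 volumes by hand is not realistic, so this step is itself a script. The Python sketch below only generates command lines (it runs nothing) for the standard OpenAFS `vos create` and `fs mkmount` commands. The volume naming scheme `arachne.NNNN`, the server, partition and AFS path arguments, and the choice to make the first tier plain directories rather than volumes are all hypothetical; the 5 GB quota is converted to AFS quota units of 1 KB as 5 × 1024 × 1024 KB.

```python
from __future__ import annotations


def volume_name(tier2: str, prefix: str = "arachne") -> str:
    # Hypothetical naming scheme. AFS read/write volume names are limited
    # to 22 characters; "arachne.0101" uses 12.
    return f"{prefix}.{tier2}"


def setup_commands(server: str, partition: str, archive_path: str,
                   quota_kb: int = 5 * 1024 * 1024) -> list[str]:
    """Commands that create 10,000 volumes and mount them two tiers deep."""
    cmds = []
    for t1 in range(100):
        tier1 = f"{t1:02d}00"
        # Assumption: first-tier folders are ordinary directories.
        cmds.append(f"mkdir {archive_path}/{tier1}")
        for t2 in range(100):
            tier2 = f"{t1:02d}{t2:02d}"
            vol = volume_name(tier2)
            cmds.append(f"vos create -server {server} -partition {partition} "
                        f"-name {vol} -maxquota {quota_kb}")
            cmds.append(f"fs mkmount -dir {archive_path}/{tier1}/{tier2} "
                        f"-vol {vol}")
    return cmds


cmds = setup_commands("fs1", "/vicepa", "/afs/example.org/arachne/archiv")
print(len(cmds), "commands; first volume:", cmds[1])
```

Writing the list to a file gives a shell script that can be reviewed before it is run. All volumes land on one partition here; spreading them out afterwards is the job of the balancing scripts listed under the tools for user and admin.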


The ‘Right Way’™

Arachne Archive – A case study (continued)

Tools for the User and Admin

Scripts to automatically sort in the data

Scripts to readjust quota

Scripts to balance volumes between partitions

Advantages

Human-readable ‘hash’: data can be easily located

Inserting data manually is still possible
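The quota and balancing scripts are not described in detail, so the following Python sketch is only one possible shape for them. `readjust_quota` implements a hypothetical policy (keep at least 20 % free, grow in 1 GB steps, never shrink) and `setquota_cmd` formats an OpenAFS `fs setquota` call. `plan_moves` is a simple greedy heuristic, not the author's method: it repeatedly moves the largest volume that is smaller than the gap from the fullest to the emptiest partition, then reports one `vos move` per volume whose partition changed. Partitions are written as hypothetical `server:/vicepX` strings; all sizes are in KB.

```python
from __future__ import annotations

import math


def readjust_quota(used_kb: int, quota_kb: int, headroom: float = 0.2,
                   step_kb: int = 1024 * 1024) -> int:
    """New quota (KB) leaving at least `headroom` free; never shrinks."""
    needed = used_kb / (1.0 - headroom)
    grown = math.ceil(needed / step_kb) * step_kb
    return max(quota_kb, grown)


def setquota_cmd(path: str, quota_kb: int) -> str:
    return f"fs setquota -path {path} -max {quota_kb}"


def plan_moves(volumes: dict[str, tuple[int, str]],
               partitions: list[str]) -> list[tuple[str, str, str]]:
    """volumes maps name -> (size_kb, partition); returns (name, from, to)."""
    original = {name: part for name, (_, part) in volumes.items()}
    where = dict(original)
    size = {name: kb for name, (kb, _) in volumes.items()}
    load = {p: 0 for p in partitions}
    for name, part in where.items():
        load[part] += size[name]

    while True:
        full = max(load, key=load.get)
        empty = min(load, key=load.get)
        gap = load[full] - load[empty]
        # Only a volume smaller than the gap makes the two partitions
        # more even; if there is none, stop.
        candidates = [n for n, p in where.items()
                      if p == full and 0 < size[n] < gap]
        if not candidates:
            break
        vol = max(candidates, key=lambda n: size[n])
        where[vol] = empty
        load[full] -= size[vol]
        load[empty] += size[vol]

    # Report each volume once, from its original to its final partition.
    return [(n, original[n], where[n]) for n in sorted(where)
            if where[n] != original[n]]


def vos_move_cmd(name: str, src: str, dst: str) -> str:
    from_server, from_part = src.split(":", 1)
    to_server, to_part = dst.split(":", 1)
    return (f"vos move -id {name} -fromserver {from_server} "
            f"-frompartition {from_part} -toserver {to_server} "
            f"-topartition {to_part}")
```

Moving a volume of size s from a partition with load A to one with load B, where 0 < s < A − B, strictly lowers the sum of squared partition loads, so the loop always terminates; it does not guarantee the best possible balance.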


The ‘Right Way’™

More Projects

Prometheus Picture Archive (failed)

CEEC Picture Database

Opera Project (scanned scores)

TR32DB (Project document archive)

Shared Folders for Institutes

Department of Biology

Department of ‘HW’ (in preparation)


Thank you for listening!

Questions?

This concludes my talk.

Are there any questions?