Experience of a low-maintenance distributed data management system
W. Takase1, Y. Matsumoto1, A. Hasan2, F. Di Lodovico3, Y. Watase1, T. Sasaki1
1. High Energy Accelerator Research Organization (KEK), Japan
2. University of Liverpool, UK
3. Queen Mary, University of London, UK
Contents
• KEK iRODS system
  – Running in production for over 2 years
  – Rules enable files to be stored efficiently
  – Federation with QMUL
• iRODS applications
  – SCALA: Visualization tool for iRODS
  – iRODS XOR-based backup
• Summary
iRODS overview
• Distributed data management system
• Client-server architecture
• Allows data management policies to be enforced on the server side
• Provides an interface to many different types of storage
• Clients can access iRODS via
  – i-commands: command-line utilities
  – iRODS Browser: web interface
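As a rough illustration of the i-commands route, the sketch below builds the argument lists for three standard i-commands (`iput`, `ils`, `iget`) and runs one only if it is installed. The collection path and file names are hypothetical and used only for illustration; they are not taken from the KEK setup.

```python
import shutil
import subprocess


def icommand(name, *args):
    """Build the argument list for an i-command such as ils, iput or iget."""
    return [name, *args]


def run(cmd):
    """Run an i-command if it is installed; return its stdout, else None."""
    if shutil.which(cmd[0]) is None:
        return None  # i-commands are not installed on this machine
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Hypothetical collection path and file names, for illustration only.
collection = "/KEK-T2K/home/alice"
upload = icommand("iput", "run001.dat", collection)
listing = icommand("ils", collection)
download = icommand("iget", f"{collection}/run001.dat", "copy.dat")
```

Calling `run(listing)` on a machine with a configured iRODS client would list the collection; elsewhere it returns `None`.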
KEK iRODS Systems
• 4 iRODS servers
  – RHEL 5.6
  – iRODS 2.5 ⇒ 3.2
  – PostgreSQL 9.1.1
  – In operation for over 2 years
• iRODS Zone
  – KEK-T2K
  – KEK-MLF
  – KEKZone
  – demoKEKZone
• Storage resource
  – HPSS (High Performance Storage System)
  – Disk System
Data Management for T2K
• Tokai to Kamioka (T2K) neutrino experiment group
• The experimental data is stored in KEK storage
• The group needed an easy way to quickly access the collected data from outside KEK in order to evaluate its quality
• iRODS provided the solution
[Figure: T2K map, http://t2k-experiment.org/wp-content/uploads/t2kmap.gif]
Data Management for T2K
• The KEK-T2K Zone for the experimental group started operation in October 2010
• Detected data are processed and then transferred to KEK iRODS
• People in the group became able to access the stored data easily and quickly
  – i-commands
  – iRODS Browser
iRODS Rules for KEK-T2K Zone
• Bundle and replicate the data
  – Each experimental data file is small (about several MB)
  – HPSS prefers large files
[Diagram: Client, T2K data server (disk), iRODS server (rodsweb, DB), Disk system, HPSS; individual files are bundled into a tar file, which is replicated]
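The bundling idea can be sketched in plain Python. This is not the actual iRODS rule used at KEK; it only illustrates grouping many small files into tar files of roughly a target size. The function name `bundle_files`, the file names and the size threshold are all assumptions made for the example.

```python
import os
import tarfile
import tempfile


def bundle_files(paths, out_dir, target_bytes):
    """Group files into tar archives of roughly target_bytes each.

    Returns the list of tar file paths that were written.
    """
    bundles, current, current_size = [], [], 0
    for path in paths:
        current.append(path)
        current_size += os.path.getsize(path)
        if current_size >= target_bytes:
            bundles.append(current)
            current, current_size = [], 0
    if current:
        bundles.append(current)  # leftover files form a final, smaller bundle

    tar_paths = []
    for index, members in enumerate(bundles):
        tar_path = os.path.join(out_dir, f"bundle_{index:04d}.tar")
        with tarfile.open(tar_path, "w") as tar:
            for member in members:
                tar.add(member, arcname=os.path.basename(member))
        tar_paths.append(tar_path)
    return tar_paths


# Demo: five 1 KiB files with a 2 KiB target give two full bundles and one leftover.
work = tempfile.mkdtemp()
files = []
for i in range(5):
    name = os.path.join(work, f"event_{i}.dat")
    with open(name, "wb") as handle:
        handle.write(bytes(1024))
    files.append(name)
tars = bundle_files(files, work, target_bytes=2048)
```

In a real deployment the resulting tar files, not the individual small files, would be the objects replicated to tape-backed storage such as HPSS.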
iRODS Rules for KEK-T2K Zone
• Response to request
[Diagram: Client, iRODS server (rodsweb, DB), Disk system, HPSS, T2K data server (disk); on a client request, the requested file is provided from the stored tar file]
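A minimal sketch of how such a request might be served, assuming the flow is "serve from disk if present, otherwise extract the requested member from the tar bundle and cache it on disk". The function `fetch`, the cache layout and the file names are hypothetical; the real rule logic is not given in the slides.

```python
import io
import os
import tarfile
import tempfile


def fetch(name, disk_cache, tar_path):
    """Return the bytes of `name`, staging it from the tar bundle if needed."""
    cached = os.path.join(disk_cache, name)
    if not os.path.exists(cached):
        # Cache miss: pull only the requested member out of the bundle.
        with tarfile.open(tar_path, "r") as tar:
            member = tar.extractfile(name)
            if member is None:
                raise FileNotFoundError(name)
            data = member.read()
        with open(cached, "wb") as handle:
            handle.write(data)
    with open(cached, "rb") as handle:
        return handle.read()


# Demo: build a one-member bundle, then request the same file twice.
work = tempfile.mkdtemp()
tar_path = os.path.join(work, "bundle_0000.tar")
payload = b"neutrino event data"
with tarfile.open(tar_path, "w") as tar:
    info = tarfile.TarInfo("event_0.dat")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
cache = os.path.join(work, "cache")
os.makedirs(cache)
first = fetch("event_0.dat", cache, tar_path)   # staged from the tar file
second = fetch("event_0.dat", cache, tar_path)  # served from the disk cache
```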
Federation with QMUL
• Data replication between the 2 sites
• Each site shares its data with the other
[Diagram: KEK-T2K (experimental data) and QMULZone (analytical data), connected by federation]
Amount of data in KEK-T2K
The T2K group started data taking on 22nd Dec, 2011
SCALA: Visualization tool for iRODS
• Statistical Charts And Log Analyzer
• iRODS lacked an interface for usage statistics and for debugging problems
• We developed a web interface for visualizing an overview of iRODS status
  – Statistical Charts page
  – Log Analyzer page
• SCALA has been installed on KEK iRODS
SCALA Overview
• Input: iRODS outputs (resource usage, log files)
• Output: daily system status visualized as charts
[Diagram: iRODS (resource usage, log files) → Parse → parsed table → Summarize → summarized table → Display; the tables are kept in a database inside SCALA]
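The parse, summarize and display stages can be sketched as three small functions. This is not SCALA's code: the log line layout below is invented for illustration (the real iRODS log format differs), and the "chart" is plain text rather than a web page.

```python
from collections import Counter

# Hypothetical log lines: "<date> <time> <LEVEL> <message>". The real iRODS
# log format differs; this layout is only for illustration.
LOG = """\
2012-05-01 10:00:01 NOTICE connection accepted
2012-05-01 10:02:17 ERROR file not found
2012-05-02 09:15:40 ERROR file not found
2012-05-02 09:20:03 ERROR permission denied
"""


def parse(text):
    """Parse step: turn raw log text into rows of (date, level, message)."""
    rows = []
    for line in text.splitlines():
        date, _time, level, message = line.split(" ", 3)
        rows.append((date, level, message))
    return rows


def summarize(rows):
    """Summarize step: count ERROR rows per day."""
    return Counter(date for date, level, _ in rows if level == "ERROR")


def display(summary):
    """Display step: render one text bar per day."""
    return [f"{day} {'#' * count} ({count})" for day, count in sorted(summary.items())]


parsed = parse(LOG)
daily_errors = summarize(parsed)
chart = display(daily_errors)
```

Keeping the parsed rows and the daily summary as separate tables mirrors the two-table layout in the diagram: the summary drives the charts, while the parsed rows remain available for drilling into individual errors.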
Statistical Charts
• Visualizes iRODS daily operational data
Log Analyzer
• Provides an error debugging tool
1. User clicks a bar
2. Error detail is displayed
3. User clicks an error message
4. Related log is displayed
Download SCALA
• http://tgwww.kek.jp/scala/
iRODS XOR-based backup
• Full file replication
  – The current method for reliable data storage is to replicate the data
  – If a disk or server fails, a copy is still available
  – Requires much storage space
  – If a portion of the file becomes corrupt, the full file has to be replaced
• XOR-based backup
  – Reduces the storage space with the same robustness
  – Splits the file into blocks and creates parity blocks
  – If a block becomes corrupt, only the corrupted block has to be recreated
XOR-based backup: 100% recovery when any 2 servers fail
Full-file replication uses 3 servers and needs 300GB
XOR-based backup uses 4 servers but only needs 200GB
An iRODS rule enables automatic processing
Server1  Server2  Server3  Server4
A        B        C        D
E = B + C
F = C + D
G = A + D
H = A + B
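The parity scheme and the storage figures can be checked with a short sketch. It assumes that "+" in the slide denotes byte-wise XOR and that the file is 100GB (implied by three replicas needing 300GB); the function names are illustrative and this is not the iRODS rule itself.

```python
def xor_blocks(x, y):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(a ^ b for a, b in zip(x, y))


def encode(data):
    """Split data into four blocks A-D and derive parity blocks E-H."""
    size = -(-len(data) // 4)                  # ceiling division
    padded = data.ljust(4 * size, b"\0")       # zero-pad to a multiple of 4
    a, b, c, d = (padded[i * size:(i + 1) * size] for i in range(4))
    return {
        "A": a, "B": b, "C": c, "D": d,
        "E": xor_blocks(b, c),
        "F": xor_blocks(c, d),
        "G": xor_blocks(a, d),
        "H": xor_blocks(a, b),
    }


blocks = encode(b"0123456789abcdef")
stored = sum(len(block) for block in blocks.values())  # twice the input size

file_gb = 100                        # assumed file size implied by the slide
replication_gb = 3 * file_gb         # three full copies
xor_gb = 8 * (file_gb / 4)           # eight blocks, each a quarter of the file
```

Eight quarter-size blocks amount to twice the file size, which is where the 200GB figure comes from, against three times the file size for full replication.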
XOR-based backup: Decoding flow
Server1  Server2  Server3  Server4
A        B        C        D
E = B + C
F = C + D
G = A + D
H = A + B
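A sketch of the decoding step, under two assumptions that the slide text does not state explicitly: "+" is byte-wise XOR, and each server holds one data block plus one parity block in the order shown (Server1: A and E, Server2: B and F, Server3: C and G, Server4: D and H). With that placement, the loop below checks that every combination of two failed servers is recoverable.

```python
from itertools import combinations


def xor_blocks(x, y):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(a ^ b for a, b in zip(x, y))


# Parity definitions from the slide: each parity block is the XOR of two data blocks.
PARITY = {"E": ("B", "C"), "F": ("C", "D"), "G": ("A", "D"), "H": ("A", "B")}
# Assumed placement: server i holds the i-th data block and the i-th parity block.
SERVERS = [("A", "E"), ("B", "F"), ("C", "G"), ("D", "H")]


def recover(available):
    """Rebuild missing data blocks from the blocks that survived."""
    known = dict(available)
    progress = True
    while progress and not all(name in known for name in "ABCD"):
        progress = False
        for parity, (x, y) in PARITY.items():
            if parity not in known:
                continue
            if x in known and y not in known:
                known[y] = xor_blocks(known[parity], known[x])
                progress = True
            elif y in known and x not in known:
                known[x] = xor_blocks(known[parity], known[y])
                progress = True
    if not all(name in known for name in "ABCD"):
        raise ValueError("not enough surviving blocks to recover the file")
    return {name: known[name] for name in "ABCD"}


data = {"A": b"0123", "B": b"4567", "C": b"89ab", "D": b"cdef"}
blocks = dict(data)
for parity, (x, y) in PARITY.items():
    blocks[parity] = xor_blocks(data[x], data[y])

# Try every pair of failed servers and record whether the data is recovered.
results = []
for failed in combinations(range(4), 2):
    surviving = {
        name: blocks[name]
        for index, names in enumerate(SERVERS) if index not in failed
        for name in names
    }
    results.append(recover(surviving) == data)
```

For example, if Server1 and Server2 fail, A is rebuilt as G XOR D and then B as H XOR A.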
Summary
• The KEK iRODS system has been running in production for over 2 years
• iRODS provides a way to quickly and easily access data from outside KEK
• The rule that bundles and replicates the data allows files to be stored efficiently
• Federation with QMUL enables the two sites to share and back up each other's data
• SCALA is a visualization tool and has been installed on KEK iRODS
  – It leads to better management of the overall iRODS service
• XOR-based backup provides data reliability at a lower storage cost than replication
  – An iRODS rule enables automatic processing