Jean-Yves Nief, CC-IN2P3, Lyon
First Latin American EELA Workshop April 24th – 26th, 2006
Data distribution and aggregation over geographically distant sites.
Talk overview.
• Introduction: big science, big data, big problem.
• SRB: an example of a mature data management tool.
• Data management and distribution @ CC-IN2P3: a few examples in various fields (HEP, astrophysics, biomedical applications) using SRB.
• Data management elsewhere: some interesting data management applications in various areas.
• Pitfalls and challenges: having chosen the right architecture for your project is not the end of the game.
• Prospects.
The present situation.
• Large amounts of data produced by scientific projects.
• Order of magnitude right now: 100 TB, ~ PB, millions of records.
• In many fields:
– High Energy Physics (SLAC, Fermilab, CERN etc.).
– Astrophysics (simulation projects: Enzo, Terascale Supernova Initiative…, observational data: Eros, MACHO, 2MASS, USNO-B, SDSS, IVOA …).
– Earth sciences (Terashake, Terra …).
– Biology / Biomedical research (BIRN …).
Prospects for the future.
• Hard to tell but already some indications.
• Some examples (next decade):
– DOE Genomics, GTL program.
– Digital libraries for the US administration (NARA).
• Order of magnitude: ~ EB, trillions of records!
Amount of data and information exploding. Wider variety of actors: not only big science!
• Also true for the networking (next slide: source ESnet).
Science Areas | Today End2End Throughput | 5 years End2End Throughput | 5-10 Years End2End Throughput | Remarks
High Energy Physics | 0.5 Gb/s | 100 Gb/s | 1000 Gb/s | high bulk throughput
Climate (Data & Computation) | 0.5 Gb/s | 160-200 Gb/s | N x 1000 Gb/s | high bulk throughput
SNS NanoScience | Not yet started | 1 Gb/s | 1000 Gb/s + QoS for control channel | remote control and time critical throughput
Fusion Energy | 0.066 Gb/s (500 MB/s burst) | 0.198 Gb/s (500 MB/20 sec. burst) | N x 1000 Gb/s | time critical throughput
Astrophysics | 0.013 Gb/s (1 TBy/week) | N*N multicast | 1000 Gb/s | computational steering and collaborations
Genomics Data & Computation | 0.091 Gb/s (1 TBy/day) | 100s of users | 1000 Gb/s + QoS for control channel | high throughput and steering
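The sustained rates quoted in the table can be checked by converting a data volume per period into Gb/s. The sketch below is only an illustration: it assumes decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits), and the function name is made up for this example.

```python
def sustained_gbps(terabytes: float, seconds: float) -> float:
    """Average rate in Gb/s needed to move `terabytes` within `seconds`."""
    bits = terabytes * 1e12 * 8  # decimal terabytes converted to bits
    return bits / seconds / 1e9


DAY = 24 * 3600
WEEK = 7 * DAY

per_week = sustained_gbps(1, WEEK)  # astrophysics row: 1 TBy/week
per_day = sustained_gbps(1, DAY)    # genomics row: 1 TBy/day
print(f"1 TB/week = {per_week:.3f} Gb/s")
print(f"1 TB/day  = {per_day:.3f} Gb/s")
```

With these assumptions, 1 TB/week comes out at about 0.013 Gb/s, as in the table. The 1 TB/day figure comes out slightly above the quoted 0.091 Gb/s, which looks like 7 x 0.013 after rounding.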
Living in a digital world.
• Lots of science or digital library projects involving collaborators / users geographically spread.
• Large computing needs (both CPU and storage).
• Need for data backup.
• And / or need for data close to the users (replicas over different sites).
• Need for collaborative tools to exchange data.
Federate geographically distributed computing facilities.
The dawn of cyberinfrastructure (I).
• What is it?
« An infrastructure based on grids and on application-specific software, tools, and data repositories that support research in a particular discipline. »
• Why is it needed?
– Need to handle heterogeneous hardware.
– Need to handle heterogeneous OS.
– Need to handle heterogeneous storage devices.
– Need to handle various preservation policies across the distributed environment.
The dawn of cyberinfrastructure (II).
• Virtualization of the storage.
• Necessary in order to develop client applications transparent to the technology evolution of the underlying storage systems.
• Virtual organization:
– Access rights.
– Groups, domains handling: policies for data sharing.
– Preservation policies.
Requirements (I).
Infrastructure independence:
• Data virtualization:
– Management of name spaces independently of the storage repositories.
– Support for access operations independently of the storage repositories.
Authentication:
– Certificate: GSI etc…
– Challenge-response mechanism: no password sent over the network.
– Encrypted password.
– Ticket: valid for a given amount of time to access the virtual organization.
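To illustrate the challenge-response idea listed above, here is a minimal generic sketch based on an HMAC over a random nonce. It is not the protocol used by SRB or GSI; the function names and the sample password are hypothetical.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: a fresh random nonce for each login attempt."""
    return secrets.token_bytes(16)


def respond(password: str, challenge: bytes) -> str:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password.encode(), challenge, hashlib.sha256).hexdigest()


def verify(stored_password: str, challenge: bytes, response: str) -> bool:
    """Server side: recompute the expected response and compare."""
    expected = respond(stored_password, challenge)
    return hmac.compare_digest(expected, response)


challenge = make_challenge()
ok = verify("s3cret", challenge, respond("s3cret", challenge))
bad = verify("s3cret", challenge, respond("wrong", challenge))
print(ok, bad)
```

Only the nonce and the keyed digest travel over the network, and a new nonce per attempt means a captured response cannot simply be replayed.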
Requirements (II).
Data ownership / Authorization:
– Management of file ownership across multiple sites (partial or total decoupling between each site's organization and the virtual organization).
– Access Control Lists valid for the entire virtual organization across the physical domain (group, user etc… levels).
Requirements (III).
Data operations:
– File access:
• Open, close, read, write, stat…
• Audit, versions, pinning, checksums, synchronize etc…
• Parallel I/O, firewall interactions.
– Latency management:
• Bulk operations: register, load, unload, delete etc…
• Remote procedures: replicate, aggregate, file parsing, I/O requests (FITS, DICOM, HDF5 …).
– Metadata management:
• Annotations, metadata/auditing queries, interface with various information systems (schema extension of the core system).
What’s SRB?
• Storage Resource Broker: developed by SDSC (San Diego).
• Provides a uniform interface to heterogeneous storage systems (disk, tape, databases) for data distributed across multiple sites.
• Collaborative tool to share files.
• Who is using SRB?
– Biology, biomedical applications (e.g. BIRN).
– Astrophysics, Earth Sciences (e.g. NASA).
– Digital libraries (e.g. NARA).
• Used worldwide: USA, Europe, Asia, Australia.
SRB architecture.
• 1 zone:
– 1 SRB/MetaCatalog server: contains list of files, physical resources, users registered, etc…
– several SRB servers to access the data at their physical location.
[Figure: one SRB zone with three sites. An application asking for test1.txt connects to the SRB server at site 2; steps (1) to (4) involve the SRB MCAT server and the SRB server holding test1.txt.]
Some SRB features.
• Files organized in a logical name space with directories, subdirectories.
> /home/nief.ccin2p3   # dir
  /home/nief.ccin2p3 evs_g_isPhysicsEvents_aod004051   # on tape @ CC-IN2P3
  test1.txt   # on disk @ Merida
• Handling replicas.
• Search for the files based on their attributes instead of their physical name and location (site, storage type: disk, tape, databases).
Search by metadata « attached » to the files.
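The features above (logical name space, replicas, search by attached metadata) can be pictured with a toy catalog. This is a sketch of the idea only, not the MCAT schema or the SRB API: the class names, the physical paths, the metadata key and the second replica of test1.txt are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    site: str
    storage: str  # e.g. "disk" or "tape"
    physical_path: str


@dataclass
class Entry:
    replicas: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


class Catalog:
    """Toy metadata catalog: logical name -> replicas and attributes."""

    def __init__(self):
        self.entries = {}

    def register(self, logical_name, replica, **metadata):
        entry = self.entries.setdefault(logical_name, Entry())
        entry.replicas.append(replica)
        entry.metadata.update(metadata)

    def search(self, **criteria):
        """Return logical names whose metadata match every criterion."""
        return sorted(
            name for name, e in self.entries.items()
            if all(e.metadata.get(k) == v for k, v in criteria.items())
        )


cat = Catalog()
cat.register("/home/nief.ccin2p3/test1.txt",
             Replica("Merida", "disk", "/data/srb/0001"),
             experiment="demo")
cat.register("/home/nief.ccin2p3/test1.txt",
             Replica("CC-IN2P3", "tape", "/hpss/srb/0001"))
hits = cat.search(experiment="demo")
sites = [r.site for r in cat.entries[hits[0]].replicas]
print(hits, sites)
```

The client only ever handles the logical name; which site and storage type actually hold the bytes is resolved by the catalog.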
Users and ACLs management.
• Users belong to:
– 1 zone (e.g. IN2P3, Venezuela …).
– 1 domain (e.g. ccin2p3, Merida, Caracas).
– 1 or several groups.
• ACL on files and directories.
• Tickets:
– Rights given to temporary users for a limited amount of time.
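As a rough picture of time-limited tickets, the sketch below issues a random token with an expiry time and checks it on use. It is a generic illustration under stated assumptions, not SRB's ticket implementation; `TicketOffice` and its methods are hypothetical names.

```python
import secrets
import time


class TicketOffice:
    """Toy ticket store: each ticket grants access until it expires."""

    def __init__(self):
        self._expiry = {}  # ticket string -> expiry time (epoch seconds)

    def issue(self, lifetime_s, now=None):
        now = time.time() if now is None else now
        ticket = secrets.token_hex(8)
        self._expiry[ticket] = now + lifetime_s
        return ticket

    def is_valid(self, ticket, now=None):
        now = time.time() if now is None else now
        return ticket in self._expiry and now < self._expiry[ticket]


office = TicketOffice()
t = office.issue(lifetime_s=3600, now=0)
valid_early = office.is_valid(t, now=1800)   # inside the one-hour window
valid_late = office.is_valid(t, now=7200)    # after expiry
print(valid_early, valid_late)
```

The `now` parameter exists only to make the example deterministic; a real service would rely on the server clock.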
Storage.
• Mass Storage System (MSS):
– interface provided for HPSS, Castor and many other MSS.
– small files management (containers).
MSS usage (tapes etc…) transparent for the end user.
• Logical resources: set of physical resources.
– resource1: file system Unix @ IAP
– resource2: hpss file system @ CC-IN2P3
– resource3: file system Unix @ Merida
• Able to put a file in the 3 resources in one shot:
> Sput -S logical-res test1.txt <SRB filename>
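The logical-resource idea (one put, several physical copies) can be mimicked locally. In this sketch three temporary directories stand in for the three physical resources; it says nothing about how Sput is implemented, and the function and directory names are hypothetical.

```python
import os
import shutil
import tempfile


def put(local_file, logical_resource, name):
    """Copy `local_file` into every physical resource of the logical one."""
    written = []
    for directory in logical_resource:
        target = os.path.join(directory, name)
        shutil.copyfile(local_file, target)
        written.append(target)
    return written


# Three temporary directories stand in for the three physical resources.
root = tempfile.mkdtemp()
logical_res = []
for label in ("iap_unix_fs", "ccin2p3_hpss", "merida_unix_fs"):
    path = os.path.join(root, label)
    os.mkdir(path)
    logical_res.append(path)

source = os.path.join(root, "test1.txt")
with open(source, "w") as f:
    f.write("hello\n")

copies = put(source, logical_res, "test1.txt")
all_exist = all(os.path.exists(c) for c in copies)
print(len(copies), all_exist)
shutil.rmtree(root)  # clean up the stand-in resources
```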
Databases.
• Access to databases through SRB:
– Security: SRB server = proxy server. Database can be shielded from the outer world, control on the requests submitted to the database server.
– Duplication: very simple copy from a database at one site to another one (e.g. copy of tables from an Oracle db in Lyon to a MySQL db at site X in one shot).
• Schema extension:
– Possibility to link the SRB-MCAT with some other databases (search on SRB objects based on attributes stored in another db).
Interfaces, portability.
• Interfaces:
– Binary commands (Scommands).
– APIs: C, Java, Perl, Python.
– Web interface (mySRB).
– GUI client for Windows (inQ).
• Portability:
– Linux, Windows, Mac OS, Solaris and many more…
• Databases:
– Oracle, DB2, Sybase, PostgreSQL, Informix, mySQL…
Data management and distribution @ CC-IN2P3:
examples using SRB.
Who is using SRB @ CC-IN2P3?
In green = pre-production.
• High Energy Physics:
– BaBar (SLAC, Stanford).
– CMOS (International Linear Collider R&D).
– Calice (International Linear Collider R&D).
• Astroparticle:
– Edelweiss (Modane, France).
– Pierre Auger Observatory (Argentina).
• Astrophysics:
– SuperNovae Factory (Hawaii).
• Biomedical applications:
– Neuroscience research.
BaBar, SLAC & CC-IN2P3.
• BaBar: High Energy Physics experiment close to Stanford (California).
• SLAC and CC-IN2P3 were the first sites opened to BaBar collaborators for data analysis.
• Both held complete copies of the data (Objectivity).
• Now only SLAC holds a complete copy of the data.
• Natural candidates for testing and deployment of grid middleware.
• Data should be available within 24 to 48 hours.
• SRB: chosen for the distribution of hundreds of TBs of data.
SRB BaBar architecture.
[Figure: 2 zones (SLAC + Lyon). CC-IN2P3 (Lyon) and SLAC (Stanford, CA) each run SRB servers, an SRB MCAT server and an HPSS (HPSS/Lyon, HPSS/SLAC); the transfer steps are labelled (1), (2), (3).]
Extra details (BaBar).
• Hardware:
– SUN servers (Solaris 5.8, 5.9): NetraT 1405, V440.
• Software:
– Oracle 10g for the SLAC MCAT.
– Oracle 9i for the Lyon MCAT (migration to 10g foreseen).
• MCATs synchronization: only users and physical resources.
• Comparison of the MCATs contents to transfer the data.
• Steps (1), (2), (3) multithreaded under client control: very little latency.
• Advantage:
– External client can pick up data from SLAC or Lyon without interacting with the other site.
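The "comparison of the MCATs contents" step can be thought of as a catalog difference. The sketch below assumes each catalog is reduced to a mapping from logical file name to checksum; that representation, the file names and the checksum values are assumptions made for illustration, not the actual BaBar procedure.

```python
def files_to_transfer(source_catalog, destination_catalog):
    """Logical names present at the source but missing, or different
    by checksum, at the destination."""
    return sorted(
        name for name, checksum in source_catalog.items()
        if destination_catalog.get(name) != checksum
    )


# Hypothetical catalog extracts: logical name -> checksum.
slac = {
    "/babar/run1.root": "a1",
    "/babar/run2.root": "b2",
    "/babar/run3.root": "c3",
}
lyon = {
    "/babar/run1.root": "a1",
    "/babar/run2.root": "XX",  # differs from the source copy
}
pending = files_to_transfer(slac, lyon)
print(pending)
```

Here run2 (checksum mismatch) and run3 (missing) would be queued for transfer, while run1 is left alone.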
Overall assessment for BaBar.
• A lot of time saved in developing applications thanks to the SRB.
• Transparent access to data:
– Very useful in a hybrid environment (disk, tape).
– Easy to scale the service (adding new servers on the fly).
– Client application not dependent on changes of physical location.
• Fully automated procedure.
• Easy for SLAC to recover corrupted data.
• 270 TB (460,000 files) shipped to Lyon.
• Up to 3 TB / day from tape to tape (minimum latency).
• Going to 5 TB / day soon.
[Figure: ESnet traffic with one server on both sides (April 2004). Labels list site-to-site flows among Fermilab (US), CERN, SLAC (US), IN2P3 (FR), INFN Padova (IT), U. Chicago (US), CEBAF (US), U. Toronto (CA), Helmholtz-Karlsruhe (DE), DOE labs, JANET (UK), Argonne (US), Level3 (US) and SURFnet (NL), with a 1 Terabyte/day marker.]
Neuroscience research (P. Calvat).
[Figure: DICOM data flow from acquisition on the MRI scanner (Siemens MAGNETOM Sonata Maestro Class 1.5 T, Lyon hospital) through the console (Siemens Celsius Xeon, Windows NT) to the export PC (Dell PowerEdge 800), then on via FTP, file sharing, …]
Neuroscience research (II).
• Goal: make SRB invisible to the end user.
• More than 500,000 files registered.
• Now interfaced within the MATLAB environment:
– Data pushed where the CPUs are (CC-IN2P3, ENS Lyon).
• ~ 1.5 FTE for 3 months…
• Next step:
– Ever growing community (a few TBs / year): Strasbourg hospital to join the project (maybe Marseille, St Etienne…).
– Goal: Join the BIRN network (US biomedical network).
SuperNovae Factory.
• Telescope data stored into the SRB, processed in Lyon (almost online).
• Collaborative tool + backup (files exchanged between French and US users).
[Figure: a few GBs / day flow from the Hawaii telescope to SRB at CC-IN2P3 with HPSS/Lyon; an SRB with HPSS/NERSC at Berkeley is shown as a project.]
Neuroscience: BIRN (I).
• BIRN = Biomedical Informatics Research Network.
• Brain imaging (human, animals: mice, apes):
– fMRI etc…
• Data sharing and exchange of experimental data for each lab and project.
Neuroscience: BIRN (II).
• BIRN Coordination Center in San Diego:
– 1 rack (SRB server, database etc…) on each site.
– Administration centralized from the BIRN-CC: 24/7.
– Sharing software, APIs…
– 15 million files registered (16 TB), 360 users: file search on metadata over the entire sample (impressive!).
• Johns Hopkins Hospital: « done more in 6 months than in 18 years ».
• BIRN: 30 people at the first meeting (2001), 115 in Feb. 2005, more than 200 now: a success.
• Some sites already starting in Europe: Edinburgh, Manchester.
• Hoping for a French site in the near future.
ROADNet (UCSD).
• Real-time Observatories, Applications, and Data-Management Network.
• The Problem:
– Integrated real-time management of large, distributed, heterogeneous data streams from sensor networks.
– Sensors: Seismometers, Accelerometers, Displacement, Barometric pressure, Temperature, Wind Speed, Wind Direction, Infrasound, Hydroacoustic, Differential Pressure Gauges, Strain, Solar Insolation, pH, Electric Current, Electric Potential, Dilution of oxygen, Still Camera Images, Codar.
– Multidisciplinary project:
• Seismology.
• Oceanography.
• Hydrology.
• Meteorology.
• Etc…
ROADNet (UCSD).
[Figure: ROADNet data flow, including Datascope and archives / processing / review.]
It’s a grid for online studies, handling data streams.
(ORB= Online Ring Buffer)
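For readers unfamiliar with the term, a ring buffer keeps only the most recent items of a stream in fixed space. The sketch below shows the generic data structure; it is not the ROADNet ORB software, and the class name and sample readings are invented for the example.

```python
from collections import deque


class RingBuffer:
    """Fixed-capacity buffer: the newest packet evicts the oldest one."""

    def __init__(self, capacity):
        self._packets = deque(maxlen=capacity)

    def push(self, packet):
        self._packets.append(packet)

    def latest(self, n):
        """Return up to `n` most recent packets, oldest first."""
        if n <= 0:
            return []
        return list(self._packets)[-n:]


orb = RingBuffer(capacity=3)
for reading in [10.1, 10.4, 10.2, 10.9, 11.3]:
    orb.push(reading)
last_two = orb.latest(2)
everything = orb.latest(10)
print(last_two, everything)
```

After five pushes into a buffer of capacity three, only the last three readings remain, which is the behaviour that lets consumers of a live data stream read recent packets without unbounded storage.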
Potential pitfalls.
• To build a successful environment for data management and distribution over many sites:
– Good coordination and communication between the site administrators: « social » factor.
– Manpower: expertise needed in several areas (network, sys. admin. and database administration).
– Working in different time zones does not make things easy.
– Development of monitoring tools.
– Automatic recovery of the services in case of problems, to decrease downtime.
Hardware requirements.
• Network:
– % packet loss must be low.
– High latency network (Round Trip Time > 100 ms): potential show stopper.
– Duplication of information services (databases) should be considered (e.g. Belle grid extending in Australia, Japan, South Korea).
• Server hardware:
– Disk arrays quality: data corruption etc…
– Data duplication can be a show stopper in terms of budget.
– Database servers scaled correctly.
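One common reason why a high round trip time hurts bulk transfers is that a single TCP stream cannot send more than one window of data per round trip. The sketch below works out that bound; the 64 KiB window is an assumed value chosen for illustration, and the slide itself does not say which effect the author had in mind.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP stream: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


WINDOW = 64 * 1024  # assumed window size in bytes
for rtt in (10, 100, 200):
    rate = max_tcp_throughput_mbps(WINDOW, rtt)
    print(f"RTT {rtt:3d} ms -> at most {rate:.1f} Mb/s per stream")
```

Under this assumption a 100 ms link is limited to roughly 5 Mb/s per stream, which is why larger windows or several parallel streams are usually needed on intercontinental paths.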
Other requirements.
• Data integrity (checksum).
• Backup policy in order to prevent data loss.
• Scalability of the middleware.
• Middleware must be multi OS.
• Fault tolerance of the system.
• Compatibility of the client application version as the middleware evolves: prevent tough and painful migration to newer versions.
• Middleware must be as transparent as possible to hardware, databases etc… evolution.
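The data integrity requirement in the list above is typically met by comparing checksums of the original and of each copy. A minimal sketch, assuming SHA-256 and local temporary files standing in for the two sites (the file names are hypothetical):

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.dat")
    replica = os.path.join(tmp, "replica.dat")
    for path in (original, replica):
        with open(path, "wb") as f:
            f.write(b"event data" * 1000)
    same = file_checksum(original) == file_checksum(replica)

    with open(replica, "ab") as f:
        f.write(b"\x00")  # simulate corruption of the replica
    corrupted = file_checksum(original) != file_checksum(replica)

print(same, corrupted)
```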
Challenges.
• Is a grid environment always the solution?
• Not sure !!!
• Cost in terms of:
– Hardware.
– Networking.
– Manpower (more duplicated sites, more data, more admins).
can be prohibitive.
Summary and outlook (I).
• Middleware needed for efficient data management over multiple sites.
• Scalability might be an issue in the future for the information systems (databases) linked to this middleware:
– Inflation of metadata.
– Inflation of files.
Web services: not sure that it should not be at the centre of data distribution.
• Economic and manpower costs often neglected.
Summary and outlook (II).
• SRB: a very good candidate.
Is there a real competitor at the moment?
• RODS (Rule Oriented Data management System):
– Replacement of SRB (open source).
– Compatible with SRB (SRB client application could connect to a RODS server).
– SDSC leading the project.
– CC-IN2P3: one of a few partners that will be involved in the first step.
Acknowledgement.
Many thanks to:
– Reagan Moore and his team (SDSC, USA).
– Adil Hasan (CCLRC-RAL, UK).
– Wilko Kroeger (SLAC, USA).
– Pascal Calvat (CC-IN2P3, France).
BaBar: http://www.slac.stanford.edu/BFROOT/
Belle: http://belle.kek.jp/
BIRN: http://www.nbirn.net/
CC-IN2P3: http://cc.in2p3.fr/
ESnet: http://www.es.net/
ROADNet: http://eqinfo.ucsd.edu/projects/roadnet/index.html
SRB: http://www.sdsc.edu/srb/index.php/Main_Page