Grid Data Management
Week #4
Basics of Grid and Cloud computing
Hardi Teder [email protected]
University of Tartu
March 6th 2013
Basics of Grid and Cloud computing 2/33
Overview
● Grid Data Management
● Where does the data come from?
● Grid Data Management tools
Grid foundations
Where does the data come from?
● CERN's LHC CMS experiment as an example
  ● CERN – European Organization for Nuclear Research
  ● LHC – Large Hadron Collider
  ● CMS – Compact Muon Solenoid
Grid acronyms
● EGI Glossary
  ● http://www.egi.eu/about/glossary/
● Google search helps
● EGI Security Policy Glossary of Terms
  ● https://documents.egi.eu/public/ShowDocument?docid=71
Large Hadron Collider (LHC)
Smash things together, see what happens!
Discover particles
● Quarks: up, down, charm, strange, top, bottom
● Leptons: electron, muon, tau, electron neutrino, muon neutrino, tau neutrino
Large Hadron Collider (LHC)
CMS detector
● Took ~2000 scientists and engineers more than 20 years to design and build
● Is about 15 metres wide and 21.5 metres long
● Weighs twice as much as the Eiffel Tower, about 14,000 t
● Uses the largest, most powerful magnet of its kind ever made
Collisions in CMS
CMS in production
● Volume: ~250 TB/day among dozens of Tiers
● Number of files: ~19M logical files (but total of replicas so far is ~27M)
● Throughput: 2-2.5 GB/s aggregate (weekly averages) in peak weeks in 2012
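As a rough cross-check of the figures above, the daily volume can be converted into an average transfer rate. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and that the volume is spread evenly over 24 hours; neither assumption is stated on the slide.

```shell
#!/bin/bash
# Convert the quoted daily volume into an average rate in GB/s.
# Assumption: decimal units and an even spread over the day.
tb_per_day=250
seconds_per_day=86400

avg_gb_per_s=$(LC_ALL=C awk -v tb="$tb_per_day" -v s="$seconds_per_day" \
    'BEGIN { printf "%.2f", tb * 1e12 / s / 1e9 }')

echo "Average rate: ${avg_gb_per_s} GB/s"
```

The result lands a little under 3 GB/s, the same order of magnitude as the 2-2.5 GB/s weekly averages quoted for peak weeks.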
Worldwide LHC Computing Grid (WLCG)
● Tier0 at CERN
● 11 Tier1 sites
● 138 Tier2 sites
WLCG
● 15 petabytes of data generated annually
There are more projects
● DNA experiments
● Radio telescopes
● Sensor networks
● Digitization of data: books, documents, images
Grid foundations
Data management
● Data access and transfer
  – Simple, automatic multi-protocol file transfer tools, integrated with the Resource Management service
    ● Move data from the local machine to the remote machine where the job is executed (input file staging)
    ● Move the output files from the remote computer to the local machine (output file staging)
    ● Pull the executable from a remote location
  – Secure, high-performance, reliable file transfer over modern WANs: GridFTP
● Data replication and management
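The staging steps listed above can be sketched as a small script. This is only an illustration: two local directories stand in for the local and remote machines, plain cp stands in for the transfer tool, and all file names are hypothetical. On a real Grid the middleware performs these transfers.

```shell
#!/bin/bash
# Sketch of input and output file staging around a job.
# Local directories play the roles of the two machines (an assumption
# made so the example can run anywhere).
work=$(mktemp -d)
local_dir="$work/local"      # the user's machine
remote_dir="$work/remote"    # the execution site
mkdir -p "$local_dir" "$remote_dir"
echo "3 1 2" > "$local_dir/input.txt"

# 1. Input file staging: copy the input to where the job runs.
cp "$local_dir/input.txt" "$remote_dir/input.txt"

# 2. The "job": sort the numbers at the execution site.
( cd "$remote_dir" && tr ' ' '\n' < input.txt | sort -n > output.txt )

# 3. Output file staging: bring the result back to the user's machine.
cp "$remote_dir/output.txt" "$local_dir/output.txt"

staged_output=$(paste -sd ' ' "$local_dir/output.txt")
echo "Staged output: $staged_output"
```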
ARC Computing Element (CE)
● Universal frontend for different batch systems
● Standard and custom interfaces
● Status information publishing
● File handling
ARC CE and data handling
● Data are moved by the users and/or by ARC
● Frequently used files are cached at the execution sites
● Cached files are indexed
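The idea of caching frequently used files and indexing them can be illustrated with a toy sketch. It is not how ARC implements its cache: the cksum-based index key, the simulated fetch_remote function, and the input.dat path are all assumptions made for illustration.

```shell
#!/bin/bash
# Toy download cache indexed by URL (not ARC's implementation).
cache_dir=$(mktemp -d)

# Simulated transfer; a real setup would call a transfer tool here.
fetch_remote() {
    echo "contents of $1"
}

# Print "hit" if the URL is already cached, otherwise fetch it and print "miss".
cached_fetch() {
    url=$1
    key=$(printf '%s' "$url" | cksum | cut -d ' ' -f 1)   # index key derived from the URL
    if [ -f "$cache_dir/$key" ]; then
        echo "hit"
    else
        fetch_remote "$url" > "$cache_dir/$key"
        echo "miss"
    fi
}

input_url="gsiftp://se.grid.eenet.ee/storage/balticgrid/BGCC2013/input.dat"
first=$(cached_fetch "$input_url")
second=$(cached_fetch "$input_url")
echo "first=$first second=$second"
```

The second request for the same URL is served from the cache directory, so a job that reuses the same input does not trigger another transfer.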
ARC CE internals
● All services are only in the frontend
● Grid users are mapped to local identities
● Use /tmp/user for files which are actively used
ARC UI data manipulation
● arcls – lists the contents of a remote directory specified by a URL and shows some attributes of its objects
● arccp – copies files over the Grid
● arcrm – erases files and directories at any location specified by a valid URL
● arcmkdir – creates directories, if the protocol of the specified URL supports it
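A typical session with these four tools might look like the dry-run sketch below. The storage host is the one used elsewhere in these slides; the "demo" directory, the results.tgz file, and the run/session helpers are hypothetical. By default the script only prints the commands, so it can be read and tried without a Grid account.

```shell
#!/bin/bash
# Dry-run sketch of an arc* data session.
# The "demo" directory and "results.tgz" are made-up names.
DRY_RUN=${DRY_RUN:-1}
base="gsiftp://se.grid.eenet.ee/storage/balticgrid/BGCC2013/demo"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"     # only print the command
    else
        "$@"            # execute it (needs the ARC client tools)
    fi
}

session() {
    run arcmkdir "$base/"                       # create the remote directory
    run arccp results.tgz "$base/results.tgz"   # upload a local file
    run arcls -l "$base/"                       # list it with attributes
    run arcrm "$base/results.tgz"               # remove it again
}

session
```

Setting DRY_RUN=0 would execute the commands for real, which presumably requires installed ARC client tools and valid Grid credentials.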
ARC URLs
● ftp – ordinary File Transfer Protocol (FTP)
● gsiftp – GridFTP, the Globus-enhanced FTP protocol with security, encryption, etc., developed by The Globus Alliance
● http – ordinary Hyper-Text Transfer Protocol (HTTP) with PUT and GET methods using multiple streams
● https – HTTP with SSL v3
● httpg – HTTP with Globus GSI
● ldap – ordinary Lightweight Directory Access Protocol (LDAP) [9]
● lfc – LFC catalog and indexing service of gLite [1]
● srm – Storage Resource Manager (SRM) service [7]
● root – Xrootd protocol (read-only, available in ARC 2.0.0 and later)
● file – file name local to the host, given with a full path
A URL can be used:
● In standard form:
  ● protocol://[host[:port]]/file
● Or with options, to enhance performance:
  ● protocol://[host[:port]][;option[;option[...]]]/file
  ● protocol://[url[|url[...]]@]host[:port][;option[;option[...]]]/lfn[:metadataoption[:metadataoption[...]]]
  ● protocol://[;commonoption[;commonoption]|][url[|url[...]]@]host[:port][;option[;option[...]]]/lfn[:metadataoption[:metadataoption[...]]]
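To make the standard form concrete, the following sketch splits a URL into protocol, host, port, and path using plain shell parameter expansion. parse_url is a hypothetical helper written for this example, not an ARC tool, and it deliberately rejects URLs that carry ";option" fields.

```shell
#!/bin/bash
# Split a URL of the form protocol://[host[:port]]/file into its parts.
# Results are left in url_protocol, url_host, url_port and url_path.
parse_url() {
    url=$1
    case $url in
        *://*/*) ;;              # needs "://" and a path
        *) return 1 ;;
    esac
    url_protocol=${url%%://*}
    rest=${url#*://}             # host[:port]/file
    hostport=${rest%%/*}
    url_path=/${rest#*/}
    case $hostport in
        *\;*) return 1 ;;        # options are not handled in this sketch
        *:*)  url_host=${hostport%%:*}; url_port=${hostport##*:} ;;
        *)    url_host=$hostport; url_port= ;;
    esac
    return 0
}

parse_url "gsiftp://lscf.nbi.dk:2811/jobs/1323842831451666535/job.out"
echo "protocol=$url_protocol host=$url_host port=$url_port path=$url_path"
```

Both host and port are optional in the standard form, so a URL such as file:///tmp/data.txt parses with an empty host and an empty port.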
URL examples
● ARC UI
  ● arcls lfc://lfc.balticgrid.org/grid/balticgrid/BGCC2013/Lab4/
  ● arcls -l gsiftp://se.grid.eenet.ee/storage/balticgrid/BGCC2013
● xRSL
  ● To store the job output on storage:
  ● (outputFiles=("jobHugeOutputFile.tgz" "gsiftp://se.grid.eenet.ee/storage/balticgrid/BGCC2013/user/"))
GridFTP
● The GSIFTP protocol offers the functionality of FTP, with added support for GSI.
● Supported by all VOs in the Grid
● arccp gsiftp://lscf.nbi.dk:2811/jobs/1323842831451666535/job.out job.out
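A gsiftp URL can also be the destination of a job's output, through the xRSL outputFiles attribute shown in the URL examples. The sketch below writes a minimal job description to a temporary file. The script name make_output.sh and the job name are hypothetical; the storage URL is the one from the slides.

```shell
#!/bin/bash
# Write a minimal xRSL job description whose output goes to GridFTP storage.
# "make_output.sh" and "huge-output-demo" are made-up names.
xrsl_file=$(mktemp)
cat > "$xrsl_file" <<'EOF'
&(executable="make_output.sh")
 (inputFiles=("make_output.sh" ""))
 (jobName="huge-output-demo")
 (stdout="stdout.txt")
 (outputFiles=("jobHugeOutputFile.tgz"
   "gsiftp://se.grid.eenet.ee/storage/balticgrid/BGCC2013/user/"))
EOF

cat "$xrsl_file"
```

Such a file would typically be submitted with arcsub. The job itself is expected to produce jobHugeOutputFile.tgz in its working directory.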
File Catalogue (LFC)
● Users and applications need to locate files (or replicas) on the Grid.
● The File Catalogue is the service that maintains the mappings between logical file names (LFNs), the GUID, and storage URLs (SURLs).
● lfc://lfc.balticgrid.org/grid/balticgrid/BGCC2013/Lab4/P4_data.test
● lfc:P4_data.test
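The mapping idea can be illustrated with a toy catalogue kept in a text file, where each line ties an LFN and its GUID to one replica. This is only a sketch of the concept, not the LFC data model: the GUID values, the example.org storage hosts, the other.dat entry, and the replicas_for helper are all invented for illustration.

```shell
#!/bin/bash
# Toy file catalogue: one line per replica, "LFN GUID SURL".
# GUIDs and storage hosts are made up.
catalogue=$(mktemp)
cat > "$catalogue" <<'EOF'
/grid/balticgrid/BGCC2013/Lab4/P4_data.test guid-0001 srm://se1.example.org/data/f1
/grid/balticgrid/BGCC2013/Lab4/P4_data.test guid-0001 srm://se2.example.org/data/f1
/grid/balticgrid/BGCC2013/Lab4/other.dat guid-0002 srm://se1.example.org/data/f2
EOF

# Print every replica (SURL) registered for the given LFN.
replicas_for() {
    awk -v lfn="$1" '$1 == lfn { print $3 }' "$catalogue"
}

replicas_for /grid/balticgrid/BGCC2013/Lab4/P4_data.test
```

One logical name resolves to two physical replicas here, which is the lookup a user or application needs before it can pick a copy to read.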
Relationships between tables
LFC environment
#!/bin/bash
# Information system, catalogue type and LFC server to use
export LCG_GFAL_INFOSYS=bdii.balticgrid.org:2170
export LCG_CATALOG_TYPE=lfc
export LFC_HOST=lfc.balticgrid.org
# Print the variables that were just set
echo -e 'Printing variables: LCG_GFAL_INFOSYS; LCG_CATALOG_TYPE; LFC_HOST \n'
echo "$LCG_GFAL_INFOSYS"; echo "$LCG_CATALOG_TYPE"; echo "$LFC_HOST"
# Home directory in the catalogue
export LFC_HOME=/grid/balticgrid/BGCC2012/Hardi_Teder
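Before running catalogue commands it can help to confirm that the variables above are really set. The guard below is a minimal sketch; require_vars is a hypothetical helper, not part of the LFC tools.

```shell
#!/bin/bash
# Fail early if any of the named environment variables is empty or unset.
require_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"      # read the variable whose name is in $name
        if [ -z "$value" ]; then
            echo "missing variable: $name" >&2
            return 1
        fi
    done
    return 0
}

export LCG_GFAL_INFOSYS=bdii.balticgrid.org:2170
export LCG_CATALOG_TYPE=lfc
export LFC_HOST=lfc.balticgrid.org

if require_vars LCG_GFAL_INFOSYS LCG_CATALOG_TYPE LFC_HOST; then
    echo "LFC environment looks complete"
fi
```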
Clean up after yourself
● Delete the files you don't use any more
References
● I used several pictures from:
  – CMS experiment public presentations: http://cms.web.cern.ch/org/cms-presentations-public
  – NorduGrid repository: http://svn.nordugrid.org/trac/nordugrid/browser/doc/trunk/figures
  – FREEIMAGES.co.uk: www.freeimages.co.uk
● More information about ARC Data Management:
  – http://www.nordugrid.org/papers.html
Thank you
● More information from:
  ● Hardi Teder [email protected]
  ● http://courses.cs.ut.ee/2013/cloud