PhEDEx Overview for CMS data operators
Natalia Ratnikova
Fermilab, WH1E
22 November 2016
Introduction
• Physics Experiment Data Export (PhEDEx): central component of the CMS Data Management System, responsible for data location and placement
  – created in 2004, has undergone significant evolution over time
  – uses grid tools (FTS, SRM, etc.) to transfer files according to the CMS data placement policy
  – manages CMS data at CMS grid sites: currently 150 nodes are registered in the CMS production instance
  – uses a debug instance for load-test transfers between sites to ensure the quality of “links”
  – provides tools for verifying data consistency, statistics, and monitoring
  – uses a clever routing algorithm, adjustable workload, and more
11/22/2016 | N. Ratnikova | PhEDEx overview for CMS data operators | 2
CMS Data
• Event data in files
  – average file size reasonably large: ~2.5 GB
  – output merged to help scaling in catalogs and storage
• Files are grouped in file blocks to manage them in bulk
  – ~10–1000 files/block
• File blocks are grouped by physics content in datasets of variable size (0.1–100 TB)
~10^10 events/year, ~6×10^7 distinct files in 2016
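The file, block, and dataset grouping described on this slide can be sketched as a small data model. This is only an illustration in Python, not PhEDEx code (PhEDEx itself is written in Perl); the class names, the dataset name, and the file paths are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

GB = 10**9  # decimal gigabyte, an assumption made for this sketch


@dataclass
class File:
    lfn: str         # logical file name
    size_bytes: int


@dataclass
class Block:
    name: str
    files: List[File] = field(default_factory=list)

    @property
    def size_bytes(self) -> int:
        # A block is managed in bulk, so its size is the sum of its files.
        return sum(f.size_bytes for f in self.files)


@dataclass
class Dataset:
    name: str
    blocks: List[Block] = field(default_factory=list)

    @property
    def n_files(self) -> int:
        return sum(len(b.files) for b in self.blocks)

    @property
    def size_bytes(self) -> int:
        return sum(b.size_bytes for b in self.blocks)


def build_example() -> Dataset:
    """Build a hypothetical dataset: 4 blocks of 100 files, 2.5 GB each."""
    dataset = Dataset("/Hypothetical/Example-v1/AOD")
    for b in range(4):
        block = Block(f"{dataset.name}#block{b}")
        for i in range(100):
            lfn = f"/store/example/block{b}/file{i}.root"
            block.files.append(File(lfn, 25 * GB // 10))
        dataset.blocks.append(block)
    return dataset
```

With these illustrative numbers the dataset holds 400 files and 1 TB in total, which falls inside the 0.1–100 TB range quoted above.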
PhEDEx components
• Transfer Management Database at CERN (Oracle)
• Site agents: set of Perl daemons running at every site, each performing a particular local data management task
  – file download, delete, stage from MSS, export for outbound transfer, verify
• Central agents managing PhEDEx workflows and infrastructure
  – transfer requests, data routing, bookkeeping, monitoring, and other central activities
• PhEDEx web site: set of interactive web applications to control and monitor the PhEDEx system
  – new implementation uses a combination of Perl + JavaScript for more interactive features
PhEDEx workflow
PhEDEx transfer workflow
• Central PhEDEx agents are middleware-agnostic
• Site agents are integrated through plugins with WLCG DM middleware (e.g. FTS or SRM) to execute transfers
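The split between middleware-agnostic logic and middleware-specific plugins can be illustrated with a minimal sketch. It is written in Python for brevity and is not the PhEDEx plugin interface: `TransferBackend`, `FakeFTSBackend`, `FakeSRMBackend`, and `DownloadAgent` are hypothetical names, and the backends are stubs that only record what they were asked to do instead of calling real FTS or SRM clients.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple, Type

Transfer = Tuple[str, str]  # (source PFN, destination PFN)


class TransferBackend(ABC):
    """Hypothetical plugin interface: the only thing the agent knows about."""

    prefix = "job"

    def __init__(self) -> None:
        self.jobs: Dict[str, List[Transfer]] = {}

    def _record(self, transfers: List[Transfer]) -> str:
        job_id = f"{self.prefix}-{len(self.jobs) + 1}"
        self.jobs[job_id] = list(transfers)
        return job_id

    @abstractmethod
    def submit(self, transfers: List[Transfer]) -> str:
        """Hand a batch of transfers to the middleware and return a job id."""


class FakeFTSBackend(TransferBackend):
    prefix = "fts"

    def submit(self, transfers: List[Transfer]) -> str:
        # A real plugin would submit a job to an FTS server here.
        return self._record(transfers)


class FakeSRMBackend(TransferBackend):
    prefix = "srm"

    def submit(self, transfers: List[Transfer]) -> str:
        # A real plugin would drive an SRM copy here.
        return self._record(transfers)


BACKENDS: Dict[str, Type[TransferBackend]] = {
    "fts": FakeFTSBackend,
    "srm": FakeSRMBackend,
}


class DownloadAgent:
    """Site-side agent: picks a backend by name, stays middleware-agnostic."""

    def __init__(self, backend_name: str) -> None:
        self.backend = BACKENDS[backend_name]()

    def process(self, queue: List[Transfer]) -> str:
        return self.backend.submit(queue)
```

Under this design, switching a site from one middleware to another changes only the backend name passed to the agent, while the queue handling stays the same.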
PhEDEx building blocks
PhEDEx code has been refactored to provide generalized solutions and facilitate the implementation of new features:
• Core agent framework
  – provides a base Agent class and a set of modules for common functions: SQL statements, LFN to PFN conversions, etc.
• Namespace framework
  – provides an interface to various storage types (dCache, DPM, EOS, Castor, posix) to access file properties for consistency checks
• Data service framework and the website
  – implement the web site frontend and the APIs to access the PhEDEx database
• LifeCycle agent framework
  – allows simulating the full life cycle of the PhEDEx system, generating the workload; useful for performance and scalability tests, debugging, and validation
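Two of the functions named above, LFN to PFN conversion and file-property access for consistency checks, can be sketched together. This is a simplified Python illustration under stated assumptions, not the PhEDEx implementation: the rule format (a regular expression with one capture group plus a path prefix), the status strings, and all function names are hypothetical, and only a posix namespace is covered.

```python
import os
import re
import tempfile
from typing import Dict, List, Optional, Tuple

# Hypothetical rule: (regex with one capture group, physical path prefix).
Rule = Tuple[str, str]


def lfn_to_pfn(lfn: str, rules: List[Rule]) -> Optional[str]:
    """Map a logical file name to a physical one using the first matching rule."""
    for pattern, prefix in rules:
        match = re.fullmatch(pattern, lfn)
        if match:
            return prefix + match.group(1)
    return None


def check_consistency(expected_sizes: Dict[str, int],
                      rules: List[Rule]) -> Dict[str, str]:
    """Compare catalog sizes against what a posix namespace reports."""
    report: Dict[str, str] = {}
    for lfn, expected in expected_sizes.items():
        pfn = lfn_to_pfn(lfn, rules)
        if pfn is None:
            report[lfn] = "no_rule"
        elif not os.path.isfile(pfn):
            report[lfn] = "missing"
        elif os.path.getsize(pfn) != expected:
            report[lfn] = "size_mismatch"
        else:
            report[lfn] = "ok"
    return report


def demo() -> Dict[str, str]:
    """Run the check against a throwaway directory standing in for site storage."""
    with tempfile.TemporaryDirectory() as storage_root:
        with open(os.path.join(storage_root, "a.root"), "wb") as handle:
            handle.write(b"12345")   # 5 bytes, matches the catalog entry
        with open(os.path.join(storage_root, "c.root"), "wb") as handle:
            handle.write(b"123")     # 3 bytes, catalog entry says 4
        rules = [(r"/store/(.+)", storage_root + os.sep)]
        expected = {
            "/store/a.root": 5,
            "/store/b.root": 7,      # never written to storage
            "/store/c.root": 4,
            "/other/d.root": 1,      # no rule covers this LFN
        }
        return check_consistency(expected, rules)
```

Other storage types (dCache, DPM, EOS, Castor) would need their own way of listing file properties; keeping that behind one lookup function is the point of a namespace layer.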
Recent developments
• Maintenance of the existing code
  – port to new systems, external upgrades
• Additional features for operational needs:
  – integrate FTS 3 support
  – automate file invalidation requests
  – process consistency check results
• Network-aware applications
• Metrics for latency and popularity analytics
Screen shot of general overview of PhEDEx web page
Screen shot of file sizes breakdown and stats
As of Jan 6, 2016:
Total files: 44 763 533
Total data size: 115.27 PB
Today’s numbers:
Total files: 64 363 757
Total data size: 165.27 PB
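As a quick cross-check of the two snapshots, the average file size and the growth between them can be computed directly. The sketch below assumes decimal units (1 PB = 10^6 GB), which the slides do not state; the variable and function names are illustrative.

```python
PB_IN_GB = 10**6  # assumption: decimal units, 1 PB = 10^6 GB

# (total files, total data size in PB), copied from the two snapshots
jan_2016 = (44_763_533, 115.27)
today = (64_363_757, 165.27)


def average_file_size_gb(n_files: int, total_pb: float) -> float:
    """Average file size in GB for one snapshot."""
    return total_pb * PB_IN_GB / n_files


avg_jan = average_file_size_gb(*jan_2016)
avg_today = average_file_size_gb(*today)

added_files = today[0] - jan_2016[0]
added_pb = today[1] - jan_2016[1]

print(f"average file size, Jan 2016: {avg_jan:.2f} GB")
print(f"average file size, today:    {avg_today:.2f} GB")
print(f"growth: {added_files} files, {added_pb:.2f} PB")
```

Under that assumption both averages come out a little above 2.5 GB, in line with the average file size quoted earlier in the talk.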