ATLAS Production
Kaushik De
University of Texas at Arlington
LHC Computing Workshop, Ankara
May 2, 2008
Outline
- Computing Grids
- Tiers of ATLAS
- PanDA Production System
- MC Production Statistics
- Distributed Data Management
- Operations Shifts

Thanks to T. Maeno (BNL), R. Rocha (CERN), and J. Shank (BU) for some of the slides presented here.
EGEE Grid
+ NorduGrid: NDGF (Nordic countries)
OSG – US Grid
Tiers of ATLAS
- 10 Tier 1 centers: Canada, France, Germany, Italy, Netherlands, Nordic Countries, Spain, Taipei, UK, USA
- ~35 Tier 2 centers: Australia, Austria, Canada, China, Czech R., France, Germany, Israel, Italy, Japan, Poland, Portugal, Romania, Russian Fed., Slovenia, Spain, Switzerland, Taipei, UK, USA
- ? Tier 3 centers
Tiered Example – US Cloud
- BNL T1
- MW T2: UC, IU
- NE T2: BU, HU
- SW T2: UTA, OU
- GL T2: UM, MSU
- SLAC T2
- Tier 3s: IU OSG, UC Teraport, UTA DPCC, LTU, UTD, OU OSCER, SMU, Wisconsin
Data Flow in ATLAS
Storage Estimate
Production System Overview
[Diagram: tasks requested by the Physics Working Groups are approved and entered in ProdDB; Panda sends the jobs to the clouds (CA, DE, ES, FR, IT, NL, TW, UK, US; NDGF coming) and the output files are registered in DQ2.]
Panda
- PanDA = Production ANd Distributed Analysis system
  - Designed for analysis as well as production
  - Project started Aug 2005, prototype Sep 2005, in production Dec 2005
  - Works with both OSG and EGEE middleware
- A single task queue and pilots
  - Apache-based central server
  - Pilots retrieve jobs from the server as soon as CPU is available → low latency
- Highly automated, has an integrated monitoring system, and requires little operations manpower
- Integrated with the ATLAS Distributed Data Management (DDM) system
- Not exclusively ATLAS: has its first OSG user, CHARMM
Cloud
[Diagram: within a cloud, Panda sends jobs to the Tier 1 and the Tier 2s; input files are dispatched from Tier 1 storage to Tier 2 storage, and output files are aggregated back to Tier 1 storage.]
Panda/Bamboo System Overview
[Diagram: Bamboo pulls jobs from ProdDB and submits them to the Panda server; end-users also submit jobs directly over https. Autopilot submits pilots to sites A and B via condor-g; the pilots run on the worker nodes and pull jobs from the Panda server over https. The server sends logs to the logger over http and talks to LRC/LFC and DQ2.]
Panda server
[Diagram: clients and pilots talk https to the Panda server (Apache + gridsite) backed by PandaDB; the server also connects to the logger, LRC/LFC, and DQ2.]
- Central queue for all kinds of jobs
- Assign jobs to sites (brokerage, sketched below)
- Set up input/output datasets
  - Create them when jobs are submitted
  - Add files to output datasets when jobs are finished
- Dispatch jobs
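Brokerage can be pictured as a filter-and-score pass over the candidate sites. Below is a minimal sketch with invented `Site` fields; the real Panda brokerage also weighs data locality, pilot rates, and task priorities.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    releases: set            # ATLAS software releases installed at the site
    queued: int = 0          # jobs assigned and waiting for pilots
    running: int = 0         # jobs currently running

def broker(job_release, sites):
    """Pick a site that has the release and the emptiest relative queue."""
    candidates = [s for s in sites if job_release in s.releases]
    if not candidates:
        raise LookupError("no site provides release " + job_release)
    # Fewer queued jobs per running job means a pilot should pick the
    # new job up sooner (the low latency of the earlier Panda slide).
    return min(candidates, key=lambda s: s.queued / max(s.running, 1))
```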
Bamboo
[Diagram: a cron triggers Bamboo, which reads prodDB via cx_Oracle and talks to the Panda server (Apache + gridsite) over https.]
- Get jobs from prodDB and submit them to Panda
- Update job status in prodDB
- Assign tasks to clouds dynamically
- Kill TOBEABORTED jobs
- A cron triggers the above procedures every 10 min (see the sketch below)
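As a rough picture of that cycle, a sketch with hypothetical `proddb`/`panda` handles standing in for the cx_Oracle and https calls:

```python
import time

def bamboo_cycle(proddb, panda):
    """One pass over the four Bamboo duties listed above."""
    for job in proddb.fetch_new_jobs():           # get jobs from prodDB...
        panda.submit(job)                         # ...and submit them to Panda
    for job in panda.jobs_with_new_status():
        proddb.update_status(job)                 # update job status in prodDB
    panda.assign_tasks_to_clouds()                # dynamic task-to-cloud assignment
    for job in proddb.jobs_in_state("TOBEABORTED"):
        panda.kill(job)                           # kill TOBEABORTED jobs

def run_forever(proddb, panda, period=600):
    """Stands in for the cron: trigger the procedures every 10 min."""
    while True:
        bamboo_cycle(proddb, panda)
        time.sleep(period)
```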
Client-Server Communication
- HTTP/S-based communication (curl + grid proxy + python)
- GSI authentication via mod_gridsite
- Most communications are asynchronous
  - The Panda server spawns python threads as soon as it receives HTTP requests and sends the responses back immediately; the threads do the heavy procedures (e.g., DB access) in the background → better throughput
- Several are synchronous
[Diagram: the client serializes a Python object with cPickle and sends it over HTTPS (x-www-form-urlencoded); on the server, the request passes through mod_deflate and mod_python to the UserIF, and the response is deserialized with cPickle back into a Python object.]
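A minimal sketch of that exchange, written in modern Python for brevity (the 2008 client used curl with a grid proxy): the request object is pickled, sent as a url-encoded form field over HTTPS, and the pickled response is decoded back into a Python object. The URL and field name are illustrative, not the real Panda interface.

```python
import pickle
import urllib.parse
import urllib.request

SERVER = "https://pandaserver.example.org:25443/server/panda"  # hypothetical URL

def call(method, obj):
    """Serialize -> HTTPS (x-www-form-urlencoded) -> deserialize."""
    # hex-encode the pickle so the form field stays ASCII-safe
    form = urllib.parse.urlencode({"obj": pickle.dumps(obj).hex()}).encode()
    # GSI authentication: in the real system the grid proxy supplies the
    # client certificate, which mod_gridsite checks on the server side.
    with urllib.request.urlopen(SERVER + "/" + method, data=form) as resp:
        return pickle.loads(bytes.fromhex(resp.read().decode()))
```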
Data Transfer
- Rely on ATLAS DDM
  - Panda sends requests to DDM; DDM moves the files and sends notifications back to Panda
  - Panda and DDM work asynchronously
- Dispatch input files to T2s and aggregate output files to the T1
- Jobs get 'activated' when all their input files have been copied, and pilots pick them up
  - Pilots don't have to wait for data arrival on the WNs
  - Data transfer and job execution can run in parallel
[Diagram: the submitter submits a job to Panda; Panda subscribes a T2 to the dispatch dataset in DQ2; the data transfer runs and DQ2 calls back to Panda; a pilot gets and runs the job; when the job finishes, Panda adds its files to the destination datasets and DQ2 transfers them onward.]
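The activation rule itself fits in a few lines; a sketch with invented helper names, not the real Panda/DQ2 interfaces:

```python
def on_dq2_callback(db, dispatch_dataset):
    """DQ2 calls back when a dispatch dataset has arrived at the T2."""
    for job in db.jobs_waiting_on(dispatch_dataset):
        if all(db.is_transferred(f) for f in job.input_files):
            job.status = "activated"          # pilots may now pick it up

def next_job_for_pilot(db, site):
    """A pilot's job request only ever sees activated jobs at its site."""
    return db.next_job(status="activated", site=site)
```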
Pilot and Autopilot (1/2)
- Autopilot is a scheduler that submits pilots to sites via condor-g/glidein (Pilot → Gatekeeper)
- Pilots are scheduled to the site batch system and pull jobs as soon as CPUs become available (Panda server → Job → Pilot)
- Pilot submission and job submission are different: the job is the payload for the pilot
Pilot and Autopilot (2/2)
- How the pilot works
  - Sends several parameters to the Panda server for job matching (HTTP request): CPU speed, available memory size on the WN, list of available ATLAS releases at the site
  - Retrieves an 'activated' job (HTTP response to the above request): activated → running
  - Runs the job immediately, because all input files should already be available at the site
  - Sends a heartbeat every 30 min
  - Copies the output files to the local SE and registers them in the Local Replica Catalogue
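Putting those steps together, a minimal pilot sketch; `server` and `node` are stand-ins for the HTTPS calls and the site-specific machinery, not the real pilot code:

```python
import threading

def pilot(server, node):
    # 1. Job matching: describe this worker node to the Panda server.
    job = server.get_job(cpu_speed=node.cpu_speed,
                         memory=node.memory,
                         releases=node.atlas_releases)
    if job is None:
        return                                # nothing activated for this site

    # 2. Heartbeat every 30 min so the server knows the job is alive.
    stop = threading.Event()
    def heartbeat():
        while not stop.wait(30 * 60):
            server.update_job(job, state="running")
    threading.Thread(target=heartbeat, daemon=True).start()

    # 3. Inputs were dispatched beforehand, so the job runs immediately.
    outputs = node.run(job)
    stop.set()

    # 4. Copy outputs to the local SE and register them in the LRC.
    node.copy_to_local_se(outputs)
    node.register_in_lrc(outputs)
    server.update_job(job, state="finished")
```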
Production vs Analysis
- Run on the same infrastructure
  - Same software, monitoring system, and facilities
  - No duplicated manpower for maintenance
- Separate computing resources
  - Different queues → different CPU clusters
  - Production and analysis don't have to compete with each other
- Different policies for data transfers
  - Analysis jobs don't trigger data transfer: jobs go to the sites which hold the input files
  - For production, input files are dispatched to T2s and output files are aggregated to the T1 via DDM asynchronously
  - Controlled traffic
Current PanDA production – Past Week
PanDA production – Past Month
MC Production 2006-07
ATLAS Data Management Software - Don Quijote
- The second generation of the ATLAS DDM system (DQ2)
  - DQ2 developers: M. Branco, D. Cameron, T. Maeno, P. Salgado, T. Wenaus, …
  - Initial idea and architecture were proposed by M. Branco and T. Wenaus
- DQ2 is built on top of Grid data transfer tools
  - Moved to a dataset-based approach
    - Datasets: an aggregation of files plus associated DDM metadata
    - The dataset is the unit of storage and replication
  - Automatic data transfer mechanisms using distributed site services
    - Subscription system
    - Notification system
- Current version 1.0
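The dataset notion is easy to make concrete; a sketch with illustrative fields, not the actual DQ2 schema:

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """An aggregation of files plus associated DDM metadata; the dataset,
    not the file, is the unit of storage and replication."""
    name: str
    files: set = field(default_factory=set)       # logical file names
    metadata: dict = field(default_factory=dict)  # DDM metadata
    version: int = 1
    frozen: bool = False                          # frozen: no further versions

    def add_files(self, lfns):
        if self.frozen:
            raise ValueError(self.name + " is frozen")
        self.files |= set(lfns)
        self.version += 1                         # every change is a new version
```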
DDM components
- DQ2 dataset catalog
- DQ2 "Queued Transfers"
- Local File Catalogs
- File Transfer Service
- DQ2 Subscription Agents
- DDM end-user tools: dq2_ls, dq2_get, dq2_cr (T. Maeno, BNL)
DDM Operations Mode
CERN
LYON
NG
BNL
FZK
RAL
CNAF
PIC
TRIUMF
SARA
ASGC
lapp
lpc
TokyoBeijing
Romania
grif T3
SWT2
GLT2NET2
WT2MWT2
T1T2 T3 VO box, dedicated computer
to run DDM services
US ATLAS DDM operations team :
BNL H.Ito, W.Deng,A.Klimentov,P.NevskiGLT2 S.McKee (MU)MWT2 C.Waldman (UC)NET2 S.Youssef (BU)SWT2 P.McGuigan (UTA)WT2 Y.Wei (SLAC)WISC X.Neng (WISC)
T1-T1 and T1-T2 associations according to GP
ATLAS Tiers associations.
All Tier-1s have predefined (software) channel with CERN and with each other.Tier-2s are associated with one Tier-1 and form the cloudTier-2s have predefined channel with the parent Tier-1 only.
LYON Cloud
BNL Cloud
TWT2
Melbourne
ASGC Cloud
wisc
Activities: Data Replication
- Centralized and automatic (according to the computing model)
- Simulated data
  - AOD/NTUP/TAG (current data volume ~1.5 TB/week)
  - BNL has a complete set of dataset replicas
  - US Tier-2s have defined what fraction of the data they will keep, from 30% to 100%
- Validation samples
  - Replicated to BNL for SW validation purposes
- Critical data replication
  - Database releases replicated to BNL from CERN and then from BNL to the US ATLAS T2s; the data volume is relatively small (~100 MB)
- Conditions data
  - Replicated to BNL from CERN
- Cosmic data
  - BNL requested 100% of cosmic data; data replicated from CERN to BNL and on to the US Tier-2s
- Data replication for individual groups, universities, and physicists
  - A dedicated web interface is set up
Data Replication to Tier 2s
You’ll never walk alone
[Plot: weekly throughput, 2.1 GB/s out of CERN. From Simone Campana.]
Subscriptions
- Subscription: a request for the full replication of a dataset (or dataset version) at a given site
- Requests are collected by the centralized subscription catalog
- They are then served by a set of agents: the site services
- Subscription on a dataset version: one-time-only replication
- Subscription on a dataset (see the sketch below)
  - Replication triggered on every new version detected
  - Subscription closed when the dataset is frozen
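The two flavours differ only in when the subscription closes; a sketch with invented names, not the DQ2 site-services code:

```python
def serve(sub, catalog, replicate):
    """One pass of the site services over a single subscription."""
    ds = catalog.lookup(sub.dataset_name)
    if sub.pinned_version is not None:
        # Subscription on a dataset *version*: one-time-only replication.
        replicate(ds, sub.pinned_version, sub.site)
        sub.close()
    else:
        # Subscription on a dataset: re-trigger on every new version...
        if ds.version != sub.last_seen_version:
            replicate(ds, ds.version, sub.site)
            sub.last_seen_version = ds.version
        # ...until the dataset is frozen.
        if ds.frozen:
            sub.close()
```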
Site Services
- Agent-based framework
- Goal: satisfy subscriptions
- Each agent serves a specific part of a request (as sketched after this list)
  - Fetcher: fetches new subscriptions from the subscription catalog
  - Subscription Resolver: checks if the subscription is still active, for new dataset versions, new files to transfer, …
  - Splitter: creates smaller chunks from the initial requests, identifies files requiring transfer
  - Replica Resolver: selects a valid replica to use as source
  - Partitioner: creates chunks of files to be submitted as a single request to the FTS
  - Submitter/PendingHandler: submit/manage the FTS requests
  - Verifier: checks the validity of files at the destination
  - Replica Register: registers the new replica in the local replica catalog
  - …
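That division of labour reads naturally as a pipeline; a sketch where each stage is one of the agents above, with invented request methods. The real agents run concurrently against shared state rather than in a single chain, so this shows only the logical order.

```python
PIPELINE = [
    lambda r: r.fetch_subscriptions(),        # Fetcher
    lambda r: r.resolve_new_files(),          # Subscription Resolver
    lambda r: r.split_into_chunks(),          # Splitter
    lambda r: r.pick_source_replicas(),       # Replica Resolver
    lambda r: r.partition_for_fts(),          # Partitioner
    lambda r: r.submit_to_fts(),              # Submitter/PendingHandler
    lambda r: r.verify_at_destination(),      # Verifier
    lambda r: r.register_local_replicas(),    # Replica Register
]

def run_site_services(request):
    # Each agent serves its specific part of the request in turn.
    for stage in PIPELINE:
        stage(request)
```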
Typical deployment
- Deployment at Tier0 similar to Tier1s
- LFC and FTS services at Tier1s
- SRM services at every site, including Tier2s
[Diagram: central catalogs, with site services deployed at each Tier0/Tier1.]
Interaction with the grid middleware
- File Transfer Service (FTS)
  - One deployed per Tier0 / Tier1 (matches the typical site services deployment)
  - Triggers the third-party transfer by contacting the SRMs; needs to be constantly monitored
- LCG File Catalog (LFC)
  - One deployed per Tier0 / Tier1 (matches the typical site services deployment)
  - Keeps track of the local file replicas at a site
  - Currently used as the main source of replica information by the site services
- Storage Resource Manager (SRM)
  - Once pre-staging comes into the picture
DDM - Current Issues and Plans
- Dataset deletion
  - Non-trivial, although critical
  - First implementation using a central request repository
  - Being integrated into the site services
- Dataset consistency
  - Between storage and the local replica catalogs
  - Between the local replica catalogs and the central catalogs
  - A lot of effort put into this recently: tracker, consistency service
- Pre-staging of data
  - Currently done just before file movement
  - Introduces high latency when a file is on tape
- Messaging
  - More asynchronous flow (less polling)
ADC Operations Shifts
- ATLAS Distributed Computing Operations Shifts (ADCoS)
  - World-wide shifts
  - To monitor all ATLAS distributed computing resources
  - To provide Quality of Service (QoS) for all data processing
  - Shifters receive official ATLAS service credit (OTSMoU)
- Additional information
  - http://indico.cern.ch/conferenceDisplay.py?confId=22132
  - http://indico.cern.ch/conferenceDisplay.py?confId=26338
Typical Shift Plan
- Browse recent shift history
- Check performance of all sites
  - File tickets for new issues
  - Continue interactions about old issues
- Check status of current tasks
  - Check all central processing tasks
  - Monitor analysis flow (not individual tasks)
  - Overall data movement
- File software (validation) bug reports
- Check Panda and DDM health
- Maintain an elog of shift activities
Shift Structure
- Shifter on call
  - Two consecutive days
  - Monitor, escalate, follow up
  - Basic manual interventions (site on/off)
- Expert on call
  - One week duration
  - Global monitoring
  - Advises the shifter on call
  - Major interventions (service on/off)
  - Interacts with other ADC operations teams
  - Provides feedback to the ADC development teams
- Tier 1 expert on call
  - Very important (e.g. Rod Walker, Graeme Stewart, Eric Lancon, …)
Shift Structure Schematic
by Xavier Espinal
ADC Inter-relations
[Diagram: ADCoS at the center, connected to:]
- Central Services: Birger Koblitz
- Operations Support: Pavel Nevski
- Tier 0: Armin Nairz
- DDM: Stephane Jezequel
- Distributed Analysis: Dietrich Liko
- Tier 1 / Tier 2: Simone Campana
- Production: Alex Read