Overview
  Status of endpoints and redirectors
  Monitoring
  Failover
  Overflow
Slide 3
Endpoints
Status on Sat. 15 Nov.
Got one more site: RO-07-NIPNE
Problems:
  Being worked on: CSCS
  Not working at all: Nikhef
  Flip-flopping: FZK-LCG2 and NDGF-T1
Slide 4
Direct access
  Expired cert
  Wrong config
  Test jobs were unable to get a proxy
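An expired certificate is the kind of failure that can be caught before the test jobs hit it. A minimal sketch, assuming the certificate's end date is already available as a `notAfter` string (for example from `openssl x509 -enddate -noout -in <cert>`); the function name and the idea of a warning threshold are illustrative assumptions, not part of the existing test setup:

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return the number of days until a certificate expires.

    not_after: end date in the format used in certificates,
               e.g. "Jan  5 09:34:43 2018 GMT".
    now:       reference time in seconds since the epoch
               (defaults to the current time).
    A negative result means the certificate has already expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


if __name__ == "__main__":
    # Hypothetical end date, for illustration only.
    remaining = days_until_expiry("Jan  5 09:34:43 2018 GMT")
    if remaining < 0:
        print("certificate expired %.0f days ago" % -remaining)
    else:
        print("certificate valid for another %.0f days" % remaining)
```

Running such a check periodically and alerting when the result drops below a few days would flag the problem ahead of time.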
Slide 5
Upstream redirection
Slide 6
Downstream redirection
  Redirectors moved to AI (Agile Infrastructure) machines
Slide 7
Moving redirectors
Herve had to move all the EU redirectors to the Agile Infrastructure, upgrading them to xrootd 4.0.4 at the same time.
Started with the DE redirector. Had to re-implement the access rules.
Continued with two redirectors per day.
But the old machines got re-introduced, which confused everybody.
A new set of changes is being applied right now.
The situation is now clear, but sites need to restart their services because the redirector IPs changed.
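One way a site could tell whether it is affected by the IP change is to compare the addresses its redirector hostname resolves to now against the ones recorded when the service was started. The following is only a sketch under that assumption: the hostname is a placeholder, the helper names are made up for illustration, and 1094 is simply the default xrootd port.

```python
import socket


def resolve_ips(hostname, port=1094):
    """Return the set of IP addresses a hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def ips_changed(recorded_ips, current_ips):
    """True if the current address set differs from the recorded one."""
    return set(recorded_ips) != set(current_ips)


if __name__ == "__main__":
    # Placeholder values: replace with the site's redirector hostname
    # and the addresses noted when the service was last started.
    redirector = "redirector.example.org"
    recorded = {"192.0.2.10"}
    try:
        current = resolve_ips(redirector)
    except socket.gaierror as err:
        print("could not resolve %s: %s" % (redirector, err))
    else:
        if ips_changed(recorded, current):
            print("addresses changed, a restart is probably needed")
        else:
            print("addresses unchanged")
```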
Slide 8
Monitoring
The machine receiving info from AMQ and passing it to SSB etc. had to move to the Agile Infrastructure. It took much more time than expected, but it's done now.
EU sites were moving to sending monitoring data to CERN. Current state may be seen here (thanks to Igor Pelevanyuk):
http://dashb-xrootd-comp.cern.ch/cosmic/ATLASmigrationMonitoring/
Still a lot of effort needed to make summary and detailed monitoring match:
http://dashb-ai-621.cern.ch/cosmic/DB_ML_Comparator/
Started deeper analysis of Panda job info data transported into Hadoop at CERN.
Further improvements in FSB
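To make the summary versus detailed monitoring comparison concrete, here is a minimal sketch of the kind of check involved. It assumes both streams can be reduced to per-site totals (for example bytes read) held in plain dictionaries; the function name, the data layout, and the 5% tolerance are illustrative assumptions and do not describe how the comparator page linked above actually works.

```python
def compare_monitoring(summary, detailed, tolerance=0.05):
    """Return sites whose summary and detailed totals disagree.

    summary, detailed: dicts mapping site name -> total (e.g. bytes read).
    tolerance:         allowed relative difference before a site is flagged.
    The result maps each flagged site to its relative difference,
    measured against the larger of the two totals.
    """
    mismatches = {}
    for site in sorted(set(summary) | set(detailed)):
        s = summary.get(site, 0)   # a site missing from one stream counts as 0
        d = detailed.get(site, 0)
        reference = max(s, d)
        if reference == 0:
            continue               # nothing reported by either stream
        relative_diff = abs(s - d) / reference
        if relative_diff > tolerance:
            mismatches[site] = relative_diff
    return mismatches


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    summary = {"SITE_A": 100, "SITE_B": 100}
    detailed = {"SITE_A": 98, "SITE_B": 50, "SITE_C": 10}
    for site, diff in compare_monitoring(summary, detailed).items():
        print("%s: streams differ by %.0f%%" % (site, diff * 100))
```

Treating a site that is missing from one stream as zero makes dropped reports show up as a 100% mismatch rather than being silently skipped.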
Slide 9
Cost matrix
Slide 10
Overflow
Slowly expanding: BNL still missing, even though the reverse proxy hardware is there.
  ANALY_AGLT2_SL6    ANALY_INFN-T1
  ANALY_CONNECT      ANALY_IN2P3-CC
  ANALY_BU_ATLAS     ANALY_MPPMU
  ANALY_MWT2_SL6     ANALY_DESY-HH
  ANALY_OU_OCHEP     ANALY_QMUL_SL6
  ANALY_SLAC         ANALY_SFU
Can't use data from the rest of the EU cloud
Slide 11
Sankey overflow plots - success
Slide 12
Sankey overflow plots - failures
Slide 13
Overflow - workload
Slide 14
Overflow workload
Slide 15
Overflow job efficiency
Slide 16
Slide 17
Overflow CPU efficiency
Slide 18
Reactions
Up to now only two sites have noticed the overflows:
  TRIUMF: Jedi sent a lot of jobs to almost all US cloud sites, all reading from TRIUMF. This saturated their proxy (1 Gb/s). They have since made it 2 Gb/s.
  QMUL: Chris Walker noticed 5 Gbps+ at their NAT gateway, ~10 TB/day. Not a problem for now.
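The two QMUL figures can be put on the same scale with a quick conversion. Assuming decimal units (1 TB = 10^12 bytes, 1 Gb = 10^9 bits), ~10 TB/day is an average of roughly 0.93 Gb/s, so the 5 Gbps+ reading would be a peak rather than a sustained rate. This interpretation is an inference from the numbers, not something stated by the site. A small sketch of the arithmetic:

```python
SECONDS_PER_DAY = 86400


def tb_per_day_to_gbps(tb_per_day):
    """Average rate in Gb/s for a daily volume in TB (decimal units)."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


def gbps_to_tb_per_day(gbps):
    """Daily volume in TB if a rate in Gb/s were sustained for 24 hours."""
    bytes_per_day = gbps * 1e9 * SECONDS_PER_DAY / 8
    return bytes_per_day / 1e12


if __name__ == "__main__":
    print("10 TB/day averages %.2f Gb/s" % tb_per_day_to_gbps(10))
    print("5 Gb/s sustained would be %.0f TB/day" % gbps_to_tb_per_day(5))
```

By the same conversion, the 1 Gb/s TRIUMF proxy would top out at about 10.8 TB/day and the upgraded 2 Gb/s one at about 21.6 TB/day.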