HEPiX Autumn Meeting 2014
University of Nebraska, Lincoln
http://indico.cern.ch/event/320819/
Arne Wiebalck
Liviu Valsan
Borja Aparicio
Wiebalck, Valsan, Aparicio: HEPiX Autumn 2014 Summary
HEPiX
• Global organization of service managers and support staff providing computing facilities for the HEP community
• Participating sites include BNL, CERN, DESY, FNAL, IN2P3, INFN, NIKHEF, RAL, TRIUMF …
• Meetings are held twice per year
- Spring: Europe, Autumn: U.S./Asia
• Reports on status and recent work, work in progress & future plans
- Usually no showing-off, honest exchange of experiences
Outline
Arne:
• 2014 Autumn Meeting & HEPiX News
• Site Reports
• End User Services & OS
• Grids, Clouds, and Virtualization
Liviu:
• Storage and File systems
• Computing and Batch
• IT Facilities
Borja:
• Networking and Security
• Basic IT Services
HEPiX Autumn 2014
• Oct 13 – 17, 2014 at the University of Nebraska, Lincoln
- Well organized, rich program
- Eduroam, Indico (intervention, incident, power cut)
• 93 registered participants
- Many first timers again
- 6/8 US-CMS Tier-2 sites, 2/5 US-ATLAS Tier-2 sites
- 45 sites represented
• 60 contributions
- 96 slides (in 25 minutes!)
- 300 words per slide …
Lincoln, Nebraska
About 22 hours door to door …
HEPiX News
• Tony Wong (BNL) new HEPiX co-chair
- 3-year term
• Next meetings
- Spring 2015: Oxford (UK) March 23 – 27
- Autumn 2015: BNL (US) Oct 12 – 16
- Spring 2016: DESY Zeuthen (DE), Berlin/Potsdam (TBC)
HEPiX Working Groups
• IPv6
- Deployment/readiness following Tier structure
https://www.gridpp.ac.uk/wiki/2014_IPv6_WLCG_Site_Survey
- Experiments pushing for services at T1/T2
• Benchmarking
- Awaiting SPEC CPUv6
- Suggestion of a “fast” benchmark (minutes)
Site Reports
• 15 site reports: T0, 7x T1s, 7x T2s
• (Move to) HTCondor still very visible
- Talk from HTCondor team
- INFN (on LSF now) will start evaluation
• KIT’s “Dropbox”: bwSync&Share
- 8’000 users
- Based on PowerFolder
• Ganeti used at multiple sites
- VM cluster management tool from Google
- Overall positive experience
Site Reports
• Ceph
- Still gaining momentum: many PoCs (RAL: 1 PB, BNL: 3 PB)
- Vivid mail exchange, BoF session in Oxford?
• Energy efficiency
- No WG, but many activities (refurbishments)
- “Energy accounting” discussions
• INFN still investigating micro-server options
- Moonshot and other Avoton-based solutions
- Experiments seem fine with performance/power ratio
• During a “dark data” cleanup, NDGF deleted all ALICE tape data due to a misunderstanding of what “NDGF data” means
- ALICE::NDGF vs. ALICE::NDGF_tape
- 200 TB of data now being backfilled …
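The slide attributes the NDGF loss only to a misunderstanding of what “NDGF data” means; it does not say how the deletion list was built. Purely as a hypothetical illustration of how closely named storage elements can collide, the sketch below contrasts a loose prefix match with an exact match on the two names from the slide. The functions and the third storage element are invented for this example and do not describe NDGF's tooling.

```python
"""Hypothetical illustration: selecting data by storage-element name.

ALICE::NDGF and ALICE::NDGF_tape come from the slide; everything else
(function names, ALICE::CERN entry) is made up for the sketch.
"""

STORAGE_ELEMENTS = ["ALICE::NDGF", "ALICE::NDGF_tape", "ALICE::CERN"]


def select_prefix(target: str, names: list[str]) -> list[str]:
    # Loose match: anything whose name merely starts with the target.
    return [n for n in names if n.startswith(target)]


def select_exact(target: str, names: list[str]) -> list[str]:
    # Strict match: only the storage element named exactly as the target.
    return [n for n in names if n == target]


if __name__ == "__main__":
    print("prefix:", select_prefix("ALICE::NDGF", STORAGE_ELEMENTS))
    print("exact: ", select_exact("ALICE::NDGF", STORAGE_ELEMENTS))
```

The prefix variant silently widens the selection to include ALICE::NDGF_tape; an exact match (or an explicit list reviewed before deletion) keeps the two apart.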
CERN Site Report
• “What about Ceph @ CERN?”
• “Are there ever power cuts at CERN?”
End User Services & OS
• Six talks in total, three from CERN
- Thomas: CC7
- Borja: Issue tracking and VCS
- Michail: FTS3
• Scientific Linux / CentOS
- FNAL SL team continues to provide Scientific Linux
- No competition with other rebuilds
- Rebuild from git.centos.org: difficult (not supported)
So, after the initial discussions at the Annecy meeting, the community seems to part ways …
Virtualization
• Six talks in total, five from CERN
- Laurence: Experiment’s Cloud Computing Adoption
- Andrea: WLCG Monitoring
- Helge: Volunteer Computing
- Arne: Cloud Report, VM IO Performance
• RAL starting batch virtualization
- “Burst batch into the cloud”
- Successful PoC: Vacuum model integration with HTCondor
• Virtualization @ GSI: MS Windows on KVM
- Windows domain restructuring: all on VMs, all on KVM
- Partly in production (CA, TS), partly in testing (DC, Exchange)
- No support issues
Storage and Filesystems
• Ten talks in total, five from CERN:
- Luca: EOS across 1000 km; CERNBox + EOS: Cloud Storage for Science
- Andrea: DPM performance tuning hints for HTTP/WebDAV and Xrootd
- Ruben: Experience in running relational databases on clustered storage
- Liviu: SSD Benchmarking at CERN
https://lvalsan.web.cern.ch/lvalsan/ssd_benchmarking
OpenZFS on Linux
• OpenZFS
- Large set of features
- Independent of the Linux kernel
• LLNL: three Lustre filesystems, ~100 PB, OpenZFS backend
- Moving to commodity JBODs
- Work ongoing to improve Linux boot time with a large number of drives
Ceph Based Storage Systems for RACF
• Deployment of the same scale as at CERN
• Lots of performance and stability tests
- Object storage, block storage and file system (CephFS)
- On several platforms (including HP Moonshot)
- Different networking solutions
Using XRootD to Minimize Hadoop Replication
• Hadoop replication via XRootD
• Reduced local Hadoop replication factor to 1
• In case of corrupt local blocks:
- Request blocks via XRootD
- Cache locally
- Repair broken blocks locally in Hadoop
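The repair path above (request the block via XRootD, cache it locally, repair the broken block in Hadoop) can be sketched as a small self-contained model. This is a minimal sketch under stated assumptions: the local single-replica store and the remote XRootD source are plain in-memory dicts, corruption is detected with SHA-256 digests, and all class and function names are hypothetical. The real setup plugs into HDFS and an XRootD client, neither of which is shown here.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class LocalStore:
    """Hypothetical stand-in for HDFS running with replication factor 1."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.expected: dict[str, str] = {}  # block id -> known-good checksum

    def put(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data
        self.expected[block_id] = checksum(data)

    def is_corrupt(self, block_id: str) -> bool:
        return checksum(self.blocks[block_id]) != self.expected[block_id]


def read_block(block_id: str, local: LocalStore,
               remote: dict[str, bytes], cache: dict[str, bytes]) -> bytes:
    """Return a good copy of block_id, repairing the local copy if needed.

    remote: dict standing in for an XRootD endpoint at another site.
    cache:  dict standing in for a local on-disk cache.
    """
    if not local.is_corrupt(block_id):
        return local.blocks[block_id]
    data = cache.get(block_id)  # reuse a previously fetched copy, if any
    if data is None:
        # Request the block from the remote site (simulated XRootD read)
        data = remote[block_id]
        # Cache it locally
        cache[block_id] = data
    # Verify first, so a bad remote copy is never written back
    if checksum(data) != local.expected[block_id]:
        raise IOError(f"remote copy of {block_id} is also bad")
    # Repair the broken block in the local store
    local.blocks[block_id] = data
    return data
```

Keeping only one local replica trades disk for WAN reads: the remote copy is only touched when a local block fails its checksum.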
Computing and Batch Systems
• Six talks in total, one from CERN
- Two presentations on benchmarking
- Four presentations on batch systems
Benchmarking activities
• Intel Xeon E5-2600 v3 (Haswell): showing good performance
• Intel Avoton: very good HS06/Watt ratio
• ARM 32-bit: HS06/Watt in between Xeon and Avoton
Fast Benchmark
• Some requirements are clear:
- Open source
- Easy to run
- Small
• Other requirements are not so clear:
- How fast? Reproducible? Reliable? Single core or multicore?
Fast Benchmark Proposals
• Geant4 based
- Linux x86-64 & ARM
- Realistic detector geometry
- Footprint: 1/4 to 1/3 of a real experiment
- CPU bound, no I/O
• LHCb fast benchmark
- Small Python script, single threaded
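To make the open questions above (how fast, reproducible, single core) concrete, here is a hypothetical single-threaded Python micro-benchmark in the spirit of the “small Python script” proposal. It is not the LHCb script, whose contents the slides do not give; the workload (a fixed-seed pseudo-random arithmetic loop) and the score definition are assumptions. It repeats the run and reports the median time plus the relative spread, one simple way to put a number on reproducibility.

```python
import random
import statistics
import time


def workload(n: int = 200_000, seed: int = 42) -> float:
    # Fixed seed: the same arithmetic is performed on every run.
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = rng.random()
        acc += x * x - 0.5 * x
    return acc


def fast_benchmark(repeats: int = 5, n: int = 200_000) -> dict:
    """Time the workload several times on one core; higher score = faster."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload(n)
        timings.append(time.perf_counter() - start)
    median = statistics.median(timings)
    return {
        "score": n / median,  # loop iterations per second (arbitrary unit)
        "median_s": median,
        # Small value => results are stable on this host
        "relative_spread": (max(timings) - min(timings)) / median,
    }


if __name__ == "__main__":
    print(fast_benchmark())
```

A score like this runs in about a second, but whether it tracks HS06 on real HEP workloads is exactly the open question; it would have to be validated against the full benchmark.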
Next generation HEP-SPEC06
• Next SPEC CPU benchmark (CPUv6) in beta
- Should be released before the end of the year
- Will probably not run with the default SLC 6 compiler
- GCC on CentOS 7 should be fine; a config file will be provided by GridKa
Batch Systems
• All four talks about HTCondor:
- Two talks from developers
- Jérôme’s talk: HTCondor pilot @ CERN
- Open Science Grid adopting HTCondor
IT Facilities and Business Continuity
• Three talks, two from CERN:
- First Experience with the Wigner Data Centre
- Joint procurement of IT equipment and services
• UPS Monitoring with Sensaphone
- Multi-level email/SMS alerting
- Gradual shutdown of servers in case of a power cut or cooling failure
- Wireless temperature sensors used to build a 3D heatmap
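The multi-level alerting and gradual shutdown described above can be modelled as a small escalation table. The sketch is hypothetical: the temperature and battery thresholds, the tier names and the shutdown order are invented for illustration, and the talk does not say how the real Sensaphone setup is configured. Servers are grouped into tiers so that the least critical machines go down first as conditions worsen.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    on_battery: bool        # True once mains power is lost
    battery_minutes: float  # estimated UPS runtime left
    room_temp_c: float      # e.g. from wireless temperature sensors


# Hypothetical server tiers, in shutdown order (least critical first).
SHUTDOWN_ORDER = ["batch", "test", "services", "storage"]
# Severity level -> how many tiers from SHUTDOWN_ORDER to shut down.
TIERS_PER_SEVERITY = {0: 0, 1: 0, 2: 1, 3: 3}


def plan(r: Reading) -> tuple[list[str], list[str]]:
    """Return (alert channels, tiers to shut down) for one reading.

    All thresholds are illustrative assumptions, not site values.
    """
    alerts: list[str] = []
    severity = 0
    if r.on_battery or r.room_temp_c >= 30:
        alerts.append("email")
        severity = 1
    if (r.on_battery and r.battery_minutes < 20) or r.room_temp_c >= 35:
        alerts.append("sms")
        severity = 2
    if (r.on_battery and r.battery_minutes < 10) or r.room_temp_c >= 40:
        severity = 3
    return alerts, SHUTDOWN_ORDER[:TIERS_PER_SEVERITY[severity]]
```

In this sketch storage is never shut down automatically; keeping the most critical tier for a human decision is one possible design choice, not something stated in the talk.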
NERSC: New Computational Research and Theory (CRT) Building
• Year-round free air and water cooling
• PUE < 1.1
• 42 MW to building
• 12.5 MW provisioned
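PUE (power usage effectiveness) is total facility power divided by the power delivered to IT equipment, so PUE < 1.1 means cooling, power distribution and other overheads add less than 10% on top of the IT load. A few helpers make the arithmetic explicit. The numbers in the usage lines are made-up examples, not CRT measurements, and the slide does not say how the 42 MW and 12.5 MW figures relate to the IT load.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_fraction(pue_value: float) -> float:
    """Non-IT overhead (cooling, distribution, ...) as a fraction of IT load."""
    return pue_value - 1.0


def max_total_for(it_equipment_kw: float, target_pue: float) -> float:
    """Largest total facility draw that still meets a PUE target."""
    return it_equipment_kw * target_pue


if __name__ == "__main__":
    # Hypothetical example: 1000 kW of IT load, 1080 kW at the meter.
    p = pue(1080, 1000)
    print(f"PUE = {p:.2f}, overhead = {overhead_fraction(p):.0%}")
    print(f"Max total at PUE 1.1: {max_total_for(1000, 1.1):.0f} kW")
```

This prints `PUE = 1.08, overhead = 8%` and `Max total at PUE 1.1: 1100 kW`.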
Networking and Security
• Four networking talks, two security talks, one from CERN
- Stefan: Situational Awareness: Computer Security
• IPv6 Deployment
- HEPiX IPv6 Working Group: WLCG dual-stack services deployment, testing
- Open Science Grid: are client and server dual-stack? Server is, but not the client?
• InfiniBand-based networking evaluation
- Brookhaven National Laboratory (USA)
https://indico.cern.ch/event/320819/session/4/contribution/46/material/slides/0.pdf
• ESnet: Extension to Europe
- US Department of Energy
- “Scientific progress will be completely unconstrained by the physical location of instruments, people, computational resources or data”
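The OSG question of whether client and server are both dual-stack can be checked, on the name-resolution side, by looking at which address families a host name resolves to. The sketch below uses the standard library's `socket.getaddrinfo`; the classification is kept in pure functions so it can be exercised without network access, and the function names are hypothetical. Publishing both A and AAAA records only shows that a host is advertised as dual-stack, not that its services actually listen on IPv6, so this is a first check rather than a full test.

```python
import socket


def address_families(addrinfo: list) -> set[str]:
    """Map getaddrinfo() result tuples to the IP versions they contain."""
    found = set()
    for family, _type, _proto, _canonname, _sockaddr in addrinfo:
        if family == socket.AF_INET:
            found.add("IPv4")
        elif family == socket.AF_INET6:
            found.add("IPv6")
    return found


def is_dual_stack(addrinfo: list) -> bool:
    return address_families(addrinfo) == {"IPv4", "IPv6"}


def check_host(host: str, port: int = 443) -> set[str]:
    """Resolve host and report which IP versions it is published on."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()
    return address_families(infos)
```

Running `check_host` on both the client's and the server's names would show the “server is, but not the client” situation as `{"IPv4", "IPv6"}` versus `{"IPv4"}`.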
Basic IT Services 1/2
• Seven talks, three from CERN
- Ben: Configuration Services at CERN: Update
- Rubén: Database on Demand: insight into how to build your DBaaS
- Aris: Ermis service for DNS Load Balancer configuration
• Monitoring with Nagios
- NERSC, US Department of Energy
- Monitoring clusters of thousands of compute nodes
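Nagios checks are external programs that report status through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a line of text, optionally followed by `|` and performance data. As an illustration only, here is a minimal check in that convention that alerts on the fraction of compute nodes that are down. The node-state input and the 5%/10% thresholds are assumptions; the talk does not describe how the NERSC checks are written.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_nodes(states: dict[str, str], warn: float = 0.05,
                crit: float = 0.10) -> tuple[int, str]:
    """Return (exit_code, message) for a map of node name -> 'up'/'down'.

    warn/crit are fractions of nodes down; the defaults are illustrative.
    """
    if not states:
        return UNKNOWN, "NODES UNKNOWN - no node data"
    down = sum(1 for s in states.values() if s != "up")
    frac = down / len(states)
    if frac >= crit:
        code = CRITICAL
    elif frac >= warn:
        code = WARNING
    else:
        code = OK
    msg = f"NODES {LABELS[code]} - {down}/{len(states)} down ({frac:.1%}) | down={down}"
    return code, msg


def run(states: dict[str, str]) -> int:
    # Print the status line; a real plugin would pass the code to sys.exit().
    code, message = check_nodes(states)
    print(message)
    return code
```

With 1000 nodes of which one is down, `run` prints `NODES OK - 1/1000 down (0.1%) | down=1` and returns 0. At the scale of thousands of nodes, one aggregate check like this keeps the number of Nagios services manageable compared with one check per node.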
Basic IT Services 2/2
• CFEngine
- ATLAS Great Lakes Tier 2 (AGLT2)
- Change management: SVN → Push to production
• Puppet at USCMS-T1 (Fermilab)
- Modules + data in Hiera approach; Puppet Dashboard instead of The Foreman
- Change management: Git branches → Push to production
- Continuous Integration? Not yet, but Beaker is the main candidate
- Secrets? “hiera-eyaml”: not a good solution
• Puppet at BNL: RHIC and ATLAS Computing Facility
- Emphasis on change management and cultural management
- Test environments + self-approve delay
- Looking for automatic testing
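One plausible reading of BNL's “test environments + self-approve delay” is: a change deployed to a test environment can be pushed to production immediately if a peer approves it, while the author may approve it alone only after it has soaked in test for a fixed period without objections. The sketch models only that gate; the 24-hour delay, the field names and the objection mechanism are assumptions for illustration, not BNL's actual tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

SELF_APPROVE_DELAY = timedelta(hours=24)  # assumed value, not from the talk


@dataclass
class Change:
    author: str
    deployed_to_test: datetime
    objections: list[str] = field(default_factory=list)


def may_promote(change: Change, approver: str, now: datetime) -> bool:
    """Decide whether `approver` may push `change` to production at `now`."""
    if change.objections:
        return False  # any open objection blocks promotion
    if approver != change.author:
        return True   # peer approval: no waiting period
    # Self-approval: only after the change has been in test long enough
    return now - change.deployed_to_test >= SELF_APPROVE_DELAY
```

The delay gives colleagues a window to notice a problem in the test environment without making a second reviewer mandatory for every change.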