HEPiX Autumn Meeting 2014

University of Nebraska, Lincoln

http://indico.cern.ch/event/320819/

Arne Wiebalck

Liviu Valsan

Borja Aparicio

Wiebalck, Valsan, Aparicio: HEPiX Autumn 2014 Summary


HEPiX

• Global organization of service managers and support staff providing computing facilities for the HEP community

• Participating sites include BNL, CERN, DESY, FNAL, IN2P3, INFN, NIKHEF, RAL, TRIUMF …

• Meetings are held twice per year

- Spring: Europe, Autumn: U.S./Asia

• Reports on status and recent work, work in progress & future plans

- Usually no showing-off, honest exchange of experiences


Outline

• 2014 Autumn Meeting & HEPiX News

• Site Reports

• End User Services & OS

• Grids, Clouds, and Virtualization

• Storage and File systems

• Computing and Batch

• IT Facilities

• Networking and Security

• Basic IT Services


Arne

Liviu

Borja


HEPiX Autumn 2014

• Oct 13 – 17, 2014 at the University of Nebraska, Lincoln

- Well organized, rich program

- Eduroam, Indico (intervention, incident, power cut)

• 93 registered participants

- Many first timers again

- 6/8 US-CMS Tier-2 sites, 2/5 US-ATLAS Tier-2 sites

- 45 sites represented

• 60 contributions

- 96 slides (in 25 minutes!)

- 300 words per slide …


Lincoln, Nebraska

About 22 hours door to door …


HEPiX News

• Tony Wong (BNL) new HEPiX co-chair

- 3-year term

• Next meetings

- Spring 2015: Oxford (UK) March 23 – 27

- Autumn 2015: BNL (US) Oct 12 – 16

- Spring 2016: DESY Zeuthen (DE), Berlin/Potsdam (TBC)


HEPiX Working Groups

• IPv6

- Deployment/readiness following Tier structure: https://www.gridpp.ac.uk/wiki/2014_IPv6_WLCG_Site_Survey

- Experiments pushing for services at T1/T2

• Benchmarking

- Awaiting SPEC CPUv6

- Suggestion of a “fast” benchmark (minutes)


Site Reports

• 15 site reports: T0, 7x T1s, 7x T2s

• (Move to) HTCondor still very visible

- Talk from HTCondor team

- INFN (on LSF now) will start evaluation

• KIT’s “Dropbox”: bwSync&Share

- 8’000 users

- Based on PowerFolder

• Ganeti used at multiple sites

- VM cluster management tool from Google

- Overall positive experience


Site Reports

• Ceph

- Still gaining momentum: many PoCs (RAL: 1PB, BNL: 3PB)

- Vivid mail exchange, BoF Session in Oxford?

• Energy efficiency

- No WG, but many activities (refurbishments)

- “Energy accounting” discussions

• INFN still investigating micro-server options

- Moonshot and other Avoton based solutions

- Experiments seem fine with performance/power ratio

• During a “dark data” cleanup, NDGF deleted all ALICE tape data due to a misunderstanding of what “NDGF data” means

- ALICE::NDGF vs. ALICE::NDGF_tape

- 200TB of data now being backfilled …


CERN Site Report

• “What about Ceph @ CERN?”

• “Are there ever power cuts at CERN?”


End User Services & OS

• Six talks in total, three from CERN

- Thomas: CC7

- Borja: Issue tracking and VCS

- Michail: FTS3

• Scientific Linux / CentOS

- FNAL SL team continues to provide Scientific Linux

- No competition with other rebuilds

- Rebuild from git.centos.org: difficult (as not supported)

So, after the initial discussions at the Annecy meeting, the community seems to part ways …


Virtualization

• Six talks in total, five from CERN

- Laurence: Experiment’s Cloud Computing Adoption

- Andrea: WLCG Monitoring

- Helge: Volunteer Computing

- Arne: Cloud Report, VM IO Performance

• RAL starting batch virtualization

- “Burst batch into the cloud”

- Successful PoC: Vacuum model integration with HTCondor

• Virtualization @ GSI: MS Windows on KVM

- Windows domain restructuring: all on VMs, all on KVM

- Partly in prod (CA, TS), partly in testing (DC, Exchange)

- No support issue


Storage and Filesystems

• Ten talks in total, five from CERN

- Luca: EOS across 1000 km; CERNbox + EOS: Cloud Storage for Science

- Andrea: DPM performance tuning hints for HTTP/WebDAV and Xrootd

- Ruben: Experience in running relational databases on clustered storage

- Liviu: SSD Benchmarking at CERN

https://lvalsan.web.cern.ch/lvalsan/ssd_benchmarking


OpenZFS on Linux

• OpenZFS

- Large set of features

- Independent of the Linux kernel

• LLNL: three Lustre filesystems, ~100 PB, OpenZFS backend

- Moving to commodity JBODs

- Work ongoing to improve Linux boot time with a large number of drives


Ceph Based Storage Systems for RACF

• Deployment of same scale as at CERN

• Lots of performance and stability tests

- Object storage, block storage and file system (Ceph FS)

- On several platforms (including HP Moonshot)

- Different networking solutions


Using XRootD to Minimize Hadoop Replication

• Hadoop replication via XRootD

• Reduced local Hadoop replication to 1

• In case of corrupt local blocks:

- Request blocks via XRootD

- Cache locally

- Repair broken blocks locally in Hadoop
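
The fallback path for corrupt local blocks can be sketched as below. This is only a toy model under stated assumptions: `BlockStore`, `read_block` and the `fetch_remote` callable are hypothetical names invented for illustration, and nothing here uses the real Hadoop or XRootD APIs. It shows the order of operations from the slide: detect a bad local block, request it remotely, cache it, repair the local copy.

```python
import hashlib


class BlockStore:
    """Toy local block store with a single replica (stand-in for local Hadoop storage)."""

    def __init__(self):
        self.blocks = {}     # block_id -> bytes
        self.checksums = {}  # block_id -> expected SHA-256 hex digest

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()

    def is_healthy(self, block_id):
        data = self.blocks.get(block_id)
        if data is None:
            return False
        return hashlib.sha256(data).hexdigest() == self.checksums.get(block_id)


def read_block(block_id, local, fetch_remote, cache):
    """Read a block, falling back to a remote source if the local copy is corrupt.

    fetch_remote is a hypothetical callable standing in for a request via XRootD.
    """
    if local.is_healthy(block_id):
        return local.blocks[block_id]
    data = cache.get(block_id)
    if data is None:
        data = fetch_remote(block_id)  # request the block remotely
        cache[block_id] = data         # cache locally
    expected = local.checksums.get(block_id)
    if expected is not None and hashlib.sha256(data).hexdigest() != expected:
        raise IOError(f"remote copy of {block_id} does not match expected checksum")
    local.blocks[block_id] = data      # repair the broken local block
    return data
```

With a local replication factor of 1 there is no second local replica to fall back on, which is why the remote request is the only recovery path in this sketch.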


Computing and Batch Systems

• Six talks in total, one from CERN

- Two presentations on benchmarking

- Four presentations on batch systems


Benchmarking activities

• Intel Xeon E5-2600 v3 (Haswell)

- Showing good performance

• Intel Avoton: very good HS06 / Watt ratio

• ARM 32-bit: HS06 / Watt in between Xeon & Avoton


Fast Benchmark

• Some requirements are clear:

- Open source

- Easy to run

- Small

• Other requirements are not so clear:

- How fast?

- Reproducible?

- Reliable?

- Single core or multicore?


Fast Benchmark Proposals

• Geant4 based

- Linux x86-64 & ARM

- Realistic detector geometry

- Footprint: 1/4 to 1/3 of real experiment

- CPU bound, no I/O

• LHCb fast benchmark

- Small Python script, single threaded
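
To make the idea of a small, single-threaded, CPU-bound script concrete, here is a minimal sketch. It is not the LHCb benchmark and not either of the proposals above; the workload, the function names and the iteration count are all assumptions chosen for illustration only.

```python
import time


def workload(n=200_000):
    """Fixed, deterministic arithmetic loop: CPU bound, no I/O, single threaded."""
    acc = 0.0
    x = 12345
    for _ in range(n):
        x = (1103515245 * x + 12345) % 2147483648  # simple linear congruential step
        acc += (x % 1000) * 0.001
    return acc


def run_benchmark(repeats=3, n=200_000):
    """Return a score in loop iterations per second, using the fastest of several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload(n)
        best = min(best, time.perf_counter() - start)
    return n / best


if __name__ == "__main__":
    print(f"score: {run_benchmark():.0f} iterations/s")
```

Taking the best of a few repeats is one simple way to reduce noise from other activity on the machine; whether such a score is reproducible and reliable enough is exactly the open question on the previous slide.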


Next generation HEP-SPEC06

• Next SPEC CPU benchmark (CPUv6) in beta

• Should be released before the end of the year

• Will probably not run with the default SLC 6 compiler

• GCC on CentOS 7 should be fine; a config file will be provided by GridKa


Batch Systems

• All four talks about HTCondor:

- Two talks from developers

- Jérôme’s talk: HTCondor pilot @ CERN

- Open Science Grid adopting HTCondor


IT Facilities and Business Continuity

• Three talks, two from CERN

- First Experience with the Wigner Data Centre

- Joint procurement of IT equipment and services

• UPS Monitoring with Sensaphone

- Multi-level email / SMS alerting

- Gradual shutdown of servers in case of power cut or cooling failure

- Wireless temperature sensors used to build 3D heatmap
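
A gradual shutdown policy of the kind mentioned above might look roughly like the following sketch. Everything in it is hypothetical: the priority tiers, the battery and temperature thresholds, and the function names are assumptions for illustration, and it does not use any Sensaphone interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    priority: int  # 1 = most critical (shut down last), higher = less critical


# Hypothetical stages: (threshold, lowest priority number that must go down).
BATTERY_STAGES = ((50, 3), (25, 2), (10, 1))     # battery charge below N percent
TEMP_STAGES = ((30.0, 3), (35.0, 2), (40.0, 1))  # room temperature at or above N °C


def cutoff_priority(battery_pct, temp_c):
    """Return the lowest priority number to shut down, or None if all is fine."""
    cutoffs = [p for limit, p in BATTERY_STAGES if battery_pct < limit]
    cutoffs += [p for limit, p in TEMP_STAGES if temp_c >= limit]
    return min(cutoffs) if cutoffs else None


def servers_to_shut_down(servers, battery_pct, temp_c):
    """List the servers that should be powered off in the current state."""
    cutoff = cutoff_priority(battery_pct, temp_c)
    if cutoff is None:
        return []
    return sorted(s.name for s in servers if s.priority >= cutoff)
```

The least critical tier goes first, so the remaining UPS capacity or cooling headroom is kept for the services that matter most.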


NERSC

• New Computational Research and Theory (CRT) Building

- Year-round free air and water cooling

- PUE < 1.1

- 42 MW to building

- 12.5 MW provisioned
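
As a reminder of what the PUE figure expresses, the standard definition is the ratio of total facility power to the power delivered to IT equipment. The symbols below are chosen here for illustration and do not appear in the slides.

```latex
\mathrm{PUE} = \frac{P_{\text{facility}}}{P_{\text{IT}}},
\qquad
\mathrm{PUE} < 1.1
\;\Longrightarrow\;
P_{\text{facility}} - P_{\text{IT}} < 0.1\, P_{\text{IT}}
```

In words, a PUE below 1.1 means that cooling, power distribution and other overheads add less than 10% on top of the IT load.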


Networking and Security


• Four networking talks, two security, one from CERN

- Stefan: Situational Awareness: Computer Security

• IPv6 Deployment

- HEPiX IPv6 Working Group: WLCG dual-stack services deployment, testing

- Open Science Grid: are client and server both dual-stack? Server is, but not the client?

• InfiniBand Based Networking evaluation

- Brookhaven National Laboratory (USA): https://indico.cern.ch/event/320819/session/4/contribution/46/material/slides/0.pdf

• ESnet: Extension to Europe

- US Department of Energy

- “Scientific progress will be completely unconstrained by the physical location of instruments, people, computational resources or data”
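
For the dual-stack question raised under IPv6 deployment, a quick first check is whether a service name resolves to both IPv4 and IPv6 addresses. The sketch below uses Python's standard `socket.getaddrinfo`; the function names and the injectable `resolver` parameter are assumptions added here so the logic can be exercised without network access. It only looks at name resolution and says nothing about whether the service is actually reachable over either protocol.

```python
import socket


def address_families(host, port=443, resolver=socket.getaddrinfo):
    """Return the set of address families that the host name resolves to."""
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()  # name does not resolve at all
    return {info[0] for info in infos}


def is_dual_stack(host, port=443, resolver=socket.getaddrinfo):
    """True if the host name has both IPv4 and IPv6 addresses."""
    families = address_families(host, port, resolver)
    return socket.AF_INET in families and socket.AF_INET6 in families
```

A result of False for a server that is expected to be dual-stack usually points at a missing AAAA (or A) record rather than at the client.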


Basic IT Services 1/2

• Seven talks, three from CERN

- Ben: Configuration Services at CERN: Update

- Rubén: Database on Demand: insight how to build your DBaaS

- Aris: Ermis service for DNS Load Balancer configuration

• Monitoring with Nagios

- NERSC – US Department of Energy

- Monitoring clusters of 1000's of compute nodes


Basic IT Services 2/2

• CFEngine

- ATLAS Great Lakes Tier 2 (AGLT2)

- Change management: SVN → Push to production

• Puppet at USCMS-T1 – Fermilab

- Modules + Data in Hiera approach. PuppetDashboard instead of TheForeman

- Change management: Git branches → Push to production

- Continuous Integration? Not yet but Beaker is the main candidate

- Secrets? “hiera-eyaml”: not a good solution

• Puppet at BNL

- RHIC and ATLAS Computing Facility

- Emphasis on Change Management and Cultural Management

- Test environments + self-approve delay

- Looking for automatic testing
