1 Computing & Networking User Group Meeting Roy Whitney Andy Kowalski Sandy Philpott Chip Watson 17 June 2008


Computing & Networking User Group Meeting

Roy Whitney

Andy Kowalski

Sandy Philpott

Chip Watson

17 June 2008


Users and JLab IT

• Ed Brash is User Group Board of Directors’ representative on the IT Steering Committee.

• Physics Computing Committee (Sandy Philpott)

• Helpdesk and CCPR requests and activities

• Challenges
  – Constrained budget
    • Staffing
    • Aging infrastructure
  – Cyber Security


Computing and Networking Infrastructure

Andy Kowalski


CNI Outline

• Helpdesk

• Computing

• Wide Area Network

• Cyber Security

• Networking and Asset Management


Helpdesk

• Hours 8am-12pm M-F
  – Submit a CCPR via http://cc.jlab.org/
  – Dial x7155
  – Send email to [email protected]

• Windows XP, Vista and RHEL5 Supported Desktops
  – Migrating older desktops

• Mac Support?


Computing

• Email Servers Upgraded
  – Dovecot IMAP Server (Indexing)
  – New File Server and IMAP Servers (Farm Nodes)

• Servers Migrating to Virtual Machines

• Printing
  – Centralized Access via jlabprt.jlab.org
  – Accounting Coming Soon

• Video Conferencing (working on EVO)


Wide Area Network

• Bandwidth
  – 10Gbps WAN and LAN backbone
  – Offsite Data Transfer Servers
    • scigw.jlab.org (bbftp)
    • qcdgw.jlab.org (bbcp)


Cyber Security Challenge

• The threat: sophistication and volume of attacks continue to increase.
  – Phishing Attacks
    • Spear Phishing/Whaling are now being observed at JLab.

• Federal requirements, including DOE's, for meeting the cyber security challenges call for additional measures.

• JLab uses a risk-based approach that incorporates achieving the mission while at the same time dealing with the threat.


Cyber Security

• Managed Desktops

– Skype Allowed From Managed Desktops On Certain Enclaves

• Network Scanning

• Intrusion Detection

• PII/SUI (CUI) Management


Networking and IT Asset Management

• Network Segmentation/Enclaves
  – Firewalls

• Computer Registration
  – https://reggie.jlab.org/user/index.php

• Managing IP Addresses
  – DHCP
    • Assigns all IP addresses (most static)
    • Integrated with registration

• Automatic Port Configuration
  – Rolling out now
  – Uses registration database


Scientific Computing

Chip Watson & Sandy Philpott


Farm Evolution Motivation

• Capacity upgrades
  – Re-use of HPC clusters

• Movement to Open Source
  – O/S upgrade
  – Change from LSF to PBS


Farm Evolution Timetable

Nov 07: Auger/PBS available – RHEL3 - 35 nodes

Jan 08: Fedora 8 (F8) available – 50 nodes

May 08: Friendly-user mode; IFARML4,5

Jun 08: Production

– F8 only; IFARML3 + 60 nodes from LSF IFARML alias

Jul 08: IFARML2 + 60 nodes from LSF

Aug 08: IFARML1 + 60 nodes from LSF

Sep 08: RHEL3/LSF->F8/PBS Migration complete

– No renewal of LSF or RHEL for cluster nodes


Farm F8/PBS Differences

• Code must be recompiled
  – 2.6 kernel
  – gcc 4

• Software installed locally via yum
  – cernlib
  – Mysql

• Time limits: 1 day default, 3 days max

• stdout/stderr to ~/farm_out

• Email notification
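The time limits above (1 day default, 3 days maximum) can be illustrated with a small helper. This is only a sketch: the function names are hypothetical, and the slides do not say whether an over-limit request is rejected or truncated, so rejection is assumed here. The `HH:MM:SS` string is the usual PBS walltime notation; how limits are actually passed through Auger is not covered by the slides.

```python
# Limits taken from the slide; everything else here is illustrative.
DEFAULT_HOURS = 24   # 1 day default
MAX_HOURS = 72       # 3 days max


def effective_walltime_hours(requested_hours=None):
    """Return the walltime (whole hours) a job would run under.

    Assumption: a request above the maximum is rejected rather than
    silently truncated.
    """
    if requested_hours is None:
        return DEFAULT_HOURS
    if requested_hours <= 0:
        raise ValueError("walltime must be positive")
    if requested_hours > MAX_HOURS:
        raise ValueError(f"walltime exceeds the {MAX_HOURS}-hour maximum")
    return requested_hours


def pbs_walltime(hours):
    """Format whole hours as an HH:MM:SS walltime string."""
    return f"{hours:02d}:00:00"


if __name__ == "__main__":
    print(pbs_walltime(effective_walltime_hours()))     # default limit
    print(pbs_walltime(effective_walltime_hours(48)))   # explicit 2-day request
```

Jobs expected to run longer than the default would need to request more time explicitly, up to the maximum.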


Farm Future Plans

• Additional nodes
  – From HPC clusters
    • CY08: ~120 4g nodes
    • CY09-10: ~60 6n nodes
  – Purchase as budgets allow

• Support for 64 bit systems when feasible & needed


Storage Evolution

• Deployment of Sun x4500 “thumpers”

• Decommissioning of Panasas (old /work server)

• Planned replacement of old cache nodes


Tape Library

• Current STK “Powderhorn” silo is nearing end-of-life
  – Reaching capacity & running out of blank tapes
  – Doesn’t support upgrade to higher density cartridges
  – Is officially end-of-life December 2010

• Market trends
  – LTO (Linear Tape Open) Standard has proliferated since 2000
  – LTO-4 is 4x density, capacity/$, and bandwidth of 9940b: 800 GB/tape, $100/TB, 120 MB/s
  – LTO-5, out next year, will double capacity, 1.5x bandwidth: 1600 GB/tape, 180 MB/s
  – LTO-6 will be out prior to the 12 GeV era: 3200 GB/tape, 270 MB/s
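The per-tape figures above translate directly into cartridge counts and fill times. The sketch below is a back-of-the-envelope check, not a sizing tool: the function names are hypothetical, decimal units are assumed (1 TB = 1000 GB, 1 GB = 1000 MB), and the 2 PB target is the capacity figure quoted on the replacement slide, applied here to each LTO generation purely for comparison.

```python
import math

# (GB per tape, MB/s) as quoted on the slide for each LTO generation.
LTO_GENERATIONS = {
    "LTO-4": (800, 120),
    "LTO-5": (1600, 180),
    "LTO-6": (3200, 270),
}


def tapes_needed(capacity_tb, gb_per_tape):
    """Cartridges required to hold capacity_tb, ignoring compression and overhead."""
    return math.ceil(capacity_tb * 1000 / gb_per_tape)


def hours_to_fill(gb_per_tape, mb_per_s):
    """Time to write one full cartridge at the quoted streaming rate."""
    return gb_per_tape * 1000 / mb_per_s / 3600


if __name__ == "__main__":
    target_tb = 2000  # 2 PB, assumed decimal
    for name, (gb, rate) in LTO_GENERATIONS.items():
        print(name, tapes_needed(target_tb, gb), "tapes,",
              round(hours_to_fill(gb, rate), 2), "h per tape")
```

Because capacity doubles per generation while bandwidth grows only 1.5x, the time to fill a single cartridge rises with each generation even as the cartridge count halves.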


Tape Library Replacement

• Competitive procurement now in progress
  – Replace old system, support 10x growth over 5 years

• Phase 1 in August
  – System integration, software evolution
  – Begin data transfers, re-use 9940b tapes

• Tape swap through January

• 2 PB capacity by November

• DAQ to LTO-4 in January 2009

• Old silo gone in March 2009

End result: breakeven on cost by the end of 2009!


Long Term Planning

• Continue to increase compute & storage capacity in the most cost-effective manner

• Improve processes & planning
  – PAC submission process
  – 12 GeV Planning…


E.g.: Hall B Requirements

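| Event Simulation | 2012 | 2013 | 2014 | 2015 | 2016 |
| --- | --- | --- | --- | --- | --- |
| SPECint_rate2006 sec/event | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 |
| Number of events | 1.00E+12 | 1.00E+12 | 1.00E+12 | 1.00E+12 | 1.00E+12 |
| Event size (KB) | 20 | 20 | 20 | 20 | 20 |
| % Stored Long Term | 10% | 25% | 25% | 25% | 25% |
| Total CPU (SPECint_rate2006) | 5.7E+04 | 5.7E+04 | 5.7E+04 | 5.7E+04 | 5.7E+04 |
| Petabytes / year (PB) | 2 | 5 | 5 | 5 | 5 |

| Data Acquisition | 2012 | 2013 | 2014 | 2015 | 2016 |
| --- | --- | --- | --- | --- | --- |
| Average event size (KB) | 20 | 20 | 20 | 20 | 20 |
| Max sustained event rate (kHz) | 0 | 0 | 10 | 10 | 20 |
| Average event rate (kHz) | 0 | 0 | 10 | 10 | 10 |
| Average 24-hour duty factor (%) | 0% | 0% | 50% | 60% | 65% |
| Weeks of operation / year | 0 | 0 | 0 | 30 | 30 |
| Network (n*10gigE) | 1 | 1 | 1 | 1 | 1 |
| Petabytes / year | 0.0 | 0.0 | 0.0 | 2.2 | 2.4 |

| 1st Pass Analysis | 2012 | 2013 | 2014 | 2015 | 2016 |
| --- | --- | --- | --- | --- | --- |
| SPECint_rate2006 sec/event | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 |
| Number of analysis passes | 0 | 0 | 1.5 | 1.5 | 1.5 |
| Event size out / event size in | 2 | 2 | 2 | 2 | 2 |
| Total CPU (SPECint_rate2006) | 0.0E+00 | 0.0E+00 | 0.0E+00 | 7.8E-03 | 8.4E-03 |
| Silo Bandwidth (MB/s) | 0 | 0 | 900 | 900 | 1800 |
| Petabytes / year | 0.0 | 0.0 | 0.0 | 4.4 | 4.7 |

| Totals | 2012 | 2013 | 2014 | 2015 | 2016 |
| --- | --- | --- | --- | --- | --- |
| Total SPECint_rate2006 | 5.7E+04 | 5.7E+04 | 5.7E+04 | 5.7E+04 | 5.7E+04 |
| SPECint_rate2006 / node | 600 | 900 | 1350 | 2025 | 3038 |
| # nodes needed (current year) | 95 | 63 | 42 | 28 | 19 |
| Petabytes / year | 2 | 5 | 5 | 12 | 12 |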
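Several of the Hall B figures above can be cross-checked with a short calculation. The slides do not state the formulas, so the ones below are inferred from the numbers and should be read as assumptions: a 365-day year for simulation, decimal units (1 KB = 1000 bytes, 1 PB = 1e15 bytes), and data volume equal to event rate times event size times duty factor times running time. All function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_WEEK = 7 * 24 * 3600
BYTES_PER_KB = 1e3   # assumed decimal units
BYTES_PER_PB = 1e15


def simulation_cpu(events, sec_per_event):
    """SPECint_rate2006 capacity needed to simulate `events` within one year."""
    return events * sec_per_event / SECONDS_PER_YEAR


def stored_pb(events, event_kb, stored_fraction):
    """Petabytes kept long term from the simulated events."""
    return events * event_kb * BYTES_PER_KB * stored_fraction / BYTES_PER_PB


def daq_pb(avg_rate_khz, event_kb, duty_factor, weeks):
    """Petabytes written by data acquisition in one year of running."""
    bytes_per_second = avg_rate_khz * 1e3 * event_kb * BYTES_PER_KB
    return bytes_per_second * duty_factor * weeks * SECONDS_PER_WEEK / BYTES_PER_PB


def nodes_needed(total_cpu, cpu_per_node):
    """Node count for a given total capacity, rounded to the nearest node."""
    return round(total_cpu / cpu_per_node)


if __name__ == "__main__":
    print(f"{simulation_cpu(1e12, 1.8):.1E}")       # table lists 5.7E+04
    print(stored_pb(1e12, 20, 0.25))                # table lists 5 PB
    print(round(daq_pb(10, 20, 0.60, 30), 1))       # table lists 2.2 PB for 2015
    print([nodes_needed(5.7e4, n) for n in (600, 900, 1350, 2025, 3038)])
```

The per-node capacity row grows by a factor of 1.5 each year, which is why the node count for a fixed total capacity falls from 95 to 19 over the five years shown.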


LQCD Computing

• JLab operates 3 clusters with nearly 1100 nodes, primarily for LQCD plus some accelerator modeling

• National LQCD Computing Project (2006-2009: BNL, FNAL, JLab; USQCD Collaboration)

• LQCD II proposal 2010-2014 would double the hardware budget to enable key calculations

• JLab Experimental Physics & LQCD computing share staff (operations & software development) & tape silo, providing efficiencies for both