Status Report on Tier-1 in Korea. Gungwon Kang, Sang-Un Ahn and Hangjin Jang (KISTI GSDC). April 28, 2014, at the 15th CERN-Korea Committee, Geneva. Korea Institute of Science and Technology Information, Global Science experiment Data hub Center.

Page 1: Status Report on Tier-1 in Korea

Status Report on Tier-1 in Korea

Gungwon Kang, Sang-Un Ahn and Hangjin Jang (KISTI GSDC)

April 28, 2014 at 15th CERN-Korea Committee, Geneva

Korea Institute of Science and Technology Information

Global Science experiment Data hub Center

Page 2: Status Report on Tier-1 in Korea

OUTLINE

• Computing Resources
• Operations
• Network
• Conclusion

28 April 2014, 15th CERN-Korea Committee

Page 3: Status Report on Tier-1 in Korea

KISTI GSDC Tier-1 Team

ROLE                               Name
Representative                     Haeng-Jin Jang
System Management                  Hee-Jun Yoon
System Administration              Jeong-Heon Kim
Storage (Disk & Tape)              Hee-Jun Yoon, Sang-Oh Park
Network                            Hyoung-Woo Park, KISTI support (Dr. Bu-Seung Cho)
Site Operation & Administration    Il-Yeon Yeo, Sang-Un Ahn
KIAF Operation & User Support      Sang-Un Ahn

~ 9 people

Page 4: Status Report on Tier-1 in Korea

Computing Resource Status

• 2013 Pledges (CPU): 25,000 HepSpec06
  – Current HepSpec06: 28,055
  – 2,524 job slots available (4 reserved slots for pilot jobs) with hyper-threading (H/T) enabled

• 2013 Pledges (Tape Storage): 1,500 TB
  – Current tape capacity: 1,000 TB
  – Pledges will be met this year

• 2013 Pledges (Disk Storage): 1,000 TB
  – Current disk capacity: 966 TB (1,000 TB allocated, but usable space is slightly below)
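
The pledge figures on this slide can be put side by side with a few lines of arithmetic. The sketch below is only an illustration: the numbers are copied from the slide, while the dictionary layout and the `fulfilment` helper are assumptions made for this example.

```python
# Pledged vs. currently installed capacity, copied from the slide.
# Layout: resource name -> (2013 pledge, current capacity).
resources = {
    "CPU (HepSpec06)": (25_000, 28_055),
    "Tape (TB)": (1_500, 1_000),
    "Disk (TB)": (1_000, 966),
}


def fulfilment(pledged, current):
    """Return current capacity as a percentage of the pledge."""
    return 100.0 * current / pledged


for name, (pledged, current) in resources.items():
    print(f"{name}: {fulfilment(pledged, current):.1f}% of pledge")
```

With these inputs the CPU capacity is above the pledge, disk is slightly below it, and tape is the resource still to be completed.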

Page 5: Status Report on Tier-1 in Korea

OPERATIONS

Page 6: Status Report on Tier-1 in Korea

[Figure: Total wall clock hours for ALICE jobs in the last 6 months. KISTI: 3.9 % (including Tier-2).]

[Figure: Jobs, Oct 2013 to Apr 2014. Annotations: T1 worker nodes migration to 10GbE equipped ones; ALICE Central Service Maintenance; EMI-3 Migration & Delivery of full pledges. Levels marked: ~800, ~1800, ~2500.]

• Current capacity: 2,524 job slots, 28.1 kHS06
  – 84 nodes, 32 (logical) cores per node, 11 HS06/core

• Maintenance issues
  – Worker nodes migration to 10GbE equipped ones
  – Middleware: EMI-3 migration (end of support for EMI-2 by 30 April)
  – Delivered full pledges for 2013

3.58% (2013)

Page 7: Status Report on Tier-1 in Korea

Site Reliability 

Page 8: Status Report on Tier-1 in Korea

KISTI Analysis Facility - KIAF

• Parallel Analysis Facility based on PROOF
• In operation since 2011, ALICE use only
• 1 master, 8 worker nodes, 12 cores and 22 TB disk per node
• Similar size and utilization to CAF (CERN Analysis Facility)

Page 9: Status Report on Tier-1 in Korea

Plans for On-call Service

• Alarm system
  – Nagios + e-mail notifications
  – Implementing SMS plugin + Night Owl shift by private company
  – Tape system: hardware/software malfunction reported to IBM and third-party company
  – 24/7 support, intervention to be carried out within one day
  – Ongoing evaluation of monitoring frameworks, e.g. Icinga, Zabbix, etc.

• On-call scheme
  – One-week shift cycle with 5-6 personnel
  – Expecting 1 or 2 calls in a cycle: alarms from batch scheduler and services, WN servicing
  – From daily monitoring report: detailed action list on services and hardware incidents

• Night owl shift
  – Private company contract: on-site support
  – If necessary: SMS and e-mail notification to off-site on-duty experts
  – Supercomputing division at KISTI has been running a similar system for years

We are planning to set up an on-call service. It will probably consist of three functions: an alarm system, an on-call scheme and a night owl shift.
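
The one-week shift cycle could be scheduled along the following lines. This is a minimal sketch, not the scheme actually planned at KISTI: the roster names, the rota start date and the `on_call` helper are all hypothetical, and the slides only state that 5-6 people share a weekly cycle.

```python
from datetime import date

# Hypothetical roster; the slides only say that 5-6 people share the rota.
ROSTER = ["expert_a", "expert_b", "expert_c", "expert_d", "expert_e"]

# Assumed start of the first shift (a Monday); not taken from the slides.
ROTA_START = date(2014, 5, 5)


def on_call(day, roster=ROSTER, start=ROTA_START):
    """Return who is on call on `day`, rotating through the roster weekly."""
    if day < start:
        raise ValueError("day is before the start of the rota")
    weeks_elapsed = (day - start).days // 7
    return roster[weeks_elapsed % len(roster)]


print(on_call(date(2014, 5, 14)))
```

With five people on the roster, each person is on call one week in five; adding a sixth name to `ROSTER` lengthens the cycle without any other change.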

Page 10: Status Report on Tier-1 in Korea

NETWORK

Page 11: Status Report on Tier-1 in Korea

Internal Network

• Internal network for Tier-1 is isolated from the computing centre service network

• Done in Oct 2013 - internal network re-structuring (3-week shutdown)
• Preparation for upgrade of bandwidth of external network up to 10 Gbps
• Main switch upgrade: bandwidth up to 2.5 Tbps
• HA configuration of private network
• Remove bottlenecks to storage

• Full 20 Gbps configuration (Incoming/Outgoing)
• Replaced all switches by 10 Gbps; done on part of service racks
• 1 Gbps switches in place for servers with 1 Gbps cards

• Worker nodes to be upgraded with 10 Gb cards
• Tape service nodes are being connected to the 10 Gbps switches

Page 12: Status Report on Tier-1 in Korea

External Network

• Current bandwidth to CERN: 2 Gbps

• Dedicated link via Daejeon-Chicago-Amsterdam-Geneva

• Roadmap for 10 Gbps upgrade presented to WLCG MB and accepted
• Working on upgrading bandwidth up to 10 Gbps
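
To give a feeling for what the step from 2 Gbps to 10 Gbps means in practice, the sketch below estimates idealised transfer times. The 100 TB data volume and the `transfer_hours` helper are assumptions chosen for illustration only; real transfers would be slower because of protocol overhead and shared links, which the optional `efficiency` factor is meant to approximate.

```python
def transfer_hours(volume_tb, link_gbps, efficiency=1.0):
    """Hours needed to move `volume_tb` terabytes over a `link_gbps` link.

    Decimal units are used: 1 TB = 8e12 bits, 1 Gbps = 1e9 bit/s.
    `efficiency` is the assumed usable fraction of the nominal bandwidth.
    """
    bits = volume_tb * 8e12
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600.0


# Hypothetical data volume, not taken from the slides.
VOLUME_TB = 100

for gbps in (2, 10):
    print(f"{VOLUME_TB} TB at {gbps} Gbps: {transfer_hours(VOLUME_TB, gbps):.1f} h")
```

Under these idealised assumptions the upgrade shortens any bulk transfer by a factor of five, independent of the volume chosen.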

Page 13: Status Report on Tier-1 in Korea

LHC OPN

• KISTI T1 network (134.75.125.0/24) included into LHC OPN

• BGP peering between Kreonet router @ KISTI and LCG network @ CERN
• perfSONAR has been deployed for measuring bandwidth and latency; a firewall policy issue persists concerning the ports below 1024, e.g. 80 (http), 443 (https), 843 (bwctl)
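
The address range and the port remark can be checked with Python's standard `ipaddress` module. This is a small sketch for illustration: the prefix and the port list are taken from the slide, while the helper names and the sample host addresses are hypothetical.

```python
import ipaddress

# Prefix announced into LHC OPN, as given on the slide.
KISTI_T1_NET = ipaddress.ip_network("134.75.125.0/24")

# Ports named on the slide, with the service labels used there.
PERFSONAR_PORTS = {80: "http", 443: "https", 843: "bwctl"}


def in_kisti_t1(addr):
    """Return True if `addr` falls inside the KISTI T1 prefix."""
    return ipaddress.ip_address(addr) in KISTI_T1_NET


def is_below_1024(port):
    """Return True for ports in the range affected by the firewall policy."""
    return port < 1024


print(KISTI_T1_NET.num_addresses)    # size of a /24
print(in_kisti_t1("134.75.125.10"))  # hypothetical host inside the prefix
print(in_kisti_t1("134.75.126.10"))  # hypothetical host outside the prefix
print(all(is_below_1024(p) for p in PERFSONAR_PORTS))
```

All three ports listed on the slide are below 1024, which is why they fall under the firewall policy mentioned there.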

Page 14: Status Report on Tier-1 in Korea

Conclusion

• KISTI T1 was approved as a full T1 at the meeting of the WLCG Overview Board in Nov. 2013
• The progress in ramping up the capability as a T1 was appreciated by the ALICE community, and a roadmap to a 10G network was accepted

• In Jan, KISTI T1 joined LHC OPN

• Over the last 6 months, KISTI T1 has been “shape-shifting” in terms of network
• Core switches replaced (bandwidth: 0.9 Tbps → 2.5 Tbps)
• Rack switches replaced (bandwidth: 1 Gbps → 10 Gbps)
• Servers migrated to 10GbE equipped ones