KISTI-GSDC SITE REPORT
Asia Tier Center Forum
@ KISTI, Daejeon, South Korea
22 Sep – 24 Sep 2015
Sang-Un Ahn
on behalf of KISTI-GSDC
CONTENTS
• KISTI GSDC Overview
• Tier-1 operations
• Network
• Plan
2015-09-22
KISTI Location
[Map of South Korea showing Seoul, Incheon Airport, Daejeon (KISTI), Daegu, Busan, Gwangju and Jeju Island; annotation: 2h by car]
Daedeok R&D Innopolis
• 30 Government Research Institutes
• 11 Public Research Institutes
• 29 Non-profit Organizations
• 7 Universities
• Rare Isotope Accelerator (to be constructed)
KISTI GSDC
KISTI (Korea Institute of Science and Technology Information)
• Government-funded research institute for IT, founded in 1962
• 600 people working on National Information Service (distribution & analysis), supercomputing and networking
• Operating supercomputing and NREN infrastructure
  • Supercomputer: 307.4 TFlops at peak (ranked 14th in the Top500 in 2009; 201st now)
  • NREN infrastructure: KREONet2
    • Domestic: Seoul ←(100G)→ Daejeon
    • International: HK ←(10G)→ Chicago/Seattle (member of GLORIAD)
GSDC (Global Science experiment Data hub Center)
• Government-funded project to promote research experiments by providing computing power and storage
  • HEP: ALICE, CMS, Belle, RENO
  • Others: LIGO, Bioinformatics
• Running a data-intensive computing facility
  • 13 staff: sysadmin, experiment support, external relations, administration
  • Total 6,000 cores, 6,500 TB disk and 1,500 TB tape storage
[Figure: GSDC Facility]
History of GSDC
• 7 years of experience running a grid computing centre in collaboration with the ALICE experiment and WLCG
[Timeline; years shown: 2007, 2009, 2010, 2011, 2012, 2013, 2014]
Milestones: ALICE T2 operation start, Formation of GSDC, ALICE T2 Test-bed, ALICE T1 Test-bed, KISTI Analysis Facility, ALICE T1 candidate, Full T1 for ALICE, CMS T3
GSDC System Overview
• Batch systems
  • Torque/MAUI: 3,500 slots (ALICE T1, Belle, RENO)
  • HTCondor: 2,500 slots (CMS T3, LIGO, KIAF)
• Storage systems
  • IBM TSM/GPFS
  • HITACHI USP/VSP, EMC Clariion/VNX
  • HITACHI HNAS, EMC Isilon
• 4 spine switches, 74 leaf switches
• 500+ servers in 22 racks, 14 storage racks, 4 tape frames
• 40 racks in total!
[Diagram labels: Public, Private; storage capacities shown: 1.5 PB, 1.5 PB, 4.0 PB]
System Management
• Services are defined in Puppet (manifests, profiles)
  • Stash is used for Puppet code management
• Nodes are created/provisioned via Foreman with Puppet classes
  • Any VMs are managed by the Red Hat solution
• Centralized authN/authZ is provided via IPA (SSO to be implemented)
• JIRA helps to track issues and to manage projects
• Confluence is a useful tool for documentation and sharing
[Diagram: JIRA (project, issue tracking); Stash (Puppet code management via Git); Confluence (documentation & space); Foreman (node definition, provisioning); Puppet (manifests, profiles); version label: v3.7.4]
Pledges
Pledged resources, with installed capacity in parentheses:

Resource      2014              2015              2016
CPU (HS06)    25,000 (28,800)   28,000 (28,800)   31,000 (31,000)
Disk (TB)     1,000 (1,000)     1,000 (1,000)     1,500 (1,500)
Tape (TB)     1,500 (1,000)     1,500 (1,500)     1,500 (1,500)

2015 pledges were fully fulfilled at the end of last year
ALICE ONLY
Jobs
[Plots, Mar 2015 to Sep 2015: job count at ~2500, with ~100 queued agents; chart label: KISTI, 4.06%]
• 2,688 concurrent jobs = 28 kHS06
  • 84 nodes, 32 (logical) cores per node, 10.5 HS06/core
  • 2015 pledges
• Stable and smooth running
  • No issues
• Completed 2.1M jobs in the last 6 months
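The slot and HS06 figures above can be cross-checked with a little arithmetic. The following is a minimal sketch in Python that uses only the numbers quoted on this slide; the variable names are illustrative, and it assumes one job slot per logical core.

```python
# Cross-check of the job-slot and HS06 figures quoted on the slide.
nodes = 84             # worker nodes
cores_per_node = 32    # logical cores per node
hs06_per_core = 10.5   # HS06 score per logical core

# Assumption: one job slot per logical core.
job_slots = nodes * cores_per_node
total_hs06 = job_slots * hs06_per_core

print(f"job slots: {job_slots}")
print(f"capacity:  {total_hs06 / 1000:.1f} kHS06")
```

Under that assumption the slot count matches the 2,688 concurrent jobs, and the capacity comes out just above the 28,000 HS06 pledged for 2015.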
ALICE ONLY
Storage
• Disk: 1000 TB
  • Usage > 75%
  • Managed by XRootD
• Tape: 1500 TB
  • 1,019 TB RAW data Pb-Pb & p-Pb (from ALICE)
  • Tape system: IBM TS3500
  • Managed by TSM/GPFS
• Available tape buffer = 400 TB
  • Keeps replicas to compensate for tape's low R/W performance
  • Managed by XRootD
99% availability (last 6 months) for R/W
[Plot: 3 Years Usage History (KISTI_GSDC::[SE2|TAPE]), starting Oct 2012; annotation: Run2 Data Taking; 1,019 TB used (tape), 725.3 TB used (disk)]
ALICE ONLY
Site Availability/Reliability
• 100% reliable for the last 6 months (from Mar-2015 to Aug-2015)
  • Monthly target for reliability of ALICE test: 97%
  • Less than 10 days of yearly downtime
• On track for a stable and reliable site
  • Participating in weekly WLCG operations meetings (twice per week, Mon/Thu): reporting operation-related issues
  • 24/7 monitoring & maintenance contract
  • 2 persons responsible for on-call

               Mar   Apr   May   Jun   Jul   Aug   Average (6M)
Reliability    100   100   100   100   100   100   100
Availability    95   100    99   100    98   100    99
< Monthly Availability/Reliability (%) >
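The six-month averages in the table can be reproduced from the monthly values. The sketch below is illustrative only: the list and variable names are not from the original, and it assumes the averages are plain arithmetic means rounded to the nearest percent.

```python
# Monthly figures (%) copied from the table, Mar to Aug 2015.
reliability = [100, 100, 100, 100, 100, 100]
availability = [95, 100, 99, 100, 98, 100]
target = 97  # monthly reliability target of the ALICE test (%)

# Assumption: the "Average (6M)" column is a plain arithmetic mean.
avg_reliability = sum(reliability) / len(reliability)
avg_availability = sum(availability) / len(availability)

# Count the months in which reliability met the target.
months_meeting_target = sum(r >= target for r in reliability)

print(f"average reliability:  {avg_reliability:.1f}%")
print(f"average availability: {avg_availability:.1f}%")
print(f"months at or above target: {months_meeting_target} of {len(reliability)}")
```

Rounded to the nearest percent, the results agree with the averages in the table.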
ALICE ONLY
KISTI Domain Network for T1
[Network diagram: backbone router, physical firewall, 2 core switches]
KISTI-CERN Network (LHCOPN)
10 Gbps upgrade done by the end of April 2015
• Dedicated circuit 10G + 10G SURFnet (backup link included)
• Operated by Kreonet, KISTI
• GLORIAD provides 3rd backup
Performance
• > 9 Gbps peak (~1 GB/s) observed
  • CERN IT provided a gateway, 500 parallel transfers
  • xrd3cp crashed with XRootD v3.3.4 (fixed in v4 or later)
• Max 1 GB/s peak @ alimonitor.cern.ch
  • Confirmed full capacity
[Plot: CERN→KISTI (5 min), MRTG @ MX960, 10G enabled]
[Plot: CERN→KISTI, alimonitor.cern.ch, average: 65 MB/s; CERN IT gateway, multi-stream: 500, max peak: 1 GB/s]
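The link between the quoted bit rates and byte rates is a factor of eight. A minimal sketch of the conversion follows, assuming decimal units (1 GB = 1000 MB) and ignoring protocol overhead; the function and variable names are illustrative.

```python
def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert a rate in Gbit/s to GB/s (8 bits per byte, decimal units)."""
    return gbps / 8


link_capacity = gbps_to_gigabytes_per_s(10)   # nominal 10G link
observed_peak = gbps_to_gigabytes_per_s(9)    # > 9 Gbps peak seen in MRTG

# Average rate shown on the alimonitor plot, as a fraction of the link.
average_mb_per_s = 65
average_fraction = average_mb_per_s / (link_capacity * 1000)

print(f"10 Gbps link: {link_capacity:.3f} GB/s")
print(f"9 Gbps peak:  {observed_peak:.3f} GB/s")
print(f"65 MB/s average: {average_fraction:.1%} of the link")
```

A 9 Gbps peak therefore corresponds to slightly more than 1 GB/s, in line with the maximum peak reported by alimonitor.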
KISTI-ASIA
• Connected to JP, US, CN, TW (ASGC) and HK
  • via Kreonet2
  • JP connected through APAN
• Not connected to TEIN @ HK
  • Detailed talks tomorrow
[Map labels: Tsukuba, Wuhan]
T1 Operations
Pledged resources, with installed capacity in parentheses:

Resource      2014              2015              2016
CPU (HS06)    25,000 (28,800)   28,000 (28,800)   31,000 (31,000)
Disk (TB)     1,000 (1,000)     1,000 (1,000)     1,500 (1,500)
Tape (TB)     1,500 (1,000)     1,500 (1,500)     1,500 (1,500)

• (CPU) Worker nodes are already allocated but not yet in production
• (Disk) New disk storage was installed last week; data migration will be scheduled soon
  • EMC Clariion (1 PB, 2011) -> EMC VNX (1.5 PB, 2015)
• (Tape) Unchanged
  • A further 500 TB will be procured next year for 2017 pledges
• (System)
  • XRootD upgrade v3.3.4 -> v4.1.3 or later
2016 pledges will be fulfilled at the end of this year
ALICE ONLY
Network
• Concern about the KISTI-ASIA network was reported to Kreonet & TEIN
  • Detailed talk at the TEIN-GLORIAD-KR joint session
• The current 10 Gbps dedicated link between Daejeon and Chicago could be replaced by Kreonet once its upgrade is done