
CISL Update
Operations and Services
CISL HPC Advisory Panel Meeting

29 October 2009

Anke Kamrath
[email protected]

Operations and Services Division
Computational and Information Systems Laboratory

1 CHAP Meeting 14 May 2009

Overview

• Staff Comings and Goings in OSD
• Computing, Storage, Data and Software Updates
• University Workshops
• Preparations for NWSC already underway
  – Security Review
  – User Surveys
  – Accounting/Allocation Systems
  – HPSS Migration
  – RFP Process for NWSC Procurement

Staff Comings and Goings…

• Changes
  – Director of Operations and Services
    • Anke Kamrath, joined NCAR July 2009
  – Consulting Services Group (CSG)
    • Si Liu, consultant, replaced Wei Huang (moved to VETS)
  – Computer Production Group (CPG)
    • Ronald Reed, operator, replaced Ken Albertson (moved to RAL)
  – Data Analysis Systems Group (DASG)
    • Kendall Southwick, Student Assistant on Vapor
  – Supercomputer Systems Group (SSG)
    • Irfan Elahi, promoted to group head
    • Russell Gonsalves
    • Nate Rini
• Openings
  – CSG Consulting Position, SE III (Mike Page replacement)
  – DASG Vapor Position, SE I opening
  – Co-location Manager Position

Computing Update

• Strong Demand for Bluefire
  – 2 large-memory bluefire nodes redeployed into batch pool. Bluefire now has 120 32-processor batch nodes
  – IPCC AR5 ramping up (Sept 09-Dec 10)
    • 12M GAUs needed to complete
    • Will generate 1.3 PB of data
  – High Utilization, Low Queue Wait Times
• Decommissioning Lightning and Pegasus
• 3 Racks Added to Frost (BlueGene/L System)
  – TeraGrid resource (>20 Tflops)
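A quick back-of-the-envelope check of the figures on this slide can be sketched as follows. This is only an illustration: the constant and function names are made up here, 1 PB = 1000 TB is an assumption, and the 16-month window assumes that "Sept 09-Dec 10" counts both end months.

```python
# Figures quoted on the slide
BATCH_NODES = 120            # 32-processor batch nodes on bluefire
PROCESSORS_PER_NODE = 32
AR5_DATA_PB = 1.3            # data the IPCC AR5 runs are expected to generate

# Assumptions made for this sketch (not stated on the slide)
AR5_MONTHS = 16              # Sept 2009 through Dec 2010, inclusive
TB_PER_PB = 1000             # decimal petabytes


def batch_processors(nodes, per_node):
    """Total processors available in the batch pool."""
    return nodes * per_node


def average_monthly_output_tb(total_pb, months, tb_per_pb=TB_PER_PB):
    """Average data volume per month if output were spread evenly."""
    return total_pb * tb_per_pb / months


total_procs = batch_processors(BATCH_NODES, PROCESSORS_PER_NODE)
monthly_tb = average_monthly_output_tb(AR5_DATA_PB, AR5_MONTHS)

print(f"Batch processors: {total_procs}")
print(f"Average AR5 output: {monthly_tb:.2f} TB/month")
```

Real output will not be spread evenly over the campaign, so the monthly figure is best read as an order-of-magnitude planning number.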

High Utilization, Low Queue Wait Times

• Bluefire system utilization is routinely >90%
• Average queue-wait time < 1hr

[Figure: Bluefire System Utilization (daily average), Nov-08 to Nov-09; y-axis 0-100%. Annotation: Spring ML Powerdown May 3 Firmware & Software Upgrade]

[Figure: Average Queue Wait Times for User Jobs at the NCAR/CISL Computing Facility; x-axis: Queue-Wait Time (<2m to >2d); y-axis: # of Jobs; series: Premium, Regular, Economy, Standby]

Average queue wait times by queue (all jobs):

             bluefire       lightning      blueice         bluevista
             since Jun'08   since Dec'04   Jan'07-Jun'08   Jan'06-Sep'08
Peak TFLOPs  77.0           1.1            12.2            4.4
Premium      00:05          00:11          00:11           00:12
Regular      00:36          00:16          00:37           01:36
Economy      01:52          01:10          03:37           03:51
Stand-by     01:18          00:55          02:41           02:14
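The queue-wait histogram groups jobs into bins labelled by an upper bound (<2m, <5m, ... <1d, and so on up to >2d). A minimal sketch of that kind of bucketing is shown below. It is not the facility's actual accounting code: the bin edges are read off the axis labels, the "<2d" bin is inferred from the final ">2d" label, and the function names and sample wait times are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

MINUTE, HOUR, DAY = 60, 3600, 86400

# Upper bounds (in seconds) of each bin, taken from the histogram's axis labels.
BIN_EDGES = [2 * MINUTE, 5 * MINUTE, 10 * MINUTE, 20 * MINUTE, 40 * MINUTE,
             1 * HOUR, 2 * HOUR, 4 * HOUR, 8 * HOUR, 1 * DAY, 2 * DAY]
BIN_LABELS = ["<2m", "<5m", "<10m", "<20m", "<40m",
              "<1h", "<2h", "<4h", "<8h", "<1d", "<2d", ">2d"]


def wait_bin(wait_seconds):
    """Return the label of the first bin whose upper bound exceeds the wait."""
    return BIN_LABELS[bisect_right(BIN_EDGES, wait_seconds)]


def wait_histogram(waits_seconds):
    """Count jobs per bin, keeping the bins in axis order."""
    counts = Counter(wait_bin(w) for w in waits_seconds)
    return {label: counts.get(label, 0) for label in BIN_LABELS}


# Hypothetical wait times in seconds, for illustration only.
sample_waits = [30, 90, 4 * MINUTE, 36 * MINUTE, 3 * HOUR, 3 * DAY]
histogram = wait_histogram(sample_waits)
```

A wait that lands exactly on an edge (for example exactly 2 minutes) goes into the next bin, which matches a strict "less than" reading of the labels.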

Archive Update

• NCAR MSS exceeds 8 PBs total data stored (Sep 2009)
• MSS Transition to HPSS
  – Target date Jan 2011
  – Erich to present more on this.
• AMSTAR Data Ooze (migrating data to new tape technology)
  – Optimized ooze software deployed July 2009. Streams tape to tape, creates two copies in parallel.
  – Oozing on average 20+ TBs per day, 6 PBs of data to ooze of which 4 PBs are unique
  – Estimated completion date Mar-Apr 2010, well ahead of the Dec 2010 EOL on Powderhorn tape libraries
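The ooze figures can be turned into a rough duration estimate. The sketch below assumes 1 PB = 1000 TB, treats the quoted 20 TB/day as a constant rate, and uses a purely hypothetical start date of 1 July 2009 (the slide only says the optimized software was deployed in July 2009). The function names are illustrative.

```python
import math
from datetime import date, timedelta


def days_to_migrate(total_pb, rate_tb_per_day, tb_per_pb=1000):
    """Whole days needed to move total_pb at a constant daily rate."""
    return math.ceil(total_pb * tb_per_pb / rate_tb_per_day)


def projected_finish(start, total_pb, rate_tb_per_day):
    """Calendar date on which the migration would finish."""
    return start + timedelta(days=days_to_migrate(total_pb, rate_tb_per_day))


# Figures from the slide: 6 PB to ooze at 20+ TB per day.
duration = days_to_migrate(6, 20)

# Hypothetical start date, assumed for illustration only.
ASSUMED_START = date(2009, 7, 1)
finish = projected_finish(ASSUMED_START, 6, 20)

print(f"{duration} days, finishing around {finish.isoformat()}")
```

Because 20 TB/day is quoted as a lower bound ("20+"), the result is better read as a worst case than as a forecast.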

Storage Update: Data Services Redesign

• Pending management review
• Near-term Goals:
  – Creation of unified and consistent data environment for NCAR HPC
  – High-performance availability of central filesystem from many projects/systems (RDA, ESG, CAVS, Bluefire, TeraGrid)
• Longer-term Goals:
  – Filesystem cross-mounting
  – Global WAN filesystems (GPFS and/or Lustre)
• Tentative Schedule:
  – Phase 1: March 2010

Data Collections Update: Major update of International Comprehensive Ocean-Atmosphere Data Set (ICOADS) in the RDA

• Release 2.5 covers 1662-2007
• Many data sources added over the full period of record
• NEW monthly automated update procedure adds preliminary data, now for 2008 - Sept. 2009
• ICOADS is a 25-year collaborative project with NOAA

[Figure: Annual distribution (1937-2007) of major platform types in Release 2.5 shown as millions of reports per year.]

Software Update

• Version 1.5.0 released
  – Numerous new features aimed at WRF community
    • Geo-referenced imagery
    • Integration with NCL
    • Moving domains
  – Block structured AMR Grids
  – Ease of use improvements
• New funding streams
  – NSF XD Vis Award
    • NCAR, TACC, Utah, Purdue
    • 2009-2012, $386k
    • Focus: petascale DAV
  – TeraGrid GIG Award
    • 2009-2011, $112k
    • Focus: progressive access data models

[Figure: This typhoon Jangmi image from MMM’s Bill Kuo and Wei Wang illustrates several new capabilities. A moving, nested domain is tracked by VAPOR. Satellite imagery and plots generated by NCL are correctly positioned in the 4D scene with geo-referencing.]

Support for Summer ‘09 Workshops

• Climate Modeling Primer, July 27-31
  – Organized by CGD (Gettelman lead) and held at NCAR
  – 41 participants, mostly graduate students
  – Ran low resolution versions of CAM; some work with CCSM
  – Provided 10-15 dedicated nodes and overview of using bluefire to participants

Support for Summer ‘09 Workshops

• Scaling to Petascale, August 3-7
  – Organized by Great Lakes Consortium for Petascale Computation
  – Supported by the NSF through the Blue Waters Project (NCSA and others)
  – Participants joined in from four locations using high-definition streaming video
  – Ran a variety of applications
  – Provided 16 dedicated nodes during hands-on sessions

Support for Summer ‘09 Workshops (continued)

• Ice Sheet Modeling Workshop, August 3-14
  – Organized by Christina Hulbe (Portland State University), Jesse Johnson (U of Montana), and Cornelis van der Veen (U of Kansas); supported by NSF
  – Brought current and future ice-sheet scientists together to develop better models for the projection of future sea-level rise
  – Provided accounts for each participant and a dedicated share queue on bluefire, since participants were using single-processor codes
  – Some participants will continue to use bluefire based on Johnson’s NSF award for this workshop

http://websrv.cs.umt.edu/isis/index.php/Summer_Modeling_School

Preparations for NWSC Already Underway

– Storage Environment
  • HPSS Migration
  • Data Services Redesign (DSR)
  • Data Workflow Requirements Analysis
– Security Review
  • Balancing “Science and Security”
  • Phase 1: Underway for DSR
  • Phase 2: Global Filesystems
– Accounting/Allocation Systems
  • Current System built in 70s.
  • Reviewing requirements
  • Leveraging other products (e.g., Gold, AmieGold)
– RFP Process for NWSC Procurement
  • Tom Engel to Discuss Further
– User Survey

User Survey Plans

• Q1 2010 survey of active users
  – Use to baseline current environment for NWSC
  – Understand where we’re doing well and where we can improve
    • Investigate any area with >10% dissatisfaction
  – Evaluate future needs
  – Topics
    • Ease of access within current security environment
    • Ease of use (initially and on-going)
    • Consulting (include methodology of service delivery)
    • Documentation for users
    • Training desired and method of delivery
• Follow-up survey ~4 months after NWSC computer available to users
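The ">10% dissatisfaction" rule can be sketched as a simple filter over per-topic response counts. Everything below is hypothetical: the function name, the response format (dissatisfied count, total responses) and the sample numbers are invented for illustration, and the survey's real scoring method is not described in these slides.

```python
def areas_to_investigate(responses, threshold=0.10):
    """Return {area: dissatisfied share} for areas strictly above the threshold.

    responses maps an area name to a (dissatisfied, total) pair of counts.
    """
    flagged = {}
    for area, (dissatisfied, total) in responses.items():
        if total == 0:
            continue  # no responses, nothing to evaluate
        share = dissatisfied / total
        if share > threshold:
            flagged[area] = share
    return flagged


# Made-up counts, keyed by the survey topics listed on the slide.
sample_responses = {
    "Ease of access": (12, 100),
    "Ease of use": (10, 100),
    "Consulting": (3, 60),
    "Documentation": (9, 45),
    "Training": (0, 0),
}

flagged_areas = areas_to_investigate(sample_responses)
for area, share in flagged_areas.items():
    print(f"{area}: {share:.0%} dissatisfied")
```

The comparison is strict, so an area sitting at exactly 10% is not flagged; whether the boundary should be inclusive is a policy choice the slide leaves open.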

On the road to a PetaFlop Computer…

• And 6-15 Petabyte Filesystem
• And +100 Petabyte Archive

Peak TFLOPs at NCAR (All Systems)

CURRENT NCAR COMPUTING >100 TFLOPS

[Figure: Peak TFLOPs at NCAR (All Systems), Jan-00 to Jan-10; y-axis 0-100 peak TFLOPs. Procurement phases marked: ARCS Phase 1, ARCS Phase 2, ARCS Phase 3, ARCS Phase 4, ICESS Phase 1, ICESS Phase 2. Systems shown: IBM POWER6/Power575/IB (bluefire), IBM POWER6/Power575/IB (firefly), IBM POWER5+/p575/HPS (blueice), IBM POWER5/p575/HPS (bluevista), IBM BlueGene/L (frost), IBM Opteron/Linux (pegasus), IBM Opteron/Linux (lightning), IBM POWER4/Federation (thunder), IBM POWER4/Colony (bluesky), IBM POWER4 (bluedawn), SGI Origin3800/128, IBM POWER3 (blackforest), IBM POWER3 (babyblue)]

NWSC HPC Projection

[Figure: Peak PFLOPs at NCAR, Jan-04 to Jan-14; y-axis 0.0-2.0 peak PFLOPs. Procurement phases marked: ARCS Phase 4, ICESS Phase 1, ICESS Phase 2. Series: NWSC HPC (uncertainty), NWSC HPC, IBM POWER6/Power575/IB (bluefire), IBM POWER5+/p575/HPS (blueice), IBM POWER5/p575/HPS (bluevista), IBM BlueGene/L (frost), IBM Opteron/Linux (pegasus), IBM Opteron/Linux (lightning), IBM POWER4/Colony (bluesky)]

NCAR Data Archive Projection

[Figure: Total Data in the NCAR Archive (Actual and Projected), Jan-99 to Jan-15; y-axis: Petabytes, 0-120; series: Total, Unique]

Questions and Discussion
