Overview of the Computer Resource Team (CRT)

Blaise Barney (LLNL)
Rob Cunningham (LANL)
Barbara Jennings (Sandia)

PSAAP Kickoff Meeting
July 8, 2008
Albuquerque, NM

LLNL-PRES-405061




What Is The CRT?

The Computer Resource Team (CRT) is the component of the PSAAP program that connects Alliance researchers to the High Performance Computing (HPC) resources they need to perform their work.

The CRT consists of one representative from each NNSA Lab who is familiar with that Lab's computing resources, personnel and policies. The following individuals serve on the CRT:

• Blaise Barney, LLNL

• Rob Cunningham, LANL

• Barbara Jennings, SNL

Our primary purpose is to provide assistance and guidance on all aspects of using the HPC resources located at LANL, LLNL and Sandia (and at SDSC).


What Does The CRT Do For You?

Assist with the establishment and use of computer accounts

Assist with accessing compute resources

Provide essential HPC user documentation

Provide technical support and referral to in-depth consulting

Conduct monthly telecons to keep Alliance users up-to-date with account, access, policy, scheduling and technical issues, and to address issues with HPC platform usage

Interface with other individuals and groups within the Labs, such as management, networking, system administration, storage, customer support, etc., to facilitate the effective support of Alliance users

Track and facilitate the resolution of problems reported to each Lab's customer support “hotline”

Provide training opportunities

Collect and distribute monthly machine usage statistics

Schedule and support special/dedicated runs

Maintain a balance of machine usage between the Alliances

Conduct annual Alliance visits to discuss HPC resources and user issues, and to offer technical consultation and/or training

Showcase Alliance research in the NNSA/ASC research exhibit booth at the annual SC conference


HPC Compute Resources Available To The Alliances


Computer Accounts

Each Alliance needs at least one account authorizer. This can be a PI, a POC (point of contact) and/or a trustworthy, knowledgeable designee

Account authorizers are responsible for overseeing the accounts and machine usage for all of their Center's users

Each Lab has its own policies, forms and procedures; however, there is a single entry portal (sarape.sandia.gov) for requesting an account at any of the 3 labs

Account processing for non-US citizens requires additional time and “paperwork”: allow 30-90 days (plan ahead)

Having a “backup” authorizer is important if the primary authorizer is often unavailable

The CRT has sent all PSAAP POCs and PIs “quick sheets” for getting started with account requests and account management.

Questions? Contact the CRT representative for the Lab where the account is requested


Computer Access

To access any machine, you must first have an account on that machine

As with accounts, each lab has its own access policies and procedures

All 3 labs require a valid computer account, ssh, and a password-generating device (cryptocard / one-time token), which is sent to you after your initial account request is approved

Additionally, LLNL requires remote users to access resources through a VPN (virtual private network):

• Makes your local machine appear to be on the LLNL network

• VPN accounts are included with original account applications

• Requires either a one-time software download, installation and configuration, or simply connecting through a web interface


User Documentation

Most of what users need to know is available online via web pages hosted by each of the labs. Recommended starting points:

• LLNL
  – computing.llnl.gov
  – computing.llnl.gov/tutorials/lc_resources

• LANL
  – computing-int.lanl.gov
  – int.lanl.gov/projects/asci/training/Intro

• Sandia
  – hpc.sandia.gov
  – clik.sandia.gov

• SDSC
  – www.sdsc.edu/us

Access to this information varies:

• LLNL, SDSC: most web pages are open; no authentication is required

• Sandia, LANL: most web pages require authentication (an account must be set up first)


HPC Training

Training is important – especially for new users

Online tutorials are available (see previous User Documentation links)

Workshops conducted at the Labs are open to Alliance users

Training delivered at your Center or over the Access Grid is also possible

Topics include:

• Getting Started Information

• Compilers

• Performance tools, Optimization

• Debuggers

• Parallel programming (MPI, OpenMP, Pthreads…)

• Batch schedulers

• Architectures (Purple, Red Storm, TLCC, etc.)

• Visualization tools

Topic-specific, customized training? The CRT can assist here too.


Customer Support and Problem Tracking

Customer support for technical and accounting issues is available via phone and email during normal business hours.

Problems and questions are tracked via a customer support database application (varies with each Lab).

Most problems/questions are handled via “Tier 1” support – the “hotline” at each Lab.

More in-depth issues are typically referred to local “Tier 2” support – a specialist.

The labs coordinate with hardware and software vendors for issues that require outside “Tier 3” support.

Off-hours support is handled by Operations staff.

CRT reps coordinate regularly with each other on Tri-lab user issues.


Dedicated Runs (DATs)

Normally, Alliance users share machine usage with other users - jobs are typically submitted to a batch system, queued, and wait their turn for execution.

Additionally, there are limits on the number of nodes and number of hours that a job can use.

Exclusive use of a machine (dedicated application time, or DAT) can be requested by any Alliance. For example, at LLNL:

• Most weekends are dedicated to Alliance use of the ALC and UP clusters

• Normal node/time limits are not in effect

• No other user jobs are run, only those of the scheduled Alliance(s)

How to request a DAT:

• LLNL: computing.llnl.gov/forms/ASC_dat_form.html

• LANL: email to [email protected]

• Sandia: email to [email protected]


Communications

Monthly telecons and email list ([email protected])

• Active participation by all 8 Alliances, LLNL, LANL, Sandia and SDSC

• Forum for discussion/questions on user topics such as accounts, access, technical issues, machine schedules, etc.

• First Wednesday of each month, 1:00 pm Pacific time

• Toll-free number hosted by the CRT: 866-914-3976, code: 187522#

• Minutes are distributed via our email list to all Alliances, ASC HQ and various staff & managers within the Labs

• Let us know if you want anyone else at your Center added to our list; initially it includes only your POC and PI

Usage stats

• Collected by the CRT and distributed with the telecon minutes

• Present both aggregate and detailed usage (down to the user level) for each Lab (and SDSC).


Communications

Email & phone

• Customer support staff at each lab are available for assistance and also send out important machine/network status notices.

• The CRT can be contacted directly by any of your Center's users:
  – Blaise Barney (LLNL) [email protected] 925-422-2578
  – Rob Cunningham (LANL) [email protected] 505-665-4444 x05704
  – Barbara Jennings (Sandia) [email protected] 505-845-8554

Visits

• Annual visits (2-4 hrs) to the Alliances by the CRT and Lab customer support staff:
  – Focus is on the Alliance users of HPC computing resources
  – Updates on architectures, policies and future plans at the Labs
  – Forum for discussing user issues, problems and questions
  – Technical "training" sessions can also be included if desired

• We'll be contacting you soon to set up an initial visit after your users have accounts, possibly in the Sep-Oct time frame?


Questions?