WestGrid Town Hall: July 13, 2018

Patrick Mann, Director of Operations

Admin

To ask questions:
● Webstream: Email info@westgrid.ca
● Vidyo: Use the GROUP CHAT to ask questions.

Outline

1. Federated Research Data Repository (Lee Wilson)
2. RAC 2019 preparation (Patrick Mann)
3. WestGrid Updates (Patrick Mann)
4. Arbutus Cloud expansion & migration (Ryan Enge)
5. User Survey Results & Feedback (Erin Trifunov)
6. Upcoming opportunities (Erin Trifunov)

FRDR

Federated Research Data Repository
Lee Wilson
Service Manager | Portage/ACENET
613.482.9344 ext. 108 | lee.wilson@ace-net.ca | @lee_wilson001

FRDR

FRDR = Federated Research Data Repository (DFDR en français)

https://www.frdr.ca/

A scalable, federated platform for digital research data management and the discovery of Canadian research data

Discovery Deposit Preservation

FRDR: A Collaborative Effort

A collaboration among libraries, institutions, and federal organizations, involving many stakeholders across the RDM & ARC spheres

● Partnership between Compute Canada/WestGrid and the Canadian Association of Research Libraries (CARL)

● Hosted on Compute Canada hardware and infrastructure, with CC providing development and technical support

● Service operated by Portage, including curation and data management support, with steering and input from CARL, the Network of Expertise, and individual institutions

FRDR Discovery

● FRDR’s harvester indexes data repositories across Canada to make research data held in many repositories discoverable from a single platform

● Currently supports OAI-PMH, CKAN, CSW, Marklogic APIs; harvesting 30+ repositories
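As a rough illustration of what harvesting over one of these protocols involves, the sketch below builds an OAI-PMH `ListRecords` request URL. It is not FRDR's harvester code. The endpoint URL and the helper name are hypothetical placeholders; the `verb`, `metadataPrefix` and `from` parameters come from the OAI-PMH specification.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting: only records changed since this date
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, used only for illustration
url = build_list_records_url("https://repository.example.org/oai",
                             from_date="2018-01-01")
print(url)
```

A real harvester would also fetch the URL, parse the XML response and follow any `resumptionToken` to page through large result sets.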

Goals:

● Supplement existing repository sites

● Improve discovery of research data across Canada

● Break down repository siloing

● Avoid being “just another repository”

FRDR Deposit

Also a full-featured repository for research data publication:

● Designed for scalability, based on Globus Publication

● Storage is geographically distributed, and can be managed centrally through infrastructure providers (e.g., Compute Canada)

● A place for Canadian researchers to deposit large datasets via Globus File Transfer

● A place to deposit datasets if researcher does not have an appropriate local or domain-specific option

● Compliant with internationally recognized “FAIR” best practices

FRDR Preservation Processing

● Archivematica integration: Preservation processing for long-term usability of datasets

■ Converting file formats into future-friendly formats (e.g. docx-->PDF)

● Creating Archival Information Packages (AIPs)

■ Scalable, automated Archivematica processing for nearly unlimited size files and numbers of files

● parallelized over multiple VMs in CC Cloud

● AIPs transferred to Preservation Service Providers for long-term storage

FRDR in Limited Production

Portage and CC launched Limited Production Sep 21, 2017

Limited Production means:

● Anyone can use FRDR to:

– Search across harvested Canadian repositories

– Download datasets stored in FRDR

● Data deposit is restricted to a select number of research groups

– Portage commits to making their data discoverable and accessible into and beyond the full service launch

– Groups working with us to refine our service ahead of production release

Limited Production Projects

Still accepting new partners for Limited Production!

RAC 2019

Resource Allocation Competition


● Coming up fast.
● Overlap with usual granting councils!

Agency  Grant                                                   Deadline
SSHRC   Partnership Engage Grants                               Sep 17, 2018
CIHR    Operating Grant: Epigenetics Clinical Translation       Oct 10, 2018
SSHRC   Insight Grants                                          Oct 15, 2018
NSERC   Discovery Grants                                        Nov 1, 2018
CIHR    Team Grant: HIV/AIDS Biomedical and Clinical Research   Nov 6, 2018

RAC 2019 Draft Schedule

Key Activities Start Finish

Application Guides Available Late August

Fast Track submission period Sep 19 Oct 25

RRG & RPP submission period Sep 19 Nov 8

RPP Progress Report submission period Nov 27 Jan 10

RAC Award letters sent to PIs Late March

Implementation of RAC 2019 allocations Early April

Out-of-Round applications are still available for new faculty and scientific breakthroughs.

RAC 2019 Resources

Remaining legacy systems will be defunded.
● WestGrid: Orcinus (UBC) March 31, 2019.
● Early days yet, but worthwhile starting on migration.

Overall small increase in total number of cores with more GPUs.
● Up-to-date hardware with better performance

So expect similar kinds of success rates, etc.

Great news: GP4 (Béluga) has been ordered. ~30k modern cores and GPUs.
1. Delivery sometime late this year
2. Installation and testing early Winter 2019
3. Beta-testing later in the winter
4. Available for RAC April 2019.

RAC Preparation

Details are not yet available, but expect requirements similar to those for RAC 2018. It is worthwhile getting started on:
● Performance estimates for justification of asks
  ○ Especially for the new and unfamiliar systems.
  ○ How fast are your apps actually running on the new systems?
● Storage usage and estimates
  ○ Nice if you can clean up storage!
● Cloud (VM and storage) usage and estimates
  ○ And clean up
● Inactive but valuable data estimates (Nearline)
  ○ Nearline is not yet available, but the project is in progress and it should be available later in the year.
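One way to turn benchmark timings into a compute ask is to convert them to core-years. The sketch below is a minimal example under stated assumptions: the helper name and all job counts and timings are invented placeholders, and a core-year is taken to be one core running for 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: one core-year = one core for 365 days


def core_years(cores_per_job, hours_per_job, jobs_per_year):
    """Estimate an annual compute ask in core-years (hypothetical helper)."""
    return cores_per_job * hours_per_job * jobs_per_year / HOURS_PER_YEAR


# Placeholder numbers, not taken from any real allocation
estimate = core_years(cores_per_job=32, hours_per_job=12, jobs_per_year=500)
print(f"{estimate:.1f} core-years")
```

Measuring `hours_per_job` on the system being requested, rather than on a legacy cluster, keeps the estimate tied to the hardware that will actually run the work.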

WestGrid Updates
Patrick Mann, WestGrid Director of Operations

Cedar Expansion

Compute expansion is complete (in production April 15, 2018); storage expansion is in progress.
● 60,656 cores, 584 GPUs, Intel Omnipath
● 3.5 PFlops of theoretical peak performance

Count  Node Type          Cores  Memory  Details
576    Base 128G          32     125G    2 x Intel E5-2683 Broadwell 2.1 GHz
128    Large 256G         32     250G    (ditto)
24     Large 512G         32     502G    (ditto)
24     Bigmem1500 1.5T    32     1510G   (ditto)
4      Bigmem3000 3.0T    64     3022G   4 x Intel E7-4809 Broadwell 2.1 GHz
114    Base GPU           24     125G    2 x E5-2650 + 4 x NVidia P100 (12 GB)
32     Large GPU          24     250G    2 x E5-2650 + 4 x NVidia P100 (16 GB)
640    Skylake            48     187G    2 x Intel Skylake 2.1 GHz (30,720 core expansion)

https://docs.computecanada.ca/wiki/Cedar
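The GPU total and the size of the Skylake expansion quoted above can be cross-checked from the node counts in the table. This is a small sketch; the variable names are illustrative only and the counts are copied from the table.

```python
# (node count, GPUs per node) for the two GPU node types in the table
gpu_nodes = [(114, 4), (32, 4)]
total_gpus = sum(count * per_node for count, per_node in gpu_nodes)

# Skylake expansion: 640 nodes with 48 cores each
skylake_cores = 640 * 48

print(total_gpus)     # 584
print(skylake_cores)  # 30720
```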

Cedar CPU Usage

April 15: expansion available

Early May: scheduling issues with large bursts of jobs

May 28 to June 1: major Cedar outage for system expansion and updates. (Insufficient notice!)

Chart annotations: original system 24,192 cores; after expansion 60,656 cores (up-to-date total).

Cedar GPU Usage

584 GPUs (4 x NVidia P100s per node).

April 10 (Town Hall): ~50% usage

Current: ≳70% usage, variable; seems to be peaking at ~400 GPUs.

(Chart axis: GPU-days)

Outages & Issues

Arbutus cloud (July 22): Short BCNet outage for router/network maintenance.
● External connections will be affected.

Graham (Aug. 21-24): 4 day major outage
● 2 days for power systems upgrade to 3 MW
● 2 days for Lustre shared filesystem upgrade

Orcinus (August 25): 2 day power systems maintenance.

Cedar (end of August): 4 day major outage
● OS updates (to CentOS 7.5)
● Related Lustre and Omnipath upgrades
● Dependent on Lustre client testing.

Arbutus cloud (September): Major outage for cloud expansion (details in cloud presentation)
● Dependent on hardware delivery

Scheduling (always!): Nothing major but the usual issues with scheduling a wide mix of jobs into saturated systems.

Arbutus Update
Ryan Enge
UVIC Site Lead & Cloud National Team Lead
Manager
Research Computing Services
University Systems
University of Victoria

Arbutus Upgrade

● Increased capacity
  ○ Additional ~1400 compute cores
  ○ Additional ~3.5PB useable storage
  ○ Managed DB Service: 2 new DBaaS nodes (also available at Cedar and Graham)
● Updated OpenStack Version (Queens)
● New deployment system
● New monitoring/alerting infrastructure
● Increased performance, scalability and stability

Arbutus Migration

● New URL for access:
  ○ https://arbutus.cloud.computecanada.ca
● Command line access?
  ○ Update API end-points to arbutus
  ○ Update Keystone to V3
● Migration required!
● Webinar in early September on What’s New/Migration Process
● Upgrade/Migration information
  ○ https://docs.computecanada.ca/wiki/Arbutus_West_Cloud_Upgrade
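The command-line changes listed above (new API end-point, Keystone V3) usually come down to a handful of OpenStack client environment variables. The sketch below assembles them as shell `export` lines. The variable names (`OS_AUTH_URL`, `OS_IDENTITY_API_VERSION` and the domain settings) are standard OpenStack client settings. The port and path in the auth URL, the `Default` domain, and the project and user names are assumptions for illustration, so take the real values from the RC file provided by the new dashboard.

```python
import shlex


def keystone_v3_env(auth_url, project_name, username, domain="Default"):
    """Return OpenStack client settings for Keystone V3 (hypothetical helper)."""
    return {
        "OS_AUTH_URL": auth_url,
        "OS_IDENTITY_API_VERSION": "3",
        "OS_PROJECT_NAME": project_name,
        "OS_USERNAME": username,
        "OS_USER_DOMAIN_NAME": domain,      # assumption: domain name may differ
        "OS_PROJECT_DOMAIN_NAME": domain,   # assumption: domain name may differ
    }


# Assumed auth URL (port and /v3 path are not stated in the slides)
env = keystone_v3_env("https://arbutus.cloud.computecanada.ca:5000/v3",
                      project_name="my-project", username="my-user")
for name, value in sorted(env.items()):
    print(f"export {name}={shlex.quote(value)}")
```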

Cloud Survey

Tell Us How We Can Improve Your Cloud Experience
● English version: https://www.surveymonkey.com/r/XRK7GL9
● French version: https://fr.surveymonkey.com/r/RFWJC85

BCNET Outage Sunday July 22, 6:00-10:00am PDT
● Impacts external network to the Arbutus Cloud

Survey Results
Erin Trifunov
Manager, Projects and Outreach
WestGrid

WestGrid User Survey - 2017

Resources Needs

Future needs

WestGrid User Support

Services & Training

User feedback...

2018 Renewals

Satisfaction by Province / Region

On a scale of 1-5 (1=least, 5=most), how would you rate your satisfaction over the past 12 months with:

1. The resources (such as computing, storage, cloud and Globus) that have been made available to you (either as an individual researcher or as a leader of a research group).
2. The services (such as user support, training) that have been provided to support your research.
3. Your overall satisfaction with Compute Canada.

Province / Region  % of Total WG / CC Users  Resources made available  Services provided  Overall satisfaction
AB                 33%                       4.18                      4.20               4.28
BC                 53%                       4.12                      4.08               4.16
MB                 6%                        4.18                      4.24               4.28
SK                 8%                        4.09                      4.11               4.15
WestGrid           30%                       4.15                      4.16               4.22
ON                 33%                       4.22                      4.26               4.32
QC                 25%                       4.27                      4.36               4.37
ACENET             6%                        4.31                      4.43               4.38
International      5%                        4.38                      4.42               4.46
Grand Total        100%                      4.22                      4.26               4.31

Overall Satisfaction

2018 Compute Canada Renewal Survey
Overall Satisfaction with Compute Canada (8675 respondents, 2581 from within WestGrid, 37% PIs)

2017 WestGrid User Survey
Overall Satisfaction with Compute Canada (283 respondents, 28% PIs)

Industry Activity

General Feedback

- Not happy with mandatory, linked survey
- Migration caused disruptions to research
- Concerns over allocation process & perceptions of fairness
- Still not enough resources!
- Suggestions for community forum, longer wall times, and more cloud support

General Updates

Summer School Recap

At UBC June 11-14
● 172 attendees
● 15 half- or full-day courses, up to 3 parallel streams

At the University of Manitoba June 25-28
● 79 attendees
● 12 half- or full-day courses, 2 parallel streams

Two summer schools in 2019
● locations will be announced during the academic year
● advanced research computing: most courses will assume working knowledge of Linux command line
● possible new courses: Julia, code optimization, more bioinfo, ...

● intro HPC
● parallel programming
  ○ Chapel
  ○ CUDA
  ○ OpenMP
  ○ multi-core Python 3
  ○ PETSc
● serial programming
  ○ Python
  ○ C/C++ functions from Python
  ○ R for bioinformaticians and microbiologists
● databases
● Compute Canada cloud
● containers for HPC
● domain-specific
  ○ molecular dynamics
  ○ next-gen sequencing
● scientific visualization
● commercial
  ○ MATLAB
  ○ Microsoft Azure Cloud
  ○ Amazon Web Services

Visualize This! in the fall

1. Draw attention to popular 3D open-source tools and workflows for sci-vis

2. Find innovative visualization techniques and make them accessible to all researchers

● Annual event since 2016
● Last year: 88 dataset downloads, fantastic final submissions
● Multiple vendor prizes

This year:
● molecular dynamics + possible humanities track
● look for announcement in mid-September

User Training Archive

https://westgrid.github.io/trainingMaterials/

● Videos, slides, hands-on exercises & other materials from past training sessions

● Links to other guides, documentation & upcoming events

CC Awards

Career Achievement: Masao Fujinaga, University of Alberta

Team Choice Award: Kamil Marcinkowski, University of Alberta

FULL STORY ONLINE

Website

Visit www.westgrid.ca for latest research profiles, news, training etc. to help stay up to date with ARC.

Support

Contact us anytime: support@westgrid.ca

www.westgrid.ca
docs.computecanada.ca

Any issues or problems? We can advocate for WG member and user concerns within Compute Canada.