WestGrid Town Hall: July 13, 2018
Patrick Mann, Director of Operations
Admin
To ask questions:
● Webstream: Email [email protected]
● Vidyo: Use the GROUP CHAT to ask questions.
Outline
1. Federated Research Data Repository (Lee Wilson)
2. RAC 2019 preparation (Patrick Mann)
3. WestGrid Updates (Patrick Mann)
4. Arbutus Cloud expansion & migration (Ryan Enge)
5. User Survey Results & Feedback (Erin Trifunov)
6. Upcoming opportunities (Erin Trifunov)
FRDR
Federated Research Data Repository
Lee Wilson
Service Manager | Portage/ACENET
613.482.9344 ext. 108 | [email protected] | @lee_wilson001
FRDR
FRDR = Federated Research Data Repository (DFDR en français)
https://www.frdr.ca/
A scalable, federated platform for digital research data management and the discovery of Canadian research data
Discovery Deposit Preservation
FRDR: A Collaborative Effort
A collaboration among libraries, institutions, and federal organizations, involving many stakeholders across the RDM & ARC spheres
● Partnership between Compute Canada/WestGrid and the Canadian Association of Research Libraries (CARL)
● Hosted on Compute Canada hardware and infrastructure, with CC providing development and technical support
● Service operated by Portage, including curation and data management support, with steering and input from CARL, the Network of Expertise, and individual institutions
FRDR Discovery
● FRDR’s harvester indexes data repositories across Canada to make research data held in many repositories discoverable from a single platform
● Currently supports OAI-PMH, CKAN, CSW, Marklogic APIs; harvesting 30+ repositories
Goals:
● Supplement existing repository sites
● Improve discovery of research data across Canada
● Break down repository siloing
● Avoid being “just another repository”
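To make the harvesting idea concrete, here is a minimal sketch of one OAI-PMH harvesting step: building a `ListRecords` request and pulling Dublin Core titles plus the paging token out of a response. It is not FRDR's actual harvester, which the slides do not show. The base URL `https://example.org/oai`, the helper names, and the sample response are all hypothetical; only the protocol verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespaces come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # When resuming, the token is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (titles, next_token) from a ListRecords response body."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter(DC_NS + "title")]
    token_el = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Hand-written sample response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset A</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(parse_records(SAMPLE))
```

A real harvester would repeat the request with each returned token until none is left, then index the collected metadata for search.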
FRDR Deposit
Also a full-featured repository for research data publication:
● Designed for scalability, based on Globus Publication
● Storage is geographically distributed, and can be managed centrally through infrastructure providers (e.g., Compute Canada)
● A place for Canadian researchers to deposit large datasets via Globus File Transfer
● A place to deposit datasets if researcher does not have an appropriate local or domain-specific option
● Compliant with internationally recognized “FAIR” best practices
FRDR Preservation Processing
● Archivematica integration: Preservation processing for long-term usability of datasets
■ Converting file formats into future-friendly formats (e.g. docx to PDF)
● Creating Archival Information Packages (AIPs)
■ Scalable, automated Archivematica processing for files of nearly unlimited size and number
● Parallelized over multiple VMs in CC Cloud
● AIPs transferred to Preservation Service Providers for long-term storage
FRDR in Limited Production
Portage and CC launched Limited Production Sep 21, 2017
Limited Production means:
● Anyone can use FRDR to:
– Search across harvested Canadian repositories
– Download datasets stored in FRDR
● Data deposit is restricted to a select number of research groups
– Portage commits to making their data discoverable and accessible into and beyond the full service launch
– Groups working with us to refine our service ahead of production release
Limited Production Projects
Still accepting new partners for Limited Production!
RAC 2019
Resource Allocation Competition
RAC 2019
● Coming up fast.
● Overlap with usual granting councils!
Agency   Grant                                                    Deadline
SSHRC    Partnership Engage Grants                                Sep 17, 2018
CIHR     Operating Grant: Epigenetics Clinical Translation        Oct 10, 2018
SSHRC    Insight Grants                                           Oct 15, 2018
NSERC    Discovery Grants                                         Nov 1, 2018
CIHR     Team Grant: HIV/AIDS Biomedical and Clinical Research    Nov 6, 2018
RAC 2019 Draft Schedule
Key Activities Start Finish
Application Guides Available Late August
Fast Track submission period Sep 19 Oct 25
RRG & RPP submission period Sep 19 Nov 8
RPP Progress Report submission period Nov 27 Jan 10
RAC Award letters sent to PIs Late March
Implementation of RAC 2019 allocations Early April
Out-of-Round applications are still available for new faculty and scientific breakthroughs.
RAC 2019 Resources
Remaining legacy systems will be defunded.
● WestGrid: Orcinus (UBC) March 31, 2019.
● Early days yet, but worthwhile starting on migration.
Overall small increase in total number of cores with more GPUs.
● Up-to-date hardware with better performance
So expect similar kinds of success rates, etc.
Great news: GP4 (Béluga) has been ordered. ~30k modern cores and GPUs.
1. Delivery sometime late this year
2. Installation and testing early Winter 2019
3. Beta-testing later in the winter
4. Available for RAC April 2019.
RAC Preparation
While details are not yet available, expect requirements similar to those for RAC 2018. Worthwhile getting started:
● Performance estimates for justification of asks
○ Especially for the new and unfamiliar systems.
○ How fast are your apps actually running on the new systems?
● Storage usage and estimates
○ Nice if you can clean up storage!
● Cloud (VM and Storage) usage and estimates.
○ And cleanup
● Inactive but valuable data estimates (Nearline)
○ Nearline not yet available, but project is in progress and should be available later in the year.
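As a rough illustration of the performance-estimate step, the sketch below turns measured job timings into a yearly compute ask. It assumes the ask is expressed in core-years (cores x wall-clock hours / hours per year), which these slides do not spell out. The function names and the example numbers are hypothetical; substitute timings measured on the system you are actually requesting.

```python
HOURS_PER_YEAR = 24 * 365  # 8760; leap years ignored for a planning estimate


def core_years(cores_per_job, wall_hours_per_job, jobs_per_year):
    """Estimated yearly compute ask, in core-years (assumed unit)."""
    core_hours = cores_per_job * wall_hours_per_job * jobs_per_year
    return core_hours / HOURS_PER_YEAR


def scale_wall_hours(wall_hours_old, speedup):
    """Rescale a timing from an old system using a measured speedup factor."""
    return wall_hours_old / speedup


if __name__ == "__main__":
    # Hypothetical workload: 500 jobs per year, 32 cores each, 12 h on a legacy system.
    legacy = core_years(32, 12.0, 500)
    # Hypothetical measured speedup of 1.5x on newer hardware.
    newer = core_years(32, scale_wall_hours(12.0, 1.5), 500)
    print(f"legacy estimate: {legacy:.1f} core-years")
    print(f"new-system estimate: {newer:.1f} core-years")
```

The reason to benchmark on the new systems is visible here: the speedup factor feeds straight into the size of the ask.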
WestGrid Updates
Patrick Mann, WestGrid Director of Operations
Cedar Expansion
Compute expansion is complete (in production April 15 2018), Storage expansion in progress.
● 60,656 cores, 584 GPUs, Intel Omnipath
● 3.5 PFlops of theoretical peak performance
Count  Node Type         Cores/node  Memory/node  Details
576    Base 128G         32          125G         2 x Intel E5-2683 Broadwell 2.1 GHz
128    Large 256G        32          250G         (ditto)
24     Large 512G        32          502G         (ditto)
24     Bigmem1500 1.5T   32          1510G        (ditto)
4      Bigmem3000 3.0T   64          3022G        4 x Intel E7-4809 Broadwell 2.1 GHz
114    Base GPU          24          125G         2 x E5-2650 + 4 x NVidia P100 (12 GB)
32     Large GPU         24          250G         2 x E5-2650 + 4 x NVidia P100 (16 GB)
640    Skylake           48          187G         2 x Intel Skylake 2.1 GHz (30,720 core expansion)
https://docs.computecanada.ca/wiki/Cedar
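The node table can be tallied with a few lines of arithmetic. The sketch below copies the counts and cores per node from the table and assumes 4 GPUs on each GPU node, as the "4 x NVidia P100" entries indicate. The GPU total and the Skylake subtotal reproduce the 584 GPUs and the 30,720-core expansion quoted on the slide. The overall core sum is only the arithmetic over the rows as listed, so it may differ from the headline core count if the slide omits or rounds any node class.

```python
# (count, node type, cores per node, GPUs per node), copied from the table above.
NODES = [
    (576, "Base 128G", 32, 0),
    (128, "Large 256G", 32, 0),
    (24, "Large 512G", 32, 0),
    (24, "Bigmem1500 1.5T", 32, 0),
    (4, "Bigmem3000 3.0T", 64, 0),
    (114, "Base GPU", 24, 4),
    (32, "Large GPU", 24, 4),
    (640, "Skylake", 48, 0),
]


def total_cores(nodes):
    """Sum of count x cores-per-node over the given rows."""
    return sum(count * cores for count, _name, cores, _gpus in nodes)


def total_gpus(nodes):
    """Sum of count x GPUs-per-node over the given rows."""
    return sum(count * gpus for count, _name, _cores, gpus in nodes)


if __name__ == "__main__":
    skylake = [row for row in NODES if row[1] == "Skylake"]
    print("GPUs:", total_gpus(NODES))
    print("Skylake expansion cores:", total_cores(skylake))
    print("Cores over all listed rows:", total_cores(NODES))
```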
Cedar CPU Usage
April 15: expansion available
Early May: scheduling issues with large bursts of jobs
May 28 to June 1: major Cedar outage for system expansion and updates. (Insufficient notice!)
Original: 24,192 cores
Expansion: 60,656 cores (up-to-date total)
Cedar GPU Usage
584 GPUs (4 x NVidia P100s per node).
April 10 (Town Hall): ~50% usage
Current: seems to be peaking at ~400 GPUs.
≳70% usage, variable
(Chart axis: GPU-days)
Outages & Issues
Arbutus cloud, July 22: Short BCNet outage for router/network maintenance.
● External connections will be affected.
Graham, Aug. 21-24: 4 day major outage
● 2 days for power systems upgrade to 3 MW
● 2 days for Lustre shared filesystem upgrade
Orcinus, August 25: 2 day power systems maintenance.
Cedar, end of August: 4 day major outage
● OS updates (to CentOS 7.5)
● Related Lustre and Omnipath upgrades
● Dependent on Lustre client testing.
Arbutus cloud, September: Major outage for cloud expansion (details in cloud presentation)
● Dependent on hardware delivery
Scheduling, always! Nothing major but the usual issues with scheduling a wide mix of jobs into saturated systems.
Arbutus Update
Ryan Enge
UVIC Site Lead & Cloud National Team Lead
Manager, Research Computing Services
University Systems
University of Victoria
Arbutus Upgrade
● Increased capacity
○ Additional ~1400 compute cores
○ Additional ~3.5 PB usable storage
○ Managed DB Service: 2 new DBaaS nodes (also available at Cedar and Graham)
● Updated OpenStack Version (Queens)
● New deployment system
● New monitoring/alerting infrastructure
● Increased performance, scalability and stability
Arbutus Migration
● New URL for access:
○ https://arbutus.cloud.computecanada.ca
● Command line access?
○ Update API end-points to arbutus
○ Update Keystone to V3
● Migration required!
● Webinar in early September on What’s New/Migration Process
● Upgrade/Migration information
○ https://docs.computecanada.ca/wiki/Arbutus_West_Cloud_Upgrade
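For command-line users, the two updates above usually mean changing the `OS_*` environment variables that OpenStack clients read. The sketch below assembles a Keystone v3 style environment. The variable names are the standard OpenStack client ones, but the port `5000`, the `/v3` path, the `Default` domain names, and the user and project names are assumptions for illustration. The authoritative values are in the RC file downloaded from the cloud dashboard and in the migration page linked above.

```python
from urllib.parse import urlparse


def keystone_v3_env(auth_url, username, project_name,
                    user_domain="Default", project_domain="Default"):
    """Return OS_* variables for a Keystone v3 endpoint (hypothetical helper)."""
    path = urlparse(auth_url).path.rstrip("/")
    if not path.endswith("/v3"):
        # Catch leftover v2.0 endpoints from a pre-migration configuration.
        raise ValueError("auth_url should point at a Keystone v3 endpoint")
    return {
        "OS_AUTH_URL": auth_url,
        "OS_IDENTITY_API_VERSION": "3",
        "OS_USERNAME": username,
        "OS_PROJECT_NAME": project_name,
        # Domain-scoped names are new in v3; the values here are assumed.
        "OS_USER_DOMAIN_NAME": user_domain,
        "OS_PROJECT_DOMAIN_NAME": project_domain,
    }


def as_exports(env):
    """Render the variables as shell export lines."""
    return "\n".join(f'export {key}="{value}"' for key, value in env.items())


if __name__ == "__main__":
    # Assumed endpoint, user, and project; check your own RC file.
    env = keystone_v3_env(
        "https://arbutus.cloud.computecanada.ca:5000/v3", "alice", "def-alice"
    )
    print(as_exports(env))
```

The password is deliberately left out. Clients can prompt for it, which keeps it out of shell history and scripts.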
Cloud Survey
Tell Us How We Can Improve Your Cloud Experience
● English version: https://www.surveymonkey.com/r/XRK7GL9
● French version: https://fr.surveymonkey.com/r/RFWJC85
BCNET Outage Sunday July 22, 6:00-10:00am PDT
● Impacts external network to the Arbutus Cloud
Survey Results
Erin Trifunov
Manager, Projects and Outreach
WestGrid
WestGrid User Survey - 2017
Resources Needs
Future needs
WestGrid User Support
Services & Training
User feedback...
2018 Renewals
Satisfaction by Province / Region
On a scale of 1-5 (1=least, 5=most), how would you rate your satisfaction over the past 12 months with:
1. The resources (such as computing, storage, cloud and Globus) that have been made available to you (either as an individual researcher or as a leader of a research group).
2. The services (such as user support, training) that have been provided to support your research.
3. Your overall satisfaction with Compute Canada.
Province / Region   % of Total WG / CC Users   Resources made available   Services provided   Overall satisfaction
AB                  33%                        4.18                       4.20                4.28
BC                  53%                        4.12                       4.08                4.16
MB                  6%                         4.18                       4.24                4.28
SK                  8%                         4.09                       4.11                4.15
WestGrid            30%                        4.15                       4.16                4.22
ON                  33%                        4.22                       4.26                4.32
QC                  25%                        4.27                       4.36                4.37
ACENET              6%                         4.31                       4.43                4.38
International       5%                         4.38                       4.42                4.46
Grand Total         100%                       4.22                       4.26                4.31
Overall Satisfaction
2018 Compute Canada Renewal Survey: Overall Satisfaction with Compute Canada (8675 respondents, 2581 from within WestGrid, 37% PIs)
2017 WestGrid User Survey: Overall Satisfaction with Compute Canada (283 respondents, 28% PIs)
Industry Activity
General Feedback
- Not happy with mandatory, linked survey
- Migration caused disruptions to research
- Concerns over allocation process & perceptions of fairness
- Still not enough resources!
- Suggestions for community forum, longer wall times, and more cloud support
General Updates
Summer School Recap
At UBC June 11-14
● 172 attendees
● 15 half- or full-day courses, up to 3 parallel streams
At the University of Manitoba June 25-28
● 79 attendees
● 12 half- or full-day courses, 2 parallel streams
Two summer schools in 2019
● locations will be announced during the academic year
● advanced research computing: most courses will assume working knowledge of Linux command line
● possible new courses: Julia, code optimization, more bioinfo, ...
● intro HPC
● parallel programming
○ Chapel
○ CUDA
○ OpenMP
○ multi-core Python 3
○ PETSc
● serial programming
○ Python
○ C/C++ functions from Python
○ R for bioinformaticians and microbiologists
● databases
● Compute Canada cloud
● containers for HPC
● domain-specific
○ molecular dynamics
○ next-gen sequencing
● scientific visualization
● commercial
○ MATLAB
○ Microsoft Azure Cloud
○ Amazon Web Services
Visualize This! in the fall
1. Draw attention to popular 3D open-source tools and workflows for sci-vis
2. Find innovative visualization techniques and make them accessible to all researchers
● Annual event since 2016
● Last year: 88 dataset downloads, fantastic final submissions
● Multiple vendor prizes
This year:
● molecular dynamics + possible humanities track
● look for announcement in mid-September
User Training Archive
https://westgrid.github.io/trainingMaterials/
● Videos, slides, hands-on exercises & other materials from past training sessions
● Links to other guides, documentation & upcoming events
CC Awards
Career Achievement: Masao Fujinaga, University of Alberta
Team Choice Award: Kamil Marcinkowski, University of Alberta
FULL STORY ONLINE
Website
Visit www.westgrid.ca for latest research profiles, news, training etc. to help stay up to date with ARC.
Support
Contact us anytime: [email protected]
www.westgrid.ca
docs.computecanada.ca
Any issues or problems? We can advocate for WG member and user concerns within Compute Canada.