Research Computing
Muataz Al-Barwani, Ph.D.
December 2019
Outline
• Center for Research Computing
• High Performance Computing
• Computational Research at NYUAD
• Research Support Services
Center for Research Computing
The Center for Research Computing (CRC) at NYU Abu Dhabi offers a set of services for researchers and partners with them to support their use of technology as an enabler of their research activities.
Members of our team engage researchers and faculty across all academic divisions, centers and institutes at NYU Abu Dhabi.
We provide High Performance Computing (HPC), Research Application Hosting, Research Professional Services, Research Lab Support and Research Data Science Services.
High Performance Computing at NYUAD
Our goals are to:
• Deliver high-quality, efficient High Performance Computing
services that exceed researchers’ and faculty expectations
• Expand and adapt HPC services to meet researchers’ and
faculty members’ changing needs for High Performance
Computing
• Maximize utilization of HPC resources by building awareness and
promoting the use of HPC services to the research community
• Collaboration with the Technology Industry
• NYUAD is an HPE beta testing site
• Testing and collaboration with 2CRSI
• Network testing and collaboration with Mellanox and others
• Training and Knowledge Transfer
• Executive training for the Navy staff on HPC management and operation
• HPC Research and Development
• Development of HPC Enhanced Software Management Environment – Presented
at HPC Saudi 2017
• Development of a novel MPI optimization method – Patent filed
HPC Center of Excellence
• Locally & Regionally:
• Established the UAE HPC Collaboration
Network; members include Khalifa
University and UAEU, as well as
Ankabut and ADNOC
• Worked closely with American
University of Sharjah (AUS) to
establish an HPC Center at AUS
• Collaboration with KAUST in KSA
• Collaboration with Sultan Qaboos
University (SQU) in Oman
• Collaboration with OMREN in Oman
HPC Collaboration
BuTinah: Our first HPC Cluster
What is BuTinah?
BuTinah was NYU Abu Dhabi’s first High Performance Computing (HPC)
cluster, delivered in April 2012. It was named after Bu Tinah, a tiny protected nature
reserve in the waters of Abu Dhabi.
In brief, it was a 70 TFLOPS cluster, ranked 397 in the TOP500 in June 2012, built
in 15 racks and consisting of:
• 512 Nodes
• 6144 cores
• 48GB RAM each
• 8 High Memory Nodes
• 96 cores
• 192 GB RAM each
• 1 Very Large Memory Node,
• 32 cores
• 1 TB of RAM
• 16 GPU nodes with
• Single NVIDIA Tesla M2070Q
• 96 GB RAM
• 16 Visualization nodes
• NVIDIA Quadro FX 2800M
• All connected through 4xQDR
Infiniband (IB) @ 40 Gb/s
• With 900 TB of Storage (NAS and
Distributed/Parallel)
• Tape backup system (Server, tape
drives and library)
BuTinah Utilization
[Figure: monthly average utilization (%) by month, November 2012 to July 2016]
Dalma: NYUAD’s latest HPC Cluster
What is Dalma?
Dalma is NYU Abu Dhabi’s current HPC cluster, launched in 2016.
It is named after Dalma Island, one of the oldest known permanent settlements in
the UAE, with some of the earliest evidence of date palm cultivation, going
back 7,000 years.
In brief, it is a 385 TFLOPS (12,000-core) cluster hosted at the NYUAD Data Center
in Saadiyat, in 20 racks, consisting of:
• 432 Nodes each with
• 28 Broadwell cores
• 128GB RAM each
• 3 Very Large Memory Nodes,
• 64 - 72 cores
• 2 TB of RAM
• 10 GPU Nodes
• 32 Nvidia Tesla V100 GPUs
• Over 3.5 PB of Parallel Storage
(3.3 PB Lustre, 200 TB BeeGFS)
• Over 3 PB Archive (400 TB Disk
and 2.5 PB Tape)
• All connected through a 1 to 1
non-blocking Mellanox EDR
Infiniband (IB) @ 100 Gb/s
• Database server and Viz nodes
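As a rough consistency check, the quoted core count can be reproduced from the node figures above. This is only a sketch: the variable names are illustrative, it counts the 432 standard nodes alone (leaving out the very large memory and GPU nodes is an assumption), and it assumes 1 TB = 1024 GB.

```python
# Figures taken from the slide; names are illustrative, not from any NYUAD tool.
STANDARD_NODES = 432
CORES_PER_NODE = 28       # Broadwell cores per standard node
RAM_PER_NODE_GB = 128

# Aggregate over the standard nodes only (assumption: special nodes excluded).
total_cores = STANDARD_NODES * CORES_PER_NODE
total_ram_tb = STANDARD_NODES * RAM_PER_NODE_GB / 1024  # assumes 1 TB = 1024 GB

print(f"Standard-node cores: {total_cores}")
print(f"Standard-node RAM:   {total_ram_tb:.1f} TB")
```

The product comes to 12,096 cores, in line with the roughly 12,000 cores quoted for the cluster.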
Dalma Utilization
Dalma Growth
Academic Year | Compute                                 | Storage - Scratch | Storage - Archive (Disk + Tape)
2016          | 236 Nodes                               | 900 TB            | 1 PB
2017          | + 44 = 280 Nodes; faculty owned         | No change         | No change
2018          | + 148 = 428 Nodes; + 10 GPU nodes*      | No change         | Additional tapes
2019          | Visualization nodes; add Year 4 support | + 2.5 PB          | + 3 PB
2020          | Refresh; new compute & network          | No change         | No change
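The running node totals in the table can be re-derived from the yearly additions. A minimal sketch, assuming only the standard compute nodes are counted (GPU and visualization nodes are left out); the dictionary name is illustrative.

```python
from itertools import accumulate

# Standard compute nodes added per academic year, as listed in the table.
nodes_added = {2016: 236, 2017: 44, 2018: 148}

# Cumulative node count after each year's additions.
running_totals = dict(zip(nodes_added, accumulate(nodes_added.values())))

for year, total in running_totals.items():
    print(f"{year}: {total} nodes")
```

The totals for 2017 and 2018 come out as 280 and 428, matching the figures stated in the table.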
NYUAD’s next HPC Cluster
Coming soon!
Planned Launch 2020
Reason for Refresh
• Dalma: end of life in 2020
• Increase in cost of support
• Obsolete technology
• No room for growth
• Network limitation
• Space & power limitations in
data center
• Need more compute
• More projects
• New Faculty/Researchers
New HPC
Computational Research at NYUAD
HPC Research Publications
Chemistry, 10
Climate Modeling, 25
Computer Science, 3
Engineering, 4
Genomics, 16
Mathematics, 5
Physics, 16
Social Science, 3
Publications up to Oct 2018: 82
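The per-field counts can be checked against the stated total and turned into shares. A small sketch using only the numbers on the slide; the dictionary and variable names are illustrative.

```python
# Publication counts per field, as listed on the slide (up to Oct 2018).
publications = {
    "Chemistry": 10,
    "Climate Modeling": 25,
    "Computer Science": 3,
    "Engineering": 4,
    "Genomics": 16,
    "Mathematics": 5,
    "Physics": 16,
    "Social Science": 3,
}

total = sum(publications.values())

# Share of all publications per field, in percent, rounded to one decimal.
shares = {field: round(100 * n / total, 1) for field, n in publications.items()}

# Print fields from most to fewest publications.
for field, n in sorted(publications.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{field:<17} {n:>3}  {shares[field]:>5.1f}%")
print(f"{'Total':<17} {total:>3}")
```

The counts sum to 82, matching the stated total, and Climate Modeling accounts for the largest share.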
Artificial Intelligence
Molecular Modeling & Simulations
Chemistry
Climate Modeling
Above: Fluid injection
A simulation that ran on 700 cores for
about 8 days and produced 325 GB
of data, which after post-processing
yields 6 seconds of flow
visualization.
Left: Q-criterion Iso-Contours
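To put the fluid injection run in perspective, its cost can be expressed in core-hours. This is a back-of-the-envelope sketch that treats "about 8 days" as exactly 8 full days on all 700 cores, which is an assumption; the names are illustrative.

```python
# Figures from the slide; the run time is approximate ("about 8 days").
CORES = 700
RUN_DAYS = 8
OUTPUT_GB = 325
VISUALIZED_SECONDS = 6

# Total compute consumed, assuming all cores were busy for the whole run.
core_hours = CORES * RUN_DAYS * 24

# Cost and data volume per second of visualized flow.
core_hours_per_second = core_hours / VISUALIZED_SECONDS
gb_per_second = OUTPUT_GB / VISUALIZED_SECONDS

print(f"Core-hours: {core_hours}")
print(f"Core-hours per visualized second: {core_hours_per_second:.0f}")
print(f"Raw data per visualized second: {gb_per_second:.1f} GB")
```

Under these assumptions the run used about 134,400 core-hours, or roughly 22,400 core-hours per second of visualized flow.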
Engineering
100 Date Palm Project
Genomics
http://www.thenational.ae/uae/winners-of-khalifa-international-date-palm-award-announced
NovaSeq 6000
Social Science
Research Support Services
Research Computing Support
Research Application Hosting Services
• Network Storage (Research Storage)
• Co-location services
• Managed Server
• Managed Network
• Managed Storage
• Managed Application
Research Lab Support Services
• Transition support service
• Integration support service
Research Professional Services
• Research grant support
• Scientific applications support
• Training
• Programming and Algorithm
development support
Research Data Science Services
• Data Analytics
• Data Visualization
• Data Management
• Big Data
• Artificial Intelligence (AI) Support
Compute & Storage @ Saadiyat
• IaaS (Infrastructure as a Service)
• 2 Virtual Hosts providing 100-200 VMs
• 64 Physical Workstation Blades
• Over 1.2 PB of Storage
• Backed-up to disk and tape
Compute & Storage
• Network Storage
• Total ~ 1.2 PB
• Allocated ~ 686 TB (56%)
• Utilized ~ 540 TB (44% of total, 78% of allocated)
• Co-Location
• Hosting 50 Servers / Workstations
• Managed Server
• 64 Physical Blades
• 119 various VMs
• 50 hosted Servers / Workstations
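The network storage percentages can be approximately reproduced from the quoted capacities. A minimal sketch, assuming the total is exactly 1200 TB (the slide only says "~ 1.2 PB") and 1 PB = 1000 TB; the helper `pct` and the variable names are illustrative.

```python
# Capacities from the slide; TOTAL_TB is a rounded assumption ("~ 1.2 PB").
TOTAL_TB = 1200
ALLOCATED_TB = 686
UTILIZED_TB = 540


def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


allocated_of_total = pct(ALLOCATED_TB, TOTAL_TB)
utilized_of_total = pct(UTILIZED_TB, TOTAL_TB)
utilized_of_allocated = pct(UTILIZED_TB, ALLOCATED_TB)

print(f"Allocated / total:    {allocated_of_total:.1f}%")
print(f"Utilized / total:     {utilized_of_total:.1f}%")
print(f"Utilized / allocated: {utilized_of_allocated:.1f}%")
```

With the rounded total, the first two ratios land about one point above the slide's 56% and 44%, which suggests the slide used a slightly larger exact total. The utilized-to-allocated ratio does not depend on the total and comes out just under 79%.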
Research Application Hosting
• Managed Application
• Core labs Scheduling
Platform for the CTP
• E.g. Ansys, Cadence and
Synopsys for Engineering
• GitHub
• Managed Network
• Malware Lab for CCS
• Managed Storage
• 3 NAS Storages
• Backup & Archiving
Hyper Converged Infrastructure
• HCI Hardware
• Virtual Machines (VMware)
• Containers (Kubernetes)
• Management & Support
(Rancher)
Research & Development
• 4 Compute + 1 GPU Node
• 97 TB Storage (NVMe) via vSAN
• 2.37 TB of Memory
• 2 x V100 GPU Cards
• 25 Gb Ethernet Backend Connectivity
• 36 CPU Cores per Node
• GPU Virtualization through NVIDIA GRID (for VMs and Containers)
• Minimum turnaround time for VM creation and operation through templates
R & D Hardware
Data Science Services
• Data Analytics
• Data Management
• Big Data
• Data Visualization
• Artificial Intelligence
Research Data Science Services
• Data Management & Big Data
• Developing customized data
management plans
• Organizing data (data collection and
analysis)
• Database Management and
Development
• Big Data handling and Processing
• High-memory, multi-processing
computational support.
Research Data Analytics
• Data Analytics
• Assistance with data analysis
using available software or
customized tools (e.g. Power BI,
Tableau & QlikView)
• Developing analysis software and
customized pipelines
• Statistical analysis of results.
Research Data Science Services
• Artificial Intelligence (AI)
Support
• Advanced algorithm design,
development and implementation.
• Parallelization and optimization of code.
• Implementation of state-of-the-art
models, such as image recognition,
social network analysis,
recommendation engines, and speech and
text mining, using deep learning
frameworks.
• Data Visualization
• Viz Wall (3x3 49” HD screens)
• Visualization Resources
• Viz tools (e.g. GIS & Web Maps,
ggplot2 and Matplotlib, Power BI,
Tableau & QlikView)
• Visualization professional
Service
Research Data Visualization
Operating Model
• Support and Consulting
• Collaboration and Projects
Research Data Science Services
• Support and Consulting
We provide short-term support on the following:
• Research data organization: sharing and secure storage
• Research data processing / cleaning
• Research programming
• Selection and interpretation of statistical methods
• Research data visualization
• Using HPC & Cloud
Research Data Science Services
• Collaboration and Projects
We provide extended support & partnership over the lifecycle of a research
project by embedding a data scientist in a research team.
We can:
• Design and implement a data analysis pipeline (including data
analytics, big data and AI)
• Develop prototypes of research-focused software tools.
Questions
Thank You!
Do you have any Questions?