Thinking Forward Through the Past: A Brief History of Supercomputing in Canada and its Emerging Future
Acknowledgements
Compute Ontario gratefully acknowledges the wealth of information provided in reports/articles
published by Allan B. MacIsaac and Mark Whitmore, C3.ca, Compute Canada, LCDRI, and the
Pawsey Supercomputing Centre which served as primary references for the information in this
document. The views expressed in this document are those of Compute Ontario and do not
reflect the opinion of the Province of Ontario, any of the Ontario high-performance computing
consortia, or the authors of the reports/articles referenced in this document.
Table of Contents
EXECUTIVE SUMMARY
WHAT IS A SUPERCOMPUTER?
WHAT IS ADVANCED RESEARCH COMPUTING?
INTERNATIONAL HISTORY OF ARC AND SUPERCOMPUTING
HISTORY OF SUPERCOMPUTING IN CANADA
CANADA’S ARC ECOSYSTEM
CANADIAN DRI PARTNERS
COMPUTE CANADA
INNOVATION SCIENCE AND ECONOMIC DEVELOPMENT (ISED)
PRINCIPLES TO GUIDE A NATIONAL DRI STRATEGY
IMPROVED FEDERATED COORDINATION
ONTARIO’S ARC ECOSYSTEM
THE ONTARIO SUPERCOMPUTING CONSORTIA
WHERE DOES ONTARIO GO FROM HERE?
A WAY FORWARD FOR ONTARIO
DRI ENVIRONMENT – CHALLENGES AND OPPORTUNITIES
THE EMERGING FUTURE
Thinking Forward Through the Past: A Brief History of Supercomputing in Canada and its Emerging Future
Technology has always played a critical role in shaping our societies; it empowers individuals by enabling better information exchange, education, and medical care, creating a more enriching life.
What makes autonomous vehicles possible? What enables the development of smart cities? What is allowing scientists and doctors to develop personalized medicine for a world population of 7.7 billion? The answer to all of these questions is High-Performance Computing (HPC), also called supercomputing.
While there is some debate about popular culture’s first introduction to supercomputers, IBM’s Watson was one influence: in 2011, it successfully competed on the television game show Jeopardy!, winning the first-place prize of $1 million. Initially developed to answer questions posed in natural language, and taking its name from IBM’s first CEO, Thomas J. Watson, the computer introduced the public to machine learning capabilities and optimized hardware.
However, supercomputers had existed long before Watson was revealed to the world and, as
this document explains, have been key to working on problems and equations that are either too
large or too complex for personal computers.
Understanding where we came from and where we are today is critical to understanding the factors that shape our future. Only through studying history can we grasp how things change; only through history can we begin to comprehend the factors that cause change; and only through history can we understand what elements of an institution or a society persist despite change.1 It is principally for these reasons that the team at Compute Ontario began writing this document.

We feel it is timely to release this document to contribute to a national conversation at a time when significant resources to support supercomputers for research are being deployed. Departing from a purely historical account, this document offers commentary on critical lessons from Canada’s supercomputing history that can inform the new national agency currently being formed. Time will tell whether these intentions are realized; as this report’s title indicates, we think forward through the past to create the future that Canada needs.
Nizar Ladak
President and CEO, Compute Ontario
1 Stearns, P., Why Study History, 1998, p. 2
Executive Summary
At the time this report is being written, Canada’s Advanced Research Computing (ARC) ecosystem is undergoing significant changes. A new national organization is being formed to oversee and coordinate the growth of Canada’s ARC sector. Research data management, research software, and advanced research computing are being brought together under a single national coordinating organization. Over half a billion dollars ($572.5M) was identified in the 2018 Federal Budget to support this new organization and the growth of this sector.
This report aims to document key milestones in the evolution of Canada’s advanced research computing endeavour. Understanding this history is critical to appreciating the lessons learned and to building upon past successes as the ecosystem evolves.
A key purpose of this document is to help those engaging with Compute Ontario appreciate the history of supercomputing and how it has shaped Ontario’s ARC ecosystem. Beginning with an international history of supercomputers and then narrowing to Canada, Compute Ontario documents challenges, opportunities, and future considerations for its own provincial ARC ecosystem. Compute Ontario intends to use this document for its own strategic planning purposes at a Board retreat scheduled for the fall of 2019.
Beyond Ontario’s own uses of this material, a key intent of this document is to provide those charged nationally with developing a new organization and enhancing Canada’s ARC ecosystem with lessons learned and constructive advice. We offer five key pieces of advice to ISED and the new national organization, elaborated in the pages that follow.
1. The value of Highly Qualified Personnel (HQP) cannot be over-emphasized. They are
the lifeblood of the ecosystem and arguably Ontario and Canada’s competitive
advantage. However, this advantage is quickly lost if not continually cultivated.
2. Episodic funding and outdated models of cost-sharing have outlived their utility.
Predictable funding models must be implemented as the first order of business for the
new national organization. Knowledge of effective funding models exists within the
system and is informed by decades of experience – capitalize on it, don’t ignore it.
3. Throughout supercomputing history, it has been critical that a researcher-focussed
lens be applied. From the history of C3.ca to challenges in governance, adopting a
researcher-focussed approach is vital.
4. Grass-roots approaches in governance have seen the most benefit and have been
responsible for many of the gains made in Ontario and Canada’s ARC sector. National,
regional, institutional, and consortia led governance approaches can co-exist.
5. Hardware and people enjoy a symbiotic relationship. The systems described in this
document have made the Top 500 for a simple reason. Talent migrated toward
systems. In developing future systems, appreciate that talent must have easy access
in order to cultivate the support systems researchers depend upon.
This report begins with a basic definition of supercomputers and the ARC sector and briefly
summarizes key milestones in the development of each internationally and nationally.
Compute Ontario hopes this report provides a useful perspective for leaders in Canadian
universities, governments, industries, and research organizations wanting to gain a broad
understanding of Canada’s digital research infrastructure ecosystem.
What is a Supercomputer?
Supercomputers are extremely powerful computers designed to work on large and complex problems and data sets that are beyond the capability of normal computers. As research and data sets evolve, so do the resulting challenges and required analyses. The processing capabilities of supercomputers have grown dramatically since the 1965 launch of the CDC 6600, which is generally recognized as the world’s first supercomputer. Built by the “father of supercomputing,” Seymour Cray, the CDC 6600 represented a turning point in the history of research computing, and it set in motion many of the technological developments we see in research today. The fastest supercomputer in 2018 was roughly 70 billion times faster than the CDC 6600!
Modern supercomputers rely on harnessing the compute power of as many as a million
processors working together, in parallel, on the same problem. This is similar to manufacturing
cars in an assembly plant. The most efficient way to build cars is to have separate teams, each
working on specific parts at the same time. One team will build the engine, while another will
build the frame, so that multiple tasks are completed in parallel.
Writing computer codes that can work in parallel and make use of many processors at once is
still a challenging task. Many common applications and codes can only run effectively on one or
perhaps a handful of processors. Extremely skilled programmers or highly qualified personnel
(HQP) are needed to develop, debug, and improve the specialized research codes that can run
effectively on hundreds or thousands of processors. These codes typically need to be modified
and rewritten with each generation of supercomputers.
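The division-of-labour idea described above can be sketched in a few lines of Python. This is a hypothetical illustration, not drawn from the report: a large sum is split into chunks, and a pool of worker processes handles the chunks at the same time. Real HPC codes typically use MPI or OpenMP across many nodes, but the decomposition principle is the same.

```python
# Minimal sketch of parallel decomposition: split one big job into
# chunks and let several worker processes handle them simultaneously.
# Illustrative example only; real supercomputer codes use MPI/OpenMP.
from multiprocessing import Pool

def partial_sum(bounds):
    """Each worker sums the squares over its own slice of the range."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

def parallel_sum_of_squares(n, workers=4):
    # Divide the range [0, n) into one contiguous chunk per worker.
    step = n // workers
    chunks = [(k * step, n if k == workers - 1 else (k + 1) * step)
              for k in range(workers)]
    with Pool(workers) as pool:  # the workers run in parallel
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000))
```

As the paragraph above notes, the hard part in practice is not the arithmetic but restructuring an algorithm so that thousands of such workers can cooperate without idling or waiting on one another.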
Constant technological development keeps developers and designers in a continual race to keep up with and outshine existing supercomputers. Supercomputers improve and evolve rapidly because there is an insatiable need for computing power to tackle ever-larger and more complicated problems with increasing fidelity. More often than not, supercomputers help drive changes that become mainstream and help shape other innovations.
What is Advanced Research Computing?
Modern research, in virtually all domains, often involves significant computational work, which may not require supercomputers and massively parallel codes. Policymakers in Canada introduced the term “advanced research computing” (ARC) to refer to the full range of computing needs of researchers, while using the term “high performance computing” (HPC) to refer to the subset of those needs that can only be met on a supercomputer.
Due to the complexities involved, ARC needs an ecosystem of resources to support it. Like the car manufacturing analogy described earlier, ARC requires HQP to optimize and run algorithms, as well as hardware and software resources, data storage, and data management. This entire ecosystem, combined with networking resources and cybersecurity, constitutes the bulk of the Digital Research Infrastructure (DRI) ecosystem in Canada.
This ecosystem forms the basis of many of the more sophisticated streams of study and
innovation today, such as artificial intelligence (AI), machine learning, personalized genomic
medicine, cleantech, nanotech, among many others. Thus, advanced research computing and
HPC are crucial to accelerate a country’s growth, solve operational challenges, and build a
more competitive economy that is driven by innovative products and services.
International History of ARC and Supercomputing2
ARC and HPC are a relatively recent phenomenon, yet they have done much to change the world over the last 60 years. Although the earliest supercomputing investments date back to the early 1950s,3 most industry insiders consider 1964-65 to be the period when supercomputers were invented and first used to solve industrial problems. What follows is a brief timeline of major milestones in the global history of supercomputing.
1946 - ENIAC
The Electronic Numerical Integrator and Computer, or ENIAC, was completed in 1945 and unveiled in 1946 as the world’s first general-purpose electronic computer. It was used by the United States Army to calculate artillery firing tables. The power and scope of the ENIAC fired up the imagination of the public, and it was often referred to in the media as a “giant brain.”4
1965 – CDC 6600
The CDC 6600 is generally considered to be the first supercomputer in the world. Designed by Seymour Cray, it was up to ten times faster than the previous record holder, the IBM 7030 Stretch. Additionally, the CDC 6600 was approximately the size of four filing cabinets, whereas the IBM 7030 occupied roughly 600 square meters, the size of an average house. Launched by the Control Data Corporation (CDC), the CDC 6600 revolutionized the world of supercomputing.
1972 – ILLIAC IV
Once the CDC 6600 was launched, several processor manufacturers set about improving on the supercomputer through different variations. The ILLIAC IV, launched in 1972, was the first to be built with a parallel architecture, which allowed multiple processors to work together just as they do in supercomputers today. However, poor project management and costs that ran to four times the initial estimate gave the ILLIAC IV a bad name. Despite this, its model formed the basis of the supercomputers we use today.
1976 – Cray-1 and Vector Programming
2 Readers please note, this is not an exhaustive or complete list and editorial liberties were taken in selecting what the authors felt were significant milestones in supercomputing evolution.
3 Matlis, J. (2005, May 31). A Brief History of Supercomputers. Retrieved from https://www.computerworld.com.au/article/132504/brief_history_supercomputers
4 ENIAC (2013, October). Retrieved from https://whatis.techtarget.com/definition/ENIAC
After launching the CDC 6600, Seymour Cray left CDC to start his own venture and build the Cray-1. Cray believed that vector processing, rather than multiprocessing, was the key to building superior supercomputers. In layman’s terms, both vector processors and multiprocessors are parallel processors, but they work differently: a vector processor has a single instruction stream, but each instruction works on an array (or vector) of data items in parallel. At the time, Cray quipped, “If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens?” Priced at $10 million, the Cray-1 increased its owners’ electricity bills ten-fold. Supercomputers built around vector processing dominated the industry for over 20 years but eventually gave way to the parallel architectures that continue to lead sales today.
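The scalar-versus-vector distinction can be illustrated with a toy sketch in plain Python (a hypothetical example, not from the original text). Python’s interpreter still loops under the hood, but the second form expresses the whole-array operation as a single step, which is how vector hardware like the Cray-1 executed it.

```python
# Toy contrast between scalar and vector styles of processing.
# Hypothetical illustration: on a real vector machine, the vector form
# maps to one instruction applied across every element of the array.
from operator import add

def scalar_add(a, b):
    # Scalar style: one explicit add operation issued per element pair.
    out = []
    for x, y in zip(a, b):
        out.append(x + y)
    return out

def vector_add(a, b):
    # Vector style: the whole-array addition expressed as a single
    # operation over the full vectors.
    return list(map(add, a, b))

print(vector_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

Both functions produce the same result; the difference that mattered to Cray was how the work is expressed and dispatched to the hardware.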
1993 – Birth of the Top500
The performance of any processor or computer system can be defined in terms of the number of floating-point operations it can perform per second, commonly referred to as flops. The historic CDC 6600 was capable of up to 3 million flops, while the fastest supercomputer in 2018 had a theoretical peak speed of 200 petaflops, which is 70 billion times faster.
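The “70 billion times” figure can be checked with a quick back-of-envelope calculation using the rounded numbers quoted above:

```python
# Quick check of the speedup claim using the rounded figures in the text.
cdc_6600_flops = 3e6       # CDC 6600: roughly 3 megaflops
peak_2018_flops = 200e15   # fastest 2018 system: ~200 petaflops peak
speedup = peak_2018_flops / cdc_6600_flops
print(f"{speedup:.3g}")    # about 6.67e10, i.e. roughly 70 billion
```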
Since 1993, the world’s supercomputers have been evaluated and ranked on the Top500 list. Updated twice a year, the list ranks supercomputers by the number of flops they achieve on the standard LINPACK benchmark. The list has been particularly helpful because it documents how often today’s supercomputer becomes tomorrow’s fading star amid continuing hardware upgrades and software changes. In 2012, the average age of a system on the list was 1.26 years, and the Top500 had an attrition rate of 190 systems each year.5 This phenomenon is an indicator of how crucial it is for researchers and HQP to continually learn and upgrade their skill sets.

The Top500 list continues to serve as a reference point for the HPC industry. In June 2019, the 53rd edition of the Top500 was released; this edition was significant because, for the first time, only petaflop systems made the list. The total aggregate performance of all 500 systems has now risen to 1.56 exaflops.6
1997 – ASCI Red
The ASCI Red was launched in 1997 under the Accelerated Strategic Computing Initiative (ASCI) of the United States government and was the first supercomputer to break the teraflop barrier. Built by Intel and installed at the Sandia National Lab, it remained the fastest supercomputer in the world for four years and appeared on the Top500 list seven times over a number of years.
2002 – Earth Simulator
In 2002, Japan launched the Earth Simulator to predict tectonic movements and develop solutions to environmental challenges. Designed with the aim of creating a virtual Earth and modelling various environmental simulations, the Earth Simulator was the most powerful supercomputer at
5 Dongarra, J., Meuer, H., Simon, H., Strohmaier, E. (2015, November). The Top500 List and Progress in High Performance Computing. Retrieved from http://www.netlib.org/utk/people/JackDongarra/PAPERS/top500-progress.pdf
6 https://www.top500.org/lists/2019/06/highs/
the time. It featured thirty-two teraflops of performance while its closest competitor ran at just seven. It was the last major supercomputer to utilize classical vector processing.
2004 – IBM Blue Gene
In 2004, the IBM Blue Gene displaced the Earth Simulator, which had led the Top500 list for two years as the world’s best supercomputer. Thanks to its capacity, it was used well beyond its original purpose of simulating protein folding and gene development for biologists.

The IBM Blue Gene revolutionized the world of supercomputing because developers realized supercomputers might reach a point where they would consume as much power as a mid-sized town. The Blue Gene/L was designed to use up to 212,000 low-frequency, low-power processors, which significantly reduced its size, power consumption, and heat generation.
2008 – Roadrunner
Roadrunner, built by IBM in 2008, was the first supercomputer to break the petaflop barrier. It was the fourth-most energy-efficient supercomputer in the world and featured a hybrid design with AMD processors and IBM PowerXCell accelerators. Roadrunner became obsolete five years after it was installed, highlighting the speed of advancement within the supercomputing industry.
2010 – Tianhe-1A
The Tianhe-1A system was the first Chinese system to reach first place on the Top500 list. It was also the first system combining CPUs and GPUs to rank first, achieving a performance of 2.57 petaflops. Designed by the National University of Defense Technology in China, it was used to address research problems ranging from petroleum exploration to the simulation of large aircraft designs.
2011 – K Computer
Named after the Japanese word for ten quadrillion, ‘kei,’ the K Computer was the first to exceed ten petaflops, approximately ten quadrillion flops. It was more powerful than the next five supercomputers on the Top500 list combined. It required a room roughly half the size of a football oval and over 1,000 km of cable, but was energy-efficient for its size.
2018 – Summit
On the November 2018 Top500 list, the Summit system at the Oak Ridge National Lab was the fastest supercomputer in the world, with 191,664 CPU cores and 26,136 GPUs. It is currently the third-most energy-efficient supercomputer in the world and the first system to reach exaop (exa operations per second) speed. Developed by IBM, Nvidia, and Mellanox for the U.S. Department of Energy, Summit is largely used for civilian scientific research.
History of Supercomputing in Canada
1950s – The Birth of Supercomputing in Canada
Canada’s investment in ARC and its applications can be traced back to 1952, when the first research computer was installed at the University of Toronto as a joint initiative between the University and the National Research Council.7 The Ferranti Mark I, acquired for $300,000, was powerful enough to help design the St. Lawrence Seaway, and thereby fix the international boundary between Canada and its southern neighbour, as the US had no non-military computing suitable to the task. While there were a series of additional supercomputer installations during the 1980s, the facilities that emerged eventually shut down for lack of funding. Each was a milestone in Canadian HPC history, and together they characterized the problem with HPC in Canada at the time. As MacIsaac and Whitmore state:

“single generation facilities, with at-best minor upgrades to the hardware before they disappeared; they had no national mandate to support and develop HPC throughout the country, and they never had an opportunity to develop and maintain staff to support the Canadian user community.”8
1962 – The Meteorological Service of Canada
The exception to this lack of government support was the computing resource maintained by the Meteorological Service of Canada, subsequently known as Environment Canada. In 1962, that organization acquired the first of its facilities, which have been used ever since to run weather predictions. While Environment Canada has since maintained first-rate facilities and staff, it has never had a mandate to support the broader scientific community.
1967 – University of Waterloo Red Room
In 1967, University of Waterloo computer science professor Wes Graham earned the moniker “father of computing” when he advocated for and eventually procured an IBM 360 Model 75, the largest academic supercomputer in Canada at the time. The IBM 360 was housed in the university’s Mathematics and Computer building in a room designed with signature bright red floor tiles, which not only helped contain wiring but became quite an attraction at the university. Today, the University of Waterloo still uses the iconic Red Room to house Graham, the supercomputer named after him and one of Canada’s most powerful academic computers. The red tiles are still in use.
1990s – The Beginning of a New Era
Canada had a low profile in the international ARC research community until the late 1990s. In 1997, Brian Unger of the University of Calgary submitted a Natural Sciences and Engineering Research Council of Canada (NSERC) grant application called “HPCnet.” The application boasted 49 signatories from 11 Canadian universities, spanning the country from Victoria, British Columbia to St. John’s, Newfoundland. It was the first time in Canadian history that a
7 MacIsaac, A.B. and Whitmore, M. (2008) High Performance Computing in Canada: The Early Chapters, p. 85
8 MacIsaac, A.B. and Whitmore, M. (2008) High Performance Computing in Canada: The Early Chapters, p. 85
group of universities banded together to build a more sustainable strategy to develop the HPC
ecosystem. It was intended to support access to existing HPC resources, to develop new tools
for using and accessing the facilities, and to foster collaborations. HPCnet was awarded three
years of funding at the level of $175,000 per year, beginning in 1996. A number of critical steps
followed this award that laid the foundation for Canada’s modern-day approach for leadership
and support of ARC.
Three key achievements took place in 1996 that launched Canada’s ARC ecosystem.
1. A group of academic researchers came together to administer the grant and award
funding for support personnel and software development projects;
2. A broad community joined together, with members from the university, government, and
private sectors;
3. The national community of vested stakeholders set forth on an important visioning and
planning mission, culminating in the creation of a new organization, C3.ca and the
publication of, “A Business Case for the Establishment of an Advanced Computation
Infrastructure for Canada.”
The lack of resources nationally was certainly an impediment, but one which the group of researchers at that time was determined to overcome. As an example, an AlphaServer 4100 by Digital Equipment Corporation (DEC), subsequently Hewlett-Packard (HP) Canada, was located at Memorial University of Newfoundland (MUN). DEC and MUN committed to sharing this facility nationally. What began as a consortium of researchers was rapidly evolving into a community that was demonstrating it could successfully share resources across the country.9

C3.ca’s business case presented a plan for a national HPC infrastructure of hardware, software, and personnel joined by a high-speed national network. Considered by many as overly optimistic, the business case presented a notional 7-year budget of approximately $225 million, covering all aspects.10
1999 – 2004 – The Growth of the Canadian HPC Consortia
Many of the early proposals submitted to the CFI were multi-institutional, resulting in the early forerunners of the current HPC consortia. Originally, there were seven regional consortia in Canada: ACEnet in Atlantic Canada, RQCHP and CLUMEQ in Quebec, HPCVL in Eastern Ontario, SciNet in Toronto, SHARCNET in Western Ontario, and WestGrid in the Western provinces. Between 1999 and 2004, these consortia were awarded 12 major CFI awards amounting to over $100 million, with project costs in excess of $250 million overall.11 An important historical lesson is that each of these awards is, in part, attributable to the success of C3.ca and its members. However, the organization itself was not tied to the success or even the existence of any consortium. This left C3.ca free to carry out its primary mission of promoting the need to fund HPC research in Canada.
2007 – The Birth of Compute Canada
In 2007, after extensive consultation with C3.ca, the consortia, and universities, the CFI created the National Platform Fund to “provide generic research infrastructure, resources, services and
9 Ibid., p. 86
10 Ibid., p. 86
11 MacIsaac, A.B. and Whitmore, M. (2008) High Performance Computing in Canada: The Early Chapters, p. 86
facilities that serve the needs of many research subjects and disciplines, and that require periodic reinvestments because of the nature of the technologies,” targeted initially at HPC. The CFI invited a single, national proposal on HPC. The consortia responded with a proposal describing a structure that reflected the value and critical role each consortium played at the time, along with a management and governance structure to ensure a truly national platform. In response, the CFI awarded $150 million of infrastructure and supported the formation of a new organization called Compute/Calcul Canada.
Funding of Canadian HPC Systems and Our Position Globally
The creation of the CFI helped spark a new era in ARC for Canada. The impact can be seen clearly in Fig 1, which shows the Top500 ranking of every Canadian system that has appeared on the list since its inception. The ’90s were largely dominated by government systems, including weather forecasting, and by industry systems. That changed dramatically beginning in 2000-2001, when a burst of academic systems funded by the CFI began to be installed. Since the year 2000, just under 60% of Canadian entries have been academic systems located on university campuses or at affiliated research hospitals, such as Sick Kids and UHN in Toronto, with 46% of them in Ontario, 32% in Quebec, and 22% in the west.
In total, Canada has had 360 systems on the 53 lists published as of June 2019. This corresponds to 1.36% of all entries and is well under what might be expected from the Canadian GDP, which averaged roughly 2.1% of world GDP during the same period. This is consistent with other, independent analyses, such as Compute Ontario’s Technology Investment report, which showed that sales of HPC servers in Canada were roughly 2/3 of the G8 average in the period 2015-2018.
Fig 1 also shows the effect of episodic funding on the academic ARC ecosystem. The number of systems tails off as they age, and then there is a burst of new systems when new funding becomes available. The last two major rounds of CFI funding were the National Platforms Fund (NPF), completed in 2006 with funds flowing at the start of 2009, and the Cyberinfrastructure fund, awarded in 2015 with first systems installed in 2017. Unfortunately, this boom-or-bust cycle of funding inhibits integrated planning in the ARC ecosystem, as discussed in the Compute Ontario Technology Investment report.
November 2002 marked the first time that an academic system (at HPCVL, now CAC) was the top-ranked system in the country. Federally-funded weather-forecasting systems had been the top-ranked Canadian systems for the nine previous years. Since then, ten different academic systems (six from Ontario and two each from Quebec and BC) have ranked as the top Canadian system on all but two of the last 34 lists. The trajectories of these top-ranked Canadian academic systems can be seen in Fig 2.
Several of the systems in Fig 2 are noteworthy for various reasons. An early CFI-funded system was McKenzie at the Canadian Institute for Theoretical Astrophysics, University of Toronto. When it debuted at #38 in the world in June 2003, it was the highest-ranked academic system ever in Canada, despite being built for a total cost of just $900K, which makes it one of the most cost-effective systems ever to appear so high on the list. McKenzie was the first large-scale Canadian example of a “Beowulf” cluster built from commodity
components and featured novel locally-designed network topologies to boost performance12. It
and other systems of this era demonstrated the ARC innovation and capability that can come
from experienced and empowered “small” sites.
The GPC at the University of Toronto remains the highest-ever ranked Canadian academic system, at #16 in June 2009, and the longest-lived Canadian entry, appearing on a total of 14 lists, including six as the fastest Canadian system. It was a workhorse system for Compute Canada, providing 25% of all cycles used by Canadian researchers in the years 2010 through 2015. It was finally retired in April 2018, after almost nine years of operation, having run 43 million jobs and delivered 1.9 billion hours of compute time. When installed, the GPC was the largest cluster in the world with the latest Intel Nehalem CPU. It was also the largest GPFS cluster in the world and was used by IBM for years as a reference site for the scalability of the filesystem.
Quebec has hosted a long series of successful systems, with Mammouth parallel (Mp) at Université de Sherbrooke being another highly-ranked system at #40. It was a long-lived system, top-ranked in Canada for all six of its appearances on the Top500 list. Its successor, Mp2, appeared nine times but, due to stiff competition, was ranked first only on its debut at #41. In the west, Glacier at UBC was the top Canadian system for two lists beginning in Nov 2003, tied McKenzie’s #38 ranking in June 2004, and appeared a total of seven times.
While the CFI was clearly a boon for ARC in Canada, another system that stands out for ranking and longevity was the SOSCIP BlueGene/Q (BGQ), which was funded by FedDev and installed at the University of Toronto in 2012. This was the only Canadian installation of IBM’s unique Blue Gene series, which emphasized massive parallelism, high-speed and high-dimensional interconnects, and world-class energy efficiency. Expanded in 2014 to 65,536 cores, the BGQ was Canada’s fastest system for nine lists in a row – a record which has never been matched.
12 Dubinski, J., Humble, R., Loken, C., Martin, P., Pen, U.-L. (2003). McKenzie: A Teraflops Linux Beowulf Cluster for Computational Astrophysics.
Table 1. Listing of all top-ranked Canadian systems from Nov 2002 onward. Prior to this date,
all top-ranked systems were federal weather/climate systems.

System Name | Site (consortium) | Top500 appearances | Lists as top-ranked in Canada | Top world ranking | Date | Sector
Fire | Queen's (HPCVL) | 2 | 2 | 191 | Nov 2002 | Academic
McKenzie | U Toronto (CITA) | 5 | 1 | 38 | Jun 2003 | Academic
pSeries 690 | IBM | 1 | 1 | 29 | Nov 2003 | Industry
Glacier | UBC (WestGrid) | 7 | 2 | 38 | Jun 2004 | Academic
Mammouth parallel (Mp) | U Sherbrooke (RQCHP) | 6 | 6 | 40 | Jun 2005 | Academic
eServer pSeries | Environment Canada | | | 154 | Nov 2008 | Weather/Govt
TCS | U Toronto (SciNet) | 6 | 1 | 54 | Nov 2008 | Academic
GPC | U Toronto (SciNet) | 14 | 6 | 16 | Jun 2009 | Academic
Mammouth parallel 2 (Mp2) | U Sherbrooke (Calcul Québec) | 9 | 1 | 41 | Nov 2011 | Academic
BGQ | U Toronto (SOSCIP/SciNet) | 12 | 9 | 67 | Nov 2012 | Academic/industry
Cedar | SFU (WestGrid) | 5 | 2 | 86 | Jun 2017 | Academic
Niagara | U Toronto (SciNet) | 3 | 3 | 53 | Jun 2018 | Academic
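For quick comparisons, the table's counts can be captured as structured records. This is a minimal sketch with values transcribed from Table 1, omitting the Environment Canada row, whose appearance counts are not given:

```python
# Table 1 transcribed as (name, Top500 appearances, lists as top-ranked
# in Canada, best world ranking) records.
systems = [
    ("Fire", 2, 2, 191),
    ("McKenzie", 5, 1, 38),
    ("pSeries 690", 1, 1, 29),
    ("Glacier", 7, 2, 38),
    ("Mp", 6, 6, 40),
    ("TCS", 6, 1, 54),
    ("GPC", 14, 6, 16),
    ("Mp2", 9, 1, 41),
    ("BGQ", 12, 9, 67),
    ("Cedar", 5, 2, 86),
    ("Niagara", 3, 3, 53),
]

longest_lived = max(systems, key=lambda s: s[1])   # most Top500 appearances
longest_reign = max(systems, key=lambda s: s[2])   # most lists as top in Canada
best_ranked = min(systems, key=lambda s: s[3])     # best (lowest) world rank

print(longest_lived[0])  # GPC: 14 appearances
print(longest_reign[0])  # BGQ: top Canadian system on 9 lists
print(best_ranked[0])    # GPC: peak world ranking of #16
```

These queries confirm the narrative above: the GPC is both the longest-lived entry and the best-ranked academic system, while the BGQ holds the record for consecutive lists as Canada's fastest.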
Fig. 1. Canadian systems in the Top500 list over the years. Every Canadian entry on the top500
list is plotted, colour-coded by sector (academic, weather/govt, and industry). The world ranking
runs from #1 at the top to #500 at the bottom; the higher, the better. The system in the top-left
corner is the highest-ranked supercomputer ever located in Canada, a NEC vector system
installed by the federal Atmospheric Environment Service (AES) in June 1993. It is evident that
the highest-ranked academic systems cluster near 2003, 2009 and 2017, and then drop with
time, reflecting the vagaries of government funding.
Fig. 2. Top-ranked Canadian systems on the Top500 list. The top-ranked Canadian systems
after June 2002 and their histories on the top500 list are traced; the world ranking runs from #1
at the top to #500 at the bottom, the higher, the better. Prior to this period, the federal
government's weather and climate systems had always been the top-ranked Canadian systems.
Fire (HPCVL), McKenzie, TCS, GPC, BGQ, and Niagara were all Ontario-based systems;
Glacier and Cedar were BC systems; Mp and Mp2 were based in Quebec. Labels show the
rankings of the Ontario-based systems.
Canada’s ARC Ecosystem
As of 2019, Canada has five national systems: Arbutus at the University of Victoria, Graham at
the University of Waterloo, Cedar at Simon Fraser University, Niagara at the University of
Toronto, and Béluga, the most recent addition in 2019, located at the École de technologie
supérieure and operated by Calcul Québec members. Canada has a number of
supercomputers, of which five appear on the June 2019 Top500 list. Together, these systems
are the foundation of Canada's advanced research computing infrastructure. This ecosystem
provides Canadian researchers with the ability and tools to develop innovative products and
services, push the boundaries of research, and engage with the international research
community. Niagara debuted at #53 on the Top500 list in June 2018, while Béluga ranked #14
on the Green500 list of June 2019.
Canadian DRI Partners
Before proceeding in this document, it is important to introduce other key organizations in the
digital research infrastructure ecosystem in Canada. The list of organizations is by no means
exhaustive and any exclusion is not intended to be a slight, but rather an editorial decision for
the purposes of capturing organizations Compute Ontario routinely interacts with to better serve
researchers.
CANARIE
In 1993, CANARIE was formed to create a leading-edge national network for Canadian researchers. Celebrating its 25th anniversary, CANARIE and its 12 provincial and territorial partners form Canada’s National Research and Education Network. These partner organizations are responsible for network installations within specified geographic boundaries. This ultra-high-speed network connects Canada’s researchers, educators, and innovators to each other and to global data, technology, and colleagues.
Beyond the network, CANARIE funds and promotes reusable research software tools and national research data management initiatives to accelerate discovery, provides identity management services to the academic community, and offers advanced networking and cloud resources to boost commercialization in Canada's technology sector. CANARIE's 2015-2020 strategic mandate includes:
● Providing an internationally competitive ultra-high-speed network for Canada's research, innovation, and advanced education communities;
● Developing, demonstrating and implementing next-generation technologies; and
● Assisting firms operating in Canada and Canadian institutions to advance innovation and commercialization of products and services to bolster Canada's technology capabilities.
ORION
ORION is Ontario's only provincial research and education network. Covering 6,000 kilometres, its private network connects regions and over a hundred institutions across the province, including universities, colleges, hospitals and research institutions, as well as many of Ontario's school boards. More than two million people in the research and education sector rely on ORION to share and communicate with each other and to connect to a global grid of similar networks across Canada and around the world. ORION's ultra-fast network can run 50,000 concurrent virtual classrooms. ORION provides cutting-edge 100 Gbps speeds, and over 70% of its network will be upgraded to this speed by the end of 2020. That's 2,000 times faster than the broadband internet available in most Ontario homes.
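The "2,000 times faster" comparison can be checked with one line of arithmetic: dividing the backbone rate by the quoted factor yields the implied home-broadband baseline (the baseline itself is an inference, not stated in the text):

```python
# Sanity check of ORION's speed comparison against home broadband.
backbone_gbps = 100      # ORION backbone speed, from the text
speedup_factor = 2000    # "2,000 times faster", from the text

# Convert Gbps to Mbps, then divide by the quoted factor.
implied_home_mbps = backbone_gbps * 1000 / speedup_factor
print(f"Implied home broadband speed: {implied_home_mbps:.0f} Mbps")  # 50 Mbps
```

A 50 Mbps baseline is a plausible residential broadband rate for the period, so the claim is internally consistent.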
The Canada Foundation for Innovation (CFI)
With an aim to build competitiveness and to encourage and fund research infrastructure, the
Government of Canada created the Canada Foundation for Innovation (CFI) through an Act of
Parliament in April 1997. Since its creation, the CFI has worked to ensure
Canadian researchers have tools such as cutting-edge labs, facilities, and the equipment they
need to push the frontiers of knowledge in all disciplines, and to contribute to the full spectrum
of research, from discovery to technology development. This has allowed Canada’s brightest
minds to contribute to better health outcomes, a cleaner, greener environment, evidence-based
policy-making, and the competitiveness of Canadian businesses.13 As described on the CFI’s
website, motivated by the mantra “Build it, and they will innovate,” the CFI has been
instrumental in providing the necessary funds to grow Canada’s HPC and ARC ecosystem. For
almost a decade, the CFI has sponsored cyberinfrastructure competitions which have resulted
in the five national data clusters described earlier. The CFI provided leadership and funding at a
critical time in Canada’s supercomputing history, laying the foundation upon which the sector
will grow in the coming years.
Compute Canada
Compute Canada, a national not-for-profit organization funded by the CFI, was launched to
accelerate and consolidate the ARC ecosystem in the country. Compute Canada works in
partnership with the following regional organizations to provide essential services and
infrastructure for researchers and to further their collaborations across all academic and
industrial areas of study:
• ACENET: The Atlantic Computational Excellence Network is a consortium of Atlantic
Canadian Universities
• Compute Ontario: was established in 2014 and serves the ARC requirements of
Ontario.
• Calcul Québec: a research consortium consisting of 9 members serving all Quebec
universities and colleges
• WestGrid: a regional partner for 15 institutions across British Columbia, Alberta,
Saskatchewan, and Manitoba
Through this partnership across regional organizations, Compute Canada accelerates research
and innovation across the country. Together, the regional organizations and Compute Canada
work toward creating a comprehensive framework that supports Canadian researchers.
Canadian researchers access these resources and infrastructure by applying to Compute
Canada and the respective regional organizations through an annual process called the
Resource Allocation Competition (RAC). Non-academic users can be granted access on a
case-by-case basis.
13 Our History – CFI. Retrieved from https://www.innovation.ca/our-history
Components of the ARC ecosystem
Like most developing sectors, ARC and DRI will require years, if not decades, of intensive
policy development, capital investment, and infrastructure creation to build an ecosystem that
promotes innovation and research. Canada needs to further develop its digital strategy and
strengthen its digital economy in order to enjoy success on a global scale. Key ingredients for
making this happen are human capital, technology, seamlessly integrated systems, and
public policy. Therefore, it is imperative for Canada to work towards building the following:
● Highly Qualified Personnel (HQP): Researchers require support in using
computational power effectively and efficiently. Therefore, it is necessary for each facility
providing computational power to have a strong support system of programmers,
analysts, and system administrators, among others, to facilitate research. Currently, Canada
does not meet its human capital demands as HQP require extensive training and skill
sets. They often seek employment opportunities south of the border, where
infrastructure and opportunity are more readily available.14 Access to this training is
limited due to a small number of centres within the country. Economies around the world
that are investing in computational power are also investing heavily in the development
of HQP and Canada is lagging due to its historic lack of a focused approach to building
this competency.
● Hardware and software / Technology: Researchers need many processors, access to
large memory, sufficient network capacity, and the right applications in addition to
supercomputers in order to get results as quickly as possible. The ability to translate
data and interpret it, often through the use of visualization technology, is a critical part of
deriving insights into a given problem.
● Data storage and availability: Many of the most complex problems attacked by
researchers require equally complex, and large, datasets. This demands significant
bandwidth and capacity for storage and working memory.
● Seamlessly integrated systems: Researchers and students need to find a seamless
and open line of communication with HQP, support staff, and supercomputing facilities to
enable research and innovation. This completes the ARC ecosystem as it connects the
various dots within the system. In many ways, seamless integration of systems and HQP
is the raison d’être for CO. Enhancing access to systems and growing the HQP in
Ontario and across Canada are the lifeblood of an effective ARC ecosystem.
● Public Policy: ARC and the components mentioned above require an environment that
is conducive to innovation and disruption. To ensure the smooth functioning of these
systems, Canada and Ontario need to put strong frameworks in place for policy,
legislation, and regulation.
14 Silcoff, S. (2018, May 3). Retrieved from https://www.theglobeandmail.com/business/technology/article-canada-facing-brain-drain-as-young-tech-talent-leaves-for-silicon
Compute Canada’s Challenges
The promise of Compute Canada came with incredibly high expectations and optimistic
scenarios. Compute Canada’s mission is: “To make Canada a world leader in the use of
advanced computing for research, discovery, and innovation.” Its mandate is: “To enable
excellence in research and innovation for the benefit of Canada by effectively, efficiently and
sustainably deploying a state-of-the-art advanced research computing network supported by
world-class expertise. And to use this network to support a growing base of excellent
researchers, and to serve them as a national voice for advanced research computing.”
While Compute Canada succeeded in many respects, a series of factors, including a changing
research and innovation sector and a failure to adapt, led to a change in direction. The needs of
the ecosystem in Canada evolved over time, requiring stringent yet agile policy changes
catering to the individual needs of each region and sector. Compute Canada adopted a
top-down leadership style that was a suboptimal fit with stakeholders accustomed to a high
degree of autonomy, including the provinces that were covering 60% of Compute Canada's
expenses. Compute Canada's critics argued that it failed to serve the researchers who play the
most important role within the ecosystem. On May 10, 2013, amid mounting frustration,
researchers posted a petition, gathering 238 signatures, expressing a lack of confidence in
Compute Canada.15
We now step away from a pure exposition of history and context to take stock of the
lessons learned from Canada’s supercomputing history.
It is crucial for the new national organization being formed to collaborate closely with the
researchers to ensure that the organization, and systems, being created are truly serving their
needs. Many bright spots exist in Canada's DRI ecosystem and can be built upon by the new
national ARC organization. For example, the HQP who reside in and among the regional
consortia continue to serve researchers with dedication and unparalleled
expertise. These HQP are seen as the backbone of the DRI infrastructure. Recognition of their
vital role and change management support should be considered by the new organization.
Another important development was the creation of TECC (Technical Experts of Compute/Calcul
Canada), which happened prior to the incorporation of Compute Canada but was actively
supported and expanded over the course of Compute Canada's existence. TECC is a group of
HPC experts who have agreed to work together to better support the Canadian HPC community
and to provide guidance to national and regional groups that require access to personnel with
high-level technical skills. Members of this group work together, developing standards and
exchanging information, in order to maintain the hardware and software used by computational
researchers across Canada.
A prominent voice for researchers and TECC in the formation of the new national ARC
organization will help avoid some of the missteps of the past. One of the key successes of
C3.ca was the creation and publication in 2005 of the Long Range Plan (LRP) for HPC in
Canada which provided a vision of a sustained and internationally competitive infrastructure for
computationally-based research in Canada. One of the hallmarks of a strategically prophetic
and compelling document such as the Long Range Plan (2005) and its follow-on: Renewing the
15 Wadsley, J. (2013, May 10). Retrieved from http://www.ipetitions.com/petition/restore-confidence-in-compute-canada/
Long Range Plan for HPC in Canada (2007) is the fact that most, if not all, of its
recommendations have stood the test of time. By the same token, a dozen years later, the fact
that these recommendations continue to apply also demonstrates how much time has been lost
that could otherwise have represented real progress for HPC in Canada.
It is important to acknowledge that one of the challenges the CFI encountered was the
requirement to use a funding mechanism that was entirely appropriate for administering
research-based competitions, but less than effective in funding shared, distributed
infrastructure. While the initial culture and successes of C3.ca were based on a collaborative
nature of sharing and broad access to computing capacity, competitions for funding did not
engender the kind of collaboration that C3.ca espoused. Accepting that resources are limited,
and merit-based competitions are the hallmark of any grant-funding agency, applying a similar
funding model for infrastructure requires re-examination and new alternatives for consideration
by the new national organization.
C3.ca Chair Andrew Pollard stated: “We were excited by the potential, challenged by political
obstacles but united in our focus and resolve. This was about Canadian researchers, built by
Canadian researchers for Canadian researchers and guided by a quintessentially Canadian
ideal of collaboration and sharing.”16
Innovation Science and Economic Development (ISED)
In its 2018 budget announcement, the federal government of Canada committed to:
● “Provide $572.5 million over five years, with $52 million per year ongoing, to implement a Digital Research Infrastructure Strategy that will deliver more open and equitable access to advanced computing and big data resources to researchers across Canada, and to
● work with interested stakeholders, including provinces, territories, and universities, to develop the strategy, including how to incorporate the roles currently played by the Canada Foundation for Innovation, Compute Canada and CANARIE, to provide for more streamlined access for Canadian researchers.”
The Leadership Council on Digital Research Infrastructure (LCDRI) is a community-driven organization that seeks to provide a unified voice for Canada’s digital research infrastructure community. The LCDRI was tasked by the Federal Minister of Science and Sport to prepare a series of position papers as input to ISED’s work.
The LCDRI released a series of position papers outlining options for improved coordination and equitable funding models specifically for the research data management (RDM) and advanced research computing (ARC) components of DRI. As a step towards delivering this strategy, on April 6, 2019, Innovation Science and Economic Development (ISED) released a call for proposals to create a new organization to deliver, “open and equitable access to advanced computing and big data resources to academic researchers across Canada in order to further enable scientific and research excellence.” The promise of a new organization to oversee and provide leadership in promoting and financially supporting digital research infrastructure growth in Canada is simultaneously exciting and daunting. In some ways, this new organization is emerging in an environment similar to that in which Compute Canada arose, where the
16 Pollard A (2019) in conversation with Nizar Ladak
community looks to the new organization as a panacea to address all systemic ills of the recent past.
The formation of the new national organization is an opportunity to reset certain initiatives and redesign principles so as to return Canada's DRI ecosystem to its origins: origins that placed researchers at the core and sought to equip them with the tools for innovation and to promote intellectual and economic growth.
Principles to Guide a National DRI Strategy
In its submission to the Ontario Ministry of Research, Science and Innovation (MRIS) and to
Innovation Science and Economic Development Canada (ISED), Compute Ontario asserted the
importance of an organizing framework founded upon an agreed set of principles. These
principles should support a modern and globally competitive DRI ecosystem in Canada and
reflect not only the type of system we would like to design but mechanisms for enabling
accountability and giving voice to a diverse community. Considering this, Compute Ontario
suggested the following principles as starting points for discussion. We should aspire towards
solutions that are:
o Researcher-Centred
▪ Enabling best-in-class research will be central to the goals of Canada's DRI system;
▪ Researchers and those who represent their interests both within the public and the
private sphere will have a voice in identifying sector needs and setting priorities; and
▪ Researchers will experience a comprehensive system and seamlessly draw on
expertise, tools, and services with the understanding that the system has been
designed to foster world class scholarship and innovation.
o Universal, Equitable, Accessible
▪ Researchers across Canada should have access to the appropriate DRI expertise
and tools required to carry out their work regardless of their research domain or
where they reside.
o Forward-looking, Adaptive, and Nimble
▪ Capable of responding to emerging trends in research and technology in an efficient
and timely manner; and
▪ Continuously develops capacity and supports the skills development of highly
qualified personnel as required to sustain the system.
o Collaborative and Integrated
▪ Promotes collaboration while acknowledging and respecting the diversity and
autonomy of key stakeholders within the ecosystem; and
▪ Results in a comprehensive national system which can also be leveraged to meet
regional research and innovation needs.
o Sustainable
▪ Funding will be available in a predictable and comprehensive way to facilitate long
range strategic planning of both people and tools;
▪ Cost sharing is equitable and in a manner proportional to the research requirements
in a region; and
▪ Accountability will be shared through our common understanding of achievable
performance and desired outcomes.
Improved Federated Coordination
In its simplest form, a federated approach acknowledges the different strategic goals, and
means, across different stakeholders. Furthermore, while a national strategy is clearly essential,
the past two decades show us that many functions can be delivered very effectively at a
local/regional level as long as there is appropriate coordination.
A federated approach allows us to build on our past successes. Some functions may be
appropriate to coordinate at a national level, while others, such as site operations,
procurements, and the training of emerging researchers, have been and can be delivered in a
decentralized way.
Regional organizations have demonstrated their value-proposition in helping to deliver these
services in close collaboration with universities and institutions. Therefore, we submit that in a
national federation, regional organizations play an important role. As not-for-profit corporations,
regional organizations adhere to legislative requirements on governance and reporting
obligations while seeking to meet the needs of our ecosystem in an accountable way.
A national DRI strategy serves researchers and innovators across the country, co-funded by
multiple provincial governments. In order to respect the autonomy and financial contributions of
our provincial and institutional partners, federated governance is a necessity.
To realize a world-class DRI ecosystem in Canada, national approaches towards a DRI strategy
might aspire to:
● Reflect principles which promote pan-Canadian ideals on research and a 21st century
DRI ecosystem that is modern and sustainable
● Recognize the importance of a federated approach to coordination which applies those
principles through appropriate governance tools and processes, and recognizes, and
clearly defines, the appropriate roles and accountabilities for all participants in the
ecosystem.
● Apply approaches to funding, which are also grounded in these same set of principles
and foster appropriate investments nationally, regionally, and across sectors as
required.
● Respect the existing strength and diversity of expertise within the DRI sector.
● Unequivocally promote a collaborative approach.
Ontario’s ARC Ecosystem
Compute Ontario
Compute Ontario is an independent, not-for-profit organization that acts as a coordinator for
ARC-related strategy development and investments in Ontario.
As the recent history of the ARC ecosystem shows, collaboration is critical to the advancement of
research and innovation infrastructure. Compute Ontario acts as a relationship builder in
Ontario’s DRI ecosystem through regular and transparent communication amongst DRI
stakeholders.
Compute Ontario works towards the following strategic goals for ARC in Ontario through
collaborations with many partner organizations:
● Serve as a credible voice regarding policy
● Contribute to efforts to promote ARC and its use, provincially and nationally
● Coordinate and support the advanced computing needs of Ontario's academic research
community and other stakeholders
● Coordinate Ontario's efforts to develop, retain and increase highly qualified personnel to
support ARC
● Build trust and serve as a focal point for connecting communities
The consortia served by Compute Ontario provide ARC resources such as access to HPC,
hardware and software resources, data storage and management, and most importantly, access
to highly qualified personnel who can assist researchers in using the systems. These consortia
are also affiliated with Compute Canada and offer clients access to resources and computing
power. They have also played a crucial role in developing and training future highly qualified
personnel.
An impact assessment conducted by The Evidence Network (TEN) indicated that the consortia
play a critical role for researchers looking to leverage ARC resources in Ontario.17 These
organizations are succeeding in impacting the knowledge, capabilities, and performance of
researchers across the province. Collectively, the consortia delivered 25,321 hours of teaching
to over 9,000 participants in 2018, a 50% increase from the previous year in both teaching time
and number of participants at training events. Ontario continues to deliver over 70% of the
training programs credited toward Compute Canada nationally. Each of the consortia has
played a crucial role in Ontario's success in the global science community.
The Ontario Supercomputing Consortia
• Centre for Advanced Computing (CAC): led by Queen's University, the CAC provides a high-availability, high-security environment that has made it home to dozens of health research teams from across Canada working with personal health information. This reputation for security has also garnered the trust of the Indigenous community and led to several groundbreaking digital humanities projects with First Nations teams.
While supporting the university research community for over 20 years, the CAC has also become Canada's leader in industry engagement and is rapidly building an international reputation as it leverages skills in advanced computing, deep analytics, machine learning, and artificial intelligence (AI) in general. One area of particular focus is working with “smart cities”, where the combination of big data experience coupled with
17 The Evidence Network (TEN), (2019, March), Impact of Compute Ontario and ARC Consortium Partners, p.23
best-of-breed privacy and security is resulting in job creation and increased economic results for the CAC’s partners.
● HPC4Health has been instrumental in furthering innovation in Ontario's healthcare
system. It has promoted further research in almost every discipline of healthcare, with
studies ranging from genomics to medical imaging. HPC4Health focuses on dealing with
a “data deluge” of information and on the governance of personal health data, and
translates this data into something that will benefit patients by improving products and
services. HPC4Health is a consortium of health providers who are working together to
build the next-generation compute engine for clinical research. Initiatives funded by
HPC4Health and Compute Ontario, such as the Health Artificial Intelligence Data
Analysis Platform (HAIDAP), directly accelerate Compute Ontario's goal of advancing
Ontario's research capability and competitiveness, and its strategy for supporting health
data research.
● SciNet, led by the University of Toronto, is Canada's largest supercomputing centre. It
provides researchers with computational resources and expertise on scales that were
not previously possible in Canada. It houses Niagara, the largest and most powerful
supercomputer in Canada. SciNet, along with SHARCNET, plays an integral role in
building the HQP workforce in Ontario through training programs and summer schools.
Their technical expertise has made them Ontario's early adopters of new, complex
technologies, such as quantum computing, in the ARC ecosystem. SciNet's support
expertise, state-of-the-art systems, and ability to integrate new technology into the
ecosystem help advance Compute Ontario's objective of establishing Ontario as a
global hub for ARC and attracting world-class researchers.
● SHARCNET, the Shared Hierarchical Academic Research Computing Network,
comprises 18 colleges, universities and research institutes. Together, they operate a
network of high-performance computing clusters spanning 1,800 km across
south-western, central and northern Ontario. It acts as the institutionally based
platform for academic access to high-performance computing and for onboarding
universities and colleges for HPC usage, HQP training, and hardware distribution. Its
operational philosophy is to build HPC culture by connecting users to a community of
excellence, providing resources to accelerate computational academic research, and
developing entry-level users into computationally intensive users. This directly
accelerates Compute Ontario's strategic priority of advancing the region's academic
research community. Increasing academia's research capacity can advance Ontario's
ARC ecosystem by developing HQP, increasing ARC pervasiveness, and adding
infrastructure to the region.
SOSCIP – Key Collaborator in Ontario’s ARC Ecosystem
In order to further innovation in Ontario, the Federal Economic Development Agency for Southern
Ontario (FedDev Ontario) and the province of Ontario funded several academic institutions, with
IBM as the industry partner, to launch the Southern Ontario Smart Computing for Innovation
Platform (SOSCIP) in April 2012. This consortium worked closely with Ontario Centres of
Excellence (OCE) and many small and medium-sized enterprises (SMEs) across the province to
pair researchers with ARC infrastructure to solve industry and social challenges that drive
economic growth. This helped fuel Canadian innovation within areas of agile computing, health,
water, energy, mining, advanced manufacturing, and cybersecurity, among others.18 As of 2018,
SOSCIP helped launch 145 industry-academic collaborative projects and has connected more
than 200 Canadian companies with new data science partnership opportunities.19 Compute
Ontario and SOSCIP signed a memorandum of understanding in 2015, establishing themselves
as strategic partners in Industry Engagement within Ontario’s ARC ecosystem. The alliance has
been extraordinarily collaborative, mutually supporting and benefiting each organization. Board
members for each organization are cross-appointed, enabling consistent messaging and strategic
governance advice for both organizations.
Where Does Ontario Go From Here?
A Hyperion study commissioned by Compute Ontario in 2018 indicated that, among the G8
economies, the Canadian ARC ecosystem is significantly underdeveloped. Canada ranks last in
the G8 in ARC-related spending,20 and will require further investment to be internationally
competitive. As of November 2017, China had 227 of the world's fastest supercomputers, and
the US had 109.21 These supercomputers have contributed significantly to the long-term growth
of their respective economies by catalyzing innovation projects. The G8 countries, with the
exception of Canada, continue to make large-scale investments in their ARC ecosystems, which
is reflected in their standings on the Top500 Supercomputers list.
In 2018, there were a total of 13,540 users of the ARC ecosystem in Canada, whose work led to 1,948 industrial R&D collaborations, 8,158 publications, and the formation of 479 spin-off companies.22 Users recorded 251 patents in 2017 alone. However, the economies closest to Canada in size have allotted more resources to their ARC ecosystems, and Ontario will likewise need to focus on building a comprehensive, integrated ARC ecosystem supported by HQP, strong policy and legislative systems, and hardware and software resources.
A Way Forward for Ontario
The Evidence Network's (TEN) impact study suggested that, going forward, Compute Ontario should extend its collaborative approach beyond Compute Canada and the CFI vertically, and beyond its consortia laterally. This shift in focus has led Compute Ontario to pursue opportunities to collaborate with industry, including the establishment of its Industry Engagement Committee. Other stakeholders Compute Ontario aims to engage include ORION and CANARIE on cybersecurity initiatives, along with other organizations that focus on Research Data Management (RDM). Going forward, Compute Ontario will continue to act as a
18 Who We Are – SOSCIP. Retrieved from https://www.soscip.org/who-we-are/
19 SOSCIP Impact Report (2018). Retrieved from https://www.soscip.org/wp-content/uploads/2017/08/soscip_impactreport2018_pages.pdf
20 Conway, S., Joseph, E., Norton, A., Sorensen, B. (2018, May). Phase 1: Study to Support Compute Ontario's ARC Planning.
21 Wallace, N. (2019, Jan 14). Retrieved from https://sciencebusiness.net/news/commissions-new-eu92b-procurement-programme-aims-grab-eu-lead-supercomputing
22 Compute Canada Annual Report 2017-2018. Advanced Research Computing – Achieving More Together. Retrieved from https://www.computecanada.ca/wp-content/uploads/2018/12/ComputeCanada-AR2018-EN.pdf
relationship builder between academia, not-for-profits, government, and industry in order to
encourage collaboration among stakeholders to advance Ontario’s ecosystem as a whole.
More importantly, a collaborative and consensus-building model of leadership has served the organization well and will be a prerequisite for success as Compute Ontario navigates the challenges and opportunities it sees in its environment, described below.
DRI Environment – Challenges and Opportunities
ARC is an essential tool for scientific discovery, innovation, and, in turn, economic development. ARC can lead to critical scientific advancements, new applications and research possibilities, and improved public policies. However, a lack of awareness of ARC, and of why Ontario needs a robust and sustainable support system for it, has arguably stalled portions of Ontario's innovation economy. This lack of awareness across the country and among policy makers has led to unpredictable government funding, which in turn has produced shortages of HQP and gaps in technology and data governance systems. Below is a brief overview of these challenges and of how Compute Ontario is working to realize the corresponding opportunities through initiatives and collaborations.
Highly Qualified Personnel
• Challenges: HQP are critical to leveraging Ontario's economic and social advantage. The C3.ca Long Range Plan recommended a conservative target of 25% of capital investments dedicated to the development of HQP in Canada; by contrast, most of Canada's global competitors invested between 20% and 50% of their investments in the development of HQP.23 The report also attributed the "scalability gap" in Canada's research and innovation infrastructure to the lack of HQP. Compute Ontario's technology investment study, conducted in 2018 by R.A. Malatest and Associates, suggested that over 20,000 HQP positions will go unfilled in Ontario over the next five years.24 Greater awareness of HQP career opportunities, and more programs teaching the skills and competencies that produce HQP, are needed to address this challenge. Indeed, C3.ca's follow-on report, published in 2007, stated: "The development of highly qualified personnel constitutes the most effective form of technology transfer."
• Opportunities: Compute Ontario has the opportunity to advance Ontario's position as a leader in HQP development and attraction. It can leverage existing training efforts in the province by recognizing initiatives such as consortia-led summer schools as essential to HQP development and advocating for increased training capacity, by leveraging and scaling online training initiatives such as webinars, and by continuing to deliver customized training programs such as the OPS Hackathon that it led. Compute Ontario can also raise awareness of HQP skills and competencies through educational networking to promote the development of HQP
23 Building Canada's Future Research and Innovation Culture, p. 5.
24 R.A. Malatest & Associates Ltd. (2018, April). Highly Qualified Personnel Study.
at all levels of education, and to align HQP training curricula with the skills employers consider valuable.
Technology
• Challenges: Infrastructure and funding are key to enabling high-quality research in Ontario. Up-to-date infrastructure is important, but so are the funding models that enable it; exploring new models such as industry-based cloud computing, together with more sustainable and predictable funding cycles, will be important as Ontario's DRI ecosystem continues to advance.
• Opportunities: Compute Ontario, through its role as a trusted voice on policy and a relationship builder, is in a position to support better ARC funding models characterized by more sustainable and predictable funding cycles. By promoting the scaling of successfully applied research methods, such as ARC-AI convergence and smart city pilots, Compute Ontario has an opportunity to advocate for greater ARC infrastructure funding in support of innovative research and economic development. Government-funded credits for access to cloud computing can rapidly enable access to state-of-the-art technology, and Compute Ontario is in a position to explore this new model of utilizing privately owned platforms to conduct innovative research for social good.
Data Governance
• Challenges: As more and more sectors of Ontario's economy become data driven, data governance models need to be established that support the availability, usability, integration, and security of data. The growing use of, and demand for access to, data has set the stage for Compute Ontario to leverage its collaborative and strategic abilities to support frameworks and platforms that enable open data and greater data access.
• Opportunities: The need for superior data governance models gives Compute Ontario an opportunity to support and enable frameworks that promote access to and use of big data. By leveraging and scaling projects such as HAIDAP and smart city pilots, Compute Ontario can articulate how policy and pilot projects converge to enable the use of big data for social and economic development. With more sectors looking to leverage data, access to raw and curated data is critical to achieving Ontario's innovative and economic potential, and Compute Ontario is well positioned to enable these opportunities.
The Emerging Future
Canada has a strong history of technological leadership, and the advanced research computing sector has evolved considerably internationally, in Canada, and within Ontario. Internationally, in 2021, just three years after the time of this writing, Argonne National Laboratory will
take delivery of a $500 million exascale25 supercomputer developed by Cray Inc. and Intel Corp. Known as Aurora, the basketball-court-sized machine will use more than 200,000 processor cores, draw multiple megawatts of power, and perform a quintillion calculations per second. While the infrastructure is clearly important, and at times can be perceived as an end in itself on global Top500 lists, the exciting future lies in the nature and type of research that will be conducted on such machines.
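To give a sense of the scale involved, the Aurora figures quoted above can be sanity-checked with a few lines of arithmetic. The per-core average below is our illustrative estimate derived from the quoted machine-level numbers, not a published Aurora specification:

```python
# Back-of-envelope scale check using the figures quoted above:
# ~1 exaFLOPS ("a quintillion calculations in a second") spread over
# "more than 200,000 processor cores". The per-core figure is an
# illustrative average only.

EXA_FLOPS = 1e18   # 1 exaFLOPS = 10^18 floating-point operations per second
CORES = 200_000    # quoted core count (a lower bound)

flops_per_core = EXA_FLOPS / CORES
print(f"average per-core throughput: ~{flops_per_core:.0e} FLOPS")  # ~5e+12
```

That works out to roughly five teraFLOPS per core on average, which illustrates why exascale systems combine enormous core counts with specialized high-throughput hardware rather than relying on conventional processors alone.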
One can only imagine that we will find cures for diseases that have devastated families, or that researchers will not only halt climate change but reverse its deadly impact on our planet. Will researchers discover new galaxies, or new planets that humans can inhabit? Any number of possibilities exist, and at the same time it is nearly impossible to say for certain what discoveries will occur. What an exciting future it can be, though, when exploring the possibilities. Without question, big data and growing compute resources represent a potential for researchers that is limited only by human imagination. Compute Ontario is committed to supporting researchers, industry, and government to ensure that human imagination truly is the only limit to realizing that exciting future.
25 Exascale computing refers to computing systems capable of at least one exaFLOPS, i.e. a billion billion (a quintillion) calculations per second.