Thinking Forward Through the Past: A Brief History of Supercomputing in Canada and its Emerging Future



Acknowledgements

Compute Ontario gratefully acknowledges the wealth of information provided in reports/articles published by Allan B. MacIsaac and Mark Whitmore, C3.ca, Compute Canada, LCDRI, and the Pawsey Supercomputing Centre, which served as primary references for the information in this document. The views expressed in this document are those of Compute Ontario and do not reflect the opinion of the Province of Ontario, any of the Ontario high-performance computing consortia, or the authors of the reports/articles referenced in this document.


Table of Contents

EXECUTIVE SUMMARY

WHAT IS A SUPERCOMPUTER?

WHAT IS ADVANCED RESEARCH COMPUTING?

INTERNATIONAL HISTORY OF ARC AND SUPERCOMPUTING

HISTORY OF SUPERCOMPUTING IN CANADA

CANADA’S ARC ECOSYSTEM

CANADIAN DRI PARTNERS

COMPUTE CANADA

INNOVATION, SCIENCE AND ECONOMIC DEVELOPMENT (ISED)

PRINCIPLES TO GUIDE A NATIONAL DRI STRATEGY

IMPROVED FEDERATED COORDINATION

ONTARIO’S ARC ECOSYSTEM

THE ONTARIO SUPERCOMPUTING CONSORTIA

WHERE DOES ONTARIO GO FROM HERE?

A WAY FORWARD FOR ONTARIO

DRI ENVIRONMENT – CHALLENGES AND OPPORTUNITIES

THE EMERGING FUTURE


Thinking Forward Through the Past: A Brief History of Supercomputing in Canada and its Emerging Future

Technology has always played a critical role in shaping our societies; it empowers individuals by enabling better information exchange, education, and medical care, creating more enriching lives.

What makes autonomous vehicles possible? What enables the development of smart cities? What allows scientists and doctors to develop personalized medicine for a world population of 7.7 billion? The answer to all of these questions is High-Performance Computing (HPC), also called supercomputing.

While there is some debate about popular culture’s first introduction to supercomputers, IBM’s Watson was one influence: in 2011 it successfully competed on the television game show Jeopardy!, winning the first-place prize of $1 million. Initially developed to answer questions posed in natural language, and named after IBM’s first CEO, Thomas J. Watson, the computer introduced the public to machine learning capabilities and optimized hardware. However, supercomputers existed long before Watson was revealed to the world and, as this document explains, have been key to working on problems and equations that are either too large or too complex for personal computers.

Understanding where we came from and where we are today is critical to understanding the factors that shape our future. Only through studying history can we grasp how things change; only through history can we begin to comprehend the factors that cause change; and only through history can we understand what elements of an institution or a society persist despite change.1 It is principally for these reasons that the team at Compute Ontario began writing this document.

We feel it is timely to release this document to contribute to a national conversation at a time when significant resources to support supercomputers for research are being deployed. Departing from a purely historical account, this document offers commentary on critical lessons from Canada’s supercomputing history that can inform the national agency now being formed. Time will tell whether these intentions are realized, and as this report’s title indicates, we think forward through the past to create the future that Canada needs.

Nizar Ladak, President and CEO, Compute Ontario

1 Stearns, P. (1998). Why Study History, p. 2.


Executive Summary

At the time this report is being written, Canada’s Advanced Research Computing (ARC) ecosystem is undergoing significant changes. A new national organization is being formed to oversee and coordinate the growth of Canada’s ARC sector, bringing research data management, research software, and advanced research computing together under a single national coordinating organization. Over half a billion dollars ($572.5M) was identified in the 2018 federal budget to support this new organization and the growth of this sector.

This report aims to document key milestones in the evolution of Canada’s advanced research computing endeavour. Understanding this history is critical to appreciating the lessons learned and to building upon past successes as the ecosystem evolves.

A key purpose of this document is to help those engaging with Compute Ontario appreciate the history of supercomputing and how it has shaped Ontario’s ARC ecosystem. Beginning with an international history of supercomputers and then narrowing its focus to Canada, Compute Ontario documents challenges, opportunities, and future considerations for its own provincial ARC ecosystem. Compute Ontario intends to use this document for its own strategic planning purposes at a Board retreat scheduled for the fall of 2019.

Beyond Ontario’s own uses of this material, a key intent of this document is to provide those charged with developing the new national organization and enhancing Canada’s ARC ecosystem with lessons learned and constructive advice. We offer ISED and the new national organization five key pieces of advice, elaborated in the pages that follow.

1. The value of Highly Qualified Personnel (HQP) cannot be over-emphasized. They are the lifeblood of the ecosystem and arguably Ontario’s and Canada’s competitive advantage. However, such advantages are quickly lost and must be continually cultivated.

2. Episodic funding and outdated models of cost-sharing have outlived their utility. Predictable funding models must be implemented as the first order of business for the new national organization. Knowledge of effective funding models exists within the system and is informed by decades of experience; capitalize on it, don’t ignore it.

3. This entire document emphasizes how critical a researcher-focussed lens has been throughout supercomputing history. From the history of C3.ca to challenges in governance, adopting a researcher-focussed approach is vital.

4. Grass-roots approaches to governance have seen the most benefit and have been responsible for many of the gains made in Ontario’s and Canada’s ARC sector. National, regional, institutional, and consortia-led governance approaches can co-exist.

5. Hardware and people enjoy a symbiotic relationship. The systems described in this document made the Top500 for a simple reason: talent migrated toward systems. In developing future systems, appreciate that talent must have easy access in order to cultivate the support systems researchers depend upon.

This report begins with a basic definition of supercomputers and the ARC sector and briefly summarizes key milestones in the development of each, internationally and nationally. Compute Ontario hopes this report provides a useful perspective for leaders in Canadian universities, governments, industries, and research organizations wanting to gain a broad understanding of Canada’s digital research infrastructure ecosystem.


What is a Supercomputer?

Supercomputers are extremely powerful computers designed to work on large and complex problems and data sets that are beyond the capability of normal computers. As research questions and data sets evolve, so do the resulting challenges and required analyses. The processing capabilities of supercomputers have grown dramatically since the 1965 launch of the CDC 6600, which is generally recognized as the world’s first supercomputer. Built by the “father of supercomputing,” Seymour Cray, the CDC 6600 represented a turning point in the history of research computing, and it set into motion much of the technological development we see in research today. The fastest supercomputer in 2018 was roughly 70 billion times faster than the CDC 6600!

Modern supercomputers rely on harnessing the compute power of as many as a million processors working together, in parallel, on the same problem. This is similar to manufacturing cars in an assembly plant: the most efficient way to build cars is to have separate teams, each working on specific parts at the same time. One team builds the engine while another builds the frame, so that multiple tasks are completed in parallel.

Writing computer code that can work in parallel and make use of many processors at once is still a challenging task. Many common applications and codes can only run effectively on one or perhaps a handful of processors. Extremely skilled programmers, or highly qualified personnel (HQP), are needed to develop, debug, and improve the specialized research codes that can run effectively on hundreds or thousands of processors. These codes typically need to be modified and rewritten with each generation of supercomputers.
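To make the decomposition idea concrete, here is a minimal sketch in Python (our illustration only; the report names no particular language or library, and the function and variable names are hypothetical). One large task is split into chunks, each worker process computes its share in parallel, and the partial results are combined at the end:

    # A toy "assembly line": sum the squares of the first n integers,
    # giving each worker process its own slice of the range.
    from multiprocessing import Pool

    def partial_sum(bounds):
        # One worker's share: sum x*x over [start, stop).
        start, stop = bounds
        return sum(x * x for x in range(start, stop))

    if __name__ == "__main__":
        n, workers = 10_000_000, 8
        step = n // workers
        chunks = [(i * step, (i + 1) * step) for i in range(workers)]
        with Pool(workers) as pool:
            total = sum(pool.map(partial_sum, chunks))  # combine partial results
        print(total)

Real supercomputing codes follow the same pattern at vastly larger scale, typically with message-passing libraries such as MPI, and it is the coordination and communication among thousands of processes that makes the programming difficult.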

Developers and designers are in a continual race to keep up with, and outshine, existing supercomputers as technology develops. Supercomputers improve and evolve rapidly because there is an insatiable need for computing power to tackle ever-larger and more complicated problems with increasing fidelity. More often than not, supercomputers help drive changes that become mainstream and help shape other innovations.

What is Advanced Research Computing?

Modern research, in virtually all domains, often involves significant computational work which may not require supercomputers and massively parallel codes. Policymakers in Canada introduced the term “advanced research computing” (ARC) to refer to the full range of computing needs of researchers, while using the term “high performance computing” (HPC) to refer to the subset of those needs which can only be met on a supercomputer.

Due to the complexities involved, ARC needs an ecosystem of resources to support it. As in the car-manufacturing analogy described earlier, ARC requires HQP to optimize and run algorithms, as well as hardware and software resources, data storage, and data management. This entire ecosystem, combined with networking resources and cybersecurity, constitutes the bulk of the Digital Research Infrastructure (DRI) ecosystem in Canada.

This ecosystem forms the basis of many of the more sophisticated streams of study and innovation today, such as artificial intelligence (AI), machine learning, personalized genomic medicine, cleantech, and nanotech, among many others. Thus, advanced research computing and HPC are crucial to accelerating a country’s growth, solving operational challenges, and building a more competitive economy that is driven by innovative products and services.

International History of ARC and Supercomputing²

ARC and HPC are a rather recent phenomenon, and they have done much to change the world in the last 60 years. Although the earliest supercomputing investments can be dated back to the early 1950s,3 most industry insiders consider 1964-65 to be when supercomputers were invented and first used to solve industrial problems. What follows is a brief timeline of major milestones in the global history of supercomputing.

1946 – ENIAC

The Electronic Numerical Integrator and Computer, or ENIAC, was completed in 1945 and unveiled to the public in 1946 as the world’s first general-purpose electronic computer. It was used by the United States Army to calculate artillery firing tables. The power and scope of the ENIAC fired up the imagination of the public, and it was often referred to in the media as a “giant brain.”4

1965 – CDC 6600

The CDC 6600 is generally considered to be the first supercomputer in the world. Designed by Seymour Cray, the CDC 6600 was up to ten times faster than the previous fastest computer in the world, the IBM 7030 Stretch. Additionally, the CDC 6600 was approximately the size of four filing cabinets, whereas the IBM 7030 occupied roughly 600 square meters, the size of an average house. Launched by the Control Data Corporation (CDC), the CDC 6600 revolutionized the world of supercomputing.

1972 – ILLIAC IV

Once the CDC 6600 was launched, several processor manufacturers started refining the supercomputer through different variations. The ILLIAC IV, launched in 1972, was the first to be built with a parallel architecture, which allowed multiple processors to work together just as they do in supercomputers today. However, poor project management and costs that ran to four times the initial estimates gave the ILLIAC IV a bad name. Despite this, its model formed the basis of all the supercomputers we use today.

1976 – Cray-1 and Vector Programming

2 Readers please note: this is not an exhaustive or complete list, and editorial liberties were taken in selecting what the authors felt were significant milestones in supercomputing evolution.
3 Matlis, J. (2005, May 31). A Brief History of Supercomputers. Retrieved from https://www.computerworld.com.au/article/132504/brief_history_supercomputers
4 ENIAC (2013, October). Retrieved from https://whatis.techtarget.com/definition/ENIAC


After launching the CDC 6600, Seymour Cray left CDC to start his own venture and build the Cray-1. Cray believed that vector processing, rather than multiprocessing, was the key to building superior supercomputers. In layman’s terms, both vector processors and multiprocessors are parallel processors, but they work differently: a vector processor has a single instruction stream, but each instruction works on an array (or vector) of data items in parallel. At the time, Cray quipped, “If you were plowing a field, which would you rather use: two strong oxen or 1,024 chickens?”
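To illustrate the distinction, the sketch below (Python with NumPy, our illustration rather than anything from this history) contrasts a scalar-style loop, which issues one operation per element, with a single whole-array expression, the style a vector processor encourages, where one instruction conceptually acts on many data items at once:

    import numpy as np

    a = np.arange(1_000_000, dtype=np.float64)
    b = np.arange(1_000_000, dtype=np.float64)

    # Scalar style: one addition at a time, element by element.
    c_scalar = np.empty_like(a)
    for i in range(len(a)):
        c_scalar[i] = a[i] + b[i]

    # Vector style: one expression over whole arrays; NumPy dispatches
    # to compiled loops that modern CPUs can execute with vector units.
    c_vector = a + b

    assert np.array_equal(c_scalar, c_vector)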

Priced at $10 million, the Cray-1 could increase a site’s electricity bill ten-fold. Supercomputers built around vector processing dominated the industry for over 20 years but eventually gave way to the parallel architectures that continue to lead sales today.

1993 – Birth of the Top500

The performance of any processor or computer system can be defined in terms of the number of floating-point operations it can perform per second, commonly referred to as flops. The historic CDC 6600 was capable of up to 3 million flops, while the fastest supercomputer in 2018 had a theoretical peak speed of 200 petaflops, roughly 70 billion times faster.
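As a quick sanity check on that ratio, using only the figures quoted above (our arithmetic, not a figure from the report’s sources):

    cdc_6600_flops = 3e6      # CDC 6600: ~3 million flops
    peak_2018_flops = 200e15  # 2018 leader: 200 petaflops theoretical peak
    print(peak_2018_flops / cdc_6600_flops)  # ~6.7e10, i.e. roughly 70 billion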

Since 1993, the world’s supercomputers have been evaluated and ranked on the Top500 list. Updated twice a year, the list ranks supercomputers by measuring the number of flops they achieve on the standard LINPACK benchmark. The list has been particularly helpful in documenting how quickly today’s supercomputer becomes tomorrow’s fading star amid continual hardware upgrades and software changes. In 2012, the average age of a system on the list was 1.26 years, and the Top500 had an attrition rate of 190 systems each year.5 This phenomenon is an indicator of how crucial it is for researchers and HQP to continue to learn and upgrade their skill sets.

The Top500 list continues to serve as a reference point for the HPC industry. In June 2019, the 53rd edition of the Top500 was released. This edition was significant because, for the first time, only petaflop systems made the list. The total aggregate performance of all 500 systems has now risen to 1.56 exaflops.6
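For intuition about what “flops achieved on a benchmark” means, here is a toy measurement in the spirit of LINPACK (a sketch using NumPy; the real benchmark solves a dense linear system under strict rules, which this does not attempt):

    import time
    import numpy as np

    n = 2000
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    start = time.perf_counter()
    c = a @ b                        # dense matrix multiply
    elapsed = time.perf_counter() - start

    flop_count = 2 * n**3            # ~2n^3 floating-point operations
    print(flop_count / elapsed / 1e9, "Gflops achieved")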

1997 – ASCI Red

ASCI Red was launched in 1997 under the Accelerated Strategic Computing Initiative (ASCI) of the United States government and was the first supercomputer to deliver a teraflop of performance. Built by Intel and installed at the Sandia National Lab, it remained the fastest supercomputer in the world for four years and made Top500 lists seven times over a number of years.

2002 – Earth Simulator

Japan launched the Earth Simulator in 2002 to predict tectonic movements and create solutions to environmental challenges. Designed with the aim of creating a virtual Earth for modelling environmental simulations, the Earth Simulator was the most powerful supercomputer of its time, delivering thirty-two teraflops of performance while its closest competitor ran at just seven. It is the last remaining supercomputer to utilize classical vector processing.

5 Dongarra, J., Meuer, H., Simon, H., Strohmaier, E. (2015, November). The Top500 List and Progress in High Performance Computing. Retrieved from http://www.netlib.org/utk/people/JackDongarra/PAPERS/top500-progress.pdf
6 https://www.top500.org/lists/2019/06/highs/

2004 – IBM Blue Gene

In 2004, the IBM Blue Gene displaced the Earth Simulator, which had led the Top500 list for two years, as the world’s best supercomputer. Due to its capacity, it was used well beyond its original purpose of simulating protein folding and gene development for biologists.

The IBM Blue Gene revolutionized the world of supercomputing because developers realized supercomputers might reach a point where they consumed as much power as a mid-sized town. The Blue Gene/L was designed to use up to 212,000 low-frequency, low-power processors, significantly reducing its size, power consumption, and heat generation.

2008 – Roadrunner

Roadrunner, built by IBM in 2008, was the first supercomputer to reach a petaflop. It was the fourth-most energy-efficient supercomputer in the world and featured a hybrid design combining AMD processors with IBM PowerXCell processors. Roadrunner became obsolete five years after it was installed, highlighting the speed of advancement within the supercomputing industry.

2010 – Tianhe-1A

The Tianhe-1A system was the first Chinese system to take first place on the Top500 list. It was also the first system combining CPUs and GPUs to rank first on the list, achieving a performance level of 2.57 petaflops. Tianhe-1A was designed by the National University of Defense Technology in China and was used to address research problems ranging from petroleum exploration to the simulation of large aircraft designs.

2011 – K Computer

Named after ‘kei,’ the Japanese word for ten quadrillion, the K Computer was the first to exceed ten petaflops, approximately ten quadrillion flops. It was more powerful than the next five supercomputers on the Top500 list combined. It required a room roughly half the size of a football oval and over 1,000 km of cable, but it was energy-efficient for its size.

2018 – Summit

On the November 2018 Top500 list, the Summit system at the Oak Ridge National Lab was the fastest supercomputer in the world, with 191,664 CPU cores and 26,136 GPUs. It is currently the third-most energy-efficient supercomputer in the world and the first system to reach exaop speed, or exa operations per second. Developed by IBM, Nvidia, and Mellanox for the U.S. Department of Energy, Summit is largely used for civilian scientific research.


History of Supercomputing in Canada

1950s – The Birth of Supercomputing in Canada

Canada’s investment in ARC and the development of its applications can be traced back to 1952, when the first research computer in the country was installed at the University of Toronto as a joint initiative between the University and the National Research Council.7 The Ferranti Mark I, acquired for $300,000, was powerful enough to help design the St. Lawrence Seaway, and thereby fix the international boundary between Canada and its southern neighbour, as the US had no non-military computer suitable to the task. While there were a series of additional supercomputer installations during the 1980s, the facilities that emerged eventually shut down due to lack of funding. Each story is historically relevant as a milestone in Canadian HPC history, as together they characterize the problem with HPC in Canada at the time. As MacIsaac and Whitmore state, these were:

“single generation facilities, with at-best minor upgrades to the hardware before they disappeared; they had no national mandate to support and develop HPC throughout the country, and they never had an opportunity to develop and maintain staff to support the Canadian user community.”8

1962 – The Meteorological Service of Canada

The exception to this lack of government support was the computing resource maintained by the Meteorological Service of Canada, subsequently known as Environment Canada. In 1962, that organization acquired the first of its facilities, which have been used ever since to run weather predictions. While Environment Canada has maintained first-rate facilities and staff ever since, it has never had a mandate to support the broader scientific community.

1967 – University of Waterloo Red Room

In 1967, University of Waterloo computer science professor Wes Graham earned the moniker of “father of computing” when he advocated for, and eventually procured, an IBM 360 Model 75, the largest academic supercomputer in Canada at the time. The IBM 360 was housed in the university’s Mathematics and Computer building in a room designed with signature bright red floor tiles, which not only helped contain the wiring but became quite an attraction at the university. Today, the University of Waterloo still uses the iconic Red Room, red tiles and all, to house Graham, the supercomputer named after the father of computing and one of Canada’s most powerful academic computers.

1990s – The Beginning of a New Era

Canada had a low profile in the international ARC research community until the late 1990s. In 1997, Brian Unger of the University of Calgary submitted a Natural Sciences and Engineering Research Council of Canada (NSERC) grant application called “HPCnet.” The application boasted 49 signatories from 11 Canadian universities, spanning the country from Victoria, British Columbia to St. John’s, Newfoundland. It was the first time in Canadian history that a group of universities banded together to build a more sustainable strategy for developing the HPC ecosystem. HPCnet was intended to support access to existing HPC resources, to develop new tools for using and accessing the facilities, and to foster collaborations. It was awarded three years of funding at the level of $175,000 per year, beginning in 1996. A number of critical steps followed this award that laid the foundation for Canada’s modern-day approach to leadership and support of ARC.

7 MacIsaac, A.B. and Whitmore, M. (2008). High Performance Computing in Canada: The Early Chapters, p. 85.
8 MacIsaac, A.B. and Whitmore, M. (2008). High Performance Computing in Canada: The Early Chapters, p. 85.

Three key achievements took place in 1996 that launched Canada’s ARC ecosystem:

1. A group of academic researchers came together to administer the grant and award funding for support personnel and software development projects;

2. A broad community joined together, with members from the university, government, and private sectors; and

3. The national community of vested stakeholders set forth on an important visioning and planning mission, culminating in the creation of a new organization, C3.ca, and the publication of “A Business Case for the Establishment of an Advanced Computation Infrastructure for Canada.”

The lack of resources nationally was certainly an impediment, but one which the group of researchers at the time was determined to overcome. As an example, an AlphaServer 4100 from Digital Equipment Corporation (DEC), subsequently Hewlett-Packard (HP) Canada, was located at Memorial University of Newfoundland (MUN), and DEC and MUN committed to sharing this facility nationally. What began as a consortium of researchers was rapidly evolving into a community that was demonstrating it could successfully share resources across the country.9

C3.ca’s business case presented a plan for a national HPC infrastructure of hardware, software, and personnel, joined by a high-speed national network. Considered by many as overly optimistic, the business case presented a notional 7-year budget of approximately $225 million, covering all aspects.10

1999 – 2004 – The Growth of the Canadian HPC Consortia

Many of the early proposals submitted to the Canada Foundation for Innovation (CFI) were multi-institutional, resulting in the early forerunners of the current HPC consortia. Originally, there were seven regional consortia in Canada: ACEnet in Atlantic Canada, RQCHP and CLUMEQ in Quebec, HPCVL in Eastern Ontario, SciNet in Toronto, SHARCNET in Western Ontario, and WestGrid in the Western Canadian provinces. Between 1999 and 2004, these consortia received 12 major CFI awards amounting to over $100 million, with project costs in excess of $250 million overall.11 An important historical lesson is that each of these awards was, in part, attributable to the success of C3.ca and its members. However, the organization itself was not tied to the success or even the existence of any consortium. This left C3.ca free to carry out its primary mission of promoting the need to fund HPC research in Canada.

2007 – The Birth of Compute Canada

In 2007, after extensive consultation with C3.ca, the consortia, and the universities, the CFI created the National Platform Fund, targeted initially at HPC, to “provide generic research infrastructure, resources, services and facilities that serve the needs of many research subjects and disciplines, and that require periodic reinvestments because of the nature of the technologies.” The CFI invited a single, national proposal on HPC. The consortia responded with a proposal describing a structure that reflected the value and critical role that each consortium played at the time, along with a management and governance structure to ensure a truly national platform. In response, the CFI awarded $150 million of infrastructure and supported the formation of a new organization called Compute/Calcul Canada.

9 Ibid., p. 86.
10 Ibid., p. 86.
11 MacIsaac, A.B. and Whitmore, M. (2008). High Performance Computing in Canada: The Early Chapters, p. 86.

Funding of Canadian HPC Systems and Our Position Globally

The creation of the CFI helped spark a new era in ARC for Canada. The impact can be seen clearly in Fig 1, which shows the Top500 ranking of every Canadian system that has appeared on the list since its inception. The ’90s were largely dominated by government systems, including weather forecasting, and industry systems. That changed dramatically beginning in 2000-2001, when a burst of academic systems funded by the CFI began to be installed. Since the year 2000, just under 60% of Canadian entries have been academic systems located on university campuses or at affiliated research hospitals, such as Sick Kids and UHN in Toronto, with 46% of them in Ontario, 32% in Quebec, and 22% in the west.

In total, Canada has had 360 systems on the 53 lists published as of June 2019. This corresponds to 1.36% of all entries and is well under what might be expected from the Canadian GDP, which averaged roughly 2.1% of world GDP during the same period. This is consistent with other, independent analyses, such as Compute Ontario’s Technology Investment report, which showed that sales of HPC servers in Canada were roughly 2/3 of the G8 average in the period 2015-2018.
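The 1.36% share can be reproduced directly from the numbers in this paragraph (our arithmetic):

    lists, entries_per_list = 53, 500
    canadian_entries = 360
    share = 100 * canadian_entries / (lists * entries_per_list)
    print(round(share, 2))  # 1.36 (%)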

Fig 1 also shows the effect of episodic funding on the academic ARC ecosystem. The number of systems tails off as they age, and then a burst of new systems appears when new funding becomes available. The last two major rounds of CFI funding were the National Platforms Fund (NPF), which was completed in 2006 although funds only flowed at the start of 2009, and Cyberinfrastructure, which was awarded in 2015 with the first systems arriving in 2017. Unfortunately, this boom-or-bust cycle of funding inhibits integrated planning in the ARC ecosystem, as discussed in the Compute Ontario Technology Investment report.

November 2002 marked the first time that an academic system (at HPCVL, now CAC) was the top-ranked system in the country; federally-funded weather-forecasting systems had been the top-ranked Canadian systems for the nine previous years. Since June 2002, ten different academic systems, six from Ontario and two each from Quebec and BC, have ranked as the top Canadian system on all but two of the last 34 lists. The trajectories of these top-ranked Canadian academic systems can be seen in Fig 2.

Several of the systems in Fig 2 are noteworthy for various reasons. An early CFI-funded system was McKenzie at the Canadian Institute for Theoretical Astrophysics, University of Toronto. When it debuted at #38 in the world in June 2003, it was the highest-ranked academic system ever in Canada, despite being built for a total cost of just $900K, which makes it one of the most cost-effective systems ever to appear so high on the list. McKenzie was the first large-scale Canadian example of a “Beowulf” cluster built from commodity components, and it featured novel, locally-designed network topologies to boost performance.12 It and other systems of this era demonstrated the ARC innovation and capability that can come from experienced and empowered “small” sites.

The GPC at the University of Toronto remains the highest-ranked Canadian academic system ever, at #16 in June 2009, and the longest-lived Canadian entry, appearing on a total of 14 lists, including six as the fastest Canadian system. It was a workhorse for Compute Canada, providing 25% of all cycles used by Canadian researchers in the years 2010 through 2015, and it was finally retired in April 2018 after almost nine years of operation, having run 43 million jobs and delivered 1.9 billion hours of compute time. When installed, the GPC was the largest cluster in the world built on the latest Intel Nehalem CPU. It was also the largest GPFS cluster in the world and was used by IBM for years as a reference site for the scalability of the filesystem.
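Taken together, the GPC figures above imply an average of roughly 44 compute-hours per job (our arithmetic, not a figure from the report’s sources):

    jobs = 43e6            # jobs run over the system's lifetime
    compute_hours = 1.9e9  # total compute time delivered
    print(compute_hours / jobs)  # ~44.2 hours per job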

Quebec has hosted a long series of successful systems, with Mammouth parallel (Mp) at the Université de Sherbrooke another highly-ranked entry at #40. It was a long-lived system that was top-ranked in Canada for all six of its appearances on the Top500 list. Its successor, Mp2, appeared nine times but, due to stiff competition, ranked first only on its debut at #41. In the west, Glacier at UBC was the top Canadian system for two lists beginning in November 2003, tied McKenzie’s #38 ranking in June 2004, and appeared a total of seven times.

While the CFI was clearly a boon for ARC in Canada, another system that stands out for ranking and longevity was the SOSCIP BlueGene/Q (BGQ), which was funded by FedDev and installed at the University of Toronto in 2012. This was the only Canadian installation of IBM’s unique Blue Gene series, which emphasized massive parallelism, high-speed and high-dimensional interconnects, and world-class energy efficiency. Expanded in 2014 to 66,536 cores, the BGQ was Canada’s fastest system for nine lists in a row, a record which has never been matched.

12 Dubinski, J., Humble, R., Loken, C., Martin, P., Pen, U.-L. (2003). McKenzie: A Teraflops Linux Beowulf Cluster for Computational Astrophysics.


Table 1. All top-ranked Canadian systems from Nov 2002 onward. Prior to this date, all top-ranked systems were federal weather/climate systems.

System Name | Site (consortium) | # of Top500 appearances | # lists as top-ranked in Canada | Top ranking worldwide | Date | Sector
Fire | Queen’s (HPCVL) | 2 | 2 | 191 | Nov 2002 | Academic
McKenzie | U Toronto (CITA) | 5 | 1 | 38 | Jun 2003 | Academic
pSeries 690 | IBM | 1 | 1 | 29 | Nov 2003 | Industry
Glacier | UBC (WestGrid) | 7 | 2 | 38 | Jun 2004 | Academic
Mammouth parallel (Mp) | U Sherbrooke (RQCHP) | 6 | 6 | 40 | Jun 2005 | Academic
eServer pSeries | Environment Canada | – | – | 154 | Nov 2008 | Weather/Govt
TCS | U Toronto (SciNet) | 6 | 1 | 54 | Nov 2008 | Academic
GPC | U Toronto (SciNet) | 14 | 6 | 16 | Jun 2009 | Academic
Mammouth parallel 2 (Mp2) | U Sherbrooke (Calcul Québec) | 9 | 1 | 41 | Nov 2011 | Academic
BGQ | U Toronto (SOSCIP/SciNet) | 12 | 9 | 67 | Nov 2012 | Academic/industry
Cedar | SFU (WestGrid) | 5 | 2 | 86 | Jun 2017 | Academic
Niagara | U Toronto (SciNet) | 3 | 3 | 53 | Jun 2018 | Academic


Fig 1. Canadian systems in the Top500 list over the years. Every Canadian entry on the Top500 list is plotted and colour-coded by sector (academic, weather/government, and industry). The world ranking of systems runs from #1 at the top to #500 at the bottom; the higher, the better. The system in the top-left corner is the highest-ranked supercomputer ever located in Canada, a NEC vector system installed by the federal Atmospheric Environment Service (AES) in June 1993. It is evident that the highest-ranked academic systems cluster near 2003, 2009, and 2017, and then drop with time, reflecting the vagaries of government funding.


Fig 2. Top-ranked Canadian systems on the Top500 list. The top-ranked Canadian systems after June 2002 and their histories on the Top500 list are traced. The world ranking of systems runs from #1 at the top to #500 at the bottom; the higher, the better. Prior to this period, the federal government’s weather and climate systems had always been the top-ranked Canadian systems. HPCVL (Fire), McKenzie, TCS, GPC, BGQ, and Niagara were all Ontario-based systems; Glacier and Cedar were BC systems; Mp and Mp2 were based in Quebec. Labels show the rankings of the Ontario-based systems.

Canada’s ARC Ecosystem

As of 2019, Canada has five national systems: Arbutus at the University of Victoria, Graham at the University of Waterloo, Cedar at Simon Fraser University, Niagara at the University of Toronto, and Béluga, the most recent addition in 2019, located at the École de technologie supérieure and operated by Calcul Québec members. Canada has a number of supercomputers, of which five appear on the June 2019 Top500 list. Together, these systems are the foundation of Canada’s advanced research computing infrastructure. This ecosystem provides Canadian researchers with the ability and tools to develop innovative products and services, push the boundaries of research, and engage with the international research community. Niagara debuted at #53 on the Top500 list in June 2018, while Béluga ranked 14th on the Green500 list of June 2019.


Canadian DRI Partners

Before proceeding, it is important to introduce other key organizations in Canada’s digital research infrastructure ecosystem. The list is by no means exhaustive, and any exclusion is not intended as a slight but is rather an editorial decision to capture the organizations Compute Ontario routinely interacts with to better serve researchers.

CANARIE

In 1993, CANARIE was formed to create a leading-edge national network for Canadian researchers. Celebrating its 25th anniversary, CANARIE and its 12 provincial and territorial partners form Canada’s National Research and Education Network. These partner organizations are responsible for network installations within specified geographic boundaries. This ultra-high-speed network connects Canada’s researchers, educators, and innovators to each other and to global data, technology, and colleagues.

Beyond the network, CANARIE funds and promotes reusable research software tools and national research data management initiatives to accelerate discovery, provides identity management services to the academic community, and offers advanced networking and cloud resources to boost commercialization in Canada’s technology sector. CANARIE’s 2015-2020 strategic mandate includes:

● Providing an internationally competitive ultra-high-speed network for Canada’s research, innovation, and advanced education communities;
● Developing, demonstrating, and implementing next-generation technologies; and
● Assisting firms operating in Canada and Canadian institutions to advance innovation and commercialization of products and services to bolster Canada’s technology capabilities.

ORION

ORION is Ontario’s only provincial research and education network. Covering 6,000 kilometres, its private network connects regions and over a hundred institutions across the province, including universities, colleges, hospitals, and research institutions, as well as many of Ontario’s school boards. More than two million people in the research and education sector rely on ORION to share and communicate with each other and to connect to a global grid of similar networks across Canada and around the world. ORION’s ultra-fast network can run 50,000 concurrent virtual classrooms. It provides cutting-edge 100 Gbps speed, and over 70% of its network will be upgraded to this speed by the end of 2020. That is 2,000 times faster than the broadband internet available in most Ontario homes.
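Taken at face value, that comparison implies a home-broadband baseline of about 50 Mbps (our arithmetic, not a figure from ORION):

    orion_bps = 100e9             # 100 Gbps
    home_bps = orion_bps / 2000   # the "2,000 times faster" claim
    print(home_bps / 1e6)         # 50.0 (Mbps)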

The Canada Foundation for Innovation (CFI)


With the aim of building competitiveness and encouraging and funding research infrastructure, the Government of Canada launched the Canada Foundation for Innovation (CFI) in 1997. Created through an Act of Parliament in April 1997, the CFI has worked to ensure Canadian researchers have the tools they need, such as cutting-edge labs, facilities, and equipment, to push the frontiers of knowledge in all disciplines and to contribute to the full spectrum of research, from discovery to technology development. This has allowed Canada’s brightest minds to contribute to better health outcomes, a cleaner and greener environment, evidence-based policy-making, and the competitiveness of Canadian businesses.13 As described on the CFI’s website, motivated by the mantra “Build it, and they will innovate,” the CFI has been instrumental in providing the funds needed to grow Canada’s HPC and ARC ecosystem. For almost a decade, the CFI has sponsored cyberinfrastructure competitions, which have resulted in the five national systems described earlier. The CFI provided leadership and funding at a critical time in Canada’s supercomputing history, laying the foundation upon which the sector will grow in the coming years.

Compute Canada

Compute Canada, a national not-for-profit organization funded by the CFI, was launched to accelerate and consolidate the ARC ecosystem in the country. Compute Canada works in partnership with the following regional organizations to provide essential services and infrastructure for researchers and to further their collaborations across all academic and industrial verticals:

• ACENET: the Atlantic Computational Excellence Network, a consortium of Atlantic Canadian universities
• Compute Ontario: established in 2014, serving the ARC requirements of Ontario
• Calcul Québec: a research consortium of 9 members serving all Quebec universities and colleges
• WestGrid: a regional partner for 15 institutions across British Columbia, Alberta, Saskatchewan, and Manitoba

Through this partnership across regional organizations, Compute Canada accelerates research and innovation across the country. Together, the regional organizations and Compute Canada work toward a comprehensive framework that supports Canadian researchers. Canadian researchers access these resources and infrastructure by applying to Compute Canada and the respective regional organizations through an annual process called the Resource Allocation Competition (RAC). Non-academic users can be granted access on a case-by-case basis.

13 Our History – CFI. Retrieved from https://www.innovation.ca/our-history


Components of the ARC ecosystem

Like most developing sectors, ARC and DRI will require years, if not decades, of intensive policy development, capital investment, and infrastructure creation to build an ecosystem that promotes innovation and research. Canada needs to further develop its digital strategy and strengthen its digital economy in order to enjoy success on a global scale. The key ingredients are human capital, technology, seamlessly integrated systems, and public policy. It is therefore imperative for Canada to work towards building the following:

● Highly Qualified Personnel (HQP): Researchers require support in using computational power effectively and efficiently. It is therefore necessary for each facility providing computational power to have a strong support system of programmers, analysts, and system administrators, among others, to facilitate research. Currently, Canada does not meet its human capital demands, as HQP require extensive training and skill sets, and they often seek employment opportunities south of the border, where infrastructure and opportunity are more readily available.14 Access to this training is limited by the small number of centres within the country. Economies around the world that are investing in computational power are also investing heavily in the development of HQP, and Canada is lagging due to its historic lack of a focused approach to building this competency.

● Hardware and software / Technology: In addition to supercomputers, researchers need many processors, access to large memory, sufficient network capacity, and the right applications in order to get results as quickly as possible. The ability to translate and interpret data, often through the use of visualization technology, is a critical part of deriving insights into a given problem.

● Data storage and availability: Many of the most complex problems attacked by researchers require equally complex, and large, datasets. This demands significant bandwidth and capacity for storage and working memory.

● Seamlessly integrated systems: Researchers and students need a seamless and open line of communication with HQP, support staff, and supercomputing facilities to enable research and innovation. This completes the ARC ecosystem, connecting the various dots within the system. In many ways, the seamless integration of systems and HQP is the raison d’être for Compute Ontario. Enhancing access to systems and growing the HQP in Ontario and across Canada are the lifeblood of an effective ARC ecosystem.

● Public Policy: ARC and the components mentioned above require an environment that is conducive to innovation and disruption. To ensure the smooth functioning of these systems, Canada and Ontario need to put strong frameworks in place for policy, legislation, and regulation.

14 Silcoff, S. (2018, May 3). Retrieved from https://www.theglobeandmail.com/business/technology/article-canada-facing-brain-drain-as-young-tech-talent-leaves-for-silicon


Compute Canada’s Challenges

The promise of Compute Canada came with incredibly high expectations and optimistic scenarios. Compute Canada’s mission is: “To make Canada a world leader in the use of advanced computing for research, discovery, and innovation.” Its mandate is: “To enable excellence in research and innovation for the benefit of Canada by effectively, efficiently and sustainably deploying a state-of-the-art advanced research computing network supported by world-class expertise. And to use this network to support a growing base of excellent researchers, and to serve them as a national voice for advanced research computing.”

While Compute Canada succeeded in many respects, a series of factors, including the changing research and innovation sector and a failure to adapt, led to a change in direction. The needs of the ecosystem in Canada evolved over time, requiring stringent yet agile policy changes catering to the individual needs of each region and sector. Compute Canada adopted a top-down leadership style that was a suboptimal fit with stakeholders accustomed to a high degree of autonomy, including the provinces that were covering 60% of Compute Canada’s expenses. Compute Canada’s critics argued that it failed to serve the researchers who play the most important role within the ecosystem. On May 10, 2013, amid mounting frustration, researchers posted a petition that acquired 238 signatures expressing a lack of confidence in Compute Canada.15

We now step away from a pure exposition of history and context to take stock of the lessons learned from Canada’s supercomputing history.

It is crucial for the new national organization being formed to collaborate closely with researchers to ensure that the organization, and the systems being created, truly serve their needs. Many bright spots exist in Canada’s DRI ecosystem and can be built upon in the ecosystem of the new national ARC organization. For example, the HQP who reside in and among the regional consortia continue to serve researchers with dedication and unparalleled expertise. These HQP are seen as the backbone of the DRI infrastructure; recognition of their vital role, along with change-management support, should be considered by the new organization.

Another important development was the creation of TECC (Technical Experts of Compute/Calcul Canada), which predates the incorporation of Compute Canada but was actively supported and expanded over the course of Compute Canada’s existence. TECC is a group of HPC experts who have agreed to work together to better support the Canadian HPC community and to provide guidance to national and regional groups that require access to personnel with high-level technical skills. Members of this group work together, developing standards and exchanging information, in order to maintain the hardware and software used by computational researchers across Canada.

A prominent voice for researchers and TECC in the formation of the new national ARC organization will help avoid some of the missteps of the past. One of the key successes of C3.ca was the creation and publication in 2005 of the Long Range Plan (LRP) for HPC in Canada, which provided a vision of a sustained and internationally competitive infrastructure for computationally-based research in Canada. One of the hallmarks of a strategically prophetic and compelling document such as the Long Range Plan (2005) and its follow-on, Renewing the Long Range Plan for HPC in Canada (2007), is the fact that most, if not all, of its recommendations have stood the test of time. By the same token, a dozen years later, the fact that these recommendations continue to apply also demonstrates how much time has been lost that could otherwise have represented real progress for HPC in Canada.

15 Wadsley, J. (2013, May 10). Retrieved from http://www.ipetitions.com/petition/restore-confidence-in-compute-canada/

It is important to acknowledge that one of the challenges the CFI encountered was the requirement to use a funding mechanism that was entirely appropriate for administering research-based competitions but less effective for funding shared, distributed infrastructure. While the initial culture and successes of C3.ca were based on a collaborative ethos of sharing and broad access to computing capacity, competitions for funding did not engender the kind of collaboration that C3.ca espoused. Accepting that resources are limited and that merit-based competitions are the hallmark of any grant-funding agency, applying a similar funding model to infrastructure requires re-examination, with new alternatives for consideration by the new national organization.

As Andrew Pollard, Chair of C3.ca, stated: “We were excited by the potential, challenged by political obstacles, but united in our focus and resolve. This was about Canadian researchers, built by Canadian researchers for Canadian researchers, and guided by a quintessentially Canadian ideal of collaboration and sharing.”16

Innovation, Science and Economic Development (ISED)

In its 2018 budget announcement, the federal government of Canada committed to:

● “Provide $572.5 million over five years, with $52 million per year ongoing, to implement a Digital Research Infrastructure Strategy that will deliver more open and equitable access to advanced computing and big data resources to researchers across Canada”; and to

● “work with interested stakeholders, including provinces, territories, and universities, to develop the strategy, including how to incorporate the roles currently played by the Canada Foundation for Innovation, Compute Canada and CANARIE, to provide for more streamlined access for Canadian researchers.”

The Leadership Council on Digital Research Infrastructure (LCDRI) is a community-driven organization that seeks to provide a unified voice for Canada’s digital research infrastructure community. The LCDRI was tasked by the federal Minister of Science and Sport with preparing a series of position papers as input to ISED’s work.

The LCDRI released a series of position papers outlining options for improved coordination and equitable funding models, specifically for the research data management (RDM) and advanced research computing (ARC) components of DRI. As a step towards delivering this strategy, on April 6, 2019, Innovation, Science and Economic Development (ISED) released a call for proposals to create a new organization to deliver “open and equitable access to advanced computing and big data resources to academic researchers across Canada in order to further enable scientific and research excellence.” The promise of a new organization to oversee and provide leadership in promoting and financially supporting digital research infrastructure growth in Canada is simultaneously exciting and daunting. In some ways, this new organization is emerging in an environment similar to that in which Compute Canada arose, with the community looking to the new organization as a panacea for all the systemic ills of the recent past.

16 Pollard, A. (2019), in conversation with Nizar Ladak.

The formation of the new national organization is an opportunity to reset certain initiatives and to re-establish design principles that return Canada’s DRI ecosystem to its origins: origins that placed researchers at the core and that were intended to equip them with the tools for innovation and to promote intellectual and economic growth.

Principles to Guide a National DRI Strategy

In its submission to the Ontario Ministry of Research, Innovation and Science (MRIS) and to Innovation, Science and Economic Development Canada (ISED), Compute Ontario asserted the importance of an organizing framework founded upon an agreed set of principles. These principles should support a modern and globally competitive DRI ecosystem in Canada and reflect not only the type of system we would like to design but also mechanisms for enabling accountability and giving voice to a diverse community. With this in mind, Compute Ontario suggested the following principles as starting points for discussion. We should aspire towards solutions that are:

o Researcher-Centred
▪ Enabling best-in-class research will be central to the goals of Canada's DRI system;
▪ Researchers and those who represent their interests, both within the public and the private sphere, will have a voice in identifying sector needs and setting priorities; and
▪ Researchers will experience a comprehensive system and seamlessly draw on expertise, tools, and services with the understanding that the system has been designed to foster world-class scholarship and innovation.

o Universal, Equitable, Accessible
▪ Researchers across Canada should have access to the appropriate DRI expertise and tools required to carry out their work regardless of their research domain or where they reside.

o Forward-looking, Adaptive, and Nimble
▪ Capable of responding to emerging trends in research and technology in an efficient and timely manner; and
▪ Continuously developing capacity and supporting the skills development of the highly qualified personnel required to sustain the system.

o Collaborative and Integrated
▪ Promotes collaboration while acknowledging and respecting the diversity and autonomy of key stakeholders within the ecosystem; and
▪ Results in a comprehensive national system which can also be leveraged to meet regional research and innovation needs.

o Sustainable
▪ Funding will be available in a predictable and comprehensive way to facilitate long-range strategic planning of both people and tools;
▪ Cost sharing is equitable and proportional to the research requirements in a region; and
▪ Accountability will be shared through our common understanding of achievable performance and desired outcomes.

Improved Federated Coordination

In its simplest form, a federated approach acknowledges the different strategic goals, and means, across different stakeholders. Furthermore, while a national strategy is clearly essential, the past two decades show us that many functions can be delivered very effectively at a local/regional level as long as there is appropriate coordination.

A federated approach allows us to build on our past successes. Some functions may be appropriate to coordinate at a national level, while functions such as site operations, procurements, and the training of emerging researchers have been, and can continue to be, delivered in a decentralized way. Regional organizations have demonstrated their value proposition in helping to deliver these services in close collaboration with universities and institutions. Therefore, we submit that in a national federation, regional organizations play an important role. As not-for-profit corporations, regional organizations adhere to legislative requirements on governance and reporting obligations while seeking to meet the needs of our ecosystem in an accountable way.

A national DRI strategy serves researchers and innovators across the country and is co-funded by multiple provincial governments. In order to respect the autonomy and financial contributions of our provincial and institutional partners, federated governance is a necessity.

To realize a world-class DRI ecosystem in Canada, national approaches towards a DRI strategy might aspire to:

● Reflect principles which promote pan-Canadian ideals on research and a 21st-century DRI ecosystem that is modern and sustainable;
● Recognize the importance of a federated approach to coordination which applies those principles through appropriate governance tools and processes, and recognizes, and clearly defines, the appropriate roles and accountabilities for all participants in the ecosystem;
● Apply approaches to funding which are grounded in the same set of principles and foster appropriate investments nationally, regionally, and across sectors as required;
● Respect the existing strength and diversity of expertise within the DRI sector; and
● Unequivocally promote a collaborative approach.

Ontario’s ARC Ecosystem

Compute Ontario

Compute Ontario is an independent, not-for-profit organization that acts as a coordinator for ARC-related strategy development and investments in Ontario.


As the recent history of the ARC ecosystem shows, collaboration is critical to the advancement of research and innovation infrastructure. Compute Ontario acts as a relationship builder in Ontario's DRI ecosystem through regular and transparent communication amongst DRI stakeholders.

Compute Ontario works towards the following strategic goals for ARC in Ontario through collaborations with many partner organizations:

● Serve as a credible voice regarding policy
● Contribute to efforts to promote ARC and its use, provincially and nationally
● Coordinate and support the advanced computing needs of Ontario's academic research community and other stakeholders
● Coordinate Ontario's efforts to develop, retain, and increase highly qualified personnel to support ARC
● Build trust and serve as a focal point for connecting communities

The consortia served by Compute Ontario provide ARC resources such as access to HPC, hardware and software resources, data storage and management, and, most importantly, access to highly qualified personnel who can assist researchers in using the systems. These consortia are also affiliated with Compute Canada and offer clients access to its resources and computing power, and they have played a crucial role in developing and training future highly qualified personnel.

An impact assessment conducted by The Evidence Network (TEN) indicated that the consortia play a critical role for researchers looking to leverage ARC resources in Ontario.17 These organizations are succeeding in improving the knowledge, capabilities, and performance of researchers across the province. Collectively, the consortia delivered 25,321 hours of teaching to over 9,000 participants in 2018, a 50% increase over the previous year in both teaching time and the number of participants at training events. Ontario continues to deliver over 70% of the training programs credited toward Compute Canada nationally. Each of the consortia has played a crucial role in Ontario's success in the global science community.

The Ontario Supercomputing Consortia

• Centre for Advanced Computing (CAC): Led by Queen's University, the CAC provides a high-availability, high-security environment that has made it home to dozens of health research teams from across Canada working with personal health information. This reputation for security has also garnered the trust of the Indigenous community and led to several groundbreaking digital humanities projects with First Nations teams.

While supporting the university research community for over 20 years, the CAC has also become Canada's leader in industry engagement and is rapidly building an international reputation as it leverages skills in advanced computing, deep analytics, machine learning, and artificial intelligence (AI). One area of particular focus is work with "smart cities," where the combination of big-data experience and best-of-breed privacy and security is resulting in job creation and improved economic outcomes for the CAC's partners.

17 The Evidence Network (TEN), (2019, March), Impact of Compute Ontario and ARC Consortium Partners, p.23

● HPC4Health has been instrumental in furthering innovation in Ontario's healthcare system. It has promoted research in almost every healthcare discipline, with studies ranging from genomics to medical imaging. A consortium of health providers working together to build the next-generation compute engine for clinical research, HPC4Health focuses on managing the "data deluge" of information and the governance of personal health data, and on translating this data into better products and services that benefit patients. Initiatives funded by HPC4Health and Compute Ontario, such as the Health Artificial Intelligence Data Analysis Platform (HAIDAP), directly accelerate Compute Ontario's goal of advancing Ontario's research capability and competitiveness, and its strategy for supporting health data research.

● SciNet, led by the University of Toronto, is Canada's largest supercomputing centre. It provides researchers with computational resources and expertise on scales that were not previously possible in Canada, and it houses Niagara, the largest and most powerful supercomputer in the country. SciNet, along with SHARCNET, plays an integral role in building Ontario's HQP workforce through training programs and summer schools. Its technical expertise has made it one of Ontario's early adopters of new, complex technologies in the ARC ecosystem, such as quantum computing. SciNet's support expertise, state-of-the-art systems, and ability to integrate new technology into the ecosystem help advance Compute Ontario's objective of establishing Ontario as a global hub for ARC and attracting world-class researchers.

● SHARCNET, the Shared Hierarchical Academic Research Computing Network, comprises 18 colleges, universities, and research institutes. Together, they operate a network of high-performance computing clusters spanning 1,800 km across south-western, central, and northern Ontario. SHARCNET acts as the institution-based platform for academic access to high-performance computing and for onboarding universities and colleges for HPC usage, HQP training, and hardware distribution. Its operational philosophy is to build HPC culture by connecting users to a community of excellence, providing resources to accelerate computational academic research, and developing entry-level users into computationally intensive users. This directly accelerates Compute Ontario's strategic priority of advancing the region's academic research community. Increasing academia's research capacity can advance Ontario's ARC ecosystem by developing HQP, increasing ARC pervasiveness, and adding infrastructure to the region.

SOSCIP – Key Collaborator in Ontario’s ARC Ecosystem

In order to further innovation in Ontario, the Federal Economic Development Agency for Southern Ontario (FedDev Ontario) and the Province of Ontario funded several academic institutions, with IBM as the industry partner, to launch the Southern Ontario Smart Computing for Innovation Platform (SOSCIP) in April 2012. This consortium worked closely with Ontario Centres of Excellence (OCE) and many small and medium-sized enterprises (SMEs) across the province to pair researchers with ARC infrastructure to solve industry and social challenges that drive economic growth. This helped fuel Canadian innovation in areas including agile computing, health, water, energy, mining, advanced manufacturing, and cybersecurity.18 As of 2018, SOSCIP had helped launch 145 industry-academic collaborative projects and had connected more than 200 Canadian companies with new data science partnership opportunities.19 Compute Ontario and SOSCIP signed a memorandum of understanding in 2015, establishing themselves as strategic partners in industry engagement within Ontario's ARC ecosystem. The alliance has been extraordinarily collaborative, mutually supporting and benefiting each organization. Board members are cross-appointed, enabling consistent messaging and strategic governance advice for both organizations.

Where Does Ontario Go From Here?

A Hyperion study commissioned by Compute Ontario in 2018 indicated that, among the G8 economies, the Canadian ARC ecosystem is significantly underdeveloped. Canada ranks last in the G8 in ARC-related spending20 and will require further investment to be internationally competitive. As of November 2017, China had 227 of the world's fastest supercomputers, and the US had 109.21 These supercomputers have significantly contributed to the long-term growth of their respective economies by catalyzing innovation projects. The G8 countries, with the exception of Canada, continue to make large-scale investments in their ARC ecosystems, which is reflected in their standings on the Top500 supercomputers list.

In 2018, there were a total of 13,540 users of the ARC ecosystem in Canada, which led to 1,948 industrial R&D collaborations, 8,158 publications, and the formation of 479 spin-off companies.22 Users recorded 251 patents in 2017 alone. However, the economies closest to Canada in size have allotted more resources to their ARC ecosystems. Ontario will equally need to focus on building a comprehensive and integrated ARC ecosystem that is supported by HQP, strong policy and legislative systems, and hardware and software resources.

A Way Forward for Ontario

The Evidence Network's (TEN) impact study suggested that, going forward, Compute Ontario should aim to extend its collaborative approaches vertically, beyond Compute Canada and the CFI, and laterally, beyond its consortia. This shift of focus has led Compute Ontario to look for opportunities to collaborate with industry, including the establishment of its Industry Engagement Committee. Other stakeholders Compute Ontario aims to engage with include ORION and CANARIE on cybersecurity initiatives, as well as other organizations that focus on Research Data Management (RDM). Compute Ontario will continue to act as a relationship builder between academia, not-for-profits, government, and industry in order to encourage collaboration among stakeholders and advance Ontario's ecosystem as a whole.

18 Who We Are – SOSCIP. Retrieved from https://www.soscip.org/who-we-are/
19 SOSCIP Impact Report (2018). Retrieved from https://www.soscip.org/wp-content/uploads/2017/08/soscip_impactreport2018_pages.pdf
20 Conway, S., Joseph, E., Norton, A., Sorensen, B. (2018, May), Phase 1: Study to Support Compute Ontario's ARC Planning
21 Wallace, N. (2019, Jan 14). Retrieved from https://sciencebusiness.net/news/commissions-new-eu92b-procurement-programme-aims-grab-eu-lead-supercomputing
22 Compute Canada Annual Report 2017–2018. Advanced Research Computing – Achieving More Together. Retrieved from https://www.computecanada.ca/wp-content/uploads/2018/12/ComputeCanada-AR2018-EN.pdf

More importantly, a collaborative and consensus-building model of leadership has served the organization well and will be a prerequisite for success as Compute Ontario navigates the following challenges and opportunities in its environment.

DRI Environment – Challenges and Opportunities

ARC is an essential tool for scientific discovery, innovation, and, in turn, economic development. ARC can lead to critical scientific advancements, new applications, research possibilities, and improved public policies. However, a lack of awareness of ARC, and of why Ontario needs to develop a robust and sustainable support system for it, has arguably stalled portions of Ontario's innovation economy. This lack of awareness in the country and among policy makers has resulted in unpredictable government funding, which in turn has produced shortages of HQP and gaps in technology and data governance systems. Below is a brief overview of these challenges and of how Compute Ontario is working to realize the corresponding opportunities through initiatives and collaborations.

Highly Qualified Personnel

• Challenges: HQP are critical for leveraging Ontario's economic and social advantage. The C3.ca Long Range Plan recommended a conservative target of 25% of capital investments dedicated to the development of HQP in Canada; by comparison, most of Canada's global competitors invested between 20% and 50% of their investments in the development of HQP.23 The report also attributed the "scalability gap" in Canada's research and innovation infrastructure to the lack of HQP. Compute Ontario's technology investment study, conducted in 2018 by R.A. Malatest and Associates, suggested that over 20,000 HQP positions will go unfilled in Ontario over the next five years.24 Greater awareness of HQP career opportunities, and more programs that teach the skills and competencies that produce HQP, are needed to address this challenge. Indeed, C3.ca's follow-on report, published in 2007, stated: "The development of highly qualified personnel constitutes the most effective form of technology transfer."

• Opportunities: Compute Ontario has the opportunity to advance Ontario's position as a leader in HQP development and attraction. Compute Ontario can leverage existing training efforts in the province by recognizing initiatives such as consortia-led summer schools as essential to HQP development and advocating to increase their training capacity, by leveraging and scaling online training initiatives such as webinars, and by continuing to deliver customized training programs such as the OPS Hackathon that it led. Compute Ontario can also raise awareness of HQP skills and competencies through educational networking, promoting the development of HQP at all levels of education and aligning the skills employers value with HQP training curricula.

23 Building Canada's Future Research and Innovation Culture. p.5
24 R.A. Malatest & Associates Ltd., (2018, April), Highly Qualified Personnel Study

Technology

• Challenges: Infrastructure and funding are key to enabling high-quality research in Ontario. Having up-to-date infrastructure is important, but so are the funding models that enable it; exploring new models such as industry-based cloud computing, along with more sustainable, predictable funding cycles, will be important as Ontario's DRI ecosystem continues to advance.

• Opportunities: Compute Ontario, through its role as a trusted voice on policy and a relationship builder, is in a position to support better ARC funding models characterized by more sustainable and predictable funding cycles. By promoting the scaling of successfully applied research methods such as ARC-AI convergence and smart-city pilots, Compute Ontario has an opportunity to support greater funding for ARC infrastructure for innovative research and economic development. Government-funded credits for access to cloud computing can rapidly enable access to state-of-the-art technology, and Compute Ontario is in a position to explore this new model of utilizing privately owned platforms to conduct innovative research for social good.

Data Governance

• Challenges: As more and more sectors of Ontario's economy become data-driven, data models need to be established that support the availability, usability, integration, and security of data. Increased use of, and access to, data governance models has set the stage for Compute Ontario to leverage its collaborative and strategic abilities to support frameworks and platforms that enable open data and greater data access.

• Opportunities: The need for superior data governance models allows Compute Ontario to support and enable frameworks that promote the access and use of big data. By leveraging and scaling projects such as HAIDAP and the Smart Cities pilots, Compute Ontario is able to articulate how policy and pilot projects can converge to enable big data use for social and economic development. With more sectors looking to leverage data, access to raw and curated data is critical to achieving Ontario's innovative and economic potential, and Compute Ontario is well positioned to enable these opportunities.

The Emerging Future

Canada has a strong history of technological leadership. The Advanced Research Computing sector has evolved considerably internationally, in Canada, and within Ontario. Internationally, in 2021, a short three years from the time this report was written, Argonne National Laboratory will take delivery of a $500 million exascale25 supercomputer developed by Cray Inc. and Intel Corp. Known as Aurora, the basketball-court-sized machine will use more than 200,000 processor cores, burn multiple megawatts of power, and perform a quintillion calculations in a second.
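
To put those figures in perspective, a back-of-envelope calculation can be made. It is a rough sketch only: it assumes that "a quintillion calculations in a second" means one exaFLOPS, i.e. 10^18 floating-point operations per second, and takes the quoted 200,000-core count at face value.

\[
\frac{10^{18}\ \text{FLOPS}}{2 \times 10^{5}\ \text{cores}} = 5 \times 10^{12}\ \text{FLOPS per core} \approx 5\ \text{teraFLOPS per core}
\]

Roughly five teraFLOPS per "core" is accelerator-class performance, which hints that such counts refer to wide vector or GPU-style compute units rather than conventional CPU cores.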

While the infrastructure is clearly important, and at times can be perceived as an end in itself on global Top 500 lists, the exciting future is about the nature and type of research that will be conducted on such machines.

One can only imagine that we will find cures for diseases that have devastated families, or that researchers will not only halt climate change but reverse its deadly impact on our planet. Will researchers discover new galaxies, or new planets which humans can inhabit? Of course, any number of possibilities exist, and at the same time it is nearly impossible to say for certain what kinds of discoveries will occur. However, what an exciting future it can be when exploring the possibilities. Without question, big data and growing compute resources represent a potential for researchers that is limited only by human imagination. Compute Ontario is committed to supporting researchers, industry, and government to ensure human imagination is the only limit to realizing that exciting future.

25 Exascale computing refers to computing systems capable of at least one exaFLOPS, or a billion billion (i.e. a quintillion) calculations per second.