The CRAY-X1 Supercomputer CS-350-2 Spring 2004 Kevin Boucher Brian Femiano Sara Prochnow Allen Peppler


The CRAY-X1 Supercomputer

CS-350-2

Spring 2004

Kevin Boucher
Brian Femiano
Sara Prochnow
Allen Peppler


Table of Contents

Cray History

Introduction of the X1

Operating System

Cray X1 IRIX 6.5 Implementation Features

RAID System on Cray X1

The C-brick

The S-brick

Nodes

Multi-Chip Modules (MCM)

    Cooling

Single-Streaming Processors (SSP)

    Instruction and Data Caches

ECache

Memory

    Global Addressability

    Word Size

    Modes

Cabinets

Programming

Applications

Summary

APPENDICES

Questions and Answers for Review (Appendix A)

Bibliography (Appendix B)

Work Summary Sheet (Appendix C)


Cray History

Seymour Cray is responsible for a great deal of supercomputer history, and some people go so far as to call him the father of supercomputers. Cray’s mission in life was to create the world’s fastest computer. In 1957, he founded the Control Data Corporation with William Norris and in 1958 they developed the first fully transistorized supercomputer, the CDC 1604. After the CDC 1604, Cray designed the CDC 6600, which used 60-bit words and parallel processing. This computer demonstrated RISC (Reduced Instruction Set Computing) design and was at least forty times faster than the 1604.

In 1972, Cray and Norris had a dispute over a new computer that Norris had put on hold. The disagreement caused Cray to leave Control Data Corporation and found a new company called Cray Research. In 1976, Cray designed the CRAY-1, which delivered 133 megaflops, and he followed this achievement in 1985 with the CRAY-2, which delivered 1.9 gigaflops (Bellis). This second system had the world's largest central memory, with a capacity of up to 2048 megabytes (Long). These two computers were the fastest supercomputers of their time.

In 1989, Cray had a dispute with management at Cray Research after they put the Cray-3 on hold. This disagreement led to the founding of Cray Computer Corporation, where Cray could develop the Cray-3. The Cray-3 delivered 4-5 gigaflops, which made it the fastest supercomputer when it was introduced. It was based on 1 GHz gallium arsenide (GaAs) processors, whereas other processors of the time were conventional silicon that topped out at 400-500 MHz. The Cray-4 followed the Cray-3 and was similarly based on gallium arsenide. It was twice as fast per node as the Cray-3 and smaller than the human brain.

In 1995, the Cray Computer Corporation went bankrupt amid economic circumstances beyond its control. The Cray-3 and Cray-4 had minimal sales. After the bankruptcy, Cray founded another company, SRC Computer Labs, to begin building a new computer. Unfortunately, Seymour Cray was killed in an automobile accident a year later, leaving his plans unrealized.

Besides Cray’s development of some of the fastest supercomputers, he also invented and contributed to several technologies used by the supercomputer industry. These include the CRAY-1 vector register technology, immersion cooling technology, gallium arsenide semiconductor technology, and RISC architecture.

Introduction of the X1

The death of Seymour Cray was not the end of Cray supercomputers, although for a time it seemed to be. In 1996, Cray Research merged with Silicon Graphics Inc. (SGI), which cancelled future Cray research (Dow). In 2000, Silicon Graphics Inc. sold the Cray Research assets to Tera Computer Co., which performed a major reconstruction, made minor upgrades to Cray’s existing computers, maintained the service business to collect revenue, and changed its name to Cray Inc. (Dow, 2003).


In November of 2002, Cray Inc. introduced the X1, a new supercomputer with leading-edge technology. The X1 was created to focus on capability rather than capacity (Dow, 2003). Brooks states, “The X1 can sort the 6 million volumes of the New York City Public Library in under a minute, an improvement of nearly four minutes over the supercomputing standards of the mid-1990s, officials say. It holds the equivalent computing power of 25,000 personal computers,” (2003). The Cray X1 is the latest Cray supercomputer, and the rest of this paper covers its new features and improvements.

Operating System

The Cray X1 system uses the UNICOS/mp operating system to control its overall resource and disk management. UNICOS/mp is based on the IRIX 6.5 kernel found on various Unix platforms. On the Cray X1, the kernel has been upgraded for improved scalability and resource scheduling. It is based on the POSIX 1003.1-1990 and POSIX 1003.2-1992 standards (Cray Docs 1).

Cray X1 IRIX 6.5 Implementation Features

The Cray X1 runs a single-system operating system image with hardware error reporting and checkpoint/restart software management tools. Depending on user permission settings, various processes can be controlled with the software management tool cpr. The Cray X1 also supports the PBS Pro batch system for batch job management, but it does not come standard with the hardware. System monitoring and reporting data can be accessed using the sar and timex utilities provided by the UNICOS/mp operating system. CPU time, processor jobs, memory, I/O, network communications, and storage can be monitored using these utilities (Cray Docs 1).

UNICOS/mp uses the XFS journaling file system and the XLV volume manager for managing logical volumes. Entire file systems, directories, and/or individual files can be backed up and restored using the xfsdump and xfsrestore utilities. Additionally, currently mounted file systems can be backed up with xfsdump (Cray Docs 1).

Both the Network File System (NFS) client and the NFS server are available under the UNICOS/mp system implementation on the Cray X1. Support for remote procedure calls, which NFS requires for session layer scheduling, is included, and Domain Name System support is also available. Transmission Control Protocol/Internet Protocol (TCP/IP) is also supported, including the socket interface for network communications, ftp, telnet, and rsh (Cray Docs 1).

RAID System on Cray X1

Cray has traditionally supported two different models of RAID subsystems within its computers. The early production systems housed the RS100 series and the later models use the RS200 series. Both models encompass a pair of redundant RAID controllers and back-end storage. Components from the two series cannot be mixed. The RAID storage system in the X1 is based on third-party RAID hardware that Cray selects and configures (Cray Docs 2).


The overall system is housed in a PC-20 peripheral cabinet and divided into two components, the C-brick and the S-brick. The C-brick contains the RAID controller and the S-brick contains the physical hard-drives used in storage and the means to interface with the C-brick controllers (Cray Docs 2).

The C-brick has an Ethernet port for monitoring the disk performance of its connected S-bricks.

The C-brick

As mentioned earlier, the C-brick houses the RAID controllers. There are two main models among the Cray X1 system series, the CB100 and the CB200. Some early production Cray X1 systems used the CB100. At a minimum, the CB100 houses two RC100 RAID controllers with two Fibre Channel front-end connections and four back-end connections. Redundant power supplies with cache batteries are used in case of power failure. The CB200, which is the C-brick of an RS200-based RAID subsystem, has two 2-Gbps Fibre Channel front-end connections and four back-end connections. Each RAID controller can access these loops. It has the same power and cooling features as the CB100 model, but adds Ethernet and serial connections for administrative control. The architecture of the CB200 has been improved, allowing considerably better performance from the RS200 RAID controllers (Cray Docs 2).

The S-brick

The number of S-bricks attached to a given C-brick depends on the configuration, but there are limits. Mixing of RS100 and RS200 components is not allowed. Moreover, a single RAID subsystem can handle at most eight S-bricks, and they must be added in pairs. Because the S-bricks have only two Fibre Channel connections, they are attached to C-bricks as a pair of redundant loops. Each individual S-brick carries a series number to signify compatibility with either the RS100 or RS200 models, and a spindle size indicator of 0 for 36-gigabyte spindles, 1 for 73-gigabyte spindles, or 2 for 146-gigabyte spindles (Cray Docs 2).

For the RS100 series, the S-bricks contain housing units for 10 dual-ported Fibre Channel drives, two 1 Gigabyte/second back-end connections, and power and cooling units. Spindle sizes are limited to 73 gigabytes of storage space. For the RS200 series, the S-bricks contain housing units for 14 dual-ported Fibre Channel drives, with similar connection bandwidth and cooling, but spindle sizes can be 36, 73, or 146 gigabytes (Cray Docs 2).
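The S-brick configuration rules can be written down as a small validity check. The Python sketch below is only an illustration: the names `SBrick` and `validate_raid_subsystem` are hypothetical and do not belong to any Cray tool. The limits come from the text, and reading "limited to 73 gigabytes" as "73-gigabyte spindles only" for the RS100 series is an assumption.

```python
from dataclasses import dataclass

# Spindle size indicator -> capacity in gigabytes, as listed in the text.
SPINDLE_GB = {0: 36, 1: 73, 2: 146}
MAX_SBRICKS = 8  # limit per RAID subsystem, from the text


@dataclass(frozen=True)
class SBrick:
    series: str             # "RS100" or "RS200"
    spindle_indicator: int  # 0, 1, or 2


def validate_raid_subsystem(cbrick_series, sbricks):
    """Return a list of rule violations (an empty list means valid)."""
    problems = []
    if len(sbricks) > MAX_SBRICKS:
        problems.append("more than eight S-bricks")
    if len(sbricks) % 2 != 0:
        problems.append("S-bricks must be added in pairs")
    if any(brick.series != cbrick_series for brick in sbricks):
        problems.append("RS100 and RS200 components cannot be mixed")
    for brick in sbricks:
        if brick.spindle_indicator not in SPINDLE_GB:
            problems.append("unknown spindle size indicator")
        elif brick.series == "RS100" and SPINDLE_GB[brick.spindle_indicator] != 73:
            # Assumption: the RS100 series takes 73 GB spindles only.
            problems.append("RS100 S-bricks use 73 GB spindles")
    return problems
```

For example, an RS200 C-brick with two RS200 S-bricks passes, while a single S-brick or a mixed RS100/RS200 pair is reported as a violation.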

Nodes

The basic unit of the Cray machine is the node. A node is made up of four multi-chip modules (MCM) and main memory. The MCMs and memory are attached to routers that allow communication between different nodes (Cray Docs 3).


Figure 1 shows the MCM model.

Multi-Chip Modules (MCM)

Each MCM contains a single multi-streaming processor (MSP). Each MSP is made up of four scalar single-streaming processors (SSP) and two megabytes of ecache. Each MSP is a vector processor, meaning it can handle a large number of instructions at one time because it is composed of four scalar processors, each of which can compute only one instruction at a time (Partridge, 2002).

Figure 2 shows the SSP connections to the cache.

Cooling

The MSPs are cooled by spraying them with Fluorinert, an inert liquid. Each processor is sprayed by a tiny nozzle. The heat from the processor then causes the liquid to evaporate. The evaporating liquid cools the processor and the gas is collected for reuse. After collection the Fluorinert is cooled, filtered, and sent back to be used in the cooling process once again (Partridge, 2002).

Single-Streaming Processors (SSP)

The four SSPs that make up the MSP are scalar processors with two vector registers. The two vector registers allow each SSP to fetch, decode, and execute two instructions per clock cycle. With the SSPs running at their peak clock rate of 800 MHz and computing two operations per clock cycle, one arrives at the 12.8 gigaflops of processing power that makes the Cray X1 such an amazing machine (Partridge, 2002). The term “flops” stands for floating-point operations per second and is simply a measurement of processor speed (Wikipedia, 2004).
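The arithmetic behind the 12.8 gigaflops figure can be made explicit. In the sketch below, the 800 MHz clock, the four SSPs per MSP, and the two vector units per SSP (the "two vector registers" of the text) come from the paper. The factor of two floating-point operations per vector unit per cycle (one add and one multiply) is an assumption, introduced so that the product matches the quoted peak.

```python
CLOCK_MHZ = 800           # SSP clock rate, from the text
SSPS_PER_MSP = 4          # from the text
VECTOR_UNITS_PER_SSP = 2  # from the text
# Assumption (not stated in the text): each vector unit completes one add
# and one multiply per cycle, i.e. two floating-point operations.
FLOPS_PER_UNIT_PER_CYCLE = 2


def peak_gflops_per_ssp():
    """Theoretical peak of one SSP in gigaflops."""
    mflops = CLOCK_MHZ * VECTOR_UNITS_PER_SSP * FLOPS_PER_UNIT_PER_CYCLE
    return mflops / 1000


def peak_gflops_per_msp():
    """Theoretical peak of one MSP (four SSPs) in gigaflops."""
    mflops = SSPS_PER_MSP * CLOCK_MHZ * VECTOR_UNITS_PER_SSP * FLOPS_PER_UNIT_PER_CYCLE
    return mflops / 1000


print(peak_gflops_per_ssp(), peak_gflops_per_msp())
```

Under these assumptions each SSP peaks at 3.2 gigaflops and each MSP at 12.8 gigaflops. These are theoretical peaks, not sustained rates.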

Instruction and Data Caches

Each of the scalar processors contains an instruction cache and a data cache. Each cache is sixteen kilobytes in size, making a total of thirty-two kilobytes of cache on each scalar processor. Each cache is two-way set-associative: it is composed of 256 sets of two lines, and each line is thirty-two bytes long (Cray Docs 4).

Figure 3 is a visual depiction of the instruction and data caches.

A data and instruction cache address is forty-eight bits long. The tag field is thirty-five bits long, the set field is eight bits long, and the line-offset field is five bits long (Cray Docs 4).
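The field widths above are enough to show how an address would be taken apart on a cache lookup. This is a minimal sketch, assuming the conventional layout with the tag in the high-order bits, the set index in the middle, and the line offset in the low-order bits. The function name `split_address` is hypothetical.

```python
TAG_BITS, SET_BITS, OFFSET_BITS = 35, 8, 5          # field widths from the text
ADDRESS_BITS = TAG_BITS + SET_BITS + OFFSET_BITS    # 48 bits in total


def split_address(address):
    """Split a 48-bit address into (tag, set index, line offset).

    Assumed layout: [ tag | set index | line offset ], high bits to low bits.
    """
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address must fit in 48 bits")
    offset = address & ((1 << OFFSET_BITS) - 1)
    set_index = (address >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, offset
```

The eight set bits select one of the 256 sets, the five offset bits select a byte within the thirty-two-byte line, and the tag is compared against the two lines stored in that set.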

Figure 4 is an example of the tag for the cache.

The data cache is write-through which means that whenever data is sent to it that data is also sent to the ecache. Scalar data is written to the data cache and the ecache, but vector data is only written to the ecache (Cray Docs 4).
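The store paths just described can be mimicked with a toy model. The class below is illustrative only: it uses plain dictionaries in place of real cache lines, and dropping a stale data-cache copy on a vector store is an assumption, since the text does not say how the two levels are kept consistent.

```python
class WriteThroughHierarchy:
    """Toy model of a write-through data cache in front of an ecache."""

    def __init__(self):
        self.dcache = {}  # address -> value held in the data cache
        self.ecache = {}  # address -> value held in the ecache

    def store_scalar(self, address, value):
        # Write-through: the data cache and the ecache are updated together.
        self.dcache[address] = value
        self.ecache[address] = value

    def store_vector(self, base_address, values, stride=8):
        # Vector data bypasses the data cache and is written to the ecache only.
        for i, value in enumerate(values):
            address = base_address + i * stride
            self.ecache[address] = value
            # Assumption: discard any stale scalar copy of the same address.
            self.dcache.pop(address, None)
```

After `store_scalar(0, 1)` both levels hold the value, while after `store_vector(64, [1, 2, 3])` only the ecache holds the three elements.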

ECache

The ecache is a high-speed cache that gives the processors a large amount of temporary storage. The processor can load instructions from the ecache at a rate of 51.2 GB per second and can send instructions to be stored in the ecache as fast as 25.6 GB per second. It can even access the local memory at a rate of 38.4 GB per second. The ecache is similar in structure to the data and instruction caches. It is addressed exactly the same, but its format is slightly different. Instead of 256 sets of lines, it has 32,768 sets of lines. Another difference is that the ecache is write-back. This means that a modified line is not written out to memory right away. It is written back only when the line is evicted, or flushed, from the cache (Cray Docs 4).
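The set counts quoted for the two cache levels can be checked against their stated capacities. The sketch below assumes the ecache keeps the same two-way, thirty-two-byte-line organization as the data and instruction caches, which the text only implies by calling it "similar in structure". The resulting 15-bit set field is derived here and not taken from the source.

```python
LINE_BYTES = 32  # line length, from the text
WAYS = 2         # lines per set, from the text


def cache_bytes(sets, ways=WAYS, line_bytes=LINE_BYTES):
    """Capacity of a set-associative cache in bytes."""
    return sets * ways * line_bytes


def set_index_bits(sets):
    """Number of address bits needed to select a set (sets is a power of two)."""
    return sets.bit_length() - 1


l1_bytes = cache_bytes(256)        # 16384 bytes = 16 KB, matches the text
ecache_bytes = cache_bytes(32768)  # 2097152 bytes = 2 MB, matches the MSP's ecache
print(l1_bytes, set_index_bits(256))
print(ecache_bytes, set_index_bits(32768))
```

Under that assumption 32,768 sets give exactly the two megabytes of ecache attributed to each MSP, and the set field would grow from 8 to 15 bits.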

Memory

Each node has sixteen memory controller chips and thirty-two dynamic random access memory (DRAM) daughter cards. These daughter cards come in two sizes, built from either 288-megabit chips or 576-megabit chips. This makes a total of sixteen gigabytes or thirty-two gigabytes of memory available, respectively (Johnson, 2003).

Global Addressability

As mentioned in the node description, every node is connected by means of a system of routers. This network of nodes allows the memory on each node to be globally addressable. This means that memory on any node can be accessed by not only the components on its node, but by any component on any node. When being accessed by another node, however, the transfer rate is not nearly as fast as if it were being accessed by the components on its own node (Cray Docs 3).

Word Size

The memory on each node is broken up into seventy-two-bit words. Sixty-four of these bits hold data and can be used for a single sixty-four-bit operation or split into two halves for thirty-two-bit operations. The other eight bits are used for single-error-correction, double-error-detection (SECDED) (Johnson). This allows memory to detect single-bit and double-bit errors and to correct the single-bit errors that are found (Wu, 2003).
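The split of a seventy-two-bit word into sixty-four data bits and eight check bits is what an extended Hamming code requires. The sketch below assumes such a code, which is the usual way to build SECDED. The text does not say which code the X1 memory actually uses.

```python
def hamming_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits):
    """Check bits for SECDED: Hamming bits plus one overall parity bit."""
    return hamming_check_bits(data_bits) + 1


print(secded_check_bits(64), 64 + secded_check_bits(64))
```

For sixty-four data bits this gives seven Hamming bits plus one parity bit, eight in total, which accounts for the seventy-two-bit word.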

Modes

Memory is set up to run in two possible modes that will allow for the loss of memory cells due to unforeseen circumstances. The first mode reserves half of the memory chips on each card to cover the potential loss of a memory chip. This mode cuts the memory space in half, but does not affect the bandwidth at which data transfers. The second mode reserves half of the daughter cards in case an entire card is lost. This mode not only reduces the memory space by half but cuts the bandwidth available for data transfer in half also (Johnson, 2003).
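The cost of each mode can be expressed as a simple calculation. In this sketch the names `SparingMode` and `effective_memory` are hypothetical, and bandwidth is given only as a fraction of the full rate because the text quotes no absolute figure for these modes.

```python
from enum import Enum


class SparingMode(Enum):
    CHIP = "chip"  # half of the chips on each card held in reserve
    CARD = "card"  # half of the daughter cards held in reserve


def effective_memory(total_gb, mode):
    """Return (usable gigabytes, fraction of full bandwidth) for a node."""
    if mode is SparingMode.CHIP:
        return total_gb / 2, 1.0   # half the space, full bandwidth
    if mode is SparingMode.CARD:
        return total_gb / 2, 0.5   # half the space, half the bandwidth
    raise ValueError("unknown sparing mode")
```

A thirty-two-gigabyte node therefore offers sixteen usable gigabytes in either mode, and only the card-level mode gives up half of the bandwidth.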

Cabinets

Cray machines can be purchased in one of two types of cabinets, air cooled and liquid cooled. When using an air cooled cabinet, the Fluorinert is gathered after evaporating from the processors and sent through a system that transfers its heat to air blowing through the cabinet. An air cooled system can hold up to four nodes (sixteen MSPs). A liquid cooled cabinet, however, can hold up to sixteen nodes (sixty-four MSPs). This is because liquid cooling is much more efficient. The Fluorinert is sent through a liquid cooling system that the customer must supply. Often this system passes the liquid through cold water, which absorbs much of the heat. Liquid cooled cabinets are mostly used in large computing centers due to the large amount of space that is needed for the computer and the liquid cooling unit (Partridge, 2002).
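Combining the cabinet capacities with the per-MSP peak quoted earlier in the paper gives a rough upper bound for a fully populated cabinet. The sketch below uses only figures from the text (four MSPs per node, 12.8 gigaflops per MSP, four or sixteen nodes per cabinet). The function name `cabinet_peak` is hypothetical and the result is a theoretical peak, not a measured rate.

```python
MSPS_PER_NODE = 4                      # from the text
MSP_PEAK_GFLOPS = 12.8                 # from the text
MAX_NODES = {"air": 4, "liquid": 16}   # nodes per cabinet, from the text


def cabinet_peak(cooling):
    """Return (MSP count, theoretical peak gigaflops) for a full cabinet."""
    nodes = MAX_NODES[cooling]
    msps = nodes * MSPS_PER_NODE
    return msps, msps * MSP_PEAK_GFLOPS


for cooling in ("air", "liquid"):
    print(cooling, cabinet_peak(cooling))
```

A full air cooled cabinet works out to sixteen MSPs and about 205 gigaflops, and a full liquid cooled cabinet to sixty-four MSPs and about 819 gigaflops.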

Programming

It is recommended that users of the X1 looking to perform specific tasks write their own programs to run on the machine, rather than search for similar existing programs that may not suit their tasks. To that end, the Cray X1 programming environment supports Fortran, Co-Array Fortran, C/C++, shmem, Unified Parallel C, MPI, and OpenMP.

Applications

The Army High Performance Computing Research Center (AHPCRC) will use these systems for key Army research applications in atmospheric science modeling, for survivability and lethality applications, and in chemical and biological dispersion applications (Muzio).

In support of the AHPCRC research activities in chemical and biological dispersion in atmospheric sciences, AHPCRC–NetworkCS research scientist Tony Meys has ported the NCAR/Penn State Mesoscale Model (MM5, Version 3.5) to the CRAY X1. Data is from hour 12 of a 24-hour MM5 “forecast” test run on the CRAY X1 made September 24, 2002. The calculations were performed across all the available application multi-streaming processors (MSPs) on one CRAY X1. The results were validated against forecast data previously calculated on the CRAY T3E-1200. Meys now has the distinction of being among the first to migrate and test weather and environmental codes on the CRAY Y-MP, CRAY C90, CRAY T3D, CRAY T90, CRAY T3E, CRAY J90, and the CRAY X1 (Muzio).

Figure 5 shows a 3D depiction of clouds and wind flow at two levels of the atmosphere centered over Baltimore.


In support of the AHPCRC research activities in projectile target interaction, AHPCRC–NetworkCS research scientists Steve Beissel, Charles Gerlach, and Fran Hill have implemented the EPIC code on the CRAY X1. EPIC is an explicit, dynamic, Lagrangian finite-element code for the simulation of the mechanical and thermal responses of solids. Its applications include the simulation of high-velocity impact, explosive detonation, and other events of short duration (Muzio).

Figure 6 shows a simulation of EPIC solid mechanics code on the Cray X1.

In recent years, supercomputers such as the Cray X1 have also been used in nuclear simulation research. Although supercomputers perform many civilian functions, they were invented primarily to design U.S. atomic and hydrogen bombs. Supercomputers are a powerful tool for developing both nuclear weapons and long-range missiles because they can simulate the implosive shock wave that detonates a nuclear warhead, or model the forces affecting a missile from launch to impact (Wisconsin Project, 1995).

Within the last several years, Israel has purchased a large number of Cray computers, including recent X1 models, for its nuclear launch simulation research and A-bomb research (Wisconsin Project, 1995).

Additionally, Cray computers have traditionally been used in code breaking and other cryptological functions. The NSA, as well as foreign allies, has employed Cray X1s in recent years. From an encryption standpoint, the Cray X1’s parallel processing and raw power allow it to crunch numbers at a rate that has redefined the efficiency standard for brute force attacks on encrypted data (Asier).

Summary

The Cray X1’s unique system architecture makes it the fastest supercomputer available on the market today. Because each SSP has two vector registers, it can process instructions in parallel, which makes this machine run extremely fast. The uses of this machine are endless because of its instruction execution capabilities. In addition to the processing, the data transfer and ecache are also extremely fast, so this is a machine with no slack. In conclusion, due to the architecture of the Cray X1 explained in this paper, this computer will be used for decades to come.


Questions for Review

1.) How many multi-chip modules (MCM) are there on every node in the Cray X1?
a. 1
b. 2
c. 3
d. 4

2.) How are the multi-streaming processors cooled in the Cray X1?
a. They are sprayed with Fluorinert
b. They are cooled by fans
c. They are sprayed with water
d. They are immersed in water

3.) What does it mean to say that the memory on the Cray X1 is globally addressable?
a. Any computer on the globe can access the memory
b. Any component on any node can access the memory on another node
c. The memory is in a centralized location and is accessed globally by the nodes
d. The memory address must be specified by a global addressing unit

4.) Which of the following is NOT a common application for the Cray X1 supercomputer?
a. nuclear research and simulation
b. atmosphere forecast
c. commercial banking
d. cryptology

5.) Cray machines can be purchased in one of two types of cabinets. Which are they?
a. air cooled and liquid cooled
b. air heated and liquid cooled
c. air cooled and liquid heated
d. air heated and liquid heated

Answers to Questions

1.) d
2.) a
3.) b
4.) c
5.) a

APPENDIX A


Bibliography

Asier Technology Corporation. “The Threat.” URL: http://www.asiertech.com/threats/threat_index.htm

Bellis, Mary. “Cray Supercomputer - Seymour Cray.” URL: http://inventors.about.com/library/inventors/blsupercomputer.htm

Brooks, Leslie (2003). “Cray gets contracts for X1.” URL: http://www.twincities.com/mld/twincities/business/5168077.htm?1c

Cray Docs 1. URL: www.cray.com/craydoc/manuals/S-2377-22/html-S-2377-22/z1029362381.html

Cray Docs 2. URL: www.cray.com/craydoc/manuals/S-2377-23/html-S-2377-23/z1039368796.html

Cray Docs 3. URL: http://www.cray.com/cgi-bin/swpubs/craydoc30/craydoc.cgi?frames=1&html=toc_view&pub=S-2346-24

Cray Docs 4. URL: http://www.cray.com/craydoc/manuals/S-2315-50/html-S-2315-50/z1051195667brbethke.html

Dow Jones Business News via NewsEdge (2003). “Cray Inc. Making a Comeback.” Daily News. URL: http://www.computeruser.com/news/03/06/20/news8.html

Johnson, Andrew A. (2003). “Computational Fluid Dynamics Applications on the Cray X1 Architecture: Experiences, Algorithms, and Performance Analysis.” URL: http://www.cray.com/cgi-bin/swpubs/craydoc30/craydoc.cgi?frames=1&html=toc_view&pub=S-2346-24

Long, Kathy. “History of Computing.” URL: http://www.4-winner.com/computers/history_computing.htm

Muzio, Paul. “The Cray X1 has arrived.” URL: http://www.ahpcrc.org/publications/archives/v12n3/Story1/

Partridge, Richard (2002). “Cray Launches X1 for Extreme Computing.” URL: http://www.cray.com/products/systems/x1/crayx1_dhbrown.pdf

Wikipedia (2004). “Vector Processor.” URL: http://en.wikipedia.org/wiki/Vector_processor

Wisconsin Project on Nuclear Arms Control (1995). “Israel Gets High-speed Computers.” URL: http://www.wisconsinproject.org/countries/israel/highspeedcomputers.htm


Wu, David Yu-Lang (2003). “Ch6 Memory.” URL: http://appsrv.cse.cuhk.edu.hk/~erg2020b/lecture/chap6-2.pdf

APPENDIX B


Work Summary Sheet

Allen Peppler
Researched Nodes, Cache, Memory, and Cabinets = 2 hours
Wrote Sections on Research = 2 hours

Brian Femiano
Researched Operating System, Storage, Applications = 2 hours
Wrote Sections on Research = 2 hours

Sara Prochnow
Researched Cray History and Cray X1 History and wrote sections = 2 hours
Finalized Paper (Edited, Added Appendices, Added Table of Contents, Reviewed Requirements) = 3 hours
Completed PowerPoint = 2 hours

Kevin Boucher
Finalized Paper (Edited, Added Title Page, Reviewed Requirements) = 3 hours
Completed PowerPoint = 2 hours

APPENDIX C