

Special Issue on High Performance Computing


The Earth Simulator System

By Shinichi HABATA,* Mitsuo YOKOKAWA† and Shigemune KITAWAKI‡

*Computers Division
†National Institute of Advanced Industrial Science and Technology (AIST)
‡Earth Simulator Center, Japan Marine Science and Technology Center

ABSTRACT

The Earth Simulator, developed under the Japanese government’s initiative “Earth Simulator Project,” is a highly parallel vector supercomputer system that consists of 640 processor nodes and an interconnection network. Each processor node is a shared-memory parallel vector supercomputer, in which 8 vector processors, each able to deliver 8GFLOPS, are tightly connected to a shared memory, giving each node a peak performance of 64GFLOPS. The interconnection network is a huge non-blocking crossbar switch linking the 640 processor nodes and supporting global addressing and synchronization. The aggregate peak vector performance of the Earth Simulator is 40TFLOPS, and the intercommunication bandwidth between every two processor nodes is 12.3GB/s in each direction. The aggregate switching capacity of the interconnection network is 7.87TB/s. To realize a high-performance and high-efficiency computer system, three architectural features are applied in the Earth Simulator: vector processors, shared memory, and a high-bandwidth non-blocking crossbar interconnection network. The Earth Simulator achieved 35.86TFLOPS, or 87.5% of the peak performance of the system, in the LINPACK benchmark, and has been proven to be the most powerful supercomputer in the world. It also achieved 26.58TFLOPS, or 64.9% of the peak performance of the system, for a global atmospheric circulation model with the spectral method. This record-breaking sustained performance makes this innovative system a very effective scientific tool for providing solutions for the sustainable development of humankind and its symbiosis with the planet Earth.

KEYWORDS  Supercomputer, HPC (High Performance Computing), Parallel processing, Shared memory, Distributed memory, Vector processor, Crossbar network

1. INTRODUCTION

The Japanese government’s initiative “Earth Simulator project” started in 1997. Its target was to promote research for global change predictions by using computer simulation. One of the most important project activities was to develop a supercomputer at least 1,000 times as powerful as the fastest one available at the beginning of the “Earth Simulator project.” After five years of research and development activities, the Earth Simulator was completed and came into operation at the end of February 2002. Within only two months from the beginning of its operational run, the Earth Simulator was proven to be the most powerful supercomputer in the world by achieving 35.86TFLOPS, or 87.5% of the peak performance of the system, in the LINPACK benchmark.

It also achieved 26.58TFLOPS, or 64.9% of the peak performance of the system, for a global atmospheric circulation model with the spectral method. This simulation model was also developed in the “Earth Simulator project.”

2. SYSTEM OVERVIEW

The Earth Simulator is a highly parallel vector supercomputer system consisting of 640 processor nodes (PN) and an interconnection network (IN) (Fig. 1). Each processor node is a shared-memory parallel vector supercomputer, in which 8 arithmetic processors (AP) are tightly connected to a main memory system (MS). The AP is a vector processor that can deliver 8GFLOPS, and the MS is a shared memory of 16GB. The total system thus comprises 5,120 APs and 640 MSs, and the aggregate peak vector performance and memory capacity of the Earth Simulator are 40TFLOPS and 10TB, respectively.
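
These aggregate figures follow directly from the per-node numbers quoted above; as a quick consistency check:

    \[ 640 \times 8 \times 8\,\mathrm{GFLOPS} = 40{,}960\,\mathrm{GFLOPS} = 40.96\,\mathrm{TFLOPS}\ (\text{quoted as } 40\,\mathrm{TFLOPS}), \]
    \[ 640 \times 16\,\mathrm{GB} = 10{,}240\,\mathrm{GB} = 10\,\mathrm{TB}. \]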

Fig. 1 General configuration of the Earth Simulator. (640 processor nodes, each containing Arithmetic Processors #0-#7 and a 16GB main memory system, linked by a 640 × 640 crossbar interconnection network.)



The interconnection network is a huge 640 × 640 non-blocking crossbar switch linking the 640 PNs. The interconnection bandwidth between every two PNs is 12.3GB/s in each direction. The aggregate switching capacity of the interconnection network is 7.87TB/s.
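
The quoted switching capacity is consistent with the per-node bandwidth:

    \[ 640\,\mathrm{PNs} \times 12.3\,\mathrm{GB/s} = 7{,}872\,\mathrm{GB/s} \approx 7.87\,\mathrm{TB/s}. \]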

The operating system manages the Earth Simulator as a two-level cluster system, called the Super-Cluster System. In the Super-Cluster System, the 640 nodes are grouped into 40 clusters (Fig. 2). Each cluster consists of 16 processor nodes, a Cluster Control Station (CCS), an I/O Control Station (IOCS) and system disks. Each CCS controls 16 processor nodes and an IOCS. The IOCS is in charge of data transfer between the system disks and the Mass Storage System, and transfers user file data from the Mass Storage System to a system disk, and the processing results from a system disk to the Mass Storage System.

The supervisor, called the Super-Cluster Control Station (SCCS), manages all 40 clusters and provides a Single System Image (SSI) operational environment. For efficient resource management and job control, the 40 clusters are classified as one S-cluster and 39 L-clusters. In the S-cluster, two nodes are used for interactive use and another for small-size batch jobs.
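
Purely as an illustration of this grouping (the node numbering, the mapping formula and all identifiers below are assumptions made for the sketch, not the interface of the Earth Simulator’s operating system):

    #include <stdio.h>

    /* Hypothetical sketch: 640 nodes grouped 16 per cluster gives 40 clusters. */
    enum { TOTAL_NODES = 640, NODES_PER_CLUSTER = 16 };

    int main(void)
    {
        int clusters = TOTAL_NODES / NODES_PER_CLUSTER;   /* = 40 */
        int node = 123;                                   /* an arbitrary node id */

        /* which of the 40 clusters is the S-cluster is not modeled here */
        printf("%d clusters in total; node %d belongs to cluster %d\n",
               clusters, node, node / NODES_PER_CLUSTER);
        return 0;
    }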

User disks are connected only to S-cluster nodes, and are used for storing user files. The aggregate capacities of the system disks and user disks are 415TB and 225TB, respectively. The Mass Storage System is a cartridge tape library system, and its capacity is more than 1.5PB.

To realize a high-performance and high-efficiency computer system, three architectural features are applied in the Earth Simulator:

· Vector processor
· Shared memory
· High-bandwidth and non-blocking crossbar interconnection network

From the standpoint of parallel programming, three levels of parallelizing paradigms are provided to gain high sustained performance (a minimal code sketch follows the list):

· Vector processing on a processor
· Parallel processing with shared memory within a node
· Parallel processing among distributed nodes via the interconnection network
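
A minimal sketch of how application code typically expresses these three levels, written with standard MPI and an OpenMP directive purely for illustration (the paper does not prescribe this particular programming interface):

    #include <mpi.h>
    #include <stdio.h>

    #define N 1048576
    static double a[N], b[N], c[N];

    int main(int argc, char **argv)
    {
        int rank, nprocs;

        MPI_Init(&argc, &argv);                 /* level 3: processes on distributed nodes */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        #pragma omp parallel for                /* level 2: shared-memory threads within a node */
        for (int i = 0; i < N; i++) {
            /* level 1: a regular, stride-1 loop body of the kind a vectorizing
               compiler maps onto the vector pipelines of each processor */
            c[i] = a[i] + 2.0 * b[i];
        }

        double local = 0.0, global = 0.0;
        for (int i = 0; i < N; i++)
            local += c[i];
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum over %d processes: %f\n", nprocs, global);

        MPI_Finalize();
        return 0;
    }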

3. PROCESSOR NODE

The processor node is a shared-memory parallel vector supercomputer, in which 8 arithmetic processors, each able to deliver 8GFLOPS, are tightly connected to a main memory system, for a peak performance of 64GFLOPS (Fig. 3). It consists of 8 arithmetic processors, a main memory system, a Remote-access Control Unit (RCU), and an I/O processor (IOP).

Fig. 2 System configuration of the Earth Simulator.



The most advanced hardware technologies are adopted in the Earth Simulator development. One of the main features is a “One-chip Vector Processor” with a peak performance of 8GFLOPS. This highly integrated LSI is fabricated using 0.15µm CMOS technology with copper interconnection. The vector pipeline units on the chip operate at 1GHz, while the other parts, including the external interface circuits, operate at 500MHz. Although the processor chip dissipates approximately 140W, the Earth Simulator is cooled by ordinary air cooling, because a high-efficiency heat sink using heat pipes is adopted.

A high-speed main memory device was also developed to reduce the memory access latency and access cycle time. In addition, 500MHz source-synchronous transmission is used for the data transfer between the processor and main memory to increase the memory bandwidth. The data transfer rate between each processor and the main memory is 32GB/s, and the aggregate bandwidth of the main memory is 256GB/s.
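
Simple arithmetic on these figures relates the two numbers and gives the per-processor memory-bandwidth-to-compute ratio:

    \[ 8\,\mathrm{APs} \times 32\,\mathrm{GB/s} = 256\,\mathrm{GB/s}, \qquad \frac{32\,\mathrm{GB/s}}{8\,\mathrm{GFLOPS}} = 4\ \mathrm{bytes/FLOP}. \]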

Two levels of parallel programming paradigms are provided within a processor node:

· Vector processing on a processor
· Parallel processing with shared memory

4. INTERCONNECTION NETWORK

The interconnection network is a huge 640 × 640 non-blocking crossbar switch, supporting global addressing and synchronization.

To realize such a huge crossbar switch, a byte-slicing technique is applied. Thus, the huge 640 × 640 non-blocking crossbar switch is divided into a control unit and 128 data switch units, as shown in Fig. 4. Each data switch unit is a one-byte-wide 640 × 640 non-blocking crossbar switch.

Physically, the Earth Simulator comprises 320 PN cabinets and 65 IN cabinets. Each PN cabinet contains two processor nodes, and the 65 IN cabinets contain the interconnection network. These PN and IN cabinets are installed in a building 65m long and 50m wide (Fig. 5). The interconnection network is positioned in the center of the computer room. The area occupied by the interconnection network is approximately 180m2, or 14m long and 13m wide, and the Earth Simulator occupies an area of approximately 1,600m2, or 41m long and 40m wide.

No supervisor exists in the interconnection network. The control unit and the 128 data switch units operate asynchronously, so a processor node controls the total sequence of inter-node communication.

Fig. 3 Processor node configuration. (A 16GB main memory system built from main memory units MMU#0-MMU#31, using 128Mbit DRAM developed for the ES with a bank cycle time of 24nsec and a 2048-way banking scheme, aggregate bandwidth 256GB/s; arithmetic processors AP#0-AP#7, the RCU connecting to the interconnection network, and the IOP connecting to the LAN and user/work disks.)

Fig. 4 Interconnection network configuration.

Fig. 5 Bird’s-eye view of the Earth Simulator building. (The 65m × 50m computer room houses the processor node cabinets, the interconnection network cabinets, system disks, user disks, the Mass Storage System, the air conditioning and power supply systems, and a double floor 1.5m high; the machine occupies roughly 41m × 40m, with the 14m × 13m interconnection network at its center.)


For example, the sequence of data transfer from node A to node B is shown in Fig. 6:

1) Node A requests the control unit to reserve a data path from node A to node B; the control unit reserves the data path and then replies to node A.
2) Node A begins the data transfer to node B.
3) Node B receives all the data, then sends a data transfer completion code to node A.
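
A purely illustrative software model of this handshake follows; the function names and the per-destination busy flag are assumptions made for the sketch, since in the real machine these steps are performed in hardware by the RCU and the IN control unit:

    #include <stdio.h>
    #include <stdbool.h>

    /* Hypothetical model: the control unit tracks which destination ports are in use. */
    #define NODES 640
    static bool port_busy[NODES];

    /* 1) node A asks the control unit to reserve a path to node B */
    static bool reserve_path(int dst)
    {
        if (port_busy[dst])
            return false;            /* destination already reserved by another transfer */
        port_busy[dst] = true;
        return true;                 /* control unit replies: path reserved */
    }

    /* 3) node B acknowledges; the path is released for the next transfer */
    static void send_completion(int dst)
    {
        port_busy[dst] = false;
    }

    int main(void)
    {
        int node_a = 0, node_b = 1;
        if (reserve_path(node_b)) {                          /* step 1 */
            printf("node %d -> node %d: transferring data\n",
                   node_a, node_b);                          /* step 2 */
            send_completion(node_b);                         /* step 3 */
        }
        return 0;
    }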

In the Earth Simulator, 83,200 pairs of 1.25GHz serial transmission lines through copper cable are used to realize the aggregate switching capacity of 7.87TB/s, and 130 pairs are used for connecting each processor node to the interconnection network. With so many links, the error occurrence rate cannot be ignored if stable inter-node communication is to be realized. To resolve this problem, ECC codes are added to the transfer data as shown in Fig. 7. A receiver node detects the occurrence of an intermittent inter-node communication failure by checking the ECC codes, and the erroneous byte data can almost always be corrected by the RCU within the receiver node.

ECC codes are also used for recovering from a continuous inter-node communication failure resulting from a data switch unit malfunction. In this case the erroneous byte data are continuously corrected by the RCU within any receiver node until the broken data switch unit is repaired.
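
The paper does not specify the ECC code itself. As a hedged illustration of the second case, where the faulty byte lane is known, even a single XOR parity lane is enough to reconstruct the bytes carried by one known-bad lane (an erasure); the lane count and names below are invented for the sketch:

    #include <stdio.h>
    #include <stdint.h>

    #define LANES 8   /* illustrative lane count, not the ES's 128 data switch units */

    /* XOR parity over the data lanes; carried on an extra check lane. */
    static uint8_t parity(const uint8_t lane[LANES])
    {
        uint8_t p = 0;
        for (int i = 0; i < LANES; i++)
            p ^= lane[i];
        return p;
    }

    /* Reconstruct one lane whose index is known to be bad (an erasure). */
    static uint8_t recover(const uint8_t lane[LANES], int bad, uint8_t check)
    {
        uint8_t p = check;
        for (int i = 0; i < LANES; i++)
            if (i != bad)
                p ^= lane[i];
        return p;
    }

    int main(void)
    {
        uint8_t lane[LANES] = { 0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0xf0 };
        uint8_t check = parity(lane);     /* computed by the sender */

        int bad = 3;                      /* suppose lane 3's switch unit is broken */
        lane[bad] = 0x00;                 /* its byte arrives corrupted */

        printf("recovered byte: 0x%02x (expected 0x78)\n",
               recover(lane, bad, check));
        return 0;
    }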

To realize high-speed synchronization of parallel processing among nodes, a special feature is introduced: counters within the interconnection network’s control unit, called Global Barrier Counters (GBC), and flag registers within each processor node, called Global Barrier Flags (GBF). This barrier synchronization mechanism, shown in Fig. 8, works as follows (a software analogue is sketched after the list):

1) The master node sets the number of nodes used for the parallel program into the GBC within the IN’s control unit.
2) The control unit resets all GBFs of the nodes used for the program.
3) Each node, on completing its task, decrements the GBC within the control unit and repeatedly checks its GBF until the GBF is asserted.
4) When the GBC reaches 0, the control unit asserts all GBFs of the nodes used for the program.
5) All the nodes begin to process their next tasks.
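
A minimal software analogue of this counter-and-flag scheme, written with C11 atomics purely for illustration; in the Earth Simulator the GBC lives in the IN control unit and the GBFs are hardware registers in the nodes, not memory shared among nodes:

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct {
        atomic_int  gbc;          /* Global Barrier Counter held by the "control unit" */
        atomic_bool gbf[640];     /* one Global Barrier Flag per participating node    */
        int         nodes;        /* number of nodes taking part                       */
    } gbc_barrier_t;

    /* steps 1-2: the master programs the counter and clears all flags */
    void gbc_init(gbc_barrier_t *b, int nodes)
    {
        b->nodes = nodes;
        atomic_store(&b->gbc, nodes);
        for (int i = 0; i < nodes; i++)
            atomic_store(&b->gbf[i], false);
    }

    /* steps 3-5: a node that finished its task decrements the counter and polls its
       flag; when the counter reaches zero, every flag is asserted (here the last
       arriving node plays the control unit's role). */
    void gbc_wait(gbc_barrier_t *b, int my_node)
    {
        if (atomic_fetch_sub(&b->gbc, 1) == 1) {           /* counter just hit zero */
            for (int i = 0; i < b->nodes; i++)
                atomic_store(&b->gbf[i], true);            /* assert all GBFs */
        }
        while (!atomic_load(&b->gbf[my_node]))             /* other nodes poll their GBF */
            ;                                              /* busy-wait */
    }

    int main(void)
    {
        static gbc_barrier_t b;
        gbc_init(&b, 1);          /* trivial single-participant demo */
        gbc_wait(&b, 0);
        return 0;
    }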

With this feature, the barrier synchronization time remains below 3.5µsec as the number of nodes varies from 2 to 512, as shown in Fig. 9.

5. PERFORMANCE

The performance of the interconnection network and the LINPACK benchmark have been measured.

Fig. 6 Inter-node communication mechanism. (1) Node A asks the control unit to reserve a data path; 2) the data are transferred from node A to node B through the data switch units; 3) node B returns a completion code.)

Fig. 7 Inter-node interface with ECC codes. (The sending node adds 3 bytes of ECC to every 13 bytes of data before serialization over the byte-sliced data switch units; the receiving node checks the ECC codes and corrects erroneous data.)

Fig. 8 Barrier synchronization mechanism using GBC.


First, the interconnection bandwidth between every two nodes is shown in Fig. 10. The horizontal axis shows the data size, and the vertical axis shows the bandwidth. Because the interconnection network is a single-stage crossbar switch, this performance is always achieved for two-node communication.

Second, the effectiveness of the Global Barrier Counter is summarized in Fig. 11. The horizontal axis shows the number of processor nodes, and the vertical axis shows the MPI_Barrier synchronization time. The line labeled “with GBC” shows the barrier synchronization time using the GBC, and “without GBC” shows software barrier synchronization. With the GBC feature, the MPI_Barrier synchronization time stays below 3.5µsec regardless of the number of nodes. In contrast, the software barrier synchronization time increases in proportion to the number of nodes.
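
Such a comparison is typically obtained by timing a loop of MPI_Barrier calls with MPI_Wtime; the following harness is a sketch of that standard methodology, not the authors’ actual measurement code:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        const int iters = 10000;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        MPI_Barrier(MPI_COMM_WORLD);                /* warm up and align processes */
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++)
            MPI_Barrier(MPI_COMM_WORLD);
        double dt = (MPI_Wtime() - t0) / iters;

        if (rank == 0)
            printf("average MPI_Barrier time over %d processes: %.2f usec\n",
                   nprocs, dt * 1e6);
        MPI_Finalize();
        return 0;
    }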

The performance of the LINPACK benchmark was measured to verify that the Earth Simulator is a high-performance and high-efficiency computer system. The result is shown in Fig. 12, with the number of processors varying from 1,024 to 5,120. The ratio to peak performance is more than 85%, and the LINPACK performance is proportional to the number of processors.
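
For reference, the efficiency figures quoted for the full system follow from the 40.96TFLOPS aggregate peak:

    \[ \frac{35.86\,\mathrm{TFLOPS}}{40.96\,\mathrm{TFLOPS}} \approx 87.5\%, \qquad \frac{26.58\,\mathrm{TFLOPS}}{40.96\,\mathrm{TFLOPS}} \approx 64.9\%. \]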

Fig. 9 Scalability of MPI_Barrier. (Barrier synchronization time in µsec versus the number of processor nodes, up to 600.)

Fig. 10 Inter-node communication bandwidth. (Read and write bandwidth in GB/s versus transfer data size, from 1 byte to 100MB.)

Fig. 11 Scalability of MPI_Barrier. (MPI_Barrier time in µsec, with and without GBC, for 2 to 64 processor nodes.)

Fig. 12 LINPACK performance and peak ratio. (LINPACK performance in TFLOPS and peak ratio in % versus the number of processors, up to 5,120.)

6. CONCLUSION

An overview of the Earth Simulator system has been given in this paper, focusing on its three architectural features and their hardware realization, especially the details of the interconnection network.

The three architectural features (i.e., vector processors, shared memory and a high-bandwidth non-blocking crossbar interconnection network) provide three levels of parallel programming paradigms: vector processing on a processor, parallel processing with shared memory within a node, and parallel processing among distributed nodes via the interconnection network. The Earth Simulator was thus proven to be the most powerful supercomputer, achieving 35.86TFLOPS, or 87.5% of the peak performance of the system, in the LINPACK benchmark, and 26.58TFLOPS, or 64.9% of the peak performance of the system, for a global atmospheric circulation model with the spectral method.

This record-breaking sustained performance makes this innovative system a very effective scientific tool for providing solutions for the sustainable development of humankind and its symbiosis with the planet Earth.


Shinichi HABATA received his B.E. degree in electrical engineering and M.E. degree in information engineering from Hokkaido University in 1980 and 1982, respectively. He joined NEC Corporation in 1982, and is now Manager of the 4th Engineering Department, Computers Division. He is engaged in the development of supercomputer hardware. Mr. Habata is a member of the Information Processing Society of Japan.

Mitsuo YOKOKAWA received his B.S. and M.S. degrees in mathematics from Tsukuba University in 1982 and 1984, respectively. He also received a D.E. degree in computer engineering from Tsukuba University in 1991. He joined the Japan Atomic Energy Research Institute in 1984, and was engaged in high performance computation in nuclear engineering. He was engaged in the development of the Earth Simulator from 1996 to 2002. He joined the National Institute of Advanced Industrial Science and Technology (AIST) in 2002, and is Deputy Director of the Grid Technology Research Center of AIST. He was a visiting researcher at the Theory Center of Cornell University in the US from 1994 to 1995. Dr. Yokokawa is a member of the Information Processing Society of Japan and the Japan Society for Industrial and Applied Mathematics.

Shigemune KITAWAKI received his B.S. and M.S. degrees in mathematical engineering from Kyoto University in 1966 and 1968, respectively. He joined NEC Corporation in 1968 and was then involved in developing compilers for NEC’s computers. He joined the Earth Simulator project in 1998, and is now Group Leader of the “User Support Group” of the Earth Simulator Center, Japan Marine Science and Technology Center. Mr. Kitawaki belongs to the Information Processing Society of Japan and the Association for Computing Machinery.


ACKNOWLEDGMENTS

The authors would like to offer their sincere condolences for the late Dr. Hajime Miyoshi, who initiated the Earth Simulator project with outstanding leadership.

The authors would also like to thank all the members who were engaged in the Earth Simulator project, and especially H. Uehara and J. Li for the performance measurements.


* * * * * * * * * * * * * * *

Received November 1, 2002

* * * * * * * * * * * * * * *