Ninghui Sun
Institute of Computing Technology
Chinese Academy of Sciences
May 18th, 2006. Beijing, ITER
Supercomputer in China: Dawning’s Experience
Progress Curve of Information Technology
[Figure: degree of maturity vs. time. Stages: expert use, early adoption, public recognition, wide popularization. Technologies marked at their current position: mainframe, PC/Server, Internet, Grid computing.]
Supercomputers in History
The cradle of China's computer industry
Established in 1956
First computing research institute in China
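103 machine: First Computer in China (first small general-purpose digital electronic computer), August 1958
109B machine (109 乙): First Transistor Computer in China (first domestically developed large general-purpose transistor digital computer), June 1965
111 machine: First small-scale integrated circuit (SSI) computer in China (first batch of small-scale integrated circuit general-purpose digital computers), 1970
757 machine: Vector Computer (large vector digital computer built from medium-scale integrated circuits), November 1983
KJ8920 machine: Mainframe Computer (large-scale computer system), November 1991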
Applications
Scientific Computing
Very important applications
Small market
Dawning’s
High Performance Computers
Dawning1000: First MPP Computer in China
1995
Intel i860 microprocessor
Massively Parallel Processing architecture
2D Mesh
Worm-hole Routing Chip
Micro-kernel OS
Message passing protocol
Challenge
[Figure: Dawning1000 node diagram. Components: i860, MEM, Controller, NIC, WRC (worm-hole routing chip), Link.]
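The slides name a 2D mesh with worm-hole routing but do not say which routing algorithm the routing chip uses. As an illustration only, the sketch below assumes dimension-order (XY) routing, a common choice for wormhole-routed meshes. The function name `xy_route` and the coordinate scheme are hypothetical and not taken from the Dawning1000 design.

```python
def xy_route(src, dst):
    """Return the mesh coordinates visited from src to dst.

    Assumption: dimension-order routing, i.e. the message first travels
    along X until the column matches, then along Y. This is a sketch,
    not the documented behaviour of the Dawning1000 routing chip.
    """
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate one hop at a time.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
print(route)                      # nodes visited, source and destination included
print("hops:", len(route) - 1)    # number of links traversed
```

The hop count equals the Manhattan distance between the two nodes, which is why mesh latency grows with the side length of the machine.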
Compare with No.1 of TOP500
Time | Name | Rpeak (Gflops) | Rmax (Gflops) | Num | Processor | Site | Vendor
1994.6 | Paragon XPS140 | 184 | 143.40 | 3680 | Intel i860/XP | Sandia National Lab | Intel
Dawning1000: 2.5 Gflops peak
Gap: 74 times
Dawning2000: First SMP Cluster in China
1999
Dawning3000: SUMA Cluster in China
2001
Challenge
Scalability
Usability
Manageability
Availability
SUMA™ Technology for Cluster: leadership in delivering scalable cluster systems in China
Dawning Cluster
SUMA Cluster
Intel Xeon, IBM Power
Linux, Windows, AIX
Interconnected by five networks
System area network: Myrinet, NCIC Switch, or Gigabit Ethernet
Parallel programming software: BCL3, PVM, MPI, JIAJIA, OpenMP, Autopar, Dawning Cluster DeBugger, open source software (GNU, etc.)
Cluster file system: COSMOS
Cluster management software: RMS, JOSS, CSMS, MONITOR, DSC, SEPS, PowerRouter
Compare with No.1 of TOP500
Time | Name | Rpeak (Gflops) | Rmax (Gflops) | Num | Processor | Site | Vendor
2001.11 | ASCI White | 12288 | 7226 | 8192 | IBM Power3 | Lawrence Livermore | IBM
Dawning3000: 400 Gflops peak
Gap: 30 times
Dawning4000A: Grid-enabling Cluster in China
2004
Compare with No.1 of TOP500
Time | Name | Rpeak (Gflops) | Rmax (Gflops) | Num | Processor | Site | Vendor
2004.6 | Earth Simulator | 40960 | 35860 | 5120 | NEC SX-6 | NASDA | NEC
Dawning4000A: 11264 Gflops peak
Gap: 4 times
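The three "Gap" figures follow from dividing the No.1 system's peak by the Dawning peak of the same period. The short sketch below redoes that arithmetic with the Gflops values quoted on the slides. The helper name `peak_gap` is illustrative.

```python
# Peak performance in Gflops, as quoted on the three comparison slides.
comparisons = [
    ("Dawning1000 vs Paragon XPS140 (1994.6)", 2.5, 184.0),
    ("Dawning3000 vs ASCI White (2001.11)", 400.0, 12288.0),
    ("Dawning4000A vs Earth Simulator (2004.6)", 11264.0, 40960.0),
]


def peak_gap(dawning_gflops, top1_gflops):
    """Return how many times larger the TOP500 No.1 peak is."""
    return top1_gflops / dawning_gflops


for label, dawning, top1 in comparisons:
    print(f"{label}: gap = {peak_gap(dawning, top1):.1f}x")
```

The computed ratios are about 73.6, 30.7 and 3.6, which the slides quote as 74, 30 and 4 times.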
Business Impact
Founder of Dawning Information Industries Co.
Founded in 1995, 50 people
Total capital: about US$9 million
Our share: about US$2.3 million
Dawning Now
Listed on the Hong Kong Stock Exchange
Total capital: about US$100 million
Our share: about US$8 million
600 people
“It’s SUMA” has become the Dawning brand name
Dawning Server Family
Evolution of Dawning HPC Systems
[Figure: log-scale chart (0.01 to 100000) over 1993-2004. Series: Gflop/s, Memory GB, Storage GB, CPUs, Linpack. Annotations: Dawning Corp. founded; clusters.]
Gflop/s | 1995/6 | 2004/6
Top1 | 170 | 35860
Top500 | 1.96 | 624
Dawning | Dawning 1000: 1.2 | Dawning 4000A: 8061
Annual sale | tens | hundreds
Industry Standard Cluster(COTS)
CPU: Xeon/Opteron/Itanium
Memory: SRAM/DDR
I/O: HyperTransport/PCI Express
Storage: SCSI/FC/SATA/iSCSI
Interconnect: Myrinet/Quadrics/Infiniband/Ethernet
OS: Linux/GNU Compiler/Java
Protocol: GM/VIA/Verbs/MPI/PVM/uDAPL
Library: MKL/ESSL/ACML/ScaLAPACK/Gauss
Tools: CMS/PBS/LSF/Lustre
Application: Paradigm/LS-DYNA/Cerius/MM5/Blast/Oracle RAC
Integrate: LinuxNetworx/Scali
Applications
Scientific Computing, Simulation, Data Processing, Network Service
Enterprise applications
Medium market
Currently Typical Systems
Technique Approach
Standard System: Market
  Constellation: SMP/NUMA Unix Cluster
  Cluster: COTS, Rack/Blade
Customized System: Performance
  MPP: IBM BlueGene/L, Cray Red Storm
  Vector: NEC Earth Simulator
  Application Accelerator: QCD, Grape
  Innovation Architecture: Sun Hero, Cray Cascade, IBM PERCS
SoC MPP - BlueGene/L
Features
Low frequency CPU: 700MHz
Low power, High density
Interconnect on Chip
Disadvantage
64,000 chips – massive parallelism
Small memory size per CPU – limited applications
Customized -- NOT low cost
DSM Vector - NEC Earth Simulator
Architecture: Vector Node, MIMD, DSM
Vendor: NEC
System: 640 computing nodes, Peak: 40.96 TFlops, LINPACK: 35.86 TFlops
Node: 8 vector processors, 8 Gflop/s peak per processor, 16GB memory
Interconnect: single-stage crossbar network, 16GB/s cross-section bandwidth
Features
Vector CPU: high sustained performance Distributed shared memory
Single-stage crossbar network
Disadvantage
High price, high power, high proportion
Customized – very small market
Accelerator - Grape-6
Gravity (N-body) calculation for many particles, at 31 Gflops/chip
32 chips/board: 0.99 Tflops/board
Full system of 64 boards installed at the University of Tokyo: 63 Tflops
On each board, all particle data are stored in SRAM; each target particle's data is injected into the pipeline, and its acceleration is then calculated
No software!
Gordon Bell Prize at SC for a number of years (Prof. Makino, U. Tokyo)
Features
Specific application Reconfigurable computing
Disadvantage
Bad for programming
Dedicated design
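The Grape-6 pipeline described above hard-wires the direct-summation gravity kernel: for each target particle, loop over all stored particles and accumulate the acceleration. A minimal software sketch of that kernel follows. The function name, the softening parameter and the choice G = 1 are illustrative assumptions and are not taken from the Grape-6 hardware.

```python
import math


def accelerations(positions, masses, softening=1e-3, G=1.0):
    """Direct-summation gravitational acceleration on every particle.

    positions: list of (x, y, z) tuples; masses: list of floats.
    softening and G are illustrative values (assumptions), used only to
    keep the sketch self-contained and numerically safe.
    """
    result = []
    for i, (xi, yi, zi) in enumerate(positions):      # target particle
        ax = ay = az = 0.0
        for j, (xj, yj, zj) in enumerate(positions):  # particles held in memory
            if i == j:
                continue
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            r2 = dx * dx + dy * dy + dz * dz + softening * softening
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += G * masses[j] * dx * inv_r3
            ay += G * masses[j] * dy * inv_r3
            az += G * masses[j] * dz * inv_r3
        result.append((ax, ay, az))
    return result


# Two unit masses one unit apart attract each other with equal and opposite pull.
print(accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], softening=0.0))
```

The double loop costs O(N²) operations per step, which is the work the dedicated pipeline takes off the host.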
Constellation - SGI Columbia
Architecture: NUMA Constellation
Vendor: SGI
System: 20 Altix NUMA systems, 320 Cabs, Peak: 61.4 TFlops, LINPACK: 52 TFlops, 2000 KW total
Node: 1.5GHz Itanium2, 512 procs/node (NUMA), Dual FPU/proc
Interconnect: Intra-node: SGI® NUMAlink™; Inter-node: Infiniband + Gigabit Ethernet
Features
RISC processor, Unix operating system
Customized high performance interconnection
Disadvantage
Unix server – NOT low cost
Standard but NOT open: controlled by one vendor
COTS Cluster - Dawning4000A
System: 640 nodes, Peak: 11.26 TFlops, LINPACK: 8.06 TFlops
Node: 2.2GHz Opteron, 8GB memory/node
Interconnection: Myrinet2000
Features
Commodity Off-The-Shelf
Compatible with PC
Disadvantage
Low sustained performance
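One way to put a number on "low sustained performance" is LINPACK efficiency, the ratio of measured LINPACK performance to theoretical peak. The sketch below applies that ratio to the peak and LINPACK figures quoted on the system slides. Using efficiency as the yardstick is an interpretation, not something the slides state.

```python
# (peak TFlops, LINPACK TFlops) as quoted on the system slides.
systems = {
    "NEC Earth Simulator": (40.96, 35.86),
    "SGI Columbia": (61.4, 52.0),
    "Dawning4000A": (11.26, 8.06),
}


def linpack_efficiency(peak_tflops, linpack_tflops):
    """Fraction of theoretical peak achieved on LINPACK."""
    return linpack_tflops / peak_tflops


for name, (peak, linpack) in systems.items():
    print(f"{name}: {linpack_efficiency(peak, linpack):.1%}")
```

On these figures the vector and NUMA systems reach roughly 85 to 88 percent of peak, while the COTS cluster reaches roughly 72 percent.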
HPC = High Price Computer
HPC = High Power Computer
HPC = High Proportion Computer
LPC NEEDED !
Technique Challenge
• Memory wall
• Massive threads
• MPP/SMP/Cluster/Vector on Chip
• PIM, Stream, Sun Hero
• Programming model
• PRAM
• time-to-solution
• parallel efficiency
• Reliability of large-scale system
• LPC (Price/Power/Proportion)
• Scalable I/O
• Single System Image
Difficult
History of High Performance Computers
[Figure: log-scale chart by year, 1980 to 2010. Curves: single CPU performance and aggregate systems performance (FLOPS, 1M to 1P), and CPU frequencies (10MHz to 10GHz). Annotation: Increasing Parallelism. Systems plotted: X-MP, VP-200, S-810/20, S-820/80, VP-400, SX-2, CRAY-2, Y-MP8, VP2600/10, SX-3, C90, SX-3R, S-3800, SR2201, SR2201/2K, CM-5, Paragon, T3D, T90, T3E, NWT/166, VPP500, SX-4, ASCI Red, VPP700, SX-5, SR8000, SR8000G1, SX-6, ASCI Blue, ASCI Blue Mountain, VPP800, VPP5000, ASCI White, ASCI Q, Earth Simulator.]
STOP
Look at Future
Progress Curve of Supercomputing
[Figure: degree of maturity vs. time. Stages: expert use, early adoption, public recognition, wide popularization. Technologies marked at their current position: Supercomputer, High Performance Computer, COTS Cluster, Grid Computing.]
Blue Collar Computing
[Figure: Current Market for HPC. Axis labels: number of users, number of applications, number of tasks, amount of computing power, # of dollars, storage and capability. Labels: Heroes (DoD, NSF, DoE), Easy Pickings, Competitive Necessity, Business ROI, Programmer Productivity.]
[Figure: Ideal Market for HPC. Labels: Blue-Collar HPC; increased productivity gains in industry and engineering; increased gains in scientific discovery.]
The Evolution of Science
Observational Science: scientist gathers data by direct observation; scientist analyzes information
Analytical Science: scientist builds analytical model; makes predictions
Computational Science: simulate analytical model; validate model and make predictions
Information Exploration Science: information captured by instruments or generated by simulator; processed by software; placed in a database / files; scientist analyzes database / files
High-end Supercomputer
Capability Computing: run one application
Y2010: 1-10 PetaFlops
CPU: 32-128 cores
Innovative technology
Accelerating single thread within multi-core
Many data transfer channels needed
Grid-enabling HPC
Capacity Computing: run many applications
Grid environment: resource sharing
Productivity: Virtualization technology
Capability Server as the node
Collaborative Scientific Research by Grid
[Figure: map linking sites in Shanghai, Beijing, Xi'an and Hefei.]
Seamless integration and collaboration of remote resources, applications, experts
Personal Supercomputer
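Motivation: popularization as PC
Hardware: desktop, integrated
Software: using and managing as easily as PC
Application: enough third-party software
Windows Compute Cluster Server
[Figure: personal supercomputer block diagram. Components: interconnect network backplane, power supply system, cooling system, HOST, PE, input/output, LAN interface, storage system.]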
Supercomputer on Chip
Moore’s Law works up to Y2020
How to use the transistors: increasing the parallelism
Single-chip supercomputer (IBM TRIPS)
Y2010 mobile computer: 500MHz, 1024 cores, 2TFlops/chip, 100w, low price
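The chip target above can be sanity-checked with two ratios: the floating-point operations each core must complete per cycle, and the resulting Gflops per watt. The sketch below uses only numbers quoted in the slides (the 2010 chip target, and the 61.4 TFlops / 2000 KW figures for SGI Columbia). The helper names are illustrative.

```python
def flops_per_cycle_per_core(peak_gflops, cores, clock_ghz):
    """Operations each core must retire per clock cycle to reach the peak."""
    return peak_gflops / (cores * clock_ghz)


def gflops_per_watt(peak_gflops, watts):
    """Peak performance delivered per watt of power."""
    return peak_gflops / watts


# Y2010 chip target from the slide: 2 TFlops, 1024 cores, 500 MHz, 100 W.
chip_ops = flops_per_cycle_per_core(2000.0, 1024, 0.5)
chip_eff = gflops_per_watt(2000.0, 100.0)

# SGI Columbia from the earlier slide: 61.4 TFlops peak, 2000 KW total.
columbia_eff = gflops_per_watt(61400.0, 2_000_000.0)

print(f"flops per cycle per core needed: {chip_ops:.2f}")
print(f"chip target: {chip_eff:.1f} Gflops/W")
print(f"SGI Columbia: {columbia_eff:.4f} Gflops/W")
```

The target implies about four floating-point operations per core per cycle, and a power efficiency several hundred times that of the 2004 Columbia system.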
Programming Challenge
Thousands of processors: more parallelism
PRAM programming model: shared address space
Three-tier memory hierarchy: cache, memory, interconnect; locality-aware
Friendly parallel programming language: MATLAB
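To make the PRAM item concrete, the sketch below simulates a classic PRAM tree reduction: processors share one array, and in each synchronous step every active processor adds a partner cell into its own. It is a sequential simulation for illustration, not a parallel implementation, and the name `pram_sum` is hypothetical.

```python
def pram_sum(values):
    """Simulate a PRAM tree reduction over a shared array.

    Returns (total, parallel_steps). Each pass of the while loop stands
    for one synchronous PRAM step; the inner loop's iterations touch
    disjoint cells, so they could all run at the same time.
    """
    shared = list(values)   # the shared address space
    n = len(shared)
    steps = 0
    stride = 1
    while stride < n:
        # "Processor" i reads cell i + stride and writes cell i.
        for i in range(0, n - stride, 2 * stride):
            shared[i] += shared[i + stride]
        stride *= 2
        steps += 1
    return (shared[0] if n else 0), steps


total, steps = pram_sum(range(1, 9))
print(total, steps)   # sum of 1..8, reached in log2(8) parallel steps
```

With n values the reduction finishes in about log2(n) parallel steps instead of n - 1 sequential additions, which is the kind of parallelism thousands of processors need to expose.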
THANKS