View
43
Download
0
Category
Preview:
DESCRIPTION
Essential Overview. Louisiana Tech University Ruston, Louisiana Charles Grassl IBM January, 2006. Agenda. Hardware Software Documentation. Hardware Overview. Processors: Nodes: Clusters:. Product Naming. Processor Progression. POWER5 Systems. POWER5 processors - PowerPoint PPT Presentation
Citation preview
© 2005 IBM
Essential Overview
Louisiana Tech UniversityRuston, Louisiana
Charles GrasslIBM
January, 2006
2 © 2005 IBM Corporation
Agenda
• Hardware• Software• Documentation
3 © 2005 IBM Corporation
Hardware Overview
• Processors:
• Nodes:
• Clusters:
4 © 2005 IBM Corporation
Product Naming
New Name Old Name Market Processor
iSeries AS400 Commercial RS64
pSeriesRS600
SPSP2
TechnicalPOWER3POWER4POWER5
xSeriesIA-32IA-64
ServerXeonAMD
zSeries ES9000 Mainframe RS64
5 © 2005 IBM Corporation
Processor Progression
Processor Years Clock Rate Feature
POWER2 1990 - 1994 20 – 60 MHz RISC
P2SC 1994 - 1998 60 – 150 MHz Bandwidth
POWER3 1998 – 2002 200 – 450 MHz Single Chip
POWER4 2001 – 2005 1 – 1.9 GHz Dual Core
POWER5 2004 - 1.5 – 1.9 GHz Multi-Thread
6 © 2005 IBM Corporation
POWER5 Systems
• POWER5 processors•Single and Dual processor chips
• Modules•Dual Chip Modules (DCM)
•Multi Chip Modules (MCM)• Nodes
•Multiple modules• p5-575• p5-595
• Cluster•Multiple nodes
•Connected with High Speed Switch (HPS)
7 © 2005 IBM Corporation
Systems (“Nodes”)
Model ProcessorsClock Rate
(GHz)Memory
(x 2^30 byte)
p5-595 16-6416-64 1.65, 1.91.65, 1.9 20002000
p5-590 8-32 1.65, 1.9 1000
p5-575 8 1,5, 1.9 256
p5-570 2-16 1.65, 1.9 512
p5-550 2-4 1.65 64
p5-520 2 1.65 32
p5-510 1,2 1.65 1 - 32
8 © 2005 IBM Corporation
POWER5 Processor Systems
MCM
Chip
Processor
DCM p5-575
p5-595
Cluster
9 © 2005 IBM Corporation
Cluster 1600
Multi ProcessorNodes
Physical ViewLogical View
Network,Disk System
10 © 2005 IBM Corporation
Local System Name
• IBM p5-575 nodes• 1.9 GHz POWER5 processors
• Single processor chips
• 8 processors per node
• HPS interconnect
• “575” distinction:• Dual Chip Module (DCM)• 8 DCMs• One or two processors
per chip• Single Core (SC)
• Dual Core (DC)
• “595” distinction:• Multi Chip Module (MCM)
construction• 8 MCMs
11 © 2005 IBM Corporation
POWER5 Processors
• Multi-processor chip• High clock rate: Multiple GHz• Three cache levels
• Bandwidth
• Latency hiding
• Shared Memory• Large memory size
12 © 2005 IBM Corporation
POWER5 Features
• Private L1 cache• Shared L2 cache• Shared L3 cache• Interleaved memory• Hardware Prefetch• Multiple Page Size support
13 © 2005 IBM Corporation
Processor Characteristics
• High frequency clocks•Deep pipelines
•High asymptotic rates
• Superscalar• Speculative out-of-order instructions • Up to 8 outstanding cache line misses• Large number of instructions in flight• Branch prediction• Hardware Prefetching
14 © 2005 IBM Corporation
Processor FeaturesPOWER4 POWER5
Clock 1.0 – 1.9 GHz 1.5 – 1.9 - … GHz
Caches Three levels Three levels
L3 Speed 1/3 clock frequency ½ clock frequency
Virtualization Up to 32 partitions Up to 254 partitions
Partitions Unit processor Fractional
Power Mang. Static Dynamic
ThreadExecution
Single Thread Multi Threading
Memory Store
Single Buffer Double Buffer
Renaming Registers
GP: 72FP: 80
GP: 120FP: 120
15 © 2005 IBM Corporation
Caches and Memory
POWER4 POWER5
L1 CacheData: 32 kbyteInstruction: 64 kbyte2-way Assoc., FIFO
Data: 32 kbyteInstruction: 64 kbyte4-way Assoc., LRU
L2 Cache 1.5 Mbyte8-way Assoc., FIFO
1.9 Mbyte10-way Assoc., LRU
L3 Cache32 Mbyte8-way Assoc., LRU120 Cycles
36 Mbyte12-way Assoc., LRU~80 Cycles
Memory Bandwidth 4 Gbyte/s / Chip 16 Gbyte/s / Chip
16 © 2005 IBM Corporation
POWER4+ POWER5
Frequency (GHz) 1.7 1.9
L2 Latency (Cycles) 12 12
L3 Latency (Cycles) 120 80
Memory Latency (Cycles) 351 220
Copy Bandwidth4 proc. (Gbyte/s)
8 18
Linpack RateN=1000 (Gflop/s)
3.9 5.6
SPECint_base2000 1077 1398
SPECfp_base2000 1598 2576
POWER4 – POWER5 Comparison
17 © 2005 IBM Corporation
POWER5 Design: Summary
• More gates•170 million 260 million
• Enhancements• Increased cache associativity
• Increased number of rename registers
•Reduced L3 and cache latency
• New features•Simultaneous Multi Threading
•Dynamic power management
18 © 2005 IBM Corporation
Processor Systems (Nodes)
• Multiple processors• Multiple modules• Various construction formats
•Multi Chip Modules
•Dual Chip Modules
• Shared memory
19 © 2005 IBM Corporation
Multi Chip and Dual Chip Modules
Multi Chip Module (MCM)p5-590p5-595
ChipPOWER5 Processor
Dual Chip Module (MCM)p5-570p5-575
20 © 2005 IBM Corporation
Dual Chip Module
• Each Module:• 1 processor chip
• 1 L3 cache
• 1 Memory card• Each Processor
Chip• 2 processors
• L1 caches• Registers• Functional
units
• 1 L2 cache
• 1 path to memory
36Mbyte
L3
Memory
21 © 2005 IBM Corporation
Multi Chip Module
• Each Module:• 4 processor chips
• 4 L3 cache chips
• 2 Memory cards
• Each Processor Chip• 2 processors
• L1 caches
• Registers
• Functional units
• 1 L2 cache
• 1 path to memory
Memory
Memory
Memory
Memory
22 © 2005 IBM Corporation
POWER5 Multi Chip Module
• Four POWER5 chips• Four L3 cache chips• 95mm 95mm• 4,491 signal I/Os• 89 layers of metal
23 © 2005 IBM Corporation
POWER5 Dual Chip Module
• One POWER5 chip
• Single or Dual Core
• One L3 cache chips
24 © 2005 IBM Corporation
L3
Modifications to POWER4 System Structure
P P
L2
Memory
L3
Fab Ctl
P P
L2
L3
Memory
L3
Fab Ctl
L3 L3
Mem Ctl Mem Ctl
Mem Ctl Mem Ctl
25 © 2005 IBM Corporation
Switch Technology
• Internal network•In lieu of GigEthernet, Myrinet, Quadrics, etc.
• Fourth generation•HPS Switch (POWER2 generation)
•SP Switch (POWER2 -> POWER3)
•SP Switch 2 (POWER3 -> POWER4)
•HPS (POWER4 -> POWER5)• Multiple links per node
•Match number of links to number of processors
26 © 2005 IBM Corporation
High Performance Switch (HPS)
• Also Known As “Federation”• Follow on to SP Switch2
•Also known as “Colony”• Specifications:
•2 Gbyte/s (bidirectional)
•5 microsecond latency• Configuration:
•Up to four adaptors per node• 2 links per adaptor• 16 Gbyte/s per node
27 © 2005 IBM Corporation
HPS Specifications
Latency[microsec.]
Bandwidth, single
[Mbyte/s]
Bandwidth, multiple
[Mbyte/s]
SP Switch 2 15 350 550
HPS 5 1800 1930
28 © 2005 IBM Corporation
Software Overview
• Operating System• AIX
• Compilers• C
• C++
• Fortran• Batch Queue
• LoadLeveler (IBM)
• LSF (Platform)
• PBS
• Gridware
29 © 2005 IBM Corporation
AIX
• Current Version: AIX 5.3• Processors:
• POWER3
• POWER4
• POWER5• Linux Affinity• Logical PARtitions (LPAR) Nodes
• Operating system
• Memory
• Network connections• Kernel Address Size:
• 64-bit
• 32-bit
30 © 2005 IBM Corporation
Linux on POWER
• Native Linux, SuSE7 SuSE8• Rpm's and package managers• Cluster Systems Manager• 64-bit kernel• 32/64-bit applications support (SuSE8)
Compiler User Name
C Xlc
C++ xlC
Fortran xlf
31 © 2005 IBM Corporation
Compilers
C and C++• Visual Age C and C++
Professional for AIX• Versions 6, 7, 8
• ANSI C
• C++
• Compiler names:• xlc
• xlC
Fortran• XL Fortran for AIX
• Versions 8, 9, 10
• Fortran 77
• Fortran 90
• Compiler names:• xlf77
• xlf90
32 © 2005 IBM Corporation
Compiler Names
Compiler User Name
Fortran 77 xlf77
Fortran 90 xlf90
C xlc
C++ xlC
MPI compile mpxlf, mpcc
Reentrant xlf_r, xlc_r
AIX uses different compiler names to perform some tasks which are handled by compiler flags on most other systems
33 © 2005 IBM Corporation
Compiler Usage
Language Command Feature Extension
ANSI Cxlc
xlc_rANSI
Thread safe.c
Extended C cc Pre-ANSI .c
MPI, C mpxlc MPI .c
C++xlC
xlC_r Thread safe.C .cc .cpp
Fortran 77xlf
xlf_r Thread safe.f
Fortran 90xlf90
xlf90_r Thread safe.f
MPI fortran mpxlf MPI .f
34 © 2005 IBM Corporation
User Limits
• Set by the system administrator• Ulimit:
•C or K shell built-in
•Sets or reports resource limits
•Limits are defined in /etc/security/limits
• Sizes are in 512 byte blocks• Times are in seconds
•$ ulimit -a
35 © 2005 IBM Corporation
Ulimit Defaults
Value
Limit Definition Default Typical
fsize File Size 2097151 Unlimited (-1)
core Core File Size 2097151 Unlimited (-1)
cpu Per Process limit -1 (unlimited)
Unlimited (-1)
data Data Segment Size 262144 Unlimited (-1)
stack Stack Segment Size 65536 *Unlimited (-1)
No. files File Descriptor Limit
2000 2000
* 64-bit address mode
36 © 2005 IBM Corporation
Other Defaults
• Thread control• /etc/environment
• AIXTHREAD_SCOPE=S• AIXTHREAD_MNRATIO=1:1• AIXTHREAD_COND_DEBUG=OFF• AIXTHREAD_GUARDPAGES=4• AIXTHREAD_MUTEX_DEBUG=OFF• AIXTHREAD_RWLOCK_DEBUG=OFF
37 © 2005 IBM Corporation
Batch Queuing
• Compile on any AIX node•Use –qarch=pwr5
• Submit job with available batch utility• Use appropriate queue name• Available queuing systems:
•LoadLeveler
•PBS
•Gridware
•LSF
38 © 2005 IBM Corporation
Cluster Layout
CompileAnd
SubmitNode
Node 0 Node 1
Network
Node 2
39 © 2005 IBM Corporation
Documentation
• Software:•www.software.ibm.com
• Products A-Z• X -> xl C, xl C/C++, xl Fortran
•www.servers.ibm.com/aix• Compilers
• /usr/vac/doc
• /usr/vacpp/doc
• /usr/lpp/xlf/doc• Redbooks:
•www.redbooks.ibm.com/• IBM eServer p5 590 and 595 System Handbook
40 © 2005 IBM Corporation
Documentation
• AIX Commands Reference•AIX command:
• /usr/sbin/infocenter• /opt/ibm_help/help_start.sh
•http://www.unet.univie.ac.at/aix/aixgen/wbinfnav/aixcmdsrefbooks.htm• Google search: “AIX Commands Reference”
41 © 2005 IBM Corporation
Documentation Library
Google Search: AIX 5L documentation Libraryhttp://publibn.boulder.ibm.com/cgi-bin/ds_rslt
42 © 2005 IBM Corporation
Summary: Architecture
• System architecture• Processors
• Nodes
• Cluster
• Processors• POWER5
• Three levels of cache
• Nodes:• Eight processor p5-575
• Cluster:• 14 p5-575 nodes
• HPS interconnect
Recommended