The Xenon Processor
http://arstechnica.com/articles/paedia/cpu/xbox360-2.ars/2
Xenon is a modified PPE (Power Processing Element) from the Cell processor; IBM designed it for Microsoft.
Broadway CPU – Single Core, 729 MHz
Sources: http://www.wikipedia.org, http://www.reghardware.co.uk/2007/07/20/wii_tops_3m_in_japan/
SMT Architecture
Figure 5.8
Each logical CPU:
- has its own registers
- can handle interrupts
Similar to virtual machines, but done at the hardware level
CPU Affinity (a process staying on one processor)
[Diagram: CORE 1 and CORE 2, each with its own cache, sharing main memory]
Soft Affinity – the process may be migrated to a different processor
Hard Affinity – the process is locked to one processor
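Linux exposes hard affinity to user space, and Python's standard library wraps it as `os.sched_setaffinity` (Linux-only, so the call is guarded here). Soft affinity, by contrast, is only a scheduler preference with no direct user-space knob.

```python
import os

# Hard affinity sketch: pin this process to CPU 0 (Linux-specific API;
# pid 0 means "the calling process"). On other platforms this is skipped.
if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(0, {0})
    print(sorted(os.sched_getaffinity(0)))  # on Linux: [0]
```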
Load Balancing: Push Migration
[Diagram: the kernel checks the load of each core's ready queue (Ready Queue 1, Ready Queue 2) and pushes tasks from the overloaded core to the underloaded one]
Load Balancing: Pull Migration
[Diagram: Ready Queue 2 becomes empty and notifies the kernel; idle CORE 2 then pulls waiting tasks from CORE 1's Ready Queue 1]
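The balancing idea behind both diagrams can be sketched with plain queues: move tasks one at a time from the busiest ready queue to the idlest until they are roughly even. This is a toy illustration, not the kernel's actual code; the task names are made up.

```python
from collections import deque

def balance(queues):
    """Toy load balancer: migrate tasks from the busiest ready queue
    to the idlest one until their lengths differ by at most 1."""
    busiest = max(queues, key=len)
    idlest = min(queues, key=len)
    while len(busiest) - len(idlest) > 1:
        idlest.append(busiest.pop())  # migrate one task
    return queues

q1 = deque(["A", "B", "C", "D"])  # CORE 1: overloaded
q2 = deque([])                    # CORE 2: empty, would "pull"
balance([q1, q2])
print(len(q1), len(q2))  # 2 2
```

Push migration runs this check periodically from the kernel; pull migration triggers it from the idle core's side, but the task movement is the same.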
Scheduling Domains in the Linux Kernel (v2.6.7 and later)
[Diagram: CPU 0 and CPU 1, each containing Core 0 and Core 1, arranged in a hierarchy of scheduling levels (Level 0, Level 1, Level 2), with load balancing performed at each level]
Takes CPU affinity into consideration: it tries to migrate processes only within the same group.
Benefits of Scheduling Domains:
- Keeps migration local when possible
- Fewer cache misses
- Can optimize for power saving: schedule on only one domain when possible
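The domain hierarchy can be read as a target-selection rule: keep a migration inside the source core's own group before crossing to another physical CPU. A toy sketch of that rule (the domain and core names are made up, and this is not the kernel's actual algorithm):

```python
# Two physical CPUs, each a Level-0 domain containing two cores.
domains = {"cpu0": ["core0", "core1"], "cpu1": ["core2", "core3"]}
loads = {"core0": 5, "core1": 1, "core2": 3, "core3": 3}

def pick_target(src):
    """Choose where to migrate a task from `src`, preferring local cores."""
    # Level 0: siblings in the same domain share cache, so try them first.
    for cores in domains.values():
        if src in cores:
            siblings = [c for c in cores if c != src]
            local = min(siblings, key=loads.get)
            if loads[local] < loads[src] - 1:
                return local
    # Higher level: fall back to the least-loaded core anywhere.
    return min(loads, key=loads.get)

print(pick_target("core0"))  # core1: migration stays inside cpu0's domain
```

Staying inside the Level-0 group is what keeps the migrated task's cache working set warm.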
Future Trend of Multi-CPU Processors?
AMP – Asymmetric Multi-Processing
A few high-speed serial cores + many slower parallel cores
Turbo Boost Technology (Intel) – Core i5, i7 Processors
3-4 active cores: 2.26 GHz (parallel tasks)
2 active cores: 3.06 GHz
1 active core: 3.2 GHz (sequential tasks)
The processor can turn any core on or off and adjust its clock speed
Example of Performance Asymmetry
Core 0: Performance Index (PI) = 2; Cores 1-3: PI = 1 each
Scaled Load – Core 0's ready queue should be twice as long as each of the others
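The scaled-load rule can be stated numerically: give each core a share of the ready tasks proportional to its performance index. A minimal sketch, assuming 10 ready tasks (the task count is made up for illustration):

```python
# Performance indices from the example: Core 0 is twice as fast.
pis = {"core0": 2, "core1": 1, "core2": 1, "core3": 1}
tasks = 10

# Each core's fair queue length is proportional to its PI.
total = sum(pis.values())
shares = {core: tasks * pi / total for core, pi in pis.items()}
print(shares["core0"], shares["core1"])  # 4.0 2.0
```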
Handling a High Number of Cores: Non-Preemptive Scheduling
- Less need to share a CPU
- Saves context-switch time
Smart Barrier:
- A thread can tell the OS what resources it is waiting for
- The OS does not need to schedule the thread until those resources are ready
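In user space the same idea looks like blocking on an explicit readiness object. This sketch uses a `threading.Event` to stand in for the declared resource: the worker is descheduled inside `wait()` and only runs once the resource is marked ready.

```python
import threading

# The "resource" the worker declares it is waiting for (illustrative).
data_ready = threading.Event()
results = []

def worker():
    data_ready.wait()            # blocked; no CPU time until notified
    results.append("processed")  # runs only after the resource is ready

t = threading.Thread(target=worker)
t.start()
data_ready.set()                 # resource becomes ready -> thread runs
t.join()
print(results)  # ['processed']
```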
Parallel Processing Exercise
Old Data – an 8 × 8 grid:
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64
1. New Data [x, y] = Old Data [x, y] ^ 2
2. New Data [x, y] = (Old Data [x, y] + Old Data [x-1, y] + Old Data [x+1, y] + Old Data [x, y-1] + Old Data [x, y+1]) / 5
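Both steps can be checked sequentially before parallelizing. This sketch assumes the grid holds 1..64 row by row and that out-of-range neighbours are simply skipped in the 5-point average (the exercise does not specify boundary handling):

```python
N = 8
old = [[y * N + x + 1 for x in range(N)] for y in range(N)]  # 1..64

# 1. New[x, y] = Old[x, y] ^ 2 -- every cell is independent, so this
#    is embarrassingly parallel (one cell per core/thread).
squared = [[v * v for v in row] for row in old]

# 2. 5-point average -- still parallel, but each cell now reads its
#    neighbours, so all reads must come from the *old* grid.
def avg5(x, y):
    pts = [(x, y), (x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    vals = [old[j][i] for i, j in pts if 0 <= i < N and 0 <= j < N]
    return sum(vals) / len(vals)  # skip out-of-range neighbours

smoothed = [[avg5(x, y) for x in range(N)] for y in range(N)]
print(squared[0][0], smoothed[3][3])  # 1 28.0
```

Step 1 needs no coordination at all; step 2 only needs a barrier between reading the old grid and reusing the new one.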