
  • COMP25212 CPU Multi Threading
    Learning Outcomes: to be able to:
    – Describe the motivation for multithread support in CPU hardware
    – Distinguish the benefits and implementations of coarse-grain, fine-grain and simultaneous multithreading
    – Explain when multithreading is inappropriate
    – Describe a multithreading implementation
    – Estimate the performance of these implementations
    – State the important assumptions of this performance model

  • Revision: Increasing CPU Performance
    [Pipeline diagram: Fetch, Decode, Execute, Memory and Write logic stages, with an instruction cache feeding Fetch and a data cache serving Memory; instructions a–f flow through the pipeline, one stage per clock.]
    How can throughput be increased?

  • Increasing CPU Performance
    – By increasing clock frequency
    – By increasing Instructions per Clock:
      – Minimising memory access impact: data cache
      – Maximising instruction issue rate: branch prediction
      – Maximising instruction issue rate: superscalar
      – Maximising pipeline utilisation: avoid instruction dependencies, out-of-order execution
    (What does lengthening the pipeline do?)

  • Increasing Program Parallelism
    – Keep issuing instructions after a branch?
    – Keep processing instructions after a cache miss?
    – Process instructions in parallel?
    – Write a register while a previous write is pending?

    Where can we find additional independent instructions? In a different program!

  • Revision: Process States
    [State diagram: New → Ready waiting for a CPU; Ready → Running on a CPU (dispatch, by the scheduler); Running → Blocked waiting for event (needs to wait, e.g. I/O); Blocked → Ready (I/O occurs); Running → Ready (pre-empted, e.g. timer); Running → Terminated.]
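    A minimal sketch of this state machine in Python; the states follow the diagram above, while the event names ("admit", "dispatch", "wait", "io_done", "preempt", "exit") are my own labels for its arrows:

```python
from enum import Enum, auto

class ProcState(Enum):
    NEW = auto()
    READY = auto()        # waiting for a CPU
    RUNNING = auto()      # running on a CPU
    BLOCKED = auto()      # waiting for an event
    TERMINATED = auto()

# Legal transitions, keyed by (current state, event).
TRANSITIONS = {
    (ProcState.NEW,     "admit"):    ProcState.READY,
    (ProcState.READY,   "dispatch"): ProcState.RUNNING,   # scheduler
    (ProcState.RUNNING, "wait"):     ProcState.BLOCKED,   # e.g. I/O request
    (ProcState.BLOCKED, "io_done"):  ProcState.READY,     # I/O occurs
    (ProcState.RUNNING, "preempt"):  ProcState.READY,     # e.g. timer
    (ProcState.RUNNING, "exit"):     ProcState.TERMINATED,
}

def step(state: ProcState, event: str) -> ProcState:
    """Apply one event, rejecting transitions the diagram does not allow."""
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"illegal transition: {state.name} on {event!r}")
    return TRANSITIONS[(state, event)]
```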

  • Revision: Process Control Block
    – Process ID
    – Process State
    – PC
    – Stack Pointer
    – General Registers
    – Memory Management Info
    – Open File List, with positions
    – Network Connections
    – CPU time used
    – Parent Process ID
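    Since the PCB is just a per-process record, a minimal sketch as a Python dataclass may help. Field names follow the list above; the 32-entry register file and the dict/list representations are assumptions of the sketch, not slide-given details:

```python
from dataclasses import dataclass, field

@dataclass
class PCB:
    """Process Control Block: per-process state the OS must keep."""
    pid: int
    state: str                       # "new", "ready", "running", "blocked" or "terminated"
    pc: int                          # saved program counter
    stack_pointer: int
    registers: list[int] = field(default_factory=lambda: [0] * 32)  # width is an assumption
    mem_mgmt: dict = field(default_factory=dict)    # e.g. page-table base, limits
    open_files: list = field(default_factory=list)  # (file, position) pairs
    net_conns: list = field(default_factory=list)
    cpu_time_used: float = 0.0
    parent_pid: int | None = None
```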

  • Revision: CPU Switch
    [Diagram: Process P0 is executing; on an interrupt or system call the operating system saves state into PCB0 and loads state from PCB1; Process P1 executes; later the OS saves state into PCB1 and loads state from PCB0, and P0 resumes.]

  • What does the CPU load on dispatch?
    – Process ID
    – Process State
    – PC
    – Stack Pointer
    – General Registers
    – Memory Management Info
    – Open File List, with positions
    – Network Connections
    – CPU time used
    – Parent Process ID

  • What does the CPU need to store on deschedule?
    – Process ID
    – Process State
    – PC
    – Stack Pointer
    – General Registers
    – Memory Management Info
    – Open File List, with positions
    – Network Connections
    – CPU time used
    – Parent Process ID
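    The point of both questions is that only a few PCB fields are hardware-visible: the PC, stack pointer, general registers and memory-management state live in CPU registers and must be saved/restored on a switch; the rest (open files, CPU time used, parent PID, ...) is OS bookkeeping that never leaves memory. A minimal sketch, assuming a hypothetical CPUState record for the hardware registers (the names, including pt_base, are illustrative, not from the slides):

```python
from dataclasses import dataclass, field

@dataclass
class CPUState:
    """Hypothetical hardware-visible register state of one CPU."""
    pc: int = 0
    sp: int = 0
    regs: list[int] = field(default_factory=lambda: [0] * 32)
    pt_base: int = 0                 # e.g. page-table base register (assumption)

def deschedule(cpu: CPUState, pcb) -> None:
    """Save the hardware-visible state of the running process into its PCB."""
    pcb.pc = cpu.pc
    pcb.stack_pointer = cpu.sp
    pcb.registers = list(cpu.regs)   # snapshot the general register file
    pcb.mem_mgmt["pt_base"] = cpu.pt_base

def dispatch(cpu: CPUState, pcb) -> None:
    """Load the hardware-visible state of the chosen process into the CPU."""
    cpu.pc = pcb.pc
    cpu.sp = pcb.stack_pointer
    cpu.regs = list(pcb.registers)
    cpu.pt_base = pcb.mem_mgmt.get("pt_base", 0)
```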

  • CPU Support for Multithreading

  • How Should OS View Extra Hardware Thread?
    – A variety of solutions
    – Simplest is probably to declare an extra CPU
    – Need a multiprocessor-aware OS

  • CPU Support for Multithreading
    Design issue: when to switch threads

  • Coarse-Grain Multithreading
    Switch thread on an expensive operation:
    – e.g. I-cache miss
    – e.g. D-cache miss
    Some are easier than others!

  • Switch Threads on I-cache Miss

    Cycle:    1    2    3        4    5    6    7
    Inst a:   IF   ID   EX       MEM  WB
    Inst b:        IF   ID       EX   MEM  WB
    Inst c:             IF miss  --   --   --   --
    Inst X:                      IF   ID   EX   MEM
    Inst Y:                           IF   ID   EX
    Inst Z:                                IF   ID

    (Instructions X, Y, Z come from a different thread; thread 1's instructions d, e, f are simply not fetched while it waits.)

  • Performance of Coarse Grain
    Assume (conservatively): 1 GHz clock (1 ns clock tick!), 20 ns memory access (= 20 clocks), 1 I-cache miss per 100 instructions, 1 instruction per clock otherwise.
    Time to execute 100 instructions without multithreading: 100 + 20 clock cycles.
    Instructions per Clock = 100 / 120 = 0.83.
    With multithreading, time to execute 100 instructions: 100 [+ 1] cycles (the 20-cycle miss is hidden by the other thread; [+ 1] if the switch itself costs a cycle).
    Instructions per Clock = 100 / 101 = 0.99.
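    This model is easy to check numerically. A minimal sketch in Python, using the slide's figures as defaults (the parameter names and the switch_cost knob are mine):

```python
def ipc_coarse_grain(n_inst=100, miss_rate=0.01, miss_penalty=20,
                     multithreaded=False, switch_cost=1):
    """IPC under the slide's model: 1 instruction per clock, except that
    each I-cache miss either stalls the pipeline for miss_penalty cycles
    (single thread) or is hidden by another thread at switch_cost cycles."""
    misses = n_inst * miss_rate
    if multithreaded:
        cycles = n_inst + misses * switch_cost   # memory latency overlapped
    else:
        cycles = n_inst + misses * miss_penalty  # pipeline stalls on miss
    return n_inst / cycles

print(ipc_coarse_grain())                    # 100/120 ≈ 0.83
print(ipc_coarse_grain(multithreaded=True))  # 100/101 ≈ 0.99
```

    Note the model's key assumption: the other thread always has work ready and never misses itself.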

  • Switch Threads on D-cache Miss
    Performance: similar calculation (STATE ASSUMPTIONS!)
    Where to restart after the memory cycle? I suggest instruction a. Why? (Its load missed, so it has not completed and must re-execute when the thread resumes.) Abort the instructions behind it:

    Cycle:    1    2    3    4         5    6    7
    Inst a:   IF   ID   EX   MEM miss  --   --   --
    Inst b:        IF   ID   EX        (abort)
    Inst c:             IF   ID        (abort)
    Inst d:                  IF        (abort)
    Inst X:                            IF   ID   EX
    Inst Y:                                 IF   ID

    (The miss is only detected at instruction a's MEM stage, one stage later than an I-cache miss, so b, c and d are already in the pipeline and must be squashed before the switch.)
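    A sketch of the "similar calculation", with the assumptions stated explicitly: the miss rate and memory latency reuse the coarse-grain figures, while the number of squashed instructions re-executed per switch (aborted=3, matching b, c, d above) and the one-cycle switch cost are my assumptions:

```python
def ipc_dcache_switch(n_inst=100, miss_rate=0.01, miss_penalty=20,
                      multithreaded=False, aborted=3, switch_cost=1):
    """Like the I-cache model, but a D-cache miss is detected at MEM,
    so each switch also squashes and later re-executes 'aborted'
    younger instructions."""
    misses = n_inst * miss_rate
    if multithreaded:
        cycles = n_inst + misses * (switch_cost + aborted)
    else:
        cycles = n_inst + misses * miss_penalty
    return n_inst / cycles

print(ipc_dcache_switch())                    # single thread: 100/120 ≈ 0.83
print(ipc_dcache_switch(multithreaded=True))  # 100/104 ≈ 0.96
```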
