
Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly


Page 1: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Koushik Chakraborty, Philip Wells, Gurindar Sohi

{kchak,pwells,sohi}@cs.wisc.edu

Page 2: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Chakraborty, Wells, and Sohi ASPLOS 2006 2

Paper Overview

Multiprocessor Code Reuse
• Poor resource utilization

Computation Spreading
• New model for assigning computation within a program on CMP cores in H/W
• Case study: OS and User computation

Investigate performance characteristics

Page 3: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Talk Outline

• Motivation
• Computation Spreading (CSP)
  • Case study: OS and User computation
• Implementation
• Results
• Related Work and Summary

Page 4: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Homogeneous CMP

Many existing systems are homogeneous
• Sun Niagara, IBM Power 5, Intel Xeon MP

Multithreaded server application
• Composed of server threads
• Typically each thread handles a client request
• OS assigns software threads to cores
  • Entire computation from one thread executes on a single core (barring migration)

Page 5: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Code Reuse

Many client requests are similar
• Similar service across multiple threads
• Same code path traversed in multiple cores

Instruction footprint classification
• Exclusive – single core access
• Common – many cores access
• Universal – all cores access
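The three footprint classes can be illustrated with a minimal sketch. It assumes a trace of (core, instruction block) pairs and treats "common" as more than one core but fewer than all of them; the function name, the trace format, and that threshold are illustrative assumptions, not taken from the talk.

```python
from collections import defaultdict


def classify_footprint(accesses, num_cores):
    """Classify each instruction block by how many cores fetched it.

    accesses: iterable of (core_id, block_addr) pairs (hypothetical trace format).
    Returns a dict mapping block_addr -> "exclusive" | "common" | "universal".
    """
    cores_per_block = defaultdict(set)
    for core_id, block in accesses:
        cores_per_block[block].add(core_id)

    classes = {}
    for block, cores in cores_per_block.items():
        if len(cores) == 1:
            classes[block] = "exclusive"   # single core access
        elif len(cores) == num_cores:
            classes[block] = "universal"   # all cores access
        else:
            classes[block] = "common"      # assumed: more than one core, but not all
    return classes


# Toy trace on a 4-core system (addresses are made up).
trace = [(0, 0x100), (1, 0x100), (2, 0x100), (3, 0x100),
         (0, 0x140), (1, 0x140),
         (2, 0x180)]
result = classify_footprint(trace, num_cores=4)
```

Here block 0x100 is fetched by every core, 0x140 by two cores, and 0x180 by one core only.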

Page 6: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Multiprocessor Code Reuse

Page 7: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Implications

Lack of instruction stream specialization
• Redundancy in predictive structures
  • Poor capacity utilization
• Destructive interference

No synergy among multiple cores
• Lost opportunity for co-operation

Exploit core proximity in CMP

Page 8: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Talk Outline

• Motivation
• Computation Spreading (CSP)
  • Case study: OS and User computation
• Implementation
• Results
• Related Work and Summary

Page 9: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Computation Spreading (CSP)

Computation fragment = dynamic instruction stream portion

Collocate similar computation fragments from multiple threads
• Enhance constructive interference

Distribute dissimilar computation fragments from a single thread
• Reduce destructive interference

Reassignment is the key

Page 10: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Example

[Figure: computation fragments A1, B1, C1, A2, B2, C2, A3, B3, C3 from threads T1, T2, T3, laid out over time on cores P1, P2, P3 under CANONICAL and CSP assignment]
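The contrast in the example can be sketched in a few lines. The sketch assumes that fragments sharing a letter (A, B, C) are similar, that canonical assignment pins each thread to one core, and that CSP sends each fragment type to one core; the specific type-to-core mapping below is hypothetical, since the figure's exact layout is not recoverable from the text.

```python
from collections import defaultdict

# (thread, fragment type) pairs in program order, following the example's labels.
fragments = [("T1", "A"), ("T1", "B"), ("T1", "C"),
             ("T2", "B"), ("T2", "C"), ("T2", "A"),
             ("T3", "C"), ("T3", "A"), ("T3", "B")]

THREAD_CORE = {"T1": "P1", "T2": "P2", "T3": "P3"}  # canonical: one thread per core
KIND_CORE = {"A": "P1", "B": "P2", "C": "P3"}       # CSP: hypothetical type-to-core mapping


def kinds_per_core(frags, assign):
    """Return, for each core, the sorted list of fragment types it executes."""
    seen = defaultdict(set)
    for thread, kind in frags:
        seen[assign(thread, kind)].add(kind)
    return {core: sorted(kinds) for core, kinds in sorted(seen.items())}


canonical = kinds_per_core(fragments, lambda thread, kind: THREAD_CORE[thread])
csp = kinds_per_core(fragments, lambda thread, kind: KIND_CORE[kind])
```

Under the canonical assignment every core sees all three fragment types, while under this CSP mapping each core sees exactly one, which is the specialization the talk is after.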

Page 11: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Key Aspects

Dynamic Specialization
• Homogeneous multicore acquires specialization via retaining mutually exclusive predictive state

Data Locality
• Data dependencies between different computation fragments
• Careful fragment selection to avoid loss of data locality

Page 12: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Selecting Fragments

Server workload characteristics
• Large data and instruction footprint
• Significant OS computation

User Computation and OS Computation
• A natural separation
• Exclusive instruction footprints
• Relatively independent data footprint

Page 13: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Data Communication

[Figure: threads T1 and T2 divided into T1-User, T1-OS, T2-User, and T2-OS, placed on Core 1 and Core 2]

Page 14: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Relative Inter-core Data Communication

[Figure: relative inter-core data communication for Apache and OLTP]

OS-User communication is limited

Page 15: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Talk Outline

• Motivation
• Computation Spreading (CSP)
  • Case study: OS and User computation
• Implementation
• Results
• Related Work and Summary

Page 16: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Implementation

Migrating Computation
• Transfer state through the memory subsystem
  • ~2KB of register state in SPARC V9
  • Memory state through coherence

Lightweight Virtual Machine Monitor
• Migrates computation as dictated by the CSP Policy
• Implemented in hardware/firmware
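To give a feel for the migration cost, here is a toy model of spilling register state to memory and reloading it on another core. The 2KB figure comes from the slide; the 64-byte cache line, the `Core` class, and the `migrate` helper are assumptions made only for this sketch and do not describe the actual hardware/firmware mechanism.

```python
REGISTER_STATE_BYTES = 2 * 1024  # "~2KB of register state in SPARC V9" (from the slide)
CACHE_LINE_BYTES = 64            # assumption: the line size is not given in the talk


def lines_to_transfer(state_bytes, line_bytes):
    """Number of cache lines needed to hold the state (ceiling division)."""
    return -(-state_bytes // line_bytes)


class Core:
    """Hypothetical core holding the register state of the VCPU it is running."""

    def __init__(self, name):
        self.name = name
        self.registers = None


def migrate(src, dst, memory):
    """Toy migration: src spills its register state to memory, dst reloads it."""
    memory["save_area"] = bytes(src.registers)  # source core writes state out
    src.registers = None
    dst.registers = memory["save_area"]         # destination core reads it back
    return lines_to_transfer(len(dst.registers), CACHE_LINE_BYTES)


user_core, os_core = Core("user"), Core("os")
user_core.registers = bytes(REGISTER_STATE_BYTES)
lines_moved = migrate(user_core, os_core, memory={})
```

Under the 64-byte line assumption, each migration moves on the order of 32 cache lines of register state; data in memory is not copied here because, as the slide notes, it follows through coherence.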

Page 17: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Implementation (cont.)

[Figure: layers labeled Threads (software stack), Virtual CPUs, and Physical Cores; legend: User Comp, OS Comp; other labels: Baseline, User Cores, OS Cores]

Page 18: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Implementation (cont.)

[Figure: layers labeled Threads (software stack), Virtual CPUs, and Physical Cores; physical cores labeled User Cores and OS Cores]

Page 19: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

CSP Policy

Policy dictates computation assignment

Thread Assignment Policy (TAP)
• Maintains affinity between VCPUs and physical cores

Syscall Assignment Policy (SAP)
• OS computation assigned based on system calls

TAP and SAP use identical assignment for user computation
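One possible reading of the two policies is sketched below. It assumes the eight cores are split evenly into user and OS cores and that cores are picked by a simple modulo; the split, the modulo rule, and the syscall numbers are all hypothetical and are not stated in the talk.

```python
USER_CORES = [0, 1, 2, 3]  # hypothetical split of an 8-core CMP
OS_CORES = [4, 5, 6, 7]


def user_core(vcpu):
    """User computation: identical under TAP and SAP, keyed on the VCPU."""
    return USER_CORES[vcpu % len(USER_CORES)]


def tap_os_core(vcpu, syscall):
    """TAP: OS computation keeps affinity with the VCPU, whatever the system call."""
    return OS_CORES[vcpu % len(OS_CORES)]


def sap_os_core(vcpu, syscall):
    """SAP: OS computation is placed by system call, whatever the VCPU."""
    return OS_CORES[syscall % len(OS_CORES)]


# VCPU 1 issuing two different (made-up) system call numbers.
tap_choices = [tap_os_core(1, 3), tap_os_core(1, 17)]
sap_choices = [sap_os_core(1, 3), sap_os_core(2, 3)]
```

With TAP, both system calls from VCPU 1 land on the same OS core; with SAP, the same system call from two different VCPUs lands on the same OS core, so similar OS code is collocated.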

Page 20: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Talk Outline

• Motivation
• Computation Spreading (CSP)
  • Case study: OS and User computation
• Implementation
• Results
• Related Work and Summary

Page 21: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Simulation Methodology

Virtutech SIMICS MAI running Solaris 9

CMP system: 8 out-of-order processors
• 2 wide, 8 stages, 128 entry ROB, 3GHz

3 level memory hierarchy
• Private L1 and L2
• Directory-based MOSI
• L3: Shared, Exclusive 8MB (16w) (75 cycle load-to-use)
• Point to point ordered interconnect (25 cycle latency)
• Main Memory 255 cycle load to use, 40GB/s

Measure impact on predictive structures
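For intuition, the cycle counts above can be converted to wall-clock time. This small calculation assumes all latencies are expressed in cycles of the 3GHz core clock, which the slide does not state explicitly; the dictionary keys are just labels for this sketch.

```python
CLOCK_GHZ = 3.0  # core frequency from the slide

# Latencies from the slide, assumed to be in core clock cycles.
LATENCY_CYCLES = {
    "L3 load-to-use": 75,
    "interconnect": 25,
    "main memory load-to-use": 255,
}


def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


latency_ns = {name: cycles_to_ns(cycles, CLOCK_GHZ)
              for name, cycles in LATENCY_CYCLES.items()}
```

Under that assumption, an L3 hit costs about 25 ns, an interconnect hop a little over 8 ns, and a main memory access about 85 ns.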

Page 22: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

L2 Instruction Reference

Page 23: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Result Summary

Branch predictors
• 9-25% reduction in mis-predictions

L2 data references
• 0-19% reduction in load misses
• Moderate increase in store misses

Interconnect messages
• Moderate reduction (after accounting for extra messages for migration)

Page 24: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Performance Potential

Migration Overhead

Page 25: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Talk Outline

• Motivation
• Computation Spreading (CSP)
  • Case study: OS and User computation
• Implementation
• Results
• Related Work and Summary

Page 26: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Related Work

Software re-design: staged execution
• Cohort Scheduling [Larus and Parkes 01], STEPS [Ailamaki 04], SEDA [Welsh 01], LARD [Pai 98]
• CSP: similar execution in hardware

OS and User Interference [several]
• Structural separation to avoid interference
• CSP avoids interference and exploits synergy

Page 27: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Summary

Extensive code reuse in CMPs
• 45-66% instruction blocks universally accessed in server workloads

Computation Spreading
• Localize similar computation and separate dissimilar computation
• Exploits core proximity in CMPs

Case Study: OS and User computation
• Demonstrate substantial performance potential

Page 28: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Thank You!

Page 29: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Backup Slides

Page 30: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

L2 Data Reference

L2 load miss comparable, slight to moderate increase in L2 store miss

Page 31: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Multiprocessor Code Reuse

Page 32: Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly

Performance Potential