24
Spring 2005 CSE P548 1 Instruction-Level Parallelism (ILP) Fine-grained parallelism Obtained by: instruction overlap in a pipeline executing instructions in parallel (later, with multiple instruction issue) In contrast to: loop-level parallelism (medium-grained) process-level or task-level or thread-level parallelism (coarse- grained)

Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

  • Upload
    others

  • View
    7

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

81

Inst

ruct

ion

-Lev

el P

aral

lelis

m (

ILP

)

Fin

e-g

rain

ed p

aral

lelis

m

Obt

aine

d by

:

•in

stru

ctio

n ov

erla

p in

a p

ipel

ine

•ex

ecut

ing

inst

ruct

ions

in p

aral

lel (

late

r, w

ith m

ultip

le in

stru

ctio

n is

sue)

In c

ontr

ast t

o:

•lo

op

-lev

elpa

ralle

lism

(m

ediu

m-g

rain

ed)

•p

roce

ss-l

evel

or t

ask-

leve

lor

thre

ad-l

evel

para

llelis

m (

coar

se-

grai

ned)

Page 2: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

82

Inst

ruct

ion

-Lev

el P

aral

lelis

m (

ILP

)

Can

be

expl

oite

d w

hen

inst

ruct

ion

oper

ands

are

ind

epen

den

tof

eac

h ot

her,

for

exam

ple,

•tw

o in

stru

ctio

ns a

re in

depe

nden

t if t

heir

oper

ands

are

diff

eren

t

•an

exa

mpl

e of

inde

pend

ent i

nstr

uctio

ns

Eac

h th

read

(pr

ogra

m)

has

a fa

ir am

ount

of p

oten

tial I

LP

•ve

ry li

ttle

can

be e

xplo

ited

on to

day’

s co

mpu

ters

•re

sear

cher

s tr

ying

to in

crea

se it

ld R1, 0(R2)

or R7, R3, R8

Page 3: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

83

Dep

end

ence

s

dat

a d

epen

den

ce: a

rises

from

the

flow

of v

alue

s th

roug

h pr

ogra

ms

•co

nsum

er in

stru

ctio

n ge

ts a

val

ue fr

om a

pro

duce

r in

stru

ctio

n•

dete

rmin

es th

e or

der

in w

hich

inst

ruct

ions

can

be

exec

uted

nam

e d

epen

den

ce: i

nstr

uctio

ns u

se th

e sa

me

regi

ster

but

no

flow

of d

ata

betw

een

them

•an

tid

epen

den

ce

•o

utp

ut

dep

end

ence

ld R1, 32(R3)

add R3, R1, R8

ld R1, 32(R3)

add R3, R1, R8

ld R1,

16 (

R3)

Page 4: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

84

Dep

end

ence

s

con

tro

ldep

ende

nce

•ar

ises

from

the

flow

of c

ontr

ol

•in

stru

ctio

ns a

fter

a br

anch

dep

end

on th

e va

lue

of th

e br

anch

’s

cond

ition

var

iabl

e

Dep

ende

nces

inhi

bit I

LP

beqzR2, target

lwr1, 0(r3)

target:

addr1, ...

Page 5: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

85

Pip

elin

ing

Impl

emen

tatio

n te

chni

que

(but

it is

vis

ible

to th

e ar

chite

ctur

e)

•ov

erla

ps e

xecu

tion

of d

iffer

ent i

nstr

uctio

ns

•ex

ecut

e al

l ste

ps in

the

exec

utio

n cy

cle

sim

ulta

neou

sly,

but

on

diffe

rent

inst

ruct

ions

Exp

loits

ILP

by

exec

utin

g se

vera

l ins

truc

tions

“in

par

alle

l”

Goa

l is

to in

crea

se in

stru

ctio

n th

roug

hput

Page 6: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

86

Pip

elin

ing

Page 7: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

87

Pip

elin

ing

No

t th

at s

imp

le!

•pi

pelin

e h

azar

ds

(str

uctu

ral,

data

, con

trol

)

•pl

ace

a so

ft “li

mit”

on

the

num

ber

of s

tage

s

•in

crea

se in

stru

ctio

n la

tenc

y (a

littl

e)

•w

rite

& r

ead

pipe

line

regi

ster

s fo

r da

ta th

at is

com

pute

d in

a

stag

e

•tim

e fo

r cl

ock

& c

ontr

ol li

nes

to r

each

all

stag

es

•al

l sta

ges

are

the

sam

e le

ngth

whi

ch is

det

erm

ined

by

the

long

est s

tage

•st

age

leng

th d

eter

min

es c

lock

cyc

le ti

me

IBM

Str

etch

(19

61):

the

first

gen

eral

-pur

pose

pip

elin

ed c

ompu

ter

Page 8: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

88

Haz

ard

s

Str

uctu

ral h

azar

ds

Dat

a ha

zard

s

Con

trol

haz

ards

Wha

t hap

pens

on

a ha

zard

•in

stru

ctio

n th

at c

ause

d th

e ha

zard

& p

revi

ous

inst

ruct

ions

com

plet

e

•al

l sub

sequ

ent i

nstr

uctio

ns s

tall

until

the

haza

rd is

rem

oved

(in-o

rder

exe

cutio

n)

•on

ly in

stru

ctio

ns th

at d

epen

d on

that

inst

ruct

ion

stal

l(o

ut-o

f-or

der

exec

utio

n)

•ha

zard

rem

oved

•in

stru

ctio

ns c

ontin

ue e

xecu

tion

Page 9: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

89

Str

uct

ura

l Haz

ard

s

Cau

se:

inst

ruct

ions

in d

iffer

ent s

tage

s w

ant t

o us

e th

e sa

me

reso

urce

in

the

sam

e cy

cle

e.g.

, 4 F

P in

stru

ctio

ns r

eady

to e

xecu

te &

onl

y 2

FP

uni

ts

So

luti

on

s:

•m

ore

hard

war

e (e

limin

ate

the

haza

rd)

•st

all (

tole

rate

the

haza

rd)

•le

ss h

ardw

are,

low

er p

erfo

rman

ce

•on

ly fo

r bi

g ha

rdw

are

com

pone

nts

Page 10: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

810

Page 11: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

811

Dat

a H

azar

ds

Cau

se:

•an

inst

ruct

ion

early

in th

e pi

pelin

e ne

eds

the

resu

lt pr

oduc

ed b

y an

in

stru

ctio

n fa

rthe

r do

wn

the

pipe

line

befo

re it

is w

ritte

n to

a r

egis

ter

•w

ould

not

hav

e oc

curr

ed if

the

impl

emen

tatio

n w

as n

ot p

ipel

ined

Typ

es RA

W (

data

: flo

w),

WA

R (

nam

e: a

ntid

epen

denc

e), W

AW

(na

me:

ou

tput

)

HW

so

luti

on

s

•fo

rwar

ding

har

dwar

e (e

limin

ate

the

haza

rd)

•st

all v

ia p

ipel

ined

inte

rlock

s

Co

mp

iler

solu

tio

n

•co

de s

ched

ulin

g (f

or lo

ads)

Page 12: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

812

Dep

end

ence

s vs

. Haz

ard

s

Page 13: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

813

Fo

rwar

din

g

Fo

rwar

din

g(a

lso

calle

d b

ypas

sin

g):

•ou

tput

of o

ne s

tage

(th

e re

sult

in th

at s

tage

’s p

ipel

ine

regi

ster

) is

bu

sed

(byp

asse

d) to

the

inpu

t of a

pre

viou

s st

age

•w

hy fo

rwar

ding

is p

ossi

ble

•re

sults

are

com

pute

d 1

or m

ore

stag

es b

efor

e th

ey a

re w

ritte

n to

a r

egis

ter

•at

the

end

of th

e E

X s

tage

for

com

puta

tiona

l ins

truc

tions

•at

the

end

of M

EM

for

a lo

ad

•re

sults

are

use

d 1

or m

ore

stag

es a

fter

regi

ster

s ar

e re

ad

•if

you

forw

ard

a re

sult

to a

n A

LU in

put a

s so

on a

s it

has

been

co

mpu

ted,

you

can

elim

inat

e th

e ha

zard

or

redu

ce s

talli

ng

Page 14: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

814

Fo

rwar

din

g E

xam

ple

Page 15: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

815

Fo

rwar

din

g Im

ple

men

tati

on

Fo

rwar

din

g u

nit

che

cks

whe

ther

forw

arde

d va

lues

sho

uld

be u

sed:

•be

twee

n in

stru

ctio

ns in

ID a

nd E

X

•co

mpa

re th

e R

-typ

e d

esti

nat

ion

reg

iste

r n

um

ber

in E

X/M

EM

pipe

line

regi

ster

to e

ach

sou

rce

reg

iste

r n

um

ber

in ID

/EX

•be

twee

n in

stru

ctio

ns in

ID a

nd M

EM

•co

mpa

re th

e R

-typ

e d

esti

nat

ion

reg

iste

r n

um

ber

in M

EM

/WB

to e

ach

sou

rce

reg

iste

r n

um

ber

in ID

/EX

If a

mat

ch, s

et M

UX

to c

hoos

e bu

ssed

val

ues

from

EX

/ME

Mor

ME

M/W

B

Page 16: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

816

prod

ucer

prod

ucer

cons

umer

Page 17: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

817

Fo

rwar

din

g H

ard

war

e

Har

dwar

e to

impl

emen

t for

war

ding

:

•de

stin

atio

n re

gist

er n

umbe

r in

pip

elin

e re

gist

ers

(but

nee

d it

anyw

ay b

ecau

se w

e ne

ed to

kno

w w

hich

reg

iste

r to

w

rite

whe

n st

orin

g an

ALU

or

load

res

ult)

•so

urce

reg

iste

r nu

mbe

rs(p

roba

bly

only

one

, e.g

., rs

on M

IPS

R2/

3000

)is

ext

ra)

•a

com

para

tor

for

each

sou

rce-

dest

inat

ion

regi

ster

pai

r

•bu

ses

to s

hip

data

and

reg

iste

r nu

mbe

rs −

the

BIG

cost

•la

rger

ALU

MU

Xes

for

2 by

pass

val

ues

Page 18: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

818

Lo

ads

Lo

ads •

data

haz

ard

caus

ed b

y a

load

inst

ruct

ion

& a

n im

med

iate

use

of t

he

load

ed v

alue

•fo

rwar

ding

won

’t el

imin

ate

the

haza

rdw

hy?

data

not

bac

k fr

om m

emor

y un

til th

e en

d of

the

ME

M s

tage

•2

solu

tions

use

d to

geth

er

•st

allv

ia p

ipel

ined

inte

rlock

s

•sc

hed

ule

inde

pend

ent i

nstr

uctio

ns in

to th

e lo

add

elay

slo

t(a

pip

elin

e ha

zard

that

is e

xpos

ed to

the

com

pile

r) s

o th

at th

ere

will

be

no s

tall

Page 19: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

819

Lo

ads

Page 20: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

820

Imp

lem

enti

ng

Pip

elin

ed In

terl

ock

s

Det

ectin

g a

stal

l situ

atio

n

Haz

ard

det

ecti

on

un

it s

talls

the

use

afte

r a

load

•is

the

inst

ruct

ion

in E

X a

load

?

•do

es th

e de

stin

atio

n re

gist

er n

umbe

r of

the

load

= e

ither

sou

rce

regi

ster

num

ber

in th

e ne

xt in

stru

ctio

n?

•co

mpa

re th

e lo

ad w

rite

regi

ster

num

ber

in ID

/EX

to e

ach

read

re

gist

er n

umbe

r in

IF/ID

⇒ ⇒⇒⇒if

both

yes

, sta

ll th

e pi

pe 1

cyc

le

Page 21: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

821

Imp

lem

enti

ng

Pip

elin

ed In

terl

ock

s

How

sta

lling

is im

plem

ente

d:

•n

ulli

fy t

he

inst

ruct

ion

in t

he

ID s

tag

e, th

e on

e th

at u

ses

the

load

ed v

alue

•ch

ange

EX

, ME

M, W

B c

ontr

ol s

igna

ls in

ID/E

X p

ipel

ine

regi

ster

to

0

•th

e in

stru

ctio

n in

the

ID s

tage

will

hav

e no

sid

e ef

fect

sas

it

pass

es d

own

the

pipe

line

•re

star

t th

e in

stru

ctio

ns

that

wer

e st

alle

d in

ID &

IF s

tag

es•

disa

ble

writ

ing

the

PC

---

the

sam

e in

stru

ctio

n w

ill b

e fe

tche

d ag

ain

•di

sabl

e w

ritin

g th

e IF

/ID p

ipel

ine

regi

ster

---

the

load

use

in

stru

ctio

n w

ill b

e de

code

d &

its

regi

ster

s re

ad a

gain

Page 22: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

822

Lo

ads

haza

rd d

etec

tion

fetc

h ag

ain

deco

de a

gain

Page 23: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

823

Imp

lem

enti

ng

Pip

elin

ed In

terl

ock

s

Har

dwar

e to

impl

emen

t sta

lling

:

•rt

reg

iste

r nu

mbe

r in

ID/E

X p

ipel

ine

regi

ster

(but

nee

d it

anyw

ay b

ecau

se w

e ne

ed to

kno

w w

hat r

egis

ter

to w

rite

whe

n st

orin

g lo

ad d

ata)

•bo

th s

ourc

e re

gist

er n

umbe

rs in

IF/ID

pip

elin

e re

gist

er(a

lread

y th

ere)

•a

com

para

tor

for

each

sou

rce-

dest

inat

ion

regi

ster

pai

r

•bu

ses

to s

hip

regi

ster

num

bers

•w

rite

enab

le/d

isab

le fo

r P

C

•w

rite

enab

le/d

isab

le fo

r th

e IF

/ID p

ipel

ine

regi

ster

•a

MU

X to

the

ID/E

X p

ipel

ine

regi

ster

(+

0s)

Triv

ial a

mou

nt o

f har

dwar

e &

nee

ded

for

cach

e m

isse

s an

yway

Page 24: Instruction-Level Parallelism (ILP) · 2005. 5. 18. · Instruction-Level Parallelism (ILP) Can be exploited when instruction operands are Spring 2005 CSE P548 2 independent of each

Spri

ng 2

005

CSE

P54

824

Co

ntr

ol H

azar

ds

Cau

se:

cond

ition

& ta

rget

det

erm

ined

afte

r th

e ne

xt fe

tch

has

alre

ady

been

do

ne

Ear

ly H

W s

olu

tio

ns

•st

all

•as

sum

e an

out

com

e &

flus

h pi

pelin

e if

wro

ng

•m

ove

bran

ch r

esol

utio

n ha

rdw

are

forw

ard

in th

e pi

pelin

e

Co

mp

iler

solu

tio

ns

•co

de s

ched

ulin

g

•st

atic

bra

nch

pred

ictio

n

To

day

’s H

W s

olu

tio

ns

•dy

nam

ic b

ranc

h pr

edic

tion