
Page 1: Procesadores Superescalares

Prof. Mateo Valero

Procesadores Superescalares

Las Palmas de Gran Canaria

November 26, 1999

Page 2: Procesadores Superescalares

M. Valero 2

Initial developments

• Mechanical machines

• 1854: Boolean algebra by G. Boole

• 1904: Diode vacuum tube by J.A. Fleming

• 1946: ENIAC by J.P. Eckert and J. Mauchly

• 1945: Stored-program concept by J. von Neumann

• 1949: EDSAC by M. Wilkes

• 1952: UNIVAC I and IBM 701

Page 3: Procesadores Superescalares

M. Valero 3

ENIAC 1946

Page 4: Procesadores Superescalares

M. Valero 4

EDSAC 1949

Page 5: Procesadores Superescalares

M. Valero 5

Pipeline

Page 6: Procesadores Superescalares

M. Valero 6

Superscalar Processor

[Pipeline diagram: Fetch, Decode, Rename, Instruction Window, Wakeup+Select, Register File, Bypass, Data Cache.]

Fetch of multiple instructions every cycle.
Rename of registers to eliminate added dependencies.
Instructions wait for source operands and for functional units.
Out-of-order execution, but in-order graduation.

Scalable Pipes
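A minimal sketch in Python (all names hypothetical, not from the slides) of the "out-of-order execution, in-order graduation" rule: results may complete in any order, but instructions leave the reorder buffer strictly in program order.

    from collections import deque

    class RobEntry:
        def __init__(self, tag):
            self.tag = tag          # program-order id of the instruction
            self.done = False       # set when execution finishes

    rob = deque()                   # reorder buffer, oldest instruction at the left

    def dispatch(tag):
        e = RobEntry(tag)
        rob.append(e)               # allocated in program order
        return e

    def complete(entry):
        entry.done = True           # execution may finish out of order

    def graduate():
        retired = []
        while rob and rob[0].done:  # retire only from the head: in-order graduation
            retired.append(rob.popleft().tag)
        return retired

    i0, i1, i2 = dispatch(0), dispatch(1), dispatch(2)
    complete(i2); complete(i0)
    print(graduate())               # [0]    -- 1 is not done, so 2 must wait
    complete(i1)
    print(graduate())               # [1, 2]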

Page 7: Procesadores Superescalares

M. Valero 7

Technology Trends and Impact

[Chart: delay in picoseconds for 0.80, 0.35 and 0.18 micron technologies, with issue widths of 4 and 8 and ROB sizes of 32 and 64; delays range up to about 3500 ps.]

S. Palacharla et al., "Complexity Effective…", ISCA 1997, Denver.

Page 8: Procesadores Superescalares

M. Valero 8

Physical Scalability

[Chart: percentage of the die reachable in 1, 2, 4, 8 and 16 clocks versus processor generation (0.25, 0.18, 0.13, 0.1, 0.08 and 0.06 microns).]

Doug Matzke, "Will Physical Scalability…", IEEE Computer, Sept. 1997, pp. 37-39.

Page 9: Procesadores Superescalares

M. Valero 9

Register influence on ILP

• Spec95

[Chart: IPC versus register file size (48 to 256 registers) for integer and floating-point codes.]

Configuration: 8-way fetch/issue, window of 256 entries, up to 1 taken branch per cycle, g-share with 64K entries, one-cycle latency.

Page 10: Procesadores Superescalares

M. Valero 10

Register File Latency

– 66% and 20% performance improvement when moving from 2- to 1-cycle latency

[Charts: IPC with a 1-cycle vs. a 2-cycle register file. SpecFP95: applu, apsi, fpppp, hydro2d, mgrid, su2cor, swim, tomcatv, turb3d, wave5, Hmean. SpecInt95: compress, gcc, go, ijpeg, li, m88ksim, perl, vortex, Hmean.]

Page 11: Procesadores Superescalares

M. Valero 11

Outline

• Virtual-physical registers
• A register file cache
• VLIW architectures

Page 12: Procesadores Superescalares

M. Valero 12

Virtual-Physical Registers

• Motivation

– Conventional renaming scheme

– Virtual-Physical Registers

[Timeline figure (I-cache → Decode&Rename → Commit): intervals during which the register is unused vs. used, under the conventional renaming scheme and with Virtual-Physical Registers.]

Page 13: Procesadores Superescalares

M. Valero 13

Example

  load f2, 0(r4)              load p1, 0(r4)
  fdiv f2, f2, f10   rename   fdiv p2, p1, p10
  fmul f2, f2, f12     →      fmul p3, p2, p12
  fadd f2, f2, 1              fadd p4, p3, 1

Latencies: cache miss 20, fdiv 20, fmul 10, fadd 5 cycles.

– Register pressure: average registers per cycle

[Timeline: cycles 0 to 55 during which p1-p4 remain allocated, under the conventional scheme and with Virtual-Physical Registers.]

Conventional: 3.6
Virtual-Physical: 0.7
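As a reference point for the example above, a minimal sketch in Python (all names hypothetical) of conventional renaming: every destination takes a free physical register already at decode, which is why registers sit allocated for the whole latency of long operations.

    free_list = ["p1", "p2", "p3", "p4", "p5"]              # free physical registers
    map_table = {"f2": "p0", "f10": "p10", "f12": "p12"}    # logical -> physical

    def rename(op, dst, srcs):
        srcs = [map_table[s] if s in map_table else s for s in srcs]
        preg = free_list.pop(0)      # allocated at decode time, long before write-back
        map_table[dst] = preg        # later uses of dst read this mapping
        return (op, preg, srcs)

    code = [("load", "f2", ["0(r4)"]),
            ("fdiv", "f2", ["f2", "f10"]),
            ("fmul", "f2", ["f2", "f12"]),
            ("fadd", "f2", ["f2", "1"])]

    for op, dst, srcs in code:
        print(rename(op, dst, srcs))
    # ('load', 'p1', ['0(r4)']), ('fdiv', 'p2', ['p1', 'p10']),
    # ('fmul', 'p3', ['p2', 'p12']), ('fadd', 'p4', ['p3', '1'])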

Page 14: Procesadores Superescalares

M. Valero 14

Percentage of Used/Wasted Registers

[Two charts: breakdown of registers into used and wasted.]

Page 15: Procesadores Superescalares

M. Valero 15

Virtual-Physical Registers

• Physical registers play two different roles
  – Keep track of dependences (decode)
  – Provide a storage location for results (write-back)
• Proposal: three types of registers
  – Logical: architected registers
  – Virtual-Physical (VP): keep track of dependences
  – Physical: store values
• Approach
  – Decode: rename from logical to VP
  – Write-back (or issue): rename from VP to physical
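A minimal sketch in Python (all names hypothetical) of the two-step approach: decode maps the logical destination to a virtual-physical register, which tracks the dependence but consumes no storage; a physical register is bound only at write-back, when the value is actually produced.

    vp_counter = 0
    vp_map = {}          # logical -> virtual-physical (assigned at decode)
    phys_map = {}        # virtual-physical -> physical (assigned at write-back)
    free_physical = ["P0", "P1", "P2"]

    def decode(logical_dst):
        # dependence tracking only: no physical register is consumed here
        global vp_counter
        vp = f"VP{vp_counter}"
        vp_counter += 1
        vp_map[logical_dst] = vp
        return vp

    def write_back(vp, value):
        # storage is allocated only when the result arrives
        preg = free_physical.pop(0)
        phys_map[vp] = (preg, value)
        return preg

    vp = decode("f2")             # e.g. the load's destination
    # ... many cycles later, when the cache miss resolves ...
    print(write_back(vp, 3.14))   # 'P0' is occupied only from now until commit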

Page 16: Procesadores Superescalares

M. Valero 16

Virtual-Physical Registers

• Hardware support

[Block diagram of the hardware support across the Fetch, Decode, Issue, Execute, Write-back and Commit stages: instruction-queue entries hold the destination VP register and the source registers (Src1/R1, Src2/R2); the ROB holds the logical register and its VP register; a General Map Table (VP, Preg, V, Lreg) and a Physical Map Table (Preg) implement the two renaming steps.]

Page 17: Procesadores Superescalares

M. Valero 17

Virtual-Physical Registers

• No free physical register at write-back
  – Re-execute later… but what if it is the oldest instruction?
  – Avoiding deadlock
    • A number of registers (NRR) is reserved for the oldest instructions
• 21% speedup for Spec95 on an 8-way issue processor [HPCA-4]
• Conclusions
  – Optimal NRR is different for each program
  – For a given program, the best NRR may be different for different sections of code
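One plausible form of the reservation check, as a minimal Python sketch (names and the exact rule are assumptions, not taken from the slides): when few registers remain free, only the NRR oldest in-flight instructions may allocate, so the oldest instruction can always make progress and deadlock is avoided.

    NRR = 4  # number of physical registers reserved for the oldest instructions

    def may_allocate(free_regs, rob_position):
        """rob_position: 0 for the oldest in-flight instruction, 1 for the next, ..."""
        if free_regs > NRR:
            return True                      # plenty of registers: anyone may allocate
        # only the NRR oldest instructions may dip into the reserved pool
        return rob_position < NRR and free_regs > 0

    print(may_allocate(free_regs=10, rob_position=50))  # True
    print(may_allocate(free_regs=3,  rob_position=50))  # False: wait and retry later
    print(may_allocate(free_regs=3,  rob_position=1))   # True: the oldest keep progressing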

Page 18: Procesadores Superescalares

M. Valero 18

Virtual-Physical Registers

– Performance evaluation
  • SimpleScalar OoO with modified renaming
  • 8-way issue
  • RUU: 128 entries
  • Functional units (latency)
    » 8 simple integer (1)
    » 4 integer multiply (7)
    » 6 simple FP (4)
    » 4 FP multiply (4)
    » 4 FP divide (16)
    » 4 memory ports
  • L1 Dcache: 32 KB, 2-way, 32 B/line, 1 cycle
  • L1 Icache: 32 KB, 2-way, 64 B/line, 1 cycle
  • L2 cache: 1 MB, 2-way, 64 B/line, 12 cycles
  • Main memory: 50 cycles
  • Branch prediction: 18-bit gshare, up to 2 taken branches
  • Benchmarks: SPEC95, Compaq/DEC compilers, -O5

Page 19: Procesadores Superescalares

M. Valero 19

Virtual-Physical Registers

– Performance evaluation

[Chart: percentage speedup with 64 registers; per-program speedups range from about 0% to 42%.]

Speedup for 64 registers

Page 20: Procesadores Superescalares

M. Valero 20

IPC and NRR

[Chart: IPC versus NRR (1, 4, 8, 16, 24 and 36 reserved registers) for li and applu.]

Page 21: Procesadores Superescalares

M. Valero 21

Virtual-Physical Registers

• What is the optimal allocation policy?
  – Approximation
    • Registers should be allocated to the instructions that can use them earliest (avoid unused registers)
    • If some instruction has to stall because of the lack of registers, choose the latest instructions (delaying the earliest would also delay the commit of the latest)
  – Implementation
    • Each instruction allocates a physical register at write-back; if none is available, it steals a register from the latest instruction after the current one (see the sketch below)
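A minimal sketch in Python (all names hypothetical) of this allocation policy: at write-back the instruction takes a free physical register if one exists, otherwise it steals one from the latest (youngest) instruction after itself that currently holds a register.

    def allocate_at_writeback(inst, free_regs, holders):
        """
        inst:      program-order id of the instruction writing back
        free_regs: list of free physical registers
        holders:   {program-order id: physical register} for uncommitted values
        """
        if free_regs:
            return free_regs.pop()
        younger = [i for i in holders if i > inst]
        if not younger:
            return None                      # nothing to steal: must wait
        victim = max(younger)                # the latest instruction after the current one
        return holders.pop(victim)           # victim will have to re-acquire a register

    holders = {12: "P7", 20: "P3", 25: "P9"}
    print(allocate_at_writeback(inst=15, free_regs=[], holders=holders))  # 'P9' (stolen from inst 25)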

Page 22: Procesadores Superescalares

M. Valero 22

DSY Performance

[Charts: IPC for conventional, vp-original and vp-dsy renaming. SpecInt95: compress, gcc, go, li, perl, Hmean. SpecFP95: mgrid, tomcatv, applu, swim, hydro2d, Hmean.]

Page 23: Procesadores Superescalares

M. Valero 23

Performance and Number of Registers

[Charts: IPC versus number of physical registers (48 to 160) for conventional, vp-original and vp-dsy, for SpecInt95 and SpecFP95.]

Page 24: Procesadores Superescalares

M. Valero 24

Outline

• Virtual-physical registers
• A register file cache
• VLIW architectures

Page 25: Procesadores Superescalares

M. Valero 25

Register Requirements

[Charts: register requirements for SpecInt95 and SpecFP95 — percentage (0 to 100) versus number of registers (0 to 32); series: "Value & Instruction" and "Value & Ready Instruction".]

Page 26: Procesadores Superescalares

M. Valero 26

Register File Latency

– 66% and 20% performance improvement when moving from 2- to 1-cycle latency

[Charts: IPC with a 1-cycle vs. a 2-cycle register file. SpecFP95: applu, apsi, fpppp, hydro2d, mgrid, su2cor, swim, tomcatv, turb3d, wave5, Hmean. SpecInt95: compress, gcc, go, ijpeg, li, m88ksim, perl, vortex, Hmean.]

Page 27: Procesadores Superescalares

M. Valero 27

Register File Bypass

[Chart: SpecInt95 IPC for 1-cycle with 1 bypass level, 2-cycle with 2 bypass levels, and 2-cycle with 1 bypass level.]

Page 28: Procesadores Superescalares

M. Valero 28

Register File Bypass

[Chart: SpecFP95 IPC (applu, apsi, fpppp, hydro2d, mgrid, su2cor, swim, tomcatv, turb3d, wave5, Hmean) for 1-cycle with 1 bypass level, 2-cycle with 2 bypass levels, and 2-cycle with 1 bypass level.]

Page 29: Procesadores Superescalares

M. Valero 29

Register File Cache

• Organization
  – Bank 1 (Register File, RF)
    • All registers (128)
    • 2-cycle latency
  – Bank 2 (Register File Cache, RFC)
    • A subset of the registers (16)
    • 1-cycle latency

[Figure: RF and RFC banks.]
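A minimal sketch in Python (all names hypothetical) of the read path of this two-level organization: a hit in the small RFC returns the value with 1-cycle latency, otherwise the value comes from the full register file in 2 cycles.

    class RegisterFileCache:
        def __init__(self, rfc_size=16):
            self.rf = {}                 # bank 1: all 128 physical registers (2 cycles)
            self.rfc = {}                # bank 2: small cached subset (1 cycle)
            self.rfc_size = rfc_size

        def write(self, reg, value, cache_it):
            self.rf[reg] = value
            if cache_it:                 # a caching policy decides what goes in the RFC
                if len(self.rfc) >= self.rfc_size:
                    self.rfc.pop(next(iter(self.rfc)))   # evict the oldest cached value
                self.rfc[reg] = value

        def read(self, reg):
            if reg in self.rfc:
                return self.rfc[reg], 1  # (value, latency in cycles)
            return self.rf[reg], 2

    regs = RegisterFileCache()
    regs.write("P5", 42, cache_it=True)
    regs.write("P6", 7, cache_it=False)
    print(regs.read("P5"))   # (42, 1)  -- RFC hit
    print(regs.read("P6"))   # (7, 2)   -- served by the main register file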

Page 30: Procesadores Superescalares

M. Valero 30

Experimental Framework

– OoO simulator
  • 8-way issue/commit
  • Functional units (latency)
    – 2 simple integer (1)
    – 3 complex integer: mult. (2), div. (14)
    – 4 simple FP (2)
    – 2 FP div. (14)
    – 3 branch (1)
    – 4 load/store
  • 128-entry ROB
  • 16-bit gshare
  • Icache and Dcache
    – 64 KB, 2-way set-associative
    – 1/8-cycle hit/miss
    – Dcache: lock-up free, 16 outstanding misses
– Benchmarks
  • Spec95
  • DEC compiler, -O4 (int.), -O5 (FP)
  • 100 million instructions after initializations
– Access time and area models
  • Extension to the Wilton & Jouppi models

Page 31: Procesadores Superescalares

M. Valero 31

Caching Policy (1 of 3)

• First policy
  – Many values (85% integer and 84% FP) are used at most once
  – Thus, only non-bypassed values are cached
  – FIFO replacement (see the sketch below)

[Figure: RF and RFC banks.]
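A minimal sketch in Python (all names hypothetical) of the first policy's fill decision: a result that was already forwarded over the bypass network to all its consumers is not written into the RFC; everything else is, with FIFO replacement.

    from collections import OrderedDict

    RFC_SIZE = 16
    rfc = OrderedDict()   # insertion order gives FIFO replacement for free

    def on_writeback(reg, value, consumers_bypassed):
        """consumers_bypassed: True if every waiting consumer already received the
        value through the bypass network (most values are used at most once)."""
        if consumers_bypassed:
            return                      # policy 1: do not cache bypassed values
        if len(rfc) >= RFC_SIZE:
            rfc.popitem(last=False)     # FIFO: evict the oldest cached value
        rfc[reg] = value

    on_writeback("P3", 10, consumers_bypassed=True)   # not cached
    on_writeback("P4", 11, consumers_bypassed=False)  # cached
    print(list(rfc))                                  # ['P4']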

Page 32: Procesadores Superescalares

M. Valero 32

Performance

– 20% and 4% improvement over 2-cycle

– 29% and 13% degradation over 1-cycle

[Charts: IPC for the 1-cycle, RFC.1 and 2-cycle configurations; SpecInt95 (compress, gcc, go, ijpeg, li, m88ksim, perl, vortex, Hmean) and SpecFP95 (applu, apsi, fpppp, hydro2d, mgrid, su2cor, swim, tomcatv, turb3d, wave5, Hmean).]

Page 33: Procesadores Superescalares

M. Valero 33

Caching Policy (2 of 3)

• Second policy
  – Cache values that are sources of any non-issued instruction with all its operands ready
    • Not issued because of lack of functional units
    • or because the other operand is in the main register file

[Figure: RF and RFC banks.]

Page 34: Procesadores Superescalares

M. Valero 34

Performance

– 24% and 5% improvement over 2-cycle

– 25% and 12% degradation over 1-cycle

[Charts: IPC for the 1-cycle, RFC.2 and 2-cycle configurations; SpecInt95 (compress, gcc, go, ijpeg, li, m88ksim, perl, vortex, Amean, Hmean) and SpecFP95 (applu, apsi, fpppp, hydro2d, mgrid, su2cor, swim, tomcatv, turb3d, wave5, Hmean).]

Page 35: Procesadores Superescalares

M. Valero 35

Caching Policy (3 of 3)

• Third policy
  – Cache values that are sources of any non-issued instruction with all its operands ready
  – Prefetching
    • A table that, for each physical register, indicates which is the other operand of the first instruction that uses it
  – Replacement: give priority to those values already read at least once (see the sketch below)
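A minimal sketch in Python (all names hypothetical) of the prefetch part of the third policy: a small table remembers, for each physical register, the other source operand of the first instruction that reads it, so when the register is written the partner value can be brought into the RFC ahead of time.

    # partner_of[p] = the other source operand of the first instruction that uses p
    partner_of = {}

    def on_rename(dst, src_a, src_b):
        # record, for each source, the register it will be paired with (first use only)
        partner_of.setdefault(src_a, src_b)
        partner_of.setdefault(src_b, src_a)

    def on_writeback(reg, rfc, register_file):
        partner = partner_of.get(reg)
        if partner is not None and partner in register_file:
            rfc[partner] = register_file[partner]   # prefetch the partner operand

    register_file = {"P1": 5, "P2": 9}
    rfc = {}
    on_rename("P3", "P1", "P2")      # the first consumer of P1 also needs P2 (and vice versa)
    on_writeback("P1", rfc, register_file)
    print(rfc)                       # {'P2': 9}: ready in the RFC before the consumer issues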

Page 36: Procesadores Superescalares

M. Valero 36

Performance

– 27% and 7% improvement over 2-cycle

– 24% and 11% degradation over 1-cycle

[Charts: IPC for the 1-cycle, RFC.3 and 2-cycle configurations; SpecInt95 (compress, gcc, go, ijpeg, li, m88ksim, perl, vortex, Hmean) and SpecFP95 (applu, apsi, fpppp, hydro2d, mgrid, su2cor, swim, tomcatv, turb3d, wave5, Hmean).]

Page 37: Procesadores Superescalares

M. Valero 37

Speed for Different RFC Architectures

[Chart: SpecInt95 performance for configurations C1-C4 with 1-cycle, 2-cycle plus one bypass level, and non-bypass caching plus prefetch-first-pair; access time is taken into account.]

Page 38: Procesadores Superescalares

M. Valero 38

Speed for Different RFC Architectures

[Chart: SpecFP95 performance for configurations C1-C4 with 1-cycle, 2-cycle plus one bypass level, and non-bypass caching plus prefetch-first-pair.]

Page 39: Procesadores Superescalares

M. Valero 39

Conclusions

– Register file access time is critical
– Virtual-physical registers significantly reduce the register pressure
  • 24% improvement for SpecFP95
– A register file cache can reduce the average access time
  • 27% and 7% improvement for a two-level, locality-based partitioning architecture

Page 40: Procesadores Superescalares

High performance instruction fetch through a software/hardware cooperation

Alex Ramirez

Josep Ll. Larriba-Pey

Mateo Valero

UPC-Barcelona

Page 41: Procesadores Superescalares

M. Valero 41

Superscalar Processor

[Pipeline diagram: Fetch, Decode, Rename, Instruction Window, Wakeup+Select, Register File, Bypass, Data Cache.]

Fetch of multiple instructions every cycle.

Rename of registers to eliminate added dependencies.

Instructions wait for source operands and for functional units.

Out-of-order execution, but in-order graduation.

J.E. Smith and S. Vajapeyam, "Trace Processors…", IEEE Computer, Sept. 1997, pp. 68-74.

Page 42: Procesadores Superescalares

M. Valero 42

Motivation

• Instruction fetch rate is important not only in steady state
  – Program start-up
  – Misspeculation points
  – Program segments with little ILP

[Figure: Instruction Fetch & Decode feeding the Instruction Queue(s), which feed Instruction Execution; branch/jump outcomes are fed back to the fetch stage.]

Page 43: Procesadores Superescalares

M. Valero 43

Motivation

• Instruction fetch effectively limits the performance of superscalar processors
  – Even more relevant at program startup points
• More aggressive processors need higher fetch bandwidth
  – Multiple basic block fetching becomes necessary
• Current solutions need extensive additional hardware
  – Branch address cache
  – Collapsing buffer: multi-ported cache
  – Trace cache: special-purpose cache

Page 44: Procesadores Superescalares

M. Valero 44

PostgreSQL

[Chart: Postgres results for configurations 32KB, 64KB, F4, F8, F16, PBr, Pic, Bw4, Bw8, Bw16, PF- and PF4; 64KB I1, 64KB D1, 256KB L2.]

Page 45: Procesadores Superescalares

M. Valero 45

Program Behaviour

[Chart: results for Postgres, Gcc and Vortex across configurations 32KB, 64KB, F4, F8, F16, PBr, Pic, Bw4, Bw8, Bw16, PF- and PF4; 64KB I1, 64KB D1, 256KB L2.]

Page 46: Procesadores Superescalares

M. Valero 46

The Fetch Unit (1 of 3)

[Figure: scalar fetch unit. The fetch address indexes the instruction cache (i-cache) and the branch prediction mechanism; shift & mask logic and the next-address logic produce the instructions sent to decode and the next fetch address.]

• Scalar fetch unit
  – Few instructions per cycle
  – 1 branch
• Limitations
  – Prediction accuracy
  – I-cache miss rate
• Previous work, code reordering (software, to reduce cache misses)
  – Fisher (IEEE Tr. on Comp. '81)
  – Hwu and Chang (ISCA'89)
  – Pettis and Hansen (Sigplan'90)
  – Torrellas et al. (HPCA'95)
  – Kalamatianos et al. (HPCA'98)

Page 47: Procesadores Superescalares

M. Valero 47

The Fetch Unit (2 of 3)

[Figure: aggressive core fetch unit. The fetch address indexes the instruction cache (i-cache), the branch target buffer, the return stack and a multiple branch predictor; shift & mask logic and the next-address logic produce the instructions sent to decode and the next fetch address.]

• Aggressive fetch unit
  – Many instructions per cycle
  – Several branches
• Limitations
  – Prediction accuracy
  – Sequentiality
  – I-cache miss rate
• Previous work, trace building (hardware, forming traces at run time)
  – Yeh et al. (ICS'93)
  – Conte et al. (ISCA'95)
  – Rotenberg et al. (MICRO'96)
  – Friendly et al. (MICRO'97)

Page 48: Procesadores Superescalares

M. Valero 48

Trace Cache

[Figure: control flow graph with basic blocks b0-b8.]

A trace is a sequence of logically contiguous instructions.

A trace cache line stores a segment of the dynamic instruction trace across multiple, potentially taken branches (e.g. b1-b2-b4, b1-b3-b7, …).

It is indexed by the fetch address and the branch outcomes.

History-based fetch mechanism.
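A minimal sketch in Python (all names hypothetical) of the lookup: the trace cache is indexed by the fetch address together with the predicted branch outcomes, so b1 with outcomes (taken, not-taken) and b1 with (not-taken, taken) select different trace lines.

    trace_cache = {}   # (fetch address, branch outcomes) -> list of basic blocks

    def fill(start_pc, outcomes, blocks):
        trace_cache[(start_pc, outcomes)] = blocks

    def fetch(start_pc, predicted_outcomes):
        return trace_cache.get((start_pc, predicted_outcomes))  # None on a miss

    fill(0x400, (True, False), ["b1", "b2", "b4"])   # b1 -> b2 -> b4
    fill(0x400, (False, True), ["b1", "b3", "b7"])   # b1 -> b3 -> b7

    print(fetch(0x400, (True, False)))    # ['b1', 'b2', 'b4']
    print(fetch(0x400, (False, False)))   # None: build the trace from the i-cache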

Page 49: Procesadores Superescalares

M. Valero 49

The Fetch Unit (3 of 3)

[Figure: the aggressive core fetch unit (i-cache, branch target buffer, return stack, multiple branch predictor, shift & mask, next-address logic) extended with a trace cache (t-cache) and a fill buffer fed from fetch or commit. The trace cache aims at forming traces at run time.]

Page 50: Procesadores Superescalares

M. Valero 50

Our Contribution

• Mixed software-hardware approach
  – Optimize performance at compile time
    • Use profiling information
    • Make optimum use of the available hardware
  – Avoid redundant work at run time
    • Do not repeat what was done at compile time
    • Adapt the hardware to the new software
• Software Trace Cache
  – Profile-directed code reordering & mapping
• Selective Trace Storage
  – Fill unit modification

Page 51: Procesadores Superescalares

M. Valero 51

Our Work

• Workload analysis
  – Temporal locality
  – Sequentiality
• Software Trace Cache
  – Seed selection
  – Trace building
  – Trace mapping
  – Results
• Selective Trace Storage
  – Counting blue traces
  – Implementation
  – Results

[Chart: FIPA for gcc, li and postgres with Base, TC, STC and STS; 32KB instruction cache, 64KB trace cache.]

Page 52: Procesadores Superescalares

M. Valero 52

Workload Analysis (Reference Locality)

• Considerable amount of reference locality

Dynamic references:

Benchmark    75%     90%     99%    Code size
swim          148     232     763     110350
hydro2d      1223    1977    5371     125946
applu        2407    5060   10509     132803
m88ksim       458    1006    2863      51341
li            325     563    1365      38126
gcc          9595   22098   57878     349382
compress      243     338     525      21991
postgres     2716    5221   11748     374399

Page 53: Procesadores Superescalares

M. Valero 53

Workload Analysis (Sequentiality)

Benchmark    Unpredictable   Predictable
swim              45.3           54.7
mgrid             19.9           81.1
apsi              22.1           77.9
m88ksim           37.3           62.7
li                49.2           50.8
gcc               60.1           39.9
ijpeg             70.2           29.8
postgres          23.8           76.2

Unpredictable: loop branches, indirect jumps, subroutine returns, unpredictable conditional branches.
Predictable: fall-through, unconditional branches, conditional branches with fixed behaviour, subroutine calls.

Page 54: Procesadores Superescalares

M. Valero 54

Software Trace Cache

• Profile-directed code reordering
  – Obtain a weighted control flow graph
  – Select seeds or starting basic blocks
  – Build basic block traces
    • Map dynamically consecutive basic blocks to physically contiguous storage
    • Move unused basic blocks out of the execution path
  – Carefully map these traces in memory
    • Avoid conflict misses in the most popular traces
    • Minimize conflicts among the rest
• Increased role of the instruction cache
  – Able to provide longer instruction traces

Page 55: Procesadores Superescalares

M. Valero 55

STC : Seed Selection

• All procedure entry points
  – Ordered by popularity
  – Start building traces at the most popular procedures
• Knowledge-based selection
  – Based on source code knowledge
  – Leads to longer sequences
    • Inlining of the main path of found procedures
  – Loses temporal locality
    • Less popular basic blocks surround the most popular ones

Page 56: Procesadores Superescalares

M. Valero 56

STC : Trace Building

• Greedy algorithm
  – Follow the most likely path out of a basic block
  – Add secondary seeds for all other targets
• Two threshold values
  – Execution threshold
    • Do not include unpopular basic blocks
  – Transition threshold
    • Do not follow unlikely transitions
• Iterate the process with less restrictive thresholds (see the sketch after the figure)

[Figure: weighted control flow graph with basic blocks A1-A8, B1 and C1-C5, annotated with execution counts and transition probabilities. Transitions below the branch threshold are not followed, blocks below the execution threshold are not included, and the remaining targets are marked "valid, visit later" as secondary seeds.]
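A minimal sketch in Python (all names and threshold values hypothetical) of the greedy trace builder: starting from a seed, it keeps following the most likely successor, stops at the execution or transition thresholds, and queues the remaining targets as secondary seeds.

    def build_trace(seed, successors, exec_count,
                    exec_threshold=10, branch_threshold=0.6):
        """successors[b] = list of (target block, transition probability)."""
        trace, secondary_seeds, visited = [], [], set()
        block = seed
        while block is not None and block not in visited:
            if exec_count.get(block, 0) < exec_threshold:
                break                               # too unpopular to include
            trace.append(block)
            visited.add(block)
            next_block = None
            for target, prob in sorted(successors.get(block, []),
                                       key=lambda t: t[1], reverse=True):
                if next_block is None and prob >= branch_threshold:
                    next_block = target             # follow the most likely path
                else:
                    secondary_seeds.append(target)  # valid, visit later
            block = next_block
        return trace, secondary_seeds

    succ = {"A1": [("A2", 0.9), ("B1", 0.1)], "A2": [("A3", 1.0)], "A3": []}
    count = {"A1": 100, "A2": 90, "A3": 90, "B1": 10}
    print(build_trace("A1", succ, count))   # (['A1', 'A2', 'A3'], ['B1'])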

Page 57: Procesadores Superescalares

M. Valero 57

STC : Trace Mapping

[Figure: trace mapping in memory. The most popular traces are mapped into a CFA of I-cache size; a region with no code follows; the least popular traces are mapped after it.]

Page 58: Procesadores Superescalares

M. Valero 58

I-cache Miss Rate

[Figure: the core fetch unit (i-cache, xchange/shift & mask, BTB, RAS, BP, next-address logic).]

[Table: I-cache miss rate (%) for the Base layout, the P&H, Torrellas, Auto and Ops code layouts, and 2-way and victim-cache configurations, for 8KB, 32KB and 64KB I-caches with several CFA sizes. The base miss rates of 6.5 (8KB), 2.7 (32KB) and 1.4 (64KB) drop to 2.1-5.2, 0.2-0.4 and 0.02-0.14 respectively with the reordered layouts.]

Page 59: Procesadores Superescalares

M. Valero 59

Fetch Bandwidth

[Figure: the core fetch unit (i-cache, xchange/shift & mask, BTB, RAS, BP, next-address logic).]

[Table: fetch bandwidth for the Base layout and the P&H, Torrellas, Auto and Ops code layouts, without and with a 16KB trace cache, for an ideal cache and for 8KB, 32KB and 64KB I-caches with several CFA sizes. The ideal bandwidth grows from 7.6 (Base) to 8.5-10.7 with code reordering, and up to 12.2 with a 16KB trace cache plus the Ops layout; realistic 32KB and 64KB configurations reach roughly 8 to 12.]

Page 60: Procesadores Superescalares

M. Valero 60

STC : Results

32KB Instruction cache, 64KB Trace cache

[Chart: FIPC for gcc, li and postgres with Base, STC, TC and S/HTC; values range from 2.2 to 5.64.]

Page 61: Procesadores Superescalares

M. Valero 61

STC: Conclusions

• STC increases the role of the core fetch unit
  – Build traces at compile time
    • Increases code sequentiality
  – Map them carefully in memory
    • Reduces the instruction cache miss rate
• Increased core fetch unit performance
  – Trace cache-like performance with no additional hardware cost
    • Compile-time solution
  or…
  – Optimum results with a small supporting trace cache
    • Better fail-safe mechanism on a trace cache miss

Page 62: Procesadores Superescalares

M. Valero 62

Selective Trace Storage

• The STC constructed traces at compile time
  – Blue traces
    • Built at compile time
    • Traces containing only consecutive instructions
    • May be provided by the instruction cache in a single cycle
  – Red traces
    • Built at run time
    • Traces containing taken branches
    • Can be provided by the trace cache in a single cycle
• Blue traces need not be stored in the trace cache (see the sketch below)
  – Better usage of the storage space
    • Better performance at the same cost
    • Equivalent performance at lower cost
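A minimal sketch in Python (all names hypothetical) of Selective Trace Storage in the fill unit: a finished trace is written into the trace cache only if it contains at least one taken branch (a red trace); blue traces, being sequential, can be supplied by the instruction cache anyway.

    def fill_unit_commit(trace, trace_cache):
        """trace: list of (pc, branch_taken) pairs collected for a finished trace."""
        has_taken_branch = any(taken for _, taken in trace)
        if not has_taken_branch:
            return False            # blue trace: redundant, the i-cache can supply it
        start_pc = trace[0][0]
        outcomes = tuple(taken for _, taken in trace)
        trace_cache[(start_pc, outcomes)] = trace
        return True                 # red trace: stored in the trace cache

    tc = {}
    print(fill_unit_commit([(0x100, False), (0x104, False)], tc))  # False (blue, filtered out)
    print(fill_unit_commit([(0x200, True),  (0x220, False)], tc))  # True  (red, stored)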

Page 63: Procesadores Superescalares

M. Valero 63

STS: Counting Blue Traces

[Chart: percentage of traces with 0, 1, 2 and 3+ breaks.]

Reordering reduces the number of breaks. There is a high degree of redundancy, even in the original code.

Page 64: Procesadores Superescalares

M. Valero 64

STS: Implementation

[Figure: fetch unit with trace cache and fill unit. The fill unit filters out blue (redundant) traces; only red trace components are stored in the trace cache. The BTB, multiple branch predictor, return address stack, xchange/shift & mask and next-address logic provide the next fetch address and the instructions sent to decode.]

Page 65: Procesadores Superescalares

M. Valero 65

STS: FIPA - Realistic Branch Predictor

[Chart: FIPA with a realistic branch predictor for Gcc, Li and Postgres.]

Page 66: Procesadores Superescalares

M. Valero 66

STS: FIPC - Realistic BP - 64KB i-cache

[Chart: FIPC with a realistic branch predictor and a 64KB i-cache for Gcc, Li and Postgres.]

Page 67: Procesadores Superescalares

M. Valero 67

STS: FIPA - Perfect Branch Predictor

[Chart: FIPA with a perfect branch predictor for Gcc, Li and Postgres.]

Page 68: Procesadores Superescalares

M. Valero 68

STS: Conclusions

• Minor hardware modification
  – Filter out blue traces in the fill unit
    • Avoids redundant run-time work
• Better usage of the storage space
  – Higher performance at the same cost
  – Equivalent performance at much lower cost
• Benefits of STS increase when used with STC
  – The more work done at compile time, the less work left to do at run time

Page 69: Procesadores Superescalares

M. Valero 69

Conclusions

• Instruction fetch is better approached using both software and hardware techniques
  – Compile-time code reorganization
    • Increase code sequentiality
    • Minimize instruction cache misses
  – Avoid redundant run-time work
    • Do not store the same traces twice
• High fetch unit performance with little additional hardware
  – Small 2KB complementary trace cache & smart fill unit

Page 70: Procesadores Superescalares

M. Valero 70

Future Work

• Further increasing fetch performance
  – Increase i-cache performance
    • Reduce miss ratio
    • Reduce miss penalty
  – Increase the quality of the provided instructions
    • Better branch prediction accuracy
    • Faster recovery after mispredictions
• Take the path of least resistance
  – Simplicity of design
  – Software approach whenever possible

Page 71: Procesadores Superescalares

M. Valero 71

The End