
Computer Architecture

Chapter 8

Multiprocessors

Shared Memory Architectures

Prof. Jerry Breecher

CSCI 240

Fall 2003


Chapter Overview

We're going to cover only one section from this chapter: the part related to how caches from multiple processors interact with each other.

8.1 Introduction – the big picture

8.3 Centralized Shared Memory Architectures


Introduction

8.1 Introduction

8.3 Centralized Shared Memory Architectures

The Big Picture: Where are We Now?

The major issue is this:

We’ve taken copies of the contents of main memory and put them in caches closer to the processors. But what happens to those copies if someone else wants to use the main memory data?

How do we keep all copies of the data in sync with each other?


The Multiprocessor Picture

[Figure: example Pentium system organization. Processors and memory sit on a processor/memory bus, which is bridged to a PCI bus and then to the I/O busses.]


Shared Memory Multiprocessor

[Figure: four processors, each with its own registers and caches, connected through a chipset to shared memory and to disk and other I/O.]

• Memory: centralized with Uniform Memory Access time ("UMA") and bus interconnect, I/O
• Examples: Sun Enterprise 6000, SGI Challenge, Intel SystemPro


Shared Memory Multiprocessor (Conceptual Model)

• Several processors share one address space
– conceptually a shared memory
– often implemented just like a multicomputer: address space distributed over private memories
• Communication is implicit
– read and write accesses to shared memory locations
• Synchronization via shared memory locations
– spin waiting for non-zero (see the sketch below)
– barriers

[Figure: conceptual model. Processors (P), each with a memory (M), connected by a network/bus.]
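A minimal sketch of the "spin waiting for non-zero" idiom, assuming C11 atomics; the variable names and the value 42 are made up for illustration and are not from the slides:

#include <stdatomic.h>

atomic_int flag = 0;        /* shared memory location used for synchronization */
int shared_value;           /* data protected by the flag */

void producer(void)         /* runs on one processor */
{
    shared_value = 42;                  /* write the shared data ...      */
    atomic_store(&flag, 1);             /* ... then set the flag non-zero */
}

void consumer(void)         /* runs on another processor */
{
    while (atomic_load(&flag) == 0)
        ;                               /* spin waiting for non-zero */
    int v = shared_value;               /* safe to read once the flag is seen */
    (void)v;
}

Note that all communication here is just loads and stores to shared locations; no explicit messages are exchanged.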


Message Passing Multicomputers

• Computers (nodes) connected by a network
– Fast network interface: send, receive, barrier (see the sketch below)
– Nodes are no different from a regular PC or workstation
• Cluster of conventional workstations or PCs with a fast network
– cluster computing
– Berkeley NOW
– IBM SP2

[Figure: nodes, each a processor (P) with its own memory (M), connected by a network.]
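On a message-passing multicomputer the communication is explicit. The sketch below uses MPI purely as a familiar example of the send/receive/barrier style; the slides do not name MPI, so treat the specific calls as an illustrative assumption.

#include <mpi.h>
#include <stdio.h>

/* run with at least two processes, e.g. mpirun -np 2 ./a.out */
int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                  /* node 0 sends the data explicitly */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {           /* node 1 receives it over the network */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d\n", value);
    }

    MPI_Barrier(MPI_COMM_WORLD);      /* all nodes wait here before continuing */
    MPI_Finalize();
    return 0;
}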


Large-Scale MP Designs

Memory: distributed with nonuniform memory access time ("NUMA") and scalable interconnect (distributed memory).

[Figure: a distributed-memory organization built around a low-latency, high-reliability interconnect, with access latencies annotated as 1 cycle, 40 cycles, and 100 cycles.]


Shared Memory Architectures

In this section we will understand the issues around:

• Sharing one memory space among several processors.

• Maintaining coherence among several copies of a data item.

8.1 Introduction

8.3 Centralized Shared Memory Architectures


The Problem of Cache Coherency

[Figure: a CPU whose cache holds copies A' and B' of memory locations A and B, with an I/O device that also reads and writes memory, shown in three situations.]

a) Cache and memory coherent: A' = A (100), B' = B (200).

b) The CPU writes 550 into its cached copy A', but memory still holds A = 100, so an I/O output of A gives the stale value 100. Cache and memory incoherent: A' ≠ A.

c) I/O inputs 440 into B in memory, but the cache still holds B' = 200. Cache and memory incoherent: B' ≠ B.

Shared Memory Architectures
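A minimal sketch of scenario (b), assuming a write-back cache. The two plain variables stand in for memory location A and the cached copy A'; none of this code comes from the slides.

#include <stdio.h>

int memory_A = 100;    /* location A in main memory */
int cached_A = 100;    /* A', the CPU's cached copy of A */

int main(void)
{
    cached_A = 550;    /* CPU writes A; with write back, only the cache changes */

    /* an I/O device reads main memory directly and sees the stale value */
    printf("I/O output of A: %d\n", memory_A);   /* prints 100 */
    printf("CPU's view of A: %d\n", cached_A);   /* prints 550 */
    return 0;
}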


Some Simple Definitions
Shared Memory Architectures

Mechanism     | How It Works                                                  | Performance                                   | Coherency Issues
Write Back    | Write modified data from cache to memory only when necessary. | Good - doesn't tie up memory bandwidth.       | Can have problems with various copies containing different values.
Write Through | Write modified data from cache to memory immediately.         | Not so good - uses a lot of memory bandwidth. | Modified values always written to memory; data always matches.
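A toy sketch of the two policies, with made-up structure and function names (an assumed illustration, not code from the text). Under write through, memory is updated on every store; under write back, it is updated only when the dirty line is eventually evicted:

#include <string.h>

#define LINE_SIZE 64

struct cache_line {
    int dirty;                          /* set when the cached copy differs from memory */
    unsigned char data[LINE_SIZE];
};

unsigned char memory[1 << 20];          /* toy main memory */

/* store one byte through the cache; write_through selects the policy */
void cache_write(struct cache_line *line, unsigned long addr,
                 unsigned char value, int write_through)
{
    line->data[addr % LINE_SIZE] = value;    /* always update the cached copy          */
    if (write_through)
        memory[addr] = value;                /* write through: memory updated now      */
    else
        line->dirty = 1;                     /* write back: memory stays stale for now */
}

/* eviction is the "only when necessary" moment for a write-back cache */
void evict(struct cache_line *line, unsigned long line_base_addr)
{
    if (line->dirty)
        memcpy(&memory[line_base_addr], line->data, LINE_SIZE);
    line->dirty = 0;
}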


What Does Coherency Mean?

• Informally:

– “Any read must return the most recent write”

– Too strict and too difficult to implement

• Better:

– “Any write must eventually be seen by a read”

– All writes are seen in proper order (“serialization”)

• Two rules to ensure this:

– “If P writes x and P1 reads it, P’s write will be seen by P1 if the read and write are sufficiently far apart”

– Writes to a single location are serialized: seen in one order

• Latest write will be seen

• Otherwise could see writes in illogical order (could see older value after a newer value)

Shared Memory Architectures


There are Different Types of Memory In The Cache

What kinds of memory are there in the cache?

Shared Memory Architectures

Test_and_set(lock);
shared_data = xyz;
Clear(lock);

TYPE           | Shared?   | Writable? | How Kept Coherent
Code           | Shared    | No        | No need.
Private Data   | Exclusive | Yes       | Write Back
Shared Data    | Shared    | Yes       | Write Back *
Interlock Data | Shared    | Yes       | Write Through **

* Write back gives good performance here; using write through instead would degrade performance.

** Write through means the new lock value is pushed out of the cache to memory immediately, so the lock state is seen by everyone right away.
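The Test_and_set / Clear pseudocode above is typically realized as a spin loop on an atomic read-modify-write. A hedged C11 sketch, with names assumed for illustration:

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;   /* the interlock: clear means "free" */
int shared_data;

void update(int xyz)
{
    while (atomic_flag_test_and_set(&lock))
        ;                              /* spin until test-and-set finds the lock free */
    shared_data = xyz;                 /* critical section */
    atomic_flag_clear(&lock);          /* Clear(lock): release */
}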


Potential HW Coherency Solutions

• Snooping Solution (Snoopy Bus):

– Send all requests for data to all processors

– Processors snoop to see if they have a copy and respond accordingly

– Requires broadcast, since caching information is at processors

– Works well with bus (natural broadcast medium)

– Dominates for small scale machines (most of the market)

• Directory-Based Schemes

– Keep track of what is being shared in one centralized place

– Distributed memory => distributed directory for scalability(avoids bottlenecks)

– Send point-to-point requests to processors via network

– Scales better than Snooping

– Actually existed BEFORE Snooping-based schemes

Shared Memory Architectures


An Example Snoopy Protocol (Maintained by Hardware)

Invalidation protocol, write-back cache

Each block of memory is in one state:

Clean in all caches and up-to-date in memory (Shared)

OR Dirty in exactly one cache (Exclusive)

OR Not in any caches

Each cache block is in one state (track these):

Shared : block can be read

OR Exclusive : cache has the only copy, it is writeable, and it is dirty

OR Invalid : block contains no data

Read misses: cause all caches to snoop bus

Writes to a clean line are treated as misses

Shared Memory Architectures


Snoopy-Cache State Machine-I

• State machine for CPU requests, for each cache block (applies to write-back data)

Cache block states: Invalid, Shared (read only), Exclusive (read/write)

[State diagram for CPU requests:]
Invalid   -- CPU read  --> Shared    : place read miss on bus
Invalid   -- CPU write --> Exclusive : place write miss on bus
Shared    -- CPU read hit --> Shared
Shared    -- CPU read miss --> Shared : place read miss on bus
Shared    -- CPU write --> Exclusive  : place write miss on bus
Exclusive -- CPU read hit / CPU write hit --> Exclusive
Exclusive -- CPU read miss --> Shared : write back block, place read miss on bus
Exclusive -- CPU write miss --> Exclusive : write back cache block, place write miss on bus

Shared Memory Architectures
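The CPU-request half of the protocol can be written as a small transition function. This is an illustrative sketch, not code from the text; the type and function names are assumptions, and the function reports the bus transaction the cache controller must issue along with the next state.

typedef enum { INVALID, SHARED, EXCLUSIVE } State;

typedef enum {
    BUS_NONE,
    BUS_READ_MISS,                  /* "place read miss on bus"                */
    BUS_WRITE_MISS,                 /* "place write miss on bus"               */
    BUS_WRITE_BACK_READ_MISS,       /* write back dirty block, then read miss  */
    BUS_WRITE_BACK_WRITE_MISS       /* write back dirty block, then write miss */
} BusAction;

/* next state of one cache block after a CPU access (state machine I) */
State cpu_request(State s, int is_write, int hit, BusAction *bus)
{
    *bus = BUS_NONE;
    switch (s) {
    case INVALID:                                 /* always a miss */
        *bus = is_write ? BUS_WRITE_MISS : BUS_READ_MISS;
        return is_write ? EXCLUSIVE : SHARED;
    case SHARED:
        if (is_write) {                           /* write to a clean line is treated as a miss */
            *bus = BUS_WRITE_MISS;
            return EXCLUSIVE;
        }
        if (!hit)                                 /* read miss: a different address maps here */
            *bus = BUS_READ_MISS;
        return SHARED;                            /* read hit needs no bus traffic */
    case EXCLUSIVE:
        if (hit)                                  /* read hit or write hit */
            return EXCLUSIVE;
        if (is_write) {                           /* write miss: write back, then write miss */
            *bus = BUS_WRITE_BACK_WRITE_MISS;
            return EXCLUSIVE;
        }
        *bus = BUS_WRITE_BACK_READ_MISS;          /* read miss: write back, then read miss */
        return SHARED;
    }
    return s;                                     /* unreachable */
}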


Snoopy-Cache State Machine-II

• State machine for bus requests, for each cache block
• Appendix E gives details of bus requests

[State diagram for bus requests:]
Shared    -- write miss for this block --> Invalid
Exclusive -- read miss for this block  --> Shared  : write back block (abort memory access)
Exclusive -- write miss for this block --> Invalid : write back block (abort memory access)

Shared Memory Architectures
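The bus-request half can be sketched the same way. This companion function is again an assumption, reusing the State enum from the previous sketch; it describes how a cache reacts when it snoops another processor's miss for a block it currently holds.

/* reaction of one cache block to a snooped bus request (state machine II) */
State bus_request(State s, int remote_is_write, int *write_back_block)
{
    *write_back_block = 0;           /* set when this cache must supply the dirty block */
    switch (s) {
    case INVALID:
        return INVALID;              /* nothing held here; ignore the request */
    case SHARED:
        /* a remote read can be served by memory; a remote write invalidates our copy */
        return remote_is_write ? INVALID : SHARED;
    case EXCLUSIVE:
        /* the only up-to-date copy is here: write it back, aborting the memory access */
        *write_back_block = 1;
        return remote_is_write ? INVALID : SHARED;
    }
    return s;                        /* unreachable */
}

Together with cpu_request() above, this is enough to replay the five-reference example worked through on the following slides.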


Example

Trace the following sequence of references. For each step, record the block state, address, and value in P1's cache and in P2's cache, the bus activity (action, processor, address, value), and the memory contents (address, value):

P1: Write 10 to A1
P1: Read A1
P2: Read A1
P2: Write 20 to A1
P2: Write 40 to A2

Assumes the initial cache state is Invalid, and that A1 and A2 map to the same cache block, but A1 ≠ A2.

[Figure: the combined state diagram (Invalid, Shared, Exclusive) showing both the CPU-request and bus-request transitions; this is the cache for P1.]

Shared Memory Architectures


Example: Step 1

P1 writes 10 to A1. P1's block is Invalid, so this is a write miss: P1 places a write miss on the bus and the block becomes Exclusive in P1's cache.

Step               | P1 (state, addr, value) | P2 (state, addr, value) | Bus (action, proc., addr, value) | Memory (addr, value)
P1: Write 10 to A1 | Excl., A1, 10           | --                      | WrMs, P1, A1                     | --

[Figure: the highlighted transition is Invalid -> Exclusive via a write miss on the bus.]

Shared Memory Architectures


Example: Step 2

P1 reads A1. This is a read hit in P1's cache, so there is no bus activity and no state change.

Step        | P1 (state, addr, value) | P2 (state, addr, value) | Bus (action, proc., addr, value) | Memory (addr, value)
P1: Read A1 | Excl., A1, 10           | --                      | --                               | --

[Figure: the highlighted transition is the Exclusive self-loop on a CPU read hit.]

Shared Memory Architectures


Example: Step 3

P2 reads A1. P2 places a read miss on the bus; P1, which holds the block dirty, writes it back (updating memory) and drops to Shared; P2 then receives the data and also holds the block Shared.

Step        | P1 (state, addr, value) | P2 (state, addr, value) | Bus (action, proc., addr, value) | Memory (addr, value)
P2: Read A1 | --                      | Shar., A1               | RdMs, P2, A1                     | --
            | Shar., A1, 10           | --                      | WrBk, P1, A1, 10                 | A1, 10
            | --                      | Shar., A1, 10           | RdDa, P2, A1, 10                 | A1, 10

[Figure: the highlighted transitions are Invalid -> Shared via a read miss on the bus (P2's cache) and Exclusive -> Shared with a write back on the remote read (P1's cache).]

Shared Memory Architectures


Example: Step 4

P2 writes 20 to A1. P2 places a write miss on the bus; the remote write invalidates P1's copy, and P2's block becomes Exclusive with the new value. Memory still holds the old value 10.

Step               | P1 (state, addr, value) | P2 (state, addr, value) | Bus (action, proc., addr, value) | Memory (addr, value)
P2: Write 20 to A1 | Inv.                    | Excl., A1, 20           | WrMs, P2, A1                     | A1, 10

[Figure: the highlighted transition is Shared -> Invalid on the remote write (P1's cache).]

Shared Memory Architectures


Example: Step 5

P2 writes 40 to A2. A2 maps to the same cache block as A1, so P2 places a write miss on the bus for A2, writes back the dirty A1 block (memory now holds 20), and ends up holding A2 Exclusive. The complete trace:

Step               | P1 (state, addr, value) | P2 (state, addr, value) | Bus (action, proc., addr, value) | Memory (addr, value)
P1: Write 10 to A1 | Excl., A1, 10           | --                      | WrMs, P1, A1                     | --
P1: Read A1        | Excl., A1, 10           | --                      | --                               | --
P2: Read A1        | --                      | Shar., A1               | RdMs, P2, A1                     | --
                   | Shar., A1, 10           | --                      | WrBk, P1, A1, 10                 | A1, 10
                   | --                      | Shar., A1, 10           | RdDa, P2, A1, 10                 | A1, 10
P2: Write 20 to A1 | Inv.                    | Excl., A1, 20           | WrMs, P2, A1                     | A1, 10
P2: Write 40 to A2 | --                      | --                      | WrMs, P2, A2                     | A1, 10
                   | --                      | Excl., A2, 40           | WrBk, P2, A1, 20                 | A1, 20

Assumes the initial cache state is Invalid, and that A1 and A2 map to the same cache block, but A1 ≠ A2.

Shared Memory Architectures


Summary

8.1 Introduction – the big picture

8.3 Centralized Shared Memory Architectures

We’ve looked at what happens to caches when we have multiple processors or devices looking at memory.