A Flexible Interleaved Memory Design for Generalized Low Conflict Memory Access. Laurence S. Kaplan, BBN Advanced Computers Inc., Cambridge, MA. Proceedings of the Sixth Distributed Memory Computing Conference, 1991, pages 637-644. Yuan Ze University, Systems Laboratory, 楊登傑, 1999/11/24



DESCRIPTION

Abstract: High bandwidth delivery of data to the processor(s) is critical for good performance in parallel computer systems. To increase memory throughput, many systems make use of interleaved parallel memory banks. This paper proposes an implementation for an interleaved system that exhibits low contention for memory banks during virtually all patterned accesses. A variant of this design is currently in use on the BBN TC2000 parallel computer.



A Flexible Interleaved Memory Design for Generalized Low Conflict Memory Access

Laurence S. Kaplan, BBN Advanced Computers Inc.

Cambridge, MA

Proceedings of the Sixth Distributed Memory Computing Conference, 1991,

pages 637-644

Yuan Ze University, Systems Laboratory, 楊登傑, 1999/11/24


Outline

• Introduction

• Interleaved memory system design

• Definitions

• Interleaving

• Implementation

• Performance

• Conclusions


Abstract

• High bandwidth delivery of data to the processor(s) is critical for good performance in parallel computer systems.

• To increase memory throughput, many systems make use of interleaved parallel memory banks.

• This paper proposes an implementation for an interleaved system that exhibits low contention for memory banks during virtually all patterned accesses.

• A variant of this design is currently in use on the BBN TC2000 parallel computer.


Introduction

• In parallel processing architectures, memory interleaving allows concurrent requests from different processing elements to be serviced simultaneously, without contention at the memory banks of the system.

• Physical addresses in such systems are interpreted in a way that spreads the references across these memory banks.

• A simple interleaving system treats a physical address as a binary 2-tuple (target memory bank, byte offset in bank).


Introduction (cont.)

• The width of the target memory bank field, l, determines the maximum number of memory banks that can be interleaved, M = 2^l.

• The width of the byte offset field determines the size of each memory bank, b.

• A stride access in a parallel processor:

– This is defined as N processors, each attempting to fetch a distinct item simultaneously, where the items are separated by a stride S and start at base address a.
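
The 2-tuple split and the stride access defined above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and placing the bank field directly above the byte offset field is an assumption, since the slides give the tuple order but not the exact bit layout.

```python
def split_simple(addr, l, offset_bits):
    """Split a physical address into (bank, offset).

    Assumption: the bank field is the l bits directly above the byte
    offset field; the slides do not spell out the bit layout.
    """
    bank = (addr >> offset_bits) & ((1 << l) - 1)
    offset = addr & ((1 << offset_bits) - 1)
    return bank, offset


def stride_addresses(a, S, N):
    """Addresses fetched by N processors, stride S, base address a."""
    return [a + i * S for i in range(N)]


l, offset_bits = 2, 4            # M = 2^l = 4 banks of 2^4 = 16 bytes each
M = 1 << l

# Stride equal to the bank size: every processor hits a different bank.
spread = [split_simple(x, l, offset_bits)[0]
          for x in stride_addresses(a=0, S=16, N=4)]

# Small stride: every processor hits the same bank, so requests conflict.
clustered = [split_simple(x, l, offset_bits)[0]
             for x in stride_addresses(a=0, S=4, N=4)]

print(M, spread, clustered)      # prints: 4 [0, 1, 2, 3] [0, 0, 0, 0]
```

With this fixed split, whether a stride access conflicts depends entirely on how S lines up with the bank size, which is the problem the permutation described next is meant to address.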


Interleaved memory system design

• A new method permutes addresses that reference interleaved memory.

• It supports dynamic configuration of the memory banks being interleaved.


Definitions

• A clump is the basic unit of interleaving. It consists of one or more bytes of storage.

• A gallery refers to the set of page frames.

• A stripe is a sequence of memory banks starting with a specific bank.

• The clump numbers for a given stripe are used to index into this sequence.

• This interleaving approach starts by dividing up the physical address to be permuted into a binary 4-tuple (stripe, gallery, clump, byte).


Definitions (cont.)

• An interleaved page consists of the entire set of clumps for specific values of stripe and gallery.

• If there are c bits in the clump field, then there are 2^c clumps in an interleaved page.

• With this method, interleaved pages are distinguished by their stripe number.

• This number determines which element in the target memory bank sequence the first clump in the page indexes to.


Definitions (cont.)

• This interleaving method requires that the clump field of the physical address be at least as wide as the stripe field.

• The stripe field width in bits, w, is set according to the maximum number of memory banks.

• The width of the byte field determines the size of a clump.
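
As a rough illustration of the 4-tuple and the field-width constraints above, the sketch below splits an address into (stripe, gallery, clump, byte). The widths w and c follow the slides; `gallery_bits`, `byte_bits`, the function name, and the most-significant-first field order are assumptions made for the example.

```python
def split_address(addr, w, gallery_bits, c, byte_bits):
    """Split addr into (stripe, gallery, clump, byte).

    Assumption: the fields are packed most significant first, in the
    order the 4-tuple is written. gallery_bits and byte_bits are
    hypothetical names for the widths the slides leave unnamed.
    """
    if c < w:
        raise ValueError("clump field must be at least as wide as stripe field")
    byte = addr & ((1 << byte_bits) - 1)
    clump = (addr >> byte_bits) & ((1 << c) - 1)
    gallery = (addr >> (byte_bits + c)) & ((1 << gallery_bits) - 1)
    stripe = (addr >> (byte_bits + c + gallery_bits)) & ((1 << w) - 1)
    return stripe, gallery, clump, byte


w, gallery_bits, c, byte_bits = 2, 3, 4, 4
clumps_per_page = 1 << c         # 2^c clumps in an interleaved page
clump_size = 1 << byte_bits      # bytes per clump, set by the byte field

# Build an address from known field values, then split it again.
addr = (2 << 11) | (5 << 8) | (9 << 4) | 3
print(split_address(addr, w, gallery_bits, c, byte_bits))  # prints: (2, 5, 9, 3)
```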


Definitions (cont.)


Interleaving

• The interleaving method takes the physical addresses corresponding to a gallery of page frames and permutes them to yield N interleaved pages.

• The transformation involves using clump to index into the target memory bank sequence that starts with a memory bank specified by stripe.

• This is accomplished using the sum of stripe and clump to address a lookup table of size 2^(w+1) by w bits.

• This lookup produces the target memory bank number that replaces the stripe portion of the address.
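
The lookup step above might be sketched as follows. Two points are assumptions rather than facts from the slides: that table entry i holds i modulo the number of banks, and that only the low w bits of clump enter the sum (which keeps stripe + clump inside a table of 2^(w+1) entries). The function names are hypothetical.

```python
def build_lookup_table(w, num_banks):
    """Lookup table with 2^(w+1) entries, each a w-bit bank number.

    Assumption: entry i holds i mod num_banks.
    """
    if not 1 <= num_banks <= (1 << w):
        raise ValueError("num_banks must fit in the w-bit stripe field")
    return [i % num_banks for i in range(1 << (w + 1))]


def target_bank(stripe, clump, table, w):
    """Bank number that replaces the stripe portion of the address.

    Assumption: only the low w bits of clump take part in the sum.
    """
    return table[stripe + (clump & ((1 << w) - 1))]


w = 2
table = build_lookup_table(w, num_banks=4)
for stripe in range(4):
    print(stripe, [target_bank(stripe, clump, table, w) for clump in range(4)])
# prints:
# 0 [0, 1, 2, 3]
# 1 [1, 2, 3, 0]
# 2 [2, 3, 0, 1]
# 3 [3, 0, 1, 2]
```

Under these assumptions, for any fixed clump the stripes map to distinct banks, so the transformation rearranges addresses without making two of them collide.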


Interleaving (cont.)

• Table 1: The clump is the row index of the table, stripe is the column index, and the target memory bank numbers are filled into the table.

• The address in the Modulus RAM used to store the target memory bank value is calculated by adding clump to stripe.

• Table 2: The target memory bank is the column index, clump is the row index, and stripe values are filled into the table.

• Here, stripe selects different starting points in the sequence that clump then indexes into.
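
The two tables themselves did not survive in this transcript. The sketch below shows how tables with the layout described above could be generated for a hypothetical four-bank system, assuming the Modulus RAM entry at address clump + stripe holds (clump + stripe) mod M. The values are illustrative and are not copied from the original slides.

```python
M = 4  # hypothetical number of memory banks

# Table 1 layout: row = clump, column = stripe, entry = target bank.
table1 = [[(clump + stripe) % M for stripe in range(M)]
          for clump in range(M)]

# Table 2 layout: row = clump, column = target bank, entry = stripe.
table2 = [[(bank - clump) % M for bank in range(M)]
          for clump in range(M)]

for row in table1:
    print(row)
print()
for row in table2:
    print(row)
```

Table 2 is the inverse view of Table 1: looking up a bank in row clump of Table 2 returns the stripe that Table 1 maps to that bank.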


Interleaving (cont.)

• This picture shows how the pages within a gallery are distributed across the memory banks by the clumps within a page and the stripe value of the page.


Implementation

• A variant of this interleaver design has been implemented on the BBN TC2000 MIMD parallel processor.

• This computer is a distributed memory machine where each processor has local memory that is also part of the system’s globally addressable physical memory.


Implementation (cont.)

• The hardware automatically forwards memory requests to the appropriate memory bank (remote or local).

• These RAMs can be dynamically loaded by the local processor.


Performance

• A performance metric is useful to show the success of this method at interleaving regular patterned accesses.

• Measurements were taken by simulating the interleaving method and measuring the non-uniformity for a stride access for each stride S with 10,000 different starting addresses a.

• These results can be interpreted to mean that no memory banks are ever referenced more than once for any of the stride accesses simulated.
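
A toy version of such a simulation is sketched below. The slides do not define the non-uniformity metric, so the sketch substitutes a stand-in: the largest number of references any single bank receives during one stride access (1 means no bank is referenced more than once). The field widths, the modulo-based bank mapping, and all function names are assumptions, and the sketch makes no attempt to reproduce the measurements reported in the paper.

```python
import random
from collections import Counter

# Hypothetical field widths: stripe, gallery, clump, byte.
W, GALLERY_BITS, C, BYTE_BITS = 2, 3, 4, 4
NUM_BANKS = 1 << W


def fields(addr):
    """Return (stripe, clump) under the assumed field layout."""
    clump = (addr >> BYTE_BITS) & ((1 << C) - 1)
    stripe = (addr >> (BYTE_BITS + C + GALLERY_BITS)) & ((1 << W) - 1)
    return stripe, clump


def simple_bank(addr):
    """No permutation: the stripe field is used directly as the bank."""
    return fields(addr)[0]


def interleaved_bank(addr):
    """Assumed permutation: (stripe + low w bits of clump) mod banks."""
    stripe, clump = fields(addr)
    return (stripe + (clump & ((1 << W) - 1))) % NUM_BANKS


def max_refs(bank_fn, a, S, N):
    """Stand-in metric: most references to any one bank in one access."""
    counts = Counter(bank_fn(a + i * S) for i in range(N))
    return max(counts.values())


def worst_case(bank_fn, S, N, trials=10_000, seed=0):
    """Worst stand-in metric over random starting addresses a."""
    rng = random.Random(seed)
    space = 1 << (BYTE_BITS + C + GALLERY_BITS + W)
    return max(max_refs(bank_fn, rng.randrange(space), S, N)
               for _ in range(trials))


# Four processors, stride of one clump (16 bytes), base address 0.
print(max_refs(simple_bank, 0, 16, 4),
      max_refs(interleaved_bank, 0, 16, 4))   # prints: 4 1
```

Calling `worst_case(interleaved_bank, S, N)` for a range of strides gives a rough per-stride picture under these assumptions.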


Performance (cont.)

• The method proposed in this paper performs much better than the randomized case, with lower non-uniformity for all of the strides.


Future work

• More work needs to be done regarding the ordering of requests and contention within the interconnection network.

• These are topics of varying importance, depending on the type of processor architecture and the type of interconnection network that use this interleaving approach.


Conclusions

• This paper has proposed a highly effective and flexible method for reducing memory conflicts during virtually all stride accesses.

• This method is applicable to a wide range of architectures desiring low conflict parallel memory access.