A Fully Buffered Memory System Simulator
Rami Nasr, M.S. Thesis and ENEE 759H Course Project
Thursday, May 12th, 2005


FBsim 1.0 (PowerPoint presentation)

Page 1: A Fully Buffered Memory System Simulator

A Fully Buffered Memory System Simulator

Rami Nasr
M.S. Thesis and ENEE 759H Course Project
Thursday, May 12th, 2005

Page 2: A Fully Buffered Memory System Simulator

Another Simulator?

Sim-DRAM exists and supports FB-DIMM. Why write another simulator?

• Sim-DRAM still had a few unworkable bugs in its FB-DIMM model when I began my study.
• FB-DIMM is radically different from other memory architectures. New simulator => fresh start.
• FBsim is made exclusively for simulating and studying the FB-DIMM architecture. It is easier to study FB-DIMM with an exclusive simulator.
• Different scheduler, mapping algorithm, approach, style, and section of study in the FB-DIMM design space.
• FBsim is ideal for simulating 'unreasonably' high memory request rates and studying channel saturation effects.
• The two simulators can be used to validate each other's results in FB-DIMM studies.
• Writing a memory simulator was a great experience for me.

Page 3: A Fully Buffered Memory System Simulator

FBsim Overview

• All code written from scratch.
• Standalone product. Does not currently interface with CPU simulators or memory traces. Instead, it probabilistically models memory transactions according to user specifications.
  => Does not actually store memory data.
• Written in ANSI C. ~5000 lines of code. Code is organized into header files, commented, and quite easy to hack.
• Fast. For each memory channel, 1 second of run time simulates ~10 ms (or ~1 ms during channel saturation) on a 2.4 GHz Pentium 4.
• Supports Open & Closed Page Mode, Fixed & Variable Latency Mode.
• Supports output of macro and micro (frame by frame) simulation data.
• Does not model channel init, maintenance, or sync. overhead.
• Does not model memory refresh.
• Does not model power consumption or power timing limitations (tFAW etc.).
• The unmodeled features above can be incorporated readily into future versions.

Page 4: A Fully Buffered Memory System Simulator

FBsim Overview 2

[Block diagram: Input Transaction Generator, Address Mapper, Channel Schedulers 0, 1, ..., 7]

A Frame Iteration
• Try to generate transactions.
• Map any generated transactions to their channel schedulers.
• Fire each scheduler once.
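The frame iteration listed on this slide can be sketched in C as follows. This is only an illustration under stated assumptions, not FBsim's code: the `Transaction` and `ChannelScheduler` types, the queue depth, and the modulo-based `map_to_channel` placeholder are all hypothetical, and "firing" a scheduler is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CHANNELS 8   /* the slide shows channel schedulers 0 through 7 */
#define QUEUE_DEPTH  64  /* assumed per-channel queue size */

/* Hypothetical transaction record; FBsim's real fields are not shown. */
typedef struct {
    unsigned long addr;
    int is_write;
} Transaction;

typedef struct {
    Transaction queue[QUEUE_DEPTH];
    size_t count;          /* transactions waiting in this scheduler */
    unsigned long fired;   /* number of times the scheduler has run */
} ChannelScheduler;

/* Placeholder mapper: low-order interleave across channels (assumption). */
static size_t map_to_channel(const Transaction *t)
{
    return (size_t)(t->addr % NUM_CHANNELS);
}

/* One frame iteration: accept newly generated transactions, route each to
 * its channel scheduler, then fire every scheduler exactly once.
 * Returns the number of transactions dropped because a queue was full. */
static size_t frame_iteration(ChannelScheduler sched[NUM_CHANNELS],
                              const Transaction *generated, size_t n)
{
    size_t dropped = 0;
    size_t i;

    assert(sched != NULL);
    assert(n == 0 || generated != NULL);
    for (i = 0; i < n; i++) {
        ChannelScheduler *s = &sched[map_to_channel(&generated[i])];
        if (s->count < QUEUE_DEPTH)
            s->queue[s->count++] = generated[i];
        else
            dropped++;
    }
    for (i = 0; i < NUM_CHANNELS; i++)
        sched[i].fired++;   /* real scheduling work would happen here */
    return dropped;
}
```

A driver would call `frame_iteration` once per simulated frame, passing whatever the transaction generator produced for that frame (possibly nothing).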

Page 5: A Fully Buffered Memory System Simulator

Input Transaction Model

• Step Distributions
• Normal (Gaussian) Distributions
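As a rough illustration of the two distribution families named here, the sketch below draws a value from a step distribution and from an approximately normal one. Which quantity each distribution governs in FBsim is not recoverable from the slide, so the `StepDist` type, the function names, and the choice of the sum-of-12-uniforms approximation (instead of an exact Gaussian sampler) are assumptions made for this example. Both functions take their uniform draws as arguments so that the caller controls the random source.

```c
#include <assert.h>
#include <stddef.h>

/* Step distribution: value[i] is chosen with probability prob[i].
 * The probabilities are assumed to sum to 1. */
typedef struct {
    const double *value;
    const double *prob;
    size_t nsteps;
} StepDist;

/* Pick a step value from a uniform draw u in [0, 1). */
static double step_sample(const StepDist *d, double u)
{
    double cumulative = 0.0;
    size_t i;

    assert(d != NULL && d->nsteps > 0);
    for (i = 0; i < d->nsteps; i++) {
        cumulative += d->prob[i];
        if (u < cumulative)
            return d->value[i];
    }
    return d->value[d->nsteps - 1];  /* guard against rounding error */
}

/* Approximate a normal draw from 12 uniform draws in [0, 1): their sum has
 * mean 6 and variance 1, so (sum - 6) is roughly standard normal. */
static double normal_sample(double mean, double stddev, const double u[12])
{
    double sum = 0.0;
    int i;

    for (i = 0; i < 12; i++)
        sum += u[i];
    return mean + stddev * (sum - 6.0);
}
```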

Page 6: A Fully Buffered Memory System Simulator

Input Transaction Model 2

[Figure panels: Bus Trace Viewer; FBsim Model]

Page 7: A Fully Buffered Memory System Simulator

Address Mapping

• A physical address must be mapped somehow to the right channel, DIMM, rank, bank, row, and column.
• FBsim is built to support different DIMM capacities, different channel capacities, even unbalanced configurations.
  => An algorithm is needed to map an incoming transaction to a DIMM.

WHILE (a non-zero row sum exists) {
    WHILE (visit each channel with a non-zero row sum exactly once) {
        The next 'result' is the DIMM on that channel with the highest number.
        Decrement that DIMM's number by 1.
        Decrement the row sum by 1.
    }
}

Modulus = 4+2+1+2 = 9

[Figure panels: Closed Page Mode; Open Page Mode]
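The mapping pseudocode on this slide can be made concrete with a short C sketch. It is a reading of the pseudocode under several assumptions rather than FBsim's actual mapper: each channel is a row of per-DIMM weights (the "numbers"), ties go to the lowest DIMM index, and the names `Slot`, `build_sequence`, and `map_unit` are hypothetical. The sequence length is the modulus, and an interleave-unit index taken modulo that length selects the target channel and DIMM.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHANNELS 8
#define MAX_DIMMS    8
#define MAX_SLOTS    256

typedef struct { int channel; int dimm; } Slot;

/* Build the interleave sequence. weight[c][d] is the relative capacity
 * ("number") of DIMM d on channel c; weights must be non-negative and are
 * consumed (decremented to zero). Returns the sequence length, which is
 * the modulus. Ties go to the lowest DIMM index (assumption). */
static size_t build_sequence(int weight[MAX_CHANNELS][MAX_DIMMS],
                             int nchannels, int ndimms, Slot out[MAX_SLOTS])
{
    size_t n = 0;
    int progress = 1;

    assert(nchannels > 0 && nchannels <= MAX_CHANNELS);
    assert(ndimms > 0 && ndimms <= MAX_DIMMS);
    while (progress) {                       /* some row sum is non-zero */
        int c;
        progress = 0;
        for (c = 0; c < nchannels; c++) {    /* visit each channel once */
            int d, best = 0;
            for (d = 1; d < ndimms; d++)
                if (weight[c][d] > weight[c][best])
                    best = d;
            if (weight[c][best] == 0)
                continue;                    /* row sum is zero: skip */
            assert(n < MAX_SLOTS);
            out[n].channel = c;
            out[n].dimm = best;
            n++;
            weight[c][best]--;               /* also lowers the row sum */
            progress = 1;
        }
    }
    return n;
}

/* Map a (hypothetical) interleave-unit index to its channel and DIMM. */
static Slot map_unit(const Slot *seq, size_t modulus, unsigned long unit)
{
    assert(modulus > 0);
    return seq[unit % modulus];
}
```

For instance, an assumed two-channel layout with DIMM weights {4, 2} on channel 0 and {1, 2} on channel 1 has total weight 9, so `build_sequence` yields nine slots. Each DIMM appears in the sequence as many times as its weight, and the channels alternate for as long as both still have weight left.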

Page 8: A Fully Buffered Memory System Simulator

Channel Scheduler

Page 9: A Fully Buffered Memory System Simulator

FB-DIMM Frame Format Review

A SouthBound (SB) Frame could be a:
• Channel Frame (not modeled in FBsim)
• Command Frame (up to three DRAM commands, with only one command possible to each DIMM in the channel)
• Command + Wdata Frame (holds one DRAM command, plus one DDR beat of write data)

A NorthBound (NB) Frame could be a:
• Channel Frame (not modeled in FBsim)
• Read Response Frame (holds two DDR beats of returned read data)
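The southbound slot rules above lend themselves to a small validity check. The sketch below encodes only what the slide states (at most three commands in a Command Frame, no two to the same DIMM, and room for one command in a Command + Wdata Frame). The `SbFrame` layout and all names are hypothetical, and payload bits, CRC, and channel frames are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CMDS_PER_FRAME 3   /* "up to three DRAM commands" */

typedef enum { SB_COMMAND, SB_COMMAND_WDATA } SbFrameType;

typedef struct {
    SbFrameType type;
    size_t ncmds;
    int target_dimm[MAX_CMDS_PER_FRAME];  /* DIMM addressed by each command */
} SbFrame;

/* Returns 1 if the frame obeys the slot rules quoted above, else 0. */
static int sb_frame_is_valid(const SbFrame *f)
{
    size_t i, j;

    assert(f != NULL);
    if (f->type == SB_COMMAND_WDATA)
        return f->ncmds <= 1;             /* room for one command only */
    if (f->ncmds > MAX_CMDS_PER_FRAME)
        return 0;
    for (i = 0; i < f->ncmds; i++)        /* at most one command per DIMM */
        for (j = i + 1; j < f->ncmds; j++)
            if (f->target_dimm[i] == f->target_dimm[j])
                return 0;
    return 1;
}
```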

Page 10: A Fully Buffered Memory System Simulator

Some of my Results

• 1x8 achieved 7.9 GBps before saturating (82%)

• 2x4 achieved 15.6 GBps (82%)

• 4x2 achieved 31.3 GBps (82%)

• 8x1 achieved 45.2 GBps (59%!)

Case Study Conclusion

• With at least two DIMMs on each channel, performance scales very well in FB-DIMM.

• Adding more than two DIMMs per channel increases only capacity, not throughput.

• Each added DIMM adds ~5 ns of average channel latency in FLM, and slightly over half that in VLM.

• In closed page mode, only 82% of the peak theoretical throughput of a channel can be reached.
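The percentages on this slide can be cross-checked with a line of arithmetic: dividing each achieved rate by its quoted fraction gives the peak it implies. The helper below is a minimal sketch. It assumes that a label such as "2x4" means two channels with four DIMMs each, which the slide does not spell out, and the function names are hypothetical.

```c
#include <assert.h>

/* Peak bandwidth implied by an achieved rate and the fraction of peak it
 * represents (both taken from the slide). */
static double implied_peak_gbps(double achieved_gbps, double fraction_of_peak)
{
    assert(fraction_of_peak > 0.0 && fraction_of_peak <= 1.0);
    return achieved_gbps / fraction_of_peak;
}

/* Implied peak per channel, assuming "NxM" means N channels. */
static double implied_peak_per_channel(double achieved_gbps,
                                       double fraction_of_peak, int nchannels)
{
    assert(nchannels > 0);
    return implied_peak_gbps(achieved_gbps, fraction_of_peak) / nchannels;
}
```

Under that reading, all four rows imply roughly 9.5 to 9.6 GBps of peak per channel. The 8x1 row is therefore consistent with the others: it has the same per-channel peak but a lower fraction of it reached.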

Page 11: A Fully Buffered Memory System Simulator

Some of my Results 2

• In Closed Page Mode with a 2:1 read/write ratio, a reordering window of ~12 transactions achieves the best possible performance (channel saturation) for an FB-DIMM channel scheduler. Increasing the window size beyond this has no benefit.
• The more skewed the read/write ratio, the bigger the scheduling window needs to be (at 4:1, it is ~18).
• In Variable Latency Mode, a reordering window of size ~20 achieves the best possible performance.
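To make the notion of a reordering window concrete, here is a generic C sketch of window-limited selection. It is not FBsim's scheduler, which a later slide describes as based on pattern matching. The `Txn` type, the per-bank busy flags, and the "first eligible transaction wins" policy are assumptions chosen only to show why a wider window gives the scheduler more candidates.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int bank;       /* hypothetical resource the transaction needs */
    int is_write;
} Txn;

/* Scan at most `window` of the oldest queued transactions and return the
 * index of the first one whose bank is not busy, or -1 if none qualifies.
 * bank_busy[b] is non-zero while bank b cannot accept a command. */
static long pick_from_window(const Txn *queue, size_t queued, size_t window,
                             const int *bank_busy)
{
    size_t limit = queued < window ? queued : window;
    size_t i;

    assert(queue != NULL && bank_busy != NULL);
    for (i = 0; i < limit; i++)
        if (!bank_busy[queue[i].bank])
            return (long)i;
    return -1;
}
```

With a window of 1 the scheduler stalls whenever the oldest transaction is blocked. Widening the window lets it reach past blocked entries, up to the point where extra candidates no longer change the outcome.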

Page 12: A Fully Buffered Memory System Simulator

Some of my Results 3

The micro-study shows that in Closed Page Mode, the FB channel can reach at most ~93% write data utilization on the SB, and ~84% read data utilization on the NB.

The micro-study also showed that FBsim channel utilization was slightly worse for non-2:1 read/write ratios (it was 2% worse for 4:1). The FBsim scheduler can quite straightforwardly be made more adaptive to the read/write ratio of the transactions in the scheduler.

Page 13: A Fully Buffered Memory System Simulator

Future Ideas with FBsim

(me)
• I'm graduating this semester (if Dr Jacob and Mr (Dr?) Wang so please), and escaping to the corporate world.
• => Writing a guide for FBsim along with some ideas for future work. Anyone who wishes to take over development is eagerly encouraged to.
• If so, I would be happy to help get things rolling by email or in person. Feel free to access & use anything in FBsim or my thesis paper.
• I strongly believe a very interesting paper or three can quite quickly come out of this research area.

Page 14: A Fully Buffered Memory System Simulator

Future Ideas with FBsim 2

• For credibility in a paper, add an interface between FBsim and a CPU simulator or memory traces. Run real benchmarks through FBsim. Compare and contrast these results with the transaction modeling results.
• AND/OR add more functionality and provable realism to the transaction modeler. Study this.
• Best yet, integrate FBsim into the Sim-DRAM package as an added option.
• Add modeling for channel overhead, memory refresh overhead, error simulation and error handling, power consumption constraints and metrics.
• Enhance adaptivity of the FBsim scheduler to non-2:1 read/write ratios.
• Experiment with the address mapping algorithm and load balancing.
• Experiment with different types of scheduler implementations (e.g. ones not based on pattern matching). *involved*
• Study hardware constraints in FB-DIMM channel scheduling.

Page 15: A Fully Buffered Memory System Simulator

More Possible FB-DIMM Studies

• Channel utilization and configuration trade-offs for Open Page Mode
• Performance degradation of shrinking scheduler reorder window size
• Relaxation on critical DRAM device parameters (density, nBanks, timing constraints, clock frequency) allowed by the FB-DIMM architecture
• OR optimizing the FB-DIMM architecture by increasing the SB and NB channel widths (adding lines) or bitrates, and maybe modifying the frame protocol
• AMB is a logic device on a memory module!! Can add buffers, arithmetic units, processing power, etc.

Page 16: A Fully Buffered Memory System Simulator

Special Thanks to..

• Dr Jacob for introducing me to the field and guiding my progress
• David Wang for the course lectures and material