©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of ECE408/CS483 Applied Parallel Programming Lecture 7: DRAM Bandwidth 1

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012 ECE408/CS483 Applied Parallel Programming Lecture 7: DRAM

Download PPTX Report

Upload
magnus-bennett
View
214
Download
0

Tags:

Embed Size (px)

Citation preview

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

ECE408/CS483

Applied Parallel Programming

Lecture 7: DRAM Bandwidth

Objective

• To understand DRAM bandwidth– Cause of the DRAM bandwidth problem– Programming techniques that address the problem:

memory coalescing, corner turning,

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

Global Memory (DRAM) Bandwidth

Ideal Reality

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

DRAM Bank Organization

• Each core array has about 1M bits

• Each bit is stored in a tiny capacitor, made of one transistor

Memory CellCore Array

RowDecoder

Sense Amps

Column Latches

Mux

RowAddr

ColumnAddr

Off-chip Data

Wide

Narrow

Pin Interface

A very small (8x2 bit) DRAM Bank

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

deco

0 1 1

Sense amps

Mux5

Page 6: ©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012 ECE408/CS483 Applied Parallel Programming Lecture 7: DRAM

DRAM core arrays are slow.

• Reading from a cell in the core array is a very slow process– DDR: Core speed = ½ interface speed– DDR2/GDDR3: Core speed = ¼ interface speed– DDR3/GDDR4: Core speed = ⅛ interface speed– … likely to be worse in the future

deco

To sense amps

A very small capacitance that stores a data bit

About 1000 cells connected to each vertical line

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

DRAM Bursting.

• For DDR{2,3} SDRAM cores clocked at 1/N speed of the interface:

– Load (N × interface width) of DRAM bits from the same row at once to an internal buffer, then transfer in N steps at interface speed

– DDR2/GDDR3: buffer width = 4X interface width

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

DRAM Bursting

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

deco

0 1 0

Sense amps

Mux8

DRAM Bursting

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

deco

0 1 1

Sense amps and buffer

Mux9

DRAM Bursting for the 8x2 Bank

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

time

Address bits to decoder

Core Array access delay2 bitsto pin

2 bitsto pin

Non-burst timing

Burst timing

Modern DRAM systems are designed to be always accessed in burst mode. Burst bytes are transferred but discarded when accesses are not to sequential locations.

Multiple DRAM Banks

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

deco

Sense amps

Muxde

code

Sense amps

Mux

0 1 10

Bank 0 Bank 1

DRAM Bursting for the 8x2 Bank

©Wen-mei W. Hwu and David Kirk/NVIDIA, ECE408/CS483/ECE498AL, University of Illinois, 2007-2012

time

Address bits to decoder

Core Array access delay2 bitsto pin

2 bitsto pin

Single-Bank burst timing, dead time on interface

Multi-Bank burst timing, reduced dead time 12

CS483 Analysis of Algorithms Lecture 11 – NP …jmlien/teaching/09_spring_cs483/...Analysis of Algorithms CS483 Lecture 11 – NP-completeness trans – 1 CS483 Analysis of Algorithms

Documents

David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al, University of Illinois, 2007-2012 1 ECE408 / CS483 Applied Parallel Programming Lecture 27:

Documents

ln-os-ch12 Mass-Storage Structureet.engr.iupui.edu/~dskim/Classes/ECE408/ln-os-ch12 Mass-Storage... · Disk Structure nDisk drives are addressed as large 1 -dimensional arrays of

Documents

Advance Transform Techniques Course Code: ECE408...11/5/2014 1 Fourier Analysis and Synthesis Tool Nikesh Bajaj nikesh.14730@lpu.co.in Digital Signal Processing School of Electronics

Advance Transform Techniques Course Code: ECE408...11/5/2014 1 Fourier Analysis and Synthesis Tool Nikesh Bajaj [email protected] Digital Signal Processing School of Electronics

Documents

CS483 Design and Analysis of Algorithms - cs.gmu.edulifei/teaching/cs483_fall08/review_final_Fall... · CS483 Design and Analysis of Algorithms ... 4 You are allowed to haveone-page

Documents

CS483 Design and Analysis of Algorithms* - cs.gmu.edulifei/teaching/cs483_fall07/lecture01.pdf · *This lecture note is based on Introduction to The Design and Analysis of Algorithms

Documents

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2012 ECE408/CS483/498AL, University of Illinois, Urbana-Champaign 1 ECE 408/CS483 Applied Parallel Programming

Documents

Introductionet.engr.iupui.edu/~dskim/Classes/ECE408/ln-os-ch01 Introduction.pdf · 2. Operating system – controls and coordinates the use of the hardware among the various application

Documents

FPGA Laboratory Assignment 1 Due Date: 2/10/2018 · A VHDL module should now appear in project navigator, as shown below. ... report ECE408 -Digital Design with FPGAs 9 . Department

Documents

CS 483 - Data Structures and Algorithm Analysis - Lecture ...pwiegand/cs483-Spring06/lecturenotes/cs483-l5… · CS 483 - Data Structures and Algorithm Analysis Lecture V: Chapter

Documents

Divide and Conquer - Home | George Mason …kosecka/cs483-001/05divide-and-conquer.pdf · Divide-and-conquer: n log n. Divide et impera. Veni, vidi, vici. ... Supply chain management

Documents

New ECE408 Exam #1 Study Guidelumetta.web.engr.illinois.edu/408-S19/ref/ECE408-S19... · 2019. 3. 27. ·

Documents

Intro to Qt - Missouri S&T - Missouri University of ...web.mst.edu/~tauritzd/courses/cs483/JasonTrent_Qt4Intro.pdfWhy Qt? Multi-platform –No “virtual machines” or emulation layers

Documents

CS483 Design and Analysis of Algorithmslifei/teaching/cs483_fall08/lecture... · 2008-09-03 · CS483 Design and Analysis of Algorithms Lectures 2-3 Algorithms with Numbers Instructor:

Documents

$CS483 Design and Analysis of Algorithmslifei/teaching/cs483_fall08/lecture...from \One Two Three . . . In nity: Facts and Speculations of Science" by George Gamow, Dover, 1988 I Decimal$

CS483 Design and Analysis of Algorithmslifei/teaching/cs483_fall08/lecture...from \One Two Three . . . In nity: Facts and Speculations of Science" by George Gamow, Dover, 1988 I Decimal

Documents

© David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al, University of Illinois, 2007-2012 1 GPU Programming Lecture 7 Atomic Operations and Histogramming

Documents

athena.union.eduathena.union.edu/~hemmendd/Courses/cs483/dis-memo.pdfin a 3—address format) R R R R might well be preferable to R R R R if the adder and multiplier were independent

Documents

1 ECE 8823A GPU Architectures Module 4: Memory Model and Locality © David Kirk/NVIDIA and Wen-mei Hwu, 2007-2012 ECE408/CS483/ECE498al, University of Illinois,

Documents

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2010 ECE408, University of Illinois, Urbana Champaign 1 Programming Massively Parallel Processors CUDA Memories

Documents

CS 483 - Data Structures and Algorithm Analysis - Lecture ...pwiegand/cs483-Spring06/lecturenotes/cs483-l7… · CS 483 - Data Structures and Algorithm Analysis Lecture VII: Chapter

Documents

Chapter 7 - George Mason Universitykosecka/cs483-001/07maxflow-applications.pdf · Chapter 7 Network Flow ... 7 Theorem. Max cardinality matching in G = value of max flow in G'.

Documents

CS 483 Introduction to Data Mining 1 DATA MINING Introductory Dr. Mohammed Alhaddad Collage of Information Technology King AbdulAziz University CS483

Documents

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009 ECE498AL, University of Illinois, Urbana-Champaign 1 CUDA Threads

Documents

CS483-04 Non-recursive and Recursive Algorithm …lifei/teaching/cs483_fall07/lecture...CS483-04 Non-recursive and Recursive Algorithm Analysis Instructor: Fei Li Room 443 ST II Ofﬁce

Documents

CS 483 - Data Structures and Algorithm Analysis - …pwiegand/cs483-Spring06/lecturenotes/cs483-l1...Outline Introduction Algorithms & Problems Fundamentals Problem Types Data Structures

Documents

That One Special Shotrobotics.cs.tamu.edu/dshell/cs483/samples/Sample_Final_Report.pdf · Current software which attempts to create these databases are inadequate, for none ... working

Documents

1 2D Convolution, Constant Memory and Constant Caching © David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al University of Illinois, 2007-2011

Documents

CS483/683 Multi-Agent Systems

Documents

$CS483 Design and Analysis of Algorithmslifei/teaching/cs483_fall08/Chapter8_DPV.pdfCS483 Design and Analysis of Algorithms ... Figures unclaimed are from books \Algorithms" and \Introduction$