16
4/2/11 1 Demand Code Paging for NAND Flash in MMU-less Embedded Systems Jose Baiocchi and Bruce Childers University of Pittsburgh Pittsburgh PA USA [email protected] 2 Bruce Childers University of Pittsburgh Memory Shadowing Range of embedded systems commonly have both main memory and storage CPU L1 D L1 I DRAM NAND Flash ASIC I/O Flash storage : stores program binary image Main memory : holds both code and data

Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

1

Demand Code Paging for NAND Flash in MMU-less Embedded Systems

Jose Baiocchi and Bruce Childers

University of Pittsburgh Pittsburgh PA USA

[email protected]

2 Bruce Childers University of Pittsburgh

Memory Shadowing •  Range of embedded systems commonly have both

main memory and storage

CPU

L1 D L1 I

DRAM

NAND Flash

ASIC I/O

•  Flash storage: stores program binary image •  Main memory: holds both code and data

Page 2: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

2

3 Bruce Childers University of Pittsburgh

Memory Shadowing

CPU

L1 D L1 I

ASIC I/O 1

•  Range of embedded systems commonly have both main memory and storage

•  Flash storage: stores program binary image •  Main memory: holds both code and data

copy binary image

Flash holds code and data pages

4 Bruce Childers University of Pittsburgh

Memory Shadowing

CPU

L1 D L1 I

ASIC I/O

Execute from memory Binary image copied to memory for execution

•  Range of embedded systems commonly have both main memory and storage

•  Flash storage: stores program binary image •  Main memory: holds both code and data

Page 3: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

3

5 Bruce Childers University of Pittsburgh

Memory Shadowing •  Demand paging: Bring pages in as needed with OS

and hardware (MMU) support

•  Low cost embedded devices •  May lack support to detect missing code

through paging mechanism

Reduce copy and boot-up latencies when full shadowing cannot be amortized

•  Full shadowing: Copy entire binary image •  Copy latency needs to be amortized •  Boot-up delay to copy binary image

6 Bruce Childers University of Pittsburgh

Our Approach •  Dynamic binary translation

•  Virtualization, security, resource management, among many other uses

•  Translation steps 1.  Fetch instruction from memory 2.  Decode 3.  Possibly modify 4.  Save instruction in software-managed buffer 5.  Execute instructions from buffer

•  “Translation” may be from same ISA to same ISA

Page 4: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

4

7 Bruce Childers University of Pittsburgh

Dynamic Binary Translation

Context Capture

Context Switch

Next PC Translate Decode Fetch

New Fragment

Finished?

Dynamic Translator

Cached? New PC

Flash (organized as pages)

DRAM DRAM

copied

program started, translator entered

Binary image is shadowed to DRAM

Translator entered on program start - fetch

Context switch to execute “fragment”

Fetch, translate, execute more code

Execution captured in fragment cache

Fragment Cache Binary Image

8 Bruce Childers University of Pittsburgh

Dynamic Binary Translation

Context Capture

Context Switch

Next PC Translate Decode Fetch

New Fragment

Finished?

Dynamic Translator

Cached? New PC

Flash (organized as pages)

DRAM DRAM

Fragment Cache Binary Image

Some Flash pages aren’t actually needed

Page 5: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

5

9 Bruce Childers University of Pittsburgh

Dynamic Binary Translation

Context Capture

Context Switch

Next PC Translate Decode Fetch

New Fragment

Finished?

Dynamic Translator

Cached? New PC

Flash (organized as pages)

DRAM DRAM

Fragment Cache Binary Image

Our approach: DBT changed to load pages when needed

10 Bruce Childers University of Pittsburgh

Demand Page Loading •  Simple idea: Load Flash code pages on demand

•  DBT translates along execution path

•  Change Fetch to perform Flash page load •  Load page for requested instruction •  Return requested instruction •  Buffer loaded page to avoid loading again

•  Challenge: How to manage buffers for translated code and loaded pages

•  Scattered page buffer •  Unified code buffer

Page 6: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

6

11 Bruce Childers University of Pittsburgh

Scattered Page Buffer •  Two buffers: Fragment cache + Page buffer

•  Managed by DBT system as part of Fetch step •  Buffer size set to number pages in binary image •  Unique buffer page per Flash page

Fetch steps 1.  Check whether page for requested instruction

address is already loaded 2.  Load missing page to pre-determined location 3.  Fetch instruction from loaded page

Essentially, full shadowing with pages loaded on-demand

12 Bruce Childers University of Pittsburgh

Scattered Page Buffer

Fully shadowing without DBT On-demand paging with DBT using scattered page buffer

Page 7: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

7

13 Bruce Childers University of Pittsburgh

Scattered Page Buffer 1.  Minimizes number Flash page reads 2.  Spreads Flash reads out over time 3.  Potentially sparsely populated page buffer

•  Advantage: Simple one-to-one mapping •  Flash page at fixed location – either there or not •  Low overhead: Quick lookup and no additional

data structures •  If DBT is already used, little additional overhead

•  Disadvantage: Increases memory overhead •  Footprint: Size of SPB + FC + DBT data structures

14 Bruce Childers University of Pittsburgh

Unified Code Buffer •  Combine fragment cache + page buffer

•  Managed as single buffer •  Multiple caching units: fragments and pages •  Fixed constraint on total size

•  Management more complex •  Single buffer with regions for fragments

(translated code) and pages (untranslated code) •  Overflow between regions •  Which region gets priority

Page 8: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

8

15 Bruce Childers University of Pittsburgh

Unified Code Buffer

16 Bruce Childers University of Pittsburgh

Unified Code Buffer •  Page locality, eviction policy (LRU/FIFO), UCB

capacity determine how well scheme works

•  Advantage: Constrain total DBT footprint •  UCB + DBT structures ≤ Full shadow size •  75% of full shadow size works well •  DBT data structures are 1 word per 3 instructions

•  Disadvantage: Performance overhead may be worse •  May need to reload previously seen pages •  Manage data structures, e.g., LRU information

Page 9: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

9

17 Bruce Childers University of Pittsburgh

Experimental Methodology •  When is DBT-based demand code paging needed •  Does UCB perform as well as SPB while mitigating

memory footprint of DBT system

•  Strata DBT for SimpleScalar/PISA •  Simulated SoC with 400 MHz ARM-like processor •  NAND Flash card Kingston 1GB CF, 0.6MB/sec •  MiBench with large input data sets (show selected)

•  FS (baseline): Fully shadowed binary •  SPB: Scattered page buffer •  UCB-75%: FIFO; size set to 75% of FS

18 Bruce Childers University of Pittsburgh

NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU

fft 92 80 124 120

ghostscript 2047 971 971 971

lame 470 391 534 529

jpeg.dec 277 168 187 183

pgp.enc 524 290 292 291

susan.cor 149 88 91 89

Absolute number of page reads with full shadowing (FS), scattered page buffer (SPB) and unified code buffer (UCB) with FIFO and LRU and sized to 75% of binary image.

Page 10: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

10

19 Bruce Childers University of Pittsburgh

NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU

fft 92 80 124 120

ghostscript 2047 971 971 971

lame 470 391 534 529

jpeg.dec 277 168 187 183

pgp.enc 524 290 292 291

susan.cor 149 88 91 89

Use full shadowing: small reduction, or page reads are well amortized

20 Bruce Childers University of Pittsburgh

NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU

fft 92 80 124 120

ghostscript 2047 971 971 971

lame 470 391 534 529

jpeg.dec 277 168 187 183

pgp.enc 524 290 292 291

susan.cor 149 88 91 89

Use demand paging: large reduction and/or page reads are not amortized

Page 11: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

11

21 Bruce Childers University of Pittsburgh

NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU

fft 92 80 124 120

ghostscript 2047 971 971 971

lame 470 391 534 529

jpeg.dec 277 168 187 183

pgp.enc 524 290 292 291

susan.cor 149 88 91 89

FIFO is nearly as good, yet is much simpler with less management overhead cost

Remaining results use FIFO.

22 Bruce Childers University of Pittsburgh

Improvement in Boot Time

1 2 3 4 5 6 7

Avg-All

fft

ghostscript

lame

jpeg.dec

pgp.enc

susan.cor

Factor Improvement in Boot Time

UCB-75% SPB

Measured as delay to executing first application instruction

Page 12: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

12

23 Bruce Childers University of Pittsburgh

Improvement in Boot Time

1 2 3 4 5 6 7

Avg-All

fft

ghostscript

lame

jpeg.dec

pgp.enc

susan.cor

Factor Improvement in Boot Time

UCB-75% SPB

Measured as delay to executing first application instruction

Use demand paging

24 Bruce Childers University of Pittsburgh

Improvement in Boot Time

1 2 3 4 5 6 7

Avg-All

fft

ghostscript

lame

jpeg.dec

pgp.enc

susan.cor

Factor Improvement in Boot Time

UCB-75% SPB

Measured as delay to executing first application instruction

Use full shadowing

Page 13: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

13

25 Bruce Childers University of Pittsburgh

Improvement in Boot Time

1 2 3 4 5 6 7

Avg-All

fft

ghostscript

lame

jpeg.dec

pgp.enc

susan.cor

Factor Improvement in Boot Time

UCB-75% SPB

Measured as delay to executing first application instruction

Significantly improved boot time even when

shadowing is preferred

26 Bruce Childers University of Pittsburgh

Improvement in Performance

0.9 1 1.1 1.2 1.3 1.4

Avg-All

Avg-Demand

Avg-Shadow

fft

ghostscript

lame

jpeg.dec

pgp.enc

susan.cor

Performance Speedup

UCB-75% SPB

Page 14: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

14

27 Bruce Childers University of Pittsburgh

Improvement in Performance

0.9 1 1.1 1.2 1.3 1.4

Avg-All

Avg-Demand

Avg-Shadow

fft

ghostscript

lame

jpeg.dec

pgp.enc

susan.cor

Performance Speedup

UCB-75% SPB

Use demand paging

28 Bruce Childers University of Pittsburgh

Improvement in Performance

0.9 1 1.1 1.2 1.3 1.4

Avg-All

Avg-Demand

Avg-Shadow

fft

ghostscript

lame

jpeg.dec

pgp.enc

susan.cor

Performance Speedup

UCB-75% SPB

Use full shadowing

Page 15: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

15

29 Bruce Childers University of Pittsburgh

Improvement in Performance

0.9 1 1.1 1.2 1.3 1.4

Avg-All

Avg-Demand

Avg-Shadow

fft

ghostscript

lame

jpeg.dec

pgp.enc

susan.cor

Performance Speedup

UCB-75% SPB

~5% loss – use full shadowing

~11% (UCB) ~14% (SPB) gain use DBT demand paging

30 Bruce Childers University of Pittsburgh

Improvement in Performance

0.9 1 1.1 1.2 1.3 1.4

Avg-All

Avg-Demand

Avg-Shadow

fft

ghostscript

lame

jpeg.dec

pgp.enc

susan.cor

Performance Speedup

UCB-75% SPB

UCB-75% nearly as good yet memory size about same as full binary shadowing

Page 16: Memory Shadowing - University of Pittsburghchilders/papers/DATE11.pdf4/2/11 11 Bruce Childers 21 University of Pittsburgh NAND Page Reads Program FS SPB UCB-75-FIFO UCB-75-LRU fft

4/2/11

16

31 Bruce Childers University of Pittsburgh

Conclusion •  Dynamic binary translation can serve as basis for

on-demand code paging

•  Challenge is how to manage combine memory resources to effectively hold pages and translated instructions

•  UCB most effective: Constrains footprint, yet does well when full shadowing shouldn’t be used

•  Boot time and performance (UCB-75%) •  Boot time 4.75x (average) faster •  Performance 1.11 (average) speedup