The importance of memory in the next generation of real-time systems Paolo Burgio [email protected]

The importance of memory in the next generation of real ...mclab.di.uniroma1.it/iwes2017/documents/burgio2/slides.pdf · The importance of memory in the next generation of real-time


The importance of memory in the next generation of real-time systems

Paolo Burgio

[email protected]


©2017 University of Modena and Reggio Emilia

The four horsemen

1. Heavy workloads

– Sensor-fusion and image-processing

2. Reduced power consumption

– Smaller batteries and renewable power sources

3. Quickly interact with the environment

– Prompt elaboration of sensor data

4. Run highest criticality workloads

– Replacing safety-critical human activities

Future embedded systems

Artificial intelligence

Industry 4.0

Internet-of-Things

Autonomous driving

Health and medicine

Cyber-physical systems

IWES @Rome, September 8, 2017 2


Multi- and many-core platforms are the solution for 1-2(-3)

✓ Climbing "the power wall"

✓ High performance at low power (few Watts)

Real-time system: produces its result in a guaranteed/bounded amount of time

✓ By construction

✓ Application fields: automotive, avionics, industry, medical…

The keyword: predictability

✓ Provide the correct result… when expected

✓ The system must be simple to analyze

Real-Time multi-core systems?


Single-core, multiple tasks/applications

1. Analyze the system (HW/SW)

2. Derive a (mathematical?) model

3. Do some magic mathematics…

…guaranteed timing bounds!

Optimal sharing of the core between tasks

✓ …and guaranteed by construction

✓ Scheduling (also, mapping)

Real-Time systems – traditional approach
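The slide does not say which analysis it has in mind. As one classical example of this "analyze, model, compute a bound" flow, the sketch below assumes fixed-priority preemptive scheduling of periodic tasks on one core, with deadlines equal to periods, and computes worst-case response times by fixed-point iteration. The function name and the example task set are hypothetical.

```python
def response_time(tasks, i, max_iter=10_000):
    """Worst-case response time of tasks[i], or None if it misses its deadline.

    tasks: list of (wcet, period) tuples, sorted from highest to lowest priority.
    Assumption: deadline == period, integer time units, single core.
    """
    wcet_i, period_i = tasks[i]
    r = wcet_i
    for _ in range(max_iter):
        # Interference from every higher-priority task released in [0, r).
        interference = sum(-(-r // period_j) * wcet_j
                           for wcet_j, period_j in tasks[:i])
        r_next = wcet_i + interference
        if r_next > period_i:
            return None          # bound exceeds the deadline: not schedulable
        if r_next == r:
            return r             # fixed point reached: guaranteed bound
        r = r_next
    return None


# Hypothetical task set: (WCET, period), highest priority first.
taskset = [(1, 4), (2, 6), (3, 12)]
bounds = [response_time(taskset, i) for i in range(len(taskset))]
print(bounds)
```

Each returned value is a bound that holds by construction under the stated model, which is the kind of guarantee this slide refers to.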

[Figure: an application (task set) of tasks T running on a single CPU with L1 $ and level-2 $, plus an IF block, main memory or L3 cache, and off-chip memory (DRAM).]


Architectural bottlenecks

✓ Shared memory banks

✓ Caches ($)

✓ I/Os

Multi-core systems

[Figure: four CPUs (0 to 3), each running tasks T with its own L1 $, sharing a level-2 $, an IF block, main memory or L3 cache, and off-chip memory.]


Beyond traditional techniques

1. More parameters

– Shared resources (e.g., memory, SSDs, I/Os, caches…)

– The complexity of the analysis grows exponentially with the number of cores

2. Memory accesses: instead of thin lines, big bars

– Memory is the most frequently accessed resource in the system

– Traditional techniques are too conservative (bounds too high)

It's (mainly) a memory issue!

[Figure: timeline of tasks T with their memory accesses (MEM), shown for 8, 4, 2 and 1 cores.]


✓ Thousands of cores arranged in clusters

✓ Host-accelerator architecture (e.g., GP-GPUs)

✓ …even worse!

Many-core systems

[Figure: host-accelerator architecture. General-purpose host: CPUs, each with L1 $, MMU and L2 $, on a coherent interconnect with main memory. Many-core accelerator: clusters of CPUs with MMU/$, L1 MEM and DMA, an interconnect, an L2 MEM and a network interface. Labels: "100s cores", "1000s cores".]


Two motivating examples

✓ Both from real systems

1. Many-core accelerator-based platforms

– Quad-/Octa-core as host

– Integrated GPU (iGPU) or FPGA

– Powerful enough to run neural networks

2. Reference industrial system

– Multi-core ARM

– Multi-OS (embedded Linux + Win for UI)

– Hypervisor-based

Knowledge of the platform is power


Qualitatively analyze and characterize the conflicts due to parallel accesses to main memory by both CPU cores and iGPU

1. NVIDIA Tegra K1 w/Kepler GPU

2. NVIDIA Tegra X1 w/Maxwell GPU

3. NVIDIA Tegra X2 (Parker) w/Pascal GPU – automotive-grade

4. Intel i7-6700 w/Intel GPU

5. Xilinx Zynq Ultrascale multi-core + FPGA (+GPU)

Testbed #1: "automotive" platforms

Roberto Cavicchioli, Nicola Capodieci and Marko Bertogna, "Memory Interference Characterization between CPU cores and integrated GPUs in Mixed-Criticality Platforms", 22nd IEEE International Conference on Emerging Technologies and Factory Automation


✓ Shared memory between CPU/GPU complex

– "Unified Virtual Memory"

– Unlike traditional "discrete" GPU systems

Notable contention points

NVIDIA Tegra K2


Test 'A' - Tegra X2 – A57


✓ Last-generation FPGA-based heterogeneous SoC

– FPGA = (re-)programmability

✓ ARM A53 Quad-core as host "PS"

✓ FPGA as accelerator "PL"

Notable contention points

Xilinx Zynq Ultrascale


Test 'A' - Xilinx Zynq


Test 'B' - Tegra X2


Test 'B' - Xilinx Ultrascale


Test 'C' - Tegra X2 – A57


Test 'C' - Xilinx Ultrascale


✓ Interfere with prefetching mechanism

✓ Interfering cores read at increasing strided addresses

Prefetching


✓ NXP iMX6 from Egicon

– Components for F1 teams, industrial telescopic arms

– Credits to Francesco Bellei

Testbed #2: industrial platform


✓ More "traditional"

iMX6 mem hierarchy

[Figure: Cores 1 to 4, each with its own L1 cache, a single L2 cache, and memory (RAM); four links are labeled "fast" and one is labeled "slow".]


Memory latency - sequential (ns)

[Figure: "Sequential access". Latency in nanoseconds (0 to 300) versus working set in KB (1 to 16384). Series: no interference, interference from 1 core, interference from 2 cores, interference from 3 cores.]

Max BW = Cache Line Size / Lat = 32 byte / 62.8 ns ≈ 0.5 GB/s
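The bandwidth figure on this slide follows from dividing the cache line size by the measured latency. The short sketch below reproduces that arithmetic with the slide's numbers (32-byte line, 62.8 ns). It assumes that only one cache-line fetch is in flight at a time, which is an assumption of this example and not something the slide states; the function name is hypothetical.

```python
def max_bandwidth_gb_per_s(cache_line_bytes, latency_ns):
    """Upper bound on bandwidth when each miss fetches one cache line.

    Bytes per nanosecond is numerically equal to GB/s (1e9 bytes per second).
    Assumption: one outstanding line fetch at a time.
    """
    return cache_line_bytes / latency_ns


bw = max_bandwidth_gb_per_s(cache_line_bytes=32, latency_ns=62.8)
print(f"{bw:.2f} GB/s")  # prints "0.51 GB/s", i.e. roughly the 0.5 GB/s on the slide
```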


Memory interference impact

[Figure: "Random vs Sequential". Latency in nanoseconds (0 to 600) versus working set in KB (1 to 16384). Series: sequential without interference, sequential with interference, random without interference, random with interference. Annotations: "L1 $ region", "L2 $ region", "Mem", "3 random interference", "3 sequential interference".]


What do we do with this knowledge?


✓ A set of techniques to turn the view of the system that software has..

Single-core equivalence

From this…

[Figure: CPU 0 and CPU 1 connected through an interconnect to a shared $ and a shared RAM.]

…into this

[Figure: CPU 0 and CPU 1 connected through an interconnect, each with its own $ and its own RAM.]

– Cache coloring/partitioning

– Time Division Multiple Access

– Multi-port mem w/bank partitioning
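Of the techniques named on this slide, cache coloring is easy to illustrate: physical pages that map to disjoint groups of cache sets get different "colors", and giving each core its own colors keeps the cores from evicting each other's lines. The sketch below is a simplified model, assuming a physically indexed set-associative cache with no index hashing; the cache geometry and all function names are hypothetical and not taken from the slides.

```python
def num_colors(cache_bytes, ways, page_bytes=4096):
    """Number of page colors: size of one cache way divided by the page size."""
    return cache_bytes // (ways * page_bytes)


def page_color(phys_addr, cache_bytes, ways, page_bytes=4096):
    """Color of the physical page containing phys_addr."""
    frame = phys_addr // page_bytes
    return frame % num_colors(cache_bytes, ways, page_bytes)


def frames_for_core(core, n_cores, frames, cache_bytes, ways, page_bytes=4096):
    """Frames a core may use when colors are dealt out round-robin to cores."""
    return [f for f in frames
            if page_color(f * page_bytes, cache_bytes, ways, page_bytes) % n_cores == core]


# Hypothetical geometry: 1 MiB, 16-way cache with 4 KiB pages gives 16 colors.
CACHE, WAYS = 1 << 20, 16
print(num_colors(CACHE, WAYS))
print(frames_for_core(0, 2, range(8), CACHE, WAYS))
print(frames_for_core(1, 2, range(8), CACHE, WAYS))
```

An allocator that hands each core only frames from its own list gives that core a private slice of the shared cache, which is one step towards the "single-core equivalent" view.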


✓ Group memory accesses at the beginning of every software task

✓ Co-schedule memory accesses and tasks-to-cores

✓ Greatly reduces the complexity of the scheduling problem

…and increases performance

Up to 4x predictable performance on a many-core platform

PREM - PRedictable Execution Models
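To make the idea concrete, here is a toy model, not the scheduler from the cited work. It assumes each task consists of one memory phase followed by one compute phase, that each task runs on its own core, and that memory phases are granted one at a time in list order so they never overlap. Task names and durations are hypothetical.

```python
def prem_schedule(tasks):
    """Serialize memory phases; compute phases then run in parallel on their cores.

    tasks: list of (name, mem_time, compute_time), in memory-grant order.
    Returns a list of (name, mem_start, mem_end, compute_end).
    """
    schedule = []
    memory_free_at = 0
    for name, mem_time, compute_time in tasks:
        mem_start = memory_free_at            # wait until memory is free
        mem_end = mem_start + mem_time        # exclusive memory phase (M)
        compute_end = mem_end + compute_time  # compute phase (C), no memory traffic
        schedule.append((name, mem_start, mem_end, compute_end))
        memory_free_at = mem_end
    return schedule


# Hypothetical tasks: (name, memory phase length, compute phase length).
for entry in prem_schedule([("T0", 2, 5), ("T1", 3, 4), ("T2", 1, 6)]):
    print(entry)
```

Because no two memory phases overlap in this model, the duration of each phase does not depend on what the other cores are doing, which is what makes the timing analyzable.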

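[Figure: timelines of tasks T and their memory accesses (MEM). Non-PREM: memory accesses interleaved with execution. With PREM: tasks split into memory (M) and compute (C) phases, with a memory scheduler.]

2015 paper @ RTEST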


Thank you!

Paolo Burgio

[email protected]

http://hipert.unimore.it


Backup


1. One observed core reads sequentially within a variable-sized working set, while the other cores interfere sequentially

2. One observed core reads randomly within a variable-sized working set, while the other cores interfere sequentially

3. One observed core reads sequentially within a variable-sized working set, while the other cores interfere randomly

4. One observed core reads randomly within a variable-sized working set, while the other cores interfere randomly

Test case A – intra-CPU interference
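The sequential and random patterns in these four cases can be expressed as a chain of "next cache line" indices that a core follows through its working set. The sketch below only builds and walks such chains; it is not the benchmark used in the slides, and Python itself is not suitable for measuring nanosecond latencies. The 32-byte line size default and all names are assumptions of this example.

```python
import random


def build_chain(working_set_kb, pattern, line_bytes=32, seed=0):
    """Return chain[i] = index of the cache line to read after line i.

    The chain is a single cycle covering every line of the working set,
    visited either in address order ("sequential") or shuffled ("random").
    """
    n_lines = working_set_kb * 1024 // line_bytes
    order = list(range(n_lines))
    if pattern == "random":
        random.Random(seed).shuffle(order)
    elif pattern != "sequential":
        raise ValueError("pattern must be 'sequential' or 'random'")
    chain = [0] * n_lines
    for pos, line in enumerate(order):
        chain[line] = order[(pos + 1) % n_lines]
    return chain


def walk(chain, steps, start=0):
    """Follow the chain for a number of steps and return the lines visited."""
    visited, line = [], start
    for _ in range(steps):
        visited.append(line)
        line = chain[line]
    return visited


seq = build_chain(1, "sequential")   # 1 KB working set: 32 lines of 32 bytes
rnd = build_chain(1, "random")
print(walk(seq, 5))                  # [0, 1, 2, 3, 4]
print(len(set(walk(rnd, len(rnd)))) == len(rnd))  # True: every line visited once
```

In a native implementation the observed core and the interfering cores would each follow such a chain, chosen according to the four combinations listed above.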


✓ Shared memory between CPU/GPU complex

– "Unified Virtual Memory"

Notable contention points

NVIDIA Tegra family

TK1 TX1/2


Intel i7-6700 Skylake

✓ x86_64 powerful host + iGPU

– Sharing L3$, External DRAM…

Notable contention points


Tegra X1


Tegra X2 - Denver


1. One CPU core reads sequentially within a variable working set, while the GPU accesses memory according to different paradigms:

– CUDA memcpy

– CUDA memcpy on UVM

– CUDA memcpy on pinned mem

– CUDA memset (0)

2. Same, but CPU core reads randomly

Test case B – iGPU interference on CPU


Tegra X1


1. CPU generates sequential interfering memory accesses, while the GPU accesses memory according to different paradigms:

– CUDA memcpy

– CUDA memcpy on UVM

– CUDA memcpy on pinned mem

– CUDA memset (0)

2. Same, but CPU core interference is random

Test case C – CPU interference on iGPU


Tegra X1


Tegra X2 - Denver


Test 'C' - Intel i7-6700
