OS Support for Teraflux: A Prototype
Avi Mendelson, Doron Shamia



System and Execution Models: Data Flow Based

Jan 17-18 2011, Rome, Italy


• The system is made up of clusters.
  – Each cluster contains 16 cores (may change).
  – Each cluster is controlled by a single “OS kernel”, e.g., Linux or L4.

• Execution is made up of tasks. Each task:
  – has no side effects;
  – is scheduled together with its data (may use pointers);
  – may return results;
  – if it fails to complete, can be rescheduled on the same core or on another core.

• Tasks can be executed on any (service) cluster and have a unified view of system memory.

• All resource allocation and management is done at two levels: a local one and a global one.
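The task model above is what makes recovery cheap: a task with no side effects can simply be run again after a failure. The following C sketch illustrates that property under stated assumptions; `task_t`, `run_with_retry`, and `sum_task` are hypothetical names (not a Teraflux API), the retry order is an assumption, and the simulated fault on core 0 exists only to exercise the retry path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CORES_PER_CLUSTER 16          /* from the slide; "may change" */

/* Hypothetical task descriptor: the task is scheduled together with its
 * input data and writes only its own result field (no side effects). */
typedef struct {
    const int64_t *input;             /* data shipped with the task */
    size_t         count;             /* number of input elements */
    int64_t        result;            /* optional return value */
} task_t;

/* A task body returns false if it did not complete on the given core. */
typedef bool (*task_fn)(task_t *t, int core);

/* A failed attempt leaves nothing to undo, so the task is simply run
 * again: first on the core it was scheduled on, then on each other core
 * of the cluster in turn.  Returns the core that completed the task,
 * or -1 if every core failed. */
static int run_with_retry(task_fn fn, task_t *t, int first_core, int ncores)
{
    for (int attempt = 0; attempt < ncores; attempt++) {
        int core = (first_core + attempt) % ncores;
        if (fn(t, core))
            return core;
    }
    return -1;
}

/* Example task: sums its input.  Core 0 simulates a fault. */
static bool sum_task(task_t *t, int core)
{
    if (core == 0)
        return false;                 /* simulated failure; nothing written */
    int64_t s = 0;
    for (size_t i = 0; i < t->count; i++)
        s += t->input[i];
    t->result = s;
    return true;
}
```

With input `{1, 2, 3, 4}` and `first_core = 0`, the attempt on core 0 fails, the task completes on core 1, and `result` is 10.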


System Overview: Target Prototyped System


[Figure: cores view (16 CPUs; Linux, L4) and memory view (Linux, L4, configuration page, message buffers). Note: CPU == cluster.]


Target System OS Requirements


[Figure: 16 CPUs; Linux, L4]

Linux (Full OS)

• Manages jobs on the uKernel (uK) cores
• Proxies the uKs’ I/O requests
• Remotely debugs the uKs and itself
• Runs high-level (system) fault tolerance (FT), managing uK faults and its own

L4 (uKernel)

• Each uK runs a job
• Jobs are sent by the full OS (FOS)
• Jobs have no side effects
• Failed jobs are simply restarted
• Runs low-level FT, reporting to the FOS

[Figure caption: single chip, multiple cores]
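A minimal sketch of the two fault-tolerance levels on this slide, assuming a fixed local restart budget (`MAX_LOCAL_RESTARTS`, a hypothetical value): the uK restarts a failed job itself and reports each failure to the FOS, and the FOS moves the job to another uK once the local budget is used up. All names are illustrative, and the "report" is a plain function call standing in for a message to the FOS.

```c
#include <stdbool.h>

#define MAX_LOCAL_RESTARTS 3          /* hypothetical local restart budget */

/* A job returns false if it failed on the given uK. */
typedef bool (*job_fn)(int job_id, int uk_id);

/* Hypothetical FOS-side record of the failures the uKs report. */
typedef struct {
    int failures_reported;
    int last_failed_uk;
} fos_state_t;

static void fos_report_failure(fos_state_t *fos, int job_id, int uk_id)
{
    (void)job_id;                     /* a real report would identify the job */
    fos->failures_reported++;
    fos->last_failed_uk = uk_id;
}

/* Low-level FT on a uK: jobs have no side effects, so a failed job is
 * simply restarted, and every failure is reported to the FOS.
 * Returns false once the local budget is used up. */
static bool uk_run_job(fos_state_t *fos, job_fn job, int job_id, int uk_id)
{
    for (int run = 0; run <= MAX_LOCAL_RESTARTS; run++) {
        if (job(job_id, uk_id))
            return true;
        fos_report_failure(fos, job_id, uk_id);
    }
    return false;
}

/* High-level FT on the FOS: when a uK gives up on a job, send the job to
 * the next uK.  Returns the uK that completed it, or -1. */
static int fos_dispatch(fos_state_t *fos, job_fn job, int job_id, int nuks)
{
    for (int uk = 0; uk < nuks; uk++)
        if (uk_run_job(fos, job, job_id, uk))
            return uk;
    return -1;
}

/* Example job that always fails on uK 0 (e.g., a faulty core). */
static bool flaky_job(int job_id, int uk_id)
{
    (void)job_id;
    return uk_id != 0;
}
```

Keeping restarts local means a transient fault never reaches the FOS as a scheduling decision; only a uK that keeps failing causes the job to be reassigned.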


Communications (1)


[Figure: Linux and L4; configuration page; message buffers (four shown)]

• Ownership (L4/Linux)
• Ready flag
• Type
• Length (bytes)
• Data
• Fixups (optional)
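One way to lay out a message buffer with these fields in C, offered as a sketch: the field widths, `MSG_DATA_SIZE`, `MSG_MAX_FIXUPS`, and the convention that a fixup is the byte offset of a pointer slot inside `data` are all assumptions. Because Linux and L4 may map the shared memory at different virtual addresses, the sketch has the sender store each pointer as an offset from the start of `data`, and the hypothetical `msg_apply_fixups` rewrites every listed slot into an absolute pointer on the receiving side.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum msg_owner { OWNER_LINUX = 0, OWNER_L4 = 1 };

#define MSG_DATA_SIZE  4096            /* assumed payload capacity */
#define MSG_MAX_FIXUPS 16              /* assumed fixup-table capacity */

/* Hypothetical layout of one message buffer in the shared region. */
typedef struct {
    uint32_t owner;                    /* OWNER_LINUX or OWNER_L4 */
    uint32_t ready;                    /* nonzero: hand over to the other side */
    uint32_t type;                     /* message type */
    uint32_t length;                   /* payload length in bytes */
    uint32_t nfixups;                  /* valid entries in fixups[] */
    uint32_t fixups[MSG_MAX_FIXUPS];   /* offsets of pointer slots in data[] */
    unsigned char data[MSG_DATA_SIZE]; /* raw payload, interpreted per type */
} msg_buffer_t;

/* Receiver side: each pointer slot listed in fixups[] holds an offset from
 * the start of data[]; rewrite it as an absolute pointer valid in this
 * address space.  Returns -1 if a slot or its target is outside the payload. */
static int msg_apply_fixups(msg_buffer_t *m)
{
    if (m->nfixups > MSG_MAX_FIXUPS || m->length > MSG_DATA_SIZE)
        return -1;
    for (uint32_t i = 0; i < m->nfixups; i++) {
        uint32_t off = m->fixups[i];
        if (off > m->length || m->length - off < sizeof(uintptr_t))
            return -1;                 /* slot does not fit in the payload */
        uintptr_t rel;
        memcpy(&rel, m->data + off, sizeof rel);   /* slot may be unaligned */
        if (rel >= m->length)
            return -1;                 /* target outside the payload */
        uintptr_t abs = (uintptr_t)(m->data + rel);
        memcpy(m->data + off, &abs, sizeof abs);
    }
    return 0;
}
```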


Communications (2)



• Ownership: which side currently uses the buffer
• Ready: signals that the buffer is ready to be transferred to the other side (the inverse owner)
• Type: the message type
• Data: the raw data (interpreted according to the type)
• Fixups: a list of fixups, used when passing pointers
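The ownership/ready handshake can be sketched with C11 atomics. As a design choice of this sketch (not something the slide specifies), ownership and the Ready flag are packed into one atomic word, so handing a buffer to the inverse owner is a single atomic transition and neither side can observe a half-updated pair. The sketch also assumes that lock-free C11 atomics behave correctly on memory shared between the two kernels; `handoff_buf_t`, `buf_send`, and `buf_receive` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum side { SIDE_LINUX = 0, SIDE_L4 = 1 };

/* Ownership (bit 0) and Ready (bit 1) packed into one word, so a handoff
 * is a single atomic transition. */
#define STATE(owner, ready) ((unsigned)(owner) | ((unsigned)(ready) << 1))

typedef struct {
    atomic_uint   state;               /* STATE(owner, ready) */
    uint32_t      type;                /* message type */
    uint32_t      length;              /* payload length in bytes */
    unsigned char data[256];           /* assumed payload capacity */
} handoff_buf_t;

static enum side peer_of(enum side s)
{
    return s == SIDE_LINUX ? SIDE_L4 : SIDE_LINUX;
}

/* True if `me` owns the buffer and no handoff is pending. */
static bool buf_owned_by(handoff_buf_t *b, enum side me)
{
    return atomic_load_explicit(&b->state, memory_order_acquire) == STATE(me, 0);
}

/* The owner fills the buffer, then raises Ready.  The release store makes
 * the payload visible before the other side can observe Ready. */
static bool buf_send(handoff_buf_t *b, enum side me, uint32_t type,
                     const void *payload, uint32_t len)
{
    if (!buf_owned_by(b, me) || len > sizeof b->data)
        return false;
    b->type = type;
    b->length = len;
    memcpy(b->data, payload, len);
    atomic_store_explicit(&b->state, STATE(me, 1), memory_order_release);
    return true;
}

/* The inverse owner claims a ready buffer: (peer, ready) -> (me, not ready). */
static bool buf_receive(handoff_buf_t *b, enum side me)
{
    unsigned expected = STATE(peer_of(me), 1);
    return atomic_compare_exchange_strong_explicit(
        &b->state, &expected, STATE(me, 0),
        memory_order_acq_rel, memory_order_acquire);
}
```

Once it has received a buffer, the new owner can reply by calling `buf_send` on the same buffer, which hands it back the other way.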


Current Prototype

• Goal: quick development of OS support and applications (later moving to the full COTson prototype)

• Quick prototyping via VMs
• Linux on both ends (Fedora 13)
  – Main node = Linux (host)
  – Service nodes = Linux (VMs)

• Shared memory is used
  – between the host and the VMs
  – between VMs

• Shared memory uses a kernel driver (ivshmem)



Prototype Architecture


[Figure: Linux F13 host (user space and kernel space, with IVSHMEM) running four QEMU guests, each running Linux F13; App]


Inter-VM Shared Memory (IVSHMEM) Architecture


• QEMU maps the shared memory into RAM
• The shared memory is exposed as a PCI BAR
• The BAR is mmap-ed to user level
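From user space, the last step is an ordinary `mmap` of a file that represents the shared region. The helper below is a hedged sketch: on the host the file could be the POSIX shared-memory object given to QEMU (under `/dev/shm`), and inside a guest it could be the ivshmem BAR as exposed by Linux sysfs, for example `/sys/bus/pci/devices/0000:00:04.0/resource2`, where the PCI address is a placeholder and BAR 2 is assumed to be the shared-memory BAR. `map_shared_region` is a hypothetical name; the prototype itself went through the ivshmem kernel driver, whose interface is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole shared-memory file read/write and return its address, or
 * NULL on error.  The file's size is taken as the size of the region. */
static void *map_shared_region(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                         /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = len;
    return p;
}
```

`MAP_SHARED` is the essential flag: stores through the returned pointer go to the shared object itself, so every process (or VM) that maps the same region sees them.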


Communications


[Figure: Linux F13 host (user space and kernel space) and four Linux F13 QEMU guests exchanging messages (Msg) through shared RAM; App. Software stack: Data Flow App on top of a Message queue API.]
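The slide does not show the message queue API itself. A common way to build one on top of shared RAM is a single-producer, single-consumer ring of fixed-size slots, and the sketch below assumes that design, with hypothetical names (`msgq_t`, `msgq_send`, `msgq_recv`) and sizes. The ring stores indices rather than pointers, so it stays valid when each VM maps the region at a different address.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSGQ_SLOTS    64              /* must be a power of two */
#define MSGQ_MSG_SIZE 64              /* fixed message size in bytes (assumed) */

/* One-producer/one-consumer ring placed entirely inside the shared RAM. */
typedef struct {
    atomic_uint head;                 /* next slot to write (producer only) */
    atomic_uint tail;                 /* next slot to read (consumer only) */
    unsigned char slots[MSGQ_SLOTS][MSGQ_MSG_SIZE];
} msgq_t;

static void msgq_init(msgq_t *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Copy up to MSGQ_MSG_SIZE bytes into the next free slot.  Returns false
 * if the message is too large or the queue is full. */
static bool msgq_send(msgq_t *q, const void *msg, size_t len)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (len > MSGQ_MSG_SIZE || head - tail == MSGQ_SLOTS)
        return false;
    memcpy(q->slots[head % MSGQ_SLOTS], msg, len);
    /* release: the slot contents are visible before the new head */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Copy the oldest slot (MSGQ_MSG_SIZE bytes) into out.  Returns false if
 * the queue is empty. */
static bool msgq_recv(msgq_t *q, void *out)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail)
        return false;
    memcpy(out, q->slots[tail % MSGQ_SLOTS], MSGQ_MSG_SIZE);
    /* release: the slot is fully read before the producer may reuse it */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

Using one queue per direction (host to VM, VM to host) keeps each ring single-producer, single-consumer, which is what lets it work without locks. Because the slot count is a power of two, the unsigned `head - tail` arithmetic stays correct when the counters wrap.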


Demo (Toy) Apps

• Distributed sum app
  – Single work dispatcher (host)
  – Multiple sum engines (VMs)

• Distributed Mandelbrot
  – Single work dispatcher that hands out lines (host)
  – Multiple compute engines that compute the pixels of each line (VMs)
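The dispatcher/engine split of the distributed sum demo can be illustrated with a small single-process sketch. Here the engines are ordinary function calls and jobs are assigned round-robin; in the prototype each job would instead travel as a message to a VM. `sum_job_t`, `sum_engine`, and `dispatch_sum` are hypothetical names, and the chunking scheme is an assumption. The Mandelbrot demo follows the same pattern, with one image line as the work item.

```c
#include <stdint.h>

/* Hypothetical work item: an inclusive range of integers to add. */
typedef struct {
    int64_t first, last;
} sum_job_t;

/* Sum engine: depends only on its job, so a job could be resent to
 * another engine if one failed. */
static int64_t sum_engine(sum_job_t job)
{
    int64_t s = 0;
    for (int64_t v = job.first; v <= job.last; v++)
        s += v;
    return s;
}

/* Dispatcher: split [1, n] into jobs of at most `chunk` numbers (chunk > 0),
 * assign them round-robin to `nengines` engines, record each engine's share
 * in partial[], and return the total. */
static int64_t dispatch_sum(int64_t n, int64_t chunk, int nengines,
                            int64_t partial[])
{
    for (int i = 0; i < nengines; i++)
        partial[i] = 0;
    int64_t total = 0;
    int engine = 0;
    for (int64_t first = 1; first <= n; first += chunk) {
        int64_t last = first + chunk - 1 < n ? first + chunk - 1 : n;
        sum_job_t job = { first, last };
        int64_t r = sum_engine(job);   /* in the prototype: a message to a VM */
        partial[engine] += r;
        total += r;
        engine = (engine + 1) % nengines;
    }
    return total;
}
```

With `n = 10`, `chunk = 3`, and two engines, the jobs are [1,3], [4,6], [7,9], and [10,10]; engine 0 contributes 6 + 24 = 30 and engine 1 contributes 15 + 10 = 25, for a total of 55.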


Future Work


• Single boot
  – A TeraFlux chip boots an FOS
  – The FOS boots the uKs on the other cores
  – Looks like a single boot process

• Distributed fault tolerance
  – Allow the uKs and the FOS to test each other’s health
  – One step beyond FOS-centric FT

• Core repurposing
  – If the FOS cores fail, uK cores reboot as the FOS
  – The new FOS takes over using the last valid data snapshot


References

• Inter-VM Shared memory
