TERAFLUX
OS Support for TeraFlux: A Prototype
Avi Mendelson
Doron Shamia
Jan 17-18 2011, Rome, Italy
TERAFLUX System and Execution Models: Data Flow Based
• The system is made up of clusters.
  – Each cluster contains 16 cores (may change).
  – Each cluster is controlled by a single “OS kernel”, e.g., Linux or L4.
• Execution is made up of tasks; each task (see the sketch after this list)
  – has no side effects,
  – is scheduled together with its data (which may use pointers),
  – may return results,
  – if it fails to complete, can be rescheduled on the same core or on another core.
• Tasks can be executed on any (service) cluster and have a unified view of system memory.
• All resource allocation/management is done at two levels: a local one and a global one.
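A minimal sketch of how such a side-effect-free task could be represented and retried on failure. All names here (tf_task, tf_run_task) are hypothetical illustrations, not the actual TERAFLUX interfaces:

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical task descriptor: a task carries its code and its input data,
 * has no side effects, and may produce a result. */
typedef struct tf_task {
    int (*entry)(const void *input, size_t input_len, void *result, size_t result_len);
    const void *input;          /* data is shipped with the task (may contain pointers) */
    size_t      input_len;
    void       *result;         /* optional result buffer */
    size_t      result_len;
} tf_task;

/* Run a task; because tasks have no side effects, a failed task can simply
 * be rescheduled on the same core or another one (here: retried in place). */
static bool tf_run_task(tf_task *t, int max_retries)
{
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        if (t->entry(t->input, t->input_len, t->result, t->result_len) == 0)
            return true;        /* completed successfully */
    }
    return false;               /* report the failure to the global level */
}
```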
TERAFLUX System Overview: Target Prototyped System
[Diagram: cores view and memory view of the prototyped system. Each of the 16 CPU boxes is a cluster (CPU == cluster); one cluster runs Linux, the others run L4. The memory view shows the Linux and L4 regions plus a shared configuration page and message buffers.]
TERAFLUX Target System: OS Requirements
[Diagram: a single chip with many cores; one core runs Linux (the full OS), the remaining cores run the L4 uKernel]

Linux (Full OS)
• Manages jobs on the uKernel (uK) cores
• Proxies the uKs' I/O requests
• Remote-debugs the uKs and itself
• Runs high-level (system) FT, managing uK/self faults

L4 (uKernel)
• Each uK runs a job
• Jobs are sent by the full OS (FOS)
• Jobs have no side effects
• Failed jobs are simply restarted
• Runs low-level FT, reporting to the FOS
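A rough sketch of this division of labour. The helper functions (uk_wait_for_job, uk_report_status, fos_send_job) are assumptions standing in for the prototype's real messaging interfaces:

```c
#include <stdbool.h>

/* Hypothetical job handle exchanged between the full OS (FOS) and a uKernel (uK). */
struct job { int id; int (*run)(void); };

/* Messaging helpers assumed to exist in the prototype's transport layer. */
extern struct job uk_wait_for_job(void);                  /* uK: block until the FOS sends a job */
extern void       uk_report_status(int id, bool ok);      /* uK: low-level FT report to the FOS */
extern void       fos_send_job(struct job *j, int core);  /* FOS: dispatch a job to a uK core */

/* uKernel side: run whatever the FOS sends; jobs have no side effects,
 * so a failure is simply reported and the FOS restarts the job. */
void uk_main_loop(void)
{
    for (;;) {
        struct job j = uk_wait_for_job();
        uk_report_status(j.id, j.run() == 0);
    }
}

/* FOS side: dispatch jobs round-robin to the uK cores; a job reported as
 * failed is resent, possibly to a different core (high-level FT). */
void fos_dispatch(struct job *jobs, int n, int n_uk_cores)
{
    for (int i = 0; i < n; i++)
        fos_send_job(&jobs[i], i % n_uk_cores);
}
```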
TERAFLUX Communications (1)
[Diagram: Linux and L4 exchange data through a shared configuration page and a set of message buffers]

Each message buffer carries:
• Ownership (L4/Linux)
• Ready flag
• Type
• Length (bytes)
• Data
• Fixups (optional)
TERAFLUX Communications (2)
• Ownership: who currently uses the buffer
• Ready flag: signals that the buffer is ready to be transferred to the other side (the inverse owner)
• Type: the message type
• Length: the payload length in bytes
• Data: simply the raw data (interpreted according to the type)
• Fixups: a list of fixups in case we pass pointers
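This buffer layout might map onto a C structure roughly like the following. Field widths, the fixup representation, and all identifiers are assumptions for illustration, not the prototype's actual definitions:

```c
#include <stdint.h>

/* Who currently owns the buffer (only the owner may write it). */
enum buf_owner { OWNER_LINUX = 0, OWNER_L4 = 1 };

/* A fixup records where a pointer lives inside the data so the
 * receiver can translate it into its own address space. */
struct msg_fixup {
    uint32_t offset;        /* offset of the pointer within data[] */
};

struct msg_buffer {
    uint32_t owner;         /* OWNER_LINUX or OWNER_L4 */
    uint32_t ready;         /* set when the buffer may be handed to the other side */
    uint32_t type;          /* message type; tells the receiver how to parse data[] */
    uint32_t length;        /* payload length in bytes */
    uint32_t n_fixups;      /* number of entries in fixups[] (0 if no pointers are passed) */
    struct msg_fixup fixups[8];
    uint8_t  data[];        /* raw payload, interpreted according to type */
};
```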
TERAFLUX Current Prototype
• Goal: quick development of OS support and applications (later to move onto the full COTson prototype)
• Quick prototyping via VMs
• Linux on both ends (Fedora 13)
  – Main node = Linux (host)
  – Service nodes = Linux (VMs)
• Shared memory is used between
  – the host and the VMs
  – the VMs themselves
• The shared memory uses a kernel driver (ivshmem)
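On the host side, an ivshmem region is typically backed by a named POSIX shared-memory object, which a host process can map directly. A minimal sketch under that assumption; the object name "/teraflux_shm" and the size are illustrative and must match whatever the VMs were started with:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHM_NAME "/teraflux_shm"            /* hypothetical name of the backing object */
#define SHM_SIZE (16 * 1024 * 1024)         /* must match the size given to QEMU */

int main(void)
{
    /* Open and map the same region the guests see through their ivshmem device. */
    int fd = shm_open(SHM_NAME, O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }

    void *shm = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (shm == MAP_FAILED) { perror("mmap"); return 1; }

    /* The configuration page and message buffers live in this region,
     * visible to the host and to every VM. */
    printf("shared region mapped at %p\n", shm);

    munmap(shm, SHM_SIZE);
    close(fd);
    return 0;
}
```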
TERAFLUX Prototype Architecture
[Diagram: a Fedora 13 Linux host with the IVSHMEM driver in kernel space; four Fedora 13 Linux guests run under QEMU, and an application sits in the host's user space]
TERAFLUX Inter-VM Shared Memory Architecture
• QEMU maps the shared memory into the guest's RAM
• Exposed to the guest as a PCI BAR
• mmap'ed to user level
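From inside a guest, one common way to reach that BAR from user space is to mmap the corresponding sysfs resource file of the ivshmem PCI device (its shared-memory region is BAR2). The PCI address below is only an example; the actual device address differs per guest:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Example path; find the real domain:bus:device.function with lspci. */
    const char *bar = "/sys/bus/pci/devices/0000:00:04.0/resource2";

    int fd = open(bar, O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole BAR: this is the shared memory visible to host and guests. */
    void *shm = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (shm == MAP_FAILED) { perror("mmap"); return 1; }

    printf("ivshmem BAR2 mapped at %p (%lld bytes)\n", shm, (long long)st.st_size);

    munmap(shm, st.st_size);
    close(fd);
    return 0;
}
```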
TERAFLUX Communications
[Diagram: the host application and the data-flow apps in the four QEMU guests exchange messages through the shared RAM, using a message-queue API]
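A simplified sketch of how a message could pass through one of the shared buffers using the ownership and ready flags described on the communications slides. It reuses the hypothetical msg_buffer structure sketched earlier; the polling discipline and memory fences are assumptions, not the prototype's actual message-queue API:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
/* struct msg_buffer and OWNER_LINUX / OWNER_L4 as sketched after slide 6 */

/* Sender: fill a buffer we own and hand it to the other side. */
int msg_send(struct msg_buffer *b, uint32_t my_side, uint32_t peer,
             uint32_t type, const void *payload, uint32_t len)
{
    if (b->owner != my_side || b->ready)
        return -1;                      /* buffer not available to us */
    b->type = type;
    b->length = len;
    b->n_fixups = 0;                    /* no pointers passed in this example */
    memcpy(b->data, payload, len);
    atomic_thread_fence(memory_order_release);
    b->owner = peer;                    /* hand it over ... */
    b->ready = 1;                       /* ... and signal that it is ready */
    return 0;
}

/* Receiver: poll for a ready buffer addressed to us, consume it, release it. */
int msg_recv(struct msg_buffer *b, uint32_t my_side, uint32_t peer,
             void *out, uint32_t max_len)
{
    if (b->owner != my_side || !b->ready)
        return -1;                      /* nothing for us yet */
    atomic_thread_fence(memory_order_acquire);
    uint32_t len = b->length < max_len ? b->length : max_len;
    memcpy(out, b->data, len);
    b->ready = 0;                       /* done: give the buffer back */
    b->owner = peer;
    return (int)len;
}
```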
TERAFLUX Demo (Toy) Apps
• Distributed sum app (sketched below)
  – Single work dispatcher (host)
  – Multiple sum engines (VMs)
• Distributed Mandelbrot
  – Single work dispatcher, handing out lines (host)
  – Multiple compute engines, computing the pixels of each line (VMs)
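To illustrate the structure of the distributed-sum toy app (the splitting policy and all names are hypothetical), the dispatcher could carve the range into one chunk per engine and combine the partial sums. Here the engines are called directly; in the prototype the chunks would travel through the shared-memory message queue instead:

```c
#include <stdint.h>
#include <stdio.h>

/* Work unit sent to a sum engine: sum the integers in [first, last]. */
struct sum_work { uint64_t first, last; };

/* What a single sum engine (running in a VM) would compute for its chunk. */
static uint64_t sum_engine(struct sum_work w)
{
    uint64_t s = 0;
    for (uint64_t i = w.first; i <= w.last; i++)
        s += i;
    return s;
}

/* Dispatcher (host): split [1, n] across the engines and combine the results. */
int main(void)
{
    const uint64_t n = 1000000;
    const int engines = 4;
    uint64_t total = 0;

    for (int e = 0; e < engines; e++) {
        struct sum_work w = {
            .first = e * (n / engines) + 1,
            .last  = (e == engines - 1) ? n : (e + 1) * (n / engines),
        };
        total += sum_engine(w);
    }
    printf("sum 1..%llu = %llu\n", (unsigned long long)n, (unsigned long long)total);
    return 0;
}
```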
TERAFLUX Futures
• Single boot
  – A TeraFlux chip boots a FOS
  – The FOS boots the uKs on the other cores
  – Looks like a single boot process
• Distributed fault tolerance
  – Allow the uKs and the FOS to test each other's health
  – One step beyond FOS-centric FT
• Core repurposing
  – If the FOS cores fail, uK cores re-boot as a FOS
  – The new FOS takes over using the last valid data snapshot
TERAFLUX References
• Inter-VM Shared memory