
Enabling Exascale Hardware and Software Design Through Scalable System Virtualization

Approach

Objectives

Architecture: Virtualization for Exascale

Impact

[Figure: architecture diagram. Node types: Compute Nodes, I/O Nodes, Login Nodes, each drawn as a group of PEs. Software stack labels: Virtual Machine, Palacios, Linux/Kitten.]

Challenges

Accomplishments

Formal Problem Statement

Research Products/Artifacts

Provide a novel solution for testing at scale

Ease the transition to production by supporting scaling of legacy system software

Enable advanced research

Architecture research toward exascale

New parallel programming models

System software research

Extend the Kitten/Palacios prototype

Support for modern hardware

Port to HPC operating systems

Integration of system management tools

Design & implement new capabilities

Integration with micro-architectural simulator

Binary translation for the emulation of new hardware

Fault injection
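
The fault-injection capability listed above can be illustrated with a minimal sketch. This is not Palacios code or its API; it assumes the simplest fault model, a single bit flip in a byte buffer standing in for guest memory, and the names `flip_bit` and `inject_random_fault` are hypothetical.

```python
import random
from typing import Tuple


def flip_bit(memory: bytearray, byte_index: int, bit_index: int) -> None:
    """Flip one bit in place, modelling a single-bit soft error."""
    if not 0 <= bit_index < 8:
        raise ValueError("bit_index must be in 0..7")
    memory[byte_index] ^= 1 << bit_index


def inject_random_fault(memory: bytearray, rng: random.Random) -> Tuple[int, int]:
    """Flip one randomly chosen bit and return (byte_index, bit_index).

    Returning the location lets a test harness log where the fault landed
    and correlate it with any later application failure.
    """
    byte_index = rng.randrange(len(memory))
    bit_index = rng.randrange(8)
    flip_bit(memory, byte_index, bit_index)
    return byte_index, bit_index
```

Passing a seeded `random.Random` instance makes an injection campaign repeatable, which matters when a fault has to be replayed to study how the software stack reacts.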

Provide a test-bed solution for exascale

Vertical profiling

Fault injection

Provide a platform for exascale research

System architecture research

Programming languages research

System software research

Applications must be ported to HPC systems

Scientists focus on technical issues rather than science

Expensive in terms of time and resources

Limited availability of test-beds

Makes the transition to large-scale systems difficult

Limits architecture and system research at scale

Difficult to isolate the application from the hardware

Minimize the overhead created by virtualization

Efficient access to hardware

Decrease overhead associated with context switches of virtual machines

Efficient capabilities such as migration

Cope with the complexity created by virtualization

Management and deployment tools

Tools for application tracing and profiling in a virtual environment
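
One way to quantify the overhead challenge above is the relative slowdown against a native run. A minimal sketch, assuming wall-clock times for the same workload are measured separately on native and virtualized nodes; `relative_overhead` is a hypothetical helper, not part of Palacios or Kitten.

```python
def relative_overhead(native_seconds: float, virtualized_seconds: float) -> float:
    """Return virtualization overhead as a fraction of the native runtime.

    0.0 means no measurable overhead; a negative value means the
    virtualized run was faster than the native one.
    """
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return (virtualized_seconds - native_seconds) / native_seconds
```

Reporting a ratio rather than an absolute difference keeps results comparable across workloads of different lengths and across node counts.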

A test-bed simulating large-scale systems

Tools which ease the transition to large-scale systems & enable advanced system management

• Improves system customization, adaptation, and resilience

Advanced security capabilities (applications are isolated into virtual machines)

Software releases

• Several versions of Palacios have been released: http://v3vee.org/

• Several versions of Kitten have been released: https://software.sandia.gov/trac/kitten

• Virtualization system management tools available in OSCAR: http://oscar.openclustergroup.org

Experiments

• Use of an INCITE award for testing on Cray systems (e.g., JAGUAR)

Contact:

Stephen L. Scott

Thomas Naughton

Geoffroy R. Vallée
