1
Rule The Next Generation Supercomputers With X10 Why a New Language? Frequency Wall Inability to follow past frequency scaling trends. Memory Wall Inability to support a coherent uniform-memory access model with reasonable performance. Scalability Wall Inability to utilize all levels of available parallelism in the system[1]. What is X10? X10 is a new language developed in the IBM PERCS project as part of the DARPA program on High Productivity Computing Systems (HPCS)[2]. X10 is an instance of the APGAS framework in the Java family. References [1] Kemal Ebcioglu, Vijay Saraswat, Vivek Sarkar, X10: Programming for Hierarchical Parallelism and Non-uniform data access. 3rd International Workshop on Language Runtimes, Impact of Next Generation Processor Architectures on Virtual Machine Technologies co-located with ACM OOPSLA, 2004. [2] http://www.x10-lang.org/ [3] http://jikesrvm.org/ Supervised by Dr. Stephen Blackburn & Dr. Alistair P. Rendell X10 Places (Processes) def addTo(a:DistArray[Int], b:DistArray[Int]) {a.dist == b.dist} = { val D = a.dist; for(p in D.places()) at(p) { for(i in D.get(p)) a(i) += b(i); } } Same Distribution One 'at' per relevant place Local loop over points at p X10 Activities (Threads) public static def main(argv:Rail[String]!) { val sums = Rail.make[Int](2, (Int) => 0); finish { async { sums(0) = sum(1, 100, (i:Int) => i*i); } async { sums(1) = sum(1, 1000, (i:Int) => i); } } val t = sums(0) + sums(1); x10.io.Console.OUT.println("t=" + t); } Spawn an activity Spawn another activity Wait for finish Why X10? Is more productive than current parallel programming models as well as more convenient and accurate than Java. Can support high levels of abstraction. Can exploit multiple levels of parallelism and non- uniform data access. Is suitable for multiple architectures, and multiple workloads. Research Areas X10 Runtime Jikes RVM: Jikes Research Virtual Machine[3] is implemented in the Java™ programming language, which runs on itself without requiring a second virtual machine. Ongoing work for extending Jikes RVM as X10 Java runtime. MPI: Runtime support for Point to point communication in X10 code existing. Ongoing work for implementation of collective communication among X10 team object comprising of threads and processes. Cell & CUDA: Designing and implementing X10 runtime for Cell processors and CUDA architecture is also a very promising research area. X10 Application X10 Compiler X10 Runtime in X10 (XRX) C++ Compiler JAVA Compiler X10aux (Runtime Support Library X10RT X10RT Logical MPI PGAS Cell CUDA C++ JAVA Executable Vivek Kumar Immutable data: final fields, value type instance Local Objects Local Sections Activities Remote Objects Remote Sections Activities Global Arrays Outbound & Inbound Activities Globally Asynchronous CPU Cache CPU Cache CPU Cache Interconnect System Bus Memory CPU Cache I/O Controller I/O Controller Disks Disks GROUP-1 CPU Cache CPU Cache CPU Cache Interconnect System Bus Memory CPU Cache I/O Controller I/O Controller Disks Disks GROUP-2

Vivek Kumar · Vivek Kumar Immutable data: final fields, value type instance Local Objects Local Sections Activities Remote Objects Remote Sections Activities Global Arrays Outbound

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Vivek Kumar · Vivek Kumar Immutable data: final fields, value type instance Local Objects Local Sections Activities Remote Objects Remote Sections Activities Global Arrays Outbound

Rule The Next Generation Supercomputers With X10

Why a New Language?

Frequency Wall – Inability to follow past frequencyscaling trends.

Memory Wall – Inability to support a coherentuniform-memory access model with reasonableperformance.

Scalability Wall – Inability to utilize all levels ofavailable parallelism in the system[1].

What is X10?

X10 is a new language developed in the IBM PERCSproject as part of the DARPA program on High ProductivityComputing Systems (HPCS)[2].

X10 is an instance of the APGAS framework in the Javafamily.

References

[1] Kemal Ebcioglu, Vijay Saraswat, Vivek Sarkar, X10: Programmingfor Hierarchical Parallelism and Non-uniform data access. 3rdInternational Workshop on Language Runtimes, Impact of NextGeneration Processor Architectures on Virtual Machine Technologiesco-located with ACM OOPSLA, 2004.

[2] http://www.x10-lang.org/[3] http://jikesrvm.org/

Supervised by Dr. Stephen Blackburn & Dr. Alistair P. Rendell

X10 Places (Processes)

def addTo(a:DistArray[Int], b:DistArray[Int])

{a.dist == b.dist} = {

val D = a.dist;

for(p in D.places())

at(p) {

for(i in D.get(p))

a(i) += b(i);

}

}

Same Distribution

One 'at' per relevant place

Local loop over points at p

X10 Activities (Threads)

public static def main(argv:Rail[String]!) {

val sums = Rail.make[Int](2, (Int) => 0);

finish {

async

{ sums(0) = sum(1, 100, (i:Int) => i*i); }

async

{ sums(1) = sum(1, 1000, (i:Int) => i); }

}

val t = sums(0) + sums(1);

x10.io.Console.OUT.println("t=" + t);

}

Spawn an activity

Spawn another activity

Wait for finish

Why X10?

Is more productive than current parallel programmingmodels as well as more convenient and accurate thanJava.

Can support high levels of abstraction.

Can exploit multiple levels of parallelism and non-uniform data access.

Is suitable for multiple architectures, and multipleworkloads.

Research Areas – X10 Runtime

Jikes RVM: Jikes Research Virtual Machine[3] isimplemented in the Java™ programming language, whichruns on itself without requiring a second virtual machine.Ongoing work for extending Jikes RVM as X10 Javaruntime.

MPI: Runtime support for Point to point communicationin X10 code existing. Ongoing work for implementation ofcollective communication among X10 team objectcomprising of threads and processes.

Cell & CUDA: Designing and implementing X10runtime for Cell processors and CUDA architecture is alsoa very promising research area.

X10 Application

X10 CompilerX10 Runtime in X10

(XRX)

C++ Compiler

JAVA Compiler

X10aux (Runtime Support Library

X10RT X10RT Logical

MPI PGAS Cell CUDA

C++ JAVA

Executable

Vivek Kumar

Immutable data: final fields, value type instance

Local Objects

Local Sections

Activities

Remote Objects

Remote Sections

Activities

Global Arrays

Outbound & Inbound Activities

Globally

Asynchronous

CPUCache

CPUCache

CPUCache

Interconnect

System Bus

Memory CPUCache

I/O ControllerI/O Controller

Disks DisksGROUP-1

CPUCache

CPUCache

CPUCache

Interconnect

System Bus

Memory CPUCache

I/O ControllerI/O Controller

Disks DisksGROUP-2