libHPC: Software sustainability and reuse through metadata preservation

Jeremy Cohen, John Darlington, Brian Fuchs
London e-Science Centre / Department of Computing, Imperial College London

David Moxey, Chris Cantwell, Pavel Burovskiy, Spencer Sherwin
Department of Aeronautics, Imperial College London

Neil Chue Hong
Software Sustainability Institute, University of Edinburgh

First Workshop on Maintainable Software Practices in e-Science, Chicago
Tuesday 9th October 2012



Page 2: libHPC: Software sustainability and reuse through metadata preservation

Introduction

•  Decision making – building scientific software can be hard

•  Abstraction – hide the complexity

•  Efficiency – achieve the performance

•  Aim for a universal technology that spans all application domains, machines, metrics

•  Coordination forms – a different approach to task specification

•  Components – encapsulated building blocks

[Figure: the three dimensions to be spanned. Machines: Cluster, Cloud, Multi-core, GPU, FPGA. Metrics: Time, Cost, Energy. Applications: Num. Intensive, Data Intensive, Bioinformatics, CFD.]

Page 3: libHPC: Software sustainability and reuse through metadata preservation

Information and decisions

Why is software development and re-use hard?

•  A particular piece of code is the result of many development decisions

•  Developers invest significant knowledge about the task to be solved

…however…

•  Decisions made by developers cannot be reconstructed from the code

•  Loss of original information and structure invested by developer(s)

Page 4: libHPC: Software sustainability and reuse through metadata preservation

Information and decisions

Understanding the code structure, the options available, and the decisions made during development is important for:

•  Portability; optimisation on different architectures

•  Long-term sustainability

Need an explicit representation of decisions and alternatives:

•  Decision tree used to represent this (structure)

•  Metadata used to annotate decision tree (information)

•  Modifications can be made to the decision tree (based on metadata analysis), which can then be mapped to modified code

Page 5: libHPC: Software sustainability and reuse through metadata preservation

Information and decisions

e.g. code that uses a solver:

•  Many options to select suitable solver – abstract components

•  Choice dependent on problem being addressed, parameters, etc.

•  Represent solver choice on a tree of component alternatives: leaf nodes are concrete implementations, higher-level nodes are abstract

[Figure: tree of solver alternatives. The figure also labels nodes with Matrix and Vector.]

Linear Solver
    LU
        Sequential LU
        Parallel LU (OpenMP)
        Parallel LU (MPI)
    Jacobi
        Sequential Jacobi
        Parallel Jacobi (UPC)
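A minimal sketch of how such a tree of alternatives might be represented and queried, assuming a simple node class with a free-form metadata dictionary. The `Node` class, the `select` helper and the `parallel_model` metadata key are hypothetical names chosen for illustration; they are not taken from the libHPC implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node in the tree of component alternatives (hypothetical structure)."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)  # illustrative keys only
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        """Return every concrete implementation (leaf) below this node."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def select(root: Node, **required: str) -> List[str]:
    """Names of the leaves whose metadata matches every required key/value."""
    return [leaf.name for leaf in root.leaves()
            if all(leaf.metadata.get(k) == v for k, v in required.items())]


# The solver tree from the slide; metadata values are assumptions for the demo.
tree = Node("Linear Solver", children=[
    Node("LU", children=[
        Node("Sequential LU", {"parallel_model": "none"}),
        Node("Parallel LU (OpenMP)", {"parallel_model": "OpenMP"}),
        Node("Parallel LU (MPI)", {"parallel_model": "MPI"}),
    ]),
    Node("Jacobi", children=[
        Node("Sequential Jacobi", {"parallel_model": "none"}),
        Node("Parallel Jacobi (UPC)", {"parallel_model": "UPC"}),
    ]),
])

# Abstract nodes stay in the tree; only the leaves are candidates to run.
print(select(tree, parallel_model="none"))
```

Keeping the abstract nodes in the tree means that a different leaf can be chosen later, for example when moving to another architecture, without losing the record of which alternatives existed.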

Page 6: libHPC: Software sustainability and reuse through metadata preservation

Abstractions

→ Encapsulation

Encapsulate functions as components (reuse)

Allow alternatives

→ Functional properties

Referentially transparent → Encapsulation

Church-Rosser → Alternative behaviours

Page 7: libHPC: Software sustainability and reuse through metadata preservation

Abstractions – alternative behaviours

i.e. Church-Rosser: whichever sub-expression is reduced first, the final result is the same

(4 + 3) – (2 + 1)

7 – (2 + 1)   or   (4 + 3) – 3

7 – 3

4
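The reduction above can be replayed in a few lines of Python. This is only an illustrative sketch: the nested-tuple expression encoding and the `step` and `reduce_fully` helpers are assumptions made for the demonstration, not part of libHPC.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub}

# (4 + 3) - (2 + 1) encoded as nested tuples: (operator, left, right)
expr = ("-", ("+", 4, 3), ("+", 2, 1))


def step(e, left_first):
    """Perform exactly one reduction, preferring the left or right operand."""
    op, a, b = e
    a_done, b_done = isinstance(a, int), isinstance(b, int)
    if a_done and b_done:
        return OPS[op](a, b)
    # Reduce the preferred operand if it is still reducible, else the other one.
    if (left_first and not a_done) or b_done:
        return (op, step(a, left_first), b)
    return (op, a, step(b, left_first))


def reduce_fully(e, left_first):
    """Return the list of intermediate expressions down to the final value."""
    trace = [e]
    while not isinstance(e, int):
        e = step(e, left_first)
        trace.append(e)
    return trace


# Both orders pass through different intermediate forms but end at the same value.
print(reduce_fully(expr, left_first=True))
print(reduce_fully(expr, left_first=False))
```

Because the sub-expressions have no side effects, the two traces differ only in their second entry, which is what allows a framework to pick the evaluation order (or run both operands in parallel) freely.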

Page 8: libHPC: Software sustainability and reuse through metadata preservation

Application flow and specification

We represent application elements using two techniques:

•  Data processing – core code that forms application building blocks

→ Components (first-order functions)

•  Control flow, orchestration

→ Higher-order functions

→ Coordination Forms

e.g. Pipe, Parallel, Map / Reduce, …

Page 9: libHPC: Software sustainability and reuse through metadata preservation

Coordination Forms

•  A functional/mathematical approach to job specification

•  Based on work by Darlington, et al.

•  Applied to components – define application flow

•  May be:

•  General – applicable to most applications – e.g. PIPE, PAR

•  Iterative patterns – e.g. FARM, ITERATE

•  Domain-specific higher-level forms – e.g. Monte Carlo

•  Extensible – new patterns can be introduced

J. Darlington, Y. Guo, H. W. To and J. Yang. Functional skeletons for parallel coordination. In Proceedings of EURO-PAR ’95 Parallel Processing, LNCS 966, pp. 55-66, 1995. Springer Berlin/Heidelberg.

Page 10: libHPC: Software sustainability and reuse through metadata preservation

Coordination Forms

•  A given form may have multiple underlying implementations

•  E.g. PAR may provide sequential, multi-threaded and MPI parallel implementations

•  Forms aim to be as lightweight as possible

•  They result in code that can be run

•  They intelligently glue together component building blocks

•  PIPE as an example – functions f1 to fn with initial input a:

PIPE [f1, f2, …, fn] a = (f1 ∘ f2 ∘ … ∘ fn) a

= f1(f2(…(fn(a))))
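The PIPE definition is ordinary function composition, so it can be sketched with `functools.reduce`. The `pipe` function and the two toy components below are hypothetical stand-ins; the actual libHPC prototype wraps components in classes that carry metadata, which this sketch leaves out.

```python
from functools import reduce


def pipe(functions, a):
    """Compute f1(f2(...fn(a))): the last function in the list is applied first."""
    return reduce(lambda value, f: f(value), reversed(functions), a)


def increment(x):
    return x + 1


def double(x):
    return 2 * x


# pipe([increment, double], 5) is increment(double(5))
print(pipe([increment, double], 5))
```

Note the ordering that follows from the definition: the component listed first is the outermost call, so it runs last.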

Page 11: libHPC: Software sustainability and reuse through metadata preservation

Coordination Forms – Implementation

•  Prototype implementation in Python

•  Class wrappers for component and parameter metadata – concrete implementation code selectable

PIPE ([component list], initial input)

PIPE – Compose a series of components in the order specified

PAR ([component list], [(input1), (input2), …, (inputn)])

PAR – Run a series of components independently (perhaps in parallel)

Additional parameters can be added in the component list

E.g. for components add, multiply, divide:

2 * ( (245+34) / (6+8) )

PIPE([(multiply, 2), divide], PAR([add,add],[(245,34),(6,8)]))
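A minimal runnable sketch of the worked example, assuming the two-argument signatures given on this slide, so the result of PAR is passed as PIPE's initial input. Everything here is an assumption made for illustration rather than the libHPC prototype: the helper `_call`, the `parallel` flag, the rule that extra parameters in a `(component, param)` tuple are passed before the piped value, and the unpacking of list results into positional arguments.

```python
from concurrent.futures import ThreadPoolExecutor


def add(a, b):
    return a + b


def multiply(a, b):
    return a * b


def divide(a, b):
    return a / b


def _call(component, value):
    """Call a component; tuple/list values are unpacked into positional args."""
    extra = ()
    if isinstance(component, tuple):        # (component, param, ...) form
        component, *extra = component
    args = value if isinstance(value, (tuple, list)) else (value,)
    return component(*extra, *args)


def PIPE(components, initial_input):
    """Compose components so that the last one listed is applied first."""
    value = initial_input
    for component in reversed(components):
        value = _call(component, value)
    return value


def PAR(components, inputs, parallel=False):
    """Run each component on its own input; two interchangeable implementations."""
    if parallel:
        # Multi-threaded variant: same results, different execution strategy.
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(_call, c, i) for c, i in zip(components, inputs)]
            return [f.result() for f in futures]
    # Sequential variant.
    return [_call(c, i) for c, i in zip(components, inputs)]


# 2 * ((245 + 34) / (6 + 8))
result = PIPE([(multiply, 2), divide], PAR([add, add], [(245, 34), (6, 8)]))
print(result)
```

Switching `parallel=True` changes how PAR executes but not what it returns, which mirrors the idea that one form can have several underlying implementations.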

Page 12: libHPC: Software sustainability and reuse through metadata preservation

Bioinformatics: Genome Read Pre-Processing/Mapping

Input files: Reference Genome (FASTA file); reads from sequencing machine (FASTQ)

[Figure: workflow diagram. The Reference Genome (FASTA file) goes to bwa index; the Short Read Set (Paired), a single FASTQ file, goes to FASTQ split, producing SR_1 and SR_2. Each read set goes through bwa aln with the FASTA file + index file. bwa sampe generates the alignment (paired ended) as a SAM file; samtools import produces a BAM file; samtools sort produces a sorted BAM file; samtools index produces the OUTPUT.]

((sr1,sr2), u) = PAR([fastq_split, bwa_index], [(short_read_file, None, None),(ref_genome_file,)])

(v, w) = PAR([bwa_aln, bwa_aln], [(ref_genome_file, sr1, None), (ref_genome_file, sr2, None)])

result = PIPE([samtools_index, samtools_sort, (samtools_import, ref_genome_file), bwa_sampe],
              [ref_genome_file, [v,w], [sr1, sr2], None])

Page 13: libHPC: Software sustainability and reuse through metadata preservation

LibHPC Project

•  LibHPC

•  Two-year project under EPSRC HPC Software Programme

•  Imperial College London (Computing (LeSC), Aeronautics, ICT)

•  SSI, Edinburgh

•  Implementing/demonstrating framework with main supporting application (Nektar++) + other exemplars

Page 14: libHPC: Software sustainability and reuse through metadata preservation

Example

Optimising FEM Codes

[Figure: diagram with the following labelled elements: High-level Application Description / Job Specification (Co-ordination Forms, DSLs, etc.); Job Specification Analysis/Processing; Hardware Resources; Software Component Library & Metadata Resource; Discovery & Metadata; Domain-specific Application Support Libraries.]

Page 15: libHPC: Software sustainability and reuse through metadata preservation

Nektar++ - Hybrid Assembly

•  Nektar++ operates on matrices based on input mesh

•  Each element of input mesh is mapped to an (elemental) matrix

•  There are two matrix assembly strategies:

•  Local

•  Global
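The slides do not spell out the two strategies, so the sketch below follows the usual finite-element reading, which is an assumption here: the global strategy assembles a single matrix from the elemental matrices and then applies it, while the local strategy keeps the elemental matrices and applies them element by element (gather, multiply, scatter-add). The tiny 1D mesh, the elemental matrix and the function names are hypothetical and are not Nektar++ code.

```python
# Two linear 1D elements of length h sharing the middle node (hypothetical mesh).
h = 0.5
elemental = [[1 / h, -1 / h], [-1 / h, 1 / h]]  # same matrix for both elements
element_nodes = [(0, 1), (1, 2)]                 # local-to-global node map
n_nodes = 3


def apply_global(x):
    """Global strategy: assemble one matrix, then do a matrix-vector product."""
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for nodes in element_nodes:
        for i, gi in enumerate(nodes):
            for j, gj in enumerate(nodes):
                K[gi][gj] += elemental[i][j]
    return [sum(K[i][j] * x[j] for j in range(n_nodes)) for i in range(n_nodes)]


def apply_local(x):
    """Local strategy: gather, apply each elemental matrix, scatter-add."""
    y = [0.0] * n_nodes
    for nodes in element_nodes:
        x_e = [x[g] for g in nodes]              # gather element values
        for i, gi in enumerate(nodes):
            y[gi] += sum(elemental[i][j] * x_e[j] for j in range(len(nodes)))
    return y


x = [1.0, 2.0, 4.0]
print(apply_global(x))
print(apply_local(x))
```

Both functions compute the same product; they differ in memory layout and operation count, which is why the better choice can depend on the element type and the target hardware.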

Page 16: libHPC: Software sustainability and reuse through metadata preservation

Nektar++ - Hybrid Assembly

[Figure: Local Assembly and Global Assembly, shown side by side]

Page 17: libHPC: Software sustainability and reuse through metadata preservation

Nektar++ - Hybrid Assembly

[Figure: Hybrid Assembly]

Page 18: libHPC: Software sustainability and reuse through metadata preservation

Thank You