

STFC

Scientific Computing Department

Juan Bicarregui

Head of Data Division

PLaN-E meeting

Copenhagen

9 April 2015


Overview

• Science and Technology Facilities Council

– Scientific Computing Department

Applications Division

Systems Division

Technology Division

Data Division


What is STFC?

Programme includes:

• Neutron and Muon Source

• Synchrotron Radiation Source

• Lasers

• Space Science

• Particle Physics

• Computing and Data Management

• Microstructures

• Nuclear Physics

• Radio Communications

The National Laboratories

[Figure: images labelled Daresbury Laboratory; ESRF & ILL, Grenoble; Square Kilometre Array; Large Hadron Collider; scale label 250m]


Scientific Computing

Maximising the impact of Scientific Computing

Big Data for Big Science

Established 1st April 2012 from e-Science and Computational Science and Engineering

180 staff supporting over 7500 users

• Scientific Applications development and support

• Compute and data facilities and services

• Systems administration, data services, high-performance computing, numerical analysis & software engineering.

> 100 publications pa

> 3500 training days pa

Scientific Computing Department organisation chart, September 2014

David Corney, Acting Director

APPLICATIONS

Paul Sherwood, Division Head

Group Leaders:

– Martyn Winn, Biology and Life Sciences

– Dave Emerson, Engineering and Environment

– Barbara Montanari, Theoretical and Computational Physics

– TBC, Computational Chemistry

Staff:

Hannes Loeffler, Eugene Krissinel, Ville Uski, David Waterman, Marcin Wojdyr, Charles Ballard, Ronan Keegan, Narayanan Krishnan, Chris Morris, Andrey Lebedev, Chang Sik Kim, Agnel Joseph, Chris Wood, Valeria Losasso

Benzi John, Charles Moulinec, Rob Barber, Xiaojun Gu

Keith Refson, Leonardo Bernasconi, Leon Petit, Martin Lueders, Martin Plummer, Dominik Jochym, Simone Sturniolo

Ilian Todorov, John Purton, David Gunn, Laurence Ellison, Richard Anderson, Micheal Seaton, Sebastian Metz, Thomas Keal, Henry Boateng, Chin Yong

DATA

Juan Bicarregui, Division Head

Group Leaders:

– Jens Jensen, Data Services

– Brian Matthews, Research Data

Staff:

Brian Davies, Matt Viljoen, Bruno Canning, Shaun de Witt, Christopher Prosser, Kashyap Manjusha, Carmine Cioffi, Roger Downing, Kevin O'Neill, Juan Sierra

Shirley Crompton, Kevin Phipps, Antony Wilson, Vasily Bunakov, Simon Lambert, Catherine Jones, Alastair Duncan, Brian Ritchie, Wayne Chung, Erica Yang, Holly Zhen, Denise Small, Kunal Arora, Steve Fisher, Alistair Mills, Thomas Kirkham

SYSTEMS

Andrew Sansum, Acting Division Head

Group Leaders:

– Dave Cable, High Performance Systems

– Nick Hill, Research Infrastructure

– Andrew Sansum, Peta Scale Computing and Storage

Staff:

Tim Franks, Viliam Kalavsky, Colin Morey, Tavia Stone, Stephen Hill, Don Parkin

Jonathan Churchill, Suleman Tariq, Ian Johnson, Derek Ross, Mohit Mittal, Sam Worley, Dave Meredith, Claire Devereux, John Kewley, Cristina del Cano Novales

Stuart Pullinger, Ahmed Sajid, Matt Langthorpe

George Ryall

Martin Bly, Tim Folkes, Kashif Hafeez, James Adams, Dimitrios Zilaskos, Ian Collier, Catalin Condurache, Gareth Smith, John Kelly, Tiju Idiculla

TECHNOLOGY

Peter Oliver, Division Head

Group Leaders:

– Jennifer Scott, Numerical Analysis

– Chris Greenough, Software Engineering

– Mike Ashworth, Application Performance Engineering

– Martin Turner, Visualisation

Staff:

Iain Duff, Nick Gould, Jonathan Hogg, Tyrone Rees, Sue Thorne

Stephen Pickles, Lucian Anton, Xiaohu Guo, Andrew Porter, Andrew Sunderland, Rupert Ford

Barry Searle, Ronald Fowler, Srikanth Nagella

HARTREE CENTRE

Cliff Brereton, Division Head

Michael Gleaves

Remaining chart sections: CORE ACTIVITIES, GRADUATES, HONORARY SCIENTISTS, VISITING SCIENTISTS, VISITORS

Labels and names in these sections, in chart order:

Shirley Miller

DL Administration

CECAM

Finance

Group Leader

Damian Jones

Karen McIntyre, Jean Pearce, Carol Malpass

Laura Johnston, Esme Williams

Jenny Williams

Tom Burnley

Stefano Rolfo, Malgorzata Zimon, Yin Yue

Richard Blake

RAL Administration

Paul Durham

Nic Harrison, Tracey Kelly

Rob Appleyard, Cheney Ketley

Sarah Steele

Rob Allan

Evgueni Ovtchinnikov

Jonathan Follows

Adrian Toland, Terry Hewitt, Lee Hannis, David Moss

Andrew Lahiff (PPD)

Alistair Dewhurst (PPD)

Rob Harper

Adrian Coveney

Frazer Barnsley, Greg Corbett

Graham Riley

Emile Youmbi Mbuenmo

John Gordon

Bill Smith, Dziidka Szotek, Valerie Burke, Paul Kummer

Bob McMeeking, John Reid

Paul Strange

Alan Kyffin

Karl Richardson

Gemma Poulter

Vendel Szeremi, Mark Mawson

Coralia Cartis, Jack Dongarra

Kirk Jordan, Stanko Tmoic

James Gebbie

Liam Jones (OCF), Jeremy Appleyard (NVIDIA)

Debbie Franks

Daresbury Laboratory

Mark Swaisland

Maureen Williamson

Linda Gilbert, Tracy Colborne

Rutherford Appleton Laboratory

Angela Walsh

Christelle Gendrin

Dawn Geatches, David Bray

Alison Packer

Group Leader

Leon Petit

Chadwick and RAL Libraries

Scientific Computing Department

4 Divisions:

• Applications

• Data

• Systems

• Technology

Hartree Centre


Applications Division

• Four groups

• Developing and applying computational science software packages

• Physical and biological sciences.

– Computational Biology (Martyn Winn): including structural biology, molecular simulation and bioinformatics

– Theoretical and Computational Physics (Barbara Montanari): electronic structure of the solid state and surfaces, atomic and molecular physics

– Computational Engineering (David Emerson): HPC solutions in fluid flow modelling, with particular strength in turbulence and microfluidics

– Computational Chemistry (Ilian Todorov): molecular dynamics, quantum chemistry and QM/MM techniques, and mesoscale methods

Research Software in SCD

[Figure: software (CASTEP, CRYSTAL, DL_POLY, DL_MESO, etc.) placed by Distance (Nanometer, Micrometer, Meter) against Time (Femtoseconds, Picoseconds, Nanoseconds, Microseconds, Seconds)]

Collaborative Computational Projects

CCP4         Prof David Brown        Macromolecular Crystallography
CCP5         Prof Stephen Parker     The Computer Simulation of Condensed Phases
CCP9         Prof Mike Payne         Computational Electronic Structure of Condensed Matter
CCP12        Prof Stewart Cant       High Performance Computing in Engineering
CCP-ASEArch  Prof Mike Giles         Algorithms and Software for Emerging Architectures
CCP-BioSim   Prof Adrian Mulholland  Biomolecular Simulation at the Life Sciences Interface
CCP-EM       Dr Martyn Winn          Electron Cryo-Microscopy
CCPi         Prof Phillip Withers    Tomographic Imaging
CCPN         Prof Geerten Vuister    NMR
CCP-NC       Dr Jonathan Yates       NMR Crystallography
CCPP         Dr Tony Arber           Computational Plasma Physics
CCPQ *       Prof Tania Monteiro     Quantum Dynamics in Atomic, Molecular and Optical Physics
CCP-SAS      Prof Steve Perkins      Structural Data in Chemical Biology and Soft Condensed Matter
CCPForge     Prof Chris Greenough    Collaborative Software Development Environment Tool

Systems Division

UK Tier-1 for the LHC

Hunting the Higgs:

• Low cost commodity hardware

• Open source middleware

• Remote job submission (Grid)

• >99% availability

• >50 petabytes/year moved globally

• >300 petabytes/year moved internally
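
To give a feel for the yearly volumes above, the following sketch converts petabytes per year into an average sustained rate. It is only a back-of-envelope illustration: the function name `average_rate`, the use of decimal petabytes (10^15 bytes) and a 365-day year are assumptions, and the slide's figures are lower bounds, so the results are lower bounds too.

```python
# Back-of-envelope conversion of yearly data volumes into average rates.
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year
BYTES_PER_PB = 10**15               # assumption: decimal petabytes


def average_rate(petabytes_per_year):
    """Return (gigabytes per second, gigabits per second) averaged over a year."""
    bytes_per_second = petabytes_per_year * BYTES_PER_PB / SECONDS_PER_YEAR
    gbytes_per_second = bytes_per_second / 10**9
    return gbytes_per_second, gbytes_per_second * 8


# Volumes quoted on the slide: >50 PB/year globally, >300 PB/year internally.
for label, volume in [("global", 50), ("internal", 300)]:
    gbytes, gbits = average_rate(volume)
    print(f"{label}: {gbytes:.2f} GB/s, {gbits:.1f} Gbit/s")
```

Under these assumptions, 50 PB/year averages out at roughly 1.6 GB/s and 300 PB/year at roughly 9.5 GB/s; real traffic is bursty, so peak rates will be higher than these averages.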

Run 2 (doubles data rates and volumes)

• 14 PB disk

• 16 PB tape

• 10,000 CPU cores

• 2000 servers

• 40Gb network

• 10Gb/s direct optical link to CERN

JASMIN

Predictive environmental science (NERC)

JASMIN is a world-leading, unique hybrid of:

• 16PB high performance storage (~250GByte/s)

• High-performance computing (~4,000 cores)

• Non-blocking networking (> 3Tbit/sec), and Optical Private Network WANs

• Coupled with cloud hosting capabilities

• JASMIN holds >60% of the data used by the latest IPCC report on climate change.

National Service for Computational Chemistry Software

Compute, Training and support

SGI Altix UV SMP system: 512 CPUs, 4TB shared memory

~70 peer reviewed papers per year

Over 40 applications installed

Emerald

Large scale GPU resource

£1M purchased in 2012

Oxford, Southampton, Bristol, and UCL Universities

Diamond Tomography

Wide range of scientific usage from drug discovery to climate modelling:

[Figure: bar chart, axis 0 to 30, bars labelled Bristol, Oxford, Southampton]

Data Archive for BBSRC

Data Support Service for MRC

Technology Division

Software Engineering, Algorithms and Optimisation

Numerical Analysis

– Maths Research, Sparse Linear Algebra and Optimisation

Software Engineering

– Software Engineering Tools and Services, Agent Modelling

Application Performance Engineering

– Performance tuning and optimisation of HPC Applications, Novel Architectures

Visualisation

– Leading edge visualisation facilities

– Tomographic Imaging

– Collaboration spaces

– IMAT Neutron Imaging at ISIS, offering time-of-flight tomography-driven diffraction

Data Division

Building an Open Data Infrastructure for Research

From Policy to Practice

The Innovation Lifecycle

Aggregation of Knowledge lies at the heart of the innovation lifecycle

[Figure labels: The Body of Knowledge, The Government Process, The Research Process, Enabling Knowledge Creation, Enabling Wealth Creation, Quality Assessment, Strategic Direction, Improved Quality of Life, Improved Understanding]

Data centric view of research

the researcher acts through ingest and access

the researcher shouldn’t have to worry about the information infrastructure

[Figure labels: Data Creation, Archival, Access, Storage, Compute, Network, Services, Curation, Virtual Research Environment, Information Infrastructure]

PaN-Data Infrastructure for Photon and Neutron Sources

Data Sharing Vision

Different Infrastructures, Different User Experiences

• Experiment 1: Raw Data, Data Analysis, Analysed Data, Publication Data, Publications

• Observation 2: Raw Data, Data Analysis, Analysed Data, Publication Data, Publications

• Simulation 3: Raw Data, Data Analysis, Analysed Data, Publication Data, Publications

Single Infrastructure, Single User Experience

• Raw Data Catalogue, Data Analysis, Analysed Data Catalogue, Publication Data Catalogue, Publications Catalogue

• Capacity Storage, Publications Repositories, Data Repositories, Software Repositories

The 7 C’s

• Creation

• Collection

• Capacity

• Computation

• Curation

• Collaboration

• Communication

[Figure labels: Data Creation, Archival, Access, Storage, Compute, Network, Services, Curation]

Creation

Linked systems for:

• Proposal submission

• User management

• Data acquisition

Metadata carried from each system to the next

Detectors moving from Hz to kHz, towards MHz,...

[Figure: Examining the detectors on the MAPS instrument at ISIS]

Capacity

Moore’s law for us is about 15 months

x1000 in 13 years: doubling every 1.3 years

Currently store about 20 PetaBytes of data

[Figure: Total Data Stored (TeraBytes), log axis from 1 to 10,000; labels: 2012, 20PB; reference lines: Moore’s Law, 2 x Moore’s Law (1.5 years), 10 x Moore’s Law (2 years)]
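
The growth figures quoted on this slide can be checked with a short calculation. The sketch below assumes steady exponential growth over the whole period; the function name `doubling_time` is chosen here for illustration only.

```python
import math


def doubling_time(growth_factor, years):
    """Doubling time implied by steady exponential growth.

    If the stored volume grows by `growth_factor` over `years`,
    it doubles log2(growth_factor) times in that period.
    """
    return years / math.log2(growth_factor)


# Slide figures: x1000 in 13 years.
t = doubling_time(1000, 13)
print(f"doubling every {t:.2f} years ({t * 12:.1f} months)")
```

A factor of 1000 is just under ten doublings, so 13 years gives a doubling time of about 1.3 years, in line with the "about 15 months" quoted on the slide.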

Computation

(BlueGene/Q, 1.2 Pflop, #13; Emerald GPGPU cluster at RAL)

Computational derivation of properties from theory

Real-time diagnostics of instrument performance and data flow pipeline.

Fitting of experimental data to model

Compute intensive components on HPC

Curation

RAL Facility Archives

• All ISIS data (~25 years) > 3,000,000 files

• All Diamond Data (~5 years) > 100,000,000 files
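
As a rough comparison of the two facility archives, the sketch below turns the file counts above into average files per year. The counts are lower bounds and the periods are approximate, so the results are order-of-magnitude estimates only; the names `archives` and `files_per_year` are illustrative.

```python
# Approximate figures from the slide: (minimum file count, approximate years).
archives = {
    "ISIS": (3_000_000, 25),
    "Diamond": (100_000_000, 5),
}


def files_per_year(files, years):
    """Average number of files archived per year."""
    return files / years


rates = {name: files_per_year(*figures) for name, figures in archives.items()}
for name, rate in rates.items():
    print(f"{name}: at least {rate:,.0f} files per year on average")

# How much faster files accumulate at Diamond than at ISIS.
print(f"ratio: about {rates['Diamond'] / rates['ISIS']:.0f}x")
```

On these figures Diamond adds files more than a hundred times faster than ISIS, which is why file count, and not only volume, matters for curation.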

LHC Tier 1

• UK hub for LHC data (30PB)

Other UK Research Funders

• NERC JASMIN+CEMS super data cluster

• BBSRC Institutes data archive

• MRC Data Support Service

Universities

• Imperial College - National Service for Computational Chemistry Software

• Oxford, UCL, Southampton, Bristol - Emerald GPGPU cluster

Publications:

• The STFC Publications Archive

[Figures: the StorageTek tape robots (100PB capacity); JASMIN]

Collection

• Proposal: Scientist submits application for beamtime

• Approval: Facility committee approves application

• Scheduling: Facility registers, trains, and schedules scientist’s visit

• Experiment: Scientist visits, facility runs experiment

• Data cleansing: Raw data filtered and cleansed

• Data analysis: Tools for processing made available

• Record Publication: Subsequent publication registered with facility
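
The collection lifecycle can be pictured as an ordered sequence of stages, with metadata accumulating as a record moves from one to the next. The sketch below is purely illustrative and is not a description of any STFC system: the class names `Stage` and `Investigation`, the metadata keys, and the ordering of data analysis before publication are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Lifecycle stages named on the slide (ordering is an assumption)."""
    PROPOSAL = 1
    APPROVAL = 2
    SCHEDULING = 3
    EXPERIMENT = 4
    DATA_CLEANSING = 5
    DATA_ANALYSIS = 6
    RECORD_PUBLICATION = 7


@dataclass
class Investigation:
    """Hypothetical record that carries metadata through the lifecycle."""
    title: str
    stage: Stage = Stage.PROPOSAL
    metadata: dict = field(default_factory=dict)

    def advance(self, **new_metadata):
        """Move to the next stage, keeping all metadata gathered so far."""
        if self.stage is Stage.RECORD_PUBLICATION:
            raise ValueError("already at the final stage")
        self.metadata.update(new_metadata)
        self.stage = Stage(self.stage.value + 1)
        return self


# Hypothetical example values, for illustration only.
inv = Investigation("example beamtime proposal")
inv.advance(approved_by="facility committee").advance(visit="scheduled")
print(inv.stage.name, inv.metadata)
```

Keeping one record for the whole lifecycle means that what was entered at proposal time is still attached when the data and later the publication are registered.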

Communication

Immense Expectations!

Web enables:

– access to everything

Everything on-demand

Interlinking enables:

– Validation of results

– Repetition of experiment

Discovery enables:

– new knowledge from old

STFC’s “e-pubs” Institutional Repository has records of 30,000 publications spanning 25 years



Latest News

9 new projects in E-INFRA and FET

EINFRA

• EUDAT 2020

• RDA 3

• NFFA

• …

FET

• NLAFET

• SAGE

• …


[Figure: "Usable"; discipline labels: Medicine, Environment, Physical Sci, Social Sci, Humanities, Bio]

Questions?
