William O'Mullane, European Space Astronomy Centre (ESAC), Madrid, Spain. Inauguration of Institute for Data Intensive Engineering and Science, JHU, Baltimore MD, USA, August 25-26, 2009. Gaia and Virtualisation: Bringing the process to the data – Virtualization? http://www.rssd.esa.int/Gaia

Data Intensive Engineering and Science

Page 1: Data Intensive Engineering and Science

Data Intensive Engineering and Science

Gaia and Virtualisation

Bringing the process to the data – Virtualization?

William O’Mullane

Gaia Science Operations Development Manager

European Space Astronomy Centre (ESAC)

Madrid, Spain

http://www.rssd.esa.int/Gaia

Page 2: Data Intensive Engineering and Science

Satellite

Mission:
• Stereoscopic census of the Galaxy
• μarcsec astrometry, G < 20 (10^9 sources)
• Radial velocities, G < 16
• Photometry, G < 20

• Status: ESA Cornerstone 6
  – ESA provides the hardware and launch
  – Launch: spring 2012
  – Satellite in development (EADS/Astrium)

Page 3: Data Intensive Engineering and Science

Cruise to L2

Graphic: EADS Astrium

Page 4: Data Intensive Engineering and Science

Scanning (Lissajous orbit around L2)

• Full sky covered 3-fold every six months
• 5-year coverage

Graphic: Lindegren

Page 5: Data Intensive Engineering and Science

Giga Pixel Focal Plane

• 106 CCDs, 938 million pixels, 2800 cm^2
• Dimensions marked on the figure: 104.26 cm and 42.35 cm
• Image motion: the figure indicates the star motion in 10 s

[Figure: focal plane layout showing the Sky Mapper CCDs, Astrometric Field CCDs, Blue Photometer CCDs, Red Photometer CCDs, Radial-Velocity Spectrometer CCDs, two Basic Angle Monitors and two Wave Front Sensors]

Page 6: Data Intensive Engineering and Science

Pixels and images ...

• Need the astrometric centroid of the CCD image determined to an accuracy of 1% of the pixel size!
• There will be 10^12 images
• Images are ‘windowed’ on board
  – Only binned windows are downlinked
• Never actually get to see the Gaia ‘picture’
• Millimag photometry also difficult (calibration)
• Spectra: serious blending problems
• And then there is radiation (CTI effects)
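
To give a feel for the 1%-of-a-pixel requirement, here is a minimal sketch of a flux-weighted centroid on a one-dimensional binned window. It is an illustration only: the function name and the sample values are hypothetical, background subtraction is assumed to have happened already, and this is not the image-fitting method used in the Gaia processing.

```python
def weighted_centroid(samples):
    """Return the flux-weighted mean pixel position of a 1-D window.

    `samples` holds the binned counts for pixels 0, 1, 2, ...
    (hypothetical values; background is assumed already subtracted).
    """
    total = sum(samples)
    if total <= 0:
        raise ValueError("window contains no flux")
    return sum(i * flux for i, flux in enumerate(samples)) / total


if __name__ == "__main__":
    # A symmetric profile centred on pixel 2, and one skewed slightly right.
    symmetric = [1.0, 4.0, 10.0, 4.0, 1.0]
    skewed = [1.0, 3.9, 10.0, 4.1, 1.0]
    print(weighted_centroid(symmetric))
    print(weighted_centroid(skewed))
```

Moving only 0.1 counts from one wing to the other shifts this centroid by about 0.01 pixel, which is the size of effect the requirement asks to control.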

Page 7: Data Intensive Engineering and Science

Processing

• Done by the community
  – Data Processing and Analysis Consortium (DPAC)
  – ~360 active (active means >=10%) participants
  – Divided into 9 Coordination Units (CU)
  – Top-level Executive (DPACE) and Project Office (PO)
  – 6 Data Processing Centres to run software
• All code in Java (only one exception)
  – For portability: has to run until 2020
  – Maintainability, testability etc.: JUnit, Hudson
  – Easier to write CORRECT code in a higher-level language
    • Fewer core dumps
    • OK, so it's replaced by the NPE (Null Pointer Exception)
• Several ‘relational’ databases
  – Oracle
  – PostgreSQL
  – MySQL
  – Derby for testing

Page 8: Data Intensive Engineering and Science

Architecture

• Highly distributed
  – Multiple independent DPCs and CUs
• Want/need to decouple
  – Reduce dependencies
  – Reduce risk
• Hub and spokes
  – Max flexibility for CUs and DPCs
  – Minimum ICDs (ICD = Interface Control Document)

Page 9: Data Intensive Engineering and Science

ESA/SOC and DPAC

• The Science Operations Centre (SOC) is funded by ESA (part of Gaia CAC)
  – Will carry out science operations of Gaia
  – Get the data to DPAC for processing
    • Initial processing software to run in ESAC
• SOC has also been embedded directly in the DPAC structure since the outset:
  – Provides architecture and technical advice/guidance (CU1)
    • CU1 also has CNES and other DPC people
    • All CU leaders in Executive
  – Provides one of the six Data Processing Centres
  – Provides technical support for core processing
    • Specifically significant effort in Astrometric Solution

Page 10: Data Intensive Engineering and Science

AGIS

• Astrometric Global Iterative Solution (Lindegren, Lammers)
• Provides a rigid independent reference frame for Gaia observations
  – Rotate to ICRS using quasars
• Perhaps about 10% of all the processing
  – Only deals with about 10% of data (well-behaved stars to make the grid)
• Block iterative solution
  – Using Gauss-Seidel “preconditioner” with simple iterations
  – Moving to Conjugate Gradient
• Collaboration with Yoshiyuki Yamada for Nano-JASMINE processing
  – Picardo last year, Parache now
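
For readers unfamiliar with the Gauss-Seidel idea named above, the following is a minimal textbook sketch on a small dense system. It is not AGIS code: the function name, the 3x3 example matrix and the fixed iteration count are assumptions chosen for illustration, and AGIS applies the idea block-wise to a very much larger sparse problem.

```python
def gauss_seidel(a, b, iterations=50):
    """Approximately solve a x = b by repeated Gauss-Seidel sweeps.

    `a` is a small square matrix given as a list of rows. Convergence is
    guaranteed for, e.g., strictly diagonally dominant matrices.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Use the newest values of the other unknowns straight away.
            others = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - others) / a[i][i]
    return x


if __name__ == "__main__":
    matrix = [[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]]
    rhs = [5.0, 6.0, 5.0]
    print(gauss_seidel(matrix, rhs))  # approaches [1, 1, 1]
```

Each unknown is updated in turn using the latest values of all the others, which is the same pattern as updating one block of parameters while holding the remaining blocks fixed.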

Page 11: Data Intensive Engineering and Science

Global Iterative Solution

• Sky scans (highest accuracy along scan)
• Scan width: 0.7°

1. Objects are matched in successive scans
2. Attitude and calibrations are updated
3. Object positions etc. are solved
4. Higher-order terms are solved
5. More scans are added
6. Whole system is iterated

Page 12: Data Intensive Engineering and Science

AGIS – Observation Model

The centroid of a star image is modelled as

Observed location = Global ref. frame + Source position + instrument Attitude + CCD/pixel offset + noise

Symbolically:

O = G + S + A + C + n

• G (global): e.g. PPN
• S (source): 6 astrometric parameters
• A (attitude): quaternion q(t) represented by cubic spline coefficients
• C (CCD/pixel offset): geom. calibration + chromaticity + CTI shift
• n (noise): white gaussian, known

Page 13: Data Intensive Engineering and Science

AGIS – How?

Block-iterative least-squares solution of the over-determined system of equations O = G + S + A + C + n:

1. Initialize S, A, C, G
2. S ← O − A − C − G (one star at a time)
3. A ← O − S − C − G (one attitude interval at a time)
4. C ← O − S − A − G (one calibration unit at a time + renormalise*)
5. G ← O − S − A − C (for the whole data set)
6. Renormalise** S and adjust A
7. Iterate until convergence

* defines origin of instrument axes
** defines origin of celestial axes

(order of operations may vary)
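
The block-iterative scheme on this slide can be illustrated with a deliberately tiny toy problem. The sketch below assumes a scalar model o = s[i] + a[j] with only source and attitude blocks, noise-free made-up observations, and a renormalisation that forces the attitude offsets to zero mean; all names and numbers are hypothetical and nothing here is taken from the AGIS code.

```python
def block_iterative_solve(observations, n_sources, n_attitude, iterations=40):
    """Toy block-iterative least squares for the model o = s[i] + a[j].

    `observations` is a list of (source index, attitude index, value).
    Every source and every attitude interval needs at least one observation.
    """
    s = [0.0] * n_sources
    a = [0.0] * n_attitude
    for _ in range(iterations):
        # Source block: one source at a time, attitude held fixed.
        for i in range(n_sources):
            res = [o - a[j] for (k, j, o) in observations if k == i]
            s[i] = sum(res) / len(res)
        # Attitude block: one interval at a time, sources held fixed.
        for j in range(n_attitude):
            res = [o - s[i] for (i, k, o) in observations if k == j]
            a[j] = sum(res) / len(res)
        # Renormalise: force mean(a) = 0 to fix the otherwise free origin.
        shift = sum(a) / n_attitude
        a = [v - shift for v in a]
        s = [v + shift for v in s]
    return s, a


if __name__ == "__main__":
    true_s = [1.0, 2.0, 3.0]
    true_a = [0.5, -0.5, 0.0]          # zero mean by construction
    pairs = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 0)]
    obs = [(i, j, true_s[i] + true_a[j]) for i, j in pairs]
    print(block_iterative_solve(obs, 3, 3))
```

Because a constant can be moved freely between s and a without changing any observation, some renormalisation step is needed to pin down the origin; the toy uses a zero-mean convention purely for simplicity.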

Page 14: Data Intensive Engineering and Science

AGIS Architecture

• Datatrains drive through the AGIS Database, passing observations to algorithms.
• There can be as many Datatrains in parallel as we wish.

[Diagram: the optimised AGIS Database is read through a Data Access Layer (Store, GaiaTable, ObjectFactory); AstroElementary objects are handed to ElementaryTakers, and each Datatrain feeds its own set of Source, Attitude, Global and Calibration algorithms]
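
A minimal sketch of the data-driven pattern described on this slide: a train makes one pass over its observations and pushes each one to every registered algorithm, so the algorithms never query for data themselves. The class names `DataTrain` and `MeanAccumulator`, the `take` method and the (source id, value) observation format are hypothetical stand-ins, not the real AGIS interfaces.

```python
class MeanAccumulator:
    """Hypothetical 'taker': accumulates a running mean per source."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def take(self, observation):
        source_id, value = observation
        self.sums[source_id] = self.sums.get(source_id, 0.0) + value
        self.counts[source_id] = self.counts.get(source_id, 0) + 1

    def results(self):
        return {k: self.sums[k] / self.counts[k] for k in self.sums}


class DataTrain:
    """Drives once through a block of observations and feeds every taker."""

    def __init__(self, takers):
        self.takers = takers

    def run(self, observations):
        count = 0
        for obs in observations:      # each observation is read exactly once
            for taker in self.takers:
                taker.take(obs)       # algorithms receive data, never query
            count += 1
        return count


if __name__ == "__main__":
    accumulator = MeanAccumulator()
    train = DataTrain([accumulator])
    train.run(iter([(1, 2.0), (1, 4.0), (2, 10.0)]))
    print(accumulator.results())
```

Running several such trains in parallel then only requires giving each one a disjoint slice of the observations.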

Page 15: Data Intensive Engineering and Science

Scheduling

• Very simple: keep all machines busy all the time!
  – Busy = CPU ~90%
• Post jobs on a whiteboard (like the OPUS blackboard)
  – Job 1, Job 2, Job 3, Job 4, ..., Job N
• Trains/workers mark jobs and do them
• Mark finished; repeat until done
• Previous attempt had much more general scheduling. It was also ~1000 times slower.
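
The whiteboard idea can be sketched in a few lines with threads in one process. This is only an illustration under assumptions: the `Whiteboard` class, its method names and the job states are hypothetical, and the real system coordinates workers on many nodes rather than threads sharing memory.

```python
import threading


class Whiteboard:
    """Shared list of jobs; workers mark a job before doing it."""

    def __init__(self, jobs):
        self._lock = threading.Lock()
        self._status = {job: "posted" for job in jobs}

    def mark_next(self):
        """Atomically claim one posted job, or return None if none is left."""
        with self._lock:
            for job, state in self._status.items():
                if state == "posted":
                    self._status[job] = "marked"
                    return job
        return None

    def mark_finished(self, job):
        with self._lock:
            self._status[job] = "finished"

    def all_finished(self):
        with self._lock:
            return all(s == "finished" for s in self._status.values())


def worker(board, do_job, results):
    """Keep claiming and doing jobs until the whiteboard is empty."""
    while True:
        job = board.mark_next()
        if job is None:
            return
        results.append((job, do_job(job)))  # list.append is thread-safe in CPython
        board.mark_finished(job)


def run_all(jobs, do_job, n_workers=4):
    board = Whiteboard(jobs)
    results = []
    threads = [threading.Thread(target=worker, args=(board, do_job, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return board, dict(results)
```

There is no central dispatcher deciding who does what: idle workers simply take the next unmarked job, which keeps every machine busy as long as jobs remain.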

Page 16: Data Intensive Engineering and Science

Some important architecture points

From the outset we try in Gaia to:
• Keep it just as simple as possible
• Isolate algorithms from data
  – Already tried to virtualize algorithms
• Let data drive the system (DataTrain)
  – Algorithms mostly not allowed to ‘query’
  – Specific data access patterns; data organised accordingly
  – Similar to the Ferris-Wheel idea (Szalay) but no hopping on/off!
• Access any piece of data on disk exactly once
  – Preload some data on each node
  – E.g. 5 years of attitude quaternions fit in 150-250 MB
• Be distributed: try to avoid large memory processes
  – Again in some cases it makes sense ..

Page 17: Data Intensive Engineering and Science

Notes on AGIS Implementation

• Highly distributed: usually running on >40 nodes, has run on >100 (1400 threads).
• Only uses Java; no special MPI libraries needed. New languages come with almost all you need.
  – Hard part is breaking the problem into distributable parts; no language really helps with that.
• Truly portable: can run on laptops, desktops, clusters and even the Amazon cloud.

Page 18: Data Intensive Engineering and Science

AGIS Evolution – selected iterations

• Assuming:
  – Need 40 iterations
  – More complexity
  – Scaling to full data
  – Availability of a x25 better machine, ~10 TFLOP
• Final AGIS would take ~50 days in ESAC

Page 19: Data Intensive Engineering and Science

Efficiency - an aside

• High load, low network, high CPU!
• Some HPC people tell us we should rewrite in C (save energy)
  – On SOME machines C is faster
  – The energy bill is large (see later)
  – The (re)coding effort is also large
  – IMHO it would cost more than the energy
  – Maintainability? (to 2020)
• Supercomputer centres seem to have very specific macros to include in C code to make it efficient for THEIR machine.
  – Looks a little like a virtual machine
  – Why not provide a better JVM for their machine?
  – Or Windows CLR?

Page 20: Data Intensive Engineering and Science

Virtualization

• Started looking at virtualization ~2007
• Seemed ideal for the multiple test setups needed (VMware)
• Agreed cloud experiment 2009 (with Parsons)
• Had to be convincing
• Running AGIS was the obvious choice
  – Already 4 years in development
  – In Java
    • so it's portable, right!

Page 21: Data Intensive Engineering and Science

AGIS on the cloud

• Took one person less than one week to get running (Parsons, Olias).
  – Main problem: DB config
  – Also found a scalability problem in our code (never had one hundred nodes before)
• It ran at similar performance to our in-house cheap cluster.
  – EC2 indeed is no supercomputer
  – Oracle image was available already
  – AGIS image was straightforward to construct but time consuming; better get it correct!
• Availability of a large number of nodes very interesting; not affordable in house.

Page 22: Data Intensive Engineering and Science

Cost effectiveness of EC2

• AGIS runs intermittently with growing data volume.
• Estimate 2015: ~1.1 MEuro (machine) + 3 MEuro (energy bill; more?) = ~4 MEuro
  – In fact staggered spending for machines
  – Buy machines as data volume increases
• Estimate on Amazon at today's prices: 340K for final run + 1.7 MEuro for intermittent runs (less data) = ~2 MEuro
  – Possibility to use more nodes and finish faster!
• Reckon you still need an in-house machine to avoid wasting time testing on EC2
• Old nut: vendor lock-in? (Sayeed railways..)
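
The two estimates can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the slide, and assumes that the 340K for the final run is in euros like the other amounts; the dictionary keys and the helper name are hypothetical.

```python
# Figures as quoted on the slide, in millions of euros.
# Assumption: the 340K for the final run is in euros like the other amounts.
IN_HOUSE = {"machine": 1.1, "energy": 3.0}
AMAZON = {"final run": 0.34, "intermittent runs": 1.7}


def total(costs):
    """Sum a cost breakdown given in MEuro."""
    return sum(costs.values())


if __name__ == "__main__":
    print(f"in house: {total(IN_HOUSE):.2f} MEuro")
    print(f"Amazon:   {total(AMAZON):.2f} MEuro")
    print(f"ratio:    {total(AMAZON) / total(IN_HOUSE):.2f}")
```

The sums come to 4.1 and 2.04 MEuro, which the slide rounds to ~4 and ~2 MEuro, so on these numbers the cloud option costs about half as much.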

Page 23: Data Intensive Engineering and Science

Final cloudy thought - the title!

• Gaia Archive will have multi-parameter data in time series
  – Solar System sources
  – Galactic sources
  – Extragalactic sources
• Cloud seems an ideal way to allow complex access to an archive.
  – Tony Hey tells us MS is doing it
  – Amazon offers free storage for public datasets..
• Make data available as a database on the cloud
  – Provide a VM to the user
  – User codes in favourite language directly against the DB API
  – BUT it runs local to the data!
• Should we consider a new type of archive?
• Could VO = Virtualized Observatory?
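
As a rough sketch of what "user code running local to the data" could look like, the example below queries a database directly and returns only a small result. Everything in it is an assumption for illustration: the `source` table, its columns and the made-up rows are hypothetical, and an in-memory SQLite database stands in for whatever archive database would actually be provided.

```python
import sqlite3


def bright_source_count(connection, g_limit):
    """Count sources brighter than `g_limit`; only the count leaves the DB."""
    row = connection.execute(
        "SELECT COUNT(*) FROM source WHERE g_mag < ?", (g_limit,)
    ).fetchone()
    return row[0]


def make_demo_archive():
    """Build a tiny in-memory stand-in for an archive table (made-up rows)."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE source (source_id INTEGER PRIMARY KEY, g_mag REAL)"
    )
    connection.executemany(
        "INSERT INTO source VALUES (?, ?)",
        [(1, 12.5), (2, 15.9), (3, 16.4), (4, 19.8)],
    )
    return connection


if __name__ == "__main__":
    archive = make_demo_archive()
    print(bright_source_count(archive, 16.0))
```

The point of the pattern is that the selection runs next to the data and only the answer travels over the network, rather than the catalogue itself.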

Page 24: Data Intensive Engineering and Science

Questions ??

Ariane V188 carrying Herschel and Planck (May 14 2009)

Page 25: Data Intensive Engineering and Science

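AGIS matrix

• Unknowns: source (5·10^8), attitude (4·10^7), calibration (~10^6)

[Figure: block structure of the matrix over the unknowns s1 s2 s3 ... a1 a2 a3 ... c, with source, attitude and calibration blocks; legend: Filled, Sparse, Zeroes; the part used as Gauss-Seidel pre-conditioner is indicated. Credit: Lammers]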

Page 26: Data Intensive Engineering and Science

Data Train load

Page 27: Data Intensive Engineering and Science

EC2 Instance Types

                    Small    Large    Extra Large  High CPU Medium  High CPU Large
Bits                32       64       64           32               64
RAM                 1.7 GB   7.5 GB   15 GB        1.7 GB           7 GB
Disk                160 GB   850 GB   1690 GB      350 GB           1690 GB
EC2 Compute Units   1        4        8            5                20
I/O Performance     Medium   High     High         High             High
Firewall            Yes      Yes      Yes          Yes              Yes

Page 28: Data Intensive Engineering and Science

Architecture in the Cloud

[Diagram: components shown are RunManager, ConvergenceMonitor, Attitude Update Server, Calibration Collector, Attitude Collector, Source Collector, Global Collector, Data Trains (ElementaryDataTrain) and the AGIS DB. Data Trains request AstroElementaries between a range (x,y) through Store, GaiaTable and Object Factory.]

Instances shown in the diagram:
• 1 x Large instance: AGIS AMI, Elastic IP
• <n> x Extra Large or High CPU Large instances: AGIS AMI
• 1 x Large instance: Oracle AMI, Elastic IP
• 3 x Extra Large instances: AGIS AMI

Page 29: Data Intensive Engineering and Science

FLOPS and FLOP count estimates

Current hardware
• 18 dual-processor, single-core Xeon blades
• 8 dual-processor, quad-core Xeon blades
• 5 TB FibreChannel SAN
• Gives about 400 GFLOPS

Run time of one outer iteration: 1 h per 10^6 stars, so 1 cycle with 40 iterations takes about 2 d

CPU occupancy >90% (I/O never a problem)

FLOP count estimates:
• 1.4 · 10^20 for creation of final catalog
• 2.2 · 10^19 for final cycle [50 d on 10 TFLOP machine]
• Estimate is based on a simple model
• Regularly updated
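
The quoted numbers can be cross-checked with a short calculation. The sketch below uses only the figures from the slides (about 400 GFLOPS now, a 10 TFLOP machine later, 2.2 · 10^19 FLOP for the final cycle); the helper name and constants are hypothetical, and the calculation assumes the machine runs at its full nominal rate, which real codes do not achieve.

```python
SECONDS_PER_DAY = 86_400.0


def days_at_peak(flop_count, flops):
    """Days needed to execute `flop_count` operations at `flops` per second."""
    return flop_count / flops / SECONDS_PER_DAY


CURRENT_FLOPS = 400e9       # "about 400 GFLOPS" from the slide
FUTURE_FLOPS = 10e12        # the assumed 10 TFLOP machine
FINAL_CYCLE_FLOP = 2.2e19   # FLOP count quoted for the final cycle

if __name__ == "__main__":
    print(FUTURE_FLOPS / CURRENT_FLOPS)                   # speed-up factor
    print(days_at_peak(FINAL_CYCLE_FLOP, FUTURE_FLOPS))   # days at 100% of peak
```

The speed-up factor comes out at 25, matching the "x25 better machine" assumption, and the final cycle would need roughly 25 days at the full nominal rate. Read this way, the quoted 50 d would correspond to sustaining about half of the nominal rate, although the slides do not state how the 50 d figure was derived.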