
Page 1

JASMIN

Page 2

The Centre for Environmental Data Archival...

www.ceda.ac.uk • BADC • NEODC • UKSSDC • IPCC DDC

Page 3

...has got some new kit!

Page 4

Through e-Infrastructure Investment

JASMIN CEMS

Page 5

CEMS vs JASMIN?

Same...
• location
• hardware
• administrators
• managers
• storage
• networking

Different...
• customers
• drivers
• data access
• virtualisation tools
• funding

Page 6

So what is JASMIN?

For the long answer, and more of this, read the paper: http://arxiv.org/abs/1204.3553

Page 7

JASMIN in 5 bullets

• 4.6 Petabytes of “fast” disk – with excellent connectivity

• A compute platform for running Virtual Machines

• An HPC compute cluster (known as “LOTUS”)

• Satellite nodes at remote sites

• Dedicated network connections to specific sites

Page 8

JASMIN locations

JASMIN-West: University of Bristol, 150 TB

JASMIN-North: University of Leeds, 150 TB

JASMIN-South: University of Reading, 500 TB + compute

JASMIN-Core: STFC RAL, 3.5 PB + compute

Page 9

JASMIN Dedicated Network Connections

Page 10

What is new/different about JASMIN

(Diagram: Users, Processing Power, Storage)

• Archives (BADC & NEODC)

• Group Workspaces

• HPC Overflow

Page 11

JASMIN/CEMS Storage

Project               JASMIN (TB)   CEMS (TB)
BADC Current          350
NEODC Current                       300
CMIP5 Current         350
CEDA Expansion        200           200
CMIP5 Expansion       800           300
CORDEX                300
MONSooN Shared Data   400
Group Workspaces      600
User Scratch          500           300
Totals                3500          1100

4.6 Petabytes sounds like a lot, but... ...we have already allocated it all for existing operations and upcoming projects... ...although some of it does include shared Group Workspaces and user scratch disk.

Page 12

Group Workspaces (GWS)

• For “projects” that want to:
  • Share a LARGE network-accessible disk.
  • Allow access from a number of institutions.
  • Pull data from external sites to a common cache (sketched below).
  • Push data to external sites.
  • Process/analyse the data.
  • Process/analyse the data in conjunction with other archived or group workspace datasets.

NOTE: A Group Workspace is temporary. It is not the same as a long-term archive.
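As a purely illustrative example of the "pull data from external sites to a common cache" use case, the Python sketch below mirrors a couple of remote files into a workspace cache directory. The workspace path and source URLs are hypothetical placeholders, not taken from the slides.

#!/usr/bin/env python
"""Minimal sketch: pull files from an external site into a group workspace cache.

The GWS path and the source URLs below are hypothetical examples; a real
workspace would have its own layout and data sources.
"""
import os
import urllib.request

# Hypothetical group workspace cache directory
GWS_CACHE = "/group_workspaces/jasmin/myproject/cache"

# Hypothetical list of external files the project wants to mirror locally
SOURCE_URLS = [
    "http://example.org/data/file_2011.nc",
    "http://example.org/data/file_2012.nc",
]

def pull_to_cache(urls, cache_dir):
    """Download each URL into the cache unless a copy is already present."""
    os.makedirs(cache_dir, exist_ok=True)
    for url in urls:
        target = os.path.join(cache_dir, os.path.basename(url))
        if os.path.exists(target):
            continue  # already cached
        urllib.request.urlretrieve(url, target)

if __name__ == "__main__":
    pull_to_cache(SOURCE_URLS, GWS_CACHE)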

Page 13

HPC Overflow

• Our community runs models at a number of High Performance Computing (HPC) sites
• Disk space is often limited at HPC sites
• JASMIN provides a way of managing the overflow:
  • GWS for big modelling projects
  • A GWS buys the scientists time to work out which runs should be archived in the long term
  • We are working with MONSooN to better integrate the setup of new projects with a set of GWS (when required)

Page 14

User Access

(Diagram: users connect through the firewall to Login [1], Transfer [1..*], Analysis [1..*] and project-specific [*] VMs.)

Page 15

User Access – VM Types

Login [1]

Transfer [1..*]

Analysis [1..*]

project-specific [*]

• jasmin-login1.ceda.ac.uk – exists; acts as a gateway to the other JASMIN nodes; only one; no other functionality.

• jasmin-xfer1.ceda.ac.uk – exists; for copying data in/out; currently SCP & RSYNC; GridFTP in the pipeline; not the official ingest route; archive read-access; read-write to GWS; will scale up the number of VMs soon (see the sketch below this list).

• jasmin-sci[12].ceda.ac.uk – exists; testing at present; see next slide; common software build.

• *.ceda.ac.uk – some running now; requested by specific projects/users; ROOT access for trusted partners; read-write access to GWS.
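To make the jasmin-xfer1 bullet concrete, here is a minimal Python sketch that pushes a local directory into a group workspace with rsync over SSH, one of the transfer methods the slide says is currently supported. The username and workspace path are hypothetical placeholders; GridFTP (still in the pipeline) is not shown.

#!/usr/bin/env python
"""Minimal sketch: push a local directory to a group workspace via the
transfer node using rsync over SSH.

The username and workspace path are hypothetical; rsync and ssh must be
available locally, and the account must already have read-write access
to the target GWS.
"""
import subprocess

XFER_HOST = "jasmin-xfer1.ceda.ac.uk"  # transfer VM named on the slide
USER = "jbloggs"                        # hypothetical account name
LOCAL_DIR = "model_output/"             # trailing slash: copy directory contents
REMOTE_DIR = "/group_workspaces/jasmin/myproject/incoming/"  # hypothetical GWS path

def push_to_gws():
    """Run rsync in archive mode, showing progress and resuming partial files."""
    cmd = [
        "rsync", "-av", "--partial", "--progress",
        LOCAL_DIR,
        f"{USER}@{XFER_HOST}:{REMOTE_DIR}",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    push_to_gws()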

Page 16

Scientific Analysis VMs

Plans for the Scientific Analysis VMs:

1. Scalable – create multiple VMs as required
2. SSH access via the JASMIN login node
3. Supported set of common software
4. Repeatable build process developed in a managed way
5. Process for requesting software installs/upgrades
6. Read-access to the BADC & NEODC archives (for authorised users; see the sketch below)
7. Write-access to cache (“requests”) disk
8. Read/write-access to GWS (for authorised users)
9. Home directories
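As an illustration of item 6 (read-access to the archives), the sketch below opens a NetCDF file directly from the read-only archive mount on an analysis VM. The file path, the variable name and the assumption that the netCDF4 package is part of the common software build are all hypothetical.

#!/usr/bin/env python
"""Minimal sketch: read a NetCDF file straight from the mounted archive
on a scientific analysis VM.

The archive path and variable name are hypothetical placeholders.
"""
from netCDF4 import Dataset

# Hypothetical path under the read-only BADC archive mount
ARCHIVE_FILE = "/badc/some_dataset/data/temperature_2011.nc"

with Dataset(ARCHIVE_FILE) as ds:
    # List the variables available in the file
    print(list(ds.variables))
    # Pull one (hypothetical) variable into memory as an array
    temps = ds.variables["temperature"][:]
    print(temps.shape, temps.mean())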

Page 17

Creating a Standard Scientific Analysis VM

Using a clearly documented RPM-based approach to install, upgrade and maintain VMs, we should be able to:

• Share the bundle of RPMs with external groups (within NCAS, and beyond)

• Liaise with partners (such as the Met Office) to align our approaches and packages

• Capture most of the commonly requested software in a single VM build

Question: Can we, as a community, share a single VM build across multiple institutes/platforms? Do we have the same requirements?
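One hedged sketch of what capturing the commonly requested software in a single build might look like operationally: a small check that a VM has the agreed RPM bundle installed. The package names below are hypothetical stand-ins, not an actual CEDA package list.

#!/usr/bin/env python
"""Minimal sketch: check whether a VM already has the shared RPM bundle.

The package names are hypothetical stand-ins for whatever the agreed
common build contains; rpm is queried via subprocess.
"""
import subprocess

# Hypothetical subset of the commonly requested software bundle
REQUIRED_RPMS = ["nco", "cdo", "netcdf", "numpy"]

def missing_packages(packages):
    """Return the packages that `rpm -q` reports as not installed."""
    missing = []
    for name in packages:
        result = subprocess.run(
            ["rpm", "-q", name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(name)
    return missing

if __name__ == "__main__":
    print("Missing from this build:", missing_packages(REQUIRED_RPMS))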

Page 18

Parallel Processing Cluster (LOTUS)

LOTUS has just been fully commissioned (testing started yesterday):

- Login ‘head’ node (lotus.jc.rl.ac.uk) is a VM

- 4 Batch queues:

• lotus (8 nodes with 2x6-core Intel 3.5GHz, 48GB RAM, 10Gb networking; 10Gb standard-latency TCP MPI) = 96 cores

• lotus-g (3..6 nodes with 2x6-core Intel 3.0GHz, 96GB RAM, 10Gb networking; 10Gb Gnodal low-latency TCP MPI) = 36..72 cores

• lotus-smp (1 node with 4x12-core AMD 2.6GHz, 256GB RAM, 10Gb networking)

• lotus-serial (co-exists with the lotus-smp and lotus queue hardware)
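For illustration, a job could be submitted to one of these queues from the head node roughly as follows. Only the queue names and the use of LSF's bsub come from the slides; the script path, core count and output-file convention are hypothetical defaults.

#!/usr/bin/env python
"""Minimal sketch: submit a job to one of the LOTUS LSF queues from the
head node. Resource numbers and the job script are hypothetical.
"""
import subprocess

def submit(script="./run_analysis.sh", queue="lotus", cores=12, jobname="myjob"):
    """Build and run a standard bsub command line."""
    cmd = [
        "bsub",
        "-q", queue,                # lotus, lotus-g, lotus-smp or lotus-serial
        "-n", str(cores),           # number of cores requested
        "-J", jobname,              # job name
        "-o", f"{jobname}.%J.out",  # stdout file (%J expands to the job ID)
        script,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    submit()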

Page 19

Parallel Processing Cluster (LOTUS)

- Software:
  • RHEL 6.2 OS
  • Platform LSF batch scheduler
  • Platform MPI (+/- OpenMPI)
  • Intel and PGI compilers
  • Central repository for community-installed software
  • Environment modules

- Full support for MPI I/O on Panasas parallel file systems (see the MPI sketch below)

- Panasas parallel storage connected at 10Gb throughout:
  • CEDA Archive
  • 4TB scratch (expandable to 100’s of TB)
  • Home directories
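A minimal sketch of an MPI-parallel job that could run under these queues, written with mpi4py. mpi4py itself is an assumption on our part: the slides name Platform MPI and OpenMPI but not a Python binding. The script would typically be launched with mpirun inside an LSF job.

#!/usr/bin/env python
"""Minimal sketch: an MPI job dividing a (hypothetical) list of tasks
across ranks. Launch with e.g. `mpirun python this_script.py`.
"""
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's index
size = comm.Get_size()   # total number of MPI processes

# Each rank works on its own slice of a hypothetical list of jobs
jobs = list(range(100))
my_jobs = jobs[rank::size]
print(f"rank {rank}/{size} handling {len(my_jobs)} jobs")

# Gather the per-rank counts on rank 0
counts = comm.gather(len(my_jobs), root=0)
if rank == 0:
    print("total jobs:", sum(counts))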

Page 20

Interesting Plans 1

Some things in the pipeline or currently being discussed...

1. Direct read-access to the archive
2. Availability of Analysis VMs
3. Putting models on JASMIN
   • PRECIS, JULES, NAME?
   • Phase 1: Project-specific VM and SSH access
   • Phase 2: Using the archive for driving data
   • Phase 3: Simple configurations behind a web interface
   • Phase 4: Porting to LOTUS (where appropriate)

Page 21

Interesting Plans 2

4. Use of LOTUS for large data processing
   • Investigating use of parallel IPython tools (see the sketch below)
5. Deployment of the PP-to-CF toolkit on specific VMs:
   • um-conv[12] – running MOHC climate simulation conversion code
   • Supporting HIGEM, APPOSITE and the University of Edinburgh
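A minimal sketch of the "parallel IPython tools" idea using the ipyparallel package (formerly IPython.parallel), assuming an IPython cluster has already been started (e.g. with ipcluster). The per-year task is a hypothetical stand-in for real data processing.

#!/usr/bin/env python
"""Minimal sketch: farm independent tasks out to IPython parallel engines.

Assumes a running IPython cluster; the work items and task function are
hypothetical placeholders.
"""
import ipyparallel as ipp

def process_year(year):
    """Hypothetical stand-in for a per-year processing task."""
    return year, sum(i * i for i in range(10000))

if __name__ == "__main__":
    rc = ipp.Client()               # connect to the running cluster
    view = rc.load_balanced_view()  # hand tasks to whichever engines are free
    years = range(1990, 2011)       # hypothetical work items
    for year, result in view.map_sync(process_year, years):
        print(year, result)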

Page 22

More information

CEDA: http://www.ceda.ac.uk
JASMIN Overview: http://www.ceda.ac.uk/projects/jasmin/
JASMIN Paper: http://arxiv.org/abs/1204.3553

Thank you!

Page 23