Prabhat XLDB May 24, 2016 Realtime Data Analytics at NERSC - 1 -


Prabhat XLDB May 24, 2016

Realtime Data Analytics at NERSC

- 1 -


Lawrence Berkeley National Laboratory

- 2 -


3

National Energy Research Scientific Computing Center


NERSC is the Production HPC & Data Facility for DOE

Largest funder of physical science research in U.S.

• Biological and Environmental Systems
• Applied Math, Exascale
• Materials, Chemistry, Geophysics
• Particle Physics, Astrophysics
• Nuclear Physics
• Fusion Energy, Plasma Physics

- 4 -


Focus on Science

• NERSC supports the broad mission needs of the six DOE Office of Science program offices
• 6,000 users and 750 projects
• Extensive science engagement and user training programs
• 2078 refereed publications in 2015

- 5 -


NERSC - 2016

[System diagram; labels grouped by component]

• Cori: Cray XC-40
  – Ph1: 1630 nodes, 2.3 GHz Intel “Haswell” cores, 203 TB RAM
  – Ph2: >9300 nodes, >60 cores, 16 GB HBM, 96 GB DDR per node
  – 28 PB local scratch, >700 GB/s
  – 1.5 PB “DataWarp”, >1.5 TB/s
• Edison: Cray XC-30
  – 5,576 nodes, 133K 2.4 GHz Intel “IvyBridge” cores, 357 TB RAM
  – 7.6 PB local scratch, 163 GB/s
• Data-Intensive Systems: PDSF, JGI, KBASE, HEP
• Vis & Analytics, Data Transfer Nodes, Adv. Arch. Testbeds, Science Gateways
• Global Scratch: 3.6 PB, 5 x SFA12KE
• /project: 5 PB, DDN9900 & NexSAN
• /home: 250 TB, NetApp 5460
• HPSS: 50 PB stored, 240 PB capacity
• Ethernet & IB Fabric: 32x FDR IB, 16x FDR IB, 14x QDR
• Storage link labels: 80 GB/s, 50 GB/s, 5 GB/s, 12 GB/s
• WAN: 2 x 10 Gb, 1 x 100 Gb
• Software Defined Networking
• Science Friendly Security, Production Monitoring, Power Efficiency

- 6 -


The Cori System

• Cori will transition HPC and data-centric workloads to energy efficient architectures

System named after Gerty Cori, Biochemist and first American woman to receive the Nobel prize in science.

- 7 -


DOE facilities are facing a data deluge

• Astronomy
• Physics
• Light Sources
• Genomics
• Climate


4 V’s of Scientific Big Data

Science Domain | Variety | Volume | Velocity | Veracity
Astronomy | Multiple telescopes, multi-band/spectra | O(100) TB | 100 GB/night – 10 TB/night | Noisy, acquisition artefacts
Light Sources | Multiple imaging modalities | O(100) GB | 1 Gb/s – 1 Tb/s | Noisy, sample preparation/acquisition artefacts
Genomics | Sequencers, mass-spec, proteomics | O(1-10) TB | TB/week | Missing data, errors
High Energy Physics | Multiple detectors | O(100) TB – O(10) PB | 1-10 PB/s reduced to GB/s | Noisy, artefacts, spatio-temporal
Climate Simulations | Multi-variate, spatio-temporal | O(10) TB | 100 GB/s | ‘Clean’, need to account for multiple sources of uncertainty

- 18 -


Why Real-time Analytics? Why Now?

• Large instruments are producing massive data streams
  – Fast, predictable turnaround is integral to the processing pipeline
  – Traditional HPC systems use batch queues with long or unpredictable wait times
• Computational Steering <-> Experimental Steering
  – Change experimental configuration during your precious beam-time!
• Follow-on analysis might be time critical
  – Supernovae candidates, asteroid detection

- 19 -


Real-time Use Cases

• Realtime interaction with experimental facilities
  – Light Sources: ALS, LCLS
• Realtime jobs driven by web portals
  – OpenMSI, MetAtlas
• Computational Steering
  – DIII-D reactor
• Experimental Steering
  – iPTF follow-on

- 20 -


Real-time Queue at NERSC

• NERSC has made a small pool of nodes available for immediate turnaround / “Realtime” computing
  – Up to 32 nodes in realtime queue (1024 cores)
  – Realtime nodes have higher priority than other queues
  – Pool can shrink or grow as needed based on demand
• Approved projects have a small number of nodes available on-demand without queue wait times
  – Configurations on a per-repo basis for
    • Maximum number of jobs
    • Maximum number of cores
    • Wallclock
    • …

- 21 -
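
The per-repo limits listed on this slide can be pictured as a small admission check. What follows is a minimal sketch under stated assumptions, not NERSC's scheduler configuration: the `RepoLimits` fields, the `admit` helper, and the example limits are hypothetical. Only the pool size (32 nodes, 1024 cores) comes from the slide.

```python
from dataclasses import dataclass

# Pool size quoted on the slide: up to 32 nodes, 1024 cores.
POOL_NODES = 32
POOL_CORES = 1024
CORES_PER_NODE = POOL_CORES // POOL_NODES  # implied cores per node


@dataclass(frozen=True)
class RepoLimits:
    """Hypothetical per-repo limits; field names are illustrative only."""
    max_jobs: int
    max_cores: int
    max_wallclock_min: int


def admit(limits, running_jobs, cores_in_use, req_cores, req_wallclock_min):
    """Return True if one more realtime job fits within the repo's limits."""
    if running_jobs + 1 > limits.max_jobs:
        return False
    if cores_in_use + req_cores > limits.max_cores:
        return False
    if req_wallclock_min > limits.max_wallclock_min:
        return False
    return True


# Hypothetical repo: 4 concurrent jobs, 128 cores, 30 minutes per job.
example_limits = RepoLimits(max_jobs=4, max_cores=128, max_wallclock_min=30)

fits = admit(example_limits, running_jobs=1, cores_in_use=64,
             req_cores=64, req_wallclock_min=10)
too_big = admit(example_limits, running_jobs=1, cores_in_use=96,
                req_cores=64, req_wallclock_min=10)
print(CORES_PER_NODE, fits, too_big)
```

The first request fills the hypothetical 128-core allowance exactly and is admitted; the second would exceed it and is rejected.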


Usage (12/2015-04/2016)

- 22 -


Distribution

- 23 -

TOTALS: 332,625 hours used; 23,244 jobs
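
As a rough reading of these totals, the mean charge per job can be computed directly. This is only a back-of-the-envelope sketch: the slide does not say whether "hours" means core-hours, node-hours, or another accounting unit, and a mean says nothing about how the jobs are distributed.

```python
total_hours = 332_625  # from the slide
total_jobs = 23_244    # from the slide

mean_hours_per_job = total_hours / total_jobs
print(f"{mean_hours_per_job:.1f} hours per job on average")
```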


Science Use Case: iPTF

PI: Kasliwal, Nugent, Cao

• Nightly images transferred
• Subtractions performed
• Candidates inserted in database
• Typical turn-around time < 5 minutes

DISCOVERIES: Yi Cao, et al. (2015) Nature, “A strong ultraviolet pulse from a newborn Type Ia supernova”

- 24 -
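
The subtract, detect, and insert steps on this slide can be illustrated with a toy example. This is a minimal sketch, not the iPTF pipeline: the function names, the 5-sigma threshold, the median/MAD noise estimate, the table layout, and the tiny 4x4 images are all assumptions made for illustration.

```python
import sqlite3
import statistics


def subtract(new_image, reference):
    """Pixel-wise difference of two equally sized 2D images."""
    return [[n - r for n, r in zip(new_row, ref_row)]
            for new_row, ref_row in zip(new_image, reference)]


def find_candidates(diff, n_sigma=5.0):
    """Return (y, x, flux) for pixels far above the noise of the difference."""
    pixels = [v for row in diff for v in row]
    median = statistics.median(pixels)
    # Robust noise estimate (scaled median absolute deviation), so a single
    # bright candidate does not inflate the threshold.
    sigma = 1.4826 * statistics.median(abs(v - median) for v in pixels)
    if sigma == 0:
        return []  # no measurable noise level: skip rather than guess
    threshold = median + n_sigma * sigma
    return [(y, x, v)
            for y, row in enumerate(diff)
            for x, v in enumerate(row)
            if v > threshold]


def store(conn, candidates):
    """Insert candidates into a (hypothetical) candidates table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS candidates (y INTEGER, x INTEGER, flux REAL)")
    conn.executemany("INSERT INTO candidates VALUES (?, ?, ?)", candidates)
    conn.commit()


# Toy data: the new image matches the reference up to +/-1 noise,
# except for one bright new source at (y=2, x=2).
reference = [[10, 11, 10, 12], [11, 10, 12, 10],
             [10, 12, 11, 10], [12, 10, 10, 11]]
new_image = [[11, 10, 11, 11], [10, 11, 11, 11],
             [11, 11, 90, 9], [11, 11, 9, 12]]

found = find_candidates(subtract(new_image, reference))
conn = sqlite3.connect(":memory:")
store(conn, found)
rows = conn.execute("SELECT y, x, flux FROM candidates").fetchall()
print(rows)
```

A production pipeline would also align the images and match their point-spread functions before subtracting; those steps are omitted here.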


Science Use Case: Advanced Light Source

Production running at ALS beamlines:
• 24x7 Operation
• 176,293 Datasets
• 155 Beamline Users
• 1,050 TB Data Stored
• 2,379,754 Jobs at NERSC

• Image reconstruction algorithms run on Cori
• 3D volume rendered on SPOT web portal
• ALS beamline users receive instant feedback

- 25 -


Science Use Case: Metabolite Atlas

Ben Bowen, LBL

• Pre-computed fragmentation trees for 10,000+ compounds
• Real-time queue used for comparing raw spectra to trees to obtain possible matches
• Results obtained in minutes
• iPython interface to NERSC

- 26 -


Science Use Case: Cryo-Electron Microscopy

• Structure determination of TFIID
• 10-100 GB image stacks
• Image classification
• Real time queue used for
  – Assessment of data quality during electron microscopy data collection
  – Rapid optimization of data processing strategies

3D structure of TFIID-containing complex. Nogales Lab, Louder et al. (2016), Nature 531 (7596): 604-619


LCLS Workflow Today: 150 TB Analysis in 5 days

[Workflow diagram. Labels: Injector; Cornell–SLAC Pixel Array; Diffraction Detector; DAQ multilevel data acquisition and control system; stream, XTC format; psana running parallel hitfinder / spotfinder / index / integrate chains; Compute Engine Cray XC30; Science DMZ; HPSS, Global Scratch, /Project (NGF); Reconstruction]

Prompt analysis requires Fast Networks & Real-time HPC Queues

Actionable knowledge for Next Beamtime
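
For scale, the headline figure of 150 TB in 5 days can be converted to a sustained rate. This is a back-of-the-envelope sketch that assumes decimal terabytes and continuous processing over the full five days; the slide states neither.

```python
TB = 1e12  # bytes per terabyte (decimal units assumed)

data_bytes = 150 * TB
elapsed_s = 5 * 24 * 3600  # five days, in seconds

sustained_gb_per_s = data_bytes / elapsed_s / 1e9
print(f"{sustained_gb_per_s:.2f} GB/s sustained")
```

Under those assumptions the workflow averages roughly a third of a gigabyte per second.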


LCLS-II 2019: Nanocrystallography Pipeline

Streaming data from the detector to HPC
● 100-1000x data rates
● Indexing, classification, reconstruction, via on-the-fly veto system
● Quasi real-time response (<10 min)
● Terabit/s throughput from front-end electronics
● Petaflop scale analysis on-demand

[Figure labels: HPC 2GB/s; Indexed Diffraction Image; Reconstructed structure]
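
The idea of an on-the-fly veto can be sketched as a streaming filter that drops uninteresting frames before the expensive indexing step. This is an illustration only, not the LCLS-II design: the `count_peaks` stand-in for a hit finder, the thresholds, and the 2x2 toy frames are all assumptions.

```python
from typing import Iterable, Iterator, List

Frame = List[List[int]]


def count_peaks(frame: Frame, threshold: int) -> int:
    """Count pixels brighter than `threshold` (stand-in for a real hit finder)."""
    return sum(1 for row in frame for pixel in row if pixel > threshold)


def veto(frames: Iterable[Frame], threshold: int, min_peaks: int) -> Iterator[Frame]:
    """Yield only frames with at least `min_peaks` bright pixels.

    Written as a generator so frames are examined one at a time as they
    arrive, without buffering the whole stream.
    """
    for frame in frames:
        if count_peaks(frame, threshold) >= min_peaks:
            yield frame


blank = [[0, 1], [1, 0]]    # no diffraction spots
hit = [[50, 1], [0, 60]]    # two bright pixels
stream = [blank, hit, blank, hit, blank]

kept = list(veto(stream, threshold=10, min_peaks=2))
print(len(kept), "of", len(stream), "frames kept")
```

Only the frames that survive the veto would be passed on to indexing, classification, and reconstruction.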


Key Takeaways

• Data streaming and real-time analytics are emerging requirements at NERSC

• Experimental facilities are the heaviest users
  – Light sources, Telescopes

• SDN capabilities are needed to enable data flows directly between compute node and workflow DBs

• Users would like to use realtime nodes to do more long-running interactive work/debugging

• Provisioning resources for real-time queue is an ongoing exercise

- 30 -


Acknowledgments

• Shreyas Cholia
• Doug Jacobsen (NERSC)
• NERSC Real-time queue users!

- 31 -


Thanks!

- 32 -