Big Data Challenges at Diamond
Dr Andrew Richards, Head of Scientific Computing
Diamond Light Source Ltd
Central Laser Facility
ISIS (Spallation Neutron Source)
Diamond Light Source
Research Complex (for users of Diamond, ISIS and CLF)
LHC Tier 1 computing
Harwell Science and Innovation Campus
Diamond Light Source
[Chart: Beamlines or Instruments vs Operational period, as of 25/2/2016]
Beamlines by Village
Macromolecular Crystallography
Soft Condensed Matter
Spectroscopy
Materials
Engineering and Environment
Surfaces and Interfaces
Science SR Examples
Pharmaceutical manufacture & processing
Casting aluminium
Structure of the Histamine H1 receptor
Non-destructive imaging of fossils
A National User Facility for Biological Electron Cryo-microscopy (eBIC)
Wellcome Trust Strategic Award/MRC/BBSRC; applicants: Helen Saibil, Kay Grünewald, David Stuart, Gerhard Materlik
• Funded initially by the Wellcome Trust, MRC and BBSRC at a level of £15.6 M over 5 years, augmented to ~£25 M by additional investment by the Trust in 2016
• The facility currently includes:
- 4 high-end 300 kV automated cryo EMs (Titan Krios, FEI)
- 200 keV automated feeder instrument (Talos Arctica)
- Cryo focussed ion beam instrument (SCIOS)
- Sample prep incl. vitreous sectioning
- Correlative fluorescence/EM
- FEI Polara @ OPIC Oxford for CAT 3 samples
New eBIC Facility
• Initially constructed with two large rooms for two Krios; remodelled to house four - completed 9/16
• Sample preparation, loading and general labs, plus multiple rooms for smaller microscopes
Typical User Setup
GDA – User Interface
• Rich GUI clients – widgets, views, or perspectives using Eclipse plugin framework
Script Editor
Terminal
Live Plotting
Analysis & Visualisation
Log View
• 2007 No detector faster than ~10 MB/s
• 2009 Pilatus 6M system 60 MB/s
• 2011 25 Hz Pilatus 6M 150 MB/s
• 2013 100 Hz Pilatus 6M 600 MB/s
• 2013 ~10 beamlines with 10 GbE detectors (mainly Pilatus and PCO Edge)
• 2016 Percival detector 6 GB/s
[Chart: Detector Performance (MB/s), log scale 1-10,000, 2007-2012]
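The detector rates above grow roughly exponentially. A minimal Python sketch, using only the first and last figures quoted on this slide, estimates the implied doubling time; the exponential fit is an illustrative assumption, not a Diamond analysis:

```python
import math

# Detector throughput milestones from the slide (year -> MB/s).
milestones = {2007: 10, 2009: 60, 2011: 150, 2013: 600, 2016: 6000}

# Fit an exponential through the first and last points to estimate the
# doubling time of peak detector bandwidth (illustrative only).
span = 2016 - 2007
growth = milestones[2016] / milestones[2007]            # 600x in 9 years
doubling_time = span * math.log(2) / math.log(growth)

print(f"growth {growth:.0f}x over {span} years")
print(f"implied doubling time ~{doubling_time:.2f} years")
```

On these figures peak detector bandwidth has doubled roughly every year, which is why the storage and network sections below keep pace in successive upgrades.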
Data Rates
Electron Microscope
• Life Science EMs
- 2x Titan Krios Electron Microscopes
- Gatan Quantum Detector 600 MB/s
• 2x Physical Science EMs to come
• 2x further Life Science EMs to come
Scientific Computing and Infrastructure at Diamond
Underpinning The Applications layer
• Scientific Software
• Data Acquisition
• Controls
Big Data
Data Flow (Mark Heron, Diamond Light Source)
Network Bandwidth Balance (Mark Heron, Diamond Light Source)
[Diagram: detectors feed beamline switches over 1 Gbit/s and 10 Gbit/s links; beamline switches uplink to the central switch at 40 Gbit/s, 40x10 Gbit/s in aggregate; the central switch connects to the cluster switch at 400 Gbit/s; the cluster attaches over IB 10x56 Gbit/s; disks deliver 80 Gbit/s GPFS and 40 Gbit/s Lustre]
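As a rough sanity check of the bandwidth balance, the link capacities quoted in the diagram can be totalled; the aggregation below is an illustrative sketch, not the actual network design:

```python
# Link capacities from the bandwidth-balance diagram, in Gbit/s.
beamline_aggregate = 40 * 10      # 40x10 Gbit/s beamline uplinks
cluster_uplink = 400              # cluster switch <-> central switch
storage_links = 80 + 40           # 80 Gbit/s GPFS + 40 Gbit/s Lustre
ib_fabric = 10 * 56               # IB 10x56 Gbit/s into the cluster

# The 400 Gbit/s cluster uplink just covers the worst-case beamline load.
assert beamline_aggregate <= cluster_uplink
print(f"beamline aggregate: {beamline_aggregate} Gbit/s")
print(f"storage bandwidth:  {storage_links} Gbit/s")
print(f"IB fabric:          {ib_fabric} Gbit/s")
```

The point of the slide is that each tier's aggregate capacity matches or exceeds the tier feeding it, so detectors can stream to disk without a bottleneck at the switches.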
Scientific Computing Infrastructure
• HPC / HTC Cluster (~3500 cores)
- x86, Nvidia GPU (K80, P100)
• High Performance Storage (~7.5 PB)
- Lustre03, Lustre04, GPFS01, GPFS02
• Network infrastructure
- 10 Gb/s, 40 Gb/s to some beamlines
• User Gateways, Visualisation, Data Transfer
- NX Service, Globus endpoint
• Support
- Predominantly Linux infrastructure
- BUT also Windows support to beamlines/EM/etc and VM platforms
- Relies on working with Corporate IT and other groups in Controls and Scientific Software
Statistics: Data
Target        Available  Used    Performance
XFS           50 TB      47 TB   < 1 GB/s
Lustre03      470 TB     370 TB  6 GB/s
Lustre04      140 TB     70 TB   2 GB/s
GPFS01        1 PB       700 TB  15 GB/s
GPFS02        3.7 PB     1.5 PB  40 GB/s
STFC Archive  n/a        12 PB   12-50 TB per day ingest
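A quick, illustrative calculation from these figures: how long each parallel file system would take to fill if written at its peak sustained rate (capacities and rates are from the table above; real workloads are far burstier):

```python
# Available capacity (TB) and peak sustained rate (GB/s), from the table.
systems = {
    "Lustre03": (470, 6),
    "Lustre04": (140, 2),
    "GPFS01": (1000, 15),
    "GPFS02": (3700, 40),
}

fill_hours = {}
for name, (capacity_tb, rate_gb_s) in systems.items():
    seconds = capacity_tb * 1000 / rate_gb_s   # TB -> GB, then divide by rate
    fill_hours[name] = seconds / 3600

for name, hours in fill_hours.items():
    print(f"{name}: full in ~{hours:.1f} h at peak rate")
```

Even the largest system could in principle be filled in about a day of flat-out writing, which is why data must flow on to the archive continuously rather than sit on the high-performance tiers.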
Moving Data Off Site: a Science DMZ
Future Provision
• New Data Centre (CSCR3) in Zone 13 Inner Courtyard
– Completed
• 30-rack data centre for high performance compute and storage
• Will provide flexibility for future upgrades between the current and new data centres
• Will enable larger on-premise platforms for data capture and data analysis
• First new HPC+Storage service planned for summer 2018
• BUT: exploring use of off-premise locations and commercial cloud capabilities for long-term post-processing of post-visit data sets
Scientific Computing: New Computer Room (CSCR3, Inner courtyard)
12 PB of Archived Data
‘Big’ Data Lifecycle challenges
• How much data do you mean by BIG?
• How ‘FAST’ do you need to analyse the data?
• What data can be THROWN AWAY?– (and at what stage?)
• How LONG do you need to keep the data?
• And WHERE? Where do you want to transfer the data to/from?
• And WHERE do we best do Post-Processing?
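To make the "how long" and "where" questions concrete, here is a back-of-envelope projection from the 12-50 TB/day archive ingest range quoted earlier; the 5-year horizon is an assumed parameter for illustration, not a Diamond plan:

```python
# Archive growth sketch: current size and daily ingest range from the
# earlier statistics slide; the horizon is an assumption.
current_pb = 12.0
low_tb_day, high_tb_day = 12, 50
days = 5 * 365

low_pb = current_pb + low_tb_day * days / 1000     # TB -> PB
high_pb = current_pb + high_tb_day * days / 1000
print(f"archive after 5 years: {low_pb:.0f}-{high_pb:.0f} PB")
```

At the top of the ingest range the archive would roughly octuple in five years, which is exactly why the retention, deletion and off-site transfer questions above matter.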
Thank you