Testing the In-Memory Column Store for in-database physics analysis
Dr. Maaike Limper
Maaike Limper - CERN 2
About CERN
Largest machine in the world, the Large Hadron Collider: 27km, 6000+ superconducting magnets
Four main experiments: ATLAS, ALICE, CMS, LHCb
17/6/2014
CERN - European Laboratory for Particle Physics
Support the research activities of 10 000 scientists from 110+ nationalities
Higgs Boson discovery
4 July 2012: Scientists from ATLAS and CMS present Higgs discovery result
› Operation of the Large Hadron Collider and its experiments relies on Oracle databases: conditions data, metadata, logging & monitoring data, …
› … but the data-points in these plots did not come out of a database
Plots of the invariant mass of photon-pairs produced at the LHC show a significant bump around 125 GeV …
CERN openlab
“CERN openlab is a unique public-private partnership between CERN and leading ICT companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide LHC community” http://openlab.web.cern.ch
My project: “Test the possibility of using the Oracle database for physics analysis”
In-database physics analysis
Higgs decay to 2 photons candidate: event display from the ATLAS experiment
In-database physics analysis
Physics Analysis database
› Separate physics-objects in separate tables
› Each physics-object is described by hundreds of variables, which gives wide tables!
[Plot labels: J/ψ, ψ(3686)]
Analysis queries
› Predicate filtering to quickly apply object quality-criteria
› Each analysis-specific query uses unique combination of columns
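As a rough illustration of what such an analysis query looks like, the sketch below uses Python's built-in sqlite3 module purely as a stand-in for the Oracle database. The table name “electron” appears in the slides; the column names (event_id, pt, eta, is_tight), the sample rows and the cut values are hypothetical and chosen only for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; the study itself used Oracle.
conn = sqlite3.connect(":memory:")

# Hypothetical, heavily reduced "electron" table
# (the real physics-object tables have hundreds of columns).
conn.execute(
    "CREATE TABLE electron (event_id INTEGER, pt REAL, eta REAL, is_tight INTEGER)"
)
rows = [
    (1, 45.2, 0.3, 1),
    (1, 12.1, -1.9, 0),
    (2, 33.7, 2.6, 1),
    (3, 27.5, -0.8, 1),
]
conn.executemany("INSERT INTO electron VALUES (?, ?, ?, ?)", rows)

# Predicate filtering: the quality criteria go into the WHERE clause,
# and only the columns this particular analysis needs are selected.
query = """
    SELECT event_id, pt
    FROM electron
    WHERE pt > 25.0 AND ABS(eta) < 2.5 AND is_tight = 1
    ORDER BY event_id
"""
selected = conn.execute(query).fetchall()
print(selected)
```

A different analysis would apply other cuts on other columns, which is why no fixed set of indexes can serve every query.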
The problem
› Analysis query performance is typically limited by I/O reads
  › Full table scans over tables with many columns, while only a few columns are used for each specific analysis
› The combination of columns is unique for each query
  › Can’t index every column!
In-Memory Column Store
Oracle’s In-Memory Column Store provides a solution to reduce I/O read time, especially for tables with many columns
› Profit from fast In-Memory reads
› Read only columns relevant for the specific analysis query
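The benefit of reading only the relevant columns can be sketched with a toy model. This is not how Oracle implements its column store; it only counts how many values a scan has to touch in a row-oriented versus a column-oriented layout. The table width (200 columns), the row count and the column names are hypothetical.

```python
# Toy comparison of row-oriented and column-oriented scans.
# All names and numbers are hypothetical.

N_ROWS = 1000
COLUMNS = [f"var{i}" for i in range(200)]  # a "wide" table with 200 variables

# Column store: one list of values per column.
column_store = {
    name: [float(r + i) for r in range(N_ROWS)]
    for i, name in enumerate(COLUMNS)
}


def values_read_row_store(n_rows, n_columns):
    """A full scan of a row store touches every column of every row."""
    return n_rows * n_columns


def scan_column_store(store, needed):
    """Read only the requested columns; return them and the number of values read."""
    data = {name: store[name] for name in needed}
    return data, sum(len(col) for col in data.values())


needed = ["var3", "var17", "var42"]  # columns used by one specific analysis query
_, column_reads = scan_column_store(column_store, needed)
row_reads = values_read_row_store(N_ROWS, len(COLUMNS))

print(row_reads, column_reads)
```

In this toy case the columnar scan touches 3 of the 200 columns, so the amount of data read shrinks in proportion to the number of columns the query actually uses.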
Compression rates
› COMPRESS FOR QUERY vs CAPACITY HIGH
  › “electron”: typical physics-object data, mixture of int, float, double
  › “Event Filter”: only booleans (mostly false), best compression
  › “Missing Energy”: table with floats & double, worst compression

Compression ratio by table and IMC compression level:
Table name          IMC capacity high    IMC query
“electron”          3.52                 1.97
“Event Filter”      63.46                22.13
“Missing Energy”    1.7                  1.2
Average compression rate of dataset is 2.1 with query compression and 3.6 with capacity high: physics-objects represent the bulk of the data
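The reason the dataset average sits close to the physics-object ratios can be shown with a short calculation: the overall ratio is the total uncompressed size divided by the total compressed size, so the largest tables dominate. The sketch below uses the query-compression ratios from the table; the uncompressed sizes are hypothetical, since the slides do not give table sizes, and the real dataset contains more tables than these three.

```python
# Compression ratios with "query" compression, from the slide's table.
query_ratio = {"electron": 1.97, "Event Filter": 22.13, "Missing Energy": 1.2}

# HYPOTHETICAL uncompressed sizes in GB (not from the study), chosen so that
# the physics-object table makes up the bulk of the data.
uncompressed_gb = {"electron": 90.0, "Event Filter": 5.0, "Missing Energy": 5.0}


def overall_ratio(sizes, ratios):
    """Total uncompressed size divided by total compressed size."""
    total = sum(sizes.values())
    compressed = sum(size / ratios[name] for name, size in sizes.items())
    return total / compressed


overall = overall_ratio(uncompressed_gb, query_ratio)
print(round(overall, 2))
```

With these assumed sizes the overall ratio lands near the “electron” value, even though “Event Filter” compresses more than ten times better.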
Simple query performance
› Comparing “read from disk” vs IMC time: 1000x faster
› Comparing “read from buffer cache” vs IMC time: 40x faster
Note: 2x more memory is needed to put the data in the buffer cache than to place it in the In-Memory Column Store!
Complex query performance
› Comparing “read from disk” vs IMC time: 70x faster
› Comparing “read from buffer cache” vs IMC time: 7x faster
With IMC only 10 s to make this plot, allowing the analyst to quickly optimize results while trying different variable combinations
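To put the speed-up factors in perspective, the following back-of-the-envelope calculation converts them into wall-clock times. It assumes that the 10 s figure and the 70x and 7x factors refer to the same complex query, which the slides suggest but do not state explicitly; the derived times are therefore estimates, not measured results.

```python
# Figures quoted on the slide.
imc_seconds = 10.0            # time to make the plot with the In-Memory Column Store
speedup_vs_disk = 70          # IMC vs "read from disk"
speedup_vs_buffer_cache = 7   # IMC vs "read from buffer cache"

# Implied times, ASSUMING the factors apply to this same query.
disk_seconds = imc_seconds * speedup_vs_disk
buffer_cache_seconds = imc_seconds * speedup_vs_buffer_cache

print(f"disk: ~{disk_seconds / 60:.1f} min, buffer cache: ~{buffer_cache_seconds:.0f} s")
```

Under that assumption, each new variable combination would cost minutes when read from disk but only seconds with the column store, which is what makes interactive optimization practical.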
Conclusion
IMC’s STAR-story:
› Situation: In-database physics analysis is limited by I/O
› Task: Remove I/O bottleneck for any query using any combination of columns in a table
› Action: Use Oracle’s In-Memory Column Store
  › Take advantage of fast reads from cache
  › Columnar compression increases size of data that fits in-memory
  › Access only relevant columns and use predicate pruning to further reduce I/O
› Result: I/O bottleneck removed, real-time in-database physics analysis is now possible*
*While the Oracle database is not currently used for physics analysis, this study shows promising results using the In-Memory Column Store for in-database physics analysis.