Testing the In-Memory Column Store for in-database physics analysis


Testing the In-Memory Column Store for in-database physics analysis
Dr. Maaike Limper

About CERN
  • CERN - European Laboratory for Particle Physics
  • Largest machine in the world, the Large Hadron Collider: 27 km, 6000+ superconducting magnets
  • Four main experiments: ATLAS, ALICE, CMS, LHCb
  • Supports the research activities of 10 000 scientists from 110+ nationalities
Maaike Limper - CERN, 17/6/2014

Higgs Boson discovery
  • 4 July 2012: Scientists from ATLAS and CMS present the Higgs discovery result
  • Operation of the Large Hadron Collider and its experiments relies on Oracle databases: conditions data, metadata, logging & monitoring data
  • But the data points in these plots did not come out of a database

  • Plots of the invariant mass of photon pairs produced at the LHC show a significant bump around 125 GeV
  • What if we could use the database to find those bumps in our data?
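As a rough illustration of the quantity shown in those plots, the invariant mass of a photon pair (photons treated as massless) can be computed from the transverse momentum, pseudorapidity and azimuthal angle of each photon. This is a minimal sketch: the function name and the sample values are hypothetical and are not taken from the ATLAS or CMS analyses.

```python
import math

def diphoton_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles, in the same units as pt."""
    # m^2 = 2 * pT1 * pT2 * (cosh(delta eta) - cos(delta phi))
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    # Guard against tiny negative values from floating-point rounding
    return math.sqrt(max(m2, 0.0))

# Hypothetical example: two back-to-back photons, each with pT = 62.5 GeV
mass = diphoton_mass(62.5, 0.0, 0.0, 62.5, 0.0, math.pi)
```

Filling a histogram with this value for every selected photon pair is what produces the kind of mass spectrum in which a bump can be searched for.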

CERN openlab
  • CERN openlab is a unique public-private partnership between CERN and leading ICT companies
  • Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide LHC community
  • http://openlab.web.cern.ch

  • My project: Test the possibility of using the Oracle database for physics analysis

In-database physics analysis
  • Higgs decay to 2 photons candidate: event display from the ATLAS experiment

In-database physics analysis
  • Physics Analysis database
  • Separate physics objects in separate tables
  • Each physics object is described by hundreds of variables: wide tables!


[Plot labels: J/ψ, ψ(3686)]

Analysis queries
  • Predicate filtering to quickly apply object quality criteria
  • Each analysis-specific query uses a unique combination of columns

The problem
  • Analysis query performance is typically limited by I/O reads
  • Full table scans run over tables with many columns, while only a few columns are used for each specific analysis
  • The combination of columns is unique for each query
  • Can't index every column!
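To make the I/O argument concrete, the sketch below estimates how many bytes a full row-wise table scan reads compared with how many the query actually needs. The table dimensions and the fixed 4-byte value size are hypothetical and are not measurements from this study.

```python
def full_scan_cost(n_rows, n_columns, columns_used, bytes_per_value=4):
    """Return (bytes read by a full row-wise scan, bytes the query needs)."""
    bytes_read = n_rows * n_columns * bytes_per_value
    bytes_needed = n_rows * columns_used * bytes_per_value
    return bytes_read, bytes_needed

# Hypothetical wide table: 10 million rows, 300 columns, query uses 5 of them
bytes_read, bytes_needed = full_scan_cost(10_000_000, 300, 5)
useful_fraction = bytes_needed / bytes_read
```

With these assumed numbers less than 2% of the bytes read are used by the query, which is why the scan is dominated by I/O on columns that are irrelevant to the analysis.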

In-Memory Column Store
  • Oracle's In-Memory Column Store provides a solution to reduce I/O read time, especially for tables with many columns


  • Profit from fast In-Memory reads
  • Read only the columns relevant for the specific analysis query
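The idea of reading only the relevant columns can be sketched with a toy columnar layout in which each column is stored as its own list. This is an illustration of the general technique, not of Oracle's implementation; the table contents, column names and the select helper are hypothetical.

```python
# Toy columnar layout: each column is its own list, so a query touches
# only the columns it names. All names and values here are hypothetical.
electron = {
    "event_id": [1, 1, 2, 3],
    "pt":       [35.2, 12.1, 48.9, 27.5],
    "eta":      [0.4, -1.9, 1.1, 2.7],
    "is_tight": [True, False, True, True],
}

def select(table, columns, predicate_column, predicate):
    """Return the requested columns for the rows that pass the predicate.

    Only the predicate column and the requested columns are read;
    every other column in the table is left untouched.
    """
    keep = [i for i, v in enumerate(table[predicate_column]) if predicate(v)]
    return {c: [table[c][i] for i in keep] for c in columns}

# Apply a quality criterion, then fetch two of the four columns
result = select(electron, ["event_id", "pt"], "is_tight", lambda v: v)
```

In this sketch the eta column is never accessed, which mirrors how a columnar store avoids the cost of scanning wide rows.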

Compression rates
  • COMPRESS FOR QUERY vs CAPACITY HIGH
  • electron: typical physics-object data, a mixture of int, float, double
  • Event Filter: only booleans (mostly false), best compression
  • Missing Energy: table with floats & doubles, worst compression

Table name        Compress ratio IMC cap. high   Compress ratio IMC query
electron          3.52                           1.97
Event Filter      63.46                          22.13
Missing Energy    1.7                            1.2

  • Average compression rate of the dataset is 2.1 with query compression and 3.6 with capacity high: physics objects represent the bulk of the data

Simple query performance
  • Comparing read from disk vs IMC time: 1000x faster
  • Comparing read from buffer cache vs IMC time: 40x faster

  • Note: 2x more memory is needed to put data in the buffer cache compared to placing it in the In-Memory Column Store!
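The memory note appears consistent with the average query compression ratio of about 2 quoted above. As a small sketch, the per-table ratios can be used to estimate the in-memory footprint of a table. The ratios are the ones quoted in the compression table; the 100 GB uncompressed size and the helper name are hypothetical.

```python
# Compression ratios (uncompressed size / compressed size) as quoted
# in the compression table of this presentation.
RATIOS = {
    "electron":       {"capacity_high": 3.52,  "query": 1.97},
    "Event Filter":   {"capacity_high": 63.46, "query": 22.13},
    "Missing Energy": {"capacity_high": 1.7,   "query": 1.2},
}

def in_memory_size_gb(uncompressed_gb, table, level):
    """Estimated in-memory size for a table at a given compression level."""
    return uncompressed_gb / RATIOS[table][level]

# Hypothetical 100 GB electron table with query-level compression
size = in_memory_size_gb(100.0, "electron", "query")
```

Under that assumption the electron table would occupy roughly half of its uncompressed size, and noticeably less with capacity-high compression.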

Complex query performance
  • Comparing read from disk vs IMC time: 70x faster
  • Comparing read from buffer cache vs IMC time: 7x faster

  • With IMC it takes only 10 s to make this plot, allowing the analyst to quickly optimize results while trying different variable combinations

Conclusion
IMC's STAR-story:
  • Situation: In-database physics analysis is limited by I/O
  • Task: Remove the I/O bottleneck for any query using any combination of columns in a table
  • Action: Use Oracle's In-Memory Column Store
      • Take advantage of fast reads from cache
      • Columnar compression increases the size of data that fits in memory
      • Access only relevant columns and use predicate pruning to further reduce I/O
  • Result: I/O bottleneck removed, real-time in-database physics analysis is now possible*

*While the Oracle database is not currently used for physics analysis, this study shows promising results using the In-Memory Column Store for in-database physics analysis.