
GNW01: In-Memory Processing for Databases

Page 1: GNW01: In-Memory Processing for Databases

gluent.com 1

In-Memory Execution for Databases

Tanel Poder, a long time computer performance geek

Page 2: GNW01: In-Memory Processing for Databases

Intro: About me

• Tanel Põder
• Oracle Database Performance geek (18+ years)
• Exadata Performance geek
• Linux Performance geek
• Hadoop Performance geek

• CEO & co-founder:

Expert Oracle Exadata book (2nd edition is out now!)

Instant promotion

Page 3: GNW01: In-Memory Processing for Databases

Gluent as a data virtualization layer

[Diagram labels: App X, App Y, App Z; Gluent; Oracle, Teradata, NoSQL, Big Data Sources, MSSQL]

Open Data Formats!

Page 4: GNW01: In-Memory Processing for Databases

Gluent Advisor

1. Analyzes DB storage use and access patterns for safe offloading

2. 500+ databases analyzed

3. 10+ PB analyzed, 81% offloadable

4. 2-24x query speedup

Interested in analyzing your database?

http://gluent.com/whitepapers

Page 5: GNW01: In-Memory Processing for Databases

Tape is dead, disk is tape, flash is disk, RAM locality is king

Jim Gray, 2006

http://research.microsoft.com/en-us/um/people/gray/talks/flash_is_good.ppt

Page 6: GNW01: In-Memory Processing for Databases

Seagate Cheetah 15k RPM disk specs

200 MB/sec!

Page 7: GNW01: In-Memory Processing for Databases

Spinning disk IO throughput

• B-Tree index-walking disk-based RDBMS
  • 15000 rpm spinning disks
  • ~200 random IOPS per disk
  • ~8 kB read per random IO

• 8 kB * 200 IOPS = 1.6 MB/sec per disk

• Full scanning based workloads
  • Potentially much more data to access & filter
  • Partition pruning, zone maps, storage indexes help to skip data [1]
  • Scan only required columns (formats with large chunk sizes)
  • Sequential IO rate up to 200 MB/sec per disk

[1] http://www.dbms2.com/2013/05/27/data-skipping/

However, index scans can read only a subset of data
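The per-disk arithmetic on this slide can be reproduced with a short Python sketch. The numbers come from the slide; the function name is only illustrative, and decimal units (1 MB = 1000 kB) are assumed because that is what gives the slide's 1.6 MB/sec.

```python
# Per-disk throughput for the two access patterns on the slide.
RANDOM_IOPS = 200             # ~200 random IOPS per 15k rpm disk (slide)
IO_SIZE_KB = 8                # ~8 kB read per random IO (slide)
SEQUENTIAL_MB_PER_SEC = 200   # sequential IO rate per disk (slide)


def random_io_mb_per_sec(iops: int, io_size_kb: int) -> float:
    """Throughput of random single-block reads, in decimal MB/sec."""
    return iops * io_size_kb / 1000


random_mb = random_io_mb_per_sec(RANDOM_IOPS, IO_SIZE_KB)
ratio = SEQUENTIAL_MB_PER_SEC / random_mb

print(f"random IO:     {random_mb:.1f} MB/sec per disk")
print(f"sequential IO: {SEQUENTIAL_MB_PER_SEC} MB/sec per disk")
print(f"sequential is ~{ratio:.0f}x faster per disk")
```

A full scan therefore moves far more bytes per second per disk, but it also has to read far more bytes, which is why data skipping and column pruning matter.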

Page 8: GNW01: In-Memory Processing for Databases

Scanning a bunch of spinning disks can keep your CPUs really busy!

* Not even talking about flash or RAM here!

Page 9: GNW01: In-Memory Processing for Databases

A simple query bottlenecked by CPU

9 GB scanned, processed in 7 seconds:

~1300 MB/s in PX
~80 MB/s per slave

Page 10: GNW01: In-Memory Processing for Databases

A complex query bottlenecked by CPU

Complex query: much more CPU spent on aggregations, joins. 9 GB processed in 1.5 minutes

9 GB / 90 seconds = ~100 MB/s in PX

6 MB/s per slave
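The throughput figures for the simple and the complex query can be checked with the sketch below. The degree of parallelism is not stated on the slides; 16 PX slaves is an assumption inferred from ~1300 MB/s total versus ~80 MB/s per slave, and binary units (1 GB = 1024 MB) are assumed as well.

```python
# Scan throughput for the two parallel queries (9 GB scanned in both cases).
MB_PER_GB = 1024          # assumption: binary units
ASSUMED_PX_SLAVES = 16    # assumption: not on the slides, inferred from 1300 / 80


def throughput(scanned_gb: float, seconds: float, workers: int) -> tuple[float, float]:
    """Return (total MB/s, MB/s per worker)."""
    total = scanned_gb * MB_PER_GB / seconds
    return total, total / workers


simple_total, simple_per_slave = throughput(9, 7, ASSUMED_PX_SLAVES)
complex_total, complex_per_slave = throughput(9, 90, ASSUMED_PX_SLAVES)

print(f"simple query:  {simple_total:.0f} MB/s total, {simple_per_slave:.0f} MB/s per slave")
print(f"complex query: {complex_total:.0f} MB/s total, {complex_per_slave:.1f} MB/s per slave")
```

Both results are far below what a set of disks can deliver sequentially, which is the point of these two slides: the CPU is the bottleneck, not the IO.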

Page 11: GNW01: In-Memory Processing for Databases

If disks and storage subsystems are getting so fast, why all the buzz around in-memory database systems?

* Can't we just cache the old database files in RAM?

Page 12: GNW01: In-Memory Processing for Databases

A simple Data Retrieval test!

• Retrieve 1% of rows out of an 8 GB table:

  SELECT COUNT(*), SUM(order_total)
  FROM orders
  WHERE warehouse_id BETWEEN 500 AND 510

The Warehouse IDs range between 1 and 999

Test data generated by SwingBench tool
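Where the "1% of rows" comes from can be shown with a small calculation. This sketch assumes the rows are spread evenly across the warehouse IDs, which the slide does not state.

```python
# Expected selectivity of: warehouse_id BETWEEN 500 AND 510
LOW, HIGH = 500, 510        # predicate bounds (slide)
MIN_ID, MAX_ID = 1, 999     # warehouse ID range (slide)

matching_ids = HIGH - LOW + 1       # BETWEEN is inclusive on both ends
total_ids = MAX_ID - MIN_ID + 1
selectivity = matching_ids / total_ids   # assumption: uniform distribution

print(f"{matching_ids} of {total_ids} warehouse IDs match: {selectivity:.1%} of rows")
```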

Page 13: GNW01: In-Memory Processing for Databases

Data Retrieval: Test Results

• Remember, this is a very simple scanning + filtering query:

TESTNAME                   PLAN_HASH   ELA_MS   CPU_MS      LIOS  BLK_READ
------------------------- ---------- -------- -------- --------- ---------
test1: index range scan *   16715356   265203    37438    782858    511231
test2: full buffered */ C  630573765   132075    48944   1013913    849316
test3: full direct path *  630573765    15567    11808   1013873   1013850
test4: full smart scan */  630573765     2102      729   1013873   1013850
test5: full inmemory scan  630573765      155      155        14         0
test6: full buffer cache   630573765     7850     7831   1014741         0

Test 5 & Test 6 run entirely from memory

Source: http://www.slideshare.net/tanelp/oracle-database-inmemory-option-in-action

But why 50x difference in CPU usage?
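The "50x" figure follows directly from the CPU_MS column of the two in-memory tests. A quick check, using only numbers from the results table (the dictionary layout is just for illustration):

```python
# ELA_MS and CPU_MS per test, copied from the results table.
results = {
    "test1: index range scan":   {"ela_ms": 265203, "cpu_ms": 37438},
    "test2: full buffered":      {"ela_ms": 132075, "cpu_ms": 48944},
    "test3: full direct path":   {"ela_ms": 15567,  "cpu_ms": 11808},
    "test4: full smart scan":    {"ela_ms": 2102,   "cpu_ms": 729},
    "test5: full inmemory scan": {"ela_ms": 155,    "cpu_ms": 155},
    "test6: full buffer cache":  {"ela_ms": 7850,   "cpu_ms": 7831},
}

inmemory = results["test5: full inmemory scan"]
buffer_cache = results["test6: full buffer cache"]

# Both tests do zero physical reads, so the gap is pure CPU work.
cpu_ratio = buffer_cache["cpu_ms"] / inmemory["cpu_ms"]
print(f"buffer cache scan uses {cpu_ratio:.1f}x more CPU than the in-memory columnar scan")
```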

Page 14: GNW01: In-Memory Processing for Databases

Tape is dead, disk is tape, flash is disk, RAM locality is king

Jim Gray, 2006

http://research.microsoft.com/en-us/um/people/gray/talks/flash_is_good.ppt

Page 15: GNW01: In-Memory Processing for Databases

Latency Numbers Every Programmer Should Know

Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms

Source: https://gist.github.com/jboner/2841832
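Some of the ratios in the right-hand column can be recomputed from the nanosecond figures. The sketch below uses only values from the table; the dictionary keys are shortened labels.

```python
# Latencies in nanoseconds, copied from the table.
latency_ns = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Read 1 MB sequentially from memory": 250_000,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
}


def times_slower(slow: str, fast: str) -> float:
    """How many times slower one operation is than another."""
    return latency_ns[slow] / latency_ns[fast]


ram_vs_l1 = times_slower("Main memory reference", "L1 cache reference")
disk_vs_ram_scan = times_slower("Read 1 MB sequentially from disk",
                                "Read 1 MB sequentially from memory")
seek_vs_ram = times_slower("Disk seek", "Main memory reference")

print(f"main memory reference vs L1 cache: {ram_vs_l1:.0f}x")
print(f"1 MB from disk vs 1 MB from memory: {disk_vs_ram_scan:.0f}x")
print(f"disk seek vs main memory reference: {seek_vs_ram:,.0f}x")
```

The first ratio is the one that matters for the rest of the talk: from the CPU's point of view, a trip to RAM is about 200 times slower than a hit in L1 cache.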

Page 16: GNW01: In-Memory Processing for Databases

CPU = fast

CPU L2/L3 cache in between

RAM = slow

Page 17: GNW01: In-Memory Processing for Databases

RAM access is the bottleneck of modern computers

Waits for RAM access show up as CPU usage in monitoring tools

Want to wait less? Do it less!

Page 18: GNW01: In-Memory Processing for Databases

CPU & cache friendly data structures are key!

[Diagram: row-oriented data block layout. Block: headers and ITL entries, a row directory (#0 offset, #1 offset, #2 offset, #3 offset, …), then rows #0 to #8, … each with a row header. Row: header byte, lock byte, CC byte, then repeated pairs of column length and column data.]

• OLTP: Block -> Row -> Column format
  • 8 kB blocks
  • Great for writes, changes

• Field-length encoding
  • Reading column #100 requires walking through all preceding columns

• Columns (with similar values) not densely packed together

• Not CPU cache friendly for analytics!
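Why field-length encoding makes column #100 expensive can be illustrated with a toy row format. This is a simplified sketch, not Oracle's actual block format: it assumes every column is stored as a single length byte followed by the column data, and the function names are made up for the example.

```python
def encode_row(values: list[bytes]) -> bytes:
    """Store each column as <1-byte length><data>, one after another."""
    out = bytearray()
    for value in values:
        if len(value) > 255:
            raise ValueError("toy format supports at most 255 bytes per column")
        out.append(len(value))
        out += value
    return bytes(out)


def read_column(row: bytes, col_index: int) -> tuple[bytes, int]:
    """Return (column value, number of length bytes inspected).

    There is no per-column offset table, so the only way to find
    column N is to hop over the N columns stored before it.
    """
    pos = 0
    inspected = 0
    for _ in range(col_index):
        pos += 1 + row[pos]      # skip length byte + data of a preceding column
        inspected += 1
    length = row[pos]
    inspected += 1
    return row[pos + 1:pos + 1 + length], inspected


row = encode_row([f"val{i}".encode() for i in range(100)])
value, inspected = read_column(row, 99)   # the 100th column (0-based index 99)
print(value, "found after inspecting", inspected, "length bytes")
```

Every hop depends on the length read in the previous hop, so this is a chain of dependent memory reads repeated for every row scanned.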

Page 19: GNW01: In-Memory Processing for Databases

Scanning columnar data structures

[Diagram: scanning a column in a row-oriented data block versus scanning a column in a column-oriented compression unit, with col1 to col6 values laid out in each format]

Read filter column(s) first. Access only projected columns if matches found.

Reduced memory traffic. More sequential RAM access, SIMD on adjacent data.
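The "filter column first, projected columns only for matches" idea can be sketched with plain Python arrays. This is a minimal illustration with made-up data and names; it counts how many values are touched and does not model caches or SIMD.

```python
from array import array

N = 10_000
# Column-oriented layout: each column is one contiguous array (hypothetical data).
warehouse_id = array("i", (i % 1000 for i in range(N)))
order_total = array("d", (float(i) for i in range(N)))


def columnar_scan(filter_col, projected_col, low, high):
    """Return (match count, sum of projected column, values touched)."""
    # Step 1: scan only the filter column, sequentially.
    matches = [i for i, v in enumerate(filter_col) if low <= v <= high]
    # Step 2: visit the projected column only at matching positions.
    total = sum(projected_col[i] for i in matches)
    touched = len(filter_col) + len(matches)
    return len(matches), total, touched


count, total, touched = columnar_scan(warehouse_id, order_total, 500, 510)
print(f"{count} matches, {touched} values touched out of {2 * N} stored in the two columns")
```

A row-oriented scan would have to step through every row, and in a wide table through every preceding column of each row, to reach the same two fields.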

Page 20: GNW01: In-Memory Processing for Databases

How to measure this stuff?

Page 21: GNW01: In-Memory Processing for Databases

CPU Performance Counters on Linux

# perf stat -d -p PID sleep 30

 Performance counter stats for process id '34783':

      27373.819908 task-clock                #    0.912 CPUs utilized
    86,428,653,040 cycles                    #    3.157 GHz
    32,115,412,877 instructions              #    0.37  insns per cycle
                                             #    2.39  stalled cycles per insn
     7,386,220,210 branches                  #  269.828 M/sec
        22,056,397 branch-misses             #    0.30% of all branches
    76,697,049,420 stalled-cycles-frontend   #   88.74% frontend cycles idle
    58,627,393,395 stalled-cycles-backend    #   67.83% backend cycles idle
       256,440,384 cache-references          #    9.368 M/sec
       222,036,981 cache-misses              #   86.584 % of all cache refs
       234,361,189 LLC-loads                 #    8.562 M/sec
       218,570,294 LLC-load-misses           #   93.26% of all LL-cache hits
        18,493,582 LLC-stores                #    0.676 M/sec
         3,233,231 LLC-store-misses          #    0.118 M/sec
     7,324,946,042 L1-dcache-loads           #  267.589 M/sec
       305,276,341 L1-dcache-load-misses     #    4.17% of all L1-dcache hits
        36,890,302 L1-dcache-prefetches      #    1.348 M/sec

      30.000601214 seconds time elapsed

Measure what's going on inside a CPU!

Metrics explained in my blog entry:

http://bit.ly/1PBIlde
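The derived metrics in the right-hand column of the perf output are simple ratios of the raw counters. The sketch below recomputes a few of them from the numbers above; the dictionary is just a convenient way to hold the counters.

```python
# Raw counters copied from the perf stat output above.
counters = {
    "cycles": 86_428_653_040,
    "instructions": 32_115_412_877,
    "stalled-cycles-frontend": 76_697_049_420,
    "cache-references": 256_440_384,
    "cache-misses": 222_036_981,
    "LLC-loads": 234_361_189,
    "LLC-load-misses": 218_570_294,
    "L1-dcache-loads": 7_324_946_042,
    "L1-dcache-load-misses": 305_276_341,
}

# Instructions per cycle: how much useful work each CPU cycle achieved.
ipc = counters["instructions"] / counters["cycles"]
# Stalled cycles per instruction (frontend stalls are the larger figure here).
stalled_per_insn = counters["stalled-cycles-frontend"] / counters["instructions"]
cache_miss_pct = 100 * counters["cache-misses"] / counters["cache-references"]
llc_miss_pct = 100 * counters["LLC-load-misses"] / counters["LLC-loads"]
l1_miss_pct = 100 * counters["L1-dcache-load-misses"] / counters["L1-dcache-loads"]

print(f"IPC:                     {ipc:.2f}")
print(f"stalled cycles per insn: {stalled_per_insn:.2f}")
print(f"cache misses:            {cache_miss_pct:.2f}% of cache refs")
print(f"LLC load misses:         {llc_miss_pct:.2f}%")
print(f"L1 dcache load misses:   {l1_miss_pct:.2f}%")
```

An IPC of 0.37 together with a last-level cache miss rate above 90% is the signature of a process that is "on CPU" but mostly waiting for RAM.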

Page 22: GNW01: In-Memory Processing for Databases

Testing data access path differences on Oracle 12c

SELECT COUNT(cust_valid) FROM customers_nopart c WHERE cust_id > 0

Run the same query on the same dataset stored in different formats/layouts.

Full details: http://blog.tanelpoder.com/2015/11/30/ram-is-the-new-disk-and-how-to-measure-its-performance-part-3-cpu-instructions-cycles/

Test result data: http://bit.ly/1RitNMr

Page 23: GNW01: In-Memory Processing for Databases

CPU instructions used for scanning/counting 69M rows

Page 24: GNW01: In-Memory Processing for Databases

Average CPU instructions per row processed

• Knowing that the table has about 69M rows, I can calculate the average number of instructions issued per row processed
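The calculation itself is a single division. In the sketch below the row count is from the slide, while the instruction total is a purely hypothetical placeholder (the measured values are in the linked test result data, not in this transcript).

```python
ROWS = 69_000_000   # "about 69M rows" (slide)


def instructions_per_row(total_instructions: int, rows: int = ROWS) -> float:
    """Average CPU instructions issued per row processed."""
    return total_instructions / rows


# Hypothetical input for illustration only, not a measured result:
example_total = 6_900_000_000
example_per_row = instructions_per_row(example_total)
print(f"{example_per_row:.0f} instructions per row")
```

The instruction total for a real run would come from the `instructions` counter of `perf stat` taken while the query executes.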

Page 25: GNW01: In-Memory Processing for Databases

CPU cycles consumed (full scans only)

Page 26: GNW01: In-Memory Processing for Databases

CPU efficiency (Instructions-per-Cycle)

Yes, modern superscalar CPUs can execute multiple instructions per cycle

Page 27: GNW01: In-Memory Processing for Databases

Reducing memory writes within SQL execution

• Old approach:
  1. Read compressed data chunk
  2. Decompress data (write data to temporary memory location)
  3. Filter out non-matching rows
  4. Return data

• New approach:
  1. Read and filter compressed columns
  2. Decompress only required columns of matching rows
  3. Return data
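The two approaches can be contrasted with a toy example. The slides do not say which compression scheme is used; this sketch assumes simple dictionary encoding as a stand-in, and all names and data are made up.

```python
# A dictionary-encoded "status" column plus an uncompressed "amount" column.
dictionary = ["CANCELLED", "NEW", "SHIPPED"]      # distinct values
status_codes = [1, 2, 2, 0, 1, 2, 1, 0]           # compressed column: indexes into dictionary
amounts = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]


def old_approach(wanted: str) -> tuple[list[float], int]:
    """Decompress everything into temporary memory, then filter."""
    decompressed = [dictionary[c] for c in status_codes]   # one temp write per row
    result = [amounts[i] for i, v in enumerate(decompressed) if v == wanted]
    return result, len(decompressed)


def new_approach(wanted: str) -> tuple[list[float], int]:
    """Filter directly on the compressed codes; decompress nothing."""
    wanted_code = dictionary.index(wanted)                 # translate the predicate once
    hits = [i for i, c in enumerate(status_codes) if c == wanted_code]
    result = [amounts[i] for i in hits]                    # touch only matching rows
    return result, 0


old_result, old_temp_writes = old_approach("NEW")
new_result, new_temp_writes = new_approach("NEW")
print(old_result, "temp values written:", old_temp_writes)
print(new_result, "temp values written:", new_temp_writes)
```

Both functions return the same rows; the difference is only in how many values are written to temporary memory along the way.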

Page 28: GNW01: In-Memory Processing for Databases

Memory reads & writes during internal processing

[Chart: memory reads and writes per test. Unit = MB]

Read only requested columns

Rows counted from chunk headers

Scan compressed data: few memory writes

Page 29: GNW01: In-Memory Processing for Databases

Past & Future

Page 30: GNW01: In-Memory Processing for Databases

Some commercial column store history

• Disk-optimized column stores
  • Expressway 103 / Sybase IQ (early '90s)
  • MonetDB (early '90s)
  • Oracle Hybrid Columnar Compression (disk/OLTP optimized)
  • …

• Memory-optimized column stores
  • …
  • SAP HANA (December 2010)
  • IBM DB2 with BLU Acceleration (June 2013)
  • Oracle Database 12c with In-Memory Option (July 2014)
  • …

* Not addressing memory-optimized OLTP/row-stores here

Page 31: GNW01: In-Memory Processing for Databases

Future-proof Open Data Formats!

• Disk-optimized columnar data structures
  • Apache Parquet
    • https://parquet.apache.org/
  • Apache ORC
    • https://orc.apache.org/

• Memory / CPU-cache optimized data structures
  • Apache Arrow
    • Not only storage format
    • … also a cross-system/cross-platform IPC communication framework
    • https://arrow.apache.org/

Page 32: GNW01: In-Memory Processing for Databases

Future

1. RAM gets cheaper + bigger, not necessarily faster

2. CPU caches get larger

3. RAM blends with storage and becomes non-volatile

4. IO subsystems (flash) get even closer to CPUs

5. IO latencies shrink

6. The latency difference between non-volatile storage and volatile RAM shrinks: new database layouts!

7. CPU cache is king: new data structures needed!

Page 33: GNW01: In-Memory Processing for Databases

References

• Slides & Video of this presentation:
  • http://www.slideshare.net/tanelp
  • https://vimeo.com/gluent

• Index range scans vs full scans:
  • http://blog.tanelpoder.com/2014/09/17/about-index-range-scans-disk-re-reads-and-how-your-new-car-can-go-600-miles-per-hour/

• RAM is the new disk series:
  • http://blog.tanelpoder.com/2015/08/09/ram-is-the-new-disk-and-how-to-measure-its-performance-part-1/
  • https://docs.google.com/spreadsheets/d/1ss0rBG8mePAVYP4hlpvjqAAlHnZqmuVmSFbHMLDsjaU/

Page 34: GNW01: In-Memory Processing for Databases

Thanks!

http://gluent.com/whitepapers

We are hiring developers & data engineers!!!

http://[email protected]

@tanelpoder