Quantifying the cost of compression - Presented by Thomas Kejser at SQL Bits


DESCRIPTION

A DBA running SQL Server 2008 or above will often need to decide whether it is worth trading CPU cycles for I/O by enabling row or page compression. The benefits can be significant, but does the cost in core licensing offset the storage capacity saved? Often, comparing the workload before and after compression isn't an option. How, then, can you make an educated guess about the cost of compressing tables and indexes? In this session, we will use Grade of the Steel style workloads to quantify the CPU cost of enabling the different types of compression, using CPU profiling to pin down the cost of this feature.


Thomas Kejser

thomas@kejser.org

http://blog.kejser.org

@thomaskejser

Quantifying the cost of Compression

Grade of the Steel

Why Compress Data?

• Reduce cost of storage
• Reduce the number of IOPS
  • Is this relevant?
• Squeeze more data into DRAM
• But at what cost to CPU?

SQL Server Compression Overview

Typical compression ratios:

• Row: 1-2x
• Page: 3-4x
• Column store: 5-10x
• Backup: 5-10x

How much will my Data Compress?

• It depends on WHAT your data contains
• There is NO WAY to tell until you try
• Anyone who tells you otherwise is lying!
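The "no way to tell until you try" point is easy to demonstrate empirically. The sketch below (Python, using the general-purpose `zlib` compressor rather than SQL Server's algorithms) compresses two same-sized buffers: repetitive, structured data shrinks dramatically, while random data barely compresses at all.

```python
import os
import zlib

# Two buffers of identical size but very different content.
repetitive = b"customer_0001;LONDON;ACTIVE\n" * 2340   # ~64 KB, highly redundant
random_data = os.urandom(len(repetitive))              # ~64 KB, incompressible

for name, data in [("repetitive", repetitive), ("random", random_data)]:
    compressed = zlib.compress(data, 6)
    print(f"{name}: {len(data)} -> {len(compressed)} bytes "
          f"(ratio {len(data) / len(compressed):.1f}x)")
```

Same size in, wildly different size out: the only honest answer to "how much will it compress?" is to run the test on your own data.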

What does it Depend on?

• The information entropy of the data
• The block size of compression
  • Row: 1 column in one row
  • Page: 8K
  • Backup: up to 4MB
  • Column store: 1M rows
• The algorithm you use
  • = the time you have to compress
• How the algorithm fits the data
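The first factor, information entropy, can be made concrete. Shannon entropy of the byte distribution gives a rough lower bound on bits per byte that any byte-oriented compressor must retain; a sketch (the helper name is mine, not from the session):

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution: a rough lower
    bound on bits/byte any byte-level compressor must keep."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Two equally likely byte values: 1 bit of information per byte.
print(entropy_bits_per_byte(b"AAAABBBB" * 100))   # 1.0
# All 256 byte values, uniformly distributed: 8 bits per byte.
print(entropy_bits_per_byte(bytes(range(256))))   # 8.0
```

Low-entropy data leaves room for compression; data already near 8 bits/byte (encrypted, already compressed) will not shrink no matter which SQL Server compression level you pick.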

Nothing about Column Stores

• A VERY interesting subject
• A lot can be done and said about this
• But it would be another full-length presentation
• See my PASS Nordic presentation

How does Row Compression Work?

• Variable-length encoding (new row format)
• Special handling of NULLs and 0s

[Diagram: the 4-byte integer value 1 (0x00 0x00 0x00 0x01) is stored as 4 bytes in SQL Server 2005, but as 1 byte in SQL Server 2008+ with compression enabled]

Consider These Two Examples

CREATE TABLE Numbers (
    foo BIGINT NOT NULL
  , bar BIGINT NOT NULL
)

INSERT INTO Numbers WITH (TABLOCK) (foo, bar)
SELECT (n2.n - 1) * 1000 + (n1.n - 1)
     , (n2.n - 1) * 1000 + (n1.n - 1)
FROM fn_nums(1000) n1
CROSS JOIN fn_nums(1000) n2

CREATE UNIQUE CLUSTERED INDEX CIX ON Numbers(foo)
WITH (DATA_COMPRESSION = ROW)

EXEC sp_spaceused 'Numbers'

CREATE TABLE Numbers (
    foo BIGINT NOT NULL
  , bar BIGINT NOT NULL
)

INSERT INTO Numbers WITH (TABLOCK) (foo, bar)
SELECT (n2.n - 1) * 1000 + (n1.n - 1)
     , (n2.n - 1) * 1000000000 + (n1.n - 1)
FROM fn_nums(1000) n1
CROSS JOIN fn_nums(1000) n2

CREATE UNIQUE CLUSTERED INDEX CIX ON Numbers(foo)
WITH (DATA_COMPRESSION = ROW)

EXEC sp_spaceused 'Numbers'

Result: 11 MB for the first example, 13 MB for the second
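The ~2 MB difference follows from the variable-length encoding: in the first table both columns stay below 10^6 and need about 3 bytes per value, while in the second table `bar` grows toward 10^12 and needs about 5 bytes. A Python sketch of the idea (not SQL Server's actual row format; `varlen_size` is an illustrative helper):

```python
def varlen_size(value: int) -> int:
    """Bytes needed to store a non-negative integer once leading zero
    bytes are dropped - the core idea behind row compression's
    variable-length encoding (not SQL Server's exact format)."""
    size = 0
    while True:
        size += 1
        value >>= 8
        if value == 0:
            return size

# Example 1: both columns max out below 1,000,000.
print(varlen_size(999_999))        # 3 bytes instead of 8 (BIGINT)
# Example 2: bar maxes out around 999 * 10^9.
print(varlen_size(999 * 10**9))    # 5 bytes instead of 8
```

One million rows times roughly 2 extra bytes for `bar` accounts for the gap between 11 MB and 13 MB.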

How does Page Compression Work?

• Combination of:
  • Prefix compression – common prefix inside a column
  • Dictionary – common values across all columns
• Pages are kept compressed in the buffer pool
• So it should be more expensive to access them even there?

Page Compression - Column Prefix

[Diagram: each column on the page stores its longest common byte prefix once in an anchor record (e.g. 0x5B8D80, which is decimal 6000000); each row then stores only a count of reused prefix bytes plus the bytes that differ]

Page Compression - Dictionary

[Diagram: after column-prefix compression, values that repeat across all columns on the page are moved into a page dictionary, and the rows store small symbol indexes (0, 1, ...) instead]
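The two passes can be sketched in a few lines of Python. This is a toy model to make the mechanics concrete; the record layout is invented for illustration and SQL Server's actual on-page format differs:

```python
import os
from collections import Counter

def compress_page(rows):
    """Toy sketch of PAGE compression: per-column prefix extraction,
    then a page-wide dictionary of values repeated in any column."""
    ncols = len(rows[0])
    # Pass 1: column prefix - factor out each column's common prefix.
    prefixes = [os.path.commonprefix([row[c] for row in rows])
                for c in range(ncols)]
    trimmed = [[row[c][len(prefixes[c]):] for c in range(ncols)]
               for row in rows]
    # Pass 2: dictionary - values seen more than once anywhere on the
    # page move to a dictionary; rows keep only a small symbol index.
    counts = Counter(v for row in trimmed for v in row)
    dictionary = [v for v, n in counts.items() if n > 1 and v]
    symbol = {v: i for i, v in enumerate(dictionary)}
    body = [[symbol.get(v, v) for v in row] for row in trimmed]
    return prefixes, dictionary, body

rows = [["Lambert", "5000001"], ["Lambda", "5000001"], ["Lamb", "5000099"]]
prefixes, dictionary, body = compress_page(rows)
print(prefixes)    # per-column anchors: ['Lamb', '50000']
print(dictionary)  # repeated suffix '01' moves to the page dictionary
```

Note that both passes only pay off when rows on the same page share prefixes or values, which is why the ratio is so data-dependent.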

How fast is it?

Our Reasonably Priced Server

• 2-socket Xeon E5645
  • 2 x 6 cores
  • 2.4 GHz
  • NUMA enabled, HT off
• 12 GB RAM
• 1 ioDrive2 Duo
  • 2.4 TB flash
  • 4K formatted
  • 64K AUS
  • 1 stripe
• Power save off
• Win 2008R2
• SQL 2012

Image Source: DeviantArt

Test: Table Scan with Page Compress

• Use TPC-H
  • Scale factor 10
  • 10GB total dataset
• Apply PAGE compress

Compression                    Size    Build Time  CPU Load
None                           6.5 GB  18 sec      100%
ROW                            4.9 GB  16 sec      100%
PAGE                           3.9 GB  38 sec      100%
Gzip -5                        2.5 GB  NA          NA
NTFS Compress (best: 4K AUS)   3.7 GB  NA          NA
Windows zipping                2.3 GB  NA          NA

Compressed tables are faster, right?

• Fast scan through table
• Table resident in memory first
• Scan NONE and PAGE

Compression     Size    Scan Time  CPU Load
None - Memory   6.5 GB  4 sec      100%
PAGE - Memory   3.9 GB  8 sec      100%

Ahh.. But Thomas: you didn't do I/O

Compression     Size    Scan Time  CPU Load
None - Memory   6.5 GB  4 sec      100%
PAGE - Memory   3.9 GB  8 sec      100%
None - I/O      6.5 GB  6 sec      100%
PAGE - I/O      3.9 GB  9 sec      100%

Where does the time go?

• Our old friend xperf:

xperf -on base -stackwalk profile

Function                                        NONE    PAGE
MinMaxStep                                      15.9%   9.2%
IndexDataSetSession::GetNextRowValuesInternal   14.6%   17.0%
CEsExec::GeneralEval                            11.8%   6.7%
CValXVarTable::GetDataX                         8.8%    5.2%
CXVariant::CopyDeep                             7.8%    4.6%
memcpy                                          6.5%    0.4%
CValXVarTableRow::SetDataX                      6.0%    3.3%
GetDataFromXvar8                                5.2%    2.9%
RowsetNewSS::FetchNextRow                       2.7%    1.6%
ps_dl_sqlhilo                                   2.7%    1.5%
GetData                                         2.1%    1.2%
GetDataFromXvar                                 1.8%    1.0%
CTEsCompare<122;122>::BlCompareXcArgArg         1.5%    0.9%
CTEsCompare<58;58>::BlCompareXcArgArg           1.2%    0.6%
CTEsCompare<167;167>::BlCompareXcArgArg         1.1%    0.7%
CTEsCompare<56;56>::BlCompareXcArgArg           1.1%    0.6%
SetMultData                                     1.1%    0.6%
CQScanTableScanNew::GetRow                      1.0%    0.6%
CQScanStreamAggregateNew::GetRowHelper          1.0%    0.6%
CTEsCompare<52;52>::BlCompareXcArgArg           0.8%    0.7%

Compression-only functions (PAGE):
ScalarCompression::AddPadding                   0.7%
PageComprMgr::DecompressColumn                  4.7%
DataAccessWrapper::DecompressColumnValue        9.9%
CDRecord::LocateColumnInternal                  14.4%
AnchorRecordCache::LocateColumn                 2.3%
DataAccessWrapper::StoreColumnValue             4.3%
Additional runtime of GetNextRowValuesInternal  2.4%
Total compression overhead                      38.7%

Test: Singleton Row Fetch

• Sample from LINEITEM
• Force loop join with index seeks
• Do 1.4M seeks

Singleton seeks – Cost of compression

Compression     Seek (1.4M seeks)  CPU Load
None - Memory   13 sec             100% one core
PAGE - Memory   24 sec             100% one core
None - I/O      21 sec             100% one core
PAGE - I/O      32 sec             100% one core

xperf -on base -stackwalk profile

Function                                        % Weight
CDRecord::LocateColumnInternal                  0.82%
DataAccessWrapper::DecompressColumnValue        0.47%
SearchInfo::CompareCompressedColumn             0.28%
PageComprMgr::DecompressColumn                  0.24%
AnchorRecordCache::LocateColumn                 0.18%
ScalarCompression::Compare                      0.11%
Additional runtime of GetNextRowValuesInternal  0.14%
ScalarCompression::AddPadding                   0.04%
Total compression                               2.28%
Total CPU (single core)                         8.33%
Compression %                                   27.00%
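The 27% figure is just arithmetic on the samples: the compression-related functions sum to 2.28% of machine-wide CPU, and 8.33% is what one fully busy core looks like on this 12-core box (100/12). A quick check in Python:

```python
# Compression-related sample weights from the xperf profile above.
compression_pct = [0.82, 0.47, 0.28, 0.24, 0.18, 0.11, 0.14, 0.04]

total_compression = sum(compression_pct)      # 2.28% of machine CPU
total_cpu = 100 / 12                          # one busy core of 12 = 8.33%
share = 100 * total_compression / total_cpu   # share of that core's work

print(f"{total_compression:.2f} / {total_cpu:.2f} = {share:.0f}%")
```

Roughly a quarter of the CPU burned by the singleton-seek workload goes to compression bookkeeping.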

Test: Updates of Pages

Compression     Update 1.4M  CPU Load
None - Memory   13 sec       100% one core
PAGE - Memory   54 sec       100% one core
None - I/O      17 sec       100% one core
PAGE - I/O      59 sec       100% one core

L_QUANTITY is NOT NULL, i.e. in-place UPDATE

UPDATE Compression Burners

Function                                        CPU %
qsort                                           0.86
CDRecord::Resize                                0.84
CDRecord::LocateColumnInternal                  0.36
perror                                          0.36
Page::CompactPage                               0.36
ObjectMetadata::`scalar deleting destructor'    0.27
SearchInfo::CompareCompressedColumn             0.24
CDRecord::InitVariable                          0.19
CDRecord::LocateColumnWithCookie                0.18
memcmp                                          0.16
PageDictionary::ValueToSymbol                   0.16
Record::DecompressRec                           0.14
PageComprMgr::DecompressColumn                  0.14
CDRecord::InitFixedFromOld                      0.10
SOS_MemoryManager::GetAddressInfo64             0.08
AnchorRecordCache::LocateColumn                 0.08
CDRecord::GetDataForAllColumns                  0.08
ScalarCompression::Compare                      0.07
PageComprMgr::CompressColumn                    0.07
Record::CreatePageCompressedRecNoCheck          0.06
memset                                          0.05
PageComprMgr::ExpandPrefix                      0.04
PageRef::ModifyColumnsInternal                  0.04
Page::ModifyColumns                             0.03
DataAccessWrapper::ProcessAndCompressBuffer     0.03
SingleColAccessor::LocateColumn                 0.03
CDRecord::BuildLongRegionBulk                   0.02
ChecksumSectors                                 0.02
Page::MCILinearRegress                          0.02
DataAccessWrapper::DecompressColumnValue        0.02
SOS_MemoryManager::GetAddressInfo               0.02
CDRecord::FindDiff                              0.02
AnchorRecordCache::Init                         0.02
PageComprMgr::CombinePrefix                     0.01
Total                                           5.17

Compression burns 5.17 out of 8.55 … approx. 60% of the CPU.

And now, for something unusual!

• I am going to do a demo!
• Let's see xperf in action

Benchmark Test: HammerDB

• A benchmark tool simulating TPC-type workloads
• It is NOT TPC, but it helps you set up something similar
• Below is a "TPC-C like" workload

Compression     TPM     CPU Load
NONE Compress   16.7 M  70% all cores
PAGE Compress   14.5 M  70% all cores

What about locks and contention?

XEvent trace of lock acquire/release

How long are locks held?

[Chart: lock-held cycle count (avg and stddev, in CPU kilocycles, 0-600 scale) compared for PAGE vs. NONE]
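The chart's avg/stddev numbers come from pairing each lock_acquired event with its matching lock_released event and measuring the cycles in between. A minimal Python sketch of that post-processing; the `(cycles, event, resource)` tuple layout is an assumption for illustration, not the XEvent file format:

```python
from statistics import mean, stdev

def lock_hold_stats(events):
    """Pair acquire/release events per lock resource and return the
    average and standard deviation of hold time in cycles."""
    acquired = {}   # resource -> cycle count at acquire
    holds = []
    for cycles, event, resource in events:
        if event == "lock_acquired":
            acquired[resource] = cycles
        elif event == "lock_released" and resource in acquired:
            holds.append(cycles - acquired.pop(resource))
    return mean(holds), stdev(holds)

# Tiny hypothetical trace: two page locks held for 360 and 440 cycles.
trace = [
    (100, "lock_acquired", "page:1:200"),
    (460, "lock_released", "page:1:200"),
    (500, "lock_acquired", "page:1:312"),
    (940, "lock_released", "page:1:312"),
]
avg, sd = lock_hold_stats(trace)
print(avg, sd)
```

Longer average hold times under PAGE compression would mean more contention for the same workload, which is exactly what the lock trace is checking.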
