A DBA running SQL Server 2008 or above will often need to understand whether it is worth trading CPU cycles for I/O by enabling row or page compression. The benefits can be significant, but does the cost in core licensing offset the storage capacity saved? Often, comparing the workload before and after compression isn't an option. How, then, can you make an educated guess about the cost of compressing tables and indexes? In this session, we will use Grade of the Steel-style workloads and CPU profiling to quantify the cost of enabling the different types of compression.
Thomas Kejser
http://blog.kejser.org
@thomaskejser
Quantifying the cost of Compression
Grade of the Steel
• Reduce Cost of Storage
• Reduce the number of IOPS
• Is this relevant?
• Squeeze more memory into DRAM
• But at what cost to CPU?
Why Compress Data?
SQL Server Compression Overview
Page: 3-4x
Row: 1-2x
Column store: 5-10x
Backup: 5-10x
• It depends on WHAT your data contains
• There is NO WAY to tell until you try
• Anyone who tells you otherwise is lying!
How much will my Data Compress?
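You cannot know the ratio without trying, but SQL Server can try for you: `sp_estimate_data_compression_savings` samples the object into tempdb and compresses the sample. A minimal sketch, assuming a TPC-H style table (the object name is an example):

```sql
-- Estimate PAGE compression savings by sampling dbo.LINEITEM into tempdb
-- and compressing the sample. The result is data-driven, but still only
-- an estimate - and the table name here is an assumption.
EXEC sp_estimate_data_compression_savings
      @schema_name      = 'dbo'
    , @object_name      = 'LINEITEM'
    , @index_id         = NULL          -- NULL = all indexes on the table
    , @partition_number = NULL          -- NULL = all partitions
    , @data_compression = 'PAGE';
```

The proc returns current and estimated compressed sizes per index and partition; it is available from SQL Server 2008 onward.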
• The Information Entropy of the data
• The block size of compression
• Row: 1 column in one row
• Page: 8K
• Backup: up to 4MB
• Column store: 1M rows
• The algorithm you use
• = the time you have available for compression
• How the algorithm fits the data
What does it Depend on?
• VERY interesting subject
• A lot can be done and said about this
• But it would be another full-length presentation
• See my PASS Nordic presentation
Nothing about Column Stores
How does Row Compression Work?
• Variable Length Encoding (new row format)
• Special Handling of NULL and 0s
0 0 0 1 <- a 4-byte integer holding the value 1
SQL Server 2005 row format: 4 bytes
SQL Server 2008+ (compression enabled): 1 byte
Consider These Two Examples
CREATE TABLE Numbers (
      foo BIGINT NOT NULL
    , bar BIGINT NOT NULL
)
INSERT INTO Numbers WITH (TABLOCK) (foo, bar)
SELECT (n2.n - 1) * 1000 + (n1.n - 1)
     , (n2.n - 1) * 1000 + (n1.n - 1)
FROM fn_nums(1000) n1
CROSS JOIN fn_nums(1000) n2
CREATE UNIQUE CLUSTERED INDEX CIX ON Numbers(foo)
    WITH (DATA_COMPRESSION = ROW)
EXEC sp_spaceused 'Numbers'
CREATE TABLE Numbers (
      foo BIGINT NOT NULL
    , bar BIGINT NOT NULL
)
INSERT INTO Numbers WITH (TABLOCK) (foo, bar)
SELECT (n2.n - 1) * 1000 + (n1.n - 1)
     , (n2.n - 1) * 1000000000 + (n1.n - 1)
FROM fn_nums(1000) n1
CROSS JOIN fn_nums(1000) n2
CREATE UNIQUE CLUSTERED INDEX CIX ON Numbers(foo)
    WITH (DATA_COMPRESSION = ROW)
EXEC sp_spaceused 'Numbers'
Result: 11 MB vs. 13 MB - the larger values in bar in the second example need more bytes under variable-length encoding
• Combination of
• Prefix Compression – common prefix inside column
• Dictionary – Common values across all columns
• Pages are kept compressed in buffer pool
• So: shouldn't accessing them be more expensive even in memory?
How does Page Compression Work?
Page Compression - Column Prefix
[Diagram: a page holding column values such as 0x5B8D80, 0x41AABB, 0x9A4041, 0x112233 and 0x5CAABB, plus a sample row (Lambert, 5000000, NULL). The longest common prefix of each column is stored once in an anchor record; each value on the page then stores only the bytes that differ from that prefix.]
Decimal 6000000 is hex 0x5B8D80
Page Compression - Dictionary
[Diagram: after column-prefix compression, values that repeat anywhere on the page are moved into a page-level dictionary and replaced by small symbol references.]
How fast is it?
Our Reasonably Priced Server
• 2-socket Xeon E3645
• 2 x 6 cores
• 2.4 GHz
• NUMA enabled, HT off
• 12 GB RAM
• 1 ioDrive2 Duo
• 2.4 TB Flash
• 4K formatted
• 64K AUS
• 1 Stripe
• Power Save Off
• Win 2008R2
• SQL 2012
Image Source: DeviantArt
Test: Table Scan with Page Compress
• Use TPC-H
• Scale factor 10
• 10GB total dataset
• Apply PAGE compress
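Applying PAGE compression to an existing table is a rebuild; a sketch against the TPC-H LINEITEM table (the index name in the comment is an assumption):

```sql
-- Rebuild the table (heap or clustered index) with PAGE compression.
ALTER TABLE dbo.LINEITEM
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Nonclustered indexes are compressed separately, e.g.:
-- ALTER INDEX IX_LINEITEM_PARTKEY ON dbo.LINEITEM
--     REBUILD WITH (DATA_COMPRESSION = PAGE);
```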
Compression Size Build Time CPU Load
None 6.5 GB 18 sec 100%
ROW 4.9 GB 16 sec 100%
PAGE 3.9 GB 38 sec 100%
Gzip -5 2.5 GB NA NA
NTFS Compress (Best: 4K AUS) 3.7 GB NA NA
Windows Zipping 2.3 GB NA NA
• Fast Scan Through Table
• Table resident in memory first
• Scan NONE and PAGE
Compressed tables are faster, right?
Compression Size Scan Time CPU Load
None - Memory 6.5 GB 4 sec 100%
PAGE - Memory 3.9 GB 8 sec 100%
Ahh.. But Thomas: you didn’t do I/O
Compression Size Scan Time CPU Load
None - Memory 6.5 GB 4 sec 100%
PAGE - Memory 3.9 GB 8 sec 100%
None – I/O 6.5 GB 6 sec 100%
PAGE – I/O 3.9 GB 9 sec 100%
• Our old friend xperf
Where does the time go?
xperf -on base -stackwalk profile
Function NONE PAGE
MinMaxStep 15.9% 9.2%
IndexDataSetSession::GetNextRowValuesInternal 14.6% 17.0%
CEsExec::GeneralEval 11.8% 6.7%
CValXVarTable::GetDataX 8.8% 5.2%
CXVariant::CopyDeep 7.8% 4.6%
memcpy 6.5% 0.4%
CValXVarTableRow::SetDataX 6.0% 3.3%
GetDataFromXvar8 5.2% 2.9%
RowsetNewSS::FetchNextRow 2.7% 1.6%
ps_dl_sqlhilo 2.7% 1.5%
GetData 2.1% 1.2%
GetDataFromXvar 1.8% 1.0%
CTEsCompare<122;122>::BlCompareXcArgArg 1.5% 0.9%
CTEsCompare<58;58>::BlCompareXcArgArg 1.2% 0.6%
CTEsCompare<167;167>::BlCompareXcArgArg 1.1% 0.7%
CTEsCompare<56;56>::BlCompareXcArgArg 1.1% 0.6%
SetMultData 1.1% 0.6%
CQScanTableScanNew::GetRow 1.0% 0.6%
CQScanStreamAggregateNew::GetRowHelper 1.0% 0.6%
CTEsCompare<52;52>::BlCompareXcArgArg 0.8% 0.7%
ScalarCompression::AddPadding - 0.7%
PageComprMgr::DecompressColumn - 4.7%
DataAccessWrapper::DecompressColumnValue - 9.9%
CDRecord::LocateColumnInternal - 14.4%
AnchorRecordCache::LocateColumn - 2.3%
DataAccessWrapper::StoreColumnValue - 4.3%
Additional Runtime of GetNextRowValuesInternal - 2.4%
Total 38.7%
• Sample from LINEITEM
• Force loop join with index seeks
• Do 1.4M seeks
Test: Singleton Row Fetch
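One way to generate the singleton seeks: force a nested loop join so every outer row becomes a clustered index seek into LINEITEM. A sketch only - the driving table and the MAXDOP hint are assumptions:

```sql
-- Each ORDERS row drives index seeks into LINEITEM via the forced loop
-- join; MAXDOP 1 keeps the test on a single core, matching the deck's
-- single-core measurements.
SELECT MAX(l.L_QUANTITY)
FROM dbo.ORDERS AS o
INNER LOOP JOIN dbo.LINEITEM AS l
    ON l.L_ORDERKEY = o.O_ORDERKEY
OPTION (MAXDOP 1);
```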
Singleton seeks – Cost of compression
Compression Seek (1.4M seeks) CPU Load
None - Memory 13 sec 100% one core
PAGE - Memory 24 sec 100% one core
None – I/O 21 sec 100% one core
PAGE – I/O 32 sec 100% one core
Function % Weight
CDRecord::LocateColumnInternal 0.82%
DataAccessWrapper::DecompressColumnValue 0.47%
SearchInfo::CompareCompressedColumn 0.28%
PageComprMgr::DecompressColumn 0.24%
AnchorRecordCache::LocateColumn 0.18%
ScalarCompression::AddPadding 0.04%
ScalarCompression::Compare 0.11%
Additional Runtime of GetNextRowValuesInternal 0.14%
Total Compression 2.28%
Total CPU (single core) 8.33%
Compression % 27.00%
xperf -on base -stackwalk profile
Test: Updates of pages
Compression Update 1.4M CPU Load
None - Memory 13 sec 100% one core
PAGE - Memory 54 sec 100% one core
None – I/O 17 sec 100% one core
PAGE – I/O 59 sec 100% one core
L_QUANTITY is NOT NULL, i.e. an in-place UPDATE (the row does not change size)
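The update itself can be as simple as rewriting a fixed-width NOT NULL column, which keeps the row the same size and therefore in place. A sketch; the predicate limiting the rowcount is an assumption:

```sql
-- In-place update: L_QUANTITY is fixed-width and NOT NULL, so the row does
-- not grow. On a PAGE-compressed table the page must still be located,
-- decompressed, modified and recompressed, which is where the extra CPU goes.
UPDATE dbo.LINEITEM
SET L_QUANTITY = L_QUANTITY + 1
WHERE L_ORDERKEY <= 1400000   -- example predicate; exact range is an assumption
OPTION (MAXDOP 1);
```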
Function CPU %
qsort 0.86
CDRecord::Resize 0.84
CDRecord::LocateColumnInternal 0.36
perror 0.36
Page::CompactPage 0.36
ObjectMetadata::`scalar deleting destructor' 0.27
SearchInfo::CompareCompressedColumn 0.24
CDRecord::InitVariable 0.19
CDRecord::LocateColumnWithCookie 0.18
memcmp 0.16
PageDictionary::ValueToSymbol 0.16
Record::DecompressRec 0.14
PageComprMgr::DecompressColumn 0.14
CDRecord::InitFixedFromOld 0.1
SOS_MemoryManager::GetAddressInfo64 0.08
AnchorRecordCache::LocateColumn 0.08
CDRecord::GetDataForAllColumns 0.08
ScalarCompression::Compare 0.07
PageComprMgr::CompressColumn 0.07
Record::CreatePageCompressedRecNoCheck 0.06
memset 0.05
PageComprMgr::ExpandPrefix 0.04
PageRef::ModifyColumnsInternal 0.04
Page::ModifyColumns 0.03
DataAccessWrapper::ProcessAndCompressBuffer 0.03
SingleColAccessor::LocateColumn 0.03
CDRecord::BuildLongRegionBulk 0.02
ChecksumSectors 0.02
Page::MCILinearRegress 0.02
DataAccessWrapper::DecompressColumnValue 0.02
SOS_MemoryManager::GetAddressInfo 0.02
CDRecord::FindDiff 0.02
AnchorRecordCache::Init 0.02
PageComprMgr::CombinePrefix 0.01
Total 5.17
UPDATE Compression burners
Compression functions account for 5.17 out of 8.55 total CPU - approx. 60%
• I am going to do a demo!
• Let's see xperf in action
And now, for something unusual!
Benchmark Test: HammerDb
Compression TPM CPU Load
NONE Compress 16.7 M 70% all cores
PAGE Compress 14.5 M 70% all cores
• Benchmark simulating TPC-type workloads
• It is NOT TPC, but it helps you set up something similar
• The results above are from a "TPC-C like" workload
What about locks and contention?
Xevent Trace Lock Acquire/Release
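The lock timings were captured by tracing lock acquire/release with Extended Events. A sketch of such a session (the session name and file target are examples):

```sql
-- Trace lock acquire/release so lock hold times can be computed offline
-- by pairing the two events per lock resource.
CREATE EVENT SESSION LockTiming ON SERVER
ADD EVENT sqlserver.lock_acquired,
ADD EVENT sqlserver.lock_released
ADD TARGET package0.event_file (SET filename = N'LockTiming.xel');

ALTER EVENT SESSION LockTiming ON SERVER STATE = START;
```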
How long are locks held?
[Chart: lock-held cycle count in CPU kcycles (avg and stddev), PAGE vs NONE; y-axis 0-600]