From A to E: Analyzing TPC's OLTP Benchmarks
The obsolete, the ubiquitous, the unknown

Pınar Tözün, Ippokratis Pandis*, Cansu Kaynak, Djordje Jevdjic, Anastasia Ailamaki
École Polytechnique Fédérale de Lausanne, *IBM Almaden Research Center
OLTP Benchmarks of TPC
• Allow fair product comparisons
• Drive innovations for better performance

[Timeline 1985–2015: TPC-A (1989, banking), TPC-B (1990, banking), TPC-C (1992, wholesale supplier), TPC-E (2007, brokerage house)]

TPC-A, TPC-B: Obsolete
TPC-C: Ubiquitous – Most common
TPC-E: Unknown – Results from one DBMS vendor
How is TPC-E different?
[Diagram spanning three levels — Hardware (micro-architectural behavior), Storage Manager (where does time go?), Workload (characteristics/statistics) — with these findings:]
• Under-utilization due to instruction stalls
• Fewer cache misses and higher IPC
• Harder to partition requests
• Logical lock contention
• Longer-held locks
• More page re-use
• Complex schema & transactions
Outline
• Preview
• Setup & Methodology
• Micro-architectural behavior
• Within the storage manager
• Conclusions
Experimental Setup

Server             | Fat (Intel Xeon X5660)           | Lean (Sun Niagara T2)
-------------------|----------------------------------|-------------------------------
#Sockets           | 2                                | 1
#Cores per Socket  | 6 (OoO)                          | 8 (in-order)
#HW Contexts       | 24                               | 64
Clock Speed        | 2.80GHz                          | 1.40GHz
Memory             | 48GB                             | 64GB
L3                 | 12MB (shared)                    | –
L2                 | 256KB (per core)                 | 4MB (shared)
L1-D               | 32KB (per core)                  | 8KB (per core)
L1-I               | 32KB (per core)                  | 16KB (per core)
OS                 | Ubuntu 10.04, Linux kernel 2.6.32| SunOS 5.10 Generic_141414-10
Methodology
• Shore-MT*
  – Scalable open-source storage manager
• Shore-Kits*
  – Application layer for Shore-MT
  – Workloads: TPC-B, TPC-C, TPC-E, ++
• Micro-architectural profiling
  – Xeon X5660: VTune; Niagara T2: cputrack
  – Measured at peak throughput
• Storage manager profiling
  – Niagara T2: dtrace

*https://sites.google.com/site/shoremt
Outline
• Preview
• Setup & Methodology
• Micro-architectural behavior
• Within the storage manager
• Conclusions
IPC on Fat & Lean Cores
[Bar charts: Instructions per Cycle (0–4) for TPC-B, TPC-C, TPC-E on the Intel Xeon X5660 and the Sun Niagara T2, with each core's maximum IPC marked]

OLTP utilizes lean cores better; TPC-E has higher IPC
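The IPC metric on the charts above is simply retired instructions divided by elapsed core cycles, as reported by hardware counters. A minimal sketch, using made-up counter values rather than the measured numbers from the slides:

```python
# Illustrative sketch of the IPC metric plotted in the charts.
# The counter readings below are hypothetical, not measurements.

def ipc(instructions_retired, cycles):
    """Instructions per cycle: instructions retired / core cycles elapsed."""
    return instructions_retired / cycles

# A hypothetical run: 2 billion instructions in 4 billion cycles.
value = ipc(instructions_retired=2.0e9, cycles=4.0e9)
print(f"IPC = {value:.2f}")  # far below the 'Maximum' line of the chart
```

An IPC well below the core's maximum (the "Maximum" line in each panel) is what the slides mean by under-utilization: most issue slots go unused.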
Execution Cycles and Stalls (Intel Xeon X5660)
[Bar charts: execution-cycle breakdown (Busy vs. Stalled) and core-stall breakdown (Instruction vs. Rest) for TPC-B, TPC-C, TPC-E]

More than half of execution time goes to stalls; instruction stalls are the main problem
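The two stacked bars above can be reduced to ratios over raw cycle counts. A hedged sketch with illustrative (not measured) numbers:

```python
# Illustrative sketch of the cycle breakdown shown in the charts.
# All cycle counts are hypothetical.

def cycle_breakdown(total_cycles, stall_cycles, instruction_stall_cycles):
    """Split cycles into busy vs. stalled, and stalls into instruction vs. rest."""
    busy = total_cycles - stall_cycles
    return {
        "busy": busy / total_cycles,
        "stalled": stall_cycles / total_cycles,
        # fraction of all stall cycles attributable to instruction fetch
        "instruction_share_of_stalls": instruction_stall_cycles / stall_cycles,
    }

b = cycle_breakdown(total_cycles=10e9, stall_cycles=6e9,
                    instruction_stall_cycles=4e9)
```

With these hypothetical numbers the core is stalled 60% of the time, matching the slide's qualitative claim that more than half of execution goes to stalls, most of it on the instruction side.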
Cache Misses
[Bar charts: misses per kilo-instruction (L1-I, L2-I, L1-D, L2-D, LLC) for TPC-B, TPC-C, TPC-E on the Intel Xeon X5660 (32KB L1-I & 32KB L1-D) and the Sun Niagara T2 (16KB L1-I & 8KB L1-D)]

TPC-E has lower data miss ratio (MPKI); L1-I misses dominate
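The y-axis of these charts is MPKI, misses normalized per thousand retired instructions. A small sketch of the metric, with hypothetical counter values:

```python
# Illustrative sketch of the MPKI metric (misses per kilo-instruction)
# used on the y-axis of the cache-miss charts. Counts are hypothetical.

def mpki(misses, instructions):
    """Cache misses per 1000 retired instructions."""
    return misses / (instructions / 1000.0)

# e.g. 50 million L1-I misses over 1 billion instructions
l1i = mpki(misses=50e6, instructions=1e9)
print(f"L1-I MPKI = {l1i:.1f}")
```

Normalizing per instruction (rather than per second) makes the comparison independent of clock speed, which matters here because the Xeon and the Niagara differ by 2x in frequency.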
Why does TPC-E have a lower miss ratio?
[Bar charts: average #records accessed and #pages accessed (heap vs. index) per transaction for TPC-B, TPC-C, TPC-E]

More scans in TPC-E → increased page reuse
Outline
• Preview
• Setup & Methodology
• Micro-architectural behavior
• Within the storage manager
• Conclusions
From A to E: Schema
[Diagram: schemas of TPC-B (branch), TPC-C (warehouse), and TPC-E, with tables classified as Fixed, Scaling, or Growing (e.g. customer)]

Increasing schema complexity from TPC-B to TPC-C to TPC-E
From A to E: Transactions

                    | TPC-B     | TPC-C        | TPC-E
--------------------|-----------|--------------|------------------------------
#Transactions       | 1         | 5            | 12
Transaction Mix     | RW 100%   | RW 92%       | RW 23%
Secondary Indexes   | None      | 2 transactions | 10 transactions
Transaction Input   | Branch ID | Warehouse ID | Customer ID or Broker ID or Trade ID or …

Harder to partition; more complexity & variety in the transaction mix
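The partitioning point in the table can be made concrete: a TPC-B or TPC-C request carries one natural partitioning key, while a TPC-E request may arrive keyed by any of several identifiers that map to different tables. A hedged sketch (the routing functions and key names are illustrative, not taken from the benchmark kits):

```python
# Illustrative sketch of why TPC-E requests are harder to partition.
# Partition counts, key names, and routing logic are hypothetical.

N_PARTITIONS = 8

def route_tpcc(warehouse_id: int) -> int:
    # TPC-B/TPC-C style: a single input key (Branch/Warehouse ID)
    # hash-routes every transaction to one partition.
    return warehouse_id % N_PARTITIONS

def route_tpce(txn_input: dict) -> int:
    # TPC-E style: the input may carry a Customer ID, a Broker ID, or a
    # Trade ID. No single attribute partitions all tables consistently,
    # so one hash scheme cannot keep every transaction partition-local.
    for key in ("customer_id", "broker_id", "trade_id"):
        if key in txn_input:
            return hash((key, txn_input[key])) % N_PARTITIONS
    raise ValueError("transaction input carries no routable key")
```

Two TPC-E transactions touching the same logical data can thus land on different partitions depending on which key they arrived with, which is the crux of "harder to partition".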
Within the Storage Manager (Sun Niagara T2, 64 HW contexts)
[Stacked bars: time breakdown (Locking, Logging, BPool, Btree, Other) at 4, 16, 48 HW contexts for TPC-B (SF 64 – 0.6GB, spread), TPC-C (SF 64 – 8.2GB, spread), TPC-E (SF 1 – 20GB, no-spread)]
Within the Storage Manager (Sun Niagara T2, 64 HW contexts, cont.)
[Same time-breakdown chart as the previous slide, with the locking component highlighted]

Lock manager is the main bottleneck for TPC-E
Inside the Lock Manager
[Stacked bars: time breakdown of locking (Lock Acquisition, Logical Contention, Physical Contention) at 4, 16, 48 HW contexts for TPC-B (SF 64 – 0.6GB, spread), TPC-C (SF 64 – 8.2GB, spread), TPC-E (SF 1 – 20GB, no-spread)]

Logical contention even for a large DB
Conclusions
• Modern hardware is still highly under-utilized
  – TPC-E: fewer misses, less stall time, higher IPC
  – OLTP utilizes less aggressive cores better
• Instruction footprint is too large to fit in L1-I
  – Spread instructions, (software-guided) prefetching
  – Code/compiler optimizations
• Logical lock contention due to hotspots
  – Increased complexity in schema and transactions
  – TPC-E: harder to physically partition
  – Logical partitioning, OCC
The obsolete, the ubiquitous, the unexplored
Directed by
Produced by
Also starring: Shore-MT, Xeon X5660, Niagara T2
TPC-B  TPC-C  TPC-E