Accelerating Aggregation using Intra-cycle Parallelism
Ziqiang Feng, Eric Lo
Department of Computing, The Hong Kong Polytechnic University
{cszqfeng, ericlo}@comp.polyu.edu.hk
Accelerating Aggregation using Intra-cycle Parallelism. Ziqiang Feng and Eric Lo. The Hong Kong Polytechnic University. ICDE’15. Seoul.
Background
• Analytic database
• Memory-resident
• Column store
• Compression
Column values compressed into short (int) codes

salary    Encoded salary    Binary rep.
1,000     1                 0001
2,000     2                 0010
3,000     3                 0011
4,000     4                 0100
5,000     5                 0101
6,000     6                 0110
7,000     7                 0111
8,000     8                 1000

64-bit wide processor words
• Process 64 bits of information per cycle
• Load one code (1 0 0 0) into a 64-bit word: the remaining bits are wasted
• How to utilize?
A Better Approach …
• HBP & VBP (Li and Patel, SIGMOD’13)
• Two bit-packed storage layouts
  • Horizontal Bit Packing (HBP)
  • Vertical Bit Packing (VBP)
• Fast filter scans (e.g., )
• An example of HBP:

Encoded salary: 1 2 3 4 5 6 7 8
Load 8 values into a 64-bit CPU register (8 bits each), so the register is not wasted:
00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000
       1        2        3        4        5        6        7        8
1 CPU instruction processes 8 values simultaneously: 8x intra-cycle parallelism
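The packing in the HBP example can be sketched in Python as follows. This is only an illustration of the simplified layout drawn on the slide (eight 8-bit codes, first value in the most significant byte); `pack_hbp` is a hypothetical helper name, and the complete HBP layout of Li and Patel has details that the slide does not show.

```python
def pack_hbp(codes, width=8):
    """Pack small integer codes into one 64-bit word, first code leftmost.

    Hypothetical helper: mirrors the slide's drawing, not the full HBP format.
    """
    if len(codes) * width > 64:
        raise ValueError("codes do not fit into one 64-bit word")
    word = 0
    for code in codes:
        if not 0 <= code < (1 << width):
            raise ValueError("code does not fit into one slot")
        # Make room for the next slot, then drop the code into it.
        word = (word << width) | code
    return word


word = pack_hbp([1, 2, 3, 4, 5, 6, 7, 8])
print(format(word, "064b"))
```

With the eight example codes, the printed bit string is the register content shown on the slide.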
Aggregation (SUM, MIN, MAX, MEDIAN, AVG, COUNT) on HBP/VBP?
• Remains open
• Baseline (example: sum)

00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000
       1        2        3        4        5        6        7        8

lookup → 00000001 (1)
lookup → 00000010 (2)
lookup → 00000011 (3)
lookup → … …
1 + 2 + 3 + … = 00100100 (36)

A lookup is expensive
• Involves a sequence of
• Scan: cycle/value
• Lookup: cycles/value!
Wasted (again).
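The baseline can be sketched as one lookup per value followed by an ordinary addition. The sketch assumes the simplified 8-bit-per-slot word from the example; `lookup` and `baseline_sum` are illustrative names, not functions from the paper.

```python
def lookup(word, i, width=8, slots=8):
    """Extract the i-th code (0 = leftmost slot) with a shift and a mask."""
    shift = (slots - 1 - i) * width
    return (word >> shift) & ((1 << width) - 1)


def baseline_sum(word, slots=8):
    """Sum the codes one at a time: one lookup plus one add per value."""
    total = 0
    for i in range(slots):
        total += lookup(word, i)
    return total


word = 0x0102030405060708  # slots hold 1, 2, ..., 8
print(baseline_sum(word))
```

Each addition here touches a single value, so the other slots of the 64-bit word do no useful work during the add.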
Bit-parallel Approach (This paper):
00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000
       1        2        3        4        5        6        7        8
Apply ∧, ∨, ≪, ≫, +, × to the whole word → 00100100 (36)
• Each instruction works on 8 values simultaneously.
• Uses far fewer instructions.
• Expensive lookups are avoided.
Challenge
64-bit CPU register:
00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000
       1        2        3        4        5        6        7        8
What the CPU sees: a 64-bit integer
0000000100000010000000110000010000000101000001100000011100001000
• We want: 36 (sum)
• How to interpret/manipulate this (meaningless) huge number?
Contribution
Bit-parallel aggregations for
{Count, Sum, Avg, Min, Max, Median} × {HBP, VBP}
are covered in our paper.
Example: Sum in HBP
64-bit CPU register:
00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000
       1        2        3        4        5        6        7        8
(1). Right shift 8
00000000 00000001 00000010 00000011 00000100 00000101 00000110 00000111
                1        2        3        4        5        6        7
(2). Add
00000001 00000011 00000101 00000111 00001001 00001011 00001101 00001111
       1      1+2      2+3      3+4      4+5      5+6      6+7      7+8
(3). Mask
00000000 00000011 00000000 00000111 00000000 00001011 00000000 00001111
       0      1+2        0      3+4        0      5+6        0      7+8
Cont’d
0000000000000011 0000000000000111 0000000000001011 0000000000001111
(slots: 0, 1+2, 0, 3+4, 0, 5+6, 0, 7+8)
(4). Multiply by
0000000000000001 0000000000000001 0000000000000001 0000000000000001
(slots: 0 1 0 1 0 1 0 1)
Result:
0000000000100100 0000000000100001 0000000000011010 0000000000001111
(1+2+3+4+5+6+7+8, 3+4+5+6+7+8, 5+6+7+8, 7+8)
(5). Right shift 48
0000000000000000000000000000000000000000000000000000000000100100
1+2+3+4+5+6+7+8 = 36
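The five steps can be traced in Python as a sanity check. This is a minimal sketch under two assumptions that are not stated on the slides: Python integers are unbounded, so results are truncated to 64 bits by hand with `MASK64`, and every pairwise sum in step (2) fits into 8 bits (true for the example values), so no carry leaks into a neighbouring slot. The name `hbp_sum8` is illustrative.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit register


def hbp_sum8(word):
    """Sum eight 8-bit slots of a 64-bit word with five word-level operations."""
    shifted = word >> 8                            # (1) right shift 8
    pairs = (word + shifted) & MASK64              # (2) add: slot i now holds v(i-1) + v(i)
    pairs &= 0x00FF00FF00FF00FF                    # (3) mask: keep 1+2, 3+4, 5+6, 7+8
    prod = (pairs * 0x0001000100010001) & MASK64   # (4) multiply: top 16 bits collect all four
    return prod >> 48                              # (5) right shift 48


word = 0x0102030405060708  # slots hold 1, 2, ..., 8
print(hbp_sum8(word))
```

The multiplication in step (4) is what replaces a chain of shifts and adds: multiplying by a constant with a 1 in every 16-bit lane adds shifted copies of the word, so the top lane ends up holding the sum of all four partial sums.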
Summary: Sum on HBP
• Baseline:
• Our method: uses only 5 instructions
• One 64-bit word contains 8 eight-bit values
• One instruction processes 8 values in parallel
• Advantages:
  • Parallelism is achieved
  • # of instructions is low
Example: min on HBP
• Baseline

00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000
       1        2        3        4        5        6        7        8

lookup → 00000001 (1)
lookup → 00000010 (2)
lookup → 00000011 (3)
lookup → … …
→ min
Cont’d
• Bit-parallel: consider 16 values …

v1 … v8 (64-bit CPU register):
00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000
       1        2        3        4        5        6        7        8
v9 … v16 (64-bit CPU register):
00000011 00000011 00000011 00000011 00000011 00000011 00000011 00000011
       3        3        3        3        3        3        3        3

Bit-parallel slot-wise min (1). ; (2). ; (3). ; … …
00000001 00000010 00000011 00000011 00000011 00000011 00000011 00000011
       1        2        3        3        3        3        3        3

Then: lookup, lookup, lookup, … …
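The instruction sequence for the slot-wise min did not survive on the slide, so the following is not the authors' method. It is one well-known word-level trick for a slot-wise minimum, shown only to make the idea concrete. It assumes every code is below 128, so the top bit of each 8-bit slot is free to act as a per-slot comparison flag; `slotwise_min` and the constants are illustrative names.

```python
MASK64 = (1 << 64) - 1
HIGH = 0x8080808080808080  # top bit of every 8-bit slot


def slotwise_min(x, y):
    """Per-slot minimum of two words of eight codes, each code < 128 (assumption)."""
    # Setting the top bit before subtracting keeps every slot non-negative,
    # so no borrow crosses a slot boundary; the top bit survives iff x_i >= y_i.
    ge = ((x | HIGH) - y) & HIGH
    # Turn each surviving flag bit into a full 0xFF slot mask.
    take_y = (ge >> 7) * 0xFF
    return (y & take_y) | (x & ~take_y & MASK64)


x = 0x0102030405060708  # v1 .. v8  = 1, 2, ..., 8
y = 0x0303030303030303  # v9 .. v16 = 3, 3, ..., 3
m = slotwise_min(x, y)
print(format(m, "016x"))
print(min(m.to_bytes(8, "big")))  # final reduction over the 8 remaining slots
```

After the slot-wise step only 8 candidates remain out of 16, so the remaining per-value work is halved; with more input words the same step folds each further word into the running minimum.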
In a nutshell …
• Work on the storage layout (HBP/VBP) directly
• Intra-cycle parallelism is utilized in calculation
• Avoid expensive lookup operations
• # of instructions is low
TPC-H Result: HBP
• Aggregation (this paper) vs. aggregation (baseline): aggregation time reduced by 28.1%
• Whole-query (this paper) vs. whole-query (baseline): whole-query time reduced by 20.4%
TPC-H Result: VBP
• Aggregation (this paper) vs. aggregation (baseline): aggregation time reduced by 55.0%
• Whole-query (this paper) vs. whole-query (baseline): whole-query time reduced by 44.4%
Conclusion and Future Work
• Intra-cycle parallelism is important.
• We devised a suite of algorithms to compute aggregation very efficiently.
• This paper has been built around HBP and VBP, but …
ByteSlice: better than HBP and VBP. See our SIGMOD’15 paper: “ByteSlice: Pushing the Envelop of Main Memory Data Processing with a New Storage Layout”
Thank you.
Combining Scan and Aggregation
The Non-bit-parallel Approach
select SUM(b) from R where a > 88
(1). Scan column a → filter result bit vector: 0 0 0 1 0 1 0 1
(2). Identify the 4th tuple
(3). Lookup in column b
(4). Add to the running SUM
(5). Find next
Combining Scan and Aggregation
The Bit-parallel Approach
select SUM(b) from R where a > 88
(1). Scan column a → filter result bit vector: 0 0 0 1 0 1 0 1
00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000
       1        2        3        4        5        6        7        8
(2). Transform to a mask
00000000 00000000 00000000 11111111 00000000 11111111 00000000 11111111
       0        0        0     0xFF        0     0xFF        0     0xFF
(3). Intersect
00000000 00000000 00000000 00000100 00000000 00000110 00000000 00001000
       0        0        0        4        0        6        0        8
… …
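Steps (2) and (3) can be sketched in Python as follows. The slide does not say how the bit vector is transformed into a mask, so the multiply-and-select trick below is an assumption chosen for illustration, as are the names `bits_to_slot_mask` and `filtered_sum`. The leftmost filter bit is taken to belong to the leftmost slot, matching the drawing.

```python
MASK64 = (1 << 64) - 1


def bits_to_slot_mask(bits):
    """Expand an 8-bit filter result into eight 0x00 / 0xFF byte slots."""
    rep = (bits * 0x0101010101010101) & MASK64   # copy the 8 filter bits into every byte
    sel = rep & 0x8040201008040201               # byte i keeps only filter bit i
    # A non-zero byte (at most 0x80) plus 0x7F sets the byte's top bit, with no carry out.
    flag = ((sel + 0x7F7F7F7F7F7F7F7F) & 0x8080808080808080) >> 7
    return flag * 0xFF                           # 0x01 per selected slot becomes 0xFF


def filtered_sum(word, bits):
    """Intersect the packed values with the mask, then sum the surviving slots."""
    kept = word & bits_to_slot_mask(bits)        # (3) intersect
    # For brevity the final sum walks the bytes; a bit-parallel sum could be used instead.
    return sum(kept.to_bytes(8, "big"))


word = 0x0102030405060708   # slots hold 1, 2, ..., 8
bits = 0b00010101           # filter result 0 0 0 1 0 1 0 1
print(format(bits_to_slot_mask(bits), "016x"))
print(filtered_sum(word, bits))
```

Filtered-out slots become zero after the intersection, so they can stay in the word and take part in the later summation without changing the result.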
Handling Join and Group-by
• Materialized join
  • Column store + compression → low space overhead
  • Benefit: complex (join) query → scan-then-aggregate
  • WideTable (Li and Patel, VLDB’14)
• Sorted projection
  • Exploit the replica requirement
  • Multiple projections (copies) in different sort orders
  • Vertica
Multiple Attributes
• Materialize pseudo-column
• A few pseudo-columns can satisfy the whole workload
Effect of Query Selectivity
• Our solutions show significant improvement when selectivity > 1%
Effect of Value Width
• Our solutions outperform the baseline under all value widths (# of bits)
TPC-H Attributes
• All attributes can be encoded in bits
• More than a half can be encoded in bits.
HBP or VBP?
• There’s a tradeoff, depending on data distribution/workload.
• HBP: slower scan + faster lookup
• VBP: faster scan + slower lookup
• Solution?
• ByteSlice: fast scan + fast lookup (see our SIGMOD’15 paper)