Performance evaluation of fast integer compression
techniques over tables
Ikhtear Md. Sharif Bhuyan
Supervisors: Hazel Webb, Daniel Lemire, Owen Kaser
©Ikhtear Md. Sharif Bhuyan
Overview
• Introduction
• Compression in databases and issues
• Objectives
• Experimental Results
• Conclusion
• Future Work
12/4/2013 Performance evaluation of fast integer compression techniques over tables 2
Query processing
[Figure: components involved in query processing: Disk, RAM, Cache, Processor]
Compression in databases
• Reduce storage requirements
• Speed up query processing
• Save I/O bandwidth
• Improve performance of I/O-bound operations
Selecting Compression in databases
• Lossless
• Trade-off between compression ratio and the speed of compression and decompression
Objectives
• Examining and comparing the performance of
patched schemes with other methods with respect
to compression ratio, decompression speed and
compression speed.
• Assessing the effect of different factors such as row
order.
Column-oriented database system
Example table:
ID      Name
104543  Peter
203456  Sam
234321  Maria
Row-oriented database (values stored row by row):
104543 Peter
203456 Sam
234321 Maria
Column-oriented database (values stored column by column):
104543 203456 234321
Peter Sam Maria
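The two layouts can be illustrated with a minimal Python sketch using the three example records. The function names `row_layout` and `column_layout` are hypothetical and only serve this illustration; they do not come from any database system.

```python
# Illustrative only: the same three records laid out row-wise and column-wise.
records = [(104543, "Peter"), (203456, "Sam"), (234321, "Maria")]

def row_layout(rows):
    """Row store: all fields of one record are stored next to each other."""
    return [field for row in rows for field in row]

def column_layout(rows):
    """Column store: all values of one attribute are stored next to each other."""
    return [list(column) for column in zip(*rows)]

row_store = row_layout(records)
col_store = column_layout(records)
```

In the column layout, the ID column is a contiguous run of integers, which is the kind of input the integer compression schemes in this talk operate on.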
Compression Algorithm
• Variable length output
o Byte-oriented compression: integers are coded in units of bytes, e.g., Variable-Byte
o Block-based compression: these schemes take a fixed number of input integers and output a variable number of bytes, e.g., FOR, NewPFD, FastPFOR
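As an illustration of byte-oriented coding, the following is a minimal Variable-Byte sketch in Python. It assumes one common convention (7 data bits per byte, least significant group first, high bit set on the last byte of each integer); other implementations, including the library used in the experiments, may use a different byte order or flag convention.

```python
def vbyte_encode(values):
    """Encode non-negative integers using 7 data bits per byte.

    Assumed convention: the high bit marks the LAST byte of each integer.
    """
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("only non-negative integers are supported")
        while v >= 128:
            out.append(v & 0x7F)   # 7 low bits, continuation byte
            v >>= 7
        out.append(v | 0x80)       # final byte carries the stop flag
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    values = []
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:            # stop flag: the integer is complete
            values.append(current)
            current, shift = 0, 0
        else:
            shift += 7
    return values
```

Small values take one byte and larger values take more, which is why the output length varies with the data.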
Compression Algorithm (Contd …)
• Fixed length output
o Each step takes a variable number of integers and produces a compressed form of those integers using a fixed number of bits as a unit, e.g., Simple9
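A minimal Simple9 sketch in Python may help make the idea concrete. It assumes the usual layout of a 32-bit word with a 4-bit selector and 28 data bits; the selector numbering and the greedy choice of mode are assumptions of this sketch, not necessarily what the evaluated implementation does.

```python
# (count, bits) pairs: how many integers of how many bits fit in 28 data bits.
SIMPLE9_MODES = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5),
                 (4, 7), (3, 9), (2, 14), (1, 28)]

def simple9_encode(values):
    """Greedily pack integers into 32-bit words (selector in the top 4 bits)."""
    words = []
    pos = 0
    while pos < len(values):
        for selector, (count, bits) in enumerate(SIMPLE9_MODES):
            chunk = values[pos:pos + count]
            # A mode is usable only if a full chunk is available and fits.
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28
                for i, v in enumerate(chunk):
                    word |= v << (i * bits)
                words.append(word)
                pos += count
                break
        else:
            raise ValueError("value is negative or does not fit in 28 bits")
    return words

def simple9_decode(words):
    """Inverse of simple9_encode."""
    values = []
    for word in words:
        count, bits = SIMPLE9_MODES[word >> 28]
        mask = (1 << bits) - 1
        for i in range(count):
            values.append((word >> (i * bits)) & mask)
    return values
```

Each output unit is always one 32-bit word, while the number of integers it holds varies from 1 to 28.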
Binary packing
• Original sequence: 67 78 85 96 98
• The numbers range from 67 to 98, so after subtracting the minimum (67) the values range from 0 to 31 and fit in 5 bits each.
• Compressed sequence: 0 11 18 29 31
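The example can be reproduced with a short Python sketch. It follows the slide (subtract the minimum, then store each offset with the same bit width) and uses a Python integer as a bit buffer; `binary_pack` and `binary_unpack` are hypothetical names, and a real implementation would pack fixed-size blocks into 32-bit words.

```python
def binary_pack(values):
    """Pack one non-empty block: subtract the minimum, then use a fixed bit width.

    Returns (minimum, bit_width, packed), where packed is an int used as a bit buffer.
    """
    lo = min(values)
    offsets = [v - lo for v in values]
    width = max(offsets).bit_length()   # bits needed for the largest offset
    packed = 0
    for i, offset in enumerate(offsets):
        packed |= offset << (i * width)
    return lo, width, packed

def binary_unpack(lo, width, packed, count):
    """Inverse of binary_pack for a block of `count` integers."""
    mask = (1 << width) - 1
    return [lo + ((packed >> (i * width)) & mask) for i in range(count)]

block = [67, 78, 85, 96, 98]
lo, width, packed = binary_pack(block)
```

For this block the five values need 5 × 5 = 25 bits of payload, plus the stored minimum and bit width, instead of 5 × 32 = 160 bits.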
Patched Compression
• Original sequence (binary): 11 1 10 … 11 11 11111 10 11
• The exception is 11111.
• b = 2 bits per non-exceptional value, maximum number of bits 5, number of exceptions 1, location of the exception 125
• Compressed sequence (binary): 11 1 10 … 11 11 11 10 11
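The patching idea can be sketched in Python as follows. This sketch assumes that the low b bits of an exception stay in place and that its position and remaining high bits are stored separately, which matches the compressed sequence shown here; the actual patched schemes differ in how they encode the exception data. The block below is a shortened, hypothetical version of the example (the elided values are left out, so the exception sits at position 5 rather than 125).

```python
def patched_encode(values, b):
    """Split a block into b-bit low parts and a list of (position, high_bits) patches."""
    mask = (1 << b) - 1
    low_bits = [v & mask for v in values]
    exceptions = [(i, v >> b) for i, v in enumerate(values) if v > mask]
    return low_bits, exceptions

def patched_decode(low_bits, exceptions, b):
    """Restore the original values by patching the high bits back in."""
    values = list(low_bits)
    for position, high_bits in exceptions:
        values[position] |= high_bits << b
    return values

# Shortened example block (binary literals), with one 5-bit exception.
block = [0b11, 0b1, 0b10, 0b11, 0b11, 0b11111, 0b10, 0b11]
low_bits, exceptions = patched_encode(block, b=2)
```

Because only the exception pays for the extra bits, the rest of the block can still be packed with b = 2 bits per value.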
Synthetic data experiments
• Compression ratio
[Figure: compression ratio on clustered data]
Synthetic data experiments (Contd …)
• Compression ratio
[Figure: compression ratio on uniform data]
Synthetic data experiments (Contd …)
• Decompression speed
[Figure: decompression speed on clustered data]
Synthetic data experiments (Contd …)
[Figure: results on uniform data]
Real Data Sets
• Census-Income
• Census1881
• Star Schema Benchmark
Column-wise compressed size
[Figure: column-wise compressed size for the frequency-coded Census1881 file; panels: Original, Shuffled]
Column-wise compressed size (Contd …)
[Figure: column-wise compressed size for the frequency-coded Census1881 file; panels: sorted on high-cardinality column (column 1), sorted on low-cardinality column (column 3)]
Column-wise compression speed
[Figure: column-wise compression speed for the frequency-coded Census1881 file]
Column-wise compression speed (Contd …)
[Figure: column-wise compression speed for the frequency-coded Census1881 file]
Column-wise decompression speed
[Figure: column-wise decompression speed for the frequency-coded Census1881 file]
Column-wise decompression speed (Contd …)
[Figure: column-wise decompression speed for the frequency-coded Census1881 file]
Effect of Row Order
[Figure: histogram of compressed size (bits/int)]
Conclusion
• Sorting columns results in a smaller compressed size.
• Sorted columns can be compressed and decompressed faster than columns in shuffled order.
• The choice of compression scheme depends on the nature of the database (OLTP/OLAP) and on the requirements for storage and data access speed.
Future Work
• Incorporating a query engine to assess real-world performance.
• Comparing on processor-level metrics.
• Using multiple threads in the compression algorithms.
• Querying data in compressed form.
Thank You
Backup
Key Issues
• Data access latency
The time between sending a request and locating the data on disk so that processing can start.
• Disk bandwidth
The amount of data that can be transferred from the disk per second.
Experimental Setup
• Hardware
o Intel Core i5-2400
o RAM: 8 GB
o Cache: 6 MB L3
o Memory clock speed: 1333 MHz
• Software
o Java SDK version 1.7.0
o https://github.com/lemire/JavaFastPFOR
o Single-threaded
• More info
o http://hdl.handle.net/1882/45703
Compressed Size
Coding Scheme    Original  Shuffled  High Card.  Low Card.
Variable-Byte       15.00     15.00       15.00      15.00
Binary Packing      11.37     11.42       11.15      11.37
NewPFD              13.06     13.19       12.32      13.14
OptPFD              11.84     11.85       11.80      11.80
FastPFOR            11.27     11.29       11.06      11.24
Simple9             15.75     15.90       15.72      15.84
Compressed size (bits per integer) on SSB with the frequency-coded file. High Card. and Low Card.: rows sorted on a high- or low-cardinality column.
Compression Speed
Coding Scheme    Original  Shuffled  High Card.  Low Card.
Variable-Byte          33        31          33         31
Binary Packing        729       711         746        732
NewPFD                 52        36          40         34
OptPFD                  6         3           5          4
FastPFOR              104        76          89         84
Simple9                78        60          69         64
Compression speed (mis, millions of integers per second) on Census1881 with the frequency-coded file. High Card. and Low Card.: rows sorted on a high- or low-cardinality column.
Decompression Speed
Coding Scheme    Original  Shuffled  High Card.  Low Card.
Variable-Byte         165       197         214        186
Binary Packing       1151      1089        1151       1135
NewPFD                709       615         729        689
OptPFD                421       357         482        381
FastPFOR              776       707         763        730
Simple9               488       377         447        398
Decompression speed (mis, millions of integers per second) on Census1881 with the frequency-coded file. High Card. and Low Card.: rows sorted on a high- or low-cardinality column.
Column-wise compressed size
[Figure: column-wise compressed size for the frequency-coded Census1881 file; panels: Original, Shuffled]
Column-wise compressed size
[Figure: column-wise compressed size for the frequency-coded Census1881 file; panels: sorted on high-cardinality column (column 1), sorted on low-cardinality column (column 3)]
Column-wise compression speed
[Figure: column-wise compression speed for the frequency-coded Census1881 file]
Column-wise decompression speed
[Figure: column-wise decompression speed for the frequency-coded Census1881 file]
Effect of CPU family on compression speed
[Figure: compression speed (mis) on different processors]
Effect of CPU family on decompression speed
[Figure: decompression speed (mis) on different processors]