Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
PG-Strom Query Acceleration Engine of PostgreSQL
Powered by GPGPU
NEC OSS Promotion Center
The PG-Strom Project
KaiGai Kohei <[email protected]>
Self Introduction
▌Name: KaiGai Kohei
▌Company: NEC
▌Mission: Software architect & Intrepreneur
▌Background:
Linux kernel development (2003~?)
PostgreSQL development (2006~)
SAP alliance (2011~2013)
PG-Strom development & productization (2012~)
▌PG-Strom Project:
In-company startup of NEC
Also, an open source software project
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.2
What is PG-Strom
▌An Extension of PostgreSQL
▌Off-loads CPU intensive SQL workloads to GPU processors
▌Major Features
① Automatic and just-in-time GPU code generation from SQL
② Asynchronous and concurrent query executor
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.3
database
Query Executor
Query Planner
Custom Executor
Custom Planner
GPU code on the fly SQL
command
Async- Execution
PG-Strom Q
uery
Fro
nte
nd
Concept
▌No Pain Looks like a traditional PostgreSQL database from standpoint of
applications, thus, we can utilize existing tools, drivers, applications.
▌No Tuning Massive computing capability by GPGPU kills necessity of database
tuning by human. It allows engineering folks to focus on the task only human can do.
▌No Complexity No need to export large data to external tools from RDBMS, because its
computing performance is sufficient to run the workloads nearby data.
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.4
Technology Trend
▌Movement to CPU/GPU integrated architecture rather than multicore CPU
▌Free lunch for SW by HW evolution will finish soon
Unless software is not designed to utilize GPU capability, unable to pull-out the full hardware capability.
DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL Page. 5
SOURCE: THE HEART OF AMD INNOVATION, Lisa Su, at AMD Developer Summit 2013
Background: How SQL is executed
▌Planner constructs query execution plan based on cost estimation
▌SQL never defines how to execute the query, but what shall be returned
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.6
postgres# EXPLAIN SELECT cat, avg(x) FROM t0 NATURAL JOIN t1
WHERE t0.z like '%abc%' GROUP BY cat;
QUERY PLAN
------------------------------------------------------------------
HashAggregate (cost=6629.88..6629.89 rows=1 width=12)
Group Key: t0.cat
-> Hash Join (cost=1234.00..6619.77 rows=2020 width=12)
Hash Cond: (t0.aid = t1.aid)
-> Seq Scan on t0 (cost=0.00..5358.00 rows=2020 width=16)
Filter: (z ~~ '%abc%'::text)
-> Hash (cost=734.00..734.00 rows=40000 width=4)
-> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=4)
(8 rows)
Background: Custom-Plan Interface
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.7
Aggregate
SELECT cat, avg(x) FROM t1, t2
WHERE t1.id = t2.id AND y > 100
GROUP BY cat;
Scan on t1 Scan on t2
Join
t1 t2
key: cat • Hash Join • Merge Join • Nested Loop • Custom Join
• Seq Scan • Index Scan • Index-Only Scan • Tid Scan • Custom Scan
IndexScan on t1
y > 100
“BulkLoad” on t1
“GpuHashJoin”
t1.id = t2.id
PG-Strom Features
▌Logics
GpuScan ... Parallel evaluation of scan qualifiers
GpuHashJoin ... Parallel multi-relational join
GpuPreAgg ... Two phase aggregation
GpuSort ... GPU + CPU Hybrid Sorting
GpuNestedLoop (in develop)
▌Data Types
Integer, Float, Date/Time, Numeric, Text
▌Function and Operators
Equality and comparison operators
Arithmetic operators and mathematical functions
Aggregates: count, min/max, sum, avg, std, var, corr, regr
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.8
Automatic GPU code generation
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.9
postgres=# SET pg_strom.show_device_kernel = on;
postgres=# EXPLAIN VERBOSE SELECT * FROM t0 WHERE sqrt(x+y) < 10;
QUERY PLAN
--------------------------------------------------------------------------------
Custom Scan (GpuScan) on public.t0 (cost=500.00..357569.35 rows=6666683 width=77)
Output: id, cat, aid, bid, cid, did, eid, x, y, z
Device Filter: (sqrt((t0.x + t0.y)) < 10::double precision)
Features: likely-tuple-slot
Kernel Source: #include "opencl_common.h“
:
static pg_bool_t
gpuscan_qual_eval(__private cl_int *errcode,
__global kern_parambuf *kparams,
__global kern_data_store *kds,
__global kern_data_store *ktoast,
size_t kds_index)
{
pg_float8_t KPARAM_0 = pg_float8_param(kparams,errcode,0);
pg_float8_t KVAR_8 = pg_float8_vref(kds,ktoast,errcode,7,kds_index);
pg_float8_t KVAR_9 = pg_float8_vref(kds,ktoast,errcode,8,kds_index);
return pgfn_float8lt(errcode,
pgfn_dsqrt(errcode, pgfn_float8pl(errcode, KVAR_8, KVAR_9)), KPARAM_0);
}
Implementation (1/3) – GpuScan
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.10
Table
DMA Send
DMA Recv
DMA Send
DMA Recv
DMA Send
DMA Recv
Execution of auto-generated GPU code
Result Output Stream
Input Stream
Chunk (16~64MB)
PostgreSQL PG-Strom
Software Architecture
DB Tech Showcase 2014 Tokyo; PG-Strom - GPGPU acceleration on PostgreSQL Page. 11
GPU Code Generator
Storage
Storage Manager
Shared Buffer
Query Parser
Query Optimizer
Query Executor
SQL Query
Breaks down the query to parse tree
Makes query execution plan
Run the query Cust
om
-Pla
n A
PIs
GpuScan
GpuHashJoin
GpuPreAgg
GPU Program Manager
PG-Strom OpenCL Server
Message Queue
GpuSort
※ Current version based on OpenCL
Implementation (2/3) – GpuHashJoin
PG-Strom Preview Feb-2015 Page. 12
Inner relation
Outer relation
Inner relation
Outer relation
Hash Table Hash Table
Next stage Next stage
CPU just references materialized results
Hash-Table Search by CPU
Sequential Materialization by CPU
Parallel Materialization
Parallel Hash-Table
Search
vanilla Hash-Join GpuHashJoin
Benchmark result (1/2) – simple tables join
▌Benchmark Query: SELECT * FROM t0 NATURAL JOIN t1 [NATURAL JOIN ....];
▌Environment: t0 has 100million rows (13GB), t1-t9 has 40,000 rows for each, all-data pre-loaded
CPU: Xeon E5-2670v3 (12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.13
18.19 19.45 21.04 23.66 26.69 37.64 43.22 49.57 56.38
64.27
87.73
109.73
132.21
155.10
179.62
207.85
233.31
263.51
0.00
50.00
100.00
150.00
200.00
250.00
300.00
2 3 4 5 6 7 8 9 10
Qu
ery
Re
spo
nse
Tim
e [
sec]
number of tables joined
Simple Tables Join Benchmark
PG-Strom PostgreSQL
Implementation (3/3) – GpuPreAgg
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.14
Table
1st Stage Reduction
2nd Stage Reduction
Chunk (16~64MB)
Benchmark result (2/2) – Star Schema Model
▌40 typical reporting queries
▌100GB of retail / start-schema data, all pre-loaded
▌Environment CPU: Xeon E5-2670v3(12C, 2.3GHz) x2, RAM: 384GB, GPU: Tesla K20c x1
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.15
0.00
200.00
400.00
600.00
800.00
1000.00
1200.00
1400.00
1600.00
1800.00
2000.00
Q.0
1
Q.0
2
Q.0
3
Q.0
4
Q.0
5
Q.0
6
Q.0
7
Q.0
8
Q.0
9
Q.1
0
Q.1
1
Q.1
2
Q.1
3
Q.1
4
Q.1
5
Q.1
6
Q.1
7
Q.1
8
Q.1
9
Q.2
0
Q.2
1
Q.2
2
Q.2
3
Q.2
4
Q.2
5
Q.2
6
Q.2
7
Q.2
8
Q.2
9
Q.3
0
Q.3
1
Q.3
2
Q.3
3
Q.3
4
Q.3
5
Q.3
6
Q.3
7
Q.3
8
Q.3
9
Q.4
0
Qu
ery
Res
po
nse
Tim
e [s
ec]
Typical Reporting Queries on Retail / Star-Schema Data
PG-Strom PostgreSQL
Expected Scenario – Reduction of ETL
▌ETL – Its design is human centric task
▌Replication – much automatous task
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.16
ERP CRM SCM BI
OLTP database
OLAP database
ETL
OLAP Cubes Master / Fact Tables
BI
Replication
Replica of Master / Fact Tables
Optimized to transaction
workloads
Optimized to analytic workloads
Sufficient to analytic
workloads also
PG-Strom
Direction of PG-Strom
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.17
Development Plan
Migration of OpenCL to CUDA
Add support of GpuNestedLoop
Add support multi-functional kernel
Standardization of custom-join interface
...and more...?
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.18
Short term target: PostgreSQL v9.5 timeline (2015)
Middle term target: PostgreSQL v9.6 timeline (2016)
Current version: PG-Strom β + PostgreSQL v9.5devel
Integration with funnel executor
Investigation to SSD/NvRAM utilization
Custom-sort/aggregate interface
Add support for spatial data types (?)
Let’s try – Deployment on AWS
The PostgreSQL Conference 2014, Tokyo - GPGPU Accelerates PostgreSQL Page. 19
Search by “strom” !
AWS GPU Instance (g2.2xlarge)
CPU Xeon E5-2670 (8 xCPU)
RAM 15GB
GPU NVIDIA GRID K2 (1536core)
Storage 60GB of SSD
Price $0.898/hour, $646.56/mon (*) Price for on-demand instance on Tokyo region at Nov-2014
Welcome your involvement!
▌How to be involved?
as a user
as a developer
as a business partner
▌Source code
https://github.com/pg-strom/devel
▌Contact US
e-mail: [email protected]
twitter: @kkaigai
PG-Strom - Query Acceleration Engine of PostgreSQL Powered by GPGPU - P.20
check it out!
...or, catch me in the Convention Center