34
December 5, 2017 Takayuki Tsunakawa Fujitsu Limited Road to a Multi-model Database -- making PostgreSQL the most popular and versatile database Copyright 2017 FUJITSU LIMITED 0 PGConf.ASIA 2017

Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Embed Size (px)

Citation preview

Page 1: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

December 5, 2017

Takayuki TsunakawaFujitsu Limited

Road to a Multi-model Database-- making PostgreSQL the most

popular and versatile database

Copyright 2017 FUJITSU LIMITED0

PGConf.ASIA 2017

Page 2: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Who am I?

Takayuki Tsunakawa PostgreSQL contributor PostgreSQL Enterprise Consortium member

(PostgreSQL Ecosystem Wiki maintainer)

Develop/Maintain/SupportFUJITSU Software Enterprise Postgres

(PostgreSQL-based product) Support open source PostgreSQL in various products

Copyright 2017 FUJITSU LIMITED1

Page 3: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

AgendaWhy is multi-model necessary? (background)What is multi-model database?How should we implement it?

Copyright 2017 FUJITSU LIMITED2

Page 4: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Why is multi-model necessary?

Copyright 2017 FUJITSU LIMITED3

Page 5: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Big Data

Copyright 2017 FUJITSU LIMITED

Variety

VelocityVolume

4

Page 6: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Can PostgreSQL Handle Big Data?

Copyright 2017 FUJITSU LIMITED

Variety

Volume Velocity

Key-value modelhstore type

Document modeljsonb type

PartitioningPostgreSQL 10~

ScaleoutPostgres-XL Citus

(fork) (extension)

GPUPG-Strom

(extension)

StreamingPipelineDB

(fork)

In-memory columnarIn developing

Persistent memory, FPGA, SIMD

N/A

5

Page 7: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Developer productivity with flexible data model Can handle various data types as-is (array, list, object, graph, etc.)No need to map to relational model (eliminate ORM)

High scalability Can store and process voluminous data Can handle many requests simultaneously

Fault tolerance

Why NoSQL Attracts Attention?

Copyright 2017 FUJITSU LIMITED

array list object

graph ・・・relational modelMapping

is not needed

voluminous dataapplication

6

Page 8: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Data Models

Copyright 2017 FUJITSU LIMITED

Data model Representative DBMSsRelational Oracle, MySQL, SQL Server, PostgreSQLKey-value Redis, MemcachedDocument MongoDB, CouchBase, MarkLogicGraph Neo4jWide columnar Cassandra, HbaseRDF MarkLogic, Virtuoso, OracleText search Elasticsearch, Apache SolrTime series InfluxDBMulti-dimensional array rasdaman, SciDBEvent Event Store, NEventStoreObject InterSystems Cache

7

Page 9: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Polyglot PersistenceUse multiple DBMSs in one system/application Spread by Martin Fowler

Copyright 2017 FUJITSU LIMITED

Graph

Key-value

Document Wide columnar

Data models in online shopping application

RDBWeb sessionshopping cartuser profile

customerorder

recommendationproduct catalog

Web access log

application

8

Page 10: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Multiple DBMSs Use Leading tech companies use many DBMSs (ex. Netflix)

Copyright 2017 FUJITSU LIMITED

Data model DBMSsRelational MySQL, Redshift

Key-value Memcached, Redis, Hollow (developed by Netflix )

Text search Elasticsearch

Wide columnar Cassandra

Time series Atlas (developed by Netflix )

Event Druid

9

Page 11: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Problems (1/2)Data silo to prevent cross-sectional data analysis Time-consuming and laborious ETL Complex logic in application (fetch, join, aggregation, sort)

Data consistency among DBMSsDistributed transaction is not available in all DBMSs

Infrastructure cost increase due to duplication of data

Copyright 2017 FUJITSU LIMITED

・・・ ・・・

Key-value Graph

RDB Document

10

Page 12: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Problems (2/2)Operational complexity Product/OSS software management, support/service contracts Infrastructure provisioning (server, storage, network)Deployment, patching, testing, configuration, version control Security: user management, access control, encryption, auditingMonitoring and diagnosis, performance tuning, troubleshootingHA: backup/recovery, local failover, disaster recovery

Steep learning curve for developersDBMS-specific non-SQL API and SQL-like query language Transaction control, consistency model, application tuning

Lack of skilled personnel

Copyright 2017 FUJITSU LIMITED

? ?

11

Page 13: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

What is multi-model database?

Copyright 2017 FUJITSU LIMITED12

Page 14: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Overview Support multiple data models in one DBMS

Copyright 2017 FUJITSU LIMITED

etc・・・

RDB Graph Key-value Document

application

Very smart!

13

Page 15: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Merits

"All-in-one" is convenient, just like a smartphone

Copyright 2017 FUJITSU LIMITED

Smooth data utilization with less data integration

Higher developer productivity

Lower cost for infrastructure and DBA

14

Page 16: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Multi-model Database Examples

Copyright 2017 FUJITSU LIMITED

DBMS Supported data modelsArangoDB key-value, document, graph

Cosmos DB key-value, document, graph

CouchBase key-value, document

DataStax(on Cassandra) key-value, wide column, graph

MarkLogic document, text/binary, RDF

OrientDB key-value, document, graph, text/binary

15

Page 17: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Trends of Major DBMSsMajor RDBMSs are adding data modelsNoSQL DBMSs are also adding data models

Copyright 2017 FUJITSU LIMITED

DBMS Key-value Document Wide column Graph

Oracle ++ +MySQL ++ +SQL Server + +MongoDB + ++ +PostgreSQL + +

Data model support in top 5 popular DBMSs

16

Page 18: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

PostgreSQL as a Multi-model DatabaseWhy based on RDBMS?

Why based on PostgreSQL?

Copyright 2017 FUJITSU LIMITED

Mature storage engine and transaction management Smart optimizer Prevalent RDBMS gives more people the chance to use

RDBMS has

Extensibility as a data platform Liberal community open to niche data models

PostgreSQL has

17

Page 19: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

How should we implement multi-model database?

Copyright 2017 FUJITSU LIMITED18

Page 20: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

What is Data Model?

Copyright 2017 FUJITSU LIMITED

Data model = Structure + Constraint + OperationData model

Structure

table, row, column

key, value

node, relationship,property, label

Constraint

unique, referential, check, not null, ...

unique

unique, node existence

Operation

scan, join, restriction,

projection, …

get, put

scan, join, restriction, projection,

pattern match, …

Relational

Key-value

Graph

19

Page 21: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Query Language and API Adopt standard and well-known languages/APIs per

data modelDeveloper productivity: leverage skill/know-how/asset Rich information for learning Standard compliance and popularity for ecosystem

Examples

Copyright 2017 FUJITSU LIMITED

Data model languages/APIs Key-value Redis API, Memcached APIDocument SQL/JSON path (SQL standard), MongoDB APIGraph Cypher, GremlinRDF SPARQL (W3C standard)Array SQL/MDA (Multi-Dimensional Array) (future SQL standard)

20

Page 22: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Multi-model Approach 1 Flexible Schema Data (FSD) Leverage RDBMS’s user defined data type, function, and index Store/access data in a table column with functions in SQLUsed for XML, JSON, geospatial data

Copyright 2017 FUJITSU LIMITED

http://cidrdb.org/cidr2015/Papers/CIDR15_Paper5.pdfreference :

Relational Data Flexible SchemaData (FSD)

RDBMSSQLSQL

NoSQL API

application

user defined data type,function, index

21

Page 23: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Multi-model Approach 2 Independent data model componentsQuery language and API for each data modelData is optionally separated from relational dataUse for Graph, RDF, time series, event…

Independence ensures performance for each data model

Copyright 2017 FUJITSU LIMITED

ParserTransformer

PlannerExecutor

Graph Parser

RDF

Que

ry P

roce

ssor

Cypher, GremlinSPARQL

ParserTransformer

PlannerExecutor

SQL

Relational

application

Storage engineStorage engine

22

Page 24: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Examples Based on Approach 2Graph model: AgensGraph (fork) https://github.com/bitnine-oss/agensgraph

Time series model: TimescaleDB (extension) https://github.com/timescale/timescaledb

Copyright 2017 FUJITSU LIMITED23

Page 25: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Pluggable Data ModelWant to facilitate data model development Introduce 3 pluggable objectsQuery language : generate parse tree from query stringData model : generate query plan from parse tree and run it Region : combination of query language and data model

Copyright 2017 FUJITSU LIMITED

Data model as an extension

Querylanguage

Data model

Relational model

PostgreSQL

Querylanguage

Data model

Graph model

Querylanguage

RDF model24

Page 26: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Multi-model QueryMix queries for multiple data models in a query string Execute query in a specified region

in_region(region_name, query string) Convert data across regions

cast_region(source data, dest region name,dest container, dest schema)

Copyright 2017 FUJITSU LIMITED

-- Among Chinese restaurants in Tokyo, -- list up to 5 top ones among friends' friends SELECT r.name, g.num_likers FROM restaurant r,

cast_region(in_region('graph_cypher',

'MATCH (:Person {name:"Taro"})-[:IS_FRIEND_OF*1..2]-(friend),(friend)-[:LIKES]->(restaurant:Restaurant)RETURN restaurant.name, count(*)'),

'relational', 'g', '(name text, num_likers int') WHERE r.name = g.name AND r.city = 'Tokyo' AND r.cuisine = ‘chinese'ORDER BY g.num_likers DESC LIMIT 5;

25

Page 27: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Mixed-model Query Execution

Copyright 2017 FUJITSU LIMITED

Multi-model query plan

relation:table/index scanrestaurant

relation:join

relation:sort

graph:pattern matchIS_FRIEND_OF

graph:node scanPerson

26

Page 28: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Document Model PostgreSQL supports JSON since 2012, but…Different SQL/JSON was standardized in SQL:2016 Store JSON data in character/binary column Intuitive function and SQL/JSON path language Powerful JSON_TABLE function to map JSON to relational data

Support for SQL/JSON is being developed in community

Copyright 2017 FUJITSU LIMITED

SELECT JSON_VALUE(jcol, '$.name') AS name, JSON_QUERY(jcol, '$.skills') AS skills

FROM empWHERE

JSON_EXISTS(jcol, '$.projects[*] ? (@.category == "IoT")');

SELECT jcol ->> 'name' AS name, jcol -> 'skills' AS skills

FROM empWHERE

jcol @> '{ "projects": [{ "category": "IoT" }] }';

Query in current PostgreSQL Query in SQL/JSON

27

Page 29: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Index

Graph Model The key is performance in storage engine RDB is slow to traverse graph due to index scan Eliminate index scan using direct pointers between recordsNode traversal cost drops from O(n) to O(1)

Copyright 2017 FUJITSU LIMITED

Friend Friend

FriendFriend

Native graph

Friend

Jill JackJohn JackJohn JillJack JillJack John

Graph in RDBMS

John Jack Jill

John

Jack

28

Page 30: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Key-value Model PostgreSQL has hstore data type, but Less performant than expectedUnfamiliar API

Solution: Redis in the background workerMaximal performance by bypassing SQL processor Familiar, developer-friendly Redis API

Copyright 2017 FUJITSU LIMITED

SQL processor storage engine

Redis API (get/put)

application

DiskTable

29

Page 31: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Conclusion

Copyright 2017 FUJITSU LIMITED

Multi-model is necessary for broader use of PostgreSQL

PostgreSQL 10

PostgreSQL 11 Build pluggable data model infrastructure Add/Improve popular data models:

key-value, SQL/JSON, graph

PostgreSQL 12 Add other (niche?) data models

30

Page 32: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Let’s do it together! Search “multi-model“ in pgsql-hackers mailing list

Any idea/wish comment as a user is welcome Contact me if inconvenient (Japanese/English OK)

[email protected]

Copyright 2017 FUJITSU LIMITED31

Page 33: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

32

Page 34: Road to a Multi-model Database - PGConf ASIA · Document MongoDB, CouchBase, MarkLogic ... performance tuning, troubleshooting HA: ... Road to a Multi-model Database Author:

Copyright 2017 FUJITSU LIMITED

Questions?

33