Building a Hybrid Data Cluster with MongoDB and Postgres
A solution based on PostgreSQL’s Foreign Data Wrapper
27 April 2015
Context and Customer scenario
Customer Requirements for Hybrid Cluster
- More and more unstructured data is being generated
- Increasing use of, and demand for, NoSQL databases because of
  - usage scenarios
  - the ability to scale horizontally
- Challenges
  - Many administrators and developers still prefer SQL as an easy and intuitive tool for querying the available data
  - Few NoSQL databases support complex queries the way SQL does, e.g. JOINs and sub-queries
Real Life Use Cases
- NoSQL as an archive store for an RDBMS
  - The RDBMS stores the operational and transactional data
  - while the NoSQL database acts as an archive store for historical data
- NoSQL for receiving write streams
  - NoSQL databases accumulate data from various sources with high write throughput across multiple shards
  - while the RDBMS stores the filtered data after it has been transformed into proper structures
  - The RDBMS makes it easier for users to query the data using SQL and JOINs
Hybrid Data Cluster is the ‘need of the hour’
PostgreSQL
- Most advanced open source database
- Supports the relational model for storing data
- Supports ACID transaction features
  - Multi-Version Concurrency Control (MVCC)
  - Write-Ahead Log (WAL) files
- Scalability with tablespaces and partitions/child tables
- Supports unstructured data types (JSON, JSONB, HSTORE) and full-text search
MongoDB
- Most popular NoSQL database for a vast set of workloads
- Best for storing unstructured data
- Horizontal scalability through sharding
- Provision for secondary indexes
- Aggregation and map-reduce features
Foreign Data Wrappers of PostgreSQL
- Get the best of both worlds
- Based on SQL/MED (Management of External Data)
- Allow you to create FOREIGN TABLES that map to external entities
- These entities could be
  - a table in an RDBMS
  - a collection in MongoDB
  - or the respective entities in HDFS or a file system
- More about FDW in Postgres: https://wiki.postgresql.org/wiki/Foreign_data_wrappers
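As a minimal sketch of the foreign-table idea for the file system case, the following uses file_fdw, the wrapper that ships with PostgreSQL as a contrib module. The server name, table name, columns, and CSV path are hypothetical and only illustrate the pattern: create the extension, define a server, then map a foreign table to the external entity.

```sql
-- file_fdw is distributed with PostgreSQL as a contrib module.
CREATE EXTENSION file_fdw;

-- A foreign server object is required even though the file is local.
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Hypothetical CSV file and columns, for illustration only.
CREATE FOREIGN TABLE warehouse_csv (
    warehouse_id   int,
    warehouse_name text
)
SERVER csv_files
OPTIONS (filename '/tmp/warehouse.csv', format 'csv', header 'true');

-- The foreign table is queried like any other table.
SELECT warehouse_id, warehouse_name FROM warehouse_csv;
```

The same three steps apply to other wrappers; only the wrapper name and the OPTIONS differ.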
FDW for MongoDB
MongoDB FDW
- Started by CitusDB and later forked by EnterpriseDB
- More details: https://github.com/EnterpriseDB/mongo_fdw
- The example discussed here is based on a blog post from EnterpriseDB: http://www.enterprisedb.com/postgres-plus-edb-blog/jason-davis/tales-trenches-new-mongodb-fdw
- Let’s go through the demo
Preparing the MongoDB
Prepare for a MongoDB Cluster
- Platform: Windows 7
- Create the directories that you will need
  - cd d:\mongodb
  - mkdir a0
  - mkdir b0
  - mkdir c0
  - mkdir c1
  - mkdir c2
  - mkdir d0
  - mkdir d1
  - mkdir d2
  - mkdir cfg0
  - mkdir cfg1
  - mkdir cfg2
Create the services for MongoDB Cluster: ConfigServer
mongod --configsvr --dbpath d:\mongodb\cfg0 --port 26050 --install --logpath d:\mongodb\cfg0.log --serviceName new_mongod_cfg0 --serviceDisplayName new_mongod_cfg0
net start new_mongod_cfg0
mongod --configsvr --dbpath d:\mongodb\cfg1 --port 26051 --install --logpath d:\mongodb\cfg1.log --serviceName new_mongod_cfg1 --serviceDisplayName new_mongod_cfg1
net start new_mongod_cfg1
mongod --configsvr --dbpath d:\mongodb\cfg2 --port 26052 --install --logpath d:\mongodb\cfg2.log --serviceName new_mongod_cfg2 --serviceDisplayName new_mongod_cfg2
net start new_mongod_cfg2
Create the services for MongoDB Cluster: Create Shards
mongod --shardsvr --replSet a --dbpath d:\mongodb\a0 --logpath d:\mongodb\a0.log --port 27000 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a0 --serviceDisplayName new_mongod_shrd_a0
net start new_mongod_shrd_a0
mongod --shardsvr --replSet b --dbpath d:\mongodb\b0 --logpath d:\mongodb\b0.log --port 27100 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_b0 --serviceDisplayName new_mongod_shrd_b0
net start new_mongod_shrd_b0
mongod --shardsvr --replSet c --dbpath d:\mongodb\c0 --logpath d:\mongodb\c0.log --port 27200 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_c0 --serviceDisplayName new_mongod_shrd_c0
net start new_mongod_shrd_c0
Create the services for MongoDB Cluster: Optionally Create the Replicas
- For simplicity we have skipped creating additional replica set members here, but you can add them, e.g.
  - mkdir a1
  - mongod --shardsvr --replSet a --dbpath d:\mongodb\a1 --logpath d:\mongodb\a1.log --port 27001 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a1 --serviceDisplayName new_mongod_shrd_a1
  - net start new_mongod_shrd_a1
Initiate the Mongos
- mongos --configdb sameer:26050,sameer:26051,sameer:26052 --install --serviceName new_mongos_svc0 --serviceDisplayName new_mongos_svc0 --logpath d:\mongodb\mongos0.log --port 26060
- net start new_mongos_svc0
Initiate the Replica Set
- I am going to initiate a single-member replica set for each of my shards
- Shard A
  mongo --port 27000
  > rs.initiate()
  a:OTHER> rs.conf()
  a:PRIMARY> exit
- Shard B
  mongo --port 27100
  > rs.initiate()
  b:OTHER> rs.conf()
  b:PRIMARY> exit
- Shard C
  mongo --port 27200
  > rs.initiate()
  c:OTHER> rs.conf()
  c:PRIMARY> exit
Setup Sharding
mongo --port 26060 test
mongos> sh.addShard("sameer:27100")
mongos> sh.addShard("sameer:27200")
mongos> sh.addShard("sameer:27000")
mongos> sh.enableSharding("db")
mongos> sh.shardCollection("db.warehouse",{warehouse_created:1},true)
Setup Users and Security
mongos> use db
mongos> db.createUser(
... {
... user: "superuser",
... pwd: "password",
... roles: [ { role: "root", db: "admin" } ]
... }
... )
Creating FDW Extension in Postgres
Build MongoDB FDW
- Download the MongoDB FDW from GitHub
- Installation is quite easy when you use autogen.sh
  - cd $PATH_WHERE_FDW_IS_EXTRACTED
  - ./autogen.sh
- It will automatically install all the required components
  - libbson
  - libmongoc
- Once that is done, you can build and install
  - make -f Makefile.meta && make -f Makefile.meta install
Features of mongo_fdw
- Can be built with either the legacy driver or the master branch driver
- Read and write capability for foreign tables
- Connection pooling, which reuses the same MongoDB connection for queries in the same session
- Build with MongoDB's legacy branch driver
  - ./autogen.sh --with-legacy
- Build with MongoDB's master branch driver
  - ./autogen.sh --with-master
Using mongo_fdw
- Create the mongo_fdw extension in the PostgreSQL database
- You may create the table in a template database
- Create a foreign data server
- Create a user mapping for the MongoDB user in Postgres
- Create a foreign table that maps to a MongoDB collection
Create Foreign Server: Example
- psql=# CREATE EXTENSION mongo_fdw;
- psql=# CREATE SERVER mongo_server
    FOREIGN DATA WRAPPER mongo_fdw
    OPTIONS (address '192.168.160.1', port '26060');
- psql=# CREATE USER MAPPING FOR postgres
    SERVER mongo_server
    OPTIONS (username 'superuser',
             password 'password');
Create Foreign Table: Example
- psql=# CREATE FOREIGN TABLE warehouse(
    _id NAME,
    warehouse_id int,
    warehouse_name text,
    warehouse_created timestamptz)
  SERVER mongo_server
  OPTIONS (database 'db', collection 'warehouse');
‘_id’ column of MongoDB
- It stores a unique Object ID
- By default, if you omit this field, MongoDB inserts a 12-byte BSON ObjectId
- While inserting data into MongoDB you may choose the value of this field
- In mongo_fdw you have to define the _id column with the data type NAME
- mongo_fdw ignores the value inserted into the _id column and lets MongoDB generate it
DML on Foreign Tables
- INSERT INTO warehouse values (0, 1, 'UPS', '2014-12-12T07:12:10Z');
- INSERT INTO warehouse values (0, 2, 'EMS', '2013-12-12T07:12:10Z');
- INSERT INTO warehouse values (0, 3, 'ASX', '2013-11-12T07:12:10Z');
- UPDATE warehouse set warehouse_name = 'UPS_NEW' where warehouse_id = 1;
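The INSERT and UPDATE statements above can be followed by a read-back and a DELETE through the same foreign table. This is a minimal sketch that assumes the warehouse foreign table defined earlier and the three rows just inserted; the _id values returned are whatever MongoDB generated, so none are shown here.

```sql
-- Read the rows back, including the _id values that MongoDB generated
-- in place of the placeholder 0 supplied in the INSERT statements.
SELECT _id, warehouse_id, warehouse_name, warehouse_created
FROM warehouse
ORDER BY warehouse_id;

-- Remove one document from the MongoDB collection through the foreign table.
DELETE FROM warehouse WHERE warehouse_id = 3;
```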
Operations on MongoDB
- Connect to MongoDB
  - mongo --port 26060 --username superuser --password password
- Check the data in the collection
  - db.warehouse.find()
Operations in Postgres on Foreign Data
- You can run ANALYZE on the foreign table to collect statistics
- You can run queries with a WHERE clause
- You can run JOIN queries against other foreign tables or native PostgreSQL tables
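The three operations listed here might look as follows against the warehouse foreign table from the earlier slides. The native table shipment and its columns are hypothetical and exist only to give the JOIN something to join against.

```sql
-- Collect planner statistics for the foreign table.
ANALYZE warehouse;

-- Filter rows of the MongoDB collection with a WHERE clause.
SELECT warehouse_id, warehouse_name
FROM warehouse
WHERE warehouse_id = 1;

-- Hypothetical native PostgreSQL table, for illustration only.
CREATE TABLE shipment (
    shipment_id  serial PRIMARY KEY,
    warehouse_id int,
    shipped_on   date
);

-- Join the MongoDB-backed foreign table with the native table.
SELECT w.warehouse_name, count(s.shipment_id) AS shipments
FROM warehouse AS w
LEFT JOIN shipment AS s ON s.warehouse_id = w.warehouse_id
GROUP BY w.warehouse_name;
```

Running EXPLAIN on either SELECT shows how PostgreSQL plans the foreign scan.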
Live walkthrough of the Hybrid Cluster
Leverage complex SQL with sharded MongoDB
Benefits of this Setup
- Build a sharded MongoDB cluster with SQL Interface
- Query MongoDB data using SQL
- Join MongoDB collections with each other or with tables in Postgres
- Combine and process MongoDB data with data from other data sources with the help of the respective FDWs, e.g. Hadoop, Oracle, MySQL
- Add more shards on the go
- Add replicas for MongoDB on the go
- Use Postgres as a front end to insert/update/delete data in MongoDB using SQL