Page 1: SQL Access to NoSQL

SQL Access to NoSQL

Brody Messmer and Phil Prudich

Page 2: SQL Access to NoSQL

© 2016 Progress Software Corporation and/or its subsidiaries or affiliates. All rights reserved.

Agenda

What is NoSQL?

• The Benefits

• Implementations: MongoDB, Cassandra, & MarkLogic

• NoSQL Data Model

• Challenges

DataDirect’s Connectors

• The Benefits

• What You Need to Know

• Case Studies

Page 3: SQL Access to NoSQL

What is NoSQL?

Sample JSON Document (MongoDB):

{ name: "sue", age: 26, status: "A", groups: ["news", "sports"] }

Relational database design focuses on data storage

NoSQL database design focuses on data use

Key Value Store (Cassandra):

Page 4: SQL Access to NoSQL

A Little Humor…

Page 5: SQL Access to NoSQL

Benefits of NoSQL

High Performance

• Data can easily be partitioned across multiple nodes

• Low default isolation levels for both read and write operations

• Object models (Denormalized schema design) reduce need for expensive joins

• Typical index support, even on fields within embedded documents and arrays

High Availability & Fault Tolerance

• Replica sets / nodes provide automatic failover and data redundancy

Easily Scale Up or Scale Out

• Capable of running on commodity hardware

Cost

Page 6: SQL Access to NoSQL

CAP Theorem

Page 7: SQL Access to NoSQL

Implementations

MongoDB

Type: JSON Document Store

Query Language: API / PSQL

Typical Use Case:

• Web Applications (especially when built with JavaScript)

Additional Benefits:

• Node.js / Web friendly -- JSON

• Dynamic schema

Apache Cassandra

Type: Key Value Store

Query Language: CQL

Typical Use Case:

• Real-time analytic workloads

• Heavy Writes, with desire for reporting

Additional Benefits:

• Especially High Availability with CAP focus on Availability and Partition Tolerance

MarkLogic

Type: Multi-Model

Query Language: API

Typical Use Case:

• Search

• Recommendation Engine

Additional Benefits:

• Handles any type of data

Page 8: SQL Access to NoSQL

Schema Design Comparison

NoSQL Document Design

Collection: users

{ user: {
    first: "Brody",
    last: "Messmer", ...
  },
  purchases: [
    { symbol: "PRGS", date: "2013-02-13", price: 23.50, qty: 100, ... },
    { symbol: "PRGS", date: "2012-06-12", price: 20.57, qty: 100, ... },
    ...
  ]
}
...

VS

Relational Design

Table: users

user_id  first  last     …
123456   Brody  Messmer  …

Table: purchases

user_id  symbol  date        price  qty  …
123456   PRGS    2013-02-13  23.50  100  …
123456   PRGS    2012-06-12  20.57  100  …

Page 9: SQL Access to NoSQL

Is the “Object” model of NoSQL really used? Yes!!

Depth of arrays/document nesting?

Page 10: SQL Access to NoSQL

A Strong Need for SQL on NoSQL DBs

[Diagram: Business Intelligence and Data Integration tools connect through ODBC / JDBC to NoSQL, RDBMS, and SaaS data sources]

Page 11: SQL Access to NoSQL

Connectivity to NoSQL Databases Is Hard

Cassandra MongoDB Challenges

Non-Standard Query Language

Lack of Common RDBMS Functionality

• No Support for Joins

• Limited support for filters, aggregates, etc.

• Sorting is not ANSI SQL Compliant

• No ACID Transactions

• Unique Authentication

Non-relational Schema

• Heavy use of complex types (denormalized data model)

• Self-describing schema – Can only discover columns by selecting data

• Primary / Foreign Keys maintained by apps, not the database

Frequent Release Cadence

Page 12: SQL Access to NoSQL

DataDirect’s Connectors

Page 13: SQL Access to NoSQL

“Normalizing” the NoSQL Data Model – to Infinity!

Collection Name: stock

{ symbol: "PRGS",
  purchases: [
    { date: ISODate("2013-02-13T16:58:36Z"), price: 23.50, qty: 100 },
    { date: ISODate("2012-06-12T08:00:01Z"), price: 20.57, qty: 100,
      sellDate: ISODate("2013-08-16T12:34:58Z"), sellPrice: 24.60 }
  ]
}

Table Name: stock

_id  symbol
1    PRGS

Table Name: stock_purchases

stock_id  Date                 Price  qty  sellDate             sellPrice
1         2013-02-13 16:58:36  23.50  100  NULL                 NULL
1         2012-06-12 08:00:01  20.57  100  2013-08-16 12:34:58  24.60

The Benefits:

Re-use of existing skills (SQL, Joins, etc.)

• Exposing complex types using concepts familiar to those savvy with RDBMS

As arrays/lists/maps/sets grow, table definitions remain constant

Simplified / Narrower table definitions

Joins across parent/child tables result in a single query to the database. In other words, there’s no performance penalty.

Data in arrays can be sorted and/or aggregated via SQL
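
To make the parent/child mapping concrete, here is a minimal Python sketch of the idea, not the DataDirect implementation. It splits the slide's `stock` document into a `stock` row and `stock_purchases` rows, then joins and aggregates them with plain SQL. It uses an in-memory SQLite database as a stand-in for the relational view; the `normalize` helper and the string-formatted dates are assumptions made for illustration.

```python
import sqlite3

# Sample document modeled on the slide's "stock" collection.
# Dates are kept as plain strings here (an assumption for simplicity).
stock_doc = {
    "_id": 1,
    "symbol": "PRGS",
    "purchases": [
        {"date": "2013-02-13 16:58:36", "price": 23.50, "qty": 100},
        {"date": "2012-06-12 08:00:01", "price": 20.57, "qty": 100,
         "sellDate": "2013-08-16 12:34:58", "sellPrice": 24.60},
    ],
}


def normalize(doc):
    """Hypothetical helper: one parent row plus one child row per array element."""
    parent = (doc["_id"], doc["symbol"])
    children = [
        (doc["_id"], p["date"], p["price"], p["qty"],
         p.get("sellDate"), p.get("sellPrice"))  # missing fields become NULL
        for p in doc["purchases"]
    ]
    return parent, children


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (_id INTEGER PRIMARY KEY, symbol TEXT)")
conn.execute(
    "CREATE TABLE stock_purchases ("
    "stock_id INTEGER, date TEXT, price REAL, qty INTEGER, "
    "sellDate TEXT, sellPrice REAL)"
)

parent, children = normalize(stock_doc)
conn.execute("INSERT INTO stock VALUES (?, ?)", parent)
conn.executemany("INSERT INTO stock_purchases VALUES (?, ?, ?, ?, ?, ?)", children)

# Join parent and child tables, then aggregate the array data with SQL.
summary = conn.execute(
    "SELECT s.symbol, COUNT(*), SUM(p.qty), MIN(p.date) "
    "FROM stock s JOIN stock_purchases p ON p.stock_id = s._id "
    "GROUP BY s.symbol"
).fetchone()
print(summary)
```

Adding more purchases to the array only adds rows to `stock_purchases`; neither table definition changes, which is the point the slide makes about growing arrays.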

Page 14: SQL Access to NoSQL

Full ANSI SQL Query Support

Full SQL support for operations that may not be supported by the DB:

• Complete Join Support

• Full Where Clause support

• Aggregates and Scalar Functions

• Group by, having

• ANSI SQL Compliant Sorting

Tested against real-world NoSQL data models

Limitations:

Write Support Often Crippled

Create/Drop Table Often not Supported

Page 15: SQL Access to NoSQL

Performance

Push-down operations whenever possible

• Where Clause

• Limit, Offset

• Order by

• Aggregates (Sum, Avg, Max, Min, Count)

• Group by, Having

Highly performant, multi-threaded SQL Engine

Efficient use of memory and limited caching to disk

Advanced sorting algorithm when required

We take performance seriously. Losses are defects
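
As a rough illustration of what "push-down" can mean for MongoDB, the Python sketch below turns the parts of a simple SQL query into MongoDB aggregation stages (`$match`, `$group`, `$sort`, `$limit`), so the filtering and aggregation would run in the database rather than in the client. The `build_pipeline` function, its parameters, and the flat `purchases` collection are hypothetical; this is not a description of how the DataDirect drivers translate SQL.

```python
def build_pipeline(where=None, group_by=None, aggregates=None,
                   order_by=None, limit=None):
    """Hypothetical translator from simple SQL clauses to aggregation stages."""
    pipeline = []
    if where:
        # WHERE clause -> $match
        pipeline.append({"$match": where})
    if group_by:
        # GROUP BY plus aggregate functions -> $group
        group = {"_id": f"${group_by}"}
        for alias, (op, field) in (aggregates or {}).items():
            group[alias] = {f"${op}": f"${field}"}
        pipeline.append({"$group": group})
    if order_by:
        # ORDER BY -> $sort (1 = ascending, -1 = descending)
        pipeline.append({"$sort": dict(order_by)})
    if limit is not None:
        # LIMIT -> $limit
        pipeline.append({"$limit": limit})
    return pipeline


# SELECT symbol, SUM(qty) AS total_qty FROM purchases
# WHERE price > 20 GROUP BY symbol ORDER BY total_qty DESC LIMIT 10
pipeline = build_pipeline(
    where={"price": {"$gt": 20}},
    group_by="symbol",
    aggregates={"total_qty": ("sum", "qty")},
    order_by=[("total_qty", -1)],
    limit=10,
)
for stage in pipeline:
    print(stage)
```

Anything that cannot be expressed as a stage like these (for example, a join the database does not support) would have to be evaluated by the client-side SQL engine instead.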

Page 16: SQL Access to NoSQL

Connectivity to NoSQL Databases Is Easy!

Cassandra MongoDB DataDirect Connectors

Challenges

Standard Query Language

Common RDBMS Functionality

• Full Join Support

• Full ANSI SQL-like support for filters, aggregates, etc.

• ANSI SQL Compliant Sorting

• ACID Transactions

• Unique Authentication

Relational Schema

• Exposes complex types for relationally minded applications/users

• Auto-discovers and exposes schema when necessary

• Helps enforce column constraints

Simplify Support for New Database Releases

Driving Innovation in the Market:

Introduced and normalized SQL connectivity for NoSQL

Most complete pushdown operations

Most complete SQL Support

Highest performing drivers

Recognition:

The only ODBC/JDBC driver certified and recommended by MongoDB, Inc.

Page 17: SQL Access to NoSQL

What You Need to Know

Test using realistic data sets & server setups

• Nested / complex data

• Size of data

Test using vendor provided sample data and default DB install

Dynamic schema woes

• Opportunity for infinite data modeling techniques

• Schema inference limitations

• Abnormal sorting and group by results

Imposing constraints on strings

Isolation Levels

ACID transactions

Page 18: SQL Access to NoSQL

CASE STUDY

CHALLENGE

MongoDB became a production database in Killik & Co’s infrastructure, and the team began to move many processes from SQL to MongoDB. Various departments began asking for data for reporting purposes, which necessitated real-time connectivity between SQL Server and MongoDB.

SOLUTION

Using Progress DataDirect Connect for ODBC, Killik & Co can expose the data in the MongoDB database as normalized relational tables, enabling the team to query, sort and aggregate data from both systems to gain a far more comprehensive view of its customers.

Page 19: SQL Access to NoSQL

Geek Bit - End to End

{

"_id": "c792351c-05b3-4794-b9e3-9cddccc1fb0f",

"audit": {

"created": "2014-09-03T18:06:27+01:00",

"userCreated": "Cater, Simon"

},

"data": {

"code": "G1234567G",

"name": "Dr S A Cater",

"type": "MPG",

"properties": {

"objective": "Killik Growth",

"reportTitle": "Dr S A Cater",

"modelResult": {

"rules": "Passed",

"guidelines": "Passed",

"lastRun": "2015-10-29T06:40:55+00:00",

"lastPassed": "2015-10-29T06:40:55+00:00"

},

"equityTarget": "85",

"nonEquityTarget": "15"

},

"scope": {

"clientReportScope": "CATER1,CATER2,-A123511:Managed Portfolio"

}

},

"owner": "Ipswich"

}
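
A document nested this deeply has to be flattened before a SQL tool can treat it as a row. The Python sketch below shows one possible way to do that, under the assumption that nested field names are joined with an underscore; the actual column names the driver exposes are not stated on the slide. The `flatten` helper is hypothetical, and the sample is a trimmed copy of the document above.

```python
def flatten(doc, prefix=""):
    """Hypothetical helper: turn nested sub-documents into flat column names."""
    columns = {}
    for key, value in doc.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-documents, extending the column-name prefix.
            columns.update(flatten(value, name))
        else:
            columns[name] = value
    return columns


# Trimmed copy of the document shown above.
doc = {
    "_id": "c792351c-05b3-4794-b9e3-9cddccc1fb0f",
    "audit": {"created": "2014-09-03T18:06:27+01:00", "userCreated": "Cater, Simon"},
    "data": {
        "code": "G1234567G",
        "properties": {
            "objective": "Killik Growth",
            "modelResult": {"rules": "Passed", "guidelines": "Passed"},
        },
    },
    "owner": "Ipswich",
}

row = flatten(doc)
for column, value in row.items():
    print(column, "=", value)
```

Sub-documents flatten into extra columns on the same row, while arrays (not present in this sample) are the case that calls for a separate child table.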

Page 20: SQL Access to NoSQL

TIBCO Jaspersoft’s Journey with MongoDB

Native MongoDB Driver

• Reports created by IT/Dev

• Requires knowledge of MongoDB native query language

ETL

• Extract data from MongoDB

• Good to blend with other data but not using power of MongoDB

In-Memory Virtualization

• Allows blending data and end user driven reports & analytics

• Slow, hard to model data

Embedded Progress Driver

• Full reporting, dashboards, analytics driven by end users

• Easy metadata

• Use full power of MongoDB with complex schemas

Timeline: 2011, 2012, 2013, 2015

Page 22: SQL Access to NoSQL

Questions?

Page 23: SQL Access to NoSQL
Page 24: SQL Access to NoSQL

Additional Important Traits of a NoSQL Connector

Close Relationship with NoSQL Vendors

• DataDirect’s drivers are the only ones recommended and certified by MongoDB

Configurable Fine Grained Control of Schema Map

OEM / ISV Friendly

• MongoDB’s BI Connector is not embeddable

Page 25: SQL Access to NoSQL

MongoDB BI Connector from Progress DataDirect vs MongoDB

Columns compared: MongoDB (Q4 2015) and Progress DataDirect (Q1 2014)

Supported Versions

• MongoDB: MongoDB Enterprise Advanced 3.2

• Progress DataDirect: Supported with v2.2, 2.4, 2.6, 3.0, 3.2; Free Software Foundation's GNU AGPL v3.0; MongoDB Professional; MongoDB Enterprise Advanced

Known Workloads

• MongoDB: Data Visualization (extract)

• Progress DataDirect: Data Visualization (extract); Connect-Live; Operational BI; Data Federation

Deployment

• MongoDB: BI Desktop and/or Application Server; BI Connector on Linux Server Node(s)

• Progress DataDirect: BI Desktop and/or Application Server

Interface

• MongoDB: Postgres xDBC ANSI SQL

• Progress DataDirect: MongoDB xDBC ANSI SQL

Fully Embeddable

• MongoDB: n/a

• Progress DataDirect: Yes

Certification

• MongoDB: MongoDB

• Progress DataDirect: MongoDB; DataDirect OVS/JVS (includes ISV suites)

Open source

• MongoDB: No

• Progress DataDirect: No

Client Support

• MongoDB: Postgres open source community

• Progress DataDirect: Commercial (includes TSANet Multi Vendor Support)

Page 26: SQL Access to NoSQL

Introducing the Schema Tool

Allows you to quickly and easily normalize the MongoDB schema

• Samples MongoDB Data

• Sets SQL datatypes if the field/column type is consistent (else defaults to varchar)

• Automatically normalizes complex types

Perfect your schema

• Adjust SQL Types and sizes

• Alter column/table names

• Hide Columns/Tables/Databases

• Add Columns

View statistics about your MongoDB data

• Schema consistency (i.e., data type consistency for a field/column)

• Max String length per field/column

• Min and Max elements in an array object

Creates a contract of the schema the driver will expose to ODBC/JDBC apps

As the MongoDB schema changes (new fields added), an application will have to opt-in to these changes

This ensures MongoDB schema changes don’t break your app!

This “contract” is stored as an XML file on the client
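
The sampling and "contract" ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `infer_schema` and `project` functions, the SQL type names, and the dictionary layout are hypothetical and do not reflect the Schema Tool's actual behavior or its XML file format. The sketch samples documents, picks a SQL type when a field's type is consistent (falling back to VARCHAR otherwise), records the maximum string length, and then exposes only the fields listed in the contract.

```python
from collections import defaultdict

# Assumed mapping from Python value types to SQL type names.
SQL_TYPES = {str: "VARCHAR", int: "INTEGER", float: "DOUBLE", bool: "BOOLEAN"}


def infer_schema(sample):
    """Hypothetical schema inference over a sample of documents."""
    seen_types = defaultdict(set)
    max_len = defaultdict(int)
    for doc in sample:
        for field, value in doc.items():
            if value is None:
                continue  # NULLs say nothing about the field's type
            seen_types[field].add(type(value))
            if isinstance(value, str):
                max_len[field] = max(max_len[field], len(value))

    schema = {}
    for field, types in seen_types.items():
        consistent = len(types) == 1
        # Consistent type -> mapped SQL type; mixed types -> VARCHAR.
        sql_type = SQL_TYPES.get(next(iter(types)), "VARCHAR") if consistent else "VARCHAR"
        schema[field] = {
            "type": sql_type,
            "consistent": consistent,
            "max_length": max_len.get(field),  # None if no strings were seen
        }
    return schema


def project(doc, contract):
    """Expose only the contracted columns; new fields stay hidden until opted in."""
    return {field: doc.get(field) for field in contract}


sample = [
    {"name": "sue", "age": 26},
    {"name": "bob", "age": "unknown"},
    {"name": "alexandra", "age": 41, "score": 9.5},
]
contract = infer_schema(sample)
print(contract)

# A later document gains a field that is not part of the contract.
new_doc = {"name": "dan", "age": 30, "nickname": "d"}
print(project(new_doc, contract))
```

Because the application reads through the contract, the new `nickname` field does not appear until the contract is regenerated or edited, which mirrors the opt-in behavior the slide describes.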

Page 27: SQL Access to NoSQL

DataDirect MongoDB ODBC and JDBC drivers (released Q1 2014)

First Reliable MongoDB Connector (Unlimited Normalization) and only one certified by MongoDB, Inc.

Picked up latest MongoDB features such as WiredTiger Engine Support, Aggregation Framework, Security such as SSL

Support across Windows, Linux, AIX, Solaris, HP-UX