SQL Access to NoSQL
Brody Messmer and Phil Prudich
© 2016 Progress Software Corporation and/or its subsidiaries or affiliates. All rights reserved. 2
Agenda
What is NoSQL?
• The Benefits
• Implementations: MongoDB, Cassandra, & MarkLogic
• NoSQL Data Model
• Challenges
DataDirect’s Connectors
• The Benefits
• What You Need to Know
• Case Studies
What is NoSQL?
Sample JSON Document (MongoDB):
{ name: “sue”, age: 26, status: “A”, groups: [“news”, “sports”]}
Relational database design
focuses on data storage
NoSQL database design
focuses on data use
Key Value Store (Cassandra):
A Little Humor…
Benefits of NoSQL
High Performance
• Data can easily be partitioned across multiple nodes
• Low default isolation levels for both read and write operations
• Object models (Denormalized schema design) reduce need for expensive joins
• Typical index support, even on fields within embedded documents and arrays
High Availability & Fault Tolerance
• Replica sets / nodes provide automatic failover and data redundancy
Easily Scale Up or Scale Out
• Capable of running on commodity hardware
Cost
CAP Theorem
Implementations
MongoDB
Type: JSON Document Store
Query Language: API / PSQL
Typical Use Case:
• Web Applications (especially when built with JavaScript)
Additional Benefits:
• Node.js / Web friendly -- JSON
• Dynamic schema
Apache Cassandra
Type: Key Value Store
Query Language: CQL
Typical Use Case:
• Real-time analytic workloads
• Heavy Writes, with desire for reporting
Additional Benefits:
• Especially High Availability with CAP focus on Availability and Partition Tolerance
MarkLogic
Type: Multi-Model
Query Language: API
Typical Use Case:
• Search
• Recommendation Engine
Additional Benefits:
• Handles any type of data
Schema Design Comparison
NoSQL Document Design
Collection: users
{ user: {
first: “Brody”,
last: “Messmer”, ...
},
purchases: [
{ symbol: “PRGS”, date: “2013-02-13”, price: 23.50, qty: 100, ...},
{ symbol: “PRGS”, date: “2012-06-12”, price: 20.57, qty: 100, ...},
...
]
}
...
VS
Relational Design
Table: users
user_id first last …
123456 Brody Messmer …
…
Table: purchases
user_id symbol date price qty …
123456 PRGS 2013-02-13 23.50 100 …
123456 PRGS 2012-06-12 20.57 100 …
…
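As a rough illustration of this comparison, the Python sketch below holds the same sample data both ways: once as a single nested document and once as two SQLite tables. The table names, column names, and values come from the slide; the use of `sqlite3` and the total-quantity question are assumptions added only for demonstration.

```python
import sqlite3

# Document design: one nested record per user (values from the slide).
user_doc = {
    "user": {"first": "Brody", "last": "Messmer"},
    "purchases": [
        {"symbol": "PRGS", "date": "2013-02-13", "price": 23.50, "qty": 100},
        {"symbol": "PRGS", "date": "2012-06-12", "price": 20.57, "qty": 100},
    ],
}

# Reading the document needs no join: the purchases travel with the user.
doc_total_qty = sum(p["qty"] for p in user_doc["purchases"])

# Relational design: two tables linked by user_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, first TEXT, last TEXT)")
conn.execute(
    "CREATE TABLE purchases "
    "(user_id INTEGER, symbol TEXT, date TEXT, price REAL, qty INTEGER)"
)
conn.execute("INSERT INTO users VALUES (123456, 'Brody', 'Messmer')")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?, ?)",
    [
        (123456, "PRGS", "2013-02-13", 23.50, 100),
        (123456, "PRGS", "2012-06-12", 20.57, 100),
    ],
)

# The same question in SQL requires a join between the two tables.
row = conn.execute(
    """
    SELECT u.first, u.last, SUM(p.qty)
    FROM users AS u
    JOIN purchases AS p ON p.user_id = u.user_id
    GROUP BY u.user_id, u.first, u.last
    """
).fetchone()
sql_total_qty = row[2]
```

Both paths answer the same question; the document model trades the join for nesting, which is the design difference the slide contrasts.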
Is the “Object” model of NoSQL really used? Yes!!
Depth of arrays/document nesting?
A Strong Need for SQL on NoSQL DBs
[Diagram: Business Intelligence and Data Integration tools connect through ODBC / JDBC to NoSQL, RDBMS, and SaaS data sources]
Connectivity to NoSQL Databases Is Hard
Cassandra MongoDB Challenges
Non-Standard Query Language
Lack of Common RDBMS Functionality
• No Support for Joins
• Limited support for filters, aggregates, etc
• Sorting is not ANSI SQL Compliant
• No ACID Transactions
• Unique Authentication
Non-relational Schema
• Heavy use of complex types (denormalized data model)
• Self-describing schema – Can only discover columns by selecting data
• Primary / Foreign Keys maintained by apps, not the database
Frequent Release Cadence
DataDirect’s Connectors
“Normalizing” the NoSQL Data Model – to Infinity!
Collection Name: stock
{ symbol: “PRGS”,
purchases:[
{date: ISODate(“2013-02-13T16:58:36Z”), price: 23.50, qty: 100},
{date: ISODate(“2012-06-12T08:00:01Z”), price: 20.57, qty: 100,
sellDate: ISODate(“2013-08-16T12:34:58Z”), sellPrice: 24.60}
]
}
Table Name: stock
_id symbol
1 PRGS
Table Name: stock_purchases
stock_id date price qty sellDate sellPrice
1 2013-02-13 16:58:36 23.50 100 NULL NULL
1 2012-06-12 08:00:01 20.57 100 2013-08-16 12:34:58 24.60
The Benefits:
Re-use of existing skills (SQL, Joins, etc)
• Exposing complex types using concepts familiar to those savvy with RDBMS
As arrays/lists/maps/sets grow, table definitions remain constant
Simplified / Narrower table definitions
Joins across parent/child tables result in a single query to the database.
In other words, there’s no performance penalty.
Data in arrays can be sorted and/or aggregated via SQL
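The normalization idea can be sketched in a few lines of Python. This is a minimal illustration, not the driver's actual mapping logic, which the slides do not show: the `normalize` helper and its parameters are hypothetical, and the dates are kept as plain strings for simplicity. It splits the `stock` document into one parent row and one child row per array element, with a `stock_id` foreign key pointing back to the parent.

```python
def normalize(doc, table, array_field):
    """Split one document into a parent row and child rows.

    Hypothetical helper for illustration only.
    """
    # The parent row keeps every field except the embedded array.
    parent = {k: v for k, v in doc.items() if k != array_field}
    children = []
    for element in doc.get(array_field, []):
        child = {f"{table}_id": doc["_id"]}  # foreign key back to the parent row
        child.update(element)
        children.append(child)
    return parent, children


stock_doc = {
    "_id": 1,
    "symbol": "PRGS",
    "purchases": [
        {"date": "2013-02-13T16:58:36Z", "price": 23.50, "qty": 100},
        {"date": "2012-06-12T08:00:01Z", "price": 20.57, "qty": 100,
         "sellDate": "2013-08-16T12:34:58Z", "sellPrice": 24.60},
    ],
}

stock_row, purchase_rows = normalize(stock_doc, "stock", "purchases")

# Child rows share one column set; a field missing from an element is NULL (None).
columns = ["stock_id", "date", "price", "qty", "sellDate", "sellPrice"]
stock_purchases = [[row.get(col) for col in columns] for row in purchase_rows]
```

However many elements the `purchases` array gains, the column list of `stock_purchases` stays the same; only the row count grows.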
Full ANSI SQL Query Support
Full SQL support for operations that may not be supported by the DB:
• Complete Join Support
• Full Where Clause support
• Aggregates and Scalar Functions
• Group by, having
• ANSI SQL Compliant Sorting
Tested against real-world NoSQL data models
Limitations:
Write Support Often Crippled
Create/Drop Table Often not Supported
Performance
Push-down operations whenever possible
• Where Clause
• Limit, Offset
• Order by
• Aggregates (Sum, Avg, Max, Min, Count)
• Group by, Having
Highly performant, multi-threaded SQL Engine
Efficient use of memory and limited caching to disk
Advanced sorting algorithm when required
We take performance seriously. Losses are defects
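To illustrate what pushing down a Where clause could look like, the sketch below translates simple SQL predicates into a MongoDB query filter so that the server, rather than the client, does the filtering. The `to_mongo_filter` function and its tuple-based predicate format are hypothetical. Only the `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, and `$and` operators are real MongoDB query operators. The driver's actual translation logic is not described in the slides.

```python
# Map SQL comparison operators to MongoDB query operators.
SQL_TO_MONGO = {
    "=": "$eq",
    "<>": "$ne",
    ">": "$gt",
    ">=": "$gte",
    "<": "$lt",
    "<=": "$lte",
}


def to_mongo_filter(predicates):
    """Translate [(column, operator, value), ...] joined by AND into a filter dict.

    Raises ValueError for an operator that cannot be pushed down, so a caller
    could fall back to evaluating that predicate on the client side.
    """
    clauses = []
    for column, op, value in predicates:
        if op not in SQL_TO_MONGO:
            raise ValueError(f"cannot push down operator {op!r}")
        clauses.append({column: {SQL_TO_MONGO[op]: value}})
    return {"$and": clauses} if clauses else {}


# WHERE age > 25 AND status = 'A'
mongo_filter = to_mongo_filter([("age", ">", 25), ("status", "=", "A")])
```

The resulting dictionary has the shape of a MongoDB query filter document. Anything the function rejects would have to be evaluated by the SQL engine after the data is fetched, which is why the slide stresses pushing down whenever possible.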
Connectivity to NoSQL Databases Is Easy!
Cassandra MongoDB DataDirect Connectors
Challenges
Standard Query Language
Common RDBMS Functionality
• Full Join Support
• Full ANSI SQL-like support for filters, aggregates, etc
• ANSI SQL Compliant Sorting
• ACID Transactions
• Unique Authentication
Relational Schema
• Exposes complex types for relationally minded applications/users
• Auto-discovers and exposes schema when necessary
• Helps enforce column constraints
Simplify Support for New Database Releases
Driving Innovation in the Market:
Introduced and normalized SQL connectivity for NoSQL
Most complete pushdown operations
Most complete SQL Support
Highest performing drivers
Recognition:
The only ODBC/JDBC driver certified and recommended by MongoDB Inc
What You Need to Know
Test using realistic data sets & server setups
• Nested / complex data
• Size of data
Test using vendor provided sample data and default DB install
Dynamic schema woes
• Opportunity for infinite data modeling techniques
• Schema inference limitations
• Abnormal sorting and group by results
Imposing constraints on strings
Isolation Levels
ACID transactions
CASE STUDY
CHALLENGE
MongoDB became a production database in Killik & Co’s infrastructure, and the team
began to move many processes from SQL to MongoDB. Various departments began
asking for data for reporting purposes, which necessitated real-time connectivity
between SQL Server and MongoDB.
SOLUTION
Using Progress DataDirect Connect for ODBC, Killik & Co can expose the data in
the MongoDB database as normalized relational tables, enabling the team to query, sort
and aggregate data from both systems to gain a far more comprehensive view of its
customers.
Geek Bit - End to End
{
"_id": "c792351c-05b3-4794-b9e3-9cddccc1fb0f",
"audit": {
"created": "2014-09-03T18:06:27+01:00",
"userCreated": "Cater, Simon"
},
"data": {
"code": "G1234567G",
"name": "Dr S A Cater",
"type": "MPG",
"properties": {
"objective": "Killik Growth",
"reportTitle": "Dr S A Cater",
"modelResult": {
"rules": "Passed",
"guidelines": "Passed",
"lastRun": "2015-10-29T06:40:55+00:00",
"lastPassed": "2015-10-29T06:40:55+00:00"
},
"equityTarget": "85",
"nonEquityTarget": "15"
},
"scope": {
"clientReportScope": "CATER1,CATER2,-A123511:Managed Portfolio"
}
},
"owner": "Ipswich"
}
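One plausible way to expose a deeply nested document like this one as relational columns is to flatten each sub-document into prefixed column names. The sketch below assumes an underscore-joined naming scheme and a hypothetical `flatten` helper; the slides do not state how the driver actually names such columns. It uses a subset of the fields shown above.

```python
def flatten(doc, prefix=""):
    """Flatten nested sub-documents into a single row of prefixed columns.

    Hypothetical helper; the underscore separator is an assumption.
    """
    columns = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into the sub-document, extending the column prefix.
            columns.update(flatten(value, prefix=f"{name}_"))
        else:
            columns[name] = value
    return columns


portfolio = {
    "_id": "c792351c-05b3-4794-b9e3-9cddccc1fb0f",
    "data": {
        "code": "G1234567G",
        "properties": {
            "objective": "Killik Growth",
            "modelResult": {"rules": "Passed", "guidelines": "Passed"},
        },
    },
    "owner": "Ipswich",
}

row = flatten(portfolio)
column_names = sorted(row)
```

Each leaf value ends up in exactly one column, so a SQL client could filter on, for example, the model result without knowing the document's nesting.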
TIBCO Jaspersoft’s Journey with MongoDB
2011: Native MongoDB Driver
• Reports created by IT/Dev
• Requires knowledge of MongoDB native query language
2012: ETL
• Extract data from MongoDB
• Good to blend with other data but not using power of MongoDB
2013: In-Memory Virtualization
• Allows blending data and end user driven reports & analytics
• Slow, hard to model data
2015: Embedded Progress Driver
• Full reporting, dashboards, analytics driven by end users
• Easy metadata
• Use full power of MongoDB with complex schemas
Try it out
https://www.progress.com/odbc/mongodb
https://www.progress.com/jdbc/mongodb
https://www.progress.com/odbc/apache-cassandra
https://www.progress.com/jdbc/apache-cassandra
Questions?
Additional Important Traits of a NoSQL Connector
Close Relationship with NoSQL Vendors
• DataDirect’s drivers are the only ones recommended and certified by MongoDB
Configurable Fine Grained Control of Schema Map
OEM / ISV Friendly
• MongoDB’s BI Connector is not embeddable
MongoDB BI Connector from Progress DataDirect vs MongoDB
Supported Versions
• MongoDB (Q4 2015): MongoDB Enterprise Advanced 3.2
• Progress DataDirect (Q1 2014): Supported with v2.2, 2.4, 2.6, 3.0, 3.2 (Free Software Foundation's GNU AGPL v3.0, MongoDB Professional, MongoDB Enterprise Advanced)
Known Workloads
• MongoDB: Data Visualization (extract)
• Progress DataDirect: Data Visualization (extract), Connect-Live, Operational BI, Data Federation
Deployment
• MongoDB: BI Desktop and/or Application Server, BI Connector on Linux Server Node(s)
• Progress DataDirect: BI Desktop and/or Application Server
Interface
• MongoDB: Postgres xDBC ANSI SQL
• Progress DataDirect: MongoDB xDBC ANSI SQL
Fully Embeddable
• MongoDB: n/a
• Progress DataDirect: Yes
Certification
• MongoDB: MongoDB
• Progress DataDirect: MongoDB, DataDirect OVS/JVS (includes ISV suites)
Open source
• MongoDB: No
• Progress DataDirect: No
Client Support
• MongoDB: Postgres open source community
• Progress DataDirect: Commercial (includes TSANet Multi Vendor Support)
Introducing the Schema Tool
Allows you to quickly and easily normalize the MongoDB schema
• Samples MongoDB Data
• Sets SQL datatypes if the field/column type is consistent (else defaults to varchar)
• Automatically normalizes complex types
Perfect your schema
• Adjust SQL Types and sizes
• Alter column/table names
• Hide Columns/Tables/Databases
• Add Columns
View statistics about your MongoDB data
• Schema consistency (i.e., data type consistency for a field/column)
• Max String length per field/column
• Min and Max elements in an array object
Creates a contract of the schema the driver will expose to ODBC/JDBC apps
As the MongoDB schema changes (new fields added), an application will have to opt in to these changes
This ensures MongoDB schema changes don’t break your app!
This “contract” is stored as an XML file on the client
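The sampling step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Schema Tool's implementation: the `infer_schema` function, the SQL type names, and the second sample document are invented for the example. A field whose sampled values all share one type gets a matching SQL type; an inconsistent field falls back to varchar, sized by the longest value seen.

```python
def infer_schema(samples):
    """Infer a SQL type per field from sampled documents.

    Hypothetical helper; type names and sizing rules are assumptions.
    """
    sql_types = {int: "INTEGER", float: "DOUBLE", str: "VARCHAR", bool: "BOOLEAN"}
    seen = {}     # field -> set of Python types observed
    max_len = {}  # field -> longest string form observed
    for doc in samples:
        for field, value in doc.items():
            seen.setdefault(field, set()).add(type(value))
            max_len[field] = max(max_len.get(field, 0), len(str(value)))

    schema = {}
    for field, types in seen.items():
        if len(types) == 1:
            sql_type = sql_types.get(next(iter(types)), "VARCHAR")
        else:
            sql_type = "VARCHAR"  # inconsistent types default to varchar
        if sql_type == "VARCHAR":
            sql_type = f"VARCHAR({max_len[field]})"
        schema[field] = sql_type
    return schema


samples = [
    {"name": "sue", "age": 26, "status": "A"},
    {"name": "bob", "age": "unknown", "status": "B"},  # invented sample: age is a string here
]
schema = infer_schema(samples)
```

Because the result depends on which documents were sampled, a field can look consistent in the sample and still hold other types elsewhere, which is one reason the tool lets you adjust types and sizes afterwards.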
DataDirect MongoDB ODBC and JDBC drivers (released Q1 2014)
First Reliable MongoDB Connector (Unlimited Normalization) and only one certified by MongoDB, Inc.
Picked up latest MongoDB features such as WiredTiger Engine Support, Aggregate Framework, Security such as SSL
Support across Windows, Linux, AIX, Solaris, HP-UX