Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Non-Relational Databases
Pelle Jakovits
25 October 2017
Outline
• Background– Relational model– Database scaling– The NoSQL Movement– CAP Theorem
• Non-relational data models– Key-value– Document-oriented– Column family– Graph
• Example databases
2/36
The Relational Model
• Data is stored in tables
• Strict relationships between tables
– Foreign key references between columns
• The expected format of the data is specified with a restrictive schema
• Data is typically accessed using Structured Query Language (SQL)
3/36
Database Scaling
• Vertical scaling – on one machine
• Horizontal scaling – across a cluster of machines
• Relational model does not scale well horizontally
– Because there are too many dependencies in relational model
– Database sharding is one approach
4/36
Sharding
5/36
Relational database
6/38
https://dev.mysql.com/doc/sakila/en/
The NoSQL Movement
• Emergence of persistence solutions using non-relational data models
• Driven by the rise of „Big Data“ and Cloud Computing
• Non-relational data models are based on Key - Value structure
• Simpler schema-less key-value based data models scale better than the relational model
• NoSQL is a broad term with no clear boundaries– The term NoSQL itself is very misleading
– No SQL?
– Not only SQL?
7/36
CAP Theorem
• It is impossible for a distributed computer system to simultaneously provide all three of the following:– Consistency - every read receives the most recent write or
an error– Availability - every request receives a response– Partition/Fault tolerance - the system continues to operate
despite arbitrary partitioning (network failures, dropped
• Have to choose between consistency or availability• NoSQL solutions which are more focused on
availability try to achieve eventual consistency• When trying to aim for both consistency or availability
– Will have high latency
8/36
CAP Theorem
9/36
Benefits of the Key-value Model
• Horizontal scalability
– Data with the same Key stored close to each other
– Suitable for cloud computing
• Flexible schema-less models suitable for unstructured data
• Fetching data by key can be very fast
10/36
Indexing and Partitioning
• In Key-value model, Key acts as an index
• Secondary indexes can be created in some solutions
– Implementations vary from partitioning to local distributed indexes
• Data is partitioned between different machines in the cluster
• Usually data is partitioned by rows or/and column families
• Users can often specify partitioning parameters
– Gives control over how data is distributed
– Very important for optimizing query speed
11/36
Aggregate-oriented DB
• Non-relational data models aim to store aggregated data together
• Aggregate is a collection of data that is treated as a unit– E.g. a customer and all of his orders
• In normalized relational databases aggregates are computed using GroupBy operations
• Keyed aggregates make for a natural unit of data sharding
12/36
Non-relational Data Models
The Key-value Model
• Data stored as key-value pairs
• The value is an opaque blob to the database
• Examples: Dynamo, Riak
14
The Document-oriented Model
• Data is also stored as key-value pairs
• Value is a „document“ and has further structure
• No strict schema
• Examples: CouchDB, MongoDB
15/36
Example
• Aggregates described in JSON using map and array data structures
16
The Column Family Model
• Data stored in large sparse tabular structures
• Columns are grouped into column families
– Column family is a meaningful group of columns
– Similar concept as a table in relational database
• A record can be thought of as a two-level map
• New columns can be added at any time
• Examples: BigTable, Cassandra, HBase
17/36
Column Family Example
18/36
a001
Names
username jsmith
firstname John
lastname Smith
Contactsphone 5 550 001
email [email protected]
Messages
item1 Message1
item2 Message2
… …
itemN MessageN
b014
Names
username pauljones
Contacts
Messages
item1 new Message
https://neo4j.com/developer/guide-build-a-recommendation-engine/
Graph Databases
• Data stored as nodes and edges of a graph
• The nodes and edges can have fields and values
• Aimed at graph traversal queries in connected data
• Examples: Neo4J, FlockDB
19/36
Non-relational Data Models
• In non-relational data stores data is denormalized
• It is common to also store JSON documents in key-value stores
• The key-value, document-oriented and column family models are aggregate oriented models
• The other models are based on the key-value model
• In reality the classification of databases into different models is not as straight forward
• Multiparadigm databases also exist (e.g ArangoDB)
20/36
Non Relational Database Examples
Riak
• Based on Amazon's Dynamo specification
• Key-value model
• Distributed decentralized persistent hash table
– Keyspace is partitioned between nodes
• Consistent hashing
– Avoid hash-to-node remapping when a node is added or removed
• Eventual Consistency using Multiversion Concurrency Control (MVCC)
22/36
Riak
• RESTful query interface
– Basic PUT, GET, POST, and DELETE functions
• Links and link walking
– One way relationships between data objects
– Turns Key-Value store into a simple Graph database
• Higher level querying on-top of Key-Value structure:
– Riak search is based on Apache Solr search engine
– MapReduce in Erlang and JavaScript
23/36
MongoDB
• Document oriented (BSON)
• Query language based on JavaScript
• GridFS for BLOB file storage
• Master-slave architecture
• Linking between documents
• Supports MapReduce in JavaScript
24/36
Query example
db.inventory.find({
status: "A",
$or: [ {qty: { $lt: 30 }}, { item: /^p/ }]
})
• Matches the following SQL query:
SELECT * FROM inventory WHERE status = "A" AND ( qty < 30 OR item LIKE "p%")
25/38
CouchDB
• Document oriented (JSON)
• RESTful query interface
• Built in web server
• Web based query front-end Futon
26/36
CouchDB
• MapReduce is the primary query method (JavaScript and Erlang)
• Materialized views as results of incremental MapReduce jobs
• CouchApps – javascript-heavy web applications built entirely on top of CouchDBwithout a separate web server for the logic layer
27/36
Query example
Documents:
{ „id“: 2, "position" : "programmer", "first_name": "James", "salary" : 100 }
{ „id“: 7, "position" : "support", "first_name": "John", "salary" : 23 }
Lets extract average salary for each unique position
Map:
function(doc, meta) {
if (doc.position && doc.salary) { emit(doc.position,doc.salary); }
}
Reduce:
function(key, values, rereduce) {
return sum(values)/values.length;
}
Some Reduce functions have been pre-defined: _sum(), _count(), _stats()
28/38
Cassandra
• Column family model
• Data model from BigTable, distribution model from Dynamo (decentralized)
• Uses Cassandra Query Language (CQL)
– SQL-like querying language
29/36
Cassandra
• Provides Availability & Partition-Tolerance from the CAP theorem
• Static and dynamic column families
• Dynamic column families as materialized views on data
• The concept of super-columns
– Family of column families
30/36
Neo4J
• Open source NoSQL Graph Database
• Uses the Cypher Query Language for querringgraph data
• Graph consists of Nodes, Edges and Attributes
31/36
Neo4J Querry example
MATCH (you {name:"You"})
MATCH (expert)-[:WORKED_WITH]->
(db:Database {name:"Neo4j"})
MATCH path = shortestPath(
(you)-[:FRIEND*..5]-(expert)
)
RETURN db,expert,path
32/38https://neo4j.com/developer/cypher-query-language/
Why use Non-Relational DB?
• Volume of Data
• Elasticity
• Flexible Schemas
• Cost
33/38
http://martinfowler.com/bliki/PolyglotPersistence.html
Polyglot Persistence
34/36
Conclusions
• In recent years there has been a rise in non-relational (NoSQL) data stores
• This is related to the rise of cloud computing –key-value models offer better scalability
• The NoSQL landscape is extremely varied• The four basic non-relational data models are:
– The key-value model– The document-oriented model– The column family model– Graph databases
35/36
That is All
• Next week's practice session
– Using NoSQL databases
• Exam times
– Wednesday - 01 November 2017
– Thursday - 02 November 2017
• NB! A set of example exam questions are available on the course website
36/36