andrew-liu
NoSQL Evolution
Volume, Velocity, Variety
• How can my app deal with massive volumes of data and throughput?
• How do I elastically scale my database?
• How do I write responsive apps?
• How do I make data available where my users are?
• How do I write highly available apps?
• How do I deal with schema changes?
• How do I iterate rapidly?
• What data models work at scale?
NoSQL Evolution
Volume
• Hyper-converged/hyperscale architectures
• Horizontal partitioning
• Elastic scale
Velocity
• Write-optimized database engines
• Global distribution
• Active-active topologies
• Tunable consistency
Variety
• Dynamically typed databases
• Schema-free databases
• Logical index layouts (inverted, columnar, etc.)
Timeline
• 2006: BigTable paper
• 2007: Dynamo paper / AWS SimpleDB
• 2008: Cassandra
• 2009: MongoDB / Riak / Neo4J
• 2010: Project Florence
• 2012: AWS DynamoDB
• 2014: DocumentDB Preview
• 2015: DocumentDB GA
One size does not fit all (Azure PaaS)
Azure SQL DB: relational, read optimized
• Scale-up: co-located compute & storage
• Index management/query processing, local persistence, local compute & storage
• A single database up to 1 TB (future: 4 TB)
Data Lakes: Azure Data Lake/U-SQL, HDInsight/Spark
• Scale-out: disaggregated remote storage
• Distributed file system (<1 EB)
• Compute runtimes …
Azure DocumentDB: NoSQL, write and read optimized
• Scale-out: co-located compute & storage
• Index management/query processing, local persistence, local compute & storage on each shard
• A single collection of 1 PB and 100s of millions of requests/sec
• Multiple collections in a database
Common scenarios
Retail, CMS, Education
• Product Catalog
• Product Recommendations
• Personalization
• Campaign Management
• Blogs and CMS
Gaming
• Multiplayer Games
• Social Gameplay
• Leaderboards
• Game analytics
IoT, Sensor Data
• Telemetry + Event Store
• Telematics
• Device Registry
Social Analytics, Ad Tech
• User behavior telemetry
• Personalization
• Customer 360 view
Global distribution from the ground up
• Worldwide presence
• Automatic multi-region replication
• Any number of regions
• Policy-based geo-fencing
• Multi-homing APIs: apps don't need to be redeployed during regional failover
• Customers can simulate/trigger manual failover
• Well-defined guarantees for latency, throughput, availability and consistency
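Multi-homing means an app states where it would like to be served rather than binding to one regional endpoint, so a failover changes the serving region without a redeploy. The Python below is a minimal sketch of that idea, assuming a hypothetical `pick_region` helper that walks a preference list and skips regions that a geo-fencing policy excludes or that are currently down. It is not the DocumentDB SDK, and the region lists are illustrative.

```python
from typing import Iterable, Optional, Set


def pick_region(preferred: Iterable[str], allowed: Set[str],
                available: Set[str]) -> Optional[str]:
    """Return the first region, in preference order, that policy allows and that is up.

    preferred: regions in the order the app would like to use them (hypothetical)
    allowed:   regions permitted by a geo-fencing policy (hypothetical)
    available: regions currently reachable
    """
    for region in preferred:
        if region in allowed and region in available:
            return region
    return None  # no permitted region is reachable


preferred = ["West US", "East US", "North Europe"]
us_only = {"West US", "East US"}  # geo-fence: keep data in the US

print(pick_region(preferred, us_only, {"West US", "East US"}))       # West US
print(pick_region(preferred, us_only, {"East US", "North Europe"}))  # East US (West US down)
print(pick_region(preferred, us_only, {"North Europe"}))             # None
```

The application code calling `pick_region` never changes during a failover; only the `available` set does, which is the property the slide attributes to multi-homing APIs.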
Regional Availability
As a Ring 0 service, DocumentDB will be available by default in all new Azure regions
Guaranteed Low Latency
"I want my data wherever my users are."
• Reads: <10 ms at P99, <1 ms at P50
• Writes: <15 ms at P99, <6 ms at P50
• Globally distributed, with reads and writes served from the local region
• Write-optimized, latch-free database engine designed for SSDs and low-latency access
• Synchronous and automatic indexing at sustained ingestion rates
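The latency targets above are percentile bounds, so checking them means computing P50 and P99 over observed samples. A minimal sketch, assuming nearest-rank percentiles and hypothetical helper names (`percentile`, `meets_targets`); the thresholds are copied from the slide, while the sample data is made up for illustration.

```python
import math
from typing import Dict, Sequence

# Targets from the slide, in milliseconds (percentile -> upper bound).
READ_TARGETS_MS: Dict[int, float] = {50: 1.0, 99: 10.0}
WRITE_TARGETS_MS: Dict[int, float] = {50: 6.0, 99: 15.0}


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def meets_targets(samples: Sequence[float], targets: Dict[int, float]) -> bool:
    """True if every listed percentile is strictly below its bound."""
    return all(percentile(samples, p) < bound for p, bound in targets.items())


# Illustrative samples only (not measured data).
reads = [0.4] * 60 + [2.0] * 39 + [9.0]
writes = [3.0] * 98 + [20.0] * 2

print(meets_targets(reads, READ_TARGETS_MS))    # True
print(meets_targets(writes, WRITE_TARGETS_MS))  # False: P99 is 20 ms
```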
Elastically scalable storage
• System designed to scale storage and throughput independently
• Transparent server-side partition management and routing
• Automatically indexed SSD storage
• Automatic global distribution of data across any number of Azure regions
• Optionally evict old data using built-in support for TTL
Scale a single DocumentDB collection from 10 GB to PBs
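Transparent partition management means the client supplies a partition key and the service decides which physical partition owns it. The slide does not describe DocumentDB's actual scheme, so this is a minimal sketch assuming simple hash-mod-N routing; `partition_for`, `route`, and the `userId` key are hypothetical.

```python
import hashlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index using a stable (process-independent) hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(docs: List[dict], key_field: str, num_partitions: int) -> Dict[int, List[dict]]:
    """Group documents by the partition that owns their partition key."""
    placement: Dict[int, List[dict]] = {i: [] for i in range(num_partitions)}
    for doc in docs:
        placement[partition_for(str(doc[key_field]), num_partitions)].append(doc)
    return placement


docs = [
    {"id": "1", "userId": "alice"},
    {"id": "2", "userId": "bob"},
    {"id": "3", "userId": "alice"},
]
placement = route(docs, "userId", 4)
# Both of alice's documents land on the same partition.
print(sorted(d["id"] for d in placement[partition_for("alice", 4)]))
```

Modulo routing keeps the sketch short, but changing `num_partitions` remaps most keys. A system that splits partitions online would more typically give each partition a contiguous range of the hash space, so a split only moves keys out of the partition being split.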
Elastically scalable throughput
• Elastically scale throughput from 100 to 10s of millions of requests/sec across multiple regions
• Customers pay by the hour for the provisioned throughput
• Transparent server-side partition management and routing
[Figure: more or less provisioned throughput over time, at 9PM PST and 11PM PST]
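Because billing is per hour of provisioned throughput, scaling up only for a peak and back down afterwards changes the bill. A minimal sketch of that arithmetic, assuming throughput is billed in hypothetical units of 100 requests/sec per hour, with each hour rounded up to whole units, at an illustrative price. None of these figures come from the slide.

```python
import math
from typing import List


def billed_units(hourly_rps: List[int], unit_size: int = 100) -> int:
    """Total unit-hours: each hour's provisioned req/s rounded up to whole units (assumed)."""
    return sum(-(-rps // unit_size) for rps in hourly_rps)  # integer ceiling division


def daily_cost(hourly_rps: List[int], price_per_unit_hour: float, unit_size: int = 100) -> float:
    """Cost of a 24-hour provisioning schedule at a hypothetical per-unit-hour price."""
    return billed_units(hourly_rps, unit_size) * price_per_unit_hour


elastic = [1_000] * 22 + [10_000] * 2  # scale up only for a two-hour peak
flat = [10_000] * 24                   # provision for the peak all day

print(billed_units(elastic), billed_units(flat))  # 420 2400
```

With this schedule, provisioning for the peak only during two hours costs 420 unit-hours versus 2,400 for holding the peak all day.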
99.99% availability SLA
• Multi-homing APIs: apps don't need to be redeployed during regional failover
• Customers can simulate/trigger manual failover (via portal or APIs)
• Automatic (policy-driven) failover in the event of regional failures
• All clusters configured with 10-20 fault domains (FDs)
• Each partition is protected by a replica set
• Majority-quorum-based, durable, synchronous commits within a datacenter
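Majority-quorum commit means a write is acknowledged as durable once more than half of a partition's replica set has persisted it; because any two majorities overlap, a later majority read always includes at least one replica holding that write. A minimal sketch with hypothetical helper names; the slide does not state DocumentDB's replica-set size, so the counts here are only examples.

```python
from typing import Iterable


def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replica_count // 2 + 1


def is_committed(acks: Iterable[bool], replica_count: int) -> bool:
    """A write is durable once a majority of the replica set has acknowledged it."""
    return sum(1 for ok in acks if ok) >= majority(replica_count)


def tolerated_failures(replica_count: int) -> int:
    """Replicas that can fail while a majority (and thus commits) remains possible."""
    return replica_count - majority(replica_count)


print(majority(4), tolerated_failures(4))  # 3 1
print(majority(5), tolerated_failures(5))  # 3 2
print(is_committed([True, True, True, False], 4))  # True
```

Note that going from 4 to 5 replicas raises tolerated failures from 1 to 2 without raising the quorum size, which is why odd replica counts are common in quorum systems.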
Well-defined consistency models
• Global distribution forces us to navigate the CAP theorem
• Intuitive programming model for well-defined, relaxed consistency models with clear PACELC tradeoffs
• Four well-defined consistency levels to choose from: Strong, Bounded Staleness, Session, Eventual
• Can be overridden on a per-request basis
[Figure: consistency spectrum from Strong consistency (high latency) to Eventual consistency (low latency)]
[Figure: Observed Distribution of consistency levels (Bounded Staleness, Eventual, Session, Strong); values shown: 27%, 3%, 54%, 16%]
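The levels trade freshness for latency by constraining which replicas may answer a read. The sketch below models each replica as having applied writes up to a version number and decides, per request, whether a given replica may serve a read at a chosen level. It is a simplified, hypothetical model (bounded staleness is measured here only in versions, and a strong read is reduced to "the replica is fully caught up"), not DocumentDB's implementation.

```python
from enum import Enum
from typing import Optional


class Consistency(Enum):
    STRONG = "Strong"
    BOUNDED_STALENESS = "Bounded Staleness"
    SESSION = "Session"
    EVENTUAL = "Eventual"


def can_serve_read(level: Consistency, replica_version: int, latest_version: int,
                   max_lag: int = 0, session_token: Optional[int] = None) -> bool:
    """Decide whether a replica at replica_version may answer a read at `level`.

    The level is an argument of each call, mirroring a per-request override.
    """
    if level is Consistency.STRONG:
        # Must reflect every committed write.
        return replica_version == latest_version
    if level is Consistency.BOUNDED_STALENESS:
        # May lag the latest write, but by at most max_lag versions.
        return latest_version - replica_version <= max_lag
    if level is Consistency.SESSION:
        # Read-your-writes: replica must have applied this session's last write.
        return session_token is None or replica_version >= session_token
    # EVENTUAL: any replica may answer.
    return True


print(can_serve_read(Consistency.STRONG, 9, 10))                       # False
print(can_serve_read(Consistency.BOUNDED_STALENESS, 9, 10, max_lag=3))  # True
print(can_serve_read(Consistency.SESSION, 9, 10, session_token=10))     # False
```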
Schema-agnostic indexing
• At global scale, ALTER TABLE and schema/index management is a non-starter
• Automatic and synchronous indexing of all ingested content
• No need to define schemas or secondary indices upfront!
• Highly write-optimized database engine using latch-free and log-structured techniques
• Fully resource-governed, with back pressure and rate limiting built into the log-structured storage engine
• Online and in-situ index transformations
No Schema, No Problem
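Indexing every path of every document is what removes the need for upfront schemas: a new field simply produces new index terms. A minimal in-memory sketch of such an inverted index, mapping (path, value) terms to document ids. The path syntax, the `[]` marker for array elements, and the class name are assumptions for illustration, not DocumentDB's index format, and the sketch ignores the log-structured, latch-free storage the slide describes.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Iterator, Set, Tuple

Term = Tuple[str, Any]


def terms(doc: Any, path: str = "") -> Iterator[Term]:
    """Yield a (path, leaf value) term for every leaf of a JSON-like document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from terms(value, f"{path}/{key}")
    elif isinstance(doc, list):
        # Array elements share one path so a lookup matches any position.
        for value in doc:
            yield from terms(value, f"{path}/[]")
    else:
        yield (path, doc)


class InvertedIndex:
    """Maps every (path, value) term seen at ingestion to the ids of documents containing it."""

    def __init__(self) -> None:
        self._postings: DefaultDict[Term, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, doc: dict) -> None:
        # Index synchronously on write; no schema is consulted.
        for term in terms(doc):
            self._postings[term].add(doc_id)

    def lookup(self, path: str, value: Any) -> Set[str]:
        return set(self._postings.get((path, value), set()))


idx = InvertedIndex()
idx.add("1", {"name": "Ann", "address": {"city": "Seattle"}, "tags": ["vip", "new"]})
idx.add("2", {"name": "Bo", "address": {"city": "Seattle", "zip": "98101"}})  # extra field, no schema change
print(sorted(idx.lookup("/address/city", "Seattle")))  # ['1', '2']
```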
Rich SQL and JavaScript queries
• No impedance mismatch: JavaScript is the type system of the database engine
• Query using either SQL or JavaScript (or both)
• Write business logic entirely in JavaScript with stored procedures and triggers
• JavaScript language-integrated, multi-item ACID transactions with snapshot isolation
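The stored procedures described above run as JavaScript inside the engine; this Python sketch only models their all-or-nothing semantics. The hypothetical `run_transaction` helper gives the procedure body a private snapshot of the data, publishes every write if the body returns normally, and discards them all if it raises. It is single-threaded and does not model concurrent transactions or conflict detection.

```python
import copy
from typing import Callable, Dict

Store = Dict[str, dict]


class TransactionAborted(Exception):
    """Raised by a procedure body to abort and roll back its writes."""


def run_transaction(store: Store, body: Callable[[Store], None]) -> bool:
    """Run body against a private snapshot; publish all of its writes or none."""
    snapshot = copy.deepcopy(store)  # body sees a consistent, isolated copy
    try:
        body(snapshot)
    except Exception:
        return False                 # abort: the shared store is left untouched
    store.clear()
    store.update(snapshot)           # commit: all writes become visible together
    return True


def transfer(amount: int) -> Callable[[Store], None]:
    """Hypothetical multi-item procedure: move `amount` from alice to bob."""
    def body(s: Store) -> None:
        s["alice"]["balance"] -= amount
        if s["alice"]["balance"] < 0:
            raise TransactionAborted("insufficient funds")
        s["bob"]["balance"] += amount
    return body


store = {"alice": {"balance": 100}, "bob": {"balance": 0}}
print(run_transaction(store, transfer(30)))   # True
print(run_transaction(store, transfer(500)))  # False; the partial debit is discarded
```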
Accessing DocumentDB
[Figure: native DocumentDB client drivers (Java, .NET, …) and native MongoDB client drivers (Java, .NET, Ruby, …) connect over TCP (SSL) or HTTPS to the DocumentDB database engine, where SQL, JavaScript and MongoDB front ends sit on a common Query IL and database runtime]
Roadmap
• Unique Index
• Aggregates
• Deeper engine-level integration
• Support for more databases
Customer Growth
• Significant customer growth since the initial launch
• ISVs: Parse, Sitecore and others
• Large MongoDB customers running into security, scalability and robustness issues with MongoDB
MongoDB API Compatibility
Core database operations (CRUD/query)
• Insert, InsertMany, InsertOne, Update, UpdateMany, UpdateOne, ReplaceOne, DeleteOne, DeleteMany, Remove
• $inc, $mul, $rename, $set, $unset, $min, $max
• $addToSet, $pullAll, $pull, $pushAll, $slice, $push, $pop, $each, $sort, $position, $all, $size, $elemMatch
• Bitwise, comparison and logical operators
• $type, $mod, $regex
• 2dsphere, 2d, polygon, $near, $nearSphere, $geoWithin, $geoIntersects (incl. geometry support for points, lines, polygons and spheres)
• find, insert, update, delete, getLastError, getMore, findAndModify
• getnonce, logout, authenticate
• createIndex, listIndexes, dropIndexes, connectionStatus, reIndex, listDatabases, collStats, dbStats
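To make the update operators in the list concrete, here is a minimal sketch that applies a few of them (`$set`, `$unset`, `$inc`, `$mul`, `$min`, `$max`, `$rename`) to top-level fields of a Python dict, following MongoDB's documented behavior for missing fields. `apply_update` is hypothetical and only illustrates semantics; a real application would send the same update document through a MongoDB driver, for example pymongo's `Collection.update_one(filter, update)`.

```python
from typing import Any, Dict


def apply_update(doc: Dict[str, Any], update: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Apply a subset of MongoDB update operators to top-level fields; returns a new dict."""
    out = dict(doc)
    for op, fields in update.items():
        for field, arg in fields.items():
            if op == "$set":
                out[field] = arg
            elif op == "$unset":
                out.pop(field, None)  # the operand value is ignored, as in MongoDB
            elif op == "$inc":
                out[field] = out.get(field, 0) + arg  # missing field starts at 0
            elif op == "$mul":
                out[field] = out.get(field, 0) * arg  # missing field becomes 0
            elif op == "$min":
                out[field] = arg if field not in out else min(out[field], arg)
            elif op == "$max":
                out[field] = arg if field not in out else max(out[field], arg)
            elif op == "$rename":
                if field in out:
                    out[arg] = out.pop(field)
            else:
                raise ValueError(f"unsupported operator: {op}")
    return out


doc = {"name": "widget", "qty": 5, "price": 10, "legacy": True}
print(apply_update(doc, {
    "$inc": {"qty": 2},
    "$mul": {"price": 3},
    "$unset": {"legacy": ""},
    "$rename": {"name": "title"},
    "$max": {"rating": 4},
}))
```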
Turnkey
• Fully managed, fully secure and compliant, and backed by SLAs for availability, latency, consistency and throughput
• Partitioned collections
• Global distribution across any number of regions
Security
• Firewall support to restrict access to specific IP addresses
• Built-in RBAC support
• Highly scalable AuthZ model with built-in support for users and permissions
• Fine-grained/row-level AuthZ
• All external (and internal) communication over SSL
• Coming soon: encryption at rest
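An IP firewall of this kind reduces to checking the client address against a list of allowed addresses and CIDR ranges. A minimal sketch using Python's standard `ipaddress` module; `is_allowed` and the example ranges are hypothetical, not DocumentDB's firewall configuration format.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_ranges: Iterable[str]) -> bool:
    """True if client_ip falls inside any allowed single address or CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    # A bare address such as "203.0.113.7" parses as a /32 network.
    return any(addr in ipaddress.ip_network(r, strict=False) for r in allowed_ranges)


allowed = ["10.0.0.0/24", "203.0.113.7"]
print(is_allowed("10.0.0.55", allowed))    # True
print(is_allowed("203.0.113.8", allowed))  # False
```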
Compliance
Certification or commitment: compliance status
• Strong privacy and security commitments (no mining of customer data for advertising; no voluntary disclosure to law enforcement agencies): Achieved
• Contractual commitment to meet US and EU data residency requirements: Achieved
• ISO 27001: Achieved
• ISO 27018: Achieved
• EU Model Clauses (EUMC): Achieved
• HIPAA Business Associate Agreement: Achieved
• PCI: Started (in progress)
• SOC 1 & SOC 2: Started (in progress)
• FedRAMP, IRS 1075, UK Official (IL2): Started (in progress)
• Health Information Trust Alliance (HITRUST): Planned
DocumentDB Local Emulator
• Free, downloadable, high-fidelity version of the cloud service for offline dev/test
Change Feed
[Figure: ingestion using the DocumentDB delta feed, compute (stream and batch), and query using DocumentDB]
• Lambda pattern with significantly lower TCO
• Single scalable database solution for both ingestion and query
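The change feed pattern lets the same collection serve queries while a stream or batch job consumes new writes incrementally. A minimal sketch of such a consumer, assuming a hypothetical append-only `ChangeLog` whose entries carry increasing sequence numbers, and a `drain` function that resumes from a saved continuation point; this is not the DocumentDB change feed API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ChangeLog:
    """Append-only log of document changes, tagged with increasing sequence numbers."""
    entries: List[Tuple[int, dict]] = field(default_factory=list)

    def append(self, doc: dict) -> int:
        seq = len(self.entries) + 1
        self.entries.append((seq, doc))
        return seq

    def read_since(self, continuation: int, max_items: int = 100) -> List[Tuple[int, dict]]:
        """Return up to max_items changes with sequence numbers after `continuation`."""
        return [e for e in self.entries if e[0] > continuation][:max_items]


def drain(log: ChangeLog, continuation: int, handle: Callable[[dict], None]) -> int:
    """Process every change after `continuation` and return the new checkpoint."""
    while True:
        batch = log.read_since(continuation)
        if not batch:
            return continuation
        for seq, doc in batch:
            handle(doc)
            continuation = seq  # checkpoint after each handled change


log = ChangeLog()
log.append({"id": "a"})
log.append({"id": "b"})
seen: List[dict] = []
checkpoint = drain(log, 0, seen.append)
log.append({"id": "c"})
checkpoint = drain(log, checkpoint, seen.append)  # resumes; only "c" is new
print(checkpoint, [d["id"] for d in seen])        # 3 ['a', 'b', 'c']
```

Persisting the checkpoint is what lets a batch job stop and later resume without reprocessing, while queries continue to run against the same data.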