[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure DocumentDB

Blazing Fast, Planet-Scale Customer Scenarios with Azure DocumentDB

Denny Lee

Program Manager

Azure DocumentDB

@dennylee

Andrew Liu

Program Manager

Azure DocumentDB

@aliuy8

A Brief Overview...

Elastically Scalable Throughput + Storage

Guaranteed low latency

Reads <10ms @ P99

Writes <15ms @ P99

Globally distributed

Speaks your language

Azure DocumentDB
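As a reminder of what a "@ P99" figure means, here is a minimal sketch that computes a 99th-percentile latency with the nearest-rank method. The latency samples and the `percentile` helper are made up for illustration; they are not measurements from the service.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical read latencies in milliseconds: 98 fast reads and 2 slower ones.
read_latencies_ms = [4.0] * 98 + [9.0, 40.0]

p99 = percentile(read_latencies_ms, 99)
meets_read_target = p99 < 10  # the "<10ms @ P99" figure quoted above

print(f"P99 = {p99} ms, meets target: {meets_read_target}")
```

A P99 target tolerates a rare outlier (the single 40 ms read here) while still bounding what 99% of requests experience.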

{
  "name": "SmugMug",
  "permalink": "smugmug",
  "homepage_url": "http://www.smugmug.com",
  "blog_url": "http://blogs.smugmug.com/",
  "category_code": "photo_video",
  "products": [
    {
      "name": "SmugMug",
      "permalink": "smugmug"
    }
  ],
  "offices": [
    {
      "description": "",
      "address1": "67 E. Evelyn Ave",
      "address2": "",
      "zip_code": "94041",
      "city": "Mountain View",
      "state_code": "CA",
      "country_code": "USA",
      "latitude": 37.390056,
      "longitude": -122.067692
    }
  ]
}

Perfect for these documents

Schema-agnostic JSON store for hierarchical and de-normalized data at scale

Not these documents


“If all you have is a hammer, everything looks like a nail”
- Abraham Maslow

Choose the right tools for the right job

Microsoft Data Platform

SQL Server 2016, SQL Database, Azure DocumentDB, Azure Search, Azure HDInsight, Azure Data Lake, Azure DW APS, Azure Stream Analytics, Azure Data Factory, Azure ML, Azure Data Catalog, Power BI

3 V’s of data: Endless possibilities

• Learning
• Gaming
• Retail
• Telematics
• Mobile Apps
• IoT

Let’s talk about scale.

Problem 1: Volume and Velocity

More users, more problems

The Walking Dead, results

• <10ms P99 query latency
• >1M game downloads
• ~1B requests / day

How?

Just throw some data in a database!

Not that easy…

The right tool for the job?

The answer for low latency @ massive scale

Fact: Managing shards is really painful.

Managing shards or partitions

Good news: DocumentDB has done all the heavy lifting.
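To make the pain concrete, here is a minimal sketch of the routing logic an application has to own when it shards by hand. It uses simple hash-modulo routing as an assumption for illustration; the `partition_for` helper and the `player-N` keys are hypothetical, and this is not how DocumentDB places data internally.

```python
import hashlib

def partition_for(partition_key, partition_count):
    """Map a partition key to a shard index using a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count

# Hypothetical partition keys.
player_ids = [f"player-{n}" for n in range(1000)]

# Route every key with 4 shards, then again after adding a 5th shard.
before = {pid: partition_for(pid, 4) for pid in player_ids}
after = {pid: partition_for(pid, 5) for pid in player_ids}

# With naive modulo routing, growing the cluster relocates most keys.
moved = sum(1 for pid in player_ids if before[pid] != after[pid])
moved_fraction = moved / len(player_ids)

print(f"{moved} of {len(player_ids)} keys change shards ({moved_fraction:.0%})")
```

Adding one shard forces most keys to move, which is the kind of rebalancing work a managed partitioned collection keeps out of application code.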

Elastic scale

Request Unit (RU) is the normalized currency: % Memory, % IOPS, % CPU

Replica gets a fixed budget of Request Units

Diagram labels: Resource, Resource set; Documents, SQL, sprocs, args

Predictable Performance

Request units
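The "fixed budget" idea can be sketched as a per-second allowance that each request draws from. The class below is illustrative only: the `RequestUnitBudget` name, the one-second refill, and the read and write charges are assumptions, not the service's actual accounting.

```python
class RequestUnitBudget:
    """Per-second Request Unit allowance for one replica (illustrative)."""

    def __init__(self, rus_per_second):
        self.rus_per_second = rus_per_second
        self.remaining = float(rus_per_second)

    def try_charge(self, request_charge):
        """Deduct the charge if the budget allows it; otherwise reject the request."""
        if request_charge > self.remaining:
            return False
        self.remaining -= request_charge
        return True

    def start_new_second(self):
        """Refill the budget at the start of each one-second window."""
        self.remaining = float(self.rus_per_second)

# Hypothetical charges, not measured values.
READ_CHARGE = 1.0
WRITE_CHARGE = 10.0

budget = RequestUnitBudget(rus_per_second=25)
results = [
    budget.try_charge(WRITE_CHARGE),  # accepted, 15 RUs left
    budget.try_charge(WRITE_CHARGE),  # accepted, 5 RUs left
    budget.try_charge(WRITE_CHARGE),  # rejected, only 5 RUs left
    budget.try_charge(READ_CHARGE),   # accepted, 4 RUs left
]
print(results, budget.remaining)
```

Because every operation is priced in the same unit, throughput becomes predictable: what fits in the budget is served, and anything beyond it is rejected until the next window.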

Creating partitioned collections

Scale Demo

Code: https://aka.ms/docdb-benchmark

Configured @ 10,100 RUs
• ~940 writes / second
• Writing @ ~9,800 RUs

Configured @ 250,000 RUs
• ~12,100 writes / second
• Writing @ ~128,800 RUs
• VM @ 99% CPU
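The demo figures can be turned into a per-write cost and a utilization figure with a few lines of arithmetic. The numbers come from the slide; the dictionary keys and the `summarize` helper are names chosen here for illustration.

```python
# Figures quoted in the demo results above.
runs = [
    {"provisioned_rus": 10_100, "writes_per_sec": 940, "consumed_rus": 9_800},
    {"provisioned_rus": 250_000, "writes_per_sec": 12_100, "consumed_rus": 128_800},
]

def summarize(run):
    """Derive the RU cost per write and the share of provisioned RUs in use."""
    return {
        "rus_per_write": run["consumed_rus"] / run["writes_per_sec"],
        "utilization": run["consumed_rus"] / run["provisioned_rus"],
    }

summaries = [summarize(run) for run in runs]

for run, summary in zip(runs, summaries):
    print(
        f"{run['provisioned_rus']:>7} RUs provisioned: "
        f"{summary['rus_per_write']:.1f} RUs per write, "
        f"{summary['utilization']:.0%} of budget used"
    )
```

Both runs work out to roughly 10 RUs per write. The second run uses only about half of its provisioned throughput, which, together with the client VM at 99% CPU, suggests the load generator rather than the collection was the limit.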

Globally Distributed

Azure DocumentDB gives you the ability to cheat the speed of light!

… with well-defined consistency models!

Strong, Bounded Staleness, Session, Eventual

LEFT TO RIGHT: Relaxed consistency => better performance and availability

Consistency Level | Strong | Bounded Staleness | Session | Eventual
Total global order | Yes | Yes, outside of the “staleness window” | No, partial “session” order | No
Consistent prefix guarantee | Yes | Yes | Yes | Yes
Monotonic reads | Yes | Yes, across regions outside of the staleness window and within a region all the time | Yes, for the given session | No
Monotonic writes | Yes | Yes | Yes | Yes
Read your writes | Yes | Yes (in the write region) | Yes | No

Observed Distribution (chart)

Legend: Bounded Staleness, Eventual, Session, Strong
Slice labels: 27%, 3%, 54%, 16%

App defined regional preferences
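The idea of an application-defined preference list can be sketched in plain Python: the app ranks regions, and reads go to the first one that is currently reachable. The `pick_read_region` helper and the region list are illustrative assumptions; this is not the SDK's API.

```python
def pick_read_region(preferred_regions, available_regions):
    """Return the first region in the app's preference order that is available."""
    for region in preferred_regions:
        if region in available_regions:
            return region
    raise RuntimeError("no preferred region is available")

# Example preference order for an app whose users are mostly on the US west coast.
preferred = ["West US", "North Europe", "Southeast Asia"]

# Normal operation: the closest region is available.
chosen = pick_read_region(preferred, {"West US", "North Europe", "Southeast Asia"})

# Regional outage: reads fall through to the next preference.
failover = pick_read_region(preferred, {"North Europe", "Southeast Asia"})

print(chosen, "->", failover)
```

Keeping the ordering in the application means each deployment can favor the region nearest to its own users.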

Global Distribution Demo

Code: https://aka.ms/docdb-latency-script-nodejs

Let’s talk about schema-freedom.

Problem 2: Variety

Item | Color | Microwave Safe | Liquid Capacity
Geek Mug | Graphite | Yes | 16oz
Coffee Bean Mug | Tan | No | 12oz

Problem 2: Variety

Item | Color | Microwave Safe | Liquid Capacity
Surface Book | Gray | ??? | ???

Variety: Different attributes

Item | Color | Microwave Safe | Liquid Capacity | CPU | Memory | Storage
Geek Mug | Graphite | Yes | 16oz | ??? | ??? | ???
Coffee Bean Mug | Tan | No | 12oz | ??? | ??? | ???
Surface Book | Gray | ??? | ??? | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD

Variety: More columns?

Item | Color | Microwave Safe | Liquid Capacity

Variety: More tables?

Item | CPU | Memory | Storage
Surface Book | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD

ProductId | Name
1 | Geek Mug
2 | Coffee Bean Mug
3 | Surface Book

Variety: Master data?

ProductId | Attribute | Value
1 | Microwave Safe | Yes
1 | Liquid Capacity | 16oz
… | … | …
2 | Microwave Safe | No
2 | Liquid Capacity | 12oz
… | … | …
3 | CPU | 3.4 GHz Intel Skylake Core i7-6600U
3 | Memory | 16GB
… | … | …

2.4 GHz Core i5-6300U

3.4 GHz Core i7-6600U

Variety : JSON is beautiful
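The same three products fit naturally as JSON documents, each carrying only the attributes that apply to it. The sketch below uses plain Python dictionaries; the property names (`microwaveSafe`, `liquidCapacity`, and so on) are chosen here for illustration and are not from the original slides.

```python
import json

# One document per product; no shared schema and no "???" placeholders.
catalog = [
    {"id": "1", "item": "Geek Mug", "color": "Graphite",
     "microwaveSafe": True, "liquidCapacity": "16oz"},
    {"id": "2", "item": "Coffee Bean Mug", "color": "Tan",
     "microwaveSafe": False, "liquidCapacity": "12oz"},
    {"id": "3", "item": "Surface Book", "color": "Gray",
     "cpu": "3.4 GHz Intel Skylake Core i7-6600U",
     "memory": "16GB", "storage": "1 TB SSD"},
]

def items_with(attribute):
    """Names of the products that define the given attribute at all."""
    return [doc["item"] for doc in catalog if attribute in doc]

# Filter on an attribute that only some documents have.
microwave_safe = [doc["item"] for doc in catalog if doc.get("microwaveSafe") is True]

print(json.dumps(catalog[2], indent=2))
print(items_with("cpu"), microwave_safe)
```

A new product type with new attributes is just another document; no columns, side tables, or attribute-value rows need to change.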

Retail

• Product Catalog

• Product Recommendations + Personalization

Gaming

• Multiplayer + Social Gameplay

IoT / Sensor Data

• Telemetry + Event Store

• Device Registry

Social Analytics + Ad Technology

• User behavior telemetry

• 3rd-Party Data from Web Crawlers

Common scenarios


IoT / Sensor Data Challenges:

• Hardware is relatively hard to update
• Different generations of devices => different schema (Variety)
• Lots of sensors emitting telemetry => high rate of ingestion (Volume + Velocity)

IoT: Vehicle Telematics

Diagram labels: Ingress API, Hot, Warm, Cold

Common Scenarios

Social Analytics + Ad Technology:

• Ingest + Analyze 3rd-Party Data => Who dictates schema? How do you index? (Variety)
• Lots of social / user profiles => high rate of ingestion (Volume + Velocity)

>1B Social Media Profiles

>50M Tweets per Day

Before moving to DocumentDB, my developers would need to come to me to confirm that our Elasticsearch deployment would support their data or if I would need to scale things to handle it. DocumentDB removed me as a bottleneck, which has been great for me and them.

- Stephen Hankinson, CTO, Affinio

Data Science Demo

Example: Graph Structures

Classic Graph Scenario: Flights
vertices = airports
edges = flights

Flight Graph with Spark and DocumentDB Notebook

View: https://aka.ms/docdb-spark-graph

Code: https://aka.ms/docdb-spark-graph-code


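Understanding the most important airport (most flights in / out)

from pyspark.sql.functions import desc

# Top 10 airports by inbound flights; tripGraph is assumed to be the notebook's GraphFrame
tripGraph.inDegrees\
  .sort(desc("inDegree"))\
  .limit(10)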

Graph Calculations: Degrees, PageRank
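For readers without a Spark cluster at hand, the two calculations can be sketched in plain Python on a tiny flight graph. The routes are hypothetical, and the `pagerank` function is a basic power-iteration version written for illustration; it is not the GraphFrames implementation used in the notebook.

```python
from collections import Counter

# Hypothetical routes: (origin airport, destination airport).
flights = [
    ("SEA", "SFO"), ("SEA", "JFK"), ("SFO", "SEA"),
    ("JFK", "SEA"), ("PDX", "SEA"), ("SFO", "JFK"),
]
airports = sorted({airport for edge in flights for airport in edge})

# In-degree: number of inbound flights per airport.
in_degrees = Counter(dst for _, dst in flights)

def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Basic power-iteration PageRank over a directed edge list."""
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # A node without outbound edges spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

ranks = pagerank(airports, flights)
busiest_inbound = in_degrees.most_common(1)[0][0]
top_ranked = max(ranks, key=ranks.get)

print(busiest_inbound, top_ranked)
```

In-degree counts direct inbound flights only, while PageRank also weighs how important the origin airports are, so on larger graphs the two rankings can differ.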


• Blazing Fast IoT Scenarios

• Updateable columns

• Push-down predicate filtering

Advantages of DocumentDB in Data Science Scenarios

Advantages: Blazing Fast IoT Scenarios

Diagram labels: Flight information, global safety alerts, weather; Data Science Scenarios; Device Notifications; Web / REST API

Advantages: Updateable Columns

Diagram labels: Flight information; Device Notifications; Web / REST API

{ tripid: "100100", delay: -5, time: "01:00:01" }

{ tripid: "100100", delay: -30, time: "01:00:01" }

{delay:-30}
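The update pictured above, where only the `delay` field of an existing trip changes, can be sketched with an in-memory dictionary standing in for the collection. The `update_fields` helper is hypothetical and is not a DocumentDB or Spark connector call.

```python
# In-memory stand-in for a collection, keyed by tripid.
trips = {
    "100100": {"tripid": "100100", "delay": -5, "time": "01:00:01"},
}

def update_fields(store, tripid, changes):
    """Merge the changed fields into the stored document and return it."""
    document = store[tripid]
    document.update(changes)
    return document

# A new delay arrives for the same trip: update in place, no new record.
updated = update_fields(trips, "100100", {"delay": -30})

print(updated)
```

The trip keeps a single document whose `delay` reflects the latest value, so downstream readers see the correction without reconciling multiple versions.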

Advantages: Pushdown Predicate Filtering

{city:SEA}

Diagram labels (document tree): locations, headquarter, exports; 0, 1; country: Germany; city: Seattle; country: France; city: Paris; city: Moscow; city: Athens; Belgium; 0, 1

{city:SEA, dst: POR, ...}, {city:SEA, dst: JFK, ...}, {city:SEA, dst: SFO, ...}, {city:SEA, dst: YVR, ...}, {city:SEA, dst: YUL, ...}, ...
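The benefit of pushing a predicate such as `{city:SEA}` down to the store can be sketched by comparing it with client-side filtering. The `IndexedStore` class and the sample flights are illustrative assumptions; the real connector and index are considerably more involved.

```python
from collections import defaultdict

# Hypothetical flight documents.
flights = [
    {"city": "SEA", "dst": "POR"},
    {"city": "SEA", "dst": "JFK"},
    {"city": "SFO", "dst": "SEA"},
    {"city": "SEA", "dst": "SFO"},
    {"city": "JFK", "dst": "YVR"},
]

class IndexedStore:
    """Toy document store with an index on the 'city' property."""

    def __init__(self, documents):
        self.documents = documents
        self.city_index = defaultdict(list)
        for position, doc in enumerate(documents):
            self.city_index[doc["city"]].append(position)

    def scan_all(self):
        """No pushdown: every document is returned to the caller."""
        return list(self.documents)

    def query_by_city(self, city):
        """Pushdown: the store uses its index and returns only matches."""
        return [self.documents[i] for i in self.city_index.get(city, [])]

store = IndexedStore(flights)

# Client-side filtering transfers all five documents, then discards two.
client_side = [doc for doc in store.scan_all() if doc["city"] == "SEA"]

# Pushed-down filtering transfers only the three matching documents.
pushed_down = store.query_by_city("SEA")

print(len(store.scan_all()), len(pushed_down))
```

Both paths return the same documents; the difference is how much data leaves the store before the filter is applied.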

Azure DocumentDB

More Resources / Coming Soon

Want to know more about the Spark-to-DocumentDB Connector?

Have any other questions?

Session Evaluations

Your feedback is important and valuable.

Submit by 5pm Friday November 6th to WIN prizes

3 ways to access:

• Go to passSummit.com
• Download the GuideBook App and search: PASS Summit 2016
• Follow the QR code link displayed on session signage throughout the conference venue and in the program guide

Thank You

Learn more from

Azure [email protected] or follow @DocumentDB

Data & Analytics
