IoT day 2015 NoSQL in Azure for IoT (and Business) Marco Parenzan Microsoft Azure MVP @marco_parenzan marco [dot] parenzan [at] 1nn0va [dot] it

NoSQL Database in Azure for IoT and Business

Page 1: NoSQL Database in Azure for IoT and Business

IoT day 2015

NoSQL in Azure for IoT

(and Business)

Marco Parenzan, Microsoft Azure MVP

@marco_parenzan

marco [dot] parenzan [at] 1nn0va [dot] it

Page 2: NoSQL Database in Azure for IoT and Business

Sponsor

Page 3: NoSQL Database in Azure for IoT and Business

Speaker info/Marco Parenzan

www.slideshare.net/marco.parenzan

www.github.com/marcoparenzan

marco [dot] parenzan [at] 1nn0va [dot] it

www.1nnova.it

@marco_parenzan

Training, outreach, and consulting with 1nn0va

Microsoft MVP 2014 for Microsoft Azure

Cloud Architect, .NET developer

Loves Functional Programming, HTML5 Game Programming and Internet of Things

Microservices Saturday 2015: a journey with NServiceBus

LIVE

Azure Community Bootcamp 2015

Page 4: NoSQL Database in Azure for IoT and Business

IoT as a hobby (now…?)

Page 5: NoSQL Database in Azure for IoT and Business

Data Ecosystem

Where do I put the data received in Event Hub?

Page 6: NoSQL Database in Azure for IoT and Business

From private to public Cloud

A Continuous offering

Microsoft Relational Storage Options

Page 7: NoSQL Database in Azure for IoT and Business

SQL Server database technology “as a Service”

Fully managed database-as-a-service built on SQL with near zero administration

Enterprise-ready with automatic support for HA, DR, Backups, replication and more

Highly available and elastically scalable for unpredictable SaaS workloads

Uptime SLA of 99.99%

Predictable performance & Pricing

Built-in regional database geo-replication for additional protection

All core search capabilities - faceting, suggestions, geospatial

Secure and compliant for your sensitive data

Fully compatible with SQL Server 2014 databases

SQL Azure features

Page 8: NoSQL Database in Azure for IoT and Business

[Diagram labels: Data (relational, non-relational, NoSQL, streaming; internal & external); information management; orchestration; complex event processing; modeling; machine learning; mobile reports; natural language query; dashboards; applications]

The Microsoft data platform

Page 9: NoSQL Database in Azure for IoT and Business

The traditional world

Page 10: NoSQL Database in Azure for IoT and Business

Business, no longer data, is the foundation of software design

DDD != OOP

Don’t start from data

Data are not unique

No more ACID: ACID transactions are not useful with a model distributed over different storage systems

Paradigm Shift

Page 11: NoSQL Database in Azure for IoT and Business

How many queries can be determined at the analysis stage?

“A repository should offer an explicit and well defined contract and avoid arbitrary queries”

In business … don’t delete anything (the repository doesn’t delete anything)

From theory to practice

Page 12: NoSQL Database in Azure for IoT and Business

Classic MVC

Business Logic

Contract BL/P

View

Controller

Page 13: NoSQL Database in Azure for IoT and Business

CQRS (Service Bus powered)

[Diagram labels: UI, Command Handler, Event, Event Handler, Queue, Topics/Subscription]

Page 14: NoSQL Database in Azure for IoT and Business

CQRS for IoT (Service Bus Powered)

[Diagram labels: Device, Event Hub, Event, Command Handler, Event Handler, UI, Queue, Topics/Subscription, Write Model, Read/Search Model]
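
The write model / read model split in this diagram can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions: the in-memory `writeModel`, `readModel` and `subscribers` stand in for the queue, topic and stores, and every name here (`handleRecordTemperature`, `TemperatureRecorded`) is hypothetical rather than part of Service Bus or Event Hub.

```javascript
// Hypothetical in-memory stand-ins for the stores and the topic.
const writeModel = [];          // append-only event log (write side)
const readModel = new Map();    // latest value per device (read/search side)
const subscribers = [];         // event handlers subscribed to the "topic"

function publish(event) {
  subscribers.forEach(handle => handle(event));
}

// Command handler: validates the command, records an event, publishes it.
function handleRecordTemperature(command) {
  if (typeof command.value !== "number") {
    throw new TypeError("value must be a number");
  }
  const event = {
    type: "TemperatureRecorded",
    deviceId: command.deviceId,
    value: command.value
  };
  writeModel.push(event);
  publish(event);
}

// Event handler: projects events into the read model.
subscribers.push(event => {
  if (event.type === "TemperatureRecorded") {
    readModel.set(event.deviceId, event.value);
  }
});
```

Queries would be served from `readModel` only, while `writeModel` keeps the full history of what happened.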

Page 15: NoSQL Database in Azure for IoT and Business

No longer build on data…but on “what happens”

No more one single data store

Data store types

Logs

Persistence

Saga (long transactions)

Search

Event-based systems

Page 16: NoSQL Database in Azure for IoT and Business

The Big Picture

A modern view:

Page 17: NoSQL Database in Azure for IoT and Business

The traditional world in Azure

Page 18: NoSQL Database in Azure for IoT and Business

Why Use a NoSQL Technology on Azure?

Page 19: NoSQL Database in Azure for IoT and Business

Choosing a Data Technology

Page 20: NoSQL Database in Azure for IoT and Business

Db for what?

To store data?

To manipulate data?

Long-term theme

Page 21: NoSQL Database in Azure for IoT and Business

NoSql Introduction

Page 22: NoSQL Database in Azure for IoT and Business

Key/Value

Table

Blob

Queue

Graph

Document

Not Only Sql Paradigms

Page 23: NoSQL Database in Azure for IoT and Business

What is a document database?

Definitely NOT this kind of document!

Page 24: NoSQL Database in Azure for IoT and Business

What is a document database?

Not ideal, but it can work -

{

"id": "13244_post",

"text": "Lorizzle ghetto dolor tellivizzle boofron, stuff pimpin' elizzle. Nullam sapizzle

velizzle, my shizz tellivizzle, suscipizzle funky fresh, shizzle my nizzle crocodizzle

vizzle, arcu. Pellentesque eget tortizzle. Sizzle erizzle. Mammasay mammasa mamma oo sa

break it down dolor own yo' things fo shizzle mah nizzle fo rizzle, mah home g-dizzle

sure. Maurizzle pellentesque dawg ghetto turpizzle. Shiz izzle my shizz. Pellentesque

eleifend rhoncizzle nisi. In its fo rizzle owned ma nizzle dictumst. Sizzle gangsta.

Curabitur tellizzle urna, pretizzle go to hizzle, mattizzle izzle, eleifend vitae,

tellivizzle. Dawg shizzlin dizzle. Integer semper velit sizzle stuff.

Boofron mofo auctizzle ma nizzle. Pot a elizzle ut nibh pretium tincidunt. Maecenizzle

things erat. Own yo' in lacizzle sed maurizzle elementizzle tristique. I'm in the

shizzle yippiyo sizzle daahng dawg eros ultricizzle . In velit tortor, ultricizzle

ghetto, hendrerizzle fo shizzle mah nizzle fo rizzle, mah home g-dizzle, adipiscing

crunk, boom shackalack. Etizzle velit doggy, hizzle consequizzle, pharetra get down

get down, dictizzle sed, shut the shizzle up. Fo shizzle neque. Fo lorizzle. Bling

bling vitae pizzle ut libero commodo gizzle. Fusce izzle augue eu yo mamma dang.

Phasellizzle break it down fo nizzle erat. Suspendisse shizzlin dizzle owned,

sollicitudin sizzle, mah nizzle izzle, commodo nec, justo. Donizzle fizzle

porttitizzle ligula. Nunc feugizzle, tellus tellivizzle ornare tempor, sapizzle break

it down tincidunt gangster, eget dapibus daahng dawg enizzle izzle that's the shizzle.

Stuff quizzle leo, imperdizzle izzle, fo shizzle my nizzle izzle, semper izzle,

sapien. Ut boofron magna vizzle ghetto. I'm in the shizzle ante bling bling,

suscipizzle vitae, yo mamma stuff, rutrizzle pizzle, velizzle.

Mauris da bomb go to zzle. Sizzle mammasay mammasa mamma oo sa magna own yo' amet risus

congue. Boofron mofo auctizzle ma nizzle. Pot a elizzle ut nibh pretium tincidunt.

things erat. Own yo' in lacizzle sed maurizzle elementizzle tristique. I'm in the

shizzle yippiyo sizzle daahng dawg eros ultricizzle . In velit tortor, ultricizzle

ghetto, hendrerizzle fo shizzle mah nizzle fo rizzle, mah home g-dizzle, adipiscing

crunk, boom shackalack. Etizzle velit doggy, hizzle consequizzle, pharetra get down

get down, dictizzle sed, shut the shizzle up. Fo shizzle neque. Fo lorizzle. Bling "

}

Page 25: NoSQL Database in Azure for IoT and Business

What is a document database?

Ideally suited to this

kind of document -

{
    "id": "13244_user",
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "employmentHistory": [
        {
            "company": "Contoso Inc",
            "start": {"date": "Thu, 02 Apr 2015 20:54:45 GMT", "epoch": 1428008086},
            "position": "CEO"
        },
        {
            "start": {"date": "Thu, 02 Apr 2012 20:54:45 GMT", "epoch": 1428008086},
            "end": {"date": "Thu, 01 Apr 2015 20:54:45 GMT", "epoch": 1428008086},
            "position": "GM"
        }
    ],
    "address": {
        "streetAddress": "21 2nd Str",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "children": [
        {"name": "Megan", "age": 10},
        {"name": "Bruce", "age": 7},
        {"name": "Angus", "sports": ["football", "basketball", "hockey"]}
    ],
    "mobileNumber": "212 555-1234"
}

Page 26: NoSQL Database in Azure for IoT and Business

JSON can represent complex containment relationships that are

difficult to represent in RDBMS

Schema-less: great for growing requirements during development, unlike an RDBMS, where you must know the structure up front and it's painful to modify it

Native notation for JavaScript

Why JSON?

Page 27: NoSQL Database in Azure for IoT and Business

Try to treat your entities as self-contained documents represented in JSON

When working with relational databases, we've been taught for years to normalize, normalize, normalize.

In general, embed when:

• There are “contains” relationships between entities.

• There are one-to-few relationships between entities.

• There is embedded data that changes infrequently.

• There is embedded data that won't grow without bound.

• There is embedded data that is integral to data in a document.

Embedding

Embedding typically provides better read performance

Page 28: NoSQL Database in Azure for IoT and Business

Representing one-to-many relationships.

Representing many-to-many relationships.

Related data changes frequently.

Referenced data could be unbounded

Provides more flexibility than embedding

More round trips to read data

Referencing

Normalizing typically provides better write performance
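
The read-side cost of referencing ("more round trips to read data") compared with embedding can be made concrete with a small sketch. It assumes a plain in-memory `Map` as a stand-in for a collection; `store`, `read`, `readCount` and the order/line document shapes are hypothetical and no DocumentDB API is involved.

```javascript
// Hypothetical in-memory store standing in for a collection; counts reads.
const store = new Map();
let readCount = 0;

function read(id) {
  readCount += 1;
  return store.get(id);
}

// Embedded model: the order lines live inside the order document.
store.set("order-embedded", {
  id: "order-embedded",
  lines: [{ sku: "A1", qty: 2 }, { sku: "B7", qty: 1 }]
});

// Referenced model: the order only holds the ids of its lines.
store.set("order-referenced", { id: "order-referenced", lineIds: ["line-1", "line-2"] });
store.set("line-1", { id: "line-1", sku: "A1", qty: 2 });
store.set("line-2", { id: "line-2", sku: "B7", qty: 1 });

function loadEmbedded(id) {
  return read(id).lines;                         // one read returns everything
}

function loadReferenced(id) {
  const order = read(id);                        // first round trip: the order
  return order.lineIds.map(lineId => read(lineId)); // one more per referenced line
}
```

In this sketch, loading the embedded order costs one read, while the referenced order costs one read plus one per line; in exchange, a referenced line can be updated without rewriting the order document.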

Page 29: NoSQL Database in Azure for IoT and Business

• No magic bullet

Think about how your data

is going to be written, read

and model accordingly

Hybrid models ~ denormalize + reference + aggregate

{
    "id": "1",
    "firstName": "Thomas",
    "lastName": "Andersen",
    "countOfBooks": 3,
    "books": [1, 2, 3],
    "images": [
        {"thumbnail": "http://....png"},
        {"profile": "http://....png"}
    ]
}

{
    "id": 1,
    "name": "DocumentDB 101",
    "authors": [
        {"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
        {"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
    ]
}

Page 30: NoSQL Database in Azure for IoT and Business

Promotes code-first development (mapping objects to JSON)

Resilient to iterative schema changes

Richer query and indexing (compared to KV stores)

Low impedance as object / JSON store; no ORM required

It just works

It’s fast

Developer Appeal

Page 31: NoSQL Database in Azure for IoT and Business

DocumentDb Introduction

Page 32: NoSQL Database in Azure for IoT and Business

Store schema-less JSON documents

Excels at search w/ SQL syntax

JavaScript for Stored Procs, Triggers and UDFs

Elastic capacity (not in specific Azure sense, up to now)

Multi-document transaction (Batch)

Tweak everything (read/write performance vs. consistency, index performance, security)

Designed for massive scale

What is DocumentDb?

Page 33: NoSQL Database in Azure for IoT and Business

Applications that need managed elastic scale

Customer does not want to add additional IT resources for

support and maintenance

Avoiding CAPEX and OPEX

Built-for-the-cloud database technology

Access via RESTful HTTP API or client library

DocumentDB: DbaaS

Page 34: NoSQL Database in Azure for IoT and Business

Catalog data

Preferences and state

Event store

User generated content

Data exchange

Typical usage

Page 35: NoSQL Database in Azure for IoT and Business

Resource Model

Page 36: NoSQL Database in Azure for IoT and Business

Database Account

Page 37: NoSQL Database in Azure for IoT and Business

Database

Page 38: NoSQL Database in Azure for IoT and Business

Collections

* collection != table of homogeneous entities

collection ~ a data partition

Page 39: NoSQL Database in Azure for IoT and Business

Documents

{
    "id": "123",
    "name": "joe",
    "age": 30,
    "address": {
        "street": "some st"
    }
}

Page 40: NoSQL Database in Azure for IoT and Business

Users, Server Scripts, Attachments

Page 41: NoSQL Database in Azure for IoT and Business

Collections

Page 42: NoSQL Database in Azure for IoT and Business

a container of JSON documents and the associated JavaScript

application logic

JSON docs inside of a collection can vary dramatically

A unit of scale for transaction and query throughput (capacity

units allocated uniformly across all collections)

A unit of scale for capacity

A unit of replication

What is a collection?

Page 43: NoSQL Database in Azure for IoT and Business

Collections in DocumentDB are not just logical containers, but also physical containers

They are the transaction boundary for stored procedures and triggers

They are the entry point for queries and CRUD operations

Each collection is assigned a reserved amount of throughput which is not shared with other collections in the same account

Collections do not enforce schema

Collections

Page 44: NoSQL Database in Azure for IoT and Business

Partitioning

Page 45: NoSQL Database in Azure for IoT and Business

Design: Partitioning

Why Partition?

• Data size: a single collection (currently*) holds 10 GB

• Throughput: 3 performance tiers with a max of 2,500 RU/sec

Page 46: NoSQL Database in Azure for IoT and Business

In hash partitioning, partitions are assigned based on the value

of a hash function, allowing you to evenly distribute requests

and data across a number of partitions. This is commonly used

to partition data produced or consumed from a large number of

distinct clients, and is useful for storing user profiles, catalog

items, and IoT ("Internet of Things") telemetry data.

Hash Partitioning
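
A minimal sketch of hash partitioning as described on this slide: a stable hash of the partition key selects one of a fixed set of partitions. The hash function (a djb2-style string hash), the `telemetry-N` collection names and the `resolvePartition` helper are all assumptions made for illustration; they are not the resolver shipped with any SDK.

```javascript
// Simple deterministic string hash (djb2 variant); any stable hash would do.
function hashString(key) {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    hash = ((hash * 33) ^ key.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Hypothetical collection names acting as partitions.
const partitions = ["telemetry-0", "telemetry-1", "telemetry-2", "telemetry-3"];

// The same device id always maps to the same partition.
function resolvePartition(deviceId) {
  return partitions[hashString(deviceId) % partitions.length];
}
```

Because the mapping depends on `partitions.length`, adding a partition later moves most keys; schemes such as consistent hashing are usually considered when the partition count is expected to change.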

Page 47: NoSQL Database in Azure for IoT and Business

In range partitioning, partitions are assigned based on whether

the partition key is within a certain range

This is commonly used for partitioning with time

stamp properties

Keep current data hot, Warm historical data, Scale-down older

data, Purge / Archive

Range partitioning
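
Range partitioning on a timestamp can be sketched as a list of half-open intervals, one per partition. The yearly `telemetry-YYYY` partitions and the helper names below are hypothetical; the boundaries are Unix epoch seconds for 1 January (UTC) of each year.

```javascript
// Hypothetical ranges: each partition covers [fromEpoch, toEpoch) in Unix seconds.
const ranges = [
  { name: "telemetry-2013", fromEpoch: 1356998400, toEpoch: 1388534400 },
  { name: "telemetry-2014", fromEpoch: 1388534400, toEpoch: 1420070400 },
  { name: "telemetry-2015", fromEpoch: 1420070400, toEpoch: 1451606400 }
];

// Partition that stores a document with the given timestamp.
function resolveByEpoch(epoch) {
  const match = ranges.find(r => epoch >= r.fromEpoch && epoch < r.toEpoch);
  if (!match) throw new RangeError("No partition covers epoch " + epoch);
  return match.name;
}

// Partitions a range query (inclusive bounds) has to touch.
function partitionsForRange(fromEpoch, toEpoch) {
  return ranges
    .filter(r => r.fromEpoch <= toEpoch && r.toEpoch > fromEpoch)
    .map(r => r.name);
}
```

With this layout the older partitions can be scaled down, archived or purged as a whole, which is the hot/warm/archive lifecycle the slide refers to.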

Page 48: NoSQL Database in Azure for IoT and Business

In lookup partitioning, partitions are assigned based on a lookup map (also known as a partition map or shard map) that assigns discrete partition values to specific partitions

This is commonly used for partitioning by region

Lookup partitioning

Tenant          Partition Id
Customer        1
Big Customer    2
Another         3
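
The tenant table above is effectively a shard map, which can be sketched as an explicit lookup. The `shardMap` and `resolveTenant` names are hypothetical; the entries mirror the slide's table.

```javascript
// Shard map taken from the slide's table: tenant name -> partition id.
const shardMap = new Map([
  ["Customer", 1],
  ["Big Customer", 2],
  ["Another", 3]
]);

// Unlike hashing, every tenant must be registered explicitly.
function resolveTenant(tenant) {
  if (!shardMap.has(tenant)) {
    throw new Error("Unknown tenant: " + tenant);
  }
  return shardMap.get(tenant);
}
```

The explicit map makes it possible to give a large tenant its own partition or to move a tenant by changing a single entry, at the cost of maintaining the map.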

Page 49: NoSQL Database in Azure for IoT and Business

{ "record": "1", "created": { "date": "6/1/2014", "epoch": 1401662986 } },
{ "record": "3", "created": { "date": "9/23/2014", "epoch": 1411512586 } },
{ "record": "123", "created": { "date": "8/17/2013", "epoch": 1376779786 } }

SELECT * FROM root r WHERE r.created.epoch BETWEEN 1376779786 AND 1401662986

{ "record": "1", "created": { "date": "6/1/2014", "epoch": 1401662986 } },
{ "record": "3", "created": { "date": "9/23/2014", "epoch": 1411512586 } },
{ "record": "43233", "created": { "epoch": 1411512586 } },
{ "record": "1123", "created": { "date": "8/17/2013", "epoch": 1376779786 } },
{ "record": "43234", "created": { "epoch": 1376779786 } }

Partitioning - Fan-out Queries
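
A fan-out query runs the same filter against every partition and merges the partial results on the client. The sketch below uses records from this slide; how they are split between `partitionA` and `partitionB` is an assumption, and the arrays stand in for collections (no SDK calls are made).

```javascript
// Hypothetical split of the slide's records across two partitions.
const partitionA = [
  { record: "1", created: { date: "6/1/2014", epoch: 1401662986 } },
  { record: "3", created: { date: "9/23/2014", epoch: 1411512586 } },
  { record: "123", created: { date: "8/17/2013", epoch: 1376779786 } }
];
const partitionB = [
  { record: "43233", created: { epoch: 1411512586 } },
  { record: "1123", created: { date: "8/17/2013", epoch: 1376779786 } },
  { record: "43234", created: { epoch: 1376779786 } }
];

// Equivalent of: WHERE r.created.epoch BETWEEN fromEpoch AND toEpoch
function queryPartition(docs, fromEpoch, toEpoch) {
  return docs.filter(d => d.created.epoch >= fromEpoch && d.created.epoch <= toEpoch);
}

// Fan out to every partition, then merge and order on the client.
function fanOut(allPartitions, fromEpoch, toEpoch) {
  return allPartitions
    .map(p => queryPartition(p, fromEpoch, toEpoch))
    .reduce((all, part) => all.concat(part), [])
    .sort((a, b) => a.created.epoch - b.created.epoch);
}
```

Calling `fanOut([partitionA, partitionB], 1376779786, 1401662986)` touches both partitions even though each contributes only part of the answer, which is why a partition key that matches the common query filter keeps most queries on a single partition.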

Page 50: NoSQL Database in Azure for IoT and Business

Consistency

Page 51: NoSQL Database in Azure for IoT and Business

Query / transaction throughput (and reliability – i.e., hardware failure) depend on

replication!

All writes to the primary are replicated across two secondary replicas

All reads are distributed across three copies

“Scalability of throughput” – allowing different clients to read from different replicas helps prevent

bottlenecks

BUT replication takes time!

Potential scenario: some clients are

reading while another is writing

Now, the data is out-of-date, inconsistent!

Why worry about consistency?
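
The stale-read scenario on this slide can be reproduced with a toy replica set in which secondaries only catch up when replication runs. This is a simulation under stated assumptions, not how the service is implemented: `createReplicaSet` and its methods are hypothetical names.

```javascript
// Toy replica set: one primary and two secondaries that apply writes
// only when replicate() runs, to make replication lag visible.
function createReplicaSet() {
  const primary = new Map();
  const secondaries = [new Map(), new Map()];
  const pending = [];

  return {
    write(key, value) {
      primary.set(key, value);
      pending.push([key, value]);     // replication happens later
    },
    replicate() {
      for (const [key, value] of pending) {
        secondaries.forEach(s => s.set(key, value));
      }
      pending.length = 0;
    },
    readPrimary(key) {
      return primary.get(key);
    },
    readSecondary(index, key) {
      return secondaries[index].get(key);
    }
  };
}
```

A client that writes 25 and then reads from a secondary before `replicate()` has run still sees the previous value; the consistency levels that follow are different answers to how much of that window a reader is allowed to observe.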

Page 52: NoSQL Database in Azure for IoT and Business

Trade-off: speed (performance & availability) or consistency (data correctness)?

“Does every read need the MOST current data?”

“Or do I need every request to be handled and handled quickly?”

No “one size fits all” answer … so it’s up to you!

4 options …

For the entire Db…

…In a future release, we intend to support overriding the default consistency level on a per collection basis.

Tweakable Consistency

Page 53: NoSQL Database in Azure for IoT and Business

client always sees completely consistent data

Slowest reads / writes

Mission critical: e.g. stock market, banking, airline reservation

Strong

Page 54: NoSQL Database in Azure for IoT and Business

Default – even trade-off between performance & availability vs.

data correctness

client reads its own writes, but other clients reading this same

data might see older values

Session

Page 55: NoSQL Database in Azure for IoT and Business

client might see old data, but it can specify a limit for how old that data can be (e.g. 2 seconds)

Updates happen in order received

similar to Session consistency, but speeds up reads while still

preserving the order of updates

Bounded Staleness

Page 56: NoSQL Database in Azure for IoT and Business

client might see old data for as long as it takes a write to

propagate to all replicas

High performance & availability, but a client might sometimes

read out-of-date information or see updates out of order

Eventual

Page 57: NoSQL Database in Azure for IoT and Business

At the database level (see preview portal)

On a per-read or per-query basis (optional parameter on

CreateDocumentQuery method)

Setting Consistency

Page 58: NoSQL Database in Azure for IoT and Business

Use Weaker Consistency Levels for better Read latencies

• IoT

• Data Analysis

http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/

Consistency Tips

Page 59: NoSQL Database in Azure for IoT and Business

Indexing

Page 60: NoSQL Database in Azure for IoT and Business

Efficient, rich hierarchical and relational queries without any schema or

index definitions.

Consistent query results while handling a sustained volume of writes. For

high write throughput workloads with consistent queries, the index is

updated incrementally, efficiently, and online while handling a sustained

volume of writes.

Storage efficiency. For cost effectiveness, the on-disk storage overhead of

the index is bounded and predictable.

Indexing

Page 61: NoSQL Database in Azure for IoT and Business

Indexing modes

• Consistent: default mode; index updated synchronously on writes

• Lazy: useful for bulk ingestion scenarios

Indexing policies

• Automatic: default

• Manual: can choose to index documents via RequestOptions; can read non-indexed documents via selflink

Set indexing mode

var collection = new DocumentCollection
{
    Id = "lazyCollection"
};
collection.IndexingPolicy.IndexingMode = IndexingMode.Lazy;
client.CreateDocumentCollectionAsync(databaseLink, collection);

Set indexing policy

var collection = new DocumentCollection
{
    Id = "manualCollection"
};
collection.IndexingPolicy.Automatic = false;
client.CreateDocumentCollectionAsync(databaseLink, collection);

Indexing – Modes and policies

Page 62: NoSQL Database in Azure for IoT and Business

Index paths

• Include and/or exclude paths

Index types

• Hash: supported for strings and numbers; optimized for equality matches

• Range: supported for numbers; optimized for comparison queries

Index precision

• String precision: default is 3

• Numeric precision: default is 3; increase for larger number fields

Setting paths, types, and precision

var collection = new DocumentCollection
{
    Id = "Orders"
};
collection.IndexingPolicy.ExcludedPaths.Add("/\"metaData\"/*");
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
    IndexType = IndexType.Hash,
    Path = "/",
});
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
    IndexType = IndexType.Range,
    Path = @"/""shippedTimestamp""/?",
    NumericPrecision = 7
});
client.CreateDocumentCollectionAsync(databaseLink, collection);

Indexing – Paths and types

Page 63: NoSQL Database in Azure for IoT and Business

Use lazy indexing for faster peak time ingestion rates

Exclude unused paths from indexing for faster writes

Specify range index path type for all paths used in range queries

Vary index precision for write vs query performance vs storage

tradeoffs

http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/

Indexing tips

Page 64: NoSQL Database in Azure for IoT and Business

Querying

Page 65: NoSQL Database in Azure for IoT and Business

Optimize for queries with small result sets for scalability

Limit use of scans (no range index, NOT, UDFs in WHERE)

Use page size (MaxItemCount) and continuation tokens

For large result sets, use a larger page size (1000)

Querying
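
The page size and continuation token tip can be sketched as a read loop. The `queryPage` function below is a hypothetical stand-in for a paged query endpoint (it is not the DocumentDB SDK), and the token format, a stringified offset, is an assumption made only so that the example runs.

```javascript
// Hypothetical data set and paged endpoint; tokens are opaque to the caller.
const allDocs = Array.from({ length: 25 }, (_, i) => ({ id: String(i) }));

function queryPage(maxItemCount, continuation) {
  const start = continuation ? Number(continuation) : 0;
  const items = allDocs.slice(start, start + maxItemCount);
  const next = start + maxItemCount;
  return { items, continuation: next < allDocs.length ? String(next) : null };
}

// Reads every page, passing the continuation token back until it is null.
function readAll(maxItemCount) {
  const results = [];
  let continuation = null;
  let pages = 0;
  do {
    const page = queryPage(maxItemCount, continuation);
    results.push(...page.items);
    continuation = page.continuation;
    pages += 1;
  } while (continuation !== null);
  return { results, pages };
}
```

With a page size of 10 the 25 documents take three round trips; with a page size of 1000 they take one, which is the reason for the "larger page size for large result sets" advice.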

Page 66: NoSQL Database in Azure for IoT and Business

Query over heterogeneous documents without defining

schema or managing indexes

Query arbitrary paths, properties and values without

specifying secondary indexes or indexing hints

Execute queries with consistent results

Supported SQL features: predicates, iterations (arrays), sub-queries, logical operators, UDFs, intra-document JOINs, JSON transforms

In general, more predicates result in a larger request

charge.

Additional predicates can help if they result in narrowing

the overall result set.

LINQ Query

from book in client.CreateDocumentQuery<Book>(collectionSelfLink)
where book.Title == "War and Peace"
select book;

from book in client.CreateDocumentQuery<Book>(collectionSelfLink)
where book.Author.Name == "Leo Tolstoy"
select book.Author;

SQL Query Grammar

-- Nested lookup against index
SELECT B.Author
FROM Books B
WHERE B.Author.Name = "Leo Tolstoy"

-- Transformation, Filters, Array access
SELECT { Name: B.Title, Author: B.Author.Name }
FROM Books B
WHERE B.Price > 10 AND B.Language[0] = "English"

-- Joins, User Defined Functions (UDF)
SELECT udf.CalculateRegionalTax(B.Price, "USA", "WA")
FROM Books B
JOIN L IN B.Languages
WHERE L.Language = "Russian"

Query

Page 67: NoSQL Database in Azure for IoT and Business

Programmability

Page 68: NoSQL Database in Azure for IoT and Business

function region(doc)

{

switch (doc.Location.Region)

{

case 0:

return "North";

case 1:

return "Middle";

case 2:

return "South";

}

}

The complexity of a query affects the request units consumed by an operation, for example the use of user-defined functions (UDFs) in SELECT or WHERE clauses

To take advantage of indexing, try to have at least one filter against an indexed property when using a UDF in the WHERE clause.

Query with user-defined function
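
The advice to pair a UDF with a filter on an indexed property can be illustrated locally. The sketch repeats the slide's `region` function with one addition, a `default` branch (an assumption: the slide's version returns `undefined` for unknown codes), and emulates the query with array filters over hypothetical `readings`.

```javascript
// The slide's UDF, plus a default branch added for this sketch.
function region(doc) {
  switch (doc.Location.Region) {
    case 0:
      return "North";
    case 1:
      return "Middle";
    case 2:
      return "South";
    default:
      return "Unknown"; // assumption: not part of the original UDF
  }
}

// Hypothetical sensor documents.
const readings = [
  { id: "a", Degree: { Value: 31 }, Location: { Region: 0 } },
  { id: "b", Degree: { Value: 12 }, Location: { Region: 2 } },
  { id: "c", Degree: { Value: 35 }, Location: { Region: 2 } }
];

// Cheap property filter first (the part an index can serve on the server),
// then the UDF on the documents that remain.
const hotInSouth = readings
  .filter(r => r.Degree.Value > 30)
  .filter(r => region(r) === "South")
  .map(r => r.id);
```

Here the UDF runs on two documents instead of three; on a real collection the property filter is what lets the index narrow the candidates before the UDF is evaluated.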

Page 69: NoSQL Database in Azure for IoT and Business

function count(filterQuery, continuationToken) {
    var collection = getContext().getCollection();

    // MAX number of docs to process in one batch; when reached, return to
    // client/request continuation. Intentionally set low to demonstrate the
    // concept. This can be much higher. Try experimenting.
    // We've had it in to the high thousands before seeing the stored
    // procedure timing out.
    var maxResult = 25;

    // The number of documents counted.
    var result = 0;

    tryQuery(continuationToken);

    // tryQuery and the remaining helper functions are not shown on the slide.
}

Execute “explicit” JavaScript code on a collection

Executing Stored Procedures

Page 70: NoSQL Database in Azure for IoT and Business

function normalize() {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var doc = getContext().getRequest().getBody();

    var newDoc = {
        "Sensor": { "Id": doc.sensorId, "Class": 0 },
        "Degree": { "Value": doc.degreeValue, "Type": 0 },
        "Location": {
            "Name": doc.locationName,
            "Region": doc.locationRegion,
            "Longitude": doc.locationLong,
            "Latitude": doc.locationLat
        },
        "id": doc.id
    };

    // Update the request -- this is what is going to be inserted.
    getContext().getRequest().setBody(newDoc);
}

Execute “implicit” JavaScript code on CRUD operations (Insert, Update, Delete) on collections

Triggers!
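
A pre-trigger like the one on this slide can be exercised locally by faking the server-side context. In the sketch below `normalize` is a trimmed copy of the slide's trigger (the Location block is left out for brevity), while `getContext`, `fakeRequest` and `runPreTrigger` are hypothetical test scaffolding; on the server the context is provided by the database, not by your code.

```javascript
// Trimmed copy of the slide's pre-trigger: reshapes a flat reading
// into the nested document that is actually stored.
function normalize() {
  var doc = getContext().getRequest().getBody();
  var newDoc = {
    "Sensor": { "Id": doc.sensorId, "Class": 0 },
    "Degree": { "Value": doc.degreeValue, "Type": 0 },
    "id": doc.id
  };
  // Update the request: this is what is going to be inserted.
  getContext().getRequest().setBody(newDoc);
}

// Hypothetical stand-in for the server-side context, for local runs only.
var fakeRequest = null;
function getContext() {
  return { getRequest: function () { return fakeRequest; } };
}

// Runs the trigger against an incoming document and returns what would be stored.
function runPreTrigger(incoming) {
  var body = incoming;
  fakeRequest = {
    getBody: function () { return body; },
    setBody: function (next) { body = next; }
  };
  normalize();
  return body;
}
```

For example, `runPreTrigger({ id: "r1", sensorId: "s-9", degreeValue: 21.5 })` returns a document whose `Sensor.Id` is `"s-9"` and which no longer carries the flat `sensorId` property.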

Page 71: NoSQL Database in Azure for IoT and Business

Performances

Page 72: NoSQL Database in Azure for IoT and Business

Data is saved on SSD

All writes to the primary are replicated across two secondary replicas (replicas are spread across different hardware in the same region to protect against failures)

All reads are distributed across the three copies (when and how depend on consistency level for db account and query)

DocumentDb Performance

Page 73: NoSQL Database in Azure for IoT and Business

Measure and Tune for lower request units/second usage

DocumentDB offers a rich set of database operations including relational and hierarchical queries with UDFs, stored procedures and triggers – all operating on the documents within a database collection. The cost associated with each of these operations will vary based on the CPU, IO and memory required to complete the operation. Instead of thinking about and managing hardware resources, you can think of a request unit (RU) as a single measure for the resources required to perform various database operations and service an application request.

Handle Server throttles/request rate too large

When a client attempts to exceed the reserved throughput for an account, there will be no performance degradation at the server and no use of throughput capacity beyond the reserved level. The server will preemptively end the request with RequestRateTooLarge (HTTP status code 429) and return the x-ms-retry-after-ms header indicating the amount of time, in milliseconds, that the user must wait before reattempting the request.

Delete empty collections to utilize all provisioned throughput

Every document collection created in a DocumentDB account is allocated reserved throughput capacity based on the number of Capacity Units (CUs) provisioned, and the number of collections created. A single CU makes available 2,000 request units (RUs) and supports up to 3 collections

Design for smaller documents for higher throughput

The Request Charge (i.e. request processing cost) of a given operation is directly correlated to the size of the document

http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/

Performance Tips
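
The throttling tip (HTTP 429 plus the `x-ms-retry-after-ms` header) translates into a small retry loop. This is a sketch under assumptions: `operation` is any function returning a promise of `{ status, headers }`, and `withThrottleRetry`, `defaultSleep` and the response shape are hypothetical rather than part of a client library, which may already retry on your behalf.

```javascript
// Waits for the given number of milliseconds.
const defaultSleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Retries an operation while the server answers 429, waiting for the
// interval the server asked for before each new attempt.
async function withThrottleRetry(operation, maxAttempts, sleep = defaultSleep) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await operation();
    if (response.status !== 429) {
      return response;
    }
    const waitMs = Number(response.headers["x-ms-retry-after-ms"]) || 0;
    if (attempt < maxAttempts) {
      await sleep(waitMs);
    }
  }
  throw new Error("Request rate too large after " + maxAttempts + " attempts");
}
```

The `sleep` parameter is injectable so the loop can be tested without real delays; in production the default timer-based sleep would be used.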

Page 74: NoSQL Database in Azure for IoT and Business

Considerations

Page 75: NoSQL Database in Azure for IoT and Business

User generated content

Many specific data (varbinary(MAX) in SQL)

Catalog data

Log data

User preferences data

Device sensor data

IoT use cases commonly share some patterns in how they ingest, process and store data. First, these systems allow for data intake that can ingest bursts of data from device sensors of various locales. Next, these systems process and analyze streaming data to derive real time insights. And last but not least, most if not all data will eventually land in a data store for adhoc querying and offline analytics.

Usage: what is DocumentDb for?

Page 76: NoSQL Database in Azure for IoT and Business

Maturity: Balancing embedding (ok) and relating (limits)

Searching and Denormalizing

Opportunity

Storing transient Data

Better Opportunities

Storing Files

Append Only

(Table) Storage

Limits from DocumentDb

Page 77: NoSQL Database in Azure for IoT and Business

Logs

Attachments

Transient Data

Search

Alternatives for some scenarios

Page 78: NoSQL Database in Azure for IoT and Business

Targeted at streaming workloads (E.g. files read from beginning

to end like media files)

Each blob consists of a sequence of blocks

Each block is identified by a Block ID

Each block can be a maximum of 4 MB in size (a blob of up to 64 MB can be uploaded in a single operation)

Size limit 200GB per blob

Azure Storage Blob: Block Blob

Page 79: NoSQL Database in Azure for IoT and Business

Targeted at random read/write workloads (E.g. backing storage

for the VHDs used in Azure VMs)

Each blob consists of an array of pages

Each page is identified by its offset from the start of the blob

Size limit 1TB per blob

Azure Storage Blob: Page Blob

Page 80: NoSQL Database in Azure for IoT and Business

Not an RDBMS Table!

The mental picture is ‘Entities’

Entity can have up to 255 properties

Up to 1MB per entity

Partitioning

PartitionKey & RowKey are mandatory properties

Composite key which uniquely identifies an entity

They are the only indexed properties

Defines the sort order

Purpose of the PartitionKey:

Entity Locality

Entities in the same partition will be stored together

Efficient querying and cache locality

Entity Group Transactions

Target throughput – 500 tps/partition, several thousand tps/account

Microsoft Azure monitors the usage patterns of partitions

Automatically load balance partitions

Each partition can be served by a different storage node

Scale to meet the traffic needs of your table

Supports full manipulation (CRUD)

Table Scalability

Azure Table Storage Details
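
Since PartitionKey and RowKey are the only indexed properties and define the sort order, key design carries most of the modeling work. The sketch below shows one common choice for sensor readings, assumed here for illustration: the device id as PartitionKey and a zero-padded inverted timestamp as RowKey so that the newest reading sorts first. `toEntity` and `MAX_EPOCH_MS` are hypothetical names and no storage SDK is used.

```javascript
// Largest 13-digit millisecond timestamp, used to invert the sort order.
const MAX_EPOCH_MS = 9999999999999;

// Builds a hypothetical table entity for one sensor reading.
// PartitionKey: device id, so one device's readings are stored together.
// RowKey: zero-padded inverted timestamp, so newer readings sort first.
function toEntity(deviceId, epochMs, value) {
  const inverted = String(MAX_EPOCH_MS - epochMs).padStart(13, "0");
  return { PartitionKey: deviceId, RowKey: inverted, Value: value };
}
```

All readings of one device land in a single partition, so a per-device query stays local, but the per-partition throughput target on the slide then also becomes the ceiling for a single device.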

Page 81: NoSQL Database in Azure for IoT and Business

Embed a sophisticated search experience into web and mobile

applications without having to worry about the complexities of

full-text search and without having to deploy, maintain or

manage any infrastructure.

Perfect for enterprise cloud developers, cloud software vendors,

cloud architects who need a fully-managed search solution.

Search is a natural backend for Cortana

Take a bunch of words, apply linguistics, return relevant results

Azure Search

Page 82: NoSQL Database in Azure for IoT and Business

“Search service”

• Scope for capacity

• Bound to a region

• Has keys, indexes, indexers, data sources

Provisioning

• Azure Portal

• Azure resource management API

Elastic scale

• Capacity can be changed dynamically

• Replicas ~ more QPS, HA

• Partitions ~ more documents, write throughput

Azure Search Service

Page 83: NoSQL Database in Azure for IoT and Business

Simple HTTP/JSON API for creating indexes, pushing documents, searching

Keyword search with user-friendly operators (+, -, *, “”, etc.)

Hit highlighting

Faceting (histograms over ranges, typically used in catalog browsing)

Based on ElasticSearch

Search Functionality

Page 84: NoSQL Database in Azure for IoT and Business

Linguistics are key in search

Support for 50 languages

• Word breaking, stop words, inflections

Lucene analyzers

• Well-known analyzer stack

• Stemming

Microsoft analyzers

• Same NLP stack used by parts of Office, Bing

• Lemmatization in many languages

Linguistics

Page 85: NoSQL Database in Azure for IoT and Business

Suggestions (auto-complete)

Rich structured queries (filter, select, sort) that combines with search

Scoring profiles to model search result relevance

Geo-spatial support integrated in filtering, sorting and ranking (such as finding all

restaurants within 5 KM of your current location)

Search Functionality

Page 86: NoSQL Database in Azure for IoT and Business

Redis is an open source, BSD licensed, networked, single-threaded, in-memory key-value cache and store.

Key-value cache and store (value can be a couple of things)

In-memory (persistence is optional)

Single-threaded (atomic operations & transactions)

Networked (it’s a server and it does master/slave)

Some other stuff (scripting, pub/sub, Sentinel, snapshot, …)

Caching: Redis

Page 87: NoSQL Database in Azure for IoT and Business

Conclusions

Page 88: NoSQL Database in Azure for IoT and Business

Pro:

partitioning, replication and scaling at its core

self contained documents

programmability in Javascript

SQL like “intradocument” queries

Cons:

No generic SQL queries

Works on its own in only a few scenarios

So DocumentDb…

Page 89: NoSQL Database in Azure for IoT and Business

Great storage opportunities in Azure

• Log

• Search

• Transient

• Files/Attachments

• SQL!

• And all new Data Analysis/Machine Learning opportunities

Other Not Only SQL alternatives

Page 90: NoSQL Database in Azure for IoT and Business

http://bit.do/documentdb-pricing

Capacity Units (CU)

Capacity

Throughput (in terms of rate of transactions / second)

• One CU provides 2,000 Request Units (RU) per second

• The cost of a “request” depends on the size of the document, e.g. when uploading 1,000 large JSON documents, each document might count as more than one request

Pricing

Page 91: NoSQL Database in Azure for IoT and Business

Standard pricing tier with hourly billing

1 hr from just $0.034!

Performance levels can be adjusted

Each collection = 10GB of SSD

Collection* perf is set by S1, S2, S3

Limit of 100 collections (1 TB)

Soft limit, can be lifted as needed per account

What does DocumentDB cost?

* collection != table of homogeneous entities

collection ~ a data partition

Page 92: NoSQL Database in Azure for IoT and Business

NoSQL in Azure for IoT

(and Business)

Marco Parenzan, Microsoft Azure MVP

@marco_parenzan

marco [dot] parenzan [at] 1nn0va [dot] it