Advanced Schema Design Patterns

FEBRUARY 15, 2018 | BELL HARBOR

#MDBlocal

{
  "name": "Daniel Coupal",
  "jobs_at_MongoDB": [
    { "job": "Senior Curriculum Engineer", "from": new Date("2016-11") },
    { "job": "Senior Technical Service Engineer", "from": new Date("2013-11") }
  ],
  "previous_jobs": [
    "Consultant",
    "Developer",
    "Manager Quality & Tools Team",
    "Manager Software Team",
    "Tools Developer"
  ],
  "likes": [ "food", "beers", "movies", "MongoDB" ],
  "email": "daniel.coupal@mongodb.com"
}

Who Am I?

The "Gang of Four":

A design pattern systematically names, explains, and evaluates an important and recurring design in object-oriented systems

MongoDB systems can also be built using its own patterns

Pattern

• 10 years with the document model

• Use of a common methodology and vocabulary when designing schemas for MongoDB

• Ability to model schemas using building blocks

• Less art and more methodology

Why this Talk?

Ensure:

• Good performance

• Scalability

despite constraints

• Hardware

  • RAM faster than Disk

  • Disk cheaper than RAM

  • Network latency

  • Reduce costs $$$

• Database Server

  • Maximum size for a document

  • Atomicity of a write

• Data set

  • Size of data

Why do we Create Models?

However, Don't Over-Design!

WMDB - World Movie Database

Any events, characters and entities depicted in this presentation are fictional. Any resemblance or similarity to reality is entirely coincidental.

WMDB - World Movie Database

First iteration, 3 collections:

A. movies

B. moviegoers

C. screenings

Our mission, should we decide to accept it, is to fix this solution, so it can perform well and scale.

As always, should I or anyone in the audience do it without training, WMDB will disavow any knowledge of our actions.

This tape will self-destruct in five seconds. Good luck!

Mission Possible

Patterns by Category

• Frequency of Access

  • Subset ✔️

  • Approximation ✔️

  • Extended Reference

• Grouping

  • Computed ✔️

  • Bucket

  • Outlier

• Representation

  • Attribute ✔️

  • Schema Versioning ✔️

  • Document Versioning

  • Tree

  • Polymorphism

  • Pre-Allocation

{
  title: "Dunkirk",
  ...
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
}

Would need the following indexes:

{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_San_Jose: 1 }
...

Issue #1: Big Documents, Many Fields and Many Indexes

Pattern #1: Attribute

{
  title: "Dunkirk",
  ...
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
}

Problem:

• Lots of similar fields

• Common characteristic to search across those fields together

• Fields present in only a small subset of documents

Use cases:

• Product attributes like ‘color’, ‘size’, ‘dimensions’, ...

• Release dates of a movie in different countries, festivals

Attribute Pattern

Solution:

• Field pairs in an array

Benefits:

• Allows for a non-deterministic list of attributes

• Easy to index: { "releases.location": 1, "releases.date": 1 }

• Easy to extend with a qualifier, for example: { descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }

Attribute Pattern - Solution
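
A minimal sketch of the reshaping this pattern implies, written as plain JavaScript on an in-memory document. The array name `releases` and the keys `location` and `date` are taken from the index shown on the slide; the helper name `toAttributePattern` is hypothetical and not part of the talk or of any MongoDB API.

```javascript
// Hypothetical helper (not from the talk): gather every "release_<location>"
// field into one "releases" array of { location, date } pairs.
function toAttributePattern(movie) {
  const prefix = "release_";
  const reshaped = {};
  const releases = [];
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith(prefix)) {
      // The part after the prefix becomes the attribute's "location".
      releases.push({ location: key.slice(prefix.length), date: value });
    } else {
      // Every other field is copied unchanged.
      reshaped[key] = value;
    }
  }
  reshaped.releases = releases;
  return reshaped;
}

const dunkirk = {
  title: "Dunkirk",
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22",
};

const reshapedDunkirk = toAttributePattern(dunkirk);
console.log(JSON.stringify(reshapedDunkirk, null, 2));
```

With this shape, the single compound index { "releases.location": 1, "releases.date": 1 } covers every location, and a new country or festival adds an array element instead of a new field and a new index.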

Possible solutions:

A. Reduce the size of your working set

B. Add more RAM per machine

C. Start sharding or add more shards

Issue #2: Working Set doesn’t fit in RAM

In this example, we can:

• Limit the list of actors and crew to 20

• Limit the embedded reviews to the top 20

• …

Pattern #2: Subset

Problem:

• There is a 1-N or N-N relationship, and only a few of the related documents need to be shown every time

• Only infrequently do you need to pull all of the dependent documents

Use cases:

• Main actors of a movie

• List of reviews or comments

Subset Pattern

Solution:

• Keep duplicates of a small subset of fields in the main collection

Benefits:

• Allows for fast data retrieval and a reduced working set size

• One query brings all the information needed for the "main page"

Subset Pattern - Solution
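
One way to picture the split, as a sketch on in-memory objects and not a definitive implementation. The limit of 20 comes from the earlier slide; the helper name `applySubsetPattern`, the fields `top_reviews`, `movie_id` and `rating`, and the choice to rank reviews by rating are all assumptions made for illustration.

```javascript
// Hypothetical helper: split a movie and all of its reviews into
// (a) a "main page" document that embeds only the top `limit` reviews, and
// (b) the full list of review documents destined for a separate collection.
function applySubsetPattern(movie, allReviews, limit = 20) {
  // Assumption: "top" means highest rating first. Copy before sorting
  // so the caller's array is left untouched.
  const byRating = [...allReviews].sort((a, b) => b.rating - a.rating);
  const mainDocument = { ...movie, top_reviews: byRating.slice(0, limit) };
  const reviewDocuments = allReviews.map((review) => ({
    movie_id: movie._id,
    ...review,
  }));
  return { mainDocument, reviewDocuments };
}

// Sample data: 25 reviews with ratings cycling from 1 to 5.
const sampleReviews = Array.from({ length: 25 }, (_, i) => ({
  reviewer: "user" + i,
  rating: (i % 5) + 1,
}));

const { mainDocument, reviewDocuments } = applySubsetPattern(
  { _id: 1, title: "Dunkirk" },
  sampleReviews
);
console.log(mainDocument.top_reviews.length, reviewDocuments.length);
```

The embedded reviews are duplicates of documents that also live in the full collection, so the main page needs one read while the rarely used full list stays out of the working set.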

Question:

• Which new MongoDB 3.6 feature will allow me to notify an application if the name of an actor is changed?

Quiz A

Subset Pattern

• CPU is on fire!

Issue #3: Lots of CPU Usage

{
  title: "The Shape of Water",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}

Issue #3: … caused by repeated calculations

For example:

• Apply a sum, count, ...

• Roll up data by minute, hour, day

• As long as you don’t mess with your source, you can recreate the rollups

Pattern #3: Computed

Problem:

• There is data that needs to be computed

• The same calculations would happen over and over

• Reads outnumber writes:

  • Example: 1K writes per hour vs 1M reads per hour

Use cases:

• Have revenues per movie showing, want to display sums

• Time series data, Event Sourcing

Computed Pattern

Solution:

• Apply a computation or operation on data and store the result

Benefits:

• Avoid re-computing the same thing over and over

Computed Pattern - Solution
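
The idea can be sketched with in-memory objects: totals are updated once per write, and reads use the stored values. The field names `viewings`, `viewers` and `revenues` come from the earlier slide; the function names and the shape of a screening are assumptions. Against a real database the same step would typically be an update using the `$inc` operator.

```javascript
// In-memory stand-in for a movie document that stores its own totals.
const movieTotals = {
  title: "The Shape of Water",
  viewings: 0,
  viewers: 0,
  revenues: 0,
};

// Hypothetical write path: update the stored totals once per screening,
// so reads do not have to sum every screening again.
function recordScreening(movieDoc, screening) {
  movieDoc.viewings += 1;
  movieDoc.viewers += screening.viewers;
  movieDoc.revenues += screening.revenues;
}

// Hypothetical rebuild: because the source screenings are kept,
// the rollup can be recreated from scratch at any time.
function recomputeTotals(movieDoc, screenings) {
  movieDoc.viewings = screenings.length;
  movieDoc.viewers = screenings.reduce((sum, s) => sum + s.viewers, 0);
  movieDoc.revenues = screenings.reduce((sum, s) => sum + s.revenues, 0);
}

const sampleScreenings = [
  { viewers: 100, revenues: 1200 },
  { viewers: 80, revenues: 960 },
  { viewers: 50, revenues: 600 },
];

for (const screening of sampleScreenings) {
  recordScreening(movieTotals, screening);
}
console.log(movieTotals);
```

With 1K writes and 1M reads per hour, as in the example above, the sum is maintained about a thousand times per hour where it would otherwise be recalculated about a million times.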

Question:

• Which Relational Database feature is typically used to mimic the computed pattern?

Quiz B

Computed Pattern

Issue #4: Lots of Writes

Issue #4: … for non-critical data

• Only increment once in X iterations

• Increment by X

Pattern #4: Approximation

Problem:

• Data is difficult to calculate correctly

• May be too expensive to update the document every time to keep an exact count

• No one gives a damn if the number is exact

Use cases:

• Population of a country

• Web site visits

Approximation Pattern

Solution:

• Fewer, stronger writes

Benefits:

• Fewer writes, reducing contention on some documents

Approximation Pattern – Solution
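
A sketch of "only increment once in X iterations, increment by X", assuming a randomized variant: each visit triggers a write with probability 1/X, and that write adds X. The names `makeApproximateCounter`, `recordVisit` and `visits` are hypothetical, and the stored document is simulated by an in-memory object.

```javascript
// Hypothetical approximate counter: on average one write per `x` visits,
// each write adding `x`, so the stored count stays close to the true count.
// `random` is injectable so the behavior can be checked deterministically.
function makeApproximateCounter(x, random = Math.random) {
  const doc = { visits: 0 }; // stand-in for the stored document
  let writes = 0;            // number of writes actually performed
  return {
    recordVisit() {
      if (random() < 1 / x) {
        doc.visits += x;
        writes += 1;
      }
    },
    get visits() {
      return doc.visits;
    },
    get writes() {
      return writes;
    },
  };
}

const siteCounter = makeApproximateCounter(10);
for (let i = 0; i < 100000; i++) {
  siteCounter.recordVisit();
}
// The stored value is approximate; only about one visit in ten caused a write.
console.log(siteCounter.visits, siteCounter.writes);
```

A deterministic alternative is to keep a local counter in the application and write every Xth visit; the randomized form shown here needs no shared state between application servers.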

• Keeping track of the schema version of a document

Issue #5: Need to change the list of fields in the documents

Add a field to track the schema version number, per document

Does not have to exist for version 1

Pattern #5: Schema Versioning

Problem:

• Updating the schema of a database is:

  • Not atomic

  • A long operation

• You may not want to update all documents at once, only when each document is next updated

Use cases:

• Practically any database that will go to production

Schema Versioning Pattern

Solution:

• Have a field keeping track of the schema version

Benefits:

• Don't need to update all the documents at once

• May not have to update documents until their next modification

Schema Versioning Pattern – Solution
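
A sketch of how an application might upgrade documents lazily, one at a time, when it next touches them. The field name `schema_version` and the two schema shapes are assumptions chosen for illustration (version 1 reuses the `release_USA` field from the earlier example); a missing field is treated as version 1, as the slide allows.

```javascript
// Hypothetical upgrade step, applied when a document is read or updated.
// Version 1: no "schema_version" field, one "release_USA" field.
// Version 2: "schema_version: 2" and a "releases" array.
function upgradeToLatest(doc) {
  // A missing version field means the document predates versioning.
  const version = doc.schema_version === undefined ? 1 : doc.schema_version;
  if (version >= 2) {
    return doc; // already current, nothing to do
  }
  const { release_USA, ...rest } = doc;
  return {
    ...rest,
    schema_version: 2,
    releases: release_USA ? [{ location: "USA", date: release_USA }] : [],
  };
}

const oldDocument = { title: "Dunkirk", release_USA: "2017/07/23" };
const currentDocument = {
  title: "The Shape of Water",
  schema_version: 2,
  releases: [],
};

const upgradedDocument = upgradeToLatest(oldDocument);
const untouchedDocument = upgradeToLatest(currentDocument);
console.log(upgradedDocument, untouchedDocument);
```

Because both shapes can coexist, the collection does not need a single long migration; each document is rewritten whenever the application happens to modify it.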

Back to reality

• How duplication is handled

  A. Update both source and target in real time

  B. Update target from source at regular intervals. Examples:

    • Most popular items => update nightly

    • Revenues from a movie => update every hour

    • Last 10 reviews => update hourly? daily?

Aspect of Patterns: Consistency
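
Option B can be sketched as a batch job that rebuilds the duplicated value in the target from the source. This example assumes in-memory arrays standing in for the `movies` and `screenings` collections; the function name `refreshRevenues` and the fields `_id`, `movie_id` and `revenues` are hypothetical.

```javascript
// Hypothetical batch job: recompute each movie's duplicated "revenues"
// value (the target) from the screenings (the source).
function refreshRevenues(movies, screenings) {
  const totals = new Map();
  for (const screening of screenings) {
    const current = totals.get(screening.movie_id) || 0;
    totals.set(screening.movie_id, current + screening.revenues);
  }
  for (const movie of movies) {
    movie.revenues = totals.get(movie._id) || 0;
  }
}

const movieList = [
  { _id: 1, title: "Dunkirk", revenues: 0 },
  { _id: 2, title: "The Shape of Water", revenues: 0 },
];
const screeningList = [
  { movie_id: 1, revenues: 500 },
  { movie_id: 1, revenues: 700 },
  { movie_id: 2, revenues: 300 },
];

// A scheduler (assumed here, for example a cron job) would run this hourly.
refreshRevenues(movieList, screeningList);
console.log(movieList);
```

Between two runs the target can be stale by up to one interval, which is the trade-off accepted in exchange for not updating both copies on every write.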

What our Patterns did for us

Problem                          Pattern
Messy and Large Documents        Attribute
Too much RAM                     Subset
Too much CPU                     Computed
Too many disk accesses           Approximation
No downtime to upgrade schema    Schema Versioning

• Bucket

  • Grouping documents together, to have fewer documents

• Document Versioning

  • Tracking of content changes in a document

• Outlier

  • Avoid letting a few documents drive the design and impact performance for all

• Extended Reference

• Tree(s)

• Polymorphism

• Pre-allocation

Other Patterns

A. Simple grouping from tables to collections is not optimal

B. Learn a common vocabulary for designing schemas with MongoDB

C. Use patterns as "plug-and-play" to improve performance

Takeaways

A full design example for a given problem:

• E-commerce site

• Content Management System

• Social Networking

• Single view

• …

References for complete Solutions

• More patterns in a follow-up to this presentation

• MongoDB in-person training courses on Schema Design

• Upcoming online course at MongoDB University:

  • https://university.mongodb.com

  • Data Modeling

How Can I Learn More About Schema Design?

Question:

• Which Pattern is used in the following document?

{
  "name": "Daniel Coupal",
  "jobs_at_MongoDB": [
    { "job": "Senior Curriculum Engineer", "from": new Date("2016-11") },
    { "job": "Senior Technical Service Engineer", "from": new Date("2013-11") }
  ],
  "previous_jobs": [
    "Consultant",
    "Developer",
    "Manager Quality & Tools Team",
    "Manager Software Team",
    "Tools Developer"
  ],
  "likes": [ "food", "beers", "movies", "MongoDB" ],
  "email": "daniel.coupal@mongodb.com"
}

Quiz C

Which Pattern is used

Thank You for Using MongoDB!