Look Ma! No more blobs Aparna Chaudhary NoSQL matters, @Cologne Germany 2013


GridFS is a storage mechanism for persisting large binary data in MongoDB.


Page 1: Look Ma! No more blobs

Look Ma! No more blobs

Aparna Chaudhary

NoSQL matters, @Cologne Germany 2013

Page 2: Look Ma! No more blobs

EMBRACE POLYGLOT PERSISTENCE!

STOP RDBMS ABUSE!

KNOW YOUR USE CASE

Page 3: Look Ma! No more blobs

Use Case

We don't do rocket science...

Read XML → Parse → Extract → Store

RA:

Runtime support for document types

Metadata definition provided at runtime

Document type names - max 50 char

Look up content based on metadata

Page 4: Look Ma! No more blobs

Challenges

Storage of up to one million documents of 10KB to 2GB per document type per year

Write 1MB < x msec

Retrieve 1MB < y msec

...and details

RA

But…the Numbers make it interesting...

Page 5: Look Ma! No more blobs

How?

File System

MongoDB

RDBMS

JCR

Document Management

Page 6: Look Ma! No more blobs

File System

If you want to store files, it's logical to use the file system, ain't it?

✓ Ease of Use

✓ No special skill-set

✓ Backup and Recovery

✓ It’s free!

Page 7: Look Ma! No more blobs

How do I name them?

Support for metadata storage?

Performance with too many small files?

Query - Administration?

High Availability?

Limitation on total number of files?

Page 8: Look Ma! No more blobs

Relational database

Integrity

Consistency

Durability

Atomicity

Joins

Backups

High Availability

You name it, We have it!

RDBMS

Aggregations

Page 9: Look Ma! No more blobs

RDBMS Developer’s Perspective

Page 10: Look Ma! No more blobs

Challenge #1

RA: We need runtime support for document type.

Page 11: Look Ma! No more blobs

Challenge #1

DOC_1 DOC_2 DOC_3

DOC_4 DOC_5 DOC_6

Dynamic DDL Generation


Page 12: Look Ma! No more blobs

Challenge #1

DEV: String concatenations are ugly…

Page 13: Look Ma! No more blobs

Challenge #1

DEV: Let's build a utility.

Page 14: Look Ma! No more blobs

Challenge #1

More Work

Page 15: Look Ma! No more blobs

Challenge #2

RA: Document type name is up to 50 char long

Page 16: Look Ma! No more blobs

Challenge #2

TABLE NAME LIMITS

Wait… SQL-92 says 128 char?

We rule. Let's support only 30 char.

Page 17: Look Ma! No more blobs

Challenge #2

DOC_TYPE_MAPPING

DEV: Let's create a mapping table.

Page 18: Look Ma! No more blobs

Challenge #2

Ugly unreadable table names!


Page 19: Look Ma! No more blobs

So... finally...

Read XML → Document type defined?

No → Dynamic DDL generation → Document Type Alias

Yes → Extract Metadata → Store Metadata → Store Content

Simple use case becomes complex...
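The RDBMS flow above can be sketched roughly as follows. This is only an illustration, using Python's built-in sqlite3 as a stand-in database. The DOC_n aliases and the DOC_TYPE_MAPPING table name come from the slides; the column layout and the helper names `table_for` and `store_document` are assumptions, not the implementation used in the talk.

```python
import sqlite3

def table_for(conn, doc_type, fields):
    """Return the aliased table for doc_type, creating it on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS DOC_TYPE_MAPPING ("
        "alias TEXT PRIMARY KEY, doc_type TEXT UNIQUE)"
    )
    row = conn.execute(
        "SELECT alias FROM DOC_TYPE_MAPPING WHERE doc_type = ?", (doc_type,)
    ).fetchone()
    if row:
        return row[0]  # document type already defined
    # Not defined yet: pick a short alias because the real name may be too long.
    count = conn.execute("SELECT COUNT(*) FROM DOC_TYPE_MAPPING").fetchone()[0]
    alias = f"DOC_{count + 1}"
    # Dynamic DDL by string concatenation: identifiers cannot be bound as
    # parameters, so field names are assumed to be trusted here.
    column_defs = [f'"{name}" TEXT' for name in fields] + ["content BLOB"]
    ddl_columns = ", ".join(column_defs)
    conn.execute(f'CREATE TABLE "{alias}" ({ddl_columns})')
    conn.execute("INSERT INTO DOC_TYPE_MAPPING VALUES (?, ?)", (alias, doc_type))
    return alias

def store_document(conn, doc_type, metadata, content):
    """Store metadata and content; the schema is fixed by the first document."""
    names = sorted(metadata)
    alias = table_for(conn, doc_type, names)
    columns = ", ".join([f'"{n}"' for n in names] + ["content"])
    placeholders = ", ".join("?" * (len(names) + 1))
    values = [metadata[n] for n in names] + [content]
    conn.execute(f'INSERT INTO "{alias}" ({columns}) VALUES ({placeholders})', values)
    return alias
```

Even in this reduced form, the mapping lookup, alias generation and hand-built SQL strings all sit between reading the XML and storing it.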

Page 20: Look Ma! No more blobs

Remember... Our Challenge

QA: Let's see if we are in spec for response time.

DEV: Aah.. what about performance now?

Page 21: Look Ma! No more blobs

MongoDB

Document Based

GridFS

B-Tree

Dynamic Schema

JSON

BSON Query

Scalable

http://www.10gen.com/presentations/storage-engine-internals

Joins

Complex Transaction

Page 22: Look Ma! No more blobs

Concepts

[Figure: documents ID1 to ID5, each carrying a different set of fields (F1, F2, F3, ... Fx)]

[Figure: databases, each containing several collections]

Table = Collection

Column = Field

Row = Document

Database = Database

Page 23: Look Ma! No more blobs

GridFS

MongoDB divides the large content into chunks

Stores Metadata and Chunks separately

http://docs.mongodb.org/manual/core/gridfs/
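As a rough illustration of that split, the sketch below builds one metadata document and a list of chunk documents as plain Python dicts. It does not talk to MongoDB, and the function name `to_gridfs_documents` is hypothetical. The field names follow the GridFS files and chunks layout. A real application would use a driver's GridFS API instead.

```python
import hashlib
from datetime import datetime, timezone

CHUNK_SIZE = 256 * 1024  # 262144 bytes, the default chunk size at the time of the talk

def to_gridfs_documents(file_id, filename, data, metadata=None, chunk_size=CHUNK_SIZE):
    """Return (files_doc, chunk_docs) for the given bytes, GridFS-style."""
    files_doc = {
        "_id": file_id,
        "chunkSize": chunk_size,
        "length": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "filename": filename,
        "uploadDate": datetime.now(timezone.utc),
        "metadata": metadata or {},
    }
    # Each chunk points back to its file via files_id and is ordered by n.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs
```

Reading the file back is then a matter of fetching the chunks for one `files_id` in order of `n` and concatenating their `data`.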

Page 24: Look Ma! No more blobs

> mybucket.files

{ "_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),

"chunkSize" : NumberLong(262144),

"length" : NumberLong(103015),

"md5" : "34d29a163276accc7304bd69c5520e55",

"filename" : "health_record_2.xml",

"contentType" : "application/xml",

"uploadDate" : ISODate("2013-03-23T07:41:44.907Z"),

"aliases" : null,

"metadata" : { "fname" : "Aparna", "lname" : "Chaudhary","country" : "Netherlands" }

}

ObjectId - 12 Byte BSON:

4 Byte - Seconds since Epoch

3 Byte - Machine Id

2 Byte - Process Id

3 Byte - Counter
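To make the byte layout concrete, here is a small sketch that splits the `_id` from the files document above into those four parts. The function name `decode_object_id` is made up for illustration, and big-endian byte order for the process id is an assumption. Newer MongoDB versions replace the machine and process id with a random value, so this applies only to the layout described on the slide.

```python
from datetime import datetime, timezone

def decode_object_id(hex_id):
    """Split a 24-character hex ObjectId into the four parts listed on the slide."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")  # 4 bytes: seconds since epoch
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),                  # 3 bytes: machine id
        "pid": int.from_bytes(raw[7:9], "big"),     # 2 bytes: process id (byte order assumed)
        "counter": int.from_bytes(raw[9:12], "big"),  # 3 bytes: counter
    }

parts = decode_object_id("514d5cb8c2e6ea4329646a5c")
print(parts["timestamp"].isoformat())  # 2013-03-23T07:41:44+00:00
```

The decoded timestamp matches the document's `uploadDate` to the second.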

Page 25: Look Ma! No more blobs

> mybucket.chunks

{ "_id" : ObjectId("514d5cb8c2e6ea4329646a5d"), "files_id" : ObjectId("514d5cb8c2e6ea4329646a5c"),

"n" : 0,

"data" : BinData(0,...)

}

Page 26: Look Ma! No more blobs

I'm storing a 10KB file, but would it use 256KB on disk?

Chunk is as big as it needs to be...

Last Chunk = (FileSize % 256KB) + Metadata overhead (x)

1128KB file: 256 | 256 | 256 | 256 | 104 + x

10KB file: 10 + x
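The chunk arithmetic can be checked with a few lines of Python. This sketch works in whole KB, ignores the per-chunk metadata overhead (the "x" on the slide), and the function name `chunk_sizes_kb` is made up for illustration.

```python
CHUNK_KB = 256  # chunk size used in the slides

def chunk_sizes_kb(file_size_kb, chunk_kb=CHUNK_KB):
    """Return the payload size of each chunk, in KB, for a file of the given size."""
    full_chunks, remainder = divmod(file_size_kb, chunk_kb)
    sizes = [chunk_kb] * full_chunks
    if remainder:
        sizes.append(remainder)  # the last chunk only holds what is left
    return sizes

print(chunk_sizes_kb(1128))  # [256, 256, 256, 256, 104]
print(chunk_sizes_kb(10))    # [10]
```

A file whose size is an exact multiple of 256KB simply ends with a full chunk.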

Page 27: Look Ma! No more blobs

Challenge #1

RA: We need runtime support for document type.

DEV: MongoDB supports Dynamic Schema. You can use a collection per docType, and collections are created dynamically.

Page 28: Look Ma! No more blobs

Challenge #2

RA: Document type name is up to 50 char long

DEV: MongoDB namespace can be up to 123 char.
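A quick way to see how much room that leaves is sketched below. It assumes the document type is used directly as the GridFS bucket name, so each type gets a `<docType>.files` and a `<docType>.chunks` collection, and that the namespace is the database name plus the collection name. The 123 figure is the one quoted on the slide, and both helper names are hypothetical.

```python
MAX_NAMESPACE_CHARS = 123  # limit quoted on the slide

def gridfs_namespaces(database, doc_type):
    """Namespaces of the two GridFS collections for a bucket named after doc_type."""
    return [f"{database}.{doc_type}.{suffix}" for suffix in ("files", "chunks")]

def fits_namespace_limit(database, doc_type, limit=MAX_NAMESPACE_CHARS):
    """True if both namespaces stay within the limit (measured in UTF-8 bytes)."""
    return all(
        len(ns.encode("utf-8")) <= limit
        for ns in gridfs_namespaces(database, doc_type)
    )
```

With a short database name, a 50 character document type fits comfortably, so no alias or mapping table is needed.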

Page 29: Look Ma! No more blobs

So... finally...

Simple use case remains simple... well, becomes simpler...

Read XML → Extract Metadata → Store Metadata & Content

Page 30: Look Ma! No more blobs

Remember... Our Challenge

QA: Let's see if we are in spec for response time.

DEV: Performance test is part of our definition of 'DONE'

Page 31: Look Ma! No more blobs

Because seeing is believing!

Demo

‣ GridFS 2.4.0

‣ PostgreSQL 9.2

‣ Spring Data

‣ JMeter 2.7

‣ Mac OS X 10.8.3 2.3GHz Quad-Core Intel Core i7, 16GB RAM

https://github.com/aparnachaudhary/nosql-matters-demo

Page 32: Look Ma! No more blobs

EMBRACE POLYGLOT PERSISTENCE!

STOP RDBMS ABUSE!

KNOW YOUR USE CASE

@aparnachaudhary

Page 33: Look Ma! No more blobs

Java Developer, Data Lover

Eindhoven, Netherlands

http://blog.aparnachaudhary.com/

@aparnachaudhary

Thank You!