Rapid and Scalable Development with MongoDB, PyMongo, and Ming

1. Rapid and Scalable Development with MongoDB, PyMongo, and Ming

Rick Copeland @rick446 [email_address]

SourceForge and MongoDB

Get started with PyMongo

Sprinkle in some Ming schemas

ORM: When a dict just won't do

What we are learning

3. SourceForge and MongoDB

Tried CouchDB: liked the dev model, not so much the performance

Migrated consumer-facing pages (summary, browse, download) to MongoDB and it worked great (on MongoDB 0.8 no less!)

All our new stuff uses MongoDB (Allura, Zarkov, Ming, ...)

4. What is MongoDB?

MongoDB (from "humongous") is a scalable, high-performance, open source, document-oriented database.

Sharding, replication

20k inserts/s? No problem

Hierarchical JSON-like store, easy to develop apps

SourceForge. Yeah. We like FOSS

5. MongoDB to Relational Mental Mapping

Rows are flat, documents are nested

Typing: SQL is static, MongoDB is dynamic

Relational (SQL)    MongoDB
Database            Database
Table               Collection
Index               Index
Row                 Document
Column              Field
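To make the row-versus-document contrast concrete, here is a small sketch using plain Python data structures, with no database involved. The page and comment data, the field names, and the `comments_for` helper are all invented for illustration.

```python
# Relational style: flat rows in two "tables", linked by a foreign key.
page_rows = [
    {"id": 1, "title": "Cats"},
]
comment_rows = [
    {"id": 10, "page_id": 1, "text": "First!"},
    {"id": 11, "page_id": 1, "text": "Nice page"},
]


def comments_for(page_id):
    """Emulate a join: collect the comment rows that point at a page."""
    return [row["text"] for row in comment_rows if row["page_id"] == page_id]


# Document style: one nested document holds the page and its comments.
page_doc = {
    "_id": 1,
    "title": "Cats",
    "comments": [
        {"text": "First!"},
        {"text": "Nice page"},
    ],
}

relational_view = comments_for(1)
document_view = [c["text"] for c in page_doc["comments"]]
print(relational_view == document_view)
```

Both views carry the same information; the nested form simply avoids the lookup across two structures.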

7. PyMongo: Getting Started

>>> import pymongo
>>> conn = pymongo.Connection()
>>> conn
Connection('localhost', 27017)
>>> conn.test
Database(Connection('localhost', 27017), u'test')
>>> conn.test.foo
Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
>>> conn['test-db']
Database(Connection('localhost', 27017), u'test-db')
>>> conn['test-db']['foo-collection']
Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
>>> conn.test.foo.bar.baz
Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')
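The session above mixes attribute access (conn.test.foo) with item access (conn['test-db']). The following toy class is only a sketch of how that kind of interface can be built with Python's `__getattr__` and `__getitem__`; `ToyNode` is a hypothetical name and this is not PyMongo's implementation.

```python
class ToyNode(object):
    """Hypothetical stand-in that records the dotted name built so far."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # Called only for attributes that are not found normally,
        # so root.test.foo builds up "conn.test.foo".
        if attr.startswith("_"):
            raise AttributeError(attr)
        return ToyNode(self.name + "." + attr)

    def __getitem__(self, key):
        # Item access accepts any string, including names with hyphens,
        # which are not valid Python attribute names.
        return ToyNode(self.name + "." + key)

    def __repr__(self):
        return "ToyNode(%r)" % self.name


root = ToyNode("conn")
dotted = root.test.foo.bar.baz
hyphenated = root["test-db"]["foo-collection"]
print(dotted)
print(hyphenated)
```

A name such as test-db cannot be written as an attribute because Python would parse it as a subtraction, which is why the bracket form is needed.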

8. PyMongo: Insert / Update / Delete

>>> db = conn.test
>>> id = db.foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
>>> id
ObjectId('4e712e21eb033009fa000000')
>>> db.foo.find()
>>> list(db.foo.find())
[{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}]
>>> db.foo.update({'_id': id}, {'$set': {'bar': 2}})
>>> db.foo.find().next()
{u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}
>>> db.foo.remove({'_id': id})
>>> list(db.foo.find())
[]

9. PyMongo: Queries, Indexes

>>> db.foo.insert([dict(x=x) for x in range(10)])
[ObjectId('4e71313aeb033009fa00000b'), ...]
>>> list(db.foo.find({'x': {'$gt': 3}}))
[{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
 {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
 {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, ...]
>>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}))
[{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8},
 {u'x': 9}]
>>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0})
...      .skip(1).limit(2))
[{u'x': 5}, {u'x': 6}]
>>> db.foo.ensure_index([
...     ('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
u'x_1_y_-1'
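The query session above combines a filter, a field exclusion, and skip/limit. As a rough mental model, the same steps can be emulated over a list of dicts in plain Python. This is only a sketch: `matches` and `toy_find` are hypothetical helpers, they cover just four comparison operators, and they say nothing about how the server actually evaluates queries.

```python
import operator

# Small subset of comparison operators, mapped to Python equivalents.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, spec):
    """Return True if doc satisfies every condition in spec."""
    for field, condition in spec.items():
        if field not in doc:
            return False
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if not OPERATORS[op](doc[field], operand):
                    return False
        elif doc[field] != condition:
            return False
    return True


def toy_find(docs, spec, exclude=(), skip=0, limit=None):
    """Filter, drop excluded fields, then apply skip and limit, in that order."""
    selected = [d for d in docs if matches(d, spec)]
    projected = [
        {k: v for k, v in d.items() if k not in exclude} for d in selected
    ]
    end = None if limit is None else skip + limit
    return projected[skip:end]


docs = [{"_id": i, "x": i} for i in range(10)]
result = toy_find(docs, {"x": {"$gt": 3}}, exclude=("_id",), skip=1, limit=2)
print(result)  # [{'x': 5}, {'x': 6}]
```

The result matches the skip(1).limit(2) output shown in the session: six documents pass the filter, the first is skipped, and the next two are returned.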

10. PyMongo: Aggregation et al.

You gotta write JavaScript (for now)

It's pretty slow (single-threaded JS engine)

JavaScript is used by:

$where in a query

.group(key, condition, initial, reduce, finalize=None)

.map_reduce(map, reduce, out, finalize=None, ...)

If you shard, you can get some parallelism across multiple mongod instances with .map_reduce() (and possibly $where). Otherwise you're single-threaded.
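For readers new to the map/reduce model, the data flow can be sketched in plain Python. In MongoDB the map and reduce functions are written in JavaScript and run on the server; the helper names below (`toy_map_reduce`, `map_parity`, `reduce_sum`) are hypothetical and only illustrate the emit, group, and reduce steps.

```python
from collections import defaultdict


def toy_map_reduce(docs, map_fn, reduce_fn):
    """Group emitted (key, value) pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_parity(doc):
    # Emit one (key, value) pair per document: count documents by parity of x.
    yield ("even" if doc["x"] % 2 == 0 else "odd", 1)


def reduce_sum(key, values):
    # Combine all values emitted for one key into a single result.
    return sum(values)


docs = [{"x": x} for x in range(10)]
counts = toy_map_reduce(docs, map_parity, reduce_sum)
print(counts)
```

Because the map step looks at each document independently, it is the part that can be spread across several servers when the data is sharded.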

11. PyMongo: GridFS

>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> with fs.new_file() as fp:
...     fp.write('The file')
...
>>> fp
>>> fp._id
ObjectId('4e727f64eb03300c0b000003')
>>> fs.get(fp._id).read()
'The file'

Arbitrary data can be attached to the fp object; it's just a document

Mime type

Filename

12. PyMongo: GridFS Versioning

>>> file_id = fs.put('Moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Moar data!'
>>> file_id = fs.put('Even moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Even moar data!'
>>> fs.get_version('foo.txt', -2).read()
'Moar data!'
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[]
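The versioning behaviour in the session above (several uploads under one filename, negative version numbers counting back from the newest, the filename disappearing only after every version is deleted) can be modelled with a small in-memory class. This is a toy sketch under those assumptions; `ToyVersionedStore` is a hypothetical name and it does not reflect how GridFS stores data.

```python
class ToyVersionedStore(object):
    """Hypothetical in-memory model of filename-based versioning."""

    def __init__(self):
        self._files = []      # every upload, oldest first
        self._next_id = 0

    def put(self, data, filename):
        file_id = self._next_id
        self._next_id += 1
        self._files.append({"_id": file_id, "filename": filename, "data": data})
        return file_id

    def _versions(self, filename):
        return [f for f in self._files if f["filename"] == filename]

    def get_version(self, filename, version=-1):
        # Negative numbers count back from the newest upload, like list indexes.
        return self._versions(filename)[version]

    def delete(self, file_id):
        self._files = [f for f in self._files if f["_id"] != file_id]

    def list(self):
        # Distinct filenames that still have at least one version.
        return sorted(set(f["filename"] for f in self._files))


store = ToyVersionedStore()
store.put("Moar data!", "foo.txt")
store.put("Even moar data!", "foo.txt")
newest = store.get_version("foo.txt")["data"]
previous = store.get_version("foo.txt", -2)["data"]

store.delete(store.get_version("foo.txt")["_id"])
after_one_delete = store.list()
store.delete(store.get_version("foo.txt")["_id"])
after_two_deletes = store.list()
print(newest, previous, after_one_delete, after_two_deletes)
```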

14. Why Ming?

Your data has a schema

Your database can define and enforce it

It can live in your application (as with MongoDB)

Nice to have the schema defined in one place in the code

Sometimes you need a migration

Changing the structure/meaning of fields

Adding indexes, particularly unique indexes

Sometimes lazy, sometimes eager

Unit of work: Queuing up all your updates can be handy

Python dicts are nice; objects are nicer

15. Ming: Engines & Sessions

>>> import ming.datastore
>>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
>>> ds.db
Database(Connection('localhost', 27017), u'test')
>>> session = ming.Session(ds)
>>> session.db
Database(Connection('localhost', 27017), u'test')
>>> ming.configure(**{'ming.main.master': 'mongodb://localhost:27017',
...                   'ming.main.database': 'test'})
>>> ming.Session.by_name('main').db
Database(Connection(u'localhost', 27017), u'test')

16. Ming: Define Your Schema

from ming import collection, schema, Field

WikiDoc = collection('wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))

CommentDoc = collection('comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))

17. Ming: Define Your Schema (once more, with feeling)

from ming import Document, Session, Field

class WikiDoc(Document):

    class __mongometa__:
        session = Session.by_name('main')
        name = 'wiki_page'
        indexes = ['title']

    title = Field(str)
    text = Field(str)

The old declarative syntax continues to exist and be supported, but it's not being actively improved

18. Ming: Use Your Schema

>>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
>>> doc.m.save()
>>> WikiDoc.m.find()
>>> WikiDoc.m.find().all()
[{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
>>> WikiDoc.m.find().one().text
u'I can haz cheezburger?'
>>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
>>> doc.m.save()
Traceback (most recent call last):
  ...
ming.schema.Invalid: ...: Extra keys: set(['tietul'])
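The rejected save above is the schema doing its job: a misspelled key is caught before it reaches the database. The idea can be sketched in a few lines of plain Python. `ToyInvalid`, `validate`, and `WIKI_FIELDS` are hypothetical names, and this is only an illustration of application-side validation, not Ming's validation code.

```python
class ToyInvalid(Exception):
    """Raised when a document does not match the declared schema."""


def validate(doc, fields):
    """Check doc against fields, a mapping of field name to expected type."""
    extra = set(doc) - set(fields)
    if extra:
        raise ToyInvalid("Extra keys: %s" % sorted(extra))
    for name, expected_type in fields.items():
        if name in doc and not isinstance(doc[name], expected_type):
            raise ToyInvalid("%s: expected %s" % (name, expected_type.__name__))
    return doc


WIKI_FIELDS = {"title": str, "text": str}

# A well-formed document passes through unchanged.
ok = validate({"title": "Cats", "text": "I can haz cheezburger?"}, WIKI_FIELDS)

# A misspelled key is rejected before anything would be written.
try:
    validate({"tietul": "LOL", "text": "Invisible bicycle"}, WIKI_FIELDS)
    error = None
except ToyInvalid as exc:
    error = str(exc)
print(error)
```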

19. Ming: Adding Your own Types

Not usually necessary, built-in SchemaItems provide BSON types, default values, etc.

class ForceInt(ming.schema.FancySchemaItem):
    def _validate(self, value):
        try:
            return int(value)
        except (TypeError, ValueError):
            raise ming.schema.Invalid('Bad value %s' % value, value, None)

20. Ming Bonus: Mongo-in-Memory

>>> ming.datastore.DataStore('mim://', database='test').db
mim.Database(test)

MongoDB is (generally) fast

except when creating databases

particularly when you preallocate

Unit tests like things to be isolated

MIM gives you isolation at the expense of speed & scaling

22. Ming ORM: Classes and Collections

from ming import collection, schema, Field
from ming.orm import (mapper, Mapper, RelationProperty, ForeignIdProperty)

WikiDoc = collection('wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))

CommentDoc = collection('comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))

class WikiPage(object): pass
class Comment(object): pass

ormsession.mapper(WikiPage, WikiDoc, properties=dict(
    comments=RelationProperty('Comment')))
ormsession.mapper(Comment, CommentDoc, properties=dict(
    page_id=ForeignIdProperty('WikiPage'),
    page=RelationProperty('WikiPage')))

Mapper.compile_all()

23. Ming ORM: Classes and Collections (declarative)

from ming import schema as S
from ming.orm import (MappedClass, FieldProperty, ForeignIdProperty,
                      RelationProperty)

class WikiPage(MappedClass):

    class __mongometa__:
        session = main_orm_session
        name = 'wiki_page'
        indexes = ['title']

    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)

class CommentDoc(MappedClass):

    class __mongometa__:
        session = main_orm_session
        name = 'comment'
        indexes = ['page_id']

    _id = FieldProperty(S.ObjectId)
    page_id = ForeignIdProperty(WikiPage)
    page = RelationProperty(WikiPage)
    text = FieldProperty(str)

24. Ming ORM: Sessions and Queries

Session → ORMSession

My_collection.m → My_mapped_class.query

ORMSession actually does stuff

Track object identity

Track object modifications

Unit of work: flushing all changes at once
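The unit-of-work idea in the list above (queue changes in memory, then write them all in one flush) can be sketched without any database. `ToyUnitOfWork` is a hypothetical class and a plain dict stands in for the collection; it is not how ORMSession is implemented.

```python
class ToyUnitOfWork(object):
    """Hypothetical session: queue changes in memory, write them on flush()."""

    def __init__(self, storage):
        self.storage = storage   # dict standing in for a collection, keyed by _id
        self._pending = {}       # _id -> document waiting to be written

    def save(self, doc):
        # Nothing reaches storage yet; the change is only queued.
        self._pending[doc["_id"]] = doc

    def flush(self):
        # Write every queued change in one pass, then clear the queue.
        written = len(self._pending)
        self.storage.update(self._pending)
        self._pending.clear()
        return written


storage = {}
uow = ToyUnitOfWork(storage)
uow.save({"_id": 1, "title": "MyPage", "text": "is here"})
count_before = len(storage)   # still empty: nothing flushed yet
flushed = uow.flush()
count_after = len(storage)    # the queued document is now stored
print(count_before, flushed, count_after)
```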

>>> pg = WikiPage(title='MyPage', text='is here')
>>> session.db.wiki_page.count()
0
>>> main_orm_session.flush()
>>> session.db.wiki_page.count()
1

25. Ming ORM: Extending the Session

Various plug points in the session

before_flush

after_flush

Some uses

Logging changes to sensitive data or for analytics purposes

Full-text search indexing

"Last modified" fields
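As an illustration of how before_flush and after_flush plug points could serve the uses listed above, here is a toy session with overridable hooks. `ToySession` and `AuditedSession` are hypothetical classes written for this sketch; the hook signatures are assumptions and do not describe Ming's actual extension API.

```python
import time


class ToySession(object):
    """Hypothetical session exposing before_flush / after_flush plug points."""

    def __init__(self):
        self.pending = []   # documents queued for writing
        self.stored = []    # stands in for the database

    def before_flush(self):
        """Hook: runs before pending documents are written."""

    def after_flush(self, flushed):
        """Hook: runs after the write, with the documents that were written."""

    def flush(self):
        self.before_flush()
        flushed, self.pending = self.pending, []
        self.stored.extend(flushed)
        self.after_flush(flushed)


class AuditedSession(ToySession):
    def __init__(self):
        super(AuditedSession, self).__init__()
        self.log = []

    def before_flush(self):
        # Stamp a "last modified" field on everything about to be written.
        now = time.time()
        for doc in self.pending:
            doc["last_modified"] = now

    def after_flush(self, flushed):
        # Record what was written, e.g. for auditing or analytics.
        for doc in flushed:
            self.log.append("flushed %s" % doc["title"])


session = AuditedSession()
session.pending.append({"title": "MyPage"})
session.flush()
print(session.log)
```

The same after_flush hook is where a full-text search indexer could be notified about the documents that changed.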


27. Tips From the Trenches

Watch your document size

Choose your indexes well

Watch your server log; bad queries show up there

Don't go crazy with denormalization

Try to use an index if all you need is a backref

Stale data is a tricky problem

Try to stay with one database

Watch the # of queries

Drop to lower levels (ORM → document → pymongo) when performance is an issue

28. Future Work

Performance

Analytics in MongoDB: Zarkov

Web framework integration

Magic Columns (?)

29. Related Projects

Ming: http://sf.net/projects/merciless/ (MIT License)

Zarkov: http://sf.net/p/zarkov/ (Apache License)

Allura: http://sf.net/p/allura/ (Apache License)

PyMongo: http://api.mongodb.org/python (Apache License)

30. Rick Copeland @rick446 [email_address]
