Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rick Copeland (@rick446), [email protected]
© 2011 Geeknet Inc


Getting Acquainted

http://www.flickr.com/photos/fazen/9079179/


- Get started with PyMongo

- Sprinkle in some Ming schemas

- ORM: When a dict just won’t do


PyMongo: Getting Started

>>> import pymongo
>>> conn = pymongo.Connection()
>>> conn
Connection('localhost', 27017)
>>> conn.test
Database(Connection('localhost', 27017), u'test')
>>> conn.test.foo
Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
>>> conn['test-db']
Database(Connection('localhost', 27017), u'test-db')
>>> conn['test-db']['foo-collection']
Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
>>> conn.test.foo.bar.baz
Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')
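The session above mixes attribute access and item access, and chained attributes end up as one dotted collection name. A rough way to picture that behaviour, as a toy model only: ToyDatabase and ToyCollection are made-up classes for illustration, not PyMongo's implementation, and they need no server.

```python
class ToyCollection:
    """Toy stand-in for a collection handle; it only tracks its name."""

    def __init__(self, database, name):
        self.database = database
        self.name = name

    def __getitem__(self, name):
        # A "sub-collection" is just a longer, dotted collection name.
        return ToyCollection(self.database, self.name + '.' + name)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith('_'):
            raise AttributeError(name)
        return self[name]


class ToyDatabase:
    """Toy stand-in for a database handle."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, name):
        return ToyCollection(self, name)

    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)
        return self[name]


db = ToyDatabase('test')
dotted = db.foo.bar.baz.name
hyphenated = ToyDatabase('test-db')['foo-collection'].name
```

Item access is the only option for names such as 'test-db', because `conn.test-db` would parse as a subtraction in Python.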


PyMongo: Insert / Update / Delete

>>> db = conn.test
>>> id = db.foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
>>> id
ObjectId('4e712e21eb033009fa000000')
>>> db.foo.find()
<pymongo.cursor.Cursor object at 0x29c7d50>
>>> list(db.foo.find())
[{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}]
>>> db.foo.update({'_id': id}, {'$set': {'bar': 2}})
>>> db.foo.find().next()
{u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}
>>> db.foo.remove({'_id': id})
>>> list(db.foo.find())
[]


PyMongo: Queries, Indexes

>>> db.foo.insert([dict(x=x) for x in range(10)])
[ObjectId('4e71313aeb033009fa00000b'), … ]
>>> list(db.foo.find({'x': {'$gt': 3}}))
[{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
 {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
 {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, …]
>>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}))
[{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8}, {u'x': 9}]
>>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}).skip(1).limit(2))
[{u'x': 5}, {u'x': 6}]
>>> db.foo.ensure_index([('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
u'x_1_y_-1'
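To make the semantics of the query slide concrete without a running server, here is a pure-Python sketch of what the filter, the `_id` exclusion, skip/limit, and the index name amount to. The helpers `matches`, `find`, and `index_name` are illustrative names, not PyMongo functions, and they cover only a tiny subset of the query language (top-level equality and four comparison operators).

```python
import operator

ASCENDING, DESCENDING = 1, -1   # the values behind pymongo.ASCENDING / DESCENDING

OPERATORS = {
    '$gt': operator.gt,
    '$gte': operator.ge,
    '$lt': operator.lt,
    '$lte': operator.le,
}


def matches(doc, spec):
    """True if doc satisfies spec (equality or comparison on top-level fields)."""
    for field, cond in spec.items():
        if field not in doc:
            return False
        if isinstance(cond, dict):
            if not all(OPERATORS[op](doc[field], arg) for op, arg in cond.items()):
                return False
        elif doc[field] != cond:
            return False
    return True


def find(docs, spec, exclude=(), skip=0, limit=None):
    """Filter, drop the excluded fields, then apply skip and limit."""
    hits = [{k: v for k, v in d.items() if k not in exclude}
            for d in docs if matches(d, spec)]
    end = None if limit is None else skip + limit
    return hits[skip:end]


def index_name(keys):
    """Build a name like the one ensure_index() returned on the slide."""
    return '_'.join('%s_%s' % pair for pair in keys)


docs = [{'_id': n, 'x': n} for n in range(10)]
page = find(docs, {'x': {'$gt': 3}}, exclude=('_id',), skip=1, limit=2)
name = index_name([('x', ASCENDING), ('y', DESCENDING)])
```

The real server does this work using indexes where it can; the sketch only mirrors the results shown in the session.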


PyMongo and Locking

One Rule (for now): Avoid JavaScript

http://www.flickr.com/photos/lizjones/295567490/


PyMongo: Aggregation et al.

- You gotta write JavaScript (for now)
- It’s pretty slow (single-threaded JS engine)
- JavaScript is used by
  - $where in a query
  - .group(key, condition, initial, reduce, finalize=None)
  - .map_reduce(map, reduce, out, finalize=None, …)
- If you shard, you can get some parallelism across multiple mongod instances with .map_reduce() (and possibly ‘$where’). Otherwise you’re single-threaded.
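For readers unfamiliar with .group(), the following is a pure-Python model of what its key, condition, initial, and reduce arguments compute. It is a sketch under assumptions: in the real call the condition is a query document and reduce is a JavaScript function run by the server, whereas here both are Python callables, and the sample documents are invented.

```python
import copy


def group(docs, key, condition, initial, reduce):
    """Model of .group(): one accumulator per distinct key value."""
    buckets = {}
    for doc in docs:
        if not condition(doc):
            continue
        k = tuple(doc.get(name) for name in key)
        if k not in buckets:
            acc = copy.deepcopy(initial)     # each group starts from `initial`
            acc.update(zip(key, k))          # the key fields appear in the result
            buckets[k] = acc
        reduce(doc, buckets[k])              # reduce mutates the accumulator
    return list(buckets.values())


def add_x(doc, acc):
    acc['total'] += doc['x']
    acc['count'] += 1


docs = [{'x': x, 'parity': x % 2} for x in range(10)]
result = group(docs,
               key=['parity'],
               condition=lambda d: d['x'] > 3,
               initial={'total': 0, 'count': 0},
               reduce=add_x)
```

Running the same loop client-side in Python is one way to stay clear of the JavaScript engine when the result set is small; whether that trade is worthwhile depends on data volume.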


PyMongo: GridFS

>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> with fs.new_file() as fp:
...     fp.write('The file')
...
>>> fp
<gridfs.grid_file.GridIn object at 0x2cae910>
>>> fp._id
ObjectId('4e727f64eb03300c0b000003')
>>> fs.get(fp._id).read()
'The file'

- Arbitrary data can be stored in the ‘fp’ object: it’s just a Document (but please put it in ‘fp.metadata’)
  - Mime type
  - Filename


PyMongo: GridFS Versioning

>>> file_id = fs.put('Moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Moar data!'
>>> file_id = fs.put('Even moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Even moar data!'
>>> fs.get_version('foo.txt', -2).read()
'Moar data!'
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[]
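The versioning session can be read as "several files share one filename, ordered by upload". A toy in-memory model of that idea follows; ToyVersionedStore is a made-up class for illustration, not GridFS, and it orders versions by a counter rather than by upload date.

```python
import itertools


class ToyVersionedStore:
    """In-memory stand-in for filename-based versioning."""

    def __init__(self):
        self._files = {}                 # file id -> (filename, data)
        self._ids = itertools.count(1)   # ids are handed out in upload order

    def put(self, data, filename):
        file_id = next(self._ids)
        self._files[file_id] = (filename, data)
        return file_id

    def _versions(self, filename):
        # Sorting the ids gives oldest first, newest last.
        return sorted(fid for fid, (name, _) in self._files.items()
                      if name == filename)

    def get_version(self, filename, version=-1):
        """Return (file_id, data); -1 is the newest, -2 the one before it."""
        file_id = self._versions(filename)[version]
        return file_id, self._files[file_id][1]

    def delete(self, file_id):
        del self._files[file_id]

    def list(self):
        return sorted({name for name, _ in self._files.values()})


store = ToyVersionedStore()
store.put('Moar data!', filename='foo.txt')
store.put('Even moar data!', filename='foo.txt')
newest_id, newest = store.get_version('foo.txt')
_, previous = store.get_version('foo.txt', -2)

store.delete(newest_id)
names_after_one_delete = store.list()
remaining_id, remaining = store.get_version('foo.txt')

store.delete(remaining_id)
names_after_two_deletes = store.list()
```

This mirrors why fs.list() still shows foo.txt after the first delete: only the newest version was removed, and the older one becomes the last version.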


- Get started with PyMongo

- Sprinkle in some Ming schemas

- ORM: When a dict just won’t do


Why Ming?

- Your data has a schema
  - Your database can define and enforce it
  - It can live in your application (as with MongoDB)
  - Nice to have the schema defined in one place in the code
- Sometimes you need a “migration”
  - Changing the structure/meaning of fields
  - Adding indexes, particularly unique indexes
  - Sometimes lazy, sometimes eager
- “Unit of work”: queuing up all your updates can be handy
- Python dicts are nice; objects are nicer
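The "sometimes lazy, sometimes eager" bullet can be illustrated with plain dicts. This is a sketch under assumptions: the schema change (renaming a 'name' field to 'title'), the 'version' marker, and the function names are all invented for illustration and are not Ming's migration API.

```python
def upgrade(doc):
    """Bring one document up to (hypothetical) schema version 2."""
    if doc.get('version', 1) < 2:
        doc = dict(doc)                  # leave the caller's copy untouched
        doc['title'] = doc.pop('name')   # the field was renamed
        doc['version'] = 2
    return doc


def eager_migrate(collection):
    """Eager: rewrite every stored document up front."""
    return [upgrade(doc) for doc in collection]


def lazy_find(collection, title):
    """Lazy: leave storage alone and upgrade documents as they are read."""
    for doc in collection:
        doc = upgrade(doc)
        if doc['title'] == title:
            yield doc


stored = [{'name': 'Cats'}, {'title': 'Dogs', 'version': 2}]
migrated = eager_migrate(stored)
found = list(lazy_find(stored, 'Cats'))
```

Eager migration pays the whole cost once; lazy migration spreads it over reads, at the price of old and new shapes coexisting in storage for a while.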


Ming: Engines & Sessions

>>> import ming.datastore
>>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
>>> ds.db
Database(Connection('localhost', 27017), u'test')
>>> session = ming.Session(ds)
>>> session.db
Database(Connection('localhost', 27017), u'test')
>>> ming.configure(**{'ming.main.master': 'mongodb://localhost:27017',
...                   'ming.main.database': 'test'})
>>> ming.Session.by_name('main').db
Database(Connection(u'localhost', 27017), u'test')


Surprising Data

http://www.flickr.com/photos/pictureclara/5333266789/


Ming: Define Your Schema

from ming import collection, schema, Field

WikiDoc = collection(
    'wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))

CommentDoc = collection(
    'comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))


Ming: Define Your Schema… Once more, with feeling

from ming import Document, Session, Field

class WikiDoc(Document):
    class __mongometa__:
        session = Session.by_name('main')
        name = 'wiki_page'
        indexes = [('title')]
    title = Field(str)
    text = Field(str)

- Old declarative syntax continues to exist and be supported, but it’s not being actively improved
- Sometimes nice when you want additional methods/attrs on your document class


Ming: Use Your Schema

>>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
>>> doc.m.save()
>>> WikiDoc.m.find()
<ming.base.Cursor object at 0x2c2cd90>
>>> WikiDoc.m.find().all()
[{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
>>> WikiDoc.m.find().one().text
u'I can haz cheezburger?'
>>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
>>> doc.m.save()
Traceback (most recent call last):
  File "<stdin>", line 1, …
ming.schema.Invalid: <class 'ming.metadata.Document<wiki_page>'>: Extra keys: set(['tietul'])
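The traceback shows the schema catching a misspelled key before it reaches the database. The core of that check fits in a few lines of plain Python. This is a sketch only: the Invalid class and validate function below are stand-ins written for illustration, not Ming's own validation code.

```python
class Invalid(Exception):
    """Raised when a document does not match the schema (toy stand-in)."""


def validate(doc, fields):
    """Check doc against fields, a mapping of field name -> expected type."""
    extra = set(doc) - set(fields)
    if extra:
        raise Invalid('Extra keys: %r' % sorted(extra))
    for name, expected in fields.items():
        if name in doc and not isinstance(doc[name], expected):
            raise Invalid('%s: expected %s' % (name, expected.__name__))
    return doc


wiki_fields = {'title': str, 'text': str}

ok = validate({'title': 'Cats', 'text': 'I can haz cheezburger?'}, wiki_fields)

error = None
try:
    validate({'tietul': 'LOL', 'text': 'Invisible bicycle'}, wiki_fields)
except Invalid as exc:
    error = str(exc)
```

Without a schema layer, the misspelled 'tietul' document would simply be stored, and the mistake would surface later as a page with no title.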


Ming Bonus: Mongo-in-Memory

>>> ming.datastore.DataStore('mim://', database='test').db
mim.Database(test)

- MongoDB is (generally) fast
  - … except when creating databases
  - … particularly when you preallocate
- Unit tests like things to be isolated
- MIM gives you isolation at the expense of speed & scaling


- Get started with PyMongo

- Sprinkle in some Ming schemas

- ORM: When a dict just won’t do


Ming ORM: Classes and Collections

from ming import collection, schema, Field
from ming.orm import (mapper, Mapper, RelationProperty,
                      ForeignIdProperty)

WikiDoc = collection(
    'wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))

CommentDoc = collection(
    'comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))

class WikiPage(object): pass
class Comment(object): pass

ormsession.mapper(WikiPage, WikiDoc, properties=dict(
    comments=RelationProperty('Comment')))
ormsession.mapper(Comment, CommentDoc, properties=dict(
    page_id=ForeignIdProperty('WikiPage'),
    page=RelationProperty('WikiPage')))


Ming ORM: Classes and Collections (declarative)

class WikiPage(MappedClass):
    class __mongometa__:
        session = main_orm_session
        name = 'wiki_page'
        indexes = ['title']

    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    comments = RelationProperty('Comment')

class Comment(MappedClass):
    class __mongometa__:
        session = main_orm_session
        name = 'comment'
        indexes = ['page_id']

    _id = FieldProperty(S.ObjectId)
    page_id = ForeignIdProperty(WikiPage)
    page = RelationProperty(WikiPage)
    text = FieldProperty(str)


Ming ORM: Sessions and Queries

- Session → ORMSession
- My_collection.m… → My_mapped_class.query…
- ORMSession actually does stuff
  - Track object identity
  - Track object modifications
  - Unit of work flushing all changes at once

>>> pg = WikiPage(title='MyPage', text='is here')
>>> session.db.wiki_page.count()
0
>>> main_orm_session.flush()
>>> session.db.wiki_page.count()
1
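The count staying at 0 until flush() is the unit-of-work pattern at its simplest: new objects are queued in the session and written together. A toy version of the idea, with a plain list standing in for the database; ToyUnitOfWork is a made-up class for illustration, and the real ORMSession also tracks identity and modifications, which this sketch leaves out.

```python
class ToyUnitOfWork:
    """Queue new objects and write them all on flush()."""

    def __init__(self, storage):
        self.storage = storage   # stands in for the database
        self._new = []           # created but not yet written

    def add(self, obj):
        self._new.append(obj)

    def flush(self):
        # Everything queued so far is written in one go.
        self.storage.extend(self._new)
        self._new = []


database = []
uow = ToyUnitOfWork(database)
uow.add({'title': 'MyPage', 'text': 'is here'})
count_before = len(database)

uow.flush()
count_after = len(database)
```

In the slide the mapped class registers itself with the session on construction, so there is no explicit add() call; the sketch makes that step visible.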


Ming Plugins

http://www.flickr.com/photos/39747297@N05/5229733647/


Ming ORM: Extending the Session

- Various plug points in the session
  - before_flush
  - after_flush
- Some uses
  - Logging changes to sensitive data or for analytics purposes
  - Full-text search indexing
  - “last modified” fields
  - Performance instrumentation
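A sketch of how before_flush and after_flush hooks could support two of the uses listed, "last modified" stamping and simple instrumentation. Everything here is a toy: ToySession, ToyExtension, and the hook signatures (which receive the list of pending documents) are assumptions made for illustration and do not reproduce Ming's extension interface.

```python
import itertools


class ToyExtension:
    """Base class with no-op hooks (hypothetical signatures)."""

    def before_flush(self, pending):
        pass

    def after_flush(self, flushed):
        pass


class LastModified(ToyExtension):
    """Stamp every pending document just before it is written."""

    def __init__(self, clock):
        self.clock = clock

    def before_flush(self, pending):
        for doc in pending:
            doc['last_modified'] = self.clock()


class FlushLog(ToyExtension):
    """Record how many documents each flush wrote."""

    def __init__(self):
        self.sizes = []

    def after_flush(self, flushed):
        self.sizes.append(len(flushed))


class ToySession:
    def __init__(self, storage, extensions=()):
        self.storage = storage
        self.extensions = list(extensions)
        self._pending = []

    def add(self, doc):
        self._pending.append(doc)

    def flush(self):
        pending, self._pending = self._pending, []
        for ext in self.extensions:
            ext.before_flush(pending)
        self.storage.extend(pending)
        for ext in self.extensions:
            ext.after_flush(pending)


ticks = itertools.count(1)   # deterministic stand-in for a real clock
log = FlushLog()
database = []
session = ToySession(database, [LastModified(lambda: next(ticks)), log])
session.add({'title': 'MyPage'})
session.add({'title': 'Cats'})
session.flush()
```

Putting the stamp in a session hook means every model gets it without each class having to remember to set the field.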


Ming ORM: Extending the Mapper

- Various plug points in the mapper
  - before_/after_:
    - Insert
    - Update
    - Delete
    - Remove
- Some uses
  - Collection/model-specific logging (user creation, etc.)
  - Anything you might want a SessionExtension for but would rather do per-model


Related Projects

- Ming: http://sf.net/projects/merciless/ (MIT License)
- Zarkov: http://sf.net/p/zarkov/ (Apache License)
- Allura: http://sf.net/p/allura/ (Apache License)
- PyMongo: http://api.mongodb.org/python (Apache License)


Rick Copeland
@rick446
[email protected]

http://www.flickr.com/photos/f-oxymoron/5005673112/