© 2011 Geeknet Inc. Rapid and Scalable Development with MongoDB, PyMongo, and Ming. Rick Copeland @rick446 [email protected]

Rapid and Scalable Development with MongoDB, PyMongo, and Ming

DESCRIPTION

This talk, given at PyGotham 2011, will teach you techniques using the popular NoSQL database MongoDB and the Python library Ming to write maintainable, high-performance, and scalable applications. We will cover everything you need to become an effective Ming/MongoDB developer from basic PyMongo queries to high-level object-document mapping setups in Ming.

Citation preview

  • 1. Rapid and Scalable Development with MongoDB, PyMongo, and Ming. Rick Copeland @rick446 [email_address]

2.

  • SourceForge and MongoDB
  • Get started with PyMongo
  • Sprinkle in some Ming schemas
  • ORM: When a dict just won't do
  • What we are learning

3. SourceForge and MongoDB

  • Tried CouchDB: liked the dev model, not so much the performance
  • Migrated consumer-facing pages (summary, browse, download) to MongoDB and it worked great (on MongoDB 0.8, no less!)
  • All our new stuff uses MongoDB (Allura, Zarkov, Ming, ...)

4. What is MongoDB?

MongoDB (from "humongous") is a scalable, high-performance, open source, document-oriented database.

  • Sharding, Replication
  • 20k inserts/s? No problem
  • Hierarchical JSON-like store, easy to develop apps
  • SourceForge. Yeah. We like FOSS

5. MongoDB to Relational Mental Mapping

  • Rows are flat, documents are nested
  • Typing: SQL is static, MongoDB is dynamic
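To make "rows are flat, documents are nested" concrete, here is a minimal sketch in plain Python with no database involved. The wiki-style data and the `flatten` helper are made up for illustration; they are not from the talk.

```python
# One nested document, roughly as a document store would hold it (hypothetical data).
post_doc = {
    'title': 'Cats',
    'comments': [
        {'author': 'rick', 'text': 'I can haz?'},
        {'author': 'bob', 'text': 'LOL'},
    ],
}


def flatten(doc):
    """Split a nested document into flat, relational-style rows."""
    post_row = {'id': 1, 'title': doc['title']}
    comment_rows = [
        {'post_id': post_row['id'], 'author': c['author'], 'text': c['text']}
        for c in doc['comments']
    ]
    return post_row, comment_rows


post_row, comment_rows = flatten(post_doc)
```

The nested form keeps the comments with their page in one record; the flat form needs a second "table" and a `post_id` key to link the rows back together.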

Relational (SQL)    MongoDB
Database            Database
Table               Collection
Index               Index
Row                 Document
Column              Field

6.

  • SourceForge and MongoDB
  • Get started with PyMongo
  • Sprinkle in some Ming schemas
  • ORM: When a dict just won't do
  • What we are learning

7. PyMongo: Getting Started

  • >>> import pymongo
  • >>> conn = pymongo.Connection()
  • >>> conn
  • Connection('localhost', 27017)
  • >>> conn.test
  • Database(Connection('localhost', 27017), u'test')
  • >>> conn.test.foo
  • Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
  • >>> conn['test-db']
  • Database(Connection('localhost', 27017), u'test-db')
  • >>> conn['test-db']['foo-collection']
  • Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
  • >>> conn.test.foo.bar.baz
  • Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')

8. PyMongo: Insert / Update / Delete

  • >>> db = conn.test
  • >>> id = db.foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
  • >>> id
  • ObjectId('4e712e21eb033009fa000000')
  • >>> db.foo.find()
  • >>> list(db.foo.find())
  • [{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}]
  • >>> db.foo.update({'_id': id}, {'$set': {'bar': 2}})
  • >>> db.foo.find().next()
  • {u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}
  • >>> db.foo.remove({'_id': id})
  • >>> list(db.foo.find())
  • []
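What `$set` does in the update above can be sketched in plain Python. This is a toy model, not PyMongo: the `apply_set` helper is hypothetical and only handles top-level keys, whereas the real operator also understands dotted paths.

```python
import copy


def apply_set(doc, update):
    """Return a copy of doc with the '$set' fields of update applied."""
    new_doc = copy.deepcopy(doc)
    for key, value in update.get('$set', {}).items():
        new_doc[key] = value   # only the named fields change
    return new_doc


doc = {'_id': 1, 'bar': 1, 'baz': [1, 2, {'k': 5}]}
updated = apply_set(doc, {'$set': {'bar': 2}})
```

The point is that `$set` touches only the fields it names: `baz` comes through unchanged, which is why the slide's document still has it after the update.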

9. PyMongo: Queries, Indexes

  • >>> db.foo.insert([dict(x=x) for x in range(10)])
  • [ObjectId('4e71313aeb033009fa00000b'), ...]
  • >>> list(db.foo.find({'x': {'$gt': 3}}))
  • [{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
  • {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
  • {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, ...]
  • >>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}))
  • [{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8},
  • {u'x': 9}]
  • >>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0})
  • ...      .skip(1).limit(2))
  • [{u'x': 5}, {u'x': 6}]
  • >>> db.foo.ensure_index([
  • ...     ('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
  • u'x_1_y_-1'
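The query, projection, skip, and limit steps above can be modeled in a few lines of plain Python. This is only a sketch of the semantics, not how MongoDB evaluates queries; `matches` and `find` are hypothetical helpers that support equality and `$gt` and nothing else.

```python
def matches(doc, spec):
    """True if doc satisfies spec; supports equality and '$gt' only."""
    for field, cond in spec.items():
        if field not in doc:
            return False
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op != '$gt':
                    raise ValueError('unsupported operator: %s' % op)
                if not doc[field] > arg:
                    return False
        elif doc[field] != cond:
            return False
    return True


def find(docs, spec, exclude=(), skip=0, limit=None):
    """Filter docs, drop excluded fields, then apply skip and limit."""
    hits = [
        {k: v for k, v in d.items() if k not in exclude}
        for d in docs if matches(d, spec)
    ]
    end = None if limit is None else skip + limit
    return hits[skip:end]


docs = [{'_id': i, 'x': i} for i in range(10)]
page = find(docs, {'x': {'$gt': 3}}, exclude=('_id',), skip=1, limit=2)
print(page)  # [{'x': 5}, {'x': 6}]
```

Note the order: the filter runs first, then skip and limit slice the filtered results, which is why skipping one document starts the page at `x = 5`.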

10. PyMongo: Aggregation et al.

  • You gotta write JavaScript (for now)
  • It's pretty slow (single-threaded JS engine)
  • JavaScript is used by
    • $where in a query
    • .group(key, condition, initial, reduce, finalize=None)
    • .map_reduce(map, reduce, out, finalize=None, ...)
  • If you shard, you can get some parallelism across multiple mongod instances with .map_reduce() (and possibly $where). Otherwise you're single-threaded.
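If the map/reduce shape is unfamiliar, here is a rough sketch of the idea in plain Python. With MongoDB the `map` and `reduce` functions are JavaScript run by the server; the `map_reduce` helper and the parity example below are hypothetical and exist only to show what each function is responsible for.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_parity(doc):
    # Emit one (key, value) pair per document.
    yield ('even' if doc['x'] % 2 == 0 else 'odd', doc['x'])


def reduce_sum(key, values):
    # Collapse all values emitted for one key into a single result.
    return sum(values)


docs = [{'x': x} for x in range(10)]
totals = map_reduce(docs, map_parity, reduce_sum)
print(totals)  # {'even': 20, 'odd': 25}
```

Because each key is reduced independently, the work can be split across shards, which is where the parallelism mentioned above comes from.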

11. PyMongo: GridFS

  • >>> import gridfs
  • >>> fs = gridfs.GridFS(db)
  • >>> with fs.new_file() as fp:
  • ...     fp.write('The file')
  • ...
  • >>> fp
  • >>> fp._id
  • ObjectId('4e727f64eb03300c0b000003')
  • >>> fs.get(fp._id).read()
  • 'The file'

  • Arbitrary data can be attached to the fp object; it's just a Document
    • Mime type
    • Filename

12. PyMongo: GridFS Versioning

  • >>> file_id = fs.put('Moar data!', filename='foo.txt')
  • >>> fs.get_last_version('foo.txt').read()
  • 'Moar data!'
  • >>> file_id = fs.put('Even moar data!', filename='foo.txt')
  • >>> fs.get_last_version('foo.txt').read()
  • 'Even moar data!'
  • >>> fs.get_version('foo.txt', -2).read()
  • 'Moar data!'
  • >>> fs.list()
  • [u'foo.txt']
  • >>> fs.delete(fs.get_last_version('foo.txt')._id)
  • >>> fs.list()
  • [u'foo.txt']
  • >>> fs.delete(fs.get_last_version('foo.txt')._id)
  • >>> fs.list()
  • []

13.

  • SourceForge and MongoDB
  • Get started with PyMongo
  • Sprinkle in some Ming schemas
  • ORM: When a dict just won't do
  • What we are learning

14. Why Ming?

  • Your data has a schema
    • Your database can define and enforce it
    • It can live in your application (as with MongoDB)
    • Nice to have the schema defined in one place in the code
  • Sometimes you need a migration
    • Changing the structure/meaning of fields
    • Adding indexes, particularly unique indexes
    • Sometimes lazy, sometimes eager
  • Unit of work: Queuing up all your updates can be handy
  • Python dicts are nice; objects are nicer

15. Ming: Engines & Sessions

  • >>> import ming.datastore
  • >>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
  • >>> ds.db
  • Database(Connection('localhost', 27017), u'test')
  • >>> session = ming.Session(ds)
  • >>> session.db
  • Database(Connection('localhost', 27017), u'test')
  • >>> ming.configure(**{'ming.main.master': 'mongodb://localhost:27017',
  • ...                   'ming.main.database': 'test'})
  • >>> ming.Session.by_name('main').db
  • Database(Connection(u'localhost', 27017), u'test')

16. Ming: Define Your Schema

  • from ming import collection, schema, Field
  • WikiDoc = collection('wiki_page', session,
  •     Field('_id', schema.ObjectId()),
  •     Field('title', str, index=True),
  •     Field('text', str))
  • CommentDoc = collection('comment', session,
  •     Field('_id', schema.ObjectId()),
  •     Field('page_id', schema.ObjectId(), index=True),
  •     Field('text', str))

17. Ming: Define Your Schema: Once more, with feeling

  • from ming import Document, Session, Field
  • class WikiDoc(Document):
  •     class __mongometa__:
  •         session = Session.by_name('main')
  •         name = 'wiki_page'
  •         indexes = [('title')]
  •     title = Field(str)
  •     text = Field(str)
  • Old declarative syntax continues to exist and be supported, but it's not being actively improved

18. Ming: Use Your Schema

  • >>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
  • >>> doc.m.save()
  • >>> WikiDoc.m.find()
  • >>> WikiDoc.m.find().all()
  • [{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
  • >>> WikiDoc.m.find().one().text
  • u'I can haz cheezburger?'
  • >>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
  • >>> doc.m.save()
  • Traceback (most recent call last):
  •   ...
  • ming.schema.Invalid: ...: Extra keys: set(['tietul'])

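The "extra keys" failure above is the schema catching a typo before it reaches the database. A minimal sketch of that kind of check in plain Python follows; the local `Invalid` class and the `validate` helper are stand-ins written for this example and are not Ming's implementation.

```python
class Invalid(Exception):
    """Raised when a document does not match the schema (local stand-in)."""


def validate(doc, fields):
    """Check doc against fields, a dict mapping field name to expected type."""
    extra = set(doc) - set(fields)
    if extra:
        raise Invalid('Extra keys: %r' % sorted(extra))
    for name, expected in fields.items():
        value = doc.get(name)
        if value is not None and not isinstance(value, expected):
            raise Invalid('%s: expected %s' % (name, expected.__name__))
    return doc


wiki_fields = {'title': str, 'text': str}
good = validate({'title': 'Cats', 'text': 'I can haz cheezburger?'}, wiki_fields)

try:
    validate({'tietul': 'LOL', 'text': 'Invisible bicycle'}, wiki_fields)
except Invalid as err:
    message = str(err)
    print(message)  # Extra keys: ['tietul']
```

Without a check like this, the misspelled `tietul` field would be stored silently, since the database itself does not enforce a schema.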
19. Ming: Adding Your own Types

  • Not usually necessary, built-in SchemaItems provide BSON types, default values, etc.

  • class ForceInt(ming.schema.FancySchemaItem):
  •     def _validate(self, value):
  •         try:
  •             return int(value)
  •         except TypeError:
  •             raise ming.schema.Invalid('Bad value %s' % value, value, None)

20. Ming Bonus: Mongo-in-Memory

  • >>> ming.datastore.DataStore('mim://', database='test').db
  • mim.Database(test)

  • MongoDB is (generally) fast
    • except when creating databases
    • particularly when you preallocate
  • Unit tests like things to be isolated
  • MIM gives you isolation at the expense of speed & scaling

21.

  • SourceForge and MongoDB
  • Get started with PyMongo
  • Sprinkle in some Ming schemas
  • ORM: When a dict just won't do
  • What we are learning

22. Ming ORM: Classes and Collections

  • from ming import collection, schema, Field
  • from ming.orm import (mapper, Mapper, RelationProperty,
  •                       ForeignIdProperty)
  • WikiDoc = collection('wiki_page', session,
  •     Field('_id', schema.ObjectId()),
  •     Field('title', str, index=True),
  •     Field('text', str))
  • CommentDoc = collection('comment', session,
  •     Field('_id', schema.ObjectId()),
  •     Field('page_id', schema.ObjectId(), index=True),
  •     Field('text', str))
  • class WikiPage(object): pass
  • class Comment(object): pass
  • ormsession.mapper(WikiPage, WikiDoc, properties=dict(
  •     comments=RelationProperty('Comment')))
  • ormsession.mapper(Comment, CommentDoc, properties=dict(
  •     page_id=ForeignIdProperty('WikiPage'),
  •     page=RelationProperty('WikiPage')))
  • Mapper.compile_all()

23. Ming ORM: Classes and Collections (declarative)

  • class WikiPage(MappedClass):
  •     class __mongometa__:
  •         session = main_orm_session
  •         name = 'wiki_page'
  •         indexes = ['title']
  •     _id = FieldProperty(S.ObjectId)
  •     title = FieldProperty(str)
  •     text = FieldProperty(str)
  • class CommentDoc(MappedClass):
  •     class __mongometa__:
  •         session = main_orm_session
  •         name = 'comment'
  •         indexes = ['page_id']
  •     _id = FieldProperty(S.ObjectId)
  •     page_id = ForeignIdProperty(WikiPage)
  •     page = RelationProperty(WikiPage)
  •     text = FieldProperty(str)

24. Ming ORM: Sessions and Queries

  • Session → ORMSession
  • My_collection.m → My_mapped_class.query
  • ORMSession actually does stuff
    • Track object identity
    • Track object modifications
    • Unit of work: flushing all changes at once
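The unit-of-work idea is easy to see in a toy model: nothing is written until you flush. The `ToySession` class below is a hypothetical stand-in written for this example (a list plays the role of the collection); it is not Ming's ORMSession, which also tracks identity and modifications.

```python
class ToySession(object):
    """Queues new objects and writes them to the backing store on flush()."""

    def __init__(self, store):
        self.store = store      # a plain list standing in for a collection
        self.pending = []       # objects created but not yet written

    def add(self, obj):
        self.pending.append(obj)

    def flush(self):
        # Write everything queued so far in one go, then clear the queue.
        self.store.extend(self.pending)
        self.pending = []


collection = []
session = ToySession(collection)
session.add({'title': 'MyPage', 'text': 'is here'})
count_before = len(collection)   # 0: nothing written yet
session.flush()
count_after = len(collection)    # 1: the queued page is now stored
```

This mirrors the count going from 0 to 1 around the flush on this slide: creating the object only registers it with the session.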

  • >>> pg = WikiPage(title='MyPage', text='is here')
  • >>> session.db.wiki_page.count()
  • 0
  • >>> main_orm_session.flush()
  • >>> session.db.wiki_page.count()
  • 1

25. Ming ORM: Extending the Session

  • Various plug points in the session
    • before_flush
    • after_flush
  • Some uses
    • Logging changes to sensitive data or for analytics purposes
    • Full-text search indexing
    • "Last modified" fields
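Here is a rough sketch of how flush-time plug points enable uses like these. The `HookedSession` class is a toy written for this example, with a list as the backing store; the hook names follow the slide, but the signatures and behavior are assumptions, not Ming's extension API.

```python
import datetime


class HookedSession(object):
    """Toy unit-of-work session with before_flush/after_flush plug points."""

    def __init__(self, store):
        self.store = store      # a plain list standing in for a collection
        self.pending = []       # objects queued for the next flush
        self.log = []           # audit trail written by after_flush

    def add(self, obj):
        self.pending.append(obj)

    def before_flush(self):
        # Stamp every queued object with a "last modified" time.
        now = datetime.datetime.now(datetime.timezone.utc)
        for obj in self.pending:
            obj['last_modified'] = now

    def after_flush(self, flushed):
        # Record what was written, e.g. for auditing or search indexing.
        self.log.append('flushed %d object(s)' % len(flushed))

    def flush(self):
        self.before_flush()
        flushed, self.pending = self.pending, []
        self.store.extend(flushed)
        self.after_flush(flushed)


store = []
hooked = HookedSession(store)
hooked.add({'title': 'MyPage'})
hooked.flush()
```

Because every write goes through `flush()`, the hooks see all changes in one place, so application code never has to remember to set the timestamp itself.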

26.

  • SourceForge and MongoDB
  • Get started with PyMongo
  • Sprinkle in some Ming Schemas
  • ORM: When a dict just won't do
  • What we are learning

27. Tips From the Trenches

  • Watch your document size
  • Choose your indexes well
    • Watch your server log; bad queries show up there
  • Don't go crazy with denormalization
    • Try to use an index if all you need is a backref
    • Stale data is a tricky problem
  • Try to stay with one database
  • Watch the # of queries
  • Drop to lower levels (ORM → document → pymongo) when performance is an issue

28. Future Work

  • Performance
  • Analytics in MongoDB: Zarkov
  • Web framework integration
  • Magic Columns (?)
  • ???

29. Related Projects

  • Ming: http://sf.net/projects/merciless/ (MIT License)
  • Zarkov: http://sf.net/p/zarkov/ (Apache License)
  • Allura: http://sf.net/p/allura/ (Apache License)
  • PyMongo: http://api.mongodb.org/python (Apache License)

30. Rick Copeland @rick446 [email_address]