©2013 DataStax Confidential. Do not distribute without consent.
@rustyrazorblade
Jon Haddad, Technical Evangelist, DataStax
Python & Cassandra
This should be boring
• Talking to a database should not be any of the following:
  • Exciting
  • "AH HA!"
  • Confusing
git@github.com:rustyrazorblade/python-presentation.git
Agenda
• Go over basic driver concepts
• Connecting
• Performing queries
• Introduce object mapper (cqlengine)
• Application integration
DataStax Native Python Driver
• Talks to Cassandra
• Connection pooling
• Aware of cluster topology
• Automatic retries / failure management
• Load balancing
• Will include object mapper (cqlengine) in next release
• Fully Open Source (Apache License)
Connect to Cassandra
• Import and create a Cluster instance
• Cluster takes options such as load balancing policy, reconnect policy, retry policy
• On connection, driver discovers entire cluster automatically
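As a rough sketch of the connection step, the function below builds a Cluster with explicit policies and opens a session. The contact points, data center name, and keyspace are placeholders for illustration, not values from the talk. The driver import sits inside the function only so the file loads where the driver is not installed, and newer driver releases may prefer execution profiles over these constructor arguments.

```python
def open_session(contact_points, keyspace, local_dc):
    """Connect to a cluster and return (cluster, session).

    All three arguments are placeholders chosen for this sketch.
    """
    # Imported here so this file can be loaded without the driver installed.
    from cassandra.cluster import Cluster
    from cassandra.policies import (
        DCAwareRoundRobinPolicy,
        ExponentialReconnectionPolicy,
        RetryPolicy,
        TokenAwarePolicy,
    )

    cluster = Cluster(
        contact_points=contact_points,
        # Prefer replicas that own the data (when the routing key is known),
        # choosing nodes in local_dc first.
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc=local_dc)
        ),
        # Back off from 1 second up to 60 seconds between reconnect attempts.
        reconnection_policy=ExponentialReconnectionPolicy(1.0, 60.0),
        default_retry_policy=RetryPolicy(),
    )
    # connect() reaches one contact point, then discovers the other nodes.
    session = cluster.connect(keyspace)
    return cluster, session
```

A call might look like `cluster, session = open_session(["127.0.0.1"], "demo", "datacenter1")`, followed by `cluster.shutdown()` when the application exits.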
Executing queries
• CQL: Similar to SQL
• session.execute()
• Create tables, inserts, selects
• Can accept simple strings
• Simple string queries are not token aware
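A minimal sketch of session.execute() with plain string queries follows. It assumes a session already connected to a keyspace; the table layout mirrors the sensor examples in this deck, and the function name is made up for illustration.

```python
import uuid
from datetime import datetime


def create_and_read_sensors(session):
    """Create a table, insert one row, and return all sensor names."""
    session.execute(
        """CREATE TABLE IF NOT EXISTS sensor (
               sensor_id uuid PRIMARY KEY,
               name text,
               created_at timestamp
           )"""
    )
    # Simple (non-prepared) statements use %s placeholders, not ?.
    session.execute(
        "INSERT INTO sensor (sensor_id, name, created_at) VALUES (%s, %s, %s)",
        (uuid.uuid4(), "sensor 0", datetime.now()),
    )
    rows = session.execute("SELECT sensor_id, name FROM sensor")
    # With the default row factory each row is a named tuple,
    # so columns can be read as attributes.
    return [row.name for row in rows]
```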
Prepared Statements
• Use for all queries (inserts / updates / deletes)
• Decrease server load
• Increase security
• Allows for token aware queries
Async Queries
• Prepared statements required!
• Much faster than sync
• Utilize the entire cluster
• Driver can help us here
• We can use futures
Async Queries w/ Callbacks

    import uuid
    from datetime import datetime

    statement = """INSERT INTO sensor
                   (sensor_id, name, created_at)
                   VALUES (?, ?, ?)"""

    insert_sensor = session.prepare(statement)

    # callback function: receives the query result, then any extra arguments
    def create_sensor_entries_callback(response, sensor_id):
        print("CALLBACK")

    for x in range(10):
        sensor_id = uuid.uuid4()
        sensor_data = (sensor_id, "sensor %d" % x, datetime.now())
        future = session.execute_async(insert_sensor, sensor_data)
        # add callback
        future.add_callback(create_sensor_entries_callback, sensor_id)
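The callback above only covers the success path. One way to handle failures as well is the future's add_callbacks() method, which registers a callback and an errback together. This is a sketch: the function names and log messages are illustrative, and `prepared_insert` is assumed to be the prepared INSERT from the slide.

```python
import logging

log = logging.getLogger(__name__)


def on_success(rows, sensor_id):
    # Runs once the write has succeeded.
    log.info("stored sensor %s", sensor_id)


def on_error(exc, sensor_id):
    # Runs with the exception if the request fails.
    log.error("failed to store sensor %s: %s", sensor_id, exc)


def insert_async(session, prepared_insert, sensor_id, name, created_at):
    """Send one insert without blocking and return its future."""
    future = session.execute_async(prepared_insert, (sensor_id, name, created_at))
    future.add_callbacks(
        callback=on_success,
        errback=on_error,
        callback_args=(sensor_id,),
        errback_args=(sensor_id,),
    )
    return future
```

When the caller needs the outcome inline, `future.result()` blocks until the query finishes and raises if it failed.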
Async Queries (managed)

    from uuid import UUID
    from cassandra.concurrent import execute_concurrent_with_args

    stmt = """SELECT * FROM sensor_data WHERE sensor_id=?
              ORDER BY created_at DESC LIMIT 1"""

    # prepared statement
    select_statement = session.prepare(stmt)

    # one parameter list per query; a uuid column expects uuid.UUID values
    sensor_ids = [[UUID("f472d5ff-0c76-404a-8044-038db416685e")],
                  [UUID("940cb741-d5b5-4c5d-82f5-bf1aa61c6d47")],
                  [UUID("497d4b2c-cba2-4d0f-bd80-42de612690fd")],
                  [UUID("1bdeac75-7e12-43ba-80b5-2d38405f9843")]]

    # automatically manages concurrency
    result = execute_concurrent_with_args(session, select_statement, sensor_ids)
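The value returned by execute_concurrent_with_args is a list in which each item unpacks as (success, result_or_exception), in the same order as the parameter lists. A small helper like the one below (its name is made up here) can separate rows from failures. Failures only show up in the list when the call is made with raise_on_first_error=False; by default the first error is raised instead.

```python
def split_results(results):
    """Separate successful rows from failures.

    `results` is the list returned by execute_concurrent_with_args, where
    each item unpacks as (success, result_or_exception).
    """
    rows, errors = [], []
    for success, result_or_exc in results:
        if success:
            rows.extend(result_or_exc)  # a successful result is iterable
        else:
            errors.append(result_or_exc)
    return rows, errors
```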
Performance Considerations
• Like SQL, CQL features IN(), but in general it's terrible for performance
• Results in more GC & perf problems
• BATCH has the same issue
• Failure to get a single result causes entire IN() or batch to retry
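One common alternative to IN() is to issue one query per partition key and run them in parallel, so a failure affects a single key rather than the whole request. The sketch below assumes `select_latest` is a prepared "SELECT ... WHERE sensor_id=? ... LIMIT 1" statement like the one shown earlier; the function name is illustrative.

```python
def fetch_latest_per_sensor(session, select_latest, sensor_ids):
    """Fetch the newest reading per sensor with one query per partition."""
    # Start every query before waiting on any of them.
    futures = [
        (sensor_id, session.execute_async(select_latest, [sensor_id]))
        for sensor_id in sensor_ids
    ]

    latest, failed = {}, []
    for sensor_id, future in futures:
        try:
            rows = list(future.result())  # blocks until this query completes
        except Exception:
            failed.append(sensor_id)  # only this key needs a retry
            continue
        if rows:
            latest[sensor_id] = rows[0]
    return latest, failed
```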
Object Mapper
Defining Models
• Each model maps to a single table
• Every model inherits from cassandra.cqlengine.models.Model
• Define fields in your table programmatically
• Collections map to native Python types (list, set, dict)
• Table management included (no need to write ALTER)
Model with Collections
• Sets & Maps are most useful
• Use to denormalize
• Lists can have performance issues if misused

    from uuid import uuid1, uuid4
    from cassandra.cqlengine.models import Model
    from cassandra.cqlengine.columns import Map, Set, Text, TimeUUID, UUID

    class Message(Model):
        message_id = TimeUUID(primary_key=True, default=uuid1)
        subject = Text()
        body = Text()
        addressed_to = Set(UUID)

    class Photo(Model):
        photo_id = UUID(primary_key=True, default=uuid4)
        title = Text()
        likes = Map(UUID, Text)
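To illustrate the table management mentioned for models, here is a sketch that connects cqlengine and syncs a list of model classes. The hosts, keyspace name, and replication factor are assumptions for a local development setup, and the imports are placed inside the function only so the file loads without the driver installed.

```python
def prepare_schema(hosts, keyspace, models):
    """Connect cqlengine and create or update the table for each model."""
    # Imported here so this file can be loaded without the driver installed.
    from cassandra.cqlengine import connection
    from cassandra.cqlengine.management import create_keyspace_simple, sync_table

    connection.setup(hosts, default_keyspace=keyspace)
    # Replication factor 1 is only sensible on a single-node development setup.
    create_keyspace_simple(keyspace, replication_factor=1)
    for model in models:
        # Creates the table if it is missing and adds columns for new fields.
        sync_table(model)
```

With the models above, a call could look like `prepare_schema(["127.0.0.1"], "demo", [Message, Photo])`.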
Clustering Keys
• Automatically determined by ordering in model
• First primary key is partition key
• The rest are clustering keys

    from cassandra.cqlengine.models import Model
    from cassandra.cqlengine.columns import Boolean, Text, UUID

    class UsersInGroup(Model):
        group_id = UUID(primary_key=True)
        user_id = UUID(primary_key=True)
        is_admin = Boolean()

    # composite partition key: (group_id, state)
    class UsersInGroupByState(Model):
        group_id = UUID(primary_key=True, partition_key=True)
        state = Text(primary_key=True, partition_key=True)
        user_id = UUID(primary_key=True)
        is_admin = Boolean(default=False)
Inserting Data
• Model.create(**kwargs)
• Performs validation
• Supports custom validation
• Supports TTLs
Lightweight Transactions
• Uses Paxos for consensus
• IF NOT EXISTS for INSERT
• IF FIELD=VALUE for UPDATE
• Use sparingly - requires several round trips
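A lightweight transaction can also be issued as plain CQL through the driver. The sketch below inserts a row only if it does not exist yet and reports whether the write was applied. The `users` table and its columns are hypothetical, and the code relies on the driver's default named-tuple rows, where the "[applied]" column is exposed as `applied`.

```python
def register_user(session, name, email):
    """Insert a user only if the name is free; return True if the row was written."""
    result = session.execute(
        "INSERT INTO users (name, email) VALUES (%s, %s) IF NOT EXISTS",
        (name, email),
    )
    # A conditional write returns a single row; its "[applied]" column is
    # available as the attribute `applied` on the default named-tuple rows.
    rows = list(result)
    return bool(rows[0].applied)
```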
Batches
• Use only to maintain multiple views (for consistency purposes)

    from cassandra.cqlengine.models import Model
    from cassandra.cqlengine.columns import Text
    from cassandra.cqlengine.query import BatchQuery

    class User(Model):
        name = Text(primary_key=True)
        twitter = Text()
        email = Text()

    class TwitterToUser(Model):
        twitter = Text(primary_key=True)
        name = Text()

    (twitter, name) = ("rustyrazorblade", "jon")

    # both writes are sent together when the with-block exits
    with BatchQuery() as b:
        User.batch(b).create(name=name, twitter=twitter)
        TwitterToUser.batch(b).create(twitter=twitter, name=name)
Fetching a Row
• Model.get() can be used to fetch a single row
• Will throw a DoesNotExist exception if not found
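Since Model.get() raises when nothing matches, callers often wrap it. The helper below is an illustrative sketch (the name `get_or_none` is not part of cqlengine) that works with any model class, such as the User model from the Batches slide.

```python
def get_or_none(model_cls, **filters):
    """Return the single row matching `filters`, or None when there is none."""
    try:
        return model_cls.get(**filters)
    except model_cls.DoesNotExist:
        # Each model class carries its own DoesNotExist exception type.
        return None
```

For example, `get_or_none(User, name="jon")` returns either a User instance or None.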
Fetching Many Rows
• Model.objects() accepts any filter acceptable to Cassandra
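As a sketch of fetching many rows, the function below queries one partition of the UsersInGroup model from the Clustering Keys slide and keeps only the admins. The function name and the default limit are made up for illustration.

```python
def admins_in_group(model_cls, group_id, limit=100):
    """Return the admin rows among the first `limit` rows of one group."""
    # Restricting on the partition key keeps the query on a single partition.
    query = model_cls.objects(group_id=group_id).limit(limit)
    # is_admin is not part of the primary key, so it is checked here on the
    # client instead of being sent to Cassandra as a filter.
    return [row for row in query if row.is_admin]
```

A call might look like `admins_in_group(UsersInGroup, some_group_id)`.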
Table Properties
• Every table option supported
• Compaction
• gc_grace_seconds
• read repair chance
• caching
Table Inheritance
• Multiple tables with similar fields
• Query pattern: filtering
Table Polymorphism
• Similar to inheritance
• Uses a single table
• Query pattern: select all types
Application Development
Virtual Environments
• virtualenv is your friend!
• mkvirtualenv is also your friend!
• pip install virtualenvwrapper (provides mkvirtualenv)
Flask==0.10.1
blist==1.3.6

cassandra-driver==2.1.2
Flask==0.9.0
rednose==0.4.1

ipdb==0.7
ipdbplugin==1.2
ipython==2.3.1
mock==1.0.1
nose==1.3.4
All sandboxed environments
Integrations
• Django
  • django-cassandra-engine
  • Integrates with manage.py
• Flask
  • use @app.before_first_request
• General rule: connect post-fork
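One way to follow the connect-post-fork rule regardless of framework is to connect lazily and remember which process owns the session. The class below is a sketch under that assumption: `connect` is a hypothetical zero-argument callable that returns a new session, for example one wrapping `Cluster(...).connect(...)`.

```python
import os


class PerProcessSession:
    """Hand out one session per process, connecting on first use."""

    def __init__(self, connect):
        self._connect = connect  # hypothetical factory returning a new session
        self._session = None
        self._owner_pid = None

    def get(self):
        pid = os.getpid()
        # A session inherited from a parent process should not be reused
        # after a fork, so reconnect whenever the owning PID differs.
        if self._session is None or self._owner_pid != pid:
            self._session = self._connect()
            self._owner_pid = pid
        return self._session
```

In a web application, `get()` could be called from a startup hook such as the one named on the slide, or at the start of each request; the first call in every worker process opens that worker's own connection.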
Go build stuff!