Building Apps with the Cassandra Python Driver Eddie Satterly– CTO Big Data & Analytics at CSC Dial In: 1-877-668-4493 Access Code: 807 224 168

Webinar | Building Apps with the Cassandra Python Driver


DESCRIPTION

With the new Python driver for Cassandra it is easy to build integrations and apps that use Cassandra seamlessly as a back end. This session will explore what it takes to build the app and the features available with the new Python driver.


Page 1: Webinar | Building Apps with the Cassandra Python Driver

Building Apps with the Cassandra Python Driver

Eddie Satterly– CTO Big Data & Analytics at CSC

Dial In: 1-877-668-4493 Access Code: 807 224 168

Page 2: Webinar | Building Apps with the Cassandra Python Driver

Where is the Driver

https://github.com/datastax/python-driver

Page 3: Webinar | Building Apps with the Cassandra Python Driver

Key Features

The driver is a connection handler for the Cassandra system underneath your app, with a low-level API. The key features that really helped simplify the Python code from the earlier version of the app are:

Connection Pooling & Node Discovery – This lets you connect to the whole cluster while providing only the seed nodes in your list. With my old driver you had to provide the list of all nodes and make the Python code decide how to connect.

You give it this set of nodes, 192.168.1.1 and 192.168.1.2, and the driver makes a connection and automatically discovers all other nodes in the cluster.
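A minimal sketch of that discovery step, assuming the cassandra-driver package is installed and the two seed addresses from this slide are reachable. The helper name connect_and_list_hosts is made up for illustration.

```python
# The two seed nodes named on this slide.
SEED_NODES = ["192.168.1.1", "192.168.1.2"]


def connect_and_list_hosts(seed_nodes):
    """Connect through the seed nodes and return every address the driver found."""
    # Imported lazily so this file still loads where cassandra-driver is absent.
    from cassandra.cluster import Cluster

    cluster = Cluster(contact_points=list(seed_nodes))
    try:
        cluster.connect()  # connecting triggers discovery of the other nodes
        return sorted(str(host.address) for host in cluster.metadata.all_hosts())
    finally:
        cluster.shutdown()
```

Calling connect_and_list_hosts(SEED_NODES) against a live cluster should return the seed nodes plus any nodes the driver discovered on its own.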

Page 4: Webinar | Building Apps with the Cassandra Python Driver

Key Features Cont.

Cluster Attributes – There are several cluster object attributes you can set. Some of the key ones are the ability to set a default keyspace via the method cluster.connect('mykeyspace'), setting the CQL version for clusters that run in mixed mode because their data models were built at different times, and metrics_enabled, which controls metrics collection.
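These attributes could be wired together roughly as follows. This is a sketch, not the code from my app: build_cluster_kwargs and open_session are hypothetical helper names, and the "3.0.0" version string in the usage note is only a placeholder.

```python
def build_cluster_kwargs(seed_nodes, cql_version=None, metrics_enabled=False):
    """Collect the Cluster constructor arguments discussed on this slide."""
    kwargs = {
        "contact_points": list(seed_nodes),
        "metrics_enabled": metrics_enabled,  # controls metrics collection
    }
    if cql_version is not None:
        # Only pin the CQL version when the cluster really needs it.
        kwargs["cql_version"] = cql_version
    return kwargs


def open_session(keyspace, **cluster_kwargs):
    """Create the cluster and a session whose default keyspace is `keyspace`."""
    # Imported lazily so this file still loads where cassandra-driver is absent.
    from cassandra.cluster import Cluster

    cluster = Cluster(**cluster_kwargs)
    session = cluster.connect(keyspace)  # same as cluster.connect('mykeyspace')
    return cluster, session
```

For example, open_session("mykeyspace", **build_cluster_kwargs(["192.168.1.1", "192.168.1.2"], cql_version="3.0.0")) returns the cluster and a session bound to mykeyspace.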

ssl_options – This attribute is called out separately because of its high value in environments where client-to-node communication needs to be encrypted and that feature is turned on cluster side. While it is not turned on by default in my app, many of the customers using it need it.
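A sketch of passing ssl_options, assuming the cluster already has client-to-node encryption enabled and you have a CA certificate file. The helper names and the certificate path argument are hypothetical. Newer driver releases prefer an ssl_context argument, so check the docs for the version you run.

```python
import ssl


def build_ssl_options(ca_certs_path):
    """Options handed to Python's ssl module when the driver wraps its sockets."""
    return {
        "ca_certs": ca_certs_path,       # CA bundle used to verify the nodes
        "cert_reqs": ssl.CERT_REQUIRED,  # refuse nodes without a valid certificate
    }


def make_encrypted_cluster(seed_nodes, ca_certs_path):
    """Create a Cluster whose client-to-node connections use SSL."""
    # Imported lazily so this file still loads where cassandra-driver is absent.
    from cassandra.cluster import Cluster

    return Cluster(
        contact_points=list(seed_nodes),
        ssl_options=build_ssl_options(ca_certs_path),
    )
```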

Load balancing – This is a great added feature that helps avoid the hotspot nodes seen with the older driver approach: you now set the policy in an attribute (round robin is the default) and the driver controls the connections. In early tests with the old driver, even though the code was supposed to pick a pseudo-random node, affinity seemed to happen and create hotspot nodes for queries.
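The sketch below sets the round robin policy explicitly. The round_robin_counts function is only an illustration of why round robin spreads load evenly, not the driver's internal code, and both function names are made up. Later driver releases changed the default policy and moved this setting toward execution profiles, so treat the constructor argument as tied to the driver version described here.

```python
from collections import Counter
from itertools import cycle, islice


def round_robin_counts(nodes, n_queries):
    """Illustration only: count the queries each node gets under round robin."""
    return Counter(islice(cycle(nodes), n_queries))


def make_round_robin_cluster(seed_nodes):
    """Create a Cluster that rotates queries across all known nodes."""
    # Imported lazily so this file still loads where cassandra-driver is absent.
    from cassandra.cluster import Cluster
    from cassandra.policies import RoundRobinPolicy

    return Cluster(
        contact_points=list(seed_nodes),
        load_balancing_policy=RoundRobinPolicy(),
    )
```

With three nodes and nine queries, round_robin_counts gives each node exactly three queries, which is the even spread the old pseudo-random approach failed to deliver.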

Page 5: Webinar | Building Apps with the Cassandra Python Driver

Key Features Cont.

default_timeout – Setting a timeout so that the app can detect failures and respond without leaving the client hanging is key.
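A sketch of using the timeout, assuming a session created with cluster.connect(). The helper names are hypothetical, and returning None on a timeout is just one possible way for the app to respond.

```python
def set_default_timeout(session, seconds):
    """Apply a client-side timeout to every request made through this session."""
    session.default_timeout = seconds
    return session


def execute_or_none(session, query):
    """Run a query, returning None instead of hanging when the timeout expires."""
    # Imported lazily so this file still loads where cassandra-driver is absent.
    from cassandra import OperationTimedOut

    try:
        return list(session.execute(query))
    except OperationTimedOut:
        # The driver gave up waiting; the app can now answer its own client.
        return None
```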

row_factory – This lets you determine what format to return the results in. This is super valuable to make sure your app has the data returned in the optimal way for analysis and manipulation. There were over 50 lines of code in my old Python scripts to handle one-offs that are now gone since this feature exists. The driver ships several built-in factories, including named tuples (the default) and dictionaries.
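In current driver releases the built-in factories are named_tuple_factory, tuple_factory, dict_factory and ordered_dict_factory, importable from cassandra.query (early releases kept them in a different module). A row factory is just a callable that receives the column names and the raw rows, so the sketch below includes a hand-written equivalent of the dict factory to show the shape of the data. dict_rows and rows_as_dicts are hypothetical names.

```python
def dict_rows(colnames, rows):
    """Hand-written row factory: one dict per row, keyed by column name."""
    return [dict(zip(colnames, row)) for row in rows]


def rows_as_dicts(session, query):
    """Switch the session to the built-in dict factory and run a query."""
    # Imported lazily so this file still loads where cassandra-driver is absent.
    from cassandra.query import dict_factory

    session.row_factory = dict_factory
    return list(session.execute(query))
```

Assigning session.row_factory = dict_rows would plug the hand-written version in the same way.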

execute_async() – This is one of the best features in the new driver and makes the processing time for requests much faster from the client's point of view. There is a method you can call to force blocking for the results if needed, but in most cases doing other work while waiting on results speeds up response times by many milliseconds.
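A sketch of the pattern, assuming a connected session. execute_async() returns a future right away, and calling result() on that future is the blocking step. The function name fetch_concurrently is made up for illustration.

```python
def fetch_concurrently(session, queries):
    """Send every query without waiting, then collect the results in order."""
    # Each call returns immediately with a future; all requests are now in flight.
    futures = [session.execute_async(query) for query in queries]

    # The app is free to do other work here while Cassandra answers.

    # result() blocks until that particular response arrives (or raises on error).
    return [list(future.result()) for future in futures]
```

The total wait is then roughly that of the slowest query rather than the sum of all of them.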

Page 6: Webinar | Building Apps with the Cassandra Python Driver

Take a Look at Docs

There are many other features I did not call out so take a look at:

http://datastax.github.io/python-driver/index.html

http://datastax.github.io/python-driver/api/index.html

For high-throughput operations like remote lookups I highly suggest using the multiprocessing module instead of multithreading, but make sure you understand the implications of object passing.

Page 7: Webinar | Building Apps with the Cassandra Python Driver

How I Use It

Take a look at my GitHub in a couple of weeks; the new version of the app will be there, using this driver, once all the final testing is done. The current version there uses the old driver and approach, so look for v2.0:

https://github.com/esatterly/splunk-cassandra

Build your own playgrounds and figure out the right options and configuration settings to return data and do analysis and manipulation on it. I will be putting two other apps out in the next few months for other non-Splunk use cases as well so stay tuned.

Page 8: Webinar | Building Apps with the Cassandra Python Driver

Thank You

Questions?