Python Performance Profiling: The Guts And The Glory


DESCRIPTION

Your Python program is too slow, and you need to optimize it. Where do you start? With the right tools, you can optimize your code where it counts. We’ll explore the guts of the Python profiler “Yappi” to understand its features and limitations. We’ll learn how to find the maximum performance wins with minimum effort.


Python Profiling:

A. Jesse Jiryu Davis

@jessejiryudavis

MongoDB

The Guts & The Glory

“PyMongo is slower compared to the JavaScript version”

MongoDB Node.js driver: 88,000 per second
PyMongo: 29,000 per second

“Why Is PyMongo Slower?”

From: steve@mongodb.com
To: jesse@mongodb.com
CC: eliot@mongodb.com

Hi Jesse,

Why is the Node MongoDB driver 3 times faster than PyMongo?
http://dzone.com/articles/mongodb-facts-over-80000

The Python Code

# Obtain a MongoDB collection.
import pymongo

client = pymongo.MongoClient('localhost')
db = client.random
collection = db.randomData
collection.remove()

n_documents = 80000
batch_size = 5000
batch = []

import time
start = time.time()

The Python Code

import random
from datetime import datetime

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)
delta = (max_date - min_date).total_seconds()

The Python Code

What?!

The Python Code

for i in range(n_documents):
    date = datetime.fromtimestamp(
        time.mktime(min_date.timetuple())
        + int(round(random.random() * delta)))

    value = random.random()
    document = {
        'created_on': date,
        'value': value}

    batch.append(document)
    if len(batch) == batch_size:
        collection.insert(batch)
        batch = []

duration = time.time() - start

print 'inserted %d documents per second' % (
    n_documents / duration)

The Python Code

inserted 30,000 documents per second

The Node.js Code

(not shown)

The Question

Why is the Python script 3 times slower than the equivalent Node script?

Why Profile?

• Optimization is like debugging
• Hypothesis: “The following change will yield a worthwhile improvement.”
• Experiment
• Repeat until fast enough

Why Profile?

Profiling is a way to generate hypotheses.

Which Profiler?

• cProfile
• GreenletProfiler
• Yappi
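As a point of comparison for the profilers listed above, here is a minimal sketch of profiling with cProfile from the standard library. The functions `slow_sum` and `workload` are hypothetical stand-ins invented for this illustration; they are not part of the talk's benchmark.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive loop so that it shows up in the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    # Hypothetical workload: call the slow function repeatedly.
    return [slow_sum(20000) for _ in range(50)]


def profile_workload():
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    # Render the ten most expensive entries, sorted by cumulative time.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()


report = profile_workload()
print(report)
```

The report lists each function with its call count and its own and cumulative time, which is usually enough to form a first hypothesis about where the time goes.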

Yappi

By Sümer Cip

Yappi

Compared to cProfile, Yappi:

• Is as fast
• Also measures functions
• Can measure CPU time, not just wall-clock time
• Can measure all threads
• Can export to callgrind
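The difference between CPU time and wall-clock time can be illustrated with the standard library alone. This is only a sketch: `wait_for_io` and `crunch` are hypothetical functions, and `time.sleep` stands in for waiting on a database round trip.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) spent running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def wait_for_io():
    # Stand-in for waiting on a network round trip.
    time.sleep(0.2)


def crunch():
    # Pure computation: keeps the CPU busy.
    sum(i * i for i in range(200000))


sleep_wall, sleep_cpu = measure(wait_for_io)
busy_wall, busy_cpu = measure(crunch)
print("sleeping:  wall=%.3fs cpu=%.3fs" % (sleep_wall, sleep_cpu))
print("crunching: wall=%.3fs cpu=%.3fs" % (busy_wall, busy_cpu))
```

While sleeping, wall-clock time advances but almost no CPU time is consumed. A profiler that measures CPU time therefore attributes cost to the code doing the computing rather than to the code that is merely waiting.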

Yappi

import yappi

yappi.set_clock_type('cpu')
yappi.start(builtins=True)

start = time.time()

for i in range(n_documents):
    # ... same code ...

duration = time.time() - start
stats = yappi.get_func_stats()
stats.save('callgrind.out', type='callgrind')

Same code as before

KCacheGrind

for index in range(n_documents):
    date = datetime.fromtimestamp(
        time.mktime(min_date.timetuple())
        + int(round(random.random() * delta)))

    value = random.random()
    document = {
        'created_on': date,
        'value': value}

    batch.append(document)
    if len(batch) == batch_size:
        collection.insert(batch)
        batch = []

The Python Code

one third of the time

for index in range(n_documents):
    date = datetime.now()

    value = random.random()
    document = {
        'created_on': date,
        'value': value}

    batch.append(document)
    if len(batch) == batch_size:
        collection.insert(batch)
        batch = []

The Python Code

The Python Code

• Before: 30,000 inserts per second
• After: 50,000 inserts per second
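The two ways of producing a date can also be timed in isolation, without a database. The sketch below uses `timeit` from the standard library; the function names `random_date` and `current_date` and the call count `n_calls` are assumptions made for this illustration, and the absolute numbers depend on the machine.

```python
import random
import time
import timeit
from datetime import datetime

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)
delta = (max_date - min_date).total_seconds()


def random_date():
    # Original approach: a random timestamp between min_date and max_date.
    return datetime.fromtimestamp(
        time.mktime(min_date.timetuple())
        + int(round(random.random() * delta)))


def current_date():
    # Replacement used in the experiment.
    return datetime.now()


n_calls = 20000  # arbitrary; large enough to get a stable reading
before = timeit.timeit(random_date, number=n_calls)
after = timeit.timeit(current_date, number=n_calls)
print("random_date:  %.3f s for %d calls" % (before, n_calls))
print("current_date: %.3f s for %d calls" % (after, n_calls))
```

A micro-benchmark like this only checks one hypothesis at a time; the end-to-end insert rate is still the number that matters.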

Why Profile?

• Generate hypotheses
• Estimate possible improvement
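Estimating the possible improvement is simple arithmetic: if a fraction of the runtime is spent in one place, removing that work entirely bounds the speedup. The sketch below applies this to the figures from the slides; the helper `max_speedup` is a hypothetical name, and treating the profile's "one third" as exactly 1/3 is an assumption.

```python
def max_speedup(fraction):
    """Upper bound on speedup if `fraction` of the runtime is eliminated."""
    return 1.0 / (1.0 - fraction)


before_rate = 30000          # inserts per second, from the slides
date_fraction = 1.0 / 3.0    # share of time attributed to date generation

best_case_rate = before_rate * max_speedup(date_fraction)
print("best case: %d inserts per second" % round(best_case_rate))
```

The bound works out to about 45,000 inserts per second. The measured 50,000 is slightly higher, which suggests the real share was somewhat more than one third; the value of the estimate is knowing in advance that the change is worth trying.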

How Does Profiling Work?

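/* Simplified for the slide: in CPython, the callback type (Py_tracefunc)
 * and PyEval_SetProfile each take an additional PyObject * argument. */
int callback(PyFrameObject *frame,
             int what,
             PyObject *arg);

int start(void)
{
    PyEval_SetProfile(callback);
    return 0;
}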

PyObject *
PyEval_EvalFrameEx(PyFrameObject *frame)
{
    if (tstate->c_profilefunc != NULL) {
        tstate->c_profilefunc(frame,
                              PyTrace_CALL,
                              Py_None);
    }

    /* ... execute bytecode in the frame
     * until return or exception... */

    if (tstate->c_profilefunc != NULL) {
        tstate->c_profilefunc(frame,
                              PyTrace_RETURN,
                              retval);
    }
}

int callback(PyFrameObject *frame,
             int what,
             PyObject *arg)
{
    switch (what) {
    case PyTrace_CALL:
        {
            PyCodeObject *cobj = frame->f_code;
            PyObject *filename = cobj->co_filename;
            PyObject *funcname = cobj->co_name;

            /* ... record the function call ... */
        }
        break;

    /* ... other cases ... */

    }
}
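The same hook is reachable from pure Python through `sys.setprofile`, which makes the mechanism easy to try out. The sketch below is not how Yappi is implemented (Yappi installs its callback in C); it only mirrors the idea of recording each call by filename and function name. The functions `greet` and `main` are hypothetical.

```python
import sys
from collections import Counter

call_counts = Counter()


def callback(frame, event, arg):
    # The 'call' event is the Python-level counterpart of PyTrace_CALL.
    if event == "call":
        code = frame.f_code
        call_counts[(code.co_filename, code.co_name)] += 1


def greet(name):
    return "hello " + name


def main():
    for name in ("a", "b", "c"):
        greet(name)


sys.setprofile(callback)
try:
    main()
finally:
    # Always remove the hook so later code runs unprofiled.
    sys.setprofile(None)

greet_calls = sum(
    count for (filename, funcname), count in call_counts.items()
    if funcname == "greet")
main_calls = sum(
    count for (filename, funcname), count in call_counts.items()
    if funcname == "main")
print("greet was called %d times" % greet_calls)
```

A real profiler would also handle the return events to compute elapsed time per function; counting calls is the smallest useful piece.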

A. Jesse Jiryu Davis

@jessejiryudavis

MongoDB
