katerina-nerush
@KNerush @Volodymyrk
Clean Code in Jupyter notebooks, using Python
5th of July, 2016
Volodymyr (Vlad) Kazantsev
Head of Data @ Product Madness
Product Manager
MBA @LBS
Graphics programming
Writes code for money since 2002
Math degree
Kateryna (Katya) Nerush
Mobile Dev @ Octopus Labs
Dev Lead in Finance
Data Engineer
Web Developer
Writes code for money since 2003
CS degree
Why do we end up with messy IPython notebooks?
Coding
Stats Business
Who are Data Scientists, really?
Coding
Stats Business
“In a nutshell, coding is telling a computer to do something using a language it understands.”
Data Science with Python
It is not going to production anyway!
“Any fool can write code that a computer can understand. Good programmers write code that humans can understand” - Martin Fowler, 1999
WTF! How am I supposed to validate this??
Sorry, but how can I calculate 7-day retention?
From Prototype to ... The Data Science Spiral
Ideas & Questions
Data Analysis
Insights
Impact
You do it for your own good..
Re-run all A/B test analyses for the past months, by tomorrow
Ideas & Questions
Data Analysis
Insights
Impact
Part 2: What can Data Scientists learn from Software Engineers?
Robert C. Martin, a.k.a. “Uncle Bob”
https://cleancoders.com/
“Clean Code”?
Pleasingly graceful and stylish in appearance or manner
Bjarne Stroustrup, inventor of C++
Clean code reads like well-written prose
Grady Booch, creator of UML
.. each routine turns out to be pretty much what you expected
Ward Cunningham, inventor of Wiki and XP
One does not simply start writing clean code..
First make it work, then make it right, then make it fast and small
Kent Beck, co-inventor of XP and TDD
- Runs all the tests
- Contains no duplicate code
- Expresses all ideas...
- Minimizes classes and methods
Ron Jeffries, author of Extreme Programming Installed
Leave the campground cleaner than you found it
The Boy Scouts of America, applied to programming by Uncle Bob
I'm not a great programmer; I'm just a good programmer with great habits.
Kent Beck
“There are only two hard problems in Computer Science: cache invalidation and naming things” - Phil Karlton
long_descriptive_names
Avoid: x, i, stuff, do_blah()
Pronounceable and Searchable
revenue_per_payer vs. arpdpu
Avoid encodings, abbreviations, prefixes, suffixes.. if possible
bonus_points_on_iphone vs. cns_crm_dip
Add meaningful context
daily_revenue_per_payer
Don’t be lazy. Spend time naming and renaming things.
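A minimal sketch of the naming advice above. The two functions and the sample figures are hypothetical, made up for illustration; only the names revenue_per_payer and arpdpu come from the slide.

```python
# Hypothetical example: function bodies and figures are illustrative only.

def arpdpu(x, i):
    """Hard to read: the caller has to guess what x and i mean."""
    return x / i if i else 0.0


def daily_revenue_per_payer(daily_revenue, paying_users):
    """Same calculation, but the names carry the meaning."""
    if paying_users == 0:
        return 0.0
    return daily_revenue / paying_users


print(daily_revenue_per_payer(daily_revenue=1200.0, paying_users=300))
```

Both functions compute the same thing; only the second can be searched for, pronounced, and read without opening its body.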
“each routine turns out to be pretty much what you expected” - Ward Cunningham
Small
Do one thing
One Level of Abstraction
Have only a few arguments (one is best)
Less important in Python, with named arguments.
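One way to picture "small, do one thing, one level of abstraction" is the sketch below. The purchase data and every function name are assumptions for illustration, not taken from the talk.

```python
# Hypothetical sketch: each function does one thing at one level of abstraction.

def parse_amounts(raw_lines):
    """Turn 'user_id,amount' text lines into (user_id, amount) pairs."""
    pairs = []
    for line in raw_lines:
        user_id, amount = line.strip().split(",")
        pairs.append((user_id, float(amount)))
    return pairs


def total_per_user(pairs):
    """Sum the amounts for each user."""
    totals = {}
    for user_id, amount in pairs:
        totals[user_id] = totals.get(user_id, 0.0) + amount
    return totals


def top_spender(totals):
    """Return the user id with the highest total."""
    return max(totals, key=totals.get)


def find_top_spender(raw_lines):
    """Reads like a summary of the steps; the details live in the helpers."""
    return top_spender(total_per_user(parse_amounts(raw_lines)))


purchases = ["anna,2.5", "boris,1.0", "anna,4.0"]
print(find_top_spender(purchases))
```

Each helper takes a single argument, so the top-level function is just the three steps chained in order.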
Use good names
Avoid obvious comments.
Dead Commented-out Code
ToDo, licenses, history, markup for documentation and other nonsense
But there are exceptions..
“When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous” - Kent Beck
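A small sketch of that refactoring, assuming a hypothetical user record with total_spend and days_since_last_login fields; the discount rule is invented for the example.

```python
# Hypothetical example of replacing a comment with a well-named function.

# Before: the comment has to explain what the condition means.
def discount_before(user):
    # user is a paying customer who was active in the last 7 days
    if user["total_spend"] > 0 and user["days_since_last_login"] <= 7:
        return 0.1
    return 0.0


# After: the function name says it, so the comment becomes superfluous.
def is_active_payer(user):
    return user["total_spend"] > 0 and user["days_since_last_login"] <= 7


def discount_after(user):
    return 0.1 if is_active_payer(user) else 0.0


sample_user = {"total_spend": 12.0, "days_since_last_login": 3}
print(discount_before(sample_user) == discount_after(sample_user))
```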
// When I wrote this, only God and I understood what I was doing
// Now, God only knows
// sometimes I believe compiler ignores all my comments
/**
 * Always returns true.
 */
public boolean isAvailable() {
    return false;
}
“Long functions are where classes are trying to hide” - Robert C. Martin
Small
Do one thing
SOLID, Design Patterns, etc.
Code conventions
A team should produce code in one style, as if it were written by one person
Team conventions over language conventions, over personal ones
Automate style formatting
Part 3: How to write Clean Code in Python?
(i.e. this is not Java)
PEP 8 -- Style Guide for Python Code
● Indentation
● Tabs or Spaces?
● Maximum Line Length
● Should a line break before or after a binary operator?
● Blank Lines
● Imports
● Comments
● Naming Conventions
Example:
Good:
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
Bad:
foo = long_function_name(var_one, var_two,
    var_three, var_four)
https://www.python.org/dev/peps/pep-0008/
Google Python Style Guide
https://google.github.io/styleguide/pyguide.html
My favourite!
This is not Java or C++
Functions are first-class objects
Duck-typing as an interface
No setters/getters
Itertools, zip, enumerate
etc.
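The points above can be sketched in a few lines. All class and variable names here (Duck, Robot, Player, and so on) are hypothetical and exist only to show the idioms.

```python
from itertools import chain


class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def apply_to_all(function, items):
    """Functions are first-class objects: pass one in like any other value."""
    return [function(item) for item in items]


class Player:
    """No setters/getters: plain attributes, plus a property only where needed."""

    def __init__(self, name, level):
        self.name = name
        self.level = level

    @property
    def title(self):
        return f"{self.name} (level {self.level})"


# Duck typing: anything with a speak() method works, no shared interface needed.
sounds = apply_to_all(lambda thing: thing.speak(), [Duck(), Robot()])

names = ["anna", "boris"]
levels = [3, 7]
# zip pairs up the sequences; enumerate supplies the running index.
players = [Player(name, level) for name, level in zip(names, levels)]
ranked = [f"{rank}. {player.title}" for rank, player in enumerate(players, start=1)]

# itertools.chain iterates over several sequences as if they were one.
everything = list(chain(sounds, ranked))
print(everything)
```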
Part 4: How to write Clean Python Code in Jupyter Notebook?
Typical structure of the ipynb
1. Imports
2. Get Data
3. Transform Data
4. Modelling
5. Visualisation
6. Making sense of the data
How big should a notebook file be?
Hypothesis - Data - Interpretation
Keep your notebooks small!
(4-10 cells each)
Tip 1: break a fat notebook into many small ones
Example:
1_data_preparation.ipynb
df.to_pickle('clean_data_1.pkl')
2_linear_model.py
df = pd.read_pickle('clean_data_1.pkl')
3_ensamble.py
df = pd.read_pickle('clean_data_1.pkl')
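A rough sketch of that hand-off between notebooks. To stay dependency-free it uses the standard-library pickle module on a list of dicts as a stand-in for the pandas to_pickle / read_pickle calls on the slide; the rows and the temporary directory are assumptions for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Assumption: a temporary directory stands in for the shared project folder.
project_dir = Path(tempfile.mkdtemp())
clean_data_path = project_dir / "clean_data_1.pkl"

# --- the last cell of the data-preparation notebook would do this ---
raw_rows = [{"user": "anna", "spend": "2.5"}, {"user": "boris", "spend": ""}]
clean_rows = [
    {"user": row["user"], "spend": float(row["spend"])}
    for row in raw_rows
    if row["spend"]  # drop rows with a missing spend value
]
with open(clean_data_path, "wb") as handle:
    pickle.dump(clean_rows, handle)

# --- the first cell of each later notebook would do this ---
with open(clean_data_path, "rb") as handle:
    rows = pickle.load(handle)

print(rows)
```

Each downstream notebook then starts from the same cleaned file instead of repeating the preparation cells.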
Tip 2: shared library
Data access
Common plotting functionality
Report generation
Misc. utils
acme_data_utils/
    data_access.py
    plotting.py
    setup.py
    tests/
Tip 3: Don’t just be Pythonic. Be IPythonic
Don’t hide the “secret sauce” inside an imported module
BAD:
Good:
Clean code reads like well written prose
Grady Booch
A good Jupyter notebook reads like well-written prose
How big should one Cell be?
Tip 4: each cell should have one logical output
One “idea - execution - output” triplet per cell
Import Cell: expected output is no import errors
CMD+SHIFT+P
Tip 5: write tests .. in jupyter notebooks
https://pypi.python.org/pypi/pytest-ipynb
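A sketch of what a test cell could look like using nothing but plain assert statements. This is not the pytest-ipynb API, and the retention definition and function name are assumptions made for the example.

```python
# Hypothetical notebook cells; plain asserts, not the pytest-ipynb API.

# Cell 1: the function under analysis.
# Assumed definition: share of installed users who were active again on day 7.
def seven_day_retention(installed_users, users_active_on_day_7):
    installed = set(installed_users)
    if not installed:
        return 0.0
    returned = installed & set(users_active_on_day_7)
    return len(returned) / len(installed)


# Cell 2: a test cell; its one expected output is "no AssertionError".
assert seven_day_retention(["a", "b", "c", "d"], ["a", "c", "x"]) == 0.5
assert seven_day_retention([], ["a"]) == 0.0
print("retention tests passed")
```

Keeping the test cell right below the function means Restart & Run All doubles as a test run.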
Tip 6: ..to the cloud
Code Smells .. in ipynb
- Cells can’t be executed in order (with Run All and Restart & Run All)
- Prototype (idea-checking) code is mixed with “analysis” code
- Debugging cells
- Copy-paste cells
- Duplicate code (in general)
- Multiple notebooks that re-implement the same function
Tip 7: Run notebook from another notebook!
analysis.ipynb
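The slide does not spell out the mechanism, so the following is only one dependency-free way to sketch the idea: read the notebook's JSON and execute its code cells in order. The run_notebook helper is hypothetical, and it assumes nbformat 4 files whose cells contain plain Python (no % magics or ! shell commands).

```python
import json
import tempfile
from pathlib import Path


def run_notebook(notebook_path, namespace=None):
    """Execute the code cells of an .ipynb file in order; return the namespace.

    Assumes nbformat 4 JSON and plain Python cells (no % magics, no ! commands).
    """
    namespace = {} if namespace is None else namespace
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        # The format allows source to be a single string or a list of lines.
        code = source if isinstance(source, str) else "".join(source)
        exec(compile(code, str(notebook_path), "exec"), namespace)
    return namespace


# Build a tiny stand-in for analysis.ipynb so the sketch is self-contained.
demo_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Analysis"]},
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": ["revenue = [10, 20, 30]\n", "total = sum(revenue)"],
        },
    ],
}
notebook_path = Path(tempfile.mkdtemp()) / "analysis.ipynb"
notebook_path.write_text(json.dumps(demo_notebook), encoding="utf-8")

results = run_notebook(notebook_path)
print(results["total"])
```

The calling notebook gets the variables the other notebook defined, which is the point of the tip: reuse an analysis without copy-pasting its cells.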
Make a Data Product from notebooks!
Summary: How to organise a Jupyter project
1. A notebook should have one Hypothesis-Data-Interpretation loop
2. Make a multi-project utils library
3. A good Jupyter notebook reads like well-written prose
4. Each cell should have one and only one output
5. Write tests in notebooks
6. Deploy a shared Jupyter server
7. Try to keep code inside notebooks. Avoid refactoring to modules, if possible.