Leveraging Open Technologies Pragmatically Within a Traditionally Closed Ecosystem | AnacondaCON...

Leveraging open technologies pragmatically within a traditionally closed ecosystem

Dharhas Pothina, US Army Engineer Research and Development Center

Some History
• 5 years university research
• 10 years state government
• 3 years federal government

Started out with mostly in-house codebases plus proprietary tools and some scripting for automation / data transformation

My workflow circa 2008
• bash
• perl
• awk/sed
• fortran
• c

Artisanal data science workflows are fragile and ineffective

Image credit: Quilted Northern April Fools

Why Python?

Transitioning was easy
• I could understand the programs I read
• Had the scientific libraries I needed
• Could interoperate with everything in my processing pipeline
• Had powerful data structures and language features
• Great community support

I tried learning Java 3 times in my career. Python was nicer.

Python Scales

• Easy things are easy
• Complex things are sensible
• Hard things are possible

• Non-Technical User/Analyst
• Data Scientist/Engineer
• Software Developer

PYTHON IS OPTIMIZED FOR HUMAN PRODUCTIVITY RATHER THAN MACHINE PRODUCTIVITY

Image credit: Sonny Abesamis (CC BY 2.0)

Closed Ecosystems are Resource Limited
• Limited staff
• Limited time
• Limited expertise
• Limited funding

so stop building your own machine learning library and use your limited resources on mission-critical activities instead

Reduce License Friction*
• impacts development speed
• impacts agility/trying new things
• impacts deployment
• impacts scaling

whenever possible avoid proprietary tools*

* If you work for state/federal agencies, or anywhere with a long procurement process

internal teams cannot match the resources of the open data science community (neither can commercial vendors)

Build a layer, not an internal platform

Internal Software

or you will own that puppy…

Image credit: Marcos Leal (CC BY 2.0)
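One way to picture the "layer, not platform" idea is the sketch below. It is only an illustration under stated assumptions: the in-house file format, the sample data, and the function names `read_inhouse_gauge_file` and `mean_by_station` are all hypothetical and do not come from the talk. The internal code is limited to parsing the organization-specific format; everything after that is delegated to existing open tools (here, the Python standard library).

```python
import csv
import statistics


def read_inhouse_gauge_file(text):
    """Thin internal layer: parse a hypothetical in-house text format.

    Assumed format: lines starting with '#' are comments, every other
    non-empty line is 'station,value'.
    """
    rows = (
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    )
    # csv.reader accepts any iterable of strings, so no custom parser is needed.
    return [(station.strip(), float(value)) for station, value in csv.reader(rows)]


def mean_by_station(records):
    """Analysis step: grouping is plain Python, the math comes from `statistics`."""
    grouped = {}
    for station, value in records:
        grouped.setdefault(station, []).append(value)
    return {station: statistics.mean(values) for station, values in grouped.items()}
```

The only code the team has to own here is the small reader. If the in-house format changes, that one function changes, and the rest keeps relying on maintained open libraries.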

Risks

Be very selective
• Bus Factor + Code Complexity
• Software Ecosystem
• Code Quality
• Python 3 compatibility
• Continuous Integration
• Cross Platform Compatibility
• License – BSD, MIT, Apache

understand your dependencies
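As a rough starting point for the license item in the checklist above, the sketch below lists installed Python packages and flags any whose declared license does not mention BSD, MIT, or Apache. It is a heuristic under stated assumptions, not a legal check: the function names `classify_license` and `audit_installed` are made up for this example, the keyword match is deliberately crude, and package metadata is often incomplete or inaccurate.

```python
import re
from importlib import metadata

# The three license families named on the slide; extending or narrowing
# this list is a policy decision (assumption for this sketch).
PERMISSIVE = re.compile(r"\b(bsd|mit|apache)\b", re.IGNORECASE)


def classify_license(license_field, classifiers):
    """Return 'permissive' if the declared metadata mentions BSD, MIT, or Apache.

    Anything else, including missing metadata, is marked 'review' so a
    person looks at it.
    """
    candidates = [license_field or ""]
    candidates += [c for c in classifiers if c.startswith("License ::")]
    if any(PERMISSIVE.search(text) for text in candidates):
        return "permissive"
    return "review"


def audit_installed():
    """Map each installed distribution name to a verdict."""
    report = {}
    for dist in metadata.distributions():
        md = dist.metadata
        name = md.get("Name") or "unknown"
        license_field = md.get("License-Expression") or md.get("License")
        classifiers = md.get_all("Classifier") or []
        report[name] = classify_license(license_field, classifiers)
    return report


if __name__ == "__main__":
    for name, verdict in sorted(audit_installed().items()):
        if verdict == "review":
            print(f"{name}: license needs a manual look")
```

A script like this only covers the license bullet. Bus factor, code quality, and ecosystem health still need a human to read the project's repository and history.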

Packaging is hard (we use )

Packaged by Continuum

Packaged by Community

Internal, Secret & Export Restricted

Should you make internal code open?
• Can (but may not) gain you external contributors
• Takes effort
  • Refactoring/Clean Up
  • Documentation
  • Legal Review
  • Tests/Continuous Integration
• Social contract
• Gains you the open infrastructure ecosystem – ci, github, conda-forge, etc

most of the steps you need to convert a tool to be open are the same ones you need to make it useful across your own organization
