Julia + R for Data Science
Stefan Karpinski
Center for Data Science
My Data Science Stack circa 2009
In a single project, I was using:
‣ Matlab for linear algebra
‣ R for stats & visualization
‣ C for the fast stuff
‣ Ruby to tie it all together
What is the “Two Language Problem”?
a.k.a. “Ousterhout’s dichotomy”
“systems languages” vs. “scripting languages”:
‣ static vs. dynamic
‣ compiled vs. interpreted
‣ user types vs. standard types
‣ fast vs. slow
‣ hard vs. easy
Because of this dichotomy, a two-tier compromise is standard:
‣ for convenience, use a scripting language (Python, R, Matlab)
‣ but do all the hard stuff in a systems language (C, C++, Fortran)
Pragmatic for many applications, but has drawbacks
‣ aren’t the hard parts exactly where you need an easier language?
‣ forces vectorization everywhere, even when awkward or wasteful
‣ creates a social barrier – a wall between users and developers
The Julian Unification
Ousterhout’s dichotomy:
‣ static vs. dynamic
‣ compiled vs. interpreted
‣ user types vs. standard types
‣ hard vs. easy
‣ fast vs. slow
Julia combines both sides:
‣ dynamic and compiled
‣ user types and standard types
‣ easy and fast
Speed
[Figure: time relative to C]
Speed vs. Productivity
[Figure: time relative to C vs. normalized lines of code, with Julia labeled]
Rube Goldberg Revised
Here’s my data science stack today:
‣ Julia (was Matlab) for linear algebra
‣ Julia (was R) for stats & visualization
‣ Julia (was C) for the fast stuff
‣ Julia (was Ruby) to tie it all together
Why Try Julia?
‣ Because it’s fun!
some people enjoy trying out and learning new languages
‣ You’re in a world of pain with other tools
R / Python / Matlab isn’t fast enough for the work you do
Rcpp / Cython / C++ are unappealing, not productive enough
Julia is in the sweet spot for speed & high productivity
‣ For the fancy features
multiple dispatch, coroutine-based I/O, macros & metaprogramming, efficient custom types, advanced linear algebra, …