Julia + R for Data Science

Page 1: Julia + R for Data Science

Julia Computing

Julia + R for Data Science

Stefan Karpinski

Center For Data Science

Page 2: Julia + R for Data Science

In a single project, I was using:

‣ Matlab for linear algebra

‣ R for stats & visualization

‣ C for the fast stuff

‣ Ruby to tie it all together

My Data Science Stack circa 2009

Page 3: Julia + R for Data Science

What is the “Two Language Problem”?

a.k.a. “Ousterhout’s dichotomy”

“systems languages”    “scripting languages”
static                 dynamic
compiled               interpreted
user types             standard types
fast                   slow
hard                   easy

Page 4: Julia + R for Data Science

What is the “Two Language Problem”?

Because of this dichotomy, a two-tier compromise is standard:

‣ for convenience, use a scripting language (Python, R, Matlab)

‣ but do all the hard stuff in a systems language (C, C++, Fortran)

Pragmatic for many applications, but has drawbacks

‣ aren’t the hard parts exactly where you need an easier language?

‣ forces vectorization everywhere, even when awkward or wasteful

‣ creates a social barrier – a wall between users and developers
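
The vectorization drawback above can be sketched in plain Python. This is an illustration, not code from the talk: the function names are hypothetical, and the speed argument assumes CPython, where `sum` and `map` iterate in C. A dot product fits the "push the loop into the fast layer" style; an exponential moving average carries a dependency from one step to the next, so it is most naturally written as the explicit loop that the scripting layer runs slowly.

```python
import operator


def dot_loop(xs, ys):
    """Dot product as an explicit interpreter-level loop."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_builtin(xs, ys):
    """Same result, with the iteration done inside builtins (C in CPython)."""
    return sum(map(operator.mul, xs, ys))


def ema_loop(xs, alpha):
    """Exponential moving average.

    Each output depends on the previous output, so this does not
    reduce to a single elementwise operation over the inputs.
    """
    out = []
    prev = xs[0]
    for x in xs:
        prev = alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out
```

Both dot-product versions return the same value; only where the loop runs differs. The moving average has no equally direct builtin form, which is the kind of case where the two-tier compromise becomes awkward.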

Page 5: Julia + R for Data Science

The Julian Unification

Ousterhout’s dichotomy

static dynamic

compiled interpreted

user types standard types

hard easy

fast slow

Page 6: Julia + R for Data Science

The Julian Unification

Julia:

dynamic and compiled

user types and standard types

easy and fast

Page 7: Julia + R for Data Science

Speed

[Figure: plot of time relative to C]

Page 8: Julia + R for Data Science

Speed vs. Productivity

[Figure: plot of time relative to C (y-axis) against normalized lines of code (x-axis), with Julia labeled]

Page 9: Julia + R for Data Science

Rube Goldberg Revised

Here’s my data science stack today:

‣ Matlab → Julia for linear algebra

‣ R → Julia for stats & visualization

‣ C → Julia for the fast stuff

‣ Ruby → Julia to tie it all together

Page 10: Julia + R for Data Science

Why Try Julia?

‣ Because it’s fun!

some people enjoy trying out and learning new languages

‣ You’re in a world of pain with other tools

R / Python / Matlab isn’t fast enough for the work you do

Rcpp / Cython / C++ are unappealing, not productive enough

Julia is in the sweet spot for speed & high productivity

‣ For the fancy features

multiple dispatch, coroutine-based I/O, macros & metaprogramming, efficient custom types, advanced linear algebra, …