Kirill Eremenko: This is episode number 263 with founder at Kyso.io,
Eoin Murray.
Kirill Eremenko: Welcome to the SuperDataScience podcast. My name
is Kirill Eremenko, Data Science Coach and Lifestyle
Entrepreneur and each week we bring you inspiring
people and ideas to help you build your successful
career in data science. Thanks for being here today
and now let's make the complex simple.
Kirill Eremenko: This episode of the SuperDataScience podcast is
brought to you by our very own Data Science Insider.
The Data Science Insider is a weekly newsletter for
data scientists, which is designed specifically to help
you find out what have been the latest updates and
what is the most important news in the space of data
science, artificial intelligence and other technologies. It
is completely free and you can sign up at
superdatascience.com/dsi. And the way this works is
that, every week there's plenty of updates and
seemingly important information coming out in the
world of technology. But at the same time it is virtually
impossible for a single person, on a weekly basis, to go
through all this and find out what is actually really
relevant to a career of a data scientist and what is
actually very important. And that's why our team
curates the top five updates of the week, puts them
into an email and sends it to you.
Kirill Eremenko: So once you sign up for The Data Science Insider,
every single Friday you will receive this email in your
inbox. It doesn't spam your inbox; it just arrives and
has the top five updates with brief descriptions. And
that's what I liked the most about it, the descriptions.
So you don't actually even have to read every single
article. So, our team has already read these articles for
you and put the summaries into the email, so you can
simply just read the updates in the email and be up to
speed in a matter of seconds.
Kirill Eremenko: And if you like a certain article, you can click on it and
read into it further. And so whether you want great
ideas that can be used to boost your next project, or
you're just curious about the latest news in
technology, The Data Science Insider is perfect for you.
So once again, you can sign up at
www.superdatascience.com/dsi. So make sure not to
miss this opportunity and sign up for The Data
Science Insider today. And that way you will join the
rest of our community and start receiving the most
important technology updates relevant to your career
already this week.
Kirill Eremenko: Welcome back to the SuperDataScience podcast ladies
and gentlemen, super excited to have you back here on
the show. And I literally just got off the phone with
Eoin Murray, who is one of the founders at Kyso.io.
Kyso.io is an amazing tool which you will love hearing
about. It's a platform where you can blog about your
data science projects using tools such as Jupyter
notebooks. So it really makes sharing of projects very
easy and creates a fantastic user experience for the
readers who are going to be reading your projects. And
this all ties in very well with the whole notion of
building your online presence and online portfolio in
order to progress your career forward and to impact
people, to help people and make a statement out there
in the world.
Kirill Eremenko: So I'm very excited about this product, not just the
podcast, but Kyso.io, I think it's a really cool thing and
in fact the base version is actually free, free forever as
you'll see on the website. So I'm sure you guys will love
checking it out. And what we talked about on this
podcast: we started off with some very interesting
conversations about startups and how you can jump
into creating a startup, what accelerators are, what
angel investors are, what venture capital funds are,
what Eoin's journey has been like in that process. So
this is the second company that he's founded. He's a
serial entrepreneur. He's been through the Techstars
accelerator. He'll tell you all about what it was like
there. What mentor madness is, what you get out of
these experiences in the startup world. So if you are
interested in or even considering at some point, maybe
down in the future, to get into a startup or create a
startup, I think this will be very interesting to hear
about.
Kirill Eremenko: Then we talked about Kyso.io, the actual website and
product that they've created and what it means for
data scientists and how it is actually so important to
communicate data science insights in a non-complex
way and how Kyso facilitates that journey. I
recommend it because I think Kyso has got a bright
future. It's like Github, but with a lot of additional
layers that make the experience really cool. Plus it has
integrations with Github anyway. So I think you'll find
it interesting. Kyso has probably got a very bright future
ahead and you'll be one of the first people to hear
about it on a podcast. And finally at the end we talked
about Eoin's other interests. So Eoin is a really
interesting person. He used to do quantum computing,
he's worked on really cool projects. So we talked about
his view of where data science is going, what the
future's like, whether or not data science should be a
certified profession.
Kirill Eremenko: And he gave us an example of a project from his past
life dealing with the E. coli bacteria using lasers and
data science. So I think you'll find that interesting. On
that note can't wait for you to check out this podcast.
And without further ado, I bring to you the founder at
Kyso.io, Eoin Murray.
Kirill Eremenko: Welcome back to the SuperDataScience podcast ladies
and gentlemen. Super excited to have you on this
show today because I've got a very exciting and
interesting guest calling in all the way from Valencia,
Spain, Eoin Murray. Eoin, how are you going today?
Eoin Murray: I'm brilliant Kirill. Thanks for having me on this show.
Kirill Eremenko: It's my pleasure. I've heard a bit about your work and
we were introduced by Raul Popa who's been on the
podcast before, so I'm very excited about the things
we're going to talk about. How did you end up in
Valencia? I've never asked you this. Like you're from
Ireland, what are you doing in Valencia?
Eoin Murray: Oh, cool. So my co-founder, Elena, is Spanish. And we
actually founded Kyso in Andalusia in Spain. And then
we moved to New York for a bit to do Techstars New
York City. That's where I met Raul, who was on the
show with TypingDNA. So Techstars is a program
where, if you're starting a startup, they will give you
some investment and tons of advice. And you go in
and you grow really fast and then you maybe raise
some more money from investors, and then you either
stay in New York or you go back to wherever you were
based previously. So we came to Valencia because it's
a great place to live. It's next to the beach and the
Internet connection is outstanding. And yeah, it's a
really good place to start a company.
Kirill Eremenko: Got you. Did you, by the way, like I was learning
Spanish a couple of weeks ago and I noticed that they
don't pronounce the letter V. So for them, Valencia is
the same as "Balencia". Do you hear that?
Eoin Murray: Yeah. It can be confusing sometimes. And then you
have different regions of Spain with quite different
Spanish. So in Barcelona, they'll say Barcelona, but
then in Andalusia they'll say, Barcelona.
Kirill Eremenko: Barcelona, yeah. Yeah. It's Catalan versus, what's
the other one?
Eoin Murray: The Castilian.
Kirill Eremenko: Castilian.
Eoin Murray: It's the Spanish that you'd maybe just call Spanish.
Kirill Eremenko: Yeah. Got you. So Techstars, that's... First of all,
congratulations. That's really cool. We'll talk about
Kyso in a second, but Techstars, just so I understand
that better. So there's angel investors and there's
venture capital funds, like angel investors come
earlier, venture capital funds come later. Where is
Techstars or, you mentioned before the podcast, it's
similar to Y Combinator, where do those types of
companies sit? Close to angel investors or venture
capitalists?
Eoin Murray: So Techstars would typically be your first investment
or very, very close to your first investment. So when we
did Techstars, there were 12 companies in our batch.
So they do the program twice a year in many different
cities around the world. For many programs, they'll
do it twice and they'll have maybe 11 to 13
companies in each batch. And when we went there, we
were very early stage, so we didn't have revenue. We
had only really started the product. There were a few companies
in the batch who actually hadn't even started building
their product by the time they got in. However, there
were some companies who were doing, like, half a
million in revenue so far that year. So there's a mix.
But it's like they're typically very early. I think the
traditional thing is you come in to Techstars when
you've released your product maybe and you're ready
to grow it really fast and they'll give you tons of
support to grow it really fast.
Eoin Murray: And then at the end of the program, after the end of
the three month program, there's a demo day or an
investor week where they'll sit you down with 30 or 40
venture capitalists and angel investors and you try to
raise more money.
Kirill Eremenko: So they come even before the angel investors.
Eoin Murray: Yeah. Well, yeah, roughly speaking. As always. I mean,
every company is unique. Everybody has a unique
story behind it.
Kirill Eremenko: So, and was it hard to get into Techstars? Was it
like, the [mental 00:10:06] prerequisites, or a difficult
screening process?
Eoin Murray: It's quite a selective program. I think for ours it's
maybe 12 companies out of 1500 applications or
something, but there's a lot of other accelerators in
the world. So anybody who's listening, who's interested
in startups, it's a pretty good way to get your startup
off the ground. Especially if you're thinking of starting
a startup. Maybe you have a job and you're thinking
this is something you might enjoy doing. Accelerators
are a really good way to de-risk the idea for yourself.
Techstars is a really good one. It's a very famous one.
It was hard to get into. We were lucky because both
myself and my co-founder are technical, so we can
code and we've had experience in data science, which
is Kyso's area, and we had also started a company
before and raised money for a company before. So I
think that gave us a bit of an edge up.
Kirill Eremenko: So like, they could see you know what you're doing?
Eoin Murray: Yeah. Yeah. But I mean in general accelerators are a
really good way for anybody to kind of, even the
interview process helps you refine your idea and let
you know if you actually want to pursue something. So
if anyone's like listening, I would definitely say like if
you have an interest in a startup, even I'm from a
small city in Ireland. Ireland has a population of 4
million people and I think there's like 12 or 15
accelerators in the country now, that you can apply to.
And then there might be a country nearby you so, if
you're in EU, there's plenty of accelerators you can
apply to. So you just chat to loads of people and see if
you get into one.
Kirill Eremenko: How come you went to the one in New York City then?
Eoin Murray: We got into a couple around Europe and even one in
Hong Kong. Alex Iskold was the guy who ran that
program, and he was just really, really helpful even in
the interview process. And he liked strong
technical skills. So he knew what we were about. It
depends on the person, each accelerator is very
unique. So Techstars even runs many programs, but
who the specific team in your program is
will completely change the experience that you have,
so we were just like drawn to the program Alex had set
up and that worked for us.
Kirill Eremenko: Interesting. And so once you get in and then you get
there, is it like a several week process? What is the
program? How is the program structured?
Eoin Murray: Yeah, so I think, each person or each MD or managing
director of each program will have a specific flavor. So
I know for example, Techstars and Y Combinator, have
a quite different philosophy. So Y Combinator takes in
about a hundred companies into a batch and they
basically say, "Come in and talk to us once a week.
But other than that you should be living and working
in your flat, coding and building product every day."
Techstars is a little different. So what they do is when
you come in, they do like what they call, mentor
madness. So it's like a two-week process where they will
find many, many experienced venture capitalists,
experienced founders, experienced product people or
experts in various sectors and sit you down and you
have like half an hour meeting with about five people a
day.
Eoin Murray: And then you pitch your idea and then they all give
you feedback. And they do that in the first two weeks
and you definitely, after the first two weeks, will have
either refined or changed your idea a little bit. And
that's the first two weeks of
three months. Then the rest of the program is basically
you set a weekly target and you do whatever you need
to hit that target. That can be building product, that
can be doing sales calls, that can be going to meet
customers. And you do that to the end of the program.
And then the last two weeks is trying to raise more
money. And there's a lot of workshops along the way.
And then there's like you have a meeting with mentors
every week to kind of help you solve whatever specific
problem you're facing right now.
Kirill Eremenko: Gotcha. And all they request in return is a share of
your product.
Eoin Murray: Yeah. So Techstars is like, they give you about a
hundred thousand dollars of investment. And then for
that investment plus the program, they take about 8%
of your business.
Kirill Eremenko: Oh, okay. Well that's not too bad at all. Good. I think
that's pretty fair.
Eoin Murray: I mean, if you think of it, you're coming into the program
with a certain valuation and you leave with a higher
one, so you've already personally made money by the end
of the program, if you think of what that valuation is.
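As a rough illustration of the arithmetic behind those figures, the sketch
below treats the whole $100,000 as a straight purchase of 8% of the company.
That is a simplifying assumption (real accelerator deals can be structured
differently), and the function name is made up for the example.

```python
def implied_post_money_valuation(investment: float, equity_fraction: float) -> float:
    """Valuation implied by selling `equity_fraction` of a company for `investment`."""
    if not 0 < equity_fraction <= 1:
        raise ValueError("equity_fraction must be in (0, 1]")
    return investment / equity_fraction


# Approximate figures quoted in the conversation.
investment = 100_000      # dollars invested
equity_fraction = 0.08    # share of the business taken

post_money = implied_post_money_valuation(investment, equity_fraction)
pre_money = post_money - investment          # value before the cash went in
founders_share = 1 - equity_fraction         # what the founders keep

print(f"Implied post-money valuation: ${post_money:,.0f}")
print(f"Implied pre-money valuation:  ${pre_money:,.0f}")
print(f"Founders keep {founders_share:.0%} of the company")
```

Under these assumptions the implied valuation is roughly $1.25 million, so
any later funding round priced above that means the founders' remaining
stake is worth more on paper than when they entered the program.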
Kirill Eremenko: Yeah. But like as you say, the connections you make
and the learning you experience throughout the
process is invaluable.
Eoin Murray: I mean, it's ridiculous how much you gain. Even
personally.
Kirill Eremenko: Yeah. No wonder there's so many applications, 1200
and only 12 or something that get in. That's crazy.
Crazy one out of 100 makes it.
Eoin Murray: We were quite lucky.
Kirill Eremenko: What would you say contributed to this success of
getting through? Was it like you knowing somebody or
something about your idea or your application?
Eoin Murray: Oh yeah. This is actually a funny one because it was
actually at my first startup, which I started in the UK
and I was trying to scramble. So at that point I really
didn't know what I was doing but I would take a
meeting with anybody and I think that that was the
right approach. And I ended up basically, when I was
trying to raise money for my first company,
talking to John Bradford in the UK, who actually
previously ran Techstars London. I got onto him and
he was trying to give me advice, giving me funding
introductions in the UK, none of which panned out in
[inaudible 00:16:11], a lot of money for that. But then
later on he gave me another connection who then gave
me another connection who then put me in touch with
Alex Iskold, really kind of like, and when I first met the
guys, I wasn't thinking of how this would pan out almost
two years later, that I'd be able to follow a network route
to Alex who then led us into Techstars.
Eoin Murray: And another point I think is important to make is we
applied really, really early. Maybe Alex was running
maybe a two month application process, I'd say we
spoke to him about two weeks before he really started
doing that. And that helped us get in because, him
and the team were not talking to too many other
companies at that point. There were still open spaces,
maybe if we had applied in the last week of the
application window, it would have been a lot harder.
Kirill Eremenko: Got you. What is very impressive to me is that you
mentioned you not only got into that New York, NYC
chapter, you got into a couple of other ones in Europe,
are they all linked? Or [crosstalk 00:17:21]
Eoin Murray: No, no, it was just other, different, unconnected
accelerators. So basically myself and Elena, my
co-founder, were based in Spain, running out of
whatever little money we had that we were funding
ourselves with, and we needed to raise money. So we
applied to everything and got into some things and
then chose Techstars.
Kirill Eremenko: Got you. Understood. Okay. Wow. Well thank you very
much for the rundown. I'm sure if anybody's looking to
get into startup now, they're very well equipped with
the whole process that accelerators follow, how to get
in on that.
Kirill Eremenko: And on that note, tell us about Kyso. Like I think
there's so much anticipation built up now. You got to
tell us what this idea is and guys listen up this is
pretty crazy. It's really data science related, relevant.
And I'm like really sure a lot of you are going to be
using this after this podcast. So please Eoin take it
away.
Eoin Murray: So very, very simply, Kyso is a place where you can
blog your data science. So if you have a chart that you
want to share or a dataset, or you want to write an
article, a data journalism article, you can post all of
this to Kyso. So it's like Medium, but we want to focus
on data science. And to make that even easier for data
scientists, we actually support a lot of the data
science tools. So for example, Jupyter notebooks or
Markdown notebooks. So what that means is that, so
a Jupyter notebook is like a really, really common
data science tool where it's an interactive coding
environment where you type code into a cell, you
evaluate that code and the results appear to you live in
the document. So this is super useful if you're
visualizing data.
Eoin Murray: So even if you're making a line chart, you just type in
the code, evaluate the cell, the chart appears inside
the document. I used to work with these so much in
my past career. And they were a little bit difficult to
share. So you can share them for example, on Github.
But then they look like this kind of technical
document where the code and like any terminal output
is all visible. What we do in Kyso is we just hide the
code by default. Now you can click a button to see it
again, but you basically take your hardcore data
science document, upload it to Kyso and it just looks
like a blog post. So why it's so useful is
because you can be writing a technical document and
then you can trivially share it with a non-technical
audience without needing to do any extra work.
Kirill Eremenko: That is really cool. And for those listening who, if
you've taken our Python A-Z Course, that whole
course is done in Jupyter notebooks. And in fact,
Jupyter notebooks is a very powerful tool. It's like
some of the big companies like Google, Facebook and
so on, use Jupyter notebooks for some of their work.
And you can do end to end even deep learning and AI
in Jupyter notebooks. So if you haven't heard of
Jupyter notebooks then definitely check it out. It's a
really cool place where you can not only just code,
what I like about it is that you can not only just code, but
along the way you can write comments, can annotate
things, and what Eoin and the team at Kyso have
created is that you just like upload your Jupyter
notebook and it renders really beautifully into
something that people can read and the user
experience is really cool.
Eoin Murray: One of the things, I guess, when I was learning
Python and data science in the beginning, that I found super
useful was this point where you type into a cell,
like you type code into one box and evaluate
that, and it just really allows you to interactively play
with your code. You know what I mean? And you learn
a lot faster and a lot more, and you can do
super cool things. Like if you tab, is it command tab,
when you're on, say if I'm using pandas and I go
pandas.DataFrame and I'm like, what are the docs for
DataFrame? What's the order of the arguments? I can
either Google that or I can actually like do command
tab and just like, a little pop up appears with all the
documentation for that specific function. It's just a
really, really helpful way to get started in data science.
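A minimal sketch of the kind of lookup described here, using only Python's
standard library: it prints a callable's argument order and the first line
of its docstring. The helper name `quick_docs` is made up for this example,
and `json.dumps` stands in for `pandas.DataFrame` so the snippet runs without
pandas installed. In Jupyter itself the pop-up is usually triggered with
Shift+Tab, and typing `pandas.DataFrame?` in a cell shows the same
documentation.

```python
import inspect
import json


def quick_docs(obj) -> str:
    """Return 'name(signature)' plus the first docstring line for a callable."""
    name = getattr(obj, "__qualname__", repr(obj))
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        # Some built-in callables do not expose a signature.
        signature = "(signature unavailable)"
    doc = inspect.getdoc(obj) or ""
    first_line = doc.splitlines()[0] if doc else "(no docstring)"
    return f"{name}{signature}\n{first_line}"


# json.dumps is a stand-in; with pandas installed you could pass
# pandas.DataFrame instead to see its argument order.
summary = quick_docs(json.dumps)
print(summary)
```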
Eoin Murray: And then it's cool because it's actually still the tool you
will use when you're an expert in data science, when
you're doing it day to day.
Kirill Eremenko: How'd you come up with this idea?
Eoin Murray: So, in a past life I worked in science. So I used to work
as a quantum computing researcher in Ireland and
then in the UK. And basically the workflow that we
had was, we would design the chip, then bring it to the
lab. So these chips were interesting because a typical
computer chip runs on electricity. These would run on
light. So we would use optical fibers and plug light
into these chips. And then we would measure the
spectrum or various pieces of data about these chips.
And then maybe me and other people on the team
would take the data and have to process the data,
maybe make a graph of the spectrum, graph of the
temperature, see how it's working, and then share
those graphs with the rest of the team, so that then we
could like analyze yesterday's experiment to design a
new chip for next week. Does that make sense? And
we played with a lot of tools, so I mean, you can
always import your data into Excel. But that quickly
just wasn't quite powerful enough for all of the
customized analysis that we needed to do.
Eoin Murray: So we stumbled upon Jupyter notebooks. And it's
such an amazing tool for this where you can write your
comments, you can format the document, you can
have all of your plots and graphs in the document. But
we just found them a little difficult to share and a little
difficult to reuse. So if for example, if we're
collaborating on some projects and I'm doing a
notebook today and then next week you want to use it,
I mean you can use Github and it's currently, that's
currently a good way to reuse them. But maybe if you
want to take a snippet and you need to be able to
discover and see and read my documents or my
notebooks before you'll know exactly what you want to
reuse. So we found that a little difficult.
Eoin Murray: And then I went to the UK. I was on a big team there
and we had similar problems. So it was always in the
back of my head. I wanted to do something around
making these Jupyter notebooks easier to share. And
just in general, making it easier to communicate data
science, because that's what these Jupyter notebooks
are, they're communication tools. Which is the most
important part of data science in my opinion. So like,
you know the phrase, if a tree falls in the forest and
nobody is around to hear it, does it even make a
sound? It's the exact same thing. If you gain an insight
from data and you don't tell anybody, did you even
gain that insight? Did it even matter?
Kirill Eremenko: True.
Eoin Murray: Communication is the key point. And that's why this
technology is really useful.
Kirill Eremenko: Okay. Got you. And so would you say that that's the
main difference between Github and Kyso, that you
can actually, as opposed to like forking a whole
repository on Github, you can just read through the
document, the Jupyter notebook on Kyso and select
the elements that you want or are there any other
differences?
Eoin Murray: The big one is that you can choose to show and hide
the code for the Jupyter Notebook. So what that
means is that, I can be writing an extraordinarily deep
document with highly technical code about how to
process a piece of data. But then if I write my
comments properly and my output graphics look really
nice, when I upload it to Kyso showing the code is
optional. So if you have the code hidden, the Jupyter
notebook just looks like a blog post, it's just text,
graphs, more text, more graphs, so you can read it.
So a nontechnical person can come along and read it
depending on what the comments you've written are.
But if someone technical comes along and they see a
graph or they see a technique that they really like
because of how you've explained it, they can just click
a button and show the code and it'll show them the
code that, let's say, generated that graph or did that piece
of processing. Does that make sense?
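To make the show/hide idea concrete, here is a small sketch that works on
the notebook file format itself. An .ipynb file is JSON with a list of
cells, and each code cell stores its source separately from its outputs, so
a reader view can drop the source and keep the text and results. This is
only an illustration of the concept, not Kyso's implementation; the
`hide_code` helper and the tiny sample notebook are made up for the example.

```python
import copy
import json


def hide_code(notebook: dict) -> dict:
    """Return a copy of a notebook dict with the source of code cells blanked.

    Markdown cells and the outputs of code cells are left untouched.
    """
    reader_view = copy.deepcopy(notebook)
    for cell in reader_view.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["source"] = []
    return reader_view


# A tiny hand-written notebook in the nbformat 4 layout (illustrative only).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# CO2 by country\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["print('tons of CO2')\n"],
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["tons of CO2\n"]}]},
    ],
}

reader_view = hide_code(notebook)
print(json.dumps(reader_view["cells"][1], indent=2))
```

Because the original notebook is left unmodified, showing the code again is
just a matter of rendering the original instead of the reader view. For a
local export with a similar effect, `jupyter nbconvert --to html --no-input`
writes an HTML page without the input cells.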
Kirill Eremenko: Yeah. Very cool. So it's almost like a conspiracy.
Somebody might end up on Kyso by accident and it
looks like a regular blogging platform, but in
reality, it's data scientists just having fun.
Eoin Murray: Actually that's something that we were surprised by
and we've actually had to work on. So, in the
beginning, data scientists were coming to Kyso and
they were like, "This is a really interesting article," and we
would be like, "Did you know that's actually a Jupyter
Notebook?" They're like, "Whoa, no way." Because it
wasn't obvious enough. It just looked too much like a blog
post.
Kirill Eremenko: Yeah. That's very cool. So, what I really wanted to say
is, I really like this idea for enabling people to build
their online portfolios and presence. For me, this has
been, people come and ask questions, how do I build a
career in data science? How do I advance my career?
How do I get a promotion? How do I break into this
field? And my answer is always, "What is your online
presence? Do you have projects that you've shared?
Have you gone and published a Tableau Public
workbook? Do you have code on Github? Do you have
articles on Medium? Do you have articles on Linkedin?
What are you doing to share this knowledge, to show
people out there that you are capable and the projects
that you're working on? Have you done Kaggle
competitions?" And like Kyso in that sense, the way I
see it, is an ideal place to go and share those projects
that you're working on in your free time. In order to
just have that portfolio, first of all, other people can
learn from you and ask you questions and you can
explain things and learn it even better.
Kirill Eremenko: But on the other hand as well, so that either recruiters
or employers or your employer or your manager,
people can actually see that you are an expert in this
field and you're not afraid to position yourself as
one, or you're learning and you're going to be an expert.
Basically, they can see the passion of you putting time
and effort into this. And that speaks a lot, like with
data science becoming so popular and the size of the
salaries going through the roof, there's a lot of people
who want to get in, but the people that make the best
data scientists are the ones that are actually
passionate about the field, that are not just like
talking about it. And one way to demonstrate it is
through something like Kyso.io. So, I just want to
thank you on behalf of our audience that you're
enabling this movement and people to share their work
like that.
Eoin Murray: Yeah, no, Kirill. I really agree with that and I think
that actually is like a secret weapon that data
scientists have is that, and this is really something we
want to drive home is that, because here at Kyso you
can share with a nontechnical audience. And what
we've noticed actually is that a lot of the content
shared on Kyso is very conversational, right? So, if you
have a really nice Linkedin profile, you might get a
message from a recruiter who will then put you in
touch with the technical recruiter at a company, for
example. And the first recruiter might not be a
technical person. Right? And then if they're looking at
your Github profile and everything you've published
looks very technical and code-y, it's hard for them to
parse it, whereas with these kinds of notebooks that we
see people publishing on Kyso, they're very
conversational.
Eoin Murray: So one study is actually someone who's used the
Github API to measure and then predict the future
number of Jupyter notebooks on Github, or it's
things like looking at the GDP of countries versus their
democracy index, so seeing how democratic they are,
things like looking at GDP per capita versus the Gini
coefficient. There's also lots of stuff about climate
change. How many tons of CO2 per year are going into
the atmosphere for different countries? And how is
your country doing? And it's very conversational work.
So that's actually kind of a secret weapon, I think,
that data scientists have over other technical fields, is
that if you do it right, everybody can read your work,
not just other programmers. Does that make sense?
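For readers who have not met the Gini coefficient mentioned above, here is a
short sketch of how it can be computed from a list of incomes: 0 means
everyone has the same income, and values close to 1 mean a few people hold
almost everything. It uses the standard rank-weighted formula for sorted
values; the function name and the sample incomes are illustrative
placeholders, not data from any of the studies discussed.

```python
def gini(values):
    """Gini coefficient of non-negative values, via the sorted-rank formula.

    G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted in ascending order and ranks starting at 1.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need at least one value, none negative, sum > 0")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Made-up incomes for two imaginary populations.
equal_incomes = [30_000, 30_000, 30_000, 30_000]
unequal_incomes = [5_000, 10_000, 20_000, 200_000]

print(f"Equal incomes:   {gini(equal_incomes):.2f}")
print(f"Unequal incomes: {gini(unequal_incomes):.2f}")
```

Plotting a value like this against GDP per capita for each country gives the
kind of one-chart story described in the conversation.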
Kirill Eremenko: Yup. Yup. From my perspective, as you say, secret
weapon, that's really it: the most valuable data
scientists are the ones who can bridge the gap between
technical insights and the nontechnical business
decision makers. And what I'm getting from your
description of Kyso is that you can get into the habit of
practicing speaking your insights in a nontechnical
way or in a conversational way. And I think it's a very
important soft skill that a lot of data scientists miss
out on but should be focusing on developing.
Because for me in my career, I'm by far nowhere near
the top data scientists in the world, but at the same
time, I find I can actually explain complex things in a
simple manner. And that's what helps me get ahead.
And I wish that to as many people as possible. So if
you can practice that in a setting like this, I think
that's a really cool thing.
Eoin Murray: And I definitely agree with that point because I really
think it is a learned skill. It's not that you just wake
up someday as a good communicator, it's practice. We
publish a lot of fun studies on Kyso and in the
beginning, I don't know, 500 people would
read them if I posted them on Reddit, on Data Is Beautiful.
And then we learned about how to make the graphs
nicer to look at, more interesting and simple to look at
because people will comment and they're like, "I don't
understand this or I don't like this." And you just get
better at, like, picking a proper title, the proper
amount of description, not too much to make it way
too detailed and a little bit dull, not too little that
there's nothing to bite on.
Eoin Murray: Having the right amount of graphs in a report for
example, maybe you should, we kind of, it's like
between one and three. That makes a lot of sense and you'll
get a lot of readers. We actually ourselves
have learned this skill in the last year of Kyso, where
in the beginning you're only getting 500 people reading
it and now you get 25,000 people reading an article.
And it's just like you post it in the same place.
Actually, this is maybe something that your listeners
might find useful. So we had to learn ourselves in the
beginning, like, "How do I actually share?" Because if
I'm on Linkedin and I have a hundred connections, I'm
on Twitter and I have a hundred people following me. I
can host my report or my article on Kyso but how do I
go about actually getting people to read it? How do I go
from maybe a hundred followers to lots more?
Eoin Murray: And what we've learned actually is Reddit, the
subreddit Data Is Beautiful. It has 13 million people
reading it. And it gets about 25 to 30 submissions a
day. We've noticed that if something is good and
people come along and comment on it, a lot
of people will actually read it. On Kyso you can see the
amount of views you get as well, so you still get some
analytics about your post. So if people are
listening and they're figuring out a place to
post their work to get readers, Data Is Beautiful is a
really good one. And Hacker News is obviously a good
one as well. It's a bit more hit and miss. But out of
every five posts you publish, maybe one will hit the
front page and then you'll get a lot of readers from that.
Eoin Murray: And one thing we've noticed as well, is that if you
rank high on Data Is Beautiful or Hacker News or data
science as well, the point to make is that if
somebody is to read it, pick a topic where your
graph is interesting. So if you, like, write an economics
article, you look at the wealth per household of
lots of different countries, right? Postdocs or
economics and we've noticed that if it's a good thing
it'll get ranked highly and people will share it in other
places, and before you know it your post has cascaded
on, like, then there's like a hundred people tweeting
about it. It's on Hacker News as well, someone else has
posted about it.
Eoin Murray: So that might be something that your listeners might
be interested in. If they're thinking of how to build a
portfolio it's just like, write about six or seven articles
and then just post them to like about four different
places. Don't do too much, don't be spammy. But if
you do that every now and then, maybe you're
publishing an article every week or two weeks and you
do the four steps for each article, you're definitely
going to start getting readers.
Kirill Eremenko: Got you. Got you. Very cool. I wanted to ask you on
the flip side, let's say, to your point earlier, when I'm
inside an organization, I'm a data scientist, and I'm
working on a project or our team is working on a
project and we know that we will probably need to
replicate this on a monthly basis, but with some
alterations and some new changes, developments and
so on. Can I use Kyso? Is it safe to upload projects
with company-specific information, with maybe
sensitive data and things like that, because of course
it's valuable in the public side. But what about inside
a company?
Eoin Murray: Yeah. Yeah. Cool. So maybe there's two points there.
I'll just reference the one about reusing work. So in
Kyso you can fork everything so for example, I want to
look at, if you have a study about the carbon
emissions of Germany for last year and I'm like, that's
amazing, I want to see that for my country Ireland, I
can press the fork button, I can actually open that up.
And a point to make is we recently launched it, so
actually you can open up a Jupyter notebook server on
Kyso so you can actually play with the code or you can
download the notebook and run Jupyter notebook
locally and then publish it. But you can download an
existing study, swap the data in for say Ireland versus
Germany and just republish that.
Eoin Murray: And the fork is tracked. So it's a really, really cool way to
reuse work. So that people can expand and extend
each other's work and remix stuff. And then to your
point about internal, yeah, so about a month ago we
launched Kyso for teams, which is basically the full
Kyso stack, ring-fenced only to a private
environment for teams, where you can share sensitive
graphs and stuff that you don't want the public to read
obviously. And you can have permissions controls. So
for example, I can make a team on Kyso, and then I
can add other editors. So these are people who are
allowed to publish to that team's scope. And then I can
add viewers and the viewer permissions being people
are only allowed to read stuff and comment on stuff
and they're not allowed to have submissions.
Eoin Murray: So this is useful then if you're just trying to, maybe
run a reviewing process, where there's a limited
amount. So some people want everybody to be able to
post everything. Some people want to restrict that.
Some people want to review work so that Kyso acts as
an internal journal as opposed to like a blog where you
post everything. So yeah, it's completely suitable for
that purpose. And we've companies now using it a
lot and it seems it's really, really useful.
Kirill Eremenko: Got you. And I'm just looking, so definitely that's a
very, very valuable feature. And it's a corporate
subscription type of offering. And what I want to talk
to, I'm just looking through the Kyso.io, can you help
me out. How do I, let's say you mentioned like the
German study, is there like a search button where I
can search for a specific study that I'm after because I
don't seem to see where to do that.
Eoin Murray: Oh yeah. So right now we have tags and we get a lot of
questions about that. Our search functionality is
coming really, really soon. We've been working on it.
And there'll be a big search bar there where you can get
everything. It's on our to-do list.
Kirill Eremenko: When did you start Kyso.io?
Eoin Murray: So we started it about a year and a half ago. But we
did a big pivot, about six months ago, which is why
you see now as the current iteration.
Kirill Eremenko: Well, it's very impressive for something that's only a
year old. It's really cool. So yeah, for listeners, if you're
interested, it's Kyso, K-Y-S-O.io. By the way,
where does the name come from?
Eoin Murray: So we used to play this game. We used to ask like
investors or just anybody who would ask us, I'd be
like, "Look, I'll give you 10 points or I'll buy you a beer.
If you could tell me what Kyso means." And people
would spend a month googling and trying to figure it
out. And it doesn't mean anything. It's a four letter
domain name that we were able to buy narrowly. And
it sounds kind of catchy. Also in the very, very
beginning Kyso was, we started out as a command line
tool, to turn on and turn off Jupyter notebook servers
on AWS. And because it started as a command line
tool, we wanted the command line to have the same
name as the website, like Gifs or some of those. So we
really wanted to have a four letter word or even three,
but that's impossible.
Eoin Murray: And it had to be easy to type as well. I don't know how
to say it, but there's a flow sometimes when you're
typing a word all the time, you want to be able to
maybe type it with one hand or you don't want the
letters to be too, like you don't want A and P and Q
and M or something, they're too far away on the
keyboard.
Kirill Eremenko: Yeah. By the way, that's a really cool tip,
speaking of startups and people wanting to get into the
space. Like that's the same approach I take when
you're starting a new business and the first thing you
do is you go and check for the domain name and then
from what's available, then you pick out the name of
your business pretty much. That's because the domain
name is important, right? Has to be memorable.
Eoin Murray: Yeah. I mean I think as well, if it's a tool, you have to
have the name tied to the tool.
Kirill Eremenko: Yeah, true. Okay, cool. So that's Kyso.io, everybody
who's interested make sure to check it out. Upload
your projects there and Eoin tell us a bit more about
yourself. Like you've got a really cool, interesting
background, not to even mention the quantum
computing with lights and things like that that you've
done, you're a serial entrepreneur and things like that.
What are some of the other things that you're
interested in these days?
Eoin Murray: So one thing I think is very interesting is to think
about the evolution of data science as a subject. Not
necessarily as an industry where you process
data and present it at work, and make decisions there,
but how it will, I think, influence the wider way
society is processing information. So, a few years ago,
right, before you had a smartphone and Wikipedia,
you could be at a bar with a friend and you'd start
arguing about some trivial fact and your friend has a
different opinion.
Eoin Murray: "What's the population of France?" Right? And you'd
be like, "20 million." "But, like, it's 120 million." And we'd
go on for ages and only, like, the next day would we
actually be able to check, right? And people don't
really have these kinds of discussions anymore
because you'll just Google it. Right. So what happens
there is that kind of discussion now is that like single
point facts are trivial to check. And that's changed the
types of discussions you'll have with people. And I
think what data science might do is like the same
thing, but for more like multidimensional facts. Does
that make sense? So before it's, "What's the population
of France?" Now it's like, "How has the population of
France changed in the last few years? And how is it
going to evolve in the future?" Or a question would
become like, "How has the population of France changed
and how have its demographics shifted?" Or "How has
that population change related to its economic growth
performance for the last few years?" And these are
going to be things that are just more widely known by
people. Does that make sense?
Kirill Eremenko: Do you think that'll be in part also enabled by
assistants like the Google Assistant and Alexa and
so on, where they just can do those predictions for you?
Eoin Murray: Yeah. I think that's going to happen. Like right now,
you'll check, like, a single fact in Wikipedia. Soon
enough we'll be getting charts and graphs, and the
discussion will change towards having a more
multidimensional view of things. And I definitely think
it's going to be part of it. People will demand this
kind of stuff because data journalism is exploding
where you don't have to go interview politicians or go
out into the field to discover something. You can just
process data that exists. So the discussion even in the
news today, you see more and more charts posted.
Eoin Murray: And I definitely think, with Siri and stuff, you'll be
asking what's the population of France, and it won't tell
you a number. It'll show you a graph for the last five
years.
Kirill Eremenko: Got you. And the other thing.
Eoin Murray: Oh, I think what's a very interesting topic for discussion,
and I'm nowhere near an expert on this, but it's like
the ethics of data science and AI. How they're going to
be going forward.
Kirill Eremenko: Okay. So what are your thoughts? How are they going
to be going forward?
Eoin Murray: So I think it's a very hard question. So, what's that
show, is it Little Britain? Where you're trying to get
your driver's license and the person behind the desk
just says like, "The computer says, no."
Kirill Eremenko: No, I haven't seen that.
Eoin Murray: Computer Says No. Because sometimes I think there's
like this tendency of people to think that the computer
is like an objective system that gives you like an
objectively correct answer about something. Right?
Whereas in actual fact, a computer or an AI system,
it just like reflects the biases or the input it was given
or the decision making capability it was given. Right?
So you see that an AI it can be very biased towards
and against certain groups of people or certain types of
behavior. Does that make sense? And I think you see
world governments all over the place now saying
because a big issue with neural nets is how knowable
it is, right? So maybe there's a nightclub and instead
of having a bouncer, it has a facial recognition. And
then it doesn't let me in. Right now it's very hard to
ask a neural net why it didn't let Eoin get into the
nightclub.
Eoin Murray: And I think making that transparent and knowing why
the AI made that decision and then like being able to
ask it to try and make a decision again or be able to
like escalate your problems until you're talking to a
human, it's something that's very, very important.
Eoin Murray: You imagine this is an important thing to have. And I
think I'm a bit worried. I think some people are
worried that we're actually going to have this system
where we just let the AI make all the decisions and
there's no transparency into it or ability to escalate, to
like petition a change in that. Because I think it's an
amazing technology, but we have to remember how it's
implemented and understand how it's implemented,
how it affects different groups of people.
Kirill Eremenko: That's a whole discussion about interpretable AI. On
one hand you can make AI more interpretable, you
minimize that problem, but at the same time
you lose in efficiency, right? Like, the less interpretable
it is, the less there's restrictions and boundaries for
what can be inside in terms of implementation.
Kirill Eremenko: And that just means more variety, more
opportunities for artificial intelligence. Yeah, it's a
tough topic at the moment. Right?
Eoin Murray: Yeah. It's like if you're learning data science, you have
to learn the skill, but also, like, a part of the
philosophy around it.
Kirill Eremenko: Yeah. Got you.
Eoin Murray: Sometimes you can have this thing where you think
you're going to make a vision system to analyze cancer
data and it could get used in a weapon and maybe how
to [inaudible 00:47:49] with that. Or maybe you
[inaudible 00:47:50], I don't know.
Kirill Eremenko: Yeah. Got you. What's your stand on data science
being a certified profession? For instance,
accountants, they have the chartered accountants or
in finance they have certain exams that they need to
pass, lawyers need to be certified in order to practice
law, that's like, probably the clearest example is, you
cannot be somebody's lawyer unless, especially
in certain circumstances unless you have a
certification or yet to get bearish. What do you think
should data scientists and people who develop AI,
should they be required to have certifications?
Eoin Murray: I'm not sure, like on the question of should it or
shouldn't it? I'm not sure on the question of will it? I
don't think so for the simple fact that I think in like it's
going to become like a skill that everybody has in 10 or
15 years. You know what I mean? So it's to restrict it
in that way, I don't think it'd be feasible because I
think you're going to have, now we're seeing at the
forefront of people in education of data science. Right?
And then, you're helping people get into the industry. I
think in 10, 15 years, like it's going to go from maybe
there's six or seven million people today learning data
science, it's going to be 120, 130 million people in 10
years. It's going to be very hard to implement some
regulations or certification system around that, you
know?
Kirill Eremenko: Yeah. I see what you mean. Actually this question
popped in my head. Like now being an entrepreneur
and having started a business or your second
business now. You mentioned that back in the
day, in another life, you were doing quantum computing
and data science, as I imagine. Do you miss it? Do you
miss being in the field and actually doing data science
as opposed to entrepreneuring?
Eoin Murray: Yeah, yeah, I do. Sometimes it comes into my head
like, "Oh, there was this beauty around that." That I
would have some data I can't explain and I would have
to read a book about how to simulate some system
and another book about how to like actually do the
data science of that simulation and I would then apply
it. But that was a very
beautiful scientific process and it's very satisfying to
do that. When you see that you've built a model and
nine times out of 10 it doesn't work. But when it works
you're like, "Oh my God, that's amazing." That's such a
great feeling.
Eoin Murray: What motivates me now though is that I think that,
I was one scientist. If I can make a
thousand scientists 5% more efficient in the way they
work, the overall impact is just so big. But there's
definitely something beautiful and satisfying about
how when you have a lot of data coming in and you
process it and you analyze it and you then finally fully
understand it. It's like you've taken this mess and
ordered that system in a way that you can understand
it and then give that understanding to other people.
That's a very, very satisfying process.
Kirill Eremenko: Okay, cool. Do you have any examples from your past
life of interesting projects that you might be able to
share with us?
Eoin Murray: Yeah. So there's one project I was advising on, which
was using microfluidics and photonics to try to identify
contaminants in water. So E. coli and other bacteria
like legionnaires for example. And what we did was
there was a chip with little pipes in it and we
would contaminate some water, with gloves obviously,
and we would put the water through these little pipes
and we'd shine a laser at it. And then depending on
the... So every bacteria, so the laser would hit the
bacteria and it would reflect, and you'd measure the
reflection in a spectrometer, so you'd get a histogram
of the wavelength of the light versus its intensity. And
every piece of bacteria had a very specific spectrum.
Like it was a unique identifier, right? We wanted to
come up with an automated classifier so a robot could
tell you what it is.
Eoin Murray: "I'm 90% sure this is E. coli versus legionnaires." You
needed to know the specific bacteria, not just the
existence of bacteria. So we used support vector
machines to do it. So basically we just did lots of repeated
tests of taking spectra on the two different bacteria,
or the five or six, and then used support
vector classifiers to be able to run them through a
model. And then it would just tell you what it thinks it
is. And you know, when we started the project, we
were only getting 50%, probably, success rates, which is
not great because it's effectively random. And then
after about six months of just tweaking the way the
data was processed, we couldn't exactly learn support
vector machine algorithm. So we actually just ended
up with, like, a log 10 transformation, which made it
really, really accurate.
Eoin Murray: We were getting up to like 99%. So that meant that
basically it was the beginnings of a system where you
could run water through a pipe, shine a laser at it,
gather the latest laser spectrum, and it would be able
to tell you if there was bacteria present in that water
and then within a group what kind of bacteria that
was.
Kirill Eremenko: Wow. That's very cool. Did that get implemented
anywhere?
Eoin Murray: It was a research project, but, and this is about four
years ago we did this and I think they're doing some
small field trials now. I mean, there's a lot of work in
getting all of that system packaged.
Kirill Eremenko: What I find interesting about this is that you, well first
of all like why did you select SVM? What was the
decision for that, if you remember? And the other one
was like, you selected SVM, you got a 50% accuracy,
but you still stuck with support vector machine
rather than switching to a different model and you got
the end result that you wanted. So just curious about
thinking behind that.
Eoin Murray: I can't remember specifically why we chose it. I think
there was like a team standard of using that library.
So I inherited that a little bit and the 50% was
basically, I think it was, so a lot of it was to do with
how the data was prepared before it went in. So what
it was, was that the signals were so similar in intensity
after being normalized that there was like you have a
lot of these different peaks in the histogram and that
basically there was maybe 45 unique indicators of a
bacteria, but then there was only two or three which
would tell you between two different bacteria. So you
had to like amplify that difference, which is what a log
10 transformation would do. It would make that look
bigger.
Eoin Murray: So like multiply everything by a billion or something
and you are different areas and you'd see that because
it's like yeah. I think that's basically it. It's basically
that the difference was quite small between the
identifiers that you had to somehow make the space
between them bigger to separate them out.
Kirill Eremenko: And the log 10 transformation did the trick?
Eoin Murray: Yeah.
Kirill Eremenko: Yeah. Got you.
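A rough sketch of why a log 10 transformation can help in a case
like this, using made-up intensities rather than anything from the
project described above. It assumes two large peaks shared by both
bacteria and one small peak that tells them apart, and it compares
plain Euclidean distances instead of training a support vector
machine, so it only illustrates the preprocessing step. All names
and numbers here are hypothetical.

```python
import math


def log10_transform(spectrum, floor=1e-9):
    """Take log10 of each intensity, clamping at a small floor to avoid log(0)."""
    return [math.log10(max(value, floor)) for value in spectrum]


def euclidean(a, b):
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical intensities: [shared peak, shared peak, discriminating peak].
species_a = [1000.0, 800.0, 2.0]
species_b = [1000.0, 800.0, 0.5]
# A repeat measurement of species A with 5% jitter on the shared peaks.
species_a_repeat = [1050.0, 760.0, 2.0]


def separation(transform):
    """Between-species distance divided by repeat-measurement distance."""
    between = euclidean(transform(species_a), transform(species_b))
    within = euclidean(transform(species_a), transform(species_a_repeat))
    return between / within


raw_separation = separation(lambda spectrum: spectrum)
log_separation = separation(log10_transform)
print(f"raw: {raw_separation:.3f}, log10: {log_separation:.3f}")
```

On the raw intensities the jitter on the big shared peaks swamps the
small discriminating peak, so the ratio is well below 1. After the
log 10 transformation the ratio is well above 1, because the log
turns the fourfold difference in the small peak into a gap that is
much larger than a 5% wobble on a big peak.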
Kirill Eremenko: Okay. Well, interesting project. Hopefully that rolls out
and helps people in their lives. Well on that note, that
actually brings us towards the end of this podcast.
Really cool to hear your insights. And of course, the
work that you guys are doing at Kyso.io. Could you
share some links [inaudible 00:56:30] where they can
get in touch, follow you, maybe ask you some
questions and just see where your career takes you.
Eoin Murray: Yeah. Super. So, I mean, Kyso.io is Kyso, K-Y-S-O.io.
Anybody wants to ask me a specific question? You can
get me by my email. I'll respond pretty quickly.
[email protected]. And I'm also on Twitter that's Eo_in and
I love when people send me datasets and I see if I can
visualize them. It's sort of a fun thing.
Kirill Eremenko: All right. Be careful what you wish for you'll get like
10000 datasets after this podcast.
Eoin Murray: I would then, I can do an interesting study on, come
on this podcast and then what kind of data sets get
sent to me. I can tell you a lot about your listeners.
Kirill Eremenko: Oh, true, true. All right, cool. And also LinkedIn is
okay for people to connect with you there?
Eoin Murray: Oh yeah. Super. What's my Linkedin unique code?
Kirill Eremenko: We'll add it to the show notes.
Eoin Murray: Super. Yeah. Happy for that.
Kirill Eremenko: Awesome. Okay. Well, one more question actually
before you go, is there any book that you can
recommend to our listeners that has helped you in
your career?
Eoin Murray: Yeah, there is. So I learned data science by doing
during my physics career, but a lot of data science
fundamentally is just linear algebra. So I think I'd
recommend, this is a very difficult book, but if you can
read the first chapter of it, you'll definitely walk away
with a lot more knowledge than when you went in. And
it's a book called Quantum Computation and
Quantum Information by Isaac Chuang and Michael
Nielsen. And I wouldn't really recommend the whole
book. It's like the bible of quantum information. It's a
very, very big book, but the first chapter of it is by far
the best introduction I've come across to linear
algebra, which is an advanced step in data science,
but it's very, very useful.
Kirill Eremenko: Okay. Got you. Quantum Information and Quantum
Computation, right?
Eoin Murray: Yeah. By Chuang and Nielsen.
Kirill Eremenko: By Chuang and Nielsen. Perfect. All right, Eoin thanks
so much again for coming on the show. Sharing your
insights and keep up the great work you guys are
doing with Kyso.io.
Eoin Murray: Thanks so much. Thanks for having me on.
Kirill Eremenko: So there you have it ladies and gentlemen. That was
Eoin Murray from Kyso.io. I hope you enjoyed this
conversation as much as I did and got some valuable
takeaways. For me, probably the most interesting part
was the whole conversation around startups and
accelerators, different types of investments and what
you get out of these programs that you can participate
in. I don't know if I'll ever be in one of them, if I'll ever
apply, but it is just good to know this whole world
because startups are on the rise. There's so many
interesting things happening in the startup world. So
like I got a really good share of knowledge from that
positive conversation. And of course needless to say,
the whole concept of Kyso.io. The tool where you can
share your data science projects. I'm very grateful
Eoin's looking into that and it's really cool also to see
that the base level of pricing on the platform is free
and as it says there right now, free forever.
Kirill Eremenko: So that's very admirable that they're creating this tool
for us data scientists to actually share our work and
experiences. And I look forward to seeing how it's going
to develop. So in its first year of existence they're
already so cool. So I can only see like a bright future
ahead for it.
Kirill Eremenko: On that note, you can get all the show notes for this
episode at www.superdatascience.com/263 that's
superdatascience.com/263. There you'll get all the
links that were mentioned on this episode, a URL to
Eoin's LinkedIn and other social media where you can follow
him and connect with him, plus a transcript for this
episode and anything else that might be required in
order for you to get the maximum out of this podcast
episode so check it out. On that note, thanks so much
for being here and I look forward to seeing you back here
next time. Until then, happy analyzing.