From Big Data Management to Big Data Science

What is next?

Real big data is widely available

Only a few people know how to deal with it

You’re now one of them

Applications

The project is a start

Keep your hands dirty

Consider using the public cloud (e.g., AWS, Google Cloud, or Microsoft Azure)

Job Market

https://www.techicy.com/5-best-programming-languages-to-watch-out-in-2019-for-data-science.html

Data Science

Credits: Drew Conway

Data Science

https://mashimo.wordpress.com/2016/05/28/big-data-data-science-and-machine-learning-explained/

Data Scientist

Next Steps

CS

Big data tools

Python/R/Scala

Math/Stats

Linear algebra

Correlation analysis

Hypothesis tests

Collaboration with domain experts

Visualization

Prototyping
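
As a small illustration of the correlation analysis and hypothesis tests listed under Math/Stats, the sketch below computes a Pearson correlation coefficient and estimates a p-value with a permutation test. It is not taken from the slides: the function names `pearson_r` and `permutation_p_value`, the trial count, and the seed are all assumptions chosen for the example, and only the Python standard library is used.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided permutation test of the null hypothesis 'no association'.

    Shuffles ys repeatedly and counts how often the shuffled |r| is at
    least as large as the observed |r|. trials and seed are illustrative.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

For example, calling `pearson_r` on two columns of a prototype dataset gives the strength of their linear relationship, and `permutation_p_value` gives a rough sense of whether that strength could plausibly arise by chance.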

CS

https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize

CS/Big Data

https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize

Math/Stats

https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize

Online Courses

https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize

Data Analytics

https://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize

Big Data Landscape

Distributed storage: HDFS, KV stores, LSM trees, column stores

Query processing: MapReduce, RDD, Hyracks

High-level APIs: Pig Latin, Spark SQL, HBase

Big data packages: Algebricks, MLlib, GraphX, SparkR
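
To make the query-processing layer of the landscape concrete, here is a minimal single-machine sketch of the MapReduce programming model applied to word count. It only mimics the map, shuffle, and reduce phases in plain Python; it is not the Hadoop or Spark API, and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical names chosen for the illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

In a real system the map and reduce functions run in parallel on many machines and the shuffle moves data over the network; the sketch keeps only the data flow between the phases.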
