Introduction

Evangelos Pournaras, Izabela Moise, Dirk Helbing


Outline

1. Data Science

2. Course Description

Part 1 - Data Science


What is Data Science?

A collection of orchestrated methods from different scientific fields, e.g. statistics, computer science, etc., that provide understanding of domain data and result in data-based products and services.

Is Data Science about Big Data? I

Is Data Science about Big Data? II

It's more about using the right data and asking the right questions.

What about Techno-socio-economic Systems?

ICT & Techno-socio-economic Systems

• Embedded ICT systems in most societal domains

How:

• Internet of Things, pervasive/ubiquitous computing, advanced networking systems, inter-operability

Result:

• A new explosion of data sources

Opportunities:

• Understanding, improving, managing & sustaining our complex society

Threats:

• Privacy, discrimination, misinterpretations, over-fitting, etc.

Threats I

Threats II

Who is a Data Scientist?

• A statistician

• A computer programmer

• Both, and more

Tip: Domain knowledge can be more valuable than machine learning, data mining, etc.

Real-world Profile I

Real-world Profile II

More about Data Scientists

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

More about Data Scientists

"Data scientists' most basic, universal skill is the ability to write code. This may be less true in five years' time, when many more people will have the title 'data scientist' on their business cards. More enduring will be the need for data scientists to communicate in language that all their stakeholders understand, and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or, ideally, both."

More about Data Scientists

"But we would say the dominant trait among data scientists is an intense curiosity: a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested."

More about Data Scientists

"A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed."

"A data management expert might be great at generating and organizing data in structured form but not at turning unstructured data into structured data, and also not at actually analyzing the data."

"And while people without strong social skills might thrive in traditional data professions, data scientists must have such skills to be effective."

Part 2 - Course Description

Course Objectives

Qualify you with knowledge & skills to tackle real-world problems using data:

1. Acquiring domain knowledge and understanding

2. Better understanding and interpretation of data

3. Awareness about the applicability of different data science methods

4. Development of technical skills, e.g. programming, use of different tools, etc.

5. Presenting scientific results, both written and orally

Course Prerequisites

Some programming skills are required, e.g. skills for the material of this course:

1. Java/C++/Python

2. UNIX

Didn't you have an opportunity to practice this earlier?

No problem, this is a golden opportunity!

Tip: Programming skills will make you a more flexible and efficient data scientist.

Assessment

• Seminar thesis

• 100% of the grade, no exams

• Detailed illustration in a later lecture

Tip: Start early. Give your project and your skills the opportunity to develop during the course.

Lectures

• Every Monday, 17:15-19:00, at LFW B 1

• Participation is not obligatory but highly recommended

• 60-minute lectures followed by 40-minute interactive discussions

• Opportunity to discuss your project

• Lectures at: http://www.coss.ethz.ch/education/datascience.html

Subjects I

1. Computational Social Science Applications - 3 weeks
  – Smart Grids, geolocation, traffic systems, social sensing/mining
  – Tools & platforms: Nervousnet, Twitter, GDELT

2. Data Science Fundamentals - 2 weeks
  – databases, data types, data collection, data pre-processing, plotting, visualization, etc.
  – Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.

3. Data Mining and Machine Learning - 2 weeks
  – classification, clustering, prediction, neural networks, etc.
  – Tools: Weka

4. Big Data Analytics - 2 weeks
  – MapReduce, parallel computing, data streaming, social media, etc.
  – Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.

Subjects II

5. The Nervousnet Hackathon
  – Social Sensing & Analytics
  – http://www.nervousnet.ethz.ch/hackathon

6. Other - 2 weeks
  – Project presentations

Lectures Outline

Lecture 01 (22.02.16): Introduction
Lecture 02 (29.02.16): Applications
Lecture 03 (07.03.16): Applications
Lecture 04 (14.03.16): Applications
Lecture 05 (21.03.16): Data Science Fundamentals
Lecture 06 (04.04.16): Data Science Fundamentals
Lecture 07 (11.04.16): Data Mining and Machine Learning
Lecture 08 (25.04.16): Data Mining and Machine Learning
Lecture 10 (02.05.16): Big Data Analytics
Lecture 11 (09.05.16): Big Data Analytics
Lecture 12 (23.05.16): Oral Presentations
Lecture 12 (30.05.16): Oral Presentations
Special Lecture (22 & 23.04.16): The Nervousnet Hackathon

How to contact us?

Communication

• Discussion session in the course

• E-mail with subject [DATA-SCIENCE-COURSE-2016]<otherinfo> to:
  – Evangelos Pournaras: epournaras@ethz.ch, and/or
  – Iza Moise: imoise@ethz.ch

Supervision - strictly for issues not addressed in the course

• Mondays, 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich

Proposed Literature

B. Ellis. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data. Wiley Publishing, 1st edition, 2014.

J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.

T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2015.

I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.

What is next?

• Seminar thesis

• Examples and applications

Page 2: IntroductionIntroduction Evangelos Pournaras, Izabela Moise, Dirk Helbing Evangelos Pournaras, Izabela Moise, Dirk Helbing 1 Outline 1.Data Science 2.Course Description Evangelos Pournaras,

Outline

1 Data Science

2 Course Description

Evangelos Pournaras Izabela Moise Dirk Helbing 2

Part 1 - Data Science

Evangelos Pournaras Izabela Moise Dirk Helbing 3

What is Data Science

A collection of orchestrated methods from different scientific fieldseg statistics computer science etc that provide understanding ofdomain data and result in data-based products and services

Evangelos Pournaras Izabela Moise Dirk Helbing 4

Is Data Science about Big Data I

Evangelos Pournaras Izabela Moise Dirk Helbing 5

Is Data Science about Big Data II

Itrsquos more about using the right dataand asking the right questions

Evangelos Pournaras Izabela Moise Dirk Helbing 6

What about Techno-socio-economic Systems

Evangelos Pournaras Izabela Moise Dirk Helbing 7

ICT amp Techno-socio-economic Systems

bull Embedded ICT systems in most societal domains How

bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result

bull A new explosion of data sources Opportunities

bull Understanding improving managing amp sustaining our complexsociety Threats

bull Privacy discrimination misinterpretations over-fitting etc

Evangelos Pournaras Izabela Moise Dirk Helbing 8

Threats I

Evangelos Pournaras Izabela Moise Dirk Helbing 9

Threats II

Evangelos Pournaras Izabela Moise Dirk Helbing 10

Who is a Data Scientist

bull A statistician

bull A computer programmer

bull Both and More

TipDomain knowledge can be more valuable than machine learning datamining etc

Evangelos Pournaras Izabela Moise Dirk Helbing 11

Real-world Profile I

Evangelos Pournaras Izabela Moise Dirk Helbing 12

Real-world Profile II

Evangelos Pournaras Izabela Moise Dirk Helbing 13

More about Data Scientists

httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century

Evangelos Pournaras Izabela Moise Dirk Helbing 14

More about Data Scientists

Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both

Evangelos Pournaras Izabela Moise Dirk Helbing 15

More about Data Scientists

But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested

Evangelos Pournaras Izabela Moise Dirk Helbing 16

More about Data Scientists

A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed

A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata

And while people without strong social skills might thrive intraditional data professions data scientists must have such skills tobe effective

Evangelos Pournaras Izabela Moise Dirk Helbing 17

Part 2 - Course Description

Evangelos Pournaras Izabela Moise Dirk Helbing 18

Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data

1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data

3 Awareness about the applicability of different data sciencemethods

4 Development of technical skills eg programming use ofdifferent tools etc

5 Presenting scientific results both written and orally

Evangelos Pournaras Izabela Moise Dirk Helbing 19

Course Prerequisites

Some programming skills are required eg skills for the material ofthis course

1 JavaC++Python

2 UNIX

Didnrsquot you have an opportunity to practice this earlier

No problem this is a golden opportunity

TipProgramming skills will make you more flexible and efficient datascientist

Evangelos Pournaras Izabela Moise Dirk Helbing 20

Assessment

bull Seminar thesis

bull 100 of the grade no exams

bull Detailed illustration in a next lecture

TipStart early Give the opportunity for your project and your skills todevelop during the course

Evangelos Pournaras Izabela Moise Dirk Helbing 21

Lectures

bull Every Monday 1715-1900 at LFW B 1

bull Participation is not obligatory but highly recommended

bull 60 minutes lectures followed by 40 minutes interactivediscussions

bull Opportunity to discuss your projectbull Lectures at

httpwwwcossethzcheducationdatasciencehtml

Evangelos Pournaras Izabela Moise Dirk Helbing 22

Subjects I

1 Computational Social Science Applications - 3 weeksndash Smart Grids geolocation traffic systems social sensingminingndash Tools amp platforms Nervousnet Twitter GDELT

2 Data Science Fundamentals - 2 weeksndash databases data types data collection data pre-processing

plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc

3 Data Mining and Machine Learning - 2 weeksndash classification clustering prediction neural networks etcndash Tools Weka

4 Big Data Analytics - 2 weeks

Evangelos Pournaras Izabela Moise Dirk Helbing 23

Subjects II

ndash MapReduce parallel computing data streaming social mediaetc

ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc

5 The Nervousnet Hackathonndash Social Sensing amp Analyticsndash httpwwwnervousnetethzchhackathon

6 Other - 2 weeksndash Project presentations

Evangelos Pournaras Izabela Moise Dirk Helbing 24

Lectures Outline

Lecture 01 (220216)IntroductionLecture 02 (290216)ApplicationsLecture 03 (070316)ApplicationsLecture 04 (140316)ApplicationsLecture 05 (210316)Data Science FundamentalsLecture 06 (040416)Data Science Fundamentals

Lecture 07 (110416)Data Mining and Machine LearningLecture 08 (250416)Data Mining and Machine LearningLecture 10 (020516)Big Data AnalyticsLecture 11 (090516)Big Data AnalyticsLecture 12 (230516)Oral PresentationsLecture 12 (300516)Oral PresentationsSpecial Lecture (22amp230416)The Nervousnet Hackathon

Evangelos Pournaras Izabela Moise Dirk Helbing 25

How to contact us

Communication

bull Discussion session in the course

bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto

ndash Evangelos Pournaras epournarasethzch andorndash Iza Moise imoiseethzch

Supervision - strictly for issues not addressed in the course

bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich

Evangelos Pournaras Izabela Moise Dirk Helbing 26

Proposed Literature

B Ellis

Real-Time Analytics Techniques to Analyze and Visualize Streaming Data

Wiley Publishing 1st edition 2014

J Han

Data Mining Concepts and Techniques

Morgan Kaufmann Publishers Inc San Francisco CA USA 2005

T White

Hadoop The Definitive Guide

OrsquoReilly Media Inc 2015

I H Witten E Frank and M A Hall

Data Mining Practical Machine Learning Tools and Techniques

Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011

Evangelos Pournaras Izabela Moise Dirk Helbing 27

What is next

bull Seminar thesis

bull Examples and applications

Evangelos Pournaras Izabela Moise Dirk Helbing 28

Page 3: IntroductionIntroduction Evangelos Pournaras, Izabela Moise, Dirk Helbing Evangelos Pournaras, Izabela Moise, Dirk Helbing 1 Outline 1.Data Science 2.Course Description Evangelos Pournaras,

Part 1 - Data Science

Evangelos Pournaras Izabela Moise Dirk Helbing 3

What is Data Science

A collection of orchestrated methods from different scientific fieldseg statistics computer science etc that provide understanding ofdomain data and result in data-based products and services

Evangelos Pournaras Izabela Moise Dirk Helbing 4

Is Data Science about Big Data I

Evangelos Pournaras Izabela Moise Dirk Helbing 5

Is Data Science about Big Data II

Itrsquos more about using the right dataand asking the right questions

Evangelos Pournaras Izabela Moise Dirk Helbing 6

What about Techno-socio-economic Systems

Evangelos Pournaras Izabela Moise Dirk Helbing 7

ICT amp Techno-socio-economic Systems

bull Embedded ICT systems in most societal domains How

bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result

bull A new explosion of data sources Opportunities

bull Understanding improving managing amp sustaining our complexsociety Threats

bull Privacy discrimination misinterpretations over-fitting etc

Evangelos Pournaras Izabela Moise Dirk Helbing 8

Threats I

Evangelos Pournaras Izabela Moise Dirk Helbing 9

Threats II

Evangelos Pournaras Izabela Moise Dirk Helbing 10

Who is a Data Scientist

bull A statistician

bull A computer programmer

bull Both and More

TipDomain knowledge can be more valuable than machine learning datamining etc

Evangelos Pournaras Izabela Moise Dirk Helbing 11

Real-world Profile I

Evangelos Pournaras Izabela Moise Dirk Helbing 12

Real-world Profile II

Evangelos Pournaras Izabela Moise Dirk Helbing 13

More about Data Scientists

httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century

Evangelos Pournaras Izabela Moise Dirk Helbing 14

More about Data Scientists

Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both

Evangelos Pournaras Izabela Moise Dirk Helbing 15

More about Data Scientists

But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested

Evangelos Pournaras Izabela Moise Dirk Helbing 16

More about Data Scientists

A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed

A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata

And while people without strong social skills might thrive intraditional data professions data scientists must have such skills tobe effective

Evangelos Pournaras Izabela Moise Dirk Helbing 17

Part 2 - Course Description

Evangelos Pournaras Izabela Moise Dirk Helbing 18

Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data

1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data

3 Awareness about the applicability of different data sciencemethods

4 Development of technical skills eg programming use ofdifferent tools etc

5 Presenting scientific results both written and orally

Evangelos Pournaras Izabela Moise Dirk Helbing 19

Course Prerequisites

Some programming skills are required eg skills for the material ofthis course

1 JavaC++Python

2 UNIX

Didnrsquot you have an opportunity to practice this earlier

No problem this is a golden opportunity

TipProgramming skills will make you more flexible and efficient datascientist

Evangelos Pournaras Izabela Moise Dirk Helbing 20

Assessment

bull Seminar thesis

bull 100 of the grade no exams

bull Detailed illustration in a next lecture

TipStart early Give the opportunity for your project and your skills todevelop during the course

Evangelos Pournaras Izabela Moise Dirk Helbing 21

Lectures

bull Every Monday 1715-1900 at LFW B 1

bull Participation is not obligatory but highly recommended

bull 60 minutes lectures followed by 40 minutes interactivediscussions

bull Opportunity to discuss your projectbull Lectures at

httpwwwcossethzcheducationdatasciencehtml

Evangelos Pournaras Izabela Moise Dirk Helbing 22

Subjects I

1 Computational Social Science Applications - 3 weeksndash Smart Grids geolocation traffic systems social sensingminingndash Tools amp platforms Nervousnet Twitter GDELT

2 Data Science Fundamentals - 2 weeksndash databases data types data collection data pre-processing

plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc

3 Data Mining and Machine Learning - 2 weeksndash classification clustering prediction neural networks etcndash Tools Weka

4 Big Data Analytics - 2 weeks

Evangelos Pournaras Izabela Moise Dirk Helbing 23

Subjects II

ndash MapReduce parallel computing data streaming social mediaetc

ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc

5 The Nervousnet Hackathonndash Social Sensing amp Analyticsndash httpwwwnervousnetethzchhackathon

6 Other - 2 weeksndash Project presentations

Evangelos Pournaras Izabela Moise Dirk Helbing 24

Lectures Outline

Lecture 01 (220216)IntroductionLecture 02 (290216)ApplicationsLecture 03 (070316)ApplicationsLecture 04 (140316)ApplicationsLecture 05 (210316)Data Science FundamentalsLecture 06 (040416)Data Science Fundamentals

Lecture 07 (110416)Data Mining and Machine LearningLecture 08 (250416)Data Mining and Machine LearningLecture 10 (020516)Big Data AnalyticsLecture 11 (090516)Big Data AnalyticsLecture 12 (230516)Oral PresentationsLecture 12 (300516)Oral PresentationsSpecial Lecture (22amp230416)The Nervousnet Hackathon

Evangelos Pournaras Izabela Moise Dirk Helbing 25

How to contact us

Communication

bull Discussion session in the course

bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto

ndash Evangelos Pournaras epournarasethzch andorndash Iza Moise imoiseethzch

Supervision - strictly for issues not addressed in the course

bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich

Evangelos Pournaras Izabela Moise Dirk Helbing 26

Proposed Literature

B Ellis

Real-Time Analytics Techniques to Analyze and Visualize Streaming Data

Wiley Publishing 1st edition 2014

J Han

Data Mining Concepts and Techniques

Morgan Kaufmann Publishers Inc San Francisco CA USA 2005

T White

Hadoop The Definitive Guide

OrsquoReilly Media Inc 2015

I H Witten E Frank and M A Hall

Data Mining Practical Machine Learning Tools and Techniques

Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011

Evangelos Pournaras Izabela Moise Dirk Helbing 27

What is next

bull Seminar thesis

bull Examples and applications

Evangelos Pournaras Izabela Moise Dirk Helbing 28

Page 4: IntroductionIntroduction Evangelos Pournaras, Izabela Moise, Dirk Helbing Evangelos Pournaras, Izabela Moise, Dirk Helbing 1 Outline 1.Data Science 2.Course Description Evangelos Pournaras,

What is Data Science

A collection of orchestrated methods from different scientific fieldseg statistics computer science etc that provide understanding ofdomain data and result in data-based products and services

Evangelos Pournaras Izabela Moise Dirk Helbing 4

Is Data Science about Big Data I

Evangelos Pournaras Izabela Moise Dirk Helbing 5

Is Data Science about Big Data II

Itrsquos more about using the right dataand asking the right questions

Evangelos Pournaras Izabela Moise Dirk Helbing 6

What about Techno-socio-economic Systems

Evangelos Pournaras Izabela Moise Dirk Helbing 7

ICT amp Techno-socio-economic Systems

bull Embedded ICT systems in most societal domains How

bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result

bull A new explosion of data sources Opportunities

bull Understanding improving managing amp sustaining our complexsociety Threats

bull Privacy discrimination misinterpretations over-fitting etc

Evangelos Pournaras Izabela Moise Dirk Helbing 8

Threats I

Evangelos Pournaras Izabela Moise Dirk Helbing 9

Threats II

Evangelos Pournaras Izabela Moise Dirk Helbing 10

Who is a Data Scientist

bull A statistician

bull A computer programmer

bull Both and More

TipDomain knowledge can be more valuable than machine learning datamining etc

Evangelos Pournaras Izabela Moise Dirk Helbing 11

Real-world Profile I

Evangelos Pournaras Izabela Moise Dirk Helbing 12

Real-world Profile II

Evangelos Pournaras Izabela Moise Dirk Helbing 13

More about Data Scientists

httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century

Evangelos Pournaras Izabela Moise Dirk Helbing 14

More about Data Scientists

Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both

Evangelos Pournaras Izabela Moise Dirk Helbing 15

More about Data Scientists

But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested

Evangelos Pournaras Izabela Moise Dirk Helbing 16

More about Data Scientists

A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed

A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata

And while people without strong social skills might thrive intraditional data professions data scientists must have such skills tobe effective

Evangelos Pournaras Izabela Moise Dirk Helbing 17

Part 2 - Course Description

Evangelos Pournaras Izabela Moise Dirk Helbing 18

Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data

1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data

3 Awareness about the applicability of different data sciencemethods

4 Development of technical skills eg programming use ofdifferent tools etc

5 Presenting scientific results both written and orally

Evangelos Pournaras Izabela Moise Dirk Helbing 19

Course Prerequisites

Some programming skills are required eg skills for the material ofthis course

1 JavaC++Python

2 UNIX

Didnrsquot you have an opportunity to practice this earlier

No problem this is a golden opportunity

TipProgramming skills will make you more flexible and efficient datascientist

Evangelos Pournaras Izabela Moise Dirk Helbing 20

Assessment

bull Seminar thesis

bull 100 of the grade no exams

bull Detailed illustration in a next lecture

TipStart early Give the opportunity for your project and your skills todevelop during the course

Evangelos Pournaras Izabela Moise Dirk Helbing 21

Lectures

bull Every Monday 1715-1900 at LFW B 1

bull Participation is not obligatory but highly recommended

bull 60 minutes lectures followed by 40 minutes interactivediscussions

bull Opportunity to discuss your projectbull Lectures at

httpwwwcossethzcheducationdatasciencehtml

Evangelos Pournaras Izabela Moise Dirk Helbing 22

Subjects I

1 Computational Social Science Applications - 3 weeksndash Smart Grids geolocation traffic systems social sensingminingndash Tools amp platforms Nervousnet Twitter GDELT

2 Data Science Fundamentals - 2 weeksndash databases data types data collection data pre-processing

plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc

3 Data Mining and Machine Learning - 2 weeksndash classification clustering prediction neural networks etcndash Tools Weka

4 Big Data Analytics - 2 weeks

Evangelos Pournaras Izabela Moise Dirk Helbing 23

Subjects II

ndash MapReduce parallel computing data streaming social mediaetc

ndash Tools Hadoop Spark Mahout Spark Streaming Storm etc

5 The Nervousnet Hackathonndash Social Sensing amp Analyticsndash httpwwwnervousnetethzchhackathon

6 Other - 2 weeksndash Project presentations

Evangelos Pournaras Izabela Moise Dirk Helbing 24

Lectures Outline

Lecture 01 (220216)IntroductionLecture 02 (290216)ApplicationsLecture 03 (070316)ApplicationsLecture 04 (140316)ApplicationsLecture 05 (210316)Data Science FundamentalsLecture 06 (040416)Data Science Fundamentals

Lecture 07 (110416)Data Mining and Machine LearningLecture 08 (250416)Data Mining and Machine LearningLecture 10 (020516)Big Data AnalyticsLecture 11 (090516)Big Data AnalyticsLecture 12 (230516)Oral PresentationsLecture 12 (300516)Oral PresentationsSpecial Lecture (22amp230416)The Nervousnet Hackathon

Evangelos Pournaras Izabela Moise Dirk Helbing 25

How to contact us

Communication

bull Discussion session in the course

bull E-mail with subject[DATA-SCIENCE-COURSE-2016]ltotherinfogtto

ndash Evangelos Pournaras epournarasethzch andorndash Iza Moise imoiseethzch

Supervision - strictly for issues not addressed in the course

bull Mondays 1500-1700Clausiusstrasse 50 (CLU C 4) 8092 Zurich

Evangelos Pournaras Izabela Moise Dirk Helbing 26

Proposed Literature

B Ellis

Real-Time Analytics Techniques to Analyze and Visualize Streaming Data

Wiley Publishing 1st edition 2014

J Han

Data Mining Concepts and Techniques

Morgan Kaufmann Publishers Inc San Francisco CA USA 2005

T White

Hadoop The Definitive Guide

OrsquoReilly Media Inc 2015

I H Witten E Frank and M A Hall

Data Mining Practical Machine Learning Tools and Techniques

Morgan Kaufmann Publishers Inc San Francisco CA USA 3rd edition2011

Evangelos Pournaras Izabela Moise Dirk Helbing 27

What is next

bull Seminar thesis

bull Examples and applications

Evangelos Pournaras Izabela Moise Dirk Helbing 28

Page 5: IntroductionIntroduction Evangelos Pournaras, Izabela Moise, Dirk Helbing Evangelos Pournaras, Izabela Moise, Dirk Helbing 1 Outline 1.Data Science 2.Course Description Evangelos Pournaras,

Is Data Science about Big Data I

Evangelos Pournaras Izabela Moise Dirk Helbing 5

Is Data Science about Big Data II

Itrsquos more about using the right dataand asking the right questions

Evangelos Pournaras Izabela Moise Dirk Helbing 6

What about Techno-socio-economic Systems

Evangelos Pournaras Izabela Moise Dirk Helbing 7

ICT amp Techno-socio-economic Systems

bull Embedded ICT systems in most societal domains How

bull Internet of Things pervasiveubiquitous computing advancednetworking systems inter-operability Result

bull A new explosion of data sources Opportunities

bull Understanding improving managing amp sustaining our complexsociety Threats

bull Privacy discrimination misinterpretations over-fitting etc

Evangelos Pournaras Izabela Moise Dirk Helbing 8

Threats I

Evangelos Pournaras Izabela Moise Dirk Helbing 9

Threats II

Evangelos Pournaras Izabela Moise Dirk Helbing 10

Who is a Data Scientist

bull A statistician

bull A computer programmer

bull Both and More

TipDomain knowledge can be more valuable than machine learning datamining etc

Evangelos Pournaras Izabela Moise Dirk Helbing 11

Real-world Profile I

Evangelos Pournaras Izabela Moise Dirk Helbing 12

Real-world Profile II

Evangelos Pournaras Izabela Moise Dirk Helbing 13

More about Data Scientists

httpshbrorg201210data-scientist-the-sexiest-job-of-the-21st-century

Evangelos Pournaras Izabela Moise Dirk Helbing 14

More about Data Scientists

Data scientists' most basic, universal skill is the ability to write code. This may be less true in five years' time, when many more people will have the title "data scientist" on their business cards. More enduring will be the need for data scientists to communicate in language that all their stakeholders understand, and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or, ideally, both.

More about Data Scientists

But we would say the dominant trait among data scientists is an intense curiosity: a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested.

More about Data Scientists

A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed.

A data management expert might be great at generating and organizing data in structured form but not at turning unstructured data into structured data, and also not at actually analyzing the data.

And while people without strong social skills might thrive in traditional data professions, data scientists must have such skills to be effective.

Part 2 - Course Description

Course Objectives

Equip you with the knowledge & skills to tackle real-world problems using data

1. Acquiring domain knowledge and understanding

2. Better understanding and interpretation of data

3. Awareness about the applicability of different data science methods

4. Development of technical skills, e.g. programming, use of different tools, etc.

5. Presenting scientific results, both in writing and orally

Course Prerequisites

Some programming skills are required, e.g. skills for the material of this course:

1. Java/C++/Python

2. UNIX

Didn't have an opportunity to practice these earlier?

No problem, this is a golden opportunity!

Tip: Programming skills will make you a more flexible and efficient data scientist.

Assessment

• Seminar thesis

• 100% of the grade, no exams

• Detailed illustration in a later lecture

Tip: Start early! Give your project and your skills the opportunity to develop during the course.

Lectures

• Every Monday, 17:15-19:00, at LFW B 1

• Participation is not obligatory but highly recommended

• 60-minute lectures followed by 40-minute interactive discussions

• Opportunity to discuss your project

• Lectures at:

http://www.coss.ethz.ch/education/datascience.html

Subjects I

1. Computational Social Science Applications - 3 weeks
– Smart Grids, geolocation, traffic systems, social sensing/mining
– Tools & platforms: Nervousnet, Twitter, GDELT

2. Data Science Fundamentals - 2 weeks
– databases, data types, data collection, data pre-processing, plotting, visualization, etc.
– Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.

3. Data Mining and Machine Learning - 2 weeks
– classification, clustering, prediction, neural networks, etc.
– Tools: Weka

4. Big Data Analytics - 2 weeks

Subjects II

– MapReduce, parallel computing, data streaming, social media, etc.
– Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.

5. The Nervousnet Hackathon
– Social Sensing & Analytics
– http://www.nervousnet.ethz.ch/hackathon

6. Other - 2 weeks
– Project presentations

Lectures Outline

Lecture 01 (22.02.16): Introduction
Lecture 02 (29.02.16): Applications
Lecture 03 (07.03.16): Applications
Lecture 04 (14.03.16): Applications
Lecture 05 (21.03.16): Data Science Fundamentals
Lecture 06 (04.04.16): Data Science Fundamentals
Lecture 07 (11.04.16): Data Mining and Machine Learning
Lecture 08 (25.04.16): Data Mining and Machine Learning
Lecture 10 (02.05.16): Big Data Analytics
Lecture 11 (09.05.16): Big Data Analytics
Lecture 12 (23.05.16): Oral Presentations
Lecture 12 (30.05.16): Oral Presentations
Special Lecture (22 & 23.04.16): The Nervousnet Hackathon

How to contact us

Communication

• Discussion session in the course

• E-mail with subject "[DATA-SCIENCE-COURSE-2016] <other info>" to:

– Evangelos Pournaras: epournaras@ethz.ch and/or
– Iza Moise: imoise@ethz.ch

Supervision - strictly for issues not addressed in the course

• Mondays 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich

Proposed Literature

B. Ellis. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data. Wiley Publishing, 1st edition, 2014.

J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.

T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2015.

I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.

What is next?

• Seminar thesis

• Examples and applications

Page 10: IntroductionIntroduction Evangelos Pournaras, Izabela Moise, Dirk Helbing Evangelos Pournaras, Izabela Moise, Dirk Helbing 1 Outline 1.Data Science 2.Course Description Evangelos Pournaras,

Threats II

Evangelos Pournaras Izabela Moise Dirk Helbing 10

Who is a Data Scientist?

• A statistician

• A computer programmer

• Both, and more

Tip: Domain knowledge can be more valuable than machine learning, data mining, etc.

Real-world Profile I

Real-world Profile II

More about Data Scientists

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century

Data scientists' most basic, universal skill is the ability to write code. This may be less true in five years' time, when many more people will have the title "data scientist" on their business cards. More enduring will be the need for data scientists to communicate in language that all their stakeholders understand - and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or - ideally - both.

But we would say the dominant trait among data scientists is an intense curiosity - a desire to go beneath the surface of a problem, find the questions at its heart, and distill them into a very clear set of hypotheses that can be tested.

A quantitative analyst can be great at analyzing data but not at subduing a mass of unstructured data and getting it into a form in which it can be analyzed.

A data management expert might be great at generating and organizing data in structured form but not at turning unstructured data into structured data - and also not at actually analyzing the data.

And while people without strong social skills might thrive in traditional data professions, data scientists must have such skills to be effective.

Part 2 - Course Description

Course Objectives

Equip you with the knowledge & skills to tackle real-world problems using data:

1. Acquiring domain knowledge and understanding
2. Better understanding and interpretation of data
3. Awareness of the applicability of different data science methods
4. Development of technical skills, e.g. programming, use of different tools, etc.
5. Presenting scientific results, both in writing and orally

Course Prerequisites

Some programming skills are required for the material of this course, e.g.:

1. Java/C++/Python
2. UNIX

Didn't have an opportunity to practice these earlier?

No problem, this is a golden opportunity!

Tip: Programming skills will make you a more flexible and efficient data scientist.

Assessment

• Seminar thesis

• 100% of the grade, no exams

• Details will be given in an upcoming lecture

Tip: Start early. Give your project and your skills the opportunity to develop during the course.

Lectures

• Every Monday 17:15-19:00 at LFW B 1

• Participation is not obligatory but highly recommended

• 60-minute lectures followed by 40 minutes of interactive discussion

• Opportunity to discuss your project

• Lectures at http://www.coss.ethz.ch/education/datascience.html

Subjects

1. Computational Social Science Applications - 3 weeks
  – Smart Grids, geolocation, traffic systems, social sensing/mining
  – Tools & platforms: Nervousnet, Twitter, GDELT

2. Data Science Fundamentals - 2 weeks
  – databases, data types, data collection, data pre-processing, plotting, visualization, etc.
  – Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.

3. Data Mining and Machine Learning - 2 weeks
  – classification, clustering, prediction, neural networks, etc.
  – Tools: Weka

4. Big Data Analytics - 2 weeks
  – MapReduce, parallel computing, data streaming, social media, etc.
  – Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.

5. The Nervousnet Hackathon
  – Social Sensing & Analytics
  – http://www.nervousnet.ethz.ch/hackathon

6. Other - 2 weeks
  – Project presentations

Page 15: IntroductionIntroduction Evangelos Pournaras, Izabela Moise, Dirk Helbing Evangelos Pournaras, Izabela Moise, Dirk Helbing 1 Outline 1.Data Science 2.Course Description Evangelos Pournaras,

More about Data Scientists

Data scientistsrsquo most basic universal skill is the ability to writecode This may be less true in five yearsrsquo time when many morepeople will have the title data scientist on their business cardsMore enduring will be the need for data scientists to communicate inlanguage that all their stakeholders understand-and to demonstratethe special skills involved in storytelling with data whetherverbally visually or - ideally both

Evangelos Pournaras Izabela Moise Dirk Helbing 15

More about Data Scientists

But we would say the dominant trait among data scientists is anintense curiosity-a desire to go beneath the surface of a problemfind the questions at its heart and distill them into a very clearset of hypotheses that can be tested

Evangelos Pournaras Izabela Moise Dirk Helbing 16

More about Data Scientists

A quantitative analyst can be great at analyzing data but not atsubduing a mass of unstructured data and getting it into a form inwhich it can be analyzed

A data management expert might be great at generating andorganizing data in structured form but not at turning unstructureddata into structured data-and also not at actually analyzing thedata

And while people without strong social skills might thrive intraditional data professions data scientists must have such skills tobe effective

Evangelos Pournaras Izabela Moise Dirk Helbing 17

Part 2 - Course Description

Evangelos Pournaras Izabela Moise Dirk Helbing 18

Course ObjectivesQualify you with knowledge amp skills to tackle real-world problemsusing data

1 Acquiring domain knowledge and understanding2 Better understanding and interpretation of data

3 Awareness about the applicability of different data sciencemethods

4 Development of technical skills eg programming use ofdifferent tools etc

5 Presenting scientific results both written and orally

Evangelos Pournaras Izabela Moise Dirk Helbing 19

Course Prerequisites

Some programming skills are required eg skills for the material ofthis course

1 JavaC++Python

2 UNIX

Didnrsquot you have an opportunity to practice this earlier

No problem this is a golden opportunity

TipProgramming skills will make you more flexible and efficient datascientist

Evangelos Pournaras Izabela Moise Dirk Helbing 20

Assessment

bull Seminar thesis

bull 100 of the grade no exams

bull Detailed illustration in a next lecture

TipStart early Give the opportunity for your project and your skills todevelop during the course

Lectures

• Every Monday, 17:15-19:00, at LFW B 1
• Participation is not obligatory but highly recommended
• 60-minute lectures followed by 40-minute interactive discussions
• Opportunity to discuss your project
• Lectures at http://www.coss.ethz.ch/education/datascience.html

Subjects I

1. Computational Social Science Applications - 3 weeks
   - Smart Grids, geolocation, traffic systems, social sensing/mining
   - Tools & platforms: Nervousnet, Twitter, GDELT
2. Data Science Fundamentals - 2 weeks
   - databases, data types, data collection, data pre-processing, plotting, visualization, etc.
   - Tools: Java, AWK, MySQL, Gnuplot, Gephi, etc.
3. Data Mining and Machine Learning - 2 weeks
   - classification, clustering, prediction, neural networks, etc.
   - Tools: Weka
4. Big Data Analytics - 2 weeks

Subjects II

   - MapReduce, parallel computing, data streaming, social media, etc.
   - Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.
5. The Nervousnet Hackathon
   - Social Sensing & Analytics
   - http://www.nervousnet.ethz.ch/hackathon
6. Other - 2 weeks
   - Project presentations
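
To give a first feel for the MapReduce model listed under Big Data Analytics, here is a minimal word-count sketch. It is plain, single-process Python, not Hadoop or Spark code. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. A real framework would run the map and reduce steps in parallel across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would do between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "data science", "Big Data analytics"]))
```

The three phases are kept as separate functions so that the map and reduce steps, which are the parts a user typically writes, stay independent of the grouping step that a framework normally provides.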

Lectures Outline

Lecture 01 (22.02.16): Introduction
Lecture 02 (29.02.16): Applications
Lecture 03 (07.03.16): Applications
Lecture 04 (14.03.16): Applications
Lecture 05 (21.03.16): Data Science Fundamentals
Lecture 06 (04.04.16): Data Science Fundamentals
Lecture 07 (11.04.16): Data Mining and Machine Learning
Lecture 08 (25.04.16): Data Mining and Machine Learning
Lecture 10 (02.05.16): Big Data Analytics
Lecture 11 (09.05.16): Big Data Analytics
Lecture 12 (23.05.16): Oral Presentations
Lecture 12 (30.05.16): Oral Presentations
Special Lecture (22 & 23.04.16): The Nervousnet Hackathon

How to contact us

Communication

• Discussion session in the course
• E-mail with subject [DATA-SCIENCE-COURSE-2016]<otherinfo> to
   - Evangelos Pournaras, epournaras@ethz.ch, and/or
   - Iza Moise, imoise@ethz.ch

Supervision - strictly for issues not addressed in the course

• Mondays, 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich

Proposed Literature

B. Ellis. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data. Wiley Publishing, 1st edition, 2014.

J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.

T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2015.

I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.

What is next?

• Seminar thesis
• Examples and applications

Page 23: IntroductionIntroduction Evangelos Pournaras, Izabela Moise, Dirk Helbing Evangelos Pournaras, Izabela Moise, Dirk Helbing 1 Outline 1.Data Science 2.Course Description Evangelos Pournaras,

Subjects I

1 Computational Social Science Applications - 3 weeksndash Smart Grids geolocation traffic systems social sensingminingndash Tools amp platforms Nervousnet Twitter GDELT

2 Data Science Fundamentals - 2 weeksndash databases data types data collection data pre-processing

plotting visualization etcndash Tools Java AWK MySQL Gnuplot Gephi etc

3 Data Mining and Machine Learning - 2 weeksndash classification clustering prediction neural networks etcndash Tools Weka

4 Big Data Analytics - 2 weeks

Evangelos Pournaras Izabela Moise Dirk Helbing 23

Subjects II

– MapReduce, parallel computing, data streaming, social media, etc.
– Tools: Hadoop, Spark, Mahout, Spark Streaming, Storm, etc.

5. The Nervousnet Hackathon
– Social Sensing & Analytics
– http://www.nervousnet.ethz.ch/hackathon

6. Other - 2 weeks
– Project presentations


Lectures Outline

Lecture 01 (22.02.16): Introduction
Lecture 02 (29.02.16): Applications
Lecture 03 (07.03.16): Applications
Lecture 04 (14.03.16): Applications
Lecture 05 (21.03.16): Data Science Fundamentals
Lecture 06 (04.04.16): Data Science Fundamentals
Lecture 07 (11.04.16): Data Mining and Machine Learning
Lecture 08 (25.04.16): Data Mining and Machine Learning
Lecture 10 (02.05.16): Big Data Analytics
Lecture 11 (09.05.16): Big Data Analytics
Lecture 12 (23.05.16): Oral Presentations
Lecture 12 (30.05.16): Oral Presentations
Special Lecture (22 & 23.04.16): The Nervousnet Hackathon


How to contact us

Communication

• Discussion session in the course

• E-mail with subject [DATA-SCIENCE-COURSE-2016]<otherinfo> to
– Evangelos Pournaras, epournaras@ethz.ch, and/or
– Iza Moise, imoise@ethz.ch

Supervision - strictly for issues not addressed in the course

• Mondays, 15:00-17:00, Clausiusstrasse 50 (CLU C 4), 8092 Zurich


Proposed Literature

B. Ellis. Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data. Wiley Publishing, 1st edition, 2014.

J. Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.

T. White. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2015.

I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 3rd edition, 2011.


What is next?

• Seminar thesis

• Examples and applications





