CS 1944: Sophomore Seminar Big Data and Machine Learning B. Aditya Prakash Assistant Professor Nov 3, 2015


CS 1944: Sophomore Seminar
Big Data and Machine Learning

B. Aditya Prakash
Assistant Professor

Nov 3, 2015


About me

Assistant Professor, CS
– Member, Discovery Analytics Center

Previously
– Ph.D. in Computer Science, Carnegie Mellon University
– B.Tech in Computer Science and Engg, Indian Institute of Technology (IIT) – Bombay
– Internships at Sprint, Yahoo, Microsoft Research

Prakash 2015


Data contains value and knowledge


Data and Business

Source: A. Machanavajjhala


Data and Science


Data and Government

Source: A. Machanavajjhala


Data and Culture

Source: A. Machanavajjhala


Good news: Demand for Data Mining


How to extract value from data?

Manipulate Data
– CS, Domain expertise

Analyze Data
– Math, CS, Stat…

Communicate your results
– CS, Domain Expertise


Communication is important!


What is Data Mining?

Given lots of data, discover patterns and models that are:
– Valid: hold on new data with some certainty
– Useful: should be possible to act on the item
– Unexpected: non-obvious to the system
– Understandable: humans should be able to interpret the pattern


Data Mining Tasks

Descriptive methods
– Find human-interpretable patterns that describe the data
  • Example: Clustering

Predictive methods
– Use some variables to predict unknown or future values of other variables
  • Example: Recommender systems
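To make the "descriptive" task concrete, here is a minimal clustering sketch. It assumes one-dimensional data and uses k-means (Lloyd's algorithm), which is just one common clustering method and is not named in the slides. The function name `kmeans_1d`, the sample `ages` and the starting centers are all hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data; returns (centers, clusters)."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical data: two visibly separated groups of ages.
ages = [18, 19, 20, 21, 45, 47, 50]
centers, clusters = kmeans_1d(ages, centers=[18.0, 50.0])
print(clusters)  # [[18, 19, 20, 21], [45, 47, 50]]
```

The output is a human-interpretable description of the data (two groups and their centers), which is the sense in which clustering is descriptive rather than predictive.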


Big data

[Figure labels: ML & Stats.; Comp. Systems; Theory & Algo.; Biology; Econ.; Social Science; Physics]


Data at CS, VT

Knowledge, Information and Data

http://www.cs.vt.edu/undergraduate/tracks/kid

People: Fox, Harrison, Huang, Lu (in NVA), Ramakrishnan (in NVA), Rozovskaya, Prakash


Courses

Background in some areas:
– CS3414 (Numerical Methods); also prob/stat

4000 level
– 4244 Internet Software Development
– 4604 Database Management Systems
– 4624 Capstone (Multimedia, Information Access)
– 4634 Design of Information (Capstone)
– 4804 AI
– 4984 Computational Linguistics (Capstone)


Discovery Analytics Center


MY RESEARCH


Networks are everywhere!

Human Disease Network [Barabasi 2007]

Gene Regulatory Network [Decourty 2008]

Facebook Network [2010]

The Internet [2005]


What else do they have in common?


High School Dating Network

Bearman et al., Am. Jnl. of Sociology, 2004. Image: Mark Newman

Blue: Male
Pink: Female

Interesting observations?


The Internet

Skewed Degrees

Robustness


Karate Club Network


Dynamical Processes over networks are also everywhere!


Why do we care?
– Social collaboration
– Information Diffusion
– Viral Marketing
– Epidemiology and Public Health
– Cyber Security
– Human mobility
– Games and Virtual Worlds
– Ecology
– ...


Why do we care? (1: Epidemiology)

Dynamical Processes over networks [AJPH 2007]

CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts

Diseases over contact networks


Why do we care? (1: Epidemiology)

Dynamical Processes over networks

• Each circle is a hospital
• ~3000 hospitals
• More than 30,000 patients transferred

[US-MEDICARE NETWORK 2005]

Problem: Given k units of disinfectant, whom to immunize?
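As a point of reference for this problem, the sketch below shows a naive baseline: pick the k hospitals involved in the most patient transfers. This is not the method from the talk, which the slides do not describe. The function name `top_k_by_transfers`, the hospital names and the transfer counts are all hypothetical.

```python
from collections import Counter


def top_k_by_transfers(transfers, k):
    """Return the k nodes with the largest total transfer volume (in + out)."""
    volume = Counter()
    for src, dst, count in transfers:
        volume[src] += count
        volume[dst] += count
    # Sort by volume (descending), then by name so that ties are deterministic.
    ranked = sorted(volume, key=lambda h: (-volume[h], h))
    return ranked[:k]


# Hypothetical transfer records: (from_hospital, to_hospital, number_of_patients).
transfers = [("A", "B", 5), ("B", "C", 3), ("C", "D", 1), ("B", "D", 4)]
chosen = top_k_by_transfers(transfers, k=2)
print(chosen)  # ['B', 'A']
```

A baseline like this ignores how the chosen nodes interact with the rest of the network, which is one reason a smarter allocation can do better with the same budget.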


Why do we care? (1: Epidemiology)

CURRENT PRACTICE vs. OUR METHOD

~6x fewer!

[US-MEDICARE NETWORK 2005]

Hospital-acquired infections took 99K+ lives, cost $5B+ (all per year)


Why do we care? (2: Online Diffusion)

> 800m users, ~$1B revenue [WSJ 2010]

~100m active users

> 50m users


Why do we care? (2: Online Diffusion)

Dynamical Processes over networks

Celebrity

Buy Versace™!

Followers

Social Media Marketing


Social Biological Contagion

Automatically learn models


Why do we care? (3: To change the world?)

Dynamical Processes over networks

Social networks and Collaborative Action


High Impact – Multiple Settings

Q. How to squash rumors faster?

Q. How do opinions spread?

Q. How to market better?

epidemic outbreaks

products/viruses

transmit s/w patches


Dynamical Processes = (a lot of) Networks + (some) Time-Series


Research Theme

DATA: Large real-world networks & processes

ANALYSIS: Understanding

POLICY/ACTION: Managing


Research Theme – Public Health

DATA: Modeling # patient transfers

ANALYSIS: Will an epidemic happen?

POLICY/ACTION: How to control outbreaks?


Research Theme – Social Media

DATA: Modeling Tweets spreading

ANALYSIS: # cascades in future?

POLICY/ACTION: How to market better?


A Question

How many of you think your friends have more friends than you?

A recent Facebook study
– Examined all of FB’s users: 721 million people with 69 billion friendships.
  • about 10 percent of the world’s population!
– Found that a user’s friend count was less than the average friend count of his or her friends, 93 percent of the time.
– Users had an average of 190 friends, while their friends averaged 635 friends of their own.


Possible Reasons?

– You are a loner?
– Your friends are extroverts?
– There are more extroverts than introverts in the world?


Example

Source: S. Strogatz, NYT 2012

Average number of friends
= (1 + 3 + 2 + 2) / 4
= 2

Average number of friends of friends
= (3 + 1 + 2 + 2 + 3 + 2 + 3 + 2) / 8
= ((1x1) + (3x3) + (2x2) + (2x2)) / 8
= 2.25!
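The same arithmetic can be checked in a few lines of Python. The sketch below assumes a four-person network whose friend counts are 1, 3, 2 and 2, as in the example. The names A to D and the exact edges are an illustrative reconstruction, chosen so that the friends-of-friends list matches the sum above.

```python
# Hypothetical four-person network: A knows only B; B, C and D all know each other.
friends = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "D"],
    "D": ["B", "C"],
}

# Number of friends per person: 1, 3, 2, 2.
degree = {person: len(fs) for person, fs in friends.items()}
avg_friends = sum(degree.values()) / len(degree)

# For every person, list the friend counts of each of their friends.
fof_counts = [degree[f] for fs in friends.values() for f in fs]
avg_friends_of_friends = sum(fof_counts) / len(fof_counts)

print(fof_counts)                           # [3, 1, 2, 2, 3, 2, 3, 2]
print(avg_friends, avg_friends_of_friends)  # 2.0 2.25
```

Note that B, who has three friends, is counted three times in the friends-of-friends list, while A is counted only once. This is why the second average is pulled upward.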


Actually it is (almost) always true!

Proof?


Essentially, it is true if there is any spread in # of friends (non-zero variance)!
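One standard way to see this is sketched below. It assumes friendships are mutual and everyone has at least one friend. The symbols are introduced here for illustration: $d_i$ is the number of friends of person $i$, $n$ is the number of people, and $\mu$ and $\sigma^2$ are the mean and variance of the $d_i$. Person $i$ appears in $d_i$ friend lists and contributes $d_i$ each time, so the friends-of-friends average is a ratio of sums:

```latex
\[
\frac{\sum_{i=1}^{n} d_i^{2}}{\sum_{i=1}^{n} d_i}
  = \frac{n\,(\mu^{2} + \sigma^{2})}{n\,\mu}
  = \mu + \frac{\sigma^{2}}{\mu}
  \;\ge\; \mu ,
\qquad\text{where } \mu = \frac{1}{n}\sum_{i=1}^{n} d_i ,\quad
\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n} (d_i - \mu)^{2} .
\]
```

Equality holds only when $\sigma^2 = 0$, that is, when everyone has exactly the same number of friends. In the four-person example, $\mu = 2$ and $\sigma^2 = 0.5$, which gives $2 + 0.5/2 = 2.25$.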


Implications

Immunization
– acquaintance immunization
  • Immunize friend-of-friend

Early warning of outbreaks
– Again, monitor friends of friends
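The idea behind both points can be sketched as follows: pick a random person, then pick one of their friends at random. The friend tends to be better connected than the average person, so immunizing or monitoring the friend covers more of the network. The star-shaped network, the function name `random_acquaintance` and the sample size are hypothetical choices for illustration only.

```python
import random


def random_acquaintance(friends, rng):
    """Pick a random person, then return one of that person's friends at random."""
    person = rng.choice(sorted(friends))
    return rng.choice(friends[person])


# Hypothetical star network: one hub connected to nine leaves.
hub = "hub"
leaves = [f"leaf{i}" for i in range(9)]
friends = {hub: leaves, **{leaf: [hub] for leaf in leaves}}

rng = random.Random(0)  # fixed seed so that runs are repeatable
picked = [random_acquaintance(friends, rng) for _ in range(1000)]

mean_degree = sum(len(fs) for fs in friends.values()) / len(friends)
mean_picked_degree = sum(len(friends[p]) for p in picked) / len(picked)

print(mean_degree)         # 1.8
print(mean_picked_degree)  # much larger: most picks land on the hub
```

The procedure needs no global knowledge of the network. Each step only asks one person to name a friend, which is what makes the strategy practical.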


Thanks! Questions?

B. Aditya Prakash
3160 F Torgersen
[email protected]
Homepage, for more details and papers: http://www.cs.vt.edu/~badityap