
Transforming Big Data into Smart Data


Page 2: Transforming Big Data into Smart Data

Ohio Center of Excellence in Knowledge-enabled Computing

• Shares 2nd position among all universities in the world in World Wide Web (cf: 5-yr impact, Microsoft Academic Search)

• Largest academic group in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications

• Exceptional student success: internships and jobs at top salaries (IBM Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups)

• 100 researchers, including 15 world-class faculty (>3K citations per faculty member) and 45+ PhD students, practically all funded

• $2M+/yr research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …)

Page 3: Transforming Big Data into Smart Data

2011

How much data?

48 (2013)

500 (2013)

http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/

Page 4: Transforming Big Data into Smart Data

1% of the data is used for analysis.

http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode

http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume

Page 5: Transforming Big Data into Smart Data

Variety

Semi structured

5

Page 6: Transforming Big Data into Smart Data

Velocity

Fast Data

Rapid Changes

Real-Time/Stream Analysis

Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail

Page 7: Transforming Big Data into Smart Data

• Focus on verticals: advertising, social media, retail, financial services, telecom, and healthcare

– Aggregate data, focused on transactions, limited integration (limited complexity), analytics to find (simple) patterns

– Emphasis on technologies to handle volume/scale, and to a lesser extent velocity: Hadoop, NoSQL, MPP warehouses, …

– Full faith in the power of data (no hypothesis), bottom up analysis

7

Current Focus on Big Data

Page 8: Transforming Big Data into Smart Data

• What if your data volume gets so large and varied you don't know how to deal with it?

• Do you store all your data?

• Do you analyze it all?

• How can you find out which data points are really important?

• How can you use it to your best advantage?

8

Questions typically asked on Big Data

http://www.sas.com/big-data/

Page 9: Transforming Big Data into Smart Data

http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/

Variety of Data Analytics Enablers

9

Page 10: Transforming Big Data into Smart Data

• Prediction of the spread of flu in real time during H1N1 2009

– Google tested a mammoth 450 million different mathematical models of search terms, comparing their predictions against the actual flu cases; 45 important parameters were found

– The model was tested when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]

• FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]

• NY city manholes problem [ICML Discussion, 2012]

10

Illustrative Big Data Applications

Page 11: Transforming Big Data into Smart Data

• Current focus is mainly on serving business intelligence and targeted analytics needs, not on serving complex individual and collective human needs (e.g., empowering humans in health, fitness and well-being; better disaster coordination; personalized smart energy) that are highly personalized/individualized/contextualized

– Incorporate real-world complexity: the multi-modal and multi-sensory nature of the real world and of human perception

– Need a deeper understanding of data and its role in information (e.g., skew, coverage)

• Human involvement and guidance: leading to actionable information, understanding and insight right in the context of human activities

– Bottom-up & top-down processing: infusion of models and background knowledge (data + knowledge + reasoning)

11

What is missing?

Page 12: Transforming Big Data into Smart Data

Makes Sense

Actionable or help decision support/making

12

Page 13: Transforming Big Data into Smart Data

13

Before Definition – A short recap of SMART DATA

2004-2005

Notice the formulation of Smart Data strategy providing services for Search, Explore, Notify

Page 14: Transforming Big Data into Smart Data

14

Semagix – A short recap of SMART DATA

Use of ontologies and data repositories to gain relevant insights

Page 15: Transforming Big Data into Smart Data

Smart Data

Smart data makes sense out of Big data

It provides value by harnessing the challenges posed by the volume, velocity, variety and veracity of big data, in turn providing actionable information and improving decision making.

15

Page 16: Transforming Big Data into Smart Data

“OF human, BY human and FOR human”

Smart data is focused on the actionable value achieved by human involvement in the data creation, processing and consumption phases for improving the human experience.

Another perspective on Smart Data

16

Page 17: Transforming Big Data into Smart Data

“OF human, BY human and FOR human”

Another perspective on Smart Data

18

Page 18: Transforming Big Data into Smart Data

Petabytes of Physical (sensory)-Cyber-Social data every day! More on PCS Computing: http://wiki.knoesis.org/index.php/PCS

‘OF human’ : Relevant Real-time Data Streams for Human Experience

Page 19: Transforming Big Data into Smart Data

“OF human, BY human and FOR human”

20

Another perspective on Smart Data

Page 20: Transforming Big Data into Smart Data

Use of Prior Human-created Knowledge Models

21

‘BY human’: Involving Crowd Intelligence in data processing workflows

Crowdsourcing and Domain-expert guided Machine Learning Modeling

Page 21: Transforming Big Data into Smart Data

“OF human, BY human and FOR human”

Another perspective on Smart Data

22

Page 22: Transforming Big Data into Smart Data

Detection of events, such as wheezing sound, indoor temperature, humidity, dust, and CO2 level

Weather Application

Asthma Healthcare Application

Close the window at home during the day to keep CO2 from gushing in, and so avoid asthma attacks at night

23

‘FOR human’ : Improving Human Experience

Population Level

Personal

Public Health

Action in the Physical World

Page 23: Transforming Big Data into Smart Data

Electricity usage over a day, device at work, power consumption, cost/kWh, heat index, relative humidity, and public events from social stream

Weather Application

Power Monitoring Application

24

‘FOR human’ : Improving Human Experience

Population Level Observations

Personal Level Observations

Action in the Physical World

Washing and drying have resulted in significant cost because they were done during the peak load period. Consider moving them to night.

Page 24: Transforming Big Data into Smart Data

25

Why do we care about Smart Data rather than Big Data?

Page 26: Transforming Big Data into Smart Data

Second-costliest hurricane in United States history; estimated damage $75 billion

90-115 mph winds

State of Emergency in New York

285 people killed along the track of Sandy

750,000 without power (NY)

Immense devastation and Human suffering

27

Big Data to Smart Data: Disaster Management example

http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html

Page 27: Transforming Big Data into Smart Data

20 million tweets with “sandy, hurricane” keywords between Oct 27th and Nov 1st

2nd most popular topic on Facebook during 2012

Social (Big) Data during Hurricane Sandy

28

• http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding

• http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html

• http://mashable.com/2012/10/31/hurricane-sandy-facebook/

Page 28: Transforming Big Data into Smart Data

For information seeking

For timely information

For unique information

For unfiltered information

To determine disaster magnitude

To check in with family and friends

To self-mobilize

To maintain a sense of community

To seek emotional support and healing

Governments

Emergency management organizations

Journalists

Disaster responders

Public

BIG DATA TO SMART DATA: WHY? and FOR WHOM?

29

Fraustino et al. Social Media Use during Disasters: A Review of the Knowledge Base and Gaps. US Dept. of Homeland Security, START 2012.

Page 29: Transforming Big Data into Smart Data

Improving situational awareness - Timely delivery of necessary information to the right people

Improving coordination between resource seekers and suppliers

Detecting the magnitude of a disaster from people's sentiments.

Many more challenges…

Can SNSs make disaster management easier? Giving actionable information (Smart Data)

30

http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec

http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html

http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html

Page 30: Transforming Big Data into Smart Data

Volume

Twitter hits half a billion tweets a day!

Challenges

Delivering the necessary/actionable information to the right people

http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/

http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US

Page 31: Transforming Big Data into Smart Data

Velocity

Volume

The @ConEdison Twitter handle, which the company had set up only in June, gained an extra 16,000 followers during the storm. Did the information reach everyone?

Challenges

Delivering the necessary/actionable information to the right people

Rate of data arrival: approximately 7000 TPS (tweets per second); 10 images per second on Instagram

32

http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf

Page 32: Transforming Big Data into Smart Data

http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US

Velocity

Variety

Volume

Semi Structured

Structured

Unstructured

Sensors Linked Open Data

Wikipedia

Challenges

Delivering the necessary/actionable information to the right people

33

Page 34: Transforming Big Data into Smart Data

Velocity

Variety

Veracity

Volume

35

Page 37: Transforming Big Data into Smart Data

Descriptive Exploratory Inferential Predictive

Causal

Human Centric Computing

Improved Analytics Creation

Processing

Experience

38

Page 38: Transforming Big Data into Smart Data

• Healthcare – kHealth

– SemHealth

• Social event coordination – Twitris

• Traffic monitoring – kTraffic

39

Applications of Smart Data Analytics

Page 40: Transforming Big Data into Smart Data

To gain new insight in patient care & early indications of disease

41

Smart Data in Healthcare

Page 41: Transforming Big Data into Smart Data

Sensing is a key enabler of the Internet of Things

BUT, how do we make sense of the resulting avalanche of sensor data?

50 Billion Things by 2020 (Cisco)

42

Page 42: Transforming Big Data into Smart Data

Parkinson’s disease (PD) data from The Michael J. Fox Foundation for Parkinson’s Research.

1 https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data

8 weeks of data from 5 sensors on a smartphone, collected for 16 patients, resulting in ~12 GB (with a lot of missing data).

Variety, Volume, Veracity, Velocity

Value: Can we detect the onset of Parkinson’s disease? Can we characterize the disease progression? Can we provide actionable information to the patient?

Semantics: Representing prior knowledge of PD led to a focused exploration of this massive dataset

WHY Big Data to Smart Data: Healthcare example

Page 43: Transforming Big Data into Smart Data

44

Big Data to Smart Data Using a Knowledge Based Approach

ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person)

ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person)

ParkinsonAdvanced(person) = Fall(person)
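The three rules can be read as plain Boolean conjunctions over detected symptoms. A minimal Python sketch, assuming each symptom has already been extracted from the sensor data as a true/false flag; the `Symptoms` class and `parkinson_stages` function are hypothetical names introduced here, not part of the original system:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symptoms:
    """Symptom flags for one person (assumed to come from sensor analysis)."""
    tremor: bool = False
    poor_balance: bool = False
    move_slow: bool = False
    poor_sleep: bool = False
    monotone_speech: bool = False
    fall: bool = False


def parkinson_stages(s: Symptoms) -> List[str]:
    """Return every stage whose rule is satisfied by the observed symptoms."""
    stages = []
    # ParkinsonMild = Tremor AND PoorBalance
    if s.tremor and s.poor_balance:
        stages.append("ParkinsonMild")
    # ParkinsonModerate = MoveSlow AND PoorSleep AND MonotoneSpeech
    if s.move_slow and s.poor_sleep and s.monotone_speech:
        stages.append("ParkinsonModerate")
    # ParkinsonAdvanced = Fall
    if s.fall:
        stages.append("ParkinsonAdvanced")
    return stages
```

The rules are evaluated independently, so one person may satisfy more than one of them; how such overlaps should be resolved is not stated on the slide.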

Control group: The movements of an active person have a good distribution over the X, Y, and Z axes. Audio is well modulated, with good variations in the energy of the voice.

PD patients: Restricted movements by a PD patient can be seen in the acceleration readings. Audio is not well modulated, representing monotone speech.

Declarative knowledge of Parkinson’s disease is used to focus our attention on symptom manifestations in sensor observations

Page 44: Transforming Big Data into Smart Data

• 25 million people in the U.S. are diagnosed with asthma (7 million are children) [1].

• 300 million people suffer from asthma worldwide [2].

• Asthma-related healthcare costs alone are around $50 billion a year [2].

• 155,000 hospital admissions and 593,000 emergency department visits in 2006 [3].

[1] http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/

[2] http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html

[3] Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics, 123(Supplement 3), S131-S145.

Asthma: Severity of the problem

Page 45: Transforming Big Data into Smart Data

46

Patient Health Score (diagnostic)

Semantic Perception and risk assessment algorithms can transform raw data (hard to comprehend) into abstractions (e.g., Patient Health is 3 on a scale of 5) that are intuitively understandable and valuable for decision makers.

Having a health score for each patient allows efficient use of a decision maker’s precious attention

Risk assessment model

Semantic Perception

Population health record

Personal health record

Expert opinion

Clinical research

Clinical decision support

Page 46: Transforming Big Data into Smart Data

47

Patient Vulnerability Score (prognostic)

Clinical decision support systems, such as EMR alert systems, currently follow a high-recall philosophy, reporting every possible alert!

Doctors need actionable information, not a deluge of alerts, to make timely and important decisions. Providing a vulnerability score would help direct the doctor’s time toward investigating vulnerabilities further.

Risk assessment model

Semantic Perception

Population health record

Personal health record

Expert opinion

Clinical research

Clinical decision support

Page 47: Transforming Big Data into Smart Data

48

Value: Patient Context

How could Smart Data help?

Page 48: Transforming Big Data into Smart Data

49

Data Overload for Patients/health aficionados

Providing actionable information in a timely manner is crucial to avoid information overload or fatigue

Sleep data Community data

Personal Schedule Activity data

Personal health records

Page 49: Transforming Big Data into Smart Data

50

Optimizing Cost, Benefit, and Preferences

Algorithms on the patient side should consider all the health signals and provide actionable and timely information for informed decision making

What are the reasons for my increasing weight? What should I consider before I get a kidney transplant?

Semantic Perception

Personalized optimization

Personalized recommendation

Img: http://marloncarvallovillae.blogspot.com/2011_02_01_archive.html http://www.1800timeclocks.com/icon-time-systems/icon-time-upgrades/icon-time-advanced-pack-upgrade-sb100-pro/

Sleep data

Community data

Personal Schedule

Activity data

Personal health records

Page 50: Transforming Big Data into Smart Data

51

3.4 billion people will have smartphones or tablets by 2017 -- Research2Guidance

“Intelligence at the Edges” of Digital Health

http://www.digikey.com/us/en/techzone/energy-harvesting/resources/articles/zigbees-smart-energy-20-profile.html

m-health app market is predicted to reach $26 billion in 2017 -- Research2Guidance

Page 51: Transforming Big Data into Smart Data

Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels.

52

Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies.

Variety, Volume, Veracity, Velocity

Value: Can we detect the asthma severity level? Can we characterize asthma control level? What risk factors influence asthma control? What is the contribution of each risk factor?

Semantics: Understanding relationships between health signals and asthma attacks for providing actionable information

WHY Big Data to Smart Data: Healthcare example

Page 52: Transforming Big Data into Smart Data

53

Population Level

Personal

Public Health

Variety: Health signals span heterogeneous sources

Volume: Health signals are fine grained

Velocity: Real-time change in situations

Veracity: Reliability of health signals may be compromised

Value: Can I reduce my asthma attacks at night?

Decision support to doctors by providing them with deeper insights into patient asthma care

Asthma: Demonstration of Value

Page 53: Transforming Big Data into Smart Data

54

Sensordrone – for monitoring environmental air quality

Wheezometer – for monitoring wheezing sounds

Can I reduce my asthma attacks at night?

What are the triggers? What is the wheezing level?

What is the propensity toward asthma?

What is the exposure level over a day?

What is the air quality indoors?

Commute to Work

Personal

Public Health

Population Level

Closing the window at home in the morning and taking an alternate route to the office may lead to reduced asthma attacks

Actionable Information

Asthma: Actionable Information for Asthma Patients

Page 54: Transforming Big Data into Smart Data

Personal, Public Health, and Population Level Signals for Monitoring Asthma

Asthma Control => Daily Medication Choices (recommended action in each column)

Severity Level of Asthma | For starting therapy | Not Well Controlled | Poor Controlled

Intermittent Asthma | SABA prn | - | -

Mild Persistent Asthma | Low dose ICS | Medium ICS | Medium ICS

Moderate Persistent Asthma | Medium dose ICS alone or with LABA/montelukast | Medium ICS + LABA/montelukast or high dose ICS | Medium ICS + LABA/montelukast or high dose ICS*

Severe Persistent Asthma | High dose ICS with LABA/montelukast | Needs specialist care | Needs specialist care

ICS = inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA = inhaled short-acting beta2-agonist; *consider referral to specialist

Asthma Control and Actionable Information

Sensors and their observations for understanding asthma

55

Page 55: Transforming Big Data into Smart Data

56

Personal Level Signals

Societal Level Signals

(Personal Level Signals)

(Personalized Societal Level Signal)

(Societal Level Signals)

Societal Level Signals Relevant to the Personal Level

Personal Level Sensors

(kHealth**) (EventShop*)

Qualify Quantify Action

Recommendation

What are the features influencing my asthma?

What is the contribution of each of these features?

How controlled is my asthma? (risk score)

What will be my action plan to manage asthma?

Storage

Societal Level Sensors

Asthma Early Warning Model (AEWM)

Query AEWM

Verify & augment

domain knowledge

Recommended

Action

Action

Justification

Asthma Early Warning Model

*http://www.slideshare.net/jain49/eventshop-120721, ** http://www.youtube.com/watch?v=btnRi64hJp4

Page 56: Transforming Big Data into Smart Data

57

Population Level

Personal

Wheeze – Yes Do you have tightness of chest? –Yes

Observations Physical-Cyber-Social System Health Signal Extraction Health Signal Understanding

<Wheezing=Yes, time, location>

<ChestTightness=Yes, time, location>

<PollenLevel=Medium, time, location>

<Pollution=Yes, time, location>

<Activity=High, time, location>

Signals: Wheezing, ChestTightness, PollenLevel, Pollution, Activity, RiskCategory

<PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>

<2, 1, 1, 3, 1, RiskCategory>

<2, 1, 1, 3, 1, RiskCategory>

<2, 1, 1, 3, 1, RiskCategory>

<2, 1, 1, 3, 1, RiskCategory>

…

Expert Knowledge

Background Knowledge

Tweet reporting pollution level and asthma attacks

Acceleration readings from on-phone sensors

Sensor and personal observations

Signals from personal, personal spaces, and community spaces

Risk Category assigned by doctors

Qualify

Quantify

Enrich

Outdoor pollen and pollution

Public Health

Health Signal Extraction to Understanding

Well Controlled – continue

Not Well Controlled – contact nurse

Poor Controlled – contact doctor
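One way to picture the extraction step: each qualitative observation is mapped to a number and placed in the slide's feature order, and the doctor-assigned risk category selects the follow-up action. This is a minimal sketch. The codes Medium=2, Yes=1 and High=3 are inferred from the slide's example vector <2, 1, 1, 3, 1>; Low=1 and No=0 are assumptions, and all function and variable names are hypothetical.

```python
# Feature order taken from the slide:
# <PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>
FEATURE_ORDER = ("PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing")

# Numeric codes: Medium/High/Yes match the slide's example; Low and No are assumed.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}
YES_NO = {"No": 0, "Yes": 1}

# Follow-up action per risk category, as listed on the slide.
ACTIONS = {
    "Well Controlled": "continue",
    "Not Well Controlled": "contact nurse",
    "Poor Controlled": "contact doctor",
}


def encode(observations):
    """Turn a dict of qualitative health signals into a numeric feature tuple."""
    vector = []
    for name in FEATURE_ORDER:
        value = observations[name]
        if value in YES_NO:
            vector.append(YES_NO[value])
        else:
            vector.append(LEVELS[value])  # raises KeyError on unknown values
    return tuple(vector)


def recommended_action(risk_category):
    """Look up the follow-up action for a doctor-assigned risk category."""
    return ACTIONS[risk_category]


observations = {
    "Wheezing": "Yes",
    "ChestTightness": "Yes",
    "PollenLevel": "Medium",
    "Pollution": "Yes",
    "Activity": "High",
}
features = encode(observations)
```

Vectors like `features`, paired with risk categories assigned by doctors, are the kind of labelled rows the slide shows feeding the understanding step.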

Page 57: Transforming Big Data into Smart Data

58

RDF OWL

How are machines supposed to integrate and interpret sensor data?

Semantic Sensor Networks (SSN)

Page 58: Transforming Big Data into Smart Data

59

W3C Semantic Sensor Network Ontology

Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).

Page 59: Transforming Big Data into Smart Data

60

W3C Semantic Sensor Network Ontology


Page 60: Transforming Big Data into Smart Data

61

W3C Semantic Sensor Network Ontology


Page 61: Transforming Big Data into Smart Data

62

Semantic Annotation of SWE


Page 62: Transforming Big Data into Smart Data

… and do it efficiently and at scale

Next: What if we could automate the sense making ability?

63

Page 63: Transforming Big Data into Smart Data

People are good at making sense of sensory input

What can we learn from cognitive models of perception?

• The key ingredient is prior knowledge

64

Page 64: Transforming Big Data into Smart Data

Perception Cycle*

1 Explanation: translating low-level signals into high-level knowledge

2 Discrimination: focusing attention on those aspects of the environment that provide useful information

Cycle elements: Observe Property, Perceive Feature, Prior Knowledge

* based on Neisser’s cognitive model of perception

65

Page 65: Transforming Big Data into Smart Data

To enable machine perception,

Semantic Web technology is used to integrate sensor data with prior knowledge on the Web

66

Page 66: Transforming Big Data into Smart Data

Prior knowledge on the Web

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

67

Page 67: Transforming Big Data into Smart Data

Prior knowledge on the Web

W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph

68

Page 69: Transforming Big Data into Smart Data

Explanation

Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building

Inference to the best explanation

• In general, explanation is an abductive problem, and hard to compute

Finding the sweet spot between abduction and OWL

• Single-feature assumption* enables use of an OWL-DL deductive reasoner

* An explanation must be a single feature which accounts for all observed properties

70

Page 70: Transforming Big Data into Smart Data

Explanation

Explanatory Feature: a feature that explains the set of observed properties

ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}

[Figure: bipartite graph between properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), highlighting the observed properties and the explanatory features]
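Under the single-feature assumption, explanation reduces to a subset test: a feature is explanatory if every observed property is one of its properties. A minimal Python sketch follows. The property-feature edges in `KB` are illustrative assumptions, since the slide's graph edges are not recoverable from the text, and `explanatory_features` is a hypothetical name:

```python
# Prior knowledge as a bipartite graph: feature -> set of its properties.
# These edges are assumed for illustration only.
KB = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def explanatory_features(observed, kb=KB):
    """Return the features that account for ALL observed properties."""
    observed = set(observed)
    return {feature for feature, props in kb.items() if observed <= props}


# Observing only elevated blood pressure leaves every feature as a candidate;
# adding palpitations rules out Hypertension under the assumed edges.
candidates = explanatory_features({"elevated blood pressure", "palpitations"})
```

This set-based version stands in for the OWL-DL class definition; it is not the reasoner-based implementation the slides describe.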

71

Page 71: Transforming Big Data into Smart Data

Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features

Perception cycle, step 2 (Discrimination): focusing attention on those aspects of the environment that provide useful information

72

Page 72: Transforming Big Data into Smart Data

Discrimination

Expected Property: would be explained by every explanatory feature

ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}

[Figure: bipartite graph between properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), highlighting the expected properties and the explanatory features]

73

Page 73: Transforming Big Data into Smart Data

Discrimination

Not Applicable Property: would not be explained by any explanatory feature

NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}

[Figure: bipartite graph between properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), highlighting the not-applicable properties and the explanatory features]

74

Page 74: Transforming Big Data into Smart Data

Discrimination

Discriminating Property: is neither expected nor not-applicable

DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty

[Figure: bipartite graph between properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), highlighting the discriminating properties and the explanatory features]
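The three property classes can be sketched with ordinary set operations: expected properties belong to every explanatory feature, not-applicable properties belong to none, and discriminating properties are whatever remains. The edges in `KB` are again assumed for illustration, and `classify_properties` is a hypothetical name:

```python
# Feature -> set of its properties (illustrative edges, not from the slides).
KB = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def classify_properties(explanatory, kb=KB):
    """Split all known properties into (expected, not_applicable, discriminating)
    relative to the current set of explanatory features."""
    all_props = set().union(*kb.values())
    # Expected: a property of every explanatory feature.
    expected = {p for p in all_props if all(p in kb[f] for f in explanatory)}
    # Not applicable: a property of no explanatory feature.
    not_applicable = {p for p in all_props if not any(p in kb[f] for f in explanatory)}
    # Discriminating: neither expected nor not applicable.
    discriminating = all_props - expected - not_applicable
    return expected, not_applicable, discriminating


expected, not_applicable, discriminating = classify_properties(
    {"Hypertension", "Pulmonary Edema"}
)
```

Only the discriminating properties are worth observing next, since observing them narrows the set of explanatory features.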

75

Page 75: Transforming Big Data into Smart Data

Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information

canary in a coal mine

Our Motivation

kHealth: knowledge-enabled healthcare

76

Page 76: Transforming Big Data into Smart Data

Qualities -High BP -Increased Weight

Entities -Hypertension -Hypothyroidism

kHealth

Machine Sensors

Personal Input

EMR/PHR

Comorbidity risk score e.g., Charlson Index

Longitudinal studies of cardiovascular risks

- Find correlations - Validation - domain knowledge - domain expert

Parameterize the model

Risk Assessment Model

Current Observations -Physical -Physiological -History

Risk Score (Actionable Information)

Model Creation Validate correlations

Historical observations of each patient

Risk Score: from Data to Abstraction and Actionable Information

77

Page 77: Transforming Big Data into Smart Data

How do we implement machine perception efficiently on a resource-constrained device?

Use of an OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time

• Runs out of resources with prior knowledge >> 15 nodes

• Asymptotic complexity: O(n³)

78

Page 78: Transforming Big Data into Smart Data

intelligence at the edge

Approach 1: Send all sensor observations to the cloud for processing

Approach 2: downscale semantic processing so that each device is capable of machine perception

79

Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices', ISWC 2012.

Page 79: Transforming Big Data into Smart Data

Efficient execution of machine perception

Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning

0101100011010011110010101100011011011010110001101001111001010110001101011000110100111
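A rough sketch of the bit vector idea, not the encoding from Henson et al.: each property is stored as an integer whose bit i is set when the property belongs to feature i, so explanation becomes a chain of bitwise ANDs. The knowledge-base edges and all names below are assumptions made for illustration:

```python
FEATURES = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]

# Feature -> properties (illustrative edges, not from the slides).
KB = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def build_property_vectors(kb, features):
    """Encode prior knowledge: property -> int with bit i set if the
    property belongs to features[i]."""
    vectors = {}
    for i, feature in enumerate(features):
        for prop in kb[feature]:
            vectors[prop] = vectors.get(prop, 0) | (1 << i)
    return vectors


def explain(observed, vectors, features):
    """AND the bit vectors of all observed properties; the surviving bits
    mark the features that account for every observation."""
    mask = (1 << len(features)) - 1  # start with every feature as a candidate
    for prop in observed:
        mask &= vectors.get(prop, 0)  # unknown property: no feature explains it
    return [f for i, f in enumerate(features) if (mask >> i) & 1]


VECTORS = build_property_vectors(KB, FEATURES)
result = explain(["elevated blood pressure", "palpitations"], VECTORS, FEATURES)
```

Each observation costs one AND on a machine word (or a few words for larger knowledge bases), which illustrates why this style of encoding suits resource-constrained devices.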

80

Page 80: Transforming Big Data into Smart Data

OWL reasoner: O(n³) < x < O(n⁴); bit vector approach: O(n)

Efficiency Improvement

• Problem size increased from 10’s to 1000’s of nodes

• Time reduced from minutes to milliseconds

• Complexity growth reduced from polynomial to linear

Evaluation on a mobile device

81

Page 81: Transforming Big Data into Smart Data

Semantic Perception for smarter analytics: 3 ideas to take away

1 Translate low-level data to high-level knowledge: Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making

2 Prior knowledge is the key to perception: Using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web

3 Intelligence at the edge: By downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices

82

Page 82: Transforming Big Data into Smart Data

• Real Time Feature Streams: http://www.youtube.com/watch?v=_ews4w_eCpg

• kHealth: http://www.youtube.com/watch?v=btnRi64hJp4

83

Demos

Page 83: Transforming Big Data into Smart Data

84

Smart Data in Social Media Analytics

To Understand the human social dynamics in real world events

Page 84: Transforming Big Data into Smart Data

0.5B Tweets per day

0.5B Users

60% on Mobile

5530 Tweets per second related to the Japan earthquake and tsunami

17000 Tweets per second

85

Twitter During Real-world Events of Interest

http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/

http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter

http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/

Page 86: Transforming Big Data into Smart Data

State of the Art – Uni/Bi Dimensional Analysis During Elections

Topics

Sentiments

87

Page 88: Transforming Big Data into Smart Data

http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249

http://semanticweb.com/election-2012-the-semantic-recap_b33278

Page 89: Transforming Big Data into Smart Data

90

[The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST]


Page 90: Transforming Big Data into Smart Data

91

Twitris: Sentiment Analysis- Smart Answers with reasoning!

How was Obama doing in the first debate?

Page 91: Transforming Big Data into Smart Data

92

Red Color: Negative Topics Green Color: Positive Topics

Twitris: Sentiment Analysis- Smart Answers with reasoning!

How was Obama doing in the second debate?

SMART DATA IS ABOUT ANALYSIS FOR REASONING (what caused the positive sentiment for Democrats) BEHIND THE REAL-WORLD ACTIONS (Democrats’ win)

http://knoesis.wright.edu/library/resource.php?id=1787

Page 92: Transforming Big Data into Smart Data

Top 100 influential users who talk about Barack Obama

Positive or Negative Influence

Twitris: Network Analysis

SMART DATA TELLS YOU HOW A SYSTEM CAN BE TWEAKED FOR THE DESIRED ACTIONS!

Could we engage with (targeted) users with extreme polarity leaning for Obama to spark an agenda in the whole network of voters (ACTION)?

Page 93: Transforming Big Data into Smart Data

Twitris: Community Evolution

SMART DATA FOCUSES ON THE CAUSALITY OF CHANGES IN REAL-WORLD ACTIONS!

Romney

Obama

Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates

Before 1st debate

After 1st debate

After Hurricane Sandy

After 3rd debate

94

Page 94: Transforming Big Data into Smart Data

The Dead People mentioned in the event OWC

Twitris: Impact of Background Knowledge

95

Page 95: Transforming Big Data into Smart Data

How People from Different parts of the world talked about US Election

Images and Videos Related to US Election

Twitris: Analysis by Location

96

Page 96: Transforming Big Data into Smart Data

What is Smart Data in the context of Disaster Management

ACTIONABLE: Timely delivery of the right resources and information to the right people at the right location!

97

Because everyone wants to Help, but DOESN’T KNOW HOW!

Page 97: Transforming Big Data into Smart Data

Join us for the Social Good! http://twitris.knoesis.org

RT @OpOKRelief: Southgate Baptist Church on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you can't go,call 794

Text "FOOD" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10 in storm relief. #moore #oklahoma #disasterrelief #donate

Want to help animals in #Oklahoma? @ASPCA tells how you can help: http://t.co/mt8l9PwzmO

CITIZEN SENSORS

RESPONSE TEAMS (including humanitarian org. and ‘pseudo’ responders)

VICTIM SITE

Coordination of needs and offers Using Social Media

Does anyone know where to send a check to donate to the tornado victims?

Where do I go to help out for volunteer work around Moore? Anyone know?

Anyone know where to donate to help the animals from the Oklahoma disaster? #oklahoma #dogs

Matched

Matched

Matched

Serving the need!

If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info

http://www.slideshare.net/hemant_knoesis/cscw-2012-hemantpurohit-11531612

98

Purohit et al. Framework to Analyze Coordination in Crisis Response, 2012.

Int’l collaboration in progress with QCRI

Page 98: Transforming Big Data into Smart Data

Smart Data from Twitris system for Disaster Response Coordination

Which are the primary locations with most negative sentiments/emotions?

Who are all the people to engage with for better information diffusion?

Which are the most important organizations acting at my location?

Who are the resource seekers and suppliers? How can one donate?

Smart data provides actionable information and improves decision making through semantic analysis of Big Data.

99

Page 100: Transforming Big Data into Smart Data

Disaster Response Coordination: Twitris Summary for Actionable Nuggets

101

Important tags to summarize Big Data flow related to Oklahoma tornado

Images and Videos Related to Oklahoma tornado

Page 101: Transforming Big Data into Smart Data

102

Disaster Response Coordination: Twitris Real-time information for needs

Incoming Tweets with need types to give a quick idea of what is needed and where currently #OKC

Legends for Different needs #OKC

(This is a real-time widget for monitoring needs, so it will not be active after the event has passed) http://twitris.knoesis.org/oklahomatornado

Page 102: Transforming Big Data into Smart Data

103

Disaster Response Coordination: Influencers to engage with for specific needs

Influential users for the respective needs, with their interaction network on the right.

Page 103: Transforming Big Data into Smart Data

Really sparse Signal to Noise:

• 2M tweets during the first week after #Oklahoma-tornado-2013

– 1.3% as the highly precise donation requests to help

– 0.02% as the highly precise donation offers to help

104

• Anyone know how to get involved to help the tornado victims in Oklahoma?? #tornado #oklahomacity (OFFER)

• I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)

Disaster Response Coordination: Finding Actionable Nuggets for Responders to act

• Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST)

• Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST)

For responders, the most important information is the scarcity and availability of resources. Can we mine it via Social Media?

Page 104: Transforming Big Data into Smart Data

• Features driven by the experience of domain experts at the responder organizations

• Examples:

– ‘I want to <donate/ help/ bring>’ for extraction of offering intention

– ‘tent house’ OR ‘cots’ for shelter need types

105

Disaster Response Coordination: Human Knowledge to drive information extraction
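The expert-driven features on this slide can be sketched as simple pattern rules. This is a minimal illustration only: the two regular expressions and the function name `extract_features` are assumptions built from the slide's two examples, not the feature set used in Twitris.

```python
import re

# Hypothetical patterns derived from the slide's examples; a real system
# would rely on a much richer, expert-curated inventory of features.
OFFER_PATTERN = re.compile(r"\bi want to (donate|help|bring)\b", re.IGNORECASE)
SHELTER_PATTERN = re.compile(r"\b(tent house|cots)\b", re.IGNORECASE)


def extract_features(tweet: str) -> dict:
    """Return simple boolean features for a single tweet."""
    return {
        "offer_intent": bool(OFFER_PATTERN.search(tweet)),
        "shelter_need": bool(SHELTER_PATTERN.search(tweet)),
    }


offer_features = extract_features(
    "I want to donate to the Oklahoma cause shoes clothes even food if I can"
)
shelter_features = extract_features("We need cots and a tent house near Moore")
```

The first tweet is the OFFER example from the slides; the second is an invented sentence used only to exercise the shelter pattern.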

Page 105: Transforming Big Data into Smart Data

• A knowledge-driven approach

– A rich inventory of metadata for tweets

– Semantic matching for needs (query) vs. offers (documents)

• Example:

– @bladesofmilford please help get the word out,we are accepting kid clothes to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz (REQUEST)

– I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER)

106

Disaster Response Coordination: Automatic Matching of needs and offers

Matching the competitive intentions (Needs and Offers) can relieve humans of the task of resource matchmaking for coordination.
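The "needs (query) vs. offers (documents)" framing can be sketched as a small ranking routine. This is a sketch under stated assumptions: the lexicon `RESOURCE_TERMS` is hypothetical, and plain resource-type overlap (Jaccard similarity) stands in for the semantic matching named on the slide.

```python
import re

# Hypothetical lexicon mapping surface terms to resource types.
RESOURCE_TERMS = {
    "clothes": "clothing",
    "shoes": "clothing",
    "food": "food",
    "water": "food",
    "cots": "shelter",
}


def resource_types(tweet: str) -> set:
    """Collect the resource types mentioned in a tweet."""
    words = re.findall(r"[a-z]+", tweet.lower())
    return {RESOURCE_TERMS[word] for word in words if word in RESOURCE_TERMS}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_offers(request: str, offers: list) -> list:
    """Rank offers (documents) against a request (query), best match first."""
    needed = resource_types(request)
    scored = [(jaccard(needed, resource_types(offer)), offer) for offer in offers]
    matches = [pair for pair in scored if pair[0] > 0]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


request = (
    "@bladesofmilford please help get the word out,we are accepting kid clothes "
    "to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz"
)
offers = [
    "I want to donate to the Oklahoma cause shoes clothes even food if I can",
    "Anyone know how to get involved to help the tornado victims in Oklahoma??",
]
ranked = rank_offers(request, offers)
```

Here the clothing request matches the first offer, while the second offer names no resource and is dropped. A knowledge-driven system would replace the flat lexicon with richer tweet metadata.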

Page 106: Transforming Big Data into Smart Data

107

Disaster Response Coordination: Engagement Interface for responders

What-Where-How-Who-Why Coordination

Influential users to engage with and resources for seekers/suppliers at a location, at a timestamp

Contextual Information for chosen topical tags

Page 107: Transforming Big Data into Smart Data

• Illustrative scenario: #Oklahoma-tornado 2013

108

Disaster Response Coordination: Anecdote for the value of Smart Data

FEMA asked us to quickly filter out gas-leak related data

Mining the data for smart nuggets to inform FEMA (Timely needs)

Engaged with the author of this information to confirm (Veracity)

e.g., All gas leaks in #moore were capped and stopped by 11:30 last night (at 5/22/2013 1:41:37)

Lots of tweets on ‘how to/where to’ assist (‘pseudo’ responders)

e.g., I want to go to Oklahoma this weekend & do what i can to help those people with food,cloths & supplies,im in the feel of wanting to help ! :)

Page 108: Transforming Big Data into Smart Data

An event is a dynamic topic that evolves and might later fork into several distinct events.

Smart Data analytics to capture rapidly evolving social data events

109

Social Media is the pulse of the populace, a true reflection of events all over the globe!

Page 109: Transforming Big Data into Smart Data

Continuous Semantics

110

Page 110: Transforming Big Data into Smart Data

Dynamic Model Creation

Continuous Semantics 111

Page 111: Transforming Big Data into Smart Data

Dynamic Model Creation:

112

Example of how background knowledge helps in understanding the situation described in the tweets, while also updating the knowledge model

Page 112: Transforming Big Data into Smart Data

How is Continuous Semantics a form of Smart Data Analytics?

Keeping the Background Knowledge abreast of the changes in the event

Smartly learning and adapting data acquisition (Temporally apt Big Data, i.e. Fast Data)

In turn providing temporally relevant Smart Data through analysis

113

Page 113: Transforming Big Data into Smart Data

114

Smart Data Analytics in Traffic Management

To improve everyday life, which is hampered by one of our most common problems: getting stuck in traffic

Page 114: Transforming Big Data into Smart Data

By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001) [1]

[1] The Crisis of Public Transport in India [2] IBM Smarter Traffic

Modes of transportation in Indian Cities

Texas Transportation Institute (TTI) Congestion report in U.S.

115

Severity of the Traffic Problem

Page 115: Transforming Big Data into Smart Data

Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical) and incident reports (textual)

116

http://511.org/

Per-minute updates of speed, volume, travel time, and occupancy, resulting in 178 million link status observations, 738 active events, and 146 scheduled events, with many unevenly sampled observations collected over 3 months.

Variety, Volume, Veracity, Velocity

semantics

Value: Can we detect the onset of traffic congestion? Can we characterize traffic congestion based on events? Can we provide actionable information to decision makers?

Representing prior knowledge of traffic leads to a focused exploration of this massive dataset

Big Data to Smart Data: Traffic Management example

Page 116: Transforming Big Data into Smart Data

[Figure labels: Slow moving traffic; Link Description; Scheduled Event; Schedule Information; Traffic Monitoring; source: 511.org]

117

Heterogeneity in a Physical-Cyber-Social System

Page 117: Transforming Big Data into Smart Data

118

Heterogeneity in a Physical-Cyber-Social System

Page 118: Transforming Big Data into Smart Data

• Observation: Slow Moving Traffic

• Multiple Causes (Uncertain about the cause):

– Scheduled Events: music events, fair, theatre events, concerts, road work, repairs, etc.

– Active Events: accidents, disabled vehicles, break down of roads/bridges, fire, bad weather, etc.

– Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm

• Each of these events may have a varying impact on traffic.

• A delay prediction algorithm should process multimodal and multi-sensory observations.

Uncertainty in a Physical-Cyber-Social System
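The uncertainty described on this slide can be made concrete by listing every candidate cause for one slow-traffic observation. This is a minimal sketch: the function names are hypothetical, the peak windows come from the slide's example hours, and treating them as half-open intervals is an assumption.

```python
from datetime import datetime

# Peak windows from the slide's example (7 am - 9 am, 4 pm - 6 pm),
# treated here as half-open hour ranges; that boundary choice is an assumption.
PEAK_WINDOWS = ((7, 9), (16, 18))


def is_peak_hour(timestamp: datetime) -> bool:
    """True when the timestamp falls inside one of the peak windows."""
    return any(start <= timestamp.hour < end for start, end in PEAK_WINDOWS)


def candidate_causes(timestamp, scheduled_events, active_events):
    """List every plausible cause for a slow-traffic observation."""
    causes = [f"scheduled: {event}" for event in scheduled_events]
    causes += [f"active: {event}" for event in active_events]
    if is_peak_hour(timestamp):
        causes.append("peak hour")
    return causes


causes = candidate_causes(datetime(2013, 5, 22, 8, 30), ["concert"], ["accident"])
```

Three candidates survive for a single observation, which is why a delay prediction algorithm has to weigh several kinds of evidence rather than pick one cause by rule.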

119

Page 119: Transforming Big Data into Smart Data

• Internal observations

– Speed, volume, and travel time observations

– Correlations may exist between these variables across different parts of the network

• External events

– Accident, music event, sporting event, and planned events

– External events and internal observations may exhibit correlations

Modeling Traffic Events

120

Page 120: Transforming Big Data into Smart Data

Accident

Music event

Sporting event

Road Work

Theatre event

External events <ActiveEvents, ScheduledEvents>

Internal observations <speed, volume, travelTime>

Weather

Time of Day

Modeling Traffic Events

121

Page 121: Transforming Big Data into Smart Data

Domain Experts

cold

PoorVisibility

SlowTraffic

IcyRoad

Declarative domain knowledge

Causal knowledge

Linked Open Data

Cold (YES/NO)   IcyRoad (ON/OFF)   PoorVisibility (YES/NO)   SlowTraffic (YES/NO)
1               0                  1                         1
1               1                  1                         0
1               1                  1                         1
1               0                  1                         0

Domain Observations

Domain Knowledge

Structure and parameters

Complementing Probabilistic Models with Declarative Knowledge

123

Correlations to causations using Declarative knowledge on the Semantic Web
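The split between structure (from domain knowledge) and parameters (from domain observations) can be sketched with the four observation rows shown on this slide. This is an illustration only: the parent sets in `PARENTS` are an assumed structure, and the estimator is plain maximum-likelihood counting, not necessarily the method used by the authors.

```python
from collections import Counter

COLUMNS = ("Cold", "IcyRoad", "PoorVisibility", "SlowTraffic")

# The four observation rows shown on the slide.
OBSERVATIONS = [
    (1, 0, 1, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
    (1, 0, 1, 0),
]

# Assumed structure (child -> parents), standing in for declarative knowledge.
PARENTS = {
    "IcyRoad": ["Cold"],
    "SlowTraffic": ["IcyRoad", "PoorVisibility"],
}


def conditional_table(child, parents, rows):
    """Maximum-likelihood estimate of P(child = 1 | parents) from counts."""
    index = {name: position for position, name in enumerate(COLUMNS)}
    totals, positives = Counter(), Counter()
    for row in rows:
        key = tuple(row[index[parent]] for parent in parents)
        totals[key] += 1
        positives[key] += row[index[child]]
    return {key: positives[key] / totals[key] for key in totals}


tables = {
    child: conditional_table(child, parents, OBSERVATIONS)
    for child, parents in PARENTS.items()
}
```

With only four rows, every estimate comes out at 0.5, and parent configurations that never occur (for example Cold = 0) get no estimate at all. That gap is one reason to bring in prior knowledge alongside the observations.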

Page 122: Transforming Big Data into Smart Data

• Declarative knowledge about various domains is increasingly being published on the web [1, 2].

• Declarative knowledge describes concepts and relationships in a domain (structure).

• Linked Open Data may be used to derive prior probabilities of events (parameters).

• We explored the use of declarative knowledge for structure using ConceptNet 5.

[1] http://conceptnet5.media.mit.edu/ [2] http://linkeddata.org/

Domain Knowledge

124

Page 123: Transforming Big Data into Smart Data

http://conceptnet5.media.mit.edu/web/c/en/traffic_jam

[Figure: ConceptNet 5 relations around "traffic jam", linked to the model variables]

Relations shown: go to baseball game Causes traffic jam; traffic accident Causes traffic jam; traffic jam CapableOf slow traffic; traffic jam CapableOf occur twice each day; bad weather CapableOf slow traffic; road ice Causes accident; go to concert HasSubevent car crash; accident RelatedTo car crash

Variables linked via is_a: Delay, ActiveEvent, ScheduledEvent, TimeOfDay, BadWeather

ConceptNet 5

125

Page 124: Transforming Big Data into Smart Data

Three Operations: Complementing graphical model structure extraction

Traffic data from sensors deployed on road network in San Francisco Bay Area

Knowledge from ConceptNet5:

go to baseball game Causes traffic jam

traffic jam CapableOf occur twice each day

traffic jam CapableOf slow traffic

bad weather CapableOf slow traffic

Operations:

• Add missing random variables

• Add missing links

• Add link direction

[Figure: the graphical model over traffic jam, baseball game, time of day, bad weather, and slow traffic, shown after each operation; other labels: Link Description, Scheduled Event]
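The three operations can be sketched as a single pass over knowledge triples. This is a hypothetical illustration: `enrich_structure`, the concept-to-variable mapping, and the rule that both Causes and CapableOf orient a link from subject to object are all assumptions, not the authors' algorithm.

```python
def enrich_structure(nodes, undirected_edges, triples, concept_to_node):
    """Apply the three operations to a statistically extracted structure.

    Each (subject, relation, object) triple is read as subject -> object.
    """
    nodes = set(nodes)
    undirected = {frozenset(edge) for edge in undirected_edges}
    directed = set()
    for subject, _relation, obj in triples:
        source = concept_to_node.get(subject)
        target = concept_to_node.get(obj)
        if source is None or target is None or source == target:
            continue  # the triple does not map onto two distinct variables
        nodes.update((source, target))                   # add missing random variables
        undirected.discard(frozenset((source, target)))  # add link direction
        directed.add((source, target))                   # add missing links
    return nodes, undirected, directed


# Structure assumed to come from statistical extraction over sensor data.
stat_nodes = {"baseball game", "traffic jam", "time of day", "slow traffic"}
stat_edges = [("baseball game", "traffic jam")]

# Triples quoted on the slides.
triples = [
    ("go to baseball game", "Causes", "traffic jam"),
    ("bad weather", "CapableOf", "slow traffic"),
    ("traffic jam", "CapableOf", "slow traffic"),
]

# Hypothetical mapping from ConceptNet concepts to model variables.
concept_to_node = {
    "go to baseball game": "baseball game",
    "traffic jam": "traffic jam",
    "bad weather": "bad weather",
    "slow traffic": "slow traffic",
}

nodes, undirected, directed = enrich_structure(
    stat_nodes, stat_edges, triples, concept_to_node
)
```

The triple "traffic jam CapableOf occur twice each day" is left out here because mapping it onto the time-of-day variable, and choosing its direction, needs a judgment this sketch does not encode.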

126

Page 125: Transforming Big Data into Smart Data

127

Structure extracted from traffic observations (sensors + textual) using statistical techniques

[Figure nodes: Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume]

Enriched structure, which has link directions and new nodes such as “Bad Weather”, potentially leading to better delay predictions

[Figure nodes: Scheduled Event, Active Event, Day of week, Time of day, delay, Travel time, speed, volume, Bad Weather]

Enriched Probabilistic Models using ConceptNet 5

Page 126: Transforming Big Data into Smart Data

Take Away

• It is all about the human – not computing, not device

– Computing for human experience

• Whatever we do in Smart Data, focus on human-in-the-loop (empowering machine computing!):

– Of Human, By Human, For Human

– But in serving human needs, there is a lot more than what current big data analytics handle – variety, contextual, personalized, subjective, spanning data and knowledge across P-C-S dimensions

129

Page 127: Transforming Big Data into Smart Data

Acknowledgements

• Kno.e.sis team

• Funds: NSF, NIH, AFRL, Industry…

• Note:

• For images and sources, if not on slides, please see slide notes

• Some images were taken from Web search results; all such images belong to their respective owners, and we are grateful to the owners for the usefulness of these images in our context.

130

Page 128: Transforming Big Data into Smart Data

• OpenSource: http://knoesis.org/opensource

• Showcase: http://knoesis.org/showcase

• Vision: http://knoesis.org/node/266

• Publications: http://knoesis.org/library

131

References and Further Readings

Page 129: Transforming Big Data into Smart Data

Thanks …

132

Page 130: Transforming Big Data into Smart Data

133

Physical Cyber Social Computing

Amit Sheth, Kno.e.sis, Wright State

Page 132: Transforming Big Data into Smart Data

135

thank you, and please visit us at

http://knoesis.org/vision

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Wright State University, Dayton, Ohio, USA

Smart Data