The Aha! Moment: From Data to Insight



Dafna Shahaf

Joint work with Carlos Guestrin, Eric Horvitz, Jure Leskovec

Acquiring Data Used to be Hard Work

Census Interviewer, 1930

How many cows do you own?
Department of Agriculture

Not Anymore

Cow Tracking System, 2008

We Have LOTS of Data
  • Huge potential: science, business, sports, public health
  • In order for this data to be useful, we must understand it
  • Turn data into insight!

Large-scale data has the potential to transform almost every aspect of our world, from science to business. It has the potential to address some of society's most pressing challenges.

My Goal: Develop computational approaches for turning data into insight
  • What is insight?
  • How to help people understand
      • The structure of data?
      • What is interesting in data?
  • How to facilitate discoveries?

Example: News

So, you want to understand a complex news story

Search Engines are Great
But do not show how it all fits together

About 57,500,000 results. How do they fit together?

Timeline Systems
e.g., NewsJunkie [Gabrilovich, Dumais, Horvitz]

Pacific Campaign of World War II

Real Stories are not Linear

Today's complex stories spread into branches, side stories, dead ends, and intertwining narratives.

Holy Grail: Issue Maps

An issue map (or argument map) is a visual representation of the structure of a topic. It includes components such as a main contention, premises, co-premises, objections, rebuttals and lemmas. Typically it is a directed graph, with nodes corresponding to propositions and edges corresponding to relationships (e.g. dispute or support).

Holy Grail: Issue Maps
(Figure: example issue map around the contention "machines can't have emotions", with the propositions "concept of feeling only applies to living organisms" [Ziff 59] and "we can imagine artifacts that have feelings" [Smart 59], linked by "is supported by" and "is disputed by" edges.)
Challenge: Build automatically!

Proposed System: Metro Maps
  • Input: A set of documents
  • Output: A map, that is, a set of storylines
      • Each line follows a coherent narrative thread
      • Temporal Dynamics + Structure

Example: Greek debt crisis Map
(Figure labels: austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel)

Finding Good Maps
Metro Maps of Information [S, Guestrin, Horvitz, WWW12]
Hard problem! Our Approach:
  • What makes a good map?
  • How to formalize it?
  • How to optimize it?

Note that this problem is hard, mostly because we don't really know what we're looking for. I know a good map when I see it, you know a good map when you see it, but it's a very intuitive property. We first need to figure out what we are looking for, then formalize it mathematically; in other words, define the problem. Then we find a tractable algorithm that optimizes our objective.

Properties of a Good Map

1. Coherence

Coherence: Main Idea
Connecting the Dots [S, Guestrin, KDD10]
How to measure coherence of a chain of documents (d1, d2, d3, d4, d5)?
  • Strong transitions
  • Global theme

Example chain: Greek debt crisis -> Republicans and the debt crisis -> The Pope and Republicans -> Protests in Italy

The main point was that coherence of a chain of articles is not a property of local interactions between neighbouring articles. You have to remember the context of the rest of the chain.
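The contrast between strong local transitions and a global theme can be illustrated with a small sketch. It is not the coherence measure from the cited paper: the word sets are invented for the four example articles, and Jaccard overlap is an assumed stand-in for transition strength. It only shows that a chain can have a nonzero weakest link while sharing no word across all of its articles.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity between two articles (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def local_coherence(chain: List[Set[str]]) -> float:
    """Weakest link: minimum similarity over consecutive pairs only."""
    return min(jaccard(x, y) for x, y in zip(chain, chain[1:]))


def global_theme(chain: List[Set[str]]) -> Set[str]:
    """Words that appear in every article of the chain."""
    return set.intersection(*chain)


# Hypothetical keyword sets for the example chain on the slide.
chain = [
    {"greece", "debt", "default"},        # Greek debt crisis
    {"debt", "republicans", "crisis"},    # Republicans and the debt crisis
    {"republicans", "pope"},              # The Pope and Republicans
    {"pope", "italy", "protests"},        # Protests in Italy
]

print(local_coherence(chain))  # every transition shares a word, so this is > 0
print(global_theme(chain))     # empty: nothing ties the whole chain together
```

Each neighbouring pair looks related, yet the chain drifts from Greece to Italy, which is why a purely local score is not enough.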

Let me show you what I mean by considering only local interactions. Consider this article. Bars mean that the word on the left appears in the article above it, so this article is about the debt default in Greece. Now you start building a chain. You look for similar articles, and find this one, about Republicans' opinion of the debt crisis. Next, you forget all about the first article, and start looking for articles similar to the new one.

Properties of a Good Map

1. Coherence

Is it enough?

Max-coherence Map
Query: Greek debt
  • Asian trading sluggish as markets fret about Greece
  • Greek Civil Servants Strike over Austerity Measures
  • Japanese stocks plunge on Greece debt problems
  • Greek Strike Against Austerity Is Growing
  • Greece Paralyzed by New Strike
  • Strike against austerity plan halts traffic
  • Asian markets higher in holiday-thinned trade

(Figure annotations: Not important, Redundant)

Properties of a Good Map

1. Coherence

2. Coverage
  • Should cover diverse topics important to the user

Essential, frame as challenges: black swan

Coverage: Idea

Documents cover words:
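A minimal sketch of what "documents cover words" could mean in code. The slides do not give a formula, so this assumes a soft coverage in which each document covers a word to a degree in [0, 1] and a set of documents covers a word with one minus the product of the individual non-coverages. All documents, words and weights below are invented for illustration.

```python
from typing import Dict, List

Doc = Dict[str, float]  # word -> degree to which the document covers it, in [0, 1]


def set_coverage(docs: List[Doc], word: str) -> float:
    """Soft coverage of one word by a set of documents (assumed form)."""
    uncovered = 1.0
    for doc in docs:
        uncovered *= 1.0 - doc.get(word, 0.0)
    return 1.0 - uncovered


def total_coverage(docs: List[Doc], weights: Dict[str, float]) -> float:
    """Weighted sum of per-word coverage; weights encode word importance."""
    return sum(w * set_coverage(docs, word) for word, w in weights.items())


# Hypothetical documents and word importances.
strike_a = {"strike": 0.8, "austerity": 0.5}
strike_b = {"strike": 0.8, "austerity": 0.5}   # near-duplicate of strike_a
imf = {"imf": 0.9}
weights = {"strike": 1.0, "austerity": 1.0, "imf": 1.0}

base = total_coverage([strike_a], weights)
gain_redundant = total_coverage([strike_a, strike_b], weights) - base
gain_diverse = total_coverage([strike_a, imf], weights) - base
print(gain_redundant, gain_diverse)  # the redundant article adds much less
```

Under this assumed form, a second strike article gains little because its words are already mostly covered, which is the diminishing-returns behaviour that rewards diverse maps.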

(Figure labels: Corpus, Coverage)
Turning Down the Noise [El-Arini, Veda, S, Guestrin, KDD09]

High-coverage, Coherent Map
Query: Greek debt
  • Greek Civil Servants Strike over Austerity Measures
  • Greece Paralyzed by New Strike
  • Greeks Take to the Streets, but Lacking Earlier Zeal
  • Infighting Adds to Merkel's Woes
  • It's Germany that Matters
  • UK Backs Germany's Effort
  • Germany says the IMF should Rescue Greece
  • IMF more Likely to Lead Efforts
  • IMF is Urged to Move Forward

(Figure annotation: Related but disconnected)

Properties of a Good Map

1. Coherence

2. Coverage

3. Connectivity

Mathematical Formulation

Optimization Problem:

1. Coherence: Linear Programming + Rounding

2. Coverage: Submodular Optimization

3. Connectivity: Encourage Line Intersection
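The formulation names submodular optimization for coverage but does not say which procedure is used. As an illustration only, here is the standard greedy heuristic for maximizing a monotone submodular function under a size limit. The soft-coverage objective, the document names and the weights are all assumptions, and the sketch selects individual documents rather than whole storylines.

```python
from typing import Dict, List

Doc = Dict[str, float]  # word -> degree of coverage in [0, 1]


def coverage(selected: List[Doc]) -> float:
    """Assumed objective: sum over words of 1 - prod(1 - cover_d(word))."""
    words = {w for doc in selected for w in doc}
    total = 0.0
    for w in words:
        uncovered = 1.0
        for doc in selected:
            uncovered *= 1.0 - doc.get(w, 0.0)
        total += 1.0 - uncovered
    return total


def greedy_select(candidates: Dict[str, Doc], k: int) -> List[str]:
    """Repeatedly add the document with the largest marginal coverage gain."""
    chosen: List[str] = []
    for _ in range(min(k, len(candidates))):
        current = coverage([candidates[n] for n in chosen])
        best_name, best_gain = None, 0.0
        for name, doc in candidates.items():
            if name in chosen:
                continue
            gain = coverage([candidates[n] for n in chosen] + [doc]) - current
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # nothing left adds coverage
            break
        chosen.append(best_name)
    return chosen


# Hypothetical candidate documents.
candidates = {
    "strike_a": {"strike": 0.8, "austerity": 0.5},
    "strike_b": {"strike": 0.7, "austerity": 0.5},
    "imf": {"imf": 0.9},
    "germany": {"germany": 0.6, "merkel": 0.6},
}
print(greedy_select(candidates, 3))
```

For monotone submodular objectives with a cardinality limit, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimum, which is one reason submodularity is attractive when guarantees matter.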

Algorithm with theoretical guarantees

Example Map: Greek Debt

Greek economy
  • Greek bonds rated 'junk' by Standard & Poor's
  • Greece Struggles to Stay Afloat as Debts Pile On
  • E.U. Official Backs Greece's Deficit Cutting Plan
  • EU Sets Deadline for Greece to Make Cuts

Strikes and Riots
  • Greek Workers Protest Austerity Plan
  • Greek Civil Servants Strike Over Austerity Measures
  • Greeks Take to the Streets, but Lacking Earlier Zeal
  • Greece Paralyzed by New Strike

Germany and the EU
  • Infighting Adds to Merkel's Woes
  • Euro Unity? It's Germany That Matters
  • Germany Now Says I.M.F. Should Rescue Greece
  • U.K. Backs Germany's Effort to Support Euro

IMF
  • I.M.F. More Likely to Lead Efforts for Greek Aid
  • I.M.F. Is Urged to Move Forward on Voting Changes

Greece Gets Help but is it Enough?

Is it good?

Evaluation
  • Challenging to evaluate
  • Many machine learning / data mining techniques use surrogate evaluation metrics
  • User studies are fundamental

Data: All New York Times articles (2008-2010)
Queries: Chile miners, Haiti earthquake, Greek debt

Study Question: Can maps help news readers understand news events?

30 million, infrastructure

Task 1: Simple Question Answering
  • 10 questions per task
  • Measured total knowledge and rate
  • Compared: Maps, Google News, Topic Detection and Tracking [Nallapati et al, CIKM '04]
  • 338 unique users, minor gains

Question 2: How many miners were trapped?

Maps are not about small details, they are about the big picture!

Task 2: High-Level Understanding
  • Summarize complex story in a paragraph
  • Other people evaluate paragraphs: Which paragraph provided a more complete and coherent picture of the story?

Task 2: High-Level Understanding
  • 15 paragraph writers, ~300 evaluations per task
  • Results: big gains, especially for complex stories
      • 72% preferred maps about Greece
      • 59% for Haiti

Bottom line: maps are more useful as high-level tools for stories without a single dominant storyline

So, you want to understand a complex news story

Maps are Easy to Adapt to Other Domains
  • Principles stay the same
  • Use domain knowledge to improve objective
  • Examples: Science, Legal, Books

Application 2: Science
Metro Maps of Science [S, Guestrin, Horvitz, KDD12]
  • Data: ACM Papers
  • Slight modifications to the objective
      • Taking advantage of citation graph
  • Algorithm stays the same!

Goal: Understand the state of the art
What is reinforcement learning up to?

Example Map: Reinforcement Learning
(Figure: line labels)
  • multi-agent, cooperative, joint, team
  • mdp, states, pomdp, transition, option
  • control, motor, robot, skills, arm
  • bandit, regret, dilemma, exploration, arm
  • q-learning, bound, optimal, rmax, mdp

User Study
Study Question: Can maps help a first-year grad student learn a new topic better than current tools?
  • Update a survey paper from 1996 about Reinforcement Learning
  • Identify research directions + relevant papers
  • Control group: Google Scholar
  • Treatment group: Metro Map and Google Scholar

Evaluation
  • 30 participants
  • Precision: Judge scoring papers
  • Recall: List of top-10 subareas of Reinforcement Learning
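The two measures above can be sketched as follows. The slides do not describe the scoring procedure in detail, so this assumes the judge marks each submitted paper as relevant or not (precision) and that a participant's research directions are matched against a reference list of ten subareas (recall). The labels and subarea names are placeholders, not study data.

```python
from typing import List, Set


def precision(judged_relevant: List[bool]) -> float:
    """Fraction of a participant's papers that the judge marked relevant."""
    if not judged_relevant:
        return 0.0
    return sum(judged_relevant) / len(judged_relevant)


def recall(found_areas: Set[str], reference_areas: Set[str]) -> float:
    """Fraction of the reference subareas that the participant identified."""
    return len(found_areas & reference_areas) / len(reference_areas)


# Placeholder inputs for one hypothetical participant.
judge_labels = [True, True, False, True]
reference = {f"area_{i}" for i in range(10)}           # stand-in for the top-10 list
found = {"area_1", "area_4", "area_7", "unlisted_area"}

print(precision(judge_labels))   # 3 of 4 papers judged relevant
print(recall(found, reference))  # 3 of 10 reference subareas covered
```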

Results (in a nutshell)

(Figure: two charts comparing Google and Maps, annotated "Better".)

On average, map users find 10% more relevant papers, and cover 2.7 more of the top-10 areas

Application 3: Legal Documents
Goal: Help lawyers preparing for litigation

Data: Supreme court decisions
Goal: Help lawyers argue a case

Commerce Clause

Lawyer Labels                        Coherence Words
Power to prohibit commerce           interstate, commerce, affect, regulate
Congress's power to regulate         congress, interest, regulate, channel
11th amendment, state sovereignty    immunity, sovereignty, amendment, eleventh
Merely vs substantially affects      affects, substantial, regulate
Regulating wholesale energy sale     wholesale, electricity, resale, steam, utilities

Application 4: Books
Goal: Structure of a book
Data: Lord of the Rings

Lord of the Rings Map

Making Maps Useful
  • Scalability
      • Handle web-scale corpus
  • Interaction
      • Multi-resolution: Zoom in to learn more
      • Word feedback: Personalized coverage
      • Different points-of-view for controversial topics

Website + Open-Source Package
Information Cartography [S, Yang, Suen, Jacobs, Wang, Leskovec, KDD13]

Metro Maps: Recap
A news-reader, a first-year student, a paralegal ...
  • Used to rely on search
  • Can now get perspective on the field
  • See structure and connections

User studies validate our method

What about making new connections?

The Aha! Project
Challenge: Finding insightful connections in data
  • Define insight

Content, query, interaction, scaling

Properties of Insight (Abstract)

Surprise
  • Not enough!
  • We can extract many surprising connections
  • Noise, bias, coincidence

Plausibility
  • Well-supported by