
Social Media Analysis and Recommending Systems:

An Introduction to Question Answering

Roberto Basili (Università di Roma, Tor Vergata)

Master in Big Data, June 2016

Most slides from the teaching material of “Introduction to Information Retrieval”, Manning, Raghavan & Schütze, Cambridge University Press 2008, by C. Manning


Overview

• From query-based and keyword-based IR to Question Answering

• QA over Structured data

• From NL questions to SQL queries

• Major Approaches to QA

• Knowledge-based approaches to QA

• Text-based QA systems

• An architecture for a text-based QA system

• Question Classification and Answer matching as a classification task


Motivations

• Language Processing: Parsing, Semantic Interpretation, NERC, Relation Extraction, Coreference

• Social Media Analysis: Trend Analysis, Community Detection, Recommending, Opinion Mining, Emotional Analysis, Reputation Management

• Machine Learning: Bayesian modeling, SVM, kernel machines, NN, Clustering, Language modeling, Embeddings

• Information Retrieval: Classification, Indexing, Search, QA, Ranking, User Modeling


Motivations

Language Processing: Parsing, Semantic Interpretation, NERC, Relation Extraction, Coreference

• Unstructured Data are Epistemologically Opaque

• Queries are often poor surrogates for the user's actual information needs

• The Web provides a context for every search interaction, which can help to better characterize the query

• Natural language (NL) interaction, i.e. interactive Question Answering, is a viable and very promising evolution of Web search


Web Search in 2020?

The web, it is a changing.

What will people do in 2020?

• Type key words into a search box?

• Use the Semantic Web?

• Speak to your computer with natural language search?

• Use social or “human powered” search?


Google: What’s been happening? (2014)

• New search index at Google: “Hummingbird”

• Answering long, “natural language” questions better

• Partly to deal with spoken queries on mobile

• More use of the Google Knowledge Graph

• Concepts versus words


What’s been happening

• Google Knowledge Graph

• Facebook Graph Search

• Bing’s Satori

• Things like Wolfram Alpha

Common theme: Doing graph search over structured knowledge rather than traditional text search


What’s been happening

• More semi-structured information embedded in web pages

• schema.org


What’s been happening

• Move to mobile favors a move to speech which favors “natural language information search”

• Will we move to a time when over half of searches are spoken?


Towards intelligent agents

Two goals

• Things not strings

• Inference not only search


Two paradigms for question answering

• Text-based approaches

• TREC QA, IBM Watson

• Structured knowledge-based approaches

• Apple Siri, Wolfram Alpha, Facebook Graph Search

(And, of course, there are hybrids, including some of the above.)

At the moment, structured knowledge is back in fashion, but it may or may not last


Example from Fernando Pereira (Google)


Patrick Pantel talk: (Then) current experience


Desired experience: Towards actions


Learning actions from web usage logs


Entity disambiguation and linking

• Key requirement is that entities get identified

• Named entity recognition (e.g., Stanford NER!)

• and disambiguated

• Entity linking (or sometimes “Wikification”)

• e.g., Michael Jordan the basketballer or the ML guy


Mentions, Meanings, Mappings [G. Weikum]

Sergio talked to Ennio about Eli’s role in the Ecstasy scene. This sequence on the graveyard was a highlight in Sergio’s trilogy of western films.

Mentions (surface names) are mapped to candidate entities (meanings) in a KB:

• Sergio means Sergio_Leone; Sergio means Serge_Gainsbourg

• Ennio means Ennio_Antonelli; Ennio means Ennio_Morricone

• Eli means Eli_(bible); Eli means ExtremeLightInfrastructure; Eli means Eli_Wallach

• Ecstasy means Ecstasy_(drug); Ecstasy means Ecstasy_of_Gold

• trilogy means Star_Wars_Trilogy; trilogy means Lord_of_the_Rings; trilogy means Dollars_Trilogy

• … … …

KB entities shown in the figure: Eli (bible), Eli Wallach, Dollars Trilogy, Lord of the Rings, Star Wars Trilogy, Benny Andersson, Benny Goodman, Ecstasy of Gold, Ecstasy (drug)


• and linked to a canonical reference

• Freebase, DBpedia, YAGO2, (WordNet)
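The disambiguation step can be illustrated with a minimal context-overlap sketch on the Sergio/Ennio example. Each candidate entity carries a small, hand-written set of context keywords (a hypothetical stand-in for a KB description), and a mention is linked to the candidate whose keywords overlap most with the surrounding text. This is only a bag-of-words baseline, not the method of any particular linker; practical systems also weigh entity popularity and the coherence of all chosen entities together.

```python
import re

# Hypothetical candidates with hand-written context keywords.
# A real system would derive these from KB descriptions (e.g. Wikipedia text).
CANDIDATES = {
    "Sergio": {
        "Sergio_Leone": {"director", "western", "films", "trilogy", "scene"},
        "Serge_Gainsbourg": {"singer", "songwriter", "french", "album"},
    },
    "Ennio": {
        "Ennio_Morricone": {"composer", "score", "western", "films", "scene"},
        "Ennio_Antonelli": {"cardinal", "bishop", "church"},
    },
    "Ecstasy": {
        "Ecstasy_of_Gold": {"music", "scene", "graveyard", "morricone", "films"},
        "Ecstasy_(drug)": {"drug", "pill", "mdma", "party"},
    },
}

def tokenize(text):
    """Lowercased word tokens of the text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))

def link(mention, text):
    """Return the candidate entity whose keywords best overlap the text."""
    context = tokenize(text)
    candidates = CANDIDATES.get(mention, {})
    if not candidates:
        return None  # mention unknown to our toy KB
    # max() keeps the first candidate in insertion order on ties.
    return max(candidates, key=lambda entity: len(candidates[entity] & context))

text = ("Sergio talked to Ennio about Eli's role in the Ecstasy scene. "
        "This sequence on the graveyard was a highlight in Sergio's "
        "trilogy of western films.")
for mention in ["Sergio", "Ennio", "Ecstasy"]:
    print(mention, "->", link(mention, text))
```

Words such as "western", "films" and "graveyard" pull all three mentions toward the Leone/Morricone reading, which is the intuition behind coherence-based linking.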


3 approaches to question answering: Knowledge-based approaches (Siri)

• Build a semantic representation of the query

• Times, dates, locations, entities, numeric quantities

• Map from this semantics to query structured data or resources (SQL or SPARQL)

• Geospatial databases

• Ontologies (Wikipedia infoboxes, DBpedia, WordNet, YAGO)

• Restaurant review sources and reservation services

• Scientific databases

• Wolfram Alpha
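As a toy illustration of mapping a question's semantics to a structured query, the sketch below matches one question template with a regular expression and binds the extracted entity into a parameterized SQL query over an in-memory SQLite table. The table `companies`, its rows and the single template are invented for this example; real knowledge-based systems use semantic parsers covering many question forms rather than one pattern.

```python
import re
import sqlite3

# Hypothetical structured resource: a tiny table of company headquarters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, headquarters TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("Apple Computer", "Cupertino, California"),
     ("IBM", "Armonk, New York")],
)

# One question template: "Where is <ENTITY> based?" -> SQL lookup.
TEMPLATE = re.compile(r"^where is (?P<entity>.+?) based\??$", re.IGNORECASE)

def answer(question):
    """Map a templated NL question to SQL and run it; None if no match."""
    match = TEMPLATE.match(question.strip())
    if match is None:
        return None  # question not covered by our single template
    # Parameterized query: the entity is bound as data, not pasted into SQL.
    row = conn.execute(
        "SELECT headquarters FROM companies WHERE name = ?",
        (match.group("entity"),),
    ).fetchone()
    return row[0] if row else None

print(answer("Where is Apple Computer based?"))
```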


Types of Questions in Modern Systems

• Factoid questions

• Who wrote “The Universal Declaration of Human Rights”?

• How many calories are there in two slices of apple pie?

• What is the average age of the onset of autism?

• Where is Apple Computer based?

• Complex (narrative) questions:

• In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?

• What do scholars think about Jefferson’s position on dealing with pirates?


Text-based (mainly factoid) QA

• QUESTION PROCESSING

• Detect question type, answer type, focus, relations

• Formulate queries to send to a search engine

• PASSAGE RETRIEVAL

• Retrieve ranked documents

• Break into suitable passages and rerank

• ANSWER PROCESSING

• Extract candidate answers (as named entities)

• Rank candidates

• using evidence from relations in the text and external sources
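The three stages can be sketched end to end in a few lines. This is a deliberately naive sketch under several assumptions: the answer type comes from the wh-word only, passages are ranked by keyword overlap, and named entities are approximated by regular expressions (capitalized word sequences for PERSON, four-digit numbers for DATE). The passages and function names are hypothetical.

```python
import re

STOPWORDS = {"who", "what", "when", "where", "which", "is", "was", "the",
             "a", "an", "of", "in", "did", "does", "do"}

# Crude regex stand-ins for a named entity recognizer (assumption).
NE_PATTERNS = {
    "PERSON": r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b",
    "DATE": r"\b\d{4}\b",
}

def process_question(question):
    """Question processing: guess the answer type, build a keyword query."""
    words = re.findall(r"\w+", question.lower())
    answer_type = {"who": "PERSON", "when": "DATE"}.get(words[0], "OTHER")
    keywords = [w for w in words if w not in STOPWORDS]
    return answer_type, keywords

def retrieve_passages(keywords, passages):
    """Passage retrieval: rank passages by how many keywords they contain."""
    def score(passage):
        tokens = set(re.findall(r"\w+", passage.lower()))
        return sum(1 for k in keywords if k in tokens)
    return sorted(passages, key=score, reverse=True)

def extract_answer(answer_type, keywords, ranked_passages):
    """Answer processing: first entity of the expected type, skipping
    candidates that merely repeat question keywords (simple heuristic)."""
    pattern = NE_PATTERNS.get(answer_type)
    if pattern is None:
        return None
    for passage in ranked_passages:
        for candidate in re.findall(pattern, passage):
            if not any(w in keywords for w in candidate.lower().split()):
                return candidate
    return None

passages = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "Virgin Atlantic was founded by Richard Branson in 1984.",
]
answer_type, keywords = process_question("Who founded Virgin Atlantic?")
ranked = retrieve_passages(keywords, passages)
print(answer_type, keywords)
print(extract_answer(answer_type, keywords, ranked))
```

The keyword filter matters: without it, "Virgin Atlantic" itself would be returned as the PERSON answer, which is why real systems rank candidates with more evidence than a single pattern.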


Hybrid approaches (IBM Watson)

• Build a shallow semantic representation of the query

• Generate answer candidates using IR methods

• Augmented with ontologies and semi-structured data

• Score each candidate using richer knowledge sources

• Geospatial databases

• Temporal reasoning

• Taxonomical classification
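Scoring each candidate against several knowledge sources can be pictured as merging per-source evidence scores into one confidence. The sketch below uses a hand-weighted logistic combination; the feature names, weights, candidates and scores are all hypothetical. Watson learned how to merge evidence from training data rather than using fixed weights like these.

```python
import math

# Hypothetical weights for three evidence sources plus a bias term.
WEIGHTS = {"passage_support": 2.0, "type_match": 1.5, "temporal_ok": 1.0}
BIAS = -2.0

def confidence(evidence):
    """Logistic combination of evidence scores (each assumed in [0, 1])."""
    z = BIAS + sum(WEIGHTS[name] * evidence.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def rank(candidates):
    """Sort (answer, evidence) pairs by decreasing confidence."""
    return sorted(candidates, key=lambda c: confidence(c[1]), reverse=True)

# Invented evidence for the Jeopardy clue about Wilkinson's book.
candidates = [
    ("Bram Stoker", {"passage_support": 0.9, "type_match": 1.0, "temporal_ok": 1.0}),
    ("Vlad Tepes", {"passage_support": 0.7, "type_match": 0.0, "temporal_ok": 1.0}),
]
for answer, evidence in rank(candidates):
    print(f"{answer}: {confidence(evidence):.2f}")
```

A candidate that fits the passages but fails the answer type check ("author") is pushed down, which is the role the taxonomical classification plays above.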


IR-based Factoid QA

Question → Question Processing (Query Formulation, Answer Type Detection) → Document Retrieval over the indexed document collection → Relevant Docs → Passage Retrieval → passages → Answer Processing → Answer


IR-based Factoid QA

• QUESTION PROCESSING

• Detect question type, answer type, focus, relations

• Formulate queries to send to a search engine

• PASSAGE RETRIEVAL

• Retrieve ranked documents

• Break into suitable passages and rerank

• ANSWER PROCESSING

• Extract candidate answers

• Rank candidates

• using evidence from the text and external sources


Question Processing: Things to extract from the question

• Answer Type Detection

• Decide the named entity type (person, place) of the answer

• Query Formulation

• Choose query keywords for the IR system

• Question Type classification

• Is this a definition question, a math question, a list question?

• Focus Detection

• Find the question words that are replaced by the answer

• Relation Extraction

• Find relations between entities in the question
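Question type classification can be approximated with a few handwritten rules. The sketch below is hypothetical: the labels and patterns are illustrative heuristics, whereas practical systems usually learn such classifiers from labeled questions.

```python
import re

def question_type(question):
    """Heuristic question-type label (hypothetical rules, for illustration)."""
    q = question.strip().lower()
    # An arithmetic expression such as "17 * 3" suggests a math question.
    if re.search(r"\d+\s*[-+*/]\s*\d+", q):
        return "MATH"
    # Imperatives like "List ..." or "Name ..." usually ask for several answers.
    if re.match(r"(list|name)\b", q):
        return "LIST"
    # "What/Who is X?" with a short X (up to three words) -> definition.
    if re.match(r"(what|who) (is|are|was|were) (?:a |an |the )?\w+(?: \w+){0,2}\??$", q):
        return "DEFINITION"
    return "FACTOID"

for question in ["What is 17 * 3?",
                 "Name the states that border Florida.",
                 "What is autism?",
                 "What is the average age of the onset of autism?"]:
    print(question_type(question), "|", question)
```

Rule order matters here: the math check runs first so that "What is 17 * 3?" is not mistaken for a definition question.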


Question Processing

They’re the two states you could be reentering if you’re crossing Florida’s northern border

• Answer Type: US state

• Query: two states, border, Florida, north

• Focus: the two states

• Relations: borders(Florida, ?x, north)
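The extracted relation borders(Florida, ?x, north) can be read as a query pattern containing a variable. A minimal sketch of matching such a pattern against a tiny, hand-built fact base follows; the triples are illustrative and not taken from any real KB, and the matcher is a simple slot-by-slot unifier.

```python
# Tiny hand-built fact base for borders(subject, object, direction).
BORDERS = [
    ("Florida", "Georgia", "north"),
    ("Florida", "Alabama", "north"),
    ("Georgia", "Tennessee", "north"),
]

def match(pattern, facts):
    """Return variable bindings for each fact matching the pattern.
    Slots starting with '?' are variables; all others must match exactly."""
    results = []
    for fact in facts:
        binding = {}
        for slot, value in zip(pattern, fact):
            if slot.startswith("?"):
                if binding.setdefault(slot, value) != value:
                    break  # same variable would take two different values
            elif slot != value:
                break  # constant slot does not match this fact
        else:
            results.append(binding)  # every slot matched
    return results

# borders(Florida, ?x, north)
print(match(("Florida", "?x", "north"), BORDERS))
```

The two bindings returned correspond to the "two states" named by the question's focus.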


Answer Type Detection: Named Entities

• Who founded Virgin Airlines?

• PERSON

• What Canadian city has the largest population?

• CITY.


Answer Type Taxonomy

• 6 coarse classes

• ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC

• 50 finer classes

• LOCATION: city, country, mountain…

• HUMAN: group, individual, title, description

• ENTITY: animal, body, color, currency…


Xin Li, Dan Roth. 2002. Learning Question Classifiers. COLING'02
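A rule-based approximation of answer type detection, returning COARSE:fine labels in the spirit of the taxonomy above, might look like the sketch below. The rules and the small head-noun lexicon are hypothetical; Li & Roth instead learned their classifier from labeled questions.

```python
import re

# Hypothetical head-noun lexicon mapping words to fine classes.
HEAD_NOUNS = {
    "city": "LOCATION:city",
    "country": "LOCATION:country",
    "state": "LOCATION:state",
    "currency": "ENTITY:currency",
    "animal": "ENTITY:animal",
}

def answer_type(question):
    """Guess a COARSE or COARSE:fine answer type from wh-word and head noun."""
    words = re.findall(r"[a-z]+", question.lower())
    if not words:
        return None
    if words[0] == "who":
        return "HUMAN:individual"  # could also be HUMAN:group; rules are crude
    if words[0] == "where":
        return "LOCATION"
    if words[0] == "when":
        return "NUMERIC:date"
    if words[:2] == ["how", "many"]:
        return "NUMERIC"
    if words[0] in ("what", "which"):
        # Use the first known head noun after the wh-word.
        for word in words[1:]:
            for form in (word, word[:-1]):  # crude plural stripping
                if form in HEAD_NOUNS:
                    return HEAD_NOUNS[form]
    return None  # no rule fired

for q in ["Who founded Virgin Airlines?",
          "What Canadian city has the largest population?"]:
    print(answer_type(q), "|", q)
```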


Part of Li & Roth’s Answer Type Taxonomy

• LOCATION: country, city, state

• NUMERIC: date, percent, money, size, distance

• HUMAN: individual, title, group

• ENTITY: food, currency, animal

• DESCRIPTION: definition, reason

• ABBREVIATION: abbreviation, expression


Answer Types


Answer types in Jeopardy

• 2,500 answer types in a sample of 20,000 Jeopardy questions

• The 200 most frequent answer types cover less than 50% of the data

• The 40 most frequent Jeopardy answer types

he, country, city, man, film, state, she, author, group, here, company, president, capital, star, novel, character, woman, river, island, king, song, part, series, sport, singer, actor, play, team, show, actress, animal, presidential, composer, musical, nation, book, title, leader, game


Ferrucci et al. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine, Fall 2010, 59-79.


Knowledge: Not just semantics but pragmatics

Pragmatics = taking account of context in determining meaning

Search engines are great because they inherently take into account pragmatics (“associations and contexts”)

• [the national] The National (a band)

• [the national ohio] The National - Bloodbuzz Ohio – YouTube

• [the national broadband] www.broadband.gov


Task – Answer Sentence Selection

• Given a factoid question, find the sentence that

• Contains the answer

• Can sufficiently support the answer

Q: Who won the best actor Oscar in 1973?

S1: Jack Lemmon was awarded the Best Actor Oscar for Save the Tiger (1973).

S2: Academy award winner Kevin Spacey said that Jack Lemmon is remembered as always making time for others.

Example from Scott Wen-tau Yih et al. (ACL 2013)
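A simple lexical-overlap baseline for answer sentence selection (not the model of the cited paper) counts the content words shared by the question and each candidate sentence. The stopword list is a small hypothetical one. On this example the baseline picks S1, but it also exposes the weakness that motivates richer lexical semantic models: "won" matches neither "awarded" in S1 nor "award winner" in S2.

```python
import re

# Small hypothetical stopword list.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "is", "was", "that",
             "as", "who", "what", "when", "where"}

def content_words(text):
    """Lowercased alphanumeric tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def overlap(question, sentence):
    """Number of content words shared by question and sentence."""
    return len(content_words(question) & content_words(sentence))

def select_sentence(question, sentences):
    """Pick the sentence with the largest overlap (first one on ties)."""
    return max(sentences, key=lambda s: overlap(question, s))

q = "Who won the best actor Oscar in 1973?"
s1 = "Jack Lemmon was awarded the Best Actor Oscar for Save the Tiger (1973)."
s2 = ("Academy award winner Kevin Spacey said that Jack Lemmon is "
      "remembered as always making time for others.")
for s in (s1, s2):
    print(overlap(q, s), s)
```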


Word Alignment for Question Answering: TREC QA (1999-2005)

• Assume that there is an underlying alignment

• It describes which words in the question and in the answer sentence can be associated

What is the fastest car in the world?

The Jaguar XJ220 is the dearest, fastest and most sought after car on the planet.

• See if the (syntactic/semantic) relations support the answer

[Harabagiu & Moldovan, 2001]
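The alignment idea can be sketched as a greedy word-to-word matcher: each question content word is linked to an identical word in the candidate sentence or, failing that, to a word listed as related in a small lexicon. The lexicon entry (world, planet) and the stopword list are hypothetical stand-ins for resources such as WordNet, and greedy matching is a simplification of the alignment models discussed here.

```python
import re

STOPWORDS = {"what", "is", "the", "in", "and", "on", "most", "after"}
# Hypothetical lexicon of related word pairs (stand-in for WordNet etc.).
RELATED = {("world", "planet")}

def tokens(text):
    """Lowercased alphanumeric tokens minus stopwords, in order."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def align(question, sentence):
    """Greedy alignment: exact match first, then a related-word match.
    Sentence words may be reused; a real aligner would enforce constraints."""
    sent = tokens(sentence)
    pairs = []
    for qw in tokens(question):
        if qw in sent:
            pairs.append((qw, qw))
        else:
            related = [sw for sw in sent if (qw, sw) in RELATED or (sw, qw) in RELATED]
            if related:
                pairs.append((qw, related[0]))
    return pairs

q = "What is the fastest car in the world?"
s = "The Jaguar XJ220 is the dearest, fastest and most sought after car on the planet."
print(align(q, s))
```

Sentence words left unaligned, here "jaguar" and "xj220", are natural places to look for the answer.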


Full NLP QA: LCC (Harabagiu/Moldovan) [below is the architecture of LCC’s QA system circa 2003]

• Question Processing (factoid and list questions): Question Parse, Semantic Transformation, Recognition of Expected Answer Type (for NER), Keyword Extraction; supported by Named Entity Recognition (CICERO LITE) and an Answer Type Hierarchy (WordNet)

• Question Processing (definition questions): Question Parse, Pattern Matching, Keyword Extraction

• Passage Retrieval: Document Processing over a Document Index built from the Document Collection, returning Single Factoid Passages, Multiple List Passages and Multiple Definition Passages

• Factoid Answer Processing: Answer Extraction (NER) → Answer Justification (alignment, relations) → Answer Reranking (~ Theorem Prover), using an Axiomatic Knowledge Base; output: Factoid Answer

• List Answer Processing: Answer Extraction, Threshold Cutoff; output: List Answer

• Definition Answer Processing: Answer Extraction, Pattern Matching (using a Pattern Repository); output: Definition Answer


Question Answering: IBM’s Watson

• Won Jeopardy on February 16, 2011!

Clue: WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDOVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL

Answer: Bram Stoker


IBM Watson: between Intelligence and Data

• IBM’s Watson

• http://www-03.ibm.com/innovation/us/watson/science-behind_watson.shtml


Jeopardy!


Watson


Watson


Watson

WILLIAM WILKINSON’S “AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDOVIA” INSPIRED THIS AUTHOR’S MOST FAMOUS NOVEL


Semantic Inference in Watson QA


… Intelligence in Watson


Watson: a DeepQA architecture


Ready for Jeopardy!


References

• NLP, IR & ML:

• “Statistical Methods for Speech Recognition”, F. Jelinek, MIT Press, 1998.

• “Speech and Language Processing”, D. Jurafsky and J. H. Martin, Prentice-Hall, 2009.

• “Introduction to Information Retrieval”, Manning, Raghavan & Schütze, Cambridge University Press, 2008.

• Social Networks and Data Analytics:

• “Community Detection and Mining in Social Media”, Lei Tang and Huan Liu, Morgan & Claypool Publishers, 2010.

• “Analyzing the Social Web”, Jennifer Golbeck, Elsevier, 2015.

• Web resources:

• SAG, Univ. Roma Tor Vergata: http://sag.art.uniroma2.it/