93
Coupled Bayesian Sets Algorithm for Semi- supervised Learning and Information Extraction Saurabh Verma IIT (BHU), Varanasi Estevam R. Hruschka Jr. Federal University of São Carlos, Brazil ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

  • Upload
    kareem

  • View
    40

  • Download
    0

Embed Size (px)

DESCRIPTION

Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction. Saurabh Verma IIT (BHU), Varanasi Estevam R. Hruschka Jr. F ederal University of São Carlos, Brazil. http:// rtw.ml.cmu.edu. NELL: Never-Ending Language Learner. Inputs: initial ontology - PowerPoint PPT Presentation

Citation preview

Page 1: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Bayesian Sets Algorithm for Semi-supervised Learning and

Information Extraction

Saurabh VermaIIT (BHU), Varanasi

Estevam R. Hruschka Jr.Federal University of São Carlos, Brazil

Page 2: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

http://rtw.ml.cmu.edu

Page 3: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

NELL: Never-Ending Language Learner

Inputs: initial ontology handful of examples of each predicate in ontology the web occasional interaction with human trainers

The task: run 24x7, forever• each day:

1. extract more facts from the web to populate the initial ontology

2. learn to read (perform #1) better than yesterday

Page 4: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

NELL: Never-Ending Language Learner

Goal:• run 24x7, forever• each day:

1. extract more facts from the web to populate given ontology

2. learn to read better than yesterday

Today...Running 24 x 7, since January, 2010Input: • ontology defining ~800 categories and relations • 10-20 seed examples of each • 1 billion web pages (ClueWeb – Jamie Callan)

Result: • continuously growing KB with +1,300,000 extracted beliefs

Page 5: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

NELL: Never-Ending Language Learner

http://rtw.ml.cmu.edu

Page 6: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

The Problem with Semi-Supervised Bootstrap Learning

ParisPittsburghSeattleCupertino

Page 7: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

The Problem with Semi-Supervised Bootstrap Learning

ParisPittsburghSeattleCupertino

… mayor of arg1… live in arg1

Page 8: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

The Problem with Semi-Supervised Bootstrap Learning

ParisPittsburghSeattleCupertino

… mayor of arg1… live in arg1

San FranciscoAustindenial

Page 9: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

The Problem with Semi-Supervised Bootstrap Learning

ParisPittsburghSeattleCupertino

… mayor of arg1… live in arg1

San FranciscoAustindenial

Page 10: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

The Problem with Semi-Supervised Bootstrap Learning

ParisPittsburghSeattleCupertino

… mayor of arg1… live in arg1

San FranciscoAustindenial

arg1 is home of …… traits such as arg1

Page 11: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

The Problem with Semi-Supervised Bootstrap Learning

ParisPittsburghSeattleCupertino

… mayor of arg1… live in arg1

San FranciscoAustindenial

arg1 is home of …… traits such as arg1

Page 12: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

The Problem with Semi-Supervised Bootstrap Learning

ParisPittsburghSeattleCupertino

… mayor of arg1… live in arg1

San FranciscoAustindenial

arg1 is home of …… traits such as arg1

Page 13: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

The Problem with Semi-Supervised Bootstrap Learning

ParisPittsburghSeattleCupertino

… mayor of arg1… live in arg1

San FranciscoAustindenial

arg1 is home of …… traits such as arg1

it’s underconstrained!!

Page 14: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

Bayesian Sets (BS)

Given and , rank the elements of by how well they would “fit into” a set which includes

Define a score for each :

From Bayes rule, the score can be re-written as:

}{xD DDc D

cD

)()(

)(x

xx

pDp

score c

Dx

)()(),()(c

c

DppDpscore

xxx

Ghahramani & Heller; NIPS 2005

Page 15: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

Bayesian Sets (BS)

Intuitively, the score compares the probability that x and Dc were generated by the same model with the same unknown parameters θ, to the probability that x and Dc came from models with different parameters θ and θ’.

Ghahramani & Heller; NIPS 2005

)()(),()(c

c

DppDpscore

xxx

Page 16: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

Bayesian Sets (BS)

Intuitively, the score compares the probability that x and Dc were generated by the same model with the same unknown parameters θ, to the probability that x and Dc came from models with different parameters θ and θ’.

Ghahramani & Heller; NIPS 2005

)()(),()(c

c

DppDpscore

xxx

Page 17: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

Bayesian Sets (BS)

Intuitively, the score compares the probability that x and Dc were generated by the same model with the same unknown parameters θ, to the probability that x and Dc came from models with different parameters θ and θ’.

Ghahramani & Heller; NIPS 2005

)()(),()(c

c

DppDpscore

xxx

Page 18: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

Bayesian Sets (BS)

Intuitively, the score compares the probability that x and Dc were generated by the same model with the same unknown parameters θ, to the probability that x and Dc came from models with different parameters θ and θ’.

Ghahramani & Heller; NIPS 2005

)()(),()(c

c

DppDpscore

xxx

Page 19: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

BS using NELL’s Ontology

Initial ontology: Everything

PersonCompany VegetableSport

Page 20: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Initial ontology: Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

BS using NELL’s Ontology

Page 21: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus, run BS onceEverything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

BS using NELL’s Ontology

Page 22: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus, run BS onceEverything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

TexacoFacebook

DELL…

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

FreudTom Mitchell

AristotleAlan Turing

Alexander Fleming…

BasketballFootball

SwimmingTennisGolf

SoccerVolleyballJogging

MarathonBaseball

Badminton…

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Sao PauloBrisbaneBeijingCairo

BS using NELL’s Ontology

Page 23: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

BS using NELL’s Ontology

Page 24: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

BS using NELL’s Ontology

Page 25: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

BS using NELL’s Ontology

Page 26: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus, iteratively run BSEverything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

Boeing

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

Freud

BasketballFootball

SwimmingTennisGolf

SoccerVolleyball

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Iterative BS using NELL’s OntologyZhang & Liu, 2011

Page 27: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus, iteratively run BSEverything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

FreudTom Mitchell

Aristotle

BasketballFootball

SwimmingTennisGolf

SoccerVolleyballJogging

Marathon

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Sao PauloBrisbane

Iterative BS using NELL’s OntologyZhang & Liu, 2011

Page 28: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus, iteratively run BSEverything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

TexacoFacebook

DELL…

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

FreudTom Mitchell

AristotleAlan Turing

Alexander Fleming…

BasketballFootball

SwimmingTennisGolf

SoccerVolleyballJogging

MarathonBaseball

Badminton…

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Sao PauloBrisbaneBeijingCairo

Iterative BS using NELL’s OntologyZhang & Liu, 2011

Page 29: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Iterative BS using NELL’s Ontology

Page 30: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Iterative BS using NELL’s Ontology

Page 31: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

NELL: Coupled semi-supervised training of many functions

Page 32: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Training Type 2:Structured Outputs, Multitask, Posterior

Regularization, Multilabel

Learn functions with the same input, different outputs, where we know some constraint

Page 33: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Training Type 2:Structured Outputs, Multitask, Posterior

Regularization, Multilabel

Learn functions with the same input, different outputs, where we know some constraint

Page 34: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Training Type 2:Structured Outputs, Multitask, Posterior

Regularization, Multilabel

Learn functions with the same input, different outputs, where we know some constraint

Page 35: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Bayesian Sets (CBS)

Page 36: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Bayesian Sets (CBS)

Page 37: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Bayesian Sets (CBS)

Page 38: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Bayesian Sets (CBS)

Page 39: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Bayesian Sets (CBS)

Page 40: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Coupled Bayesian Sets (CBS)

Page 41: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 42: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

MutuallyExclusive(Company,Person);MutuallyExclusive(Company,Sport);MutuallyExclusive(Company,City);MutuallyExclusive(Pearson,Sport);…

Page 43: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

MutuallyExclusive(Company,Person);MutuallyExclusive(Company,Sport);MutuallyExclusive(Company,City);MutuallyExclusive(Pearson,Sport);…

Page 44: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

MutuallyExclusive(Company,Person);MutuallyExclusive(Company,Sport);MutuallyExclusive(Company,City);MutuallyExclusive(Pearson,Sport);…

Page 45: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

MutuallyExclusive(Company,Person);MutuallyExclusive(Company,Sport);MutuallyExclusive(Company,City);MutuallyExclusive(Pearson,Sport);…

Page 46: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

MutuallyExclusive(Company,Person);MutuallyExclusive(Company,Sport);MutuallyExclusive(Company,City);MutuallyExclusive(Pearson,Sport);…

Page 47: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

MutuallyExclusive(Company,Person);MutuallyExclusive(Company,Sport);MutuallyExclusive(Company,City);MutuallyExclusive(Pearson,Sport);…

Page 48: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

MutuallyExclusive(Company,Person);MutuallyExclusive(Company,Sport);MutuallyExclusive(Company,City);MutuallyExclusive(Pearson,Sport);…

Page 49: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahoo

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

MutuallyExclusive(Company,Person);MutuallyExclusive(Company,Sport);MutuallyExclusive(Company,City);MutuallyExclusive(Pearson,Sport);…

Page 50: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

Boeing

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 51: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

Boeing

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 52: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

Boeing

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

…AT&T

Boeing

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 53: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

Boeing

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

…AT&T

Boeing

BasketballFootball

SwimmingTennisGolf…

AT&TBoeing

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 54: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

Boeing

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

…AT&T

Boeing

BasketballFootball

SwimmingTennisGolf…

AT&TBoeing

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town…

AT&TBoeing

CBS using NELL’s Ontology

Page 55: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

Boeing

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 56: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 57: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 58: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

…Brazil Telecom

Texaco

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 59: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

…Brazil Telecom

Texaco

BasketballFootball

SwimmingTennisGolf…

Brazil TelecomTexaco

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 60: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

…Brazil Telecom

Texaco

BasketballFootball

SwimmingTennisGolf…

Brazil TelecomTexaco

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town…

Brazil TelecomTexaco

CBS using NELL’s Ontology

Page 61: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

Given a huge web corpus and mutually exclusiveness constraints, iteratively run BS

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco

Peter FlachBill ClintonJeremy Lin

AdeleBarak Obama

BasketballFootball

SwimmingTennisGolf

BristolPittsburgh

Rio de JaneiroTokyo

Cape Town

CBS using NELL’s Ontology

Page 62: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 63: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 64: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 65: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 66: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 67: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 68: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 69: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 70: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 71: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 72: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 73: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 74: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

Page 75: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s OntologyWhat if we do not have the mutual exclusiveness constraints?

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco…

Great BritainKeyboard

Pencil

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

FreudTom Mitchell

Aristotle…

BasketballFootball

SwimmingTennisGolf

SoccerVolleyballJogging

Marathon…

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Sao PauloBrisbane

Page 76: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s OntologyWhat if we do not have the mutual exclusiveness constraints?

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco…

Great BritainKeyboard

Pencil

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

FreudTom Mitchell

Aristotle…

BasketballFootball

SwimmingTennisGolf

SoccerVolleyballJogging

Marathon…

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Sao PauloBrisbane

Not Company

Page 77: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s OntologyWhat if we do not have the mutual exclusiveness constraints?

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco…

Great BritainKeyboard

Pencil

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

FreudTom Mitchell

Aristotle…

BasketballFootball

SwimmingTennisGolf

SoccerVolleyballJogging

Marathon…

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Sao PauloBrisbane

Not Company

Great BritainKeyboard

Pencil

Page 78: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s OntologyWhat if we do not have the mutual exclusiveness constraints?

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco…

Great BritainKeyboard

Pencil

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

FreudTom Mitchell

Aristotle…

BasketballFootball

SwimmingTennisGolf

SoccerVolleyballJogging

Marathon…

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Sao PauloBrisbane

Not Company

Great BritainKeyboard

Pencil

Page 79: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s OntologyWhat if we do not have the mutual exclusiveness constraints?

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco…

Great BritainKeyboard

Pencil

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

FreudTom Mitchell

Aristotle…

BasketballFootball

SwimmingTennisGolf

SoccerVolleyballJogging

Marathon…

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Sao PauloBrisbane

Not Company

Great BritainKeyboard

Pencil

MutuallyExclusive(Company,NotCompany);

Page 80: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s OntologyWhat if we do not have the mutual exclusiveness constraints?

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco…

Great BritainKeyboard

Pencil

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

FreudTom Mitchell

Aristotle…

BasketballFootball

SwimmingTennisGolf

SoccerVolleyballJogging

Marathon…

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Sao PauloBrisbane

Not Company

Great BritainKeyboard

Pencil

MutuallyExclusive(Company,NotCompany);

Page 81: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s OntologyWhat if we do not have the mutual exclusiveness constraints?

Everything

PersonCompany CitySport

AppleMicrosoftGoogle

IBMYahooAT&T

BoeingBrazil Telecom

Texaco…

Great BritainKeyboard

Pencil

Peter FlachBill ClintonJeremy Lin

AdeleBarak ObamaDalai Lama

FreudTom Mitchell

Aristotle…

BasketballFootball

SwimmingTennisGolf

SoccerVolleyballJogging

Marathon…

BristolPittsburgh

Rio de JaneiroTokyo

Cape TownNew YorkLondon

Sao PauloBrisbane

Not Company

Great BritainKeyboard

Pencil

MutuallyExclusive(Company,NotCompany);

Page 82: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Page 83: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Page 84: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Page 85: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Page 86: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Page 87: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Page 88: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

What if we do not have the mutual exclusiveness constraints?

Page 89: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

What about Semantic Relations?

Page 90: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

What about Semantic Relations?

Page 91: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

CBS using NELL’s Ontology

What about Semantic Relations?

Page 92: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

ConclusionsCoupled Bayesian Sets

• semi-supervised learning approach to extract category instances (e.g. country(USA), city(New York) from web pages;

• based on the original Bayesian Sets • can outperform algorithms such as the original Bayesian

Set, the Naive Bayes classifier, the Bas-all and the coupled semi-supervised logistic regression algorithm (CPL);

• can be used to automatically generate new constraints to the set expansion task even when no mutually exclusiveness relationship is previously defined

Page 93: Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction

ECML/PKDD2012 Bristol, UK September, 26th, 2012

AcknowledgementsThanks to:

ECML/PKDD2012 audience! Also Thanks to:

• Department of Science and Technology, Government of India under Indo-Brazil cooperation programme;

• Brazilian research agencies CAPES and CNPq;

• Zoubin Ghahramani and K.A. Heller;

• All the Read The Web group;

contact: [email protected]://rtw.ml.cmu.edu