YAGO3: A Knowledge Base from Multilingual Wikipedias Farzaneh Mahdisoltani Joanna Biega Fabian M. Suchanek CIDR 2015

  • YAGO3: A Knowledge Base from Multilingual Wikipedias

    Farzaneh Mahdisoltani, Joanna Biega, Fabian M. Suchanek

    CIDR 2015

  • John_Coltrane

    John_Coltrane wasBornOnDate "1926-09-23"
    John_Coltrane wasBornIn Hamlet_(Town)
    John_Coltrane label "John William Coltrane"
    John_Coltrane type American_Jazz_Composer
    Hamlet_(Town) locatedIn United_States
    United_States locatedIn North_America
    American_Jazz_Composer subclassOf wordnet_composer
    wordnet_composer subclassOf wordnet_musician

    120M facts, 10M entities, 100 relations, 95% precision
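
The facts on this slide can be read as subject-predicate-object triples. The following is a minimal sketch, assuming a plain Python list as storage; the helper names `objects_of` and `superclasses` are hypothetical and not part of YAGO, and the edges between places and classes follow the slide's graph as read here.

```python
# Facts from the slide as (subject, predicate, object) triples.
# Assumption: the locatedIn / subclassOf chains are read off the slide's graph.
FACTS = [
    ("John_Coltrane", "wasBornOnDate", "1926-09-23"),
    ("John_Coltrane", "wasBornIn", "Hamlet_(Town)"),
    ("John_Coltrane", "label", "John William Coltrane"),
    ("John_Coltrane", "type", "American_Jazz_Composer"),
    ("Hamlet_(Town)", "locatedIn", "United_States"),
    ("United_States", "locatedIn", "North_America"),
    ("American_Jazz_Composer", "subclassOf", "wordnet_composer"),
    ("wordnet_composer", "subclassOf", "wordnet_musician"),
]


def objects_of(subject, predicate, facts=FACTS):
    """Return all objects o such that (subject, predicate, o) is a fact."""
    return [o for s, p, o in facts if s == subject and p == predicate]


def superclasses(cls, facts=FACTS):
    """Follow subclassOf edges upward and return every class reached."""
    seen = []
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for parent in objects_of(current, "subclassOf", facts):
            if parent not in seen:
                seen.append(parent)
                frontier.append(parent)
    return seen
```

For example, `superclasses("American_Jazz_Composer")` walks up the class hierarchy, which is how a type assertion on an entity also places it under the WordNet classes.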

  • YAGO can be used in many ways

    Named Entity Disambiguation
    J. Hoffart et al., Robust Disambiguation of Named Entities in Text, EMNLP 2011

    Semantic Culturomics
    F. M. Suchanek, N. Preda, Semantic Culturomics, VLDB 2014
    T. Huet, J. Biega, F. M. Suchanek, Mining History with Le Monde, AKBC 2013

    Extending YAGO coverage would yield better results!

  • Multilingual Wikipedias

    Local entities: Izabella_Olszewska, Tadeusz_Jurasz

    Local facts: Izabella_Olszewska isMarriedTo Tadeusz_Jurasz

  • Running YAGO on multilingual Wikipedias

    Extraction EN on other Wikipedias?

    Duplicate entities
    Entities with no type discarded
    No facts extracted from foreign infoboxes

  • Running YAGO on multilingual Wikipedias

    [Diagram: Extractors connected through Themes, in two stages: Raw extraction, then Clean-up]

  • Tasks

    1. Entities
    2. Types
    3. Facts

  • 1. Set of Entities

    =? =?

    specifies the abstraction classes

  • 2. Taxonomy construction

    en/John_Coltrane inCategory "Jazz Music"
    en/John_Coltrane inCategory "American Composers"
    en/John_Coltrane type American_Composer
    American_Composer subclassOf wordnet_composer

    English-centric!

    pl/John_Coltrane inCategory pl/Amerykańscy_Jazzmani
    en/John_Coltrane inCategory en/American_Jazzmen
    en/John_Coltrane type American_Jazzman
    American_Jazzman subclassOf wordnet_jazzman
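
The chain on this slide (foreign category, English category, class, WordNet class) can be sketched as a sequence of lookups. This is only an illustration: the three tables below are hypothetical, hard-coded from the slide's example, and say nothing about how YAGO3 actually derives these mappings.

```python
# Hypothetical lookup tables, hard-coded from the slide's example.
CATEGORY_TRANSLATION = {"pl/Amerykańscy_Jazzmani": "en/American_Jazzmen"}
CATEGORY_TO_CLASS = {
    "American Composers": "American_Composer",
    "en/American_Jazzmen": "American_Jazzman",
}
CLASS_TO_WORDNET = {
    "American_Composer": "wordnet_composer",
    "American_Jazzman": "wordnet_jazzman",
}


def taxonomy_facts(entity, category):
    """Turn one inCategory statement into type and subclassOf triples."""
    # Map a foreign category to its English counterpart if one is known.
    english = CATEGORY_TRANSLATION.get(category, category)
    cls = CATEGORY_TO_CLASS.get(english)
    if cls is None:
        return []  # this category yields no class in the tables above
    facts = [(entity, "type", cls)]
    if cls in CLASS_TO_WORDNET:
        facts.append((cls, "subclassOf", CLASS_TO_WORDNET[cls]))
    return facts
```

With these tables, the Polish category leads to the same kind of output as an English one, while "Jazz Music" produces no type, matching the slide.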

  • 3. Fact extraction

    en/infobox/married → isMarriedTo (manually defined in YAGO-EN)

    pl/infobox/małżonek → isMarriedTo? hasChild? wasBornOnDate?

  • Infobox attributes mapping

    pl/infobox/małżonek =? isMarriedTo

    E_isMarriedTo:
    (Barack_Obama, Michelle_Obama)
    (Elvis_Presley, Priscilla_Presley)
    (John_Coltrane, Alice_Coltrane)

    F_małżonek:
    (Barack_Obama, Michelle_Obama)
    (Elvis_Presley, Priscilla_Presley)
    (John_Coltrane, Ravi Coltrane)
    (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz)

    Corresponding attributes will share some subject-object pairs

  • Infobox attributes mapping

    pl/infobox/małżonek =? isMarriedTo

    E_isMarriedTo:
    (Barack_Obama, Michelle_Obama)
    (Elvis_Presley, Priscilla_Presley)
    (John_Coltrane, Alice_Coltrane)

    F_małżonek:
    (Barack_Obama, Michelle_Obama)
    (Elvis_Presley, Priscilla_Presley)
    (John_Coltrane, Ravi Coltrane)
    (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz)

    support(F_a, E_r) = |matches(F_a, E_r)|

    Too restrictive for attributes with few contributions
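
The support measure on this slide counts the subject-object pairs that the foreign attribute shares with the English relation. A minimal sketch, assuming both sides are represented as Python sets of pairs (the variable names are illustrative only):

```python
# Pairs from the slide: English YAGO facts for isMarriedTo ...
E_IS_MARRIED_TO = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Alice_Coltrane"),
}
# ... and the pairs found under the Polish infobox attribute małżonek.
F_MALZONEK = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Ravi Coltrane"),
    ("pl/Izabella_Olszewska", "pl/Tadeusz_Jurasz"),
}


def matches(foreign_pairs, english_pairs):
    """Subject-object pairs that occur on both sides."""
    return set(foreign_pairs) & set(english_pairs)


def support(foreign_pairs, english_pairs):
    """Absolute number of matching pairs."""
    return len(matches(foreign_pairs, english_pairs))


print(support(F_MALZONEK, E_IS_MARRIED_TO))  # 2
```

Because support is an absolute count, an attribute that occurs in only a handful of infoboxes can never reach a high threshold, which is the weakness the slide points out.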

  • Infobox attributes mapping

    pl/infobox/małżonek =? isMarriedTo

    E_isMarriedTo:
    (Barack_Obama, Michelle_Obama)
    (Elvis_Presley, Priscilla_Presley)
    (John_Coltrane, Alice_Coltrane)

    F_małżonek:
    (Barack_Obama, Michelle_Obama)
    (Elvis_Presley, Priscilla_Presley)
    (John_Coltrane, Ravi Coltrane)
    (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz)
    (pl/Krystyna_Pyrkosz, pl/Witold_Pyrkosz)
    (pl/Grażyna_Torbicka, pl/Adam_Torbicki)
    (pl/Szymon_Majewski, pl/Magda_Majewska)

    confidence(F_a, E_r) = |matches(F_a, E_r)| / |contrib(F_a)|

    Too restrictive for attributes with a lot of new facts but few matches
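
The confidence measure normalises the number of matches by the size of the attribute's contribution. The sketch below assumes that contrib(F_a) denotes all subject-object pairs of the foreign attribute; the slide does not spell this out, so treat that reading as an assumption.

```python
E_IS_MARRIED_TO = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Alice_Coltrane"),
}
F_SMALL = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Ravi Coltrane"),
    ("pl/Izabella_Olszewska", "pl/Tadeusz_Jurasz"),
}
# The same attribute after more local (Polish-only) couples are added.
F_LARGE = F_SMALL | {
    ("pl/Krystyna_Pyrkosz", "pl/Witold_Pyrkosz"),
    ("pl/Grażyna_Torbicka", "pl/Adam_Torbicki"),
    ("pl/Szymon_Majewski", "pl/Magda_Majewska"),
}


def confidence(foreign_pairs, english_pairs):
    """Share of the attribute's pairs that also occur in the English relation.

    Assumption: contrib(F_a) is taken to be every pair of the attribute.
    """
    foreign = set(foreign_pairs)
    if not foreign:
        return 0.0
    return len(foreign & set(english_pairs)) / len(foreign)


print(confidence(F_SMALL, E_IS_MARRIED_TO))  # 0.5
print(round(confidence(F_LARGE, E_IS_MARRIED_TO), 3))  # 0.286
```

Adding three new, perfectly valid Polish couples lowers the score from 2/4 to 2/7, which illustrates why the slide calls the measure too restrictive for attributes that contribute many new facts.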

  • Infobox attributes mapping

    pl/infobox/małżonek =? isMarriedTo

    E_isMarriedTo:
    (Barack_Obama, Michelle_Obama)
    (Elvis_Presley, Priscilla_Presley)
    (John_Coltrane, Alice_Coltrane)

    F_małżonek:
    (Barack_Obama, Michelle_Obama)
    (Elvis_Presley, Priscilla_Presley)
    (John_Coltrane, Ravi Coltrane)
    (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz)
    (pl/Krystyna_Pyrkosz, pl/Witold_Pyrkosz)
    (pl/Grażyna_Torbicka, pl/Adam_Torbicki)
    (pl/Szymon_Majewski, pl/Magda_Majewska)

    pca(F_a, E_r) = |matches(F_a, E_r)| / (|matches(F_a, E_r)| + |clashes(F_a, E_r)|)

    Open-world assumption

    Can be misled by clashes

    L. Galarraga, C. Teflioudi, K. Hose, F. M. Suchanek, AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases, WWW 2013
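
The pca measure ignores pairs whose subject is unknown on the English side and only penalises clashes. The slide does not define a clash; the sketch below assumes the usual reading of the partial completeness assumption, namely a foreign pair whose subject has some English fact for the relation but with a different object.

```python
E_IS_MARRIED_TO = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Alice_Coltrane"),
}
F_MALZONEK = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Ravi Coltrane"),
    ("pl/Izabella_Olszewska", "pl/Tadeusz_Jurasz"),
    ("pl/Krystyna_Pyrkosz", "pl/Witold_Pyrkosz"),
    ("pl/Grażyna_Torbicka", "pl/Adam_Torbicki"),
    ("pl/Szymon_Majewski", "pl/Magda_Majewska"),
}


def pca_confidence(foreign_pairs, english_pairs):
    """matches / (matches + clashes), skipping subjects unknown in English."""
    english = set(english_pairs)
    known_subjects = {subject for subject, _ in english}
    n_matches = 0
    n_clashes = 0
    for pair in set(foreign_pairs):
        if pair in english:
            n_matches += 1
        elif pair[0] in known_subjects:
            # Assumed clash definition: known subject, different object.
            n_clashes += 1
    total = n_matches + n_clashes
    return n_matches / total if total else 0.0


print(round(pca_confidence(F_MALZONEK, E_IS_MARRIED_TO), 3))  # 0.667
```

The four Polish-only couples no longer lower the score; only the (John_Coltrane, Ravi Coltrane) pair counts against the mapping, giving 2/3.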

  • Infobox attributes mapping

    pl/infobox/małżonek =? isMarriedTo

    E_isMarriedTo:
    (Barack_Obama, Michelle_Obama)
    (Elvis_Presley, Priscilla_Presley)
    (John_Coltrane, Alice_Coltrane)

    F_małżonek:
    (Barack_Obama, Michelle_Obama)
    (Elvis_Presley, Priscilla_Presley)
    (John_Coltrane, Ravi Coltrane)
    (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz)

    F*_małżonek

    Random sample

    wilson(F_a, E_r) = c - δ

    With 95% probability the true proportion of matches falls into [c - δ, c + δ]
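
The wilson measure takes the lower end of a 95% interval around the observed match rate. The sketch below uses the standard Wilson score interval with z = 1.96; which set of pairs the authors use as the sample size is not stated on the slide, so the function takes the match count and sample size as plain parameters (both names are illustrative).

```python
import math


def wilson_interval(n_matches, n_sample, z=1.96):
    """Return (c, delta) of the Wilson score interval [c - delta, c + delta].

    n_matches / n_sample is the observed proportion of matches;
    z = 1.96 corresponds to a 95% interval.
    """
    if n_sample == 0:
        return 0.0, 0.0
    p = n_matches / n_sample
    denom = 1 + z * z / n_sample
    center = (p + z * z / (2 * n_sample)) / denom
    delta = (z / denom) * math.sqrt(
        p * (1 - p) / n_sample + z * z / (4 * n_sample * n_sample)
    )
    return center, delta


def wilson_lower(n_matches, n_sample, z=1.96):
    """Lower bound c - delta, used as a cautious estimate of the match rate."""
    center, delta = wilson_interval(n_matches, n_sample, z)
    return center - delta
```

The same observed proportion gives a higher lower bound when it rests on more evidence: `wilson_lower(20, 40)` is larger than `wilson_lower(2, 4)`, even though both samples have a match rate of 0.5.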

  • 3. Fact extraction

    en/infobox/married → isMarriedTo

    pl/infobox/małżonek → isMarriedTo

  • Mapping quality

    Thresholds: Confidence 16%, Wilson 4%

    Estimated on a manually annotated sample
    Good performance across different languages
    Chosen so that we get high recall at precision > 95%

  • Mapping quality (all values in %)

            Confidence 16%       Wilson 4%
    Lang    Prec  Rec   F1       Prec  Rec   F1
    ar      100   73    85       100   82    90
    de      100   37    54        98   56    72
    es       96   19    32        95   29    45
    fa      100   49    66        97   54    69
    fr      100   16    27       100   69    82
    it      100    7    12        98   23    37
    nl      100   19    32       100   22    36
    pl       95   10    19        97   64    77
    ro       96   52    67        95   70    81

    High precision consistent across languages.
    Higher recall for smaller Wikipedias.
    Lower threshold for Wilson helps increase recall.
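
The F1 columns combine precision and recall. Assuming the usual definition of F1 as the harmonic mean of the two, it can be recomputed from the table as sketched below; values recomputed from the rounded table entries can differ from the printed F1 by about a point, presumably because the slide's figures were rounded.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# German, Confidence column: precision 100, recall 37
print(round(f1(100, 37)))  # 54
# German, Wilson column: precision 98, recall 56
print(round(f1(98, 56)))  # 71
```

The second value illustrates the rounding caveat: the slide prints 72 where the rounded inputs give about 71.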

  • YAGO3

    Large, clean knowledge base from multilingual Wikipedias.
    Single coherent taxonomy.
    Mapping of infobox attributes to YAGO relations.

    de/Kirdorf (Bedburg), hasNumberOfPeople, "1204"^^xsd:integer
    fr/Château de Montcony, isLocatedIn, Burgundy
    pl/Henryk Pietras, wasBornIn, de/Debiensko

    1M new entities (3.5M for English)
    2.5M new facts (6.5M for English)

    Thank you!
    http://yago-knowledge.org