Funded by the Competence Centers Programme (Kompetenzzentrenprogramm)
www.know-center.at
© Know-Center 2012
Measuring the Quality of Web Content using Factual Information
16. April 2012
WebQuality 2012 workshop at WWW 2012
Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia Cagnina, Christopher Horn, Benno Stein and Michael Granitzer
Agenda
Motivation
Approach
Results
Summary and Outlook
Motivation
People's decisions are often based on Web content
Web content lacks quality control and verification
Inaccurate or incorrect information; no fact checking
Measures are needed to capture credibility and quality aspects
with respect to facts!
Approach
Measure information quality based on factual information
3 Approaches:
Use simple statistics about the facts obtained from text
Exploit relational information contained in facts
Use semantic relationships like meronymy and hypernymy
First approach:
Use simple statistical features about facts in a document
Indicates how informative a document is
Derive facts from Web content using Open Information Extraction
Definition of Factual Density
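Fact Count: number of facts extracted from a document
Factual Density: fact count relative to document length (number of words)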
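The two definitions above can be sketched as follows. This is a minimal illustration, assuming facts have already been extracted by an Open IE system (e.g. ReVerb) and are given as (argument1, relation, argument2) triples, and assuming document length is approximated by a whitespace word count; the function names are hypothetical.

```python
from typing import List, Tuple

# Assumption: a fact is an Open IE style triple (argument1, relation, argument2).
Fact = Tuple[str, str, str]


def fact_count(facts: List[Fact]) -> int:
    """Number of facts extracted from a document."""
    return len(facts)


def word_count(text: str) -> int:
    """Document length, approximated here by whitespace-separated tokens (an assumption)."""
    return len(text.split())


def factual_density(facts: List[Fact], text: str) -> float:
    """Fact count normalized by document length; 0.0 for an empty document."""
    size = word_count(text)
    return fact_count(facts) / size if size else 0.0


if __name__ == "__main__":
    text = "Graz is a city in Austria. It is the capital of Styria."
    facts = [("Graz", "is a city in", "Austria"), ("It", "is the capital of", "Styria")]
    # prints: 2 12 0.167
    print(fact_count(facts), word_count(text), round(factual_density(facts, text), 3))
```

Normalizing by length is what lets the measure compare a short article against a long one; the raw fact count alone would mostly track article length.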
Experiments
Wikipedia: 1000 Featured and Good articles versus 1000 Non-Featured (randomly selected)
Featured articles: comprehensive coverage of the major facts in the context of the article's subject
Baseline: Word Count [Blumenstock 2008]
Featured articles are longer than non-featured ones
Bias: longer documents contain more facts
Evaluation: 2 Datasets
Unbalanced: articles differ in length
Balanced: articles similar in length
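The slides do not state how the balanced dataset was assembled. One plausible way, shown here only as a hypothetical sketch, is to greedily pair each featured article with the unused non-featured article whose word count is closest, discarding pairs whose relative length difference exceeds a tolerance (`max_rel_diff` is an assumed parameter, not taken from the source).

```python
from typing import Dict, List, Tuple


def balance_by_length(
    featured: Dict[str, int],
    non_featured: Dict[str, int],
    max_rel_diff: float = 0.1,
) -> List[Tuple[str, str]]:
    """Hypothetical length matching: pair each featured article (title -> word count)
    with the closest-length unused non-featured article, keeping only pairs whose
    length difference is at most max_rel_diff times the featured article's length."""
    pool = sorted(non_featured.items(), key=lambda kv: kv[1])
    pairs: List[Tuple[str, str]] = []
    for title, length in sorted(featured.items(), key=lambda kv: kv[1]):
        if not pool:
            break
        # Index of the remaining non-featured article with the closest word count.
        idx = min(range(len(pool)), key=lambda i: abs(pool[i][1] - length))
        cand_title, cand_len = pool[idx]
        if abs(cand_len - length) <= max_rel_diff * max(length, 1):
            pairs.append((title, cand_title))
            pool.pop(idx)  # each non-featured article is used at most once
    return pairs


if __name__ == "__main__":
    featured = {"A": 1000, "B": 5000}
    non_featured = {"x": 950, "y": 300, "z": 5200}
    print(balance_by_length(featured, non_featured))
```

With these toy counts, article A is paired with x and B with z, while y (300 words) stays unmatched. This kind of matching removes word count as a discriminating signal, which is the point of the balanced setting.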
Distribution of documents in both datasets with respect to word count
Precision/Recall curves of Factual Density
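Precision/recall points of the kind plotted on this slide can be obtained by ranking documents by a score (here, factual density) and measuring precision and recall at every cut-off. The sketch below is the standard ranking-based computation, assuming label 1 marks featured/good articles; it is not necessarily the exact evaluation protocol used by the authors.

```python
from typing import List, Tuple


def precision_recall_curve(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Rank documents by descending score and return (recall, precision)
    after each cut-off k. labels: 1 = featured/good, 0 = non-featured."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    points: List[Tuple[float, float]] = []
    tp = 0  # true positives among the top-k documents
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        points.append((tp / total_pos, tp / k))
    return points


if __name__ == "__main__":
    densities = [0.9, 0.8, 0.3, 0.1]  # toy factual density scores
    is_featured = [1, 0, 1, 0]
    for recall, precision in precision_recall_curve(densities, is_featured):
        print(f"recall={recall:.2f} precision={precision:.2f}")
```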
Results: Factual Density on balanced corpus
Experiments – Relational Features
Approach 2: exploiting relational information contained in facts
Extract relational features from articles
Use relations from ReVerb: binary relations (e1, relation, e2)
Use them to train a classifier to discriminate between featured/good and non-featured
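The slides do not name the features or the classifier. As a hypothetical sketch, each article can be represented by a bag of relation phrases from its ReVerb triples, and a multinomial naive Bayes model (a stand-in chosen for brevity, not necessarily what the authors used) can then separate featured/good from non-featured articles. The triples are assumed to be given; `relation_features` and `RelationNB` are illustrative names.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # assumed ReVerb output: (e1, relation, e2)


def relation_features(triples: List[Triple]) -> Counter:
    """Bag of relation phrases: how often each lower-cased relation occurs."""
    return Counter(rel.lower() for _, rel, _ in triples)


class RelationNB:
    """Multinomial naive Bayes over relation-phrase counts with Laplace smoothing."""

    def fit(self, docs: List[Counter], labels: List[str]) -> "RelationNB":
        self.classes = sorted(set(labels))
        self.vocab = {f for d in docs for f in d}
        self.log_prior: Dict[str, float] = {}
        self.log_lik: Dict[str, Dict[str, float]] = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            totals: Counter = Counter()
            for d in class_docs:
                totals.update(d)
            denom = sum(totals.values()) + len(self.vocab)
            self.log_lik[c] = {f: math.log((totals[f] + 1) / denom) for f in self.vocab}
        return self

    def predict(self, doc: Counter) -> str:
        def score(c: str) -> float:
            # Relations never seen during training are ignored.
            return self.log_prior[c] + sum(
                n * self.log_lik[c][f] for f, n in doc.items() if f in self.vocab
            )
        return max(self.classes, key=score)


if __name__ == "__main__":
    train = [
        relation_features([("X", "was born in", "Y"), ("X", "is the capital of", "Z")]),
        relation_features([("X", "was born in", "Y")]),
        relation_features([("X", "is", "Y")]),
        relation_features([("A", "is", "B"), ("C", "is", "D")]),
    ]
    labels = ["featured", "featured", "non-featured", "non-featured"]
    model = RelationNB().fit(train, labels)
    print(model.predict(relation_features([("P", "was born in", "Q")])))
```

Counting relation phrases keeps the example small; richer variants could also use the argument entities e1 and e2.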
Summary
Simple fact-related measure: Factual Density
Based on Factual Density, featured/good articles can be separated from non-featured ones if article lengths are similar
If articles differ in length, word count is the stronger feature; future work: combine both
Plan to incorporate edit history (hypothesis: more editors, higher factual density)
Preliminary experiments with relational features
Promising results; more work planned in this direction
The goal here is to bring semantics into the field of Information Quality
We expect this to unlock several IQ dimensions, e.g. generality vs. specificity
Thank you for your attention!
Elisabeth Lex