Representing discourse and argumentation as an application of Web Science

Discourse on the Web currently cannot be appropriately represented, which hampers searching and querying. Based on insights from Web Science, DERI Galway has developed three different approaches for representing and mining discourse.

Chapter Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Representing discourse and argumentation as an application of Web Science

Benjamin Heitmann, Dr. Conor Hayes

Digital Resources for the Humanities and Arts Conference 2009

Benjamin.Heitmann@deri.org

Introduction

The Web mirrors most areas of today’s society (e.g. entertainment, science and humanities)

Current Web does not capture structure of critique, argumentation, interpretation

Representing types and granularity of discourse and links is necessary

DERI has 3 approaches to discourse representation

Foundation: Web Science as an interdisciplinary approach to understanding and engineering the Web (started by Tim Berners-Lee)

Outline

Motivation:

Knowledge representation techniques to enable more sophisticated searching and querying of discourse on the Web

Introducing Web Science:

an interdisciplinary approach to understanding the Web and its evolution

Applying the Web Science method:

three approaches for discourse representation

[Figure: argumentation elements (motivation, argument, counter-argument, evaluation, conclusion) connected by references across a primary text, a research paper and a research weblog; publication frequency increases]

Discourse and argumentation on the Web

The Web doesn't properly capture the dynamic argumentation structures in discourse

Current search only captures:
– Plain text
– General links
– Citations

No search for:
– Relations between concepts
– Negative relations
– Semantics of argumentation: argument, counter-argument; condition, evidence, solution

Representing the structure of discourse

Knowledge on the Web is not sufficiently connected

No standard vocabularies for representation of discourse structure and link granularity

Queries are unintuitive and imprecise; negative queries are not possible

Links are untyped, and only on document level

No semantics of relationships

Source: “Clickstream Data Yields High-Resolution Maps of Science,” Bollen, Van de Sompel, et al. PLoS ONE (2009)
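To make the points about typed links, link granularity and negative queries concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the relation names (`supports`, `refutes`, `cites`), the `example.org` URLs and the helper functions are all hypothetical and do not come from any of the DERI vocabularies or tools.

```python
from typing import NamedTuple


class TypedLink(NamedTuple):
    source: str    # URL with a fragment, so a link can start below document level
    relation: str  # hypothetical relation type, e.g. "supports" or "refutes"
    target: str


# All URLs and relation names below are made up for illustration.
CLAIM_1 = "http://example.org/paper#claim1"
CLAIM_2 = "http://example.org/paper#claim2"
LINKS = [
    TypedLink("http://example.org/blog/post1#p2", "supports", CLAIM_1),
    TypedLink("http://example.org/blog/post2#p1", "refutes", CLAIM_1),
    TypedLink("http://example.org/blog/post3#p4", "cites", CLAIM_2),
]


def sources_with(links, relation, target):
    """Return the sources that point at `target` with the given relation type."""
    return [l.source for l in links if l.relation == relation and l.target == target]


def sources_without(links, relation, target):
    """Negative query: sources that link to `target`, but never with `relation`."""
    linking = {l.source for l in links if l.target == target}
    excluded = set(sources_with(links, relation, target))
    return sorted(linking - excluded)


if __name__ == "__main__":
    print(sources_with(LINKS, "refutes", CLAIM_1))
    print(sources_without(LINKS, "supports", CLAIM_1))
```

With untyped, document-level links, neither query could be expressed: there would be no relation to filter on and no fragment to point at.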

Insights from Web Science

The “Web Science” idea was started by Tim Berners-Lee and researchers from Southampton (see sources)

1. Understanding the current Web requires an interdisciplinary and holistic view of the Web as a whole

2. On the Web, engineering and social factors will influence each other and create a feedback loop

3. Properties of the Web are based on emergent behaviour, which can be empirically measured

A Systems-level view of the Web

Classical reductionist approach does not work

Understanding the current Web requires an interdisciplinary view

No delegation of research on one area to only one discipline

© Web Science Research Initiative

The Web Science Process Model

On the Web, engineering and social factors will influence each other.

Increase in complexity: result is transition from micro to macro effects

Example: Evolution of Blogs
– Independent blogs: Track-backs, Comments, Spam
– Twitter: Microblogging, HashTags, Location aware
– Facebook: Lifestreaming, Privacy

© Lawrence Lessig

Source: CACM Web Science Article

Emergent properties of the Web

Empirical properties:
– In- and out-degree distribution of links
– Power laws
– Growth: 7 million new pages a day in 2005

Emergent patterns:
– Popular tags (folksonomies) on Web 2.0 sites
– Emergence of an editorial elite on Wikipedia
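As a rough illustration of how such properties can be measured empirically, the following Python sketch counts in- and out-degrees from a list of links and estimates a power-law exponent. The edge list and function names are invented for this example, and the estimator is the standard continuous-approximation maximum-likelihood formula, not necessarily the method used in the cited study.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Count in- and out-degree per page from (source, target) link pairs."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    return in_deg, out_deg


def degree_distribution(degrees):
    """Map each degree value to the number of pages with that degree.

    Pages that never appear with this kind of link (degree 0) are not counted.
    """
    return Counter(degrees.values())


def estimate_exponent(values, x_min=1):
    """Continuous-approximation maximum-likelihood estimate of a power-law exponent."""
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need at least one value above x_min")
    return 1 + len(tail) / log_sum


# Hypothetical toy link graph: three pages link to "hub".
EDGES = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("a", "b"), ("hub", "c")]

if __name__ == "__main__":
    in_deg, out_deg = degree_counts(EDGES)
    print(degree_distribution(in_deg))
    print(estimate_exponent([2, 4, 8], x_min=2))
```

A toy graph like this is far too small for a meaningful fit; on real Web crawls the same counting step is applied to many millions of links.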

Source: “Graph structure in the Web”, Broder, Kumar et al.

© Clay Shirky

Approaches for discourse representation

The Web Science method and discourse representation:

Interdisciplinary: theoretical foundation is based on speech act theory and language game theory

Expect a feedback loop between Semantic Web solutions and usage patterns of the community

Empirical approach: CORAAL uses knowledge extraction and integration on large data collections

Normative (engineering) approaches:
– SIOC Argumentation vocabulary: light-weight and community-driven
– SALT: annotation of argumentation semantics

CORAAL: empirical discourse analysis

Knowledge extraction and integration

Pattern discovery:
– Use emergent patterns in large document collections

Go beyond text-based search:
– Answer negative queries
– Detect relations between concepts

Uses Natural Language Processing

No mark-up required

CORAAL screen shot of results for the search term “breast cancer”

SIOC argumentation vocabulary

Light-weight and informal

Express structure of argumentation:
– Who is participating?
– Where are the elements of the discourse distributed?
– How are the elements connected?

Extensibility enables community involvement
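The three questions above can be illustrated with a small Python sketch. It is deliberately not written in terms of the actual SIOC classes and properties: the `Item` type, the relation names (`counter_argument_to`, `evidence_for`) and all URIs are hypothetical stand-ins, chosen only to show what kind of structure such a vocabulary needs to express.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    uri: str     # where this element of the discourse lives
    author: str  # who contributed it
    site: str    # which site hosts it
    # (relation, target_uri) pairs; the relation names are hypothetical
    links: list = field(default_factory=list)


# Made-up discussion spread over two sites.
ITEMS = [
    Item("http://example.org/forum/1", "alice", "example.org"),
    Item("http://example.net/blog/7", "bob", "example.net",
         [("counter_argument_to", "http://example.org/forum/1")]),
    Item("http://example.org/forum/2", "carol", "example.org",
         [("evidence_for", "http://example.net/blog/7")]),
]


def participants(items):
    """Who is participating?"""
    return sorted({item.author for item in items})


def sites(items):
    """Where are the elements of the discourse distributed?"""
    return sorted({item.site for item in items})


def connections(items):
    """How are the elements connected? Returns (source, relation, target) triples."""
    return [(item.uri, rel, target) for item in items for rel, target in item.links]


if __name__ == "__main__":
    print(participants(ITEMS))
    print(sites(ITEMS))
    print(connections(ITEMS))
```

Because relations are plain strings here, a community could add new relation types without changing the data model, which is one simple reading of the extensibility point.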

[Figure: SIOC argumentation vocabulary]

SALT: Semantically Annotated LaTeX

Enables mark-up of documents for claim identification

Exposes the semantics of the argumentation. Examples:
– Claims, explanations
– Rhetorical structure (abstract, contribution, evaluation)
– Argument, counter-argument

Creates PDF with content and structure

[Figure: discourse representation in SALT]

Summary

Representing discourse allows intuitive querying and searching of the argumentation semantics

The Web Science method provides insights into representing discourse:
– Use an interdisciplinary approach
– Expect a feedback loop between technical and social factors
– Detect emergent properties and patterns

Three approaches at DERI for representing discourse:
– CORAAL: empirical, knowledge extraction and integration
– SIOC argumentation vocabulary: light-weight, bottom-up
– SALT: annotate argumentation semantics in publications

Questions? and Sources!

These slides: http://www.slideshare.net/metaman

Web Science: “Web science: an interdisciplinary approach to understanding the web”, Hendler, Shadbolt, Hall, Berners-Lee, Weitzner, Communications of the ACM (2008)

CORAAL: demo at http://coraal.deri.ie:8080/coraal; “CORAAL - Dive into publications, Bathe in the Knowledge”, Novacek, Groza, et al., Journal of Web Semantics, Elsevier (2009)

SIOC argumentation vocabulary: “Expressing Argumentative Discussions in Social Media Sites”, Lange, Bojars, et al., Workshop on Social Data on the Web at the International Semantic Web Conference (2008)

SALT: “SALT - Semantically Annotated LaTeX for Scientific Publications”, Groza, Handschuh, et al., European Semantic Web Conference (2007)
