Lightweight Text Analytics using Linked Data Ali Khalili, Sören Auer, Axel-Cyrille Ngonga Ngomo Extended Semantic Web Conference May 27th, 2014 Crete, Greece http://context.aksw.org

conTEXT -- Lightweight Text Analytics using Linked Data

DESCRIPTION

The Web democratized publishing -- everybody can easily publish information on a website, blog, in social networks or microblogging systems. The more the amount of published information grows, the more important technologies for accessing, analysing, summarising and visualising information become. While substantial progress has been made in recent years in each of these areas individually, we argue that only the intelligent combination of approaches will make this progress truly useful and leverage further synergies between techniques. In this paper we develop a text analytics architecture of participation, which allows ordinary people to use sophisticated NLP techniques for analysing and visualising their content, be it a blog, Twitter feed, website or article collection. The architecture comprises interfaces for information access, natural language processing and visualisation. Different exchangeable components can be plugged into this architecture, making it easy to tailor to individual needs. We evaluate the usefulness of our approach by comparing both the effectiveness and efficiency of end users within a task-solving setting. Moreover, we evaluate the usability of our approach using a questionnaire. Both evaluations suggest that ordinary Web users are empowered to analyse their data and perform tasks that were previously out of reach.

Page 1: conTEXT -- Lightweight Text Analytics using Linked Data

Lightweight Text Analytics using Linked Data

Ali Khalili, Sören Auer, Axel-Cyrille Ngonga Ngomo

Extended Semantic Web Conference May 27th, 2014, Crete, Greece

http://context.aksw.org

2 Agenda

Motivation

How does conTEXT work?

Workflow

Features

Evaluation

Conclusion

Demo

3 Motivation: Analytical Information Imbalance

People should be able to find out what patterns can be discovered and what conclusions can be drawn from the information they share.

4 Motivation: Lightweight Text Analytics

Unstructured

Semi-structured

Structured

• IBM Content Analytics platform • GATE • Apache UIMA

• Attensity • Trendminer • MashMaker • Thomson Data Analyzer

• Zoho Reports • SAP NetWeaver • Jackbe • Rapidminer

• Excel • DataWrangler • Google Docs Spreadsheets • Google Refine

• Alchemy • OpenCalais

• Facete • CubeViz

• TweetDeck • Topsy • Flumes

Lack of tools dealing with unstructured content, catering to non-expert users and providing extensible analytics interfaces.

5 conTEXT

http://context.aksw.org

A platform for lightweight text analytics

Approach

No installation and configuration required

Access content from a variety of sources

Instantly show the results of analysis to users in a variety of visualizations

Allow refinement of automatic annotations and take feedback into account

Provide a generic architecture where different modules for content acquisition, natural language processing and visualization can be plugged together

How does it work?

6 Data Collection

Data transformation

Input Data Model

• REST APIs

• SPARQL endpoints

• RSS, Atom, RDF feeds

• Web Crawlers

Handling different input types

- RDF-based

- Relational
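
The data transformation step above can be illustrated with a minimal sketch that maps one input type (an RSS feed) onto a uniform document model. The `Document` class, its field names, the `parse_rss` helper and the sample feed are all hypothetical and chosen for illustration only; they are not conTEXT's actual schema or code.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    """Hypothetical unified input model (illustrative field names)."""
    source: str
    title: str
    text: str


# Made-up sample feed so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <description>Leipzig hosts a Semantic Web meetup.</description>
    </item>
    <item>
      <title>Second post</title>
      <description>Notes from Crete.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text: str, source: str) -> List[Document]:
    """Turn every <item> of an RSS 2.0 feed into a Document."""
    root = ET.fromstring(xml_text)
    docs = []
    for item in root.iter("item"):
        docs.append(Document(
            source=source,
            title=item.findtext("title", default=""),
            text=item.findtext("description", default=""),
        ))
    return docs


docs = parse_rss(SAMPLE_RSS, source="rss")
for d in docs:
    print(d.source, "|", d.title, "|", d.text)
```

Other input types (REST APIs, SPARQL endpoints, crawlers) would presumably get their own adapter producing the same model, so that later analysis steps do not depend on where the text came from.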

7 Data Analysis

Natural Language Processing (NLP)

• DBpedia Spotlight (http://spotlight.dbpedia.org)

• FOX (http://fox.aksw.org)

• Any other NLP services which support NIF

Diagram labels: DBpedia, Named entities, NLP Service, Corpora

8 NLP Interchange Format (NIF)

http://nlp2rdf.org

An RDF/OWL-based format

Provides interoperability between Natural Language Processing (NLP) tools and services.

Standardizes access parameters, annotations (e.g. tokenization), validation & log messages.
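
As a rough illustration of what a NIF annotation carries, the sketch below builds the triples for one entity mention using character-offset URIs and properties from the NIF 2.0 core and ITS RDF vocabularies. The helper name `nif_mention`, the example document URI and the sample sentence are assumptions made for this sketch; real NLP services return considerably richer output.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"


def nif_mention(doc_uri, text, surface, entity_uri):
    """Return (subject, predicate, object) triples for one mention.

    Hypothetical helper: only the first occurrence of `surface`
    in `text` is annotated; ValueError is raised if it is absent.
    """
    begin = text.index(surface)
    end = begin + len(surface)
    context = f"{doc_uri}#char=0,{len(text)}"   # the whole document string
    mention = f"{doc_uri}#char={begin},{end}"   # the annotated substring
    return [
        (context, NIF + "isString", text),
        (mention, NIF + "referenceContext", context),
        (mention, NIF + "beginIndex", begin),
        (mention, NIF + "endIndex", end),
        (mention, NIF + "anchorOf", surface),
        (mention, ITSRDF + "taIdentRef", entity_uri),
    ]


text = "Bill Gates visited Berlin."
triples = nif_mention(
    "http://example.org/doc1",            # made-up document URI
    text,
    "Berlin",
    "http://dbpedia.org/resource/Berlin",
)
for s, p, o in triples:
    print(s, p, o)
```

Because every tool describes mentions with the same offsets and properties, a consumer can swap one NIF-capable service for another without changing how it reads the annotations.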

9 NLP Interchange Format (NIF)

10 Data Enrichment

De-referencing the DBpedia URIs of the recognized entities.

(e.g. longitude and latitude for locations, birth and death dates for people, etc.)

Matching the entity co-occurrences with pre-defined natural language patterns for DBpedia predicates provided by BOA (BOotstrapping linked datA)

(e.g. authorship relation)

Catalyst
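
The de-referencing step can be sketched as follows, assuming the resource description has already been fetched and is available in an RDF/JSON-style layout (subject, then predicate, then a list of value objects). The function name `extract_coordinates` and the sample data, including its coordinate values, are illustrative assumptions rather than real DBpedia output; the network request itself is deliberately left out.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def extract_coordinates(rdf_json, entity_uri):
    """Return (lat, long) for an entity, or None if either is missing."""
    props = rdf_json.get(entity_uri, {})

    def first_float(predicate):
        values = props.get(predicate, [])
        return float(values[0]["value"]) if values else None

    lat = first_float(GEO + "lat")
    lon = first_float(GEO + "long")
    if lat is None or lon is None:
        return None
    return (lat, lon)


# Illustrative sample only, shaped like an RDF/JSON description.
SAMPLE = {
    "http://dbpedia.org/resource/Berlin": {
        GEO + "lat": [{"type": "literal", "value": "52.52"}],
        GEO + "long": [{"type": "literal", "value": "13.405"}],
    }
}

coords = extract_coordinates(SAMPLE, "http://dbpedia.org/resource/Berlin")
missing = extract_coordinates(SAMPLE, "http://dbpedia.org/resource/Bill_Gates")
print(coords, missing)
```

The same pattern would apply to other enrichment properties, such as birth and death dates for people, by looking up different predicates.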

11 Data Mixing (Mashups)

NLP service integration

Composite corpus

E.g. Twitter + Blog + Facebook

Helps to create a user model

12 Data Visualization & Exploration

Different views on semantically enriched data

Using Exhibit & D3.js

13 Faceted browsing

14 Places map & People timeline

15 Tag cloud

16 Chordal graph view

17 Matrix view

18 Trend view

19 Sentiment view

20 Image view

21 Annotation refinement

Lightweight text analytics as an incentive for users to revise semantic annotations

RDFaCE WYSIWYM (What-You-See-Is-What-You-Mean) interface for manual content annotation in RDFa format

Feedback to NLP services (NLP calibration)

FOX Feedback API: http://139.18.2.164:4444/api/ner/feedback

DBpedia Spotlight Feedback API: http://spotlight.dbpedia.org/rest/feedback

22 Annotation refinement UI

23 conTEXT architecture overview

24 Other features: Interactive & Progressive Annotation

Interactive systems can be responsive despite low performance.

25 Other features: Real-time Semantic Analysis (ReSA)

https://github.com/ali1k/resa

26 Other features:

• Search Engine Optimization (SEO) using Schema.org & JSON-LD

• Drilling down results using a subgraph of DBpedia

• Changing the underlying DBpedia ontology
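
For the SEO feature, one plausible shape of the output is a Schema.org description serialized as JSON-LD, in which recognized entities are listed as `mentions` and linked to DBpedia through `sameAs`. The sketch below is only an assumption about how such markup could be generated; the `to_jsonld` helper and the sample entities are hypothetical and do not reproduce conTEXT's actual output.

```python
import json


def to_jsonld(headline, entities):
    """Build a Schema.org Article description as a JSON-LD dictionary.

    `entities` is a list of (name, schema_type, dbpedia_uri) tuples.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mentions": [
            {"@type": schema_type, "name": name, "sameAs": uri}
            for name, schema_type, uri in entities
        ],
    }


doc = to_jsonld(
    "Notes from a conference",  # made-up headline
    [
        ("Berlin", "Place", "http://dbpedia.org/resource/Berlin"),
        ("Bill Gates", "Person", "http://dbpedia.org/resource/Bill_Gates"),
    ],
)

# The resulting string could be embedded in a page inside a
# <script type="application/ld+json"> element.
print(json.dumps(doc, indent=2))
```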

27 Evaluation: Usefulness study

Task-driven usefulness study

25 users

10 questions pertaining to knowledge discovery in corpora of unstructured data

E.g. What are the five countries most mentioned in Bill Gates' tweets?

28 Evaluation: Results of usefulness study

Measuring time (in seconds) and Jaccard similarity of answers with and without conTEXT

Avg. 136% more time without conTEXT
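
Jaccard similarity compares a participant's answer set with the reference answer set as the size of their intersection divided by the size of their union. The sketch below shows the computation; the country lists are invented for illustration and are not data from the study.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


# Invented example: a reference answer and one participant's answer.
reference = ["USA", "India", "China", "Kenya", "Nigeria"]
answer = ["USA", "India", "China", "France"]

score = jaccard(reference, answer)
print(score)  # 3 shared items out of 6 distinct items -> 0.5
```

A score of 1.0 means the participant named exactly the reference items, while 0.0 means no overlap at all.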

29 Evaluation: Usability study

System Usability Scale (SUS): 82

http://www.measuringusability.com/
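
For readers unfamiliar with SUS, the standard scoring works on ten items rated from 1 to 5: odd-numbered items contribute (rating - 1), even-numbered items contribute (5 - rating), and the sum is multiplied by 2.5 to give a score between 0 and 100. The sketch below implements this standard formula; the sample responses are invented and are not the study's data.

```python
def sus_score(responses):
    """Compute the System Usability Scale score for one respondent.

    `responses` holds the ten item ratings (1 to 5) in questionnaire order.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # positively worded (odd) items
        else:
            total += 5 - rating   # negatively worded (even) items
    return total * 2.5


best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
neutral = sus_score([3] * 10)
print(best, neutral)
```

A study-level score such as the one reported on this slide is typically the mean of the per-respondent scores.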

30 Conclusions

Lightweight Text Analytics using Linked Data

Democratizing the use of NLP

Alleviating the Semantic Web's chicken-and-egg problem

Harnessing the power of feedback loops

31 Future Work

Improving the performance & scalability of views

Exposing APIs for third parties

Enabling batch refinement of annotations

More input source types

More…

32 Any Questions?

33 Demo

Progressive data collection and annotation: http://context.aksw.org

Different views (LOD2 Blog): http://context.aksw.org/app/hub.php?corpus=6

Example of adding extra input types + changing the DBpedia ontology + composite corpora (LinkedIn Jobs): http://context.aksw.org/app/hub.php?corpus=242