
VANITAS Visualising and AugmeNting Interesting Text collectionS · 2019. 9. 30.


VANITAS: Visualising and AugmeNting Interesting Text collectionS

System Design

Supervisor: Dr. Joel Azzopardi

Co-Supervisor: Dr. Charlie Abela

Problem

Information Overload Problem and Open Data Movement

Need

Text analysis techniques to handle raw textual data available on the World Wide Web.

Aims and Objectives

To research and develop methods whereby users can obtain an overview of, and perform analyses on, interesting document collections, whilst being able to 'drill down' into the information.

• Group similar documents into clusters

• Extract important and informative keyphrases

• Visualise the discovered information by utilising different techniques

Solution

Evaluation Results

Keyword Phrase Extraction Component:

• The implemented Modified B & Q algorithm surpasses both TextRank and the original B & Q algorithm. It also performs best when few keyphrases are considered.
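For readers unfamiliar with the TextRank baseline named above, the following is a minimal sketch of a TextRank-style keyword ranker. It is not the project's implementation and does not reproduce the B & Q or Modified B & Q algorithms. The function name `textrank_keywords`, the tiny `STOPWORDS` set, and all parameter defaults are assumptions made for illustration. It also ranks single words only and drops stopwords before building the co-occurrence window, which is a simplification.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real systems use a fuller one).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank words with a PageRank-style score over a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected graph: two words are linked if they occur within `window` positions.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # Power iteration: each word shares its score equally among its neighbours.
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[word])
            for word in neighbours
        }

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

A word that co-occurs with many distinct words accumulates the highest score, which is the intuition behind graph-based keyword extraction.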

Clustering Component:

• Bag-of-words features were the most effective, in contrast to keyphrases and named entities.

• The utilised No-K-Means algorithm surpassed the baseline as well as other systems.

• The Generalised Dunn’s Index did not indicate the best-performing similarity threshold.
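To illustrate how a similarity threshold can replace a fixed number of clusters, here is a minimal sketch of single-pass, threshold-based clustering over bag-of-words vectors. It is an assumption-laden illustration, not the No-K-Means algorithm as used in the project. The helper names (`bag_of_words`, `cosine`, `threshold_cluster`), the cosine measure, and the default threshold are all hypothetical choices.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Represent a document as raw term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def threshold_cluster(docs, threshold=0.3):
    """Assign each document to its most similar cluster, or start a new one.

    The number of clusters is not fixed in advance; it follows from `threshold`.
    """
    centroids = []  # summed term counts per cluster
    clusters = []   # document indices per cluster
    for i, doc in enumerate(docs):
        vec = bag_of_words(doc)
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            # No existing cluster is similar enough: open a new cluster.
            centroids.append(Counter(vec))
            clusters.append([i])
        else:
            centroids[best].update(vec)
            clusters[best].append(i)
    return clusters
```

Raising the threshold yields more, smaller clusters, while lowering it merges documents more readily. That is why choosing the threshold matters, and why a cluster-validity measure is a natural candidate for selecting it.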

Ayrton Senna Azzopardi

User Interface

Visualisation Component and Overall System:

• User study participants were quite satisfied with their experience, and commented on the system’s usability in its current state.

• Although they found the system useful in obtaining an overview of a document collection, some suggested possible improvements to enhance the user experience.

Conclusion

A tool that retrieves meaningful information from document collections and visualises it in an interactive manner was created.

Although there is room for improvement, our system succeeded in achieving the objectives set for this project.