VANITAS Visualising and AugmeNting Interesting Text collectionS · 2019. 9. 30. · VANITAS Visualising and AugmeNting Interesting Text collectionS System Design Supervisor: Dr. Joel

VANITAS: Visualising and AugmeNting Interesting Text collectionS

System Design

Supervisor: Dr. Joel Azzopardi

Co-Supervisor: Dr. Charlie Abela

Problem

Information Overload Problem and Open Data Movement

Need

Text analysis techniques to handle raw textual data available on the World Wide Web.

Aims and Objectives

To research and develop methods whereby users can obtain an overview of, and perform analyses on, interesting document collections, whilst being able to 'drill down' into the information.

• Group similar documents into clusters

• Extract important and informative keyphrases

• Visualise the discovered information by utilising different techniques

Solution

Evaluation Results

Keyword Phrase Extraction Component:

• The implemented Modified B & Q algorithm outperforms both TextRank and the original B & Q algorithm. It also performs best when only a few keyphrases are considered.
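For readers unfamiliar with the TextRank baseline named above, the following is a minimal word-level sketch of the general idea: build a co-occurrence graph over tokens and rank the nodes with a PageRank-style iteration. It is not the Modified B & Q algorithm evaluated here, and the function name `textrank` and its parameters (`window`, `damping`, `iterations`) are illustrative assumptions rather than this system's code.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score over a co-occurrence graph.

    Hypothetical, simplified sketch: edges are unweighted and link words
    that appear within `window` positions of each other.
    """
    # Build an undirected co-occurrence graph.
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                neighbours[word].add(tokens[j])
                neighbours[tokens[j]].add(word)

    vocabulary = list(neighbours)
    n = len(vocabulary)
    if n == 0:
        return {}

    # Start from a uniform distribution and iterate the PageRank update.
    scores = {w: 1.0 / n for w in vocabulary}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) / n
            + damping * sum(scores[u] / len(neighbours[u]) for u in neighbours[w])
            for w in vocabulary
        }
    return scores


tokens = "graph ranking graph words graph nodes".split()
scores = textrank(tokens)
top_word = max(scores, key=scores.get)  # the most central word in the graph
```

A full keyphrase extractor would typically also filter candidates by part of speech and merge adjacent high-scoring words into phrases; those steps are omitted from this sketch.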

Clustering Component:

• Bag-of-words features were the most effective, in contrast to keyphrases and named entities.

• The utilised No-K-Means algorithm surpassed the baseline as well as other systems.

• The Generalised Dunn’s Index did not indicate the best performing similarity threshold.
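To illustrate how bag-of-words features and a similarity threshold can drive clustering without fixing the number of clusters in advance, here is a minimal single-pass sketch. It is an assumption-laden illustration, not the No-K-Means implementation evaluated here: the helper names (`bag_of_words`, `cosine`, `threshold_cluster`), the whitespace tokenisation, and the default threshold of 0.3 are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased, whitespace-tokenised term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def threshold_cluster(docs, threshold=0.3):
    """Assign each document to its most similar cluster, or start a new one.

    A document joins an existing cluster only if its similarity to that
    cluster's summed term vector reaches `threshold` (an assumed value).
    """
    centroids = []  # summed term counts per cluster
    clusters = []   # document indices per cluster
    for i, doc in enumerate(docs):
        vec = bag_of_words(doc)
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            centroids.append(Counter(vec))
            clusters.append([i])
        else:
            centroids[best].update(vec)
            clusters[best].append(i)
    return clusters


docs = [
    "cats chase mice",
    "cats chase birds",
    "stock markets fall",
    "stock markets rise",
]
clusters = threshold_cluster(docs)
```

The number of clusters falls out of the threshold: a higher value yields more, tighter clusters, which is why choosing the threshold well matters for the results reported above.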

Ayrton Senna Azzopardi

User Interface

Visualisation Component and Overall System:

• User study participants were quite satisfied with their experience, whilst commenting on the system’s current usability.

• Although they found the system useful in obtaining an overview of a document collection, some suggested possible improvements to enhance the users’ experience.

Conclusion

A tool that retrieves meaningful information from document collections and visualises it in an interactive manner was created.

Although there is room for improvement, our system succeeded in achieving the objectives set for this project.

