VANITAS: Visualising and AugmeNting Interesting Text collectionS
System Design
Supervisor: Dr. Joel Azzopardi
Co-Supervisor: Dr. Charlie Abela
Problem
Information Overload Problem and Open Data Movement
Need
Text analysis techniques to handle raw textual data available on the World Wide Web.
Aims and Objectives
To research and develop methods whereby users can obtain an overview of interesting document collections and analyse them, whilst being able to 'drill down' into the information.
• Group similar documents into clusters
• Extract important and informative keyphrases
• Visualise the discovered information by utilising different techniques
Solution
Evaluation Results
Keyword Phrase Extraction Component:
• The implemented Modified B & Q algorithm surpasses both TextRank and the original B & Q algorithm. It also performs best when only a few keyphrases are considered.
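For readers unfamiliar with the TextRank baseline named above, the following is a minimal sketch of its core idea: words that co-occur within a small window are linked in a graph, and a PageRank-style iteration scores each word. It is a simplified illustration only. The function name `textrank_keywords` and its parameters are hypothetical, part-of-speech filtering and phrase merging are omitted, and it does not represent the evaluated implementation or the Modified B & Q algorithm.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank words with a simplified TextRank (illustrative sketch only)."""
    words = re.findall(r"[a-z]+", text.lower())

    # Build an undirected co-occurrence graph: two distinct words are
    # linked if they appear within `window` positions of each other.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                neighbours[word].add(words[j])
                neighbours[words[j]].add(word)

    # PageRank-style iteration: each word shares its score equally
    # among its neighbours.
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        updated = {}
        for word in neighbours:
            incoming = sum(scores[n] / len(neighbours[n]) for n in neighbours[word])
            updated[word] = (1 - damping) + damping * incoming
        scores = updated

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

In this sketch a word that co-occurs with many different words accumulates a higher score, which is the intuition behind graph-based keyword ranking.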
Clustering Component:
• Bag-of-words features were the most effective, in contrast to keyphrases and named entities.
• The utilised No-K-Means algorithm surpassed the baseline as well as other systems.
• The Generalised Dunn’s Index did not indicate the best-performing similarity threshold.
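The clustering findings above mention bag-of-words features and a similarity threshold. As a rough illustration of how the two can interact, the sketch below assigns each document to the most similar existing cluster if the cosine similarity reaches a threshold, and otherwise starts a new cluster, so the number of clusters is not fixed in advance. This is an assumed, generic single-pass scheme, not the No-K-Means algorithm itself; the names `bag_of_words`, `cosine`, `threshold_cluster` and the default threshold of 0.3 are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a document as lowercase term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def threshold_cluster(docs, threshold=0.3):
    """Single-pass clustering; returns lists of document indices."""
    clusters = []  # each cluster keeps summed term counts and member indices
    for index, doc in enumerate(docs):
        vector = bag_of_words(doc)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # No cluster is similar enough: start a new one.
            clusters.append({"centroid": Counter(vector), "members": [index]})
        else:
            # Summed counts serve as an unnormalised centroid.
            best["centroid"].update(vector)
            best["members"].append(index)
    return [cluster["members"] for cluster in clusters]
```

A lower threshold merges more documents into fewer clusters, while a higher one yields many small clusters, which is why the choice of threshold matters for the evaluation.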
Ayrton Senna Azzopardi
User Interface
Visualisation Component and Overall System:
• User study participants were quite satisfied with their experience and commented on the system’s usability in its current state.
• Although they found the system useful in obtaining an overview of a document collection, some suggested possible improvements to enhance the users’ experience.
Conclusion
A tool that retrieves meaningful information from document collections and visualises it in an interactive manner was created.
Although there is room for improvement, our system succeeded in achieving the objectives set for this project.