See Also: Auto Generated Recommendations Mislav Cimperšak Marija Tkalec Siniša Jovčić Faculty of Humanities and Social Sciences Ivana Lučića 3, Zagreb, Croatia INFuture 2009: Digital Resources and Knowledge Sharing

See Also: Auto Generated Recommendations
Mislav Cimperšak, Marija Tkalec, Siniša Jovčić

Faculty of Humanities and Social Sciences, Ivana Lučića 3, Zagreb, Croatia

INFuture 2009: Digital Resources and Knowledge Sharing

Introduction

•reliable source of information
•accessible to everyone around the world
•most up-to-date online encyclopedia

•disadvantages

See Also

•list of articles similar or related to the current article
•encourages users to keep browsing and reading articles on the page itself
•user-created list

Thesis

•users writing on similar topics create connections to the same articles
•by comparing the connections of two articles, we can estimate how similar the two articles are
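
The idea of comparing the connections of two articles can be sketched as a set-overlap score. The slides do not name a specific measure, so the Jaccard index used here is an assumption, and the function name `link_overlap` and the link sets are hypothetical.

```python
def link_overlap(links_a, links_b):
    """Jaccard index of two sets of outgoing connections (0.0 to 1.0).

    Assumption: the slides do not say which overlap measure was used.
    """
    a, b = set(links_a), set(links_b)
    if not a and not b:
        return 0.0  # two articles without connections share nothing
    return len(a & b) / len(a | b)


# Hypothetical connection sets, for illustration only
xfce = {"GUI", "Linux", "Unix", "GNU General Public License"}
gnome = {"GUI", "Linux", "Unix", "GNU General Public License", "Fedora"}
bsd_license = {"MIT license", "Apache License", "GNU General Public License"}

print(link_overlap(xfce, gnome))        # 4 shared out of 5 distinct connections
print(link_overlap(xfce, bsd_license))  # 1 shared out of 6 distinct connections
```

Under this sketch, two desktop-environment articles that point to nearly the same targets score high, while an article from a different topic area scores low.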

Goal

•creation of an automatic recommendation system for the “See also” section based on soft clustering of documents
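
As a rough illustration of how soft clusters could feed a “See also” list: in soft clustering an article may belong to more than one cluster, so its recommendations can be taken as the union of its cluster-mates. This is only a sketch under that assumption; the function name `see_also` and the example clusters are hypothetical and not taken from the study.

```python
from collections import defaultdict


def see_also(clusters):
    """Map each article to the other articles that share a cluster with it."""
    recommendations = defaultdict(set)
    for members in clusters:
        for article in members:
            # Every other member of the cluster becomes a candidate link.
            recommendations[article].update(m for m in members if m != article)
    return {article: sorted(related) for article, related in recommendations.items()}


# Hypothetical soft clusters: "Linux" belongs to both of them
clusters = [
    {"Xfce", "GNOME", "KDE", "Linux"},
    {"Linux", "Unix", "Fedora"},
]

recs = see_also(clusters)
print(recs["Linux"])  # cluster-mates from both clusters
print(recs["Xfce"])   # cluster-mates from the first cluster only
```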

[Diagram: Xfce, GNOME, KDE]

[Diagram: Xfce, GNOME, KDE, GUI, Linux, GNU General Public License, Unix, Windows, Mac OS, BSD license, MIT license, Apache License]

[Diagram: Xfce, GNOME, KDE, GUI, Linux, GNU General Public License, Unix, Windows, Mac OS, BSD license, MIT license, Apache License, Fedora]

Research

•5,012 articles
•509 clusters
•evaluation
▫compared against human-created connections
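
One way to compare generated recommendations against human-created connections is to count how many of them match. The slides do not state which metric was used, so precision and recall are an assumption here, and the function name and example sets are hypothetical.

```python
def precision_recall(generated, human):
    """Compare generated recommendations with human-created connections.

    precision: share of generated recommendations that humans also linked
    recall:    share of human-created connections that were generated
    """
    generated, human = set(generated), set(human)
    hits = len(generated & human)
    precision = hits / len(generated) if generated else 0.0
    recall = hits / len(human) if human else 0.0
    return precision, recall


# Hypothetical data for a single article
generated = {"GNOME", "KDE", "Psychology", "Fedora"}
human = {"GNOME", "KDE", "Linux"}

precision, recall = precision_recall(generated, human)
print(precision, recall)
```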

Research

•tokens as vector features
•document similarity threshold: 0.5
•connections within Wikipedia treated as separate tokens with extra weight when comparing the articles
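
The setup above can be sketched as follows. The 0.5 threshold comes from the slide. Everything else is an assumption made for illustration: the cosine measure, the `LINK_WEIGHT` value, the `link:` prefix, treating a score equal to the threshold as similar, and the example articles.

```python
import math
from collections import Counter

LINK_WEIGHT = 2.0           # hypothetical extra weight for connection tokens
SIMILARITY_THRESHOLD = 0.5  # threshold stated on the slide


def vectorize(words, links, link_weight=LINK_WEIGHT):
    """Build a sparse feature vector: word counts plus weighted link tokens."""
    vector = Counter(words)
    for target in links:
        # Connections are kept apart from ordinary words via a prefix.
        vector["link:" + target] += link_weight
    return vector


def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed measure)."""
    dot = sum(weight * v[token] for token, weight in u.items() if token in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def are_similar(u, v, threshold=SIMILARITY_THRESHOLD):
    return cosine(u, v) >= threshold


# Hypothetical articles
xfce = vectorize(["desktop", "environment", "lightweight"], ["Linux", "GUI"])
kde = vectorize(["desktop", "environment", "plasma"], ["Linux", "GUI"])
avars = vectorize(["nomadic", "people", "eurasia"], ["Europe"])

print(are_similar(xfce, kde))    # shared words and shared connections
print(are_similar(xfce, avars))  # nothing in common
```

Because link tokens carry more weight than ordinary words, two articles with the same connections can pass the threshold even when their wording differs.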

Research

•clusters in three categories
▫clusters with no real value
▫partially relevant clusters
▫well-formed clusters

Clusters with no real value

•generated clusters are not usable
•subjects belong to completely different thematic areas
•clusters that contain too many articles

▫St. Peter, Saint-John Perse, General Staff of Armed Forces of the Republic of Croatia, French Guiana, Marine mammals

▫Eurasian Avars, Psychology, birds

Partially relevant clusters

•some articles within this kind of cluster are thematically related
•the remaining articles do not share the same subject or belong to the same or a similar area
▫Croatian Football Team, Parliamentary Elections, Orthography, Presidential Elections, Croatian Academy of Sciences and Arts

Well-formed clusters

•articles connected to the same subject

▫Olympic Games in Tokyo, London, Barcelona, Atlanta, Athens, Beijing, Summer Olympic Games
▫football teams
▫Airbus airplanes

Observations

•Wikipedia users more often create connections to more general and more obvious terms

Conclusion

•the procedure is not successful enough for unsupervised use on articles in the Croatian Wikipedia
•the algorithm would most likely be more successful in a strictly supervised encyclopedia