

Using Co-authorship Networks for Author Name Disambiguation

Fakhri Momeni and Philipp Mayr, GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany

Problem
• Difficulty distinguishing the publications of different authors who share the same name (the homonym problem), especially for common names

Objectives
• Clustering the publications of authors with the same name into separate groups

• Analyzing the effect of common names on the clustering algorithm

• Suggesting a solution to optimize the algorithm for common names

Constraints
• Lack of metadata for authors and publications in DBLP

Proposed method
• Using co-authorship networks to measure the similarity between publications and to cluster them

• Using community detection to optimize the results for common names
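A minimal sketch of the first step, assuming (purely for illustration) that two publications carrying the ambiguous name are linked when they share at least `threshold` co-authors besides that name, and that the clusters are the connected components of the resulting graph. The poster reports results for Threshold=1 and Threshold=3 but does not define the threshold, so this reading, the input format, and the helper names `shared_coauthors` and `cluster_publications` are hypothetical.

```python
from itertools import combinations


def shared_coauthors(authors_a, authors_b, ambiguous_name):
    """Count co-authors two publications have in common, ignoring the ambiguous name."""
    return len((set(authors_a) & set(authors_b)) - {ambiguous_name})


def cluster_publications(publications, ambiguous_name, threshold=1):
    """Cluster the publications of one ambiguous name via co-author overlap.

    publications: dict mapping a publication id to its list of author names.
    Two publications are linked when they share >= `threshold` co-authors;
    clusters are the connected components of that graph (found with union-find).
    Returns the clusters as sorted lists of publication ids.
    """
    parent = {pid: pid for pid in publications}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(publications, 2):
        if shared_coauthors(publications[a], publications[b], ambiguous_name) >= threshold:
            parent[find(b)] = find(a)  # merge the two components

    clusters = {}
    for pid in publications:
        clusters.setdefault(find(pid), []).append(pid)
    return sorted(sorted(ids) for ids in clusters.values())
```

Because connected components chain transitively, a single spurious link can merge the publication sets of two different people. This is one plausible reason why such a clustering struggles with very common names, which the community detection step is meant to address.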

Community Detection

Mean values of BCubed metrics for 1,000 names:

                  BCubed precision   BCubed recall   BCubed F
Threshold = 1     0.99               0.77            0.81
Threshold = 3     0.96               0.83            0.84

BCubed metrics for author names with more than 200 publications, before and after applying the community detection algorithm (threshold = 3):

                       BCubed precision   BCubed recall   BCubed F
Before optimization    0.46               0.87            0.45
After optimization     0.79               0.61            0.58
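The BCubed scores in both tables can be computed per name as sketched below, using the standard item-based definition: for each publication, precision is the fraction of its predicted cluster that belongs to the same true author, recall is the fraction of the true author's publications that appear in its predicted cluster, and both are averaged over all publications; F is their harmonic mean. The function name `bcubed` and the dict-based input format are assumptions for illustration.

```python
def bcubed(predicted, gold):
    """BCubed precision, recall and F for the publications of one name.

    predicted, gold: dicts mapping each publication id to a cluster label
    (the algorithm's cluster vs. the true author). Assumes a non-empty input.
    """
    items = list(gold)
    precision = recall = 0.0
    for e in items:
        same_pred = {x for x in items if predicted[x] == predicted[e]}
        same_gold = {x for x in items if gold[x] == gold[e]}
        correct = len(same_pred & same_gold)  # always >= 1: e is in both sets
        precision += correct / len(same_pred)
        recall += correct / len(same_gold)
    precision /= len(items)
    recall /= len(items)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f
```

The tables report mean values over names. The reported BCubed F is not the harmonic mean of the reported mean precision and recall (for example, 2 · 0.46 · 0.87 / (0.46 + 0.87) ≈ 0.60, not 0.45), which suggests that F was computed per name and then averaged.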

Results*

* Open testbed available at http://dx.doi.org/10.7802/1234

Building a network of authors and publications
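One way to hold such a network of authors and publications in memory is sketched below: a mapping from each author name to the publications it appears on, plus a weighted co-authorship projection in which two names gain one unit of edge weight per paper they share. The function `build_network` and its input format are hypothetical. Note that an ambiguous name is a single node here even if it stands for several people, which is exactly what disambiguation has to undo.

```python
from collections import defaultdict
from itertools import combinations


def build_network(publications):
    """Build an author-publication network and its co-authorship projection.

    publications: dict mapping a publication id to its list of author names.
    Returns (author_to_pubs, coauthor_weights):
      author_to_pubs   maps each author name to the set of its publication ids;
      coauthor_weights maps a sorted author pair to the number of joint papers.
    """
    author_to_pubs = defaultdict(set)
    coauthor_weights = defaultdict(int)
    for pid, authors in publications.items():
        names = sorted(set(authors))  # de-duplicate and fix pair order
        for name in names:
            author_to_pubs[name].add(pid)
        for a, b in combinations(names, 2):
            coauthor_weights[(a, b)] += 1  # one unit of weight per shared paper
    return dict(author_to_pubs), dict(coauthor_weights)
```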

Project funded by BMBF (Federal Ministry of Education and Research, Germany) grant number 01PQ13001.

Clustering based on co-authorship networks
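The poster does not say which community detection algorithm was used. As a stand-in, the sketch below applies label propagation to the publication graph of one common name, assuming edge weights equal to the number of shared co-authors. Densely connected groups converge to a common label, so a weak bridge between two dense groups need not merge them the way a connected-components clustering would. Ties are broken deterministically here (the original label propagation algorithm breaks them at random), and the function name `label_propagation` is hypothetical.

```python
from collections import Counter


def label_propagation(adjacency, max_iter=100):
    """Assign community labels by (deterministic) label propagation.

    adjacency: dict mapping each node to a dict {neighbour: edge weight};
    the graph is assumed to be undirected, i.e. stored symmetrically.
    Returns a dict mapping each node to its community label.
    """
    labels = {node: node for node in adjacency}  # each node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(adjacency):  # fixed update order for reproducibility
            neighbours = adjacency[node]
            if not neighbours:
                continue  # isolated publications keep their own label
            # Sum edge weights per label among the node's neighbours
            scores = Counter()
            for neighbour, weight in neighbours.items():
                scores[labels[neighbour]] += weight
            best = max(scores.values())
            candidates = [lab for lab, score in scores.items() if score == best]
            # Keep the current label on ties, otherwise take the smallest one
            new_label = labels[node] if labels[node] in candidates else min(candidates)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # no label changed in a full pass: converged
    return labels
```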