Introduction Methodology Experiments Conclusions
Multimodal graph-based analysis over the DBLP repository: critical discoveries and hypotheses
Gabriel Perri Gimenes, Hugo Gualdron, Jose F Rodrigues Jr (1)
Mario Gazziro (2)
(1) University of Sao Paulo, Av Trab Sao-carlense, 400, Sao Carlos, SP, Brazil - 13566-590, {ggimenes,gualdron,junio}@icmc.usp.br
(2) Fed. University of Santo Andre, Av dos Estados, 500, Santo Andre, SP, Brazil - 09210-580, [email protected]
This work has financial support from Fapesp (2013/10026-7)
http://www.icmc.usp.br/pessoas/junio/Site/index.htm
The 30th ACM/SIGAPP Symposium On Applied Computing, 2015 1/21
Summary
1 Introduction
2 Methodology
3 Experiments
4 Conclusions
Introduction
High demand for information about the behavior of scientists: authors, editors, funding agencies, and society
Combining analytical techniques - multimodal approach
Problem
Finding non-evident facts about DBLP is a non-trivial task
Single-technique approaches - limited analytical potential
Systematic process - can be applied to similar data from other domains
Hypothesis
The use of multiple analytical techniques, through a well-defined process, is capable of revealing important aspects of the scientific community in computer science
Materials
Cardinality of the entities extracted from the DBLP XML
Entity        Number
Authors       1,060,221
Articles      1,801,576
Events        14,654
Publications  4,262
Data migration
Semi-structured format ⇒ Relational model
Need for specific software for the migration
Definition of the entity-relationship model:
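The schema itself is not reproduced in this text. As a rough illustration of the migration step only, the sketch below loads a tiny DBLP-like XML fragment into a relational store. The two-table layout, the table and column names, and the sample records are assumptions made for the example, not the schema used in this work; the only thing taken from DBLP is that records carry a `key` attribute and `author`, `title`, and `year` child elements.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in for the DBLP XML file (the real file is far larger and
# would call for a streaming parser rather than loading it whole).
SAMPLE = """<dblp>
  <article key="journals/x/1"><author>Ann</author><author>Bob</author>
    <title>Paper A</title><year>2001</year></article>
  <inproceedings key="conf/y/2"><author>Bob</author>
    <title>Paper B</title><year>2003</year></inproceedings>
</dblp>"""

def migrate(xml_text, conn):
    # Hypothetical schema: one table for papers, one for (author, paper) pairs.
    conn.execute(
        "CREATE TABLE paper (paper_key TEXT PRIMARY KEY, title TEXT, year INTEGER)")
    conn.execute("CREATE TABLE authorship (author TEXT, paper_key TEXT)")
    for record in ET.fromstring(xml_text):
        key = record.get("key")
        conn.execute(
            "INSERT INTO paper VALUES (?, ?, ?)",
            (key, record.findtext("title"), int(record.findtext("year"))))
        for author in record.findall("author"):
            conn.execute("INSERT INTO authorship VALUES (?, ?)", (author.text, key))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(SAMPLE, conn)
paper_count = conn.execute("SELECT COUNT(*) FROM paper").fetchone()[0]
authorship_count = conn.execute("SELECT COUNT(*) FROM authorship").fetchone()[0]
```

Keeping authorship as its own table is one way to make the later graph extraction a plain self-join on the paper key.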
Extracted relationships
Relationship   Description
Co-authorship  Authors that published an article together.
Co-edition     Authors that appear as editors in the same event or journal.
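One possible way to derive such a relationship from (author, article) pairs is sketched below. The function name and the sample pairs are hypothetical; the idea is only that every unordered pair of authors sharing an article becomes one undirected edge.

```python
from collections import defaultdict
from itertools import combinations

def coauthorship_edges(authorship):
    """Build undirected co-authorship edges from (author, article_id) pairs."""
    by_article = defaultdict(set)
    for author, article in authorship:
        by_article[article].add(author)
    edges = set()
    for authors in by_article.values():
        # Sorting gives each pair a canonical order, so (a, b) and (b, a) coincide.
        for a, b in combinations(sorted(authors), 2):
            edges.add((a, b))
    return edges

# Made-up sample data, not taken from DBLP.
pairs = [("Ann", "p1"), ("Bob", "p1"), ("Cid", "p1"), ("Bob", "p2"), ("Dee", "p2")]
edges = coauthorship_edges(pairs)
```

The same function would yield co-edition edges if it were fed (editor, event-or-journal) pairs instead.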
Multimodal Analysis - WCC
Weakly-connected components distribution - Co-authorship
13% small components with up to 30 nodes
Giant component with 87% of the authors
44,000 co-authorship sub-networks - occasional researchers, industry white papers
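A component-size distribution of this kind can be computed with a union-find pass over the edge list. The sketch below is a minimal illustration on a made-up graph, not the tooling used for these experiments. Co-authorship edges are undirected, so weakly connected components are simply connected components here.

```python
from collections import Counter

def component_sizes(edges, isolated=()):
    """Return component sizes, largest first, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for node in isolated:
        find(node)  # register authors with no co-authors
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    sizes = Counter(find(x) for x in list(parent))
    return sorted(sizes.values(), reverse=True)

# Made-up toy graph: one component of 3, one of 2, one isolated author.
sizes = component_sizes([("a", "b"), ("b", "c"), ("d", "e")], isolated=["f"])
giant_share = sizes[0] / sum(sizes)  # fraction of authors in the largest component
```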
Multimodal Analysis - ACC
Node degree × average clustering coefficient - Co-authorship
High coefficient values are found in nodes with degree < 10
Coefficient value decreases as the node degree increases - ACC ∝ degree^(-1.06)
Authors tend to collaborate with the co-authors of their co-authors - triangles
Young authors vs. older authors
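The degree versus average clustering coefficient curve can be obtained by computing each node's local clustering coefficient (the fraction of its neighbor pairs that are themselves connected) and averaging per degree. A minimal sketch on a made-up graph follows; the function name is hypothetical and nodes with degree below 2 are assigned a coefficient of 0 by assumption.

```python
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(edges):
    """Map each degree k to the average local clustering coefficient of degree-k nodes."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    adj = dict(adj)
    sums, counts = defaultdict(float), defaultdict(int)
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            coeff = 0.0  # assumption: no neighbor pairs means coefficient 0
        else:
            # Count neighbor pairs that are linked, i.e. triangles through this node.
            links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
            coeff = 2.0 * links / (k * (k - 1))
        sums[k] += coeff
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}

# Made-up toy graph: a triangle a-b-c with a pendant node d attached to c.
acc = clustering_by_degree([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

In the toy graph the two degree-2 nodes sit in a closed triangle, while the degree-3 node has only one of its three neighbor pairs connected, which mirrors the decreasing trend on a very small scale.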
Multimodal Analysis - Densification
Degree distribution - Co-authorship
As new authors appear new edges also appear - e(t) ∝ n(t)^1.47 - densification
Edges appear exponentially vs. publication of elaborate articles
Master and Ph.D. as regular courses
Funding agencies - numbers
More authors per paper
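An exponent in a relation of the form e(t) ∝ n(t)^a is commonly estimated as the slope of a least-squares line in log-log space. The sketch below assumes that approach; the slides do not state how the fit was done, and the node and edge counts are synthetic values generated from the exponent itself, not DBLP measurements.

```python
import math

def power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    var = sum((a - mean_x) ** 2 for a in log_x)
    return cov / var

# Synthetic snapshots: node counts and edge counts that follow n^1.47 exactly.
node_counts = [100, 200, 400, 800]
edge_counts = [n ** 1.47 for n in node_counts]
alpha = power_law_exponent(node_counts, edge_counts)
```

An exponent above 1 means edges grow faster than nodes, which is what densification refers to.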
Multimodal Analysis - Diameter
Effective diameter evolution - Co-edition
Peaked near 1995 - beginning of a shrink period
Before that - new editors/publication vehicles vs. after that - same editors/same vehicles
Densification period: more new edges than new nodes - editorial committees rotate among the same members
Editor: experience and expertise - limitations for new researchers
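Effective diameter is usually taken as the smallest number of hops within which a given fraction (often 90%) of connected node pairs can reach each other. The sketch below is a simplified, integer-valued version using exhaustive breadth-first search; the 90% default, the lack of interpolation, and the toy graph are assumptions, and exact BFS from every node would not scale to the full network.

```python
from collections import defaultdict, deque

def effective_diameter(edges, fraction=0.9):
    """Smallest hop count covering `fraction` of all connected ordered pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    adj = dict(adj)
    pairs_at = defaultdict(int)  # distance -> number of ordered pairs at that distance
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    pairs_at[dist[v]] += 1
                    queue.append(v)
    total = sum(pairs_at.values())
    reached = 0
    for d in sorted(pairs_at):
        reached += pairs_at[d]
        if reached >= fraction * total:
            return d
    return 0

# Made-up toy graph: a path of four editors.
path = [("a", "b"), ("b", "c"), ("c", "d")]
d90 = effective_diameter(path)
d50 = effective_diameter(path, fraction=0.5)
```

To follow the evolution over time, the same function would be applied to the co-edition graph accumulated up to each year.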
Multimodal Analysis - Predictability
Predictability analysis - Co-authorship
Can we predict new interactions in the DBLP network?
Extraction of topological features → supervised learning
Figure: Results - Intervals G[1995, 2005] and G[2006, 2007]
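The slides do not list which topological features were extracted. As an illustration only, the sketch below computes four features that are common in link prediction (common neighbors, Jaccard coefficient, Adamic-Adar, preferential attachment) for a candidate author pair on a made-up training graph; in a setup like the one above, such vectors would be computed on the earlier interval and labeled by whether the edge appears in the later one.

```python
import math
from collections import defaultdict

def build_adjacency(edges):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)

def pair_features(adj, u, v):
    """Topological features for the candidate edge (u, v); feature choice is an assumption."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # A shared neighbor always has degree >= 2, so the logarithm is positive.
        "adamic_adar": sum(1.0 / math.log(len(adj[w])) for w in common),
        "pref_attachment": len(nu) * len(nv),
    }

# Made-up training graph: a and b share the co-authors c and d.
train = build_adjacency([("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("c", "e")])
features = pair_features(train, "a", "b")
```

Any supervised classifier could then be trained on these vectors; the choice of classifier is not specified here.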
Multimodal Analysis - Counting and algebraic analysis
Counting - Bipartite author-article network with timestamps
Accomplishment: number of years with at least one publication
Silence: number of consecutive years with no publications
Proposed metric
$$\text{Importance} = \frac{1}{\sqrt{\text{Silence} + 1}} \cdot \log(\text{Accomplishment})$$
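A minimal sketch of the metric follows, reading it as log(Accomplishment) divided by the square root of Silence + 1. Several details are assumptions not fixed by the slides: Silence is taken as the longest run of consecutive years without publications up to a reference year, the logarithm is natural, and the publication years are made up.

```python
import math

def accomplishment(years):
    """Number of distinct years with at least one publication."""
    return len(set(years))

def silence(years, last_year):
    """Longest run of consecutive years with no publication (assumed definition)."""
    active = sorted(set(years))
    longest = 0
    # Compare each active year with the next one; the final gap runs to last_year.
    for prev, nxt in zip(active, active[1:] + [last_year + 1]):
        longest = max(longest, nxt - prev - 1)
    return longest

def importance(years, last_year):
    return math.log(accomplishment(years)) / math.sqrt(silence(years, last_year) + 1)

# Made-up author: active in 2000, 2001, 2005, 2006; data set ends in 2010.
pub_years = [2000, 2001, 2005, 2005, 2006]
acc_years = accomplishment(pub_years)
gap = silence(pub_years, 2010)
score = importance(pub_years, 2010)
```

Under this reading, an author with a single active year scores 0 regardless of Silence, since log(1) = 0, and longer gaps lower the score of otherwise productive authors.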
Conclusions
Well-defined analytical process - combination of multiple techniques
Non-trivial extraction of information from DBLP
Multi-perspective interpretations about the past and future of the academic community in computer science
Application in the decision making process of funding agencies and academic personnel
Thanks!
Questions?