An Algorithm to Determine Peer-Reviewers


Marko A. Rodriguez and Johan Bollen
Digital Library Research and Prototyping Team

T-7, Center for Nonlinear Studies
Los Alamos National Laboratory

October 25, 2008

Peer-Review Problem Statement

• Editors are overwhelmed due to the number of submissions.

  - Provide mechanisms to decentralize the peer-review process [10].

• Editors have a difficult time locating referees who know the domain of discourse and do not have an ethical conflict with reviewing the submission.

  - Automate the referee identification problem [9].

Conference on Information and Knowledge Management (CIKM) – Napa, California – October 25, 2008

Hypothesis

• It is hypothesized that the authors of the cited articles and their coauthors are good referees.

• It is hypothesized that the authors of the submitted article and their coauthors are conflict-of-interest referees.

With respect to the article associated with this presentation:

• David Yarowsky, Radu Florian, Fabio Crestani, Tamara Sumner, etc. are considered competent referees.

• Marko A. Rodriguez, Johan Bollen, Herbert Van de Sompel, Xiaoming Liu, Michael Nelson, etc. are considered conflict-of-interest referees.


Outline

• Define the coauthorship network data structure.

• Define the particle-swarm algorithm.

• Present experimental results validating the proposed algorithm.

• Related work and conclusion.


A Scholarly Coauthorship Network

[Figure: an example coauthorship network with vertices Author-A, Author-B, Author-C, Author-D, Author-E, and Author-F.]

All edges have a single homogenous meaning of “coauthor”. If Author-A and Author-B have written an article together, then they are considered coauthors.



Our coauthorship network is defined as

$$G = (V, E, \omega),$$

where V is the set of vertices, E ⊆ (V × V), and ω : E → R+. The function rule for ω is

$$\omega(i, j) = \omega(j, i) \mapsto \sum_{\forall m \in M \text{ by } i, j} \frac{1}{\alpha(m) - 1},$$

where M is the set of all manuscripts and α : M → N+ maps each manuscript to the total number of authors for that manuscript. Thus, the more authors on an article, the less “coauthor weight” exists between them with respect to that article [5, 7, 6].



Finally, the weights of all edges outgoing from a vertex are normalized to form a probability distribution over the outgoing edge set. Thus, for a particular vertex i,

$$\omega' : E \rightarrow [0, 1]$$

such that

$$\omega'(i, j) \mapsto \frac{\omega(i, j)}{\sum_{k} \omega(i, k)},$$

where ω′(i, j) need not equal ω′(j, i).
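The weighting and normalization rules can be illustrated with a minimal Python sketch. The function names `coauthor_weights` and `normalize` and the two toy manuscripts are assumptions made for illustration; they do not come from the presentation.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_weights(manuscripts):
    """Symmetric weights: w[i][j] = sum over shared manuscripts of 1 / (alpha(m) - 1)."""
    w = defaultdict(lambda: defaultdict(float))
    for authors in manuscripts:
        authors = sorted(set(authors))
        if len(authors) < 2:
            continue  # a single-author manuscript creates no coauthor edge
        share = 1.0 / (len(authors) - 1)
        for i, j in combinations(authors, 2):
            w[i][j] += share
            w[j][i] += share
    return w


def normalize(w):
    """Outgoing probabilities: p[i][j] = w[i][j] / sum_k w[i][k]."""
    p = {}
    for i, neighbours in w.items():
        total = sum(neighbours.values())
        p[i] = {j: weight / total for j, weight in neighbours.items()}
    return p


# Hypothetical toy input: each manuscript is given as its list of authors.
manuscripts = [["A", "B"], ["A", "B", "C"]]
w = coauthor_weights(manuscripts)
p = normalize(w)
print(w["A"]["B"], p["A"]["C"], p["C"]["A"])
```

In this toy input, A and B share a two-author and a three-author manuscript, so their weight is 1 + 1/2. After normalization the edge from A to C and the edge from C to A carry different probabilities, which is the asymmetry noted above.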


A Particle Swarm Algorithm

[Figure: a particle traversing the network over timesteps t = 1, t = 2, t = 3, and t = 4, with numeric labels ranging from 0.25 to 1.0.]

A particle begins its journey at a particular vertex and will take an outgoing edge of its current vertex biased by the outgoing probability distribution defined over the outgoing edge set. Moreover, at each discrete timestep in N+ the particle decays in energy.



The set of all particles in the network is P, where p_i ∈ P is the i-th particle. The properties of an individual particle include:

1. c_i(t) ∈ V: the location of particle p_i at time t.

2. ε_i(t) ∈ R: the amount of energy contained within particle p_i at time t.

3. δ_i ∈ [0, 1]: the decay parameter governing the loss of energy as particle p_i propagates through the network. This is a globally defined parameter in our experiment, with decay set to δ_i = 0.15 for all i.

4. Particles can maintain state and have heterogeneous internal logics to perform more complex walks.



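The algorithm runs for k timesteps. At each step, the particle has its energy decayed such that

$$\varepsilon_i(t+1) = \begin{cases} (1 - \delta_i)\,\varepsilon_i(t) & \text{if } t \leq k \\ 0 & \text{otherwise.} \end{cases}$$

Finally, there exists a global rank vector e ∈ R^{|V|} that records how much energy has passed through each vertex:

$$e_{c_i(t)}(t+1) = e_{c_i(t)}(t) + \varepsilon_i(t).$$

Thus,

$$e_l(k) = \sum_{t=1}^{k} \sum_{i=1}^{|P|} \begin{cases} (1 - \delta_i)^{t-1}\,\varepsilon_i(1) & \text{if } c_i(t-1) = n_l \\ 0 & \text{otherwise.} \end{cases}$$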
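A single particle's walk can be sketched as follows. This is an illustrative reading of the update rules, assuming the particle deposits its current energy at the vertex it occupies, then decays, then moves. The function name `propagate`, its parameters, and the two-vertex network are hypothetical.

```python
import random


def propagate(p, start, energy=1.0, decay=0.15, k=4, rng=None):
    """Walk one particle for k timesteps and return the energy deposited per vertex.

    p maps each vertex to a dict of {neighbour: outgoing probability}.
    """
    rng = rng or random.Random(0)  # fixed seed so repeated runs match
    rank = {}
    location = start
    for _ in range(k):
        # Record the energy passing through the current vertex.
        rank[location] = rank.get(location, 0.0) + energy
        # Decay the particle's energy by the factor (1 - delta).
        energy *= 1.0 - decay
        neighbours = list(p.get(location, {}))
        if not neighbours:
            break  # the particle is stuck on a vertex without outgoing edges
        weights = [p[location][n] for n in neighbours]
        # Choose the next vertex according to the outgoing distribution.
        location = rng.choices(neighbours, weights=weights, k=1)[0]
    return rank


# Hypothetical two-vertex network in which every move is forced.
p = {"A": {"B": 1.0}, "B": {"A": 1.0}}
rank = propagate(p, "A", k=3)
print(rank)
```

With forced moves, the particle visits A, B, A and deposits 1.0, 0.85, and 0.85 × 0.85 in turn, so the decay schedule can be checked by hand.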


Experimental Results

[Figure: the authors of a submission (author1, author2) receive negative (-) energy particles and the authors of its references (reference1, reference2) receive positive (+) energy particles in the co-authorship network.]

Authors of the submitted article have negative energy particles provided to their corresponding vertex in the coauthorship network. The authors of the referenced articles (i.e. cited authors) are provided positive energy particles.
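The signed seeding can be sketched in Python under several stated assumptions. Each cited author receives one unit of positive energy and each submitting author one unit of negative energy (the slides do not state the amounts). Instead of sampling random walks, the sketch propagates the expected energy along every outgoing edge, which is a deterministic substitute for the stochastic particles. The names `seed_energies` and `spread` and the toy network are hypothetical.

```python
def seed_energies(submission_authors, cited_authors, unit=1.0):
    """Assumed seeding: +unit per cited author, -unit per submitting author."""
    seeds = {}
    for author in cited_authors:
        seeds[author] = seeds.get(author, 0.0) + unit
    for author in submission_authors:
        seeds[author] = seeds.get(author, 0.0) - unit
    return seeds


def spread(p, seeds, k=2, decay=0.15):
    """Expected energy deposited per vertex over k timesteps (not a sampled walk)."""
    rank = {}
    current = dict(seeds)
    for _ in range(k):
        # Deposit the energy currently sitting on each vertex.
        for vertex, energy in current.items():
            rank[vertex] = rank.get(vertex, 0.0) + energy
        # Decay the energy and split it across outgoing edges by probability.
        following = {}
        for vertex, energy in current.items():
            for neighbour, prob in p.get(vertex, {}).items():
                moved = (1.0 - decay) * energy * prob
                following[neighbour] = following.get(neighbour, 0.0) + moved
        current = following
    return rank


# Hypothetical network: the submitting author has one coauthor ("colleague"),
# and the cited author has one coauthor ("expert").
p = {
    "author1": {"colleague": 1.0},
    "colleague": {"author1": 1.0},
    "ref1": {"expert": 1.0},
    "expert": {"ref1": 1.0},
}
rank = spread(p, seed_energies(["author1"], ["ref1"]), k=2)
ranked = sorted(rank, key=rank.get, reverse=True)
print(ranked)
```

In this toy case the cited author and the cited author's coauthor end with positive energy, while the submitting author and the colleague end with negative energy, so sorting by energy places the candidate referees first and the conflict-of-interest individuals last.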



The DBLP provided the data set from which we constructed our coauthorship network, which includes 284,082 authors and 2,167,018 coauthorship edges. The 2005 ACM/IEEE Joint Conference on Digital Libraries provided its program committee's referee bid data. That is, for each of the 124 submitted manuscripts, each of the 77 program committee members stated one of the following:

1. I am an expert in the domain of the submission and want to review
2. I am an expert in the domain of the submission
3. I am not an expert in the domain of the submission
4. There exists a conflict of interest

Hypothesized relationship among the energy values of the four bid categories: [1] ≈ [2] > [3] ≈ [4] ≈ 0.


Experimental Results

[Figure: four histograms for k = 2, each plotting frequency against the log of the energy value (x-axis from -20 to 0): [1] expert wanting to review, [2] expert, [3] non-expert, [4] conflict of interest.]



[Figure: average individual energy (y-axis, 0.0 to 0.4) against k-steps of negative energy (x-axis, 0 to 7) for (1) expert wanting to review, (2) expert, (3) non-expert, and (4) conflict of interest.]



• Other types of relationships are involved in conflict of interest situations besides previous article collaborations (e.g. same institution, friendship, shared committees, etc.) [2, 8].


Related Work and Conclusion

• Latent semantic indexing to match manuscript abstract to referees [3, 11].

• Expertise identification via web mining techniques [1].

• Simply asking authors and the referees to provide keyterms describing their manuscript and area of expertise respectively [4].

• Due to the computational and human intervention costs, applications of the mentioned referee identification algorithms have been restricted to situations in which such information can be obtained for a pre-selected set of individuals, e.g. conferences and workshops.

• They have consequently failed to gain acceptance in the domain of classic journal peer-review and open commentary peer-review.


Related Work and Conclusion

• The proposed automatic referee identification algorithm requires no human intervention, is computationally efficient, and can, to some extent, automatically identify conflict of interest situations.

• The referee weighting aspect of the algorithm provides a strong incentive for its use in open commentary peer-review. The level of automation provides the necessary infrastructure to decouple the publication process from the peer-review process in the sense that editors are no longer required to assign referees.


Acknowledgements

• This research could not have been conducted if it were not for the support of the 2005 JCDL program chair and steering committee.

• Herbert Van de Sompel supported this research through data acquisition.

• We thank the Journal of Memetics (available at http://www.jom-emit.org/) for using a prototype implementation of the algorithm in their peer-review process.

• This research was financially supported by the Los Alamos National Laboratory.


References

[1] C. Basu, H. Hirsh, W. Cohen, and C. Nevill-Manning. Technical paper recommendation: A study in combining multiple information sources. Journal of Artificial Intelligence Research, 14:231–252, 2001.

[2] Johan Bollen, Marko A. Rodriguez, Herbert Van de Sompel, Luda L. Balakireva, and Aric Hagberg. The largest scholarly semantic network...ever. In ACM World Wide Web Conference, Banff, Canada, 2007. ACM Press.

[3] Susan T. Dumais and Jakob Nielsen. Automating the assignment of submitted manuscripts to reviewers. In SIGIR ’92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, pages 233–244, Copenhagen, Denmark, 1992. ACM Press.

[4] Juan J. Merelo Guervos and Pedro A. Castillo Valdivieso. Conference paper assignment using a combined greedy/evolutionary algorithm. In Proceedings of the International Conference on Parallel Problem Solving from Nature, pages 602–611, Birmingham, UK, 2004.

[5] Xiaoming Liu, Johan Bollen, Michael L. Nelson, and Herbert Van de Sompel. Co-authorship networks in the digital library research community. Information Processing and Management, 41(6):1462–1480, 2006.

[6] M. E. J. Newman. Scientific collaboration networks: I. Network construction and fundamental results. Physical Review E, 64(1):016131, 2001.

[7] M. E. J. Newman. Scientific collaboration networks: II. Shortest paths, weighted networks, and centrality. Physical Review E, 64(1):016132, 2001.

[8] Marko A. Rodriguez. Grammar-based random walkers in semantic networks. Knowledge-Based Systems, 21(7):727–739, 2008.

[9] Marko A. Rodriguez, Johan Bollen, and Herbert Van de Sompel. An algorithm to determine peer-reviewers. In Proceedings of the Conference on Information and Knowledge Management, Napa, California, October 2008. ACM Press.

[10] Marko A. Rodriguez, Johan Bollen, and Herbert Van de Sompel. The convergence of digital-libraries and the peer-review process. Journal of Information Science, 32(2):151–161, 2006.

[11] D. Yarowsky and R. Florian. Taking the load off the conference chairs: towards a digital paper-routing assistant. In Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in NLP and Very-Large Corpora, Cambridge, MA, 1999.

