PAGERANK-RELATED METHODS FOR ANALYZING CITATION NETWORKS

Authors: Ludo Waltman and Erjia Yan
Presenter: Erjia Yan

Boğaziçi University, Istanbul
ISSI, June 29

• Objectives
  – understanding PageRank
  – applications of PageRank in informetric research
  – tutorial: extracting journal citation networks from bibliographic data
  – tutorial: computing PageRank for journals in journal citation networks using Sci2 and MATLAB

Objectives | 2

NON-RECURSIVE

• journal impact factor
• h-index
• cumulative number of citations
• cumulative number of publications
• …

RECURSIVE

• PageRank and its variants
  – AuthorRank (Liu et al., 2005)
  – Y-factor (Bollen et al., 2006)
  – CiteRank (Walker et al., 2007)
  – FutureRank (Sayyadi & Getoor, 2009)
  – Eigenfactor (Bergstrom & West, 2008)
  – SCImago (SCImago, 2007)
  – weighted PageRank (Ding, 2011; Yan & Ding, 2011)
  – …

A comparison | 3

NON-RECURSIVE RECURSIVE

A comparison | 4

• Observations
  – non-recursive methods take into account only the local structure of a citation network; thus, a citation originating from Nature or Science has the same weight as a citation originating from an obscure journal
• Motivations
  – recursive methods take into account the global structure of a citation network, so that citations originating from highly cited nodes are given more weight than those originating from rarely cited nodes

Observations and motivations | 5

• Basics of PageRank
  – the concept was first proposed by Pinski and Narin in 1976 (influence weight); PageRank was introduced as a method for ranking web pages by Brin and Page in 1998
• Formulation

$$p_i = \alpha \sum_{j \in B_i} \frac{p_j}{m_j} + (1 - \alpha)\,\frac{1}{n}$$

  – where α denotes the damping factor parameter, B_i denotes the set of all web pages that link to web page i, m_j denotes the number of web pages to which web page j links, and n denotes the total number of web pages to be ranked.

Basics of PageRank | 6
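
The formulation on this slide can be tried out on a very small example. The sketch below applies the update literally, page by page: each page i collects p_j / m_j from every page j in B_i, scaled by α, plus the constant (1 − α) / n. The three-page web, the matrix name L, and the choice of 100 iterations are assumptions made for illustration and are not part of the tutorial materials.

```matlab
% Hypothetical 3-page web (an assumption for illustration):
% L(j, i) = 1 if page j links to page i.
L = [0 1 1;    % page 1 links to pages 2 and 3
     0 0 1;    % page 2 links to page 3
     1 0 0];   % page 3 links to page 1
alpha = 0.85;              % damping factor
n = size(L, 1);            % total number of pages
m = sum(L, 2);             % m(j): number of pages to which page j links
p = ones(n, 1) / n;        % start from equal PageRank values

for iter = 1:100
    p_new = zeros(n, 1);
    for i = 1:n
        B_i = find(L(:, i));   % pages that link to page i
        p_new(i) = alpha * sum(p(B_i) ./ m(B_i)) + (1 - alpha) / n;
    end
    p = p_new;
end

disp(p');   % PageRank values of pages 1 to 3
```

Every page in this example has at least one outgoing link, so the values keep summing to 1. Page 3, which is linked to by both other pages, ends up with the highest value, and page 2, which receives only half of page 1's weight, ends up with the lowest.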

• In other words…
  – the larger the number of web pages that link to web page i, the higher the PageRank value of web page i
  – the higher the PageRank values of the web pages that link to web page i, the higher the PageRank value of web page i
  – for those web pages that link to web page i, the smaller the number of other web pages to which these web pages link, the higher the PageRank value of web page i
  – the closer the damping factor parameter α is set to 1, the stronger the above effects

PageRank meanings | 7

• On the damping factor
  – 1: PageRank is not guaranteed to converge
  – just below 1 (e.g., 0.9999): extremely sensitive to small changes in the network of links
  – 0.5: according to Chen et al. (2007), 0.5 is preferred for citation networks, based on the assumption that authors on average browse as far as two degrees of references (references and the references’ cited references, thus 1 − 1/2 = 0.5)
  – 0.85: the default (coincides with the “six degrees of separation”: 1 − 1/6 ≈ 0.83, close to 0.85)

Damping factor | 8
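
The arithmetic behind the values 0.5 and 0.85 can be written out as a short worked equation. This is a sketch of one way to read the slide, not a derivation taken from it: the symbol k (the average number of degrees browsed before starting over) is introduced here for illustration, and the geometric-chain reading in the first line is an assumption.

$$
\begin{aligned}
k &= \frac{1}{1-\alpha} \quad\Longleftrightarrow\quad \alpha = 1 - \frac{1}{k} \\
k = 2 &: \quad \alpha = 1 - \tfrac{1}{2} = 0.5 \\
k = 6 &: \quad \alpha = 1 - \tfrac{1}{6} \approx 0.83 \\
\alpha = 0.85 &: \quad k = \frac{1}{0.15} \approx 6.7
\end{aligned}
$$

Under this reading, the default of 0.85 corresponds to chains of between six and seven steps, whereas 0.5 corresponds to chains of two steps.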

• Applications
  – Analyzing journal citation networks
    • Y-factor; Eigenfactor; SCImago Journal Rank (SJR)
  – Analyzing author citation networks
    • SARA (science author rank algorithm)
  – Analyzing document citation networks
    • CiteRank

Applications | 9

TUTORIALS

Tutorials | 10

• Tools we need
  – Sci2: https://sci2.cns.iu.edu/user/index.php
  – Sci2 plugins: http://wiki.cns.iu.edu/display/SCI2TUTORIAL/3.2+Additional+Plugins
  – MATLAB or Octave: http://www.gnu.org/software/octave/
• Data materials
  – http://www.pages.drexel.edu/~ey86/p/tutorial/

Tools and materials | 11

Steps 1-5 | 12

• Step 6: merge individually downloaded files
  – on Windows systems, a command such as copy *.txt merged_data.txt can be entered in the Command Prompt tool
  – in the resulting file, make sure to remove all lines ‘FN Thomson Reuters Web of Knowledge VR 1.0’ except for the first one and all lines ‘EF’ except for the last one
• Step 7: change file extension
  – change the extension of the text file that contains your bibliographic data from .txt to .isi

Steps 6-7 | 13
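
For readers who prefer to do the merging of Step 6 in MATLAB or Octave rather than by hand, the clean-up could be sketched roughly as follows. This is an illustration under stated assumptions, not part of the tutorial materials: the folder name downloads is hypothetical, and the script assumes that each downloaded file begins with header lines starting with 'FN ' and 'VR ' and ends with a line 'EF'. A file that starts with a byte-order mark would not match the header check and would still need manual clean-up.

```matlab
% Sketch only. Assumptions (not from the slides): the downloaded files sit in
% a subfolder named 'downloads'; each starts with 'FN ...' and 'VR ...' header
% lines and ends with an 'EF' line.
files = dir(fullfile('downloads', '*.txt'));
out = fopen('merged_data.txt', 'w');
for k = 1:numel(files)
    in = fopen(fullfile('downloads', files(k).name), 'r');
    txt = fgetl(in);
    while ischar(txt)          % fgetl returns -1 at the end of the file
        is_header = strncmp(txt, 'FN ', 3) || strncmp(txt, 'VR ', 3);
        is_footer = strcmp(strtrim(txt), 'EF');
        % Keep the header lines of the first file only; skip every 'EF' line.
        if ~is_footer && (k == 1 || ~is_header)
            fprintf(out, '%s\n', txt);
        end
        txt = fgetl(in);
    end
    fclose(in);
end
fprintf(out, 'EF\n');          % write a single end-of-file marker
fclose(out);
```

The output is still named merged_data.txt, so the extension change in Step 7 applies as described.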

Steps 8-9 | 14

Steps 10-12 | 15

Step 13 | 16

Steps 14-19 | 17

Step 19 | 18

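function p = calc_PageRank(C, alpha, n_iterations)
% C(i, j): number of citations from node i to node j (square matrix).
% alpha: damping factor; n_iterations: number of power-method iterations.
% p: 1-by-n row vector of PageRank values.

% Take care of dangling nodes: a node without outgoing citations
% is treated as citing every node.
m = sum(C, 2);
C(m == 0, :) = 1;

% Create a row-normalized matrix.
n = length(C);
m = sum(C, 2);
C = spdiags(1 ./ m, 0, n, n) * C;

% Apply the power method.
p = repmat(1 / n, [1 n]);
for i = 1:n_iterations
    p = alpha * p * C + (1 - alpha) / n;
end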
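
A call to this function might look like the following sketch. It assumes that the function above has been saved as calc_PageRank.m on the MATLAB or Octave path. The three-journal matrix is made up for illustration and is not taken from the tutorial data, and the variable name order is likewise hypothetical.

```matlab
% Hypothetical journal citation matrix (made up for illustration):
% C(i, j) = number of citations from journal i to journal j.
C = [0 3 1;    % journal 1 cites journal 2 three times and journal 3 once
     2 0 0;    % journal 2 cites journal 1 twice
     0 0 0];   % journal 3 cites nothing (a dangling node)

% Damping factor 0.85 and 100 power-method iterations.
p = calc_PageRank(C, 0.85, 100);

% Rank the journals from highest to lowest PageRank value.
[~, order] = sort(p, 'descend');
disp(p);
disp(order);
```

The result p is a row vector with one value per journal, and the values sum to 1. Passing 0.5 instead of 0.85 gives the setting that Chen et al. (2007) prefer for citation networks.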

Steps 20-21 | 19

The resulting PageRank scores for the journals

• Author and document citation networks, and PageRank scores for them, can be obtained in the same way by extracting the appropriate networks in Sci2

Other citation network types | 20

• Questions?

• Any further questions can be directed to:
  – Erjia Yan [email protected] or
  – Ludo Waltman [email protected]

Thank you | 21