
  • Abstract

    Identifying the influence of authors and papers is important for fostering academic research. In this paper, we build three kinds of networks (a co-author network, a paper citation network and an author citation network) and employ two measures (PageRank and local centrality) to identify influence in academic research.

    We build and analyze the co-author network of Erdös1 authors. Data extraction techniques are used to simplify network building. To analyze the properties of the network, we calculate its degree distribution, clustering coefficient and average distance, and we find that the network exhibits the small-world phenomenon. Furthermore, we employ the PageRank algorithm for undirected networks, together with centrality measures, to determine the influence of authors. From the analysis of the 511 nodes in the co-author network, we find that Frank Harary is the most influential author. To test the effectiveness of the method, we use the citations of authors as a reference.

    A paper citation network is built to identify the influence of papers. Some papers are added to the given data set in order to obtain a connected network. We introduce a “local centrality” measure to determine the influence of a paper, and give the PageRank measure as a comparison to assess its effectiveness. We conclude that “Collective dynamics of ‘small-world’ networks” is the most influential paper in our data set.

    Projecting the paper citation network into an author citation network, we use the weighted PageRank algorithm to rank the influence of authors, and we visualize the result. Erdös is found to be the most influential scientist in network science.

    To test our influence measurement algorithm, we apply it to a Facebook data set. Because the two networks are similar, the method used for the co-author network is also used for the Facebook social network. Its results agree with empirical evidence.

    Finally, a sensitivity analysis indicates that our algorithm is robust. A discussion of the power of network analysis is also included.

    Keywords: Local Centrality, PageRank Algorithm, Network Science, Centrality Measures, Social Network Analysis

    2014 Mathematical Contest in Modeling (MCM) Summary Sheet

    Team Control Number: 29080

    Problem Chosen: C

  • Team #29080 page 1 of 18

    Identifying Influential Nodes in the Network:

    PageRank and Local Centrality Measures

    1. Introduction

    Network science has proven effective in solving problems in many fields. In the field of informetrics, a large number of citation, co-citation and co-author networks have been built to determine the influence of papers and scientific authors.

    Prior studies applied measures from social network analysis (SNA) to citation and co-author networks in order to extract information for evaluating author influence. E. Otte et al. studied SNA in the co-author network and concluded that centrality can be used to find core specialists in a given field [1]. Abbasi A. et al. used measures from SNA to examine the effect of networks on the (citation-based) performance of scholars [2]. Martins M E et al. discussed centrality and the small-world phenomenon in the collaborative network [3]. The PageRank algorithm has also been used to analyze authors in co-citation networks [4].

    In addition, measures based on authors’ work, such as citations, the h-index and the g-index, have been proposed. The number of citations is a direct measure of this kind [5]. Hirsch introduced the h-index as a measure that combines the quantity of publications and the impact of publications [6]. Another widely used index for evaluation is the g-index, introduced by Egghe [7].

    Few studies use networks to study the influence of individual papers, although there are studies assessing the relative influence of journals [8, 9]. In this paper, the problems we investigate include:

    analyzing the influence of authors in the Erdös1 network,

    determining the relative influence of foundation papers in a given field, and

    applying our algorithms to a completely different set of network influence data.

    2. Assumptions

    For simplicity, we make the following assumptions in this paper:

    The error in the data is acceptable. Difficulties in author identification surely make the data less than 100% accurate, but we believe the errors will not influence our results.

    All co-authors contribute equally to their paper. That is, we ignore differences in co-authors’ contributions. This is reasonable, because it is hard to determine how much work was done by each author.

    For a given paper, all the papers it references have the same influence over it. That is, referenced papers contribute equally to the results of the paper.

    In the co-author network of Erdös1, we ignore the frequency of collaboration.

    To a certain degree, citations are a good measure of the influence of authors and papers. We use them to test our measures.

    The influence of a college or organization is determined by the influence of its authors and papers. This is intuitive, because both colleges and journals can be regarded as sets of papers or sets of scientific authors.

    3. Co-author Network of the Erdös1 Authors

    3.1 Building Co-author Network

    Based on the given data file Erdös1.htm, we build our co-author network. The process of data extraction and network building is as follows:

    Step1. Obtain the names of the 511 Erdös1 authors and assign each of them an identity (ID) number from 1 to 511.

    Step2. Build a mapping table that maps each Erdös1 author’s name to its ID number.

    Step3. For the Erdös1 author whose ID number is u, take one of its co-authors’ names and search for it in the mapping table to decide whether this co-author is an Erdös1 author. If so, obtain its ID v and build an edge between u and v; otherwise, do nothing.

    Step4. Repeat Step3 until all co-authors of u have been examined.

    Step5. Repeat Step4 for all Erdös1 authors.
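    The five steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is taken to be an already-parsed dictionary from each Erdös1 author to that author's list of co-author names (the parsing of Erdös1.htm itself is not shown), and the function name `build_coauthor_network` is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_coauthor_network(
    coauthors: Dict[str, List[str]]
) -> Tuple[Dict[str, int], Set[Tuple[int, int]]]:
    """Build an undirected, unweighted co-author network.

    `coauthors` (assumed input format) maps each Erdös1 author name to the
    names of that author's co-authors.
    """
    # Steps 1-2: assign IDs 1..n and build the name -> ID mapping table.
    name_to_id = {name: i for i, name in enumerate(sorted(coauthors), start=1)}

    edges: Set[Tuple[int, int]] = set()
    # Steps 3-5: for every author u, look up each co-author in the table.
    for name, partners in coauthors.items():
        u = name_to_id[name]
        for partner in partners:
            v = name_to_id.get(partner)
            if v is None or v == u:
                continue  # not an Erdös1 author (or a self-reference): do nothing
            edges.add((min(u, v), max(u, v)))  # store each undirected edge once
    return name_to_id, edges


# Toy data, not taken from the real file.
toy = {
    "A": ["B", "X"],  # X is not in the mapping table, so it is ignored
    "B": ["A", "C"],
    "C": ["B"],
    "D": [],          # an isolated node
}
ids, edges = build_coauthor_network(toy)
print(ids)
print(sorted(edges))
```

    Storing each edge as a sorted pair in a set means that repeated collaborations collapse to a single edge, which matches the assumption that collaboration frequency is ignored.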

    Figure 1 shows the co-author network of the Erdös1 authors as a graph. Each node i represents an author, and a link between node i and node j indicates that authors i and j have collaborated.

    Figure 1. Co-author network of the Erdös1 authors

    This network is not a connected graph, since there are some isolated nodes on the left of Figure 1. These isolated nodes are Erdös1 authors who did not collaborate with any other co-author of Erdös.

    In the following sections we can limit the size of the network by disregarding the isolated nodes, in which case the network is a connected graph. This makes it convenient to analyze the properties of the network and the influence of nodes.

    3.2 Analyzing Properties of Co-author Network

    The properties of the network we analyze are the degree distribution, the clustering coefficient and the average distance. In addition, we look for the small-world phenomenon in the Erdös1 network.

    Degree Distribution

    We consider the degree distribution of the co-author network first. Let d_u denote the degree of node u in the co-author network, which corresponds to the number of co-authors of individual u. The degree distribution of the network is shown in Figure 2, where k is the degree and p(k) is the fraction of nodes with degree equal to k.

    Figure 2. Degree distribution of co-author network

    The degree distribution is plotted on a log-log scale, and we fit it with a linear function of the form

    $$\lg p(k) = \alpha \lg k + c \tag{1}$$

    where the slope α and the intercept c are the fitted constants. Both the fitted equation and the fitted line are shown in Figure 2. The fit indicates that the degrees of the network follow a power-law distribution with a wide range of values. This feature is in line with prior studies. It suggests that a small number of authors have many collaborators, while a large number of authors have few.
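    As an illustration of the fitting step, the sketch below computes p(k) from an edge list and fits a straight line to the points (lg k, lg p(k)) by ordinary least squares. The function names are hypothetical, and one assumption is made explicit: isolated nodes (k = 0) are left out, because they cannot appear on a log-log plot.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def degree_distribution(edges: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Return p(k): the fraction of non-isolated nodes that have degree k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    return {k: c / n for k, c in counts.items()}


def fit_loglog(p: Dict[int, float]) -> Tuple[float, float]:
    """Least-squares fit of lg p(k) = slope * lg k + intercept."""
    xs = [math.log10(k) for k in p]
    ys = [math.log10(p[k]) for k in p]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy example: a star with centre 0 and four leaves.
p = degree_distribution([(0, 1), (0, 2), (0, 3), (0, 4)])
slope, intercept = fit_loglog(p)
print(p, slope, intercept)
```

    A negative fitted slope corresponds to a decaying power law; with only a handful of distinct degrees, as in the toy example, the fit is of course not meaningful and only demonstrates the mechanics.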

    Clustering coefficient

    The clustering coefficient of node u measures how densely its neighbours are connected to one another, and is obtained as follows:

    $$CC_u = \frac{E_u}{d_u (d_u - 1)/2} \tag{2}$$

    where E_u is the number of edges among the neighbours of node u, and d_u is the degree of u. In the Erdös1 network, the clustering coefficient is 0.28.
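    A minimal sketch of this computation follows: the number of edges among the neighbours of u is divided by d_u(d_u - 1)/2. Two points are assumptions rather than statements from the text: nodes with fewer than two neighbours are given a coefficient of 0, and the network-level value is taken to be the average over all nodes.

```python
from typing import Dict, Set


def clustering_coefficient(adj: Dict[int, Set[int]], u: int) -> float:
    """CC_u = E_u / (d_u (d_u - 1) / 2); taken as 0 when d_u < 2 (assumption)."""
    neighbours = adj[u]
    d = len(neighbours)
    if d < 2:
        return 0.0
    # E_u: edges with both endpoints among the neighbours of u, each counted once.
    e_u = sum(
        1 for v in neighbours for w in adj[v] if w in neighbours and v < w
    )
    return e_u / (d * (d - 1) / 2)


def average_clustering(adj: Dict[int, Set[int]]) -> float:
    """Network-level value as the mean of the node values (assumption)."""
    return sum(clustering_coefficient(adj, u) for u in adj) / len(adj)


# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(clustering_coefficient(adj, 1))
print(average_clustering(adj))
```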

    Average Distance

    The average distance is the average length of the shortest paths between all pairs of nodes. A high average distance indicates that resources, such as information, must pass through a large number of intermediaries to travel between nodes in the network [1].

    Because the network above is not connected, we restrict attention to its giant connected component, which includes 92.75% of all authors listed in Erdös1. The average distance in this component is 3.82. This is a small value given the size of the network, supporting the general notion that social networks form a “small world” [10].
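    The restriction to the giant component and the averaging of shortest-path lengths can be sketched with breadth-first search. The helper names below are hypothetical, and the average is taken over ordered pairs of distinct nodes, which gives the same value as averaging over unordered pairs in an undirected graph.

```python
from collections import deque
from typing import Dict, Set


def bfs_distances(adj: Dict[int, Set[int]], source: int) -> Dict[int, int]:
    """Shortest-path lengths from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def giant_component(adj: Dict[int, Set[int]]) -> Set[int]:
    """Node set of the largest connected component."""
    seen: Set[int] = set()
    best: Set[int] = set()
    for s in adj:
        if s in seen:
            continue
        component = set(bfs_distances(adj, s))
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def average_distance(adj: Dict[int, Set[int]]) -> float:
    """Mean shortest-path length inside the giant component."""
    component = giant_component(adj)
    if len(component) < 2:
        return 0.0
    total = sum(sum(bfs_distances(adj, s).values()) for s in component)
    return total / (len(component) * (len(component) - 1))


# Toy graph: the path 1-2-3 plus an isolated node 4.
adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}
share = len(giant_component(adj)) / len(adj)
print(share, average_distance(adj))
```

    Running one breadth-first search per node is adequate for networks of a few hundred or a few thousand nodes, such as the ones considered here.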

    Small world

    A more rigorous analysis of the small-world property follows Watts, D. and Strogatz, S. [11]. Based on the observed co-author network, a random network is built with the same number of nodes (authors) and the same average number of edges per node. By comparing the parameters of the observed and random networks, we can decide whether the observed network is a “small world”.

    The clustering coefficient and average distance of both networks (observed and random) are shown in Table 1.

    Table 1. Small-world statistics

    Variables                                        Value
    Observed data
      Authors (n)                                    511
      Average number of ties per author (k)          11.86
      Lobs: average distance (l)                     3.82
      CCobs: clustering coefficient (CC)             0.28
    Random data
      Lexp: expected average distance, ln(n)/ln(k)   3.58
      CCexp: expected clustering coefficient, k/n    0.012

    As presented in Table 1, we have Lobs ~ Lexp and CCobs ≫ CCexp. Therefore, this network does show the small-world phenomenon: authors are connected to any other author in the network through a small number of intermediaries [11].

    3.3 Co-author Influence Measure

    In this section, the PageRank algorithm and the centralities of nodes in the network are used to measure the relative influence of co-authors. The PageRank algorithm extracts structural information from the network, while the centralities measure the importance of individual nodes. Finally, the importance of the authors’ works is used to test the result.

    PageRank for Undirected Network

    PageRank is a ranking algorithm used to determine the importance of nodes in a network. It was originally designed for ranking web pages [12]. The network of web pages is a directed network consisting of billions of nodes and links, but PageRank can also be applied to undirected networks [13]. We give the PageRank algorithm for undirected networks as follows:

    Step1. Convert the undirected network to a directed one by replacing each undirected edge with two directed edges, as shown in Figure 3.

    Step2. Use the PageRank algorithm with a damping factor [13]. Given a directed network of N nodes i = 1, 2, …, N, the PageRank PR_i of the i-th node is defined by the recursion formula:

    $$PR_i(m) = d \sum_{j \in B_i} \frac{PR_j(m-1)}{k_j^{out}} + \frac{1-d}{N} \tag{3}$$

    Here, B_i is the set of nodes that point to i, k_j^{out} is the out-degree of node j, and d, called the ‘damping factor’, is a parameter that controls the performance of the PageRank algorithm. m is the iteration number. After sufficiently many iterations, PR converges and we obtain the PageRank of each node.

    Step3. Rank the nodes by their PageRank values and test the performance of the algorithm with different damping factors.
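    The three steps can be sketched as a plain power iteration. This is an illustration rather than the implementation used for the results: the function name, the default d = 0.85, the uniform starting vector and the stopping tolerance are all assumptions. Because only nodes that appear in at least one edge are included, every node has a positive out-degree after Step 1.

```python
from typing import Dict, Iterable, Set, Tuple


def pagerank_undirected(
    edges: Iterable[Tuple[int, int]],
    d: float = 0.85,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Dict[int, float]:
    # Step 1: each undirected edge {u, v} becomes the two arcs u -> v and v -> u.
    out: Dict[int, Set[int]] = {}
    for u, v in edges:
        out.setdefault(u, set()).add(v)
        out.setdefault(v, set()).add(u)
    nodes = sorted(out)
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}
    # Step 2: iterate PR_i(m) = d * sum_j PR_j(m-1) / k_j_out + (1 - d) / N.
    for _ in range(max_iter):
        new = {u: (1.0 - d) / n for u in nodes}
        for j in nodes:
            share = d * pr[j] / len(out[j])
            for i in out[j]:
                new[i] += share
        change = sum(abs(new[u] - pr[u]) for u in nodes)
        pr = new
        if change < tol:
            break
    return pr


# Toy example: a star with centre 0 and leaves 1, 2, 3.
pr = pagerank_undirected([(0, 1), (0, 2), (0, 3)])
# Step 3: rank nodes by decreasing PageRank.
ranking = sorted(pr, key=lambda u: (-pr[u], u))
print(ranking, pr)
```

    Calling the function with several values of `d` and comparing the resulting rankings reproduces the kind of test described in Step3.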

    Figure 3. Converting undirected network to directed network

    We rank the authors in the Erdös1 network with the above algorithm. The top ten

    authors who have significant influence are shown in Table 2.

    Table 2. Top ten influential authors ranked by PageRank

    Rank  Name                    Rank  Name
    1     Frank Harary*           6     Vojtech Rodl
    2     Noga M. Alon            7     Zsolt Tuza
    3     Ronald Lewis Graham     8     Carl Bernard Pomerance
    4     Bela Bollobas           9     Zoltan Furedi
    5     Vera Turan Sos          10    Joel Harold Spencer

    Frank Harary is ranked as the most influential author in our PageRank result. This is reasonable, since he specialized in graph theory and was widely recognized as one of the "fathers" of modern graph theory [14].

    Identifying the influence of authors with PageRank alone is not enough. Centrality is a property that measures how central a node is in a network.

    Centrality [15]

    The most important centrality measures are degree centrality, closeness centrality and betweenness centrality.

    The degree centrality of a node is defined as the number of ties this node has. In a co-author network, the degree centrality of a node (author) is simply the number of authors in the network with whom she has co-authored at least one article. In mathematical terms, the standardized degree centrality D_{si} of node i is defined as:

    $$D_{si} = \frac{1}{n-1} \sum_j e_{ij} \tag{4}$$

    where e_{ij} = 1 if there is a link between nodes i and j, and e_{ij} = 0 otherwise; n is the number of authors in the network.

    The closeness centrality of a node is based on the total distance of this node from all other nodes: the smaller the total distance, the more central the node. In mathematical terms, the standardized closeness centrality C_{si} of node i is defined as:

    $$C_{si} = \frac{n}{\sum_j p_{ij}} \tag{5}$$

    where p_{ij} is the length of the shortest path from node i to node j.

    The betweenness centrality of a node is the number of shortest paths between pairs of other nodes that pass through it. In mathematical terms, the standardized betweenness centrality B_{si} of node i is defined as:

    $$B_{si} = \frac{2}{(n-1)(n-2)} \sum_{j,k} \frac{g_{jik}}{g_{jk}} \tag{6}$$

    where g_{jk} is the number of shortest paths from node j to node k (j, k ≠ i), and g_{jik} is the number of those shortest paths that pass through node i.
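    The three standardized centralities can be computed together with one breadth-first search per node. The sketch below is written under several assumptions: the graph is connected and has at least three nodes; closeness is taken as n divided by the sum of shortest-path lengths, which is our reading of formula (5) (a common alternative uses n - 1); and the sum in formula (6) is taken over unordered pairs of nodes. Betweenness is accumulated with Brandes-style back-propagation, a standard technique that the text does not name.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def bfs(adj: Graph, s):
    """Return visit order, distances, shortest-path counts and predecessors."""
    dist = {s: 0}
    sigma = {s: 1.0}
    preds = {s: []}
    order = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                sigma[w] = 0.0
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, dist, sigma, preds


def centralities(adj: Graph):
    """Standardized degree, closeness and betweenness of every node."""
    n = len(adj)
    degree = {i: len(adj[i]) / (n - 1) for i in adj}
    closeness = {}
    between = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, dist, sigma, preds = bfs(adj, s)
        closeness[s] = n / sum(dist.values())
        # Back-propagate pair dependencies from the farthest nodes inwards.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                between[w] += delta[w]
    # Each unordered pair {j, k} was counted twice (once from j, once from k),
    # so 2 / ((n-1)(n-2)) * (1/2) = 1 / ((n-1)(n-2)).
    scale = 1.0 / ((n - 1) * (n - 2))
    between = {i: b * scale for i, b in between.items()}
    return degree, closeness, between


# Toy graph: the path 1-2-3, where node 2 is the only intermediary.
degree, closeness, between = centralities({1: {2}, 2: {1, 3}, 3: {2}})
print(degree, closeness, between)
```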

    Figure 4. Centralities of nodes in three-dimensional coordinates

    We calculate the above centrality measures and plot them in three-dimensional coordinates, as shown in Figure 4.

    The x, y and z axes represent degree centrality, closeness centrality and betweenness centrality, respectively. Since all three centrality measures are used to determine the influence of authors, the higher the centralities, the more influential the author. Hence, nodes plotted in the upper-right region of the three-dimensional coordinates have significant influence.

    The red points in Figure 4 are the top ten influential authors in the Erdös1 network as calculated by PageRank. The centrality result is clearly compatible with the PageRank result, which supports the effectiveness of our model.

    Empirical Support

    Neither the centrality measures nor PageRank take the importance of the authors’ work into account, although it is an essential measure. We add it in this section to obtain an overall appraisal of the influence of authors in the Erdös1 network.

    Many indexes have been studied and used to measure the academic influence of scientific authors, such as the h-index and total citations. All of these indexes measure the importance of authors’ works. To determine who in the Erdös1 network has significant influence, we take total citations as the measure of the importance of an author’s work. Both the h-index and the g-index have proved relatively effective in measuring authors’ work. However, since much of the work of the authors in the Erdös1 network was done before 1995, and the h-index in the Scopus database [16] only considers citations after 1995, it is not reasonable to employ the h-index here. In addition, the total citation count is a good indicator, because it reflects both the quantity and the quality of an author’s work.

    The method we propose is to compare the citations of our PageRank top 10 authors with the citations of 20 other authors randomly selected from the Erdös1 network. We visualize the numbers of citations in Figure 5 for comparison.

    Figure 5. Comparison of citations

    As shown in Figure 5, most of the PageRank top 10 authors have more than 2000 citations, while all randomly selected authors have fewer than 2000. This indicates that the PageRank top 10 authors have more citations on average than the randomly selected authors. For this reason, we report a set of authors who have significant influence rather than a single one.

    4. Citation Network of Papers

    4.1 Building Citation Network

    Data and Data Processing

    The set of papers we use to construct the citation network includes all papers from the attached list (NetSciFoundation.pdf) and 6 other papers in the field of network science that we collected via Google Scholar. Note that the fifth paper in NetSciFoundation.pdf, written by Borgatti, S. and titled “Identifying sets of key players in a network”, was not published in Computational and Mathematical Organization Theory in 2006 but in the proceedings of the International Conference on Integration of Knowledge Intensive Multi-Agent Systems in 2003. In fact, Borgatti has two papers with similar titles:

    Borgatti S P. Identifying sets of key players in a network[C]//Integration of

    Knowledge Intensive Multi-Agent Systems, 2003. International Conference on.

    IEEE, 2003: 127-131.

    Borgatti S P. Identifying sets of key players in a social network [J]. Computational

    & Mathematical Organization Theory, 2006, 12(1): 21-34.

    Specifically, we chose the first one. The choice has no impact on measuring the relative importance of papers. We label each paper with a number from 1 to 22. Ordered by publication date, the papers in the attached list are labeled 1~16 and the papers we added are labeled 17~22. For simplicity, in the following we use the ID number to refer to papers.

    Citation Network

    To determine the relative influence of a given set of foundation papers, we build a citation network. The edges of the citation network are found from the references of each paper. Figure 6 shows the citation network. If paper 4 cites paper 3, there is an edge pointing from node 4 to node 3.

    Figure 6. Citation network of 22 papers in the field of network science.

    4.2 “Local centrality” Measure

    An intuitive idea of the influence of a paper is that a paper is influential if it is cited by highly cited papers or is cited by a large number of papers. To determine the relative influence of the papers within the network, we introduce a model based on citations and the idea of local centrality [17].

    Chen et al. proposed local centrality to identify influential nodes in complex networks [17]. Local centrality has been shown to be effective in measuring the importance of nodes in large-scale networks. The local centrality C_L(v) of node v is defined as:

    $$Q(u) = \sum_{w \in \Gamma(u)} N(w) \tag{7}$$

    $$C_L(v) = \sum_{u \in \Gamma(v)} Q(u) \tag{8}$$

    where Γ(u) is the set of the nearest neighbors of node u and N(w) is the number of the nearest and the next nearest neighbors of node w.

    The idea of local centrality is to determine the influence of node u from the number of nodes within a certain scope of it (distance lower than 4).

    Our approach is to measure the influence of paper u by the sum of the total citations of the papers that cite paper u directly and of the papers that cite paper u indirectly through 2 direct citations. In mathematical terms, our ‘local centrality’ C'_L(u) of paper u is defined as:

    $$C'_L(u) = \sum_{v \in \Gamma(u)} C(v) + \sum_{v \in \Gamma(u)} \sum_{w \in \Gamma(v)} C(w) \tag{9}$$

    where C(u) is the number of citations of node u, and Γ(u) here denotes the set of papers that cite paper u.
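    The following sketch shows one way to evaluate this ‘local centrality’ on a citation network given as a reference list per paper. Two assumptions are made: the set Γ(u) is read as the set of papers that cite u, and, unless citation counts are supplied explicitly, C(u) is taken to be the number of citations u receives inside the network. The function name is hypothetical.

```python
from typing import Dict, Optional, Set


def local_centrality(
    references: Dict[int, Set[int]],
    citations: Optional[Dict[int, int]] = None,
) -> Dict[int, int]:
    """Score each paper by the citations of its direct and two-step citers.

    `references[p]` is the set of papers that paper p cites.
    """
    papers = set(references) | {q for refs in references.values() for q in refs}
    cited_by: Dict[int, Set[int]] = {p: set() for p in papers}
    for citing, refs in references.items():
        for cited in refs:
            cited_by[cited].add(citing)
    if citations is None:
        # Assumption: C(u) is the in-network citation count.
        citations = {p: len(cited_by[p]) for p in papers}

    score = {}
    for u in papers:
        direct = sum(citations[v] for v in cited_by[u])
        indirect = sum(citations[w] for v in cited_by[u] for w in cited_by[v])
        score[u] = direct + indirect
    return score


# Toy citation network: paper 2 cites 1; paper 3 cites 1 and 2; and so on.
references = {2: {1}, 3: {1, 2}, 4: {2}, 5: {3}}
score = local_centrality(references)
print(sorted(score.items(), key=lambda item: -item[1]))
```

    Passing externally collected citation counts through the `citations` argument changes only C(·), not the structure of the two sums.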

    Table 3. Influence rank of papers measured by ‘local centrality’

    Rank   1   2   3   4   5   6   7   8   9  10  11
    ID     5  17  18  19   1   6  10   7   8  21   3

    Rank  12  13  14  15  16  17  18  19  20  21  22
    ID    12  20   9  11  22  13   2   4  14  15  16

    Table 3 shows the influence rank of the papers measured by ‘local centrality’. Paper 5, which ranks first, is “Collective dynamics of ‘small-world’ networks”, which is widely regarded as the most influential fundamental paper in the field of network science. Paper 17 is “On the evolution of random graphs”; paper 18 is “Internet: Diameter of the world-wide web”; paper 19 is “The large-scale organization of metabolic networks”. They are all well-known papers, and much important work has followed from their publication. Papers 15 and 16 are less influential because they were published in 2007 and 2010, respectively, and have no citations in the network.

    4.3 Using PageRank Measure for Comparison

    We also rank the set of papers by PageRank, and show the result in Table 4 for comparison.

    Table 4. Influence rank of papers measured by PageRank

    Rank   1   2   3   4   5   6   7   8   9  10  11
    ID     5   1  17   3   6  18   8  19  10   7  21

    Rank  12  13  14  15  16  17  18  19  20  21  22
    ID    12  20  13   9  11  22   2   4  14  15  16

    This result is similar to the ranking by ‘local centrality’, but it is less convincing, since paper 3 ranks in the top 5 although it has fewer citations. We attribute the difference to the fact that “local centrality” considers both the structure of the network and the works that follow from a paper.

    4.4 Similar Measure for Individual Researcher

    The core idea of our measure is that the influence of paper u is determined by the quality and quantity of the papers v that cite paper u, where the quality of a paper is measured by its number of citations. This idea remains useful for measuring individuals in the citation network, because an author’s influence is transmitted through papers. In addition, co-authors share the influence of their paper.

    A weighted author citation network (WACN) can be determined as a particular projection of the paper citation network (PCN), as shown in Figure 7 [18].

    Figure 7. Projection of the PCN into a WACN [18]

    Note: (a) Paper i, written by two authors i1 and i2, is cited by two papers j and k, written by one author j1 and by two authors k1 and k2, respectively. (b) The WACN is then generated by connecting both i1 and i2 with directed links to j1, each with a weight of 1/2, and to k1 and k2, each with a weight of 1/4 [18].
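    The projection in the note can be sketched as follows. Each citation from a paper with a authors to a paper with b authors contributes a weight of 1/(a·b) to every pair of a citing author and a cited author, which reproduces the weights 1/2 and 1/4 of the example. The orientation of the links (from citing author to cited author, so that credit flows toward cited authors) is an assumption, as is the function name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def project_to_author_network(
    authors: Dict[str, List[str]],
    references: Dict[str, List[str]],
) -> Dict[Tuple[str, str], float]:
    """Project a paper citation network onto a weighted author citation network.

    `authors[p]` lists the authors of paper p; `references[p]` lists the papers
    that p cites. Returned keys are (citing author, cited author) pairs.
    """
    weights: Dict[Tuple[str, str], float] = defaultdict(float)
    for citing, refs in references.items():
        for cited in refs:
            w = 1.0 / (len(authors[citing]) * len(authors[cited]))
            for a in authors[citing]:
                for b in authors[cited]:
                    weights[(a, b)] += w  # repeated citations accumulate
    return dict(weights)


# The example of Figure 7: paper i (authors i1, i2) is cited by j (j1) and k (k1, k2).
authors = {"i": ["i1", "i2"], "j": ["j1"], "k": ["k1", "k2"]}
references = {"j": ["i"], "k": ["i"]}
weights = project_to_author_network(authors, references)
print(weights)
```

    Self-citations are not treated specially in this sketch; they would appear as links from an author to himself.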

    We then use the weighted PageRank algorithm to rank the influence of authors in the weighted author citation network. The weighted PageRank algorithm differs from the ordinary one in that edges carry different weights. The recursion formula of the weighted PageRank is a modification of formula (3):

    $$PR_i(m) = d \sum_{j \in B_i} \frac{w_{ji}}{s_j^{out}} PR_j(m-1) + \frac{1-d}{N} \tag{10}$$

    $$s_j^{out} = \sum_k w_{jk} \tag{11}$$

    Here w_{ji} is the weight of the directed connection from j to i, and s_j^{out} is the out-strength of node j. We simply replace 1/k_j^{out} in formula (3) with w_{ji}/s_j^{out}, which is the probability that a random walk moves from node j to node i.
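    A short sketch of the weighted recursion is given below: each node j passes the fraction w_ji / s_j_out of its damped score to node i. The function name, the default d = 0.85 and the stopping rule are assumptions. Nodes with zero out-strength (authors who cite no one in the network) simply pass nothing on, which is how the formula literally reads, so the scores need not sum to 1.

```python
from typing import Dict, Tuple


def weighted_pagerank(
    weights: Dict[Tuple[str, str], float],
    d: float = 0.85,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate PR_i = d * sum_j (w_ji / s_j_out) * PR_j + (1 - d) / N."""
    nodes = sorted({node for edge in weights for node in edge})
    n = len(nodes)
    # Out-strength of j: the total weight of the links leaving j.
    s_out = dict.fromkeys(nodes, 0.0)
    for (j, _i), w in weights.items():
        s_out[j] += w
    pr = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        new = dict.fromkeys(nodes, (1.0 - d) / n)
        for (j, i), w in weights.items():
            new[i] += d * pr[j] * w / s_out[j]
        change = sum(abs(new[u] - pr[u]) for u in nodes)
        pr = new
        if change < tol:
            break
    return pr


# Links point from citing author to cited author (toy weights).
weights = {
    ("j1", "i1"): 0.5, ("j1", "i2"): 0.5,
    ("k1", "i1"): 0.25, ("k1", "i2"): 0.25,
    ("k2", "i1"): 0.25, ("k2", "i2"): 0.25,
}
pr = weighted_pagerank(weights)
print(sorted(pr.items(), key=lambda item: -item[1]))
```

    In the toy network the two cited authors end up with the highest scores, because all citing authors direct their weight to them.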

    Figure 8 shows the result. The size of a node represents its PageRank value: the larger the node, the more influential the author. Authors such as Erdös and Strogatz, who did innovative benchmark work, rank very highly in our model.

    Figure 8 Influence of individual researchers in the author citation network.

    4.5 Authors and Their Papers

    We conduct a consistency check by analyzing the relationship between authors and their papers, to test our empirical expectation that influential authors have influential papers.

    Figure 9 The relationship between authors and their papers

    In Figure 9, the numbers on the left and right represent the ranks of papers and authors, respectively. Arrows point from each author to all of his papers. Red lines mean that the influence of the paper and the influence of the author are consistent, that is, an influential author has an influential paper. The number of red lines is greater than the number of black lines, which supports the effectiveness of our rankings of both papers and individual researchers.

    4.6 University, Department, and Journal

    As mentioned in the assumptions, colleges, departments and journals can be regarded as sets of papers or sets of scientific authors. The influence of a college can be calculated from the total influence of the scientists who work there and of the publications of its researchers. By building co-author networks and citation networks, the influence of a college can be evaluated easily and reasonably. The same holds for evaluating the influence of departments and journals.

    5. Implementing Our Algorithm on Facebook Dataset

    To test our influence measurement algorithm, we apply it to a different network. As social networking services (SNS) have boomed in the present century, we apply our algorithm to the ‘friends circles’ of Facebook in order to identify the most influential person in the social network.

    5.1 Building Social Network

    Our dataset, which comes from Stanford Network Analysis Project [19], consists

    of ‘circles’ (or 'friends lists') from Facebook. The dataset includes node features, circles,

    and ego networks.

    We build a social network via a process similar to that used for the co-author network. The two networks differ in the meaning of their edges: an edge in the social network means that the two corresponding persons know each other, while an edge in the co-author network means that the corresponding authors worked together on an article.

    Table 5. Social network measures

    Variables                                        Value
    Observed data
      Nodes (n)                                      4039
      Average number of ties per node (k)            21.85
      Lobs: average distance (l)                     3.69
      CCobs: clustering coefficient (CC)             0.61
    Random data
      Lexp: expected average distance, ln(n)/ln(k)   2.61
      CCexp: expected clustering coefficient, k/n    0.01

    Table 5 characterizes the social network built from our dataset. We find that Lobs ~ Lexp and CCobs ≫ CCexp. Therefore, this network also shows the small-world phenomenon, which means that any two persons in the network are ‘close’ to each other.

    5.2 Methodology and Results

    Apart from the meaning of the edges, the social network is very similar to the co-author network. For instance, both are undirected and unweighted graphs. A co-author network can even be regarded as a social network to some degree, because the authors of one article are very likely to know each other.

    Given the similarity of the two networks, we use the same method to analyze the social network. First, we employ the ‘PageRank for Undirected Network’ algorithm to calculate the PageRank of each node and obtain a rank list. Second, we calculate the centralities (degree centrality, closeness centrality and betweenness centrality) of each node. After that, we examine the top 20 nodes in our PageRank list by plotting their centralities, to check that our algorithm gives sensible results.

    Figure 10 shows the result of plotting. (The red points represent the top 20 nodes in the PageRank list.)

    5.3 Empirical Support

    Empirical analysis of social networks finds that an influential person tends to have higher betweenness centrality or closeness centrality because he/she is the ‘bridge’ and

    the center of his/her circles of friends.

    Figure 10 illustrates the distribution of the three centralities of the nodes. The top 20 nodes in our ranking list also have high betweenness centrality and closeness centrality, which is evidence for the correctness of our method.

    Figure 10. Centralities of nodes in three-dimensional coordinates
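
    The three centralities plotted in Figure 10 can be computed as in the sketch below. It is an illustration under stated assumptions, not the code behind the figure: the graph is a dictionary {node: set_of_neighbours}, closeness is taken over reachable nodes only, and betweenness is left unnormalized and obtained with Brandes' algorithm for unweighted graphs. All function names are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Degree divided by the maximum possible degree n - 1."""
    n = len(adj)
    return {u: len(adj[u]) / (n - 1) for u in adj}


def closeness_centrality(adj):
    """Number of reachable nodes divided by the sum of distances to them."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes' algorithm: number of shortest paths through each node (unnormalized)."""
    bc = {u: 0.0 for u in adj}
    for s in adj:
        order = []                       # nodes in order of non-decreasing distance
        preds = {u: [] for u in adj}     # predecessors on shortest paths from s
        sigma = {u: 0 for u in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {u: 0.0 for u in adj}
        for w in reversed(order):        # accumulate dependencies back towards s
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints.
    return {u: b / 2.0 for u, b in bc.items()}


def centrality_table(adj):
    """Map each node to its (degree, closeness, betweenness) triple."""
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    btw = betweenness_centrality(adj)
    return {u: (deg[u], clo[u], btw[u]) for u in adj}
```

    Each triple returned by centrality_table corresponds to one point in a three-dimensional plot such as Figure 10.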

    6. Sensitivity Analysis

    Figure 11. Performance with different damping factors.

    Note: The performance of the ‘PageRank for Undirected Network’ algorithm with different damping factors. The co-authorship network is used to test the robustness of the algorithm.

    We discuss to what extent the results depend on the parameters. Figure 11 displays the performance of our algorithm with different damping factors. Although the PageRank value of a given node varies with the damping factor, the rank list of nodes changes little, which supports the inherent stability of our algorithm.

    We also find that as the damping factor increases, the rank list becomes ambiguous, especially when d = 0.85 (see the green line in Figure 11) or more. In the original PageRank algorithm [12], d was chosen to be 0.15. This value was motivated by the empirical observation that an individual surfing the web typically follows on the order of 6 hyperlinks before starting a new search, which corresponds to a probability d = 1/6 ≈ 0.15 [13]. Similarly, from our observation of the network structure, we argue that the damping factor in a co-authorship network should be restricted to the range 0.15 to 0.85.
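
    A sensitivity check of this kind can be sketched in a few lines. The code below is a hedged illustration, not the experiment behind Figure 11: it assumes d is the random-jump probability, which is consistent with the 1/6 argument above, and it summarizes each run by two hypothetical indicators, the overlap of the top-k list with the d = 0.15 baseline and the spread between the largest and smallest PageRank value. A shrinking spread is one way to make an ‘ambiguous’ rank list measurable.

```python
def pagerank_undirected(adj, d, tol=1e-12, max_iter=10000):
    """Power iteration; d is assumed to be the random-jump probability."""
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    for _ in range(max_iter):
        new = {
            u: d / n + (1.0 - d) * sum(rank[v] / len(adj[v]) for v in adj[u])
            for u in adj
        }
        converged = sum(abs(new[u] - rank[u]) for u in adj) < tol
        rank = new
        if converged:
            break
    return rank


def rank_order(scores):
    """Nodes sorted by descending score; ties broken by node label."""
    return sorted(scores, key=lambda u: (-scores[u], u))


def damping_sweep(adj, dampings=(0.15, 0.30, 0.50, 0.85), k=1):
    """For each d, return (top-k overlap with the first d, max minus min score)."""
    baseline = rank_order(pagerank_undirected(adj, dampings[0]))
    report = {}
    for d in dampings:
        scores = pagerank_undirected(adj, d)
        overlap = len(set(baseline[:k]) & set(rank_order(scores)[:k])) / k
        spread = max(scores.values()) - min(scores.values())
        report[d] = (overlap, spread)
    return report


# Hypothetical toy graph: one hub tied to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
report = damping_sweep(star)
```

    On the toy graph the hub stays on top for every d while the spread shrinks as d grows, which mirrors the qualitative behaviour described above: the order is stable, but the scores move closer together.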

    7. Strengths and Weaknesses

    7.1 Strengths

    Consistency. Although our influence measurement algorithms consider only the structure of the network, they match the empirical results (e.g., the rank list in the co-authorship network matches well with the authors’ citation rank list). In addition, under small changes, such as adjusting the damping factor in our algorithm, the results of the model change little.

    Flexibility. Our methods adapt easily to problems involving different kinds of networks, such as social networks. Additionally, as the time complexity is polynomial, our algorithms are suitable for dealing with large amounts of data.

    Fewer assumptions required. By using the weighted PageRank algorithm, we can reduce the number of assumptions. To simplify data processing, we ignore the strength of cooperation and citation, but our algorithm can take them into account when sufficient data are available.

    7.2 Weaknesses

    Complicated data collection. To build the citation network, we download all the required papers and search their references manually, which takes a lot of time. This way of building a network cannot be applied to a large dataset.

    Little consideration of node features. We treat every node equally in our algorithm. However, some features of a node, such as the author’s prestige in the co-authorship network, should be considered in the algorithm.

    8. Conclusion and Discussion

    8.1 Conclusion

    We build three kinds of networks (co-author network, paper citation network and author citation network) and employ two influence measurement algorithms (PageRank, local centrality) to determine the influence of academic research. From these analyses, we propose a general methodology. Furthermore, we apply our method to social network analysis, where it performs well.

    The co-author network is undirected and unweighted. We use the ‘PageRank for Undirected Network’ algorithm to obtain the ranked list and compare it with the centrality measures. The two agree surprisingly well. To test our algorithm, we use the citation indicator as a standard, and the two rankings match each other. We also test the algorithm by varying the damping factor.

    The paper citation network is directed and unweighted. We employ the ‘local centrality’ indicator as a measure of influence and compare it with the results of the PageRank algorithm. It performs better to some extent.

    The author citation network is directed and weighted. It is formed by projecting the paper citation network onto its authors. We implement the ‘weighted PageRank algorithm’ to obtain the ranked list of authors. In addition, we build a paper-author graph to illustrate the correctness of our network and algorithm.

    Because the social network and the co-author network are similar, we apply the method used for the latter to the former and obtain good results.

    8.2 Further Discussion

    Science

    Our methods build on the work of many predecessors, and the data come from real life. As mentioned above, centrality indicators and diffusion-based processes are widely used in various kinds of complex networks and have been shown to perform well. The node influence results obtained via centrality measures and via diffusion-based algorithms such as PageRank are highly consistent. Furthermore, the results have empirical support: top-ranked papers have more citations, and top-ranked authors have more masterpieces and more citations. Therefore, our methods are scientifically sound.

    Understanding

    The main idea of our methods comes from real life: the importance of a node depends on the quality and quantity of the nodes that accept it. The relationship between nodes (papers or authors) is represented as collaboration in the co-author network and as citation in the paper citation network and the author citation network. In the real world, we use the same idea to judge the influence of individuals or works. Therefore, the idea is natural and easy to understand.

    Utility

    In many kinds of networks, such as traffic networks, business networks and social networks, evaluating the influence of nodes is an important task that helps us make better decisions. Applying our methods to the Facebook network suggests that they are broadly applicable to “small world” networks, which are common in the real world. That is to say, our methods can be widely used in many different fields.

    Application Examples

    Based on a network of commercial collaborations, companies can look for suitable and excellent partners.

    Based on a network of recommendations (workers recommend a co-worker as the leader), organizations can select appropriate leaders.

    Governments can make protection decisions by analyzing the importance of nodes within electricity networks, aviation networks, and transport networks.

    Individuals can identify important people by analyzing their social network and so build effective social relationships that promote personal development.

    9. References

    [1] Otte E, Rousseau R. Social network analysis: a powerful strategy, also for the

    information sciences [J]. Journal of Information Science, 2002, 28(6): 441-453.

    [2] Abbasi A, Altmann J, Hossain L. Identifying the effects of co-authorship networks

    on the performance of scholars: A correlation and regression analysis of performance

    measures and social network analysis measures [J]. Journal of Informetrics, 2011, 5(4):

    594-607.

    [3] Martins M E, Martins G S, Csillag J M, et al. Service's scientific community: a social

    network analysis (1995-2010) [J]. Journal of Service Management, 2012, 23(3): 455-469.

    [4] Ding Y, Yan E, Frazho A, et al. PageRank for ranking authors in co‐citation

    networks[J]. Journal of the American Society for Information Science and Technology,

    2009, 60(11): 2229-2243.

    [5] Lehmann S, Jackson A D, Lautrup B E. Measures for measures [J]. Nature, 2006,

    444(7122): 1003-1004.

    [6] Hirsch J E. An index to quantify an individual's scientific research output[J]. Proceedings of the National Academy of Sciences, 2005, 102(46): 16569.

    [7] Egghe L. Theory and practise of the g-index[J]. Scientometrics, 2006, 69(1): 131-152.

    [8] Nerur S, Sikora R, Mangalaraj G, et al. Assessing the relative influence of journals

    in a citation network[J]. Communications of the ACM, 2005, 48(11): 71-74.

    [9] Delgado E, Repiso R. The Impact of Scientific Journals of Communication:

    Comparing Google Scholar Metrics, Web of Science and Scopus[J]. Comunicar, 2013,

    21(41).

    [10] Freire V P, Figueiredo D R. Ranking in collaboration networks using a group

    based metric[J]. Journal of the Brazilian Computer Society, 2011, 17(4): 255-266.

    [11] Watts D J, Strogatz S H. Collective dynamics of ‘small-world’ networks[J]. Nature, 1998, 393(6684): 440-442.

    [12] Page L, Brin S, Motwani R, et al. The PageRank citation ranking: bringing order to

    the web [J]. 1999.

    [13] Mihalcea R. Graph-based ranking algorithms for sentence extraction, applied to

    text summarization[C]//Proceedings of the ACL 2004 on Interactive poster and

    demonstration sessions. Association for Computational Linguistics, 2004: 20.

    [14] Frank Harary, Wikipedia, http://en.wikipedia.org/wiki/Frank_Harary

    [15] Freeman L C. Centrality in social networks: I. Conceptual clarification[J]. Social Networks, 1978, 1: 215-239.

    [16] Scopus databases, http://www.scopus.com/search/form.url?display=authorLooKup&clear=t&origin=searchbasic&txGid=EB1D2BA09CA152AF9BDA11F7EFFC08B4.Vdktg6RVtMfaQJ4pNTCQ%3a2

    [17] Chen D, Lü L, Shang M S, et al. Identifying influential nodes in complex

    networks[J]. Physica A: Statistical Mechanics and its Applications, 2012, 391(4): 1777-

    1787.

    [18] Radicchi F, Fortunato S, Markines B, et al. Diffusion of scientific credits and the

    ranking of scientists[J]. Physical Review E, 2009, 80(5): 056103.

    [19] Stanford Network Analysis Project. http://snap.stanford.edu/data/egonets-Facebook.html
