Abstract
Identifying the influence of authors and papers is important for fostering academic research. In this paper, we build three kinds of networks (a co-author network, a paper citation network and an author citation network) and employ two measures (PageRank and local centrality) to identify influence in academic research.
We build and analyze the co-author network of the Erdös1 authors. Data extraction techniques are used to simplify network building. To analyze the properties of the network, we calculate its degree distribution, clustering coefficient and average distance, and we find that the network exhibits the small-world phenomenon. Furthermore, we employ the PageRank algorithm for undirected networks and centrality measures to determine the influence of authors. Analyzing the 511 nodes of the co-author network, we find that Frank Harary is the most influential author. To test the effectiveness of the method, we use the citation counts of authors as a reference.
A paper citation network is built to identify the influence of papers. Some papers are added to the given data set to obtain a connected network. We introduce a “local centrality” measure to determine the influence of papers, and use PageRank as a comparison to demonstrate its effectiveness. We conclude that “Collective dynamics of ‘small-world’ networks” is the most influential paper in our data set.
Projecting the paper citation network into an author citation network, we use a weighted PageRank algorithm to rank the influence of authors and visualize the result. Erdös emerges as the most influential scientist in network science.
To test our influence measurement algorithm, we apply it to a Facebook data set. Because the two networks are similar, the method developed for the co-author network is also used for the Facebook social network, and it performs well with empirical support.
Finally, a sensitivity analysis shows that our algorithm is robust. A discussion of the power of network analysis is also included.
Keywords: Local Centrality, PageRank Algorithm, Network science,
Centrality measures, Social Network Analysis
Team Control Number: 29080
Problem Chosen: C
2014 Mathematical Contest in Modeling (MCM) Summary Sheet
Team #29080 page 1 of 18
Identifying Influential Nodes in the Network:
PageRank and Local Centrality Measures
1. Introduction
Network science has proven effective in solving problems in many fields. In informetrics, a large number of citation, co-citation and co-author networks have been built to determine the influence of papers and scientific authors.
Prior studies have applied social network analysis (SNA) measures to citation and co-author networks to extract information for evaluating author influence. Otte and Rousseau studied SNA in co-author networks and concluded that centrality can be used to find the core specialists in a field [1]. Abbasi et al. used SNA measures to examine the effect of networks on the (citation-based) performance of scholars [2]. Martins et al. discussed centrality and the small-world phenomenon in a collaboration network [3], and the PageRank algorithm has been used to rank authors in co-citation networks [4].
In addition, measures based on authors' work, such as citation counts, the h-index and the g-index, have been proposed. Citation counts are a basic indicator of the impact of publications [5]. Hirsch introduced the h-index as a measure that combines the quantity of publications and their impact [6]. Another widely used index is the g-index, introduced by Egghe [7].
Few studies have used networks to study the influence of individual papers, although some have assessed the relative influence of journals [8, 9]. In this paper, we investigate the following problems:
analyzing the influence of authors in the Erdös1 network,
determining the relative influence of foundational papers in a given field, and
applying our algorithms to a completely different network data set.
2. Assumptions
For simplicity, we make the following assumptions in this paper:
The errors in the data are acceptable. Difficulties in author identification make the data less than 100% accurate, but we believe these errors do not materially affect our results.
All co-authors contribute equally to their paper. That is, we ignore differences in co-authors' contributions. This is reasonable because it is hard to determine how much of the work each author did.
For a given paper, all the papers it references have the same influence on it. That is, referenced papers contribute equally to the citing paper.
In the Erdös1 co-author network, we ignore the frequency of collaboration.
To a certain degree, citation counts are a good measure of the influence of authors and papers, so we use them to test our measures.
The influence of a college or other organization is determined by the influence of its authors and papers. This is intuitive, because both colleges and journals can be regarded as sets of papers or sets of scientific authors.
3. Co-author Network of the Erdös1 Authors
3.1 Building Co-author Network
Based on the given data Erdös1.htm, we build the co-author network. The process of data extraction and network building is as follows:
Step1. Obtain the names of the 511 Erdös1 authors and assign each of them an identity (ID) number from 1 to 511.
Step2. Build a mapping table that maps each Erdös1 author's name to its ID number.
Step3. For the Erdös1 author with ID number u, take one of its co-authors' names and look it up in the mapping table to decide whether this co-author is also an Erdös1 author. If so, obtain its ID v and build an edge between u and v; otherwise, do nothing.
Step4. Repeat Step3 until all co-authors of u have been checked.
Step5. Repeat Step4 for all Erdös1 authors.
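Steps 1 to 5 amount to one dictionary lookup per co-author. The Python sketch below illustrates them under the assumption that Erdös1.htm has already been parsed into a hypothetical dictionary `coauthors_of` mapping each Erdös1 author's name to the names of his or her co-authors; the parsing itself is not shown, and the function name is ours.

```python
def build_coauthor_network(coauthors_of):
    """Steps 1-5: return (id_of, edges) for the undirected co-author network.

    coauthors_of (hypothetical input): dict mapping each Erdös1 author's name
    to an iterable of that author's co-author names.
    """
    # Steps 1-2: IDs 1..n in input order; id_of doubles as the mapping table.
    id_of = {name: i for i, name in enumerate(coauthors_of, start=1)}
    edges = set()
    # Steps 3-5: for every author u and each of u's co-authors, add an edge
    # only if the co-author is also an Erdös1 author (found in the table).
    for name, coauthors in coauthors_of.items():
        u = id_of[name]
        for other in coauthors:
            v = id_of.get(other)
            if v is not None and v != u:
                edges.add((min(u, v), max(u, v)))  # each undirected edge once
    return id_of, edges


# Toy example: "X" and "Y" are not Erdös1 authors, so they create no edges.
coauthors = {"A": ["B", "X"], "B": ["A", "C"], "C": ["B"], "D": ["Y"]}
id_of, edges = build_coauthor_network(coauthors)
print(sorted(edges))  # [(1, 2), (2, 3)]; author D (ID 4) stays isolated
```

Storing each edge with the smaller ID first means that a collaboration found from both endpoints, as Step5 will do, is kept only once.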
Figure 1 shows the co-author network of the Erdös1 authors. Each node i of the graph represents an author, and a link between nodes i and j indicates that the two authors have collaborated.
Figure 1. Co-author network of the Erdös1 authors
This network is not connected, since there are some isolated nodes on the left. These isolated nodes are Erdös1 authors who did not collaborate with any other Erdös1 author.
In the following sections, we limit the size of the network by ignoring the isolated nodes, so that the network we analyze is connected. This makes it convenient to analyze the properties of the network and the influence of its nodes.
3.2 Analyzing Properties of Co-author Network
The properties of the network we analyze include the degree distribution, the clustering coefficient and the average distance. In addition, we test whether the Erdös1 network exhibits the small-world phenomenon.
Degree Distribution
We first consider the degree distribution of the co-authorship network. Let $d_u$ denote the degree of node u, which is the number of co-authors of author u. The degree distribution is shown in Figure 2, where k denotes the degree and p(k) is the fraction of nodes with degree k.
Figure 2. Degree distribution of co-author network
The degree distribution is plotted on a log-log scale, and we fit it with the linear function
$$\lg p(k) = -\gamma \lg k + c \qquad (1)$$
where γ is the power-law exponent and c is a constant. Both the fitted equation and the fitted line are shown in Figure 2. The fit indicates that the degree distribution follows a power law with a wide range of values, in line with prior literature: a small number of authors have many collaborators, while many authors have few.
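Formula (1) is an ordinary least-squares line through the points (lg k, lg p(k)). A minimal Python sketch, assuming the degrees are given as a plain list and dropping degree-0 nodes because lg 0 is undefined; the function name `fit_power_law` is hypothetical. A least-squares fit on the log-log histogram is simple, but it only gives a rough estimate of a power-law exponent, and it needs at least two distinct degrees.

```python
import math
from collections import Counter


def fit_power_law(degrees):
    """Fit lg p(k) = -gamma * lg k + c (formula (1)); return (gamma, c)."""
    n = len(degrees)
    counts = Counter(k for k in degrees if k > 0)  # lg k is undefined for k = 0
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(counts[k] / n) for k in counts]  # p(k): fraction of nodes with degree k
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return -slope, intercept


# Toy data with p(k) proportional to k^-2: 16 nodes of degree 1, 4 of degree 2, 1 of degree 4.
gamma, c = fit_power_law([1] * 16 + [2] * 4 + [4])
print(round(gamma, 6))  # 2.0
```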
Clustering coefficient
The clustering coefficient of node u measures how densely the neighbours of u are connected to one another, and is obtained as
$$CC_u = \frac{E_u}{d_u(d_u-1)/2} \qquad (2)$$
where $E_u$ is the number of edges among the neighbours of node u and $d_u$ is the degree of u. In the Erdös1 network, the clustering coefficient is 0.28.
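Formula (2) can be computed directly from an adjacency-set representation. In this Python sketch, `adj` (a hypothetical dict mapping each node to the set of its neighbours) is our assumed input, nodes with fewer than two neighbours are given CC_u = 0 (a common convention, assumed here), and the network value is taken as the mean of CC_u over all nodes, which we assume is how the value above is obtained.

```python
from itertools import combinations


def clustering(adj, u):
    """Formula (2): CC_u = E_u / (d_u (d_u - 1) / 2); 0 if d_u < 2 (assumed convention)."""
    neighbours = adj[u]
    d_u = len(neighbours)
    if d_u < 2:
        return 0.0
    # E_u: number of edges among the neighbours of u.
    e_u = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return e_u / (d_u * (d_u - 1) / 2)


def average_clustering(adj):
    """Mean of CC_u over all nodes."""
    return sum(clustering(adj, u) for u in adj) / len(adj)


# Toy graph: triangle 1-2-3 plus a pendant node 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(clustering(adj, 3))  # one edge (1-2) among three possible neighbour pairs
```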
Average Distance
The average distance is the average length of the shortest paths between all pairs of nodes. A high average distance indicates that resources, such as information, must pass through many intermediaries to travel between nodes in the network [1].
Because the network is not connected, we restrict attention to its giant connected component, which includes 92.75% of all authors listed in Erdös1. The average distance within this component is 3.82, a very small value given the size of the network, which supports the general notion that social networks form a “small world” [10].
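Because the network is unweighted, all shortest-path lengths from one node can be obtained by breadth-first search (BFS), and repeating it from every node gives the average distance. The Python sketch below uses the same assumed adjacency-set input `adj` (the function names are ours): it first finds the giant connected component and then averages the distances over all ordered pairs of distinct nodes in it.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def giant_component(adj):
    """Node set of the largest connected component."""
    seen, best = set(), set()
    for s in adj:
        if s not in seen:
            component = set(bfs_distances(adj, s))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


def average_distance(adj):
    """Return (average distance in the giant component, fraction of nodes it covers)."""
    nodes = giant_component(adj)
    total = pairs = 0
    for s in nodes:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs, len(nodes) / len(adj)


# Toy graph: path 1-2-3 plus an isolated node 4.
print(average_distance({1: {2}, 2: {1, 3}, 3: {2}, 4: set()}))  # (1.3333333333333333, 0.75)
```

One BFS per node costs O(n·m) in total, which is easily affordable for a network of a few hundred authors.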
Small world
Following Watts and Strogatz [11], we give a more rigorous small-world analysis. Based on the observed co-author network, we build a random network with the same number of nodes (authors) and the same average number of edges per node. Comparing the parameters of the observed and random networks shows whether this network is a “small world”.
The clustering coefficients and average distances of both networks (observed and random) are shown in Table 1.
Table 1. Small-world statistics
Variable                                      Symbol / formula   Value
Observed data
  Authors                                     n                  511
  Average number of ties per author           k                  11.86
  L_obs: average distance                     l                  3.82
  CC_obs: clustering coefficient              CC                 0.28
Random data
  L_exp: expected average distance            ln(n)/ln(k)        3.58
  CC_exp: expected clustering coefficient     k/n                0.012
As Table 1 shows, L_obs ~ L_exp and CC_obs ≫ CC_exp. Therefore, this network does exhibit the small-world phenomenon: authors in the network are typically connected to one another through a small number of intermediaries [11].
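The random-graph baselines in Table 1 follow from two closed-form expressions, so the small-world test reduces to two ratios. A minimal Python sketch, assuming k is the mean degree of the network; the function names are ours, and the numbers in the usage line are illustrative only, not the values of Table 1.

```python
import math


def random_graph_baseline(n, k):
    """Expected average distance ln(n)/ln(k) and clustering k/n of a random graph."""
    return math.log(n) / math.log(k), k / n


def small_world_ratios(l_obs, cc_obs, n, k):
    """Return (L_obs / L_exp, CC_obs / CC_exp).

    A small world has the first ratio close to 1 and the second much larger than 1.
    """
    l_exp, cc_exp = random_graph_baseline(n, k)
    return l_obs / l_exp, cc_obs / cc_exp


# Illustrative numbers only: n = 1000 nodes, mean degree k = 10.
print(small_world_ratios(3.0, 0.5, 1000, 10))
```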
3.3 Co-author Influence Measure
In this section, we use the PageRank algorithm and node centralities to measure the relative influence of co-authors. PageRank exploits the structural information of the whole network, while centrality measures capture the importance of individual nodes. Finally, we test the result against the importance of the authors' work.
PageRank for Undirected Network
PageRank is a ranking algorithm that determines the importance of nodes in a network. It was originally designed for ranking web pages [12]; the web is a directed network with billions of nodes and links. It can also be applied to undirected networks [13]. Our PageRank algorithm for undirected networks is as follows:
Step1. Convert the undirected network into a directed one by replacing each undirected edge with two directed edges, as shown in Figure 3.
Step2. Apply the PageRank algorithm with a damping factor [13]. Given a directed network of N nodes i = 1, 2, …, N, the PageRank $PR_i$ of the ith node is defined by the recursion
$$PR_i(m+1) = d\sum_{j\in B_i}\frac{PR_j(m)}{k_j^{out}} + \frac{1-d}{N} \qquad (3)$$
where $B_i$ is the set of nodes that point to i, $k_j^{out}$ is the out-degree of node j, m is the iteration index, and d, the “damping factor”, is the probability of following a link rather than jumping to a random node; it controls the behaviour of the algorithm. After sufficiently many iterations, PR converges and gives the PageRank of each node.
Step3. Rank the nodes by PageRank and test the performance of the algorithm with different damping factors.
Figure 3. Converting undirected network to directed network
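Steps 1 to 3 can be sketched in Python as follows. Replacing each undirected edge by two directed ones means that, for node i, the set B_i is simply its neighbour set and k_j^out is the degree of j, so formula (3) can be iterated on the adjacency sets directly. The input format, the stopping tolerance and the handling of nodes without out-links (their score is spread uniformly, a common convention that formula (3) does not specify) are our assumptions; on the connected network used here there are no such nodes, and the update is exactly formula (3).

```python
def pagerank_undirected(adj, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate formula (3) on an undirected graph given as {node: set of neighbours}.

    Each undirected edge counts as two directed edges (Step1), so B_i = adj[i]
    and k_j^out = len(adj[j]).
    """
    nodes = list(adj)
    n = len(nodes)
    pr = {i: 1.0 / n for i in nodes}
    for _ in range(max_iter):
        # Score of isolated nodes (k_out = 0) is spread uniformly (assumed convention).
        dangling = sum(pr[j] for j in nodes if not adj[j])
        new = {
            i: (1 - d) / n + d * (sum(pr[j] / len(adj[j]) for j in adj[i]) + dangling / n)
            for i in nodes
        }
        converged = sum(abs(new[i] - pr[i]) for i in nodes) < tol
        pr = new
        if converged:
            break
    return pr


# Step3: rank nodes by PageRank. Toy star graph: node 0 linked to 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
scores = pagerank_undirected(star)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0])  # 0, the hub
```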
We rank the authors in the Erdös1 network with this algorithm. The ten most influential authors are shown in Table 2.
Table 2. Top ten influential authors ranked by PageRank
Rank  Name                      Rank  Name
1     Frank Harary*             6     Vojtech Rodl
2     Noga M. Alon              7     Zsolt Tuza
3     Ronald Lewis Graham       8     Carl Bernard Pomerance
4     Bela Bollobas             9     Zoltan Furedi
5     Vera Turan Sos            10    Joel Harold Spencer
Frank Harary is ranked as the most influential author by PageRank. This is reasonable, since he specialized in graph theory and is widely recognized as one of the "fathers" of modern graph theory [14].
PageRank alone is not enough to identify the influence of authors, so we also use centrality, a property that measures how central a node is in a network.
Centrality
The most important centrality measures are degree centrality, closeness centrality and betweenness centrality [15].
The degree centrality of a node is the number of ties the node has. In a co-author network, the degree centrality of a node (author) is the number of other authors in the network with whom she has co-authored at least one article. In mathematical terms, the standardized degree centrality $D_{si}$ of node i is defined as
$$D_{si} = \frac{1}{n-1}\sum_{j} e_{ij} \qquad (4)$$
where $e_{ij} = 1$ if there is a link between nodes i and j and $e_{ij} = 0$ otherwise, and n is the number of authors in the network.
The closeness centrality of a node is based on the total distance from this node to all other nodes: the smaller this total distance, the more central the node. In mathematical terms, the standardized closeness centrality $C_{si}$ of node i is defined as
$$C_{si} = \frac{n}{\sum_{j} p_{ij}} \qquad (5)$$
where $p_{ij}$ is the length of the shortest path from node i to node j.
The betweenness centrality of a node measures how often the node lies on the shortest paths between two other nodes. In mathematical terms, the standardized betweenness centrality $B_{si}$ of node i is defined as
$$B_{si} = \frac{2}{(n-1)(n-2)}\sum_{j<k;\; j,k\neq i}\frac{g_{jik}}{g_{jk}} \qquad (6)$$
where $g_{jk}$ is the number of shortest paths from node j to node k (j, k ≠ i), and $g_{jik}$ is the number of those shortest paths that pass through node i.
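Formulas (4) to (6) can all be computed from one breadth-first search per node. The Python sketch below swaps in Brandes' accumulation technique (a standard method, not described in the source) to obtain the ratios g_jik/g_jk of formula (6) without enumerating paths. It assumes a connected, undirected graph with at least three nodes, given as adjacency sets; the function name is ours.

```python
from collections import deque


def centralities(adj):
    """Standardized degree (4), closeness (5) and betweenness (6) centrality.

    adj: dict node -> set of neighbours of a connected undirected graph.
    """
    n = len(adj)
    degree = {i: len(adj[i]) / (n - 1) for i in adj}  # formula (4)
    closeness, between = {}, {i: 0.0 for i in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        closeness[s] = n / sum(dist.values())  # formula (5): n / sum_j p_sj
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                between[w] += delta[w]
    # Every unordered pair (j, k) was counted twice (from j and from k), so the
    # factor 2 / ((n-1)(n-2)) of formula (6) becomes 1 / ((n-1)(n-2)).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    between = {i: b * scale for i, b in between.items()}
    return degree, closeness, between


# Toy graph: path 1-2-3; node 2 lies on the only shortest path between 1 and 3.
print(centralities({1: {2}, 2: {1, 3}, 3: {2}}))
```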
Figure 4. Centralities of nodes in three-dimensional coordinates
We calculate the above centrality measures and plot them in three-dimensional coordinates, as shown in Figure 4.
The x, y and z axes represent degree centrality, closeness centrality and betweenness centrality, respectively. Since higher centrality indicates a more influential author, nodes plotted toward the upper right of the coordinate space have significant influence.
The red points in Figure 4 are the ten most influential authors in the Erdös1 network according to PageRank. The centrality results are clearly consistent with the PageRank results, which indicates the effectiveness of our model.
Empirical Support
Neither the centrality measures nor PageRank take into account the importance of the authors' work, which is an essential aspect of influence. We add it in this section to obtain an overall appraisal of the influence of authors in the Erdös1 network.
Many indices have been studied and used to measure the academic influence of scientific authors, such as the h-index and total citations; all of these indices measure the importance of authors' work. To determine who in the Erdös1 network has significant influence, we take total citations as the measure of the importance of an author's work. Both the h-index and the g-index have been shown to be relatively effective in measuring authors' work. However, much of the work of the authors in the Erdös1 network was done before 1995, and the h-index in the Scopus database [16] is calculated only from citations after 1995, so it is not reasonable to use the h-index here. In addition, total citations are a good indicator because they reflect both the quantity and the quality of an author's work.
Our method is to compare the citations of the PageRank top 10 authors with the citations of 20 other authors randomly selected from the Erdös1 network. We visualize the numbers of citations in Figure 5 for comparison.
Figure 5. Comparison of citations
As shown in Figure 5, most of the PageRank top 10 authors have more than 2000 citations, while all of the randomly selected authors have fewer than 2000. This indicates that the PageRank top 10 authors have more citations on average than the randomly selected authors. For this reason, we report a set of authors with significant influence rather than a single author.
4. Citation Network of Papers
4.1 Building Citation Network
Data and Data Processing
The set of papers we use to construct the citation network includes all papers from the attached list (NetSciFoundation.pdf) and 6 other papers in the field of network science that we collected via Google Scholar. Note that the fifth paper in NetSciFoundation.pdf, written by Borgatti S and titled “Identifying sets of key players in a network”, was not published in Computational & Mathematical Organization Theory in 2006 but in the proceedings of the 2003 International Conference on Integration of Knowledge Intensive Multi-Agent Systems. In fact, Borgatti S has two papers with similar titles:
Borgatti S P. Identifying sets of key players in a network[C]//Integration of
Knowledge Intensive Multi-Agent Systems, 2003. International Conference on.
IEEE, 2003: 127-131.
Borgatti S P. Identifying sets of key players in a social network [J]. Computational
& Mathematical Organization Theory, 2006, 12(1): 21-34.
We chose the first one; this choice has no impact on measuring the relative importance of the papers. We label the papers with numbers from 1 to 22: ordered by publication date, the papers in the attached list are labelled 1 to 16, and the papers we found are labelled 17 to 22. For simplicity, in the following we refer to papers by their ID numbers.
Citation Network
To determine the relative influence of a given set of foundational papers, we build a citation network whose edges are derived from the references of each paper. Figure 6 shows the citation network: if paper 4 cites paper 3, there is an edge pointing from 4 to 3.
Figure 6. Citation network of 22 papers in the field of network science.
4.2 “Local centrality” Measure
Intuitively, a paper is influential if it is cited by highly cited papers or if it is cited by many papers. To determine the relative influence of the papers within the network, we introduce a model based on citations and the idea of local centrality [17].
Chen et al. proposed local centrality to identify influential nodes in complex networks [17], and it has been shown to be effective in measuring the importance of nodes in large-scale networks. The local centrality $C_L(v)$ of node v is defined as
$$Q(u) = \sum_{w\in\Gamma(u)} N(w) \qquad (7)$$
$$C_L(v) = \sum_{u\in\Gamma(v)} Q(u) \qquad (8)$$
where Γ(u) is the set of nearest neighbours of node u and N(w) is the number of nearest and next-nearest neighbours of node w.
The idea of local centrality is to determine the influence of a node from the number of nodes in its local neighbourhood (within a distance of at most 4).
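Formulas (7) and (8) only need neighbour sets. The Python sketch below computes them for an undirected graph given as hypothetical adjacency sets `adj`; N(w) counts the distinct nodes within two hops of w, excluding w itself. It illustrates the original measure of Chen et al. [17], not our citation-based variant.

```python
def local_centrality(adj):
    """Local centrality of Chen et al., formulas (7) and (8)."""

    def n_two_hop(w):
        # N(w): nearest and next-nearest neighbours of w, excluding w itself.
        reach = set(adj[w])
        for x in adj[w]:
            reach |= adj[x]
        reach.discard(w)
        return len(reach)

    n_of = {w: n_two_hop(w) for w in adj}
    q = {u: sum(n_of[w] for w in adj[u]) for u in adj}  # formula (7)
    return {v: sum(q[u] for u in adj[v]) for v in adj}  # formula (8)


# Toy graph: path 1-2-3-4.
print(local_centrality({1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}))  # {1: 5, 2: 8, 3: 8, 4: 5}
```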
Our approach determines the influence of paper u from the total citations of the papers that cite u directly and of the papers that cite u indirectly through a chain of two direct citations. In mathematical terms, our ‘local centrality’ of paper u, $C'_L(u)$, is defined as
$$C'_L(u) = \sum_{v\in\Gamma(u)} C(v) + \sum_{v\in\Gamma(u)}\sum_{w\in\Gamma(v)} C(w) \qquad (9)$$
where, in the citation network, Γ(u) is the set of papers that cite paper u, and C(u) is the number of citations of paper u.
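Formula (9) is a two-level sum over the papers that cite u. A minimal Python sketch, assuming the citation network is given as (citing, cited) pairs with arrows oriented as in Figure 6, that every paper appearing in an edge has a citation count, and that the counts C(u) are given as a dictionary; the function names and the toy counts are hypothetical.

```python
from collections import defaultdict


def cited_by_from_edges(edges):
    """Build Γ(u): edges are (citing, cited) pairs, i.e. arrows citing -> cited."""
    cited_by = defaultdict(set)
    for citing, cited in edges:
        cited_by[cited].add(citing)
    return cited_by


def paper_local_centrality(edges, citations):
    """Formula (9) for every paper that has a citation count C(u)."""
    cited_by = cited_by_from_edges(edges)
    scores = {}
    for u in citations:
        total = 0
        for v in cited_by.get(u, ()):
            total += citations[v]  # first term: paper v cites u directly
            total += sum(citations[w] for w in cited_by.get(v, ()))  # papers w citing v
        scores[u] = total
    return scores


# Toy network: papers 2 and 3 cite paper 1, paper 4 cites paper 2.
edges = [(2, 1), (3, 1), (4, 2)]
citations = {1: 100, 2: 50, 3: 10, 4: 5}
print(paper_local_centrality(edges, citations))  # {1: 65, 2: 5, 3: 0, 4: 0}
```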
Table 3. Influence rank of papers measured by ‘local centrality’
Rank  1   2   3   4   5   6   7   8   9   10  11
ID    5   17  18  19  1   6   10  7   8   21  3
Rank  12  13  14  15  16  17  18  19  20  21  22
ID    12  20  9   11  22  13  2   4   14  15  16
Table 3 shows the influence ranking of the papers measured by ‘local centrality’. Paper 5, which ranks first, is “Collective dynamics of ‘small-world’ networks”, widely regarded as one of the most influential foundational papers in network science. Paper 17 is “On the evolution of random graphs”, 18 is “Internet: Diameter of the world-wide web”, and 19 is “The large-scale organization of metabolic networks”. They are all widely cited, and much important work has followed from their publication. Papers 15 and 16 are less influential because they were published in 2007 and 2010, respectively, and have no citations in the network.
4.3 Using PageRank Measure for Comparison
We also rank the papers by PageRank and show the result in Table 4 for comparison.
Table 4. Influence rank of papers measured by PageRank
Rank  1   2   3   4   5   6   7   8   9   10  11
ID    5   1   17  3   6   18  8   19  10  7   21
Rank  12  13  14  15  16  17  18  19  20  21  22
ID    12  20  13  9   11  22  2   4   14  15  16
This ranking is similar to the one obtained with ‘local centrality’, but it appears less effective: paper 3 ranks fourth even though it has relatively few citations. We attribute this to the fact that ‘local centrality’ considers both the structure of the network and the citations received by the works that follow from a paper.
4.4 Similar Measure for Individual Researcher
The core idea of our measure is that the influence of paper u is determined by the quality and quantity of the papers v that cite it, where the quality of a paper is measured by its number of citations. The same idea can be used to measure individual researchers, because an author's influence is transmitted through his or her papers. In addition, co-authors share the influence of their paper.
A weighted author citation network (WACN) can easily be obtained as a particular projection of the paper citation network (PCN), as shown in Figure 7 [18].
Figure 7. Projection of the PCN into a WACN [18]
Note: (a) Paper i, written by two authors i1 and i2, is cited by two papers: j, written by one author j1, and k, written by two authors k1 and k2. (b) The WACN is then generated by connecting both i1 and i2 to j1 with directed links, each of weight 1/2, and to k1 and k2, each of weight 1/4 [18].
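The weights in Figure 7 are consistent with giving each citation a total weight of 1, shared equally among all (citing author, cited author) pairs, so that each pair receives 1/(number of authors of the citing paper × number of authors of the cited paper). The Python sketch below implements that reading, assuming papers map to author lists. Orienting each link from the citing author to the cited author, so that PageRank credit later flows toward cited authors, is our assumption, as is summing weights over repeated citations; the function name is hypothetical.

```python
from collections import defaultdict


def project_to_wacn(authors, citations):
    """Project a paper citation network (PCN) onto a weighted author citation network.

    authors: dict mapping each paper to the list of its authors.
    citations: iterable of (citing_paper, cited_paper) pairs.
    Returns a dict mapping (citing_author, cited_author) to the summed link weight.
    """
    w = defaultdict(float)
    for citing, cited in citations:
        # One citation carries total weight 1, split over all author pairs.
        share = 1.0 / (len(authors[citing]) * len(authors[cited]))
        for a in authors[citing]:
            for b in authors[cited]:
                w[(a, b)] += share  # link a -> b (direction is our assumption)
    return dict(w)


# The example of Figure 7: paper i (authors i1, i2) is cited by j (j1) and k (k1, k2).
authors = {"i": ["i1", "i2"], "j": ["j1"], "k": ["k1", "k2"]}
wacn = project_to_wacn(authors, [("j", "i"), ("k", "i")])
print(wacn[("j1", "i1")], wacn[("k1", "i2")])  # 0.5 0.25
```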
We then use a weighted PageRank algorithm to rank the influence of authors in the weighted author citation network. The weighted PageRank algorithm differs from the standard one in that its edges carry weights. Its recursion is a modification of formula (3):
$$PR_i(m+1) = d\sum_{j\in B_i}\frac{w_{ji}}{s_j^{out}}PR_j(m) + \frac{1-d}{N} \qquad (10)$$
$$s_j^{out} = \sum_{k} w_{jk} \qquad (11)$$
Here $w_{ji}$ is the weight of the directed link from j to i, and $s_j^{out}$ is the out-strength of node j. We thus replace $1/k_j^{out}$ in formula (3) by $w_{ji}/s_j^{out}$, which is the probability that a random walker at node j moves to node i.
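Formulas (10) and (11) differ from formula (3) only in the transition probability w_ji / s_j^out. A Python sketch under assumptions of our own: the input is a hypothetical dict mapping each directed link (j, i) to its weight w_ji, iteration stops at a fixed tolerance, and the score of nodes without out-links is spread uniformly, which formula (10) does not specify.

```python
def weighted_pagerank(w, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate formula (10); w maps a directed link (j, i) to its weight w_ji."""
    nodes = list(dict.fromkeys(x for link in w for x in link))
    n = len(nodes)
    s_out = dict.fromkeys(nodes, 0.0)
    incoming = {i: [] for i in nodes}
    for (j, i), w_ji in w.items():
        s_out[j] += w_ji               # formula (11): s_j^out = sum over k of w_jk
        incoming[i].append((j, w_ji))  # j belongs to B_i
    pr = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        # Nodes without out-links spread their score uniformly (assumed convention).
        dangling = sum(pr[j] for j in nodes if s_out[j] == 0)
        new = {
            i: (1 - d) / n
            + d * (sum(pr[j] * w_ji / s_out[j] for j, w_ji in incoming[i]) + dangling / n)
            for i in nodes
        }
        converged = sum(abs(new[i] - pr[i]) for i in nodes) < tol
        pr = new
        if converged:
            break
    return pr


# Links from citing to cited authors, weighted as in Figure 7.
links = {("j1", "i1"): 0.5, ("j1", "i2"): 0.5,
         ("k1", "i1"): 0.25, ("k1", "i2"): 0.25,
         ("k2", "i1"): 0.25, ("k2", "i2"): 0.25}
scores = weighted_pagerank(links)
print(scores["i1"] > scores["j1"])  # True: the cited authors rank higher
```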
Figure 8 shows the result. The size of a node represents its PageRank value: the larger the node, the more influential the author. Authors such as Erdös and Strogatz, who did pioneering benchmark work, rank prominently in our model.
Figure 8. Influence of individual researchers in the author citation network.
4.5 Authors and Their Papers
We conduct a consistency check by analyzing the relationship between authors and their papers, to test our expectation that influential authors have influential papers.
Figure 9. The relationship between authors and their papers
In Figure 9, the numbers on the left and right are the ranks of papers and authors, respectively, and arrows point from each author to all of his papers. Red lines indicate that the influence of the paper and that of its author are consistent, i.e. an influential author has an influential paper. There are more red lines than black lines, which indicates that our rankings of both papers and individual researchers are effective and persuasive.
4.6 University, Department, and Journal
As mentioned in the assumptions, colleges, departments and journals can be regarded as sets of papers or sets of scientific authors. The influence of a college can be calculated from the total influence of the scientists who work there and of its researchers' publications. By building co-author networks and citation networks, the influence of a college can be evaluated easily and reasonably. The same holds for departments and journals.
5. Implementing Our Algorithm on Facebook Dataset
To test our influence measurement algorithm, we apply it to a different network. As social networking services (SNS) have boomed in this century, we apply our algorithm to the ‘friend circles’ of Facebook to identify the most influential people in the social network.
5.1 Building Social Network
Our dataset, from the Stanford Network Analysis Project [19], consists of ‘circles’ (or ‘friends lists’) from Facebook. It includes node features, circles and ego networks.
We build the social network with a process similar to the one used for the co-author network. The two networks differ only in the meaning of an edge: an edge in the social network means that the two persons know each other, while an edge in the co-author network means that the two authors have worked together on an article.
Table 5. Social network measures
Variable                                      Symbol / formula   Value
Observed data
  Nodes                                       n                  4039
  Average number of ties per node             k                  21.85
  L_obs: average distance                     l                  3.69
  CC_obs: clustering coefficient              CC                 0.61
Random data
  L_exp: expected average distance            ln(n)/ln(k)        2.61
  CC_exp: expected clustering coefficient     k/n                0.01
Table 5 characterizes the social network built from our dataset. We find that L_obs ~ L_exp and CC_obs ≫ CC_exp. Therefore, this network also exhibits the small-world phenomenon: two persons in the network are typically ‘close’ to each other.
5.2 Methodology and Results
Apart from the meaning of edges, the social network is very similar to the co-author network; for instance, both are undirected and unweighted. The co-author network can even be regarded as a social network to some degree, because the authors of the same article are very likely to know each other.
Given this similarity, we analyze the social network with the same method. First, we use the ‘PageRank for Undirected Network’ algorithm to calculate the PageRank of each node and obtain a ranked list. Second, we calculate the degree, closeness and betweenness centralities of each node. Finally, we plot the centralities of the top 20 nodes in our PageRank list to check that our algorithm gives reasonable results. Figure 10 shows the result (the red points represent the top 20 nodes in the PageRank list).
5.3 Empirical Support
Empirical analyses of social networks find that an influential person tends to have higher betweenness or closeness centrality, because he or she is the ‘bridge’ and the center of his or her circles of friends.
Figure 10 shows the distribution of the three centralities of the nodes. The top 20 nodes in our ranking list also have higher betweenness and closeness centrality, which is evidence supporting our method.
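The visual check in Figure 10 can also be summarized numerically, for example by comparing the mean centrality of the top 20 PageRank nodes with that of all other nodes. This Python sketch is our own illustration, not part of the original analysis; it assumes the PageRank and centrality scores are already available as dictionaries keyed by node, and the function name is hypothetical.

```python
def top_k_vs_rest(pagerank, centrality, k=20):
    """Mean centrality of the k highest-PageRank nodes and of the remaining nodes."""
    ranked = sorted(pagerank, key=pagerank.get, reverse=True)
    top, rest = ranked[:k], ranked[k:]

    def mean(group):
        return sum(centrality[x] for x in group) / len(group) if group else 0.0

    return mean(top), mean(rest)


# Toy scores for three nodes, keeping only the top 1.
pr = {"a": 0.5, "b": 0.3, "c": 0.2}
betweenness = {"a": 0.9, "b": 0.1, "c": 0.2}
print(top_k_vs_rest(pr, betweenness, k=1))
```

A first value clearly above the second, for both betweenness and closeness, would match the pattern seen in Figure 10.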
Figure 10. Centralities of nodes in three-dimensional coordinates
6. Sensitivity Analysis
Figure 11. Performance with different damping factors.
Note: The performance of the ‘PageRank for undirected network’ algorithm with different damping factors. The co-authorship network is used to test the robustness of the algorithm.
We now examine to what extent the results depend on the parameters. Figure 11 displays the performance of our algorithm with different damping factors. Although the PageRank value of a given node changes with the damping factor, the ranking of the nodes changes little, which indicates that our algorithm is stable.
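The stability claim can be quantified by comparing the rankings obtained with different damping factors, for instance with Kendall's rank correlation τ (1 means an identical order). The Python sketch below is our own illustration, not the procedure behind Figure 11: it re-implements formula (3) with a fixed number of iterations for a graph without isolated nodes, and the reference value d = 0.5 and the tie tolerance are arbitrary assumptions.

```python
from itertools import combinations


def pagerank(adj, d, iters=200):
    """Formula (3) on an undirected graph without isolated nodes (fixed iteration count)."""
    n = len(adj)
    pr = {i: 1.0 / n for i in adj}
    for _ in range(iters):
        pr = {i: (1 - d) / n + d * sum(pr[j] / len(adj[j]) for j in adj[i]) for i in adj}
    return pr


def kendall_tau(a, b, eps=1e-12):
    """Kendall's tau-a between two score dicts over the same nodes; near-ties count as neither."""
    concordant = discordant = 0
    for x, y in combinations(list(a), 2):
        da, db = a[x] - a[y], b[x] - b[y]
        if abs(da) < eps or abs(db) < eps:
            continue
        if (da > 0) == (db > 0):
            concordant += 1
        else:
            discordant += 1
    pairs = len(a) * (len(a) - 1) / 2
    return (concordant - discordant) / pairs


def rank_stability(adj, damping_factors, reference=0.5):
    """Kendall's tau between the reference ranking and the ranking for each d."""
    base = pagerank(adj, reference)
    return {d: kendall_tau(base, pagerank(adj, d)) for d in damping_factors}


# Toy graph: hub 0 with leaves 1 and 2, plus a short branch 0-3-4.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}}
print(rank_stability(adj, [0.15, 0.5, 0.85]))
```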
We also find that as the damping factor increases, the ranking becomes less distinct, especially when d = 0.85 (the green line in Figure 11) or more. In the original PageRank algorithm [12], the probability of a random jump, 1 − d in formula (3), was set to 0.15, i.e. d = 0.85. This value comes from the empirical observation that a web surfer typically follows on the order of 6 hyperlinks before starting a new search, corresponding to a jump probability of about 1/6 ≈ 0.15 [13]. Similarly, based on the structure of the co-authorship network, we argue that its damping factor should be limited to the range 0.15 to 0.85.
7. Strengths and Weaknesses
7.1 Strengths
Consistency. Although our influence measurement algorithms only consider the structure of the network, they match the empirical results (e.g., the ranking in the co-authorship network matches the authors' citation ranking well). In addition, under small changes, such as adjusting the damping factor, the results of the model change little.
Flexibility. Our methods easily adapt to problems involving different kinds of networks, such as social networks. Additionally, because the algorithms run in polynomial time, they are suitable for large amounts of data.
Fewer assumptions required. The weighted PageRank algorithm lets us reduce the number of assumptions. To simplify data processing, we ignore the strength of collaboration and citation, but the algorithm can take them into account if enough data are available.
7.2 Weaknesses
Complicated data collection. To build the citation network, we downloaded all the required papers and searched their references manually, which took a lot of time. This way of building a network cannot be applied to a large dataset.
Little consideration of node features. Our algorithm treats every node equally. However, some node features, such as an author's prestige in the co-authorship network, should be considered in the algorithm.
8. Conclusion and Discussion
8.1 Conclusion
We build three kinds of networks (a co-author network, a paper citation network and an author citation network) and employ two influence measurement algorithms (PageRank and local centrality) to determine the influence of academic research. Based on these analyses, we propose a general methodology. Furthermore, we apply our method to social network analysis, where it performs well.
The co-author network is undirected and unweighted. We use the ‘PageRank for Undirected Network’ algorithm to obtain a ranked list and compare it with the centrality measures; the two agree well. To test our algorithm, we use citation counts as a benchmark, and the results match. We also test the algorithm by changing the damping factor.
The paper citation network is directed and unweighted. We employ the ‘local centrality’ indicator as a measure of influence and compare it with the results of the PageRank algorithm; it performs better to some extent.
The author citation network is directed and weighted. It is obtained from the paper citation network by a particular projection. We implement the ‘weighted PageRank algorithm’ to obtain a ranked list of authors. In addition, we build a paper-author graph to check the consistency of our networks and algorithms.
Because the social network is similar to the co-author network, we apply the method developed for the latter to the former and obtain good results.
8.2 Further Discussion
Science
Our methods build on the work of many predecessors, and the data come from real life. As mentioned above, centrality indicators and diffusion-based processes are widely used in many kinds of complex networks and have been shown to perform well. In our networks, the node influence given by centrality measures and by diffusion-based algorithms such as PageRank is highly consistent. Furthermore, the results have empirical support: top-ranked papers have more citations, and top-ranked authors have more influential works and more citations. Therefore, we believe our methods are scientifically sound.
Understanding
The main idea of our methods comes from everyday experience: the importance of a node
depends on both the quality and the quantity of the nodes that recognize it. The
relationship between nodes (papers or authors) is represented as collaboration in the
co-author network and as citation in the paper citation network and the author citation
network. In the real world, we judge the influence of individuals or works in the same
way. Therefore, the idea is intuitive and easy to understand.
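This idea is what the PageRank recursion of [12] formalizes. In its standard form, with N nodes, damping factor d, Γ_in(v) the set of nodes pointing to v and k_out(u) the out-degree of u (symbols introduced here for illustration, which may differ from the notation used earlier in the paper):

```latex
PR(v) = \frac{1-d}{N} + d \sum_{u \in \Gamma_{\mathrm{in}}(v)} \frac{PR(u)}{k_{\mathrm{out}}(u)}
```

The number of terms in the sum reflects the quantity of nodes recognizing v, and each PR(u) reflects their quality; dividing by k_out(u) prevents a node that recognizes many others from passing full credit to each of them. In an undirected network, Γ_in(v) is simply the set of neighbours of v and k_out(u) is the degree of u.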
Utility
In many kinds of networks, such as traffic networks, business networks and social
networks, evaluating the influence of nodes is essential for making better decisions.
Applying our methods to the Facebook network suggests that they carry over to other
“small world” networks, which are common in the real world. That is to say, our methods
can be widely used in many different fields.
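Whether a network is "small world" can be checked with two statistics already used in the co-author analysis: the average clustering coefficient C and the average shortest-path length L. Following Watts and Strogatz [11], a network is small-world when C is much larger than that of a comparable random graph while L stays close to the random-graph value. The sketch below assumes the usual Erdős–Rényi approximations C_rand ≈ ⟨k⟩/n and L_rand ≈ ln n / ln ⟨k⟩ as the baseline; this choice, the function names and the ring-lattice toy input are assumptions, not necessarily the paper's procedure.

```python
import math
from collections import deque


def avg_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Each edge among the neighbours is counted once from each endpoint.
        links = sum(len(adj[u] & nbrs) for u in nbrs) / 2
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def avg_path_length(adj):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def random_baseline(adj):
    """(C_rand, L_rand) approximations for a random graph with the same n and <k>."""
    n = len(adj)
    k = sum(len(nbrs) for nbrs in adj.values()) / n
    return k / n, math.log(n) / math.log(k)


if __name__ == "__main__":
    # Hypothetical ring lattice: 10 nodes, each linked to 2 neighbours per side.
    n = 10
    adj = {i: {(i + j) % n for j in (-2, -1, 1, 2)} for i in range(n)}
    c, path_len = avg_clustering(adj), avg_path_length(adj)
    c_rand, l_rand = random_baseline(adj)
    print(f"C={c:.3f} (random {c_rand:.3f}), L={path_len:.3f} (random {l_rand:.3f})")
```

The adjacency values must be sets, because the clustering step intersects neighbour sets. BFS from every node costs O(n·m), which is fine for networks the size of the co-author graph.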
Application Examples
Based on the network of commercial collaborations, companies can look for
suitable and excellent partners.
Based on the network of recommendations (workers recommend co-workers as leaders),
organizations can select appropriate leaders.
Governments can decide which nodes to protect by analyzing the importance of nodes
within electricity networks, aviation networks, and transport networks.
Individuals can build effective social relationships that promote personal development
by analyzing their social networks to identify important people.
9. References
[1] Otte E, Rousseau R. Social network analysis: a powerful strategy, also for the
information sciences[J]. Journal of Information Science, 2002, 28(6): 441-453.
[2] Abbasi A, Altmann J, Hossain L. Identifying the effects of co-authorship networks
on the performance of scholars: A correlation and regression analysis of performance
measures and social network analysis measures[J]. Journal of Informetrics, 2011, 5(4):
594-607.
[3] Martins M E, Martins G S, Csillag J M, et al. Service's scientific community: a social
network analysis (1995-2010)[J]. Journal of Service Management, 2012, 23(3): 455-469.
[4] Ding Y, Yan E, Frazho A, et al. PageRank for ranking authors in co-citation
networks[J]. Journal of the American Society for Information Science and Technology,
2009, 60(11): 2229-2243.
[5] Lehmann S, Jackson A D, Lautrup B E. Measures for measures[J]. Nature, 2006,
444(7122): 1003-1004.
[6] Hirsch J E. An index to quantify an individual's scientific research output[J].
Proceedings of the National Academy of Sciences, 2005, 102(46): 16569.
[7] Egghe L. Theory and practise of the g-index[J]. Scientometrics, 2006, 69(1): 131-152.
[8] Nerur S, Sikora R, Mangalaraj G, et al. Assessing the relative influence of journals
in a citation network[J]. Communications of the ACM, 2005, 48(11): 71-74.
[9] Delgado E, Repiso R. The Impact of Scientific Journals of Communication:
Comparing Google Scholar Metrics, Web of Science and Scopus[J]. Comunicar, 2013,
21(41).
[10] Freire V P, Figueiredo D R. Ranking in collaboration networks using a group
based metric[J]. Journal of the Brazilian Computer Society, 2011, 17(4): 255-266.
[11] Watts D J, Strogatz S H. Collective dynamics of ‘small-world’ networks[J]. Nature,
1998, 393(6684): 440-442.
[12] Page L, Brin S, Motwani R, et al. The PageRank citation ranking: bringing order to
the web[J]. 1999.
[13] Mihalcea R. Graph-based ranking algorithms for sentence extraction, applied to
text summarization[C]//Proceedings of the ACL 2004 on Interactive poster and
demonstration sessions. Association for Computational Linguistics, 2004: 20.
[14] Frank Harary, Wikipedia, http://en.wikipedia.org/wiki/Frank_Harary
[15] Freeman L C. Centrality in social networks: I. Conceptual clarification[J]. Social
Networks, 1978, 1: 215-239.
[16] Scopus databases, http://www.scopus.com/search/form.url?display=authorLookup&clear=t&origin=searchbasic&txGid=EB1D2BA09CA152AF9BDA11F7EFFC08B4.Vdktg6RVtMfaQJ4pNTCQ%3a2
[17] Chen D, Lü L, Shang M S, et al. Identifying influential nodes in complex
networks[J]. Physica A: Statistical Mechanics and its Applications, 2012, 391(4):
1777-1787.
[18] Radicchi F, Fortunato S, Markines B, et al. Diffusion of scientific credits and the
ranking of scientists[J]. Physical Review E, 2009, 80(5): 056103.
[19] Stanford Network Analysis Project. http://snap.stanford.edu/data/egonets-Facebook.html