Informetric methods seminar
Tutorial 2: Using Matlab for network construction, ranking, clustering, topic modeling, and path finding
Erjia Yan
The paper-to-paper citation network is the base
Web of Science format
Web of Science cited references format: First Author, Year of Publication, Abbreviated Journal Name, Volume Number, Beginning Page Number
Example: AANESTAD M, 2011, J STRATEGIC INF SYST, V20, P161
All fields can be found in the “full record + cited references” download option
Citation matching
Some of the newer records also include a DOI. For a better match, remove the DOI from the cited references.
For citing papers, extract the same fields (first author, year, abbreviated journal name, volume, beginning page) and format them into the Web of Science cited reference format.
Now the citing papers and the cited references have the same format.
Using these two fields, construct an internal citation network that contains only those cited references that match citing papers in the data set.
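The formatting and DOI-removal steps could look roughly like this in Matlab. This is a minimal sketch, not a full parser: the field values are copied from the example reference, the variable names are illustrative, and the DOI is a made-up placeholder.

```matlab
% Fields of one citing paper (values taken from the example reference).
firstAuthor = 'Aanestad M';
pubYear     = 2011;
journalAbbr = 'J Strategic Inf Syst';
volume      = 20;
beginPage   = 161;

% Build the Web of Science cited reference string for the citing paper.
citingKey = upper(sprintf('%s, %d, %s, V%d, P%d', ...
    firstAuthor, pubYear, journalAbbr, volume, beginPage));

% A cited reference with a trailing DOI (placeholder DOI, not a real one).
citedRef = 'AANESTAD M, 2011, J STRATEGIC INF SYST, V20, P161, DOI 10.0000/PLACEHOLDER';

% Remove the trailing ", DOI ..." part so both strings share one format.
citedKey = regexprep(citedRef, ',\s*DOI\s.*$', '');

% True when the citing paper and the cited reference are the same paper.
isMatch = strcmp(citingKey, citedKey);
```

Exact string comparison is the simplest matching rule; records with inconsistent journal abbreviations or missing volumes would need a looser rule.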
Procedures
If you can write an app for this, it would be great! Otherwise, you can follow these instructions:
Convert each record from
CP1 CR1; CR2; CR3
into one citation pair per line:
CP1 CR1
CP1 CR2
CP1 CR3
Use Access to construct the network:
Have a table for citing papers
Import the converted citation pairs into Access
Use a query to extract those pairs whose papers are in the table
Now you have the node info and the link info
Import both into Matlab
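For readers who prefer to do the conversion and the filtering in Matlab instead of Access, the procedure can be sketched as follows. The keys CP1, CP2, CR1 and so on are placeholders for Web of Science format strings, and all variable names are illustrative.

```matlab
% Each record: a citing paper key and its ';'-separated cited references.
citing = {'CP1', 'CP2'};
cited  = {'CP2; CR1; CR2', 'CR3; CP1'};

% Expand every record into one (citing, cited) pair per row.
pairs = cell(0, 2);
for p = 1:numel(citing)
    refs = strtrim(strsplit(cited{p}, ';'));
    for r = 1:numel(refs)
        pairs(end+1, :) = {citing{p}, refs{r}}; %#ok<AGROW>
    end
end

% Keep only pairs whose cited reference is itself a citing paper in the set.
internal = pairs(ismember(pairs(:, 2), citing), :);

% Map keys to node numbers and build the paper-to-paper citation matrix:
% C(i,j) = 1 means paper i cites paper j.
[~, src] = ismember(internal(:, 1), citing);
[~, dst] = ismember(internal(:, 2), citing);
C = sparse(src, dst, 1, numel(citing), numel(citing));
```

Here `citing` plays the role of the node info and `internal` the role of the link info.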
Adjacency matrices
Now we have the paper-to-paper citation network, but to construct, for instance, author-to-author citation or author co-citation networks, we need adjacency matrices.
In the author-paper adjacency matrix, rows are papers and columns are authors. A cell value of 1 at (i,j) indicates that paper i is written by author j.
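One way to see why this matrix is needed: multiplying it with the paper-to-paper citation matrix projects paper-level citations onto authors. The sketch below uses made-up toy data, and the variable names A, C and W are illustrative.

```matlab
% Toy data (hypothetical): 3 papers, 2 authors.
% A(i,j) = 1 means paper i is written by author j.
A = [1 0;
     0 1;
     1 1];
% C(i,j) = 1 means paper i cites paper j.
C = [0 1 0;
     0 0 0;
     1 1 0];

% Author-to-author citation counts: W(a,b) is the number of
% (citing paper, cited paper) pairs in which author a wrote the
% citing paper and author b wrote the cited paper.
W = A' * C * A;
```

Each paper-level citation is counted once for every combination of a citing author and a cited author, so self-citations appear on the diagonal of W.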
Procedures
Convert each record from
ID1 AU1; AU2; AU3
into one paper-author pair per line:
ID1 AU1
ID1 AU2
ID1 AU3
Add to the beginning of the file:
ID1 ID1
ID2 ID2
… …
IDn IDn
Use Txt2Pajek on the linkage file
Import the edge section of the .net file into Matlab
Select M(1:n,n+1:m), where n is the number of papers and m is the column size. The selection is our author-paper adjacency matrix
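The selection step can be sketched in Matlab as below. It assumes the edge section was imported as two numeric columns of vertex numbers, and that the ID-ID lines at the top of the file caused the papers to receive vertex numbers 1 to n, with the authors numbered after them. The edge values are made up.

```matlab
% Hypothetical edge section of the .net file as [source target] vertex numbers.
n = 3;                         % number of papers
edges = [1 1; 2 2; 3 3;        % the prepended ID-ID lines (self-links)
         1 4; 1 5;             % paper 1 is written by authors 4 and 5
         2 5;                  % paper 2 is written by author 5
         3 4; 3 6];            % paper 3 is written by authors 4 and 6
m = max(edges(:));             % total number of vertices (column size)

% Full vertex-by-vertex matrix built from the edge list.
M = sparse(edges(:, 1), edges(:, 2), 1, m, m);

% Rows 1..n are papers, columns n+1..m are authors.
A = M(1:n, n+1:m);
```

The self-links from the ID-ID lines fall in the block M(1:n,1:n), so they do not affect the selected block.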
PageRank
By David Gleich of Purdue University
http://www.mathworks.com/matlabcentral/fileexchange/11613-pagerank
pagerank(M,options)
options.c: the teleportation coefficient [double | {0.85}]
options.v: the personalization vector [vector | {uniform: 1/n}]
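To illustrate what the teleportation coefficient and the personalization vector do, here is a minimal power-iteration sketch in base Matlab. It is not the File Exchange package, and the link direction convention (row i links to column j) is an assumption made for this example only.

```matlab
% M(i,j) = 1 means node i links to (cites) node j. Toy 3-node graph.
M = [0 1 1;
     0 0 1;
     1 0 0];
n = size(M, 1);
c = 0.85;                 % teleportation coefficient (same role as options.c)
v = ones(n, 1) / n;       % personalization vector (same role as options.v)

outdeg = sum(M, 2);
dangling = (outdeg == 0); % nodes without out-links
outdeg(dangling) = 1;     % avoid division by zero
P = M ./ repmat(outdeg, 1, n);   % row-normalized transition matrix

x = v;
for iter = 1:1000
    % Follow links with probability c; otherwise, or from a dangling
    % node, jump to a node chosen according to v.
    xNew = c * (P' * x) + (c * sum(x(dangling)) + (1 - c)) * v;
    if norm(xNew - x, 1) < 1e-10
        x = xNew;
        break
    end
    x = xNew;
end
```

A non-uniform v biases the scores toward chosen nodes, and a smaller c spreads the scores more evenly.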
Built-in functions
K-means: IDX = kmeans(X,k)
http://www.mathworks.com/help/stats/kmeans.html
Hierarchical clustering
http://www.mathworks.com/help/stats/hierarchical-clustering.html
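A short usage sketch for both built-in approaches, assuming the Statistics toolbox is installed. The data are made up, and the choice of average linkage is only an example.

```matlab
% Toy feature matrix (hypothetical): rows are items, columns are features.
X = [0.0 0.1; 0.1 0.0; 0.2 0.1;     % three points near the origin
     5.0 5.1; 5.1 5.0; 4.9 5.2];    % three points near (5,5)
k = 2;

% K-means: IDX(i) is the cluster index assigned to row i of X.
IDX = kmeans(X, k);

% Hierarchical clustering: build the tree, then cut it into k clusters.
Z = linkage(X, 'average');
T = cluster(Z, 'maxclust', k);
```

K-means starts from random centers, so cluster labels (and occasionally the clusters themselves) can differ between runs; the hierarchical result is deterministic for a given linkage method.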
Modularity-based clustering
By MIT Strategic Engineering
http://strategic.mit.edu/downloads.php?page=matlab_networks
[modules,module_hist,Q] = newmangirvan(adj,k)
[groups_hist,Q] = newman_comm_fast(adj)
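The output Q presumably denotes Newman's modularity of the partition found. As an illustration of that quantity, not of the toolbox code itself, the sketch below computes modularity for a given partition of a small made-up undirected graph.

```matlab
% Toy undirected graph (hypothetical): two triangles joined by one edge.
adj = zeros(6);
edgeList = [1 2; 1 3; 2 3; 4 5; 4 6; 5 6; 3 4];
for e = 1:size(edgeList, 1)
    adj(edgeList(e, 1), edgeList(e, 2)) = 1;
    adj(edgeList(e, 2), edgeList(e, 1)) = 1;
end
groups = [1 1 1 2 2 2];          % candidate partition: one group per triangle

deg  = sum(adj, 2);              % node degrees
twoM = sum(deg);                 % twice the number of edges
sameGroup = bsxfun(@eq, groups(:), groups(:)');   % delta(c_i, c_j)

% Q = (1/2m) * sum_ij (A_ij - k_i*k_j/2m) * delta(c_i, c_j)
Q = sum(sum((adj - deg * deg' / twoM) .* sameGroup)) / twoM;
```

For this graph and partition, Q equals 5/14, about 0.357. Modularity-based clustering methods search for the partition that makes Q as large as possible.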
VOSviewer clustering
By Nees van Eck and Ludo Waltman of Leiden University
http://www.vosviewer.com/relatedsoftware/
A variant of the modularity-based clustering technique
[X, cluster_size, V] = VOS_clustering(A, P)
Matlab Topic Modeling Toolbox
By Mark Steyvers of University of California Irvine
http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm
Input: a bag-of-words representation containing the number of times each word occurs in each document.
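A bag-of-words count matrix can be built in base Matlab as sketched below. The corpus and variable names are made up, and the exact variable layout the toolbox expects is not shown here; check its documentation before passing data in.

```matlab
% Toy corpus (hypothetical): each document is a cell array of tokens.
docs = {{'citation', 'network', 'citation'}, ...
        {'topic', 'model', 'network'}};

vocab = unique([docs{:}]);              % sorted list of distinct words
counts = zeros(numel(docs), numel(vocab));
for d = 1:numel(docs)
    [~, w] = ismember(docs{d}, vocab);  % vocabulary index of every token
    counts(d, :) = accumarray(w(:), 1, [numel(vocab) 1])';
end
% counts(d,w) is the number of times word w occurs in document d.
```

Word order within a document is discarded; only the counts remain, which is what makes this a bag-of-words representation.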
Bioinformatics toolbox
http://www.mathworks.com/help/bioinfo/ref/graphshortestpath.html
[dist, path, pred] = graphshortestpath(G,S,T) finds the shortest path from S to T in graph G
[dist] = graphallshortestpaths(G) finds all shortest paths in graph G; dist is a distance matrix holding the shortest-path distance for each pair of nodes
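A small usage sketch, assuming the Bioinformatics toolbox functions named above are available in your Matlab release. The graph is made up, and G is a sparse matrix whose nonzero entries are edge weights.

```matlab
% Toy directed graph (hypothetical): G(i,j) is the weight of the edge i -> j.
G = sparse([1 2 1], [2 3 3], [1 1 5], 3, 3);

% Shortest path from node 1 to node 3: the route 1 -> 2 -> 3 (total weight 2)
% is shorter than the direct edge 1 -> 3 (weight 5).
[dist, path, pred] = graphshortestpath(G, 1, 3);

% Shortest-path distance for every pair of nodes (Inf when unreachable).
D = graphallshortestpaths(G);
```

For an unweighted citation network, setting all weights to 1 makes the distance equal to the number of citation links on the path.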