Dynamic graph patterns in the High-Energy Physics/Theory Citation Networks
Victor O. Santos and Lawrence B. Holder
Electrical Engineering and Computer Science, Washington State University REU
This work was supported by the National Science Foundation’s REU program under grant number IIS-0647705.
INTRODUCTION
EXPERIMENT
APPROACH
VISUALIZATION
RESULTS
CONCLUSION
The steps of the experiment were as follows:
• Six new dynamic graphs were generated to search for patterns: the full month, week, and day graphs for each of the two citation networks.
• Because the first step of DynGRL’s algorithm is very time-consuming, the graphs of the two citation networks were processed on different machines at the same time.
• After eight days of execution, all the High Energy Physics processes finished, and the pattern discovery results for both citation networks differed from the earlier, inconsistent results.
• We realized that we should increase DynGRL’s precision parameters to find more complex substructures and more interesting patterns.
• After executing DynGRL with these new parameters, we noticed that the execution time grows approximately exponentially as the snapshots grow. The total execution time for a ten-month dynamic graph can be expressed as follows.
A citation network is easily represented as a graph in which each paper is a vertex and each citation is an edge. The graph is directed, since a citation can only occur when a more recently published paper cites a previously published one. The citation network graphs we work with in this research are dynamic. We use several tools to study them: DynGRL [1], which analyzes the graphs with a two-step algorithm to determine whether these networks contain patterns, and a conversion algorithm that eliminates the inconsistencies in the raw data and generates the dynamic graphs.
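The graph representation described above can be illustrated with a minimal Python sketch. The function name `build_citation_graph` and the paper ids are hypothetical, chosen only for illustration; this is not the data structure used by DynGRL.

```python
from collections import defaultdict


def build_citation_graph(citations):
    """Build a directed graph from (citing, cited) pairs.

    Returns (vertices, out_edges): the set of paper ids, and a dict
    mapping each citing paper to the set of papers it cites.
    """
    vertices = set()
    out_edges = defaultdict(set)
    for citing, cited in citations:
        vertices.update((citing, cited))   # every paper is a vertex
        out_edges[citing].add(cited)       # every citation is a directed edge
    return vertices, dict(out_edges)


# Hypothetical paper ids: paper 3 cites papers 1 and 2, paper 2 cites paper 1.
citations = [(3, 1), (3, 2), (2, 1)]
vertices, out_edges = build_citation_graph(citations)
print(sorted(vertices))
print(sorted(out_edges[3]))
```

Paper 1 has no outgoing edges here because it cites nothing, which reflects the direction of the edges: from the citing paper to the cited one.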
We work with the High Energy Physics and High Energy Physics Theory citation networks, which contain ten years of information given in the following format.
Before starting the pattern discovery process, we performed the following steps:
• Build a conversion algorithm that is used repeatedly throughout the experimental process.
- The conversion algorithm generates the dynamic graph by splitting the whole graph into time snapshots.
- Three time units were defined for creating snapshots: months, weeks, and days.
• Convert the raw graph data into its dynamic graph representation, as shown in figure (A).
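The conversion steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors’ conversion algorithm: the names `period_key` and `build_snapshots` are hypothetical, snapshots are assumed to be cumulative (each one contains everything published so far), and an "inconsistent" record is assumed to be a citation with a missing date or one that points to a paper published later than the citing paper.

```python
from datetime import date


def period_key(d, unit):
    """Map a publication date to a snapshot key for the chosen time unit."""
    if unit == "month":
        return (d.year, d.month)
    if unit == "week":
        iso = d.isocalendar()
        return (iso[0], iso[1])            # ISO year, ISO week number
    if unit == "day":
        return (d.year, d.month, d.day)
    raise ValueError(f"unknown unit: {unit}")


def build_snapshots(citations, pub_dates, unit="month"):
    """Return cumulative snapshots as (key, vertices, edges) in time order.

    Assumption: a citation belongs to the period in which the citing
    paper was published; citations with unknown dates or pointing
    forward in time are dropped as inconsistent.
    """
    buckets = {}
    for paper, d in pub_dates.items():
        buckets.setdefault(period_key(d, unit), ([], []))[0].append(paper)
    for citing, cited in citations:
        if citing not in pub_dates or cited not in pub_dates:
            continue                        # missing date: skip
        if pub_dates[cited] > pub_dates[citing]:
            continue                        # cites a later paper: skip
        buckets[period_key(pub_dates[citing], unit)][1].append((citing, cited))

    snapshots, vertices, edges = [], set(), set()
    for key in sorted(buckets):
        new_papers, new_edges = buckets[key]
        vertices |= set(new_papers)
        edges |= set(new_edges)
        snapshots.append((key, frozenset(vertices), frozenset(edges)))
    return snapshots


# Hypothetical data; the last citation is inconsistent (paper 1 cites later paper 3).
pub_dates = {1: date(2000, 1, 5), 2: date(2000, 1, 20), 3: date(2000, 2, 3)}
citations = [(2, 1), (3, 1), (3, 2), (1, 3)]
monthly = build_snapshots(citations, pub_dates, "month")
daily = build_snapshots(citations, pub_dates, "day")
print(len(monthly), len(daily))
```

The same raw data yields a different number of snapshots for each time unit, which is why the choice of months, weeks, or days changes both the size of each snapshot and the length of the sequence.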
Graph (B) represents the pattern discovered with two-day time snapshots.
REFERENCES
Graph visualization (B) represents the whole High Energy Physics Theory citation network. The colors of the different areas mark the graph’s communities.
DynGRL is designed to run as a single-threaded process, which can make pattern discovery very slow on large graphs like the ones used in this research.
• DynGRL is designed to process graphs that change over time, where the changes include additions and removals of vertices and edges.
• Citation graphs do change over time, but they never lose vertices or edges.
• The execution time of DynGRL depends on the size of the graph’s time snapshots.
• Processing 500 snapshots, each representing one day of the graph’s evolution, can take more than 72 hours with the lowest accuracy parameters.
• A good solution to this issue is to change DynGRL’s single-threaded design to a parallel design capable of using the resources of today’s multi-core systems.
[1] C. H. You, “DynGRL: Dynamic Graph-based Relational Learning,” 2011, http://changhun.com/research.html.
[2] C. H. You, L. B. Holder, and D. J. Cook, “Learning Patterns in the Dynamics of Biological Networks,” 2009, in press.
[3] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An Open Source Software for Exploring and Manipulating Networks,” 2009, in press.
Graph (A) represents the pattern discovered with one-day time snapshots.
Graph (C) represents the pattern discovered with one-month time snapshots.
COMPUTATIONAL POWER ISSUE
Other analyses were performed on both citation networks, such as calculating the weight of the vertices, community detection, and a visualization using the Gephi [3] graph visualization tool.
Graph visualization (A) represents the first 500 days of the High Energy Physics Theory citation network.
The pattern discovery technique proved effective at discovering patterns in these networks, and it should therefore apply to other citation networks that can also be represented as graphs. The technique gives an abstract picture of how a citation network behaves and therefore shows the possibility of predicting when its structure will change. To test the accuracy of the three patterns found, we can take a small sample of today’s High Energy Physics citation network and check whether the patterns are present. If the accuracy is high, we can conclude that these patterns represent a persistent behavior of this citation network and, therefore, of how researchers are related through the citations of their publications.
The research also gives an idea of how much computational power is needed to process sophisticated graphs like these.
The framework of dynamic graph analysis [2]: Step (A) represents a dynamic graph with ten time snapshots. Step (B) discovers the graph rewriting rules from each pair of consecutive snapshots. Step (C) learns the rewriting rules generated by the previous step. Step (D) generates the dynamic graph transformation patterns by abstracting the learned rewriting rules.
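To make the idea of comparing consecutive snapshots concrete, the following Python sketch computes which edges were added and which were removed between each pair of snapshots. It is a simplified stand-in and not DynGRL’s rule discovery algorithm: it assumes vertex ids are stable across snapshots, so a plain set difference is enough. The names `snapshot_changes` and `change_sequence` are hypothetical.

```python
def snapshot_changes(prev_edges, next_edges):
    """Return (added, removed) edge sets between two consecutive snapshots."""
    prev_edges, next_edges = set(prev_edges), set(next_edges)
    return next_edges - prev_edges, prev_edges - next_edges


def change_sequence(edge_snapshots):
    """Compute the changes for every pair of consecutive snapshots."""
    return [
        snapshot_changes(a, b)
        for a, b in zip(edge_snapshots, edge_snapshots[1:])
    ]


# Hypothetical cumulative citation snapshots, edges as (citing, cited).
snapshots = [
    {(2, 1)},
    {(2, 1), (3, 1)},
    {(2, 1), (3, 1), (3, 2), (4, 3)},
]
changes = change_sequence(snapshots)
for added, removed in changes:
    print(sorted(added), sorted(removed))
# prints:
# [(3, 1)] []
# [(3, 2), (4, 3)] []
```

In this example the removed sets are always empty, as expected for cumulative citation snapshots, where papers and citations are only ever added.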
Despite the difficulties of processing the different dynamic graphs, three patterns were found in the High Energy Physics citation network.
(A) Dynamic graph representation, one block per time snapshot:
Time 1: v 1 paper, v 2 paper, …, v i paper; d 2 1 citation, …, d i1 i2 citation
…
Time i: v 1 paper, v 2 paper, …, v n paper; d 2 1 citation, …, d i1 i2 citation
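The snapshot layout in figure (A), a Time header followed by v lines for papers and d lines for citations, could be produced by a short serializer such as the Python sketch below. The function name `format_dynamic_graph` is hypothetical, and the exact file syntax that DynGRL expects is an assumption inferred from the figure.

```python
def format_dynamic_graph(snapshots):
    """Render snapshots as text: a 'Time t' header, then vertex and edge lines.

    `snapshots` is a list of (vertices, edges) pairs in time order,
    where each edge is a (citing, cited) tuple.
    """
    lines = []
    for t, (vertices, edges) in enumerate(snapshots, start=1):
        lines.append(f"Time {t}")
        lines.extend(f"v {v} paper" for v in sorted(vertices))
        lines.extend(f"d {src} {dst} citation" for src, dst in sorted(edges))
    return "\n".join(lines)


# Hypothetical two-snapshot dynamic graph.
snapshots = [
    ({1, 2}, {(2, 1)}),
    ({1, 2, 3}, {(2, 1), (3, 1)}),
]
text = format_dynamic_graph(snapshots)
print(text)
```

Vertices and edges are sorted before writing so that the same snapshot always produces the same text, which makes repeated conversion runs easy to compare.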
Raw data columns: Target paper, Cited paper; Paper, Publication date