
Dynamic graph patterns in the High-Energy Physics/Theory …reu.mme.wsu.edu/2011/files/03.pdf · 2011. 7. 29. · Dynamic graph patterns in the High-Energy Physics/Theory Citation


Dynamic graph patterns in the High-Energy Physics/Theory Citation Networks

Victor O. Santos and Lawrence B. Holder
Electrical Engineering and Computer Science, Washington State University REU

This work was supported by the National Science Foundation’s REU program under grant number IIS-0647705

INTRODUCTION

EXPERIMENT

APPROACH

VISUALIZATION

RESULTS

CONCLUSION

The steps of the experiment were the following:

• Six new dynamic graphs were generated to search for patterns: the full month, week, and day graphs for both citation networks.

• Because the first step of DynGRL's algorithm is very time-consuming, the two citation network graphs were processed on different machines at the same time.

• After eight days of execution, all the High Energy Physics processes finished, and the pattern discovery results for both citation networks differed from the earlier, inconsistent results.

• We realized that we should increase DynGRL's precision parameters to find more complex substructures and more interesting patterns.

• After executing DynGRL with these new parameters, we noticed that the execution time grows approximately exponentially as the snapshots grow. The total execution time for a ten-month dynamic graph can be expressed as follows.

The citation network can be easily represented as a graph where each paper is a vertex and each citation is an edge. The graph is directed, since a citation can only occur when a more recently published paper cites a previously published one. The citation network graphs we work with in this research are dynamic. We use several tools to study these dynamic graphs: DynGRL [1], which analyzes the graphs with a two-step algorithm to determine whether these networks contain patterns, and a conversion algorithm that eliminates the inconsistencies in the raw data and generates the dynamic graphs.
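As a minimal sketch of this representation, the directed graph can be held as adjacency sets keyed by the citing paper. The function name and the sample paper identifiers below are hypothetical and chosen only for illustration; they are not part of DynGRL or of the data set.

```python
from collections import defaultdict


def build_citation_graph(citations):
    """Return adjacency sets mapping each citing paper to the papers it cites."""
    graph = defaultdict(set)
    for citing, cited in citations:
        graph[citing].add(cited)
        # Every paper is a vertex, even if it cites nothing itself.
        graph.setdefault(cited, set())
    return dict(graph)


# Hypothetical sample: paper 3 cites papers 1 and 2, and paper 2 cites paper 1.
citations = [(3, 1), (3, 2), (2, 1)]
graph = build_citation_graph(citations)
```

Each edge points from the newer paper to the older one, which is why the oldest paper in the sample ends up with an empty adjacency set.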

We work with the High Energy Physics and High Energy Physics Theory citation networks, which contain ten years of information given in the following format.

Before starting the pattern discovery process, we perform the following steps:

• Build a conversion algorithm that is used many times throughout the experimental process.

- The conversion algorithm generates the dynamic graph by splitting the whole graph into time snapshots.

- Three units of time were defined for creating snapshots: months, weeks, and days.

• Convert the raw graph data into its dynamic graph representation, as shown in figure (A).
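The conversion step above can be sketched as follows. This is not the poster's actual conversion algorithm: it assumes the raw data has already been loaded as (citing, cited) pairs plus a dictionary of publication dates, that a snapshot holds only the citations added in that period, and that records without a known date are the "inconsistencies" to drop. All names are hypothetical.

```python
from datetime import date


def snapshot_index(pub_date, start, unit):
    """Index of the snapshot a publication date falls into, counted from `start`."""
    if unit == "days":
        return (pub_date - start).days
    if unit == "weeks":
        return (pub_date - start).days // 7
    if unit == "months":
        return (pub_date.year - start.year) * 12 + (pub_date.month - start.month)
    raise ValueError(f"unknown unit: {unit}")


def split_into_snapshots(citations, pub_dates, unit):
    """Group citation edges by the snapshot in which the citing paper appeared.

    Edges whose citing paper has no known date are skipped (an assumed,
    simple way of discarding inconsistent raw records).
    """
    dated = [(citing, cited) for citing, cited in citations if citing in pub_dates]
    if not dated:
        return {}
    start = min(pub_dates[citing] for citing, _ in dated)
    snapshots = {}
    for citing, cited in dated:
        idx = snapshot_index(pub_dates[citing], start, unit)
        snapshots.setdefault(idx, []).append((citing, cited))
    return snapshots


# Hypothetical sample data; paper 4 has no date, so its citation is dropped.
pub_dates = {1: date(1992, 1, 1), 2: date(1992, 1, 10), 3: date(1992, 2, 15)}
citations = [(2, 1), (3, 1), (3, 2), (4, 1)]
monthly = split_into_snapshots(citations, pub_dates, "months")
weekly = split_into_snapshots(citations, pub_dates, "weeks")
daily = split_into_snapshots(citations, pub_dates, "days")
```

A cumulative variant, in which each snapshot also contains everything from the earlier ones, is equally plausible; the source does not say which one was used.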

Graph (B) represents the pattern discovered by dividing the time snapshots every two days.

REFERENCES

Graph visualization (B) represents the whole High Energy Physics Theory citation network. The colors of the different areas mark the graph's communities.

DynGRL is designed to run as a single-threaded process, which can make pattern discovery very slow on large graphs like the ones we work with in this research.

• DynGRL is designed to process graphs that change over time, and these changes include additions and removals of vertices and edges.

• Citation graphs do change over time, but they never lose vertices or edges.

• The execution time of DynGRL varies depending on the size of the graph's time snapshots.

• Processing 500 snapshots that represent the day-by-day evolution of the graphs can take more than 72 hours, even with the lowest accuracy parameters.

• A good solution for this issue is to change DynGRL's single-threaded design to a parallel design capable of using the resources of today's multi-core systems.
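One way such a parallel design could look is sketched below, under the assumption that each pair of consecutive snapshots can be compared independently of the others. The per-pair function is only a placeholder for DynGRL's real work, and every name here is hypothetical; the sketch uses Python's standard concurrent.futures module, not any DynGRL interface.

```python
from concurrent.futures import ProcessPoolExecutor


def new_edges(pair):
    """Placeholder per-pair work: edges present in `curr` but not in `prev`."""
    prev, curr = pair
    return sorted(set(curr) - set(prev))


def compare_consecutive(snapshots, workers=None):
    """Compare every pair of consecutive snapshots, one pair per task."""
    pairs = list(zip(snapshots, snapshots[1:]))
    if workers == 1:
        return [new_edges(pair) for pair in pairs]  # serial fallback
    # One worker process per core by default; results keep the input order.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(new_edges, pairs))


if __name__ == "__main__":
    # Hypothetical cumulative snapshots of a tiny citation graph.
    snapshots = [[(2, 1)], [(2, 1), (3, 1)], [(2, 1), (3, 1), (3, 2)]]
    print(compare_consecutive(snapshots))
```

Processes are used instead of threads because the work is CPU-bound. Whether the later learning steps can be split the same way is not something the source states.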

[1] C. H. You, “DynGRL: Dynamic Graph-based Relational Learning,” 2011, http://changhun.com/research.html.

[2] C. H. You, L. B. Holder, and D. J. Cook, “Learning Patterns in the Dynamics of Biological Networks,” 2009, in press.

[3] M. Bastian, S. Heymann, and M. Jacomy, “Gephi: An Open Source Software for Exploring and Manipulating Networks,” 2009, in press.

Graph (A) represents the pattern discovered by dividing the time snapshots every day.

Graph (C) represents the pattern discovered by dividing the time snapshots every month.

COMPUTATIONAL POWER ISSUE

Other analyses were performed on both citation networks, such as calculating the weight of the vertices, community detection, and visualization using the Gephi [3] graph visualization tool.

Graph visualization (A) represents the first 500 days of the High Energy Physics Theory citation network.

The pattern discovery technique proved effective at discovering patterns in these kinds of networks, and it should therefore apply to other citation networks that can also be represented as graphs. This technique gives us an abstract idea of how the citation network behaves and therefore shows the possibility of predicting when its structure will change. To test the accuracy of the three patterns found, we can take a small sample of today's High Energy Physics citation network and check whether the patterns are present. If the accuracy is high, we can conclude that these patterns represent a persistent behavior in this citation network and, therefore, in how researchers are related through the citations of their publications.

The research also gives an idea of how much computational power is needed to process sophisticated graphs like these.

Here is a framework of dynamic graph analysis [2]. Step (A) represents a dynamic graph with ten time snapshots. Step (B) discovers the graph rewriting rules between two consecutive snapshots. Step (C) learns the rewriting rules generated by the previous step. Step (D) generates the dynamic graph transformation patterns by abstracting the learned rewriting rules.
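To make step (B) more concrete, the simplest edge-level analogue of a rewriting rule is the set of edges added and removed between two consecutive snapshots. DynGRL's actual rule discovery works on substructures and is not reproduced here; the function name and the sample snapshots below are hypothetical.

```python
def rewrite_step(prev_edges, curr_edges):
    """Return the (added, removed) edge sets between two consecutive snapshots."""
    prev, curr = set(prev_edges), set(curr_edges)
    return curr - prev, prev - curr


# Hypothetical cumulative snapshots of a tiny citation graph.
snapshots = [
    {(2, 1)},
    {(2, 1), (3, 1)},
    {(2, 1), (3, 1), (3, 2)},
]
steps = [rewrite_step(a, b) for a, b in zip(snapshots, snapshots[1:])]

# In a citation graph every `removed` set is empty, because published
# citations are never withdrawn.
assert all(not removed for _, removed in steps)
```

Steps (C) and (D) would then look for regularities in this sequence of changes, which this sketch does not attempt.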

Despite the difficulties in processing the different dynamic graphs, three patterns were found in the High Energy Physics citation network.

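[Figures: discovered patterns (A), (B), (C) and graph visualizations (A), (B); images not reproduced.]

Dynamic graph representation (A):

Time 1
v 1 paper
v 2 paper
v i paper
d 2 1 citation
d i1 i2 citation
Time i
v 1 paper
v 2 paper
v n paper
d 2 1 citation
d i1 i2 citation
……

Raw data format (A), column headers: Target paper, Cited paper; Paper, Publication date.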
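A snapshot can be written out in the "Time / v / d" layout of the dynamic graph representation with a few lines of code. This is only a sketch: numbering the vertices locally from 1 in sorted order is an assumption, the real tool may expect a different numbering, and the function name and paper identifiers are hypothetical.

```python
def format_snapshot(time_index, edges):
    """Render one snapshot as 'Time', 'v' (vertex) and 'd' (directed edge) lines."""
    papers = sorted({paper for edge in edges for paper in edge})
    # Assumed convention: vertices are renumbered from 1 within each snapshot.
    number = {paper: i for i, paper in enumerate(papers, start=1)}
    lines = [f"Time {time_index}"]
    lines += [f"v {number[paper]} paper" for paper in papers]
    lines += [f"d {number[citing]} {number[cited]} citation" for citing, cited in edges]
    return "\n".join(lines)


# Hypothetical identifiers: the newer paper 9205003 cites the older paper 9201001.
text = format_snapshot(1, [(9205003, 9201001)])
```

With this sample the citing paper becomes vertex 2 and the cited paper vertex 1, so the single edge is rendered as "d 2 1 citation", the same shape as the first edge line in the representation.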