Handbook of Social Network Technologies and Applications || Modeling Temporal Variation in Social Network: An Evolutionary Web Graph Approach

Chapter 8
Modeling Temporal Variation in Social Network: An Evolutionary Web Graph Approach

Susanta Mitra and Aditya Bagchi

8.1 Introduction

A social network is a social structure between actors (individuals, organizations or other social entities) that indicates the ways in which they are connected through various social relationships such as friendship, kinship, and professional or academic ties. Usually, a social network represents a social community, such as a club and its members, a city and its citizens, or a research group communicating over the Internet. In the seventies, Leinhardt [1] first proposed the idea of representing a social community by a digraph. Later, this idea became popular among other research workers such as network designers, web-service application developers and e-learning modelers, and it gave rise to a rapid proliferation of research work in the area of social network analysis. Some of the notable structural properties of a social network are connectedness between actors; reachability between a source and a target actor; reciprocity, or pair-wise connection between actors with bi-directional links; centrality of actors, that is, the important actors having high degree or more connections; and finally the division of actors into sub-structures such as cliques or strongly-connected components. The cycles present in a social network may even be nested [2, 3]. The formal definitions of these structural properties are provided in Sect. 8.2.1. The division of actors into cliques or sub-groups can be a very important factor for understanding a social structure, particularly the degree of cohesiveness in a community. The number, size, and connections among the sub-groups in a network are useful in understanding how the network, as a whole, is likely to behave.

Social scientists, through the analysis of a social network, focus attention on how solidarity and connection of large social structures can be built out of smaller groups. To build a useful understanding of a social network, a complete and rigorous description of a pattern of social relationships is a necessary starting point.

S. Mitra
Meghnad Saha Institute of Technology (Techno India Group), East Kolkata Township, Kolkata-700107, India
e-mail: [email protected]; susanta [email protected]

B. Furht (ed.), Handbook of Social Network Technologies and Applications, DOI 10.1007/978-1-4419-7142-5_8, © Springer Science+Business Media, LLC 2010

This pattern of relationships between the actors can be better understood through a mathematical or formal representation such as a graph. Therefore, a social network is represented as a directed graph or digraph. In this graph, each member of a social community (people or other entities embedded in a social context) is considered as a node, and communication (collaboration, interaction or influence) from one member of the community to another member is represented by a directed edge. A graph representing a social network has certain basic structural properties, which distinguish it from other types of networks or graphs. This type of graph is meant for studying the nature of a real-life social community and its structural changes over time. It may even be used for structural comparison between two social networks, which in turn represents a comparison between two social communities.

In order to understand the social properties and behavior of a community, social scientists analyze the corresponding digraph. The number of nodes in a social network can be very small, representing a circle of friends, or very large, representing a Web community. This graphical representation is useful for the study and analysis of a social network. When a new member joins a social community, like a new immigrant to a village or a new member of a club, he or she may not have any connection with any other member of the community. When mapped as nodes of a graph, new members of this type give rise to isolated nodes. The percentage of isolated nodes in a community is an important parameter of study for a social scientist. Moreover, not all members of a community may have contact with all other members. As a result, the community may form separate sub-groups. Members within a sub-group will have connections among themselves, whereas members of two different sub-groups will remain isolated from each other. When mapped onto the graph representing a social network, these sub-groups give rise to isolated sub-graphs. In addition, each social network will also have some node-related information depending on the application area or the type of social community the network represents [4]. For example, in a Web community, each node may represent a web page and carry data relevant to that page.

The discussion so far indicates that social scientists carry out rigorous computation on the node-based and structural information of a graph representing a social network. For each such computation, all graph-related data, both node-based and structural, need to be accessed. Since a social network may give rise to a graph of thousands of nodes and edges, accessing the entire graph each time contributes significantly to the overall computation time. Moreover, some social network related applications search for interesting patterns in the existing data (both node-based and structural) [5]. Such applications are quite common in web-based mining [6]. Overall computation time can be reduced to a great extent if structure-based and node-based selection and searching can be done efficiently. To make this effective, the relevant information for both nodes and links, along with common built-in structures like sub-graphs, cycles and paths, may be computed and stored a priori. If an application needs a particular type of computation quite often, such information can also be pre-computed and stored. In short, instead of starting from raw node and edge related data for each type of analysis, some storage and selective retrieval facility should be provided for social network applications involving large graphs.

8.1.1 Temporal Variation of a Social Network

A social network changes over time. Since a social network involves a social community, members keep joining and leaving the community. As a result, the number of nodes and edges and their interconnections in the corresponding graph also change over time. The models proposed so far for the growth of social networks [7–9] are based on stochastic processes or are predictive in nature. One of the well-known growth models of social networks, proposed by Barabási and Albert [10], is based on preferential attachment. It considers that during growth, vertices of high degree (in-degree or out-degree) are more likely to receive new edges than vertices of low degree. This inclination towards popularity (centrality), or "rich get richer" behavior, is typically observed in Web-based social networks. However, this model has certain limitations when applied to a referral network [8]. Moreover, though earlier models of social networks have discussed the concept of 'community structure' at length, there has hardly been any in-depth study of the temporal changes in community structures due to evolution. These changes can be associated with the formation, dissolution, growth, shrinkage, merger and split of social communities.

8.2 Web as a Social Network

The World-Wide-Web (WWW), or just the Web as it is popularly known, is a rich and voluminous source of information. The diverse and distributed nature of the Web has given rise to a variety of research into its link structure, ranging from graph-theoretic studies (connectivity, reachability etc.) to community mining (such as discovering strongly connected structural components). Recently, the Web has played a major role in the formation of communities (Cyber-communities or Web communities), where people from different parts of the globe can join a community out of common interest. For example, members of a professional institution may form a group like a special interest group (SIG) of ACM. Similarly, news groups and research groups may form Web communities. The number of communities on the Web is increasing dramatically with time. This community formation is a powerful socializing aspect of the Web that has had a tremendous influence on human society. Thus the Web has become a good source of social networks. Structural similarities with a social network help in studying different sociological behaviors of a Web community through applications of graph theory and social network analysis. These similarities lead towards progress in knowledge representation and management on the Web [11].

[Figure: UML diagram relating the Web objects WWW, Web Site and Web Page. The legend marks Web objects and relationships, with 0..1 and 0..* as cardinalities.]

Fig. 8.1 Hierarchy of web objects in UML

8.2.1 Concept of Web Graph

The Web can be viewed as a hierarchy of Web objects, as represented in Fig. 8.1. The WWW can be considered as a set of Web sites and a Web site as a set of Web pages. These Web sites form a directed graph with Web pages as nodes and hyperlinks between the Web pages as directed edges. The directed graph thus induced by the hyperlinks between the Web pages is known as the Web graph. The formal definition of a Web graph is provided here.

8.2.1.1 Definition

A Web graph is defined as a directed graph $G(N, E)$, where $N$ is the set of Web nodes, indicating corresponding Web pages, and $E \subseteq \{(u, v) \mid u, v \in N\}$ is the set of edges, indicating corresponding hyperlinks, where each edge is associated with an ordered pair of Web nodes.

Given a Web node $i$ of $G$, $ind(i)$ denotes the set of Web nodes pointing to $i$ and $outd(i)$ denotes the set of Web nodes that $i$ points to. The cardinalities of these sets, $I(i) = |ind(i)|$ and $O(i) = |outd(i)|$, are the indegree and the outdegree of $i$, respectively.

For any two Web nodes $i, j \in N$, a hyperlink from $i$ to $j$ can be denoted as $i \to j$. So, a Web graph $G$, like any other graph, can be represented by means of an adjacency matrix $A$ of size $|N| \times |N|$, where $A(i, j) = 1$ if and only if $i \to j$, and $0$ otherwise. For a Web graph, however, the number of nodes is so high that it becomes difficult to provide the computational facility for storing and manipulating the corresponding adjacency matrix. So a considerable effort goes into the compression, storage and corresponding manipulation of a Web graph.
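The definitions above can be illustrated with a minimal Python sketch. The page names and hyperlinks are hypothetical, and the helper names `I` and `O` simply mirror the notation of the text; nothing here is taken from the authors' system.

```python
from collections import defaultdict

# Hypothetical hyperlinks (source page, target page); not taken from the chapter.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
nodes = sorted({n for edge in edges for n in edge})

ind = defaultdict(set)   # ind[i]: set of pages that link to i
outd = defaultdict(set)  # outd[i]: set of pages that i links to
for u, v in edges:
    outd[u].add(v)
    ind[v].add(u)


def I(i):
    """Indegree of i: cardinality of ind(i)."""
    return len(ind[i])


def O(i):
    """Outdegree of i: cardinality of outd(i)."""
    return len(outd[i])


# Dense |N| x |N| adjacency matrix: A[i][j] == 1 iff there is a link i -> j.
index = {n: k for k, n in enumerate(nodes)}
A = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    A[index[u]][index[v]] = 1
```

The dense matrix needs $|N|^2$ cells regardless of how few hyperlinks exist, whereas the two dictionaries of sets store one entry per edge. This contrast is one way to see why compact representations matter for a graph of Web scale.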

8.2.1.2 Properties

As explained earlier, a Web community is a social network. So, a Web graph corresponding to a Web community exhibits various properties that are similar to those of other types of social networks. Some of the important features are:

• Community Formation: The Web can be seen as sets of communities, connected to or isolated from each other. Each community consists of nodes representing the members of the concerned community with edges connecting them [12]. This property of community formation has been widely adopted in hierarchical models [13, 14] that try to find web communities where the members share some common features. These models are also based on some social measures describing a social network.

• Evolving: The Web graph is evolving in nature. In other words, it re-shapes its structure over time with the addition or deletion of nodes and links. This evolution reflects the changes in the social structure or the acquaintances and provides an important area of study and research.

A snapshot of a sample social network that covers various properties is shown in Fig. 8.2. Although only a few nodes and edges have been considered here, the explanation will soon show that even this small graph covers most of the structural peculiarities of a social network on the Web.

However, to make the properties and the graph compression (a novel technique applied in this system) easier to understand, a simpler sample social network related to a referral network is also considered in Fig. 8.3.

Although all the relevant structural components of a Web graph are present in the network of Fig. 8.2, only ISGs and isolated nodes are explained with reference to this figure. For the sake of convenience, the other structural components are explained with reference to the network of Fig. 8.3.

• Isolated Subgraph (ISG): An isolated subgraph is a graph such that, if a graph $G(V, E)$ has two isolated subgraphs $G'(V', E')$ and $G''(V'', E'')$, then $V' \subset V$, $V'' \subset V$ but $V' \cap V'' = \emptyset$, and also $E' \subset E$, $E'' \subset E$ but $E' \cap E'' = \emptyset$.

Fig. 8.2 Sample web community

Fig. 8.3 Sample referral network

In a community, all members may not have contact with each other, giving rise to isolated subgraphs. The sample social network of Fig. 8.2 has two isolated subgraphs, (1–2–3–4–5) and (6–7–8–9–10–11–12–13–14).

• Isolated Nodes: Any node $\nu$ in a directed graph $G(V, E)$, where $\nu \in V$, has two properties: Ind, the in-degree, and Oud, the out-degree. The in-degree is the number of edges incident to $\nu$ and the out-degree is the number of edges going out of $\nu$. A node $\nu$ is an isolated node if $\mathrm{Ind}(\nu) = 0$ and also $\mathrm{Oud}(\nu) = 0$.
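The definitions of isolated subgraphs and isolated nodes can be sketched in Python as follows. This is only an illustrative reading: an isolated subgraph is taken to be a group of nodes connected to one another when edge direction is ignored, and the node and edge lists are hypothetical because the edges of Fig. 8.2 are not listed in the text.

```python
from collections import defaultdict


def isolated_nodes(nodes, edges):
    """Nodes whose in-degree and out-degree are both zero."""
    touched = {n for edge in edges for n in edge}
    return sorted(n for n in nodes if n not in touched)


def isolated_subgraphs(nodes, edges):
    """Group connected nodes, ignoring edge direction (an assumed reading of ISG)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, groups = set(), []
    for start in nodes:
        if start in seen or start not in neighbours:
            continue  # already grouped, or an isolated node
        stack, group = [start], set()
        while stack:
            n = stack.pop()
            if n in group:
                continue
            group.add(n)
            stack.extend(neighbours[n] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical community: two sub-groups and two members without any contact.
nodes = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (2, 3), (4, 5), (5, 4)]
groups = isolated_subgraphs(nodes, edges)
loners = isolated_nodes(nodes, edges)
```

With this data, `groups` holds the two sub-groups and `loners` holds the two nodes that have no incident edge in either direction.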

In Fig. 8.2, nodes 15, 16 and 17 are the isolated nodes.

As shown in Fig. 8.3, the network initially had four nodes (1, 2, 3, 4). Node 5 is an acquaintance of node 4, so node 5 joined the net. In turn, node 5 brought nodes 6 and 7, and they again brought node 8 into the network. In this way the network keeps on growing. It is assumed that at the time of query, Fig. 8.3 shows the current status of the network. It can easily be seen that the referral net of Fig. 8.3 is equivalent to ISG-2 of Fig. 8.2. This network consists of the following structural components:

• Strongly-Connected-Component (SCC): A strongly-connected-component is a maximal subgraph of a directed graph such that for every pair of nodes $\nu_1$, $\nu_2$ in the subgraph, there is a directed path from $\nu_1$ to $\nu_2$ and also a directed path from $\nu_2$ to $\nu_1$.

If there exists an operator $R(\nu_1, \nu_2)$ such that $R(\nu_1, \nu_2) = \text{True}$ if node $\nu_2$ is reachable from node $\nu_1$, i.e. there exists a path from node $\nu_1$ to node $\nu_2$, then a subgraph $G'(V', E')$ of graph $G(V, E)$ is an SCC if $R(\nu_1, \nu_2) = \text{True}$ and also $R(\nu_2, \nu_1) = \text{True}$ for every pair of nodes $\nu_1, \nu_2 \in V'$.

This definition indicates that a reachability operator R will be required in order to check the existence of paths between any two nodes of a graph. A detailed discussion in this regard will be made later.
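A reachability operator of this kind can be sketched with a breadth-first search. The code below is an illustration only and is not the authors' operator; the edge directions are hypothetical, chosen so that nodes 1 to 4 are mutually reachable while node 5 is not.

```python
from collections import deque


def R(graph, v1, v2):
    """True if v2 is reachable from v1 along one or more directed edges."""
    seen, queue = {v1}, deque([v1])
    while queue:
        n = queue.popleft()
        for m in graph.get(n, ()):
            if m == v2:
                return True
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return False


def scc_of(graph, v):
    """Maximal set of nodes that are mutually reachable with v (v included)."""
    nodes = set(graph) | {m for targets in graph.values() for m in targets}
    return {v} | {u for u in nodes if u != v and R(graph, v, u) and R(graph, u, v)}


# Hypothetical adjacency lists: node -> nodes it points to.
graph = {1: [2], 2: [3], 3: [1, 2, 4], 4: [2, 5]}
core = scc_of(graph, 1)
```

Checking every pair with a separate search is costly on large graphs; linear-time algorithms such as Tarjan's exist for that setting, and the pairwise form is used here only because it follows the definition literally.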

The sample social network in Fig. 8.3 has two edge-types, shown by firm and chain lines. Node sequence (1–2–3) represents a strongly connected component when only edges of the same type are considered, whereas (1–2–3–4) is an SCC when both edge-types are considered.

• Cycle: If the sequence of nodes defining a path of a graph starts and ends at the same node and includes other nodes at most once, then that path is a cycle. If, in a graph $G(V, E)$, $(v_0, v_1, \ldots, v_n)$ is a node sequence defining a path $P$ in $G$ such that $v_0, v_1, \ldots, v_n \in V$ and $v_0 = v_n$, then $P$ is a cycle.

Figure 8.3 shows three cycles: (1–2–3–1), (2–3–4–2) and (2–3–2). Here cycles have been considered irrespective of the variation in edge-types. The cycles may even be nested. Cycle (2–3–2) is nested within the other two cycles, (1–2–3–1) and (2–3–4–2).

• Reciprocal Edge: A cycle having only two nodes is a reciprocal edge. So, a reciprocal edge between nodes $\nu_1, \nu_2 \in V$ has a directed edge from $\nu_1$ to $\nu_2$ and also from $\nu_2$ to $\nu_1$. A reciprocal edge is the smallest cycle.

In Fig. 8.3, (2–3–2) is a reciprocal edge.
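The cycles, the nesting and the reciprocal edge named for Fig. 8.3 can be reproduced with a small enumeration sketch. The edge directions below are an assumption, chosen only so that the three cycles listed in the text appear; the figure itself may contain further edges. Treating a cycle as nested when its node set is a proper subset of another cycle's node set is likewise an assumed reading.

```python
def simple_cycles(graph):
    """Enumerate simple directed cycles; each is reported from its smallest node."""
    cycles = []

    def extend(path):
        for nxt in graph.get(path[-1], ()):
            if nxt == path[0]:
                cycles.append(tuple(path))
            elif nxt > path[0] and nxt not in path:
                extend(path + [nxt])

    for start in sorted(graph):
        extend([start])
    return cycles


def nested_in(inner, outer):
    """Assumed reading of nesting: proper subset of the node set."""
    return set(inner) < set(outer)


# Assumed edges among nodes 1 to 4 of the referral network.
graph = {1: [2], 2: [3], 3: [1, 2, 4], 4: [2]}
cycles = simple_cycles(graph)
reciprocal = [c for c in cycles if len(c) == 2]
```

Each tuple omits the repeated closing node, so `(1, 2, 3)` stands for the cycle (1–2–3–1). Anchoring every cycle at its smallest node keeps the same cycle from being reported once per rotation.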

• Hyper-node: In a nested-cycle structure, the largest or the outermost cycle is defined as a hyper-node. For a graph $G(V, E)$, if there exists a nested cycle structure with a set of cycles such that $\{C_1 \supset C_2 \supset \ldots \supset C_n\}$, where $C_i$ is a cycle in $G$, then $C_1$ is the hyper-node corresponding to the nested cycle structure. So, a hyper-node represents an SCC.

• Homogeneous Hyper-node: If all the edge-types in a hyper-node are the same, then it is a homogeneous hyper-node. Let $\{C_1 \supset C_2 \supset \ldots \supset C_n\}$ be a nested cycle structure in a graph $G(V, E)$, where $C_i$ is a cycle in $G$. Now $C_1$ will be a homogeneous hyper-node if, for any pair of edges $(\nu_i, \nu_j) \in C_1$ and $(\nu_r, \nu_s) \in C_1$, $(\nu_i, \nu_j).\text{edge-type} = (\nu_r, \nu_s).\text{edge-type}$.

In Fig. 8.3, (1–2–3) is a homogeneous hyper-node.

• Heterogeneous Hyper-node: In a heterogeneous hyper-node, the edge-types need not all be the same. In Fig. 8.3, (1–2–3–4) is a heterogeneous hyper-node.

Though by definition a hyper-node is the largest cycle in a nested cycle structure, the hyper-nodes themselves can also be nested. Since all the edges in a homogeneous hyper-node must be of the same type, this hyper-node may be nested within another larger cycle formed by edges of different types, resulting in a heterogeneous hyper-node. So, a homogeneous hyper-node may be nested within a heterogeneous hyper-node. In Fig. 8.3, homogeneous hyper-node (1–2–3) is nested within heterogeneous hyper-node (1–2–3–4).
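The distinction between the two kinds of hyper-node can be sketched as a check on edge types. The edge directions and the assignment of the "firm" and "chain" types to individual edges are hypothetical, chosen to agree with the statements made about Fig. 8.3, and classifying by all edges among the member nodes is an assumed simplification of the definition.

```python
def hyper_node_kind(members, edge_type):
    """Classify a hyper-node by the types of the edges among its member nodes."""
    inner = {t for (u, v), t in edge_type.items() if u in members and v in members}
    if not inner:
        raise ValueError("no edges among the given members")
    return "homogeneous" if len(inner) == 1 else "heterogeneous"


# Hypothetical typed edges among nodes 1 to 4 of the referral network.
edge_type = {
    (1, 2): "firm", (2, 3): "firm", (3, 1): "firm", (3, 2): "firm",
    (3, 4): "chain", (4, 2): "chain",
}
inner_kind = hyper_node_kind({1, 2, 3}, edge_type)
outer_kind = hyper_node_kind({1, 2, 3, 4}, edge_type)
```

Under these assumptions the inner hyper-node uses a single edge type while the outer one mixes both, which mirrors the nesting of a homogeneous hyper-node inside a heterogeneous one.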

The directed graph shown in Fig. 8.3 can be converted to a Directed Acyclic Graph (DAG) through a compression process whereby the hyper-structures are fused into hyper-nodes and the edges are fused into hyper-edges. The compression process includes two types of compression, namely homogeneous hyper-node based compression and heterogeneous hyper-node based compression. The details of the compression process have been discussed elsewhere [15].
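The general idea of fusing cyclic structures into single nodes can be sketched with the textbook condensation of strongly connected components (Kosaraju's two-pass method). This is a stand-in for illustration only: the authors' homogeneous and heterogeneous compression is described in [15] and is not reproduced here, edge types are ignored, and the edge directions are assumed from the narrative of the referral network.

```python
from collections import defaultdict


def condense(nodes, edges):
    """Fuse each strongly connected component into one hyper-node.

    Returns (component_of, hyper_edges); the hyper-edges form a DAG.
    """
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: record nodes in order of depth-first completion.
    order, seen = [], set()

    def visit(n):
        seen.add(n)
        for m in fwd[n]:
            if m not in seen:
                visit(m)
        order.append(n)

    for n in nodes:
        if n not in seen:
            visit(n)

    # Pass 2: sweep the reversed graph in reverse completion order.
    component_of = {}

    def assign(n, root):
        component_of[n] = root
        for m in rev[n]:
            if m not in component_of:
                assign(m, root)

    for n in reversed(order):
        if n not in component_of:
            assign(n, n)

    hyper_edges = {(component_of[u], component_of[v])
                   for u, v in edges if component_of[u] != component_of[v]}
    return component_of, hyper_edges


# Assumed edges: a cyclic core (1 to 4) and the referral chain 4 -> 5 -> {6, 7} -> 8.
nodes = [1, 2, 3, 4, 5, 6, 7, 8]
edges = [(1, 2), (2, 3), (3, 1), (3, 2), (3, 4), (4, 2),
         (4, 5), (5, 6), (5, 7), (6, 8), (7, 8)]
component_of, hyper_edges = condense(nodes, edges)
```

The recursive traversal is adequate for a graph of this size; a graph of Web scale would need an iterative traversal and external storage.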

8.3 Evolution of Web Graph

It is well known that the Web graph is a giant social network that shares many structural properties with other social networks. Hence, to understand and explain the evolution of a social network, it is useful to consider the Web graph as a case study. This is also well justified by the fact that change is substantial in the case of the Web, where pages are added at a very high rate and also disappear quite often. The evolution of the Web has been widely studied for quite some time and has received much attention in the physics literature [16–19]. Studies of this evolution have proved challenging due to the complex structure of the Web. The evolution pattern of the Web is not yet fully understood, which leaves scope for active research in this area. However, all the proposed models of the evolution of the Web graph are based on stochastic processes only and do not attempt to capture the microscopic details of social dynamics through a data model. Evaluation of Web queries for the static and dynamic Web was initially proposed in [20]. It is one of the earliest works on the dynamic Web graph to have dealt with the computational aspects of dynamic Web queries. However, it does not propose a formal temporal data model to account for the temporal changes of the structural components of the Web graph. As in other social networks, the formation and interaction of communities are also well exhibited by the Web graph during evolution. Only a few models [21–23] have considered this during the study of Web-based social networks. However, none of these models has analyzed the temporal variation of the structures of the communities and their components (pages, hyperlinks) by associating a valid time period, or interval of existence, with each of these dynamic structural objects. A valid time period should also be associated with the hyper-structures and paths discussed in earlier sections. This approach to analysis can be quite important and valuable for sociologists and information scientists. The only attempt to study the dynamic nature of the Web graph against a temporal axis is made in [24], where each edge is labeled with the dates of its first and last appearance in the Web. However, this approach poses several challenges: (a) efficient representation of dynamic graphs in secondary memory, (b) application of compression techniques to the dynamic Web graph and (c) utilization of timestamps, e.g. valid time, to answer temporal queries on the structural evolution of the Web graph.

Thus, it is apparent from the above discussion that none of the earlier research efforts has proposed a temporal data model for a dynamic Web graph. This chapter proposes a Dynamic Data Model to manage temporal changes in the structure of a Web graph representing a social network. With application-specific changes, this model may also be adopted for other areas of evolutionary graphs (e.g. electrical distribution, biological pathways etc.).

A Web graph representing a social network is a dynamic system that continually evolves and re-shapes its structure over time due to changes in the sociological behavior of its members. In a data model, however, such structural changes can only be represented at discrete time instants, similar to versioning in standard temporal data models. So between two successive time instants of study, the graph structure is considered to be constant, while the structure at two different time instants may differ. The reshaping of the Web structure therefore means that the Web graph at any time point $t$ is different from that at time $t+1$ or $t-1$. This fact, of course, demands that the graph be analyzed at several time points to understand the pattern of evolution. The basic mechanism that can explain the dynamic behavior of the Web graph is the hyperlinking of Web resources. So, there are four basic processes that characterize the evolution of a Web graph:

• Addition of new nodes (pages)
• Addition of new edges (hyperlinks)
• Removal or deletion of existing nodes
• Removal or deletion of existing edges

In reality, the combination of these four basic processes can produce complex dynamic behavior in a Web graph as it evolves over time. When a Web graph is studied at two successive time instants, a structural comparison between the two graphs reveals the actual changes that have occurred. The comparison accounts for the numbers of nodes (pages) and edges (hyperlinks) and also the interconnections among them. These changes would also include changes in the hyper-structures and paths, if any.
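The structural comparison between two successive time instants can be sketched with set differences, one per basic process. The two snapshots are hypothetical, and the function name `diff_snapshots` is introduced only for this illustration.

```python
def diff_snapshots(old, new):
    """Compare two snapshots, each given as a (node_set, edge_set) pair."""
    old_nodes, old_edges = old
    new_nodes, new_edges = new
    return {
        "nodes_added": new_nodes - old_nodes,
        "nodes_removed": old_nodes - new_nodes,
        "edges_added": new_edges - old_edges,
        "edges_removed": old_edges - new_edges,
    }


# Hypothetical snapshots of the same graph at time instants t and t+1.
graph_t = ({1, 2, 3, 4}, {(1, 2), (2, 3), (3, 1), (3, 4)})
graph_t1 = ({1, 2, 3, 5}, {(1, 2), (2, 3), (3, 1), (3, 5)})
changes = diff_snapshots(graph_t, graph_t1)
```

Here page 4 and its incoming hyperlink disappear while page 5 and a new hyperlink appear. Changes to hyper-structures and paths would need a further comparison of the derived structures, which this sketch does not attempt.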

This study and analysis of the structural evolution of a Web graph can also help in the discovery of compact and densely connected zones of 'information' and 'knowledge'. Moreover, further study of the characteristics and properties of this type of 'Web Knowledge Base' can bring out important and interesting information on changes in a Web community structure with respect to pages and links at different time instants, i.e. over a period of time. These changes may cause formation, dissolution, growth or merger, and decay or split of Web communities [21, 23]. Thus it becomes apparent that the collection of structural changes through a set of successive time instants can give rise to a composite structural evolution pattern of a Web community over a long period of time. This fact is explained and illustrated through the data model and the evolutionary query examples in the subsequent sections. The next sections cover only the structural evolution of the Web graph. In other words, change in the content of a page over time, the usual area of study for a temporal data model, has not been considered here. Designing a data model for temporal changes of an evolutionary graph is definitely a new area of study.

8.4 Dynamic Web Graph Model

This section provides the overview and preliminaries of the dynamic data model, followed by the detailed description. As discussed earlier, in a social community, different types of social activities may cause changes in social relationships and thus in the structure of the Web graph that represents such a community. However, such changes may not happen in the entire graph; they may happen in some portions of a Web graph or in certain sub-graphs of it. In order to study such changes, the temporal evolution process at different parts of a Web-based social network needs to be studied, so a facility for selective retrieval of data is necessary. Therefore, once again, a data model has to be designed to capture the temporal changes in the Web-based social network. This would help in studying the frequently changing sub-structures without accessing the entire graph. To capture these temporal changes in a social network, a dynamic graph data model is proposed here. This type of analysis may cover situations like:

• Observing social and cultural trends over time for sociological research. For example, the number of accesses to matrimonial or employment sites in a referral network during a time period is a trend indicator for a society.

• Observing structural changes in a society over time (formation of sub-groups, i.e. sub-graphs, dissolution or splitting of such groups etc.) with changes in social facilities like communication, financial status, availability of resources etc.

8.4.1 Dynamic Data Model Preliminaries

• Granularity: Time granularity provides the domain of the value of time associated with the time stamp for each structural component.

• Valid Time: It provides the time stamp for any structural component. It represents a time range through which a structural component exists in a Web graph. It has a Start-time and an End-time.

• Valid Start-time: The start-time of a time stamp, i.e. of a Valid time.

• Valid End-time: The end-time of a time stamp, i.e. of a Valid time.

Both start-time and end-time will assume values within the domain of granularity defined in an application. For example, if granularity is in terms of years, then a valid time (2001–2005) for a node signifies that the node entered the graph and its associated social network in its version of year 2001 and existed in the graph till its version of year 2005. So, for any valid time, valid end-time $\geq$ valid start-time within the domain of values of the granularity used.

• Now: A temporal variable that implicitly represents the current time. So, Now is always associated with the time stamp of the current version.

If a structural component joins and leaves a graph within the same version, its valid start-time = valid end-time. For example, a structural component that is introduced in the 2001 version and disappears in the 2002 version would have a valid time (2001, 2001).

So, when a new component is introduced in the current version, its valid time, until the next version is created, is (Now, Now).

• Lifespan: The lifespan of a structural component in a Web graph is the time interval during which the component is valid. This interval is the difference between the valid start-time, when the component first appeared in the graph, and the valid end-time, after which the component disappeared. If $v_e$ represents the valid end-time and $v_s$ represents the valid start-time of a structural component, then $(v_e - v_s)$ is the lifespan of the component. Accordingly, a member function lifespan has been provided to compute the time of existence of any object instance of any object type in the temporal version of the object-relational data model, as described in Sect. 8.4.2.

The lifespan attribute may be used to compute the lifespan overlap between two structural components, which is the intersection of the lifespans of the two components. If $[t_i, t_j]$ and $[t_k, t_l]$ are the valid times of any two structural components, where $i < j$, $k < l$, $i < k$, $j \geq k$ and $j \leq l$, then $(t_j - t_k)$ is the overlapping lifespan.

Lifespan containment signifies that the valid time of a structural component is within the valid time of another structural component. If $[t_i, t_j]$ and $[t_k, t_l]$ are the valid times of any two structural components, where $i < j$, $k < l$, $i \leq k$ and $j \geq l$, then $[t_k, t_l]$ is contained within $[t_i, t_j]$.
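Lifespan, lifespan overlap and lifespan containment can be sketched together on a small value class. The class name `ValidTime` and its method names are introduced here for illustration and are not part of the authors' schema; timestamps are assumed to be integer years.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidTime:
    start: int  # valid start-time, e.g. a year
    end: int    # valid end-time, never earlier than start

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("valid end-time precedes valid start-time")

    def lifespan(self):
        """Difference between valid end-time and valid start-time."""
        return self.end - self.start

    def overlap(self, other):
        """Length of the shared interval, or None when the intervals are disjoint."""
        lo, hi = max(self.start, other.start), min(self.end, other.end)
        return hi - lo if lo <= hi else None

    def contains(self, other):
        """True if the other valid time lies within this one."""
        return self.start <= other.start and other.end <= self.end


a = ValidTime(2001, 2005)
b = ValidTime(2003, 2008)
shared = a.overlap(b)  # corresponds to (t_j - t_k) in the text
```

Taking the larger start and the smaller end yields the same value as $(t_j - t_k)$ under the ordering conditions stated above, and it also covers the case where the first interval fully contains the second.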

• Query Time Period: The time interval over which a query is made. Like valid time, query time will also have a query start-time and a query end-time with values in the domain of the granularity defined. For example, the query 'Find the number of Web pages related to Indian Internet Banking created since 2003' will have the query time period [2003, Now].
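A query time period that ends at Now can be sketched as follows. The page identifiers, their valid times and the use of `None` as a placeholder for Now are all assumptions made for this illustration; the topic filter of the example query is left out.

```python
NOW = None  # assumed stand-in for the temporal variable Now


def resolve(t, current_version):
    """Replace the Now placeholder with the timestamp of the current version."""
    return current_version if t is NOW else t


def created_within(pages, query_start, query_end, current_version):
    """Ids of pages whose valid start-time lies in the query time period."""
    lo = resolve(query_start, current_version)
    hi = resolve(query_end, current_version)
    return [pid for pid, (vs, _ve) in pages.items()
            if lo <= resolve(vs, current_version) <= hi]


# Hypothetical pages with (valid start-time, valid end-time) in years.
pages = {
    "p1": (2001, 2004),
    "p2": (2003, NOW),
    "p3": (2006, NOW),
    "p4": (2002, 2002),
}
recent = created_within(pages, 2003, NOW, current_version=2008)
```

The count asked for by the example query would then be `len(recent)`, taken over whichever pages match the topic.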

8.4.2 Temporal Structure-Based Schema

A temporal structure-based schema is defined in this section to represent the structural evolution of a Web graph representing a social network. This schema admits different type structures. In addition, a temporal attribute valid_time has been provided with each object type to indicate the time of existence of each object instance of that object type. A member function lifespan, as described in Sect. 8.4.1, has also been provided to compute the actual period of existence of each object instance as the difference of its valid end time and valid start time. A unit named timestamp has been defined to indicate the values of valid end time and valid start time. The unit of timestamp depends on the unit of granularity defined for the application being modeled. In the experimental schema used in this chapter, it has been taken as a year, such as 2001, 2002, etc.

The relevant object types for the temporal structure-based schema are shown below:

Valid_time (
    type: ADT;
    valid_start_time: timestamp;
    valid_end_time: timestamp;
    member functions:
        lifespan returns integer;
)

Graph (
    type: ADT;
    graph_id: string;
    version: timestamp;
    ISGs: TABLE OF REF ISG;
    isolated_nodes: TABLE OF REF Node;
    member functions:
        no_of_ISG returns integer,
        no_of_isolated_nodes returns integer,
        and other member functions;
)

ISG (
    type: ADT;
    ISG_id: string;
    valid_time: REF Valid_time;
    homogeneous_hyper_nodes: TABLE OF REF Homogeneous_hyper_node;
    heterogeneous_hyper_nodes: TABLE OF REF Heterogeneous_hyper_node;
    hyper_edges: TABLE OF REF Hyper_edge;
    nodes: TABLE OF REF Node;
    edges: TABLE OF REF Edge;
    paths: TABLE OF REF Path;
    member functions:
        isg_size returns no_of_nodes as integer,
        no_of_heterogeneous_hyper_nodes returns integer,
        no_of_homogeneous_hyper_nodes returns integer,
        max_homogeneous_hyper_node_size returns no_of_nodes as integer,
        max_path_length returns no_of_nodes as integer,
        and other member functions
)

Homogeneous_hyper_node (
    type: ADT;
    homogeneous_hyper_node_id: string;
    valid_time: REF Valid_time;
    homogeneous_hyper_node_edge_type: string;
    // since all edges within a homogeneous hyper node are of the same type,
       inclusion of homogeneous_hyper_node_edge_type is useful to process queries
       searching for homogeneous hyper nodes of a specific edge type //
    cycles: TABLE OF REF Cycle;
    nodes: TABLE OF REF Node;
    edges: TABLE OF REF Edge;
    member functions:
        homogeneous_hyper_node_size returns no_of_nodes as integer,
        and other member functions
)


Heterogeneous_hyper_node (
    type: ADT;
    heterogeneous_hyper_node_id: string;
    valid_time: REF Valid_time;
    homogeneous_hyper_nodes: TABLE OF REF Homogeneous_hyper_node;
    hyper_edges: TABLE OF REF Hyper_edge;
    cycles: TABLE OF REF Cycle;
    nodes: TABLE OF REF Node;
    edges: TABLE OF REF Edge;
    member functions:
        heterogeneous_hyper_node_size returns no_of_nodes as integer,
        and other member functions
)

Hyper_edge (
    type: set of edges;
    hyper_edge_id: string;
    valid_time: REF Valid_time;
    start_node: REF Heterogeneous_hyper_node or Homogeneous_hyper_node or Node;
    end_node: REF Heterogeneous_hyper_node or Homogeneous_hyper_node or Node;
    hyper_edge_type: string;
    // since all edges within a hyper-edge are of the same type, inclusion of
       hyper_edge_type is useful to process queries searching for hyper edges
       of a specific edge type //
    hyper_edge_members: TABLE OF REF Edge;
    member functions:
        hyper_edge_size returns no_of_edges as integer,
        and other member functions
)

Cycle (
    type: ADT;
    cycle_id: string;
    valid_time: REF Valid_time;
    nodes: TABLE OF REF Node;
    edges: TABLE OF REF Edge;
    member functions:
        cycle_size returns no_of_nodes as integer,
        and other member functions
)

Path (
    type: sequence of nodes;
    path_id: string;
    valid_time: REF Valid_time;
    path_start_node: REF Heterogeneous_hyper_node or Homogeneous_hyper_node or Node;
    path_end_node: REF Heterogeneous_hyper_node or Homogeneous_hyper_node or Node;
    path_edgelist: TABLE OF REF Hyper_edge or Edge;
    member functions:
        path_length returns no_of_nodes as integer,
        // no_of_nodes refers to the total number of heterogeneous hyper nodes,
           homogeneous hyper nodes as well as nodes belonging to the path //
        and other member functions
)

Edge (
    type: sequence of nodes;
    edge_id: string;
    valid_time: REF Valid_time;
    edge_type: string;
    start_node: REF Node;
    end_node: REF Node;
    // edge is a sequence of nodes of length 2 //
)

Node (
    type: object;
    node_id: string;
    valid_time: REF Valid_time;
    in_degree: integer;
    out_degree: integer;
    node_type: string;
    // the system defines four types of nodes: isolated, source, sink, communicator //
)
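To make the object types more concrete, a small subset of the schema (Valid_time, Node, Edge, ISG and Graph) can be sketched in Python. This is only an illustrative approximation under stated assumptions: REF and TABLE OF REF are modeled as ordinary object references and lists, the timestamp is an integer year, an open valid end time is encoded as None, and the sample identifiers and the edge type "friendship" are hypothetical. Hyper nodes, hyper edges, cycles and paths are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValidTime:
    valid_start_time: int                  # timestamp with year granularity
    valid_end_time: Optional[int] = None   # None stands for Now (assumed encoding)

    def lifespan(self, now: int) -> int:
        # Difference of valid end time and valid start time.
        end = now if self.valid_end_time is None else self.valid_end_time
        return end - self.valid_start_time


@dataclass
class Node:
    node_id: str
    valid_time: ValidTime
    in_degree: int = 0
    out_degree: int = 0
    node_type: str = "isolated"  # isolated, source, sink or communicator


@dataclass
class Edge:
    edge_id: str
    valid_time: ValidTime
    edge_type: str
    start_node: Node
    end_node: Node


@dataclass
class ISG:
    isg_id: str
    valid_time: ValidTime
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def isg_size(self) -> int:
        # Size of an ISG is measured in number of nodes.
        return len(self.nodes)


@dataclass
class Graph:
    graph_id: str
    version: int
    isgs: List[ISG] = field(default_factory=list)
    isolated_nodes: List[Node] = field(default_factory=list)

    def no_of_isg(self) -> int:
        return len(self.isgs)

    def no_of_isolated_nodes(self) -> int:
        return len(self.isolated_nodes)


# A hypothetical 2002 version of a graph with one ISG and one isolated node.
a = Node("a", ValidTime(2001), out_degree=1, node_type="source")
b = Node("b", ValidTime(2002), in_degree=1, node_type="sink")
e = Edge("e1", ValidTime(2002), "friendship", a, b)
isg = ISG("isg1", ValidTime(2002), nodes=[a, b], edges=[e])
g = Graph("g1", 2002, isgs=[isg], isolated_nodes=[Node("c", ValidTime(2002))])

print(g.no_of_isg(), g.no_of_isolated_nodes(), isg.isg_size())
print(a.valid_time.lifespan(now=2002))  # node a has existed since 2001
```

Because every object carries its own valid time, a temporal query can filter nodes, edges or ISGs by lifespan without comparing complete graph versions, which is the property the schema is designed to provide.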

8.5 Conclusion and Future Works

A social network grows over time. The temporal data models developed so far are extensions of the relational, object-oriented and, later, object-relational models. They could hardly handle graph-related data. Dynamic graphs, or temporal changes in structure for social network analysis, have already been studied but, once again, with no effort towards data management. A comprehensive object-relational data model has been proposed in this chapter to incorporate the necessary features of a dynamic or evolutionary Web graph represented as a social network. The model covers the possible types of changes that can occur among the structural components due to evolution. It supports efficient evaluation of dynamic or temporal queries on graph structures. Different temporal query operators have been defined for query formulation. The definition of the query operators and query examples, as well as the detailed implementation process of Web graph evolution, are beyond the scope of the present chapter. The proposed model is a generic and comprehensive data model for managing the changes in sociological behavior of Web communities and can bring out important and interesting information on changes in Web community structure over a period of time.

However, a lot of scope for future work exists in this important and challenging area of network evolution. First of all, the model needs to be implemented and thoroughly tested on current Web graphs having nodes and edges on the order of billions. Novel graph compression techniques based on structures, as proposed by the authors in [15], or other techniques like [25], need to be applied for efficient data storage and retrieval. Secondly, more efficient query processing systems need to be designed to obtain dynamic information on structures as well as their component nodes and edges. Work on complex query processing on a static or snapshot data model for a Web graph has already been done by the authors and communicated [26].

It is envisaged that when all the efforts made in this chapter are published, they will provide an impetus to many other research works in the area of social network modeling in particular and on dynamic graph data models in general.

References

1. Leinhardt, S. (1977). Social networks: a developing paradigm. Academic Press, New York.

2. Rao, A.R. and Bandyopadhyay, S. (1987). Measures of reciprocity in a social network. Sankhya: The Indian Journal of Statistics, Series A, 49, 141–188.

3. Rao, A.R., Bandyopadhyay, S., Sinha, B.K., Bagchi, A., Jana, R., Chaudhuri, A. and Sen, D. (1998). Changing social relations – social network approach, Technical Report. Survey Research and Data Analysis Center, Indian Statistical Institute.

4. Mitra, S., Bagchi, A. and Bandyopadhyay, A.K. (2007). Design of a data model for social network applications. Journal of Database Management, 18, 4, 51–79.

5. Chen, L., Gupta, A. and Kurul, E.M. (2005). Efficient algorithms for pattern matching on directed acyclic graphs. IEEE ICDE.

6. Chakrabarti, S. (2004). Web mining. Elsevier.

7. Liben-Nowell, D. and Kleinberg, J. (2003). The link prediction problem for social networks. Proceedings of the ACM CIKM.

8. Newman, M.E.J. (2003). The structure and function of complex networks. SIAM Review, 45, 167–256.

9. Jin, E.M., Girvan, M. and Newman, M.E.J. (2001). The structure of growing social networks. Physical Review E, 64, 046132.

10. Barabasi, A. and Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.

11. Kumar, R., Raghavan, P., Rajagopalan, S. and Tomkins, A. (2002). Web and social networks. IEEE Computer, 35(11), 32–36.

12. Flake, G.W., Lawrence, S.R., Giles, C.L. and Coetzee, F.M. (2002). Self-organization and identification of web communities. IEEE Computer, 35, 66–71.

13. Kleinberg, J.M. (2002). Small world phenomena and the dynamics of information. Proceedings of the 2001 Neural Information Processing Systems Conference, MIT Press, Cambridge, MA.

14. Watts, D.J., Dodds, P.S. and Newman, M.E.J. (2002). Identity and search in social networks. Science, 296, 1302–1305.

15. Bhanu Teja, C., Mitra, S., Bagchi, A. and Bandyopadhyay, A.K. (2007). Pre-processing and path normalization of a web graph used as a social network. Journal of Digital Information Management, 5, 5, 262–275.

16. Salathe, M., May, M.R. and Bonhoeffer, S. (2005). The evolution of network topology by selective removal. Journal of the Royal Society Interface, 2, 533–536.

17. Krapivsky, P.L. and Redner, S. (2002). A statistical physics perspective on web growth. Computer Networks, 39, 261–276.

18. Dorogovtsev, S. and Mendes, J. (2002). Evolution of networks. Advances in Physics, 51, 1079–1187.

19. Dorogovtsev, S. and Mendes, J. (2000). Scaling behavior of developing and decaying networks. Europhysics Letters, 52, 33–39.

20. Mendelzon, A.O. and Milo, T. (1997). Formal models of web queries. Proceedings of the ACM Database Systems, 134–143.

21. Tawde, B.V., Oates, T. and Glover, E.J. (2004). Generating web graphs with embedded communities. Proceedings of the World Wide Web Conference.

22. Chakrabarti, S., Joshi, M.M., Punera, K. and Pennock, D.M. (2002). The structure of broad topics on the web. Proceedings of the World Wide Web Conference.

23. Toyoda, M. and Kitsuregawa, M. (2003). Extracting evolution of web communities from a series of web archives. Proceedings of the Fourteenth Conference on Hypertext and Hypermedia, 28–37.

24. Kraft, R., Hastor, E. and Stata, R. (2003). Timelinks: Exploring the link structure of the evolving Web. Second Workshop on Algorithms and Models for the Web Graph.

25. Dourisboure, Y., Geraci, F. and Pellegrini, M. (2007). Extraction and classification of dense communities in the web. Proceedings of the International World Wide Web Conference.

26. Mitra, S., Bagchi, A. and Bandyopadhyay, A.K. (2008). Complex query processing on web graph: A social network perspective. Journal of Digital Information Management, 6, 1, 12–20.

Biographies of Authors

Susanta Mitra is a Professor and Head of the Department of Computer Science & Engg. and IT at Meghnad Saha Institute of Technology (Techno India Group), Kolkata, India. He received his Ph.D. (Comp. Sc.) from Jadavpur University, Kolkata. His research interests include social networking, Web graph analysis and mining, data modeling, social computing, data structures and algorithms. Prof. Mitra is a Professional Member of the Association for Computing Machinery (ACM) and a Senior Life Member of the Computer Society of India (CSI).

Aditya Bagchi is the Dean of Studies at the Indian Statistical Institute, Kolkata, India. Prof. Bagchi received his Ph.D. (Engg.) from Jadavpur University in 1987. He has served as a visiting scientist at the San Diego Super Computer Centre, University of California, San Diego, USA and at the Centre for Secure Information Systems, George Mason University, Virginia, USA. He is a Senior Member of CSI and a member of ACM Sigmod and the IEEE Computer Society. His research interests include Access Control and Trust Negotiation algorithms, developing new measures for Association Rule mining and application-specific Data Modeling.