DM-MEETING Bijaya Adhikari 11.11.2015

OUTLINE
From Micro to Macro: Uncovering and Predicting Information Cascading Process with Behavioral Dynamics (Yu et al.)
Graph Summarization with Quality Guarantees (Riondato et al.)

FROM MICRO TO MACRO: UNCOVERING AND PREDICTING INFORMATION CASCADING PROCESS WITH BEHAVIORAL DYNAMICS

MOTIVATION Can we predict cascades in a network? Are they predictable at all? If so, given the early stage of an information cascade, can we predict its cumulative size at any later time?

KEY IDEA When a node is involved in a cascade, so are some of its offspring. If the dynamics of these node-level sub-cascades can be modelled accurately, the whole cascade process can be predicted as an additive function of these local sub-cascades.

Approach: examine the micro mechanism of a cascade by decomposing it into multiple local (one-hop) sub-cascades, and predict the cascading process from them.
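The additive decomposition can be illustrated on an observed cascade. This is a minimal sketch, assuming the cascade is recorded as a who-infected-whom tree with infection times; the function names and toy data are hypothetical. Because every non-root node belongs to exactly one one-hop sub-cascade (that of its infector), the cascade size at time t equals the number of roots plus the sum of the sub-cascade sizes. Prediction would replace these observed counts with modeled sub-cascade dynamics, which the sketch does not attempt.

```python
from collections import defaultdict

def subcascade_sizes(parent, infect_time, t):
    """Size of each node's one-hop sub-cascade (children infected by time t)."""
    sizes = defaultdict(int)
    for v, p in parent.items():
        if p is not None and infect_time[v] <= t:
            sizes[p] += 1
    return sizes

def cascade_size(parent, infect_time, t):
    """Cascade size at time t: infected roots plus the sum of all one-hop sub-cascades."""
    roots = sum(1 for v, p in parent.items() if p is None and infect_time[v] <= t)
    return roots + sum(subcascade_sizes(parent, infect_time, t).values())

# Toy cascade: parent[v] is the node that infected v (None for the source).
parent = {"a": None, "b": "a", "c": "a", "d": "b", "e": "c"}
infect_time = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 5}
```

Here node "a" owns a sub-cascade of size 2 and "b" and "c" one each, so the full cascade of five nodes is recovered as 1 + 2 + 1 + 1.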

ILLUSTRATION

EXAMPLE

Comparison of predictions made from observations at various times against the true cascade (red)

BEHAVIORAL DYNAMICS The behavioral dynamics of a node capture the cumulative number of its infected descendants after the node itself becomes infected.

Because the cumulative size varies from cascade to cascade, a survival rate is used instead of raw counts.
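A minimal sketch of the survival rate for one node, assuming it is defined as the fraction of the node's final one-hop sub-cascade not yet infected at delay t (this definition and the function name are assumptions, not taken from the slides). Dividing by the final size makes sub-cascades of very different sizes comparable.

```python
def survival_rate(delays, t):
    """Fraction of a node's final sub-cascade not yet infected at delay t.

    delays: infection delays (child infection time minus the node's own
    infection time) for every node in the one-hop sub-cascade, so
    len(delays) is the final sub-cascade size N.
    """
    n_final = len(delays)
    if n_final == 0:
        raise ValueError("empty sub-cascade: survival rate is undefined")
    infected_by_t = sum(1 for d in delays if d <= t)
    return 1.0 - infected_by_t / n_final

# Two hypothetical sub-cascades with the same timing profile but 25x different sizes.
small = [1.0, 2.0, 4.0, 8.0]
large = [1.0] * 25 + [2.0] * 25 + [4.0] * 25 + [8.0] * 25
```

Both toy sub-cascades have a survival rate of 0.5 at delay 2.0, even though their raw cumulative counts (2 versus 50) differ greatly.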

PARAMETERIZING BEHAVIORAL DYNAMICS

The Kolmogorov-Smirnov (KS) statistic shows that, among the candidate distributions tested, the Weibull distribution is the most adequate for parameterizing behavioral dynamics.
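The KS comparison can be sketched by computing the one-sample Kolmogorov-Smirnov statistic of observed delays against a Weibull CDF. This is an illustration only: the delays and parameter values are hypothetical, and the candidate distributions and fitting procedure behind the slide's result are not reproduced. A smaller D indicates a closer fit.

```python
import math

def weibull_cdf(t, k, lam):
    """Weibull CDF F(t) = 1 - exp(-(t/lam)^k) for t > 0, and 0 otherwise."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / lam) ** k))

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup_t |F_n(t) - F(t)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical delays: compare a Weibull (k = 1.5) with an exponential (Weibull with k = 1).
delays = [0.4, 0.9, 1.3, 1.8, 2.2, 2.9, 3.5, 4.6]
d_weibull = ks_statistic(delays, lambda t: weibull_cdf(t, 1.5, 2.0))
d_exponential = ks_statistic(delays, lambda t: weibull_cdf(t, 1.0, 2.0))
```

In practice each candidate would first be fitted to the data before its D is compared; here the parameters are fixed by hand to keep the example short.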

[Figure: Weibull PDF, survival, and hazard functions. Source: https://wikimedia.org]
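For reference, the three panels correspond to the standard Weibull functions with shape k > 0 and scale λ > 0 (the slides' own notation may differ):

```latex
f(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1} e^{-(t/\lambda)^{k}}, \qquad
S(t) = 1 - F(t) = e^{-(t/\lambda)^{k}}, \qquad
h(t) = \frac{f(t)}{S(t)} = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}, \qquad t \ge 0
```

With k < 1 the hazard decreases over time, k = 1 reduces to the exponential distribution with a constant hazard, and k > 1 gives an increasing hazard.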

COVARIATES OF BEHAVIORAL FEATURES Some nodes have few or no sub-cascades (as in Twitter-like data), so the parameters learned from their data are difficult to interpret.

WHY CAN WE INFER CASCADES FROM EARLY STAGES?

Minor Dominance and Early Stage Dominance

FORMAL STATEMENT

SURVIVAL ANALYSIS

NETWORKED WEIBULL REGRESSION (NEWER) MODEL

Fit a Weibull distribution to the survival times of node i.
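As a starting point, a per-node Weibull fit can be sketched with plain maximum likelihood. This minimal sketch assumes uncensored, strictly positive survival times and ignores the covariates and network regularization of NEWER; `fit_weibull` is a hypothetical name. It solves the profile-likelihood equation for the shape k by bisection and then sets the scale in closed form.

```python
import math
import random

def fit_weibull(times, lo=1e-3, hi=100.0, tol=1e-10):
    """Maximum-likelihood Weibull fit (shape k, scale lam) for uncensored times."""
    if not times or min(times) <= 0:
        raise ValueError("survival times must be positive")
    m = max(times)
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Profile-likelihood equation for k; it increases with k and its root is the MLE.
        w = [(t / m) ** k for t in times]  # rescale by the maximum to avoid overflow
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    if score(lo) > 0 or score(hi) < 0:
        raise ValueError("shape parameter lies outside the search bracket")
    while hi - lo > tol:  # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    # Closed-form scale given k: lam^k = mean(t^k), again rescaled by m.
    lam = m * (sum((t / m) ** k for t in times) / len(times)) ** (1.0 / k)
    return k, lam

# Synthetic check: random.weibullvariate(scale, shape).
random.seed(0)
survival_times = [random.weibullvariate(2.0, 1.5) for _ in range(5000)]
k_hat, lam_hat = fit_weibull(survival_times)
```

For nodes with very few observed survival times such a fit is unstable, which is the motivation for sharing information through covariates and regularization.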

REGULARIZED NEGATIVE LOG-LIKELIHOOD (NLL) FOR NEWER

The objective F is minimized by coordinate descent.
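The structure of a coordinate-descent fit can be sketched on a single node. The objective here is an assumption, not the NEWER objective from the slides: a Weibull NLL plus a hypothetical quadratic penalty pulling (k, log λ) toward prior values (k0, b0), standing in for whatever regularization links nodes. Each step minimizes over one coordinate with the other held fixed; both one-dimensional subproblems are convex, so golden-section search applies.

```python
import math
import random

def weibull_nll(log_times, k, b):
    """Weibull negative log-likelihood with shape k and log-scale b = log(lam)."""
    n = len(log_times)
    return (-n * math.log(k) + n * k * b - (k - 1.0) * sum(log_times)
            + sum(math.exp(k * (lt - b)) for lt in log_times))

def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:        # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def fit_regularized(times, k0, b0, gamma, sweeps=20):
    """Coordinate descent on F(k, b) = NLL(k, b) + gamma * ((k - k0)^2 + (b - b0)^2)."""
    log_times = [math.log(t) for t in times]

    def objective(k, b):
        return weibull_nll(log_times, k, b) + gamma * ((k - k0) ** 2 + (b - b0) ** 2)

    k, b = k0, b0
    history = [objective(k, b)]
    for _ in range(sweeps):
        # Shape step: minimize over k with b fixed; accept only if F decreases.
        k_new = golden_min(lambda x: objective(x, b), 0.05, 20.0)
        if objective(k_new, b) < objective(k, b):
            k = k_new
        # Log-scale step: minimize over b with k fixed; accept only if F decreases.
        b_new = golden_min(lambda y: objective(k, y), -5.0, 5.0)
        if objective(k, b_new) < objective(k, b):
            b = b_new
        history.append(objective(k, b))
    return k, math.exp(b), history

random.seed(1)
sample = [random.weibullvariate(2.0, 1.5) for _ in range(500)]
k_hat, lam_hat, history = fit_regularized(sample, k0=1.0, b0=0.0, gamma=0.0)
```

Accepting a step only when it lowers F keeps the objective history non-increasing; with gamma = 0 the fit approaches the plain MLE, and a large gamma pins the parameters near the prior.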

EFFICIENT CASCADE PREDICTION

SAMPLING MODEL The cascade size is estimated dynamically so that its changes can be monitored over time.

The sub-cascade generated by a node is zero if no other node is involved. The temporal size counter and the final death rate do not change, but the death rate increases over time.

This causes a relative error rate of

Therefore, the cascade size can be estimated dynamically within a bounded relative error.

EXPERIMENTS: CASCADE SIZE PREDICTION

EXPERIMENTS: OUTBREAK TIME PREDICTION

GRAPH SUMMARIZATION WITH QUALITY GUARANTEES

MOTIVATION As graphs grow, analyzing, visualizing, and mining them becomes computationally challenging.

Large networks may not fit in main memory, and the resulting disk access slows computation even further.

Can we find a concise, lossy representation of a large graph that fits into main memory?

DEFINITION Given a graph G = (V, E) and an integer k, a k-summary S of G is a complete, weighted, undirected graph whose k vertices correspond to the sets V_1, ..., V_k of a partition of V.

The vertices of S are called supernodes, and the edges between them are called superedges.

Each superedge (V_i, V_j) is weighted by the density of edges between V_i and V_j, computed from A_G, the adjacency matrix of the original graph.

DEFINITION Density matrix

The density matrix can be lifted to an n × n matrix whose (u, v) entry is the density of the superedge between s(u) and s(v), where s(v) denotes the supernode of S that contains vertex v of the original graph.
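The k-summary construction can be sketched directly. The density formula is an assumption here, since it is not preserved on the slide: the density of superedge (V_i, V_j) is taken as the average entry of A_G over the V_i × V_j block, which for i = j also averages over the zero diagonal. Function names and the toy graph are hypothetical.

```python
def density_matrix(adj, parts):
    """k x k superedge densities: average adjacency entry over each V_i x V_j block."""
    k = len(parts)
    d = [[0.0] * k for _ in range(k)]
    for i, vi in enumerate(parts):
        for j, vj in enumerate(parts):
            total = sum(adj[u][v] for u in vi for v in vj)
            d[i][j] = total / (len(vi) * len(vj))
    return d

def lift(d, parts, n):
    """n x n lifted matrix whose (u, v) entry is the density of (s(u), s(v))."""
    s = [None] * n  # s[u] = index of the supernode containing vertex u
    for i, vi in enumerate(parts):
        for u in vi:
            s[u] = i
    return [[d[s[u]][s[v]] for v in range(n)] for u in range(n)]

# Toy path graph 0-1-2-3 summarized with supernodes {0, 1} and {2, 3}.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
parts = [[0, 1], [2, 3]]
d = density_matrix(adj, parts)
lifted = lift(d, parts, len(adj))
```

Only the k × k matrix d and the vertex-to-supernode map need to be stored; the n × n lifted matrix is materialized here just to make the definition concrete.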

EXAMPLE

PROBLEM DEFINITION

ℓp RECONSTRUCTION ERROR

THE BEST MATRIX FOR A GIVEN PARTITION Given a k-partition P = {S_1, ..., S_k} of V, an n × n matrix M is P-constant if the S_i × S_j submatrix of M is constant for all i, j ∈ {1, ..., k}.

Finding a P-constant matrix that represents the graph with guaranteed quality is shown to reduce to the k-means problem under the ℓ2 metric (and to k-median under the ℓ1 metric).
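For a fixed partition, the best P-constant matrix has a simple form: under the ℓ2 error each constant block should be the block mean, and under the ℓ1 error a block median. The sketch below illustrates this on a toy graph with an entrywise ℓp error; names are hypothetical, and the search over partitions (the k-means or k-median step) is not shown.

```python
import statistics

def best_p_constant(adj, parts, norm="l2"):
    """Best P-constant approximation of adj: block means for l2, block medians for l1."""
    n = len(adj)
    s = [None] * n  # s[u] = index of the part containing vertex u
    for i, vi in enumerate(parts):
        for u in vi:
            s[u] = i
    stat = statistics.fmean if norm == "l2" else statistics.median
    value = {}
    for i, vi in enumerate(parts):
        for j, vj in enumerate(parts):
            value[i, j] = stat([adj[u][v] for u in vi for v in vj])
    return [[value[s[u], s[v]] for v in range(n)] for u in range(n)]

def lp_error(a, m, p):
    """Entrywise l_p reconstruction error (sum |a - m|^p)^(1/p)."""
    return sum(abs(x - y) ** p for ra, rm in zip(a, m) for x, y in zip(ra, rm)) ** (1.0 / p)

# Toy path graph 0-1-2-3 with parts {0, 1} and {2, 3}.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
parts = [[0, 1], [2, 3]]
m_l2 = best_p_constant(adj, parts, "l2")
m_l1 = best_p_constant(adj, parts, "l1")
```

Each matrix wins under its own norm: the block-mean matrix has the smaller ℓ2 error and the block-median matrix the smaller ℓ1 error.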

EXPERIMENTS: RECONSTRUCTION ERROR

EXPERIMENTS: SUMMARIZATION
