From coincidence to purposeful flow? Properties of transcendental information cascades.
Markus Luczak-Roesch
University of Southampton, Web and Internet Science Group
@mluczak | http://sociam.org
Community-level linguistic change
Project | initial 10% | most recent 10%
PH | transit, star, day, aph, look, one, planet, like, possibl, dip | day, transit, httparchive..., possibl, star, kid, dip, look, planet, like
SF | like, look, fish, sea, scallop, thing, imag, right, star, left | corallinealga, anemon, object, hermitcrab, bryozoan, stalkedtun, shrimp, left, cerianthid, sanddollar
NN | field, record, one, use, enter, get, work, can, specimen, button | like, field, record, date, name, can, click, look, get, label
Stable domain specific vocabulary
Emerging domain specific vocabulary
Stable problem/error reporting
Dominance of microposts and implicit coordination
[Figure: projects PH, SG, SW, NN, GZ, CC, PF, SF, AP, WS; labels "Vocabulary shift" and "Microposts"; annotated value 91%]
Luczak-Roesch, M., Tinati, R., Simperl, E., Van Kleek, M., Shadbolt, N., & Simpson, R. (2014). Why won't aliens talk to us? Content and community dynamics in online citizen science. Proceedings of the Eighth AAAI Conference on Weblogs and Social Media, ICWSM 2014, Ann Arbor, Michigan, USA, June 1-4, 2014.
A qualitative investigation of crowdsourced disaster response
• Haiti (Ushahidi, N=298)
  – requests for help from identified local source
• Congo (Ushahidi, N=102)
  – information about the situation but not who is responsible for this information
  – more non-local sources
• Ebola (Twitter, N=298)
  – comments
    • tasteless jokes
    • racist comments
    • concern that the crisis could spread and call to governments to close the borders
Boundaries of crowdsourced disaster response
• Wrong things go viral
• Crowdsourcing informativeness of social media information not synchronized with crises*
[Figure legend: negative, neutral, positive]
11 “When you tell a […] kid that is has got Ebola”
*Olteanu, A., Vieweg, S., & Castillo, C. (2015). What to Expect When the Unexpected Happens: Social Media Communications Across Crises. In Proc. of 18th ACM Computer Supported Cooperative Work and Social Computing (CSCW'15), (No. EPFL-CONF-203562).
We can observe situations in which online communication does not happen along explicit social ties (especially in critical situations when time to make decisions is scarce). Instead of talking explicitly with each other, people are broadcasting about the same event or topic.
Source: United Nations Development Programme, https://goo.gl/Z1uXdV, CC BY-NC-ND 2.0
“An informational cascade occurs when it is optimal for an individual, having observed the actions of those ahead of him, to follow the behavior of the preceding individual without regard to his own information.” [1]
[2]
[1] Bikhchandani, Sushil, David Hirshleifer, and Ivo Welch. "A theory of fads, fashion, custom, and cultural change as informational cascades." Journal of Political Economy (1992): 992-1026.
[2] Cheng, Justin, et al. "Can cascades be predicted?" Proceedings of the 23rd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2014.
Boundaries of context-rich approaches
Does the accumulated information propagation behaviour on the Web form giant purposeful processes?
Source: Michael Dales, https://goo.gl/IKXs4X, CC BY-NC 2.0
Discovering the algorithms of Social Machines
Socio-technical Computation
The computational capability embodied in cascades of information sharing activities on the Web that are not necessarily conditioned by system-specific or social network features but only time and inherent properties of pairs of resources.
Markus Luczak-Roesch, Ramine Tinati, Kieron O'Hara, and Nigel Shadbolt. 2015. Socio-technical Computation. In Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing (CSCW'15 Companion). ACM, New York, NY, USA, 139-142. http://doi.acm.org/10.1145/2685553.2698991
Content streams as automata [3]
[Figure: 2-state model (states HF, LF) and infinite-state model; axes: Time, Number of observed documents]
[3] Kleinberg, Jon. "Bursty and hierarchical structure in streams." Data Mining and Knowledge Discovery 7.4 (2003): 373-397.
Building transcendental information cascades
[…] only local understanding of its use but also an abstract global view. This lets us propose a new model that we call transcendental information cascades. Informed by Kleinberg's work on bursty structures in document streams [2], it regards time as the only ascertainable condition for relationships between any pairs of resources, meaning that we focus on coincidence of information sharing activities rather than socially-determined conditionality.
In [20] we presented the initial definition of a transcendental information cascade as a 4-tuple $TC = (V, E, R, F)$. This 4-tuple represents a directed network consisting of a set of nodes $V$ and edges $E$, derived when applying a set of matching functions $F$ to a set of resources $R = \{r_1, r_2, \ldots, r_m\}$, $r_i = (u_i, t_i, c_i)$, where every $u_i$ is a unique identifier of a resource $r_i$ that was shared at the time $t_i$ with the content $c_i$. Nodes in the network are those resources from $R$ that contain a set $I_i$ of one or multiple cascade identifiers. A cascade identifier is any unique informational pattern that is recognized by applying a matching function to the content or any other inherent properties of a resource (e.g. simple string matching algorithms to identify keywords in content). Formally, a matching function $f_k \in F$, $k \in \mathbb{N}$, $k \le n$ is defined as:
$$f_k(c_i) = \begin{cases} \{i_1, i_2, \ldots, i_x\} & \text{if } f_k \text{ matches patterns } \{i_1, i_2, \ldots, i_x\} \text{ in } c_i,\ x \in \mathbb{N} \\ \emptyset & \text{otherwise} \end{cases}$$
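As an illustration of the definition above, a matching function can be sketched as a plain function from content to a (possibly empty) set of identifiers. The two functions below are hypothetical examples, not the matching functions used in the paper; their names and regular expressions are assumptions made for this sketch.

```python
import re

# Assumed patterns for illustration only: a hashtag is "#" followed by word
# characters, a URI is anything starting with http:// or https://.
HASHTAG_PATTERN = re.compile(r"#\w+")
URI_PATTERN = re.compile(r"https?://\S+")


def match_hashtags(content):
    """Return the set of hashtags found in content, or the empty set."""
    return set(HASHTAG_PATTERN.findall(content))


def match_uris(content):
    """Return the set of URIs found in content, or the empty set."""
    return set(URI_PATTERN.findall(content))
```

Each function returns the empty set in the "otherwise" case, so a resource for which every matching function returns the empty set carries no cascade identifier.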
Nodes $V$ and edges $E$ are then given as follows:
$$V = \{v_1, v_2, \ldots, v_p\}, \quad v_y = (u_y, t_y, I_y),$$
$$E = \{e_1, e_2, \ldots, e_q\}, \quad e_z = (u_a, u_b, \Lambda_z)$$
with $I_i = \{i_1, i_2, \ldots, i_o\} = f_1(c_i) \cup f_2(c_i) \cup \ldots \cup f_n(c_i)$ being the result of the concatenation of all identifiers found by all matching functions (see footnote 2). An edge exists between any two nodes that share a unique subset of all the cascade identifiers that were found for them. Neither this subset nor any of its subsets is part of the identifiers found for any node that was created in the time period between the creation of the two linked nodes.
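$$\Lambda_z = \{\, i_r \mid i_r \in I_a \wedge i_r \in I_b,\ \forall i_r \rightarrow V' = \{ v_c \mid v_c = (u_c, t_c, I_c),\ i_r \in I_c \wedge t_a \le t_c \le t_b \} = \emptyset,\ v_c \in V,\ r \in \mathbb{N},\ r \le |I_b| \,\}$$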
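The construction defined above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: resources are `(u, t, c)` tuples with distinct timestamps, "between" is read as strictly between the two linked nodes, and `match_hashtags` and `build_cascade` are hypothetical names introduced here. Linking each identifier occurrence to the most recent earlier node carrying the same identifier guarantees that no intermediate node contains it.

```python
import re


def match_hashtags(content):
    """Hypothetical matching function: the set of hashtags in content."""
    return set(re.findall(r"#\w+", content))


def build_cascade(resources, matching_functions):
    """Return (nodes, edges) for resources given as (u, t, c) tuples.

    nodes is a list of (u, t, I) tuples; edges maps (u_a, u_b) to the label
    set of identifiers shared by both nodes and by no node in between.
    """
    nodes = []
    edges = {}
    last_seen = {}  # identifier -> u of the most recent node containing it
    for u, t, c in sorted(resources, key=lambda r: r[1]):
        identifiers = set()
        for f in matching_functions:
            identifiers |= f(c)  # union over all matching functions
        if not identifiers:
            continue  # resources without identifiers are not nodes
        nodes.append((u, t, identifiers))
        for i in identifiers:
            if i in last_seen:
                edges.setdefault((last_seen[i], u), set()).add(i)
            last_seen[i] = u
    return nodes, edges
```

For example, the resources `(1, 1, "#A")`, `(2, 2, "#A #B")`, `(3, 3, "#A")`, `(4, 4, "#A #B")` yield an edge from node 2 to node 4 labelled only with `#B`, because `#A` reappears in node 3 in between.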
Footnote 2: Please note that [20] contains an unintentionally malformed equation for this as the wrong symbol was used to refer to the concatenation of the matching functions.
A node that contains a cascade identifier that was not detected for any other nodes before is called the identifier root. Besides this, we call a node without any incoming edges a network root and a node that has no outgoing edges a stub. Our cascade model clearly yields different outputs depending on the data to hand (e.g. determined by the extent of the Web crawl), and the matching algorithms determining which cascade identifiers will be spotted (e.g. reuse of hashtags, URIs, quotes, images, or maybe exploiting wider semantics or sentiment) as depicted in Figure ??.
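The three node roles named in this passage (identifier root, network root, stub) can be read off a cascade once nodes and edges are known. The following is a rough sketch; the data layout (nodes as `(u, t, I)` tuples, edges as a mapping keyed by `(u_a, u_b)`) and the function name `classify_nodes` are assumptions made for illustration.

```python
def classify_nodes(nodes, edges):
    """Return (identifier_roots, network_roots, stubs) as lists of node ids.

    nodes: iterable of (u, t, identifiers) tuples.
    edges: mapping or iterable whose items are (u_a, u_b) pairs.
    """
    has_incoming = {b for _, b in edges}
    has_outgoing = {a for a, _ in edges}
    seen = set()  # identifiers detected in any earlier node
    identifier_roots, network_roots, stubs = [], [], []
    for u, t, identifiers in sorted(nodes, key=lambda n: n[1]):
        if identifiers - seen:  # at least one identifier appears first here
            identifier_roots.append(u)
        seen |= identifiers
        if u not in has_incoming:
            network_roots.append(u)
        if u not in has_outgoing:
            stubs.append(u)
    return identifier_roots, network_roots, stubs
```

A node can hold several roles at once: an isolated node, for instance, is a network root and a stub, and also an identifier root if its identifier has not occurred before.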
Fig. 1. Depending on the applied matching functions, different transcendental information cascade representations can be generated for the same input data.
A fictitious example of a transcendental cascade based on our model is shown in Figure 2. Consider a system that features hashtags as an established form of identifying content patterns. The visualisation uses the following approach to represent distinct identifiers and time: Nodes are chronologically ordered alongside the horizontal dimension from left (the oldest node) to right (the most recent node); additionally nodes are ordered alongside the vertical dimension depending on the set of identifiers present in a node (each unique set is assigned to a distinct level). Consequently, the visualisation represents the content creation sequence (“#A”) - (“#A#B”) - (“#A”) - (“#A”) - (“#A#B#C”) - (“#C”) - (“#A”) - (“#B#D”) - (“#A”).
Fig. 2. Example of a cascade that emerges along five different identifiers. #A, #B, #A#B#C, #B#D and #C are fictitious hashtags (or hashtag combinations respectively) treated as the identifying content patterns.
In order to understand how edges are labelled we highlight the sub-graph involving the nodes 2, 3, 4, and 5. Conforming to our cascade model an edge exists between nodes 2 and 3 […]
Capturing the unintended action resulting from information sharing activities of human collectives.
[Figure labels: Document stream (time axis t); Transcendental Information Cascade; Temporal text/data mining]
Figure 5: Decoding the collection
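where $\delta(d_{t'j}, i) = 1$ if word $d_{t'j}$ is labeled as theme $i$; otherwise $\delta(d_{t'j}, i) = 0$. $W$ is the size of the sliding window in terms of time points.
$$NStrength(i, t) = \frac{AStrength(i, t)}{\sum_{j=1}^{k} AStrength(j, t)} = \frac{\sum_{t' \in [t - \frac{W}{2},\, t + \frac{W}{2}]} \sum_{j=1}^{|d_{t'}|} \delta(d_{t'j}, i)}{\sum_{t' \in [t - \frac{W}{2},\, t + \frac{W}{2}]} |d_{t'}|}$$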
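The normalized strength above can be computed directly from word-level theme labels. The sketch below assumes a hypothetical input layout, a mapping from time point to the list of theme labels of the words observed at that time point, and assumes that every word carries exactly one theme label, so that the absolute strengths of all themes sum to the number of words in the window.

```python
def normalized_strength(labels_by_time, theme, t, window):
    """Share of words labelled with theme among all words in the window.

    labels_by_time: dict mapping a numeric time point to a list of theme
    labels, one per word (assumed layout). window is W, in time points.
    """
    half = window / 2
    in_window = [
        labels for tp, labels in labels_by_time.items()
        if t - half <= tp <= t + half
    ]
    total_words = sum(len(labels) for labels in in_window)
    if total_words == 0:
        return 0.0  # no words observed in the window
    hits = sum(1 for labels in in_window for label in labels if label == theme)
    return hits / total_words
```

Evaluating the function for a fixed theme over successive values of `t` gives the curve that the next paragraph refers to as the life cycle of the theme.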
The life cycle of each theme can then be modeled as the variation of the theme strengths over time.
The analysis of theme life cycles thus involves the following four steps: (1) Construct an HMM to model how themes shift between each other in the collection. (2) Estimate the unknown parameters of the HMM using the whole stream collection as observed example sequence. (3) Decode the collection and label each word with the hidden theme model from which it is generated. (4) For each trans-collection theme, analyze when it starts, when it terminates, and how it varies over time.
5. EXPERIMENTS AND RESULTS
5.1 Data Preparation
Two data sets are constructed to evaluate the proposed ETP discovery methods. The first, tsunami news data, consists of news articles about the event of Asia Tsunami dated Dec. 19 2004 to Feb. 8 2005. We downloaded 7468 news articles from 10 selected sources, with the keyword query “tsunami”. As shown in Table 1, three of the sources are in Asia, two of them are in Europe and the rest are in the U.S.
News Source     | Nation | News Source      | Nation
BBC             | UK     | Times of India   | India
CNN             | US     | VOA              | US
Economics Times | India  | Washington Post  | US
New York Times  | US     | Washington Times | US
Reuters         | UK     | Xinhua News      | China
Table 1: News sources of Asia Tsunami data set
The second data set consists of the abstracts in KDD conference proceedings from 1999 to 2004. All the abstracts were extracted from the full-text pdf files downloaded from the ACM digital library (footnote 2: http://www.acm.org/dl). Two articles were excluded because they were not recognizable by the pdf2text software in Linux, giving us a total of 496 abstracts. The basic statistics of the two data sets are shown in Table 2. We intentionally did not perform stemming or stop word pruning in order to test the robustness of our algorithms.
Data Set     | # of docs | AvgLength | Time range
Asia Tsunami | 7468      | 505.24    | 12/19/04 - 02/08/05
KDD Abs.     | 496       | 169.50    | 1999-2004
Table 2: Basic information of data sets
On each data set, two experiments are designed: (1) Partition the collection into time intervals, discover the theme evolution graph and identify theme evolution threads. (2) Discover trans-collection themes and analyze their life cycles. The results are discussed below.
5.2 Experiments on Asia Tsunami
Since news reports on the same topic may appear earlier in one source but later in another (i.e., “report delay”), partitioning news articles into overlapping, as opposed to non-overlapping, subcollections seems to be more reasonable. We thus partition our news data into 5 time intervals, each of which spans about two weeks and is half overlapping with the previous one. We use the mixture model discussed in Section 3 to extract the most salient themes in each time interval. We set the background parameter $\lambda_B = 0.95$ and the number of themes in each time interval to be 6. The variation of $\lambda_B$ is discussed later. Table 3 shows the top 10 words with the highest probabilities in each theme span. We see that most of these themes suggest meaningful subtopics in the context of the Asia tsunami event.
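The half-overlapping partition described above can be sketched as follows. The excerpt does not state how the interval boundaries were chosen, so equal-width windows are an assumption, as are the function names; with this choice an article can belong to two adjacent intervals.

```python
from datetime import datetime, timedelta


def half_overlapping_windows(start, end, n):
    """Split [start, end] into n equal windows, each starting at the midpoint
    of the previous one (an assumed scheme, not taken from the paper)."""
    # n such windows of width w cover w * (n + 1) / 2 in total.
    width = (end - start) * 2 / (n + 1)
    step = width / 2
    return [(start + k * step, start + k * step + width) for k in range(n)]


def windows_containing(timestamp, windows):
    """Indices of all windows that contain the given timestamp."""
    return [k for k, (lo, hi) in enumerate(windows) if lo <= timestamp <= hi]
```

Applied to the date range of the tsunami data (Dec. 19 2004 to Feb. 8 2005, 51 days) with `n = 5`, this scheme gives windows of 17 days each, which is in line with the "about two weeks" stated in the text.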
Figure 6: Theme evolution graph for Asia Tsunami
With these theme spans, we use KL-divergence to further identify evolutionary transitions. Figure 6 shows a theme evolution graph discovered from Asia Tsunami data when the threshold for evolution distance is set to $\xi = 12$. From Figure 6, we can see several interesting evolution threads which are annotated with symbols.
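To make the thresholding step concrete, the sketch below treats each theme as a word probability distribution and reports a transition whenever the KL-divergence between a later and an earlier theme stays below the threshold. The direction of the divergence, the flooring of missing probabilities at `eps`, and all names are assumptions of this sketch; the excerpt only states that KL-divergence and a threshold $\xi$ are used.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """D(p || q) for word distributions given as dicts word -> probability.

    Words missing from q are floored at eps (a crude smoothing assumption)
    so that the divergence stays finite.
    """
    total = 0.0
    for word, pw in p.items():
        if pw > 0.0:
            total += pw * math.log(pw / max(q.get(word, 0.0), eps))
    return total


def evolutionary_transitions(earlier, later, xi):
    """Pairs (a, b) of theme names whose distance D(later_b || earlier_a)
    is below the threshold xi. earlier and later map names to distributions."""
    return [
        (a, b)
        for a, theme_a in earlier.items()
        for b, theme_b in later.items()
        if kl_divergence(theme_b, theme_a, eps=1e-9) < xi
    ]
```

With natural logarithms and `eps = 1e-9`, two themes with disjoint vocabularies are about 20.7 apart, so a threshold of 12 separates them from themes that share most of their probability mass.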
The thread labeled with a may be about warning systems for tsunami. It is interesting to see that the nation covered by the thread seems to have evolved from the U.S. in period $l_1$, to China in $l_2$, and then to Japan in $l_3$. In thread b, themes 3, 4, and 5 in period $l_1$ indicate the aids and financial support from UN, from local area, and special aids for children, respectively. They all show an evolutionary transition to theme 2 (donation from UK) and theme 3 (aid from […]
[4] Subašić, I., & Berendt, B. (2013). Story graphs: Tracking document set evolution using dynamic graphs. Intelligent Data Analysis, 17(1), 125-147.
[5] Mei, Q., & Zhai, C. (2005, August). Discovering evolutionary theme patterns from text: an exploration of temporal text mining. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (pp. 198-207). ACM.
[5]
“The key notion of TTM is burstiness – sudden increases in frequency of text fragments, and all TTM methods aim to model burstiness.” [4]
[Figure: matching functions F1 … Fn; cascades C11, C21, C22, C23; time points t0 to t8; edge labels given as time differences such as t6 - t0, t2 - t1, t8 - t2]
There is more than one “reality”
Analyzing low-level properties of the multiple states of a system that exist at the same time
Fig. 4. Overview of the results of the cascade comparison. Cascade size distribution and Wiener index are plotted on a log-log scale; identifier entropy is plotted with a log scale on the y-axis.
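For readers unfamiliar with the Wiener index named in the caption: it is the sum of shortest-path distances over all pairs of nodes in a graph. The sketch below computes it with breadth-first search, treating cascade edges as undirected and ignoring unreachable pairs; both choices, and the function name, are assumptions of this sketch rather than details taken from the paper.

```python
from collections import deque


def wiener_index(edges):
    """Sum of shortest-path lengths over all unordered node pairs.

    edges: iterable of (a, b) pairs, treated as undirected. Pairs of nodes
    that are not connected contribute nothing.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    total = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
    return total // 2  # every pair was counted from both endpoints
```

For four nodes, a chain has a Wiener index of 10 while a star has 9, which illustrates why longer chains score higher than broadcast-like shapes of the same size.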
[…] contain one or few identifiers equally distributed. Very large hashtag cascades in contrast become very fuzzy, meaning that even though many identifiers are covered (indicating much information) the informativeness of the entire cascade is very shallow. The other three entropy distribution profiles instead show that there is a more even distribution of information in non-trivial cascades with multiple identifiers, with the largest cascades still falling into the same category as the largest hashtag cascade.
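The excerpt does not spell out how identifier entropy is computed. One plausible reading, offered here only as an assumption, is the Shannon entropy of the identifier frequency distribution within a cascade; the function name and input layout below are likewise hypothetical.

```python
import math
from collections import Counter


def identifier_entropy(node_identifier_sets):
    """Shannon entropy (in bits) of identifier occurrences in one cascade.

    node_identifier_sets: iterable of identifier sets, one per node.
    """
    counts = Counter(i for ids in node_identifier_sets for i in ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1 / p) over the relative frequency p of each identifier.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

Under this reading a single-identifier cascade has entropy 0, and a cascade whose occurrences are spread evenly over two identifiers has entropy 1 bit.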
VI. DISCUSSION
In this section we reflect on the results of our study against the original questions asked, and then consider how our content-centric approach to cascade construction provides an alternative way to consider information flows on the Web.
A. Summary of Experiments
Our experiments show that it is possible to generate structurally different cascades from a single source dataset, depending on the pattern matching used. By exploring cascade sub-structures within each of the four resulting cascade datasets, we found that in comparison to cascades that use actual object identifiers (KID, APH, URIs), cascades which are based on hashtags tend to be either trivial (single identifier cascades) or consist of multiple roots (the origin of the cascade) that are merging and diverging so that they form one massive connected component.
For instance, in A1 cascades, there may be two hashtags, #A and #B, which originate in different, independent posts, by different users. However, over the course of the evolution of the cascade, these hashtags merge, most likely as a consequence of a user bringing them together in a single post. These hashtags then may become part of several merges and diverges, which can end up located within a single stub. As a consequence of this, information can be perceived as lost, as the hashtags do not remain present in a distinct cascade, but are subsumed by another one. This is reflected in Figure 4, where a large proportion of the node types are those that are either merging or diverging information.
In comparison to this, the results of cascade types A2 and A3 reveal cascades which are less structurally viral (a lower Wiener index), thus tending to form shorter chains of single or few identifier cascades. As a consequence, information is rarely lost or gained as cascades do not merge often. It is more likely that when a branch node is observed (for […]
Fig. 4. Overview of the results of the cascade comparison. Cascade size distribution and wiener index are plotted on a log-log scale; identifier entropy isplotted with a log scale on the y-axis.
contain one or few identifiers equally distributed. Very largehashtag cascades in contrast become very fuzzy, meaning thateven though loads of identifiers are covered (indicating manyinformation) the informativeness of the entire cascade is veryshallow. The other three entropy distribution profiles insteadshow that there is a more even distribution of information innon-trivial cascades with multiple identifiers, with the largestcascades still falling into the same category as the largesthashtag cascade.
VI. DISCUSSION
In this section we reflect the results of our study againstthe original questions asked, and then consider how ourcontent-centric approach to cascade construction provides analternative way to consider information flows on the Web.
A. Summary of Experiments
Our experiments show that it is possible to generate struc-turally different cascades from a single source dataset, depend-ing on the pattern matching used. By exploring cascade sub-structures within each of the four resulting cascade datasets,we found that in comparison to cascades that use actual object
identifiers (KID, APH, URIs), cascades which are based onhashtags tend to be either trivial (single identifier cascades)or consist of multiple roots (the origin of the cascade) thatare merging and diverging so that they form one massiveconnected component.
For instance, in A1 cascades, there may be two hashtags,#A and #B, which originate in different, independent posts, bydifferent users. However, over the course of the evolution of thecascade, these hashtags merge, most likely as a consequence ofa user bringing them together in a single post. These hashtagsthen may become part of several merges and diverges, whichcan end up located within a single stub. As a consequence ofthis, information can be perceived as lost, as they do not remainpresent in a distinct cascade, but are subsumed by anotherone. This is reflected in Figure 4, where a large proportion ofthe node types are those that are either merging or diverginginformation.
In comparison to this, the results of cascade types A2 andA3 reveal cascades which are less structurally viral (a lowerwiener index), thus tending to form shorter chains of singleor few identifier cascades. As a consequence, informationis rarely lost or gained as cascades do not merge often. Itis more likely that when a branch node is observed (for
Fig. 4. Overview of the results of the cascade comparison. Cascade size distribution and wiener index are plotted on a log-log scale; identifier entropy isplotted with a log scale on the y-axis.
contain one or few identifiers equally distributed. Very largehashtag cascades in contrast become very fuzzy, meaning thateven though loads of identifiers are covered (indicating manyinformation) the informativeness of the entire cascade is veryshallow. The other three entropy distribution profiles insteadshow that there is a more even distribution of information innon-trivial cascades with multiple identifiers, with the largestcascades still falling into the same category as the largesthashtag cascade.
VI. DISCUSSION
In this section we reflect the results of our study againstthe original questions asked, and then consider how ourcontent-centric approach to cascade construction provides analternative way to consider information flows on the Web.
A. Summary of Experiments
Our experiments show that it is possible to generate structurally different cascades from a single source dataset, depending on the pattern matching used. By exploring cascade substructures within each of the four resulting cascade datasets, we found that in comparison to cascades that use actual object identifiers (KID, APH, URIs), cascades which are based on hashtags tend to be either trivial (single-identifier cascades) or consist of multiple roots (the origins of the cascade) that merge and diverge so that they form one massive connected component.
For instance, in A1 cascades, there may be two hashtags, #A and #B, which originate in different, independent posts by different users. However, over the course of the evolution of the cascade, these hashtags merge, most likely as a consequence of a user bringing them together in a single post. These hashtags may then become part of several merges and diverges, which can end up located within a single stub. As a consequence, information can be perceived as lost, as it does not remain present in a distinct cascade but is subsumed by another one. This is reflected in Figure 4, where a large proportion of the node types are those that either merge or diverge information.
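The merging described above can be illustrated with a small sketch. It assumes, hypothetically, that two posts belong to the same component whenever they share a hashtag; the post data and the function name are invented for illustration and are not the authors' cascade construction procedure.

```python
def cascade_components(posts):
    """Group posts into components linked by shared identifiers.

    `posts` maps a post id to the set of hashtags it contains.
    Uses a simple union-find over post ids.
    """
    parent = {post: post for post in posts}

    def find(post):
        while parent[post] != post:
            parent[post] = parent[parent[post]]  # path halving
            post = parent[post]
        return post

    first_seen = {}  # hashtag -> first post that used it
    for post, tags in posts.items():
        for tag in tags:
            if tag in first_seen:
                # Link this post to the component that already holds the tag.
                parent[find(post)] = find(first_seen[tag])
            else:
                first_seen[tag] = post

    groups = {}
    for post in posts:
        groups.setdefault(find(post), set()).add(post)
    return sorted(groups.values(), key=min)


# #A and #B start in independent posts; p3 later uses both.
before = {"p1": {"#A"}, "p2": {"#B"}}
after = dict(before, p3={"#A", "#B"})

print(len(cascade_components(before)))  # 2
print(len(cascade_components(after)))   # 1
```

A single post carrying both hashtags is enough to collapse the two previously distinct components into one, which is the sense in which the information of one cascade is subsumed by another.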
In comparison, the results for cascade types A2 and A3 reveal cascades that are less structurally viral (a lower Wiener index), thus tending to form shorter chains of single- or few-identifier cascades. As a consequence, information is rarely lost or gained, as cascades do not merge often. It is more likely that when a branch node is observed (for
Analyzing low-level properties of the multiple states of a system that exist at the same time
[Figure: cascade motifs (single node motifs, long uniform paths, short uniform paths, long non-uniform paths) and identifier entropy, compared for Tags, URIs, and KID & APH cascades.]
varying profiles of increasing randomness with growing cascade size
Formalising the multiple possible representations of a system at any time and their relationships.
Not all representing purposeful action but reflecting useful informational properties.
[Diagram labels: t; F1 … Fn; C11, C21, C22, C23]
By focusing only on the coincidence of information occurrence, we can capture and analyse emergent collective action across system boundaries and independent from social network contexts.
Markus Luczak-Roesch
@mluczak | http://markus-luczak.de
Source: Giulia Forsythe, http://goo.gl/6hpZ0W, CC BY-NC-SA 2.0
References
• Markus Luczak-Roesch, Ramine Tinati, Kieron O'Hara, and Nigel Shadbolt. 2015. Socio-technical Computation. In Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing (CSCW'15 Companion). ACM, New York, NY, USA, 139-142. http://doi.acm.org/10.1145/2685553.2698991
• Markus Luczak-Roesch, Ramine Tinati, and Nigel Shadbolt. 2015. When Resources Collide: Towards a Theory of Coincidence in Information Spaces. To appear in WWW’15 Companion, May 18–22, 2015, Florence, Italy. http://dx.doi.org/10.1145/2740908.2743973