Mining personal media thresholds for opinion dynamics and social influence
Alex Meandzija

Defining the Problem

Dataset:
I is a set of discrete items (i)
T is a set of transactions (t) such that t ⊆ I

Question: What sets of items are frequently found together in the transactions?

Transaction ID   Items
1                ABC
2                DE
3                AB
4                CDE

Example Applications

• Market Basket Analysis: Determining frequently co-purchased products can inform store layout.

• Survey Data: Discrete data from surveys can be mined for trends and profiles.

• Website Logs: Pages frequently visited in the same session by users can be hyperlinked to each other.

Support Measure

Support is the frequency at which an item or item set X is found in the transactions T, and is defined as:

Supp(X, T) = |{t ∈ T : X ⊆ t}| / |T|

Example: Supp(A) = .5, Supp(C) = .5, Supp(AC) = .25

Transaction ID   Items
1                ABC
2                DE
3                AB
4                BCD
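
As a minimal sketch, the support values in the example can be reproduced by counting the transactions that contain every item of the set. The function name `support` and the dictionary layout are assumptions made for illustration, not part of the original slides.

```python
# Transactions from the example table (transaction ID -> set of items).
transactions = {
    1: {"A", "B", "C"},
    2: {"D", "E"},
    3: {"A", "B"},
    4: {"B", "C", "D"},
}

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

print(support("A", transactions))   # 0.5
print(support("C", transactions))   # 0.5
print(support("AC", transactions))  # 0.25
```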

Apriori Algorithm

Theorem: Supp(AB) ≤ Supp(A) and Supp(AB) ≤ Supp(B)

Proof: any transaction t such that AB ⊆ t must satisfy A ⊆ t and B ⊆ t.

Therefore, if Supp(X) < Supp_min, we can eliminate all item sets Y such that X ⊆ Y, since no superset of an infrequent item set can be frequent. The Apriori Algorithm uses this pruning to build the frequent item sets level by level.
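
The level-wise search can be sketched as follows. This is an illustrative, unoptimized version rather than the implementation used for the results in these slides; the function name `apriori`, the threshold of 0.5, and the reuse of the four-transaction example are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all item sets with support >= min_support."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = sorted({i for t in transactions for i in t})
    level = {frozenset([i]) for i in items if supp(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = supp(s)
        # Join step: unions of two frequent k-item sets that have k + 1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: a candidate with any infrequent k-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k))
        }
        level = {c for c in candidates if supp(c) >= min_support}
        k += 1
    return frequent

transactions = [{"A", "B", "C"}, {"D", "E"}, {"A", "B"}, {"B", "C", "D"}]
result = apriori(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), s)
```

In this example the candidate ABC is discarded in the prune step without its support ever being counted, because its subset AC is not frequent.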

Apriori Algorithm (continued)

Figure and plot from Data Mining and Analysis, pp. 247-248 (Zaki & Meira 2014)

ECLAT Optimization

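Vertical database formatting, in which each item is stored with the set of transaction IDs t(X) that contain it, allows for faster computation of support values.

t(XY) = t(X) ⋂ t(Y) and Supp(X) = |t(X)| / |T|

New item sets can be generated by intersecting the TID sets of their bases, and the support of each follows directly from the cardinality of its TID set.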

Transaction ID   Items
1                ABC
2                DE
3                AB
4                BCD

Item   TIDs
A      1 3
B      1 3 4
C      1 4
D      2 4
E      2
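
A minimal sketch of the TID-set intersection idea, using the item-to-TID table above. The function name `eclat`, the depth-first recursion, and the minimum count of 2 are assumptions for illustration; the code reports absolute counts |t(X)| rather than fractions.

```python
def eclat(prefix, tidsets, min_count, out):
    """Depth-first search over item sets using TID-set intersections.

    tidsets: list of (item, set_of_TIDs) pairs that may extend `prefix`.
    out:     dict filled with {item tuple: number of supporting transactions}.
    """
    for i, (item, tids) in enumerate(tidsets):
        if len(tids) < min_count:
            continue  # infrequent, so no extension of it can be frequent
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # t(XY) = t(X) ∩ t(Y): intersect with every later item's TID set.
        extensions = [(other, tids & other_tids)
                      for other, other_tids in tidsets[i + 1:]]
        eclat(itemset, extensions, min_count, out)
    return out

vertical = {"A": {1, 3}, "B": {1, 3, 4}, "C": {1, 4}, "D": {2, 4}, "E": {2}}
counts = eclat((), sorted(vertical.items()), min_count=2, out={})
print(counts)
```

No pass over the original transactions is needed after the vertical table is built; every support count comes from a set intersection.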

ECLAT Optimization (continued)

Figure and plot from Data Mining and Analysis, pp. 251-252 (Zaki & Meira 2014)

Association Rules

Often, it is more important to find directional associations than simple frequent item sets. We call these association rules.

Example: Eggs, Butter, and Sugar → Flour

In order to draw association rules from the frequent item sets, one must use interestingness measures.

Interestingness Measures

Confidence:

Conf(X→Y) = Supp(XY) / Supp(X)

Lift:

Lift(X→Y) = Supp(XY) / (Supp(X) ∗ Supp(Y))

Jaccard:

Jacc(X→Y) = Supp(XY) / (Supp(X) + Supp(Y) − Supp(XY))
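
The three measures can be sketched directly from the support definition. The helper names and the choice of the rule A → B on the earlier four-transaction example are assumptions for illustration.

```python
def supp(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y, transactions):
    return supp(set(x) | set(y), transactions) / supp(x, transactions)

def lift(x, y, transactions):
    return supp(set(x) | set(y), transactions) / (
        supp(x, transactions) * supp(y, transactions))

def jaccard(x, y, transactions):
    s_xy = supp(set(x) | set(y), transactions)
    return s_xy / (supp(x, transactions) + supp(y, transactions) - s_xy)

transactions = [{"A", "B", "C"}, {"D", "E"}, {"A", "B"}, {"B", "C", "D"}]

# Rule A -> B: Supp(A) = 0.5, Supp(B) = 0.75, Supp(AB) = 0.5
print(confidence("A", "B", transactions))  # 0.5 / 0.5
print(lift("A", "B", transactions))        # 0.5 / (0.5 * 0.75)
print(jaccard("A", "B", transactions))     # 0.5 / (0.5 + 0.75 - 0.5)
```

A lift above 1 indicates that X and Y co-occur more often than they would if they were independent.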

Media Threshold Survey

Participants asked to self-identify the number of social media items they would need to see before forming or shifting opinions given:

Media types:
1. Images: for still photos and drawings
2. Videos: for any animations or moving picture
3. Messages: for text, tweets, and Facebook posts

Controversy levels:
1. Low: minimal (some people would form an opinion)
2. Medium: generally controversial (most would form an opinion)
3. High: very controversial (most or all would form an opinion)
4. None: no reference to controversy

Media sources:
1. Unknown: individual has no knowledge of the source
2. Like-minded: the source of the media generally thinks similarly to the recipient
3. Different-minded: the source of the media generally thinks differently from the recipient

Binning Survey Data

In order to mine a dataset, it must be split up into a discrete set of items. Continuous or open-ended responses should be binned such that:
1. The bin resolution is broad enough to keep frequencies above the minimal support.
2. The bin resolution is fine enough that information is not lost in the binning process.
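
One way to sketch this trade-off is power-of-two (log2) binning followed by a check of which bins fall below the minimal support. The response values, the bin labels, and the 0.15 threshold below are hypothetical and chosen only for illustration; they are not the survey data.

```python
from collections import Counter

def log2_bin(value):
    """Label the power-of-two bin of a non-negative integer, e.g. 5 -> '4-7'."""
    if value < 1:
        return "0"
    k = value.bit_length() - 1          # floor(log2(value)) for positive integers
    lo, hi = 2 ** k, 2 ** (k + 1) - 1
    return str(lo) if lo == hi else f"{lo}-{hi}"

# Hypothetical threshold responses, for illustration only.
responses = [1, 2, 3, 5, 8, 20, 100, 3, 6, 2]
bins = Counter(log2_bin(r) for r in responses)

min_support = 0.15  # assumed threshold
sparse = [b for b, c in bins.items() if c / len(responses) < min_support]

print(dict(bins))
print("bins below minimal support:", sparse)
```

Bins reported as sparse are candidates for merging with a neighbour, at the cost of coarser resolution.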

Figures: individual responses, log2 binned; responses binned by %-deviation from average.


Filtering Through the Rules

Filter (min)        FC rule count   FC %-remain   FS rule count   FS %-remain
Support (.12/.15)   873,998         100.00        2,584,330       100.00
Confidence (.6)     360,644         41.26         1,096,151       42.42
Lift (3)            3,801           0.43          68,878          2.67
Maximal             784             0.09          25,329          0.98

One of the major challenges of data mining is the massive quantity of rules it can generate. Interestingness measures, problem considerations, and bloat reduction measures can greatly reduce the overall quantity of rules.

Examples:
• Interestingness measure: Requiring a sizeable minimum lift or confidence.
• Problem considerations: If one is looking to find the variables that affect average response, requiring average response on the RHS.
• Bloat reduction: Accepting only maximal frequent item sets (eliminating any FISs which have supersets with equivalent support).
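
The three kinds of filter can be sketched as below. The rule records, field names, and function names are hypothetical; the default thresholds (confidence .6, lift 3) are taken from the table, and the bloat-reduction step follows the definition given in the last bullet.

```python
def filter_rules(rules, min_conf=0.6, min_lift=3.0, required_rhs=None):
    """Keep rules that pass the confidence and lift thresholds and,
    optionally, have a specific item on the right-hand side."""
    kept = []
    for rule in rules:
        if rule["conf"] < min_conf or rule["lift"] < min_lift:
            continue
        if required_rhs is not None and rule["rhs"] != required_rhs:
            continue
        kept.append(rule)
    return kept

def drop_redundant_itemsets(counts):
    """Remove item sets that have a proper superset with the same support count."""
    return {
        s: c for s, c in counts.items()
        if not any(s < other and c == oc for other, oc in counts.items())
    }

# Hypothetical rules, for illustration only.
rules = [
    {"lhs": frozenset({"video", "high"}), "rhs": "avg", "conf": 0.8, "lift": 3.5},
    {"lhs": frozenset({"image"}), "rhs": "avg", "conf": 0.5, "lift": 4.0},
    {"lhs": frozenset({"message"}), "rhs": "low", "conf": 0.9, "lift": 3.2},
]
kept = filter_rules(rules, required_rhs="avg")

# Support counts from the four-transaction example used earlier.
counts = {frozenset("A"): 2, frozenset("AB"): 2, frozenset("B"): 3,
          frozenset("BC"): 2, frozenset("C"): 2, frozenset("D"): 2}
reduced = drop_redundant_itemsets(counts)
print(len(kept), sorted("".join(sorted(s)) for s in reduced))
```

Here A and C are dropped because AB and BC occur in exactly the same number of transactions, so they carry no additional information.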

Filtering – Community Detection?

The prior filtering techniques can help to remove uninteresting or unimportant rules from our data, but they do little in the way of parsing the rules that remain. Community detection provides a way to further filter our results by placing them into communities, which can be used as the base unit for analysis. Additionally, community detection can find rule clusters with substitutable items (butter and margarine) and help to remove unneeded complexity.

Figures: Fixed-Context rules as network; Fixed-Source rules as network; rules and items as bipartite graph.

SpeakEasy Community Detection

Influences on choice of algorithm:
• Label-propagation-based community detection is resilient to graph topology.
• History-based approach reduces the impact of random initial conditions and prevents cascades.
• Multiple runs are reconciled with the adjusted Rand index (ARI) to find the most representative partition.
• Does not require the user to set the number of partitions.
• Written here at RPI!
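
To illustrate the reconciliation step only, the sketch below computes the adjusted Rand index between partitions and selects the run with the highest mean ARI against the others. It is not SpeakEasy itself, and choosing by mean ARI is an assumption about how "most representative" might be defined; the function names and example partitions are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two partitions given as equal-length label lists (n >= 2)."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:   # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def most_representative(partitions):
    """Index of the partition with the highest mean ARI against all other runs."""
    def mean_ari(i):
        scores = [adjusted_rand_index(partitions[i], p)
                  for j, p in enumerate(partitions) if j != i]
        return sum(scores) / len(scores)
    return max(range(len(partitions)), key=mean_ari)

# Hypothetical community labels from three runs over four nodes.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(adjusted_rand_index(runs[0], runs[1]))  # same grouping, labels swapped
print(adjusted_rand_index(runs[0], runs[2]))
print(most_representative(runs))
```

ARI depends only on which nodes are grouped together, so two runs that assign different label names to the same communities still score as identical.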

SpeakEasy Community Algorithm

Preliminary Results – Modular Rules

The first major category of communities consisted of sets of rules in which only a couple of items differed from any given rule.

Preliminary Results – Equivalent Items

The second notable set of communities were those in which mutually exclusive items co-occurred.