Talk given at SSSW'13, the 10th Summer School on Ontology Engineering and the Semantic Web, 7 - 13 July 2013, Cercedilla, Spain. http://sssw.org/2013/
Social Media Analytics with a pinch of semantics Harith Alani
http://people.kmi.open.ac.uk/harith/
@halani
harith-alani
Outline of my talk
§ I’ll start talking
§ Then I’ll finish talking
§ You’ll wonder what you’ve learned!
§ You will clap regardless
§ You’ll be convinced you learned nothing
§ You could be right!
§ But you’re wrong of course
§ We go to the bar tonight and forget all about the talk!
• Why social media analytics?
– It’s where everyone is!
– Real time information
– Low cost
– Much of it
Survey of 3800 marketers on how they use social media to grow their business
Social Media for Businesses
§ “they can't be forced to use social apps, they must opt-in”
§ “need a detailed understanding of social networks: how people are currently working, who they work with and what their needs are”
Measuring Social Media
Tools for monitoring social networks
LinkedIn Group Analytics
Facebook Insights
• Provides measurements on FB Page performance
• Provides demographic data about visitors, and their engagement with posts
• “Experiment with different types of posts to see what your audience responds to best.”
Social Media Challenges
• Integration
– How to represent and connect this data?
• Behaviour
– How can we measure and predict behaviour?
– Which behaviours are good/bad in which community type?
• Change
– Can we influence behaviour change?
• Community Health
– What health signs should we look for?
– How to predict them?
• Engagement
– How can we maximise engagement?
• Sentiment
– How to measure it? track it?
– Can we predict sentiment towards entities (brands, people, events)?
Forum on a celebrity
Forum on transport
June 25, 2013
In-house Social Platforms
Jan 29, 2013
Semantically-Interlinked Online Communities (SIOC)
• SIOC aims to enable the integration of online community information.
• SIOC provides a Semantic Web ontology for representing rich data from the Social Web in RDF
sioc-project.org
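As a rough illustration of what SIOC-style data can look like, the sketch below serialises one forum post as N-Triples using plain Python. The `sioc:` terms (`Post`, `UserAccount`, `Forum`, `has_creator`, `has_container`, `content`) come from the SIOC Core ontology; the `example.org` URIs, the post text and the `to_ntriples` helper are hypothetical and only serve the example.

```python
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples as N-Triples lines.

    Objects starting with "http" are treated as URIs, everything else as a
    plain string literal (a simplification made for this sketch).
    """
    lines = []
    for s, p, o in triples:
        if o.startswith("http"):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical URIs for one post, its author and its forum.
post = "http://example.org/forum/transport/post/42"
user = "http://example.org/user/alice"
forum = "http://example.org/forum/transport"

triples = [
    (post, RDF_TYPE, SIOC + "Post"),
    (post, SIOC + "has_creator", user),
    (post, SIOC + "has_container", forum),
    (post, SIOC + "content", "Is the 8:15 train delayed again?"),
    (user, RDF_TYPE, SIOC + "UserAccount"),
    (forum, RDF_TYPE, SIOC + "Forum"),
]

out = to_ntriples(triples)
print(out)
```

Because posts from different platforms end up with the same predicates, they can be merged and queried together, which is the integration point the slide makes.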
Semantics in FB Open Graph
Behaviour Analysis
Why monitor behaviour?
§ Understand impact of behaviour on community evolution
§ Forecast community future
§ Learn when intervention might be needed
§ Learn which behaviour should be encouraged or discouraged
§ Find what could trigger certain behaviours
§ What is the best mix of behaviour to increase engagement in the community
§ To see which users need more support, which ones should be confined, and which ones should be promoted
Behaviour analysis in Social Media
§ Bottom Up analysis
§ Every community member is classified into a “role”
§ Unknown roles might be identified
§ Copes with role changes over time
Example roles: initiators, lurkers, followers, leaders
Structural, social network, reciprocity, persistence, participation
Feature levels change with the dynamics of the community
Associations of roles with a collection of feature-to-level mappings e.g. in-degree -> high, out-degree -> high
Run rules over each user’s features and derive the community role composition
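A minimal sketch of this rule step, written in plain Python rather than as SPIN rules over an ontology. The role patterns are four of the feature-to-level mappings from Table 2 of the excerpted paper; the users, the `infer_role` and `role_composition` helpers and the "Unclassified" fallback are assumptions made for illustration.

```python
from collections import Counter

# Feature-to-level mappings per role (L = low, M = mid, H = high).
# Level order: dispersion, initiation, quality, popularity.
ROLE_RULES = {
    "Focussed Expert Participant": ("L", "M", "H", "L"),
    "Focussed Novice": ("L", "L", "L", "L"),
    "Mixed Novice": ("M", "H", "L", "H"),
    "Distributed Expert": ("H", "H", "H", "H"),
}


def infer_role(levels):
    """Return the first role whose level pattern matches the user's levels."""
    for role, pattern in ROLE_RULES.items():
        if pattern == levels:
            return role
    return "Unclassified"  # fallback label, an assumption of this sketch


def role_composition(users):
    """Proportion of users per role in one community."""
    counts = Counter(infer_role(levels) for levels in users.values())
    total = sum(counts.values())
    return {role: n / total for role, n in counts.items()}


# Made-up users whose features have already been mapped to levels.
community = {
    "u1": ("L", "L", "L", "L"),
    "u2": ("L", "L", "L", "L"),
    "u3": ("H", "H", "H", "H"),
    "u4": ("M", "M", "M", "M"),  # matches no rule
}

composition = role_composition(community)
print(composition)
```

Recomputing the levels for each time window before running the rules is what lets the same rule set cope with feature levels that shift as the community changes.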
Modelling user features and interactions
Encoding Rules in Ontologies with SPIN
Clustering for identifying emerging roles
– Map the distribution of each feature in each cluster to a level (i.e. low, mid, high)
– Align the mapping patterns with role labels
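The mapping step can be sketched as follows, assuming three equal-frequency bins per feature and using each cluster's median as the summary of its distribution. The median is an assumption of this sketch, as are the toy values and helper names; the excerpted paper only states that cluster distributions are compared against equal-frequency bins.

```python
import statistics


def equal_frequency_cuts(values, bins=3):
    """Cut points that split `values` into `bins` equally populated bins."""
    return statistics.quantiles(values, n=bins, method="inclusive")


def to_level(value, cuts, labels=("low", "mid", "high")):
    """Map a value to the label of the bin it falls into."""
    for cut, label in zip(cuts, labels):
        if value <= cut:
            return label
    return labels[-1]


def cluster_levels(all_values, clusters):
    """Level of one feature for each cluster, judged by the cluster median."""
    cuts = equal_frequency_cuts(all_values)
    return {
        name: to_level(statistics.median(members), cuts)
        for name, members in clusters.items()
    }


# Toy values of a single feature for nine users in three clusters.
clusters = {"c0": [0, 1, 2], "c1": [3, 4, 5], "c2": [6, 7, 8]}
all_values = [v for members in clusters.values() for v in members]

levels = cluster_levels(all_values, clusters)
print(levels)
```

Repeating this per feature gives each cluster a pattern such as (low, high, mid, high), which is then aligned with a role label by hand.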
Table 1: Correlation Coefficients of dimensions

              Dispersion  Engagement  Contribution  Initiation  Quality  Popularity
Dispersion    1.000       0.277       0.168         0.389       0.086    0.356
Engagement    0.277       1.000       0.939**       0.284       0.151    0.926**
Contribution  0.168       0.939**     1.000         0.274       0.086    0.909**
Initiation    0.389       0.284       0.274         1.000       -0.059   0.513
Quality       0.086       0.151       0.086         -0.059      1.000    0.065
Popularity    0.356       0.926**     0.909**       0.513       0.065    1.000
Figure 7: Cumulative density functions of each dimension showing the skew in the distributions for initiated and in-degree ratio

Figure 8: Boxplots of the feature distributions in each of the 11 clusters. Feature distributions are matched against the feature levels derived from equal-frequency binning

same forum and do not deviate away, at the other extreme very few users are found to post in a large range of forums. For initiated (initiation) and in-degree ratio (popularity) the density functions are skewed towards low values where only a few users initiate discussions and are replied to by large portions of the community. Average points per post (quality) is also skewed towards lower values indicating that the majority of users do not provide the best answers consistently.
These plots indicate that feature levels derived from these distributions will be skewed towards lower values, for instance for initiated the definition of high for this feature is anything exceeding 1.55x10^-5.
The distribution of each dimension is shown in Figure 8 for each of the 11 induced clusters. We assess the distribution of each feature for each cluster against the levels derived from the equal-frequency binning of each feature, thereby generating a feature-to-level mapping. This mapping is shown in Table 2 where certain clusters are combined together as they have the same feature-level mapping patterns (i.e. 5,7 and 8,9). We then interpreted the role labels from these clusters, and their subsequent patterns, as follows:
• 0 - Focussed Expert Participant: this user type provides high quality answers but only within select forums that they do not deviate from. They also have a mix of asking questions and answering them.
• 1 - Focussed Novice: this user is focussed within a few select forums but does not provide good quality content.
• 2 - Mixed Novice: is a novice across a medium range of topics
Table 2: Mapping of cluster dimensions to levels

Cluster  Dispersion  Initiation  Quality  Popularity
0        L           M           H        L
1        L           L           L        L
2        M           H           L        H
3        H           H           H        H
4        L           H           H        M
5,7      H           H           L        H
6        L           H           M        M
8,9      M           H           H        H
10       L           H           M        H
• 3 - Distributed Expert: an expert on a variety of topics and participates across many different forums
• 4 - Focussed Expert Initiator: similar to cluster 0 in that this type of user is focussed on certain topics and is an expert on those, but to a large extent starts discussions and threads, indicating that his/her shared content is useful to the community
• 5,7 - Distributed Novice: participates across a range of forums but is not knowledgeable on any topics
• 6 - Focussed Knowledgeable Member: contributes to only a few forums, has medium-level expertise (i.e. he/she is neither an expert nor a novice) and has medium popularity
• 8,9 - Mixed Expert: medium-dispersed user who provides high-quality content
• 10 - Focussed Knowledgeable Sink: focussed user who has medium-level expertise but who gets a lot of the community replying to them - hence a sink. Differs from cluster 6 in terms of popularity.
6. Analysis: Community Health
Deriving a community’s role composition provides community operators and hosts with a macro-level view of how their community is operating and how it is functioning. Understanding what is a healthy and unhealthy composition in a community involves analysing how a given role composition has been associated with community activity, interaction or some other measure in the past and reusing that knowledge. Forums and communities operating within the same platform may also differ such that what turns a community healthy in one location may be different from another. In this section we describe how community analysis is possible through our presented approach to derive the role composition of a community using semantic rules.
6.1. Experimental Setup
To demonstrate the utility of our approach we analysed each of the 33 SAP communities from 2009 through to 2011. Figure 9 shows how our dataset was divided into the tuning section - i.e. the first half of 2008 in which we derived our clusters and aligned them to roles (as described in Section 5) - and the analysis section. We began with the 1st January 2009 as our collect date by taking a feature window 6 months prior to this date (going back to the 2nd half of 2008) in which we measured the behaviour dimensions for each community’s users. In order to gauge the role composition in a community over time we move our collect date on one week at a time and use the 6-months prior to this date as our feature window. As Figure 9 demonstrates we repeat this process until we reach 2011.
Figure 9: Windows used for a) tuning of the clusters and the derivation of roles and b) the analysis of community health. Role composition is derived every week from 2009 onwards using a 6-month window going back from the collection date.
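The weekly sliding window described here can be sketched with the standard library. The sketch assumes that "6 months" is approximated as 26 weeks and that collection stops before 1 January 2011; both choices, and the `feature_windows` helper, are assumptions rather than details taken from the paper.

```python
from datetime import date, timedelta

FEATURE_WINDOW = timedelta(weeks=26)  # assumption: "6 months" taken as 26 weeks
STEP = timedelta(weeks=1)             # the collect date moves on one week at a time


def feature_windows(first_collect, end):
    """Yield (window_start, collect_date) pairs, one per week, while collect_date < end."""
    collect = first_collect
    while collect < end:
        yield collect - FEATURE_WINDOW, collect
        collect += STEP


# First collect date from the text; the end date is an assumed cut-off.
windows = list(feature_windows(date(2009, 1, 1), date(2011, 1, 1)))
print(len(windows), windows[0], windows[-1])
```

Each pair delimits the posts used to measure the behaviour dimensions for one collect date, so one role composition per community is obtained for every element of `windows`.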
By measuring the behaviour dimensions of individual users in individual communities we are able to infer the roles of the users using the semantic rules described in Section 4. This provides a micro-level assessment of the roles that individual users assume. We can then look at the macro-level by deriving the role composition of a given community at a given point in time by measuring how many users have a specific role. Such role composition analysis allows for predictions to then be made. To demonstrate the application of such analysis we performed three distinct experiments (each designed to explore one of our three aforementioned research questions):
1. Composition Analysis: assesses the average role composition in each community and clusters them based on the compositions. We also pick out each community’s most popular role and measure what percentage of the community that role covers.
2. Activity Increase/Decrease: we perform a binary classification task such that at timestep t = k + 1 we predict whether the community’s activity (i.e. number of posts) has increased or decreased since
• 1 - Focussed Novice: focussed within a few select forums but does not provide good quality content.
• 2 - Mixed Novice: a novice across a medium range of topics
• 3 - Distributed Expert: expert on a variety of topics and participates across many different forums
….
Mapping of cluster dimensions to levels
Correlation of behaviour with community activity
§ How does the existence of certain behaviour roles impact activity in an online community?
Online Community Health Analytics
[Figure: ROC curves (TPR against FPR), one panel each for Churn Rate, User Count, Seeds / Non-seeds Prop, and Clustering Coefficient]
• Machine learning models to predict community health based on compositions and evolution of user behaviour
• Churn rate: proportion of community leavers in a given time segment.
• User count: number of users who posted at least once.
• Seeds to Non-seeds ratio: proportion of posts that get responses to those that don’t
• Cluster coefficient: extent to which the community forms a clique.
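The four health indicators above can be sketched in a few lines of plain Python. The post records, user names and reply edges are made up; treating a "leaver" as someone active in the previous segment but not in the current one, and using the average local clustering coefficient of an undirected reply graph, are assumptions of this sketch.

```python
def churn_rate(previous_users, current_users):
    """Share of users active in the previous segment who are absent now."""
    if not previous_users:
        return 0.0
    return len(previous_users - current_users) / len(previous_users)


def user_count(posts):
    """Number of distinct users who posted at least once."""
    return len({p["author"] for p in posts})


def seeds_to_non_seeds(posts):
    """Ratio of thread starters that got a reply to those that did not."""
    starters = [p for p in posts if p["reply_to"] is None]
    replied = {p["reply_to"] for p in posts if p["reply_to"] is not None}
    seeds = sum(1 for p in starters if p["id"] in replied)
    non_seeds = len(starters) - seeds
    return seeds / non_seeds if non_seeds else float("inf")


def clustering_coefficient(edges):
    """Average local clustering coefficient of an undirected graph."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    total = 0.0
    for nbrs in neighbours.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in neighbours[u])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbours) if neighbours else 0.0


# Made-up posts: reply_to is None for thread starters.
posts = [
    {"id": 1, "author": "ann", "reply_to": None},
    {"id": 2, "author": "bob", "reply_to": 1},
    {"id": 3, "author": "cat", "reply_to": None},
    {"id": 4, "author": "ann", "reply_to": None},
    {"id": 5, "author": "cat", "reply_to": 4},
]
reply_edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]

churn = churn_rate({"ann", "bob", "dan", "eve"}, {"ann", "bob", "cat"})
ratio = seeds_to_non_seeds(posts)
cc = clustering_coefficient(reply_edges)
print(churn, user_count(posts), ratio, cc)
```

Computed once per time window, these values give the target variables that a model can try to predict from the role composition.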
Health categories
[Figure: ROC curves (True Positive Rate against False Positive Rate) for Churn Rate, User Count, Seeds / Non-seeds Prop, and Clustering Coefficient]
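The plots on these slides chart true positive rate (TPR) against false positive rate (FPR). A minimal sketch of how such a curve and its area can be computed from classifier scores is given below; the scores and labels are made up, and `roc_points` and `auc` are illustrative helpers, not the tooling used for the slides.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up scores: higher means "health indicator predicted to increase".
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]  # 1 = the indicator actually increased

points = roc_points(scores, labels)
print(points, auc(points))
```

A curve hugging the top-left corner (area close to 1) means the model separates increases from decreases well; the diagonal (area 0.5) corresponds to guessing.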
The fewer Focused Experts in the community, the more posts will receive a reply! There is no “one size fits all” model!
Community Types
Community types
§ Do communities of different types behave differently?
§ Analysed IBM Connections communities to study participation, activity, and behaviour of users
§ Helps us to know what is normal and healthy in a community, and what is not!
§ Compare exhibited community behaviour with what users say they use the community for
– Does macro behaviour match micro needs?
Community types
[Diagram: Community, with Wiki Page, Blog Post, Forum Thread, Wiki Edit, Blog Comment, Forum Reply, Bookmark, Tag, File]
§ Data consists of non-private info on IBM Connections Intranet deployment
§ Communities:
– ID
– Creation date
– Members
– Used applications (blogs, Wikis, forums)
§ Forums:
– Discussion threads
– Comments
– Dates
– Authors and responders
Community types
§ Muller, M. (CHI 2012) identified five distinct community types in IBM Connections:
– Communities of Practice (CoP): for sharing information and networking
– Teams: shared goal for a particular project or client
– Technical Support: support for a specific technology
– Idea Labs Communities: for focused brainstorming
– Recreation Communities: recreational activities unrelated to work.
§ Our data consisted of the 186 most active communities:
– 100 CoPs, 72 Teams, and 14 Tech communities
– No Idea Labs or Recreation communities
Behaviour in different community types
• Members of Team communities are more engaged, popular, and initiate more discussions
• Tech users are mostly active in a few communities, and don’t initiate or contribute much
• CoP users disperse their activity across many communities, and contribute more
Macro
Mean and Standard Deviation (in brackets) of the distribution of micro features within the different community types
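A small sketch of how such a per-type summary can be produced with the standard library. The user records and the `initiated` feature values are made up, and the sample standard deviation is an assumption, since the slide does not say which variant was used.

```python
import statistics
from collections import defaultdict


def summarise(users, feature):
    """Mean and sample standard deviation of `feature` per community type."""
    by_type = defaultdict(list)
    for user in users:
        by_type[user["type"]].append(user[feature])
    return {
        ctype: (statistics.mean(vals), statistics.stdev(vals))
        for ctype, vals in by_type.items()
    }


# Made-up micro features: number of threads each user initiated.
users = [
    {"type": "Team", "initiated": 4},
    {"type": "Team", "initiated": 6},
    {"type": "Team", "initiated": 8},
    {"type": "CoP", "initiated": 1},
    {"type": "CoP", "initiated": 2},
    {"type": "CoP", "initiated": 3},
    {"type": "Tech", "initiated": 0},
    {"type": "Tech", "initiated": 1},
    {"type": "Tech", "initiated": 2},
]

summary = summarise(users, "initiated")
for ctype, (mean, sd) in summary.items():
    print(f"{ctype}: {mean:.2f} ({sd:.2f})")
```

Printing the mean with the standard deviation in brackets mirrors the layout described in the caption.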
Need an ontology and inference engine of community types
Matthew Rowe, Miriam Fernandez, Harith Alani, Inbal Ronen, Conor Hayes and Marcel Karnstedt: Behaviour Analysis across different types of Enterprise Online Communities. ACM WebSci 2012
User needs and value
[Charts: distribution of survey responses per item, percentages in chart order]
[Quality of content]: 41%, 47%, 8%, 3%, 1%
[Number of members]: 18%, 46%, 26%, 8%, 2%
[Diversity of expertise]: 31%, 53%, 13%, 2%, 1%
[Level of entertainment]: 2%, 15%, 30%, 30%, 23%
[Provides accurate answers to questions]: 44%, 50%, 4%, 2%
[Contributes good quality and well presented content]: 38%, 55%, 5%, 2%
[Provides quick answers to questions]: 21%, 60%, 14%, 5%
[Has good expertise in a domain]: 38%, 49%, 8%, 5%
[Contributes content frequently]: 11%, 58%, 25%, 6%
[Has many contacts (e.g. Facebook friends)]: 1%, 17%, 34%, 30%, 18%
[Has many fans (e.g. Twitter followers, positive replies to posts)]: 2%, 14%, 32%, 31%, 21%
Community Value
Community Member Value
Value of community features
Measurements of value and needs satisfaction
• Assessing user engagement and needs satisfaction
• Measuring value of individual users to their communities
• Measuring value of communities to their members
Monitoring Online Communities
Maslow’s Hierarchy of Needs
Mapping Maslow’s hierarchy of needs to social media communities
Self-actualisation: Altruistic behaviour: helping others, replying to queries, giving ratings
Self-Esteem: Need to be rated and ranked higher in the community, promotion of roles from novice to active member to expert and moderator
Social Belongingness: Need to be part of the community, groups, need for interaction and engagement
Security: Need for privacy, security from identity theft, security from online abuse, trolling and bullying
Physical: Need for Hardware, Software, Information, Internet access.
User groups based on ‘needs’
High Helping Need
• Reply a lot
• Last 17% longer in system
• Contribute to many forums
• High and consistent engagement
• (Self-actualisation)
High Information Need
• Contribute 70% less
• Don’t care about ‘points’ and ‘reputation’
• Don’t stay for long
• Engage with very few users
• (Basic needs)
High Social Need
• High level of social interaction
• Moderate reputation scores
• High contribution level
• Low information needs
• (Social belongingness)
Recognition Need
• High ‘reputation’
• Moderate contribution level
• High engagement
• (Self-esteem)
~90% of users are happily staying at the lower levels of the ‘needs hierarchy’
experts to-be
about to churn
on right path to leadership
Behaviour evolution patterns
§ Can we predict future behaviour role?
§ Who’s on the path to become a leader? an expert? a churner?
§ Which users do we want to encourage to stay or to leave?
into becoming an expert - however this development only occurs 4 times
Fig. 6. Progression patterns where users progress from a novice to an expert role over time
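One simple way to look for such progression patterns is to collapse each user's weekly roles into a path and count the paths that start in a novice role and end in an expert role. The sketch below does this with made-up role histories; the role names come from the cluster labels listed earlier, while the grouping into novice and expert sets and the helper names are assumptions.

```python
from collections import Counter
from itertools import groupby

NOVICE_ROLES = {"Focussed Novice", "Mixed Novice", "Distributed Novice"}
EXPERT_ROLES = {"Focussed Expert Participant", "Distributed Expert", "Mixed Expert"}


def role_path(weekly_roles):
    """Collapse consecutive repeats: [a, a, b, b, a] -> (a, b, a)."""
    return tuple(role for role, _ in groupby(weekly_roles))


def novice_to_expert_paths(histories):
    """Count role paths that start in a novice role and end in an expert role."""
    patterns = Counter()
    for roles in histories.values():
        path = role_path(roles)
        if path and path[0] in NOVICE_ROLES and path[-1] in EXPERT_ROLES:
            patterns[path] += 1
    return patterns


# Made-up weekly role histories for four users.
histories = {
    "u1": ["Focussed Novice", "Focussed Novice", "Mixed Novice", "Mixed Expert"],
    "u2": ["Focussed Novice", "Mixed Novice", "Mixed Novice", "Mixed Expert"],
    "u3": ["Distributed Novice", "Distributed Novice"],
    "u4": ["Mixed Expert", "Mixed Novice"],
}

patterns = novice_to_expert_paths(histories)
for path, count in patterns.most_common():
    print(count, " -> ".join(path))
```

Frequent paths are candidates for "on the right path" signals, and the same counting can be pointed at paths that end in inactivity to look for early signs of churn.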
Engagement Analysis
Tweet recipe for generating engagement § Identifying seed posts
Top features: Time in Day, Readability, Out-Degree, Polarity, Informativeness
Top features: Referral Count, Topic Likelihood, Informativeness, Readability, User Age
For both datasets:
• Content features play a greater role than user features
• The combination of all features provides the best results
• Predicting discussion activity Top features: Referral Count(-), Complexity(-)
Top features: URLs(-), Polarity(-), Topic Likelihood(+), Complexity (+)
For both, a decrease in URLs is associated with max activity. Language and terminology are more significant for Boards.ie.
Engagement in different communities
§ How the results differ:
– from one community type to another
– from random datasets to topic-based ones
– from related experiments in the literature
§ Experimented with 7 datasets, from:
– Boards.ie
– Twitter
– SAP
– Server Fault
– Facebook
Impact of features on engagement
[Figure: β coefficient of each feature, one panel per dataset: Boards.ie, Twitter Random, Twitter Haiti, Twitter Union, Server Fault, SAP, and one unlabelled panel. Features: In-degree, Out-degree, Post Count, Age, Post Rate, Post Length, Referrals Count, Polarity, Complexity, Readability, Readability Fog, Informativeness, EF-IPF, CF-IPF, Entity Entropy, Concept Entropy, Entity Degree Centrality, Concept Degree Centrality, Entity Network Entropy, Concept Network Entropy]
Effects of individual social, content, and semantic features on the response variable (i.e. whether the post seeds engagement or not).
Semantic Sentiment Analysis
Semantic sentiment analysis on social media
§ Offers fast and cheap access to the public’s feelings towards brands, businesses, people, etc.
§ A range of features and statistical classifiers have been used in recent years
§ Semantics are often neglected
§ We add semantics as additional features into the training set for sentiment analysis
§ Measure the correlation of the representative concept with negative/positive sentiment
Sentiment Analysis
Sentiment Lexicon
hate: negative
honest: positive
inefficient: negative
love: positive
…
I hate the iPhone
I really love the iPhone
Lexical-Based Approach
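A minimal sketch of the lexicon-based approach, using only the four lexicon entries shown on the slide. Summing +1 and -1 per matched word and the "neutral" fallback are simplifying assumptions; real lexicons are far larger and usually handle negation and intensity.

```python
import re

# Lexicon entries taken from the slide; a real lexicon is far larger.
LEXICON = {"hate": -1, "honest": 1, "inefficient": -1, "love": 1}


def lexicon_sentiment(text):
    """Label text by summing the polarity of every lexicon word it contains."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no lexicon word matched, or the polarities cancel out


for sentence in ["I hate the iPhone", "I really love the iPhone"]:
    print(sentence, "->", lexicon_sentiment(sentence))
```

The weakness is visible in the fallback: a sentence without any lexicon word gets no signal at all, which is the gap the machine learning approach tries to fill.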
Learn Model
Apply Model
Naïve Bayes, SVM, MaxEnt , etc.
Training Set
Test Set
Model
Machine Learning Approach
Semantic Concept Extraction
§ Extract semantic concepts from tweet data and incorporate them into the supervised classifier training.
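The idea can be sketched with a tiny multinomial Naive Bayes classifier in which every recognised entity contributes one extra concept token. The entity-to-concept table, the training tweets and all helper names are hypothetical; in the paper the concepts come from an extraction service, and the way they are incorporated may differ from this simple augmentation.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical entity-to-concept mapping standing in for a concept extraction tool.
CONCEPTS = {"iphone": "Product", "ipad": "Product", "flu": "Illness", "migraine": "Illness"}

# Made-up labelled tweets.
EXAMPLES = [
    ("I love my iPhone", "positive"),
    ("the iPad is great", "positive"),
    ("I have the flu", "negative"),
    ("this flu is awful", "negative"),
]


def features(text):
    """Word tokens plus one extra token per recognised semantic concept."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return tokens + ["CONCEPT=" + CONCEPTS[t] for t in tokens if t in CONCEPTS]


def train(examples):
    """Collect class frequencies and per-class token counts."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    for text, label in examples:
        class_counts[label] += 1
        token_counts[label].update(features(text))
    vocab = {tok for counts in token_counts.values() for tok in counts}
    return class_counts, token_counts, vocab


def classify(model, text):
    """Pick the class with the highest log-probability (Laplace smoothing)."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in features(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(EXAMPLES)
print(classify(model, "got a migraine"))
```

In the last line none of the words occurred in training, so the only evidence available to the classifier is the `CONCEPT=Illness` token shared with the flu tweets.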
Fig. 1. Measuring correlation of semantic concepts with negative/positive sentiment. These semantic concepts are then incorporated in sentiment classification.

OpenCalais and Zemanta. Their experimental results showed that AlchemyAPI performs best for entity extraction and semantic concept mapping. Our datasets consist of informal tweets, and hence are intrinsically different from those used in [10]. Therefore we conducted our own evaluation, and randomly selected 500 tweets from the STS corpus and asked 3 evaluators to evaluate the semantic concept extraction outputs generated from AlchemyAPI, OpenCalais and Zemanta.

                 No. of Concepts  Entity-Concept Mapping Accuracy (%)
Extraction Tool  Extracted        Evaluator 1  Evaluator 2  Evaluator 3
AlchemyAPI       108              73.97        73.8         72.8
Zemanta          70               71           71.8         70.4
OpenCalais       65               68           69.1         68.7

Table 2. Evaluation results of AlchemyAPI, Zemanta and OpenCalais.

The assessment of the outputs was based on (1) the correctness of the extracted entities; and (2) the correctness of the entity-concept mappings. The evaluation results presented in Table 2 show that AlchemyAPI extracted the most number of concepts and it also has the highest entity-concept mapping accuracy compared to OpenCalais and Zemanta. As such, we chose AlchemyAPI to extract the semantic concepts from our three datasets. Table 3 lists the total number of entities extracted and the number of semantic concepts mapped against them for each dataset.

                 STS    HCR  OMD
No. of Entities  15139  723  1194
No. of Concepts  29     17   14

Table 3. Entity/concept extraction statistics of STS, OMD and HCR using AlchemyAPI.
Likely sentiment for a concept
§ Semantic concepts can help determine sentiment even when no good lexical clues are present
Impact of adding semantic features
§ Incorporating semantics increases accuracy by 6.5% for negative sentiment, and 4.8% for positive sentiment
– F = 75.95%, with 77.18% Precision and 75.33% Recall
– Using baselines of unigrams and part-of-speech features
§ More to-dos:
– Semantic Concepts Extraction: Explore more fine-grained approach for the entity extraction and the entity-concept mapping
– Selective Method: Interpolate semantic concepts based on their contribution to the classification performance
Saif, Hassan; He, Yulan and Alani, Harith (2012). Semantic sentiment analysis of twitter. In: The 11th International Semantic Web Conference (ISWC 2012), 11-15 November 2012, Boston, MA, USA
OK, and now what?!
OUSocials
§ Many FB groups exist for students of OU courses
§ Created and used by students to discuss and share opinions on courses and get support
Behaviour Analysis
Sentiment Analysis
Topic Analysis
Course tutors
Real time monitoring
• How are opinion and sentiment towards a course evolving?
• Who’s providing positive/negative support?
• What topics are emerging? How do they change over time?
• Do students get the answers and support they need?
Analytics over FB groups
§ Compare findings to course performance, and student performance
Reel Lives
Problem Summary
• Fragmented digital selves don’t support social learning and individual empowerment
• Need to enable: – Digital empowerment – Improved understanding and social cohesion – Informed decision making (for individuals) – Informed policy making (for organisations) – Facilitating creative participation – Co-curating of digital personhoods
Creating the ‘reels’
Changing energy consumption behaviour
A Decarbonisation Platform for Citizen Empowerment and Translating Collective
Awareness into Behavioural Change
August 2012
Energy Monitors
www.efergy.com greenenergyoptions.co.uk
fastcompany.com tdevice.net
powerp.co.uk
www.energycircle.com
indiegogo.com
greentechadvocates.com
• Do they change how we consume energy in our homes?
• Are they enough? • Why? How? What if? Where?
Social Eco Feedback Technology
Thanks to ..
Matthew Rowe (now at Uni Lancaster)
Sofia Angeletou (now at BBC)
Gregoire Burel
Miriam Fernandez
Smitashree Choudhury
Hassan Saif
Papers
http://oro.open.ac.uk/view/person/ha2294.html
§ Rowe, Matthew; Fernandez, Miriam; Angeletou, Sofia and Alani, Harith (2012). Community analysis through semantic rules and role composition derivation. Journal of Web Semantics, 18(1)
§ Rowe, Matthew; Fernandez, Miriam; Alani, Harith; Ronen, Inbal; Hayes, Conor and Karnstedt, Marcel (2012). Behaviour analysis across different types of Enterprise Online Communities. In: ACM Web Science Conference 2012 (WebSci12), 22-24 June 2012, Evanston, U.S.A.
§ Rowe, Matthew; Stankovic, Milan and Alani, Harith (2012). Who will follow whom? Exploiting semantics for link prediction in attention-information networks. In: 11th International Semantic Web Conference (ISWC 2012), 11-15 November 2012, Boston, USA
§ Rowe, Matthew and Alani, Harith (2012). What makes communities tick? Community health analysis using role compositions. In: 4th IEEE International Conference on Social Computing, 3-6 September 2012, Amsterdam, The Netherlands
§ Wagner, Claudia ; Rowe, Matthew; Strohmaier, Markus and Alani, Harith (2012). Ignorance isn't bliss: an empirical analysis of attention patterns in online communities. In: 4th IEEE International Conference on Social Computing, 3-6 September 2012, Amsterdam, The Netherlands
§ Saif, Hassan; He, Yulan and Alani, Harith (2012). Semantic sentiment analysis of twitter. In: The 11th International Semantic Web Conference (ISWC 2012), 11-15 November 2012, Boston, MA, USA.
§ Rowe, Matthew; Angeletou, Sofia and Alani, Harith (2011). Predicting discussions on the social semantic web. In: 8th Extended Semantic Web Conference (ESWC 2011), 29 May - 2 June 2011, Heraklion, Greece.
§ Rowe, Matthew; Angeletou, Sofia and Alani, Harith (2011). Anticipating discussion activity on community forums. In: Third IEEE International Conference on Social Computing (SocialCom2011) , 9-11 October 2011, Boston, MA, USA.
§ Angeletou, Sofia; Rowe, Matthew and Alani, Harith (2011). Modelling and analysis of user behaviour in online communities. In: 10th International Semantic Web Conference (ISWC 2011), 23 - 27 Oct 2011, Bonn, Germany.
§ Karnstedt, Marcel ; Rowe, Matthew; Chan, Jeff ; Alani, Harith and Hayes, Conor (2011). The Effect of User Features on Churn in Social Networks. In: ACM Web Science Conference 2011 (WebSci2011), 14 - 17 June 2011, Koblenz, Germany.