deirdre-stewart
1
Linking Organizational Social Networking Profiles
Research Wrap-Up – 28 August 2015
2
Objective
Develop a system to find an organization’s profiles across different social networks.
3
Affiliate Profiles
Brands
Regional Affiliates
4
System Overview
Input: Organization Name
Output: profiles labelled Official, Affiliate, or Unrelated
5
Official: Profiles representing the company as a whole, e.g. @Microsoft, @Dell.
Affiliate: Profiles representing a brand or regional affiliate, e.g. @Surface, @Windows, @MicrosoftAsia.
Unrelated: Profiles that aren’t run by the company itself. Includes employees and other companies.
6
Introduction
Introduction
Implementation
Evaluation
Results/Discussion
Future?
7
Pipeline
Query: GET /company/Microsoft Corporation
1. Input Processing (DuckDuckGo Instant Answers API): produces the Processed Query, e.g. “Microsoft”
2. Profile Acquisition (Twitter/Facebook Search API): produces Twitter/Facebook Profiles
3. Profile Conversion: produces Feature Vectors
4. Profile Classification: produces Labelled Profiles (JSON)
10
Input Processing
Query the DuckDuckGo Instant Answers API, which gives a “topic summary”.
Take the name from that summary.
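A minimal sketch of this step, assuming the Instant Answer endpoint at api.duckduckgo.com with `format=json`, and assuming the organization name is read from the `Heading` field of the response. The slides do not say which field was used, and both helper names are hypothetical. The HTTP request itself is left out so the sketch stays self-contained.

```python
from urllib.parse import urlencode

# Assumed endpoint for the DuckDuckGo Instant Answer API.
API_ROOT = "https://api.duckduckgo.com/"


def instant_answer_url(query):
    """Build the request URL for a raw organization query."""
    return API_ROOT + "?" + urlencode({"q": query, "format": "json"})


def processed_query(summary, fallback):
    """Take the name from a decoded topic summary.

    Assumption: the name lives in the "Heading" field. If that field is
    missing or empty, the raw query is kept unchanged.
    """
    heading = (summary.get("Heading") or "").strip()
    return heading or fallback
```

For the query on the pipeline slide, a summary whose heading is "Microsoft" would turn "Microsoft Corporation" into the processed query "Microsoft".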
11
Profile Acquisition
Query Twitter/Facebook’s search API and retrieve 20 candidate profiles.
12
Profile Conversion - Features
Name-based (5)
• N1: Normalized Edit Distance: Query to Username
• N2: Normalized Edit Distance: Query to Display Name
• N3: Length of Query
• N4: Length of Username
• N5: Length of Display Name
Description-based (3)
• D1: Occurrences of Query in Description
• D2: Cosine Similarity: Query and Description
• D3: Cosine Similarity: Profile Description and DuckDuckGo Description
Language Model-based (6)
• LM1: “Official” Description LM Probability
• LM2: “Affiliate” Description LM Probability
• LM3: “Unrelated” Description LM Probability
• LM4: “Official” Post LM Probability
• LM5: “Affiliate” Post LM Probability
• LM6: “Unrelated” Post LM Probability
13
Name-based Features
N1 - Normalized Edit Distance: Query to Username
N2 - Normalized Edit Distance: Query to Display Name
N3 - Length of Query
N4 - Length of Username
N5 - Length of Display Name
Normalized edit distance:
$$1 - \frac{\mathrm{edit\_distance}(s_1, s_2)}{\max(\mathrm{len}(s_1), \mathrm{len}(s_2))}$$
0 when completely different, 1 when identical
Quirks
Abbreviations: GM versus General Motors (Username: GM, Display Name: General Motors)
Stopwords: “Corporation”, “Company”, etc.
Imposters!
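The normalized edit distance used by N1 and N2 can be sketched as follows. This assumes plain Levenshtein distance (unit cost for insertion, deletion, and substitution). The slides do not state the exact variant or any case folding, so the function names and the handling of two empty strings are illustrative choices.

```python
def edit_distance(s1, s2):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (c1 != c2),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def normalized_edit_similarity(s1, s2):
    """1 - edit_distance / max(len(s1), len(s2)); 1 when identical."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1 - edit_distance(s1, s2) / longest
```

The abbreviation quirk shows up directly: comparing "gm" with "general motors" gives a score of about 0.14, even though both names refer to the same organization.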
14
Description-based Features
D1 - Occurrences of Query
D2 - Cosine Similarity: Query and Description
D3 - Cosine Similarity: DuckDuckGo Description and Profile Description
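A rough sketch of how D1 to D3 could be computed, assuming lowercase whitespace tokenization and raw term-count vectors. The slides do not specify the tokenizer or the term weighting, so these are stand-ins, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokens(text):
    """Assumed tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def occurrences(query, description):
    """D1: non-overlapping, case-insensitive occurrences of the query."""
    if not query:
        return 0
    return description.lower().count(query.lower())


def cosine_similarity(text_a, text_b):
    """D2/D3: cosine of the angle between two term-count vectors."""
    a, b = Counter(tokens(text_a)), Counter(tokens(text_b))
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)
```

D2 would call `cosine_similarity` with the processed query and the profile description, and D3 with the DuckDuckGo description and the profile description.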
15
Language Model-based Features
Probability that description/posts appear in each language model:
Description
• LM1 - Official Profiles
• LM2 - Affiliate Profiles
• LM3 - Unrelated Profiles
Recent Posts
• LM4 - Official Profiles
• LM5 - Affiliate Profiles
• LM6 - Unrelated Profiles
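To illustrate the idea, here is a sketch that assumes a unigram model with add-one smoothing, trained separately on the text of each class. The slides do not give the model order or the smoothing method, so both are assumptions, and `UnigramLM` is a hypothetical name.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram model over lowercase whitespace tokens."""

    def __init__(self, texts):
        self.counts = Counter(tok for text in texts for tok in text.lower().split())
        self.total = sum(self.counts.values())
        # One extra vocabulary slot stands in for any unseen token.
        self.vocab_size = len(self.counts) + 1

    def log_prob(self, text):
        """Log-probability of the text under this class's model."""
        denominator = self.total + self.vocab_size
        return sum(
            math.log((self.counts[tok] + 1) / denominator)
            for tok in text.lower().split()
        )
```

Under this sketch, LM1 to LM3 would be the scores of a profile's description under the three description models, and LM4 to LM6 the scores of its recent posts under the three post models.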
16
Ground Truth Breakdown
Twitter Labels: 3381 labels from 228 organizations
Official: 232 (7%)
Affiliate: 675 (20%)
Unrelated: 2474 (73%)
Facebook Labels: 3403 labels from 216 organizations
Official: 145 (4%)
Affiliate: 491 (14%)
Unrelated: 2767 (81%)
17
Per-Fold Evaluation Process
1. Training set (official, affiliate, and unrelated profiles) is used to train the classifier.
2. Test set is filtered for official and affiliate profiles.
3. Obtain list of organizations that own these profiles.
4. Names used to query system, results used to calculate performance. The classifier labels each returned profile as Classified Official, Classified Affiliate, or Classified Unrelated.
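Steps 2 to 4 can be sketched as below. The data layout, the `query_system` callable, and the way predictions are matched to ground truth (by organization and profile id) are all assumptions made for illustration; the slides only describe the process at a high level.

```python
def precision_recall_f1(predicted, actual):
    """Set-based precision, recall, and F1 for one class."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def evaluate_fold(test_profiles, query_system, label):
    """Score one fold for one class ("official" or "affiliate").

    test_profiles: iterable of (organization, profile_id, true_label).
    query_system:  hypothetical callable mapping an organization name to a
                   list of (profile_id, predicted_label) pairs.
    """
    # Step 2: keep only official and affiliate profiles from the test set.
    kept = [p for p in test_profiles if p[2] in ("official", "affiliate")]
    # Step 3: the organizations that own those profiles.
    organizations = sorted({org for org, _, _ in kept})
    # Step 4: query the system by name and compare against ground truth.
    actual = {(org, pid) for org, pid, true in kept if true == label}
    predicted = {
        (org, pid)
        for org in organizations
        for pid, predicted_label in query_system(org)
        if predicted_label == label
    }
    return precision_recall_f1(predicted, actual)
```

Repeating this for every fold and averaging would give one precision, recall, and F1 figure per class.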
18
Baseline
Simulates manually judging profiles by name alone.
N1 - Normalized Edit Distance: Query to Username
N2 - Normalized Edit Distance: Query to Display Name
19
Results - Twitter
Official
            F1      Precision  Recall
Baseline    60.4%   80.1%      48.5%
Final       93.5%   97.5%      89.9%
Affiliate
            F1      Precision  Recall
Baseline    67.0%   74.0%      61.1%
Final       92.3%   94.9%      89.8%
20
Results - Facebook
Official
            F1      Precision  Recall
Baseline    81.0%   79.4%      82.6%
Final       93.3%   96.2%      90.6%
Affiliate
            F1      Precision  Recall
Baseline    48.9%   59.0%      41.8%
Final       67.0%   75.3%      60.4%
21
Profile Types
Facebook has multiple profile types: people, pages, places, groups, etc.
Twitter has just one: people.
Affiliates?
Why don’t FB affiliates score as well? Page usernames are optional.
/pages/Netflix-Latinoamérica/553454298124413
(URL pattern: /pages/<Display Name>/<ID>)
Auto-generated pages also follow the same pattern!
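One way to detect such username-less pages is to match the path shape shown above. The pattern below is inferred from that single example rather than from Facebook documentation, and `parse_page_path` is a hypothetical helper. It returns the display-name slug and numeric ID, or `None` for any other path.

```python
import re

# Assumed shape: /pages/<display-name slug>/<numeric id>
PAGE_PATH = re.compile(r"^/pages/(?P<name>[^/]+)/(?P<id>\d+)/?$")


def parse_page_path(path):
    """Return (display_name_slug, page_id), or None if the path has another shape."""
    match = PAGE_PATH.match(path)
    if match is None:
        return None
    return match.group("name"), match.group("id")
```

A profile whose path matches has no username, which presumably leaves the username-based features (N1, N4) without useful input.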
24
Future?
Focus on affiliates – unique to the domain.
Drill down into the different affiliate types, e.g. outreach, regional, brand, business unit.
Improve ground truth: crowd-source labels.
27
Done
Objective: develop a system to find an organization’s profiles across different social networks.
Used network-specific classifiers to do so.
Evaluated performance using modified cross-validation.
Future
Dive deeper into affiliates, which are unique to organizations.