Feature Selection with Linked Data in Social Mediacse.msu.edu/~tangjili/publication/SDM12.pdf ·...

Data Mining and Machine Learning Lab

Feature Selection with Linked Data in Social Media

Jiliang Tang and Huan Liu

Computer Science and Engineering

Arizona State University

April 26-28, 2012 SDM2012

Social Media

• The explosion of social media generates massive data at an unprecedented rate

- 200 million Tweets per day

- 3,000 photos on Flickr per minute

- 153 million blogs posted per year

Social Media Data

• Massive, high-dimensional social media data poses challenges to data mining tasks

- Scalability

- Curse of dimensionality

• Feature selection is an effective way to prepare large-scale, high-dimensional data for data mining

Feature Selection

• Traditional feature selection algorithms work with "flat" data (attribute-value data)

- Assumed to be independent and identically distributed (i.i.d.)

• Social media data differs from attribute-value data

- Inherently linked

An Example of Social Media Data

[Figure: example posts (p1, p2, p3, p5, ...) connected through user-post relations and user-user following relations]

Representation for Attribute Value Data

[Figure: posts (e.g., p7, p8) described by features f1, f2, ..., fm and labels c1, ..., ck]

Representation for Social Media Data

[Figure: the same posts (e.g., p7, p8) with features f1, f2, ..., fm and labels c1, ..., ck, plus a social context of users u1, u2, u3, u4 linked by user-post relations and user-user relations]

Problem Statement

• Given labeled data X with its label indicator matrix Y, the whole dataset F, and its social context (user-user following relationships S and user-post relationships P), we aim to select the K most relevant features out of the m features of F by exploiting S and P.
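To make the inputs of the problem statement concrete, the following is a minimal sketch of how X, Y, F, S and P might be laid out as arrays. The toy sizes, the row-per-post layout, the direction conventions for S and P, the choice of X as the first rows of F, and the helper `check_inputs` are all assumptions made for illustration; the slides do not fix them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes (not from the slides).
N, n_labeled, m, k, n_users = 6, 4, 5, 2, 3
K = 2  # number of features to select, K <= m

F = rng.random((N, m))             # whole dataset: one row per post (assumed layout)
X = F[:n_labeled]                  # labeled data, assumed here to be the first rows of F
labels = np.array([0, 1, 0, 1])    # toy class ids for the labeled posts
Y = np.eye(k)[labels]              # label indicator matrix: one-hot row per labeled post

# Assumed convention: S[i, j] = 1 means user i follows user j.
S = np.zeros((n_users, n_users), dtype=int)
S[0, 1] = 1
S[2, 1] = 1

# Assumed convention: P[i, j] = 1 means user i wrote post j.
P = np.zeros((n_users, N), dtype=int)
P[0, [0, 1]] = 1
P[1, [2, 3]] = 1
P[2, [4, 5]] = 1


def check_inputs(X, Y, F, S, P, K):
    """Check that the shapes of the inputs are mutually consistent."""
    n_posts, n_features = F.shape
    assert X.shape[1] == n_features      # X and F share the same m features
    assert Y.shape[0] == X.shape[0]      # one label row per labeled post
    assert S.shape[0] == S.shape[1]      # user-user matrix is square
    assert P.shape == (S.shape[0], n_posts)
    assert 0 < K <= n_features
    return True


inputs_ok = check_inputs(X, Y, F, S, P, K)
```

A feature selector for this setting would take these five objects plus K and return K column indices of F; how it uses S and P is the subject of the slides that follow.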

Two Fundamental Problems

• Relation extraction

- What distinctive relations can be extracted from linked data?

• Mathematical representation

- How can these relations be used in the feature selection formulation?


Relation Extraction

• coPost

- A user can have multiple posts

• coFollowing

- Two users follow a third user

• coFollowed

- Two users are followed by a third user

• Following

- A user follows another user
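The four relation types can be read off the social context with a few matrix operations. The sketch below is one possible way to do this, not the authors' implementation. It assumes S[i, j] = 1 means user i follows user j, P[i, j] = 1 means user i wrote post j, and that no user follows themselves (so a shared neighbour is always a genuine third user). The function names and the toy data are hypothetical.

```python
from itertools import combinations

import numpy as np


def co_post_pairs(P):
    """Pairs of posts written by the same user."""
    pairs = set()
    for user_posts in P:
        idx = np.flatnonzero(user_posts).tolist()
        pairs.update(combinations(idx, 2))
    return pairs


def _pairs_with_shared_neighbour(counts):
    """Unordered user pairs (i < j) whose shared-neighbour count is positive."""
    n = counts.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n) if counts[i, j] > 0}


def co_following_pairs(S):
    """Pairs of users who both follow some third user."""
    # (S @ S.T)[i, j] = number of users followed by both i and j.
    return _pairs_with_shared_neighbour(S @ S.T)


def co_followed_pairs(S):
    """Pairs of users who are both followed by some third user."""
    # (S.T @ S)[i, j] = number of users who follow both i and j.
    return _pairs_with_shared_neighbour(S.T @ S)


def following_pairs(S):
    """Ordered pairs (i, j) where user i follows user j."""
    return {(int(i), int(j)) for i, j in np.argwhere(S > 0)}


# Toy social context: users 0 and 1 follow user 2; user 3 follows users 0 and 1.
S = np.array([
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
])

# User 0 wrote posts 0 and 1; users 1, 2, 3 wrote posts 2, 3, 4 respectively.
P = np.array([
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
])

co_post = co_post_pairs(P)
co_following = co_following_pairs(S)
co_followed = co_followed_pairs(S)
following = following_pairs(S)
```

In this toy example users 0 and 1 are both coFollowing (they share the followee 2) and coFollowed (they share the follower 3). Only coPost yields post pairs directly; the other three are user-level relations here.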

Post-Post relations

• What do these relations suggest for posts?

Social Correlation Theories

• Homophily

- People with similar interests are more likely to be linked

• Social influence

- People who are linked are more likely to have similar interests

CoPost Hypothesis

• CoPost Hypothesis

- Posts by the same user are more likely to be of
