Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval
Presenter: Andy Lim

Paper Topic
• Folksonomy
• Social media sharing platforms

The Problem
• Rise in popularity of social image and video sharing platforms
• Low precision of tag-based media retrieval
• Tags are:
    ◦ Noisy
    ◦ Ambiguous
    ◦ Incomplete
    ◦ Subjective
• Lack of constraints: tags are free text (e.g. "djfja;sldfkj")

Example tags: hotdog, chinese, trololol, aidjishi, sandwich, bread

Previous Research (Internal)
• Improving tag relevance
• Sigurbjörnsson and van Zwol
    ◦ Developed a method for recommending a set of relevant tags based on tag popularity
• Li et al.
    ◦ Retrieve all images for a given tag and determine tag relevance from visual similarity
• All of these methods are confined to the noisy tags within the primary dataset

The Approach
• Internal vs. external sources of information
• Leverage external auxiliary sources of information to improve target tagging systems, which are presumably much noisier
• Exploit the disparate characteristics of the target domain using the auxiliary source
• Note: what is the optimal level of joint modeling such that the target domain still benefits from the auxiliary source?

Assumptions
• There is a common underlying subspace shared by the primary and secondary domains
• The primary domain is much noisier than the secondary domains

Nonnegative Matrix Factorization
• Approximate X ≈ FH, with all entries of F and H nonnegative
• X (M × N data matrix): N documents, each represented over M vocabulary words
• F (M × R nonnegative matrix): R basis vectors
• H (R × N nonnegative matrix): coordinates of each document in that basis
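As a rough illustration of the factorization above, here is a minimal NumPy sketch of plain NMF that minimizes the squared Frobenius error ||X − FH||² with the well-known Lee and Seung multiplicative updates. The choice of update rule, the random initialization and the function name `nmf` are assumptions made for this sketch; the slides do not say how the factorization is computed.

```python
import numpy as np


def nmf(X, R, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative M x N matrix X as X ~= F @ H (sketch).

    F is M x R (basis vectors), H is R x N (document coordinates).
    Assumption: Lee-Seung multiplicative updates for the squared Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    F = rng.random((M, R)) + eps
    H = rng.random((R, N)) + eps
    for _ in range(n_iter):
        # Update coordinates with the basis fixed, then the basis with H fixed.
        # eps in the denominators guards against division by zero.
        H *= (F.T @ X) / (F.T @ F @ H + eps)
        F *= (X @ H.T) / (F @ (H @ H.T) + eps)
    return F, H
```

Because every update multiplies by a nonnegative ratio, F and H never leave the nonnegative orthant, which is what makes the columns of F interpretable as additive "topics" over the tag vocabulary.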

Joint Shared Nonnegative Matrix Factorization (JSNMF)
• Input:
    ◦ X (target domain), Y (auxiliary domain)
    ◦ R1 and R2 (dimensionality of the underlying subspaces of X and Y)
    ◦ K (number of shared basis vectors)
• Output:
    ◦ W (joint shared subspace)
    ◦ U (remaining subspace in the target domain), V (remaining subspace in the auxiliary domain)
    ◦ H (coordinate matrix for the target domain), L (coordinate matrix for the auxiliary domain)
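A minimal sketch of how the five outputs could be learned, assuming the model X ≈ [W U] H and Y ≈ [W V] L with an unweighted sum of squared errors over both domains. The paper's exact objective (for instance, any weighting between the two domains) is not given on these slides, and `jsnmf` is a hypothetical name. With K shared basis vectors, U has R1 − K columns and V has R2 − K columns, so K = 0 means no sharing and K = R1 = R2 means full sharing.

```python
import numpy as np


def jsnmf(X, Y, R1, R2, K, n_iter=300, eps=1e-9, seed=0):
    """Sketch of joint shared NMF:  X ~= [W U] @ H,  Y ~= [W V] @ L.

    X: M x N1 target-domain matrix, Y: M x N2 auxiliary-domain matrix, both
    over the same M-word vocabulary. W (M x K) is shared, U (M x R1-K) and
    V (M x R2-K) are domain specific. Assumed loss (unweighted):
        0.5*||X - [W U]H||_F^2 + 0.5*||Y - [W V]L||_F^2
    """
    assert 0 <= K <= min(R1, R2)
    rng = np.random.default_rng(seed)
    M = X.shape[0]
    W = rng.random((M, K)) + eps
    U = rng.random((M, R1 - K)) + eps
    V = rng.random((M, R2 - K)) + eps
    H = rng.random((R1, X.shape[1])) + eps
    L = rng.random((R2, Y.shape[1])) + eps
    for _ in range(n_iter):
        F, G = np.hstack([W, U]), np.hstack([W, V])
        # Coordinate updates: ordinary NMF updates with each basis held fixed.
        H *= (F.T @ X) / (F.T @ F @ H + eps)
        L *= (G.T @ Y) / (G.T @ G @ L + eps)
        # Reconstructions with the current (not yet updated) basis.
        Xh, Yh = F @ H, G @ L
        # First K rows of H / L weight the shared basis W; the rest weight U / V.
        Hw, Hu = H[:K], H[K:]
        Lw, Lv = L[:K], L[K:]
        # The shared basis collects terms from both domains.
        W *= (X @ Hw.T + Y @ Lw.T) / (Xh @ Hw.T + Yh @ Lw.T + eps)
        U *= (X @ Hu.T) / (Xh @ Hu.T + eps)
        V *= (Y @ Lv.T) / (Yh @ Lv.T + eps)
    return W, U, V, H, L
```

Under this assumed loss, the joint update of W, U and V is just the standard NMF basis update applied to the concatenated data [X Y], which is why the W step pools numerators and denominators from both domains while U and V only see their own domain.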

Retrieval using JSNMF
• Input: W, U, H, query sentence SQ, number of images (or videos) to retrieve N, and the image (or video) dataset
• Output: the top N retrieved images (or videos)
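The slide gives only the inputs and outputs of retrieval. One plausible pipeline, sketched here under stated assumptions: turn the query sentence SQ into a tag-count vector over the vocabulary, find its nonnegative coordinates in the target subspace [W U] with W and U held fixed, and rank the target items (the columns of H) by cosine similarity to those coordinates. The whitespace tokenization, the cosine ranking and the helper names `project`, `retrieve` and `vocab` are hypothetical choices for illustration, not the paper's stated procedure.

```python
import numpy as np


def project(F, q, n_iter=200, eps=1e-9):
    """Nonnegative coordinates h with F @ h ~= q, F fixed (multiplicative updates)."""
    h = np.full(F.shape[1], 1.0)
    for _ in range(n_iter):
        h *= (F.T @ q) / (F.T @ (F @ h) + eps)
    return h


def retrieve(W, U, H, query, vocab, N):
    """Indices of the top-N target items for a free-text query (sketch).

    vocab: hypothetical dict mapping a tag word to its row in the tag matrix.
    Assumption: items are ranked by cosine similarity in the learned subspace.
    """
    F = np.hstack([W, U])  # target-domain basis: shared part + target-specific part
    q = np.zeros(F.shape[0])
    for word in query.lower().split():
        if word in vocab:  # words outside the vocabulary are ignored
            q[vocab[word]] += 1.0
    hq = project(F, q)
    # Cosine similarity between the query coordinates and every column of H.
    sims = (H.T @ hq) / (np.linalg.norm(H, axis=0) * np.linalg.norm(hq) + 1e-12)
    return np.argsort(-sims)[:N]
```

For example, with a toy basis whose shared column covers "water" and "river" and whose target-specific column covers "road" and "bus", the query "River water" ranks the items weighted mostly on the first column ahead of the road/bus item. Matching in the low-dimensional subspace rather than on raw tags is what lets a query term retrieve items that lack that exact tag.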

Experiment
• Use LabelMe tags (auxiliary) to improve:
    ◦ Image retrieval in Flickr
    ◦ Video retrieval in YouTube
• Why LabelMe?
    ◦ Object-level image tagging
    ◦ Controlled vocabulary

Flickr Dataset
• Downloaded 50,000 images from Flickr
• Average number of distinct tags per image = 8
• Removed:
    ◦ Rare tags (appearing fewer than 5 times)
    ◦ Images with no tags or with non-English tags
• Obtained 20,000 labeled images
• 7,000 of these examples are kept for investigating an internal auxiliary dataset
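The cleanup above can be sketched as turning per-item tag lists into the M × N matrix X that the factorization consumes. This sketch assumes binary tag presence and counts each tag at most once per item when deciding whether it is rare; it leaves out the non-English filter, and `build_tag_matrix` is a hypothetical helper name.

```python
from collections import Counter

import numpy as np


def build_tag_matrix(docs, min_count=5):
    """docs: list of tag lists, one per image (or video). Sketch only.

    1. Drop tags occurring fewer than min_count times (assumption: counted
       once per document).
    2. Drop documents left with no tags.
    Returns (X, vocab, kept): X is an M x N binary tag matrix, vocab maps
    tag -> row index, kept lists the original indices of retained documents.
    Non-English filtering is not shown.
    """
    counts = Counter(tag for tags in docs for tag in set(tags))
    vocab_words = sorted(t for t, c in counts.items() if c >= min_count)
    vocab = {t: i for i, t in enumerate(vocab_words)}
    kept = [i for i, tags in enumerate(docs) if any(t in vocab for t in tags)]
    X = np.zeros((len(vocab), len(kept)))
    for col, i in enumerate(kept):
        for t in set(docs[i]):
            if t in vocab:
                X[vocab[t], col] = 1.0
    return X, vocab, kept
```

Calling it with `min_count=5` mirrors the Flickr cleanup, and `min_count=2` mirrors the YouTube and LabelMe cleanup. Filtering tags before dropping documents matters: an image whose only tags are rare ones ends up with no tags and is removed.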

YouTube Dataset
• Downloaded metadata for 18,000 videos (tags, URL, category, title, comments, etc.)
• Average number of distinct tags per video = 7
• Removed:
    ◦ Rare tags (appearing fewer than 2 times)
    ◦ Videos with no tags or with non-English tags
• Obtained a dataset of 12,000 videos
• Again, kept 7,000 examples to be used as an internal auxiliary dataset

LabelMe Dataset
• Added 7,000 tagged images from LabelMe
• Average number of distinct tags per image = 32
• Removed:
    ◦ Rare tags (appearing fewer than 2 times)
• The cleanup does not reduce the size of the dataset

Evaluation Measures
• Defined query set Q = {cloud, man, street, water, road, leg, table, plant, girl, drawer, lamp, bed, cable, bus, pole, laptop, plate, kitchen, river, pool, flower}
• Manually annotated the two datasets (Flickr and YouTube) with respect to the query set, since no benchmark dataset is available
• An image (or video) is relevant to a query term if the concept is clearly visible in it

Results with JSNMF
• Precision-scope curve
• Recall fixed at 0.1
    ◦ Users are usually only interested in the first few results
• 10% improvement
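For reference, here are the two standard quantities behind these results: precision within the top-n results (one point on a precision-scope curve) and precision at the first cutoff where recall reaches a given level such as 0.1. The slides do not spell out exactly how the fixed recall and the scope interact in the reported curve, so treat this as a sketch of the usual definitions; the function names are hypothetical. The input is a 0/1 relevance list in rank order, as produced by the manual annotation.

```python
def precision_at_scope(ranked_relevance, n):
    """Fraction of the top-n retrieved items that are relevant (0/1 list in rank order)."""
    return sum(ranked_relevance[:n]) / n


def precision_at_recall(ranked_relevance, recall_level):
    """Precision at the first cutoff whose recall reaches recall_level.

    Returns 0.0 if the ranking contains no relevant items at all.
    """
    total = sum(ranked_relevance)
    hits = 0
    for n, rel in enumerate(ranked_relevance, start=1):
        hits += rel
        if total and hits / total >= recall_level:
            return hits / n
    return 0.0
```

For a single query, `precision_at_scope([1, 0, 1, 1], 4)` is 0.75. Averaging either quantity over every term in the query set Q gives one number per method, which is how two configurations (for example, different values of K) can be compared.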

Results with JSNMF
• Under-representation
    ◦ Shares very few basis vectors
• Over-representation
    ◦ Forces many basis vectors to represent both datasets
• Appropriate level of representation lies between these two extremes

Flickr Retrieval Results

• Results are better with LabelMe

• As recall increases, precision decreases

• When K = 0 (no sharing) or K = 40 (full sharing), precision is lower than with K = 15

YouTube Retrieval Results

• Similar to the Flickr results

Extra Notes & Questions?

• The model can be extended to more than two datasets

• The generic model can be applied to other data mining tasks