Social Curation of Large Multimedia Collections on a Microsoft Azure Cloud

Dazhi Chong, Samuel Coppage, Xiangyi Gu, Harris Wu, Kurt Maly, Mohammad Zubair
[email protected]
Old Dominion University, Department of Computer Science
DH 2012, Hamburg, 19 July 2012



Outline

- Faceted Classification System and scalability issues
- Implementation and deployment on a cloud
- Evaluation and user studies
- Conclusions


Faceted Classification System and scalability issues
- Web-based application that allows users to collaboratively organize multimedia collections into a faceted classification
- As a social application, it must handle:
  - many users
  - varying levels of network traffic
- A traditional on-premises deployment cannot handle:
  - an increasing number of users
  - numerous evolving classification schemas
  - large document collections
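As a concrete picture of what organizing a collection into a faceted classification involves, here is a minimal in-memory sketch. It assumes that each facet holds categories and each category holds item IDs, and it assumes the usual faceted-browsing convention of OR within a facet and AND across facets. The class and method names (`FacetedCollection`, `classify`, `browse`) are hypothetical and do not come from the system described in these slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class FacetedCollection:
    """Hypothetical model: facet -> category -> set of item IDs."""

    def __init__(self) -> None:
        self.facets: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def classify(self, item: str, facet: str, category: str) -> None:
        # An item may be placed under categories of several facets at once.
        self.facets[facet][category].add(item)

    def browse(self, selection: Dict[str, Iterable[str]]) -> Set[str]:
        """Items matching a selection: OR within a facet, AND across facets."""
        result: Optional[Set[str]] = None
        for facet, categories in selection.items():
            matches: Set[str] = set()
            for category in categories:
                # .get() avoids creating empty entries in the defaultdicts.
                matches |= self.facets.get(facet, {}).get(category, set())
            result = matches if result is None else result & matches
        return result if result is not None else set()


if __name__ == "__main__":
    c = FacetedCollection()
    c.classify("photo1", "Location", "Virginia")
    c.classify("photo1", "Event", "Wreck")
    c.classify("photo2", "Location", "Alabama")
    c.classify("photo2", "Event", "Wreck")
    print(sorted(c.browse({"Event": ["Wreck"], "Location": ["Virginia", "Alabama"]})))  # ['photo1', 'photo2']
    print(sorted(c.browse({"Event": ["Wreck"], "Location": ["Virginia"]})))  # ['photo1']
```

Every browse request intersects item sets across facets, so the cost grows with collection size and with the number of concurrent users, which is the scaling pressure the slide describes.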


Faceted Classification System and scalability issues


Faceted Classification System and scalability issues

- New features require even more resources:
  - personal classification schemas
  - a history feature showing the evolution of the classification over time
- Decision: move to a cloud platform
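The history feature has to answer "what did the classification look like at time t?". One simple way to sketch this, assuming that schema snapshots are committed with strictly increasing timestamps, is a sorted list of snapshots searched with `bisect`. The `SchemaHistory` class and its methods are hypothetical illustrations, not the system's actual design.

```python
import bisect
import copy
from typing import Dict, List, Optional, Set

Schema = Dict[str, Set[str]]  # facet name -> category names


class SchemaHistory:
    """Hypothetical append-only store of timestamped schema snapshots."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._snapshots: List[Schema] = []

    def commit(self, timestamp: float, schema: Schema) -> None:
        if self._times and timestamp <= self._times[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self._times.append(timestamp)
        # Deep-copy so later edits to the live schema cannot rewrite history.
        self._snapshots.append(copy.deepcopy(schema))

    def as_of(self, timestamp: float) -> Optional[Schema]:
        """Schema as it stood at `timestamp`, or None if that predates all commits."""
        i = bisect.bisect_right(self._times, timestamp)
        return copy.deepcopy(self._snapshots[i - 1]) if i else None


if __name__ == "__main__":
    history = SchemaHistory()
    live = {"Event": {"Wreck"}}
    history.commit(1.0, live)
    live["Event"].add("Competition")
    history.commit(2.0, live)
    print(history.as_of(1.5))  # {'Event': {'Wreck'}}
    print(history.as_of(0.5))  # None
```

Storing full snapshots is the simplest choice. With many users, each holding personal schemas, storing diffs instead would reduce storage at the cost of slower reconstruction. Either way, the extra storage and computation help explain why the slide lists this feature as a resource driver.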


The click-and-drag classification screen


Global and personal (or local) schemas


Faceted Classification System and scalability issues


Microsoft Windows Azure vs. Amazon Elastic Compute Cloud
- Microsoft Windows Azure: a Platform as a Service (PaaS) cloud
  - hides the management and operational side from users
  - lets developers focus on development and solving business problems
- Amazon Elastic Compute Cloud: an Infrastructure as a Service (IaaS) cloud
  - allows users to deploy new technologies and adopt new capabilities


Microsoft Windows Azure vs. Amazon Elastic Compute Cloud
- Both offer reliability and scalability
- Windows Azure is more suitable for applications with a variable load or a short or unpredictable lifetime
- The Azure platform was chosen because it offers the most managed environment
- The choice between the platforms comes down to the best fit for the company, its developers, and its users


Implementation and deployment on Azure cloud platform
- First step: convert Joomla 1.6.3 to work with Azure SQL
- Second step: convert the Faceted Classification System packages from MySQL to Azure SQL
- Third step: full configuration of the system
- Last step: configuration of the whole project and deployment
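Much of the work in the first two steps consists of rewriting MySQL-specific SQL into the T-SQL dialect that Azure SQL accepts. Two common differences are identifier quoting (backticks in MySQL, square brackets in T-SQL) and row limiting (`LIMIT n` versus `TOP (n)`). The hypothetical helper below handles only those two cases for a simple single `SELECT`. It sketches the kind of rewriting involved and is not the tool the authors used; the table and column names in the usage line are made up.

```python
import re


def mysql_to_tsql(query: str) -> str:
    """Hypothetical helper: rewrite two common MySQL-isms into T-SQL.

    Handles only backtick-quoted identifiers and a trailing LIMIT n on a
    single simple SELECT. Naive: backticks inside string literals are
    rewritten too, and other dialect differences are left untouched.
    """
    # `identifier` -> [identifier]
    q = re.sub(r"`([^`]*)`", r"[\1]", query)
    # Trailing "LIMIT n" on a SELECT -> "SELECT TOP (n) ..."
    m = re.search(r"\s+LIMIT\s+(\d+)\s*;?\s*$", q, flags=re.IGNORECASE)
    if m and re.match(r"\s*SELECT\s", q, flags=re.IGNORECASE):
        q = q[: m.start()]
        q = re.sub(r"^\s*SELECT\s+", f"SELECT TOP ({m.group(1)}) ", q,
                   count=1, flags=re.IGNORECASE)
    return q


if __name__ == "__main__":
    print(mysql_to_tsql("SELECT `id`, `title` FROM `items` WHERE `published` = 1 LIMIT 10"))
    # SELECT TOP (10) [id], [title] FROM [items] WHERE [published] = 1
```

A production port would rely on a real SQL parser or on Joomla's own database abstraction layer rather than regular expressions. The sketch only shows why porting a MySQL-based application is more than a connection-string change.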


Implementation and deployment on Azure cloud platform


Implementation and deployment on Azure cloud platform


Design of the cloud-based web application
- Final design of the current deployment:
  - by default, the web role can run 20 instances (more if needed)
  - Azure manages load balancing (round-robin algorithm; performance and failover modes in beta) and seamlessly redirects users
  - all data is now stored in Azure SQL
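Azure performs the load balancing across web-role instances itself, so the application never implements it. The sketch below only illustrates the round-robin idea with a simple form of failover (unhealthy instances are skipped). The instance names and the `RoundRobinBalancer` class are hypothetical.

```python
from typing import Dict, List


class RoundRobinBalancer:
    """Illustrative round-robin selection that skips unhealthy instances."""

    def __init__(self, instances: List[str]) -> None:
        self.instances = list(instances)
        self.healthy: Dict[str, bool] = {name: True for name in self.instances}
        self._next = 0

    def mark(self, name: str, healthy: bool) -> None:
        self.healthy[name] = healthy

    def pick(self) -> str:
        # Try each instance at most once, starting where the last pick left off.
        for _ in range(len(self.instances)):
            name = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if self.healthy[name]:
                return name
        raise RuntimeError("no healthy instances")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web0", "web1", "web2"])
    print([lb.pick() for _ in range(4)])  # ['web0', 'web1', 'web2', 'web0']
    lb.mark("web1", False)
    print([lb.pick() for _ in range(3)])  # ['web2', 'web0', 'web2']
```

Round-robin assumes that instances are interchangeable. This holds here because, as the slide notes, all data lives in Azure SQL rather than on the individual web-role instances.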


Design of the cloud-based web application


Advantages and disadvantages of deployment on the cloud platform
- Advantages: high availability, reliability, and scalability
- Disadvantages: Azure SQL is a new product
  - it lacks features of the full MSSQL database
  - it has no profiler
  - import and export are rudimentary


Advantages and disadvantages of deployment on the cloud platform
- Biggest drawback: performance of the Microsoft SQL Server Driver for PHP
  - timing the query statements themselves showed no unusual delays
  - fetching results with sqlsrv_fetch_array() and sqlsrv_fetch_object() caused delays of up to 20 seconds in rendering web pages
- Deployment of a web application should weigh all the benefits and drawbacks
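The diagnosis above relies on timing query execution separately from result fetching. The sketch below shows that measurement pattern in Python, using the standard-library `sqlite3` module only as a stand-in database so that it runs anywhere. The original measurements were made in PHP against Azure SQL through the sqlsrv driver. The `time_query` function and the `items` table are hypothetical.

```python
import sqlite3
import time
from typing import Dict, List, Tuple


def time_query(conn: sqlite3.Connection, sql: str) -> Tuple[List[tuple], Dict[str, float]]:
    """Run `sql`, timing statement execution and result fetching separately."""
    cur = conn.cursor()
    t0 = time.perf_counter()
    cur.execute(sql)       # phase 1: send and execute the statement
    t1 = time.perf_counter()
    rows = cur.fetchall()  # phase 2: pull every result row into the client
    t2 = time.perf_counter()
    cur.close()
    return rows, {"execute_s": t1 - t0, "fetch_s": t2 - t1}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO items (title) VALUES (?)",
                     [(f"item {i}",) for i in range(1000)])
    rows, timing = time_query(conn, "SELECT id, title FROM items")
    print(len(rows), timing)
```

How the work is split between the two phases depends on the driver. SQLite, for example, produces most rows lazily while they are fetched. A slow fetch phase therefore shows where the wall-clock time goes, but on its own it does not pin the cost on the driver rather than on the server or the network.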


Evaluation
- User studies with classes on information technologies (Spring and Fall 2011)
- Students had to develop personal facet schemas
- The personal schemas were merged into the global schema


Initial page with only a few facets


Page without & with user facets


Item detail screen without & with facets and tags


Merging of Personal Facets

Global schema:
- Good facet/category definitions
- Useful for most users
- Optimized
- Wide coverage

Personal schema:
- For personal use
- May contain non-facet schemas
- Personal wording for facets, categories, and tags
- Narrow coverage

Approach: evaluate all the personal schemas, find the most widely used facets, categories, and items, use the similarity of concepts, and enrich or reconstruct the global schema.


Sample algorithm component


Popularity rules:
- New facet: (1) it does not exist in the global schema; (2) it is used in more than half of the personal schemas.
- New category: (1) neither it nor a "similar" category exists in the old global facet; (2) the personal facet containing the new category is similar to the old global facet; (3) more than half of the users who have the (similar) global facet have the new category under it.

"Similar": two entities are similar when they are either WordNet-similar or structurally similar.
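The two popularity rules can be sketched as follows. This is a minimal, hypothetical rendering and not the authors' implementation. Schemas are modeled as dictionaries that map facet names to category sets. The WordNet/structural similarity test is replaced by a stub that treats two names as similar when they are equal (ignoring case) or when a lookup table gives a score of at least 0.5, which is an assumed threshold. The table is seeded with the scores reported in Example-2 of this deck. When several similar names compete, the sketch prefers the most frequent spelling.

```python
from collections import Counter
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]  # facet name -> category names

# Stub standing in for WordNet/structural similarity (scores from Example-2).
SCORES = {
    frozenset(("year", "time")): 0.5528,
    frozenset(("crash", "wreck")): 1.0,
    frozenset(("new york", "ny")): 1.0,
    frozenset(("virginia", "va")): 1.0,
}


def similar(a: str, b: str, threshold: float = 0.5) -> bool:
    a, b = a.lower(), b.lower()
    return a == b or SCORES.get(frozenset((a, b)), 0.0) >= threshold


def new_facets(global_schema: Schema, personal: List[Schema]) -> List[str]:
    counts = Counter(f for s in personal for f in s)
    added: List[str] = []
    for f in sorted(counts, key=lambda name: (-counts[name], name)):
        # Rule (1): skip facets already in the global schema (or a similar one),
        # including facets added earlier in this pass.
        if any(similar(f, g) for g in list(global_schema) + added):
            continue
        # Rule (2): used (up to similarity) in more than half of the personal schemas.
        users = sum(1 for s in personal if any(similar(f, p) for p in s))
        if users > len(personal) / 2:
            added.append(f)
    return added


def new_categories(global_schema: Schema, personal: List[Schema]) -> Dict[str, List[str]]:
    added: Dict[str, List[str]] = {}
    for g, g_cats in global_schema.items():
        # Rule (2): per user, categories of the personal facets similar to g.
        owners: List[List[str]] = []
        for s in personal:
            facets = [p for p in s if similar(p, g)]
            if facets:
                owners.append([c for p in facets for c in s[p]])
        counts = Counter(c for cats in owners for c in cats)
        for c in sorted(counts, key=lambda name: (-counts[name], name)):
            # Rule (1): neither c nor a similar category is already under g.
            if any(similar(c, x) for x in list(g_cats) + added.get(g, [])):
                continue
            # Rule (3): more than half of those users have c (or a similar category).
            have = sum(1 for cats in owners if any(similar(c, x) for x in cats))
            if have > len(owners) / 2:
                added.setdefault(g, []).append(c)
    return added


if __name__ == "__main__":
    global_old = {"Event": {"Group action", "Competition", "Wreck"},
                  "Location": {"Alabama", "Virginia"},
                  "Source": {"Newspaper", "Internet"}}
    personal = [  # hypothetical users, loosely modeled on Example-1
        {"Location": {"VA", "New York"}, "Year": {"before 2000", "after 2000"}},
        {"Location": {"Virginia", "New York"}, "Year": {"before 2000"}},
        {"Location": {"NY"}, "Event": {"Crash"}, "Time": {"1998"}},
    ]
    print(new_facets(global_old, personal))      # ['Year']
    print(new_categories(global_old, personal))  # {'Location': ['New York']}
```

With this made-up input, the sketch adds a Year facet and a New York category under Location, which matches the changes shown in the new global schema of Example-2. Crash is not added because it is similar to the existing Wreck category. Choosing the categories of a newly added facet is left out of the sketch.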


Example-1:


Global schema (old):

Event
- Group action
- Competition
- Wreck

Location
- Alabama
- Virginia

Source
- Newspaper
- Internet

Personal schema:

Space
- VA
- New York
- Alabama

Quality
- Good
- Bad

Time
- 1998
- 2006
- 2010

Event
- Activity
- Crash
- Happening

Tom
- OK
- Not Ok

Position
- NY
- Virginia

Jason
- Favorite
- Dislike

Year
- before 2000
- after 2000


Example-2:

New global schema:

Event
- Group action
- Competition
- Wreck

Location
- Alabama
- Virginia
- New York

Source
- Newspaper
- Internet

Year
- Before 2000
- After 2000

Similarity: S(Year, Time) = 0.5528, S(Crash, Wreck) = 1, S(New York, NY) = 1, S(Virginia, VA) = 1


Conclusions

- A cloud can solve the scalability issues of:
  - compute-intensive features such as schema merging and history (schema evolution)
  - many simultaneous users
- Porting a complex application to the cloud is a daunting task, not for the uninitiated