Collaborative Digital Library Services in a Cloud

Kurt Maly, Harris Wu, Mohammad Zubair, Milena Mektesheva

Department of Computer Science, Old Dominion University, Norfolk, VA, USA

Service Computation 2010, November 21-26, 2010 - Lisbon


Outline

1. Introduction
   What is the main issue with traditional computing?

2. Background
   The existing facet-based system and the compute-intensive nature of some of its features.

3. Evaluation and scaling issues
   The evaluation of the Facet System.
   The scaling issues of traditional computing.

4. Cloud development architecture
   Moving the system from LAMP to PHP and MySQL on Windows Azure.

5. Future Work


Introduction

We have developed a web-based system that allows users to collaboratively organize large online multimedia collections into an evolving faceted classification.

The system includes backend algorithms that systematically enrich the classification and automatically classify documents.

Evaluation of the prototype system (Facet System) shows promise, and identifies some issues.


Introduction

One major issue: the scalability of the system on traditional server implementations.

Traditional computing cannot support an ever-increasing number of users, documents, schema objects, schema history, and automated classification processes without difficult, expensive, and time-consuming resource reconfiguration.

To address this problem, we propose moving our system to the cloud-based Microsoft Windows Azure platform as a collaborative cloud service.


Background – the existing Facet System

browsing screen


Background – the existing Facet System

Facet classification

The personal schema gives each user a personal, persistent, idiosyncratic view of the collection.


Background – the existing Facet System

Facet classification with both global and personal schemas.

Personal schema

Global schema
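A minimal sketch of how a global and a personal schema might coexist over the same collection. The slides do not show the Facet System's data model, so the `Schema` class, the `personal_view` function, the union-based merge rule, and the sample facets are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical faceted schema: facet -> category -> set of document ids."""
    facets: dict = field(default_factory=dict)

    def classify(self, facet, category, doc_id):
        """File a document under one category of one facet."""
        self.facets.setdefault(facet, {}).setdefault(category, set()).add(doc_id)

    def browse(self, selection):
        """Return documents matching every facet -> category pair in `selection`."""
        results = None
        for facet, category in selection.items():
            docs = self.facets.get(facet, {}).get(category, set())
            results = set(docs) if results is None else results & docs
        return results if results is not None else set()


def personal_view(global_schema, personal_schema):
    """Overlay a personal schema on the global one (assumed rule: plain union)."""
    merged = Schema()
    for schema in (global_schema, personal_schema):
        for facet, categories in schema.facets.items():
            for category, docs in categories.items():
                for doc_id in docs:
                    merged.classify(facet, category, doc_id)
    return merged


# Shared classification visible to every user.
global_schema = Schema()
global_schema.classify("Period", "1940s", "photo-1")
global_schema.classify("Place", "Norfolk", "photo-1")
global_schema.classify("Place", "Norfolk", "photo-2")

# One user's private facet, invisible in the global schema.
mine = Schema()
mine.classify("My projects", "Thesis", "photo-2")

view = personal_view(global_schema, mine)
norfolk_1940s = view.browse({"Place": "Norfolk", "Period": "1940s"})
thesis_docs = view.browse({"My projects": "Thesis"})
```

In this sketch the personal facet only appears in the merged view, which is one way to keep a user's idiosyncratic organization from leaking into the shared classification.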


Background – the existing Facet System

The back-end algorithms use the metadata in personal schemas to enrich the global schema and to classify documents automatically.

When automated classification is enabled for the personal hierarchy (in user preference settings), the backend algorithms consume a significant amount of computing resources for each additional user.

Furthermore, our system supports schema history, which allows users to examine the global schema or a personal schema at any given point in time.
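One way to support "schema at any given point in time" is to keep an append-only log of schema changes and replay it up to the requested moment. The sketch below illustrates that idea only; the slides do not describe how the Facet System stores its history, and the `SchemaChange` record, its fields, and the `schema_at` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaChange:
    """Hypothetical log entry for one schema edit."""
    timestamp: int   # assumed: seconds since some fixed epoch
    action: str      # "add" or "remove"
    facet: str
    category: str


def schema_at(history, when):
    """Rebuild facet -> set of categories as of time `when` by replaying the log."""
    snapshot = {}
    for change in sorted(history, key=lambda c: c.timestamp):
        if change.timestamp > when:
            break  # later edits are not visible at time `when`
        categories = snapshot.setdefault(change.facet, set())
        if change.action == "add":
            categories.add(change.category)
        elif change.action == "remove":
            categories.discard(change.category)
    return snapshot


history = [
    SchemaChange(10, "add", "Place", "Norfolk"),
    SchemaChange(20, "add", "Place", "Newark"),
    SchemaChange(30, "remove", "Place", "Norfolk"),
]

before_removal = schema_at(history, 25)
after_removal = schema_at(history, 30)
```

Under this design the log grows with every edit by every user, and each historical view costs a replay, which is consistent with schema history being listed among the features that strain a fixed server configuration.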


Evaluation and scaling issues

The evaluation of the Facet System:

• We have evaluated the Facet System for over a year with over 300 students at Old Dominion University and the University of Delaware.

• We have tested the system by simulating a large number of users.

• Scaling proves to be a critical factor in expanding the evaluation and deploying our system for public use in a multimedia document repository.
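The slides do not say how the large number of users was simulated. As a rough illustration of the general approach, the sketch below drives a stand-in request handler from many concurrent simulated users and collects latency figures. The names `simulate_users` and `fake_handler` are hypothetical, and a real test would issue HTTP requests against the deployed system instead of calling a local function.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_users(handle_request, num_users, requests_per_user):
    """Run `num_users` concurrent simulated users and summarize request latency."""

    def one_user(user_id):
        latencies = []
        for request_no in range(requests_per_user):
            start = time.perf_counter()
            handle_request(user_id, request_no)
            latencies.append(time.perf_counter() - start)
        return latencies

    # One thread per simulated user; at least one worker so the pool is valid.
    with ThreadPoolExecutor(max_workers=max(1, num_users)) as pool:
        per_user = list(pool.map(one_user, range(num_users)))

    all_latencies = [t for user in per_user for t in user]
    if not all_latencies:
        return {"requests": 0, "mean_latency": 0.0, "max_latency": 0.0}
    return {
        "requests": len(all_latencies),
        "mean_latency": sum(all_latencies) / len(all_latencies),
        "max_latency": max(all_latencies),
    }


def fake_handler(user_id, request_no):
    """Stand-in for a page request: a small amount of CPU work."""
    return sum(i * i for i in range(1000))


report = simulate_users(fake_handler, num_users=10, requests_per_user=5)
```

Repeating the run with increasing `num_users` and watching how the latency figures change is the simplest way to see where a fixed server configuration stops keeping up.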


Evaluation and scaling issues

The scaling issues of traditional computing:

• Traditional computing cannot support an ever-increasing number of users, associated personal schemas, schema history logging, schema enrichment, and automated classification processes.

• With traditional computing, resources are typically configured rigidly with respect to both hardware and software (including licenses) to handle expected usage for a fairly short time horizon.


Evaluation and scaling issues

Our long-term vision: the cloud-based document-organization approach may go beyond organizing an online multimedia collection to organizing knowledge bases in a large enterprise or a global research community.

The cloud not only eliminates the storage limitations of desktop computers and traditional file servers, but also reduces duplicate storage and allows for value-added services such as document version control.

The Facet system on Windows Azure


Cloud development architecture

Current system: Joomla on LAMP (Linux, Apache, MySQL, and PHP).

The system using Azure: the Joomla system along with PHP and MySQL on Windows Azure.

Azure Web Role: runs the user-facing Facet System, which is programmed in PHP.

Azure Worker Role: runs the MySQL database and the backend schema enrichment and classification programs, which are written in Java.


Cloud development architecture

Overview of the Azure Cloud


Cloud development architecture

In our deployment there are different Web Roles and Worker Roles.

Two web roles:

• The FacetUI instances serve the end-user interface to the Facet System.

• The FacetAdmin role contains administrative tools that administer the database and caches.

Three worker roles:

• The MySQL instances host the MySQL database that supports the core Joomla features.

• The MemCached instances host Memcached, a popular distributed object cache system.

• The FacetBackend role contains the systematic schema enrichment and automated classification algorithms, which operate on data in SQL Azure.
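The slides do not spell out how the FacetUI instances use the MemCached instances. A common arrangement for a database-backed web front end is the cache-aside pattern, sketched below under that assumption. The in-memory dictionary stands in for Memcached, the `load_from_db` callable stands in for a MySQL query, and the `CacheAside` class and the `schema:global` key are hypothetical.

```python
class CacheAside:
    """Cache-aside lookup: read the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self._load = load_from_db   # stand-in for a MySQL query
        self._cache = {}            # stand-in for the Memcached instances
        self.db_reads = 0           # counts how often the database was hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: load from the database and remember the result.
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after the schema was edited."""
        self._cache.pop(key, None)


fake_db = {"schema:global": ["Place", "Period"]}
store = CacheAside(lambda key: fake_db[key])

first = store.get("schema:global")    # miss, reads the database
second = store.get("schema:global")   # hit, served from the cache
reads_before_edit = store.db_reads

fake_db["schema:global"] = ["Place", "Period", "Subject"]
store.invalidate("schema:global")     # an administrative tool might do this
third = store.get("schema:global")    # miss again, picks up the new schema
```

Keeping the cache in its own role means front-end instances can be added or removed without losing cached data, at the cost of an explicit invalidation step whenever the underlying data changes.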


Cloud development architecture

Architecture of proposed deployment on Windows Azure


Cloud development architecture

Deployment of Facet System on Azure Development


Future Work

On the user-oriented side: address issues that come with the large scale.

On the back-end: address scalability issues of schema enrichment and automated classification.

We will evaluate various aspects of system functionality, including both user interface and backend algorithms.

In parallel with code changes, we will develop a large test bed that allows us to test the scalability of the system.

……


Thank You!

Questions?
