19
Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION Diana-Florina Hali¸ a Darius Bufnea Departament of Computer Science Faculty of Matemathics and Computer Science Babe¸ s-Bolyai University July 5, 2013 Diana-Florina Hali¸ a, Darius Bufnea Babe¸ s-Bolyai University A SERVER-SIDESUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

A SERVER-SIDE SUPPORT LAYER FORCLIENT PERSPECTIVE TRANSPARENT

WEB CONTENT MIGRATION

Diana-Florina Halita Darius Bufnea

Departament of Computer ScienceFaculty of Matemathics and Computer Science

Babes-Bolyai UniversityJuly 5, 2013

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 2: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Table of Contents

1 Introduction

2 Migration Process Challenges

3 Content Migration

4 Mechanism, Algorithm, Implementation

5 Results and Evaluation

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 3: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Abstract

the migration process implies changes in the site structureas seen by search engines and web clients

disadvantages, such as misdirecting search enginesvisitors

problem remains unsolved for visitors landing from 3rdparty referrers

map the old visible structure of a website to the new oneadvantages of such a mechanism are:

reducing incoming dead links from 3rd party referrersassisting search engines for properly redirecting userspage rank and SERP conservationseparation between content and presentation, storageengine, business and database

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 4: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Why Content Management Systems?

easy maintenance

easy migration from one presentation to another

regular security updates

full control over the elements related to SEO

3rd party plugins

web based administration

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 5: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Why is Migration Necessary?

old CMS is deprecated

open source CMS granting easier access to support newfeatures implemented by 3rd party plugins

CMS scalability

support for new technologies: CSS3, Ajax, HTML5

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 6: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Problems

exposing to the web acertain content to a new,different URL as it waspresented in the old site

misleading visitor comingfrom search engines orthird party referrers

lose page rank or otherin-time gather benefitsinduced by back links orsocial shares

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 7: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Possible solutions

dividing the content into categorieswhat can be automatically migrated and what can’tit is desired to automate as much of the content

estimating needed time for migration

compare automatic migration with manual migrationrequired time

migration reevaluation based on guidance

evaluating the automatic migration processproblem: content’s structure and its regularitymanual migration: waste of resources, time and effort

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 8: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Similar solutions

implemented mostly as plugins inside all major CMS

take into consideration some in-time gather data based onuser behavior

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 9: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Our approach

match new URLs to old ones based on:

content similaritysemantic related information such as: URLsthe query given to the search enginesreferrer’s content similarity with the linked content

advantage of being implementable prior to the release ofthe new web site

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 10: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Content migration

static content migration to static content migrationthe base name of the file remains the sameoldsiteURL: http://oldsite/oldpath/filenamenewsiteURL: http://oldsite/newpath/filenameperfect match, similarity(oldsiteURL, newsiteURL) = 1similarity(static content, static content)

static content migration to dynamic content migration

similarity(static content, dynamic content)similarity(full content, absolute content)

dynamic content migration to dynamic content migration

similarity(dynamic content, dynamic content)similarity(absolute content, absolute content)

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 11: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Dynamic content migration

the content = content of thepage template + the absolutecontent stored in the database

We’ll take into discussion onlythe absolute content

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 12: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

In order to match these URLs, our algorithm makes use of asimilarity function, but the method is not dependent of a certainsimilarity function.

Information processing isnot done in real time

running a real timecomputation will inducedelays in serving a responseto the web clientthe batch program runscompletely independent ofthe CMS core

For a quick match we usethe cosine similarityalgorithmas implemented byApache Lucene Figure 1 : Batch processing

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 13: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

The formal presentation of the algorithm

For each oldsiteURL having staticcontent doIdentify the newsiteURL which points

to the same static content (based onbase filename) (A)If this newsiteURL was identified theneliminate the oldsiteURL from theURLs list which must be processed

EndIfEndFor

For all unprocessed oldsiteURLs doIf it is a static content URL thenabsolute_content = the effective

content (C)elseabsolute_content =

content(oldsiteURL) - content(pagetemplate)(B)EndIfIdentify the newsiteURL so the

absolute_content has the best similaritywith the oldsiteURL’s absolute_contentThe matching pair is inserted in a

table, together with the current date,the similarity and the best similarityeverEndFor

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 14: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Figure 2 : Old site: number ofpages accessed from externalreferrer and number of notfound pages accessed fromexternal referrer

Figure 3 : New site: number ofpages accessed from externalreferrer and number of notfound pages accessed fromexternal referrer

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 15: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Figure 4 : Old site: percent ofnot found pages accessedfrom external referrer

Figure 5 : New site: percent ofnot found pages accessedfrom external referrer

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 16: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Figure 6 : New site:adaptation of the searchengines to the new structure ofthe site

Figure 7 : New site: Most ofnot found pages are comingfrom 3rd party referrer

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 17: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Figure 8 : New site: percent ofnot found pages, not foundpages from search engines,not found pages from 3rd partyreferrer

Figure 9 : Percent of not foundpages having an externalreferrer with our support layerenabled

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 18: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Conclusions and Future work

method for implementing this layer by mapping URLs fromthe old site to the ones in new site based on their contentsimilarity

evaluating different similarity functions in order to improvethe matching process and its speed

giving different weights in the similarity algorithm to thevarious properties of the content such as: URL, pageheadings, page title, key words

redirecting the user to a similar page, from the contentpoint of view, as the page where he is coming from

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION

Page 19: A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE ...diana.sotropa/files/... · features implemented by 3rd party plugins CMS scalability support for new technologies: CSS3, Ajax,

Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation

Thank you!

Q & A

Diana-Florina Halita, Darius Bufnea Babes-Bolyai University

A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT WEB CONTENT MIGRATION