Introduction Migration Process Challenges Content Migration Mechanism, Algorithm, Implementation Results and Evaluation
A SERVER-SIDE SUPPORT LAYER FOR CLIENT PERSPECTIVE TRANSPARENT
WEB CONTENT MIGRATION
Diana-Florina Halita, Darius Bufnea
Department of Computer Science, Faculty of Mathematics and Computer Science
Babes-Bolyai University
July 5, 2013
Table of Contents
1 Introduction
2 Migration Process Challenges
3 Content Migration
4 Mechanism, Algorithm, Implementation
5 Results and Evaluation
Abstract
the migration process implies changes in the site structure as seen by search engines and web clients
disadvantages, such as misdirecting search engine visitors
the problem remains unsolved for visitors landing from 3rd party referrers
map the old visible structure of a website to the new one
advantages of such a mechanism are:
  reducing incoming dead links from 3rd party referrers
  assisting search engines in properly redirecting users
  page rank and SERP conservation
  separation between content and presentation, storage engine, business and database
Why Content Management Systems?
easy maintenance
easy migration from one presentation to another
regular security updates
full control over the elements related to SEO
3rd party plugins
web based administration
Why is Migration Necessary?
old CMS is deprecated
open source CMS granting easier access to support new features implemented by 3rd party plugins
CMS scalability
support for new technologies: CSS3, Ajax, HTML5
Problems
exposing certain content on the web at a new URL, different from the one it had on the old site
misleading visitors coming from search engines or third party referrers
losing page rank or other benefits gathered over time through back links or social shares
Possible solutions
dividing the content into categories
  what can be automatically migrated and what can’t
  it is desirable to automate the migration of as much of the content as possible
estimating the time needed for migration
  compare the time required for automatic migration with that for manual migration
migration reevaluation based on guidance
  evaluating the automatic migration process
  problem: the content’s structure and its regularity
  manual migration: waste of resources, time and effort
Similar solutions
implemented mostly as plugins inside all major CMSs
take into consideration data gathered over time, based on user behavior
Our approach
match new URLs to old ones based on:
  content similarity
  semantically related information, such as:
    URLs
    the query given to the search engines
    the referrer’s content similarity with the linked content
advantage of being implementable prior to the release of the new web site
Content migration
static content migrated to static content
  the base name of the file remains the same
  oldsiteURL: http://oldsite/oldpath/filename
  newsiteURL: http://oldsite/newpath/filename
  perfect match, similarity(oldsiteURL, newsiteURL) = 1
  similarity(static content, static content)
static content migrated to dynamic content
  similarity(static content, dynamic content)
  similarity(full content, absolute content)
dynamic content migrated to dynamic content
  similarity(dynamic content, dynamic content)
  similarity(absolute content, absolute content)
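The static-to-static case can be illustrated with a minimal Python sketch. It assumes that "base name" means the last path segment of the URL; the helper names `base_name` and `match_static` and the sample file names are hypothetical and do not come from the slides.

```python
from posixpath import basename
from urllib.parse import urlsplit


def base_name(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    return basename(urlsplit(url).path)


def match_static(old_urls, new_urls):
    """Map each old static URL to a new URL that has the same base file name."""
    by_name = {}
    for new_url in new_urls:
        # Keep the first new URL seen for each base name (an arbitrary tie-break).
        by_name.setdefault(base_name(new_url), new_url)
    return {
        old_url: by_name[base_name(old_url)]
        for old_url in old_urls
        if base_name(old_url) and base_name(old_url) in by_name
    }


old = ["http://oldsite/oldpath/report.pdf", "http://oldsite/oldpath/gone.pdf"]
new = ["http://oldsite/newpath/report.pdf"]
matches = match_static(old, new)
print(matches)
```

Old URLs without a counterpart, such as the hypothetical `gone.pdf`, stay unmatched and would be left to the content-based comparison.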
Dynamic content migration
the content = content of the page template + the absolute content stored in the database
We consider only the absolute content
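One way to picture the split between template and absolute content is a line-based subtraction. The slides do not say how the template is removed, so the approach below is only an assumption, and the function name `absolute_content` and the sample markup are hypothetical.

```python
def absolute_content(page_html, template_html):
    """Keep only the non-empty page lines that do not occur in the template."""
    template_lines = {
        line.strip() for line in template_html.splitlines() if line.strip()
    }
    kept = [
        line.strip()
        for line in page_html.splitlines()
        if line.strip() and line.strip() not in template_lines
    ]
    return "\n".join(kept)


template = "<html>\n<body>\n<div id='menu'>Home | News</div>\n</body>\n</html>"
page = (
    "<html>\n<body>\n<div id='menu'>Home | News</div>\n"
    "<h1>Admission 2013</h1>\n<p>Exam dates.</p>\n</body>\n</html>"
)
print(absolute_content(page, template))
```

A real template engine interleaves markup and data more finely than whole lines, so a production version would more likely read the absolute content directly from the database.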
To match these URLs, our algorithm uses a similarity function, but the method does not depend on a particular similarity function.
Information processing is not done in real time
  running a real-time computation would induce delays in serving a response to the web client
  the batch program runs completely independently of the CMS core
For a quick match we use the cosine similarity algorithm as implemented by Apache Lucene
Figure 1: Batch processing
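For readers unfamiliar with cosine similarity, the following Python sketch computes it over raw term-frequency vectors. It is not Lucene's implementation, which applies its own analysis and weighting; the tokenizer and the function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    vec_a, vec_b = term_vector(text_a), term_vector(text_b)
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity("admission exam dates", "exam dates"))
print(cosine_similarity("exam dates", "contact page"))
```

The score lies in [0, 1] for term counts: texts with no shared terms score 0, and texts with proportional term counts score 1.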
The formal presentation of the algorithm
For each oldsiteURL having static content do
    Identify the newsiteURL which points to the same static content (based on base filename) (A)
    If this newsiteURL was identified then
        eliminate the oldsiteURL from the list of URLs which must be processed
    EndIf
EndFor

For all unprocessed oldsiteURLs do
    If it is a static content URL then
        absolute_content = the effective content (C)
    else
        absolute_content = content(oldsiteURL) - content(page template) (B)
    EndIf
    Identify the newsiteURL whose absolute_content has the best similarity with the oldsiteURL’s absolute_content
    The matching pair is inserted in a table, together with the current date, the similarity and the best similarity ever
EndFor
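The two loops can be sketched in Python as follows. This is a minimal illustration under several assumptions: pages are given as in-memory dictionaries with their absolute content already extracted, the table is a plain dictionary, perfect base-name matches are also stored with similarity 1, and `word_overlap` is an arbitrary stand-in for whichever similarity function is plugged in. All names are hypothetical.

```python
from datetime import date
from posixpath import basename
from urllib.parse import urlsplit


def file_name(url):
    """Base file name of a URL path, without query string or fragment."""
    return basename(urlsplit(url).path)


def word_overlap(a, b):
    """Stand-in similarity: shared words divided by all distinct words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def run_batch(old_pages, new_pages, similarity, table, today=None):
    """Update `table` (old URL -> row); pages map URL -> (is_static, text)."""
    today = today or date.today()

    def store(old_url, new_url, score):
        previous = table.get(old_url)
        best_ever = max(score, previous["best_ever"]) if previous else score
        table[old_url] = {"new_url": new_url, "date": today,
                          "similarity": score, "best_ever": best_ever}

    # First loop: static files are matched by base file name.
    new_static = {file_name(url): url
                  for url, (static, _) in new_pages.items()
                  if static and file_name(url)}
    unprocessed = []
    for old_url, (static, _) in old_pages.items():
        target = new_static.get(file_name(old_url)) if static else None
        if target is not None:
            store(old_url, target, 1.0)  # assumption: record perfect matches too
        else:
            unprocessed.append(old_url)

    # Second loop: pick the new page whose absolute content is most similar.
    for old_url in unprocessed:
        old_text = old_pages[old_url][1]
        scored = [(similarity(old_text, text), url)
                  for url, (_, text) in new_pages.items()]
        if scored:
            score, new_url = max(scored)
            store(old_url, new_url, score)
    return table


old_pages = {
    "http://oldsite/files/rules.pdf": (True, "rules text"),
    "http://oldsite/index.php?id=7": (False, "admission exam dates 2013"),
}
new_pages = {
    "http://oldsite/docs/rules.pdf": (True, "rules text"),
    "http://oldsite/admission": (False, "admission exam dates 2013 updated"),
    "http://oldsite/contact": (False, "contact phone address"),
}
table = run_batch(old_pages, new_pages, word_overlap, {}, today=date(2013, 7, 5))
for old_url, row in table.items():
    print(old_url, "->", row["new_url"], row["similarity"])
```

Passing the similarity function as a parameter mirrors the point that the method is not tied to one similarity function, and keeping `best_ever` lets later batch runs show whether a match has degraded.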
Figure 2: Old site: number of pages accessed from an external referrer and number of not found pages accessed from an external referrer
Figure 3: New site: number of pages accessed from an external referrer and number of not found pages accessed from an external referrer
Figure 4: Old site: percent of not found pages accessed from an external referrer
Figure 5: New site: percent of not found pages accessed from an external referrer
Figure 6: New site: adaptation of the search engines to the new structure of the site
Figure 7: New site: most not found pages come from 3rd party referrers
Figure 8: New site: percent of not found pages, not found pages from search engines, not found pages from 3rd party referrers
Figure 9: Percent of not found pages having an external referrer with our support layer enabled
Conclusions and Future work
a method for implementing this layer by mapping URLs from the old site to the ones in the new site based on their content similarity
evaluating different similarity functions in order to improve the matching process and its speed
giving different weights in the similarity algorithm to the various properties of the content, such as: URL, page headings, page title, keywords
redirecting the user to a page similar, in terms of content, to the page they are coming from
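The weighting idea listed as future work could look roughly like the Python sketch below. Nothing here comes from the slides beyond the list of properties: the weight values, the field names, and the use of `difflib.SequenceMatcher` as the per-field similarity are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical weights; the slides do not give any values.
WEIGHTS = {"url": 1.0, "title": 3.0, "headings": 2.0, "keywords": 1.0}


def text_ratio(a, b):
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def weighted_similarity(old, new, weights=WEIGHTS, field_similarity=text_ratio):
    """Weighted mean of per-field similarities over fields both pages have."""
    shared = [field for field in weights if old.get(field) and new.get(field)]
    total = sum(weights[field] for field in shared)
    if total == 0:
        return 0.0  # no comparable field, so no evidence of similarity
    weighted_sum = sum(
        weights[field] * field_similarity(old[field], new[field])
        for field in shared
    )
    return weighted_sum / total


old_page = {"url": "/index.php?id=7", "title": "Admission 2013",
            "headings": "Exam dates"}
new_page = {"url": "/admission", "title": "Admission 2013",
            "headings": "Exam dates and fees"}
print(weighted_similarity(old_page, new_page))
```

Dividing by the total weight of the shared fields keeps the score in [0, 1] even when a page lacks some properties, such as keywords in this example.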
Thank you!
Q & A