A Web Services Search Engine CS 8803 [AIA] - Spring 2008 Roland Krystian Alberciak Piotr Kozikowski Sudnya Padalikar Tushar Sugandhi

Outline

• Project Overview
• Searching Web-services
  o Tools / APIs
  o How to figure out what information to show
• Results: Working prototype
  o Locate, classify, rank, and present web-services
• System Integration
  o Diversity!
    Languages (no joke): Python, Ruby on Rails, PHP, C#, Java, Perl.
    Databases: MySQL, MSSQL

Project Overview

Step 1 - There are web-services available on the web

Step 2 - (Challenges) Web-services are harder to find than web pages because:
• Effort to register
• Directories are disconnected
• No clustering available
• No ranking available

Step 3 - Profit
• Should be beneficial for web developers
• Should be beneficial for us

What is out there?

Survey of the Market

• Swoogle - "10,000 ontologies" (they are more concerned with "semantic web" and "metadata", and not so much with web services)
• Programmableweb - 726 (only APIs)
• "Yellow pages" - 5000 web-services
• XMethods - 500 web-services
• UDDI - Discontinued, but was useful for many web services to advertise themselves.

We found solutions for Step 2!

Step 1. Have web-services available on the web
Step 2. (Solutions) Crawler, database, web application, a bunch of clustering algorithms, and lots of "glue"
Step 3. Our proposed solution - Web Slogger!
 - for us: content based advertising
 - for users: easy way to search for web-services

System Architecture

Crawling

Yahoo!
• Why not Google? Restricted extraction: could not extract many results
• What about Alexa? Couldn't afford it! :-)
• What did we crawl for? .wsdl and .asmx files
• How is Webslogger different from the Yellow Pages project (last year's class project)?
  o Multiple Language support
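The step of keeping only .wsdl and .asmx results could look roughly like the sketch below. It assumes the search engine has already returned a list of result URLs; the function names are hypothetical and not taken from the project, and no real search API is called.

```python
from urllib.parse import urlparse

# File extensions the slides say were crawled for.
SERVICE_EXTENSIONS = (".wsdl", ".asmx")


def is_service_url(url):
    """Return True if the URL path ends in .wsdl or .asmx (case-insensitive)."""
    path = urlparse(url).path.lower()
    return path.endswith(SERVICE_EXTENSIONS)


def filter_service_urls(urls):
    """Keep service-description URLs, dropping exact duplicates, order preserved."""
    seen = set()
    kept = []
    for url in urls:
        if is_service_url(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

Checking the parsed path rather than the raw string means an endpoint such as `Service.asmx?WSDL` is still recognized despite its query string.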

Categorization and Clustering

Glossaries
• Hierarchical Categorization (27 Categories)
• List of keywords for each category (2800 keywords)

Web Service Partitioning By Importance
Some sections in a web service are more important than others, e.g. Service Name / Operation Name is more important than message type name.

Affinity Vector
• Weight assigned to each term in a web service based on its mapping with the Glossary
• Determines which web service belongs to which category
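A minimal sketch of how an affinity vector might be computed is shown below. The section weights, the two-category toy glossary, and all function names are assumptions made for illustration; the project's real glossary has 27 categories and about 2800 keywords, and its actual weighting scheme is not given in the slides.

```python
import re
from collections import defaultdict

# Assumed weights: names of services and operations count more than message types.
SECTION_WEIGHTS = {"service_name": 3.0, "operation_name": 3.0, "message_type": 1.0}

# Toy glossary (hypothetical): category -> keywords.
GLOSSARY = {
    "finance": {"stock", "quote", "currency"},
    "weather": {"forecast", "temperature"},
}


def tokenize(name):
    """Split an identifier such as 'StockQuote' into lowercase terms."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def affinity_vector(sections):
    """Sum section weights per category for every term found in the glossary.

    `sections` maps a section label to the list of names found in that section.
    """
    scores = defaultdict(float)
    for section, names in sections.items():
        weight = SECTION_WEIGHTS.get(section, 1.0)
        for name in names:
            for term in tokenize(name):
                for category, keywords in GLOSSARY.items():
                    if term in keywords:
                        scores[category] += weight
    return dict(scores)


def best_category(sections):
    """Return the category with the highest affinity, or None if nothing matched."""
    vector = affinity_vector(sections)
    return max(vector, key=vector.get) if vector else None
```

For a service named `StockQuote` with a message type `TemperatureReading`, the finance score dominates because both matching terms come from the more heavily weighted service name.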

Ranking Insight

Fundamental Difference: Web page ranking is based on inlinks and outlinks. Web service ranking should be based on objects and web methods.

Recall: Our results are extracts from search engines. Therefore:
• We don't know how many pages link to a particular wsdl file.
• Search engine algorithms [i.e. PageRank] have this data and can assert 'popularity', 'credibility' of hubs which locate sources.

Resolution: We must find alternate ways to rank content

Ranking Options

1. Community Level: Collaborative Ranking:
• users can leave comments,
• Likert scale ranking
• rank good users / bad users in the community: experts

2. User Level: Usage statistic ranking:
• how long you view a wsdl
• do you go back to look at it again [since it is like an API...]
• inquire about what wsdl files they used to achieve a goal

Ranking Options (contd.)

3. Use Page Ranking provided by Google / Yahoo

4. File Level: Quality of file:
• "Do You Care if Your WSDL is W3C Compliant?"
  o Good format, thoroughness. Heuristics on model files.

5. Generate referral chain from WSDL
  o Understand citation network in order to determine valuable web services
  o Web services often use methods / objects from other web services. Use this linking to rank web services.

<?xml version="1.0"?>
<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote.wsdl"
    xmlns:tns="http://example.com/stockquote.wsdl"
    xmlns:xsd1="http://example.com/stockquote.xsd"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns="http://schemas.xmlsoap.org/wsdl/">

<message name="SubscribeToQuotes">
    ...element="xsd1:SubscriptionHeader"/>
</message>
<portType name="StockQuotePortType">
    <operation name="SubscribeToQuotes">...</operation>
</portType>
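One way the referral-chain idea might be prototyped is sketched below: parse each WSDL, record which other namespaces it imports, and count how often each namespace is referenced. This is only an illustration under assumptions. The sample document, the `currency.wsdl` namespace, and the function names are hypothetical, and treating WSDL `<import>` elements as the "citations" is a simplification chosen here, not a method stated in the slides.

```python
import xml.etree.ElementTree as ET
from collections import Counter

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical, complete WSDL fragment in the style of the StockQuote example.
QUOTE_WSDL = """<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote.wsdl"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <import namespace="http://example.com/currency.wsdl"
          location="http://example.com/currency.wsdl"/>
  <portType name="StockQuotePortType">
    <operation name="SubscribeToQuotes"/>
  </portType>
</definitions>"""


def describe_wsdl(xml_text):
    """Extract the target namespace, portType operations, and imported namespaces."""
    root = ET.fromstring(xml_text)
    operations = [
        op.get("name")
        for port_type in root.iter(f"{{{WSDL_NS}}}portType")
        for op in port_type.findall(f"{{{WSDL_NS}}}operation")
    ]
    imports = [imp.get("namespace") for imp in root.iter(f"{{{WSDL_NS}}}import")]
    return {
        "target": root.get("targetNamespace"),
        "operations": operations,
        "imports": imports,
    }


def rank_by_referrals(descriptions):
    """Count, per namespace, how many distinct services import it (self-imports ignored)."""
    counts = Counter()
    for desc in descriptions:
        for namespace in set(desc["imports"]):
            if namespace != desc["target"]:
                counts[namespace] += 1
    return counts
```

Calling `rank_by_referrals` on a list of `describe_wsdl` results gives a simple in-degree score; services imported by many others would sort to the top with `counts.most_common()`.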

www.wbslogger.com

Future work

• Develop our own crawler

• Further improve clustering (there is always room for that!)

• Figure out an innovative (&& effective) way for ranking

• Location based clustering

Questions?
