A Web Services Search Engine
CS 8803 [AIA] - Spring 2008
Roland Krystian Alberciak Piotr Kozikowski
Sudnya Padalikar Tushar Sugandhi
Outline
• Project Overview
• Searching Web-services
  o Tools / APIs
  o How to figure out what information to show
• Results: Working prototype
  o Locate, classify, rank, and present web-services
• System Integration
  o Diversity!
    Languages (no joke): Python, Ruby on Rails, PHP, C#, Java, Perl.
    Databases: MySQL, MSSQL
Project Overview
Step 1 - There are web-services available on the web
Step 2 - (Challenges) Web-services are harder to find than web pages because:
• Registering a service takes effort
• Directories are disconnected
• No clustering available
• No ranking available
Step 3 - Profit
• Should be beneficial for web developers
• Should be beneficial for us
What is out there?
• Swoogle - "10,000 ontologies" (more concerned with the "semantic web" and "metadata" than with web services)
• Programmableweb - 726 (only APIs)
• "Yellow pages" - 5000 web-services
• XMethods - 500 web-services
• UDDI - Discontinued, but was a useful way for many web services to advertise themselves.
We found solutions for Step 2!
Step 1. Have web-services available on the web
Step 2. (Solutions) Crawler, database, web application, a bunch of clustering algorithms, and lots of "glue"
Step 3. Our proposed solution - Web Slogger!
 - for us: content based advertising
 - for users: easy way to search for web-services
Crawling
Yahoo!
Why not Google?
• Restricted extraction: could not extract many results
What about Alexa?
• Couldn't afford it! :-)
What did we crawl for?
• .wsdl and .asmx files
How is Webslogger different from the Yellow Pages project (last year's class project)?
• Multiple Language support
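The crawl target named above (.wsdl and .asmx files) can be illustrated with a small filter over search-engine result URLs. This is only a sketch under assumptions: the function names `is_service_url` and `filter_service_urls` are hypothetical, and the slides do not say how the project actually filtered its results.

```python
from urllib.parse import urlparse

# File extensions the slides say the crawl looked for.
SERVICE_EXTENSIONS = (".wsdl", ".asmx")


def is_service_url(url: str) -> bool:
    """Return True if the URL path ends in .wsdl or .asmx (case-insensitive)."""
    path = urlparse(url).path.lower()
    return path.endswith(SERVICE_EXTENSIONS)


def filter_service_urls(urls):
    """Keep service-description URLs, dropping duplicates but preserving order."""
    seen = set()
    kept = []
    for url in urls:
        if is_service_url(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

Checking the parsed path rather than the raw string means a URL with a query string, such as `Service.asmx?WSDL`, is still recognized.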
Categorization and Clustering
Glossaries
• Hierarchical Categorization (27 Categories)
• List of keywords for each category (2800 keywords)
Web Service Partitioning By Importance
Some sections in a web service are more important than others, e.g. Service Name / Operation Name is more important than message type name.
Affinity Vector
• Weight assigned to each term in a web service based on its mapping with the Glossary
• Determines which web service belongs to which category
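The affinity-vector idea above can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the two-category `GLOSSARY`, the `SECTION_WEIGHTS` values, and the function names are hypothetical, and the slides do not give the project's actual weights or scoring formula.

```python
from collections import defaultdict

# Hypothetical glossary: category -> keywords.
# (The project used 27 categories and about 2800 keywords.)
GLOSSARY = {
    "finance": {"stock", "quote", "currency"},
    "weather": {"forecast", "temperature"},
}

# Hypothetical section weights: service and operation names
# count more than message type names.
SECTION_WEIGHTS = {"service": 3.0, "operation": 3.0, "message": 1.0}


def affinity_vector(terms_by_section):
    """Sum section-weighted glossary matches per category."""
    scores = defaultdict(float)
    for section, terms in terms_by_section.items():
        weight = SECTION_WEIGHTS.get(section, 1.0)
        for term in terms:
            for category, keywords in GLOSSARY.items():
                if term.lower() in keywords:
                    scores[category] += weight
    return dict(scores)


def best_category(terms_by_section):
    """Return the highest-scoring category, or None if nothing matched."""
    vector = affinity_vector(terms_by_section)
    return max(vector, key=vector.get) if vector else None
```

For example, a service whose name contains "Stock" and "Quote" would lean toward the hypothetical "finance" category even if a message type mentions "temperature", because name sections carry more weight.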
Ranking Insight
Fundamental Difference: Web page ranking is based on inlinks and outlinks. Web service ranking should be based on objects and web methods.
Recall: Our results are extracts from search engines. Therefore:
• We don't know how many pages link to a particular wsdl file.
• Search engine algorithms [e.g. PageRank] have this data and can assert the 'popularity' and 'credibility' of hubs which locate sources.
Resolution: We must find alternate ways to rank content
Ranking Options
1. Community Level: Collaborative Ranking:
• users can leave comments
• Likert scale ranking
• rank good users / bad users in the community: experts
2. User Level: Usage statistic ranking:
• how long you view a wsdl
• do you go back to look at it again [since it is like an API...]
• inquire about what wsdl files they used to achieve a goal
Ranking Options ..contd
3. Use Page Ranking provided by Google / Yahoo
4. File Level: Quality of file:
• "Do You Care if Your WSDL is W3C Compliant?"
  o Good format, thoroughness. Heuristics on model files.
5. Generate referral chain from WSDL
  o Understand citation network in order to determine valuable web services
  o Web services often use methods / objects from other web services. Use this linking to rank web services.
<?xml version="1.0"?>
<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote.wsdl"
    xmlns:tns="http://example.com/stockquote.wsdl"
    xmlns:xsd1="http://example.com/stockquote.xsd"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
<message name="SubscribeToQuotes">
    ...element="xsd1:SubscriptionHeader"/>
</message>
<portType name="StockQuotePortType">
    <operation name="SubscribeToQuotes">...</operation>
</portType>
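The referral-chain option (5) could start from something like the sketch below, which pulls the service name, operation names, and WSDL `import` locations out of a description and counts how often each imported location is referenced. This is an illustration, not the project's method: `SAMPLE_WSDL` is a hypothetical, simplified document (the `import` element and its URL are made up for the example), and `summarize_wsdl` and `inlink_counts` are hypothetical names.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical, simplified WSDL used only for this example.
SAMPLE_WSDL = """<?xml version="1.0"?>
<definitions name="StockQuote"
    targetNamespace="http://example.com/stockquote.wsdl"
    xmlns="http://schemas.xmlsoap.org/wsdl/">
  <import namespace="http://example.com/other.wsdl"
          location="http://example.com/other.wsdl"/>
  <portType name="StockQuotePortType">
    <operation name="SubscribeToQuotes"/>
  </portType>
</definitions>"""


def summarize_wsdl(text):
    """Extract the service name, portType operations, and import locations."""
    root = ET.fromstring(text)
    operations = [
        op.get("name")
        for op in root.findall(f"{{{WSDL_NS}}}portType/{{{WSDL_NS}}}operation")
    ]
    imports = [
        imp.get("location") for imp in root.findall(f"{{{WSDL_NS}}}import")
    ]
    return {"service": root.get("name"), "operations": operations, "imports": imports}


def inlink_counts(summaries):
    """Count how many summarized WSDL files import each location."""
    counts = {}
    for summary in summaries:
        for location in summary["imports"]:
            counts[location] = counts.get(location, 0) + 1
    return counts
```

A location imported by many other descriptions would then be a candidate for a higher rank, in the same spirit as in-link counting for web pages.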
Future work
• Develop our own crawler
• Further improve clustering (there is always room for that!)
• Figure out an innovative (&& effective) way for ranking
• Location based clustering