Federated Searching: The ABC’s of HSE, XML, & Z39.50
Harry Samuels
Product Manager Linking & Searching
August 27, 2004
Topics
The Challenge of Federated Searching
Z39.50
XML Gateways
HTTP Searching
So, Where Are We Now?
The Future
– SRW/SRU
– NISO Metasearch Initiative
– The Generic XML Gateway API
The Challenge of Federated Searching
To execute federated searching, one needs a protocol or mechanism to search each of the electronic resources one would like to search
But one protocol does not fit all in the federated search environment - different electronic resources require different mechanisms
The challenge is to figure out how an electronic resource can be searched and have the right mechanism in place for each situation
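One way to picture "the right mechanism in place for each situation" is a registry that maps each resource to its own connector. The following is a minimal sketch, not how any particular product works: the resource names, the `make_stub` helper, and the `federated_search` function are all hypothetical, and the connectors are placeholders rather than real protocol clients.

```python
from typing import Callable, Dict, List

# A connector takes a query string and returns a list of result strings.
Connector = Callable[[str], List[str]]


def make_stub(mechanism: str, resource: str) -> Connector:
    """Return a placeholder connector; a real one would speak the named mechanism."""
    def connector(query: str) -> List[str]:
        return [f"{resource} via {mechanism}: {query}"]
    return connector


# Hypothetical resources, each registered with the mechanism it supports.
CONNECTORS: Dict[str, Connector] = {
    "library-catalog": make_stub("Z39.50", "library-catalog"),
    "publisher-a": make_stub("XML gateway", "publisher-a"),
    "publisher-b": make_stub("HTTP search", "publisher-b"),
}


def federated_search(query: str, resources: List[str]) -> Dict[str, List[str]]:
    """Send one query to every requested resource through its own connector."""
    results: Dict[str, List[str]] = {}
    for name in resources:
        connector = CONNECTORS.get(name)
        if connector is None:
            # No mechanism is in place for this resource, so it returns nothing.
            results[name] = []
            continue
        results[name] = connector(query)
    return results
```

The user issues one query, and the system decides per resource which mechanism to use.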
Z39.50
The protocol we love to hate
Z39.50 is the oldest of the commonly used search mechanisms
Almost every integrated library system can be searched using Z39.50
Despite the issues with Z39.50, it provides a fairly dependable mechanism for searching
Z39.50
The main problem with Z39.50 is that very few content providers implemented Z39.50
But it is the content of the commercial providers that we really want to search from our federated search systems
XML Gateways
Enter the XML gateway
But first of all, what does XML gateway mean?
As in Z39.50, there must be an XML gateway client that transmits search queries and accepts results
– This is the part of the XML gateway that is in the federated search system
There must also be an XML gateway server that responds to search queries
– This is the part of the XML gateway that is at the content provider site
XML Gateways
An XML gateway client sends a search query over http
The query is (1) packaged into the query string of a URL or (2) packaged into an XML document that is posted to the resource
Regardless of how the query is packaged the results are sent back in an XML document over http
The use of XML in at least one of the steps gave rise to the name XML Gateway
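The two ways of packaging a query, and the XML response that comes back either way, can be sketched as follows. This is an illustration under assumptions: the `query` parameter name, the `<search>`, `<query>`, `<record>`, and `<title>` element names, and the `example.org` address are all invented here, since each gateway defines its own format.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def build_query_url(base_url: str, term: str) -> str:
    """Option 1: package the query into the query string of a URL."""
    return base_url + "?" + urllib.parse.urlencode({"query": term})


def build_query_document(term: str) -> str:
    """Option 2: package the query into an XML document to be posted."""
    root = ET.Element("search")
    ET.SubElement(root, "query").text = term
    return ET.tostring(root, encoding="unicode")


def parse_results(xml_text: str) -> list:
    """Either way, the results come back as XML; pull out each record's title."""
    root = ET.fromstring(xml_text)
    return [rec.findtext("title", default="") for rec in root.iter("record")]


# Hypothetical response, standing in for what a gateway server might return.
sample_response = (
    "<results>"
    "<record><title>First article</title></record>"
    "<record><title>Second article</title></record>"
    "</results>"
)
titles = parse_results(sample_response)
```

A real client would send the URL with an HTTP GET or POST the document to the provider, then hand the response body to a parser written for that provider's schema.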
XML Gateways
XML gateways provide an alternative mechanism for searching an electronic resource
Every XML gateway is different and every XML gateway requires special programming or special configuration
As electronic resource providers implement search mechanisms they are implementing XML gateways and not Z39.50 servers
XML gateways are the future – the world of electronic resources and federated searching just needs to catch up with the future
HTTP Searching
Z39.50 was implemented by very few content providers and XML gateways are just now catching on – so how do we search everything else?
The same way a user does…
The federated search system pretends to be a user sitting at a web browser – it simulates the actions of a human user by generating URLs that are understood by the electronic resource, and then extracting the information from the web pages that are returned
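The two halves of that simulation, generating a URL and extracting information from the returned page, might look like the sketch below. It is not the HSE itself: the `example.org` address, the `q` parameter, and the `result-title` CSS class are assumptions made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class TitleExtractor(HTMLParser):
    """Collect the text of every <a class="result-title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a" and ("class", "result-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False


def build_search_url(term: str) -> str:
    """Generate the URL a browser would request when a user submits the form."""
    return "http://example.org/search?" + urlencode({"q": term})


def extract_titles(html: str) -> list:
    """Pull result titles out of the HTML page that comes back."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles]
```

The extraction step depends entirely on the page markup: if the site renames the `result-title` class, `extract_titles` silently returns an empty list.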
HTTP Searching
This is possible because almost all electronic resources are accessed over the web
At Endeavor, we simply call the HTTP Search Engine the HSE
It is capable of searching hundreds of web sites and databases that are inaccessible via Z39.50 or XML gateways
Some federated search engines use HTTP searching as the preferred search mechanism
HTTP Searching
Despite its reach, there are issues with HTTP searching
It usually cannot retrieve a large set of metadata in its result sets
If the user interface of an electronic resource changes then the HSE connector for that resource usually breaks – this means that HTTP searching is fragile and requires constant maintenance
So, Where Are We Now?
Adoption of Z39.50 has stalled
XML gateway adoption is in the early stages and many content providers do not yet have them
HTTP searching can search far more resources than Z39.50 or XML gateways, but it is fragile and usually does not retrieve a robust set of metadata
The Future
SRW/SRU
NISO Metasearch Initiative
The Generic XML Gateway API
SRW/SRU
The next generation of Z39.50 over the web
“Search and Retrieve Web Service (SRW) and Search and Retrieve URL Service (SRU) are Web Services-based protocols for querying databases and returning search results.”
Eric Lease Morgan
http://www.loc.gov/z3950/agency/zing/srw/
It is a version of an XML gateway that holds the promise of a standard XML Gateway protocol
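To show what a standard protocol buys, here is a sketch of an SRU request, in which the whole search is carried in the URL. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) come from the SRU 1.1 specification; the `example.org` endpoint, the `build_sru_url` helper, and the sample CQL query are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_sru_url(base_url: str, cql_query: str,
                  start_record: int = 1, maximum_records: int = 10) -> str:
    """Build an SRU searchRetrieve URL; the query is written in CQL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; any SRU server would accept the same parameter names.
url = build_sru_url("http://example.org/sru", 'dc.title = "federated"')
```

Because every SRU server accepts the same parameters and answers in XML, one client could search many resources without special programming for each.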
NISO Metasearch Initiative
“NISO's metasearch Initiative will identify, develop, and frame the standards and other common understandings that are needed to enable an efficient and robust information environment. The goal of NISO's Metasearch Initiative is to enable:
metasearch service providers to offer more effective and responsive services
content providers to deliver enhanced content and protect their intellectual property
libraries to deliver services that distinguish their services from Google and other free web services.”
http://www.niso.org/committees/MS_initiative.html
The Generic XML Gateway API
We couldn’t wait…
ENCompass already had an XML gateway search infrastructure
From that infrastructure, we created a generic gateway and documented it
It is freely available to Endeavor customers
When content providers ask us “how to build an XML gateway” we share the specification with them
Questions?