Searching uPortal with a third party Search Engine
Katya Sadovsky
University of California, Irvine
Administrative Computing Services
Agenda
Our goals
Our current setup
Built-in vs. Third Party Search Engine
Dynamic vs. Static Content
Issues in combining uPortal with a search engine
Demonstration
Questions & Answers
Our goals
Use the portal as a “gateway” to information
Allow users to search for pertinent portal content
Present users with integrated search results (portal and non-portal content)
Aid the search engine in weighing the results (meaningful page title, metadata, etc.)
Our current setup
uPortal 2.0.3
Verity Ultraseek Search Engine (formerly Inktomi)
Tomcat 4.0.6
Built-in vs. Third Party Search Engine
Pros of using a built-in search engine:
• Ensure generation of correct links to content
• Present users with customized (user-specific) result sets
• Ability to fully utilize channel metadata
• Employ portal’s authorization infrastructure
Built-in vs. Third Party Search Engine
Pros of using a third party search engine:
• Well-tested, mature functionality
• Well-developed dictionary and thesaurus
• Ability to search content beyond uPortal and present users with integrated search results
• URL filtering capabilities
• Useful but optional: nice administrative GUI, quick link definitions
Dynamic vs. Static Content
uPortal generates dynamic content that depends on the user's preferences, security level, browser, and operating system
Most search engines are designed to work with static content:
• Search engines index content on a periodic basis and use a cached/stored index to present users with search results
• Search results are not user-specific
• Only public content is indexed
Issues/Areas of difficulty
User Agent setting
Filtering out certain URLs
Deciding what to search:
• Search index/start page
• Searchable vs. non-searchable content
Generating links to channels using:
• global (published) vs. instance (subscribed) ID
• functional names
Page title used in search results
User Agent
Issues:
• uPortal needs to know the mapping between a user agent and a MIME type/output type
• When a user agent is not recognized, uPortal displays a screen allowing users to choose a profile to use
Solutions:
• If you know the user agent reported by the search engine, add a mapping to the UP_USER_UA_MAP table
• Choose a search engine that allows you to specify a user agent
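The idea behind the user agent mapping can be sketched in a few lines of Python. This is an illustration only, not uPortal code: the substring-matching rule, the profile IDs, and the crawler user agent string "ExampleCrawler" are assumptions made for the example, and the real UP_USER_UA_MAP table may be organized differently.

```python
from typing import Optional

# Hypothetical mapping from user agent fragments to profile IDs.
# In uPortal this information lives in the UP_USER_UA_MAP table; the
# entries and IDs below are invented for illustration.
UA_PROFILE_MAP = {
    "Mozilla": 1,
    "ExampleCrawler": 1,  # assumed user agent reported by the search engine
}


def resolve_profile(user_agent: str) -> Optional[int]:
    """Return the profile ID for a user agent, or None if it is unrecognized."""
    for fragment, profile_id in UA_PROFILE_MAP.items():
        if fragment in user_agent:
            return profile_id
    # Unrecognized agent: this is the case in which the portal would show
    # the profile selection screen instead of content.
    return None
```

Without the "ExampleCrawler" entry, the crawler falls into the unrecognized case and would index the profile selection screen rather than portal content.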
Example: setting a search engine user agent
Filtering out certain URLs
Issues:
• A search engine may follow a link that includes a channel option or command
• uPortal URL tags:
  • Dynamically generated for each URL hit
  • Tags other than 'idempotent' make search results meaningless
  • While indexing content, a search engine may enter a loop, referencing the same page with different tags
Filtering out certain URLs (cont’d)
Solutions:
• Acquire a search engine that allows URL filtering and filter out all “offending” URLs
• If available with the search engine, use advanced URL “de-duping”
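A minimal sketch of what such a filter and de-duping rule might look like, written in Python for illustration. The URL shape (a "tag.<value>." segment in the file name) and the host name are assumptions; an actual search engine would express these rules in its own filter syntax.

```python
import re

# Assumed URL shape: a "tag.<value>." segment starts the uPortal file name,
# e.g. ".../tag.idempotent.render.userLayoutRootNode.uP".
TAG_PATTERN = re.compile(r"/tag\.([^./]+)\.")


def should_index(url: str) -> bool:
    """Accept URLs with no tag or with the 'idempotent' tag; reject the rest."""
    match = TAG_PATTERN.search(url)
    return match is None or match.group(1) == "idempotent"


def dedupe_key(url: str) -> str:
    """Collapse URLs that differ only in their tag onto a single key."""
    return TAG_PATTERN.sub("/tag.idempotent.", url)
```

Rejecting every non-idempotent tag also breaks the indexing loop, since the crawler never follows a page that only differs by a freshly generated tag.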
Example: Filtering out certain URLs
Example: using URL filters
What to search: index/start page
Issues:
• A user layout is not a suitable starting point for a search engine: a typical layout doesn't contain all the channels
• Need a page with 'idempotent' links to all the searchable channels
Solution:
• Searchable Channel Index channel
What to search: searchable vs. non-searchable content
Issue: not all channels need to be included in the search
Solution: added a 'searchable' attribute to all the channels
CSearchRegistry channel
CSearchRegistry: stylesheet
Generating links to channels
Problem: channel instance (subscribed) IDs vary from user to user, so the search result links are inconsistent
Solutions: link to channels using
• global (published) IDs: involves code changes
• functional names (fname): new functionality, available in CVS (Concurrent Versions System)
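To illustrate why a functional name gives a stable link, here is a small Python sketch. The base URL and the query parameter name uP_fname are assumptions for the example and should be checked against the uPortal version in use.

```python
from urllib.parse import urlencode

# Assumed base URL of the portal's idempotent render entry point.
BASE_URL = (
    "http://portal.example.edu/uPortal/"
    "tag.idempotent.render.userLayoutRootNode.uP"
)


def channel_link(fname: str) -> str:
    """Build a link that is identical for every user of the channel.

    The parameter name 'uP_fname' is an assumption made for this sketch.
    """
    return BASE_URL + "?" + urlencode({"uP_fname": fname})
```

Because the functional name is assigned when the channel is published, the link does not depend on any one user's layout, unlike a link built from an instance (subscribed) ID.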
Linking to channels via their published IDs: implementation plan
Modify org/jasig/portal/UserInstance.java to recognize that the user is asking for a published channel that may not be in the user’s layout
Create a temporary hidden folder in the user’s layout to store “temporary” channels (make sure to delete this folder before the layout is saved to the database)
Add XML channel definitions to this hidden folder
Proceed to render as usual
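The plan above can be sketched in Python with the layout modeled as a plain dictionary. This is not the UserInstance.java change itself: the data structure, the folder identifier, and the function names are all hypothetical and only show the order of operations.

```python
import copy

TEMP_FOLDER_ID = "temp-hidden-folder"  # hypothetical identifier


def add_temporary_channel(layout: dict, channel_def: dict) -> None:
    """Place a published channel definition in a hidden folder of the layout."""
    folder = layout["folders"].setdefault(
        TEMP_FOLDER_ID, {"hidden": True, "channels": []}
    )
    folder["channels"].append(channel_def)


def layout_for_saving(layout: dict) -> dict:
    """Return a copy of the layout with the temporary folder removed."""
    persistent = copy.deepcopy(layout)
    persistent["folders"].pop(TEMP_FOLDER_ID, None)
    return persistent
```

Stripping the folder on a copy keeps the in-memory layout renderable while ensuring the temporary channels never reach the database.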
Page titles used in search results
Issues:
• Out of the box, uPortal has a statically set page title (no matter what channel is viewed)
• Search engines generally use page titles (or other metadata) for:
  • search result titles
  • result ranking
  • de-duping
• Users have to be trained to enter meaningful page titles when creating documents/channels (e.g. do not start each page title with UCIrvine)
Page titles used in search results
Solution:
When channels are rendered in 'focused' or 'detached' mode, add the channel title to the default page title (following is a fragment of webpages/stylesheets/org/jasig/portal/layout/tab-column/nested-tables/nested-tables.xsl):
<xsl:template match="layout_fragment">
  ...
  <title>
    <xsl:value-of select="$windowTitle"/>
    <xsl:value-of select="concat(': ',content//channel/@description)"/>
  </title>
  ...
</xsl:template>
<xsl:template match="layout">
  ...
  <title>
    <xsl:value-of select="$windowTitle"/>
    <xsl:if test="//focused">
      <xsl:value-of select="concat(': ',//focused/channel/@description)"/>
    </xsl:if>
  </title>
  ...
</xsl:template>
Example: page titles
Conclusions
There are tradeoffs when using either a built-in or a third-party search engine
We have yet to address the following issues:
• searching restricted content
• creating metadata tags to help the search engine with content ranking
Overall, our portal project could not succeed without a search function
Links
UC Irvine’s uPortal installation (SNAP): http://snap.uci.edu
This presentation: http://snap.uci.edu/PortalDocs/uPortal_Search.ppt
Demo
Questions?