
A FAST ESP to Solr Migration Case Study - RBI


DESCRIPTION

Reed Business Information reached out to Search Technologies for help with a migration project, to take them from FAST ESP to a Solr-based infrastructure running on Amazon AWS. Read how Search Technologies implemented this project. http://www.searchtechnologies.com/fast-solr-migration-case-study.html


US Phone: 703-953-2791
UK Phone: 01344 292 292
www.searchtechnologies.com

INTRODUCTION

Reed Business Information (RBI), a leading provider of business information, data and marketing solutions, produces industry-critical data services and lead generation tools, as well as online community and job websites. RBI reached out to Search Technologies for help with a migration project to move from FAST ESP to a Solr-based infrastructure running on Amazon AWS.

DETAILS

RBI has used FAST to power search on a wide range of websites since 2005. FAST ESP has proved to be a highly reliable platform for this purpose. In addition to very low downtime, it has provided large query capacities, and has served as a platform to develop or deploy a range of sophisticated, multilingual capabilities for entity extraction and categorization. These capabilities power both search and the contextual serving of widgetized links across hundreds of websites.

The future of the FAST technology, owned by Microsoft since 2008, is SharePoint-centric. As an agile publisher of both business and consumer titles for niche audiences, RBI decided that Solr was its preferred replacement for FAST ESP. RBI contracted with Search Technologies to deliver consulting and implementation services, and to manage the FAST ESP to Solr transition project. The overall project objective was simply to emulate the FAST-based service, without loss of functionality or performance.

The websites served by this application use a range of languages, including Chinese (Simplified and Traditional), English, Dutch, Spanish, French, Italian and German. The content sets to be indexed include the participating websites, to enable sophisticated site search functionality, plus numerous other content sources that provide supplementary information and news, usually focused on specific industry verticals. The new system also powers RBI’s business search portal, Zibb.com.

SAME FUNCTIONALITY, MUCH LOWER COSTS

The key challenge set by RBI was to maintain existing functionality, providing a highly functional and reliable service to publishers within RBI, while substantially lowering the overall cost of ownership of the search infrastructure. Some key aspects of the existing infrastructure used FAST-specific methods. In addition, FAST was running on a substantial, Microsoft-hosted facility involving more than 90 servers.

THE APPROACH

A key requirement was that publications using the search service should not have to change anything in their configuration. This necessitated emulating a number of FAST ESP methods, including:

Transforming FAST FQL search requests into Solr’s query syntax

Returning results in standard FAST ESP format (by manipulating Solr’s XML-based results)

Making use of existing content processing capabilities, such as entity extraction and categorization
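To make the first item in the list above concrete, here is a minimal sketch of translating a FAST FQL request into Solr query syntax. It is not the QPL-based parser used in the project. It assumes a tiny FQL subset (and(...), or(...), string("...") and field scoping such as title:string("...")) and assumes that string("...") should be treated as a phrase. The function and class names are hypothetical.

```python
import re

# Tokens: identifiers, double-quoted strings, and the punctuation ( ) , :
_TOKEN = re.compile(r'\s*(?:([A-Za-z_]\w*)|"([^"]*)"|([(),:]))')


def tokenize(fql):
    """Split an FQL string into (kind, value) tuples."""
    fql = fql.strip()
    pos, tokens = 0, []
    while pos < len(fql):
        match = _TOKEN.match(fql, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        ident, text, punct = match.groups()
        if ident is not None:
            tokens.append(("ident", ident))
        elif text is not None:
            tokens.append(("string", text))
        else:
            tokens.append(("punct", punct))
        pos = match.end()
    return tokens


class _Parser:
    """Recursive-descent parser for the assumed FQL subset."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise ValueError(f"expected {value or kind}, got {tok[1]!r}")
        self.i += 1
        return tok[1]

    def expr(self):
        name = self.take("ident")
        if self.peek() == ("punct", ":"):
            # Field scope: title:string("x") becomes title:"x"
            self.take("punct", ":")
            return f"{name}:{self.expr()}"
        self.take("punct", "(")
        if name == "string":
            text = self.take("string")
            self.take("punct", ")")
            # Emit a quoted Solr phrase; escape backslashes in the text
            return '"' + text.replace("\\", "\\\\") + '"'
        if name in ("and", "or"):
            parts = [self.expr()]
            while self.peek() == ("punct", ","):
                self.take("punct", ",")
                parts.append(self.expr())
            self.take("punct", ")")
            return "(" + f" {name.upper()} ".join(parts) + ")"
        raise ValueError(f"unsupported FQL operator: {name}")


def fql_to_solr(fql):
    """Translate an FQL expression (assumed subset) into a Solr query string."""
    parser = _Parser(tokenize(fql))
    query = parser.expr()
    if parser.i != len(parser.tokens):
        raise ValueError("unexpected trailing input")
    return query


print(fql_to_solr('and(title:string("solr"), or(string("fast esp"), string("migration")))'))
# prints: (title:"solr" AND ("fast esp" OR "migration"))
```

A production translator would also need to cover the remaining FQL operators, ranking hints and filters, none of which are sketched here.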


It was agreed that Search Technologies’ Aspire Content Processing Platform and QPL, the Query Processing Language, should be used to effect these transitions. The overall solution therefore comprised Solr, Aspire, and a number of existing technologies within RBI, many of which are home-grown.

THE PROJECT

Over a period of a few months, with daily calls between the RBI and Search Technologies teams, the project was specified in detail and carried forward. Key decisions included:

The use of Amazon AWS to host the new search service

The development of a query parser to translate FAST FQL into Solr search syntax, and to translate Solr search results into a FAST ESP format, so that the receiving Content Management Systems would not notice any changes and would function as normal

The use of software-based load balancing to send queries to servers with spare capacity
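The result-translation half of the query parser decision above can be sketched as follows. The input is Solr's standard XML response (a result element with numFound and start attributes, containing doc elements with typed, named fields). The output element names (RESULTSET, HIT, FIELD) are placeholders chosen for illustration; they are an assumption, not the actual FAST ESP result schema, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def solr_to_legacy_xml(solr_xml):
    """Re-serialise a Solr XML response into a placeholder legacy layout."""
    root = ET.fromstring(solr_xml)
    result = root.find("result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element found")

    out = ET.Element("RESULTSET", {"TOTALHITS": result.get("numFound", "0")})
    start = int(result.get("start", "0"))

    for offset, doc in enumerate(result.findall("doc"), start=1):
        # Hit numbers are 1-based and continue across result pages
        hit = ET.SubElement(out, "HIT", {"NO": str(start + offset)})
        for field in doc:
            name = field.get("name")
            if name is None:
                continue
            if field.tag == "arr":
                # Multi-valued Solr field: join the values into one string
                value = "; ".join(child.text or "" for child in field)
            else:
                value = field.text or ""
            ET.SubElement(hit, "FIELD", {"NAME": name}).text = value

    return ET.tostring(out, encoding="unicode")


sample = """
<response>
  <result name="response" numFound="42" start="10">
    <doc>
      <str name="id">a1</str>
      <arr name="tags"><str>x</str><str>y</str></arr>
    </doc>
  </result>
</response>
"""
print(solr_to_legacy_xml(sample))
```

Because the consuming Content Management Systems were not to notice any change, a real adapter would have to reproduce the legacy format exactly, including any attributes and ordering the consumers rely on.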

The project also involved a significant amount of work to re-create the FAST ESP index pipeline, including interfaces to third-party tools. This was achieved using the Aspire Content Processing framework. Solr’s native language processing capabilities coped adequately with the multi-language demands of this project.
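As a rough illustration of what an index pipeline does, the sketch below passes each document through a chain of stages before it would be sent to the index. This is not Aspire's API: the stage names, the toy company list and the keyword-based categorization rule are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]

# Toy gazetteer standing in for a real entity-extraction resource (assumption)
KNOWN_COMPANIES = {"reed business information", "microsoft"}


def normalize_whitespace(doc: Document) -> Document:
    """Collapse runs of whitespace in the body text."""
    doc = dict(doc)
    doc["body"] = " ".join(str(doc.get("body", "")).split())
    return doc


def extract_entities(doc: Document) -> Document:
    """Tag the document with any known company names found in the body."""
    doc = dict(doc)
    body = str(doc.get("body", "")).lower()
    doc["companies"] = sorted(name for name in KNOWN_COMPANIES if name in body)
    return doc


def categorize(doc: Document) -> Document:
    """Assign a category using a deliberately naive keyword rule."""
    doc = dict(doc)
    body = str(doc.get("body", "")).lower()
    doc["category"] = "technology" if "search" in body else "general"
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in order; each stage returns a new document."""
    for stage in stages:
        doc = stage(doc)
    return doc


processed = run_pipeline(
    {"id": "1", "body": "Microsoft   acquired a search  vendor"},
    [normalize_whitespace, extract_entities, categorize],
)
print(processed)
```

Keeping each stage a small, independent function is one way to let existing processing steps, or calls out to third-party tools, be slotted into the chain without changing the others.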

IN SUMMARY

Graeme McCracken, CIO at Reed Business International, commented, “We progressively transitioned our sites and services from FAST to Solr over a two-week period, and nobody noticed. At the same time, we reduced our on-going cost-of-ownership by more than half.”

Numerous Reed Business properties are now served by this Solr-based service. Design specifications called for an average search time of less than 200 milliseconds; the live system consistently delivers an average of 70 milliseconds. The new Solr search system has more than 30 million documents under index, and it meets sustained capacity demands of more than 300 queries per second without compromising search speed.
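The performance figures above can be put in perspective with a short back-of-the-envelope calculation. It assumes that the 70 millisecond average holds at the sustained rate of 300 queries per second, which the text suggests but does not state outright, and applies Little's law (average number in the system = arrival rate × average time in the system). The function and constant names are illustrative.

```python
def in_flight_queries(queries_per_second: float, avg_latency_ms: float) -> float:
    """Little's law: mean concurrency = arrival rate x mean time in system."""
    return queries_per_second * avg_latency_ms / 1000


TARGET_LATENCY_MS = 200    # design specification
MEASURED_LATENCY_MS = 70   # reported live average
SUSTAINED_QPS = 300        # reported as a lower bound ("more than 300")

concurrency = in_flight_queries(SUSTAINED_QPS, MEASURED_LATENCY_MS)
headroom = TARGET_LATENCY_MS / MEASURED_LATENCY_MS

print(f"about {concurrency:.0f} queries in flight on average")
print(f"measured latency is {headroom:.1f}x faster than the design target")
```

Under these assumptions, roughly 21 queries are being processed at any moment across the cluster, and the measured average is a little under three times faster than the specification required.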

About Search Technologies

Leadership: The largest IT service company dedicated to enterprise search implementation, consulting, and managed services

Independence: Working with all of the leading search software vendors and open source alternatives

Experience: 400+ customers and more than 50,000 consultant days of expert services delivered in the last four years alone