Open Search Server (OSS) is search engine software developed under the GPL v3 open source licence. Built using the best open source technologies available, Open Search Server is a stable, high-performance piece of software. It is both a modern search engine and a suite of high-powered full-text search algorithms.
Open Search Server documentation
PRELIMINARY DRAFT
Emmanuel Keller, Author
Emmanuel Gosse, Author
Sebastien Andrivet, Translator, proofreader
InfoPro Digital, 12-14, rue Mederic, Paris, France
http://www.open-search-server.com
© InfoPro Digital 2009
2 | Open Search Server | Introduction
Quick start
Installing the JDK software
Open Search Server (OSS) requires a Java™ runtime environment (JRE) version 5 or newer.
1. Download the JDK software from either Sun Microsystems or IBM.
• Sun Microsystems provides its JRE (or JDK) for the Windows™, Linux and Solaris™ operating systems: Sun download page.
• IBM® provides its JRE for the AIX™ and Linux operating systems: IBM developer kit.
2. Select an appropriate JRE/JDK version and download it.
3. Install the JRE/JDK using the installation instructions.
Setting up the environment variables on a Windows™ system
On Windows, the only thing to do is to add an environment variable named JAVA_HOME.
1. Right-click My Computer.
2. Select Properties.
3. Select the Advanced tab.
4. Select Environment Variables.
5. Edit or create a new entry named JAVA_HOME.
6. JAVA_HOME must point to the JDK installation directory, for example: C:\Program Files\Java\jdk1.6.0_14
Setting up environment variables on a UNIX system
You have to define the JAVA_HOME environment variable.
1. Set JAVA_HOME. Replace [jdk-path] with the location of your JDK, for example: /usr/jdk/jdk1.6.0_14
• Korn or bash shells: export JAVA_HOME=[jdk-path]
• Bourne shell:
    JAVA_HOME=[jdk-path]
    export JAVA_HOME
• C shell: setenv JAVA_HOME [jdk-path]
2. Set PATH.
• Korn or bash shells: export PATH=$JAVA_HOME/bin:$PATH
• Bourne shell:
    PATH=$JAVA_HOME/bin:$PATH
    export PATH
• C shell: setenv PATH $JAVA_HOME/bin:$PATH
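To double-check the setup from Java itself, here is a minimal sketch (the class name JavaHomeCheck is hypothetical) that reads JAVA_HOME, checks that it contains bin/java or bin\java.exe, and reports whether the running JVM meets the "version 5 or newer" requirement. It assumes the usual java.version formats: "1.6.0_14" style before Java 9, "11.0.2" style afterwards.

```java
import java.io.File;

// Hypothetical helper: sanity-checks JAVA_HOME and the running JVM version.
class JavaHomeCheck {

    // "1.6.0_14" -> 6, "1.5.0" -> 5, "11.0.2" -> 11, "17" -> 17
    static int majorVersion(String version) {
        String[] parts = version.split("[._-]");
        int first = leadingInt(parts[0]);
        // Before Java 9 the version was reported as "1.x"
        if (first == 1 && parts.length > 1) {
            return leadingInt(parts[1]);
        }
        return first;
    }

    // Parses the leading digits of s, or returns -1 if there are none
    private static int leadingInt(String s) {
        int i = 0;
        while (i < s.length() && Character.isDigit(s.charAt(i))) {
            i++;
        }
        return i == 0 ? -1 : Integer.parseInt(s.substring(0, i));
    }

    // True if dir contains bin/java (Unix) or bin\java.exe (Windows)
    static boolean looksLikeJavaHome(File dir) {
        File bin = new File(dir, "bin");
        return new File(bin, "java").isFile() || new File(bin, "java.exe").isFile();
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null || javaHome.length() == 0) {
            System.out.println("JAVA_HOME is not set");
        } else if (!looksLikeJavaHome(new File(javaHome))) {
            System.out.println("JAVA_HOME does not contain bin/java: " + javaHome);
        } else {
            System.out.println("JAVA_HOME = " + javaHome);
        }
        int major = majorVersion(System.getProperty("java.version"));
        System.out.println("Running Java " + major
                + (major >= 5 ? " (OK)" : " (too old, version 5 or newer required)"));
    }
}
```

Run it from the same shell or console where you set the variables, for example: javac JavaHomeCheck.java, then java JavaHomeCheck.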
Downloading Open Search Server
Download the appropriate package file for your environment.
1. Go to the download page: http://sourceforge.net/project/showfiles.php?group_id=260863
2. Select the release version you need. Usually you will be offered the following options:
• Beta: the beta version, the latest stage of the development cycle.
• Stable / Release: stable releases, intended for production use.
• Unstable / Alpha: usually the latest trunk version.
3. Choose the appropriate file / archive:
File | Description
documentation.pdf | The documentation you are reading now, in PDF format.
open-search-server-XXX.zip | Open Search Server archive in ZIP format.
open-search-server-XXX.tar.gz | Open Search Server archive in tar.gz format.
open-search-server-XXX.war | WAR file, if you want to deploy it manually on an application server.
4. The download process should start immediately after you click on the name of the file.
Extracting the open-search-server folder from the archive
Use your favorite tool to uncompress the archive and extract the open-search-server folder.
• Windows / Mac: double-clicking on the archive will usually decompress it and extract the folder.
• ZIP archive on a Unix system: you can use the unzip command line utility, for example: unzip open-search-server-XXX.zip
• TAR.GZ archive on Unix: you can use the tar command line utility, for example: tar -zxvf open-search-server-XXX.tar.gz
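On a machine where neither a graphical tool nor unzip is available but a JDK is, the ZIP archive can also be extracted with java.util.zip. The following is a minimal sketch (the class name UnzipArchive is hypothetical; it needs Java 7 or newer for try-with-resources and java.nio.file). It does not handle the tar.gz archive, since the JDK has no tar reader.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: extracts a ZIP archive such as open-search-server-XXX.zip
// into a target directory using only JDK classes.
class UnzipArchive {

    static void unzip(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries that would escape the target directory ("zip slip")
                if (!out.startsWith(root)) {
                    throw new IOException("Unsafe entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the bytes of the current entry only
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                }
                zin.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java UnzipArchive open-search-server-XXX.zip [target-dir]
        unzip(Paths.get(args[0]), Paths.get(args.length > 1 ? args[1] : "."));
    }
}
```

The path check is a defensive choice: a well-formed OSS archive would not need it, but it keeps a damaged or tampered archive from writing outside the target folder.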
Launching Open Search Server
Start the server by executing the start batch file.
• On Windows, run the file start.bat as a command.
• On Unix/Linux/Mac OS, open a shell and execute start.sh.
The server is now running and listening on TCP port 8080.
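If the web interface does not respond, a quick first check is whether anything is accepting connections on port 8080. This is a minimal sketch (the class name PortCheck is hypothetical) that only tests the TCP connection; it does not prove that the process listening is OSS.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
class PortCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do here
            }
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("localhost", 8080, 2000);
        System.out.println(up ? "Port 8080 is open" : "Nothing is listening on port 8080");
    }
}
```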
Displaying the web interface
Open a compatible web browser (Internet Explorer, Firefox/Mozilla, Safari), then enter a URL matching your server.
1. Open your favorite web browser.
2. Enter a URL matching your server:
• If the server runs on your desktop machine, you can use: http://localhost:8080
• If OSS runs on a remote server, build the appropriate URL, like this: http://[server-hostname]:8080
Setting up the index directory
You must provide a path to the directory where you want to store the index data. We recommend that you start with the web_crawler folder provided in the examples folder.
Enter the absolute path of the index directory.
• On Unix/Linux/Mac systems, enter the absolute path, for example: /home/me/open-search-server/examples/web_crawler
• On Windows systems, enter a Windows UNC pathname, for example: \\ComputerName\SomeFolder\open-search-server\examples\web_crawler
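Because the interface expects an absolute path, a small pre-check can catch a relative or mistyped path before you enter it. This is a minimal sketch; the class name IndexDirCheck and its messages are hypothetical, and it only relies on java.io.File.isAbsolute and isDirectory.

```java
import java.io.File;

// Hypothetical pre-check for the index directory path.
class IndexDirCheck {

    static String check(String path) {
        File dir = new File(path);
        if (!dir.isAbsolute()) {
            return "Not an absolute path: " + path;
        }
        if (!dir.isDirectory()) {
            return "Directory does not exist: " + path;
        }
        return "OK";
    }

    public static void main(String[] args) {
        // e.g. /home/me/open-search-server/examples/web_crawler
        System.out.println(check(args[0]));
    }
}
```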
Entering the URL of the web site to be crawled
The pattern list lets you decide which URLs will be crawled. Only URLs that match these patterns will be indexed.
1. Select the Crawler panel.
2. Then select the Web sub-panel.
3. Finally, select the Pattern list sub-panel.
4. Enter, for example, http://www.open-search-server.com*
5. Click on the Add button.
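To reason about which URLs a pattern such as http://www.open-search-server.com* would admit, the sketch below assumes that '*' matches any sequence of characters and every other character is literal. This is an illustration of wildcard matching, not the crawler's actual code; the OSS matching rules may differ, and the class name UrlPatternSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Illustration only. ASSUMPTION: '*' matches any sequence of characters and
// all other characters are literal. The real OSS crawler may behave differently.
class UrlPatternSketch {

    static Pattern compile(String wildcard) {
        String[] parts = wildcard.split("\\*", -1); // -1 keeps trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' becomes "any characters"
            }
            regex.append(Pattern.quote(parts[i])); // '.', '?' etc. stay literal
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String wildcard, String url) {
        return compile(wildcard).matcher(url).matches();
    }

    public static void main(String[] args) {
        String pattern = "http://www.open-search-server.com*";
        System.out.println(matches(pattern, "http://www.open-search-server.com/download.html"));
        System.out.println(matches(pattern, "http://sourceforge.net/"));
    }
}
```

Under this assumption, the example pattern would also match a host such as http://www.open-search-server.com.example.net, so a pattern ending in "/*" would be the stricter choice.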
Starting the crawl process
The crawl process downloads and indexes the URLs you entered in the pattern list.
1. Select the Crawl process sub-panel.
2. Click on the Not running - Click to start button.
3. Later, you can click on the same button to stop the crawl.
Querying the index
You can use the web interface to query the data in your index.
1. Select the Query panel.
2. Load the predefined search query template.
3. Enter a word in the field named Enter the query, for example: open
4. Click on the Search button.
Testing the XML API
Try the same request using the XML API to get an XML result:
1. Open a new window in your web browser.
2. Enter the following url: http://localhost:8080/select?qt=search&q=open
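The same request can be sent from a Java program. This is a minimal sketch using HttpURLConnection (the class name SelectFetch is hypothetical); since the engine answers in UTF-8, the response bytes are decoded explicitly as UTF-8 instead of with the platform default charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: fetches the XML result of a /select call as a String.
class SelectFetch {

    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        InputStream in = conn.getInputStream();
        try {
            return readUtf8(in);
        } finally {
            in.close();
            conn.disconnect();
        }
    }

    // Reads the whole stream and decodes it as UTF-8
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("http://localhost:8080/select?qt=search&q=open"));
    }
}
```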
API Search / Select
API Search/Select is the interface used to query the OSS search engine. The call is sent as an HTTP request; both POST and GET are supported. The engine answers with an XML result.
URL call
The basic relative URL is: /select
Example
http://localhost:8080/OpenSearchServer/select?q=test&qt=search
Parameters
Note: Parameters have to be encoded in UTF-8.
Name | Description | Type | Default value | Required?
q | Searches for keywords. Ex.: q=try | Text | | yes (or query)
query | Same as parameter q. Ex.: query=try | Text | | yes (or q)
qt | Preloads a query defined in the index configuration file config.xml. Ex.: qt=requestName | Text | | no
start | Rank of the first result returned. Used for pagination. Ex.: start=10 | Number | 0 | no
rows | Number of records to return. Used together with the start parameter for pagination. Ex.: rows=5 | Number | 10 | no
lang | Language of the keywords passed in parameter q. The engine uses the matching analyzer. Ex.: lang=fr | Text | | no
collapse.mode | Collapsing method. Ex.: collapse.mode=optimized | off, optimized or full | | no
collapse.field | Enables collapsing on the field passed as a parameter. Ex.: collapse.field=hostname | Field name | | no
collapse.max | Number of documents to return before collapsing is activated. Ex.: collapse.max=2 | Number | 2 | no
delete | If this parameter is present, the documents returned by the query are deleted. Ex.: &delete | | | no
noCache | Disables the cache, for the current call only. Ex.: &noCache | | | no
debug | Adds debug information to the result. Ex.: &debug | | | no
fq | Adds a filter to the current call. The parameter can be repeated in the same call to apply successive filters. Ex.: fq=date:20101201&fq=color:red | Text | | no
rf | Adds one or more fields to return. Ex.: &rf=date&rf=color | Text (field name) | | no
fl | Same as parameter rf | Text | | no
sort | Sets the order of the results. Prefix a field with + or - to sort in ascending or descending order. Ex.: &sort=-date&sort=color | Text | | no
facet | Enables faceting on the field passed as a parameter. Add a number in parentheses to set the minimum count. Ex.: &facet=color or &facet=color(2) | Text (Number) | | no
facet.multi | Same as parameter facet, for fields containing multiple values (multi-valued fields). Ex.: &facet.multi=color or &facet.multi=color(2) | Text (Number) | | no
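When building /select URLs programmatically, the parameters in the table above can be combined in one query string, with repeatable parameters (fq, rf, sort, facet) simply added several times and flags (delete, noCache, debug) added without a value. This is a minimal sketch; the class SelectUrlBuilder and its methods are hypothetical, and it only uses java.net.URLEncoder to apply the required UTF-8 encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds a /select URL with UTF-8 encoded parameters.
class SelectUrlBuilder {
    private final String base;
    private final List<String> parts = new ArrayList<String>();

    SelectUrlBuilder(String base) {
        this.base = base;
    }

    // Adds name=value; call it again with the same name to repeat a parameter
    SelectUrlBuilder param(String name, String value) {
        parts.add(enc(name) + "=" + enc(value));
        return this;
    }

    // Adds a parameter without a value, such as noCache or debug
    SelectUrlBuilder flag(String name) {
        parts.add(enc(name));
        return this;
    }

    String build() {
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < parts.size(); i++) {
            sb.append(i == 0 ? '?' : '&').append(parts.get(i));
        }
        return sb.toString();
    }

    // URLEncoder turns '+' into %2B, so "+title" is not decoded as " title"
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String url = new SelectUrlBuilder("http://localhost:8080/select")
                .param("q", "open").param("qt", "search")
                .param("fq", "color:red").param("sort", "-date")
                .param("rows", "5").flag("noCache")
                .build();
        System.out.println(url);
    }
}
```

Encoding matters for the sort parameter in particular: in a query string an unencoded '+' is normally decoded as a space, so an ascending sort written as +field should be sent as %2Bfield.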
XML result
Note: The answer is in XML format encoded in UTF-8.
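Because the answer is UTF-8 encoded XML, the safest way to read it in Java is to hand the raw byte stream to the XML parser, which reads the encoding from the XML declaration. This document does not show the result format, so the minimal sketch below (class XmlResult, hypothetical) only returns the root element and counts elements by tag name; any tag names you pass to it must be taken from an actual response.

```java
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper: parses an OSS XML result from a byte stream.
class XmlResult {

    // Parsing bytes (not a String) lets the parser honour the declared UTF-8 encoding
    static Element parse(InputStream in) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in)
                .getDocumentElement();
    }

    // Counts descendant elements with the given tag name
    static int count(Element root, String tagName) {
        return root.getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new URL("http://localhost:8080/select?qt=search&q=open").openStream();
        try {
            Element root = parse(in);
            System.out.println("Root element: " + root.getTagName());
        } finally {
            in.close();
        }
    }
}
```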
War deployment guide
This first version of the installation guide shows that it takes only a few minutes to get an OSS server running and ready to use.
1. Install Apache Tomcat or another Java application server: this guide assumes it is already installed. Please refer to the standard installation procedures on the corresponding website: http://tomcat.apache.org/index.html (version 5 or newer).
2. Deploy the OSS WAR file: put oss.war in the Tomcat directory 'tomcat/webapps'. You can rename it as you like, but keep the '.war' extension. Example: oss.war
3. Configure the WAR in Tomcat: in the 'tomcat/conf/Catalina/localhost/' directory, create an XML file with the same name you gave your WAR file in step 2, but with the '.xml' extension. Example: oss.xml
<Context docBase="oss.war" debug="0" crossContext="true">
    <Environment name="JaeksoftSearchServer/configfile" type="java.lang.String"
        value="/mnt/all_oss/oss1/config.xml" override="true" />
</Context>
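If you deploy several instances, the context file can be generated rather than written by hand. This is a minimal sketch, assuming Java 7 or newer; the class ContextFileWriter and its tomcatHome argument are hypothetical, while the Environment name and example paths come from the oss.xml example. It writes the docBase attribute with a capital B, which is how Tomcat's Context element spells it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes tomcat/conf/Catalina/localhost/<warName>.xml
class ContextFileWriter {

    // Minimal XML attribute escaping ('&' must be replaced first)
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String contextXml(String warName, String configFile) {
        return "<Context docBase=\"" + escape(warName + ".war") + "\" debug=\"0\" crossContext=\"true\">\n"
                + "    <Environment name=\"JaeksoftSearchServer/configfile\" type=\"java.lang.String\"\n"
                + "        value=\"" + escape(configFile) + "\" override=\"true\" />\n"
                + "</Context>\n";
    }

    static Path write(Path tomcatHome, String warName, String configFile) throws IOException {
        Path dir = tomcatHome.resolve(Paths.get("conf", "Catalina", "localhost"));
        Files.createDirectories(dir);
        Path file = dir.resolve(warName + ".xml"); // same base name as the WAR
        Files.write(file, contextXml(warName, configFile).getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ContextFileWriter /path/to/tomcat oss /mnt/all_oss/oss1/config.xml
        System.out.println("Wrote " + write(Paths.get(args[0]), args[1], args[2]));
    }
}
```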
4. Configure the physical index: in any folder you like (there are no special requirements; this example uses '/mnt/all_oss/'), create the directory that will hold your physical index, for instance oss1 (to match the previous steps).
a) Put the file config.xml in it (do not rename it). Note that oss.xml refers to it.
b) Create a single folder named 'index' in oss1. When the server starts, empty index files are automatically created inside it.
Example of a basic config.xml:
<configuration>
    <indices>
        <index name="index" searchCache="100" filterCache="100" fieldCache="500" />
    </indices>
    <schema>
        <analyzers>
            <analyzer name="StandardAnalyzer" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="ISOLatin1AccentFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" lang="en" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="SnowballEnglishFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" lang="fr" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="ISOLatin1AccentFilter" />
                <filter class="FrenchStemFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" lang="de" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="ISOLatin1AccentFilter" />
                <filter class="SnowballGermanFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" lang="nl" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="ISOLatin1AccentFilter" />
                <filter class="DutchStemFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" lang="es" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="ISOLatin1AccentFilter" />
                <filter class="SnowballSpanishFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" lang="it" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="ISOLatin1AccentFilter" />
                <filter class="SnowballItalianFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" lang="pt" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="ISOLatin1AccentFilter" />
                <filter class="SnowballPortugueseFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" lang="no" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="ISOLatin1AccentFilter" />
                <filter class="SnowballNorwegianFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" lang="se" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="ISOLatin1AccentFilter" />
                <filter class="SnowballSwedishFilter" />
            </analyzer>
            <analyzer name="TextAnalyzer" lang="fi" tokenizer="LetterOrDigitTokenizerFactory">
                <filter class="LowerCaseFilter" />
                <filter class="ISOLatin1AccentFilter" />
                <filter class="SnowballFinnishFilter" />
            </analyzer>
        </analyzers>
        <fields default="content" unique="url">
            <field name="lang" indexed="yes" stored="yes" />
            <field name="title" analyzer="TextAnalyzer" indexed="yes" stored="compress" termVector="positions_offsets" />
            <field name="titleExact" analyzer="StandardAnalyzer" indexed="yes" stored="compress" termVector="positions_offsets" />
            <field name="content" analyzer="TextAnalyzer" indexed="yes" stored="compress" termVector="positions_offsets" />
            <field name="contentExact" analyzer="StandardAnalyzer" indexed="yes" stored="compress" termVector="positions_offsets" />
            <field name="contentBaseType" indexed="yes" stored="yes" />
            <field name="url" indexed="yes" stored="yes" />
            <field name="urlSplit" indexed="yes" stored="no" analyzer="TextAnalyzer" termVector="positions_offsets" />
            <field name="urlExact" indexed="yes" stored="no" analyzer="StandardAnalyzer" termVector="positions_offsets" />
            <field name="metaDescription" indexed="no" stored="compress" />
            <field name="metaKeywords" indexed="no" stored="compress" />
            <field name="host" indexed="yes" stored="yes" />
        </fields>
    </schema>
    <parsers>
        <parser class="com.jaeksoft.searchlib.parser.HtmlParser" sizeLimit="8388608">
            <contentType>text/html</contentType>
        </parser>
        <parser class="com.jaeksoft.searchlib.parser.PdfParser" sizeLimit="8388608">
            <contentType>application/pdf</contentType>
        </parser>
        <parser class="com.jaeksoft.searchlib.parser.DocParser" sizeLimit="8388608">
            <contentType>application/msword</contentType>
        </parser>
        <parser class="com.jaeksoft.searchlib.parser.PptParser" sizeLimit="8388608">
            <contentType>application/vnd.ms-powerpoint</contentType>
        </parser>
    </parsers>
</configuration>
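Before starting Tomcat, it can help to confirm that config.xml is well-formed and declares the fields you expect, and that the 'index' folder sits next to it. This is a minimal sketch (class ConfigCheck, hypothetical) that relies only on the structure shown in the example above: field elements inside a fields element carrying a unique attribute.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sanity check for the physical index folder (step 4).
class ConfigCheck {

    // Throws if the file is not well-formed XML
    static Document load(File configFile) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(configFile);
    }

    // Names of all <field> elements, in document order
    static List<String> fieldNames(Document doc) {
        List<String> names = new ArrayList<String>();
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            names.add(((Element) fields.item(i)).getAttribute("name"));
        }
        return names;
    }

    // Value of the "unique" attribute on <fields>, or null if there is no <fields>
    static String uniqueField(Document doc) {
        Element fields = (Element) doc.getElementsByTagName("fields").item(0);
        return fields == null ? null : fields.getAttribute("unique");
    }

    public static void main(String[] args) throws Exception {
        File config = new File(args.length > 0 ? args[0] : "/mnt/all_oss/oss1/config.xml");
        Document doc = load(config);
        System.out.println("Fields: " + fieldNames(doc));
        System.out.println("Unique key field: " + uniqueField(doc));
        File index = new File(config.getParentFile(), "index");
        System.out.println("index folder present: " + index.isDirectory());
    }
}
```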