Search 2 Documentation

KnowledgeTree's Search capability has been significantly overhauled and improved in KnowledgeTree 3.5. Search2 is built on a Lucene-based search engine, and exposes a powerful search vocabulary via KnowledgeTree's Web Services interface. A Query Builder in the Web interface allows you to build complex expressions, which you use to query the KnowledgeTree document repository via Web Services.

Table of Contents

User Documentation
    Administrator Guide
    User Guide
    Optimization
Developer Documentation
    Roadmap
    Architecture
    Extractors
    Web service locating documents
    Internals
    Process
    Search URL
    BrowseCriterion

User Documentation

Administrator Guide

From KnowledgeTree Document Management Made Simple

Configuration

    [search]
    ; The number of results per page
    ; defaults to 25
    resultsPerPage = default

    ; The date format used when making queries using widgets
    ; defaults to Y-m-d. NOTE: future development
    dateFormat = default

    [indexer]
    ; The core indexing class
    coreClass=PHPLuceneIndexer

    ; The number of documents to be indexed in a cron session
    ; defaults to 20
    batchDocuments = default

    ; The location of the lucene indexes
    luceneDirectory=${varDirectory}/indexes

    ; The url for the Java Lucene Server. This should match up with the Lucene Server configuration.
    ; Defaults to http://localhost:8875
    javaLuceneURL = default
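The meaning of the value `default` in the settings above can be sketched in Python. This is an illustration under assumptions, not how KnowledgeTree loads its configuration (that happens in PHP): the helper `effective_settings` and the `DOCUMENTED_DEFAULTS` table are hypothetical, and the fallback values are copied from the comments in the config.ini excerpt.

```python
import configparser

# Hypothetical table: fallbacks for settings whose value is the literal "default",
# copied from the comments in the config.ini excerpt.
DOCUMENTED_DEFAULTS = {
    ("search", "resultsPerPage"): "25",
    ("search", "dateFormat"): "Y-m-d",
    ("indexer", "batchDocuments"): "20",
    ("indexer", "javaLuceneURL"): "http://localhost:8875",
}


def effective_settings(ini_text):
    """Parse config.ini text, replacing 'default' with the documented fallback."""
    parser = configparser.ConfigParser(interpolation=None)  # keep ${varDirectory} literal
    parser.optionxform = str  # preserve key case such as resultsPerPage
    parser.read_string(ini_text)
    settings = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if value == "default":
                value = DOCUMENTED_DEFAULTS.get((section, key), value)
            settings[(section, key)] = value
    return settings


example = """
[search]
resultsPerPage = default

[indexer]
batchDocuments = 10
luceneDirectory=${varDirectory}/indexes
javaLuceneURL = default
"""
for (section, key), value in effective_settings(example).items():
    print(f"{section}.{key} = {value}")
```

An explicit value such as `batchDocuments = 10` is kept as written; only the literal `default` falls back to the documented value. `${varDirectory}` is left unexpanded here because its resolution is done by KnowledgeTree, not by this sketch.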

Setting up the Lucene Directory

If using the Java Lucene Server, simply start the server and ensure that it is configured correctly. More information is available in ktroot/bin/luceneserver/README.TXT. Edit config.ini and ensure that the 'javaLuceneURL' setting is correct.

If using the PHP Lucene Server, run the search2/indexing/bin/recreateIndex.php script.

Registering new extractors

If a new extractor has been added to the search2/indexing/extractors folder, the search2/indexing/bin/registerTypes.php script must be run to associate it with the correct MIME types. Note that existing associations will not be overwritten.

Migration
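The rule that existing associations are not overwritten can be modelled with a small Python sketch. The function `register_types` and the extractor class names below are hypothetical; the real registerTypes.php stores its associations in KnowledgeTree's database, whose layout is not described here.

```python
def register_types(existing, discovered):
    """Merge extractor/MIME-type associations without overwriting old ones.

    existing:   {mime_type: extractor} already registered
    discovered: {extractor: [mime_type, ...]} found in the extractors folder
    """
    merged = dict(existing)  # leave the caller's mapping untouched
    for extractor, mime_types in discovered.items():
        for mime_type in mime_types:
            # setdefault only adds the association if the MIME type is unmapped
            merged.setdefault(mime_type, extractor)
    return merged


# Hypothetical extractor class names, for illustration only.
current = {"application/pdf": "PDFExtractor"}
found = {"NewPDFExtractor": ["application/pdf", "application/x-pdf"],
         "MarkdownExtractor": ["text/markdown"]}
print(register_types(current, found))
```

Here `application/pdf` stays with `PDFExtractor`, while the previously unmapped types go to the new extractors. If two new extractors claim the same unmapped type, the first one seen wins in this sketch; how the real script breaks such ties is not documented.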

Migrating to the new server requires that the contents of the full-text tables be extracted and inserted into the Lucene indexes. This is done using the search2/indexing/bin/migrate.php script. (This can be heavy on the system, so care should be taken when running it.)

Search Results Ranking

Review the 'search_ranking' table to find the weightings associated with matching subexpressions. These may be modified to improve the relevance of search results according to your needs.

Status

TODO: The Lucene indexers should provide some statistics on the Lucene index. They should provide some general information on the index. In addition, a diagnostics function should be available to ensure that the correct versions of the documents are indexed, and possibly to reschedule indexing if there is a mismatch for some reason. (This feature could be heavy on the system; care should be taken when implementing it.)

Background Tasks

search2/indexing/bin/cronIndexer.php: task to batch-index files.
search2/indexing/bin/optimise.php: task to optimise the Lucene index.

The indexing script should be run frequently, for example every 5 minutes. The number of documents indexed per run can be configured in config.ini (batchDocuments) and defaults to 20. If you run the indexer more often, you may want to decrease the number of documents indexed per run so that indexing does not place a serious load on the system and impact its performance.

The Lucene index requires periodic optimisation to keep performance optimal. This could be run once a day around midnight, or weekly, depending on how frequently the index is updated.

HOWTO: running a PHP script from the command line

    php -Cq script.php
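The trade-off between indexing frequency and batch size can be modelled roughly in Python: each scheduled run indexes at most `batchDocuments` queued documents, so throughput is bounded by batch size times runs per hour. The queue, the document ids and both helper functions are hypothetical; this does not reproduce what cronIndexer.php actually does internally.

```python
from collections import deque


def run_cron_session(queue, batch_documents=20):
    """Take at most batch_documents pending documents from the front of the queue."""
    indexed = []
    while queue and len(indexed) < batch_documents:
        doc_id = queue.popleft()
        # A real indexer would extract the text and update the Lucene index here.
        indexed.append(doc_id)
    return indexed


def documents_per_hour(batch_documents, interval_minutes):
    """Upper bound on indexing throughput for a batch size and cron interval."""
    return batch_documents * 60 / interval_minutes


pending = deque(range(1, 46))  # 45 hypothetical document ids awaiting indexing
runs = []
while pending:
    runs.append(len(run_cron_session(pending)))
print(runs)  # [20, 20, 5]
print(documents_per_hour(20, 5), documents_per_hour(8, 2))  # 240.0 240.0
```

The last line shows why the batch size should shrink when the interval does: 20 documents every 5 minutes and 8 documents every 2 minutes give the same ceiling of 240 documents per hour, but the shorter schedule spreads that work into smaller bursts.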

User Guide

The new search engine provides for more complicated search expressions than were possible in the past.

Expression Language

The core of the search engine is the 'expression language'. Expressions may be built up using the following grammar:

    expr  ::= expr { AND | OR } expr
    expr  ::= NOT expr
    expr  ::= (expr)
    expr  ::= expr { < | | >= | CONTAINS | STARTS WITH | ENDS WITH } value
    expr  ::= field BETWEEN value AND value
    expr  ::= field DOES [ NOT ] CONTAIN value
    expr  ::= field IS [ NOT ] LIKE value
    value ::= "search text here"
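To make the grammar concrete, here is a minimal recursive-descent parser in Python for the rules listed above. It is a sketch, not KnowledgeTree's own parser, and it rests on several assumptions: keywords are case-insensitive, precedence is NOT over AND over OR, the left-hand side of every comparison is a plain field name, and the comparison operators are <, <=, =, > and >=. It returns nested tuples rather than KnowledgeTree's internal expression objects.

```python
import re

# Quoted values, parentheses, comparison operators, or bare words (fields and keywords).
TOKEN_RE = re.compile(r'\s*(?:"([^"]*)"|([()])|(<=|>=|<|>|=)|([A-Za-z_]\w*))')


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        value, paren, op, word = m.groups()
        if value is not None:
            tokens.append(("VALUE", value))
        elif paren:
            tokens.append((paren, paren))
        else:
            tokens.append(("OP" if op else "WORD", op or word))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent; keywords are case-insensitive, precedence NOT > AND > OR."""

    def __init__(self, text):
        self.tokens, self.i = tokenize(text), 0

    def kind(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def word(self):
        return self.tokens[self.i][1].upper() if self.kind() == "WORD" else None

    def take(self, kind):
        if self.kind() != kind:
            raise SyntaxError(f"expected {kind} at token {self.i}")
        self.i += 1
        return self.tokens[self.i - 1][1]

    def keyword(self, kw):
        if self.word() != kw:
            raise SyntaxError(f"expected {kw} at token {self.i}")
        self.i += 1

    def parse(self):
        node = self.or_expr()
        if self.kind() is not None:
            raise SyntaxError(f"trailing input at token {self.i}")
        return node

    def or_expr(self):
        node = self.and_expr()
        while self.word() == "OR":
            self.i += 1
            node = ("OR", node, self.and_expr())
        return node

    def and_expr(self):
        node = self.unary()
        while self.word() == "AND":
            self.i += 1
            node = ("AND", node, self.unary())
        return node

    def unary(self):
        if self.word() == "NOT":
            self.i += 1
            return ("NOT", self.unary())
        if self.kind() == "(":
            self.i += 1
            node = self.or_expr()
            self.take(")")
            return node
        return self.predicate()

    def predicate(self):
        field = self.take("WORD")
        if self.kind() == "OP":  # field < "value", field >= "value", ...
            return (self.take("OP"), field, self.take("VALUE"))
        op = self.word()  # None if the next token is not a word
        self.i += 1
        if op == "CONTAINS":
            return ("CONTAINS", field, self.take("VALUE"))
        if op in ("STARTS", "ENDS"):
            self.keyword("WITH")
            return (op + " WITH", field, self.take("VALUE"))
        if op == "BETWEEN":
            low = self.take("VALUE")
            self.keyword("AND")
            return ("BETWEEN", field, low, self.take("VALUE"))
        if op in ("DOES", "IS"):  # DOES [NOT] CONTAIN / IS [NOT] LIKE
            negated = self.word() == "NOT"
            self.i += 1 if negated else 0
            self.keyword("CONTAIN" if op == "DOES" else "LIKE")
            node = ("CONTAINS" if op == "DOES" else "LIKE", field, self.take("VALUE"))
            return ("NOT", node) if negated else node
        raise SyntaxError(f"expected an operator after {field!r}, got {op!r}")
```

For example, `Parser('Title contains "contract" and Filesize < "1000"').parse()` yields an AND node over a CONTAINS predicate and a `<` predicate. Note that the AND inside `BETWEEN value AND value` is consumed by the predicate itself, so it never reaches the boolean AND loop.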

A field may be one of the following: CheckedOut, CheckedOutBy, CheckedoutDelta, Created, CreatedBy, CreatedDelta, DiscussionText, DocumentId, DocumentText, DocumentType, Filename, Filesize, Folder, GeneralText, IsCheckedOut, IsImmutable, Metadata, MimeType, Modified, ModifiedBy, ModifiedDelta, Tag, Title, Workflow, WorkflowID, WorkflowState, WorkflowStateID
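A client that builds queries may want to check field names against the list above before sending them. This Python sketch keeps the list in a lookup table; it assumes field names are matched case-insensitively (the example expressions in this guide write `filesize` for `Filesize`), and both that behaviour and the helper name `canonical_field` are assumptions rather than documented KnowledgeTree behaviour.

```python
# The built-in search fields, as listed in the documentation.
SEARCH_FIELDS = [
    "CheckedOut", "CheckedOutBy", "CheckedoutDelta", "Created", "CreatedBy",
    "CreatedDelta", "DiscussionText", "DocumentId", "DocumentText", "DocumentType",
    "Filename", "Filesize", "Folder", "GeneralText", "IsCheckedOut", "IsImmutable",
    "Metadata", "MimeType", "Modified", "ModifiedBy", "ModifiedDelta", "Tag",
    "Title", "Workflow", "WorkflowID", "WorkflowState", "WorkflowStateID",
]

# Assumption: matching is case-insensitive, so index the names by lower case.
_BY_LOWER = {name.lower(): name for name in SEARCH_FIELDS}


def canonical_field(name):
    """Return the listed spelling of a field name, or raise KeyError if unknown."""
    try:
        return _BY_LOWER[name.lower()]
    except KeyError:
        raise KeyError(f"unknown search field: {name!r}") from None


print(canonical_field("filesize"), canonical_field("WORKFLOWSTATEID"))
```

Metadata fields written as ["fieldset name"]["field name"] are not in this table and would need to be handled separately.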

A 'field' may also refer to metadata using the following syntax:

    ["fieldset name"]["field name"]

Note that values must be enclosed in "double quotes".

Example Expressions

Title contains "contract" and filesize