Googalize your Search with DirectInfo Documents DirectInfo Documents - New Features Author: Kiril Rusev Software Architect Semantec Bulgaria OOD Semantec GmbH Benzstr. 32 D-71083 Herrenberg, Germany www.semantec.de


Googalize your Search with DirectInfo Documents

DirectInfo Documents - New Features

Author:

Kiril Rusev

Software Architect
Semantec Bulgaria OOD

Semantec GmbH
Benzstr. 32
D-71083 Herrenberg, Germany
www.semantec.de

Agenda
• Motivation
• What is DirectInfo Documents?
• What's new?
• Live Demo
• Future development

Motivation - The Need

??

?

Motivation - The Challenge

Database Data

Email

Local Files

Internet

Intranet

Motivation - The Answer

Oracle Text Index

DirectInfo

Document Files

Database Data

Web Contents

Structured Search Results

What is DirectInfo?
• A framework based on Oracle Text
• Can index and search across various data sources
• Can be extended
• Can be adjusted to the customer’s needs

Oracle Text - how does indexing work?

DirectInfo and Oracle Text

Oracle Text

Context indexes with USER_DATASTORE

Full control over the indexing

Flexible and extensible filtering

Custom defined document grouping

Regular index management

Effective caching mechanism

Fast and flexible searching

A lot of context information

Summarizing capabilities

Oracle

DirectInfo

DirectInfo Architecture

Search Results

- Text fragments
- Document summary
- File information
- Direct link to every document
- ...

DirectInfo

Index Groups

Documents: local files, web content, email, third-party systems, etc.

Documents Meta Info

Text Indexes

Document Cache

Data Retrieval

Crawling

Gathering Meta Info

Indexing

Users

Sending Keywords

Searching

Getting Search Results List

Preparing The Results

Getting Results

Direct link to every document

Security

Checking user rights

Crawlers

What is DirectInfo Documents?
• Based on the DirectInfo platform
• A powerful document searching tool
• A web-based “Google-like” application
• Easily managed and deployed

What's new?
• Speed improvement
• Robustness
• Manageability
• Functional improvements
• LF and search results presentation improved

Speed improvement – Document Cache

User Datastore PL/SQL Procedure

Null Filter

PDF

PDF

HTML

HTML

Filtering

HTML

Document Cache

Store/Retrieve HTML

• Filtering is done only once
• The HTML version of the document is cached
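The two points above can be sketched in a few lines of Python. This is only an illustration of the caching idea, not the DirectInfo implementation: the class `DocumentCache`, the method `get_html`, the stand-in filter `fake_pdf_to_html` and the choice of a document id as cache key are all assumptions made for the example.

```python
class DocumentCache:
    """Hypothetical cache holding the filtered HTML version of each document."""

    def __init__(self, filter_func):
        self._filter = filter_func  # expensive conversion, e.g. PDF -> HTML
        self._store = {}            # document id -> cached HTML text
        self.filter_calls = 0       # counts how often filtering really ran

    def get_html(self, doc_id, raw_bytes):
        # Filter only on the first request; later requests hit the cache.
        if doc_id not in self._store:
            self.filter_calls += 1
            self._store[doc_id] = self._filter(raw_bytes)
        return self._store[doc_id]


def fake_pdf_to_html(raw_bytes):
    # Stand-in for a real document filter (assumption for this sketch).
    text = raw_bytes.decode("utf-8", errors="replace")
    return "<html><body>" + text + "</body></html>"


cache = DocumentCache(fake_pdf_to_html)
first = cache.get_html("doc-1", b"annual report")
second = cache.get_html("doc-1", b"annual report")
```

Both calls return the same HTML, but the filter runs only for the first one; indexing and result presentation can then both read the cached version.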

Speed improvement – Faster Crawling

DirectInfo

Internet

Local Files

Email

Crawler Interface

File Crawler

Web Crawler

Other…

Crawlers are adjusted according to the target document sources
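A crawler interface of this kind might look as follows in Python. The names `Crawler`, `FileCrawler`, `InMemoryCrawler` and `gather` are hypothetical and chosen for the example; the in-memory crawler merely stands in for a web or email crawler so that the sketch runs without network access.

```python
from abc import ABC, abstractmethod
import os
import tempfile


class Crawler(ABC):
    """Common interface: every crawler yields (location, raw_bytes) pairs."""

    @abstractmethod
    def crawl(self):
        ...


class FileCrawler(Crawler):
    """Walks a local directory tree and reads every file in it."""

    def __init__(self, root):
        self.root = root

    def crawl(self):
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as handle:
                    yield path, handle.read()


class InMemoryCrawler(Crawler):
    """Stand-in for a web or email crawler (assumption for this sketch)."""

    def __init__(self, documents):
        self.documents = documents

    def crawl(self):
        for location, data in self.documents.items():
            yield location, data


def gather(crawlers):
    # The caller depends only on the interface, not on the document source.
    found = []
    for crawler in crawlers:
        found.extend(crawler.crawl())
    return found


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.txt"), "wb") as handle:
        handle.write(b"hello")
    web = InMemoryCrawler({"http://example.invalid/": b"<p>hi</p>"})
    documents = gather([FileCrawler(root), web])
```

Adding a new document source then means adding one more class that implements `crawl`, without touching the code that consumes the results.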

Robustness – Better Filtering

Before: Datastore, INSO Filter

PDF PDF HTML

XFilter

After: Datastore, PDF, HTML, NULL

Filter

HTML

Filter 1 Filter 2 Filter N…
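One way to read the "After" picture is a set of pluggable filters that each convert one format to HTML, so that a failing filter affects only its own document. The Python sketch below illustrates that reading under stated assumptions: the registry `FILTERS`, the functions `text_filter`, `html_filter` and `to_html`, and the dispatch by file extension are invented for the example and are not taken from DirectInfo.

```python
import html


def text_filter(data):
    # Plain text -> HTML, escaping characters that are special in HTML.
    return "<pre>" + html.escape(data.decode("utf-8")) + "</pre>"


def html_filter(data):
    # HTML needs no conversion, only decoding.
    return data.decode("utf-8")


# Hypothetical registry: Filter 1, Filter 2, ... Filter N, keyed by extension.
FILTERS = {"txt": text_filter, "html": html_filter}


def to_html(name, data, filters=FILTERS):
    """Return the HTML version of a document, or None if it cannot be filtered."""
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    chosen = filters.get(extension)
    if chosen is None:
        return None  # no filter registered for this format
    try:
        return chosen(data)
    except Exception:
        return None  # a broken document is skipped, the run continues


converted = to_html("notes.txt", b"1 < 2")
skipped = to_html("broken.txt", b"\xff")
```

Because every filter produces HTML, the text index itself would only ever see one format, which is consistent with the NULL filter shown on the slide.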

Manageability - Indexing in Chunks

Before: Dtx_Ddl.Sync_Index Index

Unstoppable !!!

After: Index

Dtx_Ddl.Sync_Index

Dtx_Ddl.Sync_Index

Dtx_Ddl.Sync_Index………
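The chunking idea can be illustrated without a database. In the Python sketch below each pass of the loop stands in for one `Dtx_Ddl.Sync_Index` call on a bounded batch; the function `sync_in_chunks`, the `should_stop` callback and the in-memory queue are assumptions for the example and do not call any Oracle API.

```python
from collections import deque


def sync_in_chunks(pending, index, chunk_size, should_stop):
    """Move pending documents into the index one bounded chunk at a time.

    The stop condition is checked between chunks, so the run can be
    interrupted without abandoning a chunk half-way.
    """
    chunks = 0
    while pending and not should_stop():
        for _ in range(min(chunk_size, len(pending))):
            index.append(pending.popleft())
        chunks += 1
    return chunks


pending = deque(f"doc-{n}" for n in range(10))
index = []
checks = {"count": 0}


def stop_after_two_checks():
    # Simulates an administrator requesting a stop after two chunks.
    checks["count"] += 1
    return checks["count"] > 2


chunks_done = sync_in_chunks(pending, index, 3, stop_after_two_checks)
```

After the stop request six documents are indexed and four remain queued, so a later run can simply continue where this one ended.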

Functional improvements - Duplicated Files Detection

Before:

Found Files Indexed Files

After:

Found Files

Indexed Files
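The slide does not say how duplicates are recognised. A common approach, assumed here purely for illustration, is to compare a hash of the file contents, so that only the first copy is indexed. The function name `split_duplicates` and the sample paths are hypothetical.

```python
import hashlib


def split_duplicates(found_files):
    """Separate found files into those to index and detected duplicates.

    found_files is an iterable of (path, raw_bytes) pairs. Two files count
    as duplicates when their SHA-256 content digests are equal.
    """
    first_seen = {}   # content digest -> path of the first copy
    duplicates = []   # (duplicate path, path of the copy that is indexed)
    for path, data in found_files:
        digest = hashlib.sha256(data).hexdigest()
        if digest in first_seen:
            duplicates.append((path, first_seen[digest]))
        else:
            first_seen[digest] = path
    return list(first_seen.values()), duplicates


found = [
    ("/docs/report.pdf", b"quarterly numbers"),
    ("/backup/report.pdf", b"quarterly numbers"),
    ("/docs/memo.txt", b"meeting at ten"),
]
to_index, duplicates = split_duplicates(found)
```

Three files are found but only two are indexed; the duplicate still remembers which indexed copy it matches, so search results could list both locations.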

Functional improvements - Summarizer

LF and search results presentation improved
• Deferred fragments loading
• Skins support, XP look and feel
• Visual and functional redesign - HTML Frames
• Searching made simpler

Live Demo

Future development
• Defining and searching of metadata
• Search results clustering
• Improved flexibility
• Improved administration
• Improved caching
• Better summarizing

Thank You!