
Oracle TEXT 10g Release 1 New Features Edwin Balthes Oracle Support Services Oracle Deutschland GmbH


Oracle TEXT 10g Release 1

New Features

Edwin Balthes
Oracle Support Services
Oracle Deutschland GmbH

AGENDA

• Multilingual lexer
• Multipart MIME filtering
• Query log analysis (new package)
• Progressive relaxation
• JDeveloper wizards for text search
• NEAR_ACCUM
• CTX_REPORT XML output
• ALTER INDEX REBUILD with REPLACE METADATA

Text 10g Release 1: Globalization

• Unicode lexer enhancement
• Japanese language support
• New German spelling

Text 10g Release 1: Unicode Lexer

• New lexer preference: WORLD_LEXER
• Support for every Unicode 4.0 language
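A minimal sketch of how the new lexer type might be put to use, assuming a table docs with a text column (the table used in the query examples of this deck). The preference name my_world_lexer and the index name are hypothetical.

```sql
-- Create a lexer preference of type WORLD_LEXER
-- (the preference name is a hypothetical choice).
BEGIN
  ctx_ddl.create_preference('my_world_lexer', 'WORLD_LEXER');
END;
/

-- Build a CONTEXT index that tokenizes documents with this lexer.
CREATE INDEX docs_text ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER my_world_lexer');
```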

Text 10g Release 1: Japanese Language Support

• Delimiter characters
• Fuzzy matching for Japanese
• Japanese stemming
• Japanese Unicode
• Japanese user lexicon

Text 10g Release 1: New German Spelling

Old spelling      New spelling
Potential         Potenzial
Katarrh           Katarr, Katarrh
Delphin           Delfin
Erdgeschoß        Erdgeschoss
Schiffahrt        Schifffahrt
Weh tun           Wehtun
Irgend etwas      Irgendetwas
Soviel            So viel

• Old spelling
• New spelling
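A hedged sketch of how both spellings might be enabled, assuming the feature is controlled by the BASIC_LEXER attribute NEW_GERMAN_SPELLING (verify against the reference for your release). The preference and index names are hypothetical, and docs(text) is the table from the query examples in this deck.

```sql
-- Assumption: NEW_GERMAN_SPELLING on a BASIC_LEXER makes old and new
-- spellings (e.g. Schiffahrt / Schifffahrt) match each other.
BEGIN
  ctx_ddl.create_preference('german_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('german_lexer', 'NEW_GERMAN_SPELLING', 'YES');
END;
/

CREATE INDEX docs_text ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER german_lexer');
```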

Text 10g Release 1: Query Template Enhancements

• Query rewrite
• Progressive relaxation
• Query language specification
• Alternative scoring
• Alternative grammar

<query>…..</query>

Text 10g Release 1: Query Rewrite

Text 10g Release 1: Example - Query Rewrite

SELECT * FROM purchaseorders WHERE CONTAINS (text,
'<query>
  <textquery lang="ENGLISH" grammar="CONTEXT"> Retail Sales
    <progression>
      <seq><rewrite>transform((TOKENS, "{", "}", " "))</rewrite></seq>
      <seq><rewrite>transform((TOKENS, "{", "}", " ; "))</rewrite></seq>
      <seq><rewrite>transform((TOKENS, "{", "}", "AND"))</rewrite></seq>
      <seq><rewrite>transform((TOKENS, "{", "}", "ACCUM"))</rewrite></seq>
    </progression>
  </textquery>
  <score datatype="INTEGER" algorithm="COUNT"/>
</query>')>0;

Text 10g Release 1: NEAR_ACCUMulate

NEAR_ACCUM((word1, word2,..., wordn) [, max_span [, order]])

Text 10g Release 1: Progressive Relaxation

select * from purchaseorders where CONTAINS (text,
'<query>
  <textquery lang="ENGLISH" grammar="CONTEXT">Retail Sales
    <progression>
      <seq>{Retail} {Sales}</seq>
      <seq>{Retail} NEAR {Sales}</seq>
      <seq>{Retail} AND {Sales}</seq>
      <seq>{Retail} ACCUM {Sales}</seq>
    </progression>
  </textquery>
  <score datatype="INTEGER" algorithm="COUNT"/>
</query>')>0;
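Progressive relaxation is typically most useful when only the first few hits are needed, so that the looser sequences run only if the stricter ones do not return enough rows. The following is a sketch under that assumption: it reuses the purchaseorders query from this slide inside a PL/SQL cursor and stops after ten rows. The row limit and the use of ROWID as the displayed value are illustrative choices, not part of the original slides.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- Same progressive-relaxation template as on the slide.
  CURSOR c_hits IS
    SELECT rowid AS rid
    FROM   purchaseorders
    WHERE  CONTAINS(text,
      '<query>
         <textquery lang="ENGLISH" grammar="CONTEXT">Retail Sales
           <progression>
             <seq>{Retail} {Sales}</seq>
             <seq>{Retail} NEAR {Sales}</seq>
             <seq>{Retail} AND {Sales}</seq>
             <seq>{Retail} ACCUM {Sales}</seq>
           </progression>
         </textquery>
         <score datatype="INTEGER" algorithm="COUNT"/>
       </query>') > 0;
  n PLS_INTEGER := 0;
BEGIN
  FOR r IN c_hits LOOP
    DBMS_OUTPUT.PUT_LINE(ROWIDTOCHAR(r.rid));
    n := n + 1;
    EXIT WHEN n >= 10;  -- stop fetching once ten hits have been read
  END LOOP;
END;
/
```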

Text 10g Release 1: Query Template Enhancements

MULTI_LEXER - Query Language

select id from docs where CONTAINS (text,
'<query><textquery lang="french">bon soir</textquery></query>')>0;

Alternative Scoring

select id from docs where CONTAINS (text,
'<query><textquery grammar="CONTEXT" lang="english"> mustang </textquery>
<score datatype="float" algorithm="DEFAULT"/></query>')>0;

Alternative Grammar

select id from docs where CONTAINS (text,
'<query><textquery grammar="CTXCAT">San Diego</textquery>
<score datatype="integer"/></query>')>0;
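The lang attribute in the first query only has an effect when the index was built with a MULTI_LEXER. A minimal sketch of such a setup follows, assuming the docs table has an additional column lang holding each document's language; that column and all preference names are hypothetical.

```sql
BEGIN
  -- One sub-lexer per language (names are hypothetical).
  ctx_ddl.create_preference('english_lexer', 'BASIC_LEXER');
  ctx_ddl.create_preference('french_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('french_lexer', 'BASE_LETTER', 'YES');

  -- The MULTI_LEXER picks a sub-lexer per row based on the language column.
  ctx_ddl.create_preference('global_lexer', 'MULTI_LEXER');
  ctx_ddl.add_sub_lexer('global_lexer', 'DEFAULT', 'english_lexer');
  ctx_ddl.add_sub_lexer('global_lexer', 'FRENCH', 'french_lexer');
END;
/

CREATE INDEX docs_text ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER global_lexer LANGUAGE COLUMN lang');
```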

Text 10g Release 1: CTX_REPORT Package

• CTX_REPORT.DESCRIBE_INDEX
• CTX_REPORT.DESCRIBE_POLICY
• CTX_REPORT.CREATE_INDEX_SCRIPT
• CTX_REPORT.CREATE_POLICY_SCRIPT
• CTX_REPORT.INDEX_SIZE
• CTX_REPORT.INDEX_STATS
• CTX_REPORT.TOKEN_INFO
• CTX_REPORT.QUERY_LOG_SUMMARY
• CTX_REPORT.TOKEN_TYPE
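As a sketch of how two of these reports might be called from SQL*Plus, assuming an index named DOCS_TEXT (the name used in the XML output example of this deck) and the function forms that return a CLOB:

```sql
-- SQL*Plus setting so the CLOB report is not truncated on display.
SET LONG 100000

-- Space used by the index's internal tables.
SELECT ctx_report.index_size('DOCS_TEXT') FROM dual;

-- Script that would recreate the index with its preferences.
SELECT ctx_report.create_index_script('DOCS_TEXT') FROM dual;
```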

Text 10g Release 1: CTX_REPORT - Query Log Analysis

• Which queries were run?
• Which queries were successful?
• Which queries were not successful?
• WHAT was queried, and HOW OFTEN?

Text 10g Release 1: CTX_REPORT - Query Analysis

1. Start query logging
2. End query logging
3. Query log summary
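The three steps might look as follows in SQL*Plus. This is a sketch: the log file name query.log is hypothetical, and the sample query reuses the docs table from the query template examples.

```sql
SET SERVEROUTPUT ON

-- 1. Start query logging (the log file name is hypothetical).
BEGIN
  ctx_output.start_query_log('query.log');
END;
/

-- Run the queries that should be analyzed.
SELECT id FROM docs WHERE CONTAINS(text, 'mustang') > 0;

-- 2. End query logging.
BEGIN
  ctx_output.end_query_log;
END;
/

-- 3. Summarize the log: the 20 most frequent queries that returned hits.
DECLARE
  the_queries ctx_report.query_table;
BEGIN
  ctx_report.query_log_summary('query.log', NULL, the_queries,
                               row_num   => 20,
                               most_freq => TRUE,
                               has_hit   => TRUE);
  FOR i IN 1 .. the_queries.COUNT LOOP
    DBMS_OUTPUT.PUT_LINE(the_queries(i).times || '  ' || the_queries(i).query);
  END LOOP;
END;
/
```

Setting has_hit to FALSE would instead list queries that returned no hits, which corresponds to the "not successful" question on the previous slide.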

Text 10g Release 1: XML Output - CTX_REPORT Package

SELECT ctx_report.describe_index('DOCS_TEXT','XML') FROM dual;

CTX_REPORT.DESCRIBE_INDEX('DOCS_TEXT','XML')
--------------------------------------------------------------------------------
<CTXREPORT><DESCRIBE_INDEX><INDEX_ATTRIBUTES>
<INDEX_ATTRIBUTE NAME="index name">"CTXSYS"."DOCS_TEXT"</INDEX_ATTRIBUTE>
<INDEX_ATTRIBUTE NAME="index id">1392</INDEX_ATTRIBUTE>
<INDEX_ATTRIBUTE NAME="index type">context …

Text 10g Release 1: Enhancements - Document Services

In 9i, the highlight, markup, tokens, filter, and gist services required an index.

In 10g, they also work without an index.
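A sketch of an index-free document service call, assuming the policy-based variant CTX_DOC.POLICY_TOKENS together with a policy created through CTX_DDL.CREATE_POLICY. The preference name, policy name, and sample text are hypothetical.

```sql
SET SERVEROUTPUT ON

-- A policy bundles preferences the way an index would, without storing data.
BEGIN
  ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  ctx_ddl.create_policy(policy_name => 'my_policy', lexer => 'my_lexer');
END;
/

-- Tokenize an in-memory document using only the policy.
DECLARE
  rtab ctx_doc.token_tab;
BEGIN
  ctx_doc.policy_tokens('my_policy',
                        'Retail sales rose sharply in the last quarter',
                        rtab);
  FOR i IN 1 .. rtab.COUNT LOOP
    DBMS_OUTPUT.PUT_LINE(rtab(i).token || ':' || rtab(i).offset);
  END LOOP;
END;
/
```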

Text 10g Release 1: ALTER INDEX REBUILD with REPLACE METADATA

• Replaces the existing preference settings with new preference settings
• Also applies to the SYNC parameters
• The text index is not rebuilt

• CAUTION: you are responsible for keeping the index consistent
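For example, switching an existing index to ON COMMIT synchronization without reindexing might look like this (a sketch; the index name docs_text is taken from the other examples in this deck):

```sql
-- Only the index metadata is replaced; the indexed data is left untouched.
ALTER INDEX docs_text REBUILD
  PARAMETERS ('REPLACE METADATA SYNC (ON COMMIT)');
```

Because the data is not reprocessed, a change that affects tokenization (for instance a different lexer) would leave old and new documents indexed differently, which is the consistency risk the slide warns about.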

Text 10g Release 1: Enhancements - Mail Filtering

• Converts an RFC-2045 email into an indexable format

• Message bodies are handled according to their Content-Type
• Text messages are converted to the database character set
• Binary content is filtered via INSO
• Other non-binary data is not output

• User-defined fields are searchable as sections
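A hedged sketch of a mail-filtering setup, assuming the filter type MAIL_FILTER with its INDEX_FIELDS attribute and a hypothetical table mails whose column msg holds the raw messages. The preference, section group, and index names are likewise hypothetical.

```sql
BEGIN
  -- Keep the Subject header in the filtered output as tag markup.
  ctx_ddl.create_preference('my_mail_filter', 'MAIL_FILTER');
  ctx_ddl.set_attribute('my_mail_filter', 'INDEX_FIELDS', 'SUBJECT');

  -- Expose the preserved header as a searchable field section.
  ctx_ddl.create_section_group('mail_sg', 'BASIC_SECTION_GROUP');
  ctx_ddl.add_field_section('mail_sg', 'subject', 'SUBJECT', TRUE);
END;
/

CREATE INDEX mails_idx ON mails(msg)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER my_mail_filter SECTION GROUP mail_sg');

-- Find messages whose subject mentions "invoice".
SELECT rowid FROM mails WHERE CONTAINS(msg, 'invoice WITHIN subject') > 0;
```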

Text 10g Release 1: Indexing Enhancements

• AUTO and ON COMMIT synchronization for CONTEXT indexes
• Transactional CONTEXT indexes
• Automatic multi-language indexing
• Support for building local partitioned CONTEXT indexes in parallel
• Binary filtering for the MULTI_COLUMN_DATASTORE
• New XML output option for index reports

Text 10g Release 1: AUTO and ON COMMIT Synchronization

(Diagram: DML, COMMIT, DML pending queue)

Text 10g Release 1: Index Synchronization

Global Indexes

CREATE INDEX <index_name> ON <table_name>(<column_name>)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('SYNC(MANUAL | ON COMMIT | EVERY "interval_string" MEMORY size PARALLEL degree)');

Local Indexes

CREATE INDEX index_name ON table_name(column_name)
INDEXTYPE IS CTXSYS.CONTEXT
LOCAL (PARTITION part_name1 PARAMETERS('SYNC(MANUAL | ON COMMIT | EVERY "interval_string" MEMORY size PARALLEL degree)'),
       PARTITION part_name2 PARAMETERS('...'),
       ...)
PARAMETERS('...');
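A concrete instance of the global-index syntax, as a sketch: it assumes the docs(id, text) table from the query template examples and a hypothetical index name.

```sql
-- Synchronize the index automatically whenever a transaction commits.
CREATE INDEX docs_text ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SYNC (ON COMMIT)');

INSERT INTO docs (id, text) VALUES (1, 'retail sales report');
COMMIT;

-- The new row is searchable without a manual index sync.
SELECT id FROM docs WHERE CONTAINS(text, 'retail') > 0;
```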

Text 10g Release 1: Views - Synchronization

• CTX_USER_INDEXES
• CTX_INDEXES
• CTX_INDEX_PARTITIONS
• CTX_USER_INDEX_PARTITIONS
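The synchronization settings of an index can be checked in these views. The sketch below assumes the columns IDX_SYNC_TYPE and IDX_SYNC_INTERVAL in CTX_USER_INDEXES; verify the column names against the view definitions in your release.

```sql
-- Show how each of the current user's Text indexes is synchronized.
SELECT idx_name,
       idx_sync_type,
       idx_sync_interval
FROM   ctx_user_indexes;
```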

Text 10g Release 1: Transactional CONTEXT Indexes

(Diagram: SELECT… CONTAINS(…) and DML)
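A minimal sketch of a transactional index, assuming the TRANSACTIONAL keyword in the index parameters and the docs(id, text) table used elsewhere in this deck. The index name is hypothetical.

```sql
CREATE INDEX docs_text ON docs(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('TRANSACTIONAL');

INSERT INTO docs (id, text) VALUES (2, 'wholesale figures');

-- With a transactional index, CONTAINS also considers rows that have
-- been inserted or updated but not yet synchronized into the index.
SELECT id FROM docs WHERE CONTAINS(text, 'wholesale') > 0;
```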

Text 10g Release 1: CTXXPATH Enhancements

• Indexing of numbers and support for numeric range searches
• Attribute existence
• Positional predicates

Text 10g Release 1: INPATH and HASPATH Enhancements

• Highlighting with INPATH
• Highlighting with HASPATH

Text 10g Release 1: Changes to CTXSYS Privileges

DBA Privilege

Text 10g Release 1: JDeveloper TEXT Wizards

• Text Wizard
• Classification Wizard
• Catalog Wizard

Text 10g Release 1: Text Wizard Demo

Text – Simple Search

Text – Advanced Search

Text – Knowledge Base

Oracle Ultra Search

• Out-of-the-box search engine
– Based on Oracle Text
• Searches across intranet/extranet sources
– Web, databases, files, mail servers, repositories
• Provides a web-style interface and a Java API for the user interface
• Ships with 9i Database, 9iAS/Portal, and Collab. Suite R2/3

Ultra Search

Ultra Search Adv. Search

Collab Suite Search App.

Ultra Search Architecture

(Diagram labels: Web Browser, Ultra Search Client, Ultra Search Mid-Tier Component, Web Server, Query & Admin Capabilities, Ultra Search Server, SQL Engine, Oracle Text, Crawler)

Crawled Search Architecture

(Diagram labels: Client, Mid-tier, Search App., Ultra Search, Search Repository, Web Crawler, Mail Crawlet, Files Crawlet, Calendar Crawlet, Meeting Crawlet; sources: Mail, Files, Calendar, Meeting)

Oracle Confidential

Caching Documents

WK$DOCUMENT

Indexing Documents

Gathering URLs

Federated Search Architecture

(Diagram labels: Client, Mid-tier, Search App., Search Federator, Mail Searchlet, Files Searchlet, Calendar Searchlet, Meeting Searchlet, Web Searchlet, Search Repository; sources: Mail, Files, Calendar, Meeting, UltraSearch, Google)

Integrated Search Architecture

(Diagram labels: Client, Collab. Suite Search App., Mid-tier, Search Federator, Mail Searchlet, Files Searchlet, Calendar Searchlet, Meeting Searchlet, Web Searchlet, Crawled Search, Ultra Search Repository, Portal Crawlet, Database Crawlet; sources: Mail, Files, Calendar, Meeting, Portal, Database, Google, 3rd-party Sources)

Java Query API

New Feature Areas

• Security
• Federated search
• New repositories (Documentum, Lotus/Notes)
• Classification/Clustering

Advanced Security

• Authentication with SSO
• The Ultra Search repository supports ACLs for crawled documents
– OID integration for group information
– Searches are restricted to what the user is allowed to see
– Crawlet for (document, ACL) pairs
• Only with Extensible Crawler repositories
• HTTPS and Digest Authentication support

Federated Search

(Diagram labels: Client, Mid-tier, Search Federator, Mail Searchlet, Files Searchlet, Calendar Searchlet, Meeting Searchlet, Web Searchlet, Crawled Search, Ultra Search Repository, Portal Crawlet, Database Crawlet; sources: Mail, Files, Calendar, Meeting, Portal, Database, Google, 3rd-party Sources)

New 10g Multimedia Features

• Standards support: SQL/MM Still Image
• New version of Java Advanced Imaging and additional image processing operators
• Support for additional media formats
– Microsoft ASF, MPEG2 & MPEG4
• Microsoft Windows Media Server Plugin
• Real Server Plugin for Helix Server
• XML DB integration


Standards Support

Oracle10g supports the first edition of the ISO/IEC 13249-5:2001 SQL/MM Part 5: Still Image Standard.

The standard defines object relational types for images and image characteristics. Each object type includes attributes, methods, and associated SQL functions and procedures.

Java Advanced Imaging

• Support for JAI 1.1.1_01, the newest version of the Sun open standard for image processing
• Additional image processing operators
– Arbitrary image rotate
– Flip & mirror
– Page extract from a multi-page TIFF file
– Contrast enhancement
– Quantize algorithm
– Gamma correction

Microsoft ASF & Windows Media Server

• Advanced System Format has become a popular streaming media format on the web
– Oracle10g Database can parse ASF file format metadata
• Windows Media Server
– An Oracle-developed plugin for Microsoft Windows Media Server that enables it to stream ASF audio/video files stored in Oracle10g Database
– Analogous to the existing Oracle9i Database support for the Real Networks streaming server
• Available through OTN

Q&A

Questions?