caDSR Freestyle Search June 11, 2009



caDSR Freestyle Search

• Overview
• Architecture
• Implementation
• Dependencies
• Futures

caDSR Freestyle Search - Overview

• Provides a “Google”-like search across caDSR
• Case insensitive
• Results limited to only the highest-ranked matches; does *not* normally return all matches
• Match weight is a result of term sequence, intervening terms, number of occurrences, Workflow Status, Registration Status, Administered Item type, etc.
• Results sorted descending by weight, i.e. the heaviest match appears at the top of the list
• Does not require the user to know caDSR structure or objects/attributes to perform searches
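The ordering and truncation described above can be illustrated with a minimal Java sketch. It assumes the match weights have already been computed elsewhere and only shows "sort descending by weight, keep the top few". The names `WeightedHit`, `RankingSketch`, and `topMatches` are hypothetical and are not part of the Freestyle API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical search hit: an item name plus an already-computed match weight. */
class WeightedHit {
    final String name;
    final int weight;

    WeightedHit(String name, int weight) {
        this.name = name;
        this.weight = weight;
    }
}

/** Illustration only; not the Freestyle implementation. */
class RankingSketch {
    /** Returns at most `limit` hits, heaviest first. The input list is not modified. */
    static List<WeightedHit> topMatches(List<WeightedHit> hits, int limit) {
        List<WeightedHit> sorted = new ArrayList<WeightedHit>(hits);
        Collections.sort(sorted, new Comparator<WeightedHit>() {
            public int compare(WeightedHit a, WeightedHit b) {
                // Reverse the operands so larger weights sort first.
                return Integer.compare(b.weight, a.weight);
            }
        });
        int end = Math.min(Math.max(limit, 0), sorted.size());
        return new ArrayList<WeightedHit>(sorted.subList(0, end));
    }

    public static void main(String[] args) {
        List<WeightedHit> hits = new ArrayList<WeightedHit>();
        hits.add(new WeightedHit("Patient Weight", 40));
        hits.add(new WeightedHit("Body Weight", 90));
        hits.add(new WeightedHit("Weight Unit", 65));
        // Prints "90 Body Weight" then "65 Weight Unit"; the lightest hit is dropped.
        for (WeightedHit h : topMatches(hits, 2)) {
            System.out.println(h.weight + " " + h.name);
        }
    }
}
```

Because only the top-ranked hits are kept, a caller that needs every match would have to raise the limit; the sample names and weights are made up for the illustration.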

caDSR Freestyle Search - Overview

• Stakeholders:
  – Form Designers
  – Modelers
  – Developers
  – Analysts
  – Clinicians
  – Statisticians
  – Researchers
  – Curators
  – caBIG
  – NCI

caDSR Freestyle Search - Architecture

• Technologies
  – Java 1.5
  – JavaScript
  – HTML 4
  – JDBC
  – Struts
  – EVS 4.2

caDSR Freestyle Search - Architecture

• View: Struts / JSP / HTML
• Controller: JBoss
• Application: Java 1.5
• Model: Class, Interface
• Database: JDBC, PL/SQL, ANSI SQL
• Persist: Oracle 10g

caDSR Freestyle Search - Architecture

• Auto-deploy
  – Deployable via Anthill
  – ant -DPROP.FILE=… build-all deploy
• SCM
  – CVS
  – .cvsignore for all transient files
  – One file, no duplicates, e.g. template.web.xml vs. web.xml
• All files placed in deployment-artifacts
• Production deployment artifacts
  – Accessible via links in email from Anthill
  – Files hosted on GForge for distribution
  – URL references to GForge hosting for Wiki, Download, etc.

caDSR Freestyle Search - Architecture

• Jboss/freestyle.war
  – Web Browser UI
  – Passes input to JAR and formats result in HTML, pure UI layer
• Gforge/freestylesearch.jar
  – API interface for searches, options, etc.
• Bin/autorun.sh
  – Deploys to /local/content/freestyle/bin/.
  – Automated job to update search indices
  – Scheduled and launched by CRON every morning at 3:00 am and every hour between 8:00 am and 5:00 pm

caDSR Freestyle Search - Architecture

Tool Options Table

tool name    property    value                 …
FREESTYLE    URL         http://freestyle..    …
SENTINEL     EMAIL       [email protected]      …
…            …           …                     …

• Tool options table hosts configuration values beyond 3rd party requirements, e.g. XML
• Dynamic
  – Values are read as needed: user sees changes in real time
  – Values cached when new session created: user must close window
  – Values are never cached with the application, since that would require a restart of JBoss
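The difference between values read as needed and values cached when a session is created can be sketched in a few lines of Java. This is an in-memory illustration under stated assumptions, not the tool's code: `ToolOptionsStore` stands in for the tool options table, `SessionOptions` stands in for a user session, and the example URLs are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the tool options table: tool name -> property -> value. */
class ToolOptionsStore {
    private final Map<String, Map<String, String>> rows = new HashMap<String, Map<String, String>>();

    void set(String tool, String property, String value) {
        Map<String, String> props = rows.get(tool);
        if (props == null) {
            props = new HashMap<String, String>();
            rows.put(tool, props);
        }
        props.put(property, value);
    }

    /** Read-as-needed lookup: always reflects the current stored value, or null if absent. */
    String read(String tool, String property) {
        Map<String, String> props = rows.get(tool);
        return props == null ? null : props.get(property);
    }

    /** Copies the current values for one tool, so later updates do not affect the copy. */
    Map<String, String> snapshot(String tool) {
        Map<String, String> props = rows.get(tool);
        return props == null ? new HashMap<String, String>() : new HashMap<String, String>(props);
    }
}

/** Hypothetical user session that caches option values at creation time. */
class SessionOptions {
    private final Map<String, String> cached;

    SessionOptions(ToolOptionsStore store, String tool) {
        this.cached = store.snapshot(tool);
    }

    String get(String property) {
        return cached.get(property);
    }
}

class ToolOptionsDemo {
    public static void main(String[] args) {
        ToolOptionsStore store = new ToolOptionsStore();
        store.set("FREESTYLE", "URL", "http://old.example.org");      // placeholder value
        SessionOptions session = new SessionOptions(store, "FREESTYLE");
        store.set("FREESTYLE", "URL", "http://new.example.org");      // option changed later

        System.out.println(store.read("FREESTYLE", "URL"));  // current value, seen immediately
        System.out.println(session.get("URL"));              // value cached at session creation
    }
}
```

In this sketch the existing session keeps the old value until a new `SessionOptions` is created, which mirrors the "user must close window" case on the slide.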

caDSR Freestyle Search - Architecture

• SQL script updates/sets tool option values
  – Updates limited to FREESTYLE tool name
• SQL may check database schema during deployment
  – E.g. when a new column is added to a table/view, a SELECT using the column name will throw an error if the database is not updated before deploying the tool
• SQL may *never* alter schema
• SQL may perform data migration
• Must be coordinated and negotiated with caDSR database deployment scripts
• Index updates write current timestamp on completion

caDSR Freestyle Search - Implementation

• Project Structure
  – Conf
    Configuration files, e.g. XML, which require value substitution during build and deployment
  – Db-sql
    Scripts to correct errors in index tables
  – Doc
    Patterned after phases in the development lifecycle, with the addition of “Administration” for all documentation specific to NCI policies and processes and not directly pertinent to the product features
    • Administration
    • Construction
    • Elaboration
    • Inception
    • Transition
  – Lib
    JAR files needed for building the project *but* not included in the deployment, e.g. ojdbc14.jar is deployed on JBoss and not packaged in the project WAR, but must be present to compile and build the WAR. This allows for the separation of the build machine and the deployment target machine.

caDSR Freestyle Search - Implementation

• Project Structure
  – Scripts
    Console scripts to update index tables
  – Src
    Java source, more details follow
  – WebRoot
    The deployed freestyle.war content
    • Css
    • Html
    • Images
    • Js
    • Jsp
    • Meta-inf
    • Web-inf
      – Lib
      – Tld

caDSR Freestyle Search - Implementation

• Packages
  – gov.nih.nci.cadsr.freestylesearch.test
    Automated tests
  – gov.nih.nci.cadsr.freestylesearch.tool
    Main business logic
  – gov.nih.nci.cadsr.freestylesearch.ui
    Web Browser UI using Struts
  – gov.nih.nci.cadsr.freestylesearch.utl
    Utility features, e.g. Search, results object types, etc.
• Search entry utl/Search.java
• Index table
  – Update entry point utl/Seed.java
  – Configuration conf/template.seed.xml

caDSR Freestyle Search - Implementation

• Logging
  – freestylesearch_log.txt
    JBoss messages from gov.nih.nci.cadsr.freestylesearch.*
  – Server.log
    JBoss messages from 3rd party packages, e.g. Struts
  – Seed_log.txt
    Messages from the update to the index tables

caDSR Freestyle Search - Dependencies

• caDSR API
  – The search results are returned in Freestyle-defined class objects or in AdministeredItem-derived class objects, per the search method used.
• Oracle 10g
  – The weight algorithm relies on calculations performed in SQL. This is necessary to avoid sending large amounts of data to the web server for weight calculations.

caDSR Freestyle Search - Futures

• Upgrade the caDSR API as needed
• Research use of Lucene
• Add “sounds like” matching
• Add singular/plural matching
• Add wildcard support
• Add Concept matching
• Add selection of indirect Admin Item type, e.g. return all DE where DEC is …
• Improve performance (possibly define database indexes on index table columns)
• Add weight calculation customizations, e.g. matches in long_name should be 2x all other columns
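One way to picture the last item, per-column weight customization, is a multiplier table applied to match counts. The Java sketch below is an assumption-laden illustration only: the slides state that weights are actually calculated in SQL, and `ColumnWeightSketch`, `setMultiplier`, `weigh`, and the `definition` column name are hypothetical. Only `long_name` and the 2x factor come from the slide.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration of per-column weight multipliers; not the tool's SQL-based algorithm. */
class ColumnWeightSketch {
    // Column name -> multiplier. Columns without an entry count once (factor 1).
    private final Map<String, Integer> multipliers = new HashMap<String, Integer>();

    void setMultiplier(String column, int factor) {
        multipliers.put(column, factor);
    }

    /** Sums match counts per column, scaling each by its configured multiplier. */
    int weigh(Map<String, Integer> matchesPerColumn) {
        int total = 0;
        for (Map.Entry<String, Integer> e : matchesPerColumn.entrySet()) {
            Integer factor = multipliers.get(e.getKey());
            total += e.getValue() * (factor == null ? 1 : factor);
        }
        return total;
    }

    public static void main(String[] args) {
        ColumnWeightSketch weights = new ColumnWeightSketch();
        weights.setMultiplier("long_name", 2); // the 2x example from the slide

        Map<String, Integer> matches = new HashMap<String, Integer>();
        matches.put("long_name", 1);   // one match in long_name
        matches.put("definition", 3);  // hypothetical second column with three matches
        System.out.println(weights.weigh(matches)); // 1*2 + 3*1 = 5
    }
}
```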