Oracle Life Sciences User Group Meeting – Reston, VA 2004

Having a BLAST Data Mining in Oracle 10g:
Implementing A Bioinformatics Target Database

John Burke, Ph.D.
UCB Research, Inc.

Preview

• UCB Discovery Research
• Designing the Target Database
• Building the Target Database
• Looking Forward

UCB Discovery Research

UCB Pharma

Discovery Research

[Figure: diagram labeled Structure, Chemistry, and Biology, with a chemical structure drawing (not recoverable)]

UCB Pharma Discovery Research

Discovery Research Sites

• Lille
• Cambridge
• Braine-l’Alleud
• ?

UCB Pharma Discovery Research

Bioinformatics, Proteomics, and Genomics

UCB Pharma Discovery Research

Bioinformatics, Proteomics, and Genomics

[Figure: diagram of tools and data sources. Labels: Protein db; LS-Graph; Mascot; Protein Prospector; Biotools; Swiss-Prot; GenBank; ...; MALDI-TOF; Q-TOF; SIMS + ProteinMine; SAN; Custom on Oracle 10g; GeneXpress; Spotfire; Sequencher and Omiga; GCG and Seqweb; Human genome browser (UCSC); Unigene; TIGR; Proteome PSD; Proteinscape]

Designing the Target Database

Designing the Target Database

General Requirements

Purpose: to store and manage target discovery research information efficiently and effectively

Scope: corporate, global, multi-project, multi-user

Content: gene and protein targets and ancillary information

Functionality: BLAST search, Web access, links to other databases and applications

Designing the Target Database

Text Data (searchable fields)
• Gene name
• EST selected
• Source of identification
• cDNA IMAGE clone
• UniGene Hs.#
• Transcript size
• Full-length cDNA clone name
• Reading Frame
• cDNA clone sequences
• ORF nucleotide number
• ORF aa Predicted Size
• Protein homology
• Protein Sequence
• Note
• Protein Function
• Mouse KO
• Key Literature

Image Data
• Northern Hybridization image
• Western Hybridization
• MC ENorthern
• Tissue ENorthern
• QPCR
• 3-D Structure
• Small molecule hits (link to compound DB?)
• Clone alignment

Building the Target Database

Data Model

Entities and attributes (# = unique identifier, * = mandatory, o = optional):

THERAPEUTIC_AREA: * AREA_NAME, * FOCUS_GROUP, * PROJECTID
PROJECT: # PROJECTID, * PROJECT_MANAGER, * PROJECT_NAME, * REVIEW_DATE, * STATUS
EXPERIMENT: # EXPID, * RESEARCHER, o CHIP, o COMMENT, o NOTEBOOB_REF, o SPECIES
LITERATURE: # LITID, o DATE_PUBLISHED, o JOURNAL, o LIT_AUTHOR, o LIT_TITLE, o URL
CELL: # CELLID, * CELL_NAME, o SPECIES
COMPOUND: # COMPOUNDID, * COMMENT, * COMPOUND_NAME, * UCB_NUMBER
SOURCE: # SOURCEID, o CELLID, o EXPID, o LITID, o MOUSE_KO
GENEALIAS: # GENE_NAME, # GENE_SYMBOL
IMAGE: # IMAGEID, * COMMENT, * GENE_SYMBOL, * PICTURE, * TYPE
CDNA: # CDNAID, * FULL_LENGTH_CDNA, * READING_FRAME, * SEQUENCE, o ORF_NT_NUMBER, o ORF_PREDICTED_SIZE
PROTEIN: # PROTEINID, * GENE_SYMBOL, * PROTEIN_FUNCTION, * PROTEIN_HOMOLOGY, * SEQUENCE, * SOURCEID, o COMPONENT, o COMPOUNDID
GENE: # GID, * EST_SELECTED, * GENE_SYMBOL, * SOURCEID, * UNIGENE_HS_NO, o ASSAYID, o PROTEINID

Relationship labels in the diagram (endpoints not recoverable): involved with, is a, binds with, is from, contains, has, represents, expressed by, expresses.

Designing the Target Database

Typical Queries

• Find all targets similar to this protein with size x in gate y or therapeutic area z.

• Find all targets with a specified (or unknown) function.

• Find all targets scheduled to be reviewed on a specified date.

• Find all projects and targets managed by a given person.

• Find all targets from Affy study x, or literature search, cell line y or species z.

Designing the Target Database

Critical Factors in Choosing Oracle 10g

Oracle already a UCB standard

Confidence in Oracle product and support

Smaller resource requirement

Shorter development time

Inclusion of BLAST in database
• No need to build interface between DB and BLAST
• No need to move data from DB to BLAST
• Ability to execute other queries combined with BLAST

Designing the Target Database

Core NCBI BLAST Subroutines

Subroutine  Description
blastp      Compares an amino acid query sequence against a protein sequence database.
blastn      Compares a nucleotide query sequence against a nucleotide sequence database.
blastx      Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn     Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx     Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
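The choice among the five subroutines follows from the type of the query sequence and the type of the target database. The Java sketch below encodes that decision as a small lookup. The class, enum, and method names are hypothetical and are not part of the presentation or of any Oracle API.

```java
/** Illustrative only: picks a BLAST program name from the sequence types. */
class BlastProgramSketch {

    /** Kind of sequence held by the query or by the target database. */
    enum SeqType { NUCLEOTIDE, PROTEIN }

    /**
     * Returns the name of the BLAST subroutine that matches the given types.
     * compareTranslations only matters for nucleotide vs. nucleotide searches:
     * true selects tblastx (six-frame translations on both sides),
     * false selects blastn (direct nucleotide comparison).
     */
    static String choose(SeqType query, SeqType database, boolean compareTranslations) {
        if (query == SeqType.PROTEIN && database == SeqType.PROTEIN) {
            return "blastp";
        }
        if (query == SeqType.NUCLEOTIDE && database == SeqType.PROTEIN) {
            return "blastx";   // query is translated in all reading frames
        }
        if (query == SeqType.PROTEIN && database == SeqType.NUCLEOTIDE) {
            return "tblastn";  // database is translated in all reading frames
        }
        // Remaining case: nucleotide query against a nucleotide database.
        return compareTranslations ? "tblastx" : "blastn";
    }

    public static void main(String[] args) {
        System.out.println(choose(SeqType.NUCLEOTIDE, SeqType.PROTEIN, false));
    }
}
```

For the target database described here, a cDNA clone sequence searched against stored nucleotide sequences would fall in the last branch.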

Designing the Target Database

System Architecture

[Figure: system architecture diagram. Labels: Application Server 10G; Web Client (three shown); OS: Windows XP, Platform: HP Workstation; Oracle Database 10G, OS: Solaris 8, Platform: Sun Enterprise 250]

Building the Target Database

Building the Target Database

Oracle System Components Installed

10g Database
• Data Mining Option

10g JDeveloper

10gAS Infrastructure
• Infrastructure database
• OracleAS Identity Management components
• OracleAS Metadata Repository

10gAS Middle Tier
• J2EE and Web Cache
• Portal

Building the Target Database

N-tiered Application Architecture

Client Tier
• Web Browser

Application Server Tier
• JSP Pages
• Jakarta Struts Framework
• BC4J
• Java Beans
• Portal

EIS Tier
• Oracle 10g Database
• BLAST Data Mining

Building the Target Database

JSP Model 2 Architecture – MVC Pattern

[Figure: JSP Model 2 diagram. Components: Web Browser; Web Container (Application Server); Servlet (Controller); JSP (View); Java Beans (Model); Oracle 10g Database (Database Server). Arrow labels: User Action, Response, Redirect, Instantiates, Data]
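The Model 2 flow named in the diagram (a controller receives the user action, instantiates the model beans, and hands off to a view that produces the response) can be sketched in plain Java. This is a simplified stand-in and not the application's code: it avoids the servlet and Struts APIs so that it runs on its own, and every class name, the action name, and the sample gene symbols are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: the three MVC roles without any servlet container. */
class Model2Sketch {

    /** Model: a plain bean carrying the data the view will render. */
    static class TargetListBean {
        private final List<String> geneSymbols;
        TargetListBean(List<String> geneSymbols) { this.geneSymbols = geneSymbols; }
        List<String> getGeneSymbols() { return geneSymbols; }
    }

    /** View: stands in for a JSP page; it only formats what the bean holds. */
    static class TargetListView {
        String render(TargetListBean bean) {
            StringBuilder html = new StringBuilder("<ul>");
            for (String symbol : bean.getGeneSymbols()) {
                html.append("<li>").append(symbol).append("</li>");
            }
            return html.append("</ul>").toString();
        }
    }

    /** Controller: stands in for the servlet; maps a user action to model and view. */
    static class TargetController {
        private final TargetListView view = new TargetListView();

        String handle(String action) {
            if (!"listTargets".equals(action)) {
                return "<p>Unknown action</p>";
            }
            // A real controller would load this data from the database;
            // the values here are hard-coded placeholders.
            TargetListBean bean = new TargetListBean(Arrays.asList("GENE_A", "GENE_B"));
            return view.render(bean);
        }
    }

    public static void main(String[] args) {
        System.out.println(new TargetController().handle("listTargets"));
    }
}
```

Keeping the view free of data access is the point of the pattern: the page only formats what the bean already holds.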

Building the Target Database

Page Flow

Building the Target Database

Classes - Jakarta Struts Framework

Building the Target Database

An Issue with SQL in Java

Nested IN-Clause Statement failed in Java.

    OraclePreparedStatement pstmt = (OraclePreparedStatement)
        conn.prepareStatement(
            "Select genesymbol from proteins where proteinid " +
            " IN(Select proteinid from projects_proteins where project_projectid " +
            " IN(Select projectid from projects where status LIKE :1))");

Identical SELECT statement worked in SQL*Plus.

Equivalent statement implemented as Stored Procedure.

Building the Target Database

An Issue with SQL in Java

Equivalent Statement as Stored Procedure

    SELECT genesymbol
      FROM proteins, projects, projects_proteins, therapeutic_areas
     WHERE PROTEINS.PROTEINID = PROJECTS_PROTEINS.PROTEIN_PROTEINID
       AND PROJECTS_PROTEINS.PROJECT_PROJECTID = PROJECTS.PROJECTID
       AND PROJECTS.PROJECTID = THERAPEUTIC_AREAS.PROJECTID
       AND (PROJECTS.status = query OR query IS NULL)
       AND (THERAPEUTIC_AREAS.AREA_NAME = areaName OR areaName IS NULL);
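The two conditions of the form (column = parameter OR parameter IS NULL) make each filter optional: passing NULL for query or areaName switches that condition off. The Java sketch below reproduces the same logic on in-memory rows so the behavior can be checked without a database. The Row class, the sample rows, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: the optional-filter idiom applied to in-memory rows. */
class OptionalFilterSketch {

    /** One joined row: a gene symbol with its project status and therapeutic area. */
    static final class Row {
        final String geneSymbol;
        final String status;
        final String areaName;
        Row(String geneSymbol, String status, String areaName) {
            this.geneSymbol = geneSymbol;
            this.status = status;
            this.areaName = areaName;
        }
    }

    /** Mirrors (column = param OR param IS NULL): a null parameter disables the filter. */
    static boolean matches(String columnValue, String param) {
        return param == null || param.equals(columnValue);
    }

    /** Returns the gene symbols of the rows that pass both optional filters. */
    static List<String> findGeneSymbols(List<Row> rows, String query, String areaName) {
        List<String> result = new ArrayList<>();
        for (Row row : rows) {
            if (matches(row.status, query) && matches(row.areaName, areaName)) {
                result.add(row.geneSymbol);
            }
        }
        return result;
    }

    /** Made-up sample data, for demonstration only. */
    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row("GENE_A", "ACTIVE", "AREA_X"),
            new Row("GENE_B", "CLOSED", "AREA_X"),
            new Row("GENE_C", "ACTIVE", "AREA_Y"));
    }

    public static void main(String[] args) {
        // Status filter only; the area filter is disabled by passing null.
        System.out.println(findGeneSymbols(sampleRows(), "ACTIVE", null));
    }
}
```

One procedure can therefore serve several of the search forms, at the cost of a predicate the optimizer may handle less well than a dedicated statement.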

Building the Target Database

JSP interacts with Database via Stored Procedures

Use of stored procedures:

• Centralizes SQL, facilitating reuse
• Allows the DBA to tune SQL statements
• Leverages Oracle’s dependency tracking mechanism
• Provides greater security, since the JSP user is unable to directly modify base tables
• Provides precompiled code
• Offers better performance: stored procedures load once into the shared pool and remain there unless they become paged out.

Building the Target Database

BLASTN Stored Procedure

    PROCEDURE "BLASTNTARGETS" IS
      T_SEQ_ID blastn.T_SEQ_ID%TYPE;
      SCORE    blastn.SCORE%TYPE;
      EXPECT   blastn.EXPECT%TYPE;

      -- Using the default parameters in BLAST
      CURSOR blastn_cursor IS
        SELECT *
          FROM TABLE(
                 BLASTN_MATCH(
                   (SELECT seq_data FROM targets),
                   CURSOR(SELECT genesymbol, clonesequence
                            FROM cdnas, genes
                           WHERE genes.cdnaid = cdnas.cdnaid))) t
         WHERE t.score > 25;
    BEGIN
      OPEN blastn_cursor;
      -- delete the rows in the blastn table
      DELETE FROM BLASTN;
      LOOP
        FETCH blastn_cursor INTO T_SEQ_ID, SCORE, EXPECT;
        EXIT WHEN blastn_cursor%NOTFOUND;
        INSERT INTO BLASTN VALUES (T_SEQ_ID, SCORE, EXPECT);
      END LOOP;
      CLOSE blastn_cursor;
    END blastntargets;
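From the application tier, a procedure like this would typically be invoked through JDBC and its output read back from the BLASTN table. The sketch below shows one way to do that with the standard java.sql API. It assumes that BLASTNTARGETS is a standalone procedure visible to the connected schema, and that the BLASTN columns are named T_SEQ_ID, SCORE, and EXPECT, as the %TYPE declarations suggest. The class and method names are hypothetical, and the code has not been run against the original system.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: runs the BLASTN procedure and reads its result table. */
class BlastnClientSketch {

    // JDBC escape syntax for a procedure call without arguments.
    static final String CALL_SQL = "{call BLASTNTARGETS}";
    // Column names are assumed from the %TYPE declarations in the procedure.
    static final String RESULTS_SQL =
        "SELECT T_SEQ_ID, SCORE, EXPECT FROM BLASTN ORDER BY SCORE DESC";

    /** One row of the BLASTN result table. */
    static final class Hit {
        final String seqId;
        final double score;
        final double expect;
        Hit(String seqId, double score, double expect) {
            this.seqId = seqId;
            this.score = score;
            this.expect = expect;
        }
    }

    /** Executes the procedure, then returns the stored hits, best score first. */
    static List<Hit> runBlastn(Connection conn) throws SQLException {
        try (CallableStatement call = conn.prepareCall(CALL_SQL)) {
            call.execute();
        }
        List<Hit> hits = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(RESULTS_SQL)) {
            while (rs.next()) {
                hits.add(new Hit(rs.getString("T_SEQ_ID"),
                                 rs.getDouble("SCORE"),
                                 rs.getDouble("EXPECT")));
            }
        }
        return hits;
    }
}
```

Because the procedure empties and refills a single shared table, two users running it at the same time could overwrite each other's results; a per-session or temporary table would avoid that.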

Building the Target Database

An Issue with 10g AS

Attempts to deploy application to AS gave server error.
Identifying proper expertise and mode of resolution proved difficult.
Teamwork ultimately solved problem.
• Oracle Life Sciences
• Oracle Customer Service
• OLSUG membership
• Oracle Consulting Practice

Solution SIMPLE, but of course NOT OBVIOUS

Building the Target Database

An Issue with 10g AS

Building the Target Database

Request Page

Building the Target Database

BLAST Query Page

Building the Target Database

BLAST Query Page

Building the Target Database

BLAST Result Page

Building the Target Database

Request Page

Building the Target Database

Query Page

Building the Target Database

Query Page

Building the Target Database

Query Result Page

Looking Forward

Looking Forward

Short Term

Additional Features and Improvements
• Data input page
• Sexy new name for Target Database
• Integrated BLAST and query
• Report pages
• Integration with other systems

Long Term

• Bioinformatics Portal
• Integrated Knowledge Base

UCB Team

MIS and Research

Prasoon Kejriwal, Cambridge

David Wei, Cambridge

Bob Johnson, Cambridge

Didier Generet, Braine

Didier Chalon, Braine

Karl Nocka, Cambridge

Bob Coopersmith, Cambridge

Zhidong Zhang, Cambridge

Rich Fisher, Cambridge

Pierre Chatelain, Braine

Special Thanks

Prasoon Kejriwal
Charlie Berger, Oracle
Susie Stephens, Oracle
Dev Nayak, Oracle
Shiby Thomas, Oracle
Shaloo Anand, Oracle
Mahendra Navarange, MRC Clinical Sciences