Combining Data Integration and Visualization with Spotfire Tools Benjamin A. Rolfe


discoveryHub Overview

discoveryHub™: What It Is

• Data Integration Platform
  – Flexible Query Engine
  – Powerfully Simple Data Model
  – Mediation Layers
  – Access Layers

discoveryHub™: What It Is Not

• NOT a database!
  – Works with RDBMS, ODBMS, anything…
• NOT the replacement for Oracle, LIMS, or…
  – A complement to acquisition and storage systems
• NOT one size fits all, all you’ll ever need
  – Because there is no such thing
• NOT any part of a bicycle
  – “Hub” as in that from which spokes reach out

Query Engine

• Flexible real-time data access
• Integrate disparate data sources
• Automate integration and transformation

Disparate Information Sources

• Distributed and heterogeneous
• Number of interesting sources very large
• Sources managed independently
• Rapid change
• Requires a dynamic solution

Complex Data Handling

• Nested collections and complex objects
  – Enables dealing inherently with heterogeneous, hierarchical, structured and unstructured data.
• Flexible data model: manage complex things simply.
• Model the problem as it appears, naturally
  – No more complexity than is needed
• Functional query system

Simplified Data Integration

• Model complex nested collections naturally
• Drill into heterogeneous, disparate nested structures directly

discoveryHub™ Context

[Diagram: Applications; discoveryHub™; Universe of Data Sources]

APIs

• Programmatic access to the dH system
  – Java, Enterprise Java
  – Dot.NET (C#, J#, …)
  – DHTML/JavaScript
  – CGI
  – Perl
• Easily extend applications with data integration


Links

• Discovery Hub marketing material
  – http://www1.amershambiosciences.com/aptrix/upp01077.nsf/Content/scierra_discoveryhub_overview
  – Or from http://www.amershambiosciences.com click Scierra discoveryHub
• Discovery Hub support page
  – Log into http://www.amershambiosciences.com/scierra
  – The discoveryHub pane should be at the bottom left

Integration with Applications

API Overview

• Consistent architecture
  – Create/manage connection
  – Execute server-side processes
  – Get and use results

Java, Enterprise Java; Dot.NET (C#, J#, …); DHTML/JavaScript; Perl

General Example: Java

• Define connection
• Get the connection and connect.
• Execute commands
• Do something with results

// Define connection
String connectionString = "type=SOCKET server=technet.geneticXchange.com";
Connection myConnection = null;

// Get the connection and connect
Properties p = ConnectionFactory.parseProperties(connectionString, false);
myConnection = ConnectionFactory.create(p);
myConnection.connect();

// Execute commands
String cmdToRun = "select (#uid: x.uid, #feature: x.feature) from na-get-seqfeat-by-uid(12354);";
String results = myConnection.executeAndReadRaw(cmdToRun);

// Disconnect once the results are in hand
myConnection.disconnect();

Simple Java Example

import k1connection.*;
import java.util.Properties;

/**
 * This is a very simple sample class to demonstrate use of the k1connection
 * package.
 */
public class SampleClient {

    public SampleClient(String connectionString) {
        // Default: connection string for socket connection to dev server
        if (connectionString == null) {
            connectionString = "type=SOCKET server=technet.geneticXchange.com";
        }

        Connection myConnection = null;
        try {
            // Get the connection type from the command and build the rest of the
            // options into a Properties set. Get the connection and connect.
            // Display information about the connection.
            Properties p = ConnectionFactory.parseProperties(connectionString, false);
            myConnection = ConnectionFactory.create(p);
            myConnection.connect();
            System.out.println(myConnection.getConnectionInfo());
        } catch (ConnectionException e) {
            myConnection = null;
            System.out.println(e.getMessage());
            return;
        }

        try {
            // Run a simple command
            System.out.println(myConnection.executeAndReadRaw("{1,2,3,4,5};"));
        } catch (ConnectionException e) {
            System.out.println(e.getMessage());
            return;
        }

        try {
            // Disconnect
            myConnection.disconnect();
        } catch (ConnectionException e) {
            System.out.println(e.getMessage());
        }
    }

    public static void main(String[] args) {
        String connectionString = null;
        if (args.length > 0) {
            connectionString = args[0];
        }
        SampleClient it = new SampleClient(connectionString);
    }
}

JavaScript/DHTML API

• Define connection
• Make connection
• Run sSQL
• Handle results

// Define and make the connection
myConnection = new dHubConnection(params);
myConnection.connect(constring);

// Run a server-side sSQL script, results as XML
xmlstring = myConnection.executeScriptXML(scriptName, args);

// Handle results
htmlstring = myConnection.formatXML(xmlstring);
dhresult = new dhResultXMLProcessor();
dhresult.makeRecs(xmlstring, null);
uid = dhresult.getMember("uid");
dhresult.nextRecord();
rec = dhresult.getCurRecord();

Simple JavaScript Example

<script language="JavaScript" src="dhAccessAPIOBJ.js"></script>
<script language="JavaScript" src="dhResultOBJ.js"></script>
<script language="JavaScript">
// create connection object and connect
conpar = new Object();
conpar.hostname = "localhost";
conpar.port = "80";
myConnection = new dHubConnection(conpar);
myConnection.connect("whatever");
scriptName = "ztest.ssql";
args = "bovine feces";

// execute dh server side script, results as XML
xmlstring = myConnection.executeScriptXML(scriptName, args);

// make HTML out of the XML for display.
htmlstring = myConnection.formatXML(xmlstring);

// pick some things out of the results
var dhresult = new dhResultXMLProcessor();
dhresult.makeRecs(xmlstring, null);
while (dhresult.nextRecord())
{
    uid = dhresult.getMember("uid");
    title = dhresult.getMember("title");
    acc = dhresult.getMember("accession");
    org = dhresult.getMember("organism");
    taxon = dhresult.getMember("taxon");
    doSomethingWith(uid, title, acc, org, taxon);
}
</script>

Creating Spotfire Tools

• Why Tools?
• Architectural Considerations
• Tool Development Process

Why a Tool?

• Flexibility
  – Interaction with user and discoveryHub
  – Does more than just “suck in data”
    • Present a structured view of everything, even bits that don’t fit a “flat” data model
    • Flatten the parts to insert into Spotfire

Architecture Considerations

Choose the right tool
Appropriate selection of implementation domain reduces complexity.

• Spotfire – user-facing interaction
• JavaScript – interaction with Spotfire data set, user inputs
• discoveryHub: data integration and transformation
  – Integrate and create user views and “flat” views for Spotfire.

Tool Development Process: Components of a discoveryHub Tool

[Diagram: Spotfire Decision Site Browser; Dialog Tool; discoveryHub script (sSQL); Client PC; discoveryHub server]

Dialog Tool Components

• Dialog runs in the DS Client Browser.
  – DHTML/JavaScript
• Parts
  – Dialog front end (user input)
  – Extract from SF dataset
  – Execute dH integration operation
  – Handle results
    • Display results
    • Add/Modify to SF dataset

discoveryHub Server-side

• Create the integrating query
  – Do the “hard part” of access to the world
  – Transform and flatten as required
    • Pull into a view that is easy to deal with
    • Can create multiple views
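
To make the "transform and flatten" idea concrete, here is a minimal JavaScript sketch. It is only an illustration: the helper name `flattenRecord`, the dotted-path key convention, and the sample record are all assumptions made for this example, not part of discoveryHub or Spotfire. In the architecture described here, the flattening itself would normally be done server-side in sSQL.

```javascript
// Hypothetical helper (not a discoveryHub API): flatten a nested record into
// a single-level object whose keys are dotted paths,
// e.g. { kegg: { pathway: "x" } } becomes { "kegg.pathway": "x" }.
// Arrays are assumed to hold scalar values and are joined into one
// delimited string so that each record still maps to exactly one row.
function flattenRecord(record, prefix, out) {
  prefix = prefix || "";
  out = out || {};
  for (var key in record) {
    if (!Object.prototype.hasOwnProperty.call(record, key)) continue;
    var value = record[key];
    var path = prefix ? prefix + "." + key : key;
    if (Array.isArray(value)) {
      out[path] = value.join(";");
    } else if (value !== null && typeof value === "object") {
      flattenRecord(value, path, out); // recurse into nested objects
    } else {
      out[path] = value;
    }
  }
  return out;
}

// Made-up sample record, loosely modeled on the annotation fields in this talk.
var nested = {
  accession: "ACC0001",
  locuslink: { cytogenetic: "1p36", kegg: { pathway: ["p1", "p2"] } }
};
var flat = flattenRecord(nested);
```

Each key of `flat` could then become one column of a "flat" view, while the nested original stays available for a detailed display.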

Some Spotfire Tool Examples (the good stuff)

Example Tool: Gene Annotation

• Start with NCI gene-drug interaction data
• Keys of ID embedded in “Name” field
• Integrates information from several sources “live”
• Create simplified “flat” view for Spotfire from complex objects
• Creates (or updates) new columns in Spotfire
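
The slides do not show how the key is pulled out of the “Name” field, so the following JavaScript is only a sketch under stated assumptions. It assumes the accession looks like one or two uppercase letters followed by five or six digits (a common GenBank-style pattern); the function names and the sample Name values are hypothetical. The result is a whitespace-separated string, since the server-side script in this talk tokenizes its accession argument on whitespace.

```javascript
// Assumption: an accession is one or two uppercase letters followed by five
// or six digits. The real layout of the "Name" field is not shown in the
// slides, so adjust this pattern to the actual data.
var ACCESSION_PATTERN = /\b[A-Z]{1,2}\d{5,6}\b/g;

// Return every accession-like token found in a Name value (possibly none).
function extractAccessions(name) {
  return String(name).match(ACCESSION_PATTERN) || [];
}

// Collect unique accessions from many Name values, in first-seen order, and
// join them with spaces to form a single script argument.
function buildAccessionArg(names) {
  var seen = {};
  var result = [];
  for (var i = 0; i < names.length; i++) {
    var found = extractAccessions(names[i]);
    for (var j = 0; j < found.length; j++) {
      if (!seen[found[j]]) {
        seen[found[j]] = true;
        result.push(found[j]);
      }
    }
  }
  return result.join(" ");
}

// Made-up Name values for illustration only.
var arg = buildAccessionArg([
  "GENE_A 5':AA111111 3':AA222222",
  "GENE_B 3':W12345",
  "no accession here"
]);
```

A string like `arg` could then be passed as the `args` value of `executeScriptXML`, as in the JavaScript example earlier in the deck.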

discoveryHub Server Side Script

! Example: tom2.ssql
! Extract data from NCBI Unigene and LocusLink.
! start with accession, go to unigene to get a locus id.
! go to locus link with that. Collect bits of information along
! the way.
! ---------------------------------------------------
! discoveryHub example B. Rolfe, geneticXchange Inc.
! Sample code provided for education and evaluation.
! Use at your own responsibility.
!

set echo off;
! utils.ssql contains the getArgByName definition.
usessqlscript "utils.ssql";

! Get argument values. The accession list is a quoted
! string as passed in, so we use string-tokenize to break
! on whitespace, resulting in a list of accessions : we cast
! this to a set (l2s) for use in
! the select that follows.
create view astr as getArgByNum(1);
create view acclist as l2s(string-tokenize(" ",astr,0));

! full_view is the desired result; We get a Unigene ID for each accession,
! and pull locus-id by the Unigene IDs and finally go to locus-link.
create view full_view as
select (
    #accession: acc,
    #unigene: ugid.full-id,
    #locusid: llid,
    #chromopos: ll.cytogenetic,
    #kegg: ll.kegg.pathway,
    #locuslink: ll
)
from
    acclist as acc,
    webunigene-id-general(acc) as ugid,
    get-locus-from-unigene(#org: ugid.org, #cid: ugid.cid) llid,
    locuslink-by-locusid-2(num-stringify(llid)) as ll;

full_view;

Sample Tool

[Show the tool and code…]

Example Tool: Transcription Analysis

• Uses annotated data (Locus Link ID)
• Retrieves set of transcription factors by gene
• Retrieves set of genes by transcription factor
• Integrates multiple web queries to oPossum: http://sonoma.cmmt.ubc.ca/cgi-bin/POSSUM/possum/
• Transforms and presents simplified view

discoveryHub Server Side Script

! Gene search of oPossum for transcription factor analysis.
! Used by spotfire tool TZ2-ORTFA
usessqlscript "utils.ssql";
usessqlscript "fisher2a.ssql";
usessqlscript "tf.ssql";

! Input to the script – search parameters
create view species as getArgByNum(1);
create view idtype as getArgByNum(2);
create view phylum as getArgByNum(3);
create view idstr as getArgByNum(4);

! oPossum transcription factor search using gene IDs and search params input
create view a as fisher2a(#species:species, #idtype:idtype, #phylum:phylum, #ids:idstr);

! Extract gene lists by transcription factor for a Transcription factor
create view v1 as select set-head(tf(x.TargetGeneHitsURL)) from a x;

! Function we use to simplify select statement
create function f1 ss as select z.GeneID from ss z;

! Final view of Results of Search: genes associated with each TF
select (#TF: x.#TranscriptionFactor, #gids: f1(x.Genes)) from v1 x;

Sample Tool

[Show the tool and code…]

Example Tool: Column Sorting

• Purpose: Enhance Heat Map view
  – Automates a tedious point-and-click process
  – Accelerates finding focus on “interesting” data
• Rearranges Heat Map columns
  – Filters by column sparseness
  – Sorts the order in which columns appear
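
The filter-then-sort behavior described above can be sketched in JavaScript as follows. This is not the tool's actual code: the data set is assumed to be an array of row objects, "sparseness" is assumed to mean the fraction of empty cells in a column, and the names `sparseness` and `arrangeColumns` are hypothetical. No Spotfire API calls are shown, since the slides do not give them.

```javascript
// Fraction of empty cells (null, undefined, or "") in one column.
// An empty data set is treated as fully sparse.
function sparseness(rows, column) {
  if (rows.length === 0) return 1;
  var empty = 0;
  for (var i = 0; i < rows.length; i++) {
    var v = rows[i][column];
    if (v === null || v === undefined || v === "") empty++;
  }
  return empty / rows.length;
}

// Keep the columns whose sparseness is at or below the cutoff, then sort the
// survivors by name, or by a caller-supplied key derived from the name
// (for example, a part of the column name).
function arrangeColumns(rows, columns, cutoff, sortKey) {
  sortKey = sortKey || function (name) { return name; };
  return columns
    .filter(function (c) { return sparseness(rows, c) <= cutoff; })
    .sort(function (a, b) {
      var ka = sortKey(a), kb = sortKey(b);
      return ka < kb ? -1 : ka > kb ? 1 : 0;
    });
}

// Made-up sample data: exp_C is empty in two of three rows.
var rows = [
  { "exp_B": 1.2, "exp_A": 0.4, "exp_C": null },
  { "exp_B": 0.7, "exp_A": null, "exp_C": null },
  { "exp_B": 0.1, "exp_A": 0.9, "exp_C": 2.0 }
];
var ordered = arrangeColumns(rows, ["exp_B", "exp_A", "exp_C"], 0.5);
```

With a cutoff of 0.5, the mostly empty `exp_C` column is dropped and the remaining columns are returned in name order. Passing a `sortKey` function allows sorting on just a part of the column name.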

Sample Tool

[Show the tool and code…]

Thank You!

Contact: [email protected]

For Tool Demos
Benjamin Rolfe

05-Oct-2004

10/27/2004 discoveryHub and Spotfire

Starting Point: Sparse Data Set

He can specify the threshold and choose the sort method.

The first step is to filter sparse columns based on the cutoff value.

Then the column order is changed, sorted by column name or a part of it as selected.


Now he can see what’s interesting

Insert domain expertise here…

He now runs an external, proprietary application to cluster via a tool


Results created by discoveryHub


He used the query device to see compounds clustered around cluster 6 with 0.5 score


…the next tool does another external process via discoveryHub


A new data set grouped by structural similarity.


Stock Status dialog


• Start with NCI drug-gene interaction data
• Annotate data from multiple sources
  – GenBank, LocusLink, UniGene, etc. from NCBI
  – KEGG pathway
• Transcription Analysis
  – Uses oPossum live (multiple pages)
• More details

• Idea of sequential workflow via tools
  – Automate the tedious parts
  – Insert domain expertise at the right points

Find the 3’ Accession number embedded in the Name

Select some genes to annotate.

Shows him a detailed view. Note top level members.

New columns: “accession” and “locusid”, which we use in subsequent tools.

Locus-link tool – get complete locus link record and display.

• Pulled in multiple external sources automatically
• Insert domain expertise at the right points
• Reduce “cut and paste” time exponentially
• Enable workflows too tedious to do manually
• Open up the world from inside Decision Site

• This presentation has focused on Tools.
• The IIM and Import Agent are still useful
  – We have connections via both
  – Each has its use
• Use the right tool for the job.

Looks like a nail?
  – Yes: Hit it with a Hammer
  – No: Hit it with Something Else