Presentation for Lab-J of the Human Genetics Department at the Leiden University Medical Centre.
From Laboratory to e-Laboratory?
Introduction for ‘Lab-J’ of the LUMC Human Genetics Department
Marco Roos
Acknowledging the colleagues from BioSemantics, myGrid, OMII-UK, AID, and the LUMC BioInformatics Expertise Centre
Introducing
Me
Liaison biology/bioinformatics – informatics
Biologist and bioinformatician, e-(bio)science researcher
Coordinator, BioSemantics group Leiden
Human Genetics Department, Leiden University Medical Centre, and Informatics Institute, University of Amsterdam
Project or Area Liaison (PAL), OMII-UK
Member, BioAssist programme committee, NBIC
also about
You
First about
Me
My C.V. before e-Science
before 2003
• Molecular & Cellular biology (MSc)
– microscopy and image analysis of chromosome structure
– ‘minor’ computer science
• Image analysis methods to measure DNA content in bull sperm cells (civil service)
• Chromatin structure & function (PhD molecular cytology)
– F.I.S.H., microscopy, image analysis, statistics
– 3-D chromosome structure during cell cycle (no luck)
– DNA movement in Escherichia coli (success)
• Human Transcriptome Map (post-doc)
– Gene expression to human genome sequence
– Analysis of regions of increased gene expression
Motivation
Structure and function of DNA in the nucleus
Escherichia coli
Muntiacus muntjak
Why bioinformatics?
Lab-J suggests…
07/04/2023 BioAID 9
Bioinformatics
A typical bioinformatician
Bioinformatics
A biologist behind a computer who (just) learned Perl
/*
 * Determines ridges in the HTM expression table.
 * ridge.h is assumed to define TRUE and FALSE and to declare the functions below.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
#include "ridge.h"

/* Selects the rows of one chromosome, ordered by gene start; the result is returned through *htmtable. */
int selecthtm(PGconn *conn, char *htmtablename, char *chromname, PGresult **htmtable)
{
    char querystring[256];

    snprintf(querystring, sizeof querystring,
             "SELECT * FROM %s WHERE chrom = '%s' ORDER BY genstart",
             htmtablename, chromname);
    *htmtable = PQexec(conn, querystring);
    return validquery(*htmtable, querystring);
}

/*
 * Determines if mincount genes in a row, starting at row, are (part of) a ridge.
 * pre: htmtable is valid and sorted on genStart (ascending)
 * post:
 */
int is_ridge(PGresult *htmtable, int row, double exprthreshold, int mincount)
{
    if (mincount <= 0) return TRUE;
    if (row >= PQntuples(htmtable)) return FALSE;
    /* PQgetvalue returns text, so convert it before comparing with the threshold */
    if (atof(PQgetvalue(htmtable, row, PQfnumber(htmtable, "movmed39expr"))) < exprthreshold) {
        return FALSE;
    }
    return is_ridge(htmtable, row + 1, exprthreshold, mincount - 1);
}

int main(void)
{
    PGconn *conn;           /* holds database connection */
    char querystring[256];  /* query string */
    PGresult *result;
    int ok;

    conn = PQconnectdb("dbname=htm port=6400 user=mroos password=geheim");
    if (PQstatus(conn) == CONNECTION_BAD) {
        fprintf(stderr, "connection to database failed.\n");
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    }
    printf("Connection ok\n");

    snprintf(querystring, sizeof querystring, "SELECT * FROM chromosomes");
    printf("%s\n", querystring);
    result = PQexec(conn, querystring);
    ok = validquery(result, querystring);
    if (ok) printresults(result);
    PQclear(result);
    PQfinish(conn);
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}

/* Prints the row index and first column of at most ten rows. */
int printresults(PGresult *tuples)
{
    int i;

    for (i = 0; i < PQntuples(tuples) && i < 10; i++) {
        printf("%d, %s\n", i, PQgetvalue(tuples, i, 0));
    }
    return TRUE;
}

/* Returns TRUE if the query completed with status PGRES_TUPLES_OK. */
int validquery(PGresult *result, char *querystring)
{
    if (PQresultStatus(result) != PGRES_TUPLES_OK) {
        fprintf(stderr, "Query %s failed.\n", querystring);
        return FALSE;
    }
    return TRUE;
}
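The ridge test in this listing needs a live PostgreSQL database to run. As a minimal sketch, assuming the expression values have already been loaded into an array sorted by gene start, the same check can be tried without a database. The name is_ridge_array and its parameters are illustrative and not part of the original code, and a loop stands in for the recursion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, database-free variant of the ridge test.
 * Returns 1 if the mincount values starting at index row all reach the
 * threshold, and 0 otherwise (including when the array ends too early). */
int is_ridge_array(const double *expr, size_t n, size_t row,
                   double threshold, int mincount)
{
    assert(expr != NULL);
    while (mincount > 0) {
        if (row >= n) return 0;               /* ran out of genes */
        if (expr[row] < threshold) return 0;  /* expression too low */
        row++;
        mincount--;
    }
    return 1;
}
```

With made-up values {1.0, 5.0, 6.0, 7.0, 2.0} and a threshold of 4.0, three genes starting at index 1 form a ridge, while four genes starting at index 1 do not, because the last value falls below the threshold.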
Why e-science? What is wrong with bioinformatics?
Human geneticists think…
Why should a biologist be interested in e-science?
BioAssistants guessed…
• Involves computation
• Interpretation of results
• Biology isn’t that interesting
• Reduce reinvention of the wheel
• Current lack of standards
• Sharing results
• Reshaping biology
• Synergy between different sciences
• Emerging data-driven science
Why e-Science?
A needy biologist
Single tiny brain
Lots of data to deal with
Lots of methods and algorithms to try and combine
No computational superpowers
Lots of knowledge to deal with
1070 databases, Nucleic Acids Research Jan 2008 (96 in Jan 2001)
Proteomics, Genomics, Transcriptomics, Protein sequence prediction, Phenotypic studies, Phylogeny, Sequence analysis, Protein Structure prediction, Protein-protein interaction, Metabolomics, Model organism collections, Systems Biology, Epidemiology, etcetera …
All with a splendid interface… all different, of course
Traditional data integration in bioinformatics
Local database
Local database
The ‘spaghetti’ approach
Some of my observations
• Reinvention
– How many reannotation pipelines do you need?
– Little reuse of components
• Reproducibility
– Black boxes
– Emphasis not on clarity
– Can we understand bioinformatics as wet lab protocols?
• Focus on technicalities, not biological analysis
– Should bioinformaticians write ‘job submission’ scripts?
• Data graveyards
– Do we need >1000 databases?
– Can we understand our own data?
SOME EXAMPLES FROM THE FIELD OF E-SCIENCE
Enhancement 1: Workflows (Taverna workflow)
Enhancement 2: exploiting brains
Exploiting Brains By Web Services
source: http://biocatalogue.org (launched at ISMB2009)
>1000 annotated services, >3000 known to Taverna
Includes BioMart, R, text mining, KEGG, NCBI PubMed, Ensembl, etc.
Web Services run remotely
Exploiting more brains by sharing workflows
source: http://myExperiment.org
Social community web site for scientists
2300 registered users in two years
750 workflows
Bioinformatics and e-science
Single-purpose, single-person, black-box application
Customized experiments with reusable components
My component
Your component
My component
Your component
My component
What do we know of our data?
Sufficient?
• Query discoveries?
• Query across experiment?
• Fit biological modelling?
• Good basis for new experiments?
• Flexible enough?
Model-based data integration
Biological concepts (‘myModel’)
Data
Marshall et al., International Workshop on Knowledge Systems in Bioinformatics 2006
Post et al., Bioinformatics 2007
Biologist-readable model
Computer-readable model
Model-based data integration
Example: UCSC genome browser
partOf
Semantic Web (Linked Open Data)
Empower me with a ‘virtual brain’
My ws
Your ws
My ws
Your ws
My ws
* From P.J. Verschure, Journal of Cellular Biochemistry 2006, vol. 99(1), pg 23-34
Query
Retrieve documents from Medline
Extract proteins (Homo sapiens)
Calculate ranking scores
Create biological cross references
Convert to table (html)
Add documents (IDs) to semantic model
Add proteins to semantic model
Add scores to semantic model
Add cross references to semantic model
Add query to semantic model
Workflow and Semantic Web
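The workflow on this slide alternates analysis steps with steps that add their output to a semantic model. As a rough sketch of that idea, assuming the semantic model can be approximated by an in-memory list of subject-predicate-object triples, the code below records one workflow result and queries it back. Everything named here (model_add, model_count, record_protein_hit, the predicates "retrieved" and "mentions", and the identifiers in the usage note) is hypothetical. No Medline access or text mining is performed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TRIPLES 64
#define MAX_TERM 64

/* Hypothetical stand-in for a semantic model: a fixed-size list of triples. */
struct triple { char s[MAX_TERM], p[MAX_TERM], o[MAX_TERM]; };
struct model { struct triple t[MAX_TRIPLES]; int n; };

/* Appends one subject-predicate-object statement.
 * Returns 0 on success and -1 if the model is full. */
int model_add(struct model *m, const char *s, const char *p, const char *o)
{
    assert(m != NULL && s != NULL && p != NULL && o != NULL);
    if (m->n >= MAX_TRIPLES) return -1;
    snprintf(m->t[m->n].s, MAX_TERM, "%s", s);
    snprintf(m->t[m->n].p, MAX_TERM, "%s", p);
    snprintf(m->t[m->n].o, MAX_TERM, "%s", o);
    m->n++;
    return 0;
}

/* Counts the statements that match a pattern; NULL acts as a wildcard. */
int model_count(const struct model *m, const char *s, const char *p, const char *o)
{
    int i, hits = 0;

    for (i = 0; i < m->n; i++) {
        if (s && strcmp(m->t[i].s, s) != 0) continue;
        if (p && strcmp(m->t[i].p, p) != 0) continue;
        if (o && strcmp(m->t[i].o, o) != 0) continue;
        hits++;
    }
    return hits;
}

/* Records the outcome of two workflow steps for one document:
 * the query retrieved the document, and the document mentions a protein. */
int record_protein_hit(struct model *m, const char *query,
                       const char *doc_id, const char *protein)
{
    if (model_add(m, query, "retrieved", doc_id) != 0) return -1;
    if (model_add(m, doc_id, "mentions", protein) != 0) return -1;
    return 0;
}
```

For example, after recording that a made-up document doc:A, retrieved for query:1, mentions protein:X, a call such as model_count(&m, NULL, "mentions", "protein:X") counts the documents that mention that protein. A real implementation would more likely use an RDF store. The array only illustrates how each step leaves queryable statements behind.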
Concept web from a user's point of view
e-Laboratories and e-Laboratory factories
e-Galaxy for NBIC
• Galaxy as front end
• Workflows & Web Services
• Grid enabled Taverna
• MOLGENIS
• Semantic/Concept Web
• myExperiment/BioCatalogue
• Scientific Research Objects
Vacancy! (software engineer)
e-Galaxy mock-up
Underlying workflow
Your Scientific Research Object
MOLGENIS, Convert, Import/Export, Research Objects, Store, Configure, Run
Related research and documents
Suggestions by semantic components
e-Science requirement: Reuse
E-Laboratory component
Research and development aims
• Automated support for hypothesis formation
– E.g. on epigenetic mechanisms
– Apply Workflow, Semantic Web, Concept Web
– Concept-based meta-analysis
– Automated triple creation from computational analysis
Research and development ambitions
• Co-develop e-Laboratories
– e-Galaxy
– epiGenius
– BioBanking
• Help BEC with support environment
• Concept Web services
– Web services
– E-Laboratory components
– Transparent creation of triples
– Personal semantic repositories
Liaison
• OMII-UK
– Manchester, Southampton, Edinburgh (ca. 30 engineers)
– Taverna, myExperiment, e-Labs
• W3C Health Care & Life Sciences Interest Group
– Semantic Web experts
– Linked Open Data
• AID
– University of Amsterdam
– e-Science experts
– Grid tools
• BioSemantics Rotterdam
– Text mining
– Concept profile meta-analysis
• NBIC
– BioAssist core software development
– Grid tools, Concept Web, e-Labs
• Concept Web
– Content, tools and infrastructure
• You?
• Bioinformatics Expertise Centre LUMC
– Statistical and computer science expertise
– Generic support
‘e’ for enhance, not enforce
Please help me to help you
Register for: http://snipurl.com/biosemanticsusers (http://www.myexperiment.org/groups/211)
Allows me to
• Give you preferential treatment
• Not spam everybody
• Keep you informed
• Ask your opinion (user driven development!)
Visit the BioSemantics web site: http://www.biosemantics.org/
Word of warning
Computer scientists are scientists too!
Need to publish
Score by papers, not by software
Addressed by OMII-UK and BioAssist
Compare
“How can I use it in the clinic?”
“How can I use it in the lab?”
Dissemination
• Come by for help or information
• Internal ‘mini-courses’?
• Send me suggestions!
• FYI: Course ‘Managing Life Science Information’ for PhD students, 2010
Key points
• Liaising: between technology contacts and you, the colleagues of Human Genetics.
• No obligations: try any new developments that we are involved in, with our help, but don't feel obliged.
• Help us help you: express your wishes and problems, try things, give feedback, and be patient sometimes.
Please join the biosemantics users group on myExperiment.org to help us communicate.
Thank you for your attention
An enhanced biologist
Lots of accessible data
Web Services, Workflows, and their creators available
Other people’s computational superpowers
Knowledge bases to query
Community brain power
Homo biologicus enhancis