Rothamsted Research: where knowledge grows
QTLNetMiner – Efficient search and prioritization of gene evidence networks
WheatIS Annual Meeting, San Diego
9 January 2015
Keywan Hassani-Pak
Many gene discovery routes exploit genetic or transcriptome data to produce markers for breeding or reverse genetics
Routes to candidate gene discovery
Gene Expression
QTL/GWAS
Candidate Genes
Prioritization / Validation
Markers for Breeding or GM
Traits
Gene Prioritization – a knowledge discovery challenge!
Orthologous Genes
Arabidopsis, Rice, Yeast etc.
Lists of candidate genes
Gene Expression
Evaluation of different types of evidence: expensive and labour-intensive
Literature
Phenotype, Gene Ontologies
Pathways
Omics data
Traits
Knowledge discovery process
Selection
Preprocessing
Transformation
Data mining
Interpretation / Evaluation
Data integration and transformation using Ondex
• Ondex parsers for many data sources to transform raw data into semantic networks
• Accession mapping or text mining to link concepts from different data sources
• Updating the data warehouse requires downloading new data and re-running the integration workflow
Ondex: free, open-source, developed in Java (www.ondex.org)
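The accession-mapping step in the bullets above can be sketched in a few lines. This is a minimal illustration, not Ondex's API: the record layout, the identifiers and the `map_by_accession` helper are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical records from two data sources; identifiers are illustrative only.
gene_source = [
    {"id": "gene:1", "type": "Gene", "accessions": {"UNIPROT:P00001"}},
    {"id": "gene:2", "type": "Gene", "accessions": {"UNIPROT:P00002"}},
]
protein_source = [
    {"id": "prot:A", "type": "Protein", "accessions": {"UNIPROT:P00001"}},
    {"id": "prot:B", "type": "Protein", "accessions": {"UNIPROT:P99999"}},
]


def map_by_accession(left, right, relation="encodes"):
    """Link concepts from two sources that share at least one accession."""
    # Index the right-hand source by accession for constant-time lookup.
    index = defaultdict(list)
    for concept in right:
        for accession in concept["accessions"]:
            index[accession].append(concept["id"])

    # Emit one labelled edge per shared accession (deduplicated via a set).
    edges = set()
    for concept in left:
        for accession in concept["accessions"]:
            for target in index.get(accession, []):
                edges.add((concept["id"], relation, target))
    return sorted(edges)


edges = map_by_accession(gene_source, protein_source)
print(edges)
```

Only `gene:1` and `prot:A` share an accession here, so a single edge links the two sources; concepts without a shared accession stay unconnected until another mapping method (for example text mining) links them.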
Building a Wheat Information Network through integration of publicly available datasets
Wheat Genes Homology/Domains Annotations
5A
5B
5D
TTG2
seed color
seed coat development
DNA-binding WRKY
WRKY1
PMID 19129166
Inferred from Mutant Phenotype
PMID: 15598800
GO
TO
encodes
text-mining
Mutations in TTG2 cause phenotypic defects in trichome
development and seed color pigmentation. PMID: 17766401
41% identity (Ensembl Compara)
QTLNetMiner – Mining large semantic networks for gene-trait discovery
Arabidopsis, Wheat, Poplar at Rothamsted
Barley in collaboration with IPK, Germany
Potato & Solanaceae in collaboration with INTA, Argentina
Animals in collaboration with Roslin Institute, UK
• Web: https://ondex.rothamsted.ac.uk/QTLNetMiner
• Code: https://github.com/KeywanHP/QTLNetMiner
QTLNetMiner search interface
Define a QTL region you are interested in.
Include a list of gene names and see if they are related to your keyword.
Let us suggest alternative search terms to improve your results.
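The region and gene-list inputs described above amount to a simple filter over gene coordinates. The sketch below assumes a hypothetical `Gene` record and made-up coordinates; it is not the QTLNetMiner implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chromosome: str
    start: int
    end: int


# Hypothetical gene coordinates, for illustration only.
GENES = [
    Gene("g1", "5A", 1000, 2000),
    Gene("g2", "5A", 5000, 6500),
    Gene("g3", "5B", 1200, 1800),
]


def genes_in_qtl(genes, chromosome, qtl_start, qtl_end, gene_names=None):
    """Return genes overlapping the QTL region, optionally limited to a user gene list."""
    wanted = set(gene_names) if gene_names is not None else None
    return [
        g for g in genes
        if g.chromosome == chromosome
        and g.start <= qtl_end      # gene begins before the region ends
        and g.end >= qtl_start      # gene ends after the region begins
        and (wanted is None or g.name in wanted)
    ]


hits = genes_in_qtl(GENES, "5A", 1500, 5200)
print([g.name for g in hits])
```

Overlap rather than strict containment is used so that genes straddling a QTL boundary are kept; that choice is an assumption of this sketch.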
QTLNetMiner – Network View
Taubert J, Hassani-Pak K, Castells-Brooke N and Rawlings C, Ondex Web: web-based visualization and exploration of heterogeneous biological networks. Bioinformatics (2013)
... zoom into regions of interest
TRAES_1AL_0404BC790
TRAES_1BL_1D865A8CC
TRAES_1DL_5BAB0B6BC
WRKY43
CML9
Calcium signalling
Mechanical stimulus response
Calcium ion detection
Stress tolerance
GO
GO
GO
TO
WRKY
Mutations of the AtCML9 gene also alter the expression of several stress-regulated genes,
suggesting that AtCML9 is involved in salt stress tolerance through its effects on the ABA-
mediated pathways.
Associating genes with trait terms through guilt by association in a labelled & directed multi-graph (Ondex network)
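Guilt by association in a labelled, directed multi-graph can be sketched as a bounded breadth-first search from a gene to any trait node. The edges below are an illustrative reading of the TTG2 example from the earlier slide; the node names, edge labels and `evidence_paths` helper are assumptions, not the actual Ondex network contents or API.

```python
from collections import defaultdict, deque

# Illustrative edges as (source, label, target) triples. A multi-graph allows
# several differently labelled edges between the same pair of nodes.
EDGES = [
    ("wheat_gene_5A", "ortholog", "TTG2"),
    ("TTG2", "published_in", "PMID:17766401"),
    ("PMID:17766401", "mentions", "seed color"),
    ("TTG2", "participates_in", "seed coat development"),
    ("TTG2", "encodes", "WRKY protein"),
]
NODE_TYPE = {
    "wheat_gene_5A": "Gene",
    "TTG2": "Gene",
    "PMID:17766401": "Publication",
    "seed color": "Trait",
    "seed coat development": "BiologicalProcess",
    "WRKY protein": "Protein",
}


def evidence_paths(start, target_type="Trait", max_hops=3):
    """Return paths (alternating node, edge label, node, ...) from start to target-type nodes."""
    adjacency = defaultdict(list)
    for source, label, target in EDGES:
        adjacency[source].append((label, target))

    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != start and NODE_TYPE.get(node) == target_type:
            paths.append(path)
            continue
        if len(path) // 2 >= max_hops:  # number of edges already on this path
            continue
        for label, neighbour in adjacency[node]:
            if neighbour not in path[::2]:  # avoid revisiting nodes on this path
                queue.append(path + [label, neighbour])
    return paths


paths = evidence_paths("wheat_gene_5A")
print(paths)
```

The wheat gene has no direct trait annotation in this toy network; it is associated with "seed color" only through its ortholog and a publication, which is the kind of indirect evidence the slide describes.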
QTLNetMiner – Semantic motif search
auxin
cytokinin
strigolactone
CCD
MAX
subapical shoots
axillary branching
shoot branching
hormone
?
Integrated knowledge network
User input (prior knowledge)
Gene
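One way to picture a semantic motif is as a typed path pattern: a start node type followed by (edge label, node type) steps. The sketch below matches such a pattern against a tiny graph. The nodes, edges, motif and `match_motif` helper are hypothetical placeholders, not the motifs QTLNetMiner actually uses.

```python
# Hypothetical typed nodes and labelled, directed edges.
NODE_TYPE = {
    "candidate_gene": "Gene",
    "MAX": "Gene",
    "PMID:placeholder": "Publication",
    "shoot branching": "Trait",
}
EDGES = [
    ("candidate_gene", "ortholog", "MAX"),
    ("MAX", "published_in", "PMID:placeholder"),
    ("PMID:placeholder", "mentions", "shoot branching"),
]

# A motif: start node type, then a sequence of (edge label, node type) steps.
MOTIF = (
    "Gene",
    [("ortholog", "Gene"), ("published_in", "Publication"), ("mentions", "Trait")],
)


def match_motif(start, motif):
    """Return all node paths from start that follow the motif step by step."""
    start_type, steps = motif
    if NODE_TYPE.get(start) != start_type:
        return []
    partial = [[start]]
    for label, node_type in steps:
        extended = []
        for path in partial:
            for source, edge_label, target in EDGES:
                if (
                    source == path[-1]
                    and edge_label == label
                    and NODE_TYPE.get(target) == node_type
                ):
                    extended.append(path + [target])
        partial = extended  # only paths that satisfied this step survive
    return partial


matches = match_motif("candidate_gene", MOTIF)
print(matches)
```

Restricting the search to predefined motifs keeps it from following every possible path in a large network, at the cost of missing evidence that does not fit any motif.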
• Scoring genes based on an information retrieval metric
that reflects how relevant a term is to a gene in a collection
• Developed a metric that takes into account
1. The amount of supporting evidence (tdf)
2. The specificity of evidence to a gene (IDFmean)
Candidate gene prioritisation
$\mathrm{Score}(t, X) = \mathrm{tdf}(t, X) \times \mathrm{IDF}_{\mathrm{mean}}(X)$
t: query terms
X: set of documents associated with a gene
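The slide gives the shape of the score but not the exact definitions of tdf and IDFmean, so the sketch below fills them in with explicit assumptions. Here tdf counts the documents linked to a gene that match at least one query term, and IDFmean is the mean, over those matching documents, of log(N / number of genes linked to the document), with N the total number of genes. The gene-to-document mapping is invented for illustration.

```python
import math

# Hypothetical mapping: gene -> {document id: document text}.
GENE_DOCS = {
    "geneA": {"d1": "late blight resistance locus", "d2": "LRR domain protein"},
    "geneB": {"d2": "LRR domain protein"},
    "geneC": {"d3": "seed coat development"},
    "geneD": {"d2": "LRR domain protein", "d4": "root growth"},
}


def score(query_terms, gene):
    """Score = tdf * IDFmean, under the assumed definitions stated above."""
    terms = [t.lower() for t in query_terms]
    matched = [
        doc_id
        for doc_id, text in GENE_DOCS[gene].items()
        if any(term in text.lower() for term in terms)
    ]
    if not matched:
        return 0.0

    tdf = len(matched)  # assumed: amount of supporting evidence
    n_genes = len(GENE_DOCS)
    idfs = []
    for doc_id in matched:
        genes_with_doc = sum(1 for docs in GENE_DOCS.values() if doc_id in docs)
        idfs.append(math.log(n_genes / genes_with_doc))  # assumed: evidence specificity
    idf_mean = sum(idfs) / len(idfs)
    return tdf * idf_mean


# Pipe-separated query terms, in the format of the slide's example query.
QUERY = "Phytophthora infestans|late blight resistance|response to pathogen|LRR".split("|")
ranking = sorted(GENE_DOCS, key=lambda g: score(QUERY, g), reverse=True)
print(ranking)
```

In this toy collection geneA ranks first because it has two matching documents, one of which is linked to no other gene; evidence shared by many genes (document `d2`) contributes little to the score.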
Gene ranking – Example
Query: Phytophthora infestans|late blight resistance|response to pathogen|LRR
Score: 5.72
Score: 2.71
• Compatible with iOS, Android and Microsoft mobile devices
  - Replace the Java applet network viewer with Cytoscape.js
  - Replace the Flash GViewer with KineticJS
• Develop a federated version (Solr, RDF, SPARQL) of QTLNetMiner instead of centralised data warehousing
• Tighter integration with gene expression and variation databases to improve gene ranking algorithm
Current and future development