An Integrative Approach for the Study of Sequence Variation Impact on Biological Processes, Diseases and Environmental Agents’ Risk
Sivakumar Gowrisankar, Amol S Deshmukh, Anil G Jegga and Bruce J Aronow
Department of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center and University of Cincinnati
The integration of genomic sequence analyses from multiple species and strains, together with protein interaction data and gene expression profiles that reflect specific biological states and processes, has opened many new avenues for understanding specific biological systems. Nevertheless, formidable challenges remain in improving the prediction, diagnosis, prognosis, and treatment of human diseases. Can we infer from large molecular datasets how different biological entities are organized and interact, and then predict the effect that genetic polymorphisms or sequence variations might have on interconnected biological processes? The integration of heterogeneous data and information is a key issue in functional genomics. An appropriate data model, with consistent methods for its integrated representation, analysis, and visualization, has the potential to pave the way for discovery-driven science, enhance hypothesis generation, and provide new focus for experimental validation and refinement. Thus, to represent the presence and impact of polymorphisms in the context of biological pathways, we have sought to unify our representation of molecular, biological, and environmental entities so that biological knowledge from experts and the biomedical literature can be assembled on a storyboard canvas. For example, the representation of a disease could consist of a biological process composed of one or more pathways, within which entities (gene products, complexes, and cellular and subcellular components) undergo one or more interactions and transitions to states associated with disease terms. We have begun developing a suite of applications on a common database structure that can represent biological processes using a host of publicly available data sources, including gene objects and biological ontologies, which in turn represent systematic abstractions of the biomedical literature and expert knowledge.
As part of this exercise, we have compiled all existing protein-protein interactions from “interactome”-rich databases (PreBIND, MINT, DIP and HPRD) and mined the biomedical literature for novel interactions not represented in these specialized databases. Our compiled interaction data comply with the standards set out by the Proteomics Standards Initiative (PSI), facilitating easy data exchange. As available annotations increase, the challenge is to integrate biological process representation in a way that increases our understanding rather than obscuring it in convoluted figures or excessive detail. A network visualizer not only provides a lucid means of summarizing existing biological knowledge about molecular behavior but also helps elucidate the potential implications of sequence variations for protein-protein interactions or the binding of specific transcription factors.
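As a rough illustration of what compiling interactions from several databases involves, the sketch below merges per-source lists of protein pairs into one deduplicated map that remembers which sources reported each pair. It is a minimal sketch under assumptions, not the XPrInt implementation: the function name `merge_interactions`, the in-memory input format, and the placeholder protein names are all hypothetical, and real PSI-formatted records carry far more detail.

```python
from collections import defaultdict


def merge_interactions(sources):
    """Merge per-database interaction lists into one deduplicated map.

    sources: dict mapping a database name to an iterable of
             (protein_a, protein_b) pairs.
    Returns a dict mapping each unordered, upper-cased pair to the
    sorted list of databases that report it.
    """
    merged = defaultdict(set)
    for db_name, pairs in sources.items():
        for a, b in pairs:
            # Treat A-B and B-A as the same interaction and ignore case.
            key = tuple(sorted((a.upper(), b.upper())))
            merged[key].add(db_name)
    return {pair: sorted(dbs) for pair, dbs in merged.items()}


# Placeholder records, invented for illustration only (hypothetical names).
example_sources = {
    "HPRD": [("PROT_A", "PROT_B"), ("PROT_C", "PROT_D")],
    "MINT": [("PROT_B", "PROT_A")],
    "DIP": [("prot_c", "PROT_D")],
}
merged = merge_interactions(example_sources)
for pair, dbs in sorted(merged.items()):
    print(pair, dbs)
```

Keying on the sorted pair keeps the same interaction from being counted twice when two sources list the partners in opposite order.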
Abstract
[Figure labels, anatomy ontology for the colon: Intestines; Hollow viscus; Large intestinal structure; Organ with organ cavity; Large Intestine; Colon structure; Region of large intestine; Colon]
A Systems Biology Integrative Approach
GKP-PathMaker
Future Directions
References & Support
1. XPrInt and PatholoGene: http://abstrainer.cchmc.org
2. UMLS Knowledge Source Server: http://umlsks.nlm.nih.gov
3. Open Biological Ontologies: http://obo.sourceforge.net
Support: NIEHS U01 ES11038 Mouse Centers Genomics Consortium
PatholoGene is a system under development that links biological entities, anatomy, pathways and diseases using the UMLS Semantic Network, NCBI OMIM, and MedLine abstract parsing with ICD10 disease terms and gene symbols. The Semantic Network, through its semantic types, provides a categorization of all UMLS Metathesaurus concepts. The links between the semantic types provide the structure for the Network and represent important relationships in the biomedical domain. The UMLS Metathesaurus contains information about biomedical concepts and terms from many controlled vocabularies and classifications used in patient records, bibliographic and full-text databases, and expert systems. As a test case, we illustrate the analysis of colon cancer as a function of anatomy, pathology, etiology and disease progression.
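The idea of semantic types categorizing concepts, with typed links between them, can be sketched as a small lookup. This is only an illustration under assumptions: the type names, relation names, and allowed links below are simplified stand-ins and are not copied from a UMLS release, and `link_is_allowed` is a hypothetical helper, not part of PatholoGene.

```python
# Concept -> semantic type. Simplified, illustrative labels (assumption).
SEMANTIC_TYPE = {
    "Colon": "Body Part",
    "Colorectal Neoplasms, Hereditary Nonpolyposis": "Neoplastic Process",
    "hMLH1": "Gene",
}

# Links permitted between semantic types (illustrative, not from UMLS).
ALLOWED_LINKS = {
    ("Body Part", "location_of", "Neoplastic Process"),
    ("Gene", "associated_with", "Neoplastic Process"),
}


def link_is_allowed(subject, relation, obj):
    """Return True if the semantic types of the two concepts permit the relation."""
    try:
        triple = (SEMANTIC_TYPE[subject], relation, SEMANTIC_TYPE[obj])
    except KeyError:
        # Unknown concept: no semantic type, so no link can be validated.
        return False
    return triple in ALLOWED_LINKS


hnpcc = "Colorectal Neoplasms, Hereditary Nonpolyposis"
print(link_is_allowed("Colon", "location_of", hnpcc))
print(link_is_allowed("hMLH1", "associated_with", hnpcc))
print(link_is_allowed(hnpcc, "location_of", "Colon"))
```

Checking links at the level of semantic types means a relationship extracted from text can be screened for plausibility before it is stored against individual concepts.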
PatholoGene
XPrInt: Extracting & Compiling Protein Interactions
[Figure labels, protein interaction sources: PreBIND; GeneRIF; HPRD; OMIM]
FANCG, NBS1, RB1, TP53, CDKN2A
TNF, IL5, TNFRSF14, IL12B, IL12A, IL8, IL1B, IL4R, LTB, RAG1, TNFRSF6, TNFRSF17, APOE, TNFRSF7, TNFRSF4, TNFRSF9, TNFRSF5, F3, LTA
NIEHS Candidate Genes’ Categorization Based on GO (Biological Process)
Are these functionally clustered proteins involved in a common biological network or interaction?
Co-citation in literature abstracts using gene/protein symbols and “interactome-specific” keywords
Does a SNP in one or more biological entities result in aberrations within a pathway and manifest as a disease, or contribute to increased susceptibility to disease or an altered response to therapeutic agents?
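The co-citation step mentioned above can be sketched as follows: count, for every pair of gene symbols, the abstracts that mention both symbols together with at least one interaction keyword. This is a minimal sketch under assumptions, not the authors' pipeline. The keyword stems, the function name `cocitation_counts`, the placeholder symbols, and the toy abstracts are all hypothetical, and simple pattern matching ignores synonyms and ambiguous symbols.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed "interactome-specific" keyword stems (hypothetical choice).
KEYWORDS = ("interact", "bind", "complex", "associate")


def cocitation_counts(abstracts, symbols, keywords=KEYWORDS):
    """Count abstracts in which two symbols co-occur with an interaction keyword."""
    keyword_re = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    # Whole-word, case-sensitive match for each gene/protein symbol.
    symbol_res = {s: re.compile(r"\b" + re.escape(s) + r"\b") for s in symbols}
    counts = Counter()
    for text in abstracts:
        if not keyword_re.search(text):
            continue  # no interaction keyword, so skip this abstract
        present = sorted(s for s, rx in symbol_res.items() if rx.search(text))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Toy text with placeholder symbols, invented for illustration only.
toy_abstracts = [
    "GENEA interacts with GENEB in vitro.",
    "GENEA and GENEB form a complex; GENEC is unaffected.",
    "Expression of GENEA and GENEC was measured.",
]
counts = cocitation_counts(toy_abstracts, ["GENEA", "GENEB", "GENEC"])
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts are candidates for curation rather than confirmed interactions, since co-mention alone does not establish a physical interaction.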
[Figure labels: Map; Molecule; Gene; BioMaterial; Publishable; GKP Object; Expert Curated; Ontologies]
Unified Representation of Disease States and Biological Processes using Clinical Phenotype, Molecular Signatures, and Genetic Attributes
[Figure labels: Analysis, Diagnosis and Prediction; Disease State A; Disease State B; Therapeutic Intervention; Disease Process Modeling Tool; Patient-Centered Clinical Observations; New Insights & Hypotheses; Pathways and Processes-Centered Biomedical Knowledge; Sample-Centered Genetic and Genomic Data; Biological Entities]
12 Siblings (UMLS – Concepts)
→Adenomatous Polyposis Coli
→Basal Cell Nevus Syndrome
→Colorectal Neoplasms, Hereditary Nonpolyposis
→Dysplastic Nevus Syndrome
→Exostoses, Multiple Hereditary
→Hamartoma Syndrome, Multiple
→Li-Fraumeni Syndrome
→Multiple Endocrine Neoplasia
→Nephroblastoma
→Neurofibromatoses
→Peutz-Jeghers Syndrome
→Sturge-Weber Syndrome
Inborn Genetic Diseases
Neoplasms
Hereditary NonPolyposis Colon Cancer
Hereditary Neoplastic Syndromes
HNPCC (hMSH2, hMLH1, hPMS1, hPMS2)
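Sibling concepts such as those listed above can be derived from a parent map: the siblings of a concept are the other children of its parents. The sketch below assumes, based only on the figure labels, that the listed syndromes share the parent "Hereditary Neoplastic Syndromes", which in turn sits under "Neoplasms" and "Inborn Genetic Diseases". The `siblings` helper and the small dictionary are hypothetical and cover only three of the twelve concepts.

```python
def siblings(concept, parents_of):
    """Return the other concepts that share at least one parent with `concept`."""
    sibs = set()
    for parent in parents_of.get(concept, ()):
        for other, other_parents in parents_of.items():
            if other != concept and parent in other_parents:
                sibs.add(other)
    return sorted(sibs)


# Child -> parents. Parent assignments are an assumption inferred from
# the figure labels, not taken from a UMLS release.
parents_of = {
    "Colorectal Neoplasms, Hereditary Nonpolyposis": {"Hereditary Neoplastic Syndromes"},
    "Adenomatous Polyposis Coli": {"Hereditary Neoplastic Syndromes"},
    "Li-Fraumeni Syndrome": {"Hereditary Neoplastic Syndromes"},
    "Hereditary Neoplastic Syndromes": {"Neoplasms", "Inborn Genetic Diseases"},
}
result = siblings("Colorectal Neoplasms, Hereditary Nonpolyposis", parents_of)
print(result)
```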
Anatomy Ontology
Disease Ontology
Protein-Protein Interactions
Variation (SNPs)
Protein Domains & 3D Structure
MedLine
Pathway Databases
Sequence Databases
Other Databases
Ontologies
Ontology Explorer
PathBuilder
GPB Integrated Annotation
Complex Builder
Gene Summary
Taxonomy
GO Clusterer
Gene Expression
PathMaker Canvas
Genomics Knowledge Platform Biological Object Model
Network Representation
Biological Pathways
Cognitive Processing (Researcher/Scientist Reasoning)
Biological Explanation Mechanistic Explanation
Novel Treatments
Normal Cellular Function Disease Processes
Biomedical Discovery Process
Biological Entities
Genotype
Environment
Etiologies? Treatment? Mechanisms? Signatures? Prevention?
Genome
Transcriptome
Proteome
Interactome
Metabolome
Physiome
Regulome Variome
Pathome
Pharmacogenome
PatholoGene Report: Extracting relationships between disease, anatomy and genes.