Next Generation Sequencing in Systems Biology
Lec 1
ALI KISHK
Index
• Why we need Systems Biology
• Genetic Dilemma
• Puzzle of Next generation sequencing
• Why we need modeling
• Type of biological models
• Roles of Networks
• Types of Biological networks
• Network vs Pathway vs Biomodels
• Primary Database vs Secondary Database
"Holy trinity of systems biology is biology, computational science and technology."
Lee Hood, Institute for Systems Biology
Systems biology
The apple tastes good !
Traditional biology
The hard texture of the apple does not fit in the salad!
Systems biology 101
What is Systems Biology?
• A "whole-istic" approach to understanding biology.
• It aims at a system-level understanding of biology: understanding biological systems as systems.
http://www.sysbio.de/info/background/WhatIs.shtml
Puzzle of Next generation sequencing
Types of Interactions Networks
Vidal, Cusick and Barabási, Cell 144, 2011.
In neuroscience
In pharmacology
In Ecological Genomics
Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing
Strategies to Build a Model
Chapter 2: Modeling Approaches in Systems Biology, Including Silicon Cell Models
Where Do Gene Lists Come From?
• Molecular profiling, e.g. mRNA, protein
– Identification: gene list
– Quantification: gene list + values
– Ranking, clustering (biostatistics)
• Interactions: protein interactions, microRNA targets, transcription factor binding sites (ChIP)
• Genetic screen, e.g. of a knock-out library
• Association studies (genome-wide)
– Single nucleotide polymorphisms (SNPs)
– Copy number variants (CNVs)
• Other examples?
What Do Gene Lists Mean?
• Biological system: complex, pathway, physical interactors
• Similar gene function, e.g. protein kinase
• Similar cell or tissue location
• Chromosomal location (linkage, CNVs)
Data
"One of the big problems in systems biology is separating signal from the noise in the data."
~ Lee Hood, Institute for Systems Biology
Before Analysis
• Normalization
• Background adjustment
• Quality control (garbage in, garbage out)
• Use statistics that will increase signal and reduce noise specifically for your experiment
• Gene list size
• Make sure your gene IDs are compatible with the software
Biological Questions
• Step 1: What do you want to accomplish with your list? (Hopefully part of the experiment design!)
– Summarize biological processes or other aspects of gene function
– Perform differential analysis: what pathways are different between samples?
– Find a controller for a process (TF, miRNA)
– Find new pathways or new pathway members
– Discover new gene function
– Correlate with a disease or phenotype (candidate gene prioritization)
– Find a drug
Biological Answers
• Pathway enrichment analysis: summarize and compare
• Network analysis: predict gene function, find new pathway members, identify functional modules (new pathways)
• Regulatory network analysis: find and analyze controllers
Pathway enrichment analysis
• Gene list from experiment: genes down-regulated in drug-sensitive brain cancer cell lines
• Pathway information: all genes known to be involved in neurotransmitter signaling
• Statistical test: are there more annotations in the gene list than expected?
• Hypothesis: drug sensitivity in brain cancer is related to reduced neurotransmitter signaling
• Test many pathways: p < 0.05?
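The statistical test on this slide is not named in the deck. One common choice, assumed here, is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), followed by a Benjamini-Hochberg correction because many pathways are tested at once. The sketch below uses only the Python standard library; the function names, pathway names, and all counts are made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn without
    replacement from n_universe genes, n_pathway of which are annotated
    to the pathway (one-sided hypergeometric test)."""
    max_overlap = min(n_pathway, n_list)
    total = comb(n_universe, n_list)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers (assumptions, not data from the lecture):
# 20000 annotated genes in total, 200 genes in the experimental list.
UNIVERSE = 20000
LIST_SIZE = 200
# pathway name -> (genes annotated to pathway, of which in our list)
toy_pathways = {
    "neurotransmitter signaling": (100, 8),
    "hypothetical pathway B": (150, 2),
    "hypothetical pathway C": (50, 1),
}

raw = [
    hypergeom_enrichment_p(UNIVERSE, size, LIST_SIZE, hits)
    for size, hits in toy_pathways.values()
]
for name, p, q in zip(toy_pathways, raw, benjamini_hochberg(raw)):
    print(f"{name}: p = {p:.3g}, BH-adjusted = {q:.3g}")
```

With 200 of 20000 genes in the list, a 100-gene pathway is expected to contribute about one gene by chance, so an overlap of 8 gives a small p-value. The adjusted value, not the raw one, is the one to compare against the 0.05 threshold when many pathways are tested.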
Pathway Enrichment Analysis
• Gene identifiers
• Pathways and other gene annotation
– Gene Ontology
   • Ontology structure
   • Annotation
– BioMart + other sources
Gene and Protein Identifiers
• Identifiers (IDs) are ideally unique, stable names or numbers that help track database records
– E.g. Social Insurance Number, Entrez Gene ID 41232
• Gene and protein information is stored in many databases
– Genes have many IDs
• Records for: Gene, DNA, RNA, Protein
– Important to recognize the correct record type
– E.g. Entrez Gene records don't store sequence. They link to DNA regions, RNA transcripts and proteins, e.g. in RefSeq, which stores sequence.
Why does Systems Biology need Networks, Pathways, and Biomodels?
Which pathway do red algae choose under global warming?
Which miRNA or transcription factor can be used as a biomarker in prostate cancer?
Why does a specific herbicide not affect my plant?
What is my biological question?
Types of Biological Networks
What do we mean by pathway?
• Biological process or molecular function
• Metabolic processes
• Signaling cascades
• Genes are categorized based on some criteria
Central Dogma Involvement of Gene Products
Gene sets (biological categories)
• Genes (sets) have something in common
– On the same cytogenetic band
– Coding for proteins that are part of the same cellular component
– Can be part of the same biochemical pathway
– Co-expressed under certain conditions
– Putative targets of the same regulatory factor
– ….
What is the Gene Ontology (GO)?
• Set of biological phrases (terms) which are applied to genes:
– protein kinase
– apoptosis
– membrane
• Dictionary: term definitions
• Ontology: a formal system for describing knowledge
• www.geneontology.org
Jane Lomax @ EBI
Gene Ontology
• The Gene Ontology (GO) Consortium was established in 1998 to develop a shared, structured vocabulary (an ontology) for the annotation of molecular characteristics across different organisms.
– A collaborative effort to address the need for consistent descriptions of genes and gene products in different databases
– Original members of the consortium: SGD, FlyBase and MGD
• Two primary purposes for an ontology:
1. to facilitate communication between people and organizations
2. to improve the interoperability between systems
GO structure
• The ontologies are structured vocabularies in the form of directed acyclic graphs (DAGs)
• The DAG represents a network (not a tree) in which each term may be a child of one or more parents
• The relationship of child to parent can be of the "is a" type or the "part of" type
[Figure: mitotic chromosome "is a" chromosome; telomere "part of" chromosome]
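A minimal sketch of how such a DAG can be stored and walked, assuming the slide's example reads as: mitotic chromosome "is a" chromosome, telomere "part of" chromosome. The extra term "toy: telomere of mitotic chromosome" is a hypothetical entry added only to show a child with two parents; it is not taken from GO, and the dictionary layout is an illustration rather than the GO file format.

```python
# child term -> list of (relationship, parent term)
GO_EDGES = {
    "chromosome": [],
    "mitotic chromosome": [("is_a", "chromosome")],
    "telomere": [("part_of", "chromosome")],
    # Hypothetical term, two parents: this is what makes it a DAG, not a tree.
    "toy: telomere of mitotic chromosome": [
        ("is_a", "telomere"),
        ("part_of", "mitotic chromosome"),
    ],
}


def ancestors(term, edges):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in edges.get(current, []):
            if parent not in seen:  # a shared ancestor is visited only once
                seen.add(parent)
                stack.append(parent)
    return seen


for term in GO_EDGES:
    print(term, "->", sorted(ancestors(term, GO_EDGES)))
```

Walking up to the ancestors matters for annotation: a gene annotated to a child term is implicitly annotated to every ancestor of that term, so enrichment tools typically propagate annotations up the graph before counting.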
Ontologies within GO
• Molecular function: describing activities, such as catalytic or binding activities, at the molecular level
• Biological process: referring to a biological objective to which the gene product contributes
• Cellular component: referring to the place in the cell (i.e. the location) where a gene product is found
Primary Database vs Secondary Database
How to save time finding a biological database
BIOSHARING: A DATABASE OF BIOLOGICAL DATABASES
KEGG: Kyoto Encyclopedia of Genes and Genomes
About KEGG
• The Kyoto Encyclopedia of Genes and Genomes (KEGG) knowledgebase was developed in 1996; it catalogues the genetic building blocks, genes and proteins.
• A collection of manually drawn pathway maps representing current knowledge of molecular interaction and reaction networks
• Manually curated based on published literature
• Constructed as wiring diagrams with enzymes and proteins, processes and reactions, and substrates, co-factors, intermediates, metabolites and end products
Categories in KEGG
• Metabolism: carbohydrates, energy, lipid, nucleotides, amino acid, xenobiotics
• Genetic information processing
• Environmental information processing
• Cellular processes
• Human diseases
• Drug development: the structure relationships
More on gene set collections
• Gene Ontology (GO)
– Cellular components (CC)
– Biological processes (BP)
– Molecular functions (MF)
• Well curated pathway databases
– KEGG pathway
– Biocarta
– Reactome
– GenMAPP
– IPA pathway database
• Gene set collections
– MSigDB
– GAzer
Secondary databases in Plant Systems Biology
STRING
A database of known and predicted protein-protein interactions.
http://string-db.org/
DIP
The DIP database provides experimentally determined interactions between proteins.
http://dip.doe-mbi.ucla.edu/dip/Main.cgi
Plant Metabolic Network (PMN)
A database of metabolic pathways in plants.
http://www.plantcyc.org/
Common Identifiers
• Species-specific
– HUGO HGNC: BRCA2
– MGI: MGI:109337
– RGD: 2219
– ZFIN: ZDB-GENE-060510-3
– FlyBase: CG9097
– WormBase: WBGene00002299 or ZK1067.1
– SGD: S000002187 or YDL029W
• Annotations
– InterPro: IPR015252
– OMIM: 600185
– Pfam: PF09104
– Gene Ontology: GO:0000724
– SNPs: rs28897757
• Experimental platform
– Affymetrix: 208368_3p_s_at
– Agilent: A_23_P99452
– CodeLink: GE60169
– Illumina: GI_4502450-S
• Gene
– Ensembl: ENSG00000139618
– Entrez Gene: 675
– Unigene: Hs.34012
• RNA transcript
– GenBank: BC026160.1
– RefSeq: NM_000059
– Ensembl: ENST00000380152
• Protein
– Ensembl: ENSP00000369497
– RefSeq: NP_000050.2
– UniProt: BRCA2_HUMAN or A1YBP1_HUMAN
– IPI: IPI00412408.1
– EMBL: AF309413
– PDB: 1MIU
Red = Recommended
Identifier Mapping
• So many IDs!
– Software tools recognize only a handful
– May need to map from your gene list IDs to standard IDs
• Four main uses
– Searching for a favorite gene name
– Link to related resources
– Identifier translation
   • E.g. proteins to genes, Affy ID to Entrez Gene
– Merging data from different sources
   • Find equivalent records
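Identifier translation can be sketched as a lookup table plus a report of whatever failed to map. The table below is a toy built from the identifiers listed on the "Common Identifiers" slide, assuming (as that slide implies) that they all refer to the same gene; the column names and helper functions are hypothetical. In practice the table would be exported from a mapping resource such as BioMart.

```python
# Toy mapping table; each row is one gene with its IDs in several namespaces.
ID_TABLE = [
    {
        "symbol": "BRCA2",
        "entrez_gene": "675",
        "ensembl_gene": "ENSG00000139618",
        "affymetrix": "208368_3p_s_at",
        "refseq_protein": "NP_000050.2",
    },
]


def build_index(table, source, target):
    """Build a source-ID -> target-ID dictionary from the mapping table."""
    return {
        row[source]: row[target]
        for row in table
        if row.get(source) and row.get(target)
    }


def translate(ids, index):
    """Translate IDs; return (mapped dict, list of IDs with no mapping)."""
    mapped = {i: index[i] for i in ids if i in index}
    unmapped = [i for i in ids if i not in index]
    return mapped, unmapped


# Example: Affymetrix probe-set IDs to Entrez Gene IDs.
affy_to_entrez = build_index(ID_TABLE, "affymetrix", "entrez_gene")
mapped, unmapped = translate(["208368_3p_s_at", "unknown_probe"], affy_to_entrez)
print("mapped:", mapped)
print("unmapped:", unmapped)
```

Reporting the unmapped IDs separately is a deliberate choice: silently dropping them shrinks the gene list and can bias any downstream enrichment analysis.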
Thank You
linked-in : https://eg.linkedin.com/in/ali-kishk-997423a9