Bioinformatics Introduction

Bioinformatics in medicinetoday

David Montanerdmontaner@cipf.es

Centro de Investigación Príncipe FelipeInstitute of Computational Genomics

9 May 2013in Valencia

David Montaner Bioinformatics in medicine 1/26

Genomics

“Progress in science depends on new techniques, newdiscoveries and new ideas, probably in that order.”Sydney Brenner, 1980

Microarray devices and high-throughput sequencing allow usmeasuring thousands or millions of genomic characteristics.

Genomics vs. genetics

Genetics:• Single genes are responsible for biological changes.• one gene→ one hypothesis→ one p-value→ conclusions

Genomics:• Genes or genomic features act together to produce

biological changes.• many genes→ many hypothesis→ many p-value→→ more data analysis

• Computational support is needed even for drawingconclusions

Genomic numbers

Microarray:• 30.000 genes• 2 million SNPs• 100 Mb

Measured features:• genes, isoforms• SNPs, Polymorphisms• IN-DELS• loss of heterozygosity• methylation• copy number alterations

NGS:• 30.000 genes• 30.000 transcripts• 20 million SNPs• 10-100 GB

Registered information:• Genomic characteristics:

position, chromosome ...• Biological function• Disease association• miRNA targets

Genomic databases

Nucleic Acid Research lists +1500 online databases!http://www.oxfordjournals.org/nar/database/c

• Many different databases for each category, which should Iuse?

• No standards: different IDs, methods, servers, formats, ...• Lack of international initiatives, many local and small

databases• Different gene IDs, more than 50• In vivo vs in silico databases

Biological databases (Wikipedia)1 Primary nucleotide

sequence databases2 Metadatabases3 Genome databases4 Protein sequence

databases5 Proteomics databases6 Protein structure

databases7 Protein model databases8 RNA databases9 Carbohydrate structure

databases10 Protein-protein interactions

11 Signal transductionpathway databases

12 Metabolic pathwaydatabases

13 Experimental datarepositories (MicroarraysNGS, Sanger)

14 Exosomal databases15 Mathematical model

databases16 PCR / real time PCR

primer databases17 Specialized databases18 Taxonomic databases19 Wiki-style databasesDavid Montaner Bioinformatics in medicine 6/26

Primary nucleotide sequencedatabases

Contain any kind of nucleotide sequences, form genes togenomes.

The International Nucleotide Sequence Database (INSD)Collaboration:

• GenBankNational Center for Biotechnology Information (NCBI)

• European Nucleotide Archive (ENA)European Bioinformatics Institute (EBI)

• DNA Data Bank of Japan (DDBJ)

GenBankPrimary nucleotide sequence databases

• available on the NCBI ftp site:http://www.ncbi.nlm.nih.gov/Ftp/

• A new release is made every two months.• 3 types of entries:

• CoreNucleotide (the main collection)• dbEST (Expressed Sequence Tags)• dbGSS (Genome Survey Sequences)

Access:• Search for sequence identifiers using Entrez Nucleotide:

http://www.ncbi.nlm.nih.gov/nucleotide/• Align GenBank sequences to a query sequence using

BLAST (Basic Local Alignment Search Tool).http://blast.ncbi.nlm.nih.gov/Blast.cgi

• Several other e-utilities (see book)See an example of a GenBank record.

Metadatabases

• Collect and organize data from primary nucleotidesequence databases and may other resources.

• Make the information available in a convenient format andprovide data handling resources: web pages, applicationprogramming interface (API) …

• Focus on particular species, diseases …

Examples• Entrez: searches through almost all NCBI resources.

http://www.ncbi.nlm.nih.gov/sites/gquery• GeneCards: provides genomic, proteomic, transcriptomic,

genetic and functional information for human genes (knownand predicted )http://www.genecards.org/

EntrezMetadatabases

• Searches through almost all NCBI resources.• Entrez search page: http://www.ncbi.nlm.nih.gov/sites/gquery• queries can be saved if you have a a MyNCBI account

http://www.ncbi.nlm.nih.gov/

Genome databasesCollect genome sequences and annotation (specification aboutgenes) for particular organisms, and try to improve them:

• Data curation.• Complete missing information using insilico methods.• Generate new relational organization.• Complement feature IDs.• Provide “easy” access, visualization …

Examples

• Ensembl: automatic annotation on selected eukaryotegenomes.

• UCSC Genome Browser: reference sequence and workingdraft assemblies for a large collection of genomes

• Wormbase: genome of the model organism C.elegans.

EnsemblGenome databases

• Ensembl is a joint project between European BioinformaticsInstitute (EBI) the European Molecular Biology Laboratory(EMBL) and the Wellcome Trust Sanger Institute.

• Develop a software system which produces and maintainsautomatic annotation on selected vertebrate andeukaryote genomes.

• http://www.ensembl.org

UCSC Genome BrowserGenome databases

• UCSC: University of California, Santa Cruz.• This site contains the reference sequence and working

draft assemblies for a large collection of genomes.• http://genome.ucsc.edu/

Protein sequence databases

• Most times proteins are the final unit of interest to research.• There is a direct conversion from DNA/RNA sequences to

protein sequences.• Gene IDs and protein IDs are equivalently used by

researchers (biologists not bioinformaticians …)

Examples

• UniProt: Universal Protein Resource (EBI)• Swiss-Prot (Swiss Institute of Bioinformatics)• InterPro Classifies proteins into families and predicts the

presence of domains and sites.• Pfam Protein families database of alignments and HMMs

(Sanger Institute)

RNA databases

• Contain information about RNA molecules.• Most of them regarding gene regulatory factors. (Gene

information is usually in other repositories).

Examples

• mirBase: microRNAshttp://www.mirbase.org/

• TRANSFAC: transcription factors in eukaryote (Proprietarydatabase).

• JASPAR: transcription factor binding sites for eukaryote(Open access, curated, non-redundant).http://jaspar.genereg.net/

Protein-protein interactions

• Proteins are the main functional units.• But they do not work in isolation.• Pretty useless at the moment but promising in the future …• some information is experimental, but most of it is

generated insilico.

Examples• IntAct: protein–small molecule

and protein–nucleic acidinteractions.

• BIND: Biomolecular InteractionNetwork Database.

Signal transduction pathwaydatabases& Metabolic pathway databases

• Information about how genes (or proteins) interact amongthem.

• not only physical interactions …

Examples• Reactome: free online database of biological pathways.

http://www.reactome.org• KEGG: Kyoto Encyclopedia of Genes and Genomes.

Metabolic pathways.http://www.genome.jp/kegg/pathway.html

KEGGMetabolic pathway databases

Experimental data repositories

Contain Microarray, NGS, Sanger, and other experimental highthroughput data.

• GEO: Gene Expression Omnibus (NCBI)http://www.ncbi.nlm.nih.gov/geo/

• ArrayExpress: database of functional genomicsexperiments including (EBI)http://www.ebi.ac.uk/arrayexpress/

• The Cancer Genome Atlas (TCGA): Data on differentcancer related tissues.http://cancergenome.nih.gov/

Bioinformatics

Training• Biology 1/3• Statistics 1/3• Computer science 1/3←−

Efficiently combine:• Experimental information• Database registered knowledge

Time and resources:• As in the wet lab

Example

Example I

Autistic children

1 (microarray) NGS data processing• data quality control, filtering...• map against reference genome• CNV calling

2 CNV filtering• just 75 rare de novo CNV events (not registered in

databases)• filter out the long ones• keep the ones that contain genes

Example II

3 move to the gene level• 47 loci in total affecting 433 human genes

4 Building the background likelihood network• GO annotations• KEGG pathways• InterPro domains• protein-proteins interactions. Databases: BIND, BioGRID,

DIP, HPRD, InNetDB, IntAct, BiGG, MINT, and MIPS• sequence homology between the gene pair (BLAST)

Example III

5 Search for high scoring clusters affected by CNVs6 Evaluating significance of cluster scores:

10.000 simulations

Example IV7 Functional characterization of the identified network

8 And, finally, draw conclusions

Questions

Thanks

Bioinformatics Introduction

Documents

Introduction to Bioinformatics · biopotato bioinformatics Introduction to Bioinformatics. 8 What is Bioinformatics? Interdisciplinary a biology/medical researchers, just like you

Introduction to bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics -

Introduction to Bioinformatics … · Introduction to Bioinformatics Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique LF-2005.02 SIB and EMBnet Bioinformatics

BIOINFORMATICS Introduction

Introduction to Bioinformatics - hu-berlin.de · Introduction to Bioinformatics Ulf Leser . Ulf Leser: Bioinformatics, Summer Semester 2016 2 Bioinformatics 25.4.2003 50. Jubiläum

Introduction to Bioinformatics Yana Kortsarts References: An Introduction to Bioinformatics Algorithms bioalgorithms.info

1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Introduction to bioinformatics, 2010 - Göteborgs universitetbio.lundberg.gu.se/courses/vt10/2010_RNA_genomics_toprint.pdf · Introduction to bioinformatics, 2010. RNA bioinformatics

Introduction to Bioinformatics

1. INTRODUCTION TO BIOLOGY AND BIOINFORMATICS · INTRODUCTION TO BIOLOGY AND BIOINFORMATICS BIOINFORMATICS COURSE MTAT.03.239 11.09.2013 . 2 "Introduction to Bioinformatics" Bioinformatics

Introduction to Bioinformatics - Department of Computer ...khuri/Yverdon_2010/Yverdon_2010_Bioinformatics.pdf · Introduction to Bioinformatics @2010 Sami Khuri Introduction to Bioinformatics

Introduction to Bioinformatics - Craig Ventermaize.jcvi.org/cellgenomics/outreach/2007/notes/lecture_bioinformat… · Introduction to Bioinformatics. 2 ... – Bioinformatics is

Bioinformatics: course introduction

Introduction to Bioinformatics · Introduction to Bioinformatics Shifra Ben-Dor Irit Orr. What is bioinformatics? A marriage between Biology and Computers! What is bioinformatics?

Introduction to Bioinformatics

EB3233 Bioinformatics Introduction to Bioinformatics

Introduction to Bioinformatics - hu-berlin.de · 2017-04-21 · Introduction to Bioinformatics Johannes Starlinger. Johannes Starlinger: Bioinformatics, Summer Semester 2017 2 Bioinformatics