Browsing Genes and Genomes with Ensembl
Maria Wilbe
Department of Animal Breeding and Genetics, SLU, Sweden
Several lecture notes taken from:
• Bert Overduin
  Ensembl User Support
  EMBL Outstation
  European Bioinformatics Institute
  Wellcome Trust Genome Campus
  Hinxton, Cambridge, UK
• Alvaro Martinez Barrio
  Linnaeus Centre for Bioinformatics, Uppsala University, Sweden
What is Ensembl
• A software system which produces and maintains automatic annotation on selected eukaryotic genomes.
• Perform automatic analysis of new genome data
• Analysis and annotation maintained on the current data
• Presentation of the analysis to all via the web
• Ensembl will concentrate on vertebrate genomes, but other groups have adapted the system for use with plant and fungal genomes
• The "Powered by Ensembl" page lists projects that use Ensembl technology
Ensembl - Organisation
• Joint project between European Bioinformatics Institute (EMBL-EBI) and Wellcome Trust Sanger Institute
• Started in 1999 for the Human Genome Project
• Funded primarily by the Wellcome Trust, additional funding by EMBL, EU, NIH-NIAID, BBSRC and MRC
• Team of ca. 40 people, led by Ewan Birney (EBI) and Tim Hubbard (Sanger)
• Uses the largest dedicated computer system in biology in Europe
A Bit of History
Sequenced genomes:
• 1995 Haemophilus influenzae, 1.8 Mb
• 1996 Yeast, 12 Mb
• 1998 C. elegans, 100 Mb
• 1999 Fruit fly, 125 Mb
• 2000 Arabidopsis, 115 Mb
• 2001 Human (draft)
• 2002 Mouse, 2.6 Gb
• 2004 Human (“finished”), 3 Gb
Sequencing genomes
DNA sequencing is a method for determining the order of the nucleotide bases (A, T, C, G) in a DNA molecule.
Ensembl genomes (Ensembl release 49 - March 2008)
Species in Ensembl
[Figure: evolutionary tree of vertebrate groups (agnathans, sharks, rays, teleosts, lungfishes, Latimeria, bichir/Polypterus, amphibians, lizards, turtles, crocodiles, birds, monotremes, marsupials, placentals) plus non-vertebrates, drawn against a geological time scale from the Cambrian (570 MY BP) to the Tertiary (65 MY BP).]
Ensembl - Goals
• Provide automatic annotation of genomic sequence
• Integrate other biological data
• Make data available to all via the web
Annotation
Wikipedia: Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
1. identifying elements on the genome, a process called Gene Finding:
   - ORFs and their localisation
   - gene structure
   - coding regions
   - location of regulatory motifs
2. attaching biological information to these elements:
   - biochemical function
   - biological function
   - involved regulation and interactions
   - expression
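To make the first step concrete, here is a minimal Python sketch of a naive ORF scan. It is only an illustration and not how Ensembl finds genes: the function name `find_orfs`, the `min_codons` parameter and the example sequence are assumptions made for this sketch, and only the forward strand is searched.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    Coordinates are 0-based, end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Made-up example sequence with one short ORF (ATG AAA TAG) in frame 2.
example = "CCATGAAATAGCC"
for start, end, frame in find_orfs(example):
    print(start, end, frame, example[start:end])
```

Real gene finding also has to handle both strands, introns and supporting evidence, which is why the second step (attaching biological information) relies on curated databases.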
The big Genome Browsers
• Ensembl Genome browser: http://www.ensembl.org
• NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/
• UCSC Genome Browser: http://genome.ucsc.edu
Ensembl / NCBI Map Viewer / UCSC
• All provide access to multiple organisms
• All are based on the same data
• Annotations are different
• Assembly versions may differ
• Some organisms are available in only one of the browsers
NCBI Map Viewer - Opening page
NCBI Map Viewer - Result page
UCSC Genome Browser - Opening page
UCSC Genome Browser - Search page
UCSC Genome Browser - Default view
UCSC Genome Browser - Options
UCSC Genome Browser - BLAT search
Ensembl Genome Browser - Opening page
Ensembl Genome Browser - Search view
Choose human gene
Ensembl Genome Browser - Gene view
Ensembl Genome Browser - BLAST
What Distinguishes Ensembl from the UCSC and NCBI Browsers?
• Automatic annotation for those species for which no manually curated gene set exists
• Direct database access and programmatic access via the Perl API
• Not only the data, but also the software source code is open source
Which Data Are Available?
• Genomic sequence
• Transcript and peptide models
• External references
• Variation data: SNPs
• Mapped cDNAs, peptides, microarray probes, BAC clones etc.
• Other features of the genome: cytogenetic bands, markers, repeats etc.
• Comparative data: orthologues and paralogues, protein families, whole genome alignments, syntenic regions
• Regulatory data: “best guess” set of regulatory elements
• Data from external sources (DAS)
Genomic sequence
Gene location
Genomic sequence
Export
Transcript and peptide info
Click to view
External references
Click to view
Single nucleotide polymorphisms (SNPs)
• Two human genomes differ by ~0.1%
• Polymorphism: a DNA variation in which each possible sequence is present in at least 1% of people
• Most polymorphisms (~90%) take the form of SNPs: variations that involve just one nucleotide
  • ~1 out of every 300 bases in the human genome
  • ~10 million in the human genome
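The figures on this slide can be checked with a little arithmetic. The sketch below assumes a human genome of about 3 Gb (the size quoted for the finished human genome on the history slide). The helper `is_polymorphism` and its example frequencies are made up for illustration.

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb, assumed from the history slide

# Two human genomes differ by ~0.1%, i.e. 1 base in 1000.
differences_between_two_genomes = GENOME_SIZE_BP // 1000
# SNPs occur at ~1 out of every 300 bases.
expected_snps = GENOME_SIZE_BP // 300

print(differences_between_two_genomes)  # 3000000
print(expected_snps)                    # 10000000

def is_polymorphism(allele_frequencies, threshold=0.01):
    """True if there are at least two alleles and each is present in >= 1% of people."""
    return len(allele_frequencies) > 1 and all(
        freq >= threshold for freq in allele_frequencies
    )

print(is_polymorphism([0.97, 0.03]))    # True
print(is_polymorphism([0.995, 0.005]))  # False
```

The ~10 million SNPs count variable positions across the whole population, which is why it is larger than the ~3 million differences expected between any two individual genomes.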
Practical Applications
• Disease diagnosis
• Association studies
• Forensic testing
• Population genetics and evolutionary studies
• Marker-assisted selection
SNPs in Ensembl - Types
Non-synonymous: In coding sequence, resulting in an aa change
Synonymous: In coding sequence, not resulting in an aa change
Frameshift: In coding sequence, resulting in a frameshift
Stop lost: In coding sequence, resulting in the loss of a stop codon
Stop gained: In coding sequence, resulting in the gain of a stop codon

Essential splice site: In the first 2 or the last 2 basepairs of an intron
Splice site: 1-3 bps into an exon or 3-8 bps into an intron

Upstream: Within 5 kb upstream of the 5'-end of a transcript
Regulatory region: In regulatory region annotated by Ensembl
5' UTR: In 5' UTR
Intronic: In intron
3' UTR: In 3' UTR
Downstream: Within 5 kb downstream of the 3'-end of a transcript
Intergenic: More than 5 kb away from a transcript
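The distance-based rules in this table can be sketched in a few lines of Python. This is an illustration of the definitions only, not Ensembl's own code: the function names, the coordinate convention (1-based, inclusive, `tx_start <= tx_end`, strand given as +1 or -1) and the example coordinates are all assumptions.

```python
FLANK_BP = 5000  # "within 5 kb" of a transcript end

def classify_by_transcript(pos, tx_start, tx_end, strand):
    """Classify a SNP position relative to a single transcript."""
    if tx_start <= pos <= tx_end:
        # Needs exon/CDS structure to refine (UTR, intronic, coding ...).
        return "within transcript"
    if pos < tx_start:
        distance = tx_start - pos
        # On the reverse strand the 5'-end is at the higher coordinate.
        side = "Upstream" if strand == 1 else "Downstream"
    else:
        distance = pos - tx_end
        side = "Downstream" if strand == 1 else "Upstream"
    return side if distance <= FLANK_BP else "Intergenic"

def classify_intron_position(offset):
    """Classify an intronic SNP by its 1-based distance into the intron,
    measured from the nearest exon boundary."""
    if offset <= 2:
        return "Essential splice site"
    if offset <= 8:
        return "Splice site"
    return "Intronic"

print(classify_by_transcript(1000, 3000, 9000, 1))   # Upstream
print(classify_by_transcript(1000, 3000, 9000, -1))  # Downstream
print(classify_by_transcript(20000, 3000, 9000, 1))  # Intergenic
print(classify_intron_position(5))                   # Splice site
```

A SNP near several transcripts can fall into a different class for each of them, so in practice the classification is made per transcript.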
SNPs in Ensembl
ContigView: SNPs in genomic context
SNPs in Ensembl
Biological Evidence
All Ensembl gene predictions are based on experimental evidence:
• UniProt/Swiss-Prot: A manually curated database and therefore of highest accuracy
• NCBI RefSeq: A partially manually curated database
• UniProt/TrEMBL: Automatically annotated translations of EMBL coding sequence (CDS) features
• EMBL / GenBank / DDBJ: Primary nucleotide sequence repository
The Ensembl Genebuild
Genome assembly + Computer programs + Experimental evidence → Ensembl Genes
Ensembl Identifiers
• ENSG### Ensembl Gene ID
• ENST### Ensembl Transcript ID
• ENSP### Ensembl Peptide ID
• ENSE### Ensembl Exon ID
• ENSF### Ensembl Family ID
• ENSR### Ensembl Regulatory Feature ID
• For species other than human, a species code is inserted after "ENS": MUS for mouse (Mus musculus): ENSMUSG###, DAR for zebrafish (Danio rerio): ENSDARG###, etc.
• For imported genes Ensembl uses the original identifiers
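The naming scheme above can be sketched as a small Python parser. This is an illustration rather than an official Ensembl tool: the function name `parse_stable_id` is made up, the regular expression assumes a species code of three or more uppercase letters, and the example identifiers use made-up numbers.

```python
import re

FEATURE_TYPES = {
    "G": "gene",
    "T": "transcript",
    "P": "peptide",
    "E": "exon",
    "F": "family",
    "R": "regulatory feature",
}

# "ENS", an optional species code, one feature-type letter, then digits.
STABLE_ID = re.compile(r"ENS([A-Z]{3,})?([GTPEFR])(\d+)")

def parse_stable_id(stable_id):
    """Split an Ensembl-style identifier into its parts.

    Returns None for identifiers that do not follow the scheme,
    for example original identifiers kept for imported genes.
    """
    match = STABLE_ID.fullmatch(stable_id)
    if match is None:
        return None
    species_code, type_letter, number = match.groups()
    return {
        "species_code": species_code,  # None means human (no code)
        "feature": FEATURE_TYPES[type_letter],
        "number": number,
    }

# Made-up identifiers, for illustration only.
for example_id in ["ENSG00000000001", "ENSMUSG00000000001", "ENSDARP00000000001"]:
    print(example_id, parse_stable_id(example_id))
```

Because the feature-type letter always directly precedes the digits, the species code can be recovered without a lookup table of species.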
Pre! and Archive! Sites
Powered by Ensembl
Ensembl – Open Source
• Data and software freely available
• More than 50 installs worldwide
• Academia and industry
• Local or available via the web
• Mirrors with Ensembl data, e.g. http://ensembl.genome.tugraz.at/index.html, or user projects with own data
Ensembl Accounts
• Personalise Ensembl by saving bookmarks, view configurations and homepage preferences in a user account
• Share bookmarks and configurations by setting up groups
Please note that all Ensembl data remain freely accessible. It is not necessary to register in order to access Ensembl data!
Website Statistics
On average 1,000,000 page impressions / week
Top 3 species:
Top 3 countries:
What If I Need Help?
• Helpdesk:
• Mailing lists:
[email protected] [email protected]
• Animated tutorials
http://www.ensembl.org/common/Workshops_Online
Today
1. Ensembl: www.ensembl.org
   1. WORKED EXAMPLE: A walk through the main pages of the Ensembl browser, using the EPO (Erythropoietin precursor) gene as an example (Course Homepage).
   2. Ensembl Exercise: Answering questions by using Ensembl (Course Homepage).
   3. If time permits, find information about your favorite gene by using Ensembl.