Emily Perry

Ensembl Outreach Project Leader

EMBL-EBI

Browsing Genes and Genomes with Ensembl

Objectives

• What is Ensembl?

• What type of data can you get in Ensembl?

• How to navigate the Ensembl browser website.

• Where to go for help and documentation.

This webinar course

Date | Webinar topic | Instructor
6th April | Introduction to Ensembl | Helen Sparrow
13th April | Ensembl genes | Emily Perry
20th April | Data export with BioMart | Victoria Newman
27th April | Variation data in Ensembl and the Ensembl VEP | Victoria Newman
4th May | Comparing genes and genomes with Ensembl Compara | Ben Moore
11th May | Finding features that regulate genes – the Ensembl Regulatory Build | Ben Moore
18th May | Uploading your data to Ensembl and advanced ways to access Ensembl data | Emily Perry

Structure

Presentation: Where Ensembl genes come from

Demo: Getting gene data

Exercises: On the Train online course

Questions?

• We’ve muted all the mics

• Ask questions in the Chat box in the webinar interface

• My Ensembl colleagues will respond during the talk

• There’s no threading so please respond with @name

Ben Moore, Victoria Newman

Course exercises

http://www.ebi.ac.uk/training/online/course/ensembl-browser-webinar-series-2016

This text will be replaced by a YouTube (link to YouKu too) video of the webinar and a pdf of the slides.

The “next page” will be the exercises.

A link to exercises and their solutions will appear in the page hierarchy.

Get help with the exercises

• Use the exercise solutions in the online course

• Join our Facebook group and discuss the exercises with everybody (see the online course for the link)

• Email us [email protected]

EBI is an Outstation of the European Molecular Biology Laboratory.

Genes and Transcripts

Gene views

Figure: transcript tracks showing a merged transcript, a protein coding transcript and a non-coding transcript.

Legend: coding exon, intron, non-coding exon

2## - Ensembl annotation

0## - Havana annotation

Golden transcripts

• Identical annotation

• Higher confidence and quality

Ensembl and Havana annotation

Ensembl: automatic annotation. Havana: manual annotation.

Automatic gene annotation

• Genome-wide determination using the Ensembl automated pipeline

• Predictions based on experimental (biological) data

• Known proteins/cDNAs plotted onto the genome using sequence matching

Biological Evidence

• International Nucleotide Sequence databases

• Protein sequence databases
  • Swiss-Prot: manually curated
  • TrEMBL: unreviewed translations

• NCBI RefSeq
  • Manually annotated proteins and mRNAs (NP, NM)

Other species

• Infer genes from homology to other species
  • E.g. predict genes in one species by mapping cDNAs/proteins from another species to its genome

• RNAseq data

Manual gene annotation

• Gene determination on a case-by-case basis by a person

• Genome-wide

• Genes list

GENCODE

• The GENCODE gene set is made up of:

• Ensembl automatically annotated genes

• Havana manually annotated genes

• The merged gene set

• GENCODE is the default gene set used by ENCODE, the 1000 Genomes Project and other major projects.

CCDS transcripts

• Consensus coding DNA sequence set

• Agreement between EBI, WTSI, UCSC and NCBI

• http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi

Higher quality transcripts

Which transcript to use?

• GENCODE Basic: Only the “complete” transcripts (where a gene has complete transcripts) (http://www.ensembl.org/Help/Glossary?id=500)

• Transcript support level: Scored 1-5 for quality, where 1 is the best (http://www.ensembl.org/Help/Glossary?id=492)

• APPRIS principal isoform: The major isoform(s) from combining protein structural information, functionally important residues and evidence from cross-species alignments. (http://www.ensembl.org/Help/Glossary?id=521)

• + CCDS, + Golden transcripts
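As a rough illustration of how these quality flags could be combined when picking a transcript, here is a minimal sketch. The `Transcript` record, its field names, the made-up stable IDs and the priority order are all assumptions for this example; none of them are Ensembl API names or an official Ensembl ranking.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    # Hypothetical record: field names are illustrative, not Ensembl API names.
    stable_id: str
    gencode_basic: bool
    tsl: Optional[int]        # transcript support level 1-5, None if not scored
    appris_principal: bool
    ccds: bool
    golden: bool


def sort_key(t: Transcript):
    # Tuples sort element by element; `not flag` places True flags first.
    # Unscored transcripts get 6 so they sort after TSL 5.
    tsl = t.tsl if t.tsl is not None else 6
    return (not t.gencode_basic, not t.appris_principal,
            not t.ccds, not t.golden, tsl)


def rank(transcripts: List[Transcript]) -> List[Transcript]:
    # Assumed priority: GENCODE Basic, APPRIS principal, CCDS, golden, then TSL.
    return sorted(transcripts, key=sort_key)


# Made-up transcripts for illustration only.
a = Transcript("ENST00000000001", True, 1, True, True, True)
b = Transcript("ENST00000000002", True, 2, False, False, False)
c = Transcript("ENST00000000003", False, None, False, False, False)
d = Transcript("ENST00000000004", True, 1, False, False, False)

ranked_ids = [t.stable_id for t in rank([c, b, d, a])]
print(ranked_ids)
```

Which flag should win is a judgement call that depends on the analysis, so the order in `sort_key` is the part to adapt.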

Ensembl stable IDs

• ENSG########### Ensembl Gene ID

• ENST########### Ensembl Transcript ID

• ENSP########### Ensembl Peptide ID

• ENSE########### Ensembl Exon ID

• For non-human species a suffix is added:

MUS (Mus musculus) for mouse ENSMUSG###

DAR (Danio rerio) for zebrafish: ENSDARG###

http://www.ensembl.org/info/genome/stable_ids/index.html
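The ID layout above can be sketched as a small parser. This is an illustration only: it assumes the four feature letters listed on the slide, an 11-digit number and a three-letter species prefix such as MUS or DAR. Other feature types, other prefix lengths and version suffixes are not handled, and `parse_stable_id` is a name made up for this example.

```python
import re
from typing import NamedTuple, Optional

# Feature letters as listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# ENS + optional three-letter species prefix + feature letter + 11 digits.
STABLE_ID = re.compile(r"ENS(?P<species>[A-Z]{3})?(?P<type>[GTPE])(?P<number>\d{11})")


class StableId(NamedTuple):
    species_prefix: Optional[str]   # None for human IDs
    feature_type: str
    number: int


def parse_stable_id(text: str) -> Optional[StableId]:
    """Return the parts of an Ensembl stable ID, or None if it does not match."""
    m = STABLE_ID.fullmatch(text)
    if m is None:
        return None
    return StableId(m.group("species"),
                    FEATURE_TYPES[m.group("type")],
                    int(m.group("number")))


# The digits below are made up; zfill pads them to the 11-digit width.
human_gene = parse_stable_id("ENSG" + "1".zfill(11))
mouse_gene = parse_stable_id("ENSMUSG" + "42".zfill(11))
fish_transcript = parse_stable_id("ENSDART" + "7".zfill(11))
not_an_id = parse_stable_id("ESPN")
print(human_gene, mouse_gene, fish_transcript, not_an_id)
```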

Why Gene Ontology (GO)?

• Multiple terms for the same thing: innate immunity, non-specific immunity

• Gene descriptions too specific: phagocyte, complement, cytokines, natural killer cells, mast cells

GO terms form a controlled vocabulary

GO:0045087 - innate immune response

Innate immune responses are defense responses mediated by germline encoded components that directly recognise components of potential pathogens.

GO terms are hierarchical

GO:0006955 immune response
  GO:0045087 innate immune response
    GO:0006957 complement activation, alternative pathway
    GO:0001867 complement activation, lectin pathway
    GO:0009814 defence response, incompatible interaction
    GO:0042381 hemolymph coagulation
    GO:0009682 induced systemic resistance
    GO:0002227 innate immune response in mucosa
    GO:0035420 MAPK cascade involved in innate immune response
    GO:0035006 melanisation defence response
    GO:0002228 natural killer cell mediated immunity
    GO:0045824 negative regulation of innate immune response
    GO:0009626 plant-type hypersensitive response
    GO:0045089 positive regulation of innate immune response
    GO:0045088 regulation of innate immune response
    GO:0034341 response to interferon-gamma
    GO:0034340 response to type I interferon
    GO:0034342 response to type II interferon
    GO:0009616 virus induced gene silencing
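One practical consequence of the hierarchy is that a gene annotated with a specific term is implicitly annotated with its broader terms too. A minimal sketch of that lookup, using a handful of the terms from the slide: the child-to-parent links are read off the slide layout and simplified to a single relationship type, which is an assumption (the real ontology has several relationship types and terms may have more than one parent).

```python
# Child -> parents, restricted to a few terms shown on the slide.
PARENTS = {
    "GO:0045087": ["GO:0006955"],  # innate immune response -> immune response
    "GO:0006957": ["GO:0045087"],  # complement activation, alternative pathway
    "GO:0002228": ["GO:0045087"],  # natural killer cell mediated immunity
    "GO:0002227": ["GO:0045087"],  # innate immune response in mucosa
}


def ancestors(term, parents=PARENTS):
    """Return every broader term reachable from `term`, nearest first."""
    seen = []
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(parents.get(current, []))
    return seen


complement_ancestors = ancestors("GO:0006957")
root_ancestors = ancestors("GO:0006955")
print(complement_ancestors, root_ancestors)
```

With this in place, a search for genes involved in the immune response (GO:0006955) can also pick up genes annotated only with one of the more specific terms.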

Hands on

• We’re going to look at an Ensembl gene, ESPN, and find out information about it and its transcripts.

Next webinar – Data export with BioMart

Ensembl data can be easily exported in bulk using BioMart. BioMart is a flexible tool that allows you to easily specify what Ensembl features you want data for, and what data you want to see about them, then export those data in a table or as sequences.

Learn the basics of running a BioMart query, and explore some of the options that are available.

Victoria Newman

Help and documentation

Course online: http://www.ebi.ac.uk/training/online/subjects/11

Tutorials: www.ensembl.org/info/website/tutorials

Flash animations: www.youtube.com/user/EnsemblHelpdesk, http://u.youku.com/Ensemblhelpdesk

Email us [email protected]

Ensembl public mailing lists [email protected], [email protected]

Follow us

www.facebook.com/Ensembl.org

@Ensembl

www.ensembl.info

Publications

Aken, B. et al. Ensembl 2017. Nucleic Acids Research. http://europepmc.org/articles/PMC5210575

Xosé M. Fernández-Suárez and Michael K. Schuster. Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics 1.15.1-1.15.48 (2010). www.ncbi.nlm.nih.gov/pubmed/20521244

Giulietta M Spudich and Xosé M Fernández-Suárez. Touring Ensembl: A practical guide to genome browsing. BMC Genomics 11:295 (2010). www.biomedcentral.com/1471-2164/11/295

http://www.ensembl.org/info/about/publications.html

Ensembl Acknowledgements

The Entire Ensembl Team

Funding

Co-funded by the European Union