Page 1: EBI services

The SLING project is funded by the European Commission within Research Infrastructures of the FP7 Capacities Specific Programme, grant agreement number 226073 (Integrating Activity)

EBI services

Jennifer McDowall

EMBL-EBI

Page 2: EBI services

Overview

• Introduction

• EBI Databases

• Searching for sequences

– NEW: simple EBI search

– Advanced SRS text search

– Sequence search tools

• Accessing Old entries

– Sequence archives

• Chemoinformatics

Page 3: EBI services

Website: http://www.ebi.ac.uk/

Thematic index

EBI Search: search all databases and literature in one go

Page 4: EBI services

Website:

www.ebi.ac.uk

Databases
• Patent resources
• Sequences
• Genomes
• Chemistry
• Structures
• Gene expression
• Reactions & pathways
• Literature

Tools
• Sequence searching
• Sequence analysis
• Structural analysis
• Functional analysis

Training
• eLearning
• Workshops
• 2Can education resource

Industry programme
• Industry support
• SME support

Page 5: EBI services

patent-related resources...

EBI databases

Page 6: EBI services

Sequence data from patent literature

October 2010 patent nucleotides > 17.5m sequences

patent proteins > 4.9m sequences

GenBank

ENA

DDBJ

EPO

USPTO

JPO + KIPO

EPO policy: data publicly released 18 months after patent application date

(whether patent granted or not)
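The 18-month rule above is simple calendar arithmetic. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and clamping to the last day of a shorter month is an assumption, not something the slide specifies.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`.

    If the target month is shorter, the day is clamped to its last day
    (an assumption made for this sketch).
    """
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        last_day = 29 if leap else 28
    elif month in (4, 6, 9, 11):
        last_day = 30
    else:
        last_day = 31
    return date(year, month, min(d.day, last_day))


def expected_release(application_date: date) -> date:
    """Earliest public release date under the 18-month rule quoted above."""
    return add_months(application_date, 18)


print(expected_release(date(2009, 4, 15)))  # 2010-10-15
```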

INSDC agreement:
• Free unrestricted access
• Permanently accessible
• All data exchanged daily

Page 7: EBI services

Patent resources at EBI

www.ebi.ac.uk/patentdata

Page 8: EBI services

Patent sequence records at EBI

ENA (formerly EMBL-Bank)
• >124 million sequences
• patent + non-patent nucleotides
• redundant

UniParc (division of UniProt)
• >24 million sequences
• patent + non-patent proteins
• non-redundant

NR patent sequences
• patent proteins and nucleotides
• non-redundant
• additional patent annotation

non-patent sequence

prior art searches

patent sequence

prior art searches

Page 9: EBI services

Non-redundant patent databases

www.ebi.ac.uk

ENA (redundant)

Remove sequence redundancy: Level-1 NR

Group by patent families: Level-2 NR

Additional annotation, including priority dates for patent families
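The two levels above can be pictured as two grouping passes. This is a minimal sketch of the idea only: the records, accession labels and family identifiers are invented, and the slides do not say how EBI actually builds these databases.

```python
from collections import defaultdict

# Hypothetical input records: (accession, sequence, patent family id).
records = [
    ("X1", "ATGCCGTA", "FAM-A"),
    ("X2", "ATGCCGTA", "FAM-A"),  # same sequence, same family
    ("X3", "ATGCCGTA", "FAM-B"),  # same sequence, different family
    ("X4", "GGGTTTAA", "FAM-B"),
]


def level1(recs):
    """Level 1: keep each distinct sequence once, remembering its sources."""
    by_sequence = defaultdict(list)
    for accession, sequence, family in recs:
        by_sequence[sequence].append((accession, family))
    return dict(by_sequence)


def level2(level1_result):
    """Level 2: for each distinct sequence, group its sources by patent family."""
    grouped = {}
    for sequence, sources in level1_result.items():
        families = defaultdict(list)
        for accession, family in sources:
            families[family].append(accession)
        grouped[sequence] = dict(families)
    return grouped


nr2 = level2(level1(records))
for sequence, families in nr2.items():
    print(sequence, families)
```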

Page 10: EBI services

Sequence submissions

Generate sequence

Submit to journal

Submit to ENA

Submission guides at www.ebi.ac.uk

Not accepted

Step 2: Submit to journal

Step 1: Submit claim to EPO

Page 11: EBI services

Searching for sequences

simple EBI search...

Page 12: EBI services

EBI-Search by patent number

www.ebi.ac.uk

Follow link to NEW EBI Search

Page 13: EBI services

Link to NEW EBI Search

EBI-Search by patent number

Page 14: EBI services

EBI-Search by patent number

Link to NEW EBI Search
• Getting started
• How it works
• Gene & protein summaries

NEW EBI Search training video

Page 15: EBI services

EBI-Search by patent number

Link to NEW EBI Search

Search for patent WO0146262
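A patent number such as WO0146262 combines a two-letter office code with a serial number. As a small illustration, the parts can be separated before searching. The helper name `split_patent_number` is hypothetical, and the two-letters-plus-digits pattern is an assumption that fits the examples in this talk rather than a full patent-number grammar.

```python
import re


def split_patent_number(number: str):
    """Split e.g. 'WO0146262' into ('WO', '0146262').

    Assumes two letters followed by digits only; anything else is rejected.
    """
    match = re.fullmatch(r"([A-Z]{2})(\d+)", number.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised patent number: {number!r}")
    return match.group(1), match.group(2)


print(split_patent_number("WO0146262"))  # ('WO', '0146262')
```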

Page 16: EBI services

Link to NEW EBI Search

EBI-Search by patent number

Search for WO0146262

Page 17: EBI services

EBI-Search by patent number

Link to NEW EBI Search Search for WO0146262

Literature for WO0146262

Sequence data for WO0146262

Page 18: EBI services

EBI-Search by patent number

Link to NEW EBI Search Search for WO0146262

Link to full patent paper

Page 19: EBI services

EBI-Search by patent number

Link to NEW EBI Search Search for WO0146262

WO0146262 literature and sequence databases

Page 20: EBI services

Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases

WO0146262 in CiteXplore

EBI-Search by patent number

Page 21: EBI services

EBI-Search by patent number

Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases

WO0146262 in CiteXplore

Page 22: EBI services

EBI-Search by patent number

Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases

WO0146262 in CiteXplore

WO0146262 in Esp@cenet

Page 23: EBI services

EBI-Search by patent number

Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases

WO0146262 in CiteXplore

WO0146262 in Esp@cenet

Page 24: EBI services

EBI-Search by patent number

WO0146262 in Esp@cenet

Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases

WO0146262 in CiteXplore

WO0146262 in Patent Lens

Page 25: EBI services

EBI-Search by patent number

WO0146262 in Esp@cenet

Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases

WO0146262 in CiteXplore

WO0146262 in Patent Lens

Page 26: EBI services

EBI-Search by patent number

WO0146262 in Esp@cenet

Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases

WO0146262 in CiteXplore

WO0146262 in Patent Lens

Lists nucleotide sequences from

WO0146262

Additional annotation

Page 27: EBI services

EBI-Search by patent number

WO0146262 in Esp@cenet

Link to NEW EBI Search Search for WO0146262 WO0146262 literature and sequence databases

WO0146262 in CiteXplore

WO0146262 in Patent Lens

WO0146262 nucleotide

sequence record in ENA

Page 28: EBI services

Patent sequence record in ENA

www.ebi.ac.uk

Graphical viewer

Sequence

Patent reference

Navigate to related data, e.g. Version archive

Navigate to external data sources, e.g. UniProt

Download data

DNA source

Dates (first public and last updated)

Sequence version

Page 29: EBI services

WO0146262 in Esp@cenet

Link to NEW EBI Search Search for WO0146262

WO0146262 in CiteXplore

WO0146262 in Patent Lens

WO0146262 literature and sequence databases

ENA sequence record

EBI-Search by patent number

Page 30: EBI services

EBI-Search by gene name

Link to NEW EBI Search

Search for src gene

Page 31: EBI services

Link to NEW EBI Search

EBI-Search by gene name

Search for src

Page 32: EBI services

EBI-Search by gene name

Link to NEW EBI Search Search for src

Genome information

Gene & protein summaries

Page 33: EBI services

EBI-Search by gene name

Link to NEW EBI Search Search for src

Let’s select src in humans

Page 34: EBI services

EBI-Search by gene name

Link to NEW EBI Search

src gene & protein summary

Search for src

Page 35: EBI services

EBI-Search by gene name

Link to NEW EBI Search Search for src src gene & protein summary

Species selector

Page 36: EBI services

EBI-Search by gene name

Link to NEW EBI Search Search for src src gene & protein summary

Gene tab

Gene structure (forward & reverse strand)

• Gene sequence
• Location
• Sequence variations
• Orthologs

Data source (Ensembl)

Page 37: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Page 38: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

Expression tab

Expression studies

Data source (Expression Atlas)

EBI-Search by gene name

Page 39: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

See expression in cell type

EBI-Search by gene name

Gene Atlas

Page 40: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Gene & protein summary: expression tab

Page 41: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Gene & protein summary: expression tab

Protein tab
• Function
• Isoforms
• Sequence
• Classification
• Interactions

Data sources (UniProt, InterPro, IntAct)

Page 42: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Gene & protein summary: protein tab

Gene & protein summary: expression tab

Page 43: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Gene & protein summary: protein tab

Gene & protein summary: expression tab

Structure tab

Citation

Data source (PDBe)

Structural domains

47 additional structures

Page 44: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Gene & protein summary: protein tab

Gene & protein summary: expression tab

Gene & protein summary: structure tab

Page 45: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Gene & protein summary: protein tab

Gene & protein summary: expression tab

Gene & protein summary: structure tab

Literature tab

Search results taken from:
• PubMed
• PubMedUK
• Agricola
• EPO

Divided into categories

Description of categories

Page 46: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Gene & protein summary: protein tab

Gene & protein summary: expression tab

Gene & protein summary: structure tab

Literature tab

Patents

Curator-selected articles

Page 47: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Gene & protein summary: protein tab

Gene & protein summary: expression tab

Gene & protein summary: structure tab

Gene & protein summary: literature tab

Page 48: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Gene & protein summary: protein tab

Gene & protein summary: expression tab

Gene & protein summary: structure tab

Reporting view: print full summary page

Page 49: EBI services

Link to NEW EBI Search Search for src src gene & protein summary

Gene & protein summary: gene tab

EBI-Search by gene name

Gene & protein summary: protein tab

Gene & protein summary: expression tab

Gene & protein summary: structure tab

Gene & protein summary: literature tab

Print report

Page 50: EBI services

Searching for sequences

advanced SRS text search...

Page 51: EBI services

SRS – for more search options

www.ebi.ac.uk/srs

1st: Select resources to search

2nd: Create query

Page 52: EBI services

SRS – for more search options

Select library tab

Page 53: EBI services

SRS – for more search options

Select library tab

Patent literature

Patent DNA

Patent proteins

Search >100 databases

Page 54: EBI services

SRS – for more search options

Select library tab

Here, selected NR-level 2 DNA

database

Page 55: EBI services

SRS – for more search options

Select library tab

Select resources to search

Page 56: EBI services

SRS – for more search options

Select library tab Select resources to search

1) Select field

2) Type in text

Page 57: EBI services

SRS – for more search options

Select library tab Select resources to search

Here, selected patent number

Page 58: EBI services

SRS – for more search options

Select library tab Select resources to search

Create query

Page 59: EBI services

SRS – for more search options

Select library tab Select resources to search Create query

Lists non-redundant nucleotide sequences

from WO0146262

Page 60: EBI services

SRS – for more search options

Select library tab Select resources to search Create query

WO0146262 sequences

Page 61: EBI services

SRS – for more search options

Select library tab Select resources to search Create query

WO0146262 sequences

WO0146262 nucleotide

sequence record in NRNL2

Page 62: EBI services

Patent sequence record in NRNL2

Patent equivalents

Sequence record in ENA

Sequence

Patent literature

Priority number and date

Translation

Page 63: EBI services

SRS – for more search options

Select library tab Select resources to search Create query

WO0146262 sequences

NRNL2 sequence record

Page 64: EBI services

SRS – for more search options

Select library tab Select resources to search Create query

WO0146262 sequences

NRNL2 sequence record

WO0146262 literature

www.ebi.ac.uk/srs

Page 65: EBI services

Searching for sequences

sequence search...

Page 66: EBI services

Sequence searching – specialised tools

Navigate to ‘Sequence Similarity & Analysis’

www.ebi.ac.uk

Page 67: EBI services

Sequence searching – specialised tools

Navigate to search tools

Page 68: EBI services

Sequence searching – specialised tools

Navigate to search tools

www.ebi.ac.uk/Tools/sss

BLAST

FASTA

PSI search

Choose search tool

Page 69: EBI services

When to use which search?

[Figure: FASTA, WU-BLAST, NCBI BLAST and PSI-SEARCH compared; axis labels: query length, time to search, database size]

Page 70: EBI services

When to use which search?

Choose the appropriate search engine for the job

BLAST – initial fast search

FASTA – better general search engine

PSI-BLAST – find remote family members

GLSEARCH – match peptide/domain to protein

GGSEARCH – full length matches

FASTM – match several peptides to protein

(one search engine won’t do everything)
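The list above reads naturally as a lookup from task to tool. The sketch below encodes only the pairings on this slide. The task labels and the `choose_tool` helper are made up for illustration and are not part of any EBI interface.

```python
# Task -> tool pairings copied from the slide; the task wording is hypothetical.
TOOL_FOR_TASK = {
    "initial fast search": "BLAST",
    "general search": "FASTA",
    "remote family members": "PSI-BLAST",
    "peptide or domain against protein": "GLSEARCH",
    "full length matches": "GGSEARCH",
    "several peptides against protein": "FASTM",
}


def choose_tool(task: str):
    """Return the tool suggested for `task`, or None if the task is not listed."""
    return TOOL_FOR_TASK.get(task.strip().lower())


print(choose_tool("Remote family members"))  # PSI-BLAST
```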

Page 71: EBI services

Sequence searching – specialised tools

Navigate to search tools

www.ebi.ac.uk/Tools/sss

Here, try FASTA protein

Page 72: EBI services

Sequence searching – specialised tools

Navigate to search tools

Select search tool

Page 73: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool

For patent proteins:

Search individual patent offices or non-redundant patent datasets

Step 1: Select database

Page 74: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool

Here, selected UniProt Knowledgebase + NR patent proteins L2

Step 1: Select database

Page 75: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool

(1) Select database

Page 76: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

Step 2: Copy/paste sequence or upload file

Copy/pasted patent protein A00210 from patent EP0242329

Page 77: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

Page 78: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

Step 3: Set

parameters

Can change

search engine

Page 79: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

Step 3: Set

parameters

Can change

search

parameters

Page 80: EBI services

How to optimise parameters?

User manual

provides help

Page 81: EBI services

How to optimise parameters?

Choice of matrix depends on:

1. strictness of search

2. length of query sequence

QUERY LENGTH  MATRIX    open  ext
>300          BLOSUM50  -10   -2
85-300        BLOSUM62   -7   -1
50-85         BLOSUM80  -16   -4
>300          PAM250    -10   -2
85-300        PAM120    -16   -4
35-85         MDM40     -12   -2
<=35          MDM20     -22   -4
<=10          MDM10     -23   -4
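The table above can be turned into a small lookup. This is a sketch under stated assumptions: the `suggest` function and the "BLOSUM"/"PAM" series labels are hypothetical, and where the printed ranges share an endpoint (85, 35, 10) the endpoint has been assigned to one row arbitrarily.

```python
from typing import NamedTuple, Optional


class Suggestion(NamedTuple):
    matrix: str
    gap_open: int
    gap_extend: int


# (series, shortest query, longest query or None for no limit, suggestion).
# Matrix names and penalties are copied from the table; the exact integer
# boundaries are an assumption made so that the rows do not overlap.
TABLE = [
    ("BLOSUM", 301, None, Suggestion("BLOSUM50", -10, -2)),
    ("BLOSUM", 85, 300, Suggestion("BLOSUM62", -7, -1)),
    ("BLOSUM", 50, 84, Suggestion("BLOSUM80", -16, -4)),
    ("PAM", 301, None, Suggestion("PAM250", -10, -2)),
    ("PAM", 85, 300, Suggestion("PAM120", -16, -4)),
    ("PAM", 36, 84, Suggestion("MDM40", -12, -2)),
    ("PAM", 11, 35, Suggestion("MDM20", -22, -4)),
    ("PAM", 1, 10, Suggestion("MDM10", -23, -4)),
]


def suggest(query_length: int, series: str = "BLOSUM") -> Optional[Suggestion]:
    """Return the table row matching the query length, or None if no row applies."""
    for row_series, low, high, suggestion in TABLE:
        if row_series != series:
            continue
        if query_length >= low and (high is None or query_length <= high):
            return suggestion
    return None


print(suggest(120))        # BLOSUM62 row
print(suggest(20, "PAM"))  # MDM20 row
```

Note that the table has no BLOSUM row for queries shorter than 50 residues, so `suggest(20)` returns `None` rather than guessing.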

Page 82: EBI services

How to optimise parameters?

Choice of gap penalties depends on:

1. strictness of search

2. the scoring matrix in use (penalties should match it)

QUERY LENGTH  MATRIX    open  ext
>300          BLOSUM50  -10   -2
85-300        BLOSUM62   -7   -1
50-85         BLOSUM80  -16   -4
>300          PAM250    -10   -2
85-300        PAM120    -16   -4
35-85         MDM40     -12   -2
<=35          MDM20     -22   -4
<=10          MDM10     -23   -4

• larger penalty = fewer gaps

Page 83: EBI services

How to optimise parameters?

Do I mask my sequence?

Low complexity regions should be masked to avoid spurious results:

• CA repeats

• poly-A tails

• proline-rich regions

**Be careful you don’t mask what you are looking for
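To make the idea of masking concrete, here is a toy filter for two of the cases listed above (poly-A-style runs and CA-style dinucleotide repeats). It is not the filter used by the EBI tools, and the function name, thresholds and mask character are all assumptions chosen for the example.

```python
import re


def mask_low_complexity(seq: str, min_run: int = 10, min_repeats: int = 5,
                        mask_char: str = "N") -> str:
    """Replace simple repeats in `seq` with `mask_char`.

    Masks (1) runs of one residue at least `min_run` long and
    (2) a two-residue unit repeated at least `min_repeats` times.
    """
    def blank(match):
        return mask_char * len(match.group(0))

    # Homopolymer runs, e.g. a poly-A tail.
    seq = re.sub(r"(.)\1{%d,}" % (min_run - 1), blank, seq)
    # Dinucleotide repeats, e.g. CACACACACA.
    seq = re.sub(r"(..)\1{%d,}" % (min_repeats - 1), blank, seq)
    return seq


print(mask_low_complexity("ACGT" + "A" * 12 + "CGT"))
print(mask_low_complexity("GG" + "CA" * 6 + "TT"))
```

The warning on the slide applies here too: if the query itself is a repeat, a filter like this would blank out exactly the region of interest.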

Page 84: EBI services

How to optimise parameters?

What do I use for short sequences?

• use strict matrices
• use high gap penalties
• avoid masking
• allow high e-values

Page 85: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

Step 3: Set

parameters

Here, use default

parameters

Page 86: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

Page 87: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

Step 4: Submit

Can select to have results emailed

Page 88: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

Page 89: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

Results include patent proteins (from NRPL2)...

...and non-patent proteins (from UniProtKB)

View additional annotation (non-patent proteins)

Page 90: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

Related EMBL nucleotide entries

Page 91: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

Related genomic information

Page 92: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

Gene ontology (GO) mapping for protein

Page 93: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

InterPro family/domain classification

Page 94: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

Literature

Page 95: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

Functional predictions on ALL proteins

Page 96: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

Result summary + annotation

Page 97: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

Result summary + annotation

Visual comparison: find mis- or partial matches

Prioritize results

Functional predictions: InterPro family/domain classifications

Extract information

Page 98: EBI services

Sequence searching – specialised tools

Navigate to search tools Select search tool (1) Select database

(2) Copy/paste sequence

(3) Set parameters

(4) Submit

Result summary + annotation

Functional predictions

Page 99: EBI services

Accessing old entries

sequence archives...

Page 100: EBI services

Sequence archives

www.ebi.ac.uk

• ENA nucleotide sequence version archive (SVA): www.ebi.ac.uk/embl/sva

• UniSave – UniProt sequence/annotation version archive: www.ebi.ac.uk/uniprot/unisave

Search by date: get the specific record

Search by accession only: get all records
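The two lookup modes above (accession only, or accession plus date) can be sketched against an in-memory version history. The history below is invented, the function names are hypothetical, and nothing here calls SVA or UniSave. It only illustrates the difference between the two kinds of query.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical version history for one accession: (date first public, version).
# Must be sorted by date.
HISTORY = [
    (date(2001, 7, 1), 1),
    (date(2003, 2, 14), 2),
    (date(2008, 11, 30), 3),
]


def all_versions(history):
    """Accession-only query: every version on record."""
    return [version for _, version in history]


def version_on(history, when: date):
    """Date query: the single version that was current on `when`, if any."""
    dates = [d for d, _ in history]
    index = bisect_right(dates, when)
    return history[index - 1][1] if index else None


print(all_versions(HISTORY))                  # [1, 2, 3]
print(version_on(HISTORY, date(2005, 1, 1)))  # 2
```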

Page 101: EBI services

Sequence archives

View old entries

Compare different versions

Provides complete version list

Page 102: EBI services

Sequence archives

View old entries

Page 103: EBI services

Sequence archives

Compare different versions

Page 104: EBI services

Chemoinformatics

ChEBI & ChEMBL...

Page 105: EBI services

Chemoinformatics databases at EBI

ChEBI
• Chemical Entities of Biological Interest
• ‘Small’ chemical entities (no protein/nucleic acids)
• Illustrated dictionary of chemical nomenclature
• http://www.ebi.ac.uk/chebi/

ChEMBL
• Database of bioactive drug-like small molecules
• ‘Small’ molecules and peptides
• Illustrated dictionary of chemical nomenclature
• http://www.ebi.ac.uk/chembl/

Page 106: EBI services

ChEBI data overview

Visualisation

Nomenclature: caffeine; 1,3,7-trimethylxanthine; methyltheobromine

Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19

Ontology: metabolite; CNS stimulant; trimethylxanthines

Database Xrefs: MSDchem: CFF; KEGG DRUG: D00528

Chemical Informatics:

InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3

SMILES CN1C(=O)N(C)c2ncn(C)c2C1=O
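As a quick consistency check on the chemical data shown for caffeine, the mass can be recomputed from the formula. This sketch uses standard average atomic masses rounded to three decimals and a deliberately simple parser (no brackets, charges or isotopes). The function name is hypothetical and ChEBI's own calculation may differ in detail.

```python
import re

# Average atomic masses for the elements in the example only.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molecular_mass(formula: str) -> float:
    """Sum atomic masses for a flat formula such as 'C8H10N4O2'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


print(round(molecular_mass("C8H10N4O2"), 2))  # 194.19
```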

Page 107: EBI services

ChEBI search for β-lactamase

Chemical Entities of Biological Interest (ChEBI)

Page 108: EBI services

ChEBI search for β-lactamase

Compounds interacting with BLA2_KLEPN

Page 109: EBI services

ChEBI search for β-lactamase

Patent abstracts

ChEMBL db
• bioactivity details

Page 110: EBI services

Summary

Broad patent sequence coverage
• Protein/nucleotides: EPO, USPTO, JPO, KIPO

Comprehensive sequence databases
• ENA & UniParc (PAT / PRT class data)
• Non-redundant patent sequences: enriched

Sequence archives
• ENA SVA & UniSave: track changes

Multiple search engines
• EB-eye text search: fetch patent literature and sequences
• SRS: advanced text searching, >100 databases (including patents)
• Sequence searching: specialised tools; annotation-enhanced

Page 111: EBI services

User support

2Can bioinformatics user support –  www.ebi.ac.uk/2Can

Online help pages – www.ebi.ac.uk/help

E-mail support – www.ebi.ac.uk/support

Page 112: EBI services


Any questions?

Contacts:

www.ebi.ac.uk/support