The SLING project is funded by the European Commission within Research Infrastructures of the FP7 Capacities Specific Programme, grant agreement number 226073 (Integrating Activity)
EBI services
Jennifer McDowall
EMBL-EBI
Overview
• Introduction
• EBI Databases
• Searching for sequences
– NEW: simple EBI search
– Advanced SRS text search
– Sequence search tools
• Accessing old entries
– Sequence archives
• Chemoinformatics
Website: http://www.ebi.ac.uk/
Thematic index
EBI Search: search all databases and literature in one go
Databases
• Patent resources
• Sequences
• Genomes
• Chemistry
• Structures
• Gene expression
• Reactions & pathways
• Literature
Tools
• Sequence searching
• Sequence analysis
• Structural analysis
• Functional analysis
Training
• eLearning
• Workshops
• 2Can education resource
Industry programme
• Industry support
• SME support
patent-related resources...
EBI databases
Sequence data from patent literature
October 2010: patent nucleotides > 17.5m sequences; patent proteins > 4.9m sequences
INSDC databases: GenBank, ENA, DDBJ
Patent offices supplying data: EPO, USPTO, JPO + KIPO
EPO policy: data publicly released 18 months after the patent application date (whether the patent is granted or not)
INSDC agreement:
• Free unrestricted access
• Permanently accessible
• All data exchanged daily
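The 18-month release rule is simple date arithmetic. The sketch below, which assumes the months are counted from the application date exactly as the slide states (it does not model priority dates or other legal reference dates), shifts a date by whole calendar months and clamps the day when the target month is shorter.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def expected_release_date(application_date: date, delay_months: int = 18) -> date:
    """Date the sequence data would become public under the 18-month rule on the slide.

    Assumption: months are counted from the application date as stated; any other
    legal reference date is not modelled here.
    """
    return add_months(application_date, delay_months)


print(expected_release_date(date(2009, 1, 31)))  # 2010-07-31
```

Clamping matters for month-end dates: an application dated 31 August 2008 maps to 28 February 2010 in this sketch.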
Patent resources at EBI
www.ebi.ac.uk/patentdata
Patent sequence records at EBI
ENA (formerly EMBL-Bank)
• >124 million sequences
• patent + non-patent nucleotides
• redundant
UniParc (division of UniProt)
• >24 million sequences
• patent + non-patent proteins
• non-redundant
Use: non-patent sequence prior art searches
NR patent sequences
• patent proteins and nucleotides
• non-redundant
• additional patent annotation
Use: patent sequence prior art searches
Non-redundant patent databases (www.ebi.ac.uk)
Starting point: ENA (redundant)
Level-1 NR: remove sequence redundancy
Level-2 NR: group by patent families, with additional annotation, including priority dates for patent families
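The two NR levels can be pictured as two grouping passes. This is a minimal sketch, assuming that Level-1 merges records with identical sequences and Level-2 then groups the merged records by a patent-family identifier; the `PatentSequence` type, the `family_id` field and all data below are hypothetical, and EBI's actual pipeline is not reproduced.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PatentSequence:
    accession: str
    patent_number: str
    family_id: str  # hypothetical patent-family identifier
    sequence: str


def level1_nr(records: List[PatentSequence]) -> Dict[str, List[PatentSequence]]:
    """Level-1 (as sketched here): collapse records whose sequences are identical."""
    by_sequence: Dict[str, List[PatentSequence]] = defaultdict(list)
    for rec in records:
        by_sequence[rec.sequence.upper()].append(rec)
    return dict(by_sequence)


def level2_nr(level1: Dict[str, List[PatentSequence]]) -> Dict[str, Dict[str, List[str]]]:
    """Level-2 (as sketched here): within each unique sequence, group accessions by patent family."""
    result: Dict[str, Dict[str, List[str]]] = {}
    for seq, recs in level1.items():
        families: Dict[str, List[str]] = defaultdict(list)
        for rec in recs:
            families[rec.family_id].append(rec.accession)
        result[seq] = dict(families)
    return result


# Made-up records: two families share one sequence.
records = [
    PatentSequence("A1", "P-100", "F1", "ACGTAC"),
    PatentSequence("A2", "P-200", "F1", "acgtac"),
    PatentSequence("A3", "P-300", "F2", "ACGTAC"),
    PatentSequence("A4", "P-100", "F1", "GGCCTT"),
]
print(level2_nr(level1_nr(records)))
# {'ACGTAC': {'F1': ['A1', 'A2'], 'F2': ['A3']}, 'GGCCTT': {'F1': ['A4']}}
```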
Sequence submissions
Generate sequence
Step 1: Submit claim to EPO
Step 2: Submit to journal and submit to ENA (submission guides at www.ebi.ac.uk)
Not accepted: submitting to a journal before Step 1
Searching for sequences
simple EBI search...
EBI Search by patent number
www.ebi.ac.uk
Follow the link to the NEW EBI Search. The page also links to:
• Getting started
• How it works
• Gene & protein summaries
• Training video
Search for patent WO0146262. The results show:
• Literature for WO0146262, with a link to the full patent paper
• Sequence data for WO0146262
WO0146262 in the literature and sequence databases:
• WO0146262 in CiteXplore
• WO0146262 in Esp@cenet
• WO0146262 in Patent Lens
• Nucleotide sequences from WO0146262, with additional annotation
From the sequence list, open the WO0146262 nucleotide sequence record in ENA.
Patent sequence record in ENA (www.ebi.ac.uk)
• Graphical viewer
• Sequence
• Patent reference
• Navigate to related data, e.g. version archive
• Navigate to external data sources, e.g. UniProt
• Download data
• DNA source
• Dates (first public and last updated)
• Sequence version
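Patent numbers such as WO0146262 and EP0242329 combine an office code with a serial number. The hypothetical helper below splits them before a search. It assumes that a seven-digit WO number is in the older "two-digit year + five-digit serial" form (WO 01/46262); EBI Search itself simply accepts the number as typed.

```python
import re
from typing import NamedTuple, Optional


class PatentNumber(NamedTuple):
    office: str           # two-letter office code, e.g. "WO" or "EP"
    digits: str           # remaining digits, as written
    year: Optional[int]   # only inferred for old-style 7-digit WO numbers


_PATENT_RE = re.compile(r"([A-Z]{2})\s*(\d+)")


def parse_patent_number(text: str) -> PatentNumber:
    """Split a patent number such as 'WO0146262' into its parts (hypothetical helper)."""
    match = _PATENT_RE.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised patent number: {text!r}")
    office, digits = match.groups()
    year = None
    if office == "WO" and len(digits) == 7:
        # Assumption: two-digit publication year + five-digit serial (WO 01/46262).
        yy = int(digits[:2])
        year = 2000 + yy if yy < 50 else 1900 + yy
    return PatentNumber(office, digits, year)


print(parse_patent_number("WO0146262"))
# PatentNumber(office='WO', digits='0146262', year=2001)
```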
EBI Search by gene name
Follow the link to the NEW EBI Search and search for the src gene. The results include:
• Genome information
• Gene & protein summaries
Let's select src in humans to open the src gene & protein summary, which has a species selector.
Gene & protein summary: gene tab (data source: Ensembl)
• Gene structure (forward & reverse strand)
• Gene sequence
• Location
• Sequence variations
• Orthologs
Gene & protein summary: expression tab (data source: Expression Atlas)
• Expression studies
• See expression in cell type (Gene Atlas)
Gene & protein summary: protein tab (data sources: UniProt, InterPro, IntAct)
• Function
• Isoforms
• Sequence
• Classification
• Interactions
Gene & protein summary: structure tab (data source: PDBe)
• Citation
• Structural domains
• 47 additional structures
Gene & protein summary: literature tab
Search results taken from:
• PubMed
• PubMedUK
• Agricola
• EPO
Results are divided into categories, each with a description; categories include patents and curator-selected articles.
Reporting view: print the full summary page as a report.
Searching for sequences
advanced SRS text search...
SRS – for more search options
www.ebi.ac.uk/srs
1st: Select resources to search
2nd: Create query
Select library tab:
• Patent literature
• Patent DNA
• Patent proteins
SRS can search >100 databases. Here, the NR level-2 DNA database is selected.
Create query: 1) select a field, 2) type in text. Here, the patent number field is selected and WO0146262 entered.
The results list the non-redundant nucleotide sequences from WO0146262.
WO0146262 nucleotide sequence record in NRNL2:
• Patent equivalents
• Sequence record in ENA
• Sequence
• Patent literature
• Priority number and date
• Translation
From the record, follow the links to the WO0146262 literature.
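The "select field, then type text" pattern in SRS amounts to a field-restricted match over the chosen library. The sketch below illustrates only that idea on made-up in-memory records; the wildcard rules are Python's `fnmatch` (an assumption, not SRS query syntax), and SRS's own query language is not reproduced.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Record = Dict[str, str]


def field_query(records: List[Record], field: str, pattern: str) -> List[Record]:
    """Return records whose chosen field matches the pattern, case-insensitively.

    '*' and '?' wildcards follow Python's fnmatch rules (an assumption, not SRS syntax).
    """
    wanted = pattern.lower()
    return [r for r in records if fnmatchcase(r.get(field, "").lower(), wanted)]


# Made-up records standing in for entries of a selected library.
library = [
    {"id": "SEQ1", "patent_number": "WO0146262"},
    {"id": "SEQ2", "patent_number": "WO0146262"},
    {"id": "SEQ3", "patent_number": "EP0242329"},
]
print([r["id"] for r in field_query(library, "patent_number", "WO0146262")])  # ['SEQ1', 'SEQ2']
```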
Searching for sequences
sequence search...
Sequence searching – specialised tools
Navigate to 'Sequence Similarity & Analysis' (www.ebi.ac.uk), then to the search tools at www.ebi.ac.uk/Tools/sss, and choose a search tool:
• BLAST
• FASTA
• PSI search
When to use which search?
[Figure: FASTA, WU-BLAST, NCBI BLAST and PSI-SEARCH compared by query length, database size and time to search]
When to use which search?
Choose the appropriate search engine for the job (one search engine won't do everything):
• BLAST – initial fast search
• FASTA – better general search engine
• PSI-BLAST – find remote family members
• GLSEARCH – match peptide/domain to protein
• GGSEARCH – full-length matches
• FASTM – match several peptides to protein
Sequence searching – specialised tools
Here, try FASTA protein: select the search tool at www.ebi.ac.uk/Tools/sss.
Step 1: Select database
For patent proteins, search individual patent offices or the non-redundant patent datasets. Here, UniProt Knowledgebase + NR patent proteins L2 is selected.
Step 2: Copy/paste sequence or upload file
Here, patent protein A00210 from patent EP0242329 is copied and pasted.
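Pasted sequences often arrive with a FASTA header, line breaks, spaces or position numbers. As a sketch of how such input can be normalised before submission (the tools do their own parsing, so this is only an illustration), the function below turns pasted text into (header, sequence) pairs and treats header-less text as one sequence; the example sequence is made up.

```python
from typing import List, Tuple


def parse_fasta(text: str, default_name: str = "query") -> List[Tuple[str, str]]:
    """Turn pasted sequence text into (header, sequence) pairs.

    Text without a '>' header line is treated as one unnamed sequence; spaces,
    digits and other non-letter characters (except '*' and '-') are dropped.
    """
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None or chunks:
                records.append((header or default_name, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append("".join(ch for ch in line if ch.isalpha() or ch in "*-").upper())
    if header is not None or chunks:
        records.append((header or default_name, "".join(chunks)))
    return records


print(parse_fasta(">example protein\nMKT AYI 12\nakq\n"))  # [('example protein', 'MKTAYIAKQ')]
```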
Step 3: Set parameters
You can change the search engine and the search parameters.
How to optimise parameters?
The user manual provides help.
Choice of matrix depends on:
1. strictness of search
2. length of query sequence
Choice of gap penalties depends on:
1. strictness of search
2. matching the scoring matrix
• larger penalty → fewer gaps

Query length   Matrix     Gap open   Gap extend
>300           BLOSUM50   -10        -2
85-300         BLOSUM62   -7         -1
50-85          BLOSUM80   -16        -4
>300           PAM250     -10        -2
85-300         PAM120     -16        -4
35-85          MDM40      -12        -2
<=35           MDM20      -22        -4
<=10           MDM10      -23        -4
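The matrix table works as a lookup keyed on query length. This sketch encodes its rows; where two ranges share a boundary it assumes 85 belongs to 85-300 and 35 to <=35, and lengths of 10 or less use the <=10 row. The function and type names are illustrative, not part of any EBI tool.

```python
from typing import List, NamedTuple, Optional, Tuple


class ScoringChoice(NamedTuple):
    matrix: str
    gap_open: int
    gap_extend: int


# (minimum query length, choice), checked from the longest range down.
# Assumption: 85 is treated as 85-300, 35 as <=35, and lengths <=10 use MDM10.
BLOSUM_ROWS: List[Tuple[int, ScoringChoice]] = [
    (301, ScoringChoice("BLOSUM50", -10, -2)),
    (85, ScoringChoice("BLOSUM62", -7, -1)),
    (50, ScoringChoice("BLOSUM80", -16, -4)),
]
PAM_MDM_ROWS: List[Tuple[int, ScoringChoice]] = [
    (301, ScoringChoice("PAM250", -10, -2)),
    (85, ScoringChoice("PAM120", -16, -4)),
    (36, ScoringChoice("MDM40", -12, -2)),
    (11, ScoringChoice("MDM20", -22, -4)),
    (1, ScoringChoice("MDM10", -23, -4)),
]


def suggest_scoring(query_length: int, family: str = "BLOSUM") -> Optional[ScoringChoice]:
    """Look up matrix and gap penalties for a protein query length.

    Returns None when the table has no row for that length in the chosen family
    (for example, BLOSUM queries shorter than 50 residues).
    """
    if family not in ("BLOSUM", "PAM"):
        raise ValueError("family must be 'BLOSUM' or 'PAM'")
    rows = BLOSUM_ROWS if family == "BLOSUM" else PAM_MDM_ROWS
    for min_length, choice in rows:
        if query_length >= min_length:
            return choice
    return None


print(suggest_scoring(120))        # ScoringChoice(matrix='BLOSUM62', gap_open=-7, gap_extend=-1)
print(suggest_scoring(30, "PAM"))  # ScoringChoice(matrix='MDM20', gap_open=-22, gap_extend=-4)
```

Returning None rather than silently falling back makes it explicit that the table offers no BLOSUM row for very short queries.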
How to optimise parameters?
Do I mask my sequence?
Low complexity regions should be masked to avoid spurious results, for example:
• CA repeats
• poly-A tails
• proline-rich regions
But be careful you don't mask what you are looking for.
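One way to see what "low complexity" means is to measure the residue diversity in a sliding window. The sketch below masks every window whose Shannon entropy falls below a threshold. It is a simplified stand-in for dedicated filters such as SEG or DUST (not what the EBI tools run), and the window size and threshold are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 1.5,
                        mask_char: str = "X") -> str:
    """Replace residues in low-entropy windows with mask_char.

    A simplified stand-in for dedicated filters such as SEG or DUST; window and
    threshold are illustrative values, not tuned settings.
    """
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            masked[start:start + window] = [mask_char] * window
    return "".join(masked)


dna = "ACGTTGCAAGCT" + "A" * 15   # made-up sequence ending in a poly-A tail
print(mask_low_complexity(dna, mask_char="N"))
```

Masked windows can extend a few residues past the repeat itself, which is one way a motif of interest could be hidden by accident.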
How to optimise parameters?
What do I use for short sequences?
• use strict matrices
• use high gap penalties
• avoid masking
• allow high e-values
Step 3: Set parameters – here, use the default parameters.
Step 4: Submit
You can choose to have the results emailed.
Results include patent proteins (from NRPL2) and non-patent proteins (from UniProtKB).
View additional annotation (non-patent proteins):
• Related EMBL nucleotide entries
• Related genomic information
• Gene Ontology (GO) mapping for protein
• InterPro family/domain classification
• Literature
Functional predictions are available on ALL proteins.
Result summary + annotation
• Visual comparison: find mis- or partial matches and prioritise results
• Functional predictions: InterPro family/domain classifications
• Extract information
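Prioritising results and spotting partial matches can also be done on exported hit data. This sketch assumes a hypothetical `Hit` record holding the e-value, percent identity and aligned length. It drops weak hits, sorts by e-value and then identity, and flags hits whose alignment covers less than a chosen fraction of the query as partial. The thresholds are illustrative, not tool defaults, and the hits are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    accession: str
    source: str        # e.g. "NRPL2" (patent) or "UniProtKB" (non-patent)
    e_value: float
    identity: float    # percent identity over the aligned region
    overlap: int       # aligned length in residues


def prioritise(hits: List[Hit], query_length: int, max_e_value: float = 1e-3,
               full_coverage: float = 0.9) -> Tuple[List[Hit], List[Hit]]:
    """Drop weak hits, then split the rest into full-length and partial matches.

    Each list is sorted by e-value (best first), ties broken by higher identity.
    The thresholds are illustrative assumptions, not tool defaults.
    """
    kept = sorted((h for h in hits if h.e_value <= max_e_value),
                  key=lambda h: (h.e_value, -h.identity))
    full = [h for h in kept if h.overlap / query_length >= full_coverage]
    partial = [h for h in kept if h.overlap / query_length < full_coverage]
    return full, partial


hits = [
    Hit("HIT1", "NRPL2", 1e-50, 98.0, 240),
    Hit("HIT2", "UniProtKB", 1e-20, 60.0, 120),
    Hit("HIT3", "UniProtKB", 0.5, 30.0, 50),
    Hit("HIT4", "UniProtKB", 1e-50, 99.5, 245),
]
full, partial = prioritise(hits, query_length=250)
print([h.accession for h in full], [h.accession for h in partial])  # ['HIT4', 'HIT1'] ['HIT2']
```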
Accessing old entries
sequence archives...
Sequence archives (www.ebi.ac.uk)
• ENA nucleotide sequence version archive (SVA): www.ebi.ac.uk/embl/sva
• UniSave – UniProt sequence/annotation version archive: www.ebi.ac.uk/uniprot/unisave
Search by date → get a specific record
Search by accession only → get all records
The sequence archives let you:
• View old entries
• Compare different versions
• See the complete version list
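The two archive search modes map onto two lookups: all versions for an accession, or the version that was current on a given date. The toy archive below sketches both with `bisect` over release dates; the class, field names and accession are hypothetical and do not reflect how SVA or UniSave store their data.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, NamedTuple, Optional


class Version(NamedTuple):
    number: int
    released: date
    entry_text: str


class VersionArchive:
    """Toy archive keyed by accession, loosely modelled on the two search modes."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}

    def add(self, accession: str, version: Version) -> None:
        versions = self._versions.setdefault(accession, [])
        versions.append(version)
        versions.sort(key=lambda v: v.released)

    def all_versions(self, accession: str) -> List[Version]:
        """Search by accession only: the complete version list."""
        return list(self._versions.get(accession, []))

    def version_on(self, accession: str, day: date) -> Optional[Version]:
        """Search by accession and date: the version that was current on that day."""
        versions = self._versions.get(accession, [])
        released = [v.released for v in versions]
        i = bisect_right(released, day)
        return versions[i - 1] if i else None


archive = VersionArchive()
archive.add("ACC1", Version(1, date(2001, 3, 1), "first release"))      # made-up data
archive.add("ACC1", Version(2, date(2005, 6, 15), "sequence corrected"))
print(archive.version_on("ACC1", date(2003, 1, 1)).number)  # 1
print(len(archive.all_versions("ACC1")))                    # 2
```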
Sequence archives: view old entries
Sequence archives: compare different versions
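Comparing two archived versions is essentially a line diff of the flat-file entries. As a sketch, Python's standard `difflib.unified_diff` can show which lines changed between versions. The two entries below are made-up fragments in a flat-file-like layout, not real archive records.

```python
import difflib
from typing import List


def compare_versions(old_entry: str, new_entry: str,
                     old_label: str = "version 1", new_label: str = "version 2") -> List[str]:
    """Line-by-line unified diff between two versions of a flat-file entry."""
    return list(difflib.unified_diff(old_entry.splitlines(), new_entry.splitlines(),
                                     fromfile=old_label, tofile=new_label, lineterm=""))


old = "ID   ACC1\nDE   protein kinase\nSQ   MKTAYIAKQ"
new = "ID   ACC1\nDE   tyrosine-protein kinase\nSQ   MKTAYIAKQ"
print("\n".join(compare_versions(old, new)))
```

Lines prefixed with "-" exist only in the older version and lines prefixed with "+" only in the newer one; identical entries produce an empty diff.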
Chemoinformatics
ChEBI & ChEMBL...
Chemoinformatics databases at EBI
ChEBI
• Chemical Entities of Biological Interest
• 'Small' chemical entities (no protein/nucleic acids)
• Illustrated dictionary of chemical nomenclature
• http://www.ebi.ac.uk/chebi/
ChEMBL
• Database of bioactive drug-like small molecules
• 'Small' molecules and peptides
• Illustrated dictionary of chemical nomenclature
• http://www.ebi.ac.uk/chembl/
ChEBI data overview (example: caffeine)
Visualisation
Nomenclature: caffeine; 1,3,7-trimethylxanthine; methyltheobromine
Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19
Ontology: metabolite; CNS stimulant; trimethylxanthines
Database xrefs: MSDchem: CFF; KEGG DRUG: D00528
Chemical informatics:
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
SMILES: CN1C(=O)N(C)c2ncn(C)c2C1=O
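The mass in the caffeine entry follows from its formula and standard atomic weights. The sketch below recomputes it for simple formulas (element symbols with optional counts only). The small weight table is an assumption that covers just a few common elements, and the code is not how ChEBI derives its values.

```python
import re

# Standard atomic weights (rounded) for the elements this sketch supports.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06}


def average_mass(formula: str) -> float:
    """Average molecular mass from a simple formula such as 'C8H10N4O2'.

    Handles element symbols with optional counts only: no brackets, hydrates or charges.
    """
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element not in ATOMIC_WEIGHTS:
            raise KeyError(f"no atomic weight for {element}")
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total


print(round(average_mass("C8H10N4O2"), 2))  # 194.19, matching the caffeine entry
```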
ChEBI search for β-lactamase
Chemical Entities of Biological Interest (ChEBI)
• Compounds interacting with BLA2_KLEPN
• Patent abstracts
• ChEMBL db: bioactivity details
Summary
Comprehensive sequence databases: ENA & UniParc (PAT / PRT class data)
Non-redundant patent sequences enriched with additional patent annotation
Broad patent sequence coverage: protein/nucleotides from EPO, USPTO, JPO, KIPO
Multiple search engines:
• EB-eye text search: fetch patent literature and sequences
• SRS: advanced text searching of >100 databases (including patents)
• Sequence searching: specialised tools; annotation-enhanced
Sequence archives: ENA SVA & UniSave track changes
User support
2Can bioinformatics user support – www.ebi.ac.uk/2Can
Online help pages – www.ebi.ac.uk/help
E-mail support – www.ebi.ac.uk/support
Any questions?
Contacts:
www.ebi.ac.uk/support