Protein Sequence Analysis - Overview
Raja Mazumder
Senior Protein Scientist, PIR
Assistant Professor, Department of Biochemistry and Molecular Biology
Georgetown University Medical Center
NIH Proteomics Workshop 2004
Overview
Proteomics and protein bioinformatics (protein sequence analysis)
Why do protein sequence analysis?
Searching sequence databases
Post-processing search results
Detecting remote homologs
Clinical Proteomics
From Petricoin et al., Nature Reviews Drug Discovery (2002) 1, 683-695
Single protein and shotgun analysis
Adapted from: McDonald et al. 2002. Disease Markers 18: 99-105
Figure: a mixture of proteins is analyzed either by single protein analysis (gel-based separation, then spot excision and digestion, giving peptides from a single protein) or by shotgun analysis (digestion of the protein mixture, then LC or LC/LC separation, giving peptides from many proteins). The peptides go to MS or MS/MS analysis, and the data feed into protein bioinformatics.
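The digestion step in the figure is commonly performed with trypsin, although the slide does not name the enzyme, so trypsin is an assumption here. A minimal Python sketch of an in silico tryptic digest, using the simplified rule "cleave after K or R unless the next residue is P" and allowing no missed cleavages:

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence into peptides with the simplified trypsin rule.

    Cleaves C-terminal to K or R, except when the following residue is P.
    Missed cleavages are not modeled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


print(tryptic_digest("MAKRPLEKGSR"))  # ['MAK', 'RPLEK', 'GSR']
```

Real digests also produce missed cleavages, which mass spectrometry search tools model explicitly; this sketch only illustrates the cleavage rule.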
Protein Bioinformatics: Protein sequence analysis
Helps characterize protein sequences in silico and allows prediction of protein structure and function
Statistically significant BLAST hits usually signify sequence homology
Homologous sequences may or may not have the same function, but with very few exceptions they have the same structural fold
Protein sequence analysis allows protein classification
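The statistical significance of a BLAST hit is usually reported as an E-value. Under Karlin-Altschul statistics, the expected number of chance hits with raw score at least S is E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths, and lambda and K depend on the scoring matrix and gap penalties. A minimal Python sketch follows; the lambda and K values used below are illustrative assumptions (close to values reported for BLOSUM62 with gap costs 11/1), and real BLAST also corrects m and n for edge effects, which this sketch omits.

```python
import math


def evalue(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits scoring >= raw_score: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda*S - ln K) / ln 2, so that E = m*n*2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Illustrative parameters only (assumed, roughly BLOSUM62 with gap costs 11/1).
    LAM, K = 0.267, 0.041
    for score in (40, 80, 120):
        print(score, evalue(score, query_len=300, db_len=50_000_000, lam=LAM, k=K))
```

E falls exponentially as the score rises and grows linearly with database size, which is why the same alignment score is less significant when searched against a larger database.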
Development of protein sequence databases
Atlas of Protein Sequence and Structure: Dayhoff (1966), the first sequence database (pre-bioinformatics). Now continued as the Protein Information Resource (PIR)
Protein Data Bank (PDB): structural database (1972), still the most widely used database of structures
UniProt: The United Protein Databases (UniProt, 2003) is a central database of protein sequence and function, created by joining the SWISS-PROT, TrEMBL and PIR protein database activities
Comparative protein sequence analysis and evolution
Patterns of conservation in sequences allow us to determine which residues are under selective constraint (i.e., are important for protein function)
Comparative analysis of proteins is more sensitive than comparing DNA
Homologous proteins have a common ancestor
Different proteins evolve at different rates
Protein classification systems based on evolution: PIRSF and COG
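One simple way to turn patterns of conservation into numbers is to score each column of a multiple alignment by the fraction of sequences that share its most common residue, with gaps counting against conservation. A minimal Python sketch of this identity-based score (entropy-based scores are a common alternative), applied to a toy three-sequence alignment:

```python
from collections import Counter


def column_conservation(alignment: list[str], gap: str = "-") -> list[float]:
    """Fraction of sequences carrying the most common non-gap residue in each column."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            scores.append(0.0)  # all-gap column: nothing conserved
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))  # gaps lower the score
    return scores


toy_alignment = ["abacd", "ab-cd", "xbace"]
print(column_conservation(toy_alignment))
```

Columns scoring 1.0 (here the b and c columns) are fully conserved; in a real protein alignment such columns are candidates for residues under selective constraint.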
PIRSF and large-scale functional annotation of proteins
PIRSF is a network classification system based on the evolutionary relationships of whole proteins and domains
As part of the UniProt project, PIR has developed this classification strategy to assist in the propagation and standardization of protein annotation
Comparing proteins
Amino acid sequence of a protein generated from a proteomics experiment
e.g. protein fragment DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKFCKFT
The amino acids of two sequences can be aligned, and we can easily count the number of identical residues to find the % identity (or use an index of similarity to find the % similarity).
Protein structures can be compared by superimposition
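Counting identical residues in an aligned pair can be sketched as below. Percent identity has several conventions for the denominator (alignment length, length of the shorter sequence, or number of aligned non-gap positions); this minimal Python sketch assumes the last one, and assumes the two sequences are already aligned with "-" as the gap character.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned (non-gap) positions at which the two sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # positions with a gap are not counted under this convention
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("ACDEF-GH", "ACDKFWGH"))  # 6 of 7 compared positions identical
```

A % similarity would replace the `x == y` test with a check against a substitution matrix or residue grouping, which is a separate choice not shown here.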
Protein sequence alignment
Pairwise alignment
  a b a c d
  a b _ c d
Multiple sequence alignment usually provides more information
  a b a c d
  a b _ c d
  x b a c e
Multiple alignment is difficult to do for distantly related proteins
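The pairwise example above is a global alignment. A minimal Needleman-Wunsch sketch in Python reproduces it; the scoring (match +1, mismatch -1, linear gap -1) is an illustrative assumption, whereas protein aligners normally use a substitution matrix such as BLOSUM62 with affine gap penalties. Gaps are written as "-" here rather than the slide's "_".

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # leading gaps in b
    for j in range(1, m + 1):
        score[0][j] = j * gap  # leading gaps in a

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("abacd", "abcd"))  # (3, 'abacd', 'ab-cd')
```

Aligning more than two sequences this way scales poorly, which is one reason multiple alignment tools use heuristics such as progressive alignment.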
Protein sequence analysis overview
Protein databases: PIR and UniProt
Searching databases: peptide search, BLAST search, text search
Information retrieval and analysis: protein records at UniProt and PIR, multiple sequence alignment, secondary structure prediction, homology modeling
Universal Protein Knowledgebase (UniProt)
PIR (Protein Information Resource) has recently joined forces with EBI (European Bioinformatics Institute) and SIB (Swiss Institute of Bioinformatics) to establish UniProt
Figure: components of UniProt (UniProt Archive, UniProt Knowledgebase, UniProt NREF) with data sources Swiss-Prot, TrEMBL, PIR-PSD, RefSeq, GenBank/EMBL/DDBJ, EnsEMBL, PDB, patent data and other data; labels include literature-based annotation, automated annotation, classification, and clustering at 100, 90 and 50%.
http://www.uniprot.org/
Peptide Search
Query Sequence
Unknown sequence is Q9I7I7
BLAST Q9I7I7 against the UniProt knowledgebase (http://www.pir.uniprot.org/search/blast.shtml)
Analyze results
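If the search is instead run locally with NCBI BLAST+ (for example blastp with -outfmt 6, an assumption not covered in this overview), each hit arrives as one tab-separated line with the 12 default columns qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. A minimal Python sketch of the "Analyze results" step reads those lines and keeps hits below an E-value cutoff; the 1e-5 cutoff and the two input rows are hypothetical placeholders, not real search results.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    query: str
    subject: str
    percent_identity: float
    alignment_length: int
    evalue: float
    bitscore: float


def parse_tabular_hits(lines, max_evalue: float = 1e-5) -> list[BlastHit]:
    """Parse BLAST+ -outfmt 6 lines (12 default columns); keep hits with E-value <= max_evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        fields = line.split("\t")
        hit = BlastHit(
            query=fields[0],
            subject=fields[1],
            percent_identity=float(fields[2]),
            alignment_length=int(fields[3]),
            evalue=float(fields[10]),
            bitscore=float(fields[11]),
        )
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h.evalue)  # most significant first


# Hypothetical placeholder rows, not real search results.
example_rows = [
    "query1\thitA\t45.2\t250\t130\t3\t1\t250\t10\t259\t1e-40\t150.0",
    "query1\thitB\t22.0\t80\t60\t2\t5\t84\t3\t82\t2.0\t30.1",
]
for hit in parse_tabular_hits(example_rows):
    print(hit.subject, hit.percent_identity, hit.evalue)
```

Sorting by E-value puts the most significant hits first, which is the order in which candidate homologs are usually inspected.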
BLAST results
Text Search
Text search results: display options
Moving PubMed ID and PDB ID into “Columns in Display”
Text search results: add input box
Text search results with NULL/NOT NULL
UniProt protein record
SIR2_HUMAN protein record
Are Q9I7I7 and SIR2_HUMAN close homologs?
Check BLAST results
Check pairwise alignment
Protein structure prediction
Programs can predict secondary structure with about 70% accuracy
Homology modeling: prediction of a ‘target’ structure from a closely related ‘template’ structure
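Secondary structure accuracy is typically quoted as Q3: the percentage of residues whose predicted state in a three-state alphabet (H helix, E strand, C coil) matches the state observed in the experimental structure. A minimal Python sketch, assuming both strings are already reduced to H/E/C and cover the same residues; the example strings are made up.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted H/E/C state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty state strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3_accuracy("CCHHHHCCEEEC", "CCHHHHHCEEEE"))
```

Here 10 of 12 states agree, so Q3 is about 83%. Q3 does not penalize predictions that get segment boundaries slightly wrong, so it is often reported alongside segment-based measures.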
Secondary structure prediction
http://bioinf.cs.ucl.ac.uk/psipred/
Secondary structure prediction results
Sir2 Homolog-NAD Complex
Homology modeling
http://www.expasy.org/swissmod/SWISS-MODEL.html
Homology model of Q9I7I7
Coloring: blue, excellent; green, so-so; red, not good
Secondary structure: yellow, beta sheet; red, alpha helix; grey, loop
Sequence features: SIR2_HUMAN
Multiple sequence alignment
Multiple sequence alignment
Q9I7I7, Q82QG9, SIR2_HUMAN
Sequence features: CRAA_RABIT
Identifying remote homologs
Structure-guided sequence alignment