B Cell Epitope Prediction
Immunological Bioinformatics #27685, 3-week course, June 2011
Center for Biological Sequence Analysis, Department of Systems Biology
Technical University of Denmark


10/06/2011 B Cell Epitope Prediction Leon Jessen ([email protected])

CBS, Department of Systems Biology

Outline

• What is an epitope?

•  How can we identify potential epitopes?

•  Note that in this presentation, epitope refers only to B-cell epitopes and is not to be confused with T-cell epitopes (Recall the MHCI/II pathways)


So what is an Epitope?

•  An epitope is the part of a protein (antigen) being recognised by soluble antibodies or the B-cell receptor (immobilised antibodies on B-cells)

•  Proteins play an absolute key role in pathogenicity: Invasion, adhesion, inhibition etc.

•  There is a constant arms race going on inside each and every one of you!

•  Pathogens seek to evade immune detection

•  The immune system seeks to detect pathogens

•  So who is winning?

•  Luckily in most cases we are!


Antibody-antigen interaction

https://www.pharmatching.com/blog/wp-content/uploads/2011/01/monoclonal_antibody.jpg

Heavy chains

Light chains

Constant region

Variable region

Antigen binding site

Epitope Paratope


Binding interactions

•  Salt bridges

•  Hydrogen bonds

•  Hydrophobic interactions

•  Van der Waals forces

Binding strength

The interaction is highly specific! One key, one lock principle!

(This is the reason for the high specificity of western blots)


Taking a closer look at the ‘button’

•  Basic principle

– An epitope is an exposed, accessible, non-self surface structure

•  Consists of

– Lipids, sugars, protein, DNA or complexes thereof

Basically anything that the BCR will recognise (bind to)!

In the following we solely focus on epitopes made up of amino acid residues


Types of epitopes

•  There are two basic types of B-cell epitopes

– Continuous (linear, made up of primary structure)

– Discontinuous (non-linear, made up of tertiary structure)

•  In nature

– ~10% linear

– ~90% discontinuous (but often with a linear determinant)


Case: PfEMP1 PAM VAR2CSA DBL5ε

•  VAR2CSA: Primary pathogenesis protein in pregnancy-associated malaria (PAM)

•  Highlighted motif predicted using a computational approach

•  Experimentally validated using a high-density peptide array

Adapted from: Gnidehou S, Jessen L, et al. 2010. PLoS ONE 5(10): e13105. doi:10.1371/journal.pone.0013105

Homology model of the DBL5ε domain of the Pregnancy Associated Malaria PfEMP1 protein VAR2CSA (3D7 variant)

275-TFKNI-279

-  Exposed

-  Accessible


High Density Peptide Array

High density peptide chip, here applied for VAR2CSA antigenicity analysis. Works for any protein.

•  24 fields divided by Teflon barriers

•  Each field is further subdivided into 5,000 sub-fields, on which peptides are synthesised directly

•  ~1,000,000 peptides/chip!

Briefly:

•  Addition of sera samples (immunised rats)

•  Signal quantified by fluorescence measurements


High Density Peptide Array

[Figure: 12 panels, "VAR2CSA Variant: 3D7, PepArray Conditions #1" to "#12". x-axis: Sequence Position (0 to 2500). y-axis: Z-Score / Normalised average S/N (0 to 20). Threshold lines: p = 0.05 and p = 0.05 w/ corr.]

Labelled peaks per condition:

•  #1: 378-SRYDDYVKDFFKKLEA-393, 997-RTMKRGYKN-1005

•  #2: 2263-GMDEFKNTFKNIKE-2276

•  #3: 420-NSSDANNPSEKI-431, 995-SARTMKRGYK-1004, 2266-E-2266, 2268-K-2268

•  #4: 994-GSARTMKRGYKNDNYELC-1011, 1038-FNLFEQW-1044

•  #5: 993-CGSARTMKRGYKNDNYELC-1011

•  #6: 862-NRK-864, 995-SARTMKRGYK-1004, 1286-KRYGGRSNIK-1295, 2268-K-2268, 2270-T-2270

•  #7: 994-GSARTMKRGYKNDNY-1008

•  #8: no labelled peaks

•  #9: 862-NRKAG-866, 2668-AG-2669

•  #10: 1597-GNDRTWSKKYIKKLE-1611

•  #11: 1579-YEYNNAEKKNNKS-1591

•  #12: 1573-CEQVKYYEYNNAEKK-1587


So why are B-cell epitopes so important?

•  In the case of PAM, being ready for the ‘attack’ is in many cases the difference between life and death! (Sadly, ~10,000 women and ~200,000 infants die from PAM each year in sub-Saharan Africa)

•  The primary response simply is not enough!

Adapted from http://www.mhhe.com/biosci/esp/2001_gbio/folder_structure/an/m10/s3/assets/images/anm10s3_9.jpg


A small note

• We now turn to the more ‘nerdy’ bioinformatics

•  But please remind yourself that the research you do every day does in fact carry over to the real world, where real people will benefit from your tedious work

• We are saving the world here, people! (or at least trying to, right?)


So how do we predict B-cell epitopes?

•  Approx. 10^10 combinations of the variable region (minus self-antigens!)

• Millions of B-cells with different B-cell receptors are made each day

•  All in all, we are asking a very difficult question!

•  To which residues in any given sequence will any of these 10^10 paratopes bind?

•  Number of possible combinations?

•  A LOT!


Recall the basic principle of epitopes

Exposed accessible surface structure!

So basically we need to predict the surface!


So which residues fulfill the basic principle?

• Most proteins live in aqueous environments

•  Aqueous environments are... hydrophilic (surprise)

•  The Parker Hydrophilicity Scale provides a quantitative measure of the hydrophilicity of any given amino acid residue (experimentally derived)

Residue   Hydrophilicity
D          2.46
E          1.86
N          1.64
S          1.50
Q          1.37
G          1.28
K          1.26
T          1.15
R          0.87
P          0.30
H          0.30
C          0.11
A          0.03
Y         -0.78
V         -1.27
M         -1.41
I         -2.45
F         -2.78
L         -2.87
W         -3.00


Propensity Scale

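At the primary-structure level, each amino acid residue is assigned the average score of a window of its nearest neighbours (here 7 residues, centred on the residue itself)

...DEFKNTFKNIKEPDA...

For the second N in the fragment:

S(N) = mean(T, F, K, N, I, K, E)

= (1.15 - 2.78 + 1.26 + 1.64 - 2.45 + 1.26 + 1.86) / 7

= 1.94 / 7 ≈ 0.28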

So the region is hydrophilic on average!
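The window average can be sketched in a few lines of Python. This is only an illustration: the function name `window_scores`, the choice to leave the sequence ends unscored, and the default window of 7 are assumptions made here; the scale values are the Parker values from the previous slide.

```python
# Parker hydrophilicity values as listed on the previous slide.
PARKER = {
    "D": 2.46, "E": 1.86, "N": 1.64, "S": 1.50, "Q": 1.37,
    "G": 1.28, "K": 1.26, "T": 1.15, "R": 0.87, "P": 0.30,
    "H": 0.30, "C": 0.11, "A": 0.03, "Y": -0.78, "V": -1.27,
    "M": -1.41, "I": -2.45, "F": -2.78, "L": -2.87, "W": -3.00,
}


def window_scores(sequence, scale=PARKER, window=7):
    """Return one smoothed score per residue (None where the window does not fit)."""
    half = window // 2
    scores = []
    for i in range(len(sequence)):
        if i < half or i + half >= len(sequence):
            scores.append(None)  # assumption: residues near the ends are left unscored
            continue
        neighbourhood = sequence[i - half:i + half + 1]
        scores.append(sum(scale[aa] for aa in neighbourhood) / window)
    return scores


fragment = "DEFKNTFKNIKEPDA"
scores = window_scores(fragment)
centre = 8  # 0-based index of the second N, whose window is TFKNIKE
print(fragment[centre], round(scores[centre], 2))  # N 0.28
```

Residues with a high smoothed score sit in a hydrophilic stretch, which is what the propensity-scale approach takes as a hint of surface exposure.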


Other approaches

•  Epitopes are found in:

– Turns

– Loops

– Other, often surface-exposed, secondary structures

http://www.cs.gmu.edu/~ashehu/sites/default/files/images/1hml_loop_ensemble_newcartoon.jpg


But...

•  Blythe and Flower (2005) did an extensive evaluation of the propensity scale approach

•  Simon says random! (The scales performed only marginally better than random)

•  But why? White regions are hydrophobic!

Electrostatics for VAR2CSA DBL3x (3BQK) based on the structure by: Higgins, M. K. Journal of Biological Chemistry 283, 21842 (2008).


The BepiPred Server

•  Combining

– Parker Hydrophilicity Scale

– Position Specific Scoring Matrix (PSSM), experimentally derived

•  Validated using Pellequer Dataset and epitopes from the HIV Los Alamos database

•  Available at: http://www.cbs.dtu.dk/services/BepiPred/


Performance

•  Depiction of HIV Los Alamos set

•  HIV Los Alamos set:

– Levitt 0.57

– Parker 0.59

– BepiPred 0.60

•  Pellequer set:

– Levitt 0.66

– Parker 0.65

– BepiPred 0.68

So, BepiPred is better than the others, but still not too good!


Imposing 2D on what is really 3D

•  Predicting linear epitopes from sequence is an oversimplification

•  Epitopes live in a 3D world!

•  In a 2D world you go to San Francisco

•  In the 3D world you go to Miami (at best, if you drop out of the sky!)


Going 3D requires protein structures

•  Protein Data Bank (Jun 07 2011: 73656 Structures)

•  Evolutionary conservation: Structure over function over sequence

•  Even if the structure is unknown, often it can be modeled using homology modeling (http://www.cbs.dtu.dk/services/CPHmodels/)

Superimpose sequence with unknown structure

Homology model of VAR2CSA DBL5ε based on template VAR2CSA DBL3x (3BQK). RMS = 0.498


So exactly what is it that an AB ‘sees’?

•  Interrogate the surface using a 10Å probe

•  Imagine the probe as a ball rolling over the surface of the protein and reporting back what it touches

•  In the figure, all that is ‘touched’ is in green

•  This can define ‘Exposed accessible surface structure’ (Basic principle)

•  Regardless of hydrophilicity: What is accessible is accessible! (But often hydrophilic will be ‘most’ accessible)

Novotny J. A static accessibility model of protein antigenicity. Int Rev Immunol 1987 Jul;2(4):379-89

Antibody

Antigen

Probe


The DiscoTope Server

•  Discotope: Prediction of residues in discontinuous B cell epitopes using protein 3D structures (Andersen PH, Nielsen M and Lund O, Protein Sci 2006)

•  http://www.cbs.dtu.dk/services/DiscoTope/


How does it work then?

•  Combines propensity scale values of amino acids in discontinuous epitopes with surface exposure
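One simple structure-based proxy for surface exposure is the contact number, i.e. the number of Cα atoms within a certain distance of a residue: few contacts suggest an exposed residue. A minimal sketch, assuming a 10 Å threshold and made-up toy coordinates (the function name `contact_numbers` is hypothetical, and this is not the server's actual code):

```python
import math


def contact_numbers(ca_coords, threshold=10.0):
    """Count, for each residue, the other C-alpha atoms within `threshold` angstrom."""
    counts = []
    for i, a in enumerate(ca_coords):
        n = sum(
            1
            for j, b in enumerate(ca_coords)
            if i != j and math.dist(a, b) <= threshold
        )
        counts.append(n)
    return counts


# Toy C-alpha coordinates (made up): three packed residues and one far away.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0), (30.0, 0.0, 0.0)]
print(contact_numbers(toy))  # [2, 2, 2, 0]
```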


Surface exposure

•  Structures of antibody/antigen protein complexes in PDB

•  Dr. Andrew Martin’s SACS database (available at http://www.bioinf.org.uk/abs/sacs) was used to get an overview of PDB entries

•  Epitopes in the data set were identified by finding residues within 4 Å of the heavy or light chains in the Abs

•  Homology grouping and cross-validation were used for training and testing the method, to avoid bias towards specific antigens

•  The 5 sets used for cross-validated training/testing are available at: http://www.cbs.dtu.dk/suppl/immunology/DiscoTope.php
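The 4 Å labelling rule can be sketched as follows. This is an illustration under stated assumptions: atoms are given as plain coordinate tuples, the function name `label_epitope_residues` and the toy coordinates are made up, and parsing a real PDB file is left out.

```python
import math


def label_epitope_residues(antigen_atoms, antibody_atoms, cutoff=4.0):
    """Return the antigen residue ids that have any atom within `cutoff` angstrom
    of any antibody atom.

    antigen_atoms: list of (residue_id, (x, y, z)) tuples
    antibody_atoms: list of (x, y, z) tuples from the heavy and light chains
    """
    epitope = set()
    for residue_id, coord in antigen_atoms:
        if residue_id in epitope:
            continue  # residue already labelled by one of its other atoms
        if any(math.dist(coord, ab) <= cutoff for ab in antibody_atoms):
            epitope.add(residue_id)
    return epitope


# Toy coordinates (made up) in angstrom.
antigen = [
    ("K10", (0.0, 0.0, 0.0)),
    ("K10", (1.5, 0.0, 0.0)),
    ("E11", (8.0, 0.0, 0.0)),
    ("P12", (0.0, 3.0, 0.0)),
]
antibody = [(0.0, 0.0, 3.5), (20.0, 20.0, 20.0)]
print(sorted(label_epitope_residues(antigen, antibody)))  # ['K10']
```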


Log-odds ratios

•  Frequencies of amino acids in epitope residues compared to frequencies of non-epitope residues

•  Several discrepancies compared to the Parker hydrophilicity scale

•  Predictive performance (AUC) of B cell epitopes:

– Parker hydrophilicity scale 0.614

– Epitope log–odds 0.634
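A log-odds scale of this kind can be sketched as below. The natural logarithm, the pseudocount of 1 and the function name `log_odds_scale` are assumptions made for the illustration; the toy residue strings are made up and the exact recipe used to train the published scale may differ.

```python
import math
from collections import Counter


def log_odds_scale(epitope_residues, non_epitope_residues, pseudocount=1.0):
    """Per-amino-acid log-odds of occurring in epitope vs non-epitope residues."""
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    epi = Counter(epitope_residues)
    non = Counter(non_epitope_residues)
    epi_total = sum(epi[a] for a in alphabet) + pseudocount * len(alphabet)
    non_total = sum(non[a] for a in alphabet) + pseudocount * len(alphabet)
    scale = {}
    for a in alphabet:
        p_epi = (epi[a] + pseudocount) / epi_total  # frequency among epitope residues
        p_non = (non[a] + pseudocount) / non_total  # frequency among non-epitope residues
        scale[a] = math.log(p_epi / p_non)
    return scale


# Toy data (made up): one-letter codes of labelled residues.
scale = log_odds_scale("KKKRRPDE", "LLLIIVVAFKD")
print(scale["K"] > 0, scale["L"] < 0)  # True True
```

A positive value means the amino acid is over-represented among epitope residues, a negative value that it is under-represented.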


Combine log-odds ratio and surface exposure

•  By structure, we know which residues are in spatial proximity

•  By log-odds ratio, we know which residues are likely to be in an epitope

•  S - D - E - K - R - P - E - K are in spatial proximity

•  K has 7 contacts

•  The score for K is the sum of the log-odds values of K and its 7 contacts

...LIST..FVDEKRPGSDIVED......ALILKDENKTTVI...

-0.145 + 0.691 + 0.346 + 1.136 + 1.180 + 1.164 + 0.346 + 1.136 = 5.854
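The summation step can be sketched as follows. The per-residue values are read off the example sum in the order S, D, E, K, R, P, E, K; the function name `proximity_score` is hypothetical, and how the full method weighs this sum against surface exposure is not shown here.

```python
# Log-odds values read off the slide's example sum (order: S, D, E, K, R, P, E, K).
LOG_ODDS = {"S": -0.145, "D": 0.691, "E": 0.346, "K": 1.136, "R": 1.180, "P": 1.164}


def proximity_score(residues_in_proximity, log_odds=LOG_ODDS):
    """Sum the log-odds values of a residue and its spatial neighbours."""
    return sum(log_odds[aa] for aa in residues_in_proximity)


# The target K plus its 7 contacts, as listed on the slide.
neighbourhood = ["S", "D", "E", "K", "R", "P", "E", "K"]
print(round(proximity_score(neighbourhood), 3))  # 5.854
```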


DiscoTope Performance

•  Parker 0.614 Seq.-based

•  Epitope log–odds 0.634 Seq.-based

•  Contact numbers 0.647 Str.-based

•  DiscoTope 0.711 Seq./Str.-based
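Assuming the numbers above are AUC values (as stated for Parker and the epitope log-odds on the log-odds slide), the measure itself can be sketched like this: the AUC is the probability that a randomly chosen epitope residue gets a higher score than a randomly chosen non-epitope residue, so 0.5 is random and 1.0 is perfect. The function name `auc` and the toy scores are made up for the illustration.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy per-residue scores and labels (made up; 1 = epitope, 0 = non-epitope).
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [1, 0, 1, 1, 0, 0]
print(round(auc(scores, labels), 3))  # 0.778
```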


Evaluation example

•  Plasmodium falciparum apical membrane antigen 1 (AMA1)

•  Kept completely separate from DiscoTope training

•  Two epitopes were identified using phage display, sequence variance analysis and point mutation (green backbone)

• Most residues identified as epitopes were successfully predicted by DiscoTope (black side chains)


Relatively new services


ECCB/ISMB-2009 - Immunological Bioinformatics Tutorial

BIOINFORMATICS APPLICATIONS NOTE
Vol. 24 no. 12 2008, pages 1459–1460, doi:10.1093/bioinformatics/btn199

Structural bioinformatics

PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure

Michael J. Sweredoski (1,2) and Pierre Baldi (1,2,*)

(1) Department of Computer Science and (2) Institute for Genomics and Bioinformatics, University of California, Irvine, 92697-3435, California, USA

Received on March 3, 2008; revised on April 18, 2008; accepted on April 20, 2008

Advance Access publication April 28, 2008

Associate Editor: Anna Tramontano

ABSTRACT

Motivation: Accurate prediction of B-cell epitopes is an important goal of computational immunology. Up to 90% of B-cell epitopes are discontinuous in nature, yet most predictors focus on linear epitopes. Even when the tertiary structure of the antigen is available, the accurate prediction of B-cell epitopes remains challenging.

Results: Our predictor, PEPITO, uses a combination of amino-acid propensity scores and half sphere exposure values at multiple distances to achieve state-of-the-art performance. PEPITO achieves an area under the curve (AUC) of 75.4 on the Discotope dataset. Additionally, we benchmark PEPITO as well as the Discotope predictor on the more recent Epitome dataset, achieving AUCs of 68.3 and 66.0, respectively.

Availability: PEPITO is available as part of the SCRATCH suite of protein structure predictors via www.igb.uci.edu.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

B-cell epitope prediction is an important, but unsolved problem in bioinformatics. The ability to accurately predict B-cell epitopes would aid researchers in a variety of immunological applications.

Initial attempts at predicting B-cell epitopes involved the calculation of propensity scales (Hopp and Woods, 1981). While this information can be useful in predicting B-cell epitopes, Blythe and Flower (2005) showed that propensity scales alone are not enough to accurately predict epitopes.

Many of the previous predictors have focused on linear B-cell epitopes. Some of these methods include ABCpred (Saha and Raghava, 2006), BEPITOPE (Odorico and Pellequer, 2003), Bepipred (Larsen et al., 2006) and PEOPLE (Alix, 1999). However, past surveys have estimated that only 10% of the B-cell epitopes are continuous (van Regenmortel, 1996). Additionally, van Regenmortel (2006) noted that even linear epitopes adopt a conformational structure and therefore the distinction is somewhat blurred. Far fewer predictors have been developed for discontinuous B-cell epitopes. One of the first methods explicitly created for identification of discontinuous epitopes was conformational epitope predictor (CEP) (Kulkarni-Kale et al., 2005). Another method described by Rapberger et al. (2007) incorporates epitope–paratope shape complementarity to predict interaction sites. One of the most recent, state-of-the-art, predictors of discontinuous epitopes is Discotope (Andersen et al., 2006), which uses both contact numbers (i.e. the number of Cα atoms within a certain distance threshold) and an amino-acid propensity scale.

Our predictor, PEPITO, attempts to overcome some of the limitations of previous predictors by incorporating an amino-acid propensity scale along with side chain orientation and solvent accessibility information using half sphere exposure values (Hamelryck, 2005). To increase robustness, PEPITO uses propensity scales and half sphere exposure values at multiple distance thresholds from the target residue.

2 METHODS

2.1 Datasets

We obtained epitope datasets for benchmarking prediction methods from both the Discotope Supplementary Materials (Andersen et al., 2006) and Epitome (Schlessinger et al., 2006). The two datasets contain different sets of protein chains and differ in their epitope/non-epitope classification rules. The Discotope dataset, which consists of 75 protein chains, labels all residues in antigen chains within 4 Å of an antibody as epitopes. The Epitome dataset, which consists of 140 protein chains, seeks to eliminate incidental contacts by labeling residues in the antigen within 6 Å of the complementary determining regions of the antibody chains as epitopes.

We derived two additional datasets, C[Discotope] and C[Epitome], from the set of protein chains that are common to both the Epitome and Discotope datasets. The two datasets differ in the method used to identify epitope residues. Eight hundred and seventy-five of the residues in the derived datasets are defined as epitopes using both methods. Four hundred and seventy-one of the residues in the derived datasets are defined as epitopes using the Epitome method but not the Discotope method. One hundred and nine of the residues in the derived datasets are defined as epitopes using the Discotope method but not the Epitome method. The assertions by Schlessinger et al. (2006) would indicate that the 471 residues are integral to the antigen–antibody binding while the 109 residues result from incidental contacts.

Testing procedures require that the protein chains present in the datasets be clustered to prevent any one family from dominating the performance measures. Protein families were previously annotated for the Discotope dataset. UniqueProt (Mika and Rost, 2003) was used to identify protein families in the Epitome dataset and the two derived datasets.

*To whom correspondence should be addressed.

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

BioMed Central

BMC Bioinformatics

Open Access

Software

ElliPro: a new structure-based tool for the prediction of antibody epitopes

Julia Ponomarenko* (1,2), Huynh-Hoa Bui (3), Wei Li, Nicholas Fusseder, Philip E Bourne (1,2), Alessandro Sette (4) and Bjoern Peters (4)

Address: (1) San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA, (2) Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA, (3) Isis Pharmaceuticals, Inc., 1896 Rutherford Road, Carlsbad, California 92008, USA and (4) La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, California 92037, USA

Email: Julia Ponomarenko* - [email protected]; Huynh-Hoa Bui - [email protected]; Wei Li - [email protected]; Nicholas Fusseder - [email protected]; Philip E Bourne - [email protected]; Alessandro Sette - [email protected]; Bjoern Peters - [email protected]

* Corresponding author

Abstract

Background: Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highly desirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity, solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton and colleagues proposed a method for identifying continuous epitopes in the protein regions protruding from the protein's globular surface. The aim of this work was to implement that method as a web-tool and evaluate its performance on discontinuous epitopes known from the structures of antibody-protein complexes.

Results: Here we present ElliPro, a web-tool that implements Thornton's method and, together with a residue clustering algorithm, the MODELLER program and the Jmol viewer, allows the prediction and visualization of antibody epitopes in a given protein sequence or structure. ElliPro has been tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody-protein complexes. In comparison with six other structure-based methods that can be used for epitope prediction, ElliPro performed the best and gave an AUC value of 0.732, when the most significant prediction was considered for each protein. Since the rank of the best prediction was at most in the top three for more than 70% of proteins and never exceeded five, ElliPro is considered a useful research tool for identifying antibody epitopes in protein antigens. ElliPro is available at http://tools.immuneepitope.org/tools/ElliPro.

Conclusion: The results from ElliPro suggest that further research on antibody epitopes considering more features that discriminate epitopes from non-epitopes may further improve predictions. As ElliPro is based on the geometrical properties of protein structure and does not require training, it might be more generally applied for predicting different types of protein-protein interactions.

Published: 2 December 2008

BMC Bioinformatics 2008, 9:514 doi:10.1186/1471-2105-9-514

Received: 24 September 2008
Accepted: 2 December 2008

This article is available from: http://www.biomedcentral.com/1471-2105/9/514

© 2008 Ponomarenko et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

RECENT DEVELOPMENTS

Friday, 11 June 2010


Summary

•  B-cell epitopes are essential in preparing the body for future infections

•  Antibodies are constantly monitoring the body

•  Due to combinatorics, predicting B-cell epitopes is a highly complex task

•  Current best approach: Combine propensity scales with structure

•  Immunoinformatics is a new field of research, so there is plenty of room for improvement

•  The field is expanding and actively reduces lab-time

•  Thanks to Claus Lundegaard for letting me use his slides from last year as inspiration for this talk