18
Rrprint1•d from J . .l/ol. Biol. (1987) 1 96. !lOl-!)17 Canonical Structures for the Hypervariable Regions of Immunoglobulins Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062

1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

Rrprint1•d from J . .l/ol. Biol. (1987) 196. !lOl-!)17

Canonical Structures for the Hypervariable Regions of Immunoglobulins

Cyrus Chothia and Arthur M. Lesk

38

1 of 18 Celltrion, Inc., Exhibit 1062

Page 2: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

J. Jin/ Biol. (1987) 196. 90I- 91i

Canonical Structures for the Hypervariable Regions of Immunoglobulins

Cyrus Chothia 1 •2 and Arthur M. Lesk 1.3t

1 11/ RC La-Ooratory of Molecular Biology Hills Road. Cambridge CB2 2QH

England

2Ghristopher h1gold Laboratory l "niversity College Lo1ulon

20 Gordon St-reet London WCJl-f OAJ, England

3 EM BL Biocom pa ting Programme llfeyerhofstr. 1, Postfach 1022.09

D-6.900 Heidelberg Federal Republic of Germany

(Received 13 November 1986, and in rn•ised form ~.1 April 1987)

\\'e have analysed t he atomic structures of Fab and \ "L fragmrnt-s of immunoglobulins to

determine the relationship between th~i1· amino aC'id sequences and t.h<' three·dimeni;ional

structures of their antigen binding .:it-t-s. \\'e identif~· the relatively few residues that,

through their packing, hydrogen bonding or the abi lit~· to assu me unusual </J. if! or w

conformations, arc primarily responsible for the main-chain conformations of the

hypervariable regions. These residues are found to oc·<'nr at sit t'>1 within the hypervariable

regions and in the conserved /J-sheet. framework.

Examination of the sequences of immunoglobulins of unknown structur<' shows that

many have hypervariable regions I hat are similar in siZ(' to one of t he known ;;t ruc·l un·:< and

contain identical residues at the sitrs responsible for the observed c·onformation. 'f'his

implies that these hyperva.riable regions ha.ve «onformations close t,o those in the known

structures. For five of the hypervariable regions, the l'f'J>erloire of conformations appears to

be limited to a relat h·e]y :-:mall number of discn·t.e st rurtural da;:;;es. \\'c· call t he c·ommonly

ON·nrring ma.in-chain conformat ions of the hy pervariab)e regions "1•a nonical s t.ructurc•s".

The accuracy of the analysis is being test eel and refi ned l1y the predid i1111 of

immunoglobulin struc·tures prior to their experimental determination.

1. Introduction

The specificity of immunoglobulins is detnrnined

by th" sequenl't' and size of the hypt-n·ar1able

regions in the variable domai11s. These regions

produce a surface complementary to that of the

antigen. The subject of this paper is the relation

lit>lwt·1·n the amino acid sequerwrs of antibodies and

the strul'ture of their binding ;;itt>>1. The result s we

report are related to two prf'vious :wts of

ohsrrvations.

The fir:-1 set com·ernl< the sequenct>s of the

hypervariable regions . l\aliat and his colleagul's

(Kahat t i nl .. 1977: Kabat, 1978) rompared lh1·

s('qnrn1:es oft lw h.vpernviable re~ions then known

and found t.hat. at 13 sitex in the light. thain:-.

and at >lf' \ 'E'll positions in t ht> ht'a".'' l"11ain,.;. the

rt·sidue!-l are consen-rrl . 1'h1·y argnecl th1\t the

resid11C's at Hwst' i;it1·s art' invoked in I he st.ructure.

rat hl'r than t hl' spl.'1:ificity . of tht> hy p1·rrnrialtlt·

regions. 'I'lw.v suggested that lht'.'!' n >., id111·s ban· a

fo.:Pcl position in antibndi c•:-: and that thi:-: 1·01dd hr

used in tht· modrl building of combining sites to

limit I he conformal ions and positions of 1 h1· sil t':-.

whose residut>' va r·it·d . P adla.n (1979) a lso 1·xamitwd

the sl'querwes of the hypervariable n ·gion of light t Also associated with Fairleigh Dickinson l'nivf'rxit.v.

~ean\'tk · Ha<·kensack Campu3, Teanf'1·k. :\.J 07666. l X A.

901

2 of 18 Celltrion, Inc., Exhibit 1062

Page 3: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

902 C. Chothia nm/ A. M. Lesk

chains. He found t hat residues that are part of the h.YJ>l"'rvariable regions. and that are buried within the domains in t he known structures. are conserved. The residues he found conserved in V,i sequenl"e~ \\"ere different to t hose conserved in VK ~t>quen <"t'S.

The i<t>1·ond ,:;t:>t· of observations concerns the conformation of the hypervariable regions. The results of tht· structure analysis of Fab and Bence· Jones prot\>ins (Saul l't al .. 1978; Segal ''/ al., 197+: ~larquart et al .. 1980; Suh et n.l., 1986; !-ichiffer et al .. 1973: Epp et al., 1975; Fehlham mer et al .. 1975: Colman el rd .. 1977; Furey el al., 1983) show that in several cases hypervariable regions of the same size, but with d ifferent sequences, have the same main­chain conformation (Pa<llan & Davies, 1975; Fehlhamnwr et al .• 1975; Padlan et al. , 1977; Padlan , 1977b; Colman et al .. 1977; de la Paz et a.I .. 1986). Details of these observations are given below .

In this paper. from an analysis of t he immuno­globulins of known atomic stru<:ture we determine the limits of t he P-sheet framework common t.o t he known st.ructures (see section 3 below). We then identify t he relative!.'· few residues that , t hrough packing, hydrogen bonding or the ability to assume unusual </>. I/! or w conformations, are primarily responsible for t he main-chain conformations observed in the hypervariable regions (see sections 4 to 9, below). These residues a re found to occur at sites within the hypervariable regions and in the conserved 8-sheet framework. Some correspond to residues identified by Kabat r:t al. (1977) and by Pad lan (Padlan et al .. 1977; Padlan. 1979) as being important for determining l,he conformation of hypervariable regions.

Examination of t.he sequences of immuno­glohulins of unkno\\"n structure shows that in manv ('asE's the ~t>I of residues responsible for one of t h.e ohserved hypt>rni.riablf' conformations is present. This suggests that most of the hypen·ariable regions in immunoglobulins han? one of a small d isl"rete set of main-chain conformations that wf' <'all ··canoniC'al st rudme>s ' '. Sequence variations at the sil t's not responsible for t he ('()nformation of a partic·ular «anonical stru<'lure will modulate the su rfa( ·t· 1hat it. prest>nl s to an antigen.

Pri0>· tu thi~ anal)·sis, alt.empts to model t hf' <"Ombining sites of antihodit>>' of unknown st rm· ture ha\'t> bt·t>n ha>w<I on tlw assumption t hat hyper­,·ariable n·gions of t he same si:1a· have simila r ba!·kbone ~I rutturt>s (sf."e st>(·tion I:.!. ht'ln\\"). :\,; \\"€'

show bf:'low. and a~ has bt>t·n n·;ilir,ed in part be.fore. this is t.r11t> only in (·t>rtain inst a.ru·es. :\lodelling based on the ,.;<·ts of rl'~idtws idf'ntified here a:< respon::;ihle for th\· olisl' rn·d ('Onformations of hyp1·/'\'a rfab lc regions ll'ould be 1·xµel·kd t o giYl' mort• acn1rat<· rPsult s.

2. Immunoglobulin Sequences and Structures

Kahal et 11/. (1983) ha,·p publish!:'<! a mllrr·t ion of the known immunoglobulin seque>H't>.~ . For the

variable domain of tl1e light chain (Vdt they list some 200 complete and 400 partial sequences; for the variable domain of the heavy chain (VH) they list about 130 complete and 200 partial sequences In this paper we use the residue numbering of Kabat et al. ( 1983), except in the few instance~ where the structural superposition of certain hypervariable regions gives an alignment different. from that suggested by the sequence comparisons.

In Table l we list the immunoglobulins of known structure for which atomic co-ordinates are available from t,he Protein Data Bank (Bernstein et al., 1977) , and give t he references to the crystallo­graphic analyses. Amzel & Poljak (1979), Marquart & Deisenhofer ( 1982) and Davies & Metzger (1983) have written reviews of the molecular structure of immunoglobulins.

The \'L and VH domains have homologous structures (for references, Ree Table l ). Each contains two large fl-pleated sheets that pack face to face with their main chains about 10 A apart (l A = O· I nm) and inclined at an angle of - 30° (Fig. I ). The P-sheets of each domain are linked by a conserved disulphide bridge. The antibody binding site is formed by the six hyperl'ariable regions; three in VL and three in \'H. These regions link strands of the P-sheets. Two link strands that are in different {J-sheets. The other four are hair-pin t urns: peptides that link two adjacent strands in the same P-sheet (Fig. 2). Sibanda & Thornton (1985) and Efimov (1986) have described how the conformations of small and medium-sized hair-pin turns depend primarily on t he length and sequence of the turn . Thornton et al. (1985) pointed out that t he sequence-conformation rules for hair-pin turns can be used for modelling antibody combining sites. The results of these authors and our own unpublished work on t he conformations of hair-pin turns, are summarized in Table 2.

3. The Conserved ~Sheet Framework

Comparisons of the first immunoglobulin structures determined showed that. t.he fram1>work regions of different moleC'ules are very similar

t A bbrf'viatiom:: used: \'1. and \ 'H, variahle regions of thE' immunoglobulin light and hea''.'' c'hains. re~pl"'l'!in•l.Y : r.m.:< .. root · mean-:>quare; COR, 1·omplementarity·df.'tf.'rmining region.

Table 1 I 1111111111oglob11/i11 mriable domain,s of k1101c11

a.tomic structure

('hain T~·pe l'r(llt>i11 L H Refertnce

l'alo'XE\nl ,.\I 11 Saol el 11/. (1978) ~'ah )J('fl( 'fiO:i " l ~t'j!lll p/ rt/. (1974) ~·11h Kill. ,!J TTI Marquart el al. (198\ll ~'ab .Jr;:l!I

" Ill Soh el al. (J!lSli) \ 'L }{)<;}

" Epp el,,/, (1975)

'"- RHJ<: J.l Fort')' el al. ( Hl83)

3 of 18 Celltrion, Inc., Exhibit 1062

Page 4: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

The Structure of Hypervariable Regions 903

L2 f

Figure 1. The structure of an immunoglobulin V domain. The drawing is of KOL Vv Strands of .8-sheet are represented by ribbons. T he three hypervariable regions are labelled LI, L2 and L3. L2 and L3 are hairpin loops that link adjacent P-sheet strands. Ll links two strands that are part of different .8-sheets. The VH domains and their hypervariable regions. H l , H2 and H3, have homologous structures. The doma in is viewed from the P­sheet that forms the Vt. - VH interface. The arrangement of the 6 hypervariable regions that form the antibody binding site is shown in Figure 2.

(Padlan & Davies, 1975). The structural similarities of the frameworks of the variable domains were seen as arising from the tendency of residues that form the interiors of the domains to be conserved, and from the conservation of the total volume of the interior residues (Padlan, l977a, 1979). In atldition, the residues that form t he central region of the interface between Vi.. and VH domains were observed to be strongly conserved (Poljak et al. , 1975; Padlan, l977b) and to pack with very similar geometries (Chothia et al. , 1985).

In this section we define and describe t he exact ~xt.ent of the structurally similar framework regions in the known Fab and VL structures. This was de~rmined by optimally superposing the main ­chain atoms of the known structures (Table 1) and calculating the differences in position of at-Oms in homologous residuest.

In Figure 3(a) we give a plan of t he /J-sheet ~ramework that, on the basis of t he superposit ions, is ~ommon to all six VL structures. It contains 69 res1.dues. The r.m.s. difference in the position of the main-chain atoms of these residues is small for all pairs of VL domains; the values vary between 0·50 and 1·61A(Table3A). The four VH domains share a

t For these and other calculations we used a program system written by one of us (see Lesk, 1986).

N

c N

Figure 2. A drawing of the arrangement. of the hypervariable regions in immunoglobulin binding site:; . The squares indicate the position of residues at the ends of the /J-sheet strands in the framework regions.

common /J-sheet framework of 79 residues (Fig. 3(b)). For different pairs of VH domains t he r.m.s. difference in the position of t he main-chain atoms is between 0·64 and l ·-1.:l A.

The combined P-sheet framework consists of \"L residues-!. to 6, 9 to 13, 19 to 25. 33 to -1-9, 53 to 55, 61 to 76, 84 to 90, 97 to 107 and VH residues 3 to I:?, 17 to 25, 33 to 52, 56 to 60, 68 to S:!. 88 to 95 and 102 to 112. A fit of t he main-chain atoms of these 156 residues in the four known Fab structures gives r.m .s. differences in atomic positions of main-chain atoms of:

~ F,\\"i\I McPC603 ,J539

KOL l·39A 1·15 A 1· 14 A :-:EWM 1·47 A 1·37 A ~kP('603 t·03 A

The major determinants of the tertia ,..r stn1ttun· of t he framework are the residues buried within and between the domains. We calculated the ac:«essible surface area (Lee & Richards, 1971 ) of each residue jn the Fab and VL structures. In Table .i we list the residues commonly buried within the \'L and \'H domains and in the interface between them. T hese are essentially t he same as those identified by Padlan (1977a) as buried within the t hen known structures and conserved in the then known sequences. Examination of t he :'WO to 700 \'1.. sequences and 130 to 300 VH sequences in the Tables of Kabat et al. (1983) shows that in nearly a.II t he sequences listed there t he residues at t hese positions are identical with , or very similar to. \host' in the known structures.

There are two positions in the \ '._ :·wqut-nces at which the nature of the conserved residues depends on the chain class. Tn VA sequences, t he residues at positions 7 l and 90 are usua lly Ala and l:ier/ Ala, respectively; in V~ sequences the correllponding residues are usually Tyr/Phe and Gln/ Asn. These residues make conta<'I wit h t lw hypervariable loops and play a role in determining the conformation of

4 of 18 Celltrion, Inc., Exhibit 1062

Page 5: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

904 f '. Chothia and A . M. Lesk

Table 2 ('1111f11r111ritio11 of hair-pin 11lrnS

Conformationb

~tru\"lur't' Sequence' (.) Frequency'

I :! 3 .j iP•) 1/12 </12. t/13 ~.

:-. c: . <: - x +55 +35 +85 _5d or 6/6

+65 -12.'i -105 +10' :! - 3

I x. (:. x. x +711 -1 15 -90 0' 6/7

1:::-1 X- X- (: . X +50 + 45 +85 -20• 7/8

x. x. x. x +60 +20 +85 +:?;;' 41-I iPl I/II iP2 t/12 </13 t/13 </14 t/14

X- X·X · G -1 35 + 175 - 50 -35 -95 -1 0 +145 + l51i 4/4

3 I 2 3 .j 5• </12 t/12 iP3 t/13 tP4 t/14 </15 1/15 :?/ "-.i x x x x (; - 75 - 10 - 95 -50 -105 0 +85 - 160 3/3 I I

-90 + 130 1/1(3/3) 1-- - - x x x x x +;;o +55 +65 -50 -130 -~ __ _ ., /3'-.

:! .j 2 3 4 5• ,P:? 1/12 </13 t/13 </14 1/14

l / I (;

x X· X X· X -60 -:!.'; - 90 0 +85 +JO 13/15 ·~ --5 D

:! - .j :! 3 .j 5 6' iP2 t/I:? iP3 1/13 </14 1/14 tP5 1/15 I I t: 3/3 :! 5 x x x. x x. x - 65 - 30 - 65 - 45 -95 -5 +70 +35 '!/ :! I I x 1/1 1:::6

The data in thi• TahlP arP. from Rn nnpuhlish~d anll lyois of proteins whose atomic atrut•t ur .. has been determined at a resolution of 2 A or higher. The conformations dl"S<·ribed here for the 2-residue X-X· X -1: turn and the 3-residue turns are new. The other conformations have ~n described by Sibanda &. Thornton (1!111.;1 and by Efimo,· (1986). \\"e list only conformations found more than once.

• X indic·,11 ~~ no residu.- ..-~tri .. tiuu exct'pt that certain sites cannot ha,·p Pro. M this ~idue requires a 4> v11l ue of - - w and cannot form a hydrogen bond to its main-chain nitrogen.

• Residu~s whose </I.I/I valuPS are not given havt' a P conformation. ' ~·requencies are given as 11 1/n,. wlwrt- 11 2 is the number of cases where we found the structure in

column I with the ..equence in column 2 and n1 the number of these cases that have the conformation in t'Ol umn 3. r.., .... pl for the frequenc·iE>• in bra<'ket.'!. data is given only for non-homologous proteins.

d.•.r Tht·>i· nrc• typP I'. 11 ' and Ill' turn~. 1 Different t'Onformations are found for the single ~ases of X-D-G-X-X and X-C·X·G-X. h Differl!nt conformations are found for the single cases of X-X-X-X-X. X .(;.(: . ); . X and X-G·X·X·

(; . The:!\·"""" of X-X-X-\.\. have different <·onformations. ' Different l'onformations are found for the:! cases of X·G·X-X-X-X.

these loops. Thi:< is discussed in sec-t ion:< 5 and 7, lit' low.

The conservation of lhf> framework :<I ru<'l ure t-xtends In t he residues immediately adjan·nt lo the hypervariable region:<. lf thr l'"""l'n·t'cl franH'works of a pair of molecules are superposed , the differences in t h1· fJt>:<it ion:< of 1 lws<' t«·:<idm's is in mn:<t c·1\s1·:< le:<s than 1 .l. and in all but one case l!'ss than 1·8 . .\ (Tablt>5). Jn <·ontrnst. residut's in th t• hyperva.riablc region adjaecnt to the eonsern~d fram (·\\·ork <·an differ in position !.>· 3 .. \ or more.

The six loops. whose main-c·hain conformation~ ,·ary and whi<·h an· part of the antil11ul>· c·ombinin).( :<it ... are formed l1y r1·.;i1hu·-. :!Ii to :l:! . . )0 to .i:! and !ll to ! Iii in \ "L <lomainll, and 26 to:{:! . 53 to 1)5 and 96 to 101 in lhf' \'H domains LI , L:!. L3. HI . H:? and H3, respeC'tin·l,L Tht>i l' limit :< <1n· ,.:nnwwhat difft-rt-nt from tho:-1· 11f I he <'omplt>mt-nt arit ,. _ ciE>terminin~ reJZions defitwd li.y Kil hat ,,, rd. ( J !}8°3) 011 the hasts of sequc•nc·<- \';Hinbility : resid1tt's :!..t to

34 , 50 to 56 and 89 to 97 in \"Land 31 to 35. 50 ~o 65 and 95 to 102 in \ "H· This point is discussed m sec-t ion 11 , below.

4. Conformation of the L1 Hypervariable Regions

In t.he known \ · structures thC' conformations of I he LI regions, re~idues 26 t o' 32, are char~cterist~c of t he <-lass of tlw light. chain. Jn V, domat~s t~e~r conform a I ion is helical and in the \". domains it IS extended (Padlan et al .. 1977; Padlan. 1977b; de la Paz et rtl .. 1986). These conformational differences are the result of sequence differences in b~t h the LI region and t he framework (Lesk & Choth1a, 1982).

(a) VA do111ai11.<

Figure + shows the c-onformation of the LI rrgions of the \"., domains. The LI regions in RHE

5 of 18 Celltrion, Inc., Exhibit 1062

Page 6: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

The Str11c/11re of II Y/ll'l'l"flfiah/1' Regions 905

9

, .. '"' 13

'

6

4

, ,

',_

76 61

19

C ---------·

53

-----c

55

8

84

107 12

7

3

,,'' ,,, ' ' -, ......

112

18 82

C----------

w----­~::::

68

H2

60

Figure 3. Plane of the tl ·sheet framework that is conserved in the \"L and \ 'H domains of the immunoglobulins of

mown atomic structure.

and KOL contain nine residues designated 26 to 30. 30a, 30b, 31 to 32; ~E\\'~1 has one additional residue. The LI regions in RHE and KOL have the

3ame conformation: their main-chain atoms have a r.m.s. difference in position of 0·28 A. Superposition of the LI region of ~EWM with those of KOL and

RHE shows that the additional residue is inserted between residues 30b and 31 and has little effect on

the conformation of the rest of t he region: superpositions of the main-chain a.toms of 26 to 30b and 31 to 32 in NEWM to 26 to 32 in KOL and

RHE give r.m.s. differences in position of 0·96 A and 1 ·~5 A. Thus, the sequence alignment for the Vl Ll regions of KOL, RHE and NEWM implied by

the structural superposition is:

Position :!(; :17 28 29 30 :Joa 30b :111,. 31 :1:!

RHE Sn Ala Thr Asp Ile <:iv ~r A~n :-;.,r

KOL Thr Ser &>r Asn lie c(,· S.-r lie Thr

:>IE\\'~I ~r S.-r Ser A•n lie (;I~· Ala l:ly Asn His

Tn all three structures, residues :?Ii to 29 form a

t~·pe J turn with a hydrogen bond between the carbonyl of 26 and the amide of :?!1. Residues 27 to

30b form an irregular helix (Fig . 4). This helix :-.it:< across t he top of the /J-sheei core. The side-cha.in of

residue 30 penetrates deep into the core occupying a cavity between res idues 25. 33 and 71. The major

determinant of the conformation of L I in t he observed structures is the packing of residues :!5.

30, 33 and 71. V;. RHE, KOL and NEWM. have the

6 of 18 Celltrion, Inc., Exhibit 1062

Page 7: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

906 ( '. Chothia and A. M. Lesk

Table 3 /JijJ,.tt·11r•'·' in i1111111111oylril111li11 frflllll'trork

.. 11rnrl11 rP.~ ( . .t)

For l'•""' of \" domains '"~ give. the r:m.s. difference in. thl' atomic ('1Qsitio11' of frame\\·ork mam chain atoms aftpr optimal :-.Ultt'f1Ht"'lti11f\

:\ l't. tfomn;uli Framework residues al'I' 4 tu 6, !I tu 13, 19 to:?:.. 33 lo 4!l. 53 to :;:;. 61 tu 7ti. 114 to !10 and 97 to I07.

KOL ;l;E\01 REI )\t"Prno:i J!;39

RHE ()·74 1·47 1·46 1·61 1·41

l\llL l ·13 l ·:.!3 1·36 1-1:.

:'\E\01 1 :!-1 1·28 1·53

REI 0·50 ()·77 )(('('( 'Oll:l 0·76

B. J 'u d1m111hl"' Framework residues art> 3 10 l:! . 17 to :!5. 33 to 52, 56 to 60. 68 to X:!. i;s to 95 and Ill:! lo I I:!

:'\F.\DI M('PC'603 .J539

KOL H:! IHj,.j 0·89 :'\E\Dl 1·27 1·29 )lt 'l't ·1~1:1 0·89

,..amt' residues at these sitE>s: C:ld5. Ile30, Val33 and Ala71. (Another LI residue,· Asp29 or Asn29. is buried hy the conliwts it makes with L3.)

Kabat el al. (1983) listed 33 human \'1 domains

for which the sequences of the Ll regions are known. The 21 st-quences in subgroups I , II , V and \ . f ha ,.e L I regions that are the same length as those found in RHE, KOL or NEWl\1. Of these, 18 conserve the residues responsible for the observed conformations:

Residue position

25 30 33 7 1 2!!

Residue in KOL/RHE/NEWM

<:1,,· l it Val Ala

Asp/Asn

R~iduesin 18 \', sequences

I>!( ;r,. 17 V~l. l llt 17 \"al. I lie 18Ala 11 Asp. 6 Asn, I Ser

The ronservation of these residues implies that t hese 18 LI regions have a conformation that is the same as that in RH E , KOL or NEWM.

Subgroups TII and I\' have 13 sequences for which the L I regions are known (Kabat et al., 1983). These regions are shorter than those in RHE and KOL and in the other V, subgroups. They also have a quite different pattern of conserved residues.

Kabat f:I al . ( 1983) listed 29 mouse \·.domains for which the sequence of the Ll region is known. These Ll regions are the same size as that in X F:\ DI. They also have a pattern of residue <:on:sen·alion similar to, but not identical with, that in KOl./ XEWM: Ser at position 25, Val at 30, Ala at 33 and Ala at 71. This suggests that the fold of

Table 4 Rl'.sirlue.< ro11111uml,11 b11rin/ 11•itlti11 l'L and l'n domains

\"L domains r., domains

Residues in Residue~ in known A.k . ..\ .0 known A.S.A.'

Position strudures (,\2) Posi tion stru~ture~ cA1l

4 L.M 6 " L 14 Ii Q I:! 6 Q.f: 16 1!\ ,.

II 18 L :!I :! I I.~I I 20 L 0 :!:I (' ti .,.~ (' 0 :!."} 1: . A.X 13 :!of S.\'.T.A s :$:! \".L 3 :1-t ~l.Y 4 :1.; \\" ll 3G \\" 0 37 Q 30 38 R 13 47 L.l. \\" s 48 I ,\ · I 411 I 2-t 411 A.G 0 ti:? ~- 11 5 1 l.\".S 4 64 t; A 13 69 l.\'.M 13 71 .\ .F.Y 2 n1 L.F 0 73 L.F 0 XO L 0 ;;) I.\' ti >t? M.L 0 -<' D ~ 86 D 2 10 A.!' 11 88 A(; 3 86 y fl lNI \" 0 '<>< ( ' ti II:.! l' 0 !)tJ A S .(J .:'\ I 10~ 1; I I !17 \ ".T .1: 18 106 (; 19 !l!I c; 3 107 T .S 17 1111 c: II 10!) ,. 2 I ll:! T I 10~ L.\ · :!

. • )li·an ac~~~s;;ible .~urfaC'e. area (A.X .• \ .) of lht> ro«idu .. s i 11 th~ 1''1ib ~truC'tures ~EWM, MCPC603. 1, 0 1, and .J.1.U and 111 the\ L 'tru<·tures REI and HH E.

7 of 18 Celltrion, Inc., Exhibit 1062

Page 8: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

The 8tr11rf11re of Hypen•oriable Heyions 907

the mouse \'~ Ll regions is a distorted version of that found in tht> known human structures.

(b) 1·. domains

In Figure ,-, we illust rat-e the 1·onformation of t he LI regions in the three known \.K structures: J539. REI and \l('PC603. In J539 L1 has six res iduE>s. in RET it has seven and in MCPC603 13. The LI region of J539 has an t>xtended l'Onformation. In REI, residues 26 to 28 have an extended conformation and 29 to 3~ form a distorted type IT turn. The six additional residues in MCPC'l>ll:3 all occur in the region of this turn (Fig. 5). Tn the t.hree ;;tnu·turt':> the main chain of residues :W to 29 and :3:! haw the same conformation. A fit of the main· ('hain atoms of these residues in J539. REI and ~ICPC603 give:; r.m.s. differences in position of 0-4 7 to I ·03 A. The sequenc-e alignment implied b~· the ~t rul"lural superposition is:

Residue :!fj 27 28 29 :Jo 31 3la. 3Jb J539 ~(.'!' Ser Ser \'al :-\c-r REI Set Olu Asp [le lie Ly• MCPC-1m:1 Ser Ulu Ser Leu Leu Asn :-\l"I' f:ly

In J539, REI and MCP(.'603. residues 26 to :rn extend across the top of /J-sheet framework with one, 29. buried within it. The main contacts of 29 are with residues 2. 25. 33 and 71. The penetration of residue 29 into the interiOI' of the framework is not as great as that of residue 30 in the Y 1 domains, and the deep c-av ity that exists in \"1 domains is filled in V" domains hy tht> large s ide-chain of the i·esirlue at position 71. In J539. REI and :\lCPC603, the residues involved in the packing of L l (2. ::!fl. :!\!. 33 a.nd 71) are very similar: Ile. Ala/Ser. \·al/ Ile/teu, Leu and Tyr/ Phe, r·espeet.i\·t>ly .

The six residues 30 to 30f in :'ltCPC603 form a hair-pin loop that extends a wa>· from the domain (Fig. 5) and does not have a well-ordered conforma· tion (fiegal "'al., 197~)-

Kabat et al. (1977) noted that residues at c-ertain position!' in the Ll regions of the V" ,.;equt>rH·f'>< then known were conserved, and suggested that they havt' a structural role. The structural ro le of residues at positions 25. 29 and 33 is confirmed by the above analysis of the \"" structures and t,he pattern of residue conservation in t.he muc:h larger number of sequences known now. Kabat ,,, al. (1983) listed 65 human and llH- mou~e \'" sequenc·e>< for which the residues between posit.ions 2 and 33 are known. For about half of these. the residue at position 71 is also known. Thes\-' data ><how that there are 59 human and l-l-8 mouse i<equt>n t·e:< that ha,·t.' residues ,-en· similar to thosl' in the known stnwtures at the 'site:-; involved in the packing of LI:

Poi;ition

2 ~;}

29 33 71

. J.~39/REl/~lf 'l'f ·1;u:i

Ile Ala Ser

Val Ile Leu Leu

Tyr Phe

Human\" •

;,7 lie. I Met. 1 \'al !'d Ala, 7 Ser

ao Ile. ti \'al. x Leu ;;; Leu. t \"al :.'X Phe. I Tyr

The numher of residue,:; in the LI region m these :<(·quen1T1' ,-aries:

Hesi<lut- size of LI fi 7 8 9 JU 11 It 1:3 ~umber of human r. :lx 14 I ~ :! :-;umber of mouse \"• I I 40 :~:! 31i 30

The conservation of r<'sidut>>< at the positions buried between LI and tlw framework implies thal in the large majority of\"" domains rt>sidur:-: :W lo 29 han· a conformation clos€" to that found in the known struct.ures and tha.t the remaining residues. if small in number, form a turn or, if large. <.I hair-pin loop.

5. Conformation of the L2 Hypervariable Regions

The L2 regions have the same conformation in the known strndurt>s (Padlan Pl al., 1977; Pad la.n.

31c· 31d :Jle 31f 3:? Ser Tyr

.. ~\:-.:11 Glu Lys Asn Phe

1977/J: de la Paz ,,, rd .• 1986) expect for .'.\E\nl. \1·here it i:< deleted. v\'p find that the similaritil'" in the L2 struct ures arise from the conformat.ional requiremE-nts of 1:1 three-residue turn and tht· c-onservation of the framework residues <lgainst which L2 pack:<.

The know structures L2 consists of three n·,.;irlue,.; . 50 to n2:

ResiduP RHE KOi. Rf<:! ~H 'PC603 .Ja:m

:;o 'l\·r Arg Olu <a" Ulu 51 .\ :<II .\~1· .\la Al;. lie !):! Asp Ala S~r ~ ... r Ser

These three rt>sidue·s link two adjal"ent ,.;!.rands in the framework P·><lwl'I. Residues 49 and f>:J an .. h~·drogen bonded to each othel' ,.;o that the L2 region is a three-residue hair-pin turn (Fig. 6).

51

/ ""' 50 52 I I

49= = =53

The c'onformations of L2 in tht- fi,-f' stru<'turt-:-< art' ver,v s imilar: r.m . ~. differenc:t.'i< in position of their main·thain atom!> are lwt·wt>t>n 0· 1 and 0-97 A. 1'ht> only difference among the conformations i" in the orientaticm of the peptide bt-twet'n res idues i'ill and 51. In W'PC603 this difference is associated with t he Gly re:-;ichw at po,.;ition 50. The >1ide-<:hain>< of L:2 all point towards the surfac·e. The main-chain pac·k,.;

1:34 lie. 14 \"al 1(14 .\la. 4 ~r

Ml Leu . . ii \"al. :38 llt-$14 Lo·u. 44 ~It-I , i \"al . 3 llt­

!H Plw. ~Ii 1\r

8 of 18 Celltrion, Inc., Exhibit 1062

Page 9: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

908 I '. ('/lfJlhin nnd A . • 1/. /,Psk

32

26

KOL LI Figure 4. The C'onformation of the ~I regi.on. of \ 'A

l\:OL. The side-chain of lle30 is buried w1thm the framt-work struC'ture: st'\' seC'tion ~ .

against the consPrved framework residues llf'4 i and Gly6+/Ala6.t (Fig. 6).

K abat "' al. (I 983) gi,·<· t ht· se4uenn•)< of the L:! regions of I 74 \ 'L domains. l n all cases t lw.\' are three residues in length. Of the l 7-i. 122 do not contain (:ly and 49 have. like :\ICPC603, a Cly residue at position :)I). The residllPl< at position 48 and 64 are almost ah:<oh1tf>ly conserved as I le an<l Gh-. These size and sequenN' identities imply that :tl.:Oost all L:? regions ha,•e a conformation close to that found in the known st ructures.

MCPC603

Table 5 DifferenrP.~ in the positions of the framework rP.,idiua

'"~jfm• 11 t to the hypervariable regions i 11

i 111 m·u.noglobul in structures

Hypervariable A<lja <·1>11 t framework Differences in region

LI L2 1,,3 HI H:! H:l

re<iduc·• position (.\}

:?!; 33 fl·:!+I -1!1 53 0·3--0·5 !)() 97 0·8-1·0 :,?;, 33 fl·a 1·2 52 56 0·8-:H 95 ltl~ 0·5-1·2

6. Conformation of the L3 Hypervariable Regions

O<H).S I~~ J.4 ~8-1-2

0.3-1 ·~ l·~-1·7

II ·~ J.i

The L3 region, residues 91 to 96. forms the link 11c·lwf>c·n two adjac:ent :<trands of P-sheet. Our anah ·;;i,. of the strurt ure), and sequen<'e" known for this· region suggests that the large majority of• chains have a common conformation that is quite different from the conformations found in J. chains.

(a) l'A domains

The L3 region of\'; ~ E\DI has six rt'><idues and t hose of KOL and R.11 E ha.1·c· t' ight. :-i111K·rim~iLion of the three regions g il't's th <• following alignment:

91 92 93 93a 93b 9-1 95 96

~E\\'~I KOL RHE

Tvr Asp Arg T;.p Asn St'r Trp Asn ..\'I'

J539

REI

Ser Leu A'l? Ser Asp . .\~n $er Trr Ser Leu ..\>p C:lu Pio

32

Figure S. The C'Onforma111111 of the LI regions of\', ;\l('P<'603. \ '• REI and v. J539. Residues 26 to 29 and 32 hav~: :<ame c·onformation in the 3 strudurt>s. Tht' sidf'-C'hain of residue :!() is buried within the framework structure, ,,. .. 1·tion -L

9 of 18 Celltrion, Inc., Exhibit 1062

Page 10: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

The SfrucfurP of llyprn·ari((b/e Region., 909

52

KOL L2 Figure 6. The C'Onformation of thP L2 region of \'~

KOL. This region packs against framework residues Ile47 and Glyti4.

In a.II three \'i structures. residues 91 to !12 and 95 to 96 form an extension of the P-sheet framework with main-chain hydrogen bonds between residues 92 and 95:

93A.--93h I I

93 _ _ 94 93 94 I I I I

92===95 92= = =95 I I I I

91 96 91 96 I I I I

90= = =97 90= = =97

Residues 93 and 94 in NR\\':11 form a t11·0-n:.,<idut" type II' turn (see Table:?). Residues 93. 93a, 93b and 94 in RHE and KOL form a four-residue tu rn with the same conformation: the r.m.s. difference in the position of their main-chain atoms i:-; 0·19A. This conformation is found in almost all four residue turns that. like KOL and RHE, have(;)\· o r Asn in tht- fourth position of t he turn, positio~ 94 here (Sibanda & Thornton, 1985: Efimov. 1986; and see Table 2).

Ka.bat el al. (1983) listed :?7 human and:?.'; mouse r, domains for which t he seq11enc·(' of the whole of

· t~e .third hypervariable region is known. The ' distribution of s izes of th1t L3 region in U1ese sequences is:

Residue • iu· :\umber of human \', Number of mouse ,.,

:; 6 7

:!!)

i I:?

s 7

In the L3 regions with s ix residues \l'l' would expP<'I . as in :\E\\'}l, 91 to !I:! and !I:) to 96 to continue the P·sheet oft lw framework and 9;3 (.1i 9.J. t<i form a two-residue hair-pin t.urn. Ruff's r1'1 111in~

REI L3 Figure 7. Tht: conformation of t.he L3 region of \'.

REI. The conformation is stabili2ed by the hyclrogPn bonds made ll\' the framework residuP Gln90 and b\' the ri.~ conformati~n of the peptide of Pro95. ·

the sequence and conformation of two-residue t nrn,.; (Sibanda & Thornton. 198!); F.fimov , 1986) a rt> given in Table 2. Similarly, in L3 regions wit.h e ight residues 1rt- would <'Xpe1· t 91 to fl:! and !I;) to 96 to cont inue the P-sheet framework and 93. !I:~'' · 93b and 94 to form a four-residue turn.

(b) J'K d1J//lflilt8

The L3 regions in RF.I. ~l('J>( '@:~ and .J539 art: the same size:

91 !I:! 93 !I+ 9fi !lti

REI Tyr <:In :"'\er· Leu Pro .. r,·r )!<'PC 'fill:l A~p HiR St-r Tyr Pro l.~tt .)!):)9 Trp Thr T.n Pro Leu II<'

In R.EI and :\l< 'P('fiO:l. tlw L3 regions han· the same conformation: the r .m.s. differente in the positions of the main-chain 11t111ns of residues 91 to 96 is O·.J.:J A. L3 in J5a9 ha;; a e;onforma.tion different from that in REI and :\!( 'P< 'lill:l.

Normally, for six-residue looµs , w1· might <· xp('<·t t ht- main-chain a.toms of rt:s idul"s !l2 and!)!') to form hydrogen bonds. and residues 93 and 94 to form a turn (><t-t" t.he di,.;<·ussion of L3 in tht> \"i c· ha ins, ;;('('(ion 6{a). above). This conformat ion is prevented in the two \'~ ,.;t rnd.11 f(',.; R.ET and :\H ' I'( '()(I:~ Ii.\' a Pro residue at position 95. In thest• two \'.­:-;tnwt.ur<:'s . rt-~idue 92 has an 1XL c:onformation and Pro95 has a ris peptide . This puts residues 93 to 96 in an extended c·onformation (Fig. 7). Important determina.nt.s of t his particular L3 con format.ion are the hydrogen lionds formed to its main-cha.in a.toms by t he side-chain of framework residue 90. Though the s ide-chains at po:-;it ion 90 are not identical (Rl<:J

10 of 18 Celltrion, Inc., Exhibit 1062

Page 11: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

-.no C. Chothia and A. M. Luk

has Gin and l\1~0603 has Asn), the amides are in the same position and play the same role: the NH group forms hydrogen bonds to the carbonyls of 93 and 95 and the 0 atom forms a hydrogen bond to the amide of 92 (Fig. 7).

Although L3 in J 539 is six residues in length, it has Leu, not Pro, at position 95 and forms a two­residue hair-pin turn:

Tyr93 _ _ Pro94 I I

Thr92 = = = Leu95 I I

Trp9I Ile96 I I

Gln90 = = = Thr97 Because of the Pro residue at position 94, this turn has a conformation different from those in VA chains and those commonly found (see Table 2): Tyr93 has </>,t/I values - 5I0

, + 13I0; Pro94 has a cu-peptide

and </>,t/I values of -46°, -54°. Kabat et al. (I977) found that residues at

positions 90 and 95 in L3 regions of V,. sequences are conserved and suggested that they have a structural role. Kabat et al. (1983) listed I2I human and mouse V,. domains for which the sequence of the whole of the third hypervariable region is known. The size distribution of the L3 regions is:

Reaidue size Number of human v. Number of mouse v.

5 l l

6 36 81

7 12

2

Of the 117 L3 regions that contain six residues, 93 have Pro at position 95 and <Gln or Asn at position 90. Their size and sequence identities imply that these 93 have an L3 conformation that is the same as that found in REI and MCPC603. A further I6 of the J 17 have Pro at position 94 but not at 95 and are likely to have the L3 conformation found in J539.

7. Conformation of the Hl Hypervariable Region

The HI regions are the same size in four known structures:

26 27 28 29 30 31 32

KOL Gly Phe Ile Phe Ser Ser Tyr MCP0603 Gly Phe Thr Phe Ser Asp Phe J539 Gly Phe Asp Phe Ser Lys Tyr NEWM Gly Thr Ser Phe Asp Asp 'fyr

They pack across the top of the V domain (Fig. 2). Padlan (1977b) noted that the folds of HI in N J<:W:\t and MC'PC'603 are very similar. They a.re also i1imilar to those fn und in KOL and J539 1Fi1t 8) For he!'• four structures t he r.m.s. diHerenccs in ti-. i'\l4it.ion of tht ri.~ i· h>tin atoms

"l' ~ ~i., ~ · .. -·\.\· -~r. •)·I!. and l ~ A. Small ( - "tO OOC-Jl ·.' .. P.._ .10 t·.;. :::!. ; Fig. 8), which

\nsti. H · •. .i:! ••• ~'"z;~ l(<u 'o be the

result of changes in conformation and residue identity in H2.

In the observed HI structures the Gly at position 26 produces & sharp turn through a </>,t{! value ( + 75,0) outside the range allowed for non-glycine residues. The Phe at position 29 is deeply buried within the framework structure, packing against the side-chain of residue 34 and the main chain of residues 72 and 77. The residues at position 27, Phe or Thr, a.re partially buried in a surface cavity next to residue 94. In the four structures the residues a.t positions 26, 34 and 94 are identical or similar; Gly, Phe, Met/Tyr and Arg. .

Kabat et al. (I983) listed I85 human and mouse V" domains for which t he sequence of the first hypervariable region is known. Of 178, 170 are the same length as those found in the known structures, one mouse sequence is one residue longer, and six human sequences are two residues longer.

Of the 170 with seven residues, there are 115 for which the residue at position 94 is also known, Of these, three-quarters have residues at positjoJi.1{:26, 27. 29, 34 and 94 that are the same as or very close to those found in the known structures:

Residues Number of sequenOl!ll

26 27 29 34 94

Gly Phe Pbe Me~ Arg 50 Gly 'fyr Phe Met Arg 29 Gly Phe Phe Met Lys 4 Gly Tyr/Phe Phe Ile Arg/Lys 5

The conservation of the length of the loop and of the residues at the sites involved in the packing of HI against the framework implies that in at least these VH domains the conformation of HI is close to that found in t he known structures.

~ 8. Conformation of the H2 ..... Hypervariabie Region

The H2 region forms the link between t~e framework residues 52 and 56, which are tn

adjacent strands of P-sheet. The H2 loops differ ~ length in the known V" structures: in NEWM• 1t cont.a.ins three residues, in KOL and J539 four, and in MCP0603 six. Kabat et al. (1983) list 127 human and mouse V" sequences for which the sequenoe of the whole H2 region is known. In all but one, H2 has a length that is t he same as one of the known structures:

Residue aize of H2 Number of eequenC88

3 13

4 71

5 I

6 42

The 42 H2 regions with six residues are all mouse sequences in subgroup III.

The three residues in the H2 region of NEWM (Tyr53, His54, Oly55) form the apex of a seven'

11 of 18 Celltrion, Inc., Exhibit 1062

Page 12: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

The Structure of Hypervariable Regions 9ll

26~). , •

NEWM J539 Figure 8. The conformation of the Hl regions of VH KOL. \ 'H NEWM, VH MCPC603 and VH J539. The side-chain of

Phe29 is buried within the framewor k structure.

residue turn. The other four residues in the turn are part of the framework structure:

__.......,His54~ TyrM ,.... Gly5fi

I_,,.,,.--- I Phe52 _ __ Thr56

I I Val5l Ser57

I I Tyr50 = = = Asp58

The conformation of seven-residue turns is described in Table 2. The conformation found in NEWM is the conformation found for nearly all seven-residue turns that have Gly, Asn or Asp at the fifth position , position 55 in NEWM (Sibanda & Thornton , 1985; Efimov, 1986). Of t he 13 three­residue H2 regions listed by Kabat et al. (1983) , nine have a Gly residue at position 55 and four have Asp. We would expect these H2 regions to have the conformation found in NEWM.

The H2 regions in J539 and KOL, 52a to 55, form four-residue turns:

Asp53 __ Ser54 I I

Pro52a Gly55 I I

His52 = = = Thr56 J539

Asp53 __ Gly54 I I

Asp52a Ser55 I I

Trp52 __ _ Asp56

KOL

The conformation of these turns is determined by the position of the Gly residue. H2 in J539 has the co~formation most commonly found for four­res1due turns (Sibanda & Thornton, 1985; Efimov. 1986; see Table 2): the first three residues are in an approximately aR conformation and the fourt h

(Cly55) in an <XL conformation. H2 in KOL is different: Gly54 is in an <XL conformation and the other t hree residues are in an aR conformation.

Of the 71 H2 regions with four residues, Cly. Asn or Asp residues 01;1.:ui- at position :J4 in ten cases, at posit ion 55 in 12 cases and at both positions 54 and 55 in 32 cases. Jn those with a Gly, Asn or Asp residue at position 54 only, we should expecl an H2 conformation like that in KOL. In t hose cases

94 Arg

IOOc Ph~

49 Tyr

MCPC603 H3 Figure 9. The eonformation of thf' H:l rP~ i11 n of

;\l( ' J'('t)O;{. BPsid 111 .. IOOh T;vrlOOf, pa<·k" agaim<t Tyr.J~ I of t lw \·,_ domain . Phe.IOtlr- al~o f'<t•·ks int.he \ 'L \ 'H interfal:f'. R'' " id11..-s .-\rg<J4 and A><f'IOI lnnil >1 ><alt 1, , ;.1 •.•

12 of 18 Celltrion, Inc., Exhibit 1062

Page 13: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

'912 0. Chothia and A. M. Lesk

where a Gly resftlue occurs at position 55 we should expect an H2 conformation like that in J539.

The six residues in the H2 region of MCPC603 are part of a ten-residue hair-pin tum. At pr~nt there is too little experimental and theoretical evidence to formulate rules governing the conformations of such large turns. Therefore we do not know if the other six-residue H2 regions in the mouse subgroup-III have a conformation close to that found in MCPC603.

9. Conformation of the H3 Hypervariable Region

The H3 region consists of residues 96 to IOI. The VH structure is formed by the recombination of three genes: VH, which codes for residues I to 94 or 95; D, which codes for between one and 13 residues, and JH (for a review, see Tonegawa, 1983). There are six human JH germline genes that code for the following amino acid sequences:

conformation of H3 is determined mainly by the interactions of residues Arg94, Tyrl()()b, PhelOOc and AsplOl within the VH domain and at they -V interface. (The importance of TyrlOOb-PheloOc i~ ma.king the conformation of H3 in MCPC603 different from that in NEWM was noted by Padlan et al. (1977).) The side-chains of residues 96 to lOOa are on the surface of the protein.

Arg at position 94 packs across the H3 hair-pin and forms a surface salt bridge with AsplOI. TyrlOOb and PhelOOc pack in the VL-VH interface (Fig. 9). Residues at positions equivalent to J()()c are usually part of the conserved core of the VL-VH interface (Chothia et al., 1985). Residues at this position are Phe or Leu in 83 % of known sequences. In MCPC603, TyrlOOb packs into a large cavity adjacent to Tyr49 of VL (Fig. 9). The hydroxyl groups of both Tyr residues are on the surface. Tyr or Phe occurs at position 49 in 82 % of Vt domains. Different residues at the position equivalent to lOOb can produce different H3

JH1: Ala· Glu· Tyr· Phe- Gin· His- Trp· Ciy- Gin· Gly- Thr- Leu- Val· Thr- Val· Ser· Ser JH2: Tyr· Trp· Tyr· Phe· Asp· Leu· Trp· Gly- Arg- Gly- Thr· Leu- Val- Thr- Val · Ser· Ser JH3: Al&- Phe· Asp· Val- Trp· Gly· Gin· Gly- Thr- Met· Val· Thr· Val · Ser· Ser JH•· Tyr· Phe· Asp-Tyr- Trp- Gly- Gin- Gly- Thr· Leu· Val· Thr· Val · Ser· Ser J"5: Asn-Trp· Phe· Asp· Ser· Trp· Gly- Gin· Gly· Thr· Leu- Val· Thr- Val · Ser· Ser JH6: Tyr· Gly· Met- Asp· Val- Trp- Gly- Gin· Gly· Thr· Thr· Val· Thr- Val · Ser· Ser

(Ravetch et al. , 1981) and four mouse genes that code for the following amino acid sequences:

JHJ · Trp-Tyr- Phe· Asp- Val- Trp- Gly- Ala- Gly- Thr· Thr- Val· Thr· Val· Ser· Ser· HH2: Tyr· Phe· Asp· Val- Trp- Gly- Gin- Gly- Thr- Thr- Val- Thr- Val · Ser- Ser J,13: Trp· Phe- Ala· Tyr· Trp· Gly- Trp· Gly· Thr· Leu· Val· Thr- Val · Ser· Ala J11,: Asp· Tyr· Trp- Gly- Trp- Gly- Thr· Ser- Val· Thr- Val · Ser· Ser

(Sakano et al ., 1980). Because the joining ends of the D and JH genes can be varied, the residues coded at the beginning of JH genes may not be present in the flnal structure. Further sequence diversity in this region is produced by somatic mutations. So it is not surprising that the H3 regions of J539, NEWM, MCPC603 and KOL differ greatly in size (6, 7, 9 and 15 residues, respectively), sequence and conformation. Here we shall confine our discussion to the H3 region of MCPC603, as our analysis suggests that its conformation is found at least in part in several other immunoglobulins.

The H3 region in MCPC603 forms a large hair-pin loop:

ThrlOO __ Ser99 I ', I

TrplOOa ', Gly98 I I

TyrlOOb "° Tyr97

I "°"° I Phe I OOc '' Tyr96 I I

Asp IO I ..... , Asn95 I ', I

Val102 __ _ Arg94

F~r ::.ul·h l~rgE> loop::i t he range of allowed <P.t/I values wi ll pE>rm1t sl:.'\·eral conforms.l ions and the one act~ally found will dt>pt-nd upon the packing l\llarn"I tht.- n•st 111 t lw protein. In ~IC PC'603 the

conformations. For example, in KOL it is Gly and the cavity adjacent to Tyr49 in VL is filled by Phe at a position equivalent to lOOa. This contributes to making the conformation of H3 in KOL very different from that found in MCPC603.

The sequence Tyr-Phe-Asp at positions 100-100· 101 is found in the human genes JH2 and JH4 and the mouse genes JH1 and JH2 (see above). The human JHs gene has Trp in place of Tyr. If these residues are not removed during gene recombine.· tion or by somatic mutation, we should normally expect these J 4 genes to produce an H3 conforma· tion close to that found in MCPC603. We inspected the sequence tables of Kabat et al. (1983) for U3 regions that are at least six residues in length, have an Arg residue at position 94, Asp at position 101 and Tyr-Phe in the two positions preceding 101, i.e. those t hat have the form :

n Tyr100 ........ X97

I ',_::-,, I PhelOO ........ X96

I I AsplOI ........ X95

I ' .... I X l02 ____ Arg94 11

13 of 18 Celltrion, Inc., Exhibit 1062

Page 14: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

'l'he 8trucf11re of Hypercoriab/(' Regfo11« 91 3

Kabat el al. (1983) listed the entire H3 region for 28

human and 77 mouse sequen<·t>s. Of these, one

human and 48 mouse sequences fulfil t'xartly the

si:i.e and sequence conditions. Another five human

,equenC't'<' are close in that the.'· differ on ly hy

hill'ing a Lys residue at posit.ion !H. Plw or 1'rp in

place of Tyr. or :\lt't in place of Phe. The size

distribution of the H3 regions in I hesE> :._. s<'f111enti>s

is:

H3rtsidue length 6 1 8 9 10 II I:! 13 14

:\umber of sequences 18 4 I Ii l!O 3 " ti I

For thest' sequences Wt' would expect the s tt•m of

H3 to have the same conformation as that found in

~f('PC603. The conformation of the remaining

(distal} part of the small and medium-s ized H3

loops may be given hy tlu· turn rules described in

Table2.

10. The Effects of Environment on the Structure of the Hypervariable Regions

The descriptions of the hypervariable regions

given abo,·e suggest that their main-chain

l'Onformations are determined ;;olt>ly hy particu la r

residut>i< within each region. In reality we should

expect the conformations t.o be affected LI\· their

e11"i ronment. The effects on a particular region can

be di,·ided into two parts: loeal ehanges in

conformation and changes in the relati,·e position of

the region in the binding site. A measure of the differem·r in conformation of

two peptides is the r.m .s. difference in position of

their atoms after they ha,·e been optimalJ~·

superposed. In the sections abov(' \\'{' report the

r.m.s. differences for the main-chain regions of

hypervariable regions with t lw same fold in

different immunoglobulin s tructures. The r.m.:<.

differences in position are small. For the :<t ruc·t ures

determined at high resolut.ion the~· are less than

O·:'i • .\. For those determined at medium resolution

tht'.\' are usually l ·O A or less and are due mainly to

differenc•es in the orientation of peptides. It i:< only

m the H l region that we find significant. though

small, differences in conformation (see sN·tion 7.

above). To determine differences in the r·elati,·p positions

of hypervariable loops in the immunoglobulin

structures we did the following calculation. The Fab

protein;; ~EW.l1 . :\1CPC603, KOL and J539 were

s.uperposed h» a fit ofthc V1..- VH framework residut,s

hsted in section 3, above. The ,.L st ructures REI

and RHE were superposed on the Fa.bs l1y a fit of

t~e \"L framework residues. r\ fter the sup~rposition of the fram ework, we calculated the additional shift

required to superpose h~·pervariable regions of the

same fold; for example, the common residues in the

LI regions of J539, REI and :\J('T'( ·11oa. .The results of these calculations arc· given in

Fig.ure 10. Jn t h(• Fab proteins. hypervariable

regions of the same fold differed in position liy 0·2 10 I ·5 . .\ . [n part these differt>1wes occur because,

although the \'cVtt dimers have the same 1mtt t-rn

of resi~ue <"Ont.a(·tl' and \'I' '".' ' similar paC'king

J.!eOmPt r1es (Poljak el 11/.. I !li!i: Padlan. J 979: l'hothia ef nl., I 985). I here are small differences in

the ori l'lllat.ion of rH l't' lat i\'(• to VL (Da\' it>s &

:\lctzg<•r. 1983). The RP.I \ "1.. stru('t.ure was dett>rmined from a

13enc·t'·Jones protein. This contains a \ 'L_, .L dimer

.ind their pa<'king in RET is , .•. ,T similar tu the \ ' -

\ 'H packing in the Fab:s (Epp. el al .. 1975). The

positions of the REI hypervariable regions rt'l11tive

to the framew ork are the same as those that o<'cur

in the Fahi; (Fig. 10). I n t ll('se stn1l'tures, t herefore. differences in t.he

en,·ironrnent of the h,\'(l<, n ·a r iable rPgions prodm«·

onl!' small differences in main-chain conformation

and differences of no more than I ·5 ·' in their

position relative to the framework. In the Benee­

,Jom·,; protf'ins RH E and :\IC'O. the hyiwn·ariable

regions have erwironments \ 'Pl'\' different from

I hose I hat would normally I~ found in the

immunoglobulins. ·

The pa<"king of \ 'c \ 'L dimer in RHE is quite

different from that found for \'L - \·H dimers (Fur('y

et al., 1983) and s11 thP l·n,·ironments of its

hypervariable regions are ,.Pr.v different from t hose

found in tht" other immunoglobulins d is(•ussed here.

These differences in eiwironment have little l' ffeet

on the conformations of L1 . L:! and L3 in RHE:

they fit the homologous regions in Fab KOL. with

r.n1.s. differences in their co-ordinate:< of less than

0·3 ..\ (see above). 'T'lwy do htH'f' i:;<1rne offe<'t on the.

.- --, •L3 1 RHE I I

'L2: VS

: I I LI I KOL 1 I

HI MCPC J539 NEWM

HI 603

I .--i ! •L3• HI I I

HI L2 L2 L2 L3: I :c2: .--i

: LI LI LI L2: I LI I

0 .0 0 .4 0 .8 1.2 1.6 2 .0 2 .4

Differences in Relative Position CAI Figure 10. l>ifft•rt>111·t's in tht' Jl<>i<tlion. r••lative to the /3·

shet>t franwwork . of homnlogou~ h.\'flt'rvariahlt> r+·i.:ions.

Tht• method uS('d to determine the differt>rl<'f's is d1·,wribt-d

in thf' IP.xi . ( 'ontinuou:; li11 ps ~·ndc1:•1• thr diffe rt>nC'f's found

llf'tmfn hypen·ariable rt-gi1111:< in Fab ><t ruc·tun·s. Brokrn

linl'S enclost- tht' differen«•'" found 1,,..1111•1•11 hyper"ariable

1~gions in Bf>rn·.-.. J,.m•,, p1nt1•ini; anrl F'abi<. Thf' lar~"

differl'nC-ei< found brtw<'t'tl thr rei::ions i11 RHE and thP

Fahs art' la~lled . Differt-lltl'S ll't-1'<' dett·n11ined for the

relativt> J>Ol!ition of the main·<·hain atom:< of thl' ,. .. s1d1..-,.

of: LI in RHE. KOL and ~!<:\DI :l6 to :l:!: LI in .)539. REI and ~ll'P<'603 :!Ii to :!!I and :t!:

J,:! in RH K KOL. J 539. REl and ~H'PC'tio:~. :"1t11" :;:.?:

L:lin RHEanc! KOL. 9 1 to96: L3 in R 1-;1 anc! :\l<'P('60:3. 91 to !Iii. and

HI in KOL. X h:\\' .'.II. ;\1('1'('603 anti .J539.

14 of 18 Celltrion, Inc., Exhibit 1062

Page 15: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

91-' f'. Chothia and A . JI. Lesk

po~ition of LI , L2 and L3 relativ.e to the framework. The positions in R HE can differ by up to 2·2 A from the positions found in the F abs (Fig. 10).

Jn the structure of the Bencl'-.Jones prott>in )f('(;

(Sd1iffer et al .. 1973). a more complex situation is observed. The c·n·iltal of this prot.ein has the dimer in the 11"vmmetric unit, with the two VL monomers in diffor~nt environments. The LI region of onl' monomer is in the lwlical conformation that we would expPct from its sequenee. The Ll region of the other monomer is pre,·ented from having this conformation by the close approach of residues 31 and 32 to a neighbouring molecule and i t is quite d isordered (Sthiffer, 1980). The observation that t he close conl aC't produces disorder, rather than an altc·niittin• conformation , suggests that t ht· L l region has only a limited flexibility.

11. The Residues that Form the Immunoglobulin Binding Sites and their Surface Area

(a) R~.~id11P.~ in the region of thP hindiny .~ifP8

In the preceding sections we han· made a prt'1·ise structural distinction het ween two parts of the variable domains: the conserved P-sheet framework and the regions of variable main-chain conforma-

tion. What a.re the contributions of these two parts to tlw antigen-binding site?

T he hypervariable regions cluster at one end of the \ 'L - VH dimer and present a surface, part of which interacts with the antigen. Figure 11 shows a spa~·e-filling drawing of this region in )ICPC603. The residues accessible to the solvent in this part of the prott'in art- 27 to 32, 49 to 53 and 92 to 94 in r and 28 to 33, 52 to 56 and 96 to IOOa in VH· Th~ regions outside the P-sheet defined in section 3 art': 26 to 32, 50 to 52 and 91 to 96 in VL and 26 to 32, 52 to 56 and 96 to 101 in rH. The limits of some of the regions of a(·tt-s:sible residues difft'r by up to two residues from t.he limits of these regions. In Table 6 we list tht· accessible n•sidui>s that form the same region in J539, K OL and XEW:\l. The limits of the accessible loops in KOL, ~EW)l. J539 and MGPC603 a re very similar but not identical. Taken together they show that the residues a\·ailable for binding to antigens a.re largely those in the structura lly variable regions defined in section 3, above. \ ·ariations of loop size and sequence may result in one or two residues at the loop ends becoming buried or one or two framework residues becoming exposed.

Except for H2, the regions listed in Table 6 are si milar to the c·omplementarity-determining regions (CDRs) determined from sequence variability

.• :~gu~e 11. I Jnrn ing of a !lparf'·fill~ng model of the hyp1>rq1riahlf' regions of MCPC603. \\'f' show the superposition of5 s\~ <t1ons rut throuJ(h •l model at 2 .l. intervals. ,Just abov~ the seetion i;hown here are residues 3la to 3 Id in \' and ii:!(· io

H· L

15 of 18 Celltrion, Inc., Exhibit 1062

Page 16: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

The St rurt 11 re of H .IJJH'tl'firiuh/1- Regions 915

Table 6 / 11111111noglobuli11 bind1'.ng sites: the re11id11es a11t! tlwir tll're88i/JI,, .~ 11 1fm·,, ,,,.,,,,_. ( . .P)

Rtgions of Residues a1 .. 1'SSible in the region of thf binding "il.fs

\'ariable KOL main·rhain

:\ E\\')1 )ll'PC'1~>:1 ,Jl)39

structure Rt>sidues A.K.-\ . Residues A .S .\ Rcsidu ... .\ s A. Rt>•idues A.SA.

LI :26-33 :!i- 3:! :!70 27-33 L2 . iiO- ;;:! 4!) 53 360 L3 : 91 96 91 - 94 :!+n 9 1-96 HI :!6- 32 :!8 :l:! :!::!II 28-:13 H:! : 53- 55 52- 58 4-10 fd- 58 H:l :96-101 96- lOllg 600 96- 100

Total A.$.A. 2190 -----A.s A .. mean accessible ~urfac~ area.

(K11.ba.t et al., 1983). H2 in Table 6 covers residue1< 50 to 58; the corresponding C'l>R covers residues 50 to 65. Padlan (1977a) found that t he fi rst three and last six residues of this UDR had the same structure in NEW:\! and ;\JCP0603. We find that this is also true for KOL a.md J539. Residues 59 to 65 run down one side of the \ '" domains and art' fairh· remote from the other hy pervariable regions. Th·e side·chains of 59 to 65 a,re acrt>1<:-;ihle t.o the solvent and the variation in their sequence may refle<'t only a lack of structural and fund.ional constraint.

(b) Surface area of residues in the binding .~i11-.<

Table 6 also lists the accessible surfaee a reas of the loops that make up t he binding sites. H3 in KOL is unusually large, 15 residues, and it makes the largest cont ribut ion to the tota.I surface. This is not the case in J539, XEWM or MCPG603, in which medium-sized H3 regions make contributions similar to these of the other loops. The important role of H3 in antibody specificity a rises not from il1< siie but from its central posit ion in l he binding site (Fig. II ).

The total accessible surface a rea, of the region of the hypervariablc loops in J539. KOL and ~l('PC'603 is -2250 A2 ; in NEWM, which is unique in having I.2 deleted, it is 1760 A 2 (Table 6) . Analy­sis of oligomeric proteins shows that the cases where the structures in the isolated and associated state,; arl' \'t'r~ similar, stable associations are formed by surfaces with surface area that are smaller than the total found for these binding sites (Uhothia & J anin . 1975, and our unpublished work). Typical!."' each monomer buries 500 to IOOO A in the interfaC't, a. quarter to half of the total accessible surface a rea of lht hypervariable loops. The number of hydrogen bonds and salt bridges in t hese interfact·,; varies. (T n tho!\e cases in which association involves <"hanges in structure or the stabilization of loops that do not ~iave a fix ed structure, larger surface areas a re involved.)

The expectation t hat antibody- prot.ein inter-

-1.-.0 :!7 :J:! 770 :!7- 30 :!411 49 53 190 40- 5:1 310

:!.i4 !l:! 94 lift 9 1 !16 330 380 :!II <l3 :!40 :!X- 33 :J6() +30 52- 56 620 .".U-ll8 430 :!i'>IJ 96- I OOa :1:10 ~1:\- 1 00 .~i\11

1764 :!:l:.?tl :!~:!•>

actions would involvl' surfa.('E's similar to those found in oligomeric proll'ins is supportt'd h~· a dE'sniption of the <·omplt>x formed h.\' irnmuno· g lobulin Dl.3 and the antigen hen egg·whit<­lysozyme (Amit 1-I al .• 1986). T ht- association dm-s not involve significant changt-s in main -chain conformation. 'fht- antil111d\' rt-1<id uE:'s that make contact wi th the lysozyme a r~t- 30. :l:! . .i!l to 50. 91 to 93 in \ .1. and 30 to :!2. 52 l o !'l-l and 96 to 99 in \ ·H· The interface consist:< of 690 ..\. 2 of t he antiboch· surface and 750 A 2 of the Pn1..\'lllt' surfa<·t>. ·

12. Conclusion

Jn this paper \l't• have attt:'mpted to identi~\' t he residues t hat determine t ht· t'onformations nf the h,\'pernlriable regions. \\'e have proposed that . if t.he residues we have identified an· found in t he sequences of other imrnunoglobulins, t h('ir hyper· variable regions will IHH'e Hie same <'Onforrnationl' as those found in the known strud ure that shares t he same cha.raderist i1· residues. The an <1 l\'s i1< of LhE' immunoglobulin sequences implies that n;ost of thE' hypervaria.ble regions hiwe one of a small si·t of main-chain conformations. \\'l' ('all thes(' eommon conformations "canonical stru<"tur<'s".

The atomic struC'lures of the \ '., domains All and ROY (Fehlhammer et"/.. 1975: C'olman Pl fll .. 1977) givt' suppor t to some of the conclusions of our analysis. The 1<E'quen<"t's of these two prolt'ins diffE'r from that of RE1 at 18 and 16 positions. respectively. Eight of the t h<1nj!1>cl po1<il io111< are in hypervariable regions:

J>u..,ttion :10 31 3:! 00 Ill H:! 93 !16 REI Ill' L,I'' Thr C:Ju Tyr Ian :--: ... r Tn Al' l-it-r .hp Tvr .\<p Tyr A~p Tyr T~µ ROY ~r lie Pht .-\xp Phe .-\..;11 Asn Leu

T he residue changes a.t positions :m. 31 . 93 and 96 involve large differences in rnlume and "hem intl character. However. from the ana l.\·1<is g iven a hovt' we would not t•xpe(·t. t lwr11 to prodUl't' a ma.in-chain conformation for LI a nd L3 different from that found in R.E1 and in fad no differt·111·t-<' in main-

16 of 18 Celltrion, Inc., Exhibit 1062

Page 17: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

916 r·. Glwthi11 and A. M. Lesk

chain tonformation are seen (Fehlhammer et al., 1975· Colman "' al., 1977). T~ test the a<·tura<·y of t he analysis we ;lf'f>

applying our result.:< to pre~id t he strm·t~res of the \'ariable domains of new 1mmunoglobulms. In all l'a><!:'s our predic·tions are being recorded prior to the d!:'termination of the structures by X-ray analysis.

There is a fundam i>ntal difference 111.•t wf•en the> met hod of predic•tion based on the work described here and t he met hods used by previous workers (Padlan et rd .. 1977: Da,·ies & Padlan, 1977: Potter et rt! .. I 977: i-;1a nford & Wu , 1981 : Feldmann et al., 1981: de la Paz et al.. 1986). Those authors c·ompared the sequenc·es of the hy pervariable regions in their immunoglobulins with the sequen('es of t.he corresponding hypervariable regions in the known structures and then built a modt'I of t'ac·h loop from the region dosest in size and o\'erall sequence homology. In some cases ad­hoc adjustment::- were made to accommodate differences in sequence (Padlan et al., 1977: Feldmann r l al., 1981).

In the prediction method based on t he work described her!:'. we are only concerned with t.he presence in the sequenc·e whose structure is to be predicted of t he few particular residues t hat are responsible for the canonical structures. For example, to determine the conformation of Ll we would examine t he residues at positions 2, 25, 29. 30, 33 and i 1. If t.he residues found at t hese positions matched one of the sets listed above in sed,ion 4. we would ex pec·t LI to have the corr!:'><ponding canonical ><truc·ture what ever residues oc·c-urred at t he other positions. If t he residues at these positions did not match one of those sets, we would not expect one of the known ('anonieal strudures, however close the homology in the rest of t he hypen·ariable region.

A prediction was made for the st ructure of immunoglobulin Dl.3 and sent to the group carrying out the X-ray analysis prior to its structure determination (Chothia et al., 1986). The «Onformation of the main chain was prediC'ted using the analysis desC'ribed here; for the tonformation of the sid;-chains we used a procedure described Jffe,·iously (Lesk & Chothia , 1986). After t he predidion was made. the atomic structure of Dl.3 was determined from a 2·8 A electron density map (A mit ~/ al .. 1986). Predictions were made of the framework ><truc·t ure and of each of the six h~·pervariable region:<. Of th!:' 62 re::;id11es buried within or betwef' n the domains (Table 3) , 56 in Dl.3 are identical with those in the known structures and t he other !;ix differ hy no more t han a methyl group. The predid ion that the structure of thf' {J­i;heet fram .. work of D 1.3 is the same as that in the known st ruct.un·>< wa::- confirmed bv tlw ervstal slruc-ture analvsis. · ·

Three of th~ h~·pervariable region:< of Dl.3 (Ll , LZ and H:!) nre th\> same $ize a::: one of the known 1·anonic-al st ructures and , at. lht;> sites wt· identified 1ls important in de1ermining t heir con formation . 1lw_v c·ontain ident i1·;\l residue><. T he predit'tion t ha 1

tlw folds of these three regions would be close to t hose of t he c·anonical structures was confirmed by c-rystallographie analysis (Chothia et al ., 1986). ·

The other t hree hypervariable regions hav~ sc4uen<·es t hat are the same as, or similar in size to. known c·anonical structures but, at tht· silt>i; responsible for their conformation, they have similar but not identical residues. In making predictions of th t- structure of t hese regions. we had to judge whether the differences would produce a different main-chain conformation. Our prediction that t hey would not was correct in one ('ase, H3, and partly incorrect in the other two, L3 and HI .

This one test carried out so far supports our assertions that we have identified the residues responsible for t he conformation of the hyper­variable regions in the known structures, and that if these partic:ular residues occur in other immuno­globulins t heir hypervariable regions will have the same structure. It also suggests t hat when the residues are not ident ical it is difficult to predic·t the structure with confidence, even if t he changes are small. Further predictions have been made for the structure of the ' 'ariable domains of four imrnuno­globins whose X-ray analysis is in progress: DB3, NC4l. H20 and NQI0/12.5 (~tura et al., 1987: Laver r-1 al., 1987; Mariuzza et al., 1985). Co-ordinates of the predicted structures have been sent to the groups carry ing out the structure analyses.

Our analysis of t.he immunoglobulin sequences shows that many of their hypervariable regions form one of t he canonical struct ures found in the six \'L domains or four VH domains of known structure. The conclusion that most hypervariable regions have one of a small num ber of main-chain conformations may have only limited application to H3, where t he \'ariation in size and sequence is much greater than that found in the other regions. Our analysis does suggest, however, that half the H3 regions in t he known mouse sequences have a conformation that, at least in part, is close to that in MCPC603: a point confirmed by the successful prediction of H3 in Dl.3 (Chothia el al., 1986).

The a nalysis of additional antibody t r>1stal struetures will extend the repertoire of canonical structures. Attempts to predict additional structures, and their t est.s after the structures hal'e been determined cryst allographically, will improve our ability to understand the effects of the changes that <·an 'o<"rnr in the residues responsible for their conformation.

The predic-tion of antibody structures is of use not only in t t>s1 ing t he accurac>· of our identification of the residues responsible for the conformation ?f canonical structures. It is of central importance m engineering antibodies of a prescribed specificity.

W.- thank John Cresswell for the drawings, Ors . .\. Feinst.ein . .\I. Levitt. R. :Vlariuzza and G. Winter for dis<'ussion and comments on the manuscript. and the Ro.ml Sot"it>t\'. the U.S. National Scienre Foundation ( 1'<':'11 8~-21117 ·1 ), thE> National Institute of General ~l eclic-al Sciences (I : ~L25435) . and the European Molecular Biology Urganization for -' upport.

17 of 18 Celltrion, Inc., Exhibit 1062

Page 18: 1 of 18 Celltrion, Inc., Exhibit 1062...Cyrus Chothia and Arthur M. Lesk 38 1 of 18 Celltrion, Inc., Exhibit 1062. J. Jin/ Biol. ( 1987) 196 . 90I -91i Canonical Structures for the

The 8trw:l1ire of ll!JpPr•'flri11h/~ RPyions 917

References

Amit . A.G .. Mariuzza, R. A .. Phillips, 8. E. \'. & Poljak. R. J . (1986). Srie111'P. 233. 747-753.

Amzel. L. )I. & Poljak. R. J. (1979). A 11111c Rer. Biochern. 48. 961-997.

Bernstein. F. l' .. Koetzle. T. F .. Williams, C. J. B., '.\l~rer. E. F., Brice, .M . D., Rodgers. J . R .. Kennard. o . .' Shimanouchi. T. & Tasumi. M. {1977) . • J . • l/ol. Biol. 112, 535-542.

Chothia, C'. & J anin, J. (1975). Nature (London). 256. 705- 708.

Chothia, C .. Novotny. J., Bruccoleri , R . & Karplus. l\I. (1985). J. Nol. Biol. 186, 651--663.

Chothia. C .. Lesk. A. M .. fa•vitt . M .. Amit. A. G .. . \lariuzza. R. A .. Phillips. S. E. \'. & Poljak, R. J . (1986). Sl'iP>lfP.. 233. 755- 758.

Colman, P. M., S('hramm, H. J . & Guss, J . :\I. (1977). J. J/o/. Biol. 116. 73-79.

Davies, D. R. & Metzger. H . A. (1983) . .-1 111111. /fo" lmmunol. 1, 87- 117.

Da,•ies. D. R. & Padlan. E. A. (1977). ln .-lnlibodie.~ i11 Human Diagnosis and Thera.py (Haber. E. & Krause, JI.. eds) p. 119, Raven, :-;ew York.

de la Paz, P., Sutton. B . J., Da~ley. :\1. J. & Rees. A. R. (1986). Eil1BOJ. 5. 415--1-:?5.

Efimov, A. (1986). Mo/. Biol . (C.8.S.R.) . 20. 250-260. Epp. 0., Latham, E., Shiffer. ~!..Huber. R. & Palm. W.

(1975). Biochemistry, 14, 4943-4952. Fehlhammer, H., Schiffer, :'ll., Epp, 0 .. Colma n. P. :\!..

Lattman, E. E., S<·hwager. P. & Steigemann, \\' . (1975). Biophys. Strurt . .lfrrh. 1, 139- 146.

Feldmann. R. J .. Potter, ~I. & Glaudemans. ('. P. ,J. (1981). llfolec. /11111111110/. 18. 68:~-698.

Furey, \\'., Wang, B. C .. Yoo. C. K & Sax. M. (1983). J. Mol. Biol. 167, 661- 692.

Kabat, E. A. (1978). Advan. Protein ('hem. 32. 1- 75. Kabat, E . A .. Wu, T. T. & Bilofsky, H. (1977) . ./. Biol.

( ' flPlll . 252, 6609--6616. Kabat. E. A .. Wu, T. T., Bilofsky , H .. Reid-Milner, M. &

Perry. H. (1983). Sequences of Proteins of Immunological lnterest, 3rd. edit., P ubli<' Health Service, NI. H. Washington, D.( '.

Lan·r. \\' . G .. Webster. R. G. & Colman , P. M. (1987). Virology. 156, 181- 184.

Lee. B. K. & Richards, F. :'11. (1971). J . .llol. Biol. 55. 379-400.

Lesk . .-\ . ;\I. (1986). In Biosequenc.es: Per.<peclirP.s and flsn 8errire.~ i11 Europe, (Saccone. <: .. ed.), pp. :?3- :?S. European Economic Commission , Strasbourg.

Lesk, A. M. & Chothia, C. (1982). J. 111ol. Biol. 160. 325-:l~:!.

Lesk, .-\. \1. & ('h•>lhia . ('. H . (1986). Phil. Tmn.,. R. Sot'. fond . . ,n. A. 317. 345-356.

:\larit1zza. R. A .. Boulot .. ( ;., Guillon. \' .,Poljak. R. J .. Berek. C' ... Jarvis. ,J . '.\I. & \l il><t,.in. C'. (1985) . ./. Biol. f'hn11. 260. 10268- I02i0.

:\larquart, ?\I. & Deisenhofer, J. ( 1982). ln1111111tt1/tJffY

Today. 3. 160-166. '.\larquart. '.\I., Deisenhofer . . J .. Huber, R . & Palm . \\'.

(l980). J. Mol. Riol.141 . :369- 391. Padlan, E. A. (1977a). Pror.. Nat. A.cad. Sl'i .. l'.S .A . 74.

:!:35 1- :?5!i!i. Padlan, E. A. (1977b). Q. R~,. . B'ioplty.s. 10. 3n- 65. Padlan. E. A. (1979). Mo!. !111111 1111<1!. 16, 287 :!!Jo. Padla.n, E. A. & J>a,·il's. 0 . R. (1975). Proc. Nf// . . -l f'a1/ .

Sci .. c.s .. -1 . 72. 819-823 . Padlan. E . A., J>a\'i""· D. R .. Pt'dll . I .. (; i\·ol. D. &

Wright.('. (1977). Cold Spring Harbor S!fmp. Quant. Biol. 41. 627- 637.

Poljak, R . . ].. Amzel. L. '.\1. , Chen. B. L .. Phizat"kerley. R. P. & s.i1w . F. (1975). /1111111111"'.l''11~r i"·'· 2. 39:!-:l94.

Potter. :\I., Rudikoff, K . Padlan. E. A . &. Vrana .. ~I. (1977). In Antibodie.~ i11 Hu111a11 Diag1w11i11 a11d Theravy (Haber. E. & Krause. R. '.\I. eds). pp. 19-28. Ra\'t>n, New York .

Ra\·et('h . J. \ '., Siebenlist, U .. Korsm t-.\'t·r. S .. Waldmann . T. & Leder. P . (1981). ('ell. 27. 583-59 1.

Sakano, H .. :\laki. R., Kurosawa. Y., R<>t'der . \\' . & Tonegawa. S. (1980) . .\'fllurP (London). 286. 676-683.

Haul. F .. Amzel. L. & Poljak. R. J. (1978) . ./. Biol. (;hem. 253. 585- 597.

Schiffer, :\I. ( 1980). Biophys . .J. 32. 230- 231. Schiffer, M., Girling. R. L., Ely. K . R. & Edmund~on.

A. B (197:3). R;ochem.i.•try. 12. 4620- 4631. f't'gal. D., Padlan , E., Cohen . G .. Rudikoff. :-; . Pot te r, ~I.

& Davies. D. (1974). ProC'. Sar . ...ll'lld. SC'i .. /'.S.A . 71 . 4298-4302.

Sibanda., B. L. & Thornton . . J . )I. (1985). .VoturP (London). 316, 170-174.

Stanford . . J.M. & Wu, T. T . (1981). J. Tlieoret. Biol. 88. 4~1 -439.

Stura, E ... \ .. Feinstein. A . & \\'ilson. I. ,.\ ( 1987) . • / . Mot. Biol. 193, 229-231.

:-;uh . S. W., Bhat. T. :\ .. :\a\'ia. M.A .. Cohen.(: . H .. Rao. D. :\., Rudikoff, S. & Davies. D. R. (1986). I'm/Pins. 1, 74-80.

Thornton. J. '.\!.. Sibanda. B. L . & Tador. \\'. R .. (1985). In /11 1•P8ligatio11 and b'.rp/oitalion 1~j .-I 11lilJ11r!y ( 'om/Jin . ing Site.~ (Reid, R .. Cook,(;. :1·1 \\' . & ;\·lorst-. D .. J .. eds), pp. 23-3 1. Plenum. Xt->"' York .

Tonegawa. S. (1983) . • Va/urP ( Lonr/011 ). 302. 575-581.

Edited by R. Huber

18 of 18 Celltrion, Inc., Exhibit 1062