5
Proc. NatL Acad. Sci. USA Vol. 78, No. 2, pp. 785-789, February 1981 Biochemistry Complete amino acid sequence of the Fc region of a human 6 chain (immunoglobulin D/internal homology/domain origin/evolution) TOMOTAKA SHINODA*, NOBUHIRo TAKAHASHI*, TATSUNORI TAKAYASU*, TSUNEO OKUYAMA*, AND AKIRA SHIMIZUt *Department of Chemistry, Tokyo Metropolitan University, Setagaya-ku, Tokyo 158, Japan; and tDepartment of Medicine, University of Osaka Medical School, Osaka 553, Japan Communicated by Frank W. Putnam, October 20, 1980 ABSTRACT The complete amino acid sequence of an Fc-like fragment designated Fc6(t) and obtained by limited proteolysis with trypsin of an intact myeloma IgD protein (NIG-65) has been determined. The fragment contains 226 amino acid residues and has a molecular weight of 32,000 per monomeric unit. It has three glucosamine oligosaccharides at asparagine residues 68, 159, and 210. Of these, glucosamine-159 is characteristic of the 8 chain and has no counterpart position in any of the other classes. On the other hand, glucosamine-68 is shared by y, Au, and e, and glucos- amine-210 is shared by a and ,A. Although the Fc6(t) has the com- mon framework structure of immunoglobulins, its sequence has many individual characteristics when its two domains are com- pared separately with the counterpart domain of other heavy chains. Such comparison has shown that the two Fc domains of the 8 chain should be placed in an independent branch in topology; for all the other classes, the Fc domains are paired well with their counterparts. The comparison has also shown that there are three prominent gaps by which each domain can be divided into two homologous halves. For each class of immunoglobulin, a moderate degree of internal homology exists between the first half and the second half of each domain of the Fc, suggesting that the pri- mordial gene may have coded for a unit about the size of a half domain. Based on this observation together with sequence com- parisons, a possible genetic mechanism is proposed for the origin and evolution of the genes for immunoglobulin domains. Since the discovery of IgD as a fourth major class of human immunoglobulins (2), studies have been undertaken to eluci- date not only its biological functions but also its physicochemical and chemical characteristics. Although preliminary studies have shown some characteristics of IgD (3), detailed analyses have been hindered mainly because of (i) the low incidence of IgD myeloma, and the relatively low concentration of IgD in such plasma, (ii) the extreme susceptibility of IgD to various proteo- lytic enzymes (4), which hinders preparation of intact IgD in quantity, and (iii) great charge heterogeneity, mainly due to high carbohydrate content (12-15%) which causes difficulty in purification of IgD. However, because IgD has been demon- strated to be a major class of surface immunoglobulins on hu- man B lymphocytes (5) as well as on B cells of animals (6), studies have been carried out to learn whether the IgD molecule plays an essential role during cell differentiation, including antibody synthesis. Accumulating data have shown that IgD is closely involved in some cellular events, although its key role is still to be clarified (7). Little progress has been reported on structural studies of the IgD molecule, but preliminary work has unveiled some of its characteristic features (8, 9). These data show that, unlike other classes, IgD has only one disulfide bridge between heavy chains, and its heavy chain (6) is likely to have either a five- domain structure (10) or a somewhat extended hinge structure (4). Some sequence studies were also done on the sites of spon- taneous cleavage with various proteolytic enzymes and an intact human myeloma IgD (9). Recently, some amino acid sequences have been determined for an Fc8 fragment generated by short- time trypsin digestion (11) which revealed some characteristics of the 8 chain. More recently, a four-domain structure for hu- man IgD molecules and processes of its fragmentation with proteolytic enzymes have been reported by Lin and Putnam (12). Despite great interest in the structure of the IgD molecule in order to elucidate its relationship to cellular events, no rel- evant data have been available so far. In this context, we un- dertook structural analysis of a human myeloma IgD (13) and now have completed the amino acid sequence of the entire Fc,8(t) fragment obtained by limited proteolysis of the intact IgD with trypsin. MATERIALS AND METHODS For purification of IgD, 2 units of frozen plasma (NIG-65, ap- proximately 600 ml) were melted in 400 ml of 0.15 M NaCl/ 0.12% NaN3, pH 7.2/25 mM E-aminocaproic acid/10 mM CaCl2. Saturated (NH4)2SO4 solution/0.05% NaNJ10 mM E- aminocaproic acid, pH 7.2, was added, and the precipitate ob- tained between 1.5 and 1.8 M (NH4)2SO4 was collected. The precipitate was dissolved in 10 mM phosphate/0.05% NaN3, pH 8.0/10 mM E-aminocaproic acid and dialyzed against the same buffer. The solution was applied to a DEAE-Sephadex A- 50 column, and most of the IgD was eluted with the same buffer containing 80 mM NaCl. The IgD fraction was concentrated on a column of DEAE-Sepharose CL-6B and was purified further by gel filtration with a Bio-Gel A-5 m column in 0. 1 M Tris HCV/ 0.15 M NaCl/0.05% NaN3, pH 8.0/10 mM E-aminocaproic acid. Fractions were assayed by immunodiffusion and immu- noelectrophoresis for immunoglobulins, a,-acid glycoprotein, prealbumin, albumin, group-specific component, transferrin, fibrinogen, ceruloplasmin, and K and A Bence Jones proteins. The purity of IgD-positive fractions was also tested by poly- acrylamide gel electrophoresis (4.5-7.5%) before and after treatment with sodium dodecyl sulfate and also by NH2- and COOH-terminal analyses. Purified IgD (1.2 g) in 60 ml of 10 mM phosphate/0.05% NaN3, pH 8.0, was digested with 12 mg of trypsin (treated with L-1-tosylamido-2-phenylethyl chloromethyl ketone, twice crys- tallized, Worthington) at 370C for 2 min. The reaction was stopped by addition of 24 mg of lima bean trypsin inhibitor Abbreviations: Fc8(t); an Fc-like fragment of IgD resulting from limited proteolysis with trypsin; PAMs, accepted point mutations per 100 res- idues. Abbreviations for classes, fragments, regions, and domains of immunoglobulins are in accordance with official World Health Orga- nization recommendations for human immunoglobulins (1). 785 The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertise- ment" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

Complete acid Fc region 6 - Proceedings of the National .... 78, No. 2, pp. 785-789, February 1981 Biochemistry CompleteaminoacidsequenceoftheFcregionofahuman6chain (immunoglobulin

  • Upload
    vocong

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

Proc. NatL Acad. Sci. USAVol. 78, No. 2, pp. 785-789, February 1981Biochemistry

Complete amino acid sequence of the Fc region of a human 6 chain(immunoglobulin D/internal homology/domain origin/evolution)

TOMOTAKA SHINODA*, NOBUHIRo TAKAHASHI*, TATSUNORI TAKAYASU*, TSUNEO OKUYAMA*, ANDAKIRA SHIMIZUt*Department of Chemistry, Tokyo Metropolitan University, Setagaya-ku, Tokyo 158, Japan; and tDepartment of Medicine, University of Osaka Medical School,Osaka 553, JapanCommunicated by Frank W. Putnam, October 20, 1980

ABSTRACT The complete amino acid sequence of an Fc-likefragment designated Fc6(t) and obtained by limited proteolysiswith trypsin of an intact myeloma IgD protein (NIG-65) has beendetermined. The fragment contains 226 amino acid residues andhas a molecular weight of 32,000 per monomeric unit. It has threeglucosamine oligosaccharides at asparagine residues 68, 159, and210. Of these, glucosamine-159 is characteristic of the 8 chain andhas no counterpart position in any of the other classes. On theother hand, glucosamine-68 is shared by y, Au, and e, and glucos-amine-210 is shared by a and ,A. Although the Fc6(t) has the com-mon framework structure of immunoglobulins, its sequence hasmany individual characteristics when its two domains are com-pared separately with the counterpart domain of other heavychains. Such comparison has shown that the two Fc domains of the8 chain should be placed in an independent branch in topology;for all the other classes, the Fc domains are paired well with theircounterparts. The comparison has also shown that there are threeprominent gaps by which each domain can be divided into twohomologous halves. For each class of immunoglobulin, a moderatedegree of internal homology exists between the first half and thesecond half of each domain of the Fc, suggesting that the pri-mordial gene may have coded for a unit about the size of a halfdomain. Based on this observation together with sequence com-parisons, a possible genetic mechanism is proposed for the originand evolution of the genes for immunoglobulin domains.

Since the discovery of IgD as a fourth major class of humanimmunoglobulins (2), studies have been undertaken to eluci-date not only its biological functions but also its physicochemicaland chemical characteristics. Although preliminary studies haveshown some characteristics of IgD (3), detailed analyses havebeen hindered mainly because of (i) the low incidence of IgDmyeloma, and the relatively low concentration of IgD in suchplasma, (ii) the extreme susceptibility of IgD to various proteo-lytic enzymes (4), which hinders preparation of intact IgD inquantity, and (iii) great charge heterogeneity, mainly due tohigh carbohydrate content (12-15%) which causes difficulty inpurification of IgD. However, because IgD has been demon-strated to be a major class of surface immunoglobulins on hu-man B lymphocytes (5) as well as on B cells of animals (6), studieshave been carried out to learn whether the IgD molecule playsan essential role during cell differentiation, including antibodysynthesis. Accumulating data have shown that IgD is closelyinvolved in some cellular events, although its key role is stillto be clarified (7).

Little progress has been reported on structural studies of theIgD molecule, but preliminary work has unveiled some of itscharacteristic features (8, 9). These data show that, unlike otherclasses, IgD has only one disulfide bridge between heavychains, and its heavy chain (6) is likely to have either a five-domain structure (10) or a somewhat extended hinge structure

(4). Some sequence studies were also done on the sites of spon-taneous cleavage with various proteolytic enzymes and an intacthuman myeloma IgD (9). Recently, some amino acid sequenceshave been determined for an Fc8 fragment generated by short-time trypsin digestion (11) which revealed some characteristicsof the 8 chain. More recently, a four-domain structure for hu-man IgD molecules and processes of its fragmentation withproteolytic enzymes have been reported by Lin and Putnam(12).

Despite great interest in the structure of the IgD moleculein order to elucidate its relationship to cellular events, no rel-evant data have been available so far. In this context, we un-dertook structural analysis of a human myeloma IgD (13) andnow have completed the amino acid sequence of the entireFc,8(t) fragment obtained by limited proteolysis of the intact IgDwith trypsin.

MATERIALS AND METHODSFor purification of IgD, 2 units of frozen plasma (NIG-65, ap-proximately 600 ml) were melted in 400 ml of 0.15 M NaCl/0.12% NaN3, pH 7.2/25 mM E-aminocaproic acid/10 mMCaCl2. Saturated (NH4)2SO4 solution/0.05% NaNJ10 mM E-aminocaproic acid, pH 7.2, was added, and the precipitate ob-tained between 1.5 and 1.8 M (NH4)2SO4 was collected. Theprecipitate was dissolved in 10 mM phosphate/0.05% NaN3,pH 8.0/10 mM E-aminocaproic acid and dialyzed against thesame buffer. The solution was applied to a DEAE-Sephadex A-50 column, and most of the IgD was eluted with the same buffercontaining 80 mM NaCl. The IgD fraction was concentrated ona column of DEAE-Sepharose CL-6B and was purified furtherby gel filtration with a Bio-Gel A-5m column in 0. 1 M Tris HCV/0.15 M NaCl/0.05% NaN3, pH 8.0/10 mM E-aminocaproicacid. Fractions were assayed by immunodiffusion and immu-noelectrophoresis for immunoglobulins, a,-acid glycoprotein,prealbumin, albumin, group-specific component, transferrin,fibrinogen, ceruloplasmin, and K and A Bence Jones proteins.The purity of IgD-positive fractions was also tested by poly-acrylamide gel electrophoresis (4.5-7.5%) before and aftertreatment with sodium dodecyl sulfate and also by NH2- andCOOH-terminal analyses.

Purified IgD (1.2 g) in 60 ml of 10 mM phosphate/0.05%NaN3, pH 8.0, was digested with 12 mg of trypsin (treated withL-1-tosylamido-2-phenylethyl chloromethyl ketone, twice crys-tallized, Worthington) at 370C for 2 min. The reaction wasstopped by addition of 24 mg of lima bean trypsin inhibitor

Abbreviations: Fc8(t); an Fc-like fragment of IgD resulting from limitedproteolysis with trypsin; PAMs, accepted point mutations per 100 res-idues. Abbreviations for classes, fragments, regions, and domains ofimmunoglobulins are in accordance with official World Health Orga-nization recommendations for human immunoglobulins (1).

785

The publication costs of this article were defrayed in part by page chargepayment. This article must therefore be hereby marked "advertise-ment" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

786 Biochemistry: Shinoda etaP

HEAVY CHAIN1 1 10 20 30

Thr-Pro-Gl u-Cys-Pro-Ser-Hi s-Thr-Gl n-Pro-Leu-Gly-Val -Tyr-Leu-Leu-Thr-Pro-Al a-Val -Gl n-Asp-Leu-Trp-Leu-Arg-Asp-Lys-Al a-Thr-

40 50 60Phe-Thr-Cys-Phe-Val -Val -Gly-Ser-Asp-Leu-Lys-Asp-Al a-Hi s-Leu-Thr-Trp-Gl u-Val -Al a-Gly-Lys-Val -Pro-Thr-Gly-Gly-Val-Gl u-Gl u-

S 70 80 90I Gly-Leu-Leu-Gl u-Arg-Hi s-Ser-Asn-Gly-Ser-Gl n-Ser-Gl n-His-Ser-Arg-Leu-Thr-Leu-Pro-Arg-Ser-Leu-Trp-Asn-Al a-Gly-Thr-Ser-Val -S CHO

100 110 120Thr-Cys-Thr-Leu-Asn-Hi s-Pro-Ser-Leu-Pro-Pro-Gl n-Arg-Leu-Met-Al a-Leu-Arg-Gl u-Pro-Al a-Al a-Gl n-Al a-Pro-Val -Lys-Leu-Ser-Leu-

130 140 150Asn-Leu-Leu-Al a-Ser-Ser-Asp-Pro-Pro-Gl u-Al a-Al a-Ser-Trp-Leu-Leu-Cys-Gl u-Val -Ser-Gly-Phe-Ser-Pro-Pro-Asn-I le-Leu-Leu-Met-

160 170 180S Trp-Leu-Gl u-Asp-Gl n-Arg-Gl u-Val -Asn-Thr-Ser-Gly-Phe-Al a-Pro-Al a-Arg-Pro-Pro-Pro-Gl n-Pro-Gly-Ser-Thr-Thr-Phe-Trp-Al a-Trp-

CHO

190 200 210S Ser-Val -Leu-Arg-Val -Pro-Al a-Pro-Pro-Ser-Pro-Gln-Pro-Al a-Thr-Tyr-Thr-Cys-Val -Val -Ser-Hi s-Gl u-Asp-Ser-Arg-Thr-Leu-Leu-Asn-L I CHO

220 226Al a-Ser-Arg-Ser-Leu-Gl u-Val -Ser-Tyr-Val -Thr-Asp-Hi s-Gly-Pro-Met

FIG. 1. Amino acid sequence of entire Fc region of the NIG-65 8 chain. The residues are numbered consecutively from Thr-l to Met-226. Thisnumbering will be changed as the complete sequence of the 8 chain becomes available. A tentative alignment for disulfide bridges is given by thesolid line; this is based on homology to other immunoglobulins. Glucosamine oligosaccharides (designated CHO) are indicated under the residuesto which they are attached.

(twice crystallized, Worthington). The digestion mixture wasfractionated into the Fab and Fc fragments by column chro-matography on DEAE-Sepharose CL-6B (14). For reductionand aminoethylation, the purified Fc,(t) (420 mg) was dissolvedin 20 ml of 1 M Tris HCl, pH 8.6/6.8 M guanidine hydrochlo-ride/2 mM EDTA and treated as described (14). Aminoethyl-ated Fc5,(t) (100 mg) was digested with 1.3 mg of trypsin at 370Cfor 3 hr, and the digest was chromatographed on a column ofDEAE-Sephadex A-25. Peptides were further purified by gelchromatography with Bio-Gel P-6 (14). Digestion of aminoeth-ylated Fc,(t) (120 mg) was done with 4 mg of staphylococcalproteinase to obtain necessary overlaps for the tryptic peptides.Cyanogen bromide cleavage and purification of the fragmentswere carried out essentially as described (13).

The methods for molecular weight determination, for aminoacid and carbohydrate analysis, for NH2- and COOH-terminalend group determination, for digestion with carboxypeptidasesA, B, and P, for hydrazinolysis of the Fc,(t) and of some pep-tides, and for sequence determination by manual Edman deg-radation have been described (14, 15). An overlap for residues178-181 was done with the Beckman model 890B sequenatorby courtesy of F. W. Putnam (Department of Biology, IndianaUniversity, Bloomington, IN).

RESULTS AND DISCUSSION

Purity and Enzymatic Digestion of NIG-65 IgD. On poly-acrylamide gel electrophoresis, the isolated myeloma proteinNIG-65 showed a single relatively broad band accompanied byan additional faint component with a slightly larger mobility andaveraging about 2% of the total material. The purified proteinwas tested with specific antisera at concentrations of 0.5-20.0mg/ml and was found to react only with anti-8 and -A antisera,indicating that it was homogeneous in this criterion. In all, ap-proximately 8.4 g of purified IgD was obtained from 2 units ofNIG-65 plasma by this procedure. The molecular weight of the

major component of the purified NIG-65 was estimated to bebetween 168,000 and 172,000 (mean, 170,000). The minor con-taminant in the purified IgD preparation reacted only with anti-8 antiserum, and its mean molecular weight was estimated tobe 84,000.

Quantitative NH2-terminal analysis of the intact IgD gave1.86 mol of alanine (from the 8chain) and 1.82 mol of asparagine(from the A chain) per 170,000 g. After reduction and alkylation,the IgD could be separated into heavy and light chains by gelfiltration on a Bio-Gel A-5m column. The molecular weight ofthe heavy chain was found to be 63,000, whereas that of the lightchain was 23,000. The 6 chain contained approximately 480 res-idues, without correction for the destruction of certain aminoacids dring acid hydrolysis. Virtually all the carbohydratesfound in the purified NIG-65 IgD, including sialic acid, glu-cosamine, and galactosamine, were recovered from the 8 chain.On the other hand, the A chain contained 220 residues and hada homogeneous NH2-terminal sequence of Asn-Leu-Met-Leu-.Its complete sequence has recently been determined, givinga new A chain subgroup of VVI (14).Amino Acid Sequence. The complete amino acid sequence,

including assignments of all the amides and locations of threeglucosamine oligosaccharides of the Fc,(t), of NIG-65-IgD isgiven in Fig. 1. The sequence is the same as that of the Fc,(t)of IgD protein WAH, which was determined independently byLin and Putnam (16) using different approaches and techniques.The striking evidence that the Fc,(t) sequence is identical in thetwo 8 chains does not necessarily exclude the possible existenceof allotypes, subclasses, or isotypes of the 8 chain.The Fca(t) fragment has 226 residues with a single homoge-

neous NH2-terminal threonine as well as homogeneous COOH-terminal methionine, and it has the common framework struc-ture shared by all the other immunoglobulin classes so far ana-lyzed. That is, it has five half-cystine residues and they are dis-tributed in a fashion similar to that in the other heavy chainclasses. The half-cystine residues are cysteine-4, which is in-volved in the inter-heavy chain disulfide bridge and intra-chainbridges Cys-33/Cys-92 and Cys-137/Cys-198.

Proc. Natl. Acad. Sci. USA. 78 (1981)

Proc. Natd Acad. Sci. USA 78 (1981) 787

10 20 30

6 S H T Q P L G V Y L L T A V Q D - L W L R D K A T F|T C F V V G S

y E L L GG P S V FL F P P K P K DT L M IS RT P E VTP C V V V D V

p D Q D T A I R V F A I P P S F A S - I F L T K S T K L T C L V T D Le S N P R G V SL R P F D - L F I R K S P T I T C L V V D La P S C CH P R L S L H RUP A L Z. B - L L L G S Z A N Ll T C T L T G L

40 50 60D L - K D - A H L T E - V A[ K - V P T G G V E E

S H E DP Q V K FN W Y - V DIG VQ V H - - NAKTTT Y DS - VT I SWTTRE EG - AV - - - KTH TA P S K GT V N L T W S RA S G K P V - - - N H S TR D - A S G V T F T P S TS[ G K - - - - - S A V Z

70 80 90 100 110 1206 G L L [R H S N G S Q S Q H S R L T L P R S LW N A GT S V T C T L N H P S L -P P Q R L M A L R E P A - - - A Q A P V K

y KP R E QQ Y B S T Y R V V S V L T V L H Q N W L D G K E Y K C KV S N KA L P A P PI E KT I S KA K GQ P R E - P Qp B I S E S H P B A T F S A V G E A S I C E D B W N S G E R F T C T V T H T D LPS P - L K Q T I S R P K G V A L H R P B

R K E E IKQ R N G T L T V T S T L P V G T R D W I E G E T Y Q C R V T H P H L P R A -L M R S T T KT S G - P R A A P E

xG P PU R D L C G C Y S V S S V L P G C A E PW B H GK T F TUC T A A Y P E SKTT P L T A T L S K - S G - N T F R P Q

130 140 150 160 170 1806 L S L N L L A S S D -P P E A A S W L L CE V S G F S P N I L L M L E D Q R E - V - - - N T S G F A P A R P P P Q P

y V Y T L P P S R E E -MT K N Q V S L T C L V K G F Y P S D I A V E W E S N D G E -- - P E N Y K T T P P V L - - D S D

PA V V L L P P A R E Q L N L R E S A T I T C L V T G F S p A D V F V E W Q M Q R G E P L S P E K Y V T S A P M P E P - Q AC V Y A FAT P E W P - GS R D K RT L A C L I QN F M P E.D I S V Q W L H N E VQ - L P DA R H S T T Q P R K - - - T K

C1 V H L L P P P S Z Z L A L B Z L V T L TUC L A R Gl So K D V L V RUW L - Q G S Q E L P R E K Y L T W A S R Q E P S Q G

190 200 LO 220 230 2406 G S T T F W A W SV LR VP A P P S P Q P A T Y T VV SIHlE D S R T L - - LN A S R S L E V S Y V T D H G P M

r G S - F F - L Y S K L T V D K S R W Q Q G NV F S|C|S V M H|E A L H N H Y T - Q K S L S L S P GP P G R Y F - A H S I L T V S E E E W N T G Q T Y TICIV V A HEEA L P N R V T - E RT V D K S T G K P T L Y N V S L V M S

C G S G F F - V F S R L E V T R A E W Q E K D E F I|CIR A V HIEA A S P S Q T V Q R A V S V N P G K

T T T F A - V T S I L R V A A E D W K K G D T FSF M V G JZ AALP L A F T - Q K T I D R L A G K P T H V N V S V E M A

FIG. 2. Amino acid sequences [standard one-letter code (21)]Iof the FC regions of human IgD, IgG1, IgM, IgE, and IgAl, denoted by 8, y, ,u, E,

and a, respectively. The sequence is not given for the hinge region or for the last seven residues of ,u and a but only for the middle domain (the firstdomain of Fc-namely, C82, C,2, C,,3, C.3, and C.,2)-and the last domain (the second domain of Fc-namely, C03, C,3, C, 4, C.4, and C,,3). Thesequence of each chain is given continuously from left to right moving down the figure, but the chains are aligned to show maximal symmetry ofthe domains and maximal homology of the sequences. Residues identical in all chains are boxed. The sequences are aligned to place all half-cystineresidues and invariant tryptophan residues in register. Gaps have been inserted to maximize the homology. The first residue of the chain in thefigure is equivalent to Ser-6 in Fig. 1; the first residues for y, IA, E, and a are glutamate-233, aspartate-340, serine-321, and proline-240, respectively.Sources of sequence data are ref. 17 for y, ref. 18 for ,u, ref. 19 for E, and ref. 20 for a; the sequence and tentative numbering for the chain are fromthe present work.

In contrast to the yi chain, the Fc8(t) contains three glucos-amine oligosaccharides. The first one, at asparagine-68 (desig-nated GlcN 1), is shared by the yi chain and also by ,u and Echains. On the other hand, the second one, at asparagine-159(designated GlcN 2), appears to be characteristic of the 8 chain.It has no counterpart in any of the other classes. Although thelast glucosamine oligosaccharide, at asparagine-210 (designatedGlcN 3), is absent from the y, chain, it has a counterpart in theCOOH-terminal extra sequences of both ,u and a chains. Thelatter two heavy chains have the same acceptor sequence, Asn-Val-Ser; but the 8 chain has a different one, Asn-Ala-Ser. Thisinterchange in the triplet seems to be fortuitous because thereis little homology between 8 and 1L or between 8 and a in thetail piece (about 26% in each case), whereas that between , anda is about 58%. This suggests that there are small DNA seg-ments that either have already been attached to the CH3 domainexon or originally exist as discrete segments in introns and arejoined with the CH3 exon prior to expression into a functionaldomain gene.

It is a characteristic feature of the 8 chain that, whereas theother chains have 1 to 3 proline residues in the loop of the sec-ond half of the CH3 domain, there are 10 prolines in this loopin the 8 chain, suggesting that they may cause a large confor-mational distortion in this domain. This might expose twonearby hydrophobic regions-namely, 147-Ile-Leu-Leu-Met-Trp-Leu-152 and 177-Phe-Trp-Ala-Trp-180-and this will en-able the domain to interact with other components such as thoseof the cell membrane of B lymphocytes. This, together with an

extreme susceptibility to proteolytic enzymes, might enableIgD to play an essential role as a regulatory molecule in celldifferentiation during antibody synthesis. It is of interest that,when the sequence of residues 165 through 1.72 is comparedwith that of residues 186 through 193, 6 of the 8 are identical.This strongly suggests that these sequences have arisen by par-tial duplication during evolution.Some Implications of Sequence Homology. The Fc,(t) se-

quence is compared with the counterparts of IgGI (17), IgM(18), IgE (19), and IgA (20) in Fig. 2. Many gaps are introducedto obtain maximal symmetry of the domains and maximal ho-mology of the sequences among the chains. The hinge regionsare omitted from the comparisons because they have least iden-tity and that of the 8 chain has not yet been established. Fur-thermore, the hinge region is encoded separately in mouse yand a chains by discrete exons (22). There are 18 loci where allthe chains share common residues. These loci are distributednonuniformly within the loops of the domains, and none is pres-ent in the interdomain area. This suggests that they are highlyconserved throughout evolution, probably because they are re-lated to the basic framework structure and also to the functionsof immunoglobulins at the molecular level. As expected, thenumber of residues common to all human heavy chains ofknown sequence (yl, a, ,u, and E) is diminished by addition ofthe sequence of new chain. When the 8 chain is compared tothe others it is striking that the number of loci at which all thechains have a common residue is diminished from 30 to 18.There also are loci where 8 is identical to three of the four other

Biochemistry: Shinoda et aL

788 Biochemistry: Shinoda et al.

II a

Last domain Middle domain

FIG. 3. Phylogenetic tree of constant region domains of humanimmunoglobulins. This was derived from a matrix (not shown) basedon the sequence alignments in Fig. 2 by using the procedure of Dayhoff(23). Although the constant region of the 8, y, and a chains has threedomains, whereas the constant region of the , and E chains has fourdomains, the Fc fragment of all five chains has only two domains-thelast two in each chain. Here (and in the text) the term "middle" domainrefers to the first Fc domain of each chain (C, 3, Q2, CY2, C,3, and C02,respectively) and "last" refers to CA4, C.3, C,4, C 3, and C,3. Exceptfor 8, the middle and the last domains form two major branches. Exceptfor those of 8, the topological connections of the middle and the lastdomains will always be y with E and ku with a. The two domains of 8seem to form independent branches by themselves, but a significantcorrelation is seen between the middle domain of 8 and the counterpartof E and also between it and the last domain of ku. On the other hand,the last domain of 8 appears to have no counterpart in this scheme. Thetwo domains of 8 are separated from each other by 210 accepted pointmutations per 100 residues [PAMs (23)]. The corresponding values fory, u, E, and a are 208, 192, 278, and 235 PAMs, respectively. Becausethe 8 chain is placed in the middle, topological arrangements for J/a and y/E pairs are interchanged in the figure for both domains. Thebranch lengths shown do not correlate to an actual distance in PAMsin each pair but indicate only the topological connections. Dotted cir-cles between branches (C,2 and C,3, and C52 and C,A4) show high cor-relation among them.

chains but different from the fourth. At these loci, the a chainis different from the other four in seven instances, A chain infive, E in two, and y in one.

By comparison of the sequences in Fig. 2, a phylogenetic tree(Fig. 3) was constructed according to the procedure of Dayhoff(23). In this case, we compared each domain separately becausehomology between any pair of the counterpart domains variedconsiderably and because each domain might have differentevolutionary history (that is, it may reflect different amountsand rates of change). In general, except for the 8 chain, we ob-tained relatively good correlations in each pair between thecounterpart domains of the C region. For instance, for the ychain, the highest homology for the middle domain of the Cregion was obtained with the third domains of E and kk chains,and that for the last domain of y was with the last domains of

l, E, and a. Degrees of homology in these pairs are in the range34-36%. In these comparisons, highest homology was alwaysobtained with pairs of t/ca and E/y for both the domains. Thus,they are placed in pairs in the phylogenetic tree. On the otherhand, when we compared the two domains of Fc, with thoseof the Fc region of other chains, the figures were different. Asindicated by Fig. 3, the middle domain of the 8 chain (C,2) hashighest homology with the third domain of E (C,3), but C,2 ismore homologous with the last domains of both the y chain(Cr,3) and the 1i chain (CM4). Furthermore, C,2 is almost equallyhomologous to Ca2 and C,,3. These facts strongly suggest thatC82 has no counterpart either in y or A. Furthermore, the lastdomain of the 8 chain (C&3) has little homology with the last

domains of the other chains. Therefore, it is reasonable to as-sume that both C82 and C83 diverged independently from theircounterparts in the other chains. Thus, we placed these domainsof 8 as a separate branch in the tree (Fig. 3). The distance be-tween the two domains is approximately the same as those ineach class.

Origin and Evolution of Immunoglobulin Constant Do-mains. On inspection of the sequences shown in Fig. 2, it isnoted that there are three prominent gaps: (i) in the loop of thesecond domain involving residues 51-56, (ii) residues 111-118,just in the middle of the interdomain area, and (iii) residues162-178, in the middle of the last domain loop. In this way, theFc region of all chains is divided symmetrically into four seg-ments of similar size. In other words, each domain can be di-vided into two nearly equal parts, the first half and the secondhalf of the domain. Each segment has about 60 residues, andhas one half-cystine residue nearly in the middle. On compar-ison of this half-domain sequence, there are moderate degreesof internal homology in each pair of the segments within thesame domain (18-28%). These values are roughly the same asthose observed between any pair of domains.

Although there are many plausible ways to interpret thisfinding, one can make an assumption that these gaps were in-troduced through genetic events that had taken place in theearly history of evolution of a precursor immunoglobulin do-main-coding gene. Such events might have required either fu-sion of two discrete DNA segments or random duplication ofa half-domain-encoding DNA segment immediately adjacent toitself. Either of these mechanisms could have generated a newDNA segment, a primordial domain-encoding gene from whichboth the present-day light chain and heavy chain domain-en-coding genes are evolved.

_IJ

2a~

FIG. 4. Proposed model for the origin and evolution of constantdomain genes of the heavy chains. The model consists of three majorstages. (i) Steps 1 and 2 involve either duplication of element 1 (a half-domain-encoding DNA segment) immediately adjacent to itself, or fu-sion of two discrete DNA segments (24). The process could generatea primordial domain-encoding DNA segment. (ii) Steps 2-5 requiresuccessive duplications, from which basic four-domain DNA seg-ment(s) could arise in which a' is a duplicate of a and b' is a duplicateof b. (iii) Steps 6-8 include various rearrangements in which eachdomain rearranges either independently or restrictedly with either ofthe domain-encoding DNA segments. Inefficient rearrangement aswell as abortive rearrangement might be operative at a certain fre-quency in this stage. Furthermore, the hinge exon (designated H) andextra tail exons (designated E) which code for the extra tail sequencesof ,u and a and of 8 could be introduced through the process. As ananalogy to mouse heavy chain genes (22), certain lengths of interven-ing sequences might be present between domain-encoding DNA seg-ments as well as on both sides of the H exon and probably also the Eexon. An immunoglobulin gene designated Xr is an example for a pu-tative gene(s) generated by inefficient or abortive combinations amongdomain-encoding DNA segments. The proposed gene order does notnecessarily show the actual order in the human lymphocyte genome.

Proc. Natl. Acad. Sci. USA 78 (1981)

Proc. NatL. Acad. Sci. USA 78 (1981) 789

A proposed model to account for the mechanism is illustratedin Fig. 4. After the generation of the primordial domain-en-coding gene, successive duplications could have occurred im-mediately adjacent to themselves, to result in a precursor im-munoglobulin-encoding gene(s) with a four-domain structure(step 5 in Fig. 4). The next step requires stochastic rearrange-ment among the domain-encoding genes, in which each of themrearranged independently at a certain frequency. Besides manyinefficient and abortive rearrangements, there should have tobe some efficient rearrangements from which undifferentiatedprecursor genes for ,A/la, 3/r and E/y evolved. During theprocess of rearrangements, y, a, and 8 genes independentlydeleted the second domain-encoding gene, giving a three-do-main structure. The hinge was likely introduced into a, My, and8 in place of the second (or extra) domain, or it could be a re-sidual portion of a flanking sequence of a second domain-en-coding DNA segment. An additional rearrangement that wasprobably restricted might have been involved in the next step,and it eventually could have generated present-day embryonicimmunoglobulin genes. The model is applicable for all the im-munoglobulin classes so far identified and might also be appli-cable for any yet to be discovered.

We thank Dr. F. W. Putnam for giving us an opportunity to have thenecessary overlap for residues 178-181 and for his critical reading ofthe manuscript, S. Dorwin, J. Dwulet, E. Dodge, and J. Madison fortechnical assistance, and Dr. A. Yamada for general discussion. Thiswork was supported by a grant-in-aid from the Ministry of Educationof Japan and by research funds from the Ito Science Foundation.

1. World Health Organization (1972) Biochemistry 11, 3311-3312.2. Rowe, D. NV. & Fahey, J. L. (1965)]1 Exp. Med. 121, 171-184.3. Griffiths, R. W. & Gleich, G. J. (1972) J. Biol. Chein. 217,

4543-4548.

4. Spiegelberg, H. H., Prahl, J. WV. & Grey, H. M. (1970) Biochem-istry 9, 2115-2122.

5. Rowe, D. S., Hug, K., Forni, L. & Pernis, B. (1973)1. Exp. Med.138, 965-972.

6. Leslie, G. A. & Martin, L. N. (1978) Contemp. Top. Mol. Iminunol.7, 1-49.

7. Vitetta, E. S. & Uhr, J. WV. (1977) Imnunol. Rec. 37, 50-88.8. Spiegelberg, H. H. (1977) Immunol. Rev. 37, 1-24.9. Jefferis, R. & Matthews, J. B. (1977) Imnnunol. Rec. 37, 25-49.

10. Leslie, G. A., Clem, L. W. & Rowe, D. S. (1971) Immunochem-istry 8, 565-568.

11. Spiegelberg, H. H. (1975) Nature (London) 254, 723-725.12. Lin, L. -C. & Putnam, F. XV. (1979) Proc. Nati. Acad. Sci. USA 76,

6572-6576.13. Shinoda, T., Takahashi, N., Takavasu, T. & Shimizu, A. (1979)

Proc. Jpn. Immnunol. Soc. 9, 451 (abst.).14. Takahashi, N., Takavasu, T., Isobe, T., Shinoda, T., Okuvama,

T. & Shimizu, A. (1979)J Biochem. 86, 1523-1535.15. Shinoda, T. (1973)J. Biochem. 73, 417-431.16. Lin, L.-C. & Putnam, F. W. (1981) Proc. Natl. Acad. Sci. USA 78,

504-508.17. Edelman, G. M., Cunningham, B. A., Gall, W. E., Gottlieb, P.

O., Rutishauser, U. & Waxdal, M. J. (1969) Proc. Nati. Acad. Sci.USA 63, 78-85.

18. Putnam, F. WV., Florent, G., Paul, C., Shinoda, T. & Shimizu, A.(1973) Science 182, 287-291.

19. Bennich, H. & Bahr-Lindstrom, H. (1974) in Progress in Immnu-nology II, eds. Brent, L. & Holborow. J. (North-Holland. Am-sterdam), Vol. 1, pp. 49-58.

20. Low, T. L. K., Liu, Y. S. V. & Putnam, F. WV (1976) Science 191,390-392.

21. IUPAC-IUB Commission on Biochemical Nomenclature (1968)J. Biol. Chem. 243, 3557-3559.

22. Sakano, H., Rogers, J. H., Huppi, K., Brack, C., Toraunecker,A., Maki, R., Wall, R. & Tonegawa, S. (1979) Nature (London)277, 627-633.

23. Dayhoff, M. 0. (1972) Atlas of Protein Sequence and Structure(Natl. Biomed. Res. Found., Washington, DC), Vol. 5.

24. Shinoda, T. (1977) Rinsho Kagaku 13, 844-851.

Biochemistry: Shinoda et aL