
Mining for Novel TNF Ligands


Tumor Necrosis Factor Ligand Family

TNF ligands are type II membrane proteins which belong to the C1q-TNF superfamily and signal through corresponding TNF receptors. Three putative TNF receptors have no known ligand, and this suggests that other ligands remain to be discovered. Most TNF domains are encoded by a single exon and bind one distinct TNF receptor, although there are exceptions to both rules. The currently known TNF ligand-receptor interactions and exon structures are shown below.

Twenty-two TNF and C1q structures are known. All show profound structural similarity despite very poor sequence similarity (average pairwise identity is between ~9% and ~30%). Identifying TNFs by sequence-based methods is difficult because of this poor sequence conservation and because of their similarity to C1q proteins, which are not relevant to our interest in ligands for the orphan receptors.
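To make the identity figures above concrete, here is a minimal sketch of how percent identity can be computed for one pair of pre-aligned sequences. The function name and the choice to count only columns where both sequences have a residue are assumptions for illustration; the poster does not state which denominator was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both sequences have a residue.

    Assumption: the two strings are rows of the same alignment (equal
    length), and gapped columns are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which neither row has a gap character.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)
```

Averaging this value over all pairs of family members would give an average pairwise identity of the kind quoted above, under the stated denominator assumption.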

Mining for Novel TNF Ligands Using Unison, an Open Source Database for Target Discovery

Reece Hart <[email protected]> Departments of Bioinformatics and Protein Engineering Genentech, Inc. San Francisco, CA 94080

Abstract

Tumor Necrosis Factor (TNF) ligands, acting through their cognate TNF receptors, are critical to numerous immunological responses, including B and T cell differentiation, apoptosis, and inflammation. Several “orphan” TNF receptors exist for which the corresponding ligands are unknown. Over the past several years, we have undertaken attempts to identify these unknown ligands from curated protein sequences, six-frame translations of the human genome, and from pathogenic sequences. This poster summarizes these efforts and introduces Unison, an Open Source database for organizing and mining complex proteomic data.

Mining Curated Sequence Databases

We've mined public and proprietary sequence sources using many methods, including hidden Markov models and PSI-BLAST profiles from Pfam, CDD, Superfamily, and custom sequence- and structure-based alignments, and threading using Prospect (Xu and Xu) and ProHit (Sippl). The figures below outline one way to integrate and analyze these data in Unison.

About Unison

Unison is a database of non-redundant protein sequences, diverse computational predictions based on these sequences, and extensive auxiliary data which facilitate interpretations of the predictions. The intent is to provide an integrated resource for complex feature-based mining for target discovery and target elimination. Unison includes command line tools and a web interface. The schema, tools, web interface, and dumps of non-proprietary data have recently been released under the Academic Free License and are available at http://unison-db.sourceforge.net/.

Mining Six-Frame Translations of the Human Genome

Most TNF ligands are encoded in the human genome with the majority of the TNF domain in a single exon. This suggests that it might be possible to detect novel TNFs by scanning naïve six-frame translations of ORFs. For calibration of scoring functions, we instead chose to scan fixed-length subsequences of six-frame translations, as shown below.

Mining Pathogenic Sequences for TNF-like Structures

Because extensive expression cloning and computational prediction failed to identify a novel human TNF ligand which bound any of the orphan TNF receptors, we began to consider the possibility that these receptors might bind pathogenic proteins either as a surveillance mechanism or as an exploited “security hole” (as with herpes virus binding to HVEM, a TNF receptor). Recently, a new sequence appeared in Swiss-Prot which threads extremely well to TNF backbones and occurs in a virus known for its host evasion mechanisms.

Acknowledgments

Kiran Mukhyala and David Cavanaugh have contributed immensely to Unison.

The TNF mining effort was a multi-year collaboration within Genentech and included: Vishva Dixit, Wayne Fairbrother, Sarah Hymowitz, Nobuhiko Kayagaki, Nick Skelton, Minhong Yan, and Zemin Zhang.

Thanks to Genentech and William Wood for providing a great place to work.

[Figure: human chromosomes 1-22, X, and Y]

● UCSC genome assembly (NHGD34)
● 450bp w/150bp overlap generates:
  – 10M fragments
  – 60M 6-frame translations
  – ~500M ORF fragments
  – 27M fragments w/length ≥50AA
  – fragments <50AA were discarded
● 27M fragments were threaded against 22 TNF superfamily members (TNF+C1q)
● 900K (of 27M) had score <=250; each was threaded against 3286 representative chains
● total time: 176 CPU-weeks (4 weeks on 22 2-cpu machines)
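The fragmenting and translation steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the function names are hypothetical, the standard genetic code is assumed, translated frames are split at stop codons to obtain ORF fragments, and codons containing ambiguous bases are translated as "X".

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def windows(seq, size=450, overlap=150):
    """Yield (start, fragment) for overlapping fixed-length windows."""
    step = size - overlap  # 300 nt with the poster's parameters
    for start in range(0, max(len(seq) - size, 0) + 1, step):
        yield start, seq[start:start + size]


def translate(dna):
    """Translate one reading frame; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse)
            for offset in range(3)]


def orf_fragments(protein, min_len=50):
    """Split a translation at stop codons; keep pieces of >= min_len AA."""
    return [piece for piece in protein.split("*") if len(piece) >= min_len]
```

With a 300-nt step, a genome of about 3B bp yields about 10M windows and therefore about 60M six-frame translations, and each 450-nt window translates to at most 150 AA, which is consistent with the counts listed above.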


[Figure: histogram. x-axis: best raw score to any TNF SF member (lower is better); y-axis: frequency. Annotations: analyzed: 76 w/score ≤ -200; TBD: 166 w/score ≤ -120 (max TNF fragment score = -154); fragment 8602 highlighted.]

Distribution of Prospect2 raw scores

The histogram shows the distribution of the best (lowest) “raw” score for the alignment of each 150AA six-frame translation fragment to TNF-C1q superfamily backbones. Fragment 8602 is highlighted and shown as an example below.

Unfortunately, only distinctly C1q-like proteins have been identified so far.
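The "best (lowest) raw score per fragment" reduction behind such a histogram can be sketched as below. The row layout `(fragment_id, template, raw_score)`, the function names, and the bin width are assumptions for illustration; they are not Unison's schema or Prospect2's output format.

```python
from collections import Counter


def best_scores(rows):
    """Map each fragment to its lowest (best) raw score and that template.

    rows: iterable of (fragment_id, template, raw_score) tuples
    (a hypothetical layout).
    """
    best = {}
    for fragment_id, template, raw in rows:
        if fragment_id not in best or raw < best[fragment_id][0]:
            best[fragment_id] = (raw, template)
    return best


def histogram(best, bin_width=20):
    """Count fragments per score bin; each bin is keyed by its lower edge."""
    return Counter((raw // bin_width) * bin_width
                   for raw, _template in best.values())
```

Fragments whose best score falls below a chosen cutoff (for example, the ≤ -200 group noted on the figure) can then be pulled out of `best` for closer analysis.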

TNF Family Exon Structure

Most TNF domains are encoded within a single exon.

[Figure: exon structure of TNF family members Lta, TNFa, Ltb, OX40L, CD40L, FasL, CD27L, CD30L, 4-1BBL, TRAIL, RANKL, TWEAK, APRIL, BLyS, LIGHT, VEGI, AITRL, and EDA (family numbers 1-15, 13B, and 18), on a 0-250 scale. Legend: TNF domain; exon.]

Six-Frame Translation and Threading Method

Threading Results for Fragment 8602

Looks more C1q-like than TNF-like, but close.


Fragment threading identifies NP_848635.1

Screenshots showing ambiguous alignment to different regions on chr 13.

Threading of Unison:8602 to 1c28a

Unison provides on-the-fly threading visualization via JMol, PyMOL, and RasMOL. (PyMOL is used below.)

Legend: blue=identity; cyan=similarity; red=dissimilarity; yellow=cysteine; yellow spacefill=conserved cysteine; grey=query gap/template insert; >nAA< = query insert/template gap

Reasons for hope:
● VA28_MCV has a signal peptide and is known to be on the viral coat; conditional mutants abolish entry
● MCV has numerous genes for host evasion, including homologs for a Death Effector Domain which inhibits caspase-8 (also found in HSV), IL18 BP, and MHC class I complex which may act as a decoy.
● There is a precedent for viral entry via TNFR: HSV enters via TNFRSF14/HveA/HVEM.
● MCV infects keratinocytes, which are known to express TNFR during their development

Reasons for doubt:
● threading alignment has a significant deletion (but is nearly as good as other intra-TNF family alignments)
● A28 doesn't thread as well to other TNF backbones
● other A28s don't thread well to TNFs
● some viral capsid proteins also have a similar fold (but in RNA viruses)
● VA28_MCV does not appear to stimulate any of the orphan receptors. Non-orphans have not been tested.

1. Integrating multiple search methods

A single Unison page allows users to select and integrate results from HMMs, PSSMs, and Prospect2 threadings to any family of models (TNFs in this case). “Hits” are then classified into true positives, false negatives, and “unknown” positives (candidates) by reference to a curated list of known family members.
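The classification described above can be sketched with plain set operations. This is an illustration only: the function names and the example identifiers are hypothetical, and the sketch does not reproduce Unison's actual queries or schema.

```python
def merge_hits(*method_hits):
    """Union the hit sets returned by several search methods."""
    return set().union(*method_hits)


def classify_hits(hits, known_members):
    """Classify hits against a curated list of known family members."""
    hits, known = set(hits), set(known_members)
    return {
        "true_positives": hits & known,    # known members that were found
        "false_negatives": known - hits,   # known members that were missed
        "candidates": hits - known,        # "unknown" positives to review
    }
```

For example, merging an HMM hit set `{"TNFa", "CD40L"}` with a threading hit set `{"CD40L", "seqX"}` and classifying against the known list `{"TNFa", "CD40L", "TRAIL"}` leaves `seqX` as the only candidate and `TRAIL` as a false negative.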

5. On-the-fly re-threading of sequence 8602 to the TRAIL ligand viewed with RasMOL (PyMOL and JMol are also supported).

4. Genomic map.

Unison contains rudimentary protein-to-genome alignments using BLAT. This sequence has a high-quality orthologous C-terminal fragment from mouse. Clicking the map opens an in-house viewer with more extensive genomic mapping data.

Unison Contents
● >5M distinct sequences from >40 reliable and speculative sources covering >9900 species
● features and alignments from BLAST, PSI-BLAST, HMMER, Prospect threading, GPI anchoring, TM detection, signal prediction, cellular localization, genomic localization, regular expressions, CE alignments, and secondary structure prediction
● external databases: NCBI taxonomy, HomoloGene, GO, PDB (w/enumerated seqres-resid mapping), SCOP, MINT, Derwent Patent Database

Conclusions and Directions
● We have identified several candidate TNF ligands among curated and speculative human sequence databases, six-frame translations of the R34 release of the human genome, and pathogenic sequences, but none appear to bind the orphan TNF receptors.
● A large number of C1q-like sequences exist in the human genome.
● Unison has facilitated the management, update, and analysis of an enormous amount of diverse precomputed data.

1. Viral sequences sorted by the best TNF-C1q threading “raw” score.

VA28_MCV is one of a family of orthologous A28 proteins in poxviruses.

2. Threading results for VA28_MCV aligned to 3286 FSSP representative backbones. TNF and C1q family members are among the best fold recognition templates.

3. For comparison, the alignment of Apo2L/TRAIL to the same FSSP representatives. The raw score for the alignment of VA28_MCV to 1gr3a, a TNF-C1q family member, is denoted by the red triangle (▶) and is comparable to those for alignments of known TNFs to other TNF-C1q structures.

[Figure labels: 3B bp genome; 450 NT fragment; ≤150 AA six-frame translations]

[Figure: structure-based alignment of two TNFs by CE, 1aly (CD40L) and 1tnf (TNFα), with strands labeled A, A', B, B', C, D, E, F, G, and H; views rotated by 90º and 120º]

CE-generated alignment: 141 aligned residues; 2.2 Å RMSD (backbone); 26% identity (c.f. 19% by S-W)
c.f. 0.71 Å RMSD / 65 AA; 0.78 Å RMSD / 48 AA 1aly-1c28a

[Figure labels, PDB structures: 1aly, 1i9r; 1tnf, 2tnf (mus); Others: 1c28, 1gr3; 1tnr; 1bzi*; 1d0g; 1d4v; 1du3; 1d2q, 1dg6; 1iqa, 1jtz; 1jh5, 1kxg; 9sgh; NP]

Adapted from Bodmer, Schneider, Tschopp, TiBS 27(1): 19-26 (2002).

3. Summary of features for Unison:8602.

4. A28 aligned to CD40L.

Legend: blue=identity; cyan=similarity; red=dissimilarity; yellow=cysteine; yellow spacefill=conserved cysteine; grey=query gap/template insert; >nAA< = query insert/template gap

http://unison-db.sourceforge.net/

2. Review candidates

Clicking any of the classified results at left returns a list of distinct sequences with their “best” annotations.
