STRUCTURE DETERMINATION AND BIOCHEMICAL CHARACTERIZATION OF NOVEL HUMAN

UBIQUITIN-LIKE DOMAINS.

by

Ryan Steven Doherty

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

Graduate Department of Medical Biophysics

University of Toronto

© Copyright by Ryan Steven Doherty 2015

STRUCTURE DETERMINATION AND BIOCHEMICAL

CHARACTERIZATION OF NOVEL HUMAN

UBIQUITIN-LIKE DOMAINS

Ryan Steven Doherty

Doctor of Philosophy

Department of Medical Biophysics University of Toronto

2015

Abstract

The ubiquitin fold acts as a signaling modulator associated with regulating, trafficking, and

degrading proteins. The human genome encodes 398 ubiquitin-like domains (UBLs), of which a

couple dozen may act as covalent modifiers. Ubiquitin and ubiquitin-like domains have been

implicated in a number of malignancies, neuromuscular disorders, neurodegenerative disorders

and other human illnesses. Identifying the structural effects of sequence variations between

different ubiquitin-like homologues will provide insight into their varied functional pathways, since

the role of ubiquitin-like modifiers is typically mediated by protein-protein interactions. Structure

determination and analyses of ubiquitin-like homologues facilitate residue mapping and

comparative analysis of protein-protein interaction sites, which provide insight into the many roles

that ubiquitin-like homologues play in cellular processes. The aim of this thesis was to develop a

framework through which complete structural coverage of all human ubiquitin-like domains could

be achieved. To accomplish this, I defined the human ubiquitin-like fold family, identified ubiquitin-

like domain constructs amenable for NMR structure determination, solved two structures

(NFATc2IP & ubiquilin-1) and characterized associated binding partners, and created a data

resource for human ubiquitin-like domains that enables clustering and associating protein

structures with physicochemical features and cellular function. I also collaborated with the

Northeast Structural Genomics Consortium (NESG) and the Structural Genomics Consortium (SGC),

through which the molecular structures of 17 ubiquitin-like domains were determined using

nuclear magnetic resonance (NMR) experiments and X-ray crystallography. Comparative

analysis of structurally characterized ubiquitin-like folds revealed potential interaction partners

with regions similar to known ubiquitin and SUMO interacting domains. Potential interaction

partners for NFATc2IP and ubiquilin-1 were validated experimentally using NMR titration

experiments. Comparative analysis of structural features of all ubiquitin-like homologues

facilitates further studies into the mechanisms of the ubiquitylation system, predicted protein-

protein interactions, and the identification of functional pathways associated with uncharacterized

ubiquitin-like domains.

Acknowledgements

I would like to thank my supervisor Cheryl Arrowsmith for her ongoing support, advice and

mentorship over the years. I also appreciate the guidance and knowledge shared by my

supervisory committee members: Sirano Dhe-Paganon, Brian Raught, Jane McGlade, and

Zhaolei Zhang. I would also like to recognize the efforts and support from members of the

Arrowsmith lab, past and present, especially Adelinda Yee, Shili Duan, Scott Houliston, Sasha

Lemak, Aleks Gutmanas, Christophe Fares, Yi Sheng, Lilia Kaustov, Bin Wu, Seth Chitayat,

Sampath Srisailam, Murthy Karra, Jonathan Lukin, Natalie Nady, Jack Liao, Rob Laister, Melissa

Ho, Tony Semesi, and Maite Garcia.

This thesis would not have been possible without collaborations. For this reason, I would like to

thank Gaetano Montelione, John Everett, Mani Ravichandran, Yufeng Tong, Masoud Vedadi,

David Yim and Raymond Hui for their time, resources, feedback and help in key aspects of this

project.

I would also like to thank members of various University of Toronto communities who have

encouraged, supported and worked alongside me throughout this endeavor: Medical Biophysics

Graduate Student Association, 89 Chestnut Residence, Massey College, Massey Grand Rounds,

and Impact Centre.

Finally, I thank my family and friends for their patience, love and understanding. It is to you that I

dedicate this thesis.

Table of Contents

Abstract

Acknowledgements

Table of Contents

List of Tables

List of Figures

List of Appendices

List of Abbreviations

Chapter 1 - Introduction

1.1 Overview

1.2 Biological Significance of ubiquitin & ubiquitin-like modifiers

1.3 Protein Modification & ubiquitin

1.4 The ubiquitin Fold

1.5 Ubiquitin-like domains (UBLs)

1.6 Ubiquitin-like modifiers (UBM)

1.7 Ubiquitin-like structural domains

1.8 Ubiquitin Conjugation Cascade

1.9 Ubiquitin-binding domains & interactions

1.9.1 Ubiquitin Interacting Motif (UIM)

1.9.2 Coupling of Ubiquitin conjugation to Endoplasmic Reticulum Degradation (CUE)

1.9.3 Ubiquitin-Associated Domain (UBA)

1.9.4 Ubiquitin Conjugating Enzyme Variant (UEV)

1.9.5 Npl4 Zinc Finger Motif (NZF)

1.9.6 GGA And Tom1 Domain (GAT)

1.9.7 Other Ubiquitin Binding Domains

1.9.8 SUMO Interacting Motif (SIM)

1.9.9 Diversity among Ubiquitin-Binding Domains

1.10 Thesis Overview

1.10.1 Identify and obtain near-complete structural coverage of all human ubiquitin-like domains.

1.10.2 Exploring NFATc2IP:NFATc2 & ubiquilin-1:PIN1 protein-protein interactions

Chapter 2 - The Ubiquitin Fold: Leveraging structural genomics

2.1 Summary

2.2 Introduction

2.3 Methods

2.3.1 Identifying human ubiquitin-like domains

2.3.2 Validating putative human ubiquitin-like domains

2.3.3 Target selection

2.3.4 Construct design

2.3.5 Sample preparation

2.3.6 1H15N-HSQC screening of ubiquitin-like domains

2.4 Results & Discussion

2.4.1 Identifying unannotated human ubiquitin-like domains

2.4.2 Small-Scale Screening

2.4.3 Screening by 1H15N-HSQC

2.4.4 Structural Coverage - Completing the UBL Phylogenetic Tree

2.5 Conclusion

Chapter 3 - Solution NMR structure determination of human ubiquitin-like domains in NFATc2IP & ubiquilin-1

3.1 Introduction

3.1.1 NFATc2IP

3.1.2 Ubiquilin-1


3.1.3 Ubiquitin-like Fold

3.2 Experimental Procedures

3.2.1 NFATc2IP UBL domain NMR structure determination

3.2.2 Ubiquilin-1 UBL domain NMR structure determination

3.2.3 Comparative analysis of ubiquilin-1, NFATc2IP, ubiquitin & SUMO2

3.2.4 Protein-protein interaction partner identification

3.2.5 Binding interface analysis


3.3.1 Structure determination

3.3.2 Comparative analysis of ubiquilin-1, NFATc2IP & similar ubiquitin-like modifiers

3.3.2.1 Similar canonical ubiquitin-like modifiers: ubiquitin & SUMO-2

3.3.2.2 Structural comparison between ubiquilin-1 & NFATc2IP

3.3.2.3 Structural comparison between ubiquilin-1 & ubiquitin

3.3.2.4 Structural comparison between NFATc2IP & SUMO2

3.3.2.5 Structural differences between NFATc2IP_2nd & SUMO2

3.3.3 From Structure to Function: Exploring Protein-Protein Interactions involving ubiquitin-like domains

3.3.3.1 The ubiquitin-Interacting Motif interaction interface

3.3.3.2 Putative UIM Interaction Interface: Conserved Amino Acids

3.3.3.3 Putative UIM Interaction Interface: Similar Electrostatic Potential Distribution

3.3.3.4 Surveying Known UIM-Binding Partners

3.3.3.5 PIN1 – Peptidyl-Prolyl cis/trans Isomerase

3.3.3.6 Identifying a putative UIM in PIN1

3.3.3.7 Ubiquilin-1 & PIN1 NMR Titration


3.3.3.8 Analysis of the ubiquilin-1 & PIN1 interface

3.3.4 Binding-Partner Driven - Structural analysis of the SUMO-Interacting Motif binding interface

3.3.4.1 NFATc2IP Binding Partners

3.3.5 SUMO-Interacting Motif

3.3.5.1 Identifying putative SIMs in NFATc2

3.3.6 NFATc2IP:NFATc2 NMR titration

3.3.6.1 Analysis of the NFATc2IP:NFATc2 interface

3.4 Conclusion

Chapter 4 - Exploring UBLs & UBL-Interaction Motifs: Computational & Experimental analysis of ubiquilin, NFATc2IP, UIMs and SIMs.

4.1 Introduction

4.1.1 Database & comparative analysis

4.1.1.1 Similarities & differences between model family members

4.1.1.2 Common defining features for each modelling family


4.2.1 UBL Database Development

4.2.2 Relating 17 structurally determined UBLs to nearest neighbours and model families

4.2.3 Secondary structure prediction & analysis

4.2.4 Relating structural features to functional pathways

4.3 Results

4.3.1 Structurally characterized ubiquitin-like domains

4.3.2 Nearest-neighbours of ubiquitin-like domains

4.3.3 Nearest-neighbours of structurally characterized UBMs

4.3.4 Grouping UBLs based on biological processes and molecular function


4.3.5 Grouping UBLs based on medical significance

4.3.5.1 Cellular localization

4.3.6 Grouping UBLs based on cell localization

4.4 Conclusion

Chapter 5 - Conclusion and Future Directions

5.1 Conclusions

5.2 Future Directions

5.2.1 Ubiquitin-like domain fold, NFATc2IP & ubiquilins

5.2.2 Ubiquitin-like domain structural genomics

5.2.3 Protein Domain family analyses

5.3 Concluding remarks

Chapter 6 - References


List of Tables

Table 1.1: List of 18 annotated ubiquitin-like modifiers, and associated enzymatic complement, substrates and functional pathways.

Table 1.2: Protein-protein interaction modes structurally characterized with experimentally determined binding affinities between UBLs and binding partners.

Table 2.1: Summary of small-scale expression screening of human ubiquitin-like domains structurally characterized and deposited in the PDB as part of this thesis.

Table 2.2: Summary of 1H15N-HSQC screening results for human ubiquitin-like domains. 10 ubiquitin-like domains were solved by NMR (red), and 7 ubiquitin-like domains were solved by X-ray crystallography (blue).

Table 2.3: All human ubiquitin-like domains that remain to be structurally determined, along with their most similar protein structure and biological significance.

Table 3.1: NMR data and refinement statistics.

Table 3.2: Secondary structure elements of NFATc2IP, ubiquilin-1, ubiquitin and SUMO1/2/3.

Table 3.3: Sequence similarity & identity between NFATc2IP, ubiquilin-1, ubiquitin and SUMO1/2/3/4.

Table 3.4: UIM:ubiquitin complexes deposited in the PDB, along with UIM sequence.

Table 3.5: Human proteins that contain at least one canonical UIM motif and observed to interact with ubiquitin, along with the number of supporting publications and supporting structural complexes that have been deposited in the PDB.

Table 3.6: Human proteins that contain at least one canonical UIM motif and observed to interact with members of the ubiquilin family (Turner et al., 2010).

Table 3.7: 17 human proteins that interact with both human ubiquitin and a member of the ubiquilin family, and that also contain at least one UIM motif.

Table 3.8: UIM motif and 4 variations of the UIM motif were used to identify 17 human proteins that interact with both human ubiquitin and a member of the ubiquilin family.

Table 4.1: Data sources for ubiquitin-like domain repository.

Table 4.2: Biological significance and functional annotation for each of the 17 ubiquitin-like domains structurally characterized for this project.

Table 4.3: Tissue and cell localization for each of the 17 UBL structurally characterized for this project.

Table 4.4: Structural alignment of lysines within ubiquitin and ubiquitin-like domains characterized within both cytoplasm and ER; nucleus, cytoplasm and ER; and only nucleus.


List of Figures

Figure 1.1: Ribbon & molecular surface representations of ubiquitin.

Figure 1.2a: Phylogenetic tree of known ubiquitin-like domains in 2006.

Figure 1.2b: Phylogenetic tree of known ubiquitin-like domains in 2015.

Figure 1.3: Ubiquitin-like modifier conjugation cascade.

Figure 1.4: Ubiquitin conjugation cascade.

Figure 2.1: Novel UBL discovery process.

Figure 2.2: Secondary & tertiary structures of Human ubiquilin-1.

Figure 2.3: Pseudo-multiple sequence alignment of human ubiquilin-1.

Figure 2.4: UBL target selection, preparation and screening process.

Figure 2.5: Pseudo-multiple sequence alignment of ubiquilin-1 for construct design.

Figure 2.6: Distribution of structurally characterized and uncharacterized UBLs.

Figure 2.7: Examples of 1H15N-HSQC screening results for human UBLs.

Figure 2.8: Clustering of human UBLs into groups based on sequence similarity.

Figure 3.1: Secondary structure and H-bond patterns of ubiquilin-1.

Figure 3.2: Secondary structure and H-bond patterns of NFATc2IP.

Figure 3.3: Ribbon diagrams of ubiquilin-1, NFATc2IP, ubiquitin, SUMO1, SUMO2 & SUMO3.

Figure 3.4: Molecular surfaces of ubiquilin-1.

Figure 3.5: Molecular surfaces of NFATc2IP.

Figure 3.6: UIM-interaction interface of ubiquilin-1 and NFATc2IP.

Figure 3.7: Similarities between ubiquilin-1 and NFATc2IP.

Figure 3.8: Similarities between ubiquilin-1 and ubiquitin.

Figure 3.9: Similarities between NFATc2IP and SUMO2.

Figure 3.10: UIM α-helices from PSMD4, VPS27 and HGS.

Figure 3.11: Ubiquitin:PSMD4(UIM) complex.


Figure 3.12: UBL residues within UIM-interaction interface.

Figure 3.13: Multiple sequence alignment of UBLs from ubiquilin family members.

Figure 3.14: Similarity tree based on electrostatic potential within 4 Å of UIM-binding interface.

Figure 3.15: Sequence alignment of UIMs within PSMD4, DNJB2, EPN1 and PIN1.

Figure 3.16: Putative human PIN1 UIM.

Figure 3.17: Ubiquilin-1:PIN1 NMR titration.

Figure 3.18: Putative ubiquilin-1:PIN1 interaction.

Figure 3.19: NFATc2 SUMO Interacting Motifs.

Figure 3.20: Diversity of SIM motifs.

Figure 3.21: NFATc2IP:NFATc2 NMR titration.

Figure 3.22: Electrostatic potential of NFATc2IP & SUMO2.

Figure 3.23: Electrostatic potential diversity between similar UBLs.

Figure 4.1: Database schema of ubiquitin-like domain repository.

Figure 4.2: Secondary & tertiary structures of 17 structurally characterized UBLs.

Figure 4.3: Nearest-neighbour clustering of UBLs displayed with proportional transformed branches.

Figure 4.4: UBLs with a structural fold similar to FUBI-1.

Figure 4.5: UBLs with a structural fold similar to the second UBL of ISG15.

Figure 4.6: UBLs with a structural fold similar to SF3A1.

Figure 4.7: Distribution of human UBLs based on cellular localization.


List of Appendices

Appendix I: All human genes that encode at least one ubiquitin-like domain.

Appendix II: All human genes and isoforms that encode ubiquitin-like domains.

Appendix III: 205 proteins observed to interact with both ubiquitin and at least one member of the ubiquilin family.

Appendix IV: 127 putative UIM sequences within 106 proteins that interact with both ubiquitin and at least one member of the ubiquilin family.

Appendix V: Six similarity trees of ubiquitin-like domains clustered based on electrostatic potential at varying distances (1 Å to 6 Å) from the UIM-binding interface, along with groups of ubiquitin-like domains that share strong electrostatic potential similarity at that specific range.


List of Abbreviations

AESOP Analysis of electrostatic similarities of proteins

CUE Coupling of ubiquitin conjugation to endoplasmic reticulum degradation

DUB De-ubiquitylating enzyme

DUIM Double-sided ubiquitin interacting motif

E1 Ubiquitin activating enzyme

E2 Ubiquitin conjugating enzyme

E3 Ubiquitin protein ligase

GAT GGA and Tom1 domain

GLUE GRAM-like ubiquitin binding in Eap45

IPTG Isopropyl-1-thio-D-galactopyranoside

MIU Motif interacting with ubiquitin

NESG Northeast Structural Genomics Consortium

NFAT Nuclear factor of activated T-cells

NMR Nuclear magnetic resonance

NZF Npl4 Zinc Finger Motif

PAZ Polyubiquitin associated zinc finger

PE Phosphatidylethanolamine

PIN1 Peptidyl-prolyl cis/trans isomerase

PSSM Position-specific scoring matrix

SGC Structural genomics consortium

SIM SUMO interacting motif

UBA Ubiquitin-associated domain

UBD Ubiquitin-binding domain

UBL Ubiquitin-like domain

UBM Ubiquitin-like modifier


UEV Ubiquitin conjugating enzyme variant

UIM Ubiquitin interacting motif

VHS Vps27, Hrs, STAM

Chapter 1

Introduction

1.1 Overview

Ubiquitin, the original member of the ubiquitin-fold superfamily, is a highly conserved 76 residue

regulatory protein found in all eukaryotic cells. It was initially characterized as a post-translational

modification moiety that mediates ATP-dependent proteolytic degradation, yet has since been

recognized as a signaling modulator with multiple regulatory roles mediated by transient protein-

protein interactions. My research focuses on the similarities and variations between human

ubiquitin-like domains, and their influence on protein-protein interactions. My goal is to define the

family of ubiquitin-like domains in the human proteome, to understand the extent of amino acid

diversity within their protein-protein interaction interfaces, and to gain insight into their

functional pathways. The first chapter provides an introduction

to ubiquitin and ubiquitin-like domains, as well as a rationale for the aims of this thesis. Chapter

Two discusses structural genomics approaches that were implemented to facilitate the

experimental screening and determination of 17 human ubiquitin-like domains for this project.

Chapter Three describes the structure determination of the second ubiquitin-like domain of

NFATc2IP and the ubiquitin-like domain of ubiquilin-1, and introduces approaches for predicting

functional activity by combining their structural data with information about other ubiquitin-like

domains. This chapter also examines protein-protein interactions that were predicted between

NFATc2IP and NFATc2 through a predicted SIM-like interaction, as well as interactions between

ubiquilin-1 and PIN1 through a predicted UIM-like interaction. Chapter Four combines additional

analyses with the lessons learned from Chapters Two and Three to facilitate analyses and

predictions related to the set of human ubiquitin-like domains associated with the 17 ubiquitin-like

domains that were structurally characterized as part of this thesis. The final chapter of the

dissertation discusses the significance of these findings, relating observations to the entire human

ubiquitin-like domain superfamily, in addition to providing future directions and concluding

remarks.

1.2 Biological significance of ubiquitin & ubiquitin-like modifiers

Conjugation of ubiquitin and ubiquitin-like modifiers is necessary for the regulation and

translocation of proteins. Ubiquitin conjugation, also referred to as ubiquitylation, has been

implicated in the regulation of cellular processes such as protein degradation, cell cycle

control, transcription regulation, DNA damage repair, antigen processing, activation of

transcriptional factors and kinases, endocytosis, protein sorting, membrane trafficking, and stress

response (Haglund et al., 2005). Ubiquitylation is also involved in biological functions, such as

inflammation, cellular differentiation, and silencing the inactive X chromosome in female

mammals (de Napoles et al., 2004). The disruption of ubiquitin conjugation pathways has been

associated with various human illnesses, including neurodegenerative disorders, developmental

abnormalities, autoimmune diseases, neuromuscular disorders and malignancies (Ciechanover

et al., 2004). UBMs are also involved in a variety of biological processes, including pathogenesis

of viruses and bacteria. Some UBMs protect against viruses, whereas some viruses depend on UBMs

for survival, and some bacterial effectors target the ubiquitylation machinery (Angot et al., 2007).

1.3 Protein modification & ubiquitin

Ubiquitin was discovered in 1975 (Schlesinger et al., 1975) and was later identified as a tag for

targeted proteasomal degradation. Proteins are targeted for proteasomal degradation through

a process referred to as ubiquitylation, which involves covalent modification of a surface-exposed

lysine by ubiquitin. Ubiquitin is a highly conserved 76-residue protein found only in eukaryotic cells.

In humans, four genes encode ubiquitin in two distinct classes: poly-ubiquitin genes that encode

precursor proteins with tandemly repeated ubiquitin domains (i.e. UBB and UBC), and fusion genes

that encode precursor proteins in which a single ubiquitin domain is linked to a ribosomal

protein (i.e. RPS27A and UBA52). The ubiquitin region of all four genes is entirely conserved,

suggesting that mutations are negatively selected. The fusion of ubiquitin to ribosomal proteins

has been suggested to promote their association with ribosomes (Finley

et al., 1989). This is an interesting attribute, since the putative UBM FAU is also fused to a

ribosomal protein and the gene structure could relate to the functional activity of the protein.

1.4 The ubiquitin fold

Figure 1.1: Ribbon & molecular surface representations of ubiquitin. The secondary structure elements and

molecular surface of the ubiquitin fold are displayed from two orientations with conserved lysine amino acids displayed as cyan ball and stick representation.

The ubiquitin-fold consists of a 5-strand mixed β-sheet that is intercalated by a 2-helix α-helical

core (Figure 1.1). There are 5 key structural features of ubiquitin that are associated with its

biological activity: the C-terminal -RLRGG peptide, 7 lysine residues that could be involved in

poly-ubiquitin chain formation (Komander et al., 2009), a conserved leucine 8 / isoleucine 44 /

valine 70 triad involved in E1 and ubiquitin-binding domain interactions, histidine 68 involved in

E1-ubiquitin thioester formation, and protein-protein interaction interfaces associated with

interactions with ubiquitin-binding domains that regulate a variety of downstream molecular

pathways. These structural features were used when performing comparative analyses of UBLs.
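As an illustration only, the sequence-level features listed above can be checked directly against the human ubiquitin sequence. The sketch below is not part of the thesis workflow: the sequence string is supplied here for the example, and the helper names (residue, lysine_positions, summarize_features) are hypothetical.

```python
# Human ubiquitin, 76 residues. The sequence is supplied for this example
# and is not reproduced in the surrounding text.
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)


def residue(seq, position):
    """Return the one-letter residue at a 1-based sequence position."""
    return seq[position - 1]


def lysine_positions(seq):
    """Return the 1-based positions of every lysine (K) in the sequence."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "K"]


def summarize_features(seq):
    """Collect the sequence features named in the text (hypothetical helper)."""
    return {
        "length": len(seq),
        "c_terminal_tail": seq[-5:],
        "lysines": lysine_positions(seq),
        "hydrophobic_triad": "".join(residue(seq, p) for p in (8, 44, 70)),
        "residue_68": residue(seq, 68),
    }


if __name__ == "__main__":
    for name, value in summarize_features(UBIQUITIN).items():
        print(name, value)
```

Using 1-based positions keeps the output comparable with the residue numbering used in the text (for example leucine 8 and isoleucine 44). The same helpers could be applied to an aligned UBL sequence, although alignment gaps would need to be handled first.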

1.5 Ubiquitin-like domains (UBLs)

Figure 1.2a: Phylogenetic tree of known ubiquitin-like domains in 2006. There were 78 protein domains classified as human ubiquitin-like domains in 2006, of which 18 were known ubiquitin-like modifiers (blue) and 7 domains were putative ubiquitin-like modifiers based on sequence features (orange). Ubiquilin1 & NFATc2IP are highlighted with red

arrows, because they play a significant role in this dissertation.

Figure 1.2b: Phylogenetic tree of known ubiquitin-like domains in 2015. There are 448 human ubiquitin-like domains identified through the bioinformatics techniques described in this thesis; 18 of the domains are known ubiquitin-like modifiers [ATG8, FAU_1-1, ISG15_1-2, NEDD8_1-1, SUMO1_1-1, SUMO1_2-1, SUMO2_1-1, SUMO2_2-1, SUMO3_1-1, UBB_1-1/UBC_1-1/RPS27A_1-1/UBA52_1-1, URM1_1-1, UBD_1-2 (aka FAT10), and UFM1_1-1], and 22 domains are putative ubiquitin-like modifiers based on sequence features [HERPUD2_1-1, PARK2_1-1/PARK2_2-1/PARK2_5-1, PARK2_2-2, PIK3CA_1-2, PTPN3_1-2, PTPN13_3-6/PTPN13_4-7, SF3A1_1-1, SHARPIN_1-1/SHARPIN_2-1/SHARPIN_3-1, TMUB2_1-1/TMUB2_2-2, SHROOM1_1-1/SHROOM1_2-1, USP40_3-1, USP5_1-1, VCPIP1_1-2, WDR48_1-1, and WDR48_5-1].

Within the human genome, there are 220 genes that encode 448 protein domains that share the

same structural fold as ubiquitin (Figure 1.2b); at the start of this project in 2006, there were 78

known human ubiquitin-like domains of which 18 were known ubiquitin-like modifiers and 7 were

putative ubiquitin-like modifiers (Figure 1.2a). Even with the same structural fold, they have

different binding partners and diverse biological functions in the host organism, as well as viral

and bacterial pathogens. Sixteen of these UBLs have been characterized as UBMs, which can

become conjugated to target proteins (Table 1.1). An additional 22 putative UBMs are predicted

to become conjugated to target proteins due to the presence of a characteristic C-terminal double-

glycine tail, but lack evidence of conjugated substrate formation. The remaining 410 UBLs contain

a ubiquitin-like fold along with other structural domains, and can modulate the ubiquitylation

pathway in some cases by competing with UBMs when interacting with proteins that contain

ubiquitin-binding domains (Hochstrasser et al., 2009).
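The three-way grouping described above can be restated as a minimal sketch. This is an assumption-laden simplification rather than the procedure used in this thesis: the function names are hypothetical, the only sequence criterion applied is a C-terminal double-glycine, and the test fragments (apart from the ubiquitin C-terminus) are invented.

```python
# Tallies quoted in the text above (448 domains in total).
UBL_COUNTS = {"UBM": 16, "putative UBM": 22, "non-modifying UBL": 410}


def has_double_glycine_tail(domain_sequence: str) -> bool:
    """True when the domain ends in the characteristic C-terminal Gly-Gly."""
    return domain_sequence.strip().upper().endswith("GG")


def classify_ubl(domain_sequence: str, conjugation_observed: bool) -> str:
    """Hypothetical labelling that mirrors the grouping described in the text."""
    if conjugation_observed:
        # Experimental evidence of conjugation takes priority over sequence.
        return "UBM"
    if has_double_glycine_tail(domain_sequence):
        return "putative UBM"
    return "non-modifying UBL"


if __name__ == "__main__":
    # First fragment is the ubiquitin C-terminus; the other two are invented.
    examples = [
        ("TLHLVLRLRGG", True),
        ("SVELKVRLGG", False),
        ("SVELKVRLSN", False),
    ]
    for fragment, observed in examples:
        print(fragment, classify_ubl(fragment, observed))
    print("total domains:", sum(UBL_COUNTS.values()))
```

Checking the experimental flag first reflects the point that some characterized modifiers lack the double-glycine tail, so sequence alone cannot define the UBM group.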

1.6 Ubiquitin-like modifiers (UBM)

Until the 1990s, ubiquitin was thought to be the only post-translational modification that involved

the covalent linkage of a protein modifier. ISG15/UCRP was then found to be conjugated through

a similar mechanism and became the first UBM studied in vitro (Loeb KR & Haas AL, 1992). Most

of the UBMs become conjugated to surface exposed lysines of target proteins through an

analogous but distinct enzymatic cascade. Many UBMs are associated with essential cellular

processes, yet the amount of functional information about them remains limited.

Of the UBMs that have been functionally characterized: SUMO targets lysines within conserved

motifs (i.e. ФKXE, phosphorylation-dependent sumoylation motif & negatively charged amino acid-

dependent sumoylation motif) (Yang et al., 2006), and is involved in transcriptional regulation and

genome surveillance (Müller et al., 2004). NEDD8 modification is involved in cell cycle control

and in embryogenesis by up-regulating the activities of cullin-based E3 ligases (Pan et al., 2004).

Covalent attachment of Atg12 to Atg5 is essential for autophagy (Mizushima et al., 1998). Apg8,

MAP1LC3A, MAP1LC3B, MAP1LC3C, GABARAP, GABARAPL1, and GABARAPL2 are involved

in lipidation through a ubiquitylation-like system (Ichimura et al., 2000). UBL5 is a unique member

of the UBMs, since it contains a C-terminal double-tyrosine motif, instead of the characteristic

double-glycine. The structure of UBL5 was solved by NMR, and the overall fold was similar to

ubiquitin, even though they share only 17.5% sequence identity (McNally et al., 2003). However,

experimental evidence remains necessary to determine whether UBL5 conjugation occurs.
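The ФKXE consensus mentioned above lends itself to a simple sequence scan. The following is a minimal sketch under stated assumptions: Ф is taken to be any residue in a hypothetical hydrophobic set (A, F, I, L, M, V), X is any residue, and the test sequence is invented. It does not cover the phosphorylation-dependent or negatively charged extended motifs.

```python
import re

# Assumption: residues accepted at the hydrophobic (Ф) position.
HYDROPHOBIC = "AFILMV"

# A lookahead is used so that overlapping candidate sites are all reported.
CONSENSUS = re.compile(rf"(?=([{HYDROPHOBIC}]K[A-Z]E))")


def find_consensus_sites(sequence):
    """Return (1-based lysine position, matched tetrapeptide) for each hit."""
    sequence = sequence.upper()
    # m.start() is the 0-based index of the hydrophobic residue, so the
    # lysine sits at m.start() + 1 (0-based), i.e. m.start() + 2 (1-based).
    return [(m.start() + 2, m.group(1)) for m in CONSENSUS.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "MSDLKTEAAVKPEGG"  # invented sequence for illustration
    for lysine_position, motif in find_consensus_sites(toy_sequence):
        print(lysine_position, motif)
```

A match only marks a candidate lysine. Whether a site is actually modified depends on structural context and on experimental confirmation.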

Table 1.1: List of 18 annotated ubiquitin-like modifiers and associated enzymatic complement, substrates and functional pathways.

| Ubiquitin-like Modifier | Yeast Homologue | % Seq ID | C-term | E1 | E2 | E3 | USP / DUB | Mono / Poly | Substrate | Functional annotation |
|---|---|---|---|---|---|---|---|---|---|---|
| Ubiquitin | Ubiquitin | 100% | Yes | Ube1 / Uba6 | >37 | >600 | ~80 | M & P | Thousands | Many, dependent on linkages |
| Nedd8 | Rub1 | 58% | Yes | UBA3-APPBP1 | Ubc12, Ube2F | RBX1/RBX2, SMURF1, CBL, MDM2, MDMX, SCF, TRIM40 | SENP8 | M & P | Cullins and related proteins (Parc and Cul7), p53, p73, Mdm2, pVHL, BCA3, EGFR | Alter interactions, conformation |
| MNSFβ (Fub1, Fau) | | 36% | Yes | | | | | | TCRα-like protein, Bcl-G, Endophilin II | Immuno-regulatory role |
| ISG15 (UCRP) | | 28/37 | Yes | Ube1L | UbcH8, UbcH6 | Herc5 | UBP43 | M | Viral and host proteins | Antiviral immunity, IFN-inducible |
| FAT10 | | 27/36 | No | Uba6 | Use2 | | | | Use2 | Ub-independent proteasomal degradation, immunoregulatory role |
| UFM1 | | 23 | Yes | Uba5 | Ufc1 | Ufl1 | UfSP1, UfSP2 | | C20orf116 | Erythroid and megakaryocyte development |
| SUMO1 | Smt3 | 14 | Yes | SAE1-SAE2 | Ubc9 | ~15 | SENP1-2 | M | Hundreds | Alter interactions, localization, conformation |
| SUMO2 | | 13 | Yes | SAE1-SAE2 | Ubc9 | ~15 | SENP1-3, 5-7 | M & P | Hundreds | Alter interactions, localization, conformation |
| SUMO3 | | 13 | Yes | SAE1-SAE2 | Ubc9 | ~15 | SENP1-3, 5-7 | M & P | Hundreds | Alter interactions, localization, conformation |
| SUMO4 | | 12 | | | | | | | IκBα | NFκB signaling, pseudogene or not processed |
| Atg12 | Atg12 | 12 | No | Atg7 | Atg10 | | | M | Atg5, Atg3 | Autophagy, mitochondrial homeostasis |
| Urm1 | Urm1 | 17 | No | MOCS3 | | | | M | MOCS3, ATPBD3, UPF0432, CAS, USP15, yeast: Ahp1 | tRNA thiolation and oxidant-induced protein modification |
| MAP1LC3A | Atg8 | 9 | Yes | Atg7 | Atg3 | Atg12/5/16L | Atg4A-D | M | Phosphatidylethanolamine (PE) | Autophagosome biogenesis: tethering and fusion |
| MAP1LC3B | Atg8 | 13 | Yes | Atg7 | Atg3 | Atg12/5/16L | Atg4A-D | M | Phosphatidylethanolamine (PE) | |
| MAP1LC3C | Atg8 | 10 | Yes | Atg7 | Atg3 | Atg12/5/16L | Atg4A-D | M | Phosphatidylethanolamine (PE) | |
| GABARAP | Atg8 | 8 | Yes | Atg7 | Atg3 | Atg12/5/16L | Atg4A-D | M | Phosphatidylethanolamine (PE) | Selective autophagy via interaction with autophagy receptors |
| GABARAPL1 / Atg8L / GEC1 | Atg8 | 12 | Yes | Atg7 | Atg3 | Atg12/5/16L | Atg4A-D | M | Phosphatidylethanolamine (PE) | Functional difference between isoforms is unclear |
| GABARAPL2 / GATE-16 / GEF2 | Atg8 | 14 | Yes | Atg7 | Atg3 | Atg12/5/16L | Atg4A-D | M | Phosphatidylethanolamine (PE) | Functional difference between isoforms is unclear |

1.7 Ubiquitin-like structural domains (UBL)

The human genome contains 220 genes that encode proteins with at least one ubiquitin-like

domain, of which 38 can be classified as known or potential UBMs. The remaining non-modifying

UBLs could act as permanent structural features that facilitate protein targeting interactions to

regulate a variety of cellular activities that include transcription, translation, nuclear transport,

proteolysis, autophagy, antiviral pathways, and processes associated with poly-ubiquitylation,

such as endocytosis, membrane-protein trafficking, cell signaling and DNA repair (Grabbe & Dikic,

2009). There is no known generalizable function for the UBL fold, aside from mediating protein-

protein interactions and the role of the small set of UBMs.

1.8 Ubiquitin Conjugation Cascade

Ubiquitin and UBMs are conjugated to their target substrate through a series of enzymatic

reactions that result in conjugation of the C-terminus of the ubiquitin-like fold to the ε-amino group of

a surface exposed lysine within the target substrate. The enzymes involved in this cascade

consist of an E1, an E2, and an E3 (Figure 1.3 & Figure 1.4). A computational analysis has

determined that there are 16 human E1s, 53 human E2s, 527 human E3s, and 184 human DUBs

(Xu & Peng, 2006; Semple CA, 2003).

The activating enzyme (E1) activates ubiquitin by catalyzing the ATP-dependent formation of a

thioester bond involving a free thiol of the catalytic Cys and the C-terminal glycine of ubiquitin,

which facilitates the transfer of the C-terminal glycine to a surface exposed Cys on a conjugating

enzyme (E2) (Figure 1.4). This is followed by either the C-terminal glycine of ubiquitin being

transferred to a Cys of a protein ligase (E3) or the formation of a covalent bond between

the C-terminal glycine and the ε-amino group of a surface-exposed lysine within the target protein.

In rare cases, the N-terminal amino group, a cysteine residue, a

threonine residue, or a serine residue within a target protein acts as the ubiquitylation site (Wang et

al., 2007).

Figure 1.3: Ubiquitin-like modifier conjugation cascade. Enzymes in the ubiquitin conjugation cascade consist of

E1, E2s, and in some cases E3s that are uniquely associated with specific UBMs (Hochstrasser M, 2000).

Figure 1.4: Ubiquitin conjugation cascade. The enzymatic cascade that mediates ubiquitin conjugation is similar for

all UBMs. It involves ATP, ubiquitin activating enzymes (E1), ubiquitin conjugating enzymes (E2), and ubiquitin ligases (E3), and results in the conjugation of the UBM to a surface exposed lysine on the target protein. Conjugation is a dynamic process, and de-ubiquitylating enzymes (DUBs) can release the UBM from the target protein.

1.9 Ubiquitin-binding domains & interactions

Ubiquitin-binding proteins are key players in modulating the downstream activity of UBM

conjugation. Ubiquitin-binding proteins contain regions of 20 to 150 residues that non-

covalently interact with members of the ubiquitin-fold superfamily. Some ubiquitin-binding regions

are independent domains (e.g. UBA, VHS, CUE), and other ubiquitin-binding regions consist of

individual secondary structure elements (e.g. UIM and SIM). Ubiquitin-binding domains (UBDs)

were first identified as interaction partners of ubiquitin, but several UBD family members do not

interact with ubiquitin. The specificity of such ubiquitin-binding domain proteins could favour other

UBLs.

Many UBDs have been observed in the enzymatic components of the UBM cascade, as well as

in proteins that are involved in the downstream translocation or functional effect of protein

conjugation. Due to the transient nature of these interactions, binding affinities are moderate to

low: a Kd of ~460 µM for GRAM-like ubiquitin binding in Eap45 (GLUE)-monoubiquitin,

compared to an apparent Kd of ~0.03-9 µM for UBA-polyubiquitin (Haglund et al., 2005). The

interaction itself appears to be controlled by post-translational modification of the UBD-containing

protein, accessibility of the ubiquitin-binding interface and accessibility of the UBD-binding

interface. A relevant example of UBD modulation involves RAD23, which shuttles conjugated

proteins to the proteasome. The RAD23-ubiquitin interaction is inhibited by the association of its

UBD with its UBL (Chen et al., 2001). Whether the role of the UBL is to regulate UBD-ubiquitin or

UBD-UBM interactions is explored over the course of this thesis.
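
To put the affinities quoted above in perspective, the following is a minimal sketch that converts a dissociation constant into fractional occupancy. It assumes simple 1:1 equilibrium binding, which ignores the avidity effects behind the apparent Kd reported for polyubiquitin, and the free ubiquitin concentration of 100 µM is a hypothetical value chosen only for illustration.

```python
def fraction_bound(free_ligand_uM: float, kd_uM: float) -> float:
    """Fractional occupancy for 1:1 equilibrium binding: [L] / ([L] + Kd)."""
    if free_ligand_uM < 0 or kd_uM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_ligand_uM / (free_ligand_uM + kd_uM)


# Hypothetical free ubiquitin concentration, for illustration only.
FREE_UBIQUITIN_UM = 100.0

# Kd values quoted in the text (Haglund et al., 2005).
examples = [
    ("GLUE-monoubiquitin", 460.0),
    ("UBA-polyubiquitin, weakest quoted", 9.0),
    ("UBA-polyubiquitin, tightest quoted", 0.03),
]

for label, kd in examples:
    occupancy = fraction_bound(FREE_UBIQUITIN_UM, kd)
    print(f"{label}: Kd = {kd} uM, fraction bound = {occupancy:.3f}")
```

Under these assumptions a weak binder such as GLUE is mostly unoccupied at a ligand concentration where a UBA domain is close to saturation, which is consistent with the transient nature of many of these interactions.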

1.9.1 Ubiquitin Interacting Motif (UIM), Motif Interacting with Ubiquitin (MIU) & Double-sided Ubiquitin Interacting Motif (DUIM)

The ubiquitin interacting motif (UIM) is the ubiquitin-interacting region of the S5A/RPN10

proteasomal subunit (Young et al., 1998). The UIM is a short ~20 aa α-helical segment of a

protein. Through sequence analysis, putative human UIMs were identified and some of these

peptides were selected as putative UIM binding partners for ubiquitin and ubiquilin-1. Two

additional ubiquitin-interacting motifs are similar to the UIM: MIUs which bind in a manner almost

identical to the UIM:Ub interaction but in the opposite orientation, and DUIMs which consist of two

tandem UIMs.

1.9.2 Coupling of Ubiquitin conjugation to Endoplasmic Reticulum Degradation (CUE)

The coupling of ubiquitin conjugation to endoplasmic reticulum degradation domain was

discovered through yeast two-hybrid screening by two independent groups (Shih et al., 2003;

Donaldson et al., 2003), and structural analyses have resulted in 7 structures (e.g. CUE2 [PDB_ID:

1OTR] & VPS9 [PDB_ID: 1P3Q]). The CUE domain consists of a three-helix bundle, from which

residues on two α-helices interact with ubiquitin.

1.9.3 Ubiquitin-Associated Domain (UBA)

The ubiquitin-associated domain (UBA) was identified through bioinformatics analyses of

enzymes involved in ubiquitylation or deubiquitylation (Hoffmann et al., 1996). UBA domains interact with

both monoubiquitylated and polyubiquitylated proteins, and structural analyses have resulted in

45 structures (e.g. Dsk2p [PDB_ID: 1WR1] & ubiquilin 3 [PDB_ID: 2DAH]). The UBA domain is

similar to the CUE domain in that it consists of a three-helix bundle, from which residues on two

α-helices interact with ubiquitin.

1.9.4 Ubiquitin Conjugating Enzyme Variant (UEV)

The ubiquitin conjugating enzyme variant (UEV) proteins are homologous to E2s, but are inactive

because they lack the active site Cys. Even though they are catalytically inactive, they are able

to interact with ubiquitin through their conserved ubiquitin-binding interface (Koonin et al., 1997).

Structural analyses of UEV have resulted in 12 structures (e.g. TSG101 [PDB_ID: 1S1Q] & VPS23

[PDB_ID: 1UZX]).

1.9.5 Npl4 Zinc Finger Motif (NZF)

The Npl4 zinc finger (NZF) motif is a zinc finger ubiquitin-binding motif (Meyer et al., 2002; Wang et

al., 2003). Structural analyses of NZF have resulted in 3 structures [PDB_ID: 1Q5W, 1NJ3,

2PJH]. The NZF motif binds to ubiquitin through three residues that are located on loops

coordinated by strands ordered by the zinc ion.

1.9.6 GGA And Tom1 Domain (GAT)

The GGA and Tom1 (GAT) domain was discovered by two-hybrid screens (Shiba et al., 2004),

and structural analyses have resulted in 5 structures [PDB_ID: 1YD8, 1WR6, 1WRD, 2C7M, and

2C7N]. The GAT domain is similar to both the CUE and the UBA domains in that it consists of a

three-helix bundle, from which residues on two α-helices interact with ubiquitin. However, the

orientation of the helices differs, such that the two α-helices are parallel for GAT and are anti-

parallel in both CUE and UBA.

1.9.7 Other Ubiquitin Binding Domains

The GRAM-like ubiquitin binding in Eap45 (GLUE) domain has been structurally determined 4

times (Teo et al., 2006), and the Vps27,Hrs,STAM (VHS) domain has been structurally

determined 12 times (Hoffman et al., 2001). The polyubiquitin associated zinc finger (PAZ)

domain was discovered by two-hybrid screens, and was further characterized biochemically

(Hook et al., 2002).

1.9.8 SUMO Interacting Motif (SIM)

Binding partners and modes have been identified for some ubiquitin-like modifiers, such as the

SUMO-interacting Motif (SIM) that interacts with SUMO. The SIM is a short β-strand that behaves

as a β-sheet extension to that of SUMO.

1.9.9 Diversity among Ubiquitin-Binding Domains

From the structural studies of UBD-UBM interactions, some similarities have been observed.

However, there is great diversity in the tertiary folds of the proteins involved in the

interaction; residues from individual and adjacent α-helices, β-strands, as well as loops interact

with ubiquitin or a ubiquitin-like domain (Table 1.2). Binding modes also vary among members

of the same UBD family. However, one common feature

shared by many of the UBD interactions is that they usually extend along the isoleucine 44 face

of ubiquitin, which is highly conserved throughout evolution and to a minor extent between UBLs

(Haglund et al., 2005).

Table 1.2: Protein-protein interaction modes that have been structurally characterized with experimentally determined

binding affinities between UBLs and binding partners.

Ubiquitin Binding Type | Size | Affinity | Example PDB | Reference
UIM / DUIM / MIU (Ubiquitin Interacting Motif) | ~20 aa | ~100-400 µM (mono or poly-Ub); ~30 µM (MIU) | 1Q0W | Young P, 1998; Fisher RD, 2003; Swanson KA, 2003; Wang QH, 2005
SIM (SUMO Interacting Motif) | ~12 aa | ~2-10 µM | 2ASQ | Song J, 2005; Hecker CM, 2006
CUE (Coupling of Ubiquitin conjugation to Endoplasmic Reticulum Degradation) | 42-43 aa | ~2-160 µM (mono-Ub) | 1P3Q, 1OTR | Donaldson KH, 2003; Kang RS, 2003; Prag G, 2003; Shih SC, 2003
GAT (GGA And Tom1 Domain) | 135 aa | ~180 µM (mono-Ub) | 1YD8 | Shiba Y, 2004; Prag G, 2005
GLUE (GRAM-like ubiquitin binding in Eap45) | ~135 aa | ~460 µM (mono-Ub) | 2DX5 | Teo H, 2006
NZF (Npl4 Zinc Finger Motif) | ~35 aa | ~100-400 µM (mono-Ub) | 1Q5W | Meyer HH, 2002; Wang B, 2003; Alam SL, 2004
A20 ZnF (A20 ZnF Domain) | ~35 aa | ~10-25 µM | 2FID, 2FIF, 2G45 | Lee S, 2006; Penengo L, 2006
UBC (Ubiquitin Conjugating Catalytic Domain) | ~150 aa | ~300 µM | 2FUH | Brzovic PS, 2006
UBA (Ubiquitin-Associated Domain) | 45-55 aa | ~10-500 µM (mono-Ub); ~0.03-9 µM (poly-Ub) | 2JY6, 1ZO6 | Hofmann K, 1996
PAZ (ZnF-UBP) (Polyubiquitin Associated Zinc finger) | ~58 aa | ~3 µM; ~60 nM | 2G45, 3IHP | Hook SS, 2002; Boyault C, 2006; Reyes-Turcu, 2006
UEV (Ubiquitin Conjugating Enzyme Variant) | ~145 aa | ~100-500 µM (mono-Ub) | 1S1Q | Koonin EV, 1997; Sundquist WI, 2004
VHS (Vps27,Hrs,STAM) | 150 aa | ~50 µM | 2L0T, 3LDZ | Hong YH, 2009

1.10 Thesis Overview

Ubiquitin plays a vital role in protein trafficking, protein degradation, and a variety of disease

pathways. Significant advances in the study of ubiquitin, ubiquitin-binding domains, UBLs,

ubiquitin-like modifiers, and ubiquitin conjugating enzymes have led to a better understanding of

the complexity of the ubiquitin and ubiquitin-like modifier conjugation system. However, there

remains a gap in knowledge associated with the overarching significance of the ubiquitin fold, and

the nature and function of many UBLs remains largely unexplored.

This thesis explores the size and scope of human UBLs, which led to a structural and biophysical

examination of 17 UBLs. Analysis of these 17 UBLs led to the characterization of two UBL-binding domains

that interact with two distinct UBLs (NFATc2IP & ubiquilin-1), and revealed the biochemical

relationships among these 17 UBLs and within the full set of all UBLs.

1.10.1 Identify and obtain near-complete structural coverage of all human UBLs.

The first experimental component of this study focused on identifying the complete set of all

human UBLs encoded within the human genome, which allowed for a better understanding of the

breadth and sequence diversity of ubiquitin’s β-grasp fold. Upon determination of the expansive

population of human UBLs, we obtained near-complete structural coverage of the ubiquitin-like

fold for the human proteome. This resulted in generating 100 modelling families of related UBLs

and experimental structural determination of 17 UBLs.

1.10.2 Exploring the NFATc2IP + NFATc2 protein-protein interaction and the ubiquilin-1 + PIN2 protein-protein interaction

To assist in understanding the structural and functional diversity of the ubiquitin-like domain,

computational analyses of NFATc2IP & ubiquilin protein sequences, molecular structures and

known binding partners were performed. This led to the deduction that NFATc2IP could interact

with NFATc2 via a SIM-like interaction, which was validated using peptide-array and NMR titration

experiments. A similar series of computational analyses was performed using the ubiquilin-1

protein sequence and structure, which led to the deduction that PIN2 could interact with ubiquilin

via a UIM-like interaction. This was validated using NMR titration experiments.

Chapter 2

The ubiquitin fold: leveraging structural genomics

Contributions: J. Everett performed clustering of UBLs into model families. A. Semesi, M. Garcia

& A. Yee assisted with cloning, small scale sample preparation & small scale expression/solubility

screening. J. Lukin, C. Fares, M. Karra, S. Srisalam, S. Houliston assisted with NMR data

acquisition and NMR titration. I performed large scale NMR sample preparation and NMR

screening, as well as remaining experiments and analyses under the guidance of CH. Arrowsmith.

2.1 Summary

Structural genomics brings together information about not just the protein for which a structure is

obtained, but also sequentially similar homologues and even distantly related fold family

members. For this thesis, structural genomics provided the tools for gaining insight into the

diversity of the ubiquitin-like domain family. Bioinformatics and computational techniques were

leveraged to expand the set of known human ubiquitin-like domain containing genes, prioritize

subsets of human ubiquitin-like domain containing genes based on their structure’s role in domain

family structure coverage, and assist in construct design for structure determination. We used

nuclear magnetic resonance (NMR) spectroscopy to screen human UBLs for structure

determination, and subsequently determined the structures of 17 human UBLs using X-ray

Crystallography and NMR spectroscopy. As a result, the RCSB PDB now has 32% structural

coverage of human UBLs, and 82% structural coverage when taking into account homology

models of UBL domains that have at least 30% sequence identity over the entire length of the full

domain. Of the remaining 74 human UBLs that lack structural information, 30 are singletons and

are on average 36% similar & 23% identical to the most similar regions of protein structures within

the PDB. The UBLs structurally characterized for this project facilitate 3.7% structural coverage

of all human UBLs. When taking into account UBL homology models, the structural coverage is

6%. Structural analyses have also provided insight into families of related proteins. In particular,

structural analysis of the NFATc2IP and ubiquilin protein families revealed insight into protein-

protein interactions and facilitated the prediction of novel binding partners.

2.2 Introduction

One goal of structural genomics is to provide a high throughput framework for generating accurate

molecular structure representations of at least one member of large groups of protein domain

families. The molecular structure itself provides insight into functional attributes shared among

protein domain family members, functional variability within the protein domain family, as well as

structural templates for ligand docking studies, homology modeling, and molecular replacement

methods for solving X-ray crystal structures.

Two structural genomics groups that have made significant contributions to the PDB are the

NorthEast Structural Genomics Consortium (NESG) and the Structural Genomics Consortium

(SGC). In 2000, the Protein Structure Initiative was established to provide funding and direction

to 9 structural genomics centres. The NESG uses both NMR & X-ray crystallography for

elucidating the structures of eukaryotic proteins related to cancer biology, protein-protein

interaction networks, specific biochemical pathways, or implicated in specific human diseases.

The SGC is a public/private initiative that focuses on medically significant proteins related to

human health. From 2003 until Jan 2014, the NESG determined 1174 protein structures (516 by

NMR & 658 by X-ray crystallography), and from 2004 to Jan 2014 the SGC determined 1232

protein structures (28 by NMR & 1204 by X-ray crystallography). These initiatives implement a

similar parallel high-throughput structural genomics framework that focuses on structurally

characterizing a large number of protein targets from gene to structure.

Structural genomics efforts have had a significant impact on scientific innovations related to the

biological sciences and human health. In addition to the wealth of knowledge generated through

these efforts, structural genomics facilitates: methods development and optimization, improved

datasets related to known and potential drug target proteins for drug discovery programs, and

increased availability of purified proteins for reagent development (Weigelt, 2010).

This thesis leverages the strengths of structural genomics experimental methods to explore the

significance of structural variation within the ubiquitin-like domain family. The ubiquitin-like

domain family was chosen because of the large number of medically-significant members of the

family, the large number of uncharacterized ubiquitin-like domain containing genes, the stable

and soluble nature of ubiquitin, and the scientifically interesting questions surrounding the

ubiquitylation system that include the unknown role that UBLs play.

There remains a significant gap in understanding the role of UBLs, as well as the breadth of

cellular and molecular activity of the full length proteins that contain UBLs. There is also a gap in

knowledge related to the size of the ubiquitin-like domain fold-space. In 2005, 73 genes were

formally annotated as containing UBLs. By 2012, the list of formally annotated ubiquitin-like

domain containing genes expanded to 152 genes. By 2014, the list of formally annotated

ubiquitin-like domain containing genes expanded to 191 genes and 325 isoforms (Marchler-Bauer

et al., 2013). The expanded set of formally annotated ubiquitin-like domain containing genes

remains substantially smaller than the number of genes that were determined using a PSI-BLAST

batch approach for this thesis project. This gap in breadth presents a gap in knowledge of the full

extent of the ubiquitin-like domain family and its diversity.

This thesis explores these gaps to provide insight into, and a possible explanation for, the breadth

and diversity of the ubiquitin-like domain family, while demonstrating its significance through

molecular structure analysis. The first objective of the project was to identify all UBLs within the

human genome. Once all UBLs were identified, a strategy was developed to work towards

complete structural coverage of the ubiquitin-like domain family. Combining molecular biology

and structural biology techniques, along with knowledge of the molecular structure of each human

ubiquitin-like domain would provide insight into the various biochemical functions of UBLs and the

significance of variations between domains. The second objective, addressed in this chapter, was to

leverage bioinformatics, molecular biology and structural biology techniques to screen UBLs

for structure determination by NMR and prioritize constructs to facilitate greater family coverage

with each newly solved structure.

2.3 Methods

2.3.1 Identifying human ubiquitin-like domains

An initial list of all identifiable human UBLs was compiled based on gene/domain annotation within

UniProtKB (UniProt Consortium, 2014), Human Protein Atlas (Uhlen et al., 2010), the Human

Protein Reference Database (Prasad et al., 2009), and the NCBI’s Conserved Domain Database

(consisting of SMART, Pfam, COGs, TIGRFAM, and PRK) (Marchler-Bauer et al., 2013). The

resulting list of 73 human UBLs was expanded to 645 distantly related human UBLs by performing

a batch of independent DELTA-BLAST sequence similarity searches of GenBank and UniProt

using each member of the initial list of human ubiquitin-like domains. DELTA-BLAST is a modified

version of BLAST that uses RPS-BLAST to search for conserved domains from which a position-

specific scoring matrix (PSSM) is generated and used to search the sequence databases (Benson

et al., 2013; Boratyn et al., 2012).

Figure 2.1: Novel UBL discovery process. Unannotated UBLs were discovered through a series of DELTA-BLAST

searches of the NCBI GenBank and UniProt human protein databases. The predicted secondary structure elements of each putative UBL were analyzed to confirm whether it was a legitimate UBL, and legitimate UBLs were also used as input sequences for subsequent DELTA-BLAST searches.
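
The discovery loop described for Figure 2.1 can be sketched as a worklist search in which every validated hit becomes a new query. This is only an illustration of the control flow: `search` and `is_valid` are hypothetical callables standing in for a DELTA-BLAST run and the secondary structure check, and the identifiers in the toy data are invented.

```python
from collections import deque
from typing import Callable, Iterable, Set


def expand_domain_set(
    seeds: Iterable[str],
    search: Callable[[str], Iterable[str]],
    is_valid: Callable[[str], bool],
) -> Set[str]:
    """Grow a set of domains by re-querying with every validated hit."""
    accepted = set(seeds)   # annotated domains are accepted from the start
    seen = set(accepted)    # every identifier already examined
    queue = deque(accepted)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit in seen:
                continue
            seen.add(hit)
            if is_valid(hit):
                accepted.add(hit)
                queue.append(hit)  # a legitimate hit becomes a new query
    return accepted


# Toy stand-ins (hypothetical data, not real search results).
toy_hits = {
    "UBL_A": ["UBL_B", "X1"],
    "UBL_B": ["UBL_C"],
    "UBL_C": ["UBL_A"],
    "X1": ["UBL_D"],
}
found = expand_domain_set(
    seeds=["UBL_A"],
    search=lambda query: toy_hits.get(query, []),
    is_valid=lambda name: name.startswith("UBL"),
)
print(sorted(found))
```

In this toy run the rejected hit `X1` is never used as a query, so anything reachable only through it stays out of the final set, mirroring the role of the validation step.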

2.3.2 Validating putative human ubiquitin-like domains

Figure 2.2: Secondary & tertiary structures of Human ubiquilin-1. Secondary structure elements of the human

ubiquitin-like domain containing protein ubiquilin-1 (UBQL1_HUMAN; sp|Q9UMX0).

Ubiquitin-like domains have a characteristic secondary structure consisting of 5 β-strands and 3

α-helical regions (Figure 2.2). Secondary structure elements were predicted using JPRED and

PSIPRED webservers for each full length protein that contains at least one of the 645 UBLs. A

sequence similarity search of the PDB was also performed using each full length ubiquitin-like

domain containing protein to determine whether any protein structures were deposited with a

similar amino acid sequence. A pseudo-multiple sequence alignment was generated for each

ubiquitin-like domain, bringing together information about the full length protein sequence,

predicted secondary structure elements, and similar proteins deposited in the RCSB PDB (Figure

2.3).
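
As an illustration of how a predicted secondary structure string can be reduced to an ordered list of elements for comparison with the expected ubiquitin-like pattern, here is a minimal sketch. It assumes a JPred-style string of `E` (strand), `H` (helix) and `-` (other) characters; the example string and the minimum segment length are hypothetical choices, not values taken from the thesis.

```python
from itertools import groupby
from typing import List, Tuple


def segments(prediction: str) -> List[Tuple[str, int, int]]:
    """Return (state, start_index, length) for each run of E or H."""
    found = []
    position = 0
    for state, run in groupby(prediction):
        length = len(list(run))
        if state in ("E", "H"):
            found.append((state, position, length))
        position += length
    return found


def topology(prediction: str, min_length: int = 3) -> str:
    """Order of strands and helices, ignoring runs shorter than min_length."""
    return "".join(
        state for state, _, length in segments(prediction) if length >= min_length
    )


# Hypothetical prediction string, for illustration only.
example = "--EEEEE---EEEEE---HHHHHHHHH---EEEE--EE--HHH---EEEEE--"
print(segments(example)[0])
print(topology(example))
print(topology(example).count("E"), "strands,", topology(example).count("H"), "helices")
```

A putative domain whose element order departs strongly from that of ubiquitin or SUMO1 could then be flagged for manual inspection, which is the kind of check the alignment in Figure 2.3 supports visually.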

1---------11--------21--------31--------41--------51--------61--------71--------81--------91--------101-------111-------121-------131-------141-------151-------

OrigSeq :MAESGESGGPPGSQDSAAGAEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQTNTAGSNVTTSSTPNSNSTSGSATSNPFGLGGLGGLAGLSS

Jnet :-----------------------------------EEEEEEEE----EEEEE----HHHHHHHHHHHH-------EEEEE---------HHH--------EEEEEEE-----------------------------------------------------

Jhmm :-----------------------------------EEEEEEEE----EEEEEE----HHHHHHHHHH-------EEEEEE--EE----HHHHH-------EEEEEEE-----------------------------------------------------

Jpssm :-----------------------------------EEEEEEEE---EEEEEE----HHHHHHHHHHHHH------HEEH---------------------EEEEEEE-----------------------------------------------------

Jnet_25 :---------------BB-B----------------BB-BBBBB------B-B---B-B--BB--BB----B-B-BBBBBB-B-BB-----B--B-B---BBBBBBBBB--------B--B------BBB---------------BBBBBBBBBBBBBBB-

Jnet_5 :------------------------------------B-B-B----------------B--B---B----------B-BBB----------B----------BBBB----------------------------------------B---B----B--B--

Jnet_0 :-----------------------------------------------------------------------------BB----------------------B-B--------------------------------------------------------

Jnet Rel :9988877777777777777777777777777777606899871686078884077508999999998003787500000046006676000004467875488987436777777777777777777777777777777777777777777777777777

UBIQUITIN_HUMAN-JPRED -EEEEEE----EEEEEE-----HHHHHHHHHHH-------EEEEE--------------------EEEEEEE---- : Jnet

UBIQUITIN_HUMAN -EEEEEE----EEEEEE-----HHHHHHHHHHH-------EEEEE---EE-----HHHH------EEEEEEE---- : 1Q0W

SUMO1_HUMAN-JPRED ----------------------EEEEEEEE----EEEEEE----HHHHHHHHHHHHH-----EEEEEE--------------------EEEEEEEE------- : Jnet

SUMO1_HUMAN ----------------------EEEEEEEE---EEEEEEEE-----HHHHHHHHHHH-----EEEEE--------------------EEEEEEE--------- : 1A5R

161-------171-------181-------191-------201-------211-------221-------231-------241-------251-------261-------271-------281-------291-------301-------311-------

OrigSeq :LGLNTTNFSELQSQMQRQLLSNPEMMVQIMENPFVQSMLSNPDLMRQLIMANPQMQQLIQRNPEISHMLNNPDIMRQTLELARNPAMMQEMMRNQDRALSNLESIPGGYNALRRMYTDIQEPMLSAAQEQFGGNPFASLVSNTSSGEGSQPSRTENRDPL

Jnet :------------HHHH------HHHHHHHH--HHHHH----HHHHHHHH---HHHHHHHH-------------HHHHHHHHHH-HHHHHHHHHHHHHHHHH-------HHHHHHHHHHHHHHHHHHHH--------------------------------

Jhmm :------------HHHH-------HHHHHH---HHHH------HHHHHH----HHHHHHHH-------------HHHHHHHHHHHHHHHHHHHH--HHHHH--------HHHHHHHHHHHHHHHHHHHH--------------------------------

Jpssm :------------HHHHH-----HHHHHHHH--HHHHH----HHHHHHHHH--HHHHHHHH-------------HHHHHHHHH--HHHHHHHHHHHHHHHHH-------HHHHHHHHHHHHHHHHHHH---------------------------------

Jnet_25 :BBB----B--B---BB--B--BB-BBB-BB---BBB-BB--B-BB--BB--B--B--BB--BB-B---B----BB--BB-BB--B-BB--BB-----BB--B-BB-BB--BB--BB--B---BB-BB------BBBB-B--B------------B---BB

Jnet_5 :-------B-----------------B--------B--B------B-----------------------B-----B--B---B-------------------B-----B---B--B---B---B--B---------------------------------B

Jnet_0 :----------------------------------------------------------------------------------------------------------------------------------------------------------------

Jnet Rel :7777777776523453047874089999802356460477508999990055589998841413434677621789999984006899997470099987037887636899986899999999863056777777665667777777777777777777

321-------331-------341-------351-------361-------371-------381-------391-------401-------411-------421-------431-------441-------451-------461-------471-------

OrigSeq :PNPWAPQTSQSSSASSGTASTVGGTTGSTASGTSGQSTTAPNLVPGVGASMFNTPGMQSLLQQITENPQLMQNMLSAPYMRSMMQSLSQNPDLAAQMMLNNPLFAGNPQLQEQMRQQLPTFLQQMQNPDTLSAMSNPRAMQALLQIQQGLQTLATEAPGL

Jnet :--------------------------------------------------------------------HHHH-----HHHHHHHHHHH--HHHHHHHHH--------HHHHHHHHHHHHHHHHHH--HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH--

Jhmm :-----------------------------------------------------------------------------HHHHHHHHHHH--HHHHHHHHH--------HHHHHHHHHHHHHHHHHH--HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Jpssm :----------------------EEE--------------------------------HH--------HHHHHH----HHHHHHHHHH---HHHHHHHHH--HHH---HHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH---

Jnet_25 :-BB------------B-----B--B------B--B--B-B-BBBBBBBBBBB-BBBB--BB--B--B---B--B--BBBBB-BB--BB-BB-BB--BB-BBBB---B--BB--B---BB-BB-BB----BB-BB-B--BB-BBB-BB-BB--B---BB-B

Jnet_5 :--B----------------------------------------B---B--B-----B-------------B--------B--BB---------B---B--BB-------B---B------B---B--------B----BB-BB---------B---BB-B

Jnet_0 :----------------------------------------------------------------------------------------------------------------------------------------------------------------

Jnet Rel :7777777777777777777774000267777777777777777777777777765410012577753000000067658999999860663589998614500005468999999748999873076589999868999999999999999987541000

481-------491-------501-------511-------521-------531-------541-------551-------561-------571-------581------ :

OrigSeq :IPGFTPGLGALGSTGGSSGTNGSNATPSENTSPTAGTTEPGHQQFIQQMLQALAGVNPQLQNPEVRFQQQLEQLSAMGFLNREANLQALIATGGDINAAIERLLGSQPS : OrigSeq

Jnet :------------------------------------------HHHHHHHH-------------HHHHHHHHHHHHH-----HHHHHHHHHH----HHHHHHHHH----- : Jnet

Jhmm :------------------------------------------HHHHHHH--------------HHHHHHHHHHHHH-----HHHHHHHHHH----HHHHHHHH------ : jhmm

Jpssm :------------------------------------------HHHHHHHHH-----------HHHHHHHHHHHHHH-----HHHHHHHHHH----HHHHHHHHH----- : jpssm

Jnet_25 :BBBBBBBBBBB--BBB---------B----B------B--BBBBBBBBBB-BBB-----B--B--BB--BB--B--BBB-B--BBB-BB-BB---BBBBB--B------ : Jnet_25

Jnet_5 :B--B-------------------------------------BB-BBB-B--B--------------B------B----------BB-BB------B-BBB--B------ : Jnet_5

Jnet_0 :----------------------------------------------------------------------------------------B-------------------- : Jnet_0

Jnet Rel :5677777777777777777777777777777777777777641788887004677877776507999999999986068866899999974588668999988026899 : Jnet Rel

Figure 2.3: Pseudo-multiple sequence alignment of human ubiquilin-1. Full length protein sequence and predicted secondary structure elements of the human

ubiquitin-like domain containing protein ubiquilin-1 (UBQL1_HUMAN; sp|Q9UMX0). The secondary structure elements for human ubiquitin & human SUMO1, as well as the predicted secondary structure elements for human ubiquitin & human SUMO1 are aligned with the ubiquitin-like domain of ubiquilin-1. Secondary structure elements were predicted using Jpred3.

Figure 2.4: UBL target selection, preparation and screening process. Legitimate UBLs were grouped into modeling

families, from which target UBLs were selected. For each target ubiquitin-like domain, constructs were designed with varying domain boundaries and protein samples were prepared using a parallel high-throughput batch approach. NMR screening was performed on ubiquitin-like domain samples that had sufficient expression and concentration. Ubiquitin-like domain samples with adequate 1H15N-HSQC spectra were re-expressed as 15N13C-labelled protein for full structure determination.

2.3.3 Target selection

A sequence similarity analysis was performed to group related UBLs. Modelling families were

generated that consist of subsets of UBLs in which the structure determination of one member of

the modelling family would facilitate a reliable structure prediction of all other members of the

modelling family using homology modelling techniques (Nair et al., 2009). This shortened the full

list of all UBLs to 76 ubiquitin-like domain targets after removing proteins whose structures have

already been deposited in the PDB, those that lack a homologue of sufficient sequence similarity,

and those for which DNA templates were not available. These UBLs were targeted for NMR

structure determination as described below.
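
The grouping idea can be illustrated with a minimal sketch. It is not the clustering procedure used for this work: it assumes the sequences are already aligned to equal length, uses a simple greedy assignment to the first compatible family representative, and borrows the 30% identity threshold mentioned in the chapter summary. The sequences in the example are invented.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def greedy_families(aligned: Dict[str, str], threshold: float = 30.0) -> List[List[str]]:
    """Assign each sequence to the first family whose representative is similar enough."""
    families: List[List[str]] = []  # first member of each list is the representative
    for name, sequence in aligned.items():
        for family in families:
            if percent_identity(aligned[family[0]], sequence) >= threshold:
                family.append(name)
                break
        else:
            families.append([name])  # no compatible family: start a new one
    return families


# Hypothetical aligned fragments, for illustration only.
toy_alignment = {
    "a": "MKVTVKTPKE",
    "b": "MKVTVKTLKD",
    "c": "GSWQAALHRC",
}
print(greedy_families(toy_alignment))
```

With this scheme, solving the structure of one representative per family would in principle provide a template for the remaining members, and a family with a single member corresponds to a singleton.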

2.3.4 Construct design

Multiple constructs were designed for each of the 76 UBLs to facilitate screening of solubility, yield

and NMR spectrum. The ubiquitin-like domain boundaries were defined using a pseudo-multiple

sequence alignment that contained sequence annotation, predicted secondary structure,

disordered regions, and all sequentially similar structurally characterized proteins within the PDB.

To facilitate protein purification using Ni2+ affinity chromatography, all constructs were generated

with a fused N-terminal poly-histidine tag. When necessary, constructs were redesigned based

on trends in small scale and NMR screening results.

          21--------31--------41--------51--------61--------71--------81--------91--------101-------111-------

OrigSeq : AEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQ

Jnet : ----------------EEEEEEEE----EEEEE----HHHHHHHHHHHH-------EEEEE---------HHH--------EEEEEEE------------

Jhmm : ----------------EEEEEEEE----EEEEEE----HHHHHHHHHH-------EEEEEE--EE----HHHHH-------EEEEEEE------------

Jpssm : ----------------EEEEEEEE---EEEEEE----HHHHHHHHHHHHH------HEEH---------------------EEEEEEE------------

Jnet_25 : ----------------BB-BBBBB------B-B---B-B--BB--BB----B-B-BBBBBB-B-BB-----B--B-B---BBBBBBBBB--------B--

Jnet_5 : -----------------B-B-B----------------B--B---B----------B-BBB----------B----------BBBB--------------

Jnet_0 : ----------------------------------------------------------BB----------------------B-B---------------

Jnet Rel : 7777777777777776068998716860788840775089999999980037875000000460066760000044678754889874367777777777

PSIPRED : cccccccccccccccccEEEEEEcccccEEEEEcccccHHHHHHHHHHHHccccccEEEEEccEEcccccHHHHcccccccEEEEEEEcccccccccccc

UBIQUITIN_HUMAN-JPRED -EEEEEE----EEEEEE-----HHHHHHHHHHH-------EEEEE--------------------EEEEEEE---- : Jnet

UBIQUITIN_HUMAN -EEEEEE----EEEEEE-----HHHHHHHHHHH-------EEEEE---EE-----HHHH------EEEEEEE---- : 1Q0W

SUMO1_HUMAN-JPRED --------EEEEEEEE----EEEEEE----HHHHHHHHHHHHH-----EEEEEE--------------------EEEEEEEE------- : Jnet

SUMO1_HUMAN --------EEEEEEEE---EEEEEEEE-----HHHHHHHHHHH-----EEEEE--------------------EEEEEEE--------- : 1A5R

OrigSeq : AEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQ

1J8C:A EPKI+KVTVKTPKEKEEFAVPENSSVQQFKE_ISKRFKS_TDQLVLIFAGKILKDQDTL_QHGIHDGLTVHLVIK (ID:95% SIM:96%)

1YQB:A __P_++KVTVKTPK+KE+F+V_+__++QQ_KEEIS+RFK+H_DQLVLIFAGKILKD_D+L+Q_G+_DGLTVHLVIK_Q+R (ID:68% SIM:85%)

1WX7:A A___+P_++KVTVKTPK+KE+F+V_+__++QQ_KEEIS+RFK+H_DQLVLIFAGKILKD_D+L+Q_G+_DGLTVHLVIK_Q+R (ID:66% SIM:84%)

2BWE:S +_+_+K+_++K_E__V___S+V_QFKE_I+K__________LI++GKILKD__T+__+_I_DG_+VHLV (ID:41% SIM:59%)

1YX5-B M++_VKT___K_____V__+_+++__K_+I__+_____DQ__LIFAGK_L+D__TLS_+_I____T+HLV++ (ID:36% SIM:54% GAP:1%)

Domain Boundaries:

Construct1 PKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRP

Construct2 PKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQD

Construct3 PKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKT

Construct4 SAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRP

Construct5 SAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQD

Construct6 SAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKT

Construct7 MKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRP

Construct8 MKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQD

Construct9 MKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKT

Figure 2.5: Pseudo-multiple sequence alignment of ubiquilin-1 for construct design. Pseudo-multiple sequence

alignment of ubiquilin-1 showing residues 20-119 of the full length protein sequence corresponding to the ubiquitin-like domain region, as well as predicted secondary structure elements, similar proteins deposited in the RCSB PDB, and constructs with predicted ubiquitin-like domain boundaries.
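
The nine constructs in Figure 2.5 correspond to every pairing of three N-terminal and three C-terminal boundaries. A minimal sketch of that enumeration follows, using the ubiquilin-1 region shown in the figure. Locating the boundaries by short sequence motifs is a convenience chosen for this example and is not a description of how the boundaries were originally specified; the poly-histidine tag is omitted.

```python
from itertools import product
from typing import Dict, List, Tuple

# Ubiquilin-1 region shown in Figure 2.5 (OrigSeq line).
UBL_REGION = "AEGAGAPAAAASAEPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDTLSQHGIHDGLTVHLVIKTQNRPQDHSAQQ"

# Boundary motifs read off the constructs in the figure (illustrative choice).
N_TERMINAL_STARTS = ["PKIM", "SAEPKIM", "MKVT"]
C_TERMINAL_ENDS = ["VIKTQNRP", "VIKTQNRPQD", "VIKT"]


def design_constructs(
    region: str, starts: List[str], ends: List[str]
) -> Dict[Tuple[str, str], str]:
    """Return one construct per (start motif, end motif) combination."""
    constructs = {}
    for start_motif, end_motif in product(starts, ends):
        begin = region.find(start_motif)
        end_at = region.find(end_motif)
        if begin < 0 or end_at < 0:
            raise ValueError(f"motif not found: {start_motif} / {end_motif}")
        # Slice from the first residue of the start motif to the last of the end motif.
        constructs[(start_motif, end_motif)] = region[begin:end_at + len(end_motif)]
    return constructs


constructs = design_constructs(UBL_REGION, N_TERMINAL_STARTS, C_TERMINAL_ENDS)
for number, (boundaries, sequence) in enumerate(constructs.items(), start=1):
    print(f"Construct{number} {boundaries} {len(sequence)} aa")
```

Because the outer loop runs over N-terminal boundaries and the inner loop over C-terminal boundaries, the constructs are produced in the same order as they are listed in the figure.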

2.3.5 Sample preparation

Small scale expression and purification of each construct was performed to determine sample

solubility and yield. For each target, samples with the best yield were regrown for NMR screening.

15N-labelled samples were expressed in E. coli, grown in batches of 12 x 0.5 L using modified M9

minimal media containing 15NH4Cl as the sole nitrogen source supplemented with kanamycin at

37 °C until an OD600 of 1.0 was reached. Protein expression was induced with isopropyl-1-thio-β-D-

galactopyranoside (IPTG) and the cells were incubated for 12-18 hours at 15 °C. The cells were

lysed by sonication, and the cell debris was clarified by centrifugation. The poly-histidine tagged

UBLs were purified by modified batch/column Ni2+-affinity chromatography (Qiagen) in batches of

6-12 samples, and eluted to a final volume of 5 mL. Each sample was exchanged from elution

buffer into an NMR buffer using centrifugal concentrators. The standard NMR buffer was

MOPS-based; however, other buffers were used depending on sample pH, solubility and

the resolution of the NMR signal. The samples were concentrated to ~500 µL

for 5 mm NMR tubes, ~200 µL for 3 mm NMR tubes, or ~40 µL for 1 mm NMR

microprobe tubes. The volume and NMR tube selection depended on the amount of sample

available and the sample concentration needed for adequate NMR signal (Yee et

al., 2014).
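
The tube choice described above can be sketched as a simple lookup. This is a hypothetical rule that uses only the approximate volumes quoted in the text. In practice the choice also depended on the sample concentration required, which is not modelled here.

```python
# Approximate fill volumes quoted in the text (µL), keyed by tube diameter (mm).
TUBE_VOLUMES_UL = {5: 500, 3: 200, 1: 40}


def choose_tube(available_volume_ul):
    """Pick the largest tube whose nominal volume the sample can fill, else None."""
    for diameter_mm in sorted(TUBE_VOLUMES_UL, reverse=True):
        if available_volume_ul >= TUBE_VOLUMES_UL[diameter_mm]:
            return diameter_mm
    return None


for volume in (600, 250, 45, 10):
    print(volume, "µL ->", choose_tube(volume))
```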

2.3.6 1H15N-HSQC screening of ubiquitin-like domains

An 1H15N-HSQC spectrum was generated for each sample using a Bruker 800MHz AVANCE

spectrometer, or a Bruker 500MHz or Bruker 600MHz AVANCE spectrometer equipped with

automated sample changers. Samples were ranked based on peak intensity, dispersion and

percentage of total residues observed in each 1H15N-HSQC spectrum (Yee et al., 2002). For

samples with inadequate 1H15N-HSQC spectra, new constructs were designed to improve domain

boundaries and/or NMR buffer conditions were optimized in an attempt to improve solubility.


2.4.1 Identifying unannotated human ubiquitin-like domains

The human genome contains 220 genes that encode proteins with UBLs, of which 147 were not

annotated as having the ubiquitin-fold at the time of analysis (Appendix I). These proteins contain

645 distantly related human UBLs that include those within isoforms produced by alternative

splicing. By eliminating identical sequences within isoforms, the pool of 645 putative UBLs can

be reduced to 398 unique UBL sequences. The goal of this project has been to obtain structural

coverage of all UBLs, without experimentally determining each of the 398 unique UBLs. To

accomplish this, the UBLs were grouped into 100 modelling families. Modelling families represent

groups of homologous protein domains that have similar structures, for which the experimental

structure of one of the members of the modelling family provides “modelling leverage” to facilitate

computational determination of protein structures for the remaining members of the modelling family

through the use of homology, or comparative, modelling methods (Arnold et al., 2006; Kiefer et

al., 2009; Peitsch, 1995; Pieper et al., 2011). Some studies have shown that sequence similarity

of >40% over >50 residues can provide models with heavy atom RMSD of <2.5 Å from the

experimental structure (Bhattacharya et al., 2008; Koh et al., 2003; Marti-Renom et al., 2000;

Marti-Renom et al., 2003). Modelling families are typically defined by such sequence similarity

and sequence coverage parameters, but the parameters used for homology model generation for

this thesis were modified to >20% over 90% because all of the domains are from the same

organism, all of the domains are 70-120 aa in length, and there is a high level

of secondary structure element conservation shared among UBLs. Of the 100 modelling families,

there are 5 singletons (OASL_HUMAN, PARK2_HUMAN, IKKB_HUMAN, UBL7_HUMAN &

P3C2B_HUMAN), which correspond to modelling families that contain only one UBL.
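
As a rough illustration of how domains could be grouped into modelling families from pairwise comparisons, the sketch below links two domains whenever a hit exceeds the similarity and coverage cut-offs quoted above. The single-linkage rule, the function name and the toy input are assumptions made for illustration. The thesis does not specify the clustering algorithm.

```python
def group_into_families(domains, hits, min_similarity=20.0, min_coverage=90.0):
    """Single-linkage grouping: join two domains when a hit passes both cut-offs."""
    parent = {d: d for d in domains}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path compression
            d = parent[d]
        return d

    for a, b, similarity, coverage in hits:
        if similarity > min_similarity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    families = {}
    for d in domains:
        families.setdefault(find(d), []).append(d)
    return sorted(sorted(members) for members in families.values())


# Toy input: domain names and percentages are invented for illustration only.
domains = ["UBL_A", "UBL_B", "UBL_C", "UBL_D"]
hits = [
    ("UBL_A", "UBL_B", 68.0, 95.0),  # passes both cut-offs
    ("UBL_B", "UBL_C", 36.0, 92.0),  # passes both cut-offs
    ("UBL_C", "UBL_D", 41.0, 60.0),  # fails the coverage cut-off
]
families = group_into_families(domains, hits)
print(families)
```

In this toy case UBL_D ends up alone, which corresponds to what the text calls a singleton family.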

The 398 unique UBLs were subdivided into three classes: 128 UBLs with experimental structures

deposited in the PDB, 196 UBLs with hypothetical structures generated by homology modelling,

and 74 distantly related UBLs that cannot be reliably homology modeled and therefore, have no

protein structure information (Figure 2.6).

Figure 2.6: Distribution of structurally characterized and uncharacterized UBLs. There are 398 unique UBLs, of

which 128 molecular structures have been characterized by X-ray crystallography or NMR spectroscopy and 196 molecular structures can be modelled using homology modelling techniques. The remaining 74 UBLs are too distantly related from structurally characterized proteins.
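
The class sizes above translate into coverage fractions by simple arithmetic. The short Python check below uses only the counts stated in the text; the dictionary keys are illustrative labels.

```python
# Counts taken from the text; keys are illustrative labels.
counts = {"experimental": 128, "homology_model": 196, "no_structure": 74}
total = sum(counts.values())

for label, n in counts.items():
    print(f"{label}: {n}/{total} = {100 * n / total:.1f}%")

covered = counts["experimental"] + counts["homology_model"]
print(f"experimental + modelled: {covered}/{total} = {100 * covered / total:.1f}%")
```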

2.4.2 Small-Scale Screening

The complete list of 645 human UBLs (corresponding to 398 unique UBLs) was reduced to 76

UBLs to be pursued for structure determination after removing domains that were structurally

characterized, domains that shared high sequence similarity, and domains for which reagents

were not readily available. Between 9 and 12 UBL constructs were initially designed for each of

the 76 target proteins, and additional constructs were redesigned after taking into account the

results of small-scale expression and solubility screening. In total, 680 constructs were cloned,

resulting in 205 ubiquitin-like domain constructs with adequate expression and solubility for large-

scale 1H15N-HSQC screening (Table 2.1).

Table 2.1: Summary of the small-scale expression screening of human UBLs that were structurally characterized and

deposited in the PDB as part of this thesis.

Gene Name

Expression Solubility

5 4 3 2 1 0 5 4 3 2 1 0

BRAF 17 15 2 1 3 25 1 6 2 4

FUBI 2 1 2 1 1

ISG15 16 6 2 1 4 15 6 1 2 1 4

HERPUD2 1 1

NFATc2IPN 1 1 2

NFATc2IPC 1 1

OTU1 1 1

PLXNC1 1 1 1 1

Ubiquilin-1 2 1 1

USP7 1 3 3 4 3

2.4.3 Screening by 1H15N-HSQC

NMR spectroscopy was used for screening protein constructs because samples amenable for

structure determination can be identified within minutes to hours of the protein being purified.

Protein constructs were expressed as poly-histidine-tagged 15N-labeled proteins, and purified

using a rapid batch purification protocol (Yee et al., 2002). 1H15N-HSQC spectra were classified

as poor, promising, good or excellent based on the number of peaks visible, the peaks:residues

ratio, and the signal:noise ratio. Poor 1H15N-HSQC spectra have no visible peaks or all peaks are

overlapping due to the sample being an unfolded protein. Promising 1H15N-HSQC spectra may

consist of a partially folded protein that contains fewer than expected peaks, or inadequate peak

intensity. Good 1H15N-HSQC spectra show clear dispersion of peaks of equal intensity, an

equivalent number of peaks as amino acids, and adequate peak intensity for structure

determination. Excellent 1H15N-HSQC spectra are similar to the “Good” 1H15N-HSQC with

stronger peak intensity that would facilitate a shorter data collection period (Yee et al., 2002).
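
The four-level classification can be expressed as a small decision function. The sketch below is hypothetical: the text gives only qualitative criteria, so every numeric cut-off (peaks:residues ratio and signal:noise thresholds) is a placeholder chosen for illustration, and `dispersed` stands in for a visual judgement of peak dispersion.

```python
def classify_hsqc(n_peaks, n_residues, signal_to_noise, dispersed,
                  min_ratio=0.9, min_snr=10.0, excellent_snr=30.0):
    """Assign poor/promising/good/excellent; all numeric cut-offs are placeholders."""
    # Poor: no visible peaks, or all peaks overlapping (unfolded protein).
    if n_peaks == 0 or not dispersed:
        return "poor"
    # Promising: fewer peaks than expected or inadequate intensity.
    if n_peaks / n_residues < min_ratio or signal_to_noise < min_snr:
        return "promising"
    # Excellent: as good, but with stronger peak intensity.
    if signal_to_noise >= excellent_snr:
        return "excellent"
    return "good"


print(classify_hsqc(n_peaks=78, n_residues=80, signal_to_noise=15.0, dispersed=True))
```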

Figure 2.7: Examples of 1H15N-HSQC screening results for human UBLs. Sharpin 30 aa-154 aa resulted in an

HSQC classified as poor, FAT10 6 aa-165 aa resulted in an HSQC classified as promising, and FAT10 6 aa-89 aa resulted in an HSQC classified as good.

Table 2.2: Summary of 1H15N-HSQC screening results for human UBLs. 10 UBLs were solved by NMR (red), and 7 UBLs were solved by X-ray crystallography (blue).

Gene Name 1H15N-HSQC quality PDB

BRAF promising-2 good-4 2L05 3NY5

FUBI good-4 2L7R

ISG15 2HJ8

HERPUD2 good-1 2KDB

MAP1ALC3 3ECI

NFATc2IP good-2 2L76

NFATc2IP good-1 2JXX

OTU1 good-1 2KZR

PLXNC1 3KUZ

RNF2/RING1B 3H8H

SF3A1 1ZKH

Ubiquilin-1 good-1 2KLC

Ubiquilin-3 1YQB

UHRF1 2FAZ

USP7 poor-1 good-1 2KVR

USP15 3PPA

2.4.4 Structural Coverage - Completing the UBL Phylogenetic Tree

In 2005, there were 73 formally annotated UBLs, which has since grown to 191 formally annotated

ubiquitin-like domain-containing genes and 325 ubiquitin-like domain-containing isoforms

(Marchler et al., 2013). This increase in annotated domains was almost certainly due, at least in

part, to the new structures of UBLs deposited in the PDB as part of this thesis: BRAF-1/-2

(PDB_ID: 2L05.A & PDB_ID: 3NY5.ABCD), FAU_1-1 (PDB_ID: 2L7R.A), HERPUD2_1-1

(PDB_ID: 2KDB.A), ISG15_1-2 (PDB_ID: 2HJ8.A), MAP1LC3A_1-1 (PDB_ID: 3ECI.AB),

NFATc2IP_1-1 (PDB_ID: 2L76.A), NFATc2IP_1-2 (PDB_ID: 2JXX.A), PLXNC1_1-2 (PDB_ID:

3KUZ.AB), RING1_2-1/-2 & RING1_2-2 (PDB_ID: 3H8H.A), SF3A1_1-1 (PDB_ID: 1ZKH.A),

UBQLN1_1-1 (PDB_ID: 2KLC.A), UBQLN3_1-1 (PDB_ID: 1YQB.A), UHRF1_1-1 (PDB_ID:

2FAZ.AB), USP15_1-1/-2/-3 & USP15_1-2 (PDB_ID: 3PPA.A) and USP7_1-3 (PDB_ID: 2KVR.A) (Table

2.2 & Figure 2.8).

Figure 2.8: Clustering of human UBLs into groups based on sequence similarity. Phylogenetic tree of all human

UBLs displaying sub-clustering into 5 groups based on UBL domain sequence similarity. UBLs structurally characterized for this project are labelled in blue alongside corresponding groups and PDB identifiers. Ubiquitin-like

modifiers and 3 putative ubiquitin-like modifiers structurally characterized for this project are underlined.

Nevertheless, our research has identified 398 unique UBLs in 220 human genes. When taking

into account isoforms and identical UBLs, there are 645 ubiquitin-like human protein domains. A

number of UBLs have low percent sequence identity, yet continue to share secondary structure

elements characteristic of the β-grasp fold found in ubiquitin and UBLs. Our approach of

combining a BLAST sequence similarity search of human proteins followed by secondary

structure predictions and subsequent BLAST sequence similarity searches allowed us to identify

putative UBLs. Some of the putative UBLs were not formally annotated at the time of analysis,

but have since been formally annotated, while 88 putative ubiquitin-like domain-containing

isoforms and 29 ubiquitin-like domain-containing genes have yet to be validated.

The ambitious goal of completing the structural coverage of all human UBLs through experimental

and computational means was not fully achieved, but 32% of human UBLs now have experimental

structures and an additional 49% of structural coverage has been achieved through 196

computationally determined homology models. The remaining 74 UBLs are too distantly related

to any of the experimentally characterized proteins within the PDB, and at least one member of

each modelling family will need to be experimentally characterized to complete the structural

coverage of all human UBLs (Table 2.3).

Table 2.3: All human UBLs that remain to be structurally determined, along with their most similar protein structure

and biological significance.

UBL # Genes that

contain UBL PDB

ID PDB Protein Name

% Sequence

Identity

% Query Length

Medical Significance (OMIM, CGP, DiseaseHub)

# PPI partners

(BioGRID, HPRD, BIND)

# publications

(PubMed)

1 ANKUB1-2 3FIN-C 50S ribosomal protein L1 – Thermus thermophilus

30% 62%

- - 3

2 ANKUB1-3 3G4O-A Aerolysin

– Aeromonas hydrophila 40% 27%

3 ARAP1-3 2JKB-A Sialidase B

– Streptococcus pneumoniae 27% 58% - 24 33

4 ARAP2-2 1U5F-A Src kinase-associated

phosphoprotein 2 – Mus musculus

23% 62%

- 1 14

5 ARAP2-3 1YZX-A Glutathione S-transferase kappa 1 – Homo sapiens

27% 59%

6 ARAP3-2 3L7U-A Nucleoside

diphosphate kinase A – Homo sapiens

33% 43% - 3 17

7 ARHGAP20 1U5L-A Major prion protein

– Trachemys scripta 27% 60% - 1 13

8 ASPSCR1_3 3LH5-A Tight junction protein ZO-1

– Homo sapiens 35% 65%

alveolar soft part sarcoma & renal cell carcinoma

12 28

9 EPB41L1_3-1 2HE7-A Band 4.1-like protein 3


mental retardation

30 34

10 FRMD1_2-2 2DD4-B Thiocyanate hydrolase subunit

– Thiobacillus thioparus 44% 48% - 0 4

11

FRMD3_1-2 FRMD3_2-2 FRMD3_3-2 FRMD3_5-1 FRMD3_6-2 FRMD3_7-2 FRMD3_8-1 FRMD3_10-1

4K4K-A ORF:BACUNI_00621

– Bacteroides uniformis ATCC 8492

34% 54%

diabetic nephropathy & potential tumor

suppressor

- 10

12 FRMPD2_1-1 FRMPD2_2-1

3MEJ-A Putative transcriptional regulator

YwtF – Bacillus subtilis 22% 66%

- - 6

13 FRMPD2_4-1 1Q7X-A Tyrosine-protein phosphatase

non-receptor type 13 – Homo sapiens

47% 64%

14 MYLIP_2-1 2B50-A

Peroxisome proliferator-

activated receptor – Homo sapiens

34% 54% - 16 30

15 PAN2_1-1 PAN2_3-1

1E2Z-A Apocytochrome F

– Chlamydomonas reinhardtii 29% 66%

- 345 20 16

PAN2_1-2 PAN2_2-2 PAN2_3-2

2JWO-A V(D)J recombination-activating

protein 2 – Mus musculus 42% 29%

17 PAN2_1-3 PAN2_2-3 PAN2_3-3

4BUJ-B Superkiller protein 3

– Saccharomyces cerevisiae 32% 57%

18 PIK3C2B 2RD0-A

Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic

subunit isoform – Homo sapiens

32% 50% neoplasms 16 75

19 PIK3CG 3V65-B Low-density lipoprotein

receptor-related protein 4 – Rattus norvegicus

32% 55% longevity & HIV

pathways 36 380

20 PRIC285_1-1 2LU7-A Obscurin-like protein 1

– Homo sapiens 36% 67% - 7 16

21 PTPN13_1-2 PTPN13_3-2 PTPN13_4-3

3T30-B Nucleoplasmin-2 – Homo sapiens

22% (10% gap)

74% Systemic lupus erythematosus, lung cancer &

multiple sclerosis

34 79

22 PTPN13_3-8 PTPN13_4-9

1GAK-A Fertilization protein – Haliotis fulgens

27% 69%

23 PTPN14_1-3 4LXG-A α/β hydrolase

– Sphingomonas wittichii 24% 69%

breast neoplasms & lymphedema

38 31

24 PTPN21_1-2 4H1Z-A Enolase - Rhizobium meliloti 30% 48% Graves’ disease 8 14

25 PTPN3_1-2 1GG3-A Protein 4.1 – Homo sapiens 53.7% 53% - 12 28

26 RALGDS_1-1 RALGDS_2-1

1F1R-A 3,4-dihydroxyphenylacetate

2,3-dioxygenase – Arthrobacter globiformis

31% 56% - 44 47

27 RAPGEF2 2YW3-A

4-hydroxy-2-oxoglutarate aldolase/2-dehydro-3-

deoxyphosphogluconate aldolase

– Thermus thermophilus

33% 43% - 18 31

28 RASSF4_1 RASSF4_4

3RSN-A Set1/Ash2 histone

methyltransferase complex subunit ASH2 – Homo sapiens

25% 69% Alzheimer’s

disease 3 14

29 RASSF6_4 3VHD-A B-1,4-endoglucanase – Prevotella bryantii

25% 63% - 5 18

30 RP1L1_1-3 2XOA-A Ryanodine receptor 1

– Oryctolagus cuniculus 40% 48%

occult macular dystrophy

- 14

31 SACS_1 SACS_2

1JHJ-A Anaphase-promoting complex

subunit 10 – Homo sapiens 30% 57% spastic ataxia 15 68

32 SHROOM1_1-2 SHROOM1_2-2

1X8M-A 4-deoxy-L-threo-5-hexosulose-

uronate ketol-isomerase – Escherichia coli

35% 45% - - 4

33 SNX27_1 SNX27_2 SNX27_3

4GXB-A Sorting nexin-17 – Homo sapiens

39% 67% - 9 32

34 SNX31_1-2 SNX31_2-2

4GXB-A Sorting nexin-17 – Homo sapiens

48% 50% - - 5

35 UBXN4_1-1 4L77-A 1,8-cineole

2-endo-monooxygenase – Citrobacter braakii

35% 51% - 10 19

36 UBXN6_1-1 3A79-A Variable lymphocyte receptor B

– Eptatretus burgeri 31% 64% - 40 29

37 UFM1_2 1WXS-A Ubiquitin-fold modifier 1

– Homo sapiens 100% 63% - 39 24

38 UHRF1BP1 1IXO-A Serine/threonine-protein

phosphatase 2A activator 1 – Saccharomyces cerevisiae

28% 58% - 2 12

39 USP11_1-2 2IQX-A Phosphatidylethanolamine-

binding protein 1 – Rattus norvegicus

32% 68% HIV interaction 98 51

40 USP25_1-1 4H6Y-A FERM, RhoGEF and pleckstrin

domain-containing protein 1 – Homo sapiens

29% 65%

-

32 33 41 USP25_2-1 2CWY-A TTHA0068

– Thermus thermophilus 26% 69%

42 USP25_2-2 3ZGJ-A Putative 4-hydroxyphenylpyruvic

acid dioxygenase – Streptomyces coelicolor

33% 43%

43 USP28_1-1 USP28_2-1

1AKO-A Exodeoxyribonuclease III

– Escherichia coli 27% 47%

- 49 28 44 USP28_1-2 USP28_2-2

1NBF-A Ubiquitin carboxyl-terminal

hydrolase 7 – Homo sapiens 30% 63%

USP28_2-3 3OCJ-A BPP1064 putative export protein

– Bordetella parapertussis 36% 51%

45 USP32_1-4 1PMI-A Mannose-6-phosphate

isomerase – Candida albicans 33% 40%

- 26 15 46 USP32_1-5 4LFY-A

Dihydroorotase – Burkholderia cenocepacia

38% 41%

47 USP32_1-6 1X4O-A SURP and G-patch domain-

containing protein 1 – Mus musculus

31% 58%

48 USP34_1-1 USP34_2-1 USP34_3-1

4B3F-X DNA-binding protein SMUBP-2

– Homo sapiens 30% 65% - 36 24

49 USP4_1-3 USP4_2-3

2Z1K-A (Neo)pullulanase

– Thermus thermophilus 32% 51% - 78 47

50 USP40_1-1 USP40_3-2

2F57-A Serine/threonine-protein kinase

PAK 7 – Homo sapiens 36% 51%

Parkinson’s Disease & Eye

Diseases 3 15 51

USP40_1-2 USP40_3-3

3IBD-A Cytochrome P450 2B6


52 USP40_2-1 1EW3-A Major allergen Equ c 1

– Equus caballus 29% 56%

53 USP43_1-1 3N5G-A Thymidylate synthase


- 9 7

54 USP43_1-2 2HW6-A MAP kinase-interacting

serine/threonine-protein kinase 1 – Homo sapiens

29% 60%

55 USP47_1-4 1UF2-A Outer capsid protein P3

– Rice dwarf virus 26% 53%

- 12 26 56 USP47_2-3 4AWS-A NADH:flavin oxidoreductase

Sye1 – Shewanella oneidensis 57% 45%

57 USP47_2-4 3NWI-A Zinc transport protein ZntB – Salmonella typhimurium

35% 65%

58 USP48_2-3 3GB6-A Putative fructose-1,6-bisphosphate aldolase – Giardia intestinalis

35% 46%

- 10 24

59 USP48_5-1 1S70-A

Serine/threonine-protein

phosphatase PP1- catalytic subunit – Gallus gallus

27% 57%

60 USP48_5-2 2ISV-A Putative fructose-1,6-bisphosphate aldolase – Giardia intestinalis

35% 53%

61 USP48_6-1 3LAD-A Dihydrolipoyl dehydrogenase

– Azotobacter vinelandii 33% 57%

62 USP6_1-1 3UBF-A Neural-cadherin

– Drosophila melanogaster 26% 62%

aneurysmal bone cysts

15 30

63 USP6_1-2 4FN4-A Short chain dehydrogenase – Sulfolobus acidocaldarius

38% 55%

64 USP6_1-3 USP6_2-3

1K28-D Baseplate structural

protein Gp27 – Enterobacteria phage T4

31% 60%

65 USP6_2-2 1PGW-2 RNA2 polyprotein

– Bean-pod mottle virus 30% 69%

66 USP9X_1-3 USP9X_2-3

1VJV-A Ubiquitin carboxyl-terminal

hydrolase 6 – Saccharomyces cerevisiae

38% 62% Turner

syndrome 98 80

67 USP9Y_1-1 USP9Y_2-1

4NGU-A TRAP dicarboxylate transporter,

DctP subunit – Desulfovibrio desulfuricans

27% 64% Infertility /

azoospermia 6 30

68 USP9Y_1-3 USP9Y_2-3

2F1Z-A Ubiquitin carboxyl-terminal

hydrolase 7 – Homo sapiens 42% 62%

69 VCPIP1_1-1 4I15-A Class 1 phosphodiesterase

PDEB1 – Trypanosoma brucei 28% 69%

- 26 31

70 VCPIP1_1-3 3LXM-A Aspartate carbamoyltransferase

– Yersinia pestis 32% 51%

71 WDR48_1-1 WDR48_5-1

1LK5-A Ribose-5-phosphate isomerase

A – Pyrococcus horikoshii 31% 53%

- 70 28

72 WDR48_1-2 WDR48_5-2

1R8I-A TraC – Escherichia coli 23% 37%

73 WDR48_3-1 WDR48_4-1

2PBI-A Regulator of G-protein

signalling 9 – Mus musculus 28% 66%

74 WDR48_3-2 WDR48_4-2

1IDU-A Vanadium chloroperoxidase –

Curvularia inaequalis 33% 78%

2.5 Conclusion

The human genome contains 220 genes that encode 398 unique UBLs. At the time of the

analysis, 147 of these genes were not annotated as having the ubiquitin fold. The goal of this project

was to obtain structural coverage of all human UBLs, without experimentally determining each of

the 398 unique UBLs. This was facilitated by grouping the 398 UBLs into 100 modelling families

that represent homologous protein domains that have similar structures. NMR spectroscopy was

used to screen and prioritize UBLs for structure determination, and 17 human UBLs were

structurally characterized using X-ray crystallography and NMR spectroscopy. As a result, the

RCSB PDB now has 32% structural coverage of human UBLs, and 81% structural coverage when

taking into account homology modelling. Of the 74 remaining human UBLs that lack structural

information, 30 are singletons and are 36% similar & 23% identical to protein structures in the

PDB. This project provided 3.7% coverage of the human UBLs through experimental structure

determination and 6% coverage when taking into account homology models. Structural analyses

also provide insight into families of related proteins. In particular, structural analysis of the

NFATc2IP and ubiquilin protein families revealed insight into protein-protein interactions and

facilitated the prediction of novel binding partners.

Chapter 3

Solution NMR structure determination of human Ubiquitin-like domains in NFATc2IP & Ubiquilin-1

Contributions: A. Semesi, M. Garcia & A. Yee assisted with cloning, small scale sample

preparation & small scale expression/solubility screening. C. Fares, M. Karra, S. Srisalam & S.

Houliston assisted with NMR data acquisition and NMR titration. B. Wu, A. Gutmanas & A. Lemak

assisted with NMR structure determination. I performed large scale NMR sample preparation and

NMR screening, as well as structure determination and subsequent analyses of NFATc2IP &

ubiquilin-1.

3.1 Introduction

Ubiquitin-like domains from two human ubiquitin-like domain containing proteins, NFATc2IP and

Ubiquilin-1, were structurally determined using NMR spectroscopy. The ubiquitin-like domain of

human NFATc2IP (residues 342-419) and the ubiquitin-like domain of Ubiquilin-1 (residues 34-

112) both share the same β-grasp domain architecture as Ubiquitin and other UBLs encoded

within the human genome.

Structure determination of these two protein structures was part of a collaborative effort that

resulted in the structure determination and characterization of 17 human ubiquitin-like domain

structures that have expanded our knowledge of the diversity of the ubiquitin fold.

3.1.1 NFATc2IP

NFATc2IP is involved in the Nuclear factor of activated T-cells (NFAT) signaling cascade, which

is important in immune response (Rengarajan et al., 2000). The NFAT family of transcription

factors (NFATc1, NFATc2, NFATc3, and NFATc4) are characterized by a Rel-homology region

and an NFAT-homology region (Macian F, 2005). NFATc2 interacts with NFATc2IP, and is

present in the cytoplasm prior to translocating to the nucleus upon T-cell receptor stimulation (Rao

et al., 1997). SUMO conjugation of NFATc2 leads to nuclear retention, regulation of

transcriptional activity and recruitment to nuclear SUMO-1 bodies (Nayak et al., 2009; Terui et al.,

2004). NFATc2 contains a putative SUMO interacting motif, which could be involved in the

association between NFATc2IP and NFATc2.

3.1.2 Ubiquilin-1

Ubiquilin-1 is one of the four members of the ubiquilin protein family. Ubiquilin proteins contain an

N-terminal ubiquitin-like domain and a C-terminal ubiquitin-associated domain, separated by ~450

aa (Mah et al., 2000). The central region of each member of the ubiquilin protein family contains

two STI1 motifs, capable of binding to heat shock proteins. Ubiquilin proteins physically associate

with proteasomes and ubiquitin ligases, and are thought to modulate protein degradation.

Ubiquilin-1 interacts with ubiquitin-interacting motifs (UIMs) in the proteasomal subunit S5A,

ataxin-3, HSJ1a, and EPS15 (Heir et al., 2006; Regan-Klapisz et al., 2005). Ubiquilin-1 also

interacts with CD47 and Gβγ, suggesting a role in integrating adhesion and signaling components

of cell migration (N'Diaye & Brown, 2003).

3.1.3 Ubiquitin-like Fold

The ubiquitin-like folds of both NFATc2IP & Ubiquilin-1 contain a 5-strand mixed β-sheet that is

intercalated by an α-helical core. Comparative analysis of both ubiquitin-like folds reveals minor

differences (1-2 aa) in loop lengths, and the most distinct difference is at the C-terminus of the α-

helical core (Figure 3.3 & Table 3.2). The Ubiquilin-1 α-helical core is 16 aa and contains a 2-

residue lysine 59 – serine 60 break that allows the three C-terminal residues of the α-helix

(histidine 61, threonine 62, aspartic acid 63) to orient back into the fold.


3.2.1 NFATc2IP UBL domain NMR structure determination

NMR screening was performed on a 78 residue construct of the 2nd ubiquitin-like domain of

NFATc2IP, and its HSQC spectra revealed that it was amenable for structure determination

(MGSSHHHHHHSSGLVPRGSTETSQQLQLRVQGKEKHQTLEVSLSRDSPLKTLMSHYEEAMGLSGRKLSFFFDGTK

LSGRELPADLGMESGDLIEVWG - SGC clone accession: ubh72.342.419.pET28-MHL_SDC088D093).

The NMR sample was expressed in E. coli BL21 (DE3) in a 125 mL flask containing M9 minimal

media (100 uM ZnSO4, 8.55 mM NaCl, 47.6 mM Na2HPO4, 22 mM KH2PO4, 100 mM MgSO4, 2

mM biotin, 1.5 mM thiamine.HCl, 10 mM ZnSO4, and 0.1 M CaCl2), supplemented with 15NH4Cl,

13C6-D-glucose and 50 µg/mL kanamycin, and was inoculated from a glycerol stock of bacteria.

The flask was incubated on a shaker for 18 hours at 220 rpm at 37ºC before being transferred to

a 2L flask containing 1000 mL M9 minimal media supplemented with 50 µg/mL kanamycin, and

incubated at 37 ºC until an OD600 of 1.0 was reached. Protein expression was induced with 100

µM IPTG and the cells were incubated for 15.5 hours at 220rpm at 15ºC. Cell pellets were

obtained by centrifugation, and frozen in 50 mL Falcon tubes at -80ºC. The frozen cell pellets

were thawed by soaking in warm water before being resuspended in 40 mL lysis buffer (15.4 mM

tris.HCl, 100 uM ZnSO4 100uL, 0.5 mM NaCl, and 15 mM imidazole; pH 8.5) and lysed by

sonication on ice. The lysate was clarified through centrifugation for 20 min at 4 ºC, and the

supernatant was mixed with 2 mL of Ni2+ affinity beads per 40 mL lysate. The mixture was shaken

for 20 minutes at 4 ºC, before undergoing centrifugation at 2000 rpm for 6 minutes. The

supernatant was decanted and the remaining resin was resuspended and washed twice with lysis

buffer, followed by two 5 mL cold buffer washes (15.4 mM tris.HCl, 100 uM ZnSO4 100uL, 0.5 mM

NaCl, and 30 mM imidazole; pH 8.5). The washed resin was transferred to a gravity filter column

and washed with an additional 2 mL of wash buffer. The purified protein was then eluted from the

resin with 5 mL of elution buffer (15.4 mM tris.HCl, 100 uM ZnSO4 100uL, 0.5 mM NaCl, and 500

mM imidazole; pH 8.5).
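
Additions such as the 100 µM IPTG used for induction follow the usual dilution relation C1·V1 = C2·V2. The sketch below works through that arithmetic for the 1000 mL culture. The 1 M stock concentration is an assumption made for illustration, as the text does not state the stock used.

```python
def stock_volume_ul(final_conc_molar, final_volume_ml, stock_conc_molar):
    """Solve C1*V1 = C2*V2 for the stock volume, returned in microlitres."""
    final_volume_l = final_volume_ml / 1000.0
    stock_volume_l = final_conc_molar * final_volume_l / stock_conc_molar
    return stock_volume_l * 1e6


# 100 µM IPTG in a 1000 mL culture from an assumed 1 M stock.
iptg_ul = stock_volume_ul(final_conc_molar=100e-6, final_volume_ml=1000.0,
                          stock_conc_molar=1.0)
print(f"IPTG stock to add: {iptg_ul:.1f} µL")
```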

The purified protein was exchanged from elution buffer into MOPS-based NMR buffer (NMR buffer

for H2O experiments: pH 8.0, 10 mM MOPS, 500 mM NaCl, 1 mM benzamidine, 0.01% NaN3, 10

µM ZnSO4, 10% D2O, and 90% H2O; NMR buffer for D2O experiments: pH 8.0, 10 mM MOPS,

500 mM NaCl, 1 mM benzamidine, 0.01% NaN3, 10 µM ZnSO4, and 100% D2O) by

centrifugal ultrafiltration using 2 mL concentrators with a 3,000 Da molecular weight cut-off (VivaSpin 2

MES) at 3000 rpm, resulting in a final volume of 300 µL and final protein concentration of 0.9 mM.

The concentrated protein was then transferred to a 3 mm NMR tube.

A series of NMR spectra (3D HNCO, 3D HNCA, 3D CBCA(CO)NH, 3D HBHA(CO)NH, 2D 1H-

13C Constant Time HSQC, 3D 1H-13C NOESY, 3D 1H-15N NOESY, 3D 1H-13C Aromatic

NOESY, 3D (H)CCH-TOCSY, and 3D H(C)CH-TOCSY) were collected at 298K using a 500MHz

Bruker AVANCE spectrometer, a 600MHz Bruker AVANCE spectrometer and an 800MHz Bruker

AVANCE spectrometer. After data collection was performed on the unaligned sample, the purified

protein was aligned by titrating 12 mg/mL protease-free Pf1 phage co-solvent into the NMR

sample until 10 Hz proton splitting was observed. Spectra of the aligned and unaligned samples (2D

1H-15N IPAP HSQC) were obtained using the 500MHz Bruker AVANCE spectrometer and the

800MHz Bruker AVANCE spectrometer. NMR data was processed and analyzed using

TOPSPIN, NMRPipe, NMRDraw, SPARKY, Abacus/FMCGUI, CNS, TALOS, PALES, PSVS, and

WhatIF (Delaglio et al., 1995; Goddard & Kneller; Lemak et al., 2011; Brünger et al., 1998;

Brünger AT, 2007; Shen et al., 2009; Zweckstetter & Bax, 2000; Bhattacharya A et al., 2007;

Vriend G, 1990).
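
Residual dipolar couplings are commonly obtained from IPAP-HSQC data as the difference between the splitting measured in the aligned sample (J + D) and the splitting measured in the unaligned sample (J). The sketch below illustrates that subtraction only. The residue numbers and splittings are invented, the function name is hypothetical, and sign conventions differ between software packages.

```python
def residual_dipolar_couplings(isotropic_hz, aligned_hz):
    """Return {residue: D} with D = (J + D) - J for residues measured in both samples."""
    shared = sorted(isotropic_hz.keys() & aligned_hz.keys())
    return {res: aligned_hz[res] - isotropic_hz[res] for res in shared}


# Invented 1H-15N splittings (Hz), keyed by residue number.
isotropic = {345: -93.1, 346: -92.8, 347: -93.4}
aligned = {345: -88.6, 346: -101.3, 348: -90.0}

rdcs = residual_dipolar_couplings(isotropic, aligned)
for residue, d in rdcs.items():
    print(residue, f"{d:+.1f} Hz")
```

Residues observed in only one of the two spectra are dropped, since no coupling can be derived for them.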

3.2.2 Ubiquilin-1 UBL domain NMR structure determination

The process for NMR structure determination of Ubiquilin-1 was very similar to that of NFATc2IP,

with a few minor differences that included the use of the LEX fermentation system and non-

uniform sampling. The LEX fermentation system is a high-throughput bioreactor developed at the

Structural Genomics Consortium that consists of an enclosure that houses cell culture within

media bottles that are connected to an air manifold via a quick disconnect manual flow regulator

to ensure sufficient oxygenation and mixing of cells at a regulated temperature (Koehn & Hunt,

2009). Of the three constructs generated for ubiquilin-1, a 79 residue construct was determined

to be most amenable for structure determination by NMR (SGC clone accession:

ubqln1.034.112.p15Tvlic

MGSSHHHHHHSSGRENLYFQGPKIMKVTVKTPKEKEEFAVPENSSVQQFKEEISKRFKSHTDQLVLIFAGKILKDQDT

LSQHGIHDGLTVHLVIKTQNRP).

The NMR sample was expressed in E. coli BL21 (DE3) RIL in M9 minimal media supplemented

with biotin, thiamine, and 10 µM ZnSO4; 15NH4Cl and 13C-glucose were the sole nitrogen and

carbon sources, respectively. Starter cultures (50 mL in 250 mL flasks) were prepared with media

supplemented with 100 µL of glycerol stock and shaken overnight (18 hours) at 220 rpm at 37ºC.

The starter culture was used to inoculate 500 mL of growth media that was placed in a modified

LEX fermentation system at 37ºC until an OD600 of 1.0 was achieved. Protein expression was

induced with 1 mM IPTG and grown at room temperature for 15.5 hours. Cells were harvested

by centrifugation and frozen in 50 mL Falcon tubes at -80ºC. The frozen cell pellets were thawed,

resuspended in 25 mL lysis buffer (20 mM tris.HCl, 100 uM ZnSO4, 0.5 mM NaCl, and 15 mM

imidazole, pH 8.5) and lysed by sonication on ice. Lysate was clarified by centrifugation for 20

min at 4°C and the supernatant was mixed for 20 minutes at 4°C with 2 mL settled Ni2+ affinity

beads. Beads were batch-washed twice with 5 mL of cold wash buffer (20 mM tris.HCl, 100 uM

ZnSO4, 0.5 mM NaCl, and 30 mM imidazole, pH 8.5), spun at 2000 rpm for 6 minutes, transferred

to a column, and further washed with 2 mL of wash buffer. The purified protein was eluted with 5

mL of Elution buffer (20 mM tris.HCl, 100 uM ZnSO4, 0.5 mM NaCl, and 500 mM imidazole, pH

8.5). The purified protein was exchanged into NMR buffer (pH 7.0, 10 mM Tris-HCl, 300 mM

NaCl, 10 mM DTT, 1 mM benzamidine, 0.01% NaN3, 1x inhibitor cocktail (Roche), 10 µM ZnSO4,

10% D2O, and 90% H2O) and protein concentration was performed using VivaSpin concentrators

with a 5,000 molecular weight cut-off at 3000 rpm, resulting in a final volume of 300 µL and protein

concentration of 0.5 mM.

The purified protein was transferred to a 5 mm Shigemi NMR tube for data collection, and a series

of spectra (3D HNCO, 3D HNCA, 3D CBCA(CO)NH, 3D HBHA(CO)NH, 3D (H)CCH-TOCSY, 3D

H(C)CH-TOCSY, 13C-edited aliphatic NOESY, 13C-edited aromatic NOESY, 15N-edited NOESY-

HSQC, and 13C Constant Time HSQC) were collected at 25ºC on an 800 MHz Bruker AVANCE

spectrometer and a 600 MHz Bruker AVANCE spectrometer equipped with a z-shielded gradient

triple resonance cryoprobe. Chemical shifts were referenced to external DSS. All spectra were

non-uniformly sampled, and were processed using the NMRPipe, NMRDraw and

multidimensional decomposition software (Delaglio et al., 1995). The backbone assignments

were obtained using HNCO, CBCA(CO)NH, HBHA(CO)NH, HNCA and 15N-edited NOESY-HSQC

spectra. Aliphatic side chain assignments were obtained from H(C)CH-TOCSY, (H)CCH-TOCSY,

13C-edited aliphatic NOESY and 15N-edited NOESY-HSQC spectra. 36 H-N and 39 Cα-CO RDC

constraints were generated using SPARKY and PALES. NMR data was processed and analyzed

using TOPSPIN, NMRPipe, NMRDraw, SPARKY, MDD, FMCGUI, CYANA, CNS, TALOS,

PALES, and PSVS.

Distance restraints for structure calculations were derived from cross-peaks in 15N-edited NOESY-

HSQC, 13C-edited aliphatic and aromatic NOESY-HSQC spectra. NOE assignment and structure

calculations were performed using FMC-GUI and CYANA. The quality of the structure calculation

was assessed by NMR structure quality assessment scores (NMR RPF scores). The best 20 of

100 CYANA structures from the final cycle were selected and subjected to molecular dynamics

refinement in explicit water with RDC constraints using the program CNS. The structures were

inspected by PROCHECK and MolProbity using NESG validation software package PSVS.
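
Selecting the best 20 of 100 calculated conformers amounts to ranking by a quality score and keeping the top of the list. The sketch below assumes the ranking criterion is the CYANA target function (lower is better), which the text does not state explicitly. The conformer records and their values are randomly generated placeholders.

```python
import random


def select_best(conformers, n_keep=20):
    """Keep the n_keep conformers with the lowest target-function values."""
    return sorted(conformers, key=lambda c: c["target_function"])[:n_keep]


# Placeholder ensemble: 100 conformers with made-up target-function values.
rng = random.Random(0)
conformers = [
    {"model": i + 1, "target_function": rng.uniform(0.5, 5.0)} for i in range(100)
]

best = select_best(conformers)
print([c["model"] for c in best])
```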

3.2.3 Comparative analysis of Ubiquilin-1, NFATc2IP, Ubiquitin & SUMO2

Structural models (homology models and experimentally determined models) were inspected

using UCSF Chimera, and extraneous atoms removed (e.g. poly-histidine tag, water molecules,

other proteins/peptides, and residues that extended beyond the core ubiquitin-like domain)

(Pettersen et al., 2004). The molecular structures of all structurally characterized ubiquitin-like

domains were structurally aligned and superimposed using UCSF Chimera. Based on the

structural alignment, the corresponding core RMSD and Cα RMSD were calculated. Based on

both the structural alignment and secondary structure element alignment, a multiple sequence

alignment was generated.
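The RMSD step can be illustrated with a minimal sketch. It assumes the two structures have already been superimposed (for example in UCSF Chimera) and that matched Cα coordinates are supplied as equal-length lists. The function name `rmsd` and the toy coordinates are illustrative and are not taken from the thesis workflow.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, pre-superimposed atoms.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples in Å,
    where item n of each list is the same aligned position (e.g. Cα atoms).
    No fitting is done here; superposition is assumed to have happened already.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example (hypothetical coordinates): two 3-atom traces offset along z.
trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(trace_a, trace_b))
```

Passing only the core residues, or all aligned residues, would correspond to the two RMSD values distinguished in the text.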

Electrostatic potential distributions of 58 human UBLs were evaluated using the Analysis of

Electrostatic Similarities Of Proteins (AESOP) framework (Gorham et al., 2011). The x-ray crystal

structure coordinates of GABARAPL1(PDBID:2R2Q), NFATc2IP_2nd(PDBID:3RD2),

FAF1(PDBID:3QX1), USP15(PDBID:3PPA), TCEB2(PDBID:4B95), NSFL1C(PDBID:1S3S),

RNF2(PDBID:3H8H), UBXN7(PDBID:1WJ4), BRAF(PDBID:3NY5), NCF2(PDBID:1OEY),

PIK3CG(PDBID:3CST), OASL(PDBID:1WH3), RGL2(PDBID:4JGW), SUMO3(PDBID:2IO1),

PIK3CD(PDBID:4XE0), EPB41(PDBID:1GG3), EPB41L3(PDBID:2HE7), RALGDS(PDBID:2RGF),

ISG15(PDBID:3SDL), NF2(PDBID:1H4R), MAP1LC3A(PDBID:3ECI), MAP1LC3B(PDBID:3VTU),

UBQLN3(PDBID:1YQB), BAG1(PDBID:1WXV), UBL7(PDBID:1X1M), USP14(PDBID:2AYN),

RAD23A(PDBID:2WYQ), NEDD8(PDBID:4FBJ), UHRF1(PDBID:2FAZ), PIK3CA(PDBID:4JPS),

RDX(PDBID:1J19), UBIQUITIN(PDBID:3B0A & 4HK2), RAF1(PDBID:1GUA), and UBLCP1(PDBID:2M17)

were used for surface charge analysis (Berman et al., 2000). Representative models from 30

NMR ensembles were used: BRAF(PDBID:2L05), FAU(PDBID:2L7R), HERPUD1(PDBID:1WGD),

IQUB(PDBID:2DAF), ISG15(PDBID:2HJ8), NFATc2IP_1st(PDBID:2L76), NFATc2IP_2nd(PDBID:2JXX),

RAD23B(PDBID:1UEL), SF3A1(PDBID:1ZKH), SUMO1(PDBID:1A5R), SUMO2(PDBID:2AWT),

TBCB(PDBID: 2KJ6), UBIQUITIN(PDBID: 1Q0W & 1YX6), UBL3(PDBID: 1WGH), UBL4A(PDBID: 2DZI),

UBL5(PDBID: 1UH6), UBQLN1(PDBID: 2KLC), UBQLN2(PDBID: 1J8C), UBQLN3(PDBID: 1WX7),

UBTD2(PDBID: 1TTN), UBXN4(PDBID: 2KXJ), UFM1(PDBID: 1WXS), UHRF2(PDBID: 1WY8),


URM1(PDBID: 1WGK), USP7(PDBID: 2KVR), mouse ASPSCR1(PDBID: 2AL3), mouse RGL1(PDBID:

1EF5), mouse TMUB2(PDBID: 1WIA), and mouse UBFD1(PDBID:1V86).

Structural models were prepared for electrostatic potential calculations by determining partial

charges at a pH of 7.6 and van der Waals radii using PDB2PQR with the PARSE forcefield

(Dolinsky et al., 2007; Sitkoff et al., 1994). Electrostatic potentials were calculated using the

linearized Poisson-Boltzmann equation,

$$-\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \right] + \varepsilon(\mathbf{r})\, \kappa^{2}(\mathbf{r})\, \phi(\mathbf{r}) = \frac{e^{2}}{\varepsilon_{0} k_{B} T} \sum_{i} z_{i}\, \delta(\mathbf{r} - \mathbf{r}_{i})$$

where r represents discrete grid point positions within and around the protein, ε(r) is the dielectric

coefficient, ε0 is the vacuum permittivity, κ(r) is the ion accessibility function, ϕ(r) is the

electrostatic potential, e is the electron charge, kB is the Boltzmann constant, T is the temperature,

and zi is the unit or partial charge at position ri, represented by δ(r − ri) (Davis et al., 1990). The Adaptive Poisson-

Boltzmann Solver (APBS) software package calculates electrostatic potential by embedding each

UBL in a grid, and solves the Poisson-Boltzmann equation to determine electrostatic potential at

each grid point based on assigned charge, dielectric coefficient, and ion accessibility (Baker et

al., 2001). The dielectric surface was defined using a sphere probe with a radius of 1.4 Å, and ion

accessibility surface was defined using a sphere probe with a radius of 2.0 Å. All UBLs were

superimposed within unified grid dimensions (129 × 97 × 97 points) with calculated isopotential

contour surfaces plotted at ±1 kBT/e. Electrostatic potentials were visualized using UCSF Chimera

(Pettersen et al., 2004). Comparison of the spatial distributions of electrostatic potentials of the

UBLs was performed by generating a similarity distance matrix according to the metric:

$$D_{A,B} = \frac{1}{N} \sum_{i,j,k} \frac{\left| \phi_{A}(i,j,k) - \phi_{B}(i,j,k) \right|}{\max\left( \left| \phi_{A}(i,j,k) \right|, \left| \phi_{B}(i,j,k) \right| \right)}$$

where ϕA(i,j,k) and ϕB(i,j,k) are electrostatic potentials of proteins A and B, respectively, at a

common grid point (i,j,k), and N is the number of grid points. This method implies that proteins

having a distance of 0 have identical spatial distributions of electrostatic potentials, whereas those

having a distance of 2 have completely different electrostatic potential spatial distributions.
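A minimal sketch of such a distance follows. It assumes the metric is the average over grid points of |ϕA − ϕB| / max(|ϕA|, |ϕB|), which is consistent with the stated range of 0 to 2. The function name, the flat-list input and the handling of grid points where both potentials are zero are assumptions made for illustration.

```python
def electrostatic_distance(phi_a, phi_b):
    """Average normalized difference between two potentials on a shared grid.

    phi_a, phi_b: flat sequences with one potential value per grid point,
    listed in the same (i, j, k) order for both proteins.
    Returns 0.0 for identical potentials and 2.0 when every grid point has
    potentials of equal magnitude and opposite sign.
    """
    if len(phi_a) != len(phi_b) or len(phi_a) == 0:
        raise ValueError("potentials must be sampled on the same non-empty grid")
    total = 0.0
    for a, b in zip(phi_a, phi_b):
        scale = max(abs(a), abs(b))
        # Assumption: a grid point where both potentials are zero contributes 0.
        if scale > 0.0:
            total += abs(a - b) / scale
    return total / len(phi_a)


# Hypothetical two-point "grids" for illustration only.
print(electrostatic_distance([1.0, -2.0], [1.0, -2.0]))
print(electrostatic_distance([1.0, -2.0], [-1.0, 2.0]))
```

Evaluating this function for every pair of UBLs would fill the similarity distance matrix used for clustering.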


3.2.4 Protein-protein interaction partner identification

The ScanProsite tool was used to search all human proteins for putative UIMs based on a series

of motifs with strict ([ED](3)-x(3)-[AG]-x(3)-S-x(2)-[ED]) and weak stringency ([ED]-x(3)-[AG]-x(6)-

S-x(2)-[ED]). The resulting lists of putative UIM-containing human proteins were compared to

experimentally known binding partners of ubiquitin, ubiquilin family members and isoforms.

Binding partners were identified by searching multiple protein-protein interaction databases

(BioGRID, iRefWeb, and Human Protein Reference Database) using protein name, uniprot ID,

and protein sequence (Turner et al., 2010). Multiple isoforms of ubiquilin family members and

NFATc2IP exist, and each isoform was included in the search. Human binding partners observed

to interact with non-human forms of ubiquilin and NFATc2IP were also considered in the analysis

of potential binding partners.
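The motif search described above can be sketched locally by translating the two PROSITE-style patterns into regular expressions. This is not the ScanProsite implementation; the helper names `prosite_to_regex` and `find_motifs` are hypothetical, and only the subset of PROSITE syntax used in these two motifs is handled.

```python
import re

STRICT_UIM = "[ED](3)-x(3)-[AG]-x(3)-S-x(2)-[ED]"
WEAK_UIM = "[ED]-x(3)-[AG]-x(6)-S-x(2)-[ED]"


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regex string.

    Supports single residues (S), residue sets ([ED]), the wildcard x,
    and fixed repeat counts such as x(3). Other PROSITE syntax is rejected.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|\[[A-Z]+\]|[A-Z])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, count = m.groups()
        token = "." if residue == "x" else residue
        parts.append(token + (f"{{{count}}}" if count else ""))
    return "".join(parts)


def find_motifs(sequence, pattern):
    """Return (start, end, match) for every hit, 1-based and overlap-aware."""
    # A lookahead is used so that overlapping occurrences are all reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(sequence.upper())
    ]


# Short fragments only; a real search would scan full-length sequences.
print(find_motifs("EEELQLALALSQSE", STRICT_UIM))
print(find_motifs("ELINGYIQKIKSGEE", WEAK_UIM))
```

Scanning with a lookahead matters here because acidic stretches can produce overlapping candidate motifs that a plain left-to-right search would skip.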

For proteins known or predicted to interact with ubiquilin-1 and NFATc2IP that lacked

experimental structures, secondary structure elements were predicted using the JPRED algorithm

for the full-length proteins (Cuff et al., 2000; Cole et al., 2008).

A different approach was used to identify putative binding partners for NFATc2IP. Only

two binding partners were known for NFATc2IP. Therefore, bioinformatics analyses were

performed on both of these binding partners to identify possible modes of interaction related to

the ubiquitin fold. Secondary structure elements were predicted. Each α-helix was analysed to

identify similarities with the canonical UIM. Each β-strand was analysed to identify similarities

with the canonical SIM.

3.2.5 Binding interface analysis

UCSF Chimera was used to superimpose the newly characterized molecular structures of both

ubiquilin-1 and NFATc2IP onto known protein-protein interaction complexes involving

ubiquitin:UIM (PDBID: 1Q0W, 1P9D, 1UEL) and SUMO:SIM (PDBID: 2RPQ, 2ASQ & 2KQS).


Residues at varying distances from each atom of the UIM and SIM were annotated. Residues in

proximity to the UIM or SIM were further analysed for conservation of, or physicochemical

similarity to, the corresponding residues of ubiquitin or SUMO2.

Molecular surfaces for each UBL were calculated, as well as hydrophobicity and electrostatic

potential distributions. Chemical characteristics near the UIM and SIM binding interfaces were

compared between UBLs, and key observations and residues were annotated.
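The distance-based annotation of interface residues can be sketched as follows. The sketch assumes atoms have already been read from the superimposed complex into simple Python tuples; the function name, residue labels and coordinates are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def residues_within(ubl_atoms, motif_atoms, cutoff):
    """Collect UBL residues with at least one atom within `cutoff` Å of the motif.

    ubl_atoms: iterable of (residue_label, (x, y, z)) pairs, one per atom.
    motif_atoms: iterable of (x, y, z) tuples for the UIM or SIM atoms.
    """
    motif_atoms = list(motif_atoms)  # allow repeated iteration
    close = set()
    for label, xyz in ubl_atoms:
        if any(math.dist(xyz, point) <= cutoff for point in motif_atoms):
            close.add(label)
    return close


# Hypothetical coordinates, chosen only to exercise the cutoffs.
ubl = [
    ("ILE68", (0.0, 0.0, 0.0)),
    ("VAL94", (3.0, 0.0, 0.0)),
    ("LYS31", (10.0, 0.0, 0.0)),
]
uim = [(0.0, 2.5, 0.0)]
for cutoff in (3.0, 4.0):
    print(cutoff, sorted(residues_within(ubl, uim, cutoff)))
```

Running the same function at several cutoffs gives the nested residue sets that can then be compared between UBLs.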


3.3.1 Structure determination

High-quality NMR structures were obtained for both NFATc2IP & Ubiquilin-1. Their coordinates

were deposited in the Protein Data Bank on November 30th 2007 (NFATc2IP PDBID: 2JXX) and

June 30th 2009 (Ubiquilin-1 PDBID: 2KLC). Both structures consist of a compact globular β-grasp

fold that contains 2 α-helices and a 5-stranded β-sheet with a Cα RMSD of 1.234 Å for 39 core

residues, an overall RMSD of 1.234 Å for all 69 aligned residues, and a structural distance

measurement (cutoff 5.0) of 34.382 (Figures 3.1 & 3.2). The α-helical core is packed against one

side of the β-sheet, and the Ubiquilin-1 α-helix contains a 2-residue lysine 59 - serine 60 break

that allows the three C-terminal residues of the α-helix (histidine 61, threonine 62, aspartic acid

63) to orient back into the fold (Figure 3.3). The second α-helix of Ubiquilin-1 and NFATc2IP is

5-6 aa in length and situated at the top of the β-sheet (Table 3.2).

The electrostatic potential distribution at pH 7 is significantly different between ubiquilin-1 and

NFATc2IP. Ubiquilin-1 is mostly positively charged and NFATc2IP is mostly negatively charged

(Figure 3.4 & 3.5). Both Ubiquilin-1 and NFATc2IP contain small hydrophobic patches, while

Ubiquilin-1 has a larger hydrophobic patch within the region of residues valine 47, leucine 65,

valine 66, leucine 67, isoleucine 68, isoleucine 73, leucine 74, leucine 93, valine 94, and

isoleucine 95, which is within a few angstroms of the putative UIM-binding interface (Figure 3.6).


An analysis of each ubiquitin-like domain structure was performed to characterize similarities

between each molecular structure that was determined as part of this thesis. The molecular

structure analysis consisted of exploring four attributes: molecular surface characteristics,

electrostatic potential distribution, secondary structure elements, and protein-protein interaction

interfaces. The protein-protein interaction interface analysis focused on the UIM and SIM binding

interfaces, because the UIM region and SIM region of UBDs are amenable to identification using

computational analysis.

Table 3.1: NMR data and refinement statistics.

                                              NFATc2IP           Ubiquilin-1

NMR distance and dihedral constraints
  Distance constraints
    Total NOE                                 2094               1997
    Intra-residual                            411                421
    Sequential (|i-j| = 1)                    556                566
    Medium-range (2 ≤ |i-j| ≤ 4)              301                331
    Long-range (|i-j| ≥ 4)                    826                679
    Hydrogen bonds                            0                  24
  Dihedral angle constraints                  109                84
    phi                                       54                 41
    psi                                       55                 43

Structure statistics
  Violations (mean and s.d.)
    Distance constraints (Å)                  0.038 +/- 0.004    0.016 +/- 0.001
    Dihedral angle constraints (°)            3.680 +/- 6.088    0.855 +/- 0.130
    Max. distance constraint violation (Å)    1.25               0.35
    Max. dihedral angle violation (°)         152.43             5.74
  Deviations from idealized geometry
    Bond lengths (Å)                          1.235 +/- 0.007    1.256 +/- 0.005
    Bond angles (°)                           0.495 +/- 0.008    0.516 +/- 0.009
    Impropers (°)                             0.634 +/- 0.023    0.668 +/- 0.025
  Ramachandran plot
    Most favoured regions (%)                 87.5%              84.4%
    Allowed regions (%)                       12.5%              14.3%
    Generously allowed regions (%)            0.1%               1.3%
    Disallowed regions (%)                    0%                 0.1%
  Average pairwise RMSD (Å)
    Heavy                                     1.57 +/- 0.25      1.15 +/- 0.10
    Backbone                                  1.20 +/- 0.35      0.72 +/- 0.12

PDB accession ID                              2JXX               2KLC
BMRB accession ID                             15576              16390


Figure 3.1: Secondary structure and H-bond patterns of ubiquilin-1. Secondary structure elements of ubiquilin-1 showing H-bond patterns and physicochemical properties (blue = arginine/lysine/histidine [positively charged], yellow = phenylalanine/tryptophan/tyrosine [aromatic], dark green = alanine/valine/isoleucine/leucine/methionine [non-polar], light green = glycine [small non-polar], orange = proline, red = glutamic acid/aspartic acid [negatively charged], purple = asparagine/serine/threonine/glutamine [uncharged polar]).

Figure 3.2: Secondary structure and H-bond patterns of NFATc2IP. Secondary structure elements of NFATc2IP showing H-bond patterns and physicochemical properties (blue = arginine/lysine/histidine [positively charged], yellow = phenylalanine/tryptophan/tyrosine [aromatic], dark green = alanine/valine/isoleucine/leucine/methionine [non-polar], light green = glycine [small non-polar], orange = proline, red = glutamic acid/aspartic acid [negatively charged], purple = asparagine/serine/threonine/glutamine [uncharged polar]).


Figure 3.3: Ribbon diagrams of ubiquilin-1, NFATc2IP, ubiquitin, SUMO1, SUMO2 & SUMO3. Ubiquilin-1 and NFATc2IP contain an α-helical break, which also occurs in ubiquitin, SUMO1, SUMO2 and SUMO3.

Table 3.2: Secondary structure elements of NFATc2IP, ubiquilin-1, ubiquitin and SUMO1/2/3.

Element       NFATc2IP_2nd       Ubiquilin-1           Ubiquitin             SUMO1            SUMO2            SUMO3
β-strand 1    346-354 (9 aa)     26-32 (7 aa)          2-6 (5 aa)            21-28 (7 aa)     18-23 (6 aa)     16-22 (7 aa)
loop          4 aa               2 aa                  5 aa                  4 aa             5 aa             5 aa
β-strand 2    359-367 (9 aa)     35-41 (7 aa)          12-16 (5 aa)          32-39 (7 aa)     29-34 (6 aa)     28-34 (7 aa)
loop          2 aa               4 aa                  5 aa                  4 aa             5 aa             4 aa
α-helix 3     370-383 (14 aa)    46-61 (16 aa; gap)    22-39 (18 aa; gap)    44-55 (10 aa)    40-52 (13 aa)    39-55 (17 aa; gap)
loop          6 aa               2 aa                  1 aa                  7 aa             6 aa             1 aa
β-strand 4    390-393 (4 aa)     64-69 (6 aa)          41-45 (5 aa)          62-65 (4 aa)     59-62 (4 aa)     57-61 (5 aa)
loop          2 aa               2 aa                  2 aa                  10 aa            2 aa             2 aa
β-strand 5    396-398 (3 aa)     72-74 (3 aa)          48-49 (2 aa)          -                65-66 (2 aa)     64-65 (2 aa)
loop          4 aa               5 aa                  5 aa                  -                15 aa            16 aa
α-helix 6     403-408 (6 aa)     80-84 (5 aa)          55-60 (6 aa)          76-80 (5 aa)     82-83 (2 aa)     82-87 (6 aa)
loop          4 aa               4 aa                  5 aa                  5 aa             1 aa             2 aa
β-strand 7    413-418 (6 aa)     89-96 (8 aa)          66-71 (6 aa)          86-92 (7 aa)     85-87 (3 aa)     90-91 (2 aa)


Figure 3.4: Molecular surfaces of ubiquilin-1. Four orientations (x,y,z), (x−90°,y,z), (x,y−90°,z) and (x−180°,y,z) revealing corresponding faces of ubiquilin-1 represented as ribbon, molecular surface coloured based on electrostatic potential distribution at pH 7.0 (blue = positive, white = neutral, and red = negative) and molecular surface coloured by hydrophobicity on the Kyte-Doolittle scale (blue = hydrophilic, white = neutral, and orange/red = hydrophobic).

Figure 3.5: Molecular surfaces of NFATc2IP. Four orientations (x,y,z), (x−90°,y,z), (x,y−90°,z) and (x−180°,y,z) revealing corresponding faces of NFATc2IP represented as ribbon, molecular surface coloured based on electrostatic potential distribution at pH 7.0 (blue = positive, white = neutral, and red = negative) and molecular surface coloured by hydrophobicity on the Kyte-Doolittle scale (blue = hydrophilic, white = neutral, and orange/red = hydrophobic).


Figure 3.6: UIM-interaction interface of ubiquilin-1 and NFATc2IP. A hydrophobic patch (orange) on ubiquilin-1 is

near the UIM-interaction interface, consisting of residues valine 47, leucine 65, valine 66, leucine 67, isoleucine 68, isoleucine 73, leucine 74, leucine 93, valine 94, and isoleucine 95. Four aliphatic residues (leucine 262, isoleucine 263, alanine 266, and isoleucine 267; pink) in the putative NFATc2 UIM peptide are closest to the hydrophobic patch.

3.3.2 Comparative analysis of ubiquilin-1, NFATc2IP & similar ubiquitin-like modifiers

The ubiquitin fold is the underlying characteristic that unifies all UBLs. However, structural and

physicochemical differences lead to the various functional pathways that UBLs are involved in.

To identify these differences, a comparative analysis of ubiquilin-1 and NFATc2IP was performed,

which was further expanded to include ubiquitin-like modifiers. Even with a core Cα RMSD of

1.234 Å (39 residues) and common secondary structure elements, the sequence identity between

ubiquilin-1 & NFATc2IP is 13% and the sequence similarity is 38%.


3.3.2.1 Similar canonical ubiquitin-like modifiers: ubiquitin & SUMO-2

The sequence identity/similarity between each ubiquitin-like domain and ubiquitin-like modifiers

was calculated. The closest canonical ubiquitin-like modifier for ubiquilin-1 is ubiquitin (35%

sequence identity & 54% sequence similarity), and the closest canonical ubiquitin-like modifier for

NFATc2IP is SUMO2 & SUMO4 (35% sequence identity & 55% sequence similarity) (Table 3.3).
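A minimal sketch of how identity, similarity and gap percentages might be computed from a pairwise alignment is given here. Two assumptions are made that the text does not confirm: percentages are taken relative to the alignment length, and two residues count as similar when they fall in the same physicochemical class (the classes below follow the colour groups used in Figures 3.1 and 3.2, with tryptophan taken as aromatic). The scoring scheme actually used for Table 3.3 is not stated, so the numbers from this sketch need not match it.

```python
# Assumed physicochemical classes; residues not listed only match by identity.
CLASSES = ["KRH", "FWY", "AVILM", "G", "P", "DE", "NSTQ"]
CLASS_OF = {aa: index for index, group in enumerate(CLASSES) for aa in group}


def identity_similarity(aligned_a, aligned_b):
    """Return percent identity, similarity and gaps for two aligned sequences.

    aligned_a, aligned_b: equal-length strings using '-' for gap positions.
    Identical residues also count as similar.
    """
    if len(aligned_a) != len(aligned_b) or len(aligned_a) == 0:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identical = similar = gaps = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            identical += 1
            similar += 1
        elif a in CLASS_OF and CLASS_OF[a] == CLASS_OF.get(b):
            similar += 1
    length = len(aligned_a)
    return {
        "identity": 100.0 * identical / length,
        "similarity": 100.0 * similar / length,
        "gaps": 100.0 * gaps / length,
    }


# Hypothetical five-column alignment for illustration.
print(identity_similarity("MVK-L", "MIRSL"))
```

A substitution-matrix threshold could replace the class lookup without changing the rest of the function.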

Table 3.3: Sequence similarity & identity between NFATc2IP, ubiquilin-1, ubiquitin and SUMO1/2/3/4.

                   NFATc2IP    ubiquilin-1   ubiquitin   SUMO1       SUMO2       SUMO3       SUMO4
Identity
  NFATc2IP_2nd     -           13% (9)       11% (8)     29% (21)    35% (28)    34% (27)    35% (28)
  ubiquilin-1      13% (9)     -             35% (26)    19% (15)    15% (11)    15% (11)    12% (9)
Similarity
  NFATc2IP_2nd     -           38% (27)      41% (29)    54% (40)    55% (44)    53% (43)    55% (44)
  ubiquilin-1      38% (27)    -             54% (40)    42% (33)    41% (30)    41% (30)    35% (26)
Gaps
  NFATc2IP_2nd     -           2% (2)        1% (1)      1% (1)      1% (1)      1% (1)      1% (1)
  ubiquilin-1      2% (2)      -             1% (1)      1% (1)      1% (1)      1% (1)      1% (1)

3.3.2.2 Structural comparison between ubiquilin-1 & NFATc2IP

Ubiquilin-1 and NFATc2IP share 8 identical residues, 5 within secondary structure elements and

3 within loop regions. All three of the identical residues in loop regions are small & flexible, one

serine & two glycine amino acids. Most of the conserved residues are within the β-sheet; however,

conserved surface-exposed residues are scattered throughout the molecular surface of the

proteins (Figure 3.7). This may mean that residue conservation between Ubiquilin-1 and

NFATc2IP is related to the common fold and not shared binding partners.


Figure 3.7: Similarities between ubiquilin-1 and NFATc2IP. Ubiquilin-1 & NFATc2IP share 8 identical residues (5

within secondary structure elements) and 27 similar residues (12 within secondary structure elements). Molecular surface diagrams highlight all of the conserved (dark) & similar residues (light) within secondary structure (blue) or loops (green).

Ubiquilin-1    Property                  NFATc2IP

Within secondary structure elements
28-V           Aliphatic                 30-V
30-V           Aliphatic                 32-G
38-E           Acidic                    40-E
41-V           Aliphatic                 43-L
47-V           Aliphatic                 49-L
49-Q           Polar/Uncharged           51-T
58-F           Non-Polar/Uncharged       60-M
69-F           Aromatic                  71-F
74-L           Aliphatic                 76-L
84-G           Aliphatic/Small           86-G
91-V           Aliphatic                 93-I
93-L           Aliphatic                 95-V

Outside secondary structure elements
45-S           Polar/Uncharged           47-S
71-G           Aliphatic                 73-G
88-G           Aliphatic                 90-G


3.3.2.3 Structural comparison between ubiquilin-1 & ubiquitin

Ubiquitin and ubiquilin-1 share 26 conserved residues, and the C-terminal β-strand is almost

entirely conserved. Many residues are also conserved throughout the β-sheet and major α-helix.

Conserved residues exist on the major α-helix turns that face the core of the fold. Conserved

surface-exposed residues are also visible on all faces of the protein, and a prominent patch of

conserved residues lies within the UIM binding interface of ubiquitin. The presence of the region

of conserved residues could result in a common binding partner between ubiquitin and ubiquilin-

1. Analysis of protein-protein interaction databases revealed that 205 proteins interact with both

ubiquitin & at least one member of the ubiquilin family, while 2407 unique proteins have been

observed to interact with ubiquitin, and 1512 unique proteins have been observed to interact with at least one

member of the ubiquilin family. At least one putative UIM has been observed in 106 of the 205

proteins known to interact with both ubiquitin and a member of the ubiquilin family (Appendix III).

Conserved residues outside secondary structure regions are found mostly at both the N-terminus

and C-terminus of the minor α-helix (Figure 3.8).


Figure 3.8: Similarities between ubiquilin-1 and ubiquitin. Ubiquilin-1 & ubiquitin share 26 identical residues (20

within secondary structure elements) and 40 similar residues (29 within secondary structure elements). Molecular surface diagrams highlight all of the conserved (dark) & similar residues (light) within secondary structure (blue) or loops (green).

Ubiquilin-1    Property                  Ubiquitin

Within secondary structure elements
26-M           Non-Polar/Uncharged       1-M
28-V           Aliphatic                 3-I
30-V           Aliphatic                 5-V
31-K           + Charged                 6-K
32-T           Polar/Uncharged           7-T
41-V           Aliphatic                 17-V
46-S           Polar/Uncharged           22-T
47-V           Aliphatic                 23-I
49-Q           Polar/Uncharged           25-N
51-K           + Charged                 27-K
54-I           Aliphatic                 30-I
55-S           Polar/Uncharged           31-Q
57-R           + Charged                 33-K
63-D           - Charged                 39-D
64-Q           Polar/Uncharged           40-Q
67-L           Aliphatic                 43-L
68-I           Aliphatic                 44-I
69-F           Non-Polar/Uncharged       45-F
72-K           + Charged                 48-K
74-L           Aliphatic                 50-L
80-L           Aliphatic                 56-L
81-S           Polar/Uncharged           57-S
90-T           Polar/Uncharged           66-T
91-V           Aliphatic                 67-L
92-H           Aromatic                  68-H
93-L           Aliphatic                 69-L
94-V           Aliphatic                 70-V
95-I           Aliphatic                 71-L
96-K           + Charged                 72-R

Outside secondary structure elements
34-K           + Charged                 11-K
70-A           Aliphatic                 46-A
71-G           Aliphatic                 47-G
76-D           - Charged                 52-D
79-T           Polar/Uncharged           55-T
85-I           Aliphatic                 61-I


3.3.2.4 Structural comparison between NFATc2IP & SUMO2

Structure conservation between NFATc2IP and SUMO2 is mostly within the β-sheet and in loop

regions, with some conserved residues within the major α-helix. The conserved loop residues

are at the C-terminus of the major α-helix, and C-terminus of a couple of the β-strands. Some

molecular-surface exposed conserved residues from loop regions are visible as patches, with

multiple conserved residues bordering the UIM binding interface and limited conservation within

the SIM binding interface (Figure 3.9). This may mean that there is not a commonly shared UIM

or SIM between NFATc2IP and SUMO2. However, conservation near the binding interfaces could

mean partial conservation between NFATc2IP binding partners and SUMO2 binding partners.


3.3.2.5 Structural differences between NFATc2IP_2nd & SUMO2

Figure 3.9: Similarities between NFATc2IP and SUMO2. NFATc2IP and SUMO2 share 28 identical residues (14

within secondary structure elements) and 44 similar residues (16 within secondary structure elements). Molecular surface diagrams highlight all of the conserved (dark) & similar residues (light) within secondary structure (blue) or

loops (green). To assist with showing the location of the UIM-binding interface & SIM binding interface, both a β-strand

from a SIM (purple) and an α-helix from a UIM (yellow) are superimposed on the structure.

NFATc2IP       Property                  SUMO2

Within secondary structure elements
26-L           Aliphatic                 18-I
27-Q           Polar/Uncharged           19-N
28-L           Aliphatic                 20-L
29-R           + Charged                 21-K
30-V           Aliphatic                 22-V
32-G           Aliphatic                 24-G
37-Q           Polar/Uncharged           28-S
39-L           Aliphatic                 30-V
43-L           Aliphatic                 34-I
49-L           Aliphatic                 40-L
52-L           Aliphatic                 43-L
53-M           Non-Polar/Uncharged       44-M
56-Y           Aromatic                  47-Y
58-E           - Charged                 49-E
69-F           Non-Polar/Uncharged       60-F
71-F           Non-Polar/Uncharged       62-F
74-T           Polar/Uncharged           65-Q
76-L           Aliphatic                 67-I
91-D           - Charged                 82-D
93-I           Aliphatic                 84-I
94-E           - Charged                 85-D
95-V           Aliphatic                 86-V
96-W           Non-Polar/Uncharged       87-F

Outside secondary structure elements
45-R           + Charged                 36-R
48-P           Polar/Uncharged           39-P
61-G           Aliphatic                 52-G
62-L           Aliphatic                 53-L
63-S           Polar/Uncharged           54-S
65-R           + Charged                 56-R
72-D           - Charged                 63-D
73-G           Aliphatic                 64-G
82-P           Polar/Uncharged           73-P
83-A           Aliphatic                 74-A
85-L           Aliphatic                 76-L
87-M           Non-Polar/Uncharged       78-M
88-E           - Charged                 79-E


3.3.3 From Structure to Function: Exploring Protein-Protein Interactions involving ubiquitin-like domains

As described in Chapter One, ubiquitin is known to be involved in many weak and transient

interactions. One of these interactions involves a UIM, which is an α-helix found in hundreds of

known ubiquitin binding partners. The UIM is characterized by a conserved motif (E/D-E/D-E/D-

Φ-x-x-A-x-x-x-S-x-x-E/D; where Φ is a hydrophobic residue) (Fisher et al., 2003).

3.3.3.1 The Ubiquitin-Interacting Motif interaction interface

A few UIM:ubiquitin complexes have also been structurally characterized (Table 3.4). Two of the

UIM:ubiquitin complexes involve a UIM within the 26S proteasome non-ATPase regulatory

subunit 4 (Hofmann & Falquet, 2001). The 26S proteasome non-ATPase regulatory subunit 4

UIM does not fit the canonical UIM motif even though the binding mode and interaction features

remain the same. The key differences between the canonical UIM motif and the 26S proteasome

non-ATPase regulatory subunit 4 UIM include a glutamine neighbouring the conserved

hydrophobic residue within the acidic N-terminal region of the motif, and there are 4 amino acids

instead of the canonical 2-residue gap between the conserved serine and the acidic C-terminal

region.

Table 3.4: UIM:ubiquitin complexes deposited in the PDB, along with UIM sequence.

PDB_ID    Year   UIM-containing protein                                     Ubiquitin or ubiquitin-like domain                                   UIM sequence
1UEL:B    2003   26S proteasome non-ATPase regulatory subunit 4 (P55036)    UV excision repair protein RAD23 homolog B (H. sapiens) (P54727)     …EEEQIAYAMQMSLQGAE… (doesn’t fit canonical motif)
1P9D:A    2003   26S proteasome non-ATPase regulatory subunit 4 (P55036)    UV excision repair protein RAD23 homolog A (H. sapiens) (P54725)     …EEEQIAYAMQMSLQGAE… (doesn’t fit canonical motif)
1Q0W:A    2003   Vacuolar protein sorting-associated protein VPS27 (P40343) ubiquitin (S. cerevisiae) (P0CG63)                                   …EDEEELIRKAIELSLKE…
2D3G:P    2005   HGS HRS (O14964)                                           ubiquitin (B. taurus) (P0CH28)                                       …EEEELQLALALSQSEAEE…


Analysis of the UIM:ubiquitin complexes reveals structural conservation of acidic residues at the

termini of the UIM-containing α-helix, as well as the general positioning of the conserved serine

residue and hydrophobic residues along the ubiquitin-facing surface of the UIM between the N-

terminal acidic residues and the conserved serine (Figure 3.10 & Figure 3.11).

Figure 3.10: UIM α-helices from PSMD4 (PDBID: 1UEL / 1P9D), VPS27 (PDBID: 1Q0W) and HGS/HRS (PDBID: 2D3G). The UIMs from PSMD4, VPS27 and HGS were structurally characterized in complex with ubiquitin; acidic residues are red, basic residues are blue, hydrophobic residues are orange, and serine is green. Three conserved regions are highlighted: two acidic termini are highlighted with the blue box and the conserved serine highlighted by the green box.


Figure 3.11: Ubiquitin:PSMD4(UIM) complex. Ubiquitin residues within 3 Å (isoleucine 68, isoleucine 73, alanine 70, glycine 71, histidine 92) and 4 Å (valine 66, isoleucine 68, alanine 70, glycine 71, lysine 72, isoleucine 73, histidine 92, valine 94) of the UIM displayed as sticks.

Analysis of ubiquitin residues within proximity of the UIM, and corresponding residues within a

superimposed ubiquilin-1 molecular structure, reveals amino acid conservation; 6 out of 6 of

ubiquilin-1 residues at 3 Å (isoleucine 68, isoleucine 73, alanine 70, glycine 71, histidine 92), and 14 out of 16

of ubiquilin-1 residues at 4 Å (valine 66, isoleucine 68, alanine 70, glycine 71, lysine 72, isoleucine 73, histidine 92,

valine 94). All of these conserved residues are also localized to interact with the hydrophobic

residues of the UIM (Figure 3.12).


Figure 3.12: UBL residues within UIM-interaction interface. This chart displays amino acids from ubiquitin, UBTD2

and ubiquilin-1 that are within 2 Å, 3 Å, and 4 Å of each amino acid within the α-helix from the PSMD4 UIM. Acidic amino acids are red, hydrophobic amino acids are green, and serine is blue. Black arrows identify amino acids that are

conserved between ubiquitin and ubiquilin-1.

3.3.3.2 Putative UIM Interaction Interface: Conserved Amino Acids

Within the ubiquitin-like domain of ubiquilin family members, there is residue conservation

between family members within two stretches of highly-conserved residues (10 aa in length & 14

aa in length) in both C-terminal β-strands (Figure 3.13).

Figure 3.13: Multiple sequence alignment of UBLs from ubiquilin family members. Two conserved regions

correspond to amino acids within 4Å of UIM atoms.


3.3.3.3 Putative UIM Interaction Interface: Similar Electrostatic Potential Distribution

Clustering of ubiquitin-like domain molecular structures based on electrostatic potential

distribution at pH 7 within 4 Å of each UIM atom revealed a strong similarity between ubiquitin and

members of the ubiquilin family (Figure 3.14). For this reason, we looked at potential UIMs that

are within proteins known to interact with both ubiquitin and at least one member of the ubiquilin

family.

Figure 3.14: Similarity tree based on electrostatic potential within 4 Å of UIM-binding interface. A UIM α-helix is superimposed in the UIM binding interface to show the location & orientation of the UIM. Panels: ubiquitin, ubiquilin-1, ubiquilin-2 and ubiquilin-3 at pH 7.


3.3.3.4 Surveying Known UIM-Binding Partners

There are currently 78 human proteins with annotated UIMs, of which 16 are known to interact

with ubiquitin (Table 3.5). There are also 5 human proteins with annotated UIMs that are known

to interact with at least one member of the ubiquilin family, and two of these proteins interact with

multiple ubiquilin proteins (Letunic et al., 2014; Turner et al., 2010) (Table 3.6). All 5 of the

proteins have been observed to also interact with ubiquitin. However, this could be an

underrepresented number, as demonstrated by known α-helices with minor variations in the UIM

sequence that have been shown to interact with ubiquitin.

Table 3.5: Human proteins that contain at least one canonical UIM motif and observed to interact with ubiquitin, along

with the number of supporting publications and supporting structural complexes that have been deposited in the PDB.

UIM      Ubiquitin      Interaction ID   Supporting publications   Supporting structure
PSMD4    UBC            700227           13                        1UEL, 1P9D
HGS      UBC            1024774          9                         -
HGS      UBC (Bovin)    728136           3                         2D3G
DNJB2    UBC            1007317          1                         1Q0W
DNJB2    UBC (Bovin)    877312           1                         -
EPN1     UBC            962133           3                         -
EPN2     UBC            891993           1                         -
EPS15    UBC            1010404          6                         -
AN13A    UBC            910747           1                         -
STAM1    UBC            1008921          1                         -
STAM1    UBC            1129713          5                         -
STAM2    UBC            1061783          1                         -
AKIB1    UBC (Bovin)    1078418          1                         -

Table 3.6: Human proteins that contain at least one canonical UIM motif and observed to interact with members of the

ubiquilin family (Turner et al., 2010).

UIM      Ubiquilin   Interaction ID   Supporting publications
PSMD4    UBQLN4      670139           1
PSMD4    UBQLN2      693598           3
PSMD4    UBQLN1      1155859          3
DNJB2    UBQLN1      772775           1
HGS      UBQLN1      840585           2
HGS      UBQLN4      898735           1
STAM2    UBQLN4      883239           1
EPS15    UBQLN1      1011809          1

There are 368 human proteins annotated to interact with members of the ubiquilin family, and 827

human proteins known to interact with ubiquitin. There are 202 proteins that have been shown to

interact with ubiquitin & at least one member of the ubiquilin family, of which 57 are human


proteins. At least one putative UIM has been observed in 61 of the 202 proteins (17 of the 57

human proteins) known to interact with both ubiquitin & at least one member of the ubiquilin family

(Appendix III).

Table 3.7: 17 human proteins that interact with both human ubiquitin and a member of the ubiquilin family, and that

also contain at least one UIM motif.

ANCHR    EPS15    PIN1     RD23A    STAM1    UBP34
DNJB2    HD       PSMD3    RNF11    STAM2    USP9X
EF1A1    HGS      PSMD4    SAE2     UBE3A

Analysis of bound UIM domains revealed variability within the canonical UIM motif. These include

a variable-length stretch of residues between the N-terminal acidic residues and the conserved

alanine (e.g. PSMD4 and DNJB2 have a stretch of 4 residues, while EPN1 has 3 residues that

separate the acidic residues from the alanine), and a variable-length stretch of residues that separates

the conserved serine and the C-terminal acidic residues (e.g. PSMD4 has 4 residues, while DNJB2

and EPN1 have 2 residues that separate the serine from the C-terminal acidic residues). PIN1

had a few additional differences: hydrophobic residues within the N-terminal acidic residue

stretch, a glycine instead of a conserved alanine near the N-terminal acidic residue stretch, a

longer stretch of residues between the conserved glycine/alanine and the conserved serine, and

a single glycine to separate the conserved serine and C-terminal acidic residues (Figure 3.15).

PSMD4 EEEQIAYAMQMSLQGAE

DNJB2 EDEEELIRKAIELSLKE

EPN1 EEEELQLALALSQSEAEE

PIN1 TRTKEEALELINGYIQKIKSGEEDFESLAS

Figure 3.15: Sequence alignment of UIMs within PSMD4, DNJB2, EPN1 and PIN1. Sequence alignment of three

structurally characterized UIMs (PSMD4, DNJB2 and EPN1), as well as the putative UIM in PIN1. Acidic residues are red, basic residues are blue, hydrophobic residues are orange, and serine is green.

To take into account this variability, as well as variability introduced by the structural variance

between UBLs, 4 alternate UIM motifs were used when searching for putative UIMs in known

binding partners of both ubiquitin and members of the ubiquilin family (Table 3.8). Six of these

proteins have no molecular structure deposited in the PDB, while the remaining 11 proteins have

at least one structure within the PDB. PIN1 stands out because its molecular structure has been

deposited into the PDB 45 times (Table 3.8).


Table 3.8: UIM motif and 4 variations of the UIM motif were used to identify 17 human proteins that interact with both

human ubiquitin and a member of the ubiquilin family.

[ED](3)-x(3)-[AG]-x(3)-S-x(2)-[ED]

P25686 DNJB2_HUMAN 252 – 265 DEDlqlAmaySlsE 2 PDB structures

O14964 HGS_HUMAN 260 – 273 EEElqlAlalSqsE 4 PDB structures

P55036 PSMD4_HUMAN 232 – 245 EEEarrAaaaSaaE 5 PDB structures

Q92783 STAM1_HUMAN 173 – 186 EEDlakAielSlkE 3 PDB structures

O75886 STAM2_HUMAN 167 – 180 DEDiakAielSlqE 3 PDB structures

[ED]-x(3)-[AG]-x(3)-S-x(2)-[ED]

Q96K21 ANCHR_HUMAN 208 – 219 DerqGsipStqE 0 PDB structures

P25686 DNJB2_HUMAN 211 – 222 254 – 265

DlalGlelSrrE

DlqlAmaySlsE 1 PDB structures

P42566 EPS15_HUMAN 881 – 892 DlelAialSksE 2 PDB structures

P42858 HD_HUMAN 1261 – 1272 EkfgGflrSalD 0 PDB structures

O14964 HGS_HUMAN 262 – 273 ElqlAlalSqsE 2 PDB structures

P55036 PSMD4_HUMAN 215 – 226 ElalAlrvSmeE; 234 – 245 EarrAaaaSaaE 5 PDB structures

Q9Y3C5 RNF11_HUMAN 141 – 152 EpvdAallSsyE 0 PDB structures

Q92783 STAM1_HUMAN 175 – 186 DlakAielSlkE 3 PDB structures

O75886 STAM2_HUMAN 169 – 180 DiakAielSlqE 3 PDB structures

[ED]-x(3)-[AG]-x(4)-S-x(2)-[ED]

Q9UBT2 SAE2_HUMAN 483 – 495 EdgkGtiliSseE 4 PDB structures

Q05086 UBE3A_HUMAN 98 – 110 EnskGapnnScsE 0 PDB structures

[ED]-x(3)-[AG]-x(5)-S-x(2)-[ED]

P25686 DNJB2_HUMAN 71 – 84 EgltGtgtgpSraE; 254 – 267 DlqlAmayslSemE 2 PDB structures

P68104 EF1A1_HUMAN 319 – 332 DvrrGnvagdSknD 1 PDB structures

P42858 HD_HUMAN 409 – 422 EesgGrsrsgSivE 6 PDB structures

O14964 HGS_HUMAN 262 – 275 ElqlAlalsqSeaE 6 PDB structures

P55036 PSMD4_HUMAN 213 – 226 DpelAlalrvSmeE 8 PDB structures

Q92783 STAM1_HUMAN 173 – 186 EedlAkaielSlkE 3 PDB structures

O75886 STAM2_HUMAN 167 – 180 DediAkaielSlqE 3 PDB structures

Q93008 USP9X_HUMAN 1682 – 1695 EqhdAleffnSlvD 0 PDB structures

[ED]-x(3)-[AG]-x(6)-S-x(2)-[ED]

P68104 EF1A1_HUMAN 403 – 417 DmvpGkpmcveSfsD 7 PDB structures

P42566 EPS15_HUMAN 576 – 590 EvttAvtekvcSelD 0 PDB structures

Q13526 PIN1_HUMAN 87 – 101 ElinGyiqkikSgeE 45 PDB structures

O43242 PSMD3_HUMAN 52 – 66 DgktAaaaaehSqrE 0 PDB structures

P55036 PSMD4_HUMAN 255 – 269 DsddAllkmtiSqqE 5 PDB structures

P54725 RD23A_HUMAN 150 – 164 EedaAstlvtgSeyE 3 PDB structures

Q9UBT2 SAE2_HUMAN 218 – 232 EpteAeararaSneD 5 PDB structures

Q70CQ2 UBP34_HUMAN 786 – 800 EknmAdfdgeeSgcE; 1672 – 1686 EscsGlyklslSglD 3 PDB structures


3.3.3.5 PIN1 – Peptidyl-Prolyl cis/trans Isomerase

Peptidyl-prolyl cis/trans isomerase (PIN1) regulates protein function by inducing a conformational

change of peptidyl-bonds in polypeptide chains after phosphorylation, and plays a significant role

in cell cycle regulation and cancer development (Lippens et al., 2007). PIN1 also regulates the

function and processing of Tau and APP, and is important for protecting against age-dependent

neurodegeneration. PIN1 is also the only gene known so far that, when deleted in mice, can

cause both tau and Aβ-related pathologies in an age-dependent manner that resembles human

Alzheimer’s disease (Liou et al., 2003).

PIN1 has been associated with ubiquitin through its ubiquitylation, and has been experimentally

observed to interact with ubiquilin-4 in a yeast two-hybrid assay (Lim et al., 2006).

However, the mode of that interaction remains unknown.

3.3.3.6 Identifying a putative UIM in PIN1

PIN1 consists of 14 secondary structure elements (10 β-strands & 4 α-helices). The putative UIM
is within the solvent-exposed α-helix1.

EEALELINGYIQKIKSGEED

HHHHHHHHHHHHHHHHTSS-

Figure 3.16: Putative human PIN1 UIM. Human PIN1 protein with the putative UIM highlighted, along with corresponding UIM amino acid sequence highlighting conserved acidic residues (red), conserved glycine (green), and a conserved serine (blue).


The putative UIM identified within PIN1 contains non-canonical features, including hydrophobic
residues within the N-terminal acidic residue stretch (i.e. …EEALELING…), a glycine instead of a
conserved alanine near the N-terminal acidic residue stretch (i.e. …EEALELING…), and a longer
stretch of residues between the conserved glycine/alanine and the conserved serine (i.e. 6
residues …GYIQKIKS… instead of 3 residues …GMQMS…, which corresponds to an extra turn in
the α-helix).

PIN1 has been structurally characterized by X-ray crystallography and NMR with 45 structures

deposited in the PDB, and the putative UIM identified within PIN1 corresponds to an α-helical

region of the protein (Figure 3.16). For this reason, one of the full length PIN1 constructs used

for structure determination by NMR was obtained and used for NMR titration to validate the

hypothesis that PIN1 contains a UIM that can interact with the ubiquilin-1 UIM-binding interface.

3.3.3.7 Ubiquilin-1 & PIN1 NMR Titration

NMR titration was performed using the ubiquitin-like domain of 15N-ubiquilin-1 (corresponding to

PDB-ID: 2KLC) and the full length PIN1 protein (corresponding to PDB-ID: 1NMV; BMRB: 5305).

2KLC was solved by NMR in TRIS buffer with NaCl, NaN3, benzamidine, ZnSO4, and DTT by our

group in 2009, and 1NMV was solved by NMR in phosphate buffer with DTT, EDTA and 50-100

mM sodium sulfate by Bayer et al. in 2003.

A series of NMR titrations was attempted at pH 6.5 and pH 7.0 in buffer optimized for a
UIM:ubiquitin interaction (50 mM sodium phosphate, 1 mM DTT, 10% D2O / 90% H2O), based on
the previously deposited UIM:ubiquitin complex [PDBID: 2RR9], but no chemical shift change was
visible. An additional NMR titration was performed at pH 8.0 in buffer optimized for ubiquilin-1
(10 mM TRIS, 300 mM sodium chloride, 0.01% sodium azide, 1 x inhibitor cocktail [Roche], 1 mM
benzamidine, 10 µM ZnSO4, 10 mM DTT, 10% D2O / 90% H2O), the same buffer
used to determine the ubiquilin-1 structure [PDBID: 2KLC], and 9 chemical shift peak changes were observed
at a 1:20 ubiquilin-1:PIN1 molar ratio. These peak shifts included D63, K72, I73, L74,
Q82, H92, V94 and K96 (Figure 3.17). These residues correspond to amino acids
predicted to be within the UIM binding site of ubiquilin-1 (Figure 3.18).

Figure 3.17: Ubiquilin-1:PIN1 NMR titration. HSQC (64 scans) from NMR titration from 1:0 ubiquilin-1:PIN1 (blue) to 1:20 ubiquilin-1:PIN1 (red); 150 µM 15N-ubiquilin-1 + 3 mM PIN1 at pH 8.0 (298K) in 40 µL sample volume with 50

mM sodium phosphate and 1 mM DTT.
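The concentrations quoted in the caption follow directly from the molar ratio, and the arithmetic can be checked with a short sketch. The function names below are illustrative only; the input values (150 µM labelled ubiquilin-1, a 1:20 ratio and a 40 µL sample) are those given in the caption of Figure 3.17.

```python
def titrant_concentration_uM(labelled_uM: float, molar_ratio: float) -> float:
    """Concentration of unlabelled partner needed for a 1:molar_ratio sample."""
    return labelled_uM * molar_ratio


def amount_nmol(concentration_uM: float, volume_uL: float) -> float:
    """Amount of protein in the sample: uM x uL gives pmol, divided by 1000 gives nmol."""
    return concentration_uM * volume_uL / 1000.0


pin1_uM = titrant_concentration_uM(150.0, 20.0)  # 1:20 ubiquilin-1:PIN1
ubiquilin_nmol = amount_nmol(150.0, 40.0)        # labelled protein in the sample
pin1_nmol = amount_nmol(pin1_uM, 40.0)           # unlabelled PIN1 in the sample
```

A 1:20 ratio at 150 µM ubiquilin-1 therefore requires 3000 µM (3 mM) PIN1, consistent with the caption, which corresponds to 6 nmol of ubiquilin-1 and 120 nmol of PIN1 in 40 µL.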


3.3.3.8 Analysis of the ubiquilin-1 & PIN1 interface

Analysis of the ubiquilin-1:PIN1 interface reveals an extra α-helical turn within the UIM resulting

from an additional three residues between the conserved glycine and conserved serine. The

UIM-binding region has several structural features: β-strands 3, 4 & 5 curve around the UIM, a

phenylalanine & histidine are near the conserved serine of the UIM, and an isoleucine and valine

are near the leucine 262 – isoleucine 263 within the acidic N-terminal region of the UIM. The

molecular surface of the UIM-binding interface is positively charged, which could mediate an

interaction with the acidic residues at both termini of the UIM. The 9 residues corresponding to

the chemical shift changes in the NMR titration (aspartic acid 63, lysine 72, isoleucine 73, leucine

74, glutamine 82, histidine 92, valine 94, and lysine 96) are all within the UIM-interaction interface,

and all of the residues were predicted to interact with the UIM based on proximity to the putative

UIM-binding site, and comparative analysis of the UIM:ubiquitin complexes deposited in the PDB.

Analysis of the UBLs of ubiquilin family members reveals that 7 of the 9 residues are conserved
throughout the family. The two residues that are not conserved are isoleucine → glutamine and
valine → arginine. Both of these residues interact with the same isoleucine on the UIM, which is

next to the N-terminal acidic region of the UIM. This is the same region where hydrophobic

residues are inserted in the acidic region of the PIN1 UIM. Additional experiments are necessary

to validate and further characterize the ubiquilin-1:PIN1 interaction (Chapter Five).

Figure 3.18: Putative ubiquilin-1:PIN1 interaction. Ubiquilin-1 modelled with PIN1 (blue α-helix) highlighting 9 stick

residues corresponding to chemical shift changes in the NMR titration.


3.3.4 Binding-Partner Driven - Structural analysis of the SUMO-Interacting Motif binding interface

For NFATc2IP, a different approach was taken for identifying a potential binding partner. Instead

of searching for SIMs in known binding partners of both ubiquitin and ubiquilin, the sequence and

secondary structure of all known binding partners of NFATc2IP were analyzed to identify a

possible mode of interaction.

3.3.4.1 NFATc2IP Binding Partners

Human NFATc2 has been observed to interact with 28 human proteins, in addition to HIV tat and

HIV Vpr (Turner et al., 2010). Of the NFATc2-interacting proteins, only NFATc2IP contains two

UBLs. NFATc2IP has been observed to interact with 11 human proteins; B-ATF-3, NFATc2,

RNF4, SREK1, SUMO2, TRAF1/EBI6, TRAF2/TRAP3, TRAF3, TRAF5/RNF84, TRAF6/RNF85,

and ubiquitin (Turner et al., 2010). NFATc2IP contains an arginine-rich N-terminus and two UBLs

at its C-terminus. NFATc2IP is a homologue of yeast DNA repair factor RAD60, sharing 13%

sequence identity along the full length of the protein and 22% sequence identity between the

second ubiquitin-like domain of NFATc2IP and the lone ubiquitin-like domain of RAD60.

Our analyses revealed that SUMO2 and SUMO4 are the ubiquitin-like modifiers that are most

similar to NFATc2IP; 35% sequence identity and 55% sequence similarity (Table 3.3). Based on

the similarity between NFATc2IP and members of the SUMO family, we performed sequence

analysis of the known binding partners of NFATc2IP to determine whether there were β-strands

similar to the canonical SIM motif.


3.3.5 SUMO-Interacting Motif (SIM)

The SUMO-interacting motif (SIM) was discovered as a protein-protein interaction related to

sumoylation, and the defining characteristics of the SIM have changed over time (Minty 2000,

Song 2004, Song 2005, Hannich 2005, Hecker 2006, Kerscher 2007, Perry 2008, Zhu 2008,

Makhnevych 2009). Initially, a SXS triplet motif was identified in 2000 as being important for

SUMO interaction, followed by a second hydrophobic core motif of V/I-X-V/I-V/I in 2004, and

further experimentation revealed that flanking acidic residues also play a role in SUMO:SIM

interactions (Minty 2000, Song 2004, Hannich 2005, Hecker 2006).

The functional role of the SIM has yet to be fully elucidated. However, it has been shown to be

involved in recruiting SUMO-modified Ubc9 to facilitate sumoylation of the SIM-containing protein.

Structurally, the SIM interaction consists of a β-sheet extension, and is a stronger interaction when

compared to other binding modes involving the ubiquitin-fold (Chapter One).

3.3.5.1 Identifying putative SIMs in NFATc2

Full length human NFATc2 consists of 18 secondary structure elements (3 α-helices & 15
β-strands). Our analysis of its secondary structure elements revealed that two of the β-strands have
characteristics similar to that of the SIM. These include amino acids similar to the hydrophobic
V/I-X-V/I-V/I region, and acidic residues nearby. Analysis of the molecular structure of NFATc2
deposited in the PDB reveals that both of the putative SIM-containing β-strands are
solvent-exposed (Figure 3.19).


Figure 3.19: NFATc2 SUMO Interacting Motifs. Human NFATc2 protein with two putative SIMs highlighted, along

with corresponding SIM amino acid sequences highlighting secondary structure elements and underlined residues associated with the SIM sequence motif.

Analysis of molecular structures of the SIM:SUMO interaction deposited in the PDB has revealed

that there is variability among residues within the V/I-X-V/I-V/I motif, as well as other characteristic

amino acids associated with SIMs (Figure 3.20). This demonstrates that sequence alone cannot

act as a means to identify putative SIMs. However, the propensity for β-strand formation is shared

between SIMs.

2ASQ (PIASx) – kvdVIDLtiessd

---EEE--TTSS-

2KQS (DAXX) - peeIIVLsdsd

-------------

2RPQ (ATP7IP)- ssgVIDLtmddee

----EE--SS---

2MP2 (RNF4) - gdeIVdLtcesle

- S------S-----

Figure 3.20: Diversity of SIM motifs. Sequence alignment of experimentally characterized SIM:SUMO structural

complexes reveals variability within the V/I-X-V/I-V/I motif.
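The limitation of a purely sequence-based search can be illustrated with a short sketch that applies a strict reading of the V/I-X-V/I-V/I core to the four peptides listed in Figure 3.20. The regular expression and the helper name `strict_core_hits` are assumptions made for illustration; the thesis does not state that the motif was applied in this form.

```python
import re

# Strict reading of the V/I-X-V/I-V/I hydrophobic core (X = any residue).
SIM_CORE = re.compile(r"(?=([VI].[VI][VI]))")

# Peptide sequences as listed in Figure 3.20, keyed by PDB ID.
SIM_PEPTIDES = {
    "2ASQ": "kvdVIDLtiessd",  # PIASx
    "2KQS": "peeIIVLsdsd",    # DAXX
    "2RPQ": "ssgVIDLtmddee",  # ATP7IP
    "2MP2": "gdeIVdLtcesle",  # RNF4
}


def strict_core_hits(peptide: str):
    """Return every 4-residue window that satisfies the strict core motif."""
    return [m.group(1) for m in SIM_CORE.finditer(peptide.upper())]


results = {pdb_id: strict_core_hits(seq) for pdb_id, seq in SIM_PEPTIDES.items()}
```

Under this strict reading only the PIASx peptide produces a hit, and that hit (VDVI) is offset from the capitalised residues in the figure. The other three experimentally characterized SIMs are missed, largely because a leucine occupies the final hydrophobic position.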

We performed an NMR titration between NFATc2IP and the putative SIM1 region of NFATc2
because an interaction between the two proteins has already been observed, and because the
putative SIM region lies within a secondary structure element, has residues similar to the SIM motif,
and is solvent exposed.

GHPVVQLHGYMENKPLGLQIFIG
--EEEEEE-----EEEEEEEEEE

SGRIVSLQTASNPIECSQRS
----EEE-------------


3.3.6 NFATc2IP:NFATc2 NMR titration

NMR titration was performed using the ubiquitin-like domain of 15N-NFATc2IP (corresponding to

PDB-ID: 2JXX) and a 15 residue peptide of the putative SIM motif within NFATc2 (corresponding

to residues S-554 to S-573 in PDB-ID: 1S9K.C).

A series of NMR titrations was attempted at pH 6.5 and pH 7.0 in buffer optimized for an
NFATc2IP:NFATc2 interaction (50 mM sodium phosphate, 1 mM DTT, 10% D2O / 90% H2O)
based on the previously deposited NFATc2 protein, but no chemical shift change was visible until
a 1:20 molar ratio was reached. The observed changes consisted of 2 major peak shifts (glutamine 37 & threonine
38) and 4 minor peak shifts (glycine 32, leucine 39, alanine 59 & tryptophan 96) (Figure 3.21).
These results correspond to amino acid residues predicted to be within the SIM binding site of
NFATc2IP (Figure 3.22 & Figure 3.23).

Figure 3.21: NFATc2IP:NFATc2 NMR titration. HSQC from NMR titration from 1:0 NFATc2IP:NFATc2 (red) to 1:20 NFATc2IP:NFATc2 (blue).


3.3.6.1 Analysis of the NFATc2IP:NFATc2 interface

Differences in electrostatic potential within the SIM-binding interface of NFATc2IP and SUMO2

are apparent when looking at the electrostatic potential distribution (Figure 3.22 & Figure 3.23).

These differences likely correspond to differences in binding partners, even though the molecular

surface conformation of the region and the secondary structure elements of the ubiquitin fold are

similar. This reveals that a gradient of complementary binding partners involved in a β-sheet

extension could exist for the SIM-interaction interface, facilitating a similar binding mode but

different physicochemical attributes among residues of the binding partner. However, because

of the nature of such a relationship, sequence motif alone cannot be used to identify all putative

binding partners, and instead a secondary structure element analysis and query of solvent

exposed regions are also necessary.

Figure 3.22: Electrostatic potential of NFATc2IP & SUMO2. NFATc2IP (PDB_ID: 2JXX; left) & SUMO2 (PDB_ID:

2AWT; right) with electrostatic potential distribution mapped onto molecular surfaces, and a SIM β-strand superimposed within the SIM-interacting interface.


Figure 3.23: Electrostatic potential diversity between similar UBLs. The ubiquitin-fold consists of a β-sheet

intercalated by an α-helical core. Electrostatic potential mapping reveals a different charge distribution at the SIM-binding interface of NFATc2IP-2 despite domain sequence similarity. There are 2 SIM-like regions of NFATc2 that may interact with NFATc2IP despite lacking a negative charge typical of SIM motifs.


3.4 Conclusion

The molecular structures of NFATc2IP & ubiquilin-1 were determined by NMR spectroscopy,

putative binding modes (SIM & UIM) were identified through structural analysis of similar
ubiquitin-like modifiers, and interactions with binding partners (NFATc2 & PIN1) were validated through
NMR titration. NFATc2IP was predicted to interact with its binding partner NFATc2 in a SIM-like
β-strand extension interaction. Ubiquilin-1 was predicted to interact with its binding partner PIN1
in a UIM-like α-helix-mediated interaction. These results suggest that a structure-based

approach can be useful for identifying potential interaction partners and mechanisms in the

ubiquitin fold superfamily.


Chapter 4

Exploring UBLs & UBL-Interaction Motifs: Computational & Experimental analysis of ubiquilin, NFATc2IP, UIMs and

SIMs.

Contributions: D. Yim & Z. Zhang developed the UBL database and web service. I designed the

UBL database and web service, identified data sources, and performed analyses of UBL data

under the guidance of CH. Arrowsmith.


4.1 Introduction

The aim of this research project was to obtain near-complete structural coverage of human UBLs, without

experimentally determining each of the 398 unique UBLs. This was partly facilitated by grouping

the UBLs into 100 modelling families that represent homologous protein domains with similar

structures (Chapter Two). NMR spectroscopy was used to screen and prioritize UBLs for

structure determination, and 17 human UBLs were structurally characterized using X-ray

Crystallography and NMR spectroscopy. The RCSB PDB now has 32% structural coverage of

human UBLs based on structures experimentally determined by X-ray Crystallography and NMR

spectroscopy, and 82% when taking into account homology modelling. This chapter explores

similarities between UBLs, focusing on each of the 17 human UBLs that were structurally

characterized for this project and related UBLs. This chapter also discusses the 74 remaining

human UBLs that lack structural information, and provides hypotheses for further study.

4.1.1 Database & comparative analysis

Information about each ubiquitin-like domain was compiled from multiple databases to generate

a repository that would allow for detailed analysis of relationships between sequence, structure

and function of each protein domain. A detailed analysis focused on UBLs that were structurally

determined as part of the project, and members of associated modeling families. A relational

database facilitated identification of trends and hypothesis generation.


4.1.1.1 Similarities & differences between model family members

Molecular features from UIM-binding & SIM-binding interfaces were identified and compared

within and across modelling families. Full domain and binding-interface localized electrostatic

potential distribution clustering was also performed to identify UBLs that shared similar

physicochemical characteristics.

In addition to comparing molecular features of protein-protein interaction interfaces on the

ubiquitin-like domain, our analyses extended to grouping together UBLs that shared common

putative protein-protein interaction partners mentioned in literature.

A variety of other features were annotated to identify other similarities and differences between

members of each model family. These data included conserved residues, sequence similarity,

functional residues (ie. lysines for poly-ubiquitin chains), phosphorylation sites, hydrophobic

patches, GO-terms, and full length protein domain structure.

4.1.1.2 Common defining features for each modelling family

Common defining features provide insight into shared attributes of members of each modelling

family. These features could also identify functional attributes, binding partners, or other

characteristics shared between UBLs. This is particularly important for the ubiquitin-like domain

superfamily, since 35.5% of human proteins containing UBLs have no known functional

annotations. Additional significance arises from 54 human UBLs associated with disease

pathways.



4.2.1 UBL Database Development

A MySQL relational database was developed to contain information about all UBLs and related

proteins. The database includes a framework consisting of PHP scripts that facilitate aggregation

of online resources (Appendix 1.1.1 & Figure 4.1), and a web-based user interface for displaying

and accessing the information.
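To illustrate how a relational layout supports this kind of cross-referencing, the sketch below builds a two-table database and joins UBLs to their structures. It is a hypothetical miniature only: the thesis used MySQL with PHP scripts, whereas this example swaps in Python's built-in `sqlite3` module so that it runs without a server, and the table and column names (`ubl`, `structure`, `ubl_group`, `method`) are invented for illustration rather than taken from Figure 4.1. The rows are a small subset of the entries in Table 4.2.

```python
import sqlite3

# In-memory database; the schema below is hypothetical, not the thesis schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ubl (
    name      TEXT PRIMARY KEY,
    ubl_group TEXT
);
CREATE TABLE structure (
    pdb_id   TEXT PRIMARY KEY,
    ubl_name TEXT REFERENCES ubl(name),
    method   TEXT
);
""")

# A few rows copied from Table 4.2 (protein, UBL group, PDB ID, method).
conn.executemany("INSERT INTO ubl VALUES (?, ?)", [
    ("SF3A1", "V"), ("ISG15", "I"), ("Ubiquilin 1", "I"), ("Ubiquilin 3", "I"),
])
conn.executemany("INSERT INTO structure VALUES (?, ?, ?)", [
    ("1ZKH", "SF3A1", "NMR"), ("2HJ8", "ISG15", "NMR"),
    ("2KLC", "Ubiquilin 1", "NMR"), ("1YQB", "Ubiquilin 3", "Xray"),
])

# Example query: Group I UBLs whose structures were solved by NMR.
rows = conn.execute("""
    SELECT u.name, s.pdb_id
    FROM ubl AS u JOIN structure AS s ON s.ubl_name = u.name
    WHERE u.ubl_group = 'I' AND s.method = 'NMR'
    ORDER BY u.name
""").fetchall()
```

Keeping domain-level attributes and structure records in separate tables means a UBL with several PDB entries is stored once, and questions that combine sequence grouping with structural data reduce to a join.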

Figure 4.1: Database schema of ubiquitin-like domain repository.


Table 4.1: Data sources for ubiquitin-like domain repository.

Data Source Description of information

GenBank Nucleotide sequence and gene annotations

UniProtKB Gene structure, protein isoform sequences, and gene annotations

SMART Protein domains, and domain structure annotations

DiseaseHub Physiological information compiled from OMIM, GAD, HGMD, PharmGKB, GCP and GWAS human disease and physiology repositories.

BioGRID GO annotations, protein-protein interactions, cell localization, molecular function, and biological processes

Uniprot-GOA GO annotations, cell localization, molecular function, and biological processes

PDB Molecular structure information

BMRB NMR molecular structure information

UCSF Chimera Structural features, electrostatic potential distribution, molecular surface information, and secondary structure elements.

AESOP Electrostatic potential distribution

4.2.2 Relating 17 structurally determined UBLs to nearest neighbours and model families

For each UBL molecular structure that was determined as part of this project, an analysis was

performed to identify and characterize its most similar UBLs that were either part of the same

modelling family or were nearest neighbours. Sequence, structure and electrostatic potential

distribution similarity were analyzed using ClustalW, SIAS, UCSF Chimera, AESOP, and R using

similar approaches as described in Chapter Three (Sievers et al., 2011; Petterson et al., 2004;

Gorham et al., 2011). Sequence alignment was performed using ClustalW, while sequence

identities and sequence similarities were calculated using SIAS.
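As a rough illustration of what a sequence identity value measures, the sketch below computes percent identity between two pre-aligned sequences. It is not the SIAS implementation: SIAS offers several choices of denominator, and this example assumes one simple convention (identical residues divided by the number of alignment columns in which neither sequence has a gap). The two example sequences are the STAM1 and STAM2 UIM hits from Table 3.8, which are the same length and need no gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


stam1 = "EEDlakAielSlkE"  # STAM1 173-186, Table 3.8
stam2 = "DEDiakAielSlqE"  # STAM2 167-180, Table 3.8
identity = percent_identity(stam1, stam2)
```

The two hits share 11 of 14 positions, or roughly 78.6% identity under this convention. A different denominator, such as the length of the shorter or longer sequence, would give different values for gapped alignments.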

Distances from remaining structurally unresolved UBLs were also analyzed, taking into account

the distance from UBLs solved as part of this project, as well as general distances from other

unresolved UBLs to identify clusters of unresolved structural information.


4.2.3 Secondary structure prediction & analysis

The protein domains within each human ubiquitin-like domain containing protein were annotated,

and clustered based on similar full length protein domain architecture. Protein domains were

identified using information from UniprotKB, PROSITE, SMART & NCBI GenBank, and plotted

using PROSITE MyDomains (Galperin et al., 2015; Sigrist et al., 2012; Letunic et al., 2014;

Benson et al., 2013).

The architecture of all UBLs was compared at the sequence level using secondary-structure

sequence alignment, as well as at the structural level using UCSF Chimera (Petterson et al.,

2004).

4.2.4 Relating structural features to functional pathways

For each human ubiquitin-like domain, gene ontology annotations for cellular localization and

functional annotation were retrieved from Uniprot-GOA QuickGO (Huntley et al., 2015). Clusters

of human UBLs grouped based on common functional activity and/or cellular localization were

analyzed using UCSF Chimera to identify common molecular features (Petterson et al., 2004).


4.3 Results

4.3.1 Structurally characterized ubiquitin-like domains

Comparison of molecular structures of the 17 UBLs solved for this project revealed a few structural

variations. These include extended loops (between β-strand1 & β-strand2, β-strand2 & α-helix1),
additional/missing α-helices, and a missing β-strand4. Structural analysis also revealed

conserved amino acids associated with the fold (Figure 4.2).

Figure 4.2: Secondary & tertiary structures of 17 structurally characterized UBLs. Ribbon diagrams of 17

ubiquitin-like domain structures solved for this project, along with corresponding secondary structure architecture.


4.3.2 Nearest-neighbours of ubiquitin-like domains

Clustering all UBLs based on sequence similarity reveals 5 groups and 30 subgroups (Figure 4.3).

Each of the groups contains at least one UBM, with the majority of UBMs within Group I and the

largest proportion of structurally uncharacterized UBLs within Group IV.

Figure 4.3: Nearest-neighbour clustering of UBLs displayed with proportional transformed branches. Ubiquitin-like domains structurally determined for this project are highlighted in blue. Ubiquitin-like modifiers and

putative ubiquitin-like modifiers are underlined.


4.3.3 Nearest-neighbours of structurally characterized UBMs

Three of the structurally characterized UBLs were ubiquitin-like modifiers (FUBI-1, ISG15-2, and

SF3A1-1). To identify UBLs that may regulate ubiquitin-like modifiers by competing for binding

partners, a nearest-neighbour analysis was performed on FUBI-1, ISG15-2 and SF3A1-1.

UBLs with structures within an RMSD of 2Å of each modifier were compared with UBLs that have a
similar electrostatic potential distribution and a low RMSD (Figure 4.4, Figure 4.5, Figure 4.6).
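For reference, the RMSD used as the similarity cutoff is the square root of the mean squared distance between paired atoms. The sketch below shows that calculation only; it assumes the two coordinate sets have already been superposed and matched atom for atom (steps that UCSF Chimera performs in the actual analysis), and the function names are illustrative.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation (same units as the input) between paired 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def within_cutoff(coords_a, coords_b, cutoff: float = 2.0) -> bool:
    """True when the RMSD is below the cutoff (2 angstroms in the analysis above)."""
    return rmsd(coords_a, coords_b) < cutoff
```

For example, two copies of the same three-atom trace displaced by 1 Å along one axis give an RMSD of exactly 1 Å and pass the 2 Å cutoff, while a 3 Å displacement does not.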

Figure 4.4: UBLs with a structural fold similar to FUBI-1. There are 21 structurally characterized UBLs with an

RMSD of less than 2Å when compared to FUBI-1.

Twelve UBLs share similar electrostatic potential distribution as FUBI-1, of which 3 (highlighted

in red) have a fold with an RMSD of less than 2Å when compared to FUBI-1: UBIML_1-1,

UBIML_2-1, ISG15-2, PARK2_1-1, PARK2_2-1, PARK2_5-1, IQUB_1-1, IQUB_2-1, UBL7-1,

UBLCP1-1, USP14-1, and UBFD1-1.


Figure 4.5: UBLs with a structural fold similar to the second UBL of ISG15. There are 25 structurally

characterized UBLs with an RMSD of less than 2Å when compared to ISG15-2.

Thirteen UBLs share similar electrostatic potential distribution as ISG15-2, of which 5 (highlighted

in red) have a fold with an RMSD of less than 2Å when compared to ISG15-2: UHRF2_1-1,

UHRF2_2-1, UBA52-1, UBB-1, UBC-1, RPS27A-1, NEDD8-1, ANUBL1-1, RAD23A-1, RAD23B-

1, UBL4A-1, UBL4B-1, and UHRF1-1.


Figure 4.6: UBLs with a structural fold similar to SF3A1. There are 25 structurally characterized UBLs with an

RMSD of less than 2Å when compared to SF3A1-1.

Three UBLs share similar electrostatic potential distribution as SF3A1-1, of which none have a

fold with an RMSD of less than 2Å when compared to SF3A1-1: TBCB-1, USP40_3-1, and

USP47_2-3.


4.3.4 Grouping UBLs based on biological processes and molecular function

Many UBLs are uncharacterized, with 62.77% of UBLs having biological process annotations and

64.5% of UBLs having molecular function annotations within the GO repository. A pool of 145

UBLs are associated with a total of 369 unique biological processes and 149 UBLs are associated

with 133 unique molecular functions (Huntley et al., 2015). Up to 53 biological processes and 9

molecular functions are associated with an individual UBL. Similarly to cellular localization,

biological process attribution and molecular function are associated with full length UBL-

containing proteins, and not each individual UBL domain. As a result, factors associated with

functional activity could result from molecular features in other domains within the full length

protein.
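The percentages quoted for GO annotation coverage can be reproduced from the counts in the text. The sketch below assumes that all three percentages share the same denominator, the 231 UBL-containing proteins reported in Section 4.3.6; the text gives that total only for cell localization, so applying it to the other two categories is an inference.

```python
# Total reported in Section 4.3.6; assumed here to be the shared denominator.
TOTAL_UBL_PROTEINS = 231

# Number of annotated entries per GO category, as quoted in the text.
annotated = {
    "biological process": 145,
    "molecular function": 149,
    "cell localization": 152,
}

# Percentage of entries with at least one annotation in each category.
coverage = {term: round(100.0 * n / TOTAL_UBL_PROTEINS, 2)
            for term, n in annotated.items()}
```

Under that assumption the counts reproduce the quoted values of 62.77%, 64.5% and 65.8%, which supports reading all three percentages against the same set of 231 entries.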


Table 4.2: Biological significance, functional annotation, and UBL group for each of the 17 UBLs structurally

characterized for this project.

Protein Method PDB ID

UBL Group

Biological Significance Function

SF3A1 NMR 1ZKH V Spliceosome gene regulation: nuclear

mRNA 3'-splice site recognition

ISG15 NMR 2HJ8 I Innate immune response activated

by interferon-α & interferon-β signaling protein

MAP1ALC3 Xray 3ECI II endomembrane system apoptosis: autophagy

HERPUD2 NMR 2KDB III Endoplasmic Reticulum protein binding

RNF2/RING1B Xray 3H8H I & III E3 ligase of lysine 119 on histone

H2A Transcription

PLXNC1 Xray 3KUZ I Receptor related to immune

modulation during virus infection signaling protein

USP7 Xray 2KVR II

Deubiquitylates proteins; prevents MDM2 self-ubiquitylation and enhances MDM2 E3 activity

towards p53 and its proteasomal degradation

protein binding: ubiquitinyl hydrolase 1

NFATc2IP_1st NFATc2IP_2nd

NMR 2L76 2JXX

I

Down-regulates poly-SUMO chain formation by UBE2I/UBC9, and

involved in expression of cytokine genes in T-cells

transcription

protein binding

BRAF (N-term)

Xray NMR

2L05 3NY5

I

Vemurafenib (approved by FDA in 2011) was first drug to target B-RAF for treatment of late-stage

melanoma; B-RAF is a Raf kinase and regulates MAP kinase/ERKs signaling pathway, which affects cell division, differentiation and

secretion.

transferase: non-specific serine/threonine protein

kinase

FUBI NMR 2L7R II C-term is ribosomal protein S30

and N-term is a UBL intracellular, ribosome and

translation

USP15 Xray 3PPA I

Ubiquitin-specific protease that targets lysine 48-linked poly-Ub

chains; Targets ubiquitylated APC and human papillomavirus type 16

protein E6

ubiquitin thioesterase activity

UHRF1 Xray 2FAZ I

E3 ubiquitin ligase involved in methylation-dependent

transcriptional regulation. Important for G1/S transition and possibly chromosomal stability and DNA

repair.

ligase activity

Ubiquilin 1 NMR 2KLC I

Modulates accumulation of presenilin protein, and is found in

lesions associated with Alzheimer’s and Parkinson’s disease. Also

associated with: neurodegenerative diseases, ALS,

Dementia, Ataxia, Huntington’s Disease & Lung Adenocarcinoma.

protein binding

Ubiquilin 3 Xray 1YQB I N/A signaling protein


4.3.5 Grouping UBLs based on medical significance

Some UBLs are associated with medically significant functional pathways based on annotations

within DiseaseHub, a tool that aggregates gene-disease associations from OMIM, GAD, HGMD,

PharmGKB, CGP and GWAS (DiseaseHub; http://zldev.ccbr.utoronto.ca/~ddong/diseaseHub). A

pool of 54 UBLs are associated with a total of 103 medically significant functional pathways. The

specific role of each UBL domain remains unknown in many cases. Similar to cellular localization,

medical significance is associated with full length UBL-containing proteins, and not individual UBL

domains. Based on medical significance, 6 structurally uncharacterized and distant UBLs can be

prioritized for functional significance (BRAF, PCGF2, PIK3C2A, PIK3C2B, USP40 and USP6).


4.3.5.1 Cellular localization

Table 4.3: Tissue and cell localization for each of the 17 UBL structurally characterized for this project.

Protein PDB ID

UBL Group

Tissue Cell Localization

SF3A1 1ZKH V Ubiquitous Nucleus, cytosol,

peroxisome, plasma membrane

ISG15 2HJ8 I

Detected in lymphoid cells, striated and smooth muscle, several epithelia and neurons. Expressed

in neutrophils, monocytes and lymphocytes. Enhanced expression seen in pancreatic

adenocarcinoma, endometrial cancer, and bladder cancer, as compared to non-cancerous tissue. In

bladder cancer, the increase in expression exhibits a striking positive correlation with more

advanced stages of the disease.

Extracellular, cytosol, nucleus

MAP1ALC3 3ECI II Most abundant in heart, brain, skeletal muscle and testis. Little expression observed in liver.

HERPUD2 2KDB III - Nucleus, cytosol, ER

RNF2/RING1B 3H8H I & III - Nucleus

PLXNC1 3KUZ I Detected in heart, brain, lung, spleen and

placenta.

Plasma membrane, cytosol, extracellular, mitochondria,

peroxisome

USP7 2KVR II Widely expressed. Overexpressed in prostate

cancer. Cytosol, nucleus,

mitochondria

NFATc2IP_1st NFATc2IP_2nd

2L76 2JXX

I - Nucleus, cytosol

BRAF (N-term)

2L05 3NY5

I Brain and testis. Cytosol, plasma membrane,

nucleus

FUBI 2L7R II - Cytosol, nucleus

USP15 3PPA I Expressed in skeletal muscle, kidney, heart,

placenta, liver, thymus, lung, and ovary, with little or no expression in other tissues.

Nucleus, cytosol, mitochondrion, plasma

membrane

UHRF1 2FAZ I Expressed in thymus, bone marrow, testis, lung

and heart. Overexpressed in breast cancer. Nucleus, cytosol

Ubiquilin 1 2KLC I

Ubiquitous. Highly expressed throughout the brain; detected in neurons and in

neuropathological lesions, such as neurofibrillary tangles and Lewy bodies. Highly expressed in heart, placenta, pancreas, lung, liver, skeletal

muscle and kidney.

Nucleus, ER, cytosol, vacuole, cytoskeleton

Ubiquilin 3 1YQB I Testis Cytosol, nucleus

Tissue and cell localization information retrieved from GeneCards (Rebhan et al., 1997). UBL groups are annotated in Figure 4.3.


4.3.6 Grouping UBLs based on cell localization

Upon analysis of all 231 UBL-containing proteins, 65.8% of UBLs have cell localization

annotations within the GO repository. This pool of 152 UBLs are associated with a total of 110

unique cellular regions, and up to 10 cellular regions are associated with a single UBL. Cell

localization is a significant attribute to consider when characterizing a protein, since it provides

insight into possible protein-protein interactions and functional pathways associated with that

particular cell localization, and also provides insight into the chemical environment (i.e. pH). Cell
localization data for UBLs were analyzed in a few different ways. First, the spatial distribution

of UBLs within the cell was analyzed, and the most common cellular locations for UBLs were the

nucleus, cytoplasm and ER (Figure 4.7). Of the UBLs that have been characterized to exist in the

cytoplasm, nucleus and/or ER, 90 UBLs are structurally characterized (bold blue font), and 12

are UBMs (underlined bold blue font).
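The summary statistics quoted in this section (annotation coverage, number of unique cellular regions, and the maximum number of regions attached to a single UBL) can be derived from a simple mapping of UBL identifiers to localization terms. The following Python sketch is illustrative only: the function name is hypothetical, the four annotated entries are taken from the GeneCards-derived table above as a stand-in for GO annotations, and the unannotated entry is a placeholder. It is not the script used to generate the values reported here.

```python
from collections import Counter


def summarize_localization(annotations, total_ubls):
    """Summarize localization annotations.

    annotations: dict mapping a UBL identifier to a set of cellular regions
                 (an empty set means no annotation was found).
    total_ubls:  total number of UBL-containing proteins surveyed.
    """
    # Keep only entries that carry at least one localization term.
    annotated = {ubl: regions for ubl, regions in annotations.items() if regions}
    # Count how many UBLs are assigned to each cellular region.
    region_counts = Counter(r for regions in annotated.values() for r in regions)
    return {
        "annotated": len(annotated),
        "coverage_percent": round(100.0 * len(annotated) / total_ubls, 1),
        "unique_regions": len(region_counts),
        "max_regions_per_ubl": max((len(r) for r in annotated.values()), default=0),
        "most_common": region_counts.most_common(3),
    }


# Stand-in data: localizations as listed in the table above, plus one
# placeholder entry without any annotation (hypothetical).
example = {
    "HERPUD2": {"nucleus", "cytosol", "ER"},
    "RNF2": {"nucleus"},
    "FUBI": {"cytosol", "nucleus"},
    "UHRF1": {"nucleus", "cytosol"},
    "UNANNOTATED_UBL": set(),
}
summary = summarize_localization(example, total_ubls=len(example))
print(summary)

# The coverage quoted in the text follows from the same arithmetic.
coverage = round(100.0 * 152 / 231, 1)
print(coverage)
```

Keeping the per-region counts in a Counter makes it straightforward to rank compartments, which is the information summarized in Figure 4.7.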

Figure 4.7: Distribution of human UBLs based on cellular localization.

There are a few caveats to this approach. For example, cell localization is based on the full-length

protein, which would affect any direct correlation between cell localization and specific protein

domains; and 34.2% of UBL-containing proteins lack information about cellular localization.

However, taking into account molecular structure data, specifically electrostatic potential

distribution mapped onto the molecular surface, the influence of pH on the binding interfaces and

structural features could be elucidated.

Table 4.4: Structural alignment of lysines within structurally characterized ubiquitin and ubiquitin-like domains

characterized within both cytoplasm and ER; cytoplasm & nucleus; nucleus, cytoplasm and ER; and only nucleus.

Cellular Localization Number of UBLs lysine-6 lysine-11 lysine-27 lysine-29 lysine-33 lysine-48 lysine-63

ER & Cytoplasm 3 2 1 0 1 2 1 1

Nucleus 14 1 4 7 9 6 2 1

Cytoplasm, ER & Nucleus 21 7 7 12 6 7 12 6

Cytoplasm 18 4 1 9 5 8 6 3

Nucleus & Cytoplasm 29 8 8 16 11 9 8 7

none of the above 10 4 2 8 2 1 3 1
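Because the localization groups in Table 4.4 differ in size, the raw counts are easier to compare as fractions of each group. The sketch below is a minimal illustration that assumes each count is the number of UBLs in the group with a lysine structurally aligned to the named ubiquitin lysine; the variable and function names are hypothetical.

```python
LYSINES = ["K6", "K11", "K27", "K29", "K33", "K48", "K63"]

# Rows copied from Table 4.4: (number of UBLs, counts per ubiquitin lysine position).
TABLE_4_4 = {
    "ER & Cytoplasm": (3, [2, 1, 0, 1, 2, 1, 1]),
    "Nucleus": (14, [1, 4, 7, 9, 6, 2, 1]),
    "Cytoplasm, ER & Nucleus": (21, [7, 7, 12, 6, 7, 12, 6]),
    "Cytoplasm": (18, [4, 1, 9, 5, 8, 6, 3]),
    "Nucleus & Cytoplasm": (29, [8, 8, 16, 11, 9, 8, 7]),
    "none of the above": (10, [4, 2, 8, 2, 1, 3, 1]),
}


def lysine_fractions(table):
    """Convert per-group counts into fractions of the group size."""
    fractions = {}
    for group, (n_ubls, counts) in table.items():
        fractions[group] = {lys: c / n_ubls for lys, c in zip(LYSINES, counts)}
    return fractions


fractions = lysine_fractions(TABLE_4_4)
for group, row in fractions.items():
    # max() returns the first position when two positions are tied.
    best = max(row, key=row.get)
    print(f"{group}: most frequently aligned position {best} ({row[best]:.0%})")
```

Normalizing by group size avoids over-interpreting the larger groups, although the smallest group (three UBLs) remains too small for firm conclusions.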

Of the 21 UBLs associated with the ER, 4 are associated only with the cytoplasm

and ER: GABARAP, GABARAPL1, HSPA13 and VCPIP1. The ubiquitin-like domains of

GABARAP and GABARAPL1 have been structurally characterized. VCPIP1 contains two

putative UBLs; there are distantly related protein structures for fragments of the first ubiquitin-like

domain of VCPIP1, and a homology model can be generated for the second ubiquitin-like domain.

However, the homology model for the second ubiquitin-like domain of VCPIP1 has a

low-confidence C-terminal tail due to gaps in the template sequence alignment. Homology models

were not generated for HSPA13 or VCPIP1_1-1 because the resulting models would have been of

low quality.

Structural analysis of the homology model of VCPIP1, GABARAP and GABARAPL1 reveals

structural alignment of lysine 53 in VCPIP1 with poly-ubiquitin chain target lysine 48 of ubiquitin,

and lysine 35 & lysine 66 of GABARAP and GABARAPL1 with lysine 6 & lysine 33 of ubiquitin.

Comparative analysis of the molecular surface of each ubiquitin-like domain structure at pH 7.2

revealed no major hydrophobic patches or electrostatic potential patches common to all structures.

However, this could be due to the small sample size of only two structurally characterized UBLs

and one homology model for this group of UBLs.

Structural analysis of UBLs found within the nucleus provides a richer pool of information. There

was structural information for 14 UBLs, of which 8 structures were generated using homology

modelling techniques. Analysis of electrostatic potential distribution grouped the proteins into 3

subgroupings: UBLs with surfaces that are mostly positively charged (PCGF1_1-1, PCGF2_1-1,

PCGF3_1-1, PCGF5_1-1, SF3A1_1-1), UBLs with a large conserved negatively charged patch

(PCGF5_1-1, PCGF6_1-1, PCGF6_2-2, USP31_1-1), and UBLs with a mixed distribution of negatively

& positively charged residues (UHRF1_1-1, UHRF2_1-1, UBLCP1_1-1, SUMO2_1-1, PCGF6_1-1,

PCGF6_2-2). Some small hydrophobic patches were identified for small subgroupings of

UBLs, but nothing significant to characterize the full group of UBLs. Similar to the group of UBLs

within the ER, there is also a subset of nuclear UBLs that have structurally conserved lysines in

regions corresponding to lysine 6, lysine 33, and lysine 48 of ubiquitin (Table 4.4).

4.4 Conclusion

Information about the ubiquitin-like domain family has been compiled as a resource for generating

hypotheses about ubiquitin-like domain containing proteins, and the role of the UBLs in

uncharacterized proteins based on structural similarity analyses that could be associated with

potential protein-protein interaction interfaces. Multiple approaches were pursued for grouping

UBLs; these included clustering based on sequence similarity, structural features, and functional

characterization. Based on the analyses that were performed, a framework was generated to

explore molecular diversity of protein domains and putative members of protein domain families.

Structurally unresolved UBLs were ranked based on the amount and significance of information

generated by subsequent structural analyses. The top 10 UBLs recommended for future

characterization are ANKUB1-2, FRMD1_2-2, FRMPD2_1-1, SHROOM1_1-2, SNX31_1-2,

USP9X_1-3, USP11_1-2, SACS_1, PAN2_1-1, PAN2_1-2, PAN2_1-3, PIK3CG and PTPN13_1-2.

Chapter 5

Conclusion and Future Directions

5.1 Conclusions

Over the course of this thesis project, I investigated the scope and diversity of the ubiquitin fold

among human ubiquitin-like domains. This revealed a functionally diverse superfamily of 448

protein domains, related to one another in terms of structural fold and secondary structure

elements. The functional diversity of the 448 human UBLs was efficiently surveyed by grouping

related UBLs into modelling families. As a result, 680 DNA constructs representing 76 UBL

domains were expressed in E. coli for small-scale screening of protein expression and solubility,

of which 205 UBL domain constructs were further screened by NMR spectroscopy. 17 UBLs with

high novel leverage were selected for molecular structure determination based on protein

expression and solubility screening results. The structurally characterized UBLs were surveyed

and compared with structurally characterized UBMs, revealing amino acid variability and

complementarity that maintains the protein fold while diversifying the chemical environment of

protein-protein interaction interfaces.

Aggregating and analyzing these distant features facilitated correlations and predicted

relationships based on structural features. Two of these predictions, the second ubiquitin-like

domain of NFATc2IP interacting with the second SIM of NFATc2, as well as the ubiquitin-like

domain of ubiquilin-1 interacting with a putative UIM of PIN1, were screened by NMR titration

(Chapter Three). Changes in chemical shifts of residues at or near the putative binding site

validate the predicted interaction, and also demonstrate the potential for ubiquitin-like domains to

have interactions that are similar to known binding partners of ubiquitin and ubiquitin-like modifiers

yet complement the interaction interface of the ubiquitin-like fold. The significance of these

interactions is yet to be characterized, but could be related to shared functional activity, common

functional pathways, modulation of ubiquitin-like modifier activity, or could be involved in

mediating ubiquitin-like modifier conjugation of ubiquitin-like domain containing proteins.

5.2 Future Directions

5.2.1 Ubiquitin-like domain fold, NFATc2IP & ubiquilins

To better understand the significance of conserved residues in maintaining the ubiquitin

fold and the characteristic secondary structure elements, a series of mutagenesis experiments

could be performed. Mutagenesis could also be performed on amino acids within binding

interfaces to explore complementarity between ubiquitin-like domains and binding partners. For

NFATc2IP, the amino acids would include those identified in the NMR titration experiments:

glutamine 37, threonine 38, glycine 32, leucine 39, alanine 59, and tryptophan 96. For

ubiquilin-1, the amino acids would include aspartic acid 63, lysine 72, isoleucine 73, leucine 74, glutamine

82, histidine 92, valine 94, and lysine 96.

NMR titration experiments were performed using isolated ubiquitin-like domains and fragments of

binding partner proteins, and should be repeated using full-length proteins (NFATc2IP, NFATc2,

ubiquilin-1, and PIN1). The full-length NFATc2IP protein contains tandem ubiquitin-like

domains, and a comparative analysis can be performed using a tandem NFATc2IP ubiquitin-like

domain fragment. NFATc2IP and ubiquilin-1 each have multiple protein family members

and isoforms, and similar experiments can be performed on each of these members to determine

whether binding specificity extends to other family members and/or isoforms.

Ubiquilin-1 has been observed in the cytoplasm, nucleus and ER, while NFATc2IP has been

observed in the cytoplasm and nucleus. Subtle differences in the chemical environments of each

cellular compartment could impact the electrostatic surface potential at binding interfaces and

impact protein-protein interactions. For this reason, experiments involving pH titration and the

impact on protein-protein interactions could be explored. Similarly, phosphorylation and

post-translational modification sites exist on NFATc2IP, NFATc2, ubiquilin-1 and PIN1, and

experiments could be performed to determine whether phosphorylation or other post-translational

modifications affect binding affinities.
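To illustrate why modest pH differences between compartments could alter the electrostatic surface potential at a binding interface, the ionization state of titratable side chains can be estimated with the Henderson-Hasselbalch relation. The Python sketch below is a rough illustration under stated assumptions: the pKa values are approximate model-compound values rather than values measured for these proteins, the termini are ignored, and the residue string is simply the one-letter codes of the ubiquilin-1 residues listed above (aspartic acid 63, lysine 72, isoleucine 73, leucine 74, glutamine 82, histidine 92, valine 94, lysine 96), not a contiguous sequence. The function names are hypothetical.

```python
# Approximate model-compound side-chain pKa values (assumed; real values in a
# folded protein shift with the local environment).
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.3, "C": 8.3, "Y": 10.1}


def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def side_chain_charge(residues, ph):
    """Estimate the net side-chain charge of a set of residues at a given pH.

    Basic groups carry +1 when protonated; acidic groups carry -1 when
    deprotonated. Backbone termini are not included.
    """
    charge = 0.0
    for aa in residues.upper():
        if aa in PKA_BASIC:
            charge += fraction_protonated(PKA_BASIC[aa], ph)
        elif aa in PKA_ACIDIC:
            charge -= 1.0 - fraction_protonated(PKA_ACIDIC[aa], ph)
    return charge


# One-letter codes of the ubiquilin-1 residues named in this section.
interface_residues = "DKILQHVK"

for ph in (6.0, 7.2):
    his = fraction_protonated(PKA_BASIC["H"], ph)
    net = side_chain_charge(interface_residues, ph)
    print(f"pH {ph}: histidine protonated fraction {his:.2f}, net side-chain charge {net:+.2f}")
```

In this simple model the histidine accounts for nearly all of the change between the two pH values, which suggests that histidine-containing interfaces would be sensible first candidates for pH titration experiments.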

Based on fold conservation and structural feature similarity, competition analysis with similar

ubiquitin-like domains could be performed (ubiquitin, UBL4A & UBTD2 for ubiquilin-1) to

determine whether these ubiquitin-like domains compete with ubiquilin-1 for binding to PIN1. A

matrix of similar competition analyses could be performed using additional binding partners.

5.2.2 Ubiquitin-like domain structural genomics

A total of 270 ubiquitin-like domains remain to be structurally determined for structural completeness, which

becomes 74 when homology models are taken into account. I recommend a strategy for completing

structural coverage that is prioritized based on structural coverage and functional significance. This

would consist of screening and characterizing the following ubiquitin-like domains: ANKUB1-2,

FRMD1_2-2, FRMPD2_1-1, SHROOM1_1-2, SNX31_1-2, USP9X_1-3, USP11_1-2, SACS_1,

PAN2_1-1, PAN2_1-2, PAN2_1-3, PIK3CG, and PTPN13_1-2.

5.2.3 Protein Domain family analyses

Our systematic approach of surveying, selecting, screening, structural determination, and

analysis could be performed on a variety of different protein families to explore the amino acid

and structural diversity of any group of proteins, whether fold superfamily or structural motif.

5.3 Concluding remarks

My structural genomics analysis of human ubiquitin-like domains demonstrates the value of: (1)

NMR 1H-15N HSQC screening for amenability to structure determination; (2) modelling family

analysis and homology model generation to assist in completing structural coverage of a protein

family; and (3) utilizing relational databases and structure-driven hypothesis generation to predict

putative binding partners.
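As an illustration of point (3), a relational layout allows structural and localization annotations to be combined in a single query. The sketch below uses Python's built-in sqlite3 module with a hypothetical two-table schema; the table and column names are assumptions made for this example and do not describe the database built in this work. The rows are taken from the table of structurally characterized UBLs in Chapter 4. The query lists pairs of UBLs that belong to the same UBL group and share at least one cellular compartment, which is one simple way to shortlist domains for comparative interface analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ubl (
        name      TEXT PRIMARY KEY,
        pdb_id    TEXT,
        ubl_group TEXT
    );
    CREATE TABLE localization (
        ubl_name    TEXT REFERENCES ubl(name),
        compartment TEXT
    );
    """
)

# Rows copied from the table of structurally characterized UBLs (Chapter 4).
ubls = [
    ("UHRF1", "2FAZ", "I"),
    ("Ubiquilin 1", "2KLC", "I"),
    ("FUBI", "2L7R", "II"),
    ("USP7", "2KVR", "II"),
]
locs = [
    ("UHRF1", "nucleus"), ("UHRF1", "cytosol"),
    ("Ubiquilin 1", "nucleus"), ("Ubiquilin 1", "ER"), ("Ubiquilin 1", "cytosol"),
    ("Ubiquilin 1", "vacuole"), ("Ubiquilin 1", "cytoskeleton"),
    ("FUBI", "cytosol"), ("FUBI", "nucleus"),
    ("USP7", "cytosol"), ("USP7", "nucleus"), ("USP7", "mitochondria"),
]
conn.executemany("INSERT INTO ubl VALUES (?, ?, ?)", ubls)
conn.executemany("INSERT INTO localization VALUES (?, ?)", locs)

# Pairs of UBLs in the same group, with the number of shared compartments.
rows = conn.execute(
    """
    SELECT a.name, b.name, COUNT(*) AS shared
    FROM ubl AS a
    JOIN ubl AS b ON a.ubl_group = b.ubl_group AND a.name < b.name
    JOIN localization AS la ON la.ubl_name = a.name
    JOIN localization AS lb ON lb.ubl_name = b.name
                           AND lb.compartment = la.compartment
    GROUP BY a.name, b.name
    ORDER BY shared DESC, a.name
    """
).fetchall()

for first, second, shared in rows:
    print(f"{first} / {second}: {shared} shared compartment(s)")
```

Co-localization and shared group membership are only starting points for hypothesis generation; any pair shortlisted this way would still require experimental validation, such as the NMR titrations described in Chapter Three.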

6.0 References

Angot A, Vergunst A, Genin S, and Peeters N. “Exploitation of eukaryotic ubiquitin signaling

pathways by effectors translocated by bacterial type III and type IV secretion systems.” PLoS

pathogens 3, no. 1 (2007): e3.

Arnold K, Bordoli L, Kopp J, and Schwede T. "The SWISS-MODEL Workspace: A web-based

environment for protein structure homology modelling." Bioinformatics 22 (2006): 195-201.

Baker NA, Sept D, Joseph S, Holst MJ, and McCammon JA. "Electrostatics of nanosystems:

application to microtubules and the ribosome." Proceedings of the National Academy of Sciences

of the United States of America 98 (2001): 10037–10041.

Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, and Sayers EW.

"GenBank." Nucleic Acids Research 41, no. Database Issue (2013): D36-D42.

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, and Bourne

PE. "The Protein Data Bank." Nucleic Acids Research 28 (2000): 235–242.

Bhattacharya A, Tejero R, and Montelione GT. "Evaluating protein structures determined by

structural genomics consortia." Proteins: Structure, Function, and Bioinformatics 66, no. 4 (2007):

778-795.

Bhattacharya A, Wunderlich Z, Monleon D, Tejero R, and Montelione GT. "Assessing model

accuracy using the homology modeling automatically software." Proteins 70 (2008): 105-118.

Boratyn GM, Schaffer AA, Agarwala R, Altschul SF, Lipman DJ, and Madden TL.

"Domain enhanced lookup time accelerated BLAST." Biology Direct 7, no. 1 (2012): 12.

Boyault C, Gilquin B, Zhang Y, Rybin V, Garman E, Meyer-Klaucke W, Matthias P, Müller CW,

and Khochbin S. “HDAC6–p97/VCP controlled polyubiquitin chain turnover.” The EMBO journal

25, no. 14 (2006): 3357-3366.

Brzovic PS, Lissounov A, Christensen DE, Hoyt DW, and Klevit RE. “A UbcH5/ubiquitin

noncovalent complex is required for processive BRCA1-directed ubiquitination.” Mol. Cell 21

(2006): 873–880.

Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang J-S,

Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL. "Crystallography

& NMR system: A new software suite for macromolecular structure determination." Acta

Crystallographica Section D: Biological Crystallography 54, no. 5 (1998): 905-921.

Brünger AT. "Version 1.2 of the Crystallography and NMR system." Nature protocols 2, no. 11

(2007): 2728-2733.

Chen L, Shinde U, Ortolan TG, and Madura K. “Ubiquitin‐associated (UBA) domains in Rad23

bind ubiquitin and promote inhibition of multi‐ubiquitin chain assembly.” EMBO reports 2, no. 10

(2001): 933-938.

Ciechanover A, and Schwartz AL. “The ubiquitin system: pathogenesis of human diseases and

drug targeting.” Biochimica et Biophysica Acta (BBA)-Molecular Cell Research 1695, no. 1 (2004):

3-17.

Cole C, Barber JD, and Barton GJ. "The Jpred 3 secondary structure prediction server." Nucleic

Acids Res. 36, suppl. 2 (2008): W197-W201.

Cuff JA, and Barton GJ. "Application of Enhanced Multiple Sequence Alignment Profiles to

Improve Protein Secondary Structure Prediction." PROTEINS: Structure, Function and Genetics

40 (2000): 502-511.

Davis ME, and McCammon JA. "Electrostatics in biomolecular structure and dynamics." Chem. Rev.

90 (1990): 509–521.

de Napoles M, Mermoud JE, Wakao R, Tang YA, Endoh M, Appanah R, Nesterova TB, Silva J,

Otte AP, Vidal M, Koseki H, and Brockdorff N. “Polycomb group proteins Ring1A/B link

ubiquitylation of histone H2A to heritable gene silencing and X inactivation.” Developmental cell

7, no. 5 (2004): 663-676.

Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, and Bax AD. "NMRPipe: a multidimensional

spectral processing system based on UNIX pipes." Journal of biomolecular NMR 6, no. 3 (1995):

277-293.

Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G, Baker NA. "PDB2PQR:

expanding and upgrading automated preparation of biomolecular structures for molecular

simulations." Nucleic Acids Research 35 (2007): W522–W525.

Donaldson KM, Yin H, Gekakis N, Supek F, and Joazeiro CA. “Ubiquitin signals protein trafficking

via interaction with a novel ubiquitin binding domain in the membrane fusion regulator, Vps9p.”

Current biology 13, no. 3 (2003): 258-262.

Finley D, Bartel B, and Varshavsky A. “The tails of ubiquitin precursors are ribosomal proteins

whose fusion to ubiquitin facilitates ribosome biogenesis.” Nature 338, no. 6214 (1989): 394-401.

Fisher RD, Wang B, Alam SL, Higginson DS, Robinson H, Sundquist WI, & Hill CP. "Structure

and ubiquitin binding of the ubiquitin-interacting motif." Journal of Biological Chemistry 278, no.

31 (2003): 28976-28984.

Goddard TD & Kneller DG. "SPARKY 3", University of California, San Francisco.

Gorham Jr RD, Kieslich CA, Morikis D. "Electrostatic Clustering and Free Energy Calculations

Provide a Foundation for Protein Design and Optimization." Annals of Biomedical Engineering 39,

no. 4 (2011): 1252–1263.

Grabbe C & Dikic I. "Functional roles of ubiquitin-like domain (ULD) and ubiquitin-binding domain

(UBD) containing proteins." Chemical reviews 109, no. 4 (2009): 1481-1494.

Haglund K, and Dikic I. “Ubiquitylation and cell signaling.” The EMBO journal 24, no. 19 (2005):

3353-3359.

Hannich JT, Lewis A, Kroetz MB, Li SJ, Heide H, Emili A, and Hochstrasser M. "Defining the

SUMO-modified proteome by multiple approaches in Saccharomyces cerevisiae." Journal of

Biological Chemistry 280, no. 6 (2005): 4102-4110.

Hecker CM, Rabiller M, Haglund K, Bayer P, and Dikic I. “Specification of SUMO1-and SUMO2-

interacting motifs.” Journal of Biological Chemistry 281, no. 23 (2006): 16117-16127.

Heir R, Ablasou C, Dumontier E, Elliott M, Fagotto-Kaufmann C, Bedford FK. "The UBL domain

of PLIC-1 regulates aggresome formation." EMBO reports 7, no. 12 (2006): 1252-1258.

Hochstrasser M. “Origin and function of ubiquitin-like proteins.” Nature 458, no. 7237 (2009): 422-

429.

Hochstrasser M. "Evolution and function of ubiquitin-like protein-conjugation systems." Nature

cell biology 2, no. 8 (2000): E153-E157.

Hofmann K & Bucher P. "The UBA domain: a sequence motif present in multiple enzyme classes

of the ubiquitination pathway." Trends in biochemical sciences 21, no. 5 (1996): 172-173.

Hofmann K & Falquet L. “A ubiquitin-interacting motif conserved in components of the

proteasomal and lysosomal protein degradation systems.” Trends in biochemical sciences 26, no.

6 (2001): 347-350.

Hong YH, Ahn HC, Lim J, Kim HM, Ji HY, Lee S, Kim JH, Park EY, Song HK, and Lee BJ.

“Identification of a novel ubiquitin binding site of STAM1 VHS domain by NMR spectroscopy.”

FEBS letters 583, no. 2 (2009): 287-292.

Hook SS, Orian A, Cowley SM, and Eisenman RN. “Histone deacetylase 6 binds polyubiquitin

through its zinc finger (PAZ domain) and copurifies with deubiquitinating enzymes.” Proceedings

of the National Academy of Sciences 99, no. 21 (2002): 13425-13430.

Ichimura Y, Takayoshi K, Toshifumi T, Yoshinori S, Yasutsugu S, Naotada I, Noboru M, et al. "A

ubiquitin-like system mediates protein lipidation." Nature 408, no. 6811 (2000): 488-492.

Kang RS, Daniels CM, Francis SA, Shih SC, Salerno WJ, Hicke L, and Radhakrishnan I. “Solution

structure of a CUE–ubiquitin complex reveals a conserved mode of ubiquitin binding.” Cell 113

(2003): 621–630.

Kerscher O. "SUMO junction—what's your function?." EMBO reports 8, no. 6 (2007): 550-555.

Kiefer F, Arnold K, Künzli M, Bordoli L, and Schwede T. "The SWISS-MODEL Repository and

associated resources.” Nucleic Acids Research 37 (2009): D387-D392.

Ko HS, Uehara T, Tsuruma K, and Nomura Y. "Ubiquilin interacts with ubiquitylated proteins and

proteasome through its ubiquitin-associated and ubiquitin-like domains." FEBS letters 566, no. 1

(2004): 110-114.

Koehn J & Hunt I. "High-Throughput Protein Production (HTPP): a review of enabling technologies

to expedite protein production." In High Throughput Protein Expression and Purification, pp. 1-18.

Humana Press, 2009.

Koh IYY, Eyrich VA, Marti-Renom MA, Przybylski D, Madhusudhan MS, Eswar N, Grana O, Pazos

F, Valencia A, Sali A, and Rost B. "EVA: evaluation of protein structure prediction servers."

Nucleic Acids Research 31, no. 13 (2003): 3311-3315.

Komander D. “The emerging complexity of protein ubiquitination.” Biochemical Society

Transactions 37, no. Pt 5 (2009): 937-953.

Koonin EV & Abagyan RA. “TSG101 may be the prototype of a class of dominant negative

ubiquitin regulators.” Nature genetics 16, no. 4 (1997): 330-331.

Lee S, Tsai YC, Mattera R, Smith WJ, Kostelansky MS, Weissman AM, Bonifacino JS, and Hurley

JH. “Structural basis for ubiquitin recognition and autoubiquitination by Rabex-5.” Nature Struct.

Mol. Biol. 13, (2006): 264–271.

Lemak A, Gutmanas A, Chitayat S, Karra M, Farès C, Sunnerhagen M, and Arrowsmith CH. "A

novel strategy for NMR resonance assignment and protein structure determination." Journal of

biomolecular NMR 49, no. 1 (2011): 27-38.

Letunic I, Doerks T, and Bork P. "SMART: recent updates, new developments and status in 2015."

Nucleic Acids Research 43, no. D1 (2014): D257-D260.

Liou YC, Sun A, Ryo A, Zhou XZ, Yu ZX, Huang HK, Uchida T, Bronson R, Bing G, Li X, Hunter

T, and Lu KP. "Role of the prolyl isomerase Pin1 in protecting against age-dependent

neurodegeneration." Nature 424, no. 6948 (2003): 556-561.

Lim J, Hao T, Shaw C, Patel AJ, Szabó G, Rual JF, Fisk CJ, Li N, Smolyar A, Hill DE, Barabási

AL, Vidal M, and Zoghbi HY. "A protein–protein interaction network for human inherited ataxias

and disorders of Purkinje cell degeneration." Cell 125, no. 4 (2006): 801-814.

Lippens G, Landrieu I, and Smet C. "Molecular mechanisms of the phospho‐dependent prolyl

cis/trans isomerase Pin1." FEBS journal 274, no. 20 (2007): 5211-5222.

Loeb KR & Haas AL. "The interferon-inducible 15-kDa ubiquitin homolog conjugates to

intracellular proteins." Journal of Biological Chemistry 267, no. 11 (1992): 7806-7813.

Macian F. "NFAT proteins: key regulators of T-cell development and function." Nature Reviews

Immunology 5, no. 6 (2005): 472-484.

Mah AL, Perry G, Smith MA, and Monteiro MJ. "Identification of ubiquilin, a novel presenilin

interactor that increases presenilin protein accumulation." The Journal of cell biology 151, no. 4

(2000): 847-862.

Makhnevych T, Sydorskyy Y, Xin X, Srikumar T, Vizeacoumar FJ, Jeram SM, Li Z, Bahr S,

Andrews BJ, Boone C, and Raught B. "Global map of SUMO function revealed by protein-protein

interaction and genetic networks." Molecular cell 33, no. 1 (2009): 124-135.

Marchler-Bauer A, Zheng C, Chitsaz F, Derbyshire MK, Geer LY, Geer RC, Gonzales NR, Gwadz

M, Hurwitz DI, Lanczycki CJ, Lu F, Lu S, Marchler GH, Song JS, Thanki N, Yamashita RA, Zhang

D, and Bryant SH. "CDD: conserved domains and protein three-dimensional structure." Nucleic

acids research 41, no. D1 (2013): D348-D352.

Marti-Renom MA, Madhusudhan MS, Fiser A, Rost B, and Sali A. "Reliability of assessment of

protein structure prediction methods." Structure 10 (2002): 435-440.

Marti-Renom MA, Stuart A, Fiser A, Sanchez R, Melo F, and Sali A. "Comparative protein

structure modeling of genes and genomes." Annual Review of Biophysics and Biomolecular

Structure 29 (2000): 291-325.

McNally T, Huang Q, Janis RS, Liu Z, Olejniczak ET, and Reilly RM. "Structural analysis of UBL5,

a novel ubiquitin-like modifier." Protein science 12, no. 7 (2003): 1562-1566.

Meyer HH, Wang Y, and Warren G. “Direct binding of ubiquitin conjugates by the mammalian p97

adaptor complexes, p47 and Ufd1–Npl4.” The EMBO journal 21, no. 21 (2002): 5645-5652.

Minty A, Dumont X, Kaghad M, and Caput D. "Covalent Modification of p73α by SUMO-1 two-

hybrid screening with p73 identifies novel SUMO-1-interacting proteins and a SUMO-1 interacting

motif." Journal of Biological Chemistry 275, no. 46 (2000): 36316-36323.

Mizushima N, Noda T, Yoshimori T, Tanaka Y, Ishii T, George MD, Klionsky DJ, Ohsumi M, and

Ohsumi Y. "A protein conjugation system essential for autophagy." Nature 395, no. 6700 (1998):

395-398.

Müller S, Ledl A, and Schmidt D. "SUMO: a regulator of gene expression and genome integrity."

Oncogene 23, no. 11 (2004): 1998-2008.

N'Diaye EN & Brown EJ. "The ubiquitin-related protein PLIC-1 regulates heterotrimeric G protein

function through association with Gβγ." The Journal of cell biology 163, no. 5 (2003): 1157-1165.

Nair R, Liu J, Soong TT, Acton TB, Everett JK, Kouranov A, Fiser A, Godzik A, Jaroszewski

L, Orengo C, Montelione GT, and Rost B. "Structural genomics is the largest contributor of novel

structural leverage." Journal of Structural and Functional Genomics 10, no. 2 (2009): 181-191.

Nayak A, Glöckner-Pagel J, Vaeth M, Schumann JE, Buttmann M, Bopp T, Schmitt E, Serfling E

and Berberich-Siebelt F. "Sumoylation of the transcription factor NFATc1 leads to its subnuclear

relocalization and interleukin-2 repression by histone deacetylase." Journal of Biological Chemistry

284, no. 16 (2009): 10935-10946.

Pan ZQ, Kentsis A, Dias DC, Yamoah K, and Wu K. "Nedd8 on cullin: building an expressway to

protein destruction." Oncogene 23, no. 11 (2004): 1985-1997.

Peitsch MC. "Protein modeling by E-mail." Nature Biotechnology 13 (1995): 658-660.

Penengo L, Mapelli M, Murachelli AG, Confalonieri S, Magri L, Musacchio A, Di Fiore PP, Polo S,

and Schneider TR. “Crystal structure of the ubiquitin binding domains of rabex-5 reveals two

modes of interaction with ubiquitin.” Cell 124 (2006): 1183–1195.

Perry JJP, Tainer JA, and Boddy MN. "A SIM-ultaneous role for SUMO and ubiquitin." Trends in

biochemical sciences 33, no. 5 (2008): 201-208.

Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, and Ferrin TE.

"UCSF Chimera–a visualization system for exploratory research and analysis." J. Comput. Chem.

25 (2004): 1605–1612.

Pieper U, Webb BM, Barkan DT, Schneidman-Duhovny D, Schlessinger A, Braberg H, Yang Z,

Meng EC, Pettersen EF, Huang CC, Datta RS, Sampathkumar P, Madhusudhan MS, Sjolander

K, Ferrin TE, Burley SK, and Sali A. "MODBASE, a database of annotated comparative protein

structure models and associated resources." Nucleic Acids Research 39 (2011): 465-474.

Prag G, Misra S, Jones EA, Ghirlando R, Davies BA, Horazdovsky BF, and Hurley JH.

“Mechanism of ubiquitin recognition by the CUE domain of Vps9p.” Cell 113 (2003): 609–620.

Prag G, Lee SH, Mattera R, Arighi CN, Beach BM, Bonifacino JS, and Hurley JH. “Structural

mechanism for ubiquitinated-cargo recognition by the Golgi-localized, gamma-ear-containing,

ADP-ribosylation-factor-binding proteins.” Proceedings of the National Academy of Sciences of

the United States of America 102 (2005): 2334–2339.

Prasad TSK, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D,

Raju R, Shafreen B, Venugopal A, Balakrishnan L, Marimuthu A, Banerjee S, Somanathan DS,

Sebastian A, Rani S, Ray S, Kishore CJH, Kanth S, Ahmed M, Kashyap MK, Mohmood R,

Ramachandra YL, Krishna V, Rahiman BA, Mohan S, Ranganathan P, Ramabadran S,

Chaerkady R, and Pandey A. "Human Protein Reference Database - 2009 Update." Nucleic Acids

Research 37 (2009): D767-D772.

Rao A, Luo C, and Hogan PG. “Transcription factors of the NFAT family: regulation and function.”

Annual review of immunology 15, no. 1 (1997): 707-747.

Rebhan M, Chalifa-Caspi V, Prilusky J, and Lancet D. "GeneCards: integrating information about

genes, proteins and diseases." Trends in Genetics 13, no. 4 (1997): 163.

Regan-Klapisz E, Sorokina I, Voortman J, de Keizer P, Roovers RC, Verheesen P, Urbé S, Fallon

L, Fon EA, Verkleij A, Benmerah A, and van Bergen en Henegouwen PM. "Ubiquilin recruits

Eps15 into ubiquitin-rich cytoplasmic aggregates via a UIM-UBL interaction." Journal of cell

science 118, no. 19 (2005): 4437-4450.

Rengarajan J, Mittelstadt PR, Mages HW, Gerth AJ, Kroczek RA, Ashwell JD, and Glimcher LH.

“Sequential involvement of NFAT and Egr transcription factors in FasL regulation.” Immunity 12,

no. 3 (2000): 293-300.

Reyes-Turcu FE, Horton JR, Mullally JE, Heroux A, Cheng X, and Wilkinson KD. “The ubiquitin

binding domain ZnF UBP recognizes the C-terminal diglycine motif of unanchored ubiquitin.” Cell

124, no. 6 (2006): 1197-1208.

Schlesinger DH, Goldstein G, and Niall HD. “Complete amino acid sequence of ubiquitin, an

adenylate cyclase stimulating polypeptide probably universal in living cells.” Biochemistry 14, no.

10 (1975): 2214-2218.

Semple CA. “The comparative proteomics of ubiquitination in mouse.” Genome Research 13

(2003): 1389–1394.

Shen Y, Delaglio F, Cornilescu G, and Bax A. "TALOS+: a hybrid method for predicting protein

backbone torsion angles from NMR chemical shifts." Journal of biomolecular NMR 44, no. 4

(2009): 213-223.

Shiba Y, Katoh Y, Shiba T, Yoshino K, Takatsu H, Kobayashi H, Shin HW, Wakatsuki S, and

Nakayama K. “GAT (GGA and Tom1) domain responsible for ubiquitin binding and ubiquitination.”

Journal of Biological Chemistry 279, no. 8 (2004): 7105-7111.

Shih SC, Prag G, Francis SA, Sutanto MA, Hurley JH, and Hicke L. “A ubiquitin‐binding motif

required for intramolecular monoubiquitylation, the CUE domain.” The EMBO Journal 22, no. 6

(2003): 1273-1281.

Shimodaira H. "An approximately unbiased test of phylogenetic tree selection." System. Biol. 51

(2002): 492–508.

Shimodaira H. "Approximately unbiased test of regions using multistep-multiscale bootstrap

resampling." Ann. Statist. 32 (2004): 2616–2641.

Sitkoff D, Sharp KA, Honig B. "Accurate calculation of hydration free energies using macroscopic

solvent models." J. Phys. Chem. 98 (1994): 1978–1988.

Song J, Durrin LK, Wilkinson TA, Krontiris TG, and Chen Y. "Identification of a SUMO-binding

motif that recognizes SUMO-modified proteins." Proceedings of the National Academy of

Sciences of the United States of America 101, no. 40 (2004): 14373-14378.

Song J, Zhang Z, Hu W, and Chen Y. “Small ubiquitin-like modifier (SUMO) recognition of a SUMO

binding motif: a reversal of the bound orientation.” J.Biol.Chem. 280 (2005): 40122-40129.

Sundquist WI, Schubert HL, Kelly BN, Hill GC, Holton JM, and Hill CP. “Ubiquitin recognition by

the human TSG101 protein.” Mol. Cell 13 (2004): 783–789.

Swanson KA, Kang RS, Stamenova SD, Hicke L, and Radhakrishnan I. “Solution structure of

Vps27 UIM-ubiquitin complex important for endosomal sorting and receptor downregulation.”

EMBO J. 22 (2003): 4597–4606.

Teo H, Gill DJ, Sun J, Perisic O, Veprintsev DB, Vallis Y, Emr SD, and Williams RL. “ESCRT-I

core and ESCRT-II GLUE domain structures reveal role for GLUE in linking to ESCRT-I and

membranes.” Cell 125, no. 1 (2006): 99-111.

Terui Y, Saad N, Jia S, McKeon F, and Yuan J. "Dual role of sumoylation in the nuclear localization

and transcriptional activation of NFAT1." Journal of Biological Chemistry 279 (2004): 28257-

28265.

Turner B, Razick S, Turinsky AL, Vlasblom J, Crowdy EK, Cho E, Morrison K, Donaldson IM, and

Wodak SJ. "iRefWeb: interactive analysis of consolidated protein interaction data and their

supporting evidence." Database (2010): baq023.

Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K, Forsberg M, Zwahlen M, Kampf C,

Wester K, Hober S, Wernerus H, Björling L, and Ponten F. "Towards a knowledge-based Human

Protein Atlas." Nature Biotechnology 28, no. 12 (2010): 1248-1250.

UniProt Consortium. "Activities at the Universal Protein Resource (UniProt)." Nucleic Acids

Research 42, no. D1 (2014): D191-D198.

Varadan R, Assfalg M, Raasi S, Pickart C, and Fushman D. “Structural determinants for selective

recognition of a lys48-linked polyubiquitin chain by a UBA domain.” Mol. Cell 18 (2005): 687–698.

Vriend G. "WHAT IF: a molecular modeling and drug design program." Journal of molecular

graphics 8, no. 1 (1990): 52-56.

Wang B, Alam SL, Meyer HH, Payne M, Stemmler TL, Davis DR, and Sundquist WI. “Structure

and ubiquitin interactions of the conserved zinc finger domain of Npl4.” Journal of Biological

Chemistry 278, no. 22 (2003): 20225-20234.

Wang QH, Young P, and Walters KJ. “Structure of S5a bound to monoubiquitin provides a model

for polyubiquitin recognition.” J. Mol. Biol. 348 (2005): 727–739.

Wang X, Herr RA, Chua WJ, Lybarger L, Wiertz EJHJ, and Hansen TH. "Ubiquitination of serine,

threonine, or lysine residues on the cytoplasmic tail can induce ERAD of MHC-I by viral E3 ligase

mK3." The Journal of cell biology 177, no. 4 (2007): 613-624.

Weigelt J. "Structural genomics—impact on biomedicine and drug discovery." Experimental cell

research 316, no. 8 (2010): 1332-1338.

Xu P & Peng J. "Dissecting the ubiquitin pathway by mass spectrometry." Biochimica et

Biophysica Acta (BBA)-Proteins and Proteomics 1764, no. 12 (2006): 1940-1947.

Yang SH, Galanis A, Witty J, and Sharrocks AD. "An extended consensus motif enhances the

specificity of substrate modification by SUMO." The EMBO journal 25, no. 21 (2006): 5083-5093.

Yee A, Chang X, Pineda-Lucena A, Wu B, Semesi A, Le B, Ramelot T, Lee GM, Bhattacharyya

S, Gutierrez P, Denisov A, Lee CH, Cort JR, Kozlov G, Liao J, Finak G, Chen L, Wishart D, Lee

W, McIntosh LP, Gehring K, Kennedy MA, Edwards AM, and Arrowsmith CH. "An NMR approach

to structural proteomics." Proceedings of the National Academy of Sciences 99, no. 4 (2002):

1825-1830.

Yee AA, Semesi A, Garcia M, and Arrowsmith CH. "Screening proteins for NMR suitability." In

Structural Genomics and Drug Discovery. Springer New York (2014): 169-178.

Young P, Deveraux Q, Beal RE, Pickart CM, and Rechsteiner M. "Characterization of two

polyubiquitin binding sites in the 26 S protease subunit 5a." Journal of Biological Chemistry 273,

no. 10 (1998): 5461-5467.

Zhu J, Zhu S, Guzzo CM, Ellis NA, Sung KS, Choi CY, and Matunis MJ. "Small ubiquitin-related

modifier (SUMO) binding determines substrate recognition and paralog-selective SUMO

modification." Journal of Biological Chemistry 283, no. 43 (2008): 29405-29415.

Zweckstetter M & Bax A. "Prediction of sterically induced alignment in a dilute liquid crystalline

phase: aid to protein structure determination by NMR." Journal of the American Chemical Society

122, no. 15 (2000): 3791-3792.

7.0 Appendix

Appendix I: All human genes that encode at least one ubiquitin-like domain.

Gene Name Protein Name HUGO HGNC NCBI GeneID EC EnzymeID UniProt ID UniProt Name

ANKRD60 Ankyrin repeat domain-containing protein 60 16217 140731 - Q9BZ19 ANR60_HUMAN

ANKUB1-1/-2/-3 ANKUB1 389161 29642 - A6NFN9 ANKUB_HUMAN

ANUBL1-1 AN1-type zinc finger protein 4 23504 93550 - Q86XD8 ZFAN4_HUMAN

APBB1IP Amyloid beta A4 precursor protein-binding family B member 1-interacting protein 17379 54518 - Q7Z5R6 AB1IP_HUMAN

ARAF-1 Serine/threonine-protein kinase A-Raf 646 369 2.7.11.1 P10398 ARAF_HUMAN

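ARAP1 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain-containing protein 1 16925 116985 - Q96P48 ARAP1_HUMAN

ARAP2 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain-containing protein 2 16924 116984 - Q8WZ64 ARAP2_HUMAN

ARAP3 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain-containing protein 3 24097 64411 - Q8WWN8 ARAP3_HUMAN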

ARHGAP20 Rho GTPase-activating protein 20 18357 57569 - Q9P2F6 RHG20_HUMAN

ASPSCR1_1-1 Tether containing UBX domain for GLUT4 13825 79058 - Q9BZE9 ASPC1_HUMAN

ATG12 Ubiquitin-like protein ATG12 588 9140 - O94817 ATG12_HUMAN

ATG3-1 Ubiquitin-like-conjugating enzyme ATG3 20962 64422 6.3.2.- Q9NT62 ATG3_HUMAN

ATG7_1-1 Ubiquitin-like modifier-activating enzyme ATG7 16935 10533 - O95352 ATG7_HUMAN

BAG1_1-1 BAG family molecular chaperone regulator 1 937 573 - Q99933 BAG1_HUMAN

BAG6_1-1 Large proline-rich protein BAG6 13919 7917 - P46379 BAG6_HUMAN

BMI1-1 Polycomb complex protein BMI-1 1066 648 - P35226 BMI1_HUMAN

BRAF-1/-2 Serine/threonine-protein kinase B-raf 1097 673 2.7.11.1 P15056 BRAF_HUMAN

CLK4 Dual specificity protein kinase CLK4 13659 57396 2.7.12.1 Q9HAZ1 CLK4_HUMAN

DCDC1 Doublecortin domain-containing protein 1 20625 341019 - P59894 DCDC1_HUMAN

DCDC2 Doublecortin domain-containing protein 2 18141 51473 - Q9UHG0 DCDC2_HUMAN

DCDC2B Doublecortin domain-containing protein 2B 32576 149069 - A2VCK2 DCD2B_HUMAN

DCDC2C Doublecortin domain-containing protein 2C 32696 728597 - A8MYV0 DCD2C_HUMAN

DCDC5 Doublecortin domain-containing protein 5 24799 100506627 - Q6ZRR9 DCDC5_HUMAN

DCLK1 Serine/threonine-protein kinase DCLK1 2700 9201 2.7.11.1 O15075-2 DCLK1_HUMAN

DCLK2 Serine/threonine-protein kinase DCLK2 19002 166614 2.7.11.1 Q8N568 DCLK2_HUMAN

DCX Neuronal migration protein doublecortin 2714 1641 - O43602 DCX_HUMAN

DDI1-1 Protein DDI1 homolog 1 18961 414301 - Q8WTU0 DDI1_HUMAN

DDI2_1-1 Protein DDI1 homolog 2 24578 84301 - Q5TDH0 DDI2_HUMAN

DGKQ Diacylglycerol kinase theta 2856 1609 2.7.1.107 P52824 DGKQ_HUMAN

EPB41L1_1-1 Band 4.1-like protein 1 3378 2036 - Q9H4G0 E41L1_HUMAN

EPB41L2-1 Band 4.1-like protein 2 3379 2037 - O43491 E41L2_HUMAN

EPB41L3_1-1 Band 4.1-like protein 3 3380 23136 - Q9Y2J2 E41L3_HUMAN

EPB41L4A Band 4.1-like protein 4A 13278 64097 - Q9HCS5 E41LA_HUMAN

EPB41L4B_1 Band 4.1-like protein 4B 19818 54566 - Q9H329 E41LB_HUMAN

EPB41L5_1-1 Band 4.1-like protein 5 19819 57669 - Q9HCM4 E41L5_HUMAN

FAF1_1-1 FAS-associated factor 1 3578 11124 - Q9UNN5 FAF1_HUMAN

FAF2-1 FAS-associated factor 2 24666 23197 - Q96CS3 FAF2_HUMAN

FARP2_1-1 FERM, RhoGEF and pleckstrin domain-containing protein 2 16460 9855 - O94887 FARP2_HUMAN

FAU_1-1 Ubiquitin-like protein FUBI 3597 2197 - P35544 UBIM_HUMAN

FRMD1_1-1 FERM domain-containing protein 1 21240 79981 - Q8N878 FRMD1_HUMAN

FRMD3_1-1/-2 FERM domain-containing protein 3 24125 257019 - A2A2Y4 FRMD3_HUMAN

FRMD4A_1-1 FERM domain-containing protein 4A 25491 55691 - Q9P2Q2 FRM4A_HUMAN

FRMD4B_1-1 FERM domain-containing protein 4B 24886 23150 - Q9Y2L6 FRM4B_HUMAN

FRMD5_1-1/-2 FERM domain-containing protein 5 28214 84978 - Q7Z6J6 FRMD5_HUMAN

FRMD6_1-1 FERM domain-containing protein 6 19839 122786 - Q96NE9 FRMD6_HUMAN

FRMD7_1-1 FERM domain-containing protein 7 8079 90167 - Q6ZUT3 FRMD7_HUMAN

FRMPD2_1-1 FERM and PDZ domain-containing protein 2 28572 143162 - Q68DX3 FRPD2_HUMAN

GABARAP Gamma-aminobutyric acid receptor-associated protein 4067 11337 - O95166 GBRAP_HUMAN

GABARAPL1_1-1 Gamma-aminobutyric acid receptor-associated protein-like 1 4068 23710 - Q9H0R8 GBRL1_HUMAN

GABARAPL2 Gamma-aminobutyric acid receptor-associated protein-like 2 13291 11345 - P60520 GBRL2_HUMAN

GRB10 Growth factor receptor-bound protein 10 4564 2887 - Q13322 GRB10_HUMAN



HERPUD1_1-1 Homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain member 1 protein 13744 9709 - Q15011 HERP1_HUMAN

HERPUD2_1-1 Homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain member 2 protein 21915 64224 - Q9BSE4 HERP2_HUMAN

HSPA13 Heat shock 70 kDa protein 13 11375 6782 - P48723 HSP13_HUMAN

IKBKB_1-1 Inhibitor of nuclear factor kappa-B kinase subunit beta 5960 3551 2.7.11.10 O14920 IKKB_HUMAN

IQUB_1-1 IQ and ubiquitin-like domain-containing protein 21995 154865 - Q8NA54 IQUB_HUMAN

ISG15_1-1/-2 Ubiquitin-like protein ISG15 4053 9636 - P05161 ISG15_HUMAN

MAP1LC3A_1-1 Microtubule-associated proteins 1A/1B light chain 3A 6838 84557 - Q9H492 MLP3A_HUMAN

MAP1LC3B Microtubule-associated proteins 1A/1B light chain 3B 13352 81631 - Q9GZQ8 MLP3B_HUMAN

MAP1LC3B2 Microtubule-associated proteins 1A/1B light chain 3 beta 2 34390 643246 - A6NCE7 MP3B2_HUMAN

MAP1LC3C Microtubule-associated proteins 1A/1B light chain 3C 13353 440738 - Q9BXW4 MLP3C_HUMAN

MDP1_1 Magnesium-dependent phosphatase 1 28781 145553 3.1.3.48 Q86V88 MGDP1_HUMAN

MIDN Midnolin 16298 90007 - Q504T8 MIDN_HUMAN

MLLT4_1 Afadin 7137 4301 - P55196 AFAD_HUMAN

MOCS2 Molybdopterin synthase sulfur carrier subunit 7193 4338 - O96033 MOC2A_HUMAN

MYLIP_1-1 E3 ubiquitin-protein ligase MYLIP 21155 29116 6.3.2.- Q8WY64 MYLIP_HUMAN

MYO9A_1 Unconventional myosin-IXa 7608 4649 - B2RTY4 MYO9A_HUMAN

MYO9B_1-1 Unconventional myosin-IXb 7609 4650 - Q13459 MYO9B_HUMAN

NAE1_1-1 NEDD8-activating enzyme E1 regulatory subunit 621 8883 - Q13564 ULA1_HUMAN

NCF2_1-1 Neutrophil cytosol factor 2 7661 4688 - P19878 NCF2_HUMAN

NEDD8 NEDD8 7732 4738 - Q15843 NEDD8_HUMAN

NF2_1 Merlin 7773 4771 - P35240 MERL_HUMAN

NFATC2IP_1 NFATC2-interacting protein 25906 84901 - Q8NCF5 NF2IP_HUMAN

NPLOC4_1 Nuclear protein localization protein 4 homolog 18261 55666 - Q8TAT6 NPL4_HUMAN

NSFL1C_1 NSFL1 cofactor p47 15912 55968 - Q9UNZ2 NSF1C_HUMAN

OASL_1 2'-5'-oligoadenylate synthase-like protein 8090 8638 - Q15646 OASL_HUMAN

PAN2_1-1/-2/-3 Retinol dehydrogenase 14 19979 57665 1.1.1.- Q9HBH5 RDH14_HUMAN

PARK2_1 E3 ubiquitin-protein ligase parkin 8607 5071 6.3.2.- O60260 PRKN2_HUMAN

PCGF1_1-1 Polycomb group RING finger protein 1 17615 84759 - Q9BSM1 PCGF1_HUMAN

PCGF2_1-1 Polycomb group RING finger protein 2 12929 7703 - P35227 PCGF2_HUMAN

PCGF3_1-1 Polycomb group RING finger protein 3 10066 10336 - Q3KNV8 PCGF3_HUMAN

PCGF5_1-1 Polycomb group RING finger protein 5 28264 84333 - Q86SE9 PCGF5_HUMAN

PCGF6_1-1 Polycomb group RING finger protein 6 21156 84108 - Q9BYE7 PCGF6_HUMAN

PIK3C2A Phosphatidylinositol 4-phosphate 3-kinase C2 domain-containing subunit alpha 8971 5286 2.7.1.154 O00443 P3C2A_HUMAN

PIK3C2B Phosphatidylinositol 4-phosphate 3-kinase C2 domain-containing subunit beta 8972 5287 2.7.1.154 O00750 P3C2B_HUMAN

PIK3CA Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform 8975 5290 2.7.1.153 P42336 PK3CA_HUMAN

PIK3CB Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform 8976 5291 2.7.1.153 P42338 PK3CB_HUMAN

PIK3CD Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform 8977 5293 2.7.1.153 O00329 PK3CD_HUMAN

PIK3CG Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit gamma isoform 8978 5294 2.7.1.153 P48736 PK3CG_HUMAN

PLXNC1_1-1/-2 Plexin-C1 9106 10154 - O60486 PLXC1_HUMAN

HELZ2_1-1/-2/-3 Helicase with zinc finger domain 2 (PRIC285) 30021 85441 3.6.4.- Q9BYK8 PR285_HUMAN

PTPN13_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 13 9646 5783 3.1.3.48 Q12923 PTN13_HUMAN

PTPN14 Tyrosine-protein phosphatase non-receptor type 14 9647 5784 3.1.3.48 Q15678 PTN14_HUMAN

PTPN21_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 21 9651 11099 3.1.3.48 Q16825 PTN21_HUMAN

PTPN3_1-1/-2 Tyrosine-protein phosphatase non-receptor type 3 9655 5774 3.1.3.48 P26045 PTN3_HUMAN

PTPN4_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 4 9656 5775 3.1.3.48 P29074 PTN4_HUMAN

RAD23A UV excision repair protein RAD23 homolog A 9812 5886 - P54725 RD23A_HUMAN

RAD23B UV excision repair protein RAD23 homolog B 9813 5887 - P54727 RD23B_HUMAN

RAF1_1 RAF proto-oncogene serine/threonine-protein kinase 9829 5894 2.7.11.1 P04049 RAF1_HUMAN

RALGDS_1-1/-2 Ral guanine nucleotide dissociation stimulator 9842 5900 - Q12967 GNDS_HUMAN

RAPGEF2 Rap guanine nucleotide exchange factor 2 16854 9693 - Q9Y4G8 RPGF2_HUMAN

RAPGEF4_1 Rap guanine nucleotide exchange factor 4 16626 11069 - Q8WZA2 RPGF4_HUMAN

RAPH1_1 Ras-associated and pleckstrin homology domains-containing protein 1 14436 65059 - Q70E73 RAPH1_HUMAN

RASIP1 Ras-interacting protein 1 24716 54922 - Q5U651 RAIN_HUMAN

RASSF1_1 Ras association domain-containing protein 1 9882 11186 - Q9NS23 RASF1_HUMAN

RASSF2 Ras association domain-containing protein 2 9883 9770 - P50749 RASF2_HUMAN

RASSF3_1 Ras association domain-containing protein 3 14271 283349 - Q86WH2 RASF3_HUMAN

RASSF4_1 Ras association domain-containing protein 4 20793 83937 - Q9H2L5 RASF4_HUMAN

RASSF5_1 Ras association domain-containing protein 5 17609 83593 - Q8WWW0 RASF5_HUMAN

RASSF6_1 Ras association domain-containing protein 6 20796 166824 - Q6ZTQ3 RASF6_HUMAN

RASSF7_1 Ras association domain-containing protein 7 1166 8045 - Q02833 RASF7_HUMAN

RASSF8_1 Ras association domain-containing protein 8 13232 11228 - Q8NHQ8 RASF8_HUMAN

RASSF9 Ras association domain-containing protein 9 15739 9182 - O75901 RASF9_HUMAN

RBCK1_1-1/-2 RanBP-type and C3HC4-type zinc finger-containing protein 1 15864 10616 6.3.2.- Q9BYM8 HOIL1_HUMAN

RDX_1-1 Radixin 9944 5962 - P35241 RADI_HUMAN

RGL1_1-1 Ral guanine nucleotide dissociation stimulator-like 1 30281 23179 - Q9NZL6 RGL1_HUMAN

RGL2_1-1 Ral guanine nucleotide dissociation stimulator-like 2 9769 5863 - O15211 RGL2_HUMAN

RGL3_1-1 Ral guanine nucleotide dissociation stimulator-like 3 30282 57139 - Q3MIN7 RGL3_HUMAN

RGS12_1 Regulator of G-protein signaling 12 9994 6002 - O14924 RGS12_HUMAN

RGS14_1 Regulator of G-protein signaling 14 9996 10636 - O43566 RGS14_HUMAN

RIN1_1 Ras and Rab interactor 1 18749 9610 - Q13671 RIN1_HUMAN

RIN2_1 Ras and Rab interactor 2 18750 54453 - Q8WYP3 RIN2_HUMAN

RIN3_1 Ras and Rab interactor 3 18751 79890 - Q8TB24 RIN3_HUMAN

RING1_1-1/-2 E3 ubiquitin-protein ligase RING1 10018 6015 6.3.2.- Q06587 RING1_HUMAN

RING2_1-1 E3 ubiquitin-protein ligase RING2 10061 6045 6.3.2.- Q99496 RING2_HUMAN

RP1 Oxygen-regulated protein 1 10263 6101 - P56715 RP1_HUMAN

RP1L1_1 Retinitis pigmentosa 1-like 1 protein 15946 94137 - Q8IWN7 RP1L1_HUMAN

RPS27A_1-1 Ubiquitin-40S ribosomal protein S27a 10417 6233 - P62979 RS27A_HUMAN

RSG1_1-1/-2 REM2- and Rab-like small GTPase 1 28127 79363 - Q9BU20 RSG1_HUMAN

SACS_1 Sacsin 10519 26278 - Q9NZJ4 SACS_HUMAN

SAE1_1-1 SUMO-activating enzyme subunit 1 30660 10055 - Q9UBE0 SAE1_HUMAN

SAE2 SUMO-activating enzyme subunit 2 30661 10054 6.3.2.- Q9UBT2 SAE2_HUMAN

SF3A1_1-1 Splicing factor 3A subunit 1 10765 10291 - Q15459 SF3A1_HUMAN

SHARPIN_1-1/-2 Sharpin 25321 81858 - Q9H0F6 SHRPN_HUMAN

SHROOM1 Shroom1 24084 134549 - Q2M3G4 SHRM1_HUMAN

SNRNP25 U11/U12 small nuclear ribonucleoprotein 25 kDa protein 14161 79622 - Q9BV90 SNR25_HUMAN

SNX27_1 Sorting nexin-27 20073 81609 - Q96L92 SNX27_HUMAN

SNX31_1 Sorting nexin-31 28605 169166 - Q8N9S9 SNX31_HUMAN

SUMO1_1-1/-2 Small ubiquitin-related modifier 1 12502 7341 - P63165 SUMO1_HUMAN

SUMO2_1-1 Small ubiquitin-related modifier 2 11125 6613 - P61956 SUMO2_HUMAN

SUMO3_1-1 Small ubiquitin-related modifier 3 11124 6612 - P55854 SUMO3_HUMAN

SUMO4_1-1 Small ubiquitin-related modifier 4 21181 387082 - Q6EEV6 SUMO4_HUMAN

TBCB_1-1 Tubulin-folding cofactor B 1989 1155 - Q99426 TBCB_HUMAN

TBCE Tubulin-specific chaperone E 11582 6905 - Q15813 TBCE_HUMAN

TBCEL Tubulin-specific chaperone cofactor E-like protein 28115 219899 - Q5QJ74 TBCEL_HUMAN

TCEB2_1-1 Transcription elongation factor B polypeptide 2 11619 6923 - Q15370 ELOB_HUMAN

TECR_1 Very-long-chain enoyl-CoA reductase 4551 9524 1.3.1.93 Q9NZ01 TECR_HUMAN

TIAM1 T-lymphoma invasion and metastasis-inducing protein 1 11805 7074 - Q13009 TIAM1_HUMAN

TIAM2_1 T-lymphoma invasion and metastasis-inducing protein 2 11806 26230 - Q8IVF5 TIAM2_HUMAN

TMUB1_1-1 Transmembrane and ubiquitin-like domain-containing protein 1 21709 83590 - Q9BVT8 TMUB1_HUMAN

TMUB2_1-1 Transmembrane and ubiquitin-like domain-containing protein 2 28459 79089 - Q71RG4 TMUB2_HUMAN

UBA1 Ubiquitin-like modifier-activating enzyme 1 12469 7317 - P22314 UBA1_HUMAN

UBA3_1 NEDD8-activating enzyme E1 catalytic subunit 12470 9039 6.3.2.- Q8TBC4 UBA3_HUMAN

UBA5_1 Ubiquitin-like modifier-activating enzyme 5 23230 79876 - Q9GZZ9 UBA5_HUMAN

UBA6_1 Ubiquitin-like modifier-activating enzyme 6 25581 55236 - A0AVT1 UBA6_HUMAN

UBA7 Ubiquitin-like modifier-activating enzyme 7 12471 7318 - P41226 UBA7_HUMAN

UBA52_1-1 Ubiquitin-60S ribosomal protein L40 12458 7311 - P62987 RL40_HUMAN

UBAC1 Ubiquitin-associated domain-containing protein 1 30221 10422 - Q9BSL1 UBAC1_HUMAN

UBB_1-1 Polyubiquitin-B 12463 7314 - P0CG47 UBB_HUMAN

UBC_1-1 Polyubiquitin-C 12468 7316 - P0CG48 UBC_HUMAN

UBD_1-1/-2 Ubiquitin D 18795 10537 - O15205 UBD_HUMAN

UBFD1_1-1/-2 Ubiquitin domain-containing protein UBFD1 30565 56061 - O14562 UBFD1_HUMAN

UBIML_1-1 Putative ubiquitin-like protein FUBI-like protein ENSP00000310146 - - - A6NDN8 UBIML_HUMAN

UBL3_1-1 Ubiquitin-like protein 3 12504 5412 - O95164 UBL3_HUMAN

UBL4A_1-1 Ubiquitin-like protein 4A 12505 8266 - P11441 UBL4A_HUMAN

UBL4B_1-1 Ubiquitin-like protein 4B 32309 164153 - Q8N7F7 UBL4B_HUMAN

UBL5_1-1 Ubiquitin-like protein 5 13736 59286 - Q9BZL1 UBL5_HUMAN

UBL7_1-1 Ubiquitin-like protein 7 28221 84993 - Q96S82 UBL7_HUMAN

UBLCP1_1-1/-2/-3 Ubiquitin-like domain-containing CTD phosphatase 1 28110 134510 3.1.3.16 Q8WVY7 UBCP1_HUMAN

UBQLN1_1-1 Ubiquilin-1 12508 29979 - Q9UMX0 UBQL1_HUMAN

UBQLN2_1-1 Ubiquilin-2 12509 29978 - Q9UHD9 UBQL2_HUMAN

UBQLN3_1-1 Ubiquilin-3 12510 50613 - Q9H347 UBQL3_HUMAN

UBQLN4_1-1 Ubiquilin-4 1237 56893 - Q9NRR5 UBQL4_HUMAN

UBQLNL_1-1 Ubiquilin-like protein 28294 143630 - Q8IYU4 UBQLN_HUMAN

UBTD1_1-1 Ubiquitin domain-containing protein 1 25683 80019 - Q9HAC8 UBTD1_HUMAN

UBTD2_1-1 Ubiquitin domain-containing protein 2 24463 92181 - Q8WUN7 UBTD2_HUMAN

UBXN1_1-1 UBX domain-containing protein 1 18402 51035 - Q04323 UBXN1_HUMAN

UBXN2A_1-1/-2 UBX domain-containing protein 2A 27265 165324 - P68543 UBX2A_HUMAN

UBXN2B_1-1/-2 UBX domain-containing protein 2B 27035 137886 - Q14CS0 UBX2B_HUMAN

UBXN4_1-1/-2 UBX domain-containing protein 4 14860 23190 - Q92575 UBXN4_HUMAN

UBXN6_1-1/-2 UBX domain-containing protein 6 14928 80700 - Q9BZV1 UBXN6_HUMAN

UBXN7_1-1/-2 UBX domain-containing protein 7 29119 26043 - O94888 UBXN7_HUMAN

UBXN8_1-1 UBX domain-containing protein 8 30307 7993 - O00124 UBXN8_HUMAN

UBXN10_1-1 UBX domain-containing protein 10 26354 127733 - Q96LJ8 UBX10_HUMAN

UBXN11_1 UBX domain-containing protein 11 30600 91544 - Q5T124 UBX11_HUMAN

UFM1_1-1 Ubiquitin-fold modifier 1 20597 51569 - P61960 UFM1_HUMAN

UHRF1_1-1 E3 ubiquitin-protein ligase UHRF1 12556 29128 6.3.2.- Q96T88 UHRF1_HUMAN

UHRF1BP1 UHRF1-binding protein 1 21216 54887 - Q6BDS2 URFB1_HUMAN

UHRF2_1-1 E3 ubiquitin-protein ligase UHRF2 12557 115426 6.3.2.- Q96PU4 UHRF2_HUMAN

URM1_1-1 Ubiquitin-related modifier 1 28378 81605 - Q9BTM9 URM1_HUMAN

USP11_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 11 12609 8237 3.4.19.12 P51784 UBP11_HUMAN

USP14_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 14 12612 9097 3.4.19.12 P54578 UBP14_HUMAN

USP15_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 15 12613 9958 3.4.19.12 Q9Y4E8 UBP15_HUMAN

USP20_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 20 12619 10868 3.4.19.12 Q9Y2K6 UBP20_HUMAN

USP21_1-1 Ubiquitin carboxyl-terminal hydrolase 21 12620 27005 3.4.19.12 Q9UK80 UBP21_HUMAN

USP24_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 24 12623 23358 3.4.19.12 Q9UPU5 UBP24_HUMAN

USP25_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 25 12624 29761 3.4.19.12 Q9UHP3 UBP25_HUMAN

USP28_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 28 12625 57646 3.4.19.12 Q96RU2 UBP28_HUMAN

USP31_1-1 Ubiquitin carboxyl-terminal hydrolase 31 20060 57478 3.4.19.12 Q70CQ4 UBP31_HUMAN

USP32 Ubiquitin carboxyl-terminal hydrolase 32 19143 84669 3.4.19.12 Q8NFA0 UBP32_HUMAN

USP34 Ubiquitin carboxyl-terminal hydrolase 34 20066 9736 3.4.19.12 Q70CQ2 UBP34_HUMAN

USP4_1-1/-2/-3/-4 Ubiquitin carboxyl-terminal hydrolase 4 12627 7375 3.4.19.12 Q13107 UBP4_HUMAN

USP40_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 40 20069 55230 3.4.19.12 Q9NVE5 UBP40_HUMAN

USP43 Ubiquitin carboxyl-terminal hydrolase 43 20072 124739 3.4.19.12 Q70EL4 UBP43_HUMAN

USP47 Ubiquitin carboxyl-terminal hydrolase 47 20076 55031 3.4.19.12 Q96K76 UBP47_HUMAN

USP48_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 48 18533 84196 3.4.19.12 Q86UV5 UBP48_HUMAN

USP5_1-1 Ubiquitin carboxyl-terminal hydrolase 5 12628 8078 3.4.19.12 P45974 UBP5_HUMAN

USP6 Ubiquitin carboxyl-terminal hydrolase 6 12629 9098 3.4.19.12 P35125 UBP6_HUMAN

USP7 Ubiquitin carboxyl-terminal hydrolase 7 12630 7874 3.4.19.12 Q93009 UBP7_HUMAN

USP8_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 8 12631 9101 3.4.19.12 P40818 UBP8_HUMAN

USP9X_1-1/-2/-3 Probable ubiquitin carboxyl-terminal hydrolase FAF-X 12632 8239 3.4.19.12 Q93008 USP9X_HUMAN

USP9Y_1-1/-2/-3 Probable ubiquitin carboxyl-terminal hydrolase FAF-Y 12633 8287 3.4.19.12 O00507 USP9Y_HUMAN

VCPIP1_1-1/-2/-3 Deubiquitinating protein VCIP135 30897 80124 3.4.19.12 Q96JH7 VCIP1_HUMAN

WDR48_1-1/-2 WD repeat-containing protein 48 30914 57599 - Q8TAF3 WDR48_HUMAN

YOD1_1-1 Ubiquitin thioesterase OTU1 25035 55432 3.4.19.12 Q5VVQ6 OTU1_HUMAN

Appendix II: All human genes & isoforms that encode ubiquitin-like domains.

Gene Name Protein Name UniProt ID UniProt Name

ANKRD60 Ankyrin repeat domain-containing protein 60 Q9BZ19 ANR60_HUMAN

ANKUB1-1/-2/-3 ANKUB1 A6NFN9 ANKUB_HUMAN

ANUBL1-1 AN1-type zinc finger protein 4 Q86XD8 ZFAN4_HUMAN

APBB1IP Amyloid beta A4 precursor protein-binding family B member 1-interacting protein Q7Z5R6 AB1IP_HUMAN

ARAF-1 Serine/threonine-protein kinase A-Raf P10398 ARAF_HUMAN

ARAP1 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain-containing protein 1 Q96P48 ARAP1_HUMAN

ARAP2 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain-containing protein 2 Q8WZ64 ARAP2_HUMAN

ARAP3 Arf-GAP with Rho-GAP domain, ANK repeat and PH domain-containing protein 3 Q8WWN8 ARAP3_HUMAN

ARHGAP20 Rho GTPase-activating protein 20 Q9P2F6 RHG20_HUMAN

ASPSCR1_1-1 Tether containing UBX domain for GLUT4 Q9BZE9 ASPC1_HUMAN

ASPSCR1_2-1 Q9BZE9-2 ASPC1_HUMAN

ASPSCR1_3 Q9BZE9-3 ASPC1_HUMAN

ATG12 Ubiquitin-like protein ATG12 O94817 ATG12_HUMAN

ATG3_1-1 Ubiquitin-like-conjugating enzyme ATG3 Q9NT62 ATG3_HUMAN

ATG7_1-1 Ubiquitin-like modifier-activating enzyme ATG7 O95352 ATG7_HUMAN

ATG7_2-1 O95352-2 ATG7_HUMAN

ATG7_3-1 O95352-3 ATG7_HUMAN

BAG1_1-1 BAG family molecular chaperone regulator 1 Q99933 BAG1_HUMAN

BAG1_2-1 Q99933-2 BAG1_HUMAN



BAG6_1-1 Large proline-rich protein BAG6 P46379 BAG6_HUMAN

BAG6_2-1 P46379-2 BAG6_HUMAN

BAG6_3-1 P46379-3 BAG6_HUMAN

BMI1_1-1 Polycomb complex protein BMI-1 P35226 BMI1_HUMAN

BRAF_1-1/-2 Serine/threonine-protein kinase B-raf P15056 BRAF_HUMAN

CLK4_1-1 Dual specificity protein kinase CLK4 Q9HAZ1 CLK4_HUMAN

DCDC1_1-1 Doublecortin domain-containing protein 1 P59894 DCDC1_HUMAN

DCDC2_1-1 Doublecortin domain-containing protein 2 Q9UHG0 DCDC2_HUMAN

DCDC2B_1-1 Doublecortin domain-containing protein 2B A2VCK2 DCD2B_HUMAN

DCDC2C_1-1 Doublecortin domain-containing protein 2C A8MYV0 DCD2C_HUMAN

DCDC5 Doublecortin domain-containing protein 5 Q6ZRR9 DCDC5_HUMAN

DCLK1_1-1 Serine/threonine-protein kinase DCLK1 O15075-2 DCLK1_HUMAN

DCLK2_1-1 Serine/threonine-protein kinase DCLK2 Q8N568 DCLK2_HUMAN

DCX_1-1 Neuronal migration protein doublecortin O43602 DCX_HUMAN

DCX_2-1 O43602-2 DCX_HUMAN

DDI1_1-1 Protein DDI1 homolog 1 Q8WTU0 DDI1_HUMAN

DDI2_1-1 Protein DDI1 homolog 2 Q5TDH0 DDI2_HUMAN

DDI2_2-1 Q5TDH0-2 DDI2_HUMAN

DDI2_3-1 Q5TDH0-3 DDI2_HUMAN

DGKQ Diacylglycerol kinase theta P52824 DGKQ_HUMAN

EPB41_1-1 P11171 DGKQ_HUMAN

EPB41_2-1 P11171-2 DGKQ_HUMAN





EPB41L1_1-1 Band 4.1-like protein 1 Q9H4G0 E41L1_HUMAN

EPB41L1_2-1 Q9H4G0 E41L1_HUMAN



EPB41L2-1 Band 4.1-like protein 2 O43491 E41L2_HUMAN

EPB41L3_1-1 Band 4.1-like protein 3 Q9Y2J2 E41L3_HUMAN

EPB41L3_2-1 Q9Y2J2 E41L3_HUMAN

EPB41L3_3-1 Q9Y2J2 E41L3_HUMAN

EPB41L4A Band 4.1-like protein 4A Q9HCS5 E41LA_HUMAN

EPB41L4B_1 Band 4.1-like protein 4B Q9H329 E41LB_HUMAN

EPB41L4B_2 Q9H329 E41LB_HUMAN

EPB41L5_1-1 Band 4.1-like protein 5 Q9HCM4 E41L5_HUMAN

EPB41L5_2-1 Q9HCM4 E41L5_HUMAN



FAF1_1-1 FAS-associated factor 1 Q9UNN5 FAF1_HUMAN

FAF1_2-1 Q9UNN5 FAF1_HUMAN

FAF2-1 FAS-associated factor 2 Q96CS3 FAF2_HUMAN

FARP2_1-1 FERM, RhoGEF and pleckstrin domain-containing protein 2 O94887 FARP2_HUMAN

FARP2_2-1 O94887 FARP2_HUMAN

FAU_1-1 Ubiquitin-like protein FUBI P35544 UBIM_HUMAN

FRMD1_1-1 FERM domain-containing protein 1 Q8N878 FRMD1_HUMAN

FRMD1_2-1/-2 Q8N878 FRMD1_HUMAN

FRMD3_1-1/-2 FERM domain-containing protein 3 A2A2Y4 FRMD3_HUMAN

FRMD3_2-1/-2 A2A2Y4-2 FRMD3_HUMAN


FRMD3_4-1 A2A2Y4-4 FRMD3_HUMAN






FRMD4A_1-1 FERM domain-containing protein 4A Q9P2Q2 FRM4A_HUMAN

FRMD4B_1-1 FERM domain-containing protein 4B Q9Y2L6 FRM4B_HUMAN

FRMD5_1-1/-2 FERM domain-containing protein 5 Q7Z6J6 FRMD5_HUMAN

FRMD5_2-1 Q7Z6J6-2 FRMD5_HUMAN

FRMD6_1-1 FERM domain-containing protein 6 Q96NE9 FRMD6_HUMAN

FRMD6_2-1 Q96NE9-2 FRMD6_HUMAN

FRMD7_1-1 FERM domain-containing protein 7 Q6ZUT3 FRMD7_HUMAN

FRMPD2_1-1 FERM and PDZ domain-containing protein 2 Q68DX3 FRPD2_HUMAN

FRMPD2_2-1 Q68DX3-2 FRPD2_HUMAN

FRMPD2_4-1/-2 Q68DX3-4 FRPD2_HUMAN

FRMPD2_5-1 Q68DX3-5 FRPD2_HUMAN

GABARAP Gamma-aminobutyric acid receptor-associated protein O95166 GBRAP_HUMAN

GABARAPL1_1-1 Gamma-aminobutyric acid receptor-associated protein-like 1 Q9H0R8 GBRL1_HUMAN

GABARAPL1_2-1 Q9H0R8-2 GBRL1_HUMAN

GABARAPL2 Gamma-aminobutyric acid receptor-associated protein-like 2 P60520 GBRL2_HUMAN

GRB10 Growth factor receptor-bound protein 10 Q13322 GRB10_HUMAN



HERPUD1_1-1 Homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain member 1 protein Q15011 HERP1_HUMAN

HERPUD1_2-1 Q15011-2 HERP1_HUMAN

HERPUD1_3-1 Q15011-3 HERP1_HUMAN

HERPUD2_1-1 Homocysteine-responsive endoplasmic reticulum-resident ubiquitin-like domain member 2 protein Q9BSE4 HERP2_HUMAN

HSPA13 Heat shock 70 kDa protein 13 P48723 HSP13_HUMAN

IKBKB_1-1 Inhibitor of nuclear factor kappa-B kinase subunit beta O14920 IKKB_HUMAN

IQUB_1-1 IQ and ubiquitin-like domain-containing protein Q8NA54 IQUB_HUMAN

IQUB_2-1 Q8NA54-2 IQUB_HUMAN

ISG15_1-1/-2 Ubiquitin-like protein ISG15 P05161 ISG15_HUMAN

MAP1LC3A_1-1 Microtubule-associated proteins 1A/1B light chain 3A Q9H492 MLP3A_HUMAN

MAP1LC3A_2-1 Q9H492-2 MLP3A_HUMAN

MAP1LC3B Microtubule-associated proteins 1A/1B light chain 3B Q9GZQ8 MLP3B_HUMAN

MAP1LC3B2 Microtubule-associated proteins 1A/1B light chain 3 beta 2 A6NCE7 MP3B2_HUMAN

MAP1LC3C Microtubule-associated proteins 1A/1B light chain 3C Q9BXW4 MLP3C_HUMAN

MDP1_1 Magnesium-dependent phosphatase 1 Q86V88 MGDP1_HUMAN

MDP1_2 Q86V88 MGDP1_HUMAN

MDP1_3 Q86V88 MGDP1_HUMAN

MIDN Midnolin Q504T8 MIDN_HUMAN

MLLT4_1 Afadin P55196 AFAD_HUMAN

MLLT4_2 P55196-2 AFAD_HUMAN





MOCS2 Molybdopterin synthase sulfur carrier subunit O96033 MOC2A_HUMAN

MYLIP_1-1 E3 ubiquitin-protein ligase MYLIP Q8WY64 MYLIP_HUMAN

MYLIP_2-1 Q8WY64 MYLIP_HUMAN

MYO9A_1 Unconventional myosin-IXa B2RTY4 MYO9A_HUMAN

MYO9A_2 B2RTY4 MYO9A_HUMAN



MYO9B_1-1 Unconventional myosin-IXb Q13459 MYO9B_HUMAN

MYO9B_2-1 Q13459 MYO9B_HUMAN

NAE1_1-1 NEDD8-activating enzyme E1 regulatory subunit Q13564 ULA1_HUMAN

NAE1_2-1 Q13564 ULA1_HUMAN

NCF2_1-1 Neutrophil cytosol factor 2 P19878 NCF2_HUMAN

NEDD8 NEDD8 Q15843 NEDD8_HUMAN

NF2_1 Merlin P35240 MERL_HUMAN

NF2_2 P35240 MERL_HUMAN







NFATC2IP_1 NFATC2-interacting protein Q8NCF5 NF2IP_HUMAN

NFATC2IP_2 Q8NCF5-2 NF2IP_HUMAN

NFATC2IP_3 Q8NCF5-3 NF2IP_HUMAN

NPLOC4_1 Nuclear protein localization protein 4 homolog Q8TAT6 NPL4_HUMAN

NPLOC4_2 Q8TAT6 NPL4_HUMAN

NSFL1C_1 NSFL1 cofactor p47 Q9UNZ2 NSF1C_HUMAN

NSFL1C_2 Q9UNZ2 NSF1C_HUMAN



OASL_1 2'-5'-oligoadenylate synthase-like protein Q15646 OASL_HUMAN

OASL_2 Q15646-2 OASL_HUMAN

PAN2_1-1/-2/-3 Retinol dehydrogenase 14 Q9HBH5 RDH14_HUMAN

PAN2_2-1/-2/-3 Q9HBH5-2 RDH14_HUMAN

PAN2_3-1/-2/-3 Q9HBH5-3 RDH14_HUMAN

PARK2_1 E3 ubiquitin-protein ligase parkin O60260 PRKN2_HUMAN

PARK2_2 O60260-2 PRKN2_HUMAN

PARK2_3 O60260-3 PRKN2_HUMAN

PARK2_4 O60260-4 PRKN2_HUMAN



PCGF1_1-1 Polycomb group RING finger protein 1 Q9BSM1 PCGF1_HUMAN

PCGF1_2-1 Q9BSM1-2 PCGF1_HUMAN

PCGF2_1-1 Polycomb group RING finger protein 2 P35227 PCGF2_HUMAN

PCGF3_1-1 Polycomb group RING finger protein 3 Q3KNV8 PCGF3_HUMAN

PCGF3_2-1 Q3KNV8-2 PCGF3_HUMAN

PCGF5_1-1 Polycomb group RING finger protein 5 Q86SE9 PCGF5_HUMAN

PCGF6_1-1 Polycomb group RING finger protein 6 Q9BYE7 PCGF6_HUMAN

PCGF6_2-1/-2 Q9BYE7-2 PCGF6_HUMAN

PIK3C2A Phosphatidylinositol 4-phosphate 3-kinase C2 domain-containing subunit alpha O00443 P3C2A_HUMAN

PIK3C2B Phosphatidylinositol 4-phosphate 3-kinase C2 domain-containing subunit beta O00750 P3C2B_HUMAN

PIK3CA Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform P42336 PK3CA_HUMAN

PIK3CB Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform P42338 PK3CB_HUMAN

PIK3CD Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform O00329 PK3CD_HUMAN

PIK3CG Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit gamma isoform P48736 PK3CG_HUMAN

PLXNC1_1-1/-2 Plexin-C1 O60486 PLXC1_HUMAN

HELZ2_1-1/-2/-3 Helicase with zinc finger domain 2 (PRIC285) Q9BYK8 PR285_HUMAN

HELZ2_2-1/-2 Q9BYK8-2 PR285_HUMAN

HELZ2_3 Q9BYK8-3 PR285_HUMAN

PTPN13_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 13 Q12923 PTN13_HUMAN

PTPN13_2-1/-2/-3 Q12923-2 PTN13_HUMAN

PTPN13_3-1/-2/-3/-4/-5/-6/-7/-8/-9 Q12923-3 PTN13_HUMAN

PTPN13_4-1/-2/-3/-4/-5/-6/-7/-8/-9/-10 Q12923-4 PTN13_HUMAN

PTPN14_1-1/-2/-3/-4 Tyrosine-protein phosphatase non-receptor type 14 Q15678 PTN14_HUMAN

PTPN21_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 21 Q16825 PTN21_HUMAN

PTPN3_1-1/-2 Tyrosine-protein phosphatase non-receptor type 3 P26045 PTN3_HUMAN

PTPN4_1-1/-2/-3 Tyrosine-protein phosphatase non-receptor type 4 P29074 PTN4_HUMAN

RAD23A UV excision repair protein RAD23 homolog A P54725 RD23A_HUMAN

RAD23B UV excision repair protein RAD23 homolog B P54727 RD23B_HUMAN

RAF1_1 RAF proto-oncogene serine/threonine-protein kinase P04049 RAF1_HUMAN

RAF1_2 P04049-2 RAF1_HUMAN

RALGDS_1-1/-2 Ral guanine nucleotide dissociation stimulator Q12967 GNDS_HUMAN

RALGDS_2-1/-2/-3 Q12967-2 GNDS_HUMAN

RALGDS_3 Q12967 GNDS_HUMAN

RAPGEF2 Rap guanine nucleotide exchange factor 2 Q9Y4G8 RPGF2_HUMAN

RAPGEF4_1 Rap guanine nucleotide exchange factor 4 Q8WZA2 RPGF4_HUMAN

RAPGEF4_2 Q8WZA2 RPGF4_HUMAN

RAPGEF4_3 Q8WZA2 RPGF4_HUMAN

RAPH1_1 Ras-associated and pleckstrin homology domains-containing protein 1 Q70E73 RAPH1_HUMAN

RAPH1_2 Q70E73 RAPH1_HUMAN








RASIP1 Ras-interacting protein 1 Q5U651 RAIN_HUMAN

RASSF1_1 Ras association domain-containing protein 1 Q9NS23 RASF1_HUMAN

RASSF1_2 Q9NS23 RASF1_HUMAN




RASSF2 Ras association domain-containing protein 2 P50749 RASF2_HUMAN

RASSF3_1 Ras association domain-containing protein 3 Q86WH2 RASF3_HUMAN

RASSF4_1 Ras association domain-containing protein 4 Q9H2L5 RASF4_HUMAN

RASSF4_2 Q9H2L5-2 RASF4_HUMAN



RASSF5_1 Ras association domain-containing protein 5 Q8WWW0 RASF5_HUMAN

RASSF5_2 Q8WWW0 RASF5_HUMAN



RASSF6_1 Ras association domain-containing protein 6 Q6ZTQ3 RASF6_HUMAN

RASSF6_2 Q6ZTQ3 RASF6_HUMAN



RASSF7_1 Ras association domain-containing protein 7 Q02833 RASF7_HUMAN

RASSF7_2 Q02833 RASF7_HUMAN

RASSF8_1 Ras association domain-containing protein 8 Q8NHQ8 RASF8_HUMAN

RASSF8_2 Q8NHQ8 RASF8_HUMAN

RASSF9 Ras association domain-containing protein 9 O75901 RASF9_HUMAN

RBCK1_1-1/-2 RanBP-type and C3HC4-type zinc finger-containing protein 1 Q9BYM8 HOIL1_HUMAN

RBCK1_2-1/-2 Q9BYM8 HOIL1_HUMAN

RBCK1_2-2 Q9BYM8 HOIL1_HUMAN

RBCK1_3-1 Q9BYM8 HOIL1_HUMAN

RDX_1-1 Radixin P35241 RADI_HUMAN

RGL1_1-1 Ral guanine nucleotide dissociation stimulator-like 1 Q9NZL6 RGL1_HUMAN

RGL1_2-1 Q9NZL6-2 RGL1_HUMAN

RGL2_1-1 Ral guanine nucleotide dissociation stimulator-like 2 O15211 RGL2_HUMAN

RGL3_1-1 Ral guanine nucleotide dissociation stimulator-like 3 Q3MIN7 RGL3_HUMAN

RGS12_1 Regulator of G-protein signaling 12 O14924 RGS12_HUMAN

RGS12_2 O14924-2 RGS12_HUMAN



RGS14_1 Regulator of G-protein signaling 14 O43566 RGS14_HUMAN




RIN1_1 Ras and Rab interactor 1 Q13671 RIN1_HUMAN

RIN1_2 Q13671-2 RIN1_HUMAN

RIN2_1 Ras and Rab interactor 2 Q8WYP3 RIN2_HUMAN

RIN2_2 Q8WYP3-2 RIN2_HUMAN

RIN3_1 Ras and Rab interactor 3 Q8TB24 RIN3_HUMAN

RING1_1-1/-2 E3 ubiquitin-protein ligase RING1 Q06587 RING1_HUMAN

RING1_2-1/-2 Q06587-2 RING1_HUMAN

RING2_1-1 E3 ubiquitin-protein ligase RING2 Q99496 RING2_HUMAN

RP1_1-1 Oxygen-regulated protein 1 P56715 RP1_HUMAN

RP1L1_1 Retinitis pigmentosa 1-like 1 protein Q8IWN7 RP1L1_HUMAN

RP1L1_2 Q8IWN7 RP1L1_HUMAN

RPS27A_1-1 Ubiquitin-40S ribosomal protein S27a P62979 RS27A_HUMAN

RSG1_1-1/-2 REM2- and Rab-like small GTPase 1 Q9BU20 RSG1_HUMAN

SACS_1 Sacsin Q9NZJ4 SACS_HUMAN

SACS_2 Q9NZJ4 SACS_HUMAN

SAE1_1-1 SUMO-activating enzyme subunit 1 Q9UBE0 SAE1_HUMAN

SAE1_2-1 Q9UBE0-2 SAE1_HUMAN

SAE1_3-1 Q9UBE0-3 SAE1_HUMAN

SAE2_1-1 SUMO-activating enzyme subunit 2 Q9UBT2 SAE2_HUMAN

SF3A1_1-1 Splicing factor 3A subunit 1 Q15459 SF3A1_HUMAN

SHARPIN_1-1/-2 Sharpin Q9H0F6 SHRPN_HUMAN

SHARPIN_2-1/-2 Q9H0F6-2 SHRPN_HUMAN

SHARPIN_3-1 Q9H0F6-3 SHRPN_HUMAN

SHROOM1_1 Shroom1 Q2M3G4 SHRM1_HUMAN

SHROOM1_2 Q2M3G4-2 SHRM1_HUMAN

SNRNP25 U11/U12 small nuclear ribonucleoprotein 25 kDa protein Q9BV90 SNR25_HUMAN

SNX27_1 Sorting nexin-27 Q96L92 SNX27_HUMAN

SNX27_2 Q96L92 SNX27_HUMAN

SNX27_3 Q96L92 SNX27_HUMAN

SNX31_1 Sorting nexin-31 Q8N9S9 SNX31_HUMAN

SNX31_2 Q8N9S9-2 SNX31_HUMAN

SUMO1_1-1/-2 Small ubiquitin-related modifier 1 P63165 SUMO1_HUMAN

SUMO2_1-1 Small ubiquitin-related modifier 2 P61956 SUMO2_HUMAN

SUMO2_2-1 P61956 SUMO2_HUMAN

SUMO3_1-1 Small ubiquitin-related modifier 3 P55854 SUMO3_HUMAN

SUMO4_1-1 Small ubiquitin-related modifier 4 Q6EEV6 SUMO4_HUMAN

TBCB_1-1 Tubulin-folding cofactor B Q99426 TBCB_HUMAN

TBCE Tubulin-specific chaperone E Q15813 TBCE_HUMAN

TBCEL Tubulin-specific chaperone cofactor E-like protein Q5QJ74 TBCEL_HUMAN

TCEB2_1-1 Transcription elongation factor B polypeptide 2 Q15370 ELOB_HUMAN

TECR_1 Very-long-chain enoyl-CoA reductase Q9NZ01 TECR_HUMAN

TIAM1 T-lymphoma invasion and metastasis-inducing protein 1 Q13009 TIAM1_HUMAN

TIAM2_1 T-lymphoma invasion and metastasis-inducing protein 2 Q8IVF5 TIAM2_HUMAN

TIAM2_2 Q8IVF5 TIAM2_HUMAN



TMUB1_1-1 Transmembrane and ubiquitin-like domain-containing protein 1 Q9BVT8 TMUB1_HUMAN

TMUB2_1-1 Transmembrane and ubiquitin-like domain-containing protein 2 Q71RG4 TMUB2_HUMAN

TMUB2_2-1/-2 Q71RG4-2 TMUB2_HUMAN

TMUB2_3-1 Q71RG4 TMUB2_HUMAN

TMUB2_4-1 Q71RG4 TMUB2_HUMAN

UBA1 Ubiquitin-like modifier-activating enzyme 1 P22314 UBA1_HUMAN

UBA3_1 NEDD8-activating enzyme E1 catalytic subunit Q8TBC4 UBA3_HUMAN

UBA3_2 Q8TBC4 UBA3_HUMAN

UBA5_1 Ubiquitin-like modifier-activating enzyme 5 Q9GZZ9 UBA5_HUMAN

UBA5_2 Q9GZZ9 UBA5_HUMAN

UBA6_1 Ubiquitin-like modifier-activating enzyme 6 A0AVT1 UBA6_HUMAN

UBA6_2 A0AVT1-2 UBA6_HUMAN

UBA7 Ubiquitin-like modifier-activating enzyme 7 P41226 UBA7_HUMAN

UBA52_1-1 Ubiquitin-60S ribosomal protein L40 P62987 RL40_HUMAN


UBAC1 Ubiquitin-associated domain-containing protein 1 Q9BSL1 UBAC1_HUMAN

UBB_1-1 Polyubiquitin-B P0CG47 UBB_HUMAN

UBC_1-1 Polyubiquitin-C P0CG48 UBC_HUMAN

UBD_1-1/-2 Ubiquitin D O15205 UBD_HUMAN

UBFD1_1-1/-2 Ubiquitin domain-containing protein UBFD1 O14562 UBFD1_HUMAN

UBIML_1-1 Putative ubiquitin-like protein FUBI-like protein ENSP00000310146 A6NDN8 UBIML_HUMAN

UBIML_2-1 A6NDN8-2 UBIML_HUMAN

UBL3_1-1 Ubiquitin-like protein 3 O95164 UBL3_HUMAN

UBL4A_1-1 Ubiquitin-like protein 4A P11441 UBL4A_HUMAN

UBL4B_1-1 Ubiquitin-like protein 4B Q8N7F7 UBL4B_HUMAN

UBL5_1-1 Ubiquitin-like protein 5 Q9BZL1 UBL5_HUMAN

UBL7_1-1 Ubiquitin-like protein 7 Q96S82 UBL7_HUMAN

UBLCP1_1-1/-2/-3 Ubiquitin-like domain-containing CTD phosphatase 1 Q8WVY7 UBCP1_HUMAN

UBQLN1_1-1 Ubiquilin-1 Q9UMX0 UBQL1_HUMAN

UBQLN1_2-1 Q9UMX0 UBQL1_HUMAN

UBQLN2_1-1 Ubiquilin-2 Q9UHD9 UBQL2_HUMAN

UBQLN3_1-1 Ubiquilin-3 Q9H347 UBQL3_HUMAN

UBQLN4_1-1 Ubiquilin-4 Q9NRR5 UBQL4_HUMAN

UBQLN4_2-1 Q9NRR5 UBQL4_HUMAN

UBQLNL_1-1 Ubiquilin-like protein Q8IYU4 UBQLN_HUMAN

UBQLNL_2-1 Q8IYU4 UBQLN_HUMAN

UBTD1_1-1 Ubiquitin domain-containing protein 1 Q9HAC8 UBTD1_HUMAN

UBTD2_1-1 Ubiquitin domain-containing protein 2 Q8WUN7 UBTD2_HUMAN

UBXN1_1-1 UBX domain-containing protein 1 Q04323 UBXN1_HUMAN

UBXN1_2-1 Q04323 UBXN1_HUMAN

UBXN2A_1-1/-2 UBX domain-containing protein 2A P68543 UBX2A_HUMAN

UBXN2B_1-1/-2 UBX domain-containing protein 2B Q14CS0 UBX2B_HUMAN

UBXN4_1-1/-2 UBX domain-containing protein 4 Q92575 UBXN4_HUMAN

UBXN6_1-1/-2 UBX domain-containing protein 6 Q9BZV1 UBXN6_HUMAN

UBXN6_2-1 Q9BZV1-2 UBXN6_HUMAN

UBXN7_1-1/-2 UBX domain-containing protein 7 O94888 UBXN7_HUMAN

UBXN8_1-1 UBX domain-containing protein 8 O00124 UBXN8_HUMAN

UBXN8_2-1 O00124 UBXN8_HUMAN

UBXN8_3-1 O00124 UBXN8_HUMAN

UBXN10_1-1 UBX domain-containing protein 10 Q96LJ8 UBX10_HUMAN


UBXN11_1 UBX domain-containing protein 11 Q5T124 UBX11_HUMAN

UBXN11_2 Q5T124 UBX11_HUMAN





UFM1_1-1 Ubiquitin-fold modifier 1 P61960 UFM1_HUMAN

UFM1_2-1 P61960 UFM1_HUMAN

UHRF1_1-1 E3 ubiquitin-protein ligase UHRF1 Q96T88 UHRF1_HUMAN

UHRF1BP1 UHRF1-binding protein 1 Q6BDS2 URFB1_HUMAN

UHRF2_1-1 E3 ubiquitin-protein ligase UHRF2 Q96PU4 UHRF2_HUMAN

UHRF2_2-1 Q96PU4 UHRF2_HUMAN

URM1_1-1 Ubiquitin-related modifier 1 Q9BTM9 URM1_HUMAN

URM1-2 Q9BTM9 URM1_HUMAN

USP11_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 11 P51784 UBP11_HUMAN

USP11_1-2 P51784 UBP11_HUMAN

USP14_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 14 P54578 UBP14_HUMAN

USP14_1-2 P54578 UBP14_HUMAN

USP15_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 15 Q9Y4E8 UBP15_HUMAN

USP15_1-2 Q9Y4E8 UBP15_HUMAN

USP15_1-3 Q9Y4E8 UBP15_HUMAN

USP15_2-1/-2/-3 Q9Y4E8-2 UBP15_HUMAN

USP15_2-2 Q9Y4E8-2 UBP15_HUMAN


USP15_3-1/-2/-3 Q9Y4E8-3 UBP15_HUMAN



USP20_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 20 Q9Y2K6 UBP20_HUMAN

USP20_1-2 Q9Y2K6 UBP20_HUMAN

USP21_1-1 Ubiquitin carboxyl-terminal hydrolase 21 Q9UK80 UBP21_HUMAN

USP21_3-1 Q9UK80-3 UBP21_HUMAN

USP24_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 24 Q9UPU5 UBP24_HUMAN

USP24_1-2 Q9UPU5 UBP24_HUMAN

USP25_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 25 Q9UHP3 UBP25_HUMAN

USP25_1-2 Q9UHP3 UBP25_HUMAN


USP25_2-1/-2/-3 Q9UHP3 UBP25_HUMAN




USP28_1-1 Ubiquitin carboxyl-terminal hydrolase 28 Q96RU2 UBP28_HUMAN

USP28_1-2 Q96RU2 UBP28_HUMAN

USP28_2-1 Q96RU2-2 UBP28_HUMAN



USP31_1-1 Ubiquitin carboxyl-terminal hydrolase 31 Q70CQ4 UBP31_HUMAN

USP32_1-1/-2/-3/-4/-5/-6 Ubiquitin carboxyl-terminal hydrolase 32 Q8NFA0 UBP32_HUMAN

USP34_1-1/-2/-3/-4/-5/-6 Ubiquitin carboxyl-terminal hydrolase 34 Q70CQ2 UBP34_HUMAN

USP34_2-1/-2/-3 Q70CQ2-2 UBP34_HUMAN

USP34_3-1/-2/-3 Q70CQ2-3 UBP34_HUMAN

USP4_1-1/-2/-3/-4 Ubiquitin carboxyl-terminal hydrolase 4 Q13107 UBP4_HUMAN

USP4_2-1/-2/-3/-4 Q13107-2 UBP4_HUMAN

USP40_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 40 Q9NVE5 UBP40_HUMAN

USP40_2-1 Q9NVE5-2 UBP40_HUMAN

USP40_3-1/-2/-3 Q9NVE5-3 UBP40_HUMAN

USP43_1-1/-2/-3/-4 Ubiquitin carboxyl-terminal hydrolase 43 Q70EL4 UBP43_HUMAN

USP43_3-1 Q70EL4-3 UBP43_HUMAN

USP47_1-1/-2/-3/-4/-5/-6 Ubiquitin carboxyl-terminal hydrolase 47 Q96K76 UBP47_HUMAN

USP47_2-1/-2/-3/-4/-5 Q96K76-2 UBP47_HUMAN

USP47_3-1 Q96K76-3 UBP47_HUMAN

USP48_1-1/-2 Ubiquitin carboxyl-terminal hydrolase 48 Q86UV5 UBP48_HUMAN

USP48_2-1/-2/-3 Q86UV5-2 UBP48_HUMAN

USP48_3-1/-2 Q86UV5-3 UBP48_HUMAN

USP48_4-1 Q86UV5-4 UBP48_HUMAN


USP48_6-1 Q86UV5-6 UBP48_HUMAN


USP5_1-1 Ubiquitin carboxyl-terminal hydrolase 5 P45974 UBP5_HUMAN

USP5_2-1/-2 P45974-2 UBP5_HUMAN

USP6_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 6 P35125 UBP6_HUMAN

USP6_2-1/-2/-3/-4 P35125-2 UBP6_HUMAN

USP7_1-1/-2/-3/-4/-5/-6/-7 Ubiquitin carboxyl-terminal hydrolase 7 Q93009 UBP7_HUMAN

USP8_1-1/-2/-3 Ubiquitin carboxyl-terminal hydrolase 8 P40818 UBP8_HUMAN

USP9X_1-1/-2/-3 Probable ubiquitin carboxyl-terminal hydrolase FAF-X Q93008 USP9X_HUMAN

USP9X_2-1/-2/-3 Q93008-2 USP9X_HUMAN

USP9Y_1-1/-2/-3 Probable ubiquitin carboxyl-terminal hydrolase FAF-Y O00507 USP9Y_HUMAN

USP9Y_2-1/-2/-3 O00507-2 USP9Y_HUMAN

VCPIP1_1-1/-2/-3 Deubiquitinating protein VCIP135 Q96JH7 VCIP1_HUMAN


WDR48_1-1/-2 WD repeat-containing protein 48 Q8TAF3 WDR48_HUMAN

WDR48_2-1 Q8TAF3-2 WDR48_HUMAN

WDR48_3-1/-2/-3 Q8TAF3-3 WDR48_HUMAN

WDR48_4-1/-2/-3 Q8TAF3-4 WDR48_HUMAN

WDR48_5-1/-2 Q8TAF3-5 WDR48_HUMAN

YOD1_1-1 Ubiquitin thioesterase OTU1 Q5VVQ6 OTU1_HUMAN

YOD1_2-1 Q5VVQ6 OTU1_HUMAN


Appendix III: 205 proteins interact with both ubiquitin & at least one member of the ubiquilin family.

Uniprot ID UniProtKB Entry Uniprot ID UniProtKB Entry Uniprot ID UniProtKB Entry

P62259 1433E_MOUSE P56480 ATPB_MOUSE P68104 EF1A1_HUMAN

P68510 1433F_MOUSE P01887 B2MG_MOUSE Q9D8N0 EF1G_MOUSE

P61982 1433G_MOUSE B2RRX1 B2RRX1_MOUSE P17182 ENOA_MOUSE

P68254 1433T_MOUSE B2RSC8 B2RSC8_MOUSE P42566 EPS15_HUMAN

P63101 1433Z_MOUSE Q6PAJ1 BCR_MOUSE P42567 EPS15_MOUSE

A2BFF7 A2BFF7_MOUSE P70444 BID_MOUSE P19096 FAS_MOUSE

P05067-7 A4_HUMAN O08539 BIN1_MOUSE XP_005266087 FBX25_HUMAN

P12023 A4_MOUSE Q64152 BTF3_MOUSE Q9CPU7 FBX32_MOUSE

A8DUV3 A8DUV3_MOUSE P00920 CAH2_MOUSE Q9Z0E6 GBP2_MOUSE

Q8CBW3 ABI1_MOUSE P62204 CALM_MOUSE Q99PT1 GDIR1_MOUSE

Q5SWU9 ACACA_MOUSE P47757 CAPZB_MOUSE P13020 GELS_MOUSE

P57780 ACTN4_MOUSE P14635 CCNB1_HUMAN P15105 GLNA_MOUSE

Q9QYC0 ADDA_MOUSE P63038 CH60_MOUSE P38647 GRP75_MOUSE

Q16186 ADRM1_HUMAN Q68FD5 CLH1_MOUSE P11021 GRP78_HUMAN

Q8CJG0 AGO2_MOUSE Q922J3 CLIP1_MOUSE NM_002111 HD_HUMAN

P24549 AL1A1_MOUSE Q06890 CLUS_MOUSE Q15034 HERC3_HUMAN

Q8R0Y6 AL1L1_MOUSE O55029 COPB2_MOUSE O14964 HGS_HUMAN

P05064 ALDOA_MOUSE P47941 CRKL_MOUSE Q9D0E1 HNRPM_MOUSE

P45376 ALDR_MOUSE Q93034 CUL5_HUMAN Q61699 HS105_MOUSE

Q96K21 ANCHR_HUMAN P17302 CXA1_HUMAN P17879 HS71B_MOUSE

P17427 AP2A2_MOUSE Q7TMB8 CYFP1_MOUSE P34931 HS71L_HUMAN

Q9DBG3 AP2B1_MOUSE D2KHZ9 D2KHZ9_MOUSE P07901 HS90A_MOUSE

O54774 AP3D1_MOUSE O08788 DCTN1_MOUSE P63017 HSP7C_MOUSE

Q9R0Q6 ARC1A_MOUSE Q8CBY8 DCTN4_MOUSE P54105 ICLN_HUMAN

A2BH40 ARI1A_MOUSE Q62167 DDX3X_MOUSE Q9D6R2 IDH3A_MOUSE

P61161 ARP2_MOUSE P63037 DNJA1_MOUSE P12268 IMDH2_HUMAN

Q9CQE6 ASF1A_MOUSE P25686 DNJB2_HUMAN P48025 KSYK_MOUSE

Q925I1 ATAD3_MOUSE Q9Z1N5 DX39B_MOUSE P16125 LDHB_MOUSE

Q03265 ATPA_MOUSE Q9JHU4 DYHC1_MOUSE Q91ZX7 LRP1_MOUSE

Q9QXZ0 MACF1_MOUSE Q8CI94 PYGB_MOUSE P42227 STAT3_MOUSE

Q8R001 MARE2_MOUSE Q3UHZ3 Q3UHZ3_MOUSE Q9WUM5 SUCA_MOUSE

P97310 MCM2_MOUSE Q3ULF7 Q3ULF7_MOUSE Q13148 TADBP_HUMAN

P97311 MCM6_MOUSE Q4VAE6 Q4VAE6_MOUSE P10637 TAU_MOUSE

P14152 MDHC_MOUSE Q921K2 Q921K2_MOUSE P11983 TCPA_MOUSE

P08249 MDHM_MOUSE Q922K6 Q922K6_MOUSE P80316 TCPE_MOUSE

P20357 MTAP2_MOUSE Q62172 RBP1_MOUSE Q9NZ01 TECR_HUMAN

Q8VDD5 MYH9_MOUSE P54725 RD23A_HUMAN P55072 TERA_HUMAN

Q64331 MYO6_MOUSE P54727 RD23B_HUMAN Q01853 TERA_MOUSE

P70670 NACAM_MOUSE P53026 RL10A_MOUSE Q04207 TF65_MOUSE


P15532 NDKA_MOUSE P47963 RL13_MOUSE Q8QZT1 THIL_MOUSE

Q8TAT6 NPL4_HUMAN Q9CR57 RL14_MOUSE P19438 TNR1A_HUMAN

P35486 ODPA_MOUSE Q9D8E6 RL4_MOUSE P20333 TNR1B_HUMAN

P29341 PABP1_MOUSE P62987 RL40_HUMAN P17751 TPIS_MOUSE

P49586 PCY1A_MOUSE P47911 RL6_MOUSE P21107 TPM3_MOUSE

Q9WU78 PDC6I_MOUSE P12970 RL7A_MOUSE Q12933 TRAF2_HUMAN

P12382 PFKAL_MOUSE P14869 RLA0_MOUSE Q9R1R2 TRIM3_MOUSE

Q13526 PIN1_HUMAN Q96GF1 RN185_HUMAN Q9QZE7 TSNAX_MOUSE

Q9QXS1 PLEC_MOUSE Q9Y3C5 RNF11_HUMAN P62837 UB2D2_HUMAN

P63330 PP2AA_MOUSE P70336 ROCK2_MOUSE P61077 UB2D3_HUMAN

P35700 PRDX1_MOUSE P38886 RPN10_YEAST P0CG48, NP_066289 UBC_HUMAN

Q61171 PRDX2_MOUSE O48726 RPN13_ARATH P49459-3 UBE2A_HUMAN

P97313 PRKDC_MOUSE P62281 RS11_MOUSE P63146 UBE2B_HUMAN

P62334 PRS10_MOUSE P25444 RS2_MOUSE Q05086 UBE3A_HUMAN

P62192 PRS4_MOUSE P62908 RS3_MOUSE P11441 UBL4A_HUMAN

P54775 PRS6B_MOUSE E9Q401 RYR2_MOUSE Q70CQ2 UBP34_HUMAN

P35998 PRS7_HUMAN Q9UBT2 SAE2_HUMAN Q9UMX0 UBQL1_HUMAN

P62196 PRS8_MOUSE O43865 SAHH2_HUMAN Q9UHD9 UBQL2_HUMAN

P25787 PSA2_HUMAN P42208 SEPT2_MOUSE P15374 UCHL3_HUMAN

P60900 PSA6_HUMAN P28661 SEPT4_MOUSE Q13564 ULA1_HUMAN

Q9QUM9 PSA6_MOUSE Q9R1T4 SEPT6_MOUSE Q9C0B0 UNK_HUMAN

Q3TXS7 PSMD1_MOUSE O55131 SEPT7_MOUSE XP_005272733 USP9X_HUMAN

Q13200 PSMD2_HUMAN P84022 SMAD3_HUMAN Q9WV55 VAPA_MOUSE

Q8VDM4 PSMD2_MOUSE Q920B9 SP16H_MOUSE P20152 VIME_MOUSE

O43242 PSMD3_HUMAN Q62261 SPTB2_MOUSE P62960 YBOX1_MOUSE

P14685 PSMD3_MOUSE P16546 SPTN1_MOUSE P39447 ZO1_MOUSE

P55034 PSMD4_ARATH O60232 SSA27_HUMAN O95218-2 ZRAB2_HUMAN

P55036, P55036-2 PSMD4_HUMAN Q92783 STAM1_HUMAN

Q05920 PYC_MOUSE O75886 STAM2_HUMAN


Appendix IV: 127 putative UIM sequences within 106 proteins that interact with both ubiquitin & at least one member of the ubiquilin family.

[ED](3)-x(3)-[AG]-x(3)-S-x(2)-[ED] 6 hits in 6 sequences

P25686 DNJB2_HUMAN 252 - 265: DEDlqlAmaySlsE

O14964 HGS_HUMAN 260 - 273: EEElqlAlalSqsE

P55036 PSMD4_HUMAN 232 - 245: EEEarrAaaaSaaE

Q920B9 SP16H_MOUSE 994 - 1007: EEEarkAdreSryE

Q92783 STAM1_HUMAN 173 - 186: EEDlakAielSlkE

O75886 STAM2_HUMAN 167 - 180: DEDiakAielSlqE

[ED]-x(3)-[AG]-x(3)-S-x(2)-[ED] 32 hits in 27 sequences

P63101 1433Z_MOUSE 20 - 31: DdmaAcmkSvtE

Q5SWU9 ACACA_MOUSE 553 - 564: DsqfGhcfSwgE

Q96K21 ANCHR_HUMAN 208 - 219: DerqGsipStqE

Q925I1 ATAD3_MOUSE 33 - 44: DrgaGdrpSpkD

B2RRX1 B2RRX1_MOUSE 226 - 237: EmatAassSslE

B2RSC8 B2RSC8_MOUSE 8 - 19: DeseApvlSedE

O08539 BIN1_MOUSE 168 - 179: EakiAkpvSllE

Q922J3 CLIP1_MOUSE 664 - 675: EavkArldSaeD

P25686 DNJB2_HUMAN 211 - 222: DlalGlelSrrE

254 - 265: DlqlAmaySlsE

P42566 EPS15_HUMAN 881 - 892: DlelAialSksE

P42567 EPS15_MOUSE 882 - 893: DlelAialSksE

P19096 FAS_MOUSE 1589 - 1600: DcmlGmefSgrD

P42858 HD_HUMAN 1261 - 1272: EkfgGflrSalD

O14964 HGS_HUMAN 262 - 273: ElqlAlalSqsE

Q9QXZ0 MACF1_MOUSE 4960 - 4971: EelqAktsSleE

P20357 MTAP2_MOUSE 889 - 900: EnlsGesgSfyE

Q9QXS1 PLEC_MOUSE 2090 - 2101: ElelGrirSnaE

4289 - 4300: DpetGkemSvyE

4364 - 4375: DqyrAgtlSitE

P14685 PSMD3_MOUSE 37 - 48: EeaaAgsgStgE

P55034 PSMD4_ARATH 225 - 236: ElalAlrvSmeE

P55036 PSMD4_HUMAN 215 - 226: ElalAlrvSmeE

234 - 245: EarrAaaaSaaE

Q9Y3C5 RNF11_HUMAN 141 - 152: EpvdAallSsyE

P70336 ROCK2_MOUSE 1143 - 1154: EpddGfpeSrlE

P38886 RPN10_YEAST 227 - 238: ElamAlrlSmeE

E9Q401 RYR2_MOUSE 4198 - 4209: EmqlAaqiSesD

Q920B9 SP16H_MOUSE 933 - 944: DaedGdseSeiE

996 - 1007: EarkAdreSryE

Q92783 STAM1_HUMAN 175 - 186: DlakAielSlkE

O75886 STAM2_HUMAN 169 - 180: DiakAielSlqE


[ED]-x(3)-[AG]-x(4)-S-x(2)-[ED] 25 hits in 20 sequences

P68510 1433F_MOUSE 136 - 148: EvasGekknSvvE

P24549 AL1A1_MOUSE 138 - 150: DkihGqtipSdgD

O54774 AP3D1_MOUSE 884 - 896: EelaAstitSpkD

A2BH40 ARI1A_MOUSE 2131 - 2143: DlilAtppfSrlE

Q6PAJ1 BCR_MOUSE 850 - 862: DyerAewreSirE

O08788 DCTN1_MOUSE 875 - 887: EqiyGspssSpyE

965 - 977: ElseAnvrlSllE

P63037 DNJA1_MOUSE 74 - 86: EggaGggfgSpmD

P19096 FAS_MOUSE 1358 - 1370: EvqpApsllSqeE

P38647 GRP75_MOUSE 244 - 256: DlggGtfdiSilE

Q9QXZ0 MACF1_MOUSE 105 - 117: DlrdGhnliSllE

3823 - 3835: EqyaAslarSeaE

Q9QXS1 PLEC_MOUSE 217 - 229: DlrdGhnliSllE

2360 - 2372: EvteAarqrSqvE

P62192 PRS4_MOUSE 382 - 394: DlimAkddlSgaD

P14685 PSMD3_MOUSE 50 - 62: DgkaAatehSqrE

Q3UHZ3 Q3UHZ3_MOUSE 180 - 192: EseeGnsaeSaaE

Q62172 RBP1_MOUSE 83 - 95: EgyaAfqedSsgD

415 - 427: DlqgGikdlSkeE

Q9UBT2 SAE2_HUMAN 483 - 495: EdgkGtiliSseE

P28661 SEPT4_MOUSE 2 - 14: DhslGwqgnSvpE

Q920B9 SP16H_MOUSE 140 - 152: DkfpGefmkSwsD

930 - 942: EgsdAedgdSesE

Q62261 SPTB2_MOUSE 1600 - 1612: DaaeAeawmSeqE

Q05086 UBE3A_HUMAN 98 - 110: EnskGapnnScsE


[ED]-x(3)-[AG]-x(5)-S-x(2)-[ED] 30 hits in 26 sequences

Q5SWU9 ACACA_MOUSE 945 - 958: DshaAtlnrkSerE

P45376 ALDR_MOUSE 217 - 230: DrpwAkpedpSllE

B2RRX1 B2RRX1_MOUSE 224 - 237: EqemAtaassSslE

Q6PAJ1 BCR_MOUSE 325 - 338: DsggGytpdcSsnE

P00920 CAH2_MOUSE 19 - 32: DfpiAngdrqSpvD

P63037 DNJA1_MOUSE 268 - 281: EalcGfqkpiStlD

P25686 DNJB2_HUMAN 71 - 84: EgltGtgtgpSraE

254 - 267: DlqlAmayslSemE

Q9JHU4 DYHC1_MOUSE 3952 - 3965: DeqfGiwldsSspE

P68104 EF1A1_HUMAN 319 - 332: DvrrGnvagdSknD

P42858 HD_HUMAN 409 - 422: EesgGrsrsgSivE

O14964 HGS_HUMAN 262 - 275: ElqlAlalsqSeaE

Q91ZX7 LRP1_MOUSE 2807 - 2820: EsvtAgclynStcD

Q8VDD5 MYH9_MOUSE 1153 - 1166: DstaAqqelrSkrE

Q64331 MYO6_MOUSE 1234 - 1247: ErcgGiqylqSaiE

Q9QXS1 PLEC_MOUSE 1200 - 1213: EpspAaptlrSelE

2037 - 2050: EerlAqlrkaSesE

P54775 PRS6B_MOUSE 361 - 374: EdyvArpdkiSgaD

P55034 PSMD4_ARATH 223 - 236: DpelAlalrvSmeE

309 - 322: DlalAlqmsmSgeE

P55036 PSMD4_HUMAN 213 - 226: DpelAlalrvSmeE

Q62172 RBP1_MOUSE 19 - 32: EhgsGltrtpSseE

83 - 96: EgyaAfqedsSgdE

P38886 RPN10_YEAST 225 - 238: DpelAmalrlSmeE

E9Q401 RYR2_MOUSE 3702 - 3715: EdddGeeevkSfeE

Q62261 SPTB2_MOUSE 1378 - 1391: DankAelftqScaD

Q92783 STAM1_HUMAN 173 - 186: EedlAkaielSlkE

O75886 STAM2_HUMAN 167 - 180: DediAkaielSlqE

Q93008 USP9X_HUMAN 1682 - 1695: EqhdAleffnSlvD

Q9WV55 VAPA_MOUSE 143 - 156: EpskAvplnaSkqD


[ED]-x(3)-[AG]-x(6)-S-x(2)-[ED] 34 hits in 27 sequences

A2BH40 ARI1A_MOUSE 117 - 131: EppgGgggggsSssD

Q6PAJ1 BCR_MOUSE 324 - 338: EdsgGgytpdcSsnE

P62204 CALM_MOUSE 7 - 21: EeqiAefkeafSlfD

Q922J3 CLIP1_MOUSE 661 - 675: DsleAvkarldSaeD

O55029 COPB2_MOUSE 593 - 607: EyqtAvmrrdfSmaD

Q9JHU4 DYHC1_MOUSE 4621 - 4635: DfeiAtkedprSfyE

P68104 EF1A1_HUMAN 403 - 417: DmvpGkpmcveSfsD

P42566 EPS15_HUMAN 576 - 590: EvttAvtekvcSelD

P19096 FAS_MOUSE 584 - 598: EvacGyadgclSqrE

Q91ZX7 LRP1_MOUSE 1353 - 1367: DwiaGniywveSnlD

2630 - 2644: DcedAsdemncSatD

3967 - 3981: DwvaGnvywtdSgrD

Q9QXZ0 MACF1_MOUSE 2199 - 2213: DtsvGlrsefkSehD

2685 - 2699: DmatGkrvtlaSalE

6870 - 6884: DrvkAlitehqSfmE

P97310 MCM2_MOUSE 790 - 804: DvnmAirvmmeSfiD

P20357 MTAP2_MOUSE 7 - 21: DegkAphwtsaSltE

Q64331 MYO6_MOUSE 702 - 716: DlmqGgfpsraSfhE

Q13526 PIN1_HUMAN 87 - 101: ElinGyiqkikSgeE

P97313 PRKDC_MOUSE 2041 - 2055: DfstGvqsysySsqD

O43242 PSMD3_HUMAN 52 - 66: DgktAaaaaehSqrE

P55036 PSMD4_HUMAN 255 - 269: DsddAllkmtiSqqE

Q62172 RBP1_MOUSE 452 - 466: EtkiAqeiaslSkeD

P54725 RD23A_HUMAN 150 - 164: EedaAstlvtgSeyE

P38886 RPN10_YEAST 194 - 208: EgssGmgafggSggD

E9Q401 RYR2_MOUSE 1859 - 1873: EeegGtpekeiSieD

3337 - 3351: DhlkAeargdmSeaE

Q9UBT2 SAE2_HUMAN 218 - 232: EpteAeararaSneD

Q920B9 SP16H_MOUSE 930 - 944: EgsdAedgdseSeiE

Q62261 SPTB2_MOUSE 2063 - 2077: EksaAtwderfSalE

2148 - 2162: EmvnGaaeqrtSskE

P16546 SPTN1_MOUSE 1604 - 1618: DrirGvidmgnSliE

Q70CQ2 UBP34_HUMAN 786 - 800: EknmAdfdgeeSgcE

1672 - 1686: EscsGlyklslSglD
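The motif patterns listed in this appendix are written in a PROSITE-style syntax, in which `x(n)` stands for n arbitrary residues and `[ED]` for either glutamate or aspartate. As an illustration only, and not the search tool used to generate the tables above, the following Python sketch shows how such a pattern could be translated into a regular expression and scanned against a sequence. The function names `prosite_to_regex` and `find_hits` are hypothetical, and only the subset of the syntax that appears in this appendix is handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Supports only the elements used in this appendix: single residues,
    residue classes such as [ED], the wildcard x, and repeat counts (n).
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Za-z])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = m.groups()
        # 'x' means any residue; everything else is matched literally.
        residue = "." if residue.lower() == "x" else residue.upper()
        parts.append(residue + (f"{{{count}}}" if count else ""))
    return "".join(parts)


def find_hits(pattern: str, sequence: str):
    """Return (start, end, match) tuples with 1-based inclusive positions.

    A lookahead is used so that overlapping hits are all reported.
    """
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    seq = sequence.upper()
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(seq)
    ]


# Fragment of DNJB2_HUMAN (residues 252 - 265) taken from the table above.
fragment = "DEDlqlAmaySlsE"
print(find_hits("[ED](3)-x(3)-[AG]-x(3)-S-x(2)-[ED]", fragment))
print(find_hits("[ED]-x(3)-[AG]-x(3)-S-x(2)-[ED]", fragment))
```

Positions returned here are relative to the fragment supplied, so they would need to be offset by the fragment's start position (251 in this example) to recover the residue numbering used in the tables.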


Appendix V: Six similarity trees of ubiquitin-like domains clustered by electrostatic potential at varying distances (1 Å to 6 Å) from the UIM-binding interface, along with groups of ubiquitin-like domains that share strong electrostatic potential similarity at each distance.
