Clamping, bending, and twisting inter-domain motions in the misfold-recognising portion of UDP-glucose:glycoprotein glucosyltransferase

UDP-glucose:glycoprotein glucosyltransferase (UGGT) is the only known glycoprotein folding quality control checkpoint in the eukaryotic glycoprotein secretory pathway. When the enzyme detects a misfolded glycoprotein in the Endoplasmic Reticulum (ER), it dispatches it for ER retention by re-glucosylating it on one of its N-linked glycans. Recent crystal structures of a fungal UGGT have suggested the enzyme is conformationally mobile. Here, a negative stain electron microscopy reconstruction of UGGT in complex with a monoclonal antibody confirms that the misfold-sensing N-terminal portion of UGGT and its C-terminal catalytic domain are tightly associated. Molecular Dynamics (MD) simulations capture UGGT in so far unobserved conformational states, giving new insights into the molecule’s flexibility. Principal component analysis of the MD trajectories affords a description of UGGT’s overall inter-domain motions, highlighting three types of inter-domain movements: bending, twisting and clamping. These inter-domain motions modify the accessible surface area of the enzyme’s central saddle, likely enabling the protein to recognize and re-glucosylate substrates of different sizes and shapes, and/or re-glucosylate N-linked glycans situated at variable distances from the site of misfold. We propose to name “Parodi limit” the maximum distance between a site of misfolding on a UGGT glycoprotein substrate and an N-linked glycan that monomeric UGGT can re-glucosylate on the same glycoprotein. MD simulations estimate the Parodi limit to be around 60-70 Å. Re-glucosylation assays using UGGT deletion mutants suggest that the TRXL2 domain is necessary for activity against urea-misfolded bovine thyroglobulin. Taken together, our findings support a “one-size-fits-all adjustable spanner” substrate recognition model, with a crucial role for the TRXL2 domain in the recruitment of misfolded substrates to the enzyme’s active site.


Introduction
A wonderfully efficient protein folding machinery in the Endoplasmic Reticulum (ER) of eukaryotic cells ensures that only correctly folded glycoproteins can exit the ER, proceed to the Golgi, and from there continue along the secretory pathway towards their cellular or extracellular destinations [1]. The stringency of this Endoplasmic Reticulum Quality Control (ERQC) system is of great advantage to healthy cells because it allows time for complex glycoproteins to fold in the ER and prevents premature secretion of incompletely folded species. In the background of a misfold-inducing missense mutation in a secreted glycoprotein gene, the resulting misfolded glycoprotein is either retained in the ER by ERQC or degraded by the ER associated degradation (ERAD) machinery [2]. ERQC bears particularly unfortunate consequences when the mutation induces a minor folding defect but does not abrogate the function of the glycoprotein ("responsive mutant"): in this case ERQC causes disease by blocking the secretion of the glycoprotein mutant, even though its residual activity would be beneficial to the organism (see for example [3]).
Central to ERQC is the ER-resident 170 kDa enzyme UDP-glucose:glycoprotein glucosyltransferase (UGGT). The enzyme selectively reglucosylates a misfolded glycoprotein on one of its N-glycans and promotes its association with the ER lectins calnexin and calreticulin, thus mediating ER retention. Only correctly folded glycoproteins escape UGGT-mediated reglucosylation and can progress down the secretory pathway, to the Golgi and beyond. More than 25 years after the discovery of UGGT [3], recent structural and functional work has uncovered the protein's multi-domain architecture and obtained preliminary evidence of its inter-domain conformational flexibility [4][5][6]. Negative-stain electron microscopy and small-angle X-ray scattering (SAXS) first revealed an arc-like structure with some degree of structural variability [4]. Soon after, four distinct full-length Chaetomium thermophilum UGGT (CtUGGT) crystal structures, together with a 15 Å cryo-EM reconstruction of the same protein, suggested that the C-terminal portion (comprising two β-sandwiches and the catalytic domain) constitutes a relatively rigid structure, while most of the conformational flexibility localizes to the four N-terminal thioredoxin-like (TRXL) domains [5]. Contrary to these observations, relative flexibility of the catalytic domain with respect to the rest of the structure was proposed by a different study of Thermomyces dupontii UGGT (TdUGGT), based on atomic force microscopy data and a 25 Å negative-stain EM map, to which crystal structures for the catalytic domain and the remaining part of TdUGGT were separately fitted [6,7].
Here, we further characterize UGGT's inter-domain flexibility and seek to clarify the above-mentioned controversy regarding the relative movements of the C- Kinetic measurements on TRXL2 and TRXL3 deletion UGGT mutants in re-glucosylation assays of urea-misfolded bovine thyroglobulin suggest that only the former domain is absolutely necessary for re-glucosylation of this substrate. We discuss the functional implications of these discoveries.

Results
A new crystal structure of CtUGGT adds to the landscape sampled by previously observed UGGT conformations.
In the previously reported crystal CtUGGT structures, a full-length eukaryotic UGGT revealed four thioredoxin-like domains (TRXL1-4) arranged in a long arc, terminating in two β-sandwiches (βS1 and βS2) tightly clasping the glucosyltransferase family 24 (GT24) domain [5]. The wild-type protein was captured in Those four CtUGGT structures mainly differ in the spatial organization of domains TRXL2 and TRXL3 (Figure 1, and SI Appendix Movie I). Across these structures, the TRXL2 domain is rotated by different amounts with respect to the rest of the protein and adopts different degrees of proximity to it. The TRXL3 domain instead appears in the same relative conformation in all structures, except for the 'open' one (right-hand side panel in Figure 1A and green in Figures 1B,C), in which the TRXL3 and TRXL1 domains move apart, leading to the opening of a cleft between them.
We describe here a fifth, novel full-length CtUGGT structure (hereinafter CtUGGT Kif, PDB ID 6TRF), obtained from cells treated with the mannosidase inhibitor kifunensine (the compound prevents elaboration of N-linked glycans along the secretory pathway and ensures that secreted glycoproteins carry high-mannose glycans). CtUGGT Kif adopts a conformation that combines a TRXL1-TRXL3 distance as found in the 'open' conformation with a TRXL2/TRXL3 relative orientation similar to the one found in the 'closed-like' conformation (Figure 1C). We label this CtUGGT Kif conformation 'new intermediate'.
In order to establish a framework for the discussion of the motions of the UGGT molecule, mapping its inter-domain conformational landscape in a compact way, we define here three collective conformational coordinates (CCs). 'CC1' or 'clamp' describes the changes in the distance between the centres of mass of the TRXL1 and TRXL3 domains, and measures the openness of the cleft between them; 'CC2' or 'bend' describes the changes in the angle between the centres of mass of the TRXL1, TRXL2 and TRXL3 domains, and measures the proximity of the TRXL2 and GT24 domains across the central saddle; lastly, 'CC3' or 'twist' describes the changes in the dihedral angle between the Cα atoms of residues Y518, F466, T863 and I735 (the first two residues in the TRXL2 and the last two in the TRXL3 domain), thus informing on the relative orientation of the TRXL2 and TRXL3 domains.
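The three coordinates defined above reduce to a distance, an angle and a dihedral computed from atomic coordinates. The following is a minimal sketch under stated assumptions, not the analysis code used in this work: the function names are hypothetical, centres of mass default to unit masses, and the vertex of the 'bend' angle is assumed to be the middle point passed to the function (the text does not specify the vertex).

```python
import numpy as np


def centre_of_mass(coords, masses=None):
    """Centre of mass of an (n_atoms, 3) array; unit masses if none are given."""
    coords = np.asarray(coords, dtype=float)
    return np.average(coords, axis=0, weights=masses)


def distance(a, b):
    """CC1-style 'clamp': distance between two points (e.g. two domain centres of mass)."""
    return float(np.linalg.norm(np.asarray(b, dtype=float) - np.asarray(a, dtype=float)))


def angle_deg(a, b, c):
    """CC2-style 'bend': angle a-b-c in degrees, vertex at b (an assumption)."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding errors just outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))


def dihedral_deg(p0, p1, p2, p3):
    """CC3-style 'twist': signed dihedral angle p0-p1-p2-p3 in degrees."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)  # unit vector along the central bond
    b2 = p3 - p2
    # Components of the outer vectors perpendicular to the central axis.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    return float(np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))))
```

For a given frame, CC1 would be `distance(centre_of_mass(trxl1), centre_of_mass(trxl3))`, and CC3 would be `dihedral_deg` applied to the four Cα positions listed above; how the domain atom selections are extracted from a structure file is left open here.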
We asked whether the conformational landscape spanned by UGGT full-length crystal structures can be extended by in silico molecular dynamics. We performed 250 ns long MD simulations starting from four of the CtUGGT crystal structures and computed principal components (PCs, also called essential modes [8]) from the four individual MD trajectories and from the fusion of all four MDs into a single trajectory. UGGT MD trajectories, as expected, span a wider conformational landscape compared to the set of crystal structures. Overall, UGGT's motions can be described in simple terms as two rigid groups of domains moving with respect to each other: one group is formed by domains TRXL2-TRXL3, and the other is formed by domains TRXL1-TRXL4-βS1-βS2-GT24 (the latter group is enclosed in a grey circle in Figure 2B). The interface between domains TRXL3 and TRXL4 acts as a hinge region between the two domain groups.  Figure 3A, right hand side panel) describes a movement in which the TRXL2 domain rotates with respect to TRXL3 (SI Appendix Figure S1B and SI Appendix Movie III), and the βS2, TRXL1 and TRXL4 domains also undergo motion. The motion encoded by PC2 is well represented in the MD starting from the 'new intermediate' CtUGGT Kif structure, whose projection in Figure 3D also shows a considerable degree of back-and-forth movement.
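Principal components of an MD trajectory are the eigenvectors of the covariance matrix of the atomic coordinates. The sketch below illustrates the idea with a singular value decomposition. It is an assumption-laden illustration rather than the protocol used here: the function name is hypothetical, the frames are assumed to be already superposed onto a common reference, and no particular atom selection is implied.

```python
import numpy as np


def trajectory_pca(frames):
    """PCA of a trajectory given as an (n_frames, n_atoms, 3) array.

    Assumes the frames were superposed beforehand, so that overall rotation
    and translation do not dominate the variance.
    Returns (eigenvalues, components, projections): eigenvalues are sorted in
    descending order, each row of `components` is one PC, and `projections`
    holds the coordinates of every frame along every PC.
    """
    frames = np.asarray(frames, dtype=float)
    x = frames.reshape(len(frames), -1)   # flatten to (n_frames, 3 * n_atoms)
    xc = x - x.mean(axis=0)               # remove the average structure
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    eigenvalues = s ** 2 / (len(x) - 1)   # variances along each PC
    projections = xc @ vt.T
    return eigenvalues, vt, projections
```

Fusing several trajectories into one, as described above, would correspond to concatenating their frame arrays (for example with `np.concatenate`) before calling the function, so that all simulations are projected onto a common set of PCs.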
The fact that the CtUGGT βS1-βS2:GT24 portion of the molecule behaves as one relatively rigid structure throughout the MD simulations is hardly surprising: the domains bury a 1400 Å² surface, with a calculated -7.1 kcal/mol solvation free energy gain [9]. The βS1-βS2:GT24 interface is supported by 16 hydrogen bonds, five salt bridges, and 11 hydrophobic interactions, involving 86 residues overall.
The PISA server Complex Formation Significance Score (CSS) is 1.0 [9], suggesting that the contacts in the CtUGGT βS1-βS2:GT24 interface are sufficient to support the observed N-term:C-term inter-domain structure. The solvation free energy gain computed by the same server has a P-value of 0.326 (P<0.5 indicates interfaces with "surprising", i.e. higher than average, hydrophobicity, implying that the interface is likely interaction-specific) [9].
The tight association we observe between the GT24 and βS1-βS2 domains is at odds with a hypothesis formulated in 2017 and based on negative-stain EM and atomic force microscopy (AFM) of Thermomyces dupontii UGGT (TdUGGT): the study proposed that the UGGT GT24 domain assumes a number of different relative orientations with respect to the rest of the molecule, enabled by the flexible linker between the βS2 and GT24 domains [6,7]. Of the 48 residues contributing side chains to the UGGT βS1-βS2:GT24 interface, 44 are conserved between TdUGGT and CtUGGT, and none of the 4 side chain differences would likely abrogate contributions to the inter-domain interface (see SI Appendix Figure S4). This prompted us to hypothesize that the GT24 and βS1-βS2 domains constitute a rigid group in TdUGGT also (and, by extension, in UGGTs across all eukaryotes), just as observed in full-length CtUGGT structures and their MD simulations. In the absence of a full-length TdUGGT crystal structure, the only information about the relative orientation of the TdUGGT GT24 and βS1-βS2 domains comes from a 25 Å negative-stain EM reconstruction of TdUGGT in complex with an anti-TdUGGT antibody fragment (Fab) [6,7]. In order to check whether the TdUGGT negative-stain EM reconstruction is compatible with a model in which the GT24 and βS1-βS2 domains also form a rigid group, we generated a full-length TdUGGT homology model, selected a representative Fab structure from the Protein Data Bank, and proceeded to fit them into the 25 Å negative-stain EM map for the complex of TdUGGT with its Fab (separately fitting the models into both the original map and its enantiomeric mirror image [10]). The correlation coefficients (CCs) between the 25 Å negative-stain EM map and the TdUGGT and Fab models are around 90% for both the original-hand (Figure 4A-C) and inverted-hand (Figure 4D-F) maps.
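A map-to-model correlation coefficient of the kind quoted above can be illustrated as the Pearson correlation between two density grids sampled on the same lattice, and the inverted-hand map as a mirror image of the original one. This is only a sketch: the fitting program and the exact correlation definition used for Figure 4 are not restated here, the function names are hypothetical, and the model density is assumed to have been computed on the grid of the experimental map.

```python
import numpy as np


def map_correlation(map_a, map_b):
    """Pearson correlation between two density grids of identical shape."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))


def mirror_map(density):
    """Mirror a 3D density grid along its first axis, inverting its handedness."""
    return np.flip(np.asarray(density, dtype=float), axis=0)
```

In this picture, a value near 0.9 for both `map_correlation(original, model_density)` and `map_correlation(mirror_map(original), refitted_model_density)` would correspond to the roughly 90% agreement reported for either hand.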
In the fitted models, the Fab contacts the 440-460 portion of the TdUGGT TRXL2 domain, in agreement with the published Fab epitope (TdUGGT residues 29-468) [6,7]. Therefore, the 25 Å negative-stain EM map of the complex of TdUGGT with its Fab can be fitted by a full-length TdUGGT model without invoking any detachment of the catalytic domain from the βS1-βS2 region, contrary to what is stated in [6,7].

UGGT 'Twisting' and 'clamping' motions are uncorrelated.
As shown in Figure

UGGT activity is underpinned by its inter-domain conformational mobility.
The study in [5] Figure S1A, the MD trajectory starting from the CtUGGT D611C/G1050C structure shows significantly restricted mobility along the first PC, confirming that the extra disulphide bridge in CtUGGT D611C/G1050C tethers the TRXL2 and βS2 domains in a closed conformation; along the second PC, CtUGGT D611C/G1050C moves further than the other structures.
The CtUGGT N796C/G1118C mutant, on the other hand, still retains most of its mobility, being able to explore a conformational space similar to that observed for wild-type CtUGGT (SI Appendix Figure S1B).
In activity assays of UGGT-mediated re-glucosylation of the misfolded glycoprotein substrate urea-misfolded bovine thyroglobulin, both the CtUGGT N796C/G1118C and CtUGGT D611C/G1050C mutants had lower activity than wild-type CtUGGT, but CtUGGT N796C/G1118C had a higher catalytic activity and a lower melting temperature than CtUGGT D611C/G1050C [5]. Taken together, these results suggested that the 'bending' motion is important for re-glucosylation of this particular substrate. In order to probe the functional role of the 'clamping' motion uncovered in the present analysis, we engineered three novel double cysteine CtUGGT mutants: CtUGGT G177C/A786C, CtUGGT G179C/T742C and CtUGGT S180C/T742C, all designed to form disulfide bridges across the TRXL1 and TRXL3 domains, clamping the cleft between them shut. All three CtUGGT double Cys mutants were expressed and purified from the supernatant of mammalian HEK293F cells; the presence of the engineered disulfide bridges was confirmed by mass spectrometry (SI Appendix Figure S3). The crystal structures of CtUGGT G177C/A786C and CtUGGT S180C/T742C were determined to about 4.7-4.5 Å resolution. Both crystal structures show the TRXL3 domain tethered to the TRXL1 domain by the extra disulfide bridge (Figure 6A). We tested the in vitro activity of the three double Cys mutants (in addition to the activity of the WT and the already published CtUGGT D611C/G1050C) in a re-glucosylation assay of the UGGT substrate urea-misfolded bovine thyroglobulin (Figure 6B). Despite their structural similarity, the CtUGGT S180C/T742C and CtUGGT G177C/A786C mutants differ significantly in their ability to re-glucosylate urea-misfolded bovine thyroglobulin: the former is more active than WT CtUGGT, while the latter has similar activity to it.
In order to probe the contributions of individual UGGT TRXL domains to UGGT re-glucosylating activity, we cloned three mutants of CtUGGT, each lacking one of the TRXL1-3 domains: CtUGGT-ΔTRXL1, lacking residues 42-224; CtUGGT-ΔTRXL2, lacking residues 417-650; and CtUGGT-ΔTRXL3, lacking residues 666-870. Only the latter two mutants expressed and were purified, and CtUGGT-ΔTRXL2 was the only TRXL domain deletion mutant that yielded crystals, enabling crystal structure determination by X-ray diffraction to 5.7 Å resolution. The CtUGGT-ΔTRXL2 crystal structure most closely resembles the 'closed' structure (1.32 Å Cα rmsd with PDB ID 5NV4, over 975 residues) apart from a minor rearrangement of the TRXL3 domain, which moves away from the rest of the truncated molecule (Figure 7A). UGGT-mediated re-glucosylation activity assays of CtUGGT-ΔTRXL2 and CtUGGT-ΔTRXL3 against urea-misfolded bovine thyroglobulin detect impaired re-glucosylation activity upon deletion of TRXL3 and complete loss of activity upon deletion of TRXL2 (Figure 7B).
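A Cα rmsd such as the one quoted above for the comparison with PDB ID 5NV4 is usually computed after optimal rigid-body superposition of matched atoms. A minimal sketch using the Kabsch algorithm follows; it is an illustration under assumptions, not the superposition program used here, and it presumes that the two coordinate arrays already list the same residues in the same order (the function name is hypothetical).

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (n, 3) coordinate sets after optimal superposition of p onto q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    pc = p - p.mean(axis=0)          # remove translation
    qc = q - q.mean(axis=0)
    h = pc.T @ qc                    # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = pc @ rot.T
    return float(np.sqrt(((p_rot - qc) ** 2).sum() / len(p)))
```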

UGGT: a 'one size fits all' adjustable spanner?
Taken together, our UGGT dynamics, structures and functional data so far  (Table 1 and Figure 8A). In contrast, for binding of larger substrates (15 Å ≲ RoG ≲ 23 Å, and 200-500 residues, where RoG is the radius of gyration) an opening of the central saddle would be needed (Figure 8B).
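The radius of gyration (RoG) used above to size substrates is the root-mean-square distance of the atoms from their centre of mass. The sketch below shows the calculation and a check against the 15-23 Å range quoted for larger substrates. The function names are hypothetical, the bounds are treated as sharp although the text gives them as approximate, and which atoms enter the calculation (all atoms or Cα only) is an assumption left to the user.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a list of (x, y, z) positions, in the input units."""
    if masses is None:
        masses = [1.0] * len(coords)   # unit masses by default
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    msd = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    ) / total
    return math.sqrt(msd)


def in_larger_substrate_range(rog, low=15.0, high=23.0):
    """True if an RoG (in Å) falls in the range quoted above for larger substrates."""
    return low <= rog <= high
```

For example, the 11.4 Å RoG listed for chymotrypsin inhibitor 2 falls below this range, so under the model proposed here it would not call for an opening of the central saddle.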

Discussion
Since the discovery of UGGT back in 1989 [11,12], activity studies have used a range of glycoprotein substrates, such as urea-misfolded bovine thyroglobulin [11], mutants of exo-(1,3)-β-glucanase [13], RNase BS [14,15], small-size synthetic compounds bearing high-mannose glycans attached to fluorescent aglycon moieties such as 'TAMRA' and 'BODIPY' [16,17] and chemically synthesized misfolded glycoproteins [18][19][20], to mention only a few. Although a comprehensive list of physiological UGGT substrate glycoproteins has not been compiled, and the molecular detail of UGGT:substrate interactions remains uncharacterized, it is apparent that the enzyme is highly promiscuous. Suggestions that the UGGT2 isoform (only present in higher eukaryotes) is competent in reglucosylating glycopeptides [21] may point to a duplication of the gene and evolution of two isoforms with separate pools of misfolded glycoprotein substrates.
If this is the case, the "UGGT1-ome" and "UGGT2-ome" (defined as full lists of clients of UGGT1 and UGGT2, respectively [22] [5]), with one study speculating large relative movements between the two portions of the UGGT molecule thanks to this flexible linker [6,7]. In this 'reach-and-grab' model, UGGT would preferentially bind and re-glucosylate glycans close to the site of misfold, but would also be able to extend to re-glucosylate distal glycans in neighboring folded regions, if the GT24 were to escape the embrace of the βS1-βS2 domains [13]. Atomic force microscopy (AFM) can indeed pull the GT24 and βS1-βS2 domains apart [6,7], but this likely constitutes mechanical denaturation, breaking the interface between these domains in a non-physiological manner. Here, we consult all the available structural evidence (namely crystal structures of full-length UGGTs and their mutants, their MD trajectories and the 25 Å negative-stain EM map for the complex between TdUGGT and an anti-TdUGGT Fab [6,7]) and find no evidence suggesting separation of the βS1-βS2 and GT24 domains on either side of the cleaved flexible linker. Claims to the contrary in [6,7] were likely due to difficulties in docking the N-terminal (PDB ID 5Y7O) and C-terminal portions (PDB ID 5H18) of TdUGGT separately into the negative-stain EM map, in the absence of the knowledge of the intimate association between the GT24 domain and the βS1-βS2 tandem domains, observed for the first time in full-length UGGT crystal structures [5] (the TdUGGT study was in the last stages of the editorial process at that time).
Having concluded that UGGT's promiscuity is not dependent on the flexible linker between the catalytic domain and the N-terminal misfold-sensing region, but is likely underpinned by the motions uncovered by the MD simulations, the question remains regarding UGGT's reported ability to survey not only the folding of small- and medium-size glycoprotein monomers, but also the quaternary structure of glycoprotein oligomers and larger multi-glycoprotein complexes [23] [24] [25]. The Indeed, the MD conformational landscape observed in this work reaches extremely compact conformations, which may explain the activity of the enzyme against synthetic glycopeptides [16][17][18][19][20]. Importantly, monomeric UGGT can recognize and re-glucosylate a misfolded glycoprotein only if it can bridge the distance between a folding defect and at least one of the glycoprotein's N-linked glycosylation sites. Thus, irrespective of the misfolded glycoprotein substrate, the finite size of the enzyme puts an upper limit on the distance between a site of misfold and an N-linked glycan that monomeric UGGT can re-glucosylate on the same glycoprotein substrate (unless UGGT misfolded glycoprotein recognition is mediated by UGGT dimers/multimers, a hypothesis not supported by any data so far). The existence of this limit would in turn imply evolutionary pressure on a secreted glycoprotein sequence to develop N-glycosylation sites at accessible distances from the portions of the glycoprotein that are most prone to folding difficulties (i.e. a folding glycoprotein's 'Achilles' heels'). We propose to name "Parodi limit" [12] the maximum distance between a UGGT substrate's site of misfold and an N-linked glycan on the same substrate that enables re-glucosylation by monomeric UGGT in response to recognition of misfold at that same site.
On the basis of our CtUGGT MD simulations at 300 K and of the conformational mobility of Man9GlcNAc2 N-linked glycans [26], we estimate the Parodi limit to be in the region of 60-70 Å. Functional data from UGGT-mediated re-glucosylation of a series of rigid, misfolded UGGT glycoprotein substrates, each bearing one recombinantly engineered N-linked glycosylation site at a specific distance from a single site of misfold common to all substrates in the series, would enable experimental estimation of the Parodi limit. Ideally, one such series of artificial N-linked glycosylation sites at varying distances from a single site of misfold would have to be engineered for a number of different substrate glycoprotein scaffolds, in order to minimise the dependency of the Parodi limit estimate on a given substrate series, and to estimate a standard error on that value.
When it comes to correlating UGGT interdomain conformational mobility with its activity, among the CtUGGT double cysteine mutants tested so far, the CtUGGT D611C/G1050C mutant described in [5] is the least active in re-glucosylating urea-misfolded bovine thyroglobulin, compatible with our observation that its MD When it comes to the clamping motion, two a priori rather similar double cysteine mutants, CtUGGT S180C/T742C and CtUGGT G177C/A786C , both designed to clamp the TRXL1-TRXL3 domains shut, differ significantly in their ability to reglucosylate urea-misfolded bovine thyroglobulin, with the former mutant more active than (and the latter mutant having similar activity to) WT CtUGGT. These observations point to the possibility that each misfolded glycoprotein substrate may depend to a different degree on a different subset of UGGT inter-domain conformational degrees of freedom. In the light of these data, the extent to which various portions of the UGGT structure and its motions are critical to its activity will profit from a number of re-glucosylation assays using the same set of UGGT mutants on different glycoprotein substrates.
As to the portion(s) of UGGT that mediate the binding to misfolded glycoproteins, involvement of TRXL2 and placement of substrates between this domain and the βS1-βS2-GT24 rigid domains group is the simplest hypothesis. The molecular forces supporting UGGT-mediated glycoprotein misfold recognition have been generally hypothesized to be hydrophobic/hydrophobic interactions [29]. Our observation that the UGGT TRXL2 domain surface facing the central saddle bears distinct patches of hydrophobic residues that are conserved across UGGT1 and UGGT2 sequences [5] supports such models of misfold recognition. The fact that UGGT bears de-mannosylated glycans, a hallmark of ER associated degradation (ERAD) [30,31], is also compatible with the hypothesis that UGGT may recognise misfolded glycoproteins via an intrinsically misfolded domain ("it takes one to know one" [22]), as observed for the mouse ERAD checkpoint mannosidase, which also preferentially acts on misfolded glycoproteins and has been proven to undergo constant ERAD degradation [32].

Experimental Procedures
Cloning, expression and purification of full-length CtUGGT are described in [5].

Cloning of CtUGGT G177C/A786C and CtUGGT V178C/A786C
Mutation of the CtUGGT into CtUGGT A786C was effected starting from the gene of Step 5: 72 °C for 2 minutes. After KLD treatment (see above) E. coli DH-5α chemically competent cells were transformed with the DNA as described previously. Colony-PCR was performed on DNA from various colonies (using T7_F and T7_R primers, see SI Appendix Table S3) and the DNA obtained was loaded on a 1% w/v agarose gel and run for 50 minutes at 150 V. Analysis of this gel allowed identification of colonies with amplified DNA of the appropriate size, which were used to inoculate 5 ml LB supplemented with 0.1 mg/mL carbenicillin. This was used to make glycerol stocks (see above). The DNA was mini-prepped (Qiagen), sequenced with primers X_F and X_R and maxiprepped (see above) to obtain 3 mL of CtUGGT S180C/T742C :Litmus28i plasmid DNA at 500 ng/µl.
The CtUGGT S180C/T742C insert in Litmus 28i was cloned into the pHLsec vector [33] to contain a hexa-His Tag at the C-terminus. DNA for pHLsec was linearised and gel-purified as described above for the CtUGGT G177C/A786C and CtUGGT V178C/A786C double mutants. PCR was performed on the CtUGGT S180C/T742C insert in Litmus 28i:  Table S3), and maxiprepped as described earlier, to obtain 3 mL of CtUGGT S180C/T742C:pHLsec plasmid DNA at 700 ng/µL.

Cloning of CtUGGT-ΔTRXL1
The  Table S3) and the DNA obtained was run on a 1% w/v agarose gel for 50 minutes at 150 V. Analysis of this gel allowed identification of colonies with amplified DNA of the appropriate size, from which the DNA was miniprepped, sequenced (with primer pHLsec_F, see SI Appendix Table S3) and maxiprepped to obtain 3 mL of CtUGGT-ΔTRXL1:Litmus28i plasmid DNA at 400 ng/µl. The CtUGGT-ΔTRXL1 insert in Litmus 28i was cloned into the pHLsec vector to contain a hexa-His Tag at the C-terminus, as described before for the double Cys mutants, to obtain 3 mL of CtUGGT-ΔTRXL1:pHLsec plasmid DNA at 500 ng/µl.

Cloning of CtUGGT-ΔTRXL3
The  Table S3) and the DNA obtained was run on a 1% w/v agarose gel for 50 minutes at 150 V. Analysis of this gel allowed identification of colonies with amplified DNA of the appropriate size, from which the DNA was miniprepped, sequenced (with primer CtUGGT_401_800_F, see SI Appendix Table S3) and maxiprepped to obtain 3 mL of CtUGGT-ΔTRXL3:Litmus28i plasmid DNA at 500 ng/µl. The CtUGGT-ΔTRXL3 insert in Litmus 28i was cloned into the pHLsec vector to contain a hexa-His Tag at the C-terminus, as described before for the double Cys mutants, to obtain 3 mL of CtUGGT-ΔTRXL3:pHLsec plasmid DNA at 800 ng/µl.

Protein Production.
All transfections were carried out as follows, except where otherwise indicated.
Human embryonic kidney FreeStyle 293F cells (ThermoFisher Scientific) at 10⁶ cells/mL suspended in FreeStyle 293 Media (ThermoFisher Scientific) were transfected using the FreeStyle 293 expression system (ThermoFisher Scientific). Protein-containing fractions were pooled and concentrated to V = 800 µL, OD280 = 19.00 (6.28 mg/mL), and the sample stored at 4 °C.

CtUGGT G177C/A786C. A 200 mL volume of HEK293F cell culture was transfected with CtUGGT G177C/A786C:pHLsec plasmid DNA and left expressing for 4 days, the supernatant processed as described previously for CtUGGT S180C/T742C, and run on  and it was decided to dehydrate them by re-equilibrating the crystallization drop against a PEG 6,000-containing mother liquor reservoir: 13 µL of mother liquor were taken out of the 50 µL in the reservoir, replaced with 13 µL of a solution of 50% w/v PEG 6,000 in mother liquor, and the plate re-sealed. After undergoing dehydration for a week, one crystal was flash frozen in liquid N2 for data collection.
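The reservoir exchange used for crystal dehydration amounts to a simple dilution, worked through below. The sketch assumes additive volumes and reports only the PEG 6,000 added by the exchange; any PEG 6,000 already present in the original mother liquor (not stated here) would come on top of this value. The function name is hypothetical.

```python
def added_precipitant_percent(reservoir_ul, exchanged_ul, stock_percent_wv):
    """Final % w/v of precipitant contributed by swapping part of a reservoir for stock.

    reservoir_ul:      total reservoir volume, unchanged by the swap (µL)
    exchanged_ul:      volume removed and replaced with stock solution (µL)
    stock_percent_wv:  precipitant concentration of the stock (% w/v)
    """
    if not 0 <= exchanged_ul <= reservoir_ul:
        raise ValueError("exchanged volume must lie between 0 and the reservoir volume")
    return stock_percent_wv * exchanged_ul / reservoir_ul


# Values from the protocol above: 13 µL of 50% w/v PEG 6,000 swapped into a 50 µL reservoir.
added_peg = added_precipitant_percent(50.0, 13.0, 50.0)
```

With these numbers the exchange raises the reservoir PEG 6,000 concentration by 13% w/v. Because the PEG stock was made up in mother liquor, the other reservoir components stay at their original concentrations.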
The amount of glucosylation was measured in comparison to control by measuring the peak areas of the PNGase F-released, 2-AA-labelled species Man9GlcNAc2 and Glc1Man9GlcNAc2 using Waters Empower software. This allows the % of glucosylation to be determined as the % of the Glc1 species (peak area of Glc1Man9GlcNAc2) out of the total of potential glucosylation species (peak area of Glc1Man9GlcNAc2 + Man9GlcNAc2).
All datasets were processed with the autoPROC suite of programs [37]. SI Appendix Table S1 contains the data processing statistics.
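The percentage of glucosylation defined above from the two chromatographic peak areas can be written as a short calculation. This is a sketch of the stated ratio only: the function name is hypothetical, and peak integration itself (performed in Waters Empower) is outside its scope.

```python
def percent_glucosylation(area_glc1man9, area_man9):
    """% glucosylation = 100 * A(Glc1Man9GlcNAc2) / (A(Glc1Man9GlcNAc2) + A(Man9GlcNAc2))."""
    if area_glc1man9 < 0 or area_man9 < 0:
        raise ValueError("peak areas must be non-negative")
    total = area_glc1man9 + area_man9
    if total == 0:
        raise ValueError("at least one peak area must be non-zero")
    return 100.0 * area_glc1man9 / total
```

For instance, peak areas of 25 and 75 (arbitrary units, made up for illustration) for the glucosylated and non-glucosylated species would correspond to 25% glucosylation.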

Crystal Structure Determination and Refinement.
CtUGGT Kif (PDB ID 6TRF): Phaser [38] was run in all primitive orthorhombic space groups searching for one copy of PDB ID 5NV4 from which TRXL2 was domain for a fourth copy was subject to refinement with the same protocol as above (R=31.9% Rfree=33.3%). After this refinement, electron density for the missing TRXL3 domain and the remaining domains of the fourth copy of the molecule was visible in the map. One of the CtUGGT-ΔTRXL2 molecules was superposed onto the fourth copy's TRXL3 domain, followed by rigid body fitting of the bulk of the final copy in Coot [45]. The final model was refined in autoBUSTER with one set of TLS thermal motion tensors per domain and non-crystallographic symmetry and external restraints to the PDB ID 5NV4 structure.
CtUGGT S180C/T742C (PDB ID 6TRT): CCP4-Molrep was run against the CtUGGT S180C/T742C data in P3 1 Table S2 reports the Rfactors and geometry statistics for all models after the final refinements. software [49]. Standard protonation states were assigned to titratable residues (Asp and Glu are negatively charged; Lys and Arg are positively charged).

Fitting of the
Histidine protonation was assigned favoring formation of hydrogen bonds in the crystal structure. The complete protonated systems were then solvated by a truncated cubic box of TIP3P waters, ensuring that the distance between the biomolecule surface and the box limit was at least 10 Å.
MD simulations. Each system was first optimized using a conjugate gradient algorithm for 5000 steps, followed by a 150 ps long constant-volume MD equilibration, in which the first 100 ps were used to gradually raise the temperature of the system from 0 to 300 K (integration step = 0.0005 ps/step). The heating was followed by a 250 ps long constant-temperature and constant-pressure MD simulation to equilibrate the system density (integration step = 0.001 ps/step).
During these temperature and density equilibration processes, the protein alpha-carbon atoms were restrained with a force constant of 5 kcal/mol/Å using a harmonic potential centered at each atom's starting position. Next, a second equilibration MD of 500 ps was performed, in which the integration step was increased to 2 fs and the force constant for restrained alpha-carbons was decreased to 2 kcal/mol/Å.
CtUGGT secondary structure is indicated above its sequence. Blue dots: residues whose side chains are forming hydrogen bonds across the GT24:βS1-βS2 domains interface. Red stars: residues whose side chains are forming salt bridges across the GT24:βS1-βS2 domains interface. Orange squares: residues whose side chains are forming hydrophobic interactions across the GT24:βS1-βS2 domains interface. The sequences were aligned using Clustal Omega [52]. The figure has been made using ESPript [53].
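The heating and density-equilibration stages described above translate into integration step counts and a temperature schedule as sketched below. The durations and time steps are those given in the text; the linear shape of the temperature ramp is an assumption (the text only says the temperature is raised gradually), and the function and variable names are hypothetical rather than settings of any particular MD engine.

```python
def n_steps(duration_ps, timestep_ps):
    """Number of integration steps needed to cover a duration at a given time step."""
    return round(duration_ps / timestep_ps)


def heating_target_kelvin(t_ps, ramp_ps=100.0, final_kelvin=300.0):
    """Target temperature at time t_ps: assumed linear ramp from 0 K, then held constant."""
    clamped = min(max(t_ps, 0.0), ramp_ps)
    return final_kelvin * clamped / ramp_ps


# (stage name, duration in ps, integration step in ps), as stated in the text.
STAGES = [
    ("constant-volume heating and equilibration", 150.0, 0.0005),
    ("constant-pressure density equilibration", 250.0, 0.001),
]

STEP_COUNTS = {name: n_steps(duration, dt) for name, duration, dt in STAGES}
```

Under these settings the constant-volume stage corresponds to 300,000 steps and the constant-pressure stage to 250,000 steps, preceded by the 5000 steps of conjugate gradient optimization.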

Protein | PDB ID | Residues | Radius of Gyration (Å) | Reference
Crambe hispanica crambin | 1CRN | 46 | 9.7 | [54]
Hordeum vulgare chymotrypsin inhibitor 2 | 2CI2 | 64 | 11.4 | [29]
Human
We have not included glycoproteins that have been inferred to be UGGT substrates by in cellula experiments (see for example [25,61-64]) nor glycoproteins that are bona fide UGGT substrates but whose structure has not been determined [11,65]. (*): structures are available for the mature glycoprotein but it is the pro-glycoprotein (prior to protease cleavage) that folds in the ER under UGGT control, so we have not estimated the RoG.
Tables.
SI Appendix Table S1. CtUGGT X-ray diffraction data collection statistics. All structures were refined against X-ray data from one crystal only. Values in parentheses are for the highest-resolution shell.