DNA-Protein Crosslink Proteolysis Repair.

Proteins that are covalently bound to DNA constitute a specific type of DNA lesion known as DNA-protein crosslinks (DPCs). DPCs are physical obstacles to the progression of DNA replication. If not repaired, DPCs stall DNA replication forks, which consequently leads to DNA double-strand breaks (DSBs), the most cytotoxic DNA lesion. Although DPCs are common DNA lesions, the mechanism of DPC repair remained unclear until recently. Recent work revealed that DPC repair is orchestrated by proteolysis performed by two distinct metalloproteases, SPARTAN in metazoans and Wss1 in yeast. This review summarizes recent discoveries on these two proteases in DNA replication-coupled DPC repair and establishes DPC proteolysis repair as a separate DNA repair pathway for genome stability and protection from accelerated aging and cancer.


Overview of DNA-protein crosslinks
DNA contains all the genetic information a cell and organism require to grow, differentiate, survive and divide. DNA is an extremely fragile molecule embedded in a highly reactive aqueous environment. It is constantly damaged by various endogenous and exogenous agents, such as reactive oxygen species and UV light, respectively. Loss of DNA integrity leads to various defects in cellular physiology and consequently to cell death or to diseases such as cancer, diabetes, accelerated aging and neurodegeneration. To preserve DNA integrity, all organisms possess an elaborate genome maintenance apparatus consisting of multiple DNA damage repair (DDR) and DNA damage tolerance (DDT) pathways (see Glossary) (Figure 1). Different DDR pathways are involved in the recognition and repair of specific types of DNA lesions [1,2]. Most of the known DDR pathways are now well characterised. However, it is still not well understood how DNA-protein crosslinks are repaired [3]. Although DPCs are among the most abundant DNA lesions and are cytotoxic if not removed, the existence of a specialised DPC repair pathway remained elusive. Recently, several research groups identified a unique DNA-protein crosslink repair pathway based on proteolysis, which we term here DNA-protein crosslink proteolysis repair (DPC-PR) [4][5][6][7][8]. The DPC-PR pathway is conserved from yeast to humans and is orchestrated by the DNA-dependent proteases Wss1 in yeast and SPARTAN (SPRTN), also known as DVC1, in metazoans. The aim of this review is to establish DPC-PR as a unique DNA repair pathway based on DNA replication-coupled proteolysis orchestrated by the SPRTN protease in metazoans and Wss1 in yeast.

Origins and chemistry of DNA-protein crosslinks
DPCs are created when proteins covalently and irreversibly bind to DNA. Virtually any protein in close proximity to DNA can be crosslinked to it upon exposure to various endogenous or exogenous crosslinking agents. Mass-spectrometry analyses identified numerous DNA-binding proteins, including histones, transcription factors, and DNA repair and replication proteins, as well as non-DNA-binding proteins, as DPCs [4,9-11]. DPC classification is explained in Box 1. Given the high concentration of histones in close proximity to DNA, it is not surprising that histones are among the most abundant DPCs [12]. Aldehydes and reactive oxygen and nitrogen species cause DPCs through agent-specific mechanisms. UV light excites DNA bases, most commonly thymidines, which in turn covalently bind to amino acids, with the highest efficiency to cysteine, lysine, phenylalanine, tryptophan or tyrosine [26].
For the detailed chemistry of DPCs, see Box 2. Altogether, DPCs are constantly formed in our genome and, if not repaired, pose a severe threat to genome integrity.

Involvement of canonical DNA repair pathways in DPC repair
Although DPCs are abundant DNA lesions, the mechanisms of DPC repair were under-investigated and thus poorly understood [12]. A few biochemical and genetic studies in different organisms suggested that two canonical DNA repair pathways, nucleotide excision repair (NER) and homologous recombination (HR), orchestrate DPC repair and protect cells from DPC-induced cytotoxicity (Figure 2A) [27]. This concept came from initial studies in bacteria, where NER was found to remove and repair small DPCs (smaller than 16 kDa) [28], while HR and subsequent replication restart repaired bulky DPCs [12,29]. A genome-wide screen in yeast implicated NER in the repair of DPCs after acute exposure to high formaldehyde doses, and HR in the repair of DPCs after chronic exposure to low formaldehyde doses [30]. Sensitivity analysis of mutant yeast strains also showed that NER dominates DPC repair after high formaldehyde doses [5]. The coordination of NER and HR in yeast probably depends on the cell cycle phase (high formaldehyde doses cause cell cycle arrest and thus favour NER) and, as in bacteria, on the size of the DPC, although this has not yet been shown. Analysis of mammalian NER excision capacity in vitro and in vivo showed that NER can only remove DNA-crosslinked proteins smaller than 8-10 kDa, suggesting that bulky DPCs (larger than 16 kDa) need to be processed into smaller peptides before NER can act. Similarly, removal of the enzymatic DPCs TOP1-ccs and TOP2-ccs by tyrosyl-DNA phosphodiesterase 1 (TDP1) and 2 (TDP2), respectively, requires upstream proteolysis of TOP1 and TOP2 into smaller peptides [45], as TDP1 and TDP2 can efficiently process peptides of ~150 amino acids [46][47][48] (Box 3). Altogether, these data suggest that a protease must exist that cleaves large DPCs into small peptide remnants attached to the DNA backbone.
These peptide remnants are then processed by NER, TDP1 and TDP2, or bypassed by translesion DNA synthesis (TLS) during DNA replication. Without such a protease, bulky DPCs would block the progression of the DNA replication fork and lead to DSBs in proliferating cells. Thus, proteolysis-coupled DPC repair was proposed [49].
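The size limits above are given in kilodaltons for NER and in residues for TDP1/2. As a rough worked conversion, the sketch below assumes the common rule of thumb of ~110 Da per amino acid residue. The `triage` helper and its 8 kDa default cut-off are hypothetical illustrations of the "proteolysis first, then NER" logic, not a published model.

```python
# Rough conversions between DPC protein size and length, assuming an
# average residue mass of ~110 Da (a rule of thumb, not an exact value).
AVG_RESIDUE_DA = 110.0


def kda_to_residues(kda: float) -> int:
    """Approximate number of amino acid residues in a protein of `kda` kilodaltons."""
    return round(kda * 1000 / AVG_RESIDUE_DA)


def residues_to_kda(n_residues: int) -> float:
    """Approximate mass in kilodaltons of a peptide with `n_residues` residues."""
    return n_residues * AVG_RESIDUE_DA / 1000


def triage(kda: float, ner_limit_kda: float = 8.0) -> str:
    """Hypothetical classifier: crosslinked proteins below the (assumed) NER
    size limit could be excised directly; larger ones first need proteolysis."""
    return "ner_candidate" if kda < ner_limit_kda else "needs_proteolysis"


if __name__ == "__main__":
    for kda in (8.0, 10.0, 16.0):
        print(f"{kda:>5.1f} kDa ~ {kda_to_residues(kda)} residues")
    print(f"150 residues ~ {residues_to_kda(150):.1f} kDa")
    print(triage(5.0), triage(16.0))
```

Under this approximation, 8-10 kDa corresponds to roughly 73-91 residues, 16 kDa to roughly 145 residues, and a ~150-residue peptide to about 16.5 kDa. Real masses vary with amino acid composition, so these figures are only order-of-magnitude guides.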

Proteasome in DPC repair
The proteasome, being the main protease involved in protein degradation, was considered to be involved in proteolytic DPC repair. However, the role of the proteasome in DPC processing remains unclear owing to contradictory reports in the literature. In bacteria, inhibition of ATP-dependent proteases, which function like a proteasome, did not affect cell survival after exposure to the DPC-inducing agents formaldehyde and azacytidine [29]. In human cells, proteasome inhibition prevented the removal of histone DPCs, TOP1-ccs and TOP2-ccs

DNA-dependent Proteases in DPC repair
The proteolysis-dependent model of DPC repair [49] was further supported by several studies, all of which demonstrated replication-coupled proteolysis of a specific DPC in vitro [44,55,56]. However, the protease in question remained unknown. The model gained further support from the discovery of Wss1, a yeast protease found to cleave TOP1, histone H1 and HMG proteins in vitro and to contribute to cellular resistance to formaldehyde and camptothecin (CPT) [5]. Four recent studies identified a new […]
Although Wss1 and SPRTN share similar proteolytic activity in vitro, in vivo they show considerable differences in DPC removal and in sensitivity to DPC-inducing agents. SPRTN alone prevents the accumulation of endogenous as well as formaldehyde-induced DPCs [4,6]. Concordantly, SPRTN protects cells from formaldehyde-induced DPC toxicity [4,6,7]. By contrast, Wss1 is not involved in the removal of DPCs following formaldehyde treatment, but does partially protect cells from formaldehyde-induced DPC toxicity [5]. Another difference between Wss1 and SPRTN is observed in the repair of TOP1-ccs. SPRTN deficiency results in TOP1-cc accumulation and severe sensitivity to CPT [4], whereas Wss1 depletion causes no adverse effects in untreated yeast cells [5]. Cell survival is affected only upon co-depletion of Wss1 with Tdp1, whereas in mammals co-depletion of TDP1 and SPRTN has no additive effect compared with SPRTN depletion alone [4]. The in vivo differences between the two proteases are further demonstrated by the inability of ectopic SPRTN over-expression to rescue the phenotypes of Wss1-deficient yeast cells [7].
However, despite numerous functional differences, both proteases are essential for replication fork progression [4,8,58,61], indicating that Wss1, like […] family of alanyl aminopeptidases (Figure 3A). This analysis indicates that, contrary to a previous suggestion [67], the WLM and SPRT families do not share a common ancestor. Our methodology differs from previous phylogenetic studies comparing the SPRT and WLM families in two ways: (i) an extended number of species was included in the analysis, most importantly prokaryotes (bacteria, archaea, cyanobacteria) and plants, which increases the accuracy of tree topologies; and (ii) our analysis included another gluzincin family, which provides a comparative perspective on the relationship between the SPRT and WLM families (Figures S1, S2 and Table S1). Moreover, the SPRT family is present in bacteria, archaea, cyanobacteria, fungi, plants and animals, but is absent from yeast (Figure 3A, Table S1). Wss1 belongs to the WLM protein family which, like the SPRT family, consists of zincin metalloproteases [68]. WLM proteins are present in yeast, fungi and plants, but are absent from animals and bacteria (Figure 3A) [12,68]. A conserved feature of all zincin metalloproteases is a short consensus HEXXH motif in their active centres, which includes two zinc-binding histidines and a glutamic acid. SPRT and WLM domains (Figure 3B) also differ greatly in amino acid sequence (5% identical, 14% similar) and can only be aligned over a short region around the HEXXH motif (Figure S2) [4]. In addition, SPRT domains have many highly conserved regions that are not present in WLM domains (Figure S2).
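Two quantities in this comparison are easy to make concrete: locating the HEXXH zinc-binding motif in a protein sequence, and computing percent identity and similarity between two aligned sequences. The sketch below is illustrative only. The residue groups used to call a position "similar" are a simplified assumption (published identity/similarity values are normally reported by the alignment tool from a substitution matrix such as BLOSUM62), the gap-excluded denominator is one of several conventions, and the example sequences are made up.

```python
import re

# Zincin zinc-binding motif His-Glu-X-X-His. The lookahead lets the scan
# report overlapping matches (e.g. in "HEAAHEAAH").
HEXXH = re.compile(r"(?=(HE..H))")

# Simplified conservative-substitution groups (an illustrative assumption;
# alignment tools usually call "similar" positions from a matrix such as BLOSUM62).
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def find_hexxh(seq: str) -> list[int]:
    """Return 0-based start positions of every HEXXH match in a protein sequence."""
    return [m.start() for m in HEXXH.finditer(seq.upper())]


def identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and similarity between two aligned sequences,
    counted over columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0.0
    identical = sum(x == y for x, y in pairs)
    # Identical residues also count as similar (as BLAST "positives" do).
    similar = sum(
        x == y or GROUP_OF.get(x, -1) == GROUP_OF.get(y, -2) for x, y in pairs
    )
    n = len(pairs)
    return 100 * identical / n, 100 * similar / n
```

For example, `find_hexxh("MKHEAGHLL")` returns `[2]`, and `identity_similarity("AKLE-", "AREDG")` returns `(25.0, 75.0)`: four gap-free columns, one identical (A/A) and two further conservative pairs (K/R, E/D).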
To further strengthen our finding that SPRTN and Wss1 are two independently evolved enzymes, we modelled the structure of the SPRT domain of SPRTN using the recently solved crystal structure of Wss1b from fission yeast […] (Figure 3C, right). The described protein core of SPRTN is shared with other gluzincin metallopeptidases, including Wss1b. This is confirmed by a homology model of the SPRT domain built on another zincin protease, abylysin (PDB 4JIU) (Figure 3C, left). Indeed, the region over which the SPRT domain could be modelled with high confidence was longer when aligned to abylysin than to Wss1b. The additional part of the SPRT domain modelled with high confidence includes three β-sheets upstream of the HEXXH active centre.
Apart from the differences between the SPRT and WLM domains in amino acid sequence and structure, the two proteins differ distinctly in their C-terminal regions. Therefore, we suggest that similarities between SPRTN and Wss1 with respect to substrate specificity, affinity, mode of substrate binding and recruitment to chromatin should be extrapolated with caution. Most importantly, structural extrapolations should not be made before the crystal structure of SPRTN is solved. We conclude that the WLM and SPRT families are two separate families within the gluzincin subgroup of the zincin superfamily. Like other gluzincins, they share similar properties such as an HEXXH proteolytic active site. Moreover, SPRTN is evolutionarily closer to bacterial SPRT proteases than to Wss1. Thus, we would like to clear up the confusion in the published literature, which occasionally states that yeast Wss1 is an ortholog of SPRTN. Any functional properties shared by these two proteases are the result of convergent rather than divergent evolution [4].
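The tree-building settings listed in the Supplementary methods (MAFFT alignment; PhyML with the LG model, 10 rate categories, best of NNI and SPR moves, and aLRT branch support) can be expressed as command lines. The sketch below is an assumption-laden reconstruction, not the authors' script: the MAFFT `--auto` strategy, the file names, and the `run` helper are hypothetical, the Guidance quality check is omitted, and PhyML is assumed to read a PHYLIP-format alignment, so a format-conversion step (not shown) would sit between the two tools.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def mafft_cmd(fasta: Path) -> List[str]:
    """MAFFT with automatic strategy selection (assumed); alignment goes to stdout."""
    return ["mafft", "--auto", str(fasta)]


def phyml_cmd(phylip: Path) -> List[str]:
    """PhyML maximum-likelihood search mirroring the settings in the methods."""
    return [
        "phyml", "-i", str(phylip),
        "-d", "aa",     # protein (amino acid) data
        "-m", "LG",     # LG substitution model
        "-c", "10",     # number of relative substitution rate categories
        "-s", "BEST",   # best of NNI and SPR tree rearrangements
        "-b", "-1",     # approximate likelihood ratio test (aLRT) statistics
    ]


def run(cmd: List[str], stdout_path: Optional[Path] = None) -> None:
    """Hypothetical helper: run one pipeline step, failing early if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    if stdout_path is None:
        subprocess.run(cmd, check=True)
    else:
        with stdout_path.open("w") as out:
            subprocess.run(cmd, stdout=out, check=True)
```

A typical invocation would be `run(mafft_cmd(Path("sprt_wlm_m1.fasta")), Path("sprt_wlm_m1.aln.fasta"))`, followed, after conversion to PHYLIP, by `run(phyml_cmd(Path("sprt_wlm_m1.phy")))`. Keeping command construction separate from execution makes the chosen parameters easy to inspect and record.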

DNA Replication-coupled DPC Proteolysis Repair
DPCs are strong physical blocks to the progression of DNA replication, causing DNA replication fork stalling and, consequently, fork collapse [4,8]. Using an in vitro approach in Xenopus egg extracts, it was recently demonstrated that for replication to progress in the presence of DPCs, the DPCs have to be cleaved into smaller peptides on both the leading and the lagging DNA strand [44]. However, the protease that processes DPCs remained unknown until SPRTN was identified as the S-phase-specific protease responsible for DPC repair [4] (Figure 2B).

DPCs and human pathogenesis
The contribution of DPCs to human pathogenesis was proposed by several studies that associated different DPC-inducing agents with ageing and cancer [78-80]. Mice […]

Concluding remarks and future perspectives
Repair of DPCs was dogmatically considered to be solely under the jurisdiction of canonical DNA repair pathways such as NER and HR. However, recent independent work from several laboratories demonstrates that a specialised DNA repair pathway, which strictly depends on proteolysis, repairs DPCs. We named this novel pathway DNA-protein crosslink proteolysis repair (DPC-PR). The proteases involved in DPC-PR are Wss1 in yeast and SPARTAN in metazoans. The discovery of these two proteases and of a human syndrome resulting from defective DPC repair establishes this new DNA repair pathway as an essential mechanism for genome maintenance and protection from accelerated ageing and cancer in mammals (see Outstanding questions).

Supplementary methods
PSI-BLAST (Position-Specific Iterated BLAST) was used to identify orthologs of SPRTN in bacteria, archaea, cyanobacteria, yeast, fungi, plants and animals by querying the NCBI (National Center for Biotechnology Information) database with the SPRT domain protein sequence of human SPRTN [1]. The same approach was used to identify WLM domain-containing proteins, using the WLM domain protein sequence of S. cerevisiae Wss1. Alanyl aminopeptidase (leukotriene A-4 hydrolase) protein sequences were downloaded from NCBI, and M1-LTA4H domains containing an HEXXH protease core were used for alignment and tree construction. Multiple sequence alignments were done using MAFFT [2]. Alignment quality was estimated with the Guidance software (alignment score 0.532896) [3]. The phylogenetic tree was constructed by the Maximum Likelihood method in PhyML 3.0.1 (LG model, 10 rate categories, best of NNI and SPR for tree searching operations) [4]. Node confidence was estimated by the approximate likelihood ratio test (aLRT) [5]. Homology modelling of the human SPRT domain was done using the crystal structure of Wss1b from fission yeast (Schizosaccharomyces pombe) (PDB: 5JIG) as a