Mapping the SARS-CoV-2 spike glycoprotein-derived peptidome presented by HLA class II on dendritic cells

Understanding and eliciting protective immune responses to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an urgent priority. To facilitate these objectives, we have profiled the repertoire of human leukocyte antigen class II (HLA-II)-bound peptides presented by HLA-DR-diverse monocyte-derived dendritic cells pulsed with SARS-CoV-2 spike (S) protein. We identify 209 unique HLA-II-bound peptide sequences, many forming nested sets, which map to sites throughout S including glycosylated regions. Comparison of the glycosylation profile of the S protein to that of the HLA-II-bound S peptides revealed substantial trimming of glycan residues on the latter, likely introduced during antigen processing. Our data also highlight the receptor-binding motif in S1 as an HLA-DR-binding peptide-rich region. Results from this study have application in vaccine design, and will aid analysis of CD4+ T cell responses in infected individuals and vaccine recipients.


INTRODUCTION
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a novel beta-coronavirus that first emerged as a human pathogen in the Hubei province of China in late 2019, and is the aetiologic agent of coronavirus disease 2019 (COVID-19). Although SARS-CoV-2 infection is frequently asymptomatic or results in only mild illness, ~20% of symptomatically infected individuals progress to develop severe pneumonia, acute respiratory distress syndrome and/or sepsis, which can be fatal. By 24 July 2020, 15,296,926 cases and 628,903 deaths had been reported worldwide 1. The rapid global spread of SARS-CoV-2 and the resulting pandemic have placed tremendous pressure on healthcare services, had a huge societal impact, and profoundly damaged the global economy, prompting an urgent need for vaccines to prevent further spread of infection and avert disease development 1.
Immune correlates of protection against SARS-CoV-2 infection and progression to severe disease are not yet well understood, although infection was found to induce at least short-term protective immunity in a SARS-CoV-2 non-human primate (NHP) infection model, indicating that immune responses are capable of mediating protection 2. Passively transferred neutralising antibodies (nAbs) protect against SARS-CoV-2 infection in small animal models, and convalescent sera have been shown to be effective in the treatment of severe disease, suggesting the utility of nAb induction by vaccines 3,4,5. Notably, the four seasonal common cold-causing human coronaviruses and the zoonotic Middle East respiratory syndrome (MERS) and SARS coronaviruses typically elicit poorly sustained nAb responses, putatively enabling subsequent re-infection 6. However, somewhat more durable T cell responses are induced, which in animal models can prevent development of severe disease on challenge, providing a rationale for vaccine-mediated induction of T cell as well as nAb responses 7,8,9,10.
More than 150 candidate SARS-CoV-2 vaccines are now in preclinical development or clinical trials 11 .
The SARS-CoV-2 spike (S) glycoprotein (comprised of S1 and S2 subunits) is the primary target of vaccine development efforts. Homotrimers of the transmembrane S protein on the virion surface mediate virion attachment and entry into host cells, making S a key target for nAbs 12 .
S is also highly immunogenic for T cells, with many studies suggesting that although infected individuals mount CD4+ and CD8+ T cell responses to epitopes throughout the viral proteome, S is often at the top of the antigenic hierarchy 13,14,15. The relative roles of CD4+ and CD8+ T cells in disease control or pathogenesis and the impact of their protein and epitope specificity are unknown; but given the importance of CD4+ T cells (particularly CD4+ T follicular helper (Tfh) cells) in providing help for antibody responses 16, and the correlation of memory B cell/nAb responses to S with circulating CD4+ Tfh responses in recovered COVID patients 17, induction of potent Tfh cell responses to the S protein is likely to be crucial for the success of nAb-inducing vaccines.
CD4+ T cells are initially activated in response to recognition of specific peptides presented with major histocompatibility complex class II (MHC-II) molecules on professional antigen presenting cells such as dendritic cells (DCs) 18,19,20. HLA-II polymorphisms dictate the repertoire of peptides presented for CD4+ T cell recognition and shape the response elicited, which can influence the outcome of infection or vaccination 21. Here, we defined SARS-CoV-2 S-derived peptides presented with diverse HLA-II alleles on DCs to facilitate analysis of pre-existing, post-infection or vaccine-elicited S-specific CD4+ T(fh) cell responses and their roles in protection, pathogenesis and prevention of re-infection.

RESULTS
Approach for analysis of SARS-CoV-2 S HLA-II presentation by monocyte-derived DCs (MDDCs)
To identify peptides in the SARS-CoV-2 S protein with potential for targeting by CD4+ T cell responses, a mass spectrometry-based immunopeptidome profiling approach was employed to define peptides presented by HLA-II on DCs, antigen presenting cells that play a key role in in vivo CD4+ T cell priming (Figure 1A). MDDCs were generated from 5 HLA-DRB1-heterozygous donors, selected to enable profiling of peptides presented with a total of 9 different HLA-DRB1 alleles (Table 1); these donors also expressed 7 distinct HLA-DPB1 alleles.
MDDCs from each donor were pulsed with a recombinant SARS-CoV-2 S protein vaccine immunogen candidate 22 (produced in a mammalian cell expression system, Figure S2) or a recombinant viral glycoprotein from an unrelated virus that had been produced in the same way (to provide a negative control dataset), and incubated for 18 hours to allow antigen uptake, processing and presentation. Flow cytometry analysis indicated that, as anticipated, the CD11c+ MDDC population had robust expression of HLA-I and HLA-DR, expressed high levels of the lectin-type receptors DC-SIGN and DEC-205 and had a relatively immature phenotype, expressing low levels of the DC maturation marker CD83 and moderate levels of the costimulatory molecules CD40, CD80 and CD86. Notably, no difference was observed in the phenotype of SARS-CoV-2 S protein and control protein-pulsed MDDCs, indicating that the S protein had not altered the DC maturation state or HLA expression levels (Figure 1B, Figure S1).

Immunopeptidomic profiling of HLA-II-associated peptides presented by MDDCs
Protein-pulsed MDDCs were lysed and sequential immunoprecipitations performed with a pan-HLA-I-specific antibody (W6/32) for depletion of HLA-I complexes, followed by serial pan-HLA-DR (L243) and pan-HLA-DP (B721) immunoprecipitations for enrichment of HLA-DR- and HLA-DP-peptide complexes. After peptide elution and sequencing by tandem mass spectrometry, a total of 27,081 unique HLA-DR- and 2,801 HLA-DP-associated peptide sequences were identified at a 1% false discovery rate (FDR), of which 147 (HLA-DR) and 12 (HLA-DP) mapped to the S protein (Figure 2A-D, Table S1). None of these peptides were identified in the control protein-pulsed MDDC samples (not shown), consistent with derivation from the S protein antigen. The total number of identified peptides varied between donors, and was influenced by starting cell numbers (Figure 2A-E). The overall peptide length distributions were highly characteristic of HLA-II-associated peptides, with a median length of 15 amino acids for both human and S peptides (Figure 2F,G). As the HLA-DRA chain is invariant, differences in the peptide binding repertoires of HLA-DR heterodimers are dependent on the HLA-DRB allele expressed. Binding predictions (performed using NetMHCIIpan 4.0) suggested that 60-80% of the peptide sequences identified in the HLA-DR immunopeptidome had a high predicted binding affinity for the donor-specific HLA-DRA/B1 alleles 23 (Figure 2H). When stratified by HLA-DRB1 allele, ~80% of peptides were predicted to bind just 1 allele in donors C2, C460 and C459, whilst for donors C491 and C493, peptides were more equally distributed between both HLA-DRB1 alleles (Figure 2I). This was further reflected in an unsupervised Gibbs clustering analysis, which revealed the distinct sequence motifs characteristic of at least one of the donor's HLA-DRB1 alleles (Figure 2J). Prior to purifying class II complexes, we also purified HLA-I ligands and identified 29,309 self-peptides.
There was no evidence of MDDC HLA-I cross-presentation of the pulsed S protein (data not shown).
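The per-allele stratification described above can be illustrated with a minimal sketch. It assumes that a binding predictor has already produced a percentile rank for each peptide against each of a donor's two HLA-DRB1 alleles; the peptide labels, rank values and the cutoff of 10 are hypothetical placeholders chosen for illustration, not values taken from this study or from NetMHCIIpan defaults.

```python
from collections import Counter

def assign_alleles(rank_table, cutoff=10.0):
    """Assign each peptide to the allele with the lowest (best) predicted rank.

    rank_table maps peptide -> {allele: percentile rank}. Peptides whose best
    rank exceeds the cutoff are left unassigned (None). The cutoff is an
    assumption for illustration only.
    """
    assignments = {}
    for peptide, ranks in rank_table.items():
        allele, rank = min(ranks.items(), key=lambda kv: kv[1])
        assignments[peptide] = allele if rank <= cutoff else None
    return assignments

def allele_shares(assignments):
    """Fraction of assigned peptides attributed to each allele."""
    counts = Counter(a for a in assignments.values() if a is not None)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()} if total else {}

# Hypothetical ranks for five placeholder peptides in one heterozygous donor.
example_ranks = {
    "pep1": {"DRB1*01:01": 0.5, "DRB1*15:01": 40.0},
    "pep2": {"DRB1*01:01": 3.0, "DRB1*15:01": 8.0},
    "pep3": {"DRB1*01:01": 7.5, "DRB1*15:01": 22.0},
    "pep4": {"DRB1*01:01": 60.0, "DRB1*15:01": 1.2},
    "pep5": {"DRB1*01:01": 35.0, "DRB1*15:01": 50.0},  # no predicted binder
}

if __name__ == "__main__":
    assigned = assign_alleles(example_ranks)
    print(assigned)
    print(allele_shares(assigned))
```

A donor in which one allele accounts for most assigned peptides would correspond to the skewed pattern reported for C2, C460 and C459, whereas roughly equal shares would resemble C491 and C493.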

Multiple regions of S are presented by HLA-DR in a genotype-dependent manner
Characteristic of HLA-II-bound peptides, many of the S peptides formed distinctive nested sets around a common core. Two of the identified peptides originated from regions that were altered to assist recombinant protein expression and purification ( Figure S2C). The location of the identified S peptides in the context of protein region and domain structure and relative frequency with which particular sites are presented in each donor is summarised in Figure 3A.
Several "hotspots" from which a large number of unique peptides (typically different members of a nested set) are presented in multiple donors are apparent across the length of the full S protein, and two regions of S, spanning amino acids 24-49 and 457-485, particularly stand out as the sites from which the highest number of unique HLA-II-bound peptides were derived.
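As an illustration of how nested sets and presentation "hotspots" can be derived from a peptide list, the following sketch maps peptides onto a protein sequence, groups peptides that share a common core, and counts how many peptides cover each residue. The protein and peptide sequences are hypothetical toy data, and the greedy grouping rule (a shared stretch of at least 9 residues) is an assumption made for this example rather than the procedure used in the study.

```python
from collections import Counter

def locate(protein, peptides, offset=1):
    """Return sorted (start, end, peptide) spans using 1-based positions."""
    spans = []
    for pep in peptides:
        i = protein.find(pep)
        if i >= 0:  # peptides not found in the protein are skipped
            spans.append((i + offset, i + offset + len(pep) - 1, pep))
    return sorted(spans)

def nested_sets(spans, min_core=9):
    """Greedily group spans that retain a shared core of >= min_core residues."""
    groups = []
    for start, end, pep in spans:
        if groups:
            g = groups[-1]
            core_start = max(g["core"][0], start)
            core_end = min(g["core"][1], end)
            if core_end - core_start + 1 >= min_core:
                g["core"] = (core_start, core_end)
                g["peptides"].append(pep)
                continue
        groups.append({"core": (start, end), "peptides": [pep]})
    return groups

def coverage(spans):
    """Count, for each residue position, how many peptides cover it."""
    counts = Counter()
    for start, end, _ in spans:
        counts.update(range(start, end + 1))
    return counts

# Hypothetical toy protein and peptides (not S protein data).
TOY_PROTEIN = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ"
TOY_PEPTIDES = [
    "AYIAKQRQISFVKSH",
    "YIAKQRQISFVKSHF",
    "IAKQRQISFVKSH",
    "LEERLGLIEVQAPIL",
]

if __name__ == "__main__":
    spans = locate(TOY_PROTEIN, TOY_PEPTIDES)
    for group in nested_sets(spans):
        print(group["core"], group["peptides"])
    cov = coverage(spans)
    print(max(cov.values()))
```

Positions with the highest coverage counts across donors would correspond to the hotspots described in the text.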
To explore the contribution of individual HLA-DR alleles to presentation of the S protein, we investigated the likely allele to which each peptide was bound using HLA-II binding prediction

HLA-II-bound S peptides with N-linked glycosylation predominantly bear truncated paucimannose glycans
To determine the glycosylation status of the S protein immunogen used in this study, a proteomic approach was used to map the N-linked glycosylation sites, involving in vitro digestion of the recombinant S protein, trimming of glycans from the generated peptides with PNGaseF in the presence of heavy water (H₂¹⁸O), and peptide characterisation by mass spectrometry 24. This analysis revealed that the 22 N-linked glycosylation sites previously described in S 25 were occupied in the S protein employed here (Table S2). Notably, regions of the S protein containing glycosites were devoid of peptides identified in our initial analysis of the HLA-II-bound peptidome, raising the question of whether S-derived glycopeptides were also presented by MDDCs (Figure 3A).
To enable glycopeptide analysis, non-PNGaseF-treated S digests were analysed using a well-established glycoproteomics strategy 26. Using this approach, glycopeptides at 19 sites of S were identified to carry oligomannosidic and complex/hybrid-type N-glycans (Figure 4A, Table S3). Most sites displayed extensive glycan microheterogeneity arising from differences in both glycan types and structural features including terminal sialylation and fucosylation, in agreement with the known site-specific glycosylation of S 25.
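The logic of this site-mapping strategy can be sketched as follows. N-linked glycans attach to asparagine in the sequon N-X-S/T (X not proline), and PNGaseF release converts the glycosylated asparagine to aspartate; when performed in heavy water, the incorporated oxygen is ¹⁸O, which produces a distinctive mass shift. The sketch below scans a sequence for sequons and computes that shift from standard monoisotopic masses. The example fragment is assumed to correspond to residues 788-806 of the reference S sequence and should be checked against the GenBank record; the function names are hypothetical.

```python
import re

# Monoisotopic atomic masses (Da).
MASS_H = 1.0078250
MASS_N = 14.0030740
MASS_O16 = 15.9949146
MASS_O18 = 17.9991596

def deamidation_shift(heavy_water=False):
    """Mass change when Asn (side-chain NH2) becomes Asp (side-chain OH).

    Net change is +O -N -H; with heavy water the added oxygen is 18O.
    """
    oxygen = MASS_O18 if heavy_water else MASS_O16
    return oxygen - (MASS_N + MASS_H)

def find_sequons(sequence, first_position=1):
    """Return (position, motif) for every N-X-S/T sequon with X != P.

    A lookahead is used so that overlapping sequons are all reported.
    """
    hits = []
    for match in re.finditer(r"N(?=[^P][ST])", sequence):
        i = match.start()
        hits.append((first_position + i, sequence[i:i + 3]))
    return hits

# Assumed fragment of S (residues 788-806); verify against the reference.
FUSION_PEPTIDE_FRAGMENT = "IYKTPPIKDFGGFNFSQIL"

if __name__ == "__main__":
    print(find_sequons(FUSION_PEPTIDE_FRAGMENT, first_position=788))
    print(round(deamidation_shift(heavy_water=False), 4))
    print(round(deamidation_shift(heavy_water=True), 4))
```

Because spontaneous deamidation in ordinary water gives only the smaller shift, the larger ¹⁸O-associated shift at a sequon asparagine is what marks a site as having carried a glycan before PNGaseF treatment.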
Next, we applied the site-specific glycopeptide methodology to the mass spectra acquired from samples eluted from HLA-II (Figure 4B). A total of 80 distinct glycopeptide forms mapping to S were identified, the majority of which (76) were derived from the HLA-DR-bound immunopeptidome (Table S4). These glycopeptide forms mapped to 52 unique peptide sequences that typically formed nested sets, were predominantly observed in datasets generated from donors C459 and C460 (where the highest number of unique HLA-DR-bound non-glycosylated peptides were also detected), and had a similar length distribution to S-derived non-glycopeptides (Figure 4C,D). 66% (20-100%) of all glycopeptide sequences were predicted (using NetMHCIIpan 4.0) to bind to one or more of the donor's HLA-DR alleles (Figure 4C). The largest nested set consisted of glycopeptides from donor C459 and C460 MDDCs that mapped to position N801, located directly in the fusion peptide (FP, 788-806), a highly conserved region which facilitates membrane fusion during viral entry (Figure 4E,F). In total, we identified HLA-II-bound glycopeptides bearing glycans derived from 14 of the N-linked glycosylation sites in S (Figure 4F). HLA-II-bound peptides carried predominantly short paucimannosidic-type N-glycans, while S carried oligomannosidic- and GlcNAc-capped complex-type N-glycan structures at these sites (Figure 4B,F). The paucimannosylation of the HLA-II-bound peptides comprised both core-fucosylated (M1F, M2F, and M3F, i.e. Man1-3GlcNAc2Fuc1) and afucosylated (M2, Man2GlcNAc2) species, as supported by fragment spectra analysis (Figure 4E).
In addition, utilising an open post-translational modification (PTM) peptide identification methodology, we identified peptides containing the other most common post-translational modifications (other than glycans) in the S immunopeptidome (Figure S5A). The most frequently modified residue was cysteine (C), which was found to have undergone cysteinylation, oxidation to cysteic acid and conversion to glutathione disulphide (Figure S5B). A total of 27 peptides with modified C residues were identified that mapped to 6 positions in S. Each peptide contained a single modified C residue, and the cysteines at these positions are known to form disulphides in the tertiary structure of S 27.

Peptides derived from the receptor binding domain (RBD) of the SARS-CoV-2 S protein are presented by multiple HLA-DR alleles
Altogether, a total of 209 unique HLA-II-bound peptides (differing in amino acid sequence) derived from the SARS-CoV-2 spike protein were detected in this study. The locations and putative presenting HLA-II alleles of these peptides (typically members of large nested sets) are summarised in Figure 5. Partly overlapping nested sets of peptides predicted to be presented by distinct HLA-DR alleles in different donors were identified in several regions of the spike protein. One region in the RBD particularly stood out in this regard, as it contained 5 nested sets of peptides and an additional peptide that were predicted to be presented by 6 different HLA-DR alleles. At least one peptide within this region was found to be presented in every donor studied (Figure 5, Figure 6A). In the donors analysed, a total of 21 unique peptide sequences derived from this region were identified (Figure 6A,B), and versions of some of these that were post-translationally modified at residues C480 and C432 were also detected (Figure 6C). This region overlapped directly with the receptor binding motif (RBM), an extended insert on the beta-6/5 strands that contains the contact points with the receptor ACE2 28.
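To make the glycan nomenclature used above concrete, the following sketch classifies an N-glycan monosaccharide composition and computes its monoisotopic residue mass (the mass added to the peptide). The paucimannose rule follows the compositions named in the text (Man1-3GlcNAc2 with or without one fucose); the oligomannose range (Man5-9GlcNAc2) and the catch-all treatment of larger compositions are simplifying assumptions for illustration, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of common monosaccharides.
RESIDUE_MASS = {
    "Hex": 162.05282,     # hexose, e.g. mannose
    "HexNAc": 203.07937,  # N-acetylhexosamine, e.g. GlcNAc
    "Fuc": 146.05791,     # fucose (deoxyhexose)
    "NeuAc": 291.09542,   # N-acetylneuraminic acid (sialic acid)
}

def glycan_mass(hex_n, hexnac_n, fuc_n=0, neuac_n=0):
    """Sum of residue masses for the given composition."""
    return (hex_n * RESIDUE_MASS["Hex"]
            + hexnac_n * RESIDUE_MASS["HexNAc"]
            + fuc_n * RESIDUE_MASS["Fuc"]
            + neuac_n * RESIDUE_MASS["NeuAc"])

def classify(hex_n, hexnac_n, fuc_n=0, neuac_n=0):
    """Coarse glycan class from composition alone (an approximation)."""
    if hexnac_n == 2 and neuac_n == 0:
        if 1 <= hex_n <= 3 and fuc_n <= 1:
            return "paucimannose"
        if 5 <= hex_n <= 9 and fuc_n == 0:
            return "oligomannose"
    if hexnac_n >= 3:
        return "complex/hybrid"
    return "other"

def short_name(hex_n, fuc_n=0):
    """Short label in the style used in the text, e.g. M3F or M2."""
    return f"M{hex_n}" + ("F" if fuc_n else "")

if __name__ == "__main__":
    for comp in [(3, 2, 1, 0), (2, 2, 0, 0), (9, 2, 0, 0), (5, 4, 1, 2)]:
        print(comp, classify(*comp), round(glycan_mass(*comp), 4))
```

Composition alone cannot distinguish isomers or confirm linkage, so a classification of this kind would normally be supported by fragment spectra, as was done for the species reported here.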
To gain insight into the sequence conservation of this region in other coronaviruses infecting humans, S protein sequences from SARS-CoV-2, SARS-CoV and MERS-CoV (the other betacoronaviruses that have caused epidemics in humans in the past two decades) and the endemic human coronaviruses 229E, NL63, OC43, and HKU1 were aligned ( Figure 6D).
Although this region of the SARS-CoV-2 S protein showed some sequence similarity with the equivalent region of the SARS-CoV S protein, this is an indel-rich region of S that was much less well conserved in the other coronaviruses examined. However, although residues that are likely to constitute key anchors in the core regions of the nested peptide sets predicted to bind to particular HLA-DRA/DRB1 molecules (SARS-CoV-2 F464 and S469, which match anchor residue preferences in DRB1*04:01-, 07:01-, 13:03-and 15:01-binding peptides; F464 and D467, which match preferred anchors of DRB1*03:01-binding peptides; and I472 and S477, which match those of DRB1*01:01-binding peptides) are not well conserved or not appropriately positioned relative to one another in all coronaviruses, there appears to be some potential for HLA-II-binding peptides to be generated from this region of other human coronavirus S protein sequences.
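The spacing of the anchor residues named above can be checked with a short sketch. HLA-DR ligands bind through a nine-residue core whose main anchors conventionally sit at positions P1, P4, P6 and P9. The code below assumes that the first residue of each named pair occupies P1 (an interpretation made for illustration, not stated in the text) and reports which core position the second residue would then fall on. The segment is assumed to be residues 457-485 of the reference S sequence and should be verified against the GenBank record.

```python
# Assumed residues 457-485 of the reference S sequence (verify before use).
RBM_SEGMENT = "RKSNLKPFERDISTEIYQAGSTPCNGVEG"
SEGMENT_START = 457

# Anchor pairs named in the text, written as (first residue, second residue).
ANCHOR_PAIRS = [("F464", "S469"), ("F464", "D467"), ("I472", "S477")]

def residue_at(position, segment=RBM_SEGMENT, start=SEGMENT_START):
    """Amino acid at a given protein position within the segment."""
    return segment[position - start]

def parse(label):
    """Split a label such as 'F464' into ('F', 464)."""
    return label[0], int(label[1:])

def core_position(p1_position, other_position):
    """Core position (P1 = 1) of a residue, taking p1_position as P1."""
    return other_position - p1_position + 1

def core_from_p1(p1_position, segment=RBM_SEGMENT, start=SEGMENT_START,
                 core_len=9):
    """Nine-residue core starting at P1, or None if it runs off the segment."""
    i = p1_position - start
    core = segment[i:i + core_len]
    return core if len(core) == core_len else None

if __name__ == "__main__":
    for first, second in ANCHOR_PAIRS:
        (aa1, pos1), (aa2, pos2) = parse(first), parse(second)
        # Sanity check: the assumed segment agrees with the residue labels.
        assert residue_at(pos1) == aa1 and residue_at(pos2) == aa2
        print(first, second, f"P{core_position(pos1, pos2)}",
              core_from_p1(pos1))
```

Under this assumption the second residues fall on P6 or P4, which is consistent with the anchor-based description in the text; confirming the actual binding register would require the allele-specific prediction output or structural data.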

DISCUSSION
Concurrently with the design and clinical evaluation of candidate immunogens in the race to develop vaccines with prophylactic efficacy against SARS-CoV-2 infection and associated disease, there is an urgent need to define T cell epitopes to facilitate analysis of the contribution of T cell responses to protection and pathogenesis in infected individuals and monitoring of immune responses elicited in human vaccine trials 14. As the SARS-CoV-2 S protein is the major target on the virus for neutralising antibodies 29,3,4,30,5,31,32,33 and has also been shown to be highly immunogenic for T cell responses in infected individuals 13 protein that could putatively be targeted by T cell responses in multiple individuals. The peptides identified in this study provide an important resource that will expedite 1) exploration of pre-existing T cell responses to other coronaviruses, 2) cross-comparison of responses elicited by different vaccine immunogens and platforms, and 3) design of next-generation vaccines tailored to elicit enhanced responses to nAb epitopes, or focus T cell responses on selected epitopes.
The importance of defining SARS-CoV-2-derived peptides presented by diverse HLA alleles is illustrated by the plethora of recent efforts to employ in silico approaches to predict putative T cell epitopes in SARS-CoV-2 proteins 35,36. Our data give key insight into the repertoire of peptides that are in fact presented with HLA-II when exogenous S protein is internalised and processed by DCs, mimicking a scenario occurring as T cell responses are induced during natural infection or following vaccination with protein immunogens. Whether these peptide profiles are also representative of those presented on DCs in which the S protein is endogenously expressed (e.g. as may occur following antigen delivery with viral vectored or nucleic acid-based vaccine platforms), which may lead to antigen processing and peptide association with HLA-II in different intracellular compartments, remains to be determined 18. infectivity and also subvert recognition by host adaptive responses (by shielding nAb binding sites and impairing antigen processing for T cell recognition); but it is also targeted by host innate immune recognition pathways 40. S protein glycosylation is carried out by the host cell glycan processing machinery, resulting in attachment of a range of oligomannosidic, complex or hybrid structures that mimic mature surface glycoproteins of the host. We initially confirmed that these patterns were present in the intact S protein used to pulse MDDCs.
Strikingly, we found that the HLA-II-bound S peptides were, in contrast, glycosylated at the same sites but with glycans rich in highly processed paucimannosidic-type structures. This observation implies a significant modulation of the glycan phenotype upon internalisation, processing, and presentation of the S glycoprotein in MDDCs. Paucimannosidic glycans are defined as truncated α- or β-mannosyl-terminating N-glycans carried by proteins expressed widely across the eukaryotic domain, but remain a poorly understood glycan class in human glycobiology and virology 41. We have recently reported that neutrophils 42,43 and monocytes/macrophages 44, but thus far not DCs, are paucimannose-producing cell types in the innate immune system. The paucimannosidic glycans have been proposed to be formed via sequential trimming facilitated by the N-acetyl-β-hexosaminidase isoenzymes and linkage-specific α-mannosidases residing in lysosomes or lysosomal-like compartments 41.
Supporting our data suggesting an extensive DC-driven glycan remodelling ahead of viral glycopeptide presentation, N-acetyl-β-hexosaminidase and α-mannosidase, and several other hydrolytic enzymes (e.g. cathepsin D) are known to be abundantly expressed and highly active in MHC class II processing compartments (MIICs) 45 . Furthermore, MHC class II immunopeptides carrying truncated N-glycans have previously been reported from other cellular origins 46,47 . CD4+ T cell recognition of glycosylated peptides has been reported in rheumatoid arthritis (O-linked) 48 and cancer (N-linked) 49 , and CD4 + peptides in the melanoma antigen tyrosinase require the presence of N-linked glycosylation to elicit a T cell response 49 .
A recent study also showed that immunization of mice with a recombinant human immunodeficiency virus type 1 (HIV-1) envelope (Env) glycoprotein immunogen elicits CD4+ T cell responses to a glycopeptide epitope that provide help for induction of Env-specific antibody responses 50  antigen processing and T cell responses in the context of melanoma 56. In an antigen presenting cell loading system similar to that used here, endocytosis and processing of a spontaneously cysteinylated peptide were required to establish T cell activation, but not for MHC binding and presentation 55. Thus it will be important to determine whether the cysteine-modified S peptides described herein are targeted by T cells following vaccination or during SARS-CoV-2 infection.
Notably, the RBM, an area of the RBD important for interacting with the host receptor ACE2 28, was found to be an HLA-DR-binding peptide-rich region, with presentation of peptides derived from amino acids 457-485 of the SARS-CoV-2 spike protein being detected in all of the HLA-diverse donors studied. Analysis of the T cell responses elicited following vaccination of mice with recombinant DNA (rDNA)-based vectors encoding the S proteins from both SARS and SARS-CoV-2 has shown that the epitopes targeted by CD4+ T cells include a site in the RBD that encompasses the RBM 57,58, suggesting that peptides derived from this region may be presented in a cross-species manner. Little information is currently available about the epitopes recognised by CD4+ T(fh) cell responses in SARS-CoV-2 infected or vaccinated individuals, although CD4+ T cell responses to an epitope at amino acids 449-461 in the SARS-CoV spike (equivalent to, although having a number of sequence differences from, amino acids 462-474 of the SARS-CoV-2 spike protein) were detected in healthy donors not exposed to SARS-CoV (or SARS-CoV-2) 36; and a recent analysis of T cell responses in recovered SARS-CoV-2 infected patients detected a response to two overlapping peptides spanning amino acids 446-465 of the SARS-CoV-2 spike protein 59. More work is needed to determine how commonly this region is recognised by CD4+ T cell responses, both in individuals exposed only to seasonal human coronaviruses and in SARS-CoV-2-infected individuals, and also to explore the inter-virus cross-reactivity of RBM-targeting CD4+ T cells; but our findings highlight this as a putatively immunogenic region worthy of further study.
In summary, our data provide a detailed map of HLA-II-binding peptides in the SARS-CoV-2 S protein that will facilitate the analysis of CD4+ T cell responses to both "conventional" and

Protein expression and purification
The SARS-CoV-2 ectodomain constructs were produced and purified as described previously 22 60 . Briefly, the expression construct included residues 1−1208 of the SARS-CoV-2 S (GenBank: MN908947) with proline substitutions at residues 986 and 987, a "GSAS" substitution at the furin cleavage site (residues 682-685), a C-terminal T4 fibritin trimerization motif, an HRV3C protease cleavage site, a TwinStrepTag and an 8XHisTag.
Expression plasmids encoding the ectodomain sequence were used to transiently transfect FreeStyle293F cells using Turbo293 (SpeedBiosystems). Protein was purified on the sixth day post transfection from the filtered supernatant using StrepTactin resin (IBA), followed by size exclusion chromatography using a Superose 6 Increase 10/300 column.

Differentiation of monocyte-derived DCs (MDDCs)
To differentiate MDDCs, mononuclear cells were thawed and plated at 10 6

Production and purification of HLA-specific antibodies
Hybridoma cells (clones W6/32, L243, B721) were cultured in a CELLine CL 1000 Bioreactor (Integra) as described in (31). B721 was kindly supplied by Prof. Anthony Purcell, Monash University, Melbourne, Australia. Briefly, cells were cultured in serum-free Hybridoma-SFM medium (Gibco) supplemented with hybridoma mix (2,800 mg/l of d-glucose, 2,300 mg/l of peptone, 2 mM l-glutamine, 1% penicillin/streptomycin, 1% non-essential amino acids, 0.00017% 2-mercaptoethanol). Supernatant containing antibody was harvested and stored at -20°C. To purify antibodies, supernatants were thawed, centrifuged at 2,500 × g for 25 min at 4°C, filtered (0.2 µm SteriCup Filter (Millipore)) and incubated with protein A resin (PAS) (Expedeon) for 18 hours at 4°C. Antibody-resin complexes were then collected by gravity flow through chromatography columns, washed with 20 ml of PBS, and eluted with 5 ml of 100 mM glycine pH 3.0. The pH was adjusted to 7.4 using 1 M Tris pH 9.5, and antibodies were buffer exchanged into PBS and concentrated with a 5 kDa molecular weight cut-off ultrafiltration device (Millipore).  with default settings. Alignments were visualised in Jalview, with amino acids coloured by property (based on Clustal X definitions) or by identity. All graphs were plotted in R or Excel.

Glyco-site and glycopeptide analysis
Raw data for the de-N-glycosylated samples were analysed as described above with the following modifications. A standard PeaksDB search was utilised with additional variable post-translational modifications set for ¹⁸O labelling (C-terminal +2.00 Da), 18  utilized in a 'rare' variable modification search strategy. A fixed modification of carboxyamidomethylation was included for digested S. Data were filtered at a 1% protein FDR, resulting in a 1.38% peptide FDR in the combined dataset; glycopeptides were further filtered to ensure that oxonium ions were present in the MS2 spectrum and that scores were above 150, as described 64. All reported glycopeptides were manually checked. Filtering and spectral counting were performed in Byologic v3.8.11; spectra with the same precursor mass and peptide identity were grouped and compositions determined as per the library.
The authors declare no conflict of interest.