Genetic modification of primary human B cells generates translationally-relevant models of high-grade lymphoma

Sequencing studies of Diffuse Large B Cell Lymphoma (DLBCL) have identified hundreds of recurrently altered genes. However, it remains largely unknown whether and how these mutations contribute to lymphomagenesis, either individually or in combination. Existing strategies to address this problem predominantly use cell lines, which are limited by their initial characteristics and by subsequent adaptations to prolonged in vitro culture. Here, we describe a novel co-culture system that enables the ex vivo expansion and viral transduction of primary human germinal center B cells. The incorporation of CRISPR/Cas9 technology enables high-throughput functional interrogation of genes recurrently mutated in DLBCL. Using a backbone of BCL2 with either BCL6 or MYC, we have identified co-operating oncogenes that promote growth and survival, or even full transformation into synthetically engineered models of DLBCL. The resulting tumors can be expanded and sequentially transplanted in vivo, providing a scalable platform to test putative cancer genes and to create mutation-directed, bespoke lymphoma models.


Introduction
Diffuse Large B Cell Lymphoma (DLBCL) is the most common form of non-Hodgkin lymphoma. Although potentially curable with immunochemotherapy, up to 40% of patients succumb to their disease 1 . In an attempt to unravel the biological basis of DLBCL and to identify new therapeutic opportunities, several groups have recently reported large genomic studies [2][3][4] . These highlight the considerable genetic heterogeneity of DLBCL and identify hundreds of recurrently mutated genes, copy number alterations and structural variants.
Clusters of co-mutated genes suggest the existence of genetic subtypes of DLBCL that may behave differently when exposed to therapeutic agents. Whilst the functional and mechanistic consequences of some of these genetic alterations have been established, for the majority we have little to no understanding of their contribution to lymphomagenesis. To translate these genomic findings into therapeutic progress, it is critical to understand the functional importance and therapeutic relevance of these genetic alterations, both individually and in combination.
Existing model systems used for the functional interrogation of lymphoma genetics consist predominantly of lymphoma cell lines and genetically modified mice. However, both have limitations: cell lines were often established from patients with end-stage, non-nodal or even leukemic phase lymphoma and carry an extensive and biased mutational repertoire, further selected over years or even decades of in vitro growth [5][6][7][8] . Genetically engineered mice, on the other hand, are costly and time-consuming to generate, and therefore unsuitable for high-throughput or combinatorial experiments. Furthermore, the genetic requirements for tumorigenesis in mice do not always accurately reflect those in humans 9,10 . As such, the development of new preclinical models of lymphoma that can capture its considerable genetic diversity has been identified as a priority area for lymphoma research 11 .
In common with many of the mature B cell malignancies, DLBCL is thought to arise from the germinal center (GC) stage of B cell differentiation 12,13 . An attractive solution would therefore be to use primary human GC B cells as a platform for ex vivo genetic manipulation.
Equivalent approaches have proved fruitful for epithelial malignancies [14][15][16][17] . However, technical difficulties associated with the ex vivo culture and genetic manipulation of human GC B cells, including high manipulation-associated cell toxicity and low transduction efficiency, have obstructed the exploitation of such models to study lymphoma.
Here, we describe an optimized strategy that facilitates proliferation and highly efficient transduction of non-malignant, primary, human GC B cells ex vivo. We show that combinations of oncogenes permit long-term culture in vitro, allowing the system to be used for high-throughput screening of oncogenes and tumor suppressors, and for the creation of genetically customized human lymphoma models that can be studied in immunodeficient mice.

Ex vivo growth and transduction of primary human GC B cells
Germinal center B cells are programmed to undergo apoptosis in the absence of survival signals from T follicular helper cells and follicular dendritic cells (FDC). Consistent with this, it is well-established that GC B cells perish rapidly if cultured unsupported ex vivo 18 .
Previous attempts to support ex vivo growth of human GC B cells employed CD40lg-transfected fibroblasts in combination with soluble cytokines including IL2, IL4 and IL10 18,19 . Related strategies have used an FDC-like feeder cell line termed HK, which supported GC B cell survival and allowed short-term proliferation when combined with CD40lg 20 . With increasing appreciation of the importance of IL21 to GC B cell biology 21,22 , later systems have combined HK feeder cells with CD40lg and IL21 23 . However, proliferation of GC B cells in all these systems was typically limited to a period of up to 10 days [18][19][20]23 .
We employed a similar system based upon a freshly established culture of modified HK cells, termed YK6, that were immortalized with TERT, P53dd and CDK4 (Figure S1a). These were further engineered to express membrane-bound human CD40lg and to secrete soluble IL21, yielding YK6-CD40lg-IL21 cells (Figure S1b). We isolated primary GC B cells (CD38+ CD20+ CD19+ CD10+) from pediatric tonsil tissue (Figure 1a), which, when grown in co-culture with YK6-CD40lg-IL21 cells, survived and proliferated vigorously for up to 10 days without a requirement for any additional cytokines (Figure 1b&c). … Interestingly, the GaLV envelopes also enabled the transduction of primary human DLBCL cells supported on YK6-CD40lg-IL21 cells (Figure S1c).

Long term expansion of human germinal center B cells ex vivo
We proceeded to use this culture-transduction system to introduce into human GC B cells oncogenes that are commonly deregulated in human lymphoma. None of the five genes tested was able, on its own, to prolong the survival of primary GC B cells cultured in our system (Figure 2a&b). However, co-expression of BCL2 with either MYC or BCL6 led to long-term expansion and survival of transduced GC B cells, which continued to proliferate vigorously in culture beyond 100 days. We also tested other transcription factors associated with the germinal center reaction, and their lymphoma-associated mutants, in combination with BCL2 in a pooled, competitive culture. This showed initial expansion of cells transduced with MEF2B Y69H, a mutation commonly found in DLBCL and follicular lymphoma 29 . However, by day 59, cultures were dominated by BCL6-transduced cells, suggesting that BCL6 is the transcription factor best able to promote long-term growth of GC B cells ex vivo (Figure 2c, Supplementary Table 1). Flow cytometry after 10 weeks of culture showed that cells transduced with BCL2 and BCL6 maintained expression of surface markers reminiscent of GC B cells, including CD19, CD20, CD22, CD38, CD80 and CD95 (Figure 2d). Cells expressed both CD86 and CXCR4, an immunophenotype intermediate between light and dark zone GC B cells (Figure 2d). Cells transduced with BCL2 and MYC remained viable and proliferated but downregulated CD20 and CD19, consistent with differentiation towards plasmablasts (Figure S1d). The plasma cell marker CD138 was not expressed by either BCL2/MYC- or BCL2/BCL6-transduced cells (Figure S1e).
We compared gene expression profiles of freshly isolated GC B cells with those of transduced GC B cells cultured ex vivo at early (5 days) and late (10 weeks) time points (Figure 2e). As anticipated, cultured cells showed enrichment of a STAT3 signature, consistent with ongoing IL21 stimulation. While freshly isolated GC B cells were enriched for expression of centroblast genes, the cultured and transduced cells adopted a gene expression profile more similar to that of centrocytes, consistent with ongoing CD40 stimulation. Importantly, the centrocyte is the stage of GC differentiation most similar to DLBCL 30 . Transcriptomes were also compared with those of six cell lines commonly used as models of GC-derived lymphomas, including the main subtypes of DLBCL and Burkitt lymphoma. Using a signature of germinal centre-expressed genes (GCB-1) 31 , long-term BCL6-transduced cells clustered more closely with GC B cells than did the cell lines (Figure 2f & Figure S1f).
Overall, these results suggest that transduced primary human germinal center B cells can be cultured long-term ex vivo, retaining characteristics of the initial GC B cell that are shared with DLBCL cells. This represents a valuable new model system for the functional interrogation of genes involved in germinal center lymphomagenesis.

High-throughput screening for tumor suppressor genes in cultured primary human GC B cells
We wished to use the system for the high-throughput study of putative tumor suppressor genes (TSGs) in lymphoma. We hypothesized that many tumor suppressor pathways are already inactivated in lymphoma cell lines, and that primary GC B cells should therefore be more sensitive for identifying a competitive growth or survival advantage following TSG inactivation. Robust expression of Cas9 was achieved using a stable Cas9 retroviral packaging line (Figure S2a), and initial experiments confirmed efficient gRNA-directed targeting in primary human GC B cells ex vivo (Figure S2b & c). We therefore created a lymphoma-focused CRISPR gRNA library composed of 6000 gRNAs targeting a total of 692 genes reported to be mutated or deleted in human lymphoma, along with 250 non-targeting control guides. Each gene was targeted by up to 9 gRNAs (Figure S2d), and deep sequencing revealed that 99% of gRNAs were within fourfold of the mean frequency (Figure 3a).
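The library-uniformity statement above can be checked with a few lines of code. This is a sketch assuming that "within fourfold of the mean" means a per-gRNA read count between one quarter and four times the mean count; the function name and the toy counts are hypothetical.

```python
from statistics import mean


def fraction_within_fold(counts, fold=4.0):
    """Return the fraction of gRNAs whose read count lies within `fold` of the mean.

    Assumes `counts` is a list of raw read counts, one per gRNA, and that
    "within fold" means mean / fold <= count <= mean * fold.
    """
    m = mean(counts)
    within = sum(1 for c in counts if m / fold <= c <= m * fold)
    return within / len(counts)


# Toy plasmid-library counts: the last gRNA is strongly under-represented.
counts = [100, 120, 80, 90, 110, 5]
print(fraction_within_fold(counts))
```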
The library was transduced into primary GC B cells shortly after their transduction with BCL2, BCL6 and Cas9 cDNAs. The experimental scheme of the CRISPR screen is shown in Figure 3b.
Cas9 and gRNA constructs were marked with fluorescent proteins to allow selection to be monitored by FACS. Whilst Cas9/gRNA dual-infected cells comprised only 10% of all cells at day 4, this population expanded to 90% by day 88 of culture (Figure S2e), suggesting strong selection for one or more of the library gRNAs. Genomic DNA was sequenced at intervals and a CRISPR gene score was generated for each gene (Figure 3b).
Genes that showed the greatest enrichment over 10 weeks of culture included well-established tumor suppressors such as TP53, CDKN2A and PTEN (Figure 3c), validating the ability of our system to detect bona fide TSGs. Interestingly, the greatest enrichment was seen for GNA13 (Figure 3c), which encodes the G protein subunit α13. Inactivating mutations of GNA13 are common in DLBCL and BL 32,33 but rare in other forms of cancer, where amplification may instead be more common (Figure S2f) 34,35 . GNA13 can therefore be considered a germinal center-specific TSG. Enrichment was seen for 8 out of 9 gRNAs targeting GNA13 across different timepoints, with similar results for TP53 and CDKN2A (Figure 3d), and was reproduced in replicate screens performed using GC B cells from three separate donors (Supplementary Table 2). All GNA13 gRNAs led to effective depletion of GNA13 (Figure 3e), apart from one that was associated with presumed off-target toxicity, which was further confirmed in a cell line (Figure S2g). We performed a parallel screen using the lymphoma cell line HBL1 (Figure S2h) and also compared our results with data from recently published CRISPR screens (Figure 3f). In these cell line experiments, enrichment of gRNAs targeting TSGs was much more modest. This highlights a unique strength of this system: the ability to identify genetic changes associated with enhanced growth and survival, a phenotype that is hard to detect in heavily mutated cell lines already optimized for in vitro growth.
GNA13 acts downstream of the G protein-coupled receptors S1PR2 and P2RY8, and enrichment for both genes was observed in our screens (Figure 3c). Mouse knockout studies have suggested that suppressed activity of this pathway in lymphoma may allow egress from the germinal centre and increase cell survival secondary to enhanced AKT activity 33,36,37 . In contrast, other studies suggest a pro-survival effect in DLBCL that is independent of AKT activity 38 . We therefore quantified pAKT levels in ex vivo GC B cells transduced with gRNAs against GNA13, PTEN or non-targeting controls, and stimulated on YK6-CD40lg-IL21 feeder cells (Figure 3g). Although pAKT was increased in PTEN-depleted cells, no increase was seen in GNA13-depleted cells. However, GNA13 depletion did lead to a marked reduction in apoptosis in cultured primary GC B cells (Figure 3h), with no change in cell proliferation (Figure S2i). This identifies AKT-independent, enhanced cell survival as the explanation for the competitive advantage seen following GNA13 depletion in this culture system.
These experiments highlight how primary ex vivo GC B cells can be used for both high-throughput and gene-focused functional experiments. They demonstrate that the system is especially suited to identifying genetic alterations associated with increased competitive fitness, a phenotype that is hard to induce in established cell lines.
Finally, the ability to identify lymphoma-specific TSGs strongly supports the validity of this system for the study of lymphoma genetics.

… (Figure S3). Importantly, all tumors were negative for EBER by in situ hybridization (ISH), confirming that latent EBV genes did not contribute to lymphomagenesis in these tumors.

… (Supplementary Table 3). The significance of these mutations to tumor formation is uncertain; however, some of these genes are typical targets of aberrant somatic hypermutation, suggesting that somatic hypermutation may be ongoing in these lymphomas. To investigate this possibility, we analyzed the variable region sequences of dominant clones detected in the IgH clonality assay. As expected given their germinal center origin, almost all clones showed evidence of diversification from the germline V gene sequence (Figure 5d&e).
Importantly, however, clones also showed evidence of ongoing diversification of the hypervariable regions (Figure S5d). This suggests that AID-mediated somatic hypermutation remained active during tumor formation.
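A minimal sketch of how diversification from germline, and ongoing intraclonal diversification, could be quantified once clone sequences have been aligned to their germline IGHV gene (for example, with IMGT V-QUEST). It assumes pre-aligned, gap-free sequences of equal length. The function names and toy sequences are hypothetical; mutations shared by all members of a clone are treated as ancestral, and private mutations as evidence of ongoing diversification.

```python
def mutations(query, germline):
    """List (position, germline_base, observed_base) differences.

    Assumes both sequences are already aligned, gap-free and of equal length.
    """
    if len(query) != len(germline):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (i, g, q)
        for i, (q, g) in enumerate(zip(query.upper(), germline.upper()))
        if q != g
    ]


def percent_identity(query, germline):
    """Percent identity of an aligned V-region sequence to its germline gene."""
    n_mut = len(mutations(query, germline))
    return 100.0 * (len(germline) - n_mut) / len(germline)


def shared_and_private(clone_seqs, germline):
    """Split mutations into those shared by all clone members and private ones."""
    per_member = [set(mutations(s, germline)) for s in clone_seqs]
    shared = set.intersection(*per_member)
    private = [m - shared for m in per_member]
    return shared, private


germline = "ACGTACGTAC"  # toy germline segment
clone = ["ACGTTCGTAA", "ACGTTCGAAC"]  # two toy members of one clone
shared, private = shared_and_private(clone, germline)
print(percent_identity(clone[0], germline), shared, private)
```

Private mutations that differ between clone members are the kind of signal that points to hypermutation continuing after clonal expansion.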
Overall, the ability of these tumors to closely recapitulate the appearances of high-grade B cell lymphoma further validates the biological relevance of this system to the study of human lymphoma and provides the opportunity to generate mutation-directed, bespoke in vivo lymphoma models.

Discussion
The plethora of genomic information generated by next generation sequencing studies has created a need for new experimental systems in which to study the genetics of human lymphoma and to decipher these rich data resources. The availability and suitability of current preclinical models is recognized as a rate-limiting step in translating genomic knowledge into patient benefit 11 . The cell of origin of most aggressive B cell lymphomas, including DLBCL and BL, is the GC B cell 12,13 . We therefore reasoned that non-malignant, human GC B cells should be the input for a system to create genetically defined models of human lymphoma. We describe an optimized system for the culture and transduction of primary human GC B cells ex vivo. This relies on the provision of microenvironmental survival signals resembling those of the germinal center, as well as the overexpression of combinations of oncogenes implicated in the pathogenesis of human lymphoma. In particular, this includes BCL6, a transcription factor central to the GC reaction as well as an established oncogene in GC-derived lymphoma. A related strategy has been employed previously to expand peripheral blood memory B cells for the purposes of monoclonal antibody engineering 40 . However, this is the first use of genetically altered primary human GC B cells for the functional investigation of lymphoma genetics and the first to generate synthetic, in vivo models of human lymphoma.
A major advantage of using primary GC B cells over established lymphoma cell lines is the ability to investigate defined genetic alterations on a genetically normal background. In particular, this provides a sensitive platform for investigating the ability of specific genetic alterations to increase survival and proliferation. An enhanced oncogenic phenotype is much harder to discern in cell lines, where the mutational repertoire is likely to have evolved extensively for optimal in vitro growth. The superior sensitivity of this system, compared with cell lines, for detecting alterations associated with increased growth or survival is evidenced by the strong enrichment for TSGs in our CRISPR screen relative to conventional cell lines. The relevance of the system to the pathogenesis of human lymphoma is underscored, firstly, by the ability to recapitulate the appearances of human high-grade B cell lymphoma in vivo and, secondly, by the ability to identify GC-specific TSGs such as GNA13 in our CRISPR screen. Inactivating mutations of GNA13 are common in lymphoma but rarely seen in other forms of malignancy. Indeed, amplification is more common in solid organ cancers, where GNA13 is generally considered to act as an oncogene 41 . The detection of GNA13 missense mutations and a frameshift mutation of S1PR2 in one of our synthetic lymphomas further underscores the importance of this pathway in GC-derived lymphomas. We show how CRISPR/Cas9 can be integrated into the system for high-throughput screening as well as for individual, gene-focused analysis, such as the clear demonstration of an AKT-independent survival advantage in GNA13-depleted cells. This finding is consistent with the greater enrichment scores for GNA13 compared with PTEN in our screening experiments.
The system affords versatility, with the potential to vary the stimulation provided, the combination of expressed backbone oncogenes and the mechanism of their introduction. We envisage future studies that might remove or replace components of the feeder-based stimulation, for instance to identify factors promoting cytokine-independent growth, or alter the background oncogene combination to screen for synergy between different oncogenic hits. The selective pressure imposed could be further altered by the use of pharmacological inhibitors of specific pathways. Future studies might also employ mutant open reading frame (mORF) screens or targeted CRISPR gene editing to introduce specific mutations into endogenous loci.
The complex genetic heterogeneity of human lymphoma is becoming increasingly evident 2-4 . It is clear that the repertoire of available cell lines does not adequately represent each of the many molecular subtypes predicted from the analysis of sequencing studies. Therefore, the ability to generate mutation-directed tumors in vivo provides an attractive route to patient-personalized preclinical models. A particular advantage over tumor-derived xenograft models is the ability to create paired, syngeneic controls: tumors that are genetically identical other than the presence or absence of a specific mutation. Similar approaches to culture and manipulate human primary cells are proving successful for some solid organ malignancies [14][15][16][17] . However, technical limitations have precluded this in B cell lymphoma. We present for the first time an extensively optimized, yet inexpensive, strategy to employ primary human GC B cells for the investigation of lymphoma genomics and to generate bespoke, in vivo models of human lymphoma. This addresses an important bottleneck in translating lymphoma genomic findings into functional understanding that can drive improved patient outcomes and personalized therapy.

Availability of data and reagents
Gene expression data have been deposited in the EGA database under accession number EGAS00001003560. All remaining data are available within the article or supplementary information. Reagents, including feeder cell lines, the viral packaging line and viral plasmids, will be distributed freely upon request.

Figure S3
Immunohistochemistry images for the indicated markers are shown (magnification 20x). Six different tumors are shown. Scale bar, 100 μm.
Ten different tumors are shown. Scale bar, 100 μm.

Supplementary Videos
Videos show culture of GC B cells alone, YK6 alone, YK6 + GC B cells, YK6-CD40Lg + GC B cells and YK6-CD40Lg-IL21 + GC B cells from the time of plating to 132 hours after.

Supplementary Table 1
Enrichment scores based on relative read counts of barcoded expression constructs for transcription factors or their mutant versions in GC B cells co-transduced with BCL2 over 4 different timepoints (n=3).

Supplementary Table 3
Protein-altering variants identified by Mutect2 using the matched, pre-transduced germinal center B cells as the normal control.

[Figure residue: "CRISPR gene score = average gRNA score for gene"; TCGA PanCan cohort labels: breast invasive carcinoma, bladder, uterine, stomach, liver, lung adenocarcinoma, cervical, melanoma, ovarian, colorectal, uveal melanoma, PCPG, ACC, pancreas, prostate, testicular germ cell.]

… streptomycin and kept at 37°C in a humidified incubator (5% CO2, 95% air).

… harvest timepoint going forward. A minimum of 1000x representation was maintained at each passaging step. Cells were harvested every 14 days. Genomic DNA extraction was conducted as described previously 14 , and Illumina sequencing was performed as described 6,15 . Purified libraries were quantified, pooled and sequenced on an Illumina HiSeq4000 by 50-bp single-end sequencing.
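The "minimum of 1000x representation" above implies a simple lower bound on the number of cells carried at each passage. The sketch below assumes coverage is counted only over cells that carry a library gRNA, so a low transduction fraction raises the number of cells needed. The function name and the example numbers are illustrative assumptions, not values reported for this screen.

```python
import math


def cells_required(n_guides, coverage=1000, fraction_with_guide=1.0):
    """Minimum cells to keep, on average, `coverage` gRNA-carrying cells per gRNA.

    Assumes `fraction_with_guide` is the fraction of cultured cells that carry
    a library gRNA (e.g. estimated by flow cytometry of the gRNA marker).
    """
    if not 0 < fraction_with_guide <= 1:
        raise ValueError("fraction_with_guide must be in (0, 1]")
    return math.ceil(n_guides * coverage / fraction_with_guide)


# Illustrative only: 6,250 guides at 1000x, with half of the cells carrying a gRNA.
print(cells_required(6250, 1000, 0.5))
```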

Computational analysis of CRISPR screens
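Raw reads were normalized to the total number of reads in each sample as follows:

$$\hat{r}_{i,s} = \frac{r_{i,s}}{\sum_{j} r_{j,s}}$$

where $r_{i,s}$ is the raw read count of sgRNA $i$ in sample $s$. The change in normalized abundance of each sgRNA $i$ between time points $A$ and $B$ is denoted $\Delta_i^{AB}$. Finally, the gene score $F_g$, which represents the magnitude and direction of the fitness effect of gene $g$ between the two time points, is:

$$F_g = \frac{1}{R}\sum_{r=1}^{R} \frac{1}{N_{g,r}} \sum_{i=1}^{N_{g,r}} \Delta_{i,r}^{AB}$$

where $N_{g,r}$ denotes the number of sgRNAs targeting gene $g$ in replicate $r$, and $R$ is the number of available replicates.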
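The scoring scheme above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes the per-sgRNA score is a log2 fold change of normalized abundance between the two time points, with a small pseudocount (both assumptions), and then averages sgRNA scores within each gene and across replicates to obtain the gene score. Function names and toy values are hypothetical.

```python
import math
from collections import defaultdict


def normalize(counts):
    """Divide each raw sgRNA read count by the total reads in the sample."""
    total = sum(counts.values())
    return {guide: c / total for guide, c in counts.items()}


def sgrna_scores(early, late, pseudocount=1e-6):
    """Per-sgRNA score between two time points.

    Assumption: log2 fold change of normalized abundance (late vs early).
    Both dicts must contain the same sgRNA keys.
    """
    a, b = normalize(early), normalize(late)
    return {
        guide: math.log2((b[guide] + pseudocount) / (a[guide] + pseudocount))
        for guide in a
    }


def gene_scores(replicates, guide_to_gene):
    """Gene score: mean over replicates of the mean sgRNA score per gene.

    `replicates` is a list of {sgRNA: score} dicts, one per replicate;
    `guide_to_gene` maps each sgRNA to its target gene (or a control label).
    """
    per_gene = defaultdict(list)
    for scores in replicates:
        by_gene = defaultdict(list)
        for guide, s in scores.items():
            by_gene[guide_to_gene[guide]].append(s)
        for gene, values in by_gene.items():
            per_gene[gene].append(sum(values) / len(values))
    return {gene: sum(v) / len(v) for gene, v in per_gene.items()}


# Toy example with two replicates of precomputed sgRNA scores
guide_to_gene = {"sg1": "GNA13", "sg2": "GNA13", "nt1": "NON_TARGETING"}
reps = [
    {"sg1": 1.0, "sg2": 3.0, "nt1": 0.0},
    {"sg1": 2.0, "sg2": 2.0, "nt1": 0.5},
]
print(gene_scores(reps, guide_to_gene))
```

A positive gene score indicates that guides against the gene became more abundant over time, as expected for a TSG whose loss confers a competitive advantage.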

RNA-Sequencing
Total RNA was extracted from cells using NucleoSpin RNA (Macherey-Nagel), and cDNA was produced from 500 ng of total RNA using qScript™ cDNA SuperMix (Quanta Biosciences). RNA-seq libraries were prepared using the NEBNext Poly(A) mRNA Magnetic Isolation Module (E7490) and the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (E7420) according to the manufacturer's instructions. NEBNext Multiplex Oligos for Illumina (E7500) were used for indexing, and libraries were sequenced on a HiSeq4000 by 50-bp single-end sequencing. RNA-seq reads were mapped to the hg38/GRCh38 human reference genome using the splice-aware aligner STAR 2.5.3a 1 in two-pass mode. The genome index was built with the GENCODE v28 comprehensive gene annotation set. Uniquely mapped reads were assigned to genes with the Rsubread package 2 , allowing a read to be assigned to more than one overlapping feature. At least 25 overlapping bases were required to assign a read to a gene.
Genes with low counts were filtered out, retaining only genes with at least 128 counts in at least 25% of samples. Gene expression values were obtained using the variance stabilizing transformation implemented in the DESeq2 package 3 .
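The low-count filter above is straightforward to express in code. A minimal sketch, assuming a gene-by-sample table of raw counts stored as a dict of lists; the function name and toy counts are hypothetical. The subsequent variance stabilizing transformation is performed in R by DESeq2 and is not reproduced here.

```python
import math


def filter_low_counts(counts, min_count=128, min_fraction=0.25):
    """Keep genes with at least `min_count` reads in at least `min_fraction` of samples.

    `counts` maps gene -> list of raw counts (one value per sample).
    """
    kept = {}
    for gene, values in counts.items():
        samples_needed = math.ceil(min_fraction * len(values))
        if sum(v >= min_count for v in values) >= samples_needed:
            kept[gene] = values
    return kept


counts = {
    "BCL6": [500, 600, 700, 800],
    "GENE_LOW": [0, 5, 10, 20],
    "GENE_PATCHY": [0, 0, 200, 10],
}
print(sorted(filter_low_counts(counts)))
```

With four samples and a 25% threshold, a single sample at or above 128 counts is enough to keep a gene.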

Barcoded overexpression experiments
The coding sequences (CDS) of human genes (BCL6 WT, BCL6_G559R, BCL6_H641R, BCL6_R585W, BCL6_P586A, IRF8 WT, IRF8_N87Y, IRF8_T80A, IRF8_380_stop, IRF8_S55A, MEF2B WT, MEF2B_D83V, MEF2B_Y69H) were cloned into the barcoded pBMN-IRES-LyT2 retroviral vector using NEBuilder® HiFi DNA Assembly. Primary human GC B cells were retrovirally transduced with BCL2, then infected with the barcoded overexpression constructs, pooled four days after transduction and grown in competitive culture. Genomic DNA was collected at day 4 and approximately every 14 days thereafter. Genomic DNA extraction was conducted as described previously 14 .
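Supplementary Table 1 reports enrichment scores based on relative read counts of the barcoded constructs. The exact formula is not given in this section; the sketch below assumes the score is a construct's share of barcode reads at a later timepoint divided by its share at day 4. Function names, barcode labels and counts are hypothetical.

```python
def relative_frequencies(barcode_counts):
    """Convert raw barcode read counts into each construct's share of reads."""
    total = sum(barcode_counts.values())
    return {bc: n / total for bc, n in barcode_counts.items()}


def enrichment_scores(day4, later):
    """Assumed score: share of reads at a later timepoint / share at day 4."""
    f0 = relative_frequencies(day4)
    f1 = relative_frequencies(later)
    # Constructs absent from the later sample get a score of 0.
    return {bc: f1.get(bc, 0.0) / f0[bc] for bc in f0 if f0[bc] > 0}


# Toy counts only
day4 = {"BCL6_WT": 250, "MEF2B_Y69H": 250, "IRF8_WT": 500}
day59 = {"BCL6_WT": 800, "MEF2B_Y69H": 100, "IRF8_WT": 100}
scores = enrichment_scores(day4, day59)
print(max(scores, key=scores.get))
```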

BCR amplification
PCR amplification of DNA from synthetic lymphoma tumors (100 ng input) was performed with a JH reverse primer and FR1 forward primer set pools (provided by Sigma-Aldrich) as previously described 16 . MiSeq libraries were generated using the KAPA Hyper Prep Kit (KAPA Biosystems) incorporating KAPA Dual-Indexed Adapters for Illumina MiSeq platforms, and reads were filtered as described previously 17 . The computational pipeline MRD Assessment and Retrieval Code in Python (MRDARCY) was then used to analyze BCRs, followed by secondary rearrangement analysis in which the relative frequency of each IgHV gene was determined by BLAST against the IMGT reference gene database 18 .
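Once each read has been assigned an IGHV gene (for example, from BLAST hits against the IMGT reference), relative gene frequencies are simple proportions. A minimal sketch, assuming assignments arrive as IMGT-style allele names such as "IGHV3-23*01" that are collapsed to the gene level; the function name and inputs are hypothetical.

```python
from collections import Counter


def ighv_usage(assignments):
    """Relative frequency of each IGHV gene among assigned reads.

    Allele suffixes (e.g. "*01") are stripped so alleles of one gene are pooled.
    """
    genes = Counter(a.split("*")[0] for a in assignments)
    total = sum(genes.values())
    return {gene: n / total for gene, n in genes.most_common()}


reads = ["IGHV3-23*01", "IGHV3-23*04", "IGHV4-34*01", "IGHV3-23*01"]
print(ighv_usage(reads))
```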

High-throughput sequencing and analysis of heavy chain immunoglobulin
Deep sequencing of PCR-amplified immunoglobulin heavy chain variable regions, BCR network generation and analysis of network properties were performed as previously described 16 . Each vertex represents a unique sequence, with relative vertex size proportional to the number of identical reads. Edges join vertices that differ by a single nucleotide substitution (non-indel difference), and clusters are collections of related, connected vertices. Ig gene usage and sequence annotation were determined with IMGT V-QUEST, and repertoire differences were analyzed using custom Python scripts.
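The network definition above can be illustrated with a small sketch: vertices are unique sequences weighted by read count, edges join equal-length sequences that differ at exactly one position (so indels never create an edge), and clusters are connected components. This is an illustrative, quadratic-time implementation under those assumptions, not the published network-generation code; names and toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def one_substitution_apart(a, b):
    """True if equal-length sequences differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_network(reads):
    """Vertices = unique sequences (with read counts); edges join sequences one substitution apart."""
    sizes = Counter(reads)  # vertex size = number of identical reads
    vertices = list(sizes)
    edges = defaultdict(set)
    for i, a in enumerate(vertices):
        for b in vertices[i + 1:]:
            if one_substitution_apart(a, b):
                edges[a].add(b)
                edges[b].add(a)
    return sizes, edges


def clusters(sizes, edges):
    """Connected components of the network (clusters of related vertices)."""
    seen, result = set(), []
    for v in sizes:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(edges[u] - comp)
        seen |= comp
        result.append(comp)
    return result


reads = ["AAAA", "AAAA", "AAAT", "AATT", "GGGG"]
sizes, edges = build_network(reads)
comps = clusters(sizes, edges)
print(sorted(len(c) for c in comps))
```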
For the visual representations of the BCR repertoires, BCR networks were subsampled using the cluster-enforced linkage sampling (CC) method to preserve the overall clonal structure. Briefly, the CC algorithm employs three steps to account for loss of connectivity between vertices in clusters during sampling:
(1) Vertex selection: vertices were resampled until the desired number of clusters in the original network G was represented.
(2) Cluster-vertex migration: for each cluster in the original network containing more than one sampled vertex, vertices were reselected such that the cluster connectivity was retained in the sampled network.
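Steps (1) and (2) above can be sketched as follows. This is one possible reading, not the published algorithm: step (1) redraws a random vertex sample until it touches the requested number of clusters, and step (2) replaces the sampled vertices of each multiply-sampled cluster with an equally sized connected set grown from a random seed vertex, which is our assumption about how connectivity is retained. Step (3) is not described in this section and is not implemented. All names are hypothetical.

```python
import random


def connected_subset(cluster, edges, size, rng):
    """Grow a connected set of `size` vertices inside one cluster.

    Starts from a random vertex and repeatedly adds a random neighbour, so the
    result is connected by construction (assumes `cluster` is connected).
    """
    start = rng.choice(sorted(cluster))
    chosen = {start}
    frontier = set(edges.get(start, ())) & cluster
    while len(chosen) < size and frontier:
        v = rng.choice(sorted(frontier))
        chosen.add(v)
        frontier |= set(edges.get(v, ())) & cluster
        frontier -= chosen
    return chosen


def cluster_enforced_sample(clusters, edges, n_vertices, n_clusters,
                            rng=None, max_tries=1000):
    """Sketch of cluster-enforced subsampling (steps 1 and 2 only)."""
    rng = rng or random.Random(0)
    vertex_cluster = {v: i for i, c in enumerate(clusters) for v in c}
    all_vertices = sorted(vertex_cluster)
    # Step 1: redraw until at least `n_clusters` clusters are represented.
    for _ in range(max_tries):
        sample = rng.sample(all_vertices, n_vertices)
        if len({vertex_cluster[v] for v in sample}) >= n_clusters:
            break
    else:
        raise RuntimeError("requested number of clusters not reached")
    # Step 2: keep the sampled vertices of each cluster connected.
    per_cluster = {}
    for v in sample:
        per_cluster.setdefault(vertex_cluster[v], []).append(v)
    result = set()
    for ci, members in per_cluster.items():
        if len(members) > 1:
            result |= connected_subset(clusters[ci], edges, len(members), rng)
        else:
            result.update(members)
    return result
```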

Mutation analysis
To identify somatic mutations across synthetic lymphoma tumors a hybrid-capture platform was used with a bait set 14

Variant calling for substitutions and indels
Single base substitutions and short insertions and deletions were called using GATK 21