Recent expansion of the non‐recombining sex‐linked region on Silene latifolia sex chromosomes

Abstract Evolution of a non‐recombining sex‐specific region on the Y (or W) chromosome (NRY) is a key step in sex chromosome evolution, but how recombination suppression evolves is not well understood. Studies in many different organisms indicated that NRY evolution often involves several expansion steps. Why such NRY expansions occur remains unclear, although it is though that they are likely driven by sexually antagonistic selection. This paper describes a recent NRY expansion due to shift of the pseudoautosomal boundary on the sex chromosomes of a dioecious plant Silene latifolia. The shift resulted in inclusion of at least 16 pseudoautosomal genes into the NRY. This region is pseudoautosomal in closely related Silene dioica and Silene diclinis, indicating that the NRY expansion occurred in S. latifolia after it speciated from the other species ~120 thousand years ago. As S. latifolia and S. dioica actively hybridise across Europe, interspecific gene flow could blur the PAR boundary in these species. The pseudoautosomal genes have significantly elevated genetic diversity (π ~ 3% at synonymous sites), which is consistent with balancing selection maintaining diversity in this region. The recent shift of the PAR boundary in S. latifolia offers an opportunity to study the process of on‐going NRY expansion.


| INTRODUC TI ON
Evolution of sex chromosomes is arguably one of the most significant genomic changes that leads to wide-reaching consequences throughout the genome (Charlesworth, 2008(Charlesworth, , 2015Mank et al., 2014;Wright et al., 2016). Despite multiple independent origins of sex chromosomes (e.g. Bachtrog et al., 2014), their properties are similar in different organisms, featuring a genetically degenerate non-recombining sex-specific Y (or W) chromosome (NRY) and non-sex-specific X (or Z) chromosome that actively recombines in homogametic sex (Charlesworth, 2008). Previous work in mammals, Drosophila and plants shed light on many aspects of sex chromosome evolution (Charlesworth, 2015;Wright et al., 2016). For example it is clear that sex chromosomes typically evolve from autosome(s) that acquire sex-determining gene(s), stop recombining around that gene in the heterogametic sex (Ohno, 1967) and gradually degenerate afterwards (Charlesworth, 2008;Wright et al., 2016).
However, there appear to be many 'exceptions to the rule' (Furman et al., 2020), such as sex chromosome turnovers that are common in some organisms (Vicoso, 2019), the extent of sex chromosome | 1697 FILATOV differentiaton (Darolti et al., 2019), as well as differences in pace of sex chromosome evolution between plants and animals (Bergero & Charlesworth, 2011;Chibalina & Filatov, 2011;Krasovec, Chester, et al., 2018a). It has been argued that these 'exceptions' reflect reasonably well understood processes (Charlesworth, 2021) and only prove the 'rule'. Recent studies of non-model organisms with homomorphic sex chromosomes, such as frogs (Jeffries et al., 2018;Rodrigues et al., 2017), fish (El Taher et al., 2021) and plants (Harkess et al., 2020;Muller et al., 2020) revealed a highly dynamic picture, with frequent sex chromosome turnovers (Vicoso, 2019) and occasional recombination between the X (or Z) and Y (or W) chromosomes (Rodrigues et al., 2017), which expands the 'classic paradigm' of sex chromosome evolution (Kratochvil et al., 2021).
Despite the progress in understanding of sex chromosome evolution, many pivotal questions remain unanswered (Furman et al., 2020;Wright et al., 2016). In particular, how nascent sex chromosomes originate in the first place, and how recombination suppression between them evolves remain poorly understood (Charlesworth, 2017(Charlesworth, , 2021Ponnikas et al., 2018). The non-recombining sex-specific region tends to expand over time, including a larger proportion of the sex chromosome. Such expansions leave a characteristic signature of 'evolutionary strata'-lower divergence between the X and Y (or Z and W) chromosomes in regions that stopped recombining more recently ('younger strata') compared to older non-recombining regions ('older strata'; Lahn & Page, 1999).
Such stratification of divergence between the X and Y (or Z and W) chromosomes was reported in many organisms that evolved sex chromosomes independently from each other (e.g. Bergero et al., 2007;Lahn & Page, 1999;Zhou et al., 2014). Why such expansions of the NRY occur remains unclear and is actively discussed in the literature (Charlesworth, 2017;Jeffries et al., 2021;Lenormand & Roze, 2022;Ponnikas et al., 2018). Sexually antagonistic (SA) genes are thought to play an important role in this process (Charlesworth et al., 2014;Kirkpatrick & Guerrero, 2014;Rice, 1987), though relatively little experimental evidence in support of this plausible hypothesis is available (Charlesworth, 2018). Alternatives to the SA hypothesis, proposed to explain expansion of the NRY, include early emergence of dosage compensation (Lenormand & Roze, 2022), neutral divergence between the X and Y chromosomes (Bengtsson & Goodfellow, 1987;Ironside, 2010;Jeffries et al., 2021;Olito & Abbott, 2020) and sheltering of deleterious mutations by permanent heterozygosity in males (Charlesworth & Wall, 1999;Jay et al., 2022;Olito et al., 2022).
Evolution of a non-recombining sex-specific region on the Y (or W) chromosome restricts recombination in heterogametic sex to pseudoautosomal region (PAR) at the end of sex chromosomes. PAR has peculiar properties that distinguish it from both the sex chromosomes and the autosomes (Otto et al., 2011). In particular, PARs are often gene-rich, GC-rich and tend to have high recombination (Lien et al., 2000) and mutation (Filatov & Gerrard, 2003) rates, compared with other parts of the genome. Furthermore, due to their partial sex linkage PARs are thought to be the hotspots for the accumulation of sexually antagonistic mutations (Charlesworth et al., 2014;Kirkpatrick & Guerrero, 2014;Rice, 1987). While much research effort is focused on sex chromosomes, the PARs remain understudied, although some recent progress in understanding of PAR evolutionary dynamics has been achieved in mice (Morgan et al., 2019), humans (Ellis et al., 1990;Monteiro et al., 2021) and few plants.
In plants the analyses of PAR have, so far, been focusing on Silene latifolia (Campos et al., 2017;Guirao-Rico et al., 2017;Krasovec et al., 2020;Qiu et al., 2016), although papaya (Lappin et al., 2015) and brown alga Ectocarpus (Avia et al., 2018;Luthringer et al., 2015) were also studied to some extent. This paper extends the analysis of the PAR in S. latifolia and its close relatives with the aim to better characterise the boundary between the PAR and the rest of the sex chromosome and to test whether it differs between closely related species. The shifts of the PAR boundary result in expansion of the NRY, which may occur via inversions suppressing recombination between the sex chromosomes in the heterogametic sex, as has been proposed for sex chromosomes in multiple bird lineages (Zhou et al., 2014). Alternatively, the shift of the PAR boundary may be gradual, as suggested by the reports that the PAR boundary in mice (Morgan et al., 2019) and S. latifolia (Krasovec et al., 2020;Qiu et al., 2016) is 'fuzzy'.
Silene latifolia is a model species for studies of plant sex chromosomes (Bernasconi et al., 2009). This lineage has evolved separate sexes (dioecy) and sex chromosomes relatively recently, around 11 million years (my) ago (Krasovec, Chester, et al., 2018a).
Heteromorphic X and Y chromosomes are the largest in the S. latifolia genome and are clearly distinguishable under the microscope (Armstrong & Filatov, 2008), which made S. latifolia the species of choice for the studies of plant sex chromosomes since the first half of the last century (Warmke, 1946;Westergaard, 1946Westergaard, , 1958. Most (>700) species in the genus Silene are non-dioecious and separate sexes and sex chromosomes are clearly derived traits (Desfeux et al., 1996), making it possible to compare the sex chromosomes in the dioecious species to the 'ancestral' state preserved in nondioecious species of the same genus (Filatov, 2005).
At least one large expansion of the NRY has been reported for S. latifolia sex chromosomes, which resulted in formation of two evolutionary strata approximately 11 and 6 million years ago (Bergero et al., 2007;Filatov, 2005;Krasovec, Chester, et al., 2018a). A recent study (Campos et al., 2017) making a comparison of sex chromosomes between closely related S. latifolia and S. dioica that share the same or very similar sex chromosomes , suggested a second NRY expansion that occurred in S. latifolia since its divergence from S. dioica only ~120 thousand years (Liu et al., 2020).
Unfortunately, the previous analysis of the PAR boundary shift had limited resolution as it was based on six genes, only two of which were located in the region that is pseudoautosomal in S. dioica and sex-linked in S. latifolia. This paper extends that analysis to a larger set of genes near the PAR boundary to re-evaluate the evidence for the recent NRY expansion in S. latifolia, to establish the boundaries and to study the possible causes of NRY expansion. The analyses reported below confirm the recent expansion of the NRY in S. latifolia and reveal that the newly added evolutionary stratum contains at least 16 genes, but it is much smaller than the older strata on S. latifolia sex chromosomes that contain hundreds of genes (Bergero et al., 2015;Bergero & Charlesworth, 2011;Chibalina & Filatov, 2011;Papadopulos et al., 2015). This illustrates that recombination suppression can progress gradually or in small steps that shrink the PAR and result in NRY expansion.

| Plant material
This work is based on plants from S. latifolia genetic cross df108 described previously (Papadopulos et al., 2015). The df108 family is an F2 generation from crossing of maternal (fSa985) and paternal (Sa984) plants, with transcriptomes sequenced for the parents and 52 F2 progeny (see Table S1). Furthermore, the work involved wild accessions from three dioecious Silene species (S. latifolia, S. dioica and S. diclinis) and a non-dioecious Silene vulgaris that was used as an outgroup (

| Sequence data
Transcriptome sequence data for the genetic cross df108 were generated in the previous study (Papadopulos et al., 2015) and is available from NCBI bioproject PRJNA289919. RNA-seq data for wild samples included both published and unpublished datasets (Table S1). RNA was extracted from actively growing shoots and flower buds using a Qiagen RNeasy Plant Mini Kit, including the optional on-column DNAse digestion. Isolation of mRNA, cDNA synthesis and high-throughput sequencing were conducted according to the standard Illumina RNA-Seq procedure at the Oxford genomics facility of the Wellcome Trust Centre for Human Genetics (WTCHG).
All newly generated sequence data were submitted to the NCBI SRA database under bioproject number PRJNA826722.
The numbers of reads that successfully mapped to reference for each transcriptome are listed in Table S1. Single nucleotide polymorphisms (SNPs) were called with GATK HaplotypeCaller (McKenna et al., 2010) and the resulting multisample vcf file was imported to proseq3 v3.997 (Filatov, 2009). The dataset was filtered with proseq3 to leave only the genes which are located close to the PAR boundary in the previously published genetic map (Papadopulos et al., 2015). proseq3 was also used to assign coding regions to all sequences, based on annotation in the reference transcriptome (Chibalina & Filatov, 2011;Papadopulos et al., 2015) and filter sites in the alignments to reduce the data to 4-fold degenerate sites. Nucleotide diversity (π eq. 10.5 in (Nei, 1987)), K st * (Hudson et al., 1992), Tajima's D (Tajima, 1989) and Kelly's ZnS (Kelly, 1997) values were calculated with proseq3. HKA test (Hudson et al., 1987) was done using the 'direct mode' in dnasp v6.12.03 (Rozas et al., 2017), while MLHKA test was done with mlhka program (Wright & Charlesworth, 2004) that was integrated in proseq3. The 30 autosomal genes used as the reference set in the mlhka analysis were randomly selected from the genes mapped previously (Papadopulos et al., 2015). The sequence divergence from the outgroup S. vulgaris was calculated with mega v10.1.5 (Kumar et al., 2018).

| Identification of the putative Y-linked alleles
Transcriptome sequence data from parents (fSa985 and Sa984; Table S1) and 52 progeny (20 males and 32 females) of S. latifolia genetic cross df108 (Papadopulos et al., 2015) were used to identify the putative Y-linked alleles. The Y-alleles ('Y-SNPs') were identified as alleles always inherited from father to male progeny and never to female progeny across two generations. The corresponding X-alleles at these loci are always inherited from father to female progeny. The presence of these Y-SNPs was further analysed in wild male and female plants in three dioecious Silene species. To allow for missing data, two stringency thresholds were used, requiring the presence of the putative Y-SNP in (i) at least two wild males and no females and (ii) in all seven wild males and none of the females. This yielded sets of SNPs that are Y-linked in both the genetic cross and the wild samples, as well as the SNPs that show Y-linked segregation pattern, but are male-specific in one or more species, as shown on Figure 1a.

| Measuring divergence between homologous X-and Y-linked genes
Sequence reconstruction for the Y-linked homologs of X-linked genes was done as described previously (Chibalina & Filatov, 2011, Papadopulos et al., 2015. Briefly, to reconstruct the sequence of the Y-linked genes, Y-SNPs were used to identify the sequence reads (along with their paired reads) corresponding to the Y-linked allele and to call Y-consensus with proseq3 (Filatov, 2009). The homologous X-and Y-linked sequences were aligned with muscle (Edgar, 2004).
The resulting alignments were used to calculate sequence divergence at silent sites with proseq3 (Filatov, 2009).

| Gene expression analysis
Gene expression was measured as the number of read fragments per million reads mapped per kilobase of reference sequence (FPKM) calculated with RSEM (Li & Dewey, 2011). FPKM was calculated for each gene of each individual and gene expression in males and females was compared as log2(median(femaleFPKM)) − log2(median(m aleFPKM)).

| The data
The analysis presented below included previously published transcriptome sequence data from 52 progeny of S. latifolia genetic cross df108 (Papadopulos et al., 2015) as well as newly sequenced and previously published data for 14 S. latifolia and 14 S. dioica wild accessions grown from seed collected across Europe (Table S1). Each accession came from a different location to avoid the effects of population structure at the local scale (Wakeley, 1998). These samples include both males and females, which are analysed jointly or separately, depending on the type of analysis. Furthermore, the dataset included a male and a female Silene diclinis plants grown from seed collected in Valencia (Spain), as well as a non-dioecious Silene vulgaris that was used as an outgroup. As the aim of this work is to analyse and compare the PAR boundary region in S. latifolia and its close relatives, the analyses below focus on 57 genes (Table S2) in the proximity of the PAR boundary (according to the genetic map (Papadopulos et al., 2015)) for which sufficient sequence data was available for both sexes in S. latifolia, S. dioica and S. diclinis (Table S1).

| The location of the PAR boundary in S. latifolia and S. dioica
The location of the PAR boundary was established using a combination of segregation analysis in a previously published genetic cross F I G U R E 1 Location of the PAR boundary in Silene latifolia and its relatives. PAR extends to the left of the plot and the differentiated part of the sex chromosomes extends to the right. (a) The number of putative Y-linked SNPs per gene identified as segregating from father to sons in family df108 (Papadopulos et al., 2015) (grey bars) and as male-specific SNPs in the wild accessions of S. latifolia, S. dioica and S. diclinis (blue + green, yellow + orange and red bars, respectively). The blue + green bars for S. latifolia and yellow + orange bars for S. dioica show the numbers of Y-SNPs present in at least two and in all seven wild males of each species analysed, respectively. The black bars show the numbers of putative 'Y-SNPs' (defined by segregation in df108 cross) present in wild S. latifolia females. (b) K st * (Hudson et al., 1992) between males and females in the genes adjacent to the PAR boundary in S. latifolia (blue squares) and S. dioica (orange dots). K st * for S. diclinis was not calculated because only two accessions were available for this species. Dashed lines show linear regression separately fitted to S. latifolia and S. dioica data in the left, middle and right parts of the plot Number of Y-linked SNPs df108 (Papadopulos et al., 2015) and the analysis of polymorphism in wild accessions of S. latifolia, S. dioica and S. diclinis. In the genetic cross, the putatively Y-linked SNPs were identified as SNPs inherited from father to sons and not daughters. Gene cnt8383 and all 38 genes located proximally (to the right in Figure 1a) contain SNPs segregating from father to sons and not to daughters, indicating some degree of sex linkage. No such putatively Y-linked SNP were detected in the 18 genes located distally (to the left in Figure 1a) to the gene cnt8383 despite multiple SNPs segregating at these genes, suggesting their pseudoautosomal location.
Segregation in a genetic cross cannot detect rare recombination events and the size of fully sex-linked regions may be overestimated (e.g. (Charlesworth, 2021;Krasovec et al., 2020)). Thus, the presence of male-specific SNPs in unrelated individuals of the same species is a more accurate indicator of gene sex linkage. To account for inevitable missing data, two stringency levels were used, requiring male-specific SNPs to be found in at least two and in all seven analysed wild S. latifolia males, respectively ( Figure 1a). In both cases, the SNPs present in wild females were excluded. No male-specific SNPs were identified in wild S. latifolia samples in cnt8383 and in its proximal neighbour gene cnt9785, but nearly all the genes located more proximally contained at least one male-specific SNP, indicating complete or nearly complete sex linkage of the genes located proximally (to the right in Figure 1a) to cnt9785. Using the stricter requirement of Y-SNP presence in all seven wild S. latifolia males (green bars in Figure 1a), the PAR boundary may be located a bit more proximallyaround the gene cnt5552. This is also consistent with the presence of segregation-defined Y-SNPs in wild females in the genes located distally to cnt5552 (black bars in Figure 1a). The few cases of Y-SNP presence in females for genes located proximally to cnt5552 may be due to occasional SNP mis-calling or genuine rare recombination events between the X and Y chromosomes near the PAR boundary.
Consistent with the location of the PAR boundary in this region, K st * (an F st -like statistic for sequence data (Hudson et al., 1992)) between S. latifolia males and females is nearly zero (average K st * = 0.007) in the genes located distally to cnt9785. All more proximal genes have diclinis female for the genes located distally to cnt15301, while this is never the case for proximally located genes, where the putatively Ylinked SNPs are present in the male, but not in the female (Figure 1a). This is consistent with the same location of the PAR boundary in S. dioica and S. diclinis. For convenience and brevity, the three parts of the analysed region will be referred to as the 'left', the 'middle' (or 'mid') and the 'right' (Figure 1).

| Genetic diversity near the PAR boundary
The genetic diversity was measured for synonymous (four-fold degenerate) sites (Figure 2a,b,e; Table S2) to minimize the effect of purifying selection on the level of diversity at non-silent sites. Overall, genetic diversity is similar in the two species (Figure 2e), and the difference between S. latifolia and S. dioica females is not significant (π dioF = 0.0199 and π latF = 0.0196; paired t-test, p = 0.837). However, the diversity in S. dioica males is slightly but significantly lower compared to S. latifolia males (π dioM = 0.024 and π latM = 0.028; paired t-test, p = 0.021). In both species the overall diversity in males is significantly higher than in females (paired t-tests, p < 0.01), which is driven primarily by much higher polymorphism in the males compared to females in the right part of the region that is sex-linked in both species (π latMright = 0.0332, π latFright = 0.0093, p < 0.0001; π dioMright = 0.0263, π dioFright = 0.0097, p < 0.0001). The differences in genetic diversity between the sexes are not significant in the left (π latMleft = 0.0281, π latFleft = 0.0303, p = 0.22; π dioMleft = 0.0260, π dioFleft = 0.0276, p = 0.184) or the middle (π latMmid = 0.0220, π latFmid = 0.0196, p = 0.22; π dioMmid = 0.0215, π dioFmid = 0.0233, p = 0.184) parts of the region. The higher diversity in males in the right part of the region is likely driven by divergence between the X-and the Y-linked alleles, while recombination in the PAR in males prevents such divergence (Figure 2d). This divergence is significantly higher in the right compared to the middle region (mean dS XYright = 0.049; dS XYmid = 0.016; t-test, p < 10 −6 ), consistent with very recent recombination suppression in the mid region in S. latifolia males.

S. laƟfolia
The same HKA test for the comparison of synonymous variation in the left and the right regions in S. dioica females is not significant (χ 2 = 2.383, p = 0.1226), but the comparison of the left region with autosomal genes is significant (χ 2 = 4.368, p = 0.0366). As HKA test uses divergence from the outgroup to adjust for possible mutation rate differences between the genes, higher mutation rate in the PAR cannot explain the excess of genetic diversity in the pseudoautosomal genes compared with the X-linked genes in S. latifolia.

Synonymous genetic diversity at individual genes in the left
and middle parts of the region was compared to the X-linked genes in the right part and to autosomal genes using mlhka (Wright & Charlesworth, 2004), a maximum likelihood implementation of the HKA test (Hudson et al., 1987). In S. latifolia females, this analysis revealed significant excess of genetic diversity in 10 and 8 out of 20 pseudoautosomal genes (left region) when compared to the Xlinked (right region) and autosomal genes, respectively (Figure 2a; Table S2). The same analysis in S. dioica identified 7 and 6 PAR genes in the left region, compared with the X-linked and autosomal genes, respectively ( Figure 2b; Table S2). One of the S. latifolia PAR genes, cnt8437, contained significantly less genetic diversity compared with that in the X-linked and autosomal genes (Figure 2a; Table S2).
In both species only three out of 16 'mid' genes contained significant excess of polymorphism compared with the X-linked and autosomal genes ( Figure 2a; Table S2). None of the X-linked genes (right region) showed significant excess of diversity compared with autosomal genes in both species, but three and one of the X-linked genes showed significant lack of diversity in S. latifolia and S. dioica, respectively (Table S2).
As S. latifolia and S. dioica often hybridise in the wild, differential interspecific gene flow in different genomic regions could, at least partly, account for elevated genetic diversity in the PAR. Indeed, F st between the species (Figure 2h) is significantly lower in the PAR compared to the mid and right regions in females (t-test, p < 0.0001) and in the mid region in males (t-test, p < 0.05), consistent with reduced interspecific gene flow on the S. latifolia and S. dioica X chromosomes, as reported in the previous study (Hu & Filatov, 2016).
However, F st in the PAR is similar (t-test, p > 0.05) to that for the autosomal genes (Figure 2h), indicating that differential interspecific gene flow is unlikely to be the cause of high PAR genetic diversity.
This is consistent with the conclusion of the previous study  that explicitly modelled the effect of interspecific gene flow between S. latifolia and S. dioica on patterns of diversity in the PAR. It is worth noting that reduced F st in the right region for males is caused by sharing of numerous Y-SNPs by the two species because the divergence of the X-and Y-linked gametologs for the genes in the right region predates speciation of S. latifolia and S. dioica.

| Patterns of genetic diversity around the PAR boundary
The overall frequency spectrum of polymorphisms, summarised by the Tajima's D statistic (Tajima, 1989) is consistent with the left part of the region being pseudoautosomal in both species, the right being sex-linked and the middle being sex-linked in S. latifolia and pseudoautosomal in S. dioica (Figure 2f; Table S3). Although Tajima The patterns of linkage disequilibrium (LD) within genes, summarised by the Kelly's Z nS statistic (Kelly, 1997) are also consistent with sex linkage of the right part of the region in both species and F I G U R E 2 Level and patterns of DNA sequence polymorphism (a, b, e-h) in Silene latifolia and S. dioica and silent divergence from the outgroup S. vulgaris (c) and between X-and Y-linked gametologs (d) in the genes near the PAR boundary. The order of genes in the panels (a-d) is the same and the gene names are shown below panel (d). Dashed lines in panels (a-d) show linear regression separately fitted to data in the left, middle and right parts of the plot. The wide grey lines in panels (a) and (b) show the difference between sexes in nucleotide diversity per site (π) and the red stars below the zero line show significance (***p < 0.001; **p < 0.01; *p < 0.05 [with starts shown vertically]) for individual genes in mlhka test (Wright & Charlesworth, 2004) with the 21 X-linked genes from the right part used as a reference set. For autosomal genes used as a reference see Table S2. (d) dS between X and Y is shown only for genes in the mid and right regions; (e-h) Boxand-whiskers plots showing the quartiles, the mean (the cross) and the actual data points (circles) for the distributions of summary statistics in the genes in the left, the middle and the right parts of the region, as well as in 30 autosomal genes (aut ; Table S4) randomly chosen from the previously mapped genes (Papadopulos et al., 2015), for females (red) and males (blue) in S. latifolia and S. dioica. (a), (b), (e) Nucleotide diversity per site (π (Nei, 1987)) at 4-fold degenerate codon positions; (f) Tajima's D (Tajima, 1989), (g) Kelly's Z nS (Kelly, 1997) and (h) F st between S. latifolia and S. dioica | 1703 FILATOV the middle part in S. latifolia, but not in S. dioica (Figure 2g; Table S3).
Sex linkage is expected to inflate LD in the heterogametic sex and this is clearly visible in the Z nS distribution for the right region in males of both species (Figure 2g). LD is the lowest in the left region of both species, as expected for the genes in the PAR that actively ary (Papadopulos et al., 2015). Another possibility is that the middle region may be fully sex-linked in all the species analysed and the lack of male-specific SNPs in S. dioica and S. diclinis may be caused by a deletion of Y-linked alleles in these species. However, this would either imply independent deletions of the same region in S. dioica and S. diclinis, which is unlikely, or it implies that Y chromosomes of S. dioica and S. diclinis are more closely related to each other than to S. latifolia, which contradicts the available evidence. S. latifolia and S. dioica sex chromosomes share the same structure , which differs from that in S. diclinis .
Furthermore, S. latifolia and S. dioica are cross-compatible and widely hybridise in the wild, while S. diclinis is partially reproductively isolated, suggesting that it is more diverged from the other two species. This is also consistent with sequence divergence indicating that S. diclinis is slightly more diverged compared with S. latifolia and S.
dioica (Krasovec, Nevado, & Filatov, 2018b). Thus, a shift in the PAR boundary in S. latifolia appears to be a more likely explanation to the results presented above. Lower synonymous divergence between the X-and Y-linked gametologs in the mid compared with the right region ( Figure 2d) is also consistent with the recent shift of the PAR boundary in S. latifolia.
How big is the region added to the S. latifolia NRY as a result of recent PAR boundary shift? Our analysis identified 16 expressed genes that are located in the region of recent NRY expansion in S. latifolia (Figure 1). While it is difficult to assess the physical size of that region without high quality genome sequence, it is possible to evaluate its size given the number of genes in that region and the average gene density for genomes of similar size, such as the tobacco (Nicotiana tabacum) genome (Sierro et al., 2014). The gene density in that genome is roughly 15 to 20 genes per megabase, which suggests that the size of the recent NRY expansion in S. latifolia is at least 1 Mb long. This is a very crude estimate as gene density can vary by an order of magnitude between actively and rarely recombining regions of the genome. Also, it is likely a minimal size estimate because only a subset of actively expressed genes have been identified in this region. Although small, it is comparable to or larger than the sizes of the entire NRY in many of the plant species analysed, such as persimmon (NRY length ~1 Mb; (Akagi et al., 2014)), kiwi fruit (NRY length <1 Mb; (Akagi et al., 2019)) and Asparagus (NRY length <1 Mb (Harkess et al., 2017)). Thus, the recent NRY expansion in S. latifolia is substantial in size, though it is clearly much smaller than the older 'evolutionary strata' in S. latifolia that include hundreds of genes and are likely hundreds of megabases in size. It remains unclear whether the older larger strata originated via instantaneous step-wise recombination cessation in the entire stratum, or the process was more gradual. If recombination cessation occurred in multiple small steps, the recent expansion of the NRY in S. latifolia described above may represent one of such steps and it could be used to study the process of on-going NRY expansion.
The mechanisms involved in recombination suppression on the sex chromosomes are not well understood, but it is often assumed that inversions play a major role in recombination suppression and NRY expansion on sex chromosomes (Charlesworth, 2017). The spread of inversions expanding the non-recombining region may be driven by the advantage conferred by sheltering deleterious mutations by permanent heterozygosity in males (Charlesworth & Wall, 1999;Jay et al., 2022;Olito et al., 2022). It was recently shown that such inversions have a limited window of opportunity to fix (while still lightly loaded by mutations) and the outcome of this process is dependent on inversion size, with small inversions having elevated probability of fixation compared to neutrality (Olito et al., 2022).
log2 relative female to male expression An inversion is expected to suppress recombination at the same time all over the inverted region. It is curious that K st * in S.
latifolia increases with distance from the PAR (in the middle part in Figure 1b), which is not expected if recombination cessation occurred simultaneously for all the genes in the region, for example due to an inversion. The gradient in K st * could be created by a gradient in the extent of sex linkage along this region or by a gradual rather than stepwise shift of the NRY boundary. Gradual recombination suppression was hypothesized to be caused by neutral divergence between the PAR genes closely linked to the X and Y chromosomes (Bengtsson & Goodfellow, 1987;Ironside, 2010;Jeffries et al., 2021;Olito & Abbott, 2020). This hypothesis predicts elevated genetic diversity (due to X:Y divergence) only in the PAR region immediately adjacent to the PAR boundary, while the analyses presented above do not reveal such localised effect in proximal PAR genes (Figure 2a,b). It is possible that the PAR region analysed is too small to detect the expected pattern of diversity reduction in the PAR with distance from the PAR boundary.
However, given the length of the analysed PAR region in the genetic map is ~7 cM (Table S2), the distal genes in this region should be sufficiently decoupled from the differentiated part of sex chromosomes to reveal lower diversity compared to the proximal PAR genes.
Selection to link sexually antagonistic (SA) genes to sex is thought to be a plausible cause for the expansion of sex-linked regions by inclusion of ever larger proportion of pseudoautosomal genes in the NRY (Rice, 1987). SA selection in the PAR is expected to create distinctive footprints in DNA sequence variation, such as local divergence between the Y-and the X-linked alleles or between males and females, which can be detected as peaks of F st (or K st *) between the sexes and elevated local genetic diversity (Kirkpatrick & Guerrero, 2014;Qiu et al., 2013). The analyses presented above reveal significantly elevated genetic diversity in the PAR genes, compared with the X-linked genes, adjusting for ploidy and possible mutation rate difference in the HKA (Hudson et al., 1987) and mlhka (Wright & Charlesworth, 2004) tests. Balancing selection due to partial sex linkage in the PAR can only elevate diversity in the gene(s) immediately adjacent to the PAR boundary (Kirkpatrick et al., 2010). Qiu et al. (2016) estimate that in S. latifolia the region of elevated divergence due to sex linkage can be as short as half a kilobase (Qiu et al., 2016). Thus, balancing selection due to partial sex linkage is likely insufficient to elevate diversity in multiple pseudoautosomal genes, as shown on Figure 2a,b. While balancing selection due to sex linkage can only elevate diversity in the gene(s) immediately adjacent to the PAR boundary, SA selection in pseudoautosomal gene(s) could extend this effect to a much wider region in the PAR (Kirkpatrick & Guerrero, 2014). Thus, the presence of elevated polymorphism in multiple PAR genes is consistent with SA at some of the genes in the PAR.
Can interspecific gene flow blur the PAR boundary in S. latifolia and S. dioica? S. latifolia and S. dioica diverged about 120 thousand years ago and they widely hybridise across Europe (Liu et al., 2020;Muir et al., 2012). If these species differ in the location of the PAR boundary, their on-going hybridisation may blur the location of the PAR boundary, which could, at least partly, account for the 'fuzziness' of the PAR boundary previously reported in S. latifolia (Krasovec et al., 2020;Qiu et al., 2016). However, the effect of interspecific hybridisation on the location of the PAR boundary may depend on whether the X or the Y chromosome play the primary role in recombination suppression in S. latifolia middle region. Interspecific introgression of the Y chromosome is likely to be a rare event be-

| CON CLUS IONS
Evolution of the non-recombining region is the central process in sex chromosome evolution, yet relatively little is known about the mechanisms involved in this process (Bergero & Charlesworth, 2009;Charlesworth, 2017). The region adjacent to the PAR boundary may provide the clues of how cessation of recombination evolves and how NRY expands. This study extends the work devoted to the location of the PAR boundary and analysis of the adjacent region in S. latifolia and closely related species Campos et al., 2017;Guirao-Rico et al., 2017;Krasovec et al., 2020;Qiu et al., 2016). The analyses presented above indicate that the PAR boundary shift has occurred in S. latifolia since its divergence from its close relatives S. dioica and S. diclinis. The region newly added to the non-recombining region in S. latifolia is smaller than the two older evolutionary strata described in this species, but it contains at least 16 expressed genes and is likely to be over a megabase long.
Significantly elevated diversity at the pseudoautosomal genes are consistent with balancing and SA selection maintaining polymorphism in the PAR, which is in line with theoretical expectations for partially sex-linked genes near the PAR boundary (Kirkpatrick et al., 2010;Kirkpatrick & Guerrero, 2014).

AUTH O R CO NTR I B UTI O N S
DAF designed the study, generated the data, conducted the analyses and wrote the paper.

ACK N OWLED G EM ENTS
This work was funded by a BBSRC grant BB/P009808/1.

CO N FLI C T O F I NTE R E S T
The author has no conflict of interest to declare.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available in NCBI database at https://www.ncbi.nlm.nih.gov/, all reference numbers are listed in Table S1.

PE E R R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1111/jeb.14063.