Evolutionary Processes Driving the Rise and Fall of Staphylococcus aureus ST239, a Dominant Hybrid Pathogen

ABSTRACT Selection plays a key role in the spread of antibiotic resistance, but the evolutionary drivers of clinically important resistant strains remain poorly understood. Here, we use genomic analyses and competition experiments to study Staphylococcus aureus ST239, a prominent MRSA strain that is thought to have been formed by large-scale recombination between ST8 and ST30. Genomic analyses allowed us to refine the hybrid model for the origin of ST239 and to date the origin of ST239 to 1920 to 1945, which predates the clinical introduction of methicillin in 1959. Although purifying selection has dominated the evolution of ST239, parallel evolution has occurred in genes involved in antibiotic resistance and virulence, suggesting that ST239 has evolved toward an increasingly pathogenic lifestyle. Crucially, ST239 isolates have low competitive fitness relative to both ST8 and ST30 isolates, supporting the idea that fitness costs have driven the demise of this once-dominant pathogen strain.

carriage of S. aureus, mainly in the nares, but this bacterium is capable of causing serious invasive infections at a number of sites in the body (5). Antibiotic use has driven the epidemic spread of waves of resistant S. aureus strains; in particular, epidemic strains of methicillin-resistant S. aureus (EMRSAs) are now a global problem in health care settings and a growing problem in the community. For example, ST239 (a member of clonal complex 8 [CC8]) is a major EMRSA strain that has been the causative agent of multiple epidemics in health care settings around the globe. The first ST239 isolates were collected in the late 1970s in Australia (6). This was followed by the first EMRSA strain, also an ST239 and known as EMRSA-1, which emerged in the UK in 1981 (7). Between 1987 and 1988, more than 40% of MRSA isolates collected in England and Wales were ST239 EMRSA-1 (8), and ST239 became highly prevalent around the globe. By the mid-2000s, ST239 was the predominant MRSA strain in Asia, causing up to 90% of hospitalacquired MRSA within a region accounting for .60% of the world's population (9). More recently, reports from around the globe show that levels of ST239 MRSA have been decreasing dramatically, but the causes of this decline remain unclear (10)(11)(12).
Robinson and Enright proposed that ST239 was formed as a result of a large-scale chromosomal replacement event in which an ST8 clone acquired ;550 kb from an ST30 clone by recombination (13). Interestingly, five of the six known other cases of large-scale chromosomal replacement in S. aureus overlap to some extent with the acquired region of ST239 (13)(14)(15)(16), suggesting that this region may be a recombination hot spot (17). Large-scale chromosomal replacements have also been detected in a number of important AMR pathogens other than S. aureus, including K. pneumoniae ST258, Campylobacter coli ST1150, and Streptococcus agalactiae (18)(19)(20). In ST239, this acquired region also contains a large (60-kb) SCCmec-III element that confers resistance to a broad spectrum of antibiotics and heavy metals, and ST239 is thought to have spread in health care settings around the world as a result of the selective advantage of the resistance provided by this acquired element (21).
Although subsequent studies have provided good support for the Robinson and Enright model, important questions regarding the origin of ST239 remain unresolved (22)(23)(24)(25). All of the 2,979 ST239 genomes in the Staphopia database (26) that are associated with a typeable SCCmec element carry a type III SCCmec element, suggesting that this element was acquired early in the evolutionary history of ST239. However, the SCCmec-III element has not been identified in any ST30 genomes, including all 1,896 ST30 genomes in the Staphopia database. The absence of SCCmec-III in ST30 suggests that the true ancestor of ST239 may be a closely related lineage that has not been considered in previous studies of ST239. Second, the origins of ST239 remain unclear. It is often assumed that antibiotic-resistant pathogens evolve in response to treatment; in this case, it has been argued that the introduction of methicillin into clinical practice in the 1960s drove the evolution of ST239. However, recent work has shown that some other MRSA STs from CC8 predated the clinical introduction of methicillin (27), raising the possibility that ST239 may also have a deeper evolutionary history.
Finally, it is unclear why the prevalence of ST239 has recently declined. Acquiring DNA is usually associated with fitness costs (28)(29)(30), and many studies have shown fitness costs associated with acquired resistance genes (31)(32)(33), including large SCCmec elements (34,35). Given these costs, it is possible that the recombination event(s) that gave rise to ST239 created a low-fitness "hybrid" pathogen, although this has not previously been investigated in detail.
Here, we use a combination of computational and experimental techniques to investigate the underlying evolutionary processes that have driven the rise of ST239. First, we assemble a diverse collection of ST239 genome sequences to reconstruct the evolutionary history of this lineage and to test the Robinson and Enright model. We then use competition assays to test the hypothesis that ST239 has low fitness and to explore the link between AMR and fitness. Finally, we use population genetic approaches to understand how selection has operated in ST239 and to identify key genes involved in adaptive evolution in this lineage.

RESULTS AND DISCUSSION
Reconstructing the origin of ST239. The evolutionary drivers that contributed to the global emergence of the epidemic multidrug-resistant MRSA strain ST239 are not well understood; however, it has been suggested that the introduction of methicillin into clinics in the early 1960s may have been a contributing factor (36). To reconstruct the evolutionary history of ST239, we assembled a collection of 96 ST239 high-quality genomes from isolates that were collected from diverse geographic locations and time points, deliberately avoiding overrepresenting isolates from intensively sampled ST239 outbreaks (see, for example, references 37 and 22) (see Table S1A in the supplemental material).
We constructed a pangenome of 3,337 ST239 genes, of which 1,889 were present in 99 to 100% of all strains, resulting in a core genome length of 1,754,805 bp, representing a greater number of ST239 isolates from varied locations and dates than in previous evolutionary studies of ST239 (38)(39)(40). After we identified and excluded sites involved in recombination (41), 3,696 core variant sites remained, from which we reconstructed a phylogeny by maximum likelihood (Fig. 1A). The phylogeny had strong bootstrap support, and the long branch lengths were suggestive of a diverse set of ST239 isolates (38,42). The clustering of isolates into geographically distinct clades is consistent with the population structure observed by Harris in 2010 and by Castillo-Ramirez in 2012, who identified strong geographical clustering of ST239 sequences on continental, national, and city scales (38,42). In the present study, isolates from Oceania, Asia, and South America formed distinct clades, with rare exceptions. In contrast, North American and European isolates were dispersed throughout the tree.
We estimated a time to the most recent common ancestor (MRCA) of these ST239 sequences by fitting four evolutionary models using BEAST (see Table S2A) (43). The time to MRCA was consistent between all four models, with no significant difference between the models identified through Bayes Factor analysis. Therefore, the time to MRCA from the simplest model (strict molecular clock, constant population size) is recorded here, as 1940.1 (95% highest posterior density [HPD] intervals: 1934.7 to 1945.5). Similar results were obtained using BactDating (see Table S2B). Although there is some uncertainty in these estimates, these models predict that the origin of ST239 predated the clinical introduction of methicillin in 1959 (44) by more than 10 years.
Identifying the donor of the acquired region of the ST239 genome. Robinson and Enright initially proposed that ST30 was the closest known ancestor of the ST239 acquired-region (13), and they were able to identify potential boundaries of the recombination event that gave rise to ST239 (13). However, their conclusions were based on the partial sequencing of a small number of genes from representative isolates. To test the Robinson-Enright model using a large collection of genomes, we extracted the core genes shared by .99% of the isolates from the ST239 collection (n = 96), combined with additional collections of diverse ST8 (n = 111) and ST30 (n = 57) genomes. A total of 1,980 core genes were identified that were shared between .99% of all 264 genomes. Within each ST (ST239, ST8, and ST30), we generated a consensus sequence for each of these genes, and calculated the percentage similarity between each ST239 gene and its homolog in either ST30 or ST8 (Fig. 1C).
There was a clear distinction between regions of the ST239 genome that were closely related to ST8 and ST30, allowing us to clearly differentiate between the backbone and acquired regions of the ST239 genome ( Fig. 1D and E). This was consistent with Holden et al., who compared an ST239 genome sequence with a CC30 complete genome sequence and found that the acquired region was more closely related to CC30 by a shift of roughly 1% in the DNA percent identity compared to the backbone region (22).
Although the acquired region of the ST239 genome is similar to ST30, it is possible that the true ancestor of this region was a closely related lineage of S. aureus that was not considered in previous analyses of ST239. For example, all of the 2,979 ST239 isolates in the Staphopia database that are associated with a typeable SCCmec element carry a type III SCCmec element (see Table S3), but this element has not been identified in any of the 1,896 ST30 genomes in the Staphopia database (26), suggesting that ST30 may be a sister group, rather than the true ancestor of the acquired region. To systematically search for the ancestor of the acquired region, we used BIGSI to screen 447,833 bacterial and viral raw-read and assembled genomes for short sequences that match those found in the acquired region of the ST239 genome (45). Sequences from isolates that were closely related to the acquired region of the ST239 genome were mapped to the acquired region of the ST239 reference genome (TW20, NCBI accession number FN433596) and assembled into a phylogeny alongside the acquired region of the 96 ST239 genomes ( Fig. 2A). Crucially, we found that all ST239 isolates share a recent common ancestor with five ST30 isolates dating from the 1950s and 1960s (and one ST30 isolate of unknown origin), and this branch is well supported by bootstrapping. These five ST30 isolates (see Table S4A) were all from the penicillin-resistant S. aureus phage type 80/81 clone, which caused serious health care-associated and community-associated infections worldwide and was largely eliminated in the 1960s as a result of the introduction of methicillin (46,47).
Interestingly, none of the phage type 80/81 genomes that are closely related to ST239 contain an SCCmec element. The ubiquity of SCCmec-III in ST239 genomes suggests that the SCCmec-III element was acquired by the ancestor of the acquired region of ST239 following the divergence of this lineage from the phage type 80/81 lineage. However, it is possible that the SCCmec-III element was secondarily acquired following the chromosomal replacement that gave rise to ST239. To test the secondary acquisition hypothesis, we estimated the time to MRCA of the SCCmec-III elements in the collection of ST239 isolates (see Fig. S1 and Table S2A). However, there was considerable uncertainty in this estimate due to the small size of the SCCmec-III element (60 kb) and the high rate of recombination in this region of the genome. Given, these limitations, this analysis had limited power to reject the null hypothesis that the SCCmec-III element was acquired as part of the initial chromosomal replacement event that gave rise to ST239.
As a final test of the Robinson-Enright model, we used BIGSI to identify the closest known ancestor of the backbone region of the ST239 genome. Consistent with the Robinson and Enright model, this analysis identified ST8 as the closest ancestor of the ST239 backbone region (13). Specifically, the ancestor of ST239 was part of a diverse lineage of ST8 that has been isolated across multiple continents over the last 70 years ( Fig. 2B; see also Table S4B).
Dating the MRCA of ST239 isolates places an upper bound on the origin of ST239, but it is possible that ST239 originated prior to the MRCA of contemporary isolates. To place a lower bound on the origin of ST239, we estimated times to the MRCA of the acquired and backbone regions of ST239 with their respective ST8 and ST30 ancestors. The MRCA of the acquired region and ST30 was estimated as 1900.3 to 1926.5 (95% HPD), and the MRCA of the backbone region and ST8 was estimated as 1924.5 to 1934.7 (95% HPD). The overlap of these two estimates is encouraging, and this analysis suggests that ST239 is unlikely to have originated prior to the 1920s.
Recombination. Given that recombination created the ST239 lineage, it is possible that recombination has also played an important role in the subsequent evolution of ST239. To investigate this idea, we used ClonalFrameML to identify regions of recombination within the collection of 96 ST239 genomes ( triangle) and ST30 (red circle) consensus core gene sequences compared to the ST239 consensus core gene sequences (consensus sequences were formed from 96 ST239 genomes, 111 ST8 genomes, and 57 ST30 genomes, for 1,980 core genes that were shared between all 264 genomes). In the ST239 genome, the acquired region spans the origin of replication and hence is split between the beginning and the end of the linearized genome. from the TW20-like clade were excluded from this analysis due to high sequence similarity to the reference sequence that all ST239s were mapped to. The ratio of rates of recombination and mutation (R/u ) was 0.41, and the ratio of effects of recombination and mutation (r/m) was 2.37. Therefore, recombination events occurred 2.5 times less often than mutations; however, because each recombination event introduced an average of 6.0 substitutions, recombination overall was responsible for 2.4 times more substitutions than mutations (48). In line with previous analyses, we found evidence of recombination hot spots in regions of the genome containing mobile genetic elements (24). Recombination was particularly frequent in the region surrounding the f SA1 prophage, suggesting that this element, which borders the acquired region, may have played a key role in the recombination event that gave rise to ST239. Notably, the SCCmec-III element is also a hot spot for recombination.
Evolutionary consequences of large-scale chromosomal replacement. To begin to understand the evolutionary consequences of large-scale chromosomal replacement, we calculated the evolutionary rate of the backbone and acquired regions of the ST239  Evolution of S. aureus ST239 ® genome using BEAST (see Table S2A) (43). After removing recombination and the SCCmec element, the overall genomic substitution rate was estimated as 1.205 Â 10 26 single nucleotide polymorphisms (SNPs)/site/year (1.13 Â 10 26 to 1.28 Â 10 26 SNPs/site/ year; 95% HPD), which is similar to the substitution rates of other EMRSAs (2,(49)(50)(51). Interestingly, the substitution rate of the acquired region, 1.515 Â 10 26 (1.32 Â 10 26 to 1.71 Â 10 26 SNPs/site/year; 95% HPD), was 2 to 51% more rapid than the backbone, 1.21 (1.13 Â 10 26 to 1.29 Â 10 26 SNPs/site/year; 95% HPD). On the one hand, the rapid evolution of the acquired region could be a signature of rapid adaptive evolution, perhaps to overcome the costs associated with horizontal gene transfer. Alternatively, it is possible that the acquired region has evolved at a high rate due to weak selective constraints in this region of the genome.
To understand how selection has acted in the two genetic regions of the ST239 genome, we used the McDonald-Kreitman test to compare patterns of polymorphism and divergence in the backbone and acquired regions of the genome (52). The McDonald-Kreitman net neutrality index (N) indicates whether selection is overall purifying (N . 1) or positive (N , 1) (53).
In this analysis, we compared the acquired region of the 96 ST239 genomes with the homologous region in the previously defined collection of 57 ST30 genomes. In addition, we compared the backbone region of the 96 ST239 genomes with the homologous region in the previously defined collection of 111 ST8 genomes. Only "core" genes that were shared by more than 99% of all ST239, ST8, and ST30 genomes in the collections were included. The divergence between ST239 and ST30 in the acquired region (0.326 substitutions/Mb) was much greater than the divergence between ST239 and ST8 in the backbone region (0.188 substitutions/Mb), reflecting the fact that ST239 is more closely related to ST8 than ST30 in the respective regions. Divergence between STs at nonsynonymous sites was low relative to levels of within-ST polymorphism, demonstrating an overall trend toward purifying selection in ST239 (Table 1; N .1). However, we did not find any difference in the net neutrality index between the backbone (N = 2.06; 95% confidence interval [CI] = 0.92 to 2.51) and the acquired region (N = 1.52 [95% CI = 1.34 to 3.19]).
One weakness of this approach in this context is that the backbone and acquired regions of the ST239 genome have to be compared to different outgroups. Given that signatures of purifying selection should become stronger over time, it could be argued that the McDonald-Kreitman test is biased toward detecting purifying selection in the acquired region of the genome, which was compared to a more divergent outgroup. To further test the idea that selection on the acquired region has been weak, we calculated the ratio of GC!AT to AT!GC substitutions in each region of the ST239 genome ( Table 2). Spontaneous mutation in S. aureus is biased toward GC!AT transitions, and regions of the genome that are subject to weak selective constraints are therefore expected to have a high ratio of GC!AT to AT!GC substitutions (54). In this case, the  GC!AT/AT!GC ratio in the acquired region was significantly greater than that of the backbone region (Fisher exact test, odds ratio = 1.15, P = 0.0303). This analysis, which uses data on substitutions that have occurred during the diversification of ST239, provides evidence of relaxed selective constraints on the acquired region of the ST239 genome. Fitness costs of chromosomal replacement. To understand the fitness effects of chromosomal replacement more directly, we measured the competitive ability of a collection of S. aureus isolates in vitro (see Fig. S3). Our collection of isolates included divergent isolates of ST239 (n = 4), ST8 (n = 6), and ST30 (n = 5) that were deliberately chosen to avoid, including clonal isolates within STs (see Table S1D). If chromosomal replacement is costly, then we would expect ST239 isolates to have low competitive ability relative to ST8. The additional ST30 isolates provided a useful reference point to compare fitness values. Isolates were directly competed against each other in three different culture media; tryptic soy broth (TSB), brain-heart infusion (BHI) broth, and porcine serum (PS). These media impose different stresses that mimic some of the challenges encountered by S. aureus in clinical environments.
We deep sequenced (142Â to 211Â depth) each competition mixture before and after growth and estimated changes in the relative abundance of each isolate by quantifying the relative abundance of isolate-specific SNPs during competition. This assay produced highly reproducible measures of competitive ability for individual combinations of isolate and condition (median coefficient of variation = 16%; interquartile range = 10 to 40%; n = 35). The competitive ability varied between isolates, and we found a strong statistical interaction between ST and media, which reflects the fact that the average competitive ability of the three STs varied across media (Fig. 4, the first three panels, and Table 3; see also Table S5A). For example, we did not find any evidence of low fitness associated with ST239 in porcine serum. In spite of this variation in fitness, we found a significant difference in competitive ability between STs. Crucially, we found that ST239 had lower competitive ability than both ST8 and ST30 (Fig. 4, fourth panel; P , 0.05 [post hoc Tukey test]). The low fitness of ST239 relative to the ST8 is consistent with the idea that chromosomal replacement carries a long-term fitness cost. This hypothesis is further supported by the high fitness of ST30, which suggests that the Evolution of S. aureus ST239 ® difference in fitness between ST239 and ST8 reflects a low fitness of ST239 rather than a high fitness of ST8.
As a complementary approach to measure fitness, we also measured the growth rate of the individual ST239, ST8, and ST30 isolates in TSB, BHI and PS (see Fig. S4). There was significant variation in growth rate between isolates, media, and ST, showing that this trait is highly variable (see Table S5B). Crucially, we found that ST239 isolates had significantly lower growth rate than ST8 isolates, providing further evidence of costs associated with chromosomal replacement.
Parallel evolution. A recurring theme of studies of microbial evolution is that genes that are under strong positive selection evolve in parallel (55)(56)(57). To better understand the selective pressures that have shaped the evolution of ST239, we compared the distribution of observed mutations per gene with a neutral model derived from the Poisson distribution, in which mutations are randomly distributed across genes. This analysis was carried out independently for the acquired and backbone regions of the ST239 genome to take into account the differences in substitution rate between these genomic regions. Only the 1,980 "core" genes that were shared between the previously defined ST239, ST30, and ST8 genome collections were included.
The number of mutations per gene differed from the Poisson expectation in both the acquired region (x 2 = 19.158, P = 0.0014) and the backbone region (x 2 = 177.8, P , 0.0001), showing that substitutions are nonrandomly distributed across the ST239 genome. A subset of genes that show more evidence of parallel evolution than expected due to chance alone were defined as those that had 9 or more substitutions per gene. Our justification for this cutoff is that the Poisson distribution predicts that 1 or 2 genes in each region of the genome should have acquired 9 mutations or more due to chance alone, given an average of 2.53 substitutions per gene in the backbone region and 2.97 substitutions per gene in the acquired region. The proportion of genes showing evidence of parallel evolution did not differ between the backbone (20/1,659 = 1.21% genes) and acquired regions (6/316 = 1.90% genes), indicating that genes under positive selection are evenly distributed across the ST239 genome (chi-squared test, x 2 = 0.9818, P = 0.3218). However, of the genes showing evidence of parallel evolution, those in the acquired region had a much larger proportion of substitutions (N = 154 substitutions, 15.75%) than the backbone region (n = 284 substitutions; 6.61%). The acquired region includes the spa gene (n = 62 substitutions), which is potentially prone to sequencing errors due to the large number of repeats within this gene. Even after we excluded this gene, the remaining genes showing evidence of parallel evolution in the acquired region are significantly enriched for substitutions compared to those in the backbone (chisquared test, x 2 = 13.111, P = 0.000294). The high rate of substitutions in these genes suggests that the acquired region has been a hotspot for adaptive evolution in the ST239 genome. Note that this analysis is robust to the overall elevated substitution rate of the acquired region because it is based on a subset of genes that show a high rate of substitution compared to other genes in the region.
Interestingly, many (n = 7) of the genes that show evidence of parallel evolution (see Table S6) are involved in resistance to antibiotics, including vancomycin/daptomycin (walK), fluoroquinolones (grlA), b-lactams (ponA and mprF), and rifampicin (rpoB). To test for elevated resistance at phenotypic level, we measured the resistance of our isolates to a broad panel of antibiotics that have activity against S. aureus. ST239 isolates were resistant to a greater number of antibiotics ( Fig. 5; mean = 7.25; standard ; P = 0.0187), highlighting the high levels of AMR associated with this lineage. Given that antibiotic resistance tends to be costly, we tested for a tradeoff between resistance (measured as the number of resistance phenotypes) and the competitive ability at the level of individual isolates. We did not find tradeoffs between AMR and fitness in any culture medium (n = 3 conditions, P . 0.3), nor did Evolution of S. aureus ST239 ® we find a tradeoff between AMR and the overall average competitive ability of each isolate across all three conditions (F 1,13 = 1.33; P = 0.27; r 2 = 0.09). This analysis suggests that although ST239 has high resistance and low fitness, the low fitness of ST239 cannot simply be explained by the high number of resistant determinants found in this strain.
Conclusion. In line with previous work, our results support the hypothesis that ST239 was produced by a large-scale chromosomal replacement event in which an ST8 clone acquired .600 kb of DNA from an ST30 clone (13,22). We were able to refine this model by showing that the ancestor of the acquired region of the ST239 genome was most closely related to phage type 80/81 clones that were associated with the epidemic spread of penicillin resistance in the 1950s and 1960s (58). The most parsimonious explanation for the presence of the SCCmec-III in ST239 is that this element was acquired by the ST30 ancestor of ST239, following the divergence of this lineage from phage type 80/81. However, our analysis had limited power to detect secondary acquisition of SCCmec-III. We estimate that ST239 originated between the 1920s and 1945, in contrast to a previous study (25), but providing further support to other evidence that MRSA predated the clinical introduction of methicillin in 1959 (44). SCCmec-III provides resistance to first generation antibiotics that were used prior to the introduction of methicillin, such as tetracycline and erythromycin, and heavy metals, such as cadmium and mercury, that are used in disinfectants and biocides in health care settings (59,60), suggesting that these resistance phenotypes may have provided ST239 with a selective advantage prior to the introduction of methicillin.
Fitness costs of laboratory-evolved antibiotic resistance have been demonstrated in many studies (31,61,62). Although resistance tends to carry a cost, it is important to emphasize that this is an overall statistical trend, and it is clear that fitness effects of resistance are shaped by bacterial genetic background and culture conditions (63)(64)(65)(66). The cost of resistance in clinical pathogen populations have received less attention (66)(67)(68), making it difficult to predict how much variation in fitness to expect between resistant and sensitive strains. However, the variability in the fitness effect of resistance mutations across strains and conditions suggests that the fitness of clinical isolates should be variable. To take this variability into account, we measured fitness of multiple isolates of each ST across a range of culture conditions. We found extensive variation in competitive fitness between S. aureus isolates and culture conditions. In spite of this variation, we found a clear overall trend toward low fitness in ST239 relative to ST8, providing good evidence of a fitness cost associated with the evolution of elevated antibiotic resistance. This hypothesis is further supported by epidemiological evidence. ST8 is primarily found in the community, where antimicrobial use is low, whereas ST239 has mainly been restricted to health care settings where antimicrobial use is high, suggesting that the low fitness of this ST has restricted the spread of ST239 into the community (22,69,70). The acquisition of large SCCmec elements tends to generate a fitness cost, suggesting that the SCCmec-III element (which is the largest known SCCmec element) contributes to the low fitness of ST239 (see also reference 67). Although we found some evidence that the acquired region of the ST239 genome is subject to relaxed selective constraints, the evolution of this region is dominated by purifying selection, suggesting that the chromosomal replacement may have had costs beyond those associated with the acquisition of SCCmec-III.
Although the evolution of ST239 has been dominated by purifying selection, we found evidence of positive selection in genes that are implicated in antibiotic resistance, virulence and metabolism. Notably, many of the genes that show clear hallmarks of lineage-specific positive selection in the ST239 genome are associated with resistance to antibiotics that have been used to treat MRSA infections, such as ciprofloxacin (grlA), vancomycin (walK), and rifampicin (rpoB). These patterns of parallel evolution suggest that the ST239 has evolved to increase antibiotic resistance and virulence rather than to overcome the costs associated with chromosomal replacement and SCCmec-III acquisition.
Bacterial recombination is typically associated with the exchange of short DNA sequences between closely related strains or species (71), and the large-scale chromosomal replacements that have been detected in pathogenic bacteria are conspicuous exceptions to this overall trend (18)(19)(20). Our results support the idea that ST239 is a "hopeful monster" that has declined in prevalence due to fitness costs of chromosomal replacement, and an important goal for future work will be to understand the fitness consequences of chromosomal replacement in other dominant hybrid pathogens.

MATERIALS AND METHODS
Genome data retrieval and processing. Sequences with an MLST profile corresponding to ST239 were identified within the Staphopia database (26). The isolation date and geographic location were extracted from the metadata files. Additional ST239 sequences with corresponding isolation date and location metadata were identified from the NCBI database using MegaBLAST (72) and through a literature search. A total of 96 ST239 sequences were selected and downloaded from EMBL-EBI (see Table S1A). Similarly, a total of 57 ST30 and 111 ST8 genomes were identified and downloaded from EMBL-EBI (see Table S1B and C).
The boundaries of the SCCmec-III element were identified in the ST239 reference genome using BLAST against the reference SCCmec-III element under accession number AB03767159. The boundaries were confirmed using SCCmecFinder (76).
The SCCmec type of all 2,979 ST239 sequences in the Staphopia database was predicted using the Staphopia API. The Staphopia database was also mined for all sequences predicted to contain SCCmec-III. The MLST of these sequences was identified using PubMLST (77). The MLST was confirmed using ARIBA v2.13.3 (78), and the SCCmec type was confirmed using SCCmecFinder.
Construction of ST239 phylogeny. RaxML v8.2.9 (79) was used to construct a maximum likelihood phylogeny from the collection of 96 ST239 genomes that had been mapped to the ST239 reference sequence, using a GTR model with gamma correction for among site rate variation, which was replicated for 100 bootstraps, with recombination masked using Gubbins (41). The ST8 reference sequence, mapped to the ST239 reference sequence, was used as an outgroup. Genes were annotated with Prokka v1.13 (80). Gene function for genes that could be annotated by Prokka was identified in UniProt, and MegaBLAST (71) was used to identify genes where no annotation was found with Prokka. The pangenome and core genome (genes shared by .99% of the ST239 isolates) was extracted using Roary v3.12.0 (81).
Estimation of time to the MRCA of the ST239 collection. The time to the MRCA for the collection of 96 ST239 genomes was initially estimated from the maximum likelihood phylogeny (with recombination masked using Gubbins [41]), using the BactDating R package linear regression function (82). Mixedgamma, strict-gamma, and relaxed-gamma evolutionary models were run for 10,000,000 MCMC steps to estimate a time to the MRCA of the ST239 whole-genome sequences. The effective sample size (ESS) values for all parameters were greater than 100, indicating adequate sampling of the posterior distribution.
A total of 6,819 variant sites were extracted from the ST239 sequence alignments, after recombination was masked using Gubbins (39), using snp-sites v2.3.3 (83). Bayesian phylogenetic analysis was also carried out using BEAST version 1.10.4 (43), using the GTR nucleotide substitution model with all combinations of the strict and uncorrelated relaxed molecular clock models and constant and exponential growth models. The XML file was edited to reflect the number of unchanging sites in the original alignments. For each model, three independent MCMC chains with 300,000,000 steps were run and combined, with path sampling/stepping-stone sampling every 100 steps. In all cases, the Bayes Factor showed no significant difference in the likelihood of the different models, and therefore the estimated time to the MRCA from the simplest model (strict molecular clock, constant population size) was recorded. The burn-in was set at 10%, and runs were combined using LogCombiner, with a resample size of 10,000. The MRCA and evolutionary rates were estimated with 95% HPD intervals.
BEAST analysis was repeated for 95 ST239 SCCmec-III element sequences (n = 54 variant sites), using the same clock and nucleotide substitution models as previous (one ST239 sequence was removed from the analysis due to low mapping quality of the SCCmec-III region).
BLAST comparison of the ST239, ST30, and ST8 core genes. A multiple sequence alignment was constructed from the 96 ST239, 57 ST30, and 111 ST8 genomes that had been mapped to the ST239 reference genome. The shared pan-genome was calculated (n = 2,962 genes), and the core genes that were shared between .99% of all 264 genomes were extracted using Roary v3.12.0 (81) (n = 1,980 core genes). EMBOSS (84) was used to generate ST-specific consensus sequences from the core genes of each ST (ST239, ST8, and ST30). For each gene in the ST239 consensus sequence, the percentage identities compared to the homologous gene in the ST30 and ST8 consensus sequences were calculated using MegaBLAST (71). The ST239 core genes were defined as "acquired" (i.e., from the ST30-like region) or Evolution of S. aureus ST239 ® November/December 2021 Volume 12 Issue 6 e02168-21 mbio.asm.org 13 "backbone" (i.e., from the ST8-like region) depending on their position in the genome and similarity to the ST30 and ST8 consensus sequences.
Identifying the closest known ancestor of the acquired and backbone regions. The consensus sequences of the 316 core genes from the acquired region of the 96 ST239 sequences were queried for similar sequences in BIGSI using a k-mer threshold of 99. This allowed for an average divergence of around 10 SNPs per gene. The most closely related sequences were identified by ST using the Staphopia API. The MLST was double checked using ARIBA, and 28 sequences were removed due to uncertainty in typing, which indicated contamination.
The consensus sequences of the 1,659 core genes from the backbone region of the 96 ST239 sequences were also queried for similar sequences in BIGSI using a k-mer threshold of 99, as well as Staphopia, as described above. This allowed for an average divergence of around 9 SNPs per gene.
Sequences that shared at least 99% k-mer identity with .200 of the ST239 core genes from the acquired region and shared two or fewer MLST alleles with ST239 were downloaded from the EBI database. Only a selection of the 12 most closely related ST30, ST36, and ST39 sequences were included. These 143 sequences were mapped to the ST239 TW20 reference sequence, as described above. Four sequences were removed due to poor mapping quality (.25% gaps). The acquired region was extracted (minus the SCCmec element) and combined into a multifasta alignment with the acquired regions from the 96 ST239 sequences and 57 ST30 sequences that were described previously.
Sequences that shared at least 99% k-mer identity with .1,400 of the ST239 core genes from the backbone region and were not previously identified as ST239-like were then downloaded from the EBI database. All 32 non-ST239-like sequences were identified using the Staphopia API as ST8. These 32 ST8 sequences were mapped to the ST239 TW20 reference sequence, as described above. The backbone region was extracted and combined into a multifasta alignment with the backbone regions from the 96 ST239 sequences and 111 ST8 sequences that were described previously.
All variant sites were extracted using snp-sites v2.3.3, and RaxML was used to estimate a maximumlikelihood phylogeny of all 292 acquired-region sequences, using a GTR model with gamma correction for among site rate variation and replicated for 100 bootstraps, after recombination was masked using Gubbins as previously described. This was outgroup rooted to the corresponding region of the ST8 reference sequence. A maximum-likelihood phylogeny was also estimated for the backbone region sequences, which was outgroup rooted to the corresponding region of the ST30 reference sequence.
The acquired region of the 96 ST239 sequences and the closest related non-ST239 clade, consisting of six ST30 isolates, were combined into a multiple sequence alignment. Variant sites were extracted using snp-sites v2.3.3. To calculate the time to the MRCA, Bayesian phylogenetic analysis was carried out using BEAST version 1.10.4, as previous (for the GTR nucleotide substitution model with all combinations of the strict and uncorrelated relaxed molecular clock models and constant-and exponential-growth models). This was repeated for the backbone region of the 96 ST239 sequences and the closest related non-ST239 clade, consisting of 18 ST8 isolates (one ST8 sequence was removed from the analysis, since it had no associated date of isolation). In both cases, the Bayes Factor showed no significant difference in the likelihood of the different models, and therefore the estimated time to the MRCA from the simplest model (strict molecular clock, constant population size) was recorded.
Visualization of recombination in ST239. ClonalFrameML v1.0-20 was used to estimate regions of recombination from the ST239 phylogeny, before recombination was masked. Ten isolates from the TW20-like clade were excluded from this analysis due to high sequence similarity to the ST239 reference sequence.
Estimating evolutionary rates of the acquired and backbone regions. The evolutionary rates of the acquired (minus the SCCmec element) and backbone regions of ST239 were estimated using BEAST analysis on 5,353 variant sites from the 96 ST239 backbone region sequences, and 1,449 variant sites from the ST239 acquired region sequences. The GTR nucleotide substitution model was used with all combinations of the strict and uncorrelated relaxed molecular clock models and constant and exponential growth models, as previously described.
McDonald-Kreitman comparison of ST239, ST30, and ST8 core genes. All 1,980 core genes from the 96 ST239 genomes, 111 ST8 genomes, and 57 ST30 genomes were converted into amino acid sequence alignments. Each of the ST239, ST30, and ST8 consensus core gene sequences that were generated previously using EMBOSS (84) was also converted into consensus amino acid sequences.
The number of fixed synonymous and fixed nonsynonymous SNPs was calculated for core genes within the whole ST239 genome, core genes within the backbone region, and core genes within the acquired region using snp-sites v2.3.3. This analysis was repeated for the 111 ST8 genomes collection and the 57 ST30 genomes collection. The McDonald-Kreitman neutrality index (N) was calculated as N = (P n /P s )/(D n /D s ), where N is the net neutrality index, P n is the number of nonsynonymous polymorphisms, P s is the number of synonymous polymorphisms, D n is the number of nonsynonymous substitutions, and D s is the number of synonymous substitutions.
Competition experiments. Cryostocks of the 15 isolates were streaked on tryptic soy agar (TSA) and incubated at 37°C for 24 h. Single colonies were incubated for 24 h in 3 mL of TSB at 37°C with 225rpm shaking. One ml of each culture was combined into a single mixture and mixed thoroughly. Genomic DNA was extracted and purified from 1 mL of the mixture, and sequencing was carried out, as previously described.
A portion (6 mL) of the mixture was pelleted, washed three times in phosphate-buffered saline (PBS), and separated into six 1-mL aliquots. These were then diluted 50Â in either TSB, BHI, or PS. A total of 3 mL of each mixed culture was incubated at 37°C with 225-rpm shaking for 24 h. After 24 h, genomic DNA was extracted and purified, and sequencing was carried out, as previously described. All competition experiments were repeated in triplicate.
The DNA sequences from before and after each competition were mapped to the consensus sequence, as previously described. The number of reads supporting each unique variant site allele (for both the reference allele and the variant allele) was determined from the mapping of the raw sequence reads from each competition experiment. Any sites that were supported with a total of three reads or less, for both the variant allele and the consensus allele, were removed from the analysis to reduce the number of incorrect alleles due to sequencing error.
For each competition experiment, the number of supporting reads for each unique variant site was recorded. From this, the average coverage of all variants that were unique to each isolate was determined, before and after being exposed to the competition conditions for 24 h, to determine how the proportion of each isolate changed during each competition. The limit of detection for each isolate was also calculated, as M Z = 4/[(N Z 1 N Z 9)/V], where M is the minimum detection limit for isolate Z, N Z is the total number of reads in support of isolate Z at sites unique to isolate Z, N Z 9 is the total number of reads in support of non-Z isolates at sites unique to isolate Z, and V is the number of variant sites that are unique to isolate Z. Any isolate with an average coverage that was lower than the limit of detection was called at the limit of detection for that isolate. Raw competitive ability was calculated as the difference in log relative abundance of each isolate before and after competition in each replicate. Raw competition values were then rescaled, such that the mean competitive ability in each culture medium was equal to zero. This small correction factor was used to account for the fact that the true final density of some isolates was below the minimal detection threshold.
Sequencing DNA from isolates for competition experiments. Cryostocks of the 15 isolates were streaked on TSA, followed by incubation at 37°C for 24 h. Single colonies were incubated for 24 h in 3 ml of TSB at 37°C with 225-rpm shaking. Genomic DNA was extracted and purified using the Qiagen DNeasy blood and tissue kit and, following the protocol for purification of bacterial or yeast DNA with enzymatic lysis, using QIAcube. Sequencing was carried out using an Illumina HiSeq4000 system with 150-bp paired-end reads by the Oxford Genomic Centre, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
The sequences were de novo assembled into contigs using SPAdes v3.13.0 (85) and ordered against the ST239, ST30, or ST8 reference sequences using abacas v1.3.1 (86). Sequences were annotated using PROKKA or UniProt, as previously described. Using Roary, we defined core genes as genes that were shared among all 15 isolates. Unique variant alleles were defined for each isolate, which were identified in only a single isolate, using snp-sites v2.3.3 (79).
To verify the validity of each unique variant site, each DNA sequence was remapped to the consensus sequence using bwa and SAMtools (71). The number of reads supporting each unique variant site was extracted, and only unique variant sites that could be used to accurately identify a single isolate and that were supported with four or more reads were included as true unique variant sites.
AMR profiles. Cryostocks of the 15 isolates were streaked on TSA, followed by incubation at 37°C for 24 h. Single colonies were incubated for 24 h in 3 ml of MH2 at 37°C with 225-rpm shaking. Cultures were diluted 100Â in MH2 broth (to a density of ;1 Â 10 6 CFU/ml) according to the MicroNaut evaluation protocol for MicroNaut-S MRSA/GP. Then, 100 ml was added to each well of a MicroNaut-S MRSA/GP plate, followed by incubation for 18 h at 37°C with 225-rpm shaking, after which the OD 595 was measured. AMR breakpoints were assessed according to EUCAST standards (87). This was repeated in triplicate for each isolate. Tests for ampicillin and penicillin always gave the same results; hence, we considered the result from these two antibiotics as a single test score.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.