Impact of stochastic physics and model resolution on the simulation of tropical cyclones in climate GCMs

The role of model resolution in simulating geophysical vortices with the characteristics of realistic tropical cyclones (TCs) is well established. The push for increasing resolution continues, with general circulation models (GCMs) starting to use sub-10-km grid spacing. In the same context it has been suggested that the use of stochastic physics (SP) may act as a surrogate for high resolution, providing some of the benefits at a fraction of the cost. Either technique can reduce model uncertainty, and enhance reliability, by providing a more dynamic environment for initial synoptic disturbances to be spawned and to grow into TCs. We present results from a systematic comparison of the role of model resolution and SP in the simulation of TCs, using EC-Earth simulations from project Climate-SPHINX, in large ensemble mode, spanning five different resolutions. All tropical cyclonic systems, including TCs, were tracked explicitly. As in previous studies, the number of simulated TCs increases with the use of higher resolution, but SP further enhances TC frequencies by ~30%, in a strikingly similar way. The use of SP is beneficial for removing systematic climate biases, albeit not consistently so for interannual variability; conversely, the use of SP improves the simulation of the seasonal cycle of TC frequency. An investigation of the mechanisms behind this response indicates that SP generates both higher TC (and TC seed) genesis rates, and more suitable environmental conditions, enabling a more efficient transition of TC seeds into TCs. These results were confirmed by the use of equivalent simulations with the HadGEM3-GC31 GCM.


Introduction
The simulation of tropical cyclones (TCs) in contemporary climate models remains a challenge, with a systematic underestimation of both the number of TCs and their intensity (Shaevitz et al. 2014; Walsh et al. 2015; Roberts et al. 2020). While the importance of atmospheric model resolution in reducing these biases has long been recognized (Camargo and Wing 2016; Emanuel 2018) and increasing the horizontal resolution has been demonstrated to improve the representation of TCs in climate models (Jung et al. 2012; Murakami et al. 2012; Strachan et al. 2013; Murakami et al. 2014; Knutson et al. 2013; Manganello et al. 2014; Roberts et al. 2015; Bhatia et al. 2018; Bacmeister et al. 2018; Roberts et al. 2020), alternative stochastic approaches have been put forward in recent years [see Palmer (2019) for a motivational discussion]. Stochastic approaches account for unresolved processes and variability in models, and have the potential to complement some of the benefits of resolution. Moreover, these ideas explicitly acknowledge the multiscale nature of the atmosphere and the role of scale interactions for weather and climate variability. In this study we analyze the statistics of TCs in a systematic set of ensemble climate simulations, carried out at varying horizontal resolutions, with and without the inclusion of stochastic schemes, in order to quantify and compare their impact.

a. Stochastic physics and uncertainty in climate prediction
Traditional parameterization schemes rely on the assumption that the physics of the unresolved processes is uncoupled from the dynamics of the flow. However, in the presence of upscale energy cascades (e.g., through convection) the assumption clearly does not hold and nonlinear interactions between different scales [i.e., from meso-α to meso-γ (2-20 km)], comprising mesoscale convective systems to TCs, as well as organized convection like the Madden-Julian oscillation, cannot be adequately modeled. Stochastic physics (SP) parameterization schemes aim to account for the missing nonlinear interactions between unresolved processes, as well as their impacts on the large scale, and thus to account for some of the missing scale interactions in models (Palmer 2001; Williams 2005; Khouider et al. 2010; Slingo and Palmer 2011).
The development of SP schemes for weather and climate models was pioneered by the numerical weather prediction community and motivated by the need to explicitly represent uncertainty due to model errors on subgrid scales (Buizza et al. 1999; Shutts and Palmer 2007; Plant and Craig 2008; Teixeira and Reynolds 2008; Berner et al. 2009; Bengtsson et al. 2013; Sanchez et al. 2014). However, they are now commonly used in many operational forecasting centers; see Leutbecher et al. (2017) for a recent overview. In the context of atmospheric seasonal forecasts, Berner et al. (2012) argue that, for tropical precipitation biases and tropical variability, increasing the horizontal resolution has a rather small impact compared to the use of stochastic schemes or improved physical parameterizations.
SP schemes have also become increasingly relevant for climate simulations, mainly due to their potential role in modifying the mean state, via noise-induced drift processes, and thus in reducing intransigent biases [see Berner et al. (2017) for a review and Palmer (2019) for a more recent comprehensive perspective]. A common theme in all these studies is the crucial role played by nonlinear interactions of the stochastic scheme with convective processes, which allow even a zero-mean perturbation to profoundly impact the model climate attractor. It is thus expected that SP also influences tropical storms in general and TCs in particular [see Stockdale et al. (2018) for an assessment of the SP impact on ECMWF's seasonal forecasts].
It should be stressed that the computational costs of adding SP to state-of-the-art climate models are very small indeed: while the doubling of resolution typically causes costs to rise by a factor of 8-10, stochastic perturbations such as those used in this study increase the costs by only ~5%.
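The cost figures quoted above can be put side by side with a few lines of arithmetic. The sketch below is only illustrative: the function name `relative_cost` is hypothetical, and the assumption that the factor of 8-10 compounds with each successive doubling is ours, not a statement from the text.

```python
def relative_cost(doublings, factor_per_doubling=8.0, sp_overhead=0.0):
    """Cost relative to a deterministic run at the reference resolution.

    Assumes (for illustration only) that each doubling of resolution
    multiplies the cost by the same factor, and that SP adds a fixed
    fractional overhead on top.
    """
    return (factor_per_doubling ** doublings) * (1.0 + sp_overhead)


# Reference resolution with SP switched on (~5% overhead quoted in the text).
cost_with_sp = relative_cost(0, sp_overhead=0.05)

# One doubling of resolution without SP, for both ends of the quoted range.
cost_doubled_low = relative_cost(1, factor_per_doubling=8.0)
cost_doubled_high = relative_cost(1, factor_per_doubling=10.0)

print(f"SP at same resolution: x{cost_with_sp:.2f}")
print(f"Doubled resolution:    x{cost_doubled_low:.0f} to x{cost_doubled_high:.0f}")
```

Under these assumptions the extra cost of SP is roughly two orders of magnitude smaller than the extra cost of one doubling of resolution.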

b. TCs and prediction uncertainty
The simulation of TCs challenges all our current capabilities: their multiscale nature has justified the use of any and all affordable resolutions (e.g., Jung et al. 2012; Roberts et al. 2015, 2020). Their low annual frequency and large variability, from days to decades, require the use of ensembles and long simulations, including for predictions under climate change (Yoshida et al. 2017; Mei et al. 2019). Their sensitivity to the large-scale environment requires minimal model biases, which depend on the quality of model dynamics and physical parameterizations (LaRow 2013; Murakami et al. 2014; He and Posselt 2015; Kepert 2012; Camargo et al. 2020).
There are many challenges and opportunities in the simulation of large-scale TC drivers: the governing influence of El Niño-Southern Oscillation (ENSO) on TC location and frequency has been known for a long time (e.g., Gray 1984; Chan 1985; Lander 1994; Camargo et al. 2007b; Bell et al. 2014), but it is also increasingly evident that the presence of strong TC activity in the northwest Pacific can affect the Niño-3.4 index three months later (Wang et al. 2019). Thus the simulation of TCs could benefit from the improved simulation of ENSO using SP (e.g., Christensen et al. 2017); conversely, a better simulation of TC location, intensity, and frequency, particularly interannual variability, could benefit the prediction of ENSO in global seasonal and climate simulations [e.g., as implied by Wang et al. (2019)].
Prediction of TCs on decadal and climate time scales is also important for longer-term mitigation planning, especially for coastal communities, which are particularly vulnerable to changes in TC track and intensity in a changing climate [see quantification of TC-related precipitation in Guo et al. (2017) and Franco-Díaz et al. (2019)].
There is consequently considerable uncertainty between climate models as to the climatology, variability, and changes in TCs with climate change (Camargo and Wing 2016; Knutson et al. 2020). Vecchi et al. (2019) have shown, in the context of the prediction of future changes in TCs, that pre-TC synoptic-scale disturbances (which they call "TC seeds") are the main drivers of the simulated response (see also Sugi et al. 2020; Yamada et al. 2021), where they also discuss changes in the large-scale environment governing their origins and development. Such disturbances are substantially weaker than the fully formed TCs, so that their representation in GCMs is even more uncertain [Slingo et al. 1994; see also the conclusions of Hodges et al. (2017) with regard to the skill in current reanalyses]. Other recent papers, such as the downscaling study of Emanuel (2021), use prescribed seeding, implicitly suggesting that this process is of secondary importance in the study of climate change and that what matters are changes in the TC environment.

c. TCs and GCM resolution
For the past 20 years the climate modeling community has been investigating the role of model resolution in the simulation of TCs; for example, Bengtsson et al. (1995), building on the work of Broccoli and Manabe (1990) and inspired by questions in Evans (1992), successfully simulated the climatology of TCs (then called tropical cyclone-like vortices) with ECHAM3 at Tq106 truncation (≈125-km Δx on a quadratic Gaussian grid). A comprehensive history of these studies has been traced in a review paper (Camargo and Wing 2016) that spans NWP and seasonal applications; Walsh et al. (2013) and Emanuel (2018) also summarize the impact of resolution on TCs in the context of climate simulations.
More recently, the modeling community has organized a number of intercomparison projects focusing on TC simulation [see, e.g., results from the U.S. CLIVAR Hurricane Working Group in Shaevitz et al. (2014)]. TCs are also one of the target phenomena in the PRIMAVERA-HighResMIP project (Haarsma et al. 2016), which aims to systematically understand the role of model resolution in the context of climate simulations.
Based on previous studies, the current consensus is that realistic TCs "emerge" in climate models at sub-100-km resolution, and their track densities, as well as interannual variability by basin, start to look credible at about 20-km resolution (less so for the Atlantic basin), as shown for instance in Shaevitz et al. (2014) and Roberts et al. (2020). GCMs have also demonstrated increased skill in simulating interannual variability of TCs as resolution is increased (Zhao et al. 2009; Strachan et al. 2013; Roberts et al. 2015), but large intermodel and intra-ensemble uncertainties remain, so that ensemble sizes of at least O(10) are required (Yoshida et al. 2017; Mei et al. 2019; Roberts et al. 2020). Credible simulations of TC intensity remain elusive (for surface winds, but some GCMs simulate TCs that are too deep; Manganello et al. 2012). Roberts et al. (2020) demonstrate that only one of the PRIMAVERA GCMs (CNRM) presents a credible pressure-wind relationship, and even a few of the DYAMOND (i.e., Dynamics of the Atmospheric General Circulation Modeled on Non-hydrostatic Domains) GCMs (Stevens et al. 2019), with an average resolution of 5 km, struggle with this aspect [see further details in Judt et al. (2021)].
SP thus seems to be particularly well suited for the simulation of TCs because it targets unresolved variability, delivering some of the benefits of resolution, while enabling the use of large ensembles by being parsimonious. The focus of the present study is to investigate the relative impact of enhancing resolution and/or using SP on the representation of TCs in climate simulations.
The main questions of this study are the following: Can SP be an efficient surrogate for model resolution, and, specifically, can it be beneficial for the simulation of tropical cyclones? Two hypotheses will therefore be considered with regard to the role of SP: 1) SP injects flow-dependent noise into the simulations, increasing the initial number of (cyclonic) tropical disturbances that can possibly grow into TCs [i.e., TC seeds, as in Vecchi et al. (2019)]. 2) SP creates a more favorable large-scale environment for genesis and development of TCs, revealed as increased likelihood of TC seeds transitioning into TCs.
It is also possible that both hypotheses are valid in tandem, and we use explicit tracking, as well as an empirical approach (index-based), to investigate both. The specific aspects under investigation are the climatology of TC numbers and geographical distribution, as well as their interannual variability; TC intensity, while briefly analyzed in this study, will be the focus of a follow-on paper, including the analysis of Climate-SPHINX (i.e., Stochastic Physics High Resolution Experiments) coupled (ocean-atmosphere) and climate change simulations.
The paper continues with a description of the models and analysis methods in section 2, and the results are presented in sections 3-5, spanning means and variability of TCs, including their drivers. A discussion is given in section 6 and a summary and final conclusions in section 7.

a. EC-Earth configuration, and the Climate-SPHINX project campaign
In the present work, EC-Earth version 3.1 is used, exploiting a set of atmosphere-only simulations carried out within the Climate-SPHINX project [see Davini et al. (2017), which also provides details of the scientific configuration].
All the simulations span the period from 1979 to 2008 (30 years), during which well-mixed GHGs, stratospheric ozone, and volcanic aerosol concentrations follow the CMIP5 (phase 5 of the Coupled Model Intercomparison Project) protocol (historical forcing ending in 2005, then RCP8.5; Moss et al. 2010). Sea surface temperatures (SSTs) and sea ice concentration used as boundary conditions have been obtained from the HadISST2.1.1 dataset (Titchner and Rayner 2014), modified to provide daily increments suitable for high resolution (Kennedy et al. 2017, the same used in HighResMIP).
The hydrostatic EC-Earth atmospheric component is based on the Integrated Forecasting System (IFS) and has been tuned and improved for climate purposes by the EC-Earth Consortium (Hazeleger et al. 2010). It is important to stress that, for a clear interpretation of the sensitivity studies in this paper, model tuning has been performed only for the T255 deterministic (hereafter referred to as BASE) configuration, and all other simulations are performed without retuning, in order to enable a clear understanding of the role of resolution and SP. Therefore, energy budgets at resolutions different from T255 present small biases [see Davini et al. (2017) for details].
The Climate-SPHINX simulations comprise several ensemble members over a range of five resolutions from TL159 (≈125 km) to TL1279 (≈16 km; see also Table 1) but retain the same vertical grid configuration (L91), as hybrid sigma levels with the last full level at 0.01 hPa. For each resolution (which is defined by the spectral truncation), half of the ensemble members have the stochastic physics parameterizations activated (STOC) and half are run with only the BASE configuration. The prefixes for each experiment are shown in the last column of Table 1, and the suffix B indicates BASE experiments, while S stands for STOC, so that "CAB" is the BASE experiment at COARSE resolution and "CAS" is the corresponding STOC experiment. The number of ensemble members starts at 10 for the lowest resolutions, for each of BASE and STOC, and decreases to 1 for the highest resolution (see Table 1), due to computational costs. The stochastic parameterizations used within EC-Earth comprise two different schemes: the stochastically perturbed parameterization tendencies (SPPT) and the stochastic kinetic energy backscatter (SKEB) scheme (Berner et al. 2009; Palmer et al. 2009; Davini et al. 2017). Both schemes are always used together in all STOC experiments in SPHINX; the SPPT configuration is exactly the same across all model resolutions, while SKEB uses different backscatter ratios at each resolution.
The SPPT scheme focuses on the uncertainty arising from the existing subgrid parameterization schemes (including radiation, clouds, convection, turbulence and boundary layer processes, and gravity wave drag) using a multiplicative noise approach. The SPPT scheme perturbs the total diabatic tendencies for T, U, V, and q at each time step, using the same perturbation field ε, which is the sum of three independent random fields with horizontal correlation scales of 500, 1000, and 2000 km. These fields are evolved in time using an autoregressive process with lag 1 [AR(1)] on time scales of 6 h, 3 days, and 30 days, with field standard deviations of 0.52, 0.18, and 0.06, respectively (Leutbecher et al. 2017).
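The temporal part of this multiplicative-noise approach can be sketched at a single grid point. The following is a minimal illustration, not the IFS implementation: it ignores the horizontal correlation patterns entirely, the 1-h time step and all function names are assumptions made for the example, and the AR(1) update is written in the standard variance-preserving form.

```python
import math
import random

# Decorrelation time scales (hours) and standard deviations quoted in the text.
TAUS_H = (6.0, 72.0, 720.0)      # 6 h, 3 days, 30 days
SIGMAS = (0.52, 0.18, 0.06)


def step_ar1(r, tau_h, sigma, dt_h, rng):
    """Advance one AR(1) component by one time step, keeping its variance at sigma**2."""
    phi = math.exp(-dt_h / tau_h)  # lag-1 autocorrelation for this time step
    return phi * r + sigma * math.sqrt(1.0 - phi * phi) * rng.gauss(0.0, 1.0)


def sppt_series(n_steps, dt_h=1.0, seed=0):
    """Time series of the perturbation at one point: sum of three AR(1) components.

    dt_h = 1 h is an assumed model time step, chosen only for illustration.
    """
    rng = random.Random(seed)
    comps = [rng.gauss(0.0, s) for s in SIGMAS]  # start from the stationary distribution
    series = []
    for _ in range(n_steps):
        comps = [step_ar1(r, tau, s, dt_h, rng)
                 for r, tau, s in zip(comps, TAUS_H, SIGMAS)]
        series.append(sum(comps))
    return series


def perturb(tendency, eps):
    """Multiplicative noise: the same eps scales the tendency of T, U, V, and q."""
    return (1.0 + eps) * tendency


eps_series = sppt_series(20000)
perturbed = [perturb(2.0, e) for e in eps_series[:5]]  # 2.0 is an arbitrary tendency
```

Because the three components are independent, the variance of their sum is 0.52² + 0.18² + 0.06² ≈ 0.31, so the fast 6-h component dominates the amplitude while the slower components contribute low-frequency modulation.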
The SKEB scheme (Berner et al. 2009) was developed for the ECMWF IFS model, which is a spectral model, and computes the backscatter of kinetic energy based on the dissipation rates from deep convection, numerical dissipation, and gravity mountain wave drag. This upscale transfer of energy is observed in the real atmosphere, albeit absent in traditional (deterministic) climate simulations. The kinetic energy lost in the model at the smallest scales, due to dissipation, is scattered upscale through perturbation of the streamfunction at the largest scales. The SKEB scheme streamfunction perturbations are modulated using the same stochastic spectral pattern as for SPPT, except that the perturbations vary in height as well as in space.
The relative contributions and total dissipation rates are shown in Berner et al. (2012) in their Fig. 1. The dominant contributor is deep convection in the tropics. Numerical dissipation is the second largest contributor. Berner et al. (2012) pointed out a weak resolution dependence of these calculations and suggested, based on the apparent underdispersion of the forecast ensembles, that there are additional sources of model uncertainty that are not captured by the scheme.
The analysis of initialized seasonal forecasts with the IFS model in Weisheimer et al. (2014) revealed that the SKEB scheme had almost no impact, while a positive impact was found for the SPPT scheme, in particular in regions where deep convection plays a major role, such as the western tropical Pacific. In the same study, SPPT also shows a positive impact on the Madden-Julian oscillation (MJO) amplitude distribution and ENSO forecast quality.

b. The Unified Model
The configuration of the global coupled model HadGEM3-GC3.1, submitted to the CMIP6 HighResMIP (Haarsma et al. 2016), is described in Roberts et al. (2019): it incorporates a global atmosphere-land configuration called GA/GL7.1 (Walters et al. 2019), forced by the same SSTs and sea ice used for the SPHINX experiments. The HighResMIP protocol was followed, which recommends the use of the MACv2-SP scheme (Stevens et al. 2017) for simplified and standardized aerosol forcing. This specifies the change of anthropogenic aerosol optical properties over time, enabling easier comparison between different models, while retaining the model's own aerosol mean background climatology and therefore requiring little or no additional tuning. It is used here in place of the prognostic GLOMAP-mode scheme (Mulcahy et al. 2018).
The nonhydrostatic atmospheric model uses a regular latitude-longitude grid, and has 85 levels extending to 85 km. The HadGEM3 simulation analyzed in this study has a Δx ≈ 60 km at midlatitudes (N216). Compared to the spectral IFS, this formulation brings with it substantial differences in numerics, the effective resolution (see Klaver et al. 2020), and in particular the dissipation rates (convective and numerical dissipation only).
We use simulations with and without SP: the three ensemble members taken as control correspond to the official GA/GL7.1 scientific configuration, using SP, while three sensitivity experiments disable SP entirely or else use an individual SP scheme at a time. The GA/GL7.1 scientific configuration encodes SP schemes similar to EC-Earth, albeit following entirely independent implementations, including their parameter settings: stochastic perturbation of tendencies (SPT) and SKEB-type [SKEB2; see Sanchez et al. (2016) for details]. Tennant et al. (2011) describe the SKEB2 system as developed for the Unified Model (UM; the parent of HadGEM3). Differences in the dissipation rates of the UM, compared to the IFS, are ascribed to differences in the convection schemes between the models. Another noteworthy difference between the schemes is that the SKEB2 scheme not only provides a forcing for the streamfunction (as in the IFS), but also acts as a forcing to the velocity potential [see Fig. 1 in Sanchez et al. (2016) for details], which represents an important difference from the SKEB scheme used in Climate-SPHINX.
The same Climate-SPHINX period was extracted from each HadGEM3 simulation for the purpose of TC tracking in this paper.

c. Tracking and TC identification
TCs are tracked and identified on an annual basis (January-December in the NH and July-June in the SH), using 6-hourly data from each individual simulation using the same methodology as in Hodges et al. (2017), there applied to reanalyses, and previously used in several other model-based studies of TCs (e.g., Manganello et al. 2012; Strachan et al. 2013; Roberts et al. 2015). Initially all cyclonic systems are tracked using the vertical average of relative vorticity in the layer 850-600 hPa, at a common resolution of T63, between 60°S and 60°N. This initial step, bringing all model fields to a common low-resolution truncation (here T63), has been used for the past 20 years in order to enable a fair comparison between models with different formulations and resolutions, and has been shown to reduce noise and to produce more coherent tracks and more complete TC life cycles, including their very early stages. Cyclonic disturbances (TC precursors) are initially identified as grid point maxima (NH) or minima (SH), exceeding a magnitude of 5 × 10⁻⁶ s⁻¹ at each time step (scaled by −1 in the SH), and the off-grid locations are then obtained using B-spline interpolation and steepest ascent maximization, resulting in smoother tracks. Tracks are initialized using a nearest neighbor method and are then refined by minimizing a cost function for track smoothness, subject to adaptive constraints for track smoothness and displacement distance within a time step. Following the tracking, tracks are filtered to retain those that last longer than 2 days. To allow warm core criteria to be applied, the T63 vorticity maxima (minima in the SH) are recursively added to the 2-day tracks at levels 850, 700, 600, 500, 400, 300, and 200 hPa, using the B-spline interpolation and maximization to assign a value if the maximum is within a 5° radius of the center at the previous level. Also added to the tracks are the full-resolution MSLP minima and 10-m maximum wind speeds. For MSLP, B-splines and steepest descent are used, with the 850-hPa vorticity center as the starting point for the minimization, and minima are assigned if they are within a 5° radius of the tracked vorticity center. For the 10-m winds a direct search for the maximum wind of the grid points within a 6° radius of the tracked center is performed.
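Two of the simpler steps in this procedure, the hemisphere-dependent vorticity threshold and the lifetime filter, can be sketched as follows. This is not TRACK itself: the function names are hypothetical, a track is represented simply as a list of 6-hourly points, and "2 days" is interpreted here as a lifetime of at least 48 h, which is an assumption.

```python
VORT_THRESHOLD = 5e-6   # s^-1, initial detection threshold quoted in the text
STEP_HOURS = 6          # the tracking uses 6-hourly data


def scaled_vorticity(vort, lat):
    """Scale vorticity by -1 in the SH so that cyclonic is positive everywhere."""
    return vort if lat >= 0.0 else -vort


def is_candidate(vort, lat):
    """True if a vorticity extremum exceeds the initial detection threshold."""
    return scaled_vorticity(vort, lat) > VORT_THRESHOLD


def lifetime_hours(track):
    """Lifetime of a track given as a list of consecutive 6-hourly points."""
    return (len(track) - 1) * STEP_HOURS


def keep_track(track, min_hours=48):
    """Retain tracks lasting at least min_hours (assumed reading of '2 days')."""
    return lifetime_hours(track) >= min_hours


# A SH minimum of -1e-5 s^-1 is cyclonic and passes; the same value in the NH does not.
sh_ok = is_candidate(-1e-5, lat=-15.0)
nh_ok = is_candidate(-1e-5, lat=15.0)
```

With this interpretation, a track needs nine consecutive 6-hourly points to survive the filter.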
TCs are identified from among all the tracked cyclonic systems (TC precursors) using incremental criteria: 1) tracks must start within a latitudinal band (30°S-30°N) and last for at least 2 days (TC seeds); 2) the T63 relative vorticity at 850 hPa must attain a threshold of at least 6 × 10⁻⁵ s⁻¹; 3) the difference in vorticity between 850 and 250 hPa (at T63 spectral truncation) must be greater than 6 × 10⁻⁵ s⁻¹, to provide evidence of a warm core (via thermal wind balance); 4) the T63 vorticity center must exist at each level between 850 and 250 hPa for a coherent vertical structure; and 5) criteria 2-4 must be jointly attained for a minimum of four consecutive time steps (one day) and only apply over the oceans.
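The incremental criteria can be expressed as a short filter over a track. The sketch below is a hedged illustration rather than the TRACK code: the `TrackPoint` container, its field names, and the convention that a missing vorticity center is stored as `None` are all assumptions, and vorticity values are taken to be already sign-scaled in the SH.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

VORT850_MIN = 6e-5        # s^-1, criterion 2
WARM_CORE_MIN = 6e-5      # s^-1, criterion 3 (850 minus 250 hPa)
MIN_CONSECUTIVE = 4       # criterion 5: four 6-hourly steps (one day)
MIN_POINTS_2_DAYS = 9     # eight 6-h intervals, i.e. at least 2 days


@dataclass
class TrackPoint:
    lat: float
    over_ocean: bool
    # T63 vorticity by pressure level (hPa); None where no center was found.
    vort: Dict[int, Optional[float]]


def point_passes(p: TrackPoint) -> bool:
    """Criteria 2-4 at one time step, applied only over the oceans."""
    v850, v250 = p.vort.get(850), p.vort.get(250)
    if v850 is None or v250 is None:
        return False
    if any(v is None for v in p.vort.values()):   # criterion 4: coherent structure
        return False
    return (p.over_ocean
            and v850 >= VORT850_MIN               # criterion 2
            and (v850 - v250) > WARM_CORE_MIN)    # criterion 3: warm core


def is_seed(track: List[TrackPoint]) -> bool:
    """Criterion 1: starts within 30S-30N and lasts at least 2 days."""
    return len(track) >= MIN_POINTS_2_DAYS and abs(track[0].lat) <= 30.0


def is_tc(track: List[TrackPoint]) -> bool:
    """A seed becomes a TC if criteria 2-4 hold for four consecutive steps."""
    if not is_seed(track):
        return False
    run = 0
    for p in track:
        run = run + 1 if point_passes(p) else 0
        if run >= MIN_CONSECUTIVE:
            return True
    return False
```

Keeping the seed test separate from the TC test mirrors the distinction drawn in the text between TC seeds (criterion 1 only) and full warm-core TCs (all criteria).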
Note that the initial vorticity thresholds are very low by design [e.g., two orders of magnitude smaller than what is used in Hsieh et al. (2020)], as the TRACK approach to identification relies on the structural criteria described above. A common identification procedure based on a small set of criteria is also beneficial, in that it reduces the number of subjective choices to be made, such as resolution-dependent thresholds, and removes the identification as an uncertainty in comparing the results from different models. Some of these aspects, including comparison to other tracking methods, are covered in Roberts et al. (2020).
Following the tracking, spatial statistics including the track and genesis densities are computed using spherical kernel estimators (Hodges 1996).
The benefit of this approach is that the full life cycles of TCs are identified, including the precursor stage (e.g., easterly waves; Thorncroft and Hodges 2001; Serra et al. 2010; Yang et al. 2019), as well as other precursors (Fine et al. 2016) and post-TC stages (Sainsbury et al. 2020; Baker et al. 2021). After all TCs have been identified with the criteria above, TC precursors are further split, via an additional analysis stage, and only for the specific goal of the tables in section 3b, into tropical (between 30°S and 30°N) and extratropical (poleward of 30°S or 30°N). TRACK follows a Lagrangian approach that identifies each cyclonic feature explicitly, and is complementary to the work on TC seeds in Vecchi et al. (2019), who used instead bandpass-filtered variances, with a period of 3-10 days. A more recent paper by Hsieh et al. (2020), continuing previous studies of TC seeds, used explicit tracking, as in this study, albeit performed on the original model grid.

d. The genesis potential index and its terms
To complement the understanding of what controls TC formation, distribution, and lifetime, both in terms of climate means and variability, we have computed the genesis potential index (GPI; see, e.g., Emanuel and Nolan 2004; Camargo and Wing 2016) as
$$\mathrm{GPI} = \left|10^{5}\,\eta\right|^{3/2} \left(\frac{H}{50}\right)^{3} \left(\frac{V_{\mathrm{pot}}}{70}\right)^{3} \left(1 + 0.1\,V_{\mathrm{shear}}\right)^{-2},$$
where η is the absolute vorticity at 850 hPa, H is the 600-hPa relative humidity, V_pot is the potential intensity, and V_shear is the magnitude of 850-200-hPa wind shear. All GPI terms were computed by following the methodology of Bister and Emanuel (2002), as implemented in the Python script provided online. All relevant variables, as monthly means for each ensemble member, were transferred from CINECA (Bologna, Italy), and climatologies, as well as basin time series, were built for each experiment. The same analysis was carried out by applying the algorithm to the ERA5 reanalysis, in order to provide an observational foundation.
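For reference, the index in its commonly quoted Emanuel and Nolan (2004) form, GPI = |10⁵η|^(3/2) (H/50)³ (V_pot/70)³ (1 + 0.1 V_shear)⁻², can be evaluated with a few lines of code. This is a minimal sketch and not the script referred to above: the function name and argument names are hypothetical, the units (η in s⁻¹, H in percent, V_pot and V_shear in m s⁻¹) follow the standard definition, and the potential intensity is assumed to have been computed separately.

```python
def genesis_potential_index(abs_vort_850, rh_600, v_pot, v_shear):
    """Emanuel-Nolan genesis potential index from its four environmental terms.

    abs_vort_850 : absolute vorticity at 850 hPa (s^-1)
    rh_600       : relative humidity at 600 hPa (%)
    v_pot        : potential intensity (m s^-1), computed elsewhere
    v_shear      : magnitude of the 850-200-hPa wind shear (m s^-1)
    """
    vorticity_term = abs(1.0e5 * abs_vort_850) ** 1.5
    humidity_term = (rh_600 / 50.0) ** 3
    intensity_term = (v_pot / 70.0) ** 3
    shear_term = (1.0 + 0.1 * v_shear) ** -2
    return vorticity_term * humidity_term * intensity_term * shear_term


# Reference environment: every term equals 1 when eta = 1e-5 s^-1, H = 50%,
# V_pot = 70 m/s and there is no shear.
gpi_reference = genesis_potential_index(1.0e-5, 50.0, 70.0, 0.0)

# Adding 10 m/s of shear to the same environment reduces the index to a quarter.
gpi_sheared = genesis_potential_index(1.0e-5, 50.0, 70.0, 10.0)
```

Writing the four terms separately makes it straightforward to examine which term drives a difference between two experiments, in the spirit of the analysis of the index "and its terms".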

3. Mean climatology of TC frequencies, distributions, and their response to the large-scale environment

a. Basin statistics
Figure 1 shows the number of tropical depressions, tropical storms (combined), and tropical cyclones, by category, identified by TRACK in each hemisphere. We have used a classification based on MSLP, as in Klotzbach et al. (2020), because of their strong arguments with regard to the study of the impact of TCs. This classification also enables us to identify TD+TS in IBTrACS, and at the same time it avoids the peculiarities of surface layer extrapolation of model level winds, a problem that has been particularly evident in recent intercomparisons (see, e.g., Roberts et al. 2020). Figure 1 shows the different SPHINX simulations (BASE with B suffix and STOC with S suffix), organized with increasing resolution (from left to right) and each SP experiment (S) inserted between each pair of resolutions (B). The height of the bars for each category suggests that SP is equivalent to an in-between model resolution when it comes to how many TCs are produced in each hemisphere. The results compare well to past studies, indicating that, as model resolution is increased, more TCs are produced in the climate simulation, with an initial steep increase for the weaker categories, up to T255, then tending to reach a plateau. As in previous GCM intercomparison studies (e.g., Roberts et al. 2020), a realistic simulation of the frequency of TC categories 4 and 5 is still elusive, even at the highest resolution (T1279), which nonetheless demonstrates a tendency for resolution and SP to both be beneficial. A recent intercomparison study by Judt et al. (2021), in which the PRIMAVERA-HighResMIP IFS model participated, albeit with convection explicitly represented, indicates that the full spectrum of TC intensities (both winds and central pressure) does become more realistic at 5-km resolution for that particular model configuration.
The climatology of the spatial distribution for the cyclonic disturbances counted in Fig. 1 confirms that these are indeed tropical cyclones, with spatial signatures that bear strong resemblance to observations. Figure 2 [...] Tables 2 and 3 also provide some overall statistics that are useful to interpret the information in the track density maps (Fig. 3), as well as the ensemble spread and the systematic differences between resolutions, which are significant, given the small ensemble spread.
The comparison of Figs. 2 and 3 demonstrates two things: first, similar to previous studies (Zhao et al. 2009; Strachan et al. 2013; Roberts et al. 2015), enhancing resolution increases the number of simulated TCs, which is reflected in the enhanced track density and better agreement with observations. The results in Fig. 3 are notably robust, thanks to the large ensemble and the five different resolutions. The resolution effect is apparent in each of the major TC basins and is consistent in both sets of simulations, BASE and STOC, as each column is inspected individually. For the southwest Pacific domain, it is obvious that all models at moderate to high resolution, independent of the use of SP, overestimate the track densities and counts, as compared to observations. This is a common feature in GCMs, particularly for AGCMs, but also partially reflects observational uncertainty, due to how TCs are counted in different basins, as discussed in Hodges et al. (2017). Similar errors occur over the north Indian and South Atlantic domains, and have been amply discussed in previous publications (e.g., Hodges et al. 2017; Roberts et al. 2020).
Additionally, and similar to the resolution effect, SP enhances the number of simulated TCs, which is revealed by comparing the first and second columns, as well as their differences in the third column. A striking resemblance between the STOC plot in each row and its companion BASE plots in the left column (same row and one row down) supports a strong and systematic finding for TC simulation: the use of SP is equivalent to an increase in resolution, albeit not quite equivalent to a full doubling of resolution. By inspection of the panels in Fig. 3, the model response is spatially coherent across resolutions, with the track densities tending to increase rather than shift, with few local exceptions. The individual ensemble members are also coherent with each other (not shown). Further, the differences between STOC and BASE are shown to be robust in the TC-active regions, as indicated by the Welch test (Wilk 2011).
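For readers unfamiliar with the Welch test, the statistic for two samples with unequal variances can be computed as below. This is a generic sketch of the textbook formula, assuming it would be applied to the STOC and BASE ensemble values at a given grid point; the sample values in the example are invented for illustration and are not results from the simulations.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a      # squared standard error of each mean
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical track densities at one grid point for two small ensembles.
stoc_members = [2.0, 3.0, 4.0, 5.0, 6.0]
base_members = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat, dof = welch_t(stoc_members, base_members)
```

The resulting statistic is compared against a Student's t distribution with `dof` degrees of freedom; unlike the pooled-variance t test, this does not require the two ensembles to share the same variance or size.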
The SP effect is most pronounced and consistent in the northwest Pacific, and presents a local maximum in the southwest Pacific, to the east of Australia. In other basins (e.g., the North Atlantic, where most GCMs still struggle to simulate a realistic number of TCs), the response to SP is not consistent at all resolutions, with a decrease at COARSE, then an increase at LOW, and a dipole response, reflecting differences in the eastern versus western North Atlantic, for MEDIUM, HIGH, and ULTRA-HIGH resolutions. In terms of significance, there is an interplay between information provided by the large ensemble sizes at COARSE resolution and the stronger signal at HIGH and ULTRA-HIGH resolutions. In other words, taking the ensemble mean of the larger ensembles at COARSE to MEDIUM resolution plays less of a role than the stronger signal found in HIGH and ULTRA-HIGH, which is further supported by the larger numbers seen in Tables 2 and 3.
An additional question arises as to whether increased track density is due to changes in the TC to area ratio (e.g., larger number of TCs in the same area) or to unchanged numbers, albeit with longer-lasting tracks. Examining the genesis density statistics in Fig. 4, as well as the information on interannual and seasonal variability shown in later sections, it is clear that the use of SP enhances genesis by up to 30%, thus reflecting the first mechanism. However, the genesis density change is not uniform in all basins: the northwest Pacific and north Indian basins show increases that are consistent across resolutions and rather widespread. The eastern North Atlantic also shows a consistent increase at the first three resolutions, at the location where easterly waves encounter the ocean (Thorncroft and Hodges 2001), as does the western North Atlantic region just north of Colombia, which is important for Gulf TC tracks and east Pacific TCs (Serra et al. 2010). Other locations, such as the eastern Pacific, see a reduction in genesis at MEDIUM and HIGH resolution, which could be a response to the representation of the mountains in Central America, which have increased height and complexity at high resolution, affecting the propagation of atmospheric waves from the Caribbean.

b. Statistics of transition from precursors to TC seeds and to TCs
We now turn to addressing the two complementary hypotheses on the role of seeding and the role of environments favorable for cyclogenesis. Table 2 shows the TC precursors, TC seeds, and full warm-core TC counts for the Northern Hemisphere (including ensemble spreads) and their differences, as percentages; Table 3 shows the counts for the Southern Hemisphere. In both hemispheres we start with around 10 000 TC precursors per year [an equivalent number of extratropical cyclone (ETC) precursors are found poleward of 30°]; next, around 3000 TC seeds (2-day systems) are found, finally ending up with fewer than 100 TCs, in broad agreement with observations. The ensemble spread is very small at each resolution, around 2%, and independent of the use of SP.
Both tables show that TC precursors and seeds increase systematically with resolution in both BASE and STOC, but the increase from BASE to STOC is larger than the increase caused by enhancing resolution (even across all five resolutions); this is true in both hemispheres. Further, the relative increase in TCs caused by the use of SP is seen both in the number of precursors, compatible with the first hypothesis, as well as in the last transition, to full warm-core TCs, supporting the second hypothesis.
Comparing the numbers in terms of percentages, the SP-induced increase in precursor generation is between 23% and 29% (23%-27% in the SH), with lower-resolution simulations showing a larger percentage increase. However, the last column of data shows that the extra transition into full TCs is in the range of 22%-36% (19%-42% in the SH). The impact of SP on the final transition to warm-core TC further increases the initial probability of generating a TC from precursors, and is less systematically dependent on resolution, although it appears to increase overall in the SH. This spatially dependent resolution response (hemispheric here, and basin-dependent in later sections) is counterintuitive, as the (dominant) SPPT scheme perturbations are the same at all resolutions, and this requires careful interpretation (see section 6).

TRANSITION FROM PRECURSORS TO 2-DAY CYCLONES IN THE EXTRATROPICS
Given the fact that SP has been shown in Watson et al. (2017) to preferentially impact the tropics in terms of moisture availability, and given previous studies on the effects of SP on ETCs, it is interesting to contrast TC and ETC transition rates in the context of a single study. The statistics for ETCs, shown in Tables 4 and 5, indicate that we start from a number of precursors almost identical to what has been found in the tropical region, still accompanied by small ensemble spreads. However, for the ETC precursors, differences between 10% (12% in SH) for COARSE and 7% (5% in the SH) for ULTRA-HIGH were found in terms of sensitivity to SP; for the (2-day) ETCs, the differences are even smaller (1%-4%), indicating that SP plays some role in seeding, albeit only a limited role in conditioning the ETC environment, as ETC dynamics are different from TC dynamics, and less dependent on convection.

c. Probability of cyclogenesis and its environmental control
The probability of cyclogenesis [PC; similar to survival rate (SR) in Yamada et al. (2021), their Eq. (1)] is the local ratio between the frequency of TCs and that of TC seeds: $\mathrm{PC} = \mathrm{TC}/\mathrm{TC}_{\mathrm{seed}}$.
We computed the spatial distribution of PC by taking the ratio of track density for TCs and TC seeds at each grid point. All PCs are computed separately for each of the 60 simulations and then combined as ensemble and multiyear averages, in order to illustrate the regional impacts of SP (columns) and resolution (rows). Figure 5 shows that PC for track density is as high as 30% in large regions of TC activity: the highest values are found in the northwest Pacific and the largest changes correspond to the change in resolution, with SP showing a similar (albeit smaller) response. A few cases of reduction in PC when using SP are found, as large as −10%, as seen for instance in the eastern Pacific. These results are comparable with those in studies considering developing versus nondeveloping tropical synoptic disturbances identified in reanalyses in both the Atlantic and northwest Pacific (Fu et al. 2012; Peng et al. 2012; Brammer and Thorncroft 2015; Hankes et al. 2015), where PC can be as high as 30% or more.
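The grid-point calculation described above can be sketched as follows. This is a minimal illustration, assuming that track densities are held as nested lists (rows of grid cells) and that cells with essentially no TC seeds are masked rather than divided. The function names, the `min_seed` threshold, and the sample values are hypothetical and do not come from the paper.

```python
def probability_of_cyclogenesis(tc_density, seed_density, min_seed=1e-6):
    """Grid-point ratio TC / TC seed; None where there are too few seeds."""
    return [
        [tc / seed if seed > min_seed else None
         for tc, seed in zip(tc_row, seed_row)]
        for tc_row, seed_row in zip(tc_density, seed_density)
    ]


def ensemble_mean(fields):
    """Average a list of 2D fields cell by cell, skipping masked (None) cells."""
    n_rows, n_cols = len(fields[0]), len(fields[0][0])
    out = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            vals = [f[i][j] for f in fields if f[i][j] is not None]
            row.append(sum(vals) / len(vals) if vals else None)
        out.append(row)
    return out


# Two hypothetical ensemble members on a 1 x 2 grid (illustrative values only).
pc_member_1 = probability_of_cyclogenesis([[3.0, 0.0]], [[10.0, 0.0]])
pc_member_2 = probability_of_cyclogenesis([[1.0, 1.0]], [[10.0, 5.0]])
pc_mean = ensemble_mean([pc_member_1, pc_member_2])
```

Computing the ratio per simulation before averaging, as the text describes, gives each member equal weight; taking the ratio of ensemble-mean densities instead would weight members by their seed counts.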

d. Location and relative magnitude of seeding and cyclogenesis
To better understand the SP response of TCs seen so far, ΔTC, we break it into two terms at each resolution: (i) extra formation of TCs due to an increase in the number of TC seeds, versus (ii) extra transition from TC seed to TCs, due to cyclogenesis. These terms are directly related to our two hypotheses and we call them the seeding term and the cyclogenesis term:

$\Delta\mathrm{TC} \approx \mathrm{PC}\,\Delta\mathrm{TC}_{\mathrm{seed}} + \mathrm{TC}_{\mathrm{seed}}\,\Delta\mathrm{PC}$

Ideally, the two rightmost terms should add to the leftmost exactly, but in practice this is an approximation, as the seeding term will tend to affect the genesis stage of TCs, while the cyclogenesis term applies to their entire lifetime, so that the contributions from the two terms cannot be expected to be exactly collocated, even for relatively large regions, as resolved by the COARSE simulations. For the large sample size used in this paper, however, the approximation holds well (also verified by plotting the difference between the left-hand side and the sum of the right-hand side terms; not shown). Figure 6 shows the ΔTC in response to SP at each resolution for the ensemble means and the breakdown into the contribution of the TC seeding term versus the contribution of the cyclogenesis term. The seeding response to SP is mostly positive and largest in areas where TCs spend most of their lifetime, as well as slightly sensitive to resolution. The cyclogenesis term has a stronger response to SP, and it can result in either an increase or decrease of TCs in each region, most notably with a decrease in the eastern Pacific and over most of the Indian Ocean. The cyclogenesis term also responds to resolution more vigorously, and in an apparently linear way.
It can be said in summary that SP acts to enhance TC seeding overall, and this effect is slightly sensitive to resolution; however, cyclogenesis is more important for transition, and more sensitive to both resolution and SP, able to cause both an increase and a decrease of TCs simulated in a region. This justifies turning our attention to the TC environment simulated by the SPHINX models.
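The split into a seeding term and a cyclogenesis term can be illustrated with a small numerical sketch. It assumes a first-order (product-rule) decomposition built from the definition PC = TC/TC seed, with both terms evaluated using BASE values; the exact form used by the authors may differ. The function name and the counts below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def decompose_tc_change(tc_base, seed_base, tc_stoc, seed_stoc):
    """Split the STOC - BASE change in TC count into two first-order terms.

    seeding      : extra TCs from extra seeds at the BASE transition probability
    cyclogenesis : extra TCs from a changed transition probability at BASE seeds
    residual     : what the two terms leave unexplained (second-order term)
    """
    pc_base = tc_base / seed_base
    pc_stoc = tc_stoc / seed_stoc
    seeding = pc_base * (seed_stoc - seed_base)
    cyclogenesis = seed_base * (pc_stoc - pc_base)
    residual = (tc_stoc - tc_base) - (seeding + cyclogenesis)
    return seeding, cyclogenesis, residual


# Hypothetical annual counts for one region (illustrative values only).
seeding, cyclogenesis, residual = decompose_tc_change(
    tc_base=60.0, seed_base=3000.0, tc_stoc=79.2, seed_stoc=3300.0
)
```

With these numbers the seeding term is 6, the cyclogenesis term is 12, and the residual of 1.2 equals the product of the two changes (300 extra seeds times a 0.004 change in PC), which is why the sum of the two terms is only approximately equal to the total change.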

e. GPI terms and their control on the climatology of TCs in each basin
To understand systematically the location and magnitude of changes due to resolution or SP, a number of difference fields for TCs and their environment are computed. We have organized the plots so that the leftmost column in each of the four panels of Fig. 7 shows the response to the introduction of SP at each resolution (STOC − BASE), while the rightmost column shows the response to one increment of resolution (e.g., LAB − CAB, MAB − LAB, and so on), while retaining the BASE model formulation (no SP is used).
The ΔPC values are plotted as contours overlaid onto changes in environmental conditions as used in the calculation of GPI (shear, relative humidity, vorticity, potential intensity), shown in Fig. 7. The convention is that signs for GPI terms are presented exactly as shown by the GPI equation, so that, for instance, a large GPI shear term corresponds to a small magnitude of vertical wind shear and a positive change in the GPI shear term corresponds to a decrease in vertical wind shear. For potential intensity, we have also computed all terms, in particular the thermodynamic efficiency. We plotted GPI and the three most significant GPI terms in Fig. 7: for GPI itself, SP acts to reduce PC in the eastern Pacific and around the Maritime Continent, while acting to increase PC in the North Atlantic MDR, in the SPCZ, in the Indian Ocean, and in the northwest Pacific. Increasing the model resolution acts to increase PC in the eastern Pacific, as well as in all main TC activity regions. At LOW to MEDIUM resolution, the spatial patterns for the regions where PC increases are very similar to each other.
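The sign convention for the GPI terms can be made concrete with a short sketch. It assumes the widely used Emanuel and Nolan (2004) form of the genesis potential index, in which GPI is the product of a vorticity term, a humidity term, a potential intensity term, and a shear term; the GPI formulation actually used in this study is defined elsewhere in the paper and may differ in its details. The function names and input values are illustrative only.

```python
def gpi_terms(abs_vorticity, rel_humidity, potential_intensity, shear):
    """Individual factors of the Emanuel and Nolan (2004) GPI (assumed form).

    abs_vorticity       : absolute vorticity (s^-1)
    rel_humidity        : midtropospheric relative humidity (%)
    potential_intensity : potential intensity (m s^-1)
    shear               : magnitude of vertical wind shear (m s^-1)
    """
    return {
        "vorticity": abs(1.0e5 * abs_vorticity) ** 1.5,
        "humidity": (rel_humidity / 50.0) ** 3,
        "potential_intensity": (potential_intensity / 70.0) ** 3,
        "shear": (1.0 + 0.1 * shear) ** -2,
    }


def gpi(terms):
    """GPI as the product of its factors."""
    product = 1.0
    for value in terms.values():
        product *= value
    return product


# Weak versus strong shear, all other inputs held fixed (illustrative values).
weak = gpi_terms(1.0e-5, 50.0, 70.0, shear=0.0)
strong = gpi_terms(1.0e-5, 50.0, 70.0, shear=10.0)
```

Because shear enters through a negative power, the shear term is largest when the shear itself is smallest, which matches the convention stated above: a positive change in the GPI shear term corresponds to a decrease in vertical wind shear.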
Considering the GPI terms individually, SP acts to moisten the environment in the western part of each basin, mostly where PC increases, while drying the environment in the eastern Pacific, where PC decreases (e.g., for MEDIUM and HIGH). Model resolution appears to cause more of a drying effect at LOW to MEDIUM resolution, particularly around the Maritime Continent, with the exception of the Bay of Bengal and the northwest Pacific at MEDIUM resolution and the eastern Pacific at LOW and MEDIUM resolution.
The use of SP causes a stronger shear term, which is clear in the eastern Pacific and to the east of the Maritime Continent, corresponding to reduced PC, and, to a lesser extent, a weakened shear in the northwest Pacific, increasing PC. The resolution enhancement reduces shear overall, and this is beneficial to PC, particularly in the eastern Pacific.
The potential intensity maps show that the use of SP tends to decrease the simulated PI, while an increase in resolution tends to increase PI. The effect is rather widespread and, while located predominantly in the tropics, it does not show any particular spatial correlation with the regions where PC changes. Emanuel et al. (2013) identify three thermodynamic variables useful to understand the formation of TCs: 1) the presence of moist convection, 2) midtropospheric humidity, and 3) potential intensity (PI). We have focused so far on the last two variables, part of GPI, and their role in the conversion of TC seeds; our findings are supportive of the conclusions in Vecchi et al. (2019) and Yamada et al. (2021). The role of moist convection may not be revealed by GPI, but can be investigated by the examination of the OLR fields in the EC-Earth model. Results from Davini et al. (2017) (their Table 2) indicate that top-of-the-atmosphere emitted longwave radiation is 238.71 W m$^{-2}$ for COARSE BASE and 241.74 W m$^{-2}$ for ULTRA-HIGH STOC, with a pattern of nearly linear increase. Examination of the OLR field (not shown) reveals that both resolution and SP act, instead, to reduce radiative loss at the locations where TCs are present, and particularly so where PC rates are high. The overall model behavior is thus consistent with deep convection being present at the locations of TC activity, and strongest at the locations of maximum PC. The tropical impact of SP is in fact consistent with the broad moistening of the tropical atmosphere shown by Watson et al. (2017) for the same model. These two aspects of large-scale versus local-scale response point to a potential feedback between TCs and their environment, which may be enhanced or hindered by SP, such as via the multiplicative nature of the SPPT scheme.

Interannual to seasonal variability and environmental controls on TCs
Linking the interannual variability (IV) of TCs to the simulated environment in each experiment can be used to understand what the main drivers of the response to resolution and SP are, as well as uncovering any improvements in predictability, defined as skill in predicting year-to-year changes in TC frequency. The other reason for focusing on IV is that predictive skill at this level provides an important degree of trust in a model's dynamics and physics, as it is harder to tune for variability than for climate means, and IV provides a stricter test for a model to be deemed "fit for purpose" in terms of climate change applications [see discussion in Vidale et al. (2003)].
In this section, statistics for the interannual variability of TC counts for the entire period are presented by basin, for all simulations. First, Table 6 demonstrates how the use of SP increases interannual variability across resolutions and basins, with few exceptions, mostly representing an improvement, with the notable exception of the SWPAC, where IV is overestimated at truncations of T255 (LAB) and above. This is an important result, as GCMs, particularly at low resolution (as seen in COARSE BASE), tend to underestimate variability, and it is not clear a priori how an ensemble of AGCMs, all driven by the same observed SSTs and sea ice, might react to the addition of SP. It remains to be seen whether this additional variability originates from signals contained in the SST/sea ice and radiative (aerosols) forcings, potentially a sign of climate predictability, or whether it is just internal variability (noise).
It is of particular interest and value to the community to improve the prediction of interannual changes in TC frequencies by basin, based on previous evidence of potential predictability (see Zhao et al. 2009; Roberts et al. 2015, 2020). To this end, we focus on four traditional basins: the North Atlantic (Fig. 8a), the northwest Pacific (Fig. 8b), the south Indian Ocean (Fig. 8c), and the southwest Pacific (Fig. 8d), showing the time series of TC counts at each resolution, as well as correlations with observed TC statistics (IBTrACS).
The counts and variability at COARSE resolution are both underestimated but increase as we move to the bottom of each panel, toward ULTRA-HIGH. In the North Atlantic the correlation between the IBTrACS counts and the number of TCs identified in the Climate-SPHINX models increases from 0.23 to 0.35 (COARSE to HIGH), but is then reduced to 0.19 at ULTRA-HIGH. The correlations in STOC are mostly lower, except for LOW resolution. It is particularly noticeable how all models entirely miss the exceptional TC season of 2005, but are able to capture some La Niña responses (e.g., in 1995). For the northwest Pacific, correlations are once again modest, and slightly higher as we increase the resolution (range from 0.15 to 0.28); the addition of SP in STOC would seem beneficial in COARSE, LOW, and MEDIUM. There is moderate skill in the south Indian Ocean, with correlations from 0.25 to 0.37, with some improvement from COARSE to HIGH (also reflected in the means). SP improves the IV correlations at COARSE to MEDIUM, but this decays with HIGH and ULTRA-HIGH. BASE and STOC also appear to be distinctly different at HIGH resolution in the last 10 years of simulation, also based on the small ensemble spread. For the southwest Pacific, it is clear that there is a significant overestimate of the number of TCs, which had also been observed in reanalyses (Hodges et al. 2017), and made worse by SP; the representation of IV is similar to that seen in the North Atlantic in the range COARSE to MEDIUM, but decaying after that. SP seems to make no significant improvement, or even to give a negative correlation at HIGH resolution. Overall, the interannual variability skill is poor when compared to other models (see section 6); moreover, by using a two-tailed Kolmogorov-Smirnov test (p = 0.05) we cannot state that any of the differences encountered, in any of the basins, are significant, despite the sizable ensemble.

[Figure caption, beginning lost in extraction: ... 2017), for each of the SPHINX experiments, averaged over the respective ensemble (see Table 1 for details). For comparison, the IBTrACS mean annual cycle for the 1979-2008 period is added on the right-hand side.]
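The interannual correlations quoted in this section can be illustrated with a minimal sketch, assuming that the skill measure is a Pearson correlation between the ensemble-mean annual TC count and the observed annual count over the same years. The function names and the sample values are hypothetical and are not the Climate-SPHINX or IBTrACS data.

```python
import math


def ensemble_mean_counts(members):
    """Average annual TC counts across ensemble members, year by year."""
    return [sum(year) / len(year) for year in zip(*members)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical ensemble members and hypothetical observations over 4 years.
members = [[2, 4, 3, 5], [4, 6, 5, 7]]
observed = [10, 14, 12, 16]
simulated = ensemble_mean_counts(members)
r = pearson_r(simulated, observed)
```

Note that a correlation of this kind is insensitive to the mean bias: in the toy example the simulated counts are far lower than the observed ones, yet the year-to-year variations line up perfectly.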

a. Seasonal variability
Figure 9 shows that, for the North Atlantic, the number of TCs simulated in each month increases with resolution, as well as with the use of SP, resembling more and more the IBTrACS observations; further, for MEDIUM, HIGH, and ULTRA-HIGH, the seasonal peak is reached in August without SP, but in September using SP, providing a better match with observations. The start and end of the TC seasons are less sensitive to the use of SP. For the northwest Pacific, all experiments show a delay in the peak of the season, and a too late end, but this error is mitigated by the use of the highest resolutions, combined with SP.
A separate analysis (not shown) of North Atlantic basin 850-600-hPa humidity and 200-850-hPa vertical wind shear shows that, in the August-September transition, both are slightly more favorable in this basin, but no coherent spatial pattern (e.g., associated with the positioning or magnitude of the steering anticyclone) is found. The nature of this seasonal response thus appears to be associated with individual TCs.

b. GPI terms and the modulation of the annual number of TCs in each basin
We extended our GPI analysis to its temporal behavior, considering monthly means of GPI terms in each basin, using, wherever possible, the same domain definitions as in Wing et al. (2015), albeit altering the seasons to July-November for the Northern Hemisphere, and November-March for the Southern Hemisphere, for two reasons: 1) these new definitions of season provide significantly stronger correlations, when compared to those shown in Wing et al. (2015); and 2) a longer, more homogeneous seasonality, also closer to operational criteria, makes it possible to retain the same definitions for the analysis of climate change experiments.
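The seasonal definitions above can be sketched as follows. This is a minimal illustration, assuming that monthly values are stored in a dictionary keyed by (year, month) and that a Southern Hemisphere season, which spans two calendar years, is labelled by the year in which it starts. The data layout, the labelling convention, and the names are assumptions made for the example.

```python
NH_SEASON = (7, 8, 9, 10, 11)   # July-November
SH_SEASON = (11, 12, 1, 2, 3)   # November-March, spans two calendar years


def seasonal_means(monthly, season):
    """Return {start_year: seasonal mean} from {(year, month): value}.

    Months earlier in the calendar than the first month of the season are
    taken from the following year. Incomplete seasons are skipped.
    """
    start_month = season[0]
    out = {}
    for year in sorted({y for y, _ in monthly}):
        keys = [(year if m >= start_month else year + 1, m) for m in season]
        if all(k in monthly for k in keys):
            out[year] = sum(monthly[k] for k in keys) / len(keys)
    return out


# Toy monthly series for 2000-2001 in which each value equals the month number.
monthly = {(y, m): float(m) for y in (2000, 2001) for m in range(1, 13)}
nh = seasonal_means(monthly, NH_SEASON)
sh = seasonal_means(monthly, SH_SEASON)
```

In the toy series the Southern Hemisphere season starting in 2001 is dropped because it would need data from early 2002, which illustrates why the number of usable seasons can differ between hemispheres for the same simulation period.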
All GPI terms were computed for each ensemble member at each grid point, and the results are shown as basin and ensemble means, together with an envelope that shows the ensemble spread. The figures contain both the seasonal evolution and the annual means for the particular months considered. Figure 10 shows the temporal evolution of the GPI term for each resolution in two basins: the North Atlantic and northwest Pacific, superposed on the ERA5 estimates. All experiments do a reasonable job of reproducing GPI, including seasonal to multiannual temporal signatures. The SP experiments have a small localized tendency to overshoot GPI at the peak of each season in the North Atlantic (end of season in the northwest Pacific), but overall it is hard to distinguish any strong response to SP or resolution. This is also shown by the interannual variability statistics on the right-hand side of each panel, which show that all models have high and comparable skill in simulating the IV of the environment, which is then reflected in high correlations between the models' GPI and the models' TCs in the North Atlantic (and south Indian Ocean, not shown), as well as providing a reasonably good predictor for observed TCs (r from 0.3 to 0.6). For the northwest Pacific, the correlation between simulated and ERA5 GPI is high, and so is the correlation between simulated GPI and TCs, but GPI is a poor predictor for observed TCs; the same applies in the southwest Pacific (not shown).
In terms of the PI terms, the thermodynamic efficiency provides an interesting case. Previous studies [see discussion in Camargo and Wing (2016) and Camargo et al. (2007a,b)] include mention of PI and interannual variability of TCs. Emanuel et al. (2013), Wing et al. (2015), and Bengtsson et al. (1995) computed the thermodynamic efficiency term of PI, $(T_s - T_o)/T_o$, where $T_s$ is the surface temperature and $T_o$ is the temperature of the outflow layer, which was shown (Emanuel et al. 2013; Wing et al. 2015) to explain up to 30% of the PI trend in the North Atlantic, although virtually no signal was found in the northwest Pacific. The time series presented in both Emanuel et al. (2013) and Wing et al. (2015) contain, in fact, evidence of substantial interannual variability. In terms of seasonal variability of PI, Gilford et al. (2017) [also Gilford et al. (2019) for along-track PI] found that, in most ocean basins, the air-sea enthalpy disequilibrium (part of PI) drives seasonal variability, but in the western North Pacific, the only basin in which outflow levels are above the tropopause throughout the seasonal cycle, the seasonal cycle of lower-stratospheric temperatures influences outflow temperatures (and thus thermodynamic efficiency) and damps the seasonality of PI.
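The thermodynamic efficiency term defined above is simple enough to evaluate directly. The sketch below assumes both temperatures are given in kelvin; the function name and the sample temperatures are illustrative and are not values from the simulations or from ERA5.

```python
def thermodynamic_efficiency(t_surface, t_outflow):
    """Thermodynamic efficiency term of PI, (T_s - T_o) / T_o, in kelvin."""
    if t_outflow <= 0.0:
        raise ValueError("outflow temperature must be positive (kelvin)")
    return (t_surface - t_outflow) / t_outflow


# Same surface temperature, two hypothetical outflow temperatures.
eff_reference = thermodynamic_efficiency(300.0, 200.0)
eff_colder_outflow = thermodynamic_efficiency(300.0, 195.0)
```

For a fixed surface temperature, a colder outflow layer raises the efficiency, so a negative bias in outflow temperature translates directly into an overestimate of this term.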
Inspired by the discussion in Wing et al. (2015), who noted that "Investigating the contribution of tropical tropopause layer temperatures to interannual variability in maximum intensity, rather than trends, may therefore be a valuable extension of this work" (p. 8676), we turn to the analysis of the interannual variability of the thermodynamic efficiency term. In so doing, the ultimate focus of this paper is the explicitly simulated TCs, rather than just the indices, which are used in this study as guidance in the investigation of potential predictors contained in the TC environment. The thermodynamic efficiency in the SPHINX simulations is governed, in terms of sensitivity to resolution [...]. Figure 11 shows the temporal evolution of the thermodynamic efficiency term, based on monthly means. This term seems to be overestimated for all SPHINX experiments, and this error increases systematically as we reach the highest resolution. The reason for this is a negative bias in the $T_o$ term, which is increasingly made worse by model resolution in all domains; compared to this, the SP sensitivity is insignificant. The seasonal (and even intraseasonal) signal is, however, captured rather well, and, by comparison to ERA5, the skill at simulating interannual variability is the highest in the entire set, reaching correlations of up to 0.8. The correlation with the model TCs also shows that this is often the most important term in GPI for this particular model formulation.
The correlations between the annual mean count of TCs, using IBTrACS data, and the annual mean thermodynamic efficiency term have been computed and are shown in Fig. 11. The correlations for the ensemble mean and for each experiment (BASE, STOC) are also shown in the right-hand boxes. In general, correlations between the thermodynamic efficiency term and the observed TC count for the North Atlantic are surprisingly higher (~0.5) than those found in Fig. 8 for the correlation between simulated and observed TCs, albeit nearly zero in the northwest Pacific, as also found, albeit limited to the trend, in Wing et al. (2015). For the North Atlantic there is a slight increase in the correlations going from BASE to STOC, except for the ULTRA-HIGH resolution; STOC has higher correlations in only two cases, COARSE and MEDIUM.
A survey of all possible plot combinations, both time series and scatterplots (total of 108), indicates that the top governing environmental variables for predicting TC IV in the North Atlantic and south Indian Ocean are wind shear, thermodynamic efficiency, PI, and GPI (in that order, but nearly with the same weights for wind shear and thermodynamic efficiency). For the eastern Pacific, RH is the most important factor, followed by GPI and then PI and wind shear. For the northwest Pacific GPI plays the largest role, while for the southwest Pacific it is vorticity, but both basins have such low skill at representing IV that this is hardly worth mentioning.
Once again, applying a Kolmogorov-Smirnov test, as for Fig. 8, reveals that the differences between the different experiments are not significant, but pooling all the results shown in Fig. 11 for the Atlantic and northwest Pacific basins and contrasting them with those in Fig. 8 reveals that the improvement in skill provided by the thermodynamic efficiency term is in fact significant.
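For readers unfamiliar with the test, the following is a minimal sketch of a two-sample Kolmogorov-Smirnov comparison, assuming that two sets of values (for example, two pools of correlations) are compared through the maximum distance between their empirical cumulative distributions and judged against the standard large-sample critical value. How the samples were actually pooled in this study is not reproduced here; the function names and sample values are hypothetical.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Maximum distance between the empirical CDFs of two samples."""
    d = 0.0
    for x in list(sample_a) + list(sample_b):
        cdf_a = sum(1 for v in sample_a if v <= x) / len(sample_a)
        cdf_b = sum(1 for v in sample_b if v <= x) / len(sample_b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS statistic."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical samples (illustrative values only).
identical = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
threshold = ks_critical(30, 30)
```

The null hypothesis of a common distribution is rejected at level `alpha` when the statistic exceeds the critical value; the approximation is asymptotic and is less reliable for very small samples.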

c. Scatterplots of IV correlations
A metric of IV correlation is introduced in order to better uncover any evidence of potential predictability, and to summarize the relative importance of SP and resolution for each field computed in the GPI diagnostic. While all correlations of all GPI variables have been computed in each basin, only two notable examples will be shown.
Figure 12 shows a summary of the correlations in Figs. 10 and 11. The set of correlations for BASE is plotted against the set of correlations for STOC, providing information on each resolution and on the potential of the GPI and its terms to provide interannual predictability for the number of TCs in each basin. It is clear that GPI is well simulated in both the Atlantic and the northwest Pacific, and that neither resolution nor SP makes a difference in the simulation of the interannual variability of GPI. Further, GPI is important for predicting the number of simulated TCs in a given year, albeit not a good predictor of observed TCs in the North Atlantic. For the northwest Pacific, while GPI is not a very good predictor of simulated TCs, PI is, and it is also in better agreement with the PI estimates in ERA5. The thermodynamic efficiency term is one of the two most important terms in the North Atlantic, but it is not well simulated in the northwest Pacific. PI is in fact not a good predictor for observed TCs in the northwest Pacific, but is a reasonably good predictor of simulated TCs (a moderately better predictor in this basin is in fact the GPI vorticity term; not shown).

Investigation of SP sensitivity with an independent GCM, HadGEM3-GC31
With regard to the reproducibility of the results based on the Climate-SPHINX simulations, it can be useful to resort to analyzing the simulations based on another model, with a completely independent dynamical core and set of physical parameterizations. Sensitivity experiments to test the impact of SP were designed using an independent model, HadGEM3-GC31 (see section 2) with comparable scientific configuration (except for SKEB2 instead of SKEB), forcing, and resolution of N216 (Δx ≈ 60 km at 50°N), comparable to the LOW resolution in Climate-SPHINX. In contrast to the Climate-SPHINX approach, however, HadGEM3-GC31 was tuned with the SP schemes enabled; therefore disabling SP is bound to produce slightly worse results overall (e.g., in terms of radiative balance).
Three ensemble members (control experiment, with full SP) are compared in Figs. 13a-c to one ensemble member using no SP (corresponding to the SPHINX BASE configuration; Fig. 13d) and two further ensemble members (Figs. 13e,f), which used only a single SP package: SKEB2 in one case and SPT in the second case.
The results shown in Fig. 13 are fully consistent with Climate-SPHINX: by comparison of the panels, the use of SP increases the mean counts by up to 30% and does so in regions that are directly comparable to those discussed earlier in this study (see, e.g., Fig. 3). The sensitivity to the use of SP is clearly larger than the ensemble spread, which can be estimated by comparing Figs. 13a, 13b, and 13c. The fact that the response to SP is far larger than the internal variability of the model (spawned by perturbing initial conditions) also matches SPHINX results well.
In terms of location and extent, the areas of strong sensitivity also match what was found in the SPHINX experiments (e.g., for the North Atlantic, the northwest Pacific, and the southwest Pacific, east of Australia). The sensitivity in the north Indian Ocean is far less, but there is some sensitivity in the south Indian Ocean, which partially matches SPHINX (at least around Madagascar). The sensitivity in South America is a well-known characteristic of HadGEM models, for instance as a response to resolution enhancement, and has been shown in multiple papers in the course of the last 10 years (Strachan et al. 2013; Roberts et al. 2015). With that, these South Atlantic systems tend to be tropical depressions or hybrid systems that are included in observational counts in some other basins, albeit not all. In fact, the South Atlantic Ocean is not officially classified as a tropical cyclone basin by the World Meteorological Organization and does not have a designated regional specialized meteorological center (RSMC).
These independent model results confirm our first finding: that the use of stochastic physics reduces the mean error in the simulation of TC track densities.

Discussion
The richness of the Climate-SPHINX ensembles enables a robust comparative study of the effects of resolution and the effects of SP, individually and combined, in the simulation of TCs. Cyclonic disturbances from 3600 model years have been extracted, with 10 000 TC precursors identified each year in each hemisphere. As a result, we were able to study the characteristics and evolution of 36 million cyclonic disturbances, from TC precursors to warm-core TCs. Climate-SPHINX confirms other studies (Zhao et al. 2009; Strachan et al. 2013; Roberts et al. 2015; Shaevitz et al. 2014) on the role of resolution for TC simulation, but also offers important further insights. For instance, the more recent PRIMAVERA simulations (Roberts et al. 2020) took a multimodel heterogeneous ensemble approach, with individual centers submitting three ensemble members each, and this was enough to show robust intermodel agreement on the TC response to resolution. The small ensemble spread found in Climate-SPHINX, for means and variability of TCs, also supports the robustness of previous studies. This finding should, however, not undermine the value of a large ensemble, which will come into play once we start to analyze extratropical transition, landfall, and most of all the response of intensity to imposed climate change.
The novelty of this study lies in the robust comparison of the impact of the use of SP for the simulation of TCs, versus the traditional focus on just increasing resolution, under the premise that the two are to some extent equivalent (see Palmer 2019). The results of this study have indeed uncovered sensitivity to SP that mimics the effects of increased resolution (albeit with smaller amplitude) and is overall beneficial. This claim is supported, for instance, by comparing the maps of TC track and genesis density in both EC-Earth and HadGEM3-GC31. The linear response to SP at each resolution in Climate-SPHINX, and for each basin, as well as the remarkable ensemble coherence, add to the robustness of our results. There is, however, some localized evidence (east and southwest Pacific, despite substantial observational uncertainty; see Hodges et al. 2017) that the increase of the number of TCs for the higher resolutions may be excessive, and made worse by SP. SP has also been shown to have a far larger impact on the simulation of TCs than on the simulation of ETCs, which is important for the configuration of models used in operational prediction: Sanchez et al. (2014), for instance, also found a larger impact in NWP simulations of the tropics versus extratropics.
Analysis of the relative roles of seeding and of the environment, via the cyclogenesis term, for the mean number and distribution of TCs (e.g., in Fig. 7) indicates that the latter is more important and more strongly responding to SP and resolution, but can lead to excessive transition, especially clear in the southwest Pacific basin, seen as very large values of PC. The smaller role played by the seeding term, and its lack of sensitivity to resolution or SP, indicate that the relatively smooth seeding used by, for instance, Emanuel (2021), is likely adequate to study TC formation, as long as the significantly variable nature of environmental controls of cyclogenesis is then credibly taken into account. The finding, however, cannot be reconciled immediately with the importance of seeding put forward in Vecchi et al. (2019) in the context of climate change.
A follow-on study, in preparation, will therefore present results from an identical Climate-SPHINX ensemble, forced by an RCP8.5 scenario, as well as century-long coupled simulations at MEDIUM resolution.
In terms of the interannual variability of TC frequency, the correlation between simulations and IBTrACS observations for the EC-Earth configuration used in SPHINX is limited to 0.33 (HIGH-BASE) in the North Atlantic, which is rather poor when compared to past findings, for example between 0.6 and 0.7 as in Roberts et al. (2015) for the UPSCALE campaign, or Roberts et al. (2020) for the HighResMIP experiments, using the HadGEM3-GC31 climate model. The range of correlations in HighResMIP is 0.3-0.7, and EC-Earth, in a configuration nearly identical to Climate-SPHINX, with the same SST and sea ice (Kennedy et al. 2017), albeit with some different forcings (e.g., MAC aerosols; Mulcahy et al. 2018), shows a correlation of IV between observed and simulated TCs that is nearly identical to that in Fig. 8.
The correlation between observed and simulated TC IV in the northwest Pacific is at most 0.27 (ULTRA-HIGH BASE); in the east Pacific it is 0 at best (MEDIUM-BASE), and mostly negative. Roberts et al. (2020) found in PRIMAVERA-HighResMIP, using HadGEM3-GC31 at 25-km resolution, a correlation of 0.3 in the northwest Pacific and 0.5 in the eastern Pacific. However, at comparable resolution, HadGEM3-GC31 has substantially more TCs than EC-Earth (and even too many in the northwest Pacific), as seen in the multimodel intercomparison in Roberts et al. (2020), which was already found, to a lesser extent, in an earlier version of HadGEM3 at the same resolution (Roberts et al. 2015) that did not use SP.
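The correlations quoted above compare yearly ensemble-mean TC counts with observed counts. A minimal sketch of that calculation follows, assuming counts are stored as a members-by-years array; the function name and the synthetic numbers are illustrative only and are not SPHINX or IBTrACS data.

```python
import numpy as np


def ensemble_mean_correlation(model_counts, observed_counts):
    """Pearson correlation between ensemble-mean and observed annual counts.

    model_counts: array-like of shape (n_members, n_years).
    observed_counts: array-like of shape (n_years,).
    """
    model_counts = np.asarray(model_counts, dtype=float)
    observed_counts = np.asarray(observed_counts, dtype=float)
    # Average over ensemble members to obtain one value per year.
    ensemble_mean = model_counts.mean(axis=0)
    return float(np.corrcoef(ensemble_mean, observed_counts)[0, 1])


# Synthetic illustration (hypothetical numbers, not real data).
rng = np.random.default_rng(0)
observed = rng.poisson(12, size=30).astype(float)
members = observed + rng.normal(0.0, 4.0, size=(10, 30))
r = ensemble_mean_correlation(members, observed)
print(f"correlation of ensemble mean with observations: {r:.2f}")
```

Averaging over members before correlating removes part of the internal noise, so the result is generally higher than the correlation obtained from a single member.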
The small spread in the time series of TC counts in each basin (e.g., Fig. 8) and the inability to accurately reproduce observed variability suggest that the EC-Earth model, as configured for the Climate-SPHINX campaign, is more skillful at predicting itself than at predicting observed TC counts in each year. This is also evidenced by the SNR results (Fig. A1) and overall characterizes the models in Climate-SPHINX as overconfident. These results also agree with what Camargo and Barnston (2009) showed in terms of model overconfidence, in the case of initialized seasonal forecasts, and stress the fact that prediction of TC IV continues to constitute a challenge at the basic level of GCM formulation, which neither resolution nor SP is able to fully overcome.
SP enables a better representation of the seasonal cycle of TC frequency, particularly at the higher range of resolutions. The North Atlantic end-of-season response to SP is small, albeit robust, as even a small change near the end of August would impact the monthly means; this should be further investigated in (larger) ensembles of seasonal forecasts. To explain the seasonal skill, the simplest hypothesis would be that the response is due to the multiplicative nature of SPPT: the impact of the scheme should be larger when the variance of the deterministic tendencies is higher. It is reasonable to expect, a priori, that this coincides with the peak of the TC season, when the variance is presumably higher than at the tails of the season. This asymmetry might project onto the mean state changes to humidity, with a greater expected change at the peak of the seasonal cycle. For the northwest Pacific, however, the evolution of the seasonal anticyclone will also play a strong role at the end of the season, when TCs track northward, transporting much water vapor in the process, supplying other precipitation systems inland (see Guo et al. 2017).
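The multiplicative argument above can be made explicit with a schematic form of SPPT. This is a sketch only: $X$ denotes a deterministic physics tendency, $r$ a zero-mean random pattern assumed independent of $X$, and $\sigma_r^2$ its variance; operational details of the scheme (for example vertical tapering or the use of several patterns) are omitted.

```latex
\begin{align}
X_p &= (1 + r)\, X, \qquad \mathrm{E}[r] = 0, \quad \mathrm{Var}(r) = \sigma_r^2, \\
\mathrm{Var}\left(X_p - X \mid X\right) &= \sigma_r^2 \, X^2 .
\end{align}
```

The size of the perturbation therefore scales with the magnitude of the tendency itself, which is why a larger response would be expected when and where the deterministic tendencies are strongest.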
In terms of environmental controls of TC interannual variability, the extensive review in Camargo and Wing (2016) pointed out that there is a stronger relationship between model GPI and observed TC variability than with the model TC variability. These conclusions still apply to the present study, where we showed that the environment is in general very well simulated, as compared to ERA5, and that the correlations of the model GPI with the observed interannual TC frequency are often twice as large as the correlation between simulated and observed TCs. The relative roles of the individual terms of the GPI are, however, moderately sensitive to resolution and to SP, with some important distinctions, depending on the basin.
The role of shear for TC development is well established, and the dependence of shear on model resolution in this context has been discussed in Bell et al. (2013) and Roberts et al. (2015), among others. The same relationship is apparent in SPHINX, and vertical shear appears to respond to resolution and SP in terms of climate means. For IV, it is an important driver in the North Atlantic and south Indian Ocean, although not so across the rest of the globe, which could be related to the overall problems in storm structure (EC-Earth TCs are too large compared to observations and other models; see Vanniere et al. 2020).
The role of atmospheric moisture appears to be comparatively more prominent for simulations in which SP has been activated; this sensitivity has also been pointed out in previous studies by Bell et al. (2013) and Camargo et al. (2020). More specifically, in the context of SP studies, this is consistent with the idea put forward in Strommen et al. (2019b) that, by broadening the distribution of humidity tendencies, SPPT has a particularly strong impact on humidity, due to the nonlinearities associated with condensation. For example, for a parcel of air close to saturation, a tendency perturbation in one direction may trigger condensation, increasing the total liquid water, while the opposite perturbation will not. Changes in available moisture might therefore be expected to be magnified in a model with SPPT turned on, and for the specific problem of TC simulation, with potential for local feedbacks. A cautionary note is needed here with regard to the lack of moisture conservation when using SPPT, which required a fix [see discussion in Davini et al. (2017)]. The full impact of the moisture fix is not known, since the model will not run without it, and yet sensitivity to SP in SPHINX simulations shows a stronger impact of the moisture field on TCs, while ETCs are less impacted, possibly due to the moisture response being limited to the tropics. RH is, however, not a useful predictor in terms of IV.
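The asymmetry described above for a near-saturated parcel can be illustrated with a toy calculation. The sketch below assumes a simple threshold (any humidity above saturation condenses) and Gaussian, zero-mean perturbations; the function name, units, and numbers are hypothetical and do not represent the model's condensation scheme.

```python
import numpy as np


def mean_condensate(q, q_sat, perturbation_std, n_samples=100_000, seed=0):
    """Mean condensate produced when zero-mean noise is added to humidity q.

    Condensation is modeled as a threshold: only the excess of the perturbed
    humidity over the saturation value q_sat condenses.
    """
    rng = np.random.default_rng(seed)
    dq = rng.normal(0.0, perturbation_std, size=n_samples)
    condensate = np.maximum(q + dq - q_sat, 0.0)
    return float(condensate.mean())


q_sat = 20.0  # illustrative saturation specific humidity (g/kg)
q = 19.5      # parcel just below saturation (g/kg)

unperturbed = mean_condensate(q, q_sat, perturbation_std=0.0)
perturbed = mean_condensate(q, q_sat, perturbation_std=1.0)
print(f"condensate without perturbations: {unperturbed:.3f} g/kg")
print(f"condensate with perturbations:    {perturbed:.3f} g/kg")
```

Although the perturbations average to zero, the threshold means that only the positive ones produce condensate, so the mean response is nonzero.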
The thermodynamic efficiency term shows promise in explaining the different TC responses for the North Atlantic and south Indian Ocean domains. This is particularly evident in the lower range of resolutions, in which SP is more active in terms of changes in the probability of cyclogenesis (e.g., see Fig. 7). The thermodynamic efficiency term shows correlations with observed TC frequencies (IBTrACS) that are nearly twice as large as the correlations between observed and model-produced TC frequencies. It is reasonable to expect that this in part reflects the use of observed SSTs and that the importance of this term would be diminished in AOGCM simulations. Future work on existing Climate-SPHINX data will in fact exploit the coupled atmosphere-ocean experiments, with a special focus on TC intensity, for which there was no space in this paper, as well as the 2070-2100 (FUTURE) atmosphere-only simulations, expanding the investigation of TC predictability.
While it is notable that thermodynamic efficiency is found to play an important role in these simulations (and this reflects the importance of the outflow temperature variability), the SPHINX models develop a substantial cold bias as resolution is increased, seen when plotting the tropical tropopause layer (TTL) temperature time series (not shown), which is the reason for the errors seen in the PI terms. This can in part be attributed to the choice of not retuning the models at each resolution, so it is not a final statement on whether the models are "fit for purpose" in terms of climate change. In fact, this is very important for our trust in future projections, as Emanuel et al. (2013) (in the context of decadal to interannual TTL signals) suggest that "the failure of most GCMs to capture this cooling must be addressed before such models can be used to project future changes in tropical cyclone activity" (p. 2300). It should not be too hard to adapt EC-Earth to more reliably simulate the future of TCs: to improve the overall simulated environment, tuning TOA radiation is normally easier than tuning surface layer similarity (often done for improving TC wind-pressure relationships and air-sea fluxes). Given the computational costs involved, exploiting IV for carefully selected periods is more economical than running long climate simulations.
The apparent skill in GPI, particularly of the thermodynamic efficiency term in the North Atlantic and south Indian Ocean, presents a paradox, pointing to other types of model formulation uncertainty, unaffected by resolution or SP, that seem to prevent the model from exploiting its own potential source of predictability. The fixed roles of dynamics and physics could be unsuitable for such a large range of resolutions [see discussion in Vanniere et al. (2020)], affecting their interaction with the large-scale environment. Recent development work with the IFS (e.g., Magnusson et al. 2019), on surface friction, seems to have already improved the EC-Earth model with respect to TC simulation, and some of these advances are in fact evidenced by the results in Roberts et al. (2020).
In terms of future directions, the resources used to produce the T1279-BASE simulation could have enabled a further four to five ensemble members with T799-STOC, bringing it to eight, or enabling more sensitivity experiments. For instance, as a general recommendation, we endorse Strommen et al. (2019a) in pointing out that model tuning performed on SP configurations may benefit the prediction of TCs in an operational setting; this may require SP and even resolution-based tuning in the case of scale-aware SP parameterizations, such as SKEB2. This requires further research, as SKEB2 has been shown to have a larger impact than SPT (as exemplified by Fig. 13). It is important to remember that EC-Earth is a spectral hydrostatic model, while HadGEM3 is a grid point nonhydrostatic model: horizontal advection and wave propagation are, for instance, substantially different, and so is the representation of tropical convection, so that the superposed effects of SP cannot be expected to be the same. Moreover, the type of SP formulation (SKEB2 instead of SKEB), as well as the individual parameter settings in HadGEM3-GC31, are different from what has been used in Climate-SPHINX, as well as different from the parameter settings for EC-Earth in PRIMAVERA-HighResMIP [the models analyzed by Roberts et al. (2020) and Vanniere et al. (2020)], all of which points to more fundamental aspects of model formulation to explain the different responses, and a large number of individual sensitivity studies would be needed for robustness. None of the above detracts from the strong finding that the overall sensitivity of TC simulation to resolution and SP is consistent across two entirely independent CMIP6 GCMs.

Summary and conclusions
Statistics of TCs identified in Climate-SPHINX confirm past and current findings that increasing model resolution systematically improves the simulated climatology (numbers and distribution) in both hemispheres. The use of stochastic physics further increases the number of TCs, by approximately 30%, when compared to the base simulations, in a spatially realistic way, thus representing a surrogate for resolution.
Analysis of the impact of SP as a cause of additional TC seeding, versus increasing the probability of transition of TC seeds to TCs through modification of the TC environment, points to the latter being the prevalent effect. Further analysis, focusing on the interannual variability of TC numbers per year, indicates that it is larger overall, and more realistic, when applying SP and/or enhancing resolution. Unfortunately, the increased IV of simulated TCs does not translate into significantly enhanced skill in terms of predicting the annual number of observed TCs: if anything, SP seems to be adding more noise than signal to the problem of predicting the annual number of TCs. The representation of the seasonal cycle of TC frequency is, however, improved by the use of SP at the higher range of resolutions.
From the point of view of TC predictors, the realistic performance of the thermodynamic efficiency term (part of the potential intensity calculation) applied to the EC-Earth simulations provides a stringent test to strengthen our trust in the use of EC-Earth (or similar GCMs) for the prediction of TC responses to climate variability and change, even in the case of models at moderate resolution. However, for this generation of EC-Earth specifically, other aspects of simulation fidelity need to be improved before this potential predictability can be exploited. The parsimonious nature of SP can in fact be exploited for further model development.
Science for Service Partnership (CSSP) China as part of the Newton Fund.
Data availability statement. Climate SPHINX experiments are available via a dedicated THREDDS Web Server hosted by CINECA (https://sphinx.hpc.cineca.it). Details on the data accessibility and on the Climate SPHINX project itself are available on the official website of the project (http://www.to.isac.cnr.it/sphinx/). The HadGEM3-GC31 baseline SP simulations are available on ESGF. UM datasets and tracks are available via the PRIMAVERA website: https://www.primavera-h2020.eu/modelling/data-code/.

APPENDIX
The Signal-to-Noise Ratio of Simulated TC Interannual Variability
As a way to further interpret Fig. 8, we computed the signal-to-noise ratio (SNR):
$$\mathrm{SNR} = \sigma^2_{\mathrm{signal}} / \sigma^2_{\mathrm{noise}}, \qquad \text{(A1)}$$
where $\sigma^2_{\mathrm{signal}}$ is the signal variance of the model ensemble mean in time, while $\sigma^2_{\mathrm{noise}}$ is the variance of the ensemble members about the ensemble mean [the spread, as in Eade et al. (2014)]. The analysis was made more robust by resampling with bootstrapping, to get 1000 artificially generated ensembles (for each of the stochastic/deterministic ensembles at each resolution). The 95% confidence interval obtained in this way is used to test statistical significance. Additionally, the three higher resolutions were combined, in order to create an ensemble size comparable to that of COARSE and LOW.
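The SNR calculation and its bootstrap confidence interval can be sketched as follows. This is an illustration under stated assumptions: counts are stored as a members-by-years array, the bootstrap resamples ensemble members with replacement (the text does not specify what is resampled), and the function names are hypothetical.

```python
import numpy as np


def variance_components(counts):
    """Return (signal variance, noise variance, SNR) for annual TC counts.

    counts: array-like of shape (n_members, n_years).
    Signal: variance in time of the ensemble mean.
    Noise: variance of the members about the ensemble mean (the spread).
    """
    counts = np.asarray(counts, dtype=float)
    ensemble_mean = counts.mean(axis=0)
    signal = ensemble_mean.var(ddof=1)
    noise = ((counts - ensemble_mean) ** 2).mean()
    return signal, noise, signal / noise


def bootstrap_snr_interval(counts, n_boot=1000, seed=0):
    """95% interval of the SNR from ensembles resampled with replacement."""
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n_members = counts.shape[0]
    snr = np.empty(n_boot)
    for i in range(n_boot):
        # Assumption: each artificial ensemble is a resampling of members.
        idx = rng.integers(0, n_members, size=n_members)
        snr[i] = variance_components(counts[idx])[2]
    lower, upper = np.percentile(snr, [2.5, 97.5])
    return float(lower), float(upper)


# Synthetic illustration (hypothetical numbers, not SPHINX output).
rng = np.random.default_rng(1)
forced = np.linspace(8.0, 14.0, 30)
members = forced + rng.normal(0.0, 2.0, size=(10, 30))
signal_var, noise_var, snr_value = variance_components(members)
snr_low, snr_high = bootstrap_snr_interval(members, n_boot=200)
print(f"SNR = {snr_value:.2f}, 95% interval = [{snr_low:.2f}, {snr_high:.2f}]")
```

Two configurations would then be judged different when their bootstrap intervals do not overlap, which is one simple reading of the significance test described above.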
Figure A1 summarizes the findings so far, with STOC in red and BASE in blue. In terms of signal, STOC is significantly higher than BASE for the MEDIUM resolution, but otherwise not distinguishable, with no significant differences overall. Across resolutions it is hard to distinguish any two adjacent resolutions from each other, but the (combined) high resolutions are in fact significantly higher than the lowest resolutions, suggesting that the increase is robust. For SNR, STOC is significantly lower for the (combined) HIGH experiment, and just barely not significantly lower for COARSE. No significant differences in SNR were found for MEDIUM and no distinguishable change was found across all resolutions.
The clearest and most robust change across both resolution and stochastic physics is in the noise component, shown in Fig. A2. A significant increase in noise is seen for both basins, with each successive increase of resolution, when pooling both BASE and STOC (right column of Fig. A2). SP significantly increases the noise in four of the six cases, with a particularly pronounced impact in the North Atlantic. Because the SNR metric does not experience a similarly robust increase, the data suggest that the first-order impact of increased resolution, and, to a lesser extent, SP, is mainly to increase internal variability. Additional model tuning is likely required in order to obtain comparable increases in the signal and associated improvements in predictability.
Figure 1 shows the number of tropical depressions, tropical storms (combined), and tropical cyclones, by category, identified by TRACK in each hemisphere. We have used a classification based on MSLP, as in Klotzbach et al. (2020), because of their strong arguments with regard to the study of the impact of TCs. This classification also enables us to identify TD+TS in IBTrACS, and at the same time it avoids the peculiarities of surface layer extrapolation of model level winds, a problem that has been particularly evident in recent intercomparisons (see, e.g., Roberts et al. 2020). Figure 1 shows the different SPHINX simulations (BASE with B suffix and STOC with S suffix), organized with increasing resolution (from left to right) and each SP experiment (S) inserted between each pair of resolutions (B). The height of the bars for each category suggests that SP is equivalent to an in-between model resolution when it comes to how many TCs are produced in each hemisphere. The results compare well to past studies, indicating that, as model resolution is increased, more TCs are produced in the climate simulation, with an initial steep increase for the weaker categories, up to T255, then tending to reach a plateau. As in previous GCM intercomparison studies (e.g., Roberts et al. 2020), a realistic simulation of the frequency of TC categories 4 and 5 is still elusive, even at the highest resolution (T1279), which nonetheless demonstrates a tendency for resolution and SP to both be beneficial. A recent intercomparison study by Judt et al. (2021), in which the PRIMAVERA-HighResMIP IFS model participated, albeit with convection explicitly represented, indicates that the full spectrum of TC intensities (both winds and central pressure) does become more realistic at 5-km resolution for that particular model configuration. The climatology of the spatial distribution for the cyclonic disturbances counted in Fig. 1 confirms that these are indeed tropical cyclones, with spatial signatures that bear strong resemblance to observations. Figure 2 shows the track density
FIG. 2. TC track densities from the IBTrACS dataset for the Climate-SPHINX simulation period. Track density units are number per month per unit area, where the unit area is a 5° spherical cap.
FIG. 3. TC track densities for the EC-Earth (IFS) model, Climate-SPHINX campaign. Blue numbers in each panel indicate the number of storm transits per month for each sector, while black numbers in the top row are the IBTrACS observations. Stippling indicates significance at the 5% level; track density units are number per month per unit area, where the unit area is a 5° spherical cap.
Figure 5 shows that PC for track density is as high as 30% in large regions of TC activity: the highest values are found in the northwest Pacific and the largest changes correspond to the change in resolution, with SP showing a similar (albeit smaller) response. A few cases of reduction in PC when using SP are found, as large as −10%, as seen for instance in the eastern Pacific. These results are comparable with those in studies considering developing versus nondeveloping tropical synoptic disturbances identified in reanalyses in both the Atlantic and northwest Pacific (Fu et al. 2012; Peng et al. 2012; Brammer and Thorncroft 2015; Hankes et al. 2015), where PC can be as high as 30% or more. Figure 5 also shows the GPI, computed from the Bister-Emanuel algorithm (Bister and Emanuel 2002), as shaded colors. The GPI fields shown here indicate that the mean TC environment responds less to the introduction of SP, or increase of resolution, than the PC itself. The figure also indicates that GPI peaks at the point of entry of the most active TC regions [e.g., in the North Atlantic main development region (MDR)]. The subtler aspects of the response of GPI to SP and resolution will be explored in a later section, particularly for variability.

FIG. 4. TC genesis densities for the EC-Earth (IFS) model, Climate-SPHINX campaign. Stippling indicates significance at the 5% level. Genesis density units are number per month per unit area, where the unit area is a 5° spherical cap.

FIG. 5. The SPHINX models' GPI (shaded color mesh) and the probability of cyclogenesis (PC, %) from TC seeds to TCs (contours), as a function of resolution (rows, with COARSE at top and ULTRA-HIGH at bottom) and SP treatment (columns: BASE on the left and STOC on the right). Contour levels for PC are shown in the lower box.

FIG. 6. The relative roles of seeding and cyclogenesis in creating a TC response to SP at each resolution: (left) total count, (center) role of seeding, and (right) role of cyclogenesis. Units are as in Fig. 3.

Figure 9 shows the mean annual cycle for the North Atlantic (top) and northwest Pacific basins (bottom), defined as in

FIG. 7. The environment governing TC seed transition in the NH: (top left) GPI and its components (top right) RH, (bottom left) vertical wind shear, and (bottom right) potential intensity. All are scaled according to the GPI equation. Superposed are contours of the change in the probability of cyclogenesis (ΔPC). Contour levels are shown in the lower box.

FIG. 8. TC interannual variability for the (top left) North Atlantic, (top right) northwest Pacific, (bottom left) south Indian Ocean, and (bottom right) southwest Pacific for each SPHINX simulation. BASE experiments are shown as continuous lines, while STOC experiments are shown by the dotted lines. The ensemble spread, defined as the area between the ensemble minimum and ensemble maximum for each year, is shown by the shaded areas. IBTrACS observations are shown by the thick dashed lines. The yearly Pearson correlation between ensemble mean and IBTrACS is reported for both BASE (r_B) and STOC (r_S) to the right of each panel.

FIG. 9. Area-averaged TC seasonal variability by month (ensemble climatologies), with IBTrACS observations shown in the right column of each panel, for (top) the Atlantic basin and (bottom) the northwest Pacific. With regard to the monthly split, a tropical cyclone is attributed to a specific month according to its date of genesis.

FIG. 10. (left) Time series of GPI and (right) annual correlations with observed environment and TCs, shown for the (top) North Atlantic and (bottom) northwest Pacific, with lowest resolution at the top of each panel and highest resolution at the bottom. A symbol to the left of each season in the monthly time series indicates the annual mean, for each experiment and for ERA5 observations; the shaded interval is a measure of the ensemble spread. Correlations are reported for both BASE (r_B) and STOC (r_S) to the right of each panel, as four boxes: (i) model environment to model TCs (top left); (ii) model environment to IBTrACS TCs (top right); (iii) model environment to ERA5 environment (bottom left); and (iv) model TCs to IBTrACS TCs (bottom right).

FIG. 11. As in Fig. 10, but for the GPI thermodynamic efficiency term (part of PI).

FIG. 12. Scatterplots of the IV correlations, corresponding to the boxes in Figs. 10 and 11, for (top) GPI, (middle) PI, and (bottom) the PI thermodynamic term for the (left) North Atlantic and (right) northwest Pacific basins. The size of the symbols corresponds to the different resolutions, with the smallest symbols identifying COARSE and the largest symbols identifying ULTRA-HIGH.
FIG. 13. TC track densities for the HadGEM3-GC31 model, as configured and run for the PRIMAVERA/HighResMIP campaign. Shown are (a)-(c) three ensemble simulations with full SP, and simulations with (d) SP fully disabled, (e) SKEB2 disabled (SPT only), and (f) SPT disabled (SKEB2 only). Track density units are number per month per unit area, where the unit area is a 5° spherical cap.
FIG. A1. TC variances (total, signal, and noise) based on the ensemble interannual variability for different domains in Climate-SPHINX. The symbol sizes are proportional to the signal-to-noise ratio. Units are TCs per year.

TABLE 1. The experimental configuration in SPHINX.

TABLE 2. All NH TC precursors, TC seeds, TCs, number per year.

TABLE 3. All SH TC precursors, TC seeds, TCs, number per year.

TABLE 4. All NH ETC precursors and ETCs, number per year.

TABLE 6. TC variability (standard deviation of annual means) in selected basins: NATL = North Atlantic; NWPAC = northwest Pacific; EPAC = eastern Pacific; NIND = northern Indian Ocean; SIND = southern Indian Ocean; and SWPAC = southwest Pacific. Units are TCs per year.