Single Precision in Weather Forecasting Models: An Evaluation with the IFS

Earth’s climate is a nonlinear dynamical system with scale-dependent Lyapunov exponents. As such, an important theoretical question for modeling weather and climate is how much real information is carried in a model’s physical variables as a function of scale and variable type. Answering this question is of crucial practical importance given that the development of weather and climate models is strongly constrained by available supercomputer power. As a starting point for answering this question, the impact of limiting almost all real-number variables in the forecasting mode of ECMWF Integrated Forecast System (IFS) from 64 to 32 bits is investigated. Results for annual integrations and medium-range ensemble forecasts indicate no no- ticeable reduction in accuracy, and an average gain in computational efﬁciency by approximately 40%. This study provides the motivation for more scale-selective reductions in numerical precision.


Introduction
A question of both theoretical and practical importance is how many bits of real information are carried in the hundreds of millions of variables in a weather or climate prediction model. Related to it is the need of minimal numerical precision of weather models still sufficient to grant a simulation of the desirable quality. For example, since the small-scale variables in a dynamical core are strongly influenced by the irreducible stochastic uncertainty in subgrid parameterization schemes, one can expect that fewer bits will be needed to encode the real information content contained in small-scale variables, compared with large-scale variables (Palmer 2014). Studies that have tried to reduce precision to a level that can be justified theoretically using a scale-dependent approach for an idealized atmospheric application and a spectral dynamical core suggest that precision can be reduced significantly (see Düben et al. 2014;Düben and Palmer 2014;Thornes et al. 2017).
Naturally since the early days of numerical weather prediction the issue of appropriate numerical precision has been the subject of considerable interest (see Baede et al. 1976). Some precision-sensitive operations, such as matrix inversions, often require so-called doubleprecision (64 bit) representation of real numbers. But the common practice to keep double precision as the exclusive representation of all real numbers in the models remains questionable. Reducing the number of bits used to represent weather and climate variables to levels that can be justified theoretically could have a substantial impact on reducing data movement and processing time, resulting in savings that could be reinvested to increase model resolution and hence to improve forecasts.
There are many opportunities to reduce the numerical precision in models. For example, future computer hardware will offer a choice of double, single (32 bit), and additionally half (16 bit) floating-point precision. Furthermore, programmable hardware may allow variables with fully variable levels of precision [see Düben et al. (2015)].
As a motivation to undertake such an activity, we study the impact in a global operational weather prediction model of reducing the precision of real number variables from double to single precision. Although we do have spectral representation of some of the model fields there is no scale-selective approach in the Integrated Forecast System (IFS). Most of the computational expense occurs in the gridpoint space where all scales are treated jointly. Still, we think it is of value to investigate the impact of changing from double to single precision on forecast accuracy. There are few examples of realistic models that run with single precision in the literature, such as the U.S. Navy NOGAPS (Hogan and Rosmond 1991) or the COSMO model (Rüdisühli et al. 2013). No systematic study exists, however, comparing double-and single-precision versions of a comprehensive, operational, global weather forecast model. Here we present an analysis of the ECMWF IFS spectral model (ECMWF 2016b).
In section 2 we describe the model setup and outline the necessary code changes required to obtain a model with reduced precision yet practically identical forecast skill. Results from extensive qualitative and quantitative comparisons are presented in section 3, with further discussion in section 4.

Code modifications allowing the use of single precision
The version of IFS used for the model simulations was based on model cycle CY41R2 (operational cycle at ECMWF since March 2016). The presented simulations are not coupled to an ocean model, though coupling to the operational wave model is included.
Technically the switch between single-and doubleprecision arithmetic is realized through the FORTRAN KIND parameter of all real variables in the code. This KIND parameter is defined in one module to modify the precision of all real variables. Additional code changes are required to allow both single-and double-precision execution with the same code. Those rather technical changes were as follows: d Replace hard coded thresholds and security constants by parameters defined with intrinsic functions of precision [e.g., to replace code such as ''x 5 10.E+100'' by an intrinsic FORTRAN function ''x 5 huge(x)''].
d Extend code interfaces with I/O, MPI, and mathematical libraries (LAPACK/BLAS) to handle both double-and single-precision data.
d Ensure system functions (such as the system clock) are independent of the requested model precision.
d Make sure binary input files are read with the appropriate precision level.
To avoid run-time problems (i.e., floating-point exceptions or under/overflow), the model code had to be further analyzed to ensure all of the computations remain within the floating-point interval available to single precision (between 2 2126 and 2 127 ). These changes were mostly seen as desirable in any case in terms of securing the general performance of the model, regardless of its actual precision. Typical changes in this respect were as follows: d Increase security by avoiding divisions by floatingpoint numbers that may become zero.
d Rewrite a few formulas to make sure that all intermediate results stay within the single-precision range [e.g., rewrite A 4 /B 3 into A(A/B) 3 ].
Finally, to ensure that the single-precision model delivers meteorologically similar results to the doubleprecision reference, three parts in the model code had to be further modified.
d The cost of the radiation scheme is reduced by recomputing the fluxes on a longer time step than the model step. The contribution to a standard time step has to be upscaled from this coarser temporal sampling. This procedure was found responsible for introducing of an error for low solar zenith angles (below 868) in the shortwave radiation code. The double-precision version of the code using a lower threshold to truncate numerical noise was in this case responsible for amplifying a higher amount of noise. This led to the difference of results from the two precision options. By rewriting the code to an equivalent but simpler formula, substituting the implicit and zero-division protected double division of the same variable by single multiplication (not requiring a protection), equal and improved performance for both precision levels was achieved.
d The setup for Legendre transformations, mainly the calculations of the roots and the associated polynomials, is very sensitive to round-off errors due to their recurrence relations. Before the Swarztrauber algorithm (Swarztrauber and Spotz 2000) in the Legendre transforms was implemented in the IFS, this part of the model had to be computed with quadruple precision for resolutions higher than ;40 km. When this code is run in single-precision arithmetic, the model results for higher resolution are substantially degraded. The solution adopted was to ensure that this code is always evaluated in double precision during the model setup (first time step). The Legendre transformation during the standard time step could then be evaluated in single precision.
d Use of single precision in the vertical finite element scheme (VFE) (Untch and Hortal 2004) was found responsible for amplifying model error through the pressure gradient term evaluation. This resulted in unacceptably poor mass conservation. The adopted solution, which almost entirely solved this issue, was to precompute the VFE integral operator with doubleprecision arithmetic. The resulting operator was truncated to a chosen precision.
The code changes outlined above have been regarded as beneficial with respect to code quality, an improved computational efficiency, and stability of the model. The exercise of exploring use of single precision exposed various problematic places in the code that could potentially also degrade the forecast when using double precision.

Results
In this section, we compare results of the experiments in single and double precision of long-term simulations and medium-range ensemble forecasts. The former test serves to assess a realism of the model-generated climate when comparing it with the reality (represented by ERA-40 and observations), while the latter test gives a quantitative comparison of the two configurations. Both of them represent typical experiments used for relative quick evaluation of scientific or technical modifications of the model.

a. Long-term simulations
To assess the model climate and compare against observations, a 4-member ensemble started from initial time shifted by 36 h to mitigate an effect of diurnal cycle was integrated for 13 months. All integrations are run in uncoupled mode, with prescribed sea surface temperatures, where the first month of the integrations was discarded. The 12-month integrations for 2000-01 are compared with 35 observational datasets and model reanalysis. A pressure mass fixer is used to maintain mass conservativeness for long simulations (Rasch and Williamson 1990). Experiments use T399 (;50 km) horizontal resolution and 137 vertical levels. Examples from this diagnostic tool could be found on the ECMWF website under model climate evaluation (ECMWF 2016a).
The differences between the two experiments in single and double precision are generally small relative to the magnitude of the systematic forecast errors. The main differences were observed in midlatitudes and polar areas, typically above continents. This corresponds with the high flow variability in these parts of Earth. The maximum difference of annual zonal averages of the two sets of climate simulations did not exceed 2% in relative humidity, 0.5 K in temperature, and 1 m s 21 in flow speed. In terms of scores computed against observations it was very hard to choose the winner from the two tests. Table 1 summarizes root-mean-square errors from annual means of various model parameters as they are compared against the different datasets. It is quite evident that the single-precision arithmetic is capable of offering very competitive performance to the doubleprecision reference. Figure 1 shows the spatial distribution of the difference between the averaged precipitation and the GPCP2.2 dataset (Figs. 1a and 1b) and the difference of the TOA shortwave flux compared against CERES-EBAF data (Figs. 1c and 1d). The doubleprecision IFS outperforms the single-precision IFS for precipitation while shortwave flux favors the opposite conclusion. In either case the error patterns from the two experiments are very similar.

b. Medium-range ensemble forecasts
Ensemble forecasts were run 15-day forecasts for 46 start dates that were distributed evenly over an entire year with 50 ensemble members each, starting from 4 December 2013 to 29 November 2014. The resolution used is T399 with 91 vertical levels. Figure 2 shows the continuous ranked probability score (CRPS; Hersbach 2000) for the experiment run in single precision (dashed red) and double precision (blue) for geopotential and temperature.
The results of the single-and the double-precision experiments are very similar in the extratropics and the small visible differences might still be influenced by the limited sample size (Figs. 2a and 2b). For 500 hPa in the tropics, there is a more striking difference between the experiments (Fig. 2c). These differences stem from a known problem of the IFS. It does not conserve mass, rather it gains or loses mass depending on the combination of time step and resolution, which then results in a forecast bias. This bias then can strengthen or compensate other biases in the model. Hence, it can appear in the scores as an improvement or degradation. Especially in the tropics the scores are quite sensitive to biases, namely geopotential at 500 hPa, due to the smaller variability in comparison to the extratropics. To test further the sensitivity of the single-precision results, two additional experiments are run with the mass fixer activated so the model conserves global mass. With the fixer enabled, the differences in the extratropics between the single-and double-precision experiment become even smaller and the differences in the tropics are also very small as it could be seen with nearly indistinguishable CRPS scores (Fig. 3).

c. Computing cost in single and double precision
To evaluate the performance gain brought by the single precision, the two sets of ensemble described in section 3b were evaluated on the ECMWF Cray XC30. The system at that time consists of two Cray XC30 compute clusters, each having some 3500 compute nodes with Intel E5-2697v2 (Ivy Bridge) processors. Each node has in total 24 physical cores running at 2.7-GHz clock frequency. All experiments were submitted with the same topology having 96 MPI tasks and 8 OpenMP threads running on 768 cores.
To mitigate the variability in job execution time when running on a nondedicated cluster caused by contention on shared resources, MPI network, and file system, all perturbed ensemble members (i.e., excluding the control forecast running with slightly different setting) were considered. The minimum time for the model time step was used as a reliable measure of the undisturbed setup. One experiment as explained in section 3 then represents 2300 forecast jobs. Every 15-day forecast requires 720 time steps to be completed making altogether The performance of the single-precision executable for the T399 resolution IFS took 37% less run time with respect to the reference double precision. A further gain was achieved after considering the difference in cache utilization between the single-precision and doubleprecision versions and hence increasing the length of the inner ''DO loops'' in the model. Nearly all inner loops in gridpoint space are controlled by a parameter set through the namelist. This concept originally designed for vector computers has been useful in controlling cache optimization on scalar machines. The default value of this parameter (i.e., 16) is optimized for double precision. Another ensemble experiment was completed with this parameter doubled to 32. This experiment using the single-precision code with longer inner loops was 40.7% more efficient than the double-precision experiment. This translates to the average reduction of over 45% of the total wall-clock time for the singleprecision jobs, to give an estimate of how those absolute numbers impact the real computation in the concurrent environment of the standard load at the ECMWF computer.

Discussion and outlook
We have shown the potential of extensive use of single precision in the ECMWF IFS. We have been able to reproduce very similar results with IFS in terms of model seasonal variability and ensemble forecasting system for a reduced run time of approximately 40%.
The presented simulations exclusively focused on the atmospheric component of the IFS. This represents only one part of the entire forecast system that also includes other major components such as data assimilation, data handling, and ocean model. It is important to further explore a reduction in precision for the fully coupled system. Further work to verify the model performance and skill at higher resolutions or for climate modeling is also required. This current work will also be the starting point for further studies on the use of reduced precision beyond single precision in both the physical parameterization schemes and the spectral core of the IFS, and in particular, a reduction of numerical precision with decreasing spatial scale following the work of Düben et al. (2014) and Düben and Palmer (2014). This study is part of a larger scientific and technical program to develop models that only retain those bits of information that can be justified, either theoretically or practically. Therefore, we want to remind the reader of the benefit and impact that an increase in performance by 40% could potentially bring to the scientific community for both weather and climate simulations such as a significant increase in vertical resolution, an increase of ensemble size for ensemble forecasts, or an increase of horizontal resolution at the same computational cost. To illustrate the potential of single precision one can do a simple experiment comparing the single-precision full ensemble with 50 members with a reduced ensemble being composed from 34 members of a double-precision ensemble, both representing roughly equal computational cost. The attached Fig. 4 with CRPS scores demonstrates the potential gain in terms of forecast quality one could get without requiring more computing resources.
In addition to the reduced computational cost, the process of testing single precision through the code of the IFS has been found generally beneficial in terms of detecting and fixing badly conditioned code. In the future we would like to review more configurations with the single-precision alternative (such as the tangent-linear code) to reveal potentially problematic code there.
The single-precision version of IFS documented in this paper yields overall meteorologically equivalent results at the resolutions examined here (up to 50 km) while significantly reducing the computational resources. These experiments do not demonstrate whether this low precision approach will still work at any higher resolution, though they do not imply any evidence to the FIG. 2. CRPS of (a) Northern Hemisphere geopotential height at 500 hPa, (b) Northern Hemisphere temperature at 850 hPa, and (c) geopotential height at 500 hPa in the tropics; experiment with single precision (sp, dashed red) and double precision (dp, blue). contrary. These conclusions show the potential to save electric power consumption and computational time by investing in more sophisticated codes that can handle reduced precision arithmetic. This may guide the design of future weather forecasting and climate models. We see this as a first step in the path toward the development of energy efficient high-resolution weather and climate models, and crucial to the goal of developing global cloud resolving models at the earliest opportunity [see also Palmer (2015)].