Index

Datasets and scripts accompanying the manuscript “Runaway GC evolution in gerbil genomes” by Rodrigo Pracana, Adam D. Hargreaves, John F. Mulley and Peter W. H. Holland.

Dataset 1 - High GC region transcripts
Assembled transcripts for genes in the high-GC region described by Hargreaves et al (2017) in four gerbil species (Psammomys obesus, Meriones unguiculatus, Meriones libycus and Meriones shawi).

Hargreaves et al (2017) Genome sequence of a diabetes-prone rodent reveals a mutation hotspot around the ParaHox gene cluster. Proceedings of the National Academy of Sciences, 114 (29) 7677-7682; DOI: 10.1073/pnas.1702930114

Dataset 2 - High GC region substitution rates and GC-content
GC-content (GC, GC1, GC2, GC12, GC3) and rates of synonymous (dS) and nonsynonymous (dN) substitutions for each mutational category (weak-to-strong [WS], strong-to-weak [SW], weak-to-weak [WW] and strong-to-strong [SS]) for 27 genes in the high-GC region described by Hargreaves et al (2017) in four gerbil species (Psammomys obesus, Meriones unguiculatus, Meriones libycus and Meriones shawi) and two murine species (Mus musculus and Rattus norvegicus). The rates are measured from the point of murine-gerbil divergence in the species tree to the tips representing each of the six species.

Dataset 3 - Psammomys obesus CDS
Predicted Psammomys obesus coding sequences for 8,809 groups of orthologous genes.

Dataset 4 - Psammomys obesus peptide sequences
Predicted Psammomys obesus peptide sequences for 8,809 groups of orthologous genes.

Dataset 5 - Orthogroup IDs
Protein accession IDs for each species in 8,809 groups of orthologous protein-coding genes.
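
The GC-content measures listed for Dataset 2 (GC, GC1, GC2, GC12, GC3) can be illustrated with a minimal Python sketch. It is not the code used for the manuscript. It assumes the usual definitions: GC1, GC2 and GC3 are the GC fractions at the first, second and third codon positions, and GC12 is the fraction over first and second positions combined. The function name `gc_by_codon_position` and the handling of ambiguous bases (skipped) are assumptions made for illustration.

```python
def gc_by_codon_position(cds):
    """Return GC, GC1, GC2, GC12 and GC3 as fractions for an in-frame CDS.

    Assumption: the sequence starts at codon position 1. Gaps and
    ambiguous bases are ignored when counting.
    """
    seq = cds.upper()
    # Drop any trailing partial codon so positions stay in frame.
    seq = seq[: len(seq) - len(seq) % 3]

    def gc_fraction(bases):
        counted = [b for b in bases if b in "ACGT"]
        if not counted:
            return float("nan")
        return sum(b in "GC" for b in counted) / len(counted)

    pos1, pos2, pos3 = seq[0::3], seq[1::3], seq[2::3]
    return {
        "GC": gc_fraction(seq),
        "GC1": gc_fraction(pos1),
        "GC2": gc_fraction(pos2),
        "GC12": gc_fraction(pos1 + pos2),
        "GC3": gc_fraction(pos3),
    }
```

For example, `gc_by_codon_position("ATGGCCTAA")` treats the sequence as the codons ATG, GCC and TAA and reports the GC fraction separately for each codon position.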

Dataset 6 - Substitution rates in gerbil and murine genes
Rates of synonymous (dS) and nonsynonymous (dN) substitutions for each mutational category (weak-to-strong [WS], strong-to-weak [SW], weak-to-weak [WW] and strong-to-strong [SS]) in two gerbil species (Psammomys obesus, Meriones unguiculatus) and two murine species (Mus musculus and Rattus norvegicus) for 8,809 groups of orthologous genes. The rates are measured from the point of murine-gerbil divergence in the species tree to the tips representing each of the four species. The genomic coordinates given are for the representative mouse gene for each group of orthologous genes in the mouse assembly GRCm38.

Dataset 7 - Normalised substitution rates in gerbil and murine genes
Normalised rates of synonymous (dS) and nonsynonymous (dN) substitutions for each mutational category (weak-to-strong [WS], strong-to-weak [SW], weak-to-weak [WW] and strong-to-strong [SS]) in two gerbil species (Psammomys obesus, Meriones unguiculatus) and two murine species (Mus musculus and Rattus norvegicus) for 8,809 groups of orthologous genes. The rates are measured from the point of murine-gerbil divergence in the species tree to the tips representing each of the four species. The genomic coordinates given are for the representative mouse gene for each group of orthologous genes in the mouse assembly GRCm38. Rates were normalised by dividing their original value by the average for the respective category and species.

Dataset 8 - GC-content in gerbil and murine genes
GC-content (GC, GC1, GC2, GC12, GC3) in two gerbil species (Psammomys obesus, Meriones unguiculatus) and two murine species (Mus musculus and Rattus norvegicus) for 8,809 groups of orthologous genes.
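
The normalisation described for Dataset 7 (each rate divided by the average for its category and species) can be sketched in Python as follows. This is an illustration, not the code used for the manuscript. It assumes that "average" means the arithmetic mean across all groups of orthologous genes. The record layout (`species`, `category`, `rate` keys) and category labels such as `"dS_WS"` are hypothetical.

```python
from collections import defaultdict


def normalise_rates(records):
    """Add a 'normalised' value to each record.

    records: list of dicts with hypothetical keys 'species', 'category'
    (e.g. 'dS_WS') and 'rate'. Each rate is divided by the arithmetic
    mean of all rates sharing its species and category.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        key = (rec["species"], rec["category"])
        totals[key] += rec["rate"]
        counts[key] += 1
    means = {key: totals[key] / counts[key] for key in totals}

    normalised = []
    for rec in records:
        mean = means[(rec["species"], rec["category"])]
        # A zero mean would make the ratio undefined, so report NaN.
        value = rec["rate"] / mean if mean != 0 else float("nan")
        normalised.append({**rec, "normalised": value})
    return normalised
```

Under this scheme the mean normalised rate within each species and category is 1, so values above 1 indicate a gene evolving faster than average for that category in that species.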

Dataset 9 - Sliding window analyses
Sliding window averages of normalised synonymous (dS) and nonsynonymous (dN) substitution rates for each mutational category (weak-to-strong [WS], strong-to-weak [SW], weak-to-weak [WW] and strong-to-strong [SS]) in two gerbil species (Psammomys obesus, Meriones unguiculatus) and two murine species (Mus musculus and Rattus norvegicus) for 8,809 groups of orthologous genes. The averages are based on the genomic coordinates for the representative mouse gene for each group of orthologous genes in the mouse assembly GRCm38. The rates are measured from the point of murine-gerbil divergence in the species tree to the tips representing each of the four species. For each window, we give the mean, standard deviation, median, minimum, maximum and number of outliers (where the rate is greater than 2.5 times the global mean) for each rate in each species.

Dataset 10 - Comparison between GC in the groups of orthologous genes and Hierarchical Orthologous Groups
Comparison of GC-content for 6,735 gene sequences in four focal species (Psammomys obesus, Meriones unguiculatus, Mus musculus and Rattus norvegicus) with GC-content in the Hierarchical Orthologous Groups (HOGs) to which the groups of orthologous genes were assigned. For each gene in the focal species, we determined the rank of GC-content measurements (GC, GC1, GC2, GC12, GC3) relative to the measurements for the species represented in the HOGs. The HOGs were downloaded from the OMA database (Altenhoff et al 2018). For each gene, we supply the OMA ID, the GC measurements for the focal (‘query’) species and summary statistics for the assigned HOG.

Altenhoff et al (2018) The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces. Nucleic Acids Research, 46 (D1) D477–D485; https://doi.org/10.1093/nar/gkx1019

Script 1
Parameters used in the BppML subprogramme of BppSuite to optimise branch lengths for each alignment using the YN98 (F3X4) model.

Script 2
Parameters used in the MapNH subprogramme of BppML to estimate dS and dN for different mutational categories (weak-to-strong, strong-to-weak, weak-to-weak and strong-to-strong).
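
The per-window statistics listed for Dataset 9 (mean, standard deviation, median, minimum, maximum and number of outliers) can be illustrated with the Python sketch below. It is not the code used for the manuscript. The window width and step are not stated in this index, so `window_bp` and `step_bp` are hypothetical parameters. The sketch also assumes one position per gene (for example the midpoint of the representative mouse gene), a sample standard deviation, and a global mean taken over all genes passed to the function.

```python
import statistics


def window_summaries(genes, window_bp, step_bp, outlier_factor=2.5):
    """Summarise rates in windows along one chromosome.

    genes: list of (position, rate) tuples for one species and one rate.
    window_bp, step_bp: hypothetical window width and step, in base pairs.
    An outlier is a rate greater than outlier_factor times the global mean.
    """
    if not genes:
        return []
    global_mean = statistics.fmean(rate for _, rate in genes)
    threshold = outlier_factor * global_mean
    last_position = max(pos for pos, _ in genes)

    summaries = []
    start = 0
    while start <= last_position:
        end = start + window_bp
        rates = [rate for pos, rate in genes if start <= pos < end]
        if rates:  # windows without genes are skipped
            summaries.append({
                "start": start,
                "end": end,
                "n": len(rates),
                "mean": statistics.fmean(rates),
                # Sample standard deviation needs at least two values.
                "sd": statistics.stdev(rates) if len(rates) > 1 else float("nan"),
                "median": statistics.median(rates),
                "min": min(rates),
                "max": max(rates),
                "outliers": sum(r > threshold for r in rates),
            })
        start += step_bp
    return summaries
```

Setting `step_bp` smaller than `window_bp` gives overlapping windows. Setting the two equal gives adjacent, non-overlapping bins.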