This readme file was generated on 2024-08-15 and last updated on 2025-10-31 by Jade Whitlam.

GENERAL INFORMATION

Title of Dataset: Archaeobotanical data and analyses for the Neolithic sites of Sharara and el-Hemmeh, Jordan

Description: Archaeobotanical data from the Pre-Pottery, or Early, Neolithic sites of Sharara and el-Hemmeh in southern Jordan, and R code for analyses undertaken on the data.

Author Information
Name: Jade Whitlam (main author of the related paper)
ORCID: 0000-0001-9769-4704
Institution: University of Oxford, School of Archaeology and Department for Continuing Education
Email: jade.whitlam@arch.ox.ac.uk

Author Information
Name: Pascal Flohr
ORCID: 0000-0003-3203-913X
Institution: University of Oxford (current affiliation, visiting researcher); University of Kiel, Institute for Pre- and Protohistoric Archaeology and Cluster of Excellence ROOTS (main affiliation for this dataset); Leiden University (current affiliation)
Email: p.flohr@library.leidenuniv.nl

Author Information
Name: Amy Bogaard
ORCID: 0000-0002-6716-8890
Institution: University of Oxford, School of Archaeology; Santa Fe Institute

Author Information
Name: Mike Charles
Institution: University of Oxford, School of Archaeology

Author Information
Name: Bill Finlayson
ORCID: 0000-0002-0330-8006
Institution: University of Oxford, School of Archaeology

Author Information
Name: Cheryl A. Makarewicz
ORCID: 0000-0002-1649-336X
Institution: University of Kiel, Institute for Pre- and Protohistoric Archaeology and Cluster of Excellence ROOTS
Email: c.makarewicz@ufg.uni-kiel.de

Date of data collection/laboratory analysis: 2019-10 to 2023-09

Geographic location of data collection:
- Sharara archaeological site, Wadi Hasa, Jordan
- El-Hemmeh archaeological site, Wadi Hasa, Jordan

Information about funding sources that supported the collection of the data:
- British Academy Postdoctoral Fellowship (pf170053)
- National Geographic Council for Exploration and Research (HJ-043R-17)
- Deutsche Forschungsgemeinschaft (EXC 2150, ROOTS – Social, Environmental, and Cultural Connectivity in Past Societies; University of Kiel)

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: CC BY

Links to publications that cite or use the data: Jade Whitlam, Pascal Flohr, Amy Bogaard, Mike Charles, Bill Finlayson, Cheryl A. Makarewicz. 2024. Preprint: Developmental plasticity and genetic selection shaped cereal evolution in the Early Holocene southern Levant. bioRxiv. https://doi.org/10.1101/2024.08.18.608467

Links to other publicly accessible locations of the data: Supplementary information of the Whitlam et al. 2024 preprint.

Links/relationships to ancillary data sets: Linked to FlohrWhitlam_Isotope_Dataset, deposited in the same repository (Flohr, P., J. Whitlam, A. Bogaard, M. Charles, B. Finlayson, C. Makarewicz. 2025. Plant carbon stable isotope values for the Neolithic sites of Sharara and el-Hemmeh, Jordan. Oxford University Research Archive (ORA)). In future, a dataset with the grain photographs will also be made available in the same repository and as supplementary information to the paper.

Was data derived from another source? No

Recommended citation for this dataset: Whitlam, J., P. Flohr, A. Bogaard, M. Charles, B. Finlayson, C. Makarewicz. 2025. Archaeobotanical data and analyses for the Neolithic sites of Sharara and el-Hemmeh, Jordan.
Oxford University Research Archive (ORA). [Please also add the DOI to your citation].

DATA & FILE OVERVIEW

File List:
1. Dataset_S1.csv: Archaeobotanical data (species by sample) for 34 Sharara samples, along with groupings and codes used in data analyses.
2. Dataset_S2.csv: Archaeobotanical data (species by sample) for 46 el-Hemmeh samples, along with groupings and codes used in data analyses.
3. Dataset_S3.csv: List of el-Hemmeh samples and whether they correspond to PPNA or LPPNB period.
4. Dataset_S4.csv: Barley grain measurements (breadth and thickness) for Sharara and el-Hemmeh, with corresponding descriptive classifications ('domestic-sized', 'intermediate-sized', 'wild-sized') based on established parameters.
5. Dataset_S5.csv: List of species and summary of relevant information taken from the Flora Palaestina to inform weed ecological analysis. Includes the months in which each species flowers and the flowering duration calculated as a number of months.
6. Dataset_S6.csv: Average attribute scores for each analysed archaeobotanical sample input into weed ecological analysis (run in SPSS) and the outputs of the Discriminant Analysis for Sharara and el-Hemmeh samples.
7. Whitlam_et_al_ArchaeobotanyAnalysis.R: Full R script used for data analysis. Please note that all CSV files need to be converted to Excel files before running the R script; otherwise the code will need updating.

Additional related data collected that was not included in the current data package:
- Photographs of grain cross-sections.
- Isotope data.

Are there multiple versions of the dataset? No, but part of the data has also been included in the supplementary information of the Whitlam et al. 2024 preprint and will be included in supplementary information of the final publication.

METHODOLOGICAL INFORMATION

The methods have been described in detail in the supplementary information of Whitlam et al. 2024, https://doi.org/10.1101/2024.08.18.608467. Below follows a shorter summary focused on the archaeobotanical analyses.

Sampling and recovery of charred macrobotanical remains:
Charred plant remains were recovered at Sharara and el-Hemmeh through a programme of systematic sampling and flotation. At el-Hemmeh, up to 10 L of sediment recovered from midden, hearth and storage contexts was floated using a hand-pump system that gently agitated sediments using continuously replaced water drawn from the reservoir adjacent to the site (Shelton and White, 2010). Floated materials were recovered using a 250 µm mesh. Initial analyses of the charred macrobotanical remains recovered from el-Hemmeh during the 2004-2007 seasons are reported in White and Makarewicz (2012) for PPNA plant remains and White and Wolff (2012) for LPPNB remains. For this study, we focused on a subset of previously studied PPNA (n=8) and LPPNB (n=11) samples, which were re-analysed to ensure consistency in identification and quantification between assemblages, using the methods reported in Whitlam et al. (2023). We also analysed 27 previously unstudied samples from PPNA levels at el-Hemmeh collected between 2010 and 2014. At Sharara, up to 10 L of sediment was floated in buckets, making use of the perennial water flow in the Wadi el-Hasa. All Sharara samples (n=34) analysed by Whitlam et al. (2023) were included in this study. Sorting and identification of charred plant remains took place in the Archaeobotany Laboratory at the School of Archaeology, University of Oxford.

Identification and documentation:
All non-woody charred plant remains were identified using a Leica MZ75 stereomicroscope at magnifications of 6x to 50x. Photographs and measurements of charred plant remains were obtained at the School of Archaeology using either a PixeLINK camera attached to a Nikon SMZ25 stereomicroscope and coupled to a PC, assisted by PixeLINK Capture and NIS-Elements D software, or a Leica DFC495 camera attached to a Leica Z6 APO and coupled to a PC, assisted by LAS software.
Identifications were made according to morphological characteristics, surface texture and size, and verified by comparison to modern reference material housed at the School of Archaeology. Plant remains were quantified using the principle of recording the 'minimum number of individuals' (MNI) following Jones (1991). Nomenclature follows the Flora Palaestina (Feinbrun-Dothan, 1986, 1978; Zohary, 1972, 1966), except for family names where modern conventions have been applied. Full sample-by-taxa data are provided in Datasets S1 and S2.

Archaeobotanical analyses:
To explore the botanical composition of samples, stacked bar charts showing the relative proportions of major botanical categories within samples containing 10 or more items were produced for Sharara, as well as PPNA and LPPNB el-Hemmeh. We also undertook correspondence analysis to further explore similarities and differences in the botanical composition of samples from PPNA and LPPNB el-Hemmeh. All taxa occurring in five or more (i.e. 10%) of el-Hemmeh samples (n=38), and all samples containing 30 or more items representing these taxa (n=31) were included in the correspondence analysis. The relative percentages of different chaff types for barley and emmer were also calculated for each assemblage (not including sample HEM_1143).

Metrical analysis of barley grains:
Breadth and thickness measurements were taken for all whole barley grains from Sharara, PPNA el-Hemmeh and LPPNB el-Hemmeh. At Sharara, where a relatively small number of whole barley grains were recovered, breadth and thickness were measured for all grain fragments to provide more data points. For PPNA and LPPNB el-Hemmeh, grain fragments were only measured if they were being submitted for stable isotope analysis (carbon and nitrogen). Measurements were taken at the widest point of the grain and are reported in millimeters (mm).
Breadth and thickness measurements were used to assign barley grains to 'wild-sized', 'intermediate-sized' and 'domestic-sized' categories, following established parameters.

Weed ecological data preparation:
Flowering duration was selected as the most useful trait for distinguishing between tilled and non-tilled habitats. To ensure that samples included in the weed ecological analysis had a high probability of representing cereal crops and their associated weed assemblages, we only included samples in weed ecological analysis if they contained a minimum of 30 cereal items, and 10 weed items identified to a taxonomic level allowing attribution of a functional trait value. This excluded samples with minor proportions of crop items that are less likely to reflect cereal processing activities and related weed assemblages. We also undertook crop processing analysis to determine whether the cereal and weed remains within each sample were consistent in terms of the stage of the crop processing sequence represented. If consistent, samples were considered to have a high likelihood of representing crops and their associated weeds. If inconsistent, the implication is that some weeds may have arrived on site via a route other than cereal harvesting, and thus would not be appropriate for reconstructing the conditions in which cereal crops were growing. Such samples were considered to have a low likelihood of representing crops and their associated weeds. We first experimented with using a minimum threshold of 100 cereal items and 10 weed items for the inclusion of samples in weed ecological analysis. However, lowering the threshold to 30 cereal items did not alter the number of samples included in the analysis for Sharara, while at PPNA el-Hemmeh only four more samples met this lower threshold for inclusion, all having a low likelihood of representing crops and their associated weeds based on crop processing analysis.
For LPPNB el-Hemmeh, even with the lower threshold of 30 cereal items, no samples met the minimum criteria for inclusion in weed ecological analysis or crop processing analysis. This was a result of the low numbers of cereal items and the fact that few weed taxa and types could be identified to a taxonomic level sufficient for attribution of a functional-trait value.

Crop processing analysis:
To determine which stage of the crop processing sequence was represented within archaeobotanical samples, these were compared to ethnobotanical samples from the Greek island of Amorgos that derive from known stages of the crop processing sequence, in two ways: firstly, in terms of the relative percentages of grain, rachis and weed seeds within samples following Jones (1990), and secondly via weed-based discriminant analysis (DA) following Jones (1987, 1984). Samples with minor proportions of crop items (either <100 or <30 depending on the threshold we were using) were excluded from the analyses. As barley and glume wheats have slightly different processing requirements and are unlikely to have been processed together, we also only included samples that represented a relatively pure crop (i.e., that comprised at least 80% barley or glume wheat). For relative percentages of grain, rachis and weed items, results are shown on triangular plots (triplots) alongside the Amorgos samples. Triplots were plotted in R using the 'CropPro-package' developed by Stroud et al. (in review). For weed-based DA, remains of wild plant taxa were classified according to the physical characteristics that determine at which stage in the crop processing sequence they are removed, for example, their size, headedness (i.e. the tendency for seeds to stay in heads despite threshing) and aerodynamic properties. These classifications are listed in Datasets S1 and S2 ('crop pro code').
The size of seeds was taken from measurements based on archaeological specimens and information on seed characteristics collated from various sources, including the Flora Palaestina, the Kew seed database (http://data.kew.org/sid/) and other published studies. A discriminant analysis was then performed on both the archaeological dataset and the ethnographic Amorgos samples from known crop processing stages (winnowing by-product, coarse sieve by-product, fine sieve by-product and fine sieve product). This classified each of the archaeological samples into one of the four ethnobotanical groups based on similarities in their weed composition. Only samples with at least 10 weed items that were classified in terms of their physical properties as these relate to crop processing were included in the DA. The DA was carried out in R using the 'CropPro-package' developed by Stroud et al. (in review).

Reviewing weed taxa for inclusion in weed ecological analysis:
Wild/weed taxa were reviewed prior to weed ecological analysis. All taxa identified to species level were included, with information on flowering duration taken directly from the Flora Palaestina. For taxa identified to genus level, potential species lists were drawn up using the Flora Palaestina. Any species that could be ruled out on morphological grounds or that were reported as an introduced species in the Flora Palaestina were excluded from this list. The remaining species were then input into the analysis as an aggregate species group, using their average flowering duration. If the flowering durations of potential species listed for a taxon varied by more than two months, then the taxon was excluded from the analysis if it was poorly represented at the site in terms of abundance and/or ubiquity.
If well represented, however, the taxon was included in the analysis using the average flowering duration of the aggregated species group, after first testing the effect of using the minimum and maximum flowering durations of the aggregated species group in the analysis.

Weed ecological analysis:
We used discriminant analysis to distinguish between arable (tilled) and non-arable (untilled) cereal habitats according to their disturbance conditions using the model developed by Weide et al. (2022). The archaeobotanical samples were entered into the classification phase of the analysis as cases with unknown disturbance conditions. The analysis classified each archaeobotanical sample as 'arable' (tilled) or 'non-arable' (untilled) with a low (<90%) or high (>90%) probability. All archaeobotanical samples received a discriminant score, which was used to visualize the relative position of samples to each other by plotting these along the extracted discriminant function. Samples from PPNA el-Hemmeh were analysed separately using average, minimum and maximum flowering duration values for aggregate species groups (as outlined above) where flowering durations differed significantly.

Instrument- or software-specific information needed to interpret the data:
All data processing was done in MS Excel and R. Data analysis was undertaken in RStudio version 2023.12.1. For weed ecological analysis, data were prepared in MS Excel and RStudio, with the discriminant analysis performed using IBM SPSS version 27.

Describe any quality-assurance procedures performed on the data:
See above.

People involved with sample collection, processing, analysis and/or submission:
The authors, with advice from Alex Weide (University of Cambridge) and Elizabeth Stroud (University of Oxford). The Sharara and el-Hemmeh excavation teams were involved in collecting the samples in the field.
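
The handling of genus-level taxa described under 'Reviewing weed taxa for inclusion in weed ecological analysis' can be sketched in Python as follows. This is an illustrative sketch and not part of the deposited R script: the function name, its arguments, and the reading of 'varied by more than two months' as maximum minus minimum duration are all assumptions.

```python
from statistics import mean


def summarise_species_group(durations, well_represented, max_spread=2):
    """Summarise flowering durations (in months) for one aggregate species group.

    `durations` holds one value per candidate species that remains after
    morphological and introduced-species exclusions (hypothetical input).
    Returns None when the taxon would be excluded, otherwise a dict with the
    average duration and, where a sensitivity check is called for, the
    (minimum, maximum) pair to test as well.
    """
    if not durations:
        raise ValueError("at least one candidate species is required")
    # Assumed reading of "varied by more than two months": max minus min.
    spread = max(durations) - min(durations)
    if spread > max_spread and not well_represented:
        return None  # poorly represented and too variable: excluded
    bounds = (min(durations), max(durations)) if spread > max_spread else None
    return {"average": mean(durations), "min_max_to_test": bounds}
```

A group whose candidate species flower for 3, 4 and 5 months would be included with its average, while a group spanning 2 to 6 months would be included only if well represented, with the minimum and maximum returned for the sensitivity check.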

*** DATA-SPECIFIC INFORMATION FOR: Dataset_S1.csv ***

Description: Archaeobotanical data (species by sample) for 34 Sharara samples, along with groupings and codes used in data analyses.

Number of variables: 44

Number of cases/rows: 87 including header row

Variable List:
- FAMILY: Botanical families identified in assemblage with naming following modern conventions.
- Genus/Species: Taxonomic categorisations to genus/species level identified in assemblage. Nomenclature follows the Flora Palaestina. Additional descriptive categorisations (e.g. plant part and domestic morphology) also included.
- group 1: major botanical categories, used to explore botanical composition of samples.
- group 2: major botanical categories with cereals and wild/weed taxa further subdivided, used to explore botanical composition of samples.
- chaff code: Chaff morphology code, based on identification of scar types as described in methods and Main Text.
- corro code: Codes used in correspondence analysis (not done for Sharara).
- Weed eco code: Codes used in weed ecological analysis.
- triplot: Codes used to organise data to calculate percentages of grain, rachis and weed within samples.
- crop pro code: Codes used in weed-based discriminant analysis.
- SHAR_086a... SHAR_075: Sample name (for 34 samples) consisting of the site abbreviation ('SHAR' for Sharara) and the botanical sample number in three digits. For samples that were processed separately and later amalgamated, all original three-digit codes are retained and separated by a full stop.

Missing data codes: 'N/a' for not applicable

Specialised formats or other abbreviations used: None.

*** DATA-SPECIFIC INFORMATION FOR: Dataset_S2.csv ***

Description: Archaeobotanical data (species by sample) for 46 el-Hemmeh samples, along with groupings and codes used in data analyses.

Number of variables: 55

Number of cases/rows: 145 including header row

Variable List:
- FAMILY: Botanical families identified in assemblage with naming following modern conventions.
- Genus/Species: Taxonomic categorisations to genus/species level identified in assemblage. Nomenclature follows the Flora Palaestina. Additional descriptive categorisations (e.g. plant part and domestic morphology) also included.
- group 1: major botanical categories, used to explore botanical composition of samples.
- group 2: major botanical categories with cereals and wild/weed taxa further subdivided, used to explore botanical composition of samples.
- chaff code: Chaff morphology code, based on identification of scar types as described in methods and Main Text.
- corro code: Codes used in correspondence analysis.
- Weed eco code: Codes used in weed ecological analysis.
- triplot: Codes used to organise data to calculate percentages of grain, rachis and weed within samples.
- crop pro code: Codes used in weed-based discriminant analysis.
- HEM_815... HEM_353: Sample name (for 46 samples) consisting of the site abbreviation ('HEM' for el-Hemmeh) and the botanical sample number in three digits and an optional letter.

Missing data codes: 'N/a' for not applicable

Specialised formats or other abbreviations used: None.

*** DATA-SPECIFIC INFORMATION FOR: Dataset_S3.csv ***

Description: List of 46 el-Hemmeh samples and whether they correspond to PPNA or LPPNB period.

Number of variables: 2

Number of cases/rows: 43 including header row

Variable List:
- period: The cultural period. PPNA = Pre-Pottery Neolithic A, LPPNB = Late Pre-Pottery Neolithic B.
- Sample: Sample name consisting of the site abbreviation ('HEM' for el-Hemmeh) and the botanical sample number in three digits and an optional letter.

Missing data codes: not applicable

Specialised formats or other abbreviations used: None.
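
The sample-name convention used for the sample columns of Dataset_S1.csv and Dataset_S2.csv, and for the Sample column of Dataset_S3.csv, can be split into its parts with a short helper. The sketch below is a Python illustration under stated assumptions: the function name is hypothetical, the underscore separator is inferred from names such as 'SHAR_086a' and 'HEM_815', four-digit numbers are allowed because 'HEM_1143' appears in the methods, and the amalgamated name 'SHAR_012.013' used in the usage note is a made-up example of the full-stop convention, not a sample in the dataset.

```python
import re

# One sample code: three or four digits plus an optional letter (assumed pattern).
_CODE = re.compile(r"^(\d{3,4})([A-Za-z]?)$")
_SITES = {"SHAR": "Sharara", "HEM": "el-Hemmeh"}


def parse_sample_name(name):
    """Split a sample name into (site abbreviation, [(number, letter), ...]).

    Amalgamated samples keep every original code, separated by full stops,
    so the second element can contain more than one (number, letter) pair.
    """
    site, sep, rest = name.strip().partition("_")
    if not sep or site not in _SITES:
        raise ValueError(f"unrecognised site prefix in {name!r}")
    codes = []
    for chunk in rest.split("."):
        match = _CODE.match(chunk)
        if match is None:
            raise ValueError(f"unrecognised sample code {chunk!r} in {name!r}")
        codes.append((match.group(1), match.group(2)))
    return site, codes
```

For instance, `parse_sample_name("SHAR_086a")` yields the site 'SHAR' with the single code ('086', 'a'), and the hypothetical amalgamated name 'SHAR_012.013' yields two codes. Names that do not fit the assumed pattern raise a ValueError, so they can be checked by hand.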

*** DATA-SPECIFIC INFORMATION FOR: Dataset_S4.csv ***

Description: Barley grain measurements (breadth and thickness) for Sharara and el-Hemmeh, with corresponding descriptive classifications ('domestic-sized', 'intermediate-sized', 'wild-sized') based on established parameters.

Number of variables: 6

Number of cases/rows: 232 including header row

Variable List:
- Grain ID: Unique grain ID consisting of the site abbreviation ('SHAR' for Sharara, 'HEM' for el-Hemmeh), the botanical sample number in three to four digits, and a letter indicating the individual grain measured from the botanical sample (e.g., 'SHAR069A' is grain A from botanical sample 069 from the site of Sharara).
- Site: Abbreviation for the site and period. SHAR = Sharara, HEMMEH_PPNA = el-Hemmeh PPNA, HEMMEH_LPPNB = el-Hemmeh LPPNB.
- Breadth (mm): measured breadth of grain in millimetres.
- Thickness (mm): measured thickness of grain in millimetres.
- White: Domestication status according to the metrics by White 2013. Values: Domestic-sized, Intermediate-sized, Wild-sized. See related paper of Whitlam et al. for an explanation.
- Colledge: Domestication status according to the metrics by Colledge 2002. Values: Domestic-sized, Intermediate-sized, Wild-sized. See related paper of Whitlam et al. for an explanation.

Missing data codes: not applicable

Specialised formats or other abbreviations used: Average grain sizes for the three assemblages (Sharara, el-Hemmeh PPNA and el-Hemmeh LPPNB) are also included as rows: (SHAR, SHAR_AVG, 1.91, 1.25, Average, Average); (HEMA, HEMMMEH_PPNA_AVG, 2.40, 1.64, Average, Average); (HEMB, HEMMMEH_LPPNB_AVG, 2.47, 1.77, Average, Average).

*** DATA-SPECIFIC INFORMATION FOR: Dataset_S5.csv ***

Description: List of species and summary of relevant information taken from the Flora Palaestina to inform weed ecological analysis. Includes the months in which each species flowers and the flowering duration calculated as a number of months.

Number of variables: 17

Number of cases/rows: 260 including header row

Variable List:
- Species: Species name taken from Flora Palaestina.
- Family (Flora Palaestina): Taxonomic family name as listed in Flora Palaestina.
- Family (current): Taxonomic family name according to modern conventions.
- Flora Palaestina volume: Volume from which information for the species was extracted.
- Fam. #: Family number in Flora Palaestina.
- Gen. #: Genus number in Flora Palaestina.
- Sp. #.: Species number in Flora Palaestina.
- Genus: Genus name in Flora Palaestina.
- Species: Species name in Flora Palaestina.
- Section: Section name in Flora Palaestina.
- Authority: Authority listed in Flora Palaestina.
- Life form: Life form (Annual, Perennial, Biennial) as listed in Flora Palaestina.
- height (cm): Height of plant reported in Flora Palaestina.
- alt. (M): Altitude at which species occurs as reported in Flora Palaestina.
- fl.: Months during which species flowers as reported in Flora Palaestina.
- Fl duration (# months): number of months that species flowers for, calculated based on information in 'fl.' column.
- Notes: Notes regarding specific decisions made where relevant, or characteristics, such as rarity of species, which would make them less likely candidates.

Missing data codes: not applicable

Specialised formats or other abbreviations used: None.

*** DATA-SPECIFIC INFORMATION FOR: Dataset_S6.csv ***

Description: Average attribute scores for each analysed archaeobotanical sample input into weed ecological analysis (run in SPSS) and the outputs of the Discriminant Analysis for Sharara and el-Hemmeh samples.

Number of variables: 6

Number of cases/rows: 24 including header row

Variable List:
- ID: Unique ID number in analysis (starting at 258 following on from previous analyses undertaken by Weide et al. 2022).
- Site: Abbreviation for the site, period and value range used. SHAR = Sharara, HEMMEH PPNA = el-Hemmeh PPNA (average values), HEMMEH PPNA = el-Hemmeh PPNA (min values), HEMMEH PPNA = el-Hemmeh PPNA (max values). See methods.
- site/sample code: Sample name consisting of the site abbreviation ('SHAR' for Sharara, 'HEM' for el-Hemmeh) and the botanical sample number in three or four digits, sometimes including a letter.
- av_flor_dur: average flowering duration value calculated by weed ecological analysis (see methods).
- DS: Discriminant Score calculated by weed ecological analysis (see methods).
- predicted: predicted grouping (1 or 2) calculated by weed ecological analysis (see methods).

Missing data codes: not applicable

Specialised formats or other abbreviations used: None.
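
Because Dataset_S4.csv stores the three assemblage averages as rows alongside the individual grain measurements, users may want to separate the two before computing their own statistics. The Python sketch below shows one possible way to do so. It assumes that the CSV header matches the variable names listed for Dataset_S4.csv and that average rows carry the value 'Average' in the 'White' column. The function name is hypothetical, and the single grain row in the inline sample is a placeholder with invented measurements, not a record from the dataset; the average row is the Sharara one quoted in this readme.

```python
import csv
import io

# Inline stand-in for Dataset_S4.csv. The SHAR069A measurements are placeholders.
SAMPLE = (
    "Grain ID,Site,Breadth (mm),Thickness (mm),White,Colledge\n"
    "SHAR069A,SHAR,2.00,1.30,Wild-sized,Wild-sized\n"
    "SHAR,SHAR_AVG,1.91,1.25,Average,Average\n"
)


def split_grain_and_average_rows(handle):
    """Return (grain_rows, average_rows) from an open CSV file or text stream."""
    grain_rows, average_rows = [], []
    for row in csv.DictReader(handle):
        # Assumption: average rows are flagged by 'Average' in the White column.
        if row["White"].strip() == "Average":
            average_rows.append(row)
        else:
            grain_rows.append(row)
    return grain_rows, average_rows


grains, averages = split_grain_and_average_rows(io.StringIO(SAMPLE))
```

With the real file, the same function could be called on a handle opened with `open("Dataset_S4.csv", newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.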