Introduction
============

This data repository contains the data underpinning the following paper:

Title: Group-personalized regression models for prediction of mental health scores from objective mobile phone data streams
Authors: N. Palmius; K. E. A. Saunders; O. Carr; J. R. Geddes; G. M. Goodwin; and M. De Vos
Journal: Journal of Medical Internet Research

Full citation details are available in the metadata for the repository.

Data Overview
=============

The following data are available in this repository:

* Demographic data from Table 2.
* QIDS scores and calculated features for participants included in the analysis.
* Mean regression results underpinning the summary statistics in Table 3.

Data format and notation
------------------------

All data are provided as standard comma-delimited csv files in the compressed data.zip file. Each participant is identified by a unique random ID field, different from the one used to identify participants in the associated AMoSS study.

Full Data Description
=====================

Demographic data
----------------

Demographic data from Table 2 are available in the demographics.csv file. This includes the following data for each participant:

* ParticipantID: Unique identifier for the participant.
* Cohort: The diagnosis of the participant - healthy control (HC), bipolar disorder (BD) or borderline personality disorder (BPD).
* Gender: Gender of the participant - m or f.
* Age: The age of the participant in whole years on entry to the study.
* BMI: The BMI of the participant on entry to the study.
* WeeksOfData: The number of valid labelled weeks of data available for processing from the participant.
* QIDSMean: The mean of the QIDS scores in the valid labelled weeks of data available for processing from the participant.
* QIDSRange: The range of the QIDS scores in the valid labelled weeks of data available for processing from the participant (maximum minus minimum QIDS score).
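
As an illustration only, the demographics file could be summarised per cohort along the following lines. This is a minimal sketch, assuming that demographics.csv has a header row whose column names match the field names listed above; the function name summarise_by_cohort is hypothetical and not part of the repository.

```python
import csv
from collections import defaultdict
from statistics import mean


def summarise_by_cohort(csv_file):
    """Return {cohort: (n_participants, mean_age, mean_weeks_of_data)}.

    `csv_file` is an open text file (or any iterable of lines).
    Assumption: the header row uses the field names from this README
    (Cohort, Age, WeeksOfData, ...).
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["Cohort"]].append(row)

    summary = {}
    for cohort, rows in groups.items():
        ages = [float(r["Age"]) for r in rows]
        weeks = [float(r["WeeksOfData"]) for r in rows]
        summary[cohort] = (len(rows), mean(ages), mean(weeks))
    return summary


# Example usage, assuming data.zip has been extracted into the
# current working directory:
#
#     with open("demographics.csv", newline="") as f:
#         for cohort, stats in sorted(summarise_by_cohort(f).items()):
#             print(cohort, stats)
```

The sketch uses only the Python standard library so that it does not depend on any particular data analysis package.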

QIDS scores and feature values
------------------------------

Calculated feature values for the valid labelled weeks of data from the participants are available in the data.csv file. This contains the following details for each week of valid data:

* ParticipantID: Unique identifier for the participant.
* WeekBeginning: The first day of the week (Monday) for the data from which the features were calculated.
* QIDS: The QIDS score label for the week.
* Features: The next 10 columns contain the raw calculated features. Features are named using the feature abbreviations given in Table 1 of the paper, which also gives details of how each feature is calculated.

Regression model results
------------------------

Individual regression model results summarised in Table 3 are available in the remaining .csv files, named results_*_predictions.csv. There are six files, as follows:

* results_population_level_model_predictions.csv: Results from the population-level model.
* results_fully_personalized_cv_model_predictions.csv: Results from the fully personalized model using cross validation over all data points.
* results_fully_personalized_model_predictions.csv: Results from the fully personalized model trained on calibration data.
* results_group_personalized_model_predictions.csv: Results from the group-personalized model with optimized clusters.
* results_group_personalized_model_calibration_predictions.csv: Results from the group-personalized model with clusters allocated using calibration data.
* results_community_similarity_network_predictions.csv: Results from the Community Similarity Network clustering model based on Lane et al. / Abdullah et al.

In all six files, the following fields identify the data:

* ParticipantID: Unique identifier for the participant.
* WeekBeginning: The first day of the week (Monday) for the data for which the results were calculated.
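
Because ParticipantID and WeekBeginning appear in both data.csv and the results files, they can serve as a join key between QIDS labels and model predictions. The following is a minimal sketch, assuming that each file has a header row matching the field names in this README and that each (ParticipantID, WeekBeginning) pair occurs at most once per file; neither assumption is confirmed here, and the function names are hypothetical.

```python
import csv


def index_by_week(csv_file):
    """Map (ParticipantID, WeekBeginning) -> row dict for one csv file.

    Assumption: each participant-week appears at most once; if a key
    is repeated, the last row read wins.
    """
    return {
        (row["ParticipantID"], row["WeekBeginning"]): row
        for row in csv.DictReader(csv_file)
    }


def join_labels_and_predictions(data_file, results_file, prediction_column):
    """Yield (participant, week, qids, prediction) for weeks in both files."""
    labels = index_by_week(data_file)
    for key, result in index_by_week(results_file).items():
        if key in labels:
            participant, week = key
            yield (
                participant,
                week,
                float(labels[key]["QIDS"]),
                float(result[prediction_column]),
            )


# Example usage, assuming data.zip has been extracted into the
# current working directory:
#
#     with open("data.csv", newline="") as d, \
#          open("results_community_similarity_network_predictions.csv",
#               newline="") as r:
#         rows = list(join_labels_and_predictions(
#             d, r, "CommunitySimilarityNetworkPrediction"))
```

The prediction column is passed in by name because it differs between results files.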

The following additional fields are used in specific data files:

* results_population_level_model_predictions.csv:
  - TrainTest: Identifies the calibration data (0) used to train the model, and the test data (1) used to present the results in Table 3.
  - PopulationLevelModelPrediction: The mean predicted QIDS scores over all 1000 iterations of Gibbs sampling of the Bayesian Lasso model.
* results_fully_personalized_model_predictions.csv:
  - TrainTest: Identifies the calibration data (0) used to train the model, and the test data (1) used to present the results in Table 3.
  - FullyPersonalizedModelPrediction: The mean predicted QIDS scores over all 1000 iterations of Gibbs sampling of the Bayesian Lasso model.
* results_fully_personalized_cv_model_predictions.csv:
  - TrainTest: Identifies the calibration data (0) used to train the model in iteration n, and the test data (1) used to evaluate the model.
  - FullyPersonalizedModelPrediction: The mean predicted QIDS scores in iteration n over all 1000 iterations of Gibbs sampling of the Bayesian Lasso model. The results in Table 3 show the mean MAE value of the predictions on the test participants in each iteration.
* results_group_personalized_model_predictions.csv / results_group_personalized_model_calibration_predictions.csv:
  - TrainTest: Identifies the calibration data (0) used to train the model, and the test data (1) used to present the results in Table 3.
  - GroupAllocated: The group ID to which the participant was allocated.
  - GroupPersonalizedModelPrediction: The mean predicted QIDS scores over all 1000 iterations of Gibbs sampling of the Bayesian Lasso model.
* results_community_similarity_network_predictions.csv:
  - CommunitySimilarityNetworkPrediction: The mean predicted QIDS scores using the Community Similarity Network clustering model.
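
For the files that include a TrainTest field, a mean absolute error (MAE) over the test rows could be computed roughly as follows. This is a sketch under stated assumptions: the QIDS labels are taken from data.csv, the header rows match the field names in this README, and the error is pooled over all test rows. The pooling is an illustrative choice and is not necessarily how the summary statistics in Table 3 were aggregated. The function name mae_on_test_rows is hypothetical.

```python
import csv


def mae_on_test_rows(data_file, results_file, prediction_column):
    """Pooled mean absolute error over rows flagged as test data.

    Assumptions: QIDS labels come from data.csv, and rows are matched
    on (ParticipantID, WeekBeginning).
    """
    qids = {
        (r["ParticipantID"], r["WeekBeginning"]): float(r["QIDS"])
        for r in csv.DictReader(data_file)
    }
    errors = []
    for r in csv.DictReader(results_file):
        key = (r["ParticipantID"], r["WeekBeginning"])
        # TrainTest is documented as 0 (calibration) or 1 (test); parse
        # it numerically in case it is stored as "1" or "1.0".
        if key in qids and float(r["TrainTest"]) == 1:
            errors.append(abs(float(r[prediction_column]) - qids[key]))
    if not errors:
        raise ValueError("no test rows matched between the two files")
    return sum(errors) / len(errors)


# Example usage, assuming data.zip has been extracted into the
# current working directory:
#
#     with open("data.csv", newline="") as d, \
#          open("results_population_level_model_predictions.csv",
#               newline="") as r:
#         print(mae_on_test_rows(d, r, "PopulationLevelModelPrediction"))
```

The sketch does not apply as written to results_community_similarity_network_predictions.csv, which has no TrainTest field listed above.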