Journal article

Investigating the association of environmental exposures and all-cause mortality in the UK Biobank using sparse principal component analysis

Abstract:
Multicollinearity refers to the presence of collinearity among multiple variables and can render the results of statistical inference erroneous, in particular by increasing the risk of Type II error. This is particularly important in environmental health research, where multicollinearity can hinder inference. To address this, correlated variables are often excluded from the analysis, which limits the discovery of new associations. An alternative approach is principal component analysis (PCA). This method combines a group of correlated variables and projects them onto a new orthogonal space. While this resolves the multicollinearity problem, it poses another challenge for the interpretability of results. Standard hypothesis testing methods can be used to evaluate the association of the projected predictors, called principal components, with the outcomes of interest; however, there is no established way to trace the significance of principal components back to individual variables. To address this problem, we investigated the use of sparse principal component analysis (SPCA), which enforces a parsimonious projection. We hypothesised that this parsimony could facilitate the interpretability of findings. To this end, we investigated the association of 20 environmental predictors with all-cause mortality, adjusting for demographic, socioeconomic, physiological, and behavioural factors. The study was conducted in a cohort of 379,690 individuals in the UK. During an average follow-up of 8.05 years (3,055,166 total person-years), 14,996 deaths were observed. We used Cox regression models to estimate hazard ratios (HRs) and 95% confidence intervals (CIs). The Cox models were fitted to the standardised environmental predictors (a) without any transformation, (b) transformed with PCA, and (c) transformed with SPCA. The comparison of findings underlined the potential of SPCA for conducting inference in scenarios where multicollinearity can increase the risk of Type II error.
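The difference between a dense and a parsimonious projection can be illustrated with a small sketch. This is not the authors' implementation: the correlation matrix is a made-up toy example, the helper names `soft_threshold` and `first_loading` are hypothetical, and soft-thresholded power iteration is used here only as one simple way of obtaining sparse loadings.

```python
import math

# Toy correlation matrix for four standardised predictors (hypothetical values):
# variables 0 and 1 are strongly correlated, variables 2 and 3 only weakly.
CORR = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]


def soft_threshold(x, penalty):
    """Shrink x towards zero by `penalty`; values within the penalty become 0."""
    return math.copysign(max(abs(x) - penalty, 0.0), x)


def first_loading(corr, penalty=0.0, n_iter=200):
    """Leading loading vector by power iteration with optional soft-thresholding.

    penalty=0 gives the ordinary first principal component loading;
    penalty>0 gives a sparse loading (an illustrative stand-in for SPCA).
    """
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p  # start from equal weights
    for _ in range(n_iter):
        # Multiply by the correlation matrix, shrink, then renormalise.
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        w = [soft_threshold(x, penalty) for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("penalty too large: every loading shrank to zero")
        v = [x / norm for x in w]
    return v


dense = first_loading(CORR)                # every variable gets some weight
sparse = first_loading(CORR, penalty=0.7)  # weakly related variables drop out
print("PCA loading:   ", [round(x, 3) for x in dense])
print("sparse loading:", [round(x, 3) for x in sparse])
```

In this toy case the unpenalised loading spreads weight over all four variables, whereas the penalised loading keeps only the strongly correlated pair, so a component found to be significant can be traced back to a small set of named predictors.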
Our analysis revealed a significant association between average noise pollution and increased risk of all-cause mortality. Specifically, individuals in the upper deciles of noise exposure had a 5 to 10% higher risk of all-cause mortality than those in the lowest decile.
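The follow-up figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below uses only the three counts stated above; the crude mortality rate it prints is derived here for illustration and is not a figure reported in the abstract.

```python
# Counts taken directly from the abstract.
participants = 379_690
person_years = 3_055_166
deaths = 14_996

# Mean follow-up per participant, in years.
mean_followup = person_years / participants

# Crude (unadjusted) mortality rate per 1,000 person-years; derived, not reported.
rate_per_1000 = deaths / person_years * 1000

print(f"mean follow-up: {mean_followup:.2f} years")
print(f"crude mortality rate: {rate_per_1000:.2f} per 1,000 person-years")
```

The mean follow-up computed this way agrees with the 8.05 years stated in the abstract.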
Publication status:
Published
Peer review status:
Peer reviewed

Publisher copy:
10.1038/s41598-022-13362-3

Authors


Institution:
University of Oxford
Division:
MSD
Department:
Women's & Reproductive Health
Oxford college:
Linacre College
Role:
Author
ORCID:
0000-0002-0576-8874
Institution:
University of Oxford
Division:
MSD
Department:
Women's & Reproductive Health
Role:
Author
ORCID:
0000-0002-4807-4610


Publisher:
Springer Nature
Journal:
Scientific Reports
Volume:
12
Article number:
9239
Publication date:
2022-06-02
Acceptance date:
2022-05-13
DOI:
10.1038/s41598-022-13362-3
EISSN:
2045-2322
Pmid:
35654993


Language:
English
Pubs id:
1262818
Local pid:
pubs:1262818
Deposit date:
2022-08-19
