Journal article
Incorporating machine learning into sociological model-building
- Abstract:
- Quantitative sociologists frequently use simple linear functional forms to estimate associations among variables. However, there is little guidance on whether such simple functional forms correctly reflect the underlying data-generating process. Incorrect model specification can lead to misspecification bias, and a lack of scrutiny of functional forms leaves room for researcher degrees of freedom to influence sociological work. In this article, I propose a framework that uses flexible machine learning (ML) methods to indicate the fit potential in a dataset containing exactly the same covariates as a researcher’s hypothesized model. When this ML-based fit potential strongly outperforms the researcher’s self-hypothesized functional form, this suggests that the latter lacks complexity. Advances in the field of explainable AI, such as the increasingly popular Shapley values, can be used to generate insight into the ML model so that the researcher’s original functional form can be improved accordingly. The proposed framework aims to use ML beyond solely predictive questions, helping sociologists exploit the potential of ML to identify intricate patterns in data and to specify better-fitting, interpretable models. I illustrate the proposed framework using a simulation and real-world examples.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
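As a rough illustration of the comparison described in the abstract, the Python sketch below fits a hypothesized linear form and a flexible learner to the same single covariate and compares their out-of-sample fit. It is not the article's implementation: the simulated quadratic data, the k-nearest-neighbours regressor standing in for a flexible ML method, and all function and variable names are assumptions made for illustration only.

```python
import random


def r_squared(y_true, y_pred):
    """Out-of-sample R^2: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear(x, y):
    """Closed-form OLS for y = a + b * x; returns a prediction function."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    intercept = mean_y - slope * mean_x
    return lambda v: intercept + slope * v


def fit_knn(x, y, k=15):
    """k-nearest-neighbours regression, used here as a stand-in flexible learner."""
    pairs = list(zip(x, y))

    def predict(v):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - v))[:k]
        return sum(p[1] for p in nearest) / len(nearest)

    return predict


# Simulated data (an assumption): the true relationship is quadratic,
# so a purely linear functional form is misspecified.
rng = random.Random(0)
x = [rng.uniform(-2, 2) for _ in range(600)]
y = [xi ** 2 + rng.gauss(0, 0.5) for xi in x]

# Hold out a test set so both models are judged on unseen data.
x_train, y_train = x[:400], y[:400]
x_test, y_test = x[400:], y[400:]

linear = fit_linear(x_train, y_train)
flexible = fit_knn(x_train, y_train)

r2_linear = r_squared(y_test, [linear(v) for v in x_test])
r2_flexible = r_squared(y_test, [flexible(v) for v in x_test])
fit_gap = r2_flexible - r2_linear

print(f"linear R^2:   {r2_linear:.3f}")
print(f"flexible R^2: {r2_flexible:.3f}")
print(f"fit gap:      {fit_gap:.3f}")
```

A large positive gap would suggest, in the spirit of the framework, that the hypothesized linear form misses structure that the same covariate can explain. The explainability step mentioned in the abstract (for example, Shapley values) is not shown here.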
- Publisher copy:
- 10.1177/00811750231217734
Authors
+ Economic and Social Research Council
- Funder identifier:
- https://ror.org/03n0ht308
- Publisher:
- SAGE Publications
- Journal:
- Sociological Methodology
- Volume:
- 54
- Issue:
- 2
- Pages:
- 217-268
- Publication date:
- 2024-01-13
- Acceptance date:
- 2023-09-12
- EISSN:
- 1467-9531
- ISSN:
- 0081-1750
- Language:
- English
- Pubs id:
- 1611460
- Local pid:
- pubs:1611460
- Deposit date:
- 2024-02-02
Terms of use
- Copyright holder:
- Mark Verhagen
- Copyright date:
- 2024
- Rights statement:
- © The Author(s) 2024. This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
- Licence:
- CC Attribution (CC BY)