Journal article

Equiflow: An open-source software package for evaluating changes in cohort composition

Abstract:
Clinical research studies routinely apply exclusion criteria and data preprocessing steps that can substantially alter dataset composition, potentially introducing hidden biases that affect validity and generalizability. This is particularly important in artificial intelligence/machine learning (AI/ML) studies where models learn patterns directly from training data. We developed Equiflow, an open-source Python package that automates creation of enhanced participant flow diagrams tracking both sample size and composition changes throughout studies. Equiflow quantifies distributional shifts at each exclusion step and generates visualizations showing how key clinical and demographic variables evolve during participant selection. In a case study of sepsis patients from the eICU database, sequential exclusions reduced the sample from 126,750 to 1,094 patients. Requiring non-missing troponin measurements in the final step of data processing caused substantial demographic shifts that would typically remain invisible in traditional reporting. By making compositional biases visible during cohort construction before modeling begins, Equiflow enables researchers to make informed decisions about analyses and acknowledge limitations in generalizability to their readers. This standardized, open-source approach promotes transparency in clinical research and supports development of more equitable clinical AI systems, addressing a critical need as healthcare increasingly relies on data-driven decision making.
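The idea of tracking composition alongside sample size can be illustrated with a minimal standard-library sketch. It does not use Equiflow's API: the helper names `composition` and `track_flow`, the exclusion labels, and the toy records are all hypothetical and invented for illustration, not drawn from eICU.

```python
from collections import Counter

def composition(rows, key):
    """Return the share of each category of `key` among `rows`."""
    counts = Counter(r[key] for r in rows)
    total = len(rows)
    return {k: v / total for k, v in counts.items()} if total else {}

def track_flow(rows, steps, key):
    """Apply exclusion steps in order, recording size and composition after each."""
    report = [("initial cohort", len(rows), composition(rows, key))]
    for label, keep in steps:
        rows = [r for r in rows if keep(r)]
        report.append((label, len(rows), composition(rows, key)))
    return report

# Toy records, invented for illustration only (not real patient data).
cohort = [
    {"age": 70, "sex": "F", "troponin": 0.4},
    {"age": 16, "sex": "M", "troponin": 0.1},
    {"age": 55, "sex": "F", "troponin": None},
    {"age": 63, "sex": "M", "troponin": 0.9},
    {"age": 48, "sex": "F", "troponin": None},
    {"age": 81, "sex": "M", "troponin": 0.2},
]

# Hypothetical exclusion criteria: each is a label plus a "keep" predicate.
steps = [
    ("adults only", lambda r: r["age"] >= 18),
    ("non-missing troponin", lambda r: r["troponin"] is not None),
]

report = track_flow(cohort, steps, "sex")
for label, n, shares in report:
    print(f"{label}: n={n}, female share={shares.get('F', 0):.2f}")
```

In this toy cohort the final step removes only a third of the remaining records but cuts the female share from 0.60 to 0.33, the kind of shift that a flow diagram reporting sample size alone would not show.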
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1371/journal.pdig.0001342

Authors

Role:
Author
ORCID:
0000-0002-8623-9500


Publisher:
Public Library of Science
Journal:
PLOS Digital Health
Volume:
5
Issue:
4
Pages:
e0001342
Article number:
e0001342
Publication date:
2026-04-08
Acceptance date:
2026-03-18
DOI:
10.1371/journal.pdig.0001342
EISSN:
2767-3170
ISSN:
2767-3170


Language:
English
Pubs id:
2407769
Local pid:
pubs:2407769
Source identifiers:
3930330
Deposit date:
2026-04-08
ARK identifier:
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
