Journal article

CGAT-core: a python framework for building scalable, reproducible computational biology workflows

Abstract:
In the genomics era, computational biologists regularly need to process, analyse and integrate large and complex biomedical datasets. Analysis inevitably involves multiple dependent steps, resulting in complex pipelines or workflows, often with several branches. Large data volumes mean that processing needs to be quick and efficient, and scientific rigour requires that analysis be consistent and fully reproducible. We have developed CGAT-core, a Python package for the rapid construction of complex computational workflows. CGAT-core seamlessly handles parallelisation across high-performance computing clusters, integration of Conda environments, full parameterisation, database integration and logging. To illustrate our workflow framework, we present a pipeline for the analysis of RNAseq data using pseudo-alignment.
Publication status:
Published
Peer review status:
Peer reviewed


Access Document


Publisher copy:
10.12688/f1000research.18674.2

Authors


Institution:
University of Oxford
Division:
MSD
Department:
Physiology Anatomy & Genetics
Role:
Author
ORCID:
0000-0001-5288-3077


Publisher:
F1000Research
Journal:
F1000Research
Volume:
8
Article number:
377
Publication date:
2019-04-04
Acceptance date:
2019-04-04
DOI:
10.12688/f1000research.18674.2
ISSN:
2046-1402


Language:
English
Pubs id:
pubs:997713
UUID:
uuid:467be341-ca5e-4b95-b313-97fa70ad102e
Local pid:
pubs:997713
Source identifiers:
997713
Deposit date:
2019-05-12
