Thesis
Deep generative models for biology: represent, predict, design
- Abstract:
- 
		
			Deep generative models have revolutionized the field of artificial intelligence, fundamentally changing how we generate novel objects that imitate or extrapolate from training data, and transforming how we access and consume various types of information such as texts, images, speech, and computer programs. They have the potential to radically transform other scientific disciplines, ranging from mathematical problem solving, to supporting fast and accurate simulations in high-energy physics or enabling rapid weather forecasting. In computational biology, generative models hold immense promise for improving our understanding of complex biological processes, designing new drugs and therapies, and forecasting viral evolution during pandemics, among many other applications. Biological objects pose however unique challenges due to their inherent complexity, encompassing massive spaces, multiple complementary data modalities, and a unique interplay between highly structured and relatively unstructured components. In this thesis, we develop several deep generative modeling frameworks that are motivated by key questions in computational biology. Given the interdisciplinary nature of this endeavor, we first provide a comprehensive background in generative modeling, uncertainty quantification, sequential decision making, as well as important concepts in biology and chemistry to facilitate a thorough understanding of our work. We then deep dive into the core of our contributions, which are structured around three chapters. The first chapter introduces methods for learning representations of biological sequences, laying the foundation for subsequent analyses. The second chapter illustrates how these representations can be leveraged to predict complex properties of biomolecules, focusing on three specific applications: protein fitness prediction, the effects of genetic variations on human disease risk and viral immune escape. Finally, the third chapter is dedicated to methods for designing novel biomolecules, including drug target identification, de novo molecular optimization, and protein engineering. This thesis also makes several methodological contributions to broader machine learning challenges, such as uncertainty quantification in high-dimensional spaces or efficient transformer architectures, which hold potential value in other application domains. We conclude by summarizing our key findings, highlighting shortcomings of current approaches, proposing potential avenues for future research, and discussing emerging trends within the field. 
Actions
Authors
Contributors
- Role:
- Supervisor
- ORCID:
- 0000-0002-2733-2078
- Role:
- Examiner
- ORCID:
- 0000-0003-1388-2252
- Role:
- Examiner
- Funder identifier:
- https://ror.org/0439y7842
- Funding agency for:
- Notin, P
- Grant:
- 18000077
- Programme:
- EPSRC iCASE studentship with GSK
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- 
                    English
- Keywords:
- Subjects:
- Deposit date:
- 
                    2023-12-21
Terms of use
- Copyright holder:
- Pascal Notin
- Copyright date:
- 2023
If you are the owner of this record, you can report an update to it here: Report update to this record