Thesis
Investigating B-cell repertoire data using deep learning approaches to aid in the development of antibody therapeutics
- Abstract:
Antibodies have become an invaluable form of biotherapeutics, with an increasing number of new antibody-derived therapeutics being developed and marketed each year. Despite their success, the process of antibody discovery and design remains challenging. In this DPhil, we leverage publicly available antibody sequences to develop computational tools that aid the design of therapeutic antibodies. We introduce two new databases: an updated and expanded Observed Antibody Space (OAS) and the Patent and Literature Antibody Database (PLAbDab). To investigate these databases, we developed KA-Search, a rapid and flexible tool for antibody sequence identity search, and demonstrated its use in mining the billions of sequences in OAS and in obtaining new binding-specific insights with PLAbDab.
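To make the idea of a sequence identity search concrete, the sketch below scores every entry in a small database by percent identity to a query and returns the best hits. It is a minimal, brute-force illustration only, not KA-Search's algorithm or API: it assumes the sequences are already aligned to a common numbering so that equal positions correspond, and the function names, the `'-'` gap convention and the toy sequences are hypothetical.

```python
def sequence_identity(query: str, target: str) -> float:
    """Fraction of aligned positions at which two sequences agree.

    Assumes (hypothetically) that both sequences are pre-aligned to the same
    length and use '-' for gaps; positions gapped in both are skipped.
    """
    if len(query) != len(target):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for q, t in zip(query, target):
        if q == "-" and t == "-":
            continue  # position absent from both sequences
        compared += 1
        if q == t:
            matches += 1
    return matches / compared if compared else 0.0


def identity_search(query: str, database: dict, top_k: int = 3) -> list:
    """Brute-force search: score every database entry, return the top_k hits."""
    scored = [(name, sequence_identity(query, seq)) for name, seq in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest identity first
    return scored[:top_k]


# Toy, pre-aligned fragments (hypothetical data, for illustration only).
database = {"seq_a": "EVQLVESGG", "seq_b": "EVQLLESGG", "seq_c": "QVQLQQSGA"}
hits = identity_search("EVQLVESGG", database, top_k=2)
```

Here `seq_a` ranks first with identity 1.0 and `seq_b` second with 8/9. Scoring every entry this way scales linearly with database size, which is why searching billions of sequences calls for a purpose-built tool rather than a naive loop like this one.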
Deep learning methods also benefit from the growth of available antibody data. General protein language models can effectively capture context-aware protein sequence representations, which are useful for state-of-the-art predictions. For antibody-specific tasks, a language model trained solely on antibodies may be more powerful. We therefore developed AbLang, trained on OAS, and demonstrated how it learns inherent antibody patterns and can restore fragmented antibody sequences. However, we reveal that antibody sequences are considerably biased towards the germline, potentially limiting the ability of antibody-trained models to suggest relevant non-germline mutations. To overcome this, we introduced AbLang-2, a refined model capable of suggesting a diverse set of valid mutations with high cumulative probability.
Collectively, the insights, databases, and computational tools presented in this work enhance our computational capabilities in antibody design and open the way for leveraging deep learning in therapeutic antibody discovery and design.
Authors
Contributors
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Statistics
- Role:
- Supervisor
- ORCID:
- 0000-0003-1388-2252
- Funder identifier:
- https://ror.org/0439y7842
- Grant:
- EP/S024093/1
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2024-07-10