Thesis
Towards trustworthy machine learning models in vision, physics, and language applications
- Abstract:
-
The wide-ranging impact of machine learning, particularly deep neural networks, cannot be overstated. These highly capable models are now deployed in critical domains such as autonomous driving, medical diagnosis, finance, and manufacturing. While their adoption is driven by superior performance on benchmark tasks, their data-driven nature often renders them unpredictable when encountering non-standard inputs. This unpredictability poses a significant challenge in safety-critical applications, where interactions with humans or the potential for system failures could lead to severe consequences. This underscores the need for trustworthy machine learning, where models must not only excel in standard metrics but also prove to be reliable and robust in real-world settings.
Our work addresses this need by improving methods that aim to either certify the robustness of these systems or, at a minimum, provide strong empirical evaluations of robustness and safety to support responsible deployment. We present advancements in probabilistic certification for image classification via randomized smoothing; introduce a general framework for verifying the partial derivatives of neural networks, with applications in certifying the correctness of physics-informed neural networks; and analyse the safety risks involved in fine-tuning large language models on task-specific data, along with mitigation strategies. Additionally, we explore the broader implications of open-source generative AI models for improving trustworthiness. These contributions mark a step forward in developing trustworthy machine learning systems, and we conclude by discussing their strengths and limitations, as well as key open questions that remain for the field.
- Funder identifier:
- https://ror.org/0439y7842
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Deposit date:
- 2025-09-30
Terms of use
- Copyright holder:
- Francisco Girbal Eiras
- Copyright date:
- 2024