Thesis
Unsupervised learning and continual learning in neural networks
- Abstract:
- For decades, research has pursued the ambitious goal of designing computer models that learn to solve problems as effectively as humans do. Artificial neural networks, generic optimizable models originally inspired by biological neurons in the brain, appear to provide a promising answer. However, a significant limitation of current models is that they tend to be reliably proficient only on the tasks and datasets they were explicitly trained on. When more than one task or dataset is trained on, samples must be appropriately mixed and balanced so that training on successive batches does not induce forgetting of knowledge learned in previous batches, which is an impediment to continual learning. Furthermore, associations must be made explicit via paired input-target samples for the trained network to achieve its best performance on desired tasks; when the network is instead trained in an unsupervised manner without explicit targets, in an effort to reduce the cost of data collection, the knowledge it learns transfers significantly worse to desired tasks than under supervised training with explicit associations.
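As a concrete illustration of the mixing-and-balancing requirement, the following is a minimal sketch of experience replay, one standard way of interleaving old and new samples during training. It assumes a PyTorch-style classifier and training loop; the names (train_interleaved, replay_buffer, replay_k) are hypothetical rather than taken from the thesis.

```python
import random

import torch
import torch.nn.functional as F


def train_interleaved(model, optimizer, new_batches, replay_buffer, replay_k=16):
    """Interleave replayed old-task samples into each new-task batch.

    `replay_buffer` is a list of stored (input, target) pairs from earlier
    tasks. All names here are illustrative, not taken from the thesis.
    """
    model.train()
    for inputs, targets in new_batches:
        if replay_buffer:
            # Mix a handful of old samples into the current batch so that
            # gradient updates are balanced between old and new knowledge.
            old = random.sample(replay_buffer, min(replay_k, len(replay_buffer)))
            inputs = torch.cat([inputs, torch.stack([x for x, _ in old])])
            targets = torch.cat([targets, torch.stack([y for _, y in old])])
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()
        # Retain a few of the new samples for replay in future tasks.
        replay_buffer.extend(zip(inputs[:4].detach(), targets[:4]))
```

Running the same loop with replay_k=0 recovers pure sequential training, the regime in which forgetting of earlier batches is typically observed.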
Each of these problems relates to the fundamental issue of generalization: the ability to perform well despite novelty. In Chapter 2, we discuss conditions under which good generalization can be expected to arise, including small model size and similarity between training and test data, in supervised, unsupervised and continual learning contexts. Chapter 3 proposes a method for predicting when a model does not generalize to a test sample, deriving generalization bounds that quantify predictive reliability using both model size and similarity with the training data. Chapter 4 presents a clustering method that learns to approximately separate data into semantic concepts using an unsupervised objective that requires no manual labels. Chapter 5 presents a method for performing object localization without specialized training data, by repurposing saliency maps. Chapter 6 presents a continual learning method in which the model is forced to reconsider previously held knowledge concurrently with new knowledge, and Chapter 7 uses a dynamic architecture to suppress interference from new learning episodes on old knowledge.
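To illustrate the kind of unsupervised clustering objective referred to for Chapter 4, here is a minimal sketch of a mutual-information loss over paired cluster assignments of two views of the same data, written in PyTorch. It is a generic sketch under assumed conventions, not necessarily the exact formulation used in the thesis.

```python
import torch


def mutual_info_clustering_loss(p1, p2, eps=1e-8):
    """Negative mutual information between paired cluster assignments.

    p1, p2: (n, C) softmax outputs of a clustering head for two views of
    the same n inputs, e.g. images and their augmented copies. This is a
    generic sketch, not necessarily the thesis's exact formulation.
    """
    joint = p1.t() @ p2 / p1.shape[0]    # (C, C) joint assignment distribution
    joint = (joint + joint.t()) / 2.0    # symmetrize: views are exchangeable
    joint = joint.clamp(min=eps)         # avoid log(0)
    m1 = joint.sum(dim=1, keepdim=True)  # marginal of the first view
    m2 = joint.sum(dim=0, keepdim=True)  # marginal of the second view
    mi = (joint * (joint.log() - m1.log() - m2.log())).sum()
    return -mi                           # minimize negative MI to maximize MI
```

Maximizing this quantity tends to reward assignments that are both confident and evenly spread over clusters, which is what encourages an approximate separation of the data into semantic concepts without any manual labels.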
Without solutions to these generalization problems, neural networks cannot learn effectively in real time from naturally sequential and un-annotated real-world data, which limits their deployment options. Generalization is therefore a problem with immense practical implications, as well as being interesting theoretically and from the perspective of biologically inspired learning.
- Funding agency for:
- Ji, X
- Grant:
- 1753489
- Programme:
- EPSRC CDT in Autonomous Intelligent Machines and Systems
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Deposit date:
- 2021-09-02
Terms of use
- Copyright holder:
- Ji, X
- Copyright date:
- 2021