Thesis
Spatial reasoning and planning for deep embodied agents
- Abstract:
Humans can perform complex tasks with long-term objectives by planning, reasoning, and forecasting the outcomes of actions. For embodied agents (e.g. robots) to achieve similar capabilities, they must acquire knowledge of the environment that transfers to novel scenarios with a limited budget of additional trial and error. Learning-based approaches, such as deep reinforcement learning, can discover and exploit inherent regularities and characteristics of the application domain from data and continuously improve their performance, but at the cost of large amounts of training data. This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks, focusing on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
Four key contributions are made. Firstly, CALVIN is a differentiable planner that learns interpretable models of the world for long-term planning. It successfully navigated partially observable 3D environments, such as mazes and indoor rooms, by learning the rewards (goals and obstacles) and state transitions (robot dynamics) from expert demonstrations.
Secondly, SOAP is a reinforcement learning algorithm that discovers macro-actions (options) without supervision for long-horizon tasks. Options segment a task into subtasks and enable consistent execution of each subtask. SOAP showed robust performance on history-conditional corridor tasks as well as on classical benchmarks such as Atari.
Thirdly, LangProp is a code optimisation framework that uses Large Language Models to solve embodied agent problems requiring reasoning, treating code as a learnable policy. The framework successfully generated interpretable code with performance comparable or superior to human-written experts in the CARLA autonomous driving benchmark.
Finally, Voggite is an embodied agent with a vision-to-action transformer backend that solves complex tasks in Minecraft. It achieved third place in the MineRL BASALT Competition by identifying action triggers to segment tasks into multiple stages.
These advancements provide new avenues for applications of learning-based methods in complex spatial reasoning and planning challenges.
Authors
Contributors
- Institution:
- University of Oxford
- Division:
- MPLS
- Department:
- Engineering Science
- Role:
- Supervisor
- DOI:
- Type of award:
- DPhil
- Level of award:
- Doctoral
- Awarding institution:
- University of Oxford
- Language:
- English
- Keywords:
- Subjects:
- Deposit date:
- 2024-09-22
Terms of use
- Copyright holder:
- Ishida, S
- Copyright date:
- 2024