Spatial reasoning and planning for deep embodied agents

Ishida, S

Thesis

Abstract: Humans can perform complex tasks with long-term objectives by planning, reasoning, and forecasting the outcomes of actions. For embodied agents (e.g. robots) to achieve similar capabilities, they must acquire knowledge of the environment that transfers to novel scenarios with a limited budget of additional trial and error. Learning-based approaches, such as deep reinforcement learning, can discover and exploit inherent regularities and characteristics of the application domain from data and continuously improve their performance, but at the cost of large amounts of training data. This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks, focusing on enhancing learning efficiency, interpretability, and transferability across novel scenarios.

Four key contributions are made. Firstly, CALVIN is a differentiable planner that learns interpretable models of the world for long-term planning. It successfully navigated partially observable 3D environments, such as mazes and indoor rooms, by learning the rewards (goals and obstacles) and state transitions (robot dynamics) from expert demonstrations.

Secondly, SOAP is a reinforcement learning algorithm that discovers macro-actions (options) without supervision for long-horizon tasks. Options segment a task into subtasks and enable consistent execution of each subtask. SOAP showed robust performance on history-conditional corridor tasks as well as classical benchmarks such as Atari.

Thirdly, LangProp is a code optimisation framework that uses Large Language Models to solve embodied agent problems that require reasoning, treating code as learnable policies. The framework successfully generated interpretable code with performance comparable or superior to human-written experts in the CARLA autonomous driving benchmark.

Finally, Voggite is an embodied agent with a vision-to-action transformer backend that solves complex tasks in Minecraft. It achieved third place in the MineRL BASALT Competition by identifying action triggers to segment tasks into multiple stages.

These advancements provide new avenues for applications of learning-based methods in complex spatial reasoning and planning challenges.

Ishida, S. (2024). Spatial reasoning and planning for deep embodied agents [PhD thesis]. University of Oxford.