Conference item
TACO: Learning task decomposition via temporal alignment for control
- Abstract:
- Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, they provide training data for each policy from different high-level tasks and compose them to perform novel ones. However, most existing approaches to modular LfD either focus on learning a single high-level task or depend on domain knowledge and temporal segmentation. By contrast, we propose a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration. Our approach simultaneously aligns the sketches with the observed demonstrations and learns the required sub-policies, which improves generalisation in comparison to separate optimisation procedures. We evaluate the approach on multiple domains, including a simulated 3D robot arm control task using purely image-based observations. The approach performs commensurately with fully supervised approaches, while requiring significantly less annotation effort, and significantly outperforms methods which separate segmentation and imitation.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Files:
- Accepted manuscript (pdf, 3.1MB)
Authors
- Publisher:
- Journal of Machine Learning Research
- Host title:
- International Conference on Machine Learning
- Journal:
- Thirty-fifth International Conference on Machine Learning (ICML 2018)
- Publication date:
- 2018-07-03
- Acceptance date:
- 2018-06-12
- Pubs id:
- pubs:857022
- UUID:
- uuid:db521575-3720-4091-94e7-e6a5da1fedb5
- Local pid:
- pubs:857022
- Source identifiers:
- 857022
- Deposit date:
- 2018-06-12
- ARK identifier:
Terms of use
- Copyright holder:
- Whiteson et al.
- Copyright date:
- 2018
- Notes:
- Copyright 2018 by the author(s). This is the accepted manuscript version of the article. The final version is available online from Journal of Machine Learning Research at: http://proceedings.mlr.press/v80/shiarlis18a.html