Hierarchical policies perform control at different levels of abstraction, providing benefits such as improved scalability and the ability to directly encode domain-specific constraints.
Previous work often relies on costly online interaction, as offline learning within hierarchical formulations is limited by the fact that high-level actions are typically not observed in the collected data.
We propose OHIO, a framework to learn hierarchical behavior policies from offline data. By exploiting structural knowledge of the low-level policy, we solve an inverse problem (top center) to transform low-level trajectory data (top left) into a dataset amenable to offline RL (top right), regardless of the nature of the policy used for data collection. At inference time, the RL-trained policy provides inputs to the low-level policy (bottom).
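To make the relabeling step concrete, here is a minimal sketch in Python assuming a simple proportional low-level controller a = K (g - s), whose inverse g = s + a / K is available in closed form; the controller, gain K, and data layout are illustrative assumptions rather than OHIO's actual implementation.

```python
import numpy as np

# Assumed low-level controller: proportional law a = K * (g - s),
# so the inverse g = s + a / K exists in closed form.
K = 2.0

def inverse_controller(s, a):
    """Recover the high-level goal g that would have produced action a in state s."""
    return s + a / K

def relabel_dataset(transitions):
    """Turn low-level transitions (s, a, r, s') into high-level ones (s, g, r, s'),
    which standard offline RL algorithms can then consume."""
    relabeled = []
    for s, a, r, s_next in transitions:
        g = inverse_controller(np.asarray(s, dtype=float), np.asarray(a, dtype=float))
        relabeled.append((s, g, r, s_next))
    return relabeled
```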
If no analytic form of the inverse exists, we recover the high-level action by solving the inverse problem numerically.
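As one illustration of such a numerical inverse, the sketch below uses plain gradient descent on the reconstruction error; the function low_level_policy, the dimension z_dim, and the optimizer settings are illustrative assumptions, not part of the paper's code.

```python
import torch

def recover_high_level_action(low_level_policy, state, action, z_dim,
                              steps=200, lr=0.05):
    """Numerically invert a differentiable low-level policy: find a high-level
    action z minimizing ||pi_lo(state, z) - action||^2 via gradient descent."""
    z = torch.zeros(z_dim, requires_grad=True)   # initial guess for z
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        a_hat = low_level_policy(state, z)       # low-level action implied by z
        loss = ((a_hat - action) ** 2).sum()     # reconstruction error
        loss.backward()
        opt.step()
    return z.detach()
```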
To demonstrate the broad applicability of our framework, OHIO is evaluated across different domains, including robotic control and network optimization problems:
✔️ OHIO successfully recovers hierarchical policies from non-hierarchical datasets
✔️ The analytical inverse is more susceptible to model errors and misspecification than its numerical counterpart
✔️ Selecting the observed next state as the high-level action is ineffective
✔️ OHIO facilitates offline RL under unknown or varying low-level configurations
✔️ OHIO directly encodes domain-specific constraints and enables robust online fine-tuning
✔️ OHIO significantly improves scalability and performance
@inproceedings{schmidt2025offline,
  title={Offline Hierarchical Reinforcement Learning via Inverse Optimization},
  author={Carolin Schmidt and Daniele Gammelli and James Harrison and Marco Pavone and Filipe Rodrigues},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=dTPz4rEDok}
}