# When to trust your model: Model-based policy optimization in offline RL settings
This repository contains the code for a version of the model-based RL algorithm MBPO, modified to run in offline RL settings.

Paper: *When to Trust Your Model: Model-Based Policy Optimization*

With many thanks, this code is based on Xingyu-Lin's easy-to-read PyTorch implementation of MBPO.

## Requirements

See `requirements.txt`. The code depends on D4RL's environments and datasets.

Only the hopper, walker, halfcheetah, and ant environments are supported right now (to evaluate in other environments, modify the termination function in `predict_env.py`; see the sketch below).

## Usage

Simply run:

```
python main_mbpo.py --env_name=halfcheetah-medium-v0 --seed=1234
```

[…]
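## Adding new environments

Since the learned dynamics model only predicts next states, `predict_env.py` needs a per-environment termination function that reproduces the environment's done condition from batches of predicted states. As a minimal sketch (the function name, signature, and observation layout here are assumptions, not the repository's actual interface), a Hopper-style termination check could look like this:

```python
import numpy as np

def hopper_termination_fn(obs, act, next_obs):
    """Replicate Hopper's done condition on a batch of predicted states.

    Assumes next_obs has shape (batch_size, obs_dim), with
    next_obs[:, 0] = torso height and next_obs[:, 1] = torso angle,
    matching the standard gym Hopper observation layout.
    """
    height = next_obs[:, 0]
    angle = next_obs[:, 1]
    not_done = (
        np.isfinite(next_obs).all(axis=-1)              # no NaN/inf in the state
        & (np.abs(next_obs[:, 1:]) < 100).all(axis=-1)  # state within bounds
        & (height > 0.7)                                # torso has not fallen
        & (np.abs(angle) < 0.2)                         # torso roughly upright
    )
    done = ~not_done
    return done[:, None]                                # shape (batch_size, 1)
```

For an environment such as HalfCheetah, which never terminates early, the function would simply return an all-`False` array of the same shape.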