When to trust your model: Model-based policy optimization in offline RL settings

This repository contains the code for a version of the model-based RL algorithm MBPO, modified to work in offline RL settings.
Paper: When to Trust Your Model: Model-Based Policy Optimization
Many thanks to Xingyu-Lin's easy-to-read PyTorch implementation of MBPO, on which this code is based.

See requirements.txt for the required packages.
The code depends on D4RL's environments and datasets.
Only the hopper, walker, halfcheetah, and ant environments are supported right now (to evaluate in other environments, add a matching termination function in predict_env.py).
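To give a sense of what such a termination function checks, here is a minimal sketch for a Hopper-style environment. The function name and signature below are illustrative assumptions, not the repo's actual interface in predict_env.py, and the thresholds follow the standard Gym Hopper health check:

```python
import numpy as np

def hopper_termination_fn(obs, act, next_obs):
    """Return a (batch, 1) boolean 'done' array for Hopper-style states.

    Assumes the standard Gym Hopper observation layout:
    next_obs[:, 0] is the torso height, next_obs[:, 1] the torso angle.
    (Illustrative sketch; the repo's real function may differ.)
    """
    height = next_obs[:, 0]
    angle = next_obs[:, 1]
    healthy = (
        np.isfinite(next_obs).all(axis=-1)          # state values are finite
        & (np.abs(next_obs[:, 1:]) < 100).all(axis=-1)  # no exploding states
        & (height > 0.7)                             # torso has not fallen
        & (np.abs(angle) < 0.2)                      # torso roughly upright
    )
    # An episode terminates exactly when the state is no longer healthy.
    return (~healthy)[:, None]
```

Because model rollouts never touch the real simulator, the model-based agent relies entirely on such hand-written checks to decide when an imagined trajectory should end.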

Simply run

```shell
python main_mbpo.py --env_name=halfcheetah-medium-v0 --seed=1234
```

Or modify the script runalgo.sh and run that instead.
