Preliminary code for Representation learning with Generalized Similarity Functions

Code for GSF learning in offline Procgen.

Note: The repo is under construction; some experiments may still be changed or added.

Since the dataset is very large due to operating on pixel observations, we provide a way to generate it from pre-trained PPO checkpoints instead of hosting 1 TB+ of data.
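The generation process amounts to rolling out the pre-trained policy and writing trajectories to disk in shards. Below is a minimal, hypothetical sketch of that loop; `rollout_to_shards` and the `env_reset`/`env_step` callables are illustrative stand-ins, not the repo's actual `evaluate_ppo.py` internals, but the output files follow the naming scheme described in the instructions below.

```python
import os

import numpy as np


def rollout_to_shards(policy, env_step, env_reset, n_shards,
                      timesteps_per_shard, out_dir):
    """Roll out `policy` and save one set of .npy arrays per shard.

    Hypothetical helper: writes obs_X.npy, action_X.npy, reward_X.npy,
    done_X.npy for X = 1..n_shards, mirroring the dataset layout the
    README describes.
    """
    for shard in range(1, n_shards + 1):
        obs_buf, act_buf, rew_buf, done_buf = [], [], [], []
        obs = env_reset()
        for _ in range(timesteps_per_shard):
            action = policy(obs)
            next_obs, reward, done = env_step(action)
            obs_buf.append(obs)
            act_buf.append(action)
            rew_buf.append(reward)
            done_buf.append(done)
            # Start a fresh episode when the current one terminates.
            obs = env_reset() if done else next_obs
        for name, buf in [("obs", obs_buf), ("action", act_buf),
                          ("reward", rew_buf), ("done", done_buf)]:
            np.save(os.path.join(out_dir, f"{name}_{shard}.npy"),
                    np.asarray(buf))
```

Sharding keeps individual files at a manageable size and lets downstream training code memory-map or stream one shard at a time.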

Instructions

  1. Clone the repo
  2. Either train a PPO agent from scratch on 200 levels (see the linked repo), or download the provided PPO checkpoints (same repo link). TL;DR: run `python train_ppo.py --env_name=bigfish` in the current repo to do so.
  3. Run `python evaluate_ppo.py --dataset_dir=<dir> --shards=<n_shards> --timesteps=<T> --obs_type rgb --model_dir=.` (placeholders in angle brackets are your own values).
    This will generate `obs_X.npy`, `action_X.npy`, `reward_X.npy`, and `done_X.npy` arrays, where `X` runs from 1 to `n_shards`.
  4. You can then work with these NumPy arrays in the classical offline-RL fashion.
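To work with the shards produced in step 3, a small loader that concatenates them back into flat arrays is usually enough. This is a sketch under the naming scheme above; `load_dataset` is a hypothetical helper, not part of the repo.

```python
import os

import numpy as np


def load_dataset(dataset_dir, n_shards):
    """Concatenate sharded arrays (obs_1.npy .. obs_N.npy, etc.)
    into one array per field, keyed by field name."""
    data = {}
    for name in ("obs", "action", "reward", "done"):
        data[name] = np.concatenate(
            [np.load(os.path.join(dataset_dir, f"{name}_{i}.npy"))
             for i in range(1, n_shards + 1)],
            axis=0,
        )
    return data
```

For datasets too large to fit in memory, `np.load(..., mmap_mode="r")` on each shard avoids reading everything at once.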
