Preliminary code for Representation learning with Generalized Similarity Functions

Code for GSF learning in offline Procgen.

Note: The repo is under construction; some experiments may still be changed or added.

Since the dataset is very large due to operating on pixel observations, we provide a way to generate it from pre-trained PPO checkpoints instead of hosting 1 TB+ of data.
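The generation process amounts to rolling out the pre-trained policy and writing trajectories to disk in shards. Below is a minimal, hypothetical sketch of that loop; `rollout_to_shards` and the `env_reset`/`env_step` callables are illustrative stand-ins, not the repo's actual `evaluate_ppo.py` internals, but the output files follow the naming scheme described in the instructions below.

```python
import os

import numpy as np


def rollout_to_shards(policy, env_step, env_reset, n_shards,
                      timesteps_per_shard, out_dir):
    """Roll out `policy` and save one set of .npy arrays per shard.

    Hypothetical helper: writes obs_X.npy, action_X.npy, reward_X.npy,
    done_X.npy for X = 1..n_shards, mirroring the dataset layout the
    README describes.
    """
    for shard in range(1, n_shards + 1):
        obs_buf, act_buf, rew_buf, done_buf = [], [], [], []
        obs = env_reset()
        for _ in range(timesteps_per_shard):
            action = policy(obs)
            next_obs, reward, done = env_step(action)
            obs_buf.append(obs)
            act_buf.append(action)
            rew_buf.append(reward)
            done_buf.append(done)
            # Start a fresh episode when the current one terminates.
            obs = env_reset() if done else next_obs
        for name, buf in [("obs", obs_buf), ("action", act_buf),
                          ("reward", rew_buf), ("done", done_buf)]:
            np.save(os.path.join(out_dir, f"{name}_{shard}.npy"),
                    np.asarray(buf))
```

Sharding keeps individual files at a manageable size and lets downstream training code memory-map or stream one shard at a time.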

Instructions

  1. Clone the repo
  2. Either train a PPO agent from scratch on 200 levels (see the linked repo), or download the provided PPO checkpoints (same repo link). TL;DR: run `python train_ppo.py --env_name=bigfish` in the current repo to do so.
  3. Run `python evaluate_ppo.py --dataset_dir=<dir> --shards=<n_shards> --timesteps=<T> --obs_type rgb --model_dir=.` (placeholders in angle brackets are your own values).
    This will generate `obs_X.npy`, `action_X.npy`, `reward_X.npy`, and `done_X.npy` arrays, where `X` runs from 1 to `n_shards`.
  4. You can then work with these NumPy arrays in the classical offline-RL fashion.
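To work with the shards produced in step 3, a small loader that concatenates them back into flat arrays is usually enough. This is a sketch under the naming scheme above; `load_dataset` is a hypothetical helper, not part of the repo.

```python
import os

import numpy as np


def load_dataset(dataset_dir, n_shards):
    """Concatenate sharded arrays (obs_1.npy .. obs_N.npy, etc.)
    into one array per field, keyed by field name."""
    data = {}
    for name in ("obs", "action", "reward", "done"):
        data[name] = np.concatenate(
            [np.load(os.path.join(dataset_dir, f"{name}_{i}.npy"))
             for i in range(1, n_shards + 1)],
            axis=0,
        )
    return data
```

For datasets too large to fit in memory, `np.load(..., mmap_mode="r")` on each shard avoids reading everything at once.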
