SCU OlympicsRunning Baseline
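Environment
For environment details, see the Jidi Competition page for the RLChina2021 Agent Competition.

Modifications
Reward reshaping: the environment was modified to redistribute the rewards, so that the reward is no longer purely zero-sum and also includes a reward for exploring the environment.
Algorithm fine-tuning: the actor loss of the official PPO algorithm was modified to add a constraint on the entropy of the actor's distribution. Auxiliary components such as RND and ICM are planned for the future.

Dependency
conda create -n olympics python=3.8.5
conda activate olympics
pip install -r requirements.txt

Run a game
python olympics/main.py

Train a baseline agent
python rl_trainer/main.py

With the default parameters, the total reward over training is shown in the figure below.
[Figure: total training reward with default parameters]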
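Entropy term sketch
The entropy constraint on the PPO actor loss mentioned above is not spelled out in this text. One common form is an entropy bonus subtracted from the clipped surrogate loss, and the sketch below illustrates that form only. It is framework-free Python for readability; the function names, `clip_eps`, and `entropy_coef` are hypothetical and are not taken from `rl_trainer`, and the repository's actual loss may differ.

```python
import math


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_actor_loss(new_logp, old_logp, advantages, entropies,
                   clip_eps=0.2, entropy_coef=0.01):
    """Clipped PPO surrogate loss with an entropy bonus, averaged over a batch.

    All arguments are equal-length lists of floats, one entry per sample.
    clip_eps and entropy_coef are assumed values, not the repository's.
    """
    total = 0.0
    for nlp, olp, adv, ent in zip(new_logp, old_logp, advantages, entropies):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(nlp - olp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (lower) bound of the two surrogate objectives.
        surrogate = min(ratio * adv, clipped * adv)
        # Minimise the negative surrogate; subtracting the entropy term
        # rewards policies that keep their action distribution spread out.
        total += -surrogate - entropy_coef * ent
    return total / len(new_logp)


if __name__ == "__main__":
    # Two samples, each with a uniform distribution over two actions.
    ent = categorical_entropy([0.5, 0.5])
    loss = ppo_actor_loss(
        new_logp=[math.log(0.5), math.log(0.5)],
        old_logp=[math.log(0.5), math.log(0.5)],
        advantages=[1.0, -1.0],
        entropies=[ent, ent],
    )
    print(loss)
```

A larger `entropy_coef` pushes the policy toward more uniform action choices, which can help exploration early in training at the cost of slower convergence.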