snake-rl

This repo solves the Snake game on (10, 10) and (20, 20) grids efficiently, using many parallel environments and fp16 training with A2C and PPO.

Installation

This project uses uv for dependency management, with dependencies and metadata declared in pyproject.toml per modern Python packaging standards.

Prerequisites

  • Python 3.12 or higher
  • uv package manager

Install with uv (Recommended)

# Install uv if you haven't already
pip install uv

# Install the project and all dependencies
uv sync

Install with pip (Alternative)

pip install -e .

Installing the project automatically brings in:

  • A modified version of tianshou with fp16 training and custom-network support
  • The Gym-Snake environment
  • PyTorch with CUDA 12.8 support (configurable via tool.uv.sources in pyproject.toml; see the sketch below)
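For reference, here is a minimal sketch of what the tool.uv.sources wiring for a CUDA build of PyTorch can look like. The index name and exact tables below are illustrative; the repo's own pyproject.toml is authoritative.

# Illustrative pyproject.toml excerpt -- adapt to the actual file
[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu128" }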

Training

With uv:

uv run main_a2c.py  # for A2C training
# or
uv run main_ppo.py  # for PPO training

Without uv:

python main_a2c.py  # for A2C training
# or
python main_ppo.py  # for PPO training

Notes

  • Training a decent A2C agent takes around 4 hours and reaches an average reward of 80 (the maximum is 97 on the (10, 10) grid). I am still experimenting to maximize the score and stabilize training.

  • PPO is more stable but takes a little longer to learn.

  • These results were obtained with 256 environments running simultaneously, so training requires many environment steps. The environment is optimized enough that this was not a problem even on a single core. On Linux you can use SubprocVectorEnv instead of DummyVectorEnv to utilize all cores (see the first sketch after this list); running hundreds of envs at once should not be a problem.

  • To watch a pretrained agent, comment out the onpolicy_trainer(...) section and enable the loading code instead; see the loading sketch below.
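A minimal sketch of swapping the vectorized environment type with tianshou. The env id and env count here are assumptions; the actual setup lives in main_a2c.py / main_ppo.py.

import gym
from tianshou.env import DummyVectorEnv, SubprocVectorEnv

# Hypothetical env id -- use whatever id Gym-Snake registers in your setup.
make_env = lambda: gym.make("snake-v0")

# Single-process vectorization; fine when each env step is cheap.
train_envs = DummyVectorEnv([make_env for _ in range(256)])

# On Linux, run each env in its own subprocess to use all cores:
# train_envs = SubprocVectorEnv([make_env for _ in range(256)])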
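And a minimal sketch of the load-and-watch flow. The checkpoint name is a placeholder, and policy and test_envs must be built exactly as in the training script.

import torch
from tianshou.data import Collector

# Assumes `policy` and `test_envs` were constructed as in main_a2c.py / main_ppo.py.
policy.load_state_dict(torch.load("a2c_policy.pth"))  # hypothetical checkpoint path
policy.eval()

# Watch the pretrained agent for a few episodes.
collector = Collector(policy, test_envs)
collector.collect(n_episode=5, render=1 / 30)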
