Status: Experimental Prototype
This repository is an AI-assisted “vibe coding” experiment, not a finished or polished game AI.
This repository contains a small Python prototype that applies an AlphaZero-style search (MCTS + neural network) to a simplified, offline version of the StarCraft II arcade map “New Random Tower Defense”.
To avoid confusion:
- It is not a general StarCraft II reinforcement learning agent.
- It does not connect to the live SC2 client or play ladder games.
- It does not represent my low-level implementation skills in RL or deep learning.
Instead, this code runs a standalone grid-based simulation that imitates some of the decision structure of the arcade tower defense game (e.g. tower placement along fixed mob paths), in order to test an AlphaZero-style loop in a toy setting.
This project was built as an explicit AI-assisted “vibe coding” experiment:
-
Most of the concrete Python code was written by AI (LLMs).
- MCTS scaffolding
- Environment wrappers
- Training loop boilerplate
- Model class structure
-
My contribution is primarily conceptual and architectural:
- Describing the mechanics and constraints of New Random Tower Defense
- Proposing state and action representations (grid, lanes, tower slots)
- Designing and iterating on reward ideas
- Steering and editing AI-generated code to roughly match those ideas
Please do not treat this repository as a portfolio of “pure hand-written RL code.”
It is closer to a log of how I used AI tools to prototype and explore an idea.
The main goals of this prototype are:
- to represent tower-defense states on a discrete grid,
- to use Monte Carlo Tree Search (MCTS) guided by a neural network,
- and to see whether AlphaZero-style self-play can discover sensible tower placement policies in a simplified offline environment.
It should be viewed as:
- experimental,
- unstable,
- and primarily educational / exploratory.
Core files:
-
alpha_mcts.py
Monte Carlo Tree Search implementation:- selection
- expansion
- simulation
- backpropagation
-
alpha_env.py
Custom environment wrapper:- defines the grid-based state and action space
- encodes a simplified tower-defense reward
- tracks mob progress along lanes
-
alpha_model.py
Neural network used by MCTS:- policy head (action probabilities over legal tower placements)
- value head (estimated outcome of a state)
-
alpha_train.py
AlphaZero-style training loop:- runs self-play episodes using MCTS + current network
- stores game histories
- periodically updates the network from collected data
-
alpha_rtd.py
Example script for running the agent in a specific Random Tower Defense scenario. -
alpha_common.py
Shared helpers:- loading grid/waypoint CSVs
- configuration utilities
-
config.json
Central configuration for hyperparameters (learning rate, number of simulations, etc.).
Data files:
-
mob_path_waypoints_v2.csv
Approximate creep/mob path waypoints for the offline grid simulation. -
grid with lane and slot.csv
Grid layout specification:- lane paths
- valid tower slots
- blocked cells
git clone https://github.com/7riangle/sc2-rtd-alphazero.git
cd sc2-rtd-alphazeroRequires Python 3.8+.
If you have a requirements.txt:
pip install -r requirements.txtOr install core libraries manually:
pip install numpy pandas gymnasium torchOptionally, for plotting and progress bars:
pip install matplotlib tqdmCurrently, the code expects the CSV files to be in the repository root directory:
mob_path_waypoints_v2.csvgrid with lane and slot.csv
If you move them into a data/ folder, you will need to update the paths in alpha_common.py (and any other script that loads these files).
python alpha_train.pyThis starts an AlphaZero-style self-play loop in the offline grid environment:
- the agent generates games using MCTS + the current network,
- game data is stored,
- and the network is updated from these examples.
python alpha_rtd.pyThis runs the agent in a configured tower-defense scenario, using the current model parameters and environment settings.
- MCTS behaviour in terminal / edge states needs further debugging.
- Environment code should be refactored to support flexible relative paths (e.g.
data/directory). - Reward signals are simple and may need redesign to avoid degenerate behaviour.
- Code structure could be simplified (some functions are longer than necessary).
- No formal evaluation or baselines are implemented yet.
To be explicit:
- This project does not connect to the live StarCraft II client.
- It does not control units or play online games.
- It is a standalone Python simulation that borrows the basic idea of the “New Random Tower Defense” arcade map and applies an AlphaZero-style search to a simplified grid version of that idea.
- The code is largely AI-generated, based on my high-level descriptions of the game logic and reward structure.
If you are interested in this kind of experiment, feel free to fork, modify, or strip down the code for your own tower-defense or grid-based toy environments.