- Inspired by TinyZero (countdown task)
- Setup Stockfish
- Download link
- On macOS, allow the app to run under System Settings -> Privacy & Security
- Set the `STOCKFISH_PATH` environment variable to the path of the Stockfish executable
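
A minimal sketch of how a script might read and validate `STOCKFISH_PATH` before launching the engine. The helper name `resolve_stockfish_path` is hypothetical and not part of this project; only the environment variable name comes from the notes above.

```python
import os


def resolve_stockfish_path(env=None):
    """Return the Stockfish executable path from STOCKFISH_PATH, or raise.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    path = env.get("STOCKFISH_PATH")
    if not path:
        raise RuntimeError("STOCKFISH_PATH is not set")
    # Fail early with a clear message if the path is wrong or not executable.
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise RuntimeError(f"STOCKFISH_PATH is not an executable file: {path}")
    return path
```

Checking the path up front gives a clearer error than a failed engine launch later in a training run.
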
- From TinyZero
- Twitter thread
- Model size, base vs. instruct, RL algorithm
- Effect of curriculum training
- Increasing Elo vs. random ordering
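
The two orderings being compared could be sketched as follows. This assumes each training example carries a numeric `elo` difficulty field; the field name, the `order_examples` helper, and the sample data are illustrative, not taken from the project.

```python
import random


def order_examples(examples, curriculum=True, seed=0):
    """Return examples sorted by ascending Elo, or shuffled as the baseline."""
    if curriculum:
        # Curriculum: easiest (lowest Elo) first.
        return sorted(examples, key=lambda ex: ex["elo"])
    shuffled = list(examples)
    # A seeded RNG keeps the random baseline reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    return shuffled


examples = [
    {"id": "a", "elo": 1800},
    {"id": "b", "elo": 900},
    {"id": "c", "elo": 1400},
]
print([ex["id"] for ex in order_examples(examples)])  # ['b', 'c', 'a']
```
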
- Board representation
- FEN vs. piece list vs. ASCII
- Qwen VL/2.5-VL + image of board (?)
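
To make the three text representations concrete, here is a standard-library sketch that derives an ASCII grid and a piece list from a FEN string. The exact piece-list and ASCII layouts are assumptions chosen for illustration; the notes do not specify them, and the function names are hypothetical.

```python
def expand_fen_rows(fen):
    """Expand the piece-placement field of a FEN string into 8 rows of 8 chars."""
    placement = fen.split()[0]
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            # A digit encodes a run of that many empty squares.
            row += "." * int(ch) if ch.isdigit() else ch
        rows.append(row)
    if len(rows) != 8 or any(len(r) != 8 for r in rows):
        raise ValueError(f"malformed FEN piece placement: {placement}")
    return rows


def fen_to_ascii(fen):
    """Render the board as 8 lines, rank 8 at the top, '.' for empty squares."""
    return "\n".join(" ".join(row) for row in expand_fen_rows(fen))


def fen_to_piece_list(fen):
    """List pieces per side as piece letter + square, e.g. 'Ke1'."""
    white, black = [], []
    for rank_index, row in enumerate(expand_fen_rows(fen)):
        rank = 8 - rank_index  # FEN lists rank 8 first
        for file_index, ch in enumerate(row):
            if ch == ".":
                continue
            square = "abcdefgh"[file_index] + str(rank)
            # Uppercase letters are white pieces, lowercase are black.
            (white if ch.isupper() else black).append(ch.upper() + square)
    return "White: " + ", ".join(white) + "\nBlack: " + ", ".join(black)


fen = "4k3/8/8/8/8/8/4P3/4K3 w - - 0 1"
print(fen_to_ascii(fen))
print(fen_to_piece_list(fen))
```

FEN is the most compact of the three, while the piece list and ASCII grid spell out square locations explicitly; which one a language model handles best is the open question this item tracks.
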
- Reward shaping
- Binary 0/1 vs. Stockfish score delta (centipawns)
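
The two reward options can be sketched as plain functions. This assumes the Stockfish evaluations are already available in centipawns from the moving side's perspective, and that the delta is taken between the position before and after the chosen move; the `tanh` squashing, the `scale` parameter, and both function names are illustrative choices, not something the notes prescribe.

```python
import math


def binary_reward(chosen_move, best_move):
    """Return 1.0 only when the chosen move matches the engine's best move."""
    return 1.0 if chosen_move == best_move else 0.0


def centipawn_delta_reward(cp_before, cp_after, scale=100.0):
    """Squash the evaluation change (in centipawns) into the range (-1, 1).

    Both scores are assumed to be from the mover's perspective, so a negative
    delta means the move made the mover's position worse.
    """
    return math.tanh((cp_after - cp_before) / scale)


print(binary_reward("e2e4", "e2e4"))      # 1.0
print(centipawn_delta_reward(30, 30))     # 0.0
print(centipawn_delta_reward(0, -300) < 0)  # True
```

The binary signal is sparse but hard to game, while the centipawn delta gives graded feedback for near-best moves at the cost of depending on the engine's evaluation scale.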