Last updated 2025-10-27.
This project contains three subprojects that represent my initial foray into vibe coding:
- The Pips Solver is a human-specified solver for the NY Times Pips game, with implementations coded by Gemini 2.5 Pro, Claude Code (Sonnet 4.5), and OpenAI Codex 5.
- The 8x8-2x2 Mechanical Proof is a human-specified, human-guided, AI-implemented mechanical proof of an assertion from Polyominoes 101 - The Absolute Basics that an 8x8 grid with a 2x2 square removed can be tiled with the 12 pentominoes (up to chirality). Rather than a three-way bakeoff, this work was done by Claude Code, and I gave it the opportunity to read through the code for the then-in-progress Polypips project.
- The Polypips Game Concept extends the Pips game to polyominoes and introduces a game generator that accepts various heuristics, along with a solver. As with the other work here, this was human-specified, human-guided, and AI-implemented. Rather than a three-way bakeoff, this work was done by Codex 5.
There is more discussion of each subproject in the sections below.
For the Pips Solver, the specification is in pips-solution-strategy.md, and my primary objective was to experiment with different AI models/agents and see what it's like to work with them (and to have them work with each other). I used the Zed editor for authoring: I wrote a draft of the specification independently first and then improved it iteratively while doing the initial implementation with Gemini.
I worked through four stages:
- Independent implementation by each AI coding agent, based on the specification.
- Optimization of the algorithm with the agents critiquing and adapting code from each other.
- Implementation of human-specified improvements (e.g., a heuristic for selecting the next move).
- Further improvements (see the project's issues): output formatting (adapted from Brian Berns' F# implementation) and pulling JSON games down directly, with the agent integrated with GitHub via the GitHub MCP server (mcp-remote presenting a stdio interface to Codex). I only had Codex do this work, so the Claude and Gemini implementations don't have those improvements yet.
The most interesting puzzles from a troubleshooting standpoint were the 2025-09-15 "hard" (due to its large single constraint) and the 2025-10-14 "hard" (due to the number and complexity of its constraints). Solving any puzzle takes at most about 30 seconds on my laptop.
The Codex solver has some enhancements to count the total number of solutions for a puzzle, and most puzzles have only one solution. Of the 183 puzzles in examples:
- 131 puzzles (≈72%) have exactly one solution: easy 50, medium 44, hard 37.
- The remaining 52 puzzles mostly have a small number of alternative solutions, except for a handful of hard cases with large branching factors.
The puzzles with the most solutions are:
- 2025-09-15 hard: 2,764,800 solutions
- 2025-10-05 hard: 344 solutions
- 2025-09-30 hard: 110 solutions
- 2025-09-04 hard: 86 solutions
- 2025-08-23 hard: 80 solutions
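The notes above don't describe how the Codex solver counts solutions, so the following is only a minimal, hypothetical sketch of the general idea under a simplified puzzle model: cells are (row, col) tuples, dominoes are pip pairs, and the only region constraints are "sum" and "equal" (the names `count_solutions`, `Constraint`, and the constraint format are all assumptions, not the project's code). Counting means the backtracking search keeps going after each complete placement instead of stopping at the first. The sketch also picks the most constrained empty cell first, one common next-move heuristic; it is not necessarily the heuristic the project uses.

```python
from typing import Dict, List, Sequence, Set, Tuple

Cell = Tuple[int, int]
Domino = Tuple[int, int]
# Hypothetical constraint format: (cells in region, kind, target),
# where kind is "sum" (target used) or "equal" (target ignored).
Constraint = Tuple[Sequence[Cell], str, int]


def count_solutions(cells: Set[Cell], dominoes: List[Domino],
                    constraints: List[Constraint]) -> int:
    """Count every placement of all dominoes that covers all cells and
    satisfies the constraints (exhaustive backtracking, no early exit)."""
    values: Dict[Cell, int] = {}
    used = [False] * len(dominoes)

    def consistent() -> bool:
        # Check each constraint against the cells filled so far.
        for region, kind, target in constraints:
            filled = [values[c] for c in region if c in values]
            if kind == "sum":
                total = sum(filled)
                # Pips are non-negative, so a partial sum can only grow.
                if total > target or (len(filled) == len(region) and total != target):
                    return False
            elif kind == "equal" and len(set(filled)) > 1:
                return False
        return True

    def free_neighbors(cell: Cell) -> List[Cell]:
        r, c = cell
        cand = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
        return [n for n in cand if n in cells and n not in values]

    def search() -> int:
        empty = [c for c in cells if c not in values]
        if not empty:
            # A solution only counts if every domino was placed.
            return int(all(used))
        # Most-constrained cell first: fewest free neighbors to pair with.
        cell = min(empty, key=lambda c: (len(free_neighbors(c)), c))
        partners = free_neighbors(cell)
        total = 0
        tried: Set[Domino] = set()
        for i, (a, b) in enumerate(dominoes):
            key = (min(a, b), max(a, b))
            # Skip used dominoes and duplicates already tried at this level,
            # so identical dominoes are not counted as distinct solutions.
            if used[i] or key in tried:
                continue
            tried.add(key)
            used[i] = True
            for other in partners:
                for x, y in {(a, b), (b, a)}:  # a double has one orientation
                    values[cell], values[other] = x, y
                    if consistent():
                        total += search()  # keep going: count, don't stop
                    del values[cell], values[other]
            used[i] = False
        return total

    return search()
```

For example, a 2x2 grid with dominoes (0, 0) and (1, 1) has four solutions (two tilings, each with two ways to assign the dominoes), and adding an "equal" constraint on the top row leaves only the two horizontal placements. Because each step fixes the domino covering one deterministically chosen cell, every solution is reached exactly once, which is what makes the count valid.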
These notes primarily cover the initial implementation and a couple of subsequent optimization/refinement passes.
I used Gemini 2.5 Pro to make a first pass from within the Zed editor, interacting with the AI to build a working solver; net net, it was certainly faster than writing it on my own. After making passes with Claude and Codex, I used Gemini 2.5 Pro again (this time from the CLI) for a second pass, incorporating the refinements to the specification that came out of working with the other models.
For the first pass, Gemini didn't do a great job of following the instructions in the specification, and it was absolutely the most "YOLO" in terms of writing tests and explaining its thinking.