prb/pips-solver

Pips Solver, 8x8-2x2 Mechanical Proof, and Poly Pips Game Concept

Last updated 2025-10-27.

This project contains three subprojects that are an initial foray into vibe coding for me:

  • The Pips Solver is a human-specified solver for the NY Times Pips game with implementations coded by Gemini 2.5 Pro, Claude Sonnet 4.5 + Code, and OpenAI Codex 5.
  • The 8x8-2x2 Mechanical Proof is a human-specified, human-guided, AI-implemented mechanical proof of an assertion from Polyominoes 101 - The Absolute Basics that an 8x8 grid with a 2x2 square removed can be tiled with the 12 pentominoes (up to chirality). Rather than a three-way bakeoff, this work was done by Claude Code, and I gave it the opportunity to read through the code for the then-in-progress Polypips project.
  • The Polypips Game Concept extends the concept of the Pips game to polyominoes and introduces both a game generator that accepts various heuristics and a solver. As with the other work here, this was human-specified, human-guided, and AI-implemented. Rather than a three-way bakeoff, this work was done by Codex 5.

There is more discussion of each subproject in the sections below.

Pips Solver Discussion

The specification is in pips-solution-strategy.md. My primary objective was to experiment with different AI models/agents and to learn what it's like to work with them (and to have them work with each other). I used the Zed editor for authoring: I first wrote a draft of the specification independently, then iteratively improved it during the initial work of implementing it via Gemini.

I worked through four stages:

  1. Each AI coding agent implementing independently based on the specification.
  2. Optimization of the algorithm with the agents critiquing and adapting code from each other.
  3. Implementation of human-specified improvements (e.g., a heuristic for selecting the next move).
  4. Further improvements (see the issues for the project): output formatting (adapted from Brian Berns' F# implementation) and pulling JSON games down directly, with the agent integrated with GitHub via the GitHub MCP (mcp-remote presenting a stdio interface to Codex). I only had Codex do this work, so the Claude and Gemini implementations don't have those improvements yet.
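
Stage 3 mentions a heuristic for selecting the next move, but this README does not say which one. As a purely illustrative sketch, assuming a "most constrained first" rule (a common choice in backtracking solvers, and not necessarily the one used in this project), the selection step could look like the following. The function name `pick_most_constrained` and the `candidates` mapping are hypothetical and do not come from the project code.

```python
def pick_most_constrained(candidates):
    """Return the open cell with the fewest legal placements.

    `candidates` is a hypothetical mapping: open cell -> list of legal
    domino placements for that cell. It is not taken from the project code.
    """
    if not candidates:
        raise ValueError("no open cells left")
    # Fewer options means less branching, so try that cell first.
    return min(candidates, key=lambda cell: len(candidates[cell]))


# Illustrative data: cell (0, 2) has a single legal placement.
candidates = {
    (0, 0): ["A", "B", "C"],
    (0, 2): ["D"],
    (1, 1): ["E", "F"],
}
next_cell = pick_most_constrained(candidates)
print(next_cell)
```

Trying the tightest cell first tends to expose dead ends early, which is why this family of heuristics is popular for constraint puzzles.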

Notes on Puzzles

The most interesting puzzles from a troubleshooting standpoint were the 2025-09-15 "hard" (due to the large single constraint) and the 2025-10-14 "hard" (due to the number and complexity of constraints). Solving takes no more than ~30 seconds on my laptop.

The Codex solver has some enhancements to count the total number of solutions for a puzzle, and most puzzles have only one solution. Of the 183 puzzles in examples:

  • 131 puzzles (≈72%) have exactly one solution: easy 50, medium 44, hard 37.
  • The remaining 52 puzzles show small clusters of alternatives, except for a handful of hard cases with large branching factors.
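
The counts above can be cross-checked with a few lines of Python. This only re-derives the totals from the numbers quoted in this section; the variable names are made up for the example.

```python
# Single-solution puzzle counts per difficulty, as quoted above.
unique_by_difficulty = {"easy": 50, "medium": 44, "hard": 37}
total_puzzles = 183

unique_total = sum(unique_by_difficulty.values())
multi_solution = total_puzzles - unique_total
share = unique_total / total_puzzles

print(f"{unique_total} of {total_puzzles} puzzles ({share:.1%}) have one solution")
print(f"{multi_solution} puzzles have more than one solution")
```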

The puzzles with the most solutions are:

  • 2025-09-15 hard: 2,764,800 solutions
  • 2025-10-05 hard: 344 solutions
  • 2025-09-30 hard: 110 solutions
  • 2025-09-04 hard: 86 solutions
  • 2025-08-23 hard: 80 solutions
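
Counting every solution, rather than stopping at the first one found, is a small change to a backtracking search. The sketch below is a simplified stand-in and not the Codex solver: it counts plain domino tilings of a rectangular grid and ignores pip values and region constraints entirely. The function name `count_tilings` and its `limit` parameter are assumptions made for illustration.

```python
def count_tilings(rows, cols, limit=None):
    """Count domino tilings of a rows x cols grid by backtracking.

    If `limit` is given, the search stops once that many tilings are found
    (limit=1 behaves like "find any solution").
    """
    filled = [[False] * cols for _ in range(rows)]
    count = 0

    def first_empty():
        # Row-major scan, so cells above and to the left are already filled.
        for r in range(rows):
            for c in range(cols):
                if not filled[r][c]:
                    return r, c
        return None

    def search():
        nonlocal count
        if limit is not None and count >= limit:
            return
        cell = first_empty()
        if cell is None:
            count += 1  # every cell is covered: one complete tiling
            return
        r, c = cell
        # The domino covering this cell must extend right or down.
        for dr, dc in ((0, 1), (1, 0)):
            r2, c2 = r + dr, c + dc
            if r2 < rows and c2 < cols and not filled[r2][c2]:
                filled[r][c] = filled[r2][c2] = True
                search()
                filled[r][c] = filled[r2][c2] = False  # undo and try the next option

    search()
    return count


print(count_tilings(2, 3))           # all tilings of a 2x3 grid
print(count_tilings(2, 3, limit=1))  # stop at the first tiling
```

Because the full count visits every branch, a puzzle with a large, loosely constrained region can have a very large number of solutions, as the 2025-09-15 "hard" puzzle shows.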

Notes on Agents

These notes are primarily for the initial implementation and a couple of subsequent optimization/refinement passes.

Gemini 2.5 Pro

I used Gemini 2.5 Pro to make a first pass from within the Zed editor, interacting with the AI to build a working solver; on balance, it was certainly faster than writing it on my own. After making passes with Claude and Codex, I used Gemini 2.5 Pro again (but from the CLI this time) for a second pass, working from the specification as refined through the interaction with the other models.

For the first pass, Gemini didn't do a great job following the instructions in the specification, and it was absolutely the most "YOLO" in terms of writing tests and explaining its thinking.

Claude Sonnet 4.5 + Code