🪄 Automatically convert a PDF into a fillable form.
💻 Hosted Models (detect.semanticdocs.org) | 📄 CommonForms Paper | 🤗 Dataset | 🦾 Models
This repo contains three things:
- the pip-installable commonforms package, which has a CLI and API for converting PDFs into fillable forms
- the FFDNet-S and FFDNet-L models from the paper CommonForms: A Large, Diverse Dataset for Form Field Detection
- the preprocessing code for the CommonForms dataset, which is hosted on HuggingFace: https://huggingface.co/datasets/jbarrow/CommonForms
CommonForms can be installed with either uv or pip; choose whichever package manager you prefer:
uv pip install commonforms
Once it's installed, you should be able to run the CLI command on almost any PDF.
The simplest usage will run inference on your CPU using the default suggested settings:
commonforms <input.pdf> <output.pdf>
Argument | Type | Default | Description |
---|---|---|---|
input | Path | Required | Path to the input PDF file |
output | Path | Required | Path to save the output PDF file |
--model | str | FFDNet-L | Model name (FFDNet-L/FFDNet-S) or path to custom .pt file |
--keep-existing-fields | flag | False | Keep existing form fields in the PDF |
--use-signature-fields | flag | False | Use signature fields instead of text fields for detected signatures |
--device | str | cpu | Device for inference (e.g., cpu, cuda, 0) |
--image-size | int | 1600 | Image size for inference |
--confidence | float | 0.3 | Confidence threshold for detection |
--fast | flag | False | If running on a CPU, you can trade off accuracy for speed and run in about half the time |
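To show how the flags in the table combine, here is a minimal sketch that assembles a CLI invocation from Python and can optionally run it. The helper names build_commonforms_command and run_commonforms are hypothetical and not part of the package; the sketch only assumes that the commonforms executable is on your PATH and accepts the flags listed in the table.

```python
import subprocess
from typing import List, Optional


def build_commonforms_command(
    input_pdf: str,
    output_pdf: str,
    model: Optional[str] = None,
    device: Optional[str] = None,
    image_size: Optional[int] = None,
    confidence: Optional[float] = None,
    keep_existing_fields: bool = False,
    use_signature_fields: bool = False,
    fast: bool = False,
) -> List[str]:
    """Assemble an argv list for the commonforms CLI (hypothetical helper)."""
    command = ["commonforms", input_pdf, output_pdf]

    # Valued options are added only when set, so the CLI's own defaults
    # apply for anything left as None.
    for flag, value in (
        ("--model", model),
        ("--device", device),
        ("--image-size", image_size),
        ("--confidence", confidence),
    ):
        if value is not None:
            command += [flag, str(value)]

    # Boolean flags take no value; they are present or absent.
    for flag, enabled in (
        ("--keep-existing-fields", keep_existing_fields),
        ("--use-signature-fields", use_signature_fields),
        ("--fast", fast),
    ):
        if enabled:
            command.append(flag)

    return command


def run_commonforms(command: List[str]) -> None:
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    cmd = build_commonforms_command(
        "input.pdf", "output.pdf", model="FFDNet-S", confidence=0.5, fast=True
    )
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(cmd))
```

Passing the result to run_commonforms(cmd) would execute the conversion; building the list first makes it easy to log or inspect the exact command before running it.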
In addition to the CLI, you can use the Python API:
from commonforms import prepare_form
prepare_form(
"path/to/input.pdf",
"path/to/output.pdf"
)
All of the above arguments are keyword arguments to the prepare_form function.
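The exact keyword names are not spelled out here, so the sketch below assumes they mirror the CLI flags with dashes turned into underscores (for example, --image-size becoming image_size). That naming is an assumption, which is why the code checks each option against the real signature of prepare_form before calling it. The helpers flag_to_keyword, split_supported, and convert are hypothetical and not part of the package.

```python
import inspect
from typing import Any, Callable, Dict, List, Tuple


def flag_to_keyword(flag: str) -> str:
    """Turn a CLI flag such as '--image-size' into 'image_size' (assumed naming)."""
    return flag.lstrip("-").replace("-", "_")


def split_supported(
    func: Callable[..., Any], options: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Split options into those named in func's signature and those that are not."""
    params = inspect.signature(func).parameters
    # A function declaring **kwargs accepts any keyword name.
    takes_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    accepted = {k: v for k, v in options.items() if takes_any or k in params}
    rejected = sorted(set(options) - set(accepted))
    return accepted, rejected


def convert(input_pdf: str, output_pdf: str, cli_options: Dict[str, Any]) -> None:
    """Call prepare_form with CLI-style options, failing loudly on unknown names."""
    # Imported lazily so the helpers above work without commonforms installed.
    from commonforms import prepare_form

    options = {flag_to_keyword(f): v for f, v in cli_options.items()}
    accepted, rejected = split_supported(prepare_form, options)
    if rejected:
        raise TypeError(f"prepare_form does not accept: {rejected}")
    prepare_form(input_pdf, output_pdf, **accepted)
```

A call might look like convert("path/to/input.pdf", "path/to/output.pdf", {"--device": "cuda", "--confidence": 0.5}). If the assumed names are wrong, the TypeError lists the offending options, and help(prepare_form) shows the real ones.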
🚧 Code for dataset prep exists in the dataset folder.
If you use the tool, models, or code in an academic paper, please cite the CommonForms paper:
@misc{barrow2025commonforms,
title = {CommonForms: A Large, Diverse Dataset for Form Field Detection},
author = {Barrow, Joe},
year = {2025},
eprint = {2509.16506},
archivePrefix= {arXiv},
primaryClass = {cs.CV},
doi = {10.48550/arXiv.2509.16506},
url = {https://arxiv.org/abs/2509.16506}
}
If you use it in a non-academic setting, please reach out to the author (joseph.d.barrow [at] gmail.com)! I love to hear when people are using my work!