This is Project Themis, a suite of tools that powers Brier.fyi and previously powered Calibration City. The purpose of this project is to perform useful analysis of prediction market calibration and accuracy using data from each platform's public API.
Clone this repository and enter it:

```bash
git clone [email protected]:wasabipesto/themis.git
cd themis
```

Install any other dependencies:
- The downloader and extractor are written in Rust. To install the Rust toolchain, follow the instructions here. You could run these utilities in Docker, but that is not officially supported.
- The website is written with Astro, which uses `node` and `npm`. You can find the official node/npm installation instructions here, run everything in Docker, or use whatever version Debian stable is shipping.
- Docker and the docker compose plugin are used to run the database and its connectors. It's possible to run these without Docker by installing Postgres and PostgREST manually.
- For running tasks I have provided a `justfile`, which requires `just` to run. You can install that by following the instructions here. The `justfile` is very simple, and you can just run the commands by hand if you don't want to install it.
- The script for site deployment uses `rclone` and thus can be deployed to any target supported by that utility. You can install rclone by following the instructions here, or deploy the site some other way.
- Some other optional utilities:
  - There are a few Python scripts I use for development in the `scripts` folder. If you want to use these, ensure you have `python` and `uv` installed.
  - When testing API responses I use `jq` for filtering and general formatting. You can get that here.
  - A couple of scripts for debugging are written with `rust-script`. Installation instructions are here.
  - Some admin tools lean on an `ollama` API endpoint for extracting keywords, generating slugs, and more. You can find installation instructions here. By default it expects that the service will be started and available on localhost (a quick reachability check is sketched after this list).
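None of these optional tools are strictly required, but if you want to sanity-check your setup before starting, a few quick commands can confirm they are installed and reachable. This is just a sketch; the ollama port below is that tool's default (11434), so adjust it if you run the service elsewhere.

```bash
# confirm the task runner and deployment tool are installed
just --version
rclone version

# confirm jq and uv are available for the optional scripts
jq --version
uv --version

# confirm a local ollama instance is reachable (11434 is ollama's default port)
curl -sf http://localhost:11434/api/tags
```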
In previous versions of this program, we deserialized all API responses immediately upon receiving them in order to work in a type-safe rust environment. This works great if APIs never change. Since external APIs can change unexpectedly, we have broken the download flow into two programs: a downloader and an extractor. The downloader will grab all relevant data from the platform APIs, then the extractor will deserialize that data into something we can use.
Before downloading, make sure you have enough disk space, memory, and time:
- By default the download program will download from all platforms in parallel to avoid getting bottlenecked by any one platform's API rate limit. In order to do this we first download each platform's bulk list as an index and load it into memory. If you are running in the default mode, expect to use around 6 GB of memory. If you run out of memory, you can run the platforms one at a time with the `--platform` option (see the example after this list).
- This program will download all relevant data from each platform's API to disk. We try to avoid reading or writing any more than necessary by buffering writes and appending data where possible. Still, a large amount of disk space will be required for this data. As of February 2025 it uses around 20 GB, but this will increase over time.
- When run for the first time, this utility takes a day or so to complete. It will first download each platform's index and make a download plan. Then it will queue up batches of downloads that run asynchronously. If you interrupt the program or it runs into an error, simply restart it. It will look for an existing index file and attempt to resume the downloads automatically.
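For example, if memory is a concern, you can work through the platforms sequentially rather than in parallel. The platform names here are only illustrative; run `just download --help` to see the values the tool actually accepts.

```bash
# download one platform at a time to keep memory usage low
# (platform names are examples; check `just download --help` for the real list)
just download --platform manifold
just download --platform metaculus
```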
To run the downloader:
```bash
just download --help # for options
just download        # run with default settings
```

The download utility is designed to be robust so you can set it and forget it. Errors are much more likely in later steps. If the downloader crashes and resuming a few minutes later does not solve the problem, please submit an issue. This could be caused by a major shift in a platform's API structure or rate limits.
Note: Do not run multiple instances of the download program to try and make it go faster! Site-specific rate limits are baked in to keep us under each platform's limits and avoid overloading their servers. The downloader queues items sequentially, so running multiple copies will only leave you with duplicate markets while also getting yourself IP-banned.
While the downloader is running, set up the database.
First, we'll create our environment file and update the connection passwords.
```bash
cp template.env .env
sed -i "s/^POSTGRES_PASSWORD=.*/POSTGRES_PASSWORD=$(openssl rand -base64 32 | tr -d '/+=' | cut -c1-32)/" .env
sed -i "s/^PGRST_JWT_SECRET=.*/PGRST_JWT_SECRET=$(openssl rand -base64 32 | tr -d '/+=' | cut -c1-32)/" .env
```

Once the `.env` file has been created, you can go in and edit any settings you'd like.
Next, we'll generate our JWT key to authenticate to PostgREST. You can do this with many services, but we'll generate it with this script:
sed -i "s/^PGRST_APIKEY=.*/PGRST_APIKEY=$(python3 scripts/generate-db-jwt.py)/" .envThat key will be valid for 30 days, to refresh it just run that line again.
To actually start our database and associated services:
```bash
just db-up
```

This command will start the database, the REST adapter, and the backup utility. These services need to be running during the extract process, the group building process, and the site building process. When the site is deployed it reaches out to the database for a few non-critical functions.
The database will run in Postgres, which will persist data in the `postgres_data` folder. You should never need to access or edit the contents of this folder. Another container handles backups, which will be placed in the `postgres_backups` folder daily.

If you ever change a setting in the `.env` file, you can re-run `just db-up` to reload the configuration and restart containers if necessary.
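To confirm the containers actually came back up after a configuration change, the standard Docker Compose status command works; run it from the directory containing the compose file.

```bash
# list the compose services and their current state
docker compose ps
```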
To manually run a backup or get the database schema:
```bash
just db-backup     # run a backup and save in the postgres_backups folder
just db-get-schema # extract the current schema and output to stdout
```

Import the schema, roles, and some basic data with the `db-load-schema` task:
```bash
just db-load-schema # run all provided SQL files
```

Reload PostgREST for it to see the new schema:
```bash
docker kill -s SIGUSR1 postgrest # trigger a reload
docker restart postgrest         # or restart the whole container
```

Then you can test that everything is working with curl:
```bash
just db-curl platforms
```

You should see data for each platform, formatted for readability with `jq`.
We don't need to do this yet, but to completely stop the database and services:
```bash
just db-down
```

Once everything has been downloaded from the platform APIs, we can extract and import that data into the database.
This utility will read the data files you just downloaded and make sure every item matches our known API schemas. If anything changes on the API end, this is where you will see the errors. Please submit an issue if you encounter any fatal errors in this step.
Running this program in full on default settings will take about 5 minutes and probably produce a few dozen non-fatal errors. Every platform has a couple items that are "invalid" in some way, and we've taken those into account when setting up our error handling.
After a few thousand items are ready to upload, the program will send them to the database through the PostgREST endpoints. It should fail quickly if it's unable to connect to the database.
Ensure the database services are running and then run:
```bash
just extract --help        # for options
just extract --schema-only # check that schemas pass
just extract               # run with default settings
```

If you get an error like `Connection refused (os error 111)`, make sure you imported all schemas and reloaded the PostgREST configuration from the previous section.
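If the error persists, the PostgREST container's logs usually say whether it failed to connect to Postgres or could not load the schema. This is a plain Docker command using the `postgrest` container name from the reload step above.

```bash
# check the recent PostgREST logs for schema or connection errors
docker logs --tail 50 postgrest
```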
Then you can test that everything is working with curl:
just db-curl "markets?limit=10"
just db-curl "markets?select=count"
just db-curl "daily_probabilities?limit=10"
just db-curl "daily_probabilities?select=count"You should see a few sample markets and data points, with total counts for each.
The extract tool is designed to be safe to run multiple times. It will only overwrite items in the market table, and it will update items if they already exist. You can even run it while the download is in progress to extract what's available.
The heart of the site is "questions": small groups assembled from markets that are equivalent.

Ideally every platform would predict every important event with the same method and resolve with the same criteria, but they don't. Some platforms are legally unable to, some have differing market mechanisms, and some just don't like predicting certain things. Our goal is to find a few markets from different platforms that are similar enough to be comparable, link them together under a "question", and do this as many times as possible.
Right now this is done manually in order to ensure that linked markets are actually similar enough to be comparable. For my purposes, two markets are similar enough if the differences in their resolution criteria would affect their probabilities by 1% or less. For instance, two markets with a duration over 6 months with close dates differing by 1 day are usually still similar enough to compare equitably. This requires a fair amount of human judgment, though I am experimenting with ways to automate it.
In the meantime, I have made a secondary Astro site with the tools you will need to search and view markets, link markets into groups, edit the question groups, and edit most other items in the database. To run it in the basic mode, run:
```bash
just group
```

This will launch the site in Astro's dev mode, which will be enough for anything you need to do. The site can also be compiled statically and served in the same way as the main site, but I recommend against doing this since it will have your database admin credentials baked in.
If you want to use embeddings to find similar markets, you can generate them with the following script:
```bash
uv run scripts/update-embeddings.py
```

For now I am intentionally not documenting specific features of the admin tools since they are not user-facing and I am constantly changing them to suit my needs better. The method I have found that works best for me is:
- Sort all markets by volume, number of traders, or duration. Find one that seems interesting.
- Find markets from other platforms that have equivalent or nearly-equivalent resolution criteria.
- Sort those markets by volume, number of traders, or duration to find the one "authoritative" market per platform.
- Create a question group with a representative title and slug consistent with your conventions.
- Add all selected markets to the question by copying in their IDs.
- Check that the probabilities overlap and set start/end date overrides if necessary.
- Check that the resolutions match and invert questions if necessary.
- While you have those searches open, look for other possible question groups in the same topic.
- Once you have exhausted the markets in that topic, return to the top-level search and find another topic.
When you have finished grouping markets, you can calculate all market scores by running the grader tool:
```bash
# optional, fix criterion probabilities to be more intuitive for linked questions
uv run scripts/fix-criterion-probs.py

# calculate all scores and grades
just grade
```

This tool will run through basically everything in the database, calculate some scores that are a little too compute-intensive to do at build time, and refresh all the database views. This tool is non-destructive just like the others: you can run it over and over again and lose nothing but your time. Just make sure you re-run it every time you finish grouping markets, before generating the site.
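Once grading has finished, one quick sanity check is to count how many markets ended up linked to a question. This reuses the `market_details` view and `question_slug` filter that appear later in this README, so adjust if your schema differs.

```bash
# count markets that are now linked to a question
just db-curl "market_details?select=count&question_slug=not.is.null"
```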
You will also need to generate embeddings for related questions. You can generate those with the following script:
```bash
# just generate embeddings for questions
uv run scripts/update-embeddings.py --questions-only

# regenerate embeddings for all items
uv run scripts/update-embeddings.py --all
```

The site is static and designed to be deployed behind any standard web server such as nginx. It could also be deployed to GitHub Pages, an AWS S3 bucket, or any other static site host.
You can view a preview of the site or build it like so:
```bash
just site-dev   # live preview the site in a browser
just site-build # build the site to the site/dist directory
```

The first site page load (in preview mode) or build will take a while as items are downloaded from the server. Subsequent loads/builds will be much faster but will not reflect the database's current state. In order to clear the cache, run the task:

```bash
just site-cache-reset # invalidate site data cache
```

We use rclone to deploy the site to your provider of choice. First, configure your rclone target and add the details to the `.env` file:
```bash
rclone config # set up a new target
nano .env     # add your rclone target path
```

Then, you can deploy the site at any time with this command:

```bash
just deploy # build and deploy site to rclone target
```

If you're just developing on the site, you don't actually need to use the download and extract tools!
You can build the site against my public database that the main site builds from by doing either of these:
- Change the `PGRST_URL` variable in the `.env` environment file to `https://data.brier.fyi`.
- Run the site development server with `PGRST_URL="https://data.brier.fyi" just site-dev`.
First load of the dev site will be slow while it caches some of the Big Data™. Other than that the Astro project should be pretty straightforward.
Over time, new markets will be added and other markets will be updated. In order to update the database with the freshest data, you can re-run the download and extract programs to load the new data.
The download program has two different arguments for resetting:

- `--reset-index` will re-download the platform index and then follow any rules set for what to download. This is good for catching markets that have been added since the last download, but it will not refresh markets that already existed but were resolved since the last download. This is usually not what you want for updating a database.
- `--reset-cache` used by itself will re-download everything, updating the database with 100% fresh data. Unfortunately this will take several days unless used with one of the filters below.

Two filters can narrow down what gets re-downloaded:

- `--resolved-since` will filter the market download queue to just those resolved since the given date. Must be in the form of an ISO 8601 string.
- `--resolved-since-days-ago` will do the exact same as the previous option, but with a duration supplied instead of a date. This is usually the best option for a scripted refresh (a cron sketch follows the examples below).
All reset options make a backup of the previous data files in case you want to look at past data.
```bash
# run a full refresh and add to the database
just download --reset-cache && just extract

# only download markets resolved recently and add to the database
just download --reset-cache --resolved-since-days-ago 10 && just extract
```

After the data is downloaded, you can add groups and edit data in the database as before. Then, build the site again and see the results.
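If you want the refresh to happen on a schedule, a cron entry along these lines is one option. The path and timing are placeholders, and you may want to follow up with `just grade` and `just deploy` as described elsewhere in this README; treat this as a sketch rather than a recommended configuration.

```bash
# example crontab entry: refresh recently-resolved markets nightly at 02:00
# (replace /path/to/themis with wherever you cloned the repository)
0 2 * * * cd /path/to/themis && just download --reset-cache --resolved-since-days-ago 7 && just extract
```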
Eventually you may want to wipe the markets table in the database, either because you are changing the database schema or because you want to start fresh. In order to do this without losing data you will need to first export your questions and market-question links. I've provided a script to do this:
```bash
# back up your database, just in case
just db-backup

# export the questions and market links
uv run scripts/migrate.py --mode export

# either drop all tables
just db-run-sql schema/00-drop-all.sql
# or wipe the data folder
just db-down
sudo rm -r postgres_data
just db-up

# load the schema
just db-load-schema

# reload the schema cache
docker kill -s SIGUSR1 postgrest

# import the questions and market links
uv run scripts/migrate.py --mode import

# calculate stats and refresh everything else
just grade

# check and build the site
just site-dev
just site-build

# check everything is in place
just db-curl "market_details?limit=10&question_slug=not.is.null"
```

Note that this is not necessary if you want to edit table views. To reload the database view schema, just run:
```bash
just db-run-sql schema/03-views.sql
```

The production database is publicly readable via PostgREST at https://data.brier.fyi. This will lead you to a full OpenAPI spec, which you could plug in to Swagger or your client generator of choice.
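Assuming the root path is exposed the way PostgREST normally serves it, you can pull the spec down and inspect the available tables and views directly; the `jq` filter here is just one way to list them.

```bash
# fetch the OpenAPI description and list the exposed endpoints
curl -sf https://data.brier.fyi/ | jq '.paths | keys'
```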
For example, to get items from various tables:
```bash
curl -sf https://data.brier.fyi/question_details?limit=100
curl -sf https://data.brier.fyi/market_details?limit=100
curl -sf https://data.brier.fyi/daily_probability_details?limit=100
```

You can find PostgREST documentation here:
- https://docs.postgrest.org/en/stable/references/api/tables_views.html
- https://docs.postgrest.org/en/stable/references/api/pagination_count.html
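As a small taste of what those docs cover, here are two common PostgREST patterns against the public endpoint: requesting an exact row count via the `Prefer` header (reported back in the `Content-Range` response header), and limit/offset pagination. The views are the ones shown above; any other column names you add for ordering should come from the OpenAPI spec.

```bash
# ask for an exact row count; it comes back in the Content-Range response header
curl -sfi -H "Prefer: count=exact" "https://data.brier.fyi/question_details?limit=10"

# page through results with limit and offset
curl -sf "https://data.brier.fyi/market_details?limit=100&offset=100"
```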
This project has been awarded the following grants:
- $3,500 as part of the Manifold Community Fund, an impact certificate-based funding round.
- $8,864 as part of the EA Community Choice, a donation matching pool.
These grants have been used for furthering development but have not influenced the contents of this site towards or away from any viewpoint.
This project has been featured in the following instances:
- Leveraging Log Probabilities in Language Models to Forecast Future Events
- Tangle News: Lessons from the election you could bet on
- Forecasting Newsletter: June 2024
- Calibrations Blog: Should we let ourselves see the future?
- Lightcone News: Accuracy and Trust
- Valis Research: Unanswered Questions Surrounding Prediction Markets
- Human Invariant: The Future of Play Money Prediction Markets
I use prediction markets, mainly Manifold and Metaculus, as a personal exercise in calibration. This project grew out of an effort to see how useful they can be as information-gathering tools.
As with any statistics, this data can be used to tell many stories. I do my best to present this data in a way that is fair, accurate, and with sufficient context.
The code for this project as presented in this repository is licensed under the MIT License, attached.

The contents of the live published website and database, including the explanatory descriptions, market/question links, categorizations, graphics, and visualizations, are licensed under the CC BY-NC-SA 4.0 Deed, attached.