SQLStorm: Taking Database Benchmarking into the LLM Era

SQLStorm is an LLM-generated analytical benchmark for database systems. Check out the paper at https://db.in.tum.de/~schmidt/papers/sqlstorm.pdf.

Contact Tobias Schmidt ([email protected]) if you have any questions.

ChangeLog

v1.0 - 01.03.2025

  • Initial Release of SQLStorm
  • Queries generated by OpenAI GPT-4o-mini on the StackOverflow, TPC-H, and JOB datasets
  • The following machine was used for the query generation and selection:
    • OS: Ubuntu 24.10 x86_64
    • Host: PowerEdge R6615
    • Kernel: 6.11.0-13-generic
    • CPU: AMD EPYC 9454P (96) @ 2.750GHz
    • Memory: 382GB
  • The following database systems were used to select the queries:
    • PostgreSQL 17.0
    • Umbra 25.01
    • DuckDB 1.2.0

Project Structure

sqlstorm
 ├─ v0.0                        # SQLStorm version 0.0 - Parameterized Queries
 |   └─ <dataset>                 # Available datasets: tpch, tpcds, and job
 |       └─ queries                 # Parameterized queries, generated by the official tools
 ├─ v1.0                        # SQLStorm version 1.0 - Generated by GPT-4o-mini
 |   └─ <dataset>                 # Available datasets: stackoverflow, tpch, tpcds, and job
 |       ├─ queries                 # The final query set after rewriting, compatibility fixes, and selection of the parsable and executable queries
 |       └─ queries_generated       # The original set of queries generated by the LLM and the rewritten/compatible versions created by the scripts
 ├─ prompts                     # Prompts used to generate the SQLStorm queries
 |   ├─ p*.yaml                   # The individual prompts
 |   └─ <dataset>.yaml            # The SQL schema as text to append to the prompt
 ├─ scripts                     # Helper scripts
 └─ sql                         # SQL scripts for creating and loading the database
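
To get started, clone the repository and change into it (the URL below assumes the public GitHub location of the project):

git clone https://github.com/SQL-Storm/SQLStorm.git
cd SQLStorm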

Running SQLStorm with OLAPBench (Recommended)

OLAPBench is an automated benchmarking framework for analytical database systems. It takes care of downloading the datasets, loading the database, and running the queries. It supports multiple database systems and can be easily extended to support new systems.

We provide benchmark definition files for running SQLStorm with OLAPBench. All our experiments were conducted with OLAPBench, and the query-selection scripts operate on its output files.

To set up OLAPBench, just run:

./scripts/olapbench.py

You can run SQLStorm v1.0 on StackOverflow with the following command:

./benchmark/benchmark.sh definitions/v1.0/stackoverflow.benchmark.yaml

This runs the SQLStorm queries on the StackOverflow dataset for all sizes (0, 1GB, 12GB, 222GB) on PostgreSQL, Umbra, DuckDB, Hyper, ClickHouse, SQLServer, Apollo (SQLServer + column store), and SingleStore. Note that each system runs every query only once with a per-query timeout of 10s. If the accumulated execution time exceeds 24h, the remaining queries on that database are skipped and reported as global-timeout queries.

OLAPBench will create the following directories in the current working directory:

sqlstorm
 ├─ db          # The database's internal files
 ├─ data        # The raw dataset as CSV/TSV files
 └─ results     # The benchmark results

The results will be written to results/<hostname>/24h/stackoverflow.csv. Depending on the system configuration, it is recommended to symlink db and data to an SSD RAID to increase the available space and I/O bandwidth.
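
A minimal sketch of such a setup, assuming the RAID is mounted at /raid (the path is only an example):

mkdir -p /raid/sqlstorm/db /raid/sqlstorm/data   # target directories on the fast volume
ln -s /raid/sqlstorm/db db                       # create the symlinks in the working directory before starting the benchmark
ln -s /raid/sqlstorm/data data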

It is possible to run the benchmark on individual systems and datasets as follows:

./benchmark/benchmark.sh definitions/v1.0/postgres.benchmark.yaml stackoverflow --zero --query-dir sqlstorm_v1.0 # PostgreSQL on StackOverflow 0
./benchmark/benchmark.sh definitions/v1.0/duckdb.benchmark.yaml stackoverflow --dba --query-dir sqlstorm_v1.0 # DuckDB on StackOverflow 1GB
./benchmark/benchmark.sh definitions/v1.0/cedardb.benchmark.yaml stackoverflow --math --query-dir sqlstorm_v1.0 # CedarDB on StackOverflow 12GB
./benchmark/benchmark.sh definitions/v1.0/umbra.benchmark.yaml stackoverflow --query-dir sqlstorm_v1.0 # Umbra on StackOverflow 222GB

Running SQLStorm without OLAPBench

It is also possible to run SQLStorm without OLAPBench. You can download the data and the schema of the StackOverflow dataset from the following links:

  1. StackOverflow Schema
  2. StackOverflow DBA (1GB)
  3. StackOverflow Math (12GB)
  4. StackOverflow Full (222GB)

For the other datasets (TPC-H, TPC-DS, and JOB) please refer to the official tools to generate the data and the schema.
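
Without OLAPBench, the downloaded schema and data can be loaded into PostgreSQL roughly as follows (the file and table names are placeholders; adapt them to the downloaded archive and to the scripts in sql/):

createdb stackoverflow
psql stackoverflow -f schema.sql                                         # create the tables (placeholder file name)
psql stackoverflow -c "\copy users FROM 'users.csv' WITH (FORMAT csv)"   # load one table; repeat for every table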

Create a new benchmark

Creating a new SQLStorm benchmark requires the following steps:

1. Query Generation

Install the required dependencies:

./setup.sh  # Creates a virtual environment and installs the dependencies
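
Afterwards, activate the virtual environment before running the scripts (the path .venv/ is an assumption; check setup.sh for the actual location):

source .venv/bin/activate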

To generate the queries, run the following script for each prompt:

./scripts/prompt.py <prompt> <dataset> <version>

As of SQLStorm v1.0, we support seven prompts. They can be found in the prompts/ directory. Each file includes the LLM to use, the number of queries to generate, the preamble, and the actual instructions for the LLM to generate the queries. New prompts can be added by creating a new file. The <prompt> argument is the name of the prompt file without the .yaml extension (e.g., p1 for prompts/p1.yaml).
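
For example, a new prompt can be started from an existing one and then edited (p8 is a made-up name here):

cp prompts/p1.yaml prompts/p8.yaml   # copy an existing prompt as a template
$EDITOR prompts/p8.yaml              # adjust the model, query count, and instructions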

The <dataset> argument determines the schema that is given to the LLM for generating queries. Adding a new dataset requires a new schema text file called prompts/<dataset>.txt.

The <version> argument determines the output directory for the queries. The generated queries are stored under <version>/<dataset>/queries_generated. Note that if the queries already exist, the script will fail.

For example, to generate the v1.0 queries for the StackOverflow dataset with prompt P1, run:

./scripts/prompt.py p1 stackoverflow v1.0

All further script invocations use stackoverflow and v1.0 for <dataset> and <version>.

2. Rewrite Queries

Next, we need to rewrite the generated queries. This step includes eliminating duplicates, extracting the SELECT statements, removing comments, and replacing functions that access the current date with a fixed date. Run the following script:

./scripts/select.py stackoverflow v1.0

The rewritten queries will be stored in the same directory as the generated queries (<version>/<dataset>/queries_generated/), with the suffix _rewritten added to the filename.
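
Conceptually, the current_date rewrite performed by this step corresponds to a substitution like the following (the concrete date and the file name query.sql are placeholders, not necessarily what select.py uses):

sed -i "s/current_date/DATE '2024-06-01'/gI" query.sql   # pin the current date to a fixed value so results stay reproducible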

The intermediate version of the new benchmark will be copied to <version>/<dataset>/queries/. Next, we need to make it accessible to OLAPBench. To do so, run:

./scripts/olapbench.sh

The script will download OLAPBench and register all benchmarks with OLAPBench.

3. Increasing Compatibility

LLMs like GPT-4o-mini are trained on a large corpus of publicly available data, including GitHub and StackOverflow. PostgreSQL enjoys broad adoption in open-source projects and is therefore well represented in the training data, so the generated queries are likely to use PostgreSQL-specific syntax. To increase compatibility with other databases, we need to rewrite the queries. We run the queries on an empty database in PostgreSQL, Umbra, and DuckDB to determine whether they are parsable:

./benchmark/benchmark.sh definitions/v1.0/build0.benchmark.yaml stackoverflow --zero --query-dir sqlstorm_v1.0

Queries that parse in all three systems are considered compatible and are kept in the benchmark. Queries that are not parsable in all three systems are sent to the LLM for rewriting:

./scripts/select.py stackoverflow v1.0 -c results/v1.0/queryset0/stackoverflow_zero_sqlstorm_v1.0.csv

The rewritten queries will be stored in <version>/<dataset>/queries_generated/, with the suffix _compatible added to the filename.

4. Query Selection

The final step is to select the queries for the benchmark. We use the following criteria: (a) the query must be parsable in at least two systems, and (b) the query must be executable by at least one system on the smallest non-empty database (1GB for StackOverflow, TPC-H, and JOB). We first remove the non-parsable queries and then run the benchmark on SQLStorm-1:

./benchmark/benchmark.sh definitions/v1.0/build1.benchmark.yaml stackoverflow --zero --query-dir sqlstorm_v1.0
./scripts/select.py stackoverflow v1.0 -c results/v1.0/build0/stackoverflow_zero_sqlstorm_v1.0.csv -p results/v1.0/build1/stackoverflow_zero_sqlstorm_v1.0.csv

./benchmark/benchmark.sh definitions/v1.0/build1.benchmark.yaml stackoverflow --dba --query-dir sqlstorm_v1.0
./scripts/select.py stackoverflow v1.0 -c results/v1.0/build0/stackoverflow_zero_sqlstorm_v1.0.csv -p results/v1.0/build1/stackoverflow_zero_sqlstorm_v1.0.csv -e results/v1.0/build2/stackoverflow_dba_sqlstorm_v1.0.csv

After running the last command, the final benchmark will be available under <version>/<dataset>/queries.

Analyse the results

To recreate the results reported in the paper, we provide multiple Python notebooks for analysing the output of the SQLStorm runs. These notebooks create the tables and figures from the SQLStorm paper.

Query Complexity & Diversity: analyse/features.ipynb

Computes the number of distinct queries and classifies the operators and expressions used in the queries according to their complexity.

First, run the features benchmark to extract query features with Umbra (e.g., the number of joins and aggregations, the query tree shape, etc.):

# Parameterized queries
./benchmark/benchmark.sh definitions/v0.0/features.benchmark.yaml tpch --query-dir sqlstorm_v0.0
./benchmark/benchmark.sh definitions/v0.0/features.benchmark.yaml tpcds --query-dir sqlstorm_v0.0
./benchmark/benchmark.sh definitions/v0.0/features.benchmark.yaml job

# SQLStorm queries
./benchmark/benchmark.sh definitions/v1.0/features.benchmark.yaml stackoverflow --dba --query-dir sqlstorm_v1.0
./benchmark/benchmark.sh definitions/v1.0/features.benchmark.yaml tpch --query-dir sqlstorm_v1.0
./benchmark/benchmark.sh definitions/v1.0/features.benchmark.yaml tpcds --query-dir sqlstorm_v1.0
./benchmark/benchmark.sh definitions/v1.0/features.benchmark.yaml job --query-dir sqlstorm_v1.0

Next, execute the notebook analyse/features.ipynb to create the tables and figures (see the sketch after the directory listing below). The extracted features for all datasets are stored in analyse/features/.

analyse/features
 ├─ v0.0        # SQLStorm version 0.0 - Parameterized Queries from the official benchmarks (TPC-H, TPC-DS, and JOB)
 |   └─ ...
 └─ v1.0        # SQLStorm version 1.0 - Generated by GPT-4o-mini
   ├─ stackoverflow_dba_sqlstorm.log              # Output log of the feature extraction
   ├─ stackoverflow_dba_sqlstorm.csv              # Extracted features for each query
   ├─ stackoverflow_dba_sqlstorm_operators.csv    # Extracted operators for each query
   └─ stackoverflow_dba_sqlstorm_expressions.csv  # Extracted expressions for each query
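
The notebook can be run from the command line, assuming Jupyter is available in the Python environment set up earlier (an assumption; any notebook frontend works):

jupyter notebook analyse/features.ipynb   # or open it in JupyterLab / VS Code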

Citation

@article{sqlstorm25,
  author  = {Tobias Schmidt and Viktor Leis and Peter Boncz and Thomas Neumann},
  title   = {SQLStorm: Taking Database Benchmarking into the LLM Era},
  journal = {Proc. {VLDB} Endow.},
  volume  = {18},
  number  = {11},
  pages   = {4144--4157},
  year    = {2025}
}
