RoBin is a Robustness Benchmark for range indexes (especially for updatable learned indexes).
A robin is also an insect-eating bird that offers great benefits to agriculture.
- The fb dataset contains std::numeric_limits&lt;uint64_t&gt;::max(); some indexes may use this value as a sentinel to simplify their implementation. Therefore, we shift all fb keys down by one to create the fb-1 dataset.
- We modify LIPP's and SALI's hyperparameter MAX_DEPTH so that they can successfully run all the test cases (otherwise they crash on a runtime assertion).
- We modify the bulkload process of the STX B+tree so that its nodes are half filled (load factor = 0.5) after bulkloading, which aligns its subsequent insertions and splits to show its performance robustness.
- All other parameters of the indexes are the same as in their original implementations.
- All of our tested index implementations can be found in this repo; each branch corresponds to one index.
- We add profiling stats for art, btree, alex, and lipp (e.g., the distribution of node depth, the comparison count during leaf-node search, and the model of the root node) with only minor code changes.
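The fb-1 key shift described above can be sketched as follows. This is a minimal illustration (`shift_keys` is a hypothetical helper; the real transformation lives in the dataset scripts):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Shift every key down by one so that numeric_limits<uint64_t>::max() can
// never appear in the data; some indexes reserve that value as a sentinel.
std::vector<uint64_t> shift_keys(const std::vector<uint64_t>& keys) {
    std::vector<uint64_t> out;
    out.reserve(keys.size());
    for (uint64_t k : keys) {
        assert(k > 0 && "key 0 would underflow when shifted");
        out.push_back(k - 1);
    }
    return out;
}
```

After the shift, the sentinel value is free for internal use by the index, while the relative order of all keys is preserved.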
For a quick start, you can run the following script to install the dependencies, download the datasets, and build the project:
bash prepare.sh
RoBin depends on the TBB, jemalloc, and Boost libraries. You can install them with the following commands:
sudo apt update
sudo apt install -y libtbb-dev libjemalloc-dev libboost-dev
If the repository was not cloned with the --recursive option, you can run the following command to initialize the submodules:
git submodule update --init --recursive
Download the datasets from the remote server and construct the linear and fb-1 datasets:
cd datasets
bash download.sh
python3 gen_linear_fb-1.py
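For reference, a linear dataset is simply a sorted array of evenly spaced keys. The sketch below writes one in a SOSD-style binary layout (an 8-byte key count followed by little-endian uint64 keys); this layout is an assumption here, so consult the dataset scripts for the exact format:

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write keys in a SOSD-style layout (assumed): u64 count, then u64 keys.
void write_dataset(const std::string& path, const std::vector<uint64_t>& keys) {
    std::ofstream out(path, std::ios::binary);
    uint64_t n = keys.size();
    out.write(reinterpret_cast<const char*>(&n), sizeof(n));
    out.write(reinterpret_cast<const char*>(keys.data()),
              static_cast<std::streamsize>(n * sizeof(uint64_t)));
}

// A perfectly linear key set: key_i = i * step.
std::vector<uint64_t> make_linear(uint64_t n, uint64_t step) {
    std::vector<uint64_t> keys(n);
    for (uint64_t i = 0; i < n; ++i) keys[i] = i * step;
    return keys;
}
```

A learned index can fit such a key set with a single linear model, which is why the linear dataset serves as an easy baseline for robustness comparisons.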
rm -rf build
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
make -j
or just run the following script:
bash build.sh
Benchmark all the competitors via RoBin with the following command (it may take a while to finish):
bash reproduce.sh
The results will be stored in the results directory.
Use the Jupyter notebooks to plot the results:
cd results
# open and run the Jupyter notebooks, such as single_thread.ipynb, to reproduce the figures in our paper
Build with the PROFILING flag:
rm -rf build
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DPROFILING=ON
make -j
or just run the following script:
bash build.sh profiling
Note that our profiling-related code changes have no impact on index performance when the project is built without this flag for benchmarking.
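Conceptually, the profiling hooks are compiled in only when the flag is set, along these lines. This is a sketch under the assumption that -DPROFILING=ON defines a PROFILING macro; the actual counters in the index branches differ:

```cpp
#include <cstdint>

// Compile profiling counters in only when PROFILING is defined, so a
// regular Release build pays zero cost for the instrumentation.
#ifdef PROFILING
#define PROFILE_INC(counter) ((counter)++)
#else
#define PROFILE_INC(counter) ((void)0)
#endif

uint64_t g_leaf_compare_count = 0;  // e.g., comparisons during leaf search

// Hypothetical leaf-node binary search with an instrumented comparison.
int lower_bound_in_leaf(const uint64_t* keys, int n, uint64_t target) {
    int lo = 0, hi = n;
    while (lo < hi) {
        PROFILE_INC(g_leaf_compare_count);  // counted only under PROFILING
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] < target) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}
```

Because the macro expands to a no-op without the flag, the benchmark binaries contain no trace of the instrumentation.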
Run profiling script:
bash run_case_profiling.sh # (recommended) minimal case study to reproduce the figures in our paper [2~3 hours]
# bash run_all_profiling.sh # full profiling of all cases, which may take a large amount of running time and disk space
Use the Jupyter notebooks to plot the profiling results:
cd profiling_result
mkdir -p fig
## open and run the following jupyter notebooks to reproduce the figures in our paper
## analysis_depth.ipynb
## analysis_memory.ipynb
## analysis_overfit.ipynb
## analysis_smo.ipynb
We also provide a script to run RoBin with custom parameters. You can run the following command to see the help message:
python3 run.py --help
- We build this benchmark on top of GRE, a well-designed benchmark. The related paper is:
Wongkham, Chaichon, et al. "Are updatable learned indexes ready?." Proceedings of the VLDB Endowment 15.11 (2022): 3004-3017.