This repository contains software supporting research into query routing in distributed search systems.
The code base is written in Rust. It uses some recent language features, so you will need Rust 1.42 or later.
If you do not have the Rust toolchain installed, you can set it up with:
# Taken from: https://www.rust-lang.org/learn/get-started
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
If you already have rustup installed, you only need to update it to the latest stable version:
rustup update
The project follows the standard Cargo workflow. For those who are unfamiliar with it, here are just a few commands you will need to build and run the application.
The following command builds the project along with all its dependencies in release mode. Compilation takes longer, but the resulting binary runs faster thanks to the optimizations performed at compile time. You can skip the --release flag if you only want to play with toy examples or are developing the application.
cargo build --release
See Installation for an alternative way to build it.
If you have built the project with the command above, the binary is now located in target/release/simulation and ready to run:
target/release/simulation --help
Alternatively, you can run the binary with cargo:
cargo run --release --bin simulation -- --help
Notice the --, which separates Cargo's arguments from the arguments of the target application.
Instead of building and running the project locally, you can install it in Cargo's local repository. On Linux, this is usually in ~/.cargo. In order to make the installed binaries available, make sure to add ~/.cargo/bin to your PATH environment variable or its equivalent in your operating system. Consult Cargo's documentation for more information.
To install this project with Cargo, run the following command.
cargo install --path .
simulation is a TUI (text user interface) application. It looks more or less like this:
You can run simulation --help to print help information about command line arguments. The two most important arguments are:
- Simulation configuration, passed with the --config option.
- Query log, either passed as a positional argument or read from the standard input.
To explore the application, you can run:
simulation --config tests/config.yml tests/queries.jl
The configuration is a file in YAML format. Here is an example from the tests directory:
brokers: 1 # use 1 broker
cpus_per_node: 1 # no. CPUs per shard node
query_distribution:
  mean: 200 # mean interval between queries appearing in the system
  std: 20 # standard deviation
time_unit: micro # interpret any duration passed to the simulation as microseconds
seed: 182374190 # seed for pseudo-random number generator for reproducible results
assignment: # shard routing probabilities (0.0 means not assigned)
  - [0.50, 0.50, 0.00, 0.00] # node 1
  - [0.00, 0.00, 0.50, 0.50] # node 2
  - [0.00, 0.50, 0.50, 0.00] # node 3
  - [0.50, 0.00, 0.00, 0.50] # node 4
  - [0.25, 0.25, 0.25, 0.25] # node 5
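To make the assignment matrix easier to read, the following minimal Rust sketch lists, for each shard, the nodes that have a non-zero entry for it. It assumes that rows correspond to nodes and columns to shards, as the comments in the example suggest. The names nodes_per_shard and example_assignment are hypothetical and not part of this repository.

```rust
/// For each shard (column), collect the 1-based indices of the nodes (rows)
/// whose entry is non-zero, i.e. the nodes the shard is assigned to.
/// Assumption: rows are nodes and columns are shards.
fn nodes_per_shard(assignment: &[Vec<f64>]) -> Vec<Vec<usize>> {
    // The number of shards is taken from the first row.
    let num_shards = assignment.first().map_or(0, |row| row.len());
    let mut result: Vec<Vec<usize>> = vec![Vec::new(); num_shards];
    for (node, row) in assignment.iter().enumerate() {
        // `zip` stops at the shorter side, so a ragged row cannot cause a panic.
        for (shard_nodes, &weight) in result.iter_mut().zip(row.iter()) {
            if weight > 0.0 {
                shard_nodes.push(node + 1);
            }
        }
    }
    result
}

/// The assignment matrix copied from the example configuration.
fn example_assignment() -> Vec<Vec<f64>> {
    vec![
        vec![0.50, 0.50, 0.00, 0.00], // node 1
        vec![0.00, 0.00, 0.50, 0.50], // node 2
        vec![0.00, 0.50, 0.50, 0.00], // node 3
        vec![0.50, 0.00, 0.00, 0.50], // node 4
        vec![0.25, 0.25, 0.25, 0.25], // node 5
    ]
}
```

Under these assumptions, calling nodes_per_shard(&example_assignment()) shows that every shard in the example is held by three of the five nodes, and that node 5 holds all four shards.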
Queries come as a list of JSON objects. At minimum, each object must contain retrieval times for all shards:
{"retrieval_times":[147,137,160,147]}
The tests/queries.jl file can be used as an example.
Other query properties are not yet used by the simulation and will be documented later.
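If you want to create a small query log of your own, a line in the format shown above can be produced with a few lines of Rust. This is a minimal sketch using only the standard library. The helper names query_line and query_log are hypothetical, integer retrieval times are assumed from the example, and the one-object-per-line layout is inferred from the .jl extension rather than stated by the project.

```rust
/// Format one query as a JSON object with a `retrieval_times` array,
/// holding one value per shard.
/// Assumption: retrieval times are non-negative integers, as in the example.
fn query_line(retrieval_times: &[u64]) -> String {
    let times: Vec<String> = retrieval_times.iter().map(|t| t.to_string()).collect();
    format!("{{\"retrieval_times\":[{}]}}", times.join(","))
}

/// Join several queries into a log.
/// Assumption: the log holds one JSON object per line.
fn query_log(queries: &[Vec<u64>]) -> String {
    queries
        .iter()
        .map(|q| query_line(q))
        .collect::<Vec<_>>()
        .join("\n")
}
```

The number of values in each query should match the number of shards in the configuration, which is four in the example configuration.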
There is a set of default bindings to navigate the application, which you can print with:
simulation --key-bindings
In short, you can use the arrow keys to move between panes, Enter to activate a pane or see query details, Esc to return to the previous view, and the usual keys to navigate up and down any list. You can also maximize the active pane with the F key (Shift+f).
Although the TUI is very convenient for gaining insight into the inner workings of the simulation, eventually we want to run a longer simulation and report its statistics. This is what the --no-ui mode is for. When using --no-ui mode, you will also need to define the simulation time with --time.