A tool for managing numerical simulations across multiple compute hosts.
But nothing stops you from using it for any directory, on any of your computers, that you want to find by id or name.
In that case, mentally replace simulation with stuff in this file.
With only the name or id of a simulation, get any or all of the following three without manual work:
- find the simulation on any host:
smurf search {id or name}
- instantly get a shell at the directory of your simulation on any host:
scd {id or name}
- mount the data onto your local machine (python3):
m = smurf.mount.Mount("{id or name}")
You
- have python3 and bash/zsh?
- use ssh keys and ssh-agent (or similar)?
- have rsync and python3 setuptools installed?
- are willing to put a meta dir in the directories you want to find?
Then, yes!
You put a directory called meta inside the dir you care about (smurf init).
This meta dir contains meta data including a name.txt file and a unique id (meta/uuid/{the uuid}).
Then standard unix tools and caching are used to find and store the location of the dir you care about (you only need to use smurf search ...).
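The layout described above can be pictured with a short python3 sketch. It is not smurf's actual code: the function name init_meta is hypothetical, and the id being stored as an empty file is an assumption. Only the meta/name.txt and meta/uuid/{the uuid} layout and the default name (the directory's name) are taken from this file.

```python
import os
import tempfile
import uuid


def init_meta(directory):
    """Create meta/name.txt and meta/uuid/{the uuid} inside directory.

    Hypothetical stand-in for what `smurf init` is described to do.
    """
    meta = os.path.join(directory, "meta")
    os.makedirs(os.path.join(meta, "uuid"), exist_ok=True)
    # the name defaults to the directory's name
    name = os.path.basename(os.path.abspath(directory))
    with open(os.path.join(meta, "name.txt"), "w") as f:
        f.write(name + "\n")
    # assumption: the id is an empty file whose file name is the uuid
    sim_id = str(uuid.uuid4())
    open(os.path.join(meta, "uuid", sim_id), "w").close()
    return name, sim_id


# try it on a throwaway directory
simdir = os.path.join(tempfile.mkdtemp(), "run_a")
os.makedirs(simdir)
name, sim_id = init_meta(simdir)
print(name, sim_id)
```

Because the id is part of a file path, plain path-based tools are enough to find the directory again.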
Clone this repo, navigate to it and run ./scripts/deploy.sh
If you get an import error for setuptools.find_namespace_packages, try upgrading setuptools (python3 -m pip install -U setuptools).
Follow these steps:
- smurf init: create the meta dir and the id. The name will be set to the directory's name.
- smurf cache --notify: tells smurf about the simulation
- smurf init: run this in all simulation directories.
- add the root directory containing all the simulations with
smurf config add rootdir /path/to/simulations
- search through the rootdir and generate a cache with
smurf cache -g
- check the entries in the cache and scrub it with
smurf cache -s
- maybe regenerate the cache with
smurf cache -g
- install smurf onto the remote host by running
./scripts/deploy.sh remotehost
inside the repo dir.
- ssh to the remote host
- configure smurf on the remote host just as on your local machine
Smurf uses ssh to automatically connect to your remote hosts, which saves you from manually searching for simulations on them and navigating to them.
It also uses sshfs to mount data automatically.
To do its job, smurf assumes that you log in to remote hosts using ssh keys and that you have set up an ssh-agent, so that you don’t need to type your password every time you connect to a remote host.
If you are unsure, try ssh remotehost and see if it works (you can configure your remote hosts in ~/.ssh/config).
If it fails, search online for how to set up ssh keys and an ssh-agent.
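The manual check above can also be scripted. The following python3 sketch is not part of smurf; the function names are made up for illustration. It relies on the ssh options BatchMode and ConnectTimeout, so ssh fails instead of prompting for a password.

```python
import subprocess


def ssh_check_command(remotehost):
    """Build an ssh call that fails instead of asking for a password."""
    # BatchMode=yes disables password prompts, so the call only succeeds
    # if key-based login (e.g. through ssh-agent) works.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            remotehost, "true"]


def can_login_without_password(remotehost):
    """Return True if ssh to remotehost works non-interactively."""
    result = subprocess.run(ssh_check_command(remotehost),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


# usage (remotehost is a placeholder for one of your hosts):
#   can_login_without_password("remotehost")
```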
Smurf provides a python3 API and a command line interface.
There is a bash and zsh plugin which features tab completion and the scd (get a shell at any simulation directory on any host) command.
Run smurf config to show the current config values.
Smurf allows you to specify root directories in which you can place the directories you want to track. The whole directory tree under each root dir is searched when generating the cache.
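As a rough picture of generating and scrubbing a cache, here is a python3 sketch. It is an illustration under assumptions, not smurf's implementation: the function names and the cache layout (a json file mapping each id to a path) are hypothetical.

```python
import json
import os
import shutil
import tempfile


def generate_cache(rootdirs):
    """Search the whole tree under each root dir for meta/uuid/{the uuid}."""
    cache = {}
    for rootdir in rootdirs:
        for dirpath, _dirnames, _filenames in os.walk(rootdir):
            uuid_dir = os.path.join(dirpath, "meta", "uuid")
            if os.path.isdir(uuid_dir):
                for sim_id in os.listdir(uuid_dir):
                    cache[sim_id] = {"path": dirpath}
    return cache


def scrub_cache(cache):
    """Drop entries whose directory no longer exists."""
    return {sim_id: entry for sim_id, entry in cache.items()
            if os.path.isdir(entry["path"])}


# throwaway root dir with two tracked directories (ids are made up)
root = tempfile.mkdtemp()
for name, sim_id in [("sim_a", "id-a"), ("sim_b", "id-b")]:
    os.makedirs(os.path.join(root, name, "meta", "uuid"))
    open(os.path.join(root, name, "meta", "uuid", sim_id), "w").close()

cache = generate_cache([root])
cache_file = os.path.join(root, "cache.json")
with open(cache_file, "w") as f:
    json.dump(cache, f, indent=2)

# after a directory disappears, scrubbing removes its stale entry
shutil.rmtree(os.path.join(root, "sim_b"))
scrubbed = scrub_cache(cache)
```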
To add/remove the rootdir /scratch/simulations, run
smurf config {add,remove} rootdir /scratch/simulations
To add/remove a remote host to search on, run
smurf config {add,remove} host {remotehost}
Smurf uses ssh in the background, so you can use any address (user@host or just host) which you can use with ssh {remotehost}.
Please make sure that you have set up a key agent (e.g. ssh-agent) so that you can log in automatically.
Otherwise you have to type your password multiple times.
To install smurf on your computer or server, follow these steps:
- Make sure you have installed all the requirements listed below.
- Clone this repository.
- Navigate to the repository in your terminal and run
./scripts/deploy.sh
This installs the python packages (using python3 setup.py install --user) and creates a wrapper to call it from the command line. Try it by running the command
smurf
If you get an error saying that the command can’t be found, make sure that ~/.local/bin is in your PATH variable (echo $PATH | grep ~/.local/bin should produce some output).
You can add it by running
export PATH="$PATH:$HOME/.local/bin"
and adding the same line to your .bashrc/.zshrc file.
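The PATH check above can also be done from python3. This is a small sketch and not part of smurf; the helper name is_on_path is made up. Unlike the grep call, it only reports a hit when the directory is an exact entry of PATH.

```python
import os


def is_on_path(directory, path_value):
    """Return True if directory is one of the entries of a PATH string."""
    wanted = os.path.normpath(os.path.expanduser(directory))
    entries = [os.path.normpath(os.path.expanduser(p))
               for p in path_value.split(os.pathsep) if p]
    return wanted in entries


local_bin = os.path.join(os.path.expanduser("~"), ".local", "bin")
print(is_on_path(local_bin, os.environ.get("PATH", "")))
```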
To install the bash/zsh integration to use the tab completion, run
smurf enable_shell_plugin
and follow the instructions to activate it and add it to your .bashrc/.zshrc file.
If you have rsync installed on your machine, you can install smurf on a remote host to which you can ssh with ssh {remotehost} via
./scripts/deploy.sh {remotehost}
This saves you from cloning the repository on all your hosts and makes it easy to set up a whole smurf network.
- python3
- python3’s setuptools package
For each simulation I run, I store the complete source code, the config files, the binary and the output data in one single directory. I refer to this as the simulation directory (simdir). Alongside all the required code and data, I store meta data and scripts to run/build/queue the simulation in a directory called job inside the simdir. Usually, the simdir’s name indicates some of the simulation’s parameters.
Each simulation also gets its own uuid, so that it can be located. This uuid is stored inside {simdir}/job/uuid/{the uuid} as a file with the uuid as its filename. That way, it’s extremely easy to locate the simdir of a simulation given its uuid by using unix find or locate. For this, smurf find can be used.
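To illustrate why this layout makes lookup easy, here is a python3 sketch that does roughly what a unix find on the path pattern would do. It is not the code behind smurf find. The function name, the example uuid and the example simdir name are made up.

```python
import tempfile
from pathlib import Path


def find_simdir(rootdir, sim_id):
    """Return the simdirs under rootdir that carry job/uuid/{sim_id}."""
    # roughly: find rootdir -path "*/job/uuid/{sim_id}"
    hits = Path(rootdir).glob(f"**/job/uuid/{sim_id}")
    # the marker file sits three levels below the simdir
    return sorted(hit.parents[2] for hit in hits)


# throwaway tree with one simdir; name and uuid are examples only
root = Path(tempfile.mkdtemp())
simdir = root / "projectA" / "N128_alpha0.01"
sim_id = "123e4567-e89b-12d3-a456-426614174000"
(simdir / "job" / "uuid").mkdir(parents=True)
(simdir / "job" / "uuid" / sim_id).touch()

found = find_simdir(root, sim_id)
print(found)
```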
In addition to the concept of simulation directories, I use the concept of project directories. These are indicated by a .project file which contains the name of the project. This file also serves as a way to find the project root directory, by walking up the directory tree until the file is found (smurf project root).
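Walking up the tree to the project root can be sketched in a few lines of python3. This is an illustration of the idea rather than the code behind smurf project root; the function name and the example project name are hypothetical.

```python
import tempfile
from pathlib import Path


def project_root(start):
    """Walk up from start until a directory containing .project is found."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        marker = directory / ".project"
        if marker.is_file():
            # the file content is the name of the project
            return directory, marker.read_text().strip()
    return None


# throwaway project with a nested directory to start from
root = Path(tempfile.mkdtemp()).resolve()
(root / ".project").write_text("my_project\n")
deep = root / "sims" / "run_a" / "job"
deep.mkdir(parents=True)

result = project_root(deep)
print(result)
```

Starting from any directory inside the project gives the same root, which is what makes the .project file usable as an anchor.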
- store items in a database (e.g. mysql) instead of a json file
Define a structure for the meta information and add an identifier to the meta dir to make it detectable. Define which information is where and add versioning.
- smurf info: add set/add command for fields
- create .local/bin if not present and make sure it’s in PATH