Requirements:
- Local: Docker
- Local: python>=3.10
- Remote: Singularity

On your local host run:

pip install docksing

DockSing is a pure-python lightweight CLI tool to orchestrate the deployment of jobs to docker and slurm endpoints, based on the compose specification and loosely inspired by Google Vertex AI.
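To verify the installation went through, the standard help flag should print the available options (an assumption that DockSing follows the usual convention of Python CLIs, which auto-generate --help):

$ docksing --help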
Deploying a job on a local docker:

docksing --ssh username@hostname --config config.yaml --local

Deploying a job on a remote Slurm HPC:

docksing --ssh username@hostname --config config.yaml

DockSing exists to reduce the overhead required to scale from development to testing to deployment of experiments. Specifically, DockSing takes care of automatically converting docker-compose specifications into singularity specifications overloaded with SBATCH commands, sparing us the nuisance of mapping and combining the three.
DockSing aims to simplify the experimentation workflow for those using docker and more specifically devcontainers.
Just like docker-compose, DockSing requires a config.yaml to initiate a job.
This config.yaml, however, differs slightly from a typical docker-compose file in that it is split into three chapters:

- remotedir: Path to the target directory that will be created on the remote host. All files required to run the job, comprising .sif images, bind maps and eventual job outputs, will be stored here.
- slurm: Chapter of key:value maps encoding srun options (reference).
- container: Chapter containing all entries one would use in a normal docker-compose file. Note that DockSing only supports a limited subset of docker-compose functionality; please refer to the supported compose specification section below.
Example of a config.yaml:
remotedir: path/to/remote/directory
slurm:
  nodes: 1
  cpus-per-task: 1
  job-name: job-name
container:
  image: tag
  commands: ["sh -c", "'echo Hello World'"]
  environment:
  - env_variable: env_content
  - another_env_variable: another_env_content
  volumes:
  - /absolute/path/to/bind:/container/path/to/bind
  - /another/absolute/path/to/bind:/another/container/path/to/bind

To launch the job then run:
docksing --ssh username@hostname --config path/to/config.yaml

Essentially, the above command automates the following actions, in order:
1. Attempts to establish a connection through SSH to the remote host
2. Attempts to establish a connection to the local docker daemon
3. Verifies that the image tag is available in the local docker daemon
4. Creates the remotedir in the remote host
5. Copies the image tag pulled from the local docker daemon to the remotedir
6. Copies the content of all source binds in volumes from the local host to the remote host
7. Converts the image tag into a .sif build, compatible with singularity
8. Starts the srun job by passing all options found in the slurm chapter, while also passing all options found in container to the nested singularity run
As a side note, steps 7 and 8 are executed within the same srun instance to minimize queueing on the remote.
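For intuition, a manual equivalent of steps 4 through 8 would look roughly like the sketch below, using the example config above. Paths, the image tag and file names are placeholders, and DockSing performs these steps programmatically rather than through a shell, so treat this as an illustration, not its actual implementation:

# Sketch of the workflow DockSing automates (placeholder paths and names)
ssh username@hostname "mkdir -p path/to/remote/directory"            # step 4: create the remotedir
docker save -o image.tar tag                                         # step 5: export the image from the local daemon
scp image.tar username@hostname:path/to/remote/directory/            # step 5: copy it to the remotedir
scp -r /absolute/path/to/bind username@hostname:path/to/remote/directory/bind   # step 6: copy source binds
ssh username@hostname "srun --nodes=1 --cpus-per-task=1 --job-name=job-name bash -c \
  \"singularity build image.sif docker-archive://path/to/remote/directory/image.tar && \
    singularity run --bind path/to/remote/directory/bind:/container/path/to/bind image.sif sh -c 'echo Hello World'\""   # steps 7-8 in one srun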
In this use case we wish to print the content of some environment variables to a .txt file.
This can be achieved with the following config.yaml:
remotedir: target_directory_on_remote_host
slurm:
  nodes: 1
  cpus-per-task: 1
  job-name: name_of_the_slurm_job
container:
  image: alpine:latest
  commands: ["sh -c", "'echo the $VARIABLE is $VALUE > /output/result.txt'"]
  environment:
  - VARIABLE: color
  - GOOGLE_APPLICATION_CREDENTIALS: credentials
  - VALUE: red
  volumes:
  - /absolute/path/to/output:/output
First and foremost, we pull the image (or build it from a Dockerfile) required to run the job:

$ docker pull alpine:latest

DockSing will raise an error if it cannot find the image in the local docker daemon.
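To double check that the image is visible to the local daemon, a plain docker listing suffices (output abridged; the image ID shown here is borrowed from the remote cli example below and is purely illustrative):

$ docker image ls alpine:latest
REPOSITORY   TAG      IMAGE ID       CREATED      SIZE
alpine       latest   91ef0af61f39   ...          ...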
Afterwards, we may wish to check that our setup is correct by inspecting the explicit cli, through:
$ docksing --ssh username@hostname --config config.yaml --cli --local
docker run --env VARIABLE=color --env GOOGLE_APPLICATION_CREDENTIALS=credentials --env VALUE=red --volume /absolute/path/to/output:/output alpine:latest sh -c 'echo the $VARIABLE is $VALUE > /output/result.txt'

If it does look right, we may proceed with a local run to assess whether our logic is correct:
$ docksing --ssh username@hostname --config config.yaml --local

If it is, we likewise check whether our setup is correct in the remote case:
$ docksing --ssh username@hostname --config config.yaml --cli
srun --nodes=1 --cpus-per-task=1 --job-name=name_of_the_slurm_job bash -c "singularity build target_directory_on_remote_host/91ef0af61f39.sif docker-archive://target_directory_on_remote_host/91ef0af61f39.tar && singularity run --env VARIABLE=color --env GOOGLE_APPLICATION_CREDENTIALS=credentials --env VALUE=red --bind target_directory_on_remote_host/output:/output target_directory_on_remote_host/91ef0af61f39.sif sh -c 'echo the $VARIABLE is $VALUE > /output/result.txt'"

Note how a simple docker run quickly explodes in complexity and verbosity once it has to be deployed remotely on singularity via SLURM, which makes writing it by hand error-prone.
If the command looks right, we may actually submit the job to the HPC via:

$ docksing --ssh username@hostname --config config.yaml

which launches the job.
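Independently of DockSing, standard SLURM tooling on the remote host can confirm the job is queued or running; for instance, filtering the queue by the job-name set in the config:

$ ssh username@hostname squeue --name=name_of_the_slurm_job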
Often, however, one may wish to monitor the logs to assess how the job is going.
To do so, one can simply run:

$ docksing --ssh username@hostname --config config.yaml --stream

which streams the remote stdout and stderr to the current console.
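Once the job completes, the result of this example lands in the output bind under the remotedir; assuming the bind location shown in the remote cli above, a quick sanity check could be:

$ ssh username@hostname cat target_directory_on_remote_host/output/result.txt
the color is red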
- Launching a local job on docker:

docksing --ssh username@hostname --config config.yaml --local

- Launching a remote job:

docksing --ssh username@hostname --config config.yaml

- Inspecting the local cli:

docksing --ssh username@hostname --config config.yaml --local --cli

- Inspecting the remote cli:

docksing --ssh username@hostname --config config.yaml --cli

- Streaming the remote job logs to a local console:

docksing --ssh username@hostname --config config.yaml --stream

DockSing supports the following compose specification keys:
- working_dir
- environment
- volumes
- commands
- entrypoint
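Putting the supported keys together, a container chapter using all of them might look like the sketch below. Values are placeholders, and the entrypoint/commands split mirroring docker-compose's entrypoint/command semantics is an assumption about DockSing's mapping:

container:
  image: alpine:latest
  entrypoint: sh              # assumed to override the image entrypoint, as in docker-compose
  working_dir: /workspace     # working directory inside the container
  commands: ["-c", "'env'"]   # assumed to be passed as arguments to the entrypoint
  environment:
  - VARIABLE: value
  volumes:
  - /absolute/path/on/host:/workspace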
DockSing is developed with the aim of adhering as closely as possible to existing standards with the lowest possible code overhead, so that the existing docker, singularity and SLURM documentation remains directly applicable.
To squeeze the most out of DockSing it is advisable to have good proficiency with the docker ecosystem.
DockSing was tested on the Windows Subsystem for Linux; mileage may vary in other settings.
Depending on the image size and the performance of the machine hosting the local docker daemon, larger images (>5GB) may trigger a timeout error:

requests.exceptions.ReadTimeout: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

This can be avoided by raising the default timeout from 60 seconds to a higher value, for example 600, using the timeout argument:

$ docksing --ssh username@hostname --config config.yaml --timeout 600

By default, Docker assigns /root as the working directory, while singularity uses the current working directory.
This may cause odd behavior where jobs that work when launched on docker fail on singularity.
The issue can be addressed by explicitly declaring a working directory in the config.yaml via the working_dir key, as sketched below.
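A minimal sketch of such a declaration, assuming the working_dir key from the supported compose keys above:

container:
  image: alpine:latest
  working_dir: /root          # pin the working directory so docker and singularity agree
  commands: ["sh -c", "'pwd'"]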