Just another cron alternative with a Web UI, but with many more capabilities.
It runs DAGs (Directed Acyclic Graphs) defined in a simple YAML format.
- Install by placing just a single binary file
- Schedule executions of DAGs with Cron expressions
- Define dependencies between related jobs and represent them as a single DAG (unit of execution)
- Highlights
- Contents
- Getting started
- Motivation
- Why not an existing workflow scheduler like Airflow?
- How does it work?
- Install dagu
- Quick start
- Command Line User Interface
- Web User Interface
- YAML format
- Executor
- Admin Configuration
- Environment Variable
- Sending email notifications
- Base Configuration for all DAGs
- Scheduler
- REST API Interface
- FAQ
- How to contribute?
- Where is the history data stored?
- Where are the log files stored?
- How long will the history data be stored?
- How to use specific host and port for dagu server?
- How to specify the DAGs directory for dagu server and dagu scheduler?
- How can I retry a DAG from a specific task?
- How does it track running processes without DBMS?
- License
- Contributors
See Install dagu and Quick start.
In complex legacy systems, jobs often have implicit dependencies on each other. When there are hundreds of cron jobs in a server's crontab, it becomes impossible to keep track of the dependencies between them. If one job fails, it is hard to know which ones to rerun. You also have to SSH into the server to see the logs, and to rerun jobs you have to run the shell scripts manually, one by one. This is a huge hassle and makes operations error-prone. We need a tool that can explicitly visualize and manage pipeline dependencies as a DAG. How nice it would be if we could visually check the dependencies, execution status, and logs of each job in a Web UI, and rerun or stop a series of jobs with just a mouse click!
There are existing tools such as Airflow, Prefect, and Temporal, but many of them require you to write code in a programming language such as Python to define the DAG. In systems that have been in operation for a long time, there are already complex jobs with hundreds of thousands of lines of code written in other languages such as Perl or shell scripts. Adding another layer of Python on top of this code would further reduce maintainability. So we developed dagu, which requires no coding, is easy to use and self-contained, and is ideal for small projects.
dagu is a single command and it uses the local file system to store data. Therefore, no DBMS or cloud service is required. dagu executes DAGs defined in declarative YAML format. Existing programs can be used without any modification.
You can quickly install the dagu command and try it out.
brew install yohamta/tap/dagu

Upgrade to the latest version:

brew upgrade yohamta/tap/dagu

Or install with the downloader script:

curl -L https://raw.githubusercontent.com/yohamta/dagu/main/scripts/downloader.sh | bash

Or download the latest binary from the Releases page and place it in your $PATH (e.g. /usr/local/bin).
Start the server with dagu server and browse to http://127.0.0.1:8080 to explore the Web UI.
Create a DAG by clicking the New DAG button on the top page of the Web UI. Enter example in the dialog.
Note: DAG (YAML) files will be placed in ~/.dagu/dags by default. See Admin Configuration for more details.
Go to the SPEC Tab and hit the Edit button. Copy & Paste this example YAML and click the Save button.
You can execute the example by pressing the Start button.
Note: Leave the parameter field in the dialog blank and press OK.
- dagu start [--params=<params>] <file> - Runs the DAG
- dagu status <file> - Displays the current status of the DAG
- dagu retry --req=<request-id> <file> - Re-runs the specified DAG run
- dagu stop <file> - Stops the DAG execution by sending TERM signals
- dagu restart <file> - Restarts the currently running DAG
- dagu dry [--params=<params>] <file> - Dry-runs the DAG
- dagu server [--host=<host>] [--port=<port>] [--dags=<path/to/the DAGs directory>] - Starts the web server for the Web UI
- dagu scheduler [--dags=<path/to/the DAGs directory>] - Starts the scheduler process
- dagu version - Shows the current binary version
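For example, a typical lifecycle looks like this sketch (my_dag.yaml is a hypothetical DAG file in the DAGs directory):

dagu start my_dag.yaml   # run the DAG
dagu status my_dag.yaml  # check the current status
dagu stop my_dag.yaml    # send TERM signals to any running steps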
The --config=<config> option is available to all commands. It allows you to specify a different dagu configuration for each command, which enables you to manage multiple dagu processes on a single instance. See Admin Configuration for more details.
For example:
dagu server --config=~/.dagu/dev.yaml
dagu scheduler --config=~/.dagu/dev.yaml

- DAGs: It shows all DAGs and their real-time status.
- DAG Details: It shows the real-time status, logs, and DAG configuration. You can edit the DAG configuration in a browser, and switch to a vertical graph with the button in the top-right corner.
- Search DAGs: It greps the given text across all DAGs.
- Execution History: It shows past execution results and logs.
- DAG Execution Log: It shows the detailed log and standard output of each execution and step.
The minimal DAG definition is as simple as follows:
steps:
  - name: step 1
    command: echo hello
  - name: step 2
    command: echo world
    depends:
      - step 1

The script field provides a way to run arbitrary snippets of code in any language.
steps:
  - name: step 1
    command: "bash"
    script: |
      cd /tmp
      echo "hello world" > hello
      cat hello
    output: RESULT
  - name: step 2
    command: echo ${RESULT} # hello world
    depends:
      - step 1

You can define environment variables with the env field and refer to them in later fields.
env:
  - SOME_DIR: ${HOME}/batch
  - SOME_FILE: ${SOME_DIR}/some_file
steps:
  - name: some task in some dir
    dir: ${SOME_DIR}
    command: python main.py ${SOME_FILE}

You can define parameters using the params field and refer to each parameter as $1, $2, etc. Parameters can also be command substitutions or environment variables, and they can be overridden with the --params= option of the start command, as shown after the examples below.
params: param1 param2
steps:
  - name: some task with parameters
    command: python main.py $1 $2

Named parameters are also available as follows:
params: ONE=1 TWO=`echo 2`
steps:
  - name: some task with parameters
    command: python main.py $ONE $TWO
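For example, a sketch of overriding the default parameters above at start time (my_dag.yaml is a hypothetical file name):

dagu start --params="ONE=abc TWO=xyz" my_dag.yaml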
You can use command substitution in field values: a string enclosed in backquotes (`) is evaluated as a command and replaced with its standard output.

env:
  TODAY: "`date '+%Y%m%d'`"
steps:
  - name: hello
    command: "echo hello, today is ${TODAY}"

Sometimes you have parts of a DAG that you only want to run under certain conditions. You can use the preconditions field to add conditional branches to your DAG.
For example, the task below only runs on the first day of each month.

steps:
  - name: A monthly task
    command: monthly.sh
    preconditions:
      - condition: "`date '+%d'`"
        expected: "01"

If you want the DAG to continue to the next step regardless of the step's conditional check result, you can use the continueOn field:
steps:
  - name: A monthly task
    command: monthly.sh
    preconditions:
      - condition: "`date '+%d'`"
        expected: "01"
    continueOn:
      skipped: true

The output field can be used to capture a step's standard output in an environment variable. Leading and trailing whitespace is trimmed automatically. The variable can then be used in subsequent steps.
steps:
  - name: step 1
    command: "echo foo"
    output: FOO # will contain "foo"

The stdout field can be used to write standard output to a file.
steps:
  - name: create a file
    command: "echo hello"
    stdout: "/tmp/hello" # the content will be "hello\n"

The stderr field allows you to redirect stderr to a separate file without writing it to the normal log file.
steps:
  - name: output error file
    command: "echo error message >&2"
    stderr: "/tmp/error.txt"

It is often desirable to take action when a specific event happens, for example, when a DAG fails. To achieve this, you can use the handlerOn fields.
handlerOn:
  failure:
    command: notify_error.sh
  exit:
    command: cleanup.sh
steps:
  - name: A task
    command: main.sh

If you want a task to repeat execution at regular intervals, you can use the repeatPolicy field. If you want to stop the repeating task, you can use the stop command to gracefully stop it.
steps:
  - name: A task
    command: main.sh
    repeatPolicy:
      repeat: true
      intervalSec: 60

Combining these settings gives you granular control over how the DAG runs.
name: all configuration      # Name (optional, default is the file name)
description: run a DAG       # Description
schedule: "0 * * * *"        # Execution schedule (cron expression)
group: DailyJobs             # Group name to organize DAGs (optional)
tags: example                # Free tags (comma-separated)
env:                         # Environment variables
  - LOG_DIR: ${HOME}/logs
  - PATH: /usr/local/bin:${PATH}
logDir: ${LOG_DIR}           # Log directory for standard output, default: ${DAG_HOME}/logs/dags
restartWaitSec: 60           # Wait 60s after the process is stopped, then restart the DAG.
histRetentionDays: 3         # Execution history retention days (not for log files)
delaySec: 1                  # Interval seconds between steps
maxActiveRuns: 1             # Max number of steps to run in parallel
params: param1 param2        # Default parameters that can be referred to by $1, $2, ...
preconditions:               # Preconditions that determine whether the DAG is allowed to run
  - condition: "`echo $2`"   # Command or variable to evaluate
    expected: "param2"       # Expected value for the condition
mailOn:
  failure: true              # Send a mail when the DAG failed
  success: true              # Send a mail when the DAG finished
MaxCleanUpTimeSec: 300       # Maximum time to wait after sending a TERM signal to running steps before killing them
handlerOn:                   # Handlers on Success, Failure, Cancel, and Exit
  success:
    command: "echo succeed"  # Command to execute when the execution succeeds
  failure:
    command: "echo failed"   # Command to execute when the execution fails
  cancel:
    command: "echo canceled" # Command to execute when the execution is canceled
  exit:
    command: "echo finished" # Command to execute when the execution finishes
steps:
  - name: some task          # Step name
    description: some task   # Step description
    dir: ${HOME}/logs        # Working directory (default: the directory of the DAG file)
    command: bash            # Command and parameters
    stdout: /tmp/outfile
    output: RESULT_VARIABLE
    script: |
      echo "any script"
    signalOnStop: "SIGINT"   # Signal name (e.g. SIGINT) to send when the process is stopped
    mailOn:
      failure: true          # Send a mail when the step failed
      success: true          # Send a mail when the step finished
    continueOn:
      failure: true          # Continue to the next step even if this step failed
      skipped: true          # Continue to the next step even if the preconditions were not met
    retryPolicy:             # Retry policy for the step
      limit: 2               # Retry up to 2 times when the step failed
      intervalSec: 5         # Interval time before retry
    repeatPolicy:            # Repeat policy for the step
      repeat: true           # Whether to repeat this step
      intervalSec: 60        # Interval in seconds at which to repeat the step
    preconditions:           # Preconditions that determine whether the step is allowed to run
      - condition: "`echo $1`" # Command or variable to evaluate
        expected: "param1"     # Expected value for the condition

The global configuration file ~/.dagu/config.yaml is useful for gathering common settings, such as logDir or env.
An executor is an alternative way of executing a step; it can be set in the executor field.
The HTTP Executor allows you to send arbitrary HTTP requests.
steps:
  - name: send POST request
    executor: http
    command: POST https://foo.bar.com
    script: |
      {
        "timeout": 10,
        "headers": {
          "Authorization": "Bearer $TOKEN"
        },
        "query": {
          "key": "value"
        },
        "body": "post body"
      }

To configure dagu, please create the config file (default path: ~/.dagu/admin.yaml). All fields are optional.
# Web Server Host and Port
host: <hostname for web UI address> # default: 127.0.0.1
port: <port number for web UI address> # default: 8000
# path to the DAGs directory
dags: <the location of DAG configuration files> # default: ${DAG_HOME}/dags
# Web UI Color & Title
navbarColor: <admin-web header color> # header color for web UI (e.g. "#ff0000")
navbarTitle: <admin-web title text> # header title for web UI (e.g. "PROD")
# Basic Auth
isBasicAuth: <true|false> # enables basic auth
basicAuthUsername: <username for basic auth of web UI> # basic auth user
basicAuthPassword: <password for basic auth of web UI> # basic auth password
# Base Config
baseConfig: <base DAG config path> # default: ${DAG_HOME}/config.yaml
# Others
logDir: <internal log directory> # default: ${DAG_HOME}/logs/admin
command: <absolute path to the dagu binary> # default: dagu

You can configure dagu's internal work directory by defining the DAGU_HOME environment variable. The default path is ~/.dagu/.
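For example, a sketch of running a second dagu instance with its own work directory (the path is illustrative):

DAGU_HOME=/opt/dagu-dev dagu server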
Email notifications can be sent when a DAG finishes with an error or successfully. To do so, set the smtp field and related fields in the DAG spec. You can use any email delivery service (e.g. Sendgrid, Mailgun, etc.).
# Email notification settings
mailOn:
  failure: true
  success: true

# SMTP server settings
smtp:
  host: "smtp.foo.bar"
  port: "587"
  username: "<username>"
  password: "<password>"

# Error mail configuration
errorMail:
  from: "[email protected]"
  to: "[email protected]"
  prefix: "[Error]"

# Info mail configuration
infoMail:
  from: "[email protected]"
  to: "[email protected]"
  prefix: "[Info]"

If you want to use the same settings for all DAGs, set them in the base configuration.
Creating a base configuration (default path: ~/.dagu/config.yaml) is a convenient way to organize shared settings among all DAGs. The path to the base configuration file can be configured. See Admin Configuration for more details.
# Directory path to save logs from standard output
logDir: /path/to/stdout-logs/

# History retention days (default: 30)
histRetentionDays: 3

# Email notification settings
mailOn:
  failure: true
  success: true

# SMTP server settings
smtp:
  host: "smtp.foo.bar"
  port: "587"
  username: "<username>"
  password: "<password>"

# Error mail configuration
errorMail:
  from: "[email protected]"
  to: "[email protected]"
  prefix: "[Error]"

# Info mail configuration
infoMail:
  from: "[email protected]"
  to: "[email protected]"
  prefix: "[Info]"

To run DAGs automatically, you need to run the dagu scheduler process on your system.
You can specify the schedule with a cron expression in the schedule field of the config file as follows:
schedule: "5 4 * * *" # Run at 04:05.
steps:
- name: scheduled job
command: job.shOr you can set multiple schedules:
schedule:
  - "30 7 * * *" # Run at 7:30
  - "0 20 * * *" # Also run at 20:00
steps:
  - name: scheduled job
    command: job.sh

If you want to start and stop a long-running process on a fixed schedule, you can define start and stop times as follows. At the stop time, each step's process receives a stop signal.
schedule:
  start: "0 8 * * *" # starts at 8:00
  stop: "0 13 * * *" # stops at 13:00
steps:
  - name: scheduled job
    command: job.sh

You can also set multiple start/stop schedules. In the following example, the process will run at 0:00-5:00 and 12:00-17:00.
schedule:
  start:
    - "0 0 * * *"  # starts at 0:00
    - "0 12 * * *" # starts at 12:00
  stop:
    - "0 5 * * *"  # stops at 5:00
    - "0 17 * * *" # stops at 17:00
steps:
  - name: some long-process
    command: main.sh

If you want to restart a DAG process on a fixed schedule, the restart field is also available. At the restart time, the DAG execution will be stopped and then restarted.
schedule:
  start: "0 8 * * *"    # starts at 8:00
  restart: "0 12 * * *" # restarts at 12:00
  stop: "0 13 * * *"    # stops at 13:00
steps:
  - name: scheduled job
    command: job.sh

The wait time between the DAG being stopped and restarted can be configured in the DAG definition as follows. The default value is 0 (zero).
restartWaitSec: 60 # Wait 60s after the process is stopped, then restart the DAG.
steps:
  - name: step1
    command: python some_app.py

The easiest way to make sure the scheduler process is always running on your system is to create the script below and execute it every minute using cron (you don't need a root account this way):
#!/bin/bash
process="dagu scheduler"
command="/usr/bin/dagu scheduler"
if ps ax | grep -v grep | grep "$process" > /dev/null
then
exit
else
$command &
fi
exit
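For example, assuming the script is saved as /path/to/check_dagu.sh (a hypothetical path) and made executable, the crontab entry would be:

* * * * * /path/to/check_dagu.sh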
Set the dags field to specify the directory of the DAGs.

dags: <the location of DAG configuration files> # default: ~/.dagu/dags

Please refer to the REST API Docs.
Feel free to contribute in any way you want. Share ideas, questions, submit issues, and create pull requests. Thanks!
Execution history data is stored in the path specified by the DAGU__DATA environment variable. The default location is $HOME/.dagu/data.
Log files are stored in the path specified by the DAGU__LOGS environment variable. The default location is $HOME/.dagu/logs. You can override this setting with the logDir field in a DAG's YAML file.
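For example, a sketch of overriding both locations when launching dagu processes (the paths are illustrative):

DAGU__DATA=/var/dagu/data DAGU__LOGS=/var/dagu/logs dagu server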
The default retention period for execution history is 30 days. You can override this setting with the histRetentionDays field in a YAML file.
dagu server's host and port can be configured in the admin configuration file as shown below. See Admin Configuration for more details.
host: <hostname for web UI address> # default: 127.0.0.1
port: <port number for web UI address> # default: 8000

You can customize the DAGs directory used by dagu server and dagu scheduler. See Admin Configuration for more details.
dags: <the location of DAG configuration files> # default: ${DAG_HOME}/dags

You can change the status of any task to a failed state. Then, when you retry the DAG, it will execute the failed task and any subsequent steps.
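For example, after changing a task's status to failed, the retry can be run via the CLI (my_dag.yaml is a hypothetical file name; the request ID placeholder is as in the command reference):

dagu retry --req=<request-id> my_dag.yaml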
dagu uses Unix sockets to communicate with running processes.
This project is licensed under the GNU GPLv3 - see the LICENSE.md file for details.
Made with contrib.rocks.