hello-innsbruck-1.nf:
\small
process sayHello {
    publishDir 'output', mode: 'copy'

    output:
    path 'output.txt'

    script:
    """
    echo 'Hello Innsbruck!' > output.txt
    """
}

workflow {
    sayHello()
}
Define a process called 'sayHello':
\small
process sayHello {
    // Specify the directory where output files will be saved
    publishDir 'output', mode: 'copy'

    // Declare that this process will produce a file called 'output.txt'
    output:
    path "output.txt"

    script:
    // The shell script to run for this process
    """
    echo 'Hello Innsbruck!' > output.txt
    """
}
Define the main workflow block and call the sayHello process:
\small
workflow {
    sayHello()
}
\normalsize The workflow block is where we connect different processes together.
nextflow run hello-innsbruck-1.nf
This command creates a file at output/output.txt containing the text 'Hello Innsbruck!'.
Using the -resume option reruns the pipeline, but only executes processes whose inputs have changed:
nextflow run hello-innsbruck-1.nf -resume
Inputs can be used in the script block with ${input_name}:
\small
process sayHello {
    publishDir 'output', mode: 'copy'

    input:
    val name

    output:
    path "output.txt"

    script:
    """
    echo 'Hello ${name}' > output.txt
    """
}
In the workflow block, inputs can be passed to processes like function arguments:
\small
workflow {
    sayHello("Innsbruck")
}
nextflow run hello-innsbruck-2.nf
params.name = 'Innsbruck'
- If you do not specify --name, it defaults to 'Innsbruck'.
- If you run with --name 'Lukas', it overrides the default.
workflow {
    sayHello(params.name)
}
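Putting the default parameter, process, and workflow together, hello-innsbruck-3.nf might look like this (a minimal sketch; the sayHello body is the same as in hello-innsbruck-2.nf):

```groovy
// Default value; can be overridden with --name on the command line
params.name = 'Innsbruck'

process sayHello {
    publishDir 'output', mode: 'copy'

    input:
    val name

    output:
    path "output.txt"

    script:
    """
    echo 'Hello ${name}' > output.txt
    """
}

workflow {
    sayHello(params.name)
}
```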
nextflow run hello-innsbruck-3.nf --name 'Lukas'
This enables parallelization:
workflow {
    // create a channel for inputs
    name_ch = Channel.of('World', 'Lukas', 'Innsbruck')
    // call process for each item in the channel
    sayHello(name_ch)
}
nextflow run hello-innsbruck-4.nf
- Problem: Each execution of the process writes to the same output file, overwriting previous results.
- Solution: Use unique filenames by including input values in the filename.
\small
process sayHello {
    publishDir 'results', mode: 'copy'

    input:
    val name

    output:
    path "output_${name}.txt" // We need to update the name of the output

    script:
    """
    # The shell script to run for this process
    echo 'Hello ${name}!' > output_${name}.txt
    """
}
nextflow run hello-innsbruck-5.nf
- The collect operator gathers all emitted values into a single list.
- This is useful when you want to pass all items at once to a single process.
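As a quick illustration (a sketch; .view() simply prints each item a channel emits):

```groovy
workflow {
    Channel.of('a', 'b', 'c').view()            // emits three separate items
    Channel.of('a', 'b', 'c').collect().view()  // emits a single list item
}
```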
\small
process collectGreetings {
    publishDir 'output', mode: 'copy'

    input:
    // Accept a list of paths (all output files from sayHello)
    path input_files

    output:
    path "output.txt"

    script:
    """
    # Concatenate all input greeting files into one
    cat ${input_files} > 'output.txt'
    """
}
- The output channel from sayHello contains one item per output file.
- To merge these files, we need to use collect so the next process runs once with all items.
- Without collect, the downstream process would run once per item.

\small
workflow {
    name_ch = Channel.of('World', 'Lukas', 'Innsbruck')
    hello_ch = sayHello(name_ch)
    collectGreetings(hello_ch.collect())
}
\normalsize
\small
nextflow run hello-innsbruck-6.nf
Processes define tasks and use input, output, and script blocks.
Workflows connect processes and pass data between them.
Parameters allow customization using params.<name> and --<name> on the command line.
Channels handle data flow:
- Channel.of(...) creates a channel with static values.
- Channels trigger processes for each item.
- Use .collect() to group multiple values into one input.
Output merging: Use collect when you want a process to handle all outputs at once (e.g., merging files).
- Instead of a val input, we define a path input (used for file paths).
- Paths are automatically staged by Nextflow: if the process runs on a different machine, Nextflow ensures the file is available there.
\small
process countWords {
    input:
    path file

    output:
    path "${file.simpleName}_wordcount.txt"

    script:
    """
    wc -w ${file} > ${file.simpleName}_wordcount.txt
    """
}
- Channel.fromPath() creates a channel from one or more file paths.
- Supports wildcards (e.g. "data/*.txt") for batch processing.
- Ensures each matching file is passed individually to the process.
params.input = 'data/pg84.txt'

workflow {
    // Create a file channel from the input pattern
    input_files = Channel.fromPath(params.input)
    countWords(input_files)
}
nextflow run word-count-1.nf --input data/pg84.txt
nextflow run word-count-1.nf --input "data/*.txt"
Important: Use quotes (" ") around the pattern to prevent the shell from expanding the wildcard (*). This ensures Nextflow receives the pattern.
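A quick shell sketch shows the difference (the file names a.txt and b.txt are hypothetical; echo stands in for the nextflow command):

```shell
mkdir -p data && touch data/a.txt data/b.txt

# Unquoted: the shell expands the glob into file names before Nextflow starts
echo --input data/*.txt

# Quoted: the literal pattern reaches Nextflow, which matches the files itself
echo --input "data/*.txt"
```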
We can add the same merge logic from the hello-innsbruck example:
\small
process collectCountings {
    publishDir 'output', mode: 'copy'

    input:
    path input_files

    output:
    path "counts.txt"

    script:
    """
    # Concatenate all input count files into one
    cat ${input_files} > 'counts.txt'
    """
}
workflow {
    input_files = Channel.fromPath(params.input)
    wordcounts = countWords(input_files)
    merged = collectCountings(wordcounts.collect())
}
nextflow run word-count-2.nf --input "data/*.txt"
- Parameters can be set via the command line
- Or defined in a separate config file
- Config files are useful for managing different experiments or setups
params {
    input = "data/*.txt"
    output = "output/gutenberg"
}
nextflow run word-count-2.nf -c gutenberg.config
Use a simple R script to create a barplot of the merged file:
\small
process plotResults {
    publishDir 'output', mode: 'copy'

    input:
    path merged_counts

    output:
    path 'wordcount_barplot.png'

    script:
    """
    #!/usr/bin/env Rscript
    library(ggplot2)
    data <- read.table('${merged_counts}', col.names=c('count', 'file'))
    data\$file <- factor(data\$file, levels = data\$file)
    png('wordcount_barplot.png', width=1024, height=768)
    ggplot(data, aes(x = file, y = count)) +
        geom_bar(stat = 'identity', fill = 'steelblue') +
        labs(title = 'Word Count per File', x = 'File', y = 'Word Count')
    dev.off()
    """
}
workflow {
    input_files = Channel.fromPath(params.input)
    wordcounts = countWords(input_files)
    merged = collectCountings(wordcounts.collect())
    plotResults(merged)
}
nextflow run word-count-3.nf --input "data/*.txt"
Option A: Use a Conda environment
- Activate a Conda environment before running the workflow.
- Works well when executing the workflow on a single machine.
- Not ideal for distributed or cluster environments.
Option B: Use one environment or one container per process (recommended)
- Assign a specific container or environment to each process in the workflow.
- Ensures reproducibility and isolation.
- Works well across multiple machines or in HPC/cluster settings.
- Supports technologies like Docker, Singularity, or Conda (with Nextflow integration).
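With Option B, a container can be attached directly to a process via the container directive. A minimal sketch, reusing the countWords process (the image name is a hypothetical example; any image providing `wc` would work):

```groovy
process countWords {
    // Hypothetical image; replace with one that fits your setup
    container 'ubuntu:22.04'

    input:
    path file

    output:
    path "${file.simpleName}_wordcount.txt"

    script:
    """
    wc -w ${file} > ${file.simpleName}_wordcount.txt
    """
}
```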
- Add a profile to nextflow.config.
- nextflow.config uses the same syntax as process-specific configuration files.
- It is loaded automatically and used to set global options.
- A user-provided config (via -c) can override the default nextflow.config.
\small
profiles {
    singularity {
        singularity.enabled = true
        singularity.autoMounts = true
        process.container = '/mnt/genepi-lehre/teaching/scicomp/singularity/gwas-example.sif'
    }
}
nextflow run word-count-3.nf -profile singularity --input "data/*.txt"
A real example with our GWAS pipeline:
https://github.com/lukfor/gwas-example
nextflow run lukfor/gwas-example -profile singularity