
CWL Somatic Pipeline Walkthrough

Thomas B. Mooney edited this page Oct 18, 2022 · 5 revisions

This page walks through setting up an Analysis Project using a menu item for Somatic analysis. Throughout this walkthrough, the examples will use an imaginary Analysis Project named "Example Analysis Project for Somatic Pipeline Walkthrough".

Configuring the Analysis Project

Add the Somatic Menu Item

If this is a new project, create a new Analysis Project:

genome analysis-project create --name "Example Analysis Project for Somatic Pipeline Walkthrough" --environment automated

and then, from this prompt, enter "y" to choose a configuration:

Would you like to add configuration from the preset config menu items?
Reply with (y)es, add config / (n)o, start with no config:

If this is an existing project, then add a new menu item:

genome analysis-project add-menu-item "Example Analysis Project for Somatic Pipeline Walkthrough"

In either case, a list of menu items will be presented, e.g.:

 1: CLE Germline Exome (3770b8510d5a459f9c0bb01fabf56337)                                       active
 2: CLE IDT exome QC speedseq alignment only (2f89a3e18a1d43c991b72f5be60a8cc7)                 active
 3: CLE Somatic Exome (9ab6e28f832a428393b87b171d444401)                                        active
 4: CLE Somatic HapMap Mixed Exome (deb7a88c7b1642c78dba73a56715c9dc)                           active
 5: CLE germline exome TruSeq (edc0393e1bb946ccb98653e56e428893)                                active
 6: CLE somatic exome TruSeq (eb031f5f8b224df98a29d20cbfe14857)                                 active
 7: Human GRCh38 RNA Alignment and QC (1088495d695741789c2446650685fef6)                        active
 8: Human GRCh38DH Germline Exome Alignment + QC + GATK (06353997ae404021bfbc5e4a89f3edda)      active
 9: Human GRCh38DH Targeted Alignment and QC (69959dc859d2482c92be6e027471fee2)                 active
10: Human GRCh38DH WGS Alignment and QC (15ceccce5ccf4547b8ea2e7046f98ecd)                      active
11: Human Somatic Exome GRCh38 (839893b77ac145efb68edb1387253a9c)                               active
12: Human UMI Molecular Alignment + QC (3f2b03dff7444e95a8a4023c9080e85f)                       active

Please confirm the above items for 'analysis_menu_items' or modify your selection.
Reply with (c)ontinue, (h)elp, e(x)it, or specify item numbers to use:

Choose the "Human Somatic Exome GRCh38" option (in this case, it's number 11, but that may change over time). That yields:

1: Human Somatic Exome GRCh38 (839893b77ac145efb68edb1387253a9c)        active

Please confirm the above items for 'analysis_menu_items' or modify your selection.
Reply with (c)ontinue, (h)elp, e(x)it, (b)ack, or specify item numbers to use:

At this point, continue with c and the menu item will be added to the project. You can check the project's config, like so:

genome analysis-project show-config "Example Analysis Project for Somatic Pipeline Walkthrough"

ID                                 FILE_PATH                                                                                                                       UPDATED_AT            IS_CONCRETE   ANALYSIS_MENU_ITEM.NAME             STATUS     TAGS.NAME
--                                 ---------                                                                                                                       ----------            -----------   -----------------------             ------     ---------
0ac4a6fe6abc46eda957a98110d88cc7   /gscmnt/gc2560/core/default_config_menu_items/human_somatic_exome.yml                                                           2019-03-08 15:38:47                 Human Somatic Exome GRCh38          active     <NULL>

Add a non-default menu item

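To test a new, non-default menu item, add a custom config file (.yml) to the project. In the command below, $custom_config.yml and $AnPID are placeholders for your config file and your Analysis Project: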

genome analysis-project add-config-file --config-file=$custom_config.yml $AnPID

Add or Verify the Environment Configuration

An Analysis Project needs an environment configuration that, at a minimum, specifies the disk space to use. For this and future CWL-based runs, it must also select the "new" docker image and Cromwell. (For the correct VERSION of the Docker image to use, see this page for some options.) The new pipelines do not work with the legacy image or with Toil. The last parameter in this example, workflow_builder_backend, prevents attempts to use older backends (like PTero) that are no longer available or don't work with CWL.

disk_group_models: "example_lab_gms"
disk_group_alignments: "example_lab_gms"
lsb_sub_additional: "docker(registry.gsc.wustl.edu/apipe-builder/genome_perl_environment:VERSION)"
cwl_runner: cromwell
workflow_builder_backend: simple

For more info on configuration for compute1, see Example Environment Configuration for GMS on compute1.
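
Before adding or updating an environment file, it can help to confirm that the expected settings are present. The sketch below is a hypothetical helper, not part of the genome CLI. It assumes the file is a flat list of key: value lines like the example above, treats the five keys from that example as the expected set, and flags a leftover VERSION placeholder.

```python
# Hypothetical pre-flight check for an environment file (not a GMS tool).
# Assumption: the keys from the example above are the ones to look for.
REQUIRED_KEYS = {
    "disk_group_models",
    "disk_group_alignments",
    "lsb_sub_additional",
    "cwl_runner",
    "workflow_builder_backend",
}


def parse_flat_yaml(text):
    """Parse simple 'key: value' lines. This is not a general YAML parser."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_environment(text):
    """Return a list of human-readable problems found in the settings."""
    settings = parse_flat_yaml(text)
    problems = [
        f"missing key: {key}" for key in sorted(REQUIRED_KEYS - settings.keys())
    ]
    if settings.get("cwl_runner") not in (None, "cromwell"):
        problems.append("cwl_runner should be cromwell")
    if "VERSION" in settings.get("lsb_sub_additional", ""):
        problems.append("lsb_sub_additional still contains the VERSION placeholder")
    return problems
```

Passing the contents of environment.yaml to check_environment returns an empty list when none of these checks finds a problem.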

For a new analysis project, save your environment file with the correct disk groups and then add it to the project. If environment.yaml is in the current directory, this command adds it:

genome analysis-project add-environment-file "Example Analysis Project for Somatic Pipeline Walkthrough" environment.yaml

For an existing analysis project, there may already be an environment file. This command will show its location at the bottom of the output:

genome analysis-project view --fast "Example Analysis Project for Somatic Pipeline Walkthrough"


...
Environment config: /gscmnt/gc2560/core/analysis_project/1234567890abcdef1234567890abcdef

Under that directory is genome/config.yaml. Replace it with an updated version that adds any lines this pipeline needs. (Note that "legacy" pipelines and CWL pipelines CANNOT be mixed in the same analysis project, as legacy pipelines require a different docker image in the configuration. Similarly, compute0 and compute1 configurations cannot be mixed within a single project.) Once you have an updated environment file ready, it can be put into place with:

genome analysis-project update-environment-file "Example Analysis Project for Somatic Pipeline Walkthrough" updated-environment.yaml

Add Subject Mappings

Once the configuration is in place, subject mappings must be added to tell the system which samples to pair for the analysis. When using the menu item, create a four-column TSV file like this:

tumor_sample	H_EX-example1-tumor	normal_sample	H_EX-example1-normal
tumor_sample	H_EX-example1-met	normal_sample	H_EX-example1-normal
tumor_sample	1234567890	normal_sample	1234567891

The first and third columns should be the literal strings "tumor_sample" and "normal_sample". The second and fourth can be either the names or IDs of samples that should be paired together. (The tumor should follow "tumor_sample" and the normal should follow "normal_sample".)
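
The format rules above can be checked before import. The following is a minimal sketch of such a check; the function name and the specific messages are illustrative, and it only inspects the file's shape, not whether the samples exist in the GMS.

```python
def validate_subject_mappings(text):
    """Return (line_number, message) pairs for rows that break the format.

    Checks only the four-column layout described above; it does not
    look anything up in the GMS.
    """
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        columns = line.split("\t")
        if len(columns) != 4:
            problems.append((line_number, f"expected 4 columns, found {len(columns)}"))
            continue
        tumor_label, tumor, normal_label, normal = (c.strip() for c in columns)
        if tumor_label != "tumor_sample":
            problems.append((line_number, "column 1 must be the literal 'tumor_sample'"))
        if normal_label != "normal_sample":
            problems.append((line_number, "column 3 must be the literal 'normal_sample'"))
        if not tumor or not normal:
            problems.append((line_number, "sample name or ID is empty"))
        elif tumor == normal:
            problems.append((line_number, "tumor and normal are the same sample"))
    return problems
```

An empty result means every row has the two literal labels in columns one and three and a distinct, non-empty sample in columns two and four.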

Once this file has been created, import it into the Analysis Project:

genome analysis-project subject-mapping import cwl-pipeline "Example Analysis Project for Somatic Pipeline Walkthrough" subject_mappings.tsv

If you're not sure what subject mappings to use and the data is already in the GMS, you can try to get a file of predictions with:

genome analysis-project subject-mapping predict somatic-validation "Example Analysis Project for Somatic Pipeline Walkthrough" predictions.tsv
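
Once predictions.tsv exists, the adaptation described in the next paragraph can be scripted. This is a minimal sketch under two assumptions that should be checked against a real predictions file: that it has no header row, and that each line holds exactly a tumor and a normal separated by a tab. The function name is hypothetical.

```python
def predictions_to_subject_mappings(text):
    """Convert two-column 'tumor<TAB>normal' text to the four-column format.

    Assumption: the predictions file has no header row. If yours does,
    drop the first line before calling this function.
    """
    lines = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # ignore blank lines
        columns = raw.split("\t")
        if len(columns) < 2:
            raise ValueError(f"line {line_number}: expected at least 2 columns")
        tumor, normal = columns[0].strip(), columns[1].strip()
        lines.append("\t".join(["tumor_sample", tumor, "normal_sample", normal]))
    return "\n".join(lines) + "\n" if lines else ""
```

Writing the returned text to subject_mappings.tsv gives a file in the shape the cwl-pipeline importer expects. Review the pairs by eye first, since the predictions are only suggestions.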

This will produce a TSV file with tumor samples in the first column and normal samples in the second column. If these predictions look good, this file can be adapted to the four column format required for the cwl-pipeline subject-mapping importer.

Assign Instrument Data

If the project has yet to be sequenced, it's possible that production will link this Analysis Project to a Work Order and instrument data will be added automatically. Otherwise, this command can be used to link data to the project:

genome analysis-project add-instrument-data 2345678901

Multiple instrument data can be assigned. The command also accepts lookups like sample.name=H_EX-sample1-tumor. Instrument data must have already been synchronized from LIMS or imported into the GMS before it can be added with this command.

Release the Analysis Project

Once the project is configured as desired, tell the system it's ready for processing:

genome analysis-project release "Example Analysis Project for Somatic Pipeline Walkthrough"

Monitoring Progress

Monitoring an Analysis Project

The quickest way to get an overview of an Analysis Project's status is with the previously mentioned view command:

genome analysis-project view --fast "Example Analysis Project for Somatic Pipeline Walkthrough"
'analysis_project' may require verification...
Resolving parameter 'analysis_project' from command argument 'Example Analysis Project for Somatic Pipeline Walkthrough'... found 1
=== Analysis Project ===
ID: 1234567890abcdef1234567890abcdef                                                                                                                         Name: Example Analysis Project for Somatic Pipeline Walkthrough
Run as: prod-builder                                                             Created: 1985-04-01 15:38:25
Updated: 20xx-03-12 14:16:23                                                     Created by: tmooney
Status: In Progress

=== Instrument Data ===
Genome::InstrumentData::Solexa
             new 3
          failed 5
       processed 10
           Total 18

=== Models ===
Genome::Model::CwlPipeline
       Buildless 1
          Failed 1
         Running 1
       Succeeded 4
           Total 7

=== Configuration Items ===
Human Somatic Exome GRCh38 (839893b77ac145efb68edb1387253a9c): Human Somatic Exome GRCh38 -- Alignment and Variant Detection
    ID: abcdef7890abcdef1234567890abcdee                                         Concrete: Yes
    Created by: tmooney                                                          Status: active
    Created: 1985-04-09 15:38:47                                                                                                                                 Updated: 20xx-03-08 15:38:47
    Tags: 

Environment config: /gscmnt/gc2560/core/analysis_project/1234567890abcdef1234567890abcdef

Status is also available in a web browser by searching for the Analysis Project at https://spectacle.gsc.wustl.edu/

Monitoring an Individual Build

The view command gave a summary of the model statuses. To get a listing of the statuses of all builds in the project:

genome model status "Example Analysis Project for Somatic Pipeline Walkthrough"

Resolving parameter 'models' from command argument 'Example Analysis Project for Somatic Pipeline Walkthrough'... found 7
H_EX.individual1.prod-cwl.somatic_exome	abcdef7890abcdef1234567890bbcdee	Succeeded
H_EX.individual1.prod-cwl.somatic_exome-1	abcdef7890abcdef1234567890cbcdee	Succeeded
H_EX.individual1.prod-cwl.somatic_exome-2	abcdef7890abcdef1234567890dbcdee	Failed
H_EX.individual2.prod-cwl.somatic_exome	abcdef7890abcdef1234567890ebcdee	Running
H_EX.individual2.prod-cwl.somatic_exome-1	abcdef7890abcdef1234567890fbcdee	Succeeded
H_EX.individual2.prod-cwl.somatic_exome-2	abcdef7890abcdef12345678909bcdee	Succeeded
H_EX.individual3.prod-cwl.somatic_exome	abcdef7890abcdef12345678908bcdee	Build Needed

The statuses here should add up to the summarized statuses from the view command.
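
That cross-check can be done by tallying the last column of the status listing. The sketch below assumes the output is tab-separated with three fields per model (name, ID, status), as in the example above; the function name is illustrative. Note that in the examples on this page the view summary says "Buildless" where the status listing says "Build Needed", so those two labels appear to describe the same model.

```python
from collections import Counter


def tally_statuses(status_output):
    """Count models per status from `genome model status` output.

    Assumption: each model line has exactly three tab-separated fields
    (name, ID, status). Other lines, such as the 'Resolving parameter'
    message, are skipped.
    """
    counts = Counter()
    for line in status_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        counts[fields[2].strip()] += 1
    return counts
```

Comparing the resulting counts with the "=== Models ===" section of the view output shows quickly whether any model is unaccounted for.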

To investigate a single build that has failed:

genome model build view abcdef7890abcdef1234567890dbcdee

'build' may require verification...
Resolving parameter 'build' from command argument 'abcdef7890abcdef1234567890dbcdee'... found 1
=== Build ===
Build ID: abcdef7890abcdef1234567890dbcdee      Build Status: Failed                           
Model ID: bbbdef4567abcdef8888888888dbcdee      Model Name: H_EX.individual1.prod-cwl.somatic_exome-2
Run by: prod-builder                            Processing Profile ID: 31e873b623e2454e9b68a53dac9356c4
Build Scheduled: 2019-03-13 21:03:36            Build Completed: 2019-03-14 02:49:10           

Build Class: Genome::Model::Build::CwlPipeline 
Software Revision: /gsc/scripts/opt/genome/snapshots/genome-3781/lib/perl/Genome/Site/TGI/SiteLib:/gsc/scripts/opt/genome/snapshots/genome-3781/lib/perl:/etc/perl:/usr/local/lib/site_perl:/usr/lib/x86_64-linux-gnu/perl-base
Software Result Test Name(s): No results found 
Data Directory: /gscmnt/gc99999/example/model_data/bbbdef4567abcdef8888888888dbcdee/buildabcdef7890abcdef1234567890dbcdee

Analysis Project: Example Analysis Project for Somatic Pipeline Walkthrough

=== Workflow ===
                                    ID    LSF_ID  SHARD       STATUS                             START                               END  NAME
  b9eb3c3b-3d45-4470-a29d-e9b21b17c8ab                        Failed     2019-03-13T21:04:46.145-05:00     2019-03-14T02:48:05.057-05:00  somatic_exome.cwl
  65bef02a-52ae-4b53-994c-457e14cc3c01                        Failed     2019-03-13T21:04:51.017-05:00     2019-03-13T21:05:07.474-05:00  . exome_alignment.cwl
  8276dbba-4aac-43af-b002-f9f38cecf6d7                        Failed     2019-03-13T21:04:53.076-05:00     2019-03-13T21:05:06.469-05:00  . . bam_to_bqsr.cwl
                                         5350753              Failed     2019-03-13T21:04:58.239-05:00     2019-03-13T21:05:05.682-05:00  . . . merge
  67f45bc7-1473-4204-ba93-027123519382                     Succeeded     2019-03-13T21:04:51.017-05:00     2019-03-14T02:48:03.977-05:00  . exome_alignment.cwl
  7b054299-947d-4413-8473-e911e7dae096                     Succeeded     2019-03-13T21:04:53.076-05:00     2019-03-14T02:27:33.636-05:00  . . bam_to_bqsr.cwl
  6a5600b3-2933-4813-966f-d31d894dc32e            0        Succeeded     2019-03-13T21:04:56.158-05:00     2019-03-13T22:04:44.334-05:00  . . . align.cwl
                                         5350752                Done     2019-03-13T21:04:58.239-05:00     2019-03-13T22:04:43.897-05:00  . . . . align_and_tag
                                         5351674                Done     2019-03-13T22:04:47.376-05:00     2019-03-13T22:37:04.379-05:00  . . . merge
                                         5352020                Done     2019-03-13T22:37:05.656-05:00     2019-03-13T22:47:14.221-05:00  . . . name_sort
                                         5352127                Done     2019-03-13T22:47:15.706-05:00     2019-03-14T00:33:24.122-05:00  . . . mark_duplicates_and_sort
                                         5352826                Done     2019-03-14T00:33:25.837-05:00     2019-03-14T00:49:14.249-05:00  . . . bqsr
                                         5352902                Done     2019-03-14T00:49:15.416-05:00     2019-03-14T02:14:32.616-05:00  . . . apply_bqsr
                                         5353656                Done     2019-03-14T02:14:34.516-05:00     2019-03-14T02:27:17.137-05:00  . . . bam_to_cram
                                         5353736                Done     2019-03-14T02:27:18.336-05:00     2019-03-14T02:27:32.830-05:00  . . . index_cram
  81c48e11-adef-4ee8-8282-8fdff72f1a28                     Succeeded     2019-03-14T02:27:35.005-05:00     2019-03-14T02:48:02.958-05:00  . . qc_exome.cwl
  c846e057-ed69-4cb3-8ff8-6e71b3c73ab1                     Succeeded     2019-03-14T02:27:37.058-05:00     2019-03-14T02:39:48.359-05:00  . . . hs_metrics.cwl
                                         5353746  0             Done     2019-03-14T02:27:41.185-05:00     2019-03-14T02:39:07.586-05:00  . . . . collect_per_target_hs_metrics
                                         5353747  0             Done     2019-03-14T02:27:41.186-05:00     2019-03-14T02:39:47.144-05:00  . . . . collect_per_base_hs_metrics
  5e629888-b5e4-41c6-89e5-090446db8ea0                     Succeeded     2019-03-14T02:27:37.058-05:00     2019-03-14T02:47:02.776-05:00  . . . cram_to_bam_and_index.cwl
                                         5353742                Done     2019-03-14T02:27:39.095-05:00     2019-03-14T02:44:19.094-05:00  . . . . cram_to_bam
                                         5353819                Done     2019-03-14T02:44:20.615-05:00     2019-03-14T02:47:02.211-05:00  . . . . index_bam
                                         5353740                Done     2019-03-14T02:27:37.056-05:00     2019-03-14T02:28:30.515-05:00  . . . samtools_flagstat
                                         5353744                Done     2019-03-14T02:27:37.057-05:00     2019-03-14T02:38:09.340-05:00  . . . collect_insert_size_metrics
                                         5353741                Done     2019-03-14T02:27:37.057-05:00     2019-03-14T02:28:28.724-05:00  . . . select_variants
                                         5353743                Done     2019-03-14T02:27:37.058-05:00     2019-03-14T02:47:04.281-05:00  . . . collect_alignment_summary_metrics
                                         5353745                Done     2019-03-14T02:27:38.075-05:00     2019-03-14T02:45:34.177-05:00  . . . collect_roi_hs_metrics
                                         5353833                Done     2019-03-14T02:47:04.816-05:00     2019-03-14T02:48:02.695-05:00  . . . verify_bam_id
Name: merge    Status: Failed
Stderr Log: /gscmnt/gc99999/example/model_data/bbbdef4567abcdef8888888888dbcdee/buildabcdef7890abcdef1234567890dbcdee/tmp/cromwell-executions/somatic_exome.cwl/b9eb3c3b-3d45-4470-a29d-e9b21b17c8ab/call-tumor_alignment_and_qc/exome_alignment.cwl/65bef02a-52ae-4b53-994c-457e14cc3c01/call-alignment/bam_to_bqsr.cwl/8276dbba-4aac-43af-b002-f9f38cecf6d7/call-merge/execution/stderr

The detailed Cromwell workflow is displayed to help identify which steps are currently running or have failed. (This feature will not work in the "legacy" docker image.) The view does not show future steps; they are added only once they have begun.
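
For large workflows, the failed rows can be pulled out of the saved view output with a small script. This is a rough sketch that assumes the row layout shown in the example above: a status word, a start timestamp, an optional end timestamp, then the step name prefixed by one ". " per nesting level. The regular expression and function names are illustrative and may need adjusting if the real output differs.

```python
import re

# A workflow row: STATUS, START, optional END, then the (indented) NAME.
# Timestamps look like 2019-03-13T21:04:46.145-05:00.
ROW = re.compile(
    r"(?P<status>\S+)\s+"
    r"(?P<start>\d{4}-\d{2}-\d{2}T\S+)"
    r"(?:\s+(?P<end>\d{4}-\d{2}-\d{2}T\S+))?"
    r"\s+(?P<name>.+?)\s*$"
)


def failed_steps(view_output):
    """Return (depth, step_name) for every workflow row marked Failed."""
    failed = []
    for line in view_output.splitlines():
        match = ROW.search(line)
        if not match or match.group("status") != "Failed":
            continue
        tokens = match.group("name").split()
        # Each leading "." token represents one level of nesting.
        depth = sum(1 for token in tokens[:-1] if token == ".")
        failed.append((depth, tokens[-1]))
    return failed


def deepest_failure(view_output):
    """Return the name of the most deeply nested failed step, or None."""
    steps = failed_steps(view_output)
    return max(steps, key=lambda step: step[0])[1] if steps else None
```

In the example build above, the failures nest from somatic_exome.cwl down to merge, which matches the "Name: merge    Status: Failed" line and its stderr log path printed at the end of the view output.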
