CWL Somatic Pipeline Walkthrough
This page walks through setting up an Analysis Project using a menu item for Somatic analysis. Throughout this walkthrough, the examples will use an imaginary Analysis Project named "Example Analysis Project for Somatic Pipeline Walkthrough".
If this is a new project, create a new Analysis Project:
genome analysis-project create --name "Example Analysis Project for Somatic Pipeline Walkthrough" --environment automated
Then, at the following prompt, enter "y" to choose a configuration:
Would you like to add configuration from the preset config menu items?
Reply with (y)es, add config / (n)o, start with no config:
If this is an existing project, then add a new menu item:
genome analysis-project add-menu-item "Example Analysis Project for Somatic Pipeline Walkthrough"
In either case, a list of menu items will be presented, e.g.:
1: CLE Germline Exome (3770b8510d5a459f9c0bb01fabf56337) active
2: CLE IDT exome QC speedseq alignment only (2f89a3e18a1d43c991b72f5be60a8cc7) active
3: CLE Somatic Exome (9ab6e28f832a428393b87b171d444401) active
4: CLE Somatic HapMap Mixed Exome (deb7a88c7b1642c78dba73a56715c9dc) active
5: CLE germline exome TruSeq (edc0393e1bb946ccb98653e56e428893) active
6: CLE somatic exome TruSeq (eb031f5f8b224df98a29d20cbfe14857) active
7: Human GRCh38 RNA Alignment and QC (1088495d695741789c2446650685fef6) active
8: Human GRCh38DH Germline Exome Alignment + QC + GATK (06353997ae404021bfbc5e4a89f3edda) active
9: Human GRCh38DH Targeted Alignment and QC (69959dc859d2482c92be6e027471fee2) active
10: Human GRCh38DH WGS Alignment and QC (15ceccce5ccf4547b8ea2e7046f98ecd) active
11: Human Somatic Exome GRCh38 (839893b77ac145efb68edb1387253a9c) active
12: Human UMI Molecular Alignment + QC (3f2b03dff7444e95a8a4023c9080e85f) active
Please confirm the above items for 'analysis_menu_items' or modify your selection.
Reply with (c)ontinue, (h)elp, e(x)it, or specify item numbers to use:
Choose the "Human Somatic Exome GRCh38" option (in this case, it's number 11, but that may change over time). That yields:
1: Human Somatic Exome GRCh38 (839893b77ac145efb68edb1387253a9c) active
Please confirm the above items for 'analysis_menu_items' or modify your selection.
Reply with (c)ontinue, (h)elp, e(x)it, (b)ack, or specify item numbers to use:
At this point, continue with c and the menu item will be added to the project. You can check the project's config, like so:
genome analysis-project show-config "Example Analysis Project for Somatic Pipeline Walkthrough"
ID FILE_PATH UPDATED_AT IS_CONCRETE ANALYSIS_MENU_ITEM.NAME STATUS TAGS.NAME
-- --------- ---------- ----------- ----------------------- ------ ---------
0ac4a6fe6abc46eda957a98110d88cc7 /gscmnt/gc2560/core/default_config_menu_items/human_somatic_exome.yml 2019-03-08 15:38:47 Human Somatic Exome GRCh38 active <NULL>
To test a new non-default menu item, custom config files (.yml) can be added.
genome analysis-project add-config-file --config-file=$custom_config.yml $AnPID
For an Analysis Project, an environment configuration is needed to, at a minimum, specify the disk groups to use. Additionally, for this pipeline (and future CWL-based runs), the configuration must specify the "new" Docker image and Cromwell. (For the correct VERSION of the Docker image to use, see this page for some options.) The new pipelines do not work with the legacy image or with Toil. The last setting in this example, workflow_builder_backend: simple, avoids attempting to use older backends (like PTero) that are no longer available or don't work with CWL.
disk_group_models: "example_lab_gms"
disk_group_alignments: "example_lab_gms"
lsb_sub_additional: "docker(registry.gsc.wustl.edu/apipe-builder/genome_perl_environment:VERSION)"
cwl_runner: cromwell
workflow_builder_backend: simple
For more info on configuration for compute1, see Example Environment Configuration for GMS on compute1.
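To avoid typos when creating the file for a new project, the settings above can be written out by a small shell function. This is only a sketch: `write_environment_yaml` is a hypothetical helper, not part of the genome CLI, and both the disk group and the Docker image VERSION must be supplied by you (example_lab_gms is the imaginary disk group used on this page).

```shell
# Hypothetical helper: write an environment file with the settings shown above.
# $1 = disk group (e.g. example_lab_gms), $2 = Docker image VERSION tag,
# $3 = output file (defaults to environment.yaml).
write_environment_yaml() {
    disk_group=$1
    image_version=$2
    out=${3:-environment.yaml}
    # Unquoted EOF so the two variables are expanded into the YAML.
    cat > "$out" <<EOF
disk_group_models: "$disk_group"
disk_group_alignments: "$disk_group"
lsb_sub_additional: "docker(registry.gsc.wustl.edu/apipe-builder/genome_perl_environment:$image_version)"
cwl_runner: cromwell
workflow_builder_backend: simple
EOF
}
```

For example, `write_environment_yaml example_lab_gms VERSION environment.yaml`, with VERSION replaced by the image tag you have chosen, produces a file that can be passed to the add-environment-file command.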
For a new analysis project, save your environment file with the correct disk groups and then add it to the project. If environment.yaml is in the current directory, this command adds it:
genome analysis-project add-environment-file "Example Analysis Project for Somatic Pipeline Walkthrough" environment.yaml
For an existing analysis project, there may already be an environment file. This command shows its location at the bottom of the output:
genome analysis-project view --fast "Example Analysis Project for Somatic Pipeline Walkthrough"
...
Environment config: /gscmnt/gc2560/core/analysis_project/1234567890abcdef1234567890abcdef
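When scripting this step, the directory can be read from that line of the view output. The function below is a hypothetical helper (`environment_config_file`, not part of the genome CLI); it assumes the line starts with `Environment config:` exactly as shown and appends genome/config.yaml, the file that lives under that directory.

```shell
# Hypothetical helper: print the config file path from saved or piped
# `genome analysis-project view --fast` output.
# Assumes a line of the form "Environment config: /path/to/dir".
environment_config_file() {
    awk '/^Environment config:/ {
        sub(/^Environment config:[ \t]*/, "")   # keep only the directory
        print $0 "/genome/config.yaml"
        exit                                    # use the first match only
    }' "$@"
}
```

Usage: `genome analysis-project view --fast "Example Analysis Project for Somatic Pipeline Walkthrough" | environment_config_file`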
The Environment config directory contains genome/config.yaml. It needs to be replaced with an updated version that adds any of the lines above that are needed to run this pipeline. (Note that "legacy" pipelines and CWL pipelines CANNOT be mixed in the same analysis project, because legacy pipelines require a different Docker image in the configuration. Similarly, compute0 and compute1 configurations cannot be mixed within a single project.) Once the updated environment file is ready, put it into place with:
genome analysis-project update-environment-file "Example Analysis Project for Somatic Pipeline Walkthrough" updated-environment.yaml
Once the configuration is in place, subject mappings must be added to tell the system which samples to pair for the analysis. When using the menu item, create a four-column TSV file like this:
tumor_sample H_EX-example1-tumor normal_sample H_EX-example1-normal
tumor_sample H_EX-example1-met normal_sample H_EX-example1-normal
tumor_sample 1234567890 normal_sample 1234567891
The first and third columns should be the literal strings "tumor_sample" and "normal_sample". The second and fourth can be either the names or IDs of samples that should be paired together. (The tumor should follow "tumor_sample" and the normal should follow "normal_sample".)
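If the tumor/normal pairs are already listed one pair per line (tumor first, normal second), the four-column file can be generated rather than typed. The function below is a hypothetical sketch, not part of the genome CLI. It writes tab separators because the format is described as TSV, and it assumes tab-separated input with no header row.

```shell
# Hypothetical helper: convert tab-separated "TUMOR<TAB>NORMAL" pairs into
# the four-column subject-mapping format. Only the first two columns are
# used, blank lines are skipped, and a header row is NOT handled (remove it).
to_subject_mappings() {
    awk -F '\t' -v OFS='\t' '
        NF >= 2 { print "tumor_sample", $1, "normal_sample", $2 }
    ' "$@"
}
```

Usage: `to_subject_mappings pairs.tsv > subject_mappings.tsv`, where pairs.tsv is a hypothetical two-column input file. Reading from standard input also works when no file is given.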
Once this file has been created, import it into the Analysis Project:
genome analysis-project subject-mapping import cwl-pipeline "Example Analysis Project for Somatic Pipeline Walkthrough" subject_mappings.tsv
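A quick structural check before importing can catch common mistakes, such as a file typed with spaces instead of tabs. The function below is a hypothetical sketch: it checks only the rules stated above (four tab-separated columns, with tumor_sample and normal_sample as the first and third), not whether the samples exist in the GMS.

```shell
# Hypothetical helper: report malformed lines in a subject-mapping file.
# Prints one message per bad line and returns non-zero if any were found.
check_subject_mappings() {
    awk -F '\t' '
        NF == 0 { next }                          # ignore blank lines
        NF != 4 || $1 != "tumor_sample" || $3 != "normal_sample" || $2 == "" || $4 == "" {
            printf "line %d: expected tumor_sample<TAB>TUMOR<TAB>normal_sample<TAB>NORMAL\n", NR
            bad = 1
        }
        END { exit bad }                          # 0 if every line passed
    ' "$1"
}
```

Usage: `check_subject_mappings subject_mappings.tsv && genome analysis-project subject-mapping import cwl-pipeline "Example Analysis Project for Somatic Pipeline Walkthrough" subject_mappings.tsv`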
If you're not sure what subject mappings to use and the data is already in the GMS, you can try to get a file of predictions with:
genome analysis-project subject-mapping predict somatic-validation "Example Analysis Project for Somatic Pipeline Walkthrough" predictions.tsv
This will produce a TSV file with tumor samples in the first column and normal samples in the second column. If these predictions look good, this file can be adapted to the four-column format required for the cwl-pipeline subject-mapping importer.
Assign Instrument Data
If the project has yet to be sequenced, it's possible that production will link this Analysis Project to a Work Order and instrument data will be added automatically. Otherwise, this command can be used to link data to the project:
genome analysis-project add-instrument-data 2345678901
Multiple instrument data can be assigned. The command also accepts lookups like sample.name=H_EX-sample1-tumor. Instrument data must have already been synchronized from LIMS or imported into the GMS before it can be added with this command.
Release the Analysis Project
Once the project is configured as desired, tell the system it's ready for processing:
genome analysis-project release "Example Analysis Project for Somatic Pipeline Walkthrough"
The quickest way to get an overview of an Analysis Project's status is with the previously mentioned view command:
genome analysis-project view --fast "Example Analysis Project for Somatic Pipeline Walkthrough"
'analysis_project' may require verification...
Resolving parameter 'analysis_project' from command argument 'Example Analysis Project for Somatic Pipeline Walkthrough'... found 1
=== Analysis Project ===
ID: 1234567890abcdef1234567890abcdef Name: Example Analysis Project for Somatic Pipeline Walkthrough
Run as: prod-builder Created: 1985-04-01 15:38:25
Updated: 20xx-03-12 14:16:23 Created by: tmooney
Status: In Progress
=== Instrument Data ===
Genome::InstrumentData::Solexa
new 3
failed 5
processed 10
Total 18
=== Models ===
Genome::Model::CwlPipeline
Buildless 1
Failed 1
Running 1
Succeeded 4
Total 7
=== Configuration Items ===
Human Somatic Exome GRCh38 (839893b77ac145efb68edb1387253a9c): Human Somatic Exome GRCh38 -- Alignment and Variant Detection
ID: abcdef7890abcdef1234567890abcdee Concrete: Yes
Created by: tmooney Status: active
Created: 1985-04-09 15:38:47 Updated: 20xx-03-08 15:38:47
Tags:
Environment config: /gscmnt/gc2560/core/analysis_project/1234567890abcdef1234567890abcdef
Status is also available in a web browser by searching for the Analysis Project at https://spectacle.gsc.wustl.edu/
The view command gave a summary of the model statuses. To get a listing of the statuses of all builds in the project:
genome model status "Example Analysis Project for Somatic Pipeline Walkthrough"
Resolving parameter 'models' from command argument 'Example Analysis Project for Somatic Pipeline Walkthrough'... found 7
H_EX.individual1.prod-cwl.somatic_exome abcdef7890abcdef1234567890bbcdee Succeeded
H_EX.individual1.prod-cwl.somatic_exome-1 abcdef7890abcdef1234567890cbcdee Succeeded
H_EX.individual1.prod-cwl.somatic_exome-2 abcdef7890abcdef1234567890dbcdee Failed
H_EX.individual2.prod-cwl.somatic_exome abcdef7890abcdef1234567890ebcdee Running
H_EX.individual2.prod-cwl.somatic_exome-1 abcdef7890abcdef1234567890fbcdee Succeeded
H_EX.individual2.prod-cwl.somatic_exome-2 abcdef7890abcdef12345678909bcdee Succeeded
H_EX.individual3.prod-cwl.somatic_exome abcdef7890abcdef12345678908bcdee Build Needed
The statuses here should add up to the summarized statuses from the view command.
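To make that comparison quicker, the per-status counts can be tallied from saved or piped output of the status command. This is a hypothetical sketch that assumes each model line consists of a model name, a 32-character hexadecimal ID, and a status that may contain spaces, as in the listing above. The labels need not match the view summary word for word; in the examples on this page the view shows Buildless where this listing shows Build Needed.

```shell
# Hypothetical helper: count models per status in `genome model status` output.
# Lines whose second field is not a 32-character hex ID (e.g. the
# "Resolving parameter ..." line) are ignored.
tally_model_status() {
    awk '
        length($2) == 32 && $2 ~ /^[0-9a-f]+$/ {
            status = $3                                   # status may span
            for (i = 4; i <= NF; i++) status = status " " $i  # several words
            count[status]++
            total++
        }
        END {
            for (s in count) print s, count[s]            # order is unspecified
            print "Total", total + 0
        }
    ' "$@"
}
```

Usage: `genome model status "Example Analysis Project for Somatic Pipeline Walkthrough" | tally_model_status | sort`. Sorting is only for readability, since awk does not guarantee the order of the per-status lines.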
To investigate a single build that has failed:
genome model build view abcdef7890abcdef1234567890dbcdee
'build' may require verification...
Resolving parameter 'build' from command argument 'abcdef7890abcdef1234567890dbcdee'... found 1
=== Build ===
Build ID: abcdef7890abcdef1234567890dbcdee Build Status: Failed
Model ID: bbbdef4567abcdef8888888888dbcdee Model Name: H_EX.individual1.prod-cwl.somatic_exome-2
Run by: prod-builder Processing Profile ID: 31e873b623e2454e9b68a53dac9356c4
Build Scheduled: 2019-03-13 21:03:36 Build Completed: 2019-03-14 02:49:10
Build Class: Genome::Model::Build::CwlPipeline
Software Revision: /gsc/scripts/opt/genome/snapshots/genome-3781/lib/perl/Genome/Site/TGI/SiteLib:/gsc/scripts/opt/genome/snapshots/genome-3781/lib/perl:/etc/perl:/usr/local/lib/site_perl:/usr/lib/x86_64-linux-gnu/perl-base
Software Result Test Name(s): No results found
Data Directory: /gscmnt/gc99999/example/model_data/bbbdef4567abcdef8888888888dbcdee/buildabcdef7890abcdef1234567890dbcdee
Analysis Project: Example Analysis Project for Somatic Pipeline Walkthrough
=== Workflow ===
ID LSF_ID SHARD STATUS START END NAME
b9eb3c3b-3d45-4470-a29d-e9b21b17c8ab Failed 2019-03-13T21:04:46.145-05:00 2019-03-14T02:48:05.057-05:00 somatic_exome.cwl
65bef02a-52ae-4b53-994c-457e14cc3c01 Failed 2019-03-13T21:04:51.017-05:00 2019-03-13T21:05:07.474-05:00 . exome_alignment.cwl
8276dbba-4aac-43af-b002-f9f38cecf6d7 Failed 2019-03-13T21:04:53.076-05:00 2019-03-13T21:05:06.469-05:00 . . bam_to_bqsr.cwl
5350753 Failed 2019-03-13T21:04:58.239-05:00 2019-03-13T21:05:05.682-05:00 . . . merge
67f45bc7-1473-4204-ba93-027123519382 Succeeded 2019-03-13T21:04:51.017-05:00 2019-03-14T02:48:03.977-05:00 . exome_alignment.cwl
7b054299-947d-4413-8473-e911e7dae096 Succeeded 2019-03-13T21:04:53.076-05:00 2019-03-14T02:27:33.636-05:00 . . bam_to_bqsr.cwl
6a5600b3-2933-4813-966f-d31d894dc32e 0 Succeeded 2019-03-13T21:04:56.158-05:00 2019-03-13T22:04:44.334-05:00 . . . align.cwl
5350752 Done 2019-03-13T21:04:58.239-05:00 2019-03-13T22:04:43.897-05:00 . . . . align_and_tag
5351674 Done 2019-03-13T22:04:47.376-05:00 2019-03-13T22:37:04.379-05:00 . . . merge
5352020 Done 2019-03-13T22:37:05.656-05:00 2019-03-13T22:47:14.221-05:00 . . . name_sort
5352127 Done 2019-03-13T22:47:15.706-05:00 2019-03-14T00:33:24.122-05:00 . . . mark_duplicates_and_sort
5352826 Done 2019-03-14T00:33:25.837-05:00 2019-03-14T00:49:14.249-05:00 . . . bqsr
5352902 Done 2019-03-14T00:49:15.416-05:00 2019-03-14T02:14:32.616-05:00 . . . apply_bqsr
5353656 Done 2019-03-14T02:14:34.516-05:00 2019-03-14T02:27:17.137-05:00 . . . bam_to_cram
5353736 Done 2019-03-14T02:27:18.336-05:00 2019-03-14T02:27:32.830-05:00 . . . index_cram
81c48e11-adef-4ee8-8282-8fdff72f1a28 Succeeded 2019-03-14T02:27:35.005-05:00 2019-03-14T02:48:02.958-05:00 . . qc_exome.cwl
c846e057-ed69-4cb3-8ff8-6e71b3c73ab1 Succeeded 2019-03-14T02:27:37.058-05:00 2019-03-14T02:39:48.359-05:00 . . . hs_metrics.cwl
5353746 0 Done 2019-03-14T02:27:41.185-05:00 2019-03-14T02:39:07.586-05:00 . . . . collect_per_target_hs_metrics
5353747 0 Done 2019-03-14T02:27:41.186-05:00 2019-03-14T02:39:47.144-05:00 . . . . collect_per_base_hs_metrics
5e629888-b5e4-41c6-89e5-090446db8ea0 Succeeded 2019-03-14T02:27:37.058-05:00 2019-03-14T02:47:02.776-05:00 . . . cram_to_bam_and_index.cwl
5353742 Done 2019-03-14T02:27:39.095-05:00 2019-03-14T02:44:19.094-05:00 . . . . cram_to_bam
5353819 Done 2019-03-14T02:44:20.615-05:00 2019-03-14T02:47:02.211-05:00 . . . . index_bam
5353740 Done 2019-03-14T02:27:37.056-05:00 2019-03-14T02:28:30.515-05:00 . . . samtools_flagstat
5353744 Done 2019-03-14T02:27:37.057-05:00 2019-03-14T02:38:09.340-05:00 . . . collect_insert_size_metrics
5353741 Done 2019-03-14T02:27:37.057-05:00 2019-03-14T02:28:28.724-05:00 . . . select_variants
5353743 Done 2019-03-14T02:27:37.058-05:00 2019-03-14T02:47:04.281-05:00 . . . collect_alignment_summary_metrics
5353745 Done 2019-03-14T02:27:38.075-05:00 2019-03-14T02:45:34.177-05:00 . . . collect_roi_hs_metrics
5353833 Done 2019-03-14T02:47:04.816-05:00 2019-03-14T02:48:02.695-05:00 . . . verify_bam_id
Name: merge Status: Failed
Stderr Log: /gscmnt/gc99999/example/model_data/bbbdef4567abcdef8888888888dbcdee/buildabcdef7890abcdef1234567890dbcdee/tmp/cromwell-executions/somatic_exome.cwl/b9eb3c3b-3d45-4470-a29d-e9b21b17c8ab/call-tumor_alignment_and_qc/exome_alignment.cwl/65bef02a-52ae-4b53-994c-457e14cc3c01/call-alignment/bam_to_bqsr.cwl/8276dbba-4aac-43af-b002-f9f38cecf6d7/call-merge/execution/stderr
The detailed Cromwell workflow is displayed to help identify which steps are currently running or have failed. (This feature does not work in the "legacy" Docker image.) The Cromwell workflow view does not show future steps; they are added to this view only once they have begun.
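For builds with many steps, the failed rows of the Workflow table can be pulled out of saved or piped build view output. The function below is a hypothetical helper that assumes the column layout shown above: a status followed by start and end timestamps, then the dot-indented name. The most deeply nested failed row is usually a good place to start; in the example above that is merge, whose Stderr Log path is printed below the table.

```shell
# Hypothetical helper: print the NAME (with its nesting dots) of every
# Workflow row whose STATUS is Failed in `genome model build view` output.
failed_workflow_steps() {
    awk '
        {
            # Stop early enough that STATUS, START, END and NAME all exist.
            for (i = 1; i < NF - 2; i++) {
                if ($i == "Failed" && $(i + 1) ~ /^[0-9][0-9][0-9][0-9]-/) {
                    name = $(i + 3)                  # first field after END
                    for (j = i + 4; j <= NF; j++) name = name " " $j
                    print name
                    break
                }
            }
        }
    ' "$@"
}
```

Usage: `genome model build view abcdef7890abcdef1234567890dbcdee | failed_workflow_steps`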