This pipeline processes single-cell (scRNA) or single-nucleus (snRNA) RNA sequencing data. Samples are processed independently and then merged into a single Seurat object for downstream analysis.
To start using the pipeline, clone the repository to your local machine:
git clone <repository_url>
The pipeline requires a sample sheet in .csv format. The sample sheet should contain the following two columns:
- Sample ID: A unique identifier for each sample.
- Cell Ranger Output Path: The file path to the Cell Ranger output for each sample.
Make sure to adjust the paths in the sample sheet as needed for your environment.
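Before submitting a job, it can help to sanity-check the sample sheet. The following is a minimal sketch, not part of the pipeline: the header names `sample_id` and `cellranger_path` and the example paths are assumptions made for illustration, so adjust them to match the headers your sample sheet actually uses.

```python
import csv
import io


def check_sample_sheet(handle, id_col="sample_id", path_col="cellranger_path"):
    """Return a list of problems found in a two-column sample sheet.

    The column names are hypothetical defaults, not names defined by the pipeline.
    """
    reader = csv.DictReader(handle)
    fields = reader.fieldnames or []
    problems = [f"missing column: {col}" for col in (id_col, path_col) if col not in fields]
    if problems:
        return problems

    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        sample = (row[id_col] or "").strip()
        path = (row[path_col] or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sample ID")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sample ID {sample}")
        seen.add(sample)
        if not path:
            problems.append(f"line {line_no}: empty Cell Ranger path")
    return problems


# Illustrative sheet with made-up paths: one duplicate ID and one empty path.
sheet = io.StringIO(
    "sample_id,cellranger_path\n"
    "S1,/data/S1/outs\n"
    "S1,/data/S2/outs\n"
    "S3,\n"
)
problems = check_sample_sheet(sheet)
for problem in problems:
    print(problem)
```

To check a real file, pass an open file handle instead of the `io.StringIO` object. A check that each path exists (for example with `os.path.isdir`) could be added in the same loop.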
The params.yaml file contains various configuration parameters for the pipeline. The following parameters are mandatory:
- pipeline: Specify the steps you want to run, separated by commas (no spaces). To run the entire pipeline, use full (e.g., pipeline: full).
- type.sequencing: Set this to either snRNA or scRNA.
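The two mandatory parameters can be checked before a run. This is a rough sketch rather than the pipeline's own validation: it assumes the parameters have already been loaded into a flat dictionary keyed by the names above (how `type.sequencing` is laid out inside params.yaml is not confirmed here), and the step name `merge` in the second example is hypothetical.

```python
REQUIRED = ("pipeline", "type.sequencing")
SEQ_TYPES = {"snRNA", "scRNA"}


def check_params(params):
    """Return a list of error messages for the mandatory parameters."""
    errors = [f"missing mandatory parameter: {key}" for key in REQUIRED if key not in params]

    # Steps must be comma-separated with no spaces, e.g. "full".
    pipeline = params.get("pipeline")
    if isinstance(pipeline, str) and " " in pipeline:
        errors.append("pipeline: steps must be comma-separated with no spaces")

    seq = params.get("type.sequencing")
    if seq is not None and seq not in SEQ_TYPES:
        errors.append("type.sequencing: expected snRNA or scRNA")
    return errors


ok = check_params({"pipeline": "full", "type.sequencing": "snRNA"})
bad = check_params({"pipeline": "load, merge"})  # space after comma, no type.sequencing
print(ok)
print(bad)
```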
Once the setup is complete, you can execute the pipeline using the following command:
sbatch scRNA-pipeline.sh
- Sample Processing: The pipeline first processes each sample independently, based on the information in the sample sheet. This includes quality control, filtering, and other preprocessing steps.
- Metric Summary: After all samples are processed, a metric summary is generated to help evaluate data quality for each sample.
- Merging Process: After sample processing finishes, the pipeline merges the samples. Only samples that completed without premature stops or errors are included in the merge. The number of cells in each sample is also checked against a threshold, which defaults to 200 cells. This threshold can be changed by adjusting the min_cells parameter in params.yaml.
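The merge eligibility rule described above can be sketched as follows. This is an illustration of the logic only, not the pipeline's actual code: the input structure is hypothetical, and treating the threshold as inclusive (a sample with exactly min_cells cells is kept) is an assumption.

```python
def select_for_merge(samples, min_cells=200):
    """Split samples into those eligible for merging and those skipped.

    samples maps a sample ID to a (completed, n_cells) tuple; this layout
    is a hypothetical stand-in for whatever the pipeline tracks internally.
    """
    keep = []
    skipped = {}
    for sample_id, (completed, n_cells) in samples.items():
        if not completed:
            skipped[sample_id] = "did not complete"
        elif n_cells < min_cells:
            skipped[sample_id] = f"{n_cells} cells < min_cells={min_cells}"
        else:
            keep.append(sample_id)
    return keep, skipped


# Made-up example: S2 stopped early and S3 falls below the default threshold.
samples = {"S1": (True, 5000), "S2": (False, 0), "S3": (True, 150)}
keep, skipped = select_for_merge(samples)
print(keep)
print(skipped)
```

Lowering the threshold, for example `select_for_merge(samples, min_cells=100)`, would bring S3 back into the merge while S2 stays excluded because it did not complete.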
- pipeline: Define the pipeline steps you want to run (comma-separated, no spaces). Use full to run all steps.
- min_cells: The minimum number of cells that must be present in each sample to include it in the merge step. Default is 200. You can modify this value in params.yaml.
- reference: Path to a reference RDS file that contains a Seurat object for annotation.
- ref.metadata.col: The specific column in the reference metadata to use for annotation transfer.
To run only the merging and processing steps (i.e., skip the individual sample processing), follow these steps:
- Move the processing.sh file from the scripts directory up one level, into its parent directory.
- Modify params.yaml as needed. If you skip the loading step (load), provide a pre-processed Seurat object as an RDS file by specifying its path under RDS.file in params.yaml.
- Make sure all necessary files (e.g., sample sheet, Cell Ranger outputs, and Seurat reference) are properly linked and accessible.