This repository contains subworkflows for performing copy number variation (CNV) analysis on single-cell RNA-seq (scRNA-seq) data. The CNV subworkflows support multiple tools including inferCNV, SCEVAN, and CopyKAT to assess tumor heterogeneity and identify chromosomal aberrations.
Disclaimer: Subworkflows are high-level wrappers around chained Nextflow modules. They should be used as part of a pipeline and can be extended or reused across different scRNA-seq workflows.
Ensure the following tools are installed:
- Nextflow (v21.04.0 or higher)
- Java (v8 or higher)
- Singularity or Docker for container execution
- Git
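The prerequisite list above can be checked before the first run. The following is a minimal sketch, assuming the tools are expected on `PATH` under their usual command names (`nextflow`, `java`, `git`, and either `singularity` or `docker`). The helper name `check_tools` is hypothetical and is not part of this repository.

```shell
# Report which of the given commands are available on PATH.
# Prints one line per tool; the return value is the number of missing tools.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example usage (commented out so that sourcing this file has no side effects):
#   check_tools nextflow java git
#   check_tools singularity || check_tools docker   # one container engine is enough
```

This only confirms that the commands exist. Versions still need to be compared by hand against the minimums listed above, for example with `nextflow -version` and `java -version`.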
Clone the repository:
git clone https://github.com/WangLab-ComputationalBiology/SCRATCH-CNV.git
cd SCRATCH-CNV
main.nf is the main entry script that orchestrates CNV analysis using the inferCNV, SCEVAN, and CopyKAT subworkflows.
inferCNV performs CNV inference by comparing gene expression intensities between reference (normal) and observation (tumor) cells.
nextflow run main.nf -profile singularity \
  --input_seurat_object <path/to/seurat_object.RDS> \
  --input_reference_table <path/to/reference_table.csv>
- --input_seurat_object: Seurat object with UMAP and count layers
- --input_reference_table: CSV with barcode and reference label columns
- --project_name: Output project name (optional)
- --skip_infercnv: Skip running inferCNV (default: false)
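Since the reference table is described above only as a CSV with a barcode column and a reference label column, a quick structural check before launching may catch malformed rows early. The sketch below makes two assumptions: the two columns are the first two fields, and fields contain no quoted commas. The function name `check_reference_table` is hypothetical. The exact header names the pipeline expects are not stated here, so the check does not test them.

```shell
# Return 0 if every non-blank line has at least two non-empty
# comma-separated fields; otherwise print the offending lines and return 1.
check_reference_table() {
  awk -F',' '
    BEGIN { bad = 0 }
    NF == 0 { next }                      # skip blank lines
    NF < 2 || $1 == "" || $2 == "" {      # both leading fields must be filled
      printf "line %d: expected at least 2 non-empty fields\n", NR
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Example usage (commented out):
#   check_reference_table assets/OV_reference_table.csv && echo "table looks well-formed"
```

A passing check only means that the file is structurally plausible. It does not confirm that the barcodes match those in the Seurat object.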
Performs CNV detection using Bayesian inference across multiple tumor samples.
- project_name: Name for output and figures
- input_model: Organism (e.g., human)
- n_threads: Number of threads
- n_memory: Memory in GB
- workdir: Working directory for outputs
- auto_save: Save intermediate objects (true/false)
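Because a missing parameter can stop the pipeline, it may help to confirm that each of the names listed above appears somewhere in the configuration file before running. This is a rough sketch: it assumes the parameters are written by name in a config file whose path you supply, and the helper `check_params_present` is hypothetical. It checks only that each name is present. It does not check that the value is valid or that the name sits in the right block.

```shell
# Print each required name that does not occur (as a whole word) in the
# given file. Returns 0 when all names are present, 1 otherwise.
check_params_present() {
  config_file=$1
  shift
  any_missing=0
  for name in "$@"; do
    if ! grep -q -w -F -- "$name" "$config_file"; then
      echo "not found in $config_file: $name"
      any_missing=1
    fi
  done
  return "$any_missing"
}

# Example usage (commented out; the config path is a placeholder):
#   check_params_present <path/to/config> \
#     project_name input_model n_threads n_memory workdir auto_save
```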
Optional module for an alternative CNV inference strategy.
nextflow run main.nf -profile singularity \
  --input_seurat_object project_cluster_object.RDS \
  --input_reference_table assets/OV_reference_table.csv \
  --project_name OV_CNV \
  -resume
To ensure successful CNV analysis, your input Seurat object must include one of the following annotation columns in meta.data and contain the minimum required cell types:
| Annotation Column | Required Cell Types | Role | 
|---|---|---|
| azimuth_labels | B cell, T cell, Fibroblast, Epithelial | Reference + Observation | 
| sctype | B_Plasma_Cells, T_Cells, Fibroblast, Epithelial | Reference + Observation | 
| cell_label | B cell, T cell, Fibroblast, Epithelial | Reference + Observation | 
Note: The presence of these cell types is critical to define both reference (normal) and observation (tumor) populations.
Default parameters and paths can be set in nextflow.config. Use institutional profiles for HPC environments.
- ./<project_name>/data/infercnv: CNV matrices and plots from inferCNV
- ./<project_name>/data/scevan: CNV profiles and oncoheatmaps from SCEVAN
- ./<project_name>/report: Consolidated report
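After a run, the output layout above can be confirmed with a short check. The sketch below assumes the directories are created relative to the current directory, exactly as listed. The helper name `check_outputs` is hypothetical. If a tool was skipped (for example with --skip_infercnv), its directory will presumably be absent, so a reported absence is not necessarily an error.

```shell
# Report whether each expected output directory exists for a project.
# Returns 0 only if all three are present.
check_outputs() {
  project=$1
  any_absent=0
  for sub in data/infercnv data/scevan report; do
    if [ -d "./$project/$sub" ]; then
      echo "ok:      ./$project/$sub"
    else
      echo "absent:  ./$project/$sub"
      any_absent=1
    fi
  done
  return "$any_absent"
}

# Example usage (commented out):
#   check_outputs OV_CNV
```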
- For inferCNV HMM mode on large matrices, increase cutoff (e.g., 0.25) or use HMM_type = "i3" to reduce model complexity.
- Inspect logs via .nextflow.log and .command.out in work directories for failed tasks.
- Avoid missing parameters in the ext.args block for modules like SCEVAN to prevent pipeline crashes.
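For the log inspection tip above, failed tasks can be located without opening each work directory by hand. Nextflow records each task's exit status in a `.exitcode` file next to `.command.out` in the task's work directory. The sketch below assumes the default `work` directory unless another path is given. The function name `list_failed_tasks` is hypothetical.

```shell
# Print the work directories of tasks whose recorded exit code is non-zero.
# The first argument is the Nextflow work directory (defaults to ./work).
list_failed_tasks() {
  work_dir=${1:-work}
  find "$work_dir" -name .exitcode -type f | while IFS= read -r f; do
    code=$(cat "$f")
    if [ "$code" != "0" ]; then
      echo "$(dirname "$f") (exit $code)"
    fi
  done
}

# Example usage (commented out):
#   list_failed_tasks work
#   tail -n 20 <task_dir>/.command.out    # then inspect a reported directory
```

Tasks that were killed before Nextflow could write an exit status will not have a `.exitcode` file, so this list may be incomplete. The top-level .nextflow.log remains the place to look for those cases.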
Open issues or submit PRs for bugs, enhancements, or suggestions.
This project is licensed under the GNU General Public License v3.0.
For help and questions: