coVar is a tool for detecting physically linked mutations in wastewater genomic sequencing data. Given a sorted, indexed BAM file, a reference genome, and a gene annotation, coVar identifies unique sets of physically linked mutations and counts the sequencing reads that carry them.
Install from Bioconda:

    conda install -c bioconda covar
    covar --version

Install from crates.io with Cargo:

    cargo install covar
    covar --version

Build from source:

    git clone https://github.com/andersen-lab/covar.git
    cd covar
    cargo install --path .
    covar --version

Usage:

    covar --input <INPUT_BAM> --reference <REFERENCE_FASTA> --annotation <ANNOTATION_GFF>

Required arguments:

| Flag | Description |
|---|---|
| `-i, --input <INPUT_BAM>` | Input BAM file (must be primer trimmed, sorted, and indexed). |
| `-r, --reference <REFERENCE_FASTA>` | Reference genome in FASTA format. |
| `-a, --annotation <ANNOTATION_GFF>` | Annotation GFF3 file for translating nucleotide to amino acid mutations. |
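
The input requirements above can be checked before launching a run. The following is a minimal sketch and not part of coVar: the helper name `check_inputs` is hypothetical, and it assumes the BAM index follows common samtools naming (`sample.bam.bai`, `sample.bai`, or `sample.bam.csi`). It only checks that files exist; it cannot verify that the BAM is sorted or primer trimmed.

```python
from pathlib import Path


def check_inputs(bam, reference, annotation):
    """Return a list of problems found with the three required inputs.

    Hypothetical helper for illustration; not part of coVar.
    """
    problems = []
    inputs = (("BAM", bam), ("reference", reference), ("annotation", annotation))
    for label, path in inputs:
        if not Path(path).is_file():
            problems.append(f"{label} file not found: {path}")

    bam_path = Path(bam)
    # Assumed index locations, following common samtools naming conventions.
    index_candidates = [
        Path(str(bam_path) + ".bai"),
        bam_path.with_suffix(".bai"),
        Path(str(bam_path) + ".csi"),
    ]
    if not any(p.is_file() for p in index_candidates):
        problems.append(f"no index found for {bam}")
    return problems
```

An empty list means all three files and a BAM index were found.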
Optional arguments:

| Flag | Default | Description |
|---|---|---|
| `-o, --output <OUTPUT>` | stdout | Output file path. If not provided, results are printed to stdout. |
| `-s, --start_site <START>` | 0 | Genomic start position for variant calling. |
| `-e, --end_site <END>` | reference length | Genomic end position for variant calling. Defaults to the length of the reference genome. |
| `-d, --min_depth <DEPTH>` | 1 | Minimum coverage depth for a mutation cluster to be considered. |
| `-f, --min_frequency <FREQ>` | 0.001 | Minimum mutation frequency (cluster depth / total depth). |
| `-q, --min_quality <QUAL>` | 20 | Minimum base quality score for variant calling. |
| `-t, --threads <THREADS>` | 1 | Number of threads to use for processing. |
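
To make the interaction between `--min_depth` and `--min_frequency` concrete, the sketch below applies both thresholds to a single cluster, using the frequency definition given for `--min_frequency` (cluster depth / total depth). It is an illustration only, not coVar's implementation: the function name `passes_filters` is hypothetical, and treating both thresholds as inclusive (`>=`) is an assumption.

```python
def passes_filters(cluster_depth, total_depth, min_depth=1, min_frequency=0.001):
    """Return True if a mutation cluster meets both thresholds.

    Hypothetical illustration; inclusive comparisons are an assumption.
    """
    if total_depth <= 0:
        return False  # no spanning reads, so frequency is undefined
    frequency = cluster_depth / total_depth
    return cluster_depth >= min_depth and frequency >= min_frequency
```

Under these assumptions, a cluster seen in 3 of 10,000 spanning reads has a frequency of 0.0003 and falls below the default `--min_frequency` of 0.001, while a cluster seen in 50 of 10,000 reads (0.005) passes.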

    covar \
        -i sample.bam \
        -r reference.fasta \
        -a annotation.gff3

    covar \
        -i sample.bam \
        -r reference.fasta \
        -a annotation.gff3 \
        -s 1000 \
        -e 5000 \
        -o output.tsv

    covar \
        -i sample.bam \
        -r reference.fasta \
        -a annotation.gff3 \
        -d 5 \
        -q 30 \
        -f 0.01 \
        -t 4

The output is a tab-delimited file (.tsv) with the following columns:

| Column | Description |
|---|---|
| `nt_mutations` | Nucleotide mutations for this cluster |
| `aa_mutations` | Corresponding amino acid translations (where possible*) |
| `cluster_depth` | Total number of read pairs with this cluster of mutations |
| `total_depth` | Total number of reads spanning this cluster |
| `frequency` | Mutation frequency (cluster depth / total depth) |
| `coverage_start` | Maximum read start site for which this cluster was detected |
| `coverage_end` | Minimum read end site for which this cluster was detected |

*Note: Not all nucleotide mutations have a corresponding amino acid mutation. For example, SNPs in codons that span reads and frameshift indels are translated as 'Unknown' and 'NA', respectively.
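
Because the output is plain TSV, it can be loaded with standard tooling. The sketch below reads an output file with Python's `csv` module and keeps clusters at or above a chosen frequency. It rests on two assumptions: that the file begins with a header row containing the column names listed above, and that the depth columns hold integers. The function name `load_clusters` is illustrative.

```python
import csv


def load_clusters(path, min_frequency=0.0):
    """Read a coVar-style TSV and return rows with frequency >= min_frequency.

    Assumes a header row with the documented column names.
    """
    kept = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Convert the numeric columns; the others stay as strings.
            row["cluster_depth"] = int(row["cluster_depth"])
            row["total_depth"] = int(row["total_depth"])
            row["frequency"] = float(row["frequency"])
            if row["frequency"] >= min_frequency:
                kept.append(row)
    return kept
```

For instance, `load_clusters("output.tsv", min_frequency=0.01)` would return only the clusters at 1% frequency or higher, each as a dictionary keyed by column name.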