nf-core/seqinspector is a bioinformatics pipeline that processes raw sequence data (FASTQ) to provide comprehensive quality control. It can perform subsampling, quality assessment, duplication level analysis, and complexity evaluation on a per-sample basis, while also detecting adapter content, technical artifacts, and common biological contaminants. The pipeline generates detailed MultiQC reports with flexible output options, ranging from individual sample reports to project-wide summaries, making it particularly useful for sequencing core facilities and research groups with access to sequencing instruments. If provided, nf-core/seqinspector can also parse statistics from an Illumina run folder directory into the final MultiQC reports.
| Tool Type | Tool Name | Tool Description | Compatibility with Data | Dependencies | Default tool |
|---|---|---|---|---|---|
| Subsampling | Seqtk | Global subsampling of reads. Only performs subsampling if --sample_size parameter is given. | [RNA, DNA, synthetic] | [N/A] | no |
| Indexing, Mapping | Bwamem2 | Align reads to reference | [RNA, DNA] | [N/A] | yes |
| Indexing | SAMtools | Index aligned BAM files, create FASTA index | [DNA] | [N/A] | yes |
| QC | FastQC | Read QC | [RNA, DNA] | [N/A] | yes |
| QC | FastqScreen | Basic contamination detection | [RNA, DNA] | [N/A] | yes |
| QC | SeqFu Stats | Sequence statistics | [RNA, DNA] | [N/A] | yes |
| QC | Picard collect multiple metrics | Collect multiple QC metrics | [RNA, DNA] | [Bwamem2, SAMtools, --genome] | yes |
| QC | Picard_collecthsmetrics | Collect alignment QC metrics of hybrid-selection data. | [RNA, DNA] | [Bwamem2, SAMtools, --fasta, --run_picard_collecths_metrics, --bait_intervals, --target_intervals (--ref_dict)] | no |
| Reporting | MultiQC | Present QC for raw reads | [RNA, DNA, synthetic] | [N/A] | yes |

| Tool | Version |
|---|---|
| bwamem2 | 2.3 |
| fastqc | 0.12.1 |
| fastqscreen | 0.16.0 |
| multiqc | 1.33 |
| picard | 3.4.0 |
| samtools | 1.22.1 |
| seqfu | 1.22.3 |
| seqtk | 1.4 |
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
sample,fastq_1,fastq_2,rundir,tags
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,200624_A00834_0183_BHMTFYDRXX,lane1:project5:group2

Each row represents a FASTQ file (single-end, with only fastq_1) or a pair of FASTQ files (paired-end, with fastq_1 and fastq_2).
rundir is the path to the runfolder.
tags is a colon-separated list of tags that will be added to the MultiQC report for this sample.
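For larger projects it can be convenient to generate the samplesheet from a script. The following is a minimal sketch using only the Python standard library; the helper names (`make_row`, `write_samplesheet`, `is_paired`) are hypothetical and not part of the pipeline, and leaving `fastq_2` empty for single-end samples is an assumption based on the column description above.

```python
import csv
import io

# Column order taken from the example samplesheet header.
COLUMNS = ["sample", "fastq_1", "fastq_2", "rundir", "tags"]


def make_row(sample, fastq_1, fastq_2="", rundir="", tags=()):
    """Build one samplesheet row; tags are joined into a colon-separated string."""
    return {
        "sample": sample,
        "fastq_1": fastq_1,
        "fastq_2": fastq_2,  # assumption: left empty for single-end samples
        "rundir": rundir,
        "tags": ":".join(tags),
    }


def write_samplesheet(rows, handle):
    """Write the header and all rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def is_paired(row):
    """A row is treated as paired-end when fastq_2 is non-empty."""
    return bool(row["fastq_2"])


rows = [
    make_row(
        "CONTROL_REP1",
        "AEG588A1_S1_L002_R1_001.fastq.gz",
        "AEG588A1_S1_L002_R2_001.fastq.gz",
        rundir="200624_A00834_0183_BHMTFYDRXX",
        tags=["lane1", "project5", "group2"],
    )
]

# Write to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_samplesheet(rows, buffer)
samplesheet_text = buffer.getvalue()
print(samplesheet_text, end="")
```

To write an actual file, open `samplesheet.csv` with `newline=""` and pass the handle to `write_samplesheet` instead of the buffer.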
Now, you can run the pipeline using:
nextflow run nf-core/seqinspector \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR>

Warning
Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
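As an illustration of the params-file route, the sketch below writes the pipeline parameters to a JSON file (Nextflow's -params-file option accepts JSON or YAML) and assembles the matching command line without running it. The helper names, the `results` output directory, and the `docker` profile are placeholder assumptions; substitute your own values.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_params_file(params, path):
    """Serialise pipeline parameters as JSON for Nextflow's -params-file option."""
    path = Path(path)
    path.write_text(json.dumps(params, indent=2) + "\n")
    return path


def build_command(profile, params_file):
    """Assemble the nextflow invocation as an argument list (not executed here)."""
    return [
        "nextflow", "run", "nf-core/seqinspector",
        "-profile", profile,
        "-params-file", str(params_file),
    ]


# Parameter names come from the run command above; the values are placeholders.
params = {"input": "samplesheet.csv", "outdir": "results"}

# A temporary directory keeps this sketch from cluttering the working directory.
params_path = write_params_file(params, Path(tempfile.mkdtemp()) / "params.json")
command = build_command("docker", params_path)

# Print a shell-quoted command line that could be copied into a terminal.
print(shlex.join(command))
```

Keeping parameters in a file and everything else (executors, resources) in a `-c` config mirrors the split described in the warning above.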
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
nf-core/seqinspector was originally written by @agrima2010, @Aratz, @FranBonath, @kedhammar, and @MatthiasZepper from the Swedish National Genomics Infrastructure and Clinical Genomics Stockholm.
Maintenance is now led by Maxime U Garcia (National Genomics Infrastructure).
We thank the following people for their extensive assistance in the development of this pipeline:
- @adamrtalbot
- @alneberg
- @beatrizsavinhas
- @ctuni
- @edmundmiller
- @EliottBo
- @KarNair
- @kjellinjonas
- @mahesh-panchal
- @matrulda
- @mirpedrol
- @nggvs
- @nkongenelly
- @Patricie34
- @pontushojer
- @ramprasadn
- @rannick
- @torigiffin
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #seqinspector channel (you can join with this invite).
You can cite the seqinspector Zenodo record for a specific version using the following DOI: 10.5281/zenodo.18757486
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.