Home
mzLib is a comprehensive, open-source .NET library for mass spectrometry data analysis and computational proteomics/transcriptomics. Built on .NET 8, mzLib provides a unified framework for working with mass spectrometry data, biological sequences, chemical formulas, and computational workflows for omics research.
- Mass Spectrometry Data Processing: Read, write, and analyze MS data in multiple formats (mzML, Thermo RAW, etc.)
- Omics Analysis: Protein and RNA sequence analysis with digestion, modification, and fragmentation support
- Chemical Calculations: Accurate mass and isotope distribution calculations
- Spectral Analysis: Deconvolution, averaging, and matching algorithms
- Retention Time Prediction: Machine learning-based chromatographic retention time prediction
- Database Management: Load and process protein/RNA databases with decoy generation
- Quantification: Label-free quantification via FlashLFQ
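To make the "accurate mass" feature concrete, here is a minimal, library-independent sketch of a monoisotopic mass calculation: sum the mass of the lightest stable isotope of each element, weighted by its count. The `MonoisotopicSketch` class is a hypothetical helper written for this page, not part of mzLib, and the isotope masses are standard reference values rounded to eight decimals.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Glucose, C6H12O6, expressed as element counts.
var glucose = new Dictionary<string, int> { ["C"] = 6, ["H"] = 12, ["O"] = 6 };
double mass = MonoisotopicSketch.Mass(glucose);
Console.WriteLine(mass.ToString("F4", CultureInfo.InvariantCulture)); // prints 180.0634

// Hypothetical helper for illustration only; not an mzLib type.
static class MonoisotopicSketch
{
    // Masses (Da) of the lightest stable isotope of each element.
    private static readonly Dictionary<string, double> IsotopeMass = new()
    {
        ["C"] = 12.0,        // 12C, exact by definition
        ["H"] = 1.00782503,  // 1H
        ["N"] = 14.00307400, // 14N
        ["O"] = 15.99491462, // 16O
    };

    // Sum of (isotope mass x atom count) over all elements in the composition.
    public static double Mass(IReadOnlyDictionary<string, int> composition) =>
        composition.Sum(kv => IsotopeMass[kv.Key] * kv.Value);
}
```

In real work, prefer the library's own formula type, which covers the full periodic table and isotope distributions.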
# Install via NuGet
dotnet add package mzLib

using System.Collections.Generic;
using MassSpectrometry;
using Proteomics;
using Chemistry;
// Load MS data
var msDataFile = MsDataFileReader.GetDataFile("sample.mzML");
var scans = msDataFile.GetAllScansList();
// Work with proteins (the modification lists are left empty here; populate them as needed)
var fixedMods = new List<Modification>();
var variableMods = new List<Modification>();
var protein = new Protein("PEPTIDESEQUENCE", "P12345");
var peptides = protein.Digest(new DigestionParams("trypsin"), fixedMods, variableMods);
// Calculate masses
var formula = ChemicalFormula.ParseFormula("C6H12O6");
double mass = formula.MonoisotopicMass;

- Chemistry - Chemical formula operations, mass calculations, periodic table, isotope distributions
- Mass Spectrometry - MS data structures, scan operations, spectral processing
- Spectral Deconvolution - Charge state deconvolution and deisotoping
- Spectral Averaging - Averaging and combining multiple MS scans
- Retention Time Prediction - SSRCalc-based chromatographic retention time prediction
- MS Data File Reading - Read mzML, Thermo RAW, and other MS formats
- Result File Reading - Parse search engine results (pepXML, mzIdentML, etc.)
- Sequence Database File Reading - Load protein/RNA databases (FASTA, XML, etc.)
- Omics: Base Foundation - Core interfaces (IBioPolymer, IBioPolymerWithSetMods) and unified architecture
- Proteomics - Protein analysis, peptide generation, PTMs, SILAC labeling
- Transcriptomics - RNA analysis, oligonucleotide generation, epitranscriptomic modifications
- Omics: Modifications - Post-translational modifications (PTMs) and RNA modifications with motif matching
- Omics: Digestion - Enzymatic digestion framework for proteases and RNases
- Omics: Fragmentation - MS/MS fragmentation for peptides and oligonucleotides
- Omics: Decoy Generation - Generate decoy sequences for FDR calculation
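As a rough illustration of what the Omics: Digestion page covers, the sketch below performs an in silico tryptic digest in plain C#. It assumes the common simplified trypsin rule (cleave C-terminal to K or R unless the next residue is P) and a hypothetical `DigestSketch` helper; it does not use mzLib's digestion classes and ignores modifications.

```csharp
using System;
using System.Collections.Generic;

foreach (var peptide in DigestSketch.Tryptic("MKRPEPTIDEKAAR", maxMissedCleavages: 1, minLength: 2))
    Console.WriteLine(peptide);

// Hypothetical helper for illustration only; not an mzLib type.
static class DigestSketch
{
    // Simplified trypsin rule: cleave after K or R, except when followed by P.
    public static List<string> Tryptic(string sequence, int maxMissedCleavages, int minLength)
    {
        // Cleavage boundaries as indices into the sequence, including both termini.
        var cuts = new List<int> { 0 };
        for (int i = 0; i < sequence.Length - 1; i++)
        {
            bool basic = sequence[i] == 'K' || sequence[i] == 'R';
            if (basic && sequence[i + 1] != 'P') cuts.Add(i + 1);
        }
        cuts.Add(sequence.Length);

        // Join up to (maxMissedCleavages + 1) adjacent fragments into one peptide.
        var peptides = new List<string>();
        for (int start = 0; start < cuts.Count - 1; start++)
        {
            int lastEnd = Math.Min(start + 1 + maxMissedCleavages, cuts.Count - 1);
            for (int end = start + 1; end <= lastEnd; end++)
            {
                string peptide = sequence.Substring(cuts[start], cuts[end] - cuts[start]);
                if (peptide.Length >= minLength) peptides.Add(peptide);
            }
        }
        return peptides;
    }
}
```

For the input above this prints MK, MKRPEPTIDEK, RPEPTIDEK, RPEPTIDEKAAR, and AAR, one per line. The K-R bond is cleaved, while the R-P bond is not.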
using System;
using System.Collections.Generic;
using System.Linq;
using MassSpectrometry;
using Proteomics;
using Proteomics.ProteolyticDigestion;
using UsefulProteomicsDatabases;

// 1. Load MS data
var msDataFile = MsDataFileReader.GetDataFile("sample.mzML");
var ms2Scans = msDataFile.GetAllScansList().Where(s => s.MsnOrder == 2).ToList();

// 2. Load protein database
var proteins = ProteinDbLoader.LoadProteinFasta(
    "database.fasta",
    generateTargets: true,
    decoyType: DecoyType.Reverse,
    isContaminant: false,
    out var errors
);

// 3. Digest proteins (the modification lists are left empty here; populate them as needed)
var fixedMods = new List<Modification>();
var variableMods = new List<Modification>();
var digestionParams = new DigestionParams(
    protease: "trypsin",
    maxMissedCleavages: 2,
    minPeptideLength: 7,
    maxPeptideLength: 30
);
var theoreticalPeptides = proteins
    .SelectMany(p => p.Digest(digestionParams, fixedMods, variableMods))
    .ToList();

// 4. Match spectra to peptides
const double tolerance = 0.01; // absolute mass tolerance in Da; illustrative value only
foreach (var scan in ms2Scans)
{
    var candidates = theoreticalPeptides
        .Where(p => Math.Abs(p.MonoisotopicMass - scan.PrecursorMass) < tolerance)
        .ToList();
    foreach (var peptide in candidates)
    {
        var products = new List<Product>();
        peptide.Fragment(DissociationType.HCD, FragmentationTerminus.Both, products);
        // Score matches...
    }
}

using Transcriptomics;
using Transcriptomics.Digestion;
using Chemistry;

// 1. Load RNA sequence
var rna = new RNA(
    sequence: "AUGCCGUACGAU",
    accession: "RNA001",
    name: "tRNA-Ala"
);

// 2. Define RNA modifications (motifs are obtained through TryGetMotif)
ModificationMotif.TryGetMotif("A", out var adenineMotif);
var m6A = new Modification(
    _originalId: "m6A",
    _target: adenineMotif,
    _chemicalFormula: ChemicalFormula.ParseFormula("CH2")
);

// 3. Digest with RNase
var digestionParams = new RnaDigestionParams(
    rnase: "RNase T1",
    maxMissedCleavages: 1,
    minOligoLength: 3,
    maxOligoLength: 20
);
var oligos = rna.Digest(
    digestionParams,
    new List<Modification>(),
    new List<Modification> { m6A }
).ToList();

// 4. Analyze modification sites
var modifiedOligos = oligos.Where(o => o.NumMods > 0).ToList();
Console.WriteLine($"Found {modifiedOligos.Count} modified oligonucleotides");

using MassSpectrometry;
using MassSpectrometry.Deconvolution;
using SpectralAveraging;

// 1. Load MS data
var msDataFile = MsDataFileReader.GetDataFile("sample.mzML");

// 2. Average similar scans
var scansToAverage = msDataFile.GetAllScansList()
    .Where(s => s.MsnOrder == 1 && s.RetentionTime > 10 && s.RetentionTime < 15)
    .ToList();
var averagedScan = SpectraFileAveraging.AverageSpectra(scansToAverage);

// 3. Deconvolute spectrum
var deconvolutionParams = new ClassicDeconvolutionParameters(
    minAssumedChargeState: 1,
    maxAssumedChargeState: 10,
    deconvolutionTolerancePpm: 20,
    intensityRatioLimit: 3
);
var deconvolutedPeaks = Deconvoluter.Deconvolute(
    averagedScan.MassSpectrum,
    deconvolutionParams
).ToList();
Console.WriteLine($"Found {deconvolutedPeaks.Count} deconvoluted peaks");

mzLib follows a modular, layered architecture:
┌─────────────────────────────────────────┐
│ Applications (MetaMorpheus, etc.) │
├─────────────────────────────────────────┤
│ Domain Libraries │
│ • Proteomics • Transcriptomics │
│ • FlashLFQ • SpectralAveraging │
├─────────────────────────────────────────┤
│ Core Omics Framework │
│ • IBioPolymer • Modifications │
│ • Digestion • Fragmentation │
├─────────────────────────────────────────┤
│ Mass Spectrometry & Chemistry │
│ • MassSpectrometry • Chemistry │
│ • Deconvolution • Chromatography │
├─────────────────────────────────────────┤
│ File I/O & Utilities │
│ • Readers • UsefulProteomicsDatabases │
│ • MzLibUtil • MzIdentML • PepXML │
└─────────────────────────────────────────┘
- Interface-Based Design: Core functionality defined through interfaces (IBioPolymer, IBioPolymerWithSetMods)
- Unified Framework: Common patterns for protein and RNA analysis
- Type Safety: Strong typing with generics and compile-time checks
- Performance: Memory pooling, caching, and parallel processing support
- Extensibility: Easy to add custom enzymes, modifications, and fragmentation rules
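The interface-based, unified design described in the list above can be sketched as follows. The types here (`IPolymerSketch`, `ProteinSketch`, `RnaSketch`) are hypothetical stand-ins invented for this example; they are not mzLib's IBioPolymer API, and the real interfaces carry far more members. The point is only that one code path can serve both proteins and RNA once both implement a shared interface.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration; these are not mzLib's interfaces.
var polymers = new List<IPolymerSketch>
{
    new ProteinSketch("P12345", "PEPTIDEK"),
    new RnaSketch("RNA001", "AUGCCG"),
};

// One loop handles both polymer types through the shared interface.
foreach (var polymer in polymers)
    Console.WriteLine($"{polymer.Accession}: {polymer.Length} {polymer.MonomerName}s");

interface IPolymerSketch
{
    string Accession { get; }
    string BaseSequence { get; }
    string MonomerName { get; }
    int Length => BaseSequence.Length; // default interface member shared by all implementers
}

sealed record ProteinSketch(string Accession, string BaseSequence) : IPolymerSketch
{
    public string MonomerName => "residue";
}

sealed record RnaSketch(string Accession, string BaseSequence) : IPolymerSketch
{
    public string MonomerName => "nucleotide";
}
```

This prints `P12345: 8 residues` followed by `RNA001: 6 nucleotides`. Adding a new polymer type only requires a new implementation of the interface, which is the extensibility benefit the list refers to.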
Chemistry
Purpose: Foundation for all mass and formula calculations
Key Classes: ChemicalFormula, PeriodicTable, IsotopicDistribution
MassSpectrometry
Purpose: Core MS data structures and operations
Key Classes: MsDataScan, MsDataFile, MzSpectrum, ChromatographicPeak
Omics
Purpose: Base framework for biological polymer analysis
Key Interfaces: IBioPolymer, IBioPolymerWithSetMods, IDigestionParams
Proteomics
Purpose: Protein-specific implementations
Key Classes: Protein, PeptideWithSetModifications, Protease
Transcriptomics
Purpose: RNA-specific implementations
Key Classes: RNA, OligoWithSetMods, Rnase
Readers
Purpose: File I/O for various formats
Supported Formats: mzML, Thermo RAW, pepXML, mzIdentML, FASTA, UniProt XML
UsefulProteomicsDatabases
Purpose: Database loading and management
Key Classes: ProteinDbLoader, DecoyProteinGenerator, PtmListLoader
FlashLFQ
Purpose: Label-free quantification
Key Classes: FlashLfqEngine, ChromatographicPeak, ProteinGroup
SpectralAveraging
Purpose: Spectral averaging and combination
Key Classes: SpectraFileAveraging, ScoreBasedAveraging
Retention Time Prediction
Purpose: Retention time prediction
Key Classes: SSRCalc3RetentionTimePredictor
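Decoy generation, listed among the database classes above, is easy to picture with a small example. The sketch below builds a reversed decoy in plain C#, assuming two common conventions: a leading methionine is kept in place, and the accession gets a `DECOY_` prefix. `DecoySketch` is a hypothetical helper, and the library's own generator may apply different rules.

```csharp
using System;

var (accession, sequence) = DecoySketch.Reverse("P12345", "MPEPTIDEK");
Console.WriteLine($"{accession} {sequence}"); // prints DECOY_P12345 MKEDITPEP

// Hypothetical helper for illustration only; not an mzLib type.
static class DecoySketch
{
    // Reverses the residues after a leading methionine, so the decoy keeps its start residue.
    public static (string Accession, string Sequence) Reverse(string accession, string sequence)
    {
        bool keepM = sequence.StartsWith('M');
        char[] body = sequence.Substring(keepM ? 1 : 0).ToCharArray();
        Array.Reverse(body);
        return ("DECOY_" + accession, (keepM ? "M" : "") + new string(body));
    }
}
```

Reversal preserves amino acid composition and length, so decoy peptides have a mass distribution similar to the targets, which is what makes them useful for estimating the false discovery rate.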
# Install core library
dotnet add package mzLib
# Or install specific components
dotnet add package mzLib.Chemistry
dotnet add package mzLib.Proteomics
dotnet add package mzLib.Transcriptomics
dotnet add package FlashLFQ

Install-Package mzLib

<ItemGroup>
  <PackageReference Include="mzLib" Version="*" />
</ItemGroup>

- .NET 8.0 or higher
- Supported Platforms: Windows, Linux, macOS
- Optional: Thermo MSFileReader (for native RAW file support on Windows)
- Browse the wiki pages listed above for detailed documentation
- Each page includes:
  - Overview and key features
  - System design with class diagrams
  - Comprehensive code examples
  - Common use cases
  - Best practices
- GitHub Issues: Report bugs or request features
- Discussions: Ask questions and share ideas
- Pull Requests: Contributions are welcome!
Each wiki page includes working code examples. For additional examples, see:
- Test projects in the repository
- MetaMorpheus source code (uses mzLib extensively)
- MetaMorpheus: Proteomics search engine and PTM discovery tool
- FlashLFQ: Label-free quantification
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Commit your changes with clear messages
- Write tests for new functionality
- Submit a pull request
See CONTRIBUTING.md for detailed guidelines.
mzLib is licensed under the MIT License. See LICENSE for details.
Beginners
- Chemistry - Start here for fundamentals
- Mass Spectrometry - MS data basics
- MS Data File Reading - Load your first file
Proteomics Researchers
- Omics: Base Foundation - Understand the framework
- Proteomics - Protein analysis
- Omics: Modifications - PTMs
- Omics: Digestion - Enzymatic digestion
- Omics: Fragmentation - MS/MS spectra
RNA/Transcriptomics Researchers
- Omics: Base Foundation - Understand the framework
- Transcriptomics - RNA analysis
- Omics: Modifications - RNA modifications
- Omics: Digestion - RNase digestion
- Omics: Fragmentation - Oligonucleotide fragmentation
Advanced Topics
- Spectral Deconvolution - Charge state determination
- Spectral Averaging - Signal enhancement
- Retention Time Prediction - Chromatography
- Omics: Decoy Generation - FDR control
Loading Data
Processing Sequences
Processing Spectra
Calculating Properties
Welcome to mzLib! Start with the Chemistry or Omics: Base Foundation pages, or jump directly to the topic that interests you most.