Releases: muellan/metacache

MetaCache v0.8.0

29 Oct 09:49

New feature "Coverage Filter"

Option -cov-percentile <p> removes the p-th percentile of hit targets (reference genomes) with the lowest coverage. A first pass does the normal mapping of queries (reads) to targets (reference genomes). The actual classification is then done in a second pass using only the remaining hit targets.

This will lead to a very small increase in runtime and memory consumption but can improve accuracy by detecting and removing stray false positive hits.

The coverage filter is deactivated by default.
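
The two-pass idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MetaCache's implementation: the function names, the per-target coverage values, and the reading of "p-th percentile" as "drop the lowest p percent of hit targets" are all hypothetical.

```python
from math import floor

def filter_low_coverage(coverage, percentile):
    """Return the set of targets left after dropping the lowest-coverage share.

    coverage:   dict mapping target id -> coverage value (hypothetical measure)
    percentile: share of hit targets to drop, in percent (0..100)
    """
    ranked = sorted(coverage, key=coverage.get)      # lowest coverage first
    n_drop = floor(len(ranked) * percentile / 100)   # number of targets to remove
    return set(ranked[n_drop:])

def classify(read_hits, allowed):
    """Second pass: per read, pick the allowed target with the most hits."""
    result = {}
    for read, hits in read_hits.items():
        kept = {t: h for t, h in hits.items() if t in allowed}
        result[read] = max(kept, key=kept.get) if kept else None
    return result

# Made-up first-pass output: hit counts per read and coverage per hit target.
read_hits = {
    "read1": {"genomeA": 12, "genomeB": 3},
    "read2": {"genomeC": 5},
    "read3": {"genomeA": 9, "genomeC": 10},
}
coverage = {"genomeA": 0.80, "genomeB": 0.35, "genomeC": 0.01, "genomeD": 0.50}

allowed = filter_low_coverage(coverage, 25)   # drops the single lowest target
assignments = classify(read_hits, allowed)
```

In this toy data, genomeC has almost no coverage, so its hits are treated as stray and ignored in the second pass; read3 is then assigned to genomeA instead.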

Other Changes

  • improved multi-threading in query mode
  • improved database format (layout better suited for future loading on GPUs)
  • code cleanup

MetaCache v0.6.2

16 Oct 07:38

  • improved accession number / sequence id parsing
  • file reading improvements
  • code cleanup

MetaCache v0.6.1

25 Sep 12:30

  • improved database building performance (~30-50% speedup)
  • improved taxonomic id assignment during build: global assembly_summary files can now also be used
    (default: "assembly_summary_refseq.txt", "assembly_summary_refseq_historical.txt", "assembly_summary_genbank.txt", "assembly_summary_genbank_historical.txt" in the taxonomy folder)
  • the download-ncbi-taxonomy script downloads "assembly_summary_refseq.txt" and "assembly_summary_refseq_historical.txt" by default now
  • code cleanup

MetaCache v0.5.3

09 May 08:44

  • improvements to abundances output
  • some code cleanup

MetaCache v0.5.2

03 May 07:52

  • fixed "abundance estimation not working if lowest classification level is above sequence level"
  • some code reorganization

MetaCache v0.5.1

29 Apr 11:56

  • It is now possible to have the "root" level as highest taxonomic classification level. This is needed for some abundance estimation postprocessing tasks. The default for the highest level remains "domain".
  • improved database I/O performance
  • database files on disk are up to 15% smaller now
  • small fixes

MetaCache v0.5.0

23 Oct 14:00

  • New merge mode for merging results of multiple, independent queries. This can be used to save memory by splitting up the set of reference genomes into several databases. These can then be queried in succession and the results can be merged to obtain a classification based on the whole set of reference genomes.
    ./metacache query 1.db reads.fa -tophits -queryids -lowest species -out res1.txt
    ./metacache query 2.db reads.fa -tophits -queryids -lowest species -out res2.txt
    ./metacache query 3.db reads.fa -tophits -queryids -lowest species -out res3.txt
    ./metacache merge res1.txt res2.txt res3.txt -taxonomy ncbi_taxonomy -out res1+2+3.txt
  • tweaked classification algorithm in case of multiple equally good matches in several targets
  • small fixes
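
The idea behind merging independent query results can be sketched as follows. This is a simplified illustration and not the actual merge algorithm: the in-memory result layout, the function name `merge_results`, and the "most hits wins" rule are assumptions, and the real merge mode additionally uses the taxonomy passed via -taxonomy, which this sketch ignores.

```python
def merge_results(*per_db_results):
    """Combine per-database candidate hits and keep the best one per query.

    Each argument maps query id -> list of (taxon, hit_count) candidates,
    a hypothetical stand-in for the content of one result file.
    """
    merged = {}
    for result in per_db_results:
        for query, candidates in result.items():
            merged.setdefault(query, []).extend(candidates)
    # Choose the candidate with the most hits; queries without hits map to None.
    return {
        query: max(cands, key=lambda tc: tc[1])[0] if cands else None
        for query, cands in merged.items()
    }

# Made-up top hits from three partial databases for the same reads.
res1 = {"read1": [("E. coli", 14)], "read2": []}
res2 = {"read1": [("S. enterica", 20)], "read2": [("B. subtilis", 7)]}
res3 = {"read1": [], "read2": [("B. cereus", 3)]}

merged = merge_results(res1, res2, res3)
```

Because each partial result keeps its top hits per query, the merged classification can consider candidates from every database even though no single database held all reference genomes.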

MetaCache v0.4.0

02 Oct 13:33

  • added per-taxon abundance summary (-abundances) and per-rank abundance estimation (-abundance-per <rank>)
  • simplified internal classification scheme. Attention: with the default classification threshold, this now tends to favor precision a little more than before. You can lower the threshold (-hitsmin) to gain sensitivity at the expense of precision.
  • performance improvements

MetaCache v0.3.4

18 Sep 11:06

  • improved query performance

MetaCache v0.3.3

12 Sep 12:43

  • small performance improvements
  • code simplification
  • fixes