Improve reference data practices #7

Our workflows often rely on reference datasets that we mount into the VM from a private NFS server. That is, every path in [this file](https://github.com/Duke-GCB/bespin-cwl/blob/master/examples/exome-seq/exomeseq-bespin-dev.json) with a `/data/` prefix. For example:

```
"path": "/data/exome-seq/GenomeAnalysisTK-3.8/GenomeAnalysisTK.jar"
"path": "/data/exome-seq/b37/Mills_and_1000G_gold_standard.indels.b37.vcf"
"path": "/data/exome-seq/capture/xgen-exome-research-panel-targetsae255a1532796e2eaa53ff00001c1b3c-trimmed-chr.bed"
"path": "/data/exome-seq/b37/dbsnp_138.b37.vcf"
"path": "/data/exome-seq/b37/Mills_and_1000G_gold_standard.indels.b37.vcf"
"path": "/data/exome-seq/b37/1000G_phase1.indels.b37.vcf"
"path": "/data/exome-seq/capture/xgen-exome-research-panel-probesbe255a1532796e2eaa53ff00001c1b3c-trimmed-chr.bed"
"path": "/data/exome-seq/b37/decoy/human_g1k_v37_decoy.fasta"
"path": "/data/exome-seq/b37/dbsnp_138.b37.vcf"
"path": "/data/exome-seq/b37/1000G_phase1.snps.high_confidence.b37.vcf"
"path": "/data/exome-seq/b37/hapmap/hapmap_3.3.b37.vcf"
"path": "/data/exome-seq/b37/omni/1000G_omni2.5.b37.vcf"
```
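Some files appear more than once because several workflow inputs point at the same reference. Any strategy probably starts with an authoritative, de-duplicated list of what we depend on. Below is a minimal sketch, assuming the job file follows the usual CWL convention of `File` objects carrying a `path` key (possibly nested, e.g. under `secondaryFiles`). It walks the parsed JSON and collects every distinct `/data/` path. The function names are hypothetical.

```python
import json
from typing import Any, List


def collect_data_paths(node: Any, prefix: str = "/data/") -> List[str]:
    """Recursively collect every "path" value starting with `prefix` in a parsed job file."""
    found: List[str] = []
    if isinstance(node, dict):
        path = node.get("path")
        if isinstance(path, str) and path.startswith(prefix):
            found.append(path)
        # Recurse into all values so nested File objects are also found.
        for value in node.values():
            found.extend(collect_data_paths(value, prefix))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_data_paths(item, prefix))
    return found


def unique_data_paths(job_file: str, prefix: str = "/data/") -> List[str]:
    """Load a CWL job JSON file and return its distinct reference paths, sorted."""
    with open(job_file) as handle:
        job = json.load(handle)
    return sorted(set(collect_data_paths(job, prefix)))
```

For example, `unique_data_paths("examples/exome-seq/exomeseq-bespin-dev.json")` would list each mounted reference file once, regardless of how many inputs use it.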

While the identity of some of these datasets may be obvious to those with domain expertise, their provenance is not made explicit: we do not record where each file came from or which release it is. We also do not provide checksums, file sizes, or a way for others to obtain these files.
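One possible starting point for the checksum and file-size gap is a manifest that records each reference file's size and SHA-256 digest and can later be used to re-verify the mounted files. The sketch below assumes such a manifest. Its layout, the `source_url`/`version` fields, and the function names are all hypothetical rather than an existing convention in this repository, and the provenance fields would still have to be filled in by hand.

```python
import hashlib
import os
from typing import Dict, Iterable, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large VCF/FASTA files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths: Iterable[str]) -> List[Dict[str, object]]:
    """Record size and checksum for each distinct file; provenance is left for a curator."""
    manifest: List[Dict[str, object]] = []
    for path in sorted(set(paths)):
        manifest.append({
            "path": path,
            "size_bytes": os.path.getsize(path),
            "sha256": sha256_of(path),
            # Hypothetical provenance fields: this information is not recorded anywhere today.
            "source_url": None,
            "version": None,
        })
    return manifest


def verify_manifest(manifest: List[Dict[str, object]]) -> List[str]:
    """Return the paths that are missing or whose size or checksum no longer matches."""
    mismatched: List[str] = []
    for entry in manifest:
        path = str(entry["path"])
        # Check the cheap conditions first; only hash when the size still matches.
        if (not os.path.exists(path)
                or os.path.getsize(path) != entry["size_bytes"]
                or sha256_of(path) != entry["sha256"]):
            mismatched.append(path)
    return mismatched
```

The manifest could be written with `json.dump(manifest, handle, indent=2)` and committed next to the job file. That way a change to a mounted reference shows up as a failed `verify_manifest` check rather than as a silently different result.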

Let's come up with a strategy to address these shortcomings.