a little cleanup #422
base: main
Changes from 2 commits
934f1ca
b622439
d324901
6ab28a2
3b8b9f5
9674dd2
ceaf133
f9dc53a
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,9 +16,10 @@ ENV PATH=/miniconda/bin/:/miniconda/envs/lr-metrics/bin/:/root/google-cloud-sdk/ | |
|
|
||
| # install conda packages | ||
| COPY ./environment.yml / | ||
| RUN conda install -n base conda-libmamba-solver && conda config --set solver libmamba | ||
| RUN conda env create -f /environment.yml && conda clean -a | ||
|
|
||
| # install gatk | ||
| # install super-special version of gatk | ||
|
||
| RUN git clone https://github.com/broadinstitute/gatk.git -b kvg_pbeap \ | ||
| && cd gatk \ | ||
| && git checkout c9497220ef13beb05da7c7a820c181be00b9b817 \ | ||
|
|
@@ -30,6 +31,9 @@ RUN git clone https://github.com/broadinstitute/gatk.git -b kvg_pbeap \ | |
| # install picard | ||
| RUN wget -O /usr/local/bin/picard.jar https://github.com/broadinstitute/picard/releases/download/2.22.1/picard.jar | ||
|
|
||
| # install gsutil | ||
| RUN curl -sSL https://sdk.cloud.google.com | bash | ||
|
||
|
|
||
| # install various metric and visualization scripts | ||
| COPY lima_report_detail.R / | ||
| COPY lima_report_summary.R / | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,6 +5,7 @@ channels: | |
| - bioconda | ||
| - r | ||
| dependencies: | ||
| - conda-forge::ncurses | ||
| - samtools | ||
| - bedtools | ||
| - java-jdk | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| { | ||
| "AlignedMetrics.aligned_bam": "gs://broad-dsp-lrma-cromwell/test_data/aligned-metrics/NA24385.bam", | ||
| "AlignedMetrics.aligned_bai": "gs://broad-dsp-lrma-cromwell/test_data/aligned-metrics/NA24385.bam.bai", | ||
| "AlignedMetrics.ref_dict": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict", | ||
| "AlignedMetrics.ref_fasta": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta", | ||
| "AlignedMetrics.gcs_output_dir": "gs://broad-dsp-lrma-ci/test-outputs" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,183 @@ | ||
| version 1.0 | ||
|
|
||
| task CoverageTrack { | ||
| input { | ||
| File bam | ||
| File bai | ||
| String chr | ||
| String start | ||
| String end | ||
|
|
||
| RuntimeAttr? runtime_attr_override | ||
| } | ||
|
|
||
| String basename = basename(bam, ".bam") | ||
| Int disk_size = 2*ceil(size(bam, "GB") + size(bai, "GB")) | ||
|
|
||
| command <<< | ||
| set -euxo pipefail | ||
|
|
||
| samtools depth -a ~{bam} -r ~{chr}:~{start}-~{end} | bgzip > ~{basename}.coverage.~{chr}_~{start}_~{end}.txt.gz | ||
| tabix -p bed ~{basename}.coverage.~{chr}_~{start}_~{end}.txt.gz | ||
| >>> | ||
|
|
||
| output { | ||
| File coverage = "~{basename}.coverage.~{chr}_~{start}_~{end}.txt.gz" | ||
| File coverage_tbi = "~{basename}.coverage.~{chr}_~{start}_~{end}.txt.gz.tbi" | ||
| } | ||
|
|
||
| ######################### | ||
| RuntimeAttr default_attr = object { | ||
| cpu_cores: 1, | ||
| mem_gb: 4, | ||
| disk_gb: disk_size, | ||
| boot_disk_gb: 10, | ||
| preemptible_tries: 2, | ||
| max_retries: 1, | ||
| docker: "us.gcr.io/broad-dsp-lrma/lr-metrics:0.1.11" | ||
| } | ||
| RuntimeAttr runtime_attr = select_first([runtime_attr_override, default_attr]) | ||
| runtime { | ||
| cpu: select_first([runtime_attr.cpu_cores, default_attr.cpu_cores]) | ||
| memory: select_first([runtime_attr.mem_gb, default_attr.mem_gb]) + " GiB" | ||
| disks: "local-disk " + select_first([runtime_attr.disk_gb, default_attr.disk_gb]) + " HDD" | ||
| bootDiskSizeGb: select_first([runtime_attr.boot_disk_gb, default_attr.boot_disk_gb]) | ||
| preemptible: select_first([runtime_attr.preemptible_tries, default_attr.preemptible_tries]) | ||
| maxRetries: select_first([runtime_attr.max_retries, default_attr.max_retries]) | ||
| docker: select_first([runtime_attr.docker, default_attr.docker]) | ||
| } | ||
| } | ||
|
|
||
| task FilterMQ0Reads { | ||
| input { | ||
| File bam | ||
|
|
||
| RuntimeAttr? runtime_attr_override | ||
| } | ||
|
|
||
| Int disk_size = 2*ceil(size(bam, "GB")) | ||
| String prefix = basename(bam, ".bam") | ||
|
|
||
| command <<< | ||
| set -euxo pipefail | ||
|
|
||
| samtools view -q 1 -b ~{bam} > ~{prefix}.no_mq0.bam | ||
| samtools index ~{prefix}.no_mq0.bam | ||
| >>> | ||
|
|
||
| output { | ||
| File no_mq0_bam = "~{prefix}.no_mq0.bam" | ||
| File no_mq0_bai = "~{prefix}.no_mq0.bam.bai" | ||
| } | ||
|
|
||
| ######################### | ||
| RuntimeAttr default_attr = object { | ||
| cpu_cores: 1, | ||
| mem_gb: 2, | ||
| disk_gb: disk_size, | ||
| boot_disk_gb: 10, | ||
| preemptible_tries: 2, | ||
| max_retries: 1, | ||
| docker: "us.gcr.io/broad-dsp-lrma/lr-metrics:0.1.11" | ||
| } | ||
| RuntimeAttr runtime_attr = select_first([runtime_attr_override, default_attr]) | ||
| runtime { | ||
| cpu: select_first([runtime_attr.cpu_cores, default_attr.cpu_cores]) | ||
| memory: select_first([runtime_attr.mem_gb, default_attr.mem_gb]) + " GiB" | ||
| disks: "local-disk " + select_first([runtime_attr.disk_gb, default_attr.disk_gb]) + " HDD" | ||
| bootDiskSizeGb: select_first([runtime_attr.boot_disk_gb, default_attr.boot_disk_gb]) | ||
| preemptible: select_first([runtime_attr.preemptible_tries, default_attr.preemptible_tries]) | ||
| maxRetries: select_first([runtime_attr.max_retries, default_attr.max_retries]) | ||
| docker: select_first([runtime_attr.docker, default_attr.docker]) | ||
| } | ||
| } | ||
|
|
||
| task ComputeBedCoverage { | ||
| input { | ||
| File bam | ||
| File bai | ||
| File bed | ||
| String prefix | ||
|
|
||
| RuntimeAttr? runtime_attr_override | ||
| } | ||
|
|
||
| Int disk_size = 2*ceil(size(bam, "GB") + size(bai, "GB") + size(bed, "GB")) | ||
|
|
||
| command <<< | ||
| set -euxo pipefail | ||
|
|
||
| bedtools coverage -b ~{bed} -a ~{bam} -nobuf | gzip > ~{prefix}.txt.gz | ||
| zcat ~{prefix}.txt.gz | awk '{ sum += sprintf("%f", $15*$16) } END { printf("%f\n", sum) }' > ~{prefix}.count.txt | ||
| >>> | ||
|
|
||
| output { | ||
| File coverage = "~{prefix}.txt.gz" | ||
| Float counts = read_float("~{prefix}.count.txt") | ||
| File counts_file = "~{prefix}.count.txt" | ||
| } | ||
|
|
||
| ######################### | ||
| RuntimeAttr default_attr = object { | ||
| cpu_cores: 1, | ||
| mem_gb: 2, | ||
| disk_gb: disk_size, | ||
| boot_disk_gb: 10, | ||
| preemptible_tries: 2, | ||
| max_retries: 1, | ||
| docker: "us.gcr.io/broad-dsp-lrma/lr-metrics:0.1.11" | ||
| } | ||
| RuntimeAttr runtime_attr = select_first([runtime_attr_override, default_attr]) | ||
| runtime { | ||
| cpu: select_first([runtime_attr.cpu_cores, default_attr.cpu_cores]) | ||
| memory: select_first([runtime_attr.mem_gb, default_attr.mem_gb]) + " GiB" | ||
| disks: "local-disk " + select_first([runtime_attr.disk_gb, default_attr.disk_gb]) + " HDD" | ||
| bootDiskSizeGb: select_first([runtime_attr.boot_disk_gb, default_attr.boot_disk_gb]) | ||
| preemptible: select_first([runtime_attr.preemptible_tries, default_attr.preemptible_tries]) | ||
| maxRetries: select_first([runtime_attr.max_retries, default_attr.max_retries]) | ||
| docker: select_first([runtime_attr.docker, default_attr.docker]) | ||
| } | ||
| } | ||
|
|
||
| task BamToBed { | ||
| input { | ||
| File bam | ||
| File bai | ||
|
|
||
| RuntimeAttr? runtime_attr_override | ||
| } | ||
|
|
||
| String bed = basename(bam, ".bam") + ".bed" | ||
| Int disk_size = 4*ceil(size(bam, "GB") + size(bai, "GB")) | ||
|
|
||
| command <<< | ||
| set -euxo pipefail | ||
|
|
||
| bedtools bamtobed -i ~{bam} > ~{bed} | ||
| >>> | ||
|
|
||
| output { | ||
| File bedfile = bed | ||
| } | ||
|
|
||
| ######################### | ||
| RuntimeAttr default_attr = object { | ||
| cpu_cores: 2, | ||
| mem_gb: 8, | ||
| disk_gb: disk_size, | ||
| boot_disk_gb: 10, | ||
| preemptible_tries: 2, | ||
| max_retries: 1, | ||
| docker: "us.gcr.io/broad-dsp-lrma/lr-metrics:0.1.11" | ||
| } | ||
| RuntimeAttr runtime_attr = select_first([runtime_attr_override, default_attr]) | ||
| runtime { | ||
| cpu: select_first([runtime_attr.cpu_cores, default_attr.cpu_cores]) | ||
| memory: select_first([runtime_attr.mem_gb, default_attr.mem_gb]) + " GiB" | ||
| disks: "local-disk " + select_first([runtime_attr.disk_gb, default_attr.disk_gb]) + " HDD" | ||
| bootDiskSizeGb: select_first([runtime_attr.boot_disk_gb, default_attr.boot_disk_gb]) | ||
| preemptible: select_first([runtime_attr.preemptible_tries, default_attr.preemptible_tries]) | ||
| maxRetries: select_first([runtime_attr.max_retries, default_attr.max_retries]) | ||
| docker: select_first([runtime_attr.docker, default_attr.docker]) | ||
| } | ||
| } |
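The `awk` step in `ComputeBedCoverage` above can be tried without `bedtools` or a BAM. The sketch below is illustrative only: it reuses the task's `awk` program unchanged, while the helper names `sum_col15_times_col16` and `demo_rows` and the two input rows are made up for this example and are not part of the PR. It assumes nothing about what columns 15 and 16 mean in the real `bedtools coverage` output; it only shows the arithmetic the task applies to them.

```shell
# Hypothetical helper (not in the PR): runs the same awk program as the
# ComputeBedCoverage command on whatever rows arrive on stdin.
# sprintf("%f", ...) rounds each per-row product to six decimal places
# before it is added to the running sum.
sum_col15_times_col16() {
  awk '{ sum += sprintf("%f", $15*$16) } END { printf("%f\n", sum) }'
}

# Two synthetic rows with 16 tab-separated columns each. Only columns 15
# and 16 matter here: 100 and 0.5 in the first row, 200 and 0.25 in the second.
demo_rows() {
  printf 'chr1\t0\t100\tr1\t60\t+\t0\t100\t0\t1\t100\t0\t1\t50\t100\t0.5\n'
  printf 'chr1\t200\t400\tr2\t60\t-\t200\t400\t0\t1\t200\t0\t1\t50\t200\t0.25\n'
}

# 100*0.5 + 200*0.25 = 100, printed by the END block as 100.000000
demo_rows | sum_col15_times_col16
```

In the task itself the same program reads `zcat ~{prefix}.txt.gz`, and the single number it prints is what `read_float` turns into the `counts` output.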
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,6 @@ | ||
| version 1.0 | ||
|
|
||
| import "../../../structs/Structs.wdl" | ||
| import "../../../tasks/Utility/Finalize.wdl" as FF | ||
|
|
||
| workflow ONTPfHrp2Hrp3Status { | ||
|
|
||
|
|
@@ -86,7 +85,7 @@ task IsLocusDeleted { | |
| boot_disk_gb: 10, | ||
| preemptible_tries: 2, | ||
| max_retries: 1, | ||
| docker: "us.gcr.io/broad-dsp-lrma/lr-mosdepth:0.3.1" | ||
| docker: "us.gcr.io/broad-dsp-lrma/lr-mosdepth:0.3.2" | ||
|
Collaborator: I know we have
Contributor (Author): Makefile version bumped.
||
| } | ||
| RuntimeAttr runtime_attr = select_first([runtime_attr_override, default_attr]) | ||
| runtime { | ||
|
|
||
I'm not familiar with how mamba helps with dependencies, in particular where it should be installed.
But if you look at the environment.yml, it's trying to create an env named `lr-metrics`, and here you're installing mamba into the `base` env. Is this usually what people do?

This has been reverted, so it's no longer relevant, but here's some info for posterity.
I tried to build this docker on a desktop with 16 GB of memory. No dice. If I was lucky, it just got OOM-killed after a few hours. So I ran out to Micro Center and bought 32 GB of memory. After leaving it overnight, it was still trying to solve the environment. I added libmamba and the docker built promptly (a few minutes; I don't remember exactly) using little memory. I don't know about the correctness of the environments, but I'd think you'd want the solver in the base environment.