Commit a487dff

committed
update docs
1 parent 90d219d commit a487dff

File tree

7 files changed

+122
-110
lines changed

docs/conda.rst

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -128,16 +128,19 @@ dependency tree and come up with a solution that works to satisfy the entire
128128
set of specified requirements.
129129

130130
We chose to split the conda environments in two: the **main** environment and the **R**
131-
environment (see :ref:`conda-design-decisons`). These environments are
131+
environment (see :ref:`conda-design-decisions`). These environments are
132132
described by both "strict" and "loose" files. By default we use the "strict"
133133
version, which pins all versions of all packages exactly. This is preferred
134134
wherever possible. However we also provide a "loose" version that is not
135135
specific about versions. The following table describes these files:
136136

137+
+----------------+--------------------------------+----------------------------------+
137138
| strict version | loose version                  | used for                         |
138139
+================+================================+==================================+
139140
| ``env.yml``    | ``include/requirements.txt``   | Main Snakefiles                  |
141+
+----------------+--------------------------------+----------------------------------+
140142
| ``env-r.yaml`` | ``include/requirements-r.txt`` | Downstream RNA-seq analysis in R |
143+
+----------------+--------------------------------+----------------------------------+
141144

142145
When deploying new instances, use the ``--build-envs`` argument which will use
143146
the strict version. Or use the following commands in a deployed directory:

docs/config.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,8 +2,8 @@
22
.. _config:
33

44

5-
Configuration details
6-
=====================
5+
Configuration
6+
=============
77

88
General configuration
99
~~~~~~~~~~~~~~~~~~~~~

docs/getting-started.rst

Lines changed: 44 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -3,22 +3,14 @@
33
Getting started
44
===============
55

6-
The main prerequisite for `lcdb-wf` is `conda
7-
<https://docs.conda.io/en/latest/>_`, with the `bioconda
8-
<https://bioconda.github.io>`_. channel set up and the `mamba
9-
<https://github.com/mamba-org/mamba>`_ drop-in replacement for conda.
6+
The main prerequisite for `lcdb-wf` is `conda <https://docs.conda.io/en/latest/>`_, with the `bioconda <https://bioconda.github.io>`_ channel set up and the `mamba <https://github.com/mamba-org/mamba>`_ drop-in replacement for conda installed.
107

118
If this is new to you, please see :ref:`conda-envs`.
129

1310
.. note::
1411

15-
`lcdb-wf` is tested and heavily used on Linux.
16-
17-
It is likely to work on macOS as long as all relevant conda packages are
18-
available for macOS -- though this is not tested.
19-
20-
It will **not** work on Windows due to a general lack of support of Windows
21-
in bioinformatics tools.
12+
    `lcdb-wf` is tested and heavily used on Linux. It is only supported on
13+
    Linux.
2214

2315
.. _setup-proj:
2416

@@ -27,21 +19,24 @@ Setting up a project
2719

2820
The general steps to use lcdb-wf in a new project are:
2921

30-
1. **Deploy:** download and run ``deploy.py``
22+
1. **Deploy:** download and run ``deploy.py`` to copy files into a project directory
3123
2. **Configure:** set up samples table for experiments and edit configuration file
3224
3. **Run:** activate environment and run the Snakemake file either locally or on a cluster
3325

3426
.. _deploy:
3527

3628
1. Deploying lcdb-wf
3729
--------------------
30+
Using `lcdb-wf` starts with copying files to a project directory, or
31+
"deploying".
3832

3933
Unlike other tools you may have used, `lcdb-wf` is not actually installed per
4034
se. Rather, it is "deployed" by copying over relevant files from the `lcdb-wf`
4135
repository to your project directory. This includes Snakefiles, config files,
4236
and other infrastructure required to run, and excludes files like these docs
43-
and testing files that are not necessary for an actual project. The reason to
44-
use this script is so you end up with a cleaner project directory.
37+
and testing files that are not necessary for an actual project. The reason
38+
to use this script is so you end up with a cleaner project directory, compared
39+
to cloning the repo directly.
4540

4641
This script also writes a file to the destination called
4742
``.lcdb-wf-deployment.json``. It stores the timestamp and details about what
@@ -53,8 +48,8 @@ There are a few ways of doing this.
5348
Option 1: Download and run the deployment script
5449
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5550

56-
Note that you will not be able to run tests with this method, but it is likely
57-
the most convenient method.
51+
This is the most convenient method, although it does not allow running tests
52+
locally.
5853

5954
.. code-block:: bash
6055
@@ -63,7 +58,8 @@ the most convenient method.
6358
6459
Run ``python deploy.py -h`` to see help. Be sure to use the ``--staging`` and
6560
``--branch=$BRANCH`` arguments when using this method, which will clone the
66-
repository to a location of your choosing. Once you deploy you can remove it. For example:
61+
repository to a location of your choosing. Once you deploy you can remove the
62+
script. For example:
6763

6864
.. code-block:: bash
6965
@@ -78,6 +74,12 @@ repository to a location of your choosing. Once you deploy you can remove it. Fo
7874
    # You can clean up the cloned copy if you want:
7975
    # rm -rf /tmp/lcdb-wf-tmp
8076
77+
This will clone the full git repo to ``/tmp/lcdb-wf-tmp``, check out the master
78+
branch (or whatever branch ``$BRANCH`` is set to), copy the files required for
79+
an RNA-seq project over to ``analysis/project``, build the main conda
80+
environment and the R environment, save the ``.lcdb-wf-deployment.json`` file
81+
there, and then delete the temporary repo.
82+
8183
Option 2: Clone repo manually
8284
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8385
Clone a repo using git and check out the branch. Use this method for running
@@ -140,9 +142,9 @@ and run the following:
140142
141143
If all goes well, this should print a list of jobs to be run.
142144

143-
You can run locally, but this is NOT recommended. To run locally, choose the
144-
number of CPUs you want to use with the ``-j`` argument as is standard for
145-
Snakemake.
145+
You can run locally, but this is NOT recommended for a typical RNA-seq
146+
project. To run locally, choose the number of CPUs you want to use with the
147+
``-j`` argument as is standard for Snakemake.
146148

147149
.. warning::
148150

@@ -157,18 +159,35 @@ Snakemake.
157159
    # run locally (not recommended)
158160
    snakemake --use-conda -j 8
159161
160-
The recommended way is to run on a cluster. On NIH's Biowulf cluster, the way
161-
to do this is to submit the wrapper script as a batch job:
162+
The recommended way is to run on a cluster.
163+
164+
To run on a cluster, you will need a `Snakemake profile
165+
<https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles>`_ for
166+
your cluster that translates generic resource requirements into arguments for
167+
your cluster's batch system.
168+
169+
On NIH's Biowulf cluster, the profile can be found at
170+
https://github.com/NIH-HPC/snakemake_profile. If you are not already using this for other Snakemake workflows, you can set it up the first time like this:
171+
172+
1. Clone the profile to a location of your choosing, maybe
173+
``~/snakemake_profile``
174+
2. Set the environment variable ``LCDBWF_SNAKEMAKE_PROFILE``, perhaps in your
175+
``~/.bashrc`` file.
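
The two setup steps above can be sketched in bash roughly as follows. This is a minimal sketch under stated assumptions, not the documented procedure: the ``~/snakemake_profile`` location is only the suggestion from step 1, and it assumes ``LCDBWF_SNAKEMAKE_PROFILE`` should hold the path to the cloned directory, which this page does not spell out.

```bash
#!/usr/bin/env bash
# Assumed install location (the suggestion from step 1); any directory works.
PROFILE_DIR="$HOME/snakemake_profile"

# Clone the Biowulf profile once; skip if the directory already exists.
if [ ! -d "$PROFILE_DIR" ]; then
    git clone https://github.com/NIH-HPC/snakemake_profile "$PROFILE_DIR" \
        || echo "clone failed; check git and network access" >&2
fi

# Assumption: the variable holds the path to the cloned profile directory.
export LCDBWF_SNAKEMAKE_PROFILE="$PROFILE_DIR"
```

To make the setting persist across sessions, the ``export`` line can be added to ``~/.bashrc``.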
176+
177+
Then back in your deployed and configured project, submit the wrapper script as
178+
a batch job:
162179

163180
.. code-block:: bash
164181
165182
    sbatch ../../include/WRAPPER_SLURM
166183
167-
and then monitor the various jobs that will be submitted on your behalf. See
184+
This will submit Snakemake as a batch job, use the profile to translate
185+
resources to cluster arguments and set default command-line arguments, and
186+
submit the various jobs created by Snakemake to the cluster on your behalf. See
168187
:ref:`cluster` for more details on this.
169188

170-
Other clusters will need different configuration, but everything is standard
171-
Snakemake. The Snakemake documentation on `cluster execution
189+
Other clusters will need different configuration, but everything in `lcdb-wf`
190+
is standard Snakemake. The Snakemake documentation on `cluster execution
172191
<https://snakemake.readthedocs.io/en/stable/executing/cluster.html>`_ and
173192
`cloud execution
174193
<https://snakemake.readthedocs.io/en/stable/executing/cloud.html>`_ can be

docs/index.rst

Lines changed: 28 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -17,13 +17,18 @@ Non-model organism? Custom gene annotations? Complicated regression models?
1717
Unconventional command-line arguments to tools? New tools to add to the
1818
workflow? No problem.
1919

20-
Tested automatically
21-
--------------------
22-
Every change to the code on GitHub triggers an automated test, the results of
23-
which you can find at https://circleci.com/gh/lcdb/lcdb-wf. Each test sets the
24-
system up from scratch, including installing all software, downloading example
25-
data, and running everything up through the final results. This guarantees that
26-
you can set up and test the code yourself.
20+
Extensive downstream RNA-seq
21+
----------------------------
22+
A comprehensive RMarkdown template, along with a custom R package, enables
23+
sophisticated RNA-seq analysis that supports complex experimental designs and
24+
many contrasts.
25+
26+
Extensive exploration of ChIP-seq peaks
27+
----------------------------------------
28+
The ChIP-seq configuration supports multiple peak-callers as well as calling
29+
peaks with many different parameter sets for each caller. Combined with
30+
visualization in track hubs (see below), this can identify the optimal
31+
parameters for a given experiment.
2732

2833
Track hubs
2934
----------
@@ -44,11 +49,10 @@ a site to get lots of genomes you can use for running `fastq_screen`, and
4449
easily include arbitrary other genomes. They can then be automatically included
4550
in RNA-seq and ChIP-seq workflows.
4651

47-
This system is designed to allow customization as the config file
48-
can be used to include arbitrary genomes whether local or on the web.
49-
The `references` workflow need only be run once for all these genomes
50-
to be created, with the `references_dir` being used as a centralized
51-
repository that can be then used with all other workflows.
52+
Arbitrary genomes can be used, whether local (e.g., customized with additional
53+
genetic constructs) or on the web. The `references` workflow need only be run
54+
once for all these genomes to be created, with the `references_dir` being used
55+
as a centralized repository that can be then used with all other workflows.
5256

5357
Integration with external data and figure-making
5458
------------------------------------------------
@@ -59,6 +63,15 @@ If an upstream file changes (e.g., gene annotation), all dependent downstream
5963
jobs -- including figures -- will be updated so you can ensure that even
6064
complex analyses stay correct and up-to-date.
6165

66+
Tested automatically
67+
--------------------
68+
Every change to the code on GitHub triggers an automated test, the results of
69+
which you can find at https://circleci.com/gh/lcdb/lcdb-wf. Each test sets the
70+
system up from scratch, including installing all software, downloading example
71+
data, and running everything up through the final results. This guarantees that
72+
you can set up and test the code yourself.
73+
74+
6275
All the advantages of Snakemake
6376
-------------------------------
6477

@@ -78,7 +91,9 @@ Only run the required jobs
7891
~~~~~~~~~~~~~~~~~~~~~~~~~~
7992
New gene annotation? Snakemake tracks dependencies, so it will detect that the
8093
annotations changed. Only jobs that depend on that file at some point in their
81-
dependency chain will be re-run and the independent files are untouched.
94+
dependency chain will be re-run and the independent files are untouched. Adding
95+
a new sample will leave untouched any output from samples that have already
96+
run.
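
As a rough illustration of the idea, the toy bash script below marks an output for re-running only when it is missing or older than its input. It is a simplified sketch, not Snakemake's actual rerun logic (which is more involved), and every file name in it is hypothetical. It relies on GNU ``touch -d``, which is available on Linux.

```bash
#!/usr/bin/env bash
# Toy model of timestamp-based reruns. All file names here are hypothetical.
workdir=$(mktemp -d)

# Two inputs and two outputs; only counts.txt is derived from annotation.gtf.
touch -d '2020-01-01' "$workdir/annotation.gtf" "$workdir/reads.fastq"
touch -d '2020-01-02' "$workdir/counts.txt" "$workdir/fastqc_report.html"

# A new gene annotation arrives after the outputs were created.
touch -d '2020-01-03' "$workdir/annotation.gtf"

# Rerun when the output is missing or older than its input.
needs_rerun() {
    local input=$1 output=$2
    [ ! -e "$output" ] || [ "$input" -nt "$output" ]
}

status() {
    if needs_rerun "$1" "$2"; then echo "rerun"; else echo "up-to-date"; fi
}

counts_status=$(status "$workdir/annotation.gtf" "$workdir/counts.txt")
report_status=$(status "$workdir/reads.fastq" "$workdir/fastqc_report.html")

echo "counts.txt: $counts_status"
echo "fastqc_report.html: $report_status"
rm -rf "$workdir"
```

Running it prints ``counts.txt: rerun`` and ``fastqc_report.html: up-to-date``: only the output downstream of the changed annotation is flagged. In a real project, a Snakemake dry run (``snakemake -n``) lists the jobs that would actually be re-run.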
8297

8398
Parallelization
8499
~~~~~~~~~~~~~~~

docs/tests.rst

Lines changed: 1 addition & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -33,12 +33,7 @@ This assumes you have set up the `bioconda channel
3333
3434
We **highly recommend** using conda for isolating projects and for analysis
3535
reproducibility. If you are unfamiliar with conda, we provide a more detailed look
36-
at:
37-
38-
.. toctree::
39-
:maxdepth: 2
40-
41-
conda
36+
at :ref:`conda-envs`.
4237

4338

4439
Activate the main env
@@ -186,13 +181,3 @@ Exhaustive tests
186181
The file ``.circleci/config.yml`` configures all of the tests that are run on
187182
CircleCI. There's a lot of configuration happening there, but look for the
188183
entries that have ``./run_test.sh`` in them to see the commands that are run.
189-
190-
Next steps
191-
----------
192-
193-
Now that you have tested your installation of ``lcdb-wf`` you can learn about the
194-
different workflows implemented here at the :ref:`workflows` page and see details
195-
on configuration at :ref:`config`, before getting started on your analysis.
196-
197-
In addition, :ref:`setup-proj` explains the process of deploying ``lcdb-wf``
198-
to a project directory.

docs/toc.rst

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,19 +2,20 @@ Table of Contents
22
=================
33

44
.. toctree::
5-
:maxdepth: 2
5+
:maxdepth: 3
66

77
index
88
getting-started
99
guide
10-
tests
1110
workflows
1211
config
1312
references
1413
rnaseq
1514
downstream-rnaseq
1615
chipseq
1716
integrative
17+
conda
18+
tests
1819
faqs
1920
changelog
2021
developers
