update docs

daler · daler · commit a487dff60cb1 · 2023-04-09T21:52:11.000-04:00
diff --git a/docs/conda.rst b/docs/conda.rst
@@ -128,16 +128,19 @@ dependency tree and come up with a solution that works to satisfy the entire
 set of specified requirements.
 
 We chose to split the conda environments in two: the **main** environment and the **R**
-environment (see :ref:`conda-design-decisons`). These environments are
+environment (see :ref:`conda-design-decisions`). These environments are
 described by both "strict" and "loose" files. By default we use the "strict"
 version, which pins all versions of all packages exactly. This is preferred
 wherever possible. However we also provide a "loose" version that is not
 specific about versions. The following table describes these files:
 
++----------------+--------------------------------+----------------------------------+
 | strict version | loose version                  | used for                         |
 +================+================================+==================================+
 | ``env.yml``    | ``include/requirements.txt``   | Main Snakefiles                  |
++----------------+--------------------------------+----------------------------------+
 | ``env-r.yaml`` | ``include/requirements-r.txt`` | Downstream RNA-seq analysis in R |
++----------------+--------------------------------+----------------------------------+
 
 When deploying new instances, use the ``--build-envs`` argument which will use
 the strict version. Or use the following commands in a deployed directory:
diff --git a/docs/config.rst b/docs/config.rst
@@ -2,8 +2,8 @@
 .. _config:
 
 
-Configuration details
-=====================
+Configuration
+=============
 
 General configuration
 ~~~~~~~~~~~~~~~~~~~~~
diff --git a/docs/getting-started.rst b/docs/getting-started.rst
@@ -3,22 +3,14 @@
 Getting started
 ===============
 
-The main prerequisite for `lcdb-wf` is `conda
-<https://docs.conda.io/en/latest/>_`, with the `bioconda
-<https://bioconda.github.io>`_. channel set up and the `mamba
-<https://github.com/mamba-org/mamba>`_ drop-in replacement for conda.
+The main prerequisite for `lcdb-wf` is `conda <https://docs.conda.io/en/latest/>`_, with the `bioconda <https://bioconda.github.io>`_ channel set up and the `mamba <https://github.com/mamba-org/mamba>`_ drop-in replacement for conda installed.
 
 If this is new to you, please see :ref:`conda-envs`.
 
 .. note::
 
-    `lcdb-wf` is tested and heavily used on Linux.
-
-    It is likely to work on macOS as long as all relevant conda packages are
-    available for macOS -- though this is not tested.
-
-    It will **not** work on Windows due to a general lack of support of Windows
-    in bioinformatics tools.
+    `lcdb-wf` is tested and heavily used on Linux. It is only supported on
+    Linux.
 
 .. _setup-proj:
 
@@ -27,21 +19,24 @@ Setting up a project
 
 The general steps to use lcdb-wf in a new project are:
 
-1. **Deploy:** download and run ``deploy.py``
+1. **Deploy:** download and run ``deploy.py`` to copy files into a project directory
 2. **Configure:** set up samples table for experiments and edit configuration file
 3. **Run:** activate environment and run the Snakemake file either locally or on a cluster
 
 .. _deploy:
 
 1. Deploying lcdb-wf
 --------------------
+Using `lcdb-wf` starts with copying files to a project directory, or
+"deploying".
 
 Unlike other tools you may have used, `lcdb-wf` is not actually installed per
 se. Rather, it is "deployed" by copying over relevant files from the `lcdb-wf`
 repository to your project directory. This includes Snakefiles, config files,
 and other infrastructure required to run, and excludes files like these docs
-and testing files that are not necessary for an actual project. The reason to
-use this script is so you end up with a cleaner project directory. 
+and testing files that are not necessary for an actual project. The reason to
+use this script is so you end up with a cleaner project directory, compared to
+cloning the repo directly.
 
 This script also writes a file to the destination called
 ``.lcdb-wf-deployment.json``. It stores the timestamp and details about what
@@ -53,8 +48,8 @@ There are a few ways of doing this.
 Option 1: Download and run the deployment script
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Note that you will not be able to run tests with this method, but it is likely
-the most convenient method.
+This is the most convenient method, although it does not allow running tests
+locally.
 
 .. code-block:: bash
 
@@ -63,7 +58,8 @@ the most convenient method.
 
 Run ``python deploy.py -h`` to see help. Be sure to use the ``--staging`` and
 ``--branch=$BRANCH`` arguments when using this method, which will clone the
-repository to a location of your choosing. Once you deploy you can remove it. For example:
+repository to a location of your choosing. Once you deploy you can remove the
+script. For example:
 
 .. code-block:: bash
 
@@ -78,6 +74,12 @@ repository to a location of your choosing. Once you deploy you can remove it. Fo
     # You can clean up the cloned copy if you want:
     # rm -rf /tmp/lcdb-wf-tmp
 
+This will clone the full git repo to ``/tmp/lcdb-wf-tmp``, check out the master
+branch (or whatever branch ``$BRANCH`` is set to), copy the files required for
+an RNA-seq project over to ``analysis/project``, build the main conda
+environment and the R environment, save the ``.lcdb-wf-deployment.json`` file
+there, and then delete the temporary repo.
+
 Option 2: Clone repo manually
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Clone a repo using git and check out the branch. Use this method for running
@@ -140,9 +142,9 @@ and run the following:
 
 If all goes well, this should print a list of jobs to be run.
 
-You can run locally, but this is NOT recommended. To run locally, choose the
-number of CPUs you want to use with the ``-j`` argument as is standard for
-Snakemake.
+You can run locally, but this is NOT recommended for a typical RNA-seq
+project. To run locally, choose the number of CPUs you want to use with the
+``-j`` argument as is standard for Snakemake.
 
 .. warning::
 
@@ -157,18 +159,35 @@ Snakemake.
     # run locally (not recommended)
     snakemake --use-conda -j 8
 
-The recommended way is to run on a cluster. On NIH's Biowulf cluster, the way
-to do this is to submit the wrapper script as a batch job:
+The recommended way is to run on a cluster.
+
+To run on a cluster, you will need a `Snakemake profile
+<https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles>`_ for
+your cluster that translates generic resource requirements into arguments for
+your cluster's batch system.
+
+On NIH's Biowulf cluster, the profile can be found at
+https://github.com/NIH-HPC/snakemake_profile. If you are not already using this for other Snakemake workflows, you can set it up the first time like this:
+
+1. Clone the profile to a location of your choosing, maybe
+   ``~/snakemake_profile``
+2. Set the environment variable ``LCDBWF_SNAKEMAKE_PROFILE``, perhaps in your
+   ``~/.bashrc`` file.
+
+Then back in your deployed and configured project, submit the wrapper script as
+a batch job:
 
 .. code-block:: bash
 
     sbatch ../../include/WRAPPER_SLURM
 
-and then monitor the various jobs that will be submitted on your behalf. See
+This will submit Snakemake as a batch job, use the profile to translate
+resources to cluster arguments and set default command-line arguments, and
+submit the various jobs created by Snakemake to the cluster on your behalf. See
 :ref:`cluster` for more details on this.
 
-Other clusters will need different configuration, but everything is standard
-Snakemake. The Snakemake documentation on `cluster execution
+Other clusters will need different configuration, but everything in `lcdb-wf`
+is standard Snakemake. The Snakemake documentation on `cluster execution
 <https://snakemake.readthedocs.io/en/stable/executing/cluster.html>`_ and
 `cloud execution
 <https://snakemake.readthedocs.io/en/stable/executing/cloud.html>`_ can be
diff --git a/docs/index.rst b/docs/index.rst
@@ -17,13 +17,18 @@ Non-model organism? Custom gene annotations? Complicated regression models?
 Unconventional command-line arguments to tools? New tools to add to the
 workflow? No problem.
 
-Tested automatically
---------------------
-Every change to the code on GitHub triggers an automated test, the results of
-which you can find at https://circleci.com/gh/lcdb/lcdb-wf. Each test sets the
-system up from scratch, including installing all software, downloading example
-data, and running everything up through the final results. This guarantees that
-you can set up and test the code yourself.
+Extensive downstream RNA-seq
+----------------------------
+A comprehensive RMarkdown template, along with a custom R package, enables
+sophisticated RNA-seq analysis that supports complex experimental designs and
+many contrasts.
+
+Extensive exploration of ChIP-seq peaks
+---------------------------------------
+The ChIP-seq configuration supports multiple peak-callers as well as calling
+peaks with many different parameter sets for each caller. Combined with
+visualization in track hubs (see below), this can identify the optimal
+parameters for a given experiment.
 
 Track hubs
 ----------
@@ -44,11 +49,10 @@ a site to get lots of genomes you can use for running `fastq_screen`, and
 easily include arbitrary other genomes. They can then be automatically included
 in RNA-seq and ChIP-seq workflows.
 
-This system is designed to allow customization as the config file
-can be used to include arbitrary genomes whether local or on the web.
-The `references` workflow need only be run once for all these genomes
-to be created, with the `references_dir` being used as a centralized
-repository that can be then used with all other workflows.
+Arbitrary genomes can be used, whether local (e.g., customized with additional
+genetic constructs) or on the web. The `references` workflow need only be run
+once for all these genomes to be created, with the `references_dir` being used
+as a centralized repository that can then be used with all other workflows.
 
 Integration with external data and figure-making
 ------------------------------------------------
@@ -59,6 +63,15 @@ If an upstream file changes (e.g., gene annotation), all dependent downstream
 jobs -- including figures -- will be updated so you can ensure that even
 complex analyses stay correct and up-to-date.
 
+Tested automatically
+--------------------
+Every change to the code on GitHub triggers an automated test, the results of
+which you can find at https://circleci.com/gh/lcdb/lcdb-wf. Each test sets the
+system up from scratch, including installing all software, downloading example
+data, and running everything up through the final results. This guarantees that
+you can set up and test the code yourself.
+
+
 All the advantages of Snakemake
 -------------------------------
 
@@ -78,7 +91,9 @@ Only run the required jobs
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 New gene annotation? Snakemake tracks dependencies, so it will detect that the 
 annotations changed. Only jobs that depend on that file at some point in their 
-dependency chain will be re-run and the independent files are untouched.
+dependency chain will be re-run and the independent files are untouched. Adding
+a new sample will leave untouched any output from samples that have already
+run.
 
 Parallelization
 ~~~~~~~~~~~~~~~
diff --git a/docs/tests.rst b/docs/tests.rst
@@ -33,12 +33,7 @@ This assumes you have set up the `bioconda channel
 
 We **highly recommend** using conda for isolating projects and for analysis
 reproducibility. If you are unfamiliar with conda, we provide a more detailed look
-at:
-
-.. toctree::
-   :maxdepth: 2
-
-   conda
+at :ref:`conda-envs`.
 
 
 Activate the main env
@@ -186,13 +181,3 @@ Exhaustive tests
 The file ``.circleci/config.yml`` configures all of the tests that are run on
 CircleCI. There's a lot of configuration happening there, but look for the
 entries that have ``./run_test.sh`` in them to see the commands that are run.
-
-Next steps
-----------
-
-Now that you have tested your installation of ``lcdb-wf`` you can learn about the
-different workflows implemented here at the :ref:`workflows` page and see details
-on configuration at :ref:`config`, before getting started on your analysis.
-
-In addition, :ref:`setup-proj` explains the process of deploying ``lcdb-wf``
-to a project directory.
diff --git a/docs/toc.rst b/docs/toc.rst
@@ -2,19 +2,20 @@ Table of Contents
 =================
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 3
 
    index
    getting-started
    guide
-   tests
    workflows
    config
    references
    rnaseq
    downstream-rnaseq
    chipseq
    integrative
+   conda
+   tests
    faqs
    changelog
    developers
diff --git a/docs/workflows.rst b/docs/workflows.rst