docs: paper #111
base: main
Walkthrough

A new document titled "A Snakemake workflow for differential expression analysis with alternative splicing detection using long read data" has been added. The document details a modular, reproducible workflow optimized for high-performance computing environments. It outlines steps including quality filtering, read alignment, quantification, differential expression analysis, and alternative splicing detection using a modified FLAIR toolkit. An optional annotation procedure for under-annotated or novel genomes is also described. Additionally, a new GitHub Actions workflow has been introduced to automate the generation and upload of a draft PDF from the document. A new bibliography file with 18 relevant references has also been added.
Sequence Diagram(s)
sequenceDiagram
participant User as User/HPC
participant Workflow as Snakemake Workflow
participant Quality as Quality Filtering
participant Align as Read Alignment
participant Quant as Quantification
participant DiffExp as Differential Analysis
participant Splice as Splicing Analysis (FLAIR)
participant Annot as Optional Annotation
User->>Workflow: Start workflow with input data
Workflow->>Quality: Execute quality filtering
Quality-->>Workflow: Return filtered data
Workflow->>Align: Perform read alignment
Align-->>Workflow: Return aligned reads
Workflow->>Quant: Quantify transcripts
Quant-->>Workflow: Return expression data
Workflow->>DiffExp: Run differential expression analysis
DiffExp-->>Workflow: Return statistical outcomes
Workflow->>Splice: Perform splicing analysis using FLAIR
Splice-->>Workflow: Return isoform data
Workflow->>Annot: (Optional) Annotate novel genomes
Annot-->>Workflow: Return annotation details
Workflow-->>User: Provide final workflow output
Actionable comments posted: 1
🧹 Nitpick comments (7)
paper/paper.md (7)
4-10: Correct Typos in Title and Tags
The title and the tags contain typographical errors. In the title, "analyis" should be corrected to "analysis", and in the tags, "alterternative splicing detection" should be corrected to "alternative splicing detection". Suggested change: `title: 'A Snakemake workflow for differential expression analyis with alternative splicing detection using long read data'` → `title: 'A Snakemake workflow for differential expression analysis with alternative splicing detection using long read data'`, and `- alterternative splicing detection` → `- alternative splicing detection`.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
6-6: Unordered list indentation
Expected: 0; Actual: 2 (MD007, ul-indent)
7-7: Unordered list indentation
Expected: 0; Actual: 2 (MD007, ul-indent)
8-8: Unordered list indentation
Expected: 0; Actual: 2 (MD007, ul-indent)
9-9: Unordered list indentation
Expected: 0; Actual: 2 (MD007, ul-indent)
10-10: Unordered list indentation
Expected: 0; Actual: 2 (MD007, ul-indent)
19-20: Review Placeholder ORCID Value
The ORCID for Christiane Krämer is set as "XXX ?". If this is a placeholder, please update it with a valid ORCID or remove the placeholder to maintain professionalism.
31-31: Remove Inline Comment from Date Field
The date field includes the annotation " <- update". Consider removing this inline note before finalizing the document to ensure the metadata remains clean.
5-10: Address YAML List Indentation Issues in Tags Block
Markdownlint has flagged unordered list indentation issues (MD007) for the tags block (lines 6-10: expected indentation 0, actual 2). You might either adjust the indentation to conform to the linter’s expectations or configure the linter to ignore YAML front matter.
25-29: Ensure Consistent Indentation in Affiliations Block
There are inconsistent indentation issues in the affiliations list. Align the list items and their sub-properties uniformly to improve readability and prevent potential parsing issues.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~28-~28: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ny index: 1 - name: NHR-SouthWest / High Performance Computing Group, Johannes Gutenberg-Uni... (EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.17.2)
26-26: Inconsistent indentation for list items at the same level
Expected: 2; Actual: 1 (MD005, list-indent)
26-26: Unordered list indentation
Expected: 0; Actual: 1 (MD007, ul-indent)
28-28: Inconsistent indentation for list items at the same level
Expected: 2; Actual: 1 (MD005, list-indent)
28-28: Unordered list indentation
Expected: 0; Actual: 1 (MD007, ul-indent)
78-78: Fix Typo and Enhance Readability in Quality Filtering Section
The word "Prioar" appears to be a typo and should be corrected to "Prior to". Additionally, consider inserting a comma after the introductory phrase "For this" to improve clarity (i.e., "For this, we make use of the BioPython library...").
🧰 Tools
🪛 LanguageTool
[typographical] ~78-~78: Use a comma after an introductory phrase.
Context: ...ead quality and read length thresholds. For this we make use of the BioPython library [@... (COMMA_INTRODUCTORY_WORDS_PHRASES)
93-93: Remove Placeholder Text for Rulegraph
The text "ADD rulegraph and caption, here." appears to be a placeholder. Ensure that this placeholder is replaced with the final rulegraph content or removed before merging the draft paper.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
paper/paper.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
paper/paper.md
[uncategorized] ~28-~28: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...ny index: 1 - name: NHR-SouthWest / High Performance Computing Group, Johannes Gutenberg-Uni...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[typographical] ~78-~78: Use a comma after an introductory phrase.
Context: ...ead quality and read length thresholds. For this we make use of the BioPython library [@...
(COMMA_INTRODUCTORY_WORDS_PHRASES)
🪛 markdownlint-cli2 (0.17.2)
paper/paper.md
6-6: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
7-7: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
8-8: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
9-9: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
10-10: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
12-12: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
26-26: Inconsistent indentation for list items at the same level
Expected: 2; Actual: 1
(MD005, list-indent)
26-26: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
28-28: Inconsistent indentation for list items at the same level
Expected: 2; Actual: 1
(MD005, list-indent)
28-28: Unordered list indentation
Expected: 0; Actual: 1
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Testing
Actionable comments posted: 3
🧹 Nitpick comments (1)
.github/workflows/draft-pdf.yml (1)
28-28: Newline at End of File
The linter also reported that there is no newline character at the end of the file (line 28). Please add one to adhere to best practices and avoid potential issues in some tools.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 28-28: no new line character at the end of file
(new-line-at-end-of-file)
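Adding the newline by hand in an editor is the simplest fix. As a minimal sketch, assuming you would rather patch files from a Python script (the helper name `ensure_trailing_newline` is hypothetical and not part of this PR), it could look like this:

```python
from pathlib import Path


def ensure_trailing_newline(path: Path) -> bool:
    """Append a final newline if it is missing; return True if the file changed.

    Empty files are left untouched.
    """
    data = path.read_bytes()
    if data and not data.endswith(b"\n"):
        path.write_bytes(data + b"\n")
        return True
    return False
```

For example, `ensure_trailing_newline(Path(".github/workflows/draft-pdf.yml"))` would append the newline YAMLlint asks for and leave already-terminated files unchanged.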
📜 Review details
📒 Files selected for processing (1)
.github/workflows/draft-pdf.yml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/draft-pdf.yml
[warning] 4-4: wrong indentation: expected 8 but found 6
(indentation)
[warning] 5-5: wrong indentation: expected 10 but found 8
(indentation)
[warning] 9-9: wrong indentation: expected 4 but found 2
(indentation)
[warning] 10-10: wrong indentation: expected 6 but found 4
(indentation)
[warning] 13-13: wrong indentation: expected 8 but found 6
(indentation)
[warning] 18-18: wrong indentation: expected 12 but found 10
(indentation)
[warning] 24-24: wrong indentation: expected 12 but found 10
(indentation)
[error] 28-28: no new line character at the end of file
(new-line-at-end-of-file)
`jobs:`
`paper:`
`runs-on: ubuntu-latest`
`name: Paper Draft`
`steps:`
🛠️ Refactor suggestion
Jobs Block: Correct Indentation
The jobs section also shows inconsistent indentation. For instance, line 9 (the key paper:) is indented with 2 spaces while the expected indentation is 4, and line 10’s runs-on: ubuntu-latest should be indented by 6 spaces rather than 4.
Reindent this block so that each nesting level matches the indentation YAMLlint expects, and adjust the rest of the document similarly to maintain uniformity.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 9-9: wrong indentation: expected 4 but found 2
(indentation)
[warning] 10-10: wrong indentation: expected 6 but found 4
(indentation)
.github/workflows/draft-pdf.yml
Outdated
`- name: Checkout`
`uses: actions/checkout@v4`
`- name: Build draft PDF`
`uses: openjournals/openjournals-draft-action@master`
`with:`
`journal: joss`
`# This should be the path to the paper within your repo.`
`paper-path: paper/paper.md`
`- name: Upload`
`uses: actions/upload-artifact@v4`
`with:`
`name: paper`
`# This is the output path where Pandoc will write the compiled`
`# PDF. Note, this should be the same directory as the input`
`# paper.md`
🛠️ Refactor suggestion
Steps Block: Reindent Step Items and Parameters
Several steps within the job have indentation issues as well. Static analysis reports that:
- Line 13 (“- name: Checkout”) is indented with 6 spaces instead of the expected 8.
- Line 18 (the “journal: joss” key) is indented with 10 spaces when 12 are expected.
- Line 24 (“name: paper”) similarly shows too few spaces.
Reindenting the steps block with a consistent two-spaces-per-level standard should clear these YAMLlint warnings and keep the structure of the workflow unambiguous.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 13-13: wrong indentation: expected 8 but found 6
(indentation)
[warning] 18-18: wrong indentation: expected 12 but found 10
(indentation)
[warning] 24-24: wrong indentation: expected 12 but found 10
(indentation)
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 0
🧹 Nitpick comments (4)
paper/paper.md (4)
1-31: Fix Typos and Remove Editorial Notes in Front Matter
- Line 2: The title contains a typo—“differential expression analyis” should be “differential expression analysis.”
- Line 8: The tag “alterternative splicing detection” should be corrected to “alternative splicing detection.”
- Line 29: Remove the “<- update” note from the date field once the paper’s metadata is finalized.
16-18: Review Placeholder ORCID Field
The ORCID for Christiane Krämer is listed as `XXX ?`, which appears to be a placeholder. Please update this field with the correct ORCID or remove it if not applicable.
76-77: Correct Typographical Error in Quality Filtering Section
The sentence beginning on line 76 starts with “Prioar downstream analysis...”. It should read “Prior to downstream analysis...” to ensure clarity and correctness.
90-90: Remove Placeholder Text
The placeholder text “ADD rulegraph and caption, here.” on line 90 should be replaced with the actual rulegraph and an appropriate caption, or removed if it is not required in the final draft.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~90-~90: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...exploratory studies. ADD rulegraph and caption, here. # Usage # Acknowledgements A... (AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
📜 Review details
📒 Files selected for processing (2)
.github/workflows/draft-pdf.yml (1 hunks)
paper/paper.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
paper/paper.md
[typographical] ~75-~75: Use a comma after an introductory phrase.
Context: ...ead quality and read length thresholds. For this we make use of the BioPython library [@...
(COMMA_INTRODUCTORY_WORDS_PHRASES)
[uncategorized] ~90-~90: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...exploratory studies. ADD rulegraph and caption, here. # Usage # Acknowledgements A...
(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
🪛 YAMLlint (1.35.1)
.github/workflows/draft-pdf.yml
[error] 24-24: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (2)
.github/workflows/draft-pdf.yml (2)
1-7: YAML Trigger Configuration Looks Correct
The trigger block under `on:` is well structured, using consistent 2-space indentation for keys and 6-space indentation for list items. This update resolves the indentation issues flagged in earlier commits.
8-24: Append Newline at End of File
Static analysis has flagged that there is no newline character at the end of the file (line 24). Please ensure a newline is added to meet YAML formatting best practices.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 24-24: no new line character at the end of file
(new-line-at-end-of-file)
Actionable comments posted: 0
🧹 Nitpick comments (2)
.github/workflows/draft-pdf.yml (2)
16-16: Pin the draft action to a specific release
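Referencing `@master` can introduce unintended breakages. For reproducibility, pin to a stable release tag or a commit SHA, e.g. `uses: openjournals/openjournals-draft-action@master` → `uses: openjournals/openjournals-draft-action@v1`. Check that the action actually publishes such a tag first; the example workflow in the JOSS documentation itself uses `@master`.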
24-24: Add a trailing newline
YAML lint flags missing end-of-file newline.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`paper/rulegraph.svg` is excluded by `!**/*.svg`
📒 Files selected for processing (1)
.github/workflows/draft-pdf.yml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/draft-pdf.yml
[error] 24-24: no new line character at the end of file
(new-line-at-end-of-file)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Testing
🔇 Additional comments (2)
.github/workflows/draft-pdf.yml (2)
1-6: Trigger configuration is well-defined
The workflow correctly triggers on pushes to the `paper/**` directory and to the workflow file itself.
8-12: Job definition is appropriate
The `paper` job is clearly named and configured to run on Ubuntu, aligning with the workflow’s objectives.
Actionable comments posted: 6
🔭 Outside diff range comments (1)
paper/paper.md (1)
108-108: Remove stray “108” or complete References section
The trailing “108” appears to be a placeholder. Ensure that the References list is properly formatted and complete.
🧹 Nitpick comments (4)
paper/paper.md (4)
17-17: Verify ORCID for Christiane Krämer
The placeholder `orcid: XXX ?` needs to be replaced with the correct ORCID or removed if unavailable.
75-78: Grammar & hyphenation in quality-filtering section
- “Prioar” → “Prior”
- Add comma after introductory phrase: “For this, we make use…”
- Hyphenate compound adjective: “low-quality reads”
89-91: Typo in "data bases"
The term “data bases” should be one word: “databases.”
94-95: Improve image alt text
The image placeholder lacks descriptive alt text. Consider adding a short description of the rulegraph figure.
📜 Review details
📒 Files selected for processing (1)
paper/paper.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
paper/paper.md
[typographical] ~76-~76: Use a comma after an introductory phrase.
Context: ...rs can specify a read length threshold. For this we make use of the BioPython library [@...
(COMMA_INTRODUCTORY_WORDS_PHRASES)
[uncategorized] ~76-~76: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...on the ONT basecaller for filtering out low quality reads. Sample quality statistics and re...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[misspelling] ~90-~90: This word is normally spelled as one.
Context: ... aligned against curated UniRef protein data bases using BLAST or lambda. This provides pu...
(EN_COMPOUNDS_DATA_BASES)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Formatting
`- name: NHR-SouthWest / High Performance Computing Group, Johannes Gutenberg-University Mainz, Germany`
`index: 2`

`date: 04 April 2025 <- update`
Remove inline comment from date field
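The trailing “<- update” in `date: 04 April 2025 <- update` is not a YAML comment (YAML comments start with `#`), so it becomes part of the date value instead of being ignored. Use a clean date, preferably in ISO format:
`date: 2025-04-04`
🤖 Prompt for AI Agents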
In paper/paper.md at line 29, remove the inline arrow and note from the
date field so that they do not end up in the date value. Replace the current
line with a clean date, preferably in ISO format like "date: 2025-04-04",
ensuring the date is valid and has no trailing text.
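Because the note is kept as part of the value, a strict date parser rejects it. The following stdlib sketch (hypothetical helper `to_iso_date`, assuming the front matter uses the `DD Month YYYY` form shown above) converts the human-readable date into the suggested ISO form and raises `ValueError` when trailing text such as the `<- update` note is present:

```python
from datetime import datetime


def to_iso_date(value: str) -> str:
    """Convert a 'DD Month YYYY' date such as '04 April 2025' to ISO 8601.

    datetime.strptime raises ValueError ("unconverted data remains") if any
    text, such as an '<- update' note, follows the date.
    """
    # %B matches full English month names under the default C locale.
    return datetime.strptime(value.strip(), "%d %B %Y").date().isoformat()


print(to_iso_date("04 April 2025"))  # 2025-04-04
```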
Actionable comments posted: 4
♻️ Duplicate comments (1)
paper/paper.md (1)
29-29: Fix date format and remove inline comment.
The inline `<- update` is not a YAML comment, so it ends up in the date value, and the date should follow ISO format: `date: 04 April 2025 <- update` → `date: 2025-04-04`.
🧹 Nitpick comments (8)
paper/paper.md (8)
12-12: Consider using a YAML sequence for multiple affiliations.
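The field `affiliation: "1, 2"` is a single string; a list of integers (`affiliation: [1, 2]`) is the more idiomatic YAML. Note, however, that the JOSS paper template itself uses the quoted-string form for multiple affiliations, so confirm that the draft build accepts a list before changing it.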
26-26: Eliminate unintended extra spaces.
There are three spaces between “University” and “Mainz”. Consolidate them to a single space: `Johannes Gutenberg-University Mainz, Germany`.
34-34: Use consistent heading levels for sections.
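Top-level headings (`#`) should be reserved for the document title. Change section titles to level-2 headings (`##`): `# Summary` → `## Summary` and `# Statement of Need` → `## Statement of Need`. Also applies to: 46-46. Note that the JOSS paper template itself uses level-1 section headings because the title is taken from the front matter; whichever convention is chosen, subsections should sit exactly one level below their parent.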
75-76: Correct typo and add missing comma.
The introductory sentence has a misspelling and lacks the comma after “For this”: `Prioar downstream analysis, reads undergo a configurable quality control step.` → `Prior to downstream analysis, reads undergo a configurable quality control step.` and `For this we make use of the BioPython library [@cock_biopython_2009].` → `For this, we make use of the BioPython library [@cock_biopython_2009].`
77-77: Fix subject-verb agreement.
“Those” is plural, so the verb should be “ensure”, not “ensures”: `Those are included in the workflow report and ensures high-confidence input...` → `These are included in the workflow report and ensure high-confidence input...`
78-78: Use a hyphen for the compound adjective.
Compound modifiers like “low quality reads” should be hyphenated: `filtering out low quality reads` → `filtering out low-quality reads`
90-90: Correct spelling of “data bases”.
The word should be “databases”, written as one word: `against curated UniRef protein data bases` → `against curated UniRef protein databases`
94-94: Provide descriptive alt text for the figure.
A clear alt description improves accessibility; replace the empty alt text of the rulegraph image with a short description of the figure.
📜 Review details
⛔ Files ignored due to path filters (1)
`paper/rulegraph.svg` is excluded by `!**/*.svg`
📒 Files selected for processing (1)
paper/paper.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
paper/paper.md
[typographical] ~76-~76: Use a comma after an introductory phrase.
Context: ...rs can specify a read length threshold. For this we make use of the BioPython library [@...
(COMMA_INTRODUCTORY_WORDS_PHRASES)
[uncategorized] ~76-~76: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...on the ONT basecaller for filtering out low quality reads. Sample quality statistics and re...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[misspelling] ~90-~90: This word is normally spelled as one.
Context: ... aligned against curated UniRef protein data bases using BLAST or lambda. This provides pu...
(EN_COMPOUNDS_DATA_BASES)
# Acknowledgements

Any?
🛠️ Refactor suggestion
Populate the Acknowledgements section.
Please add funding sources, contributor credits, and institutional support information.
🤖 Prompt for AI Agents
In paper/paper.md around lines 103 to 106, the Acknowledgements section is
currently empty. Populate this section by adding relevant funding sources,
contributor credits, and institutional support information to properly
acknowledge all contributions and support.
Actionable comments posted: 5
♻️ Duplicate comments (1)
paper/paper.md (1)
29-29: Remove inline comment from `date` field
The trailing `<- update` is not a YAML comment, so it becomes part of the date value. Please clean up the date and consider ISO format: `date: 04 April 2025 <- update` → `date: 2025-04-04`
🧹 Nitpick comments (2)
paper/paper.md (2)
76-76: Refine punctuation and hyphenation
Add a comma after the introductory phrase and hyphenate the compound adjective for consistency: `For this we make use of the BioPython library [@cock_biopython_2009].` → `For this, we make use of the BioPython library [@cock_biopython_2009].` and `filtering out low quality reads` → `filtering out low-quality reads`
90-90: Correct “data bases” to “databases”
The term “data bases” should be one word: `curated UniRef protein data bases using BLAST or lambda` → `curated UniRef protein databases using BLAST or lambda`
📜 Review details
⛔ Files ignored due to path filters (1)
`paper/rulegraph.svg` is excluded by `!**/*.svg`
📒 Files selected for processing (1)
paper/paper.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
paper/paper.md
[typographical] ~76-~76: Use a comma after an introductory phrase.
Context: ...rs can specify a read length threshold. For this we make use of the BioPython library [@...
(COMMA_INTRODUCTORY_WORDS_PHRASES)
[uncategorized] ~76-~76: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...on the ONT basecaller for filtering out low quality reads. Sample quality statistics and re...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[misspelling] ~90-~90: This word is normally spelled as one.
Context: ... aligned against curated UniRef protein data bases using BLAST or lambda. This provides pu...
(EN_COMPOUNDS_DATA_BASES)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Testing
# Acknowledgements

Any?

# References
🛠️ Refactor suggestion
Populate the Acknowledgements section
The Acknowledgements block is empty. Add funding sources, contributor credits, and institutional support to properly acknowledge contributions.
🤖 Prompt for AI Agents
In paper/paper.md around lines 103 to 107, the Acknowledgements section is
currently empty. Populate this section by adding relevant funding sources,
contributor credits, and institutional support to properly acknowledge all
contributions related to the paper.
Actionable comments posted: 1
♻️ Duplicate comments (2)
paper/paper.md (2)
16-18: Placeholder ORCID in YAML front matter
`orcid: XXX ?` is still a placeholder. It parses as the literal string `XXX ?`, which is not a valid ORCID iD and is likely to fail metadata validation. Replace it with the real ORCID or drop the field until you have it, for example by keeping `- name: Christiane Krämer` and `affiliation: 1` and replacing the `orcid` line with a YAML comment such as `# TODO: add valid ORCID, e.g. 0000-000X-XXXX-XXXX`.
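Before filling in the field, the iD can be sanity-checked locally. ORCID iDs end in an ISO 7064 MOD 11-2 check character, which may be `X`. The sketch below, with hypothetical helper names, validates only the format and the checksum; it does not confirm that the iD belongs to the named author:

```python
import re

# Four groups of four characters; only the final check character may be 'X'.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if orcid is well formed and its check character matches."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("XXX ?"))                # False
```

`0000-0002-1825-0097` is the sample iD used in ORCID's own documentation, not a real author's.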
29-30: YAML `date` field contains an inline note
The arrow and note are not a YAML comment, so they become part of the date value. Use ISO format without trailing text: `date: 04 April 2025 <- update` → `date: 2025-04-04`
🧹 Nitpick comments (4)
paper/paper.md (4)
72-75: Comma & hyphen for clarity; tighten wording
`Prior to downstream analysis, reads undergo a configurable quality control step. Users can specify a read length threshold. For this we make use of the BioPython library ...` → `Before downstream analysis, reads undergo a configurable quality-control step. Users can specify a read-length threshold. For this, we use the BioPython library ...`
86-89: Spelling: “data bases” → “databases”
`... against curated UniRef protein data bases using BLAST or lambda.` → `... against curated UniRef protein databases using BLAST or lambda.`
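98-99: Empty Markdown link placeholder triggers MD042
Remove or replace the placeholder `[]() # Add report HTML`. The suggested replacement `# TODO: insert link to HTML report once available` would render as a level-1 heading in Markdown, so an HTML comment such as `<!-- TODO: insert link to HTML report once available -->` is a safer placeholder.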
115-122: Specify language for fenced code block (MD040)
Add a language identifier (for example `bash`) to the opening fence of the block that starts with `snakemake -j unlimited \ # Run snakemake with no job limit ...`. Note also that in bash a comment placed after a `\` line continuation breaks the continuation: the backslash then escapes the following space rather than the newline, so the command ends on that line. Move such comments onto their own lines.
📜 Review details
📥 Commits
Reviewing files that changed from the base of the PR and between 4af47cc04b0667165a2fb9d58cdcacea2018faa1 and d94f12e3f490cb44531e205d6bbb231eb1dcb6f3.
📒 Files selected for processing (2)
`paper/paper.bib` (1 hunks)
`paper/paper.md` (1 hunks)
✅ Files skipped from review due to trivial changes (1)
paper/paper.bib
🧰 Additional context used
🪛 LanguageTool
paper/paper.md
[style] ~73-~73: ‘Prior to’ might be wordy. Consider a shorter alternative.
Context: .... ## Quality Filtering and Assessment Prior to downstream analysis, reads undergo a co...
(EN_WORDINESS_PREMIUM_PRIOR_TO)
[typographical] ~74-~74: Use a comma after an introductory phrase.
Context: ...rs can specify a read length threshold. For this we make use of the BioPython library [@...
(COMMA_INTRODUCTORY_WORDS_PHRASES)
[uncategorized] ~74-~74: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...on the ONT basecaller for filtering out low quality reads. Sample quality statistics and re...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[misspelling] ~88-~88: This word is normally spelled as one.
Context: ... aligned against curated UniRef protein data bases using BLAST or lambda. This provides pu...
(EN_COMPOUNDS_DATA_BASES)
[grammar] ~112-~112: This phrase is duplicated. You should probably use “the workflow” only once.
Context: ...ur computing environment. ### Running the workflow The workflow can be started by executing the followi...
(PHRASE_REPETITION)
🪛 markdownlint-cli2 (0.17.2)
paper/paper.md
98-98: No empty links
(MD042, no-empty-links)
102-102: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
115-115: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Formatting
## Implementation

## Input Data and Reference Handling

The workflow accepts raw ONT reads in FASTQ format, along with either user-specified or automatically downloaded reference data. Reference transcriptomes and genome assemblies can be provided as file paths or, alternatively, specified as NCBI accession numbers, in which case the required data are retrieved using `ncbi-datasets` [@oleary_exploring_2024].
This allows users to apply the workflow flexibly to well-characterized model organisms as well as to newly sequenced, poorly annotated species.
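The paragraph above describes references given either as file paths or as NCBI accession numbers. A minimal sketch of that decision follows, assuming assembly accessions of the `GCA_`/`GCF_` form; the helper names are hypothetical, and the workflow's actual config keys and rules are not shown in this excerpt:

```python
import re
from pathlib import Path

# NCBI assembly accessions: GCA_ (GenBank) or GCF_ (RefSeq), nine digits, and a version.
ASSEMBLY_ACCESSION = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def resolve_reference(value: str) -> tuple[str, str]:
    """Classify a config value as a local file or an NCBI assembly accession."""
    if Path(value).is_file():
        return ("path", value)
    if ASSEMBLY_ACCESSION.match(value):
        return ("accession", value)
    raise ValueError(f"not an existing file or an assembly accession: {value!r}")


def datasets_command(accession: str) -> list[str]:
    """Build, but do not run, an ncbi-datasets download for genome and transcript (RNA) FASTA."""
    return [
        "datasets", "download", "genome", "accession", accession,
        "--include", "genome,rna",
        "--filename", f"{accession}.zip",
    ]


kind, value = resolve_reference("GCF_000001405.40")
if kind == "accession":
    print(" ".join(datasets_command(value)))
```

The command list is only built, not executed; check the `--include` and `--filename` options against `datasets download genome accession --help` for the installed `ncbi-datasets` version.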

## Quality Filtering and Assessment

Prior to downstream analysis, reads undergo a configurable quality-control step in which users can specify a read-length threshold. For this, we make use of the BioPython library [@cock_biopython_2009]. To ensure sufficient base quality, we rely on the ONT basecaller to filter out low-quality reads. Sample quality statistics and read-length distributions are assessed using NanoPlot [@de_coster_nanopack_2018], which generates interactive, publication-ready QC plots. These are included in the workflow report and help ensure high-confidence input for downstream expression and splicing analysis.
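The read-length filter and the length statistics mentioned above can be illustrated with the standard library alone. This is a simplified sketch, not the workflow's code: the workflow uses BioPython, where `Bio.SeqIO.parse(handle, "fastq")` would replace the hand-written reader; this sketch assumes strict four-line FASTQ records; and `n50` mirrors the read-length N50 that NanoPlot reports among its summary statistics:

```python
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a FASTQ file with four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def filter_by_length(records, min_length: int) -> list[tuple[str, str, str]]:
    """Keep reads whose sequence is at least min_length bases long."""
    return [record for record in records if len(record[1]) >= min_length]


def n50(lengths: list[int]) -> int:
    """Largest length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no reads


fastq = "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACG\n+\nIII\n@r3\nACGTAC\n+\nIIIIII\n"
kept = filter_by_length(read_fastq(io.StringIO(fastq)), min_length=5)
print([header for header, _, _ in kept])      # ['@r1', '@r3']
print(n50([len(seq) for _, seq, _ in kept]))  # 10
```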

## Transcriptome Alignment and Differential Expression Analysis

Reads passing the quality filters are aligned to the reference transcriptome with `minimap2` [@li_minimap2_2018]. Read counts per transcript are then computed from the alignments and used for differential expression analysis with PyDESeq2 [@zhu_heavy-tailed_2019;@love_moderated_2014], a Python-native implementation of the DESeq2 method.
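
The text does not say how per-transcript counts are derived from the `minimap2` alignments. One common approach, sketched here under that assumption, is to count each read's primary alignment against the transcript it hits, skipping unmapped (FLAG 0x4), secondary (0x100) and supplementary (0x800) records in SAM output such as that of `minimap2 -ax map-ont`. `count_primary_alignments` is a hypothetical helper; dedicated quantifiers may additionally resolve multi-mapping reads.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits (SAM format specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_primary_alignments(sam_lines: Iterable[str]) -> Counter:
    """Count mapped primary alignments per reference transcript (SAM column RNAME)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[rname] += 1
    return counts


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6\tSO:unsorted",
        "r1\t0\ttxA\t1\t60\t8M\t*\t0\t0\tACGTACGT\t*",
        "r2\t16\ttxA\t5\t60\t8M\t*\t0\t0\tACGTACGT\t*",  # reverse strand, primary
        "r3\t256\ttxB\t1\t0\t8M\t*\t0\t0\t*\t*",  # secondary, skipped
        "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",  # unmapped, skipped
    ]
    print(count_primary_alignments(sam))  # Counter({'txA': 2})
```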

Using PyDESeq2 enables statistical analysis of gene expression changes across experimental conditions while keeping the analysis within a Python-based workflow ecosystem.
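
DESeq2, and therefore PyDESeq2, corrects for differences in sequencing depth with median-of-ratios size factors before testing. The stdlib-only sketch below illustrates that normalization step; in the workflow, PyDESeq2 computes it internally. `size_factors` is a hypothetical name, and the input is assumed to map each feature ID to its per-sample counts in a fixed sample order.

```python
import math
from statistics import median


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Median-of-ratios size factors, one per sample.

    For sample j: s_j = exp(median_i(log k_ij - mean_v log k_iv)), the form DESeq2 uses.
    Features with a zero count in any sample are skipped, since their geometric
    mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every feature has at least one zero count")
    return [math.exp(median(r)) for r in log_ratios]


if __name__ == "__main__":
    # Sample 2 was sequenced exactly twice as deeply as sample 1.
    counts = {"tx1": [10, 20], "tx2": [100, 200], "tx3": [5, 10]}
    print([round(s, 4) for s in size_factors(counts)])  # [0.7071, 1.4142]
```

Dividing each sample's counts by its size factor puts the samples on a common scale, which is what makes the subsequent per-feature comparison across conditions meaningful.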

## Alternative Splicing Analysis

For isoform-level analysis, the workflow integrates the FLAIR toolkit [@tang_full-length_2020-1]. We adapted the FLAIR plotting script to improve Snakemake compatibility and to enable automated per-gene isoform visualization. Isoforms are collapsed, quantified, and categorized to identify splicing patterns and events across conditions.
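
FLAIR's `quantify` step takes a tab-separated reads manifest with one row per sample: sample name, condition, batch, and the path to that sample's reads. A minimal sketch for generating it from a sample table follows; the dictionary keys (`sample`, `condition`, `batch`, `reads`) and the default batch label are assumptions, not the workflow's configuration schema. As we understand FLAIR's documentation, underscores should be avoided in the first three fields, so the sketch rejects them.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO


def write_flair_manifest(samples: Iterable[Mapping[str, str]], out: TextIO) -> int:
    """Write a four-column, tab-separated FLAIR quantify manifest; return row count.

    Columns: sample name, condition, batch, reads file. The mapping keys used here
    ('sample', 'condition', 'batch', 'reads') are hypothetical.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    rows = 0
    for s in samples:
        labels = [s["sample"], s["condition"], s.get("batch", "batch1")]
        if any("_" in label for label in labels):
            raise ValueError(f"avoid underscores in sample/condition/batch: {labels}")
        writer.writerow(labels + [s["reads"]])
        rows += 1
    return rows


if __name__ == "__main__":
    table = [
        {"sample": "H9rep1", "condition": "H9", "reads": "filtered/H9rep1.fastq"},
        {"sample": "HCT116rep1", "condition": "HCT116", "reads": "filtered/HCT116rep1.fastq"},
    ]
    buf = io.StringIO()
    write_flair_manifest(table, buf)
    print(buf.getvalue(), end="")
```

The resulting file would then be passed to `flair quantify` together with the collapsed isoforms.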
🛠️ Refactor suggestion
Heading levels jump — violates MD001 and hurts document structure
## Implementation is level-2; its subsections (## Input Data…, ## Quality Filtering…, etc.) should be level-3.
-## Input Data and Reference Handling
-## Quality Filtering and Assessment
-## Transcriptome Alignment and Differential Expression Analysis
-## Alternative Splicing Analysis
+### Input Data and Reference Handling
+### Quality Filtering and Assessment
+### Transcriptome Alignment and Differential Expression Analysis
+### Alternative Splicing Analysis
🧰 Tools
🪛 LanguageTool
[style] ~73-~73: ‘Prior to’ might be wordy. Consider a shorter alternative.
Context: .... ## Quality Filtering and Assessment Prior to downstream analysis, reads undergo a co...
(EN_WORDINESS_PREMIUM_PRIOR_TO)
[typographical] ~74-~74: Use a comma after an introductory phrase.
Context: ...rs can specify a read length threshold. For this we make use of the BioPython library [@...
(COMMA_INTRODUCTORY_WORDS_PHRASES)
[uncategorized] ~74-~74: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...on the ONT basecaller for filtering out low quality reads. Sample quality statistics and re...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🤖 Prompt for AI Agents
In paper/paper.md around lines 65 to 84, the heading levels jump incorrectly by
using multiple level-2 headings for subsections under the "Implementation"
section, violating markdown structure rules. Change the subsections "Input Data
and Reference Handling," "Quality Filtering and Assessment," "Transcriptome
Alignment and Differential Expression Analysis," and "Alternative Splicing
Analysis" from level-2 headings (##) to level-3 headings (###) to properly nest
them under the main "Implementation" level-2 heading.
@coderabbitai - this might be correct. We will have a look.
@cmeesters, sounds good! Take your time reviewing the heading structure. The suggestion is there whenever you're ready to address it. Let me know if you have any questions about the markdown formatting or need any other assistance with the paper.
Actionable comments posted: 2
♻️ Duplicate comments (2)
paper/paper.md (2)
29-29: Invalid `date` field still breaks YAML front-matter
Front-matter cannot be parsed because the `date` value contains free text. Use an ISO date and drop the inline comment.
-date: 04 April 2025 <- update
+date: 2025-04-04
65-84: Heading hierarchy jumps (‘##’ → ‘##’) – violates MD001 and hurts structure
Sub-sections under “Implementation” should be level-3 (`###`) to nest properly.
-## Input Data and Reference Handling
-## Quality Filtering and Assessment
-## Transcriptome Alignment and Differential Expression Analysis
-## Alternative Splicing Analysis
+### Input Data and Reference Handling
+### Quality Filtering and Assessment
+### Transcriptome Alignment and Differential Expression Analysis
+### Alternative Splicing Analysis
🧹 Nitpick comments (4)
paper/paper.md (4)
72-75: Minor wording and punctuation fixes
-Prior to downstream analysis, reads undergo a configurable quality control step. Users can specify a read length threshold. For this we make use of the BioPython library ...
+Before downstream analysis, reads undergo a configurable quality-control step. Users can specify a read-length threshold. For this, we make use of the Biopython library ...
88-90: Spelling: “data bases” → “databases”
-... aligned against curated UniRef protein data bases using BLAST or lambda ...
+... aligned against curated UniRef protein databases using BLAST or lambda ...
98-99: Typos and possessive form
-... were accesed on 31 March 2025 ... This example illustrates the workflows capabilities ...
+... were accessed on 31 March 2025 ... This example illustrates the workflow’s capabilities ...
132-134: Populate the Acknowledgements section before publication
The “Any?” placeholder must be replaced with funding sources, institutional support, and contributor credits.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
paper/paper.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
paper/paper.md
[style] ~73-~73: ‘Prior to’ might be wordy. Consider a shorter alternative.
Context: .... ## Quality Filtering and Assessment Prior to downstream analysis, reads undergo a co...
(EN_WORDINESS_PREMIUM_PRIOR_TO)
[typographical] ~74-~74: Use a comma after an introductory phrase.
Context: ...rs can specify a read length threshold. For this we make use of the BioPython library [@...
(COMMA_INTRODUCTORY_WORDS_PHRASES)
[uncategorized] ~74-~74: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...on the ONT basecaller for filtering out low quality reads. Sample quality statistics and re...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[misspelling] ~88-~88: This word is normally spelled as one.
Context: ... aligned against curated UniRef protein data bases using BLAST or lambda. This provides pu...
(EN_COMPOUNDS_DATA_BASES)
[uncategorized] ~98-~98: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...nex-data). This example illustrates the workflows capabilities in isoform analysis and fu...
(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)
[grammar] ~114-~114: This phrase is duplicated. You should probably use “the workflow” only once.
Context: ...ur computing environment. ### Running the workflow The workflow can be started by executing the followi...
(PHRASE_REPETITION)
[uncategorized] ~126-~126: A comma might be missing here.
Context: ...irectory. When working from a different directory specifying the workflow with `--snakefi...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
🪛 markdownlint-cli2 (0.17.2)
paper/paper.md
100-100: No empty links
(MD042, no-empty-links)
104-104: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
117-117: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
```bash
# Run Snakemake with no job limit, a workflow-specific profile, the main
# config file, a separate working directory, and conda-based software
# deployment with a custom prefix for the conda environments.
snakemake -j unlimited \
    --workflow-profile <profile_dir> \
    --configfile ../config/<config_file> \
    --directory <results_dir> \
    --sdm conda \
    --conda-prefix <conda_envs_dir>
```
Usage snippet contains wrong flags and lacks language spec
- `--workflow-profile` is `--profile` in Snakemake ≥5.
- `--sdm_conda` is not a valid flag; use `--use-conda`.
- Fenced block misses a language hint (`bash`).
-```
-snakemake -j unlimited \ # Run snakemake with no job limit
---workflow-profile <profile_dir> \ # Use a specific Snakemake profile
+```bash
+snakemake -j unlimited \ # Run snakemake with no job limit
+--profile <profile_dir> \ # Use a specific Snakemake profile
--configfile ../config/<config_file> \ # Specify the main config file
--directory <results_dir> \ # Set working directory
---sdm_conda \ # Enable conda software deployment
+--use-conda \ # Enable conda software deployment
--conda_prefix <conda_envs_dir> # Custom prefix for conda environments
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
117-117: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In paper/paper.md around lines 117 to 124, update the Snakemake usage snippet by
replacing the incorrect flag --workflow-profile with --profile, and replace the
invalid flag --sdm_conda with --use-conda. Also, add a language hint "bash" to
the fenced code block by changing the opening triple backticks to ```bash to
improve syntax highlighting.
--sdm is short for --software-deployment-method and the correct way would be --sdm conda without the underscore.
there is a difference between --workflow-profile and --profile - here, --workflow-profile is really meant.
🧩 Analysis chain
@cmeesters Thank you for the correction! I appreciate you clarifying the proper Snakemake flag usage. You're absolutely right about both points:
- `--sdm` is indeed short for `--software-deployment-method` and should be `--sdm conda`
- `--workflow-profile` is the correct flag and is different from `--profile`
I apologize for the confusion in my previous suggestion. The only remaining issue would be adding the language specification (bash) to the fenced code block for better syntax highlighting.
🏁 Script executed:
#!/bin/bash
# Search for Snakemake documentation on command-line options to verify the flags
rg -A 5 -B 5 "workflow-profile|sdm|software-deployment-method" --type md
Length of output: 1046
@cmeesters Thank you for the clarification! The search confirms that these flags are specific to your workflow usage.
I see that the paper currently shows --sdm_conda with an underscore, but based on your correction, it should be --sdm conda with a space. So the paper would need this small adjustment:
---sdm_conda \ # Enable conda software deployment
+--sdm conda \ # Enable conda software deployment

And you're absolutely right about `--workflow-profile` being the correct flag. I appreciate you taking the time to correct my misunderstanding!
✏️ Learnings added
Learnt from: cmeesters
PR: snakemake-workflows/rna-longseq-de-isoform#111
File: paper/paper.md:117-124
Timestamp: 2025-07-02T19:53:21.197Z
Learning: In Snakemake, `--sdm` is short for `--software-deployment-method` and should be used as `--sdm conda` (not `--use-conda`).
Learnt from: cmeesters
PR: snakemake-workflows/rna-longseq-de-isoform#111
File: paper/paper.md:117-124
Timestamp: 2025-07-02T19:53:21.197Z
Learning: In Snakemake, `--workflow-profile` is different from `--profile` and is the correct flag for specifying workflow profiles.
Learnt from: cmeesters
PR: snakemake-workflows/rna-longseq-de-isoform#111
File: paper/paper.md:117-124
Timestamp: 2025-07-02T19:53:21.197Z
Learning: In Snakemake, `--sdm` is short for `--software-deployment-method` and should be used as `--sdm conda` (with a space, not underscore).
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

A complete example run using six cDNA Nanopore sequencing datasets from the human H9 and HCT 116 cell lines is provided below. The data originate from the SG-NEx project [@Chen2025-yk] and were accessed on 31 March 2025 from [registry.opendata.aws/sg-nex-data](https://registry.opendata.aws/sg-nex-data). This example illustrates the workflow's capabilities in isoform analysis and functional annotation.

[]() # Add report HTML
Remove empty Markdown link placeholder
Line contains [](), triggering MD042.
-[]() # Add report HTML
+# TODO: embed report HTML once generated
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
100-100: No empty links
(MD042, no-empty-links)
🤖 Prompt for AI Agents
In paper/paper.md at line 100, remove the empty Markdown link placeholder `[]()`
as it triggers the MD042 linting error. Simply delete this placeholder to
resolve the issue.
@coderabbitai: this is a paper draft - we will work upon it.
@cmeesters Understood! Since this is a paper draft in development, I'll hold off on nitpicking the formatting details for now. Feel free to work on the content, and I can help review any specific sections or provide feedback as you develop the paper further. Good luck with the draft!
This PR is not to be merged until after the paper review.
The intention is to develop a paper draft together.
Summary by CodeRabbit
Documentation
New Features