feat: add bioRxiv submission package command (v1.19.0) by paxcalpt · Pull Request #291 · HenriquesLab/rxiv-maker

paxcalpt · 2026-02-05T15:17:12Z

Summary

This PR adds a new rxiv biorxiv command that generates a complete bioRxiv submission package, including author template (TSV), manuscript PDF, source files, and a ready-to-upload ZIP archive.

New Features

bioRxiv Submission Package Command: rxiv biorxiv generates complete submission package
- Generates bioRxiv author template (TSV format) with HTML entity encoding
- Includes manuscript PDF, source files (TeX, figures, bibliography)
- Creates ZIP archive ready for bioRxiv upload
- Supports custom submission directory and ZIP filename options
- HTML entity encoding for special characters, so accented names such as António and Åbo keep their accented letters instead of being stripped
- Automatic handling of multiple corresponding authors (keeps last one)
- Command options: --biorxiv-dir, --zip-filename, --no-zip
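A minimal sketch of how a tab-separated author template like this can be written with Python's standard `csv` module. The `write_author_tsv` helper and the column names are hypothetical; this PR does not list the real bioRxiv template headers. The only convention taken from the PR's tests is that the corresponding-author column holds `Yes` or an empty string.

```python
import csv
from pathlib import Path

# Hypothetical column layout for illustration only; the actual bioRxiv
# template defines its own headers and order.
COLUMNS = ["First Name", "Last Name", "Email", "Institution", "Corresponding Author"]


def write_author_tsv(authors: list[dict], output_path: Path) -> None:
    """Write a header row plus one tab-separated row per author (sketch)."""
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(COLUMNS)
        for author in authors:
            # Naive split: everything before the last space is the first name.
            first, _, last = author.get("name", "").rpartition(" ")
            writer.writerow([
                first,
                last,
                author.get("email", ""),
                author.get("institution", ""),
                # Same convention as the PR's tests: "Yes" or an empty string.
                "Yes" if author.get("corresponding_author", False) else "",
            ])
```

Using `csv.writer` rather than joining strings with `"\t"` means a field that happens to contain a tab or newline is quoted instead of silently adding a column.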

Code Improvements

Centralized Submission Logic: Refactored common patterns into BaseCommand
- Added _clear_output_directory(), _ensure_pdf_built(), _set_submission_defaults() helper methods
- Refactored ArxivCommand and BioRxivCommand to use shared methods
- Eliminated ~64 lines of duplicated code between commands
- Improved maintainability and consistency across submission commands
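The refactor moves shared steps into the base class so each submission command only supplies what differs. A minimal sketch of that pattern follows; only the helper names come from this PR, while their signatures, the `prepare` method, the `target` parameter and the default naming scheme are assumptions (`_ensure_pdf_built()` is omitted because it depends on the build pipeline).

```python
import shutil
from pathlib import Path
from typing import Optional, Tuple


class BaseCommand:
    """Hypothetical base class holding helpers shared by submission commands."""

    def _clear_output_directory(self, output_dir: Path) -> None:
        # Start from an empty directory so files from a previous run never ship.
        if output_dir.exists():
            shutil.rmtree(output_dir)
        output_dir.mkdir(parents=True)

    def _set_submission_defaults(
        self,
        manuscript_path: Path,
        submission_dir: Optional[Path],
        zip_filename: Optional[str],
        target: str,
    ) -> Tuple[Path, Path]:
        # Derive defaults from the manuscript folder name, e.g.
        # <manuscript>/output/MY_PAPER_biorxiv.zip (naming scheme assumed).
        manuscript_path = Path(manuscript_path)
        output_root = manuscript_path / "output"
        if submission_dir is None:
            submission_dir = output_root / f"{target}_submission"
        if zip_filename is None:
            zip_filename = str(output_root / f"{manuscript_path.name}_{target}.zip")
        return Path(submission_dir), Path(zip_filename)


class BioRxivCommand(BaseCommand):
    """Reuses the shared helpers instead of re-implementing them."""

    def prepare(self, manuscript_path, submission_dir=None, zip_filename=None):
        submission_dir, zip_path = self._set_submission_defaults(
            manuscript_path, submission_dir, zip_filename, target="biorxiv"
        )
        self._clear_output_directory(submission_dir)
        return submission_dir, zip_path
```

An arXiv command would call the same helpers with a different `target`, which is where the duplicated code disappears.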

Bug Fixes

bioRxiv Character Encoding: Special characters now properly encoded as HTML entities
- Previously stripped accents to ASCII (António → Antonio)
- Now preserves original characters by writing them as HTML entities (the ó in António is kept as an entity instead of being dropped)
- Complies with bioRxiv's TSV import requirements for international author names
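The difference between the old and new behaviour can be illustrated with the standard library alone. This sketch is not the PR's code: it uses numeric character references (`&#243;`), whereas the PR may emit named entities such as `&oacute;`; the function names are hypothetical.

```python
import unicodedata


def strip_to_ascii(text: str) -> str:
    # Old behaviour: decompose accented letters (NFKD), then drop the
    # non-ASCII combining marks, losing information.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def encode_as_entities(text: str) -> str:
    # New behaviour: keep ASCII unchanged and replace every other character
    # with a numeric HTML character reference, which round-trips losslessly.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, `strip_to_ascii("António")` gives `"Antonio"`, while `encode_as_entities("António")` gives `"Ant&#243;nio"`.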

Testing

Added 25 unit tests for bioRxiv functionality
Test HTML entity encoding, TSV generation, and packaging
All tests passing (23 passed, 2 skipped)

Checklist

🤖 Generated with Claude Code

New Features:
- Add `rxiv biorxiv` command for complete submission package generation
- Generate bioRxiv author template (TSV) with HTML entity encoding
- Include manuscript PDF, source files, and ZIP archive
- Support for custom directories and filenames (--biorxiv-dir, --zip-filename, --no-zip)
- HTML entity encoding for special characters (accented names such as António are preserved)
- Automatic handling of multiple corresponding authors

Code Improvements:
- Centralize common submission logic in BaseCommand
- Add shared helper methods: _clear_output_directory(), _ensure_pdf_built(), _set_submission_defaults()
- Refactor ArxivCommand and BioRxivCommand to use shared methods
- Eliminate ~64 lines of duplicated code

Bug Fixes:
- Fix special character handling for bioRxiv (use HTML entities instead of ASCII stripping)

Tests:
- Add 25 unit tests for bioRxiv functionality
- Test HTML entity encoding, TSV generation, packaging
- All tests passing (23 passed, 2 skipped)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

github-actions · 2026-02-05T15:20:22Z

Pull Request Review: bioRxiv Submission Package Command (v1.19.0)

Overview

This PR adds a new rxiv biorxiv command that generates complete bioRxiv submission packages. The implementation is well-structured with good test coverage and follows the project's established patterns.

Strengths

Code Quality:

Excellent refactoring: The centralization of common submission logic in BaseCommand eliminates ~64 lines of duplication
Strong type hints throughout
Good separation of concerns
Comprehensive docstrings

Testing:

25 unit tests covering core functionality
Good edge case coverage

Feature Implementation:

HTML entity encoding properly handles international characters
Robust error handling
Flexible options for customization

Issues and Recommendations

1. Data Mutation - HIGH PRIORITY
Location: src/rxiv_maker/engines/operations/prepare_biorxiv.py:187-196

The code mutates the original config data, a side effect that violates functional programming principles.
Recommendation: Work with a deep copy using copy.deepcopy()
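Why a deep copy specifically: the author entries are dicts nested inside a list, so a shallow copy of the list still shares them. A minimal sketch of the recommended fix, using a hypothetical helper name:

```python
import copy


def unmark_all_but_last(authors: list[dict]) -> list[dict]:
    """Return a copy of `authors` with only the last corresponding author flagged."""
    # A shallow copy (list(authors) or copy.copy) would still share the nested
    # author dicts, so the assignment below would leak into the caller's config.
    authors = copy.deepcopy(authors)
    flagged = [i for i, a in enumerate(authors) if a.get("corresponding_author", False)]
    for idx in flagged[:-1]:
        authors[idx]["corresponding_author"] = False
    return authors
```

The caller's `config["authors"]` keeps both flags set, and the returned list has only the last one set.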

2. Inconsistent Error Handling - MEDIUM PRIORITY
Location: src/rxiv_maker/engines/operations/prepare_biorxiv.py:91-98

The code raises an error for multiple corresponding authors but then fixes it silently elsewhere.
Recommendation: Choose one approach - auto-fix with warning OR always error
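One way to make the choice explicit is a single function with a policy switch, so validation and auto-correction cannot disagree. This is a sketch only: `resolve_corresponding_authors` and its `strict` flag are hypothetical, and `BioRxivAuthorError` below is a stand-in for the PR's exception class of the same name, whose base class is not shown.

```python
import logging

logger = logging.getLogger(__name__)


class BioRxivAuthorError(ValueError):
    """Stand-in for the PR's author validation error (base class assumed)."""


def resolve_corresponding_authors(authors: list[dict], strict: bool = False) -> list[dict]:
    flagged = [i for i, a in enumerate(authors) if a.get("corresponding_author", False)]
    if len(flagged) <= 1:
        return authors
    names = ", ".join(authors[i].get("name", "Unknown") for i in flagged)
    if strict:
        # Policy A: refuse and let the user fix the config.
        raise BioRxivAuthorError(f"Expected at most one corresponding author, found: {names}")
    # Policy B: fix automatically (on copies of each dict) and warn visibly.
    fixed = [dict(a) for a in authors]
    for i in flagged[:-1]:
        fixed[i]["corresponding_author"] = False
    logger.warning(
        "Multiple corresponding authors (%s); keeping only %s",
        names,
        fixed[flagged[-1]].get("name", "Unknown"),
    )
    return fixed
```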

3. Performance Issue - MEDIUM PRIORITY
Location: src/rxiv_maker/engines/operations/prepare_biorxiv.py:45-51

The dictionary is rebuilt on every function call, which is expensive.
Recommendation: Cache the mapping at module level
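A minimal sketch of a module-level cache. The PR's actual mapping is not shown here, so this assumes named entities from the standard library's `html.entities.codepoint2name`, with numeric references as a fallback for characters that have no named entity.

```python
from html.entities import codepoint2name

# Built once at import time instead of on every call. ASCII code points
# (including &, <, >) are excluded so plain text passes through unchanged.
_ENTITY_TABLE = {cp: f"&{name};" for cp, name in codepoint2name.items() if cp > 127}


def encode_entities(text: str) -> str:
    # str.translate looks each character up in the prebuilt table.
    named = text.translate(_ENTITY_TABLE)
    # Characters without a named entity (e.g. Ł) fall back to &#NNN; form.
    return named.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, `encode_entities("António")` gives `"Ant&oacute;nio"`.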

4. Missing ZIP Path Validation - MEDIUM PRIORITY
Location: src/rxiv_maker/engines/operations/prepare_biorxiv.py:371

Could fail silently or create files in unexpected locations.
Recommendation: Add path validation and ensure parent directory exists
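A sketch of what that validation could look like; `create_submission_zip` and its checks are assumptions, not the PR's implementation. It resolves both paths, refuses to write the archive inside the folder being archived, creates the parent directory, and skips symlinks (which also touches the symlink concern under Security Considerations).

```python
import zipfile
from pathlib import Path


def create_submission_zip(source_dir: Path, zip_path: Path) -> Path:
    source_dir = Path(source_dir).resolve()
    zip_path = Path(zip_path).resolve()
    if not source_dir.is_dir():
        raise FileNotFoundError(f"Submission directory not found: {source_dir}")
    if zip_path.suffix.lower() != ".zip":
        raise ValueError(f"Expected a .zip filename, got: {zip_path.name}")
    if zip_path.is_relative_to(source_dir):
        # An archive written inside the folder being archived would include itself.
        raise ValueError("ZIP file must not be created inside the directory being archived")
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(source_dir.rglob("*")):
            # Skip symlinks so a link cannot pull in files from outside the package.
            if file.is_symlink() or not file.is_file():
                continue
            zf.write(file, arcname=file.relative_to(source_dir).as_posix())
    return zip_path
```

`Path.is_relative_to` requires Python 3.9 or later.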

5. Test Coverage Gaps

Integration test for full CLI workflow
Error path tests for package and ZIP creation

6. .gitignore Changes - MINOR
The additions of **/CLAUDE.md and **/*.docx to .gitignore should be explained

Security Considerations

Ensure no TSV injection via control characters
Verify symlinks handled safely in ZIP creation
Validate base64 email decoding
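A sketch covering two of these points, assuming (as the review implies) that author emails are stored base64-encoded in the config. Both function names are hypothetical, and the email check is intentionally loose rather than full RFC 5322 validation.

```python
import base64
import binascii
import unicodedata


def sanitize_tsv_field(value: str) -> str:
    # Replace control characters (Unicode category "Cc": tab, newline, CR, NUL, ...)
    # with spaces so a field can never split a row or add a column.
    cleaned = "".join(" " if unicodedata.category(ch) == "Cc" else ch for ch in value)
    return cleaned.strip()


def decode_email64(encoded: str) -> str:
    """Decode a base64-encoded email address and sanity-check the result (sketch)."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        email = base64.b64decode(encoded, validate=True).decode("utf-8").strip()
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError(f"Invalid base64-encoded email: {encoded!r}") from exc
    local, at, domain = email.partition("@")
    has_bad_chars = any(ch.isspace() or unicodedata.category(ch) == "Cc" for ch in email)
    if not (local and at and "." in domain) or has_bad_chars:
        raise ValueError(f"Decoded value is not a plausible email: {email!r}")
    return email
```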

Final Verdict

Recommended Action: APPROVE WITH MINOR REVISIONS

Priority fixes before merge:

Fix data mutation issue (HIGH)
Resolve inconsistent error handling (MEDIUM)
Add ZIP path validation (MEDIUM)

Can be addressed in follow-up:

Performance optimization
Test coverage expansion
Other minor improvements

Excellent work on this feature!

Copilot

Pull request overview

This PR adds a new rxiv biorxiv command that generates complete bioRxiv submission packages, including author templates with HTML entity encoding for international characters, manuscript PDFs, source files, and ZIP archives. The PR also refactors common submission logic into BaseCommand helper methods to reduce code duplication between the arxiv and biorxiv commands.

Changes:

Added new rxiv biorxiv command with TSV author template generation using HTML entity encoding
Refactored submission command logic into shared BaseCommand helper methods (_clear_output_directory, _ensure_pdf_built, _set_submission_defaults)
Added 25 unit tests for bioRxiv functionality covering HTML entity encoding, TSV generation, and validation

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.


File	Description
src/rxiv_maker/engines/operations/prepare_biorxiv.py	Core bioRxiv functionality: HTML entity encoding, TSV generation, package preparation, and ZIP creation
src/rxiv_maker/cli/commands/biorxiv.py	CLI command definition for bioRxiv submission package generation
src/rxiv_maker/cli/framework/base.py	Added shared helper methods for submission commands
src/rxiv_maker/cli/framework/workflow_commands.py	Refactored ArxivCommand to use shared helpers and added BioRxivCommand implementation
src/rxiv_maker/cli/main.py	Registered biorxiv command in CLI
src/rxiv_maker/cli/commands/__init__.py	Exported biorxiv command
tests/unit/test_prepare_biorxiv.py	Comprehensive unit tests for bioRxiv functionality
tests/unit/test_biorxiv_command.py	Basic CLI command test
src/rxiv_maker/version.py	Version bump to 1.19.0
CHANGELOG.md	Documented new features and changes
.gitignore	Added development file patterns


Copilot · 2026-02-05T15:25:23Z

src/rxiv_maker/engines/operations/prepare_biorxiv.py

+    # Handle multiple corresponding authors: keep only the last one
+    corresponding_indices = [i for i, author in enumerate(authors) if author.get("corresponding_author", False)]
+    if len(corresponding_indices) > 1:
+        # Unmark all but the last corresponding author
+        for idx in corresponding_indices[:-1]:
+            authors[idx]["corresponding_author"] = False
+        logger.warning(
+            f"Multiple corresponding authors found. Only keeping the last one: "
+            f"{authors[corresponding_indices[-1]].get('name', 'Unknown')}"
+        )
+
+    # Validate author data
+    validate_author_data(authors)
+


The logic for handling multiple corresponding authors is confusing. Lines 187-196 automatically keep only the last corresponding author, but then lines 198-199 call validate_author_data() which checks for multiple corresponding authors and would raise an error. However, since the multiple authors have already been handled, the validation check for multiple corresponding authors becomes dead code.

Consider either: (1) moving the validation before the automatic handling, and only handle multiple authors if validation fails, or (2) removing the multiple-corresponding-author check from validate_author_data() since it's handled at the call site. The current approach makes the validation function's behavior inconsistent with its actual usage.

Suggested change

-    # Handle multiple corresponding authors: keep only the last one
-    corresponding_indices = [i for i, author in enumerate(authors) if author.get("corresponding_author", False)]
-    if len(corresponding_indices) > 1:
-        # Unmark all but the last corresponding author
-        for idx in corresponding_indices[:-1]:
-            authors[idx]["corresponding_author"] = False
-        logger.warning(
-            f"Multiple corresponding authors found. Only keeping the last one: "
-            f"{authors[corresponding_indices[-1]].get('name', 'Unknown')}"
-        )
-
-    # Validate author data
-    validate_author_data(authors)
+    # Validate author data; if validation fails due to multiple corresponding authors,
+    # automatically keep only the last corresponding author and re-validate.
+    try:
+        validate_author_data(authors)
+    except BioRxivAuthorError:
+        corresponding_indices = [
+            i for i, author in enumerate(authors) if author.get("corresponding_author", False)
+        ]
+        if len(corresponding_indices) > 1:
+            # Unmark all but the last corresponding author
+            for idx in corresponding_indices[:-1]:
+                authors[idx]["corresponding_author"] = False
+            logger.warning(
+                f"Multiple corresponding authors found. Only keeping the last one: "
+                f"{authors[corresponding_indices[-1]].get('name', 'Unknown')}"
+            )
+            # Re-validate after automatic correction
+            validate_author_data(authors)
+        else:
+            # The validation error was not due to multiple corresponding authors
+            raise

Copilot · 2026-02-05T15:25:23Z

src/rxiv_maker/engines/operations/prepare_biorxiv.py

+        Path to the created ZIP file
+    """
+    # Use manuscript-aware naming if manuscript path is provided
+    if manuscript_path and zip_filename == "biorxiv_submission.zip":


The condition on line 367 will never be true in practice. The default zip_filename is set by _set_submission_defaults() to include the full path (e.g., "/path/to/output/manuscript_biorxiv.zip"), not just "biorxiv_submission.zip". This makes lines 367-369 unreachable code.

The manuscript_path parameter is already used by the caller to set an appropriate default filename via _set_submission_defaults(), so these lines appear to be redundant. Consider removing them or adjusting the logic to check if the filename matches the expected pattern rather than an exact string match.

Suggested change

-    if manuscript_path and zip_filename == "biorxiv_submission.zip":
+    if manuscript_path and Path(zip_filename).name == "biorxiv_submission.zip":
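A small sketch of why the exact string match never fires and how comparing only the final path component behaves. `resolve_zip_path` and the `<manuscript>_biorxiv.zip` renaming are hypothetical illustrations, not the PR's code.

```python
from pathlib import Path
from typing import Optional

DEFAULT_ZIP_NAME = "biorxiv_submission.zip"


def resolve_zip_path(zip_filename: str, manuscript_path: Optional[str]) -> Path:
    zip_path = Path(zip_filename)
    # Compare only the final component: the caller may pass a full path such as
    # /path/to/output/biorxiv_submission.zip, which never equals the bare name.
    if manuscript_path and zip_path.name == DEFAULT_ZIP_NAME:
        # Hypothetical manuscript-aware naming, e.g. MY_PAPER_biorxiv.zip.
        return zip_path.with_name(f"{Path(manuscript_path).name}_biorxiv.zip")
    return zip_path
```

A user-supplied name such as `custom.zip` is returned unchanged.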

Copilot · 2026-02-05T15:25:24Z

tests/unit/test_prepare_biorxiv.py

+    def test_multiple_authors(self, tmp_path):
+        """Test TSV with multiple authors."""
+        config_path = tmp_path / "00_CONFIG.yml"
+        config_content = """
+authors:
+  - name: John Smith
+    email: john@example.com
+    affiliations: [inst1]
+    corresponding_author: false
+  - name: Jane Doe
+    email: jane@example.com
+    affiliations: [inst2]
+    corresponding_author: true
+
+affiliations:
+  - shortname: inst1
+    full_name: University A
+  - shortname: inst2
+    full_name: University B
+"""
+        config_path.write_text(config_content)
+
+        output_path = tmp_path / "biorxiv_authors.tsv"
+        generate_biorxiv_author_tsv(config_path, output_path)
+
+        with open(output_path, newline="", encoding="utf-8") as f:
+            reader = csv.reader(f, delimiter="\t")
+            rows = list(reader)
+
+        assert len(rows) == 3  # Header + 2 authors
+        assert rows[1][6] == ""  # First author not corresponding (empty string)
+        assert rows[2][6] == "Yes"  # Second author is corresponding
+


Missing test coverage for the automatic handling of multiple corresponding authors. While the PR description mentions "Automatic handling of multiple corresponding authors (keeps last one)" and there's code to implement this (lines 187-196), there's no test that verifies this behavior works correctly.

Consider adding a test case where the config has multiple authors marked as corresponding authors to verify that: (1) the function succeeds without raising an error, (2) only the last author is marked as corresponding in the output TSV, and (3) a warning is logged.
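A sketch of such a test using `unittest.TestCase.assertLogs`. The real test would call `generate_biorxiv_author_tsv` on a config with two corresponding authors and inspect the TSV column; to stay self-contained, this version exercises a hypothetical stand-in, `keep_last_corresponding`, that mirrors the logic in the diff above. Under pytest, the `caplog` fixture would serve the same purpose.

```python
import logging
import unittest

logger = logging.getLogger("prepare_biorxiv_sketch")


def keep_last_corresponding(authors: list[dict]) -> list[dict]:
    """Hypothetical stand-in: keep only the last corresponding author, with a warning."""
    authors = [dict(a) for a in authors]
    flagged = [i for i, a in enumerate(authors) if a.get("corresponding_author", False)]
    if len(flagged) > 1:
        for idx in flagged[:-1]:
            authors[idx]["corresponding_author"] = False
        logger.warning(
            "Multiple corresponding authors found. Only keeping the last one: %s",
            authors[flagged[-1]].get("name", "Unknown"),
        )
    return authors


class TestMultipleCorrespondingAuthors(unittest.TestCase):
    def test_keeps_only_last_and_warns(self):
        authors = [
            {"name": "John Smith", "corresponding_author": True},
            {"name": "Jane Doe", "corresponding_author": True},
        ]
        with self.assertLogs(logger, level="WARNING") as captured:
            result = keep_last_corresponding(authors)  # (1) no exception raised
        # (2) only the last author remains corresponding
        self.assertEqual([a["corresponding_author"] for a in result], [False, True])
        # (3) a warning naming the kept author was logged
        self.assertIn("Jane Doe", captured.output[0])
```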

Copilot AI review requested due to automatic review settings February 5, 2026 15:17

Copilot started reviewing on behalf of paxcalpt February 5, 2026 15:17

Copilot AI reviewed Feb 5, 2026


paxcalpt merged commit 44b5aac into main Feb 5, 2026
15 of 16 checks passed

paxcalpt deleted the feature/biorxiv-submission-v1.19.0 branch February 5, 2026 15:31

paxcalpt merged 1 commit into main from feature/biorxiv-submission-v1.19.0
