feat: add bioRxiv submission package command (v1.19.0) #291

Merged
paxcalpt merged 1 commit into main from feature/biorxiv-submission-v1.19.0
Feb 5, 2026

@paxcalpt paxcalpt commented Feb 5, 2026

Summary

This PR adds a new rxiv biorxiv command that generates a complete bioRxiv submission package, including author template (TSV), manuscript PDF, source files, and a ready-to-upload ZIP archive.

New Features

  • bioRxiv Submission Package Command: rxiv biorxiv generates complete submission package
    • Generates bioRxiv author template (TSV format) with HTML entity encoding
    • Includes manuscript PDF, source files (TeX, figures, bibliography)
    • Creates ZIP archive ready for bioRxiv upload
    • Supports custom submission directory and ZIP filename options
    • HTML entity encoding for special characters (António → Ant&oacute;nio, Åbo → &Aring;bo)
    • Automatic handling of multiple corresponding authors (keeps last one)
    • Command options: --biorxiv-dir, --zip-filename, --no-zip
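
The entity encoding listed above can be illustrated with a minimal sketch. It assumes named entities are preferred wherever the Python standard library knows one, with numeric references as a fallback. `encode_html_entities` is a hypothetical name and not necessarily the function in `prepare_biorxiv.py`.

```python
from html.entities import codepoint2name


def encode_html_entities(text: str) -> str:
    """Replace non-ASCII characters with HTML entities (hypothetical helper)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            parts.append(ch)  # ASCII passes through unchanged
        elif cp in codepoint2name:
            parts.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &oacute;
        else:
            parts.append(f"&#{cp};")  # numeric fallback for everything else
    return "".join(parts)


print(encode_html_entities("Ant\u00f3nio"))  # Ant&oacute;nio
print(encode_html_entities("\u00c5bo"))      # &Aring;bo
```

This sketch leaves ASCII characters such as `&` and `<` untouched; whether the real implementation escapes them is not stated in this PR.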

Code Improvements

  • Centralized Submission Logic: Refactored common patterns into BaseCommand
    • Added _clear_output_directory(), _ensure_pdf_built(), _set_submission_defaults() helper methods
    • Refactored ArxivCommand and BioRxivCommand to use shared methods
    • Eliminated ~64 lines of duplicated code between commands
    • Improved maintainability and consistency across submission commands

Bug Fixes

  • bioRxiv Character Encoding: Special characters now properly encoded as HTML entities
    • Previously stripped accents to ASCII (António → Antonio)
    • Now preserves original characters using HTML entities (António → Ant&oacute;nio)
    • Complies with bioRxiv's TSV import requirements for international author names

Testing

  • Added 25 unit tests for bioRxiv functionality
  • Test HTML entity encoding, TSV generation, and packaging
  • All tests passing (23 passed, 2 skipped)

Checklist

  • Version bumped to 1.19.0
  • CHANGELOG.md updated
  • All tests passing
  • Pre-commit hooks passing
  • Code formatted with ruff

🤖 Generated with Claude Code

New Features:
- Add `rxiv biorxiv` command for complete submission package generation
- Generate bioRxiv author template (TSV) with HTML entity encoding
- Include manuscript PDF, source files, and ZIP archive
- Support for custom directories and filenames (--biorxiv-dir, --zip-filename, --no-zip)
- HTML entity encoding for special characters (António → Ant&oacute;nio)
- Automatic handling of multiple corresponding authors

Code Improvements:
- Centralize common submission logic in BaseCommand
- Add shared helper methods: _clear_output_directory(), _ensure_pdf_built(), _set_submission_defaults()
- Refactor ArxivCommand and BioRxivCommand to use shared methods
- Eliminate ~64 lines of duplicated code

Bug Fixes:
- Fix special character handling for bioRxiv (use HTML entities instead of ASCII stripping)

Tests:
- Add 25 unit tests for bioRxiv functionality
- Test HTML entity encoding, TSV generation, packaging
- All tests passing (23 passed, 2 skipped)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 5, 2026 15:17

github-actions bot commented Feb 5, 2026

Pull Request Review: bioRxiv Submission Package Command (v1.19.0)

Overview

This PR adds a new rxiv biorxiv command that generates complete bioRxiv submission packages. The implementation is well-structured with good test coverage and follows the project's established patterns.

Strengths

Code Quality:

  • Excellent refactoring: The centralization of common submission logic in BaseCommand eliminates ~64 lines of duplication
  • Strong type hints throughout
  • Good separation of concerns
  • Comprehensive docstrings

Testing:

  • 25 unit tests covering core functionality
  • Good edge case coverage

Feature Implementation:

  • HTML entity encoding properly handles international characters
  • Robust error handling
  • Flexible options for customization

Issues and Recommendations

1. Data Mutation - HIGH PRIORITY
Location: src/rxiv_maker/engines/operations/prepare_biorxiv.py:187-196

The code mutates the original config data in place, which is a side effect that violates functional programming principles.
Recommendation: Work with a deep copy using copy.deepcopy()
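
A minimal sketch of the deep-copy recommendation, assuming authors are plain dictionaries as in the snippets quoted in this PR. `keep_last_corresponding` is a hypothetical helper name, not a function from the codebase.

```python
import copy
import logging

logger = logging.getLogger(__name__)


def keep_last_corresponding(authors: list[dict]) -> list[dict]:
    """Return a copy of authors with at most one corresponding author.

    Hypothetical helper: the input list and its dicts are left untouched.
    """
    result = copy.deepcopy(authors)
    flagged = [i for i, a in enumerate(result) if a.get("corresponding_author", False)]
    if len(flagged) > 1:
        # Unmark all but the last corresponding author, on the copy only
        for idx in flagged[:-1]:
            result[idx]["corresponding_author"] = False
        logger.warning(
            "Multiple corresponding authors found. Only keeping the last one: %s",
            result[flagged[-1]].get("name", "Unknown"),
        )
    return result


original = [
    {"name": "John Smith", "corresponding_author": True},
    {"name": "Jane Doe", "corresponding_author": True},
]
normalized = keep_last_corresponding(original)
```

After the call, `original` still marks both authors as corresponding, while `normalized` keeps only the last one.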

2. Inconsistent Error Handling - MEDIUM PRIORITY
Location: src/rxiv_maker/engines/operations/prepare_biorxiv.py:91-98

The code raises an error for multiple corresponding authors but then fixes it silently elsewhere.
Recommendation: Choose one approach - auto-fix with warning OR always error

3. Performance Issue - MEDIUM PRIORITY
Location: src/rxiv_maker/engines/operations/prepare_biorxiv.py:45-51

The dictionary is rebuilt on every function call, which is unnecessarily expensive.
Recommendation: Cache the mapping at module level
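
The review does not show which dictionary is meant. Assuming it is the character-to-entity mapping, module-level caching could look like the sketch below, where the table is built once at import time and reused through `str.translate`. The names `_ENTITY_TABLE` and `encode_with_cached_table` are hypothetical.

```python
from html.entities import codepoint2name

# Built once at import time instead of on every call.
# Maps each non-ASCII code point that has a named entity to its "&name;" form.
_ENTITY_TABLE = {cp: f"&{name};" for cp, name in codepoint2name.items() if cp > 127}


def encode_with_cached_table(text: str) -> str:
    """Encode using the precomputed table (hypothetical helper).

    Characters without a named entity are left as-is in this sketch.
    """
    return text.translate(_ENTITY_TABLE)


encoded = encode_with_cached_table("Ant\u00f3nio")
```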

4. Missing ZIP Path Validation - MEDIUM PRIORITY
Location: src/rxiv_maker/engines/operations/prepare_biorxiv.py:371

Could fail silently or create files in unexpected locations.
Recommendation: Add path validation and ensure parent directory exists
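
One way to read this recommendation, as a sketch under stated assumptions: validate the inputs up front and create the parent directory before opening the archive. `create_zip` and the specific checks are hypothetical and are not taken from `prepare_biorxiv.py`.

```python
import tempfile
import zipfile
from pathlib import Path


def create_zip(source_dir: Path, zip_path: Path) -> Path:
    """Zip the regular files under source_dir into zip_path (hypothetical helper)."""
    source_dir = Path(source_dir)
    zip_path = Path(zip_path)
    if not source_dir.is_dir():
        raise FileNotFoundError(f"Submission directory not found: {source_dir}")
    if zip_path.suffix.lower() != ".zip":
        raise ValueError(f"ZIP filename must end in .zip: {zip_path}")
    # Create missing parent directories rather than failing inside ZipFile()
    zip_path.parent.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source_dir.rglob("*")):
            # Skip directories and symlinks
            if path.is_symlink() or not path.is_file():
                continue
            # Skip the archive itself if it lives inside source_dir
            if path.resolve() == zip_path.resolve():
                continue
            zf.write(path, arcname=path.relative_to(source_dir).as_posix())
    return zip_path


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "biorxiv_submission"
    src.mkdir()
    (src / "manuscript.tex").write_text("\\documentclass{article}")
    archive = create_zip(src, Path(tmp) / "out" / "nested" / "submission.zip")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
```

Skipping symlinks here is a design choice for the sketch; it also touches on the symlink concern raised under Security Considerations.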

5. Test Coverage Gaps

  • Integration test for full CLI workflow
  • Error path tests for package and ZIP creation

6. .gitignore Changes - MINOR
The purpose of the added **/CLAUDE.md and **/*.docx patterns should be clarified.

Security Considerations

  • Ensure no TSV injection via control characters
  • Verify symlinks handled safely in ZIP creation
  • Validate base64 email decoding
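
For the TSV injection point, a minimal sketch of field sanitization, assuming the goal is to keep embedded tabs and newlines from creating extra columns or rows. `sanitize_tsv_field` is a hypothetical helper and the replacement policy (control characters become spaces) is an assumption.

```python
import csv
import io
import re

# Control characters (including tab, CR, LF) that could break TSV rows
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_tsv_field(value: str) -> str:
    """Replace control characters with spaces and collapse runs of spaces."""
    cleaned = _CONTROL_CHARS.sub(" ", value)
    return re.sub(r" {2,}", " ", cleaned).strip()


buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow([sanitize_tsv_field("Jane\tDoe\nEvil Row"), "jane@example.com"])
row_text = buffer.getvalue()
```

The written row contains exactly one tab separator and one line terminator, so the injected tab and newline no longer alter the table layout.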

Final Verdict

Recommended Action: APPROVE WITH MINOR REVISIONS

Priority fixes before merge:

  1. Fix data mutation issue (HIGH)
  2. Resolve inconsistent error handling (MEDIUM)
  3. Add ZIP path validation (MEDIUM)

Can be addressed in follow-up:

  • Performance optimization
  • Test coverage expansion
  • Other minor improvements

Excellent work on this feature!

Copilot AI left a comment

Pull request overview

This PR adds a new rxiv biorxiv command that generates complete bioRxiv submission packages, including author templates with HTML entity encoding for international characters, manuscript PDFs, source files, and ZIP archives. The PR also refactors common submission logic into BaseCommand helper methods to reduce code duplication between the arxiv and biorxiv commands.

Changes:

  • Added new rxiv biorxiv command with TSV author template generation using HTML entity encoding
  • Refactored submission command logic into shared BaseCommand helper methods (_clear_output_directory, _ensure_pdf_built, _set_submission_defaults)
  • Added 25 unit tests for bioRxiv functionality covering HTML entity encoding, TSV generation, and validation

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/rxiv_maker/engines/operations/prepare_biorxiv.py | Core bioRxiv functionality: HTML entity encoding, TSV generation, package preparation, and ZIP creation |
| src/rxiv_maker/cli/commands/biorxiv.py | CLI command definition for bioRxiv submission package generation |
| src/rxiv_maker/cli/framework/base.py | Added shared helper methods for submission commands |
| src/rxiv_maker/cli/framework/workflow_commands.py | Refactored ArxivCommand to use shared helpers and added BioRxivCommand implementation |
| src/rxiv_maker/cli/main.py | Registered biorxiv command in CLI |
| src/rxiv_maker/cli/commands/__init__.py | Exported biorxiv command |
| tests/unit/test_prepare_biorxiv.py | Comprehensive unit tests for bioRxiv functionality |
| tests/unit/test_biorxiv_command.py | Basic CLI command test |
| src/rxiv_maker/__version__.py | Version bump to 1.19.0 |
| CHANGELOG.md | Documented new features and changes |
| .gitignore | Added development file patterns |

Comment on lines +187 to +200
    # Handle multiple corresponding authors: keep only the last one
    corresponding_indices = [i for i, author in enumerate(authors) if author.get("corresponding_author", False)]
    if len(corresponding_indices) > 1:
        # Unmark all but the last corresponding author
        for idx in corresponding_indices[:-1]:
            authors[idx]["corresponding_author"] = False
        logger.warning(
            f"Multiple corresponding authors found. Only keeping the last one: "
            f"{authors[corresponding_indices[-1]].get('name', 'Unknown')}"
        )

    # Validate author data
    validate_author_data(authors)

Copilot AI Feb 5, 2026

The logic for handling multiple corresponding authors is confusing. Lines 187-196 automatically keep only the last corresponding author, but then lines 198-199 call validate_author_data() which checks for multiple corresponding authors and would raise an error. However, since the multiple authors have already been handled, the validation check for multiple corresponding authors becomes dead code.

Consider either: (1) moving the validation before the automatic handling, and only handle multiple authors if validation fails, or (2) removing the multiple-corresponding-author check from validate_author_data() since it's handled at the call site. The current approach makes the validation function's behavior inconsistent with its actual usage.

Suggested change
-    # Handle multiple corresponding authors: keep only the last one
-    corresponding_indices = [i for i, author in enumerate(authors) if author.get("corresponding_author", False)]
-    if len(corresponding_indices) > 1:
-        # Unmark all but the last corresponding author
-        for idx in corresponding_indices[:-1]:
-            authors[idx]["corresponding_author"] = False
-        logger.warning(
-            f"Multiple corresponding authors found. Only keeping the last one: "
-            f"{authors[corresponding_indices[-1]].get('name', 'Unknown')}"
-        )
-    # Validate author data
-    validate_author_data(authors)
+    # Validate author data; if validation fails due to multiple corresponding authors,
+    # automatically keep only the last corresponding author and re-validate.
+    try:
+        validate_author_data(authors)
+    except BioRxivAuthorError:
+        corresponding_indices = [
+            i for i, author in enumerate(authors) if author.get("corresponding_author", False)
+        ]
+        if len(corresponding_indices) > 1:
+            # Unmark all but the last corresponding author
+            for idx in corresponding_indices[:-1]:
+                authors[idx]["corresponding_author"] = False
+            logger.warning(
+                f"Multiple corresponding authors found. Only keeping the last one: "
+                f"{authors[corresponding_indices[-1]].get('name', 'Unknown')}"
+            )
+            # Re-validate after automatic correction
+            validate_author_data(authors)
+        else:
+            # The validation error was not due to multiple corresponding authors
+            raise

        Path to the created ZIP file
    """
    # Use manuscript-aware naming if manuscript path is provided
    if manuscript_path and zip_filename == "biorxiv_submission.zip":

Copilot AI Feb 5, 2026

The condition on line 367 will never be true in practice. The default zip_filename is set by _set_submission_defaults() to include the full path (e.g., "/path/to/output/manuscript_biorxiv.zip"), not just "biorxiv_submission.zip". This makes lines 367-369 unreachable code.

The manuscript_path parameter is already used by the caller to set an appropriate default filename via _set_submission_defaults(), so these lines appear to be redundant. Consider removing them or adjusting the logic to check if the filename matches the expected pattern rather than an exact string match.

Suggested change
-    if manuscript_path and zip_filename == "biorxiv_submission.zip":
+    if manuscript_path and Path(zip_filename).name == "biorxiv_submission.zip":
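
The difference between the two comparisons in this suggestion can be shown with a small sketch. The path below is illustrative only, and `is_default_zip_name` is a hypothetical helper.

```python
from pathlib import Path

DEFAULT_ZIP_NAME = "biorxiv_submission.zip"  # default name quoted in the diff


def is_default_zip_name(zip_filename: str) -> bool:
    """Compare only the final path component against the default name."""
    return Path(zip_filename).name == DEFAULT_ZIP_NAME


full_path = "/path/to/output/biorxiv_submission.zip"
exact_match = full_path == DEFAULT_ZIP_NAME      # whole-string comparison
basename_match = is_default_zip_name(full_path)  # basename comparison
```

Here `exact_match` is `False` and `basename_match` is `True`. A name such as `manuscript_biorxiv.zip`, the example given in the comment, matches under neither comparison.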

Comment on lines +283 to +315
    def test_multiple_authors(self, tmp_path):
        """Test TSV with multiple authors."""
        config_path = tmp_path / "00_CONFIG.yml"
        config_content = """
authors:
  - name: John Smith
    email: john@example.com
    affiliations: [inst1]
    corresponding_author: false
  - name: Jane Doe
    email: jane@example.com
    affiliations: [inst2]
    corresponding_author: true

affiliations:
  - shortname: inst1
    full_name: University A
  - shortname: inst2
    full_name: University B
"""
        config_path.write_text(config_content)

        output_path = tmp_path / "biorxiv_authors.tsv"
        generate_biorxiv_author_tsv(config_path, output_path)

        with open(output_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f, delimiter="\t")
            rows = list(reader)

        assert len(rows) == 3  # Header + 2 authors
        assert rows[1][6] == ""  # First author not corresponding (empty string)
        assert rows[2][6] == "Yes"  # Second author is corresponding

Copilot AI Feb 5, 2026

Missing test coverage for the automatic handling of multiple corresponding authors. While the PR description mentions "Automatic handling of multiple corresponding authors (keeps last one)" and there's code to implement this (lines 187-196), there's no test that verifies this behavior works correctly.

Consider adding a test case where the config has multiple authors marked as corresponding authors to verify that: (1) the function succeeds without raising an error, (2) only the last author is marked as corresponding in the output TSV, and (3) a warning is logged.

@paxcalpt paxcalpt merged commit 44b5aac into main Feb 5, 2026
15 of 16 checks passed
@paxcalpt paxcalpt deleted the feature/biorxiv-submission-v1.19.0 branch February 5, 2026 15:31