
feat: add readMedia tool with PDF support#544

Merged
buger merged 1 commit into main from feat/pdf-support
Mar 20, 2026


@buger buger commented Mar 20, 2026

Summary

  • Add unified readMedia tool that handles both images (png, jpg, jpeg, webp, bmp, svg) and documents (pdf)
  • PDFs sent via Vercel AI SDK file content part, natively supported by Claude (32MB), OpenAI (50MB), and Gemini (50MB)
  • readImage preserved as backward-compatible alias pointing to the same execute function
  • New mediaConfig.js centralizes format/MIME config, replacing imageConfig.js (which is kept for external consumers)

Backward Compatibility

  • readImage tool name still works (alias)
  • loadImageIfValid() still works (alias to loadMediaIfValid())
  • getCurrentImages() still returns only image data URLs (filters out documents)
  • readImageSchema still exported at all levels
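
The alias arrangement listed above can be sketched roughly as follows. This is only an illustration, not the code in ProbeAgent.js: `createMediaTools`, the shape of the returned object, and the return strings are assumptions made for the example.

```javascript
// Hypothetical sketch: two tool names sharing one execute function.
// The real registration lives in ProbeAgent.js and may differ.
function createMediaTools(loadMediaIfValid) {
  // Single implementation used by both tool names.
  const readMediaExecute = async ({ path }) => {
    const loaded = await loadMediaIfValid(path);
    return loaded ? `Loaded media: ${path}` : `Could not load media: ${path}`;
  };

  return {
    readMedia: { execute: readMediaExecute },
    // Backward-compatible alias: same function object under the old name.
    readImage: { execute: readMediaExecute },
  };
}

// Stand-in loader for the example; accepts .png and .pdf paths only.
const tools = createMediaTools(async (p) => p.endsWith('.pdf') || p.endsWith('.png'));
```

Because both entries reference the same function object, a fix to the execute logic applies to both tool names at once.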

New APIs

  • getCurrentMedia() — returns Vercel AI SDK content parts for both images and documents
  • readMediaSchema — exported alongside readImageSchema
  • Documents stored as { type: 'document', mimeType, data, filename } objects in pendingImages

Test plan

  • All 27 readImage/readMedia tests pass (7 new PDF-specific tests)
  • Full test suite: 3092/3092 tests pass, 130 suites
  • Verify readImage alias works for both images and PDFs
  • Verify getCurrentImages() excludes PDFs (backward compat)
  • Verify getCurrentMedia() returns both image and file content parts
  • Verify PDF stored as document type with correct mimeType

🤖 Generated with Claude Code

…dImage alias

Add a unified readMedia tool that handles both images (png, jpg, jpeg, webp, bmp, svg)
and documents (pdf) via the Vercel AI SDK file content part. PDFs are supported natively
by Claude (32MB), Gemini (50MB), and OpenAI (50MB).

- readImage kept as backward-compatible alias pointing to the same execute function
- Documents stored as { type: 'document', mimeType, data, filename } objects
- Images stored as data URL strings (unchanged)
- getCurrentMedia() returns Vercel AI SDK content parts for both types
- getCurrentImages() filters to images only (backward compat)
- loadMediaIfValid() replaces loadImageIfValid() (alias preserved)
- New mediaConfig.js centralizes format/MIME config for images + documents
- Updated tools-reference.md docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@buger buger merged commit a29158a into main Mar 20, 2026
13 checks passed
@buger buger deleted the feat/pdf-support branch March 20, 2026 19:06

probelabs bot commented Mar 20, 2026

PR Overview: Add readMedia Tool with PDF Support

Summary

This PR introduces a unified readMedia tool that extends the existing image loading functionality to support PDF documents. The implementation maintains full backward compatibility with the existing readImage tool while adding native PDF support through the Vercel AI SDK's file content part.

Files Changed Analysis

8 files changed with +375/-111 lines:

New Files

  • npm/src/agent/mediaConfig.js (+122 lines): Centralized configuration for all media formats (images + documents), replacing the image-specific config. Exports SUPPORTED_MEDIA_EXTENSIONS, MEDIA_MIME_TYPES, and utility functions (isImageExtension, isDocumentExtension, isFormatSupportedByProvider).
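
A centralized config of this kind might look roughly like the sketch below. The extension list comes from the PR summary; the MIME values, the `normalizeExtension` helper, `SUPPORTED_DOCUMENT_EXTENSIONS`, and the function signatures are assumptions for illustration and may not match the actual mediaConfig.js.

```javascript
// Hypothetical sketch of a centralized media config; the real
// mediaConfig.js may use different values and shapes.
const SUPPORTED_IMAGE_EXTENSIONS = ['png', 'jpg', 'jpeg', 'webp', 'bmp', 'svg'];
const SUPPORTED_DOCUMENT_EXTENSIONS = ['pdf'];
const SUPPORTED_MEDIA_EXTENSIONS = [
  ...SUPPORTED_IMAGE_EXTENSIONS,
  ...SUPPORTED_DOCUMENT_EXTENSIONS,
];

const MEDIA_MIME_TYPES = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  webp: 'image/webp',
  bmp: 'image/bmp',
  svg: 'image/svg+xml',
  pdf: 'application/pdf',
};

// Assumed helper: accept ".PDF", "pdf", or "PDF" and return "pdf".
function normalizeExtension(ext) {
  return String(ext).toLowerCase().replace(/^\./, '');
}

function isImageExtension(ext) {
  return SUPPORTED_IMAGE_EXTENSIONS.includes(normalizeExtension(ext));
}

function isDocumentExtension(ext) {
  return SUPPORTED_DOCUMENT_EXTENSIONS.includes(normalizeExtension(ext));
}
```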

Modified Files

  • npm/src/agent/ProbeAgent.js (+125/-101): Core implementation changes

    • Renamed loadImageIfValid() → loadMediaIfValid() (with backward-compatible alias)
    • Added MAX_DOCUMENT_FILE_SIZE constant (32MB)
    • New getCurrentMedia() method returns Vercel AI SDK content parts for both images and PDFs
    • Modified prepareMessagesWithImages() → handles both image and file content parts
    • Tool registration: readMedia added, readImage preserved as alias pointing to same execute function
    • PDFs stored as objects {type, mimeType, data, filename} in pendingImages Map
    • getCurrentImages() filters to return only image data URLs (backward compatible)
  • npm/tests/unit/readImageTool.test.js (+99/-3): Added 7 new PDF-specific tests

    • Verifies readMedia tool availability and shared execute function with readImage
    • Tests PDF loading, storage format, and getCurrentMedia() output
    • Confirms backward compatibility: getCurrentImages() excludes PDFs
    • Validates readImage alias works for PDFs
  • npm/src/tools/common.js (+4): Added readMediaSchema Zod schema

  • npm/src/tools/index.js (+1): Export readMediaSchema

  • npm/src/agent/tools.js (+2): Import and export readMediaSchema

  • npm/src/index.js (+2): Export readMediaSchema at package level

  • docs/probe-agent/sdk/tools-reference.md (+20/-7): Updated documentation to reflect readMedia as primary tool name with readImage as backward-compatible alias

Architecture & Impact Assessment

What This PR Accomplishes

  1. Unified Media Handling: Consolidates image and document loading under a single readMedia tool while preserving the existing readImage API

  2. Native PDF Support: PDFs are sent via Vercel AI SDK's file content part, leveraging native support from:

    • Claude (32MB limit)
    • OpenAI (50MB limit)
    • Gemini (50MB limit)
  3. Backward Compatibility: All existing readImage functionality remains intact:

    • readImage tool name works as alias
    • loadImageIfValid() method preserved as alias
    • getCurrentImages() returns only images (filters out PDFs)
    • readImageSchema still exported

Key Technical Changes

1. Media Storage Format

// Images stored as data URLs (backward compatible)
pendingImages.set('path/to/image.png', 'data:image/png;base64,iVBORw0KG...')

// PDFs stored as objects with metadata
pendingImages.set('path/to/doc.pdf', {
  type: 'document',
  mimeType: 'application/pdf',
  data: 'base64encodedcontent',
  filename: 'doc.pdf'
})
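
As a rough illustration of how file bytes could end up in these two shapes, consider the sketch below. `toPendingEntry` is a hypothetical helper and the paths are made up; the actual logic is in loadMediaIfValid() and may differ.

```javascript
// Hypothetical helper: turn file bytes into the entry shapes shown above.
function toPendingEntry(filePath, bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString('base64');
  if (mimeType === 'application/pdf') {
    // Documents keep their metadata next to the base64 payload.
    const filename = filePath.split(/[\\/]/).pop();
    return { type: 'document', mimeType, data: base64, filename };
  }
  // Images stay as plain data URL strings.
  return `data:${mimeType};base64,${base64}`;
}

const examplePending = new Map();
examplePending.set(
  'docs/spec.pdf',
  toPendingEntry('docs/spec.pdf', Buffer.from('%PDF-1.4'), 'application/pdf')
);
examplePending.set(
  'img/logo.png',
  toPendingEntry('img/logo.png', Buffer.from([0x89, 0x50, 0x4e, 0x47]), 'image/png')
);
```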

2. Content Part Generation

// getCurrentMedia() returns Vercel AI SDK compatible parts
[
  { type: 'image', image: 'data:image/png;base64,...' },
  { type: 'file', mediaType: 'application/pdf', data: 'base64...', filename: 'doc.pdf' }
]
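
The relationship between the two accessors can be sketched as stand-alone functions over a Map shaped like pendingImages. These are illustrative versions written for this note, not the methods on ProbeAgent; taking the Map as a parameter is an assumption made to keep the example self-contained.

```javascript
// Hypothetical stand-alone versions of the two accessors.
function getCurrentImages(pending) {
  // Images are stored as data URL strings; documents are objects.
  return Array.from(pending.values()).filter((v) => typeof v === 'string');
}

function getCurrentMedia(pending) {
  return Array.from(pending.values()).map((entry) =>
    typeof entry === 'string'
      ? { type: 'image', image: entry }
      : {
          type: 'file',
          mediaType: entry.mimeType,
          data: entry.data,
          filename: entry.filename,
        }
  );
}

const samplePending = new Map([
  ['a.png', 'data:image/png;base64,AAAA'],
  ['b.pdf', { type: 'document', mimeType: 'application/pdf', data: 'BBBB', filename: 'b.pdf' }],
]);

const imagesOnly = getCurrentImages(samplePending);
const allParts = getCurrentMedia(samplePending);
```

Here `imagesOnly` holds one data URL, while `allParts` holds one image part and one file part.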

3. Size Limits

  • Images: 20MB (existing MAX_IMAGE_FILE_SIZE)
  • Documents: 32MB (new MAX_DOCUMENT_FILE_SIZE)
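
A per-type size check along these lines could look like the sketch below. The constant names come from the PR; interpreting "MB" as 1024 × 1024 bytes and the `checkMediaSize` helper itself are assumptions.

```javascript
// Assumption: "MB" means 1024 * 1024 bytes; the PR does not say which unit is used.
const MB = 1024 * 1024;
const MAX_IMAGE_FILE_SIZE = 20 * MB;
const MAX_DOCUMENT_FILE_SIZE = 32 * MB;

// Hypothetical helper: pick the limit by media kind and compare.
function checkMediaSize(sizeInBytes, isDocument) {
  const limit = isDocument ? MAX_DOCUMENT_FILE_SIZE : MAX_IMAGE_FILE_SIZE;
  return { ok: sizeInBytes <= limit, limit };
}
```

With these values, a 25MB file would be rejected as an image but accepted as a document.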

Affected System Components

graph TD
    A[AI Agent] -->|calls| B[readMedia tool]
    A -->|calls| C[readImage tool alias]
    B --> D[readMediaExecute]
    C --> D
    D --> E[loadMediaIfValid]
    E --> F{Extension Check}
    F -->|image| G[Store as data URL]
    F -->|pdf| H[Store as document object]
    G --> I[pendingImages Map]
    H --> I
    I --> J[getCurrentMedia]
    J --> K[prepareMessagesWithImages]
    K --> L[Vercel AI SDK]
    L --> M[AI Provider]
    
    N[getCurrentImages] -.->|backward compat| I
    N -->|filters| O[Only data URLs]

Component Impact:

  • Tool Layer: New readMedia tool, readImage now alias
  • Config Layer: mediaConfig.js replaces imageConfig.js (latter kept for external consumers)
  • Storage Layer: pendingImages Map now supports heterogeneous data types
  • Message Preparation: prepareMessagesWithImages() handles both image and file content parts
  • API Surface: New exports (readMediaSchema, getCurrentMedia()) at all levels

Scope Discovery & Context Expansion

Direct Impact

  • Tool consumers: AI agents can now load PDFs using <readMedia><path>doc.pdf</path></readMedia>
  • API users: New getCurrentMedia() method for accessing all loaded media
  • Test suite: 7 new PDF-specific tests, all 3092 tests passing

Related Files (Not Modified)

Based on codebase analysis, these files interact with the media loading system:

Configuration & Validation:

  • npm/src/agent/imageConfig.js - Still exports legacy constants for backward compatibility

Message Flow:

  • npm/src/agent/ProbeAgent.js:3940,4497,4615 - prepareMessagesWithImages() call sites in answer() flow
  • npm/src/agent/ProbeAgent.js:2378-2440 - processImageReferences() for automatic image detection

Test Files:

  • npm/tests/unit/imagePathResolution.test.js - Uses getCurrentImages() in 5 locations
  • npm/tests/unit/*timeout*.test.js - Mock prepareMessagesWithImages() in 6 test files
  • examples/chat/test-agentic-image-loading.js - Example implementation

MCP Integration:

  • npm/src/agent/mcp/client.js:629-674 - Vercel AI SDK content part conversion (handles image type, may need review for file type)

Potential Follow-up Areas

  1. MCP Client: Verify toModelOutput() in mcp/client.js handles file content parts correctly
  2. Provider Documentation: Update provider-specific docs with PDF size limits
  3. Error Messages: Consider adding PDF-specific error messages for size/format issues
  4. Future Formats: Architecture supports easy addition of other document types (DOCX, etc.)

Backward Compatibility Guarantees

Fully backward compatible - all existing code continues to work:

  • readImage tool name works for both images and PDFs
  • loadImageIfValid() method preserved as alias
  • getCurrentImages() returns only image data URLs (excludes PDFs)
  • readImageSchema still exported
  • imageConfig.js still exports IMAGE_MIME_TYPES and SUPPORTED_IMAGE_EXTENSIONS

Test Coverage

  • 27 tests in readImageTool.test.js (7 new PDF-specific)
  • Full suite: 3092/3092 tests pass, 130 suites
  • New tests cover:
    • PDF loading and storage format
    • getCurrentMedia() output structure
    • Backward compatibility of getCurrentImages()
    • readImage alias functionality for PDFs

References

Modified Files:

  • npm/src/agent/ProbeAgent.js:961-992 - Tool registration and execute function
  • npm/src/agent/ProbeAgent.js:2510-2631 - loadMediaIfValid() implementation
  • npm/src/agent/ProbeAgent.js:2637-2677 - getCurrentMedia() and prepareMessagesWithImages()
  • npm/src/agent/mediaConfig.js:1-122 - New centralized media configuration
  • npm/tests/unit/readImageTool.test.js:369-467 - PDF-specific tests
  • npm/src/tools/common.js:71-73 - readMediaSchema definition

Related Files (Context):

  • npm/src/agent/imageConfig.js - Legacy image config (preserved)
  • npm/src/agent/ProbeAgent.js:2378-2440 - processImageReferences() for auto-detection
  • npm/src/agent/mcp/client.js:629-674 - Vercel AI SDK content part handling
Metadata
  • Review Effort: 2 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-03-20T19:17:11.693Z | Triggered by: pr_opened | Commit: 40d489d

💡 TIP: You can chat with Visor using /visor ask <your question>


probelabs bot commented Mar 20, 2026


Architecture Issues (8)

Severity Location Issue
🟠 Error npm/src/agent/ProbeAgent.js:2570-2590
The pendingImages Map now stores two different data types: strings (data URLs for images) and objects (document metadata for PDFs). This creates type inconsistency that makes the code harder to reason about and maintain. The name 'pendingImages' is now misleading since the Map contains both images and documents.
💡 Suggestion: Rename pendingImages to pendingMedia and use a consistent data structure. Consider always storing objects with a type field: { type: 'image'|'document', mimeType, data, filename? }. This makes the data model explicit and easier to extend.
🔧 Suggested Fix
this.pendingMedia.set(mediaPath, { type: isDocumentExtension(extension) ? 'document' : 'image', mimeType, data: base64Data, filename: basename(mediaPath) });
🟠 Error npm/src/agent/ProbeAgent.js:2655-2690
The getCurrentMedia() method creates content parts with type: 'file' for PDFs, but this doesn't match the Vercel AI SDK's documented format. Based on the codebase, Vercel AI SDK uses type: 'image-data' with mediaType property for binary content. The type: 'file' format may not be supported.
💡 Suggestion: Verify the correct Vercel AI SDK format for PDF/document files. It may need to be type: 'image-data' with mediaType: 'application/pdf' or a different format entirely. Check Vercel AI SDK documentation for file attachments.
🟠 Error npm/src/agent/mediaConfig.js:1-122
mediaConfig.js duplicates most of imageConfig.js's functionality (SUPPORTED_*_EXTENSIONS, MIME_TYPES, isFormatSupportedByProvider, etc.). This creates maintenance burden - changes must be made in two places. The PR description says imageConfig.js is 'kept for external consumers' but doesn't explain why duplication is necessary.
💡 Suggestion: Extend imageConfig.js to support documents rather than creating a parallel module. Add SUPPORTED_DOCUMENT_EXTENSIONS and isDocumentExtension() to imageConfig.js, or rename it to mediaConfig.js and remove the old file entirely. The backward compatibility concern can be addressed with re-exports.
🟡 Warning npm/src/agent/ProbeAgent.js:2620-2625
getCurrentImages() filters pendingImages by typeof v === 'string' to exclude documents. This filtering logic is duplicated in getCurrentMedia() which checks entry.type === 'document'. The dual access patterns create maintenance burden and potential for bugs.
💡 Suggestion: Use a single data structure with consistent typing. If backward compatibility for getCurrentImages() is required, have it filter by type field rather than typeof check: Array.from(this.pendingMedia.values()).filter(v => v.type === 'image').map(v => v.dataUrl)
🟡 Warning npm/src/agent/ProbeAgent.js:2630-2650
getCurrentMedia() has to handle two different data structures (strings and objects) from pendingImages Map, requiring conditional logic to check typeof entry and entry.type. This is a consequence of the inconsistent data model.
💡 Suggestion: Standardize the data structure so all entries are objects with a type field. This eliminates the need for typeof checks and makes the code more predictable.
🟡 Warning npm/src/agent/ProbeAgent.js:964-1000
The readImage tool is implemented as an alias pointing to readMedia's execute function. While this maintains backward compatibility, it creates confusion because readImage can now load PDFs, which contradicts user expectations. The alias pattern also makes it harder to evolve the tools independently in the future.
💡 Suggestion: Consider keeping readImage and readMedia as separate tools that share common logic via a helper function. This makes the distinction clear and allows independent evolution. Alternatively, document clearly that readImage is deprecated and will be removed in a future version.
🟡 Warning npm/src/agent/ProbeAgent.js:2513-2615
The method loadImageIfValid() was renamed to loadMediaIfValid() but the old name is kept as a backward-compatible alias. However, the method still uses variable names like 'imagePath' and comments mentioning 'image' throughout its implementation. This creates cognitive dissonance and makes the code harder to understand.
💡 Suggestion: Update all variable names and comments in loadMediaIfValid() to use media/file terminology instead of image-specific terminology. This makes the code's purpose clear and reduces confusion.
🟡 Warning npm/src/agent/mediaConfig.js:82-122
The mediaConfig.js module exports several helper functions (getExtensionPattern, getMimeType, getSupportedExtensionsForProvider) that duplicate functionality already in imageConfig.js. Functions like getExtensionPattern and getMimeType are simple wrappers that don't add value.
💡 Suggestion: Remove unused wrapper functions or consolidate them into a single configuration module. If the functions are needed, document their use cases with examples.
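
The uniform-entry idea raised in several of the items above could be sketched as follows. This is one possible reading of the suggestion, not code from the PR: `makeEntry`, `imagesAsDataUrls`, and the `pendingMedia` name are hypothetical. Here the data URL is rebuilt from `mimeType` and `data` instead of being stored as a separate field.

```javascript
// Hypothetical uniform entry: every value is { type, mimeType, data, filename }.
function makeEntry(filePath, mimeType, base64Data) {
  return {
    type: mimeType === 'application/pdf' ? 'document' : 'image',
    mimeType,
    data: base64Data,
    filename: filePath.split(/[\\/]/).pop(),
  };
}

// Backward-compatible view: only images, rendered as data URL strings.
function imagesAsDataUrls(pendingMedia) {
  return Array.from(pendingMedia.values())
    .filter((entry) => entry.type === 'image')
    .map((entry) => `data:${entry.mimeType};base64,${entry.data}`);
}

const pendingMedia = new Map();
pendingMedia.set('img/a.png', makeEntry('img/a.png', 'image/png', 'AAAA'));
pendingMedia.set('docs/b.pdf', makeEntry('docs/b.pdf', 'application/pdf', 'BBBB'));

const legacyImages = imagesAsDataUrls(pendingMedia);
```

With a single entry shape, callers filter on the `type` field and no longer need `typeof` checks.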

Performance Issues (5)

Severity Location Issue
🟡 Warning npm/src/agent/ProbeAgent.js:2570-2595
Documents stored as objects with metadata (type, mimeType, data, filename) consume more memory than the previous string-only storage for images. A 32MB PDF stored as base64 becomes ~43MB plus object overhead.
💡 Suggestion: Consider streaming large documents instead of loading entirely into memory, or implement a size-based warning system for documents above a threshold.
🟡 Warning npm/src/agent/ProbeAgent.js:2570-2595
Base64 encoding increases file size by ~33%. For PDFs up to 32MB, this results in ~43MB of memory usage per document. Multiple PDFs could cause memory pressure.
💡 Suggestion: Document the memory implications and consider implementing a document count or total size limit in addition to per-file size limits.
🟡 Warning npm/src/agent/ProbeAgent.js:2620-2635
getCurrentMedia() iterates through all pendingImages entries on every call to filter and transform them. This is O(n) and called frequently during message preparation.
💡 Suggestion: Maintain separate caches for images vs documents, or cache the transformed content parts and invalidate only when media is added/removed.
🟡 Warning npm/src/agent/ProbeAgent.js:2645-2680
prepareMessagesWithImages() calls getCurrentMedia() which converts Map values to array and creates new objects. Combined with the message cloning, this creates multiple temporary arrays per AI call.
💡 Suggestion: Consider passing the Map directly or using iterators to avoid intermediate array creation.
🟡 Warning npm/src/agent/ProbeAgent.js:2620-2622
getCurrentImages() now filters pendingImages by type check (typeof v === 'string') on every call, adding O(n) overhead to what was previously a direct Array.from() conversion.
💡 Suggestion: Maintain separate Maps for images and documents to avoid runtime type filtering.
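
The base64 overhead mentioned in the items above follows from the encoding itself: every 3 input bytes become 4 output characters, with padding. A small sketch of that arithmetic, plus a hypothetical total-size budget check, is shown below. `canAddDocument` and the budget parameter are illustrations rather than part of the PR, and "MB" is assumed to mean 1024 × 1024 bytes.

```javascript
// Base64 emits 4 characters for every 3 input bytes, padding the last group.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

const MIB = 1024 * 1024; // assumed unit for "MB"

// Encoded size of a 32MB file, expressed in MB (about 42.67).
const encodedSizeInMB = base64Length(32 * MIB) / MIB;

// Hypothetical budget check: would adding one more document exceed a
// total encoded-size limit across all pending documents?
function canAddDocument(pendingByteCounts, newByteCount, maxTotalEncodedBytes) {
  const current = pendingByteCounts.reduce((sum, n) => sum + base64Length(n), 0);
  return current + base64Length(newByteCount) <= maxTotalEncodedBytes;
}
```

Under these assumptions, two maximum-size PDFs would not fit in a 64MB encoded budget, while a single one would.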

Quality Issues (1)

Severity Location Issue
🟠 Error contract:0
Output schema validation failed: must have required property 'issues'

Powered by Visor from Probelabs

Last updated: 2026-03-20T19:16:53.074Z | Triggered by: pr_opened | Commit: 40d489d

