feat: add readMedia tool with PDF support by buger · Pull Request #544 · probelabs/probe

buger · 2026-03-20T19:04:30Z

Summary

Add unified readMedia tool that handles both images (png, jpg, jpeg, webp, bmp, svg) and documents (pdf)
PDFs sent via Vercel AI SDK file content part, natively supported by Claude (32MB), OpenAI (50MB), and Gemini (50MB)
readImage preserved as backward-compatible alias pointing to the same execute function
New mediaConfig.js centralizes format/MIME config, replacing imageConfig.js (which is kept for external consumers)

Backward Compatibility

readImage tool name still works (alias)
loadImageIfValid() still works (alias to loadMediaIfValid())
getCurrentImages() still returns only image data URLs (filters out documents)
readImageSchema still exported at all levels

New APIs

getCurrentMedia() — returns Vercel AI SDK content parts for both images and documents
readMediaSchema — exported alongside readImageSchema
Documents stored as { type: 'document', mimeType, data, filename } objects in pendingImages

Test plan

All 27 readImage/readMedia tests pass (7 new PDF-specific tests)
Full test suite: 3092/3092 tests pass, 130 suites
Verify readImage alias works for both images and PDFs
Verify getCurrentImages() excludes PDFs (backward compat)
Verify getCurrentMedia() returns both image and file content parts
Verify PDF stored as document type with correct mimeType

🤖 Generated with Claude Code

…dImage alias

Add a unified readMedia tool that handles both images (png, jpg, jpeg, webp, bmp, svg) and documents (pdf) via the Vercel AI SDK file content part. PDFs are supported natively by Claude (32MB), Gemini (50MB), and OpenAI (50MB).

- readImage kept as backward-compatible alias pointing to the same execute function
- Documents stored as { type: 'document', mimeType, data, filename } objects
- Images stored as data URL strings (unchanged)
- getCurrentMedia() returns Vercel AI SDK content parts for both types
- getCurrentImages() filters to images only (backward compat)
- loadMediaIfValid() replaces loadImageIfValid() (alias preserved)
- New mediaConfig.js centralizes format/MIME config for images + documents
- Updated tools-reference.md docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

probelabs · 2026-03-20T19:07:25Z

PR Overview: Add readMedia Tool with PDF Support

Summary

This PR introduces a unified readMedia tool that extends the existing image loading functionality to support PDF documents. The implementation maintains full backward compatibility with the existing readImage tool while adding native PDF support through the Vercel AI SDK's file content part.

Files Changed Analysis

8 files changed with +375/-111 lines:

New Files

npm/src/agent/mediaConfig.js (+122 lines): Centralized configuration for all media formats (images + documents), replacing the image-specific config. Exports SUPPORTED_MEDIA_EXTENSIONS, MEDIA_MIME_TYPES, and utility functions (isImageExtension, isDocumentExtension, isFormatSupportedByProvider).
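
To make the shape of such a config module concrete, here is a minimal sketch. The export names SUPPORTED_MEDIA_EXTENSIONS, MEDIA_MIME_TYPES, isImageExtension, and isDocumentExtension come from the description above; the helper normalizeExtension, the two intermediate arrays, and the exact contents are assumptions for illustration and may not match the real file.

```javascript
// Hypothetical sketch of a mediaConfig-style module. Export names follow the
// PR description; the contents and normalizeExtension() are assumptions.
const IMAGE_EXTENSIONS = ['png', 'jpg', 'jpeg', 'webp', 'bmp', 'svg'];
const DOCUMENT_EXTENSIONS = ['pdf'];
const SUPPORTED_MEDIA_EXTENSIONS = [...IMAGE_EXTENSIONS, ...DOCUMENT_EXTENSIONS];

// Standard MIME types for each supported extension.
const MEDIA_MIME_TYPES = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  webp: 'image/webp',
  bmp: 'image/bmp',
  svg: 'image/svg+xml',
  pdf: 'application/pdf',
};

// Accepts 'PNG', '.png', or 'png' and returns 'png'.
function normalizeExtension(ext) {
  return String(ext).toLowerCase().replace(/^\./, '');
}

function isImageExtension(ext) {
  return IMAGE_EXTENSIONS.includes(normalizeExtension(ext));
}

function isDocumentExtension(ext) {
  return DOCUMENT_EXTENSIONS.includes(normalizeExtension(ext));
}
```

Keeping the image and document lists separate and deriving the combined list from them means a new document type only needs to be added in one place.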

Modified Files

npm/src/agent/ProbeAgent.js (+125/-101): Core implementation changes
- Renamed loadImageIfValid() → loadMediaIfValid() (with backward-compatible alias)
- Added MAX_DOCUMENT_FILE_SIZE constant (32MB)
- New getCurrentMedia() method returns Vercel AI SDK content parts for both images and PDFs
- Modified prepareMessagesWithImages() → handles both image and file content parts
- Tool registration: readMedia added, readImage preserved as alias pointing to same execute function
- PDFs stored as objects {type, mimeType, data, filename} in pendingImages Map
- getCurrentImages() filters to return only image data URLs (backward compatible)
npm/tests/unit/readImageTool.test.js (+99/-3): Added 7 new PDF-specific tests
- Verifies readMedia tool availability and shared execute function with readImage
- Tests PDF loading, storage format, and getCurrentMedia() output
- Confirms backward compatibility: getCurrentImages() excludes PDFs
- Validates readImage alias works for PDFs
npm/src/tools/common.js (+4): Added readMediaSchema Zod schema
npm/src/tools/index.js (+1): Export readMediaSchema
npm/src/agent/tools.js (+2): Import and export readMediaSchema
npm/src/index.js (+2): Export readMediaSchema at package level
docs/probe-agent/sdk/tools-reference.md (+20/-7): Updated documentation to reflect readMedia as primary tool name with readImage as backward-compatible alias

Architecture & Impact Assessment

What This PR Accomplishes

Unified Media Handling: Consolidates image and document loading under a single readMedia tool while preserving the existing readImage API
Native PDF Support: PDFs are sent via Vercel AI SDK's file content part, leveraging native support from:
- Claude (32MB limit)
- OpenAI (50MB limit)
- Gemini (50MB limit)
Backward Compatibility: All existing readImage functionality remains intact:
- readImage tool name works as alias
- loadImageIfValid() method preserved as alias
- getCurrentImages() returns only images (filters out PDFs)
- readImageSchema still exported
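
The alias arrangement described above can be sketched as two tool entries that reference one function. This is an illustration only: createMediaTools, the loadMedia callback, and the description strings are hypothetical names, not the registration code in ProbeAgent.js.

```javascript
// Hypothetical factory: `loadMedia` stands in for the agent's loader
// (loadMediaIfValid in the PR). Both tool entries share one execute function.
function createMediaTools(loadMedia) {
  const readMediaExecute = async ({ path }) => {
    const loaded = await loadMedia(path);
    return loaded ? `Loaded ${path}` : `Could not load ${path}`;
  };

  return {
    readMedia: {
      description: 'Load an image or PDF for analysis',
      execute: readMediaExecute,
    },
    // Alias kept so existing callers of readImage keep working.
    readImage: {
      description: 'Alias of readMedia, kept for backward compatibility',
      execute: readMediaExecute,
    },
  };
}

const tools = createMediaTools(async () => true);
```

Because both entries point at the same function object, a caller using the old readImage name gets identical behavior, including PDF loading.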

Key Technical Changes

1. Media Storage Format

// Images stored as data URLs (backward compatible)
pendingImages.set('path/to/image.png', 'data:image/png;base64,iVBORw0KG...')

// PDFs stored as objects with metadata
pendingImages.set('path/to/doc.pdf', {
  type: 'document',
  mimeType: 'application/pdf',
  data: 'base64encodedcontent',
  filename: 'doc.pdf'
})
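
As a rough illustration of how a loader could produce these two shapes, the sketch below encodes raw bytes and chooses the storage format by extension. storeMedia and MIME_BY_EXTENSION are hypothetical names; the real loadMediaIfValid() also performs path and size validation that is omitted here.

```javascript
// Hypothetical sketch: store images as data URL strings and PDFs as objects.
const MIME_BY_EXTENSION = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  webp: 'image/webp',
  bmp: 'image/bmp',
  svg: 'image/svg+xml',
  pdf: 'application/pdf',
};

function storeMedia(pendingImages, filePath, bytes) {
  const filename = filePath.split('/').pop();
  const extension = filename.split('.').pop().toLowerCase();
  const mimeType = MIME_BY_EXTENSION[extension];
  if (!mimeType) return false; // unsupported format, nothing stored

  const base64 = Buffer.from(bytes).toString('base64');
  if (mimeType === 'application/pdf') {
    // Documents keep their metadata alongside the base64 payload.
    pendingImages.set(filePath, { type: 'document', mimeType, data: base64, filename });
  } else {
    // Images stay as plain data URL strings, as before the PR.
    pendingImages.set(filePath, `data:${mimeType};base64,${base64}`);
  }
  return true;
}

const pending = new Map();
storeMedia(pending, 'assets/logo.png', 'hi');
storeMedia(pending, 'docs/spec.pdf', 'hi');
```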

2. Content Part Generation

// getCurrentMedia() returns Vercel AI SDK compatible parts
[
  { type: 'image', image: 'data:image/png;base64,...' },
  { type: 'file', mediaType: 'application/pdf', data: 'base64...', filename: 'doc.pdf' }
]
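
A minimal sketch of the conversion, assuming the mixed string/object storage shown in section 1. The functions take the Map as a parameter so the example is self-contained; in the PR they are methods on the agent, and the actual implementation may differ in detail.

```javascript
// Sketch: turn mixed Map entries into content parts.
function getCurrentMedia(pendingImages) {
  const parts = [];
  for (const entry of pendingImages.values()) {
    if (typeof entry === 'string') {
      // Data URL string: an image.
      parts.push({ type: 'image', image: entry });
    } else if (entry && entry.type === 'document') {
      // Document object: becomes a file part.
      parts.push({
        type: 'file',
        mediaType: entry.mimeType,
        data: entry.data,
        filename: entry.filename,
      });
    }
  }
  return parts;
}

// Backward-compatible view: only the data URL strings.
function getCurrentImages(pendingImages) {
  return Array.from(pendingImages.values()).filter((v) => typeof v === 'string');
}

const pendingImages = new Map([
  ['a.png', 'data:image/png;base64,AAAA'],
  ['b.pdf', { type: 'document', mimeType: 'application/pdf', data: 'BBBB', filename: 'b.pdf' }],
]);
const mediaParts = getCurrentMedia(pendingImages);
const imageUrls = getCurrentImages(pendingImages);
```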

3. Size Limits

Images: 20MB (existing MAX_IMAGE_FILE_SIZE)
Documents: 32MB (new MAX_DOCUMENT_FILE_SIZE)
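
The per-type limit can be sketched as a small check. The constant names come from the PR; treating 1MB as 1024 * 1024 bytes, the inclusive comparison, and the checkFileSize helper itself are assumptions for illustration.

```javascript
// Assumption: 1MB = 1024 * 1024 bytes; the PR does not state the unit.
const MB = 1024 * 1024;
const MAX_IMAGE_FILE_SIZE = 20 * MB;
const MAX_DOCUMENT_FILE_SIZE = 32 * MB;

// Hypothetical helper: pick the limit by extension and compare.
function checkFileSize(extension, sizeInBytes) {
  const isDocument = extension.toLowerCase() === 'pdf';
  const limit = isDocument ? MAX_DOCUMENT_FILE_SIZE : MAX_IMAGE_FILE_SIZE;
  return { ok: sizeInBytes <= limit, limit };
}
```

With these values a 25MB PDF passes while a 25MB PNG is rejected.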

Affected System Components

graph TD
    A[AI Agent] -->|calls| B[readMedia tool]
    A -->|calls| C[readImage tool alias]
    B --> D[readMediaExecute]
    C --> D
    D --> E[loadMediaIfValid]
    E --> F{Extension Check}
    F -->|image| G[Store as data URL]
    F -->|pdf| H[Store as document object]
    G --> I[pendingImages Map]
    H --> I
    I --> J[getCurrentMedia]
    J --> K[prepareMessagesWithImages]
    K --> L[Vercel AI SDK]
    L --> M[AI Provider]
    
    N[getCurrentImages] -.->|backward compat| I
    N -->|filters| O[Only data URLs]

Component Impact:

Tool Layer: New readMedia tool, readImage now alias
Config Layer: mediaConfig.js replaces imageConfig.js (latter kept for external consumers)
Storage Layer: pendingImages Map now supports heterogeneous data types
Message Preparation: prepareMessagesWithImages() handles both image and file content parts
API Surface: New exports (readMediaSchema, getCurrentMedia()) at all levels
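
For the message preparation step, one plausible approach is to clone the message list and append the content parts to the most recent user message. The sketch below is an assumption about the general technique, not the body of prepareMessagesWithImages(); the function name prepareMessagesWithMedia is hypothetical.

```javascript
// Hypothetical sketch: attach media parts to the last user message
// without mutating the caller's array or message objects.
function prepareMessagesWithMedia(messages, mediaParts) {
  if (mediaParts.length === 0) return messages;

  const result = messages.map((m) => ({ ...m }));
  for (let i = result.length - 1; i >= 0; i--) {
    if (result[i].role !== 'user') continue;
    // Normalize string content into a text part so parts can be appended.
    const existing = typeof result[i].content === 'string'
      ? [{ type: 'text', text: result[i].content }]
      : [...result[i].content];
    result[i].content = [...existing, ...mediaParts];
    break;
  }
  return result;
}

const original = [
  { role: 'system', content: 'You are a code assistant.' },
  { role: 'user', content: 'Summarize the attached spec.' },
];
const prepared = prepareMessagesWithMedia(original, [
  { type: 'file', mediaType: 'application/pdf', data: 'BBBB', filename: 'spec.pdf' },
]);
```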

Scope Discovery & Context Expansion

Direct Impact

Tool consumers: AI agents can now load PDFs using <readMedia><path>doc.pdf</path></readMedia>
API users: New getCurrentMedia() method for accessing all loaded media
Test suite: 7 new PDF-specific tests, all 3092 tests passing

Related Files (Not Modified)

Based on codebase analysis, these files interact with the media loading system:

Configuration & Validation:

npm/src/agent/imageConfig.js - Still exports legacy constants for backward compatibility

Message Flow:

npm/src/agent/ProbeAgent.js:3940,4497,4615 - prepareMessagesWithImages() call sites in answer() flow
npm/src/agent/ProbeAgent.js:2378-2440 - processImageReferences() for automatic image detection

Test Files:

npm/tests/unit/imagePathResolution.test.js - Uses getCurrentImages() in 5 locations
npm/tests/unit/*timeout*.test.js - Mock prepareMessagesWithImages() in 6 test files
examples/chat/test-agentic-image-loading.js - Example implementation

MCP Integration:

npm/src/agent/mcp/client.js:629-674 - Vercel AI SDK content part conversion (handles image type, may need review for file type)

Potential Follow-up Areas

MCP Client: Verify toModelOutput() in mcp/client.js handles file content parts correctly
Provider Documentation: Update provider-specific docs with PDF size limits
Error Messages: Consider adding PDF-specific error messages for size/format issues
Future Formats: Architecture supports easy addition of other document types (DOCX, etc.)

Backward Compatibility Guarantees

✅ Fully backward compatible - all existing code continues to work:

readImage tool name works for both images and PDFs
loadImageIfValid() method preserved as alias
getCurrentImages() returns only image data URLs (excludes PDFs)
readImageSchema still exported
imageConfig.js still exports IMAGE_MIME_TYPES and SUPPORTED_IMAGE_EXTENSIONS

Test Coverage

27 tests in readImageTool.test.js (7 new PDF-specific)
Full suite: 3092/3092 tests pass, 130 suites
New tests cover:
- PDF loading and storage format
- getCurrentMedia() output structure
- Backward compatibility of getCurrentImages()
- readImage alias functionality for PDFs

References

Modified Files:

npm/src/agent/ProbeAgent.js:961-992 - Tool registration and execute function
npm/src/agent/ProbeAgent.js:2510-2631 - loadMediaIfValid() implementation
npm/src/agent/ProbeAgent.js:2637-2677 - getCurrentMedia() and prepareMessagesWithImages()
npm/src/agent/mediaConfig.js:1-122 - New centralized media configuration
npm/tests/unit/readImageTool.test.js:369-467 - PDF-specific tests
npm/src/tools/common.js:71-73 - readMediaSchema definition

Related Files (Context):

npm/src/agent/imageConfig.js - Legacy image config (preserved)
npm/src/agent/ProbeAgent.js:2378-2440 - processImageReferences() for auto-detection
npm/src/agent/mcp/client.js:629-674 - Vercel AI SDK content part handling

Metadata

Review Effort: 2 / 5
Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-03-20T19:17:11.693Z | Triggered by: pr_opened | Commit: 40d489d

💡 TIP: You can chat with Visor using /visor ask <your question>

probelabs · 2026-03-20T19:11:58Z

Architecture Issues (8)

Severity	Location	Issue
🟠 Error	`npm/src/agent/ProbeAgent.js:2570-2590`	The pendingImages Map now stores two different data types: strings (data URLs for images) and objects (document metadata for PDFs). This creates type inconsistency that makes the code harder to reason about and maintain. The name 'pendingImages' is now misleading since the Map contains both images and documents. 💡 Suggestion Rename pendingImages to pendingMedia and use a consistent data structure. Consider always storing objects with a type field: { type: 'image'|'document', mimeType, data, filename? }. This makes the data model explicit and easier to extend. 🔧 Suggested Fix `this.pendingMedia.set(mediaPath, { type: isDocumentExtension(extension) ? 'document' : 'image', mimeType, data: base64Data, filename: basename(mediaPath) });`
🟠 Error	`npm/src/agent/ProbeAgent.js:2655-2690`	The getCurrentMedia() method creates content parts with type: 'file' for PDFs, but this doesn't match the Vercel AI SDK's documented format. Based on the codebase, Vercel AI SDK uses type: 'image-data' with mediaType property for binary content. The type: 'file' format may not be supported. 💡 Suggestion Verify the correct Vercel AI SDK format for PDF/document files. It may need to be type: 'image-data' with mediaType: 'application/pdf' or a different format entirely. Check Vercel AI SDK documentation for file attachments.
🟠 Error	`npm/src/agent/mediaConfig.js:1-122`	mediaConfig.js duplicates most of imageConfig.js's functionality (SUPPORTED_*_EXTENSIONS, *_MIME_TYPES, isFormatSupportedByProvider, etc.). This creates maintenance burden - changes must be made in two places. The PR description says imageConfig.js is 'kept for external consumers' but doesn't explain why duplication is necessary. 💡 Suggestion Extend imageConfig.js to support documents rather than creating a parallel module. Add SUPPORTED_DOCUMENT_EXTENSIONS and isDocumentExtension() to imageConfig.js, or rename it to mediaConfig.js and remove the old file entirely. The backward compatibility concern can be addressed with re-exports.
🟡 Warning	`npm/src/agent/ProbeAgent.js:2620-2625`	getCurrentImages() filters pendingImages by typeof v === 'string' to exclude documents. This filtering logic is duplicated in getCurrentMedia() which checks entry.type === 'document'. The dual access patterns create maintenance burden and potential for bugs. 💡 Suggestion Use a single data structure with consistent typing. If backward compatibility for getCurrentImages() is required, have it filter by type field rather than typeof check: Array.from(this.pendingMedia.values()).filter(v => v.type === 'image').map(v => v.dataUrl)
🟡 Warning	`npm/src/agent/ProbeAgent.js:2630-2650`	getCurrentMedia() has to handle two different data structures (strings and objects) from pendingImages Map, requiring conditional logic to check typeof entry and entry.type. This is a consequence of the inconsistent data model. 💡 Suggestion Standardize the data structure so all entries are objects with a type field. This eliminates the need for typeof checks and makes the code more predictable.
🟡 Warning	`npm/src/agent/ProbeAgent.js:964-1000`	The readImage tool is implemented as an alias pointing to readMedia's execute function. While this maintains backward compatibility, it creates confusion because readImage can now load PDFs, which contradicts user expectations. The alias pattern also makes it harder to evolve the tools independently in the future. 💡 Suggestion Consider keeping readImage and readMedia as separate tools that share common logic via a helper function. This makes the distinction clear and allows independent evolution. Alternatively, document clearly that readImage is deprecated and will be removed in a future version.
🟡 Warning	`npm/src/agent/ProbeAgent.js:2513-2615`	The method loadImageIfValid() was renamed to loadMediaIfValid() but the old name is kept as a backward-compatible alias. However, the method still uses variable names like 'imagePath' and comments mentioning 'image' throughout its implementation. This creates cognitive dissonance and makes the code harder to understand. 💡 Suggestion Update all variable names and comments in loadMediaIfValid() to use media/file terminology instead of image-specific terminology. This makes the code's purpose clear and reduces confusion.
🟡 Warning	`npm/src/agent/mediaConfig.js:82-122`	The mediaConfig.js module exports several helper functions (getExtensionPattern, getMimeType, getSupportedExtensionsForProvider) that duplicate functionality already in imageConfig.js. Functions like getExtensionPattern and getMimeType are simple wrappers that don't add value. 💡 Suggestion Remove unused wrapper functions or consolidate them into a single configuration module. If the functions are needed, document their use cases with examples.
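
The uniform-entry model suggested in several of the rows above could look roughly like the following. This is a sketch of the reviewer's proposal, not code from the PR: the MediaStore class and its add() method are hypothetical, and the data URL is rebuilt from mimeType and data rather than stored separately.

```javascript
// Hypothetical sketch of the suggested model: every entry is an object
// with a `type` field, so no typeof checks are needed.
class MediaStore {
  constructor() {
    this.pendingMedia = new Map();
  }

  add(path, { type, mimeType, data, filename }) {
    this.pendingMedia.set(path, { type, mimeType, data, filename });
  }

  // Backward-compatible view: image data URLs only.
  getCurrentImages() {
    return Array.from(this.pendingMedia.values())
      .filter((entry) => entry.type === 'image')
      .map((entry) => `data:${entry.mimeType};base64,${entry.data}`);
  }

  // Content parts for both images and documents.
  getCurrentMedia() {
    return Array.from(this.pendingMedia.values()).map((entry) =>
      entry.type === 'image'
        ? { type: 'image', image: `data:${entry.mimeType};base64,${entry.data}` }
        : { type: 'file', mediaType: entry.mimeType, data: entry.data, filename: entry.filename }
    );
  }
}

const store = new MediaStore();
store.add('a.png', { type: 'image', mimeType: 'image/png', data: 'AAAA', filename: 'a.png' });
store.add('b.pdf', { type: 'document', mimeType: 'application/pdf', data: 'BBBB', filename: 'b.pdf' });
```

The trade-off is that image data URLs are rebuilt on each read instead of being stored ready-made, in exchange for a single entry shape.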

Performance Issues (5)

Severity	Location	Issue
🟡 Warning	`npm/src/agent/ProbeAgent.js:2570-2595`	Documents stored as objects with metadata (type, mimeType, data, filename) consume more memory than the previous string-only storage for images. A 32MB PDF stored as base64 becomes ~43MB plus object overhead. 💡 Suggestion Consider streaming large documents instead of loading entirely into memory, or implement a size-based warning system for documents above a threshold.
🟡 Warning	`npm/src/agent/ProbeAgent.js:2570-2595`	Base64 encoding increases file size by ~33%. For PDFs up to 32MB, this results in ~43MB of memory usage per document. Multiple PDFs could cause memory pressure. 💡 Suggestion Document the memory implications and consider implementing a document count or total size limit in addition to per-file size limits.
🟡 Warning	`npm/src/agent/ProbeAgent.js:2620-2635`	getCurrentMedia() iterates through all pendingImages entries on every call to filter and transform them. This is O(n) and called frequently during message preparation. 💡 Suggestion Maintain separate caches for images vs documents, or cache the transformed content parts and invalidate only when media is added/removed.
🟡 Warning	`npm/src/agent/ProbeAgent.js:2645-2680`	prepareMessagesWithImages() calls getCurrentMedia() which converts Map values to array and creates new objects. Combined with the message cloning, this creates multiple temporary arrays per AI call. 💡 Suggestion Consider passing the Map directly or using iterators to avoid intermediate array creation.
🟡 Warning	`npm/src/agent/ProbeAgent.js:2620-2622`	getCurrentImages() now filters pendingImages by type check (typeof v === 'string') on every call, adding O(n) overhead to what was previously a direct Array.from() conversion. 💡 Suggestion Maintain separate Maps for images and documents to avoid runtime type filtering.
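
The base64 overhead cited in the memory-related rows can be checked with a short calculation: padded base64 emits 4 characters for every 3 input bytes, rounded up. The helper name base64Length is hypothetical, and 1MB is assumed to mean 1024 * 1024 bytes.

```javascript
// Padded base64 length: 4 output characters per 3 input bytes, rounded up.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Assumption: 1MB = 1024 * 1024 bytes.
const MiB = 1024 * 1024;
const encodedChars = base64Length(32 * MiB);
const encodedMiB = encodedChars / MiB;

// Cross-check the formula against Node's own encoder on a small buffer.
const sampleMatches = Buffer.alloc(10).toString('base64').length === base64Length(10);
```

For a 32MB file this gives 44,739,244 characters, roughly 42.7MB, before any object or string overhead.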

Quality Issues (1)

Severity	Location	Issue
🟠 Error	`contract:0`	Output schema validation failed: must have required property 'issues'


Last updated: 2026-03-20T19:16:53.074Z | Triggered by: pr_opened | Commit: 40d489d


buger merged commit a29158a into main Mar 20, 2026
13 checks passed

buger deleted the feat/pdf-support branch March 20, 2026 19:06
