Summary
When importing documents, automatically generate summaries and tags to give the agent better context during discovery queries.
Motivation
Currently the agent must read full document text to understand content. Pre-computed summaries and tags would:
- Speed up agent queries (scan summaries instead of full text)
- Improve semantic matching (summaries capture key themes)
- Enable tag-based filtering before search
- Sync to peers via iroh-docs (shared context)
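To make the tag-filtering and summary-scanning points concrete, here is a rough sketch of a discovery pass that first narrows candidates by tags and then ranks them by word overlap with their summaries. `DocRecord` and `discover` are hypothetical names, and plain word overlap is only a stand-in for whatever semantic matching the agent actually does.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class DocRecord:
    """Hypothetical in-memory view of a document's pre-computed metadata."""
    name: str
    summary: Optional[str]
    tags: Set[str] = field(default_factory=set)


def discover(docs: Iterable[DocRecord], query: str, required_tags: Set[str]) -> List[str]:
    """Narrow by tags, then rank by overlap between query words and summary words."""
    terms = set(query.lower().split())
    scored = []
    for doc in docs:
        # Cheap set check first: skip documents missing any required tag.
        if not required_tags <= doc.tags:
            continue
        # Documents whose summary is not ready yet score 0 but are still returned,
        # so a full-text search can still pick them up afterwards.
        words = set((doc.summary or "").lower().split())
        scored.append((len(terms & words), doc.name))
    # Highest score first; ties broken by name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]
```

The agent only touches summaries and tag sets here; full text would be read only for the documents that survive this pass.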
Proposed Implementation
Metadata Extension
Add summary and tags fields to document metadata:
{
"name": "paper.pdf",
"pdf_hash": "...",
"text_hash": "...",
"summary": "Analysis of Arctic ice loss 2010-2023...",
"tags": ["climate", "arctic", "data-analysis"],
"created_at": "..."
}
Background Processing
- Import document immediately (don't block on summarization)
- Queue summarization as background task
- Show "processing" state in UI
- Document is searchable by full text immediately; summary enhances discovery once ready
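A minimal sketch of this non-blocking flow, assuming a single in-process worker thread and a pluggable `summarize` callable standing in for the LLM call. `SummaryQueue` and `SummaryStatus` are hypothetical names; the status values are what the UI would read to show the "processing" state.

```python
import queue
import threading
from enum import Enum
from typing import Callable, Dict


class SummaryStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"


class SummaryQueue:
    """Hypothetical background summarizer: imports return immediately and a
    worker thread fills in summaries later."""

    def __init__(self, summarize: Callable[[str], str]) -> None:
        self._summarize = summarize  # stand-in for the LLM call
        self._jobs: queue.Queue = queue.Queue()
        self._lock = threading.Lock()
        self.status: Dict[str, SummaryStatus] = {}
        self.summaries: Dict[str, str] = {}
        threading.Thread(target=self._run, daemon=True).start()

    def import_document(self, doc_id: str, text: str) -> None:
        # The document itself is already stored and full-text searchable;
        # only the summary is deferred to the worker.
        with self._lock:
            self.status[doc_id] = SummaryStatus.PENDING
        self._jobs.put((doc_id, text))

    def _run(self) -> None:
        while True:
            doc_id, text = self._jobs.get()
            with self._lock:
                self.status[doc_id] = SummaryStatus.PROCESSING
            try:
                summary = self._summarize(text)
            except Exception:
                # A failed summary must not affect the imported document.
                with self._lock:
                    self.status[doc_id] = SummaryStatus.FAILED
            else:
                with self._lock:
                    self.summaries[doc_id] = summary
                    self.status[doc_id] = SummaryStatus.READY
            finally:
                self._jobs.task_done()

    def wait(self) -> None:
        """Block until every queued job has been processed."""
        self._jobs.join()
```

A real implementation would likely persist the queue so pending jobs survive a restart; this sketch keeps everything in memory.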
Tag Generation
- LLM-suggested topic tags
- Entity extraction (people, organizations, dates) as structured metadata
- Useful for investigative journalism workflows
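One way to keep LLM output usable as metadata is to ask the model for JSON and normalize it before storing. The response shape, `parse_llm_tags`, and `normalize_tag` below are assumptions for illustration. Tags are slugified to match the `"data-analysis"` style in the metadata example, and entities are limited to the people/organizations/dates categories listed above.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Entity categories named in the proposal; anything else is dropped.
ENTITY_KINDS = ("people", "organizations", "dates")


@dataclass
class TagResult:
    tags: List[str] = field(default_factory=list)
    entities: Dict[str, List[str]] = field(default_factory=dict)


def normalize_tag(raw: str) -> str:
    # Lowercase and collapse non-alphanumeric runs to "-": "Data Analysis" -> "data-analysis"
    return re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")


def parse_llm_tags(response: str, max_tags: int = 8) -> TagResult:
    """Turn a hypothetical JSON reply from the LLM into clean metadata.

    Assumed shape:
    {"tags": [...], "entities": {"people": [...], "organizations": [...], "dates": [...]}}
    """
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return TagResult()  # malformed reply: keep the document, just untagged
    if not isinstance(data, dict):
        return TagResult()

    raw_tags = data.get("tags", [])
    tags: List[str] = []
    if isinstance(raw_tags, list):
        for raw in raw_tags:
            if isinstance(raw, str):
                tag = normalize_tag(raw)
                if tag and tag not in tags:  # drop empties and duplicates
                    tags.append(tag)

    entities: Dict[str, List[str]] = {}
    raw_entities = data.get("entities", {})
    if isinstance(raw_entities, dict):
        for kind in ENTITY_KINDS:
            values = raw_entities.get(kind, [])
            if isinstance(values, list):
                cleaned = [v.strip() for v in values if isinstance(v, str) and v.strip()]
                if cleaned:
                    # dict.fromkeys dedupes while keeping first-seen order
                    entities[kind] = list(dict.fromkeys(cleaned))

    return TagResult(tags=tags[:max_tags], entities=entities)
```

Note that this ASCII-only slug rule would mangle non-English tags, so a real version may want a Unicode-aware normalization.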
Considerations
- Batch imports: Process queue with progress indicator
- Re-summarization: Allow regenerating summaries after a model upgrade
- Sync: Summaries sync via iroh-docs, so peers benefit without re-processing
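For batch imports and re-summarization, one option is to record which model produced each summary so stale entries can be found after an upgrade. The `summary_model` field, `CURRENT_MODEL`, and the helper functions are hypothetical; the progress callback just reports `(done, total)` for a UI indicator.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

CURRENT_MODEL = "summarizer-v2"  # hypothetical model identifier


@dataclass
class SummaryRecord:
    text_hash: str
    summary: Optional[str] = None
    # Hypothetical extra metadata field: which model produced the summary.
    summary_model: Optional[str] = None


def needs_summary(rec: SummaryRecord, model: str = CURRENT_MODEL) -> bool:
    """True if there is no summary yet or it came from a different model."""
    return rec.summary is None or rec.summary_model != model


def process_batch(
    records: List[SummaryRecord],
    summarize: Callable[[SummaryRecord], str],
    on_progress: Callable[[int, int], None],
    model: str = CURRENT_MODEL,
) -> int:
    """Summarize only stale records, reporting (done, total) after each one."""
    todo = [r for r in records if needs_summary(r, model)]
    for done, rec in enumerate(todo, start=1):
        rec.summary = summarize(rec)
        rec.summary_model = model
        on_progress(done, len(todo))
    return len(todo)
```

If the model name is stored alongside the synced summary, a peer that receives an up-to-date record would see `needs_summary` return false and skip re-processing it.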