
@devin-ai-integration
Contributor

Fix issue #3764: Implement lazy loading for knowledge sources to prevent authentication errors

Summary

This PR fixes a bug where adding knowledge sources to agents caused authentication errors (401) during initialization, even when users only wanted to provide context without persisting to a vector database.

Root Cause: Knowledge sources were being eagerly loaded and saved to storage during agent/crew initialization (agent.set_knowledge() and crew.create_crew_knowledge()), which required valid embedding service credentials. The default embedder uses OpenAI's API, causing 401 errors for users without proper credentials.

Solution: Implemented a lazy-loading pattern. Knowledge sources are now loaded when first queried rather than during initialization, so agents can be created with knowledge sources without requiring immediate authentication against the embedding service.

Changes:

  • Added a private _sources_loaded attribute to the Knowledge class to track loading state
  • Modified Knowledge.query() to check and load sources on first query
  • Modified Knowledge.add_sources() to set the loaded flag
  • Removed eager add_sources() calls from agent.set_knowledge() and crew.create_crew_knowledge()
  • Added a comprehensive test suite for lazy-loading behavior
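The pattern described in the change list can be sketched as follows. This is a simplified, hypothetical stand-in, not crewAI's actual Knowledge class: the `embed` callable, the in-memory `_store`, and the `load_count` counter are assumptions made for illustration, and substring matching replaces real vector search. Only the `_sources_loaded` flag, `add_sources()`, and `query()` names come from this PR.

```python
from typing import Callable, List, Tuple


class Knowledge:
    """Simplified stand-in for a lazily loaded knowledge container (hypothetical)."""

    def __init__(self, sources: List[str], embed: Callable[[str], List[float]]) -> None:
        self.sources = sources
        self._embed = embed  # hypothetical embedder; a real default may call a remote API
        self._store: List[Tuple[str, List[float]]] = []  # hypothetical in-memory "vector store"
        self._sources_loaded = False  # loading-state flag, as described in this PR
        self.load_count = 0  # instrumentation for this sketch only

    def add_sources(self) -> None:
        # Embedding (and therefore any authentication) happens only here.
        self._store = [(text, self._embed(text)) for text in self.sources]
        self.load_count += 1
        self._sources_loaded = True  # explicit calls also mark sources as loaded

    def query(self, terms: List[str]) -> List[str]:
        # Lazy load: the first query triggers add_sources(); later queries reuse the store.
        if not self._sources_loaded:
            self.add_sources()
        # Substring matching stands in for vector similarity search.
        return [text for text, _ in self._store if any(t in text for t in terms)]
```

Constructing a `Knowledge` never calls `embed`, so an embedder that would fail with a 401 only fails on the first `query()`. If that load raises, `_sources_loaded` stays `False` and the next query retries; calling `add_sources()` explicitly beforehand means `query()` skips the load entirely.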

Review & Testing Checklist for Human

  • Critical: Test with a missing or invalid OPENAI_API_KEY to verify that no 401 error is raised during agent/crew initialization (the core bug fix)
  • Verify that error messages during lazy loading (first query) are clear and actionable for users
  • Check if there are any use cases that rely on knowledge sources being validated at initialization time (potential breaking change)
  • Consider thread safety: if knowledge sources might be queried from multiple threads before initial load, race conditions could occur
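One way to close the race in the last item is double-checked locking around the first load. The sketch below is a hypothetical `ThreadSafeKnowledge` class, not something this PR implements; `load` is an assumed callable standing in for whatever embeds and persists the sources.

```python
import threading
from typing import Callable, List


class ThreadSafeKnowledge:
    """Hypothetical variant that guards the first load with a lock."""

    def __init__(self, sources: List[str], load: Callable[[List[str]], None]) -> None:
        self.sources = sources
        self._load = load  # hypothetical: embeds and persists the sources
        self._sources_loaded = False
        self._lock = threading.Lock()

    def _ensure_loaded(self) -> None:
        if self._sources_loaded:  # fast path: no locking once loaded
            return
        with self._lock:
            # Re-check inside the lock: another thread may have finished loading meanwhile.
            if not self._sources_loaded:
                self._load(self.sources)
                self._sources_loaded = True  # set only after a successful load

    def query(self, terms: List[str]) -> List[str]:
        self._ensure_loaded()
        # Substring matching stands in for vector similarity search.
        return [s for s in self.sources if any(t in s for t in terms)]
```

Concurrent first queries serialize on the lock, and only one of them calls `load`. If `load` raises, the flag stays `False`, so a later query retries instead of caching the failure.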

Recommended Test Plan

  1. Create an agent with TextFileKnowledgeSource without setting OPENAI_API_KEY
  2. Verify agent creation succeeds (previously would fail with 401)
  3. Query the knowledge and verify that it fails gracefully with a clear error message
  4. Set valid credentials and verify querying works correctly
  5. Check that subsequent queries don't reload sources (performance)
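The five steps above can be rehearsed offline with stand-ins. In this sketch, `fake_embed`, `LazyKnowledge`, and `KnowledgeLoadError` are all hypothetical: they approximate an OpenAI-backed embedder, a lazily loaded knowledge container, and an actionable error type. A real run would still need an actual agent with `TextFileKnowledgeSource` and real credentials.

```python
import os
from typing import List


class KnowledgeLoadError(RuntimeError):
    """Hypothetical error type for failures during the lazy first load."""


def fake_embed(text: str) -> List[float]:
    # Stand-in for an API-backed embedder: without a key it fails the way a 401 would.
    if not os.environ.get("OPENAI_API_KEY"):
        raise PermissionError("401 Unauthorized: missing API key")
    return [float(len(text))]


class LazyKnowledge:
    """Minimal lazy-loading stand-in (hypothetical, not the PR's class)."""

    def __init__(self, sources: List[str]) -> None:
        self.sources = sources  # no embedding or network I/O at construction
        self._sources_loaded = False
        self.loads = 0

    def query(self, term: str) -> List[str]:
        if not self._sources_loaded:
            try:
                for source in self.sources:
                    fake_embed(source)
            except PermissionError as exc:
                # Surface an actionable message instead of a bare authentication error.
                raise KnowledgeLoadError(
                    "Failed to embed knowledge sources on first query; "
                    "set OPENAI_API_KEY or configure a different embedder."
                ) from exc
            self.loads += 1
            self._sources_loaded = True
        return [s for s in self.sources if term in s]


def run_test_plan() -> None:
    saved = os.environ.pop("OPENAI_API_KEY", None)
    try:
        knowledge = LazyKnowledge(["crew docs", "agent docs"])  # steps 1-2: no key needed
        failed_clearly = False
        try:
            knowledge.query("crew")
        except KnowledgeLoadError as err:
            failed_clearly = "OPENAI_API_KEY" in str(err)  # step 3: actionable error
        assert failed_clearly
        os.environ["OPENAI_API_KEY"] = "sk-test"  # step 4: fake "valid" credentials
        assert knowledge.query("crew") == ["crew docs"]
        knowledge.query("agent")
        assert knowledge.loads == 1  # step 5: sources were not reloaded
    finally:
        # Restore the caller's environment.
        if saved is None:
            os.environ.pop("OPENAI_API_KEY", None)
        else:
            os.environ["OPENAI_API_KEY"] = saved
```

Calling `run_test_plan()` walks through steps 1 to 5 in order and leaves the environment as it found it.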

Notes

  • This is a behavioral change: sources load on first query instead of at initialization
  • Error timing shifts: authentication/loading errors now occur during first query, not during initialization
  • All tests pass, but they mock the storage layer; real-world testing with actual credentials is essential
  • Thread safety not explicitly handled - concurrent first queries could trigger multiple load attempts

Link to Devin run: https://app.devin.ai/sessions/acca508411e24db9b564b7a427211553
Requested by: João ([email protected])

This commit fixes a bug where knowledge sources were being loaded eagerly
during agent/crew initialization, causing authentication errors (401) when
users didn't have proper credentials configured.

Changes:
- Modified Knowledge class to use lazy loading pattern
- Added _sources_loaded private attribute to track loading state
- Knowledge sources are now loaded only when first queried
- Removed eager add_sources() calls from agent.set_knowledge() and crew.create_crew_knowledge()
- Added comprehensive tests for lazy loading behavior

The fix ensures that:
1. Knowledge sources don't require authentication during initialization
2. Sources are loaded on-demand when actually needed (first query)
3. Subsequent queries don't reload sources
4. Explicit add_sources() calls still work as expected

Fixes #3764

Co-Authored-By: João <[email protected]>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

