[AI-010a] Real Embedding Generation
Story Points: 3
Epic: AI Integration
Dependencies: AI-001 (Ollama Service), AI-002 (Model Management), AI-004 (Vector Generation)
Branch: feature/AI-010a
Related: Split from original AI-010 (5 points) - Part 1 of 2
Description
Replace the stub VectorGenerator implementation with real embedding generation using Ollama embedding models. This story focuses on the core functionality of generating actual vector embeddings (not zeros) with proper ModelManager integration.
Current State: VectorGenerator exists but returns np.zeros(768) - a placeholder implementation.
Target State: VectorGenerator generates real embeddings using Ollama models with ModelManager integration.
User Stories
- As a developer, I need real vector embeddings (not stubs) for accurate similarity search
- As a system, I need integration with ModelManager for consistent model handling
- As a user, I need embeddings generated within performance SLAs (<1s per fragment)
BDD Scenarios
Feature: Real Embedding Generation
Scenario: Generate embeddings for text
Given I have text content to embed
When I generate vector embeddings
Then real embeddings are created with correct dimensions
And performance is under 1s per fragment
Scenario: Integration with ModelManager
Given ModelManager is available
When I request embeddings with a specific model
Then the model is loaded via ModelManager
And embeddings are generated using that model
Scenario: Error handling
Given a model is unavailable
When attempting to generate embeddings
Then an appropriate error is raised
And the error message is descriptive
Acceptance Criteria
- Real API calls to embedding models (no stubs or mocks)
- Integration with ModelManager working correctly
- Embeddings generated with correct dimensions (model-specific)
- Performance under 1s per text fragment
- Basic error handling implemented
- Unit and integration tests passing (container-first TDD)
- Documentation updated
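Since embedding dimensions are model-specific, a small validation helper keeps bad vectors from reaching similarity search. This is an illustrative sketch, not part of the current API; the model names and dimensions come from this story (nomic-embed-text: 768, all-minilm-L6-v2: 384).

```python
import numpy as np

# Model-to-dimension map taken from this story's model choices.
EXPECTED_DIMS = {
    "nomic-embed-text": 768,
    "all-minilm-L6-v2": 384,
}

def validate_embedding(model: str, embedding: np.ndarray) -> np.ndarray:
    """Raise ValueError if the embedding's shape does not match the model."""
    expected = EXPECTED_DIMS.get(model)
    if expected is None:
        raise ValueError(f"Unknown embedding model: {model!r}")
    if embedding.shape != (expected,):
        raise ValueError(
            f"{model} should produce {expected}-dim vectors, got {embedding.shape}"
        )
    return embedding
```

A check like this would run right after each `generate()` call, so a dimension mismatch fails loudly instead of silently corrupting the vector store.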
Technical Approach
Current Implementation
```python
import numpy as np

class VectorGenerator:
    async def generate(self, text: str) -> np.ndarray:
        # Placeholder implementation: always returns a zero vector
        return np.zeros(768, dtype=np.float32)
```
Target Implementation
```python
import numpy as np

from pseudoscribe.infrastructure.model_manager import ModelManager
from pseudoscribe.infrastructure.ollama_service import OllamaService

class VectorGenerator:
    def __init__(self, model_manager: ModelManager):
        self.model_manager = model_manager
        self.ollama_service = OllamaService()

    async def generate(self, text: str, model: str = "nomic-embed-text") -> np.ndarray:
        """Generate real embeddings using Ollama."""
        response = await self.ollama_service.embed(model, text)
        return np.array(response["embedding"], dtype=np.float32)
```
Key Changes
- Add OllamaService integration
- Connect to ModelManager from AI-002
- Use real embedding models (nomic-embed-text initially)
- Return actual embeddings, not zeros
- Proper async/await patterns
- Error handling for model unavailability
Test Strategy
Unit Tests
- Embedding generation logic
- Dimension validation
- Error handling
Integration Tests
- ModelManager integration
- Ollama service connectivity
- Real embedding generation
Performance Tests
- <1s per text fragment SLA
- Various text lengths
- Model loading time
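The <1s SLA check above can be expressed as a small timing harness. This is a sketch only: `fake_generate` stands in for `VectorGenerator.generate`, which a real performance test would call against a live model.

```python
import asyncio
import time

SLA_SECONDS = 1.0  # <1s per fragment, per this story's SLA

async def timed(coro):
    """Await a coroutine and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

# Stand-in for VectorGenerator.generate (hypothetical, for illustration)
async def fake_generate(text: str) -> list[float]:
    await asyncio.sleep(0.01)  # simulate a fast model call
    return [0.0] * 768

result, elapsed = asyncio.run(timed(fake_generate("some text fragment")))
assert elapsed < SLA_SECONDS, f"embedding took {elapsed:.2f}s, SLA is {SLA_SECONDS}s"
```

Running this per text-length bucket (short, typical, long) covers the "various text lengths" case; model loading time would be measured separately, since it is a one-time cost rather than a per-fragment one.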
Container-First TDD
- All tests run in Kubernetes environment
- Real Ollama integration (no mocks)
- Use tinyllama or nomic-embed-text models
Implementation Notes
Embedding Models
- Primary: nomic-embed-text (768 dimensions)
- Alternative: all-minilm-L6-v2 (384 dimensions)
- Support single model initially (multi-model in AI-010b)
Dependencies
- Requires Ollama service running (ollama-svc:11434)
- Requires ModelManager from AI-002
- Requires embedding model loaded in Ollama
Performance Considerations
- Target: <1s per text fragment
- Optimize for typical text lengths (100-500 words)
- Async operations throughout
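Since the SLA is tuned for typical fragments of 100-500 words, longer documents would need to be split before embedding. A hypothetical word-bounded splitter (not part of the current API) might look like:

```python
def split_into_fragments(text: str, max_words: int = 500) -> list[str]:
    """Split text into word-bounded fragments of at most max_words words.

    Illustrative helper: keeps each embedding call within the
    100-500 word range the per-fragment SLA is tuned for.
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ] or [""]

fragments = split_into_fragments("word " * 1200, max_words=500)
print([len(f.split()) for f in fragments])  # [500, 500, 200]
```

Each fragment can then be embedded concurrently with `asyncio.gather`, keeping the async-throughout design while staying inside the per-fragment budget.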
Definition of Done
- Code reviewed and approved (2 reviewers)
- All tests passing (container environment)
- Performance SLAs met (<1s per fragment)
- Documentation updated
- No breaking changes to existing API
- Ready for AI-010b (caching layer)
Related Issues
- Followed by: AI-010b (Embedding Cache & Multi-Model), which depends on this story
- Original: This is part 1 of original AI-010 (5 points split into 3+2)
Estimated Effort
Story Points: 3
Time Estimate: 3-5 days
Complexity: Medium
Breakdown
- Day 1: Ollama integration and basic generation
- Day 2: ModelManager integration
- Day 3: Error handling and testing
- Days 4-5: Performance optimization and documentation
Priority: High
Type: Feature
Component: AI
Epic: AI Integration