
[AI-010a] Real Embedding Generation #62



Story Points: 3
Epic: AI Integration
Dependencies: AI-001 (Ollama Service), AI-002 (Model Management), AI-004 (Vector Generation)
Branch: feature/AI-010a
Related: Split from original AI-010 (5 points) - Part 1 of 2


Description

Replace the stub VectorGenerator implementation with real embedding generation using Ollama embedding models. This story focuses on the core functionality of generating actual vector embeddings (not zeros) with proper ModelManager integration.

Current State: VectorGenerator exists but returns np.zeros(768) - a placeholder implementation.

Target State: VectorGenerator generates real embeddings using Ollama models with ModelManager integration.


User Stories

  • As a developer, I need real vector embeddings (not stubs) for accurate similarity search
  • As a system, I need integration with ModelManager for consistent model handling
  • As a user, I need embeddings generated within performance SLAs (<1s per fragment)

BDD Scenarios

Feature: Real Embedding Generation

Scenario: Generate embeddings for text
  Given I have text content to embed
  When I generate vector embeddings
  Then real embeddings are created with correct dimensions
  And performance is under 1s per fragment

Scenario: Integration with ModelManager
  Given ModelManager is available
  When I request embeddings with a specific model
  Then the model is loaded via ModelManager
  And embeddings are generated using that model

Scenario: Error handling
  Given a model is unavailable
  When attempting to generate embeddings
  Then an appropriate error is raised
  And the error message is descriptive

Acceptance Criteria

  • Real API calls to embedding models (no stubs or mocks)
  • Integration with ModelManager working correctly
  • Embeddings generated with correct dimensions (model-specific)
  • Performance under 1s per text fragment
  • Basic error handling implemented
  • Unit and integration tests passing (container-first TDD)
  • Documentation updated

Technical Approach

Current Implementation

import numpy as np

class VectorGenerator:
    async def generate(self, text: str) -> np.ndarray:
        # Stub: returns a zero vector instead of a real embedding
        return np.zeros(768, dtype=np.float32)

Target Implementation

import numpy as np

from pseudoscribe.infrastructure.ollama_service import OllamaService
from pseudoscribe.infrastructure.model_manager import ModelManager

class VectorGenerator:
    def __init__(self, model_manager: ModelManager):
        self.model_manager = model_manager
        self.ollama_service = OllamaService()

    async def generate(self, text: str, model: str = "nomic-embed-text") -> np.ndarray:
        """Generate real embeddings via the Ollama embedding API."""
        response = await self.ollama_service.embed(model, text)
        return np.array(response["embedding"], dtype=np.float32)
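Since embedding dimensions are model-specific, the generator will need a dimension check somewhere. A minimal validation helper might look like the sketch below; the model names and dimension counts are taken from the models mentioned in this story, and the `all-minilm` tag is an assumption about the Ollama model name:

```python
import numpy as np

# Expected embedding dimensions per model (values as stated in this story;
# the "all-minilm" tag is a hypothetical Ollama name for all-MiniLM-L6-v2)
EXPECTED_DIMS = {
    "nomic-embed-text": 768,
    "all-minilm": 384,
}

def validate_embedding(model: str, embedding: np.ndarray) -> np.ndarray:
    """Raise a descriptive error if the embedding shape is unexpected."""
    expected = EXPECTED_DIMS.get(model)
    if expected is None:
        raise ValueError(f"Unknown embedding model: {model!r}")
    if embedding.shape != (expected,):
        raise ValueError(
            f"Model {model!r} should produce {expected}-dim embeddings, "
            f"got shape {embedding.shape}"
        )
    return embedding
```

This also doubles as the "dimension validation" unit-test target listed in the test strategy.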

Key Changes

  1. Add OllamaService integration
  2. Connect to ModelManager from AI-002
  3. Use real embedding models (nomic-embed-text initially)
  4. Return actual embeddings, not zeros
  5. Proper async/await patterns
  6. Error handling for model unavailability
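Change 6 could be implemented as a thin wrapper that converts low-level failures into descriptive errors, matching the BDD error-handling scenario. The exception types below are assumptions, since the real OllamaService error surface is not specified in this story:

```python
import asyncio
from typing import Awaitable, Callable

async def embed_with_errors(
    embed_call: Callable[[], Awaitable[dict]],
    model: str,
) -> dict:
    """Invoke an embedding call, converting low-level failures into
    descriptive RuntimeErrors (hypothetical error surface; adjust the
    except clauses to the real OllamaService exceptions)."""
    try:
        return await embed_call()
    except ConnectionError as exc:
        raise RuntimeError(
            f"Ollama service unreachable while embedding with {model!r}: {exc}"
        ) from exc
    except KeyError as exc:
        raise RuntimeError(
            f"Model {model!r} is not available in Ollama: {exc}"
        ) from exc
```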

Test Strategy

Unit Tests

  • Embedding generation logic
  • Dimension validation
  • Error handling

Integration Tests

  • ModelManager integration
  • Ollama service connectivity
  • Real embedding generation

Performance Tests

  • <1s per text fragment SLA
  • Various text lengths
  • Model loading time

Container-First TDD

  • All tests run in Kubernetes environment
  • Real Ollama integration (no mocks)
  • Use tinyllama or nomic-embed-text models
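An integration test for the "real embeddings, not zeros" criterion might be structured as below. A trivial in-process stand-in is used here only so the test shape is clear; per the container-first strategy above, the real test would exercise the live ollama-svc endpoint with no mocks:

```python
import asyncio
import numpy as np

class FakeVectorGenerator:
    """Stand-in with the same interface as the planned VectorGenerator."""
    async def generate(self, text: str, model: str = "nomic-embed-text") -> np.ndarray:
        # The real implementation would call OllamaService.embed here.
        return np.random.default_rng(0).standard_normal(768).astype(np.float32)

def test_generates_real_embeddings():
    gen = FakeVectorGenerator()
    vec = asyncio.run(gen.generate("Some fragment of text"))
    assert vec.shape == (768,)          # nomic-embed-text dimensions
    assert vec.dtype == np.float32
    assert np.any(vec != 0.0)           # must not be the zero stub
```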

Implementation Notes

Embedding Models

  • Primary: nomic-embed-text (768 dimensions)
  • Alternative: all-MiniLM-L6-v2 (384 dimensions)
  • Support single model initially (multi-model in AI-010b)

Dependencies

  • Requires Ollama service running (ollama-svc:11434)
  • Requires ModelManager from AI-002
  • Requires embedding model loaded in Ollama

Performance Considerations

  • Target: <1s per text fragment
  • Optimize for typical text lengths (100-500 words)
  • Async operations throughout
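The <1s SLA can be checked with a simple wall-clock wrapper around a single embedding call. This is a sketch: the `fake_embed` coroutine below is a placeholder, and the real measurement would time a call against the Ollama service:

```python
import asyncio
import time

async def timed_embed(embed_coro) -> tuple:
    """Measure wall-clock latency of a single embedding call."""
    start = time.perf_counter()
    result = await embed_coro
    return time.perf_counter() - start, result

# Placeholder call standing in for the real Ollama request:
async def fake_embed():
    await asyncio.sleep(0.01)
    return [0.0] * 768

elapsed, vec = asyncio.run(timed_embed(fake_embed()))
assert elapsed < 1.0, f"Embedding took {elapsed:.3f}s, exceeding the 1s SLA"
```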

Definition of Done

  • Code reviewed and approved (2 reviewers)
  • All tests passing (container environment)
  • Performance SLAs met (<1s per fragment)
  • Documentation updated
  • No breaking changes to existing API
  • Ready for AI-010b (caching layer)

Related Issues

  • Followed by: AI-010b (Embedding Cache & Multi-Model), which depends on this story
  • Original: This is part 1 of original AI-010 (5 points split into 3+2)

Estimated Effort

Story Points: 3
Time Estimate: 3-5 days
Complexity: Medium

Breakdown

  • Day 1: Ollama integration and basic generation
  • Day 2: ModelManager integration
  • Day 3: Error handling and testing
  • Days 4-5: Performance optimization and documentation

Priority: High
Type: Feature
Component: AI
Epic: AI Integration
