
[AI-010a] Real Embedding Generation #62



Story Points: 3
Epic: AI Integration
Dependencies: AI-001 (Ollama Service), AI-002 (Model Management), AI-004 (Vector Generation)
Branch: feature/AI-010a
Related: Split from original AI-010 (5 points) - Part 1 of 2


Description

Replace the stub VectorGenerator implementation with real embedding generation using Ollama embedding models. This story focuses on the core functionality of generating actual vector embeddings (not zeros) with proper ModelManager integration.

Current State: VectorGenerator exists but returns np.zeros(768) - a placeholder implementation.

Target State: VectorGenerator generates real embeddings using Ollama models with ModelManager integration.


User Stories

  • As a developer, I need real vector embeddings (not stubs) for accurate similarity search
  • As a system, I need integration with ModelManager for consistent model handling
  • As a user, I need embeddings generated within performance SLAs (<1s per fragment)

BDD Scenarios

Feature: Real Embedding Generation

Scenario: Generate embeddings for text
  Given I have text content to embed
  When I generate vector embeddings
  Then real embeddings are created with correct dimensions
  And performance is under 1s per fragment

Scenario: Integration with ModelManager
  Given ModelManager is available
  When I request embeddings with a specific model
  Then the model is loaded via ModelManager
  And embeddings are generated using that model

Scenario: Error handling
  Given a model is unavailable
  When attempting to generate embeddings
  Then an appropriate error is raised
  And the error message is descriptive

Acceptance Criteria

  • Real API calls to embedding models (no stubs or mocks)
  • Integration with ModelManager working correctly
  • Embeddings generated with correct dimensions (model-specific)
  • Performance under 1s per text fragment
  • Basic error handling implemented
  • Unit and integration tests passing (container-first TDD)
  • Documentation updated

Technical Approach

Current Implementation

import numpy as np

class VectorGenerator:
    async def generate(self, text: str) -> np.ndarray:
        # Stub: returns a zero vector instead of a real embedding
        return np.zeros(768, dtype=np.float32)

Target Implementation

import numpy as np

from pseudoscribe.infrastructure.ollama_service import OllamaService
from pseudoscribe.infrastructure.model_manager import ModelManager

class VectorGenerator:
    def __init__(self, model_manager: ModelManager):
        self.model_manager = model_manager
        self.ollama_service = OllamaService()

    async def generate(self, text: str, model: str = "nomic-embed-text") -> np.ndarray:
        """Generate real embeddings via the Ollama embedding API."""
        response = await self.ollama_service.embed(model, text)
        return np.array(response["embedding"], dtype=np.float32)
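Since embedding dimensions are model-specific, the generator will need a dimension check somewhere. A minimal validation helper might look like the sketch below; the model names and dimension counts are taken from the models mentioned in this story, and the `all-minilm` tag is an assumption about the Ollama model name:

```python
import numpy as np

# Expected embedding dimensions per model (values as stated in this story;
# the "all-minilm" tag is a hypothetical Ollama name for all-MiniLM-L6-v2)
EXPECTED_DIMS = {
    "nomic-embed-text": 768,
    "all-minilm": 384,
}

def validate_embedding(model: str, embedding: np.ndarray) -> np.ndarray:
    """Raise a descriptive error if the embedding shape is unexpected."""
    expected = EXPECTED_DIMS.get(model)
    if expected is None:
        raise ValueError(f"Unknown embedding model: {model!r}")
    if embedding.shape != (expected,):
        raise ValueError(
            f"Model {model!r} should produce {expected}-dim embeddings, "
            f"got shape {embedding.shape}"
        )
    return embedding
```

This also doubles as the "dimension validation" unit-test target listed in the test strategy.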

Key Changes

  1. Add OllamaService integration
  2. Connect to ModelManager from AI-002
  3. Use real embedding models (nomic-embed-text initially)
  4. Return actual embeddings, not zeros
  5. Proper async/await patterns
  6. Error handling for model unavailability
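Change 6 could be implemented as a thin wrapper that converts low-level failures into descriptive errors, matching the BDD error-handling scenario. The exception types below are assumptions, since the real OllamaService error surface is not specified in this story:

```python
import asyncio
from typing import Awaitable, Callable

async def embed_with_errors(
    embed_call: Callable[[], Awaitable[dict]],
    model: str,
) -> dict:
    """Invoke an embedding call, converting low-level failures into
    descriptive RuntimeErrors (hypothetical error surface; adjust the
    except clauses to the real OllamaService exceptions)."""
    try:
        return await embed_call()
    except ConnectionError as exc:
        raise RuntimeError(
            f"Ollama service unreachable while embedding with {model!r}: {exc}"
        ) from exc
    except KeyError as exc:
        raise RuntimeError(
            f"Model {model!r} is not available in Ollama: {exc}"
        ) from exc
```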

Test Strategy

Unit Tests

  • Embedding generation logic
  • Dimension validation
  • Error handling

Integration Tests

  • ModelManager integration
  • Ollama service connectivity
  • Real embedding generation

Performance Tests

  • <1s per text fragment SLA
  • Various text lengths
  • Model loading time

Container-First TDD

  • All tests run in Kubernetes environment
  • Real Ollama integration (no mocks)
  • Use tinyllama or nomic-embed-text models
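An integration test for the "real embeddings, not zeros" criterion might be structured as below. A trivial in-process stand-in is used here only so the test shape is clear; per the container-first strategy above, the real test would exercise the live ollama-svc endpoint with no mocks:

```python
import asyncio
import numpy as np

class FakeVectorGenerator:
    """Stand-in with the same interface as the planned VectorGenerator."""
    async def generate(self, text: str, model: str = "nomic-embed-text") -> np.ndarray:
        # The real implementation would call OllamaService.embed here.
        return np.random.default_rng(0).standard_normal(768).astype(np.float32)

def test_generates_real_embeddings():
    gen = FakeVectorGenerator()
    vec = asyncio.run(gen.generate("Some fragment of text"))
    assert vec.shape == (768,)          # nomic-embed-text dimensions
    assert vec.dtype == np.float32
    assert np.any(vec != 0.0)           # must not be the zero stub
```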

Implementation Notes

Embedding Models

  • Primary: nomic-embed-text (768 dimensions)
  • Alternative: all-MiniLM-L6-v2 (384 dimensions)
  • Support single model initially (multi-model in AI-010b)

Dependencies

  • Requires Ollama service running (ollama-svc:11434)
  • Requires ModelManager from AI-002
  • Requires embedding model loaded in Ollama

Performance Considerations

  • Target: <1s per text fragment
  • Optimize for typical text lengths (100-500 words)
  • Async operations throughout
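The <1s SLA can be checked with a simple wall-clock wrapper around a single embedding call. This is a sketch: the `fake_embed` coroutine below is a placeholder, and the real measurement would time a call against the Ollama service:

```python
import asyncio
import time

async def timed_embed(embed_coro) -> tuple:
    """Measure wall-clock latency of a single embedding call."""
    start = time.perf_counter()
    result = await embed_coro
    return time.perf_counter() - start, result

# Placeholder call standing in for the real Ollama request:
async def fake_embed():
    await asyncio.sleep(0.01)
    return [0.0] * 768

elapsed, vec = asyncio.run(timed_embed(fake_embed()))
assert elapsed < 1.0, f"Embedding took {elapsed:.3f}s, exceeding the 1s SLA"
```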

Definition of Done

  • Code reviewed and approved (2 reviewers)
  • All tests passing (container environment)
  • Performance SLAs met (<1s per fragment)
  • Documentation updated
  • No breaking changes to existing API
  • Ready for AI-010b (caching layer)

Related Issues

  • Followed by: AI-010b (Embedding Cache & Multi-Model), which depends on this story
  • Original: This is part 1 of original AI-010 (5 points split into 3+2)

Estimated Effort

Story Points: 3
Time Estimate: 3-5 days
Complexity: Medium

Breakdown

  • Day 1: Ollama integration and basic generation
  • Day 2: ModelManager integration
  • Day 3: Error handling and testing
  • Days 4-5: Performance optimization and documentation

Priority: High
Type: Feature
Component: AI
Epic: AI Integration
