
feat: use structured output to improve responses#227

Open
vegerot wants to merge 2 commits into JerryZLiu:main from vegerot:pr227

Conversation

@vegerot
Contributor

@vegerot vegerot commented Mar 5, 2026


Copilot AI left a comment


Pull request overview

This PR refactors AI/LLM-related code to use structured output (JSON schemas) for Gemini API responses, and consolidates previously duplicated prompt templates, parsing helpers, and validation logic into shared utilities.

Changes:

  • Introduces LLMSchema.swift with JSON schema definitions for structured LLM output, and LLMPromptTemplates/LLMTranscriptUtilities/LLMTimelineCardValidation shared enums in TimeParsing.swift and GeminiPromptPreferences.swift
  • Renames GeminiPromptOverrides/GeminiPromptPreferences/GeminiPromptSections to provider-agnostic VideoPromptOverrides/VideoPromptPreferences/VideoPromptSections
  • Generalizes TestConnectionView to accept a provider parameter instead of being Gemini-only, and fixes card layout calculations in OnboardingLLMSelectionView

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Summary per file:

  • LLMSchema.swift: New file defining JSON schemas for structured LLM responses
  • GeminiPromptPreferences.swift: Renames Gemini-specific types to generic names; adds shared LLMPromptTemplates enum
  • GeminiDirectProvider.swift: Replaces inline prompts and parsing with shared utilities; switches to structured output via responseJsonSchema
  • TimeParsing.swift: Adds shared LLMVideoTimestampUtilities, LLMTimelineCardValidation, and LLMTranscriptUtilities
  • TestConnectionView.swift: Adds provider parameter to support multiple providers
  • OnboardingLLMSelectionView.swift: Fixes card width calculation to use dynamic provider count
  • SettingsProvidersTabView.swift: Passes .gemini provider to TestConnectionView
  • ProvidersSettingsViewModel.swift: Updates references from GeminiPromptPreferences to VideoPromptPreferences


Comment on lines +998 to +999
let transcriptionSchemaObject = try! JSONSerialization.jsonObject(
    with: Data(LLMSchema.screenRecordingTranscriptionSchema.utf8))

Copilot AI Mar 5, 2026


Using try! here will crash the app if LLMSchema.screenRecordingTranscriptionSchema ever contains invalid JSON. Since the schema is a string literal that could be malformed during development or future edits, this should use try with proper error handling instead of a force-try.

Suggested change
let transcriptionSchemaObject = try! JSONSerialization.jsonObject(
    with: Data(LLMSchema.screenRecordingTranscriptionSchema.utf8))
guard let schemaData = LLMSchema.screenRecordingTranscriptionSchema.data(using: .utf8) else {
    throw NSError(
        domain: "GeminiError",
        code: 7,
        userInfo: [NSLocalizedDescriptionKey: "Invalid UTF-8 in transcription schema"]
    )
}
let transcriptionSchemaObject: Any
do {
    transcriptionSchemaObject = try JSONSerialization.jsonObject(with: schemaData, options: [])
} catch {
    throw NSError(
        domain: "GeminiError",
        code: 8,
        userInfo: [
            NSLocalizedDescriptionKey: "Failed to parse transcription schema JSON",
            NSUnderlyingErrorKey: error,
        ]
    )
}

Comment on lines +1680 to 1687
let activityCardsSchemaObject = try? JSONSerialization.jsonObject(
    with: Data(LLMSchema.activityCardsSchema.utf8))
let generationConfig: [String: Any] = [
    "temperature": 0.7,
    "maxOutputTokens": 8192,
    "responseMimeType": "application/json",
    "responseJsonSchema": activityCardsSchemaObject,
]

Copilot AI Mar 5, 2026


activityCardsSchemaObject is Optional because try? is used, so it will be nil if JSON parsing fails. Inserting an optional value into the generationConfig dictionary means responseJsonSchema will be set to nil, which will likely be serialized as NSNull or cause the structured output to be silently skipped. The try! pattern used on line 998 (however inadvisable) at least exposes the failure; here it silently produces broken config. Consider using try with error propagation or at minimum force-unwrapping with a guard.
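A minimal sketch of the error-propagating alternative the comment suggests. The schema string and function name here are hypothetical stand-ins for the PR's code; the point is that `try` makes a malformed schema surface as a thrown error rather than a silent `nil` inside the config dictionary:

```swift
import Foundation

// Hypothetical stand-in for the PR's schema string.
enum LLMSchema {
    static let activityCardsSchema = #"{"type": "object", "properties": {"cards": {"type": "array"}}}"#
}

// Parse the schema with `try` so a malformed string throws instead of
// silently producing a nil "responseJsonSchema" entry.
func makeActivityCardsGenerationConfig() throws -> [String: Any] {
    let schemaObject = try JSONSerialization.jsonObject(
        with: Data(LLMSchema.activityCardsSchema.utf8))
    return [
        "temperature": 0.7,
        "maxOutputTokens": 8192,
        "responseMimeType": "application/json",
        "responseJsonSchema": schemaObject,
    ]
}
```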

"mediaResolution": "MEDIA_RESOLUTION_HIGH",
"responseMimeType": "application/json",
"responseSchema": transcriptionSchema,
"responseJsonSchema": transcriptionSchemaObject,

Copilot AI Mar 5, 2026


The Gemini API uses "responseSchema" (not "responseJsonSchema") as the key for the structured output schema in its generationConfig. The previous code used "responseSchema" correctly. Changing it to "responseJsonSchema" will cause the API to ignore the schema and return unstructured output, breaking the downstream JSON parsing.

Suggested change
"responseJsonSchema": transcriptionSchemaObject,
"responseSchema": transcriptionSchemaObject,


enum GeminiPromptPreferences {
enum VideoPromptPreferences {
private static let overridesKey = "geminiPromptOverrides"

Copilot AI Mar 5, 2026


The UserDefaults key "geminiPromptOverrides" still references the old Gemini-specific naming after the rename to VideoPromptPreferences. While changing this would require a migration for existing users, the inconsistency is worth noting. Consider either documenting why the key is intentionally preserved, or adding a migration path.
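If a migration path were chosen, a small one-time copy would suffice. This is a hypothetical sketch: the new key name "videoPromptOverrides" and the type name below are assumptions, not code from the PR:

```swift
import Foundation

// Hypothetical one-time migration from the Gemini-era UserDefaults key
// to a provider-agnostic one.
enum VideoPromptOverridesMigration {
    static let legacyKey = "geminiPromptOverrides"
    static let newKey = "videoPromptOverrides"  // assumed new key name

    static func migrateIfNeeded(in defaults: UserDefaults = .standard) {
        // Skip if already migrated or there is nothing to migrate.
        guard defaults.object(forKey: newKey) == nil,
              let legacyValue = defaults.object(forKey: legacyKey) else { return }
        defaults.set(legacyValue, forKey: newKey)
        defaults.removeObject(forKey: legacyKey)
    }
}
```

Calling this once at startup preserves existing users' overrides while letting the rest of the code use only the new key.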

Comment on lines +126 to 128
func finishFailure(_ message: String, errorCode: String? = nil) {
    testResult = .failure(message)
    onTestComplete?(false)

Copilot AI Mar 5, 2026


finishFailure updates testResult and calls onTestComplete, but it does not set isTesting = false. That is fine for the early-exit paths (the "unsupported provider" and "no API key" cases on lines 149–158), which run before isTesting is ever set to true. Inside the Task, the code currently works only because line 183 sets isTesting = false immediately before the finishFailure call on line 184. That pairing is fragile: any future call to finishFailure after isTesting = true that omits the explicit reset will leave the testing spinner up forever. For consistency and safety, isTesting = false should be set inside finishFailure itself.
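A sketch of the consolidated helper. The class below is a simplified, hypothetical stand-in for TestConnectionView's state (testResult reduced to a string), not the PR's actual view model:

```swift
import Foundation

// Folding the spinner reset into finishFailure means no call site
// can forget it.
final class ConnectionTestState {
    var isTesting = false
    var testResult: String?
    var onTestComplete: ((Bool) -> Void)?

    func finishFailure(_ message: String, errorCode: String? = nil) {
        isTesting = false  // always dismiss the spinner here
        testResult = "failure: \(message)"
        onTestComplete?(false)
    }
}
```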

@vegerot
Contributor Author

vegerot commented Mar 7, 2026

@JerryZLiu Please review. This PR fixes a bug where the wrong field name is sent to the Gemini API.

vegerot and others added 2 commits March 6, 2026 16:12
- Extract shared prompt templates into LLMPromptTemplates (GeminiPromptPreferences.swift)
- Add VideoPromptPreferences/VideoPromptOverrides/VideoPromptSections types,
  replacing GeminiPromptPreferences/GeminiPromptOverrides/GeminiPromptSections
- Centralize transcript JSON decoding and observation conversion in
  LLMTranscriptUtilities (TimeParsing.swift) for reuse across providers
- Refactor GeminiDirectProvider to use LLMPromptTemplates and LLMTranscriptUtilities
- Refactor TestConnectionView to accept a provider parameter with
  finishFailure/finishSuccess helpers for clean multi-provider support
- Fix OnboardingLLMSelectionView card-width calculation to be dynamic
  based on card count rather than hard-coded divisor of 3
- Update SettingsProvidersTabView and ProvidersSettingsViewModel to use
  new VideoPrompt* types

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JerryZLiu
Owner

I'm a bit confused about this PR. It looks much larger in scope than the title suggests.

Re: incorrect field name, Google supports the OpenAI-compatible responseSchema, https://ai.google.dev/api/generate-content.

@vegerot
Contributor Author

vegerot commented Mar 9, 2026

@JerryZLiu

I'm a bit confused about this PR. It looks much larger in scope than the title suggests.

Sorry, this is a stacked PR. The specific commit is in dd71b08

Google supports the OpenAI-compatible responseSchema

Interesting. I didn't see that documented on https://ai.google.dev/gemini-api/docs/structured-output . Could responseSchema be deprecated or something?
