feat: use structured output to improve responses#227
feat: use structured output to improve responses#227vegerot wants to merge 2 commits intoJerryZLiu:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors AI/LLM-related code to use structured output (JSON schemas) for Gemini API responses, and consolidates previously duplicated prompt templates, parsing helpers, and validation logic into shared utilities.
Changes:
- Introduces
LLMSchema.swiftwith JSON schema definitions for structured LLM output, andLLMPromptTemplates/LLMTranscriptUtilities/LLMTimelineCardValidationshared enums inTimeParsing.swiftandGeminiPromptPreferences.swift - Renames
GeminiPromptOverrides/GeminiPromptPreferences/GeminiPromptSectionsto provider-agnosticVideoPromptOverrides/VideoPromptPreferences/VideoPromptSections - Generalizes
TestConnectionViewto accept aproviderparameter instead of being Gemini-only, and fixes card layout calculations inOnboardingLLMSelectionView
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
LLMSchema.swift |
New file defining JSON schemas for structured LLM responses |
GeminiPromptPreferences.swift |
Renames Gemini-specific types to generic names; adds shared LLMPromptTemplates enum |
GeminiDirectProvider.swift |
Replaces inline prompts and parsing with shared utilities; switches to structured output via responseJsonSchema |
TimeParsing.swift |
Adds shared LLMVideoTimestampUtilities, LLMTimelineCardValidation, and LLMTranscriptUtilities |
TestConnectionView.swift |
Adds provider parameter to support multiple providers |
OnboardingLLMSelectionView.swift |
Fixes card width calculation to use dynamic provider count |
SettingsProvidersTabView.swift |
Passes .gemini provider to TestConnectionView |
ProvidersSettingsViewModel.swift |
Updates references from GeminiPromptPreferences to VideoPromptPreferences |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let transcriptionSchemaObject = try! JSONSerialization.jsonObject( | ||
| with: Data(LLMSchema.screenRecordingTranscriptionSchema.utf8)) |
There was a problem hiding this comment.
Using try! here will crash the app if LLMSchema.screenRecordingTranscriptionSchema ever contains invalid JSON. Since the schema is a string literal that could be malformed during development or future edits, this should use try with proper error handling instead of a force-try.
| let transcriptionSchemaObject = try! JSONSerialization.jsonObject( | |
| with: Data(LLMSchema.screenRecordingTranscriptionSchema.utf8)) | |
| guard let schemaData = LLMSchema.screenRecordingTranscriptionSchema.data(using: .utf8) else { | |
| throw NSError( | |
| domain: "GeminiError", | |
| code: 7, | |
| userInfo: [NSLocalizedDescriptionKey: "Invalid UTF-8 in transcription schema"] | |
| ) | |
| } | |
| let transcriptionSchemaObject: Any | |
| do { | |
| transcriptionSchemaObject = try JSONSerialization.jsonObject(with: schemaData, options: []) | |
| } catch { | |
| throw NSError( | |
| domain: "GeminiError", | |
| code: 8, | |
| userInfo: [ | |
| NSLocalizedDescriptionKey: "Failed to parse transcription schema JSON", | |
| NSUnderlyingErrorKey: error, | |
| ] | |
| ) | |
| } |
| let activityCardsSchemaObject = try? JSONSerialization.jsonObject( | ||
| with: Data(LLMSchema.activityCardsSchema.utf8)) | ||
| let generationConfig: [String: Any] = [ | ||
| "temperature": 0.7, | ||
| "maxOutputTokens": 8192, | ||
| "responseMimeType": "application/json", | ||
| "responseJsonSchema": activityCardsSchemaObject, | ||
| ] |
There was a problem hiding this comment.
activityCardsSchemaObject is Optional because try? is used, so it will be nil if JSON parsing fails. Inserting an optional value into the generationConfig dictionary means responseJsonSchema will be set to nil, which will likely be serialized as NSNull or cause the structured output to be silently skipped. The try! pattern used on line 998 (however inadvisable) at least exposes the failure; here it silently produces broken config. Consider using try with error propagation or at minimum force-unwrapping with a guard.
| "mediaResolution": "MEDIA_RESOLUTION_HIGH", | ||
| "responseMimeType": "application/json", | ||
| "responseSchema": transcriptionSchema, | ||
| "responseJsonSchema": transcriptionSchemaObject, |
There was a problem hiding this comment.
The Gemini API uses "responseSchema" (not "responseJsonSchema") as the key for the structured output schema in its generationConfig. The previous code used "responseSchema" correctly. Changing it to "responseJsonSchema" will cause the API to ignore the schema and return unstructured output, breaking the downstream JSON parsing.
| "responseJsonSchema": transcriptionSchemaObject, | |
| "responseSchema": transcriptionSchemaObject, |
|
|
||
| enum GeminiPromptPreferences { | ||
| enum VideoPromptPreferences { | ||
| private static let overridesKey = "geminiPromptOverrides" |
There was a problem hiding this comment.
The UserDefaults key "geminiPromptOverrides" still references the old Gemini-specific naming after the rename to VideoPromptPreferences. While changing this would require a migration for existing users, the inconsistency is worth noting. Consider either documenting why the key is intentionally preserved, or adding a migration path.
| func finishFailure(_ message: String, errorCode: String? = nil) { | ||
| testResult = .failure(message) | ||
| onTestComplete?(false) |
There was a problem hiding this comment.
finishFailure updates testResult and calls onTestComplete, but it does not set isTesting = false. When finishFailure is called before isTesting is set to true (e.g., for the "unsupported provider" or "no API key" early-exit cases on lines 149–158), this is fine. However, when it's called inside the Task after isTesting = true has been set (line 184), isTesting is correctly set to false on line 183 before finishFailure is called. But if finishFailure is ever called after isTesting is set to true without the explicit isTesting = false on the preceding line, the testing spinner will never dismiss. The current code at line 183–184 sets isTesting = false separately before calling finishFailure, which is fragile. For consistency and safety, isTesting = false should be included inside finishFailure.
|
@JerryZLiu Please review. This PR fixes a bug where you're using the wrong field name to the Gemini API |
- Extract shared prompt templates into LLMPromptTemplates (GeminiPromptPreferences.swift) - Add VideoPromptPreferences/VideoPromptOverrides/VideoPromptSections types, replacing GeminiPromptPreferences/GeminiPromptOverrides/GeminiPromptSections - Centralize transcript JSON decoding and observation conversion in LLMTranscriptUtilities (TimeParsing.swift) for reuse across providers - Refactor GeminiDirectProvider to use LLMPromptTemplates and LLMTranscriptUtilities - Refactor TestConnectionView to accept a provider parameter with finishFailure/finishSuccess helpers for clean multi-provider support - Fix OnboardingLLMSelectionView card-width calculation to be dynamic based on card count rather than hard-coded divisor of 3 - Update SettingsProvidersTabView and ProvidersSettingsViewModel to use new VideoPrompt* types Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
I'm a bit confused about this PR. It looks much larger in scope than the title suggests. Re: incorrect field name, Google supports the OpenAI-compatible responseSchema, https://ai.google.dev/api/generate-content. |
Sorry, this is a stacked PR. The specific commit is in dd71b08
Interesting. I didn't see that documented on https://ai.google.dev/gemini-api/docs/structured-output . Could |
Stack created with Sapling. Best reviewed with ReviewStack.