Add gulf genkit validator #314

jacobsimionato · 2025-09-17T07:03:17Z

Summary of Changes

This pull request introduces a new Genkit evaluation project aimed at assessing the proficiency of various large language models in generating structured UI components according to a predefined JSON schema. It establishes a comprehensive framework for testing LLM outputs against both standard JSON schema validation and custom, more intricate validation rules, thereby ensuring the generated UI structures adhere to complex protocol constraints. The project facilitates a systematic comparison of different LLMs' capabilities in this domain.

Highlights

New Genkit Evaluation Project: A new directory packages/spikes/gulf_genkit_eval has been added, containing a Genkit-based project for evaluating the structured output capabilities of various large language models (LLMs).
Comprehensive UI Schemas: The project includes detailed JSON schemas for StreamHeader, ComponentUpdate, DataModelUpdate, and BeginRendering messages, defining a robust protocol for streaming UI components.
Custom Schema Validation: A custom TypeScript validator (src/validator.ts) has been implemented to enforce complex validation rules on generated UI components, ensuring uniqueness of IDs, reference integrity, and component-specific property requirements that cannot be easily expressed in standard JSON schema.
Multi-LLM Evaluation Framework: The core application (src/index.ts) sets up a flow to test different LLMs (OpenAI, Google AI, Anthropic) against a set of predefined prompts and schemas, providing a structured way to compare their performance in generating valid UI structures.
Documentation and Configuration: The project includes README.md for execution instructions, GEMINI.md detailing validation logic, and standard Genkit/TypeScript configuration files (genkit.conf.js, package.json, tsconfig.json, .gitignore).
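The ID-uniqueness and reference-integrity rules mentioned above are cross-component constraints that are awkward to express in plain JSON Schema. The following is a minimal sketch of such checks, not the code in src/validator.ts. The `UiComponent` shape, the `children` field, and the `checkIdsAndReferences` name are assumptions made for illustration.

```typescript
// Hypothetical component shape; the real schema in component_update.json may differ.
interface UiComponent {
  id: string;
  type: string;
  children?: string[]; // assumed: IDs of child components
}

// Returns human-readable errors; an empty list means both checks passed.
function checkIdsAndReferences(components: UiComponent[]): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();

  // Rule 1: every component ID must be unique.
  for (const c of components) {
    if (seen.has(c.id)) {
      errors.push(`Duplicate component ID '${c.id}'.`);
    }
    seen.add(c.id);
  }

  // Rule 2: every child reference must point at a defined component.
  for (const c of components) {
    for (const childId of c.children ?? []) {
      if (!seen.has(childId)) {
        errors.push(`Component '${c.id}' references unknown child '${childId}'.`);
      }
    }
  }
  return errors;
}
```

Collecting errors into a list, rather than throwing on the first failure, lets an evaluation run report every problem in a model's output at once.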

Changelog

packages/spikes/gulf_genkit_eval/.gitignore
- Added standard ignore rules for Node.js artifacts and Genkit-specific files.
packages/spikes/gulf_genkit_eval/GEMINI.md
- Added detailed documentation outlining the Gemini Schema Validation Logic, covering rules for ComponentUpdate, DataModelUpdate, and BeginRendering messages.
packages/spikes/gulf_genkit_eval/README.md
- Added instructions for running the Genkit flow, including how to execute single tests with specific models and prompts, and how to control output verbosity.
packages/spikes/gulf_genkit_eval/genkit.conf.js
- Added Genkit configuration to integrate Google AI, OpenAI, and Anthropic plugins, set debug logging, and enable tracing and metrics.
packages/spikes/gulf_genkit_eval/lib/index.js
- Added compiled JavaScript file for the Genkit flow, defining the component generation flow and the main evaluation function. (Note: A review comment suggests removing this file from version control.)
packages/spikes/gulf_genkit_eval/package.json
- Added package definition including dependencies for Genkit, Google AI, OpenAI, Anthropic, and development dependencies for tsx and typescript.
packages/spikes/gulf_genkit_eval/src/begin_rendering.json
- Added JSON schema for the 'BeginRendering' message, specifying a required 'root' property and optional 'styles'.
packages/spikes/gulf_genkit_eval/src/component_update.json
- Added a comprehensive JSON schema for the 'ComponentUpdate' message, detailing various UI component types (e.g., Heading, Text, Row, Card, Button) and their specific properties and requirements.
packages/spikes/gulf_genkit_eval/src/data_model_update.json
- Added JSON schema for the 'DataModelUpdate' message, defining 'path' (optional) and 'contents' (required) properties for updating the data model.
packages/spikes/gulf_genkit_eval/src/index.ts
- Added TypeScript source for the main Genkit evaluation script. This file defines the componentGeneratorFlow, loads schemas, and orchestrates the testing process across multiple LLMs and prompts, incorporating custom validation logic.
packages/spikes/gulf_genkit_eval/src/models.ts
- Added TypeScript file defining the ModelConfiguration interface and an array modelsToTest, listing various LLM models from OpenAI, Google AI, and Anthropic with their respective configurations for evaluation.
packages/spikes/gulf_genkit_eval/src/prompts.ts
- Added TypeScript file defining the TestPrompt interface and an array prompts, containing various prompts for UI component generation scenarios and their associated schemas.
packages/spikes/gulf_genkit_eval/src/schema.json
- Added a master JSON schema for the A2A Streaming Protocol Message, serving as a central definition for StreamHeader, ComponentUpdate, DataModelUpdate, and UIRoot message types.
packages/spikes/gulf_genkit_eval/src/schema_brief.txt
- Added a text document outlining the requirements and core concepts for the JSON schema, including data binding, styles, actions, and a catalog of UI elements.
packages/spikes/gulf_genkit_eval/src/stream_header.json
- Added JSON schema for the 'StreamHeader' message, defining a required 'version' property.
packages/spikes/gulf_genkit_eval/src/validator.ts
- Added TypeScript file implementing the validateSchema function and several helper functions to perform custom validation rules on generated UI components, checking for uniqueness, reference integrity, and component-specific property adherence.
packages/spikes/gulf_genkit_eval/tsconfig.json
- Added TypeScript configuration file for the project.

Activity

jacobsimionato requested a summary of the pull request.
gemini-code-assist[bot] provided a critical review comment regarding the lib/index.js file being out of sync with its TypeScript source and suggested removing it and adding lib/ to .gitignore.
gemini-code-assist[bot] raised a high-priority concern about the unsafe handling of environment variables for API keys, recommending validation at startup.
gemini-code-assist[bot] noted a medium-priority issue with z.any() being used for model, config, and schema in componentGeneratorFlow's inputSchema, suggesting more specific Zod schemas for better type safety.
gemini-code-assist[bot] pointed out a medium-priority discrepancy in the README.md example command, where the prompt name generateDogUIs was incorrect and should be dogBreedGenerator.
gemini-code-assist[bot] suggested a medium-priority change for consistency, recommending renaming min_value and max_value to minValue and maxValue in the Slider component schema.
gemini-code-assist[bot] highlighted a medium-priority improvement for the validateComponent function, advising the addition of a default case to its switch statement to catch unknown component types.

jacobsimionato · 2025-09-17T07:03:43Z

/gemini summarize

gemini-code-assist

Code Review

This pull request introduces a new spike package, gulf_genkit_eval, to evaluate various large language models for generating UI components from a schema. The implementation includes a Genkit flow, JSON schemas for UI components, a set of prompts and models for testing, and a custom validator to check the generated output. The overall structure is well-organized for an experimental package. My review includes several suggestions to improve correctness, maintainability, and robustness. I've identified a critical issue with a stale compiled file being included in the commit, which should be addressed. Other feedback points to improving environment variable handling, fixing an error in the README, making the validator more robust, and enhancing type safety and consistency. Also, please note that the repository's style guide mentions a Pre-Review Checklist from the PR template, which seems to be missing from this pull request's description.[^1]

gemini-code-assist · 2025-09-17T07:05:00Z

packages/spikes/gulf_genkit_eval/lib/index.js

+"use strict";
+var __createBinding = (this && this.__createBinding) || (Object.create ? (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    var desc = Object.getOwnPropertyDescriptor(m, k);
+    if (!desc || ("get" in desc ? !m.__esModule : desc.writable || desc.configurable)) {
+      desc = { enumerable: true, get: function() { return m[k]; } };
+    }
+    Object.defineProperty(o, k2, desc);
+}) : (function(o, m, k, k2) {
+    if (k2 === undefined) k2 = k;
+    o[k2] = m[k];
+}));
+var __setModuleDefault = (this && this.__setModuleDefault) || (Object.create ? (function(o, v) {
+    Object.defineProperty(o, "default", { enumerable: true, value: v });
+}) : function(o, v) {
+    o["default"] = v;
+});
+var __importStar = (this && this.__importStar) || (function () {
+    var ownKeys = function(o) {
+        ownKeys = Object.getOwnPropertyNames || function (o) {
+            var ar = [];
+            for (var k in o) if (Object.prototype.hasOwnProperty.call(o, k)) ar[ar.length] = k;
+            return ar;
+        };
+        return ownKeys(o);
+    };
+    return function (mod) {
+        if (mod && mod.__esModule) return mod;
+        var result = {};
+        if (mod != null) for (var k = ownKeys(mod), i = 0; i < k.length; i++) if (k[i] !== "default") __createBinding(result, mod, k[i]);
+        __setModuleDefault(result, mod);
+        return result;
+    };
+})();
+Object.defineProperty(exports, "__esModule", { value: true });
+exports.componentGeneratorFlow = void 0;
+const google_genai_1 = require("@genkit-ai/google-genai");
+const genkit_1 = require("genkit");
+const fs = __importStar(require("fs"));
+const path = __importStar(require("path"));
+const openai_1 = require("@genkit-ai/compat-oai/openai");
+const genkitx_anthropic_1 = require("genkitx-anthropic");
+// Read the schema file
+const schemaString = fs.readFileSync(path.join(__dirname, 'schema.json'), 'utf-8');
+const schema = JSON.parse(schemaString);
+const ai = (0, genkit_1.genkit)({
+    plugins: [(0, google_genai_1.googleAI)({ apiKey: process.env.GEMINI_API_KEY }), (0, openai_1.openAI)(), (0, genkitx_anthropic_1.anthropic)({ apiKey: process.env.ANTHROPIC_API_KEY }),],
+});
+// Define a UI component generator flow
+exports.componentGeneratorFlow = ai.defineFlow({
+    name: 'componentGeneratorFlow',
+    inputSchema: genkit_1.z.object({ prompt: genkit_1.z.string(), model: genkit_1.z.any() }),
+    outputSchema: genkit_1.z.any(),
+}, async ({ prompt, model }) => {
+    // Generate structured component data using the schema from the file
+    const { output } = await ai.generate({
+        prompt,
+        model,
+        output: { jsonSchema: schema },
+        // config: {
+        //     thinkingConfig: { thinkingBudget: 0 }
+        // },
+    });
+    if (!output)
+        throw new Error('Failed to generate component');
+    return output;
+});
+// Run the flow
+async function main() {
+    const models = [
+        openai_1.openAI.model('gpt-5-mini'),
+        openai_1.openAI.model('gpt-5'),
+        openai_1.openAI.model('gpt-5-nano'),
+        google_genai_1.googleAI.model('gemini-2.5-flash'),
+        google_genai_1.googleAI.model('gemini-2.5-flash-lite'),
+        genkitx_anthropic_1.claude4Sonnet,
+        genkitx_anthropic_1.claude35Haiku,
+    ];
+    const prompt = `Generate a JSON conforming to the schema to describe the following UI:
+
+A root node has already been created with ID "root". You need to create a ComponentUpdate message now.
+
+A vertical list with:
+Dog breed information
+Dog generator
+
+The dog breed information is a card, which contains a title “Famous Dog breeds”, a header image, and a carousel of different dog breeds. The carousel information should be in the data model at /carousel.
+
+The dog generator is another card which is a form that generates a fictional dog breed with a description
+- Title
+- Description text explaining what it is
+- Dog breed name (text input)
+- Number of legs (number input)
+- Skills (checkboxes)
+- Button called “Generate” which takes the data above and generates a new dog description
+- A divider
+- A section which shows the generated content
+`;
+    for (const model of models) {
+        console.log(`Generating component with model: ${model.name}`);
+        const component = await (0, exports.componentGeneratorFlow)({
+            prompt,
+            model,
+        });
+        console.log(JSON.stringify(component, null, 2));
+    }
+}
+main().catch(console.error);


The compiled JavaScript file lib/index.js is out of sync with its TypeScript source src/index.ts. For example, lib/index.js reads a single schema.json file, while src/index.ts dynamically reads different schema files based on the prompt. This will lead to incorrect behavior. Compiled files should generally not be committed to version control. They should be generated as part of a build process. Please remove lib/index.js from the repository and add lib/ to your .gitignore file.

gemini-code-assist · 2025-09-17T07:05:00Z

packages/spikes/gulf_genkit_eval/src/index.ts

+import { validateSchema } from './validator';
+
+const ai = genkit({
+    plugins: [googleAI({ apiKey: process.env.GEMINI_API_KEY! }), openAI(), anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }),


The code uses a non-null assertion (!) for process.env.GEMINI_API_KEY and implicitly assumes process.env.ANTHROPIC_API_KEY and process.env.OPENAI_API_KEY are present. This can lead to runtime errors if these environment variables are not set. It's safer to validate all required environment variables at application startup and exit gracefully with a clear error message if any are missing.
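One way to do this is sketched below. The helper names `findMissingEnv` and `describeMissingEnv` are hypothetical and not part of the PR; the three variable names are the ones mentioned in this comment. The environment is passed in as a plain record so the check can be tested without touching the real process environment.

```typescript
// Names taken from the review comment above.
const REQUIRED_KEYS = ['GEMINI_API_KEY', 'OPENAI_API_KEY', 'ANTHROPIC_API_KEY'] as const;

// Returns the names of required variables that are unset or blank.
function findMissingEnv(
  required: readonly string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '');
}

// Builds a single error message, or returns null when every key is set.
function describeMissingEnv(env: Record<string, string | undefined>): string | null {
  const missing = findMissingEnv(REQUIRED_KEYS, env);
  return missing.length === 0
    ? null
    : `Missing required environment variables: ${missing.join(', ')}`;
}
```

At startup, the script could call `describeMissingEnv(process.env)` before constructing the Genkit instance and, if the result is non-null, print it and exit with a non-zero status.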

gemini-code-assist · 2025-09-17T07:05:01Z

packages/spikes/gulf_genkit_eval/src/index.ts

+export const componentGeneratorFlow = ai.defineFlow(
+    {
+        name: 'componentGeneratorFlow',
+        inputSchema: z.object({ prompt: z.string(), model: z.any(), config: z.any().optional(), schema: z.any() }),


The inputSchema for componentGeneratorFlow uses z.any() for model, config, and schema. While convenient for a spike, this sacrifices type safety. Consider defining more specific Zod schemas for these inputs to improve code robustness and clarity. For example, schema could be validated as z.record(z.any()) or a more detailed schema definition that matches the structure of your JSON schemas.

gemini-code-assist · 2025-09-17T07:05:01Z

packages/spikes/gulf_genkit_eval/README.md

+To run the test with the `gpt-5-nano (reasoning: minimal)` model and the `generateDogUIs` prompt, use the following command:
+
+```bash
+npx tsx src/index.ts --model='gpt-5-nano (reasoning: minimal)' --prompt=generateDogUIs


The example command in the README uses --prompt=generateDogUIs, but the actual prompt name defined in src/prompts.ts is dogBreedGenerator. This will cause the command to fail. Please update the README to use the correct prompt name.

Suggested change

-npx tsx src/index.ts --model='gpt-5-nano (reasoning: minimal)' --prompt=generateDogUIs
+npx tsx src/index.ts --model='gpt-5-nano (reasoning: minimal)' --prompt=dogBreedGenerator

gemini-code-assist · 2025-09-17T07:05:01Z

packages/spikes/gulf_genkit_eval/src/component_update.json

+          "min_value": {
+            "type": "number",
+            "default": 0,
+            "description": "For 'Slider' type: The minimum value."
+          },
+          "max_value": {
+            "type": "number",
+            "default": 100,
+            "description": "For 'Slider' type: The maximum value."
+          }


The properties min_value and max_value for the Slider component use snake_case. This is inconsistent with the rest of the schema, which predominantly uses camelCase for property names (e.g., tabItems, entryPointChild). For consistency, please consider renaming them to minValue and maxValue.

Suggested change

-          "min_value": {
-            "type": "number",
-            "default": 0,
-            "description": "For 'Slider' type: The minimum value."
-          },
-          "max_value": {
-            "type": "number",
-            "default": 100,
-            "description": "For 'Slider' type: The maximum value."
-          }
+          "minValue": {
+            "type": "number",
+            "default": 0,
+            "description": "For 'Slider' type: The minimum value."
+          },
+          "maxValue": {
+            "type": "number",
+            "default": 100,
+            "description": "For 'Slider' type: The maximum value."
+          }

gemini-code-assist · 2025-09-17T07:05:01Z

packages/spikes/gulf_genkit_eval/src/validator.ts

+            break;
+        case 'Divider':
+            break;
+    }


The switch statement in validateComponent for component.type does not have a default case. If a component with an unknown type is encountered, it will be silently ignored by the validator. It would be better to add a default case to log an error for unexpected component types.

        default:
            errors.push(`Unknown component type '${component.type}' for component ID '${component.id}'.`);
    }
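To show the effect of the default case in isolation, here is a small self-contained sketch. It is not the actual validateComponent from src/validator.ts: the `SketchComponent` shape and the rule for `Text` are assumptions made for illustration, and only the default branch mirrors the suggestion above.

```typescript
// Hypothetical minimal component shape, for illustration only.
interface SketchComponent {
  id: string;
  type: string;
  text?: string;
}

function validateComponent(component: SketchComponent, errors: string[]): void {
  switch (component.type) {
    case 'Text':
      // Assumed rule: a Text component needs a 'text' property.
      if (component.text === undefined) {
        errors.push(`Component '${component.id}' of type 'Text' is missing 'text'.`);
      }
      break;
    case 'Divider':
      // No required properties.
      break;
    default:
      // Without this branch, an unrecognized type would pass silently.
      errors.push(`Unknown component type '${component.type}' for component ID '${component.id}'.`);
  }
}
```

With this branch in place, a model that invents a component type not covered by the switch produces a reported error instead of a silent pass.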

gemini-code-assist · 2025-09-17T07:05:25Z

gspencergoog

Thanks for creating this!

Add gulf genkit validator

61af90c

gemini-code-assist bot reviewed Sep 17, 2025

gspencergoog approved these changes Sep 17, 2025

gspencergoog merged commit 9639e0b into main Sep 17, 2025
2 checks passed

gspencergoog deleted the validator branch September 17, 2025 15:27
