Description
Vague requirements
- P1: Allow the model to render custom components that it couldn't render before, i.e. components beyond the existing diffs, multiple-choice prompts, etc.
- P1: Have the UI take a persistent, full-screen, "pinned" approach, where longer-running tasks and structures persist even as more granular tasks come and go.
- P2: Low impact on the existing Gemini CLI codebase, so our UI "extension" stays easy to maintain even as the Gemini CLI codebase changes.
Implementation ideas
Low impact option: Create an MCP server that exposes Gen UI as a tool for Gemini CLI
We could implement an MCP server in Dart which exposes methods to drive a custom UI layer. It would run locally and communicate via stdin/stdout or via HTTP.
Advantages:
- Can combine chat + UI, as we don't hijack regular output
- Can drive custom UI components with arbitrary schemas
- Very modular, doesn't require any changes to Gemini CLI codebase, uses official extension point
Concerns
- Might be hard to persuade Gemini CLI to use the Gen UI output tool consistently
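A minimal sketch of what the tool surface of such a server could look like. It is written in TypeScript for illustration only (the idea above is a Dart server) and skips the transport entirely. The `tools/call` method and the `content` result shape follow the MCP specification; the tool name `render_ui`, the `UiComponent` shape, and the `handleRequest` function are hypothetical.

```typescript
// Hypothetical payload: a component name plus arbitrary, schema-free props.
interface UiComponent {
  component: string;
  props: Record<string, unknown>;
}

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: { content: { type: "text"; text: string }[]; isError?: boolean };
  error?: { code: number; message: string };
}

// Stand-in for the real Gen UI layer: components the model asked us to show.
const pinnedComponents: UiComponent[] = [];

function handleRequest(req: JsonRpcRequest): JsonRpcResponse {
  if (req.method !== "tools/call") {
    // JSON-RPC 2.0 "Method not found".
    return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
  }
  if (req.params?.name !== "render_ui") {
    // JSON-RPC 2.0 "Invalid params", used here for an unknown tool name.
    return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: "Unknown tool" } };
  }
  const args = req.params?.arguments ?? {};
  const component = args["component"];
  if (typeof component !== "string") {
    // Tool-level failures are reported in the result so the model can react to them.
    return {
      jsonrpc: "2.0",
      id: req.id,
      result: { content: [{ type: "text", text: "Missing 'component' name" }], isError: true },
    };
  }
  const props = (args["props"] ?? {}) as Record<string, unknown>;
  pinnedComponents.push({ component, props });
  return {
    jsonrpc: "2.0",
    id: req.id,
    result: { content: [{ type: "text", text: `Rendered ${component}` }] },
  };
}
```

Because the tool only returns a short text acknowledgement, regular chat output is left untouched, which matches the first advantage listed above.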
Higher performance option: Update system prompt to replace text output with UI output
We could figure out how Gemini CLI drives its current UI layer, and try to replace it with some Gen UI approach e.g. structured output.
Notes from using Gemini CLI to explore its own codebase:
After reviewing packages/cli/src/ui/commands/types.ts and packages/cli/src/ui/types.ts, I
have a much clearer understanding of the architecture.
Here's a breakdown of how the UI layer connects to the data layer in the Gemini CLI:
The Core Bridge: CommandContext
The primary interface connecting the data/logic layer to the UI layer is CommandContext.
This object is passed to every slash command's action handler. It provides a well-defined
API for commands to interact with the UI without being tightly coupled to the specific
rendering implementation (which is currently Ink-based React).
The ui property within CommandContext is the most critical part. It exposes a set of
functions that a command can call to affect the user interface:
- addItem(item: HistoryItemWithoutId): This is the main function for adding content to the
  display. HistoryItemWithoutId is a union type that can represent various kinds of
  content, such as user input, Gemini's responses, tool outputs, error messages, and more.
- setPendingItem(item: HistoryItemWithoutId | null): Used to display a temporary item
  while a long-running operation is in progress.
- clear(): Clears the entire UI.
- setDebugMessage(message: string): Displays a debug message, likely in a footer or status
  bar.
- loadHistory(history: HistoryItemWithoutId[]): Replaces the current session history with
  a new one.
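As a rough illustration of the surface described above, here is a sketch of the ui functions together with an alternative, non-Ink backend. The five method names come from the notes above; the simplified `HistoryItemWithoutId`, the `CommandUi` and `InMemoryUi` names, and `statusCommand` are assumptions made for the example, and the real signatures in packages/cli/src/ui/commands/types.ts may differ.

```typescript
// Simplified stand-in for the real union type in packages/cli/src/ui/types.ts.
type HistoryItemWithoutId = { type: string; text?: string };

// The ui functions as listed above; the real interface may have more members.
interface CommandUi {
  addItem(item: HistoryItemWithoutId): void;
  setPendingItem(item: HistoryItemWithoutId | null): void;
  clear(): void;
  setDebugMessage(message: string): void;
  loadHistory(history: HistoryItemWithoutId[]): void;
}

// Hypothetical backend that records state in memory instead of driving Ink.
class InMemoryUi implements CommandUi {
  history: HistoryItemWithoutId[] = [];
  pending: HistoryItemWithoutId | null = null;
  debugMessage = "";

  addItem(item: HistoryItemWithoutId): void {
    this.history.push(item);
  }
  setPendingItem(item: HistoryItemWithoutId | null): void {
    this.pending = item;
  }
  clear(): void {
    this.history = [];
    this.pending = null;
  }
  setDebugMessage(message: string): void {
    this.debugMessage = message;
  }
  loadHistory(history: HistoryItemWithoutId[]): void {
    // Copy so later mutations of the caller's array do not leak into the UI state.
    this.history = [...history];
  }
}

// A command only talks to the interface, so it works with any backend.
function statusCommand(ui: CommandUi): void {
  ui.setPendingItem({ type: "info", text: "Checking status..." });
  ui.addItem({ type: "info", text: "All tasks complete." });
  ui.setPendingItem(null);
}
```

The point of the sketch is the decoupling: `statusCommand` never references a renderer, so swapping `InMemoryUi` for a Gen UI backend would not require touching the command.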
The Communication Protocol: HistoryItem
The HistoryItem type (and its HistoryItemWithoutId variant) defines the "protocol" for the
data that is sent to the UI for rendering. It's a discriminated union based on the type
property. This allows the UI to handle different types of content appropriately. Some of
the key HistoryItem types are:
- user: Represents a prompt entered by the user.
- gemini or gemini_content: Represents a response from the Gemini model.
- tool_group: This is particularly important. It represents the output of one or more
tools. It contains an array of IndividualToolCallDisplay objects, each with details
about the tool call's status (pending, executing, success, error), its results
(resultDisplay), and any confirmation details. This is how the UI can render rich tool
outputs, including live updates for streaming commands.
- info, error: For displaying informational or error messages.
- quit: A special type to signal the application to exit.
- confirm_action, confirm_shell_commands: These are not HistoryItems themselves but are
returned by command actions to signal to the UI that a confirmation prompt needs to be
displayed. The UI then handles the user's response and re-invokes the original command
with the confirmation result.
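To make the discriminated-union idea concrete, here is a small sketch of a reduced item type and a renderer that switches on the type property. The type tags and the status values come from the notes above; the field names (`text`, `tools`, `name`), the `HistoryItemSketch` type, and the text-only `renderItem` function are assumptions for illustration and do not mirror the real definitions.

```typescript
type ToolCallStatus = "pending" | "executing" | "success" | "error";

// Reduced, hypothetical version of IndividualToolCallDisplay.
interface ToolCallDisplaySketch {
  name: string;
  status: ToolCallStatus;
  resultDisplay?: string;
}

// Each variant is identified by its `type` tag.
type HistoryItemSketch =
  | { type: "user"; text: string }
  | { type: "gemini"; text: string }
  | { type: "tool_group"; tools: ToolCallDisplaySketch[] }
  | { type: "info"; text: string }
  | { type: "error"; text: string };

function renderItem(item: HistoryItemSketch): string {
  switch (item.type) {
    case "user":
      return `> ${item.text}`;
    case "gemini":
      return item.text;
    case "tool_group":
      // One line per tool call, with the result appended when present.
      return item.tools
        .map((t) => `[${t.status}] ${t.name}` + (t.resultDisplay ? `: ${t.resultDisplay}` : ""))
        .join("\n");
    case "info":
      return `info: ${item.text}`;
    case "error":
      return `error: ${item.text}`;
    default: {
      // Compile-time exhaustiveness check: adding a variant without a case fails here.
      const unreachable: never = item;
      return unreachable;
    }
  }
}
```

A replacement renderer would follow the same pattern, with each case producing a Gen UI component instead of a string.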
Key Interfaces and Files
- packages/cli/src/ui/commands/types.ts: This file is the most important one. It defines
  the CommandContext interface and the various SlashCommandActionReturn types that a
  command's action can return to signal different UI interactions (like quitting, opening a
  dialog, or submitting a prompt).
- packages/cli/src/ui/types.ts: This file defines the HistoryItem union type and all its
  variants, which represent the different kinds of content that can be displayed in the UI.
  It also defines IndividualToolCallDisplay, which is crucial for rendering tool outputs.
- packages/cli/src/ui/hooks/useGeminiStream.ts: This React hook is a good example of the
  consumer side of this protocol. It receives HistoryItem objects and uses them to update
  the React state, which in turn causes the UI to re-render.
How to Integrate a New Renderer
To replace the existing Ink-based renderer, you would need to create a new UI layer that
consumes the CommandContext and renders the HistoryItem objects. The best place to
integrate would be to create a new implementation of the main application component
(currently likely in packages/cli/src/ui/App.tsx or a similar file) that:
- Provides an implementation of the ui object in CommandContext: Your new renderer
  would need to provide functions for addItem, clear, etc., that update your new UI
  instead of the React/Ink components.
- Renders HistoryItem objects: Your new UI would need to have components or rendering
  logic for each of the different HistoryItem types. For example, it would need to know
  how to display a tool_group with its individual tool calls, how to render Markdown from
  a gemini_content item, and how to display error and info messages.
- Handles confirmation prompts: When a command returns a confirm_shell_commands or
  confirm_action result, your UI would need to display the appropriate prompt to the user
  and then re-execute the command based on their response.
By adhering to the CommandContext and HistoryItem interfaces, you can effectively decouple
your new renderer from the core business logic of the Gemini CLI, just as the current
Ink-based UI is.
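The confirmation round trip described above can be sketched as follows. Only the confirm_action tag comes from the notes; the `ActionResult` shape, the `confirmed` parameter, and the `runWithConfirmation` and `deleteCommand` names are hypothetical, and the real CLI may pass the confirmation result back to the command differently.

```typescript
// Hypothetical, reduced set of results a command action can return.
type ActionResult =
  | { type: "message"; content: string }
  | { type: "confirm_action"; prompt: string };

// Hypothetical command shape: `confirmed` is true when re-invoked after a prompt.
type CommandAction = (args: string, confirmed: boolean) => ActionResult;

function runWithConfirmation(
  action: CommandAction,
  args: string,
  askUser: (prompt: string) => boolean,
): string {
  const first = action(args, false);
  if (first.type !== "confirm_action") {
    return first.content;
  }
  // The renderer owns the prompt; the command never talks to the user directly.
  if (!askUser(first.prompt)) {
    return "Cancelled.";
  }
  // Re-invoke the original command, this time carrying the confirmation.
  const second = action(args, true);
  return second.type === "message" ? second.content : "Command asked for confirmation twice.";
}

// Example command that refuses to act until it has been confirmed.
const deleteCommand: CommandAction = (args, confirmed) =>
  confirmed
    ? { type: "message", content: `Deleted ${args}` }
    : { type: "confirm_action", prompt: `Delete ${args}?` };
```

A new renderer would supply its own `askUser`, for example a modal dialog, without the command needing to know how the prompt is shown.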
> How are text output diffs generated by the LLM? Is it through tool calls? Where is it?
✦ You've hit on a key feature. Based on my analysis, text diffs are not generated directly by
the LLM's text output. Instead, they are the result of specific tool calls that modify the
file system.
Here's the workflow and where to find the relevant code:
- The LLM Decides to Change a File: The model determines that the best way to fulfill the
  user's request is to modify a file. It does this by issuing a tool call for either
  replace or write_file. The arguments for this tool call include the file path and the
  complete new content for the file (or the section to be replaced).
- The Tool Executes: The CLI's tool scheduler receives this request and executes the
  appropriate tool. The tool performs the file modification in memory.
- The Tool Returns a Special Display Object: This is the crucial step. Instead of just
  returning a simple "success" message, the replace and write_file tools are designed to
  return a special object that includes the original content, the new content, and the
  file name.
- The UI Renders the Diff: The UI layer receives this special object and uses a dedicated
  component to calculate and render the diff.
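The last two steps can be sketched as a type guard plus a small line-based diff. The fileName, originalContent, and newContent fields are the ones named in this transcript; `FileDiffDisplay`, `isFileDiff`, and `lineDiff` are hypothetical names, and the longest-common-subsequence diff below is a simple stand-in, not the algorithm the CLI's diff component actually uses.

```typescript
// Shape of the special display object, as described in the workflow above.
interface FileDiffDisplay {
  fileName: string;
  originalContent: string;
  newContent: string;
}

// Distinguishes a diff object from a plain string result.
function isFileDiff(display: unknown): display is FileDiffDisplay {
  return (
    typeof display === "object" &&
    display !== null &&
    "fileName" in display &&
    "originalContent" in display &&
    "newContent" in display
  );
}

// Line diff via a longest-common-subsequence table.
// Prefixes: "  " unchanged, "- " removed, "+ " added.
function lineDiff(oldText: string, newText: string): string[] {
  const a = oldText.split("\n");
  const b = newText.split("\n");
  // lcs[i][j] = length of the LCS of a[i..] and b[j..].
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(`  ${a[i]}`);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push(`- ${a[i]}`);
      i++;
    } else {
      out.push(`+ ${b[j]}`);
      j++;
    }
  }
  // Whatever remains on either side is a pure removal or addition.
  while (i < a.length) out.push(`- ${a[i++]}`);
  while (j < b.length) out.push(`+ ${b[j++]}`);
  return out;
}
```

For example, `lineDiff("a\nb\nc", "a\nx\nc")` returns `["  a", "- b", "+ x", "  c"]`. The model only supplies the new content; everything in this sketch runs on the client.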
Key Code Locations:
- packages/cli/src/zed-integration/zedIntegration.ts: This file shows how the tool
  result is packaged for display. When a tool like replace or write_file succeeds, it
  returns a returnDisplay object with a specific structure.

      // In packages/cli/src/zed-integration/zedIntegration.ts
      // (Inside the `executeTool` method or similar logic)

      if ('fileName' in toolResult.returnDisplay) {
        // This is a diff object
        const diff_params = {
          path: toolResult.returnDisplay.fileName,
          oldText: toolResult.returnDisplay.originalContent,
          newText: toolResult.returnDisplay.newContent,
        };
        // ...sends this to the UI for rendering
      }

- packages/cli/src/ui/hooks/useReactToolScheduler.ts: This hook is responsible for
  managing the state of tool calls for the UI. The mapToDisplay function within this file
  transforms the raw tool call information into a format the UI components can render.
  It looks for the special returnDisplay object and passes it along.
- packages/cli/src/ui/types.ts: This file defines the data structures. The
  ToolResultDisplay type is a union that can include the object containing fileName,
  originalContent, and newContent, which signals to the UI that a diff should be
  rendered.
In summary, the LLM doesn't generate the diff itself. It generates the intent to change a
file via a tool call, and the client-side application uses the tool's output to create and
display the diff to the user. This is a much more robust and reliable way to handle file
modifications.