Figure out how to integrate a custom UI layer into Gemini CLI #308

@jacobsimionato

Vague requirements

  • P1: Allow the model to render custom components that it couldn't render before, i.e. components beyond the existing diffs, multiple-choice prompts, etc.
  • P1: Have the UI take a persistent, full-screen, "pinned" approach in which longer-running tasks and structures persist even as more granular tasks come and go.
  • P2: Low impact on the existing Gemini CLI codebase, so that our UI "extension" stays easy to maintain even as the Gemini CLI codebase changes.

Implementation ideas

Low impact option: Create an MCP server that exposes Gen UI as a tool for Gemini CLI

We could implement an MCP server in Dart that exposes methods to drive a custom UI layer. It would run locally and communicate via stdin/stdout or via HTTP.

Advantages:

  • Can combine chat + UI, as we don't hijack regular output
  • Can drive custom UI components with arbitrary schemas
  • Very modular, doesn't require any changes to Gemini CLI codebase, uses official extension point

Concerns:

  • Might be hard to persuade Gemini CLI to use the Gen UI output tool consistently
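
As a rough illustration of the low-impact option, the sketch below shows the two JSON-RPC requests an MCP server would need to answer to expose a UI tool: `tools/list` and `tools/call`. It is written in TypeScript for consistency with the Gemini CLI snippets in this issue, even though the proposal is to implement the server in Dart. The tool name `render_ui`, its input schema, and the `renderComponent` callback are hypothetical, and a real server would presumably use an MCP SDK plus a stdio or HTTP transport rather than this hand-rolled dispatcher.

```typescript
// Minimal JSON-RPC dispatcher for a hypothetical Gen UI tool.
// Only the method names `tools/list` and `tools/call` come from MCP;
// `render_ui` and its schema are illustrative assumptions.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

const RENDER_UI_TOOL = {
  name: "render_ui",
  description: "Render a custom UI component for the user.",
  inputSchema: {
    type: "object",
    properties: {
      component: { type: "string" },
      props: { type: "object" },
    },
    required: ["component"],
  },
};

type RenderFn = (component: string, props: Record<string, unknown>) => void;

function handleRequest(req: JsonRpcRequest, renderComponent: RenderFn): JsonRpcResponse {
  const base = { jsonrpc: "2.0" as const, id: req.id };

  if (req.method === "tools/list") {
    return { ...base, result: { tools: [RENDER_UI_TOOL] } };
  }

  if (req.method === "tools/call") {
    const name = req.params?.name;
    const args = (req.params?.arguments ?? {}) as Record<string, unknown>;
    const component = args.component;
    if (name !== RENDER_UI_TOOL.name || typeof component !== "string") {
      // -32602 is the JSON-RPC "Invalid params" code.
      return { ...base, error: { code: -32602, message: "Invalid params" } };
    }
    renderComponent(component, (args.props ?? {}) as Record<string, unknown>);
    return {
      ...base,
      result: { content: [{ type: "text", text: `Rendered ${component}` }] },
    };
  }

  // -32601 is the JSON-RPC "Method not found" code.
  return { ...base, error: { code: -32601, message: "Method not found" } };
}
```

Because the tool returns a short text result, the model still sees confirmation that the UI was updated, which may help with the consistency concern above.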

Higher performance option: Update system prompt to replace text output with UI output

We could figure out how Gemini CLI drives its current UI layer, and try to replace it with some Gen UI approach e.g. structured output.
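
One way to picture "structured output" here: the system prompt asks the model to emit a JSON object describing a component, and the CLI parses it, falling back to plain text when the output does not match. The envelope shape (`component`, `props`) and the function name `parseModelOutput` are hypothetical, not something Gemini CLI defines.

```typescript
// Hypothetical envelope for model output: either a UI component or plain text.
type UiOutput =
  | { kind: "component"; component: string; props: Record<string, unknown> }
  | { kind: "text"; text: string };

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function parseModelOutput(raw: string): UiOutput {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (isPlainObject(parsed)) {
      const component = parsed.component;
      if (typeof component === "string") {
        const props = isPlainObject(parsed.props) ? parsed.props : {};
        return { kind: "component", component, props };
      }
    }
  } catch {
    // Not valid JSON: treat the output as ordinary chat text.
  }
  return { kind: "text", text: raw };
}
```

The fallback branch matters for this option because replacing text output entirely would otherwise lose any response that fails to parse.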

Notes from using Gemini CLI to explore its own codebase:

After reviewing packages/cli/src/ui/commands/types.ts and packages/cli/src/ui/types.ts, I
have a much clearer understanding of the architecture.

Here's a breakdown of how the UI layer connects to the data layer in the Gemini CLI:

The Core Bridge: CommandContext

The primary interface connecting the data/logic layer to the UI layer is CommandContext.
This object is passed to every slash command's action handler. It provides a well-defined
API for commands to interact with the UI without being tightly coupled to the specific
rendering implementation (which is currently Ink-based React).

The ui property within CommandContext is the most critical part. It exposes a set of
functions that a command can call to affect the user interface:

  • addItem(item: HistoryItemWithoutId): This is the main function for adding content to the
    display. HistoryItemWithoutId is a union type that can represent various kinds of
    content, such as user input, Gemini's responses, tool outputs, error messages, and more.
  • setPendingItem(item: HistoryItemWithoutId | null): Used to display a temporary item
    while a long-running operation is in progress.
  • clear(): Clears the entire UI.
  • setDebugMessage(message: string): Displays a debug message, likely in a footer or status
    bar.
  • loadHistory(history: HistoryItemWithoutId[]): Replaces the current session history with
    a new one.
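
The five functions listed above can be sketched as an interface plus a trivial in-memory implementation. This is only an approximation based on the descriptions in these notes: `HistoryItemWithoutId` is reduced to a simplified stand-in, and the names `CommandUi`, `InMemoryUi`, and `helloAction` are hypothetical. The real signatures in packages/cli/src/ui/commands/types.ts may differ.

```typescript
// Simplified stand-in for the real union type in packages/cli/src/ui/types.ts.
interface HistoryItemWithoutId {
  type: string;
  text?: string;
}

// The five functions named in the notes; signatures follow those descriptions.
interface CommandUi {
  addItem(item: HistoryItemWithoutId): void;
  setPendingItem(item: HistoryItemWithoutId | null): void;
  clear(): void;
  setDebugMessage(message: string): void;
  loadHistory(history: HistoryItemWithoutId[]): void;
}

// Hypothetical implementation that records calls instead of rendering.
class InMemoryUi implements CommandUi {
  history: HistoryItemWithoutId[] = [];
  pending: HistoryItemWithoutId | null = null;
  debugMessage = "";

  addItem(item: HistoryItemWithoutId): void {
    this.history.push(item);
  }
  setPendingItem(item: HistoryItemWithoutId | null): void {
    this.pending = item;
  }
  clear(): void {
    this.history = [];
    this.pending = null;
  }
  setDebugMessage(message: string): void {
    this.debugMessage = message;
  }
  loadHistory(history: HistoryItemWithoutId[]): void {
    this.history = [...history];
  }
}

// Hypothetical slash-command action that only talks to the interface.
function helloAction(ui: CommandUi): void {
  ui.setPendingItem({ type: "info", text: "Working..." });
  ui.addItem({ type: "info", text: "Hello from a command" });
  ui.setPendingItem(null);
}
```

Since `helloAction` depends only on `CommandUi`, the same command could drive Ink, a Gen UI layer, or a test double like this one.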

The Communication Protocol: HistoryItem

The HistoryItem type (and its HistoryItemWithoutId variant) defines the "protocol" for the
data that is sent to the UI for rendering. It's a discriminated union based on the type
property. This allows the UI to handle different types of content appropriately. Some of
the key HistoryItem types are:

  • user: Represents a prompt entered by the user.
  • gemini or gemini_content: Represents a response from the Gemini model.
  • tool_group: This is particularly important. It represents the output of one or more
    tools. It contains an array of IndividualToolCallDisplay objects, each with details
    about the tool call's status (pending, executing, success, error), its results
    (resultDisplay), and any confirmation details. This is how the UI can render rich tool
    outputs, including live updates for streaming commands.
  • info, error: For displaying informational or error messages.
  • quit: A special type to signal the application to exit.
  • confirm_action, confirm_shell_commands: These are not HistoryItems themselves but are
    returned by command actions to signal to the UI that a confirmation prompt needs to be
    displayed. The UI then handles the user's response and re-invokes the original command
    with the confirmation result.
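
To make the "discriminated union" idea concrete, here is a cut-down sketch of what such a type and a renderer for it could look like. The variant names and the status values come from the notes above, but the field names (`text`, `tools`, `name`) and the `renderItem` function are assumptions; the real definitions live in packages/cli/src/ui/types.ts and are richer than this.

```typescript
type ToolStatus = "pending" | "executing" | "success" | "error";

// Reduced version of the per-tool display record described in the notes.
interface IndividualToolCallDisplay {
  name: string;
  status: ToolStatus;
  resultDisplay?: string;
}

// Each variant is distinguished by its `type` property.
type HistoryItem =
  | { type: "user"; text: string }
  | { type: "gemini"; text: string }
  | { type: "tool_group"; tools: IndividualToolCallDisplay[] }
  | { type: "info"; text: string }
  | { type: "error"; text: string };

// Hypothetical plain-text renderer; a Gen UI layer would return widgets instead.
function renderItem(item: HistoryItem): string {
  switch (item.type) {
    case "user":
      return `> ${item.text}`;
    case "gemini":
      return item.text;
    case "tool_group":
      return item.tools.map((t) => `[${t.status}] ${t.name}`).join("\n");
    case "info":
      return `info: ${item.text}`;
    case "error":
      return `error: ${item.text}`;
    default: {
      // Compile-time check that every variant is handled.
      const unreachable: never = item;
      return unreachable;
    }
  }
}
```

The `never` branch means that adding a new variant, such as a custom component item, fails to compile until the renderer handles it.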

Key Interfaces and Files

  • packages/cli/src/ui/commands/types.ts: This file is the most important one. It defines
    the CommandContext interface and the various SlashCommandActionReturn types that a
    command's action can return to signal different UI interactions (like quitting, opening a
    dialog, or submitting a prompt).

  • packages/cli/src/ui/types.ts: This file defines the HistoryItem union type and all its
    variants, which represent the different kinds of content that can be displayed in the UI.
    It also defines IndividualToolCallDisplay, which is crucial for rendering tool outputs.

  • packages/cli/src/ui/hooks/useGeminiStream.ts: This React hook is a good example of the
    consumer side of this protocol. It receives HistoryItem objects and uses them to update
    the React state, which in turn causes the UI to re-render.

How to Integrate a New Renderer

To replace the existing Ink-based renderer, you would need to create a new UI layer that
consumes the CommandContext and renders the HistoryItem objects. The best place to
integrate would be to create a new implementation of the main application component
(currently likely in packages/cli/src/ui/App.tsx or a similar file) that:

  1. Provides an implementation of the ui object in CommandContext: Your new renderer
    would need to provide functions for addItem, clear, etc., that update your new UI
    instead of the React/Ink components.
  2. Renders HistoryItem objects: Your new UI would need to have components or rendering
    logic for each of the different HistoryItem types. For example, it would need to know
    how to display a tool_group with its individual tool calls, how to render Markdown from
    a gemini_content item, and how to display error and info messages.
  3. Handles confirmation prompts: When a command returns a confirm_shell_commands or
    confirm_action result, your UI would need to display the appropriate prompt to the user
    and then re-execute the command based on their response.

By adhering to the CommandContext and HistoryItem interfaces, you can effectively decouple
your new renderer from the core business logic of the Gemini CLI, just as the current
Ink-based UI is.
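
Step 3 (confirmation prompts) is the least obvious part, so here is a minimal sketch of the round trip: the command returns a confirmation request, the UI asks the user, and the command is re-invoked with the answer. The result shape (`prompt`, `content`), the `confirmed` parameter, and the function names are hypothetical. The prompt is synchronous here for brevity, whereas a real UI would almost certainly be asynchronous.

```typescript
// Hypothetical, simplified action results.
type ActionResult =
  | { type: "confirm_action"; prompt: string }
  | { type: "message"; content: string };

// `confirmed` is undefined on the first invocation and set on the re-invocation.
type CommandAction = (confirmed?: boolean) => ActionResult;

// UI-side driver: run the command, prompt if asked, then re-run with the answer.
function runWithConfirmation(
  action: CommandAction,
  askUser: (prompt: string) => boolean,
): ActionResult {
  const first = action();
  if (first.type !== "confirm_action") {
    return first;
  }
  const answer = askUser(first.prompt);
  return action(answer);
}

// Example command that needs confirmation before doing anything destructive.
const clearHistory: CommandAction = (confirmed) => {
  if (confirmed === undefined) {
    return { type: "confirm_action", prompt: "Clear all history?" };
  }
  return { type: "message", content: confirmed ? "Cleared." : "Cancelled." };
};
```

A replacement renderer only has to supply its own `askUser`, for example a Gen UI dialog, while the command logic stays unchanged.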

> How are text output diffs generated by the LLM? Is it through tool calls? Where is it?

✦ You've hit on a key feature. Based on my analysis, text diffs are not generated directly by
the LLM's text output. Instead, they are the result of specific tool calls that modify the
file system.

Here's the workflow and where to find the relevant code:

  1. The LLM Decides to Change a File: The model determines that the best way to fulfill the
    user's request is to modify a file. It does this by issuing a tool call for either
    replace or write_file. The arguments for this tool call include the file path and the
    complete new content for the file (or the section to be replaced).

  2. The Tool Executes: The CLI's tool scheduler receives this request and executes the
    appropriate tool. The tool performs the file modification in memory.

  3. The Tool Returns a Special Display Object: This is the crucial step. Instead of just
    returning a simple "success" message, the replace and write_file tools are designed to
    return a special object that includes the original content, the new content, and the
    file name.

  4. The UI Renders the Diff: The UI layer receives this special object and uses a dedicated
    component to calculate and render the diff.
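
Steps 3 and 4 can be sketched as follows: the tool hands back an object with `fileName`, `originalContent`, and `newContent`, and the UI computes the diff itself. The three field names come from the notes; the interface name `FileDiffDisplay` and the `lineDiff` function are hypothetical. The diff is a simple LCS-based line diff chosen for illustration, not the algorithm Gemini CLI actually uses, which is not shown in this issue.

```typescript
// Shape of the display object described in the notes (interface name assumed).
interface FileDiffDisplay {
  fileName: string;
  originalContent: string;
  newContent: string;
}

// Line diff via a longest-common-subsequence table. Illustrative only.
function lineDiff(display: FileDiffDisplay): string[] {
  const a = display.originalContent.split("\n");
  const b = display.newContent.split("\n");

  // lcs[i][j] = length of the LCS of a[i..] and b[j..].
  const lcs: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    const row: number[] = [];
    for (let j = 0; j <= b.length; j++) row.push(0);
    lcs.push(row);
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table, emitting unchanged, removed, and added lines.
  const out: string[] = [`--- ${display.fileName}`, `+++ ${display.fileName}`];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(`  ${a[i]}`);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push(`- ${a[i]}`);
      i++;
    } else {
      out.push(`+ ${b[j]}`);
      j++;
    }
  }
  while (i < a.length) out.push(`- ${a[i++]}`);
  while (j < b.length) out.push(`+ ${b[j++]}`);
  return out;
}
```

The point for a Gen UI layer is that the diff is derived entirely from the tool's display object, so a custom renderer could present the same data differently without changing the tools.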

Key Code Locations:

  1. packages/cli/src/zed-integration/zedIntegration.ts: This file shows how the tool
    result is packaged for display. When a tool like replace or write_file succeeds, it
    returns a returnDisplay object with a specific structure.
      // In packages/cli/src/zed-integration/zedIntegration.ts
      // (Inside the `executeTool` method or similar logic)

      if ('fileName' in toolResult.returnDisplay) {
        // This is a diff object
        const diff_params = {
          path: toolResult.returnDisplay.fileName,
          oldText: toolResult.returnDisplay.originalContent,
          newText: toolResult.returnDisplay.newContent,
        };
        // ...sends this to the UI for rendering
      }

  2. packages/cli/src/ui/hooks/useReactToolScheduler.ts: This hook is responsible for
    managing the state of tool calls for the UI. The mapToDisplay function within this file
    transforms the raw tool call information into a format the UI components can render.
    It looks for the special returnDisplay object and passes it along.

  3. packages/cli/src/ui/types.ts: This file defines the data structures. The
    ToolResultDisplay type is a union that can include the object containing fileName,
    originalContent, and newContent, which signals to the UI that a diff should be
    rendered.
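
A small sketch of how a consumer might tell the diff variant apart from other tool results. The notes only say that `ToolResultDisplay` is a union that can include the diff object, so modelling the other variant as a plain string is an assumption, as are the names `FileDiff`, `isFileDiff`, and `describeResult`.

```typescript
// Diff variant with the three fields named in the notes (interface name assumed).
interface FileDiff {
  fileName: string;
  originalContent: string;
  newContent: string;
}

// Assumed shape of the union: plain text or a diff object.
type ToolResultDisplay = string | FileDiff;

// Type guard. The typeof check comes first because `in` throws on primitives.
function isFileDiff(display: ToolResultDisplay): display is FileDiff {
  return typeof display === "object" && display !== null && "fileName" in display;
}

// Hypothetical consumer: choose a rendering path based on the variant.
function describeResult(display: ToolResultDisplay): string {
  return isFileDiff(display) ? `diff for ${display.fileName}` : display;
}
```

A custom UI layer could extend the same pattern with further variants, each signalling a different component to render.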

In summary, the LLM doesn't generate the diff itself. It generates the intent to change a
file via a tool call, and the client-side application uses the tool's output to create and
display the diff to the user. This is a much more robust and reliable way to handle file
modifications.

Labels

P1: A high-priority issue. Someone should be assigned and actively working on it.
