Figure out how to integrate a custom UI layer into Gemini CLI #308

## Vague requirements

- P1: Allow the model to render custom components that it couldn't render before, i.e. components beyond the existing diffs, multiple-choice prompts, etc.
- P1: Have the UI take a persistent, full-screen, "pinned" approach, where longer-running tasks and structures stay on screen even as more granular tasks come and go.
- P2: Low impact on the existing Gemini CLI codebase, so that our UI "extension" stays easy to maintain even as the Gemini CLI codebase changes.

## Implementation ideas

### Low impact option: Create an MCP server that exposes Gen UI as a tool for Gemini CLI

We could implement an MCP server in Dart that exposes methods to drive a custom UI layer. It would run locally and communicate via stdin/stdout or via HTTP.

Advantages:
- Can combine chat + UI, as we don't hijack regular output
- Can drive custom UI components with arbitrary schemas
- Very modular, doesn't require any changes to Gemini CLI codebase, uses official extension point

Concerns:
- Might be hard to persuade Gemini CLI to use the Gen UI output tool consistently
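
To make the MCP option more concrete, here is a minimal sketch of the two JSON-RPC methods an MCP server answers for tools, `tools/list` and `tools/call`. It is written in TypeScript only to match the Gemini CLI code quoted later in this issue; the proposal above is to write the server in Dart. The tool name `render_ui`, the `UiComponent` payload, and the `handleRequest` function are assumptions made for illustration, and the sketch omits the transport and the initialization handshake that a real server needs.

```typescript
// Hypothetical sketch: the tool-related part of an MCP server, no transport.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

// Hypothetical payload the model sends to describe one component.
interface UiComponent {
  component: string; // e.g. "checklist" or "progress"
  props: Record<string, unknown>;
}

// Stand-in for the real UI layer: every component rendered so far.
const rendered: UiComponent[] = [];

// Tool descriptor returned from tools/list (name and schema are assumptions).
const renderUiTool = {
  name: "render_ui",
  description: "Render a custom UI component instead of plain text.",
  inputSchema: {
    type: "object",
    properties: {
      component: { type: "string" },
      props: { type: "object" },
    },
    required: ["component", "props"],
  },
};

function handleRequest(req: JsonRpcRequest): JsonRpcResponse {
  const base = { jsonrpc: "2.0" as const, id: req.id };
  switch (req.method) {
    case "tools/list":
      return { ...base, result: { tools: [renderUiTool] } };
    case "tools/call": {
      const name = req.params?.name;
      if (name !== renderUiTool.name) {
        return { ...base, error: { code: -32602, message: `Unknown tool: ${String(name)}` } };
      }
      const args = (req.params?.arguments ?? {}) as Record<string, unknown>;
      const component = args.component;
      const props = args.props;
      if (typeof component !== "string" || typeof props !== "object" || props === null) {
        // Tool-level failure: reported inside the result so the model can retry.
        return {
          ...base,
          result: { content: [{ type: "text", text: "Invalid component payload" }], isError: true },
        };
      }
      rendered.push({ component, props: props as Record<string, unknown> });
      return { ...base, result: { content: [{ type: "text", text: `Rendered ${component}` }] } };
    }
    default:
      return { ...base, error: { code: -32601, message: `Method not found: ${req.method}` } };
  }
}
```

Returning a short text acknowledgement from `tools/call` keeps the regular chat output intact, which is the "combine chat + UI" advantage listed above. Whether the model calls the tool consistently is a prompting problem that this sketch does not address.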

### Higher performance option: Update system prompt to replace text output with UI output

We could figure out how Gemini CLI drives its current UI layer and try to replace it with a Gen UI approach, e.g. structured output.

#### Notes from using Gemini CLI to explore its own codebase:
<details>
<summary>Details</summary>

After reviewing packages/cli/src/ui/commands/types.ts and packages/cli/src/ui/types.ts, I
  have a much clearer understanding of the architecture.

  Here's a breakdown of how the UI layer connects to the data layer in the Gemini CLI:

  The Core Bridge: CommandContext

  The primary interface connecting the data/logic layer to the UI layer is CommandContext.
  This object is passed to every slash command's action handler. It provides a well-defined
  API for commands to interact with the UI without being tightly coupled to the specific
  rendering implementation (which is currently Ink-based React).

  The ui property within CommandContext is the most critical part. It exposes a set of
  functions that a command can call to affect the user interface:

   * addItem(item: HistoryItemWithoutId): This is the main function for adding content to the
     display. HistoryItemWithoutId is a union type that can represent various kinds of
     content, such as user input, Gemini's responses, tool outputs, error messages, and more.
   * setPendingItem(item: HistoryItemWithoutId | null): Used to display a temporary item
     while a long-running operation is in progress.
   * clear(): Clears the entire UI.
   * setDebugMessage(message: string): Displays a debug message, likely in a footer or status
     bar.
   * loadHistory(history: HistoryItemWithoutId[]): Replaces the current session history with
     a new one.
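
The five functions above can be sketched as a TypeScript interface with a renderer-agnostic implementation. This is a sketch under assumptions: the signatures follow the notes rather than the actual source, `HistoryItemWithoutId` is reduced to a placeholder shape, and `CommandUi`, `InMemoryUi`, and `statusCommand` are hypothetical names.

```typescript
// Placeholder for the real HistoryItemWithoutId union (assumed shape).
type HistoryItemWithoutId = { type: string; text?: string };

// The functions listed in the notes; signatures are taken from the notes only.
interface CommandUi {
  addItem(item: HistoryItemWithoutId): void;
  setPendingItem(item: HistoryItemWithoutId | null): void;
  clear(): void;
  setDebugMessage(message: string): void;
  loadHistory(history: HistoryItemWithoutId[]): void;
}

// Hypothetical implementation that only tracks state; a custom renderer
// would draw from this state instead of React/Ink components.
class InMemoryUi implements CommandUi {
  history: HistoryItemWithoutId[] = [];
  pending: HistoryItemWithoutId | null = null;
  debugMessage = "";

  addItem(item: HistoryItemWithoutId): void {
    this.history.push(item);
  }

  setPendingItem(item: HistoryItemWithoutId | null): void {
    this.pending = item;
  }

  clear(): void {
    this.history = [];
    this.pending = null;
  }

  setDebugMessage(message: string): void {
    this.debugMessage = message;
  }

  loadHistory(history: HistoryItemWithoutId[]): void {
    // Copy so later mutations by the caller do not leak into UI state.
    this.history = [...history];
  }
}

// Hypothetical command action that talks to the UI only through CommandUi.
function statusCommand(ui: CommandUi): void {
  ui.setPendingItem({ type: "info", text: "Checking status..." });
  ui.addItem({ type: "info", text: "All good" });
  ui.setPendingItem(null);
}
```

Because `statusCommand` depends only on the interface, the same command could drive the Ink renderer or a custom one without changes.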

  The Communication Protocol: HistoryItem

  The HistoryItem type (and its HistoryItemWithoutId variant) defines the "protocol" for the
  data that is sent to the UI for rendering. It's a discriminated union based on the type
  property. This allows the UI to handle different types of content appropriately. Some of
  the key HistoryItem types are:

   * user: Represents a prompt entered by the user.
   * gemini or gemini_content: Represents a response from the Gemini model.
   * tool_group: This is particularly important. It represents the output of one or more
     tools. It contains an array of IndividualToolCallDisplay objects, each with details
     about the tool call's status (pending, executing, success, error), its results
     (resultDisplay), and any confirmation details. This is how the UI can render rich tool
     outputs, including live updates for streaming commands.
   * info, error: For displaying informational or error messages.
   * quit: A special type to signal the application to exit.
   * confirm_action, confirm_shell_commands: These are not HistoryItems themselves but are
     returned by command actions to signal to the UI that a confirmation prompt needs to be
     displayed. The UI then handles the user's response and re-invokes the original command
     with the confirmation result.
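
As an illustration of the discriminated-union idea, the sketch below models a few of the item types named above and renders them with an exhaustive `switch`. The `type` values and the four tool statuses come from the notes; every other field name (`text`, `tools`, `name`) and the `renderItem` function are assumptions, not the actual definitions in `packages/cli/src/ui/types.ts`.

```typescript
type ToolCallStatus = "pending" | "executing" | "success" | "error";

// Assumed shape; the real IndividualToolCallDisplay has more fields.
interface IndividualToolCallDisplay {
  name: string;
  status: ToolCallStatus;
  resultDisplay?: string;
}

// Simplified union keyed on `type`, mirroring the list in the notes.
type HistoryItem =
  | { type: "user"; text: string }
  | { type: "gemini" | "gemini_content"; text: string }
  | { type: "tool_group"; tools: IndividualToolCallDisplay[] }
  | { type: "info" | "error"; text: string }
  | { type: "quit" };

// Hypothetical plain-text renderer; a custom UI layer would return
// components here instead of strings.
function renderItem(item: HistoryItem): string {
  switch (item.type) {
    case "user":
      return `> ${item.text}`;
    case "gemini":
    case "gemini_content":
      return item.text;
    case "tool_group":
      return item.tools
        .map((t) => `[${t.status}] ${t.name}${t.resultDisplay ? `: ${t.resultDisplay}` : ""}`)
        .join("\n");
    case "info":
      return `info: ${item.text}`;
    case "error":
      return `error: ${item.text}`;
    case "quit":
      return "(session ended)";
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new
      // variant is added to HistoryItem without a matching case.
      const unreachable: never = item;
      return unreachable;
    }
  }
}
```

The `never` check in the default branch is what makes this pattern useful for the P2 maintenance goal: if the upstream union gains a variant, a renderer written this way stops compiling instead of silently dropping content.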

  Key Interfaces and Files

   * `packages/cli/src/ui/commands/types.ts`: This file is the most important one. It defines
     the CommandContext interface and the various SlashCommandActionReturn types that a
     command's action can return to signal different UI interactions (like quitting, opening
     a dialog, or submitting a prompt).
   * `packages/cli/src/ui/types.ts`: This file defines the HistoryItem union type and all its
     variants, which represent the different kinds of content that can be displayed in the
     UI. It also defines IndividualToolCallDisplay, which is crucial for rendering tool outputs.

   * `packages/cli/src/ui/hooks/useGeminiStream.ts`: This React hook is a good example of the
     consumer side of this protocol. It receives HistoryItem objects and uses them to update
     the React state, which in turn causes the UI to re-render.

  How to Integrate a New Renderer

  To replace the existing Ink-based renderer, you would need to create a new UI layer that
  consumes the CommandContext and renders the HistoryItem objects. The best place to
  integrate would be to create a new implementation of the main application component
  (currently likely in packages/cli/src/ui/App.tsx or a similar file) that:

   1. Provides an implementation of the `ui` object in `CommandContext`: Your new renderer
      would need to provide functions for addItem, clear, etc., that update your new UI
      instead of the React/Ink components.
   2. Renders `HistoryItem` objects: Your new UI would need to have components or rendering
      logic for each of the different HistoryItem types. For example, it would need to know
      how to display a tool_group with its individual tool calls, how to render Markdown from
      a gemini_content item, and how to display error and info messages.
   3. Handles confirmation prompts: When a command returns a confirm_shell_commands or
      confirm_action result, your UI would need to display the appropriate prompt to the user
      and then re-execute the command based on their response.

  By adhering to the CommandContext and HistoryItem interfaces, you can effectively decouple
  your new renderer from the core business logic of the Gemini CLI, just as the current
  Ink-based UI is.
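
Step 3, the confirmation round-trip, is the least obvious of the three, so here is a small sketch of it. The notes name `confirm_action` but do not give its shape, so the `prompt` field, the `message` result variant, the `confirmed` flag, and the names `Action`, `runWithConfirmation`, and `deleteAll` are all assumptions for illustration.

```typescript
// Assumed result shapes; only the "confirm_action" type name is from the notes.
type ActionResult =
  | { type: "message"; content: string }
  | { type: "confirm_action"; prompt: string };

// Hypothetical command signature: `confirmed` is true on the second call.
type Action = (args: string, confirmed: boolean) => ActionResult;

// Runs an action, and if it asks for confirmation, prompts the user through
// the renderer-supplied `askUser` callback and re-invokes the action.
function runWithConfirmation(
  action: Action,
  args: string,
  askUser: (prompt: string) => boolean,
): ActionResult {
  const first = action(args, false);
  if (first.type !== "confirm_action") {
    return first;
  }
  if (!askUser(first.prompt)) {
    return { type: "message", content: "Cancelled" };
  }
  return action(args, true);
}

// Example command that refuses to act until confirmed.
const deleteAll: Action = (args, confirmed) =>
  confirmed
    ? { type: "message", content: `Deleted ${args}` }
    : { type: "confirm_action", prompt: `Really delete ${args}?` };
```

Passing `askUser` in as a callback keeps the prompt presentation (modal, pinned panel, inline question) a decision of the renderer rather than of the command.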

> How are text output diffs generated by the LLM? Is it through tool calls? Where is it?

✦ You've hit on a key feature. Based on my analysis, text diffs are not generated directly by
  the LLM's text output. Instead, they are the result of specific tool calls that modify the
  file system.

  Here's the workflow and where to find the relevant code:

   1. The LLM Decides to Change a File: The model determines that the best way to fulfill the
      user's request is to modify a file. It does this by issuing a tool call for either
      replace or write_file. The arguments for this tool call include the file path and the
      complete new content for the file (or the section to be replaced).

   2. The Tool Executes: The CLI's tool scheduler receives this request and executes the
      appropriate tool. The tool performs the file modification in memory.

   3. The Tool Returns a Special Display Object: This is the crucial step. Instead of just
      returning a simple "success" message, the replace and write_file tools are designed to
      return a special object that includes the original content, the new content, and the
      file name.

   4. The UI Renders the Diff: The UI layer receives this special object and uses a dedicated
      component to calculate and render the diff.

  Key Code Locations:

   1. `packages/cli/src/zed-integration/zedIntegration.ts`: This file shows how the tool
      result is packaged for display. When a tool like replace or write_file succeeds, it
      returns a returnDisplay object with a specific structure.

      // In packages/cli/src/zed-integration/zedIntegration.ts
      // (Inside the `executeTool` method or similar logic)

      if ('fileName' in toolResult.returnDisplay) {
          // This is a diff object
          const diff_params = {
              path: toolResult.returnDisplay.fileName,
              oldText: toolResult.returnDisplay.originalContent,
              newText: toolResult.returnDisplay.newContent,
          };
          // ...sends this to the UI for rendering
      }

   2. `packages/cli/src/ui/hooks/useReactToolScheduler.ts`: This hook is responsible for
      managing the state of tool calls for the UI. The mapToDisplay function within this file
      transforms the raw tool call information into a format the UI components can render.
      It looks for the special returnDisplay object and passes it along.

   3. `packages/cli/src/ui/types.ts`: This file defines the data structures. The
      ToolResultDisplay type is a union that can include the object containing fileName,
      originalContent, and newContent, which signals to the UI that a diff should be
      rendered.

  In summary, the LLM doesn't generate the diff itself. It generates the intent to change a
  file via a tool call, and the client-side application uses the tool's output to create and
  display the diff to the user. This is a much more robust and reliable way to handle file
  modifications.
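
Going only by the notes above, a custom renderer would need two pieces to reproduce this behaviour: a check that a tool result is diff-shaped, and a way to turn old and new content into diff lines. The sketch below assumes the three field names mentioned in the notes (`fileName`, `originalContent`, `newContent`); the `isFileDiff` and `diffLines` functions are hypothetical, and the LCS-based line diff is a stand-in for whatever the real UI component does.

```typescript
// Assumed shape of the diff-carrying display object described in the notes.
interface FileDiffDisplay {
  fileName: string;
  originalContent: string;
  newContent: string;
}

// Simplified: a tool result is either plain text or a diff object.
type ToolResultDisplay = string | FileDiffDisplay;

// Type guard: checks the object case first so `in` is never applied to a string.
function isFileDiff(display: ToolResultDisplay): display is FileDiffDisplay {
  return typeof display === "object" && display !== null && "fileName" in display;
}

// Line diff via longest common subsequence (LCS). Unchanged lines are
// prefixed with two spaces, removals with "- ", additions with "+ ".
function diffLines(oldText: string, newText: string): string[] {
  const a = oldText.split("\n");
  const b = newText.split("\n");
  // lcs[i][j] = LCS length of a[i..] and b[j..].
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(`  ${a[i]}`);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push(`- ${a[i]}`);
      i++;
    } else {
      out.push(`+ ${b[j]}`);
      j++;
    }
  }
  while (i < a.length) out.push(`- ${a[i++]}`);
  while (j < b.length) out.push(`+ ${b[j++]}`);
  return out;
}
```

For example, `diffLines("a\nb\nc", "a\nx\nc")` yields an unchanged `a`, a removed `b`, an added `x`, and an unchanged `c`. The same guard-then-render pattern is how a custom component type could be recognised: a tool returns a recognisable object, and the UI picks a renderer for it.
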
</details>
