
Conversation

Contributor

@chiang-daniel chiang-daniel commented Nov 7, 2025

What does this PR do?

Big PR to support fine-tuning with Trace. It builds on #758, which introduced a new Trace-based Dataset Formatter.

  • New fine-tune page redesign: models are split by function-calling fine-tune capability and datasets are filtered accordingly.
  • Save a "run config" with tool information for every new fine-tune so we can match the tool data later.
  • UI updates.
  • Fine-tune API updates.
  • Tested on Fireworks, OpenAI, and Google Vertex.

Checklists

  • Tests have been run locally and passed
  • New tests have been added to any work in /lib

Summary by CodeRabbit

  • New Features

    • Models now indicate function-calling support; finetune creation can include and persist a Run Config; new endpoint to list Run Configs.
  • Improvements

    • Dataset/tag listings and downloads respect selected tools, showing only eligible datasets/tags; dataset exports support tool-aware formatting (including Vertex).
    • Run-config UI updated: grouped model selector, tool-aware enable/disable and auto-clear, saved run-config persistence, and model-specific run-config suggestions.
  • Tests

    • Expanded tests for tool workflows, run-config propagation, async dataset exports, and new formatting paths.



Note

Adds tool-aware fine-tune creation and model-specific run configs, extends dataset exports to include traces/tools (incl. Vertex), and updates UI to select compatible models/datasets with persisted state.

  • Backend/API:
    • Add GET /api/projects/{project_id}/tasks/{task_id}/run_configs/ (includes fine-tune run configs).
    • Extend finetune APIs: accept run_config_properties, expose supports_function_calling, filter finetune_dataset_info by tool_ids, and return eligible_datasets/eligible_finetune_tags.
    • Provider models include model_specific_run_config; fine-tuned models expose consistent model_id, structured output mode, and optional model-specific run config.
  • Data formatting/Tracing:
    • Incorporate tool calls from traces into training chats; include tool definitions in exports; add Vertex Gemini formatter; make dataset export async.
    • Set fine-tune run_config post-start (kiln_fine_tune model, generated prompt id, structured output mode).
    • DatasetSplit computes per-split tool consistency via tool_info.
  • UI:
    • Run configs store now uses new endpoint; dropdowns handle disabled/"recommended" options; warning when not using model’s suggested config.
    • Run Config Component: debounced state sync, auto-select model-specific configs (e.g., fine-tunes), tool selector settings, hide model selector when embedded.
    • Fine-tune creation flow redesigned: grouped models by tool support, pick tools, filter/restore eligible datasets by tools, persistent form state, improved downloads.
    • Misc: option list/buttons honor disabled state; minor schema/type updates.
  • Tests:
    • Broad new coverage for tool-aware datasets, run-config propagation, Vertex formatting, chat/tool message handling, async exports, and provider/endpoint changes.

Written by Cursor Bugbot for commit 4adf7f4.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (2)

58-114: Remove unused tool_definitions parameter.

The tool_definitions parameter is declared but never used within the build_training_chat function body. The tool definitions are only passed to the generator functions in dump_to_file, not used during training chat construction.

Apply this diff to remove the unused parameter:

 def build_training_chat(
     task_run: TaskRun,
     system_message: str,
     data_strategy: ChatStrategy,
     thinking_instructions: str | None = None,
-    tool_definitions: list[ToolCallDefinition] | None = None,
 ) -> list[ChatMessage]:

139-179: Fix type annotation to match actual return type.

The function signature declares list[dict[str, str | None]] as the return type, but the function actually returns dictionaries that can contain tool_calls (list) and tool_call_id (str), which don't match the annotation. The internal typing as list[dict[str, Any]] is correct.

Apply this diff to fix the type annotation:

 def generate_chat_message_list(
     training_chat: list[ChatMessage],
-) -> list[dict[str, str | None]]:
+) -> list[dict[str, Any]]:
     """Generate OpenAI chat list. Not the full OpenAI body, just the list of messages."""
app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (1)

598-652: Bug: update_data_strategies_supported leaves Reasoning select empty for some download formats

For download formats in r1_disabled_for_downloads (download_huggingface_chat_template_toolcall, download_jsonl_toolcall, download_vertex_gemini), the function hits:

if (r1_disabled_for_downloads.includes(model_provider)) {
  return ["final_only", "two_message_cot"]
}

This early return just yields an array and exits without updating data_strategy_select_options or data_strategy, so the “Reasoning” dropdown can remain stale or empty when switching into these options.

You can fold the special‑case into compatible_data_strategies instead, so the function always updates the reactive state:

-    const r1_disabled_for_downloads = [
-      // R1 data strategy currently disabled for toolcall downloads
-      // because unclear how to use in the best way
-      "download_huggingface_chat_template_toolcall",
-      "download_jsonl_toolcall",
-
-      // R1 currently not supported by Vertex models
-      "download_vertex_gemini",
-    ]
-    if (r1_disabled_for_downloads.includes(model_provider)) {
-      return ["final_only", "two_message_cot"]
-    }
-
-    const compatible_data_strategies: ChatStrategy[] = is_download
-      ? [
-          "final_only",
-          "two_message_cot",
-          "final_and_intermediate_r1_compatible",
-        ]
-      : available_models
+    const r1_disabled_for_downloads = [
+      // R1 data strategy currently disabled for toolcall downloads
+      // because unclear how to use in the best way
+      "download_huggingface_chat_template_toolcall",
+      "download_jsonl_toolcall",
+
+      // R1 currently not supported by Vertex models
+      "download_vertex_gemini",
+    ]
+
+    const compatible_data_strategies: ChatStrategy[] = is_download
+      ? (r1_disabled_for_downloads.includes(model_provider)
+          ? ["final_only", "two_message_cot"]
+          : [
+              "final_only",
+              "two_message_cot",
+              "final_and_intermediate_r1_compatible",
+            ])
+      : available_models
           ?.map((model) => model.models)
           .flat()
           .find((model) => model.id === base_model_id)
           ?.data_strategies_supported ?? []

This keeps the R1 strategy hidden where needed but still drives data_strategy_select_options and data_strategy for the UI.

♻️ Duplicate comments (1)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (1)

410-450: Past review concern is addressed.

The error handling for tool definition lookup is properly implemented. The try-except block (lines 438-448) catches all exceptions including ValueError from tool_from_id and wraps them in a RuntimeError with clear context. This prevents a single bad task run from crashing the entire formatting operation, addressing the concern raised in the previous review.
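The wrapping pattern the comment describes can be sketched as below. The names `resolve_tool_definition` and `failing_lookup` are hypothetical, not the actual Kiln code; the point is that re-raising with `from e` preserves the original `ValueError` as the cause while adding context about which tool id failed:

```python
def resolve_tool_definition(tool_id, lookup):
    """Wrap any lookup failure so one bad task run can't crash the whole export."""
    try:
        return lookup(tool_id)
    except Exception as e:
        raise RuntimeError(
            f"Failed to resolve tool definition for tool id '{tool_id}': {e}"
        ) from e


def failing_lookup(tool_id):
    # Stand-in for tool_from_id raising on an unknown id.
    raise ValueError(f"unknown tool: {tool_id}")


try:
    resolve_tool_definition("missing_tool", failing_lookup)
    wrapped = None
except RuntimeError as e:
    wrapped = e
```

The caller can then catch a single `RuntimeError` type per task run instead of enumerating every failure mode of the tool registry.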

🧹 Nitpick comments (2)
app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (2)

92-201: State persistence and reset flow look solid; consider future-proofing in‑memory resets

The IndexedDB‑backed SavedFinetuneState plus state_initialized guard now correctly prevents clear_saved_state() from being immediately overwritten by the reactive writer, and the saved_dataset_id restoration into SelectFinetuneDataset looks consistent.

One thing to keep in mind: clear_saved_state() resets only the persisted state, not the local form fields (finetune_name, finetune_description, selected_dataset, etc.). That’s fine for the current flows (Reset does a full window.location.reload(); post‑create you show the Completed view), but if you later add a “Create another fine‑tune” path on the same component instance you’ll also want to clear the in‑memory fields and explicitly flip state_initialized back to true after re‑initializing them so subsequent edits are re‑persisted.

   function clear_saved_state() {
     // Prevent the reactive block from immediately repopulating the store
     state_initialized = false

     // Clear the saved state in IndexedDB by saving the defaults
     saved_state.update((s) => ({
       ...s,
@@
       tools: undefined,
     }))
+
+    // (Optional for future "create another fine-tune" flows)
+    // Reset in-memory fields as well so the UI matches the cleared state.
+    // finetune_name = ""
+    // finetune_description = ""
+    // selected_dataset = null
+    // selected_tool_ids = []
+    // data_strategy = "final_only"
+    // system_prompt_method = "simple_prompt_builder"
   }

Also applies to: 217-225, 229-244, 826-851


853-855: Minor UX nit: Step 4 header is empty for download‑only flows

When is_download is true and a dataset is selected, step_4_visible is true so the “Step 4: Advanced Options” header renders, but the inner content is wrapped in {#if !is_download} and therefore empty. Then Step 5 (“Download JSONL”) appears below.

That’s a little confusing visually (a numbered step with no controls). You could either hide Step 4 entirely for download flows, or repurpose the label so the sequence stays contiguous (e.g., make downloads the new Step 4 when is_download).

-  $: step_4_visible =
-    $model_provider && $model_provider !== disabled_header && !!selected_dataset
+  $: step_4_visible =
+    $model_provider &&
+    $model_provider !== disabled_header &&
+    !!selected_dataset &&
+    !is_download
@@
-        {#if step_4_visible}
+        {#if step_4_visible}
           <div class="text-xl font-bold">Step 4: Advanced Options</div>
           {#if !is_download}
             <div>
               <Collapse title="Advanced Options">
@@
-    {#if step_5_download_visible}
+    {#if step_5_download_visible}
       <div>
-        <div class="text-xl font-bold">Step 5: Download JSONL</div>
+        <div class="text-xl font-bold">
+          {step_4_visible ? "Step 5: Download JSONL" : "Step 4: Download JSONL"}
+        </div>

Also applies to: 856-901, 905-937

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b280e4d and cf1d93f.

📒 Files selected for processing (2)
  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (12 hunks)
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (11 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-22T04:31:20.500Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 739
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte:170-177
Timestamp: 2025-10-22T04:31:20.500Z
Learning: In Svelte components for the Kiln project, leonardmq prefers tracking previous query parameter values and comparing them in reactive blocks (rather than just checking for existence) to avoid duplicate execution on initial load while still reacting to subsequent navigation changes. This pattern supports both current behavior and future feature additions like mode switching.

Applied to files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
📚 Learning: 2025-10-07T00:21:33.537Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 698
File: app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte:800-812
Timestamp: 2025-10-07T00:21:33.537Z
Learning: In the Kiln web UI codebase, the `save_new_task_run_config` function in `app/web_ui/src/lib/stores/run_configs_store.ts` automatically refreshes the run configs store by calling `load_task_run_configs(project_id, task_id, true)` with force_refresh=true after successfully creating a new run config. This means callers of save_new_task_run_config do not need to manually refresh the store afterward, as the store will already contain the newly created config.

Applied to files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
🧬 Code graph analysis (1)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (5)
libs/core/kiln_ai/adapters/chat/chat_formatter.py (6)
  • BasicChatMessage (16-18)
  • ToolCallMessage (22-27)
  • ToolResponseMessage (31-36)
  • get_chat_formatter (239-259)
  • append_messages (74-76)
  • messages (71-72)
libs/core/kiln_ai/adapters/chat/chat_utils.py (1)
  • build_tool_call_messages (10-61)
libs/core/kiln_ai/adapters/fine_tune/vertex_formatter.py (1)
  • generate_vertex_gemini (18-235)
libs/core/kiln_ai/tools/base_tool.py (1)
  • ToolCallDefinition (19-23)
libs/core/kiln_ai/tools/tool_registry.py (1)
  • tool_from_id (25-117)
🪛 GitHub Actions: Debug Detector
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py

[error] 435-435: Python print statement found. Remove or replace with proper logging.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Generate Coverage Report
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
🔇 Additional comments (4)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (3)

8-21: LGTM! Clean import additions for tool integration.

The new imports are well-organized and all necessary for the tool call functionality being added to dataset formatting.


87-90: LGTM! Tool call extraction is correctly implemented.

The tool call messages are properly extracted from traces and inserted into the training chat, enabling tool-based fine-tuning data generation.


339-406: LGTM! Async conversion is necessary and well-implemented.

The method is correctly converted to async to support fetching tool definitions from potentially async sources (like MCP servers). Tool definitions are properly fetched and propagated through the formatting pipeline.

app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (1)

447-458: Run‑config and tools wiring into finetune creation looks coherent

The way you:

  • Extract model_name from base_model_id,
  • Build run_config_properties from run_config_component.run_options_as_run_config_properties() plus model_name and model_provider_name,
  • Bind tools into RunConfigComponent and reuse selected_tool_ids both for dataset filtering (required_tool_ids) and analytics (tool_count),
  • Gate tools via supports_function_calling and clear them reactively for unsupported models,

results in a clean, self‑contained payload (run_config_properties) and consistent telemetry (supports_tools / tool_count).

The null‑checks around run_config_component, model_name, and provider_id also make this robust against partial state. No changes needed here.

Also applies to: 487-487, 501-503, 806-824, 844-847

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (1)

58-114: Remove unused parameter.

The tool_definitions parameter added at line 63 is never used within the function body. Tool call messages are extracted from traces (lines 87-89), but the tool definitions themselves aren't needed during chat construction—they're only passed to the formatters later.

Apply this diff to remove the unused parameter:

 def build_training_chat(
     task_run: TaskRun,
     system_message: str,
     data_strategy: ChatStrategy,
     thinking_instructions: str | None = None,
-    tool_definitions: list[ToolCallDefinition] | None = None,
 ) -> list[ChatMessage]:

And update the call site at line 393:

                 training_chat = build_training_chat(
                     task_run=task_run,
                     system_message=self.system_message,
                     data_strategy=data_strategy,
                     thinking_instructions=self.thinking_instructions,
-                    tool_definitions=tools,
                 )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf1d93f and 4aa1daf.

📒 Files selected for processing (1)
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.

Applied to files:

  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
🧬 Code graph analysis (1)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (5)
libs/core/kiln_ai/adapters/chat/chat_formatter.py (6)
  • BasicChatMessage (16-18)
  • ToolCallMessage (22-27)
  • ToolResponseMessage (31-36)
  • get_chat_formatter (239-259)
  • append_messages (74-76)
  • messages (71-72)
libs/core/kiln_ai/adapters/chat/chat_utils.py (1)
  • build_tool_call_messages (10-61)
libs/core/kiln_ai/adapters/fine_tune/vertex_formatter.py (1)
  • generate_vertex_gemini (18-235)
libs/core/kiln_ai/tools/base_tool.py (1)
  • ToolCallDefinition (19-23)
libs/core/kiln_ai/tools/tool_registry.py (1)
  • tool_from_id (25-117)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (6)
libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (6)

8-20: LGTM!

The new imports for chat message types (BasicChatMessage, ToolCallMessage, ToolResponseMessage), tool utilities (build_tool_call_messages), and tool registry functions (tool_from_id, ToolCallDefinition) are all used appropriately throughout the file to support tool-based fine-tuning.


48-55: LGTM!

The protocol extension to accept optional tool definitions is appropriate and all format generator implementations have been updated to match this signature.


139-179: LGTM!

The function correctly handles the three message types (BasicChatMessage, ToolCallMessage, ToolResponseMessage) with appropriate type guards and error handling. The type annotation change to Any is necessary to accommodate tool_calls arrays, and the empty string fallback for None content (line 162) is well-documented for Fireworks compatibility.
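A minimal sketch of the conversion rule described above; the message shape and function name here are illustrative, not the actual Kiln types:

```python
def to_openai_message(role, content, tool_calls=None, tool_call_id=None):
    # Some providers (e.g. Fireworks) reject null content, so fall back to "".
    message = {"role": role, "content": content if content is not None else ""}
    if tool_calls:
        message["tool_calls"] = tool_calls
    if tool_call_id:
        message["tool_call_id"] = tool_call_id
    return message


# An assistant turn that only makes a tool call has no text content.
assistant = to_openai_message("assistant", None, tool_calls=[{"id": "call_1"}])
```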


410-449: LGTM!

The helper method correctly fetches tool definitions with appropriate caching to avoid repeated MCP server calls. The error handling wraps any exceptions with context, addressing the previous review concern about tool lookup failures. Early returns for missing prerequisites (run_config, tools_config, task) are handled cleanly.

Based on past review comments (now resolved).
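As a rough illustration of the caching behavior described above (hypothetical names and shapes; the real method lives on the formatter and returns `ToolCallDefinition` objects), a per-call cache ensures each unique tool id triggers at most one resolver round-trip even when many task runs reference the same tool:

```python
import asyncio


async def fetch_tool_definitions(tool_ids, resolve):
    """Resolve each unique tool id once; repeated ids reuse the cached result."""
    cache = {}
    definitions = []
    for tool_id in tool_ids:
        if tool_id not in cache:
            cache[tool_id] = await resolve(tool_id)  # e.g. an MCP server round-trip
        definitions.append(cache[tool_id])
    return definitions


# Stub resolver that records how many times it is actually invoked.
calls = []


async def stub_resolve(tool_id):
    calls.append(tool_id)
    return {"name": tool_id}


definitions = asyncio.run(fetch_tool_definitions(["a", "b", "a"], stub_resolve))
```

With the input `["a", "b", "a"]`, the resolver runs only twice while three definitions are returned.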


339-406: LGTM!

The async conversion of dump_to_file is correct, properly awaiting the tool definition fetch (line 390) and threading tool definitions through to both build_training_chat and the format generator (lines 398, 400). UTF-8 encoding with ensure_ascii=False appropriately supports international content.


267-275: No issues found—HuggingFace format correctly embeds tool information in conversation structure.

Verification confirms that both HuggingFace generator functions intentionally do not use the tools parameter. The generate_huggingface_chat_template_toolcall function embeds tool call information directly within the conversation structure (via the "tool_calls" key in messages), whereas the OpenAI format includes tool definitions separately. This is the correct design pattern for HuggingFace chat templates, which do not use a separate tools field.
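To make the contrast concrete, here are illustrative JSONL line shapes for the two formats. The field values are made up; only the structural difference, a separate top-level `tools` field versus tool calls embedded purely in the messages, reflects the comment above:

```python
# OpenAI-style fine-tune line: tool definitions travel in a separate "tools" field.
openai_line = {
    "messages": [
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {
                        "name": "get_weather",
                        "arguments": '{"city": "Paris"}',
                    },
                }
            ],
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {"name": "get_weather", "parameters": {"type": "object"}},
        }
    ],
}

# HuggingFace chat-template style: tool call info lives only inside the conversation.
huggingface_line = {
    "conversations": [
        {
            "role": "assistant",
            "content": "",
            "tool_calls": [{"name": "get_weather", "arguments": {"city": "Paris"}}],
        }
    ],
}
```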

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte (1)

176-194: Consider simplifying redundant nullish coalescing.

The common_props object is a good refactor for reducing duplication. However, the fallback on lines 184-186 is redundant since tools_selector_settings.empty_label is guaranteed to have a value from the defaults merge on lines 31-34.

Apply this diff to simplify:

-    empty_label:
-      tools_selector_settings.empty_label ??
-      default_tools_selector_settings.empty_label,
+    empty_label: tools_selector_settings.empty_label,

Note: The current code works correctly; this is just a minor simplification.

app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (1)

182-201: Consider simplifying state clearing logic.

The clear_saved_state function uses update() to set each field to undefined (lines 187-200), which is verbose and maintains the object structure in IndexedDB with undefined values. Since state_initialized is already set to false to prevent reactive repopulation, a simpler approach would be to clear the entire object:

 function clear_saved_state() {
   // Prevent the reactive block from immediately repopulating the store
   state_initialized = false
-
-  // Clear the saved state in IndexedDB by saving the defaults
-  saved_state.update((s) => ({
-    ...s,
-    name: undefined,
-    description: undefined,
-    provider: undefined,
-    base_model_id: undefined,
-    dataset_split_id: undefined,
-    parameters: undefined,
-    system_message: undefined,
-    thinking_instructions: undefined,
-    data_strategy: "final_only",
-    system_prompt_method: "simple_prompt_builder",
-    tools: undefined,
-  }))
+  saved_state.set({})
 }

This achieves the same result with less code and more clearly expresses the intent to clear the state.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4aa1daf and c0d90a1.

📒 Files selected for processing (4)
  • app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte (6 hunks)
  • app/web_ui/src/lib/utils/form_element.svelte (1 hunks)
  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (12 hunks)
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • libs/core/kiln_ai/adapters/fine_tune/dataset_formatter.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-27T04:44:58.745Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 751
File: app/web_ui/src/routes/(app)/docs/rag_configs/[project_id]/create_rag_config/edit_rag_config_form.svelte:49-50
Timestamp: 2025-10-27T04:44:58.745Z
Learning: In the Kiln web UI codebase (Svelte), the FormElement component uses an `optional` boolean prop that defaults to `false`, rather than a `required` prop. FormElement also has built-in validation capabilities through its `validator` prop.

Applied to files:

  • app/web_ui/src/lib/utils/form_element.svelte
📚 Learning: 2025-10-22T04:31:20.500Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 739
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte:170-177
Timestamp: 2025-10-22T04:31:20.500Z
Learning: In Svelte components for the Kiln project, leonardmq prefers tracking previous query parameter values and comparing them in reactive blocks (rather than just checking for existence) to avoid duplicate execution on initial load while still reacting to subsequent navigation changes. This pattern supports both current behavior and future feature additions like mode switching.

Applied to files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
📚 Learning: 2025-10-07T00:21:33.537Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 698
File: app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte:800-812
Timestamp: 2025-10-07T00:21:33.537Z
Learning: In the Kiln web UI codebase, the `save_new_task_run_config` function in `app/web_ui/src/lib/stores/run_configs_store.ts` automatically refreshes the run configs store by calling `load_task_run_configs(project_id, task_id, true)` with force_refresh=true after successfully creating a new run config. This means callers of save_new_task_run_config do not need to manually refresh the store afterward, as the store will already contain the newly created config.

Applied to files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
📚 Learning: 2025-11-11T00:07:51.535Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 756
File: app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte:19-20
Timestamp: 2025-11-11T00:07:51.535Z
Learning: In app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte, the single_select_selected_tool prop should NOT be synchronized with the tools_store. Only the multi-select tools array should persist to the store. The single_select mode is intended to use ephemeral component-local state.

Applied to files:

  • app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
🔇 Additional comments (11)
app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte (8)

1-10: LGTM: Clean imports supporting settings-driven refactor.

The imports are well-organized and appropriate for the settings-driven architecture introduced in this refactor.


11-18: LGTM: Props correctly implement settings pattern and respect store synchronization.

The new settings prop enables a clean, settings-driven API. The single_select_selected_tool correctly remains component-local state and is not synchronized with tools_store, which aligns with the intended ephemeral behavior for single-select mode. Based on learnings.


22-34: LGTM: Well-implemented settings merge pattern.

The settings-driven approach is correctly implemented with sensible defaults and proper reactive merging. This reduces prop proliferation and provides a cleaner API surface.


43-66: LGTM: Mandatory tools correctly integrated into load flow.

The function properly combines mandatory tools from settings with existing task tools, preserving order and removing duplicates. The async initialization is handled correctly.


69-77: LGTM: Store synchronization correctly implements documented behavior.

The reactive statement properly synchronizes only the multi-select tools array with the store, while keeping single_select_selected_tool ephemeral. This matches the intended design. Based on learnings.


79-109: LGTM: Robust filtering of unavailable tools.

The function correctly filters out tools that are no longer available (e.g., server offline, removed tools) with appropriate console warnings for debugging.


113-174: LGTM: Settings correctly control tool option generation.

The function properly uses settings to control UI behavior (hiding create button, disabling mandatory tools) while building option groups for the selector.


198-210: LGTM: Clean render implementation leveraging common props.

The conditional rendering correctly uses settings to determine single vs. multi-select mode, and the common_props spread effectively eliminates duplication while maintaining proper value bindings for each mode.

app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (3)

151-169: State persistence implementation looks good.

The reactive block correctly persists form state to IndexedDB when state_initialized is true, and the guard prevents premature execution. This works well with the clear_saved_state function (lines 182-201) which sets state_initialized = false to prevent the reactive writer from immediately repopulating the store—addressing the previous review concern.


806-823: RunConfigComponent integration looks well-structured.

The component binding (line 807) enables the parent to call clear_tools() (line 80), and the bidirectional tools binding (line 808) keeps selected_tool_ids in sync. The configuration properly adapts the UI for the fine-tuning context, hiding unnecessary elements and providing clear feedback when tools are disabled.


839-849: Good safeguard: dataset selector waits for state initialization.

Conditionally rendering SelectFinetuneDataset only after state_initialized is true (line 839) ensures that saved_dataset_id is properly loaded from IndexedDB before the component attempts to restore the dataset selection. This prevents race conditions where the dataset selector might render before the persisted dataset ID is available.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
app/desktop/studio_server/eval_api.py (1)

325-345: Avoid mutating task.run_configs() list when augmenting with finetune configs

You’re appending finetune-derived configs directly to the list returned by task.run_configs(). If that method ever returns a cached/shared list, this can mutate internal state and potentially accumulate duplicates across calls.

Consider copying before augmentation:

-    async def get_run_configs(project_id: str, task_id: str) -> list[TaskRunConfig]:
-        # Returns all run configs of a given task.
-        task = task_from_id(project_id, task_id)
-        configs = task.run_configs()
-
-        # Get run configs from finetunes
-        finetunes = task.finetunes()
-        for finetune in finetunes:
+    async def get_run_configs(project_id: str, task_id: str) -> list[TaskRunConfig]:
+        # Returns all run configs of a given task, plus finetune-derived run configs.
+        task = task_from_id(project_id, task_id)
+        # Use a shallow copy to avoid mutating the Task's internal collection.
+        configs = list(task.run_configs())
+
+        # Get run configs from finetunes
+        for finetune in task.finetunes():
             if finetune.run_config is not None:
                 configs.append(
                     TaskRunConfig(
                         id=f"finetune_run_config::{finetune.id}",  # this id is not persisted
                         name=finetune.name,
                         description=finetune.description,
                         run_config_properties=finetune.run_config,
                     )
                 )

This keeps the endpoint behavior the same while removing any reliance on run_configs() returning a fresh list each time.

app/web_ui/src/lib/api_schema.d.ts (3)

1578-1594: New /run_configs/ path looks correct; consider trailing-slash consistency if not intentional

The new GET /api/projects/{project_id}/tasks/{task_id}/run_configs/ endpoint wiring and typing match the corresponding operation and TaskRunConfig[] response, so the schema surface looks consistent. The only nit is the trailing / in the path, which differs from nearby task-scoped routes (task_run_configs, evals, etc.). If consistency across routes matters, you may want to normalize this in the FastAPI/OpenAPI definition and regenerate, otherwise this is fine as-is. Based on learnings, any change should be done in the backend schema, not in this generated file.


3899-3901: FinetuneDatasetInfo eligibility fields give a clear filtered view

Adding eligible_datasets and eligible_finetune_tags alongside the existing collections is a nice way to distinguish “all known” vs “eligible under current constraints” in the UI. Just ensure the backend docs/comments clearly describe how eligibility is computed (e.g., tool support, data strategy) so future consumers don’t conflate these with the existing fields. Any doc tweaks must go in the backend schema and then be regenerated here.


9158-9189: New get_run_configs operation mirrors the existing task run-configs API

get_run_configs_api_projects__project_id__tasks__task_id__run_configs__get cleanly returns TaskRunConfig[], matching the earlier get_task_run_configs... operation. Having both endpoints is fine if you’re standardizing naming for new consumers while keeping the old route for compatibility; if not, it might be worth documenting the intended distinction between the two in the backend/OpenAPI comments. The generated TypeScript wiring itself looks correct.

app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte (1)

134-151: Check whether default finetune configs should appear in two groups

Right now a finetune run config that is also the task’s default_run_config_id will:

  • Be added to saved_configuration_options as "... (Default)" under Saved Configurations.
  • Also appear under the Fine-Tune Configuration group via finetune_run_configs.

If duplication isn’t intended, you may want to either:

  • Skip pushing finetune defaults into saved_configuration_options, or
  • Mark default status only in the finetune group (and keep them excluded from “Saved Configurations”).

Also, the comment at line 180 (“only if selected_finetune_id is set”) doesn’t match the current logic, which purely checks whether any finetune run configs exist. Consider updating it to reflect the actual behavior to avoid confusion.

Also applies to: 180-197

app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte (2)

40-52: New props for tools selector settings, tools, and model visibility are consistent

  • tools_selector_settings: Partial<ToolsSelectorSettings> = {} cleanly exposes configuration for ToolsSelector while allowing call sites to omit fields.
  • Exporting tools: string[] = [] matches the bind:tools usage and lets parents query/reset tools via the new helpers.
  • hide_model_selector provides a clear toggle for hiding the model dropdown (useful for finetune flows or constrained UIs).

These additions expand the public API in a coherent way; just ensure call sites document how/when to rely on hide_model_selector and tools_selector_settings.


61-61: Finetune run-config handling and is_updating_from_saved_config guard look correct

  • is_updating_from_saved_config is set before applying a saved config and cleared only after an await tick(), which correctly prevents reset_to_custom_options_if_needed from flipping selected_run_config_id back to "custom" while you’re programmatically updating state.
  • Finetune IDs are derived from selected_run_config.id assuming the finetune_run_config::finetune_id convention, and you consistently synthesize:
    • Model: kiln_fine_tune/${project_id}::${current_task.id}::${finetune_id}
    • Prompt: fine_tune_prompt::${project_id}::${current_task.id}::${finetune_id}
  • The finetune branch in reset_to_custom_options_if_needed compares model, prompt_method, and provider against those canonical values, plus temperature/top_p/structured_output_mode/tools via arrays_equal, which should reliably detect manual edits and switch back to "custom".

Minor cleanups you might consider:

  • The explicit provider = selected_run_config.run_config_properties.model_provider_name inside the finetune branch is effectively overridden by the reactive $: provider = model ? model.split("/")[0] : "" and can likely be removed to reduce confusion.
  • As with the other file, the 'finetune_run_config::' prefix is hard-coded in two places here; centralizing it in a shared constant would help keep this in sync if the prefix ever changes.

Also applies to: 117-149, 160-165, 175-205
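The ID conventions described above can be sketched as small helpers. This is a hypothetical sketch: the `kiln_fine_tune/` and `fine_tune_prompt::` formats are taken verbatim from the review comment, and the function names are not from the codebase.

```typescript
// Hypothetical helpers mirroring the ID conventions quoted in this review.
function finetune_model_id(
  project_id: string,
  task_id: string,
  finetune_id: string,
): string {
  return `kiln_fine_tune/${project_id}::${task_id}::${finetune_id}`
}

function finetune_prompt_id(
  project_id: string,
  task_id: string,
  finetune_id: string,
): string {
  return `fine_tune_prompt::${project_id}::${task_id}::${finetune_id}`
}

// The reactive `provider = model.split("/")[0]` always yields
// "kiln_fine_tune" for model IDs of this shape, which is why the explicit
// provider assignment in the finetune branch is redundant.
function provider_from_model(model: string): string {
  return model ? model.split("/")[0] : ""
}
```

Seeing the synthesis in one place also makes the redundancy argument concrete: any model ID produced by `finetune_model_id` starts with `kiln_fine_tune/`, so the derived provider is fixed.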

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6b7382 and 2f1fd4a.

📒 Files selected for processing (5)
  • app/desktop/studio_server/eval_api.py (1 hunks)
  • app/web_ui/src/lib/api_schema.d.ts (7 hunks)
  • app/web_ui/src/lib/stores/run_configs_store.ts (1 hunks)
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte (8 hunks)
  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte (3 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
app/web_ui/**/*.svelte

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

app/web_ui/**/*.svelte: Use Svelte v4 (not v5) for the web frontend UI
Use DaisyUI for UI components in the web frontend

Use Svelte v4 (not v5) for the web UI framework

Files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
app/web_ui/**/*.{svelte,ts}

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Tailwind CSS for styling in the web frontend

Files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
  • app/web_ui/src/lib/stores/run_configs_store.ts
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
  • app/web_ui/src/lib/api_schema.d.ts
app/web_ui/**/*.{svelte,css,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Tailwind and DaisyUI for styling the frontend

Files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
  • app/web_ui/src/lib/stores/run_configs_store.ts
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
  • app/web_ui/src/lib/api_schema.d.ts
app/web_ui/**/*.{svelte,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Make FastAPI backend calls from the Svelte web app to communicate with backend servers

Files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
  • app/web_ui/src/lib/stores/run_configs_store.ts
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
  • app/web_ui/src/lib/api_schema.d.ts
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation

Files:

  • app/desktop/studio_server/eval_api.py
app/desktop/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use python 3.13 for the desktop app

Files:

  • app/desktop/studio_server/eval_api.py
app/desktop/studio_server/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI for the desktop app server that extends libs/server with web app-specific APIs

Files:

  • app/desktop/studio_server/eval_api.py
app/web_ui/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use TypeScript for frontend type safety

Files:

  • app/web_ui/src/lib/stores/run_configs_store.ts
  • app/web_ui/src/lib/api_schema.d.ts
app/web_ui/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use TypeScript for frontend development

Files:

  • app/web_ui/src/lib/stores/run_configs_store.ts
  • app/web_ui/src/lib/api_schema.d.ts
🧠 Learnings (6)
📚 Learning: 2025-10-07T00:21:33.537Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 698
File: app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte:800-812
Timestamp: 2025-10-07T00:21:33.537Z
Learning: In the Kiln web UI codebase, the `save_new_task_run_config` function in `app/web_ui/src/lib/stores/run_configs_store.ts` automatically refreshes the run configs store by calling `load_task_run_configs(project_id, task_id, true)` with force_refresh=true after successfully creating a new run config. This means callers of save_new_task_run_config do not need to manually refresh the store afterward, as the store will already contain the newly created config.

Applied to files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
  • app/web_ui/src/lib/stores/run_configs_store.ts
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
📚 Learning: 2025-11-11T00:07:51.535Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 756
File: app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte:19-20
Timestamp: 2025-11-11T00:07:51.535Z
Learning: In app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte, the single_select_selected_tool prop should NOT be synchronized with the tools_store. Only the multi-select tools array should persist to the store. The single_select mode is intended to use ephemeral component-local state.

Applied to files:

  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
📚 Learning: 2025-10-29T06:59:55.635Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 736
File: app/web_ui/src/lib/api_schema.d.ts:8143-8180
Timestamp: 2025-10-29T06:59:55.635Z
Learning: app/web_ui/src/lib/api_schema.d.ts is generated by openapi-typescript; do not propose manual edits. Schema changes should be made in the FastAPI backend (e.g., app/desktop/studio_server/data_gen_api.py or libs/server/kiln_server/*), then re-generate the TS types.

Applied to files:

  • app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-10-29T11:19:31.059Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 759
File: app/web_ui/src/lib/api_schema.d.ts:2485-2490
Timestamp: 2025-10-29T11:19:31.059Z
Learning: In libs/server/kiln_server/document_api.py and the generated app/web_ui/src/lib/api_schema.d.ts, CreateVectorStoreConfigRequest.properties is now a union of three distinct public schemas: LanceDBConfigFTSPropertiesPublic, LanceDBConfigVectorPropertiesPublic, and LanceDBConfigHybridPropertiesPublic. These are intentionally split to allow independent evolution; do not suggest merging them.

Applied to files:

  • app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-09-04T06:27:58.852Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: app/web_ui/src/lib/api_schema.d.ts:3315-3320
Timestamp: 2025-09-04T06:27:58.852Z
Learning: The file app/web_ui/src/lib/api_schema.d.ts is auto-generated by openapi-typescript from the backend API schema and should never be manually edited. Changes to API types should be made in the backend, which will then be reflected in the regenerated TypeScript definitions.

Applied to files:

  • app/web_ui/src/lib/api_schema.d.ts
🧬 Code graph analysis (1)
app/desktop/studio_server/eval_api.py (3)
libs/server/kiln_server/task_api.py (1)
  • task_from_id (16-25)
libs/core/kiln_ai/datamodel/eval.py (1)
  • configs (357-358)
libs/core/kiln_ai/datamodel/task.py (1)
  • run_configs (172-173)
🪛 GitHub Actions: Format and Lint
app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte

[error] 152-152: "Exlcude" is a misspelling of "Exclude".

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
🔇 Additional comments (10)
app/web_ui/src/lib/stores/run_configs_store.ts (1)

58-69: Path update correctly aligned with new run_configs endpoint

Switching load_task_run_configs to call /tasks/{task_id}/run_configs/ matches the new backend route and keeps all existing loading/error semantics intact. No further changes needed here.

app/web_ui/src/lib/api_schema.d.ts (4)

2634-2634: Optional run_config_properties on CreateFinetuneRequest is a reasonable, backwards‑compatible extension

Adding run_config_properties?: RunConfigProperties | null; cleanly reuses the shared run-config shape and keeps the field optional for callers that don’t yet send it. Frontend code consuming this should treat it as nullable/optional and handle the “no run config attached” case explicitly, but the schema change itself looks good. Any future tweaks should be made in the backend model and regenerated here.


3882-3883: Persisted Finetune.run_config aligns with the new creation request

run_config?: RunConfigProperties | null; on Finetune mirrors the creation request and will let the UI reason about which run configuration a fine-tune was built from. Keeping it optional preserves compatibility with older records. This addition looks consistent with the rest of the RunConfig wiring.


3945-3948: supports_function_calling on finetune models is appropriate; required boolean is fine

The new supports_function_calling: boolean; flag on FinetuneProviderModel matches how ModelDetails already exposes this capability, and the @default true comment makes sense for providers where function-calling is the norm. Since the field is required in the TS type, be sure all provider entries (including any persisted config) populate it to avoid surprises at runtime for older data. Any change to required/optional-ness should be done in the backend model and regenerated.


9019-9021: tool_ids query param on finetune_dataset_info is a sensible filter hook

Allowing tool_ids?: string[] | null; on the finetune_dataset_info GET parameters is a straightforward way to drive the newly added eligible_* fields from the UI. The typing looks correct; just ensure callers always pass an array (even singleton) rather than a comma‑separated string, per this schema.
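For callers assembling the query string manually, the "array, not comma-separated string" requirement means one repeated `tool_ids` parameter per ID. A minimal sketch using the standard `URLSearchParams` API (the helper name is illustrative, not from the codebase):

```typescript
// Produce ?tool_ids=a&tool_ids=b rather than ?tool_ids=a,b.
function buildToolIdsQuery(toolIds: string[]): string {
  const params = new URLSearchParams()
  for (const id of toolIds) {
    // append() adds a repeated key for each element.
    params.append("tool_ids", id)
  }
  return params.toString()
}
```

Typed clients generated from the OpenAPI schema should handle this automatically when passed a `string[]`.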

app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte (1)

32-32: provider_name_from_id import is appropriate

Importing provider_name_from_id from $lib/stores matches how you resolve provider display names elsewhere; no issues here.

app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte (4)

31-32: Using shared arrays_equal and typed ToolsSelectorSettings looks good

Importing arrays_equal from $lib/utils/collections and ToolsSelectorSettings for the new settings prop is a solid move toward centralizing utilities and tightening types.


303-309: get_tools returns a defensive copy and clear_tools resets correctly

  • get_tools() returning [...tools] avoids accidental external mutation of internal component state.
  • clear_tools() centralizes tools clearing logic and works with the exported tools prop.

This is a good, minimal public surface for interacting with tools state.
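The value of the spread copy is easy to demonstrate with a stand-in for the component's state (a hypothetical class, not the actual Svelte component):

```typescript
// Hypothetical stand-in for the component's tools state, showing why
// get_tools() returns a spread copy rather than the internal array.
class ToolsState {
  private tools: string[] = []

  set_tools(next: string[]): void {
    this.tools = [...next]
  }

  // Defensive copy: callers can mutate the result without affecting state.
  get_tools(): string[] {
    return [...this.tools]
  }

  clear_tools(): void {
    this.tools = []
  }
}
```

Pushing onto the array returned by `get_tools()` leaves the internal state untouched; only `set_tools`/`clear_tools` mutate it.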


313-321: hide_model_selector conditional around AvailableModelsDropdown is appropriate

Wrapping <AvailableModelsDropdown> in {#if !hide_model_selector} cleanly supports flows where the model is fixed or implied (e.g., finetunes) without changing the underlying model state logic. Just ensure callers that rely on get_selected_model() are aware it can now return null when the selector is hidden.


330-336: Passing tools_selector_settings into ToolsSelector in both locations aligns with the new API

Using:

<ToolsSelector
  bind:tools
  {project_id}
  task_id={current_task?.id ?? null}
  settings={tools_selector_settings}
/>

in both the main and “Advanced Options” branches keeps ToolsSelector configuration consistent regardless of layout, while still only binding the tools array.

Based on learnings, this preserves the intended contract where only the multi-select tools array persists to the store and any single-select mode stays ephemeral within ToolsSelector.

Also applies to: 350-356

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte (1)

152-161: Centralize the finetune_run_config:: prefix instead of hard‑coding

The filter correctly excludes finetune configs from the generic “other task run configs” list, but the "finetune_run_config::" prefix is now hard‑coded here and again below in the finetune group builder. To avoid drift if the prefix ever changes, consider introducing a shared constant (e.g. FINETUNE_RUN_CONFIG_PREFIX) in a common module and using it in both filters.
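A sketch of such a shared constant (module location and naming are assumptions; adapt to the codebase's conventions):

```typescript
// Shared in one module so both filter sites stay in sync with the
// backend's synthesized "finetune_run_config::{finetune.id}" IDs.
const FINETUNE_RUN_CONFIG_PREFIX = "finetune_run_config::"

function is_finetune_run_config_id(id: string | null | undefined): boolean {
  return id != null && id.startsWith(FINETUNE_RUN_CONFIG_PREFIX)
}
```

Both call sites then reduce to `configs.filter((c) => !is_finetune_run_config_id(c.id))` and `configs.filter((c) => is_finetune_run_config_id(c.id))`, with the literal living in exactly one place.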

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f1fd4a and c76dc65.

📒 Files selected for processing (1)
  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte (3 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
app/web_ui/**/*.svelte

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

app/web_ui/**/*.svelte: Use Svelte v4 (not v5) for the web frontend UI
Use DaisyUI for UI components in the web frontend

Use Svelte v4 (not v5) for the web UI framework

Files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
app/web_ui/**/*.{svelte,ts}

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Tailwind CSS for styling in the web frontend

Files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
app/web_ui/**/*.{svelte,css,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Tailwind and DaisyUI for styling the frontend

Files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
app/web_ui/**/*.{svelte,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Make FastAPI backend calls from the Svelte web app to communicate with backend servers

Files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
🧠 Learnings (2)
📚 Learning: 2025-10-07T00:21:33.537Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 698
File: app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte:800-812
Timestamp: 2025-10-07T00:21:33.537Z
Learning: In the Kiln web UI codebase, the `save_new_task_run_config` function in `app/web_ui/src/lib/stores/run_configs_store.ts` automatically refreshes the run configs store by calling `load_task_run_configs(project_id, task_id, true)` with force_refresh=true after successfully creating a new run config. This means callers of save_new_task_run_config do not need to manually refresh the store afterward, as the store will already contain the newly created config.

Applied to files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
📚 Learning: 2025-09-25T07:20:11.459Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 650
File: app/web_ui/src/routes/(app)/docs/library/[project_id]/[document_id]/+page.svelte:19-19
Timestamp: 2025-09-25T07:20:11.459Z
Learning: When analyzing relative import paths in SvelteKit route structures, be more careful to verify the actual directory structure exists before suggesting corrections. The import path ../../../../generate/[project_id]/[task_id]/table_button.svelte from app/web_ui/src/routes/(app)/docs/library/[project_id]/[document_id]/+page.svelte correctly resolves to the existing file at app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/table_button.svelte.

Applied to files:

  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build Desktop Apps (macos-latest)
🔇 Additional comments (1)
app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte (1)

32-32: Provider label import looks good

Using provider_name_from_id here to enrich finetune labels with provider names is a clean reuse of existing store logic; no issues from this change alone.

Comment on lines 180 to 197
// Add finetune run configs only if selected_finetune_id is set
const finetune_run_configs = (
run_configs_by_task_composite_id[
get_task_composite_id(project_id, current_task.id ?? "")
] ?? []
).filter((config) => config.id?.startsWith("finetune_run_config::"))
if (finetune_run_configs.length > 0) {
options.push({
label: "Fine-Tune Configuration",
options: finetune_run_configs.map((config) => ({
value: config.id ?? "",
label: `${config.name} (${provider_name_from_id(config.run_config_properties.model_provider_name)})`,
description: `Base Model: ${config.run_config_properties.model_name},
Prompt: ${getRunConfigPromptDisplayName(config, current_task_prompts)}`,
})),
})
}

⚠️ Potential issue | 🟡 Minor

Fix misleading comment and tighten finetune section semantics

  • The comment // Add finetune run configs only if selected_finetune_id is set is stale: there is no selected_finetune_id, and the block actually runs whenever any finetune configs exist. Suggest updating to something like // Add finetune run configs if any exist to match behavior.
  • (Optional) The group label "Fine-Tune Configuration" is singular but may contain multiple configs; consider "Fine-Tune Configurations" for clarity.
🤖 Prompt for AI Agents
In app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
around lines 180 to 197, the inline comment is stale and misleading — it says
"// Add finetune run configs only if selected_finetune_id is set" but there is
no selected_finetune_id and the block actually adds finetune configs whenever
any exist; update the comment to accurately reflect behavior (e.g., "// Add
finetune run configs if any exist"). Also consider changing the group label from
the singular "Fine-Tune Configuration" to the plural "Fine-Tune Configurations"
to match that the options array may contain multiple entries; update the label
string accordingly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (1)

619-659: Bug: R1‑disabled download options never populate data_strategy_select_options

In update_data_strategies_supported, the special‑case branch for R1‑disabled downloads:

if (r1_disabled_for_downloads.includes(model_provider)) {
  return ["final_only", "two_message_cot"]
}

returns an array that is never used by the caller. As a result, for those download options (download_huggingface_chat_template_toolcall, download_jsonl_toolcall, download_vertex_gemini), data_strategy_select_options and data_strategy never get updated and will stay at their previous values (or empty on first selection).

You can fix this by assigning to compatible_data_strategies instead of returning early, e.g.:

-  const r1_disabled_for_downloads = [
+  const r1_disabled_for_downloads = [
     "download_huggingface_chat_template_toolcall",
     "download_jsonl_toolcall",
     "download_vertex_gemini",
   ]
-  if (r1_disabled_for_downloads.includes(model_provider)) {
-    return ["final_only", "two_message_cot"]
-  }
-
-  const compatible_data_strategies: ChatStrategy[] = is_download
-    ? [
-        "final_only",
-        "two_message_cot",
-        "final_and_intermediate_r1_compatible",
-      ]
-    : available_models
+  let compatible_data_strategies: ChatStrategy[]
+
+  if (is_download && r1_disabled_for_downloads.includes(model_provider)) {
+    compatible_data_strategies = ["final_only", "two_message_cot"]
+  } else if (is_download) {
+    compatible_data_strategies = [
+      "final_only",
+      "two_message_cot",
+      "final_and_intermediate_r1_compatible",
+    ]
+  } else {
+    compatible_data_strategies =
+      available_models
         ?.map((model) => model.models)
         .flat()
         .find((model) => model.id === base_model_id)
-        ?.data_strategies_supported ?? []
+        ?.data_strategies_supported ?? []
+  }

Optionally, you might also only reset data_strategy to the first compatible option if the current data_strategy is not in compatible_data_strategies, to better respect restored state:

-  data_strategy = compatible_data_strategies[0]
+  if (!compatible_data_strategies.includes(data_strategy)) {
+    data_strategy = compatible_data_strategies[0]
+  }
♻️ Duplicate comments (1)
app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (1)

74-85: Dataset selection can remain stale when tools are auto‑cleared

When a user switches to a model without tool‑calling support, the reactive block at Lines 74‑81 clears tools via run_config_component.clear_tools(), but any previously selected dataset (Lines 836‑845) is left unchanged, including when it was chosen under a specific required_tool_ids filter.

This can leave the form in a state where:

  • No tools are selected, but
  • selected_dataset is still a tools‑filtered dataset that may no longer match the current configuration.

Consider also clearing or re‑validating selected_dataset when tools are cleared, for example:

  $: if (
    selected_model &&
    !selected_model.model.supports_function_calling &&
    run_config_component
  ) {
    // Clear tools if the model doesn't support function calling
    run_config_component.clear_tools()
+
+    // Clear dataset if it was selected while tools were required
+    if (selected_dataset && selected_tool_ids.length === 0) {
+      selected_dataset = null
+    }
  }

This keeps dataset and tool configuration in sync with the selected model.

Also applies to: 836-845

🧹 Nitpick comments (2)
app/desktop/studio_server/test_eval_api.py (1)

1889-1975: Consider adding test coverage for pending status.

The test covers completed, running, unknown, and failed statuses, but FineTuneStatusType.pending is not tested. Based on the implementation in eval_api.py, finetunes with pending status would be included in the results (since only unknown and failed are excluded). Adding a test case for pending would ensure this behavior is verified.

         Finetune(
+            id="ft_pending",
+            name="Pending Finetune",
+            provider="openai",
+            base_model_id="model6",
+            dataset_split_id="split6",
+            system_message="System message",
+            latest_status=FineTuneStatusType.pending,
+            run_config=run_config_props,
+            parent=mock_task,
+        ),
+        Finetune(
             id="ft_no_run_config",

And add the assertion:

     assert "finetune_run_config::ft_running" in config_ids
+    assert "finetune_run_config::ft_pending" in config_ids
     assert "finetune_run_config::ft_unknown" not in config_ids
app/desktop/studio_server/eval_api.py (1)

326-350: LGTM overall. Minor suggestions for consistency.

The endpoint correctly filters out finetunes with unknown or failed status and those without a run_config. A couple of observations:

  1. The endpoint path has a trailing slash (/run_configs/) while similar endpoints like /task_run_configs (line 319) don't. Consider removing the trailing slash for consistency unless this is intentional.

  2. The condition could be slightly more readable using a set membership check:

-    @app.get("/api/projects/{project_id}/tasks/{task_id}/run_configs/")
+    @app.get("/api/projects/{project_id}/tasks/{task_id}/run_configs")
     async def get_run_configs(project_id: str, task_id: str) -> list[TaskRunConfig]:
         # Returns all run configs of a given task.
         task = task_from_id(project_id, task_id)
         configs = task.run_configs()

         # Get run configs from finetunes
         finetunes = task.finetunes()
         for finetune in finetunes:
             # Only include if the finetune is not unknown or failed
             if (
-                finetune.latest_status != FineTuneStatusType.unknown
-                and finetune.latest_status != FineTuneStatusType.failed
+                finetune.latest_status not in {FineTuneStatusType.unknown, FineTuneStatusType.failed}
                 and finetune.run_config is not None
             ):
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c76dc65 and 178c710.

📒 Files selected for processing (3)
  • app/desktop/studio_server/eval_api.py (2 hunks)
  • app/desktop/studio_server/test_eval_api.py (3 hunks)
  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (12 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation

Files:

  • app/desktop/studio_server/eval_api.py
  • app/desktop/studio_server/test_eval_api.py
app/desktop/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use python 3.13 for the desktop app

Files:

  • app/desktop/studio_server/eval_api.py
  • app/desktop/studio_server/test_eval_api.py
app/desktop/studio_server/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI for the desktop app server that extends libs/server with web app-specific APIs

Files:

  • app/desktop/studio_server/eval_api.py
  • app/desktop/studio_server/test_eval_api.py
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use pytest for testing Python code

Use pytest for testing in python projects

Files:

  • app/desktop/studio_server/test_eval_api.py
app/web_ui/**/*.svelte

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

app/web_ui/**/*.svelte: Use Svelte v4 (not v5) for the web frontend UI
Use DaisyUI for UI components in the web frontend

Use Svelte v4 (not v5) for the web UI framework

Files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
app/web_ui/**/*.{svelte,ts}

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Tailwind CSS for styling in the web frontend

Files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
app/web_ui/**/*.{svelte,css,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Tailwind and DaisyUI for styling the frontend

Files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
app/web_ui/**/*.{svelte,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Make FastAPI backend calls from the Svelte web app to communicate with backend servers

Files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
🧠 Learnings (3)
📚 Learning: 2025-10-22T04:31:20.500Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 739
File: app/web_ui/src/routes/(app)/generate/[project_id]/[task_id]/synth/+page.svelte:170-177
Timestamp: 2025-10-22T04:31:20.500Z
Learning: In Svelte components for the Kiln project, leonardmq prefers tracking previous query parameter values and comparing them in reactive blocks (rather than just checking for existence) to avoid duplicate execution on initial load while still reacting to subsequent navigation changes. This pattern supports both current behavior and future feature additions like mode switching.

Applied to files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
📚 Learning: 2025-11-11T00:07:51.535Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 756
File: app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte:19-20
Timestamp: 2025-11-11T00:07:51.535Z
Learning: In app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte, the single_select_selected_tool prop should NOT be synchronized with the tools_store. Only the multi-select tools array should persist to the store. The single_select mode is intended to use ephemeral component-local state.

Applied to files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
📚 Learning: 2025-10-07T00:21:33.537Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 698
File: app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte:800-812
Timestamp: 2025-10-07T00:21:33.537Z
Learning: In the Kiln web UI codebase, the `save_new_task_run_config` function in `app/web_ui/src/lib/stores/run_configs_store.ts` automatically refreshes the run configs store by calling `load_task_run_configs(project_id, task_id, true)` with force_refresh=true after successfully creating a new run config. This means callers of save_new_task_run_config do not need to manually refresh the store afterward, as the store will already contain the newly created config.

Applied to files:

  • app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte
🧬 Code graph analysis (2)
app/desktop/studio_server/eval_api.py (4)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • FineTuneStatusType (52-61)
libs/server/kiln_server/task_api.py (1)
  • task_from_id (16-25)
libs/core/kiln_ai/datamodel/eval.py (1)
  • configs (357-358)
libs/core/kiln_ai/datamodel/task.py (1)
  • run_configs (172-173)
app/desktop/studio_server/test_eval_api.py (1)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • FineTuneStatusType (52-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
🔇 Additional comments (3)
app/desktop/studio_server/eval_api.py (1)

11-11: Import addition looks correct.

The FineTuneStatusType import is appropriately added and used for the finetune status filtering logic.

app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte (2)

92-113: SavedFinetuneState + state_initialized gating now correctly prevent repopulating cleared drafts

The IndexedDB‑backed SavedFinetuneState plus the state_initialized guard (Lines 92‑113, 151‑201, 229‑244) correctly avoid re‑writing cleared state:

  • The reactive writer only runs when state_initialized is true.
  • clear_saved_state() flips state_initialized to false before updating saved_state, so the $: block no longer repopulates it.
  • After mount, you restore from $saved_state (if any) and only then set state_initialized = true, so initialization and persistence are nicely separated.
  • Calling clear_saved_state() after a successful finetune (Lines 514‑516) clears the persisted draft without needing to reload.

This addresses the earlier issue where clearing was immediately undone by the reactive writer while preserving a straightforward path to re‑enable autosave on a fresh page load.

Also applies to: 151-201, 229-244, 514-516
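The ordering invariant described above (restore, then enable the writer; disable the writer before clearing) is framework-agnostic. A hypothetical Python analogue, with invented names standing in for the Svelte component's state:

```python
class DraftStore:
    """Analogue of the IndexedDB-backed SavedFinetuneState with a state_initialized guard."""

    def __init__(self, saved=None):
        self.saved = saved
        self.initialized = False

    def restore(self):
        # Restore first, then enable the reactive writer, so restoration
        # never races with persistence.
        draft = self.saved
        self.initialized = True
        return draft

    def on_state_change(self, state):
        # The reactive writer only persists once initialization is done.
        if self.initialized:
            self.saved = state

    def clear(self):
        # Flip the guard *before* wiping, so a reactive pass triggered by
        # the wipe cannot repopulate the draft.
        self.initialized = False
        self.saved = None


store = DraftStore(saved={"model": "gpt-4o-mini"})
draft = store.restore()
store.on_state_change({"model": "llama-3.1-8b"})  # persisted: guard is on
store.clear()
store.on_state_change({"model": "llama-3.1-8b"})  # ignored: guard is off
print(draft, store.saved)  # {'model': 'gpt-4o-mini'} None
```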


52-57: RunConfigComponent integration and tool plumbing look consistent with existing patterns

The new RunConfigComponent wiring and tool plumbing are coherent:

  • bind:tools={selected_tool_ids} keeps the multi‑select tools array as the single source of truth on this page (Lines 83‑87, 803‑820), aligning with the prior guidance that only the multi‑select tools array should persist, not any single‑select state.
  • disabled_tools_selector plus the reactive clear_tools() call correctly enforce “no tools” when the selected model doesn’t support function calling (Lines 74‑85, 803‑820).
  • run_config_properties is assembled from run_config_component.run_options_as_run_config_properties() and enriched with model_name (derived from base_model_id) and model_provider_name before being sent to the finetune API (Lines 447‑457, 487).
  • posthog metrics (supports_tools, tool_count) and the post‑create load_task_run_configs(...) refresh (Lines 501‑513) keep analytics and the run‑config store aligned with the new finetune.

This setup cleanly connects the shared RunConfigComponent to finetune creation and tool‑aware analytics.

Also applies to: 83-87, 447-457, 501-503, 503-513, 803-820

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
libs/core/kiln_ai/adapters/fine_tune/base_finetune.py (1)

8-9: Run‑config propagation into finetunes is wired correctly

  • Adding run_config: RunConfigProperties | None to create_and_start and threading it into FinetuneModel(run_config=run_config) cleanly persists the originating run configuration with the finetune.
  • Setting run_config.model_provider_name = ModelProviderName.kiln_fine_tune and run_config.model_name = datamodel.model_id() after _start ensures downstream runs will correctly target the new fine‑tuned model rather than the base model, and the provider enum aligns with the existing ModelProviderName.kiln_fine_tune value.
  • Note that when run_config is provided, this now implicitly requires the finetune to have a valid Project/Task parent chain (otherwise model_id() will raise), which matches the new Finetune.model_id() contract and seems appropriate for persisted fine‑tunes.

Based on learnings, this usage of ModelProviderName on run‑config properties matches how provider enums are used elsewhere in the codebase.

Also applies to: 67-68, 96-110, 115-119
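The retargeting described above can be sketched end to end. This is a simplified stand-in, with dataclasses in place of the real Pydantic models and string literals in place of the enums:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunConfigProperties:
    model_name: str
    model_provider_name: str


@dataclass
class Finetune:
    id: str
    project_id: str
    task_id: str
    run_config: Optional[RunConfigProperties] = None

    def model_id(self) -> str:
        # Composite ID, mirroring project_id::task_id::finetune_id
        return f"{self.project_id}::{self.task_id}::{self.id}"


def create_and_start(run_config: Optional[RunConfigProperties]) -> Finetune:
    datamodel = Finetune(id="ft1", project_id="p1", task_id="t1", run_config=run_config)
    # ... the provider-specific _start(datamodel) call would happen here ...
    if run_config is not None:
        # Retarget the saved config at the newly created fine-tuned model,
        # not the base model it was trained from.
        run_config.model_provider_name = "kiln_fine_tune"
        run_config.model_name = datamodel.model_id()
    return datamodel


ft = create_and_start(RunConfigProperties("gpt-4o-mini", "openai"))
print(ft.run_config.model_name)  # p1::t1::ft1
```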

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe1569f and cf7faae.

📒 Files selected for processing (6)
  • app/desktop/studio_server/provider_api.py (1 hunks)
  • app/desktop/studio_server/test_provider_api.py (1 hunks)
  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py (3 hunks)
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py (4 hunks)
  • libs/core/kiln_ai/datamodel/finetune.py (2 hunks)
  • libs/core/kiln_ai/datamodel/test_models.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • libs/core/kiln_ai/datamodel/finetune.py
🧰 Additional context used
📓 Path-based instructions (6)
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use pytest for testing Python code

Use pytest for testing in python projects

Files:

  • app/desktop/studio_server/test_provider_api.py
  • libs/core/kiln_ai/datamodel/test_models.py
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation

Files:

  • app/desktop/studio_server/test_provider_api.py
  • app/desktop/studio_server/provider_api.py
  • libs/core/kiln_ai/datamodel/test_models.py
  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
app/desktop/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use python 3.13 for the desktop app

Files:

  • app/desktop/studio_server/test_provider_api.py
  • app/desktop/studio_server/provider_api.py
app/desktop/studio_server/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI for the desktop app server that extends libs/server with web app-specific APIs

Files:

  • app/desktop/studio_server/test_provider_api.py
  • app/desktop/studio_server/provider_api.py
libs/core/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Python 3.10+ for the core library (libs/core) and Python 3.13 for the desktop app

Use python 3.10+ for the core library (libs/core) and python 3.13 for the desktop app

Files:

  • libs/core/kiln_ai/datamodel/test_models.py
  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
libs/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use Pydantic v2 (not v1) for data validation and serialization

Files:

  • libs/core/kiln_ai/datamodel/test_models.py
  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: chiang-daniel
Repo: Kiln-AI/Kiln PR: 781
File: app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte:74-86
Timestamp: 2025-11-26T05:40:24.416Z
Learning: In the fine-tune creation flow (app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte), it is acceptable for a dataset containing tool data to be used when training a model that doesn't support function calling. The training will proceed without using the tool call data from the dataset. This allows for dataset reuse across different model types.
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.

Applied to files:

  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
📚 Learning: 2025-08-22T11:15:53.584Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 413
File: libs/core/kiln_ai/adapters/embedding/embedding_registry.py:14-20
Timestamp: 2025-08-22T11:15:53.584Z
Learning: In the Kiln AI codebase, model_provider_name fields in datamodels like EmbeddingConfig are typed as ModelProviderName enum (which inherits from str, Enum), not as plain strings. Therefore, accessing .value on these fields is valid and returns the string representation of the enum value.

Applied to files:

  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
🧬 Code graph analysis (4)
app/desktop/studio_server/test_provider_api.py (3)
libs/core/kiln_ai/datamodel/finetune.py (1)
  • model_id (99-109)
app/web_ui/src/lib/types.ts (1)
  • ModelProviderName (65-65)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • ModelProviderName (84-106)
app/desktop/studio_server/provider_api.py (1)
libs/core/kiln_ai/datamodel/finetune.py (1)
  • model_id (99-109)
libs/core/kiln_ai/datamodel/test_models.py (2)
libs/core/kiln_ai/tools/rag_tools.py (1)
  • project (110-114)
libs/core/kiln_ai/datamodel/finetune.py (2)
  • Finetune (23-127)
  • model_id (99-109)
libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py (1)
libs/core/kiln_ai/datamodel/finetune.py (2)
  • Finetune (23-127)
  • model_id (99-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Generate Coverage Report
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (5)
app/desktop/studio_server/provider_api.py (1)

1452-1452: LGTM! Good refactoring to centralize ID construction.

Replacing inline string formatting with the centralized model_id() method improves maintainability and ensures consistent ID generation across the codebase. The method also validates parent relationships and fails fast with a clear error if they're not properly set up, which is better than potentially creating invalid IDs.

app/desktop/studio_server/test_provider_api.py (1)

1267-1282: Mocking model_id() to match new composite ID format looks correct

Stubbing mock_fine_tune1.model_id() / mock_fine_tune2.model_id() with projX::taskY::ftZ aligns the test with the new Finetune.model_id() behavior while avoiding real parent relationships on the mocks. Assertions on result.models[*].id remain deterministic and consistent with the datamodel.

libs/core/kiln_ai/datamodel/test_models.py (1)

334-378: Finetune.model_id() coverage is solid and matches datamodel contract

This test exercises the composite ID generation and both error branches (missing task, missing project) with the exact messages used in Finetune.model_id(). It gives good confidence that downstream callers relying on the project_id::task_id::finetune_id format will behave correctly.

libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py (2)

11-19: Project/Task fixture setup correctly supports finetune model_id() and run‑config tests

Importing Project/Task and updating sample_task to create a saved Project with a Task parent ensures dataset.parent_task() has a path and task.parent_project() is non‑null. That matches the requirements of Finetune.model_id() and makes the new run‑config tests realistic relative to how finetunes are created in the app.

Also applies to: 48-52, 57-60


189-190: Run‑config behavior for create_and_start is well covered

  • The extra assertion in test_create_and_start_success that datamodel.run_config is None confirms the default path isn’t leaking any config.
  • test_create_and_start_with_run_config thoroughly checks that:
    • a provided RunConfigProperties is attached to the finetune,
    • its model_provider_name is rewritten to kiln_fine_tune,
    • its model_name is set to the composite finetune ID (datamodel.model_id()),
    • and all other fields (e.g., prompt_id, temperature, top_p) survive the round‑trip through persistence.

This aligns the tests with the new adapter behavior and with the enum‑typed model_provider_name/structured_output_mode design.

Also applies to: 222-260

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
libs/core/kiln_ai/adapters/fine_tune/base_finetune.py (1)

58-72: run_config propagation is sound; consider minor DRY and signature cleanup

The run_config threading into FinetuneModel and the post‑_start mutation (provider, model_name, prompt_id, structured_output_mode) look logically correct and match the finetune model_id() behavior from libs/core/kiln_ai/datamodel/finetune.py. The tests exercise this well.

Two optional improvements:

  1. Avoid mutable default for parameters

You can eliminate the shared {} default and keep the public API the same:

-        parameters: dict[str, str | int | float | bool] = {},
+        parameters: dict[str, str | int | float | bool] | None = None,
@@
-        cls.validate_parameters(parameters)
+        if parameters is None:
+            parameters = {}
+        cls.validate_parameters(parameters)
@@
-            parameters=parameters,
+            parameters=parameters,
  2. Reuse model_id() when building prompt_id

To avoid re‑implementing the project/task/id logic that already exists in FinetuneModel.model_id(), you can derive both model_name and prompt_id from a single call:

-            run_config.model_provider_name = ModelProviderName.kiln_fine_tune
-            run_config.model_name = datamodel.model_id()
-            # Build the fine-tune prompt ID
-            task = datamodel.parent_task()
-            if task is None:
-                raise ValueError("Finetune must have a parent task")
-            project = task.parent_project()
-            if project is None:
-                raise ValueError("Task must have a parent project")
-            run_config.prompt_id = (
-                f"fine_tune_prompt::{project.id}::{task.id}::{datamodel.id}"
-            )
+            run_config.model_provider_name = ModelProviderName.kiln_fine_tune
+            model_id = datamodel.model_id()
+            run_config.model_name = model_id
+            run_config.prompt_id = f"fine_tune_prompt::{model_id}"

This keeps the ID construction logic centralized in the datamodel. Based on relevant code in libs/core/kiln_ai/datamodel/finetune.py.

Also applies to: 100-114, 119-138
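The first suggestion exists because a `{}` default is evaluated once at function definition and shared across calls. A quick standalone demonstration (hypothetical helper, not project code):

```python
def add_param_shared(key, value, parameters={}):  # anti-pattern: one shared dict
    parameters[key] = value
    return parameters


def add_param_fresh(key, value, parameters=None):
    if parameters is None:
        parameters = {}  # a new dict on every call
    parameters[key] = value
    return parameters


first = add_param_shared("epochs", 3)
second = add_param_shared("learning_rate", 0.1)
print(first is second, sorted(second))  # True ['epochs', 'learning_rate']

fresh = add_param_fresh("epochs", 3)
print(sorted(fresh))  # ['epochs']
```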

libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py (1)

243-290: run_config mutation and persistence test is thorough

test_create_and_start_with_run_config does a solid job:

  • Verifies that model_provider_name is rewritten to kiln_fine_tune.
  • Asserts model_name matches datamodel.model_id().
  • Checks prompt_id format using project/task/datamodel IDs.
  • Confirms temperature/top_p passthrough and JSON‑instructions defaulting.
  • Confirms run_config is persisted via FinetuneModel.load_from_file.

One optional enhancement would be an additional negative test where a Task lacks a parent Project but a run_config is provided, asserting the expected ValueError, to lock in that failure mode as intentional.
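Sketched outside the pytest harness, that negative case might look like the following; the dict-based stand-ins mimic the parent-chain lookups, while a real test would use the datamodel and pytest.raises:

```python
def finetune_model_id(finetune: dict) -> str:
    """Mimics Finetune.model_id(): requires a full Project/Task parent chain."""
    task = finetune.get("parent_task")
    if task is None:
        raise ValueError("Finetune must have a parent task")
    project_id = task.get("parent_project_id")
    if project_id is None:
        raise ValueError("Task must have a parent project")
    return f"{project_id}::{task['id']}::{finetune['id']}"


# Negative case: the task was saved without a parent project
orphan = {"id": "ft1", "parent_task": {"id": "t1", "parent_project_id": None}}
try:
    finetune_model_id(orphan)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)  # Task must have a parent project
```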

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf7faae and dcb8ff2.

📒 Files selected for processing (2)
  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py (3 hunks)
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
libs/core/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Python 3.10+ for the core library (libs/core) and Python 3.13 for the desktop app

Use python 3.10+ for the core library (libs/core) and python 3.13 for the desktop app

Files:

  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation

Files:

  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
libs/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use Pydantic v2 (not v1) for data validation and serialization

Files:

  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use pytest for testing Python code

Use pytest for testing in python projects

Files:

  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: chiang-daniel
Repo: Kiln-AI/Kiln PR: 781
File: app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte:74-86
Timestamp: 2025-11-26T05:40:24.416Z
Learning: In the fine-tune creation flow (app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte), it is acceptable for a dataset containing tool data to be used when training a model that doesn't support function calling. The training will proceed without using the tool call data from the dataset. This allows for dataset reuse across different model types.
📚 Learning: 2025-08-08T16:13:26.526Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1875-1890
Timestamp: 2025-08-08T16:13:26.526Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py (Python), do not blanket-add r1_thinking/optional_r1_thinking parsers for R1-style models. Parser usage is provider-specific and must be based on observed responses in tests. For PR Kiln-AI/Kiln#418, deepseek_r1_0528_distill_qwen3_8b providers were validated without needing a parser.

Applied to files:

  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
📚 Learning: 2025-08-08T16:14:54.346Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1979-1983
Timestamp: 2025-08-08T16:14:54.346Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py, for ModelName.deepseek_r1_distill_qwen_32b on provider siliconflow_cn (model_id "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via testing that SiliconFlow returns final outputs that parse without an R1 parser, and reasoning may be omitted under structured output; rely on reasoning_optional_for_structured_output=True.

Applied to files:

  • libs/core/kiln_ai/adapters/fine_tune/base_finetune.py
  • libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py
🧬 Code graph analysis (1)
libs/core/kiln_ai/adapters/fine_tune/base_finetune.py (4)
libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py (2)
  • _start (24-25)
  • _start (50-51)
app/web_ui/src/lib/stores.ts (1)
  • model_name (466-479)
libs/core/kiln_ai/datamodel/finetune.py (2)
  • model_id (99-109)
  • parent_task (94-97)
libs/core/kiln_ai/tools/rag_tools.py (1)
  • project (110-114)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Generate Coverage Report
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
🔇 Additional comments (6)
libs/core/kiln_ai/adapters/fine_tune/base_finetune.py (1)

8-13: Enums and RunConfigProperties imports look correct

Imports align with new run_config behavior and structured output handling; no issues here.

libs/core/kiln_ai/adapters/fine_tune/test_base_finetune.py (5)

11-18: New datamodel and enum imports are appropriate

Bringing in Project, ModelProviderName, StructuredOutputMode, and RunConfigProperties matches the new run_config behavior under test; no issues.


47-66: MockFinetuneWithStructuredOutput correctly simulates adapter behavior

This mock neatly exercises the case where _start sets structured_output_mode (like Fireworks), allowing you to verify that run_config picks up the adapter‑defined mode.


70-79: Project-backed sample_task fixture aligns with finetune model_id/prompt_id requirements

Creating and persisting a Project and using it as the Task parent ensures that parent_project() is non‑null, which is required by FinetuneModel.model_id() and your new run_config prompt_id logic. Good fixture hardening.


210-210: Default run_config assertion is useful

Asserting datamodel.run_config is None in the minimal create‑and‑start path clearly documents the default and guards against regressions where a run_config might be auto‑created.


292-320: Structured output mode interaction test covers adapter-driven behavior

test_create_and_start_with_datamodel_structured_output_mode nicely validates that:

  • The adapter can set datamodel.structured_output_mode in _start.
  • The final run_config.structured_output_mode mirrors that value (overriding the initial default).

This confirms the intended precedence between adapter behavior and incoming run_config.

scosman and others added 8 commits November 27, 2025 22:13
…ncy fixes to make it easier to work with.

- UI code was overly aware of FineTune run configs. They are just run configs; treat them as such. The exception is one UI-centric behaviour: when the user selects a FineTune, select that FT's run config in the UI. Made this work with a model_specific_run_config concept in the API.
- This UI code had weird circular reactivity (see example below). It was previously handled with fragile `await tick()` calls and guard vars, which meant implicit dependencies on call order and on whether certain awaits actually suspended. Fixed by making all reactivity one big method, with deterministic order and debouncing.
- Remove guard vars for deterministic run order
- Fix a bug where legacy FT models with data_mode=unknown couldn’t use the rest of their run config because data mode never matched
- Only show a fine-tune run config if the user has selected that fine-tune. No need to show them all, all the time; they have no meaning or value outside of the fine-tune.
- Show a warning if using a fine-tune, but not its suggested run_config

Example of circular reactivity:
Changing the run config will update the model, but selecting a model will set the run config (jumping back to "custom" or setting the fine-tune run config). Both are legitimately desired behaviours: respect the user's last selection and keep the rest consistent. But that makes updating the state tricky across many async reactive statements.
… is specified (fixes a bug where it previously reloaded every time)
Clean fine-tuning UI for fine-tune-run-configs.
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/web_ui/src/lib/api_schema.d.ts (1)

4973-5007: RunConfigProperties prompt_id has a mismatch between its documentation and its declared type that requires a backend schema correction

The auto-generated TypeScript schema correctly reflects the backend Pydantic model, but the backend model itself is inconsistent: prompt_id is defined as a required PromptId field in libs/core/kiln_ai/datamodel/run_config.py (line 35–37), yet its description states "Defaults to building a simple prompt from the task if not provided", implying optional behavior.

This mismatch is evident in usage patterns: app/desktop/studio_server/data_gen_api.py (lines 144–146, 174–176) explicitly overrides prompt_id with a comment noting "in case we change the default in the UI in the future," suggesting the backend team recognizes prompt_id should be optional.

To fix: update the backend Pydantic model to make prompt_id optional (prompt_id: PromptId | None = None), update its description to clarify the default behavior, regenerate the OpenAPI schema, and the frontend types will sync automatically via openapi-typescript.

🧹 Nitpick comments (5)
app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte (5)

85-103: Structured output mode auto-selection is solid; consider dropping the unused return

Using available_model_details to derive structured_output_mode when requires_structured_output is true is a good centralization of logic. The return true on Line 100 is currently unused and could be removed to avoid implying the function’s result is meaningful.


105-132: Run-config → current run options mapping guarded by prior_selected_run_config_id looks correct

The prior_selected_run_config_id check prevents re-applying the same config on every reactive pass, while still ensuring that the latest selected_run_config_id wins even under rapid user changes. Mapping all relevant fields (model, prompt, tools, temperature, top_p, structured_output_mode) from the selected run config is complete and matches run_options_as_run_config_properties.

One thing to keep in mind: if current_task changes without resetting selected_run_config_id, you might want to explicitly reset prior_selected_run_config_id as well so the new task’s selected config is always applied once.


134-198: Debounced reactive pipeline is well-structured; consider catching errors inside debounce_update_for_state_changes

The debouncing via running/run_again and tick() is a reasonable way to serialize the circular dependencies between model, run config, and tools while avoiding thrash. The stepwise update_for_state_changes (model change → structured output defaults → apply selected run config → reset to custom if diverged) is easy to reason about.

Right now, any exception thrown from update_for_state_changes will propagate out of debounce_update_for_state_changes (the try/finally doesn’t catch), which can surface as an unhandled promise since the reactive call doesn’t await it. Wrapping the body in a try/catch/finally that logs or otherwise handles errors would make this more robust against transient failures in load_task_run_configs or similar calls.
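A sketch of that hardening, assuming the running/run_again flag pattern described above (the flag and function names are quoted from the review; the update body is a placeholder):

```typescript
let running = false
let run_again = false

async function update_for_state_changes(): Promise<void> {
  // Placeholder for the real model / run-config / tools reconciliation,
  // which may throw (e.g. a failed load_task_run_configs call).
}

async function debounce_update_for_state_changes(): Promise<void> {
  if (running) {
    // A pass is in flight; remember to run once more when it finishes.
    run_again = true
    return
  }
  running = true
  try {
    // Catch here so a transient failure doesn't surface as an unhandled
    // promise rejection from a reactive statement that never awaits this call.
    await update_for_state_changes()
  } catch (e) {
    console.error("run config state update failed", e)
  } finally {
    running = false
    if (run_again) {
      run_again = false
      void debounce_update_for_state_changes()
    }
  }
}
```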


233-260: Run-config “custom” reset logic is more robust now

Breaking out model_changed, provider_changed, and prompt_changed, adding the "unknown" special-case for structured_output_mode, and switching to arrays_equal for comparing tools arrays makes the “has the user diverged from this saved config?” check much clearer and less fragile. This should avoid spurious resets when legacy configs use "unknown" and gives a cleaner tools comparison.
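For illustration, an order-insensitive array comparison along these lines would suffice for the tools check (this is an assumption about the semantics; the project's actual arrays_equal helper may be order-sensitive):

```typescript
// Compare two string arrays as sets of tool IDs, ignoring order.
function arrays_equal(a: string[], b: string[]): boolean {
  if (a.length !== b.length) return false
  const sorted_a = [...a].sort()
  const sorted_b = [...b].sort()
  return sorted_a.every((value, i) => value === sorted_b[i])
}
```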


350-356: Exposing tools via copy and clear_tools aligns with ToolsSelector learnings

Returning a shallow copy from get_tools and providing clear_tools() to reset local tools state avoids external consumers mutating internal arrays by reference, while still giving parents control when they bind tools. This also keeps the persistent state surface centered on the multi-select tools array, with single-select behavior remaining inside ToolsSelector itself, which is consistent with the earlier guidance about not synchronizing single-select state to the store. Based on learnings, this is the intended behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcb8ff2 and 4adf7f4.

📒 Files selected for processing (10)
  • app/desktop/studio_server/eval_api.py (1 hunks)
  • app/desktop/studio_server/provider_api.py (5 hunks)
  • app/desktop/studio_server/test_eval_api.py (3 hunks)
  • app/desktop/studio_server/test_provider_api.py (5 hunks)
  • app/web_ui/src/lib/api_schema.d.ts (9 hunks)
  • app/web_ui/src/lib/stores/run_configs_store.ts (3 hunks)
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte (6 hunks)
  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte (7 hunks)
  • app/web_ui/src/routes/(app)/run/+page.svelte (3 hunks)
  • libs/core/kiln_ai/datamodel/finetune.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • app/web_ui/src/lib/ui/run_config_component/saved_run_configs_dropdown.svelte
  • app/web_ui/src/lib/stores/run_configs_store.ts
  • app/desktop/studio_server/eval_api.py
  • app/desktop/studio_server/test_provider_api.py
  • libs/core/kiln_ai/datamodel/finetune.py
🧰 Additional context used
📓 Path-based instructions (10)
app/web_ui/**/*.svelte

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

app/web_ui/**/*.svelte: Use Svelte v4 (not v5) for the web frontend UI
Use DaisyUI for UI components in the web frontend

Use Svelte v4 (not v5) for the web UI framework

Files:

  • app/web_ui/src/routes/(app)/run/+page.svelte
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
app/web_ui/**/*.{svelte,ts}

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use Tailwind CSS for styling in the web frontend

Files:

  • app/web_ui/src/routes/(app)/run/+page.svelte
  • app/web_ui/src/lib/api_schema.d.ts
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
app/web_ui/**/*.{svelte,css,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use Tailwind and DaisyUI for styling the frontend

Files:

  • app/web_ui/src/routes/(app)/run/+page.svelte
  • app/web_ui/src/lib/api_schema.d.ts
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
app/web_ui/**/*.{svelte,ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Make FastAPI backend calls from the Svelte web app to communicate with backend servers

Files:

  • app/web_ui/src/routes/(app)/run/+page.svelte
  • app/web_ui/src/lib/api_schema.d.ts
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

**/*.py: Use asyncio for asynchronous operations in Python
Use Pydantic v2 (not v1) for data validation

Files:

  • app/desktop/studio_server/provider_api.py
  • app/desktop/studio_server/test_eval_api.py
app/desktop/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use python 3.13 for the desktop app

Files:

  • app/desktop/studio_server/provider_api.py
  • app/desktop/studio_server/test_eval_api.py
app/desktop/studio_server/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI for the desktop app server that extends libs/server with web app-specific APIs

Files:

  • app/desktop/studio_server/provider_api.py
  • app/desktop/studio_server/test_eval_api.py
app/web_ui/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use TypeScript for frontend type safety

Files:

  • app/web_ui/src/lib/api_schema.d.ts
app/web_ui/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use TypeScript for frontend development

Files:

  • app/web_ui/src/lib/api_schema.d.ts
**/*test*.py

📄 CodeRabbit inference engine (.cursor/rules/project.mdc)

Use pytest for testing Python code

Use pytest for testing in python projects

Files:

  • app/desktop/studio_server/test_eval_api.py
🧠 Learnings (12)
📓 Common learnings
Learnt from: chiang-daniel
Repo: Kiln-AI/Kiln PR: 781
File: app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte:74-86
Timestamp: 2025-11-26T05:40:24.426Z
Learning: In the fine-tune creation flow (app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte), it is acceptable for a dataset containing tool data to be used when training a model that doesn't support function calling. The training will proceed without using the tool call data from the dataset. This allows for dataset reuse across different model types.
📚 Learning: 2025-11-11T00:07:51.535Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 756
File: app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte:19-20
Timestamp: 2025-11-11T00:07:51.535Z
Learning: In app/web_ui/src/lib/ui/run_config_component/tools_selector.svelte, the single_select_selected_tool prop should NOT be synchronized with the tools_store. Only the multi-select tools array should persist to the store. The single_select mode is intended to use ephemeral component-local state.

Applied to files:

  • app/web_ui/src/routes/(app)/run/+page.svelte
  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
📚 Learning: 2025-11-24T19:13:27.863Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-24T19:13:27.863Z
Learning: Applies to app/web_ui/**/*.{svelte,ts,tsx} : Make FastAPI backend calls from the Svelte web app to communicate with backend servers

Applied to files:

  • app/web_ui/src/routes/(app)/run/+page.svelte
📚 Learning: 2025-11-24T19:13:27.863Z
Learnt from: CR
Repo: Kiln-AI/Kiln PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-24T19:13:27.863Z
Learning: Applies to app/web_ui/**/*.svelte : Use Svelte v4 (not v5) for the web UI framework

Applied to files:

  • app/web_ui/src/routes/(app)/run/+page.svelte
📚 Learning: 2025-08-08T15:50:45.334Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:221-228
Timestamp: 2025-08-08T15:50:45.334Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.KilnModelProvider, naming policy: keep cross-provider feature flags unprefixed (e.g., reasoning_optional_for_structured_output) to allow reuse across providers; prefix provider-specific toggles with the provider name (e.g., siliconflow_enable_thinking) when the implementation is specific and potentially unsafe to generalize.

Applied to files:

  • app/desktop/studio_server/provider_api.py
📚 Learning: 2025-08-08T16:14:54.346Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 418
File: libs/core/kiln_ai/adapters/ml_model_list.py:1979-1983
Timestamp: 2025-08-08T16:14:54.346Z
Learning: In libs/core/kiln_ai/adapters/ml_model_list.py, for ModelName.deepseek_r1_distill_qwen_32b on provider siliconflow_cn (model_id "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"), do not set a parser (r1_thinking/optional_r1_thinking). Verified via testing that SiliconFlow returns final outputs that parse without an R1 parser, and reasoning may be omitted under structured output; rely on reasoning_optional_for_structured_output=True.

Applied to files:

  • app/desktop/studio_server/provider_api.py
📚 Learning: 2025-08-15T14:07:02.226Z
Learnt from: scosman
Repo: Kiln-AI/Kiln PR: 473
File: app/desktop/studio_server/provider_api.py:1081-1081
Timestamp: 2025-08-15T14:07:02.226Z
Learning: For Docker Model Runner, untested models are intentionally set with supports_structured_output=True, unlike Ollama where untested models have supports_structured_output=False. This is a deliberate design choice by the maintainers.

Applied to files:

  • app/desktop/studio_server/provider_api.py
📚 Learning: 2025-10-29T06:59:55.635Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 736
File: app/web_ui/src/lib/api_schema.d.ts:8143-8180
Timestamp: 2025-10-29T06:59:55.635Z
Learning: app/web_ui/src/lib/api_schema.d.ts is generated by openapi-typescript; do not propose manual edits. Schema changes should be made in the FastAPI backend (e.g., app/desktop/studio_server/data_gen_api.py or libs/server/kiln_server/*), then re-generate the TS types.

Applied to files:

  • app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-10-29T11:19:31.059Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 759
File: app/web_ui/src/lib/api_schema.d.ts:2485-2490
Timestamp: 2025-10-29T11:19:31.059Z
Learning: In libs/server/kiln_server/document_api.py and the generated app/web_ui/src/lib/api_schema.d.ts, CreateVectorStoreConfigRequest.properties is now a union of three distinct public schemas: LanceDBConfigFTSPropertiesPublic, LanceDBConfigVectorPropertiesPublic, and LanceDBConfigHybridPropertiesPublic. These are intentionally split to allow independent evolution; do not suggest merging them.

Applied to files:

  • app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-09-04T06:27:58.852Z
Learnt from: leonardmq
Repo: Kiln-AI/Kiln PR: 487
File: app/web_ui/src/lib/api_schema.d.ts:3315-3320
Timestamp: 2025-09-04T06:27:58.852Z
Learning: The file app/web_ui/src/lib/api_schema.d.ts is auto-generated by openapi-typescript from the backend API schema and should never be manually edited. Changes to API types should be made in the backend, which will then be reflected in the regenerated TypeScript definitions.

Applied to files:

  • app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-11-26T05:40:24.426Z
Learnt from: chiang-daniel
Repo: Kiln-AI/Kiln PR: 781
File: app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte:74-86
Timestamp: 2025-11-26T05:40:24.426Z
Learning: In the fine-tune creation flow (app/web_ui/src/routes/(app)/fine_tune/[project_id]/[task_id]/create_finetune/+page.svelte), it is acceptable for a dataset containing tool data to be used when training a model that doesn't support function calling. The training will proceed without using the tool call data from the dataset. This allows for dataset reuse across different model types.

Applied to files:

  • app/web_ui/src/lib/api_schema.d.ts
📚 Learning: 2025-10-07T00:21:33.537Z
Learnt from: sfierro
Repo: Kiln-AI/Kiln PR: 698
File: app/web_ui/src/routes/(app)/evals/[project_id]/[task_id]/compare/+page.svelte:800-812
Timestamp: 2025-10-07T00:21:33.537Z
Learning: In the Kiln web UI codebase, the `save_new_task_run_config` function in `app/web_ui/src/lib/stores/run_configs_store.ts` automatically refreshes the run configs store by calling `load_task_run_configs(project_id, task_id, true)` with force_refresh=true after successfully creating a new run config. This means callers of save_new_task_run_config do not need to manually refresh the store afterward, as the store will already contain the newly created config.

Applied to files:

  • app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte
🧬 Code graph analysis (2)
app/desktop/studio_server/provider_api.py (2)
libs/core/kiln_ai/datamodel/finetune.py (2)
  • Finetune (23-127)
  • model_id (99-109)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • StructuredOutputMode (27-49)
app/desktop/studio_server/test_eval_api.py (3)
libs/core/kiln_ai/datamodel/finetune.py (1)
  • Finetune (23-127)
libs/core/kiln_ai/datamodel/datamodel_enums.py (1)
  • FineTuneStatusType (52-61)
libs/core/kiln_ai/datamodel/test_eval_model.py (1)
  • mock_task (19-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: Generate Coverage Report
  • GitHub Check: Build, Typecheck, and Test Python (3.10)
  • GitHub Check: Build, Typecheck, and Test Python (3.11)
  • GitHub Check: Build, Typecheck, and Test Python (3.13)
  • GitHub Check: Build, Typecheck, and Test Python (3.12)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-15-intel)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (11)
app/desktop/studio_server/provider_api.py (4)

159-160: LGTM!

The new model_specific_run_config field is well-designed with a clear comment explaining its purpose for fine-tuned models with baked-in run configurations.


1444-1458: LGTM!

The helper function provides proper backward compatibility with a clear migration path from legacy structured_output_mode to the new run_config.structured_output_mode, with a safe fallback to json_instructions.
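The fallback chain can be sketched as follows (dataclass stand-ins for the real Pydantic models; field names are taken from the review, and the real helper may differ):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StructuredOutputMode(str, Enum):
    json_schema = "json_schema"
    json_instructions = "json_instructions"


@dataclass
class RunConfig:
    structured_output_mode: Optional[StructuredOutputMode] = None


@dataclass
class Finetune:
    structured_output_mode: Optional[StructuredOutputMode] = None  # legacy field
    run_config: Optional[RunConfig] = None


def finetune_structured_output_mode(ft: Finetune) -> StructuredOutputMode:
    # Prefer the new run_config location, fall back to the legacy field,
    # then to a safe default.
    if ft.run_config and ft.run_config.structured_output_mode:
        return ft.run_config.structured_output_mode
    if ft.structured_output_mode:
        return ft.structured_output_mode
    return StructuredOutputMode.json_instructions
```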


1489-1492: LGTM!

The refactored code properly uses the helper function and passes model_specific_run_config to ModelDetails, making the logic clearer and more maintainable.


1476-1476: Based on my verification, the review comment raises valid concerns about error handling. Here's what I found:

Key Findings:

  1. model_id() definitely can raise ValueError - confirmed in implementation (raises when parent task or project is None)

  2. Parent relationships CAN be broken in practice - The KilnParentedModel class attempts to load parents from disk if not in memory. If the file structure is corrupted, load_parent() can return None, causing model_id() to raise.

  3. Inconsistent error handling pattern - The custom_models() function in the same file (lines 1400-1448) has a try-except block that catches errors and logs them, while all_fine_tuned_models() has no error handling.

  4. No existing tests for corrupted data - The existing tests for all_fine_tuned_models() don't cover the error scenario.

The review comment's concern is legitimate: without error handling, corrupted finetune files would crash the entire /api/available_models endpoint.


Verify error handling consistency with similar code patterns.

The use of fine_tune.model_id() is correct and cleaner than manual ID construction. However, this method raises ValueError if parent task or project is missing, which can occur if finetune files have corrupted parent relationships.

Currently, all_fine_tuned_models() lacks error handling, while the similar custom_models() function in the same file includes a try-except block that logs errors and continues. This inconsistency means a corrupted finetune would crash the /api/available_models endpoint.

Recommendation: Add a try-except block similar to custom_models() to gracefully skip corrupted finetunes with a logged warning, rather than crashing the entire endpoint.
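A sketch of the recommended guard, mirroring the custom_models() pattern (the helper name and surrounding shape here are hypothetical; only the model_id()-raises-ValueError behavior is taken from the review):

```python
import logging

logger = logging.getLogger(__name__)


def collect_fine_tuned_model_ids(fine_tunes: list) -> list[str]:
    model_ids: list[str] = []
    for fine_tune in fine_tunes:
        try:
            # model_id() raises ValueError when the parent task or project
            # is missing (e.g. a corrupted finetune file on disk).
            model_ids.append(fine_tune.model_id())
        except ValueError:
            # Skip the corrupted entry instead of failing the whole
            # /api/available_models response.
            logger.warning("Skipping finetune with broken parent relationship")
    return model_ids
```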

app/web_ui/src/routes/(app)/run/+page.svelte (2)

25-26: Model-specific run config ID state is well-scoped and typed correctly

Using a nullable string for selected_model_specific_run_config_id with a clear comment keeps the model-specific suggested run config behavior explicit and localized to this page state. No issues found here.


203-214: Binding and propagating selected_model_specific_run_config_id through child components looks correct

Threading selected_model_specific_run_config_id via one-way prop into SavedRunConfigurationsDropdown and two-way binding it to RunConfigComponent cleanly centralizes the model-specific suggested run config logic in RunConfigComponent while keeping the dropdown in sync. This matches the existing selected_run_config_id pattern and fits the new model-specific behavior without introducing coupling problems.

app/web_ui/src/lib/api_schema.d.ts (3)

2625-2652: Fine‑tune/run‑config threading and capability flags are consistent with the PR goals

The additions of:

  • CreateFinetuneRequest.run_config_properties?: RunConfigProperties | null
  • Finetune.run_config?: RunConfigProperties | null (with structured_output_mode explicitly marked legacy)
  • FinetuneDatasetInfo.eligible_datasets / eligible_finetune_tags
  • FinetuneProviderModel.supports_function_calling: boolean
  • ModelDetails.model_specific_run_config?: string | null

together give a coherent surface for: (a) persisting the full run configuration used when training, (b) surfacing which models support function calling for fine‑tunes, and (c) exposing eligibility‑filtered datasets/tags in the UI. This aligns with the PR’s objective of fine‑tuning with traces & tools, while keeping backward compatibility via the legacy structured_output_mode field.

Given the retrieved learning that datasets with tool data may still be used with non‑function‑calling models (tool data is simply ignored at training time), just ensure the backend logic that populates eligible_datasets / eligible_finetune_tags and uses supports_function_calling doesn’t over‑restrict selection beyond what the UI flow expects. Any behavior tweaks should be implemented in the backend schema/handlers and then regenerated here. Based on learnings, this file is auto‑generated.

Also applies to: 3800-3905, 3910-3921, 3956-3968, 4322-4366


9071-9075: Tool‑aware finetune dataset info surface looks correct; verify backend filtering semantics

Extending finetune_dataset_info to accept tool_ids?: string[] | null and adding eligible_datasets / eligible_finetune_tags to FinetuneDatasetInfo is a good fit for tool‑filtered fine‑tune flows. This should let the UI ask “given these tools, which datasets/tags are suitable?” while still being able to show the full historical context via existing_datasets / finetune_tags.

Given the documented behavior that datasets with tool data are still allowed for non‑function‑calling models (tool data is just unused), please confirm the backend logic for tool_ids and the “eligible” fields preserves that flexibility (e.g., doesn’t block such datasets outright when no tools or a non‑tool‑capable model is selected). Any adjustments belong in the backend/OpenAPI schema, then re‑generate this file. Based on learnings, this file is auto‑generated.

Also applies to: 3910-3921


1595-1611: Both concerns in the review are valid: trailing slash is inconsistent with established patterns, and functional overlap exists between endpoints.

The endpoint /api/projects/{project_id}/tasks/{task_id}/run_configs/ has a trailing slash, which breaks the pattern established by all other similar endpoints in the same file (e.g., task_run_configs, evals, eval_configs, progress—all without trailing slashes).

Additionally, the endpoints do have functional overlap:

  • /task_run_configs returns only direct task run configs
  • /run_configs/ returns task run configs plus finetune-derived run configs (extended functionality)

While the api_schema.d.ts correctly reflects the backend definition, the inconsistency should be fixed in the FastAPI route definition (either add trailing slash consistently or remove it). If run_configs/ is intentionally the newer/extended version, consider documenting that task_run_configs is deprecated to avoid future confusion.

Any changes must be made to app/desktop/studio_server/eval_api.py and the schema regenerated from the OpenAPI spec.

app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte (2)

31-55: New tools selector settings and exported props are coherent with run-config usage

Importing arrays_equal and ToolsSelectorSettings, exporting tools_selector_settings, hide_model_selector, selected_model_specific_run_config_id, model, and tools, and deriving requires_tool_support from tools.length gives a clear, externally-controllable surface for model/tool filtering and run-config serialization. This looks consistent with the rest of the component’s responsibilities.


360-368: Based on my verification, I can now provide the rewritten review comment.

Template changes for model/tools selectors are consistent and flexible, with proper model handling confirmed

Conditionally rendering AvailableModelsDropdown behind !hide_model_selector and wiring settings={tools_selector_settings} into both ToolsSelector instances (inline at line 382 and in Advanced Options at line 402) gives callers fine-grained control over visibility and behavior without duplicating logic. The Tailwind-based layout (flex flex-col gap-4) remains unchanged and compatible with the existing UI stack.

Verified: The only call site that hides the model selector is the fine-tune page (create_finetune/+page.svelte), which safely manages model selection separately via the $model_provider store and overrides the model in run config properties. Since the fine-tune page doesn't set requires_structured_output, the model-dependent behaviors in the component don't create issues. The default model from $ui_state.selected_model works as expected for other call sites that don't hide the selector.

Comment on lines +1889 to +1975
@pytest.mark.asyncio
async def test_get_run_configs_excludes_unknown_and_failed_finetunes(
    client, mock_task_from_id, mock_task
):
    """Test that finetunes with unknown or failed status are excluded from run configs."""
    mock_task_from_id.return_value = mock_task

    run_config_props = RunConfigProperties(
        model_name="gpt-4",
        model_provider_name=ModelProviderName.openai,
        prompt_id="simple_chain_of_thought_prompt_builder",
        structured_output_mode=StructuredOutputMode.json_schema,
    )

    finetunes = [
        Finetune(
            id="ft_completed",
            name="Completed Finetune",
            provider="openai",
            base_model_id="model1",
            dataset_split_id="split1",
            system_message="System message",
            latest_status=FineTuneStatusType.completed,
            run_config=run_config_props,
            parent=mock_task,
        ),
        Finetune(
            id="ft_running",
            name="Running Finetune",
            provider="openai",
            base_model_id="model2",
            dataset_split_id="split2",
            system_message="System message",
            latest_status=FineTuneStatusType.running,
            run_config=run_config_props,
            parent=mock_task,
        ),
        Finetune(
            id="ft_unknown",
            name="Unknown Finetune",
            provider="openai",
            base_model_id="model3",
            dataset_split_id="split3",
            system_message="System message",
            latest_status=FineTuneStatusType.unknown,
            run_config=run_config_props,
            parent=mock_task,
        ),
        Finetune(
            id="ft_failed",
            name="Failed Finetune",
            provider="openai",
            base_model_id="model4",
            dataset_split_id="split4",
            system_message="System message",
            latest_status=FineTuneStatusType.failed,
            run_config=run_config_props,
            parent=mock_task,
        ),
        Finetune(
            id="ft_no_run_config",
            name="No Run Config Finetune",
            provider="openai",
            base_model_id="model5",
            dataset_split_id="split5",
            system_message="System message",
            latest_status=FineTuneStatusType.completed,
            run_config=None,
            parent=mock_task,
        ),
    ]

    for finetune in finetunes:
        finetune.save_to_file()

    response = client.get("/api/projects/project1/tasks/task1/run_configs/")

    assert response.status_code == 200
    configs = response.json()

    config_ids = [config["id"] for config in configs]

    assert "finetune_run_config::project1::task1::ft_completed" in config_ids
    assert "finetune_run_config::project1::task1::ft_running" in config_ids
    assert "finetune_run_config::project1::task1::ft_failed" in config_ids
    assert "finetune_run_config::project1::task1::ft_unknown" in config_ids
    assert "finetune_run_config::project1::task1::ft_no_run_config" not in config_ids

⚠️ Potential issue | 🔴 Critical

Critical: Test name and docstring contradict the actual test behavior.

The test name test_get_run_configs_excludes_unknown_and_failed_finetunes and docstring claim that finetunes with unknown or failed status are excluded, but the assertions prove the opposite:

  • Lines 1973-1974 assert that ft_unknown and ft_failed ARE included in config_ids
  • Line 1975 shows the only excluded finetune is ft_no_run_config (which has run_config=None)

The actual behavior being tested is: "Include all finetunes that have a run_config, regardless of status."

Apply this diff to fix the test name and docstring:

 @pytest.mark.asyncio
-async def test_get_run_configs_excludes_unknown_and_failed_finetunes(
+async def test_get_run_configs_includes_finetunes_with_run_config(
     client, mock_task_from_id, mock_task
 ):
-    """Test that finetunes with unknown or failed status are excluded from run configs."""
+    """Test that finetunes are included in run configs only if they have a run_config set."""
     mock_task_from_id.return_value = mock_task
🤖 Prompt for AI Agents
In app/desktop/studio_server/test_eval_api.py around lines 1889 to 1975, the
test name and docstring incorrectly state that finetunes with unknown or failed
status are excluded; update the test name and docstring to reflect that the test
verifies all finetunes that have a run_config are included regardless of status
(i.e., include completed, running, unknown, failed) and only finetunes with
run_config=None are excluded; rename the test (e.g.,
test_get_run_configs_includes_all_with_run_config) and change the docstring to
"Test that finetunes with a run_config are included in run configs regardless of
status, and finetunes without a run_config are excluded."

Comment on lines +200 to +222
let prior_model: string | null = null
async function process_model_change() {
  // only run once immediately after a model change, not every reactive update
  if (prior_model === model) {
    return
  }
  prior_model = model
  // Special case on model change: if the model says it has a model-specific run config, select that run config.
  // Currently used by fine-tuned models which need to be called like they are trained.
  const model_details = available_model_details(
    model_name,
    provider,
    $available_models,
  )
  if (model_details?.model_specific_run_config) {
    selected_run_config_id = model_details.model_specific_run_config
    selected_model_specific_run_config_id =
      model_details.model_specific_run_config
  } else {
    selected_model_specific_run_config_id = null
  }
}

⚠️ Potential issue | 🟠 Major

Potential override of user-selected run config when model has model_specific_run_config

process_model_change unconditionally sets selected_run_config_id to model_details.model_specific_run_config whenever the model changes and that property is present, and also updates selected_model_specific_run_config_id. Because model changes can be induced indirectly by selecting a saved run config (via update_current_run_options_for_selected_run_config), this can lead to the following behavior:

  1. User selects a saved run config whose model has a model_specific_run_config.
  2. That run config sets model to the chosen model.
  3. On the next debounced pass, process_model_change sees the “new” model and overwrites selected_run_config_id with the model-specific run config, effectively discarding the user’s chosen saved config.

This seems at odds with the comment in Lines 139–143 that “Select a saved run config” should apply that config’s values. If the intent is to only auto-select model_specific_run_config when the user changes the model from the dropdown (not when the model is changed as a side effect of picking a run config), you may want to guard this block with additional context (e.g., only run when the last change came from the model dropdown, or skip when selected_run_config_id was just set explicitly by the user).

🤖 Prompt for AI Agents
In app/web_ui/src/lib/ui/run_config_component/run_config_component.svelte around
lines 200 to 222, process_model_change unconditionally overrides
selected_run_config_id when model_details.model_specific_run_config exists;
change it so the model-specific run config is only auto-selected when the user
explicitly changed the model (e.g., from the model dropdown) and not when the
model value was set as a side-effect of selecting a saved run config. Implement
a small guard such as tracking the last change source (e.g., lastChangeSource =
'model' vs 'runConfig') or check that selected_run_config_id was not just set by
update_current_run_options_for_selected_run_config before assigning
selected_run_config_id; if the change did originate from an explicit model
dropdown action, set selected_run_config_id and
selected_model_specific_run_config_id as now, otherwise only update
selected_model_specific_run_config_id (or leave both unchanged).


@cursor cursor bot left a comment


This PR is being reviewed by Cursor Bugbot


high_quality_count: Dict[str, int] = {}
reasoning_and_high_quality_count: Dict[str, int] = {}

required_tools_set = set(tool_filter) if tool_filter else None

Bug: Empty tool filter treated as no filter

The compute_finetune_tag_info function's docstring states "Only runs with exactly these tools will be included", but passing an empty list [] as tool_filter is treated the same as None (no filtering). This is because set(tool_filter) if tool_filter else None evaluates bool([]) as False. If someone wants to filter to runs with exactly zero tools, this won't work as documented. The condition should check if tool_filter is not None instead of just if tool_filter to properly distinguish between "no filter requested" and "filter to runs with no tools". The same pattern exists for eligible_datasets filtering at line 380. The frontend guards against this by only sending non-empty arrays, but the backend function's behavior is inconsistent with its documentation.
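The distinction can be shown in isolation (function names here are illustrative, not the actual backend code):

```python
from typing import Optional


def normalize_tool_filter_buggy(tool_filter: Optional[list[str]]) -> Optional[set[str]]:
    # bool([]) is False, so an empty list is treated as "no filter requested".
    return set(tool_filter) if tool_filter else None


def normalize_tool_filter_fixed(tool_filter: Optional[list[str]]) -> Optional[set[str]]:
    # Distinguish "no filter requested" (None) from
    # "filter to runs with exactly zero tools" ([]).
    return set(tool_filter) if tool_filter is not None else None
```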


