[prompt-clustering] Copilot Agent Prompt Clustering Analysis — 2026-02-22 #17699
This discussion was automatically closed because it expired on 2026-03-01T12:45:05.845Z.
Daily NLP-based clustering analysis of copilot agent task prompts from the last 30 days (2026-01-21 → 2026-02-22).
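The per-cluster keyword lists below can be produced by a standard term-weighting pass over grouped prompts. The following is a minimal, self-contained sketch of that idea (not the actual analysis code; the sample prompts, function names, and scoring details are illustrative): score each term by its frequency within a cluster, discounted by how many clusters contain it, and keep the top terms.

```python
# Hypothetical sketch of deriving per-cluster keywords: term frequency
# within a cluster, weighted by an inverse "cluster frequency" so that
# terms shared by every cluster score low. Illustrative only.
import math
import re
from collections import Counter

def tokenize(text):
    # lowercase alphabetic tokens of length >= 2 (keeps "safe-outputs")
    return re.findall(r"[a-z][a-z-]+", text.lower())

def top_terms(clusters, n=3):
    # term frequency per cluster
    tfs = [Counter(t for p in prompts for t in tokenize(p))
           for prompts in clusters]

    # smoothed inverse cluster frequency
    def idf(term):
        df = sum(1 for tf in tfs if term in tf)
        return math.log((1 + len(clusters)) / (1 + df)) + 1

    return [[term for term, _ in sorted(
                tf.items(), key=lambda kv: -kv[1] * idf(kv[0]))[:n]]
            for tf in tfs]

# toy stand-ins for two of the clusters described below
clusters = [
    ["update mcp workflow add cli flag", "update workflow mcp server bump"],
    ["fix ci failure failed job", "identify ci failure root cause job"],
]
print(top_terms(clusters))  # top keywords per cluster
```

The real analysis groups ~1,800 prompts into nine clusters; the same scoring applied per cluster yields keyword lists like the ones shown in each section.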
Summary
Cluster Overview
1. Workflow & MCP Updates: workflow, update, mcp, add, cli
2. Issue-driven Agent Tasks: issue, section, copilot, resolve
3. Safe Outputs Implementation: safe, outputs, safe outputs, handler
4. Agentic Workflow Debugging: agentic workflows, debug, prompt
5. Code Quality / Task Mining: code quality, task miner, improvement
6. CI Failure / Run Fixes: run, failure, failed, ci, patch
7. Root Cause Bug Fixes: job, fix, identify, failing, root cause
8. Custom Agent Workflows: custom agent, agent used, github actions
9. Campaign / Feature Work: campaign, security, project, dispatch

Cluster Details with Representative Examples
1. Workflow & MCP Updates (741 PRs — 41% of total)
The dominant category. Covers updates to workflow files, MCP server dependency bumps, CLI feature additions, and compile/init command enhancements.
Keywords: workflow, update, mcp, add, pr, make, review, agentic, file, cli
Example PRs: #11050, #11058, #11064
2. Issue-driven Agent Tasks (304 PRs — 17% of total)
Tasks sourced directly from GitHub issues, typically formatted with `<issue_title>`/`<issue_description>` XML tags. Covers a broad range of features and bug fixes originating from the issue tracker.
Keywords: issue, section, details, copilot, resolve, comments, original issue
Example PRs: #11059, #11060, #11067
3. Safe Outputs Implementation (157 PRs — 9% of total)
Tasks specifically targeting the `safe-outputs` system: validation, error handling, ANSI stripping, compile-time checks, and JSON schema additions.
Keywords: safe, outputs, safe outputs, safe output, output, handler, project, create
Example PRs: #11066, #11068, #11112
4. Agentic Workflow Debugging (133 PRs — 7% of total)
Tasks focused on debugging and improving the agentic workflow system itself: failure tracking, issue templates, prompt clustering, and agent orchestration fixes.
Keywords: agentic workflows, agentic, workflows, debug, upgrade, prompt, create
Example PRs: #11053, #11054, #11090
5. Code Quality / Task Mining (130 PRs — 7% of total)
Tasks generated by the task miner from code quality discussions: refactoring large files, adding test coverage, extracting helper functions, improving documentation, and fixing shell check warnings.
Keywords: quality, code quality, code, discussion, improvement, task miner, discussion task, miner
Example PRs: #11587, #11592, #11593
6. CI Failure / Run Fixes (119 PRs — 7% of total)
Automated tasks triggered by CI failures, often generated by the "CI Failure Doctor" workflow. Prompts include job IDs, run URLs, and ask the agent to identify root causes and implement fixes.
Keywords: run, failure, workflow, failed, ci, ci failure, patch, workflow run
Example PRs: #11069, #11915, #12304
7. Root Cause Bug Fixes (76 PRs — 4% of total)
Targeted bug-fix tasks where the agent is asked to identify the root cause of a specific job or test failure and implement a fix. More targeted than the CI Failure cluster: these tasks typically involve specific failing job IDs and log analysis.
Keywords: job, fix, identify, failing, id, root cause, workflow, implement, logs
Example PRs: #11096, #11915, #12304
8. Custom Agent Workflows (78 PRs — 4% of total)
Tasks submitted via custom agentic workflows (e.g., `ci-cleaner`, `agentic-workflows`). Prompts typically include a **Custom agent used:** suffix identifying the triggering workflow. These tend to be light-touch tasks (docs, small features).
Keywords: custom agent, agent used, used, custom, github, docs, agent, documentation
Example PRs: #11083, #11105, #11110
9. Campaign / Feature Work (63 PRs — 4% of total)
Tasks related to the campaign system: label-based discovery, orchestration, dispatch workers, security features, and structured project management.
Keywords: campaign, security, project, issue, fix, docs, run, workflows, code
Example PRs: #11070, #11080, #11087
Merge Rate Comparison Table
Key Findings
Workflow & MCP Updates dominates at 41% of all tasks (741 PRs). This reflects the active development and maintenance cadence of the gh-aw system itself — dependency bumps, MCP server upgrades, and CLI enhancements make up the single largest task category.
Merge rates vary substantially by category (52%–79%). Custom Agent Workflows and Safe Outputs tasks merge most reliably. Code Quality / Task Mining has the lowest merge rate at 52%, suggesting many mined tasks are either too vague, duplicate existing work, or require more investigation than the agent can complete in one pass.
Task volume is growing week-over-week (279 → 346 tasks/week), indicating increasing reliance on the copilot agent for day-to-day engineering work.
Smallest-scope tasks merge most reliably. Custom Agent Workflows average just 3.9 files changed with a 79% merge rate, while the largest-scope tasks (Safe Outputs, Workflow & MCP Updates at ~27 files) still achieve 70–78% — suggesting the agent handles complex tasks well when prompts are precise.
Campaign / Feature Work requires the most iterations (avg 4.9 commits/PR vs ~3 for simpler clusters), consistent with the architectural nature of campaign system changes.
Recommendations
Review Code Quality / Task Mining prompt templates. With a 52% merge rate and 130 PRs, this is the highest-volume low-success cluster. Mined tasks should include clearer acceptance criteria, links to specific failing tests, and explicit scope boundaries. Consider adding a "definition of done" section to task miner output.
Break down Safe Outputs and Workflow & MCP Update tasks. Both categories change ~26–28 files on average per PR. While merge rates are still good (70–78%), splitting large multi-file tasks into atomic sub-tasks would reduce reviewer burden and decrease the risk of partial rework.
Standardize CI Failure prompts with structured context. The CI Failure / Run Fixes cluster (119 PRs, 69% merge rate) benefits from including Job IDs and run URLs. Ensure all automated failure-fix prompts include: workflow name, job ID, run URL, relevant log lines, and expected behavior.
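One way to enforce that checklist is to build the prompt from required fields rather than free text. This is a minimal sketch (the function name and field layout are assumptions, not the actual gh-aw template):

```python
# Hypothetical structured CI-failure prompt builder: every field the
# recommendation lists is a required argument, so a prompt cannot be
# generated with missing context.
def build_failure_prompt(workflow, job_id, run_url, log_excerpt, expected):
    return "\n".join([
        f"Workflow: {workflow}",
        f"Failing job ID: {job_id}",
        f"Run URL: {run_url}",
        "Relevant log lines:",
        log_excerpt,
        f"Expected behavior: {expected}",
        "Identify the root cause and implement a fix.",
    ])

prompt = build_failure_prompt(
    workflow="ci.yml",
    job_id="12345",
    run_url="https://github.com/owner/repo/actions/runs/999",
    log_excerpt="Error: test_safe_outputs failed: expected 0 got 1",
    expected="all tests pass",
)
```

Because the fields are positional requirements rather than optional template slots, a failure-fix prompt missing a job ID or run URL fails at generation time instead of producing a vague task.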
Leverage Custom Agent Workflow patterns. The smallest and most reliably-merged cluster uses focused, single-concern prompts triggered by specialized workflows. Applying this "narrow scope + known agent context" pattern to other categories could improve merge rates across the board.
Monitor Code Quality campaign effectiveness. The task miner generates 130 PRs/month (7%) with 48% failing to merge — this represents engineering time that could be better spent. Consider a quality gate on mined tasks before dispatching to the agent.
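A quality gate of the kind suggested above could be as simple as a predicate run before dispatch. The field names and thresholds here are illustrative assumptions, not an existing gh-aw check:

```python
# Hypothetical pre-dispatch quality gate for mined tasks: reject any task
# missing acceptance criteria, an explicit scope, or concrete target files,
# and any task with only a vague one-line description.
def passes_quality_gate(task):
    required = ("acceptance_criteria", "scope", "target_files")
    if any(not task.get(field) for field in required):
        return False
    # vague one-liners tend not to merge; require a minimal description
    return len(task.get("description", "")) >= 40

well_scoped = {
    "description": "Extract the oversized parse helper in cli.go into "
                   "smaller functions and add unit tests for each.",
    "acceptance_criteria": ["existing tests pass", "new tests cover helpers"],
    "scope": "pkg/cli only",
    "target_files": ["pkg/cli/cli.go"],
}
vague = {"description": "Improve code quality"}
```

Gating on structure rather than content keeps the check cheap while filtering exactly the under-specified tasks that drive the 52% merge rate.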
References: §22277077825