📝 Walkthrough

Adds multi-provider AI support (Gemini/OpenAI), PR indexing and duplicate-detection features, config and CLI extensions for PRs, Qdrant PR collection support, GitHub PR helpers, provider-agnostic embedder/LLM refactors, and documentation/workflow updates.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User as User
    participant CLI as CLI (pr-duplicate)
    participant GH as GitHub API
    participant Embed as Embedder (Gemini/OpenAI)
    participant Qdrant as Qdrant
    participant LLM as LLM Client (Gemini/OpenAI)
    participant Out as Output
    User->>CLI: run pr-duplicate --repo org/repo --number N
    CLI->>GH: GetPullRequest(org,repo,N) / ListPullRequestFiles
    GH-->>CLI: PR metadata + file list
    CLI->>Embed: build embedding vector (PR text)
    Embed-->>CLI: embedding vector
    CLI->>Qdrant: search issues collection (and PR collection if present)
    Qdrant-->>CLI: candidate list
    CLI->>LLM: DetectPRDuplicate(PR + candidates)
    LLM-->>CLI: structured duplicate result
    CLI->>Out: render text or JSON
    Out-->>User: display findings
```
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
add openai support
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/simili-web/main.go (1)
112-147: ⚠️ Potential issue | 🟠 Major

Add env fallback for embedding/LLM API keys when no config file is present.

With no config file, `cfg.Embedding.APIKey` stays empty and the web UI will fail even if `GEMINI_API_KEY` or `OPENAI_API_KEY` is set in the environment.

🔧 Suggested fix (env fallback)
```diff
 import (
 	"context"
 	"embed"
 	"encoding/json"
 	"fmt"
 	"io/fs"
 	"log"
 	"net/http"
 	"os"
+	"strings"
@@
 func initDependencies(cfg *config.Config) (*pipeline.Dependencies, error) {
 	deps := &pipeline.Dependencies{
 		DryRun: true, // Always dry-run for web UI
 	}

-	// Embedder (Gemini/OpenAI auto-selected by available keys)
-	embedder, err := gemini.NewEmbedder(cfg.Embedding.APIKey, cfg.Embedding.Model)
+	apiKey := strings.TrimSpace(cfg.Embedding.APIKey)
+	if apiKey == "" {
+		if k := strings.TrimSpace(os.Getenv("GEMINI_API_KEY")); k != "" {
+			apiKey = k
+		} else if k := strings.TrimSpace(os.Getenv("OPENAI_API_KEY")); k != "" {
+			apiKey = k
+		}
+	}
+
+	// Embedder (Gemini/OpenAI auto-selected by available keys)
+	embedder, err := gemini.NewEmbedder(apiKey, cfg.Embedding.Model)
 	if err != nil {
 		return nil, fmt.Errorf("failed to init embedder: %w", err)
 	}
 	deps.Embedder = embedder
@@
-	// LLM Client
-	llm, err := gemini.NewLLMClient(cfg.Embedding.APIKey)
+	// LLM Client
+	llm, err := gemini.NewLLMClient(apiKey)
 	if err != nil {
 		return nil, fmt.Errorf("failed to init LLM: %w", err)
 	}
 	deps.LLMClient = llm
```
🤖 Fix all issues with AI agents
In @.env.sample:
- Around line 7-12: Reorder the variables in .env.sample so QDRANT_COLLECTION
appears before QDRANT_URL to satisfy dotenv-linter key order (move the line
"QDRANT_COLLECTION=your-issues-collection" above
"QDRANT_URL=https://your-cluster.qdrant.io:6333"), or alternatively update the
dotenv-linter configuration to allow the current ordering if you prefer to keep
the existing layout; ensure the change preserves comments and optional
QDRANT_PR_COLLECTION lines.
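As a sketch, the reordered block could look like the following (the collection values are the sample's own placeholders; the commented `QDRANT_PR_COLLECTION` value is hypothetical, and commented lines are ignored by dotenv-linter's key-order check):

```dotenv
# Qdrant connection (keys ordered alphabetically to satisfy dotenv-linter)
QDRANT_COLLECTION=your-issues-collection
# Optional: separate collection for PR vectors (placeholder value)
# QDRANT_PR_COLLECTION=your-prs-collection
QDRANT_URL=https://your-cluster.qdrant.io:6333
```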
In @.simili.yaml:
- Around line 7-10: The dimensions value in .simili.yaml (model:
gemini-embedding-001) is set to 1536 which conflicts with the codebase default
Embedding.Dimensions (768) and Gemini’s API default (3072) unless
output_dimensionality is set; either change dimensions to 768 to match the
codebase default or ensure the Gemini client initialization explicitly sets
output_dimensionality: 1536 so the API returns 1536-d embeddings (check places
that reference Embedding.Dimensions and the Gemini call in
internal/steps/vectordb_prep.go and the config default in
internal/core/config/config.go to keep the config and actual embedder output in
sync).
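For the second option, a minimal sketch of keeping the config and the embedder output in sync (the YAML key names here are assumptions about this repo's config schema, not confirmed from the file):

```yaml
embedding:
  model: gemini-embedding-001
  # Must equal the output_dimensionality the Gemini client requests;
  # without that request the API defaults to 3072-d vectors.
  dimensions: 1536
```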
In `@cmd/simili/commands/batch.go`:
- Around line 333-339: The current code hard-fails when gemini.NewLLMClient
returns an error; change it to allow LLM initialization to fail gracefully by
only assigning deps.LLMClient and printing the verbose success message when err
== nil, and otherwise (when err != nil) do not return the error — optionally
emit a verbose warning about the LLM init failure; specifically, update the
gemini.NewLLMClient call handling so that deps.LLMClient remains nil on error
(the Dependencies.Close() already tolerates nil) and downstream LLM-dependent
steps remain skipped for non-LLM presets.
In `@internal/integrations/gemini/embedder.go`:
- Around line 116-147: embedOpenAI mutates e.dimensions unsafely while
Dimensions() reads it concurrently; add synchronization to avoid data races by
protecting dimensions with a mutex or atomic: add a sync.RWMutex (e.g., dimsMu)
to the Embedder struct, set e.dimensions under dimsMu.Lock() in embedOpenAI (or
initialize once if immutable) and read it under dimsMu.RLock() in Dimensions();
ensure all reads/writes of e.dimensions use the same lock to eliminate the race.
In `@internal/integrations/github/client.go`:
- Around line 157-168: ListPullRequestFiles currently returns only the first
page; change it to paginate internally like ListIssueEvents so callers don’t
miss files. Initialize opts if nil (PerPage:100), loop calling
c.client.PullRequests.ListFiles(ctx, org, repo, number, opts), append returned
[]*github.CommitFile to a single slice, update opts.Page using resp.NextPage (or
increment) until there are no more pages, and return the aggregated files and
the last *github.Response; preserve the existing error wrapping when a call
fails. Reference: ListPullRequestFiles, ListIssueEvents, and
listAllPullRequestFilePaths.
🧹 Nitpick comments (7)
cmd/simili/commands/pr_support.go (2)
17-17: Regex may miss some GitHub linking keywords.

The pattern handles `close/closed/closes`, `fix/fixed/fixes/fixe`, and `resolve/resolved/resolves`. Note that `fixe` is matched but this is uncommon. GitHub also supports additional keywords like `closing` which this pattern won't catch, though the current set covers the most common cases.

Also applies to: 103-128
76-78: Chained getters are nil-safe in go-github v60—defensive nil checks are optional.

In go-github v60, the `GetUser()`, `GetBase()`, and `GetHead()` accessor methods are designed to be nil-safe on their receivers and in chains. For example, if `pr.GetUser()` returns `nil`, calling `GetLogin()` on that `nil` pointer still returns an empty string rather than panicking. This is by design: the generated getters in `github-accessors.go` handle `nil` receivers gracefully.

The code as written is safe and does not require defensive nil checks. If you prefer explicit nil checks for code clarity, the suggested refactor is valid but not necessary.
DOCS/examples/multi-repo/shared-workflow.yml (1)
11-19: Consider documenting the "at least one required" constraint.

Both `GEMINI_API_KEY` and `OPENAI_API_KEY` are marked as `required: false`, but the application requires at least one. While runtime validation handles this, adding a comment in the workflow clarifying this constraint would improve maintainability.

📝 Suggested comment addition

```diff
   secrets:
+    # At least one of GEMINI_API_KEY or OPENAI_API_KEY must be provided
     GEMINI_API_KEY:
       required: false
     OPENAI_API_KEY:
       required: false
```

DOCS/single-repo-setup.md (1)
30-48: Mention PR workflows in Step 3/CLI backfill.

Since PR support was added, call out PR-triggered workflows and the PR indexing flags so users discover the feature.

✍️ Suggested doc tweak

````diff
-Create a GitHub Actions workflow file (e.g., `.github/workflows/simili.yml`) to trigger the bot on issue events.
+Create a GitHub Actions workflow file (e.g., `.github/workflows/simili.yml`) to trigger the bot on issue events
+(and `pull_request` events if you want PR workflows).
 ...
-2. **Index Issues**:
+2. **Index Issues**:
    ```bash
    gh simili index --repo owner/repo --config .github/simili.yaml
    ```
+   **Optional: Index PRs too**:
+   ```bash
+   gh simili index --repo owner/repo --config .github/simili.yaml --include-prs --pr-collection <name>
+   ```
````

cmd/simili/commands/pr_duplicate.go (1)
212-260: Align Qdrant env overrides with other commands.

Other commands honor `QDRANT_URL`/`QDRANT_API_KEY`. Adding the same fallback here improves CLI consistency.

🔧 Suggested adjustment

```diff
-	qdrantClient, err := qdrant.NewClient(cfg.Qdrant.URL, cfg.Qdrant.APIKey)
+	qURL := cfg.Qdrant.URL
+	if val := os.Getenv("QDRANT_URL"); val != "" && (qURL == "" || qURL == "localhost:6334") {
+		qURL = val
+	}
+	if qURL == "" {
+		qURL = "localhost:6334"
+	}
+
+	qKey := cfg.Qdrant.APIKey
+	if val := os.Getenv("QDRANT_API_KEY"); val != "" && qKey == "" {
+		qKey = val
+	}
+
+	qdrantClient, err := qdrant.NewClient(qURL, qKey)
```

internal/integrations/gemini/llm.go (1)
168-174: Close OpenAI idle connections in `Close()`.

Right now only the Gemini client is closed. For long‑lived processes using OpenAI, consider closing idle HTTP connections to avoid resource leaks.

♻️ Proposed fix

```diff
 func (l *LLMClient) Close() error {
-	if l.gemini != nil {
-		return l.gemini.Close()
-	}
-	return nil
+	var err error
+	if l.gemini != nil {
+		err = l.gemini.Close()
+	}
+	if l.openAI != nil {
+		l.openAI.CloseIdleConnections()
+	}
+	return err
 }
```

internal/integrations/gemini/embedder.go (1)
68-74: Close OpenAI idle connections in `Close()`.

Right now only the Gemini client is closed. For long‑lived processes using OpenAI, consider closing idle HTTP connections to avoid resource leaks.

♻️ Proposed fix

```diff
 func (e *Embedder) Close() error {
-	if e.gemini != nil {
-		return e.gemini.Close()
-	}
-	return nil
+	var err error
+	if e.gemini != nil {
+		err = e.gemini.Close()
+	}
+	if e.openAI != nil {
+		e.openAI.CloseIdleConnections()
+	}
+	return err
 }
```
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
DOCS/single-repo-setup.md (1)
36-48: ⚠️ Potential issue | 🟡 Minor

Update the CLI index command documentation to mention PR indexing capability.

The CLI index command supports PR indexing via the `--include-prs` flag and a dedicated `--pr-collection` parameter, but the documentation only shows basic issue indexing. Add an example or mention of these flags so users are aware they can also index pull requests.
Hey @nick1udwig — really appreciate this contribution! The work here is solid and both features are things we want in 0.2.0. However, this PR bundles two independent features that we'd like to track and merge separately:
Could you split this into two PRs targeting those issues? It'll make each easier to review, test independently, and revert if needed.
Heads-up: codebase has changed since this PR was opened

Two PRs have landed:

- PR #39 — feat: support pull request events in triage pipeline
- PR #41 — fix: PR triage review feedback

The main files likely to conflict with your branch are

Thanks again for the effort on this — looking forward to getting both features in!
Description
Index and query PRs, not just issues
Type of Change
Related Issues
Fixes #38
Changes Made
Testing
- `go build ./...` successfully
- `go test ./...` successfully
- `go vet ./...` successfully

(I've tested in nick1udwig#1 with OpenAI models)
Screenshots (if applicable)
Checklist
Additional Notes
Summary by CodeRabbit
New Features
Improvements
Documentation