
feat: support for docker swarm #1493

Open

kmendell wants to merge 1 commit into main from feat/docker-swarm

Conversation

kmendell (Member) commented Jan 15, 2026

Closes: #591

Disclaimer: Greptile reviews use AI; make sure to check over its work.

To better help train Greptile on our codebase: if a comment is useful and valid, Like it; if it is not helpful or invalid, Dislike it.

To have Greptile re-review the changes, mention greptileai.

Greptile Summary

This PR adds comprehensive Docker Swarm support to Arcane, introducing ~13k lines of new code across the full stack: a 1900-line SwarmService, a 1250-line stack_deploy_engine (compose-to-swarm deploy pipeline), 1361-line HTTP handlers, a WebSocket log streaming endpoint, and a complete Svelte frontend covering services, nodes, tasks, stacks, configs, secrets, and the cluster management page.

Key issues found:

  • Information-disclosure via host environment in resolveFileObjectContent (stack_deploy_engine.go:884) — A user-supplied compose file can read any host process environment variable as the content of a Docker config or secret (via the environment: key on a config/secret entry). Combined with the previously flagged parseEnvContent/os.Environ() seeding, this creates two independent paths for leaking server-side credentials into swarm resources on multi-tenant deployments.

  • Goroutine leak in readAllLogs (log_stream_util.go:83) — In non-follow mode, log lines are sent to a buffered channel (cap=256) without checking ctx.Done(). If the WebSocket client disconnects while more than 256 lines remain to be drained, the producer goroutine blocks indefinitely and the consumer goroutines hang waiting on range lines, leaking both goroutines permanently.

  • Several previously-noted issues remain open (unexported function naming, parseEnvContent/os.Environ() seeding, fmt.Printf vs slog, etc.).

Confidence Score: 2/5

  • Not safe to merge yet — two unaddressed security/reliability issues remain open alongside multiple previously-flagged concerns.
  • The core logic is well-structured and the feature coverage is thorough, but the resolveFileObjectContent host-env leakage and the readAllLogs goroutine leak are both concrete bugs requiring fixes before merging. Combined with the still-open parseEnvContent / os.Environ() seeding issue from the previous review round, the security surface of the stack deploy engine needs another pass.
  • backend/pkg/libarcane/swarm/stack_deploy_engine.go (two host-env leakage vectors), backend/internal/services/log_stream_util.go (goroutine leak in readAllLogs)

Important Files Changed

Filename Overview
backend/pkg/libarcane/swarm/stack_deploy_engine.go Core stack deploy engine: two information-disclosure paths (os.Environ seeding and os.LookupEnv in resolveFileObjectContent) allow host secrets to be injected into swarm configs/secrets via user-supplied compose files.
backend/internal/services/log_stream_util.go New log streaming utility: readAllLogs does not accept a context, causing potential goroutine leaks in non-follow mode when client disconnects with >256 buffered log lines.
backend/internal/services/swarm_service.go New 1900-line swarm service: comprehensive coverage of services, nodes, tasks, stacks, configs, and secrets; generally well-structured with proper context propagation and error handling.
backend/internal/huma/handlers/swarm.go New 1361-line HTTP handler layer for all swarm operations; follows existing handler conventions with proper error mapping and pagination support.
backend/internal/api/ws_handler.go Adds ServiceLogs WebSocket endpoint following the established context.Background() pattern; metrics tracking and hub lifecycle are consistent with existing container log streaming.
frontend/src/lib/services/swarm-service.ts New TypeScript service client covering all swarm API endpoints; consistent use of async/await and correct return types throughout.


Prompt To Fix All With AI
This is a comment left during a code review.
Path: backend/pkg/libarcane/swarm/stack_deploy_engine.go
Line: 884-894

Comment:
**Host environment variables leaked into config/secret content**

`resolveFileObjectContent` reads from the host process environment via `os.LookupEnv(fileConfig.Environment)` when a compose file declares a config or secret using the `environment` key. This means any user-supplied compose file can silently exfiltrate any secret present in the server's process environment (database credentials, API keys, etc.) by declaring:

```yaml
configs:
  leak:
    environment: DATABASE_URL
```

This is a distinct but related information-disclosure path to the already-discussed `parseEnvContent`/`os.Environ()` seeding issue. In this case, the host environment variable becomes the *content* of a Docker config/secret pushed to the swarm, where it is potentially visible to all swarm members.

If this feature (sourcing config/secret content from the server's env) is intentional, it should be gated behind an explicit allow-list of variable names or a feature flag, and clearly documented. If not, the `fileConfig.Environment` branch should be removed or return an error indicating it is unsupported.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: backend/internal/services/log_stream_util.go
Line: 83-112

Comment:
**Goroutine leak in `readAllLogs` — channel sends not guarded by context cancellation**

`readAllLogs` buffers the full log output and then sends each line to `logsChan` with bare blocking sends (`logsChan <- line`). No `ctx.Done()` check is present in the send loop.

In the WebSocket `ServiceLogs` handler (`ws_handler.go`), the `lines` channel has a buffer of 256. If `follow=false` and the log output exceeds 256 lines, `readAllLogs` will block waiting for the consumer once the buffer is full. When the WebSocket client disconnects, the hub invokes `cancel()`, which cancels the producer goroutine's context — but `readAllLogs` does not honour context cancellation, so the producer goroutine hangs indefinitely. Because `defer close(lines)` is never reached, the consumer goroutines also hang permanently on `for line := range lines`.

The fix is straightforward: accept `ctx context.Context` in `readAllLogs` and select between `logsChan <- line` and `<-ctx.Done()`.

```go
func readAllLogs(ctx context.Context, logs io.ReadCloser, logsChan chan<- string) error {
    stdoutBuf := &strings.Builder{}
    stderrBuf := &strings.Builder{}

    _, err := stdcopy.StdCopy(stdoutBuf, stderrBuf, logs)
    if err != nil && !errors.Is(err, io.EOF) {
        return fmt.Errorf("failed to demultiplex logs: %w", err)
    }

    send := func(line string) bool {
        select {
        case logsChan <- line:
            return true
        case <-ctx.Done():
            return false
        }
    }

    for line := range strings.SplitSeq(strings.TrimRight(stdoutBuf.String(), "\n"), "\n") {
        if line != "" && !send(line) {
            return ctx.Err()
        }
    }
    for line := range strings.SplitSeq(strings.TrimRight(stderrBuf.String(), "\n"), "\n") {
        if line != "" && !send("[STDERR] "+line) {
            return ctx.Err()
        }
    }
    return nil
}
```

The caller in `StreamServiceLogs` would need to pass `ctx` as well.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 1bbabbe

Greptile also left 2 inline comments on this PR.

kmendell (Member, Author):

This stack of pull requests is managed by Graphite. Learn more about stacking.

getarcaneappbot (Contributor) commented Jan 15, 2026

Container images for this PR have been built successfully!

  • Manager: ghcr.io/getarcaneapp/arcane:pr-1493
  • Agent: ghcr.io/getarcaneapp/arcane-headless:pr-1493

Built from commit c5ea348

kmendell force-pushed the feat/docker-swarm branch 10 times, most recently from a8568ea to f4a24b6 on February 5, 2026 at 19:31
getarcaneapp deleted a comment from codecov bot on Feb 6, 2026
getarcaneapp deleted a comment from codecov bot on Feb 6, 2026
kmendell force-pushed the feat/docker-swarm branch 5 times, most recently from 9a23197 to 6bf2322 on February 10, 2026 at 16:59
greptile-apps bot left a comment

89 files reviewed, 11 comments

Comment on lines +884 to +894 (excerpt, indentation restored):

```go
		if !ok {
			return nil, fmt.Errorf("environment variable %s not set", fileConfig.Environment)
		}
		return []byte(value), nil
	}
	if fileConfig.File != "" {
		path := fileConfig.File
		if !filepath.IsAbs(path) {
			path = filepath.Join(workingDir, path)
		}
		return os.ReadFile(path)
```

**Host environment variables leaked into config/secret content**

`resolveFileObjectContent` reads from the host process environment via `os.LookupEnv(fileConfig.Environment)` when a compose file declares a config or secret using the `environment` key. This means any user-supplied compose file can silently exfiltrate any secret present in the server's process environment (database credentials, API keys, etc.) by declaring:

```yaml
configs:
  leak:
    environment: DATABASE_URL
```

This is a distinct but related information-disclosure path to the already-discussed `parseEnvContent`/`os.Environ()` seeding issue. In this case, the host environment variable becomes the *content* of a Docker config/secret pushed to the swarm, where it is potentially visible to all swarm members.

If this feature (sourcing config/secret content from the server's env) is intentional, it should be gated behind an explicit allow-list of variable names or a feature flag, and clearly documented. If not, the `fileConfig.Environment` branch should be removed or return an error indicating it is unsupported.


Comment on lines +83 to +112:

```go
// readAllLogs reads all logs at once (non-follow mode) and sends them to logsChan.
func readAllLogs(logs io.ReadCloser, logsChan chan<- string) error {
	stdoutBuf := &strings.Builder{}
	stderrBuf := &strings.Builder{}

	_, err := stdcopy.StdCopy(stdoutBuf, stderrBuf, logs)
	if err != nil && !errors.Is(err, io.EOF) {
		return fmt.Errorf("failed to demultiplex logs: %w", err)
	}

	if stdoutBuf.Len() > 0 {
		lines := strings.SplitSeq(strings.TrimRight(stdoutBuf.String(), "\n"), "\n")
		for line := range lines {
			if line != "" {
				logsChan <- line
			}
		}
	}

	if stderrBuf.Len() > 0 {
		lines := strings.SplitSeq(strings.TrimRight(stderrBuf.String(), "\n"), "\n")
		for line := range lines {
			if line != "" {
				logsChan <- "[STDERR] " + line
			}
		}
	}

	return nil
}
```

**Goroutine leak in `readAllLogs` — channel sends not guarded by context cancellation**

`readAllLogs` buffers the full log output and then sends each line to `logsChan` with bare blocking sends (`logsChan <- line`). No `ctx.Done()` check is present in the send loop.

In the WebSocket `ServiceLogs` handler (`ws_handler.go`), the `lines` channel has a buffer of 256. If `follow=false` and the log output exceeds 256 lines, `readAllLogs` will block waiting for the consumer once the buffer is full. When the WebSocket client disconnects, the hub invokes `cancel()`, which cancels the producer goroutine's context — but `readAllLogs` does not honour context cancellation, so the producer goroutine hangs indefinitely. Because `defer close(lines)` is never reached, the consumer goroutines also hang permanently on `for line := range lines`.

The fix is straightforward: accept `ctx context.Context` in `readAllLogs` and select between `logsChan <- line` and `<-ctx.Done()`.

```go
func readAllLogs(ctx context.Context, logs io.ReadCloser, logsChan chan<- string) error {
    stdoutBuf := &strings.Builder{}
    stderrBuf := &strings.Builder{}

    _, err := stdcopy.StdCopy(stdoutBuf, stderrBuf, logs)
    if err != nil && !errors.Is(err, io.EOF) {
        return fmt.Errorf("failed to demultiplex logs: %w", err)
    }

    send := func(line string) bool {
        select {
        case logsChan <- line:
            return true
        case <-ctx.Done():
            return false
        }
    }

    for line := range strings.SplitSeq(strings.TrimRight(stdoutBuf.String(), "\n"), "\n") {
        if line != "" && !send(line) {
            return ctx.Err()
        }
    }
    for line := range strings.SplitSeq(strings.TrimRight(stderrBuf.String(), "\n"), "\n") {
        if line != "" && !send("[STDERR] "+line) {
            return ctx.Err()
        }
    }
    return nil
}
```

The caller in `StreamServiceLogs` would need to pass `ctx` as well.


github-actions bot commented:

This pull request has merge conflicts. Please resolve the conflicts so the PR can stay up-to-date and reviewed.

3 similar comments from github-actions followed.


Labels: None yet

Projects: None yet

Development: Successfully merging this pull request may close these issues: ⚡️ Feature: Swarm Support

2 participants