-
Notifications
You must be signed in to change notification settings - Fork 23
feat : satellite heartbeat #209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
… reporting Signed-off-by: meethereum <[email protected]>
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds satellite heartbeat/status reporting: new POST Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Sat as Satellite (startup)
participant SchState as StateScheduler
participant SchSR as StatusReportingScheduler
participant ProcState as FetchAndReplicateStateProcess
participant ProcSR as StatusReportingProcess
participant GC as Ground-Control
rect rgb(245,250,255)
Sat->>SchState: NewSchedulerWithInterval(..., &UpstreamInfo)
Sat->>SchSR: NewSchedulerWithInterval(..., &UpstreamInfo)
end
par Periodic state replication
SchState->>ProcState: Execute(ctx, upstreamPayload)
ProcState->>ProcState: fetch state/config, set digests & CurrentActivity
and Periodic status reporting
SchSR->>ProcSR: Execute(ctx, upstreamPayload)
ProcSR->>ProcSR: collect metrics, merge upstreamPayload -> StatusReportParams
ProcSR->>GC: POST /satellites/status (JSON)
GC-->>ProcSR: 2xx / error
end
sequenceDiagram
autonumber
participant Client as Satellite/process
participant GC as Ground-Control
participant Handler as statusReportHandler
Client->>GC: POST /satellites/status {StatusReportParams}
GC->>Handler: dispatch
Handler->>Handler: decode JSON -> SatelliteStatusParams
alt decode ok
Handler-->>Client: 200 OK
else decode error
Handler-->>Client: 400 Bad Request
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Areas needing extra attention:
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/state/state_process.go (1)
148-190: Fix concurrent writes to shared upstream payload
payloadis mutated from multiple goroutines (state fetchers, config fetcher, and the main loop) without any synchronization. When the deferred send executes, those goroutines may still be writing topayload, so we hit a data race and the final snapshot we emit is undefined. Please confine all mutations to a single goroutine (e.g., store per-goroutine status locally and updatepayloadin the main loop, or guard the struct with a mutex) before sending it upstream.Also applies to: 208-279, 302-329
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
ground-control/internal/server/routes.go(1 hunks)ground-control/internal/server/satellite_handlers.go(3 hunks)internal/satellite/satellite.go(2 hunks)internal/scheduler/process.go(1 hunks)internal/scheduler/scheduler.go(4 hunks)internal/state/registration_process.go(3 hunks)internal/state/report.go(1 hunks)internal/state/reporting_process.go(1 hunks)internal/state/state_process.go(12 hunks)pkg/config/config.go(1 hunks)pkg/config/constants.go(1 hunks)pkg/config/getters.go(1 hunks)pkg/config/validate.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
internal/state/state_process.go (3)
internal/scheduler/scheduler.go (4)
UpstreamInfo(13-18)ActivityEncounteredError(33-33)ActivityReconcilingState(34-34)ActivityStateSynced(32-32)internal/state/state.go (1)
SatelliteState(27-30)pkg/config/config.go (1)
Config(37-42)
internal/scheduler/process.go (1)
internal/scheduler/scheduler.go (1)
UpstreamInfo(13-18)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
ConfigManager(29-37)internal/scheduler/scheduler.go (2)
UpstreamInfo(13-18)ParseEveryExpr(151-160)internal/logger/logger.go (1)
FromContext(85-94)internal/state/report.go (1)
StatusReportParams(16-26)
internal/state/registration_process.go (1)
internal/scheduler/scheduler.go (1)
UpstreamInfo(13-18)
internal/scheduler/scheduler.go (1)
internal/scheduler/process.go (1)
Process(8-20)
pkg/config/validate.go (2)
pkg/config/config.go (1)
HeartbeatConfig(32-35)pkg/config/constants.go (1)
DefaultStateReportCronExpr(20-20)
ground-control/internal/server/satellite_handlers.go (3)
ground-control/internal/server/server.go (1)
Server(18-22)ground-control/internal/server/helpers.go (1)
DecodeRequestBody(223-231)ground-control/internal/server/handleResponse.go (1)
HandleAppError(26-35)
pkg/config/getters.go (2)
pkg/config/manager.go (1)
ConfigManager(29-37)pkg/config/config.go (1)
HeartbeatConfig(32-35)
internal/satellite/satellite.go (4)
internal/scheduler/scheduler.go (2)
UpstreamInfo(13-18)NewSchedulerWithInterval(38-55)internal/state/state_process.go (1)
NewFetchAndReplicateStateProcess(39-46)internal/state/registration_process.go (1)
NewZtrProcess(33-40)internal/state/reporting_process.go (1)
NewStatusReportingProcess(24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: e2e-tests
- GitHub Check: test-release
- GitHub Check: build-ground-control
- GitHub Check: lint
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (7)
pkg/config/validate.go (2)
99-101: Add a warning for invalid heartbeat schedule (align with other intervals).Other intervals append a warning when defaulting. Do the same for HeartbeatConfig.StateReportInterval for consistency and operator visibility.
[ suggest_recommended_refactor ]if !isValidCronExpression(config.HeartbeatConfig.StateReportInterval) { config.HeartbeatConfig.StateReportInterval = DefaultStateReportCronExpr + warnings = append(warnings, fmt.Sprintf( + "invalid schedule provided for heartbeat.state_report_interval, using default schedule %s", + DefaultStateReportCronExpr, + )) }
106-112: Cron validator likely rejects '@every …' descriptors; broaden parsing.Defaults like "@every 00h05m00s" are descriptor specs and may not pass cron.ParseStandard. Update validator to accept standard fields and descriptors to avoid false negatives.
[ suggest_essential_refactor ]func isValidCronExpression(cronExpression string) bool { - if _, err := cron.ParseStandard(cronExpression); err != nil { - return false - } - return true + if cronExpression == "" { + return false + } + // 1) Try standard spec (5/6 fields). + if _, err := cron.ParseStandard(cronExpression); err == nil { + return true + } + // 2) Try descriptor-enabled parser (supports @every, @hourly, etc.). + p := cron.NewParser(cron.Minute | cron.Hour | cron.Dom | cron.Month | cron.Dow | cron.Descriptor | cron.SecondOptional) + if _, err := p.Parse(cronExpression); err == nil { + return true + } + return false }ground-control/internal/server/routes.go (1)
40-41: Endpoint name mismatch with objectives; confirm final path.Objectives call for POST /satellites/sync, but the route added is POST /satellites/status. Please confirm the intended path and align satellite client, docs, and ground-control accordingly.
[ request_verification ]If the intent is “heartbeat,” consider /satellites/heartbeat or keep /satellites/sync but reflect it everywhere.
pkg/config/constants.go (1)
20-22: Descriptor defaults look fine; consider simpler durations and brief doc."@every 00h05m00s" and "@every 00h10m00s" are valid durations; opt for "@every 5m" and "@every 10m" for readability, and document these must be time.ParseDuration-compatible.
[ suggest_optional_refactor ]pkg/config/getters.go (1)
145-155: Getters look good; verify usage wired into scheduler/reporting.The thread-safe getters follow existing pattern. Ensure the scheduler/heartbeat process actually consumes these.
[ approve_code_changes ]
[ request_verification ]#!/bin/bash # Find call sites for the new getters. rg -nP -C2 '\bGetStateReportingInterval\s*\(' rg -nP -C2 '\bIsHeartbeatDisabled\s*\('internal/state/registration_process.go (2)
42-54: Signature change acknowledged; ensure all Process implementations are updated.Execute now accepts upstreamPayload chan scheduler.UpstreamInfo. Confirm all process types and the scheduler invocations are updated to the new signature to avoid compile-time or runtime mismatches.
[ approve_code_changes ]
[ request_verification ]#!/bin/bash # Verify interface and implementations use the new Execute signature. rg -nP -C2 'type\s+Process\s+interface' rg -nP -C2 '\bExecute\s*\(\s*ctx\s+context\.Context\s*,\s*upstreamPayload\s+chan\s+scheduler\.UpstreamInfo\s*\)\s*error' rg -nP -C2 '\.\s*Execute\('
136-159: Close response body and add a timeout to avoid hangs and leaks.HTTP client has no timeout and the response body isn’t closed (also on non-200 paths), risking leaked connections and blocked goroutines.
[ raise_critical_issue ]Add import:
import ( "context" "encoding/json" "fmt" "net/http" + "time" "strings" "sync"Harden request:
func registerSatellite(groundControlURL, path, token string, ctx context.Context) (config.StateConfig, error) { ztrURL := fmt.Sprintf("%s/%s/%s", groundControlURL, path, token) - client := &http.Client{} + client := &http.Client{Timeout: 10 * time.Second} // Create a new request for the Zero Touch Registration of satellite - req, err := http.NewRequestWithContext(ctx, http.MethodGet, ztrURL, nil) + // Ensure a bounded lifetime even if caller ctx has no deadline. + ctxReq, cancel := context.WithTimeout(ctx, 10*time.Second) + defer cancel() + req, err := http.NewRequestWithContext(ctxReq, http.MethodGet, ztrURL, nil) if err != nil { return config.StateConfig{}, fmt.Errorf("failed to create request: %w", err) } - response, err := client.Do(req) + response, err := client.Do(req) if err != nil { return config.StateConfig{}, fmt.Errorf("failed to send request: %w", err) } + defer response.Body.Close() if response.StatusCode != http.StatusOK { return config.StateConfig{}, fmt.Errorf("failed to register satellite: %s", response.Status) } var authResponse config.StateConfig if err := json.NewDecoder(response.Body).Decode(&authResponse); err != nil { return config.StateConfig{}, fmt.Errorf("failed to decode response: %w", err) } return authResponse, nil }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
9 issues found across 13 files
Prompt for AI agents (all 9 issues)
Understand the root cause of the following 9 issues and fix them.
<file name="internal/state/state_process.go">
<violation number="1" location="internal/state/state_process.go:109">
Concurrent writes to shared payload from multiple goroutines without synchronization cause a data race; guard updates or communicate via a channel.</violation>
</file>
<file name="ground-control/internal/server/satellite_handlers.go">
<violation number="1" location="ground-control/internal/server/satellite_handlers.go:356">
Handler returns without sending a response; send an explicit success JSON response for consistency and client clarity.</violation>
<violation number="2" location="ground-control/internal/server/satellite_handlers.go:364">
Use the standard logger instead of fmt.Printf for consistency with the rest of the server.</violation>
</file>
<file name="internal/state/reporting_process.go">
<violation number="1" location="internal/state/reporting_process.go:35">
Process marks itself stopped immediately after starting a long-lived goroutine, leading to incorrect IsRunning state and lifecycle handling.</violation>
<violation number="2" location="internal/state/reporting_process.go:42">
HTTP response status code is ignored; non-2xx responses should surface as errors.</violation>
<violation number="3" location="internal/state/reporting_process.go:83">
Error context is swallowed; include err in the log for diagnosis.</violation>
<violation number="4" location="internal/state/reporting_process.go:144">
HTTP client has no timeout; a hung request can block and leak resources in the heartbeat loop.</violation>
</file>
<file name="internal/satellite/satellite.go">
<violation number="1" location="internal/satellite/satellite.go:57">
Incorrect error message: refers to status reporter while handling state scheduler creation.</violation>
<violation number="2" location="internal/satellite/satellite.go:67">
Check err before appending statusReportingScheduler; appending a nil scheduler can lead to a panic when iterating and invoking methods on entries in s.schedulers.</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
ground-control/migrator/migrator.go (2)
30-37: Build the DSN safely and make sslmode configurable.Raw string formatting can break when credentials contain special chars. Also, hard‑coding sslmode=disable weakens security.
Apply:
@@ -import ( +import ( "context" "database/sql" "fmt" "log" "os" "time" + "net/url" @@ func parseDBConfig() DBConfig { - dbURL := fmt.Sprintf( - "postgres://%s:%s@%s:%s/%s?sslmode=disable", - username, - password, - HOST, - PORT, - dbName, - ) + sslmode := os.Getenv("DB_SSLMODE") + if sslmode == "" { + sslmode = "disable" // keep current default; consider "require" in prod + } + u := &url.URL{ + Scheme: "postgres", + User: url.UserPassword(username, password), + Host: fmt.Sprintf("%s:%s", HOST, PORT), + Path: dbName, + RawQuery: "sslmode=" + sslmode, + } + dbURL := u.String()
46-65: Prefer returning errors instead of log.Fatalf for testability and graceful shutdown.log.Fatalf aborts the process and skips defers; it’s hard to reuse this code in CLIs/tests/services. Consider returning errors from waitForPostgresReady, runMigrations, and DoMigrations, letting the caller decide how to handle them.
Also applies to: 67-79, 84-96
internal/state/reporting_process.go (3)
110-112: Gate execution with config (disabled flag, URL present).
CanExecutealways returns true. Consider checking heartbeat disabled flag and a non‑empty Ground Control URL so we log-and-skip early instead of starting.If getters don’t exist, I can add them; confirm desired sources of truth in config.
16-22: Tighten locking; simplify fields.
mucan be an embeddedsync.RWMutex(by value). UseRLockfor reads inIsRunning/Name, or drop locking forName(immutable).IsCompletedoesn’t needs.muat all; it only calls intocm.I can draft a minimal patch if you want this cleanup now.
Also applies to: 114-130
20-21: Remove or useDonechannel.
Doneis never used; remove it or wire it to lifecycle/teardown.Also applies to: 24-31
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ground-control/migrator/migrator.go(1 hunks)internal/state/reporting_process.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
ConfigManager(29-37)internal/scheduler/scheduler.go (2)
UpstreamInfo(13-18)ParseEveryExpr(151-160)internal/logger/logger.go (1)
FromContext(85-94)internal/state/report.go (1)
StatusReportParams(16-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: test-release
🔇 Additional comments (2)
internal/state/reporting_process.go (2)
33-95: Fix unbounded goroutine + inaccurate running state: run the receive loop inside Execute (block until ctx/channel close).Current code spawns a background goroutine every Execute, leaking receivers and racing on
isRunning. Start after gating, don’t return untilctx.Done()orupstreamcloses, fix log messages/typo, make channel receive‑only, and drop the redundantstring()cast.Apply:
-func (s *StatusReportingProcess) Execute(ctx context.Context, upstream chan scheduler.UpstreamInfo) error { - s.start() - defer s.stop() - - log := logger.FromContext(ctx).With().Str("process", s.name).Logger() - - canExecute, reason := s.CanExecute(&log) - if !canExecute { - log.Warn().Msgf("Process %s cannot execute: %s", s.name, reason) - return nil - } - log.Info().Msgf("Executing process") - - // consume upstream info continuously - go func() { - for { - select { - case <-ctx.Done(): - return - case info, ok := <-upstream: - if !ok { - return - } - var req StatusReportParams - - groundControlURL := s.cm.ResolveGroundControlURL() - - satteliteName, err := extractSatelliteNameFromURL(info.StateURL) - if err != nil { - log.Warn().Msg("Failed to parse state reporting interval") - continue - } - - duration, err := scheduler.ParseEveryExpr(s.cm.GetStateReportingInterval()) - if err != nil { - log.Warn().Msg("Failed to parse state reporting interval") - continue - } - - // populate info from upstream channel into the request - req.Name = satteliteName - req.Activity = info.CurrentActivity - req.StateReportInterval = s.cm.GetStateReportingInterval() - req.LatestStateDigest = info.LatestStateDigest - req.LatestConfigDigest = info.LatestConfigDigest - - // populate all other info which is not provided by the upstream channel, into the request - if err := collectStatusReportParams(ctx, duration, &req); err != nil { - log.Warn().Msg("Failed to populate heartbeat payload fully") - } - - if err := sendStatusReport(ctx, string(groundControlURL), &req); err != nil { - log.Warn().Msg("Failed to send status report") - } - - log.Info().Msg("Heartbeat sent to ground control successfully") - } - } - }() - - return nil -} +func (s *StatusReportingProcess) Execute(ctx context.Context, upstream <-chan scheduler.UpstreamInfo) error { + log := logger.FromContext(ctx).With().Str("process", s.name).Logger() + + canExecute, reason := s.CanExecute(&log) + if !canExecute { + log.Warn().Msgf("Process %s cannot execute: %s", s.name, reason) + return nil + } + + s.start() + defer s.stop() + log.Info().Msg("Executing process") + + for { + select { + case <-ctx.Done(): + return nil + case info, ok := <-upstream: + if !ok { + log.Info().Msg("Upstream channel closed; stopping") + return nil + } + + groundControlURL := s.cm.ResolveGroundControlURL() + + satelliteName, err := extractSatelliteNameFromURL(info.StateURL) + if err != nil { + log.Warn().Err(err).Msg("Failed to extract satellite name from state URL") + continue + } + + duration, err := scheduler.ParseEveryExpr(s.cm.GetStateReportingInterval()) + if err != nil { + log.Warn().Err(err).Msg("Failed to parse state reporting interval") + continue + } + + req := StatusReportParams{ + Name: satelliteName, + Activity: info.CurrentActivity, + StateReportInterval: s.cm.GetStateReportingInterval(), + LatestStateDigest: info.LatestStateDigest, + LatestConfigDigest: info.LatestConfigDigest, + } + + if err := collectStatusReportParams(ctx, duration, &req); err != nil { + log.Warn().Err(err).Msg("Failed to populate heartbeat payload fully") + } + + if err := sendStatusReport(ctx, groundControlURL, &req); err != nil { + log.Warn().Err(err).Msg("Failed to send status report") + continue + } + log.Info().Msg("Heartbeat sent to ground control successfully") + } + } +}
3-14: Handle non‑2xx responses and add per‑request timeout.Currently any HTTP status is treated as success and the client has no timeout. Add a timeout and fail on non‑2xx, including response body for diagnostics.
Apply:
@@ -import ( +import ( "bytes" "context" "encoding/json" "fmt" "github.com/container-registry/harbor-satellite/internal/logger" "github.com/container-registry/harbor-satellite/internal/scheduler" "github.com/container-registry/harbor-satellite/pkg/config" "github.com/rs/zerolog" + "io" "net/http" + "strings" "sync" + "time" ) @@ func sendStatusReport(ctx context.Context, groundControlURL string, req *StatusReportParams) error { - url := fmt.Sprintf("%s/satellites/status", groundControlURL) + url := strings.TrimRight(groundControlURL, "/") + "/satellites/status" @@ - client := &http.Client{} + client := &http.Client{Timeout: 10 * time.Second} resp, err := client.Do(httpReq) if err != nil { return fmt.Errorf("failed to send status report: %w", err) } defer func() { _ = resp.Body.Close() }() - return nil + if resp.StatusCode < 200 || resp.StatusCode >= 300 { + b, _ := io.ReadAll(io.LimitReader(resp.Body, 4<<10)) + return fmt.Errorf("ground control returned %s: %s", resp.Status, string(b)) + } + return nil }Also applies to: 132-156
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/satellite/satellite.go (1)
91-93: Optional: avoid shadowing the importedschedulerpackage name.Rename loop var to
schto reduce confusion.-for _, scheduler := range s.schedulers { - scheduler.Stop() +for _, sch := range s.schedulers { + sch.Stop() }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
internal/satellite/satellite.go(2 hunks)internal/state/reporting_process.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/state/reporting_process.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/satellite/satellite.go (4)
internal/scheduler/scheduler.go (2)
UpstreamInfo(13-18)NewSchedulerWithInterval(38-55)internal/state/state_process.go (1)
NewFetchAndReplicateStateProcess(39-46)internal/state/registration_process.go (1)
NewZtrProcess(33-40)internal/state/reporting_process.go (1)
NewStatusReportingProcess(24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build-ground-control
- GitHub Check: e2e-tests
- GitHub Check: test-release
🔇 Additional comments (5)
internal/satellite/satellite.go (5)
48-61: Good: error checked before append; past panic risk resolved.The state replication scheduler now checks
errbefore appending/starting. This addresses the earlier nil-append/Stop panic and log mix‑up noted in prior reviews.
34-46: LGTM: ZTR scheduler wiring and error handling look correct.Channel is passed, error handled, append and Run are in the right order.
30-31: LGTM: status reporting process instantiation.Constructor usage looks consistent with other processes.
76-77: LGTM: state scheduler started after successful creation.Order is correct.
82-83: Confirm Run contract (returns immediately vs blocks until ctx.Done).
Runreturnsctx.Err()right away; ifctxisn’t canceled, this returns nil immediately. Verify this matches caller expectations; manyRunmethods block until cancellation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
internal/state/reporting_process.go (4)
20-21: Clean up minor lifecycle nits (optional).
- Done channel is never used; remove it or close it in stop() and document semantics to avoid double-close.
- Name()/IsRunning() lock around immutable string and a bool read; consider RLock or no lock if you keep writes serialized in start/stop.
Also applies to: 110-114, 126-136
90-96: Enrich success logs with minimal context.Add satellite name and HTTP status to success logs to aid triage.
Example:
log.Info().Str("satellite", req.Name).Msg("Heartbeat sent to ground control successfully")
146-151: Consider Authorization header if Ground Control requires auth.If GC expects a bearer token, set it on the request from config (e.g., cm.Token). This likely belongs in sendStatusReport; you may pass the token as an argument or refactor sendStatusReport into a method on StatusReportingProcess.
Example change:
httpReq.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token))
120-124: Confirm IsComplete() semantics.Returning cm.IsZTRDone() for a heartbeat process may prematurely mark it “complete” and interfere with scheduler policy. If heartbeats are intended to run for the lifetime of the satellite, consider always returning false here or gating on a dedicated heartbeat lifecycle flag.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/state/reporting_process.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
ConfigManager(29-37)internal/scheduler/scheduler.go (2)
UpstreamInfo(13-18)ParseEveryExpr(151-160)internal/logger/logger.go (1)
FromContext(85-94)internal/state/report.go (1)
StatusReportParams(16-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build-ground-control
- GitHub Check: e2e-tests
- GitHub Check: test-release
🔇 Additional comments (2)
internal/state/reporting_process.go (2)
33-36: Fix unbounded background receivers; guard start() and avoid spawning a new goroutine each Execute tick.Execute returns immediately after launching a long‑lived goroutine, and start() currently always returns true. If the scheduler calls Execute periodically, you’ll accumulate receivers contending on the same channel. Guard start() so only one runner exists at a time, and keep it running until ctx is canceled or upstream closes. This preserves a single lifecycle.
Apply this minimal guard:
func (s *StatusReportingProcess) start() bool { s.mu.Lock() defer s.mu.Unlock() - s.isRunning = true - return true + if s.isRunning { + return false + } + s.isRunning = true + return true }Also applies to: 49-98, 103-108
3-14: Harden HTTP: timeout, non‑2xx handling, and safe URL join.
- No timeout on client/request; risks hangs in the loop.
- Non‑2xx responses treated as success.
- URL is concatenated; prefer url.JoinPath to avoid double slashes.
import ( "bytes" "context" "encoding/json" "fmt" "github.com/container-registry/harbor-satellite/internal/logger" "github.com/container-registry/harbor-satellite/internal/scheduler" "github.com/container-registry/harbor-satellite/pkg/config" "github.com/rs/zerolog" "net/http" + "net/url" + "io" + "time" "sync" ) @@ func sendStatusReport(ctx context.Context, groundControlURL string, req *StatusReportParams) error { - url := fmt.Sprintf("%s/satellites/status", groundControlURL) + endpoint, err := url.JoinPath(groundControlURL, "satellites", "status") + if err != nil { + return fmt.Errorf("failed to build status URL: %w", err) + } body, err := json.Marshal(req) if err != nil { return fmt.Errorf("failed to marshal status report: %w", err) } - httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewBuffer(body)) + // Per‑request timeout to avoid wedging the loop. + cctx, cancel := context.WithTimeout(ctx, 10*time.Second) + defer cancel() + httpReq, err := http.NewRequestWithContext(cctx, http.MethodPost, endpoint, bytes.NewBuffer(body)) if err != nil { return fmt.Errorf("failed to create status report request: %w", err) } httpReq.Header.Set("Content-Type", "application/json") - client := &http.Client{} + client := &http.Client{Timeout: 15 * time.Second} resp, err := client.Do(httpReq) if err != nil { return fmt.Errorf("failed to send status report: %w", err) } defer func() { _ = resp.Body.Close() }() - return nil + if resp.StatusCode < 200 || resp.StatusCode >= 300 { + // Include small body snippet for diagnostics. + b, _ := io.ReadAll(io.LimitReader(resp.Body, 4<<10)) + return fmt.Errorf("ground control returned %s: %s", resp.Status, string(bytes.TrimSpace(b))) + } + return nil }Also applies to: 138-162
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
c3a4ac8 to
25d073e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/state/registration_process.go (1)
136-159: Close response body and add client timeouts
- resp.Body is never closed (and leaked on non-200); add Close in all paths.
- http.Client has no timeout; a hung GC will block indefinitely.
Apply:
- client := &http.Client{} + client := &http.Client{Timeout: 15 * time.Second} @@ - response, err := client.Do(req) + response, err := client.Do(req) if err != nil { return config.StateConfig{}, fmt.Errorf("failed to send request: %w", err) } - if response.StatusCode != http.StatusOK { + defer response.Body.Close() + if response.StatusCode != http.StatusOK { return config.StateConfig{}, fmt.Errorf("failed to register satellite: %s", response.Status) }Add import:
import "time"
♻️ Duplicate comments (3)
internal/state/reporting_process.go (3)
128-133: Consider Authorization header if GC requires authIf Ground Control expects auth, add Authorization: Bearer (from ConfigManager) to the request.
Do we need to include the satellite token? If yes, I can wire it through sendStatusReport safely.
120-144: Add HTTP timeout and treat non‑2xx as errorsHeartbeat HTTP client has no timeout and success path ignores status code.
Apply:
- httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewBuffer(body)) + httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewBuffer(body)) if err != nil { return fmt.Errorf("failed to create status report request: %w", err) } httpReq.Header.Set("Content-Type", "application/json") - client := &http.Client{} + client := &http.Client{Timeout: 15 * time.Second} resp, err := client.Do(httpReq) if err != nil { return fmt.Errorf("failed to send status report: %w", err) } defer func() { _ = resp.Body.Close() }() + + if resp.StatusCode < 200 || resp.StatusCode >= 300 { + return fmt.Errorf("ground control non-2xx response: %s", resp.Status) + }Add import:
import "time"
100-102: Honor heartbeat_config.disabled in CanExecuteSkip execution when heartbeats are disabled:
func (s *StatusReportingProcess) CanExecute(log *zerolog.Logger) (bool, string) { - return true, fmt.Sprintf("Process %s can execute all conditions fulfilled", s.name) + if s.cm != nil && s.cm.IsHeartbeatDisabled() { + return false, "heartbeat disabled by configuration" + } + return true, fmt.Sprintf("Process %s can execute all conditions fulfilled", s.name) }
🧹 Nitpick comments (6)
internal/state/registration_process.go (1)
137-145: Avoid token-in-path for registration (security/observability concern)Embedding the token in the URL path risks leakage via logs/proxies. Prefer Authorization: Bearer and a path without secrets.
internal/scheduler/scheduler.go (1)
137-138: Nil-guard upstreamPayload when launching processesIf any scheduler is created with a nil upstreamPayload, passing it blindly can NPE in implementations. Guard:
payload := s.upstreamPayload if payload == nil { payload = &UpstreamInfo{} } if err := s.process.Execute(ctx, payload); err != nil { ... }internal/state/reporting_process.go (4)
87-92: Make start() idempotent to prevent re‑entrancyGuard re‑entry:
func (s *StatusReportingProcess) start() bool { - s.mu.Lock(); defer s.mu.Unlock() - s.isRunning = true - return true + s.mu.Lock() + defer s.mu.Unlock() + if s.isRunning { + return false + } + s.isRunning = true + return true }
41-46: Avoid double stop()You already defer s.stop(); calling it again in the CanExecute false branch is redundant and risks confusing lifecycle logs. Remove the extra s.stop().
58-63: Default interval on parse errorIf interval parse fails, use a sane default (e.g., 5m) instead of zero duration:
duration, err := scheduler.ParseEveryExpr(s.cm.GetStateReportingInterval()) if err != nil { - log.Warn().Err(err).Msg("Failed to parse state reporting interval") + log.Warn().Err(err).Msg("Failed to parse state reporting interval, defaulting to 5m") + duration = 5 * time.Minute }Add import if needed:
import "time"
52-56: Handle missing satellite name decisivelyIf extractSatelliteNameFromURL fails, skip sending this heartbeat instead of emitting an empty name.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
internal/satellite/satellite.go(2 hunks)internal/scheduler/process.go(1 hunks)internal/scheduler/scheduler.go(4 hunks)internal/state/registration_process.go(3 hunks)internal/state/reporting_process.go(1 hunks)internal/state/state_process.go(12 hunks)pkg/config/getters.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/satellite/satellite.go
- pkg/config/getters.go
🧰 Additional context used
🧬 Code graph analysis (5)
internal/state/registration_process.go (1)
internal/scheduler/scheduler.go (1)
UpstreamInfo(13-18)
internal/scheduler/scheduler.go (1)
internal/scheduler/process.go (1)
Process(8-20)
internal/scheduler/process.go (1)
internal/scheduler/scheduler.go (1)
UpstreamInfo(13-18)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
ConfigManager(29-37)internal/scheduler/scheduler.go (2)
UpstreamInfo(13-18)ParseEveryExpr(151-160)internal/logger/logger.go (1)
FromContext(85-94)internal/state/report.go (1)
StatusReportParams(16-26)
internal/state/state_process.go (3)
internal/scheduler/scheduler.go (4)
UpstreamInfo(13-18)ActivityEncounteredError(33-33)ActivityReconcilingState(34-34)ActivityStateSynced(32-32)internal/state/state.go (1)
SatelliteState(27-30)pkg/config/config.go (1)
Config(37-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: e2e-tests
- GitHub Check: test-release
- GitHub Check: build-ground-control
- GitHub Check: Codacy Static Code Analysis
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
…or-satellite into status-reporting
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (5)
internal/state/state_process.go (2)
130-190: Critical: Data race on upstreamPayload from concurrent goroutines.Multiple state fetcher goroutines write to
upstreamPayload.CurrentActivity(lines 150, 159, 171, 179) without synchronization, causing a data race. The aggregation loop (lines 273-316) also readsupstreamPayloadconcurrently.Two fixes:
- Preferred: Remove all direct writes to
upstreamPayloadfrom worker goroutines. Instead, have workers communicate results via the existingStateFetcherResultchannel, add anActivityfield to the result struct, and updateupstreamPayloadonly in the main aggregation loop (single goroutine).- Alternative: Make
UpstreamInfoconcurrency-safe by embeddingsync.RWMutexand exposingSet*/Get*methods; useLockfor writers andRLockfor readers throughout.Additionally, only set
ActivityStateSyncedafter the aggregation loop confirms all workers succeeded, not inside the config goroutine (line 270).
268-271: Major: Premature "state synced" status.Setting
ActivityStateSyncedinside the config goroutine (line 270) can announce success before state fetchers finish or errors are aggregated. Move this assignment after the aggregation loop (lines 273-316) completes successfully with no errors.ground-control/internal/server/satellite_handlers.go (1)
356-367: Address logging and response consistency.Two issues:
- Line 364:
fmt.Printfis inconsistent. Use the standardlogpackage (e.g.,log.Printf) for consistency with other handlers.- Line 365: Handler sends only a status header. For consistency with other handlers (e.g.,
listSatelliteHandler,GetSatelliteByName), send an explicit JSON response:WriteJSONResponse(w, http.StatusOK, map[string]string{"message": "Status received"})internal/state/reporting_process.go (2)
101-103: Honor heartbeat_config.disabled.
CanExecuteshould check if heartbeats are disabled via configuration:func (s *StatusReportingProcess) CanExecute(log *zerolog.Logger) (bool, string) { + if s.cm.IsHeartbeatDisabled() { + return false, "heartbeat disabled by configuration" + } return true, fmt.Sprintf("Process %s can execute all conditions fulfilled", s.name) }
121-151: Missing Authorization header; fix error message.Issue 1: The request lacks an Authorization header. If Ground Control requires authentication, add:
httpReq.Header.Set("Content-Type", "application/json") + // Add satellite token if required by Ground Control + // httpReq.Header.Set("Authorization", fmt.Sprintf("Bearer %s", s.cm.GetToken()))Issue 2: Line 147 error message is incorrect:
if resp.StatusCode < 200 || resp.StatusCode >= 300 { - return fmt.Errorf("heartbeat request timed out : %w, with status code %d", err, resp.StatusCode) + return fmt.Errorf("heartbeat request failed with status code %d", resp.StatusCode) }The error message says "timed out" but this is a status code check, not a timeout. Additionally,
errwould benilhere (HTTP request succeeded but returned non-2xx), causing a nil dereference in%w.
🧹 Nitpick comments (5)
pkg/config/validate.go (1)
89-92: Consider aligning with the existing pattern for cron field validation.The explicit empty-string check and defaulting here differs from how
StateReplicationInterval(lines 94-97) andRegisterSatelliteInterval(lines 99-102) are handled—those fields rely solely on validation to catch empty or invalid values. While having a separate message for "not specified" vs. "invalid" can improve UX, the inconsistency may confuse future maintainers.Additionally, line 91 has a space before the colon (
"defaulting to : %s"), which is a minor stylistic inconsistency.Consider one of these approaches for consistency:
Option 1: Remove the explicit empty check (lines 89-92) and rely on validation (lines 104-107) to catch both empty and invalid cases, matching the pattern used for other cron fields:
- if config.HeartbeatConfig.StateReportInterval == "" { - config.HeartbeatConfig.StateReportInterval = DefaultStateReportCronExpr - warnings = append(warnings, fmt.Sprintf("heartbeat interval not specified, defaulting to : %s", DefaultStateReportCronExpr)) - } -Then update the validation message to be clearer:
if !isValidCronExpression(config.HeartbeatConfig.StateReportInterval) { config.HeartbeatConfig.StateReportInterval = DefaultStateReportCronExpr - warnings = append(warnings, fmt.Sprintf("invalid schedule provided for state_report_interval, using default schedule %s", DefaultStateReportCronExpr)) + warnings = append(warnings, fmt.Sprintf("invalid or empty schedule provided for state_report_interval, using default schedule %s", DefaultStateReportCronExpr)) }Option 2: Add explicit empty-string checks for
StateReplicationIntervalandRegisterSatelliteIntervalto match this pattern (better UX but more code).Option 3: Keep as-is if the distinct "not specified" message is intentional for the new heartbeat feature, but fix the spacing:
- warnings = append(warnings, fmt.Sprintf("heartbeat interval not specified, defaulting to : %s", DefaultStateReportCronExpr)) + warnings = append(warnings, fmt.Sprintf("heartbeat interval not specified, defaulting to: %s", DefaultStateReportCronExpr))ground-control/internal/server/satellite_handlers.go (1)
29-39: Suggest consolidating duplicate type definitions.
SatelliteStatusParamshere duplicatesStatusReportParamsininternal/state/report.go(lines 15-25). Consider:
- Import and use
state.StatusReportParamsin the handler- Or move the shared type to a common package (e.g.,
pkg/api)This eliminates duplication and ensures consistency.
internal/state/reporting_process.go (3)
17-31: Remove unused Done channel.The
Donefield (line 21) is declared but never used. Remove it to reduce confusion, or document its intended purpose if it's for future use.
88-93: start() always returns true; consider preventing concurrent execution.If concurrent
Executecalls are a concern, implement a guard:func (s *StatusReportingProcess) start() bool { s.mu.Lock() defer s.mu.Unlock() + if s.isRunning { + return false // Already running + } s.isRunning = true return true }Otherwise, remove the return value as it's always
true.
53-86: Return on status report error to avoid misleading success log.if err := sendStatusReport(ctx, groundControlURL, &req); err != nil { - log.Warn().Err(err).Msg("Failed to send status report") + return fmt.Errorf("failed to send status report: %w", err) } log.Info().Msg("Heartbeat sent to ground control successfully")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
ground-control/internal/server/satellite_handlers.go(3 hunks)internal/state/reporting_process.go(1 hunks)internal/state/state_process.go(12 hunks)pkg/config/validate.go(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
ConfigManager(29-37)internal/scheduler/scheduler.go (2)
UpstreamInfo(13-18)ParseEveryExpr(151-160)internal/logger/logger.go (1)
FromContext(85-94)internal/state/report.go (1)
StatusReportParams(16-26)
internal/state/state_process.go (2)
internal/scheduler/scheduler.go (4)
UpstreamInfo(13-18)ActivityEncounteredError(33-33)ActivityReconcilingState(34-34)ActivityStateSynced(32-32)internal/state/state.go (1)
SatelliteState(27-30)
ground-control/internal/server/satellite_handlers.go (3)
ground-control/internal/server/server.go (1)
Server(18-22)ground-control/internal/server/helpers.go (1)
DecodeRequestBody(223-231)ground-control/internal/server/handleResponse.go (1)
HandleAppError(26-35)
pkg/config/validate.go (2)
pkg/config/config.go (1)
HeartbeatConfig(32-35)pkg/config/constants.go (1)
DefaultStateReportCronExpr(20-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: e2e-tests
- GitHub Check: build-satellite
- GitHub Check: build-ground-control
- GitHub Check: test-release
🔇 Additional comments (4)
pkg/config/validate.go (1)
104-107: LGTM!The validation logic is correct and follows the same pattern used for other cron intervals (
StateReplicationIntervalandRegisterSatelliteInterval). The default fallback and warning message are appropriate.internal/state/state_process.go (3)
62-62: LGTM: Execute signature updated to accept upstreamPayload.The signature change and initial StateURL assignment are appropriate. The main goroutine safely initializes StateURL before spawning workers.
Also applies to: 85-86
109-118: LGTM: State digest fetch and assignment.The digest fetch logic is correct. Errors are logged and
ActivityEncounteredErroris set, while execution continues for partial processing. Assignment toupstreamPayload.LatestStateDigestis safe as it occurs in the main goroutine before workers start.
209-212: LGTM: Config digest conditionally assigned.Good fix!
LatestConfigDigestis now only set whenFetchDigestsucceeds (err == nil), addressing the previous review concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (3)
internal/state/reporting_process.go (3)
33-86: Implement periodic heartbeats as required; current implementation is event-driven only.Per PR objectives (issue #207) and a past review comment, satellites must send heartbeats at a configured interval even when nothing changes. The current implementation only sends a report when
Executeis called with new upstream info, making it purely event-driven rather than periodic.As suggested in the past review, parse the interval once, create a
time.NewTicker, and send reports on each tick. You can still send immediately on upstream updates if desired, but the ticker ensures periodic reporting regardless of activity.
101-103: Honor heartbeat_config.disabled in CanExecute.Per the PR description and a past review comment, the configuration includes
heartbeat_config.disabledto allow disabling heartbeats.CanExecuteshould check this flag and returnfalsewhen heartbeats are disabled.Apply this diff to add the guard:
func (s *StatusReportingProcess) CanExecute(log *zerolog.Logger) (bool, string) { + if s.cm != nil && s.cm.IsHeartbeatDisabled() { + return false, "heartbeat disabled by configuration" + } return true, fmt.Sprintf("Process %s can execute all conditions fulfilled", s.name) }Note: Verify the exact method name on
ConfigManager(e.g.,IsHeartbeatDisabled(),GetHeartbeatDisabled(), or similar).
137-137: Add Authorization header with satellite token.Per a past review comment, if Ground Control requires authentication, the request should include an
Authorization: Bearer <token>header. Based on the relevant code snippets showingConfigManagerhas aTokenfield, retrieve and set the token.Apply this diff to add the Authorization header:
httpReq.Header.Set("Content-Type", "application/json") + if s.cm.Token != "" { + httpReq.Header.Set("Authorization", fmt.Sprintf("Bearer %s", s.cm.Token)) + }Note: The
Tokenfield is shown in the relevant code snippets frompkg/config/manager.go. Adjust the field access pattern if the token is retrieved via a getter method instead.
🧹 Nitpick comments (2)
internal/state/reporting_process.go (2)
21-21: Remove unused Done channel.The
Donefield is declared but never initialized, closed, or selected anywhere in the code.Apply this diff to remove the unused field:
type StatusReportingProcess struct { name string isRunning bool mu *sync.Mutex - Done chan struct{} cm *config.ConfigManager }
115-119: Remove unnecessary mutex in Name().The
namefield is set once in the constructor and never modified, so the mutex is unnecessary overhead for this read-only access.Apply this diff:
func (s *StatusReportingProcess) Name() string { - s.mu.Lock() - defer s.mu.Unlock() return s.name }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/state/reporting_process.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
ConfigManager(29-37)internal/scheduler/scheduler.go (2)
UpstreamInfo(13-18)ParseEveryExpr(151-160)internal/logger/logger.go (1)
FromContext(85-94)internal/state/report.go (1)
StatusReportParams(16-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: e2e-tests
- GitHub Check: build-ground-control
- GitHub Check: build-satellite
- GitHub Check: test-release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
dagger.json(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: e2e-tests
- GitHub Check: build-satellite
- GitHub Check: build-ground-control
cef9020 to
fa1f204
Compare
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
ground-control/internal/server/satellite_handlers.go (1)
364-364: Use log.Printf for consistency with the codebase.Line 364 uses
fmt.Printffor debug output, which is inconsistent with the rest of the server code that uses the standardlogpackage (e.g., lines 48, 76, 128, etc.).Based on learnings
Apply this diff:
} // todo : process the heartbeat. eg:- save latest state in db - fmt.Printf("satellite reported status : %v\n", req) + log.Printf("satellite reported status: %v", req) w.WriteHeader(http.StatusOK)
🧹 Nitpick comments (1)
ground-control/internal/server/satellite_handlers.go (1)
356-367: Return a JSON response body for client clarity and API consistency.The handler currently only sends a 200 status code without a response body (line 365). Other handlers in this file return JSON responses using
WriteJSONResponse(e.g., lines 235, 338, 353, 384, 479, 540, 668). Consider returning a simple success JSON for consistency.Apply this diff:
func (s *Server) statusReportHandler(w http.ResponseWriter, r *http.Request) { var req SatelliteStatusParams if err := DecodeRequestBody(r, &req); err != nil { log.Println(err) HandleAppError(w, err) return } // todo : process the heartbeat. eg:- save latest state in db - fmt.Printf("satellite reported status : %v\n", req) - w.WriteHeader(http.StatusOK) - + log.Printf("satellite reported status: %v", req) + WriteJSONResponse(w, http.StatusOK, map[string]string{"message": "Status received"}) }If your clients expect an empty 200 response, please confirm and I can mark this as resolved.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
ground-control/go.sumis excluded by!**/*.sum
📒 Files selected for processing (5)
.dagger/e2e.go(2 hunks)go.mod(5 hunks)ground-control/go.mod(2 hunks)ground-control/internal/server/satellite_handlers.go(3 hunks)pkg/config/config.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ground-control/internal/server/satellite_handlers.go (3)
ground-control/internal/server/server.go (1)
Server(18-22)ground-control/internal/server/helpers.go (1)
DecodeRequestBody(223-231)ground-control/internal/server/handleResponse.go (1)
HandleAppError(26-35)
🪛 OSV Scanner (2.2.3)
ground-control/go.mod
[HIGH] 250-250: github.com/golang-jwt/jwt 3.2.2+incompatible: Excessive memory allocation during header parsing in github.com/golang-jwt/jwt
(GO-2025-3553)
[HIGH] 250-250: github.com/golang-jwt/jwt 3.2.2+incompatible: jwt-go allows excessive memory allocation during header parsing
[HIGH] 252-252: github.com/golang-jwt/jwt/v5 5.2.1: Excessive memory allocation during header parsing in github.com/golang-jwt/jwt
(GO-2025-3553)
[HIGH] 252-252: github.com/golang-jwt/jwt/v5 5.2.1: jwt-go allows excessive memory allocation during header parsing
[HIGH] 280-280: github.com/hashicorp/go-getter 1.7.8: HashiCorp go-getter Vulnerable to Symlink Attacks in github.com/hashicorp/go-getter
(GO-2025-3892)
[HIGH] 280-280: github.com/hashicorp/go-getter 1.7.8: HashiCorp go-getter Vulnerable to Symlink Attacks
[HIGH] 355-355: github.com/open-policy-agent/opa 1.2.0: OPA server Data API HTTP path injection of Rego in github.com/open-policy-agent/opa
(GO-2025-3660)
[HIGH] 355-355: github.com/open-policy-agent/opa 1.2.0: OPA server Data API HTTP path injection of Rego
[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Allows A Specially Crafted Chart Archive To Cause Out Of Memory Termination in helm.sh/helm
(GO-2025-3601)
[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Allows A Specially Crafted JSON Schema To Cause A Stack Overflow in helm.sh/helm
(GO-2025-3602)
[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm vulnerable to Code Injection through malicious chart.yaml content in helm.sh/helm
(GO-2025-3802)
[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Charts with Specific JSON Schema Values Can Cause Memory Exhaustion in helm.sh/helm
(GO-2025-3887)
[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm May Panic Due To Incorrect YAML Content in helm.sh/helm
(GO-2025-3888)
[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Allows A Specially Crafted Chart Archive To Cause Out Of Memory Termination
[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm vulnerable to Code Injection through malicious chart.yaml content
[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Allows A Specially Crafted JSON Schema To Cause A Stack Overflow
[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Charts with Specific JSON Schema Values Can Cause Memory Exhaustion
[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm May Panic Due To Incorrect YAML Content
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build-satellite
- GitHub Check: test-release
- GitHub Check: e2e-tests
🔇 Additional comments (6)
ground-control/go.mod (1)
477-477: LGTM! Replace directive aligns with monorepo structure.The replace directive correctly redirects the harbor-satellite module to the parent directory, which is appropriate for local development in a monorepo.
.dagger/e2e.go (1)
72-72: LGTM! Directory structure correctly aligned with monorepo layout.The change to mount the entire project root at
/appand set the working directory to/app/ground-controlproperly reflects the monorepo structure where ground-control is a subdirectory.Also applies to: 80-81
go.mod (2)
16-18: LGTM! Dependency updates support new status reporting features.The updates to
gopsutil/v4,viper, andtestifyalign with the PR objectives:
gopsutil/[email protected]enables system metrics collection (CPU, memory, storage) for status reports[email protected]likely supports new configuration options (heartbeat settings)[email protected]provides updated testing utilities
112-112: No action required. The verification confirms thatgithub.com/BurntSushi/tomlis directly imported atinternal/container_runtime/containerd.go:9, making it a proper direct dependency. The current entry in go.mod (without// indirect) is correct.Likely an incorrect or invalid review comment.
pkg/config/config.go (1)
25-26: LGTM! New heartbeat configuration fields are well-designed.The addition of
DisableHeartbeatandHeartbeatIntervalfields:
- Follow Go naming conventions
- Use appropriate types (bool for toggle, string for interval/cron expression)
- Include
omitemptyJSON tags for optional configuration- Are logically placed alongside other interval settings
These fields enable the satellite heartbeat status reporting feature described in the PR objectives.
ground-control/internal/server/satellite_handlers.go (1)
29-39: LGTM! SatelliteStatusParams structure is well-designed.The new type appropriately captures all necessary satellite health metrics:
- Identity (Name)
- Activity state
- Timing (StateReportInterval, RequestCreatedTime)
- State tracking (LatestStateDigest, LatestConfigDigest)
- Resource metrics (MemoryUsedBytes, StorageUsedBytes, CPUPercent)
Field types and JSON tags are appropriate. This aligns with the status reporting requirements in issue #207.
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.dagger/main.go(1 hunks).dagger/utils.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
.dagger/utils.go (1)
.dagger/main.go (2)
PROJ_MOUNT(14-14)DEFAULT_GO(13-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: cubic · AI code reviewer
- GitHub Check: build-ground-control
- GitHub Check: build-satellite
- GitHub Check: test-release
- GitHub Check: e2e-tests
- GitHub Check: Codacy Static Code Analysis
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.dagger/publish_image.go(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
.dagger/publish_image.go (1)
.dagger/main.go (1)
PROJ_MOUNT(14-14)
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
.dagger/publish_image.go (2)
24-35: Inconsistent directory handling creates confusion.The
directoryvariable is computed per-component (lines 24-31) but only used for the debug listing container (lines 32-35). The actual build at line 36 passes the fullsourcedirectory togetBuildContainer, not the per-componentdirectory. This inconsistency may confuse readers about what's actually being built.Consider either:
- Removing the per-component
directorycomputation and usingsourcedirectly for the debug container, or- Adding a comment clarifying that
directoryis only for debug listing and doesn't affect the build.Apply this diff to simplify by using
sourceconsistently:- var directory *dagger.Directory - switch { - case component == "satellite": - directory = source - case component == "ground-control": - directory = source.Directory(GROUND_CONTROL_PATH) - default: - panic(fmt.Sprintf("unknown component: %s", component)) - } dirContainer := dag.Container().From("alpine:latest"). - WithMountedDirectory(PROJ_MOUNT, directory). + WithMountedDirectory(PROJ_MOUNT, source). WithExec([]string{"ls", PROJ_MOUNT}) dirContainer.Stdout(ctx)
173-179: Add a default case for defensive programming.The switch statement lacks a default case to handle unexpected component values. While earlier validation prevents invalid components from reaching this code, adding a default case would improve consistency with the defensive pattern used in
PublishImage(lines 24-30) and make the code more robust.Apply this diff to add a default case:
var workDir string switch { case component == "satellite": workDir = PROJ_MOUNT case component == "ground-control": workDir = PROJ_MOUNT + "/ground-control" + default: + panic(fmt.Sprintf("unknown component: %s", component)) }QUICKSTART.md (1)
169-169: Use standard JSON format without inline comments.JSON does not officially support comments (including
#syntax). While some parsers may handle this leniently, strict JSON parsers will reject it, potentially causing parsing failures. Since this is example configuration, it should be valid JSON.Apply this diff to move the note into standard documentation:
- "heartbeat_interval": "@every 00h00m30s", # optional, default interval is 5 minutes + "heartbeat_interval": "@every 00h00m30s"Then add clarification in the prose above or below the code block:
> **Note:** The `heartbeat_interval` field is optional and defaults to 5 minutes. Adjust the value using the `@every HHhMMMmSSs` format as needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.dagger/publish_image.go(3 hunks)QUICKSTART.md(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
.dagger/publish_image.go (1)
.dagger/main.go (1)
PROJ_MOUNT(14-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-satellite
- GitHub Check: test-release
- GitHub Check: e2e-tests
- GitHub Check: build-ground-control
- GitHub Check: lint
🔇 Additional comments (2)
.dagger/publish_image.go (1)
36-36: Build path fix resolves previous critical issue.The change to pass the full
sourcedirectory togetBuildContainer(line 36) and mount it atPROJ_MOUNT(line 192), combined with the per-componentworkDirlogic (lines 173-179), correctly resolves the previous critical path mismatch issue for ground-control. Now:
- satellite: Full source mounted at
/app, workDir=/app, build command finds./cmd/main.go✓- ground-control: Full source mounted at
/app, workDir=/app/ground-control(exists), build command findsgo.modand sources ✓Also applies to: 192-193
QUICKSTART.md (1)
169-169: No changes needed—documentation is accurate.Verification confirms the QUICKSTART.md example is correct:
- Field name:
heartbeat_intervalis the correct field inAppConfig(confirmed inpkg/config/config.go:26)- Default value: The default is indeed 5 minutes (
@every 00h05m00s, defined inpkg/config/constants.go:20)- Documentation: The inline comment "optional, default interval is 5 minutes" accurately describes the default
- Example: The
@every 00h00m30s(30 seconds) is a valid override from the default, properly documentedThe example correctly shows how to configure a shorter heartbeat interval than the 5-minute default.
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
ground-control/internal/server/satellite_handlers.go (1)
357-375: Return JSON, use standard logger, validate inputs, and keep error handling consistent.
- Return a small JSON ack via WriteJSONResponse instead of empty 200.
- Use log.Printf, not fmt.Printf, for consistency.
- Validate req.Name and parse StateReportInterval; bound CPUPercent; unify errors via HandleAppError.
- Avoid per-request json.MarshalIndent to reduce log CPU and noise; log a compact summary.
Proposed focused change:
func (s *Server) statusReportHandler(w http.ResponseWriter, r *http.Request) { var req SatelliteStatusParams if err := DecodeRequestBody(r, &req); err != nil { log.Println(err) HandleAppError(w, err) return } - //todo: instead of just printing, save the latest state in db - heartbeat, err := json.MarshalIndent(req, "", " ") - if err != nil { - log.Printf("failed to marshal json: %v\n", err) - http.Error(w, "failed to marshal json", http.StatusInternalServerError) - return - } - - fmt.Printf("satellite reported status:\n%s\n", string(heartbeat)) - w.WriteHeader(http.StatusOK) + // Validate basic fields + if !utils.IsValidName(req.Name) { + HandleAppError(w, &AppError{Message: fmt.Sprintf(invalidNameMessage, "satellite"), Code: http.StatusBadRequest}) + return + } + if req.StateReportInterval != "" { + if _, err := time.ParseDuration(req.StateReportInterval); err != nil { + HandleAppError(w, &AppError{Message: "invalid state_report_interval", Code: http.StatusBadRequest}) + return + } + } + if req.CPUPercent < 0 || req.CPUPercent > 100 { + HandleAppError(w, &AppError{Message: "cpu_percent must be between 0 and 100", Code: http.StatusBadRequest}) + return + } + + // TODO: persist latest status into DB + log.Printf( + "satellite status: name=%s activity=%s interval=%s cpu=%.2f mem=%dB storage=%dB req_time=%s", + req.Name, req.Activity, req.StateReportInterval, req.CPUPercent, req.MemoryUsedBytes, req.StorageUsedBytes, + req.RequestCreatedTime.Format(time.RFC3339), + ) + + // Respond with a small JSON ack for client clarity + WriteJSONResponse(w, http.StatusOK, map[string]interface{}{ + "message": "ok", + "received_at": time.Now().UTC().Format(time.RFC3339), + }) }To confirm routing and auth:
#!/bin/bash # Verify route and middleware for /satellites/status rg -nP --type=go -C3 '/satellites/status' || true rg -nP --type=go -C3 'statusReportHandler\s*\(' ground-control | sed -n '1,120p'
🧹 Nitpick comments (2)
ground-control/internal/server/satellite_handlers.go (2)
4-4: Imports OK; consider dropping encoding/json if we remove pretty-printing.Looks fine. If you switch logging in statusReportHandler away from json.MarshalIndent (see suggestion below), remove the encoding/json import to avoid an unused import.
Apply if you adopt the handler change:
import ( - "encoding/json" "fmt" "log" "net/http" "os" "strconv" "time" )Also applies to: 10-10
30-40: Tighten payload schema: make optional fields omitempty; plan basic validation.
- Name should be validated (utils.IsValidName), others can be optional. Add ,omitempty so clients aren’t forced to send empty values. Consider parsing StateReportInterval to time.Duration in the handler and bounding CPUPercent to [0,100]. You can keep RequestCreatedTime as is; decode expects RFC3339.
Minimal tag tweak:
type SatelliteStatusParams struct { Name string `json:"name"` // Satellite identifier - Activity string `json:"activity"` // Current activity satellite is doing - StateReportInterval string `json:"state_report_interval"` // Interval between status reports - LatestStateDigest string `json:"latest_state_digest"` // Digest of latest state artifact - LatestConfigDigest string `json:"latest_config_digest"` // Digest of latest config artifact - MemoryUsedBytes uint64 `json:"memory_used_bytes"` // Memory currently used by satellite - StorageUsedBytes uint64 `json:"storage_used_bytes"` // Storage currently used by satellite - CPUPercent float64 `json:"cpu_percent"` // CPU usage percentage - RequestCreatedTime time.Time `json:"request_created_time"` // Timestamp of request creation + Activity string `json:"activity,omitempty"` // Current activity satellite is doing + StateReportInterval string `json:"state_report_interval,omitempty"` // Interval between status reports + LatestStateDigest string `json:"latest_state_digest,omitempty"` // Digest of latest state artifact + LatestConfigDigest string `json:"latest_config_digest,omitempty"` // Digest of latest config artifact + MemoryUsedBytes uint64 `json:"memory_used_bytes,omitempty"` // Memory currently used by satellite + StorageUsedBytes uint64 `json:"storage_used_bytes,omitempty"` // Storage currently used by satellite + CPUPercent float64 `json:"cpu_percent,omitempty"` // CPU usage percentage + RequestCreatedTime time.Time `json:"request_created_time,omitempty"` // Timestamp of request creation }Optional follow-ups:
- Consider pointer fields for metrics (e.g., *uint64, *float64) to distinguish “missing” vs “zero”.
- If aligning with CloudEvents later, map RequestCreatedTime to the CloudEvents time attribute.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ground-control/internal/server/satellite_handlers.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ground-control/internal/server/satellite_handlers.go (2)
ground-control/internal/server/helpers.go (1)
DecodeRequestBody(223-231)ground-control/internal/server/handleResponse.go (1)
HandleAppError(26-35)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: lint
- GitHub Check: e2e-tests
- GitHub Check: test-release
- GitHub Check: build-satellite
- GitHub Check: build-ground-control
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
ground-control/internal/server/satellite_handlers.go (2)
377-377: Use structured logger instead offmt.Printf.As flagged in previous reviews, use
log.Printffor consistency with the rest of the server codebase.Apply this diff:
- fmt.Printf("satellite reported status:\n%s\n", string(heartbeat)) + log.Printf("satellite reported status:\n%s\n", string(heartbeat))
378-379: Send an explicit JSON response body.As flagged in previous reviews, the handler should return a JSON response for consistency and client clarity rather than just setting the status header.
Apply this diff:
- fmt.Printf("satellite reported status:\n%s\n", string(heartbeat)) - w.WriteHeader(http.StatusOK) + log.Printf("satellite reported status:\n%s\n", string(heartbeat)) + WriteJSONResponse(w, http.StatusOK, map[string]string{"message": "Status report received"})
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ground-control/internal/server/satellite_handlers.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ground-control/internal/server/satellite_handlers.go (3)
ground-control/internal/server/server.go (1)
Server(18-22)ground-control/internal/server/helpers.go (1)
DecodeRequestBody(223-231)ground-control/internal/server/handleResponse.go (2)
HandleAppError(26-35)AppError(9-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: lint
- GitHub Check: build-satellite
- GitHub Check: test-release
- GitHub Check: e2e-tests
- GitHub Check: build-ground-control
🔇 Additional comments (3)
ground-control/internal/server/satellite_handlers.go (3)
4-4: LGTM: Import additions are appropriate.The
encoding/jsonandtimeimports are necessary for the new status reporting functionality.Also applies to: 10-10
30-40: LGTM: Well-defined status parameters struct.The struct is clearly documented with appropriate field types and JSON tags for status reporting.
357-379: Verify authentication/authorization for the status endpoint.Ensure that the POST
/satellites/statusendpoint is protected by authentication middleware to prevent unauthorized status reports.Run the following script to check the routing configuration and middleware:
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
pkg/config/manager.go (1)
132-138: Consider using distinct ConfigChangeType values for different intervals.Both
StateReplicationIntervalandHeartbeatIntervalchanges emit the sameIntervalsChangedtype. When both intervals change simultaneously, consumers receive twoConfigChangeentries with identical types, requiring them to inspectOldValue/NewValueto distinguish which interval changed.For improved clarity and type safety, consider introducing separate constants like
HeartbeatIntervalChangedandStateReplicationIntervalChanged.Apply this refactor if you want to improve type specificity:
const ( LogLevelChanged ConfigChangeType = "log_level" IntervalsChanged ConfigChangeType = "intervals" + HeartbeatIntervalChanged ConfigChangeType = "heartbeat_interval" + StateReplicationIntervalChanged ConfigChangeType = "state_replication_interval" ZotConfigChanged ConfigChangeType = "zot_config" )Then update the detection logic:
if oldConfig.AppConfig.StateReplicationInterval != newConfig.AppConfig.StateReplicationInterval { changes = append(changes, ConfigChange{ - Type: IntervalsChanged, + Type: StateReplicationIntervalChanged, OldValue: oldConfig.AppConfig.StateReplicationInterval, NewValue: newConfig.AppConfig.StateReplicationInterval, }) } if oldConfig.AppConfig.HeartbeatInterval != newConfig.AppConfig.HeartbeatInterval { changes = append(changes, ConfigChange{ - Type: IntervalsChanged, + Type: HeartbeatIntervalChanged, OldValue: oldConfig.AppConfig.HeartbeatInterval, NewValue: newConfig.AppConfig.HeartbeatInterval, }) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
cmd/main.go(2 hunks)internal/hotreload/hotreload.go(3 hunks)pkg/config/manager.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
internal/hotreload/hotreload.go (1)
internal/scheduler/scheduler.go (1)
Scheduler(21-29)
pkg/config/manager.go (1)
pkg/config/config.go (1)
AppConfig(17-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build-ground-control
- GitHub Check: e2e-tests
- GitHub Check: build-satellite
- GitHub Check: test-release
🔇 Additional comments (1)
internal/hotreload/hotreload.go (1)
23-23: LGTM!The heartbeat scheduler field, constructor parameter, and initialization follow the same pattern as the existing state replication scheduler.
Also applies to: 33-33, 40-40
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (6)
internal/hotreload/hotreload.go (1)
101-107: Critical copy-paste errors remain unfixed.This issue was already flagged in a previous review. The heartbeat scheduler block contains three critical copy-paste errors:
- Line 102: Log message incorrectly references "state replication scheduler"
- Line 103: Operates on
hrm.stateReplicationSchedulerinstead ofhrm.heartbeatScheduler- Line 105: Error message incorrectly references "state replication scheduler"
This will cause the state replication scheduler to be reset twice with different intervals, and the heartbeat scheduler will never be reset at all.
Apply the fix from the previous review:
if hrm.heartbeatScheduler != nil { - hrm.log.Info().Msg("Restarting state replication scheduler with new interval") - err := hrm.stateReplicationScheduler.ResetIntervalFromExpr(hrm.cm.GetHeartbeatInterval()) + hrm.log.Info().Msg("Restarting heartbeat scheduler with new interval") + err := hrm.heartbeatScheduler.ResetIntervalFromExpr(hrm.cm.GetHeartbeatInterval()) if err != nil { - return fmt.Errorf("unable to restart state replication scheduler: %w", err) + return fmt.Errorf("unable to restart heartbeat scheduler: %w", err) } }internal/state/reporting_process.go (5)
135-138: Add Authorization header (if GC requires it) and reuse an HTTP client.Include bearer token when configured and reuse a client to benefit from connection pooling.
httpReq.Header.Set("Content-Type", "application/json") + if sCtx, ok := ctx.Value(struct{}{}).(*struct{}); ok && sCtx == nil { _ = ok } // no-op placeholder to avoid unused warnings if you inline below + if req != nil && s != nil && s.cm != nil && s.cm.Token != "" { + httpReq.Header.Set("Authorization", fmt.Sprintf("Bearer %s", s.cm.Token)) + } - client := &http.Client{} + // TODO: inject a shared *http.Client with timeouts via StatusReportingProcess if this runs frequently. + client := &http.Client{}Would you like a small constructor change to accept a shared client?
86-91: Guard against concurrent executions.
start()always returns true, so overlapping Execute calls can run concurrently. Return false if already running.func (s *StatusReportingProcess) start() bool { s.mu.Lock() defer s.mu.Unlock() - s.isRunning = true - return true + if s.isRunning { + return false + } + s.isRunning = true + return true }
99-101: Honor heartbeat_config.disabled in CanExecute.Skip execution when heartbeats are disabled.
func (s *StatusReportingProcess) CanExecute(log *zerolog.Logger) (bool, string) { - return true, fmt.Sprintf("Process %s can execute all conditions fulfilled", s.name) + if s.cm != nil && s.cm.IsHeartbeatDisabled() { + return false, "heartbeat disabled by configuration" + } + return true, fmt.Sprintf("Process %s can execute: all conditions fulfilled", s.name) }
53-57: Return early on failures to avoid emitting misleading success logs/payloads.If satellite name extraction or metrics collection fails, return the error instead of proceeding.
satelliteName, err := extractSatelliteNameFromURL(upstream.StateURL) if err != nil { - log.Warn().Err(err).Msg("Failed to extract satellite name") - + return fmt.Errorf("extract satellite name: %w", err) } @@ if err := collectStatusReportParams(ctx, duration, &req); err != nil { - log.Warn().Err(err).Msg("Failed to collect status report parameters") - + return fmt.Errorf("collect status report params: %w", err) }Also applies to: 72-75
146-148: Fix misleading/non-sensical HTTP error; include response body for diagnostics.Message claims timeout and wraps a nil err. Report non-2xx properly.
- if resp.StatusCode < 200 || resp.StatusCode >= 300 { - return fmt.Errorf("heartbeat request timed out : %w, with status code %d", err, resp.StatusCode) - } + if resp.StatusCode < 200 || resp.StatusCode >= 300 { + return fmt.Errorf("heartbeat request failed with status code %d", resp.StatusCode) + }Optionally also read a few bytes of resp.Body and include them in the error for easier debugging.
🧹 Nitpick comments (3)
internal/satellite/satellite.go (1)
91-93: Minor: avoid shadowing package name in Stop().
for _, scheduler := range s.schedulers { scheduler.Stop() }shadows theschedulerpackage. Rename loop variable for clarity.- for _, scheduler := range s.schedulers { - scheduler.Stop() + for _, sch := range s.schedulers { + sch.Stop()internal/state/reporting_process.go (2)
42-47: Remove redundant stop() call; already deferred.You call
s.stop()here and also viadefer s.stop(). Drop the explicit call.if !canExecute { log.Warn().Msgf("Process %s cannot execute: %s", s.name, reason) - s.stop() return nil }
17-23: Remove unused field.
Done chan struct{}isn’t used; delete to keep the type lean.type StatusReportingProcess struct { name string isRunning bool mu *sync.Mutex - Done chan struct{} cm *config.ConfigManager }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
internal/hotreload/hotreload.go(3 hunks)internal/satellite/satellite.go(2 hunks)internal/state/reporting_process.go(1 hunks)pkg/config/getters.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/config/getters.go
🧰 Additional context used
🧬 Code graph analysis (3)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
ConfigManager(29-37)internal/scheduler/scheduler.go (2)
UpstreamInfo(13-18)ParseEveryExpr(151-160)internal/logger/logger.go (1)
FromContext(85-94)internal/state/report.go (1)
StatusReportParams(16-26)
internal/satellite/satellite.go (2)
internal/scheduler/scheduler.go (2)
UpstreamInfo(13-18)NewSchedulerWithInterval(38-55)internal/state/reporting_process.go (1)
NewStatusReportingProcess(25-31)
internal/hotreload/hotreload.go (1)
internal/scheduler/scheduler.go (1)
Scheduler(21-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build-ground-control
- GitHub Check: build-satellite
- GitHub Check: e2e-tests
- GitHub Check: test-release
🔇 Additional comments (2)
internal/hotreload/hotreload.go (1)
23-23: LGTM! Heartbeat scheduler field addition is clean.The heartbeatScheduler field, parameter, and initialization follow the same pattern as the existing stateReplicationScheduler. The changes are consistent and well-structured.
Also applies to: 33-33, 40-40
internal/satellite/satellite.go (1)
82-83: Review comment is incorrect; no changes needed.The review misunderstands the intended design.
Satellite.Run()is correctly non-blocking: it starts background scheduler goroutines and returns immediately. This pattern allows the caller (cmd/main.go) to:
- Initialize all components after
Run()returns (line 154–157)- Manage blocking separately with
<-ctx.Done()(line 160)Adding
<-ctx.Done()insideRun()would prevent line 154–157 from executing, breaking the initialization sequence. The current implementation is a well-established pattern whereRun()starts background work and returns while the caller handles blocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal/hotreload/hotreload.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/hotreload/hotreload.go (1)
internal/scheduler/scheduler.go (1)
Scheduler(21-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: test-release
- GitHub Check: build-ground-control
- GitHub Check: build-satellite
🔇 Additional comments (1)
internal/hotreload/hotreload.go (1)
23-23: LGTM! Field, parameter, and initialization are correct.The heartbeat scheduler integration follows the same pattern as the state replication scheduler. The field declaration, constructor parameter, and initialization are all properly implemented.
Also applies to: 33-33, 40-40
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
Signed-off-by: meethereum <[email protected]>
fixes #207
Summary by cubic
Adds satellite heartbeat status reporting. Satellites send activity, digests, and resource usage to Ground Control at a configurable interval via a new endpoint.
New Features
Migration
Summary by CodeRabbit
New Features
Bug Fixes
Chores