Skip to content

Conversation

@meethereum
Copy link
Collaborator

@meethereum meethereum commented Sep 29, 2025

fixes #207


Summary by cubic

Adds satellite heartbeat status reporting. Satellites send activity, digests, and resource usage to Ground Control at a configurable interval via a new endpoint.

  • New Features

    • StatusReportingProcess and scheduler added; runs unless heartbeat_config.disabled is true.
    • Ground Control: POST /satellites/status handler accepts SatelliteStatusParams (logs payload for now).
    • Status payload includes activity, latest state/config digests, CPU%, memory used, storage used, and timestamp.
    • Scheduler supports an upstream info channel; state replication publishes digests/activity; reporter consumes and sends heartbeats.
    • Config: new heartbeat_config with state_report_interval (default 5m) and disable flag; validation and getters added.
  • Migration

    • Optional: set heartbeat_config.state_report_interval or heartbeat_config.disabled in satellite config to customize or disable heartbeats.

Summary by CodeRabbit

  • New Features

    • Satellites send scheduled, detailed status reports (CPU, memory, storage, activity, digests, timestamp) to Ground Control; reporting runs alongside existing schedulers and can be enabled/disabled via config.
    • Scheduler framework now propagates shared heartbeat/status context to scheduled tasks.
  • Bug Fixes

    • Config validation defaults empty/invalid report intervals to a safe schedule and emits warnings.
  • Chores

    • Added heartbeat config fields, quickstart docs updated, and build/dependency/tooling adjustments.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 29, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds satellite heartbeat/status reporting: new POST /satellites/status endpoint and handler; introduces StatusReportingProcess and UpstreamInfo propagation through schedulers and processes; collects runtime metrics and posts periodic status reports; adds heartbeat config/defaults/getters; wires heartbeat scheduler into startup and hot-reload; updates Dagger build workdir and dependency graph.

Changes

Cohort / File(s) Summary of edits
Ground-control API route & handler
ground-control/internal/server/routes.go, ground-control/internal/server/satellite_handlers.go
Adds POST /satellites/status route and statusReportHandler; introduces SatelliteStatusParams decoding, JSON logging, and 200/400 responses.
Scheduler core & API
internal/scheduler/scheduler.go, internal/scheduler/process.go
Adds UpstreamInfo type and activity constants; Scheduler gains upstreamPayload *UpstreamInfo; exports ParseEveryExpr; NewSchedulerWithInterval now accepts upstreamPayload; changes Process.ExecuteExecute(ctx, upstreamPayload *UpstreamInfo) and passes payload to processes.
State: reporting & metrics
internal/state/report.go, internal/state/reporting_process.go
Adds exported StatusReportParams and helpers to collect CPU/memory/storage; adds StatusReportingProcess that builds and POSTs status reports to ground-control /satellites/status.
State: existing processes updated
internal/state/state_process.go, internal/state/registration_process.go
Update Execute signatures to accept *scheduler.UpstreamInfo; FetchAndReplicateStateProcess populates upstreamPayload fields and sets CurrentActivity on success/error paths.
Satellite wiring
internal/satellite/satellite.go
Introduces shared heartbeatPayload (scheduler.UpstreamInfo), instantiates StatusReportingProcess and scheduler, wires payload into ZTR and state schedulers, and conditionally starts status reporting based on config.
Config: types, getters, defaults
pkg/config/config.go, pkg/config/constants.go, pkg/config/getters.go, pkg/config/validate.go
Adds DisableHeartbeat and HeartbeatInterval to AppConfig; adds DefaultStateReportCronExpr; adds getters GetHeartbeatInterval() and IsHeartbeatDisabled(); validates/defaults heartbeat interval.
Hot reload & interval handling
internal/hotreload/hotreload.go, cmd/main.go, pkg/config/manager.go
HotReloadManager accepts optional heartbeat scheduler and restarts it on interval changes; cmd/main.go wiring updated to pass heartbeat scheduler; config manager detects heartbeat interval changes.
Config & quickstart docs
QUICKSTART.md
Adds heartbeat_interval example to app_config.
Dependencies & Dagger build changes
go.mod, ground-control/go.mod, dagger.json, .dagger/e2e.go, .dagger/main.go, .dagger/utils.go, .dagger/publish_image.go
Adds gopsutil/v4, viper, testify and many indirect deps; updates module graph; updates Dagger engine version and changes container workdir handling to mount/run ground-control subdir and pass full source dir to builders.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Sat as Satellite (startup)
  participant SchState as StateScheduler
  participant SchSR as StatusReportingScheduler
  participant ProcState as FetchAndReplicateStateProcess
  participant ProcSR as StatusReportingProcess
  participant GC as Ground-Control

  rect rgb(245,250,255)
    Sat->>SchState: NewSchedulerWithInterval(..., &UpstreamInfo)
    Sat->>SchSR: NewSchedulerWithInterval(..., &UpstreamInfo)
  end

  par Periodic state replication
    SchState->>ProcState: Execute(ctx, upstreamPayload)
    ProcState->>ProcState: fetch state/config, set digests & CurrentActivity
  and Periodic status reporting
    SchSR->>ProcSR: Execute(ctx, upstreamPayload)
    ProcSR->>ProcSR: collect metrics, merge upstreamPayload -> StatusReportParams
    ProcSR->>GC: POST /satellites/status (JSON)
    GC-->>ProcSR: 2xx / error
  end
Loading
sequenceDiagram
  autonumber
  participant Client as Satellite/process
  participant GC as Ground-Control
  participant Handler as statusReportHandler

  Client->>GC: POST /satellites/status {StatusReportParams}
  GC->>Handler: dispatch
  Handler->>Handler: decode JSON -> SatelliteStatusParams
  alt decode ok
    Handler-->>Client: 200 OK
  else decode error
    Handler-->>Client: 400 Bad Request
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas needing extra attention:

  • Process.Execute signature change and all impacted implementations/call sites.
  • Concurrency and correctness of sharing/mutation of a single *UpstreamInfo among schedulers/processes.
  • Proper propagation of digests and CurrentActivity in state processes under error and success paths.
  • HTTP client behavior and error handling in StatusReportingProcess.sendStatusReport and ground-control handler decoding.
  • Hot-reload restart logic for heartbeat scheduler and config defaulting/validation interactions.
  • Dagger workdir changes impact on CI/E2E build steps.

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • bupd
  • Vad1mo

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Linked Issues Check ⚠️ Warning The PR implements core status reporting functionality including the StatusReportingProcess, configurable heartbeat intervals, and a Ground Control endpoint with the required status payload fields (activity, digests, CPU%, memory, storage, timestamp). However, issue #207 explicitly specifies that the event format must use the CloudEvents specification ("chosen spec : https://cloudevents.io/"), which is a clear coding requirement that is not implemented in this PR. The PR also does not implement webhook/user-defined endpoint delivery mechanisms specified in #207, though the PR notes some details remain TBD. While the core heartbeat functionality to Ground Control is present, the explicit requirement to wrap status in CloudEvents format is absent. The PR should implement the CloudEvents specification for the status payload format as explicitly required in issue #207. This requires wrapping the status report data in a CloudEvents-compliant structure with appropriate context attributes (type, source, subject, etc.). Additionally, clarify whether webhook/user-defined endpoint support mentioned in #207 is deferred to a follow-up PR or genuinely out of scope for this implementation.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "feat : satellite heartbeat" directly and clearly describes the primary change being introduced in the changeset. The title accurately summarizes the main feature—implementing satellite heartbeat/status reporting functionality. The title is concise, specific enough for a teammate to understand the key change when scanning history, and accurately reflects the substantial modifications across multiple files that introduce the StatusReportingProcess, status reporting configuration, and Ground Control endpoint for receiving satellite status updates.
Out of Scope Changes Check ✅ Passed The vast majority of changes are directly related to implementing the satellite heartbeat feature: StatusReportingProcess, status payload definitions, scheduler modifications to support upstream info propagation, configuration additions for heartbeat control, and the Ground Control status endpoint. Build system changes in dagger configuration and go.mod dependency updates appear to be supporting infrastructure necessary for the new ground-control module structure that hosts the heartbeat endpoint. These supporting changes, while peripheral to the core feature logic, are reasonable accompaniments to integrating the new Ground Control endpoint into the project.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4f9134 and ab13639.

📒 Files selected for processing (2)
  • QUICKSTART.md (1 hunks)
  • pkg/config/constants.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • QUICKSTART.md
  • pkg/config/constants.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build-satellite
  • GitHub Check: e2e-tests
  • GitHub Check: build-ground-control
  • GitHub Check: test-release
  • GitHub Check: lint
  • GitHub Check: Codacy Static Code Analysis

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: meethereum <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/state/state_process.go (1)

148-190: Fix concurrent writes to shared upstream payload

payload is mutated from multiple goroutines (state fetchers, config fetcher, and the main loop) without any synchronization. When the deferred send executes, those goroutines may still be writing to payload, so we hit a data race and the final snapshot we emit is undefined. Please confine all mutations to a single goroutine (e.g., store per-goroutine status locally and update payload in the main loop, or guard the struct with a mutex) before sending it upstream.

Also applies to: 208-279, 302-329

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c1233a6 and b013bd1.

📒 Files selected for processing (13)
  • ground-control/internal/server/routes.go (1 hunks)
  • ground-control/internal/server/satellite_handlers.go (3 hunks)
  • internal/satellite/satellite.go (2 hunks)
  • internal/scheduler/process.go (1 hunks)
  • internal/scheduler/scheduler.go (4 hunks)
  • internal/state/registration_process.go (3 hunks)
  • internal/state/report.go (1 hunks)
  • internal/state/reporting_process.go (1 hunks)
  • internal/state/state_process.go (12 hunks)
  • pkg/config/config.go (1 hunks)
  • pkg/config/constants.go (1 hunks)
  • pkg/config/getters.go (1 hunks)
  • pkg/config/validate.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
internal/state/state_process.go (3)
internal/scheduler/scheduler.go (4)
  • UpstreamInfo (13-18)
  • ActivityEncounteredError (33-33)
  • ActivityReconcilingState (34-34)
  • ActivityStateSynced (32-32)
internal/state/state.go (1)
  • SatelliteState (27-30)
pkg/config/config.go (1)
  • Config (37-42)
internal/scheduler/process.go (1)
internal/scheduler/scheduler.go (1)
  • UpstreamInfo (13-18)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
  • ConfigManager (29-37)
internal/scheduler/scheduler.go (2)
  • UpstreamInfo (13-18)
  • ParseEveryExpr (151-160)
internal/logger/logger.go (1)
  • FromContext (85-94)
internal/state/report.go (1)
  • StatusReportParams (16-26)
internal/state/registration_process.go (1)
internal/scheduler/scheduler.go (1)
  • UpstreamInfo (13-18)
internal/scheduler/scheduler.go (1)
internal/scheduler/process.go (1)
  • Process (8-20)
pkg/config/validate.go (2)
pkg/config/config.go (1)
  • HeartbeatConfig (32-35)
pkg/config/constants.go (1)
  • DefaultStateReportCronExpr (20-20)
ground-control/internal/server/satellite_handlers.go (3)
ground-control/internal/server/server.go (1)
  • Server (18-22)
ground-control/internal/server/helpers.go (1)
  • DecodeRequestBody (223-231)
ground-control/internal/server/handleResponse.go (1)
  • HandleAppError (26-35)
pkg/config/getters.go (2)
pkg/config/manager.go (1)
  • ConfigManager (29-37)
pkg/config/config.go (1)
  • HeartbeatConfig (32-35)
internal/satellite/satellite.go (4)
internal/scheduler/scheduler.go (2)
  • UpstreamInfo (13-18)
  • NewSchedulerWithInterval (38-55)
internal/state/state_process.go (1)
  • NewFetchAndReplicateStateProcess (39-46)
internal/state/registration_process.go (1)
  • NewZtrProcess (33-40)
internal/state/reporting_process.go (1)
  • NewStatusReportingProcess (24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: e2e-tests
  • GitHub Check: test-release
  • GitHub Check: build-ground-control
  • GitHub Check: lint
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (7)
pkg/config/validate.go (2)

99-101: Add a warning for invalid heartbeat schedule (align with other intervals).

Other intervals append a warning when defaulting. Do the same for HeartbeatConfig.StateReportInterval for consistency and operator visibility.
[ suggest_recommended_refactor ]

 if !isValidCronExpression(config.HeartbeatConfig.StateReportInterval) {
     config.HeartbeatConfig.StateReportInterval = DefaultStateReportCronExpr
+    warnings = append(warnings, fmt.Sprintf(
+        "invalid schedule provided for heartbeat.state_report_interval, using default schedule %s",
+        DefaultStateReportCronExpr,
+    ))
 }

106-112: Cron validator likely rejects '@every …' descriptors; broaden parsing.

Defaults like "@every 00h05m00s" are descriptor specs and may not pass cron.ParseStandard. Update validator to accept standard fields and descriptors to avoid false negatives.
[ suggest_essential_refactor ]

 func isValidCronExpression(cronExpression string) bool {
-    if _, err := cron.ParseStandard(cronExpression); err != nil {
-        return false
-    }
-    return true
+    if cronExpression == "" {
+        return false
+    }
+    // 1) Try standard spec (5/6 fields).
+    if _, err := cron.ParseStandard(cronExpression); err == nil {
+        return true
+    }
+    // 2) Try descriptor-enabled parser (supports @every, @hourly, etc.).
+    p := cron.NewParser(cron.Minute | cron.Hour | cron.Dom | cron.Month | cron.Dow | cron.Descriptor | cron.SecondOptional)
+    if _, err := p.Parse(cronExpression); err == nil {
+        return true
+    }
+    return false
 }
ground-control/internal/server/routes.go (1)

40-41: Endpoint name mismatch with objectives; confirm final path.

Objectives call for POST /satellites/sync, but the route added is POST /satellites/status. Please confirm the intended path and align satellite client, docs, and ground-control accordingly.
[ request_verification ]

If the intent is “heartbeat,” consider /satellites/heartbeat or keep /satellites/sync but reflect it everywhere.

pkg/config/constants.go (1)

20-22: Descriptor defaults look fine; consider simpler durations and brief doc.

"@every 00h05m00s" and "@every 00h10m00s" are valid durations; opt for "@every 5m" and "@every 10m" for readability, and document these must be time.ParseDuration-compatible.
[ suggest_optional_refactor ]

pkg/config/getters.go (1)

145-155: Getters look good; verify usage wired into scheduler/reporting.

The thread-safe getters follow existing pattern. Ensure the scheduler/heartbeat process actually consumes these.
[ approve_code_changes ]
[ request_verification ]

#!/bin/bash
# Find call sites for the new getters.
rg -nP -C2 '\bGetStateReportingInterval\s*\('
rg -nP -C2 '\bIsHeartbeatDisabled\s*\('
internal/state/registration_process.go (2)

42-54: Signature change acknowledged; ensure all Process implementations are updated.

Execute now accepts upstreamPayload chan scheduler.UpstreamInfo. Confirm all process types and the scheduler invocations are updated to the new signature to avoid compile-time or runtime mismatches.
[ approve_code_changes ]
[ request_verification ]

#!/bin/bash
# Verify interface and implementations use the new Execute signature.
rg -nP -C2 'type\s+Process\s+interface'
rg -nP -C2 '\bExecute\s*\(\s*ctx\s+context\.Context\s*,\s*upstreamPayload\s+chan\s+scheduler\.UpstreamInfo\s*\)\s*error'
rg -nP -C2 '\.\s*Execute\('

136-159: Close response body and add a timeout to avoid hangs and leaks.

HTTP client has no timeout and the response body isn’t closed (also on non-200 paths), risking leaked connections and blocked goroutines.
[ raise_critical_issue ]

Add import:

 import (
   "context"
   "encoding/json"
   "fmt"
   "net/http"
+  "time"
   "strings"
   "sync"

Harden request:

 func registerSatellite(groundControlURL, path, token string, ctx context.Context) (config.StateConfig, error) {
   ztrURL := fmt.Sprintf("%s/%s/%s", groundControlURL, path, token)
-  client := &http.Client{}
+  client := &http.Client{Timeout: 10 * time.Second}
 
   // Create a new request for the Zero Touch Registration of satellite
-  req, err := http.NewRequestWithContext(ctx, http.MethodGet, ztrURL, nil)
+  // Ensure a bounded lifetime even if caller ctx has no deadline.
+  ctxReq, cancel := context.WithTimeout(ctx, 10*time.Second)
+  defer cancel()
+  req, err := http.NewRequestWithContext(ctxReq, http.MethodGet, ztrURL, nil)
   if err != nil {
     return config.StateConfig{}, fmt.Errorf("failed to create request: %w", err)
   }
-  response, err := client.Do(req)
+  response, err := client.Do(req)
   if err != nil {
     return config.StateConfig{}, fmt.Errorf("failed to send request: %w", err)
   }
+  defer response.Body.Close()
   if response.StatusCode != http.StatusOK {
     return config.StateConfig{}, fmt.Errorf("failed to register satellite: %s", response.Status)
   }
 
   var authResponse config.StateConfig
   if err := json.NewDecoder(response.Body).Decode(&authResponse); err != nil {
     return config.StateConfig{}, fmt.Errorf("failed to decode response: %w", err)
   }
 
   return authResponse, nil
 }

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9 issues found across 13 files

Prompt for AI agents (all 9 issues)

Understand the root cause of the following 9 issues and fix them.


<file name="internal/state/state_process.go">

<violation number="1" location="internal/state/state_process.go:109">
Concurrent writes to shared payload from multiple goroutines without synchronization cause a data race; guard updates or communicate via a channel.</violation>
</file>

<file name="ground-control/internal/server/satellite_handlers.go">

<violation number="1" location="ground-control/internal/server/satellite_handlers.go:356">
Handler returns without sending a response; send an explicit success JSON response for consistency and client clarity.</violation>

<violation number="2" location="ground-control/internal/server/satellite_handlers.go:364">
Use the standard logger instead of fmt.Printf for consistency with the rest of the server.</violation>
</file>

<file name="internal/state/reporting_process.go">

<violation number="1" location="internal/state/reporting_process.go:35">
Process marks itself stopped immediately after starting a long-lived goroutine, leading to incorrect IsRunning state and lifecycle handling.</violation>

<violation number="2" location="internal/state/reporting_process.go:42">
HTTP response status code is ignored; non-2xx responses should surface as errors.</violation>

<violation number="3" location="internal/state/reporting_process.go:83">
Error context is swallowed; include err in the log for diagnosis.</violation>

<violation number="4" location="internal/state/reporting_process.go:144">
HTTP client has no timeout; a hung request can block and leak resources in the heartbeat loop.</violation>
</file>

<file name="internal/satellite/satellite.go">

<violation number="1" location="internal/satellite/satellite.go:57">
Incorrect error message: refers to status reporter while handling state scheduler creation.</violation>

<violation number="2" location="internal/satellite/satellite.go:67">
Check err before appending statusReportingScheduler; appending a nil scheduler can lead to a panic when iterating and invoking methods on entries in s.schedulers.</violation>
</file>

React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
ground-control/migrator/migrator.go (2)

30-37: Build the DSN safely and make sslmode configurable.

Raw string formatting can break when credentials contain special chars. Also, hard‑coding sslmode=disable weakens security.

Apply:

@@
-import (
+import (
 	"context"
 	"database/sql"
 	"fmt"
 	"log"
 	"os"
 	"time"
+	"net/url"
@@
 func parseDBConfig() DBConfig {
-	dbURL := fmt.Sprintf(
-		"postgres://%s:%s@%s:%s/%s?sslmode=disable",
-		username,
-		password,
-		HOST,
-		PORT,
-		dbName,
-	)
+	sslmode := os.Getenv("DB_SSLMODE")
+	if sslmode == "" {
+		sslmode = "disable" // keep current default; consider "require" in prod
+	}
+	u := &url.URL{
+		Scheme: "postgres",
+		User:   url.UserPassword(username, password),
+		Host:   fmt.Sprintf("%s:%s", HOST, PORT),
+		Path:   dbName,
+		RawQuery: "sslmode=" + sslmode,
+	}
+	dbURL := u.String()

46-65: Prefer returning errors instead of log.Fatalf for testability and graceful shutdown.

log.Fatalf aborts the process and skips defers; it’s hard to reuse this code in CLIs/tests/services. Consider returning errors from waitForPostgresReady, runMigrations, and DoMigrations, letting the caller decide how to handle them.

Also applies to: 67-79, 84-96

internal/state/reporting_process.go (3)

110-112: Gate execution with config (disabled flag, URL present).

CanExecute always returns true. Consider checking heartbeat disabled flag and a non‑empty Ground Control URL so we log-and-skip early instead of starting.

If getters don’t exist, I can add them; confirm desired sources of truth in config.


16-22: Tighten locking; simplify fields.

  • mu can be an embedded sync.RWMutex (by value). Use RLock for reads in IsRunning/Name, or drop locking for Name (immutable).
  • IsComplete doesn’t need s.mu at all; it only calls into cm.

I can draft a minimal patch if you want this cleanup now.

Also applies to: 114-130


20-21: Remove or use Done channel.

Done is never used; remove it or wire it to lifecycle/teardown.

Also applies to: 24-31

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b013bd1 and c62c251.

📒 Files selected for processing (2)
  • ground-control/migrator/migrator.go (1 hunks)
  • internal/state/reporting_process.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
  • ConfigManager (29-37)
internal/scheduler/scheduler.go (2)
  • UpstreamInfo (13-18)
  • ParseEveryExpr (151-160)
internal/logger/logger.go (1)
  • FromContext (85-94)
internal/state/report.go (1)
  • StatusReportParams (16-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: test-release
🔇 Additional comments (2)
internal/state/reporting_process.go (2)

33-95: Fix unbounded goroutine + inaccurate running state: run the receive loop inside Execute (block until ctx/channel close).

Current code spawns a background goroutine every Execute, leaking receivers and racing on isRunning. Start after gating, don’t return until ctx.Done() or upstream closes, fix log messages/typo, make channel receive‑only, and drop the redundant string() cast.

Apply:

-func (s *StatusReportingProcess) Execute(ctx context.Context, upstream chan scheduler.UpstreamInfo) error {
-	s.start()
-	defer s.stop()
-
-	log := logger.FromContext(ctx).With().Str("process", s.name).Logger()
-
-	canExecute, reason := s.CanExecute(&log)
-	if !canExecute {
-		log.Warn().Msgf("Process %s cannot execute: %s", s.name, reason)
-		return nil
-	}
-	log.Info().Msgf("Executing process")
-
-	// consume upstream info continuously
-	go func() {
-		for {
-			select {
-			case <-ctx.Done():
-				return
-			case info, ok := <-upstream:
-				if !ok {
-					return
-				}
-				var req StatusReportParams
-
-				groundControlURL := s.cm.ResolveGroundControlURL()
-
-				satteliteName, err := extractSatelliteNameFromURL(info.StateURL)
-				if err != nil {
-					log.Warn().Msg("Failed to parse state reporting interval")
-					continue
-				}
-
-				duration, err := scheduler.ParseEveryExpr(s.cm.GetStateReportingInterval())
-				if err != nil {
-					log.Warn().Msg("Failed to parse state reporting interval")
-					continue
-				}
-
-				// populate info from upstream channel into the request
-				req.Name = satteliteName
-				req.Activity = info.CurrentActivity
-				req.StateReportInterval = s.cm.GetStateReportingInterval()
-				req.LatestStateDigest = info.LatestStateDigest
-				req.LatestConfigDigest = info.LatestConfigDigest
-
-				// populate all other info which is not provided by the upstream channel, into the request
-				if err := collectStatusReportParams(ctx, duration, &req); err != nil {
-					log.Warn().Msg("Failed to populate heartbeat payload fully")
-				}
-
-				if err := sendStatusReport(ctx, string(groundControlURL), &req); err != nil {
-					log.Warn().Msg("Failed to send status report")
-				}
-
-				log.Info().Msg("Heartbeat sent to ground control successfully")
-			}
-		}
-	}()
-
-	return nil
-}
+func (s *StatusReportingProcess) Execute(ctx context.Context, upstream <-chan scheduler.UpstreamInfo) error {
+	log := logger.FromContext(ctx).With().Str("process", s.name).Logger()
+
+	canExecute, reason := s.CanExecute(&log)
+	if !canExecute {
+		log.Warn().Msgf("Process %s cannot execute: %s", s.name, reason)
+		return nil
+	}
+
+	s.start()
+	defer s.stop()
+	log.Info().Msg("Executing process")
+
+	for {
+		select {
+		case <-ctx.Done():
+			return nil
+		case info, ok := <-upstream:
+			if !ok {
+				log.Info().Msg("Upstream channel closed; stopping")
+				return nil
+			}
+
+			groundControlURL := s.cm.ResolveGroundControlURL()
+
+			satelliteName, err := extractSatelliteNameFromURL(info.StateURL)
+			if err != nil {
+				log.Warn().Err(err).Msg("Failed to extract satellite name from state URL")
+				continue
+			}
+
+			duration, err := scheduler.ParseEveryExpr(s.cm.GetStateReportingInterval())
+			if err != nil {
+				log.Warn().Err(err).Msg("Failed to parse state reporting interval")
+				continue
+			}
+
+			req := StatusReportParams{
+				Name:                satelliteName,
+				Activity:            info.CurrentActivity,
+				StateReportInterval: s.cm.GetStateReportingInterval(),
+				LatestStateDigest:   info.LatestStateDigest,
+				LatestConfigDigest:  info.LatestConfigDigest,
+			}
+
+			if err := collectStatusReportParams(ctx, duration, &req); err != nil {
+				log.Warn().Err(err).Msg("Failed to populate heartbeat payload fully")
+			}
+
+			if err := sendStatusReport(ctx, groundControlURL, &req); err != nil {
+				log.Warn().Err(err).Msg("Failed to send status report")
+				continue
+			}
+			log.Info().Msg("Heartbeat sent to ground control successfully")
+		}
+	}
+}

3-14: Handle non‑2xx responses and add per‑request timeout.

Currently any HTTP status is treated as success and the client has no timeout. Add a timeout and fail on non‑2xx, including response body for diagnostics.

Apply:

@@
-import (
+import (
 	"bytes"
 	"context"
 	"encoding/json"
 	"fmt"
 	"github.com/container-registry/harbor-satellite/internal/logger"
 	"github.com/container-registry/harbor-satellite/internal/scheduler"
 	"github.com/container-registry/harbor-satellite/pkg/config"
 	"github.com/rs/zerolog"
+	"io"
 	"net/http"
+	"strings"
 	"sync"
+	"time"
 )
@@
 func sendStatusReport(ctx context.Context, groundControlURL string, req *StatusReportParams) error {
-	url := fmt.Sprintf("%s/satellites/status", groundControlURL)
+	url := strings.TrimRight(groundControlURL, "/") + "/satellites/status"
@@
-	client := &http.Client{}
+	client := &http.Client{Timeout: 10 * time.Second}
 	resp, err := client.Do(httpReq)
 	if err != nil {
 		return fmt.Errorf("failed to send status report: %w", err)
 	}
 	defer func() {
 		_ = resp.Body.Close()
 	}()
 
-	return nil
+	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+		b, _ := io.ReadAll(io.LimitReader(resp.Body, 4<<10))
+		return fmt.Errorf("ground control returned %s: %s", resp.Status, string(b))
+	}
+	return nil
 }

Also applies to: 132-156

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
internal/satellite/satellite.go (1)

91-93: Optional: avoid shadowing the imported scheduler package name.

Rename loop var to sch to reduce confusion.

-for _, scheduler := range s.schedulers {
-	scheduler.Stop()
+for _, sch := range s.schedulers {
+	sch.Stop()
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c62c251 and ac5e332.

📒 Files selected for processing (2)
  • internal/satellite/satellite.go (2 hunks)
  • internal/state/reporting_process.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/state/reporting_process.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/satellite/satellite.go (4)
internal/scheduler/scheduler.go (2)
  • UpstreamInfo (13-18)
  • NewSchedulerWithInterval (38-55)
internal/state/state_process.go (1)
  • NewFetchAndReplicateStateProcess (39-46)
internal/state/registration_process.go (1)
  • NewZtrProcess (33-40)
internal/state/reporting_process.go (1)
  • NewStatusReportingProcess (24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build-ground-control
  • GitHub Check: e2e-tests
  • GitHub Check: test-release
🔇 Additional comments (5)
internal/satellite/satellite.go (5)

48-61: Good: error checked before append; past panic risk resolved.

The state replication scheduler now checks err before appending/starting. This addresses the earlier nil-append/Stop panic and log mix‑up noted in prior reviews.


34-46: LGTM: ZTR scheduler wiring and error handling look correct.

Channel is passed, error handled, append and Run are in the right order.


30-31: LGTM: status reporting process instantiation.

Constructor usage looks consistent with other processes.


76-77: LGTM: state scheduler started after successful creation.

Order is correct.


82-83: Confirm Run contract (returns immediately vs blocks until ctx.Done).

Run returns ctx.Err() right away; if ctx isn’t canceled, this returns nil immediately. Verify this matches caller expectations; many Run methods block until cancellation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
internal/state/reporting_process.go (4)

20-21: Clean up minor lifecycle nits (optional).

  • Done channel is never used; remove it or close it in stop() and document semantics to avoid double-close.
  • Name()/IsRunning() lock around immutable string and a bool read; consider RLock or no lock if you keep writes serialized in start/stop.

Also applies to: 110-114, 126-136


90-96: Enrich success logs with minimal context.

Add satellite name and HTTP status to success logs to aid triage.

Example:

log.Info().Str("satellite", req.Name).Msg("Heartbeat sent to ground control successfully")

146-151: Consider Authorization header if Ground Control requires auth.

If GC expects a bearer token, set it on the request from config (e.g., cm.Token). This likely belongs in sendStatusReport; you may pass the token as an argument or refactor sendStatusReport into a method on StatusReportingProcess.

Example change:

httpReq.Header.Set("Authorization", fmt.Sprintf("Bearer %s", token))

120-124: Confirm IsComplete() semantics.

Returning cm.IsZTRDone() for a heartbeat process may prematurely mark it “complete” and interfere with scheduler policy. If heartbeats are intended to run for the lifetime of the satellite, consider always returning false here or gating on a dedicated heartbeat lifecycle flag.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac5e332 and e9c3bf4.

📒 Files selected for processing (1)
  • internal/state/reporting_process.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
  • ConfigManager (29-37)
internal/scheduler/scheduler.go (2)
  • UpstreamInfo (13-18)
  • ParseEveryExpr (151-160)
internal/logger/logger.go (1)
  • FromContext (85-94)
internal/state/report.go (1)
  • StatusReportParams (16-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build-ground-control
  • GitHub Check: e2e-tests
  • GitHub Check: test-release
🔇 Additional comments (2)
internal/state/reporting_process.go (2)

33-36: Fix unbounded background receivers; guard start() and avoid spawning a new goroutine each Execute tick.

Execute returns immediately after launching a long‑lived goroutine, and start() currently always returns true. If the scheduler calls Execute periodically, you’ll accumulate receivers contending on the same channel. Guard start() so only one runner exists at a time, and keep it running until ctx is canceled or upstream closes. This preserves a single lifecycle.

Apply this minimal guard:

 func (s *StatusReportingProcess) start() bool {
   s.mu.Lock()
   defer s.mu.Unlock()
-  s.isRunning = true
-  return true
+  if s.isRunning {
+    return false
+  }
+  s.isRunning = true
+  return true
 }

Also applies to: 49-98, 103-108


3-14: Harden HTTP: timeout, non‑2xx handling, and safe URL join.

  • No timeout on client/request; risks hangs in the loop.
  • Non‑2xx responses treated as success.
  • URL is concatenated; prefer url.JoinPath to avoid double slashes.
 import (
   "bytes"
   "context"
   "encoding/json"
   "fmt"
   "github.com/container-registry/harbor-satellite/internal/logger"
   "github.com/container-registry/harbor-satellite/internal/scheduler"
   "github.com/container-registry/harbor-satellite/pkg/config"
   "github.com/rs/zerolog"
   "net/http"
+  "net/url"
+  "io"
+  "time"
   "sync"
 )
@@
 func sendStatusReport(ctx context.Context, groundControlURL string, req *StatusReportParams) error {
-  url := fmt.Sprintf("%s/satellites/status", groundControlURL)
+  endpoint, err := url.JoinPath(groundControlURL, "satellites", "status")
+  if err != nil {
+    return fmt.Errorf("failed to build status URL: %w", err)
+  }
 
   body, err := json.Marshal(req)
   if err != nil {
     return fmt.Errorf("failed to marshal status report: %w", err)
   }
 
-  httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewBuffer(body))
+  // Per‑request timeout to avoid wedging the loop.
+  cctx, cancel := context.WithTimeout(ctx, 10*time.Second)
+  defer cancel()
+  httpReq, err := http.NewRequestWithContext(cctx, http.MethodPost, endpoint, bytes.NewBuffer(body))
   if err != nil {
     return fmt.Errorf("failed to create status report request: %w", err)
   }
   httpReq.Header.Set("Content-Type", "application/json")
 
-  client := &http.Client{}
+  client := &http.Client{Timeout: 15 * time.Second}
   resp, err := client.Do(httpReq)
   if err != nil {
     return fmt.Errorf("failed to send status report: %w", err)
   }
   defer func() {
     _ = resp.Body.Close()
   }()
 
-  return nil
+  if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+    // Include small body snippet for diagnostics.
+    b, _ := io.ReadAll(io.LimitReader(resp.Body, 4<<10))
+    return fmt.Errorf("ground control returned %s: %s", resp.Status, string(bytes.TrimSpace(b)))
+  }
+  return nil
 }

Also applies to: 138-162

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/state/registration_process.go (1)

136-159: Close response body and add client timeouts

  • resp.Body is never closed (and leaked on non-200); add Close in all paths.
  • http.Client has no timeout; a hung GC will block indefinitely.

Apply:

-	client := &http.Client{}
+	client := &http.Client{Timeout: 15 * time.Second}
@@
-	response, err := client.Do(req)
+	response, err := client.Do(req)
 	if err != nil {
 		return config.StateConfig{}, fmt.Errorf("failed to send request: %w", err)
 	}
-	if response.StatusCode != http.StatusOK {
+	defer response.Body.Close()
+	if response.StatusCode != http.StatusOK {
 		return config.StateConfig{}, fmt.Errorf("failed to register satellite: %s", response.Status)
 	}

Add import:

import "time"
♻️ Duplicate comments (3)
internal/state/reporting_process.go (3)

128-133: Consider Authorization header if GC requires auth

If Ground Control expects auth, add Authorization: Bearer (from ConfigManager) to the request.

Do we need to include the satellite token? If yes, I can wire it through sendStatusReport safely.


120-144: Add HTTP timeout and treat non‑2xx as errors

Heartbeat HTTP client has no timeout and success path ignores status code.

Apply:

- httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewBuffer(body))
+ httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewBuffer(body))
  if err != nil {
    return fmt.Errorf("failed to create status report request: %w", err)
  }
  httpReq.Header.Set("Content-Type", "application/json")

- client := &http.Client{}
+ client := &http.Client{Timeout: 15 * time.Second}
  resp, err := client.Do(httpReq)
  if err != nil {
    return fmt.Errorf("failed to send status report: %w", err)
  }
  defer func() { _ = resp.Body.Close() }()
+
+ if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+   return fmt.Errorf("ground control non-2xx response: %s", resp.Status)
+ }

Add import:

import "time"

100-102: Honor heartbeat_config.disabled in CanExecute

Skip execution when heartbeats are disabled:

func (s *StatusReportingProcess) CanExecute(log *zerolog.Logger) (bool, string) {
-  return true, fmt.Sprintf("Process %s can execute all conditions fulfilled", s.name)
+  if s.cm != nil && s.cm.IsHeartbeatDisabled() {
+    return false, "heartbeat disabled by configuration"
+  }
+  return true, fmt.Sprintf("Process %s can execute all conditions fulfilled", s.name)
}
🧹 Nitpick comments (6)
internal/state/registration_process.go (1)

137-145: Avoid token-in-path for registration (security/observability concern)

Embedding the token in the URL path risks leakage via logs/proxies. Prefer Authorization: Bearer and a path without secrets.

internal/scheduler/scheduler.go (1)

137-138: Nil-guard upstreamPayload when launching processes

If any scheduler is created with a nil upstreamPayload, passing it blindly can NPE in implementations. Guard:

payload := s.upstreamPayload
if payload == nil {
  payload = &UpstreamInfo{}
}
if err := s.process.Execute(ctx, payload); err != nil { ... }
internal/state/reporting_process.go (4)

87-92: Make start() idempotent to prevent re‑entrancy

Guard re‑entry:

func (s *StatusReportingProcess) start() bool {
-  s.mu.Lock(); defer s.mu.Unlock()
-  s.isRunning = true
-  return true
+  s.mu.Lock()
+  defer s.mu.Unlock()
+  if s.isRunning {
+    return false
+  }
+  s.isRunning = true
+  return true
}

41-46: Avoid double stop()

You already defer s.stop(); calling it again in the CanExecute false branch is redundant and risks confusing lifecycle logs. Remove the extra s.stop().


58-63: Default interval on parse error

If interval parse fails, use a sane default (e.g., 5m) instead of zero duration:

duration, err := scheduler.ParseEveryExpr(s.cm.GetStateReportingInterval())
if err != nil {
-  log.Warn().Err(err).Msg("Failed to parse state reporting interval")
+  log.Warn().Err(err).Msg("Failed to parse state reporting interval, defaulting to 5m")
+  duration = 5 * time.Minute
}

Add import if needed:

import "time"

52-56: Handle missing satellite name decisively

If extractSatelliteNameFromURL fails, skip sending this heartbeat instead of emitting an empty name.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8fb0e21 and 25d073e.

📒 Files selected for processing (7)
  • internal/satellite/satellite.go (2 hunks)
  • internal/scheduler/process.go (1 hunks)
  • internal/scheduler/scheduler.go (4 hunks)
  • internal/state/registration_process.go (3 hunks)
  • internal/state/reporting_process.go (1 hunks)
  • internal/state/state_process.go (12 hunks)
  • pkg/config/getters.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/satellite/satellite.go
  • pkg/config/getters.go
🧰 Additional context used
🧬 Code graph analysis (5)
internal/state/registration_process.go (1)
internal/scheduler/scheduler.go (1)
  • UpstreamInfo (13-18)
internal/scheduler/scheduler.go (1)
internal/scheduler/process.go (1)
  • Process (8-20)
internal/scheduler/process.go (1)
internal/scheduler/scheduler.go (1)
  • UpstreamInfo (13-18)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
  • ConfigManager (29-37)
internal/scheduler/scheduler.go (2)
  • UpstreamInfo (13-18)
  • ParseEveryExpr (151-160)
internal/logger/logger.go (1)
  • FromContext (85-94)
internal/state/report.go (1)
  • StatusReportParams (16-26)
internal/state/state_process.go (3)
internal/scheduler/scheduler.go (4)
  • UpstreamInfo (13-18)
  • ActivityEncounteredError (33-33)
  • ActivityReconcilingState (34-34)
  • ActivityStateSynced (32-32)
internal/state/state.go (1)
  • SatelliteState (27-30)
pkg/config/config.go (1)
  • Config (37-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: e2e-tests
  • GitHub Check: test-release
  • GitHub Check: build-ground-control
  • GitHub Check: Codacy Static Code Analysis

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (5)
internal/state/state_process.go (2)

130-190: Critical: Data race on upstreamPayload from concurrent goroutines.

Multiple state fetcher goroutines write to upstreamPayload.CurrentActivity (lines 150, 159, 171, 179) without synchronization, causing a data race. The aggregation loop (lines 273-316) also reads upstreamPayload concurrently.

Two fixes:

  • Preferred: Remove all direct writes to upstreamPayload from worker goroutines. Instead, have workers communicate results via the existing StateFetcherResult channel, add an Activity field to the result struct, and update upstreamPayload only in the main aggregation loop (single goroutine).
  • Alternative: Make UpstreamInfo concurrency-safe by embedding sync.RWMutex and exposing Set*/Get* methods; use Lock for writers and RLock for readers throughout.

Additionally, only set ActivityStateSynced after the aggregation loop confirms all workers succeeded, not inside the config goroutine (line 270).


268-271: Major: Premature "state synced" status.

Setting ActivityStateSynced inside the config goroutine (line 270) can announce success before state fetchers finish or errors are aggregated. Move this assignment after the aggregation loop (lines 273-316) completes successfully with no errors.

ground-control/internal/server/satellite_handlers.go (1)

356-367: Address logging and response consistency.

Two issues:

  1. Line 364: fmt.Printf is inconsistent. Use the standard log package (e.g., log.Printf) for consistency with other handlers.
  2. Line 365: Handler sends only a status header. For consistency with other handlers (e.g., listSatelliteHandler, GetSatelliteByName), send an explicit JSON response:
    WriteJSONResponse(w, http.StatusOK, map[string]string{"message": "Status received"})
internal/state/reporting_process.go (2)

101-103: Honor heartbeat_config.disabled.

CanExecute should check if heartbeats are disabled via configuration:

  func (s *StatusReportingProcess) CanExecute(log *zerolog.Logger) (bool, string) {
+   if s.cm.IsHeartbeatDisabled() {
+     return false, "heartbeat disabled by configuration"
+   }
    return true, fmt.Sprintf("Process %s can execute all conditions fulfilled", s.name)
  }

121-151: Missing Authorization header; fix error message.

Issue 1: The request lacks an Authorization header. If Ground Control requires authentication, add:

  httpReq.Header.Set("Content-Type", "application/json")
+ // Add satellite token if required by Ground Control
+ // httpReq.Header.Set("Authorization", fmt.Sprintf("Bearer %s", s.cm.GetToken()))

Issue 2: Line 147 error message is incorrect:

  if resp.StatusCode < 200 || resp.StatusCode >= 300 {
-   return fmt.Errorf("heartbeat request timed out : %w, with status code %d", err, resp.StatusCode)
+   return fmt.Errorf("heartbeat request failed with status code %d", resp.StatusCode)
  }

The error message says "timed out" but this is a status code check, not a timeout. Additionally, err would be nil here (HTTP request succeeded but returned non-2xx), causing a nil dereference in %w.

🧹 Nitpick comments (5)
pkg/config/validate.go (1)

89-92: Consider aligning with the existing pattern for cron field validation.

The explicit empty-string check and defaulting here differs from how StateReplicationInterval (lines 94-97) and RegisterSatelliteInterval (lines 99-102) are handled—those fields rely solely on validation to catch empty or invalid values. While having a separate message for "not specified" vs. "invalid" can improve UX, the inconsistency may confuse future maintainers.

Additionally, line 91 has a space before the colon ("defaulting to : %s"), which is a minor stylistic inconsistency.

Consider one of these approaches for consistency:

Option 1: Remove the explicit empty check (lines 89-92) and rely on validation (lines 104-107) to catch both empty and invalid cases, matching the pattern used for other cron fields:

-	if config.HeartbeatConfig.StateReportInterval == "" {
-		config.HeartbeatConfig.StateReportInterval = DefaultStateReportCronExpr
-		warnings = append(warnings, fmt.Sprintf("heartbeat interval not specified, defaulting to : %s", DefaultStateReportCronExpr))
-	}
-

Then update the validation message to be clearer:

 	if !isValidCronExpression(config.HeartbeatConfig.StateReportInterval) {
 		config.HeartbeatConfig.StateReportInterval = DefaultStateReportCronExpr
-		warnings = append(warnings, fmt.Sprintf("invalid schedule provided for state_report_interval, using default schedule %s", DefaultStateReportCronExpr))
+		warnings = append(warnings, fmt.Sprintf("invalid or empty schedule provided for state_report_interval, using default schedule %s", DefaultStateReportCronExpr))
 	}

Option 2: Add explicit empty-string checks for StateReplicationInterval and RegisterSatelliteInterval to match this pattern (better UX but more code).

Option 3: Keep as-is if the distinct "not specified" message is intentional for the new heartbeat feature, but fix the spacing:

-		warnings = append(warnings, fmt.Sprintf("heartbeat interval not specified, defaulting to : %s", DefaultStateReportCronExpr))
+		warnings = append(warnings, fmt.Sprintf("heartbeat interval not specified, defaulting to: %s", DefaultStateReportCronExpr))
ground-control/internal/server/satellite_handlers.go (1)

29-39: Suggest consolidating duplicate type definitions.

SatelliteStatusParams here duplicates StatusReportParams in internal/state/report.go (lines 15-25). Consider:

  • Import and use state.StatusReportParams in the handler
  • Or move the shared type to a common package (e.g., pkg/api)

This eliminates duplication and ensures consistency.

internal/state/reporting_process.go (3)

17-31: Remove unused Done channel.

The Done field (line 21) is declared but never used. Remove it to reduce confusion, or document its intended purpose if it's for future use.


88-93: start() always returns true; consider preventing concurrent execution.

If concurrent Execute calls are a concern, implement a guard:

  func (s *StatusReportingProcess) start() bool {
    s.mu.Lock()
    defer s.mu.Unlock()
+   if s.isRunning {
+     return false  // Already running
+   }
    s.isRunning = true
    return true
  }

Otherwise, remove the return value as it's always true.


53-86: Return on status report error to avoid misleading success log.

   if err := sendStatusReport(ctx, groundControlURL, &req); err != nil {
-    log.Warn().Err(err).Msg("Failed to send status report")
+    return fmt.Errorf("failed to send status report: %w", err)
   }

   log.Info().Msg("Heartbeat sent to ground control successfully")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0eb13c and 763fdc2.

📒 Files selected for processing (4)
  • ground-control/internal/server/satellite_handlers.go (3 hunks)
  • internal/state/reporting_process.go (1 hunks)
  • internal/state/state_process.go (12 hunks)
  • pkg/config/validate.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
  • ConfigManager (29-37)
internal/scheduler/scheduler.go (2)
  • UpstreamInfo (13-18)
  • ParseEveryExpr (151-160)
internal/logger/logger.go (1)
  • FromContext (85-94)
internal/state/report.go (1)
  • StatusReportParams (16-26)
internal/state/state_process.go (2)
internal/scheduler/scheduler.go (4)
  • UpstreamInfo (13-18)
  • ActivityEncounteredError (33-33)
  • ActivityReconcilingState (34-34)
  • ActivityStateSynced (32-32)
internal/state/state.go (1)
  • SatelliteState (27-30)
ground-control/internal/server/satellite_handlers.go (3)
ground-control/internal/server/server.go (1)
  • Server (18-22)
ground-control/internal/server/helpers.go (1)
  • DecodeRequestBody (223-231)
ground-control/internal/server/handleResponse.go (1)
  • HandleAppError (26-35)
pkg/config/validate.go (2)
pkg/config/config.go (1)
  • HeartbeatConfig (32-35)
pkg/config/constants.go (1)
  • DefaultStateReportCronExpr (20-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: e2e-tests
  • GitHub Check: build-satellite
  • GitHub Check: build-ground-control
  • GitHub Check: test-release
🔇 Additional comments (4)
pkg/config/validate.go (1)

104-107: LGTM!

The validation logic is correct and follows the same pattern used for other cron intervals (StateReplicationInterval and RegisterSatelliteInterval). The default fallback and warning message are appropriate.

internal/state/state_process.go (3)

62-62: LGTM: Execute signature updated to accept upstreamPayload.

The signature change and initial StateURL assignment are appropriate. The main goroutine safely initializes StateURL before spawning workers.

Also applies to: 85-86


109-118: LGTM: State digest fetch and assignment.

The digest fetch logic is correct. Errors are logged and ActivityEncounteredError is set, while execution continues for partial processing. Assignment to upstreamPayload.LatestStateDigest is safe as it occurs in the main goroutine before workers start.


209-212: LGTM: Config digest conditionally assigned.

Good fix! LatestConfigDigest is now only set when FetchDigest succeeds (err == nil), addressing the previous review concern.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
internal/state/reporting_process.go (3)

33-86: Implement periodic heartbeats as required; current implementation is event-driven only.

Per PR objectives (issue #207) and a past review comment, satellites must send heartbeats at a configured interval even when nothing changes. The current implementation only sends a report when Execute is called with new upstream info, making it purely event-driven rather than periodic.

As suggested in the past review, parse the interval once, create a time.NewTicker, and send reports on each tick. You can still send immediately on upstream updates if desired, but the ticker ensures periodic reporting regardless of activity.


101-103: Honor heartbeat_config.disabled in CanExecute.

Per the PR description and a past review comment, the configuration includes heartbeat_config.disabled to allow disabling heartbeats. CanExecute should check this flag and return false when heartbeats are disabled.

Apply this diff to add the guard:

 func (s *StatusReportingProcess) CanExecute(log *zerolog.Logger) (bool, string) {
+	if s.cm != nil && s.cm.IsHeartbeatDisabled() {
+		return false, "heartbeat disabled by configuration"
+	}
 	return true, fmt.Sprintf("Process %s can execute all conditions fulfilled", s.name)
 }

Note: Verify the exact method name on ConfigManager (e.g., IsHeartbeatDisabled(), GetHeartbeatDisabled(), or similar).


137-137: Add Authorization header with satellite token.

Per a past review comment, if Ground Control requires authentication, the request should include an Authorization: Bearer <token> header. Based on the relevant code snippets showing ConfigManager has a Token field, retrieve and set the token.

Apply this diff to add the Authorization header:

 	httpReq.Header.Set("Content-Type", "application/json")
+	if s.cm.Token != "" {
+		httpReq.Header.Set("Authorization", fmt.Sprintf("Bearer %s", s.cm.Token))
+	}

Note: The Token field is shown in the relevant code snippets from pkg/config/manager.go. Adjust the field access pattern if the token is retrieved via a getter method instead.

🧹 Nitpick comments (2)
internal/state/reporting_process.go (2)

21-21: Remove unused Done channel.

The Done field is declared but never initialized, closed, or selected anywhere in the code.

Apply this diff to remove the unused field:

 type StatusReportingProcess struct {
 	name      string
 	isRunning bool
 	mu        *sync.Mutex
-	Done      chan struct{}
 	cm        *config.ConfigManager
 }

115-119: Remove unnecessary mutex in Name().

The name field is set once in the constructor and never modified, so the mutex is unnecessary overhead for this read-only access.

Apply this diff:

 func (s *StatusReportingProcess) Name() string {
-	s.mu.Lock()
-	defer s.mu.Unlock()
 	return s.name
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 763fdc2 and fa1f204.

📒 Files selected for processing (1)
  • internal/state/reporting_process.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
  • ConfigManager (29-37)
internal/scheduler/scheduler.go (2)
  • UpstreamInfo (13-18)
  • ParseEveryExpr (151-160)
internal/logger/logger.go (1)
  • FromContext (85-94)
internal/state/report.go (1)
  • StatusReportParams (16-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: e2e-tests
  • GitHub Check: build-ground-control
  • GitHub Check: build-satellite
  • GitHub Check: test-release

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa1f204 and cef9020.

📒 Files selected for processing (1)
  • dagger.json (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: e2e-tests
  • GitHub Check: build-satellite
  • GitHub Check: build-ground-control

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
ground-control/internal/server/satellite_handlers.go (1)

364-364: Use log.Printf for consistency with the codebase.

Line 364 uses fmt.Printf for debug output, which is inconsistent with the rest of the server code that uses the standard log package (e.g., lines 48, 76, 128, etc.).

Based on learnings

Apply this diff:

 	}
 	// todo : process the heartbeat. eg:- save latest state in db
-	fmt.Printf("satellite reported status : %v\n", req)
+	log.Printf("satellite reported status: %v", req)
 	w.WriteHeader(http.StatusOK)
🧹 Nitpick comments (1)
ground-control/internal/server/satellite_handlers.go (1)

356-367: Return a JSON response body for client clarity and API consistency.

The handler currently only sends a 200 status code without a response body (line 365). Other handlers in this file return JSON responses using WriteJSONResponse (e.g., lines 235, 338, 353, 384, 479, 540, 668). Consider returning a simple success JSON for consistency.

Apply this diff:

 func (s *Server) statusReportHandler(w http.ResponseWriter, r *http.Request) {
 	var req SatelliteStatusParams
 	if err := DecodeRequestBody(r, &req); err != nil {
 		log.Println(err)
 		HandleAppError(w, err)
 		return
 	}
 	// todo : process the heartbeat. eg:- save latest state in db
-	fmt.Printf("satellite reported status : %v\n", req)
-	w.WriteHeader(http.StatusOK)
-
+	log.Printf("satellite reported status: %v", req)
+	WriteJSONResponse(w, http.StatusOK, map[string]string{"message": "Status received"})
 }

If your clients expect an empty 200 response, please confirm and I can mark this as resolved.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4892862 and fb8eefd.

⛔ Files ignored due to path filters (1)
  • ground-control/go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • .dagger/e2e.go (2 hunks)
  • go.mod (5 hunks)
  • ground-control/go.mod (2 hunks)
  • ground-control/internal/server/satellite_handlers.go (3 hunks)
  • pkg/config/config.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ground-control/internal/server/satellite_handlers.go (3)
ground-control/internal/server/server.go (1)
  • Server (18-22)
ground-control/internal/server/helpers.go (1)
  • DecodeRequestBody (223-231)
ground-control/internal/server/handleResponse.go (1)
  • HandleAppError (26-35)
🪛 OSV Scanner (2.2.3)
ground-control/go.mod

[HIGH] 250-250: github.com/golang-jwt/jwt 3.2.2+incompatible: Excessive memory allocation during header parsing in github.com/golang-jwt/jwt

(GO-2025-3553)


[HIGH] 250-250: github.com/golang-jwt/jwt 3.2.2+incompatible: jwt-go allows excessive memory allocation during header parsing

(GHSA-mh63-6h87-95cp)


[HIGH] 252-252: github.com/golang-jwt/jwt/v5 5.2.1: Excessive memory allocation during header parsing in github.com/golang-jwt/jwt

(GO-2025-3553)


[HIGH] 252-252: github.com/golang-jwt/jwt/v5 5.2.1: jwt-go allows excessive memory allocation during header parsing

(GHSA-mh63-6h87-95cp)


[HIGH] 280-280: github.com/hashicorp/go-getter 1.7.8: HashiCorp go-getter Vulnerable to Symlink Attacks in github.com/hashicorp/go-getter

(GO-2025-3892)


[HIGH] 280-280: github.com/hashicorp/go-getter 1.7.8: HashiCorp go-getter Vulnerable to Symlink Attacks

(GHSA-wjrx-6529-hcj3)


[HIGH] 355-355: github.com/open-policy-agent/opa 1.2.0: OPA server Data API HTTP path injection of Rego in github.com/open-policy-agent/opa

(GO-2025-3660)


[HIGH] 355-355: github.com/open-policy-agent/opa 1.2.0: OPA server Data API HTTP path injection of Rego

(GHSA-6m8w-jc87-6cr7)


[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Allows A Specially Crafted Chart Archive To Cause Out Of Memory Termination in helm.sh/helm

(GO-2025-3601)


[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Allows A Specially Crafted JSON Schema To Cause A Stack Overflow in helm.sh/helm

(GO-2025-3602)


[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm vulnerable to Code Injection through malicious chart.yaml content in helm.sh/helm

(GO-2025-3802)


[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Charts with Specific JSON Schema Values Can Cause Memory Exhaustion in helm.sh/helm

(GO-2025-3887)


[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm May Panic Due To Incorrect YAML Content in helm.sh/helm

(GO-2025-3888)


[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Allows A Specially Crafted Chart Archive To Cause Out Of Memory Termination

(GHSA-4hfp-h4cw-hj8p)


[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm vulnerable to Code Injection through malicious chart.yaml content

(GHSA-557j-xg8c-q2mm)


[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Allows A Specially Crafted JSON Schema To Cause A Stack Overflow

(GHSA-5xqw-8hwv-wg92)


[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm Charts with Specific JSON Schema Values Can Cause Memory Exhaustion

(GHSA-9h84-qmv7-982p)


[HIGH] 465-465: helm.sh/helm/v3 3.17.1: Helm May Panic Due To Incorrect YAML Content

(GHSA-f9f8-9pmf-xv68)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build-satellite
  • GitHub Check: test-release
  • GitHub Check: e2e-tests
🔇 Additional comments (6)
ground-control/go.mod (1)

477-477: LGTM! Replace directive aligns with monorepo structure.

The replace directive correctly redirects the harbor-satellite module to the parent directory, which is appropriate for local development in a monorepo.

.dagger/e2e.go (1)

72-72: LGTM! Directory structure correctly aligned with monorepo layout.

The change to mount the entire project root at /app and set the working directory to /app/ground-control properly reflects the monorepo structure where ground-control is a subdirectory.

Also applies to: 80-81

go.mod (2)

16-18: LGTM! Dependency updates support new status reporting features.

The updates to gopsutil/v4, viper, and testify align with the PR objectives:


112-112: No action required. The verification confirms that github.com/BurntSushi/toml is directly imported at internal/container_runtime/containerd.go:9, making it a proper direct dependency. The current entry in go.mod (without // indirect) is correct.

Likely an incorrect or invalid review comment.

pkg/config/config.go (1)

25-26: LGTM! New heartbeat configuration fields are well-designed.

The addition of DisableHeartbeat and HeartbeatInterval fields:

  • Follow Go naming conventions
  • Use appropriate types (bool for toggle, string for interval/cron expression)
  • Include omitempty JSON tags for optional configuration
  • Are logically placed alongside other interval settings

These fields enable the satellite heartbeat status reporting feature described in the PR objectives.

ground-control/internal/server/satellite_handlers.go (1)

29-39: LGTM! SatelliteStatusParams structure is well-designed.

The new type appropriately captures all necessary satellite health metrics:

  • Identity (Name)
  • Activity state
  • Timing (StateReportInterval, RequestCreatedTime)
  • State tracking (LatestStateDigest, LatestConfigDigest)
  • Resource metrics (MemoryUsedBytes, StorageUsedBytes, CPUPercent)

Field types and JSON tags are appropriate. This aligns with the status reporting requirements in issue #207.

Signed-off-by: meethereum <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb8eefd and 1801f2c.

📒 Files selected for processing (2)
  • .dagger/main.go (1 hunks)
  • .dagger/utils.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
.dagger/utils.go (1)
.dagger/main.go (2)
  • PROJ_MOUNT (14-14)
  • DEFAULT_GO (13-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: build-ground-control
  • GitHub Check: build-satellite
  • GitHub Check: test-release
  • GitHub Check: e2e-tests
  • GitHub Check: Codacy Static Code Analysis

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1801f2c and da1effd.

📒 Files selected for processing (1)
  • .dagger/publish_image.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
.dagger/publish_image.go (1)
.dagger/main.go (1)
  • PROJ_MOUNT (14-14)

Signed-off-by: meethereum <[email protected]>
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 23, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
.dagger/publish_image.go (2)

24-35: Inconsistent directory handling creates confusion.

The directory variable is computed per-component (lines 24-31) but only used for the debug listing container (lines 32-35). The actual build at line 36 passes the full source directory to getBuildContainer, not the per-component directory. This inconsistency may confuse readers about what's actually being built.

Consider either:

  • Removing the per-component directory computation and using source directly for the debug container, or
  • Adding a comment clarifying that directory is only for debug listing and doesn't affect the build.

Apply this diff to simplify by using source consistently:

-	var directory *dagger.Directory
-	switch {
-	case component == "satellite":
-		directory = source
-	case component == "ground-control":
-		directory = source.Directory(GROUND_CONTROL_PATH)
-	default:
-		panic(fmt.Sprintf("unknown component: %s", component))
-	}
 	dirContainer := dag.Container().From("alpine:latest").
-		WithMountedDirectory(PROJ_MOUNT, directory).
+		WithMountedDirectory(PROJ_MOUNT, source).
 		WithExec([]string{"ls", PROJ_MOUNT})
 	dirContainer.Stdout(ctx)

173-179: Add a default case for defensive programming.

The switch statement lacks a default case to handle unexpected component values. While earlier validation prevents invalid components from reaching this code, adding a default case would improve consistency with the defensive pattern used in PublishImage (lines 24-30) and make the code more robust.

Apply this diff to add a default case:

 	var workDir string
 	switch {
 	case component == "satellite":
 		workDir = PROJ_MOUNT
 	case component == "ground-control":
 		workDir = PROJ_MOUNT + "/ground-control"
+	default:
+		panic(fmt.Sprintf("unknown component: %s", component))
 	}
QUICKSTART.md (1)

169-169: Use standard JSON format without inline comments.

JSON does not officially support comments (including # syntax). While some parsers may handle this leniently, strict JSON parsers will reject it, potentially causing parsing failures. Since this is example configuration, it should be valid JSON.

Apply this diff to move the note into standard documentation:

-        "heartbeat_interval": "@every 00h00m30s", # optional, default interval is 5 minutes 
+        "heartbeat_interval": "@every 00h00m30s"

Then add clarification in the prose above or below the code block:

> **Note:** The `heartbeat_interval` field is optional and defaults to 5 minutes. Adjust the value using the `@every HHhMMMmSSs` format as needed.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da1effd and 9d8be02.

📒 Files selected for processing (2)
  • .dagger/publish_image.go (3 hunks)
  • QUICKSTART.md (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
.dagger/publish_image.go (1)
.dagger/main.go (1)
  • PROJ_MOUNT (14-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-satellite
  • GitHub Check: test-release
  • GitHub Check: e2e-tests
  • GitHub Check: build-ground-control
  • GitHub Check: lint
🔇 Additional comments (2)
.dagger/publish_image.go (1)

36-36: Build path fix resolves previous critical issue.

The change to pass the full source directory to getBuildContainer (line 36) and mount it at PROJ_MOUNT (line 192), combined with the per-component workDir logic (lines 173-179), correctly resolves the previous critical path mismatch issue for ground-control. Now:

  • satellite: Full source mounted at /app, workDir=/app, build command finds ./cmd/main.go
  • ground-control: Full source mounted at /app, workDir=/app/ground-control (exists), build command finds go.mod and sources ✓

Also applies to: 192-193

QUICKSTART.md (1)

169-169: No changes needed—documentation is accurate.

Verification confirms the QUICKSTART.md example is correct:

  • Field name: heartbeat_interval is the correct field in AppConfig (confirmed in pkg/config/config.go:26)
  • Default value: The default is indeed 5 minutes (@every 00h05m00s, defined in pkg/config/constants.go:20)
  • Documentation: The inline comment "optional, default interval is 5 minutes" accurately describes the default
  • Example: The @every 00h00m30s (30 seconds) is a valid override from the default, properly documented

The example correctly shows how to configure a shorter heartbeat interval than the 5-minute default.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
ground-control/internal/server/satellite_handlers.go (1)

357-375: Return JSON, use standard logger, validate inputs, and keep error handling consistent.

  • Return a small JSON ack via WriteJSONResponse instead of empty 200.
  • Use log.Printf, not fmt.Printf, for consistency.
  • Validate req.Name and parse StateReportInterval; bound CPUPercent; unify errors via HandleAppError.
  • Avoid per-request json.MarshalIndent to reduce log CPU and noise; log a compact summary.

Proposed focused change:

 func (s *Server) statusReportHandler(w http.ResponseWriter, r *http.Request) {
 	var req SatelliteStatusParams
 	if err := DecodeRequestBody(r, &req); err != nil {
 		log.Println(err)
 		HandleAppError(w, err)
 		return
 	}
 
-	//todo: instead of just printing, save the latest state in db 
-	heartbeat, err := json.MarshalIndent(req, "", "  ")
-	if err != nil {
-		log.Printf("failed to marshal json: %v\n", err)
-		http.Error(w, "failed to marshal json", http.StatusInternalServerError)
-		return
-	}
-
-	fmt.Printf("satellite reported status:\n%s\n", string(heartbeat))
-	w.WriteHeader(http.StatusOK)
+	// Validate basic fields
+	if !utils.IsValidName(req.Name) {
+		HandleAppError(w, &AppError{Message: fmt.Sprintf(invalidNameMessage, "satellite"), Code: http.StatusBadRequest})
+		return
+	}
+	if req.StateReportInterval != "" {
+		if _, err := time.ParseDuration(req.StateReportInterval); err != nil {
+			HandleAppError(w, &AppError{Message: "invalid state_report_interval", Code: http.StatusBadRequest})
+			return
+		}
+	}
+	if req.CPUPercent < 0 || req.CPUPercent > 100 {
+		HandleAppError(w, &AppError{Message: "cpu_percent must be between 0 and 100", Code: http.StatusBadRequest})
+		return
+	}
+
+	// TODO: persist latest status into DB
+	log.Printf(
+		"satellite status: name=%s activity=%s interval=%s cpu=%.2f mem=%dB storage=%dB req_time=%s",
+		req.Name, req.Activity, req.StateReportInterval, req.CPUPercent, req.MemoryUsedBytes, req.StorageUsedBytes,
+		req.RequestCreatedTime.Format(time.RFC3339),
+	)
+
+	// Respond with a small JSON ack for client clarity
+	WriteJSONResponse(w, http.StatusOK, map[string]interface{}{
+		"message":     "ok",
+		"received_at": time.Now().UTC().Format(time.RFC3339),
+	})
 }

To confirm routing and auth:

#!/bin/bash
# Verify route and middleware for /satellites/status
rg -nP --type=go -C3 '/satellites/status' || true
rg -nP --type=go -C3 'statusReportHandler\s*\(' ground-control | sed -n '1,120p'
🧹 Nitpick comments (2)
ground-control/internal/server/satellite_handlers.go (2)

4-4: Imports OK; consider dropping encoding/json if we remove pretty-printing.

Looks fine. If you switch logging in statusReportHandler away from json.MarshalIndent (see suggestion below), remove the encoding/json import to avoid an unused import.

Apply if you adopt the handler change:

 import (
-	"encoding/json"
 	"fmt"
 	"log"
 	"net/http"
 	"os"
 	"strconv"
 	"time"
 )

Also applies to: 10-10


30-40: Tighten payload schema: make optional fields omitempty; plan basic validation.

  • Name should be validated (utils.IsValidName), others can be optional. Add ,omitempty so clients aren’t forced to send empty values. Consider parsing StateReportInterval to time.Duration in the handler and bounding CPUPercent to [0,100]. You can keep RequestCreatedTime as is; decode expects RFC3339.

Minimal tag tweak:

 type SatelliteStatusParams struct {
 	Name                string    `json:"name"`                  // Satellite identifier
-	Activity            string    `json:"activity"`              // Current activity satellite is doing
-	StateReportInterval string    `json:"state_report_interval"` // Interval between status reports
-	LatestStateDigest   string    `json:"latest_state_digest"`   // Digest of latest state artifact
-	LatestConfigDigest  string    `json:"latest_config_digest"`  // Digest of latest config artifact
-	MemoryUsedBytes     uint64    `json:"memory_used_bytes"`     // Memory currently used by satellite
-	StorageUsedBytes    uint64    `json:"storage_used_bytes"`    // Storage currently used by satellite
-	CPUPercent          float64   `json:"cpu_percent"`           // CPU usage percentage
-	RequestCreatedTime  time.Time `json:"request_created_time"`  // Timestamp of request creation
+	Activity            string    `json:"activity,omitempty"`              // Current activity satellite is doing
+	StateReportInterval string    `json:"state_report_interval,omitempty"` // Interval between status reports
+	LatestStateDigest   string    `json:"latest_state_digest,omitempty"`   // Digest of latest state artifact
+	LatestConfigDigest  string    `json:"latest_config_digest,omitempty"`  // Digest of latest config artifact
+	MemoryUsedBytes     uint64    `json:"memory_used_bytes,omitempty"`     // Memory currently used by satellite
+	StorageUsedBytes    uint64    `json:"storage_used_bytes,omitempty"`    // Storage currently used by satellite
+	CPUPercent          float64   `json:"cpu_percent,omitempty"`           // CPU usage percentage
+	RequestCreatedTime  time.Time `json:"request_created_time,omitempty"`  // Timestamp of request creation
 }

Optional follow-ups:

  • Consider pointer fields for metrics (e.g., *uint64, *float64) to distinguish “missing” vs “zero”.
  • If aligning with CloudEvents later, map RequestCreatedTime to the CloudEvents time attribute.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d8be02 and 71cd308.

📒 Files selected for processing (1)
  • ground-control/internal/server/satellite_handlers.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ground-control/internal/server/satellite_handlers.go (2)
ground-control/internal/server/helpers.go (1)
  • DecodeRequestBody (223-231)
ground-control/internal/server/handleResponse.go (1)
  • HandleAppError (26-35)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: lint
  • GitHub Check: e2e-tests
  • GitHub Check: test-release
  • GitHub Check: build-satellite
  • GitHub Check: build-ground-control

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
ground-control/internal/server/satellite_handlers.go (2)

377-377: Use structured logger instead of fmt.Printf.

As flagged in previous reviews, use log.Printf for consistency with the rest of the server codebase.

Apply this diff:

-	fmt.Printf("satellite reported status:\n%s\n", string(heartbeat))
+	log.Printf("satellite reported status:\n%s\n", string(heartbeat))

378-379: Send an explicit JSON response body.

As flagged in previous reviews, the handler should return a JSON response for consistency and client clarity rather than just setting the status header.

Apply this diff:

-	fmt.Printf("satellite reported status:\n%s\n", string(heartbeat))
-	w.WriteHeader(http.StatusOK)
+	log.Printf("satellite reported status:\n%s\n", string(heartbeat))
+	WriteJSONResponse(w, http.StatusOK, map[string]string{"message": "Status report received"})
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71cd308 and b7a233f.

📒 Files selected for processing (1)
  • ground-control/internal/server/satellite_handlers.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ground-control/internal/server/satellite_handlers.go (3)
ground-control/internal/server/server.go (1)
  • Server (18-22)
ground-control/internal/server/helpers.go (1)
  • DecodeRequestBody (223-231)
ground-control/internal/server/handleResponse.go (2)
  • HandleAppError (26-35)
  • AppError (9-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: lint
  • GitHub Check: build-satellite
  • GitHub Check: test-release
  • GitHub Check: e2e-tests
  • GitHub Check: build-ground-control
🔇 Additional comments (3)
ground-control/internal/server/satellite_handlers.go (3)

4-4: LGTM: Import additions are appropriate.

The encoding/json and time imports are necessary for the new status reporting functionality.

Also applies to: 10-10


30-40: LGTM: Well-defined status parameters struct.

The struct is clearly documented with appropriate field types and JSON tags for status reporting.


357-379: Verify authentication/authorization for the status endpoint.

Ensure that the POST /satellites/status endpoint is protected by authentication middleware to prevent unauthorized status reports.

Run the following script to check the routing configuration and middleware:

Signed-off-by: meethereum <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
pkg/config/manager.go (1)

132-138: Consider using distinct ConfigChangeType values for different intervals.

Both StateReplicationInterval and HeartbeatInterval changes emit the same IntervalsChanged type. When both intervals change simultaneously, consumers receive two ConfigChange entries with identical types, requiring them to inspect OldValue/NewValue to distinguish which interval changed.

For improved clarity and type safety, consider introducing separate constants like HeartbeatIntervalChanged and StateReplicationIntervalChanged.

Apply this refactor if you want to improve type specificity:

 const (
 	LogLevelChanged  ConfigChangeType = "log_level"
 	IntervalsChanged ConfigChangeType = "intervals"
+	HeartbeatIntervalChanged ConfigChangeType = "heartbeat_interval"
+	StateReplicationIntervalChanged ConfigChangeType = "state_replication_interval"
 	ZotConfigChanged ConfigChangeType = "zot_config"
 )

Then update the detection logic:

 	if oldConfig.AppConfig.StateReplicationInterval != newConfig.AppConfig.StateReplicationInterval {
 		changes = append(changes, ConfigChange{
-			Type:     IntervalsChanged,
+			Type:     StateReplicationIntervalChanged,
 			OldValue: oldConfig.AppConfig.StateReplicationInterval,
 			NewValue: newConfig.AppConfig.StateReplicationInterval,
 		})
 	}

 	if oldConfig.AppConfig.HeartbeatInterval != newConfig.AppConfig.HeartbeatInterval {
 		changes = append(changes, ConfigChange{
-			Type:     IntervalsChanged,
+			Type:     HeartbeatIntervalChanged,
 			OldValue: oldConfig.AppConfig.HeartbeatInterval,
 			NewValue: newConfig.AppConfig.HeartbeatInterval,
 		})
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7a233f and f94b272.

📒 Files selected for processing (3)
  • cmd/main.go (2 hunks)
  • internal/hotreload/hotreload.go (3 hunks)
  • pkg/config/manager.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
internal/hotreload/hotreload.go (1)
internal/scheduler/scheduler.go (1)
  • Scheduler (21-29)
pkg/config/manager.go (1)
pkg/config/config.go (1)
  • AppConfig (17-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build-ground-control
  • GitHub Check: e2e-tests
  • GitHub Check: build-satellite
  • GitHub Check: test-release
🔇 Additional comments (1)
internal/hotreload/hotreload.go (1)

23-23: LGTM!

The heartbeat scheduler field, constructor parameter, and initialization follow the same pattern as the existing state replication scheduler.

Also applies to: 33-33, 40-40

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (6)
internal/hotreload/hotreload.go (1)

101-107: Critical copy-paste errors remain unfixed.

This issue was already flagged in a previous review. The heartbeat scheduler block contains three critical copy-paste errors:

  1. Line 102: Log message incorrectly references "state replication scheduler"
  2. Line 103: Operates on hrm.stateReplicationScheduler instead of hrm.heartbeatScheduler
  3. Line 105: Error message incorrectly references "state replication scheduler"

This will cause the state replication scheduler to be reset twice with different intervals, and the heartbeat scheduler will never be reset at all.

Apply the fix from the previous review:

 if hrm.heartbeatScheduler != nil {
-	hrm.log.Info().Msg("Restarting state replication scheduler with new interval")
-	err := hrm.stateReplicationScheduler.ResetIntervalFromExpr(hrm.cm.GetHeartbeatInterval())
+	hrm.log.Info().Msg("Restarting heartbeat scheduler with new interval")
+	err := hrm.heartbeatScheduler.ResetIntervalFromExpr(hrm.cm.GetHeartbeatInterval())
 	if err != nil {
-		return fmt.Errorf("unable to restart state replication scheduler: %w", err)
+		return fmt.Errorf("unable to restart heartbeat scheduler: %w", err)
 	}
 }
internal/state/reporting_process.go (5)

135-138: Add Authorization header (if GC requires it) and reuse an HTTP client.

Include bearer token when configured and reuse a client to benefit from connection pooling.

 	httpReq.Header.Set("Content-Type", "application/json")
+	if sCtx, ok := ctx.Value(struct{}{}).(*struct{}); ok && sCtx == nil { _ = ok } // no-op placeholder to avoid unused warnings if you inline below
+	if req != nil && s != nil && s.cm != nil && s.cm.Token != "" {
+		httpReq.Header.Set("Authorization", fmt.Sprintf("Bearer %s", s.cm.Token))
+	}
 
-	client := &http.Client{}
+	// TODO: inject a shared *http.Client with timeouts via StatusReportingProcess if this runs frequently.
+	client := &http.Client{}

Would you like a small constructor change to accept a shared client?


86-91: Guard against concurrent executions.

start() always returns true, so overlapping Execute calls can run concurrently. Return false if already running.

 func (s *StatusReportingProcess) start() bool {
 	s.mu.Lock()
 	defer s.mu.Unlock()
-	s.isRunning = true
-	return true
+	if s.isRunning {
+		return false
+	}
+	s.isRunning = true
+	return true
 }

99-101: Honor heartbeat_config.disabled in CanExecute.

Skip execution when heartbeats are disabled.

 func (s *StatusReportingProcess) CanExecute(log *zerolog.Logger) (bool, string) {
-	return true, fmt.Sprintf("Process %s can execute all conditions fulfilled", s.name)
+	if s.cm != nil && s.cm.IsHeartbeatDisabled() {
+		return false, "heartbeat disabled by configuration"
+	}
+	return true, fmt.Sprintf("Process %s can execute: all conditions fulfilled", s.name)
 }

53-57: Return early on failures to avoid emitting misleading success logs/payloads.

If satellite name extraction or metrics collection fails, return the error instead of proceeding.

 	satelliteName, err := extractSatelliteNameFromURL(upstream.StateURL)
 	if err != nil {
-		log.Warn().Err(err).Msg("Failed to extract satellite name")
-
+		return fmt.Errorf("extract satellite name: %w", err)
 	}
@@
 	if err := collectStatusReportParams(ctx, duration, &req); err != nil {
-		log.Warn().Err(err).Msg("Failed to collect status report parameters")
-
+		return fmt.Errorf("collect status report params: %w", err)
 	}

Also applies to: 72-75


146-148: Fix misleading/non-sensical HTTP error; include response body for diagnostics.

Message claims timeout and wraps a nil err. Report non-2xx properly.

-	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
-		return fmt.Errorf("heartbeat request timed out : %w, with status code %d", err, resp.StatusCode)
-	}
+	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+		return fmt.Errorf("heartbeat request failed with status code %d", resp.StatusCode)
+	}

Optionally also read a few bytes of resp.Body and include them in the error for easier debugging.

🧹 Nitpick comments (3)
internal/satellite/satellite.go (1)

91-93: Minor: avoid shadowing package name in Stop().

for _, scheduler := range s.schedulers { scheduler.Stop() } shadows the scheduler package. Rename loop variable for clarity.

-	for _, scheduler := range s.schedulers {
-		scheduler.Stop()
+	for _, sch := range s.schedulers {
+		sch.Stop()
internal/state/reporting_process.go (2)

42-47: Remove redundant stop() call; already deferred.

You call s.stop() here and also via defer s.stop(). Drop the explicit call.

 	if !canExecute {
 		log.Warn().Msgf("Process %s cannot execute: %s", s.name, reason)
-		s.stop()
 		return nil
 	}

17-23: Remove unused field.

Done chan struct{} isn’t used; delete to keep the type lean.

 type StatusReportingProcess struct {
 	name      string
 	isRunning bool
 	mu        *sync.Mutex
-	Done      chan struct{}
 	cm        *config.ConfigManager
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f94b272 and d127f52.

📒 Files selected for processing (4)
  • internal/hotreload/hotreload.go (3 hunks)
  • internal/satellite/satellite.go (2 hunks)
  • internal/state/reporting_process.go (1 hunks)
  • pkg/config/getters.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/config/getters.go
🧰 Additional context used
🧬 Code graph analysis (3)
internal/state/reporting_process.go (4)
pkg/config/manager.go (1)
  • ConfigManager (29-37)
internal/scheduler/scheduler.go (2)
  • UpstreamInfo (13-18)
  • ParseEveryExpr (151-160)
internal/logger/logger.go (1)
  • FromContext (85-94)
internal/state/report.go (1)
  • StatusReportParams (16-26)
internal/satellite/satellite.go (2)
internal/scheduler/scheduler.go (2)
  • UpstreamInfo (13-18)
  • NewSchedulerWithInterval (38-55)
internal/state/reporting_process.go (1)
  • NewStatusReportingProcess (25-31)
internal/hotreload/hotreload.go (1)
internal/scheduler/scheduler.go (1)
  • Scheduler (21-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build-ground-control
  • GitHub Check: build-satellite
  • GitHub Check: e2e-tests
  • GitHub Check: test-release
🔇 Additional comments (2)
internal/hotreload/hotreload.go (1)

23-23: LGTM! Heartbeat scheduler field addition is clean.

The heartbeatScheduler field, parameter, and initialization follow the same pattern as the existing stateReplicationScheduler. The changes are consistent and well-structured.

Also applies to: 33-33, 40-40

internal/satellite/satellite.go (1)

82-83: Review comment is incorrect; no changes needed.

The review misunderstands the intended design. Satellite.Run() is correctly non-blocking: it starts background scheduler goroutines and returns immediately. This pattern allows the caller (cmd/main.go) to:

  1. Initialize all components after Run() returns (line 154–157)
  2. Manage blocking separately with <-ctx.Done() (line 160)

Adding <-ctx.Done() inside Run() would prevent line 154–157 from executing, breaking the initialization sequence. The current implementation is a well-established pattern where Run() starts background work and returns while the caller handles blocking.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d127f52 and e5436be.

📒 Files selected for processing (1)
  • internal/hotreload/hotreload.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/hotreload/hotreload.go (1)
internal/scheduler/scheduler.go (1)
  • Scheduler (21-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: test-release
  • GitHub Check: build-ground-control
  • GitHub Check: build-satellite
🔇 Additional comments (1)
internal/hotreload/hotreload.go (1)

23-23: LGTM! Field, parameter, and initialization are correct.

The heartbeat scheduler integration follows the same pattern as the state replication scheduler. The field declaration, constructor parameter, and initialization are all properly implemented.

Also applies to: 33-33, 40-40

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation golang

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Satellite Heart Beat

1 participant