
Conversation

@mihow (Collaborator) commented Oct 7, 2025

Summary

Introduces a new periodic status check for all incomplete jobs.

Fixes: #721

List of Changes

Detailed Description

How to Test the Changes

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added debugging utilities for troubleshooting job and task issues.
    • Introduced automatic periodic checks of unfinished jobs every 3 minutes to maintain consistency.
  • Bug Fixes

    • Improved concurrent job update handling with enhanced safety mechanisms.
    • Better detection and automatic retry of disappeared tasks.
  • Tests

    • Added comprehensive test coverage for job status checking and concurrent operations.

Copilot AI review requested due to automatic review settings October 7, 2025 01:40
netlify bot commented Oct 7, 2025

Deploy Preview for antenna-preview ready!

Name Link
🔨 Latest commit 3917882
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/6903e12705cb520008cadacb
😎 Deploy Preview https://deploy-preview-981--antenna-preview.netlify.app
Lighthouse (1 path audited)
Performance: 29 (🔴 down 2 from production)
Accessibility: 80 (no change from production)
Best Practices: 100 (no change from production)
SEO: 92 (no change from production)
PWA: 80 (no change from production)

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements a comprehensive job status monitoring system to detect and handle disconnected and stale jobs automatically. The system introduces periodic status checking with configurable timeouts and automatic retry mechanisms for failed tasks.

  • Adds a new check_status() method to the Job model that validates job state against Celery task status
  • Implements a periodic task (check_unfinished_jobs) that runs every 3 minutes to monitor all incomplete jobs
  • Includes extensive test coverage for various job failure scenarios including disappeared tasks, stale jobs, and timeout conditions
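
The reconciliation these bullets describe can be sketched without Django. Below is a minimal, framework-free illustration of the core logic; the `Job` stand-in, the state strings, and the single timeout are simplifications for this sketch, not the PR's actual implementation:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


def now_utc() -> datetime:
    # Timezone-aware, matching the aware timestamps Django stores with USE_TZ=True.
    return datetime.now(timezone.utc)


@dataclass
class Job:
    """Minimal stand-in for the Job model fields that check_status() reads."""

    status: str = "STARTED"
    task_id: str | None = None
    scheduled_at: datetime = field(default_factory=now_utc)
    last_checked_at: datetime | None = None


NO_TASK_ID_TIMEOUT = timedelta(seconds=300)  # mirrors NO_TASK_ID_TIMEOUT_SECONDS


def check_status(job: Job, celery_state: str | None) -> bool:
    """Reconcile job.status against the Celery task state; return True if it changed.

    celery_state is the AsyncResult state string, or None if the task disappeared.
    The stale-job and stuck-pending timeout branches are omitted here.
    """
    now = now_utc()
    changed = False
    if job.task_id is None:
        # Never received a task id: fail once the grace period expires.
        if now - job.scheduled_at > NO_TASK_ID_TIMEOUT:
            job.status, changed = "FAILURE", True
    elif celery_state is None:
        # Task disappeared from the result backend: mark failed (retry handled elsewhere).
        job.status, changed = "FAILURE", True
    elif celery_state != job.status:
        job.status, changed = celery_state, True
    job.last_checked_at = now
    return changed
```

In the PR, the equivalent method also enforces the stale-job and stuck-pending timeouts and persists changes via `save(update_fields=...)`.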

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
ami/jobs/models.py Core implementation of job status checking with timeout handling and auto-retry logic
ami/jobs/tasks.py Periodic task for monitoring unfinished jobs with cache-based locking
ami/jobs/tests.py Comprehensive test suite covering all status checking scenarios
ami/jobs/migrations/0018_add_last_checked_at_and_periodic_task.py Database migration adding last_checked_at field and periodic task setup
ami/jobs/management/commands/update_stale_jobs.py Updated management command to use new status checking system
ami/jobs/management/commands/debug_jobs.py New debugging utility for testing job status scenarios
setup.cfg Flake8 configuration to ignore F541 warnings
.vscode/settings.json VS Code settings update for F541 ignore


Comment on lines 927 to 931
# Only update the fields we changed to avoid concurrency issues
update_fields = ["last_checked_at"]
if status_changed:
    update_fields.extend(["status", "progress", "finished_at"])
self.save(update_fields=update_fields, update_progress=False)
Copilot AI Oct 7, 2025

[nitpick] The comment mentions avoiding concurrency issues but the code pattern is repeated multiple times. Consider extracting this logic into a helper method to reduce duplication and ensure consistent handling of partial field updates.
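
One way to de-duplicate the pattern, per this suggestion, is a small helper that builds `update_fields` in one place. The names below (`save_checked`, `FakeJob`) are hypothetical, used only to illustrate the refactor:

```python
from __future__ import annotations


class JobSaveMixin:
    """Illustrative helper: centralize the partial-update field list."""

    ALWAYS_UPDATED = ("last_checked_at",)
    STATUS_FIELDS = ("status", "progress", "finished_at")

    def save_checked(self, status_changed: bool) -> list[str]:
        # Only persist the fields we actually changed, keeping row locks
        # short and avoiding clobbering concurrent updates to other fields.
        update_fields = list(self.ALWAYS_UPDATED)
        if status_changed:
            update_fields.extend(self.STATUS_FIELDS)
        self.save(update_fields=update_fields, update_progress=False)
        return update_fields


class FakeJob(JobSaveMixin):
    """Stand-in for the model, recording what save() was called with."""

    def __init__(self) -> None:
        self.saved_with: list[str] | None = None

    def save(self, update_fields=None, update_progress=True) -> None:
        self.saved_with = update_fields
```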

unfinished_jobs = unfinished_jobs[:MAX_JOBS_PER_RUN]

# Only check jobs that haven't been checked recently
now = datetime.datetime.now()
Copilot AI Oct 7, 2025

Using datetime.datetime.now() instead of timezone.now() may cause timezone-related issues. Since Django's timezone.now() is already imported elsewhere in the codebase, consider using it here for consistency.
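
The hazard is easy to demonstrate in plain Python: Django's `timezone.now()` returns an aware datetime (with `USE_TZ = True`), and mixing it with a naive `datetime.now()` raises as soon as the two are subtracted:

```python
from datetime import datetime, timezone

aware = datetime.now(timezone.utc)  # what django.utils.timezone.now() returns
naive = datetime.now()              # what datetime.datetime.now() returns

try:
    _ = naive - aware
except TypeError as exc:
    # "can't subtract offset-naive and offset-aware datetimes"
    print(f"TypeError: {exc}")

# Aware-minus-aware is what the periodic check needs:
elapsed = datetime.now(timezone.utc) - aware
print(elapsed.total_seconds() >= 0)
```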

STUCK_PENDING_NO_WORKERS_TIMEOUT_SECONDS = 3600 # 1 hour - max time in PENDING without workers
PENDING_LOG_INTERVAL_SECONDS = 300 # 5 minutes - how often to log waiting messages

now = datetime.datetime.now()
Copilot AI Oct 7, 2025

Using datetime.datetime.now() instead of Django's timezone.now() may cause timezone-related issues, especially when comparing with database timestamps that are timezone-aware.

Comment on lines 893 to 899
# Configuration thresholds (TODO: make these configurable via settings)
NO_TASK_ID_TIMEOUT_SECONDS = 300 # 5 minutes - time to wait for task_id before marking as failed
DISAPPEARED_TASK_RETRY_THRESHOLD_SECONDS = 300 # 5 minutes - auto-retry if task disappeared within this time
MAX_JOB_RUNTIME_SECONDS = 2 * 24 * 60 * 60 # 2 days - max time a job can run before being marked stale
STUCK_PENDING_TIMEOUT_SECONDS = 600 # 10 minutes - max time in PENDING with workers available
STUCK_PENDING_NO_WORKERS_TIMEOUT_SECONDS = 3600 # 1 hour - max time in PENDING without workers
PENDING_LOG_INTERVAL_SECONDS = 300 # 5 minutes - how often to log waiting messages
Copilot AI Oct 7, 2025

[nitpick] These configuration constants are hardcoded in the method. Consider moving them to Django settings or class-level constants to make them more maintainable and testable.

Comment on lines +84 to +87
# Configuration thresholds (TODO: make these configurable via settings)
LOCK_TIMEOUT_SECONDS = 300 # 5 minutes - how long the lock is held
MAX_JOBS_PER_RUN = 100 # Maximum number of jobs to check in one run
MIN_CHECK_INTERVAL_MINUTES = 2 # Minimum time between checks for the same job
Copilot AI Oct 7, 2025

[nitpick] Similar to the job model, these configuration constants should be moved to Django settings to make them configurable without code changes.

@mihow (Collaborator, Author) commented Oct 10, 2025


  • Add filters to the jobs view in the Django admin to help admins troubleshoot all jobs
  • Add a healthcheck to the Celery worker so that it gets restarted when stuck

coderabbitai bot commented Oct 30, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This pull request introduces a comprehensive job status reconciliation system to resolve issues where job statuses remain outdated or "stuck" in STARTED state. It adds a periodic Celery task that continuously checks unfinished jobs against their actual Celery task states, implements atomic row-level locking for safe concurrent updates, adds a debugging management command, and includes extensive tests for status checking and concurrency scenarios.

Changes

Cohort / File(s) Summary
Core Status Reconciliation
ami/jobs/models.py, ami/jobs/tasks.py, ami/jobs/migrations/0018_add_last_checked_at_and_periodic_task.py
Introduces check_unfinished_jobs() periodic task with cache-based locking to periodically re-check and reconcile job statuses against Celery task states. Adds last_checked_at field to Job model. Implements atomic job update context manager (atomic_job_update) and Celery worker availability checks. Enhances Job.save() with optional row-level locking to prevent concurrent update race conditions. Extends JobLogHandler with atomic logging writes.
Management Commands
ami/jobs/management/commands/debug_jobs.py, ami/jobs/management/commands/update_stale_jobs.py
Introduces comprehensive debug_jobs command for testing job/Celery interactions (check, remove/corrupt tasks, list unfinished jobs, create test jobs). Updates update_stale_jobs to use new Job.check_status workflow and adds --no-retry flag for disabling automatic retry logic.
Testing
ami/jobs/tests.py
Adds extensive TestJobStatusChecking suite covering status reconciliation with various Celery states, locking behavior, concurrent updates, log atomicity, pending-running ambiguity, disappeared task handling, and forced retry scenarios.

Sequence Diagram

sequenceDiagram
    participant Periodic as Periodic<br/>Task Scheduler
    participant CeleryTask as check_unfinished_jobs<br/>(Celery Task)
    participant Cache as Django Cache<br/>(Distributed Lock)
    participant JobModel as Job Model<br/>(DB + Locking)
    participant CeleryWorker as Celery AsyncResult<br/>(Remote Task State)

    Periodic->>CeleryTask: Trigger every 3 minutes
    CeleryTask->>Cache: Attempt to acquire lock
    alt Lock acquired
        Cache-->>CeleryTask: Lock successful
        CeleryTask->>JobModel: Query unfinished jobs<br/>(not recently checked)
        JobModel-->>CeleryTask: Job list
        loop For each Job
            CeleryTask->>JobModel: atomic_job_update(lock)
            CeleryTask->>CeleryWorker: Fetch current task state
            CeleryWorker-->>CeleryTask: Task status<br/>(PENDING/STARTED/SUCCESS/FAILED/etc.)
            CeleryTask->>JobModel: Reconcile & update status<br/>if state mismatched
            JobModel-->>CeleryTask: Updated (or no change)
            CeleryTask->>JobModel: Release lock
        end
        CeleryTask->>JobModel: Update last_checked_at
        CeleryTask->>Cache: Release distributed lock
    else Lock already held
        Cache-->>CeleryTask: Lock unavailable
        CeleryTask-->>Periodic: Skip (another run in progress)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Concurrency and Locking (ami/jobs/models.py): The atomic_job_update context manager and row-level locking via select_for_update are critical for preventing race conditions. Review the lock timeout handling, fallback behavior on lock failure, and interaction with the save() method.
  • Periodic Task Cache Lock (ami/jobs/tasks.py): The distributed lock mechanism using Django cache must be verified to prevent concurrent task runs. Confirm lock acquisition, release, and timeout behavior under high concurrency.
  • Status Reconciliation Logic (ami/jobs/models.py, ami/jobs/tasks.py): The logic that reconciles job status with Celery AsyncResult state, handles disappeared tasks, and decides when to retry needs careful review for edge cases (e.g., pending-but-running ambiguity, stale timestamps).
  • Atomic Logging (ami/jobs/models.py): JobLogHandler.emit now uses atomic locking; verify that log writes are not lost and that the max log length cap is enforced without overwriting concurrent updates.
  • Field-Level Updates (ami/jobs/models.py, ami/jobs/tasks.py): Multiple save points now use update_fields to minimize lock duration; verify correctness of which fields are included and excluded in each context.
  • Test Coverage (ami/jobs/tests.py): The test suite is comprehensive; pay special attention to mocking of Celery AsyncResult states and verification of concurrent locking scenarios.
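
The distributed lock called out above relies on the cache backend's atomic `cache.add()` (set-if-absent). Here is a framework-free sketch of the skip-if-held pattern, with a dict-backed stand-in for Django's cache:

```python
import time


class FakeCache:
    """Stand-in for Django's cache API; add() is atomic set-if-absent."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[str, float]] = {}

    def add(self, key: str, value: str, timeout: float) -> bool:
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return False  # lock already held and not yet expired
        self._store[key] = (value, time.monotonic() + timeout)
        return True

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


cache = FakeCache()
LOCK_KEY = "check_unfinished_jobs:lock"
LOCK_TIMEOUT_SECONDS = 300  # lock expires even if the worker dies mid-run


def check_unfinished_jobs() -> str:
    # Skip the whole run if another worker already holds the lock.
    if not cache.add(LOCK_KEY, "locked", LOCK_TIMEOUT_SECONDS):
        return "skipped"
    try:
        return "checked"  # ...iterate unfinished jobs, call check_status()...
    finally:
        cache.delete(LOCK_KEY)
```

The timeout on `add()` is the safety valve the review asks about: if a worker crashes before `delete()`, the lock self-expires rather than blocking all future runs.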

Poem

🐰 Hop, hop, hop! Our jobs were stuck in time,
STARTED forever—oh, what a crime!
Now periodic checks with locks so tight,
Reconcile statuses day and night.
Thump thump! No more stale, the status shines bright! 🌟

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description is significantly incomplete relative to the template requirements. While the Summary section provides a brief one-line overview and the Related Issues section correctly references #721, the List of Changes, Detailed Description, and How to Test the Changes sections are entirely empty. These are critical sections that should document the scope of modifications, the implementation approach, and verification steps. The absence of this content makes it difficult for reviewers to understand the full impact and validation of the changes without examining the code directly.
✅ Passed checks (4 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "Update status of disconnected and stale jobs" accurately reflects the primary objective of this changeset, which is to introduce a periodic status check mechanism for incomplete jobs and handle edge cases like disappeared Celery tasks. The title is concise, specific, and directly related to the main changes including the new periodic task, atomic job locking, and enhanced status reconciliation logic. The title clearly summarizes what a teammate would see as the core feature being delivered.
Linked Issues Check ✅ Passed The PR changes comprehensively address the core objective of issue #721 to "Ensure that a processing job's status is always updated correctly." The implementation includes a new periodic Celery task (check_unfinished_jobs) that scans and re-checks status of incomplete jobs [tasks.py], a new Job.check_status workflow that reconciles job status with Celery task state and handles disappeared/stale tasks [models.py], an updated update_stale_jobs command leveraging this workflow [update_stale_jobs.py], row-level locking to prevent concurrent update race conditions, and extensive test coverage validating status checking behavior across multiple scenarios. These changes directly address the reported issue where jobs incorrectly retained "STARTED" status despite errors or completion.
Out of Scope Changes Check ✅ Passed All substantive code changes align with the requirements of issue #721. The core changes (periodic task, atomic locking, status reconciliation, migration) directly implement the solution. The new debug_jobs.py management command, while not strictly required to fix the core issue, provides supportive tooling for testing and troubleshooting job status interactions and Celery tasks, which relates to the overall objective of improving job status reliability and debuggability. The comprehensive test suite validates the new functionality and is appropriate for the feature scope.
Docstring Coverage ✅ Passed Docstring coverage is 81.25% which is sufficient. The required threshold is 80.00%.

coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
ami/jobs/models.py (1)

1335-1359: Replace naive datetime.now() with timezone.now() in check_status.

We’re still calling datetime.datetime.now() here, so now is naive. The moment a job has scheduled_at/last_checked_at populated via timezone.now() (the normal path), the subtraction inside _check_missing_task_id or the “min check interval” logic will raise TypeError: can't subtract offset-naive and offset-aware datetimes. That makes the new periodic checker blow up after the first successful update. Please switch to django.utils.timezone.now() (and propagate it through the helpers) so we always compare like-with-like—this is the blocker that Copilot flagged earlier, and it’s still unresolved. Based on learnings.

-        now = datetime.datetime.now()
+        from django.utils import timezone
+        now = timezone.now()
ami/jobs/tasks.py (1)

115-126: Use timezone-aware timestamps in check_unfinished_jobs.

datetime.datetime.now() returns a naive datetime, but Job.last_checked_at and Job.scheduled_at are stored as timezone-aware values (we set them via timezone.now() all over the codebase). On the second invocation of this task the subtraction now - job.last_checked_at will hit a TypeError: can't subtract offset-naive and offset-aware datetimes, crashing the periodic job. Switch to django.utils.timezone.now() (and reuse it everywhere in this task) so the arithmetic stays compatible with our model fields.

@@
-    import datetime
+    import datetime
+    from django.utils import timezone
@@
-        now = datetime.datetime.now()
+        now = timezone.now()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3b9711 and 3917882.

📒 Files selected for processing (6)
  • ami/jobs/management/commands/debug_jobs.py (1 hunks)
  • ami/jobs/management/commands/update_stale_jobs.py (2 hunks)
  • ami/jobs/migrations/0018_add_last_checked_at_and_periodic_task.py (1 hunks)
  • ami/jobs/models.py (24 hunks)
  • ami/jobs/tasks.py (3 hunks)
  • ami/jobs/tests.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
ami/jobs/models.py (2)
ami/jobs/views.py (1)
  • retry (98-109)
ami/ml/post_processing/base.py (1)
  • update_progress (57-68)
ami/jobs/tests.py (2)
ami/jobs/models.py (8)
  • Job (875-1641)
  • check_status (1320-1409)
  • JobState (140-194)
  • atomic_job_update (91-137)
  • save (1518-1587)
  • update_progress (1439-1463)
  • logger (1626-1635)
  • JobLogHandler (386-429)
ami/jobs/tasks.py (1)
  • check_unfinished_jobs (72-157)
ami/jobs/management/commands/debug_jobs.py (1)
ami/jobs/models.py (7)
  • JobState (140-194)
  • MLJob (456-658)
  • check_status (1320-1409)
  • running_states (185-186)
  • final_states (189-190)
  • failed_states (193-194)
  • enqueue (966-983)
ami/jobs/tasks.py (1)
ami/jobs/models.py (6)
  • logger (1626-1635)
  • save (1518-1587)
  • Job (875-1641)
  • JobState (140-194)
  • running_states (185-186)
  • check_status (1320-1409)
ami/jobs/management/commands/update_stale_jobs.py (1)
ami/jobs/models.py (5)
  • Job (875-1641)
  • JobState (140-194)
  • running_states (185-186)
  • check_status (1320-1409)
  • save (1518-1587)
🪛 Ruff (0.14.2)
ami/jobs/models.py

65-65: Consider moving this statement to an else block

(TRY300)


66-66: Do not catch blind exception: Exception

(BLE001)


73-73: Unused function argument: timestamp

(ARG001)


394-394: Comment contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF003)


1073-1073: Unused method argument: now

(ARG002)


1103-1103: Do not catch blind exception: Exception

(BLE001)


1131-1131: Unused method argument: task

(ARG002)


1172-1172: Consider moving this statement to an else block

(TRY300)


1173-1173: Do not catch blind exception: Exception

(BLE001)


1174-1174: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


1243-1243: Do not catch blind exception: Exception

(BLE001)


1244-1244: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


1369-1369: Do not catch blind exception: Exception

(BLE001)


1400-1400: Do not catch blind exception: Exception

(BLE001)


1401-1401: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


1509-1509: Do not catch blind exception: Exception

(BLE001)


1576-1576: Do not catch blind exception: Exception

(BLE001)


1577-1577: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

ami/jobs/tests.py

388-388: Unused method argument: mock_job_objects

(ARG002)

ami/jobs/migrations/0018_add_last_checked_at_and_periodic_task.py

5-5: Unused function argument: apps

(ARG001)


5-5: Unused function argument: schema_editor

(ARG001)


23-23: Unused function argument: apps

(ARG001)


23-23: Unused function argument: schema_editor

(ARG001)


29-31: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


33-44: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

ami/jobs/management/commands/debug_jobs.py

119-119: Unused method argument: args

(ARG002)


163-163: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


163-163: Avoid specifying long messages outside the exception class

(TRY003)


190-190: Do not catch blind exception: Exception

(BLE001)


207-207: f-string without any placeholders

Remove extraneous f prefix

(F541)


221-221: Avoid specifying long messages outside the exception class

(TRY003)


276-276: f-string without any placeholders

Remove extraneous f prefix

(F541)


327-327: Do not catch blind exception: Exception

(BLE001)


407-407: f-string without any placeholders

Remove extraneous f prefix

(F541)


413-413: f-string without any placeholders

Remove extraneous f prefix

(F541)


431-431: f-string without any placeholders

Remove extraneous f prefix

(F541)


439-439: f-string without any placeholders

Remove extraneous f prefix

(F541)


448-448: Do not catch blind exception: Exception

(BLE001)


453-453: f-string without any placeholders

Remove extraneous f prefix

(F541)


458-458: f-string without any placeholders

Remove extraneous f prefix

(F541)


463-463: f-string without any placeholders

Remove extraneous f prefix

(F541)


471-471: f-string without any placeholders

Remove extraneous f prefix

(F541)


478-478: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


478-478: Avoid specifying long messages outside the exception class

(TRY003)


494-494: f-string without any placeholders

Remove extraneous f prefix

(F541)


512-512: f-string without any placeholders

Remove extraneous f prefix

(F541)


515-515: Do not catch blind exception: Exception

(BLE001)


520-520: f-string without any placeholders

Remove extraneous f prefix

(F541)


526-526: f-string without any placeholders

Remove extraneous f prefix

(F541)


527-527: f-string without any placeholders

Remove extraneous f prefix

(F541)


529-529: f-string without any placeholders

Remove extraneous f prefix

(F541)


530-530: f-string without any placeholders

Remove extraneous f prefix

(F541)


532-532: f-string without any placeholders

Remove extraneous f prefix

(F541)


533-533: f-string without any placeholders

Remove extraneous f prefix

(F541)


535-535: f-string without any placeholders

Remove extraneous f prefix

(F541)


536-536: f-string without any placeholders

Remove extraneous f prefix

(F541)

ami/jobs/management/commands/update_stale_jobs.py

29-29: Unused method argument: args

(ARG002)


47-47: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: test
  • GitHub Check: Redirect rules
  • GitHub Check: Header rules
  • GitHub Check: Pages changed

Comment on lines +315 to +323
if status_changed:
    job.refresh_from_db()
    if dry_run:
        self.stdout.write(
            self.style.WARNING(f"  → Would change to: {job.status} (dry-run, not saved)")
        )
    else:
        self.stdout.write(self.style.SUCCESS(f"  → Changed to: {job.status}"))
    updated += 1

⚠️ Potential issue | 🟠 Major

Dry-run reporting prints the wrong target status

When --check-all runs in dry-run mode, job.check_status(save=False, …) mutates the in-memory job.status to the prospective value. Immediately refreshing from the database resets it before logging, so the output always shows the original status and the dry-run preview is misleading. Capture the new status before refreshing (or defer the refresh until after printing) so the CLI reports what would actually change.

                 if status_changed:
-                    job.refresh_from_db()
-                    if dry_run:
-                        self.stdout.write(
-                            self.style.WARNING(f"  → Would change to: {job.status} (dry-run, not saved)")
-                        )
-                    else:
-                        self.stdout.write(self.style.SUCCESS(f"  → Changed to: {job.status}"))
+                    new_status = job.status
+                    if dry_run:
+                        self.stdout.write(
+                            self.style.WARNING(f"  → Would change to: {new_status} (dry-run, not saved)")
+                        )
+                        job.refresh_from_db()
+                    else:
+                        job.refresh_from_db()
+                        self.stdout.write(self.style.SUCCESS(f"  → Changed to: {job.status}"))
                     updated += 1

Comment on lines +29 to +33
dependencies = [
    ("jobs", "0017_alter_job_logs_alter_job_progress"),
]

operations = [

⚠️ Potential issue | 🔴 Critical

Declare dependency on django_celery_beat before touching its tables.

This migration calls IntervalSchedule.objects.get_or_create() and PeriodicTask.objects.get_or_create(), but it has no dependency on django_celery_beat. On a fresh database Django can schedule this migration before the beat app’s migrations, so the underlying tables don’t exist yet and you get a ProgrammingError: relation "django_celery_beat_periodictask" does not exist. Add an explicit dependency so we only run after beat’s schema is in place.

 class Migration(migrations.Migration):
     dependencies = [
         ("jobs", "0017_alter_job_logs_alter_job_progress"),
+        ("django_celery_beat", "0001_initial"),  # adjust to the latest beat migration in this project
     ]



Development

Successfully merging this pull request may close these issues.

Ensure that a processing job's status is always updated correctly

2 participants