Update status of disconnected and stale jobs #981

mihow · 2025-10-07T01:40:15Z

Summary

Introduces a new periodic status check for all incomplete jobs.

Fixes: #721

List of Changes

Detailed Description

How to Test the Changes

Checklist

I have tested these changes appropriately.
I have added and/or modified relevant tests.
I updated relevant documentation or comments.
I have verified that this PR follows the project's coding standards.
Any dependent changes have already been merged to main.

Summary by CodeRabbit

Release Notes

New Features
- Added debugging utilities for troubleshooting job and task issues.
- Introduced automatic periodic checks of unfinished jobs every 3 minutes to maintain consistency.
Bug Fixes
- Improved concurrent job update handling with enhanced safety mechanisms.
- Better detection and automatic retry of disappeared tasks.
Tests
- Added comprehensive test coverage for job status checking and concurrent operations.

netlify · 2025-10-07T01:40:29Z

✅ Deploy Preview for antenna-preview ready!

Name	Link
🔨 Latest commit	`3917882`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-preview/deploys/6903e12705cb520008cadacb
😎 Deploy Preview	https://deploy-preview-981--antenna-preview.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.
Lighthouse	1 paths audited Performance: 29 (🔴 down 2 from production) Accessibility: 80 (no change from production) Best Practices: 100 (no change from production) SEO: 92 (no change from production) PWA: 80 (no change from production) View the detailed breakdown and full score reports

To edit notification comments on pull requests, go to your Netlify project configuration.

Copilot

Pull Request Overview

This PR implements a comprehensive job status monitoring system to detect and handle disconnected and stale jobs automatically. The system introduces periodic status checking with configurable timeouts and automatic retry mechanisms for failed tasks.

Adds a new check_status() method to the Job model that validates job state against Celery task status
Implements a periodic task (check_unfinished_jobs) that runs every 3 minutes to monitor all incomplete jobs
Includes extensive test coverage for various job failure scenarios including disappeared tasks, stale jobs, and timeout conditions

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
ami/jobs/models.py	Core implementation of job status checking with timeout handling and auto-retry logic
ami/jobs/tasks.py	Periodic task for monitoring unfinished jobs with cache-based locking
ami/jobs/tests.py	Comprehensive test suite covering all status checking scenarios
ami/jobs/migrations/0018_add_last_checked_at_and_periodic_task.py	Database migration adding last_checked_at field and periodic task setup
ami/jobs/management/commands/update_stale_jobs.py	Updated management command to use new status checking system
ami/jobs/management/commands/debug_jobs.py	New debugging utility for testing job status scenarios
setup.cfg	Flake8 configuration to ignore F541 warnings
.vscode/settings.json	VS Code settings update for F541 ignore

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-10-07T01:41:32Z

ami/jobs/models.py

+                # Only update the fields we changed to avoid concurrency issues
+                update_fields = ["last_checked_at"]
+                if status_changed:
+                    update_fields.extend(["status", "progress", "finished_at"])
+                self.save(update_fields=update_fields, update_progress=False)


[nitpick] The comment mentions avoiding concurrency issues but the code pattern is repeated multiple times. Consider extracting this logic into a helper method to reduce duplication and ensure consistent handling of partial field updates.

Copilot · 2025-10-07T01:41:33Z

ami/jobs/tasks.py

+            unfinished_jobs = unfinished_jobs[:MAX_JOBS_PER_RUN]
+
+        # Only check jobs that haven't been checked recently
+        now = datetime.datetime.now()


Using datetime.datetime.now() instead of timezone.now() may cause timezone-related issues. Since Django's timezone.now() is already imported elsewhere in the codebase, consider using it here for consistency.

Copilot · 2025-10-07T01:41:33Z

ami/jobs/models.py

+        STUCK_PENDING_NO_WORKERS_TIMEOUT_SECONDS = 3600  # 1 hour - max time in PENDING without workers
+        PENDING_LOG_INTERVAL_SECONDS = 300  # 5 minutes - how often to log waiting messages
+
+        now = datetime.datetime.now()


Using datetime.datetime.now() instead of Django's timezone.now() may cause timezone-related issues, especially when comparing with database timestamps that are timezone-aware.

Copilot · 2025-10-07T01:41:33Z

ami/jobs/models.py

+        # Configuration thresholds (TODO: make these configurable via settings)
+        NO_TASK_ID_TIMEOUT_SECONDS = 300  # 5 minutes - time to wait for task_id before marking as failed
+        DISAPPEARED_TASK_RETRY_THRESHOLD_SECONDS = 300  # 5 minutes - auto-retry if task disappeared within this time
+        MAX_JOB_RUNTIME_SECONDS = 2 * 24 * 60 * 60  # 2 days - max time a job can run before being marked stale
+        STUCK_PENDING_TIMEOUT_SECONDS = 600  # 10 minutes - max time in PENDING with workers available
+        STUCK_PENDING_NO_WORKERS_TIMEOUT_SECONDS = 3600  # 1 hour - max time in PENDING without workers
+        PENDING_LOG_INTERVAL_SECONDS = 300  # 5 minutes - how often to log waiting messages


[nitpick] These configuration constants are hardcoded in the method. Consider moving them to Django settings or class-level constants to make them more maintainable and testable.

Copilot · 2025-10-07T01:41:33Z

ami/jobs/tasks.py

+    # Configuration thresholds (TODO: make these configurable via settings)
+    LOCK_TIMEOUT_SECONDS = 300  # 5 minutes - how long the lock is held
+    MAX_JOBS_PER_RUN = 100  # Maximum number of jobs to check in one run
+    MIN_CHECK_INTERVAL_MINUTES = 2  # Minimum time between checks for the same job


[nitpick] Similar to the job model, these configuration constants should be moved to Django settings to make them configurable without code changes.

mihow · 2025-10-10T04:35:06Z

@mihow

Add some filters to the jobs view in the Django admin to help troubleshoot all jobs as an admin
Add healthcheck to the celeryworker so that it gets restarted when stuck

…atus-checks

coderabbitai · 2025-10-30T22:05:34Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This pull request introduces a comprehensive job status reconciliation system to resolve issues where job statuses remain outdated or "stuck" in STARTED state. It adds a periodic Celery task that continuously checks unfinished jobs against their actual Celery task states, implements atomic row-level locking for safe concurrent updates, adds a debugging management command, and includes extensive tests for status checking and concurrency scenarios.

Changes

Cohort / File(s)	Summary
Core Status Reconciliation `ami/jobs/models.py`, `ami/jobs/tasks.py`, `ami/jobs/migrations/0018_add_last_checked_at_and_periodic_task.py`	Introduces `check_unfinished_jobs()` periodic task with cache-based locking to periodically re-check and reconcile job statuses against Celery task states. Adds `last_checked_at` field to Job model. Implements atomic job update context manager (`atomic_job_update`) and Celery worker availability checks. Enhances Job.save() with optional row-level locking to prevent concurrent update race conditions. Extends JobLogHandler with atomic logging writes.
Management Commands `ami/jobs/management/commands/debug_jobs.py`, `ami/jobs/management/commands/update_stale_jobs.py`	Introduces comprehensive debug_jobs command for testing job/Celery interactions (check, remove/corrupt tasks, list unfinished jobs, create test jobs). Updates update_stale_jobs to use new Job.check_status workflow and adds --no-retry flag for disabling automatic retry logic.
Testing `ami/jobs/tests.py`	Adds extensive TestJobStatusChecking suite covering status reconciliation with various Celery states, locking behavior, concurrent updates, log atomicity, pending-running ambiguity, disappeared task handling, and forced retry scenarios.

Sequence Diagram

sequenceDiagram
    participant Periodic as Periodic<br/>Task Scheduler
    participant CeleryTask as check_unfinished_jobs<br/>(Celery Task)
    participant Cache as Django Cache<br/>(Distributed Lock)
    participant JobModel as Job Model<br/>(DB + Locking)
    participant CeleryWorker as Celery AsyncResult<br/>(Remote Task State)

    Periodic->>CeleryTask: Trigger every 3 minutes
    CeleryTask->>Cache: Attempt to acquire lock
    alt Lock acquired
        Cache-->>CeleryTask: Lock successful
        CeleryTask->>JobModel: Query unfinished jobs<br/>(not recently checked)
        JobModel-->>CeleryTask: Job list
        loop For each Job
            CeleryTask->>JobModel: atomic_job_update(lock)
            CeleryTask->>CeleryWorker: Fetch current task state
            CeleryWorker-->>CeleryTask: Task status<br/>(PENDING/STARTED/SUCCESS/FAILED/etc.)
            CeleryTask->>JobModel: Reconcile & update status<br/>if state mismatched
            JobModel-->>CeleryTask: Updated (or no change)
            CeleryTask->>JobModel: Release lock
        end
        CeleryTask->>JobModel: Update last_checked_at
        CeleryTask->>Cache: Release distributed lock
    else Lock already held
        Cache-->>CeleryTask: Lock unavailable
        CeleryTask-->>Periodic: Skip (another run in progress)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

Concurrency and Locking (ami/jobs/models.py): The atomic_job_update context manager and row-level locking via select_for_update are critical for preventing race conditions. Review the lock timeout handling, fallback behavior on lock failure, and interaction with the save() method.
Periodic Task Cache Lock (ami/jobs/tasks.py): The distributed lock mechanism using Django cache must be verified to prevent concurrent task runs. Confirm lock acquisition, release, and timeout behavior under high concurrency.
Status Reconciliation Logic (ami/jobs/models.py, ami/jobs/tasks.py): The logic that reconciles job status with Celery AsyncResult state, handles disappeared tasks, and decides when to retry needs careful review for edge cases (e.g., pending-but-running ambiguity, stale timestamps).
Atomic Logging (ami/jobs/models.py): JobLogHandler.emit now uses atomic locking; verify that log writes are not lost and that the max log length cap is enforced without overwriting concurrent updates.
Field-Level Updates (ami/jobs/models.py, ami/jobs/tasks.py): Multiple save points now use update_fields to minimize lock duration; verify correctness of which fields are included and excluded in each context.
Test Coverage (ami/jobs/tests.py): The test suite is comprehensive; pay special attention to mocking of Celery AsyncResult states and verification of concurrent locking scenarios.

Poem

🐰 Hop, hop, hop! Our jobs were stuck in time,
STARTED forever—oh, what a crime!
Now periodic checks with locks so tight,
Reconcile statuses day and night.
Thump thump! No more stale, the status shines bright! 🌟

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The PR description is significantly incomplete relative to the template requirements. While the Summary section provides a brief one-line overview and the Related Issues section correctly references #721, the List of Changes, Detailed Description, and How to Test the Changes sections are entirely empty. These are critical sections that should document the scope of modifications, the implementation approach, and verification steps. The absence of this content makes it difficult for reviewers to understand the full impact and validation of the changes without examining the code directly.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The PR title "Update status of disconnected and stale jobs" accurately reflects the primary objective of this changeset, which is to introduce a periodic status check mechanism for incomplete jobs and handle edge cases like disappeared Celery tasks. The title is concise, specific, and directly related to the main changes including the new periodic task, atomic job locking, and enhanced status reconciliation logic. The title clearly summarizes what a teammate would see as the core feature being delivered.
Linked Issues Check	✅ Passed	The PR changes comprehensively address the core objective of issue #721 to "Ensure that a processing job's status is always updated correctly." The implementation includes a new periodic Celery task (check_unfinished_jobs) that scans and re-checks status of incomplete jobs [tasks.py], a new Job.check_status workflow that reconciles job status with Celery task state and handles disappeared/stale tasks [models.py], an updated update_stale_jobs command leveraging this workflow [update_stale_jobs.py], row-level locking to prevent concurrent update race conditions, and extensive test coverage validating status checking behavior across multiple scenarios. These changes directly address the reported issue where jobs incorrectly retained "STARTED" status despite errors or completion.
Out of Scope Changes Check	✅ Passed	All substantive code changes align with the requirements of issue #721. The core changes (periodic task, atomic locking, status reconciliation, migration) directly implement the solution. The new debug_jobs.py management command, while not strictly required to fix the core issue, provides supportive tooling for testing and troubleshooting job status interactions and Celery tasks, which relates to the overall objective of improving job status reliability and debuggability. The comprehensive test suite validates the new functionality and is appropriate for the feature scope.
Docstring Coverage	✅ Passed	Docstring coverage is 81.25% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/job-status-checks

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

ami/jobs/models.py (1)
1335-1359: Replace naive datetime.now() with timezone.now() in check_status.

We’re still calling datetime.datetime.now() here, so now is naive. The moment a job has scheduled_at/last_checked_at populated via timezone.now() (the normal path), the subtraction inside _check_missing_task_id or the “min check interval” logic will raise TypeError: can't subtract offset-naive and offset-aware datetimes. That makes the new periodic checker blow up after the first successful update. Please switch to django.utils.timezone.now() (and propagate it through the helpers) so we always compare like-with-like—this is the blocker that Copilot flagged earlier, and it’s still unresolved. Based on learnings.
-        now = datetime.datetime.now()
+        from django.utils import timezone
+        now = timezone.now()
ami/jobs/tasks.py (1)
115-126: Use timezone-aware timestamps in check_unfinished_jobs.

datetime.datetime.now() returns a naive datetime, but Job.last_checked_at and Job.scheduled_at are stored as timezone-aware values (we set them via timezone.now() all over the codebase). On the second invocation of this task the subtraction now - job.last_checked_at will hit a TypeError: can't subtract offset-naive and offset-aware datetimes, crashing the periodic job. Switch to django.utils.timezone.now() (and reuse it everywhere in this task) so the arithmetic stays compatible with our model fields.
@@
-    import datetime
+    import datetime
+    from django.utils import timezone
@@
-        now = datetime.datetime.now()
+        now = timezone.now()

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3b9711 and 3917882.

📒 Files selected for processing (6)

ami/jobs/management/commands/debug_jobs.py (1 hunks)
ami/jobs/management/commands/update_stale_jobs.py (2 hunks)
ami/jobs/migrations/0018_add_last_checked_at_and_periodic_task.py (1 hunks)
ami/jobs/models.py (24 hunks)
ami/jobs/tasks.py (3 hunks)
ami/jobs/tests.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (5)

ami/jobs/models.py (2)

ami/jobs/views.py (1)

retry (98-109)

ami/ml/post_processing/base.py (1)

update_progress (57-68)

ami/jobs/tests.py (2)

ami/jobs/models.py (8)

Job (875-1641)

check_status (1320-1409)

JobState (140-194)

atomic_job_update (91-137)

save (1518-1587)

update_progress (1439-1463)

logger (1626-1635)

JobLogHandler (386-429)

ami/jobs/tasks.py (1)

check_unfinished_jobs (72-157)

ami/jobs/management/commands/debug_jobs.py (1)

ami/jobs/models.py (7)

JobState (140-194)

MLJob (456-658)

check_status (1320-1409)

running_states (185-186)

final_states (189-190)

failed_states (193-194)

enqueue (966-983)

ami/jobs/tasks.py (1)

ami/jobs/models.py (6)

logger (1626-1635)

save (1518-1587)

Job (875-1641)

JobState (140-194)

running_states (185-186)

check_status (1320-1409)

ami/jobs/management/commands/update_stale_jobs.py (1)

ami/jobs/models.py (5)

Job (875-1641)

JobState (140-194)

running_states (185-186)

check_status (1320-1409)

save (1518-1587)

🪛 Ruff (0.14.2)

ami/jobs/models.py

65-65: Consider moving this statement to an else block

(TRY300)

66-66: Do not catch blind exception: Exception

(BLE001)

73-73: Unused function argument: timestamp

(ARG001)

394-394: Comment contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF003)

1073-1073: Unused method argument: now

(ARG002)

1103-1103: Do not catch blind exception: Exception

(BLE001)

1131-1131: Unused method argument: task

(ARG002)

1172-1172: Consider moving this statement to an else block

(TRY300)

1173-1173: Do not catch blind exception: Exception

(BLE001)

1174-1174: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

1243-1243: Do not catch blind exception: Exception

(BLE001)

1244-1244: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

1369-1369: Do not catch blind exception: Exception

(BLE001)

1400-1400: Do not catch blind exception: Exception

(BLE001)

1401-1401: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

1509-1509: Do not catch blind exception: Exception

(BLE001)

1576-1576: Do not catch blind exception: Exception

(BLE001)

1577-1577: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

ami/jobs/tests.py

388-388: Unused method argument: mock_job_objects

(ARG002)

ami/jobs/migrations/0018_add_last_checked_at_and_periodic_task.py

5-5: Unused function argument: apps

(ARG001)

5-5: Unused function argument: schema_editor

(ARG001)

23-23: Unused function argument: apps

(ARG001)

23-23: Unused function argument: schema_editor

(ARG001)

29-31: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

33-44: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

ami/jobs/management/commands/debug_jobs.py

119-119: Unused method argument: args

(ARG002)

163-163: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

163-163: Avoid specifying long messages outside the exception class

(TRY003)

190-190: Do not catch blind exception: Exception

(BLE001)

207-207: f-string without any placeholders

Remove extraneous f prefix

(F541)

221-221: Avoid specifying long messages outside the exception class

(TRY003)

276-276: f-string without any placeholders

Remove extraneous f prefix

(F541)

327-327: Do not catch blind exception: Exception

(BLE001)

407-407: f-string without any placeholders

Remove extraneous f prefix

(F541)

413-413: f-string without any placeholders

Remove extraneous f prefix

(F541)

431-431: f-string without any placeholders

Remove extraneous f prefix

(F541)

439-439: f-string without any placeholders

Remove extraneous f prefix

(F541)

448-448: Do not catch blind exception: Exception

(BLE001)

453-453: f-string without any placeholders

Remove extraneous f prefix

(F541)

458-458: f-string without any placeholders

Remove extraneous f prefix

(F541)

463-463: f-string without any placeholders

Remove extraneous f prefix

(F541)

471-471: f-string without any placeholders

Remove extraneous f prefix

(F541)

478-478: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

478-478: Avoid specifying long messages outside the exception class

(TRY003)

494-494: f-string without any placeholders

Remove extraneous f prefix

(F541)

512-512: f-string without any placeholders

Remove extraneous f prefix

(F541)

515-515: Do not catch blind exception: Exception

(BLE001)

520-520: f-string without any placeholders

Remove extraneous f prefix

(F541)

526-526: f-string without any placeholders

Remove extraneous f prefix

(F541)

527-527: f-string without any placeholders

Remove extraneous f prefix

(F541)

529-529: f-string without any placeholders

Remove extraneous f prefix

(F541)

530-530: f-string without any placeholders

Remove extraneous f prefix

(F541)

532-532: f-string without any placeholders

Remove extraneous f prefix

(F541)

533-533: f-string without any placeholders

Remove extraneous f prefix

(F541)

535-535: f-string without any placeholders

Remove extraneous f prefix

(F541)

536-536: f-string without any placeholders

Remove extraneous f prefix

(F541)

ami/jobs/management/commands/update_stale_jobs.py

29-29: Unused method argument: args

(ARG002)

47-47: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: test
GitHub Check: Redirect rules
GitHub Check: Header rules
GitHub Check: Pages changed

coderabbitai · 2025-10-30T22:11:23Z

ami/jobs/management/commands/debug_jobs.py

+                if status_changed:
+                    job.refresh_from_db()
+                    if dry_run:
+                        self.stdout.write(
+                            self.style.WARNING(f"  → Would change to: {job.status} (dry-run, not saved)")
+                        )
+                    else:
+                        self.stdout.write(self.style.SUCCESS(f"  → Changed to: {job.status}"))
+                    updated += 1


⚠️ Potential issue | 🟠 Major

Dry-run reporting prints the wrong target status

When --check-all runs in dry-run mode, job.check_status(save=False, …) mutates the in-memory job.status to the prospective value. Immediately refreshing from the database resets it before logging, so the output always shows the original status and the dry-run preview is misleading. Capture the new status before refreshing (or defer the refresh until after printing) so the CLI reports what would actually change.

if status_changed: - job.refresh_from_db() - if dry_run: - self.stdout.write( - self.style.WARNING(f" → Would change to: {job.status} (dry-run, not saved)") - ) - else: - self.stdout.write(self.style.SUCCESS(f" → Changed to: {job.status}")) + new_status = job.status + if dry_run: + self.stdout.write( + self.style.WARNING(f" → Would change to: {new_status} (dry-run, not saved)") + ) + job.refresh_from_db() + else: + job.refresh_from_db() + self.stdout.write(self.style.SUCCESS(f" → Changed to: {job.status}")) updated += 1

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if status_changed:

job.refresh_from_db()

if dry_run:

self.stdout.write(

self.style.WARNING(f" → Would change to: {job.status} (dry-run, not saved)")

)

else:

self.stdout.write(self.style.SUCCESS(f" → Changed to: {job.status}"))

updated += 1

if status_changed:

new_status = job.status

if dry_run:

self.stdout.write(

self.style.WARNING(f" → Would change to: {new_status} (dry-run, not saved)")

)

job.refresh_from_db()

else:

job.refresh_from_db()

self.stdout.write(self.style.SUCCESS(f" → Changed to: {job.status}"))

updated += 1

🤖 Prompt for AI Agents

In ami/jobs/management/commands/debug_jobs.py around lines 315 to 323, the dry-run branch refreshes the job from DB before printing, which resets the in-memory job.status and causes the CLI to show the original status instead of the prospective one; capture the new_status returned or assigned by job.check_status(save=False, ...) (or store job.status into a local variable immediately after check_status) and use that saved value when printing the dry-run message (or move job.refresh_from_db() to after the print) so the "Would change to" output shows the intended target status.

coderabbitai · 2025-10-30T22:11:24Z

ami/jobs/migrations/0018_add_last_checked_at_and_periodic_task.py

+    dependencies = [
+        ("jobs", "0017_alter_job_logs_alter_job_progress"),
+    ]
+
+    operations = [


⚠️ Potential issue | 🔴 Critical

Declare dependency on django_celery_beat before touching its tables.

This migration calls IntervalSchedule.objects.get_or_create() and PeriodicTask.objects.get_or_create(), but it has no dependency on django_celery_beat. On a fresh database Django can schedule this migration before the beat app’s migrations, so the underlying tables don’t exist yet and you get a ProgrammingError: relation "django_celery_beat_periodictask" does not exist. Add an explicit dependency so we only run after beat’s schema is in place.

class Migration(migrations.Migration): dependencies = [ ("jobs", "0017_alter_job_logs_alter_job_progress"), + ("django_celery_beat", "0001_initial"), # adjust to the latest beat migration in this project ]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

dependencies = [

("jobs", "0017_alter_job_logs_alter_job_progress"),

]

operations = [

dependencies = [

("jobs", "0017_alter_job_logs_alter_job_progress"),

("django_celery_beat", "0001_initial"), # adjust to the latest beat migration in this project

]

operations = [

🧰 Tools

🪛 Ruff (0.14.2)

29-31: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents

In ami/jobs/migrations/0018_add_last_checked_at_and_periodic_task.py around lines 29 to 33, the migration uses IntervalSchedule.objects.get_or_create() and PeriodicTask.objects.get_or_create() but lacks a dependency on django_celery_beat; add an explicit dependency tuple to the dependencies list (for example ("django_celery_beat", "0001_initial")) so this migration runs only after django_celery_beat’s migrations have created the required tables.

mihow added 2 commits October 6, 2025 18:09

feat: periodic task for checking status of all jobs

1b07910

feat: add tests and management command for debugging lost jobs

b92bfd2

Copilot AI review requested due to automatic review settings October 7, 2025 01:40

Copilot AI reviewed Oct 7, 2025

View reviewed changes

mihow added 6 commits October 12, 2025 07:41

feat: use cached status of processing service before sending images

fb6fdc6

feat: add retry mechanism for processing service checks

5c635c4

feat: cleanup job status checks, check for resurrected

100d4da

feat: try to show more details about 500 errors

d3f5839

feat: don't overwrite logs & status in concurrent background tasks

198634d

fix: force progress to match status in certain cases

e9d3263

f-PLT assigned mihow Oct 30, 2025

mihow mentioned this pull request Oct 30, 2025

Add retry mechanism and improved error handling for processing services #1022

Merged

Merge branch 'main' of github.com:RolnickLab/antenna into feat/job-st…

3917882

…atus-checks

coderabbitai bot reviewed Oct 30, 2025

View reviewed changes

This was referenced Oct 31, 2025

Add periodic status check for incomplete jobs #1025

Open

[Draft] Don't overwrite logs & status in concurrent background tasks #1026

Draft

Update status of disconnected and stale jobs #981

Are you sure you want to change the base?

Update status of disconnected and stale jobs #981

Conversation

mihow commented Oct 7, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

List of Changes

Detailed Description

How to Test the Changes

Checklist

Summary by CodeRabbit

Release Notes

Uh oh!

netlify bot commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for antenna-preview ready!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 7, 2025

Choose a reason for hiding this comment

Uh oh!

mihow commented Oct 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mihow commented Oct 7, 2025 •

edited by coderabbitai bot

Loading

netlify bot commented Oct 7, 2025 •

edited

Loading

mihow commented Oct 10, 2025 •

edited

Loading

coderabbitai bot commented Oct 30, 2025 •

edited

Loading