- Publish a document explaining how heartbeats are used to mark runs and tasks as failed. It could live at metaflow.org and in the READMEs of this repository, in addition to https://github.com/Netflix/metaflow-service/blob/master/services/ui_backend_service/docs/environment.md#heartbeat-intervals
- When a task or run fails because of a missing heartbeat, show that fact in MFGUI.
- Define a default minimum and a default maximum heartbeat time. If a task or run misses the minimum heartbeat time, show it as "pending"; show it as "failed" only once it misses the maximum heartbeat time. This functionality will have to account for resumes and multiple attempts.
The reason for this issue is twofold: some runs and tasks are marked as "failed" when they have not started yet, and others are still marked as "running" after they have failed, because the heartbeat threshold has not been reached yet.
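
To make the proposed two-threshold behavior concrete, here is a minimal sketch of how a status could be derived from the age of the last heartbeat. It is not the current metaflow-service implementation: the function name, the default values, and the choice to treat a run or task with no heartbeat yet as "pending" are all assumptions for illustration, and it does not address resumes or multiple attempts.

```python
from typing import Optional

# Hypothetical defaults in seconds; this issue does not specify actual values.
MIN_HEARTBEAT_SECONDS = 60.0
MAX_HEARTBEAT_SECONDS = 300.0


def classify_by_heartbeat(
    last_heartbeat_ts: Optional[float],
    now_ts: float,
    min_seconds: float = MIN_HEARTBEAT_SECONDS,
    max_seconds: float = MAX_HEARTBEAT_SECONDS,
) -> str:
    """Return "running", "pending", or "failed" from the last heartbeat's age.

    Timestamps are seconds since the epoch. A missing heartbeat is treated
    as "pending" (assumption: the run or task has not started yet).
    """
    if min_seconds > max_seconds:
        raise ValueError("min_seconds must not exceed max_seconds")

    # No heartbeat recorded yet: assume not started rather than failed.
    if last_heartbeat_ts is None:
        return "pending"

    age = now_ts - last_heartbeat_ts
    if age <= min_seconds:
        # Heartbeat is recent enough to consider the run or task alive.
        return "running"
    if age <= max_seconds:
        # Missed the minimum threshold but not yet the maximum one.
        return "pending"
    # Missed the maximum threshold.
    return "failed"
```

With these assumed defaults, a heartbeat that is 100 seconds old would show as "pending", and one that is 400 seconds old would show as "failed".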