Commit cb444d6

rfc21: add reattach job state
1 parent 379966b commit cb444d6

3 files changed: 130 additions and 89 deletions

data/spec_21/states.dot

Lines changed: 4 additions & 1 deletion
@@ -12,7 +12,7 @@ digraph states {
 DEPEND;
 PRIORITY;
 SCHED;
-RUN;
+{rank=same; RUN; REATTACH;}
 CLEANUP;
 }

@@ -25,6 +25,9 @@ digraph states {

 SCHED -> PRIORITY [label="flux-restart"]

+RUN -> REATTACH [xlabel="reattach"]
+REATTACH -> RUN [xlabel="attached"]
+
 edge [weight=0 color="red"];

 DEPEND -> CLEANUP [label="exception"];

data/spec_21/states.svg

Lines changed: 85 additions & 86 deletions

spec_21.rst

Lines changed: 41 additions & 2 deletions
@@ -110,6 +110,14 @@ RUN
   job shells have been started, and a ``finish`` event once all the job shells
   have exited. The state transitions to CLEANUP.

+REATTACH
+  The job was started, but the job manager has lost track of it
+  due to an error (for example, a system crash). The job manager is
+  attempting to reattach itself to the running job. A
+  ``reattach`` event is logged to indicate transition into this
+  state. An ``attached`` event is logged when tracking has been
+  reestablished and the job can re-enter the RUN state.
+
 CLEANUP
   The job has completed or an exception has occurred. Under normal termination,
   the job manager waits for notification from the exec service that job
@@ -133,10 +141,10 @@ PENDING
   The job is in DEPEND, PRIORITY, or SCHED states.

 RUNNING
-  The job is in RUN or CLEANUP states.
+  The job is in RUN, REATTACH, or CLEANUP states.

 ACTIVE
-  The job is in DEPEND, PRIORITY, SCHED, RUN, or CLEANUP states.
+  The job is in DEPEND, PRIORITY, SCHED, RUN, REATTACH, or CLEANUP states.


 Exceptions
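
Reviewer note: the virtual-state change in the hunk above can be sketched as a small lookup. This is an illustrative Python sketch, not code from the repository; the names ``JobState`` and ``virtual_states`` are hypothetical and exist only for this example.

```python
from enum import Enum


class JobState(Enum):
    """Job states named in the spec after this commit (hypothetical enum)."""
    DEPEND = "DEPEND"
    PRIORITY = "PRIORITY"
    SCHED = "SCHED"
    RUN = "RUN"
    REATTACH = "REATTACH"
    CLEANUP = "CLEANUP"


# Virtual states as listed in the updated text.
PENDING = frozenset({JobState.DEPEND, JobState.PRIORITY, JobState.SCHED})
RUNNING = frozenset({JobState.RUN, JobState.REATTACH, JobState.CLEANUP})
ACTIVE = PENDING | RUNNING


def virtual_states(state):
    """Return the names of the virtual states that contain `state`."""
    groups = {"PENDING": PENDING, "RUNNING": RUNNING, "ACTIVE": ACTIVE}
    return {name for name, members in groups.items() if state in members}


reattach_membership = virtual_states(JobState.REATTACH)
sched_membership = virtual_states(JobState.SCHED)
```

Under this reading, a job in REATTACH counts as both RUNNING and ACTIVE, which matches the two changed definitions.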
@@ -391,6 +399,37 @@ status
     {"timestamp":1552594348.0,"name":"epilog-finish","context":{"description":"/usr/sbin/job-epilog.sh", "status":0}}


+Reattach Event
+^^^^^^^^^^^^^^
+
+The job manager is attempting to reattach to a running job.
+
+The following keys are OPTIONAL in the event context object:
+
+id
+  (long long) job ID to reattach to
+
+Example:
+
+.. code:: json
+
+    {"timestamp":1636747761.5495925,"name":"reattach","context":{"id":341835776000}}
+
+
+Attached Event
+^^^^^^^^^^^^^^
+
+The job manager has reconnected to the job shells.
+
+The context SHALL be empty.
+
+Example:
+
+.. code:: json
+
+    {"timestamp":1636747761.827836,"name":"attached"}
+
+
 Free Event
 ^^^^^^^^^^

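
Reviewer note: the RUN/REATTACH round trip added to ``states.dot`` can be sketched as a replay over eventlog lines. This is a minimal Python sketch under stated assumptions: the event that leaves REATTACH is taken to be named ``attached``, as in the REATTACH state description and the graph edge label; ``TRANSITIONS`` and ``replay`` are hypothetical names; and ignoring unrelated events is a simplification of this sketch, not behavior defined by the spec.

```python
import json

# Transitions added by this commit: (current state, event name) -> next state.
TRANSITIONS = {
    ("RUN", "reattach"): "REATTACH",
    ("REATTACH", "attached"): "RUN",
}


def replay(state, eventlog_lines):
    """Return the state after applying each eventlog line in order."""
    for line in eventlog_lines:
        event = json.loads(line)
        # Events with no matching transition leave the state unchanged
        # (a simplification for this sketch).
        state = TRANSITIONS.get((state, event["name"]), state)
    return state


log = [
    '{"timestamp":1636747761.5495925,"name":"reattach","context":{"id":341835776000}}',
    '{"timestamp":1636747761.827836,"name":"attached"}',
]

after_reattach = replay("RUN", log[:1])
final_state = replay("RUN", log)
```

Replaying only the first line leaves the job in REATTACH; replaying both returns it to RUN.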