Commit c681eeb

Adding files for Workflow Task Failures view (#4007)
* Adding files for Workflow Task Failures view
* Update web-ui.mdx
  Fix whitespace
* Update web-ui.mdx
  Edits from tech review

Co-authored-by: Milecia McG <[email protected]>
1 parent 4cb9bf4 commit c681eeb

3 files changed

+71
-4
lines changed

docs/encyclopedia/detecting-workflow-failures.mdx

Lines changed: 23 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -27,7 +27,7 @@ If you need to perform an action inside your Workflow after a specific period of
2727
- [Workflow Run Timeout](#workflow-run-timeout)
2828
- [Workflow Task Timeout](#workflow-task-timeout)
2929

30-
## Workflow Execution Timeout? {#workflow-execution-timeout}
30+
## Workflow Execution Timeout {#workflow-execution-timeout}
3131

3232
**What is a Workflow Execution Timeout in Temporal?**
3333

@@ -51,7 +51,7 @@ If this timeout is reached, the Workflow Execution changes to a Timed Out status
5151
This timeout is different from the [Workflow Run Timeout](#workflow-run-timeout).
5252
This timeout is most commonly used for stopping the execution of a [Temporal Cron Job](/cron-job) after a certain amount of time has passed.
5353

54-
## Workflow Run Timeout? {#workflow-run-timeout}
54+
## Workflow Run Timeout {#workflow-run-timeout}
5555

5656
**What is a Workflow Run Timeout in Temporal?**
5757

@@ -79,7 +79,7 @@ This timeout is most commonly used to limit the execution time of a single [Temp
7979

8080
If the Workflow Run Timeout is reached, the Workflow Execution will be Timed Out.
8181

82-
## Workflow Task Timeout? {#workflow-task-timeout}
82+
## Workflow Task Timeout {#workflow-task-timeout}
8383

8484
**What is a Workflow Task Timeout in Temporal?**
8585

@@ -104,3 +104,23 @@ This Timeout is primarily available to recognize whether a Worker has gone down
104104
This timeout is primarily available to recognize whether a Worker has gone down so that the Workflow Execution can be recovered on a different Worker.
105105
The main reason for increasing the default value is to accommodate a Workflow Execution that has an extensive Workflow Execution History, requiring more than 10 seconds for the Worker to load.
106106
It's worth mentioning that although you can extend the timeout up to the maximum value of 120 seconds, it's not recommended to move beyond the default value.
107+
108+
## Detecting Workflow Task Failures
109+
110+
Use the `TemporalReportedProblems` Search Attribute to detect Workflows with failed Workflow Tasks.
111+
A failed Workflow Task does not cause the Workflow to fail. Some Tasks within a Workflow may be intended to fail.
112+
For example, a Workflow Task may check a remote data source for new messages. If there aren't any, the Task will fail as intended.
113+
If your Task has code to handle the failure, the Workflow will proceed.
114+
However, if your Workflow has a Task that fails and the failure is not handled, the Workflow will continue to run, but will not complete.
115+
Detecting Workflows in this state is a common troubleshooting issue.
116+
117+
To identify Workflows with Task failures, you can use the Temporal Web UI. See [Task Failures View](/web-ui/#task-failures-view) for more details.
118+
119+
You can also detect Workflows with Task failures by querying the `TemporalReportedProblems` Search Attribute from your observability tools.
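
As a rough illustration of the search described above, the following Python sketch builds a List Filter string and parses one `TemporalReportedProblems` value. It assumes that the visibility store accepts `IS NOT NULL` on this Search Attribute and that each value follows the `category=<category> cause=<cause>` format given in the Search Attributes table. The function names and the sample category and cause values are hypothetical.

```python
from typing import Dict


def build_task_failure_query(only_running: bool = True) -> str:
    """Build a List Filter that matches Workflows with reported problems.

    Assumption: the visibility store supports IS NOT NULL on this attribute.
    """
    clauses = ["TemporalReportedProblems IS NOT NULL"]
    if only_running:
        # Workflows with failing Workflow Tasks are described as still running.
        clauses.append('ExecutionStatus = "Running"')
    return " AND ".join(clauses)


def parse_reported_problem(value: str) -> Dict[str, str]:
    """Split one entry such as 'category=<category> cause=<cause>' into fields."""
    fields: Dict[str, str] = {}
    for token in value.split():
        key, sep, val = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = val
    return fields


if __name__ == "__main__":
    print(build_task_failure_query())
    # Hypothetical sample value, used only to show the parsing.
    print(parse_reported_problem("category=ExampleCategory cause=ExampleCause"))
```

The resulting query string can be passed to any client that accepts a List Filter, for example the `--query` flag of `temporal workflow list`.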
120+
121+
:::warning Activating the Task Failures View for AWS Namespaces
122+
123+
To enable the Task Failures View for a Namespace running on AWS, you need to update the Dynamic Config for that Namespace.
124+
See [Activating Task Failures View for AWS Namespaces](/web-ui/#activate-task-failures-view-for-aws).
125+
126+
:::

docs/encyclopedia/visibility/search-attributes.mdx

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -73,6 +73,7 @@ These Search Attributes are created when the initial index is created.
7373
| StateTransitionCount | Int | The number of times that Workflow Execution has persisted its state. Available only for closed Workflows. |
7474
| TaskQueue | Keyword | Task Queue used by Workflow Execution. |
7575
| TemporalChangeVersion | Keyword List | Stores change/version pairs if the GetVersion API is enabled. |
76+
| TemporalReportedProblems | Keyword List | Stores information about Workflow Task failures. Formatted as `category=<category> cause=<cause>`. |
7677
| TemporalScheduledStartTime | Datetime | The time that the Workflow is scheduled to start according to the Schedule Spec. Can be manually triggered. Set on Schedules. |
7778
| TemporalScheduledById | Keyword | The Id of the Schedule that started the Workflow. |
7879
| TemporalSchedulePaused | Boolean | Indicates whether the Schedule has been paused. Set on Schedules. |

docs/web-ui.mdx

Lines changed: 47 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -54,7 +54,7 @@ For start time and end time, users can set their preferred date and time format
5454
Select a Workflow Execution to view the Workflow Execution's History, Workers, Relationships, pending Activities and
5555
Nexus Operations, Queries, and Metadata.
5656

57-
### Saved Views
57+
### Saved Views {#saved-views}
5858

5959
Saved Views let you save and reuse your frequently used visibility queries in the Temporal Web UI. Instead of recreating
6060
complex filters every time, you can save them once and apply them with a single click.
@@ -137,6 +137,52 @@ Saved Views that use relative times will be shared with absolute time.
137137

138138
:::
139139

140+
## Task Failures View {#task-failures-view}
141+
142+
The Task Failures view is a pre-defined Saved View that displays Workflows that have a Workflow Task failure.
143+
These Workflows are still running, but one of their Tasks has failed or timed out.
144+
145+
The details of the Task Failures View display the Workflow's ID, the Run ID, and the Workflow type.
146+
Clicking on any of the links in the details opens the Workflow page for that Workflow.
147+
On this page, you will find more information about the Task that failed and any remaining pending Tasks.
148+
You can also cancel the Workflow by clicking the Request Cancellation button on this page.
149+
150+
### Activating Task Failures View for AWS Namespaces {#activate-task-failures-view-for-aws}
151+
152+
To enable the Task Failures View for a Namespace running on AWS, you need to update the Dynamic Config first.
153+
To turn the feature on for a Namespace, use the following command:
154+
155+
``` command
156+
omni ocld dynamic-config namespace patch --namespace "$NS" --json '{
157+
"system.numConsecutiveWorkflowTaskProblemsToTriggerSearchAttribute": 5
158+
}'
159+
```
160+
161+
`$NS` is the name of the Namespace where you want to set up the Task Failures View.
162+
`numConsecutiveWorkflowTaskProblemsToTriggerSearchAttribute` is the number of consecutive Workflow Task Failures
163+
required to trigger the `TemporalReportedProblems` Search Attribute.
164+
The default value is 5. If adding this search attribute causes strain on the visibility system, consider increasing this number.
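
One way to read the threshold semantics is sketched below in Python. This is a hypothetical model, not the server's implementation: the function name and the exact comparison (`>=`) are assumptions. Only the default of 5 and the fact that 0 turns the feature off come from this page.

```python
DEFAULT_THRESHOLD = 5  # default value stated in the documentation


def should_report_problem(consecutive_failures: int,
                          threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Return True when the Search Attribute would be set under this model.

    Assumption: the attribute is set once the count of consecutive
    Workflow Task failures reaches the configured threshold.
    """
    if threshold <= 0:
        # A threshold of 0 is documented as turning the feature off.
        return False
    return consecutive_failures >= threshold


if __name__ == "__main__":
    for failures in (4, 5):
        print(failures, should_report_problem(failures))
```

Under this reading, raising the threshold means fewer Workflows are tagged, which is why a higher value reduces load on the visibility system.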
165+
166+
To turn off the feature for a Namespace, set `numConsecutiveWorkflowTaskProblemsToTriggerSearchAttribute` to 0.
167+
You can also deactivate the feature by removing the key:
168+
169+
``` command
170+
omni ocld dynamic-config namespace remove \
171+
-n "$NS" \
172+
--key "system.numConsecutiveWorkflowTaskProblemsToTriggerSearchAttribute"
173+
```
174+
175+
where `$NS` is the name of the Namespace for which you wish to deactivate the feature.
176+
177+
To determine which Namespaces in your fleet have the feature activated, use the following command:
178+
179+
``` command
180+
omni ocld dc search \
181+
--namespace \
182+
--key-regex 'system.numConsecutiveWorkflowTaskProblemsToTriggerSearchAttribute' \
183+
--all
184+
```
185+
140186
## History
141187

142188
A Workflow Execution History is a view of the [Events](/workflow-execution/event#event) and Event fields within the
