Merged
96 changes: 96 additions & 0 deletions docs/develop/python/failure-detection.mdx
@@ -24,12 +24,108 @@ description: Learn how to set Workflow and Activity timeouts, retries, retry pol

This page shows how to do the following:

- [Raise and Handle Exceptions](#exception-handling)
- [Deliberately Fail Workflows](#workflow-failure)
- [Set Workflow Timeouts](#workflow-timeouts)
- [Set Workflow Retries](#workflow-retries)
- [Set Activity Timeouts](#activity-timeouts)
- [Set an Activity Retry Policy](#activity-retries)
- [Heartbeat an Activity](#activity-heartbeats)

## Raise and Handle Exceptions {#exception-handling}

In each Temporal SDK, error handling is implemented idiomatically, following the conventions of the language.
Temporal uses several different error classes internally — for example, [`CancelledError`](https://python.temporal.io/temporalio.exceptions.CancelledError.html) in the Python SDK, to handle a Workflow cancellation.
You should not raise or otherwise implement these manually, as they are tied to Temporal platform logic.

The one Temporal error class that you will typically raise deliberately is [`ApplicationError`](https://python.temporal.io/temporalio.exceptions.ApplicationError.html).
In fact, *any* other exceptions that are raised from your Python code in a Temporal Activity will be converted to an `ApplicationError` internally.
This way, an error's type, severity, and any additional details can be sent to the Temporal Service, indexed by the Web UI, and even serialized across language boundaries.

In other words, these two code samples do the same thing:

```python
from temporalio import activity


class MyCustomError(Exception):
    def __init__(self, message, error_code):
        super().__init__(message)
        self.message = message
        self.error_code = error_code

    def __str__(self):
        return f"{self.message} (Error Code: {self.error_code})"


@activity.defn
async def my_activity(input: MyActivityInput):
    try:
        # Your activity logic goes here
        ...
    except Exception as e:
        # The current attempt number, starting at 1
        attempt = activity.info().attempt
        raise MyCustomError(
            f"Error encountered on attempt {attempt}",
            error_code=500,  # placeholder value for illustration
        ) from e
```

```python
from temporalio import activity
from temporalio.exceptions import ApplicationError


@activity.defn
async def my_activity(input: MyActivityInput):
    try:
        # Your activity logic goes here
        ...
    except Exception as e:
        # The current attempt number, starting at 1
        attempt = activity.info().attempt
        raise ApplicationError(
            type="MyCustomError",
            message=f"Error encountered on attempt {attempt}",
        ) from e
```

Depending on your implementation, you may decide to use either method.
One reason to use the Temporal `ApplicationError` class is because it allows you to set an additional `non_retryable` parameter.
Contributor

Suggested change
One reason to use the Temporal `ApplicationError` class is because it allows you to set an additional `non_retryable` parameter.
An additional reason to use the Temporal `ApplicationError` class is because it allows you to set an additional `non_retryable` parameter.

Suggested this change since you already gave one valid reason; so this is another.

Contributor Author

Hmmmm, did I already give another reason?

This way, you can decide whether an error should not be retried automatically by Temporal.
This is useful when retrying cannot help, such as with bad input data: the Activity fails immediately instead of being retried until a timeout elapses:

```python
from temporalio import activity
from temporalio.exceptions import ApplicationError


@activity.defn
async def my_activity(input: MyActivityInput):
    try:
        # Your activity logic goes here
        ...
    except Exception as e:
        # The current attempt number, starting at 1
        attempt = activity.info().attempt
        raise ApplicationError(
            type="MyNonRetryableError",
            message=f"Error encountered on attempt {attempt}",
            non_retryable=True,  # tells Temporal not to retry this failure
        ) from e
```

Alternatively, you can specify a list of non-retryable error types in your Activity [Retry Policy](#activity-retries).
Contributor

Quick code sample for this here would be nice!

Contributor Author

This is anchored down to the rest of the same doc, I didn't want to overload it here.


## Failing Workflows {#workflow-failure}

One of the core design principles of Temporal is that an Activity Failure will never directly cause a Workflow Failure — a Workflow should never return as Failed unless deliberately.
Contributor

Suggested change
One of the core design principles of Temporal is that an Activity Failure will never directly cause a Workflow Failure — a Workflow should never return as Failed unless deliberately.
One of the core design principles of Temporal is that an Activity Failure will never directly cause a Workflow Failure — a Workflow should never return as Failed unless it is deliberately returned that way.

Contributor Author

Hmmmm. Good pointing out this wasn't clear, can we just change "deliberately" to "intentional"? Fewer words.

By default, Temporal retries a failed Activity without limiting the number of attempts, stopping only when a timeout is reached.
Activities will not actually *return* a failure to your Workflow until this condition or another non-retryable condition is met.
At this point, you can decide how to handle an error returned by your Activity the way you would in any other program.
For example, you could implement a [Saga Pattern](https://learn.temporal.io/tutorials/python/trip-booking-app/) that uses `try` and `except` blocks to "unwind" some of the steps your Workflow has performed up to the point of Activity Failure.
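
The "unwind" idea can be sketched in plain Python. This is a simplified illustration, not the linked tutorial's code: the booking and cancellation functions are hypothetical stand-ins for Activity calls, and the failure is simulated with a `RuntimeError`.

```python
import asyncio


# Hypothetical booking steps; in a Workflow each would be an Activity call.
async def book_car(log: list) -> None:
    log.append("book_car")


async def book_hotel(log: list) -> None:
    log.append("book_hotel")


async def book_flight(log: list) -> None:
    # Simulate a step that has failed for good.
    raise RuntimeError("flight booking failed")


async def cancel_car(log: list) -> None:
    log.append("cancel_car")


async def cancel_hotel(log: list) -> None:
    log.append("cancel_hotel")


async def book_trip(log: list) -> None:
    compensations = []  # undo steps for everything completed so far
    try:
        await book_car(log)
        compensations.append(cancel_car)
        await book_hotel(log)
        compensations.append(cancel_hotel)
        await book_flight(log)
    except Exception:
        # Unwind completed steps in reverse order, then re-raise the error.
        for compensate in reversed(compensations):
            await compensate(log)
        raise
```

Running `asyncio.run(book_trip([]))` raises the `RuntimeError` after the hotel and car bookings have been cancelled, in that order.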

**You will only fail a Workflow by manually raising an `ApplicationError` from the Workflow code.**
You could do this in response to an Activity Failure, if the failure of that Activity means that your Workflow should not continue:

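```python
from temporalio import workflow
from temporalio.exceptions import ActivityError, ApplicationError

# Inside your Workflow's run method:
try:
    # Activity reference and options omitted here
    credit_card_confirmation = await workflow.execute_activity_method()
except ActivityError as e:
    workflow.logger.error(f"Unable to process credit card: {e.message}")
    # Pass `type` by keyword; extra positional arguments are treated as details
    raise ApplicationError(
        "Unable to process credit card",
        type="CreditCardProcessingError",
    ) from e
```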

This works differently in a Workflow than raising exceptions from Activities.
In an Activity, any Python exceptions or custom exceptions are converted to a Temporal `ApplicationError`.
In a Workflow, any exceptions that are raised other than an explicit Temporal `ApplicationError` will only fail that particular [Workflow Task](https://docs.temporal.io/tasks#workflow-task-execution) and be retried.
Contributor

Might be good to specifically mention:

  • the difference between a Workflow Task Failure vs Workflow Execution Failure. I know you're doing this here, but calling it Workflow Task Failure might help people understand this error when it appears in the Web UI.
  • Also that Workflow Task Failures retry by default.

Contributor Author

Personal bias, but imo I think trying to explain "Workflow Task vs Workflow Execution" gets really into the weeds (it's my least favorite aspect of the actual course right now), and I deliberately tried to avoid that here. A Workflow Task is a very marginal aspect of most people's understanding of Temporal, I want to keep the emphasis on "retry" vs "return".

This includes any typical Python runtime errors like a `NameError` or a `TypeError` that are raised automatically.
These errors are treated as bugs that can be corrected with a fixed deployment, rather than a reason for a Temporal Workflow Execution to return unexpectedly.

## Workflow timeouts {#workflow-timeouts}

**How to set Workflow timeouts using the Temporal Python SDK**