
[Bug] calling workflow.random().randint before workflow.uuid4 breaks replay with [TMPRL1100] Nondeterminism error #1109

@dpr-synth

Description


What are you really trying to do?

I have a long-running workflow that uses workflow.uuid4 to generate the ID of a child workflow and then sleeps. I modified the retry policy of an activity so that initial_interval is set with workflow.random().randint. The modified activity is called before the child workflow is started. I then deployed the new workflow code.

Describe the bug

All running workflows failed with a [TMPRL1100] Nondeterminism error after waking up from the sleep. Every error complained about the ID of the child workflow. Example error message:

[TMPRL1100] Nondeterminism error: Child workflow id of scheduled event 'child_workflow_6bfd6073-2403-413f-acc3-7090cf837d0a' does not match child workflow id of command 'child_workflow_96a033ac-6730-4f6d-abfd-60732403113f

If I make any other change to the retry policy, e.g., changing initial_interval from one number to another, I don't get the above error. It only happens if a call to a workflow.random() method, such as randint, is added.

Minimal Reproduction

I managed to reproduce the issue with the simple workflow code below.

  1. Deploy the worker
  2. Start the workflow
  3. Uncomment workflow.random().randint
  4. Re-deploy the worker
  5. After the workflow wakes up, it fails with a non-determinism error

I also got the same error with other random methods, e.g., workflow.random().uniform. However, adding just a bare workflow.random() call does not make the workflow fail with non-determinism. If workflow.uuid4 is called BEFORE workflow.random().randint, the workflow does not fail with non-determinism after workflow.random().randint is added.

from datetime import timedelta

from temporalio import workflow


@workflow.defn
class ChildWorkflowUuid:
    @workflow.run
    async def run(self) -> str:
        return workflow.info().workflow_id


@workflow.defn
class WorkflowUuid:
    @workflow.run
    async def run(self) -> str:
        # IF CHILD ID IS SET HERE, THE WORKFLOW COMPLETES WITHOUT ANY ERRORS AFTER ADDING `workflow.random().randint(1, 5)`
        # child_workflow_id = str(workflow.uuid4())
        # UNCOMMENT `workflow.random().randint(1, 5)` WHILE WORKFLOW IS SLEEPING (just `workflow.random()` does not cause non-determinism error)
        # workflow.random().randint(1, 5)
        # IF CHILD ID IS SET HERE, WORKFLOW FAILS WITH
        # [TMPRL1100] Nondeterminism error: Child workflow id of scheduled event '0d4bdc56-4950-4478-94ad-076b61f06fb1' does not match child workflow id of command 'ab05243a-0d4b-4c56-8950-547854ad076b
        child_workflow_id = str(workflow.uuid4())
        child_workflow_uuid = await workflow.execute_child_workflow(
            ChildWorkflowUuid.run,
            id=child_workflow_id,
        )
        await workflow.sleep(timedelta(seconds=30))
        return child_workflow_uuid

Environment/Versions

  • OS and processor: M1 Mac Pro and x86 Linux
  • Temporal Version: 1.17.0
  • Are you using Docker or Kubernetes or building Temporal from source: no

Additional context

I am not sure whether this is a bug or expected behaviour. I assumed that the workflow.uuid4 seed depends only on the run ID, which should be unchanged after the workflow code was modified.
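A plain-Python sketch of the mechanism the behaviour above points to, under one assumption: that workflow.uuid4() is built from 128 bits drawn from the same seeded random.Random instance that workflow.random() returns. The SDK source appears to work this way, but treat it as an assumption here. Under that assumption the seed does stay fixed per run, yet every draw advances the shared stream. A new randint placed before uuid4() therefore changes the generated ID, while one placed after it does not. SEED and the helper functions are hypothetical stand-ins, not SDK APIs.

```python
import random
import uuid

SEED = 1234  # hypothetical stand-in for the per-run seed Temporal derives


def sketch_uuid4(rng: random.Random) -> uuid.UUID:
    # Assumption: mirrors workflow.uuid4(), i.e. 128 bits taken from the
    # same generator that workflow.random() hands out.
    return uuid.UUID(bytes=rng.getrandbits(128).to_bytes(16, "big"), version=4)


def original_run(seed: int) -> str:
    rng = random.Random(seed)  # obtaining the generator draws nothing
    return str(sketch_uuid4(rng))


def randint_before_uuid(seed: int) -> str:
    rng = random.Random(seed)
    rng.randint(1, 5)  # the newly added call advances the shared stream
    return str(sketch_uuid4(rng))


def uuid_before_randint(seed: int) -> str:
    rng = random.Random(seed)
    child_id = str(sketch_uuid4(rng))  # ID is fixed before the new draw
    rng.randint(1, 5)
    return child_id


if __name__ == "__main__":
    print("original:           ", original_run(SEED))
    # Differs from the original: on replay this would mismatch history.
    print("randint before uuid:", randint_before_uuid(SEED))
    # Matches the original: the new draw happens after the ID is taken.
    print("uuid before randint:", uuid_before_randint(SEED))
```

On replay, history holds the ID that original_run would produce, but the modified code issues the command that randint_before_uuid would produce, which is the kind of mismatch TMPRL1100 reports. The two IDs in the error message inside the reproduction code also share byte runs (e.g. `0d4b` and, ignoring hyphens, `ad076b`), which is consistent with the stream having been shifted rather than reseeded. In real workflow code, Temporal's versioning mechanism (e.g. workflow.patched) is the usual way to introduce such a call without affecting histories that were recorded without it.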

Metadata


Assignees

No one assigned

    Labels

    bug: Something isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
