OOM does not return exit code 137 in K8s executor #6436

@jorgee

Bug report

Expected behavior and actual behavior

When a task is killed by a container OOM error, Nextflow reports the following error instead of exit code 137:

Process UseMem (1) terminated for an unknown reason -- Likely it has been terminated by the external system

This makes it impossible to retry on OOM errors: because no exit code is reported, users cannot define a retry rule that checks whether the exit code is 137.
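For context, a minimal sketch of the kind of retry rule this affects, as it might appear in `nextflow.config`. The retry count and memory values are illustrative assumptions, not taken from this report.

```groovy
// Illustrative nextflow.config fragment (values are assumptions)
process {
    // Retry only when the task was killed with exit code 137 (OOM)
    errorStrategy = { task.exitStatus == 137 ? 'retry' : 'terminate' }
    maxRetries    = 3
    // Request more memory on each attempt
    memory        = { 2.GB * task.attempt }
}
```

With the behavior reported here, the `task.exitStatus == 137` check does not match on the K8s executor, so the task is never retried.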

Steps to reproduce the problem

Run a pipeline with a task that exhausts the available memory:

nextflow run 'https://github.com/robsyme/nf-test' -r mem-testing

Program output

In the log file, we can see the error is produced because the .exitcode file is not generated.

In other executors, such as AWS Batch, the exit code is first obtained from the API status; the .exitcode file is read only as a fallback when the API does not provide it.
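The API-first lookup with a file fallback could be sketched roughly as follows. This is not Nextflow's actual implementation: `resolveExitStatus` and its parameters are hypothetical names used only for illustration, and the sketch assumes the executor API can report the exit code of the terminated container.

```groovy
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper, not part of Nextflow.
// apiExitCode: exit code reported by the executor API, or null if unavailable
// exitFile:    path to the task's .exitcode file, or null
Integer resolveExitStatus(Integer apiExitCode, Path exitFile) {
    // 1. Prefer the exit code reported by the executor API
    if (apiExitCode != null)
        return apiExitCode

    // 2. Fall back to the .exitcode file written in the task work directory
    if (exitFile != null && Files.exists(exitFile)) {
        String text = new String(Files.readAllBytes(exitFile), 'UTF-8').trim()
        if (text.isInteger())
            return Integer.valueOf(text)
    }

    // 3. Neither source is available: the exit status is unknown
    return null
}
```

Under this ordering, an OOM-killed container would still yield 137 even when the .exitcode file was never written, as long as the API reports the exit code.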

Environment

  • Nextflow version: 25.04.6
  • Java version: [?]
  • Operating system: [macOS, Linux, etc]
  • Bash version: (use the command $SHELL --version)

Additional context

(Add any other context about the problem here)
