Description
Version of Eraser
v1.3.1
Expected Behavior
I have multiple clusters running Eraser v1.3.1. We set our success ratio fairly low (80%, down from 95%) because we couldn't get Eraser to mark the ImageJob as successful. Looking at the logs, it seems there's a bug in the success-rate math that causes Eraser to think the job is 0% successful when one or two pods fail in an unusual way.
{"level":"info","ts":1755535148.5651102,"logger":"controller","msg":"Marking job as failed","process":"imagejob-controller","success ratio":0.8,"actual ratio":0}
In reality, the job had 272 successful nodes and one node where the pod reached an outOfCpu state. The same thing happened on other clusters on nodes with memory pressure instead of CPU pressure.
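For context, 272 successful nodes out of 273 is a ratio of roughly 0.996, well above the configured 0.8. The snippet below is purely illustrative and is not Eraser's actual implementation; it just shows the check I would expect the imagejob-controller to be making with these numbers:

package main

import "fmt"

func main() {
	// Numbers from this issue: 272 nodes succeeded, 1 pod was rejected
	// with outOfCpu before it could run.
	succeeded, failed := 272.0, 1.0
	total := succeeded + failed

	successRatio := 0.8 // runtimeConfig.manager.imageJob.successRatio
	actualRatio := succeeded / total

	fmt.Printf("actual ratio %.3f >= %.2f: %v\n",
		actualRatio, successRatio, actualRatio >= successRatio)
	// Prints: actual ratio 0.996 >= 0.80: true
	// The controller instead logged "actual ratio": 0.
}

My (unverified) guess is that the rejected pod throws off the counting somewhere rather than the arithmetic itself being wrong, but I haven't traced it through the controller code.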
Expected behavior: the ImageJob is marked as successful (as it's currently >99% successful) and the pods are cleaned up (we have .runtimeConfig.manager.imageJob.cleanup.delayOnSuccess set to 0s).
Actual Behavior
Actual behavior: the ImageJob fails with a 0% success rate and the pods aren't cleaned up (we have .runtimeConfig.manager.imageJob.cleanup.delayOnFailure set to 5h).
Steps To Reproduce
K8s v1.32.6
Eraser helm chart v1.3.1
helm values:
runtimeConfig:
  manager:
    nodeFilter:
      type: exclude
      selectors:
        - eraser.sh/exclude-node # exclude nodes with this label
    scheduling:
      repeatInterval: "6h" # default is 24h
    imageJob:
      successRatio: 0.80 # 80% success ratio for image jobs to be considered 'successful'. Needs to be lower than 100% to account for cpu/memory pressure that causes the job to fail occasionally.
      cleanup:
        delayOnSuccess: "0s" # clean up pods immediately after success
        delayOnFailure: "5h" # keep the pods around for 5 hours after failure to allow for investigation

Then get a node to have enough CPU/memory pressure to cause an ImageJob pod to error with outOfCpu or outOfMemory.
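To check whether you've hit the same state, something like the following should work. The repo URL, release/namespace, and deployment name below are the defaults I believe the Eraser docs and chart use; adjust to your setup:

# install the chart with the values above
helm repo add eraser https://eraser-dev.github.io/eraser/charts
helm upgrade --install eraser eraser/eraser \
  --namespace eraser-system --create-namespace \
  --version 1.3.1 -f values.yaml

# after the next scheduled run, look for the failed ImageJob and the
# worker pod that was rejected by the kubelet
kubectl get imagejobs.eraser.sh
kubectl get pods -n eraser-system

# the manager logs the success/actual ratio when it marks the job
kubectl logs -n eraser-system deploy/eraser-controller-manager | grep "Marking job"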
Are you willing to submit PRs to contribute to this bug fix?
- Yes, I am willing to implement it.