ec2 ocf resource retry #33

@nasjomach

Concerns: cluster-glue/lib/plugins/stonith/external/ec2

It seems to me that there is no retry mechanism in the EC2 OCF script.
AWS EC2 API calls can be throttled if more than 10000 API requests per second are made.
In that case the script does not report any instance status and treats the resource as being in a bad state, which ends with the STONITH device getting stopped.

Performing a "resource cleanup" operation brings the STONITH device back into an operational state after such failures.

/var/log/messages
2021-09-16T16:02:04.751248+00:00 external/ec2(res_AWS_STONITH)[31700]: info: status check for is
<-- Missing instance status report after "is" keyword

2021-09-16T16:02:04.760725+00:00 external/ec2(res_AWS_STONITH)[31694]: WARN: Already fenced (Instance status = ). Aborting fence attempt.
2021-09-16T16:02:13.742017+00:00 external/ec2(res_AWS_STONITH)[32004]: ERROR: Operation status failed: 1
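
The log above suggests that an empty instance status is handled like a real "not running" state. A minimal sketch of a guard against that, assuming the status string is one of the EC2 instance state names; the helper name `is_known_instance_state` is hypothetical and not part of the existing script:

```shell
#!/bin/bash
# Hypothetical helper, not part of the existing external/ec2 script.
# Returns 0 only if the argument is one of the documented EC2 instance
# state names; an empty or unexpected string returns 1.
is_known_instance_state() {
    case "$1" in
        pending|running|shutting-down|terminated|stopping|stopped)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example: an empty status (as in the log above) is reported as an error
# instead of being interpreted as "already fenced".
status=""
if ! is_known_instance_state "$status"; then
    echo "ERROR: could not determine instance status (got '$status')" >&2
fi
```

With a check like this the script could tell "the API call returned nothing" apart from "the instance is stopped" before deciding to abort a fence attempt.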

Some kind of fault tolerance, for example retrying a throttled API call, would be nice to have.
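
As a rough illustration of the kind of fault tolerance meant here, the following is a sketch of a retry wrapper with exponential backoff. It is not taken from the existing script: the function name `retry_with_backoff` and the variables `MAX_ATTEMPTS` and `INITIAL_DELAY` are hypothetical, and it assumes that a throttled call shows up as a non-zero exit code or as empty output.

```shell
#!/bin/bash
# Hypothetical helper, not part of the existing external/ec2 script.
# Runs the given command up to MAX_ATTEMPTS times (default 5), doubling the
# sleep between attempts starting at INITIAL_DELAY seconds (default 1).
# An attempt counts as successful only if the command exits 0 AND prints
# non-empty output; that output is then printed on stdout.
retry_with_backoff() {
    local max_attempts="${MAX_ATTEMPTS:-5}"
    local delay="${INITIAL_DELAY:-1}"
    local attempt=1
    local output

    while :; do
        if output="$("$@")" && [ -n "$output" ]; then
            printf '%s\n' "$output"
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1   # give up: all attempts failed or returned nothing
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Possible usage (instance_id is a placeholder variable):
# status="$(retry_with_backoff aws ec2 describe-instances \
#     --instance-ids "$instance_id" \
#     --query 'Reservations[*].Instances[*].State.Name' --output text)"
```

The status check would then fail only after several consecutive empty or failed responses, rather than on the first throttled call.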
