Gracefully handle critical bugs #549
Merged
Currently, if a critical bug is reached within the `process()` function, the whole operation panics, and it will continue to panic on each subsequent invocation, which halts all transaction scheduling. This is a problem because, even though a critical issue was detected, we also halt execution of transactions that are not affected by it. Instead, we can emit a critical issue event, which can be inspected and fixed while still preserving transaction scheduling functionality.
We need to set up alerts for when such events are emitted.
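To illustrate the idea, here is a minimal sketch in Python. It is not the code from this PR: the `Scheduler` class, the `CriticalIssue` event, and the `execute` callback are hypothetical names, and a raised exception stands in for a detected critical bug. It assumes the offending transaction is dropped from the queue once its event is recorded, so later invocations are not blocked by it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CriticalIssue:
    """Hypothetical event payload; field names are illustrative."""
    tx_id: int
    message: str


@dataclass
class Scheduler:
    pending: List[int] = field(default_factory=list)
    executed: List[int] = field(default_factory=list)
    events: List[CriticalIssue] = field(default_factory=list)

    def process(self, execute: Callable[[int], None]) -> None:
        """Run every pending transaction, recording failures as events."""
        remaining, self.pending = self.pending, []
        for tx_id in remaining:
            try:
                execute(tx_id)
            except Exception as exc:  # stands in for a detected critical bug
                # Emit an event instead of aborting the whole batch.
                self.events.append(CriticalIssue(tx_id, str(exc)))
                continue
            self.executed.append(tx_id)


def execute(tx_id: int) -> None:
    # Assumption for the demo: transaction 2 hits the simulated critical bug.
    if tx_id == 2:
        raise RuntimeError("invariant violated")


scheduler = Scheduler(pending=[1, 2, 3])
scheduler.process(execute)
```

In this sketch, transactions 1 and 3 still run even though transaction 2 fails, and the recorded event is what an alert would watch for.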
Furthermore, this improves the handling of issues that arise during scheduling. A panic over a critical issue during a `schedule()` invocation only prevents scheduling that single transaction, which is not as bad as the case above, but the panic is still only seen by the user and is likely to be ignored, even though the system has a critical bug. Emitting an event instead would allow us to monitor for such critical issues and fix them.