Skip to content

Manually close deleted Comments and Downtimes#913

Open
yhabteab wants to merge 3 commits intomainfrom
fake-downtime-end-event
Open

Manually close deleted Comments and Downtimes#913
yhabteab wants to merge 3 commits intomainfrom
fake-downtime-end-event

Conversation

@yhabteab
Copy link
Member

@yhabteab yhabteab commented Mar 28, 2025

tests: Simulate vanishing Comments and Downtimes

Ensure that removed Comment and Downtime objects attached to Hosts,
vanishing after a config pack redeployment, are correctly closed by
Icinga DB.


icingadb: Manually close deleted Comments and Downtimes

It is possible for objects to vanish in Icinga 2, e.g., when a new
config pack is deployed as the Director does. Icinga DB already handles
this within its synchronization logic, explicitly deleting removed
objects.

However, these deleted objects might have open logical dependencies,
such as entries in the history tables. For example, only deleting
a Downtime object is not enough, as both its downtime_history and
sla_history_downtime entries needs to be updated as well.

In this example, these updates are required to not show the Downtimes
anymore, have a valid SLA calculation, and ensure a working downtime
history retention.

As Icinga DB does not receive the change from Icinga 2 over the usual
"icinga:history:stream:downtime" Redis stream, it must be reproduced or
mocked by Icinga DB itself. Now, Icinga DB is able to close the history
for both vanished Comment and Downtime objects itself.

Co-Authored-By: Yonas Habteab yonas.habteab@icinga.com


tests: Bump icinga-testing for Config Pack support

Icinga/icinga-testing#34


Fixes #910.

@lippserd
Copy link
Member

@oxzi Would you like to do the first round of reviews here?

Copy link
Member

@oxzi oxzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general and when excluding the open TODOs: LGTM, thanks!

I have spent a bit of a time testing this PR in different situations and it worked as expected. Btw, I have pushed a rebased version to the fake-downtime-end-event-review-oxzi branch, also including an initial draft for an integration test, 0414c2a.

I would say: Let's continue with this PR by also populating the other tables.

@yhabteab: If you currently do not have the time and want to hand this PR over, I am free to continue here.

Thanks again, looks good so far!

@oxzi oxzi added this to the 1.5.2 milestone Jan 15, 2026
@yhabteab
Copy link
Member Author

@yhabteab: If you currently do not have the time and want to hand this PR over, I am free to continue here.

I'm working on a completely different thing right now, so you can of course take over this :).

@oxzi oxzi self-assigned this Jan 15, 2026
@oxzi oxzi force-pushed the fake-downtime-end-event branch 3 times, most recently from b37318e to 0fd36fe Compare January 22, 2026 09:08
@oxzi oxzi changed the title Downtimes: mark their histories as cancelled when removed from conf file Manually close deleted Comments and Downtimes Jan 22, 2026
@oxzi oxzi marked this pull request as ready for review January 22, 2026 09:10
@oxzi
Copy link
Member

oxzi commented Jan 22, 2026

@lippserd: If you want to, feel free to start reviewing this PR now.

@oxzi oxzi marked this pull request as draft January 23, 2026 09:54
@oxzi oxzi force-pushed the fake-downtime-end-event branch from 0fd36fe to 2ca94db Compare January 23, 2026 10:26
@oxzi
Copy link
Member

oxzi commented Jan 23, 2026

Added a check against deleted objects which aren't even in the database. This happened during multiple bigger redeployments in my test setup.

@oxzi oxzi marked this pull request as ready for review January 23, 2026 10:27
@oxzi oxzi force-pushed the fake-downtime-end-event branch from 2ca94db to 7e9be66 Compare January 27, 2026 08:48
Copy link
Member

@Al2Klimov Al2Klimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. If I add and remove a comment/downtime, I get two additional events in the history tab.

Except if I stop Redis between addition and removal and restart Icinga 2 afterwards to clear eventually buffered history events. Or if I rm /var/lib/icinga2/api/packages/_api/4b444b3c-7063-4318-93f2-8b41cb6ba1ec/conf.d/*/*.conf and restart Icinga. Or if I simply use curl -fvku iw2:iw2 -X DELETE -H 'Accept: application/json' https://127.0.0.1:5665/v1/objects/comments. Then I only get the addition events, even with snapshot packages.

👎The runtime behavior doesn't even change with this PR. See curl command above.

👍In my other scenarios I indeed get removal events for both types.

@oxzi
Copy link
Member

oxzi commented Jan 27, 2026

Or if I simply use curl -fvku iw2:iw2 -X DELETE -H 'Accept: application/json' https://127.0.0.1:5665/v1/objects/comments. Then I only get the addition events, even with snapshot packages.

This is actually a loophole in Icinga 2. When directly deleting comment objects and not using the remove-comment API action, the object will be deleted, but Icinga 2 does not generate history.

https://github.com/Icinga/icinga2/blob/9bffe0616911d8ab0a8c589a6fc17eee6564bdf8/lib/icingadb/icingadb-objects.cpp#L2354-L2358

However, the change is being propagated as a runtime update, where Icinga DB acts upon.

icinga2   | [2026-01-27 16:03:40 +0000] information/HttpServerConnection: Request DELETE /v1/objects/comments (from [::ffff:172.18.0.1]:41148, user: root, agent: curl/8.17.0, status: OK) took total 0ms.
[...]
icingadb  | 2026-01-27T16:03:58.150Z    INFO    runtime-updates Deleted 2 Comment items
icingadb  | 2026-01-27T16:03:58.150Z    DEBUG   database        Executed "DELETE FROM \"comment\" WHERE id IN (?)" with 2 rows

While this is definitely something to look into, this is not really what this PR tries to address. The core are disappearing or vanishing objects, such as after a Director deployment.

@oxzi oxzi force-pushed the fake-downtime-end-event branch from 7e9be66 to 66d105e Compare January 27, 2026 16:16
@Al2Klimov Al2Klimov dismissed their stale review January 27, 2026 16:20

this is not really what this PR tries to address. The core are disappearing or vanishing objects, such as after a Director deployment.

oxzi added a commit that referenced this pull request Jan 28, 2026
The history retention is currently based on the final or closing
timestamp, if such a timestamp exists. This change switches to the
initial timestamp, which always exists.

As reported in the Community Forum[^0] and in an ongoing PR[^1], there
might be (historical) history entries without a closing timestamp.
Thus, these entries will never be rotated. Furthermore, based on the
documentation, it might be expected that the initial timestamp is used
for retention, not a later one.

This change does not cover the SLA retention, to ensure no misleading
SLA reports are being generated when a dangerously short SLA retention
period was defined.

[^0]: https://community.icinga.com/t/comment-history-and-retention-on-a-busy-system/15318/3
[^1]: #913
oxzi added a commit that referenced this pull request Jan 28, 2026
The history retention is currently based on the final or closing
timestamp, if such a timestamp exists. This change switches to the
initial timestamp, which always exists.

As reported in the Community Forum[^0] and in an ongoing PR[^1], there
might be (historical) history entries without a closing timestamp.
Thus, these entries will never be rotated. Furthermore, based on the
documentation, it might be expected that the initial timestamp is used
for retention, not a later one.

This change does not cover the SLA retention, to ensure no misleading
SLA reports are being generated when a dangerously short SLA retention
period was defined.

[^0]: https://community.icinga.com/t/comment-history-and-retention-on-a-busy-system/15318/3
[^1]: #913
oxzi added a commit that referenced this pull request Feb 10, 2026
The history retention is currently based on the final or closing
timestamp, if such a timestamp exists. This change also takes the
initial timestamp into account as a fallback.

As reported in the Community Forum[^0] and in an ongoing PR[^1], there
might be (historical) history entries without a closing timestamp.
Thus, these entries will never be rotated. Furthermore, based on the
documentation, it might be expected that the initial timestamp is used
for retention, not a later one.

This change does not cover the SLA retention, to ensure no misleading
SLA reports are being generated when a dangerously short SLA retention
period was defined.

[^0]: https://community.icinga.com/t/comment-history-and-retention-on-a-busy-system/15318/3
[^1]: #913
@oxzi oxzi force-pushed the fake-downtime-end-event branch from 66d105e to 9ef9a27 Compare February 23, 2026 14:02
oxzi and others added 3 commits March 23, 2026 10:13
It is possible for objects to vanish in Icinga 2, e.g., when a new
config pack is deployed as the Director does. Icinga DB already handles
this within its synchronization logic, explicitly deleting removed
objects.

However, these deleted objects might have open logical dependencies,
such as entries in the history tables. For example, only deleting
a Downtime object is not enough, as both its downtime_history and
sla_history_downtime entries needs to be updated as well.

In this example, these updates are required to not show the Downtimes
anymore, have a valid SLA calculation, and ensure a working downtime
history retention.

As Icinga DB does not receive the change from Icinga 2 over the usual
"icinga:history:stream:downtime" Redis stream, it must be reproduced or
mocked by Icinga DB itself. Now, Icinga DB is able to close the history
for both vanished Comment and Downtime objects itself.

Co-Authored-By: Yonas Habteab <yonas.habteab@icinga.com>
Ensure that removed Comment and Downtime objects attached to Hosts,
vanishing after a config pack redeployment, are correctly closed by
Icinga DB.
@oxzi oxzi force-pushed the fake-downtime-end-event branch from 9ef9a27 to 0808b72 Compare March 23, 2026 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Downtime on a removed object are never closed.

4 participants