Skip to content

MDEV-38814 Skip root in opposite-intention check#4887

Draft
iMineLink wants to merge 1 commit intoMariaDB:11.8from
iMineLink:MDEV-38814
Draft

MDEV-38814 Skip root in opposite-intention check#4887
iMineLink wants to merge 1 commit intoMariaDB:11.8from
iMineLink:MDEV-38814

Conversation

@iMineLink
Copy link
Copy Markdown
Contributor

@iMineLink iMineLink commented Apr 1, 2026

btr_cur_need_opposite_intention() checks whether a structural change at a page (e.g., a page split inserting a node pointer into the parent, a merge with a sibling removing one, or a boundary change updating one) could cascade into its parent. Handling such a structural change would also require latching sibling pages at that level, which cannot be safely acquired under SX without risking deadlock with concurrent readers; hence the index lock must be upgraded from SX to X. The root has no parent and no siblings, so such cascades are impossible and no sibling latches are needed; return false early, avoiding unnecessary upgrades.

A root page split (btr_root_raise_and_insert()) is a special case that changes the tree height, but it never cascades upward into a parent.

Add a debug assertion in btr_cur_search_to_nth_level() that freshly acquiring the root page is only permitted under X-lock or when no page latch is held (brand-new descent). Under SX with existing page latches, the root must already be latched in the mtr, retained from the original search_leaf() descent.

Add debug-only counters (Innodb_btr_need_opposite_intention_root and Innodb_btr_index_lock_upgrades) to track how often the root early return is taken and how often SX-to-X upgrades occur, exposed via SHOW GLOBAL STATUS in debug builds.

Add a test that inserts and updates 1000 rows in a 4K-page table, verifying that the root early return fires and that index lock upgrades are reduced.

Notes:

  1. The change is based on MDEV-38814 testing findings which highlighted a suspicious number of btr_cur_need_opposite_intention() calls returning true when evaluated at the root level.
  2. The assertion in btr_cur_search_to_nth_level() is added as an extra safety net to validate a pre-existing invariant.
  3. As discussed internally 11.4 might be also a good candidate branch for this change.
  4. For performance testing, 12.3 might be a better base branch.
  5. Currently opened as draft to allow initial discussion and CI coverage.
  6. TODO: In the added innodb.index_lock_upgrade test, why btr_cur_optimistic_update() returns DB_UNDERFLOW even if the record "grows" in the UPDATE? To be checked

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment on lines +1840 to +1845
/* Under SX, the root page must already be latched from the
original search_leaf() descent. Needing to freshly acquire it
here means a cascade escaped the space checks — only permissible
under X where any latch order is allowed (WL#6326 rule 3). */
ut_ad(page_id.page_no() != index->page ||
mtr->memo_contains_flagged(&index->lock, MTR_MEMO_X_LOCK));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get here if mtr->get_already_latched(page_id, rw_latch) failed. This function btr_cur_search_to_nth_level() is only being used for searching to non-leaf levels.

It would seem to me that WL#6326 rule (2.2) is more relevant here. I would suggest to write down the rules somewhere separately, because the original source could disappear at any time.

Why do we only care about holding exclusive index->lock when we are accessing the root page? I think that the answer is that this function is traversing from the root to the leaf, and it suffices to check the condition only on the root page? After all, we must be holding a latch on the parent page(s) when we goto search_loop.

Holding an exclusive root page latch along with a index->lock.u_lock() would certainly allow us to acquire any non-leaf page latch:
(2.2) When acquiring a node pointer page latch at level L, we must hold the left sibling page latch (at level L) or some ancestor latch (at level>L).

I did not think it through if the same would hold for any rw_latch mode that we are using here.

The first part of the assertion could be simplified. I care a little about debug build performance as well:

  ut_ad(height != ULINT_UNDEFINED ||
        mtr->memo_contains_flagged(&index->lock, MTR_MEMO_X_LOCK));

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a partial answer is that the root is special in the sense that, being ancestor to all the other nodes in the tree, it makes rule (2.2) valid for every other non-leaf node if you have it X-latched under SX-lock of the index.
Otherwise, given that there is no sibling or ancestor to the root, you need for sure X-lock of the index to latch the root here as per rule (3), after the search_leaf() initial descent, which shall have retained at least some latches.
The only possibility under SX-lock that the root is latched here without violating (2.2) is if it's a brand new descent and no other latches are held, but I don't think this is possible after search_leaf() is executed (I'll see if assertions can help supporting this claim).
About rw_latch mode, it seems the only S-latch request comes from btr_page_get_father_node_ptr_for_validate() which requires X-lock on the index, therefore it should be currently safe to exclude rw_latch from the assertion.

btr_cur_need_opposite_intention() checks whether a structural change at
a page (e.g., a page split inserting a node pointer into the parent,
a merge with a sibling removing one, or a boundary change updating
one) could cascade into its parent.  Handling such a structural
change would also require latching sibling pages at that level, which
cannot be safely acquired under SX without risking deadlock with
concurrent readers; hence the index lock must be upgraded from SX to X.
The root has no parent and no siblings, so such cascades are impossible
and no sibling latches are needed; return false early, avoiding
unnecessary upgrades.

A root page split (btr_root_raise_and_insert()) is a special case that
changes the tree height, but it never cascades upward into a parent.

Add a debug assertion in btr_cur_search_to_nth_level() that freshly
acquiring the root page is only permitted under X-lock or when no page
latch is held (brand-new descent).  Under SX with existing page latches,
the root must already be latched in the mtr, retained from the original
search_leaf() descent.

Add debug-only counters (Innodb_btr_need_opposite_intention_root and
Innodb_btr_index_lock_upgrades) to track how often the root early return
is taken and how often SX-to-X upgrades occur, exposed via SHOW GLOBAL
STATUS in debug builds.

Add a test that inserts and updates 1000 rows in a 4K-page table,
verifying that the root early return fires and that index lock upgrades
are reduced.
@iMineLink
Copy link
Copy Markdown
Contributor Author

I added || (mtr->get_savepoint() == 1 && !mtr->block_at_savepoint(0)) in the btr_cur_search_to_nth_level() assertion to allow brand-new descents to (X-)latch the root even under index SX-lock. The assertion currently never failed before though (checked via bb cr). ut_d() usage and limiting the parameters of btr_cur_need_opposite_intention() to 6 were also addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants