Skip to content

MDEV-39013: Fix flaky main.tmp_space_usage test#4765

Open
varundeepsaini wants to merge 1 commit intoMariaDB:11.8from
varundeepsaini:MDEV-39013-fix-flaky-tmp-space-usage
Open

MDEV-39013: Fix flaky main.tmp_space_usage test#4765
varundeepsaini wants to merge 1 commit intoMariaDB:11.8from
varundeepsaini:MDEV-39013-fix-flaky-tmp-space-usage

Conversation

@varundeepsaini
Copy link
Copy Markdown

@varundeepsaini varundeepsaini commented Mar 9, 2026

Summary

  • Reading information_schema.session_status with tmp_memory_table_size=0 forces the query's internal temp table to disk, which changes tmp_space_used as a side effect of reading it. By the time connection c1 reads the value from processlist, it sees the inflated value (observed diff: exactly 16384 bytes = one InnoDB page).
  • Fix by temporarily setting tmp_memory_table_size high before the session_status read — same pattern already used earlier in the test (lines 72-77) for the same reason.

Test plan

  • main.tmp_space_usage passes locally (500/500 runs)

@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Mar 10, 2026
Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

Mostly LGTM, but please rebase on 11.8

@varundeepsaini varundeepsaini force-pushed the MDEV-39013-fix-flaky-tmp-space-usage branch from 76f7e7f to feca392 Compare March 10, 2026 13:52
@varundeepsaini varundeepsaini changed the base branch from 11.5 to 11.8 March 10, 2026 13:53
@varundeepsaini varundeepsaini requested a review from gkodinov March 12, 2026 19:48
Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please stand by for the final review.

@gkodinov gkodinov requested a review from svoj March 13, 2026 09:03
@svoj
Copy link
Copy Markdown
Contributor

svoj commented Mar 21, 2026

@varundeepsaini have you been able to reproduce this locally? It appears that 11.8 may be unaffected, buildbot is unaware of such failures in 11.8. Do you have an explanation why only MSAN builds are affacted?

  • Reading information_schema.session_status with tmp_memory_table_size=0 forces the query's internal temp table to disk, which changes tmp_space_used as a side effect of reading it. By the time connection c1 reads the value from processlist, it sees the inflated value (observed diff: exactly 16384 bytes = one InnoDB page).

It must've get deflated at the end of the statement, before switch to connection c1. Did you actually see this, or is it a guess based conclusion?

InnoDB must not be relevant here. FWICS it never updates tmp_space_used.

  • Fix by temporarily setting tmp_memory_table_size high before the session_status read — same pattern already used earlier in the test (lines 72-77) for the same reason.

I doubt the same pattern can be applied here: compared to max_tmp_space_used, tmp_space_used must get deflated at the end of the statement.

So far I was unable to reproduce it locally, can't tell what's going wrong yet.

@varundeepsaini
Copy link
Copy Markdown
Author

varundeepsaini commented Mar 22, 2026

@svoj No, I haven't been able to reproduce it locally either.

My original explanation was wrong. I looked at the buildbot cross-reference (https://buildbot.mariadb.org/cr/?branch=&revision=&platform=&dt=&bbnum=&typ=&info=&test_name=main.tmp_space_usage&test_variant=&info_text=&failure_text=&limit=50) and found 3 failures, all on amd64-msan-clang-20:

Build 9332 is the interesting one. MSAN says the read at sql/mysqld.cc:3830 uses uninitialized memory:

if (thd->status_var.tmp_space_used + size_change >
    thd->variables.max_tmp_space_usage && !no_error &&
    thd->variables.max_tmp_space_usage)

The MSAN origin trace points to my_malloc (mysys/my_malloc.c:117) called from CONNECT::create_thd() (sql/sql_connect.cc:1626). The crash happens during a binlog write — _my_b_cache_writetmp_file_tracktemp_file_size_cb_func.

tmp_space_used is set to 0 in the THD constructor (sql_class.cc:802), but MSAN still flags something in the create_thd() path as uninitialized. The wrong values in builds 9347/9342 could be caused by the same thing — if part of status_var has uninitialized bytes, the math on tmp_space_used can give wrong results.

My fix avoids the symptom but doesn't fix this root cause. This probably needs someone with an MSAN build to look at the create_thd() init path.

Can you reproduce with MSAN locally?

@svoj
Copy link
Copy Markdown
Contributor

svoj commented Mar 22, 2026

@varundeepsaini no, I haven't been able to reproduce it locally either. I use MSAN docker image as described in https://mariadb.com/docs/server/server-management/install-and-upgrade-mariadb/compiling-mariadb-from-source/legacy-guides/compile-and-using-mariadb-with-sanitizers-asan-ubsan-tsan-msan.

The mysqld.cc:3830 line of c5f6fd3 you're referring to is different:

!no_error && !thd->in_binlog_commit)

@svoj
Copy link
Copy Markdown
Contributor

svoj commented Mar 22, 2026

@varundeepsaini I was able to reproduce it locally and posted my analysis to MDEV-39013.

You were right that these 16KB come from I_S internal temporary table. However they're reset at the end of the statement. With MDEV-38019 client is released earlier, before tmp_space_used is updated. And thus concurrent connection may see old value. Failing with MSAN just because it is much slower than normal builds.

Your approach is viable and we may use it, but there're other ways to get it fixed. We need to devise some general way to handle issues cause by MDEV-38019. This is to be discussed with MariaDB developers team.

If you're willing to keep this pull request as an option, commit message has to be updated such that it gives proper explanation of the problem. Optionally retarget for 12.3 (we can do it ourselves).

@gkodinov
Copy link
Copy Markdown
Member

@svoj I wonder if wait_until_count_sessions.inc would help in this case. When the control is returned to the client it looks like the session is still counted in Threads_connected.

@svoj
Copy link
Copy Markdown
Contributor

svoj commented Mar 26, 2026

This pull request is now waiting for feedback to #4684 (comment). Specifically for mtr framework that'd allow to fix such failures. wait_until_count_sessions.inc is not relevant, we need to wait for session to finalise cleanup after preceding statement, not for session to go away.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 30, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While waiting for the final review, please address the licensing check.

@varundeepsaini varundeepsaini force-pushed the MDEV-39013-fix-flaky-tmp-space-usage branch 2 times, most recently from 3598859 to 99cce15 Compare April 1, 2026 11:50
Reading information_schema.session_status with tmp_memory_table_size=0
forces the query's internal temp table to disk, which changes
tmp_space_used as a side effect of reading it. By the time connection
c1 reads the value from processlist, it sees the inflated value.

Fix by temporarily setting tmp_memory_table_size high before the
session_status read, same pattern already used earlier in the test
(lines 72-77) for the same reason.

Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
Signed-off-by: Varun Deep Saini <deepsainivarun@gmail.com>
@varundeepsaini varundeepsaini force-pushed the MDEV-39013-fix-flaky-tmp-space-usage branch from 99cce15 to 7df6aca Compare April 1, 2026 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

4 participants