Skip to content

Conversation

@BuonOmo
Copy link
Collaborator

@BuonOmo BuonOmo commented Oct 9, 2025

Problem

test_create_with_query_cache fails quite often, but it is a flaky test, and fast local reproduction is difficult.

Exploration

I started a bunch of parallel tests (256 is the maximum allowed by Github Actions), and checked if one was failing fast. Then isolated the related tests, and restarted. Once with a limited amount of tests, I could check in a more linear fashion which tests were leading to the bad conditions. And here a constant reproduction (Rails 8.0.3, CRDB 25.2):

TESTOPTS="--seed=1 --name='/test_truncate_with_query_cache|test_create_with_query_cache/'" \
COCKROACH_SKIP_LOAD_SCHEMA=1 bundle exec rake test

Reproduction time is 5s and we isolated two test. Now we should be able to pinpoint the issue.

Bug reproduction

# frozen_string_literal: true

require "activerecord-cockroachdb-adapter"
require "minitest/autorun"
require "logger"

FIXTURES_ROOT = Bundler.load.specs["activerecord"].first.full_gem_path + "/test/fixtures"

ActiveRecord::Base.establish_connection("cockroachdb://root@localhost:26257/defaultdb")

# ActiveRecord::Base.logger = Logger.new(STDOUT)

def title(str)
  puts
  puts str
  puts "=" * (str.length)
  puts
end

title "ActiveRecord Version: #{ActiveRecord::VERSION::STRING}"

title ActiveRecord::Base.connection.select_one("select version()")["version"]

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
    t.references :author
    t.string :title, null: false
    t.text    :body, null: false
    t.string  :type
    t.integer :legacy_comments_count, default: 0
    t.integer :taggings_with_delete_all_count, default: 0
    t.integer :taggings_with_destroy_count, default: 0
    t.integer :tags_count, default: 0
    t.integer :indestructible_tags_count, default: 0
    t.integer :tags_with_destroy_count, default: 0
    t.integer :tags_with_nullify_count, default: 0
  end
end

class Post < ActiveRecord::Base
end

class BugTest < ActiveSupport::TestCase
  def test_query_cache_should_not_hit
    @connection = ActiveRecord::Base.connection
    reset_fixtures("posts")

    begin # test_truncate_with_query_cache
      @connection.enable_query_cache!

      assert_operator Post.count, :>, 0

      @connection.truncate("posts")
      assert_equal 0, Post.count
    ensure
      reset_fixtures("posts")
      @connection.disable_query_cache!
    end

    begin # test_create_with_query_cache
      @connection.enable_query_cache!

      count = Post.count

      @connection.create("INSERT INTO posts(title, body) VALUES ('', '')")

      assert_equal count + 1, Post.count
    ensure
      reset_fixtures("posts")
      @connection.disable_query_cache!
    end
  end

  private def reset_fixtures(fixture_name)
    ActiveRecord::FixtureSet.reset_cache

    ActiveRecord::FixtureSet.create_fixtures(FIXTURES_ROOT, fixture_name)
  end
end

If we use this reproduction with git bisect, we can see that the first bug come from commit 63e8e86

Conclusion

Based on this search and some debugging, I saw that the root cause came from rails/rails#52428. The fix is trivial though.

I took the opportunity to extract my way of isolating the test through a new 'flaky.yml' workflow. That will allow us to be more efficient on flaky tests. And there are still some...

@BuonOmo BuonOmo changed the title temp: isolate bugged test fix(test): test_create_with_query_cache Oct 9, 2025
@BuonOmo BuonOmo force-pushed the fix/test_create_with_query_cache branch 11 times, most recently from da689a1 to 9818a31 Compare October 13, 2025 14:04
Add a manually triggered workflow that runs a lot of tests
in parallel with the `--fail-fast` option set. There are
two goals here:
- Detect new flaky tests
- Get test seeds for fast failing flaky tests for an easier
  local reproduction

Signed-off-by: Ulysse Buonomo <[email protected]>
Since [rails pull request #52428][1], `#execute_batch` does not trigger
a cache clear anymore. However, `#insert_fixtures_set` relies on that
clear to ensure consistency. In the postgresql adapter, this is ensured
by a call to `#execute` rather than `#execute_batch` in
`#disable_referential_integrity`. Since we are not always calling
`#disable_referential_integrity`, we need to ensure that the cache is
cleared when running our statements by calling `#execute` instead of
`#execute_batch`.

[1]: rails/rails#52428

Signed-off-by: Ulysse Buonomo <[email protected]>
@BuonOmo BuonOmo force-pushed the fix/test_create_with_query_cache branch from 9818a31 to 341c20f Compare October 13, 2025 14:13
@BuonOmo BuonOmo marked this pull request as ready for review October 13, 2025 14:18
@BuonOmo
Copy link
Collaborator Author

BuonOmo commented Oct 13, 2025

@rafiss I would rebase and merge this if approved, the two commits are separated to make more sense :)

@BuonOmo BuonOmo requested a review from rafiss October 13, 2025 14:55
Copy link
Contributor

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a great investigation, thank you for tracking this down! the new testing workflow should be very useful.

@rafiss rafiss merged commit e27240e into master Oct 14, 2025
5 checks passed
@BuonOmo BuonOmo deleted the fix/test_create_with_query_cache branch October 14, 2025 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants