Fail tests that run for more than 10 minutes #3906

paulmedynski · 2026-01-21T15:03:14Z

Description

Individual tests should never run for more than 10 minutes. We currently have some stubborn tests that occasionally hang forever, but the current xUnit output doesn't tell us which tests they are. This will reveal them, and prevent future hanging tests from wasting pipeline run time.

I'll move any offending tests into the Flaky category as part of this PR.

Testing

Normal PR/CI runs will reveal the problem tests and determine if 10 minutes is a suitable per-test timeout.

Copilot

Pull request overview

This PR enforces a per-test timeout by adding the `dotnet test` --blame-hang options (provided by the test platform's blame data collector rather than by xUnit itself) to all test execution commands. The goal is to identify tests that hang past a 10-minute threshold so they stop wasting pipeline execution time.

Changes:

- Added `--blame-hang`, `--blame-hang-dump-type none`, and `--blame-hang-timeout 10m` arguments to all test execution tasks
- Refactored test command arguments to use multi-line YAML format for better readability
- Added missing `command: build` parameter to DotNetCoreCLI@2 task in configure-sql-server-step.yml
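
As a rough illustration of the first change, the sketch below assembles a `dotnet test` command line with the three blame-hang options named above. The function name and the project path are hypothetical and do not come from this PR; the real pipeline passes these arguments through YAML templates and build.proj, not through Python.

```python
import shlex

# Dump types accepted by --blame-hang-dump-type.
VALID_DUMP_TYPES = ("full", "mini", "none")


def dotnet_test_command(project, timeout="10m", dump_type="none"):
    """Build a `dotnet test` argument list with hang detection enabled.

    `project` is a hypothetical test project path; `timeout` and
    `dump_type` default to the values listed in the review summary.
    """
    if dump_type not in VALID_DUMP_TYPES:
        raise ValueError(f"unsupported dump type: {dump_type!r}")
    return [
        "dotnet", "test", project,
        "--blame-hang",
        "--blame-hang-dump-type", dump_type,
        "--blame-hang-timeout", timeout,
    ]


if __name__ == "__main__":
    # Hypothetical project path, shown only to illustrate the shape of the command.
    print(shlex.join(dotnet_test_command("tests/Example.Tests.csproj")))
```

Passing `dump_type="full"` would match the later revision of this PR, which switched to full dumps for hung tests.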

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
eng/pipelines/common/templates/steps/configure-sql-server-step.yml	Added missing `command: build` parameter to DotNetCoreCLI@2 task (unrelated fix)
eng/pipelines/common/templates/steps/build-and-run-tests-netfx-step.yml	Added blame-hang timeout options to all 4 .NET Framework test tasks and reformatted arguments for readability
eng/pipelines/common/templates/steps/build-and-run-tests-netcore-step.yml	Added blame-hang timeout options to all 4 .NET Core test tasks and reformatted arguments for readability
build.proj	Added blame-hang timeout options to all 6 test targets (Unit/Functional/Manual for both Windows and Unix)

vonzshik · 2026-01-21T15:29:12Z

> We currently have some stubborn tests that occasionally hang forever, but the current xUnit output doesn't tell us which tests they are. This will reveal them, and prevent future hanging tests from wasting pipeline run time.

Just FYI, blame-hang never tells you which test actually hung; the most it can do is make a dump of the whole process. I ran into this while trying to track down a similar issue in npgsql :(

benrr101

lol, I thought it would be pretty easy to implement 🚢

paulmedynski · 2026-01-21T17:38:44Z

@vonzshik - Interesting. This article seems to show the test and stack:

https://www.meziantou.net/generating-a-dump-file-when-tests-hang-on-a-ci-machine.htm

I've enabled full dumps anyway, and confirmed that we are publishing any dumps/sequence files after the test runs so we will be able to view/analyze them.

vonzshik · 2026-01-21T18:20:16Z

@paulmedynski the stacktrace there is useless because it's for the host, not for a specific test.

When I last tried it using --blame-crash --blame-hang-timeout 30s it didn't generate a sequence file, only the dump.

As for debugging the dump, from my understanding that's only supported for full dumps (you do have them enabled), but the last time I tried debugging one, I didn't manage to get very far because no thread was actually stuck (I guess it was a stuck async task instead).

paulmedynski · 2026-01-21T19:05:55Z

https://sqlclientdrivers.visualstudio.com/public/_build/results?buildId=137007&view=logs&j=ddd09f31-44de-51f9-b157-88b752699cec&t=00478fad-bc4c-54dd-ca91-c337ea4d1676

We had a hung test, and the output looks promising:

  The active test run was aborted. Reason: Test host process crashed
  Data collector 'Blame' message: The specified inactivity time of 10 minutes has elapsed. Collecting hang dumps from testhost and its child processes.
  Data collector 'Blame' message: Dumping 7208 - testhost.
  Results File: D:\a\_work\1\s\TestResults\Manual-Windowsnetfx-3_net462_20260121181400.trx
  
  Test Run Aborted.
  Total tests: Unknown
       Passed: 189
      Skipped: 72
   Total time: 17.7223 Minutes
  
  The active Test Run was aborted because the host process exited unexpectedly. Please inspect the call stack above, if available, to get more information about where the exception originated from.
  The test running when the crash occurred: 
  Microsoft.Data.SqlClient.ManualTesting.Tests.TvpTest.TestMain
  
  This test may, or may not be the source of the crash.
  
  Attachments:
    D:\a\_work\1\s\TestResults\10902ef7-e53c-48b9-a664-4d1431ea7819\cloudtest_138ba1eac00000B_2026-01-21.18_13_59.coverage
    D:\a\_work\1\s\TestResults\10902ef7-e53c-48b9-a664-4d1431ea7819\testhost_7208_20260121T181352_hangdump.dmp
    D:\a\_work\1\s\TestResults\10902ef7-e53c-48b9-a664-4d1431ea7819\Sequence_b2339be4ffe8464ab80194aaff4123d0.xml

I'll take a look at the dump and sequence files to see if they match the test that was reported as running, Microsoft.Data.SqlClient.ManualTesting.Tests.TvpTest.TestMain. The ManualTest suite runs all tests serially, so it's likely the culprit, and that same test hung 3 times in a row across several different jobs.
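
For triaging several jobs at once, a small script can pull the reported test name and the hang dump path out of the console output. The sketch below assumes the log format seen in the single run pasted above (the marker sentence followed by one test name per line, and dump attachments ending in `_hangdump.dmp`); other platform versions or operating systems may print something different. The function names are hypothetical, and as the output itself warns, the reported test may or may not be the real culprit.

```python
MARKER = "The test running when the crash occurred:"


def suspect_tests(log_text):
    """Return test names listed after each blame marker line."""
    suspects = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != MARKER:
            continue
        # Names follow the marker, one per line, until the first blank line.
        for following in lines[i + 1:]:
            name = following.strip()
            if not name:
                break
            suspects.append(name)
    return suspects


def hang_dumps(log_text):
    """Return attachment paths that look like hang dumps."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if line.strip().endswith("_hangdump.dmp")
    ]


# Excerpt of the output quoted above, used as sample input.
SAMPLE_LOG = r"""
  The test running when the crash occurred: 
  Microsoft.Data.SqlClient.ManualTesting.Tests.TvpTest.TestMain

  This test may, or may not be the source of the crash.

  Attachments:
    D:\a\_work\1\s\TestResults\10902ef7-e53c-48b9-a664-4d1431ea7819\testhost_7208_20260121T181352_hangdump.dmp
"""

if __name__ == "__main__":
    print(suspect_tests(SAMPLE_LOG))
    print(hang_dumps(SAMPLE_LOG))
```

Counting how often the same name appears across jobs is one way to separate a genuinely hanging test from one that merely happened to be running when the host was killed.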

vonzshik · 2026-01-21T19:14:40Z

I wonder whether it's because it's running on Windows and not Linux...

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

codecov · 2026-01-21T23:38:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.62%. Comparing base (77f79e0) to head (8a200ff).
⚠️ Report is 3 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (77f79e0) and HEAD (8a200ff).

HEAD has 4 uploads less than BASE

Flag	BASE (77f79e0)	HEAD (8a200ff)
netfx	2	1
netcore	2	1
addons	2	0

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3906      +/-   ##
==========================================
- Coverage   76.90%   67.62%   -9.29%     
==========================================
  Files         269      263       -6     
  Lines       43246    66170   +22924     
==========================================
+ Hits        33260    44746   +11486     
- Misses       9986    21424   +11438

Flag	Coverage Δ
addons	`?`
netcore	`67.64% <ø> (-9.30%)`	⬇️
netfx	`66.63% <ø> (-9.74%)`	⬇️

Added test hang protection with a default timeout of 10 minutes per test.

d8399cc

paulmedynski requested a review from a team as a code owner January 21, 2026 15:03

Copilot AI review requested due to automatic review settings January 21, 2026 15:03

paulmedynski added the Area\Tests and Area\Engineering labels Jan 21, 2026

Copilot started reviewing on behalf of paulmedynski January 21, 2026 15:03

paulmedynski changed the title ~~Fail test that run for more than 10 minutes~~ Fail tests that run for more than 10 minutes Jan 21, 2026

Copilot AI reviewed Jan 21, 2026

benrr101 previously approved these changes Jan 21, 2026

- Allowing all test steps to run, even if a previous step has failed.

d21990d

- Generating full dumps for hung tests that are killed.

paulmedynski dismissed benrr101’s stale review via d21990d January 21, 2026 17:29

benrr101 previously approved these changes Jan 21, 2026

- Found a test that hangs and moved it to the flaky category.

8a200ff

- Changed test results publishing to occur regardless of Project vs Package references.

Copilot AI review requested due to automatic review settings January 21, 2026 19:33

paulmedynski dismissed benrr101’s stale review via 8a200ff January 21, 2026 19:33

Copilot started reviewing on behalf of paulmedynski January 21, 2026 19:34

Copilot AI reviewed Jan 21, 2026

paulmedynski commented Jan 21, 2026

benrr101 approved these changes Jan 22, 2026

priyankatiwari08 self-assigned this Jan 22, 2026

apoorvdeshmukh approved these changes Jan 23, 2026

paulmedynski merged commit 0ec7af1 into main Jan 23, 2026
280 checks passed

paulmedynski deleted the dev/paul/test-hang branch January 23, 2026 12:42
