@paulmedynski
Contributor

@paulmedynski paulmedynski commented Jan 21, 2026

Description

Individual tests should never run for more than 10 minutes. We currently have some stubborn tests that occasionally hang forever, but the current xUnit output doesn't tell us which tests they are. This will reveal them, and prevent future hanging tests from wasting pipeline run time.

I'll move any offending tests into the Flaky category as part of this PR.

Testing

Normal PR/CI runs will reveal the problem tests and determine if 10 minutes is a suitable per-test timeout.

@paulmedynski paulmedynski requested a review from a team as a code owner January 21, 2026 15:03
Copilot AI review requested due to automatic review settings January 21, 2026 15:03
@paulmedynski paulmedynski added the labels Area\Tests (Issues that are targeted to tests or test projects) and Area\Engineering (Use this for issues that are targeted for changes in the 'eng' folder or build systems.) Jan 21, 2026
@paulmedynski paulmedynski changed the title from "Fail test that run for more than 10 minutes" to "Fail tests that run for more than 10 minutes" Jan 21, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR implements per-test timeout enforcement by adding the --blame-hang options of dotnet test (a VSTest feature, not an xUnit one) to all test execution commands. The goal is to identify tests that hang for longer than a 10-minute threshold and to stop them from wasting pipeline execution time.

Changes:

  • Added --blame-hang, --blame-hang-dump-type none, and --blame-hang-timeout 10m arguments to all test execution tasks
  • Refactored test command arguments to use multi-line YAML format for better readability
  • Added missing command: build parameter to DotNetCoreCLI@2 task in configure-sql-server-step.yml
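The flags listed above can be sketched as a small shell helper. This is only an illustration, not the PR's actual YAML or build.proj change: the function name `blame_hang_args`, its defaults, and the project path in the comment are assumptions made for the example. The three flags themselves are the `dotnet test` options named in the review.

```shell
#!/usr/bin/env bash
# Sketch: assemble the hang-detection flags for `dotnet test`.
# The function name and its defaults are hypothetical, chosen for illustration.
blame_hang_args() {
  local timeout="${1:-10m}"     # per-test timeout before the test host is killed
  local dump_type="${2:-none}"  # one of: none, mini, full
  echo "--blame-hang --blame-hang-dump-type ${dump_type} --blame-hang-timeout ${timeout}"
}

# Example invocation (the project path is a placeholder):
#   dotnet test path/to/Tests.csproj $(blame_hang_args 10m none)
```

Passing `full` as the second argument would correspond to the full-dump setting discussed later in this conversation.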

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
eng/pipelines/common/templates/steps/configure-sql-server-step.yml Added missing command: build parameter to DotNetCoreCLI@2 task (unrelated fix)
eng/pipelines/common/templates/steps/build-and-run-tests-netfx-step.yml Added blame-hang timeout options to all 4 .NET Framework test tasks and reformatted arguments for readability
eng/pipelines/common/templates/steps/build-and-run-tests-netcore-step.yml Added blame-hang timeout options to all 4 .NET Core test tasks and reformatted arguments for readability
build.proj Added blame-hang timeout options to all 6 test targets (Unit/Functional/Manual for both Windows and Unix)

@vonzshik

> We currently have some stubborn tests that occasionally hang forever, but the current xUnit output doesn't tell us which tests they are. This will reveal them, and prevent future hanging tests from wasting pipeline run time.

Just FYI, blame-hang never tells you which test actually hung; the most it can do is make a dump of the whole process. I ran into this while trying to track down a similar issue in npgsql :(

benrr101 previously approved these changes Jan 21, 2026
Contributor

@benrr101 benrr101 left a comment


lol, I thought it would be pretty easy to implement 🚢

- Generating full dumps for hung tests that are killed.
benrr101 previously approved these changes Jan 21, 2026
@paulmedynski
Contributor Author

@vonzshik - Interesting. This article seems to show the test and stack:

https://www.meziantou.net/generating-a-dump-file-when-tests-hang-on-a-ci-machine.htm

I've enabled full dumps anyway, and confirmed that we are publishing any dumps/sequence files after the test runs so we will be able to view/analyze them.

@vonzshik

@paulmedynski the stacktrace there is useless because it's for the host, not for a specific test.

When I last tried it using --blame-crash --blame-hang-timeout 30s it didn't generate a sequence file, only the dump.


As for debugging the dump, my understanding is that this is only supported for full dumps (which you do have enabled). The last time I tried it, I didn't manage to get very far because no thread was actually stuck (I guess it was a stuck async task instead).

@paulmedynski
Contributor Author

paulmedynski commented Jan 21, 2026

https://sqlclientdrivers.visualstudio.com/public/_build/results?buildId=137007&view=logs&j=ddd09f31-44de-51f9-b157-88b752699cec&t=00478fad-bc4c-54dd-ca91-c337ea4d1676

We had a hung test, and the output looks promising:

  The active test run was aborted. Reason: Test host process crashed
  Data collector 'Blame' message: The specified inactivity time of 10 minutes has elapsed. Collecting hang dumps from testhost and its child processes.
  Data collector 'Blame' message: Dumping 7208 - testhost.
  Results File: D:\a\_work\1\s\TestResults\Manual-Windowsnetfx-3_net462_20260121181400.trx
  
  Test Run Aborted.
  Total tests: Unknown
       Passed: 189
      Skipped: 72
   Total time: 17.7223 Minutes
  
  The active Test Run was aborted because the host process exited unexpectedly. Please inspect the call stack above, if available, to get more information about where the exception originated from.
  The test running when the crash occurred: 
  Microsoft.Data.SqlClient.ManualTesting.Tests.TvpTest.TestMain
  
  This test may, or may not be the source of the crash.
  
  Attachments:
    D:\a\_work\1\s\TestResults\10902ef7-e53c-48b9-a664-4d1431ea7819\cloudtest_138ba1eac00000B_2026-01-21.18_13_59.coverage
    D:\a\_work\1\s\TestResults\10902ef7-e53c-48b9-a664-4d1431ea7819\testhost_7208_20260121T181352_hangdump.dmp
    D:\a\_work\1\s\TestResults\10902ef7-e53c-48b9-a664-4d1431ea7819\Sequence_b2339be4ffe8464ab80194aaff4123d0.xml

I'll take a look at the dump and sequence files to see if they match the test that was reported as running, Microsoft.Data.SqlClient.ManualTesting.Tests.TvpTest.TestMain. The ManualTest suite runs all tests serially, so it's likely the culprit, and that same test hung 3 times in a row across several different jobs.
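As a rough aid for comparing several jobs, the test name could be pulled out of a saved console log with a few lines of shell. This is a minimal sketch that assumes the log has the same shape as the output quoted above (a "The test running when the crash occurred:" line followed by one or more test names and then a blank line); the function name `hung_tests` and the log file name in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the test name(s) listed after the
# "The test running when the crash occurred:" line of a test log.
# Reads the log from stdin and stops at the first blank line after the marker.
hung_tests() {
  awk '
    { sub(/\r$/, "") }    # tolerate CRLF line endings from Windows agents
    /The test running when the crash occurred:/ { capture = 1; next }
    capture && NF == 0 { exit }
    capture { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, ""); print }
  '
}
```

For example, `hung_tests < job-log.txt` would print the reported test name for that job, which makes it easier to check whether the same test shows up across runs. As the log itself notes, the reported test may or may not be the source of the hang.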

@vonzshik

I wonder whether it's because it's running on Windows and not Linux...

- Changed test results publishing to occur regardless of Project vs Package references.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

@codecov

codecov bot commented Jan 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.62%. Comparing base (77f79e0) to head (8a200ff).
⚠️ Report is 3 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (77f79e0) and HEAD (8a200ff). Click for more details.

HEAD has 4 uploads less than BASE
Flag BASE (77f79e0) HEAD (8a200ff)
netfx 2 1
netcore 2 1
addons 2 0
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3906      +/-   ##
==========================================
- Coverage   76.90%   67.62%   -9.29%     
==========================================
  Files         269      263       -6     
  Lines       43246    66170   +22924     
==========================================
+ Hits        33260    44746   +11486     
- Misses       9986    21424   +11438     
Flag Coverage Δ
addons ?
netcore 67.64% <ø> (-9.30%) ⬇️
netfx 66.63% <ø> (-9.74%) ⬇️


@priyankatiwari08 priyankatiwari08 self-assigned this Jan 22, 2026
@paulmedynski paulmedynski merged commit 0ec7af1 into main Jan 23, 2026
280 checks passed
@paulmedynski paulmedynski deleted the dev/paul/test-hang branch January 23, 2026 12:42

6 participants