Skip to content

Fix deadlock in LSPClient#2933

Merged
gabritto merged 1 commit intomainfrom
gabritto/lspclient_deadlock
Mar 2, 2026
Merged

Fix deadlock in LSPClient#2933
gabritto merged 1 commit intomainfrom
gabritto/lspclient_deadlock

Conversation

@gabritto
Copy link
Member

@gabritto gabritto commented Feb 27, 2026

From these logs, a formatting test was deadlocking:

Trace
goroutine 477303 [running]:
testing.(*M).startAlarm.func1()
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/testing/testing.go:2802 +0x605
created by time.goFunc
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/time/sleep.go:215 +0x45

goroutine 1 [chan receive, 44 minutes]:
testing.tRunner.func1()
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/testing/testing.go:1993 +0x9ad
testing.tRunner(0xc0003d6008, 0xc000043a68)
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/testing/testing.go:2042 +0x256
testing.runTests({0x253ead3, 0x22}, {0x257ba5f, 0x3f}, 0xc0001be0a8, {0x4737b20, 0xcd3, 0xcd3}, {0xc260454d644e9cc3, 0x274a9f4a64b, ...})
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/testing/testing.go:2583 +0x9ea
testing.(*M).Run(0xc0003ab540)
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/testing/testing.go:2443 +0xf4c
github.com/microsoft/typescript-go/internal/fourslash/tests/gen_test.TestMain(0xc0003ab540)
	/mnt/vss/_work/typescript-go/typescript-go/internal/fourslash/tests/gen/testmain_test.go:11 +0x3e
main.main()
	_testmain.go:6612 +0x172

goroutine 1315 [sync.WaitGroup.Wait, 44 minutes]:
sync.runtime_SemacquireWaitGroup(0xc004b32cc8?, 0x0?)
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/runtime/sema.go:114 +0x2e
sync.(*WaitGroup).Wait(0xc004b32cc8)
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/sync/waitgroup.go:206 +0xd3
golang.org/x/sync/errgroup.(*Group).Wait(0xc004b32cc0)
	/home/cloudtest/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:56 +0x35
github.com/microsoft/typescript-go/internal/testutil/lsptestutil.NewLSPClient.func3()
	/mnt/vss/_work/typescript-go/typescript-go/internal/testutil/lsptestutil/lspclient.go:112 +0x66
github.com/microsoft/typescript-go/internal/fourslash.NewFourslash.func2()
	/mnt/vss/_work/typescript-go/typescript-go/internal/fourslash/fourslash.go:220 +0x67
github.com/microsoft/typescript-go/internal/fourslash/tests/gen_test.TestFormattingOnStatementsWithNoSemicolon(0xc000533b08)
	/mnt/vss/_work/typescript-go/typescript-go/internal/fourslash/tests/gen/formattingOnStatementsWithNoSemicolon_test.go:184 +0xf3c
testing.tRunner(0xc000533b08, 0x2cd60f0)
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/testing/testing.go:2036 +0x21d
created by testing.(*T).Run in goroutine 1
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/testing/testing.go:2101 +0xb13

goroutine 194903 [sync.WaitGroup.Wait, 44 minutes]:
sync.runtime_SemacquireWaitGroup(0xc0081c26c8?, 0x0?)
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/runtime/sema.go:114 +0x2e
sync.(*WaitGroup).Wait(0xc0081c26c8)
	/home/cloudtest/.local/share/microsoft-go/go1.26.0-1/src/sync/waitgroup.go:206 +0xd3
golang.org/x/sync/errgroup.(*Group).Wait(0xc0081c26c0)
	/home/cloudtest/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:56 +0x35
github.com/microsoft/typescript-go/internal/lsp.(*Server).Run(0xc00bf06708, {0x2cf8c80, 0xc008ba9720})
	/mnt/vss/_work/typescript-go/typescript-go/internal/lsp/server.go:361 +0x452
github.com/microsoft/typescript-go/internal/testutil/lsptestutil.NewLSPClient.func1()
	/mnt/vss/_work/typescript-go/typescript-go/internal/testutil/lsptestutil/lspclient.go:101 +0xbd
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/home/cloudtest/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x87
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1315
	/home/cloudtest/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11d

goroutine 194909 [chan send, 44 minutes]:
github.com/microsoft/typescript-go/internal/testutil/lsptestutil.(*LSPWriter).Write(0xc00777ca00, 0xc004872438)
	/mnt/vss/_work/typescript-go/typescript-go/internal/testutil/lsptestutil/lspclient.go:39 +0x4d
github.com/microsoft/typescript-go/internal/lsp.(*Server).writeLoop(0xc00bf06708, {0x2cf8c80, 0xc005b6a730})
	/mnt/vss/_work/typescript-go/typescript-go/internal/lsp/server.go:506 +0x14b
github.com/microsoft/typescript-go/internal/lsp.(*Server).Run.func2()
	/mnt/vss/_work/typescript-go/typescript-go/internal/lsp/server.go:347 +0x47
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/home/cloudtest/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93 +0x87
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 194903
	/home/cloudtest/go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:78 +0x11d

Basically this was due to a combination of:
The formatting request in the test returns many edits, and for each edit fourslash will apply it and send a textDocument/didChange notification to the server. The server will then send back many logging messages for these, such that, depending on the ordering in which things happen:

  1. The server output buffered channel gets full.
  2. The server writeLoop gets blocked on writing to the output, because it is full.
  3. The fourslash test gets to the end, cancels the context.
  4. The LSP client's MessageRouter sees the context is cancelled and exits.
  5. The test cannot finish, because it's waiting for the server to exit, and the server has to wait for its writeLoop to exit.

This PR fixes that by having MessageRouter drain the output reader messages when the context is cancelled.

I think the other formatting test we had disabled a month ago was failing for a similar reason.

I also can't think of a way to test all this right now.

Copilot AI review requested due to automatic review settings February 27, 2026 20:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@jakebailey
Copy link
Member

the server has to wait for its writeLoop to exit.

writeLoop is a select loop. Why is cancellation not exiting it?

@gabritto
Copy link
Member Author

gabritto commented Mar 2, 2026

the server has to wait for its writeLoop to exit.

writeLoop is a select loop. Why is cancellation not exiting it?

IIUC it blocks on writing to the buffered channel before cancellation hits, and then stays blocked.

@gabritto gabritto added this pull request to the merge queue Mar 2, 2026
Merged via the queue into main with commit fcf298a Mar 2, 2026
25 checks passed
@gabritto gabritto deleted the gabritto/lspclient_deadlock branch March 2, 2026 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants