Skip to content

Conversation

ezzieyguywuf
Copy link
Contributor

  • Add HTTPCon.AtomicState to fix some tsan errors.
  • Add a mutex to httpz.request.State struct.
  • Add mutex to HTTPCon.
  • Add a mutex to NonBlocking, which is used in processWebsocketData

I did not do any performance testing to establish the performance impact of
these various new mutexes. If someone can point me towards some documentation on
how to execute performance tests on this project I'm happy to do this analysis.

I also didn't spend too much time thinking about whether or not larger design
changes may be necessary (e.g. I don't know why multiple threads are calling
processWebsocketData or State.reset()/State.parse()) - instead I opted for the
more straightforward approach.

I'm happy to use this PR as a conversation started if necessary.

@ezzieyguywuf ezzieyguywuf mentioned this pull request Jun 14, 2025
@karlseguin
Copy link
Owner

I'm not convinced. It shouldn't be possible for state.reset() and state.parse() to be executing at the same time. If this is happening, adding a mutex won't solve the issue..you'll get a reset of data in the middle parsing..that won't be right. If this is happening, I'd rather figure out why and fix it at the root.

@karlseguin
Copy link
Owner

With 44f050e I don't get any tsan errors.

But I find it weird that that simple change fixed warnings/errors that didn't seem related.

@ezzieyguywuf
Copy link
Contributor Author

ezzieyguywuf commented Jun 16, 2025

I'd rather figure out why and fix it at the root

I don't disagree with this, I don't mind spending some more time unraveling things to get to the root

With 44f050e I don't get any tsan errors.

😅 or maybe you already figured it out

But I find it weird that that simple change fixed warnings/errors that didn't seem related.

this change actually seems similar to 5bca6b2 from this PR. In fact, it seems identical, since as far as I can tell std.atomic.Value.load/store just use @atomicLoad/@atomicStore under the hood (source, copied below for convenience):

        pub inline fn load(self: *const Self, comptime order: AtomicOrder) T {
            return @atomicLoad(T, &self.raw, order);
        }

        pub inline fn store(self: *Self, value: T, comptime order: AtomicOrder) void {
            @atomicStore(T, &self.raw, value, order);

With 74df8f9 I'm able to produce the ThreadSanitizer output that prompted the additional commits on this PR.

I'm not sure why, but my gut tells me that if we're using atomic operations anywhere for _state, we should prefer to use them everywhere

@ezzieyguywuf
Copy link
Contributor Author

ezzieyguywuf commented Jun 16, 2025

I should note that I need to update this PR to use the proper atomic operation orders for load/store as in getState and setState:

<edit: removed diff, pushed requisite changes>

Specifically, State.parse and State.reset are called from multiple
threads, leading to data races since both mutate the state of...well
State.

Signed-off-by: Wolfgang E. Sanyer <[email protected]>
This is used in requestDone to ensure that multiple threads don't race
to mutate the state.

Signed-off-by: Wolfgang E. Sanyer <[email protected]>
This mutex prevents data races surfaced by ThreadSanitizer.

Signed-off-by: Wolfgang E. Sanyer <[email protected]>
@ezzieyguywuf
Copy link
Contributor Author

hm weird, at first running tests off of 44f050e resulted in no more ThreadSanitizer output, but after rerunning it a few times it came back:

wolfgangsanyer@wolfgangsanyer22:~/Programs/http.zig$ git log --oneline | head -n 3
68e760c 'make t' uses thread sanitizer
44f050e make writing new state thread-safe
726f3da Use test_filters instead of deprecated test_filter (#139)
wolfgangsanyer@wolfgangsanyer22:~/Programs/http.zig$ zig build test -Dtsan                                                                                                                                                                                                                
info(websocket): received shutdown signal                                                                                                                                                                                                                                                 
httpz: quick shutdown (53.22ms)                                                                                                                                                                                                                                                           
httpz: invalid request (0.92ms)                                                                                                                                                                                                                                                           
httpz: invalid request path (0.55ms)                                                                                                                                                                                                                                                      
httpz: invalid header name (0.51ms)
<snip>
websocket: invalid request (56.91ms)                                                                                                                                                                                                                                   15:15:45 [259/1621]
debug(websocket): (127.0.0.1:53576) received text message
==================                                                    
WARNING: ThreadSanitizer: data race (pid=601752)
  Write of size 7 at 0x7fb8051ed000 by thread T210:
    #0 read /home/wolfgangsanyer/.zvm/0.14.0/lib/tsan/sanitizer_common/sanitizer_common_interceptors.inc:972:3 (test+0x4b2560)
    #1 posix.read /home/wolfgangsanyer/.zvm/0.14.0/lib/std/posix.zig:862:31 (test+0x1f48c0)
    #2 net.Stream.read /home/wolfgangsanyer/.zvm/0.14.0/lib/std/net.zig:1851:26 (test+0x2f7027)
    #3 proto.Reader.fill__anon_52333 /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/proto.zig:142:34 (test+0x364a3e)
    #4 server.server._handleClientData__anon_53085 /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:1591:16 (test+0x37398c)
    #5 server.server.handleClientData__anon_52180 /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:1582:29 (test+0x36235e)
    #6 server.server.NonBlockingBase(httpz.TestWebsocketHandler.WebsocketHandler,false)._dataAvailable /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:752:40 (test+0x362152)
    #7 server.server.NonBlockingBase(httpz.TestWebsocketHandler.WebsocketHandler,false).dataAvailable /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:738:39 (test+0x3535df)
    #8 worker.NonBlocking(*httpz.Server(httpz.TestWebsocketHandler),httpz.TestWebsocketHandler.WebsocketHandler).processWebsocketData /home/wolfgangsanyer/Programs/http.zig/src/worker.zig:826:64 (test+0x33cd0e)
    #9 worker.NonBlocking(*httpz.Server(httpz.TestWebsocketHandler),httpz.TestWebsocketHandler.WebsocketHandler).processData /home/wolfgangsanyer/Programs/http.zig/src/worker.zig:784:61 (test+0x31f94b)
    #10 thread_pool.Worker((function 'processData')).run.6 /home/wolfgangsanyer/Programs/http.zig/src/thread_pool.zig:244:17 (test+0x2e3937)
    #11 Thread.callFn__anon_40862 /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:488:13 (test+0x2b3dec)
    #12 Thread.PosixThreadImpl.spawn__anon_37148.Instance.entryFn /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:757:30 (test+0x285054)

  Previous read of size 6 at 0x7fb8051ed002 by thread T209:
    #0 writev /home/wolfgangsanyer/.zvm/0.14.0/lib/tsan/sanitizer_common/sanitizer_common_interceptors.inc:1155:3 (test+0x4b33f3)
    #1 posix.writev /home/wolfgangsanyer/.zvm/0.14.0/lib/std/posix.zig:1337:33 (test+0x1d94c5)
    #2 server.server.Conn.writeAllIOVec /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:1431:41 (test+0x3289c6)
    #3 server.server.Conn.writeFrame /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:1414:34 (test+0x32bb41)
    #4 server.server.Conn.write /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:1322:31 (test+0x373691)
    #5 httpz.TestWebsocketHandler.WebsocketHandler.clientMessage /home/wolfgangsanyer/Programs/http.zig/src/httpz.zig:1826:32 (test+0x3737a7)
    #6 server.server._handleClientData__anon_53085 /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:1648:47 (test+0x37417f)
    #7 server.server.handleClientData__anon_52180 /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:1582:29 (test+0x36235e)
    #8 server.server.NonBlockingBase(httpz.TestWebsocketHandler.WebsocketHandler,false)._dataAvailable /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:752:40 (test+0x362152)
    #9 server.server.NonBlockingBase(httpz.TestWebsocketHandler.WebsocketHandler,false).dataAvailable /home/wolfgangsanyer/.cache/zig/p/websocket-0.1.0-ZPISdXNIAwCXG7oHBj4zc1CfmZcDeyR6hfTEOo8_YI4r/src/server/server.zig:738:39 (test+0x3535df)
    #10 worker.NonBlocking(*httpz.Server(httpz.TestWebsocketHandler),httpz.TestWebsocketHandler.WebsocketHandler).processWebsocketData /home/wolfgangsanyer/Programs/http.zig/src/worker.zig:826:64 (test+0x33cd0e)
    #11 worker.NonBlocking(*httpz.Server(httpz.TestWebsocketHandler),httpz.TestWebsocketHandler.WebsocketHandler).processData /home/wolfgangsanyer/Programs/http.zig/src/worker.zig:784:61 (test+0x31f94b)
    #12 thread_pool.Worker((function 'processData')).run.6 /home/wolfgangsanyer/Programs/http.zig/src/thread_pool.zig:244:17 (test+0x2e3937)
    #13 Thread.callFn__anon_40862 /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:488:13 (test+0x2b3dec)
    #14 Thread.PosixThreadImpl.spawn__anon_37148.Instance.entryFn /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:757:30 (test+0x285054)

  Thread T210 (tid=601964, running) created by thread T205 at:
    #0 pthread_create /home/wolfgangsanyer/.zvm/0.14.0/lib/tsan/tsan_interceptors_posix.cpp:1023:3 (test+0x4aae3d)
    #1 Thread.PosixThreadImpl.spawn__anon_37148 /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:775:33 (test+0x284b46)
    #2 Thread.spawn__anon_33651 /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:421:32 (test+0x24bf19)
    #3 thread_pool.ThreadPool((function 'processData')).init.6 /home/wolfgangsanyer/Programs/http.zig/src/thread_pool.zig:55:46 (test+0x24b90a)
    #4 worker.NonBlocking(*httpz.Server(httpz.TestWebsocketHandler),httpz.TestWebsocketHandler.WebsocketHandler).init /home/wolfgangsanyer/Programs/http.zig/src/worker.zig:461:70 (test+0x24c5f0)
    #5 httpz.Server(httpz.TestWebsocketHandler).listen /home/wolfgangsanyer/Programs/http.zig/src/httpz.zig:423:49 (test+0x24dd79)
    #6 Thread.callFn__anon_28988 /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:507:21 (test+0x202d4d)
    #7 Thread.PosixThreadImpl.spawn__anon_21383.Instance.entryFn /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:757:30 (test+0x1edce4)

  Thread T209 (tid=601963, running) created by thread T205 at:
    #0 pthread_create /home/wolfgangsanyer/.zvm/0.14.0/lib/tsan/tsan_interceptors_posix.cpp:1023:3 (test+0x4aae3d)
    #1 Thread.PosixThreadImpl.spawn__anon_37148 /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:775:33 (test+0x284b46)
    #2 Thread.spawn__anon_33651 /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:421:32 (test+0x24bf19)
    #3 thread_pool.ThreadPool((function 'processData')).init.6 /home/wolfgangsanyer/Programs/http.zig/src/thread_pool.zig:55:46 (test+0x24b90a)
    #4 worker.NonBlocking(*httpz.Server(httpz.TestWebsocketHandler),httpz.TestWebsocketHandler.WebsocketHandler).init /home/wolfgangsanyer/Programs/http.zig/src/worker.zig:461:70 (test+0x24c5f0)
    #5 httpz.Server(httpz.TestWebsocketHandler).listen /home/wolfgangsanyer/Programs/http.zig/src/httpz.zig:423:49 (test+0x24dd79)
    #6 Thread.callFn__anon_28988 /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:507:21 (test+0x202d4d)
    #7 Thread.PosixThreadImpl.spawn__anon_21383.Instance.entryFn /home/wolfgangsanyer/.zvm/0.14.0/lib/std/Thread.zig:757:30 (test+0x1edce4)

SUMMARY: ThreadSanitizer: data race /home/wolfgangsanyer/.zvm/0.14.0/lib/tsan/sanitizer_common/sanitizer_common_interceptors.inc:972:3 in read

<snip>

@karlseguin
Copy link
Owner

@ezzieyguywuf are you on Mac or Linux? I can't reproduce this. The stack that you provided tell me that it's trying to process multiple websocket messages at the same time. When the connection is upgraded, we switch to oneshot/dispatch in epoll/kqueue. Could definitely be an issue with how that's done.

@ezzieyguywuf
Copy link
Contributor Author

I'm on linux

@karlseguin
Copy link
Owner

Based on https://nullprogram.com/blog/2022/10/03/ it seems that TSan does generate false positives when used with epoll and oneshot (which is what httpz is doing with websocket connections).

I tried to look into it, and I've yet to observe multiple threads concurrently accessing processWebsocketData - which is the main entry point where all this is happening. Guarding the function with a mutex, as in this PR, would make the websocket handling single-threaded - processing one message at a time.

Gonna keep looking into it.

@ezzieyguywuf
Copy link
Contributor Author

eyy nice find. I can see if there's a way to tell tsan to ignore one particular spot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants