
[vsock] guest -> host data corruption for guest kernel v6.17 #5475

@b1naryth1ef

Description

I'll caveat this by saying I've seen the list of supported kernel versions and understand that v6.17 is not in it and is also quite new in and of itself. That said, I figured I would file this issue since it can be quite nasty to diagnose. Please feel free to close or shelve this if it should wait until 6.17 guests are officially supported.

Issue

I have a Deno script running within a Firecracker guest that performs HTTP POST requests (over vsock) to a server running on the host, uploading files of varying sizes in a multipart form. When running under a 6.17 kernel, the script would fail to upload some files with an odd error indicating that the multipart form boundary couldn't be found, suggesting the POST body was truncated or corrupted. On further investigation, this appeared to happen only for large files (40 MB); smaller files (~5 KB) always succeeded.

After testing a Go version of the above script, I found that it didn't exhibit the same behavior, and I initially attributed the problem to a bug in Deno's vsock implementation. I don't know of an easy way to tcpdump the traffic here, so narrowing down the issue was quite tricky. When I eventually tested a Python version of the script, I discovered that it exhibited the same issue as Deno. Additionally, while building these alternative reproductions, I discovered that even relatively small payloads hit this problem: file sizes (excluding HTTP/multipart overhead) of just 1024 * 224 bytes (229,376) would reproduce, but 1024 * 223 bytes (228,352) would not.
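Finding a boundary like 223 vs. 224 KiB by hand is tedious; a binary search over payload sizes does it in a handful of runs. The sketch below assumes the failure is monotonic in payload size (every size above the boundary also fails), and `reproduces` is a hypothetical callback that would run one guest upload of the given size and report whether the server saw corruption. It is not part of the original reproduction.

```python
from typing import Callable


def smallest_failing_size(reproduces: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the smallest n in [lo, hi] for which reproduces(n) is True.

    Assumes the failure is monotonic in payload size: if n fails, every size
    above n fails too. `reproduces` is a hypothetical callback that would run
    one upload of n KiB from the guest and report whether the server saw
    corrupted data.
    """
    if lo > hi:
        raise ValueError("empty search range")
    if not reproduces(hi):
        raise ValueError("the upper bound does not reproduce the failure")
    while lo < hi:
        mid = (lo + hi) // 2
        if reproduces(mid):
            hi = mid  # mid fails, so the boundary is at mid or below
        else:
            lo = mid + 1  # mid succeeds, so the boundary is above mid
    return lo


if __name__ == "__main__":
    # Stand-in predicate that models the reported boundary:
    # 224 KiB reproduces, 223 KiB does not.
    def fake_reproduces(kib: int) -> bool:
        return kib >= 224

    print(smallest_failing_size(fake_reproduces, 1, 10 * 1024))  # prints 224
```

Searching between 1 KiB and 10 MiB this way takes on the order of 14 uploads rather than a linear scan.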

Bisecting the kernel, I identified commit 6693731487a8145a9b039bc983d77edc47693855 as the culprit. To further rule out a kernel bug, I tested QEMU with the same kernel and rootfs I had used with Firecracker. Because QEMU uses a pure vsock implementation without the Unix-socket proxy, this isn't exactly 1-to-1, and I don't think it can conclusively rule out a kernel issue. Regardless, QEMU didn't show the same problem and passed data correctly.

I'm not 100% sure why Go doesn't hit this issue; however, looking at strace output, I noticed that Go performs multiple write syscalls, whereas Python 3 performs a single sendto and Deno performs multiple sendto calls.
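The strace observations don't establish whether the write pattern itself is the trigger. One way to probe it is to send the same payload from the Python client either in one call or as several smaller writes and compare the results. The sketch below is an illustration only: `send_payload` and the chunk size are hypothetical, and the demo runs over a local `socketpair` rather than vsock.

```python
import socket
from typing import Optional


def send_payload(sock: socket.socket, payload: bytes, chunk_size: Optional[int] = None) -> int:
    """Send `payload` as one sendall() call, or as consecutive chunk_size slices.

    Returns the number of sendall() calls made. Each sendall() may still issue
    more than one send syscall if the kernel accepts only part of a buffer.
    """
    if chunk_size is None:
        sock.sendall(payload)  # single write from Python's point of view
        return 1
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    view = memoryview(payload)
    calls = 0
    for start in range(0, len(view), chunk_size):
        sock.sendall(view[start:start + chunk_size])
        calls += 1
    return calls


if __name__ == "__main__":
    # Local demo over a socketpair. In the guest, `sender` would instead be an
    # AF_VSOCK socket connected to (2, 80), as in the Python client script.
    sender, receiver = socket.socketpair()
    with sender, receiver:
        data = bytes(65 + (i % 10) for i in range(1000))
        print(send_payload(sender, data, chunk_size=256))  # prints 4
        received = b""
        while len(received) < len(data):
            received += receiver.recv(4096)
        print(received == data)  # prints True
```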

To Reproduce

  1. Start Firecracker with vsock enabled, running a v6.17 guest kernel (or any kernel that contains commit 66937314).
  2. On the host machine, run some sort of TCP/HTTP/etc. server that receives and validates the data. It should listen on a Unix socket for the vsock integration, as documented by Firecracker.
  3. Within the guest, run some sort of application that connects (via vsock) to the server on the host machine and sends data. For consistent reproduction, send a payload of at least 1 MB.
  4. You should observe data corruption on the server side: the Deno client will print `failed: Bad at index 229209` (an error forwarded from the server). On a non-reproducing kernel, the client should not print anything.
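To take the HTTP server out of the picture on the host side, a raw validator can listen on the Unix socket directly and report the first byte that breaks the pattern. This is a minimal Python sketch, assuming Firecracker's convention for guest-initiated connections (a connection to host port P is forwarded to `<uds_path>_P`) and a `uds_path` of `./v.sock`, which matches the `./v.sock_80` path used by the Deno server script. The helper names are hypothetical, and it only handles requests framed with `Content-Length` (as the Python client sends them), not chunked transfer encoding.

```python
import os
import socket
from typing import Optional, Tuple

# Assumption: the vsock device was configured with uds_path "./v.sock", so
# Firecracker forwards guest connections to host port 80 to "./v.sock_80".
SOCKET_PATH = "./v.sock_80"


def first_mismatch(body: bytes) -> Optional[int]:
    """Index of the first byte that breaks the repeating b"ABCDEFGHIJ" pattern."""
    for i, byte in enumerate(body):
        if byte != 65 + (i % 10):
            return i
    return None


def read_body(conn: socket.socket) -> Tuple[int, bytes]:
    """Read one request framed with Content-Length; return (declared, body)."""
    buf = bytearray()
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(65536)
        if not chunk:
            raise ConnectionError("peer closed before the headers were complete")
        buf += chunk
    end = buf.index(b"\r\n\r\n")
    declared = 0
    for line in bytes(buf[:end]).split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            declared = int(value.strip())
    body = buf[end + 4:]
    while len(body) < declared:
        chunk = conn.recv(65536)
        if not chunk:
            break  # peer closed early; the caller reports the truncation
        body += chunk
    return declared, bytes(body[:declared])


def check_request(conn: socket.socket) -> str:
    """Return "OK" or a description of the first problem found in the body."""
    declared, body = read_body(conn)
    if len(body) < declared:
        return f"Truncated: got {len(body)} of {declared} bytes"
    bad = first_mismatch(body)
    return "OK" if bad is None else f"Bad at index {bad}"


def serve_once(path: str = SOCKET_PATH) -> str:
    """Accept one connection on the Unix socket, validate it and reply."""
    if os.path.exists(path):
        os.remove(path)  # remove a stale socket file from a previous run
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            verdict = check_request(conn)
            status = "200 OK" if verdict == "OK" else "500 Internal Server Error"
            payload = verdict.encode()
            header = (
                f"HTTP/1.1 {status}\r\nContent-Length: {len(payload)}\r\n"
                "Connection: close\r\n\r\n"
            )
            conn.sendall(header.encode() + payload)
    return verdict


if __name__ == "__main__":
    print(serve_once())
```

Start it on the host before running a guest client. It handles a single connection and replies with the same `Bad at index N` text the Deno server uses, plus an explicit message if the body arrives short.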

To aid reproduction, I've included the Deno server script, the Deno client script, and a Python 3 client script:

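```typescript
// deno run -A server.ts
// Accepts POST /upload and verifies that every body byte i equals 65 + (i % 10),
// i.e. the repeating pattern "ABCDEFGHIJ" that both clients send.
async function handler(request: Request): Promise<Response> {
  const { pathname } = new URL(request.url);

  if (request.method !== "POST" || pathname !== "/upload") {
    return new Response("Not Found", { status: 404 });
  }

  const data = await request.bytes();
  for (let i = 0; i < data.length; i++) {
    if (data[i] !== 65 + (i % 10)) {
      // Report the first corrupted offset back to the client.
      return new Response(`Bad at index ${i}`, { status: 500 });
    }
  }

  return new Response(JSON.stringify({}), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });
}

// Firecracker forwards guest connections to host port 80 to "<uds_path>_80".
const socketPath = "./v.sock_80";

// Remove a stale socket file from a previous run, if any.
try {
  await Deno.remove(socketPath);
} catch (error) {
  if (!(error instanceof Deno.errors.NotFound)) {
    throw error;
  }
}

Deno.serve({
  handler,
  path: socketPath,
});
```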
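```typescript
// deno run -A client.ts
async function runTest(size: number) {
  // Route the HTTP connection over vsock to the host (CID 2), port 80.
  const client = Deno.createHttpClient({
    proxy: {
      transport: "vsock",
      cid: 2,
      port: 80,
    },
  });

  try {
    // Fill the payload with the repeating pattern the server checks.
    const data = new Uint8Array(size);
    for (let i = 0; i < size; i++) {
      data[i] = 65 + (i % 10);
    }
    const blob = new Blob([data]);

    const res = await fetch(`http://localhost/upload`, {
      client,
      method: "POST",
      body: blob,
    });
    // Always consume the body so the connection is released.
    const text = await res.text();
    if (res.status !== 200) {
      console.error(`failed: ${text}`);
    }
  } finally {
    client.close();
  }
}

await runTest(1024 * 1024 * 10);
```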
```python
#!/usr/bin/env python3

import socket

VMADDR_CID_HOST = 2
HOST_PORT = 80

def main():
    sock = None
    try:
        request_data = bytearray()
        for i in range(1024 * 1024 * 10):
            request_data.append(65 + (i % 10))

        headers = (
            f"POST /upload HTTP/1.1\r\n"
            f"Host: localhost\r\n"
            f"Content-Length: {len(request_data)}\r\n"
            f"\r\n"
        ).encode('utf-8')
        sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
        sock.connect((VMADDR_CID_HOST, HOST_PORT))

        print(len(request_data))
        sock.sendall(headers + request_data)

        response = sock.recv(4096)  # Receive up to 4KB of the response
        if response:
            print(response.decode('utf-8', errors='ignore'))
        else:
            print("No response received from the server.")
    finally:
        if sock:
            sock.close()

if __name__ == "__main__":
    main()
```

Expected behaviour

Data should be passed across vsock without corruption or truncation.

Environment

  • Firecracker version: v1.13.1
  • Host and guest kernel versions: guest 6.17; hosts 6.8 and 5.4 tested.
  • Rootfs used: Ubuntu 24.04
  • Architecture: amd64

Metadata

  • Assignees: No one assigned
  • Labels: Priority: Low, Status: Parked, Type: Bug