Description
I'll caveat this by saying I've seen the list of supported kernel versions and understand that v6.17 is not on it and is also quite new. That said, I figured I would file this issue anyway, as it can be quite nasty to diagnose. Please feel free to close or shelve this if it should wait until 6.17 guests are officially supported.
Issue
I have a Deno script running within a Firecracker guest that performs HTTP POST requests (via vsock) against a server running on the host; each request carries files of varying sizes in a multipart form. Under a 6.17 guest kernel, some uploads failed with an odd error indicating that the multipart form boundary couldn't be found, suggesting the POST body was truncated or corrupted. Upon further investigation, this appeared to happen only for large files (40 MB) and never for small files (~5 KB).
After testing a Go version of the above script I found that it didn't exhibit the same behavior, and I initially attributed this to a bug in Deno's vsock implementation. I don't know of an easy way to `tcpdump` the traffic here, so it was quite tricky to narrow down the issue. When I eventually tested another version of the script written in Python, I discovered that it exhibited the same issue as Deno. Additionally, while building these alternative reproduction cases, I discovered that even relatively small payloads hit this problem: a file size (excluding HTTP/multipart/etc. overhead) of just `1024 * 224` bytes would reproduce, but `1024 * 223` would not.
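The boundary between a passing and a failing payload size could be narrowed further with a binary search. The following is a minimal sketch: `smallest_failing_size` and the `reproduces` callback are hypothetical names, and the search assumes that failures are monotonic in size (every size above the threshold fails), which has not been confirmed here.

```python
from typing import Callable


def smallest_failing_size(reproduces: Callable[[int], bool], lo: int, hi: int) -> int:
    """Return the smallest size in (lo, hi] for which reproduces(size) is True.

    Assumes reproduces(lo) is False, reproduces(hi) is True, and that
    failures are monotonic in size (an assumption, not a confirmed fact).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if reproduces(mid):
            hi = mid  # mid fails, so the threshold is at or below mid
        else:
            lo = mid  # mid passes, so the threshold is above mid
    return hi
```

Here `reproduces` would wrap one upload of the given size and return whether the server reported corruption, called with `lo = 1024 * 223` and `hi = 1024 * 224`.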
Bisecting the kernel, I found 6693731487a8145a9b039bc983d77edc47693855 to be the culprit. To help rule out a kernel bug, I tested QEMU with the same kernel and rootfs I had used with Firecracker. Because QEMU uses a pure vsock implementation without the Unix socket proxy, this isn't an exact 1-to-1 comparison and I don't think it can definitively rule out a kernel issue. Regardless, QEMU didn't show the same problem and passed the data correctly.
I'm not 100% sure why Go doesn't hit this issue; however, looking at an strace I noticed that while Go performs multiple `write` syscalls, Python 3 performs a single `sendto` and Deno performs multiple `sendto`s.
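To check whether the syscall pattern matters, a Python client could mimic the Go behavior by pushing the payload through `write(2)` in chunks instead of a single `sendall`. This is a minimal sketch assuming a blocking socket; `send_with_write` is a hypothetical helper, and the 4096-byte chunk size is an arbitrary choice, not the size Go actually uses.

```python
import os
import socket


def send_with_write(sock: socket.socket, data: bytes, chunk_size: int = 4096) -> None:
    """Send data via write(2) in chunk_size pieces instead of one sendall().

    Assumes sock is in blocking mode (no timeout set).
    """
    fd = sock.fileno()
    view = memoryview(data)
    offset = 0
    while offset < len(view):
        # os.write issues the write(2) syscall and may write fewer bytes
        # than requested, so advance by the count it actually reports.
        written = os.write(fd, view[offset:offset + chunk_size])
        offset += written
```

In a raw-socket client, a call such as `sock.sendall(headers + request_data)` would become `send_with_write(sock, headers + request_data)`. If the corruption disappears, that would point at the send path rather than at the language runtime.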
To Reproduce
- Start Firecracker running a v6.17 guest kernel (or any kernel that contains the `66937314` commit) with vsock enabled.
- On the host machine, run some sort of TCP/HTTP/etc. server which receives and validates data. This should listen on a Unix socket for vsock integration, as documented by Firecracker.
- Within the guest run some sort of application which connects (via vsock) to the server on the host machine and sends data. For consistent reproduction try to send a payload of at least 1MB.
- You should observe data corruption on the server side: the client will print `failed: Bad at index 229209` (an error forwarded from the server). On a non-reproducing kernel the client should not print anything.
To aid in reproduction, I've included the Deno server script, the Deno client script, and the Python 3 client script:
// deno run -A server.ts
async function handler(request: Request): Promise<Response> {
  const { pathname } = new URL(request.url);
  if (request.method !== "POST" || pathname !== "/upload") {
    return new Response("Not Found", { status: 404 });
  }

  const data = await request.bytes();
  for (let i = 0; i < data.length; i++) {
    if (data[i] !== 65 + (i % 10)) {
      return new Response(`Bad at index ${i}`, { status: 500 });
    }
  }

  return new Response(JSON.stringify({}), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });
}

const socketPath = "./v.sock_80";
try {
  await Deno.remove(socketPath);
} catch (error) {
  if (!(error instanceof Deno.errors.NotFound)) {
    throw error;
  }
}

Deno.serve({
  handler,
  path: socketPath,
});
// deno run -A client.ts
async function runTest(size: number) {
  const client = Deno.createHttpClient({
    proxy: {
      transport: "vsock",
      cid: 2,
      port: 80,
    },
  });

  const data = new Uint8Array(size);
  for (let i = 0; i < size; i++) {
    data[i] = 65 + (i % 10);
  }
  const blob = new Blob([data]);

  const res = await fetch(`http://localhost/upload`, {
    client,
    method: "POST",
    body: blob,
  });
  if (res.status !== 200) {
    console.error(`failed: ${await res.text()}`);
    return;
  }
}

runTest(1024 * 1024 * 10);
#!/usr/bin/env python3
import socket

VMADDR_CID_HOST = 2
HOST_PORT = 80


def main():
    sock = None
    try:
        request_data = bytearray()
        for i in range(1024 * 1024 * 10):
            request_data.append(65 + (i % 10))
        headers = (
            f"POST /upload HTTP/1.1\r\n"
            f"Host: localhost\r\n"
            f"Content-Length: {len(request_data)}\r\n"
            f"\r\n"
        ).encode('utf-8')
        sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
        sock.connect((VMADDR_CID_HOST, HOST_PORT))
        print(len(request_data))
        sock.sendall(headers + request_data)
        response = sock.recv(4096)  # Receive up to 4KB of the response
        if response:
            print(response.decode('utf-8', errors='ignore'))
        else:
            print("No response received from the server.")
    finally:
        if sock:
            sock.close()


if __name__ == "__main__":
    main()
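For a host-side check that doesn't depend on Deno, the validation could also be done with a small Python server on the same Unix socket. This is a minimal sketch: `first_bad_index`, `read_request`, and `serve` are hypothetical helpers, the parser assumes one request per connection with a `Content-Length` header (no chunked encoding), and the truncation message is an addition that the Deno server does not emit.

```python
import os
import socket
from typing import Optional, Tuple

SOCKET_PATH = "./v.sock_80"  # same path the Deno server listens on


def first_bad_index(body: bytes) -> Optional[int]:
    """Return the index of the first byte breaking the 65 + (i % 10) pattern."""
    for i, byte in enumerate(body):
        if byte != 65 + (i % 10):
            return i
    return None


def read_request(conn: socket.socket) -> Tuple[int, bytes]:
    """Read one HTTP/1.1 request; return (declared Content-Length, body read)."""
    buf = bytearray()
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(65536)
        if not chunk:
            raise ConnectionError("connection closed before headers ended")
        buf.extend(chunk)
    head, _, rest = bytes(buf).partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    body = bytearray(rest)
    while len(body) < length:
        chunk = conn.recv(65536)
        if not chunk:
            break  # peer closed early; the caller sees a short body
        body.extend(chunk)
    return length, bytes(body[:length])


def serve(path: str = SOCKET_PATH) -> None:
    """Accept connections forever and validate each uploaded body."""
    if os.path.exists(path):
        os.remove(path)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen()
        while True:
            conn, _ = server.accept()
            with conn:
                expected, body = read_request(conn)
                bad = first_bad_index(body)
                if len(body) != expected:
                    status = "500 Internal Server Error"
                    msg = f"Truncated: got {len(body)} of {expected} bytes"
                elif bad is not None:
                    status, msg = "500 Internal Server Error", f"Bad at index {bad}"
                else:
                    status, msg = "200 OK", "{}"
                payload = msg.encode()
                conn.sendall(
                    f"HTTP/1.1 {status}\r\nContent-Length: {len(payload)}\r\n"
                    f"Connection: close\r\n\r\n".encode() + payload
                )
```

Calling `serve()` on the host in place of `deno run -A server.ts` should let either client run unchanged, and it reports truncation separately from byte-level corruption.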
Expected behaviour
Data should be passed across vsock without corruption or truncation.
Environment
- Firecracker version: v1.13.1
- Host and guest kernel versions: Guest 6.17, Hosts 6.8 and 5.4 tested.
- Rootfs used: Ubuntu 24.04
- Architecture: amd64