-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Description
Description
Memory snapshots cannot be restored due to network packets with "already-parsed" headers
Steps to reproduce
I've been debugging a problem with restoration of memory-snapshots. Many of my crashes come from this line in PacketBuffer.consume
.
I have two checkpoints to reproduce the problem: one that (consistently) fails to be restored, and another one that can be restored without problems.
I've been looking at gVisor logs (instrumented with my own debug logging). The difference between these snapshots seems to be this: the unsuccessful one always has a non-empty PacketBufferList during restore — i.e. it loads a non-empty PacketBufferList from the serialized checkpoint. It seems to me that this per se is a condition that gVisor software is not designed to handle well. Those packets seem to be "already parsed" (parsed in the network sense, not in the checkpoint sense: they've already been populated from the underlying network buffer), and gVisor doesn't seem to like that.
Prior to the "second" consume
call, the headers in the packet object look something like this.
I0820 19:54:33.405501 1 packet_buffer.go:350] === Packet Headers Debug ===
I0820 19:54:33.405511 1 packet_buffer.go:353] Header[0] virtioNetHeader: offset=0, length=0
I0820 19:54:33.405516 1 packet_buffer.go:353] Header[1] linkHeader: offset=0, length=14
I0820 19:54:33.405520 1 packet_buffer.go:353] Header[2] networkHeader: offset=14, length=28
I0820 19:54:33.405523 1 packet_buffer.go:353] Header[3] transportHeader: offset=0, length=0
I0820 19:54:33.405527 1 packet_buffer.go:365] PacketBuffer state: reserved=0, pushed=0, consumed=42
What might be going on? I couldn't figure out conditions under which packets with partially-parsed headers could end up in processor.pkts
. Perhaps packet-parsing can somehow race with checkpointing? I haven't explored that theory much.
I'm using source code from this revision to build gVisor: bb08e96046752b436adbdfd363673fde8ad8b373
runsc version
bb08e96046752b436adbdfd363673fde8ad8b373
docker version (if using docker)
uname
No response
kubectl (if using Kubernetes)
repo state (if built from source)
No response
runsc debug logs (if available)
Full `boot.txt` log (with some lines at the top redacted):
https://gist.githubusercontent.com/abhagwat/19a41d2cf9fce01e5d6f77678788ad1a/raw/b26460d1febd0ccde201fc56e872cf9d2d878075/gistfile0.txt