Memory snapshots cannot be restored due to network packets with "already-parsed" headers #12072

@abhagwat

Description

Steps to reproduce

I've been debugging a problem with restoring memory snapshots. Many of the crashes I see come from this line in PacketBuffer.consume.

I have two checkpoints to reproduce the problem: one that (consistently) fails to be restored, and another one that can be restored without problems.

I've been looking at gVisor logs (instrumented with my own debug logging). The difference between the two snapshots seems to be this: the failing one always has a non-empty PacketBufferList during restore, i.e. it loads a non-empty PacketBufferList from the serialized checkpoint. It seems to me that gVisor is not designed to handle this condition well. The packets in that list appear to be "already parsed" (parsed in the network sense, not in the checkpoint sense: their headers have already been populated from the underlying network buffer), and gVisor doesn't seem to tolerate that.
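To make the condition concrete, here is a minimal sketch of the kind of check my debug logging amounts to: walk the packets loaded from a checkpoint and count those whose headers were populated before the snapshot was taken. The types `header` and `restoredPacket` and the function `countAlreadyParsed` are hypothetical stand-ins for illustration only; they are not gVisor's actual definitions.

```go
package main

import "fmt"

// header records the offset/length pair for one parsed header.
// Hypothetical type for illustration; not gVisor's definition.
type header struct {
	name   string
	offset int
	length int
}

// restoredPacket is a stand-in for one packet loaded from a checkpoint.
type restoredPacket struct {
	headers  []header
	consumed int
}

// alreadyParsed reports whether any header has a non-zero length,
// i.e. whether the packet was parsed before the snapshot was taken.
func (p restoredPacket) alreadyParsed() bool {
	for _, h := range p.headers {
		if h.length > 0 {
			return true
		}
	}
	return false
}

// countAlreadyParsed returns how many packets in a restored list
// carry at least one populated header.
func countAlreadyParsed(pkts []restoredPacket) int {
	n := 0
	for _, p := range pkts {
		if p.alreadyParsed() {
			n++
		}
	}
	return n
}

func main() {
	pkts := []restoredPacket{
		// Shaped like the packet in my logs: link and network headers populated.
		{headers: []header{{"linkHeader", 0, 14}, {"networkHeader", 14, 28}}, consumed: 42},
		// An untouched packet: no header has been populated yet.
		{headers: []header{{"linkHeader", 0, 0}, {"networkHeader", 0, 0}}},
	}
	fmt.Printf("%d of %d restored packets are already parsed\n",
		countAlreadyParsed(pkts), len(pkts))
}
```

In the failing snapshot, a check like this would report a non-zero count at restore time; in the working snapshot the list is empty, so there is nothing to count.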

Prior to the "second" consume call, the headers in the packet object look something like this:

I0820 19:54:33.405501       1 packet_buffer.go:350] === Packet Headers Debug ===
I0820 19:54:33.405511       1 packet_buffer.go:353] Header[0] virtioNetHeader: offset=0, length=0
I0820 19:54:33.405516       1 packet_buffer.go:353] Header[1] linkHeader: offset=0, length=14
I0820 19:54:33.405520       1 packet_buffer.go:353] Header[2] networkHeader: offset=14, length=28
I0820 19:54:33.405523       1 packet_buffer.go:353] Header[3] transportHeader: offset=0, length=0
I0820 19:54:33.405527       1 packet_buffer.go:365] PacketBuffer state: reserved=0, pushed=0, consumed=42
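The state in these logs can be replayed against a simplified model of a consume-once header parser. This is only a sketch of how I understand the failure, not gVisor's code: the `packet` type, the `size` field (set to 42 here as an assumption, since the logs don't show the buffer size), and the error values are all hypothetical. The model returns an error where the real code crashes, so that it is easy to run.

```go
package main

import (
	"errors"
	"fmt"
)

type headerType int

const (
	virtioNetHeader headerType = iota
	linkHeader
	networkHeader
	transportHeader
	numHeaderTypes
)

type headerInfo struct {
	offset, length int
}

// packet is a simplified model for illustration, not gVisor's PacketBuffer.
type packet struct {
	headers  [numHeaderTypes]headerInfo
	consumed int // bytes already claimed by parsed headers
	size     int // total bytes in the underlying buffer (assumed value)
}

var (
	errAlreadyConsumed = errors.New("header already consumed")
	errShortBuffer     = errors.New("not enough data left in buffer")
)

// consume claims the next size bytes as the header of the given type.
// Each header type may be consumed at most once per packet.
func (p *packet) consume(typ headerType, size int) error {
	h := &p.headers[typ]
	if h.length > 0 {
		return fmt.Errorf("header type %d: %w", typ, errAlreadyConsumed)
	}
	if p.consumed+size > p.size {
		return errShortBuffer
	}
	h.offset = p.consumed
	h.length = size
	p.consumed += size
	return nil
}

func main() {
	// A fresh packet: the first link-header parse succeeds.
	fresh := &packet{size: 42}
	fmt.Println("fresh:", fresh.consume(linkHeader, 14))

	// A packet restored with the header state from the logs above.
	restored := &packet{size: 42, consumed: 42}
	restored.headers[linkHeader] = headerInfo{offset: 0, length: 14}
	restored.headers[networkHeader] = headerInfo{offset: 14, length: 28}

	// Re-running the receive path parses the link header a second time.
	fmt.Println("restored:", restored.consume(linkHeader, 14))
}
```

If restore feeds such a packet back through the normal receive path, the second consume on `linkHeader` hits the "already consumed" branch, which matches what I'm seeing.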

What might be going on? I couldn't figure out conditions under which packets with partially-parsed headers could end up in processor.pkts. Perhaps packet-parsing can somehow race with checkpointing? I haven't explored that theory much.


I'm using source code from this revision to build gVisor: bb08e96046752b436adbdfd363673fde8ad8b373

runsc version

bb08e96046752b436adbdfd363673fde8ad8b373

docker version (if using docker)

uname

No response

kubectl (if using Kubernetes)

repo state (if built from source)

No response

runsc debug logs (if available)

Full `boot.txt` log (with some lines at the top redacted):
https://gist.githubusercontent.com/abhagwat/19a41d2cf9fce01e5d6f77678788ad1a/raw/b26460d1febd0ccde201fc56e872cf9d2d878075/gistfile0.txt

Labels: type: bug (Something isn't working)