Enums don't serialize correctly #365

@antonsmetanin

Enum serialization doesn't seem to work correctly. I've prepared a test that demonstrates the issue. In short, I couldn't find a way to serialize an enum so that it matches the following schema other than building a Value manually, and even then schema resolution fails to pick the correct union variant:

{
    "name": "Root",
    "type": "record",
    "fields": [
        {"name": "field_union", "type": [
            {
                "name": "A",
                "type": "record",
                "fields": []
            },
            {
                "name": "B",
                "type": "record",
                "fields": []
            },
            {
                "name": "C",
                "type": "record",
                "fields": [
                    {"name": "field_a", "type": "long"},
                    {"name": "field_b", "type": ["null", "string"]}
                ]
            },
            {
                "name": "D",
                "type": "record",
                "fields": [
                    {"name": "field_a", "type": "float"},
                    {"name": "field_b", "type": "int"}
                ]
            }
        ]},
        {"name": "field_f", "type": "string"}
    ]
}

Here's the enum definition that's supposed to work with this schema, but it doesn't currently:

use serde::Serialize;

#[derive(Serialize)]
struct Root {
    field_union: Enum,
    field_f: String,
}

#[derive(Serialize)]
enum Enum {
    A {},
    B {},
    C {
        field_a: i64,
        field_b: Option<String>,
    },
    D {
        field_a: f32,
        field_b: i32
    },
}
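
For reference, here is a rough sketch of the byte layout the schema above implies for a Root value holding the C variant. It hand-encodes the value following the Avro binary encoding rules (zigzag varints for int and long, a zero-based branch index in front of every union value, records as the plain concatenation of their fields). It does not use the library at all: put_long, put_string and encode_root_with_c are hypothetical helper names made up for this illustration.

```rust
/// Zigzag + variable-length encoding that Avro uses for `int` and `long`.
fn put_long(out: &mut Vec<u8>, n: i64) {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    while z >= 0x80 {
        // Low 7 bits with the continuation bit set.
        out.push(((z as u8) & 0x7f) | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
}

/// Avro `string`: byte length as a long, followed by the UTF-8 bytes.
fn put_string(out: &mut Vec<u8>, s: &str) {
    put_long(out, s.len() as i64);
    out.extend_from_slice(s.as_bytes());
}

/// Hypothetical helper: hand-encodes
/// `Root { field_union: Enum::C { field_a, field_b: None }, field_f }`.
fn encode_root_with_c(field_a: i64, field_f: &str) -> Vec<u8> {
    let mut out = Vec::new();
    put_long(&mut out, 2); // field_union: branch index of record C (A=0, B=1, C=2, D=3)
    put_long(&mut out, field_a); // C.field_a: long
    put_long(&mut out, 0); // C.field_b: branch 0 ("null"), which has no payload
    put_string(&mut out, field_f); // Root.field_f: string
    out
}

fn main() {
    let bytes = encode_root_with_c(1, "x");
    println!("{:02x?}", bytes);
}
```

With field_a = 1 and field_f = "x" this gives the five bytes 04 02 00 02 78: the union index comes first and the fields of C follow directly, with no separate discriminator in between.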

I've looked into the implementation and found two issues:

  1. For some reason, enum variants get serialized as records with two fields:
  • "type", an Avro enum used as the discriminator,
  • "value", a union variant containing the fields,
    instead of being serialized directly as a union variant. Because of this, the encoded bytes differ from what's expected (there's an additional enum byte for "type"), and schema resolution also fails because it expects to find the actual fields at the top level, while they are nested under "value".
  2. Separately from the first issue, even if you call resolve() on a manually built Value, schema resolution ignores the variant index carried by Value::Union and instead tries to find a schema by simply matching the fields. This causes serialization to lose field data in some cases, depending on the order in which the variants are listed in the schema. In the example above, A is tested first, and since it doesn't have any fields, it matches any value.
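
To make the second issue concrete, here is a toy model of the two resolution strategies. This is not the library's code: RecordSchema, UnionValue, resolve_first_match, resolve_by_index and project are hypothetical names, field values are simplified to i64, and "matching the fields" is assumed to mean "every field the schema declares is present in the value".

```rust
/// Toy stand-in for a record schema: a name plus the names of its fields.
struct RecordSchema {
    name: &'static str,
    fields: &'static [&'static str],
}

/// Toy stand-in for a union value: the branch index chosen by the writer
/// plus the (name, value) pairs of the record it carries.
struct UnionValue {
    index: usize,
    fields: Vec<(&'static str, i64)>,
}

/// The four union branches from the schema in this issue.
fn example_branches() -> Vec<RecordSchema> {
    vec![
        RecordSchema { name: "A", fields: &[] },
        RecordSchema { name: "B", fields: &[] },
        RecordSchema { name: "C", fields: &["field_a", "field_b"] },
        RecordSchema { name: "D", fields: &["field_a", "field_b"] },
    ]
}

/// Structural matching (assumed model): the first branch whose declared
/// fields are all present in the value wins; the index is ignored.
fn resolve_first_match<'a>(
    branches: &'a [RecordSchema],
    value: &UnionValue,
) -> Option<&'a RecordSchema> {
    branches
        .iter()
        .find(|s| s.fields.iter().all(|f| value.fields.iter().any(|(k, _)| k == f)))
}

/// Index-based resolution: trust the branch index carried by the value.
fn resolve_by_index<'a>(
    branches: &'a [RecordSchema],
    value: &UnionValue,
) -> Option<&'a RecordSchema> {
    branches.get(value.index)
}

/// Keeps only the fields that the chosen schema declares.
fn project(schema: &RecordSchema, value: &UnionValue) -> Vec<(&'static str, i64)> {
    value
        .fields
        .iter()
        .copied()
        .filter(|(k, _)| schema.fields.contains(k))
        .collect()
}

fn main() {
    let branches = example_branches();
    // A value written as variant C (index 2) with both fields set.
    let value = UnionValue { index: 2, fields: vec![("field_a", 1), ("field_b", 7)] };

    let by_fields = resolve_first_match(&branches, &value).unwrap();
    let by_index = resolve_by_index(&branches, &value).unwrap();

    println!("first match: {} keeps {:?}", by_fields.name, project(by_fields, &value));
    println!("by index:    {} keeps {:?}", by_index.name, project(by_index, &value));
}
```

In this model the structural strategy settles on A, because a record with no fields is satisfied by any value, and projecting onto A drops both fields. Using the index selects C and keeps them. It also shows that C and D could not be told apart by field names alone.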
