fix: Union serialization #366

antonsmetanin · 2025-12-13T05:07:39Z

Description

Fix for the issue described here: #365

For some reason, enum variants get serialized as records with two fields:

"type", an Avro enum used as the discriminator,

"value", a union variant containing the fields,

instead of being serialized directly as the union variant. Because of this, the encoded bytes differ from what's expected (there is an additional enum byte for "type"), and schema resolution also fails because it expects to find the actual fields at the top level, while they are nested under "value".
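To make the extra byte concrete, here is a minimal sketch. It assumes a hypothetical, stripped-down `Value` type and a toy `encode` function (neither is the crate's real API) that follow the Avro binary rules: ints, enum symbol indices and union branch indices are written as zig-zag varints, and a record is just its fields in order. `old_shape`, `new_shape`, `sample_fields` and `to_bytes` are illustrative names.

```rust
// Hypothetical, simplified stand-in for an Avro value; not the crate's real type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i32),
    // Enum symbol index plus symbol name.
    Enum(u32, String),
    // Union branch index plus the value of that branch.
    Union(u32, Box<Value>),
    // Named fields in schema order.
    Record(Vec<(String, Value)>),
}

// Avro writes int and long values as zig-zag varints.
fn zigzag(n: i64, out: &mut Vec<u8>) {
    let mut z = ((n << 1) ^ (n >> 63)) as u64;
    while z >= 0x80 {
        out.push(((z as u8) & 0x7f) | 0x80);
        z >>= 7;
    }
    out.push(z as u8);
}

// Toy encoder: enough of the Avro binary encoding to compare the two shapes.
pub fn encode(v: &Value, out: &mut Vec<u8>) {
    match v {
        Value::Int(i) => zigzag(i64::from(*i), out),
        Value::Enum(index, _) => zigzag(i64::from(*index), out),
        Value::Union(index, inner) => {
            zigzag(i64::from(*index), out);
            encode(inner, out);
        }
        Value::Record(fields) => {
            for (_, field) in fields {
                encode(field, out);
            }
        }
    }
}

pub fn to_bytes(v: &Value) -> Vec<u8> {
    let mut out = Vec::new();
    encode(v, &mut out);
    out
}

// The fields of the Rust enum variant being serialized (illustrative data).
fn sample_fields() -> Value {
    Value::Record(vec![("x".to_string(), Value::Int(7))])
}

// Shape described as the bug: Record([("type", Enum), ("value", Union)]).
pub fn old_shape() -> Value {
    Value::Record(vec![
        ("type".to_string(), Value::Enum(1, "B".to_string())),
        ("value".to_string(), Value::Union(1, Box::new(sample_fields()))),
    ])
}

// Shape after the fix: the union variant on its own.
pub fn new_shape() -> Value {
    Value::Union(1, Box::new(sample_fields()))
}
```

Under these assumptions, `to_bytes(&old_shape())` is one byte longer than `to_bytes(&new_shape())`: the leading byte is the enum index for "type", followed by the same union index and field bytes.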

Separately from the first issue, even if you try to use resolve() on a Value built manually, schema resolution ignores the variant number coming from Value::Union and tries to find a schema by simply matching the fields. This causes serialization to lose field data in some cases, depending on the order in which the variants are listed in the schema.
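The order dependence can be illustrated with a simplified model. This is a sketch of the behaviour described above, not the crate's code: `RecordSchema`, `Record`, `resolve_record` and `resolve_by_fields` are hypothetical names, and records are reduced to field names with `i32` values.

```rust
// Hypothetical record schema: a name plus the names of its fields.
pub struct RecordSchema {
    pub name: &'static str,
    pub fields: Vec<&'static str>,
}

// A record value as (field name, value) pairs.
pub type Record = Vec<(&'static str, i32)>;

// Keep only the fields the schema declares.
// Returns None if a declared field is missing from the value.
fn resolve_record(schema: &RecordSchema, value: &Record) -> Option<Record> {
    schema
        .fields
        .iter()
        .map(|name| value.iter().find(|field| field.0 == *name).copied())
        .collect()
}

// Field-matching resolution: the union index carried by the value is ignored
// and the first branch whose fields are all present wins.
pub fn resolve_by_fields(
    branches: &[RecordSchema],
    value: &Record,
) -> Option<(usize, Record)> {
    branches
        .iter()
        .enumerate()
        .find_map(|(i, schema)| resolve_record(schema, value).map(|r| (i, r)))
}
```

With branches `A { x }` followed by `B { x, y }`, a value holding both `x` and `y` matches `A` first and `y` is silently dropped. Listing `B` before `A` keeps both fields, which is the order dependence described above.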

Changes

changed SeqVariantSerializer and StructVariantSerializer to serialize Enum variants as Union(...) directly instead of Record([("type", Enum(...)), ("value", Union(...))]);
since they don't need to store the variant anymore, I removed the field and the lifetime annotation that was only needed because this field was a &str;
SeqVariantSerializer had two implementations, ser::SerializeTupleVariant and ser::SerializeStructVariant, where the first one delegated everything to the second one, but the second one was never used by itself, so I removed it;
changed the logic of the resolve_union function so that it uses the index from Value::Union to get the correct schema;
added a new Error variant UnionIndexOutOfBounds for cases where this index is out of bounds;
wrote a test that fails before the fix and succeeds after.
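The index-based lookup can be sketched as follows. This is a minimal illustration, not the code in this PR: `branch_for_index` is a hypothetical helper, and the fields on `UnionIndexOutOfBounds` are an assumption (only the variant name comes from the list above).

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    // Variant name taken from the change list; the fields are assumed.
    UnionIndexOutOfBounds { index: usize, len: usize },
}

// Pick the union branch by the index stored in the value instead of
// searching for a branch that happens to fit the value's fields.
pub fn branch_for_index<S>(branches: &[S], index: u32) -> Result<&S, Error> {
    let i = index as usize;
    branches.get(i).ok_or(Error::UnionIndexOutOfBounds {
        index: i,
        len: branches.len(),
    })
}
```

Returning an error rather than falling back to field matching keeps a bad index visible to the caller instead of silently resolving against a different branch.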

Kriskras99 · 2025-12-13T19:18:48Z

Hi Anton,

Thank you for your PR! I think your intended change is very useful, but will require some time to review. I will get to it, but it might have to wait a few weeks.

antonsmetanin · 2025-12-20T05:11:35Z

Thanks! I might have misunderstood how schema resolution is supposed to work, so changes to types.rs could be wrong, but the rest should still hold up. I'll look into it a bit later, but apparently there are two steps to finding the correct type in the schema based on the Avro datum:

1. The tag byte from the data is used to index into the union in the writer's schema. So for example, if the writer's schema for a field is defined as [ "null", "A", "B", "string", "C" ], and the tag is 2, it must pick type B.
2. The type from the first step is used to find a corresponding type in the reader's schema. So if the reader's schema is [ "null", "string", "D", "C", "B", "A" ], it will find B, and the resulting index after resolution becomes 4. For primitive types the comparison is trivial, but complex types should match by name (or alias) first and then by structure, since the first condition for matching records in the spec states:

To match, one of the following must hold:
both schemas are records with the same (unqualified) name
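The two steps can be sketched in a few lines. This is a simplified model under stated assumptions: union branches are represented by name only, aliases and structural checks are left out, and `resolve_tag` is a hypothetical function, not part of the crate.

```rust
// Map a writer-side union tag to the index of the matching reader branch.
// Branches are modelled by name only (a deliberate simplification).
pub fn resolve_tag(writer: &[&str], reader: &[&str], tag: usize) -> Option<usize> {
    // Step 1: the tag indexes the writer's union.
    let written = writer.get(tag)?;
    // Step 2: find the branch with the same name in the reader's union.
    reader.iter().position(|candidate| candidate == written)
}
```

With the schemas from the example, `resolve_tag(&["null", "A", "B", "string", "C"], &["null", "string", "D", "C", "B", "A"], 2)` picks B in the writer's union and returns `Some(4)`, its position in the reader's union. A tag past the end of the writer's union, or a type missing from the reader's union, yields `None`.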

So according to this, the current implementation is still not correct, because it ignores the tag. What also confuses me is how the resolve function is supposed to be used in practice. I would expect it to accept both the writer's and the reader's schemas, but it only accepts one, and in the schema registry converter it is used with the writer's schema when encoding the value.

antonsmetanin · 2025-12-20T06:12:24Z

Fixed clippy warnings and formatting.

antonsmetanin force-pushed the fix/union-serialization branch from a492796 to 8243090 on December 13, 2025 05:08

antonsmetanin mentioned this pull request Dec 13, 2025

Enums don't serialize correctly #365

Open

fix: Union serialization

a4a430f

antonsmetanin force-pushed the fix/union-serialization branch from 8243090 to a4a430f on December 20, 2025 05:54
