Best Practices: Serialization and Deserialization Framework for Rust
We want a stable and efficient format for structured data in our Rust code. The choice should balance speed, memory behavior, schema evolution, and a smooth experience for contributors.
- Low allocation during reads where practical
- Predictable performance for both reads and writes
- Clear schema evolution story
- Simple build and code generation steps
- Good Rust support and long term stability
JSON (serde)
Pros
- Very easy to use
- Clean integration with normal Rust structs
- No external tools needed
- Great for readability and debugging
Cons
- High allocation cost for large or frequent messages
- Slower parsing compared with binary formats
- Larger data size on the wire or disk
- No enforced schema evolution rules
Allocation
High. In the usual pattern, decoding into Rust structs or into serde_json::Value produces owned data structures that allocate on the heap. This is simple and predictable, but not ideal for hot paths.
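The gap between owned and borrowed decoding can be sketched with the standard library alone. The example below is not JSON and does not use serde. It parses a made-up `name=value` line into an owned record (heap-allocated `String` fields) and into a borrowed record (slices of the input). `OwnedRecord`, `BorrowedRecord`, `parse_owned`, `parse_borrowed`, and `points_into` are hypothetical names used only for illustration.

```rust
/// Owned form: each field is copied into its own heap allocation.
#[derive(Debug, PartialEq)]
struct OwnedRecord {
    name: String,
    value: String,
}

/// Borrowed form: each field is a slice of the input buffer.
#[derive(Debug, PartialEq)]
struct BorrowedRecord<'a> {
    name: &'a str,
    value: &'a str,
}

/// Parses `name=value`, copying both parts into new `String`s.
fn parse_owned(input: &str) -> Option<OwnedRecord> {
    let (name, value) = input.split_once('=')?;
    Some(OwnedRecord {
        name: name.to_owned(),
        value: value.to_owned(),
    })
}

/// Parses `name=value` without copying; the result borrows from `input`.
fn parse_borrowed(input: &str) -> Option<BorrowedRecord<'_>> {
    let (name, value) = input.split_once('=')?;
    Some(BorrowedRecord { name, value })
}

/// Returns true when `part` lies inside the memory range of `whole`.
fn points_into(whole: &str, part: &str) -> bool {
    let start = whole.as_ptr() as usize;
    let end = start + whole.len();
    let p = part.as_ptr() as usize;
    p >= start && p + part.len() <= end
}

fn main() {
    let input = String::from("sensor=42.5");
    let owned = parse_owned(&input).expect("valid record");
    let borrowed = parse_borrowed(&input).expect("valid record");

    // The owned fields live in separate allocations...
    assert!(!points_into(&input, &owned.name));
    // ...while the borrowed fields point back into the input buffer.
    assert!(points_into(&input, borrowed.name));
    println!("{owned:?} / {borrowed:?}");
}
```

The owned form is what the usual decode-into-structs pattern produces: one allocation per string field, plus more for collections. The borrowed form avoids those allocations but ties the record's lifetime to the input buffer.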
Protobuf (prost)
Pros
- Compact binary encoding
- Mature and actively used Rust support
- Clear and predictable schema evolution rules
- Generates normal Rust types
- Good community and tooling support
Cons
- Requires a build script and generated code
- Read path allocates to build Rust structs
- Not a zero copy format
Allocation
Similar to JSON when decoding into owned Rust values. Protobuf also constructs an owned object graph, so it allocates for fields and collections. The main benefits over JSON come from a smaller encoded size and typically faster parsing, not from avoiding allocations.
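To make the size difference concrete, here is a minimal hand-written sketch of how the Protobuf wire format encodes one unsigned integer field. It uses only the standard library, not prost, and the function names `encode_varint` and `encode_uint_field` are illustrative. In real code the generated types handle this.

```rust
/// Appends `value` as a base-128 varint: 7 payload bits per byte,
/// with the high bit set on every byte except the last.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Encodes one varint field. The key is (field_number << 3) | wire_type,
/// and wire type 0 means "varint".
fn encode_uint_field(field_number: u32, value: u64, out: &mut Vec<u8>) {
    let key = u64::from(field_number) << 3; // wire type 0
    encode_varint(key, out);
    encode_varint(value, out);
}

fn main() {
    let mut buf = Vec::new();
    encode_uint_field(1, 150, &mut buf);
    // Field 1 with value 150 encodes as three bytes.
    assert_eq!(buf, vec![0x08u8, 0x96, 0x01]);

    // A comparable JSON document spells out the key and the digits as text.
    let json = r#"{"id":150}"#;
    println!("protobuf: {} bytes, json: {} bytes", buf.len(), json.len());
}
```

The field name never appears on the wire, only its number, which is also why field numbers must stay stable as a schema evolves.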
FlatBuffers
Pros
- Can read data in place from a single buffer without extra copies in many situations
- Very low allocation in read heavy lookup paths when using buffer views
- Good fit for some performance sensitive read patterns
- Compact binary representation
Cons
- Builder pattern for writes is more verbose
- Rust support is documented as experimental and APIs may change between minor versions
- Accessor based API can feel mechanical to use
- Schema evolution requires care and discipline
Allocation
Reads can work over the original buffer, so additional allocation on the read path can be very low when the API is used in that style. Writes use a builder that owns a growing Vec<u8>, which may reallocate as it grows if it is not sized correctly up front.
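The write-side point about a growing `Vec<u8>` can be illustrated without the FlatBuffers crate. The sketch below counts how often a plain `Vec<u8>` changes capacity while bytes are appended, with and without sizing it up front. `count_regrowths` is a hypothetical helper, and the real builder manages its buffer internally with its own growth policy, so treat this as an illustration of the underlying effect only.

```rust
/// Appends `total` bytes one at a time and counts how often the
/// vector's capacity changes, which is a proxy for (re)allocations.
fn count_regrowths(mut buf: Vec<u8>, total: usize) -> usize {
    let mut regrowths = 0;
    let mut last_capacity = buf.capacity();
    for i in 0..total {
        buf.push(i as u8);
        if buf.capacity() != last_capacity {
            regrowths += 1;
            last_capacity = buf.capacity();
        }
    }
    regrowths
}

fn main() {
    let total = 64 * 1024;
    let unsized_growths = count_regrowths(Vec::new(), total);
    let presized_growths = count_regrowths(Vec::with_capacity(total), total);

    // Without a size hint the buffer has to grow at least once.
    assert!(unsized_growths > 0);
    // With enough capacity reserved up front it never grows.
    assert_eq!(presized_growths, 0);
    println!("unsized: {unsized_growths} regrowths, presized: {presized_growths}");
}
```

If typical message sizes are known, passing a capacity hint when the builder is created is the analogous step. Check the constructor options of the crate version in use.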
Cap’n Proto
Pros
- Designed for direct traversal of message buffers without a separate decode step
- Good fit for memory mapped or shared buffers
- Clear schema evolution model
- Rust crate is stable and maintained
- Supports zero copy style access in many scenarios
Cons
- Learning curve around segments, arenas, and pointer layout
- Smaller ecosystem compared with Protobuf
- Requires schema based code generation and a build script
- Write side uses arena style allocation, so memory use depends on message layout and configuration
Allocation
Cap’n Proto encodes messages in a layout that allows in-place traversal of the buffer, which can avoid extra allocations on the read path. On the write path, it relies on an arena of one or more segments that grow as needed, so allocation behavior depends on message size and growth pattern. We do not yet have internal measurements that compare write-side behavior directly against FlatBuffers for our workloads.
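The segment idea can be sketched as a toy arena using only the standard library. This is not the Cap’n Proto allocator or its wire format. `Segment`, `SegmentArena`, and the fixed 8-byte segment size are assumptions chosen for illustration. It shows the property described above: when the current segment is full, a new one is allocated, and bytes already written are not copied or moved.

```rust
/// One fixed-limit chunk of the arena.
struct Segment {
    limit: usize,
    bytes: Vec<u8>,
}

/// A toy arena made of segments. A write that does not fit in the last
/// segment opens a new one instead of resizing an existing one.
struct SegmentArena {
    segment_size: usize,
    segments: Vec<Segment>,
}

impl SegmentArena {
    fn new(segment_size: usize) -> Self {
        SegmentArena { segment_size, segments: Vec::new() }
    }

    /// Copies `data` into the arena and returns (segment index, offset).
    /// Oversized writes get a dedicated segment sized to fit them.
    fn write(&mut self, data: &[u8]) -> (usize, usize) {
        let fits = self
            .segments
            .last()
            .map_or(false, |seg| seg.limit - seg.bytes.len() >= data.len());
        if !fits {
            let limit = self.segment_size.max(data.len());
            self.segments.push(Segment { limit, bytes: Vec::with_capacity(limit) });
        }
        let index = self.segments.len() - 1;
        let seg = &mut self.segments[index];
        let offset = seg.bytes.len();
        seg.bytes.extend_from_slice(data);
        (index, offset)
    }

    /// Returns a view into a segment without copying.
    fn read(&self, index: usize, offset: usize, len: usize) -> &[u8] {
        &self.segments[index].bytes[offset..offset + len]
    }

    fn segment_count(&self) -> usize {
        self.segments.len()
    }
}

fn main() {
    let mut arena = SegmentArena::new(8);
    let a = arena.write(b"abcde"); // opens segment 0
    let b = arena.write(b"fgh"); // still fits: 5 + 3 = 8
    let c = arena.write(b"ij"); // segment 0 is full, opens segment 1
    assert_eq!((a, b, c), ((0, 0), (0, 5), (1, 0)));
    assert_eq!(arena.read(0, 5, 3), &b"fgh"[..]);
    assert_eq!(arena.segment_count(), 2);
}
```

Compared with a single growing buffer, this trades reallocation and copying for possible unused space at the end of each segment, which is why segment sizing is worth profiling with real message shapes.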
| Feature | JSON (serde) | Protobuf (prost) | FlatBuffers | Cap’n Proto |
|---|---|---|---|---|
| Encoding Format | Text-based JSON | Compact binary tag-length-value format | Binary format with vtables and offset-based tables | Binary pointer-based format with segments |
| Read Performance | Slowest for large or frequent messages | Usually faster than JSON, often good enough for many workloads | Very fast when reading directly from buffers | Very fast when traversing message buffers directly |
| Write Performance | Simple, moderate speed | Good, encoding cost usually lower than JSON | Good, but builder API can be verbose | Good, arena based writes, cost depends on message layout |
| Allocation on Read | High in the usual pattern, builds owned Rust values | High, also builds owned Rust values | Very low when using buffer views instead of materializing structs | Very low when traversing buffers directly, may increase if data is copied |
| Allocation on Write | Allocations for the output string or byte buffer | Allocations for message buffers and repeated fields | Single growing buffer that may reallocate if not sized up front | Arena or segment growth, allocations depend on segment sizing |
| Schema Evolution | By convention only, no enforced rules | Strong and well documented evolution rules | Workable but requires discipline and careful schema changes | Strong evolution model with explicit rules |
| Rust Developer Experience | Easiest to adopt, integrates naturally with structs and enums | Straightforward once code generation is wired into the build | Less natural, accessor style API and experimental status in Rust | Requires learning segments and builders, API is less familiar for the team |
| Tooling and Ecosystem | Very broad support, many helper crates | Strong ecosystem across languages and good Rust integration | Good general story, Rust support smaller and less mature | Solid core tools, smaller ecosystem and fewer examples than Protobuf |
| Indexing or Debuggability | Human readable, easy to inspect and log | Needs decoded view or specific tools to inspect binary messages | Requires schema aware tools or helper code to inspect buffers | Requires schema aware tools, binary layout less intuitive to read directly |
| Code Generation | None required for typical usage | Requires .proto schemas and a build script to generate Rust code | Requires schema files and generated code, plus builder or accessor helpers | Requires schema files, generated code, and build integration |
| Weak Spots | Slow and large for hot paths, no built in schema guarantees | No zero copy read, still allocates for full object graphs | Experimental Rust API, more complex schema and builder mental model | Less internal experience, arena behavior and layout need careful profiling |
For most use cases that need a good balance of performance, clarity, and long term stability, Protobuf remains the default choice. It provides predictable behavior, strong schema evolution rules, and a mature Rust toolchain, and we already understand how to integrate it into our builds.
JSON remains useful for configuration, small human edited files, and readable examples. It should not be used for performance sensitive data paths.
For workloads that require very low allocation or direct use of memory-mapped buffers, Cap’n Proto and FlatBuffers are both worth exploring. At this point we do not have enough internal experience or measurements to clearly favor one over the other. Teams that hit such bottlenecks should run focused experiments with their real data and access patterns, then choose the format that best fits their constraints and build setup.
This guidance therefore proposes:
- Use Protobuf as the primary format for structured data in this library
- Continue to use JSON for configuration and debugging oriented structures
- Treat Cap’n Proto and FlatBuffers as advanced options to evaluate when a specific workload clearly needs lower allocation or a more specialized memory layout