diff --git a/content/editions/features.md b/content/editions/features.md index 2764f3e1..c1fd59a3 100644 --- a/content/editions/features.md +++ b/content/editions/features.md @@ -100,8 +100,8 @@ and `export` keywords to set per-field behavior. Read more about this at to local. * `LOCAL_ALL`: All symbols default to local. * `STRICT`: All symbols local by default. Nested types cannot be exported, - except for a special-case caveat for message `{ enum {} reserved 1 to max; - }`. This is the recommended setting for new protos. + except for a special-case caveat for `message { enum {} reserved 1 to max; + }`. This will become the default in a future edition. **Applicable to the following scope:** Enum, Message @@ -172,7 +172,7 @@ protos are round-trippable by default with a feature value to opt-out to use **Applicable to the following scope:** File -**Added in:** 2024 +**Added in:** Edition 2024 **Default behavior per syntax/edition:** @@ -231,7 +231,7 @@ and after of a proto3 file. **Applicable to the following scopes:** File, Enum -**Added in:** 2023 +**Added in:** Edition 2023 **Default behavior per syntax/edition:** @@ -292,7 +292,7 @@ whether a protobuf field has a value. **Applicable to the following scopes:** File, Field -**Added in:** 2023 +**Added in:** Edition 2023 **Default behavior per syntax/edition:** @@ -385,7 +385,7 @@ and after of a proto3 file. Editions behavior matches the behavior in proto3. **Applicable to the following scopes:** File, Message, Enum -**Added in:** 2023 +**Added in:** Edition 2023 **Default behavior per syntax/edition:** @@ -448,7 +448,7 @@ the following conditions are met: **Applicable to the following scopes:** File, Field -**Added in:** 2023 +**Added in:** Edition 2023 **Default behavior per syntax/edition:** @@ -504,7 +504,7 @@ for `repeated` fields has been migrated to in Editions. 
**Applicable to the following scopes:** File, Field -**Added in:** 2023 +**Added in:** Edition 2023 **Default behavior per syntax/edition:** @@ -583,7 +583,7 @@ and after of a proto3 file. **Applicable to the following scopes:** File, Field -**Added in:** 2023 +**Added in:** Edition 2023 **Default behavior per syntax/edition:** @@ -650,7 +650,7 @@ in the migration guide for more on this topic. **Applicable to the following scopes:** Enum, File -**Added in:** 2024 +**Added in:** Edition 2024 **Default behavior per syntax/edition:** @@ -680,7 +680,7 @@ example, switch statements are not supported. **Applicable to the following scopes:** Enum -**Added in:** 2024 +**Added in:** Edition 2024 **Default behavior per syntax/edition:** @@ -713,7 +713,7 @@ before and after of a proto3 file. **Applicable to the following scopes:** File, Field -**Added in:** 2023 +**Added in:** Edition 2023 **Default behavior per syntax/edition:** @@ -762,7 +762,7 @@ message Msg { **Languages:** Java This feature controls whether the Java generator will nest the generated class -in the Java generated file class. Setting this option to `Yes` is the equivalent +in the Java generated file class. Setting this option to `NO` is the equivalent of setting `java_multiple_files = true` in proto2/proto3/edition 2023. The default outer classname is also updated to always be the camel-cased .proto @@ -779,7 +779,7 @@ becomes `BarBazProto`). You can still override this using the **Applicable to the following scopes:** Message, Enum, Service -**Added in:** 2024 +**Added in:** Edition 2024 **Default behavior per syntax/edition:** @@ -812,7 +812,7 @@ removed. **Applicable to the following scopes:** File, Field -**Added in:** 2023 +**Added in:** Edition 2023 **Default behavior per syntax/edition:** @@ -893,7 +893,7 @@ before and after of a proto3 file. 
**Applicable to the following scopes:** Field, File -**Added in:** 2023 +**Added in:** Edition 2023 **Default behavior per syntax/edition:** @@ -974,7 +974,7 @@ generator strips the repetitive prefix or not. **Applicable to the following scopes:** Enum, File -**Added in:** 2024 +**Added in:** Edition 2024 **Default behavior per syntax/edition:** @@ -991,7 +991,7 @@ files: ```proto edition = "2024"; -import "third_party/golang/protobuf/v2/src/google/protobuf/go_features.proto"; +import "google/protobuf/go_features.proto"; option features.(pb.go).strip_enum_prefix = STRIP_ENUM_PREFIX_STRIP; @@ -1074,16 +1074,23 @@ The following shows the settings to replicate Edition 2023 behavior with Edition 2024. ```proto +// foo/bar_baz.proto edition = "2024"; import option "third_party/protobuf/cpp_features.proto"; import option "third_party/java/protobuf/java_features.proto"; +// If previously relying on edition 2023 default java_outer_classname. +option java_outer_classname = "BarBaz"; // or BarBazOuterClass + option features.(pb.cpp).string_type = STRING; option features.enforce_naming_style = STYLE_LEGACY; option features.default_symbol_visibility = EXPORT_ALL; option features.(pb.cpp).enum_name_uses_string_view = false; -option features.(pb.java).nest_in_file_class = LEGACY; + +message MyMessage { + option features.(pb.java).nest_in_file_class = YES; +} ``` ### Caveats and Exceptions {#caveats} diff --git a/content/editions/overview.md b/content/editions/overview.md index ea047cf4..e474129c 100644 --- a/content/editions/overview.md +++ b/content/editions/overview.md @@ -28,10 +28,6 @@ The examples in this topic show edition 2024 features, but edition 2024 is currently in **pre-release review** and is not yet recommended for production code. -The examples in this topic show edition 2024 features, but edition 2024 is -currently in **pre-release review** and is not yet recommended for production -code. 
- ## Lifecycle of a Feature {#lifecycles} Editions provide the fundamental increments for the lifecycle of a feature. @@ -369,7 +365,7 @@ Edition 2024 added support for option imports using the syntax `import option`. Option imports must come after any other `import` statements. -Unlike normal `import` statements, option imports import only custom options +Unlike normal `import` statements, `import option` only imports custom options defined in a `.proto` file, without importing other symbols. This means that messages and enums are excluded from the option import. In the diff --git a/content/getting-started/cpptutorial.md b/content/getting-started/cpptutorial.md index 2b95d676..403d4bf5 100644 --- a/content/getting-started/cpptutorial.md +++ b/content/getting-started/cpptutorial.md @@ -16,9 +16,7 @@ shows you how to This isn't a comprehensive guide to using protocol buffers in C++. For more detailed reference information, see the -[Protocol Buffer Language Guide (proto2)](/programming-guides/proto2), -the -[Protocol Buffer Language Guide (proto3)](/programming-guides/proto3), +[Protocol Buffer Language Guide](/programming-guides/editions), the [C++ API Reference](/reference/cpp/api-docs), the [C++ Generated Code Guide](/reference/cpp/cpp-generated), and the @@ -78,14 +76,14 @@ each field in the message. Here is the `.proto` file that defines your messages, `addressbook.proto`. 
```proto -syntax = "proto2"; +edition = "2023"; package tutorial; message Person { - optional string name = 1; - optional int32 id = 2; - optional string email = 3; + string name = 1; + int32 id = 2; + string email = 3; enum PhoneType { PHONE_TYPE_UNSPECIFIED = 0; @@ -95,8 +93,8 @@ message Person { } message PhoneNumber { - optional string number = 1; - optional PhoneType type = 2 [default = PHONE_TYPE_HOME]; + string number = 1; + PhoneType type = 2; } repeated PhoneNumber phones = 4; @@ -105,71 +103,58 @@ message Person { message AddressBook { repeated Person people = 1; } + ``` As you can see, the syntax is similar to C++ or Java. Let's go through each part of the file and see what it does. -The `.proto` file starts with a package declaration, which helps to prevent -naming conflicts between different projects. In C++, your generated classes will -be placed in a namespace matching the package name. - -Next, you have your message definitions. A message is just an aggregate -containing a set of typed fields. Many standard simple data types are available -as field types, including `bool`, `int32`, `float`, `double`, and `string`. You -can also add further structure to your messages by using other message types as -field types -- in the above example the `Person` message contains `PhoneNumber` -messages, while the `AddressBook` message contains `Person` messages. You can -even define message types nested inside other messages -- as you can see, the -`PhoneNumber` type is defined inside `Person`. You can also define `enum` types -if you want one of your fields to have one of a predefined list of values -- -here you want to specify that a phone number can be one of the following phone -types: `PHONE_TYPE_MOBILE`, `PHONE_TYPE_HOME`, or `PHONE_TYPE_WORK`. +The `.proto` file starts with an `edition` declaration. Editions replace the +older `syntax = "proto2"` and `syntax = "proto3"` declarations and provide a +more flexible way to evolve the language over time. 
+ +Next is a package declaration, which helps to prevent naming conflicts between +different projects. In C++, your generated classes will be placed in a namespace +matching the package name. + +Following the package declaration are your message definitions. A message is +just an aggregate containing a set of typed fields. Many standard simple data +types are available as field types, including `bool`, `int32`, `float`, +`double`, and `string`. You can also add further structure to your messages by +using other message types as field types -- in the above example the `Person` +message contains `PhoneNumber` messages, while the `AddressBook` message +contains `Person` messages. You can even define message types nested inside +other messages -- as you can see, the `PhoneNumber` type is defined inside +`Person`. You can also define enum types if you want one of your fields to have +one of a predefined list of values -- here you want to specify that a phone +number can be one of several types. The " = 1", " = 2" markers on each element identify the unique field number that field uses in the binary encoding. Field numbers 1-15 require one less byte to encode than higher numbers, so as an optimization you can decide to use those numbers for the commonly used or repeated elements, leaving field numbers 16 and -higher for less-commonly used optional elements. Each element in a repeated -field requires re-encoding the field number, so repeated fields are particularly -good candidates for this optimization. - -Each field must be annotated with one of the following modifiers: - -- `optional`: the field may or may not be set. If an optional field value - isn't set, a default value is used. For simple types, you can specify your - own default value, as we've done for the phone number `type` in the example. - Otherwise, a system default is used: zero for numeric types, the empty - string for strings, false for bools. 
For embedded messages, the default - value is always the "default instance" or "prototype" of the message, which - has none of its fields set. Calling the accessor to get the value of an - optional (or required) field which has not been explicitly set always - returns that field's default value. -- `repeated`: the field may be repeated any number of times (including zero). - The order of the repeated values will be preserved in the protocol buffer. - Think of repeated fields as dynamically sized arrays. -- `required`: a value for the field must be provided, otherwise the message - will be considered "uninitialized". If `libprotobuf` is compiled in debug - mode, serializing an uninitialized message will cause an assertion failure. - In optimized builds, the check is skipped and the message will be written - anyway. However, parsing an uninitialized message will always fail (by - returning `false` from the parse method). Other than this, a required field - behaves exactly like an optional field. - -{{% alert title="Important" color="warning" %}} **Required Is Forever** -You should be very careful about marking fields as `required`. If at some point -you wish to stop writing or sending a required field, it will be problematic to -change the field to an optional field -- old readers will consider messages -without this field to be incomplete and may reject or drop them unintentionally. -You should consider writing application-specific custom validation routines for -your buffers instead. Within Google, `required` fields are strongly disfavored; -most messages defined in proto2 syntax use `optional` and `repeated` only. -(Proto3 does not support `required` fields at all.) -{{% /alert %}} +higher for less-commonly used elements. + +Fields can be one of the following: + +* singular: By default, fields are optional, meaning the field may or may not + be set. 
If a singular field is not set, a type-specific default is used: + zero for numeric types, the empty string for strings, false for bools, and + the first defined enum value for enums (which must be 0). Note that you + cannot explicitly set a field to `singular`. This is a description of a + non-repeated field. + +* **`repeated`**: The field may be repeated any number of times (including + zero). The order of the repeated values will be preserved. Think of repeated + fields as dynamically sized arrays. + +In older versions of protobuf, a `required` keyword existed, but it has been +found to be brittle and is not supported in modern protobufs (though editions +does have a feature you can use to enable it, for backward compatibility). You'll find a complete guide to writing `.proto` files -- including all the possible field types -- in the -[Protocol Buffer Language Guide](/programming-guides/proto2). +[Protocol Buffer Language Guide](/programming-guides/editions). Don't go looking for facilities similar to class inheritance, though -- protocol buffers don't do that. @@ -180,15 +165,14 @@ classes you'll need to read and write `AddressBook` (and hence `Person` and `PhoneNumber`) messages. To do this, you need to run the protocol buffer compiler `protoc` on your `.proto`: -1. If you haven't installed the compiler, - [download the package](/downloads) and follow the - instructions in the README. +1. If you haven't installed the compiler, follow the instructions in + [Protocol Buffer Compiler Installation](/installation/). 2. Now run the compiler, specifying the source directory (where your application's source code lives -- the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as `$SRC_DIR`), and the path to your - `.proto`. In this case, you...: + `.proto`. 
In this case: ```shell protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/addressbook.proto @@ -213,42 +197,40 @@ and `phones` fields, you have these methods: ```cpp // name - inline bool has_name() const; - inline void clear_name(); - inline const ::std::string& name() const; - inline void set_name(const ::std::string& value); - inline void set_name(const char* value); - inline ::std::string* mutable_name(); + bool has_name() const; // Only for explicit presence + void clear_name(); + const ::std::string& name() const; + void set_name(const ::std::string& value); + ::std::string* mutable_name(); // id - inline bool has_id() const; - inline void clear_id(); - inline int32_t id() const; - inline void set_id(int32_t value); + bool has_id() const; + void clear_id(); + int32_t id() const; + void set_id(int32_t value); // email - inline bool has_email() const; - inline void clear_email(); - inline const ::std::string& email() const; - inline void set_email(const ::std::string& value); - inline void set_email(const char* value); - inline ::std::string* mutable_email(); + bool has_email() const; + void clear_email(); + const ::std::string& email() const; + void set_email(const ::std::string& value); + ::std::string* mutable_email(); // phones - inline int phones_size() const; - inline void clear_phones(); - inline const ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >& phones() const; - inline ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >* mutable_phones(); - inline const ::tutorial::Person_PhoneNumber& phones(int index) const; - inline ::tutorial::Person_PhoneNumber* mutable_phones(int index); - inline ::tutorial::Person_PhoneNumber* add_phones(); + int phones_size() const; + void clear_phones(); + const ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >& phones() const; + ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >* mutable_phones(); + const ::tutorial::Person_PhoneNumber& 
phones(int index) const; + ::tutorial::Person_PhoneNumber* mutable_phones(int index); + ::tutorial::Person_PhoneNumber* add_phones(); ``` As you can see, the getters have exactly the same name as the field in lowercase, and -the setter methods begin with `set_`. There are also `has_` methods for each -singular (required or optional) field which return true if that field has been -set. Finally, each field has a `clear_` method that un-sets the field back to -its empty state. +the setter methods begin with `set_`. There are also `has_` methods for singular +fields that have explicit presence tracking, which return true if that field has +been set. Finally, each field has a `clear_` method that un-sets the field back +to its default state. While the numeric `id` field just has the basic accessor set described above, the `name` and `email` fields have a couple of extra methods because they're @@ -293,8 +275,7 @@ forward-declare `Person_PhoneNumber`. Each message class also contains a number of other methods that let you check or manipulate the entire message, including: -- `bool IsInitialized() const;`: checks if all the required fields have been - set. +- `bool IsInitialized() const;`: checks if all required fields have been set. - `string DebugString() const;`: returns a human-readable representation of the message, particularly useful for debugging. - `void CopyFrom(const Person& from);`: overwrites the message with the given @@ -324,7 +305,7 @@ include: C++ `istream`. These are just a couple of the options provided for parsing and serialization. -Again, see the +See the [`Message` API reference](/reference/cpp/api-docs/google.protobuf.message#Message) for a complete list. @@ -334,13 +315,15 @@ don't provide additional functionality; they don't make good first class citizens in an object model. If you want to add richer behavior to a generated class, the best way to do this is to wrap the generated protocol buffer class in an application-specific class. 
Wrapping protocol buffers is also a good idea if -you don't have control over the design of the `.proto` file (if, say, you're +you don't have control over the design of the .proto file (if, say, you're reusing one from another project). In that case, you can use the wrapper class to craft an interface better suited to the unique environment of your application: hiding some data and methods, exposing convenience functions, etc. -**You should never add behavior to the generated classes by inheriting from -them**. This will break internal mechanisms and is not good object-oriented -practice anyway. {{% /alert %}} +**You cannot add behavior to the generated classes by inheriting from them**, as +they are final. Making them final prevents internal mechanisms from breaking, and +inheriting from them is not good object-oriented practice anyway. + + {{% /alert %}} ## Writing a Message {#writing-a-message} @@ -349,10 +332,10 @@ address book application to be able to do is write personal details to your address book file. To do this, you need to create and populate instances of your protocol buffer classes and then write them to an output stream. -Here is a program which reads an `AddressBook` from a file, adds one new -`Person` to it based on user input, and writes the new `AddressBook` back out to -the file again. The parts which directly call or reference code generated by the -protocol compiler are highlighted. +Here is a program that reads an `AddressBook` from a file, adds one new `Person` +to it based on user input, and writes the new `AddressBook` back out to the file +again. The parts which directly call or reference code generated by the protocol +compiler are highlighted. ```cpp #include @@ -362,21 +345,21 @@ protocol compiler are highlighted. using namespace std; // This function fills in a Person message based on user input. 
-void PromptForAddress(tutorial::Person* person) { +void PromptForAddress(tutorial::Person& person) { cout << "Enter person ID number: "; int id; cin >> id; - person->set_id(id); + person.set_id(id); cin.ignore(256, '\n'); cout << "Enter name: "; - getline(cin, *person->mutable_name()); + getline(cin, *person.mutable_name()); cout << "Enter email address (blank for none): "; string email; getline(cin, email); if (!email.empty()) { - person->set_email(email); + person.set_email(email); } while (true) { @@ -387,7 +370,7 @@ void PromptForAddress(tutorial::Person* person) { break; } - tutorial::Person::PhoneNumber* phone_number = person->add_phones(); + tutorial::Person::PhoneNumber* phone_number = person.add_phones(); phone_number->set_number(number); cout << "Is this a mobile, home, or work phone? "; @@ -400,7 +383,7 @@ void PromptForAddress(tutorial::Person* person) { } else if (type == "work") { phone_number->set_type(tutorial::Person::PHONE_TYPE_WORK); } else { - cout << "Unknown phone type. Using default." << endl; + cout << "Unknown phone type. Using default." << endl; } } } @@ -432,7 +415,7 @@ int main(int argc, char* argv[]) { } // Add an address. - PromptForAddress(address_book.add_people()); + PromptForAddress(*address_book.add_people()); { // Write the new address book back to disk. @@ -481,18 +464,14 @@ using namespace std; // Iterates through all people in the AddressBook and prints info about them. 
void ListPeople(const tutorial::AddressBook& address_book) { - for (int i = 0; i < address_book.people_size(); i++) { - const tutorial::Person& person = address_book.people(i); - + for (const tutorial::Person& person : address_book.people()) { cout << "Person ID: " << person.id() << endl; cout << " Name: " << person.name() << endl; - if (person.has_email()) { + if (!person.email().empty()) { cout << " E-mail address: " << person.email() << endl; } - for (int j = 0; j < person.phones_size(); j++) { - const tutorial::Person::PhoneNumber& phone_number = person.phones(j); - + for (const tutorial::Person::PhoneNumber& phone_number : person.phones()) { switch (phone_number.type()) { case tutorial::Person::PHONE_TYPE_MOBILE: cout << " Mobile phone #: "; @@ -503,6 +482,10 @@ void ListPeople(const tutorial::AddressBook& address_book) { case tutorial::Person::PHONE_TYPE_WORK: cout << " Work phone #: "; break; + case tutorial::Person::PHONE_TYPE_UNSPECIFIED: + default: + cout << " Phone #: "; + break; } cout << phone_number.number() << endl; } @@ -549,30 +532,23 @@ your new buffers to be backwards-compatible, and your old buffers to be forward-compatible -- and you almost certainly do want this -- then there are some rules you need to follow. In the new version of the protocol buffer: -- you *must not* change the field numbers of any existing fields. -- you *must not* add or delete any required fields. -- you *may* delete optional or repeated fields. -- you *may* add new optional or repeated fields but you must use fresh field +* you *must not* change the field numbers of any existing fields. +* you *may* delete singular or repeated fields. +* you *may* add new singular or repeated fields but you must use fresh field numbers (that is, field numbers that were never used in this protocol buffer, not even by deleted fields). (There are -[some exceptions](/programming-guides/proto2#updating) to -these rules, but they are rarely used.) 
+[some exceptions](/programming-guides/editions#updating) +to these rules, but they are rarely used.) If you follow these rules, old code will happily read new messages and simply -ignore any new fields. To the old code, optional fields that were deleted will -simply have their default value, and deleted repeated fields will be empty. New -code will also transparently read old messages. However, keep in mind that new -optional fields will not be present in old messages, so you will need to either -check explicitly whether they're set with `has_`, or provide a reasonable -default value in your `.proto` file with `[default = value]` after the field -number. If the default value is not specified for an optional element, a -type-specific default value is used instead: for strings, the default value is -the empty string. For booleans, the default value is false. For numeric types, -the default value is zero. Note also that if you added a new repeated field, -your new code will not be able to tell whether it was left empty (by new code) -or never set at all (by old code) since there is no `has_` flag for it. +ignore any new fields. To the old code, fields that were deleted will simply +have their default value, and deleted repeated fields will be empty. New code +will also transparently read old messages. However, keep in mind that new fields +will not be present in old messages, so you will need to check for their +presence by checking if they have the default value (e.g., an empty string) +before use. ## Optimization Tips {#optimization} @@ -580,15 +556,54 @@ The C++ Protocol Buffers library is extremely heavily optimized. However, proper usage can improve performance even more. Here are some tips for squeezing every last drop of speed out of the library: -- Reuse message objects when possible. Messages try to keep around any memory - they allocate for reuse, even when they are cleared. 
Thus, if you are - handling many messages with the same type and similar structure in - succession, it is a good idea to reuse the same message object each time to - take load off the memory allocator. However, objects can become bloated over - time, especially if your messages vary in "shape" or if you occasionally - construct a message that is much larger than usual. You should monitor the - sizes of your message objects by calling the `SpaceUsed` method and delete - them once they get too big. +- **Use Arenas for memory allocation.** When you create many protocol buffer + messages in a short-lived operation (like parsing a single request), the + system's memory allocator can become a bottleneck. Arenas are designed to + mitigate this. By using an arena, you can perform many allocations with low + overhead, and a single deallocation for all of them at once. This can + significantly improve performance in message-heavy applications. + + To use arenas, you allocate messages on a `google::protobuf::Arena` object: + + ```cpp + google::protobuf::Arena arena; + tutorial::Person* person = google::protobuf::Arena::Create<tutorial::Person>(&arena); + // ... populate person ... + ``` + + When the arena object is destroyed, all messages allocated on it are freed. + For more details, see the [Arenas guide](/arenas). + +- **Reuse non-arena message objects when possible.** Messages try to keep + around any memory they allocate for reuse, even when they are cleared. Thus, + if you are handling many messages with the same type and similar structure + in succession, it is a good idea to reuse the same message object each time + to take load off the memory allocator. However, objects can become bloated + over time, especially if your messages vary in "shape" or if you + occasionally construct a message that is much larger than usual. You should + monitor the sizes of your message objects by calling the `SpaceUsed` method + and delete them once they get too big. 
+ + Reusing arena messages can lead to unbounded memory growth. Reusing heap + messages is safer. Even with heap messages, though, you can still experience + issues with the high water mark of fields. For example, if you see messages: + + ```none + a: [1, 2, 3, 4] + b: [1] + ``` + + and + + ```none + a: [1] + b: [1, 2, 3, 4] + ``` + + and reuse the same message object for both, then each field keeps enough + memory for the largest value it has seen. So although each input had only 5 + elements, the reused message holds memory for 8. + - Your system's memory allocator may not be well-optimized for allocating lots of small objects from multiple threads. Try using [Google's TCMalloc](https://github.com/google/tcmalloc) instead. diff --git a/content/news/2025-06-27.md b/content/news/2025-06-27.md index 17323915..333f0cb1 100644 --- a/content/news/2025-06-27.md +++ b/content/news/2025-06-27.md @@ -8,7 +8,7 @@ type = "docs" ## Edition 2024 -We are planning to release Protobuf Editions in 32.x in Q3 2025. +We are planning to release Protobuf Edition 2024 in 32.x in Q3 2025. These describe changes as we anticipate them being implemented, but due to the flexible nature of software some of these changes may not land or may vary from @@ -115,7 +115,7 @@ has some notable differences (for example, switch statements are not supported). Edition 2024 adds support for option imports using the syntax `import option`. -Unlike normal `import` statements, option imports import only custom options +Unlike normal `import` statements, `import option` only imports custom options defined in a `.proto` file, without importing other symbols. This means that messages and enums are excluded from the option import. In the diff --git a/content/news/v32.md b/content/news/v32.md index 9bf9c481..d78dbe2a 100644 --- a/content/news/v32.md +++ b/content/news/v32.md @@ -103,7 +103,7 @@ has some notable differences (for example, switch statements are not supported). 
Edition 2024 adds support for option imports using the syntax `import option`. -Unlike normal `import` statements, option imports import only custom options +Unlike normal `import` statements, `import option` only imports custom options defined in a `.proto` file, without importing other symbols. This means that messages and enums are excluded from the option import. In the diff --git a/content/programming-guides/editions.md b/content/programming-guides/editions.md index 0c020524..647ed062 100644 --- a/content/programming-guides/editions.md +++ b/content/programming-guides/editions.md @@ -1,15 +1,15 @@ +++ title = "Language Guide (editions)" weight = 40 -description = "Covers how to use the edition 2023 revision of the Protocol Buffers language in your project." +description = "Covers how to use the editions revisions of the Protocol Buffers language in your project." type = "docs" +++ This guide describes how to use the protocol buffer language to structure your protocol buffer data, including `.proto` file syntax and how to generate data -access classes from your `.proto` files. It covers **edition 2023** of the -protocol buffers language. For information about how editions differ from proto2 -and proto3 conceptually, see +access classes from your `.proto` files. It covers **edition 2023** to **edition +2024** of the protocol buffers language. For information about how editions +differ from proto2 and proto3 conceptually, see [Protobuf Editions Overview](/editions/overview). For information on the **proto2** syntax, see the @@ -871,6 +871,15 @@ file: import "myproject/other_protos.proto"; ``` +As of Edition 2024, you can also use `import option` to use +[custom option definitions](#customoptions) from other `.proto` files. Unlike +regular imports, this only allows use of custom options definitions but not +other message or enum definitions to avoid dependencies in the generated code. 
+ +```proto +import option "myproject/other_protos.proto"; +``` + By default, you can use definitions only from directly imported `.proto` files. However, sometimes you may need to move a `.proto` file to a new location. Instead of moving the `.proto` file directly and updating all the call sites in @@ -908,12 +917,24 @@ flag. If no flag was given, it looks in the directory in which the compiler was invoked. In general you should set the `--proto_path` flag to the root of your project and use fully qualified names for all imports. +### Symbol Visibility {#symbol-visibility} + +Which symbols are available to other protos that import a file is controlled by +the +[`features.default_symbol_visibility`](/editions/features#symbol-vis) +feature and the +[`export` and `local` keywords](/editions/overview#export-local), +which were added in Edition 2024. + +Only symbols that are exported, either via the default symbol visibility or with +an `export` keyword, can be referenced by the importing file. + ### Using proto2 and proto3 Message Types {#proto2} It's possible to import [proto2](/programming-guides/proto2) and [proto3](/programming-guides/proto3) message types and -use them in your editions 2023 messages, and vice versa. +use them in your editions messages, and vice versa. ## Nested Types {#nested} @@ -1809,7 +1830,8 @@ Here are a few of the most commonly used options: messages, services, and enumerations, and the wrapper Java class generated for this `.proto` file won't contain any nested classes/enums/etc. This is a Boolean option which defaults to `false`. If not generating Java code, this - option has no effect. + option has no effect. This option was removed in edition 2024 and replaced with + [`features.(pb.java).nest_in_file_class`](/editions/features/#java-nest_in_file). ```proto option java_multiple_files = true; @@ -1941,6 +1963,9 @@ you need to create your own options, see the for details. 
Note that creating custom options uses [extensions](/programming-guides/proto2#extensions). +Starting in edition 2024, import custom option definitions using `import +option`. See [Importing](#importing). + ### Option Retention {#option-retention} Options have a notion of *retention*, which controls whether an option is diff --git a/content/programming-guides/enum.md b/content/programming-guides/enum.md index 7ddba71b..2e48628c 100644 --- a/content/programming-guides/enum.md +++ b/content/programming-guides/enum.md @@ -114,18 +114,6 @@ Under editions, this behavior is represented by the deprecated field feature [`features.(pb.cpp).legacy_closed_enum`](/editions/features#legacy_closed_enum). There are two options for moving to conformant behavior: -* Remove the field feature. This is the recommended approach, but may cause - runtime behavior changes. Without the feature, unrecognized integers will - end up stored in the field cast to the enum type instead of being put into - the unknown field set. -* Change the enum to closed. This is discouraged, and can cause runtime - behavior if *anybody else* is using the enum. Unrecognized integers will end - up in the unknown field set instead of those fields. - -Under editions, this behavior is represented by the deprecated field feature -[`features.(pb.cpp).legacy_closed_enum`](/editions/features#legacy_closed_enum). -There are two options for moving to conformant behavior: - * Remove the field feature. This is the recommended approach, but may cause runtime behavior changes. Without the feature, unrecognized integers will end up stored in the field cast to the enum type instead of being put into @@ -133,7 +121,7 @@ There are two options for moving to conformant behavior: * Change the enum to closed. This is discouraged, and can cause runtime behavior changes if *anybody else* is using the enum. Unrecognized integers will end up in the unknown field set instead of those fields. 
- + ### C# {#csharp} All known C# releases are out of conformance. C# treats all enums as **open**. diff --git a/content/programming-guides/serialization-not-canonical.md b/content/programming-guides/serialization-not-canonical.md index 9ef6264c..98cc5116 100644 --- a/content/programming-guides/serialization-not-canonical.md +++ b/content/programming-guides/serialization-not-canonical.md @@ -39,17 +39,19 @@ allow for optimization opportunities. ## Inherent Barriers to Stable Serialization Protobuf objects preserve unknown fields to provide forward and backward -compatibility. Unknown fields cannot be canonically serialized: - -1. Unknown fields can't distinguish between bytes and sub-messages, as both - have the same wire type. This makes it impossible to canonicalize messages - stored in the unknown field set. If we were going to canonicalize, we would - need to recurse into unknown submessages to sort their fields by field - number, but we don't have enough information to do this. -1. Unknown fields are always serialized after known fields, for efficiency. But - canonical serialization would require interleaving unknown fields with known - fields by field number. This would cause efficiency and code size overheads - for everybody, even people who do not use the feature. +compatibility. The handling of unknown fields is a primary obstacle to canonical +serialization. + +In the wire format, bytes fields and nested sub-messages use the same wire type. +This ambiguity makes it impossible to correctly canonicalize messages stored in +the unknown field set. Since the exact same contents may be either one, it is +impossible to know whether to treat it as a message and recurse down or not. + +For efficiency, implementations typically serialize unknown fields after known +fields. Canonical serialization, however, would require interleaving unknown +fields with known fields according to field number. 
This would impose +significant efficiency and code size costs on all users, even those not +requiring this feature. ## Things Intentionally Left Undefined @@ -66,4 +68,3 @@ allow for more optimization opportunities: To leave room for optimizations like this, we want to intentionally scramble field order in some configurations, so that applications do not inappropriately depend on field order. - diff --git a/content/reference/protobuf/edition-2024-spec.md b/content/reference/protobuf/edition-2024-spec.md new file mode 100644 index 00000000..d802110c --- /dev/null +++ b/content/reference/protobuf/edition-2024-spec.md @@ -0,0 +1,434 @@ ++++ +title = "Protocol Buffers Edition 2024 Language Specification" +weight = 801 +linkTitle = "2024 Language Specification" +description = "Language specification reference for edition 2024 of the Protocol Buffers language." +type = "docs" ++++ + +The syntax is specified using +[Extended Backus-Naur Form (EBNF)](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_Form): + +``` +| alternation +() grouping +[] option (zero or one time) +{} repetition (any number of times) +``` + +## Lexical Elements {#lexical_elements} + +### Letters and Digits {#letters_and_digits} + +``` +letter = "A" ... "Z" | "a" ... "z" +capitalLetter = "A" ... "Z" +decimalDigit = "0" ... "9" +octalDigit = "0" ... "7" +hexDigit = "0" ... "9" | "A" ... "F" | "a" ... "f" +``` + +### Identifiers + +``` +ident = letter { letter | decimalDigit | "_" } +fullIdent = ident { "." ident } +messageName = ident +enumName = ident +fieldName = ident +oneofName = ident +mapName = ident +serviceName = ident +rpcName = ident +streamName = ident +messageType = [ "." ] { ident "." } messageName +enumType = [ "." ] { ident "." } enumName +groupName = capitalLetter { letter | decimalDigit | "_" } +``` + +### Integer Literals {#integer_literals} + +``` +intLit = decimalLit | octalLit | hexLit +decimalLit = [-] ( "1" ... 
"9" ) { decimalDigit } +octalLit = [-] "0" { octalDigit } +hexLit = [-] "0" ( "x" | "X" ) hexDigit { hexDigit } +``` + +### Floating-point Literals + +``` +floatLit = [-] ( decimals "." [ decimals ] [ exponent ] | decimals exponent | "."decimals [ exponent ] ) | "inf" | "nan" +decimals = [-] decimalDigit { decimalDigit } +exponent = ( "e" | "E" ) [ "+" | "-" ] decimals +``` + +### Boolean + +``` +boolLit = "true" | "false" +``` + +### String Literals {#string_literals} + +``` +strLit = strLitSingle { strLitSingle } +strLitSingle = ( "'" { charValue } "'" ) | ( '"' { charValue } '"' ) +charValue = hexEscape | octEscape | charEscape | unicodeEscape | unicodeLongEscape | /[^\0\n\\]/ +hexEscape = '\' ( "x" | "X" ) hexDigit [ hexDigit ] +octEscape = '\' octalDigit [ octalDigit [ octalDigit ] ] +charEscape = '\' ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | '\' | "'" | '"' ) +unicodeEscape = '\' "u" hexDigit hexDigit hexDigit hexDigit +unicodeLongEscape = '\' "U" ( "000" hexDigit hexDigit hexDigit hexDigit hexDigit | + "0010" hexDigit hexDigit hexDigit hexDigit +``` + +### EmptyStatement + +``` +emptyStatement = ";" +``` + +### Constant + +``` +constant = fullIdent | ( [ "-" | "+" ] intLit ) | ( [ "-" | "+" ] floatLit ) | + strLit | boolLit | MessageValue +``` + +`MessageValue` is defined in the +[Text Format Language Specification](/reference/protobuf/textformat-spec#fields). + +## Edition + +The edition statement replaces the legacy `syntax` keyword, and is used to +define the edition that this file is using. + +``` +edition = "edition" "=" [ ( "'" decimalLit "'" ) | ( '"' decimalLit '"' ) ] ";" +``` + +## Import Statement {#import_statement} + +The import statement is used to import another .proto's definitions. 
+ +``` +import = "import" [ "public" | "option" ] strLit ";" +``` + +Example: + +```proto +import public "other.proto"; +import option "custom_option.proto"; +``` + +## Package + +The package specifier can be used to prevent name clashes between protocol +message types. + +``` +package = "package" fullIdent ";" +``` + +Example: + +```proto +package foo.bar; +``` + +## Option + +Options can be used in proto files, messages, enums and services. An option can +be a protobuf defined option or a custom option. For more information, see +[Options](/programming-guides/proto2#options) in the +language guide. Options are also used to control +[Feature Settings](/editions/features). + +``` +option = "option" optionName "=" constant ";" +optionName = ( ident | "(" ["."] fullIdent ")" ) +``` + +For example: + +```proto +option java_package = "com.example.foo"; +option features.enum_type = CLOSED; +``` + +## Fields + +Fields are the basic elements of a protocol buffer message. Fields can be normal +fields, group fields, oneof fields, or map fields. A field has a label, type and +field number. + +``` +label = [ "repeated" ] +type = "double" | "float" | "int32" | "int64" | "uint32" | "uint64" + | "sint32" | "sint64" | "fixed32" | "fixed64" | "sfixed32" | "sfixed64" + | "bool" | "string" | "bytes" | messageType | enumType +fieldNumber = intLit; +``` + +### Normal field {#normal_field} + +Each field has a label, type, name, and field number. It may have field options. + +``` +field = [label] type fieldName "=" fieldNumber [ "[" fieldOptions "]" ] ";" +fieldOptions = fieldOption { "," fieldOption } +fieldOption = optionName "=" constant +``` + +Examples: + +```proto +foo.bar nested_message = 2; +repeated int32 samples = 4 [packed=true]; +``` + +### Oneof and oneof field {#oneof_and_oneof_field} + +A oneof consists of oneof fields and a oneof name. Oneof fields do not have
+ +``` +oneof = "oneof" oneofName "{" { option | oneofField } "}" +oneofField = type fieldName "=" fieldNumber [ "[" fieldOptions "]" ] ";" +``` + +Example: + +```proto +oneof foo { + string name = 4; + SubMessage sub_message = 9; +} +``` + +### Map field {#map_field} + +A map field has a key type, value type, name, and field number. The key type can +be any integral or string type. Note, the key type may not be an enum. + +``` +mapField = "map" "<" keyType "," type ">" mapName "=" fieldNumber [ "[" fieldOptions "]" ] ";" +keyType = "int32" | "int64" | "uint32" | "uint64" | "sint32" | "sint64" | + "fixed32" | "fixed64" | "sfixed32" | "sfixed64" | "bool" | "string" +``` + +Example: + +```proto +map projects = 3; +``` + +## Extensions and Reserved {#extensions_and_reserved} + +Extensions and reserved are message elements that declare a range of field +numbers or field names. + +### Extensions + +Extensions declare that a range of field numbers in a message are available for +third-party extensions. Other people can declare new fields for your message +type with those numeric tags in their own .proto files without having to edit +the original file. + +``` +extensions = "extensions" ranges ";" +ranges = range { "," range } +range = intLit [ "to" ( intLit | "max" ) ] +``` + +Examples: + +```proto +extensions 100 to 199; +extensions 4, 20 to max; +``` + +### Reserved + +Reserved declares a range of field numbers or names in a message or enum that +can't be used. + +``` +reserved = "reserved" ( ranges | reservedIdent ) ";" +fieldNames = fieldName { "," fieldName } +``` + +Examples: + +```proto +reserved 2, 15, 9 to 11; +reserved foo, bar; +``` + +## Top Level definitions {#top_level_definitions} + +### Symbol Visibility {#symbol_visibility} + +Some message and enum definitions can be annotated to override their default +symbol visibility. 
+ +This is controlled by [`features.default_symbol_visibility`](/editions/features/#symbol-vis). Symbol visibility is further documented in [export / local Keywords](/editions/overview/#export-local). + +``` +symbolVisibility = "export" | "local" +``` + +### Enum definition {#enum_definition} + +The enum definition consists of a name and an enum body. The enum body can have options, enum fields, and reserved statements. + +``` +enum = [ symbolVisibility ] "enum" enumName enumBody +enumBody = "{" { option | enumField | emptyStatement | reserved } "}" +enumField = fieldName "=" [ "-" ] intLit [ "[" enumValueOption { "," enumValueOption } "]" ]";" +enumValueOption = optionName "=" constant +``` + +Example: + +```proto +enum EnumAllowingAlias { + option allow_alias = true; + EAA_UNSPECIFIED = 0; + EAA_STARTED = 1; + EAA_RUNNING = 2 [(custom_option) = "hello world"]; +} +``` + +### Message definition {#message_definition} + +A message consists of a message name and a message body. The message body can +have fields, nested enum definitions, nested message definitions, extend +statements, extensions, groups, options, oneofs, map fields, and reserved +statements. A message cannot contain two fields with the same name in the same +message schema. + +``` +message = [ symbolVisibility ] "message" messageName messageBody +messageBody = "{" { field | enum | message | extend | extensions | group | +option | oneof | mapField | reserved | emptyStatement } "}" +``` + +Example: + +```proto +message Outer { + option (my_option).a = true; + message Inner { // Level 2 + required int64 ival = 1; + } + map<int32, string> my_map = 2; + extensions 20 to 30; +} +``` + +None of the entities declared inside a message may have conflicting names.
All +of the following are prohibited: + +``` +message MyMessage { + string foo = 1; + message foo {} +} + +message MyMessage { + string foo = 1; + oneof foo { + string bar = 2; + } +} + +message MyMessage { + string foo = 1; + extend Extendable { + string foo = 2; + } +} + +message MyMessage { + string foo = 1; + enum E { + foo = 0; + } +} +``` + +### Extend + +If a message in the same or imported .proto file has reserved a range for +extensions, the message can be extended. + +``` +extend = "extend" messageType "{" {field | group} "}" +``` + +Example: + +```proto +extend Foo { + int32 bar = 126; +} +``` + +### Service definition {#service_definition} + +``` +service = "service" serviceName "{" { option | rpc | emptyStatement } "}" +rpc = "rpc" rpcName "(" [ "stream" ] messageType ")" "returns" "(" [ "stream" ] +messageType ")" (( "{" { option | emptyStatement } "}" ) | ";" ) +``` + +Example: + +```proto +service SearchService { + rpc Search (SearchRequest) returns (SearchResponse); +} +``` + +## Proto file {#proto_file} + +``` +proto = [syntax] { import | package | option | topLevelDef | emptyStatement } +topLevelDef = message | enum | extend | service +``` + +An example .proto file: + +```proto +edition = "2024"; +import public "other.proto"; +import option "custom_option.proto"; +option java_package = "com.example.foo"; +enum EnumAllowingAlias { + option allow_alias = true; + EAA_UNSPECIFIED = 0; + EAA_STARTED = 1; + EAA_RUNNING = 1; + EAA_FINISHED = 2 [(custom_option) = "hello world"]; +} +message Outer { + option (my_option).a = true; + message Inner { // Level 2 + int64 ival = 1 [features.field_presence = LEGACY_REQUIRED]; + } + repeated Inner inner_message = 2; + EnumAllowingAlias enum_field = 3; + map<int32, string> my_map = 4; + extensions 20 to 30; + reserved reserved_field; +} +message Foo { + message GroupMessage { + bool a = 1; + } + GroupMessage groupmessage = [features.message_encoding = DELIMITED]; +} +``` diff --git a/content/reference/rust/rust-generated.md
b/content/reference/rust/rust-generated.md index 0489c9df..ff65ad62 100644 --- a/content/reference/rust/rust-generated.md +++ b/content/reference/rust/rust-generated.md @@ -70,19 +70,28 @@ message Foo {} ``` The compiler generates a struct named `Foo`. The `Foo` struct defines the -following methods: +following associated functions and methods: + +### Associated Functions * `fn new() -> Self`: Creates a new instance of `Foo`. * `fn parse(data: &[u8]) -> Result<Self, ParseError>`: Parses `data` - into an instance of `Foo` if `data` holds a valid wire format representation - of `Foo`. Otherwise, the function returns an error. -* `fn clear_and_parse(&mut self, data: &[u8]) -> Result<(), ParseError>`: Like - calling `.clear()` and `parse()` in sequence. + and returns an instance of `Foo` if `data` holds a valid wire format + representation of `Foo`. Otherwise, the function returns an error. + +### Methods + +* `fn clear_and_parse(&mut self, data: &[u8]) -> Result<(), ParseError>`: + Clearing and parsing into existing instance (`protobuf::ClearAndParse` + trait). * `fn serialize(&self) -> Result<Vec<u8>, SerializeError>`: Serializes the message to Protobuf wire format. Serialization can fail but rarely will. Failure reasons include exceeding the maximum message size, insufficient - memory, and required fields (proto2) that are unset. -* `fn merge_from(&mut self, other)`: Merges `self` with `other`. + memory, and required fields (proto2) that are unset (`protobuf::Serialize` + trait). +* `fn clear(&mut self)`: Clears message (`protobuf::Clear` trait). +* `fn merge_from(&mut self, other)`: Merges `self` with `other` + (`protobuf::MergeFrom` trait). * `fn as_view(&self) -> FooView<'_>`: Returns an immutable handle (view) to `Foo`. This is further covered in the section on proxy types.
* `fn as_mut(&mut self) -> FooMut<'_>`: Returns a mutable handle (mut) to @@ -90,6 +99,13 @@ following methods: `Foo` implements the following traits: +* `protobuf::ClearAndParse` +* `protobuf::Clear` +* `protobuf::CopyFrom` +* `protobuf::MergeFrom` +* `protobuf::Parse` +* `protobuf::Serialize` +* `protobuf::TakeFrom` * `std::fmt::Debug` * `std::default::Default` * `std::clone::Clone`