diff --git a/proposals/p5913.md b/proposals/p5913.md new file mode 100644 index 0000000000000..eac58b71ca5f0 --- /dev/null +++ b/proposals/p5913.md @@ -0,0 +1,1026 @@ +# Reworking unformed state + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/5913) + + + +## Table of contents + +- [Abstract](#abstract) +- [Problem](#problem) +- [Background](#background) + - [Safety-specific background](#safety-specific-background) +- [Goals and use cases](#goals-and-use-cases) +- [Proposal](#proposal) + - [Declaring an unformed state for a type](#declaring-an-unformed-state-for-a-type) + - [`unsafe as`, `unsafe adapt`, and raw storage](#unsafe-as-unsafe-adapt-and-raw-storage) + - [A special type to operate on potentially unformed objects](#a-special-type-to-operate-on-potentially-unformed-objects) + - [Flow-sensitive restrictions on objects known to be unformed](#flow-sensitive-restrictions-on-objects-known-to-be-unformed) + - [Putting all this together for our use cases](#putting-all-this-together-for-our-use-cases) +- [C++ interop](#c-interop) + - [C++ types and unformed state](#c-types-and-unformed-state) + - [C++ standard library types](#c-standard-library-types) + - [Passing unformed objects into C++ code](#passing-unformed-objects-into-c-code) +- [Further details](#further-details) + - [Expected standard type behavior](#expected-standard-type-behavior) + - [Class types with a vtable](#class-types-with-a-vtable) + - [Comparison to `MaybeUninit` from Rust](#comparison-to-maybeuninit-from-rust) +- [Rationale](#rationale) +- [Alternatives considered](#alternatives-considered) + - [Bit-mask based unformed state](#bit-mask-based-unformed-state) + - [Conversion oriented API design](#conversion-oriented-api-design) + - [Switching to one of the simpler alternatives discussed in #257](#switching-to-one-of-the-simpler-alternatives-discussed-in-257) + - [Alternative syntax choices for `unsafe as` discussed in 
#5930](#alternative-syntax-choices-for-unsafe-as-discussed-in-5930) + + + +## Abstract + +Introduce a more formal model for how unformed state is managed for types. This +includes: + +- Specifying the interfaces used by types to opt into unformed state, and how + they work. +- Providing a `Core.MaybeUnformed(T)` wrapper that models the API subset of + `T` available both when fully formed and unformed. +- Initial concepts of `unsafe adapt` and `unsafe as` to model unsafe type + conversions between compatible types. +- A flow-sensitive set of rules for how unformed objects become fully formed. + +## Problem + +Carbon is trying to bring a rigorous model to handle complex initialization +scenarios from C++ in a way that maximizes reliability (through bug-detection), +soundness, efficiency, and ergonomics. + +Our initial design is detailed in +[proposal \#257: "Initialization of memory and variables"](/proposals/p0257.md). +While that direction remains promising, the specific mechanics suggested there +are incomplete and/or don't seem to achieve the desired result. For example, +that proposal suggests that objects in an unformed state can only be _assigned_ +or _destroyed_, and any other operation is invalid. But that leads to a problem: +how does one _implement_ either the assignment or destructor in a way that +doesn't perform an invalid operation? Any access to a field would be just such +an invalid operation, but accessing a field seems like a necessity in this +model. Similarly, how does one establish an unformed state for a new object? +Whatever operation is used seems like it would inherently violate the +constraints being enforced: even returning an object in an unformed state is +declared an error in [#257](/proposals/p0257.md). + +Ultimately, we need a reworked way of managing the unformed state of types in +order to make it implementable and coherent. 
+ +## Background + +- [Proposal \#257: "Initialization of memory and variables"](/proposals/p0257.md) +- [Background](/proposals/p0257.md#background) of that proposal +- Rust's + [MaybeUninit](https://doc.rust-lang.org/std/mem/union.MaybeUninit.html) + +### Safety-specific background + +This proposal and this area of Carbon cannot be meaningfully addressed without +discussing safety. This is tricky as we have thus far been attempting to defer +much of the safety discussion until we have made more progress on the layer of +Carbon that will be the foundation of C++ interop. + +This proposal attempts to limit itself to concerns around _initialization_ and +_type_ safety, which are already reasonably achievable in C++ today and so +something that we need to consider even in the interop-focused layer. This is +still somewhat tricky as these issues do overlap, and we don't have a strong +foundation of safety principles that we are building on already in Carbon. The +hope is that this proposal can make reasonable progress as-is, and be refined +and integrated into a more comprehensive safety story when that arrives. 
We +still try to provide some fundamental background on these sub-classes of memory +safety for reference here: + +- ["Secure by Design: Google’s Perspective on Memory Safety"](https://research.google/pubs/secure-by-design-googles-perspective-on-memory-safety/), + specifically the section "Classes of Memory Safety Bugs" + - This is very similar to the earlier + [classification published by Apple](https://security.apple.com/blog/towards-the-next-generation-of-xnu-memory-safety/) + - Both build on widely documented bug classifications, such as + [presented by Microsoft](https://github.com/Microsoft/MSRC-Security-Research/blob/master/presentations/2019_02_BlueHatIL/2019_01%20-%20BlueHatIL%20-%20Trends%2C%20challenge%2C%20and%20shifts%20in%20software%20vulnerability%20mitigation.pdf) + +## Goals and use cases + +The goal of _unformed state_ is to provide for better balance between safety and +ergonomics prior to initialization and (once we add move semantics to Carbon) +after moves. For example, consider this motivating example control flow: + +```carbon +var x: SomeType; + +for (SomeLoopKnownNonEmpty()) { + x = MakeSomeValue(...); + ... +} + +UseSomeValue(x); +``` + +Where we assign to a variable at the start of the loop, but it is beneficial to +leave it uninitialized until that point as we have no meaningful value prior and +we always enter the loop. Rather than needing a separate operation for the first +iteration compared to all subsequent ones, or needing to manufacture a +meaningless value, unformed state lets us assign in all iterations uniformly and +then reliably use the now well formed value after the loop. + +The idea is to leverage either of two flexibilities that frequently are +available in the design of a type, and expose those to the language so that it +can automatically synthesize the necessary behavior for this pattern. 
+ +Specifically: + +- Types which have a detectable invalid state (such as null for a non-null + pointer) +- Types for which there are states that make the destructor a no-op + +For types that _do_ support unformed state, they may do so in three broad +categories based on the specific combination of the above properties they have: + +1. Types where there is an invalid state that can be used as the unformed state, + and it can _optionally_ be queried, but the destructor will be a no-op + without checking for that state. +2. Types where there is an invalid state that can be used as the unformed state, + but destruction must _check_ for that invalid state and skip meaningful logic + to remain correct for objects in that state. +3. Types that have a valid state with a no-op destructor that they reuse when + unformed. These cannot support querying whether they are unformed. + +These different categories also result in tradeoffs in the capabilities and +implementation approach for unformed state. Types may be able to support more +than one of these strategies, and will have to select which one they use based +on which tradeoffs are right for that specific type. + +Types may alternatively _not_ have unformed state, either by necessity or +choice. Types with neither of the above properties are the simple case as they +_cannot support unformed state_. Other types may choose to not support an +unformed state even when they are capable, as that may result in a better API +design despite the ergonomic tradeoffs in the absence of an unformed state. + +We will illustrate the design of unformed state with examples in each of the +three categories that support unformed state. We will also use multiple examples +within a category to surface interesting choices about how to approach unformed +state in that category. + +Our examples for category (1) are two important primitive types: a non-owning +pointer `T*` and `bool`. 
For the non-owning pointer `T*` Carbon expects the null +value to be invalid. And for bool objects, we expect to have an entire byte of +state, and so have a great deal of flexibility as only two states will be valid. + +For category (2), the canonical example is an owning pointer type similar to +C++'s `std::unique_ptr`: + +```carbon +// Original type, prior to adding support for unformed state. +class OwningPtr(T:! type) { + var ptr: T*; + + fn destroy [ref self: Self]() { + // Note: we don't want to do this if `self` is unformed. + SomeDeallocationFunction(self.ptr); + } +} +``` + +Here, because we are building on top of the more primitive `T*`, we will also +illustrate _re-using_ unformed state of a member to implement a containing +type's unformed state. + +And lastly, for category (3), we consider both the primitive type `i32` with a +trivial destructor for all states, and something like an +`Optional(OwningPtr(T))` (ignoring niche optimizations) with an interesting +destructor in some states. + +## Proposal + +The largest constraint for how to model this comes from the checking we would +like to perform. We suggest at the highest level three tiers of enforcement: + +1. Compile-time checking when there is either a use of an unformed variable or + dead code that _would_ use the variable unformed if not dead (analogous to + Clang's `-Wsometimes-uninitialized`). +2. Implicit and cheap, local run-time checking for use of unformed variables + whenever possible, if the compile time checks are not sufficient. +3. The potential to restrict code that can't satisfy either of these by + explicitly marking them as unsafe, and the ability to apply additional + runtime hardening when executing such code. When exactly to enable + enforcement of the `unsafe` marker is left as future work. + +Achieving (1) suggests modeling this using the type system. 
Fully modeling this +in the type system would require flow-sensitive typing, which introduces +significant complexity that we would like to avoid in the broader language just +for this feature. So we propose a system that hides the flow sensitivity here, +retaining just enough of its functionality for our purposes. If Carbon ever +develops richer flow-sensitive typing, it may be reasonable to integrate the +two further. + +There are three main components to the proposal: + +- Interfaces to identify the existence of an unformed state and initialize + objects using it. +- A special type for use when an object might be in an unformed state. +- Flow-sensitive restrictions on how an object in an unformed state can be + used. + +### Declaring an unformed state for a type + +First, we'll introduce interfaces for opting a type into having an unformed +state, defining the subset of the object that participates in its unformed +state, and having some way of initializing objects to this state. Types do this +by implementing an interface from the prelude: + +```carbon +interface UnformedInit { + let StructT:! type; + fn Op() -> StructT; +} + +interface UnformedInvalid { + let StructT:! type; + let Value:! StructT; +} +final impl forall [T:! UnformedInvalid] T as UnformedInit { + where StructT = T.StructT; + + // The return should be a struct form that corresponds to the struct type + // `StructT`, but with all members returned as initializing return forms. How + // exactly to spell such a struct form return is beyond the scope of this + // proposal; here we just use placeholder syntax to transform the type into a + // form. + fn Op() -> {var ...expand StructT} { + return T.Value; + } +} + +interface UnformedNoop { + let StructT:! type; + let Value:! StructT; +} +final impl forall [T:! 
UnformedNoop] T as UnformedInit { + where StructT = T.StructT; + fn Op() -> StructT { + return T.Value; + } +} +``` + +The `StructT` associated type of the `UnformedInit` interface is a struct type +with a subset of the field names of `Self`, and each field's type must be +compatible-with the corresponding field type of `Self`. The implementation +function returns a struct form that corresponds to this struct type, but with +each field an initializing return form. When initializing a type `T` with an +unformed state to its unformed state, the fields of `T` that are in the +`StructT` type are initialized with the values (of compatible, but potentially +different, types) produced by calling this function. + +However, while this is provided as a simple, common way to _use_ the unformed +state, we expect most types to instead implement one of the more convenient +interfaces `UnformedInvalid` and `UnformedNoop`. These two interfaces model the +two fundamental approaches to unformed state that a type may elect to use: +either an _invalid_ state that it can distinguish from its valid states, or a +_noop_ state that can be assigned-to and destroyed, but where the destructor can +be omitted if desired. For these two categories, types should nominate a +specific constant value to use. + +We expect some code to end up falling back to unsafe code during initialization, +especially around C++ interop or during a migration from existing C++ API +designs. In these cases, we expect the compiler to automatically do some amount +of hardening to prevent any bugs in that unsafe code from being as easily +exploited, for example +[Clang's trivial auto variable initialization](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-ftrivial-auto-var-init). 
+ +Beyond this, we expose interfaces that allow types to _customize_ any runtime +hardening that is especially valuable or benefits from specific values when +falling back to unsafe initialization: + +```carbon +interface UnformedHardenInit { + require impls UnformedInit; + let StructT:! type; + fn Op() -> StructT; +} + +interface UnformedHarden { + let StructT:! type; + let Value:! StructT; +} +final impl forall [T:! UnformedHarden] T as UnformedHardenInit { + where StructT = T.StructT; + fn Op() -> StructT { + return T.Value; + } +} +``` + +The `UnformedHardenInit.StructT` type has similar restrictions to +`UnformedInit.StructT` -- its fields must be a subset of the type's fields, each +a compatible-with type. Further, the `UnformedHardenInit.StructT` must also be a +superset of the fields in `UnformedInit.StructT`. We refer to _hardening_ of an +unformed value as setting its fields to the result of `UnformedHardenInit.Op`, +and any additional initialization done by the compiler for security in the face +of unsafe code. Whichever semantic model a type uses for its unformed state, +`UnformedInvalid` or `UnformedNoop`, the hardened state must match that +semantic. This hardening only occurs to unformed or uninitialized objects, never +to well formed ones, and there is no restriction on whether both +`UnformedInit.Op` and `UnformedHardenInit.Op` are called or only one. Note that +these hardening interfaces should not be relied on and shouldn't replace broad +automatic initialization, but can allow types to specifically add extra +hardening where it is most useful, and surface if there is an especially +beneficial value to use. + +We also propose an interface that can be used when `UnformedInvalid` is +implemented to test for the unformed state: + +```carbon +interface IsUnformed { + require impls UnformedInvalid; + + // The type of `self` will be clarified below. + fn Op[self: Core.MaybeUnformed(Self)]() -> bool; +} + +match_first { +impl forall [T:! 
UnformedInvalid & UnformedHarden] T as IsUnformed { + fn Op[self: Core.MaybeUnformed(T)]() -> bool { + return (self as T.(UnformedInvalid.StructT)) + == T.(UnformedInvalid.Value) or + (self as T.(UnformedHarden.StructT)) + == T.(UnformedHarden.Value); + } +} + +impl forall [T:! UnformedInvalid] T as IsUnformed { + fn Op[self: Core.MaybeUnformed(T)]() -> bool { + return (self as T.StructT) == T.Value; + } +} +} +``` + +We suggest providing blanket `impl`s for types that implement `UnformedInvalid` +that test for that invalid state, and for types that implement both +`UnformedInvalid` and `UnformedHarden` that test for either of those states. +This has an implication that the state used by `UnformedHarden` cannot be a +valid state for the type. Types can customize the implementation of `IsUnformed` +in the usual way for interfaces, but those need to uphold the contract of +definitively testing for an object being in an unformed (and potentially +hardened) state. + +**Future work**: We should probably provide default implementations of most of +these interfaces when the members have implementations. Spelling that out and +picking the specific default options isn't handled here and is future work. + +**Future work**: Eventually, we should design a more comprehensive system to +expose invalid states, bit patterns, etc., in order to facilitate stashing more +bits into types for discriminants and other tools. At that point, we can look at +more powerful ways of expressing both the basic invalid state and any +invalid+hardened state. + +**Future work:** We might want to additionally support types opting into +hardening _without_ supporting an unformed state at all. If that proves +motivating, we would either need a separate interface that is only used for +hardening and doesn't confer any unformed state semantics, or we would need to +split the hardening aspect of `UnformedHarden` apart into such an interface. 
+Currently, the suggestion is to have `UnformedHarden` be a superset because we +expect this to be the common case for types where an explicit hardening state is +desired, and separating it from `UnformedInit` would largely require types to +implement two things. In other words, we expect the blanket impl of +`UnformedInit` in terms of `UnformedHarden` to be a relatively common case. + +### `unsafe as`, `unsafe adapt`, and raw storage + +We introduce the idea of _unsafe conversions_ in order to use the above tools +effectively. These extend the conversion model of explicit and implicit +conversions: + +```carbon +interface UnsafeAs(T:! type) { + fn Convert[self: Self]() -> T; +} + +interface As(T:! type) { + extend final impl as UnsafeAs(T); +} + +interface ImplicitAs(T:! type) { + extend final impl as As(T); +} +``` + +Much as implicit conversions use `ImplicitAs`, and explicit conversions with the +`as` keyword use `As`, _unsafe conversions_ are written ` unsafe as T`, and +use the `UnsafeAs` interface implementation to perform the conversion. + +We combine this with the concept of +[adapting a type](/docs/design/generics/details.md#adapting-types) in an unsafe +way by modifying the `adapt` declaration with the `unsafe` keyword. Such an +adapter has all the same properties as normal adapters, but the conversions +between the types are only available with `unsafe as`. These adapters can then +explicitly implement other conversion interfaces, but potentially do so +_conditionally_ or impose other restrictions such as only allowing them in one +direction. The syntax choice here was decided in leads issue #5930, with +[alternatives](#alternative-syntax-choices-for-unsafe-as-discussed-in-5930) +discussed below. + +When an `interface` uses `extend impl`, we propose that an `impl` of that +interface can also write `extend impl` within its `impl`. 
Doing so requires that +an `impl` of the extended interface be visible, and incorporates that into the +newly defined `impl` rather than requiring it to be duplicated. This +`extend impl` will end with a semicolon `;`, instead of a definition block in +curly braces `{`...`}`, but acts as a definition of the members of the extended +interface. This also allows the found _impl_ to be `final` because this +precludes any changes to the aspects of the `impl` that were final. This allows +adapters to extend which layer of these kinds of interface hierarchies they +implement without breaking coherence: + +```carbon +class A {} +class B { + // Provides definitions of both `B as UnsafeAs(A)` and + // `A as UnsafeAs(B)`. + unsafe adapt A; +} + +// Explicit conversion from B -> A +impl B as As(A) { + // Uses the definition of `B as UnsafeAs(A)` provided + // by `unsafe adapt A;` in the definition of `class B`. + extend impl as UnsafeAs(A); +} +// No implicit conversion from B -> A. + +// Implicit conversion from A -> B +impl A as ImplicitAs(B) { + // Uses the definition of `A as UnsafeAs(B)` provided + // by `unsafe adapt A;` in the definition of `class B`. + extend impl as UnsafeAs(B); + + // Note that we don't need to mention `As(B)` here, because we'll use the + // normal blanket impl for that one. +} +``` + +The end result is that safe adapters are syntactic sugar around unsafe adapters, +adding implementations of `As` that extend the implementations of `UnsafeAs`. + +> **Open question:** How does `extend impl` within an `impl` interact with +> `final`? Should we require writing `extend final impl` in an `impl` when the +> interface uses `extend final impl`? Should we require the extending `impl` to +> be a `final impl`? + +> **Future work:** Directly calling `UnsafeAs.Convert` is equally unsafe, and should +> be findable when auditing. We don't yet have a general purpose way of making +> such calls auditable in code, but should add and use it for this interface. 
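
As a rough usage sketch of the `A` and `B` example above, the three conversion tiers would surface at call sites roughly as follows. The function `Demo` and its parameters are hypothetical names used only for illustration, and which lines are accepted or rejected is inferred from the `impl`s shown above rather than from an implemented toolchain.

```carbon
// Hypothetical illustration only; `Demo` is not part of this proposal.
fn Demo(a: A, b: B) {
  // Available in both directions because of `unsafe adapt A;` in `class B`.
  let a1: A = b unsafe as A;
  let b1: B = a unsafe as B;

  // Explicit conversion B -> A, through the `As(A)` impl for `B`.
  let a2: A = b as A;

  // Implicit conversion A -> B, through the `ImplicitAs(B)` impl for `A`.
  let b2: B = a;

  // Would be rejected: no `ImplicitAs(A)` impl exists for `B`.
  // let a3: A = b;
}
```

The point of the layering is that each type opts in to exactly the tier it wants per direction, while `unsafe as` stays available as the auditable fallback.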
+ +> **Future work:** Adding an extending impl of `UnsafeAs` seems like an unsafe +> operation, and might need an `unsafe` keyword for auditing. We should consider +> whether we have `unsafe interface` and `unsafe impl` or some other approach to +> tracking this in a future proposal around safety. + +One important use case we imagine for these semantics is working with the raw, +underlying storage of an object by defining an unsafe adapter for its type. This +proposal doesn't try to define the specifics of this, that is expected to be +part of a subsequent proposal that covers both storage and initialization of +storage. + +> **Future work:** Fully define how raw storage is represented for objects and +> the relevant operations on it. + +### A special type to operate on potentially unformed objects + +Next, we introduce a new, synthesized type that models an object of type T that +might be fully initialized or might be initialized to an unformed state: +`Core.MaybeUnformed(T)`: + +```carbon +class MaybeUnformed(T:! type) { + unsafe adapt T; +} + +impl forall [T:! type] T as ImplicitAs(MaybeUnformed(T)) { + fn Convert[self: Self]() -> MaybeUnformed(T) { + // Not clear how to write this, see open question above. + } +} +``` + +This type provides an API subset of `T`, somewhat analogous to how `const T` +works. Rather than the fields being `const`-qualified, only the subset of the +fields in the `UnformedInit.StructT` type are available. All of the functions +and methods on `T` are available, but functions that safely operate on unformed +objects of type `T` must accept `Core.MaybeUnformed(T)` to signify this to the +caller and enforce that their definition doesn't make assumptions about the +object being fully initialized. 
For example, this lets us describe the `self` +type used in the `IsUnformed` interface for querying: + +```carbon +interface IsUnformed { + require Self impls UnformedInit; + fn Op[self: Core.MaybeUnformed(Self)]() -> bool; +} +``` + +There is also a blanket `impl` for `IsUnformed` that forwards: + +```carbon +final impl forall [T:! IsUnformed] Core.MaybeUnformed(T) as IsUnformed { + fn Op[self: Self]() -> bool { + // We can call `T`'s implementation on `self` due to it taking + // its object parameter as `Core.MaybeUnformed(Self)` which is our `Self`. + return self.(T.(IsUnformed.Op))(); + } +} +``` + +Types that implement `IsUnformed` have the option of implementing assignment and +destruction with either `Core.MaybeUnformed(T)` as the `self` type, or `T`. +Assignment and destruction can even make different choices. Types without an +`IsUnformed` implementation (but that do support unformed state) _must_ implement +assignment and destruction with `Core.MaybeUnformed(T)`. When assignment or +destruction are implemented with `Core.MaybeUnformed(T)`, they are called +regardless of whether the object is in the unformed state. When assignment or +destruction are implemented with `T`, the implementation will only call them, +using the normal type, if `IsUnformed` returns false. If `IsUnformed` +returns true in these cases, assignment will be replaced with initialization, +and destruction will be skipped. + +Unformed objects can be directly created from the unformed initialization +struct, using this built-in impl of `ImplicitAs`: + +```carbon +impl forall [T:! UnformedInit] + T.(UnformedInit.StructT) + as ImplicitAs(Core.MaybeUnformed(T)) { + fn Convert[self: Self]() -> Core.MaybeUnformed(T) { + // Initialize the fields of `T` that are part of `StructT`. + } +} +``` + +We also allow types to document that they have checked the type's invariants +outside of what the type system can prove by using the `unsafe as` operation. For +example: + +1. 
Convert a fully initialized `Core.MaybeUnformed(T)` object back to `T`. +2. Convert an unformed `Core.MaybeUnformed(T)` object to its underlying raw + storage. + +Both of these convert _reference expressions_, and continue to refer to the same +underlying storage or object. Because these are all _compatible_, they can also +be used to convert pointers between these types. + +The first conversion is useful when the `Core.MaybeUnformed(T)` object is in +fact fully initialized, to continue with the normal `T` type. It can also be used +when the unformed state is arranged to be a valid initialization of `T`, and the +operation is merely reifying it as fully initialized. + +The second conversion is useful when the `Core.MaybeUnformed(T)` object is in +fact in an unformed state and a new object will be initialized into its existing +raw storage (because the destructor need not be run). The language mechanisms +needed for initializing raw storage will be part of specifying raw storage in a +future proposal. + +**Future work:** We should consider adding `impl`s to `Core.MaybeUnformed(T)` +when `IsUnformed` is implemented, ideally matching those used for optional +types. This should allow `Core.MaybeUnformed(T)` to participate in all the +language-level affordances we provide for optional types as they are designed. A +key goal should be that this allows using `Core.MaybeUnformed(T)` without any +unsafe operations through tools like `if let`. + +### Flow-sensitive restrictions on objects known to be unformed + +Finally, we tie the previous two components together with flow-sensitive +restrictions on how objects known to be unformed can be used. 
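
As a rough preview of how these restrictions play out (the precise rules follow in the rest of this section), the sketch below marks which uses of a variable declared without an initializer would be accepted. `SomeType`, `Fill`, `Consume`, and `Example` are hypothetical names, and the `ref` parameter spelling is an assumption made for illustration, not syntax defined by this proposal.

```carbon
// Hypothetical helpers, not part of this proposal.
// `Fill` accepts a possibly unformed object and must fully initialize it.
fn Fill(ref target: Core.MaybeUnformed(SomeType));
fn Consume(value: SomeType);

fn Example() {
  // Declared without an initializer: known to be unformed.
  var object: SomeType;

  // Would be rejected: `object` is still known to be unformed.
  // Consume(object);

  // Accepted: the expression is immediately converted to
  // `Core.MaybeUnformed(SomeType)` and passed by reference.
  Fill(object);

  // Accepted: after the call above, `object` is treated as fully formed.
  Consume(object);
}
```

Note that the declared type of `object` stays `SomeType` throughout; only the set of permitted operations changes along the control flow.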
+ +An object is known to be in an unformed state when it is declared without an +explicit initializer, such as: + +```carbon +var object: SomeType; +``` + +**Future work:** We also expect to add operations to Carbon that put objects +into this state without a declaration, as we would like to use unformed state +for non-destructively-moved-from objects as well, but those will come in +subsequent and separate proposals. + +Because `object` above is known to be in an unformed state, we restrict how it +may be used subsequently: it may be used freely when its expression is +immediately converted to the special type `Core.MaybeUnformed(T)`, but any other +use is rejected until after it has been used as a reference parameter of type +`Core.MaybeUnformed(T)` to a call other than the type's own destructor. That +first call is required to initialize the object fully. + +Note that the type of `object` is _always_ `T`. If we are _deducing_ the type, +we will deduce the type `T`. The flow-sensitive restriction is on the valid +operations rather than the actual type. + +There is an escape hatch available here, by doing `unsafe as` to convert from +`Core.MaybeUnformed(T)` to `T`, including for a reference expression or in the +pointee type of a pointer. If known to be in the unformed state, this might +trigger using the `UnformedHardenInit.Op` return to mitigate any bugs in the +subsequent (unsafe) use of an unformed value with the full type's API and +potentially accessing uninitialized memory. That hardening is expected to be +handled by the compiler and language automatically. + +This results in a very restrictive model that requires either initialization, +explicit code handling unformed values with `Core.MaybeUnformed`, or `unsafe`. +However, we suggest this restrictive model be considered _experimental_ and be +revisited based on experience as it may be necessary to make the escape hatch +occur automatically in more cases, especially around C++ interop. 
We also have +empirical evidence that the security implications of unsafe initialization are +much more realistic to address through automated measures such as the hardening +interface and automatic zeroing of storage in specific cases, making it more +reasonable to consider relaxing the safety target here. + +### Putting all this together for our use cases + +Let's start with `bool` and `T*`. Here, we'll use class types that implement +both in terms of private integers for illustration purposes, but the expectation +is that in practice these would likely be directly provided by builtin +operations. + +```carbon +// Imagined pseudo-implementation of `bool` for illustration. +class Bool { + fn True() -> Bool { return {.value = 1}; } + fn False() -> Bool { return {.value = 0}; } + + private var value: i8; +} + +impl Bool as Core.UnformedInvalid + where .StructT = {.value: i8} + and .Value = {.value = -1} {} + +// Imagined pseudo-implementation of `T*` for illustration. +class Ptr(T:! type) { + private var value: i64; +} + +impl forall [T:! type] Ptr(T) as Core.UnformedInvalid + where .StructT = {.value: i64} + and .Value = {.value = 0} {} + +// For illustration, we also make pointers have a hardened constant +// with the "infinite scream" value from LLVM's pattern initialization. +// +// Note that while there is a default `impl` of `UnformedInvalid`, +// the above `impl` will be used instead as it is more specialized. +impl forall [T:! type] Ptr(T) as Core.UnformedHarden + where .StructT = {.value: i64} + and .Value = {.value = 0xAAAA_AAAA_AAAA_AAAA} {} +``` + +Now let's build the `OwningPtr` version. This will build on the `Ptr(T)` above, +but demonstrate the different category of this type: it requires a non-trivial +destructor! + +```carbon +class OwningPtr(T:! type) { + private var ptr: Ptr(T); + + // With reflection, we could potentially provide an automatic + // way of delegating to a member, but spelling it out for now. 
+ impl as Core.UnformedInvalid + where .StructT = {.ptr: Core.MaybeUnformed(Ptr(T))} + and .Value = {.ptr = Ptr(T).(UnformedInvalid.Value)} {} + + impl as Core.UnformedHarden + where .StructT = {.ptr: Core.MaybeUnformed(Ptr(T))} + and .Value = {.ptr = Ptr(T).(UnformedHarden.Value)} {} + + impl as Core.IsUnformed { + fn Op[self: Core.MaybeUnformed(Self)]() -> bool { + // Note that `self.ptr` has type `Core.MaybeUnformed(Ptr(T))`, but + // that still allows us to call `.(Core.IsUnformed.Op)()`. + return self.ptr.(Core.IsUnformed.Op)(); + } + } + + fn Reset[ref self: Core.MaybeUnformed(Self)](var rhs: OwningPtr(T)) { + if (!self.ptr.(Core.IsUnformed.Op)()) { + // This is a fully initialized object. Note that we could also + // directly access the `ptr` field by way of `self`, but are using the + // `unsafe as` here to illustrate its use in a case where it is + // correct. + let ref real_self: Self = self unsafe as Self; + destroy(real_self); + } + + // Imagined syntax to directly re-initialize storage, + // not part of this proposal. `~rhs` is a destructive move. + let ref storage: ??? = self unsafe as ???; + raw_init storage = {.ptr = (~rhs).ptr}; + } + + // Note, the destructor can simply be declared as accepting `Self`. + // This will require the test for unformed to be synthesized prior + // to automatic calls to the destructor, and requires explicit + // destructor calls from MaybeUnformed(OwningPtr(T)) to do that test + // themselves. + fn destroy[ref self: Self]() { + SomeDeallocationFunction(self.ptr); + } +} +``` + +Next, let's consider an integer where we _cannot_ test for unformed, but the +destructor is a no-op. We use a private member rather than adapting to make the +conversions a bit easier to read, but the effect should be the same. + +```carbon +class Int(N:! IntLiteral) { + private var value: MakeInt(N); + + // The unformed state can be empty because our destructor is trivial. 
+ impl as UnformedNoop where .StructT = {} + and .Value = {}; + + // But we might harden the integer to zero. + impl as UnformedHarden where .StructT = {.value: MakeInt(N)} + and .Value = {.value = (0 as Int(N)).value}; + + impl as AssignWith(Int(N)) { + fn Op[ref self: Core.MaybeUnformed(Self)](rhs: Int(N)) { + // Nothing to do -- the destructor is always trivial. But this is + // still the only method that can be called on an unformed object + // to provide correctness checking. + + // Imagined syntax to directly re-initialize storage, not part of + // this proposal. + let ref storage: ??? = self unsafe as ???; + raw_init storage = {.value = rhs.value}; + } + } +} +``` + +Last but not least, let's imagine an `OptionalOwningPtr(T)` where the optional +itself borrows the unformed state of `OwningPtr(T)` as one of its _valid_ +states, and so can't implement testing for being unformed directly. This still +requires a non-trivial destructor though. + +```carbon +class OptionalOwningPtr(T:! type) { + fn Make(var ptr: OwningPtr(T)) -> Self { + return {.ptr = ~ptr}; + } + fn MakeEmpty() -> Self { + // Equivalent to `{.ptr = OwningPtr(T).(UnformedInvalid.Value)}`, but more + // general for types beyond our `OwningPtr(T)`. + return {.ptr = OwningPtr(T).(UnformedInit.Op)()}; + } + + let PtrT:! type = Core.MaybeUnformed(OwningPtr(T)); + + private var ptr: PtrT; + + // Note that this unformed state isn't invalid, just a no-op. + impl as Core.UnformedNoop + where .StructT = {.ptr: PtrT} + and .Value = {.ptr = PtrT.(UnformedInvalid.Value)} {} + + fn Reset[ref self: Core.MaybeUnformed(Self)](var rhs: OwningPtr(T)) { + // The destructor handles the unformed state itself. + self.destroy(); + + // Imagined syntax to directly re-initialize storage, + // not part of this proposal. `~rhs` is a destructive move. + let ref storage: ???
= self unsafe as ???; + raw_init storage = {.ptr = (~rhs).ptr}; + } + + fn destroy[ref self: Core.MaybeUnformed(Self)]() { + if (!self.ptr.(Core.IsUnformed.Op)()) { + (self.ptr unsafe as OwningPtr(T)).destroy(); + } + } +} +``` + +All of these rely in their implementation details on a few more facilities that +are not part of this proposal but expected to come in future proposals: + +- Invoking the destructor on objects +- Destructively moving objects +- Turning an object into raw storage +- Initializing raw storage with a new value + +However, the underlying model for the unformed state hopefully makes sense in +isolation. In an effort to decompose a large number of interconnected topics, +we're starting off by setting up the unformed state details and will have +follow-up proposals to fill in these gaps. + +## C++ interop + +This feature has specific C++ interop implications that we want to spell out. + +### C++ types and unformed state + +First, we want to provide unformed state for as many C++ types as we can without +creating painful surprises for users with the behavior. Provided there isn't +user surprise, types with unformed state have significantly better ergonomics. +However, we can't use default heuristics to synthesize an unformed and _invalid_ +state, and so our defaults and heuristics don't provide any `IsUnformed` +implementation. + +The proposed initial rule for synthesizing an unformed state of a C++ type is if +one of the following applies: + +- If the type is trivially default constructible, then it implements + `UnformedNoop` with an empty struct type and value. Note that we don't + require a trivial destructor here, as even if technically non-trivial, it + cannot correctly read any members. Carbon may elide calls to the destructor + that C++ would make, but this seems reasonable for interop purposes. 
+- If the type is default constructible and trivially _destroyable_, then the + type implements `UnformedInit` with a struct type of all its fields and the + `Op` function returning a default constructed version of the type. + +Beyond these defaults we can customize the behavior of specific types by +implementing their unformed state interfaces in wrapping Carbon code to select +specific values. + +#### C++ standard library types + +We suggest aggressively defining unformed states for widely used vocabulary +types in the C++ standard library to provide the best possible ergonomics during +interop. For many vocabulary types, this will likely be handled by mapping the +type into a Carbon-native type, such as C++ pointers to Carbon pointers and +`std::unique_ptr` to whatever owning pointer Carbon has. For other standard +library types, this should be done in their wrapping Carbon code. + +### Passing unformed objects into C++ code + +C++ APIs that expect to initialize output parameters passed by reference or +pointer won't have the Carbon `Core.MaybeUnformed` type to identify them. We +propose that, in the non-strict safety modes of Carbon, there is an implicit +`unsafe` cast that allows calling these APIs even with an unformed object, +provided the API could legitimately initialize the type. After this cast, the +object should be assumed well formed, as it would be normally. + +We expect the strict Carbon mode to reject these calls without the +_explicit_ `unsafe` operation, to slowly move to all unsafe initialization being +marked explicitly in the source code. + +As with the overall strict model, we suggest these rules be considered +experimental and revisited if they are either too lax in the non-strict mode, or +problematic even in strict mode. + +## Further details + +### Expected standard type behavior + +We expect integers, floating point numbers, pointers, and bool to all support +unformed states.
Generally, we expect Core types in Carbon to provide unformed +state whenever there is a reasonable implementation strategy. + +We also specifically expect pointers and bool to implement `IsUnformed`, +allowing types that build on top of these to use them to model their own +unformed state. + +### Class types with a vtable + +**Future work:** it would be very nice to allow types to reuse the +vtable-pointer field to implement their unformed state. This is left as future +work to address how to allow types to control opting-in and opting-out of this +behavior, as well as how it should work across inheritance. + +### Comparison to `MaybeUninit` from Rust + +It's important to compare and contrast what this proposal suggests to Rust's +`MaybeUninit` as they can seem superficially very similar. + +Fundamentally, the goal of modeling unformed state is somewhat different from +`MaybeUninit` -- unformed state is about providing access to types' internal +invalid or no-op states in order to provide improved ergonomics around +initialization without unsafe user code. It is very much focused on enabling +_type design_ to opt into this, rather than intended for user code dealing with +complex initialization. + +Rust's `MaybeUninit` on the other hand is about explicitly modeling +uninitialized memory, allowing unsafe code to initialize it in ways that can't +be directly modeled in the (safe) type system. It is a tool that code uses with +a type to handle complex initialization patterns, including populating data from +I/O buffers or unsafe system calls, not a tool that the type itself uses to +build up its own API. + +Carbon may well end up needing something like `MaybeUninit` to manage +uninitialized memory which cannot be modeled safely. Currently, the plan is to +directly use raw storage for this, but if doing so surfaces an important need +for an abstraction layer on top like `MaybeUninit`, we should add it. 
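
For comparison, the Rust pattern described above, where calling code rather than the type manages uninitialized storage, might look like the following minimal sketch. `fill_buffer` and `read_initialized` are hypothetical names standing in for an I/O-style API and its caller; only `MaybeUninit` and its `uninit`, `write`, and `assume_init` methods come from the Rust standard library.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for an API that fills a caller-provided buffer,
/// such as a read from an I/O source.
fn fill_buffer(buf: &mut [MaybeUninit<u8>; 4]) {
    for (i, slot) in buf.iter_mut().enumerate() {
        // `write` initializes the slot without reading or dropping old contents.
        slot.write(i as u8 * 10);
    }
}

/// Hypothetical caller: owns the uninitialized storage and vouches for it.
fn read_initialized() -> [u8; 4] {
    // The storage starts uninitialized, and the type records that fact.
    let mut buf: [MaybeUninit<u8>; 4] = [MaybeUninit::uninit(); 4];
    fill_buffer(&mut buf);
    // SAFETY: `fill_buffer` wrote every element of `buf` above.
    buf.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    let bytes = read_initialized();
    println!("{:?}", bytes);
}
```

Note that the `unsafe` obligation sits with the user of `u8`, not with the type's author. This is the opposite of the division of labor intended for `Core.MaybeUnformed(T)`.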
+ +The differences in functionality provided by `Core.MaybeUnformed(T)` and its +expected usage (in assignment and destruction) follow from this different goal +of allowing a type to create an efficient and ergonomic API with initialization +flexibility. + +## Rationale + +- [Performance-critical software](/docs/project/goals.md#performance-critical-software) + - Customizing the exact hardening approach gives added per-type control to + library authors to get the best cost/benefit tradeoff between + performance and security. + - Exposing an unformed state can reduce the branching required to + represent control-dependent initialized objects. + - Strict handling of unformed state can reduce the need for defensive + hardening of objects, providing the developer control over the costs of + their code without loss of safety. +- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) + - The unformed state models common idioms used in C++ where types can + model an otherwise-invalid state that still supports assignment and + destruction to simplify initialization and moving code patterns. + - Making incorrect usage of objects in this state explicit in the language + and type system allows better and earlier error messages during + development. +- [Practical safety and testing mechanisms](/docs/project/goals.md#practical-safety-and-testing-mechanisms) + - Supports existing C++ idioms when mapped into Carbon without + regressing safety, and provides a clear path to increase safety in the + space of initialization. + +## Alternatives considered + +### Bit-mask based unformed state + +Rather than using a subset of the fields of an object, we could instead define a +bitmask of the object that is initialized in the unformed state. + +Advantages: + +- Significantly finer granularity of initialization and querying of the + object.
+- Potential to use invalid bit patterns that are not represented as fields. + +Disadvantages: + +- We don't yet have a design for bit-fields in Carbon, which would likely + intersect with this in many ways. +- More complex model than using fields. + +The suggestion is to not pursue this initially, but to revisit when fully +introducing bit-fields and thinking more holistically about bit-oriented type +layouts. That seems like the point where this would become most desirable, and +it is also the body of design work that any bit-oriented solution would need to +integrate cleanly with. + +### Conversion oriented API design + +Initially, this proposal pursued an API design for working with unformed values +by converting them to different types in order to access the fields available in +the unformed state. + +The proposal shifted to the current direction because the resulting code with +conversions was complicated and difficult to understand. The model of an API +subset was much more easily understood, explained, and used in practice. + +### Switching to one of the simpler alternatives discussed in #257 + +Fleshing out these details does raise the question of whether we should switch +Carbon to one of the alternatives to unformed state more generally. The set of +options here has not materially changed since #257 and so we don't duplicate +that list and analysis here. + +Fundamentally, this proposal suggests that there is still a good motivation to +try to match the idioms across C++ code where types have this "partially +formed" (what we're calling "unformed") state that is used for deferred +initialization and potentially moved-from states. This pattern continues to be +prevalent and well liked in C++. We hope that the model here allows Carbon to +provide a very natural access to this pattern while also providing a path to +ever more careful and strict checking of its correct usage.
+ +### Alternative syntax choices for `unsafe as` discussed in #5930 + +The main alternative syntax considered was `unsafe_as`, which avoids having two +keywords in sequence. The issue also looked at `try as` and `try_as` for comparison. + +Advantages of `unsafe_as`: + +- Lexically simpler as it is a single word. +- Many other languages use underscores for compound keywords. + - But Python at least does provide precedent with `not in` for omitting + the underscore. +- The separate keywords don't work for all cases we could imagine; see below. + +Disadvantages of `unsafe_as`: + +- Unclear how composition with `try_as` would work. +- Makes the use of `unsafe` for auditing unsafe constructs more difficult. + +The issue also examined whether we want this to be _specific_ to `unsafe`, but +that continued to pose compositional challenges. + +The decision was to use two keywords, but specifically here where the keywords +both read well on their own as keywords, and where they compose in a clear +modifying way. We could imagine other compound words which would struggle to +meet both of these criteria such as `raw_init`, and the decision to use spaces +here doesn't implicate those keywords one way or the other. When we come to such +a keyword, we'll need to decide whether to have both independent keywords such +as `unsafe as` _and_ connected multi-word keywords at the same time, or make +some other adjustment.