| # `Unpin` for C++ Types |
| |
| SUMMARY: A C++ type is `Unpin` if it is trivially relocatable (e.g., a trivial |
| type, or a nontrivial type which is `[[clang::trivial_abi]]`), and is `final`. |
| Any such type can be used by value or plain reference/pointer in interop, all |
| non-`Unpin` types must instead be used behind pinned pointers and references. |
| |
| A C++ type `T` is `Unpin` (always safe to manipulate through `&mut T`) if it is |
| known to be a **trivially relocatable type** (move+destroy is logically |
| equivalent to `memcpy`+release) with **insignificant padding** (it does not |
| matter if the padding is included in that `memcpy`). |
| |
| `Unpin` C++ types can be used like any other normal Rust type: they are always |
| safe to access by reference or by value. Non-`Unpin` types, in contrast, can |
| only be accessed behind pins such as `Pin<&mut T>`, or `Pin<Box<T>>`, because it |
| may not be safe to directly mutate. These types are never used directly by value |
| in Rust, because value-like assignment has incorrect semantics: it fails to run |
| C++ special members for non-trivially-relocatable types, it can overwrite |
| padding for types with significant padding. |
| |
| ## Trivially Relocatable Types |
| |
| In C++, moving a value between locations in memory involves executing code to |
| either initialize (move-construct) or overwrite (move-assign) the new location. |
| The old location still exists, but is in a moved-from state, and must still be |
| destroyed to release resources. |
| |
| (For example, `std::string x = std::move(y);` will run the move constructor, so |
| that `x` contains the same value that `y` used to have before the move. The |
| variable `y` will still be a valid string, but might be empty, or might contain |
| some garbage value. The destructors for both `x` and `y` will run when they go |
| out of scope.) |
| |
| Rust does not have move constructors or move assignment. In fact, there is no |
| way to customize what happens during moving or assignment: in Rust, moving or |
| swapping an object means changing its location in memory, as if by `memcpy` |
| without running the destructor logic in the old location. Another way of looking |
| at it is that it's as if an object moved around in memory over time: it is |
| constructed in one place, and then further operations and eventual destruction |
| might happen in other places. We call such a Rust-like move a "trivial |
| relocation" operation. |
| |
| Despite C++ moves using explicit construction and destruction calls, many C++ |
| types could also have used the Rust movement model. We call such types |
| **trivially relocatable** types. |
| |
| For example, a C++ `std::unique_ptr`, implemented in the obvious way, is |
| trivially relocatable: its actual location in memory does not matter. In |
| contrast, a self-referential type is not trivially relocatable, because to |
| relocate it, you must also update the pointer it has to itself. This is done |
| inside the move constructor in C++, but cannot be done in the Rust model, where |
| the move operation is not customizable. |
| |
| For more background, see |
| [P1144](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1144r5.html). |
| |
| ### Which types are trivially relocatable? |
| |
| For the purpose of Rust/C++ interop, we define a type to be trivially |
| relocatable if, and only if, it is "trivial for calls" in Clang. That is, |
| either: |
| |
| 1. It is actually |
| [trivial](https://en.cppreference.com/w/cpp/named_req/TrivialType), **or** |
| 2. It uses |
| [`[[clang::trivial_abi]]`](https://clang.llvm.org/docs/AttributeReference.html#trivial-abi) |
| to make itself trivial for calls |
| |
| This definition is conservative: some types that could be considered trivially |
| relocatable are not trivial for calls. (For example, `std::unique_ptr` uses |
| `[[clang::trivial_abi]]` only in the unstable libc++ ABI; the stable libc++ ABI |
| predates this attribute, and adding it now is ABI-breaking.) |
| |
| This definition is, however, sound: all types which are trivial for calls are |
| trivially relocatable, because a type which is trivial for calls is |
| trivially-relocated when passed by value as a function argument. |
| |
| ### Expanding trivial relocatability |
| |
| We are working to extend libc++ and Clang to trivially relocate these types in |
| even more circumstances, which would make `[[clang::trivial_abi]]` more |
| compelling and more widely used, enhancing both performance and |
| Rust-compatibility for our C++ core libraries. |
| |
| * [[clang] Mark `trivial_abi` types as "trivially relocatable".](https://reviews.llvm.org/D114732) |
| * [Use trivial relocation operations in std::vector, by porting D67524 and |
| part of D61761 to work on top of the changes in |
| D114732.](https://reviews.llvm.org/D119385) |
| |
| A future change to C++ or Clang in the vein of |
| [P1144](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1144r5.html) |
| could make types trivially relocatable without requiring ABI changes as |
| `[[clang::trivial_abi]]` does, although in the short term this doesn't seem very |
| likely. |
| |
| ## Insignificant Padding |
| |
| If a type has padding, then even if the type is trivially relocatable and |
| therefore safe to write as if by `memcpy`, **Rust will `memcpy` an incorrect |
| number of bytes**: Rust will include the padding, though C++ would not. |
| Trivially relocatable types where the padding potentially has semantic meaning |
| can still be handled by value, but are `!Unpin`, and all mutable references Rust |
| receives from C++ must be `Pin<&mut T>`. Only trivially relocatable types where |
| the padding has no significance can be `Unpin` and safe to deal with via `&mut`. |
| |
| Significant padding occurs via inheritance -- derived types may reuse the |
| padding for other objects -- and from the `[[no_unique_address]]` attribute |
| (which declares the padding to be reusable). |
| |
| For the purposes of C++/Rust interop, `[[no_unique_address]]` is an unsafe |
| feature, and any type which cannot be inherited from (via e.g. `final`) is |
| considered to have insignificant padding. |
| |
| ### When is padding significant? |
| |
| In C++, if you take a mutable reference to a base class subobject, and pass it |
| around, this is ultimately pretty safe. If you assign to it, it is a bit bad -- |
| it will assign to only the base class subobject (if it's nonvirtual), not just |
| the subclass -- but it's possible for this to make sense, and if it were truly |
| dangerous they'd probably have deleted assignment or not inherited from the base |
| class. |
| |
| In Rust, this is *extremely dangerous*, because the size of the base class |
| subobject can extend to include fields from the derived class. For example, take |
| this class hierarchy: |
| |
| ```c++ |
| class Base { |
| int64_t x_; |
| int32_t y_; |
| /* ...methods... */ |
| }; |
| |
| class Derived : public Base { |
| int32_t size_; |
| char* data_; |
| /* ...methods... */ |
| }; |
| ``` |
| |
| Here we have a class `Derived` with some string data, which inherits from |
| `Base`. But something unfortunate happens: because `Base` has an extra 32 bits |
| of tail padding, and is not POD for the purpose of layout, the `size_` member of |
| the derived class is stored inside the tail padding for `Base`. This is allowed |
| by the C++ standard, and actually taken advantage of in the Itanium ABI. |
| |
| In C++, this presents no problems, as C++ assignment doesn't do something like |
| `memcpy sizeof(x) bytes`, even when the class is trivially assignable. It only |
| copies the real data size, excluding padding. And so this code will not |
| accidentally overwrite the `size_` field: |
| |
| ```c++ |
| Derived& d = ...; |
| Base& b1 = d; |
| Base& b2 = ...; |
| std::swap(b1, b2); |
| ``` |
| |
| But the seemingly equivalent Rust code absolutely will: |
| |
| ```rs |
| let d : &mut Derived = ...; |
| let b1 : &mut Base = d.into(); |
| let b2 : &mut Base = ...; |
| // This overwrites size_ from the derived class with uninitialized memory from |
| // b2. |
| std::mem::swap(b1, b2); // Catastrophically bad. |
| ``` |
| |
| As a consequence, types like `Base` should not be exposed as `&mut` references: |
| they might refer to a base class subobject, in which case assignment in Rust |
| will do the wrong thing. Even if they are trivially relocatable and assignment |
| is equivalent to a `memcpy`, Rust will memcpy the wrong number of bytes. |
| |
| ### Gaps |
| |
| #### `[[no_unique_address]]` |
| |
| The exact same behavior can occur with `[[no_unique_address]]`. There are three |
| options: |
| |
| 1. Live with the unsafety of `[[no_unique_address]]`, and make it buyer beware. |
| This is similar to how we treat packed struct fields. |
| |
| 2. Forbid `[[no_unique_address]]` in the C++ style guide, except for zero-sized |
| types (which we can probably handle fine). |
| |
| 3. Switch approaches: rather than only allowing it for `final` classes and the |
| like, only allow it for classes whose data size is guaranteed to be the same |
| as their stride, possibly using something like a `[[pod_layout]]` attribute. |
| |
| For now, we take approach #1: `[[no_unique_address]]` is considered an unsafe |
| feature, which can render padding significant on any type which has padding. |
| |
| #### Lambdas |
| |
| TODO: implement this. |
| |
| Lambdas are class types, are not `final`, and cannot be marked `final`. Most |
| likely, we need to simply pretend that they are `final` -- it is not very useful |
| to inherit from a lambda, and this should not break people in practice. |
| |
| ### How common is this? |
| |
| Only ~4% of classes at Google are base |
| classes to some other type. |
| |
| This means the number of classes that *should* be pinned due to potentially |
| significant padding is low, and the number of classes that *should* be marked |
| final is high. Mixed blessings: more boilerplate in C++, but less annoyance in |
| Rust, as the vast majority of classes can be marked `final` via LSC. |
| |
| However, 4% doesn't quite seem small enough that we can pretend the issue |
| doesn't exist. |