# `Unpin` for C++ Types

SUMMARY: A C++ type is `Unpin` if it is trivially relocatable (e.g., a trivial
type, or a nontrivial type which is `[[clang::trivial_abi]]`), and is `final`.
Any such type can be used by value or plain reference/pointer in interop; all
non-`Unpin` types must instead be used behind pinned pointers and references.

A C++ type `T` is `Unpin` (always safe to manipulate through `&mut T`) if it is
known to be a **trivially relocatable type** (move+destroy is logically
equivalent to `memcpy`+release) with **insignificant padding** (it does not
matter if the padding is included in that `memcpy`).

`Unpin` C++ types can be used like any other normal Rust type: they are always
safe to access by reference or by value. Non-`Unpin` types, in contrast, can
only be accessed behind pins such as `Pin<&mut T>` or `Pin<Box<T>>`, because
they may not be safe to mutate directly. These types are never used directly by
value in Rust, because value-like assignment has incorrect semantics: it fails
to run C++ special members for non-trivially-relocatable types, and it can
overwrite padding for types with significant padding.

## Trivially Relocatable Types

In C++, moving a value between locations in memory involves executing code to
either initialize (move-construct) or overwrite (move-assign) the new location.
The old location still exists, but is in a moved-from state, and must still be
destroyed to release resources.

(For example, `std::string x = std::move(y);` will run the move constructor, so
that `x` contains the same value that `y` used to have before the move. The
variable `y` will still be a valid string, but might be empty, or might contain
some other unspecified value. The destructors for both `x` and `y` will run
when they go out of scope.)

Rust does not have move constructors or move assignment. In fact, there is no
way to customize what happens during moving or assignment: in Rust, moving or
swapping an object means changing its location in memory, as if by `memcpy`,
without running the destructor logic in the old location. Another way of looking
at it is that it's as if an object moved around in memory over time: it is
constructed in one place, and then further operations and eventual destruction
might happen in other places. We call such a Rust-like move a "trivial
relocation" operation.

Despite C++ moves using explicit construction and destruction calls, many C++
types could also have used the Rust movement model. We call such types
**trivially relocatable** types.

For example, a C++ `std::unique_ptr`, implemented in the obvious way, is
trivially relocatable: its actual location in memory does not matter. In
contrast, a self-referential type is not trivially relocatable, because to
relocate it, you must also update the pointer it has to itself. This is done
inside the move constructor in C++, but cannot be done in the Rust model, where
the move operation is not customizable.

For more background, see
[P1144](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1144r5.html).
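To make the self-referential case above concrete, here is a minimal sketch. `SelfRef` and both helper functions are hypothetical names invented for this illustration. The move constructor repairs the internal pointer, while a byte-wise copy of the object representation (which is all a Rust-style move does) leaves the copied pointer aimed at the old location.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical self-referential type: `cursor` always points at `buffer`.
struct SelfRef {
  char buffer[16];
  char* cursor;

  SelfRef() : buffer{}, cursor(buffer) {}
  // The move constructor re-points `cursor` at the new object's own buffer.
  SelfRef(SelfRef&& other) noexcept : cursor(buffer) {
    std::memcpy(buffer, other.buffer, sizeof(buffer));
  }
  bool PointsIntoSelf() const { return cursor == buffer; }
};

// C++ move: the move constructor runs, so the invariant holds afterwards.
bool MoveKeepsInvariant() {
  SelfRef original;
  SelfRef moved(std::move(original));
  return moved.PointsIntoSelf();
}

// Rust-style move: copy the bytes into new storage and run no C++ code.
// The bytes of `cursor` still hold the address of the old buffer.
bool BytewiseCopyKeepsInvariant() {
  SelfRef original;
  assert(original.PointsIntoSelf());
  alignas(SelfRef) unsigned char storage[sizeof(SelfRef)];
  std::memcpy(storage, static_cast<const void*>(&original), sizeof(SelfRef));

  // Read the copied pointer back out of the raw bytes.
  char* copied_cursor = nullptr;
  std::memcpy(&copied_cursor, storage + offsetof(SelfRef, cursor),
              sizeof(copied_cursor));
  const char* copied_buffer =
      reinterpret_cast<const char*>(storage) + offsetof(SelfRef, buffer);
  return copied_cursor == copied_buffer;
}
```

`MoveKeepsInvariant()` holds, while `BytewiseCopyKeepsInvariant()` does not: the copied `cursor` still refers to `original.buffer`. That is why a type like this must stay behind a pin on the Rust side.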

### Which types are trivially relocatable?

For the purpose of Rust/C++ interop, we define a type to be trivially
relocatable if, and only if, it is "trivial for calls" in Clang. That is,
either:

1.  It is actually
    [trivial](https://en.cppreference.com/w/cpp/named_req/TrivialType), **or**
2.  It uses
    [`[[clang::trivial_abi]]`](https://clang.llvm.org/docs/AttributeReference.html#trivial-abi)
    to make itself trivial for calls.

This definition is conservative: some types that could be considered trivially
relocatable are not trivial for calls. (For example, `std::unique_ptr` uses
`[[clang::trivial_abi]]` only in the unstable libc++ ABI; the stable libc++ ABI
predates this attribute, and adding it now is ABI-breaking.)

This definition is, however, sound: all types which are trivial for calls are
trivially relocatable, because a type which is trivial for calls is trivially
relocated when passed by value as a function argument.
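The standard library has no trait for "trivial for calls", so the sketch below uses a rough stand-in built from standard type traits. `LooksTrivialForCalls` is a hypothetical name, and the approximation is an assumption of this example rather than Clang's definition: it is stricter than the real rule (for instance, it rejects trivial move-only types) and it cannot see `[[clang::trivial_abi]]`.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical, conservative approximation of "trivial for calls": every
// copy/move constructor and the destructor must be trivial.
template <typename T>
struct LooksTrivialForCalls
    : std::integral_constant<
          bool, std::is_trivially_copy_constructible<T>::value &&
                    std::is_trivially_move_constructible<T>::value &&
                    std::is_trivially_destructible<T>::value> {};

struct Point {
  int x;
  int y;
};

struct Owner {
  ~Owner() {}  // User-provided destructor: no longer trivial.
};

static_assert(LooksTrivialForCalls<int>::value, "scalars qualify");
static_assert(LooksTrivialForCalls<Point>::value, "trivial structs qualify");
static_assert(!LooksTrivialForCalls<Owner>::value,
              "a nontrivial destructor disqualifies the type");
static_assert(!LooksTrivialForCalls<std::string>::value,
              "std::string has nontrivial special members");
static_assert(!LooksTrivialForCalls<std::unique_ptr<int>>::value,
              "the attribute is invisible to standard traits");
```

The last assertion shows the conservatism described above from the other side: even where `std::unique_ptr` is marked `[[clang::trivial_abi]]`, standard traits still report a nontrivial destructor, so only the compiler itself can answer the real question.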

### Expanding trivial relocatability

We are working to extend libc++ and Clang to trivially relocate these types in
even more circumstances, which would make `[[clang::trivial_abi]]` more
compelling and more widely used, enhancing both performance and
Rust-compatibility for our C++ core libraries.

*   [[clang] Mark `trivial_abi` types as "trivially relocatable".](https://reviews.llvm.org/D114732)
*   [Use trivial relocation operations in std::vector, by porting D67524 and
    part of D61761 to work on top of the changes in
    D114732.](https://reviews.llvm.org/D119385)

A future change to C++ or Clang in the vein of
[P1144](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1144r5.html)
could make types trivially relocatable without requiring ABI changes as
`[[clang::trivial_abi]]` does, although in the short term this doesn't seem very
likely.
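As a rough illustration of why a container benefits from knowing that a type is trivially relocatable, the sketch below relocates a range either with one `memcpy` or with move-construct plus destroy. It is not the code from the reviews linked above. `IsTriviallyRelocatable`, `RelocateRange`, and the demo functions are hypothetical names, and the trait simply defaults to `std::is_trivially_copyable` as a stand-in for a compiler-provided answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical trait; a real implementation would ask the compiler.
template <typename T>
struct IsTriviallyRelocatable : std::is_trivially_copyable<T> {};

// Trivially relocatable: one memcpy, and the source is simply forgotten.
template <typename T>
void RelocateRangeImpl(T* src, std::size_t count, T* dst, std::true_type) {
  if (count != 0) {
    std::memcpy(static_cast<void*>(dst), static_cast<const void*>(src),
                count * sizeof(T));
  }
}

// Otherwise: move-construct each element, then destroy the moved-from one.
template <typename T>
void RelocateRangeImpl(T* src, std::size_t count, T* dst, std::false_type) {
  for (std::size_t i = 0; i < count; ++i) {
    ::new (static_cast<void*>(dst + i)) T(std::move(src[i]));
    src[i].~T();
  }
}

// Relocates `count` objects from `src` into uninitialized storage at `dst`.
template <typename T>
void RelocateRange(T* src, std::size_t count, T* dst) {
  assert(count == 0 || (src != nullptr && dst != nullptr));
  RelocateRangeImpl(src, count, dst,
                    typename IsTriviallyRelocatable<T>::type{});
}

int RelocateIntsDemo() {
  int src[3] = {1, 2, 3};
  int dst[3];
  RelocateRange(src, 3, dst);
  return dst[0] + dst[1] + dst[2];
}

std::string RelocateStringsDemo() {
  using S = std::string;
  alignas(S) unsigned char from[2 * sizeof(S)];
  alignas(S) unsigned char to[2 * sizeof(S)];
  S* src = ::new (static_cast<void*>(from)) S("hello");
  ::new (static_cast<void*>(from + sizeof(S))) S("world");
  S* dst = reinterpret_cast<S*>(to);
  RelocateRange(src, 2, dst);  // Takes the move+destroy path.
  S joined = dst[0] + " " + dst[1];
  dst[0].~S();
  dst[1].~S();
  return joined;
}
```

With this default trait, `int` takes the `memcpy` path and `std::string` takes the move-and-destroy path. Widening the trait to cover `[[clang::trivial_abi]]` types is the kind of change the reviews above are after.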

## Insignificant Padding

If a type has padding, then even if the type is trivially relocatable and
therefore safe to write as if by `memcpy`, **Rust will `memcpy` an incorrect
number of bytes**: Rust will include the padding, though C++ would not.
Trivially relocatable types where the padding potentially has semantic meaning
can still be handled by value, but are `!Unpin`, and all mutable references Rust
receives from C++ must be `Pin<&mut T>`. Only trivially relocatable types where
the padding has no significance can be `Unpin` and safe to deal with via `&mut`.

Significant padding occurs via inheritance -- derived types may reuse the
padding for other objects -- and from the `[[no_unique_address]]` attribute
(which declares the padding to be reusable).

For the purposes of C++/Rust interop, `[[no_unique_address]]` is an unsafe
feature, and any type which cannot be inherited from (via e.g. `final`) is
considered to have insignificant padding.
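The rule in the last paragraph can be written down as a compile-time predicate. This is only a sketch: `HasInsignificantPadding`, `Open`, and `Sealed` are hypothetical names, and uses of `[[no_unique_address]]` are deliberately ignored, in line with treating that attribute as an unsafe feature.

```cpp
#include <type_traits>

// Hypothetical predicate: padding is insignificant when nothing can derive
// from the type. Non-class types (scalars, unions, ...) cannot be base
// classes, so they pass as well.
template <typename T>
struct HasInsignificantPadding
    : std::integral_constant<bool, !std::is_class<T>::value ||
                                       std::is_final<T>::value> {};

// Can be used as a base class, so a derived class may reuse its tail padding.
struct Open {
  long long id;
  int flags;
};

// Cannot be inherited from, so its padding can only ever be padding.
struct Sealed final {
  long long id;
  int flags;
};

static_assert(HasInsignificantPadding<int>::value, "scalars pass");
static_assert(HasInsignificantPadding<Sealed>::value, "final classes pass");
static_assert(!HasInsignificantPadding<Open>::value,
              "inheritable classes are treated as having significant padding");
```

Note that the predicate is pessimistic for `Open`: it reports significant padding whether or not any class actually derives from it, because a `&mut Open` could refer to a base class subobject.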

### When is padding significant?

In C++, if you take a mutable reference to a base class subobject, and pass it
around, this is ultimately pretty safe. If you assign to it, it is a bit bad --
it will assign to only the base class subobject (if assignment is nonvirtual),
not the whole derived object -- but it's possible for this to make sense, and if
it were truly dangerous they'd probably have deleted assignment or not inherited
from the base class.

In Rust, this is *extremely dangerous*, because the size of the base class
subobject can extend to include fields from the derived class. For example, take
this class hierarchy:

```c++
#include <cstdint>

class Base {
  int64_t x_;
  int32_t y_;
  /* ...methods... */
};

class Derived : public Base {
  int32_t size_;
  char* data_;
  /* ...methods... */
};
```

Here we have a class `Derived` with some string data, which inherits from
`Base`. But something unfortunate happens: because `Base` has an extra 32 bits
of tail padding, and is not POD for the purpose of layout, the `size_` member of
the derived class is stored inside the tail padding for `Base`. This is allowed
by the C++ standard, and actually taken advantage of in the Itanium ABI.

In C++, this presents no problems, as C++ assignment doesn't do something like
`memcpy sizeof(x) bytes`, even when the class is trivially assignable. It only
copies the real data size, excluding padding. And so this code will not
accidentally overwrite the `size_` field:

```c++
Derived& d = ...;
Base& b1 = d;
Base& b2 = ...;
std::swap(b1, b2);
```

But the seemingly equivalent Rust code absolutely will:

```rs
let d: &mut Derived = ...;
let b1: &mut Base = d.into();
let b2: &mut Base = ...;
// This overwrites size_ from the derived class with uninitialized memory from
// b2.
std::mem::swap(b1, b2); // Catastrophically bad.
```

As a consequence, types like `Base` should not be exposed as `&mut` references:
they might refer to a base class subobject, in which case assignment in Rust
will do the wrong thing. Even if they are trivially relocatable and assignment
is equivalent to a `memcpy`, Rust will `memcpy` the wrong number of bytes.
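The difference between the two languages can be reproduced from C++ alone. The following is a rough sketch, not part of any interop implementation. It redeclares the hierarchy with public members and a user-provided constructor (an assumption made so that the fields can be inspected while `Base` stays non-POD for layout), and all function names are hypothetical. `SizeAfterBaseAssignment` uses ordinary C++ assignment, while `WideCopyClobbersSize` imitates a Rust assignment by copying `sizeof(Base)` bytes, which C++ itself does not sanction for a base class subobject.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Base {
  Base(int64_t x, int32_t y) : x_(x), y_(y) {}
  int64_t x_;
  int32_t y_;
};

struct Derived : Base {
  Derived(int64_t x, int32_t y, int32_t size)
      : Base(x, y), size_(size), data_(nullptr) {}
  int32_t size_;
  char* data_;
};

// Byte offset of `size_` from the start of a `Derived` object.
std::size_t OffsetOfSize() {
  Derived d(0, 0, 0);
  return static_cast<std::size_t>(reinterpret_cast<const char*>(&d.size_) -
                                  reinterpret_cast<const char*>(&d));
}

// C++ assignment through `Base&` is memberwise: it copies x_ and y_ only.
int32_t SizeAfterBaseAssignment() {
  Derived dst(1, 2, 100);
  Derived src(3, 4, 200);
  Base& b1 = dst;
  const Base& b2 = src;
  b1 = b2;
  assert(dst.x_ == 3 && dst.y_ == 4);
  return dst.size_;
}

// Imitates a Rust-style assignment: overwrite sizeof(Base) bytes, padding
// included. Shown only to make the hazard visible.
bool WideCopyClobbersSize() {
  Derived dst(1, 2, 100);
  Derived src(3, 4, 200);
  void* to = static_cast<Base*>(&dst);
  const void* from = static_cast<const Base*>(&src);
  std::memcpy(to, from, sizeof(Base));
  return dst.size_ != 100;
}
```

`SizeAfterBaseAssignment()` leaves `size_` at 100, because memberwise assignment never touches it. On ABIs that place `size_` in the tail padding of `Base` (that is, where `OffsetOfSize() < sizeof(Base)`, as is typical for the Itanium ABI), `WideCopyClobbersSize()` is expected to report that `size_` changed. Where the padding is not reused, the wide copy happens to be harmless.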

### Gaps

#### `[[no_unique_address]]`

The exact same behavior can occur with `[[no_unique_address]]`. There are three
options:

1.  Live with the unsafety of `[[no_unique_address]]`, and make it buyer beware.
    This is similar to how we treat packed struct fields.

2.  Forbid `[[no_unique_address]]` in the C++ style guide, except for zero-sized
    types (which we can probably handle fine).

3.  Switch approaches: rather than only allowing it for `final` classes and the
    like, only allow it for classes whose data size is guaranteed to be the same
    as their stride, possibly using something like a `[[pod_layout]]` attribute.

For now, we take approach #1: `[[no_unique_address]]` is considered an unsafe
feature, which can render padding significant on any type which has padding.
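The sketch below shows why `final` alone does not protect against `[[no_unique_address]]`. All names are hypothetical, and the `EXAMPLE_NO_UNIQUE_ADDRESS` macro exists only so that the example still compiles where the attribute is unavailable. Whether the padding is actually reused depends on the ABI, so the code measures the layout instead of assuming it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Use the attribute only where the compiler reports support for it.
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(no_unique_address)
#define EXAMPLE_NO_UNIQUE_ADDRESS [[no_unique_address]]
#endif
#endif
#ifndef EXAMPLE_NO_UNIQUE_ADDRESS
#define EXAMPLE_NO_UNIQUE_ADDRESS
#endif

// `final`, so no class can derive from it, yet it still has tail padding.
// The user-provided constructor keeps it non-POD for layout purposes.
struct Header final {
  Header() : id(0), flags(0) {}
  int64_t id;
  int32_t flags;
};

struct Packet {
  EXAMPLE_NO_UNIQUE_ADDRESS Header header;
  int32_t length = 0;
};

// Byte offset of `length` from the start of a `Packet` object.
std::size_t OffsetOfLength() {
  Packet p;
  const std::ptrdiff_t offset = reinterpret_cast<const char*>(&p.length) -
                                reinterpret_cast<const char*>(&p);
  assert(offset >= 0);
  return static_cast<std::size_t>(offset);
}

// True when `length` lives inside the sizeof(Header) bytes of `header`.
bool LengthSharesHeaderPadding() { return OffsetOfLength() < sizeof(Header); }
```

Where `LengthSharesHeaderPadding()` is true, writing `sizeof(Header)` bytes through a `&mut Header` that points at `packet.header` would overwrite `length`, even though `Header` is `final`.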

#### Lambdas

TODO: implement this.

Lambdas are class types, are not `final`, and cannot be marked `final`. Most
likely, we need to simply pretend that they are `final` -- it is not very useful
to inherit from a lambda, and this should not break people in practice.
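A short sketch of the situation described above, with hypothetical names throughout: the closure type is a class type, the standard trait does not report it as `final`, and nothing stops another class from deriving from it.

```cpp
#include <cassert>
#include <type_traits>

// A namespace-scope lambda; its closure type is an unnamed class type.
const auto add_one = [](int x) { return x + 1; };
using AddOne = std::remove_cv<decltype(add_one)>::type;

static_assert(std::is_class<AddOne>::value, "closure types are class types");
static_assert(!std::is_final<AddOne>::value,
              "closure types are not reported as final");

// Deriving from a closure type is legal, if rarely useful.
struct CountingAddOne : AddOne {
  explicit CountingAddOne(const AddOne& f) : AddOne(f), calls(0) {}
  int Call(int x) {
    ++calls;
    return (*this)(x);  // Invokes the inherited call operator.
  }
  int calls;
};

int CallThroughDerived() {
  CountingAddOne counter(add_one);
  const int result = counter.Call(41);
  assert(counter.calls == 1);
  return result;
}
```

Because `CountingAddOne` compiles, a strict reading of the `final` rule would make every lambda `!Unpin`. Pretending closure types are `final` trades that strictness for usability.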

### How common is this?

Only ~4% of classes at Google are base classes to some other type.

This means the number of classes that *should* be pinned due to potentially
significant padding is low, and the number of classes that *should* be marked
final is high. Mixed blessings: more boilerplate in C++, but less annoyance in
Rust, as the vast majority of classes can be marked `final` via a large-scale
change (LSC).

However, 4% doesn't quite seem small enough that we can pretend the issue
doesn't exist.