# Struct Layout
C++ (in the Itanium ABI) extends the C layout rules, and so `#[repr(C)]` alone
isn't enough. This page documents the tweaks made to Rust structs to give them
the same layout as C++ structs.
In particular:
* C++ classes and Rust structs must have the same alignment, so that
references can be exchanged without violating the alignment rules. This is
usually ensured by the regular `#[repr(C)]` layout algorithm, but sometimes
the interop tool needs to generate explicit `#[repr(align(n))]` annotations.
* C++ classes and Rust structs must have the same size, so that arrays of
objects can be exchanged.
* Public subobjects must have the same offsets in the C++ and Rust versions of
the struct.
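These three invariants can be checked mechanically on the Rust side. The following is a minimal sketch, not the interop tool's actual output: `Sample` is a hypothetical binding for the C++ struct `struct Sample { int8_t a; int32_t b; };`, and the expected values (size 8, alignment 4, offsets 0 and 4) are what a typical Itanium ABI target reports for it.

```rust
use std::mem::{align_of, size_of};

// Hypothetical binding for the C++ struct
// `struct Sample { int8_t a; int32_t b; };`.
#[repr(C)]
#[derive(Default)]
pub struct Sample {
    pub a: i8,
    pub b: i32,
}

/// Returns the byte offsets of `a` and `b` within `Sample`.
pub fn sample_field_offsets() -> (usize, usize) {
    let s = Sample::default();
    let base = &s as *const Sample as usize;
    let a_addr = &s.a as *const i8 as usize;
    let b_addr = &s.b as *const i32 as usize;
    (a_addr - base, b_addr - base)
}

// Compile-time checks against the values assumed for the C++ side:
// sizeof(Sample) == 8 and alignof(Sample) == 4.
const _: () = assert!(size_of::<Sample>() == 8);
const _: () = assert!(align_of::<Sample>() == 4);
```

If the Rust struct ever drifts from the assumed C++ layout, the `const` assertions fail at compile time instead of producing a silent mismatch at run time.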
## Non-field data
Rust bindings introduce a `__non_field_data: [MaybeUninit<u8>; N]` field to
cover data within the object that is not part of individual fields. This
includes:
* Base classes.
* VTable pointers.
* Empty struct padding.
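As an illustration of the vtable pointer case, here is a hedged sketch of what such a binding might look like. `Shape` is a hypothetical C++ class, `struct Shape { virtual ~Shape(); int32_t sides; };`, and the sketch assumes a 64-bit Itanium ABI target where the vtable pointer occupies 8 bytes at offset 0. The real generated code may differ.

```rust
use std::mem::MaybeUninit;

// Hypothetical binding for the C++ class
// `struct Shape { virtual ~Shape(); int32_t sides; };`
// Assumes a 64-bit target: 8-byte vtable pointer at offset 0.
#[allow(dead_code)]
#[repr(C)]
#[repr(align(8))] // The vtable pointer makes the class pointer-aligned.
pub struct Shape {
    // Covers the vtable pointer, which Rust never reads as a field.
    __non_field_data: [MaybeUninit<u8>; 8],
    pub sides: i32,
}

/// Returns the byte offset of `sides` within `Shape`.
pub fn sides_offset() -> usize {
    let s = Shape {
        __non_field_data: [MaybeUninit::uninit(); 8],
        sides: 4,
    };
    let base = &s as *const Shape as usize;
    let field = &s.sides as *const i32 as usize;
    field - base
}
```

Under these assumptions, `sides` lands at offset 8 and the struct is padded to 16 bytes, matching what the C++ compiler would produce for `Shape`.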
### Empty Structs
One notable special case of this is empty struct padding. In C++, an empty
struct or class (e.g. `struct Empty {};`) has size `1`, while the equivalent
empty struct in Rust has size `0`. To make the layouts match, bindings for empty
structs always enforce a size of at least `1`, via `__non_field_data`.
(In C++, different array elements are guaranteed to have different addresses,
and also, arrays are guaranteed to be contiguous. Therefore, no object in C++
can have size `0`. Rust, like C++, has only contiguous arrays, but unlike C++
Rust does not guarantee that distinct elements have distinct addresses.)
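The difference is easy to observe from Rust. In the sketch below, `RustEmpty` is an ordinary empty Rust struct, and `Empty` is a hypothetical binding for the C++ type `struct Empty {};` (the names and the `element_stride` helper are illustrative only).

```rust
use std::mem::MaybeUninit;

// An ordinary empty Rust struct is zero-sized.
pub struct RustEmpty {}

// Hypothetical binding for the C++ type `struct Empty {};`.
#[allow(dead_code)]
#[repr(C)]
pub struct Empty {
    // Forces the size up to 1, matching sizeof(Empty) in C++.
    __non_field_data: [MaybeUninit<u8>; 1],
}

impl Empty {
    pub fn new() -> Self {
        Empty {
            __non_field_data: [MaybeUninit::uninit(); 1],
        }
    }
}

/// Returns the distance in bytes between the first two elements of `arr`.
pub fn element_stride<T>(arr: &[T; 2]) -> usize {
    let first = &arr[0] as *const T as usize;
    let second = &arr[1] as *const T as usize;
    second - first
}
```

With these definitions, both elements of a `[RustEmpty; 2]` share one address, while the elements of an `[Empty; 2]` are one byte apart, as they would be in a C++ array.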
## Potentially-overlapping objects
In C++, in some circumstances, the requirement that objects do not overlap is
relaxed: base classes and `[[no_unique_address]]` member variables can have
subsequent objects live inside of their tail padding. The most famous instance
of this is the
[empty base class optimization (EBCO)](https://en.cppreference.com/w/cpp/language/ebo):
a base class with no data members is permitted to take up zero space inside of
derived classes.
NOTE: This has other, non-layout consequences for Rust: for example, it is not
safe to obtain two `&mut` references to overlapping objects, unless they are of
size `0`. (To prevent this, classes that might be base classes are always
[`!Unpin`](unpin.md).)
Overlapping subobjects are impossible to represent in a C-like struct. (Indeed,
they were impossible to represent even in a C++-like struct before the
introduction of `[[no_unique_address]]`.) Therefore, in Rust, we don't even try:
potentially-overlapping subobjects are replaced in the Rust layout by a
`[MaybeUninit<u8>; N]` field, where `N` is large enough to ensure that the next
subobject starts at the correct offset. The alignment of the struct is still
changed so that it matches the C++ alignment, but via `#[repr(align(n))]`
instead of by aligning the field.
### Example
For example, consider these two C++ classes:
```c++
#include <cstdint>

// This is a class, instead of a struct, to ensure that it is not POD for the
// purpose of layout. (The Itanium ABI disables the overlapping subobject
// optimization for POD types.)
class A {
  int16_t x_;
  int8_t y_;
};

struct B final : A {
  int8_t z;
};
```
In memory, this may be laid out like so:
```
| x_ | x_ | y_ | z |
<------------> <->
A subobject | B
<------------------>
sizeof(A)
(also sizeof(B))
```
The correct representation for `B`, in Rust, is something like this:
```rs
use std::mem::MaybeUninit;

#[repr(C)]
#[repr(align(2))] // Match the alignment of the int16_t member.
struct B {
    // We don't use a field of type `A`, because it would have a size of 4,
    // and Rust wouldn't permit `z` to live inside of it.
    // Nor do we align the array, for the same reason -- correct alignment must be
    // achieved via the repr(align(2)) at the top.
    __non_field_data: [MaybeUninit<u8>; 3],
    pub z: i8,
}
```
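To see why the byte array is needed, it helps to compare it against the naive translation. The sketch below is illustrative only: `NaiveB` and the two offset helpers are hypothetical names, and `A` is written here as a plain `#[repr(C)]` struct to show what embedding it as a field would do.

```rust
use std::mem::MaybeUninit;

// `A` as an ordinary Rust struct: size 4 (3 bytes of data + 1 of padding).
#[allow(dead_code)]
#[repr(C)]
pub struct A {
    x_: i16,
    y_: i8,
}

// Naive translation: Rust will not place `z` in the tail padding of `base`.
#[allow(dead_code)]
#[repr(C)]
pub struct NaiveB {
    base: A,
    pub z: i8,
}

// Translation using non-field data, as described above.
#[allow(dead_code)]
#[repr(C)]
#[repr(align(2))]
pub struct B {
    __non_field_data: [MaybeUninit<u8>; 3],
    pub z: i8,
}

/// Byte offset of `z` in the naive translation.
pub fn naive_z_offset() -> usize {
    let b = NaiveB { base: A { x_: 0, y_: 0 }, z: 0 };
    let base = &b as *const NaiveB as usize;
    let field = &b.z as *const i8 as usize;
    field - base
}

/// Byte offset of `z` in the translation that uses `__non_field_data`.
pub fn z_offset() -> usize {
    let b = B { __non_field_data: [MaybeUninit::uninit(); 3], z: 0 };
    let base = &b as *const B as usize;
    let field = &b.z as *const i8 as usize;
    field - base
}
```

The naive version puts `z` at offset 4 and grows to 6 bytes, whereas the `__non_field_data` version keeps `z` at offset 3 with a total size of 4 and alignment of 2, matching the C++ layout in the diagram.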