C++ (in the Itanium ABI) extends the C layout rules, and so repr(C)
isn't enough. This pages documents the tweaks to Rust structs to give them the same layout as C++ structs.
In particular:
#[repr(C)]
layout algorithm, but sometimes the interop tool needs to generate explicit #[repr(align(n))]
annotations.Rust bindings introduce a __non_field_data: [MaybeUninit<u8>; N]
field to cover data within the object that is not part of individual fields. This includes:
One notable special case of this is the empty struct padding. An empty struct or class (e.g. struct Empty{};
) has size 1
, while in Rust, it has size 0
. To make the layout match up, bindings for empty structs will always enforce that the struct has size of at least 1, via __non_field_data
.
(In C++, different array elements are guaranteed to have different addresses, and also, arrays are guaranteed to be contiguous. Therefore, no object in C++ can have size 0
. Rust, like C++, has only contiguous arrays, but unlike C++ Rust does not guarantee that distinct elements have distinct addresses.)
In C++, in some circumstances, the requirement that objects do not overlap is relaxed: base classes and [[no_unique_address]]
member variables can have subsequent objects live inside of their tail padding. The most famous instance of this is the empty base class optimization (EBCO): a base class with no data members is permitted to take up zero space inside of derived classes.
NOTE: This has other, non-layout consequences for Rust: for example, it is not safe to obtain two &mut
references to overlapping objects, unless they are of size 0
. (To prevent this, classes that might be base classes are always !Unpin
.)
This is impossible to represent in a C-like struct. (Indeed, it‘s impossible to represent even in a C++-like struct, before the introduction of [[no_unique_address]]
). Therefore, in Rust, we don’t even try: potentially-overlapping subobjects are replaced in the Rust layout by a [MaybeUninit<u8>; N]
field, where N
is large enough to ensure that the next subobject starts at the correct offset. The alignment of the struct is still changed so that it matches the C++ alignment, but via #[repr(align(n))]
instead of by aligning the field.
For example, consider these two C++ classes:
// This is a class, instead of a struct, to ensure that it is not POD for the // purpose of layout. (The Itanium ABI disables the overlapping subobject // optimization for POD types.) class A { int16_t x_; int8_t y_; }; struct B final : A { int8_t z; }
In memory, this may be laid out as so:
| x_ | x_ | y_ | z | <------------> <-> A subobject | B <------------------> sizeof(A) (also sizeof(B))
The correct representation for B
, in Rust, is something like this:
#[repr(C)] #[repr(align(2))] // match the alignment of the int16_t variable. struct B { // The We don't use a field of type `A`, because it would have a size of 4, // and Rust wouldn't permit `z` to live inside of it. // Nor do we align the array, for the same reason -- correct alignment must be // achieved via the repr(align(2)) at the top. __non_field_data : [MaybeUninit<u8>; 3]; pub z: i8, }