This doc contains C++ code snippets and shows how each would look when automatically translated by a tool, both in the first (unsafe) version and after some automated cleanup passes. Each subsection focuses on a specific topic. Unless specified otherwise, we assume all C++ code is present in a header file.
We also don't consider the possibility of name collision (including with reserved keywords in Rust). We will use the same naming convention as the C++/Rust interop, and use a local scope for any extra temporaries we might need. Lastly, we assume some sort of annotation in C++ code about whether a pointer is nullable or never null.
We assume the first pass would convert the C++ code to an unsafe version of Rust as similar as possible to the original code. This includes:
T::Drop implementations for types with different destruction order between C++ and Rust (structs, tuples, arrays/owned slices and enum variants), to keep the same destruction order as in the C++ implementation. This is only done if the type has 2+ non-trivially-destructible fields.
After the first pass is done, we can run multiple cleanup passes, running tests in each pass to see if they respect expected behavior. Some of them are safe to be applied automatically, while for others we might prefer to require human review (we should strive to make the latter simple and fast to review).
Automated passes include:
Passes that require human review include:
RefCell.
Removing T::Drop implementations if the user confirms that the destruction order does not matter for the type T.
The cleanup pass to convert Rust raw pointers that have a lifetime annotation into references is probably the most complicated one, because we need guarantees that opaque functions (including FFI calls) don't alias pointers. This also includes objects that contain at least a pointer T* and either another field of the same type T, another pointer of the same type T* or another type that contains T inside. Our strategy here is multifold:
All code below assumes C++ pointers are annotated as not nullable, unless stated otherwise.
Original C++ code:
#include <stdint.h>

uint32_t AddAndCast(uint64_t x, uint64_t y) {
  return x + y;
}
Rust code after the first pass:
pub fn add_and_cast(x: u64, y: u64) -> u32 {
    x.wrapping_add(y) as u32
}
Rust code after the arithmetic cleanup pass, triggered manually:
pub fn add_and_cast(x: u64, y: u64) -> u32 {
    (x + y) as u32
}
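The difference between the two Rust versions above only shows up on overflow: wrapping_add always wraps (like C++ unsigned arithmetic), while a plain + panics on overflow in builds with overflow checks enabled (the default for debug builds). The following minimal sketch makes that observable. The function names are hypothetical, and the checked variant is an illustration, not something the tool is stated to generate.

```rust
/// First-pass style translation: mirrors C++ unsigned arithmetic,
/// which wraps around on overflow.
pub fn add_and_cast_wrapping(x: u64, y: u64) -> u32 {
    // `as u32` truncates to the low 32 bits, like the implicit C++ conversion.
    x.wrapping_add(y) as u32
}

/// Hypothetical variant that surfaces overflow to the caller:
/// `checked_add` returns `None` instead of wrapping or panicking.
pub fn add_and_cast_checked(x: u64, y: u64) -> Option<u32> {
    x.checked_add(y).map(|sum| sum as u32)
}
```

A reviewer deciding whether to trigger the arithmetic cleanup pass could use the checked variant in tests to find inputs where the original code relied on wrapping.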
Original C++ code:
#include <stdint.h>

void UseUInt32T(uint32_t x);

uint32_t UsePtrs(uint32_t* x, uint32_t * $a y) {
  UseUInt32T((*x)++);
  UseUInt32T(++(*x));
  UseUInt32T((*y));
  return *x;
}
Rust code after the first pass:
fn use_u_int32_t(x: u32) {
    // translated based on the implementation in the .cc file or
    // imported from another crate
}

pub fn use_ptrs(x: *mut u32, y: *mut /* $a */ u32) -> u32 {
    use_u_int32_t(
        unsafe {
            // c++ version:
            // (*x)++
            let temp1 = *x;
            *x += 1;
            temp1
        }
    );
    use_u_int32_t(
        unsafe {
            // c++ version:
            // ++(*x)
            *x += 1;
            *x
        }
    );
    use_u_int32_t(
        unsafe {
            // c++ version:
            // (*y)
            *y
        }
    );
    unsafe {
        // c++ version:
        // (*x)
        *x
    }
}
Rust code after converting raw pointers with lifetime annotations to references:
fn use_u_int32_t(x: u32) {
    // translated based on the implementation in the .cc file or
    // imported from another crate
}

pub fn use_ptrs<'a>(x: *mut u32, y: &'a mut u32) -> u32 {
    use_u_int32_t(
        unsafe {
            // c++ version:
            // (*x)++
            let temp1 = *x;
            *x += 1;
            temp1
        }
    );
    use_u_int32_t(
        unsafe {
            // c++ version:
            // ++(*x)
            *x += 1;
            *x
        }
    );
    use_u_int32_t(*y);
    unsafe {
        // c++ version:
        // (*x)
        *x
    }
}
Rust code after pass to move nested blocks to separate line:
fn use_u_int32_t(x: u32) {
    // translated based on the implementation in the .cc file or
    // imported from another crate
}

pub fn use_ptrs<'a>(x: *mut u32, y: &'a mut u32) -> u32 {
    let temp1 = unsafe {
        // c++ version:
        // (*x)++
        let temp1 = *x;
        *x += 1;
        temp1
    };
    use_u_int32_t(temp1);
    let temp2 = unsafe {
        // c++ version:
        // ++(*x)
        *x += 1;
        *x
    };
    use_u_int32_t(temp2);
    use_u_int32_t(*y);
    unsafe {
        // c++ version:
        // (*x)
        *x
    }
}
Original C++ code:
#include <stdint.h>

uint32_t CreateAlias(uint32_t* x, uint32_t * $a y) {
  y = x;
  return *y;
}
Rust code after the first pass:
pub fn create_alias(x: *mut u32, mut y: *mut /* $a */ u32) -> u32 {
    // `y` must be declared `mut` so that it can be reassigned.
    y = x;
    unsafe {
        // c++ version:
        // (*y)
        *y
    }
}
Since CreateAlias does alias y and x, we won't be able to convert raw pointers with lifetime annotations (in this case y) to references, because x has no lifetime annotation. If both had a matching lifetime annotation the tool would try to convert them to references.
Original C++ code:
#include <stdint.h>

void AnnotatedFfiFunctionThatCreatesAlias(
    /* alias_tag */ uint32_t* x, /* alias_tag */ uint32_t* y);

uint32_t UsePtrs(uint32_t * $a x, uint32_t * $b y) {
  AnnotatedFfiFunctionThatCreatesAlias(x, y);
  return *y;
}
Rust code after the first pass:
use crate_name::annotated_ffi_function_that_creates_alias;
// signature:
// pub fn annotated_ffi_function_that_creates_alias(x: *mut u32, y: *mut u32);

pub fn use_ptrs(x: *mut /* $a */ u32, y: *mut /* $b */ u32) -> u32 {
    annotated_ffi_function_that_creates_alias(x, y);
    unsafe {
        // c++ version:
        // (*y)
        *y
    }
}
Since annotated_ffi_function_that_creates_alias has annotations on both arguments stating that it introduces an alias, we will not run the cleanup pass to convert the raw pointers into references.
Original C++ code:
#include <stdint.h>
#include "other_file.h"

struct S [[lifetime_param(a)]] {
  uint32_t *$a x;
  uint32_t *$a y;
};

S createS(uint32_t * $a x, uint32_t * $a y) {
  S s = S{x, y};
  MakeInternalAlias(&s);
  return s;
}

//////////////////////////////////////////////////
// other_file.h
//////////////////////////////////////////////////

// Assume the following function is not being converted into Rust right now (it
// will be called via FFI).
void MakeInternalAlias(S* s) {
  s->x = s->y;
}
The automated conversion tool would try to generate the following Rust code:
use crate_name::make_internal_alias;
// signature:
// pub fn make_internal_alias(s: *mut /* $a */ S);

struct S /* $a */ {
    x: *mut /* $a */ u32,
    y: *mut /* $a */ u32,
}

pub fn create_s(x: *mut /* $a */ u32, y: *mut /* $a */ u32) -> S {
    let mut s = S { x, y };
    // `&mut s` coerces to the `*mut S` the callee expects.
    make_internal_alias(&mut s);
    s
}
Nonetheless the tool would fail, since make_internal_alias is not annotated as creating an alias and the syntactic analysis would recognize it does so. Therefore the tool would not be able to import make_internal_alias into Rust code and would either ask the user to annotate the arguments of MakeInternalAlias as being aliased or to make the function not alias (which, in this case, is not possible at all).
Original C++ code:
#include <stdint.h>
#include "other_file.h"

struct S [[lifetime_param(a)]] {
  uint32_t *$a x;
  uint32_t *$a y;
};

S createS(uint32_t * $a x) {
  return CreateSWithAlias(x);
}

//////////////////////////////////////////////////
// other_file.h
//////////////////////////////////////////////////

// Assume the following function is not being converted into Rust right now (it
// will be called via FFI).
S CreateSWithAlias(uint32_t * $a x) {
  return S{x, x};
}
The automated conversion tool would try to generate the following Rust code:
use crate_name::create_s_with_alias;
// signature:
// pub fn create_s_with_alias(x: *mut /* $a */ u32) -> S;

struct S /* $a */ {
    x: *mut /* $a */ u32,
    y: *mut /* $a */ u32,
}

pub fn create_s(x: *mut /* $a */ u32) -> S {
    create_s_with_alias(x)
}
But the tool would fail, since the static analysis would detect that create_s_with_alias creates an alias of x. Therefore once again the tool would not be able to import create_s_with_alias and would either ask the user to annotate the argument of CreateSWithAlias as being aliased or to make it not alias (which, in this case, is not possible at all).
Struct fields in Rust are dropped in declaration order, while in C++ they are dropped in reverse declaration order. This is also true for enum variants, tuples, arrays and owned slices. To make matters worse, if Rust panics during construction of the object, then the fields are dropped in reverse order of declaration. In other words, the field drop order in Rust depends on when the object is dropped (i.e. during panic unwind or not).
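To make the default Rust behavior concrete, here is a minimal sketch that records the order in which the fields of a struct are dropped when no custom Drop is written for the struct itself. Noisy, Pair and default_drop_order are hypothetical names introduced only for this illustration. The sketch covers the normal (non-panicking) path only.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical helper: records its label in a shared log when dropped.
struct Noisy {
    label: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

/// No `Drop` impl here, so Rust's default field drop order applies.
struct Pair {
    first: Noisy,
    second: Noisy,
}

/// Builds a `Pair`, drops it, and returns the order in which its
/// fields were dropped.
fn default_drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let pair = Pair {
        first: Noisy { label: "first", log: Rc::clone(&log) },
        second: Noisy { label: "second", log: Rc::clone(&log) },
    };
    drop(pair);
    let order = log.borrow().clone();
    order
}
```

The equivalent C++ class would destroy second before first, which is the mismatch the first pass has to compensate for.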
Therefore the safest solution is for the tool to create, in the first pass, T::Drop implementations for any type with 2+ non-trivially-destructible fields. The T::Drop implementation calls the drop methods of the object's fields in reverse order of declaration (i.e. matching the C++ destruction order). To make this possible, the tool should also wrap each of the (non-trivially-destructible) struct fields in ManuallyDrop<T> and call ManuallyDrop::drop on them (analogously for the other types mentioned above). For this to work with inlined arrays/tuples it might also be necessary to wrap them in a Rust struct in order to be able to implement Drop on them. If there is a manually-implemented destructor for the class, this destructor code will be converted into Rust and be executed before the ManuallyDrop::drop calls in the Drop implementation. Lastly, if the type contains at most one non-trivially-destructible field we can ignore destruction order altogether.
Afterwards, in a cleanup pass, the tool will ask the user to specify whether the destruction order of each array/owned slice, tuple and struct (and C++ class) fields matters. If not, we can delete the T::Drop implementation, the ManuallyDrop and inlined-arrays/tuple struct wrappers, and just depend on Rust behavior, even if it doesn't match the C++ one. If the user confirms that the destruction order matters, then we keep the T::Drop implementation for those types.
As a first example, let's look at a C++ class where the destruction order matters. In the example below MyFileReader stores the file path (as a string) and a FileHandle, while the latter stores a string_view to the file path. Therefore the FileHandle should be destructed before the file path string.
Original C++ code:
#include <string>
#include <utility>

#include "absl/strings/string_view.h"

class FileHandle {
 public:
  FileHandle(absl::string_view file_path) : file_path_(file_path) {
    // ...
  }
  ~FileHandle() {
    // ...
  }
  //...
 private:
  absl::string_view file_path_;
  //...
};

class MyFileReader {
 public:
  // Members are initialized in declaration order, so file_path_ is valid
  // by the time file_handle_ takes a view of it.
  MyFileReader(std::string file_path)
      : file_path_(std::move(file_path)), file_handle_(file_path_) {}
  ~MyFileReader() {
    // <user-specified destructor logic>
  }
  absl::string_view ReadLine() {
    // ...
  }
 private:
  const std::string file_path_;
  FileHandle file_handle_;
};
The tool will realize that the destruction order of MyFileReader fields might matter (since both std::string and FileHandle are non-trivially-destructible) and should match the C++ one. Therefore the automated conversion tool would generate the following Rust code (assume here that string_view is a type in Rust):
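use std::mem::ManuallyDrop;

struct FileHandle {
    file_path: string_view,
    //...
}

impl FileHandle {
    pub fn new(file_path: string_view) -> FileHandle {
        FileHandle {
            file_path,
            // ...
        }
    }
}

impl Drop for FileHandle {
    fn drop(&mut self) {
        // ...
    }
}

struct MyFileReader {
    file_path: ManuallyDrop<String>,
    file_handle: ManuallyDrop<FileHandle>,
}

impl MyFileReader {
    pub fn new(file_path: String) -> MyFileReader {
        let file_path = ManuallyDrop::new(file_path);
        // Assumption: the assumed string_view type can be built from a &str.
        let file_handle =
            ManuallyDrop::new(FileHandle::new(string_view::from(file_path.as_str())));
        MyFileReader { file_path, file_handle }
    }

    pub fn read_line(&mut self) -> string_view {
        // ...
    }
}

impl Drop for MyFileReader {
    fn drop(&mut self) {
        // <user-specified destructor logic converted into Rust>
        // Fields are dropped in reverse declaration order, matching C++.
        unsafe {
            ManuallyDrop::drop(&mut self.file_handle);
            ManuallyDrop::drop(&mut self.file_path);
        }
    }
}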
For tuples/arrays the tool behavior would be exactly the same as for a struct (see example above). The only difference is that in a first step the tool will create a wrapper struct around the tuple/array (with a single tuple/array public field), and the tuple/array field inside would contain ManuallyDrop<T>, where T was the type contained inside the tuple/array. The drop method for this struct will be a reverse for-loop, dropping each element of the tuple/array (from last to first) by calling ManuallyDrop::drop on it.
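A minimal sketch of the array case described above, assuming a fixed-size array of three elements. ReverseDropArray, Noisy and reverse_drop_order are hypothetical names used only for illustration, not the tool's actual output.

```rust
use std::cell::RefCell;
use std::mem::ManuallyDrop;
use std::rc::Rc;

/// Hypothetical element type: records its label in a shared log when dropped.
struct Noisy {
    label: u32,
    log: Rc<RefCell<Vec<u32>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

/// Hypothetical wrapper struct around `[Noisy; 3]`; each element is wrapped
/// in `ManuallyDrop` so that Rust does not drop it automatically.
struct ReverseDropArray {
    items: [ManuallyDrop<Noisy>; 3],
}

impl Drop for ReverseDropArray {
    fn drop(&mut self) {
        // Reverse for-loop: last element first, matching C++ array destruction.
        for item in self.items.iter_mut().rev() {
            // SAFETY: each element is dropped exactly once and is never
            // accessed again afterwards.
            unsafe { ManuallyDrop::drop(item) };
        }
    }
}

/// Builds the wrapper, drops it, and returns the observed drop order.
fn reverse_drop_order() -> Vec<u32> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let array = ReverseDropArray {
        items: [0, 1, 2].map(|label| {
            ManuallyDrop::new(Noisy { label, log: Rc::clone(&log) })
        }),
    };
    drop(array);
    let order = log.borrow().clone();
    order
}
```

A tuple wrapper cannot use a loop, since tuple elements may have different types; there the drop calls would presumably be written out one by one, from the last element to the first.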
In C++ the evaluation order (including both value computation and side effects) of temporaries in the same expression is, in general, unspecified (i.e. the evaluations can happen in any order, and may interleave). In Rust things are not so well documented: we only know what Niko said, which is that Rust evaluates things roughly left-to-right, except for assignments, which are right-to-left. Therefore we should be concerned about cases where C++ evaluates right-to-left and Rust evaluates left-to-right.
The only such example is new-expressions in C++, where the call to new (i.e. the memory allocation) is (since C++17) sequenced-before the evaluation of constructor arguments, while in Rust the order of evaluation of the memory allocation and the struct fields evaluation is not clear. Nonetheless I can't think of a single interesting example where the constructor fails and behaviors differ between C++ and Rust.
Of course there are cases where C++ evaluation order is unspecified or indeterminately sequenced and Rust's evaluation order is specified (example: evaluation order in expression). Nonetheless this isn't very interesting in the context of converting non-buggy C++ code to Rust (since non-buggy C++ code should not rely on an unspecified evaluation order). Therefore I believe the tool doesn't have to worry about anything specific to these topics.
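As a small illustration of the left-to-right behavior on the Rust side, the sketch below makes the evaluation order of function-call arguments observable by logging a tag from each argument expression. The helper names (traced, subtract, argument_evaluation_order) are hypothetical. The sketch only covers call arguments, not assignments.

```rust
use std::cell::RefCell;

/// Appends `tag` to `log` and returns `value`, so that the order in which
/// argument expressions are evaluated becomes observable.
fn traced(log: &RefCell<Vec<&'static str>>, tag: &'static str, value: i32) -> i32 {
    log.borrow_mut().push(tag);
    value
}

fn subtract(a: i32, b: i32) -> i32 {
    a - b
}

/// Returns the result of the call together with the observed evaluation order.
fn argument_evaluation_order() -> (i32, Vec<&'static str>) {
    let log = RefCell::new(Vec::new());
    let result = subtract(traced(&log, "left", 10), traced(&log, "right", 3));
    (result, log.into_inner())
}
```

In C++ the same call could evaluate either argument first, which is why correct C++ code should not depend on the order.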
What to do with variadic functions? Replacing the varargs with an array has the problem that the array size must be known at compile time. A Vec implies a memory allocation. Slices are probably the natural way to go for the signature, and all callers could create an array at the call site.
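A minimal sketch of the slice option, assuming a hypothetical C++ variadic function that sums a list of ints. The names sum_all and call_site_example are made up for illustration. Note that this only covers arguments that all share one type; heterogeneous varargs would need a different approach.

```rust
/// Hypothetical Rust counterpart of a C++ variadic `int SumAll(int count, ...)`.
/// The slice carries its own length, so no separate `count` is needed.
fn sum_all(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// The caller builds a fixed-size array in place; `&[1, 2, 3]` has type
/// `&[i32; 3]` and coerces to `&[i32]` without any heap allocation.
fn call_site_example() -> i32 {
    sum_all(&[1, 2, 3])
}
```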
What to do about function overloading? Rust does not allow functions with the same name and different signatures. Maybe some naming convention should be enough.
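One possible naming convention, sketched under the assumption that the parameter names of each overload are appended to the function name. Both the C++ overloads and the resulting Rust names are hypothetical; other conventions (for example suffixing with parameter types) would work as well.

```rust
// Hypothetical C++ overloads:
//   double Area(double radius);
//   double Area(double width, double height);

/// Translation of `Area(double radius)`.
fn area_radius(radius: f64) -> f64 {
    std::f64::consts::PI * radius * radius
}

/// Translation of `Area(double width, double height)`.
fn area_width_height(width: f64, height: f64) -> f64 {
    width * height
}
```

Call sites would then be rewritten to the specific name chosen by C++ overload resolution, which the tool already has to perform when translating the call.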
What about void*? Devin mentioned the interop tool will probably use extern types. Another option would be to use *mut libc::c_void or *const libc::c_void. We should probably not use *Void, since it's a zero-sized type (which in Rust has 0 bytes, and in C++ has 1 byte, so it can't move across FFI boundaries). Maybe since it's a pointer to Void it's OK (ptr would always occupy word-size bytes, no matter the type it points to)?
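The size question can be checked directly. The sketch below uses std::ffi::c_void from the standard library (in place of the libc crate, to stay self-contained) and a hypothetical zero-sized Void struct, and compares the sizes of the pointee and of pointers to it. The function names are made up for illustration.

```rust
use std::ffi::c_void;
use std::mem::size_of;

/// Hypothetical zero-sized stand-in for the pointee of a C++ `void*`.
struct Void;

/// Returns (size of `Void`, size of `*mut Void`, size of `*mut c_void`).
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<Void>(),
        size_of::<*mut Void>(),
        size_of::<*mut c_void>(),
    )
}

/// Round-trips a typed pointer through `*mut c_void`, the way a value would
/// pass through a C++ `void*` parameter and be cast back.
fn round_trip(value: &mut u32) -> u32 {
    let erased: *mut c_void = (value as *mut u32).cast();
    let typed: *mut u32 = erased.cast();
    // SAFETY: `typed` was derived from a valid, exclusive `&mut u32` that is
    // still live, and it points to a properly aligned `u32`.
    unsafe { *typed }
}
```

The pointee being zero-sized does not change the pointer's size: raw pointers to sized types are one machine word wide, so only the pointer value itself crosses the FFI boundary.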
How to handle class inheritance? I guess pure abstract classes in C++ are more straightforward to convert to traits, but I'm not sure about non-abstract classes.
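For the pure abstract case, a minimal sketch of what the trait mapping could look like. The C++ classes in the comment and all Rust names are hypothetical. Non-abstract base classes with data members are not addressed here.

```rust
// Hypothetical C++ input:
//   class Shape {
//    public:
//     virtual ~Shape() = default;
//     virtual double Area() const = 0;
//   };
//   class Square : public Shape { /* double side_; Area() override */ };

/// The pure abstract class becomes a trait with one required method.
trait Shape {
    fn area(&self) -> f64;
}

struct Square {
    side: f64,
}

/// `class Square : public Shape` becomes a trait implementation.
impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

/// Dynamic dispatch through `dyn Shape` plays the role of a virtual call
/// through a `Shape*` or `const Shape&` in C++.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}
```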
How to convert C++ namespaces to Rust?
How to convert C++ templates to Rust?