blob: 1fc9652855036592707f5740dacb36c8ab0420b2 [file] [log] [blame] [view]
## Snippet of C++ to Rust migration
## Overview
This doc contains code snippets of C++ code and how it would look when
automatically translated by a tool, both the first (unsafe) version and after
some automated cleanup passes. Each subsection focus on a specific topic. Unless
specified otherwise, we assume all C++ code is present in a header file.
We also don't consider the possibility of name collision (including with
reserved keywords in Rust). We will use the same naming convention as the
C++/Rust interop, and use a local scope for any extra temporaries we might need.
Lastly, we assume some sort of annotation in C++ code about whether a pointer is
nullable or never null.
## Automated cleanup passes
We assume the first pass would convert the C++ code to an unsafe version of Rust
as similar as possible to the original code. This includes:
- Using raw pointers instead of references, keeping the same behavior if they
are nullptr and passing along any lifetime annotation present in C++ code
- Using wrapping unsigned arithmetic
- Replacing any explicit or implicit C++ casts by Rust casts that truncate if
the content doesn't fit
- Implementing `T::Drop` for types with different destruction order between
C++ and Rust (structs, tuples, arrays/owned slices and enum variants), to
keep the same destruction order as in the C++ implementation. This is only
done if the type has 2+ non-trivially-destructible fields.
After the first pass is done, we can run multiple cleanup passes, running tests
in each pass to see if they respect expected behavior. Some of them are safe to
be applied automatically, while for others we might prefer to require human
review (we should strive to make the latter simple and fast to review).
Automated passes include:
- Replace raw pointers with lifetime annotations by references, checking for
aliasing issues using the borrow checker. Nullable raw pointers would be
replaced by an Option containing a reference.
- Rename class fields to not have a trailing underscore.
- Move any automatically-generated nested statements to separate statements.
Example includes unsafe blocks automatically added inside other expressions
when dealing with raw pointers and (possibly) constructor calls of
non-primitive types. This should probably be done after the other automatic
cleanup passes, since some of them might remove unsafe blocks nested inside
other statements, thus making this step unnecessary.
Passes that require human review include:
- Replace wrapping arithmetic by normal arithmetic that panics on overflow
- Converting raw pointers without lifetime annotations into references with
elided lifetime annotations (if it passes the borrow checker aliasing rules)
- Converting raw pointers with lifetime annotations into references when it
requires usage of `RefCell`.
- Removing automatically-generated `T::Drop` implementations if the user
confirms that the destruction order does not matter for the type T.
### Converting raw pointers into references
The cleanup pass to convert Rust raw pointers that have a lifetime annotation
into references is probably the most complicated one, because we need guarantees
that opaque functions (including FFI calls) don't alias pointers. This also
includes objects that contain at least a pointer T\* and either another field of
the same type T, another pointer of the same type T\* or another type that
contains T inside. Our strategy here is multifold:
- We will create a specific annotation for C++ code to tag an argument of a
function that can be aliased. If Rust code calls this function passing a
pointer, we assume the pointer can break aliasing rules, so both this
pointer and any other pointer that aliases it in Rust code will not be
refactored into references (they will be kept as raw pointers).
- For pointers that are not tagged with this annotation we run some static
analysis (at conversion time) to check that the transitive closure of all
function calls from Rust into C++ that take a pointer/reference do not
introduce aliasing. If it finds that a function F() (called directly from
Rust) introduces aliasing--either because it introduces alias itself or
because it indirectly calls a function that does it--we will not allow the
Rust code to import F(). In this case the user should either mark those
arguments of the function F() with the annotation mentioned in the previous
item (and use raw pointers in Rust) or, if possible, should refactor F() not
to alias.
## Code snippets
All code below assumes C++ pointers are annotated as not nullable, unless stated
otherwise.
### Arithmetic
#### Example 1
Original C++ code:
``` cpp
#include <stdint.h>
uint32_t AddAndCast(uint64_t x, uint64_t y) {
return x + y;
}
```
Rust code after the first pass:
``` rust
pub fn add_and_cast(x: u64, y: u64) -> u32 {
x.wrapping_add(y) as u32
}
```
Rust code after the arithmetic cleanup pass, triggered manually:
``` rust
pub fn add_and_cast(x: u64, y: u64) -> u32 {
(x + y) as u32
}
```
### Side effects
#### Example 1
Original C++ code:
``` cpp
#include <stdint.h>
void UseUInt32T(uint32_t x);
uint32_t UsePtrs(uint32_t* x, uint32_t * $a y) {
UseUInt32T((*x)++);
UseUInt32T(++(*x));
UseUInt32T((*y));
return *x;
}
```
Rust code after the first pass:
``` rust
fn use_u_int32_t(x: u32) {
// translated based on the implementation in the .cc file or
// imported from another crate
}
pub fn use_ptrs(x: *mut u32, y: *mut /* $a */ u32) -> u32 {
use_u_int32_t(
unsafe {
// c++ version:
// (*x)++
let temp1 = *x;
*x += 1;
temp1
}
);
use_u_int32_t(
unsafe {
// c++ version:
// ++(*x)
*x += 1;
*x
}
);
use_u_int32_t(
unsafe {
// c++ version:
// (*y)
*y
}
);
unsafe {
// c++ version:
// (*x)
*x
}
}
```
Rust code after converting raw pointers with lifetime annotations to references:
``` rust
fn use_u_int32_t(x: u32) {
// translated based on the implementation in the .cc file or
// imported from another crate
}
pub fn use_ptrs(x: *mut u32, y: &'a mut u32) -> u32 {
use_u_int32_t(
unsafe {
// c++ version:
// (*x)++
let temp1 = *x;
*x += 1;
temp1
}
);
use_u_int32_t(
unsafe {
// c++ version:
// ++(*x)
*x += 1;
*x
}
);
use_u_int32_t(*y);
unsafe {
// c++ version:
// (*x)
*x
}
}
```
Rust code after pass to move nested blocks to separate line:
``` rust
fn use_u_int32_t(x: u32) {
// translated based on the implementation in the .cc file or
// imported from another crate
}
pub fn use_ptrs(x: *mut u32, y: &'a mut u32) -> u32 {
let temp1 = unsafe {
// c++ version:
// (*x)++
let temp1 = *x;
*x += 1;
temp1
};
use_u_int32_t(temp1);
let temp2 = unsafe {
// c++ version:
// ++(*x)
*x += 1;
*x
};
use_u_int32_t(temp2);
use_u_int32_t(*y);
unsafe {
// c++ version:
// (*x)
*x
}
}
```
#### Example 2 - pointer aliasing in Rust code
Original C++ code:
``` cpp
#include <stdint.h>
uint32_t CreateAlias(uint32_t* x, uint32_t * $a y) {
y = x;
return *y;
}
```
Rust code after the first pass:
``` rust
pub fn create_alias(x: *mut u32, y: *mut /* $a */ u32) -> u32 {
y = x;
unsafe {
// c++ version:
// (*y)
*y
}
}
```
Since `CreateAlias` does alias y and x, we won't be able to convert raw pointers
with lifetime annotations (in this case y) to references, because x has no
lifetime annotation. If both had a matching lifetime annotation the tool would
try to convert them to references.
#### Example 3 - annotated aliasing function that takes pointers
Original C++ code:
``` cpp
#include <stdint.h>
void AnnotatedFfiFunctionThatCreatesAlias(
/* alias_tag */ uint32_t* x, /* alias_tag */ uint32_t* y);
uint32_t UsePtrs(uint32_t * $a x, uint32_t * $b y) {
AnnotatedFfiFunctionThatCreatesAlias(x, y);
return *y;
}
```
Rust code after the first pass:
``` rust
import crate_name::annotated_ffi_function_that_creates_alias;
// signature:
// pub fn annotated_ffi_function_that_creates_alias(x: *mut u32, y: *mut u32);
pub fn use_ptrs(x: *mut /* $a */ u32, y: *mut /* $b */ u32) -> u32 {
annotated_ffi_function_that_creates_alias(x, y);
unsafe {
// c++ version:
// (*y)
*y
}
}
```
Since `annotated_ffi_function_that_creates_alias` has annotations in both
arguments that it introduces alias, we will not run the cleanup pass to convert
the raw pointers into references.
#### Example 4 - non-annotated aliasing function that takes object
Original C++ code:
``` cpp
#include <stdint.h>
#include other_file.h
struct S [[lifetime_param(a)]] {
uint32_t *$a x;
uint32_t *$a y;
}
S createS(uint32_t * $a x, uint32_t * $a y) {
S s = S{x, y};
MakeInternalAlias(&s);
return s;
}
//////////////////////////////////////////////////
// other_file.h
//////////////////////////////////////////////////
// Assume the following function is not being converted into Rust right now (it
// will be called via FFI).
void MakeInternalAlias(S* s) {
s.x = s.y;
}
```
The automated conversion tool <span style="text-decoration:underline;">would
try</span> to generate the following Rust code:
``` rust
import crate_name::make_internal_alias;
// signature:
// pub fn make_internal_alias(s: *mut /* $a */ S);
struct S /* $a */ {
x: *mut /* $a */ u32,
y: *mut /* $a */ u32,
}
pub fn create_s(x: *mut /* $a */ u32, y: *mut /* $a */ u32) -> S {
let mut s = S {x, y};
make_internal_alias(&s);
s
}
```
<span style="text-decoration:underline;">Nonetheless the tool would fail</span>,
since `make_internal_alias` is not annotated as creating alias and the syntactic
analysis would recognize it does so. Therefore the tool would not be able to
import `make_internal_alias` into Rust code and would either ask the user to
annotate the arguments of `MakeInternalAlias` as being aliased or to make the
function not alias (which, in this case, is not possible at all).
#### Example 5 - aliasing function that creates object
Original C++ code:
``` cpp
#include <stdint.h>
#include other_file.h
struct S [[lifetime_param(a)]] {
uint32_t *$a x;
uint32_t *$a y;
}
S createS(uint32_t * $a x) {
return createSWithAlias(x);
}
//////////////////////////////////////////////////
// other_file.h
//////////////////////////////////////////////////
// Assume the following function is not being converted into Rust right now (it
// will be called via FFI).
S CreateSWithAlias(uint32_t * $a x) {
return S{x, x};
}
```
The automated conversion tool <span style="text-decoration:underline;">would
try</span> to generate the following Rust code:
``` rust
import crate_name::create_s_with_alias;
// signature:
// pub fn create_s_with_alias(x: *mut /* $a */ u32);
struct S /* $a */ {
x: *mut /* $a */ u32,
y: *mut /* $a */ u32,
}
pub fn create_s(x: *mut /* $a */ u32) -> S {
create_s_with_alias(x)
}
```
But <span style="text-decoration:underline;">the tool would fail</span>, since
the static analysis would detect that `create_s_with_alias` creates an alias of
x. Therefore once again the tool would not be able to import
`create_s_with_alias` and would either ask the user to annotate the argument of
`CreateSWithAlias` as being aliased or to make it not alias (which, in this
case, is not possible at all).
### Destructor order
Struct fields in Rust are dropped in declaration order, while in C++ they are
dropped in reverse declaration order. This is also true for enum variants,
tuples, arrays and owned slices. To make matters worse, if Rust panics during
construction of the object, then the fields are dropped in reverse order of
declaration. In other words, the field drop order in Rust depends on when the
object is dropped (i.e. during panic unwind or not).
Therefore the safest solution is for the tool to create `T::Drop`
implementations for any type with 2+ non-trivially-destructible types in the
first pass. The `T::Drop` implementation for types with 2+
non-trivially-destructible fields calls all drop methods in the object fields in
reverse order of declaration (i.e. matching the C++ destruction order). This
method should also wrap `ManuallyDrop<T>` on each of the
(non-trivially-destructible) struct fields and call `MemoryDrop::drop` on them
(analogously for other types mentioned above). For this to work with inlined
arrays/tuples it might also be necessary to wrap it in a Rust struct in order to
be able to implement `Drop` on them. If there is a manually-implemented
destructor for the class, this destructor code will be converted into Rust and
be executed before the `MemoryDrop::drop` calls in the Drop implementation.
Lastly, if the type contains at most one non-trivially destructible field we can
ignore destruction order altogether.
Afterwards, in a cleanup pass, the tool will ask the user to specify whether the
destruction order of each array/owned slice, tuple and struct (and C++ class)
fields matters. If not, we can delete the `T::Drop` implementation, the
`ManuallyDrop` and inlined-arrays/tuple struct wrappers, and just depend on Rust
behavior, even if it doesn't match the C++ one. If the user confirms that the
destruction order matters, then we keep the `T::Drop` implementation for those
types.
#### Example 1:
As a first example, let's look at a C++ class where the destruction order
matters. In the example below `MyFileReader` stores the file path (as a string)
and a `FileHandle`, while the latter stores a `string_view` to the file path.
Therefore the `FileHandle` should be destructed before the file path string.
Original C++ code:
``` cpp
class FileHandle {
  FileHandle(absl::string_view file_path): file_path_(file_path) {
    // ...
  }
  ~FileHandle() {
    // ...
  }
  //...
private:
  absl::string_view file_path_;
  //...
}
class MyFileReader {
  MyFileReader(std::string file_path) {
    this.file_path_ = file_path;
    this.file_handle_ = MyFileHandle(file_path_)
  }
  ~MyFileReader() {
    // <user-specified destructor logic>
  }
  string_view ReadLine() {
    // ...
  }
private:
  const std::string file_path_;
  FileHandle file_handle_;
}
```
Therefore the tool will realize that the destruction order of `MyFileReader`
fields might matter (since both `std::string` and `MyFileHandle` are
non-trivially-destructible) and should match the C++ one. Therefore the
automated conversion tool would generate the following Rust code (assume here
that `string_view` is a type in Rust):
``` rust
struct FileHandle {
file_path: string_view,
//...
}
impl FileHandle {
pub fn new(string_view file_path) {
FileHandle {
file_path,
// ...
}
}
}
impl Drop for FileHandle {
fn drop(&mut self) {
// ...
}
}
struct MyFileReader {
file_path: ManuallyDrop<String>,
file_handle: ManuallyDrop<FileHandle>,
}
impl MyFileReader {
pub fn new(String file_path) {
MyFileReader {
file_path,
MyFileHandle(file_path),
}
}
pub string_view ReadLine(&mut self) {
// ...
}
}
impl Drop for MyFileReader {
fn drop(&mut self) {
// <user-specified destructor logic converted into Rust>
ManuallyDrop::drop(self.file_handle);
ManuallyDrop::drop(self.file_path);
}
}
```
#### Example 2: tuples and arrays
For tuples/arrays the tool behavior would be exactly the same as for a struct
(see example above). The only difference is that in a first step the tool will
create a wrapper struct around the tuple/array (with a single tuple/array public
field), and the tuple/array field inside would contain `ManuallyDrop<T>`, where
T was the type contained inside the typle/array. The drop method for this struct
will be a reverse for-loop, dropping each element of the tuple/array (from last
to first) by calling `ManuallyDrop::drop` on it.
### Temporaries & Order of function call within expression
In [C++ the evaluation
order](https://en.cppreference.com/w/cpp/language/eval_order) (including both
value computation and side effects) of temporaries in the same expression are,
in general, unspecified (i.e. they can happen at whatever order, and may
interleave). In Rust things are [not so well
documented](https://github.com/rust-lang/reference/issues/248): we only know
what Niko said, which is that Rust evaluates things roughly left-to-right,
except for assignments, which are right-to-left. Therefore we should be
concerned about cases where C++ evaluates right-to-left and Rust evaluates
left-to-right.
The only such example is new-assignments in C++, where the call to `new` is
(since C++17) sequenced-before the evaluation of constructor arguments, while in
Rust the order of evaluation of the memory allocation and the struct fields
evaluation is not clear. Nonetheless I can't think of a single interesting
example where the constructor fails and behaviors differ between C++ and Rust.
Of course there are cases where C++ evaluation order is unspecified or
indeterminately sequenced and Rust's evaluation order is specified (example:
evaluation order in expression). Nonetheless this isn't very interesting in the
context of converting non-buggy C++ code to Rust (since non-buggy C++ code
should not rely on an unspecified evaluation order). Therefore I believe the
tool doesn't have to worry about anything specific to these topics.
### Open questions
- What to do with variadic functions? Replace the varargs by an array has the
problem that the array size must be known at compile time. Vec implies a
memory allocation. Probably slices are the natural way to go for the
signature, and all callers could create an array at the call site.
- What to do about function overloading? Rust does not allow functions with
the same name and different signatures. Maybe some naming convention should
be enough.
- What about `void*`? Devin mentioned the interop tool will probably use
extern types. Another option would be to use `*mut libc::c_void` or
`*libc::c_void`. We should probably not use `*Void`, since it's a zero-sized
type (which in Rust has 0 bytes, and in C++ has 1 byte, so it can't move
across FFI boundaries). Maybe since it's a pointer to `Void` it's OK (ptr
would always occupy word-size bytes, no matter the type it points to)?
- How to handle class inheritance? I guess pure abstract classes in C++ are
more straightforward to convert to traits, but I'm not sure about
non-abstract classes.
- How to convert C++ namespaces to Rust?
- How to convert C++ templates to Rust?