 # Best Practices Writing Rust Bindings for Existing C++ Libraries

 ## Introduction

 This document offers guidance on adding Rust support to existing C++ libraries,
 including core foundational libraries.

 For an introduction, see [Rust Bindings for C++ Libraries](index.md).

 ## Code Organization {#organization}

 For technical reasons, it is generally necessary for the C++ library and its
 Rust bindings to be the same Bazel target. It is not possible to define the Rust
 bindings for a target as a completely separate and independent target. The
 automatically generated bindings, and their configuration, must be on and in the
 C++ target itself.

 <section class="zippy" markdown="1">

 The reasons why are fairly technical, and you can stop reading here if you're OK
 with this.

 ### Technical Justification

 Crubit generates bindings using
 [Bazel **aspects**](https://bazel.build/extending/aspects): given an arbitrary
 C++ Bazel target, Crubit generates, in an aspect, the Rust library which wraps
 it. To users it appears as if the Bazel target were both a C++ and a Rust
 library.

 This is necessary for the same reason that it's necessary for protocol buffers.
 And, just like protocol buffers, this means that we don't have a `rust_library`
 target where we could customize its behavior using Bazel attributes.

 Specifically, we cannot use a regular Bazel rule for bindings generation because
 the rule cannot generate bindings for transitive dependencies: if A depends on
 B, then bindings(A) depends on bindings(B), so that bindings(A) can wrap
 functions in A that return types from B, and so on. (See
 [FAQ: Why can't we use separate rules?](#faq_separate_rules))

 Because bindings are generated in an aspect, and not a rule, there are only two
 places to configure the bindings of a target A:

 *   In the **source code** of the target receiving Rust support, using
     configuration pragmas or attributes. (This is similar to protocol buffers.)
 *   In the **BUILD file**, on the target receiving Rust support, via
     `aspect_hints`. Aspect hints are a storage location for configuration data,
     readable by the aspect, placed directly on the target that the aspect runs
     on.

 Generally speaking, it's better to modify the source code than to configure
 externally via aspect hints. However, some source code annotations are
 nonstandard and can have performance implications (see b/321933939). In
 addition, source code is not readable from the build system itself, so any
 configuration that requires customizing the build graph must go in aspect
 hints.

 For these reasons, most publicly available methods of customizing bindings
 currently use aspect hints.

 In any case, any configuration or support for Rust is done *directly* to the
 target.

 ### Example

 To enable Crubit on a C++ target, one actually modifies the target itself,
 adding `aspect_hints = ["//features:supported"]`. This must
 be an aspect hint, not a source code annotation, for all of the above reasons:

 1.  It makes the build faster and more resilient: when Crubit is disabled on a
     target, Bazel needs to know so it can completely avoid running Crubit on it.
 2.  There is no stable, reliable, and style-approved header-wide pragma we can
     use for enabling/disabling Crubit, but `aspect_hints` does work.

 ### FAQ: Why can't we use separate rules? {#faq_separate_rules}

 A library `A`, and its bindings `bindings(A)`, must be linked together in the
 build graph: if `B` uses a type from `A`, then `bindings(B)` uses a type from
 `bindings(A)`.

 Crucially, this also goes in reverse: if a *Rust library* `C` uses a type from
 `bindings(A)`, then `reverse_bindings(C)` uses a type from `A`. This forms a
 natural dependency cycle: the build graph must understand both the link from `A`
 to `bindings(A)`, and the link from `bindings(A)` to `A`.

 Crubit resolves this by making `A` and `bindings(A)` the same target in the
 build graph: bindings for a target are obtained by reading an aspect on the
 target.

 It is not possible to make `A` one build target, and `bindings(A)` a separate
 build target, call it `X`:

 1.  We cannot literally configure on `A` that its bindings are in a different
     target `X`, because this ends up producing a real dependency cycle, as
     mentioned above: if `bindings(A)` = `X`, then `reverse_bindings(X)` = `A`.
 2.  We cannot avoid the cycle by creating the dependency "lazily", or
     "dynamically" based on e.g. a naming scheme during Bazel analysis. Bazel
     dependencies cannot be discovered dynamically; once Bazel reaches this point
     of evaluation, dependencies need to be fully resolved: labels in `deps` are
     no longer strings in this stage, they are edges in a dependency graph. That
     graph must not have cycles.
 3.  In some limited cases, we *can* hardcode the relationship within Crubit:
     Crubit is actually two aspects, each of which handles a single direction of
     interop. So Crubit can hardcode inside of itself that `bindings(A)` = `X`,
     and in the other half, that `reverse_bindings(X)` = `A`. This requires that
     Crubit itself depends on `A` and `X`. Therefore, to avoid another dependency
     cycle, neither `A` nor `X` can depend/use Crubit in their transitive
     dependencies. This is not feasible except in very isolated cases. Currently,
     we only do this for the Rust and C++ standard libraries.

 To compare with another similar technology, PyCLIF avoids this problem because
 it only supports "one-directional" interop, and so it doesn't need to avoid
 dependency cycles. Crubit is bidirectional, and this comes with some technical
 restrictions.
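
The cycle argument above can be illustrated with a toy model. This is only a sketch: the `Graph` type and the `has_cycle`, `separate_targets`, and `merged_target` functions are hypothetical names invented for illustration, and the code does not reflect how Bazel or Crubit represent targets internally. It assumes that "the bindings of `A` live in `X`" and "the reverse bindings of `X` live in `A`" each contribute one dependency edge.

```rust
use std::collections::HashMap;

/// Toy dependency graph: each target maps to the targets it depends on.
/// (Hypothetical model; not Bazel's real data structures.)
pub type Graph = HashMap<&'static str, Vec<&'static str>>;

/// Visit state used by the depth-first search.
enum Mark {
    InProgress,
    Done,
}

/// Returns true if a cycle is reachable from `node`.
fn visit(
    graph: &Graph,
    node: &'static str,
    marks: &mut HashMap<&'static str, Mark>,
) -> bool {
    match marks.get(node) {
        // Reaching a node that is still being visited means a back edge.
        Some(Mark::InProgress) => return true,
        Some(Mark::Done) => return false,
        None => {}
    }
    marks.insert(node, Mark::InProgress);
    if let Some(deps) = graph.get(node) {
        for &dep in deps {
            if visit(graph, dep, marks) {
                return true;
            }
        }
    }
    marks.insert(node, Mark::Done);
    false
}

/// Returns true if the graph contains any dependency cycle.
pub fn has_cycle(graph: &Graph) -> bool {
    let mut marks = HashMap::new();
    graph.keys().any(|&node| visit(graph, node, &mut marks))
}

/// `A` names `X` as its bindings, and `X` maps back to `A` in reverse.
pub fn separate_targets() -> Graph {
    HashMap::from([("A", vec!["X"]), ("X", vec!["A"])])
}

/// `A` and `bindings(A)` are one node; `B` simply depends on `A`.
pub fn merged_target() -> Graph {
    HashMap::from([("A", vec![]), ("B", vec!["A"])])
}

fn main() {
    assert!(has_cycle(&separate_targets()));
    assert!(!has_cycle(&merged_target()));
}
```

In this model, splitting the bindings into a separate target `X` produces a cycle, while treating `A` and `bindings(A)` as a single node does not.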

 ### FAQ: Why are there extra dependencies in `deps(target)`? {#faq_dependency_edges}

 Because the Rust bindings are created using an **aspect** on the C++ target,
 everything that the Rust bindings need to depend on will appear in a Bazel query
 / depserver query for `deps(target)`.

 For example, if you wanted to add an extra source file to the Rust bindings,
 you might specify it in `aspect_hints`. This file will show up in
 `deps(target)`.

 These Rust-only deps are not used at all in pure-C++ builds (the Bazel actions
 registered by them won't be executed), but they will show up in the dependency
 graph anyway, due to how Bazel query and depserver track dependencies.

 NOTE: In particular, if your project has tests that count/limit the transitive
 dependencies of a C++ binary, they will overcount the dependencies, and the
 overcounting will get worse as Rust support is rolled out through the C++ build
 graph.

 </section>

 ## Wrapping and type bridging vs direct use of types {#bridging}

 Crubit automatically generates layout-compatible Rust equivalents of C++ types.
 When the C++ type is [Rust-movable](classes_and_structs.md#rust_movable), the
 Crubit-generated Rust type is also Rust-movable, and it can be used by value, by
 pointer, in struct fields, in arrays, and in any other compound data type. A C++
 pointer `const T*` can become a Rust `*const T`, a C++ `T` field can become
 a Rust `T` field, and so on, with few restrictions.

 For example, the following C++ type:

 ```c++
 struct Vec2d {
     float x;
     float y;
 };
 ```

 Becomes (roughly) the following Rust type:

 ```rust
 #[repr(C)]
 struct Vec2d {
     pub x: f32,
     pub y: f32,
 }
 ```

 These have an identical layout, and so a C++ pointer or field containing a C++
 `Vec2d` is exactly equivalent to a Rust pointer or field containing a Rust
 `Vec2d`.

 (See [Types](../types/) for more information about layout-compatibility.)
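
The layout claim can be checked from the Rust side. The sketch below uses a hand-written copy of the `Vec2d` struct rather than real Crubit output, and the helper names `offset_of_y` and `read_vec2d` are hypothetical. It assumes a platform where `float` and `f32` are both 4 bytes with 4-byte alignment.

```rust
use std::mem::{align_of, size_of};

/// Hand-written stand-in for the generated type shown above.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vec2d {
    pub x: f32,
    pub y: f32,
}

/// Byte offset of the `y` field from the start of the struct.
pub fn offset_of_y() -> usize {
    let v = Vec2d { x: 0.0, y: 0.0 };
    let base = &v as *const Vec2d as usize;
    let field = &v.y as *const f32 as usize;
    field - base
}

/// Reads a `Vec2d` through a raw pointer, as one would for a C++
/// `const Vec2d*`.
///
/// # Safety
/// `ptr` must be non-null, properly aligned, and point to a live `Vec2d`.
pub unsafe fn read_vec2d(ptr: *const Vec2d) -> Vec2d {
    unsafe { *ptr }
}

fn main() {
    // With #[repr(C)], fields are laid out in declaration order, as in C++.
    assert_eq!(size_of::<Vec2d>(), 8);
    assert_eq!(align_of::<Vec2d>(), 4);
    assert_eq!(offset_of_y(), 4);

    let v = Vec2d { x: 1.0, y: 2.0 };
    let copy = unsafe { read_vec2d(&v) };
    assert_eq!(copy, v);
}
```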

 Because of this, it is often not required to manually write any new types. The
 bindings generated by Crubit will produce a working type automatically.

 ### When to wrap a type

 There are, still, a handful of reasons to manually write "wrapper" types which
 encapsulate or replace the original C++ type (or its Crubit-generated Rust
 type).

 *   If the type is **not** naturally Rust-movable, but it's important for the
     Rust type to be Rust-movable. It may be possible to make changes to the C++
     code to make the type Rust-movable using some of the strategies described in
     [the cookbook](cookbook.md#rust_movable). This allows the greatest
     flexibility, as the type becomes usable in almost every context. But if that
     is not possible, writing a new "wrapper" type can keep Rust programmers
     productive.
 *   Some Rust types have very special semantics, which are impossible to
     implement in the bindings for a C++ type. For example, Rust has special
     support for `Result` and `Option` in error handling via the `?` operator,
     which cannot yet be implemented by `Status` or `std::optional` using stable
     Rust features. These privileged Rust types can be used instead of the
     equivalent C++ types, as a wrapper type.

 In these cases, we may bridge to a wrapper type as a workaround while we work
 on the underlying issues that prevent direct use of the original type. This
 offers a subset of the API we want, and allows continued progress.
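
As a rough illustration of the second case, the sketch below bridges a status-like value to `Result` so that callers can use `?`. Everything here is hypothetical: the `Status` struct, `StatusError`, `into_result`, and `open_resource` are invented stand-ins, not the real bindings or the real bridging mechanism.

```rust
/// Hypothetical stand-in for a bound C++ status type.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    pub code: i32, // 0 means OK in this sketch
    pub message: String,
}

/// Error half of the bridged `Result`, holding a non-OK status.
#[derive(Debug, Clone, PartialEq)]
pub struct StatusError(pub Status);

impl Status {
    pub fn success() -> Self {
        Status { code: 0, message: String::new() }
    }

    pub fn failure(code: i32, message: &str) -> Self {
        Status { code, message: message.to_string() }
    }

    /// Bridges by value to the privileged Rust type so that `?` works.
    pub fn into_result(self) -> Result<(), StatusError> {
        if self.code == 0 {
            Ok(())
        } else {
            Err(StatusError(self))
        }
    }
}

/// Hypothetical wrapped C++ call that reports failure through `Status`.
fn open_resource(name: &str) -> Status {
    if name.is_empty() {
        Status::failure(3, "empty name")
    } else {
        Status::success()
    }
}

/// A Rust caller that relies on `?` for early return on the first error.
pub fn open_two(a: &str, b: &str) -> Result<(), StatusError> {
    open_resource(a).into_result()?;
    open_resource(b).into_result()?;
    Ok(())
}

fn main() {
    assert_eq!(open_two("x", "y"), Ok(()));
    let err = open_two("x", "").unwrap_err();
    assert_eq!(err.0.code, 3);
}
```

The conversion consumes the status by value, which is the situation where bridging is least costly.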

 ### Why not to wrap a type

 Wrapper types work best when passed by value: if you return a `T` in C++, the
 corresponding Rust function can automatically convert it to and return a
 `WrappedT`.

 However, no conversion is possible for references or fields, which really are
 the original type, with its size, alignment, and address in memory. Making this
 work transparently requires an ever-expanding network of wrapper types, one for
 every compound data type that might contain `T`:

 *   `T` must become `WrappedT`
 *   `const T&`, if it is supported at all, must become something like
     `TRef<'a>`, or a dynamically sized `&TView`.
 *   `std::vector<T>`, if it is supported at all, must become something like
     `TVector`.
 *   `struct MyStruct {T x;}` must become a wrapped `WrappedMyStruct`.
 *   ...

 The problems introduced by wrapper types can easily outweigh the benefits that
 they bring. Crubit aims to reduce their necessity to zero over time.
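
A small sketch of how the wrapper network grows. The types are hypothetical stand-ins named after the list above, and it assumes `WrappedT` has a different representation from `T`, which is what rules out reinterpreting a `&T` as a `&WrappedT`.

```rust
/// Hypothetical stand-in for the generated binding of a C++ type `T`.
#[derive(Debug, Clone, PartialEq)]
pub struct T {
    pub id: u32,
}

/// Hand-written wrapper with a different representation than `T`.
#[derive(Debug, Clone, PartialEq)]
pub struct WrappedT {
    pub label: String,
}

/// By value, conversion is easy: consume the `T` and build a `WrappedT`.
impl From<T> for WrappedT {
    fn from(t: T) -> Self {
        WrappedT { label: format!("t#{}", t.id) }
    }
}

/// A `&T` points at memory that holds a `T`, not a `WrappedT`, so a second
/// wrapper type is needed just to cover references.
pub struct TRef<'a>(&'a T);

impl<'a> TRef<'a> {
    pub fn new(t: &'a T) -> Self {
        TRef(t)
    }

    pub fn label(&self) -> String {
        format!("t#{}", self.0.id)
    }
}

/// Every compound type containing `T` needs its own wrapper as well.
pub struct MyStruct {
    pub x: T,
}

pub struct WrappedMyStruct {
    pub x: WrappedT,
}

impl From<MyStruct> for WrappedMyStruct {
    fn from(s: MyStruct) -> Self {
        WrappedMyStruct { x: s.x.into() }
    }
}

fn main() {
    let t = T { id: 7 };
    assert_eq!(TRef::new(&t).label(), "t#7");

    let wrapped: WrappedT = t.into();
    assert_eq!(wrapped.label, "t#7");

    let ws: WrappedMyStruct = MyStruct { x: T { id: 9 } }.into();
    assert_eq!(ws.x.label, "t#9");
}
```

Three wrapper types are already needed for one underlying type, and none of them helps with a `T` stored inside a container.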

 #### *Bad reasons to wrap a type*

 In most other circumstances where one might *want* to reach for wrapper types,
 alternatives exist:

 *   If we want to use a wrapper type in order to give the type a nicer Rust API,
     then, as an alternative, one can customize the Rust API of the wrapped type
     using an aspect hint. You can define new methods and trait implementations
     to the side, without altering any C++ code.

 *   If we want to use a wrapper type in order to change the type invariants – to
     make them stricter or looser – this is fine, as long as it doesn't *replace*
     the not-as-nice type. For example, if a C++ API returns `std::string`
     (bytes, "probably" UTF-8), the Rust equivalent should not return a Rust
     `String` (Unicode, definitely UTF-8). Changing type invariants in-place
     causes some APIs to become impossible to call, and causes the Rust and C++
     ecosystems to diverge and become incompatible. The bindings should be high
     fidelity. Wrapper types of this form should be optional, and available
     equally to both C++ and Rust to avoid fragmenting the ecosystem.
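
The `std::string` example can be made concrete with the standard library alone. In this sketch, `get_name_bytes` and `as_utf8` are hypothetical functions; the first stands in for a bound C++ API that returns arbitrary bytes, and the second is an optional stricter view offered next to it.

```rust
/// Hypothetical high-fidelity binding: returns the bytes exactly as the
/// C++ side produced them.
pub fn get_name_bytes() -> Vec<u8> {
    // 0xFF never appears in valid UTF-8, but a C++ std::string may hold it.
    vec![b'a', 0xFF, b'b']
}

/// Optional, stricter view offered alongside the byte API, not instead of it.
pub fn as_utf8(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    let raw = get_name_bytes();

    // The bytes survive the round trip unchanged.
    assert_eq!(raw, vec![b'a', 0xFF, b'b']);

    // A binding that returned `String` would have to fail here...
    assert!(String::from_utf8(raw.clone()).is_err());

    // ...or silently alter the data with replacement characters.
    let lossy = String::from_utf8_lossy(&raw);
    assert!(lossy.contains('\u{FFFD}'));
    assert_ne!(lossy.as_bytes(), raw.as_slice());

    // Callers who want the stricter invariant can opt in.
    assert!(as_utf8(&raw).is_none());
    assert_eq!(as_utf8(b"hello"), Some("hello"));
}
```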

 </section>

 ## Fidelity {#fidelity}

 Anything possible in C++ should be possible in Rust. See
 [<internal link>](http://goto.google.com/no-hitchhikers).

 The Rust API for a given C++ API should not try to make the interface "better"
 at more than a superficial level, because it can compromise the ability of other
 teams to write new Rust code, or port existing C++ code to Rust.

 **Good changes:**

 *   Changing method names, especially to names that Rust callers might expect.
     For example, changing `Status::ok()` (C++) to `Status::is_ok()` (Rust) –
     Rust callers expect many of these boolean functions to be prefixed with
     `is_`.
 *   Adding new APIs that Rust users expect. For example, trait implementations
     that allow the type to better interoperate with the Rust ecosystem, or
     functions which accept a `Path` or `&str` in *addition* to a raw C++
     `string_view`.
 *   Reifying C++ comments around lifetime or safety as actual lifetime
     annotations or `unsafe` declarations.

 If the Rust type is outright unnatural to use, people won't use it, and it's
 worse for the ecosystem to have two APIs than one API.
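
The three kinds of good change might look roughly like this. This is a sketch under assumptions: `Status`, `size_hint`, `size_hint_str`, and `first_byte` are hypothetical names, not real Crubit output or real library APIs.

```rust
/// Hypothetical stand-in for a bound C++ status type.
pub struct Status {
    code: i32,
}

impl Status {
    pub fn from_code(code: i32) -> Self {
        Status { code }
    }

    /// C++ spells this `ok()`; Rust callers expect the `is_` prefix.
    pub fn is_ok(&self) -> bool {
        self.code == 0
    }
}

/// Mirrors a hypothetical C++ function taking a `string_view`: any bytes
/// are accepted, exactly as in C++.
pub fn size_hint(name: &[u8]) -> usize {
    name.len()
}

/// Added convenience for Rust callers. The byte API stays available, so
/// no new UTF-8 requirement is imposed.
pub fn size_hint_str(name: &str) -> usize {
    size_hint(name.as_bytes())
}

/// A C++ comment such as "the pointer must be valid" becomes an `unsafe`
/// declaration with a documented contract.
///
/// # Safety
/// `ptr` must be non-null and point to a readable byte.
pub unsafe fn first_byte(ptr: *const u8) -> u8 {
    unsafe { *ptr }
}

fn main() {
    assert!(Status::from_code(0).is_ok());
    assert!(!Status::from_code(5).is_ok());

    assert_eq!(size_hint(&[0xFF, 0x00]), 2);
    assert_eq!(size_hint_str("abc"), 3);

    let data = [7u8, 8];
    assert_eq!(unsafe { first_byte(data.as_ptr()) }, 7);
}
```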

 **Bad changes:**

 *   Removing deprecated APIs which still have C++ callers.
 *   Placing new requirements on Rust callers that were not placed on C++
     callers, such as requiring UTF-8 when C++ does not.