blob: 69d915610d34959331459430c95a5057cc413662 [file] [log] [blame] [view] [edit]
# Best Practices Writing Rust Bindings for Existing C++ Libraries
## Introduction
This document is an attempt at guidance for how Rust changes can be made to
existing C++ libraries, including core foundational libraries.
For an introduction, see [Rust Bindings for C++ Libraries](index.md).
## Code Organization {#organization}
For technical reasons, it is generally necessary for the C++ library and its
Rust bindings to be the same Bazel target. It is not possible to define the Rust
bindings for a target as a completely separate and independent target. The
automatically generated bindings, and their configuration, must be on and in the
C++ target itself.
<section class="zippy" markdown="1">
The reasons why are fairly technical, and you can stop reading here if you're OK
with this.
### Technical Justification
Crubit generates bindings using
[Bazel **aspects**](https://bazel.build/extending/aspects): given an arbitrary
C++ Bazel target, Crubit generates, in an aspect, the Rust library which wraps
it. To users it appears as if the Bazel target was both a C++ and a Rust
library.
This is necessary for the same reason that it's necessary for protocol buffers.
And, just like protocol buffers, this means that we don't have a `rust_library`
target where we could customize its behavior using Bazel attributes.
Specifically, we cannot use a regular Bazel rule for bindings generation because
the rule cannot generate bindings for transitive dependencies: if A depends on
B, then bindings(A) depends on bindings(B), so that bindings(A) can wrap
functions in A that return types from B, and so on. (See
[FAQ: Why can't we use separate rules?](#faq_separate_rules))
Because bindings are generated in an aspect, and not a rule, there are only two
places to configure the bindings of a target A:
* In the **source code** of the target receiving Rust support, using
configuration pragmas or attributes. (This is similar to protocol buffers.)
* In the **BUILD file**, on the target receiving Rust support, via
`aspect_hints`. Aspect hints are a storage location for configuration data,
readable by the aspect, placed directly on the target that the aspect runs
on.
Generally speaking, it's better to modify the source code than to configure
externally via aspect hints. However, some source code annotations are
nonstandard and can have performance implications (see b/321933939). In addition
to this, source code is not readable from the build system itself, and so where
configuring a target requires customizing the build graph, these must go in
aspect hints.
For these reasons, currently most publicly available methods of customizing
bindings occur in aspect hints.
In any case, any configuration or support for Rust is done *directly* to the
target.
### Example
To enable Crubit on a C++ target, one actually modifies the target itself,
adding `aspect_hints = ["//features:supported"]`. This must
be an aspect hint, not a source code annotation, for all of the above reasons:
1. It makes the build faster and more resilient: when Crubit is disabled on a
target, Bazel needs to know so it can completely avoid running Crubit on it.
2. There is no stable, reliable, and style-approved header-wide pragma we can
use for enabling/disabling Crubit, but `aspect_hints` does work.
### FAQ: Why can't we use separate rules? {#faq_separate_rules}
A library `A`, and its bindings `bindings(A)`, must be linked together in the
build graph: if `B` uses a type from `A`, then `bindings(B)` uses a type from
`bindings(A)`.
Crucially, this also goes in reverse: if a *Rust library* `C` uses a type from
`bindings(A)`, then `reverse_bindings(C)` uses a type from `A`. This forms a
natural dependency cycle: the build graph must understand both the link from `A`
to `bindings(A)`, and the link from `bindings(A)` to `A`.
Crubit resolves this by making `A` and `bindings(A)` the same target in the
build graph: bindings for a target are obtained by reading an aspect on the
target.
It is not possible to make `A` one build target, and `bindings(A)` a separate
build target, call it `X`:
1. We cannot literally configure on `A` that its bindings are in a different
target `X`, because this ends up producing a real dependency cycle, as
mentioned above: if `bindings(A)` = `X`, then `reverse_bindings(X)` = `A`.
2. We cannot avoid the cycle by creating the dependency "lazily", or
"dynamically" based on e.g. a naming scheme during Bazel analysis. Bazel
dependencies cannot be discovered dynamically; once Bazel reaches this point
of evaluation, dependencies need to be fully resolved: labels in `deps` are
no longer strings in this stage, they are edges in a dependency graph. That
graph must not have cycles.
3. In some limited cases, we *can* hardcode the relationship within Crubit:
Crubit is actually two aspects, each of which handles a single direction of
interop. So Crubit can hardcode inside of itself that `bindings(A)` = `X`,
and in the other half, that `reverse_bindings(X)` = `A`. This requires that
Crubit itself depends on `A` and `X`. Therefore, to avoid another dependency
cycle, neither `A` nor `X` can depend/use Crubit in their transitive
dependencies. This is not feasible except in very isolated cases. Currently,
we only do this for the Rust and C++ standard libraries.
To compare with another similar technology, PyCLIF avoids this problem because
it only supports "one-directional" interop, and so it doesn't need to avoid
dependency cycles. Crubit is bidirectional, and this comes with some technical
restrictions.
### FAQ: Why are there extra dependencies in `deps(target)`? {#faq_dependency_edges}
Because the Rust bindings are created using an **aspect** on the C++ target,
everything that the Rust bindings need to depend on will appear in a Bazel query
/ depserver query for `deps(target)`.
For example, if you wanted to add some extra source file to the Rust bindings,
you might specify them in `aspect_hints`. This file will show up in
`deps(target)`.
These Rust-only deps are not used at all in pure-C++ builds (the Bazel actions
registered by them won't be executed), but they will show up in the dependency
graph anyway, due to how Bazel query and depserver track dependencies.
NOTE: In particular, if your project has tests that count/limit the transitive
dependencies of a C++ binary, they will overcount the dependencies, and the
overcounting will get worse as Rust support is rolled out through the C++ build
graph.
</section>
## Wrapping and type bridging vs direct use of types {#bridging}
Crubit automatically generates layout-compatible Rust equivalents of C++ types.
When the C++ type is [Rust-movable](classes_and_structs.md#rust_movable), the
Crubit-generated Rust type is Rust-movable, these can be used by value, by
pointer, in struct fields, arrays, and any other compound data type. A C++
pointer `const T*` can become a Rust `*const T`, and a C++ `T` field can become
a Rust `T` field, and so on, with few restrictions.
For example, the following C++ type:
```c++
struct Vec2d {
float x;
float y;
};
```
Becomes (roughly) the following Rust type:
```rust
#[repr(C)]
struct Vec2d {
pub x: f32,
pub y: f32,
}
```
These have an identical layout, and so a C++ pointer or field containing a C++
`Vec2d` is exactly equivalent to a Rust pointer or field containing a Rust
`Vec2d`.
(See [Types](../types/) for more information about layout-compatibility.)
Because of this, it is often not required to manually write any new types. The
bindings generated by Crubit will produce a working type automatically.
### When to wrap a type
There are, still, a handful of reasons to manually write "wrapper" types which
encapsulate or replace the original C++ type (or its Crubit-generated Rust
type).
* If the type is **not** naturally Rust-movable, but it's important for the
Rust type to be Rust-movable. It may be possible to make changes to the C++
code to make the type Rust-movable using some of the strategies described in
[the cookbook](cookbook.md#rust_movable). This allows the greatest
flexibility, as the type becomes usable in almost every context. But if that
is not possible, writing a new "wrapper" type can keep Rust programmers
productive.
* Some Rust types have very special semantics, which are impossible to
implement in the bindings for a C++ type. For example, Rust has special
support for `Result` and `Option` in error handling via the `?` operator,
which cannot yet be implemented by `Status` or `std::optional` using stable
Rust features. These privileged Rust types can be used instead of the
equivalent C++ types, as a wrapper type.
In these cases, we may bridge to a wrapper type as a workaround, while we
hopefully fix the underlying issues that mean we cannot directly use the
underlying type. This offers us a subset of the API we want, and allows
continued progress.
### Why not to wrap a type
Wrapper types work best when passed by value: if you return a `T` in C++, the
corresponding Rust function can automatically convert it to and return a
`WrappedT`.
However, no conversion is possible for references or fields, which really are
the original type, with its size and alignment and address in memory - to make
this work transparently requires an ever-expanding network of wrapper types, one
for every compound data type that might contain `T`:
* `T` must become `WrappedT`
* `const T&`, if it is supported at all, must become something like
`TRef<'a>`, or a dynamically sized `&TView`.
* `std::vector<T>`, if it is supported at all, must become something like
`TVector`.
* `struct MyStruct {T x;}` must become a wrapped `WrappedMyStruct`.
* ...
The problems introduced by wrapper types can easily outweigh the benefits that
they bring. Crubit aims to reduce their necessity to zero over time.
#### *Bad reasons to wrap a type*
In most other circumstances where one might *want* to reach for wrapper types,
alternatives exist:
* If we want to use a wrapper type in order to give the type a nicer Rust API,
then, as an alternative, one can customize the Rust API of the wrapped type
using an aspect hint. You can define new methods and trait implementations
to the side, without altering any C++ code.
* If we want to use a wrapper type in order to change the type invariants – to
make them stricter or looser – this is fine, as long as it doesn't *replace*
the not-as-nice type. For example, if a C++ API returns `std::string`
(bytes, "probably" UTF-8), the Rust equivalent should not return a Rust
`String` (Unicode, definitely UTF-8). Changing type invariants in-place
causes some APIs to become impossible to call, and causes the Rust and C++
ecosystems to diverge and become incompatible. The bindings should be high
fidelity. Wrapper types of this form should be optional, and available
equally to both C++ and Rust to avoid fragmenting the ecosystem.
</section>
## Fidelity {#fidelity}
Anything possible in C++ should be possible in Rust. See
[<internal link>](http://goto.google.com/no-hitchhikers).
The Rust API for a given C++ API should not try to make the interface "better"
at more than a superficial level, because it can compromise the ability of other
teams to write new Rust code, or port existing C++ code to Rust.
**Good changes:**
* Changing method names, especially to names that Rust callers might expect.
For example, changing `Status::ok()` (C++) to `Status::is_ok()` (Rust) –
Rust callers expect many of these boolean functions to be prefixed with
`is_`.
* Adding new APIs that Rust users expect. For example, trait implementations
that allow the type to better interoperate with the Rust ecosystem, or
functions which accept a `Path` or `&str` in *addition* to a raw C++
`string_view`.
* Reifying C++ comments around lifetime or safety as actual lifetime
annotations or `unsafe` declarations.
If the Rust type is outright unnatural to use, people won't use it, and it's
worse for the ecosystem to have two APIs than one API.
**Bad changes:**
* Removing deprecated APIs which still have C++ callers.
* Placing new requirements on Rust callers that were not placed on C++
callers, such as requiring UTF-8 when C++ does not.