Best Practices for Writing Rust Bindings for Existing C++ Libraries

Introduction

This document offers guidance on how to make Rust-related changes to existing C++ libraries, including core foundational libraries.

For an introduction, see Rust Bindings for C++ Libraries.

Code Organization

For technical reasons, it is generally necessary for the C++ library and its Rust bindings to be the same Bazel target. It is not possible to define the Rust bindings for a target as a completely separate and independent target. The automatically generated bindings, and their configuration, must be on and in the C++ target itself.

The reasons why are fairly technical, and you can stop reading here if you're OK with this.

Technical Justification

Crubit generates bindings using Bazel aspects: given an arbitrary C++ Bazel target, Crubit generates, in an aspect, the Rust library which wraps it. To users it appears as if the Bazel target was both a C++ and a Rust library.

This is necessary for the same reason that it's necessary for protocol buffers. And, just like protocol buffers, this means that we don't have a rust_library target where we could customize its behavior using Bazel attributes.

Specifically, we cannot use a regular Bazel rule for bindings generation because the rule cannot generate bindings for transitive dependencies: if A depends on B, then bindings(A) depends on bindings(B), so that bindings(A) can wrap functions in A that return types from B, and so on. (See FAQ: Why can't we use separate rules?)
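
The transitive requirement can be pictured with plain Rust modules standing in for generated crates. This is only an illustrative sketch: the module names `b_bindings` and `a_bindings`, the `Point` type, and the `origin` function are hypothetical and are not actual Crubit output.

```rust
// Hypothetical stand-in for the bindings generated for C++ library B.
mod b_bindings {
    #[repr(C)]
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Point {
        pub x: i32,
        pub y: i32,
    }
}

// Hypothetical stand-in for the bindings generated for C++ library A.
// A's API returns a type defined in B, so this module cannot be written
// without naming `b_bindings::Point`: bindings(A) depends on bindings(B).
mod a_bindings {
    use super::b_bindings::Point;

    pub fn origin() -> Point {
        Point { x: 0, y: 0 }
    }
}

fn main() {
    let p = a_bindings::origin();
    println!("{:?}", p);
}
```

If the bindings for B did not exist, the signature of `origin` could not be expressed at all, which is why bindings generation has to follow the dependency graph rather than stop at a single target.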

Because bindings are generated in an aspect, and not a rule, there are only two places to configure the bindings of a target A:

  • In the source code of the target receiving Rust support, using configuration pragmas or attributes. (This is similar to protocol buffers.)
  • In the BUILD file, on the target receiving Rust support, via aspect_hints. Aspect hints are a storage location for configuration data, readable by the aspect, placed directly on the target that the aspect runs on.

Generally speaking, it's better to modify the source code than to configure externally via aspect hints. However, some source code annotations are nonstandard and can have performance implications (see b/321933939). In addition, source code is not readable from the build system itself, so any configuration that requires customizing the build graph must go in aspect hints.

For these reasons, currently most publicly available methods of customizing bindings occur in aspect hints.

In any case, any configuration or support for Rust is done directly to the target.

Example

To enable Crubit on a C++ target, one actually modifies the target itself, adding aspect_hints = ["//features:supported"]. This must be an aspect hint, not a source code annotation, for all of the above reasons:

  1. It makes the build faster and more resilient: when Crubit is disabled on a target, Bazel needs to know so it can completely avoid running Crubit on it.
  2. There is no stable, reliable, and style-approved header-wide pragma we can use for enabling/disabling Crubit, but aspect_hints does work.

FAQ: Why can't we use separate rules?

A library A, and its bindings bindings(A), must be linked together in the build graph: if B uses a type from A, then bindings(B) uses a type from bindings(A).

Crucially, this also goes in reverse: if a Rust library C uses a type from bindings(A), then reverse_bindings(C) uses a type from A. This forms a natural dependency cycle: the build graph must understand both the link from A to bindings(A), and the link from bindings(A) to A.

Crubit resolves this by making A and bindings(A) the same target in the build graph: bindings for a target are obtained by reading an aspect on the target.

It is not possible to make A one build target, and bindings(A) a separate build target, call it X:

  1. We cannot literally configure on A that its bindings are in a different target X, because this ends up producing a real dependency cycle, as mentioned above: if bindings(A) = X, then reverse_bindings(X) = A.
  2. We cannot avoid the cycle by creating the dependency “lazily”, or “dynamically” based on e.g. a naming scheme during Bazel analysis. Bazel dependencies cannot be discovered dynamically; once Bazel reaches this point of evaluation, dependencies need to be fully resolved: labels in deps are no longer strings in this stage, they are edges in a dependency graph. That graph must not have cycles.
  3. In some limited cases, we can hardcode the relationship within Crubit: Crubit is actually two aspects, each of which handles a single direction of interop. So Crubit can hardcode inside of itself that bindings(A) = X, and in the other half, that reverse_bindings(X) = A. This requires that Crubit itself depends on A and X. Therefore, to avoid another dependency cycle, neither A nor X can depend on or use Crubit in their transitive dependencies. This is not feasible except in very isolated cases. Currently, we only do this for the Rust and C++ standard libraries.
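
The cycle described in the first two points can be reproduced with a toy dependency graph. This is a minimal sketch, not a model of Bazel itself: the graph representation, the `has_cycle` helper, and the two graph-building functions are assumptions made for illustration, and the edges only mirror the relationships described above.

```rust
use std::collections::HashMap;

// Returns true if following edges from `node` ever revisits a node that is
// already on the current path. Fine for tiny graphs; not tuned for large ones.
fn has_cycle<'a>(
    edges: &HashMap<&'a str, Vec<&'a str>>,
    node: &'a str,
    on_path: &mut Vec<&'a str>,
) -> bool {
    if on_path.contains(&node) {
        return true;
    }
    on_path.push(node);
    let found = edges
        .get(node)
        .map_or(false, |deps| deps.iter().any(|&d| has_cycle(edges, d, on_path)));
    on_path.pop();
    found
}

// Hypothetical layout: A names a separate target X as its bindings,
// and X, which wraps A, must point back at A.
fn separate_targets() -> HashMap<&'static str, Vec<&'static str>> {
    let mut g = HashMap::new();
    g.insert("A", vec!["X"]);
    g.insert("X", vec!["A"]);
    g
}

// Layout with a single target: the bindings are read from an aspect on A,
// so no extra edge between two targets is needed.
fn same_target() -> HashMap<&'static str, Vec<&'static str>> {
    let mut g = HashMap::new();
    g.insert("A", vec![]);
    g
}

fn main() {
    println!("separate targets cycle: {}", has_cycle(&separate_targets(), "A", &mut Vec::new()));
    println!("same target cycle: {}", has_cycle(&same_target(), "A", &mut Vec::new()));
}
```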

To compare with another similar technology, PyCLIF avoids this problem because it only supports “one-directional” interop, and so it doesn't need to avoid dependency cycles. Crubit is bidirectional, and this comes with some technical restrictions.

FAQ: Why are there extra dependencies in deps(target)?

Because the Rust bindings are created using an aspect on the C++ target, everything that the Rust bindings need to depend on will appear in a Bazel query / depserver query for deps(target).

For example, if you wanted to add an extra source file to the Rust bindings, you might specify it in aspect_hints. This file will show up in deps(target).

These Rust-only deps are not used at all in pure-C++ builds (the Bazel actions registered by them won't be executed), but they will show up in the dependency graph anyway, due to how Bazel query and depserver track dependencies.

NOTE: In particular, if your project has tests that count/limit the transitive dependencies of a C++ binary, they will overcount the dependencies, and the overcounting will get worse as Rust support is rolled out through the C++ build graph.

Wrapping and type bridging vs direct use of types

Crubit automatically generates layout-compatible Rust equivalents of C++ types. When the C++ type is Rust-movable, the Crubit-generated Rust type is Rust-movable as well, and it can be used by value, by pointer, in struct fields, in arrays, and in any other compound data type. A C++ pointer const T* can become a Rust *const T, a C++ T field can become a Rust T field, and so on, with few restrictions.

For example, the following C++ type:

struct Vec2d {
    float x;
    float y;
};

Becomes (roughly) the following Rust type:

#[repr(C)]
struct Vec2d {
    pub x: f32,
    pub y: f32,
}

These have an identical layout, and so a C++ pointer or field containing a C++ Vec2d is exactly equivalent to a Rust pointer or field containing a Rust Vec2d.
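
The layout claim can be checked from the Rust side. The sketch below redeclares the struct by hand (it does not use generated bindings) and inspects its size, alignment, and field offset; the helper `offset_of_y` is a hypothetical name introduced only for this example. On mainstream targets, where `float` and `f32` are both 4-byte IEEE 754 values, these are the same numbers a C++ compiler produces for the C++ struct.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
pub struct Vec2d {
    pub x: f32,
    pub y: f32,
}

// Byte offset of `y` inside `Vec2d`, computed from addresses only.
fn offset_of_y() -> usize {
    let v = Vec2d { x: 0.0, y: 0.0 };
    let base = &v as *const Vec2d as usize;
    let field = std::ptr::addr_of!(v.y) as usize;
    field - base
}

fn main() {
    println!(
        "size = {}, align = {}, offset of y = {}",
        size_of::<Vec2d>(),
        align_of::<Vec2d>(),
        offset_of_y()
    );
}
```

Without `#[repr(C)]`, Rust is free to choose a different field order, so the attribute is what makes the comparison with the C++ layout meaningful.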

(See Types for more information about layout-compatibility.)

Because of this, it is often not required to manually write any new types. The bindings generated by Crubit will produce a working type automatically.

When to wrap a type

There are, still, a handful of reasons to manually write “wrapper” types which encapsulate or replace the original C++ type (or its Crubit-generated Rust type).

  • If the type is not naturally Rust-movable, but it's important for the Rust type to be Rust-movable. It may be possible to make changes to the C++ code to make the type Rust-movable using some of the strategies described in the cookbook. This allows the greatest flexibility, as the type becomes usable in almost every context. But if that is not possible, writing a new “wrapper” type can keep Rust programmers productive.
  • Some Rust types have very special semantics, which are impossible to implement in the bindings for a C++ type. For example, Rust has special support for Result and Option in error handling via the ? operator, which cannot yet be implemented by Status or std::optional using stable Rust features. These privileged Rust types can be used instead of the equivalent C++ types, as a wrapper type.

In these cases, we may bridge to a wrapper type as a workaround, while we hopefully fix the underlying issues that mean we cannot directly use the underlying type. This offers us a subset of the API we want, and allows continued progress.
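
As a rough illustration of the second case, the sketch below bridges a status-like value to `Result` so that callers can use the `?` operator. The `Status` struct here is a hypothetical stand-in, not the real bindings for any C++ status type, and `into_result`, `open_resource`, and `open_two` are names invented for this example.

```rust
// Hypothetical stand-in for a C++ status type.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    pub code: i32, // 0 means OK in this sketch
    pub message: String,
}

impl Status {
    // Bridge to the privileged Rust type so that callers can use `?`.
    pub fn into_result(self) -> Result<(), Status> {
        if self.code == 0 {
            Ok(())
        } else {
            Err(self)
        }
    }
}

// Stand-in for a wrapped C++ function that reports failure through Status.
fn open_resource(name: &str) -> Status {
    if name.is_empty() {
        Status { code: 3, message: "name must not be empty".to_string() }
    } else {
        Status { code: 0, message: String::new() }
    }
}

// With the bridge in place, errors propagate with `?` as in idiomatic Rust.
fn open_two(a: &str, b: &str) -> Result<(), Status> {
    open_resource(a).into_result()?;
    open_resource(b).into_result()?;
    Ok(())
}

fn main() {
    println!("{:?}", open_two("left", "right"));
    println!("{:?}", open_two("left", ""));
}
```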

Why not to wrap a type

Wrapper types work best when passed by value: if you return a T in C++, the corresponding Rust function can automatically convert it to and return a WrappedT.

However, no conversion is possible for references or fields, which really are the original type, with its size, alignment, and address in memory. Making this work transparently requires an ever-expanding network of wrapper types, one for every compound data type that might contain T:

  • T must become WrappedT
  • const T&, if it is supported at all, must become something like TRef<'a>, or a dynamically sized &TView.
  • std::vector<T>, if it is supported at all, must become something like TVector.
  • struct MyStruct {T x;} must become a wrapped WrappedMyStruct.
  • ...

The problems introduced by wrapper types can easily outweigh the benefits that they bring. Crubit aims to reduce their necessity to zero over time.
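
The first two entries in the list above can be sketched in a few lines. All types here are hypothetical: `T` stands in for a generated, layout-compatible type, and `WrappedT` and `TRef` are hand-written wrappers with made-up fields.

```rust
// Stand-in for a generated type that is layout-compatible with C++ `T`.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct T {
    pub value: i32,
}

// Hand-written wrapper with a different in-memory representation.
#[derive(Debug, PartialEq)]
pub struct WrappedT {
    pub value: i64,
}

// By value, conversion is easy: build a new object from the old one.
impl From<T> for WrappedT {
    fn from(t: T) -> Self {
        WrappedT { value: i64::from(t.value) }
    }
}

// A `&T` points at memory that really holds a `T`, so it cannot be handed
// out as `&WrappedT`. A second wrapper type is needed just for borrows.
pub struct TRef<'a> {
    inner: &'a T,
}

impl<'a> TRef<'a> {
    pub fn new(inner: &'a T) -> Self {
        TRef { inner }
    }

    pub fn value(&self) -> i64 {
        i64::from(self.inner.value)
    }
}

fn main() {
    let t = T { value: 7 };
    let by_ref = TRef::new(&t);
    println!("{}", by_ref.value());
    let by_value: WrappedT = t.into();
    println!("{:?}", by_value);
}
```

Each further container or struct that holds a `T` would need the same treatment, which is the expansion the list describes.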

Bad reasons to wrap a type

In most other circumstances where one might want to reach for wrapper types, alternatives exist:

  • If we want to use a wrapper type in order to give the type a nicer Rust API, then, as an alternative, one can customize the Rust API of the wrapped type using an aspect hint. You can define new methods and trait implementations to the side, without altering any C++ code.

  • If we want to use a wrapper type in order to change the type invariants – to make them stricter or looser – this is fine, as long as it doesn't replace the not-as-nice type. For example, if a C++ API returns std::string (bytes, “probably” UTF-8), the Rust equivalent should not return a Rust String (Unicode, definitely UTF-8). Changing type invariants in-place causes some APIs to become impossible to call, and causes the Rust and C++ ecosystems to diverge and become incompatible. The bindings should be high fidelity. Wrapper types of this form should be optional, and available equally to both C++ and Rust to avoid fragmenting the ecosystem.
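
The std::string example can be made concrete. In this minimal sketch, `read_name` is a hypothetical stand-in for a wrapped C++ function returning std::string, and `read_name_utf8` is an optional, stricter layer offered alongside it rather than in its place.

```rust
// Stand-in for a wrapped C++ function returning std::string: arbitrary bytes.
fn read_name() -> Vec<u8> {
    // "café" encoded as Latin-1; the final byte 0xE9 is not valid UTF-8.
    vec![b'c', b'a', b'f', 0xE9]
}

// Optional stricter view. Callers who need `String` opt in and handle
// failure; callers who need the raw bytes keep using `read_name`.
fn read_name_utf8() -> Option<String> {
    String::from_utf8(read_name()).ok()
}

fn main() {
    println!("raw bytes: {:?}", read_name());
    println!("as UTF-8: {:?}", read_name_utf8());
}
```

Had the binding returned `String` directly, this value could not have been returned at all, which is the sense in which changing invariants in place makes some APIs impossible to call.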

Fidelity

Anything possible in C++ should be possible in Rust.

The Rust API for a given C++ API should not try to make the interface “better” at more than a superficial level, because it can compromise the ability of other teams to write new Rust code, or port existing C++ code to Rust.

Good changes:

  • Changing method names, especially to names that Rust callers might expect. For example, changing Status::ok() (C++) to Status::is_ok() (Rust) – Rust callers expect many of these boolean functions to be prefixed with is_.
  • Adding new APIs that Rust users expect. For example, trait implementations that allow the type to better interoperate with the Rust ecosystem, or functions which accept a Path or &str in addition to a raw C++ string_view.
  • Reifying C++ comments around lifetime or safety as actual lifetime annotations or unsafe declarations.
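
The first two kinds of change might look like the following sketch. The `Status` type and both `validate_name` functions are hypothetical and exist only to show the shape of the API; they are not taken from any real bindings.

```rust
// Hypothetical stand-in for a C++ status type.
pub struct Status {
    code: i32,
}

impl Status {
    // C++ spells this `ok()`; the Rust name follows the `is_` convention.
    pub fn is_ok(&self) -> bool {
        self.code == 0
    }
}

// Stand-in for a binding that takes a raw byte view, as a C++ string_view
// parameter would. It stays available with the same requirements as in C++.
pub fn validate_name_bytes(name: &[u8]) -> Status {
    Status { code: if name.is_empty() { 1 } else { 0 } }
}

// Added convenience for Rust callers: accept `&str` and forward to the
// byte-based function without placing new requirements on it.
pub fn validate_name(name: &str) -> Status {
    validate_name_bytes(name.as_bytes())
}

fn main() {
    println!("{}", validate_name("crubit").is_ok());
    println!("{}", validate_name("").is_ok());
}
```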

If the Rust type is outright unnatural to use, people won't use it, and it's worse for the ecosystem to have two APIs than one API.

Bad changes:

  • Removing deprecated APIs which still have C++ callers.
  • Placing new requirements on Rust callers that were not placed on C++ callers, such as requiring UTF-8 when C++ does not.