# High-level design of C++/Rust interop

This document describes the high-level design choices of Crubit, a C++/Rust
Bidirectional Interop Tool.

[TOC]

## C++/Rust interop goal

**The primary goal of Crubit is to enable Rust to be used side-by-side with C++
in large existing codebases.**

In the short term we would like to focus on codebases that roughly follow the
Google C++ style guide to improve the interop fidelity. Other, more diverse
codebases are possible prospective users in the long term, and their needs will
be addressed by customization and extension points.

## C++/Rust interop requirements

In support of the interop goal, we identify the following requirements:

1.  **Enable using existing C++ libraries from Rust with high fidelity**
    *   **High fidelity means that interop will make C++ APIs available in Rust,
        even when those API projections would not be idiomatic, ergonomic, or
        safe** in Rust, to facilitate a cheap, small-step, incremental migration
        workflow. Based on the experience of other cross-language
        interoperability systems and language migrations (for example,
        Objective-C/Swift, Java/Kotlin, JavaScript/TypeScript), we believe that
        working in a mixed C++/Rust codebase would be significantly harder if
        some C++ APIs were not available in Rust.
    *   **Interop will bridge C++ constructs to Rust constructs only when the
        semantics match closely**. Bridging large semantic gaps creates a risk
        of making C++ APIs unusable in Rust, as well as a risk of creating
        performance problems. For example, interop will not bridge destructive
        Rust moves and non-destructive C++ moves; instead it will make C++ move
        constructors and move assignment operators available to use in Rust
        code. As another example, interop will not bridge C++ templates and Rust
        generics by default.
    *   Interop should be **performant**, as close to having no runtime cost as
        possible. The performance costs of the interop should be documented, and
        where possible, intuitive to the user.
    *   Interop should be **ergonomic and safe**, as long as ergonomic and
        safety accommodations do not hurt performance or fidelity. Where a
        tradeoff is possible, the interop will choose performance and fidelity
        over ergonomics; the user will be allowed to override this choice.
    *   **Enable owners of the C++ API to control their Rust API projection**,
        for example, with attributes in C++ headers and by extending generated
        bindings with a manually implemented overlay. Such an overlay will wrap
        or extend generated bindings to improve ergonomics and safety.
2.  **Enable using Rust libraries from C++**
    *   However, using C++ libraries from Rust has a higher priority than using
        Rust libraries from C++.
3.  **Impose few to no barriers to entry**
    *   **Ideally, no boilerplate code** needs to be written in order to start
        using a C++ library from Rust. Adding some extra information can make
        the generated bindings more ergonomic to use.
    *   The amount of **duplicated API information is minimized**.
    *   **Future evolution of C++ APIs should be minimally hindered by the
        presence of Rust users**.
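
To illustrate the move-semantics example from requirement 1: a Rust move is destructive (the source binding can no longer be used), whereas a C++ move leaves the source object in a valid state. The sketch below models, in pure Rust, how a C++ move constructor might be exposed as an explicit operation instead of being mapped onto a Rust move. `CppBuffer` and `move_construct_from` are hypothetical names invented for this illustration; they are not Crubit's generated API.

```rust
/// Hypothetical stand-in for a C++ class that owns a heap buffer.
#[derive(Debug)]
struct CppBuffer {
    data: Vec<u8>,
}

impl CppBuffer {
    /// Models a C++ move constructor: steals the buffer from `source` and
    /// leaves `source` in a valid (here: empty) state.
    fn move_construct_from(source: &mut CppBuffer) -> CppBuffer {
        CppBuffer {
            data: std::mem::take(&mut source.data),
        }
    }
}

fn main() {
    let mut a = CppBuffer { data: vec![1, 2, 3] };

    // C++-style move: `a` stays usable afterwards.
    let b = CppBuffer::move_construct_from(&mut a);
    assert_eq!(b.data, vec![1, 2, 3]);
    assert!(a.data.is_empty());

    // Rust move: `b` is consumed; using `b` after this line would not compile.
    let c = b;
    assert_eq!(c.data.len(), 3);
}
```

Keeping the two kinds of move separate means the Rust caller always sees which operation runs, at the cost of a less idiomatic call site.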

## Proposal and high-level design

**We propose to develop our own C++/Rust interop tooling.** There are no
existing tools that satisfy all of our requirements. Modifying an existing tool
to fulfill these requirements would take more effort than building a new tool
from scratch or might require forking its codebase given that some existing
tools have goals that conflict with our goals.

See the "alternatives considered" section for a discussion of existing tools.

### Source of information about C++ API

**Interop tooling will read C++ headers**, as they contain the information
needed to generate Rust API projections and the necessary glue code. Interop
tooling that is used during builds will not read C++ source files, to maintain
the principle that C++ API information is only located in headers, and that a
C++ library can't break the build of its dependencies by changing source files.

Some interop-adjacent tools (e.g., large-scale refactoring tools that seed the
initial set of lifetime annotations) will also read C++ sources. These tools
will not be used during builds.

**Pros**

*   **Minimal barrier to entry**: minimal amount of manual work is required to
    start using a C++ library from Rust.
    *   Encourages leaf projects to start incrementally adopting Rust in new
        code, or incrementally rewriting C++ targets in Rust.
*   **C++ API information is located only in headers**, regardless of the
    language that the API consumer is written in (C++ or Rust).
*   **Interop tooling that generates Rust API projections from a C++ header can
    get exactly the same information that the C++ compiler has** when processing
    a translation unit that uses one of the APIs declared within that header.
    *   Interop tooling can generate the most performant calls to C++ APIs,
        without C++-side thunks that translate the C++ ABI into a C ABI.
    *   Interop tooling can autodetect implementation details that are critical
        for interop but are not a part of the API surface (for example, the size
        and alignment of C++ classes that have private data members).
    *   In alternative solutions, users need to repeat these implementation
        details in sidecar files. Interop can verify that the specified
        information is correct through static assertions in generated C++ code,
        but the overall user experience is inferior.

**Cons**

*   **Having to read C++ headers makes interop tooling more complex.**
*   **The Rust projection of the C++ API is only visible in machine-generated
    files.**
    *   These are not trivially accessible.
    *   There is a limit on how readable these files can be made.
    *   We can mitigate these issues by building tooling that shows the Rust
        view of a C++ header (for example in Code Search, or in editors as an
        alternative go-to-definition target).

### Customizability

Interop tooling will be sufficiently customizable to accommodate the unique
needs of different C++ libraries in existing codebases. C++ API owners can:

*   **Guide how interop tooling generates Rust API projections from C++
    headers**. For example, headers can provide:
    *   Custom Rust names for C++ function overloads (instead of applying the
        general interop strategy for function overloads),
    *   Custom Rust names for overloaded C++ operators,
    *   Custom Rust lifetimes for pointers and references mentioned in the C++
        API,
    *   Nullability information for pointers in the C++ API,
    *   Assertions (verified at compile time) and promises (not verified by
        tooling) that certain C++ types are trivially relocatable.
*   **Provide custom logic to bridge types**, for example, mapping C++
    `absl::StatusOr` to Rust `Result`.
*   **Provide API overlays** that improve the automatically generated Rust API.
    *   For example, the overlays could inject additional methods into
        automatically generated Rust types or hide some of the generated
        methods.

More intrusive customization techniques will be useful for template and
macro-heavy libraries where the baseline import rules just won't work. We
believe customizability will be an essential enabler for providing high-fidelity
interop.
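
As a rough illustration of type bridging combined with an API overlay, the sketch below wraps a raw, generated-looking binding in a hand-written layer that returns a Rust `Result`. Everything here is a pure-Rust stand-in: `RawStatusOrI32`, `parse_port_raw`, `Status`, and `parse_port` are hypothetical names, and the flattened struct is only an assumed shape for how a C++ `absl::StatusOr<int>` might surface before bridging.

```rust
/// Stand-in for machine-generated bindings (hypothetical shapes).
mod generated {
    /// Assumed flattened mirror of a C++ `absl::StatusOr<int>`.
    pub struct RawStatusOrI32 {
        pub ok: bool,
        pub value: i32,
        pub message: String,
    }

    /// Stand-in for a C++ function that returns `absl::StatusOr<int>`.
    pub fn parse_port_raw(text: &str) -> RawStatusOrI32 {
        match text.parse::<u16>() {
            Ok(port) => RawStatusOrI32 {
                ok: true,
                value: i32::from(port),
                message: String::new(),
            },
            Err(_) => RawStatusOrI32 {
                ok: false,
                value: 0,
                message: format!("not a port: {}", text),
            },
        }
    }
}

/// Hand-written overlay that bridges the raw type to `Result`.
mod overlay {
    use super::generated::{parse_port_raw, RawStatusOrI32};

    #[derive(Debug, PartialEq)]
    pub struct Status {
        pub message: String,
    }

    /// The bridging rule: an OK value becomes `Ok`, anything else `Err`.
    fn bridge(raw: RawStatusOrI32) -> Result<i32, Status> {
        if raw.ok {
            Ok(raw.value)
        } else {
            Err(Status { message: raw.message })
        }
    }

    /// The API that Rust callers would use instead of `parse_port_raw`.
    pub fn parse_port(text: &str) -> Result<i32, Status> {
        bridge(parse_port_raw(text))
    }
}

fn main() {
    assert_eq!(overlay::parse_port("8080"), Ok(8080));
    assert!(overlay::parse_port("http").is_err());
}
```

Callers of the overlay can use `?` and the usual `Result` combinators, while the raw binding stays available for code that needs full fidelity.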

### Source of additional information that customizes C++ API projection into Rust

Where C++ headers don't already provide all information necessary for interop
tooling to generate a Rust API projection, we will add such information to C++
headers whenever possible. If it is not desirable to edit a certain C++ header,
extra information can be stored in a sidecar file.

Examples of additional information that interop tooling will need:

*   **Nullability annotations.** C++ APIs often expose pointers that are
    documented or assumed by convention to be never null, but can't be
    refactored to references due to language limitations (for example,
    `std::vector<MyProtobuf *>`). If C++ headers don't provide nullability
    information for pointers in a machine-readable form, interop tooling has to
    conservatively mark all C++ pointers as nullable in the Rust API projection.
    The Rust compiler will then force users to write unnecessary (and
    untestable) null checks.
*   **Lifetimes of references and pointers** in C++ headers are not described in
    a machine-readable way (and sometimes are not even documented in prose).
    Lifetime information is essential to generate safe and idiomatic Rust APIs
    from C++ headers.
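
The cost of missing nullability information can be sketched in Rust. The example assumes, purely for illustration, that a possibly-null `const MyProtobuf*` is projected as `Option<&MyProtobuf>` and that an annotated never-null pointer is projected as `&MyProtobuf`; the actual projection chosen by the tooling may differ. `MyProtobuf` here is a stand-in struct, and both functions are hypothetical.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a C++ class such as `MyProtobuf`.
pub struct MyProtobuf {
    pub id: u32,
}

/// Projection assumed for an unannotated (possibly null) pointer.
pub fn id_of_nullable(msg: Option<&MyProtobuf>) -> Option<u32> {
    msg.map(|m| m.id)
}

/// Projection assumed for a pointer annotated as never null.
pub fn id_of_nonnull(msg: &MyProtobuf) -> u32 {
    msg.id
}

fn main() {
    let msg = MyProtobuf { id: 7 };

    // The caller must handle a `None` that, by convention, never occurs.
    let id = id_of_nullable(Some(&msg)).expect("documented as never null");
    assert_eq!(id, 7);

    // With the annotation, the check disappears from the call site.
    assert_eq!(id_of_nonnull(&msg), 7);

    // `Option<&T>` is pointer-sized, so the nullable form adds no storage cost;
    // the burden is the extra branch at every use.
    assert_eq!(
        size_of::<Option<&MyProtobuf>>(),
        size_of::<*const MyProtobuf>()
    );
}
```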

#### Additional information is stored in C++ headers

**Pros**

*   **Additional information needed for C++/Rust interop will be expressed as
    annotations on existing syntactic elements in C++.**
    *   The annotations are located in the most logical place.
    *   The annotations are more likely to be noticed and updated by C++ API
        owners.
    *   API owners retain full control over how the API looks in Rust.
*   **C++ users may find lifetime and nullability annotations useful.** For
    example, information about lifetimes is highly important to C++ and Rust
    users alike.
*   **C++ API definitions are only written once,** minimizing duplication and
    maintenance burden.

**Cons**

*   **Annotations that benefit Rust users can bother C++ API owners** who don't
    care about Rust. Especially at the beginning of integrating Rust into an
    existing codebase, C++ API owners can push back on adding annotations.
    *   To encourage adoption of annotations, we can develop tooling for C++
        that uses lifetime and nullability annotations to find bugs in C++ code.
    *   The pushback is likely to be short-term: if Rust takes off in a C++
        codebase, C++ library owners in that codebase will need to care about
        Rust users and how their API looks in Rust.
*   **There may be headers that we cannot (or would not want to) change**, for
    example, headers in third-party code, headers that are open-sourced, or when
    first-party owners are not cooperating.
    *   We can apply the
        [sidecar strategy](#additional-information-is-stored-in-sidecar-files)
        to these headers.

#### Additional information is stored in sidecar files

Additional information needed for C++/Rust interop can be stored in sidecar
files, similarly to Swift APINotes, CLIF etc. If sidecar files get sufficiently
broad adoption (for example, if annotating third-party code turns out to be
sufficiently important that optimizing C++/Rust interop ergonomics there would
be worth it), it would make sense to write sidecar files in a Rust-like
language, as that provides the most natural way to define Rust APIs.

**Pros**

*   **Sidecar files enable broader adoption of annotations** by providing
    additional interop information without modifying C++ headers. Sidecar files
    will allow us to annotate headers in third-party code, headers that can't
    adopt annotations for technical reasons, or headers owned by first-party
    owners who are not cooperating.

**Cons**

*   Like in the
    [Use Rust code to customize API projection into Rust](#use-rust-code-to-customize-api-projection-into-rust)
    alternative, **some part of C++ API information is duplicated**, which is a
    burden for the C++ API owners.
*   The projection of C++ APIs to Rust is defined in a new language.
    *   C++ API owners and Rust users will have to learn this language.
    *   If we expect wide adoption of sidecar files, we will need to create
        tooling to parse, edit, and run LSCs against this language.
*   **Annotations in sidecar files are more prone to become out of sync with the
    C++ code.** When making changes to C++ code, engineers are less likely to
    notice and update the annotations in sidecar files.
    *   Presubmits can catch some cases of desynchronization between C++ headers
        and sidecar files. However, presubmit errors that remind engineers to
        edit more files create an inferior user experience.
*   **Sidecar files create extra friction to modify the code.** Where previously
    one had to edit only a C++ header and a C++ source file, now one also likely
    needs to update a sidecar file.
    *   When engineers realize that they need to update a sidecar file, opening
        another file and finding the right place to update creates extra
        friction to modify code.
    *   Once engineers understand the extra maintenance burden associated with
        sidecar files that tend to go out of sync with headers, they will be
        less likely to adopt annotations in the first place.

### Glue code generation

C++/Rust interop tooling will generate executable glue code and type definitions
in Rust and in C++ (not merely `extern "C"` function declarations) in order
to achieve the following goals:

*   **Enable instantiating C++ templates from Rust, and monomorphizing Rust
    generics from C++. Enable Rust types to participate in C++ inheritance
    hierarchies.**
    *   For example, imagine Rust code using an object of type
        `std::vector<MyProtobuf>`, while C++ code in the same program is never
        instantiating this type. The Bazel `rust_library` target that mentions
        this type must therefore be responsible for instantiating this template
        and linking the resulting executable code into the final program. We
        propose that this instantiation happens in an automatically generated
        "glue" C++ translation unit that is a part of that `rust_library`.
*   **Enable automatically wrapping C++ code to be more ergonomic in Rust.** For
    example:
    *   `extern "C"` functions in Rust are necessarily unsafe (it is a language
        rule). We would like the vast majority of C++ API projections into Rust
        to be safe. In the current Rust language, we can achieve that only by
        wrapping the unsafe `extern "C"` function in a safe function marked with
        `#[inline(always)]`.
    *   C++ API owners can provide rules for automatic type bridging, for
        example, mapping C++ `absl::StatusOr` to Rust `Result`. This conversion
        necessitates generation of a Rust wrapper function around a C++ entry
        point that takes advantage of such type bridging.
*   **Provide stable locations (C++ modules, Rust crates) that "own" the types
    from the language point of view.**
    *   For example, when we project a C++ type into Rust, its Rust definition
        must be located in a Rust crate. Furthermore, all Rust users of this
        type must observe it as being defined in the same crate in order for
        every user to consider that they use the same type. Indeed, this is a
        rule in Rust, that types defined in different crates are unrelated
        types.
    *   When we project a Rust type into C++ we could repeat its C++ definition
        in C++ code any number of times (for example, in every C++ user of a
        Rust type). This is technically fine because C++ allows the same type to
        be defined multiple times within a program. Nevertheless, such
        duplication is error-prone.
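
The safe-wrapper pattern mentioned above can be sketched as follows. To keep the example self-contained, the C++ entry point is replaced by an `unsafe extern "C"` function defined in Rust; in real bindings it would be a declaration inside an `extern "C"` block that the linker resolves against compiled C++ code. The names `cpp_add_saturating` and `add_saturating` are hypothetical.

```rust
mod ffi {
    /// Stand-in for a C++ entry point with a C calling convention.
    /// Marked `unsafe` to mirror how foreign functions appear to Rust callers.
    pub unsafe extern "C" fn cpp_add_saturating(a: i32, b: i32) -> i32 {
        a.saturating_add(b)
    }
}

/// Safe wrapper of the kind that generated glue code could provide.
/// `#[inline(always)]` asks the compiler to remove the extra call layer.
#[inline(always)]
pub fn add_saturating(a: i32, b: i32) -> i32 {
    // SAFETY: the callee takes plain integers by value and has no
    // preconditions, so every argument combination is valid.
    unsafe { ffi::cpp_add_saturating(a, b) }
}

fn main() {
    assert_eq!(add_saturating(2, 3), 5);
    assert_eq!(add_saturating(i32::MAX, 1), i32::MAX);
}
```

The wrapper is where the binding generator records its safety argument once, so that individual call sites do not need their own `unsafe` blocks.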

### Glue code is generated as C++ and Rust source code

Interop tooling will generate glue code as C++ and Rust source files, which are
then compiled with an unmodified compiler for that language. The alternative is
to generate LLVM IR or object files with machine code directly from interop
tooling.

**Pros**

*   **It is easy to inject customizations provided by API owners into generated
    source code.**
    *   The customizations will be written in the target language, making it
        (hopefully) intuitive to write them.
*   **Generated source code can be easily inspected by compiler engineers**
    while debugging interop problems and compiler bugs.
*   **Generated source code can be inspected and understood by interop users,**
    who are not compiler experts.
    *   LLVM IR wouldn't be meaningful to them.
*   **Generated source code is processed by the regular toolchain like any other
    code in the project.**
    *   It automatically benefits from all performance optimizations and
        sanitizers that are newly implemented in Clang and Rust compilers.
*   **We avoid adding a new tool that generates unique LLVM IR patterns.**
    *   We avoid making the job of the C++ toolchain maintainers harder.

**Cons**

*   **Interop tooling will be limited to generating LLVM IR and machine code
    that Clang and Rust compilers can generate.**

### Glue code and API projections will assume implementation details of the target execution environment

To provide the most ergonomic and performant interop, C++/Rust interop tooling
will allow the target codebase to opt into assuming various implementation
details of the target execution environment. For example:

*   When calling C++ from Rust, interop tooling can either wrap C++ functions in
    thunks with a C calling convention, or call C++ entry points directly.
    Thunks cause code bloat and can collectively add up to become a performance
    problem, so it is desirable to call C++ entry points from Rust directly.
    Interop tooling can do that only if it may assume a specific target platform
    and C++ ABI.

Implementation details of the target execution environment that are considered
stable enough will be reflected in API projections, for example:

*   The C++ standard does not specify sizes of integer types (`short`, `int`,
    `long` etc.). To map them to Rust, interop tooling will need to assume the
    sizes that they have on the platforms the codebase targets in practice. The alternative
    would be to create target-agnostic integer types (for example, `Int` in
    Swift is a strong typedef for `Int32` on 32-bit targets, and `Int64` on
    64-bit targets), but this makes it harder to provide idiomatic, transparent,
    high-performance interop.
*   The C++ standard does not specify whether standard library types like
    `std::vector` are trivially relocatable; it is an implementation detail.
    Universal interop tooling would have to conservatively assume
    non-trivially-relocatable types. Interop tooling specific to certain
    environments can rely on libc++ providing a trivially-relocatable
    `std::vector` and project it into Rust in a much more ergonomic way.
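
As a small illustration of the integer-size point, Rust's standard library already exposes target-dependent aliases such as `std::ffi::c_int` and `std::ffi::c_long`. The sketch below pins `c_int` to 32 bits with a compile-time assertion; this holds on mainstream 32-bit and 64-bit targets but is an assumption about the platform, not something the C++ standard guarantees. `increment` is a hypothetical function used only for illustration.

```rust
use std::ffi::{c_int, c_long};
use std::mem::size_of;

// Assumption about the target platform, checked at compile time.
const _: () = assert!(size_of::<c_int>() == 4);

/// Hypothetical Rust API that uses a fixed-width type directly.
fn increment(x: i32) -> i32 {
    x + 1
}

fn main() {
    // On targets where `c_int` is an alias for `i32`, no conversion is needed.
    let from_cpp: c_int = 41;
    assert_eq!(increment(from_cpp), 42);

    // `c_long` differs between common targets, so it is only printed here.
    println!("c_long is {} bytes on this target", size_of::<c_long>());
}
```

If the assertion fails on some target, the build stops there instead of silently producing bindings with mismatched integer widths.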

**Pros**

*   **Interop tooling will generate the most performant code sequences** to call
    foreign language functions.
    *   If interop tooling generates portable code, it would have some overhead.
        The overhead can be eliminated by C++ and Rust optimizers at least in
        some cases, but at the cost of increased build times. For example,
        eliminating thunks would require turning on LTO, which is not fast, and
        usually only used for release builds. It is much preferable to not
        generate thunks in the first place, if the target platform does not need
        them.
*   **Ergonomics of API projections will be improved.**
    *   For example, whether a C++ type is trivially relocatable or not is an
        implementation detail in C++, transparent to C++ users of that type, but
        it makes a huge ergonomic difference in the Rust API projection.

**Cons**

*   **C++ code will have additional evolution constraints.**
    *   For example, changing a type from trivially relocatable to non-trivially
        relocatable is a non-API-breaking change for C++ users, but it would
        break Rust users.
*   **It would be more difficult to switch internal environments to a different
    C++ standard library.**
*   **Code that is deployed in environments that have incompatible
    implementation details won't be able to use this C++/Rust interop system.**
    *   Alternatively, these executables would have to bring a suitable
        execution environment with them (e.g., a copy of libc++).

### Interop tooling should be maintainable and evolvable for a long time

We should design and implement C++/Rust interop tooling in such a way that we
can maintain and evolve it for more than a decade. If Rust becomes tightly
integrated into an existing C++ project, specific requirements for interop and
API projection rules will keep changing. The more Rust adoption we have,
the more library and team-specific interop customizations we will have to
support, and the more it will make sense for the performance team to tweak
generated code to implement sweeping optimizations. These kinds of changes
should be readily possible, and they should not create conflicts of interest
between different users of the interop tooling.

### Interop tooling should facilitate C++ to Rust migration

C++/Rust interop tooling should try to create a favorable environment for
migrating C++ code to Rust. Specifically, projections of C++ APIs into Rust
should be implementable in Rust. This way, a C++ library can be converted from
C++ into Rust transparently for its users, as its public API won't change.

## Alternatives Considered: Design decisions

### Repeat C++ API completely in a separate IDL

Instead of reading C++ headers in the interop tooling, we would require the user
to repeat the C++ API in some other form, for example, in a Rust-based IDL like
in the cxx crate, or in sidecar files in a completely new format.

**Pros**

*   **Interop tooling can be simpler if it does not have to read C++ headers**.
    But even under this alternative approach, tooling might want to read C++
    headers, nullifying this advantage. For example, tooling might want to
    automatically generate an initial Rust snippet or to suggest in presubmits
    to adjust the Rust code that mirrors a C++ API when that C++ API changes.
*   The **most natural way to define Rust APIs** is by using Rust code or
    Rust-like syntax in sidecar files.
*   **Available Rust APIs are defined in easily accessible checked-in files.**
*   **API definitions written by a human might have higher quality, on
    average.**

**Cons**

*   **A big part of the C++ API needs to be duplicated** to reliably match the
    Rust code with the C++ declarations. The initial code can be generated by
    tooling, but it has to be kept in sync. This is a burden for the C++ API
    owners, potentially a bigger one than allowing annotations in C++ headers.
    *   There is a risk that C++ API owners might refuse to own IDL files.
*   The need to create a sidecar file creates a **barrier to start using C++
    libraries from Rust.**
    *   While the duplication overhead is justifiable for widely-used libraries,
        it is relatively high for libraries with few users and binaries, making
        it less likely that leaf teams will start adopting Rust.
*   **When the C++ API is changed, the Rust definitions become out-of-sync with
    it.** Tooling needs to detect this, and the Rust definitions need to be
    changed (either manually or tool-assisted).
*   There is no effective way to verify Rust binding code at the presubmit time
    of a C++ library other than building downstream projects.
*   **Mapping Rust API definitions to the original C++ API definitions is more
    complicated and error-prone**. For example, how would we target a specific
    overload of a function or constructor?
*   There is a **risk that individual teams will build team-specific tooling
    that generates IDL files** from C++ headers or generates both IDL files and
    C++ headers from a single source. These solutions are unlikely to scale to
    existing large codebases and will likely only work for that specific team.

### Use Rust code to customize API projection into Rust

An alternative to storing additional information in C++ headers is to put it
into Rust code. For example, the cxx crate requires users to re-state the C++
API in Rust syntax, adding information about lifetimes and nullability. The pros
and cons of this choice are the same as when defining a special IDL that repeats
the C++ API completely (see above).

### Generate glue code in binary formats

Instead of generating glue code as textual sources, interop tooling could use
Clang and LLVM APIs to emit object files with C++ glue code and use Rust
compiler APIs to generate rmeta and rlib files with Rust glue code.

**Pros**

*   **More flexibility in the code that can be generated.** Controlling LLVM IR
    generation allows interop tooling to generate code that an unmodified
    compiler can't generate from textual source code. For example, the Rust
    language does not have any constructs that map to `linkonce_odr` functions
    in LLVM IR; if the interop tooling embedded the Rust compiler as a library
    and had more control over how it generates the IR, we could make that
    happen.

**Cons**

*   Injecting customizations provided by API owners is harder.
*   LLVM, Clang, and Rust compiler APIs are not stable. The format of Rust
    metadata files is not stable either. The larger the API subset we consume
    from Clang and Rust, the more difficult it becomes to maintain the tooling.
*   To generate object files the interop tooling has to ensure that its
    Clang/LLVM version and configuration are identical to those of the Clang compiler
    used to build other C++ code.
    *   We can solve this problem, but it makes the system more fragile,
        compared to using existing C++ and Rust compilers to compile generated
        sources.
*   From time to time LLVM introduces bugs that cause miscompilations. If
    interop tooling embeds LLVM, we would be adding another tool that toolchain
    engineers will need to look into when debugging a miscompilation. We would
    be making the job of C++ toolchain maintainers harder.

## Alternatives Considered: Existing tools

### bindgen

[bindgen](https://rust-lang.github.io/rust-bindgen/) **automatically generates
Rust bindings from C and C++ headers**, which it consumes using libclang. The
generated **bindings are pure Rust code** that interfaces with C and C++ using
Rust’s [built-in FFI for C](https://doc.rust-lang.org/nomicon/ffi.html)
(`#[repr(C)]` to indicate that a struct should use C memory layout and `extern
"C"` to indicate that a function should use a C calling convention). C++
functions are handled by generating a Rust `extern "C"` function that has the
same ABI as the C++ function and attaching a `link_name` attribute with the
mangled name.

See
[here](https://manishearth.github.io/blog/2021/02/22/integrating-rust-and-c-plus-plus-in-firefox/)
for an in-depth description of the use of bindgen in Stylo, a Rust component in
Firefox.

**Pros**

*   **The oldest and the most mature** of the existing C++ interop tools
    (developed
    [since Feb 2012](https://github.com/rust-lang/rust-bindgen/commit/9fe92b0cfd48d5ebd1c82af8b1ff041f8c416a65)).

**Cons**

*   **Deficiencies in safety and ergonomics**, for example:
    *   References are imported as pointers. No lifetimes, no null-safety.
    *   Constructors and destructors are not called automatically.
    *   Overloads are distinguished by a numbered suffix in Rust. These numbers
        clutter the source code and are hard to remember, as they have no
        meaning. Adding overloads can change the numbering and hence break Rust
        callers.
*   It is **impossible to use C++ inline functions and templates** from Rust
    because of bindgen’s architecture[^1]. The architecture is unlikely to
    change, and therefore, this is a dealbreaker.

**Evaluation**

bindgen could be used in a project that has very limited C++ interop needs.
However, creating safe and ergonomic wrappers for the generated bindings would
require additional effort. Our vision and goals for C++ interop are very
different from what bindgen provides.

### cbindgen

[cbindgen](https://github.com/eqrion/cbindgen) **automatically generates C or
C++ headers for Rust libraries which expose a public C API**.

**Pros**

*   **An old and mature tool** (developed
    [since March 2017](https://github.com/eqrion/cbindgen/commit/215d3a987b223d4a1a878e2385c8677d5ae3a80b)).

**Cons**

*   **Shallow understanding of Rust's modules and types**.

    *   [`cbindgen`'s docs](https://github.com/eqrion/cbindgen/blob/master/docs.md)
        point out that "A major limitation of cbindgen is that it does not
        understand Rust's module system or namespacing. This means that if
        cbindgen sees that it needs the definition for MyType and there exists
        two things in your project with the type name MyType, it won't know what
        to do. Currently, cbindgen's behaviour is unspecified if this happens."
    *   This limitation seems mostly caused by building `cbindgen` on top of
        [the `syn` crate](https://docs.rs/syn). `syn` is able to parse Rust
        source code into an AST, but there is no facility at the `syn` level for
        type deduction or module traversal. Building such functionality would
        require replicating parts of the `rustc` compiler into `cbindgen`, or
        alternatively rewriting `cbindgen` on top of
        [the `rustc_driver` crate](https://doc.rust-lang.org/stable/nightly-rustc/rustc_driver/).

*   **Support of only `extern "C"` functions**.

    *   Supporting Rust functions that use the default calling convention would
        require generating not only C/C++ headers, but also generating Rust
        source with `extern "C"` thunks that trampoline into the original
        function (requiring that `cbindgen` starts generating Rust sources).

*   **Support of only `#[repr(C)]` structs**.

    *   Default memory layout of Rust structs is
        [unspecified](https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#default-layout-repr-rust:~:text=the%20default%20layout%20of%20structs%20is%20not%20specified)
        and therefore cannot be determined by code examination at the `syn`
        level.
    *   Even if the memory layout could be determined, the layout can change in
        a future compiler version, or change depending on compilation command
        line flags. To prevent using stale layout information, the
        auto-generated FFI code should therefore include compile-time assertions
        that the layout didn't change from the FFI generation time. The
        assertions should be present both in the generated C/C++ headers *and*
        on the Rust side (requiring that `cbindgen` starts generating Rust
        sources). The assertions would effectively verify that the FFI
        generation is driven by the build system (i.e. by Bazel, or Cargo, or
        GN/ninja, rather than manually) and that the integration between the FFI
        tools and the build system doesn't have any bugs (e.g. that it
        faithfully replicates all relevant compilation flags).
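
The two Rust-side additions discussed above, `extern "C"` thunks and compile-time layout assertions, could look roughly like the following sketch. `Sample`, `sample_sum`, `thunk_sample_sum`, and the recorded constants are hypothetical; the size and alignment values stand for whatever a binding generator observed at generation time. A real thunk would also need an attribute that fixes its exported symbol name, which is omitted here.

```rust
use std::mem::{align_of, size_of};

/// A Rust struct with the default (unspecified) field layout.
pub struct Sample {
    pub tag: u8,
    pub value: u32,
}

// Hypothetical layout facts recorded when the bindings were generated.
const SAMPLE_SIZE: usize = 8;
const SAMPLE_ALIGN: usize = 4;

// Compilation fails if the compiler now lays out `Sample` differently.
const _: () = assert!(size_of::<Sample>() == SAMPLE_SIZE);
const _: () = assert!(align_of::<Sample>() == SAMPLE_ALIGN);

/// Original function using the default Rust calling convention.
pub fn sample_sum(s: &Sample) -> u32 {
    u32::from(s.tag) + s.value
}

/// Generated thunk with a C calling convention that trampolines into
/// `sample_sum`. The foreign side treats `Sample` as an opaque blob whose
/// size and alignment are pinned by the assertions above.
///
/// # Safety
/// `s` must point to a valid, initialized `Sample`.
#[allow(improper_ctypes_definitions)]
pub unsafe extern "C" fn thunk_sample_sum(s: *const Sample) -> u32 {
    sample_sum(unsafe { &*s })
}

fn main() {
    let s = Sample { tag: 1, value: 41 };
    // SAFETY: `&s` is a valid pointer to an initialized `Sample`.
    let total = unsafe { thunk_sample_sum(&s) };
    assert_eq!(total, 42);
}
```

Matching assertions in the generated C/C++ header would cover the other side of the boundary.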

**Evaluation**

cbindgen could be used in a project that can create a narrow `extern "C"` /
`#[repr(C)]` API and that is ready to manage the risk of incorrect name/module
resolution. Wrapping additional Rust APIs would require extra effort.

**Take-aways for Crubit design**

Notes and observations about `cbindgen` can guide some design aspects of
Crubit's [`cc_bindings_from_rs`](../cc_bindings_from_rs/README.md) tool
(which, like `cbindgen`, generates C++ bindings for Rust crates).
Using internal compiler knowledge (e.g. memory layout of structs, name and type
resolution) requires that `cc_bindings_from_rs` depends on
`rustc_driver` and other internal crates of `rustc`. The API of these crates is
unstable which might increase the risk and maintenance cost of Crubit.
Nevertheless, our experience with maintaining tools based on (also unstable)
Clang APIs suggests that this extra risk and cost is likely going to be
acceptable.

Build determinism requires that the Rust compiler produces the same output for
the same set of inputs (the same compiler version, the same command-line flags,
the same sources, etc.). This means that (despite
[conservative reservations about layout determinism](https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#default-layout-repr-rust:~:text=A%20note%20on%20determinism))
it should be okay to assume that `cc_bindings_from_rs` and `rustc` invocations
will observe the same memory layout of structs, but this requires that
`cc_bindings_from_rs` is built against exactly the same version of
`rustc_driver` libraries as `rustc`. (This should also be reinforced by
compile-time assertions in the generated FFI layer.)
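
As a rough illustration of such compile-time assertions, the C++ side of a
generated FFI layer could pin down the layout that was observed at generation
time. This is a minimal sketch, not actual `cc_bindings_from_rs` output: the
`Point` struct, its fields, and the recorded numbers are hypothetical, and the
sketch assumes a Rust struct with two `i32` fields.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical C++ mirror of a Rust struct `Point { x: i32, y: i32 }`.
struct Point {
  std::int32_t x;
  std::int32_t y;
};

// Numbers assumed to have been recorded by the generator when it inspected
// the Rust layout. If the C++ view of the type ever disagrees with them,
// compilation fails instead of producing silently incompatible code.
static_assert(sizeof(Point) == 8, "size differs from generation time");
static_assert(alignof(Point) == 4, "alignment differs from generation time");
static_assert(offsetof(Point, x) == 0, "offset of x differs");
static_assert(offsetof(Point, y) == 4, "offset of y differs");
```

Matching assertions on the Rust side, checked against the same recorded
numbers, would tie the two views of the type together.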

### cxx

[cxx](https://cxx.rs/) generates **Rust bindings for C++ APIs and vice versa**
from an **interface definition language (IDL) included inline in Rust source
code.** cxx generates Rust and C++ source code from IDL definitions. To check
that the IDL definitions match the actual C++ API, cxx inserts static
assertions[^2] into the generated C++ code; it does not, however, read the C++
headers itself. cxx contains built-in, non-customizable bindings for various
Rust and C++ standard library types.
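
The kind of compile-time check described above can be sketched in plain C++.
This is an illustration under stated assumptions rather than code emitted by
cxx: `legacy::add` stands in for an existing C++ API, and `generated_add_thunk`
is a hypothetical name for the generated entry point. The signature claimed by
the IDL is spelled out as a function pointer type, so the C++ compiler rejects
the generated code if the real function does not match.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-in for a hand-written C++ API that the IDL claims to describe.
namespace legacy {
inline std::int32_t add(std::int32_t a, std::int32_t b) { return a + b; }
}  // namespace legacy

// Check via type conversion: initializing a pointer of the expected type
// only compiles if `legacy::add` has a compatible signature.
extern "C" std::int32_t generated_add_thunk(std::int32_t a, std::int32_t b) {
  std::int32_t (*checked)(std::int32_t, std::int32_t) = legacy::add;
  return checked(a, b);
}

// Check via an explicit static assertion on the exact function type.
static_assert(
    std::is_same<decltype(&legacy::add),
                 std::int32_t (*)(std::int32_t, std::int32_t)>::value,
    "IDL signature does not match the C++ declaration");
```

Both checks run without the generator ever parsing the C++ headers: the
compiler that builds the generated code does the verification.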

As far as we understand, cxx has the following design constraints and goals:

*   **Ship a stable product for its intended audience.**
    *   As a consequence, improvements such as integrating move semantics are
        not going to be accepted soon. We understand that cxx is not a vehicle
        for experimentation. cxx maintainers would prefer us to first show that
        our ideas work in a fork of cxx or in a different system, such as
        autocxx, and that our improvements pull their weight given the added
        complexity.
*   **Remain simple and transparent.** There is a limit on the amount of
    complexity that will be tolerated.
    *   There is a chance that improvements such as modeling C++ move semantics
        or various attempts at eliminating thunks will never be accepted in
        upstream cxx.
*   **Non-goal: Automatically provide high fidelity interop.**
    *   cxx is designed for the use case of an executable where C++ and Rust
        parts communicate through a narrow interface.
*   **Non-goal: Automatically provide the most performant interop in as many
    cases as possible.** For example:
    *   cxx does not attempt to eliminate C++-side thunks. Instead, using LTO is
        recommended.
    *   cxx considers it acceptable to allocate all objects of "opaque" types on
        the heap. Users who find these heap allocations unacceptable for
        performance reasons are expected to implement a different C++ entry
        point that does not hit this limitation and bind it to Rust instead of
        the original C++ API. Heap allocation is acceptable for many C++ classes
        in most environments, but the exceptions are important enough for us
        that this is a major restriction.
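
To make the two performance points above concrete, here is a minimal sketch of
what C++-side thunks and heap-allocated opaque objects look like in general.
It is not cxx's generated code: the `Widget` class and the `widget_*` function
names are hypothetical. Each member function is reached through an extra
`extern "C"` function, and the foreign side only ever holds a pointer to a
heap-allocated object.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical C++ class whose layout is not exposed to the other language.
class Widget {
 public:
  explicit Widget(std::string name) : name_(std::move(name)) {}
  std::size_t NameLength() const { return name_.size(); }

 private:
  std::string name_;
};

// Thunks: plain C-ABI functions that forward to the real C++ API. Without
// inlining across the language boundary (for example through LTO), each call
// pays for this extra hop.
extern "C" Widget* widget_new(const char* name) {
  return new Widget(name);  // the object always lives on the heap
}

extern "C" std::size_t widget_name_length(const Widget* widget) {
  return widget->NameLength();
}

extern "C" void widget_delete(Widget* widget) { delete widget; }
```

A caller that needs a short-lived `Widget` in a hot loop pays for one
allocation and one deallocation per object under this scheme, which is the
restriction described above.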

**Pros**

*   **Mature and ergonomic enough today for mixing C++ and Rust in existing
    codebases with limited C++ interop needs.**
*   We avoid being on a tech island.

**Cons**

*   cxx’s stability goal makes it **hard to experiment with how the Rust API
    looks.**
*   **Our goals are unlikely to align well with the goals of the intended user
    audience of cxx.** We would be pulling cxx in directions that make it a
    worse product for its current users.
*   **Almost no customizability**. Users who are not satisfied with what cxx
    does are expected to wrap the target C++ API in a different C++ API that is
    more friendly to cxx.
*   cxx tries to be compatible with most standard C++ implementations found in
    the real world, so it **cannot take advantage of unique guarantees provided
    by the target execution environment.**

**Evaluation**

cxx could be used in projects with limited C++/Rust interop requirements.
However, we would not be able to implement many interop features that we
consider essential (for example, move semantics, templates).

### autocxx

[autocxx](https://github.com/google/autocxx) **automatically generates Rust
bindings from C++ headers**. As the name implies, it automatically generates IDL
definitions for cxx, which then produces the actual bindings. In addition,
autocxx generates its own Rust and C++ code to extend the Rust API beyond what
cxx itself would provide, for example to support passing POD types by value.
autocxx consumes C++ headers indirectly by first running bindgen on them and
then parsing the Rust code output by bindgen.

autocxx’s
[design goals](https://www.chromium.org/Home/chromium-security/memory-safety/rust-and-c-interoperability)
are similar to our own in this document.

We conducted a case study of using an existing project's C++ API from Rust with
autocxx.

**Pros**

*   **Low barrier to entry**: Bindings are generated from C++ headers, no need
    to write duplicate API definitions.
*   **Ergonomic mappings** for many C++ constructs.
*   **Open to contributions that change the generated Rust APIs** or make
    architectural changes.

**Cons**

*   **Relatively new and immature.**
*   **Cannot (yet) consume complex headers without errors.** We’ve managed to
    import some actual Spanner headers, but there are still enough outstanding
    issues that we can’t yet do anything useful with Spanner.
*   **Architecture can make modifications difficult.** autocxx is built on top
    of two other tools, bindgen and cxx, and the interfaces between these
    components can make it harder to make a modification than it would be in a
    monolithic tool. Specifically:
    *   autocxx uses bindgen to generate a description of the C++ API that it
        can parse easily (as opposed to trying to parse C++ headers either
        directly or using Clang APIs). Since bindgen was not intended for this
        purpose, its output lacks some information that autocxx needs, so
        autocxx [has forked](https://crates.io/crates/autocxx-bindgen) bindgen
        to adapt it to its needs. The forked version emits additional
        information about the C++ API in the form of attributes attached to
        various API elements.
    *   bindgen in turn is built on the libclang API, which doesn’t surface all
        of the functionality available through Clang’s C++ API. Adding features
        to libclang requires additional effort and has a six-month lead time to
        appear in a stable release (to become eligible to be used from bindgen).
    *   When errors occur, it can be hard to figure out which of the components
        is responsible.
    *   Adding features can require touching multiple components, which requires
        commits to multiple repositories.

**Evaluation**

We initially intended to use autocxx to prototype various interop ideas and
potentially as a basis for a field trial. We still believe this would be
feasible, but after trying to modify autocxx and its bindgen fork during an
internal C++/Rust interop study, we feel that autocxx’s complex architecture is
enough of an impediment that we could achieve our goals with less total effort
by creating an interop tool from scratch that consists of a single codebase and
uses the Clang C++ API to directly interface with Clang.

[^1]: Doing so would require either generating C++ source code or interfacing
    deeply enough with Clang to generate object code for inline functions and
    template instantiation.
[^2]: And tricks such as suitable type conversions that force the C++ compiler
    to perform appropriate checks at compile time.
