 # High-level design of C++/Rust interop

 This document describes the high-level design choices of Crubit, a C++/Rust
 Bidirectional Interop Tool.

 [TOC]

 ## C++/Rust interop goal

 **The primary goal of Crubit is to enable Rust to be used side-by-side with C++
 in large existing codebases.**

 In the short term we would like to focus on codebases that roughly follow the
 Google C++ style guide to improve the interop fidelity. Other, more diverse
 codebases are possible prospective users in the long term, and their needs will
 be addressed by customization and extension points.

 ## C++/Rust interop requirements

 In support of the interop goal, we identify the following requirements:

 1.  **Enable using existing C++ libraries from Rust with high fidelity**
     *   **High fidelity means that interop will make C++ APIs available in Rust,
         even when those API projections would not be idiomatic, ergonomic, or
         safe** in Rust, to facilitate a cheap, incremental migration workflow
         built from small steps. Based on the experience of other cross-language
         interoperability systems and language migrations (for example,
         Objective-C/Swift, Java/Kotlin, JavaScript/TypeScript), we believe that
         working in a mixed C++/Rust codebase would be significantly harder if
         some C++ APIs were not available in Rust.
     *   **Interop will bridge C++ constructs to Rust constructs only when the
         semantics match closely**. Bridging large semantic gaps creates a risk
         of making C++ APIs unusable in Rust, as well as a risk of creating
         performance problems. For example, interop will not bridge destructive
         Rust moves and non-destructive C++ moves; instead it will make C++ move
         constructors and move assignment operators available to use in Rust
         code. As another example, interop will not bridge C++ templates and Rust
         generics by default.
     *   Interop should be **performant**, as close to having no runtime cost as
         possible. The performance costs of the interop should be documented, and
         where possible, intuitive to the user.
     *   Interop should be **ergonomic and safe**, as long as ergonomic and
         safety accommodations do not hurt performance or fidelity. Where a
         tradeoff is possible, the interop will choose performance and fidelity
         over ergonomics; the user will be allowed to override this choice.
     *   **Enable owners of the C++ API to control their Rust API projection**,
         for example, with attributes in C++ headers and by extending generated
         bindings with a manually implemented overlay. Such an overlay will wrap
         or extend generated bindings to improve ergonomics and safety.
 2.  **Enable using Rust libraries from C++**
     *   However, using C++ libraries from Rust has a higher priority than using
         Rust libraries from C++.
 3.  **Put little to no barriers to entry**
     *   **Ideally, no boilerplate code** needs to be written in order to start
         using a C++ library from Rust. Adding some extra information can make
         the generated bindings more ergonomic to use.
     *   The amount of **duplicated API information is minimized**.
     *   **Future evolution of C++ APIs should be minimally hindered by the
         presence of Rust users**.
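
The move-semantics gap mentioned under requirement 1 can be illustrated with a small Rust-only sketch. `Buffer`, `rust_move`, and `cpp_style_move` are hypothetical names introduced for this illustration; `cpp_style_move` only imitates what a C++ move constructor does (the source stays alive in a valid, emptied state) and is not meant to represent generated bindings.

```rust
/// Hypothetical type standing in for a C++ class that owns heap storage.
#[derive(Debug, Default, PartialEq)]
pub struct Buffer {
    pub data: Vec<u8>,
}

/// Rust move: destructive. The argument is consumed, the caller can no longer
/// name the source value, and no "moved-from" object remains to be destroyed.
pub fn rust_move(src: Buffer) -> Buffer {
    src
}

/// Imitation of a C++ move constructor: non-destructive. The source stays
/// alive in a valid (here: empty) state and will still be destroyed later.
pub fn cpp_style_move(src: &mut Buffer) -> Buffer {
    std::mem::take(src)
}
```

After `cpp_style_move(&mut a)`, the variable `a` is still usable and holds an empty buffer. After `rust_move(b)`, any further use of `b` is rejected by the compiler. Exposing the C++ operation as its own function, rather than mapping it onto a Rust move, keeps both behaviors visible to the caller.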

 ## Proposal and high-level design

 **We propose to develop our own C++/Rust interop tooling.** There are no
 existing tools that satisfy all of our requirements. Modifying an existing tool
 to fulfill these requirements would take more effort than building a new tool
 from scratch, or might require forking its codebase, given that some existing
 tools have goals that conflict with ours.

 See the "alternatives considered" section for a discussion of existing tools.

 ### Source of information about C++ API

 **Interop tooling will read C++ headers**, as they contain the information
 needed to generate Rust API projections and the necessary glue code. Interop
 tooling that is used during builds will not read C++ source files, to maintain
 the principle that C++ API information is only located in headers, and that a
 C++ library can't break the build of its dependencies by changing source files.

 Some interop-adjacent tools (e.g., large-scale refactoring tools that seed the
 initial set of lifetime annotations) will also read C++ sources. These tools
 will not be used during builds.

 **Pros**

 *   **Minimal barrier to entry**: minimal amount of manual work is required to
     start using a C++ library from Rust.
     *   Encourages leaf projects to start incrementally adopting Rust in new
         code, or incrementally rewriting C++ targets in Rust.
 *   **C++ API information is located only in headers**, regardless of the
     language that the API consumer is written in (C++ or Rust).
 *   **Interop tooling that generates Rust API projections from a C++ header can
     get exactly the same information that the C++ compiler has** when processing
     a translation unit that uses one of the APIs declared within that header.
     *   Interop tooling can generate the most performant calls to C++ APIs,
         without C++-side thunks that translate the C++ ABI into a C ABI.
     *   Interop tooling can autodetect implementation details that are critical
         for interop but are not a part of the API surface (for example, the size
         and alignment of C++ classes that have private data members).
     *   In alternative solutions, users need to repeat these implementation
         details in sidecar files. Interop can verify that the specified
         information is correct through static assertions in generated C++ code,
         but the overall user experience is inferior.
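
As a rough sketch of why size and alignment matter even when data members are private, consider a hypothetical C++ class `Widget` for which tooling has (by assumption) read a size of 16 bytes and an alignment of 8 bytes from the header. The name and the numbers are illustrative only.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// Hypothetical Rust projection of a C++ class `Widget` with private data
/// members. The fields are not visible to Rust, so the projection only
/// reserves storage with the size and alignment that the tooling is assumed
/// to have computed from the C++ header: 16 bytes, 8-byte aligned.
#[repr(C, align(8))]
pub struct Widget {
    opaque: [MaybeUninit<u8>; 16],
}

// Compile-time checks: the build fails if the Rust-side layout ever stops
// matching the numbers recorded at generation time.
const _: () = assert!(size_of::<Widget>() == 16);
const _: () = assert!(align_of::<Widget>() == 8);

/// Returns the (size, alignment) pair that Rust observes for `Widget`.
pub fn widget_layout() -> (usize, usize) {
    (size_of::<Widget>(), align_of::<Widget>())
}
```

With the layout known, Rust code can hold such a value on its own stack or inside its own structs instead of only behind a pointer.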

 **Cons**

 *   **Having to read C++ headers makes interop tooling more complex.**
 *   **The Rust projection of the C++ API is only visible in machine-generated
     files.**
     *   These are not trivially accessible.
     *   There is a limit on how readable these files can be made.
     *   We can mitigate these issues by building tooling that shows the Rust
         view of a C++ header (for example in Code Search, or in editors as an
         alternative go-to-definition target).

 ### Customizability

 Interop tooling will be customizable enough to accommodate both existing
 codebases and the unique needs of individual C++ libraries within them. C++ API
 owners can:

 *   **Guide how interop tooling generates Rust API projections from C++
     headers**. For example, headers can provide:
     *   Custom Rust names for C++ function overloads (instead of applying the
         general interop strategy for function overloads),
     *   Custom Rust names for overloaded C++ operators,
     *   Custom Rust lifetimes for pointers and references mentioned in the C++
         API,
     *   Nullability information for pointers in the C++ API,
     *   Assertions (verified at compile time) and promises (not verified by
         tooling) that certain C++ types are trivially relocatable.
 *   **Provide custom logic to bridge types**, for example, mapping C++
     `absl::StatusOr` to Rust `Result`.
 *   **Provide API overlays** that improve the automatically generated Rust API.
     *   For example, the overlays could inject additional methods into
         automatically generated Rust types or hide some of the generated
         methods.

 More intrusive customization techniques will be useful for template and
 macro-heavy libraries where the baseline import rules just won't work. We
 believe customizability will be an essential enabler for providing high-fidelity
 interop.
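
To make the idea of custom type bridging more concrete, here is a minimal sketch. `Status` and `StatusOr` below are simplified, hypothetical stand-ins written in Rust for this example; they do not reflect the real layout of `absl::Status` or `absl::StatusOr`, and `into_result` is an assumed name for the bridging step.

```rust
/// Hypothetical, simplified stand-in for `absl::Status`.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    /// 0 means OK in this sketch.
    pub code: i32,
    pub message: String,
}

/// Hypothetical, simplified stand-in for an `absl::StatusOr<T>` value after
/// it has crossed the language boundary: a status plus an optional payload.
#[derive(Debug)]
pub struct StatusOr<T> {
    pub status: Status,
    pub value: Option<T>,
}

impl<T> StatusOr<T> {
    /// Bridging logic: converts the C++-shaped value into an idiomatic
    /// Rust `Result`.
    pub fn into_result(self) -> Result<T, Status> {
        match (self.status.code, self.value) {
            (0, Some(value)) => Ok(value),
            // An OK status without a payload violates the invariant this
            // sketch assumes; report it as an error instead of panicking.
            (0, None) => Err(Status {
                code: -1,
                message: "OK status without a value".to_string(),
            }),
            (_, _) => Err(self.status),
        }
    }
}
```

Once the value is a `Result`, callers can use `?` and the usual combinators instead of inspecting a status code by hand.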

 ### Source of additional information that customizes C++ API projection into Rust

 Where C++ headers don't already provide all information necessary for interop
 tooling to generate a Rust API projection, we will add such information to C++
 headers whenever possible. If it is not desirable to edit a certain C++ header,
 extra information can be stored in a sidecar file.

 Examples of additional information that interop tooling will need:

 *   **Nullability annotations.** C++ APIs often expose pointers that are
     documented or assumed by convention to be never null, but can't be
     refactored to references due to language limitations (for example,
     `std::vector<MyProtobuf *>`). If C++ headers don't provide nullability
     information for pointers in a machine-readable form, interop tooling has to
     conservatively mark all C++ pointers as nullable in the Rust API projection.
     The Rust compiler will then force users to write unnecessary (and
     untestable) null checks.
 *   **Lifetimes of references and pointers** in C++ headers are not described in
     a machine-readable way (and sometimes are not even documented in prose).
     Lifetime information is essential to generate safe and idiomatic Rust APIs
     from C++ headers.
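
The effect of these two kinds of annotations on the Rust side can be sketched as follows. `MyProtobuf` here is a hypothetical Rust stand-in, and the function names are invented for the example; the sketch shows only the shape of the caller-facing signatures, not generated bindings.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a C++ protobuf message type.
pub struct MyProtobuf {
    pub id: u32,
}

/// Caller of an API whose pointer carries no nullability annotation: the
/// projection has to use `Option<&T>`, so the `None` case must be handled
/// even if the C++ side never returns null.
pub fn id_from_unannotated(msg: Option<&MyProtobuf>) -> Option<u32> {
    msg.map(|m| m.id)
}

/// Caller of the same API once the header states the pointer is never null:
/// the projection can use a plain reference and the check disappears.
pub fn id_from_nonnull(msg: &MyProtobuf) -> u32 {
    msg.id
}

/// With a lifetime annotation, the returned reference is tied to the
/// argument it points into, which the borrow checker can then enforce.
pub fn id_field<'a>(msg: &'a MyProtobuf) -> &'a u32 {
    &msg.id
}

/// `Option<&T>` is guaranteed to have the same size as a raw pointer, so
/// modeling nullability this way adds no storage cost.
pub fn nullable_ref_is_pointer_sized() -> bool {
    size_of::<Option<&MyProtobuf>>() == size_of::<*const MyProtobuf>()
}
```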

 #### Additional information is stored in C++ headers

 **Pros**

 *   **Additional information needed for C++/Rust interop will be expressed as
     annotations on existing syntactic elements in C++.**
     *   The annotations are located in the most logical place.
     *   The annotations are more likely to be noticed and updated by C++ API
         owners.
     *   API owners retain full control over how the API looks in Rust.
 *   **C++ users may find lifetime and nullability annotations useful.** For
     example, information about lifetimes is highly important to C++ and Rust
     users alike.
 *   **C++ API definitions are only written once,** minimizing duplication and
     maintenance burden.

 **Cons**

 *   **Annotations that benefit Rust users can bother C++ API owners** who don't
     care about Rust. Especially at the beginning of integrating Rust into an
     existing codebase, C++ API owners can push back on adding annotations.
     *   To encourage adoption of annotations, we can develop tooling for C++
         that uses lifetime and nullability annotations to find bugs in C++ code.
     *   The pushback is likely to be short-term: if Rust takes off in a C++
         codebase, C++ library owners in that codebase will need to care about
         Rust users and how their API looks in Rust.
 *   **There may be headers that we cannot (or would not want to) change**, for
     example, headers in third-party code, headers that are open-sourced, or when
     first-party owners are not cooperating.
     *   We can apply the
         [sidecar strategy](#additional-information-is-stored-in-sidecar-files)
         to these headers.

 #### Additional information is stored in sidecar files

 Additional information needed for C++/Rust interop can be stored in sidecar
 files, similarly to Swift APINotes, CLIF etc. If sidecar files get sufficiently
 broad adoption (for example, if annotating third-party code turns out to be
 sufficiently important that optimizing C++/Rust interop ergonomics there would
 be worth it), it would make sense to write sidecar files in a Rust-like
 language, as that provides the most natural way to define Rust APIs.

 **Pros**

 *   **Sidecar files enable broader adoption of annotations** by providing
     additional interop information without modifying C++ headers. Sidecar files
     will allow us to annotate headers in third-party code, headers that can't
     adopt annotations for technical reasons, or headers owned by first-party
     owners who are not cooperating.

 **Cons**

 *   Like in the
     [Use Rust code to customize API projection into Rust](#use-rust-code-to-customize-api-projection-into-rust)
     alternative, **some part of C++ API information is duplicated**, which is a
     burden for the C++ API owners.
 *   The projection of C++ APIs to Rust is defined in a new language.
     *   C++ API owners and Rust users will have to learn this language.
     *   If we expect wide adoption of sidecar files, we will need to create
         tooling to parse, edit, and run LSCs against this language.
 *   **Annotations in sidecar files are more prone to become out of sync with the
     C++ code.** When making changes to C++ code, engineers are less likely to
     notice and update the annotations in sidecar files.
     *   Presubmits can catch some cases of desynchronization between C++ headers
         and sidecar files. However, presubmit errors that remind engineers to
         edit more files create an inferior user experience.
 *   **Sidecar files create extra friction to modify the code.** Where previously
     one had to edit only a C++ header and a C++ source file, now one also likely
     needs to update a sidecar file.
     *   When engineers realize that they need to update a sidecar file, opening
         another file and finding the right place to update creates extra
         friction to modify code.
     *   Once engineers understand the extra maintenance burden associated with
         sidecar files that tend to go out of sync with headers, they will be
         less likely to adopt annotations in the first place.

 ### Glue code generation

 C++/Rust interop tooling will generate executable glue code and type definitions
 in Rust and in C++ (not merely `extern "C"` function declarations) in order
 to achieve the following goals:

 *   **Enable instantiating C++ templates from Rust, and monomorphizing Rust
     generics from C++. Enable Rust types to participate in C++ inheritance
     hierarchies.**
     *   For example, imagine Rust code using an object of type
         `std::vector<MyProtobuf>`, while C++ code in the same program is never
         instantiating this type. The Bazel `rust_library` target that mentions
         this type must therefore be responsible for instantiating this template
         and linking the resulting executable code into the final program. We
         propose that this instantiation happens in an automatically generated
         "glue" C++ translation unit that is a part of that `rust_library`.
 *   **Enable automatically wrapping C++ code to be more ergonomic in Rust.** For
     example:
     *   `extern "C"` functions in Rust are necessarily unsafe (it is a language
         rule). We would like the vast majority of C++ API projections into Rust
         to be safe. In the current Rust language, we can achieve that only by
         wrapping the unsafe `extern "C"` function in a safe function marked with
         `#[inline(always)]`.
     *   C++ API owners can provide rules for automatic type bridging, for
         example, mapping C++ `absl::StatusOr` to Rust `Result`. This conversion
         necessitates generation of a Rust wrapper function around a C++ entry
         point that takes advantage of such type bridging.
 *   **Provide stable locations (C++ modules, Rust crates) that "own" the types
     from the language point of view.**
     *   For example, when we project a C++ type into Rust, its Rust definition
         must be located in a Rust crate. Furthermore, all Rust users of this
         type must observe it as being defined in the same crate in order for
         every user to treat it as the same type. Indeed, Rust's rules state
         that types defined in different crates are unrelated types.
     *   When we project a Rust type into C++ we could repeat its C++ definition
         in C++ code any number of times (for example, in every C++ user of a
         Rust type). This is technically fine because C++ allows the same type to
         be defined multiple times within a program. Nevertheless, such
         duplication is error-prone.
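
The safe-wrapper pattern from the list above can be sketched as follows. To keep the example self-contained, the "C++ entry point" is a stand-in defined in Rust; in real bindings it would be a declaration resolved by the linker. Both function names are hypothetical.

```rust
use std::ffi::{c_char, CStr};

/// Stand-in for a C++ entry point with a C ABI. It is defined in Rust here
/// only so that the sketch compiles and runs on its own.
///
/// # Safety
/// `s` must point to a valid NUL-terminated string.
pub unsafe extern "C" fn cpp_string_length(s: *const c_char) -> usize {
    // SAFETY: guaranteed by this function's own safety contract.
    unsafe { CStr::from_ptr(s) }.to_bytes().len()
}

/// Safe wrapper of the kind that generated glue code could provide. `&CStr`
/// already guarantees a valid NUL-terminated string, so callers need no
/// `unsafe`; `#[inline(always)]` asks the compiler to remove the extra call.
#[inline(always)]
pub fn string_length(s: &CStr) -> usize {
    // SAFETY: `s.as_ptr()` is valid and NUL-terminated for the whole call.
    unsafe { cpp_string_length(s.as_ptr()) }
}
```

The wrapper is where the safety argument lives: the parameter type encodes the precondition of the unsafe function, so the `unsafe` block is written once in glue code rather than at every call site.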

 ### Glue code is generated as C++ and Rust source code

 Interop tooling will generate glue code as C++ and Rust source files, which are
 then compiled with an unmodified compiler for that language. The alternative is
 to generate LLVM IR or object files with machine code directly from interop
 tooling.

 **Pros**

 *   **It is easy to inject customizations provided by API owners into generated
     source code.**
     *   The customizations will be written in the target language, making it
         (hopefully) intuitive to write them.
 *   **Generated source code can be easily inspected by compiler engineers**
     while debugging interop problems and compiler bugs.
 *   **Generated source code can be inspected and understood by interop users,**
     who are not compiler experts.
     *   LLVM IR wouldn't be meaningful to them.
 *   **Generated source code is processed by the regular toolchain like any other
     code in the project.**
     *   It automatically benefits from all performance optimizations and
         sanitizers that are newly implemented in Clang and Rust compilers.
 *   **We avoid adding a new tool that generates unique LLVM IR patterns.**
     *   We avoid making the job of the C++ toolchain maintainers harder.

 **Cons**

 *   **Interop tooling will be limited to generating LLVM IR and machine code
     that Clang and Rust compilers can generate.**

 ### Glue code and API projections will assume implementation details of the target execution environment

 To provide the most ergonomic and performant interop, C++/Rust interop tooling
 will allow the target codebase to opt into assuming various implementation
 details of the target execution environment. For example:

 *   When calling C++ from Rust, interop tooling can either wrap C++ functions in
     thunks with a C calling convention, or call C++ entry points directly.
     Thunks cause code bloat and can collectively add up to become a performance
     problem, so it is desirable to call C++ entry points from Rust directly.
     Interop tooling can do that only if it may assume a specific target platform
     and C++ ABI.

 Implementation details of the target execution environment that are considered
 stable enough will be reflected in API projections, for example:

 *   The C++ standard does not specify sizes of integer types (`short`, `int`,
     `long`, etc.). To map them to Rust, interop tooling will need to assume the
     sizes that they have on the platforms that the codebase targets in practice.
     The alternative
     would be to create target-agnostic integer types (for example, `Int` in
     Swift is a strong typedef for `Int32` on 32-bit targets, and `Int64` on
     64-bit targets), but this makes it harder to provide idiomatic, transparent,
     high-performance interop.
 *   The C++ standard does not specify whether standard library types like
     `std::vector` are trivially relocatable; it is an implementation detail.
     Universal interop tooling would have to conservatively assume
     non-trivially-relocatable types. Interop tooling specific to certain
     environments can rely on libc++ providing a trivially-relocatable
     `std::vector` and project it into Rust in a much more ergonomic way.
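
A minimal sketch of the integer-size assumption, using the C-compatible aliases from Rust's standard library as a proxy for what the C++ compiler uses on the current target. The `CppShort` and `CppInt` aliases are hypothetical names for this example.

```rust
use std::ffi::{c_int, c_long, c_short};
use std::mem::size_of;

/// Fixed-width Rust types chosen for C++ `short` and `int`, under the
/// assumption that the codebase only targets platforms where they are 16 and
/// 32 bits wide.
pub type CppShort = i16;
pub type CppInt = i32;

// Compile-time checks that the assumption holds for the current target.
const _: () = assert!(size_of::<CppShort>() == size_of::<c_short>());
const _: () = assert!(size_of::<CppInt>() == size_of::<c_int>());

/// Width in bits of the type chosen for C++ `int`.
pub fn cpp_int_bits() -> usize {
    size_of::<CppInt>() * 8
}

/// Width in bits of C `long` on the current target. It differs between
/// common 64-bit platforms (64 bits on Linux and macOS, 32 bits on Windows),
/// which is why a fixed mapping requires knowing the target set.
pub fn long_bits() -> usize {
    size_of::<c_long>() * 8
}
```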

 **Pros**

 *   **Interop tooling will generate the most performant code sequences** to call
     foreign language functions.
     *   If interop tooling generates portable code, it would have some overhead.
         The overhead can be eliminated by C++ and Rust optimizers at least in
         some cases, but at the cost of increased build times. For example,
         eliminating thunks would require turning on LTO, which is not fast, and
         usually only used for release builds. It is much preferable to not
         generate thunks in the first place, if the target platform does not need
         them.
 *   **Ergonomics of API projections will be improved.**
     *   For example, whether a C++ type is trivially relocatable or not is an
         implementation detail in C++, transparent to C++ users of that type, but
         it makes a huge ergonomic difference in the Rust API projection.
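
The ergonomic difference can be sketched in plain Rust. Both types below are hypothetical; the second uses `Pin` and `PhantomPinned` as one possible way to model a type that must not be relocated, which is an assumption of this example rather than a statement about generated bindings.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical projection of a C++ type that is known to be trivially
/// relocatable: an ordinary Rust value that can be moved freely.
#[derive(Debug, PartialEq)]
pub struct Relocatable {
    pub value: i32,
}

/// Hypothetical projection of a C++ type that must not be moved by `memcpy`
/// (for example, because it stores a pointer into itself). `PhantomPinned`
/// makes it `!Unpin`, so safe code can only mutate it through `Pin`.
pub struct NonRelocatable {
    value: i32,
    _pinned: PhantomPinned,
}

impl NonRelocatable {
    /// Constructs the value directly at its final heap address.
    pub fn new(value: i32) -> Pin<Box<Self>> {
        Box::pin(NonRelocatable { value, _pinned: PhantomPinned })
    }

    /// Mutation has to go through `Pin<&mut Self>`.
    pub fn set(self: Pin<&mut Self>, value: i32) {
        // SAFETY: only a plain field is overwritten; the value is not moved.
        unsafe { self.get_unchecked_mut().value = value };
    }

    /// Reads the stored value through a pinned shared reference.
    pub fn get(self: Pin<&Self>) -> i32 {
        self.value
    }
}
```

A `Relocatable` can be passed and returned by value like any Rust struct, whereas every use of `NonRelocatable` involves `Pin`, `as_mut()`, and `as_ref()`.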

 **Cons**

 *   **C++ code will have additional evolution constraints.**
     *   For example, changing a type from trivially relocatable to non-trivially
         relocatable is a non-API-breaking change for C++ users, but it would
         break Rust users.
 *   **It would be more difficult to switch internal environments to a different
     C++ standard library.**
 *   **Code that is deployed in environments that have incompatible
     implementation details won't be able to use this C++/Rust interop system.**
     *   Alternatively, these executables would have to bring a suitable
         execution environment with them (e.g., a copy of libc++).

 ### Interop tooling should be maintainable and evolvable for a long time

 We should design and implement C++/Rust interop tooling in such a way that we
 can maintain and evolve it for more than a decade. If Rust becomes tightly
 integrated into an existing C++ project, specific requirements for interop and
 API projection rules will keep changing. The more Rust adoption we have, the
 more library- and team-specific interop customizations we will have to
 support, and the more it will make sense for the performance team to tweak
 generated code to implement sweeping optimizations. These kinds of changes
 should be readily possible, and they should not create conflicts of interest
 between different users of the interop tooling.

 ### Interop tooling should facilitate C++ to Rust migration

 C++/Rust interop tooling should try to create a favorable environment for
 migrating C++ code to Rust. Specifically, projections of C++ APIs into Rust
 should be implementable in Rust. This way, a C++ library can be converted from
 C++ into Rust transparently for its users, as its public API won't change.

 ## Alternatives Considered: Design decisions

 ### Repeat C++ API completely in a separate IDL

 Instead of reading C++ headers in the interop tooling, we would require the user
 to repeat the C++ API in some other form, for example, in a Rust-based IDL like
 in the cxx crate, or in sidecar files in a completely new format.

 **Pros**

 *   **Interop tooling can be simpler if it does not have to read C++ headers**.
     But even under this alternative approach, tooling might want to read C++
     headers, nullifying this advantage. For example, tooling might want to
     automatically generate an initial Rust snippet or to suggest in presubmits
     to adjust the Rust code that mirrors a C++ API when that C++ API changes.
 *   The **most natural way to define Rust APIs** is by using Rust code or
     Rust-like syntax in sidecar files.
 *   **Available Rust APIs are defined in easily accessible checked-in files.**
 *   **API definitions written by a human might have higher quality, on
     average.**

 **Cons**

 *   **A big part of the C++ API needs to be duplicated** to reliably match the
     Rust code with the C++ declarations. The initial code can be generated by
     tooling, but it has to be kept in sync. This is a burden for the C++ API
     owners, potentially a bigger one than allowing annotations in C++ headers.
     *   There is a risk that C++ API owners might refuse to own IDL files.
 *   The need to create a sidecar file creates a **barrier to start using C++
     libraries from Rust.**
     *   While the duplication overhead is justifiable for widely-used libraries,
         it is relatively high for libraries with few users and binaries, making
         it less likely that leaf teams will start adopting Rust.
 *   **When the C++ API is changed, the Rust definitions become out-of-sync with
     it.** Tooling needs to detect this, and the Rust definitions need to be
     changed (either manually or tool-assisted).
 *   There is no effective way to verify Rust binding code at the presubmit time
     of a C++ library other than building downstream projects.
 *   **Mapping Rust API definitions to the original C++ API definitions is more
     complicated and error-prone**. For example, how would we target a specific
     overload of a function or constructor?
 *   There is a **risk that individual teams will build team-specific tooling
     that generates IDL files** from C++ headers or generates both IDL files and
     C++ headers from a single source. These solutions are unlikely to scale to
     existing large codebases and will likely only work for that specific team.

 ### Use Rust code to customize API projection into Rust

 An alternative to storing additional information in C++ headers is to put it
 into Rust code. For example, the cxx crate requires users to re-state the C++
 API in Rust syntax, adding information about lifetimes and nullability. The pros
 and cons of this choice are the same as when defining a special IDL that repeats
 the C++ API completely (see above).

 ### Generate glue code in binary formats

 Instead of generating glue code as textual sources, interop tooling could use
 Clang and LLVM APIs to emit object files with C++ glue code and use Rust
 compiler APIs to generate rmeta and rlib files with Rust glue code.

 **Pros**

 *   **More flexibility in the code that can be generated.** Controlling LLVM IR
     generation allows interop tooling to generate code that an unmodified
     compiler can't generate from textual source code. For example, the Rust
     language does not have any constructs that map to `linkonce_odr` functions
     in LLVM IR; if the interop tooling embedded the Rust compiler as a library
     and had more control over how it generates the IR, we could make that
     happen.

 **Cons**

 *   Injecting customizations provided by API owners is harder.
 *   LLVM, Clang, and Rust compiler APIs are not stable. The format of Rust
     metadata files is not stable either. The larger the API subset we consume
     from Clang and Rust, the more difficult it becomes to maintain the tooling.
 *   To generate object files, the interop tooling has to ensure that its
     Clang/LLVM version and configuration are identical to those of the Clang
     compiler used to build other C++ code.
     *   We can solve this problem, but it makes the system more fragile,
         compared to using existing C++ and Rust compilers to compile generated
         sources.
 *   From time to time LLVM introduces bugs that cause miscompilations. If
     interop tooling embeds LLVM, we would be adding another tool that toolchain
     engineers will need to look into when debugging a miscompilation. We would
     be making the job of C++ toolchain maintainers harder.

 ## Alternatives Considered: Existing tools

 ### bindgen

 [bindgen](https://rust-lang.github.io/rust-bindgen/) **automatically generates
 Rust bindings from C and C++ headers**, which it consumes using libclang. The
 generated **bindings are pure Rust code** that interfaces with C and C++ using
 Rust’s [built-in FFI for C](https://doc.rust-lang.org/nomicon/ffi.html)
 (`#[repr(C)]` to indicate that a struct should use C memory layout and `extern
 "C"` to indicate that a function should use a C calling convention). C++
 functions are handled by generating a Rust `extern "C"` function that has the
 same ABI as the C++ function and attaching a `link_name` attribute with the
 mangled name.

 See
 [here](https://manishearth.github.io/blog/2021/02/22/integrating-rust-and-c-plus-plus-in-firefox/)
 for an in-depth description of the use of bindgen in Stylo, a Rust component in
 Firefox.

 **Pros**

 *   **The oldest and the most mature** of the existing C++ interop tools
     (developed
     [since Feb 2012](https://github.com/rust-lang/rust-bindgen/commit/9fe92b0cfd48d5ebd1c82af8b1ff041f8c416a65)).

 **Cons**

 *   **Deficiencies in safety and ergonomics**, for example:
     *   References are imported as pointers. No lifetimes, no null-safety.
     *   Constructors and destructors are not called automatically.
     *   Overloads are distinguished by a numbered suffix in Rust. These numbers
         clutter the source code and are hard to remember, as they have no
         meaning. Adding overloads can change the numbering and hence break Rust
         callers.
 *   It is **impossible to use C++ inline functions and templates** from Rust
     because of bindgen’s architecture[^1]. The architecture is unlikely to
     change, and therefore, this is a dealbreaker.

 **Evaluation**

 bindgen could be used in a project that has very limited C++ interop needs.
 However, creating safe and ergonomic wrappers for the generated bindings would
 require additional effort. Our vision and goals for C++ interop are very
 different from what bindgen provides.

 ### cbindgen

 [cbindgen](https://github.com/eqrion/cbindgen) **automatically generates C or
 C++ headers for Rust libraries which expose a public C API**.

 **Pros**

 *   **An old and mature tool** (developed
     [since March 2017](https://github.com/eqrion/cbindgen/commit/215d3a987b223d4a1a878e2385c8677d5ae3a80b)).

 **Cons**

 *   **Shallow understanding of Rust's modules and types**.

     *   [`cbindgen`'s docs](https://github.com/eqrion/cbindgen/blob/master/docs.md)
         point out that "A major limitation of cbindgen is that it does not
         understand Rust's module system or namespacing. This means that if
         cbindgen sees that it needs the definition for MyType and there exists
         two things in your project with the type name MyType, it won't know what
         to do. Currently, cbindgen's behaviour is unspecified if this happens."
     *   This limitation seems mostly caused by building `cbindgen` on top of
         [the `syn` crate](https://docs.rs/syn). `syn` is able to parse Rust
         source code into an AST, but there is no facility at the `syn` level for
         type deduction or module traversal. Building such functionality would
         require replicating parts of the `rustc` compiler into `cbindgen`, or
         alternatively rewriting `cbindgen` on top of
         [the `rustc_driver` crate](https://doc.rust-lang.org/stable/nightly-rustc/rustc_driver/).

 *   **Support of only `extern "C"` functions**.

     *   Supporting Rust functions that use the default calling convention would
         require generating not only C/C++ headers, but also generating Rust
         source with `extern "C"` thunks that trampoline into the original
         function (requiring that `cbindgen` starts generating Rust sources).

 *   **Support of only `#[repr(C)]` structs**.

     *   Default memory layout of Rust structs is
         [unspecified](https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#default-layout-repr-rust:~:text=the%20default%20layout%20of%20structs%20is%20not%20specified)
         and therefore cannot be determined by code examination at the `syn`
         level.
     *   Even if the memory layout could be determined, the layout can change in
         a future compiler version, or change depending on compilation command
         line flags. To prevent using stale layout information, the
         auto-generated FFI code should therefore include compile-time assertions
         that the layout didn't change from the FFI generation time. The
         assertions should be present both in the generated C/C++ headers *and*
         on the Rust side (requiring that `cbindgen` starts generating Rust
         sources). The assertions would effectively verify that the FFI
         generation is driven by the build system (i.e. by Bazel, or Cargo, or
         GN/ninja, rather than manually) and that the integration between the FFI
         tools and the build system doesn't have any bugs (e.g. that it
         faithfully replicates all relevant compilation flags).
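
The thunk approach described for functions with the default calling convention might look roughly like this. Both function names are hypothetical, and a real generator would additionally give the thunk a stable exported symbol name, which is omitted here.

```rust
/// An ordinary Rust function. It uses the default Rust calling convention,
/// which is unspecified and therefore cannot be called from C or C++ directly.
pub fn saturating_scale(x: i32, factor: i32) -> i32 {
    x.saturating_mul(factor)
}

/// Hypothetical generated thunk: it has a C calling convention and simply
/// forwards to the original function. A C/C++ header would declare this
/// function rather than `saturating_scale` itself.
pub extern "C" fn thunk_saturating_scale(x: i32, factor: i32) -> i32 {
    saturating_scale(x, factor)
}
```

The thunk can be stored in a variable of type `extern "C" fn(i32, i32) -> i32`, which is the function-pointer type that a C or C++ caller would see. Functions whose signatures use types without a C equivalent (for example `&str`) would need thunks that also convert arguments.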

 **Evaluation**

 cbindgen could be used in a project that can create a narrow `extern "C"` /
 `#[repr(C)]` API and that is ready to manage the risk of incorrect name/module
 resolution. Wrapping additional Rust APIs would require extra effort.

 **Take-aways for Crubit design**

 Notes and observations about `cbindgen` can guide some design aspects of
 Crubit's [`cc_bindings_from_rs`](../cc_bindings_from_rs/README.md) tool
 (which, like `cbindgen`, generates C++ bindings for Rust crates).
 Using internal compiler knowledge (e.g. memory layout of structs, name and type
 resolution) requires that `cc_bindings_from_rs` depends on
 `rustc_driver` and other internal crates of `rustc`. The API of these crates is
 unstable, which might increase the risk and maintenance cost of Crubit.
 Nevertheless, our experience with maintaining tools based on (also unstable)
 Clang APIs suggests that this extra risk and cost is likely going to be
 acceptable.

 Build determinism requires that the Rust compiler produces the same output for
 the same set of inputs (the same compiler version, the same command-line flags,
 the same sources, etc.). This means that (despite
 [conservative reservations about layout determinism](https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#default-layout-repr-rust:~:text=A%20note%20on%20determinism))
 it should be okay to assume that `cc_bindings_from_rs` and `rustc` invocations
 will observe the same memory layout of structs, but this requires that
 `cc_bindings_from_rs` is built against exactly the same version of
 `rustc_driver` libraries as `rustc`. (This should also be reinforced by
 compile-time assertions in the generated FFI layer.)
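
As a rough sketch of what such compile-time assertions could look like on the Rust side, consider the following. The struct `Sample` and the constants `RECORDED_SIZE` and `RECORDED_ALIGN` are hypothetical; they stand in for a user-defined type and for the values a bindings generator observed when it ran. This is not the code that `cc_bindings_from_rs` actually emits.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct with the default (`repr(Rust)`) layout.
pub struct Sample {
    pub id: u32,
    pub kind: u16,
}

// Assumption: values recorded by the bindings generator at generation time.
pub const RECORDED_SIZE: usize = 8;
pub const RECORDED_ALIGN: usize = 4;

// Compilation fails if the compiler that builds the crate disagrees with the
// layout that the generator observed.
const _: () = assert!(size_of::<Sample>() == RECORDED_SIZE);
const _: () = assert!(align_of::<Sample>() == RECORDED_ALIGN);
```

The generated C++ header would carry the mirror-image checks (for example, a `static_assert` on `sizeof` and `alignof` of the corresponding C++ type), and field offsets could be verified in the same way.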

 ### cxx

 [cxx](https://cxx.rs/) generates **Rust bindings for C++ APIs and vice versa**
 from an **interface definition language (IDL) included inline in Rust source
 code.** cxx generates Rust and C++ source code from IDL definitions. To check
 that the IDL definitions match the actual C++ API, cxx inserts static
 assertions[^2] into the generated C++ code; it does not, however, read the C++
 headers itself. cxx contains built-in, non-customizable bindings for various
 Rust and C++ standard library types.

 As far as we understand, cxx has the following design constraints and goals:

 *   **Ship a stable product for its intended audience.**
     *   As a consequence, improvements such as integrating move semantics are
         not going to be accepted soon. We understand that cxx is not a vehicle
         for experimentation. cxx maintainers would prefer us to first show that
         our ideas work in a fork of cxx or in a different system, such as
         autocxx, and that our improvements pull their weight given the added
         complexity.
 *   **Remain simple and transparent.** There is a limit on the amount of
     complexity that will be tolerated.
     *   There is a chance that improvements such as modeling C++ move semantics
         or various attempts at eliminating thunks will never be accepted in
         upstream cxx.
 *   **Non-goal: Automatically provide high fidelity interop.**
     *   cxx is designed for the use case of an executable where C++ and Rust
         parts communicate through a narrow interface.
 *   **Non-goal: Automatically provide the most performant interop in as many
     cases as possible.** For example:
     *   cxx does not attempt to eliminate C++-side thunks. Instead, using LTO is
         recommended.
     *   cxx considers it acceptable to allocate all objects of "opaque" types on
         the heap. Users who find these heap allocations unacceptable for
         performance reasons are expected to implement a different C++ entry
         point that does not hit this limitation and bind it to Rust instead of
         the original C++ API. Heap allocation is acceptable for many C++ classes
         in most environments, but the exceptions are important enough for us
         that this is a major restriction.

 **Pros**

 *   **Mature and ergonomic enough today for mixing C++ and Rust in existing
     codebases with limited C++ interop needs.**
 *   We avoid being on a tech island: cxx already has an established user
     community.

 **Cons**

 *   cxx’s stability goal makes it **hard to experiment with how the Rust API
     looks.**
 *   **Our goals are unlikely to align well with the goals of the intended user
     audience of cxx.** We would be pulling cxx in directions that make it a
     worse product for its current users.
 *   **Almost no customizability**. Users who are not satisfied with what cxx
     does are expected to wrap the target C++ API in a different C++ API that is
     more friendly to cxx.
 *   cxx tries to be compatible with most standard C++ implementations found in
     the real world, so it **cannot take advantage of unique guarantees provided
     by the target execution environment.**

 **Evaluation**

 cxx could be used in projects with limited C++/Rust interop requirements.
 However, we would not be able to implement many interop features that we
 consider essential (for example, move semantics, templates).

 ### autocxx

 [autocxx](https://github.com/google/autocxx) **automatically generates Rust
 bindings from C++ headers**. As the name implies, it automatically generates IDL
 definitions for cxx, which then produces the actual bindings. In addition,
 autocxx generates its own Rust and C++ code to extend the Rust API beyond what
 cxx itself would provide, for example to support passing POD types by value.
 autocxx consumes C++ headers indirectly by first running bindgen on them and
 then parsing the Rust code output by bindgen.

 autocxx’s
 [design goals](https://www.chromium.org/Home/chromium-security/memory-safety/rust-and-c-interoperability)
 are similar to our own in this document.

 We conducted a case study in which we used autocxx to call an existing
 project's C++ API from Rust.

 **Pros**

 *   **Low barrier to entry**: Bindings are generated from C++ headers, no need
     to write duplicate API definitions.
 *   **Ergonomic mappings** for many C++ constructs.
 *   **Open to contributions that change the generated Rust APIs** or make
     architectural changes.

 **Cons**

 *   **Relatively new and immature.**
 *   **Cannot (yet) consume complex headers without errors.** We’ve managed to
     import some actual Spanner headers, but there are still enough outstanding
     issues that we can’t yet do anything useful with Spanner.
 *   **Architecture can make modifications difficult.** autocxx is built on top
     of two other tools, bindgen and cxx, and the interfaces between these
     components can make it harder to make a modification than it would be in a
     monolithic tool. Specifically:
     *   autocxx uses bindgen to generate a description of the C++ API that it
         can parse easily (as opposed to trying to parse C++ headers either
         directly or using Clang APIs). Since bindgen was not intended for this
         purpose, its output lacks some information that autocxx needs, so
         autocxx [has forked](https://crates.io/crates/autocxx-bindgen) bindgen
         to adapt it to its needs. The forked version emits additional
         information about the C++ API in the form of attributes attached to
         various API elements.
     *   bindgen in turn is built on the libclang API, which doesn’t surface all
         of the functionality available through Clang’s C++ API. Adding features
         to libclang requires additional effort and has a 6-month lead time to
         appear in a stable release (to become eligible to be used from bindgen).
     *   When errors occur, it can be hard to figure out which of the components
         is responsible.
     *   Adding features can require touching multiple components, which requires
         commits to multiple repositories.

 **Evaluation**

 We initially intended to use autocxx to prototype various interop ideas and
 potentially as a basis for a field trial. We still believe this would be
 feasible, but after trying to modify autocxx and its bindgen fork during an
 internal C++/Rust interop study, we feel that autocxx’s complex architecture is
 enough of an impediment that we could achieve our goals with less total effort
 by creating an interop tool from scratch that consists of a single codebase and
 uses the Clang C++ API to directly interface with Clang.

 [^1]: Doing so would require either generating C++ source code or interfacing
     deeply enough with Clang to generate object code for inline functions and
     template instantiation.
 [^2]: cxx also uses tricks such as suitable type conversions that force the C++
     compiler to perform appropriate checks at compile time.