Rust interop design documentation. PiperOrigin-RevId: 448656933

commit: 85ea7778d949e8645cc048fa613884b3abf7d62e
author: Googler <no-reply@google.com> Sat May 14 01:29:31 2022 -0700
committer: Copybara-Service <copybara-worker@google.com> Sat May 14 01:29:59 2022 -0700
tree: f2a8424710954bdb9c19e4642d2992733ba16252
parent: cc262635a041cc76f04bf20d146b53c096d70833
diff --git a/docs/design.md b/docs/design.md
new file mode 100644
index 0000000..628b893
--- /dev/null
+++ b/docs/design.md

@@ -0,0 +1,649 @@
+## Introduction
+
+This document describes the high-level design choices of Crubit, a C++/Rust
+Bidirectional Interop Tool.
+
+## C++/Rust interop goal
+
+**The primary goal of C++/Rust interop tooling is to enable Rust to be used
+side-by-side with C++ in large existing codebases.**
+
+In the short term, we would like to focus on codebases that roughly follow the
+Google C++ style guide, in order to improve interop fidelity. Other, more
+diverse codebases are prospective users in the long term, and their needs will
+be addressed by customization and extension points.
+
+## C++/Rust interop requirements
+
+In support of the interop goal, we identify the following requirements:
+
+1. **Enable using existing C++ libraries from Rust with high fidelity**
+    *   **High fidelity means that interop will make C++ APIs available in Rust,
+        even when those API projections would not be idiomatic, ergonomic,
+        or safe** in Rust, to facilitate a cheap, small-step, incremental
+        migration workflow. Based on the experience of other cross-language
+        interoperability systems and language migrations (for example,
+        Objective-C/Swift, Java/Kotlin, JavaScript/TypeScript), we believe that
+        working in a mixed C++/Rust codebase would be significantly harder if
+        some C++ APIs were not available in Rust.
+    *   **Interop will bridge C++ constructs to Rust constructs only
+        when the semantics match closely**. Bridging large semantic gaps creates
+        a risk of making C++ APIs unusable in Rust, as well as a risk of
+        creating performance problems. For example, interop will not bridge
+        destructive Rust moves and non-destructive C++ moves; instead it will
+        make C++ move constructors and move assignment operators available to
+        use in Rust code. As another example, interop will not bridge C++
+        templates and Rust generics by default.
+    *   Interop should be **performant**, as close to having no runtime cost
+        as possible. The performance costs of the interop should be documented,
+        and where possible, intuitive to the user.
+    *   Interop should be **ergonomic and safe**, as long as ergonomic and
+        safety accommodations do not hurt performance or fidelity.
+        Where a tradeoff is possible, the interop will choose performance
+        and fidelity over ergonomics; the user will be allowed to override
+        this choice.
+    *   **Enable owners of the C++ API to control their Rust API projection**,
+        for example, with attributes in C++ headers and by extending generated
+        bindings with a manually implemented overlay. Such an overlay will
+        wrap or extend generated bindings to improve ergonomics and safety.
+2. **Enable using Rust libraries from C++**
+    *   However, using C++ libraries from Rust has a higher priority than
+        using Rust libraries from C++.
+3. **Put little to no barriers to entry**
+    *   **Ideally, no boilerplate code** needs to be written in order to
+        start using a C++ library from Rust. Adding some extra information
+        can make the generated bindings more ergonomic to use.
+    *   The amount of **duplicated API information is minimized**.
+    *   **Future evolution of C++ APIs should be minimally hindered
+        by the presence of Rust users**.
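+
+The move-semantics example in requirement 1 can be illustrated with a minimal,
+Rust-only sketch. `rust_move` and `cpp_style_move` are hypothetical names
+introduced here, and `std::mem::take` merely stands in for a C++ move
+constructor, under the assumption that the moved-from value ends up empty (C++
+only guarantees a valid but unspecified state for standard library types).
+
+```rust
+/// A Rust move is destructive: after the call, the caller can no longer
+/// name the value it passed in.
+fn rust_move(src: String) -> String {
+    src
+}
+
+/// Stand-in for a C++ move (an assumption of this sketch): the source stays
+/// a live, valid object, which is left empty here.
+fn cpp_style_move(src: &mut String) -> String {
+    std::mem::take(src)
+}
+
+fn main() {
+    let a = String::from("hello");
+    let b = rust_move(a);
+    // `a` is gone; using it here would be a compile error.
+
+    let mut c = b;
+    let d = cpp_style_move(&mut c);
+    // `c` is still usable, like a moved-from C++ object.
+    println!("moved-to: {d:?}, moved-from: {c:?}");
+}
+```
+
+The two operations have different signatures and leave the source in different
+states, which is one way to see why the two kinds of move are kept apart
+rather than bridged.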
+
+## Proposal and high-level design
+
+**We propose to develop our own C++/Rust interop tooling.** There are no
+existing tools that satisfy all of our requirements. Modifying an existing
+tool to fulfill these requirements would take more effort than building a new
+tool from scratch, or might require forking its codebase, given that some
+existing tools have goals that conflict with ours.
+
+See the "alternatives considered" section for a discussion of existing tools.
+
+### Source of information about C++ API
+
+**Interop tooling will read C++ headers**, as they contain the information
+needed to generate Rust API projections and the necessary glue code. Interop
+tooling that is used during builds will not read C++ source files,
+to maintain the principle that C++ API information is only located in headers,
+and that a C++ library can't break the build of its dependencies by
+changing source files.
+
+Some interop-adjacent tools (e.g., large-scale refactoring tools that seed the initial set of
+lifetime annotations) will also read C++ sources. These tools will not be
+used during builds.
+
+**Pros**
+
+*   **Minimal barrier to entry**: a minimal amount of manual work is
+    required to start using a C++ library from Rust.
+    *   Encourages leaf projects to start incrementally adopting Rust in
+        new code, or incrementally rewriting C++ targets in Rust.
+*   **C++ API information is located only in headers**,
+    regardless of the language that the API consumer is
+    written in (C++ or Rust).
+*   **Interop tooling that generates Rust API projections
+    from a C++ header can get exactly the same information that the C++
+    compiler has** when processing a translation unit that uses one of the APIs
+    declared within that header.
+    *   Interop tooling can generate the most performant calls to C++ APIs,
+        without C++-side thunks that translate the C++ ABI into a C ABI.
+    *   Interop tooling can autodetect implementation details that are critical
+        for interop but are not a part of the API surface (for example, the size
+        and alignment of C++ classes that have private data members).
+    *   In alternative solutions, users need to repeat these implementation
+        details in sidecar files. Interop can verify that the specified
+        information is correct through static assertions in generated C++ code,
+        but the overall user experience is inferior.
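+
+The size-and-alignment bullet above can be sketched on the Rust side as
+follows. `Widget`, its 24-byte size, and its 8-byte alignment are made-up
+values for illustration; real bindings would take them from the C++ compiler's
+view of the header, and this is not necessarily how generated bindings lay out
+their types.
+
+```rust
+use std::mem::{align_of, size_of, MaybeUninit};
+
+/// Hypothetical Rust projection of a C++ class `Widget` whose data members
+/// are private. Only the size and alignment are mirrored; the bytes
+/// themselves stay opaque to Rust.
+#[repr(C, align(8))]
+pub struct Widget {
+    _opaque: [MaybeUninit<u8>; 24],
+}
+
+// Compile-time checks: the build fails if the Rust layout drifts away from
+// the numbers assumed for the C++ class.
+const _: () = assert!(size_of::<Widget>() == 24);
+const _: () = assert!(align_of::<Widget>() == 8);
+
+fn main() {
+    println!(
+        "size = {}, align = {}",
+        size_of::<Widget>(),
+        align_of::<Widget>()
+    );
+}
+```
+
+When the numbers come from the header automatically, nobody has to restate
+them by hand in a sidecar file.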
+
+**Cons**
+
+*   **Having to read C++ headers makes interop tooling more complex.**
+*   **The Rust projection of the C++ API is only visible in machine-generated
+    files.**
+    *   These are not trivially accessible.
+    *   There is a limit on how readable these files can be made.
+    *   We can mitigate these issues by building tooling that shows the Rust
+        view of a C++ header (for example in Code Search, or in editors as
+        an alternative go-to-definition target).
+
+### Customizability
+
+Interop tooling will be customizable enough to accommodate existing codebases
+and the unique needs of the different C++ libraries in them. C++ API owners can:
+
+*   **Guide how interop tooling generates Rust API projections
+    from C++ headers**. For example, headers can provide:
+    *   Custom Rust names for C++ function overloads
+        (instead of applying the general interop strategy for
+        function overloads),
+    *   Custom Rust names for overloaded C++ operators,
+    *   Custom Rust lifetimes for pointers and references mentioned in
+        the C++ API,
+    *   Nullability information for pointers in the C++ API,
+    *   Assertions (verified at compile time) and promises (not verified
+        by tooling) that certain C++ types are trivially relocatable.
+*   **Provide custom logic to bridge types**, for example, mapping C++
+    `absl::StatusOr` to Rust `Result`.
+*   **Provide API overlays** that improve the automatically generated Rust API.
+    *   For example, the overlays could inject additional methods into
+        automatically generated Rust types or hide some of the generated methods.
+
+More intrusive customization techniques will be useful for template-heavy and
+macro-heavy libraries, where the baseline import rules won't work. We believe
+customizability will be an essential enabler for providing high-fidelity
+interop.
+
+### Source of additional information that customizes C++ API projection into Rust
+
+Where C++ headers don't already provide all information necessary for interop
+tooling to generate a Rust API projection, we will add such information to C++
+headers whenever possible. If it is not desirable to edit a certain C++ header,
+extra information can be stored in a sidecar file.
+
+Examples of additional information that interop tooling will need:
+
+*   **Nullability annotations.** C++ APIs often expose pointers that are
+    documented or assumed by convention to be never null, but can't be
+    refactored to references due to language limitations
+    (for example, `std::vector<MyProtobuf *>`). If C++ headers don't provide
+    nullability information for pointers in a machine-readable form, interop
+    tooling has to conservatively mark all C++ pointers as nullable in the Rust
+    API projection. The Rust compiler will then force users to write unnecessary
+    (and untestable) null checks.
+*   **Lifetimes of references and pointers** in C++ headers are not described
+    in a machine-readable way (and sometimes are not even documented in prose).
+    Lifetime information is essential to generate safe and idiomatic Rust APIs
+    from C++ headers.
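+
+A minimal sketch of what these two kinds of annotations buy on the Rust side.
+The functions `value_or_zero` and `first` are hypothetical and written by hand
+here; they only illustrate the shape a projection could take, not what the
+tooling actually emits.
+
+```rust
+use std::mem::size_of;
+
+/// Projection without nullability information: the pointer may be null, so
+/// every caller has to handle `None`.
+fn value_or_zero(p: Option<&i32>) -> i32 {
+    match p {
+        Some(v) => *v,
+        None => 0,
+    }
+}
+
+/// Projection with a non-null annotation and a lifetime annotation: the
+/// result is known to borrow from `a`, not from `b`.
+fn first<'a>(a: &'a i32, _b: &i32) -> &'a i32 {
+    a
+}
+
+fn main() {
+    // `Option<&T>` has the same size as a raw pointer, so expressing
+    // nullability in the type costs nothing at run time.
+    assert_eq!(size_of::<Option<&i32>>(), size_of::<*const i32>());
+
+    let x = 7;
+    let r;
+    {
+        let y = 9;
+        r = first(&x, &y); // accepted: `r` borrows only from `x`
+    }
+    println!("{} {} {}", value_or_zero(None), value_or_zero(Some(&x)), r);
+}
+```
+
+Without the lifetime on `first`, the borrow checker would have to assume the
+result may borrow from either argument, and the use of `r` after `y` goes out
+of scope would be rejected.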
+
+#### Additional information is stored in C++ headers
+
+**Pros**
+
+*   **Additional information needed for C++/Rust interop will be expressed
+    as annotations on existing syntactic elements in C++.**
+    *   The annotations are located in the most logical place.
+    *   The annotations are more likely to be noticed and updated by
+        C++ API owners.
+    *   API owners retain full control over how the API looks in Rust.
+*   **C++ users may find lifetime and nullability annotations useful.** For
+    example, information about lifetimes is highly important to C++ and
+    Rust users alike.
+*   **C++ API definitions are only written once,** minimizing duplication
+    and maintenance burden.
+
+**Cons**
+
+*   **Annotations that benefit Rust users can bother C++ API owners** who don't
+    care about Rust. C++ API owners may push back on adding annotations,
+    especially at the beginning of integrating Rust into an existing codebase.
+    *   To encourage adoption of annotations, we can develop tooling for C++
+        that uses lifetime and nullability annotations to find bugs in C++ code.
+    *   The pushback is likely to be short-term: if Rust takes off in a C++ codebase,
+        C++ library owners in that codebase will need to care about Rust users
+        and how their API looks in Rust.
+*   **There may be headers that we cannot (or would not want to) change**,
+    for example, headers in third-party code, headers that are open-sourced,
+    or when first-party owners are not cooperating.
+    *   We can apply the sidecar strategy to these headers.
+
+#### Additional information is stored in sidecar files
+
+Additional information needed for C++/Rust interop can be stored in sidecar
+files, similar to Swift APINotes, CLIF, etc. If sidecar files get sufficiently
+broad adoption (for example, if annotating third-party code turns out to be
+important enough to justify optimizing C++/Rust interop ergonomics there), it
+would make sense to write sidecar files in a Rust-like language, as that
+provides the most natural way to define Rust APIs.
+
+**Pros**
+
+*   **Sidecar files enable broader adoption of annotations** by providing
+    additional interop information without modifying C++ headers. Sidecar files
+    will allow us to annotate headers in third-party code, headers that can't
+    adopt annotations for technical reasons, or headers owned by first-party
+    owners who are not cooperating.
+
+**Cons**
+
+*   Like in the [Use Rust code to customize API projection into Rust](#use-rust-code-to-customize-api-projection-into-rust)
+    alternative, **some part of C++ API information is duplicated**,
+    which is a burden for the C++ API owners.
+*   The projection of C++ APIs to Rust is defined in a new language.
+    *   C++ API owners and Rust users will have to learn this language.
+    *   If we expect wide adoption of sidecar files, we will need to create
+        tooling to parse, edit, and run LSCs against this language.
+*   **Annotations in sidecar files are more prone to become out of sync with
+    the C++ code.** When making changes to C++ code, engineers are less likely
+    to notice and update the annotations in sidecar files.
+    *   Presubmits can catch some cases of desynchronization between C++ headers
+        and sidecar files. However, presubmit errors that remind engineers
+        to edit more files create an inferior user experience.
+*   **Sidecar files create extra friction to modify the code.** Where previously
+    one had to edit only a C++ header and a C++ source file, now one also likely
+    needs to update a sidecar file.
+    *   When engineers realize that they need to update a sidecar file, they
+        also have to open another file and find the right place to update.
+    *   Once engineers understand the extra maintenance burden associated
+        with sidecar files that tend to go out of sync with headers, they will
+        be less likely to adopt annotations in the first place.
+
+### Glue code generation
+
+C++/Rust interop tooling will generate executable glue code and type definitions
+in Rust and in C++ (not merely `extern "C"` function declarations) in order
+to achieve the following goals:
+
+*   **Enable instantiating C++ templates from Rust, and
+    monomorphizing Rust generics from C++. Enable Rust types to participate in
+    C++ inheritance hierarchies.**
+    *   For example, imagine Rust code using an object of type
+        `std::vector<MyProtobuf>`, while C++ code in the same program never
+        instantiates this type. The Bazel `rust_library` target that mentions this
+        type must therefore be responsible for instantiating this template and
+        linking the resulting executable code into the final program. We propose
+        that this instantiation happens in an automatically generated "glue" C++
+        translation unit that is a part of that `rust_library`.
+*   **Enable automatically wrapping C++ code to be more ergonomic in Rust.**
+    For example:
+    *   `extern "C"` functions in Rust are necessarily unsafe (it is a language
+        rule). We would like the vast majority of C++ API projections into Rust
+        to be safe. In the current Rust language, we can achieve that only by
+        wrapping the unsafe `extern "C"` function in a safe function marked with
+        `#[inline(always)]`.
+    *   C++ API owners can provide rules for automatic type bridging,
+        for example, mapping C++ `absl::StatusOr` to Rust `Result`.
+        This conversion necessitates generation of a Rust wrapper function
+        around a C++ entry point that takes advantage of such type bridging.
+*   **Provide stable locations (C++ modules, Rust crates) that "own" the types
+    from the language point of view.**
+    *   For example, when we project a C++ type into Rust, its Rust definition
+        must be located in a Rust crate. Furthermore, all Rust users of this
+        type must observe it as being defined in the same crate (otherwise,
+        two identical type definitions in different crates are unrelated types
+        in Rust).
+    *   When we project a Rust type into C++ we could repeat its C++
+        definition in C++ code any number of times (for example, in every C++
+        user of a Rust type). This is technically fine because C++ allows
+        the same type to be defined multiple times within a program.
+        Nevertheless, such duplication is error-prone.
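+
+The wrapping and type-bridging bullets above can be sketched together. All
+names here (`RawStatusOrI32`, `cpp_checked_div`, `StatusError`, `checked_div`)
+are hypothetical, as is the error code 3. The "C++ entry point" is defined in
+Rust as an `unsafe extern "C"` function only so that the sketch runs on its
+own; in real bindings it would be declared in Rust and defined in C++.
+
+```rust
+/// Hypothetical C-compatible stand-in for a C++ `absl::StatusOr<int>`.
+#[repr(C)]
+pub struct RawStatusOrI32 {
+    ok: bool,
+    value: i32,
+    error_code: i32,
+}
+
+/// Stand-in for the C++ entry point. It is unsafe to call, like any
+/// foreign function.
+unsafe extern "C" fn cpp_checked_div(a: i32, b: i32) -> RawStatusOrI32 {
+    if b == 0 {
+        RawStatusOrI32 { ok: false, value: 0, error_code: 3 }
+    } else {
+        RawStatusOrI32 { ok: true, value: a.wrapping_div(b), error_code: 0 }
+    }
+}
+
+/// Hypothetical Rust-side error type for the bridged status.
+#[derive(Debug, PartialEq)]
+pub struct StatusError {
+    pub code: i32,
+}
+
+/// Safe wrapper that also bridges the raw status-or-value into `Result`.
+#[inline(always)]
+pub fn checked_div(a: i32, b: i32) -> Result<i32, StatusError> {
+    // SAFETY: the entry point has no preconditions on its integer arguments.
+    let raw = unsafe { cpp_checked_div(a, b) };
+    if raw.ok {
+        Ok(raw.value)
+    } else {
+        Err(StatusError { code: raw.error_code })
+    }
+}
+
+fn main() {
+    println!("{:?}", checked_div(6, 3));
+    println!("{:?}", checked_div(1, 0));
+}
+```
+
+Callers only see `checked_div`, a safe function returning `Result`; the unsafe
+call and the conversion live in generated glue code rather than in user code.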
+
+### Glue code is generated as C++ and Rust source code
+
+Interop tooling will generate glue code as C++ and Rust source files, which
+are then compiled with an unmodified compiler for that language. The alternative
+is to generate LLVM IR or object files with machine code directly from interop
+tooling.
+
+**Pros**
+
+*   **It is easy to inject customizations provided by API owners into generated
+    source code.**
+    *   The customizations will be written in the target language, making it
+        (hopefully) intuitive to write them.
+*   **Generated source code can be easily inspected by compiler engineers**
+    while debugging interop problems and compiler bugs.
+*   **Generated source code can be inspected and understood by interop users,**
+    who are not compiler experts.
+    *   LLVM IR wouldn't be meaningful to them.
+*   **Generated source code is processed by the regular toolchain like any other
+    code in the project.**
+    *   It automatically benefits from all performance optimizations and
+        sanitizers that are newly implemented in Clang and Rust compilers.
+*   **We avoid adding a new tool that generates unique LLVM IR patterns.**
+    *   We avoid making the job of the C++ toolchain maintainers harder.
+
+**Cons**
+
+*   **Interop tooling will be limited to generating LLVM IR and machine code
+    that Clang and Rust compilers can generate.**
+
+### Glue code and API projections will assume implementation details of the target execution environment
+
+To provide the most ergonomic and performant interop, C++/Rust interop tooling
+will allow the target codebase to opt into assuming various implementation
+details of the target execution environment. For example:
+
+*   When calling C++ from Rust, interop tooling can either wrap C++ functions in
+    thunks with a C calling convention, or call C++ entry points directly.
+    Thunks cause code bloat and can collectively add up to become a performance
+    problem, so it is desirable to call C++ entry points from Rust directly.
+    Interop tooling can do that only if it may assume a specific target platform
+    and C++ ABI.
+
+Implementation details of the target execution environment that are considered
+stable enough will be reflected in API projections, for example:
+
+*   The C++ standard does not specify sizes of integer types
+    (`short`, `int`, `long`, etc.). To map them to Rust, interop tooling will
+    need to assume the sizes that they have on the platforms that the codebase
+    targets in practice. The alternative would be to create target-agnostic integer types
+    (for example, `Int` in Swift is a strong typedef for `Int32` on 32-bit
+    targets, and `Int64` on 64-bit targets), but this makes it harder to
+    provide idiomatic, transparent, high-performance interop.
+*   The C++ standard does not specify whether standard library types like
+    `std::vector` are trivially relocatable; it is an implementation detail.
+    Universal interop tooling would have to conservatively assume
+    non-trivially-relocatable types. Interop tooling specific to certain
+    environments can rely on libc++ providing a trivially-relocatable
+    `std::vector` and project it into Rust in a much more ergonomic way.
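+
+The integer-size bullet above can be sketched as follows. `CppInt` is a
+hypothetical alias introduced for this sketch; `std::ffi::c_int` and
+`std::ffi::c_long` are the Rust standard library's per-target aliases for C's
+`int` and `long`. The sketch assumes a target where `int` is 32 bits wide and
+fails to compile elsewhere.
+
+```rust
+use std::ffi::{c_int, c_long};
+use std::mem::size_of;
+
+/// A projection that has opted into assuming the width of `int` on its
+/// target platform can expose a plain `i32`.
+pub type CppInt = i32;
+
+// Compile-time check of that assumption for the current target.
+const _: () = assert!(size_of::<CppInt>() == size_of::<c_int>());
+
+fn main() {
+    // The width of `long` differs between common 64-bit platforms, which is
+    // why a target-agnostic mapping cannot simply pick `i64`.
+    println!(
+        "int: {} bytes, long: {} bytes",
+        size_of::<c_int>(),
+        size_of::<c_long>()
+    );
+}
+```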
+
+**Pros**
+
+*   **Interop tooling will generate the most performant code sequences**
+    to call foreign language functions.
+    *   If interop tooling generates portable code, it would
+        have some overhead. The overhead can be eliminated by C++
+        and Rust optimizers at least in some cases, but at the cost of increased
+        build times. For example, eliminating thunks would require
+        turning on LTO, which is slow and usually used only for release builds.
+        It is much preferable to not generate thunks in the first place,
+        if the target platform does not need them.
+*   **Ergonomics of API projections will be improved.**
+    *   For example, whether a C++ type is trivially relocatable or not is an
+        implementation detail in C++, transparent to C++ users of that type, but
+        it makes a huge ergonomic difference in the Rust API projection.
+
+**Cons**
+
+*   **C++ code will have additional evolution constraints.**
+    *   For example, changing a type from trivially relocatable to non-trivially
+        relocatable is a non-API-breaking change for C++ users, but it would
+        break Rust users.
+*   **It would be more difficult to switch internal environments to a different
+    C++ standard library.**
+*   **Code that is deployed in environments that have
+    incompatible implementation details won't be able to use this C++/Rust
+    interop system.**
+    *   Alternatively, these executables would have to bring a suitable
+        execution environment with them (e.g., a copy of libc++).
+
+### Interop tooling should be maintainable and evolvable for a long time
+
+We should design and implement C++/Rust interop tooling in
+such a way that we can maintain and evolve it for more than a decade. If Rust
+becomes tightly integrated into an existing C++ project, specific requirements for interop
+and API projection rules will keep changing. The more Rust adoption we have,
+the more library- and team-specific interop customizations we will have to
+support, and the more it will make sense for the performance team to tweak
+generated code to implement sweeping optimizations. These kinds of changes should
+be readily possible, and they should not create conflicts of interest between
+different users of the interop tooling.
+
+### Interop tooling should facilitate C++ to Rust migration
+
+C++/Rust interop tooling should try to create a favorable environment for
+migrating C++ code to Rust. Specifically, projections of C++ APIs into Rust
+should be implementable in Rust. This way, a C++ library can be converted from
+C++ into Rust transparently for its users, as its public API won't change.
+
+## Alternatives Considered: Design decisions
+
+### Repeat C++ API completely in a separate IDL
+
+Instead of reading C++ headers in the interop tooling, we would require the user
+to repeat the C++ API in some other form, for example, in a Rust-based IDL like
+in the cxx crate, or in sidecar files in a completely new format.
+
+**Pros**
+
+*   **Interop tooling can be simpler if it does not have to read C++ headers**.
+    But even under this alternative approach, tooling might want to read C++
+    headers, nullifying this advantage. For example, tooling might want to
+    automatically generate an initial Rust snippet or to suggest in presubmits
+    to adjust the Rust code that mirrors a C++ API when that C++ API changes.
+*   The **most natural way to define Rust APIs** is by using Rust code or
+    Rust-like syntax in sidecar files.
+*   **Available Rust APIs are defined in easily accessible checked-in files.**
+*   **API definitions written by a human might have higher quality,
+    on average.**
+
+**Cons**
+
+*   **A big part of the C++ API needs to be duplicated** to reliably match the
+    Rust code with the C++ declarations. The initial code can be generated by
+    tooling, but it has to be kept in sync. This is a burden for the C++ API
+    owners, potentially a bigger one than allowing annotations in C++ headers.
+    *   There is a risk that C++ API owners might refuse to own IDL files.
+*   The need to create a sidecar file creates a **barrier to start using C++
+    libraries from Rust.**
+    *   While the duplication overhead is justifiable for widely-used libraries,
+        it is relatively high for libraries with few users and binaries,
+        making it less likely that leaf teams will start adopting Rust.
+*   **When the C++ API is changed, the Rust definitions become out-of-sync
+    with it.** Tooling needs to detect this, and the Rust definitions need to be
+    changed (either manually or tool-assisted).
+*   There is no effective way to verify Rust binding code at the presubmit time
+    of a C++ library other than building downstream projects.
+*   **Mapping Rust API definitions to the original C++ API definitions is more
+    complicated and error-prone**. For example, how would we target a specific
+    overload of a function or constructor?
+*   There is a **risk that individual teams will build team-specific tooling
+    that generates IDL files** from C++ headers or generates both IDL files and
+    C++ headers from a single source. These solutions are unlikely to scale to
+    existing large codebases and will likely only work for that specific team.
+
+### Use Rust code to customize API projection into Rust
+
+An alternative to storing additional information in C++ headers is to put it
+into Rust code. For example, the cxx crate requires users to re-state the C++
+API in Rust syntax, adding information about lifetimes and nullability. The pros
+and cons of this choice are the same as when defining a special IDL that repeats
+the C++ API completely (see above).
+
+### Generate glue code in binary formats
+
+Instead of generating glue code as textual sources, interop tooling could use
+Clang and LLVM APIs to emit object files with C++ glue code and use Rust
+compiler APIs to generate rmeta and rlib files with Rust glue code.
+
+**Pros**
+
+*   **More flexibility in the code that can be generated.** Controlling LLVM IR
+    generation allows interop tooling to generate code that an unmodified
+    compiler can't generate from textual source code. For example, the Rust
+    language does not have any constructs that map to `linkonce_odr` functions
+    in LLVM IR; if the interop tooling embedded the Rust compiler as a library
+    and had more control over how it generates the IR, we could make that
+    happen.
+
+**Cons**
+
+*   Injecting customizations provided by API owners is harder.
+*   LLVM, Clang, and Rust compiler APIs are not stable. The format of Rust
+    metadata files is not stable either. The larger the API subset we consume
+    from Clang and Rust, the more difficult it becomes to maintain the tooling.
+*   To generate object files, the interop tooling has to ensure that its
+    Clang/LLVM version and configuration are identical to those of the Clang
+    compiler used to build other C++ code.
+    *   We can solve this problem, but it makes the system more fragile,
+        compared to using existing C++ and Rust compilers to compile generated
+        sources.
+*   From time to time LLVM introduces bugs that cause miscompilations. If interop tooling
+    embeds LLVM, we would be adding another tool that toolchain engineers will need to
+    look into when debugging a miscompilation. We would be making the job of
+    C++ toolchain maintainers harder.
+
+## Alternatives Considered: Existing tools
+
+### bindgen
+
+[bindgen](https://rust-lang.github.io/rust-bindgen/) **automatically generates
+Rust bindings from C and C++ headers**, which it consumes using libclang. The
+generated **bindings are pure Rust code** that interfaces with C and C++ using
+Rust’s [built-in FFI for C](https://doc.rust-lang.org/nomicon/ffi.html)
+(`#[repr(C)]` to indicate that a struct should use C memory layout and
+`extern "C"` to indicate that a function should use a C calling convention). C++
+functions are handled by generating a Rust `extern "C"` function that has the
+same ABI as the C++ function and attaching a `link_name` attribute with
+the mangled name.
+
+See [here](https://manishearth.github.io/blog/2021/02/22/integrating-rust-and-c-plus-plus-in-firefox/)
+for an in-depth description of the use of bindgen in Stylo,
+a Rust component in Firefox.
+
+**Pros**
+
+*   **The oldest and the most mature** of the existing C++ interop tools.
+
+**Cons**
+
+*   **Deficiencies in safety and ergonomics**, for example:
+    *   References are imported as pointers. No lifetimes, no null-safety.
+    *   Constructors and destructors are not called automatically.
+    *   Overloads are distinguished by a numbered suffix in Rust.
+        These numbers clutter the source code and are hard to remember,
+        as they have no meaning. Adding overloads can change the numbering
+        and hence break Rust callers.
+*   It is **impossible to use C++ inline functions and templates**
+    from Rust because of bindgen’s architecture[^1]. The architecture is
+    unlikely to change, and therefore, this is a dealbreaker.
+
+**Evaluation**
+
+bindgen could be used in a project that has
+very limited C++ interop needs. However, creating safe and ergonomic wrappers
+for the generated bindings would require additional effort. Our vision and goals
+for C++ interop are very different from what bindgen provides.
+
+### cxx
+
+[cxx](https://cxx.rs/) generates **Rust bindings for C++ APIs and vice versa**
+from an **interface definition language (IDL) included inline in Rust
+source code.** cxx generates Rust and C++ source code from IDL definitions.
+To check that the IDL definitions match the actual C++ API, cxx inserts static
+assertions[^2] into the generated C++ code; it does not, however, read the C++
+headers itself. cxx contains built-in bindings for various Rust and C++ standard
+library types that are not customizable.
+
+As far as we understand, cxx has the following design constraints and goals:
+
+*   **Ship a stable product for its intended audience.**
+    *   As a consequence, improvements such as integrating move semantics are
+        not going to be accepted soon. We understand that cxx is not a
+        vehicle for experimentation. cxx maintainers would prefer
+        us to first show that our ideas work in a fork of cxx or in a different
+        system, such as autocxx, and that our improvements pull their weight
+        given the added complexity.
+*   **Remain simple and transparent.** There is a limit on the amount of
+    complexity that will be tolerated.
+    *   There is a chance that improvements such as modeling C++ move semantics
+        or various attempts at eliminating thunks will never be accepted in
+        upstream cxx.
+*   **Non-goal: Automatically provide high fidelity interop.**
+    *   cxx is designed for the use case of an executable where C++ and Rust
+        parts communicate through a narrow interface.
+*   **Non-goal: Automatically provide the most performant interop in as many
+    cases as possible.** For example:
+    *   cxx does not attempt to eliminate C++-side thunks. Instead, using LTO
+        is recommended.
+    *   cxx considers it acceptable to allocate all objects of "opaque" types
+        on the heap. Users who find these heap allocations unacceptable for
+        performance reasons are expected to implement a different C++ entry
+        point that does not hit this limitation and bind it to Rust instead of
+        the original C++ API. Heap allocation is acceptable for many C++ classes
+        in most environments, but the exceptions are important enough for us
+        that this is a major restriction.
+
+**Pros**
+
+*   **Mature and ergonomic enough today for mixing C++ and Rust in existing codebases
+    with limited C++ interop needs.**
+*   We avoid being on a tech island.
+
+**Cons**
+
+*   cxx’s stability goal makes it **hard to experiment with how the Rust API
+    looks.**
+*   **Our goals are unlikely to align well with the goals of the intended
+    user audience of cxx.** We would be pulling cxx in directions that make
+    it a worse product for its current users.
+*   **Almost no customizability**. Users who are not satisfied with what cxx
+    does are expected to wrap the target C++ API in a different C++ API that
+    is more friendly to cxx.
+*   cxx tries to be compatible with most standard C++ implementations found
+    in the real world, so it **cannot take advantage of unique guarantees
+    provided by the target execution environment.**
+
+**Evaluation**
+
+cxx could be used in projects with limited C++/Rust interop requirements.
+However, we would not be able to implement many interop features that we
+consider essential (for example, move semantics and templates).
+
+### autocxx
+
+[autocxx](https://github.com/google/autocxx) **automatically generates Rust
+bindings from C++ headers**. As the name implies, it automatically generates
+IDL definitions for cxx, which then produces the actual bindings. In addition,
+autocxx generates its own Rust and C++ code to extend the Rust API beyond what
+cxx itself would provide, for example to support passing POD types by value.
+autocxx consumes C++ headers indirectly by first running bindgen on them and
+then parsing the Rust code output by bindgen.
+
+autocxx’s [design goals](https://www.chromium.org/Home/chromium-security/memory-safety/rust-and-c-interoperability)
+are similar to our own in this document.
+
+We conducted a case study in which we called an existing project's C++ API
+from Rust through autocxx.
+
+**Pros**
+
+*   **Low barrier to entry**: Bindings are generated from C++ headers,
+    so there is no need to write duplicate API definitions.
+*   **Ergonomic mappings** for many C++ constructs.
+*   **Open to contributions that change the generated Rust APIs** or
+    make architectural changes.
+
+**Cons**
+
+*   **Relatively new and immature.**
+*   **Cannot (yet) consume complex headers without errors.**
+    We’ve managed to import some actual Spanner headers, but there are still
+    enough outstanding issues that we can’t yet do anything useful with Spanner.
+*   **Architecture can make modifications difficult.** autocxx is built on
+    top of two other tools, bindgen and cxx, and the interfaces between these
+    components can make it harder to make a modification than it would be in a
+    monolithic tool. Specifically:
+    *   autocxx uses bindgen to generate a description of the C++ API that it
+        can parse easily (as opposed to trying to parse C++ headers either
+        directly or using Clang APIs). Since bindgen was not intended for this
+        purpose, its output lacks some information that autocxx needs,
+        so autocxx [has forked](https://crates.io/crates/autocxx-bindgen)
+        bindgen to adapt it to its needs. The forked version emits additional
+        information about the C++ API in the form of attributes attached
+        to various API elements.
+    *   bindgen in turn is built on the libclang API, which doesn’t surface all
+        of the functionality available through Clang’s C++ API. Adding features
+        to libclang requires additional effort and has a six-month lead time
+        to appear in a stable release (and thus to become eligible to be used
+        from bindgen).
+    *   When errors occur, it can be hard to figure out which of the components
+        is responsible.
+    *   Adding features can require touching multiple components,
+        which requires commits to multiple repositories.
+
+**Evaluation**
+
+We initially intended to use autocxx to prototype various interop ideas and
+potentially as a basis for a field trial. We still believe this would be
+feasible, but after trying to modify autocxx and its bindgen fork
+during an internal C++/Rust interop study, we feel that autocxx’s complex
+architecture is enough of an impediment that we could achieve our goals with
+less total effort by creating an interop tool from scratch that consists of
+a single codebase and uses the Clang C++ API to directly interface with Clang.
+
+[^1]: Doing so would require either generating C++ source code or interfacing
+deeply enough with Clang to generate object code for inline functions and
+template instantiation.
+
+[^2]: And tricks such as suitable type conversions that force the C++ compiler
+to perform appropriate checks at compile time.