| # High-level design of C++/Rust interop |
| |
| This document describes the high-level design choices of Crubit, a C++/Rust |
| Bidirectional Interop Tool. |
| |
| [TOC] |
| |
| ## C++/Rust interop goal |
| |
| **The primary goal of Crubit is to enable Rust to be used side-by-side with C++ |
| in large existing codebases.** |
| |
| In the short term we would like to focus on codebases that roughly follow the |
| Google C++ style guide to improve the interop fidelity. Other, more diverse |
| codebases are possible prospective users in the long term, and their needs will |
| be addressed by customization and extension points. |
| |
| ## C++/Rust interop requirements |
| |
| In support of the interop goal, we identify the following requirements: |
| |
| 1. **Enable using existing C++ libraries from Rust with high fidelity** |
| * **High fidelity means that interop will make C++ APIs available in Rust, |
| even when those API projections would not be idiomatic, ergonomic, or |
| safe** in Rust, to facilitate a cheap, small-step, incremental migration |
| workflow. Based on the experience of other cross-language |
| interoperability systems and language migrations (for example, |
| Objective-C/Swift, Java/Kotlin, JavaScript/TypeScript), we believe that |
| working in a mixed C++/Rust codebase would be significantly harder if |
| some C++ APIs were not available in Rust. |
| * **Interop will bridge C++ constructs to Rust constructs only when the |
| semantics match closely**. Bridging large semantic gaps creates a risk |
| of making C++ APIs unusable in Rust, as well as a risk of creating |
| performance problems. For example, interop will not bridge destructive |
| Rust moves and non-destructive C++ moves; instead it will make C++ move |
| constructors and move assignment operators available to use in Rust |
| code. As another example, interop will not bridge C++ templates and Rust |
| generics by default. |
| * Interop should be **performant**, as close to having no runtime cost as |
| possible. The performance costs of the interop should be documented, and |
| where possible, intuitive to the user. |
| * Interop should be **ergonomic and safe**, as long as ergonomics and |
| safety accommodations do not hurt performance or fidelity. Where a |
| tradeoff is possible, the interop will choose performance and fidelity |
| over ergonomics; the user will be allowed to override this choice. |
| * **Enable owners of the C++ API to control their Rust API projection**, |
| for example, with attributes in C++ headers and by extending generated |
| bindings with a manually implemented overlay. Such an overlay will wrap |
| or extend generated bindings to improve ergonomics and safety. |
| 2. **Enable using Rust libraries from C++** |
| * However, using C++ libraries from Rust has a higher priority than using |
| Rust libraries from C++. |
| 3. **Put little to no barriers to entry** |
| * **Ideally, no boilerplate code** needs to be written in order to start |
| using a C++ library from Rust. Adding some extra information can make |
| the generated bindings more ergonomic to use. |
| * The amount of **duplicated API information is minimized**. |
| * **Future evolution of C++ APIs should be minimally hindered by the |
| presence of Rust users**. |
| |
| ## Proposal and high-level design |
| |
| **We propose to develop our own C++/Rust interop tooling.** There are no |
| existing tools that satisfy all of our requirements. Modifying an existing tool |
| to fulfill these requirements would take more effort than building a new tool |
| from scratch, or might require forking its codebase, given that some existing |
| tools have goals that conflict with ours. |
| |
| See the "alternatives considered" section for a discussion of existing tools. |
| |
| ### Source of information about C++ API |
| |
| **Interop tooling will read C++ headers**, as they contain the information |
| needed to generate Rust API projections and the necessary glue code. Interop |
| tooling that is used during builds will not read C++ source files, to maintain |
| the principle that C++ API information is only located in headers, and that a |
| C++ library can't break the build of its dependencies by changing source files. |
| |
| Some interop-adjacent tools (e.g., large-scale refactoring tools that seed the |
| initial set of lifetime annotations) will also read C++ sources. These tools |
| will not be used during builds. |
| |
| **Pros** |
| |
| * **Minimal barrier to entry**: a minimal amount of manual work is required to |
| start using a C++ library from Rust. |
| * Encourages leaf projects to start incrementally adopting Rust in new |
| code, or incrementally rewriting C++ targets in Rust. |
| * **C++ API information is located only in headers**, regardless of the |
| language that the API consumer is written in (C++ or Rust). |
| * **Interop tooling that generates Rust API projections from a C++ header can |
| get exactly the same information that the C++ compiler has** when processing |
| a translation unit that uses one of the APIs declared within that header. |
| * Interop tooling can generate the most performant calls to C++ APIs, |
| without C++-side thunks that translate the C++ ABI into a C ABI. |
| * Interop tooling can autodetect implementation details that are critical |
| for interop but are not a part of the API surface (for example, the size |
| and alignment of C++ classes that have private data members). |
| * In alternative solutions, users need to repeat these implementation |
| details in sidecar files. Interop can verify that the specified |
| information is correct through static assertions in generated C++ code, |
| but the overall user experience is inferior. |
| |
| **Cons** |
| |
| * **Having to read C++ headers makes interop tooling more complex.** |
| * **The Rust projection of the C++ API is only visible in machine-generated |
| files.** |
| * These are not trivially accessible. |
| * There is a limit on how readable these files can be made. |
| * We can mitigate these issues by building tooling that shows the Rust |
| view of a C++ header (for example in Code Search, or in editors as an |
| alternative go-to-definition target). |
| |
| ### Customizability |
| |
| Interop tooling will be customizable enough to accommodate existing codebases |
| and the unique needs of the different C++ libraries in them. C++ API owners |
| can: |
| |
| * **Guide how interop tooling generates Rust API projections from C++ |
| headers**. For example, headers can provide: |
| * Custom Rust names for C++ function overloads (instead of applying the |
| general interop strategy for function overloads), |
| * Custom Rust names for overloaded C++ operators, |
| * Custom Rust lifetimes for pointers and references mentioned in the C++ |
| API, |
| * Nullability information for pointers in the C++ API, |
| * Assertions (verified at compile time) and promises (not verified by |
| tooling) that certain C++ types are trivially relocatable. |
| * **Provide custom logic to bridge types**, for example, mapping C++ |
| `absl::StatusOr` to Rust `Result`. |
| * **Provide API overlays** that improve the automatically generated Rust API. |
| * For example, the overlays could inject additional methods into |
| automatically generated Rust types or hide some of the generated |
| methods. |
| |
| More intrusive customization techniques will be useful for template- and |
| macro-heavy libraries where the baseline import rules do not work. We |
| believe customizability will be an essential enabler for providing high-fidelity |
| interop. |
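
The type-bridging idea above can be sketched in Rust. This is an illustration under stated assumptions, not Crubit's actual generated code: `Status`, `RawStatusOr`, `raw_parse_port`, and `parse_port` are hypothetical names, and the raw type only mimics the shape of `absl::StatusOr` so that the example compiles without a C++ toolchain.

```rust
/// Hypothetical stand-in for the Rust projection of a C++ status type.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    pub code: i32,
    pub message: String,
}

/// Hypothetical stand-in for the unbridged projection of `absl::StatusOr<T>`:
/// it mirrors the C++ shape (a status plus an optional value).
pub struct RawStatusOr<T> {
    status: Status,
    value: Option<T>,
}

impl<T> RawStatusOr<T> {
    pub fn ok(value: T) -> Self {
        RawStatusOr { status: Status { code: 0, message: String::new() }, value: Some(value) }
    }

    pub fn error(code: i32, message: &str) -> Self {
        RawStatusOr { status: Status { code, message: message.to_string() }, value: None }
    }

    /// Bridging logic of the kind an API owner might supply: the C++-shaped
    /// type becomes an idiomatic Rust `Result`.
    pub fn into_result(self) -> Result<T, Status> {
        match (self.status.code, self.value) {
            (0, Some(value)) => Ok(value),
            _ => Err(self.status),
        }
    }
}

/// Stand-in for a raw binding that returns the unbridged type.
/// The error code 3 is an arbitrary nonzero value chosen for the sketch.
fn raw_parse_port(text: &str) -> RawStatusOr<u16> {
    match text.parse::<u16>() {
        Ok(port) => RawStatusOr::ok(port),
        Err(_) => RawStatusOr::error(3, "invalid port"),
    }
}

/// Wrapper that tooling could emit once the bridging rule is known.
pub fn parse_port(text: &str) -> Result<u16, Status> {
    raw_parse_port(text).into_result()
}
```

With such a wrapper, Rust callers can use `?` and pattern matching on `parse_port("8080")` instead of inspecting a C++-shaped status object.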
| |
| ### Source of additional information that customizes C++ API projection into Rust |
| |
| Where C++ headers don't already provide all information necessary for interop |
| tooling to generate a Rust API projection, we will add such information to C++ |
| headers whenever possible. If it is not desirable to edit a certain C++ header, |
| extra information can be stored in a sidecar file. |
| |
| Examples of additional information that interop tooling will need: |
| |
| * **Nullability annotations.** C++ APIs often expose pointers that are |
| documented or assumed by convention to be never null, but can't be |
| refactored to references due to language limitations (for example, |
| `std::vector<MyProtobuf *>`). If C++ headers don't provide nullability |
| information for pointers in a machine-readable form, interop tooling has to |
| conservatively mark all C++ pointers as nullable in the Rust API projection. |
| The Rust compiler will then force users to write unnecessary (and |
| untestable) null checks. |
| * **Lifetimes of references and pointers** in C++ headers are not described in |
| a machine-readable way (and sometimes are not even documented in prose). |
| Lifetime information is essential to generate safe and idiomatic Rust APIs |
| from C++ headers. |
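
The effect of nullability annotations on the Rust API can be sketched as follows. `Widget`, `project_nullable`, and `project_nonnull` are hypothetical names used only for illustration; the sketch assumes that an unannotated pointer is projected as `Option<&T>` and an annotated one as `&T`, which is one plausible mapping rather than a statement of what the tooling emits.

```rust
/// Hypothetical stand-in for a C++ class projected into Rust.
#[repr(C)]
pub struct Widget {
    pub id: i32,
}

/// Projection of an unannotated `const Widget*`: the pointer may be null,
/// so the Rust API has to return `Option` and callers must handle `None`.
///
/// # Safety
/// `ptr` must be null or point to a `Widget` that is live for `'a`.
pub unsafe fn project_nullable<'a>(ptr: *const Widget) -> Option<&'a Widget> {
    unsafe { ptr.as_ref() }
}

/// Projection of a pointer that the header annotates as never null:
/// the Rust API can hand out a plain reference.
///
/// # Safety
/// `ptr` must be non-null and point to a `Widget` that is live for `'a`.
pub unsafe fn project_nonnull<'a>(ptr: *const Widget) -> &'a Widget {
    debug_assert!(!ptr.is_null());
    unsafe { &*ptr }
}

/// A caller of the nullable projection is forced to pick a fallback,
/// even if the C++ contract says the pointer is never null.
pub fn id_or_default(widget: Option<&Widget>) -> i32 {
    widget.map_or(-1, |w| w.id)
}
```

The `-1` fallback in `id_or_default` is exactly the kind of branch that cannot be exercised in tests when the C++ side never returns null.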
| |
| #### Additional information is stored in C++ headers |
| |
| **Pros** |
| |
| * **Additional information needed for C++/Rust interop will be expressed as |
| annotations on existing syntactic elements in C++.** |
| * The annotations are located in the most logical place. |
| * The annotations are more likely to be noticed and updated by C++ API |
| owners. |
| * API owners retain full control over how the API looks in Rust. |
| * **C++ users may find lifetime and nullability annotations useful.** For |
| example, information about lifetimes is highly important to C++ and Rust |
| users alike. |
| * **C++ API definitions are only written once,** minimizing duplication and |
| maintenance burden. |
| |
| **Cons** |
| |
| * **Annotations that benefit Rust users can bother C++ API owners** who don't |
| care about Rust. Especially at the beginning of integrating Rust into an |
| existing codebase, C++ API owners can push back on adding annotations. |
| * To encourage adoption of annotations, we can develop tooling for C++ |
| that uses lifetime and nullability annotations to find bugs in C++ code. |
| * The pushback is likely to be short-term: if Rust takes off in a C++ |
| codebase, C++ library owners in that codebase will need to care about |
| Rust users and how their API looks in Rust. |
| * **There may be headers that we cannot (or would not want to) change**, for |
| example, headers in third-party code, headers that are open-sourced, or when |
| first-party owners are not cooperating. |
| * We can apply the |
| [sidecar strategy](#additional-information-is-stored-in-sidecar-files) |
| to these headers. |
| |
| #### Additional information is stored in sidecar files |
| |
| Additional information needed for C++/Rust interop can be stored in sidecar |
| files, similarly to Swift APINotes, CLIF etc. If sidecar files get sufficiently |
| broad adoption (for example, if annotating third-party code turns out to be |
| sufficiently important that optimizing C++/Rust interop ergonomics there would |
| be worth it), it would make sense to write sidecar files in a Rust-like |
| language, as that provides the most natural way to define Rust APIs. |
| |
| **Pros** |
| |
| * **Sidecar files enable broader adoption of annotations** by providing |
| additional interop information without modifying C++ headers. Sidecar files |
| will allow us to annotate headers in third-party code, headers that can't |
| adopt annotations for technical reasons, or headers owned by first-party |
| owners who are not cooperating. |
| |
| **Cons** |
| |
| * Like in the |
| [Use Rust code to customize API projection into Rust](#use-rust-code-to-customize-api-projection-into-rust) |
| alternative, **some part of C++ API information is duplicated**, which is a |
| burden for the C++ API owners. |
| * The projection of C++ APIs to Rust is defined in a new language. |
| * C++ API owners and Rust users will have to learn this language. |
| * If we expect wide adoption of sidecar files, we will need to create |
| tooling to parse, edit, and run LSCs against this language. |
| * **Annotations in sidecar files are more prone to become out of sync with the |
| C++ code.** When making changes to C++ code, engineers are less likely to |
| notice and update the annotations in sidecar files. |
| * Presubmits can catch some cases of desynchronization between C++ headers |
| and sidecar files. However, presubmit errors that remind engineers to |
| edit more files create an inferior user experience. |
| * **Sidecar files create extra friction to modify the code.** Where previously |
| one had to edit only a C++ header and a C++ source file, now one also likely |
| needs to update a sidecar file. |
| * When engineers realize that they need to update a sidecar file, opening |
| another file and finding the right place to update creates extra |
| friction to modify code. |
| * Once engineers understand the extra maintenance burden associated with |
| sidecar files that tend to go out of sync with headers, they will be |
| less likely to adopt annotations in the first place. |
| |
| ### Glue code generation |
| |
| C++/Rust interop tooling will generate executable glue code and type definitions |
| in Rust and in C++ (not merely `extern "C"` function declarations) in order |
| to achieve the following goals: |
| |
| * **Enable instantiating C++ templates from Rust, and monomorphizing Rust |
| generics from C++. Enable Rust types to participate in C++ inheritance |
| hierarchies.** |
| * For example, imagine Rust code using an object of type |
| `std::vector<MyProtobuf>`, while C++ code in the same program is never |
| instantiating this type. The Bazel `rust_library` target that mentions |
| this type must therefore be responsible for instantiating this template |
| and linking the resulting executable code into the final program. We |
| propose that this instantiation happens in an automatically generated |
| "glue" C++ translation unit that is a part of that `rust_library`. |
| * **Enable automatically wrapping C++ code to be more ergonomic in Rust.** For |
| example: |
| * `extern "C"` functions in Rust are necessarily unsafe (it is a language |
| rule). We would like the vast majority of C++ API projections into Rust |
| to be safe. In the current Rust language, we can achieve that only by |
| wrapping the unsafe `extern "C"` function in a safe function marked with |
| `#[inline(always)]`. |
| * C++ API owners can provide rules for automatic type bridging, for |
| example, mapping C++ `absl::StatusOr` to Rust `Result`. This conversion |
| necessitates generation of a Rust wrapper function around a C++ entry |
| point that takes advantage of such type bridging. |
| * **Provide stable locations (C++ modules, Rust crates) that "own" the types |
| from the language point of view.** |
| * For example, when we project a C++ type into Rust, its Rust definition |
| must be located in a Rust crate. Furthermore, all Rust users of this |
| type must observe it as being defined in the same crate in order for |
| every user to consider that they use the same type. Indeed, this is a |
| rule in Rust, that types defined in different crates are unrelated |
| types. |
| * When we project a Rust type into C++ we could repeat its C++ definition |
| in C++ code any number of times (for example, in every C++ user of a |
| Rust type). This is technically fine because C++ allows the same type to |
| be defined multiple times within a program. Nevertheless, such |
| duplication is error-prone. |
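
The safe-wrapper bullet above can be sketched as follows. This is a minimal illustration, not generated Crubit output: `cpp_sum` and `sum` are hypothetical names, and the C++ entry point is simulated by an `unsafe extern "C"` function defined in Rust so that the example compiles on its own. In real bindings it would be a declaration resolved at link time.

```rust
/// Stand-in for a C++ entry point with a C calling convention.
/// (Assumption for this sketch: defined in Rust instead of being declared
/// in an `extern "C"` block, so no C++ toolchain is needed.)
///
/// # Safety
/// `data` must point to `len` readable, initialized `i32` values.
unsafe extern "C" fn cpp_sum(data: *const i32, len: usize) -> i64 {
    let values = unsafe { std::slice::from_raw_parts(data, len) };
    values.iter().map(|&v| i64::from(v)).sum()
}

/// Safe wrapper of the kind that glue code would contain. The slice type
/// carries the pointer/length invariant, so callers need no `unsafe`.
#[inline(always)]
pub fn sum(values: &[i32]) -> i64 {
    // SAFETY: a slice always provides a valid pointer and matching length.
    unsafe { cpp_sum(values.as_ptr(), values.len()) }
}
```

Callers then write `sum(&[1, 2, 3])` without an `unsafe` block, and `#[inline(always)]` asks the compiler to inline the wrapper so that it adds no call of its own.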
| |
| ### Glue code is generated as C++ and Rust source code |
| |
| Interop tooling will generate glue code as C++ and Rust source files, which are |
| then compiled with an unmodified compiler for that language. The alternative is |
| to generate LLVM IR or object files with machine code directly from interop |
| tooling. |
| |
| **Pros** |
| |
| * **It is easy to inject customizations provided by API owners into generated |
| source code.** |
| * The customizations will be written in the target language, making it |
| (hopefully) intuitive to write them. |
| * **Generated source code can be easily inspected by compiler engineers** |
| while debugging interop problems and compiler bugs. |
| * **Generated source code can be inspected and understood by interop users,** |
| who are not compiler experts. |
| * LLVM IR wouldn't be meaningful to them. |
| * **Generated source code is processed by the regular toolchain like any other |
| code in the project.** |
| * It automatically benefits from all performance optimizations and |
| sanitizers that are newly implemented in Clang and Rust compilers. |
| * **We avoid adding a new tool that generates unique LLVM IR patterns.** |
| * We avoid making the job of the C++ toolchain maintainers harder. |
| |
| **Cons** |
| |
| * **Interop tooling will be limited to generating LLVM IR and machine code |
| that Clang and Rust compilers can generate.** |
| |
| ### Glue code and API projections will assume implementation details of the target execution environment |
| |
| To provide the most ergonomic and performant interop, C++/Rust interop tooling |
| will allow the target codebase to opt into assuming various implementation |
| details of the target execution environment. For example: |
| |
| * When calling C++ from Rust, interop tooling can either wrap C++ functions in |
| thunks with a C calling convention, or call C++ entry points directly. |
| Thunks cause code bloat and can collectively add up to become a performance |
| problem, so it is desirable to call C++ entry points from Rust directly. |
| Interop tooling can do that only if it may assume a specific target platform |
| and C++ ABI. |
| |
| Implementation details of the target execution environment that are considered |
| stable enough will be reflected in API projections, for example: |
| |
| * The C++ standard does not specify sizes of integer types (`short`, `int`, |
| `long`, etc.). To map them to Rust, interop tooling will need to assume the |
| sizes that they have on the platforms targeted in practice. The alternative |
| would be to create target-agnostic integer types (for example, `Int` in |
| Swift is a strong typedef for `Int32` on 32-bit targets, and `Int64` on |
| 64-bit targets), but this makes it harder to provide idiomatic, transparent, |
| high-performance interop. |
| * The C++ standard does not specify whether standard library types like |
| `std::vector` are trivially relocatable; it is an implementation detail. |
| Universal interop tooling would have to conservatively assume |
| non-trivially-relocatable types. Interop tooling specific to certain |
| environments can rely on libc++ providing a trivially-relocatable |
| `std::vector` and project it into Rust in a much more ergonomic way. |
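
The integer-size assumption can be sketched in Rust as follows. The aliases `CppShort` and `CppInt` and the function `scale` are hypothetical; the sketch assumes a platform where C++ `short` is 16 bits and `int` is 32 bits, and guards that assumption with compile-time assertions against the C types that the Rust standard library exposes for the current target.

```rust
use std::ffi::{c_int, c_short};
use std::mem::size_of;

// Assumed projections (hypothetical names): fixed-width Rust integers
// stand in for C++ `short` and `int` on the assumed target platform.
pub type CppShort = i16;
pub type CppInt = i32;

// Compilation fails if the target platform violates the assumption.
const _: () = assert!(size_of::<c_short>() == size_of::<CppShort>());
const _: () = assert!(size_of::<c_int>() == size_of::<CppInt>());

/// Because the projection is a plain `i32`, Rust code can use ordinary
/// integer arithmetic and conversions without a wrapper type.
pub fn scale(value: CppInt, factor: CppShort) -> CppInt {
    value * CppInt::from(factor)
}
```

A target-agnostic wrapper type would avoid the assertions, at the cost of explicit conversions at every use site.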
| |
| **Pros** |
| |
| * **Interop tooling will generate the most performant code sequences** to call |
| foreign language functions. |
| * If interop tooling generated portable code, that code would have some |
| overhead. The overhead can be eliminated by C++ and Rust optimizers at least in |
| some cases, but at the cost of increased build times. For example, |
| eliminating thunks would require turning on LTO, which is not fast, and |
| usually only used for release builds. It is much preferable to not |
| generate thunks in the first place, if the target platform does not need |
| them. |
| * **Ergonomics of API projections will be improved.** |
| * For example, whether a C++ type is trivially relocatable or not is an |
| implementation detail in C++, transparent to C++ users of that type, but |
| it makes a huge ergonomic difference in the Rust API projection. |
| |
| **Cons** |
| |
| * **C++ code will have additional evolution constraints.** |
| * For example, changing a type from trivially relocatable to non-trivially |
| relocatable is a non-API-breaking change for C++ users, but it would |
| break Rust users. |
| * **It would be more difficult to switch internal environments to a different |
| C++ standard library.** |
| * **Code that is deployed in environments that have incompatible |
| implementation details won't be able to use this C++/Rust interop system.** |
| * Alternatively, these executables would have to bring a suitable |
| execution environment with them (e.g., a copy of libc++). |
| |
| ### Interop tooling should be maintainable and evolvable for a long time |
| |
| We should design and implement C++/Rust interop tooling in such a way that we |
| can maintain and evolve it for more than a decade. If Rust becomes tightly |
| integrated into an existing C++ project, specific requirements for interop and |
| API projection rules will keep changing. The more Rust adoption we have, |
| the more library- and team-specific interop customizations we will have to |
| support, and the more it will make sense for the performance team to tweak |
| generated code to implement sweeping optimizations. These kinds of changes |
| should be readily possible, and they should not create conflicts of interest |
| between different users of the interop tooling. |
| |
| ### Interop tooling should facilitate C++ to Rust migration |
| |
| C++/Rust interop tooling should try to create a favorable environment for |
| migrating C++ code to Rust. Specifically, projections of C++ APIs into Rust |
| should be implementable in Rust. This way, a C++ library can be converted from |
| C++ into Rust transparently for its users, as its public API won't change. |
| |
| ## Alternatives Considered: Design decisions |
| |
| ### Repeat C++ API completely in a separate IDL |
| |
| Instead of reading C++ headers in the interop tooling, we would require the user |
| to repeat the C++ API in some other form, for example, in a Rust-based IDL like |
| in the cxx crate, or in sidecar files in a completely new format. |
| |
| **Pros** |
| |
| * **Interop tooling can be simpler if it does not have to read C++ headers**. |
| But even under this alternative approach, tooling might want to read C++ |
| headers, nullifying this advantage. For example, tooling might want to |
| automatically generate an initial Rust snippet or to suggest in presubmits |
| to adjust the Rust code that mirrors a C++ API when that C++ API changes. |
| * The **most natural way to define Rust APIs** is by using Rust code or |
| Rust-like syntax in sidecar files. |
| * **Available Rust APIs are defined in easily accessible checked-in files.** |
| * **API definitions written by a human might have higher quality, on |
| average.** |
| |
| **Cons** |
| |
| * **A big part of the C++ API needs to be duplicated** to reliably match the |
| Rust code with the C++ declarations. The initial code can be generated by |
| tooling, but it has to be kept in sync. This is a burden for the C++ API |
| owners, potentially a bigger one than allowing annotations in C++ headers. |
| * There is a risk that C++ API owners might refuse to own IDL files. |
| * The need to create a sidecar file creates a **barrier to start using C++ |
| libraries from Rust.** |
| * While the duplication overhead is justifiable for widely-used libraries, |
| it is relatively high for libraries with few users and binaries, making |
| it less likely that leaf teams will start adopting Rust. |
| * **When the C++ API is changed, the Rust definitions become out-of-sync with |
| it.** Tooling needs to detect this, and the Rust definitions need to be |
| changed (either manually or tool-assisted). |
| * There is no effective way to verify Rust binding code at the presubmit time |
| of a C++ library other than building downstream projects. |
| * **Mapping Rust API definitions to the original C++ API definitions is more |
| complicated and error-prone**. For example, how would we target a specific |
| overload of a function or constructor? |
| * There is a **risk that individual teams will build team-specific tooling |
| that generates IDL files** from C++ headers or generates both IDL files and |
| C++ headers from a single source. These solutions are unlikely to scale to |
| existing large codebases and will likely only work for that specific team. |
| |
| ### Use Rust code to customize API projection into Rust |
| |
| An alternative to storing additional information in C++ headers is to put it |
| into Rust code. For example, the cxx crate requires users to re-state the C++ |
| API in Rust syntax, adding information about lifetimes and nullability. The pros |
| and cons of this choice are the same as when defining a special IDL that repeats |
| the C++ API completely (see above). |
| |
| ### Generate glue code in binary formats |
| |
| Instead of generating glue code as textual sources, interop tooling could use |
| Clang and LLVM APIs to emit object files with C++ glue code and use Rust |
| compiler APIs to generate rmeta and rlib files with Rust glue code. |
| |
| **Pros** |
| |
| * **More flexibility in the code that can be generated.** Controlling LLVM IR |
| generation allows interop tooling to generate code that an unmodified |
| compiler can't generate from textual source code. For example, the Rust |
| language does not have any constructs that map to `linkonce_odr` functions |
| in LLVM IR; if the interop tooling embedded the Rust compiler as a library |
| and had more control over how it generates the IR, we could make that |
| happen. |
| |
| **Cons** |
| |
| * Injecting customizations provided by API owners is harder. |
| * LLVM, Clang, and Rust compiler APIs are not stable. The format of Rust |
| metadata files is not stable either. The larger the API subset we consume |
| from Clang and Rust, the more difficult it becomes to maintain the tooling. |
| * To generate object files, the interop tooling has to ensure that its |
| Clang/LLVM version and configuration are identical to those of the Clang |
| compiler used to build other C++ code. |
| * We can solve this problem, but it makes the system more fragile, |
| compared to using existing C++ and Rust compilers to compile generated |
| sources. |
| * From time to time LLVM introduces bugs that cause miscompilations. If |
| interop tooling embeds LLVM, we would be adding another tool that toolchain |
| engineers will need to look into when debugging a miscompilation. We would |
| be making the job of C++ toolchain maintainers harder. |
| |
| ## Alternatives Considered: Existing tools |
| |
| ### bindgen |
| |
| [bindgen](https://rust-lang.github.io/rust-bindgen/) **automatically generates |
| Rust bindings from C and C++ headers**, which it consumes using libclang. The |
| generated **bindings are pure Rust code** that interfaces with C and C++ using |
| Rust’s [built-in FFI for C](https://doc.rust-lang.org/nomicon/ffi.html) |
| (`#[repr(C)]` to indicate that a struct should use C memory layout and `extern |
| "C"` to indicate that a function should use a C calling convention). C++ |
| functions are handled by generating a Rust `extern "C"` function that has the |
| same ABI as the C++ function and attaching a `link_name` attribute with the |
| mangled name. |
| |
| See |
| [here](https://manishearth.github.io/blog/2021/02/22/integrating-rust-and-c-plus-plus-in-firefox/) |
| for an in-depth description of the use of bindgen in Stylo, a Rust component in |
| Firefox. |
| |
| **Pros** |
| |
| * **The oldest and the most mature** of the existing C++ interop tools |
| (developed |
| [since Feb 2012](https://github.com/rust-lang/rust-bindgen/commit/9fe92b0cfd48d5ebd1c82af8b1ff041f8c416a65)). |
| |
| **Cons** |
| |
| * **Deficiencies in safety and ergonomics**, for example: |
| * References are imported as pointers. No lifetimes, no null-safety. |
| * Constructors and destructors are not called automatically. |
| * Overloads are distinguished by a numbered suffix in Rust. These numbers |
| clutter the source code and are hard to remember, as they have no |
| meaning. Adding overloads can change the numbering and hence break Rust |
| callers. |
| * It is **impossible to use C++ inline functions and templates** from Rust |
| because of bindgen’s architecture[^1]. The architecture is unlikely to |
| change, and therefore, this is a dealbreaker. |
| |
| **Evaluation** |
| |
| bindgen could be used in a project that has very limited C++ interop needs. |
| However, creating safe and ergonomic wrappers for the generated bindings would |
| require additional effort. Our vision and goals for C++ interop are very |
| different from what bindgen provides. |
| |
| ### cbindgen |
| |
| [cbindgen](https://github.com/eqrion/cbindgen) **automatically generates C or |
| C++ headers for Rust libraries which expose a public C API**. |
| |
| **Pros** |
| |
| * **An old and mature tool** (developed |
| [since March 2017](https://github.com/eqrion/cbindgen/commit/215d3a987b223d4a1a878e2385c8677d5ae3a80b)). |
| |
| **Cons** |
| |
| * **Shallow understanding of Rust's modules and types**. |
| |
| * [`cbindgen`'s docs](https://github.com/eqrion/cbindgen/blob/master/docs.md) |
| point out that "A major limitation of cbindgen is that it does not |
| understand Rust's module system or namespacing. This means that if |
| cbindgen sees that it needs the definition for MyType and there exists |
| two things in your project with the type name MyType, it won't know what |
| to do. Currently, cbindgen's behaviour is unspecified if this happens." |
| * This limitation seems mostly caused by building `cbindgen` on top of |
| [the `syn` crate](https://docs.rs/syn). `syn` is able to parse Rust |
| source code into an AST, but there is no facility at the `syn` level for |
| type deduction or module traversal. Building such functionality would |
| require replicating parts of the `rustc` compiler into `cbindgen`, or |
| alternatively rewriting `cbindgen` on top of |
| [the `rustc_driver` crate](https://doc.rust-lang.org/stable/nightly-rustc/rustc_driver/). |
| |
| * **Support of only `extern "C"` functions**. |
| |
| * Supporting Rust functions that use the default calling convention would |
| require generating not only C/C++ headers, but also generating Rust |
| source with `extern "C"` thunks that trampoline into the original |
| function (requiring that `cbindgen` starts generating Rust sources). |
| |
| * **Support of only `#[repr(C)]` structs**. |
| |
| * Default memory layout of Rust structs is |
| [unspecified](https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#default-layout-repr-rust:~:text=the%20default%20layout%20of%20structs%20is%20not%20specified) |
| and therefore cannot be determined by code examination at the `syn` |
| level. |
| * Even if the memory layout could be determined, the layout can change in |
| a future compiler version, or change depending on compilation command |
| line flags. To prevent using stale layout information, the |
| auto-generated FFI code should therefore include compile-time assertions |
| that the layout didn't change from the FFI generation time. The |
| assertions should be present both in the generated C/C++ headers *and* |
| on the Rust side (requiring that `cbindgen` starts generating Rust |
| sources). The assertions would effectively verify that the FFI |
| generation is driven by the build system (i.e. by Bazel, or Cargo, or |
| GN/ninja, rather than manually) and that the integration between the FFI |
| tools and the build system doesn't have any bugs (e.g. that it |
| faithfully replicates all relevant compilation flags). |
| |
| **Evaluation** |
| |
| cbindgen could be used in a project that can create a narrow `extern "C"` / |
| `#[repr(C)]` API and that is ready to manage the risk of incorrect name/module |
| resolution. Wrapping additional Rust APIs would require extra effort. |
| |
| **Take-aways for Crubit design** |
| |
| Notes and observations about `cbindgen` can guide some design aspects of |
| Crubit's [`cc_bindings_from_rs`](../cc_bindings_from_rs/README.md) tool |
| (which, like `cbindgen`, generates C++ bindings for Rust crates). |
| Using internal compiler knowledge (e.g. memory layout of structs, name and type |
| resolution) requires that `cc_bindings_from_rs` depends on |
| `rustc_driver` and other internal crates of `rustc`. The API of these crates is |
| unstable, which might increase the risk and maintenance cost of Crubit.
| Nevertheless, our experience with maintaining tools based on (also unstable) |
| Clang APIs suggests that this extra risk and cost is likely going to be |
| acceptable. |
| |
| Build determinism requires that the Rust compiler produces the same output for |
| the same set of inputs (the same compiler version, the same command-line flags, |
| the same sources, etc.). This means that (despite |
| [conservative reservations about layout determinism](https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#default-layout-repr-rust:~:text=A%20note%20on%20determinism)) |
| it should be okay to assume that `cc_bindings_from_rs` and `rustc` invocations |
| will observe the same memory layout of structs, but this requires that |
| `cc_bindings_from_rs` is built against exactly the same version of |
| `rustc_driver` libraries as `rustc`. (This should also be reinforced by |
| compile-time assertions in the generated FFI layer.) |
| |
| ### cxx |
| |
| [cxx](https://cxx.rs/) generates **Rust bindings for C++ APIs and vice versa** |
| from an **interface definition language (IDL) included inline in Rust source |
| code.** cxx generates Rust and C++ source code from IDL definitions. To check |
| that the IDL definitions match the actual C++ API, cxx inserts static |
| assertions[^2] into the generated C++ code; it does not, however, read the C++ |
| headers itself. cxx contains built-in bindings for various Rust and C++ standard |
| library types that are not customizable. |
| |
| As far as we understand, cxx has the following design constraints and goals: |
| |
| * **Ship a stable product for its intended audience.** |
| * As a consequence, improvements such as integrating move semantics are |
| not going to be accepted soon. We understand that cxx is not a vehicle |
| for experimentation. cxx maintainers would prefer us to first show that |
| our ideas work in a fork of cxx or in a different system, such as |
| autocxx, and that our improvements pull their weight given the added |
| complexity. |
| * **Remain simple and transparent.** There is a limit on the amount of |
| complexity that will be tolerated. |
| * There is a chance that improvements such as modeling C++ move semantics |
| or various attempts at eliminating thunks will never be accepted in
| upstream cxx. |
| * **Non-goal: Automatically provide high fidelity interop.** |
| * cxx is designed for the use case of an executable where C++ and Rust |
| parts communicate through a narrow interface. |
| * **Non-goal: Automatically provide the most performant interop in as many |
| cases as possible.** For example: |
| * cxx does not attempt to eliminate C++-side thunks. Instead, using LTO is |
| recommended. |
| * cxx considers it acceptable to allocate all objects of "opaque" types on |
| the heap. Users who find these heap allocations unacceptable for |
| performance reasons are expected to implement a different C++ entry |
| point that does not hit this limitation and bind it to Rust instead of |
| the original C++ API. Heap allocation is acceptable for many C++ classes |
| in most environments, but the exceptions are important enough for us |
| that this is a major restriction. |
| |
| **Pros** |
| |
| * **Mature and ergonomic enough today for mixing C++ and Rust in existing |
| codebases with limited C++ interop needs.** |
| * We avoid being on a tech island. |
| |
| **Cons** |
| |
| * cxx’s stability goal makes it **hard to experiment with how the Rust API |
| looks.** |
| * **Our goals are unlikely to align well with the goals of the intended user |
| audience of cxx.** We would be pulling cxx in directions that make it a |
| worse product for its current users. |
| * **Almost no customizability**. Users who are not satisfied with what cxx |
| does are expected to wrap the target C++ API in a different C++ API that is |
| more friendly to cxx. |
| * cxx tries to be compatible with most standard C++ implementations found in |
| the real world, so it **cannot take advantage of unique guarantees provided |
| by the target execution environment.** |
| |
| **Evaluation** |
| |
| cxx could be used in projects with limited C++/Rust interop requirements. |
| However, we would not be able to implement many interop features that we |
| consider essential (for example, move semantics, templates). |
| |
| ### autocxx |
| |
| [autocxx](https://github.com/google/autocxx) **automatically generates Rust |
| bindings from C++ headers**. As the name implies, it automatically generates IDL |
| definitions for cxx, which then produces the actual bindings. In addition, |
| autocxx generates its own Rust and C++ code to extend the Rust API beyond what |
| cxx itself would provide, for example to support passing POD types by value. |
| autocxx consumes C++ headers indirectly by first running bindgen on them and |
| then parsing the Rust code output by bindgen. |
| |
| autocxx’s |
| [design goals](https://www.chromium.org/Home/chromium-security/memory-safety/rust-and-c-interoperability) |
| are similar to our own in this document. |
| |
| We did a case study on calling an existing project's C++ API from Rust with
| autocxx.
| |
| **Pros** |
| |
| * **Low barrier to entry**: Bindings are generated from C++ headers, no need |
| to write duplicate API definitions. |
| * **Ergonomic mappings** for many C++ constructs. |
| * **Open to contributions that change the generated Rust APIs** or make |
| architectural changes. |
| |
| **Cons** |
| |
| * **Relatively new and immature.** |
| * **Cannot (yet) consume complex headers without errors.** We’ve managed to |
| import some actual Spanner headers, but there are still enough outstanding |
| issues that we can’t yet do anything useful with Spanner. |
| * **Architecture can make modifications difficult.** autocxx is built on top |
| of two other tools, bindgen and cxx, and the interfaces between these |
| components can make it harder to make a modification than it would be in a |
| monolithic tool. Specifically: |
| * autocxx uses bindgen to generate a description of the C++ API that it |
| can parse easily (as opposed to trying to parse C++ headers either |
| directly or using Clang APIs). Since bindgen was not intended for this |
| purpose, its output lacks some information that autocxx needs, so |
| autocxx [has forked](https://crates.io/crates/autocxx-bindgen) bindgen |
| to adapt it to its needs. The forked version emits additional |
| information about the C++ API in the form of attributes attached to |
| various API elements. |
| * bindgen in turn is built on the libclang API, which doesn’t surface all |
| of the functionality available through Clang’s C++ API. Adding features |
| to libclang requires additional effort and has a six-month lead time to
| appear in a stable release (to become eligible to be used from bindgen). |
| * When errors occur, it can be hard to figure out which of the components |
| is responsible. |
| * Adding features can require touching multiple components, which requires |
| commits to multiple repositories. |
| |
| **Evaluation** |
| |
| We initially intended to use autocxx to prototype various interop ideas and |
| potentially as a basis for a field trial. We still believe this would be |
| feasible, but after trying to modify autocxx and its bindgen fork during an |
| internal C++/Rust interop study, we feel that autocxx’s complex architecture is |
| enough of an impediment that we could achieve our goals with less total effort |
| by creating an interop tool from scratch that consists of a single codebase and |
| uses the Clang C++ API to directly interface with Clang. |
| |
| [^1]: Doing so would require either generating C++ source code or interfacing |
| deeply enough with Clang to generate object code for inline functions and |
| template instantiation. |
| [^2]: And tricks such as suitable type conversions that force the C++ compiler |
| to perform appropriate checks at compile time. |