# High-level design of C++/Rust interop

## Introduction

This document describes the high-level design choices of Crubit, a C++/Rust
Bidirectional Interop Tool.

## C++/Rust interop goal

**The primary goal of C++/Rust interop tooling is to enable Rust to be used
side-by-side with C++ in large existing codebases.**

In the short term we would like to focus on codebases that roughly follow the
Google C++ style guide to improve the interop fidelity. Other, more diverse
codebases are possible prospective users in the long term, and their needs will
be addressed by customization and extension points.

## C++/Rust interop requirements

In support of the interop goal, we identify the following requirements:

1. **Enable using existing C++ libraries from Rust with high fidelity**
    * **High fidelity means that interop will make C++ APIs available in Rust,
      even when those API projections would not be idiomatic, ergonomic, or
      safe** in Rust, to facilitate a cheap, small-step, incremental migration
      workflow. Based on the experience of other cross-language
      interoperability systems and language migrations (for example,
      Objective-C/Swift, Java/Kotlin, JavaScript/TypeScript), we believe that
      working in a mixed C++/Rust codebase would be significantly harder if
      some C++ APIs were not available in Rust.
    * **Interop will bridge C++ constructs to Rust constructs only when the
      semantics match closely**. Bridging large semantic gaps creates a risk
      of making C++ APIs unusable in Rust, as well as a risk of creating
      performance problems. For example, interop will not bridge destructive
      Rust moves and non-destructive C++ moves; instead it will make C++ move
      constructors and move assignment operators available to use in Rust
      code. As another example, interop will not bridge C++ templates and Rust
      generics by default.
    * Interop should be **performant**, as close to having no runtime cost as
      possible. The performance costs of the interop should be documented, and
      where possible, intuitive to the user.
    * Interop should be **ergonomic and safe**, as long as ergonomics and
      safety accommodations do not hurt performance or fidelity. Where a
      tradeoff is possible, the interop will choose performance and fidelity
      over ergonomics; the user will be allowed to override this choice.
    * **Enable owners of the C++ API to control their Rust API projection**,
      for example, with attributes in C++ headers and by extending generated
      bindings with a manually implemented overlay. Such an overlay will wrap
      or extend generated bindings to improve ergonomics and safety.
2. **Enable using Rust libraries from C++**
    * However, using C++ libraries from Rust has a higher priority than using
      Rust libraries from C++.
3. **Put little to no barriers to entry**
    * **Ideally, no boilerplate code** needs to be written in order to start
      using a C++ library from Rust. Adding some extra information can make
      the generated bindings more ergonomic to use.
    * The amount of **duplicated API information is minimized**.
    * **Future evolution of C++ APIs should be minimally hindered by the
      presence of Rust users**.
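
As a rough illustration of the move-semantics example in requirement 1, the
sketch below contrasts a C++-style non-destructive move with a Rust move.
`CppString`, `move_construct_from`, and `rust_move` are hypothetical names
invented for this sketch; they are not Crubit output.

```rust
/// Hypothetical stand-in for a C++ class that has a move constructor.
#[derive(Debug, Default, PartialEq)]
pub struct CppString {
    pub data: Vec<u8>,
}

impl CppString {
    /// Models a C++ move constructor: the source is only borrowed mutably,
    /// not consumed, and is left in a valid (here: empty) state.
    pub fn move_construct_from(source: &mut CppString) -> CppString {
        CppString {
            data: std::mem::take(&mut source.data),
        }
    }
}

/// A Rust move, for contrast: `source` is consumed, and the caller can no
/// longer name it after the call.
pub fn rust_move(source: CppString) -> CppString {
    source
}
```

After `move_construct_from`, the source object still exists and will still be
destroyed later, which is why the two kinds of move are exposed separately
rather than bridged.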

## Proposal and high-level design

**We propose to develop our own C++/Rust interop tooling.** There are no
existing tools that satisfy all of our requirements. Modifying an existing tool
to fulfill these requirements would take more effort than building a new tool
from scratch or might require forking its codebase given that some existing
tools have goals that conflict with our goals.

See the "alternatives considered" section for a discussion of existing tools.

### Source of information about C++ API

**Interop tooling will read C++ headers**, as they contain the information
needed to generate Rust API projections and the necessary glue code. Interop
tooling that is used during builds will not read C++ source files, to maintain
the principle that C++ API information is only located in headers, and that a
C++ library can't break the build of its dependents by changing source files.

Some interop-adjacent tools (e.g., large-scale refactoring tools that seed the
initial set of lifetime annotations) will also read C++ sources. These tools
will not be used during builds.

**Pros**

* **Minimal barrier to entry**: a minimal amount of manual work is required to
  start using a C++ library from Rust.
    * Encourages leaf projects to start incrementally adopting Rust in new
      code, or incrementally rewriting C++ targets in Rust.
* **C++ API information is located only in headers**, regardless of the
  language that the API consumer is written in (C++ or Rust).
* **Interop tooling that generates Rust API projections from a C++ header can
  get exactly the same information that the C++ compiler has** when processing
  a translation unit that uses one of the APIs declared within that header.
    * Interop tooling can generate the most performant calls to C++ APIs,
      without C++-side thunks that translate the C++ ABI into a C ABI.
    * Interop tooling can autodetect implementation details that are critical
      for interop but are not a part of the API surface (for example, the size
      and alignment of C++ classes that have private data members).
    * In alternative solutions, users need to repeat these implementation
      details in sidecar files. Interop can verify that the specified
      information is correct through static assertions in generated C++ code,
      but the overall user experience is inferior.
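
The size-and-alignment point above can be illustrated with a sketch of an
opaque Rust projection. The name `OpaqueWidget` and the numbers (24 bytes,
8-byte alignment) are assumptions made up for this example; real tooling would
take them from the C++ compiler's record layout.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// Hypothetical Rust projection of a C++ class whose data members are all
/// private. Rust code never interprets the bytes; it only needs storage with
/// the same size and alignment as the C++ object.
#[repr(C, align(8))]
pub struct OpaqueWidget {
    _storage: [MaybeUninit<u8>; 24],
}

// Compile-time checks, analogous to the static assertions that generated C++
// code can carry to confirm that both sides agree on the layout.
const _: () = assert!(size_of::<OpaqueWidget>() == 24);
const _: () = assert!(align_of::<OpaqueWidget>() == 8);
```

If the C++ class later grows a private member, regenerating the bindings
updates both numbers without any manual edit.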

**Cons**

* **Having to read C++ headers makes interop tooling more complex.**
* **The Rust projection of the C++ API is only visible in machine-generated
  files.**
    * These are not trivially accessible.
    * There is a limit on how readable these files can be made.
    * We can mitigate these issues by building tooling that shows the Rust
      view of a C++ header (for example in Code Search, or in editors as an
      alternative go-to-definition target).

### Customizability

Interop tooling will be customizable enough to accommodate existing codebases
and the unique needs of the different C++ libraries within them. C++ API owners
can:

* **Guide how interop tooling generates Rust API projections from C++
  headers**. For example, headers can provide:
    * Custom Rust names for C++ function overloads (instead of applying the
      general interop strategy for function overloads),
    * Custom Rust names for overloaded C++ operators,
    * Custom Rust lifetimes for pointers and references mentioned in the C++
      API,
    * Nullability information for pointers in the C++ API,
    * Assertions (verified at compile time) and promises (not verified by
      tooling) that certain C++ types are trivially relocatable.
* **Provide custom logic to bridge types**, for example, mapping C++
  `absl::StatusOr` to Rust `Result`.
* **Provide API overlays** that improve the automatically generated Rust API.
    * For example, the overlays could inject additional methods into
      automatically generated Rust types or hide some of the generated
      methods.
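
The type-bridging idea in the list above can be sketched roughly as follows.
`Status`, `StatusOr`, `parse_port_raw`, and `parse_port` are hypothetical Rust
stand-ins invented for this sketch; they are neither the real `absl` types nor
actual generated bindings.

```rust
/// Hypothetical stand-in for a projected `absl::Status`.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    pub code: i32,
    pub message: String,
}

/// Hypothetical stand-in for a projected `absl::StatusOr<T>`: it holds either
/// a value or a non-OK status.
pub struct StatusOr<T> {
    value: Option<T>,
    status: Status,
}

impl<T> StatusOr<T> {
    pub fn from_value(value: T) -> Self {
        let ok = Status { code: 0, message: String::new() };
        StatusOr { value: Some(value), status: ok }
    }

    pub fn from_status(status: Status) -> Self {
        StatusOr { value: None, status }
    }

    /// The bridging rule: turn the C++-shaped type into a Rust `Result`.
    pub fn into_result(self) -> Result<T, Status> {
        match self.value {
            Some(value) => Ok(value),
            None => Err(self.status),
        }
    }
}

/// Stand-in for a raw binding that returns the C++-shaped type.
pub fn parse_port_raw(text: &str) -> StatusOr<u16> {
    match text.parse::<u16>() {
        Ok(port) => StatusOr::from_value(port),
        // Code 3 is chosen for illustration (it mirrors "invalid argument").
        Err(e) => StatusOr::from_status(Status { code: 3, message: e.to_string() }),
    }
}

/// Wrapper that an overlay or generated glue could expose, so that callers
/// can use `?` and other `Result` idioms.
pub fn parse_port(text: &str) -> Result<u16, Status> {
    parse_port_raw(text).into_result()
}
```

Callers of `parse_port` never see `StatusOr`, which is the kind of ergonomic
improvement a bridging rule or overlay is meant to provide.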

More intrusive customization techniques will be useful for template- and
macro-heavy libraries where the baseline import rules just won't work. We
believe customizability will be an essential enabler for providing high-fidelity
interop.

### Source of additional information that customizes C++ API projection into Rust

Where C++ headers don't already provide all information necessary for interop
tooling to generate a Rust API projection, we will add such information to C++
headers whenever possible. If it is not desirable to edit a certain C++ header,
extra information can be stored in a sidecar file.

Examples of additional information that interop tooling will need:

* **Nullability annotations.** C++ APIs often expose pointers that are
  documented or assumed by convention to be never null, but can't be
  refactored to references due to language limitations (for example,
  `std::vector<MyProtobuf *>`). If C++ headers don't provide nullability
  information for pointers in a machine-readable form, interop tooling has to
  conservatively mark all C++ pointers as nullable in the Rust API projection.
  The Rust compiler will then force users to write unnecessary (and
  untestable) null checks.
* **Lifetimes of references and pointers** in C++ headers are not described in
  a machine-readable way (and sometimes are not even documented in prose).
  Lifetime information is essential to generate safe and idiomatic Rust APIs
  from C++ headers.
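
To make the effect of these two kinds of annotations concrete, here is a
minimal sketch of how one C++ accessor might be projected with and without
them. `Config`, `Registry`, and all function names are hypothetical and exist
only for this illustration.

```rust
/// Hypothetical C++-side type, projected into Rust.
#[derive(Debug, PartialEq)]
pub struct Config {
    pub name: String,
}

/// Hypothetical owner of a `Config`. It stands in for a C++ class whose
/// accessor returns a `const Config*` that is documented as never null.
pub struct Registry {
    pub default_config: Config,
}

impl Registry {
    pub fn new(name: &str) -> Registry {
        Registry { default_config: Config { name: name.to_string() } }
    }

    /// Projection without annotations: the pointer must be treated as
    /// nullable, so callers have to handle a `None` that never occurs.
    pub fn default_config_unannotated(&self) -> Option<&Config> {
        Some(&self.default_config)
    }

    /// Projection with non-null and lifetime annotations: a plain reference
    /// whose lifetime is tied to `self`.
    pub fn default_config_annotated<'a>(&'a self) -> &'a Config {
        &self.default_config
    }
}

/// Caller code against the unannotated projection needs a fallback branch
/// that no test can exercise.
pub fn name_via_unannotated(registry: &Registry) -> &str {
    match registry.default_config_unannotated() {
        Some(config) => config.name.as_str(),
        None => "<never reached in practice>",
    }
}
```

Expressing nullability does not cost anything at run time: Rust guarantees
that `Option<&T>` has the same size as a raw pointer.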

#### Additional information is stored in C++ headers

**Pros**

* **Additional information needed for C++/Rust interop will be expressed as
  annotations on existing syntactic elements in C++.**
    * The annotations are located in the most logical place.
    * The annotations are more likely to be noticed and updated by C++ API
      owners.
    * API owners retain full control over how the API looks in Rust.
* **C++ users may find lifetime and nullability annotations useful.** For
  example, information about lifetimes is highly important to C++ and Rust
  users alike.
* **C++ API definitions are only written once,** minimizing duplication and
  maintenance burden.

**Cons**

* **Annotations that benefit Rust users can bother C++ API owners** who don't
  care about Rust. Especially at the beginning of integrating Rust into an
  existing codebase, C++ API owners can push back on adding annotations.
    * To encourage adoption of annotations, we can develop tooling for C++
      that uses lifetime and nullability annotations to find bugs in C++ code.
    * The pushback is likely to be short-term: if Rust takes off in a C++
      codebase, C++ library owners in that codebase will need to care about
      Rust users and how their API looks in Rust.
* **There may be headers that we cannot (or would not want to) change**, for
  example, headers in third-party code, headers that are open-sourced, or
  headers whose first-party owners are not cooperating.
    * We can apply the
      [sidecar strategy](#additional-information-is-stored-in-sidecar-files)
      to these headers.

#### Additional information is stored in sidecar files

Additional information needed for C++/Rust interop can be stored in sidecar
files, similarly to Swift APINotes, CLIF, etc. If sidecar files get sufficiently
broad adoption (for example, if annotating third-party code turns out to be
sufficiently important that optimizing C++/Rust interop ergonomics there would
be worth it), it would make sense to write sidecar files in a Rust-like
language, as that provides the most natural way to define Rust APIs.

**Pros**

* **Sidecar files enable broader adoption of annotations** by providing
  additional interop information without modifying C++ headers. Sidecar files
  will allow us to annotate headers in third-party code, headers that can't
  adopt annotations for technical reasons, or headers owned by first-party
  owners who are not cooperating.

**Cons**

* Like in the
  [Use Rust code to customize API projection into Rust](#use-rust-code-to-customize-api-projection-into-rust)
  alternative, **some part of C++ API information is duplicated**, which is a
  burden for the C++ API owners.
* The projection of C++ APIs to Rust is defined in a new language.
    * C++ API owners and Rust users will have to learn this language.
    * If we expect wide adoption of sidecar files, we will need to create
      tooling to parse, edit, and run large-scale changes (LSCs) against this
      language.
* **Annotations in sidecar files are more prone to becoming out of sync with
  the C++ code.** When making changes to C++ code, engineers are less likely to
  notice and update the annotations in sidecar files.
    * Presubmits can catch some cases of desynchronization between C++ headers
      and sidecar files. However, presubmit errors that remind engineers to
      edit more files create an inferior user experience.
* **Sidecar files create extra friction to modify the code.** Where previously
  one had to edit only a C++ header and a C++ source file, now one also likely
  needs to update a sidecar file.
    * When engineers realize that they need to update a sidecar file, opening
      another file and finding the right place to update creates extra
      friction to modify code.
    * Once engineers understand the extra maintenance burden associated with
      sidecar files that tend to go out of sync with headers, they will be
      less likely to adopt annotations in the first place.

### Glue code generation

C++/Rust interop tooling will generate executable glue code and type definitions
in Rust and in C++ (not merely `extern "C"` function declarations) in order
to achieve the following goals:

* **Enable instantiating C++ templates from Rust, and monomorphizing Rust
  generics from C++. Enable Rust types to participate in C++ inheritance
  hierarchies.**
    * For example, imagine Rust code using an object of type
      `std::vector<MyProtobuf>`, while C++ code in the same program never
      instantiates this type. The Bazel `rust_library` target that mentions
      this type must therefore be responsible for instantiating this template
      and linking the resulting executable code into the final program. We
      propose that this instantiation happens in an automatically generated
      "glue" C++ translation unit that is a part of that `rust_library`.
* **Enable automatically wrapping C++ code to be more ergonomic in Rust.** For
  example:
    * `extern "C"` functions in Rust are necessarily unsafe (it is a language
      rule). We would like the vast majority of C++ API projections into Rust
      to be safe. In the current Rust language, we can achieve that only by
      wrapping the unsafe `extern "C"` function in a safe function marked with
      `#[inline(always)]`.
    * C++ API owners can provide rules for automatic type bridging, for
      example, mapping C++ `absl::StatusOr` to Rust `Result`. This conversion
      necessitates generation of a Rust wrapper function around a C++ entry
      point that takes advantage of such type bridging.
* **Provide stable locations (C++ modules, Rust crates) that "own" the types
  from the language point of view.**
    * For example, when we project a C++ type into Rust, its Rust definition
      must be located in a Rust crate. Furthermore, all Rust users of this
      type must observe it as being defined in the same crate in order for
      every user to consider that they use the same type. Indeed, it is a
      rule in Rust that types defined in different crates are unrelated
      types.
    * When we project a Rust type into C++ we could repeat its C++ definition
      in C++ code any number of times (for example, in every C++ user of a
      Rust type). This is technically fine because C++ allows the same type to
      be defined multiple times within a program. Nevertheless, such
      duplication is error-prone.
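
The safe-wrapper pattern mentioned in the list above can be sketched as
follows. This is only an illustration under stated assumptions:
`widget_checksum_thunk` and `widget_checksum` are hypothetical names, and the
"C++ entry point" is defined in Rust here purely so that the sketch is
self-contained. In real glue code it would be a declaration of a symbol
provided by C++.

```rust
/// Stand-in for a C++ entry point (or its thunk). Callers must guarantee
/// that `data` points to `len` readable bytes, hence `unsafe`.
unsafe extern "C" fn widget_checksum_thunk(data: *const u8, len: usize) -> u32 {
    // SAFETY: guaranteed by the caller, see the function documentation.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    bytes
        .iter()
        .fold(0u32, |acc, byte| acc.wrapping_add(u32::from(*byte)))
}

/// Safe wrapper of the kind glue code could generate. The slice type already
/// guarantees a valid pointer and length, so callers need no `unsafe`.
#[inline(always)]
pub fn widget_checksum(data: &[u8]) -> u32 {
    // SAFETY: `data` is a valid slice, so its pointer is readable for
    // `data.len()` bytes.
    unsafe { widget_checksum_thunk(data.as_ptr(), data.len()) }
}
```

Because the wrapper is marked `#[inline(always)]`, it is expected to add no
call overhead on top of the underlying entry point.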

### Glue code is generated as C++ and Rust source code

Interop tooling will generate glue code as C++ and Rust source files, which are
then compiled with an unmodified compiler for that language. The alternative is
to generate LLVM IR or object files with machine code directly from interop
tooling.

**Pros**

* **It is easy to inject customizations provided by API owners into generated
  source code.**
    * The customizations will be written in the target language, making it
      (hopefully) intuitive to write them.
* **Generated source code can be easily inspected by compiler engineers**
  while debugging interop problems and compiler bugs.
* **Generated source code can be inspected and understood by interop users,**
  who are not compiler experts.
    * LLVM IR wouldn't be meaningful to them.
* **Generated source code is processed by the regular toolchain like any other
  code in the project.**
    * It automatically benefits from all performance optimizations and
      sanitizers that are newly implemented in Clang and Rust compilers.
* **We avoid adding a new tool that generates unique LLVM IR patterns.**
    * We avoid making the job of the C++ toolchain maintainers harder.

**Cons**

* **Interop tooling will be limited to generating LLVM IR and machine code
  that Clang and Rust compilers can generate.**

### Glue code and API projections will assume implementation details of the target execution environment

To provide the most ergonomic and performant interop, C++/Rust interop tooling
will allow the target codebase to opt into assuming various implementation
details of the target execution environment. For example:

* When calling C++ from Rust, interop tooling can either wrap C++ functions in
  thunks with a C calling convention, or call C++ entry points directly.
  Thunks cause code bloat and can collectively add up to become a performance
  problem, so it is desirable to call C++ entry points from Rust directly.
  Interop tooling can do that only if it may assume a specific target platform
  and C++ ABI.

Implementation details of the target execution environment that are considered
stable enough will be reflected in API projections, for example:

* The C++ standard does not specify exact sizes of integer types (`short`,
  `int`, `long`, etc.). To map them to Rust, interop tooling will need to
  assume the size that they have in practice on the targeted platform. The
  alternative would be to create target-agnostic integer types (for example,
  `Int` in Swift is a strong typedef for `Int32` on 32-bit targets, and `Int64`
  on 64-bit targets), but this makes it harder to provide idiomatic,
  transparent, high-performance interop.
* The C++ standard does not specify whether standard library types like
  `std::vector` are trivially relocatable; it is an implementation detail.
  Universal interop tooling would have to conservatively assume
  non-trivially-relocatable types. Interop tooling specific to certain
  environments can rely on libc++ providing a trivially-relocatable
  `std::vector` and project it into Rust in a much more ergonomic way.
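
The integer-size assumption can be sketched in Rust as follows. The function
`scale` and the constant `LONG_BITS` are hypothetical and exist only for this
illustration; the `c_int` and `c_long` aliases come from the Rust standard
library.

```rust
use std::ffi::{c_int, c_long};
use std::mem::size_of;

/// Width of C++ `long` in bits on the platform this code is compiled for.
/// The value differs between platforms, which is why a projection has to
/// pick a concrete Rust type per target.
pub const LONG_BITS: usize = size_of::<c_long>() * 8;

// A codebase that opts into assuming a 32-bit `int` can have that assumption
// checked at compile time instead of discovering a mismatch at run time.
const _: () = assert!(size_of::<c_int>() == size_of::<i32>());

/// Hypothetical projection of a C++ function `int Scale(int)`. Under the
/// assumption checked above, the signature can use plain `i32` instead of a
/// target-agnostic wrapper type.
pub fn scale(x: i32) -> i32 {
    x.wrapping_mul(2)
}
```

Using `i32` directly keeps the projected API idiomatic, at the cost of tying
the generated bindings to targets where the assumption holds.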

**Pros**

* **Interop tooling will generate the most performant code sequences** to call
  foreign language functions.
    * If interop tooling generated portable code, that code would have some
      overhead. The overhead can be eliminated by C++ and Rust optimizers at
      least in some cases, but at the cost of increased build times. For
      example, eliminating thunks would require turning on LTO, which is not
      fast and is usually only used for release builds. It is much preferable
      to not generate thunks in the first place, if the target platform does
      not need them.
* **Ergonomics of API projections will be improved.**
    * For example, whether a C++ type is trivially relocatable or not is an
      implementation detail in C++, transparent to C++ users of that type, but
      it makes a huge ergonomic difference in the Rust API projection.

**Cons**

* **C++ code will have additional evolution constraints.**
    * For example, changing a type from trivially relocatable to non-trivially
      relocatable is a non-API-breaking change for C++ users, but it would
      break Rust users.
* **It would be more difficult to switch internal environments to a different
  C++ standard library.**
* **Code that is deployed in environments that have incompatible
  implementation details won't be able to use this C++/Rust interop system.**
    * Alternatively, these executables would have to bring a suitable
      execution environment with them (e.g., a copy of libc++).

### Interop tooling should be maintainable and evolvable for a long time

We should design and implement C++/Rust interop tooling in such a way that we
can maintain and evolve it for more than a decade. If Rust becomes tightly
integrated into an existing C++ project, specific requirements for interop and
API projection rules will keep changing. The more Rust adoption we have,
the more library- and team-specific interop customizations we will have to
support, and the more it will make sense for the performance team to tweak
generated code to implement sweeping optimizations. These kinds of changes
should be readily possible, and they should not create conflicts of interest
between different users of the interop tooling.

### Interop tooling should facilitate C++ to Rust migration

C++/Rust interop tooling should try to create a favorable environment for
migrating C++ code to Rust. Specifically, projections of C++ APIs into Rust
should be implementable in Rust. This way, a C++ library can be converted from
C++ into Rust transparently for its users, as its public API won't change.

## Alternatives Considered: Design decisions

### Repeat C++ API completely in a separate IDL

Instead of reading C++ headers in the interop tooling, we would require the user
to repeat the C++ API in some other form, for example, in a Rust-based IDL like
in the cxx crate, or in sidecar files in a completely new format.

**Pros**

* **Interop tooling can be simpler if it does not have to read C++ headers**.
  But even under this alternative approach, tooling might want to read C++
  headers, nullifying this advantage. For example, tooling might want to
  automatically generate an initial Rust snippet or to suggest, in presubmits,
  adjustments to the Rust code that mirrors a C++ API when that C++ API
  changes.
* The **most natural way to define Rust APIs** is by using Rust code or
  Rust-like syntax in sidecar files.
* **Available Rust APIs are defined in easily accessible checked-in files.**
* **API definitions written by a human might have higher quality, on
  average.**

**Cons**

* **A big part of the C++ API needs to be duplicated** to reliably match the
  Rust code with the C++ declarations. The initial code can be generated by
  tooling, but it has to be kept in sync. This is a burden for the C++ API
  owners, potentially a bigger one than allowing annotations in C++ headers.
    * There is a risk that C++ API owners might refuse to own IDL files.
* The need to create a sidecar file creates a **barrier to start using C++
  libraries from Rust.**
    * While the duplication overhead is justifiable for widely-used libraries,
      it is relatively high for libraries with few users and binaries, making
      it less likely that leaf teams will start adopting Rust.
* **When the C++ API is changed, the Rust definitions become out-of-sync with
  it.** Tooling needs to detect this, and the Rust definitions need to be
  changed (either manually or with tool assistance).
* There is no effective way to verify Rust binding code at the presubmit time
  of a C++ library other than building downstream projects.
* **Mapping Rust API definitions to the original C++ API definitions is more
  complicated and error-prone**. For example, how would we target a specific
  overload of a function or constructor?
* There is a **risk that individual teams will build team-specific tooling
  that generates IDL files** from C++ headers or generates both IDL files and
  C++ headers from a single source. These solutions are unlikely to scale to
  existing large codebases and will likely only work for that specific team.

### Use Rust code to customize API projection into Rust

An alternative to storing additional information in C++ headers is to put it
into Rust code. For example, the cxx crate requires users to re-state the C++
API in Rust syntax, adding information about lifetimes and nullability. The pros
and cons of this choice are the same as when defining a special IDL that repeats
the C++ API completely (see above).

### Generate glue code in binary formats

Instead of generating glue code as textual sources, interop tooling could use
Clang and LLVM APIs to emit object files with C++ glue code and use Rust
compiler APIs to generate rmeta and rlib files with Rust glue code.

**Pros**

* **More flexibility in the code that can be generated.** Controlling LLVM IR
  generation allows interop tooling to generate code that an unmodified
  compiler can't generate from textual source code. For example, the Rust
  language does not have any constructs that map to `linkonce_odr` functions
  in LLVM IR; if the interop tooling embedded the Rust compiler as a library
  and had more control over how it generates the IR, we could make that
  happen.

**Cons**

* Injecting customizations provided by API owners is harder.
* LLVM, Clang, and Rust compiler APIs are not stable. The format of Rust
  metadata files is not stable either. The larger the API subset we consume
  from Clang and Rust, the more difficult it becomes to maintain the tooling.
* To generate object files, the interop tooling has to ensure that its
  Clang/LLVM version and configuration are identical to those of the Clang
  compiler used to build other C++ code.
    * We can solve this problem, but it makes the system more fragile,
      compared to using existing C++ and Rust compilers to compile generated
      sources.
* From time to time LLVM introduces bugs that cause miscompilations. If
  interop tooling embeds LLVM, we would be adding another tool that toolchain
  engineers will need to look into when debugging a miscompilation. We would
  be making the job of C++ toolchain maintainers harder.
Googler85ea7772022-05-14 01:29:31 -0700476
477## Alternatives Considered: Existing tools
478
479### bindgen
480
481[bindgen](https://rust-lang.github.io/rust-bindgen/) **automatically generates
482Rust bindings from C and C++ headers**, which it consumes using libclang. The
483generated **bindings are pure Rust code** that interfaces with C and C++ using
484Rust’s [built-in FFI for C](https://doc.rust-lang.org/nomicon/ffi.html)
Dmitri Gribenko380ddfb2022-05-16 01:00:59 -0700485(`#[repr(C)]` to indicate that a struct should use C memory layout and `extern
486"C"` to indicate that a function should use a C calling convention). C++
Googler85ea7772022-05-14 01:29:31 -0700487functions are handled by generating a Rust `extern "C"` function that has the
Dmitri Gribenko380ddfb2022-05-16 01:00:59 -0700488same ABI as the C++ function and attaching a `link_name` attribute with the
489mangled name.

See
[this blog post](https://manishearth.github.io/blog/2021/02/22/integrating-rust-and-c-plus-plus-in-firefox/)
for an in-depth description of the use of bindgen in Stylo, a Rust component in
Firefox.

**Pros**

* **The oldest and the most mature** of the existing C++ interop tools.

**Cons**

* **Deficiencies in safety and ergonomics**, for example:
    * References are imported as pointers. No lifetimes, no null-safety.
    * Constructors and destructors are not called automatically.
    * Overloads are distinguished by a numbered suffix in Rust. These numbers
      clutter the source code and are hard to remember, as they have no
      meaning. Adding overloads can change the numbering and hence break Rust
      callers.
* It is **impossible to use C++ inline functions and templates** from Rust
  because of bindgen’s architecture[^1]. The architecture is unlikely to
  change, and therefore, this is a dealbreaker.

**Evaluation**

bindgen could be used in a project that has very limited C++ interop needs.
However, creating safe and ergonomic wrappers for the generated bindings would
require additional effort. Our vision and goals for C++ interop are very
different from what bindgen provides.
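
To give a sense of the additional effort mentioned above, here is a minimal
sketch of a hand-written safe wrapper. Everything in it is hypothetical: the
`raw` module imitates the shape of raw bindings (pointer parameters, explicit
constructor and destructor functions, a numbered overload) using plain Rust
functions so that the example runs on its own, and the `Counter` type, `LIVE`
counter, and `demo` function exist only for illustration.

```rust
use std::mem::MaybeUninit;
use std::sync::atomic::Ordering;

/// Hypothetical stand-ins for raw bindings. Real bindings would be
/// `extern "C"` declarations of C++ symbols rather than Rust functions.
#[allow(non_snake_case)]
mod raw {
    use std::sync::atomic::{AtomicI32, Ordering};

    /// Counters constructed but not yet destroyed (demonstration only).
    pub static LIVE: AtomicI32 = AtomicI32::new(0);

    #[repr(C)]
    pub struct Counter {
        pub value: i32,
    }

    /// Stands in for the constructor `Counter::Counter()`.
    pub unsafe fn Counter_Counter(this: *mut Counter) {
        unsafe { this.write(Counter { value: 0 }) };
        LIVE.fetch_add(1, Ordering::SeqCst);
    }

    /// Stands in for the destructor `Counter::~Counter()`.
    pub unsafe fn Counter_Counter_destructor(_this: *mut Counter) {
        LIVE.fetch_sub(1, Ordering::SeqCst);
    }

    /// Stands in for the overload `void Counter::add(int)`.
    pub unsafe fn Counter_add(this: *mut Counter, n: i32) {
        unsafe { (*this).value += n };
    }

    /// Stands in for the overload `void Counter::add(int, int)`; the numeric
    /// suffix carries no meaning.
    pub unsafe fn Counter_add1(this: *mut Counter, a: i32, b: i32) {
        unsafe { (*this).value += a + b };
    }
}

/// Safe wrapper: construction and destruction are tied to Rust ownership.
pub struct Counter {
    inner: raw::Counter,
}

impl Counter {
    pub fn new() -> Self {
        let mut slot = MaybeUninit::<raw::Counter>::uninit();
        // SAFETY: `slot` is valid for writes and the constructor initializes
        // it. Moving the value afterwards is only sound because this
        // stand-in type is trivially relocatable.
        unsafe {
            raw::Counter_Counter(slot.as_mut_ptr());
            Counter { inner: slot.assume_init() }
        }
    }

    pub fn add(&mut self, n: i32) {
        // SAFETY: `&mut self` yields a valid, non-null, unaliased pointer.
        unsafe { raw::Counter_add(&mut self.inner, n) }
    }

    /// A descriptive name replaces the numbered overload.
    pub fn add_pair(&mut self, a: i32, b: i32) {
        // SAFETY: same as in `add`.
        unsafe { raw::Counter_add1(&mut self.inner, a, b) }
    }

    pub fn value(&self) -> i32 {
        self.inner.value
    }
}

impl Drop for Counter {
    fn drop(&mut self) {
        // SAFETY: `inner` was initialized in `new` and is destroyed once.
        unsafe { raw::Counter_Counter_destructor(&mut self.inner) }
    }
}

/// Returns the final counter value and the number of live counters left.
pub fn demo() -> (i32, i32) {
    let value = {
        let mut counter = Counter::new();
        counter.add(1);
        counter.add_pair(2, 3);
        counter.value()
    }; // `counter` is dropped here, which runs the destructor stand-in.
    (value, raw::LIVE.load(Ordering::SeqCst))
}

fn main() {
    println!("{:?}", demo());
}
```

A wrapper like this has to be written and kept in sync by hand for every bound
class, and it still depends on the numbered overload name staying stable.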

### cxx

[cxx](https://cxx.rs/) generates **Rust bindings for C++ APIs and vice versa**
from an **interface definition language (IDL) included inline in Rust source
code.** cxx generates Rust and C++ source code from IDL definitions. To check
that the IDL definitions match the actual C++ API, cxx inserts static
assertions[^2] into the generated C++ code; it does not, however, read the C++
headers itself. cxx contains built-in bindings for various Rust and C++ standard
library types that are not customizable.

As far as we understand, cxx has the following design constraints and goals:

* **Ship a stable product for its intended audience.**
    * As a consequence, improvements such as integrating move semantics are
      not going to be accepted soon. We understand that cxx is not a vehicle
      for experimentation. cxx maintainers would prefer us to first show that
      our ideas work in a fork of cxx or in a different system, such as
      autocxx, and that our improvements pull their weight given the added
      complexity.
* **Remain simple and transparent.** There is a limit on the amount of
  complexity that will be tolerated.
    * There is a chance that improvements such as modeling C++ move semantics
      or various attempts at eliminating thunks will never be accepted in
      upstream cxx.
* **Non-goal: Automatically provide high fidelity interop.**
    * cxx is designed for the use case of an executable where C++ and Rust
      parts communicate through a narrow interface.
* **Non-goal: Automatically provide the most performant interop in as many
  cases as possible.** For example:
    * cxx does not attempt to eliminate C++-side thunks. Instead, using LTO is
      recommended.
    * cxx considers it acceptable to allocate all objects of "opaque" types on
      the heap. Users who find these heap allocations unacceptable for
      performance reasons are expected to implement a different C++ entry
      point that does not hit this limitation and bind it to Rust instead of
      the original C++ API. Heap allocation is acceptable for many C++ classes
      in most environments, but the exceptions are important enough for us
      that this is a major restriction.

**Pros**

* **Mature and ergonomic enough today for mixing C++ and Rust in existing
  codebases with limited C++ interop needs.**
* We avoid being on a tech island.

**Cons**

* cxx’s stability goal makes it **hard to experiment with how the Rust API
  looks.**
* **Our goals are unlikely to align well with the goals of the intended user
  audience of cxx.** We would be pulling cxx in directions that make it a
  worse product for its current users.
* **Almost no customizability**. Users who are not satisfied with what cxx
  does are expected to wrap the target C++ API in a different C++ API that is
  more friendly to cxx.
* cxx tries to be compatible with most standard C++ implementations found in
  the real world, so it **cannot take advantage of unique guarantees provided
  by the target execution environment.**

**Evaluation**

cxx could be used in projects with limited C++/Rust interop requirements.
However, we would not be able to implement many interop features that we
consider essential (for example, move semantics, templates).

### autocxx

[autocxx](https://github.com/google/autocxx) **automatically generates Rust
bindings from C++ headers**. As the name implies, it automatically generates IDL
definitions for cxx, which then produces the actual bindings. In addition,
autocxx generates its own Rust and C++ code to extend the Rust API beyond what
cxx itself would provide, for example to support passing POD types by value.
autocxx consumes C++ headers indirectly by first running bindgen on them and
then parsing the Rust code output by bindgen.

autocxx’s
[design goals](https://www.chromium.org/Home/chromium-security/memory-safety/rust-and-c-interoperability)
are similar to our own in this document.

We did a case study on using an existing project's C++ API from Rust using
autocxx.

**Pros**

* **Low barrier to entry**: Bindings are generated from C++ headers, no need
  to write duplicate API definitions.
* **Ergonomic mappings** for many C++ constructs.
* **Open to contributions that change the generated Rust APIs** or make
  architectural changes.

**Cons**

* **Relatively new and immature.**
* **Cannot (yet) consume complex headers without errors.** We’ve managed to
  import some actual Spanner headers, but there are still enough outstanding
  issues that we can’t yet do anything useful with Spanner.
* **Architecture can make modifications difficult.** autocxx is built on top
  of two other tools, bindgen and cxx, and the interfaces between these
  components can make it harder to make a modification than it would be in a
  monolithic tool. Specifically:
    * autocxx uses bindgen to generate a description of the C++ API that it
      can parse easily (as opposed to trying to parse C++ headers either
      directly or using Clang APIs). Since bindgen was not intended for this
      purpose, its output lacks some information that autocxx needs, so
      autocxx [has forked](https://crates.io/crates/autocxx-bindgen) bindgen
      to adapt it to its needs. The forked version emits additional
      information about the C++ API in the form of attributes attached to
      various API elements.
    * bindgen in turn is built on the libclang API, which doesn’t surface all
      of the functionality available through Clang’s C++ API. Adding features
      to libclang requires additional effort and has a 6-month lead time to
      appear in a stable release (to become eligible to be used from bindgen).
    * When errors occur, it can be hard to figure out which of the components
      is responsible.
    * Adding features can require touching multiple components, which requires
      commits to multiple repositories.

**Evaluation**

We initially intended to use autocxx to prototype various interop ideas and
potentially as a basis for a field trial. We still believe this would be
feasible, but after trying to modify autocxx and its bindgen fork during an
internal C++/Rust interop study, we feel that autocxx’s complex architecture is
enough of an impediment that we could achieve our goals with less total effort
by creating an interop tool from scratch that consists of a single codebase and
uses the Clang C++ API to directly interface with Clang.

[^1]: Doing so would require either generating C++ source code or interfacing
    deeply enough with Clang to generate object code for inline functions and
    template instantiation.
[^2]: And tricks such as suitable type conversions that force the C++ compiler
    to perform appropriate checks at compile time.