This is an opinion piece written in response to The Simpler Alternative to GCC-rs (Shnatsel), explaining why I support the GCC-rs project and why rustc_codegen_gcc is not a substitute for it. While rustc_codegen_gcc does provide some of the benefits of gcc-rs, it does not provide all of them. There are benefits specific to having a distinct frontend, rather than just a backend, which I will address in this article. I write from the perspective of an author of lccc, which includes another alternative implementation of the rust language, though it is further from completion than gcc-rs.
Having Multiple Implementations is a good idea
Having multiple implementations, in general, is a good idea. While it can be shown that, at least to begin with, C and C++ had issues with it, in my experience this is generally no longer the case. Because I have options available to me, I can choose the compilers I want to support based on the available features and compliance with the standard. I recently raised a non-compliance bug against Microsoft Visual C++ that prevents my C++ code from compiling. I’ve chosen not to support MSVC until that bug is fixed, and I am able to do that because of alternatives like gcc and clang. With rustc, if I have a legitimate issue, I cannot choose to use a different compiler; I can only redesign the code or abandon rust entirely. In fact, this was an issue with lccc, which would have been written in rust but for the lack of any stable ABI at a higher level than C.
Having competition in the space of rust front ends would give reasons for the compiler authors to fix bugs and otherwise improve, and give users options for when those bugs are present, beyond just “work around it” or “drop the language entirely”.
This point does not come without a “however”, though. It is entirely possible that gcc-rs could cause the ecosystem to fracture, if it introduced considerable inconsistencies with established “features” of the rust language and made limited, or no, efforts to fix them. Part of the solution would be a proper specification of some kind, which I will address later. The other part, of course, would be holding the implementations accountable for fixing non-compliance bugs, by refusing to support implementations that violate the spec. This, too, would naturally require a proper specification: lacking one, users could hold up their code as a reason that all implementations need to perfectly emulate any other implementation, including its bugs and unspecified behaviour.
Bootstrapping is a problem, mrustc is not the solution.
Regarding bootstrapping, having the ability to bootstrap a compiler from any arbitrary point can be useful for more than just introducing the compiler to that platform. It can be useful in security and safety verification of software to be able to start from some known good software (like a C compiler), and build up from there without entering an untrusted domain.
Further, while mrustc can be used to shorten the bootstrap chain, it does not shorten it to a reasonable length, as it is only capable of building rustc 1.39 as of the last time I checked (a couple of weeks prior to publishing this article). rustc 1.n is capable of building rustc 1.n+1, and the current stable rustc is 1.52 (nightly is on 1.54). Thus, to reach the current stable rustc from the rustc 1.39 built by mrustc, one would have to build at least 13 further versions of rustc. While this is certainly better than 52 versions, it still takes an entirely unreasonable amount of time. In contrast, with a Rust compiler written in C++98 (the version of C++ in which gcc-rs is written), starting from a C compiler, the sequence is simple:
- Build gcc 4, with C compiler
- Build gcc-rs, with gcc-4
The mrustc sequence is:
- Build gcc 4, with C compiler
- Build gcc 10.2, with gcc-4
- Build mrustc, with gcc-10.2
- Build rustc 1.39, with mrustc
- Build rustc 1.40, with rustc 1.39
- Build rustc 1.41, with rustc 1.40
- ... (one release at a time, through rustc 1.50)
- Build rustc 1.51, with rustc 1.50
- Build rustc 1.52, with rustc 1.51
In the gcc-rs case, there are 2 bootstrap steps. In the mrustc-rustc case, there are 17 (gcc 4, gcc 10.2, mrustc, and then rustc 1.39 through 1.52). A compiler written in C++11, 14, 17, or 20 would have 3 steps (gcc 4, gcc 10, target), and a compiler written in rust that targets mrustc would have 4 (gcc 4, gcc 10, mrustc, target). All of these are significantly fewer than 17, though 17 is an improvement over a potentially unbounded chain of at least 53 steps. Additionally, gcc is much less expensive to build than rustc is. On my system, rustc takes approximately 6 ½ hours to build 2 stages. With --enable-bootstrap (which is optional for gcc, and can be skipped when doing a manual bootstrap), gcc takes a mere 2 ¼ hours (40 minutes with --disable-bootstrap). gcc 4 is also significantly smaller and cheaper to build, taking only 40 minutes with --enable-bootstrap (after patching it to build on modern linux with gcc 10).
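To make the step counting concrete, here is a minimal sketch that enumerates the two chains listed above, counting every build (including gcc and mrustc themselves) as one step. The `Step` type and the function names are hypothetical, introduced only for this illustration; the version numbers are the ones quoted in this article.

```rust
/// One build in a bootstrap chain: what gets built, and what builds it.
#[derive(Debug, Clone, PartialEq)]
struct Step {
    target: String,
    built_with: String,
}

/// Small helper so the chains below read like the lists in the article.
fn step(target: &str, built_with: &str) -> Step {
    Step {
        target: target.to_string(),
        built_with: built_with.to_string(),
    }
}

/// The gcc-rs route: a C compiler builds gcc 4, which builds gcc-rs.
fn gcc_rs_chain() -> Vec<Step> {
    vec![step("gcc 4", "C compiler"), step("gcc-rs", "gcc 4")]
}

/// The mrustc route: gcc 4, gcc 10.2, mrustc, the rustc release that mrustc
/// can build (1.`first_minor`), then every release up to 1.`last_minor`,
/// each one built with its immediate predecessor.
fn mrustc_chain(first_minor: u32, last_minor: u32) -> Vec<Step> {
    let mut chain = vec![
        step("gcc 4", "C compiler"),
        step("gcc 10.2", "gcc 4"),
        step("mrustc", "gcc 10.2"),
        step(&format!("rustc 1.{}", first_minor), "mrustc"),
    ];
    for minor in (first_minor + 1)..=last_minor {
        chain.push(step(
            &format!("rustc 1.{}", minor),
            &format!("rustc 1.{}", minor - 1),
        ));
    }
    chain
}
```

Calling `gcc_rs_chain().len()` and `mrustc_chain(39, 52).len()` from a `main` function gives the number of builds in each route, and every new stable release adds one more entry to the second chain while leaving the first unchanged.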
Further, mrustc is only capable of targeting x86. Performing this reduction is thus impossible on riscv or aarch64, and for bootstrapping on these platforms, one would have to build every single version of rustc.
Additionally, even when safety/security verification is not necessary, cross-compilation can introduce unnecessary builds of rustc (each of which can take anywhere from minutes to several hours, or even days). For example, when building a linux environment following instructions similar to Linux From Scratch, cross-compiling rustc would involve building it in each stage (pre-tools, tools, and chroot). Having a compiler that is simple to bootstrap would allow someone building such a system to build the rust compiler as late as possible, a lot later than cross-compilation would allow. This issue has come up when building rustc for iglunix, a self-hosted linux distribution that does not contain any GNU components.
Miri is not sufficient for Specifying the Language
One reason for having multiple implementations is to help prompt a specification for the language. It is argued that miri does that job already. However, it is not sufficient for this purpose, for three major reasons:
- Miri only performs dynamic analysis of programs. It can only check code paths that are evaluated (and not all possible code paths and all possible cases). By extension, you can only check conditionally-compiled code you are capable of building for and running. You can’t check code that is enabled on windows without a windows machine (or virtual machine).
- Further, miri doesn’t really help specify the language: the rules it follows are still in contention and, with limited exceptions such as the Stacked Borrows aliasing model and the atomic memory model (the latter because rust delegates to the C++ language), they cannot be reasoned about in the abstract the way a well-written prose specification can.
- Miri also cannot check code that makes use of inline assembly or FFI calls, whose effects may determine whether the specific code has defined or undefined behaviour. Thus miri cannot confirm for all rust code whether or not that code has defined behaviour, even once all the rules are firmly defined.
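The first point can be illustrated with a small sketch. All names here (`get_fast`, `lookup`, `platform_default`) are hypothetical and exist only for this example. The code contains one latently unsound branch and one function that is only compiled on windows targets; a dynamic checker only ever sees the calls that actually execute on the target it is run for.

```rust
/// Reads `data[idx]` without a bounds check.
/// Calling this with `idx >= data.len()` is undefined behaviour.
fn get_fast(data: &[u32], idx: usize) -> u32 {
    // The safety obligation is pushed onto the caller.
    unsafe { *data.get_unchecked(idx) }
}

/// Hypothetical lookup with one sound path and one latently unsound path.
fn lookup(data: &[u32], idx: usize, wrap: bool) -> u32 {
    if wrap {
        // Sound: the index is reduced into range (an empty slice panics
        // on the division by zero, which is defined behaviour).
        get_fast(data, idx % data.len())
    } else {
        // Latent bug: nothing here guarantees `idx < data.len()`.
        // A dynamic check only notices if some run passes a bad index.
        get_fast(data, idx)
    }
}

/// Only compiled for windows targets, so checking it dynamically means
/// building for and running on such a target.
#[cfg(windows)]
fn platform_default(data: &[u32]) -> u32 {
    // Undefined behaviour if `data` is empty.
    get_fast(data, 0)
}

/// Compiled everywhere else; this variant is always sound.
#[cfg(not(windows))]
fn platform_default(data: &[u32]) -> u32 {
    data.first().copied().unwrap_or(0)
}
```

A test suite that only calls `lookup` with in-range indices never executes any undefined behaviour, so a dynamic analysis of those runs has nothing to report, even though `lookup(&data, 10, false)` on a three-element slice would be undefined. A prose specification, by contrast, lets a reader conclude that the second branch is unsound without running anything.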
The ferrocene project is also helpful in specifying the language. However, I think the design of that specification would benefit further from the experience of multiple implementers, rather than only that of those working on the rustc frontend and on ferrocene itself.
While rustc_codegen_gcc does indeed solve many of the issues with rust that gcc-rs purports to solve, it does not provide all of the benefits that the latter may bring. Further, the bootstrapping problem is barely mitigated by the existence of mrustc, and miri is not useful in reasoning entirely about the behaviour of given rust code, the way a proper specification would be. For these reasons, among others unmentioned, I support gcc-rs, and I hope I’ve convinced you to support it as well.