This was recently posted to /r/rust with the engagement of the author (mutabah) if you'd like to ask questions or get involved. In particular here's him answering why he started this project: https://www.reddit.com/r/rust/comments/718gbh/mrustc_alterna...
One of the interesting things about this project is that, because its initial goal is to validate the reproducibility of the main Rust compiler, it doesn't actually need to implement a borrow checker: the borrow checker strictly rejects programs and plays no role in code generation.
> it doesn't actually need to implement a borrow checker
I was under the impression that Rust's lifetime and ownership system is not just used to ensure safety, but also for knowing where the compiler should insert deallocation code for dynamically allocated objects at the end of their lifetimes. Is that not the case? Does mrustc generate code that leaks memory, or is there enough information in the Rust source code to do precise deallocation even without a borrow checker?
That is not the case. Destructors run for things that go out of scope. When you create a reference to something ("borrowing") that reference is associated with a lifetime. This lifetime represents how long the reference is valid. A reference can never outlive the things it references (or it would be a dangling pointer), but the opposite is not true: you can create a reference with a lifetime that is shorter than the thing it references.
So basically lifetimes are used only by the borrow checker for intelligent linting. Once the borrow checker is done, the compiler loses interest in lifetimes entirely, and they don't affect codegen.
Destructor scopes are determined from syntactical rules and lifetimes don't play any role in them. OTOH, lifetimes are restricted by the shape of the destructor scopes.
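A minimal sketch of that scope rule (the `Noisy` and `drop_order` names are just for illustration): destructors fire at the closing brace of the enclosing block, determined purely syntactically, and a short-lived borrow doesn't move the drop point.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A type that records when it is dropped, by pushing its label into a shared log.
struct Noisy(Rc<RefCell<Vec<&'static str>>>, &'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.borrow_mut().push(self.1);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Noisy(Rc::clone(&log), "outer");
        {
            let inner = Noisy(Rc::clone(&log), "inner");
            let _r = &inner; // a short-lived borrow; it doesn't change the drop point
        } // `inner` is dropped here, at the closing brace
        log.borrow_mut().push("after inner scope");
    } // `_outer` is dropped here
    let order = log.borrow().clone();
    order
}

fn main() {
    assert_eq!(drop_order(), ["inner", "after inner scope", "outer"]);
    println!("drop order: {:?}", drop_order());
}
```

Note that no lifetime information is consulted to decide where the drops happen; the braces alone determine it, which is why a codegen-only compiler can skip borrow checking.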
If you don't really need the borrow checker to compile Rust, can the original Rust compiler have a mode where it doesn't do the borrow checking? That might make it easier to ease into Rust, because right now, "fighting" the borrow-checker is the hardest problem with learning Rust.
Be careful not to underestimate the failure mode of trying to compile arbitrary Rust code without the borrow checker. It appears that your goal is to let people "ease into Rust" by trading compilation errors for runtime segfaults, but it's just as likely that un-borrow-checked Rust code could produce malformed LLVM IR that will cause totally unpredictable compile-time failures in LLVM with indecipherably cryptic error messages.
That really wasn't my intention -- it mostly makes sense to use Rust for the borrow checker. Instead my only intention was to simplify the process of learning Rust -- using all its features (except the borrow checker), getting a working program fast, and having some easy wins when learning Rust without constantly stumbling over the borrow checker. Maybe production mode could always enable the borrow checker.
When folks complain about the "borrow checker" usually they're complaining about a superset of features including move semantics and how data works in Rust. This is far more deeply tied into the whole model of Rust and is frankly very core to the language, "turning it off" won't make it easier to learn, it will make it easier to learn a completely different language.
Besides, all of this is necessary for soundness. Turning off the borrow checker won't magically get you a compiler that is more permissive yet still correct; it will get you a compiler that will very likely produce nonsense if your program would have failed the borrow checker had it been present.
The need for the borrow checker is driven by Rust's chosen resource management model. It sounds like what you really want is an optional mode with garbage collection, in which case I recommend you check out F#.
This is akin to suggesting that learning Java could be simplified by making its type system optional. Also, if one could turn off safety features, why would they ever switch them back on? They learned a different language, one where those checks are not present.
You still need the borrow checker, and this compiler won't change the kinds of programs you can write. It only works on programs that pass the borrow checker, just like the regular compiler, the difference being that it doesn't check that the program is valid (i.e. passes the checker) but assumes that its validity has already been checked earlier by the regular compiler.
If you give an invalid program to the regular compiler it will be rejected, but if you give the same program to this compiler it might produce nonsense. So the developer of a program would feed it to the regular compiler for checking, but others could take the finished release of the program, assume that it's valid and feed it to this compiler instead, as I understand it.
Removing the borrow checker would allow you to compile broken code (e.g. use-after-free, data races). However, given code that compiles on rustc -- i.e. passes the borrow checker -- you don't need to re-verify the borrow checking to get correct code from mrustc.
No I'm not saying I have an issue with compile time validation. Far from it.
My point is that the borrow checker is explicitly meant to protect against data races and such. There are situations where I write code I can prove does not violate any hazards, but Rust's "validation" is overly zealous.
Yes, some valid code won't pass the borrow checker. But at the same time, far more invalid code doesn't pass it. Personally I see it as a tool to help me reduce possible runtime errors, and instead have them as compile time errors. This does mean that I often need to design my program around it, but that's true of basically every other language.
The few times I do have an issue I'm usually writing FFI code to interact with C, and I try to minimise that code as much as possible.
Similarly, I like Haskell, but find its syntax and the whole function and currying thing confusing. Can't I just write Haskell without HKT, functions and currying?
-------------------
Jokes aside, if you aren't there for Rust's borrow checker, then I wonder, do you really need Rust at all?
I never understand this. I don't mind the borrow checker, I mind the affine type system. It really bugs me that I can't pass the same String into two functions in series, or why preventing that would be a desirable language property.
Seems like explicitly freeing memory should be regarded as unsafe. Unsafe things aside, the compiler should regard it as a copy instead of a move so multiple functions can safely use the resource without having to pass it by reference. If there is a good reason for an affine type system, I suspect it has to do with parallelism and not manual memory management.
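For concreteness, the complaint and the two standard workarounds look roughly like this (a minimal sketch; the `shout` helper is hypothetical):

```rust
// A hypothetical callee that only needs to read the string.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let name = String::from("rust");

    // A function taking `String` by value would consume `name`, so a second
    // call in series couldn't use it. Passing `&name` lends it instead:
    let a = shout(&name);
    let b = shout(&name); // fine: `name` was only borrowed, not moved
    assert_eq!(a, b);

    // When a callee genuinely needs ownership, an explicit clone keeps the
    // copy visible in the source:
    let owned = name.clone();
    assert_eq!(owned, name); // `name` is still usable afterwards
}
```

So the "same String into two functions" case is fine as long as the functions borrow; the move only happens when a function demands ownership.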
Your suspicion is wrong. It has benefits for parallelism, but is critical for ensuring safety of the more predictable (vs GC) Rust/C++ style of "manual" memory management.
When one doesn't have a pervasive GC to dynamically clean things up, things still need to be freed, but at static locations (either explicit free calls in C or the ends of scopes in Rust and C++). Making this safe means defending against pointers becoming dangling, which Rust does by restricting mutation. This restriction translates into some things that can't be copied, such as mutable references (which can lead to iterator invalidation and use-after-free, among other undefined behaviours).
In other words, explicit/scope-based freeing being unsafe means the only way to safely manage memory is a garbage collector (tracing or reference counting), which limits how many programs can be written that are verified-safe by a computer. I believe Ada takes the approach you suggest, but this limits it to be most useful for programs that don't allocate (or only do O(1) allocation during startup).
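The "restricting mutation" point can be sketched concretely (the `demo` name is just for illustration): while a shared borrow into a Vec is live, mutating the Vec is rejected, because a push may reallocate the buffer.

```rust
fn demo() -> (i32, usize) {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // shared borrow into v's buffer

    // v.push(4); // rejected while `first` is live: push may reallocate the
    //            // buffer and leave `first` dangling -- the Rust rendering
    //            // of classic iterator/reference invalidation

    let x = *first; // last use: the borrow ends here (non-lexical lifetimes)
    v.push(4);      // fine now that the borrow has ended
    (x, v.len())
}

fn main() {
    assert_eq!(demo(), (1, 4));
}
```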
Affineness also allows modeling things like session types better, letting programmers construct their own APIs that defend against mistakes.
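A small sketch of the session-type style this enables (the `Closed`/`Open` protocol here is invented for illustration): by taking `self` by value, each method consumes the old state, so using a connection out of order is a compile error rather than a runtime bug.

```rust
// Typestate sketch: a connection must be opened before sending and cannot
// be used after close. Affine types enforce the protocol statically.
struct Closed;
struct Open {
    sent: u32,
}

impl Closed {
    fn open(self) -> Open {
        Open { sent: 0 } // `self` (the Closed state) is consumed here
    }
}

impl Open {
    fn send(mut self, _msg: &str) -> Open {
        self.sent += 1;
        self // hand the (only) Open state back to the caller
    }
    fn close(self) -> (Closed, u32) {
        (Closed, self.sent) // consuming `self` retires the Open state
    }
}

fn protocol() -> u32 {
    let conn = Closed;
    let conn = conn.open(); // Closed is gone; we now hold an Open
    let conn = conn.send("hello").send("world");
    let (_closed, sent) = conn.close();
    // conn.send("again"); // rejected: `conn` was moved by `close`
    sent
}

fn main() {
    assert_eq!(protocol(), 2);
}
```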
You could argue, and people have, that some types could be "autoclone", as in, have `clone` calls automatically inserted where necessary. But whenever this is raised, a lot of people express dislike, because they feel it would encourage slow code and mean people aren't guided towards the (usually) better solution of using references.
I understand not copying mutable pointers; I don't understand preventing copies of pointers to immutable data. I also don't see why affine types allow rust to get away without a GC in a way that scoped-based memory doesn't.
&T is Copy, so I'm not sure what you mean with the first bit.
For the second one, consider Box<T> vs unique_ptr<T>; Rust can statically prevent use-after-move, C++ cannot, and you'll get a null dereference. That is, you can totally get rid of a GC through only RAII, but you can't guarantee memory safety (as far as we know!) without affine/linear types.
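Both halves of that answer fit in a few lines (the `copy_refs` name is just for illustration): shared references duplicate freely, while owning pointers are moved and the old binding is dead afterwards.

```rust
// &T is Copy: duplicating a shared reference is free and safe.
fn copy_refs() -> i32 {
    let n = 42;
    let r1: &i32 = &n;
    let r2 = r1; // a copy, not a move
    *r1 + *r2    // both copies are still usable
}

fn main() {
    assert_eq!(copy_refs(), 84);

    // Box<T> is affine: assignment moves it, and the compiler statically
    // rejects any later use of the old binding -- exactly the use-after-move
    // that C++'s unique_ptr only catches at runtime (as a null dereference).
    let b1 = Box::new(1);
    let b2 = b1; // move
    assert_eq!(*b2, 1);
    // println!("{}", *b1); // rejected at compile time: `b1` was moved to `b2`
}
```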
To be clear, Rust does scope-based memory management and does allow copying pointers to immutable data.
Affine typing (plus the rest of Rust's system) on top of scoped memory management is needed to avoid problems like returning a pointer into something that is deallocated at the end of a function:
    int &foo() {
        std::vector<int> v = ...;
        return v[0];
    } // return value is dangling
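The Rust analogue of that C++ fails to compile, which is the whole point; a sketch (the `first` name is just for illustration), with the rejected version kept in comments:

```rust
// The direct translation is rejected at compile time:
//
//   fn first() -> &i32 {
//       let v = vec![1, 2, 3];
//       &v[0] // error: returns a reference to `v`, which is dropped here
//   }
//
// The idiomatic fix is to hand back an owned value (or an index) instead:
fn first(v: Vec<i32>) -> i32 {
    v[0] // the i32 is copied out before `v` is dropped at the end of the fn
}

fn main() {
    assert_eq!(first(vec![1, 2, 3]), 1);
}
```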
And similarly, avoiding having pointers into things that are destroyed when their parent is modified:
    std::vector<std::unique_ptr<int>> v = ...;
    int &ref = *v[0];
    v.clear();
    // ref is dangling
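Translating that second C++ example into Rust shows the borrow checker catching it (a minimal sketch; the `clear_demo` name is just for illustration):

```rust
fn clear_demo() -> (i32, usize) {
    let mut v: Vec<Box<i32>> = vec![Box::new(10), Box::new(20)];
    let r: &i32 = &*v[0]; // borrow into the first heap allocation

    // v.clear(); // rejected while `r` is live: clearing `v` frees the boxes
    //            // that `r` still points into -- the Rust rendering of the
    //            // C++ dangling `ref` above

    let seen = *r; // last use: the borrow ends here
    v.clear();     // fine now that the borrow has ended
    (seen, v.len())
}

fn main() {
    assert_eq!(clear_demo(), (10, 0));
}
```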
Maybe this compiler could help with bootstrapping the official rust compiler. At the moment, it seems to be quite a mess for packagers. Rust is written in Rust, but also needs cargo to build, which is also written in Rust. To break this circular dependency chain, you need to have pre-compiled binaries first in order to bootstrap the build process.
I've been beating my head against the wall for weeks trying to compile rustc/cargo on OpenIndiana. I want it so that I can compile a recent version of Firefox and/or Thunderbird for OpenIndiana, both of which require rustc and cargo.
The typical route is to cross-compile a compiler and cargo on another platform and then run the resulting binary. Is that approach infeasible for OpenIndiana?
C compilers are ubiquitous. Name a platform which does not have at least one C compiler. Rust might come into such a position at some later time as well, but until then at least a transitional solution besides cross-compiling would make packaging easier and reproducible.
Really, the solution is just to make cross compiling as easy as any other kind. Compilation is basically a pure function, so by rights compilers with different targets should be drop-in replacements of each other.
Exactly. There's no reason a compiler should take any input but the source (maybe some config) nor have any output but the translated code. There's no dependence on running on a particular piece of hardware in this equation. Just run the codegen for platform Y on platform X.
Sure, I guess in the 10,000 foot view those look like source code :). But there's no fundamental reason you can't have libraries for platform X merely present on a platform Y system, it's just inconvenient today. That's the point I'm really trying to get at.
I'll admit I don't know a whole lot about compilers but don't most of them do optimizations and other changes based on the CPU like its architecture, cache size, and instruction set? That may be what GP is getting at.
Stuff like LLVM and Emscripten exists though, so it's probably not as big of a deal as they say.
Sure, based on the target CPU. But there's no fundamental reason I can't do all those optimizations for, say, x86 in a compiler that happens to be running on an Arm CPU. As far as the compiler is concerned it's all just data that gets stuffed in files, just like any other software.
The super-easy cross compilation in Go is one of the best parts. Every install comes with the capacity to compile to every Go target, with no differences in input except setting the target by name.
This is really something more languages should strive for.
> This is really something more languages should strive for.
Most languages these days obviate cross-compilation in the first place by being interpreted. Of the languages that remain, the natively-compiled ones, most haven't gone to Go's lengths of writing a custom libc, which is emphatically not recommended for both Windows and Mac (the syscall interfaces aren't stable, and this has broken Go code in the past: https://github.com/golang/go/issues/16272 ). And gc, the primary Go compiler and the one with out-of-the-box cross-compilation, supports relatively few platforms (I count 11, whereas rustc looks like it supports 50-70 platforms). You can use gccgo to get more platforms, but AFAICT gccgo's cross-compilation story isn't nearly as nice: https://github.com/golang/go/wiki/GccgoCrossCompilation .
For Rust, cross-compilation looks like this:
1. Install the libc for the target system, and, if necessary, a compatible linker.
Well, it's not just data - it's also frequently pointers or labels to code in dynamically imported libraries; things which can't always be calculated without the exact library being used on hand.
Be careful, let's not forget the lesson of Ken Thompson's Reflections on Trusting Trust. It's more accurate to say that compilation is a function that takes two arguments: the source code, and the compiler itself; this is how trusting-trust attacks propagate despite total absence from the source code.
You're being pedantic, but your pedantry is incorrect, so allow me to be pedantic in correcting you :p. The output of all pure functions depends on the function and its inputs; this isn't more true for compilers than, say, addition. The problem with Trusting Trust is that the function isn't practically inspectable, not that the function is impure. A trusting trust compiler is still pure.
A functionally pure compiler would always provide the same output for the given source code. The challenges in creating repeatable builds of highly popular open source projects shows that the average compiler is anything but functionally pure. The global state of the system running the compiler has a huge impact on the output of the compiler.
No, it's not obvious what the issue is. If you want to compile a C compiler, you need a C compiler binary. If you want to compile a rust compiler, you need a rust compiler binary. It's the exact same issue, but people don't whine about the C compiler, probably because it's been that way for 50 years.
No, they use C++ even for the C front-end. But you are right, you can still see its C past in the source code.
What's important though is that the GCC devs actively try to avoid the newest C++ features, which allows current GCC trunk to be compiled even with very old GCC releases. IIRC even GCC 4.3 (released in March 2008) is able to compile current trunk (GCC 8.0).
This is somewhat different for Rust, where I think the current policy is that master must compile with the last released version (releases come every 6 weeks, and just to make it clear: Rust is written in Rust). Before that, the compiler was updated even more frequently. Bootstrapping Rust from zero is therefore quite hard, since Rust is far from being as ubiquitous as C/C++ compilers. Even if you already have a rustc on your system, it's not unlikely that it is too old to compile Rust master. The first Rust compiler was written in OCaml, so you need to compile that first, then compile Rust commit-for-commit until you reach current master. This could take quite some time. IMHO this is why mrustc is great: you just compile current master (if mrustc supports all the needed features) with it and then use the generated compiler to compile Rust.
Any carefully written C program that doesn't use certain C99 and C11 things can be said to be C++.
My language implementation compiles as C and C++. Every so often I build it as C++ (like before releases) to check for regressions and flush out any issues caught by the C++ compiler.
I don't think of it as "written in C++", though it is not technically a false statement.
All Scheme programs are really written in Common Lisp; they just need a suitable library of macros and functions ...
MSVC. Yeah, Windows isn't a "distro" but it is a major OS and MSVC is a pretty major compiler. But it's binary only so there's no need to bootstrap it unless you're working at MS.
Speaking from personal experience, that can be quite painful. Being able to build rust and cargo, even if only for bootstrapping when porting, would truly be a wondrous thing.
Why, we could always use emscripten to compile rustc to JavaScript and use that for bootstrapping!
On a more serious note, ghc at least solved this problem by allowing you to compile Haskell to C (-fvia-C). Writing a naive C code generator from whatever your backend is using is probably not that hard.
Go's bootstrap uses Go 1.4, whose compiler is written in C. So: compile Go 1.4 and use that to compile the Go 1.X compiler (I don't know, but multiple steps might be necessary).
Of the distros, Fedora has the strictest compiler bootstrapping policy, and Fedora seems to be doing great with Rust packaging (for a number of CPU architectures).
Neat project. It looks like this was largely written by one person, and I'm fairly in awe at anyone who can take a big project like a compiler this far alone.
Isn't there a bit of cognitive dissonance in believing that Rust as a language is an important idea (i.e. by the additional code safety and code maintainability that it conveys), but then simultaneously making the effort to rewrite the current Rust-implemented compiler in C++?
C++ is fast, but aside from a shared value around performance, it has fairly little in common with the ideas that Rust is built on.
Multiple implementations of a compiler lets you implement the "Diverse Double-Compiling"[0] countermeasure to the famous "Reflections on Trusting Trust"[1] attack. You wouldn't necessarily use the C++ implementation in production, but it still improves the security of the Rust language just by existing.
DDC is irrelevant here, DDC is an argument to not write the second compiler in C++ and write it in Rust too.
Having a Rust compiler in C++ is a mitigation to the trusting trust attack, period. You don't need DDC for this.
DDC is necessary when you have two self hosted compilers (e.g. GCC and clang). Here we have one self-hosted compiler (rustc), and one in another language (C++). To mitigate trusting trust in rustc, use mrustc to compile rustc, and then use that rustc to compile itself, and now you have a trusted binary (provided you trust your C++ compiler. you can fix this by DDCing the C++ compilers)
As far as I can tell the goal of the project isn't to target more platforms (Rust targets quite a few by way of LLVM), so I don't think I'd choose any other language, including C++.
Having a compiler and standard library written in the language that it compiles has some huge benefits for increasing the pool of possible contributors.
Interesting. Along with your other comment about the borrow checker, I guess you could develop using (or occasionally check against) the rustc compiler for borrow checker correctness, and deploy using mrustc. That's pretty cool.
> Having a compiler and standard library written in the language that it compiles has some huge benefits for increasing the pool of possible contributors.
But this isn't the official compiler, this is someone's personal project?
> But this isn't the official compiler, this is someone's personal project?
True, but compilers are complicated machines and Rust is still changing at a fairly frantic rate.
The author seems to be doing quite a good job of development today, but if the project has any hope of staying current, it probably needs to think about how to increase its bus factor: if something happens, like changing jobs, starting a family, or simply becoming interested in something else, a single person suddenly has less time to contribute.
Rust is changing, but in a backwards-compatible way. That said, the standard library aggressively makes use of new features, so the challenge isn't the language but compiling libstd.
Could rustc have a way to output desugared code, or code targeting a specific epoch, with new features like generators expanded to a backwards-compatible form? That might allow for preprocessed source that could be compiled by something like mrustc even if it doesn't implement every single RFC.
Yes, but I mean outputting actual Rust source, but with generators or async/await expanded into calls to Futures. Similar to how Go is now bootstrapped from a down-level compiler.
Yeah, I get it. But that's what I mean; MIR is the common sub-language that's the same across epochs.
I don't think there's any real plans for a source-based approach. But epochs can only change a limited amount of things for exactly this reason; they minimize the compiler burden of supporting them.
Any that are both open-source and C++11 compliant? Guess you can still build g++-4.6 with just C, then a newer g++ from that, but it's a bit of a pain.
If the goal is to break the dependency cycle, a higher level language like Python would make development much easier. C++ is powerful, but not as rapid to develop in.
It's clearly easier to make a compiler in a higher level language (Python is just an example, but Lisps are suited to this kind of thing). For example, text parsing is easier in Perl/Ruby/Python/Swift/etc. As someone who knows C++, more thought is required to do the same thing as in a higher level language, although it runs much faster. If you just wanted to bootstrap the compiler, then you'd choose the easiest route to that. It could also be easier to read and understand than a C++ compiler.
Until then, those comments only annoy those of us that happen to like Rust, but don't find it mature enough to replace C++ on the use cases we happen to care about.
I didn't say you can't write useful software in C++. I said that C++ software is not, in general, memory safe. Sometimes the benefits of a particular piece of software (for example, having an excellent production-grade optimizer) outweigh its drawbacks (for example, not being memory-safe).
I don't think you'd find a single LLVM developer who would claim that LLVM is memory safe. Giving invalid IR to LLVM and not running the verifier frequently segfaults it, for example...
... but the details are very, very different. Unless I've missed significant revisions, data races and concurrency are a non-goal of the Core Guidelines, but are central to Rust.
That said, I always welcome tooling to make C++ safer; the end game is making programs better, not language partisanship!
I will say this for C++: post C++11, it’s one of the few languages in widespread use to have an explicitly defined memory model. The people working on C++ definitely do care about concurrency and parallelism. I’d still choose Rust over C++ for that kind of program any time I was given the choice, though. =)
Absolutely, I'm not saying they don't care; it's that solving that problem is an explicit non-goal of the GSL work. The C++ committee is clearly working on concurrency related things, I was reading the various coroutines TSes recently in fact, as we're working on similar things in Rust.
SaferCPlusPlus[1] is the library that's probably closer to Rust "in spirit and intent". And more importantly, safety effectiveness. And it does address the data race issue.
I'm in awe too at how single-handedly some folks have the drive to keep hitting the keyboard and pour all of their brain's power into a model defined by electricity and binary. Truly amazing! Bravo
It's a major milestone for a compiled language to become self-hosting. Another is having a competing implementation of the compiler/runtime. That's legit; cool project and kudos to the beautiful Rust ecosystem.
If the goal is bootstrapping, another approach would be to use the LLVM C backend to generate a version of rustc and cargo that can be built by a C compiler.
It may have been nice to build it on GCC instead of LLVM. First, the existing Rust compiler uses LLVM so this won't be a fully independent implementation. Second, it would provide GCC with a Rust front end.