ethanol's comments

ethanol · on July 8, 2024

Ten years? Try one year. I've had massive problems upgrading Gradle between consecutive major versions despite always trying to stick to the rules (as I understand them) and never using deprecated or experimental features. Maven simply doesn't have these problems.

Search for Gradle discussions on HN if you're interested, the rough consensus of this site (if there is such a thing) seems to confirm my experience.

brnt · on July 8, 2024

Funny, I'd've expected the opposite in the Java ecosystem. As someone entirely outside of it: why would one then use Gradle? Haven't seen a single positive post here yet.

zmmmmm · on July 8, 2024

Gradle makes it incredibly fast and easy to bootstrap a simple project. Then, it's extraordinarily flexible after that in how you can use Groovy to add custom logic into how your build works. There are a lot of things I like about it.

What I really hate though is that it's designed around a principle of everything happening 'magically' such that when something goes wrong it is nearly impossible to mechanistically understand what happened and how to fix it. It really leans into the most negative aspects of Groovy and amplifies them.

SOLAR_FIELDS · on July 9, 2024

I usually try to gen out a graphviz dotfile of a “typical” Gradle project’s task dependency in an org to demonstrate to people how much magic happens under the hood. I often have to choose the smallest, simplest project or the graphviz engine will choke on the hundreds to thousands of nodes and dependencies used.

kaba0 · on July 9, 2024

Aren’t you displaying each and every file? Like, gradle doesn’t have all that many tasks in a default java project.

SOLAR_FIELDS · on July 10, 2024

Just the default task dependency tree for a build.gradle/settings.grade. And sure, a default project might be pretty simple but add one or two plugins and it gets very unsimple very quickly. And no JVM project is using Gradle without leveraging at least a few plugins. At my last job I did this for a simple generic cron style service project (perhaps maybe 10-15 Java files, including tests) and still ended up with a task dependency tree of hundreds of nodes, due to the plugins.

kaba0 · on July 9, 2024

While I’m not a big kotlin fan, I do believe that using it as a config language is a good direction.

SOLAR_FIELDS · on July 9, 2024

Adding onto the things that other people said: Gradle is an extremely powerful build tool. It has three tiered caching (project wide, machine wide, remote, with fallbacks), complex DAG resolution, and in general is just really good at building jvm stuff. It can handle nearly every weird way that you could possibly dream of to structure your project. It’s generally not super buggy. It’s generally pretty fast if you know what you are doing. There are years and years of optimizations built into it. It is the tool to build enterprise jvm code. If you go to a jvm shop at an adult company that isn’t stuck in 2006, they are using Gradle.

The downside is that it is insanely complex, and nearly impenetrable for newbies. It’s also for the same reason pretty Perl-like (a “write-only” language)

kaba0 · on July 9, 2024

I would also add that there isn’t too many players in its category of general build tools. Like, sure, you can have a go-specific build tool that is more ergonomic, but it chokes the instant you add a c dependency.

I guess there is only bazel and gradle that are on this level of capability, being able to compile polyglot code bases. Probably also cmake, but compared to that, gradle looks as if it was designed by God itself, even though it severely lacks in ergonomics.

SOLAR_FIELDS · on July 10, 2024

Yeah, I’ll also add is that while Gradle is technically a generic build tool, in practice the ecosystem realistically only makes sense to use for JVM code. All of the JS, Docker, Python, other plugins have significant issues due to only being supported/maintained by a few people. As well as Gradle giving a lot of JVM based caching for tasks set up as part of vanilla Gradle, that aren’t set up in these other ecosystems.

I would argue Buildkit is also trying to be this kind of tool but is significantly less mature.

SpaghettiCthulu · on July 10, 2024

I strongly disagree. Gradle is extremely buggy where it matters the most. For instance, if I cancel a build part-way through and then run it again, gradle assumes that the in-progress JavaCompile task is completed, and proceeds to generate a Jar file containing resources but no classes. To fix this, I am forced to make useless changes, or clear my entire user-level gradle cache.

SOLAR_FIELDS · on July 11, 2024

Doesn’t clean build (or the flag to build the incremental cache again by force) solve this?

SpaghettiCthulu · on July 17, 2024

No, running `gradle clean` only cleans the project-level cache as far as I can tell. It leaves the user-level cache untouched (as far as the issue is concerned). Not sure about that flag you mention though. I'll have to look into that.

Osiris-Team · on July 11, 2024

Running "clean build" should do the trick though, it will delete the ./build directory. Or do you mean the local maven repo?

SpaghettiCthulu · on July 17, 2024

Neither. The corrupted cache is in the user-level gradle cache directory, which is untouched by `gradle clean`.

Tainnor · on July 9, 2024

One of the main reasons for me is that incremental compilation is essentially broken on maven, which may matter on large enough projects and if you keep waiting for CI builds.

That, and the fact that anything slightly out of the ordinary that you'd want to do with maven requires a plugin - or, since most people won't bother, just extra scripts on top of maven, or no automation at all.

vips7L · on July 8, 2024

Gradle has faster builds due to the build daemon it uses and it has a better CLI interface. Starting a new project is just: gradle init. Some people use it just because XML == bad too.

krzyk · on July 9, 2024

because it was new and shiny, people complained that maven doesnt allow arbitrary code to be executed in build, etc.

And now they end up with what Ant was like.

kaba0 · on July 9, 2024

If you think it’s like Ant, then you have absolutely zero understanding of gradle.

Only the config step is “turing-complete”, which outputs a build graph that is cached and used to do every further step, finding the tasks that need recompilation, etc.

krzyk · on July 9, 2024

"only the config step" - isn't that the only step users of gradle should be interested in?

If config is a code, then it makes it more complicated in the long run (people are people and will add complexity to it) - there is a reason why they work on declarative syntax.

ethanol · on July 7, 2024

A unix like os with user interface from the windows 2000 era that he could use as his daily driver at some point (instead of Linux). He said as much in some podcast I can't remember (probably cppcast).

1oooqooq · on July 8, 2024

No. He said it was occupational therapy. The goal was relaxed coding, not achieving anything at all.

ethanol · on July 7, 2024

GC is but one component of the massive operating system that is current web browsers. This is classic anti-Rust "argument" which states that code with tiny islands of well tested and thoroughly reviewed unsafe functions is somehow equal to 100% unsafe C++.

pizlonator · on July 7, 2024

It’s not an island. The whole rest of the browser has to interact with the GC the right way (barriers, pointer tracking) or else you escape memory safety

Dylan16807 · on July 8, 2024

A ton of things using the GC code fits fine with what they meant by "island". You shouldn't need unsafe code all over, just normal ownership tracking that calls into the GC. And while I wouldn't normally call a GC "tiny", browsers are pretty enormous.

pizlonator · on July 8, 2024

That's not what happens in the browser at all.

The JS engine - about 1/3 of the browser engine, ish - is tightly coupled to the GC through-and-through. Let's even assume you have a JS engine that doesn't JIT. (If it did JIT then rewriting it in a memory safe language isn't going to do much for you at all.) The interpreter, the runtime, and the libraries are going to have to be making decisions on behalf of the GC (for stuff like weak maps, at least) and is going to have to play along with the GC's tracing (every class in the JS engine participates in the JS GC and so must tell the GC how to mark - or worse, move - outgoing references).

The rest of the browser has to play along as well, just not as much. The DOM is basically JS's "standard library" within the browser and it also needs to have its objects play along with the GC's semantics.

So, maybe there is code out there where the GC can be an island. The browser is not that.

Dylan16807 · on July 8, 2024

Tight coupling is not a problem. Ignore the word "island" if that's your issue. Pretend it talked about sealing each unsafe system behind a safe API.

And yes, you can make that happen. Marking references for a GC is in the same ballpark as a custom allocator. It doesn't have to fill the code using it with unsafe.

pizlonator · on July 9, 2024

There’s no way to seal the GC’s unsafety behind a safe API unless the safe code pins every GC object it touches. That’s not what happens in a browser engine - the GC has to ask the rest of the engine where its pointers are, and if any part of the browser fails to account for a pointer correctly, then you get use after free as soon as the GC runs. It’s a harder problem than you make it seem.

Dylan16807 · on July 9, 2024

Doesn't Rc already demonstrate that kind of accounting?

You're going to have to add GC awareness to objects so it can trace through them, but the code that makes and manipulates those objects can have all the necessary rules enforced by types.

pizlonator · on July 9, 2024

It’s possible with RC but not GC.

GC != RC.

A type system can let you say things like “make sure someone finds out when I assign this pointer”. But that’s not what GC wants, unless (as I said before) you’re happy with your non-GC heap pinning objects (the browser isn’t, but some GC clients are, sometimes). GC wants a guarantee like: you cannot have a pointer in any data structure unless that data structure can be found by GC, and when it is found, make sure to respond to the GC’s request to scan it. That ends up being hard to do with any type system, unless that type system only allows objects to be allocated by that GC and the abstraction sits in the compiler/runtime below the type system (like in Java, Go, and Fil-C).

Dylan16807 · on July 9, 2024

> GC != RC.

I am aware.

> But that’s not what GC wants, unless (as I said before) you’re happy with your non-GC heap pinning objects (the browser isn’t, but some GC clients are, sometimes).

I can't figure out exactly what you mean here. Temporary pins on the stack should be fine, and I don't see why anything else would be necessary.

> GC wants a guarantee like: you cannot have a pointer in any data structure unless that data structure can be found by GC, and when it is found, make sure to respond to the GC’s request to scan it. That ends up being hard to do with any type system, unless that type system only allows objects to be allocated by that GC and the abstraction sits in the compiler/runtime below the type system (like in Java, Go, and Fil-C).

Doing heap allocations through the GC doesn't sound very hard. And why does it need to be below the type system?

Maybe instead I should ask you what the already existing GC libraries for Rust get wrong? Do they require users to be unsafe or do something to break safety? Are they unusable to implement javascript?

pizlonator · on July 9, 2024

> I can't figure out exactly what you mean here. Temporary pins on the stack should be fine, and I don't see why anything else would be necessary.

The browser has a whole native heap (i.e. C++ objects, in current impls) that participates in the JS GC heap as follows:

- If the C++ object is referenced from C++ or from JS, it must be kept alive.

- If the C++ object references a JS object, then the JS object must be kept alive, so long as the C++ object would have been alive per the previous rule.

- It's possible for an object reference chain like JSobject->C++object->C++object->JSobject, and let's assume there are no other pointers to the C++ objects, in which case the last JS object should kept alive by GC only if the first JS object is alive.

- It's possible for dead reference cycles to exist like JSobject<->C++object, in which case both should die.

This requires that C++ has the ability to place references to JS objects in C++ fields.

This is where pinning comes in. It would be quite simple (and memory safe) but also totally incorrect (i.e. massive memory leak) to say that if a C++ field points to a JS object, then the GC just sees that field as a root. This is what I mean by pinning. (Note that "pinning" has many meanings in GC jargon; I'm using the Hermes version of the jargon. I guess I could have said "strong root" or something, but that's weirder to say.) This would be wrong, since it would not allow us to collect the dead cycle at all. Dead cycles are a common case in the browser. It would also cause other subtle breakage.

So, what the browser does instead is to have the C++ heap participate in GC: every C++ object that could possibly store a reference to JS objects anywhere can respond to GC callbacks asking it to account for all of those references. And, every C++ object needs to have a story for being referenced exclusively from JS, exclusively from C++, or a combo of the two. And the C++ code needs to be able to participate in whatever barrier discipline is necessary to get generations or incrementality or concurrency that the JS heap wants.

There are different ways to do this. Blink's Oilpan is probably the most principled, and that's basically a whole GC-for-C++ framework - very complex stuff. So, tons of inherently not-memory-safe code on the browser side just so it can do business with the JS heap.

Dylan16807 · on July 9, 2024

I'm trying to figure out where our understandings differ, since that generally sounds familiar and reasonable to me. I guess you're assuming the callback/accounting code needs to be unsafe or able to violate the GC's preconditions? But I don't see why you assume that. As part of the GC, build some data structures that can handle that accounting and present a safe API, then use those data structures anywhere you don't want to create a root. When the browser code accesses the contents, barriers can be applied automatically.

The hard part of using a data structure like that is giving it ownership of the data and control over destroying it, but Rc does that too, doesn't it? That kind of thing is why I mentioned Rc. The difference between Rc and GC is much more in the behind-the-scenes tracking than in the API it gives.

pizlonator · on July 9, 2024

> I'm trying to figure out where our understandings differ, since that generally sounds familiar and reasonable to me. I guess you're assuming the callback/accounting code needs to be unsafe or able to violate the GC's preconditions? But I don't see why you assume that. As part of the GC, build some data structures that can handle that accounting and present a safe API, then use those data structures anywhere you don't want to create a root. When the browser code accesses the contents, barriers can be applied automatically.

If this code was all written in Rust, I could imagine there being almost no uses of `unsafe`, except for one: the thing where the GC decides to delete an object.

But this means that all of that code that isn't marked `unsafe`, but instructs the GC about what objects to mark or not, is really super unsafe because if it makes a logic error in telling the GC what to mark then the GC will delete an object that it should not have deleted.

So, the problem here isn't that you can't wrap the unsafe stuff in a safe API. The problem is that even if you do that, all of your seemingly-safe code is really super unsafe.

> The hard part of using a data structure like that is giving it ownership of the data and control over destroying it, but Rc does that too, doesn't it? That kind of thing is why I mentioned Rc. The difference between Rc and GC is much more in the behind-the-scenes tracking than in the API it gives.

The difference between RC and GC is exactly in the fact that the API they give is radically different. And that the behind-the-scenes tracking is different, too.

It's totally valid to tell RC "I want to point at this object so keep it alive".

But that's not the API that the GC will give you, unless it's a pinning API. The API where you tell the GC "keep this alive" will prevent the deletion of the garbage cycles I mentioned earlier.

So, the GC (in the case of something like a browser) gives a different API: one where user code either marks something, or not, at its discretion. There's no easy way to make that safe.

Dylan16807 · on July 9, 2024

> But this means that all of that code that isn't marked `unsafe`, but instructs the GC about what objects to mark or not, is really super unsafe because if it makes a logic error in telling the GC what to mark then the GC will delete an object that it should not have deleted.

By putting it in a GC structure, it would be giving ownership to the GC, and would borrow it back to use. So whenever it's not borrowed, it's safe for the GC to delete it. And the GC won't delete anything outside of its control.

If you have a logic error and prematurely delete, then attempting to borrow the object will return None, which is still safe.

> It's totally valid to tell RC "I want to point at this object so keep it alive".

> But that's not the API that the GC will give you, unless it's a pinning API. The API where you tell the GC "keep this alive" will prevent the deletion of the garbage cycles I mentioned earlier.

The idea is that any pointing/pinning you do while manipulating an object would be a borrow on the stack. As soon as the function returns, there's no pinning.

So there's a few things that exist in this system:

* A GC-object references another GC-object, keeping the target alive while it is alive. A GC-object can be created by either Rust or JS. The loop you describe would be made up of GC-objects, so it would be straightforward to collect.

* A non-GC object has a permanent pinning reference to a GC-object. These are rare and purposeful.

* (Optional) A non-GC object has a weak reference to a GC-object, which can be collected at any time.

* Rust code has a temporary pinning reference to a GC-object on the stack, while traversing/manipulating it, and it goes away as soon as the function exits. It won't pin too long because the release is automatic. It won't free prematurely because the GC code knows the borrow is happening.

ethanol · on July 7, 2024

It's been discussed at length multiple times. This should be enough to get you started on further research:

https://msrc.microsoft.com/blog/2019/07/a-proactive-approach...

> ~70% of the vulnerabilities Microsoft assigns a CVE each year continue to be memory safety issues

ethanol · on July 7, 2024

Now we only need to force bloggers to stop using GitHub Gist embeds. Hugo (and probably other static site generators) has built-in support for code snippets with syntax highlighting, and more dynamic sites can rely on highlight.js which removes dependence on third-party services. It's just insane, using heavy iframes for small code snippets.

https://gohugo.io/content-management/syntax-highlighting/

https://highlightjs.org

arp242 · on July 8, 2024

> force bloggers to stop using GitHub Gist embeds

Or maybe just let people do what they want on their own site, even though you or I might not like it?

"Forcing" people to change their sites to suit your preferences is ridiculous.

withinboredom · on July 7, 2024

I remember when I worked at Automattic and discovered that the gist's heart emoji was actually served by WordPress and not GitHub. They fixed it within a few weeks, but it was like that for years...