Hacker News

Why do people care about binary size? I have never understood this. Disk space isn't free, but the size of the binaries on my machine doesn't seem like a big problem in that regard.


I can't speak for others, but I care about binary size as a user because it tends to be a reliable signal of overall software quality (though there are definitely exceptions).

I also find it incredibly frustrating when an application I rarely use fails to launch because it wants to force me to download, process and install massive updates for features that I don't even want.

As an engineer, I care about binary size because gigantic binaries tend to require far more time to compile, move around and work with. Gigantic binaries tend to be a reliable signal of slow, toxic internal processes that chew up positive energy and produce cynicism and frustration in its place.

It's not about the disk space per se, it's about treating waste as if it's a virtue.


It's usually not the disk space that's the bottleneck, but rather the CPU instruction cache. On modern Intel and AMD CPUs, the jump from L1 -> L2 alone can triple the latency of a memory fetch. For a "hello world" application, that doesn't really matter, but for, say, an OS kernel, it becomes really important to keep as much of your hot-path code in i-cache as possible.


The binary size tells you ~nothing about how good it is at effectively utilizing L1 icache. In fact, optimizations regularly increase binary size because it turns out inlining to avoid function calls can be even more important. See also loop unrolling, SIMD paths, etc...

I'm more likely to believe Zig's "small binaries" are more from lack of optimizations than some obsessive focus on L1 icache density. Which, given it's not a 1.0 language, isn't something that can be held against Zig. But it'd hardly be a strength, either.


It requires a special kind of programming. Maybe they should just release the benchmarks instead of saying "small".

By the way, I wouldn't be surprised if Rust 2.0 will feature a typechecker that guarantees that everything fits in the L1 cache.


I really am stretching to find a link between a typechecker and L1 cache. Perhaps some kind of dynamic analysis of code and averaged data could give you a non-empiric measurement of L1 cache utilisation, but considering cache behaviour isn't standardised even within one vendor, it would be a ballpark guess at best.


I don't think most CPUs expose caches in any way. You can only hope a variable will be cached.


Given that the size of L1 cache can vary, I strongly doubt it. At best it can guarantee everything fits into a specific size, which may or may not be smaller than L1 cache.


Worth noting that apple's "M*" CPUs have a huge instruction cache.


At first I had the same thought, but then he talks about Arduino and embedded systems with hard constraints, and it makes sense.


In practice, binary size does not matter much for most use cases. (But for embedded and demo scene this is important)

It is a simple proxy/heuristic for the level of bloat/cruft/overengineering in the runtime and for the quality of the compiler.

If a language / framework can't do an extremely simple task in an almost optimal fashion, then it is unlikely that it will improve with more difficult tasks.

It is like looking at the CPU/GPU load of an app in idle mode.


It can be critical when targeting microcontrollers with RAM and storage measured in kilobytes rather than gigabytes.


Yes. For example, the ATmega328P ("Uno" boards) has 32k of program flash, minus 0.5k for the bootloader, leaving 31.5k available to you, while the ATmega2560 has 256k (again minus 0.5k for the bootloader). The STM32F411RE, a Cortex-M4, has 512k.

In embedded development you can forget about the overhead of binary formats such as ELF (the toolchain might output these as an intermediate, but what gets flashed is just the relevant sections), but if you are doing things like loop unrolling all over the place, you're going to get less functionality per byte of program space.

This is why ARM and PowerPC have "embedded" variants of their ISAs. What this means in practice is a compressed form of their instruction representation. Why? If you have compressed instructions, you can fit more of them in program flash. So both Thumb and VLE are variable length encodings. Thumb, for example, can use 16 bits for many instructions, i.e. two bytes, whereas ARM by default would simply take four bytes for all instructions. (For the avoidance of confusion, they still "address" 32-bits of memory and are thus still 32-bit microcontrollers). PowerPC's VLE is similar.

The driver here is cost. You could put in more program memory (the address space has plenty of room) but that costs more money, and when you don't need it, why do it?


For regular PC programs stored on disc it may not be much of an issue, but think about scenarios like embedded programming, WebAssembly running in web pages, or 4K demo contests.

Also, IMHO, statically linked programs simply shouldn't contain any code that will never be called, on principle.


I think it is more of a proxy for controllability, which is often lacking with C/C++, where a seemingly small stdlib feature pulls in a larger dependency and you can't easily get rid of it.


I kind of care... I have a binary that I distribute (once, then update a lot) to 20k+ servers across multiple geographic datacenters... some of the servers are 100mbit only, so even updating all of them at once saturates the network and slows everything down.

I use golang with all the size optimization steps that I can use... it makes a difference to make the binary smaller... it doesn't have to be minimal, but 4megs is better than 10megs.


Embedded developers care.

A lot.


Strange comment. Are you saying things that don't matter to you, in general don't matter?





