Just built both versions for you. Edit: please note the C++ code change doesn't actually use the STL; it really just changes the compiler and code style in a few places. So I don't think this represents any argument for or against C vs C++.
Unless you're on a limited embedded platform, a 100 KB size increase in a binary is basically nothing these days on any PC/Mac built in the past 20 years. Also, treating "bigger binary means slower binary" as a truism is usually a fallacy.
A question that I'd like to ask a C++ expert: If you use any STL container, is it always the case that the whole thing is templated, therefore effectively compiled and statically linked into the binary? Or will part/all of it come from functions in the dynamically linked libstdc++.so?
libstdc++.so (and libc++.so if you're bent that way) contains the standard stream objects, some standard string functions specialized on `char`, some threading and synchronization support, and important parts of the C++ language runtime (e.g. a default ::operator new, some of the exception-unwinding and RTTI machinery). And that's it. It's actually fairly small and basic.
Pretty much everything else in the C++ standard library is template code in headers and gets instantiated when needed. You may end up with identical templated functions emitted into several different translation units, and the linker will choose one at link time, which is one major reason why C++ is slower to build.
Only template functions that are actually used get instantiated. If you include a header with a gazillion templated functions (namespace-level or member functions, it doesn't matter) you are most likely to end up with only one or two instantiated in your binary. Templates are like a cookbook: just because Larousse has 10,000 recipes doesn't mean that's how much you're going to eat at every meal. Consider that there is a virtually infinite number of possible instantiations for every template.
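A minimal sketch of that (hypothetical names; compile it without optimization and `nm` on the object file will typically show a weak symbol for `twice<int>` and nothing at all for `describe`):

```cpp
#include <iostream>
#include <string>

// Imagine a header full of function templates...
template <typename T>
T twice(T x) { return x + x; }

template <typename T>
std::string describe(const T&) { return "some big, expensive-to-compile routine"; }

int main() {
    // Only twice<int> is instantiated and emitted into the binary.
    // describe<T> is never used, so no object code is generated for it,
    // no matter how large the template is.
    std::cout << twice(21) << '\n';
}
```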
I guess it depends on your definition of small... VSCode recently mentioned a 10MB hit to statically link the libcxx library from Chromium to dodge issues running on older platforms.
That depends on how the standard library is implemented. I wouldn’t be surprised if libc++ or libstdc++ used explicit template instantiations for STL containers of primitive types, e.g. std::vector<int>.
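For reference, the mechanism looks roughly like this (hypothetical mylib.hpp/mylib.cpp names; a sketch of how explicit instantiation works in general, not a claim about what libstdc++ or libc++ actually ship):

```cpp
// mylib.hpp
#pragma once
#include <vector>

// Tell every includer NOT to instantiate std::vector<int> itself;
// the library provides one canonical copy instead.
extern template class std::vector<int>;

// mylib.cpp (compiled into the shared library)
#include "mylib.hpp"

// Force the single full instantiation that user code links against.
template class std::vector<int>;
```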
Most of that is probably in some symbol table used by the linker. C++ symbols encode the parameter types in the name (name mangling); C symbols contain only the function name, so a C linker literally can't tell the difference between `char main[10]` and `int main(int, char*) {}` because both are emitted as "main".
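Concretely (hypothetical function names; the mangled forms shown are for the Itanium C++ ABI used by g++/clang on Linux and vary by ABI):

```cpp
void frob(int, char*) {}        // C++ linkage: emitted as _Z4frobiPc --
                                // the parameter types are encoded in the symbol.

extern "C" void frob_c(int) {}  // C linkage: emitted as plain "frob_c",
                                // with no type information at all.
```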
The size of the stripped binaries should be compared. ELF binaries contain both link-time sections and loadable segments, and the longer name of C++ symbols only affects disk space (for both the link-time symbol table and for debug information). The size of the runtime-loadable segments is really the only thing of interest, and stripping the binaries gives you a truer indication of this.
Probably a bit. Monomorphization is expensive in terms of code size. On the other hand, why would you care about a couple hundred KB for a use case like Transmission?
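For the code-size part, a minimal sketch (hypothetical sum() helper): one template in the source, but the binary carries a separate copy of the generated code for every distinct instantiation.

```cpp
#include <vector>

template <typename T>
T sum(const std::vector<T>& v) {
    T total{};
    for (const auto& x : v) total += x;
    return total;
}

int main() {
    // Three instantiations -> three copies of sum's machine code in the
    // binary (plus whatever each std::vector<T> drags in with it).
    (void)sum(std::vector<int>{1, 2, 3});
    (void)sum(std::vector<long>{1, 2, 3});
    (void)sum(std::vector<double>{1.0, 2.0, 3.0});
}
```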
All sources I've found indicate that uTorrent was written in C++, but perhaps its author was a bit more mindful of unnecessary abstraction and the like.
The mistake you're making is a common one: generalizing it beyond its original scope, which is performance only. Any other aspect you might have in mind is not covered by the original zero-cost abstraction concept.
In microbenchmarks, on a long-pipeline RISC or similar microarchitectures like NetBurst (P4), I could see that being true. But we're long past that era now. It's the same misguided assumption that leads to several-KB-long "optimised" memcpy() implementations whose real-world effect on performance is negligible to negative.
If you don't believe me, read what Linus Torvalds has to say about why the Linux kernel is built with -Os by default.
The longer code is typically generated because the compiler will emit vectorized code that provides enormous speedups for longer data sets. Take, for example, this code: https://godbolt.org/z/WEx3Gb5jr
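For readers who don't want to click through, the loop being compiled is roughly of this shape (a paraphrase, not the exact code behind the link):

```cpp
// A simple element-wise loop of the kind compilers like to vectorize.
void add_one(int* data, int n) {
    for (int i = 0; i < n; ++i)
        data[i] += 1;
}
```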
At -O2 the assembly it generates is straightforward, and in line with what a human programmer would write. At -O3 it generates vector code that needs a lot more instructions (vector pipeline setup, code to deal with the remaining elements that don't entirely fill up a vector register, etc.) but the main loop takes 4 integers at a time instead of one, so that provides a nice 4x speedup. In order to achieve that it needs 25 instructions to set up the loop / finish the remaining elements, compared to 5 instructions for the -O2 code.
For very short loops the -O2 version will have superior performance, but for runs of data from around 8 integers (wild guess) the -O3 version will begin to pull ahead. So whether it is better to optimize for speed or for size really depends on the type of data your program is handling.
Evidently you are unaware that, on earlier (but quite recent) generations of CPU architecture, the use of -Os resulted in notably faster performance.
And that any compiler feature that is little used will necessarily receive less attention than commonly used features, and be less stable and reliable. Before trying to fix any bug in a program built with -Os, reproducing it first in -O2 will reduce premature balding.
> Before trying to fix any bug in a program built with -Os, reproducing it first in -O2 will reduce premature balding.
I have no idea what you're talking about. Debug your programs in -O0, which means "no optimizations". -Os is, and has always been, optimizing for executable SIZE. It has no guarantees wrt performance.
I can easily name several use cases where increased code size leads to better performance (compile-time evaluation, architecture-specific optimizations).
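One sketch of the compile-time-evaluation case (hypothetical popcount table; C++17): the table costs 256 bytes of .rodata that a size-optimized build might avoid, but each lookup becomes a single load instead of a bit-twiddling loop.

```cpp
#include <array>
#include <cstdint>

// Build a 256-entry popcount lookup table entirely at compile time.
constexpr std::array<std::uint8_t, 256> make_popcount_table() {
    std::array<std::uint8_t, 256> t{};
    for (int i = 0; i < 256; ++i) {
        int n = i, bits = 0;
        while (n) { bits += n & 1; n >>= 1; }
        t[i] = static_cast<std::uint8_t>(bits);
    }
    return t;
}

constexpr auto kPopcount = make_popcount_table();  // 256 bytes of extra .rodata

int popcount8(std::uint8_t x) { return kPopcount[x]; }  // one load at run time
```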
"Zero cost abstractions" are, in practice, quite rare.