
I wonder how much the size of the compiled code increased --- it's been my experience that "C++-ifying" tends to bloat the binaries quite a bit.

"Zero cost abstractions" are, in practice, quite rare.



Just built both versions for you. Edit: Please note the C++ code change doesn't actually use the STL; it really just changes the compiler and the code style in a few places. So I don't think this represents an argument for or against C vs C++.

  -rwxr-xr-x. 1 rjones rjones 4167912 Sep 12 22:09 build-c++/gtk/transmission-gtk
  -rwxr-xr-x. 1 rjones rjones 3997408 Sep 12 22:12 build-c/gtk/transmission-gtk

  -rwxr-xr-x. 1 rjones rjones 3130352 Sep 12 22:09 build-c++/daemon/transmission-daemon
  -rwxr-xr-x. 1 rjones rjones 2959640 Sep 12 22:12 build-c/daemon/transmission-daemon

  build-c++/utils/:
  total 12156
  drwxr-xr-x. 6 rjones rjones    4096 Sep 12 22:08 CMakeFiles
  -rw-r--r--. 1 rjones rjones    5653 Sep 12 22:08 cmake_install.cmake
  -rw-r--r--. 1 rjones rjones     289 Sep 12 22:08 CTestTestfile.cmake
  -rw-r--r--. 1 rjones rjones   13793 Sep 12 22:08 Makefile
  -rwxr-xr-x. 1 rjones rjones 3082448 Sep 12 22:09 transmission-create
  -rwxr-xr-x. 1 rjones rjones 3066256 Sep 12 22:09 transmission-edit
  -rwxr-xr-x. 1 rjones rjones 3182928 Sep 12 22:09 transmission-remote
  -rwxr-xr-x. 1 rjones rjones 3073952 Sep 12 22:09 transmission-show

  build-c/utils/:
  total 11496
  drwxr-xr-x. 6 rjones rjones    4096 Sep 12 22:12 CMakeFiles
  -rw-r--r--. 1 rjones rjones    5645 Sep 12 22:12 cmake_install.cmake
  -rw-r--r--. 1 rjones rjones     287 Sep 12 22:12 CTestTestfile.cmake
  -rw-r--r--. 1 rjones rjones   13725 Sep 12 22:12 Makefile
  -rwxr-xr-x. 1 rjones rjones 2917136 Sep 12 22:12 transmission-create
  -rwxr-xr-x. 1 rjones rjones 2895128 Sep 12 22:12 transmission-edit
  -rwxr-xr-x. 1 rjones rjones 3014872 Sep 12 22:12 transmission-remote
  -rwxr-xr-x. 1 rjones rjones 2901832 Sep 12 22:12 transmission-show


I think you’d have to strip the binaries to account for debug symbol differences.


size(1) exists to provide the actual information. File sizes are only a rough indicator.


     text    data     bss     dec     hex filename
   920576   11508    4152  936236   e492c build-c++/gtk/transmission-gtk
   912575   11644    4120  928339   e2a53 build-c/gtk/transmission-gtk


What compiler flags did you use? Perhaps -Os would even things out a little more.


Why would anyone want to optimize for space on a halfway modern PC? A really constrained embedded platform might be a different story.


Cache-friendliness; applies to any platform.

And it's "optimize for size", by the way.


Right, in certain cases. But this is a networking application, so memory accesses won't be the bottleneck.


gcc-11.2.1-1.fc35.x86_64 with whatever defaults the upstream project chooses.


Did you run CMake with -DCMAKE_BUILD_TYPE=Release ?


No, with -DCMAKE_BUILD_TYPE=RelWithDebInfo


Right, and C++ is definitely going to have more debug info because of longer symbol names.

Test the two with Release instead. That will give you real results.


Thanks for the proof. Over 100k increase on average. Not surprising.

...and people wonder why software gets slower and bigger over time while doing the same thing. We even get HN articles about that semi-regularly.


Unless you're on a limited embedded platform, a 100k size increase in a binary is basically nothing these days on any PC or Mac built in the past 20 years. Also, treating "bigger binary means slower binary" as a truism is usually a fallacy.


Are you actually being serious? Less than 3% increase warrants such a reaction?


Will it ever decrease by 3%?

The amount of increase is irrelevant if it continues in the same direction.


A question that I'd like to ask a C++ expert: If you use any STL container, is it always the case that the whole thing is templated, therefore effectively compiled and statically linked into the binary? Or will part/all of it come from functions in the dynamically linked libstdc++.so?


libstdc++.so (and libc++.so if you're so inclined) contains the standard stream objects, some standard string functions specialized on `char`, some threading and synchronization support, and important parts of the C++ language runtime (e.g. a default ::operator new, parts of the exception-unwinding and RTTI machinery). And that's it. It's actually fairly small and basic.

Pretty much everything else in the C++ standard library is template code in headers and gets instantiated when needed. You may end up with identical templated functions emitted into several different translation units, and the linker will choose one at link time, which is one major reason why C++ is slower to build.

Only template functions that are actually used get instantiated. If you include a header with a gazillion templated functions (namespace-level or member functions, it doesn't matter), you will most likely end up with only one or two instantiated in your binary. Templates are like a cookbook: just because the Larousse has 10,000 recipes doesn't mean that's how much you eat at every meal. Consider that there are, in principle, infinitely many possible instantiations of every template.
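
A minimal sketch of that point (the names here are illustrative, not from any real codebase): including a template-heavy header costs nothing in the binary until a particular instantiation is actually used.

  // big_header.hpp -- imagine thousands of templates like this one
  template <typename T>
  T clamp_to_zero(T value) {
      return value < T{} ? T{} : value;
  }

  // main.cpp
  #include "big_header.hpp"

  int main() {
      // Only clamp_to_zero<int> gets instantiated and emitted here;
      // clamp_to_zero<double>, <long>, ... never exist in the binary.
      return clamp_to_zero(-5);
  }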


I guess it depends on your definition of small... VSCode recently mentioned a 10MB hit to statically link the libcxx library from Chromium to dodge issues running on older platforms.

Edit: libcxx from Chromium, not libstdc++

https://github.com/microsoft/vscode/pull/129360#issue-952350...

> The increase in bundle size is significantly small (~10MB).


That depends on how the standard library is implemented. I wouldn’t be surprised if libc++ or libstdc++ used explicit template instantiations for STL containers of primitive types, e.g. std::vector<int>.
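
For illustration, this is roughly what that mechanism looks like; whether libstdc++ or libc++ actually does this for std::vector<int> is speculation here, not something confirmed upstream, and the file name is made up.

  // --- in the library's own source file (e.g. vector-inst.cc) ---
  // Force one full instantiation to be compiled into the shared library.
  #include <vector>
  template class std::vector<int>;

  // --- in a header shipped to users ---
  // Tell every client translation unit not to instantiate it again,
  // and to link against the library's copy instead.
  extern template class std::vector<int>;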


Here's an example of targeting the Commodore 64 with C++17:

https://www.youtube.com/watch?v=zBkNBP00wJE


100k bytes ≈ 98 KiB.

That's not material.


Most of that is probably in the symbol table used by the linker. C++ symbols include the type names of the parameters; a C compiler emits only the function name, so its linker literally can't tell the difference between a `char main[10]` and an `int main(int, char*) {}` because both end up as the symbol "main".
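
A rough illustration of that difference (the exact mangled spellings are ABI-specific; the examples assume the common Itanium ABI):

  // C++ linkage: parameter types are encoded into the symbol name,
  // e.g. something like _Z3sumii and _Z3sumdd, so the linker can
  // tell the two overloads apart.
  int    sum(int a, int b)       { return a + b; }
  double sum(double a, double b) { return a + b; }

  // C linkage: the symbol is just "legacy_sum"; a conflicting
  // declaration elsewhere can't be caught at link time.
  extern "C" int legacy_sum(int a, int b) { return a + b; }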


This is important.

The size of the stripped binaries should be compared. ELF binaries contain both link-time sections and loadable segments, and the longer names of C++ symbols only affect disk space (in the link-time symbol table and the debug information). The size of the runtime-loadable segments is really the only thing of interest, and stripping the binaries gives a truer indication of that.


Probably a bit. Monomorphization is expensive in terms of code size. On the other hand, why would you care about a couple hundred KB for a use case like Transmission?
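
A small sketch of why monomorphization costs size (illustrative code, nothing to do with Transmission itself): each distinct type argument gets its own compiled copy of the template's code.

  #include <vector>

  // Before inlining, the object code for this function can contain
  // three separate instantiations of std::vector<T>::push_back,
  // one per element type.
  void grow() {
      std::vector<int> a;        a.push_back(1);
      std::vector<double> b;     b.push_back(2.0);
      std::vector<long long> c;  c.push_back(3);
  }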


uTorrent, a whole torrent client I remember using many years ago, was only ~100k in total.


uTorrent is on the same page as Transmission before this change: it achieves its small size by not using the STL at all.


All sources I've found indicate that uTorrent was written in C++, but perhaps its author was a bit more mindful of unnecessary abstraction and the like.


This is worth keeping in mind, but the real issue is the pressure on the optimizer and linker rather than size IMO.


What you're doing is a common mistake: generalizing the term beyond its original scope, which is performance only. Any other aspect you might have in mind isn't covered by the original zero-cost abstraction concept.


Zero cost abstractions are about performance, not code size.


Code size definitely affects performance, and has done so ever since caches existed.


I'd recommend a talk by Chandler Carruth in which he shows that longer code can achieve higher performance because of how modern computer architectures work.

Unfortunately I no longer remember which conference it was at; maybe someone else can link it.


In microbenchmarks, on a long-pipeline RISC, or on similar microarchitectures like NetBurst (P4), I could see that being true. But we're long past that era now. It's the same misguided assumption that leads to several-KB-long "optimised" memcpy() implementations whose real-world effect on performance is negligible to negative.

If you don't believe me, read what Linus Torvalds has to say about why the Linux kernel is built with -Os by default.


The longer code is typically generated because the compiler emits vectorized code that provides enormous speedups on longer data sets. Take, for example, this code: https://godbolt.org/z/WEx3Gb5jr

At -O2 the assembly it generates is straightforward, and in line with what a human programmer would write. At -O3 it generates vector code that needs a lot more instructions (vector pipeline setup, code to deal with the remaining elements that don't entirely fill up a vector register, etc.) but the main loop takes 4 integers at a time instead of one, so that provides a nice 4x speedup. In order to achieve that it needs 25 instructions to set up the loop / finish the remaining elements, compared to 5 instructions for the -O2 code.

For very short loops the -O2 version will have superior performance, but for runs of data from around 8 integers (wild guess) the -O3 version will begin to pull ahead. So whether it is better to optimize for speed or for size really depends on the kind of data your program handles.
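
The godbolt snippet isn't reproduced in the thread, but a loop along these lines is the classic case: with GCC 11, -O2 typically produces a short scalar loop, while -O3 adds a wide SIMD main loop plus prologue/epilogue code for the leftover elements, which is where the extra instructions come from. (Illustrative sketch, not the exact code behind the link.)

  // Sum an array of ints; a textbook candidate for auto-vectorization.
  int sum(const int* data, int n) {
      int total = 0;
      for (int i = 0; i < n; ++i)
          total += data[i];   // at -O3: several elements per iteration via SIMD
      return total;
  }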


My recent tests with -Os resulted in distinctly negative effects on performance.

But the main problem with -Os is that it is poorly exercised. The best-exercised modes are -O0 and -O2, so those are the ones to use in production.


> My recent tests with -Os resulted in distinctly negative effects on performance.

Err... yeah, because -Os means "optimize size". Not "speed".

> But the main problem with -Os is that it is poorly exercised

No, the main problem is that you don't understand -Os :) It works as intended.


Evidently you are unaware that on earlier (but still quite recent) generations of CPU architecture, building with -Os often resulted in notably faster performance.

And that any compiler feature that is little used will necessarily receive less attention than commonly used features, and will be less stable and reliable. Before trying to fix any bug in a program built with -Os, reproducing it first with -O2 will reduce premature balding.


> Before trying to fix any bug in a program built with -Os, reproducing it first in -O2 will reduce premature balding.

I have no idea what you're talking about. Debug your programs in -O0, which means "no optimizations". -Os is, and has always been, optimizing for executable SIZE. It has no guarantees wrt performance.


[flagged]


Please debate in good faith instead of resorting to snide remarks.

-Os has never made any guarantees about speed or performance. Any cases where performance increased over -O2 are platform-specific anomalies.

I've never made any statement on edge cases, only guarantees and intent. I have nothing to regret.


He did it on x64.

EDIT: This is the talk, if I remember correctly.

https://isocpp.org/blog/2018/06/cppcon-2017-going-nowhere-fa...


I remember this being talked about in a talk about the Coz profiler. Maybe that was it?


I can easily name several use cases where increased code size leads to better performance (compile-time evaluation, architecture-specific optimizations).
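
A hedged sketch of the first of those (requires C++17; the names are made up for illustration): precomputing a table at compile time makes the binary a bit larger, because the table is baked into the executable, but removes per-call work at runtime.

  #include <array>
  #include <cstdint>

  // Build a 256-entry bit-count table at compile time. The table adds
  // 256 bytes to the binary but turns each lookup into a single load.
  constexpr std::array<std::uint8_t, 256> make_popcount_table() {
      std::array<std::uint8_t, 256> t{};
      for (int i = 0; i < 256; ++i) {
          int bits = 0;
          for (int v = i; v != 0; v >>= 1) bits += v & 1;
          t[i] = static_cast<std::uint8_t>(bits);
      }
      return t;
  }

  constexpr auto kPopcount = make_popcount_table();

  int popcount8(std::uint8_t x) {
      return kPopcount[x];  // one table lookup instead of a bit loop
  }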



