As impressive as this compiler analysis is, I shudder to think how much time the compiler spends on these kinds of optimizations. In my opinion, a better architecture would be for the compiler to provide a separate analysis tool that suggests source-level changes for these kinds of optimizations. It could alert you that the loop could be replaced with a popcount (and optionally make the textual replacement in the source for you). Then you pay the cost of the optimization once and get clarity about what your code _actually_ does at runtime, instead of the compiler transparently pulling the rug out from underneath you when it runs with optimizations.
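For context, this is the kind of loop in question. GCC and Clang can recognize this idiom at `-O2`/`-O3` and replace the whole loop with a single `popcnt` instruction on targets that support it (a sketch; the function name is mine):

```c
#include <stdint.h>

/* Naive population count: count set bits one at a time.
 * Modern GCC/Clang recognize this idiom and, with optimizations
 * enabled and a suitable target (e.g. -mpopcnt on x86), compile
 * the entire loop down to a single popcnt instruction. */
unsigned naive_popcount(uint64_t x) {
    unsigned n = 0;
    while (x) {
        n += x & 1;   /* add the low bit */
        x >>= 1;      /* shift the next bit into place */
    }
    return n;
}
```

Comparing the `-O0` and `-O2` assembly of this function is an easy way to see the transformation the parent is describing.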
Side note: many years ago I wrote the backend for a private global surveillance system that has almost surely tracked the physical location of anyone reading this. We could efficiently track how often a device had been seen at a location in the prior 64 (days|weeks|months) in just 192 bytes, using popcount to compute the counts. I am not proud that I built this.
I did a lot of research on this [1]. I got confirmation from Robert Garner (architect of the SPARC processor) that the NSA did indeed ask for the population count instruction. His story of meeting with the NSA is pretty amusing [2].
It goes back to the CDC 6600 at least, and is most often seen as part of Hamming distance computation (pop(xor(x,y))). But it turns out to be really useful for other things (trailing zero count), and worth having in hardware since the software sequence is a ~dozen instructions for 64 bits.
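Both uses mentioned above fall out of popcount directly. A sketch using GCC/Clang's `__builtin_popcountll` (the function names here are mine):

```c
#include <stdint.h>

/* Hamming distance: the number of bit positions where x and y
 * differ, i.e. pop(xor(x, y)). */
unsigned hamming(uint64_t x, uint64_t y) {
    return (unsigned)__builtin_popcountll(x ^ y);
}

/* Trailing zero count via popcount: ~x & (x - 1) sets exactly the
 * bits below the lowest set bit of x, so counting them gives the
 * number of trailing zeros. (Yields 64 for x == 0.) */
unsigned ctz_via_popcount(uint64_t x) {
    return (unsigned)__builtin_popcountll(~x & (x - 1));
}
```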
I <3 [AT]BM and BMI[12], and functional-equivalency matching with cost minimization optimization compiler passes.
I wish GCC and LLVM had compiler passes to semi/automagically "vectorize" hot sections using SIMD, e.g., magic transformation of UTF-8 conversion, regex matching, and string functions.
There's a fun approach where you do the computation as a tree, in parallel. You do a little masking and shifting to add each even-numbered bit to the odd-numbered bit next to it, and come up with (assuming we're working on a 64-bit value) 32 partial sums of 2 bits each. Then you add pairs of those to get 16 partial sums of 4 bits each, and so forth until you get to the top. This requires six addition steps, plus shifts and masks for each one.
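The tree approach described above can be written out like this (the function name is mine; this is the classic SWAR bit-counting trick):

```c
#include <stdint.h>

/* Tree-style popcount for 64 bits: each step adds adjacent groups
 * of bits, doubling the group width each time (1 -> 2 -> 4 -> 8
 * -> 16 -> 32 -> 64). Six add steps in total, each with its own
 * mask and shift. */
unsigned popcount64_tree(uint64_t x) {
    x = (x & 0x5555555555555555ULL) + ((x >> 1)  & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2)  & 0x3333333333333333ULL);
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL);
    x = (x & 0x00FF00FF00FF00FFULL) + ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = (x & 0x0000FFFF0000FFFFULL) + ((x >> 16) & 0x0000FFFF0000FFFFULL);
    x = (x & 0x00000000FFFFFFFFULL) + ((x >> 32) & 0x00000000FFFFFFFFULL);
    return (unsigned)x;
}
```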
I don't know if compilers are able to detect this and compile it down to a single instruction, though.