
Fascinating read.

Does anyone have any examples or explanations for how these CPU bugs are caused? I probably have enough knowledge of CPU architecture from school to follow an explanation, but not enough to make my own guess.



Circuit-level (layout) problems, I'd guess, make up a decent portion. I heard (from a guy who worked on it) about a bug in a prototype version of a processor from a major company that caused no correctness problems but was a major performance problem: one of its four cache ways simply didn't have its power rails connected.

Some CPU verification code I wrote during an internship a couple of years ago discovered a few bugs in a certain fairly widely-used processor, though I'm pretty sure they were all logic-level problems (i.e. RTL bugs, not circuit-level ones)...

- The L1 D-cache tracked clean/dirty status at half-cache-line granularity. If you did a store (with just the right timing) to one half of a cache line you had just explicitly cleaned with a cache-clean instruction, the dirty bit wouldn't get set on that half-line, so as soon as the cache got flushed, the data written by the store was lost. (A sketch of the triggering sequence is below, after the second bug.)

- The prefetcher would sometimes shut down as a power-saving measure, but if you laid out the right sequence of cache operations and branches in the last 32 bytes of a 4KB page, concurrent TLB misses would sometimes cause it not to get re-enabled. The processor would lock up, stop fetching instructions, and just sit there dead in the water until an interrupt came in (assuming interrupts were enabled). (A sketch of that page layout is also below.)
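
For the first (dirty-bit) bug, here's a minimal sketch of the kind of sequence described. The post doesn't name the ISA, so the ARMv8-A-style clean / clean+invalidate by VA (DC CVAC / DC CIVAC) and the 64-byte line size are assumptions on my part; on a correct part this program always exits 0:

    #include <stdint.h>
    #include <string.h>

    /* One 64-byte cache line; the buggy design tracked dirty state per 32-byte half. */
    static uint8_t line[64] __attribute__((aligned(64)));

    int main(void) {
        memset(line, 0xAA, sizeof line);

        /* 1. Explicitly clean the line: both half-line dirty bits are now clear. */
        __asm__ volatile("dc cvac, %0" :: "r"(line) : "memory");
        __asm__ volatile("dsb ish" ::: "memory");

        /* 2. Store into one half of the just-cleaned line. On the buggy part,
              with exactly the right timing, this store failed to set that
              half-line's dirty bit. */
        line[0] = 0x55;

        /* 3. Flush the line. A half-line that still looks clean is simply
              dropped, so the 0x55 written above could be silently lost. */
        __asm__ volatile("dc civac, %0" :: "r"(line) : "memory");
        __asm__ volatile("dsb ish" ::: "memory");

        return line[0] == 0x55 ? 0 : 1;  /* 0 = store survived, 1 = store lost */
    }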
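
For the second (prefetcher) bug, actually hitting it depends on internal timing, but the layout part -- getting cache ops and a branch into the last 32 bytes of a 4KB page so the next sequential fetch crosses a page boundary -- can be sketched with file-scope asm. Again this assumes AArch64 (4-byte instructions) and 4KB-congruent text placement; it illustrates the layout, it's not a reproducer:

    #include <stdint.h>

    /* Pad so the eight instructions below (8 * 4 bytes) occupy exactly the
       last 32 bytes of a 4 KiB page; the word after 'ret' would be the
       first instruction slot of the next page. */
    __asm__(
        "    .pushsection .text, \"ax\", %progbits\n"
        "    .balign 4096\n"
        "    .skip   4096 - 32\n"
        "    .global tail_sequence\n"
        "    .type   tail_sequence, %function\n"
        "tail_sequence:\n"
        "    dc    cvac, x0\n"        /* cache op right at the page tail     */
        "    dsb   ish\n"
        "    subs  x1, x1, #1\n"
        "    b.ne  tail_sequence\n"   /* branch inside the last 32 bytes     */
        "    nop\n"
        "    nop\n"
        "    nop\n"
        "    ret\n"                   /* last instruction word of the page   */
        "    .popsection\n"
    );

    void tail_sequence(void *cache_line, uint64_t iterations);

    int main(void) {
        static uint8_t buf[64] __attribute__((aligned(64)));
        tail_sequence(buf, 1000);   /* hammer the sequence at the page tail */
        return 0;
    }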

There were a couple more, but I thought those were the more interesting ones. Granted, for various reasons these weren't bugs that were likely to be encountered in normal usage (in addition to sometimes being extremely difficult to reproduce -- e.g. for one in particular you could run the exact same sequence of instructions from system power-on and sometimes it happened, sometimes it didn't), but bugs nonetheless.


I didn’t work on the 386, and bugs can be caused by anything, but based on bugs that I’ve seen on processors that I’ve worked on, my guess would be that there was some forwarding logic designed to speed up consecutive 16-bit operations and consecutive 32-bit operations, along with some logic to detect when to apply the forwarding logic.

If the detection logic is wrong, you could easily end up forwarding 16 good bits + 16 bits of random garbage into one input of a 32-bit operation. That would explain Raymond’s "if all the stars line up exactly right" line, since the hole in the forwarding logic must have been really small (or it would have been caught in testing).
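
To make that concrete, here's a toy software model of the failure mode being described -- just an illustration of "16 good bits + 16 bits of garbage", not the actual 386 forwarding network or anything from Raymond's article; all names here are made up:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy model of a bypass path that only carries the low 16 bits of a
       result. If the detection logic wrongly treats a 32-bit consumer as a
       16-bit one, the consumer gets 16 correct bits plus whatever happens
       to be sitting on the upper half of the bypass bus. */
    static uint32_t forward(uint32_t producer_result, uint32_t stale_upper_bits,
                            int consumer_is_32bit, int detector_says_16bit)
    {
        if (consumer_is_32bit && detector_says_16bit)
            return (stale_upper_bits & 0xFFFF0000u) | (producer_result & 0x0000FFFFu);
        return producer_result;  /* correct full-width forward */
    }

    int main(void) {
        uint32_t good = 0x00012345u;  /* what the producer actually computed     */
        uint32_t junk = 0xDEAD0000u;  /* leftovers on the upper bypass lanes      */
        printf("consumer sees %08x instead of %08x\n",
               forward(good, junk, 1, 1), good);
        return 0;
    }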


If it's repeatable, then it's a bug in the logic or microcode of the chip. Some of these can be really extreme corner cases, especially when people are mixing 16-bit and 32-bit instructions.

It's amazing how much hardware out there is buggy and fixed by operating systems and drivers.



