The last few percentage points of performance take an insane amount of power. If you gave up 10% perf you'd probably halve power consumption.
I don't think there's any reason x86 has to use more power than ARM - it's simply not the focus of most implementations. As I understand it, most processors at this point are an interpreter on top of a bespoke core. Intel used to get quite a lot of praise for low power consumption back in 2012-2015 with Ivy Bridge and so on - rather coincidentally, that was also when they had a process advantage (rather like the one AMD and Apple enjoy today).
Yes and no. After the CISC vs RISC war was over, I also thought ISAs were implementation details.
But from what I’ve read, having different length instructions makes extracting parallelism way harder. That’s why Apple can make such crazy wide machines.
Oh yeah, doesn't ARM use fixed-size instructions while x86_64 is variable-size? So decoding x86_64 requires clever pipelining, whereas with ARM "every X bytes is an instruction" and you can parallelize easily.
I wonder if we'll see Intel or AMD try to make another somewhat-backwards-compatible ISA jump to keep up with ARM.
If I’m not mistaken, based on similar threads on HN, decoding is never the bottleneck, so I would be hesitant to write x86 off for mobile devices. It probably does make the transition to smaller scales harder, and that is where most efficiency wins happen.
We should never write x86 off when there are billions behind it, and variable-length instructions have their advantages as well, such as code density, which may come to play an important role again in the future.
But it is much easier to simply chop a stream of instructions up at every X bytes than to evaluate a portion and decide what to do later, and that difference gets larger the wider you go.
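To make that concrete, here is a toy C sketch (none of this is a real ISA's encoding; insn_length() is a made-up stand-in for x86's prefix/opcode/ModRM length determination): with fixed-size instructions every boundary can be computed independently, while with variable-size instructions each boundary depends on having decoded the previous instruction.

    #include <stddef.h>
    #include <stdint.h>

    /* Fixed-size encoding (hypothetical 4-byte instructions, ARM64-style):
       instruction i starts at offset i * 4, so every boundary is known up
       front and the buffer can be split across parallel decoders. */
    static size_t fixed_boundaries(const uint8_t *code, size_t len, size_t *starts)
    {
        (void)code;  /* boundaries don't depend on the bytes at all */
        size_t n = 0;
        for (size_t off = 0; off + 4 <= len; off += 4)
            starts[n++] = off;
        return n;
    }

    /* Made-up length function standing in for variable-length decode: pretend
       the low two bits of the first byte encode a length of 1..4 bytes. Real
       x86 length determination has to look at prefixes, opcode, ModRM, SIB... */
    static size_t insn_length(const uint8_t *p)
    {
        return (p[0] & 0x3u) + 1;
    }

    /* Variable-size encoding: the start of instruction i+1 is only known after
       instruction i's length has been worked out, so this walk is inherently
       serial (hardware gets around it by guessing boundaries and throwing away
       the wrong guesses, which costs decoder width and power). */
    static size_t variable_boundaries(const uint8_t *code, size_t len, size_t *starts)
    {
        size_t n = 0, off = 0;
        while (off < len) {
            starts[n++] = off;
            off += insn_length(&code[off]);
        }
        return n;
    }

The wider you try to make the decoder, the more boundary guesses per cycle a variable-length design has to make and verify, which is roughly the point above.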
> variable length instructions have their advantages as well, such as code density
Variable length instructions in general do have a code density advantage, but x86 is a particularly poor example. For historical reasons, it wastes short encodings with rarely used things like BCD adjustment instructions, and on 64 bits often requires an extra prefix byte. The RISC-V developers did a size comparison when designing their own compressed ISA, and the variable-length x86-64 used more space than the fixed-length 64-bit ARM; for 32 bits, ARM's variable-length Thumb2 was the winner (see page 14 of https://riscv.org/wp-content/uploads/2015/06/riscv-compresse...).
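As a small illustration of that extra prefix byte (a sketch, not a benchmark: the exact bytes depend on which registers the compiler happens to pick, and it uses GCC/Clang-style inline asm on x86-64):

    #include <stdio.h>

    int main(void)
    {
        unsigned int       a32 = 1, b32 = 2;
        unsigned long long a64 = 1, b64 = 2;

        /* 32-bit register add: typically 2 bytes, e.g. "01 d8" for add eax, ebx. */
        __asm__("add %1, %0" : "+r"(a32) : "r"(b32));

        /* The same add on 64-bit registers needs a REX.W prefix byte,
           e.g. "48 01 d8" for add rax, rbx - 3 bytes for the same operation. */
        __asm__("add %1, %0" : "+r"(a64) : "r"(b64));

        printf("%u %llu\n", a32, a64);
        return 0;
    }

Disassembling the result with objdump -d shows the extra REX prefix on the 64-bit add; given how often 64-bit registers show up in real code, the density result in that RISC-V comparison isn't surprising.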
For many years, Intel had quite a process advantage over the competition. That of course helped them a lot in making low power processors compared to what AMD could achieve. And the non-x86 competition had basically stopped making processors in this domain. However, there was a reason that RISC designs were used in most low power applications like embedded and, of course, smartphones.
Yes, with today's complexity and transistor budgets, the disadvantages of x86 can be somewhat glossed over - otherwise it would have vanished from the market long ago - but they add a certain overhead which cannot be ignored when looking at low power applications. The effort the CPU has to spend before it can execute the instructions is higher, and x86 requires more optimization work done by the CPU than RISC designs, which today also contain a translation layer, but a much simpler one than x86's, as the assembly instructions map better onto modern CPU structures.
It is probably no coincidence that Intel, which had to work around the issues of executing CISC code on a modern CPU, chose the EPIC design for the Itanium, which goes beyond RISC in pushing complexity towards code generation instead of on-CPU optimizations. Too bad it didn't work out - it might have, if AMD had not added 64-bit extensions to x86. While there were certainly a lot of technical challenges which were never completely solved, the processors seemed to perform quite well when run with well optimized code. Perhaps they were just one or two process generations too early. While considered large for their time, their transistor count was small compared to a modern iPhone processor. I wonder how they would perform if just ported to 7nm (the latest CPUs were 32nm).
Even though Intel is known for putting tremendous work and effort into their compilers, and therefore has compilers that put out excellent results (even on AMD), those compilers never delivered on the promises made for Itanic.
If you'd like to see some first-hand observations about modern-ish compilers on Itanic, check out this person on Twitter who does lots of development on Itanic: