multi-threaded performance per thread which is a metric people don't use for very good reason.
Maybe I used the wrong term but I'm referring to the idea of how much work can be done by a single instruction stream (thread) in a fixed number of clock cycles.
but that isn't what people mean when they talk about single threaded performance.
Then what do they mean?
I understand what you mean about scaling not being linear with the number of threads, but even with the same (very large) number of threads:
How much work an instruction stream can do in a fixed number of clock cycles is going to be hugely dependent on what other instruction streams executing at the same time might be doing. That's why the convention, when measuring single-threaded performance, is to use only a single thread.
Nothing says that you have to run the same number of threads in your workload as you have hardware threads. Operating systems are there to multiplex software threads over hardware threads, and part of SPEC is a test of the operating system and compiler as well as the chips, motherboards and memory. There's nothing to prevent someone from running the Xeon system in your post with 30,000 threads and producing a performance-per-thread result much, much lower than running it with 384 threads.
The interesting results are which systems can achieve the highest absolute throughput and single-thread performance, and which can achieve more throughput or single-thread performance per unit price or unit power consumption.
Maybe I used the wrong term but I'm referring to the idea of how much work can be done by a single instruction stream (thread) in a fixed number of clock cycles.
but that isn't what people mean when they talk about single threaded performance.
Then what do they mean?
I understand what you mean about scaling not being linear with the number of threads, but even with the same (very large) number of threads:
POWER7@3.44GHz, 384 threads (16 chips, 6 cores/chip, 4 threads/core) result 3560
Xeon X7542@2.67GHz, 384 threads (64 chips, 6 cores/chip, 1 thread/core) result 8190
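
For anyone who wants to check the arithmetic, here is a rough sketch that divides the two results above by thread, core and chip count. The figures are copied from the two lines above. The `System` class and `per_unit` helper are just names made up for illustration, and the "threads" here are hardware threads as listed, not a measured single-threaded run.

```python
from dataclasses import dataclass


@dataclass
class System:
    # Hypothetical container for the figures quoted in the post.
    name: str
    chips: int
    cores_per_chip: int
    threads_per_core: int
    result: float  # throughput result as quoted, units as reported

    @property
    def cores(self) -> int:
        return self.chips * self.cores_per_chip

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core


def per_unit(s: System) -> dict:
    # Throughput result divided by each hardware unit count.
    return {
        "per_thread": s.result / s.threads,
        "per_core": s.result / s.cores,
        "per_chip": s.result / s.chips,
    }


systems = [
    System("POWER7@3.44GHz", chips=16, cores_per_chip=6, threads_per_core=4, result=3560),
    System("Xeon X7542@2.67GHz", chips=64, cores_per_chip=6, threads_per_core=1, result=8190),
]

for s in systems:
    r = per_unit(s)
    print(
        f"{s.name}: {s.threads} threads, {s.cores} cores, {s.chips} chips | "
        f"per thread {r['per_thread']:.2f}, "
        f"per core {r['per_core']:.2f}, "
        f"per chip {r['per_chip']:.2f}"
    )
```

Per thread that works out to roughly 9.3 for the POWER7 and 21.3 for the Xeon. Per core it is roughly 37.1 against 21.3, and per chip 222.5 against about 128, so which system "wins" depends entirely on which unit you divide by.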