Thanks for adding those to the Benchmarks Game - I think it was you?
If you are interested in very short-running programs, then these benchmarks can be interesting to look at.
If you are more interested in long-running programs, like web servers, then I don't think you'll find many professional virtual machine implementors who will agree that this is a valid way to benchmark things.
But we probably also still have some optimisation bugs to work out. There are even some errors in the logs for those.
I think Truffle has a pretty good showing there that matches your description -- quick scripts don't get much help but longer-running ones have some pretty incredible improvements.
What do "quick" and "long running" mean in CPU seconds on some machine?
What language implementation are we using as a baseline when we say "quick" and "long running"?
Otherwise someone might well say that 8 minutes with TruffleRuby is "long running" and CRuby 2.5 makes "some pretty incredible improvement" over that :-)