We found bugs in that one as well. ARM even sent us their pre-release binaries at some point. We had pretty template-heavy code and unit tests. When Visual Studio and GCC (and the standard) agreed on one outcome and RealView on another, we reported it. Our record was reporting a bug three hours after receiving the compiler.
One ARM person once said: "You know, we thought: we build these CPUs, so we'd be the ones optimizing best for them. Boy, did we learn how complicated C++ is. Next version will be based on clang..."
Somewhere there's this great CMSIS-DSP ticket: the creator complains that CMSIS uses volatile loads/stores for GCC. The creator removes the volatile, and GCC reorders the instructions in a way that needs more registers in flight at once, spills everything to the stack, and tanks the FFT speed.