We had landed a job to compress tens of thousands of CDs to MP3 for a Dutch broadcasting corporation. Doing it on a single machine would have taken years at the speed MP3 compressors ran in those days (Fraunhofer reference code).
Doing it on this stack of boards got the job done in under two months.
Nowhere near as just-because-I-can as the linked article though.
I've used the Keil IDE, which is free for small projects. Takes some getting used to but great debugger and easy to get started.
There are also free Eclipse/GCC toolchains, so if you're already used to Eclipse that's an easy way to get started.
All the newer ST demo boards have on-board JTAG debuggers so they're a little bigger but convenient since you just need a USB cable. External JTAG debuggers are pretty cheap and Keil (ARM) has a super-nice trace probe if you're willing to spend a little more.
I have no idea if it would be appropriate for your build, but the previously featured MicroPython board does have an FPU (and "oodles" of processing power).
http://www.clustercompute.com/