"CSCI 4020: Writing Fast Code in Slow Languages" does exist, at least in the book form. Teach algorithmic complexity theory in slowest possible language like VB or Ruby. Then demonstrate how O(N) in Ruby trumps O(N^2) in C++.
One of my childhood books compared bubble sort implemented in FORTRAN and running on a Cray-1 with quicksort implemented in BASIC and running on a TRS-80.
The BASIC implementation started to outrun the supercomputer at some surprisingly pedestrian array sizes. I was properly impressed.
To be fair, the standard bubble sort algorithm isn't vectorized, and so can only use about 5% of the power of a Cray-1. Which is good for another factor of about 5 in the array size.
Yes, as I understand it, its 80 MHz clock gave it a 12.5 ns memory access time, and I think it normally accessed memory four times per instruction, enabling it to do 20 MIPS (of 64-bit ALU ops). But the vector units could deliver 160 megaflops, and usually did. I think a TRS-80 could technically run about half a million instructions per second (depending on what they were), but only about 0.05 Dhrystone MIPS; see the Cromemco Z2 on https://netlib.org/performance/html/dhrystone.data.col0.html for a comparable machine.
So we can estimate the Cray's scalar performance at 400× the TRS-80's. On that assumption, Quicksort on the TRS-80 beats the Cray somewhere between 10,000 and 100,000 items. This probably falsifies the claim: 10,000 items only fit in the TRS-80's 48 KiB maximum memory if the items are 4 bytes or less, and although external sorting is certainly a thing, Quicksort in particular is not well suited to it.
But wait, BASIC on the TRS-80 was specified, and TRS-80 BASIC was interpreted. I haven't benchmarked it, but I think that's about another factor of 40 of performance loss. In that case the crossover isn't until somewhere between 100,000 and 1,000,000 items.
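To check that arithmetic, here's a quick sketch under the same assumptions (Cray scalar speed 400× the TRS-80's, a 40× interpreter penalty, and equal per-element constants, which is a big simplification): it just scans for the N where the TRS-80's 40·N·log2(N) quicksort cost drops below the Cray's N²/400 bubble-sort cost.

    import math

    CRAY_SPEEDUP = 400   # Cray scalar speed vs. TRS-80 machine code (assumed)
    BASIC_PENALTY = 40   # interpreted BASIC vs. machine code (assumed)

    def trs80_quicksort_cost(n):
        return BASIC_PENALTY * n * math.log2(n)

    def cray_bubble_cost(n):
        return n * n / CRAY_SPEEDUP

    n = 2
    while trs80_quicksort_cost(n) >= cray_bubble_cost(n):
        n *= 2
    print("TRS-80 starts winning between", n // 2, "and", n, "items")

With these constants it lands around 300,000 items, consistent with the 100,000-to-1,000,000 estimate above.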
So the claim is probably wrong, but close to correct. It would be correct if you replaced the TRS-80 with a slightly faster microcomputer with more RAM, like the Apple IIGS, the Commodore 128, or the IBM PC-AT.
We had this as a lab in a learning-systems course: converting Python loops into NumPy vector manipulation (map/reduce), and then into TensorFlow operations, and measuring the speed.
It gave a good idea of how Python is even remotely useful for AI.
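A sketch of what the first step of that lab might look like (the workload is a made-up example, not the actual assignment): the same sum-of-squares computed with a pure-Python loop and with one vectorized NumPy call.

    import time
    import numpy as np

    xs = list(range(1_000_000))
    arr = np.arange(1_000_000, dtype=np.float64)

    # Pure-Python loop: every iteration goes through the interpreter.
    t0 = time.perf_counter()
    total = 0.0
    for x in xs:
        total += x * x
    print("python loop:", time.perf_counter() - t0)

    # NumPy: one vectorized call, the loop runs in compiled C.
    t0 = time.perf_counter()
    total = float(np.dot(arr, arr))
    print("numpy      :", time.perf_counter() - t0)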
We are rebuilding a core infrastructure system from unmaintained Python (it's from before our company was bought and everyone left) in Java. It's nothing interesting, standard ML-infrastructure fare. A straightforward, uncareful weekend implementation in Java was over ten times faster.
The reason is very simple: Python takes longer to make a few function calls than Java takes to do everything. There's nothing I can do to fix that.
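You can see the call overhead directly with timeit (a rough micro-benchmark sketch; exact numbers vary by machine and Python version): even a function that does nothing costs a substantial, fixed interpreter toll per call.

    import timeit

    def noop():
        # A function that does nothing: any time measured is pure call overhead.
        pass

    inline = timeit.timeit("x = 1 + 1", number=1_000_000)
    called = timeit.timeit("noop()", globals=globals(), number=1_000_000)
    print(f"inline add : {inline:.3f}s per million")
    print(f"noop() call: {called:.3f}s per million")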
I wrote a portion of code that just takes a list of 170-ish simple functions and runs them, and the work is such that it should be parallelizable, but I was rushing and just slapped the boring serialized version into place to get things working. I'll fix it when we need to be faster, I thought.
The entire thing runs in a couple of nanoseconds.
So much of our industry is writing godawful interpreted code and then having to do crazy engineering to get stupid interpreted languages to go a little faster.
Oh, and this was before I fixed it so the code didn't rebuild a constant regex pattern 100k times per task.
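For reference, that fix is the classic hoist-the-compile move; a minimal Python sketch of the pattern (the regex and data are made up for illustration, and Python's re module does cache compiled patterns, but the per-call lookup still costs):

    import re

    log_lines = ["job=42 status=ok"] * 100_000

    # Slow: re-resolves the pattern (cache lookup at best) every iteration.
    for line in log_lines:
        re.search(r"status=(\w+)", line)

    # Fast: compile the constant pattern once, reuse the compiled object.
    STATUS_RE = re.compile(r"status=(\w+)")
    for line in log_lines:
        STATUS_RE.search(line)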
But our computers are so stupidly fast. It's so refreshing to be able to just write code and have it run as fast as computers run. The naive, trivial-to-read-and-understand code just works. I don't need a PhD to write it, understand it, or come up with it.
Big O notation drops the coefficient, and sometimes that coefficient is massive enough that O(N) only beats out O(N^2) at billions of iterations.
Premature optimisation is a massive issue: spending days finding a better algorithm is often not worth the time spent, since the worse algorithm was plenty good enough.
The real world beats algorithmic complexity many, many times: you spend ages building a complex data structure, with a bunch of allocations scattered all over the heap, to get O(N), while it's significantly faster to just do the stupid thing over linear memory.
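One toy illustration of that in Python (sizes made up; crossovers depend heavily on the machine): if you only need one membership test, building the "clever" hash structure first is strictly more work than just scanning the contiguous list.

    import timeit

    n = 10_000
    items = list(range(n))
    target = n - 1  # worst case for the scan: last element

    def scan_once():
        # The "stupid" thing: one linear pass over contiguous memory.
        return target in items

    def build_then_lookup():
        # The "clever" thing: build a hash set, then do an O(1) lookup.
        return target in set(items)

    print("linear scan    :", timeit.timeit(scan_once, number=1_000))
    print("build set+look :", timeit.timeit(build_then_lookup, number=1_000))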
I imagine this is a class specifically about slow languages: writing code that doesn't get garbage collected, using vectorized operations (NumPy), exploiting a JIT to achieve performance greater than normal C, etc.
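For the JIT part, one common route in Python is Numba (a sketch assuming Numba is installed; beating "normal C" only happens in favorable cases, typically tight numeric loops):

    import numpy as np
    from numba import njit

    @njit  # compiles this function to machine code on first call
    def fast_sum(a):
        total = 0.0
        for x in a:
            total += x
        return total

    a = np.random.rand(10_000_000)
    fast_sum(a)          # first call pays the compilation cost
    print(fast_sum(a))   # subsequent calls run at native speed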
Python has come a long way. It’s never gonna win for something like high-frequency trading, but it will be super competitive in areas you wouldn’t expect.
The Python interpreter and core library are mostly C code, right? Even a Python library can be coded in C. If you want to sort an array, for example, it will cost more in Python because it's sorting Python objects, but the sort itself is coded in C.
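That object overhead is easy to see (a rough sketch; numbers vary by machine): both of these sorts run in C, but one compares boxed Python float objects and the other raw doubles in a flat array.

    import random
    import time
    import numpy as np

    n = 1_000_000
    values = [random.random() for _ in range(n)]

    t0 = time.perf_counter()
    sorted(values)   # C Timsort, but comparing boxed PyFloat objects
    print("sorted() over objects  :", time.perf_counter() - t0)

    arr = np.array(values)
    t0 = time.perf_counter()
    np.sort(arr)     # C sort over a contiguous array of raw doubles
    print("np.sort over raw floats:", time.perf_counter() - t0)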