"CSCI 4020: Writing Fast Code in Slow Languages" does exist, at least in the book form. Teach algorithmic complexity theory in slowest possible language like VB or Ruby. Then demonstrate how O(N) in Ruby trumps O(N^2) in C++.
One of my childhood books compared bubble sort implemented in FORTRAN and running on a Cray-1 with quicksort implemented in BASIC and running on a TRS-80.
The BASIC implementation started to outrun the supercomputer at some surprisingly pedestrian array sizes. I was properly impressed.
To be fair, the standard bubble sort algorithm isn't vectorized, and so can only use about 5% of the power of a Cray-1. Which is good for another factor of about 5 in the array size.
Yes, as I understand it, its 80 MHz clock gave it a 12.5 ns memory access time, and I think it normally accessed memory four times per instruction, enabling it to do 20 MIPS (of 64-bit ALU ops). But the vector units could deliver 160 megaflops, and usually did. I think a TRS-80 could technically run about half a million instructions per second (depending on what they were), but only about 0.05 Dhrystone MIPS; see the Cromemco Z2 on https://netlib.org/performance/html/dhrystone.data.col0.html for a comparable machine.
So we can estimate the Cray's scalar performance at 400× the TRS-80's. On that assumption, Quicksort on the TRS-80 beats the Cray somewhere between 10,000 and 100,000 items. This probably falsifies the claim: 10,000 items only fit in the TRS-80's 48 KiB maximum memory if the items are 4 bytes or less, and although external sorting is certainly a thing, Quicksort in particular is not well suited to it.
But wait, BASIC on the TRS-80 was specified, and TRS-80 BASIC was interpreted. I haven't benchmarked it, but I think that's about another factor of 40 of performance loss. In that case the crossover isn't until somewhere between 100,000 and 1,000,000 items.
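To check that arithmetic, here's a quick sketch under the same assumptions (Cray scalar speed 400× the TRS-80's, a 40× interpreter penalty, and equal per-element constants, which is a big simplification): it just scans for the N where the TRS-80's 40·N·log2(N) quicksort cost drops below the Cray's N²/400 bubble-sort cost.

    import math

    CRAY_SPEEDUP = 400   # Cray scalar speed vs. TRS-80 machine code (assumed)
    BASIC_PENALTY = 40   # interpreted BASIC vs. machine code (assumed)

    def trs80_quicksort_cost(n):
        return BASIC_PENALTY * n * math.log2(n)

    def cray_bubble_cost(n):
        return n * n / CRAY_SPEEDUP

    n = 2
    while trs80_quicksort_cost(n) >= cray_bubble_cost(n):
        n *= 2
    print("TRS-80 starts winning between", n // 2, "and", n, "items")

With these constants it lands around 300,000 items, consistent with the 100,000-to-1,000,000 estimate above.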
So the claim is probably wrong, but close to correct. It would be correct if you replaced the TRS-80 with a slightly faster microcomputer with more RAM, like the Apple IIGS, the Commodore 128, or the IBM PC-AT.
We had this as a lab in a learning-systems course: converting Python loops into NumPy vector manipulation (map/reduce), and then into TensorFlow operations, and measuring the speed.
It gave a good idea of how Python is even remotely useful for AI.
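A sketch of what the first step of that lab might look like (the workload is a made-up example, not the actual assignment): the same sum-of-squares computed with a pure-Python loop and with one vectorized NumPy call.

    import time
    import numpy as np

    xs = list(range(1_000_000))
    arr = np.arange(1_000_000, dtype=np.float64)

    # Pure-Python loop: every iteration goes through the interpreter.
    t0 = time.perf_counter()
    total = 0.0
    for x in xs:
        total += x * x
    print("python loop:", time.perf_counter() - t0)

    # NumPy: one vectorized call, the loop runs in compiled C.
    t0 = time.perf_counter()
    total = float(np.dot(arr, arr))
    print("numpy      :", time.perf_counter() - t0)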
We are rebuilding a core infrastructure system from unmaintained Python (it's from before our company was bought and everyone left) in Java. It's nothing interesting, standard ML-infrastructure fare. A straightforward, uncareful weekend implementation in Java was over ten times faster.
The reason is very simple: Python takes longer to make a few function calls than Java takes to do everything. There's nothing I can do to fix that.
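You can see the call overhead directly with timeit (a rough micro-benchmark sketch; exact numbers vary by machine and Python version): even a function that does nothing costs a substantial, fixed interpreter toll per call.

    import timeit

    def noop():
        # A function that does nothing: any time measured is pure call overhead.
        pass

    inline = timeit.timeit("x = 1 + 1", number=1_000_000)
    called = timeit.timeit("noop()", globals=globals(), number=1_000_000)
    print(f"inline add : {inline:.3f}s per million")
    print(f"noop() call: {called:.3f}s per million")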
I wrote a portion of code that just takes a list of 170-ish simple functions and runs them, and the work is such that it should be parallelizable, but I was rushing and just slapped the boring serialized version into place to get things working. I'll fix it when we need to be faster, I thought.
The entire thing runs in a couple of nanoseconds.
So much of our industry is writing godawful interpreted code and then having to do crazy engineering to get stupid interpreted languages to go a little faster.
Oh, and this was before I fixed it so the code didn't rebuild a constant regex pattern 100k times per task.
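For reference, that fix is the classic hoist-the-compile move; a minimal Python sketch of the pattern (the regex and data are made up for illustration, and Python's re module does cache compiled patterns, but the per-call lookup still costs):

    import re

    log_lines = ["job=42 status=ok"] * 100_000

    # Slow: re-resolves the pattern (cache lookup at best) every iteration.
    for line in log_lines:
        re.search(r"status=(\w+)", line)

    # Fast: compile the constant pattern once, reuse the compiled object.
    STATUS_RE = re.compile(r"status=(\w+)")
    for line in log_lines:
        STATUS_RE.search(line)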
But our computers are so stupidly fast. It's so refreshing to be able to just write code and have it run as fast as computers run. The naive, trivial-to-read-and-understand code just works. I don't need a PhD to write it, understand it, or come up with it.
Big O notation drops the coefficient, and sometimes that coefficient is massive enough that O(N) only beats out O(N^2) at billions of iterations.
Premature optimisation is a massive issue: spending days finding a better algorithm is often not worth the time spent, since the worse algorithm was plenty good enough.
The real world beats algorithmic complexity many, many times: you spend ages building a complex data structure, with a bunch of allocations scattered all over the heap, to get O(N), while it's significantly faster to just do the stupid thing over linear memory.
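One toy illustration of that in Python (sizes made up; crossovers depend heavily on the machine): if you only need one membership test, building the "clever" hash structure first is strictly more work than just scanning the contiguous list.

    import timeit

    n = 10_000
    items = list(range(n))
    target = n - 1  # worst case for the scan: last element

    def scan_once():
        # The "stupid" thing: one linear pass over contiguous memory.
        return target in items

    def build_then_lookup():
        # The "clever" thing: build a hash set, then do an O(1) lookup.
        return target in set(items)

    print("linear scan    :", timeit.timeit(scan_once, number=1_000))
    print("build set+look :", timeit.timeit(build_then_lookup, number=1_000))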
I imagine this is a class specifically about slow languages: writing code that doesn't get garbage collected, using vectorized operations (NumPy), exploiting a JIT to achieve performance greater than normal C, etc.
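For the JIT part, one common route in Python is Numba (a sketch assuming Numba is installed; beating "normal C" only happens in favorable cases, typically tight numeric loops):

    import numpy as np
    from numba import njit

    @njit  # compiles this function to machine code on first call
    def fast_sum(a):
        total = 0.0
        for x in a:
            total += x
        return total

    a = np.random.rand(10_000_000)
    fast_sum(a)          # first call pays the compilation cost
    print(fast_sum(a))   # subsequent calls run at native speed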
Python has come a long way. It’s never gonna win for something like high-frequency trading, but it will be super competitive in areas you wouldn’t expect.
The Python interpreter and core library are mostly C code, right? Even a Python library can be coded in C. If you want to sort an array, for example, it will cost more in Python because it's sorting Python objects, but the sort itself is coded in C.
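That object overhead is easy to see (a rough sketch; numbers vary by machine): both of these sorts run in C, but one compares boxed Python float objects and the other raw doubles in a flat array.

    import random
    import time
    import numpy as np

    n = 1_000_000
    values = [random.random() for _ in range(n)]

    t0 = time.perf_counter()
    sorted(values)   # C Timsort, but comparing boxed PyFloat objects
    print("sorted() over objects  :", time.perf_counter() - t0)

    arr = np.array(values)
    t0 = time.perf_counter()
    np.sort(arr)     # C sort over a contiguous array of raw doubles
    print("np.sort over raw floats:", time.perf_counter() - t0)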