How is it a misconception? My overall point was that shell oneliners are often m...

svat · on May 7, 2022

Knuth's task was to illustrate his literate programming system (WEB) on the problem Bentley gave him, which meant writing a Pascal program "from the ground up", ideally one containing something of interest to read.

If instead of writing a full program as asked, he had given some cop-out like “actually, instead of writing this in WEB as you asked, I propose you just go to Bell Labs or some place where Unix is available, where it so happens that other people have written some programs like 'tr' and 'sort', then you can combine them in the following way”, that would have been an inappropriate reply, hardly worth publishing in the CACM column. (McIlroy, as reviewer, had the freedom to spend a section of his review advertising Unix and his invention of Unix pipelines, then not yet as well-known to the general CACM reader.)

So while shell one-liners are of course faster to bang out for a one-off use case, they obviously cannot accomplish the task that was given (demonstrating WEB). (BTW, I don't want to repeat the earlier discussion too much, but see https://codegolf.stackexchange.com/questions/188133/bentleys... where, on that input, the best trie-based approach is 8x faster than awk and about 200x faster than the tr-sort script.)

aleksiy123 · on May 7, 2022

Seems like you should just go with what you know best.

Taking 10x longer doesn't seem like a language problem. If you don't know bash well you're going to take even longer to do it in bash than in python.

In any case the task you described is pretty much the same in python as in bash. At worst the python is going to be more verbose.

   python -c "print(len(set(w for l in list(open('test.txt')) for w in l.split())))"

vs

   tr ' ' '\n' < file_name | sort | uniq -c | wc -l

t43562 · on May 8, 2022

The shell's advantage is that the pipeline components don't need to suck the whole file in, so it can potentially operate on much larger files without running out of memory. I think only "sort" is problematic, and at least it's a merge sort.

In Python you could use a generator, but it would get a little more complicated, and you'd still have to add all the words to a set(). Hopefully the number of different words is not that great.
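A minimal sketch of the generator version described above, assuming whitespace-separated words and a UTF-8 text file. The function names `iter_words` and `count_unique_words` are made up for illustration; they are not from the thread.

```python
def iter_words(path):
    """Yield whitespace-separated words one at a time, reading the file lazily."""
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file object hands back one line per iteration
            yield from line.split()


def count_unique_words(path):
    """Count distinct words; only the set of distinct words is kept in memory."""
    return len(set(iter_words(path)))
```

Calling `count_unique_words("test.txt")` should give the same number as the one-liner earlier in the thread, but memory use grows with the number of distinct words rather than with the size of the file.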

The trie approach is quite memory efficient and that can matter.
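For readers who haven't seen one, here is a rough sketch of counting words with a trie, using nested dicts. It only illustrates the data structure: the helper names are hypothetical, and in CPython the nested dicts will likely use more memory than a plain set, so the memory savings mentioned above apply to compact trie layouts in lower-level languages, not to this sketch.

```python
def trie_insert(root, word):
    """Insert word into a nested-dict trie and bump its count at the final node."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    # The empty-string key holds the count; it cannot clash with a one-character child key.
    node[""] = node.get("", 0) + 1


def trie_items(root, prefix=""):
    """Yield (word, count) pairs for every word stored in the trie."""
    for key, child in root.items():
        if key == "":
            yield prefix, child
        else:
            yield from trie_items(child, prefix + key)


def count_words(words):
    """Build a trie from an iterable of words and return a {word: count} dict."""
    root = {}
    for w in words:
        trie_insert(root, w)
    return dict(trie_items(root))
```

Words that share a prefix ("the", "then") share the nodes for that prefix, which is where the space saving comes from when the nodes themselves are small.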

aleksiy123 · on May 9, 2022

I'm fairly sure the file object returned by `open` is iterated lazily, like a generator, and doesn't load the whole file into memory. So you wouldn't hit a memory error unless, like you said, the number of unique words is high enough.

t43562 · on May 9, 2022

I think you're right, but I believe wrapping it in list(...) is what would force the whole file into memory.

aleksiy123 · on May 9, 2022

Yeah, you're right, that's my mistake.

I think you can just omit it but yeah...
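A quick way to check this, as a sketch: the file object is its own iterator (strictly speaking not a generator, though it is consumed lazily in the same way), so dropping `list(...)` gives the same count without reading every line up front. The temporary sample file below is made up for the demonstration.

```python
import inspect
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("one two\ntwo three\n")
    path = tmp.name

with open(path, encoding="utf-8") as f:
    is_own_iterator = iter(f) is f         # file objects are their own iterator
    is_generator = inspect.isgenerator(f)  # but not a generator in the strict sense
    first_line = next(f)                   # pulls a single line, not the whole file

with open(path, encoding="utf-8") as f:
    # Same expression as the one-liner, with list(...) omitted.
    lazy_count = len(set(w for l in f for w in l.split()))

with open(path, encoding="utf-8") as f:
    # Original form: list(f) reads every line into memory first.
    eager_count = len(set(w for l in list(f) for w in l.split()))

os.remove(path)
```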

IshKebab · on May 7, 2022

> one-off use case

And fortunately nobody is foolish enough to think that shell scripting is robust enough to use for more than one-off uses! /s

MontyCarloHall · on May 7, 2022

Hah, I have a friend who spent a large chunk of an undergraduate summer internship at Google porting a >50k line bash script (that was used in production!) to Python. It was not their favorite summer, to say the least.

andi999 · on May 7, 2022

How does one do that? I mean, you can only just type 50k lines in 2.5 months if you type 1000 lines per day, which sounds like a lot to me.

thyrsus · on May 8, 2022

It's only possible if you can identify large portions of the 50k original lines as having been previously implemented by other components (python modules, microservices, etc.), or that large portions are dealing with cases that are guaranteed to no longer arise (so you either produce different results or error out if you detect them).