
> grep (global regular expression print), awk (Aho, Weinberger, Kernighan; the creators’ initials), sed (stream editor), cat (concatenate), diff (difference). Even when abbreviated, these names were either functional descriptions or systematic derivations.

If you asked someone unfamiliar with unix tools what they thought each of these commands did, diff is the only one which they would have even the slightest chance of guessing. It's ridiculous to complain about "libsodium" and then hold up "awk" as a good name.





Yeah this definitely falls into the category of "I use them so they feel natural", there's nothing amazing about those names.

The underlying problem is that you now run into so many named things (utilities, libraries, programs, etc.) in a day and they all have to differentiate themselves somehow. You can't name every crypto library `libcrypto` for obvious reasons.


Fine. Name it sodium-crypto.a or sodium.crypto.a or whatever. The author's complaint does hold water.

You can, but then the names get needlessly long and one of the things we generally like (especially for command-line programs) is names that are short and easy to type. If we're going to make this argument then why not call the unix tools `concatenate`, `difference`, `stream-editor`, etc. Those are way better names in terms of telling you what they do, but from a usability standpoint they stink to type out.

Libraries and programs also have a habit of gradually changing what exactly they're about and used for. Changing their name at that point doesn't usually make sense, so you'll still end up with long names that don't actually match exactly what they do. Imagine if we were typing out `tape-archive` to make tarballs: it's a historically accurate name but gives you no hint about how people actually use it today. The name remains only because `tar` is pretty generic and there's too much inertia to change it. Honestly I'd say `cat` is the same: it's pretty rare that I see someone actually use it to concatenate multiple files rather than dump a single file to stdout.

The author is missing the fact that stuff like `libsodium` is no differently named from all the other stuff he mentioned. If he used libsodium often then he may just as well have mentioned it as well-named due to its relation to salt and would instead be complaining about some other library name that he doesn't know much about or doesn't use often. I _understand_ why he's annoyed, but my point is that it's simply nothing new and he's just noticing it now.


Short names are a figment of the age of teletypes when you had to repeatedly type things out. This hasn't been the case for at least 3 decades. Most good shell+terminal combinations will support autocomplete, even the verbose Powershell becomes fairly easy to use with shell history and autocomplete, which, incidentally, it does very well.

If you are repeatedly typing library names, something is wrong with your workflow.

Niklaus Wirth showed us a way out of the teletype world with the Oberon text/command interface, later aped clumsily by Plan 9, but we seem to be stuck firmly in the teletype world, mainly because of Un*x.


libeay

`eay` is just the initials of the original author, so basically the same thing as `awk`.

> The author's complaint does hold water.

Ironically, much like sodium itself, a substance of which the author seemingly possesses too much.


Without looking it up, is it sodium for "salt"? That's about as tethered to the actual use (salt + hash being a common crypto thing) as any of the names in the root comment

https://en.wikipedia.org/wiki/Libiberty was always my favorite ridiculous name. It was named so you can link it with -liberty.

Used to be that Ruby's "rubygems" library had an alias "ubygems" so that when invoking ruby with the -r option (to require a library) you could say "ruby -rubygems". Sadly, they seem to have removed this alias library sometime around Ruby 2.4.

It was removed because rubygems was made to be required by default so it was now useless.

The stdlib still contains `un.rb` though: https://github.com/ruby/ruby/blob/d428d086c23219090d68eb2d02...


Wow, I did not know about un.rb. Looks cool though.

A friend created a library called library which was kind of a converse to that (you had to link it with -lrary). It was funny for 30 seconds and then just annoying.


cat is arguably from catenate, which is the smarter, shorter version of concatenate. By default, unadorned catenation is a joining (literally "making into a chain"), which is always together/with, so the con prefix is redundant. If you ever need a derivative of catenate that means splitting apart, you can coin discatenate, where the dis then plays an essential role.

Also, why is it that people are gregarious when they congregate, and not congregarious? Or why didn't they just gregate? There was such a Latin cognate verb without the con attached.


> Or why didn't they just gregate?

Because that's not how the prepositions worked in Latin. E.g. "Marcus ex casa exit" ("Marcus went out of the house") requires both the "ex" preposition and the "ex-" prefix in the verb. Heck, even today similar things can happen in English: "they gathered together", that sentence has two instances of "gather" in it.


It also seems wrong? libsodium explains the logic in its name right on its about page. It's a fork of NaCl (the chemical formula for sodium chloride, i.e. table salt), which itself is a plain acronym for "networking and cryptography library." Google doesn't seem like a good example, either. Wasn't that meant to be an allusion to the very large number googolplex, as in Google exists to tame the unfathomably large amount of information on the web? The author may or may not like those names, but they have a logic just like grep and awk do.

Even more directly, in fact.

"Google" is from "Googol", the latter being 10^100. Apparently, "Google" the corporate name is an accidental misspelling of the number.

The number (googol) has no special mathematical properties and the name was invented by a 9-year-old in the 1920s.

Googolplex is 10 to the googolth power, so 10^(10^100).

And Googleplex is the MV campus of Google.


it should have been called chlorine, for "Cl" is the cryptography library in "NaCl"

I’m not sure I like awk, sed, or cat, I think these are just names we’re used to, not good really. diff seems ok.

grep almost has an onomatopoeic nature to it… like, it sounds like you are grabbing or ripping the patterns out of the file, right?


>I’m not sure I like awk, sed, or cat

sed is not "stream editor" as it says above, it's "stream ed", where ed was another preexisting program which was essential and everybody knew it. its name was from "editor" shortened.

the sed commands are the ed commands. so, it's almost not possible to say "i don't like the name", it rests on a rich tradition, it's the stream version of ed. (ed commands are very similar to vi commands at heart.) it's sad the unix crowd never grokked teco because teco was already a programmable stream editor from a tradition that was not particularly streamy. it predated ed by a decade and would have fit the unix world perfectly. maybe it was already too big? I'm sure they would have known about it. Emacs does come from the teco tradition.

grep got its name from what the "grep" command would look like typed within the ed editor.

awk should not be thought of as a tool, it's a programming language, and has every right to the name as ada or pascal or haskell does.

back in those days, filenames had to be short, long names were not allowed, no space, and also, people liked typing short commands. concatenate shortened is... well, cat is as good a name as any. back then the word console was popular for the name of the terminal connected directly to the computer (frequently already logged in), perhaps con was already in use then, it definitely had a meaning already on DEC operating system machines as inherited on Microsoft machines, CON: is still console, and Bell Labs was using DEC machines.

btw at some sites there is a "dog" command. it's like the "cat" command, but it starts at the end of the file and then shows any additions. so, if you want to see if anything is being added to a logfile, you can "dog" the file (which is completely broken when VMS Windows dorks show up and decide to make everything binary) now the verb "to dog" in English means "to follow closely", so it's a cute wordplay on cat and means what it does, similar to "less is more". in less, you can accomplish something like "dog" (dog with more context) with the "F" command. these individual pieces of wordplay don't form a coherent network in the end, but as new things are invented over time they are fun and help you remember new commands till you get used to them.


> ed commands are very similar to vi commands at heart

vi was built on top of ed.

Ed was the Unix line editor, which is why all the commands after a colon have the form of "start,endcommand", eg "1,$p" would list all the lines of a file on your tty/decwriter.

1,$s/findexp/replace/g would substitute all examples ("g") of findexp on the lines 1 through EOF
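
Since sed accepts the same address-plus-command form, the two ed commands above can be tried non-interactively. This is only a minimal sketch, assuming a POSIX-style sed; the three-line sample input and the helper function names are made up for illustration.

```shell
# Sample input; the contents are illustrative only.
sample() { printf 'findexp one\ntwo\nfindexp findexp three\n'; }

# "1,$p": print every line from line 1 through the last line ($).
# -n suppresses sed's default output so each line appears only once.
print_all() { sample | sed -n '1,$p'; }

# "1,$s/findexp/replace/g": substitute every match (g) on lines 1 through EOF.
replace_all() { sample | sed '1,$s/findexp/replace/g'; }

print_all
replace_all
```

Without the trailing g, only the first findexp on each line would be replaced.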


And these all pretty much came from an era before glass displays were (affordable) in computers. A terminal was roughly a keyboard and a printer attached together, or a typewriter cut in half. Paper. No cursors. No arrow keys. Mostly after punched cards and mostly before transistors. And that was only a few decades ago, there's people still alive that have used these machines.

Funny that they are still some of the most efficient and powerful interfaces.


FFS, I'm not that old. I'm 61.

I started out at school in 1977 on a PDP-11 with 16K of RAM, 3 ASR-33s connected by 20 mA current loop. We also had a VT-52. All running at 110 baud. No valves, all transistors and ICs. That system was already outdated.

Punched/mark sense cards were still around.

Two years later we had a PET, Apple-II, and a TRS-80.

Teletypes have been around since the 1940s.

Cut the false history crap when it's easily found online and elsewhere.


> sed is not "stream editor" as it says above, it's "stream ed"

Well, according to the man page, it is indeed "stream editor":

https://man.cat-v.org/unix_8th/1/sed

I was already aware of its relation to 'ed' (having had to actually use 'ed' in ancient times). However that doesn't change the fact that it does stand for "stream editor".

After reading your post, I thought "That doesn't seem right, I remember it specifically being referred to as 'stream editor'", so I went looking.


They're good names because they're short and easily recognizable

There are only so many short names to go around.

Well yea. These guys got dibs some 35+ years ago or so. Maybe more. First come first served.

However once you learn that sed means stream editor, you won't ever forget it. libsodium is forgettable.

> However once you learn that sed means stream editor, you won't ever forget it.

I feel like this is approximately the third time I'm learning this.


I've been using Linux for almost 20 years, including sed a lot of that time, I'm sure I've heard it before, I must have, but when parent wrote it I was like "aah, that makes sense".

Thirty years for me and several more since I learned what sed does and what it's called and never forgot.

I never forget what it does obviously, I use it at least weekly and most of the time daily. But if you asked me what "sed" stands for I'd probably not recall. I might have attempted to work in "extended" somewhere in a guess, because of ex the editor, but besides that :/

Don't forget that you need to know English for that to work. I'm pretty sure most Unix users don't speak English (most computer users definitely don't). I interact with people who know few words besides "hello" and "goodbye", and for them "sed" is a nonsense term, just a set of letters randomly thrown together. Same as e.g. Excel, a random token that means nothing.

sed is just an example, of course, the author's point doesn't hold much weight for many (most?) users globally.


lol no. There are literally a hundred plus Unix tools and commands. I couldn’t tell you what 90% of them mean. I sure as hell couldn’t have told you what sed stood for. And if you asked me tomorrow I also wouldn’t be able to tell you.

C programmers are great. I love C. I wish everything had a beautiful pure C API. But C programmers are strictly banned from naming things. Their naming privileges have been revoked, permanently.


creat(...)

Relevant XKCD

https://xkcd.com/1168/


To quote bash.org (or qdb.us?), you have to talk to tar with a German accent:

    tar xzf file.tgz
where xzf stands for "extrakt ze feil"

It's `xaf`, because the modern world is way too complex for simple Germanic rules to solve it.

But GNU tar was never the issue. It's almost completely straightforward, the only problem it has is people confusing the tar file with the target directory. If you use some UNIX tar, you will understand why everybody hates it.


Someone once tried this on me during Friday drinks and I successfully conquered the challenge with "tar --help". The challenger tried in vain to claim that this was not valid, but everyone present agreed that an exit code of zero meant that it was a valid solution.

  $ tar --help
  tar: unknown option -- -
  usage: tar {crtux}[014578beFfHhjLmNOoPpqsvwXZz]
             [blocking-factor | format | archive | replstr]
             [-C directory] [-I file] [file ...]
         tar {-crtux} [-014578eHhjLmNOoPpqvwXZz] [-b blocking-factor]
             [-C directory] [-F format] [-f archive] [-I file]
             [-s replstr] [file ...]
  $ echo $?
  1

That is not GNU tar's output. You might wanna make sure your installation is ok.

edit: maybe i missed the joke?


Some drunks in a gnu-shaped echo chamber concluded that the world is gnu-shaped. That's not much of a joke, if there is one here. Such presently popular axioms as "unix means linux" or "the userland must be gnu" or "bash is installed" can be shown as poor foundations to reason from by using a unix system that violates all those assumptions. That the XKCD comic did not define what a unix system is is another concern; there are various definitions, some of which would exclude both linux and OpenBSD.

Out of curiosity, what OS are you using?

> maybe i missed the joke?

the bomb specifies only "unix" so you can't assume GNU (which, aha, is Not Unix)


This works with GNU tar, but likely not with tar on other Unix systems.

"tar cf /tmp/a.tar $HOME" would, I guess, work on all POSIX systems.


I seem to remember "tar xvf filename.tar" from the 1990s, I'll try that out. If I'm wrong, I'll be dead before I even notice anything. That's better than dying of cancer or Alzheimer's.

I still do that at least once a week. Along with "tar xzpvf" or more complex invocations like:

    tar cvf - -C /foo/bar baz | zstd > foo.tar.zstd

    tar zxvf
Is burnt into my brain. One of my earliest Linux command line experiences required untarring zipped tars.

So yeah that xkcd is "not funny" to me in that sense. Of course I couldn't tell you pretty much any other use without a man page.


z requires it's compressed with gzip and is likely a GNU extension too (it was j for bzip2 iirc). It's also important to keep f the last because it is parametrized and a filename should follow.

So I'd always go with c (create) instead of x (extract), as the latter assumes an existing tar file (zx or xz even a gzipped tar file too; not sure if it's smart enough to autodetect compress-ed .Z files vs .gz either): with create, higher chances of survival in that xkcd.
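
A rough sketch of that create-first approach, using only the c, t and f keys with f last so the archive name follows it directly. The temp directory, the `src` folder and `note.txt` are hypothetical names, and `mktemp -d` is assumed to be available (it is common but not guaranteed on every Unix).

```shell
# Work in a throwaway directory; every path here is hypothetical.
work=$(mktemp -d)
mkdir "$work/src"
printf 'hello\n' > "$work/src/note.txt"

# c = create, f = the archive file name follows. No compression flags are used.
(cd "$work" && tar cf a.tar src)
create_status=$?

# t = list the archive's table of contents.
listing=$(cd "$work" && tar tf a.tar)

echo "create exit status: $create_status"
echo "$listing"
```

Creating needs no pre-existing archive, which is why it is the safer bet under a time limit.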


    tar xvzf file.name
is always a valid command, whether file.name exists or not. When the file doesn't exist, tar will exit with status '2', apparently, but that has no bearing on the validity of the command.

Compare these two logs:

    $ tar xvzf read.me
    tar (child): read.me: Cannot open: No such file or directory
    tar (child): Error is not recoverable: exiting now
    tar: Child returned status 2
    tar: Error is not recoverable: exiting now

    $ tar extract read.me
    tar: invalid option -- 'e'
    Try 'tar --help' or 'tar --usage' for more information.
Do you really not understand the difference between "you told me to do something, but I can't" and "you just spouted some meaningless gibberish"?
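
The distinction can be checked from the exit status alone. A minimal sketch, assuming `mktemp -d` is available and using a made-up archive name; the exact non-zero value is implementation-specific, so none is asserted here.

```shell
# Hypothetical archive name inside a fresh temp directory, so it cannot exist.
missing="$(mktemp -d)/no-such-archive.tar"

# Well-formed invocation, but the archive is absent: tar parses the command,
# tries to open the file, and then fails.
if tar xf "$missing" 2>/dev/null; then
    status=0
else
    status=$?
fi

echo "exit status for a missing archive: $status"
```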

The GGP set the benchmark at "returns exit code 0" (for "--help"), and even with XKCD, the term in use is "valid command" which can be interpreted either way.

The rest of your slight is unnecessary, but that's your choice to be nasty.


Like I said, I was operating on a lot of zipped tars. Not sure what you are replying about.

The other commenter already mentioned that the xkcd just said "valid", not return 0 (which to be fair is what the original non xkcd required so I guess fair on the mixup)


Oh, just funny mental gymnastics if we are aiming for survival in 10 seconds with a valid, exit code 0 tar command. :)

As tar is a POSIX (ISO standard for "portable operating system interfaces") utility, I am also highlighting what might get us killed as all of us are mostly used to GNU systems with all the GNU extensions (think also bash commands in scripts vs pure sh too).

No offense intended, just the hackers' chat.


Hehe fair enough in that case. Tho nothing said it had to work on a tar from like 1979 ;)

To me at least POSIX is dead. It's what Windows (before WSL) supported with its POSIX subsystem so it could say it was compatible but of course it was entirely unusable.

    Initial release July 27, 1993; 32 years ago
Like, POSIX: Take the cross section of all the most obscure UNICES out there and declare that you're a UNIX as long as you support that ;)

And yeah I use a Mac at work so a bunch of things I was used to "all my life" so to speak don't work. And they didn't work on AIX either. But that's why you install a sane toolchain (GNU ;) ).

Like sure I was actually building a memory compactification algorithm for MINIX with the vi that comes with MINIX. Which is like some super old version of it that can't do like anything you'd be used to from a VIM. It works. But it's not nice. That's like literally the one time I was using hjkl instead of arrow keys.


Libsodium isn’t a tool or program the average user casually uses. Anyone who actually has to use it in their project even once will remember it.

How often do you forget what Firefox or Gnome are?


That's part of the point, I believe. It's not about being always able to guess the function from first sight. It's also about the function and name serving as mnemonic to each other once you understand how it got named.

I think perhaps the article's argument gets less strong then?

It's claimed grep is "well named" because, even though it's not obvious when you first read it, it's a contraction of "global reg ex print" and hence memorable. I'm not sure the same argument can't be made for libsodium: assuming the reader is familiar with NaCl (the same as the assumption that the previous reader is familiar with regex), it's an equally memorable name for your crypto library.

There's always a consideration about the context the name is intended and likely to be used in. The article mentions engineering naming and "ibeam", but engineering has its own technical names and jargon as well. Most people won't know what "4130 tube" means, but people who build bicycle frames or roll cages will - and they're likely to use the less specific term "chromoly" if they don't need to distinguish between 4130 and 4145.

In my head "libsodium" is similar - if you don't know what it (and NaCl) mean, you 100% should keep out of that part of the codebase.


Names fall on a spectrum on this argument. Sodium is not really random because of the use of "salt" on crypto. It's like saying that libsodium is part of your crypto. awk is more random.

The argument goes stronger with projects where the creator seemed to just roll the dice with the name.


Well, "Aho, Weinberger, Kernighan" is not random but entirely unrelated to its use.

https://en.wikipedia.org/wiki/AWK


One additional complication with grep (and other CLI tools) is that the name itself is part of the day to day UX. It needs to be short, easy to say, and easy to type. With a library the API that is contained within serves the analogous role.

"libsodium" -> "salt" -> "salting is something tangentially related to cryptography" is significantly better as a mnemonic than "awk stands for the author's initials".

Same for grep - with, I guess, the proviso/assumption that you know what regular expression means, which might have been a fair assumption for the sort of people who had command line access to Unix systems in the 70s/80s, but may no longer be valid for developers under 30 who grew up with Windows and were perhaps trained in 6 or 26 week "bootcamps" that didn't have time to cover historical basics like that?

Regular expressions are more of a CS topic (regular languages), though common abbrevs of "re" and "regex" I've only seen in the wild pre and post my formal education in CS.

Yeah, I'd totally expect CS grads, old school Unix sysadmins, and Perl hackers to be fully familiar with Regex. Not so sure I'd expect that from bootcamp front end webdev "grads", self-taught game devs, or maybe (I'm not sure?) engineers who have spent their careers in Microsoft dev environments.

> It's ridiculous to complain about "libsodium" and then hold up "awk" as a good name.

Awk is short, easy to pronounce, and difficult to confuse with anything else. It's nearly as perfect as a name can be.

> If you asked someone unfamiliar with unix tools what they thought each of these commands did, diff is the only one which they would have even the slightest chance of guessing.

You seem to have confused the concept of a "name" with that of a "description". The whole point of names is that they aren't descriptive.

https://en.wikipedia.org/wiki/Arbitrariness#Linguistics


> The whole point of names is that they aren't descriptive

I actually agree with this, but that's exactly the opposite of what TFA is arguing.


So few of us use physical tapes these days, but the "tape archive" (tar) remains ubiquitous.

Not entirely unserious: "awk" is a good name because it is three characters to type, and "rg" is better than "grep" because it is two fewer characters to type.


There's a reason why the basic Unix file commands are ls, cp, mv, rm.

They're easy to type on a TTY.

grep is from the ed command "g/re/p" which is g (all lines, short for "1,$") /re/ regular expression to search for, "p" to print the lines.

It still works in vi.
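
The same g/re/p idea can be tried outside an editor. A minimal sketch, assuming a POSIX-style sed and grep; the fruit list and the pattern "an" are placeholders standing in for the file and the "re".

```shell
# Illustrative input; the pattern "an" stands in for the "re" in g/re/p.
lines() { printf 'banana\ncherry\nmango\n'; }

# The /re/p part, expressed with sed: print only the lines matching the regex.
# sed already visits every line, so no explicit "g" prefix is needed.
with_sed() { lines | sed -n '/an/p'; }

# The standalone tool named after that ed command.
with_grep() { lines | grep 'an'; }

with_sed
with_grep
```

Both calls should print the same two matching lines, banana and mango.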


I would if they weren't so outrageously expensive (tapes and tape drives ;))

"It's ridiculous to complain about "libsodium" and then hold up "awk" as a good name."

I never really understood what was "wrong" with the NaCl library that it needed another version that does not have the string "nacl" somewhere in the name

I also remember Google named some heavily promoted non-cryptography project "NaCl" subsequent to the NaCl library's release making any searches for "NaCl" biased toward Google's software

I still use original NaCl, e.g., for CurveDNS, tinysshd, dq and dqcache, not libsodium


You don't need to go that deep into the article. Just - emacs. Of course I know what it is, I had to google the name to find it's EditorMACroS

Pretty sure it stands for Eight Megabytes And Constantly Swapping, a literal description of its function. /s

http://www.catb.org/jargon/html/E/EMACS.html (which also pretty much has your definition, though it calls it "Editing MACroS".)


Well that's jargon in any industry. It makes no sense, you take a minute to learn it, and now it's part of your vocabulary.

At least these kinds of acronyms had utility once upon a time, when typing real estate was valuable and you input commands by hand for hours a day. Typing cat vs concat is hours of productivity saved.


It's more that they weren't random. There was a convention, a lineage, or a rule behind them. Modern projects often skip that step entirely and jump straight to branding, even when the thing is just plumbing

The problem with current naming is that it's now using common names like coffee, and it's hard to search for them or relate to them. At least the old Unix names are kinda unique and sometimes mean something. Unlike today.

Yep. Cat, find, git, date, gimp, gnu. All very distinctive, easy to search for

Nitpick: Correct me if wrong but I think cat is catenate not concatenate

IMHO, the best names are the ones that are easiest to type. I have read several accounts of authors choosing names for this reason

I sometimes rename other people's executables (cf. libraries), not the ones in the traditional UNIX userland, but the ones with goofy names.^1 I will rename them to something I find easier to type and less annoying. I create symbolic links with the original names if I think they will be required^2

With my own software, I give every program a number, the source file is named according to the number and the executable name is a short prefix followed by the number. All names are the same length. I have a text file that lists what each program does if I forget

I put a description in a comment at the top of each source file as a sort of header. Then I can do something like

   head src/???.l  
for a list of descriptions

1. Needless to say, Arthur Whitney's software does not get renamed. No need, he gets it

2. I will also rewrite the argument parsing and "usage:" output if it annoys me

The best way to determine what a program does is to read the source. This is one reason I prefer to compile programs from source instead of using "binary packages"

I also think the names that are chosen for so-called "tech" companies are routinely quite silly, but that's another discussion


>developers lost the plot on naming their tools

As a complete outsider it always seemed like the plot had never yet been found to begin with . . .


i think you are misunderstanding the point. with awk, sed, grep they actually hold relevance to the tools whereas a file browser named "zephrus" holds no connection to the actual file browser.

awk is short for awkward, like awkword, for awkwardly manipulating words (text).


