If all software is built to protect against every anticipated future use case, it will take longer to develop, perform worse, and be more likely to have bugs.
If all software is built only to solve the problem at hand, it will take less time to develop, be less likely to have bugs, and perform better.
It isn't clear that coding for reuse will get you a net win, especially since computing platforms (the actual hardware) are always evolving, so that reusing code some years later can become sub-optimal for that reason alone.
There's a middle ground. E.g. the classic Unix 'cat' (ignoring all the command-line switches) does something really simple and reusable, so it makes sense to make sure it does the Right Thing in all situations.
I mean, 'cat' does something so simple (it applies the identity function to its input) that there's no need for it to be reusable, because there's no point using it in the first place. If you have input, processing it with cat just means you've wasted time producing something you already had.
The point of cat(1), short for concatenate, is to feed a pipeline multiple concatenated files as input, whereas shell stdin redirection only lets you feed a command a single file as input.
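A quick illustration (file names hypothetical): redirection supplies exactly one file, while cat can supply several:

sort < part1.txt # stdin is exactly one file
cat part1.txt part2.txt part3.txt | sort # stdin is the concatenation of all three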
This is actually highly flexible, since cat(1) recognizes the "-" argument to mean stdin, and so you can `cat a - b` in the middle of a pipeline to "wrap" the output of the previous stage in the contents of files a and b (which could contain e.g. a header and footer to assemble a valid SQL COPY statement from a CSV stream).
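A sketch of that trick, assuming hypothetical files copy_header.sql (holding something like "COPY mytable FROM stdin WITH (FORMAT csv);") and copy_footer.sql (holding the terminating "\."), plus a hypothetical generate_csv producer:

generate_csv | cat copy_header.sql - copy_footer.sql | psql mydb

Here cat streams the header, passes the pipe through verbatim in the "-" slot, then streams the footer, so psql sees one well-formed statement.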
But that is a case where you have several filenames and you want to concatenate the files: the work you're using cat to do is locating and reading the files by name. If you already have the data stream(s), cat does nothing for you; you still have to choose the order to read them in, but that's equally true when you invoke cat.
This is the conceptual difference between
pipeline | cat # does nothing
and
pipeline | xargs cat # leverages cat's ability to open files
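A concrete instance of the second form (paths hypothetical): the upstream stage emits filenames, and xargs hands them to cat, which opens and streams each one:

find /var/log -name '*.log' | xargs cat | grep ERROR # use find -print0 with xargs -0 if names may contain whitespace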
Opening files isn't really something I think of cat as doing in its capacity as cat. It's something all the command line utilities do equally.
This is actually re-batching stdin into line-oriented write chunks, IIRC. If you write a program that manually select(2)s and then read(2)s from stdin, you'll observe slightly different behaviour between e.g.
dd if=./file | myprogram
and
dd if=./file | cat | myprogram
In the former, select(2) will wake your program up with dd(1)'s default obs (output block size) worth of bytes in the stdin kernel buffer; whereas, in the latter, select(2) will wake your program up with one line's worth of input in the buffer.
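One way to observe this (assuming strace is available; the exact chunk sizes depend on the kernel's pipe buffering and on each tool's buffer sizes) is to trace the read(2) calls the consumer actually makes in each case:

dd if=./file | strace -e trace=read wc -c # read sizes reflect dd's block-sized writes
dd if=./file | cat | strace -e trace=read wc -c # read sizes reflect cat's re-batched writes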
Also, if you have multiple data streams, obtained via e.g. explicit file descriptor redirection in your shell, à la
(baz | quux) >&4
...then cat(1) won’t even help you there. No tooling from POSIX or GNU really supports consuming those streams, AFAIK.
But it's pretty simple to instead direct the streams into explicit FIFO files (named pipes), and then concatenate those with cat(1).
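A minimal sketch, with hypothetical producers baz and quux and a hypothetical consumer: each producer writes into its own named pipe, and cat(1) drains the pipes in order:

mkfifo /tmp/s1 /tmp/s2
baz > /tmp/s1 & quux > /tmp/s2 &
cat /tmp/s1 /tmp/s2 | consumer
rm /tmp/s1 /tmp/s2

Each writer blocks at open(2) until cat reaches its FIFO, so the streams are consumed strictly in sequence.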
> Also, if you have multiple data streams, ...then cat(1) won’t even help you there.
I've been thinking about this more from the perspective of reusing code from cat than of using the cat binary in multiple contexts. Looking over the thread, it seems like I'm the odd one out here.
In addition to what the other commenters pointed out about cat being able to concatenate, even using cat as the identity function is useful, just as the number zero is useful.
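One place where the identity element genuinely earns its keep (names hypothetical) is as a no-op default for a pipeline slot that sometimes holds a real filter:

FILTER="${FILTER:-cat}" # fall back to the identity when no filter is configured
producer | $FILTER | consumer

Leaving $FILTER unquoted is deliberate here, so a multi-word filter like "grep -v DEBUG" still splits into a command and its arguments.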
For sure, if you can apply a small amount of effort for a high probability of easy reusability, do it. But if you start going off into weird abstract design land to solve a problem you don't have yet, it might be fun, but you should probably stop. At least if it's a real production thing you're working on.
I guess it depends a bit on the shape of your abstract design land. Sometimes it can give you hints about what your API should look like, or what's missing.