Useful uses of dd(1). There remain some useful applications of dd. These may of ...

photon-torpedo · on May 23, 2022

Careful, your #2 and #3 are incorrect -- skip and seek operate with blocks, not bytes. So your #2 would copy 32 bytes after the first 32kB of data, and #3 would write 512 bytes at position 5120k.
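Here is a tiny sketch of the block-vs-byte point, using a scratch 10-byte file (`demo.bin` is just a hypothetical name). With `bs=2`, `skip=3 count=1` means "skip 6 bytes, copy 2", which lands on the same bytes as `bs=1 skip=6 count=2`.

```shell
#!/bin/sh
# skip= and count= are measured in units of bs, not in bytes.
tmp=$(mktemp -d)
printf 'abcdefghij' > "$tmp/demo.bin"

# bs=2: skip 3 blocks (6 bytes), then copy 1 block (2 bytes)
by_blocks=$(dd if="$tmp/demo.bin" bs=2 skip=3 count=1 2>/dev/null)

# bs=1: every unit is one byte, so the numbers read literally
by_bytes=$(dd if="$tmp/demo.bin" bs=1 skip=6 count=2 2>/dev/null)

echo "$by_blocks $by_bytes"   # gh gh
```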

_flux · on May 23, 2022

Btw, GNU dd has iflag/oflag options to work in bytes as well. I have ddbytes aliased to `dd iflag=count_bytes,skip_bytes oflag=seek_bytes`.
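A rough sketch of that alias as a shell function, assuming GNU dd (these byte flags are a GNU extension). It is a function rather than an alias because bash does not expand aliases in non-interactive scripts by default; `ddbytes` and `demo.bin` are only illustrative names.

```shell
#!/bin/sh
# Function form of the ddbytes alias (GNU dd only). With these flags,
# skip= and count= (input side) and seek= (output side) are byte counts,
# while bs= can stay large for efficient I/O.
ddbytes() {
    dd iflag=count_bytes,skip_bytes oflag=seek_bytes "$@"
}

tmp=$(mktemp -d)
printf 'abcdefghij' > "$tmp/demo.bin"

# bs=4096, yet skip=6 count=2 still mean bytes
read_back=$(ddbytes if="$tmp/demo.bin" bs=4096 skip=6 count=2 2>/dev/null)
echo "$read_back"             # gh

# seek=3 is a byte offset: overwrite bytes 3-4 in place (conv=notrunc keeps the rest)
printf 'XY' | ddbytes of="$tmp/demo.bin" seek=3 conv=notrunc 2>/dev/null
cat "$tmp/demo.bin"; echo     # abcXYfghij
```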

dredmorbius · on May 23, 2022

Thanks. Most of that was from memory and not tested.

The specific recipes should be vetted. The stated goals can be achieved with proper invocation.

zimpenfish · on May 23, 2022

> Read from specific bytes of a file

Especially handy when you've fed a huge amount of JSON (sometimes all on one line because, y'know, why not) into jq and you get the inscrutable output:

    parse error: Invalid numeric literal at line 1, column 236162512

nemetroid · on May 23, 2022

Tail can do this, too:

  tail --bytes=+236162512

zimpenfish · on May 23, 2022

Handy, but `dd` is much better at giving you a limited context to look at: `dd bs=1 skip=236162504 count=20` just gives you 8 before and 12 after.

(edit: I do have to concede that the `tail|head` version is much faster than the `dd` - ~11s vs ~65s in my quick test with that ^skip)
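For reference, here is a sketch of the `tail|head` equivalent of a `dd bs=1 skip=N count=M` call. `context` is a hypothetical helper. It assumes OFFSET is a 0-based byte offset at least BEFORE bytes into the file, and it relies on `head -c`, which is widespread but not POSIX. The `+1` is needed because `tail -c +N` counts bytes from 1.

```shell
#!/bin/sh
# context FILE OFFSET BEFORE AFTER (hypothetical helper)
# Prints BEFORE bytes ahead of the 0-based OFFSET plus AFTER bytes from it.
# Assumes OFFSET >= BEFORE. tail -c +N is 1-based, hence the +1.
context() {
    start=$(( $2 - $3 ))
    tail -c +"$(( start + 1 ))" "$1" | head -c "$(( $3 + $4 ))"
}

tmp=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxyz' > "$tmp/alpha.txt"

# Same bytes as: dd if=alpha.txt bs=1 skip=7 count=7
via_tail=$(context "$tmp/alpha.txt" 10 3 4)
via_dd=$(dd if="$tmp/alpha.txt" bs=1 skip=7 count=7 2>/dev/null)
echo "$via_tail $via_dd"   # hijklmn hijklmn
```

For the jq case above, `context big.json 236162512 8 12` would select the same 20 bytes as the `dd` command, with `big.json` standing in for the real file.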

dredmorbius · on May 23, 2022

I suspect you'd get better performance by increasing the blocksize, and if necessary trimming the output in a second pass.

Large blocks -> efficient I/O. Within reason.
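Here is one way to read the "large blocks, then trim" suggestion, sketched with plain POSIX dd options. The first pass skips whole large blocks, and a second `bs=1` dd trims the few leftover bytes from the pipe. `extract_bytes` and its optional block-size argument are hypothetical, and 64 KiB is an arbitrary default.

```shell
#!/bin/sh
# extract_bytes FILE OFFSET LENGTH [BS]  (hypothetical helper)
# Pass 1 skips whole BS-sized blocks and reads only the blocks covering the
# range; pass 2 (bs=1, reading the pipe) trims the leftover bytes.
extract_bytes() {
    file=$1 offset=$2 length=$3 bs=${4:-65536}
    blocks=$(( offset / bs ))                  # whole blocks to skip
    rem=$(( offset % bs ))                     # bytes to trim in pass 2
    need=$(( (rem + length + bs - 1) / bs ))   # blocks that cover the range
    dd if="$file" bs="$bs" skip="$blocks" count="$need" 2>/dev/null |
        dd bs=1 skip="$rem" count="$length" 2>/dev/null
}

tmp=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxyz' > "$tmp/alpha.txt"

extract_bytes "$tmp/alpha.txt" 6 4; echo      # ghij
extract_bytes "$tmp/alpha.txt" 6 4 4; echo    # ghij (tiny bs, range spans two blocks)
```

The `bs=1` stage only sees the one or two blocks that cover the range, so its per-byte overhead stays small. Because each of its reads asks for a single byte, short reads from the pipe are not a concern there.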

nrclark · on May 23, 2022

A note here: if you're using GNU dd, you can also use iflag=skip_bytes,count_bytes and then set the block size to whatever you want. That'll give you the best of both worlds.

zimpenfish · on May 24, 2022

Ah, good to know. With a block size of 1M, that brings the `dd` version down to ~15s.

dredmorbius · on May 23, 2022

Ugh!

I'll keep that in mind, json-parsing being among my current hobbies...

Waterluvian · on May 23, 2022

Apologies. Tangent:

What does the (1) mean beside dd? I see this with man pages. Is it a version identifier?

Edit: thank you both for taking the time to share. I appreciate the quick response.

colejohnson66 · on May 23, 2022

The “section”[0] or “subpage.” Programs are section 1, hence ‘dd(1)’. The idea is that a library function (section 3) can have the same name as an executable, in which case you’d type into your shell:

    man 1 <name> # executable: <name>(1)
    man 3 <name> # function:   <name>(3)

Without the distinction argument, man will throw its arms up in the air and give up.

However, if there’s no ambiguity (such as with ‘dd’ where only the executable exists), you can drop the section number parameter when running man. man will then search all the sections, see that ‘dd’ only exists in section 1, and go from there.

[0]: https://linux.die.net/man/

zengargoyle · on May 23, 2022

Something like `read` gives a better example. `man 1 read` is "read — read from standard input into shell variables", "read [-r] var...", a shell builtin. And `man 2 read` is "read - read from a file descriptor", "ssize_t read(int fd, void *buf, size_t count);". You can also get into `man 8 catman`, "catman - create or update the pre-formatted manual pages", "catman [-d?V] [-M path] [-C file] [section] ...". The cat pages that `catman` builds are a mirror-like hierarchy of the man pages (whose sources are usually troff using the 'an' macro package), pre-formatted for the standard terminal or whatever to shave off a bit of time running roff on the source files all the time.

This is all ancient knowledge and was well in place back in the mid-1980s, long before Linux or any of that stuff. The `man` page sections used to be different 3-ring binders all printed out sitting on a table in the computer lab. The 'sections' were just different binders of documentation. Get off my lawn!!!

dredmorbius · on May 23, 2022

Correct.

I also use the notation as a convention to indicate that I'm referring to a Unix command (or library function, etc.). E.g., "cat" might refer to a feline, but "cat(1)" should more clearly refer to the Unix / Linux command.

(Where I've got proper markup, I'll typically set commands or references in monospace using backtick notation: `dd`, `cat`, etc.)

NAR8789 · on May 23, 2022

What do you do if you need to unambiguously refer to cat-as-in-feline?

eesmith · on May 23, 2022

cat (Felis catus) ;)

dredmorbius · on May 23, 2022

I show claws.

CRConrad · on May 24, 2022

> Where I've got proper markup, I'll typically set commands or references in monospace

You can do that here by putting them on a separate line (two newlines, separate paragraph) and indenting by a couple of characters.

jwilk · on May 23, 2022

> Without the distinction argument, man will throw its arms up in the air and give up.

At least in man-db and FreeBSD implementations, it actually shows you the first page it found by default.

colejohnson66 · on May 23, 2022

You're correct; I thought that might've been wrong...

I just tested, and `man man` opened man(1) despite man(7) existing.

OJFord · on May 23, 2022

Or config file formats in section 5, another case you're perhaps more likely to come across as a user.

xigoi · on May 23, 2022

Why is it random numbers rather than readable words?

    man exe <name>
    man lib <name>

karatinversion · on May 23, 2022

As I understand it, because it descends from printed manuals with numbered sections.

colejohnson66 · on May 23, 2022

Yep. Executables are section 1 simply because they came first in the binders.

There's nothing necessarily preventing man from allowing string codes as a replacement, but man interprets (fully) non-numeric arguments as page names to open. So `man cat dd` (on Ubuntu) will first open cat(1), and when you exit (with 'q'), it will prompt you to continue. If you say yes, it'll open dd(1).

That has the side effect that `man exe dd` would be interpreted as opening exe(#) (printing "No manual entry for exe") followed by dd(1).

nieve · on May 23, 2022

Man pages are divided into sections as follows (GNU & Linux):

    0  Header files (usually found in /usr/include)
    1  Executable programs or shell commands
    2  System calls (functions provided by the kernel)
    3  Library calls (functions within program libraries)
    4  Special files (usually found in /dev)
    5  File formats and conventions, e.g. /etc/passwd
    6  Games
    7  Miscellaneous (including macro packages and conventions), e.g. man(7), groff(7)
    8  System administration commands (usually only for root)
    9  Kernel routines [Non standard]

Arnavion · on May 23, 2022

As explained in `man man`, of course.

Maursault · on May 23, 2022

[flagged]

masklinn · on May 23, 2022

Er… what?

GP’s comment should apply to most if not all systems with a few caveats / divergences.

FreeBSD certainly mentions and lists standard sections at the top of `man man`. OpenBSD mentions the concept of categories / sections early on, though it only lists the specific section when it comes around to documenting the corresponding filter.

GNU invented info(1), not man(1).

Maursault · on May 23, 2022

It is just a pet peeve of mine that a lot of younger pros and devs talk about Linux as though it were groundbreaking, when all of GNU/Linux is a copy of a copy of a copy of a copy of the actual earth-shaking developments, SysV (arbitrary) and BSD. Everything in Linux was there before Linux was a twinkle in Linus' eye. I remember when most webservers ran NetBSD, and 5 years later Linux took over the data center. Was NetBSD really so intolerable and Linux really that superior? No, it's just that with no memory of the past, one can't know any better. Linux really brought nothing new, no new advances, nothing that wasn't there before, and yet it took over like a jihad. That isn't accurate... not like a jihad... it was a jihad. We can thank fanaticism for Linux in the datacenter. I'm not unfaithful, just system-agnostic. Linux was a solution to a problem already solved many times. And since Linux is not original, it sort of gets under my skin when it is insinuated as such. When talking about any software, one should refer to its original development. When we talk about web browsers, we don't talk as though, say, Microsoft invented the web browser, just because it has one; instead we talk about Tim Berners-Lee, not a copy of a copy of a copy of his work.

xorcist · on May 23, 2022

The main reason Linux outcompeted the BSD-descendants is the GPL license. Instead of competing with proprietary extensions on a free base, fragmenting the ecosystem, upstreaming as much work as possible makes economic sense.

We can still observe that effect in the Android ecosystem which started out really bad and fragmented and slowly drifts into a more coherent whole, instead of the other way around.

Many saw what the UNIX war led to and tended to avoid similar situations. I too liked what the homogeneous BSD distribution led to on a technical basis, but the GPL makes sense as long as business cases can be made to fit.

masklinn · on May 23, 2022

Nah, the main reason Linux outcompeted the BSDs is that it arrived at exactly the worst moment for them: USL v. BSDi.

This case put a severe pall on the attractiveness of BSDs as they were suddenly in legal jeopardy just at the outset of the UNIX wars and as they were coming into their own.

And at the same moment, a cleanroom unix arrived on the market, limited in many ways but safe.

The GPL was at best neutral for most users, as can be seen from its adoption (or lack thereof). However the GPL was nowhere near as problematic as “AT&T might get our OS declared illegal”.

xorcist · on May 23, 2022

Yes, there was that, too. Now I did not mean that the license is of crucial importance to most end users (it might be for some, but likely the other way around as some will prefer the simpler BSD clauses). But it was decidedly important for the business and consulting side to form, and that was hugely where Linux won.

Red Hat, Cygnus and the IBM service group took early big bets on Linux, which could not have happened on a product where vendors based their respective offerings on proprietary lock-ins. That drove adoption in banking and defense whose existing UNIX stacks looked increasingly old, which drove a huge industry shift that took the better part of a decade.

It used to be quite common to find people arguing that BSD was the more "business friendly" license, which may be true in some specific ways but tends to miss the bigger picture. The adoption of a mainstream system under the GPL was important.

Then the situation was probably different in the web hosting business, in academia, and in other sectors where other factors dominate.

teddyh · on May 23, 2022

What NetBSD did not have was drivers for any old commodity PC hardware which everyone had lying around. That’s it. That’s why people ran Linux on their stuff, and then continued running Linux in the data center.

seedie · on May 23, 2022

386BSD, the predecessor of Net/Free/OpenBSD, was published under a BSD license in 1992, 6 months after Linus posted his kernel sources on Usenet. It was free and open source and no later than the "free" BSDs that are still available today. I'd say that is a reason why it is successful.

Edit: "free" in quotation marks to not confuse with FreeBSD.

hnlmorg · on May 23, 2022

Not to mention the GNU project had been around for nearly a decade before that.

tssva · on May 23, 2022

I fail to see how the comment you are so vehemently responding to in any way implied that GNU/Linux was somehow groundbreaking. The only mention of GNU & Linux was to clarify that the sections that followed are the man sections on GNU/Linux systems, which is appropriate since not all UNIX flavors contain the same sections or the same order of sections. GNU/Linux systems mostly utilize the same sections and order as BSD based systems. SYSV based systems usually have some differing sections and a differing order of sections.

mdp2021 · on May 23, 2022

> We can thank fanaticism for Linux in the datacenter. I'm not unfaithful, just system-agnostic. Linux was a solution to a problem already solved many times

And outside the datacenter? Was the problem of an Open Common Desktop OS - for those people who are radically *not* system-agnostic - solved at the time?

What would the effects have been on the trends that shaped the landscape up to today and beyond, had Linux not appeared but the rest of the chessboard stayed intact?

xelxebar · on May 23, 2022

While we're on a tangent, here's a quick way to list all man pages available on your system:

    $ man -k ''

And if you just want to see the ones for high-level documentation (i.e. section 7), then

    $ man -s 7 -k ''

does the trick. Sections 7 and 5, in particular, are full of hidden gems.

klibertp · on May 23, 2022

On the topic of hidden gems: the info directory is frequently populated by default with manuals when you install a piece of software. These tend to be more in-depth and complete manuals than the man pages for the same tool. Just type `info` in a terminal and be amazed. Emacs has a convenient info viewer too, under `C-h i`.

tingletech · on May 23, 2022

usually `info` only works on a GNU derived OS. `man -k` has worked on every flavor of unix I've met.

edit: `man -k ''` might be a GNUism too. Just tried on a BSD derived OS and got back nothing.

yjftsjthsd-h · on May 23, 2022

Perhaps more directly to your question: Yes, as siblings note it is a manpage section number, but writing it like that is just a way to refer to programs; "cat" could be a feline, but "cat(1)" is a unix program. Oh, and it can disambiguate; printf(1) is a program you run from the shell, printf(3) is a C library function. IMO it's as much a cultural convention as anything.

dredmorbius · on May 23, 2022

Precisely this, esp. the cat vs. cat(1) distinction (which I'd just addressed in another response).

fsckboy · on May 23, 2022

the other answers are good, but haven't quite covered the topic.

the unix manual was commonly printed out, and these section numbers were a necessity for looking things up. Collation was by section number, then alphabetical.

when you were first introduced to unix, you sat down with the manual and read it.

rocqua · on May 23, 2022

For the first option a simple

    head -b 512

Will also copy the first 512 bytes in case you want to avoid dd for clarity. I have actually used that for moving MBRs around.

tyingq · on May 23, 2022

Though "-b" is a gnuism, and isn't there at all on many unixy OSes, or is "-c" on others.

jimmaswell · on May 23, 2022

What drives whoever's the second person to implement such a flag to make it different? An explicit desire to inconvenience other "tribes" of computer users and use the flag as a symbol of an in-group?

scbrg · on May 23, 2022

Following a convention already established in the local ecosystem, presumably.

Though in this case I think previous posters are mistaken. My GNU implementation of head supports -c and not -b.

Perhaps it's the fact that the longopt is --bytes that caused the confusion.

dredmorbius · on May 23, 2022

How do you change, improve, or extend a standard if no changes may be permitted?

What's keeping other implementations from adding these features?

jimmaswell · on May 23, 2022

Changes with a good reason are fine, but I don't see a good reason to make the same flag a different letter.

tyingq · on May 23, 2022

Apparently a typo on the post I was replying to...there is no "-b" switch, it's "-c" or "--bytes" for gnu head. Though there are versions of head without one or both of "-c", "--bytes".

rocqua · on May 23, 2022

As noted below, my post above is wrong. It should be -c.

I simply misremembered.

cperciva · on May 23, 2022

> create a sparse file

Note that this can also be done using truncate(1).
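Here is a side-by-side sketch, assuming GNU-style `truncate -s` size suffixes. Both commands should produce a 1 MiB file whose data blocks are never written, so on filesystems that support holes `du` should report little or no space used. The dd form combines `count=0` with `seek=`, so it copies nothing and only sets the file's length.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# truncate(1): set the file's length directly
truncate -s 1M "$tmp/a.img"

# dd equivalent: copy zero blocks after seeking 1024 x 1 KiB into the output,
# so the file is extended to 1 MiB without any data being written
dd if=/dev/null of="$tmp/b.img" bs=1024 seek=1024 count=0 2>/dev/null

wc -c < "$tmp/a.img"                   # 1048576 (apparent size)
wc -c < "$tmp/b.img"                   # 1048576
du -k "$tmp/a.img" "$tmp/b.img"        # space actually allocated
```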

Nux · on May 23, 2022

Also fallocate (Linux only though).

natmaka · on May 23, 2022

dd can do it on an existing file or stream, transforming runs of zero bytes it contains into "holes".

inopinatus · on May 23, 2022

caveat operator: to ensure the conv=sparse operand achieves the desired outcome, be sure to use an output blocksize equal to st_blksize of the output filesystem.
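Here is a sketch of that caveat in practice, assuming GNU dd and GNU stat (`stat -c %o` prints st_blksize). It reads st_blksize from an existing file on the same filesystem as the output, assuming the new file will report the same value. It then uses that value as the block size, since `bs=` sets both the input and output sizes.

```shell
#!/bin/sh
# GNU dd and GNU stat assumed (conv=sparse also exists in FreeBSD dd).
tmp=$(mktemp -d)

# A 1 MiB file of zeros with every block actually written (not sparse)
dd if=/dev/zero of="$tmp/dense.img" bs=65536 count=16 2>/dev/null

# st_blksize of a file on the same filesystem as the output
# (assumed to match what the new output file will get)
blksize=$(stat -c %o "$tmp/dense.img")

# Each all-zero output block of $blksize bytes is skipped with a seek,
# leaving a hole instead of written zeros
dd if="$tmp/dense.img" of="$tmp/sparse.img" bs="$blksize" conv=sparse 2>/dev/null

cmp "$tmp/dense.img" "$tmp/sparse.img" && echo "same contents"
du -k "$tmp/dense.img" "$tmp/sparse.img"   # sparse.img should report less, fs permitting
```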

jhugo · on May 23, 2022

or just use `cp --sparse=always` if it's from one file to another

inopinatus · on May 23, 2022

That is not so portable, so I recommend sticking with dd.

jhugo · on May 23, 2022

Yup, `dd` helps in a lot of situations when you need portability, not just this one.

Commands like `cp --sparse=always` aren't worth such a blanket disrecommendation though; if you are working directly at a console as opposed to scripting you typically don't need portability.

inopinatus · on May 23, 2022

Again, most of the consoles I work at don’t use GNU coreutils.

jhugo · on May 23, 2022

Then you obviously can't use this, but many people reading my suggestion can.

yepguy · on May 23, 2022

My most common use for `dd` is using it with `sudo` to direct the output of an unprivileged pipeline to a root-owned file. Instead of running `echo hello >/root/test.txt`, which will fail, I use `echo hello | sudo dd of=/root/test.txt`.

nrclark · on May 23, 2022

a note: I'd recommend using tee instead of dd for that job, or add iflag=fullblock if your dd supports it.

The thing is that dd issues a read() for each block, but it doesn't actually care how many bytes it gets back in response (unless you turn on fullblock mode).

This isn't really a problem when you're reading from a block device, because it's pretty uncommon to get back less data than you requested. But when you're reading from a pipe, it can/does happen sometimes. So you might ask for five 32-byte chunks, and get [32, 32, 30, 32, 32]-sized chunks instead. This messes up the contents of the file you're writing, possibly destructively.

To avoid it, use `tee` or something else. Or use iflag=fullblock to ensure that you get every byte you request (up to EOF or count==N).
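The short-read problem is easy to reproduce with a writer that pauses mid-stream. In this minimal sketch, `writer` is a hypothetical helper and `iflag=fullblock` is GNU-only. dd asks for one 8-byte block, the first read() on the pipe usually returns only the 4 bytes written so far, and dd counts that as its one block.

```shell
#!/bin/sh
# Emits 4 bytes, pauses, then 4 more (hypothetical helper).
writer() { printf 'aaaa'; sleep 1; printf 'bbbb'; }

# One 8-byte block requested; the 4-byte short read counts as that block,
# so dd stops early (GNU dd reports it as "0+1 records in").
short=$(writer | dd bs=8 count=1 2>/dev/null)

# GNU dd: iflag=fullblock keeps calling read() until the block is full.
full=$(writer | dd bs=8 count=1 iflag=fullblock 2>/dev/null)

# tee has no block/count semantics; it copies whatever arrives until EOF.
out=$(mktemp)
writer | tee "$out" >/dev/null

echo "without fullblock: $short"   # usually "aaaa"
echo "with fullblock:    $full"    # "aaaabbbb"
echo "tee wrote:         $(cat "$out")"
```

For the original use case, `echo hello | sudo tee /root/test.txt >/dev/null` does the same job. The `>/dev/null` only stops tee from echoing the data back to the terminal.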

yepguy · on May 23, 2022

I've never had any trouble, but good to know.

matja · on May 23, 2022

7. Write a new MBR to a disk, keeping the partition table:

    dd bs=440 count=1 if=/usr/lib/syslinux/bios/mbr.bin of=/dev/sda

Even in the age of EFI/GPT, that still gets used often (usually VM providers that only offer MBR boot).
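Here is a dry-run sketch of the same 440-byte write against a disk image instead of a real disk. `disk.img` and `boot.bin` are hypothetical stand-ins for /dev/sda and mbr.bin. Bytes 440 to 511 hold the disk signature, the partition table and the 0x55AA boot signature, so the check at the end confirms they survive. One difference from the real command: a regular-file target needs `conv=notrunc`.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
disk=$tmp/disk.img      # stand-in for /dev/sda (hypothetical)
boot=$tmp/boot.bin      # stand-in for mbr.bin (hypothetical)

# 1 MiB zeroed "disk"; fill bytes 440..511 with 'P' so changes are visible
dd if=/dev/zero of="$disk" bs=512 count=2048 2>/dev/null
dd if=/dev/zero bs=72 count=1 2>/dev/null | tr '\000' 'P' |
    dd of="$disk" bs=1 seek=440 conv=notrunc 2>/dev/null
dd if="$disk" bs=1 skip=440 count=72 of="$tmp/before" 2>/dev/null

# 440 bytes of new "boot code"
dd if=/dev/zero bs=440 count=1 2>/dev/null | tr '\000' 'B' > "$boot"

# The command from above, plus conv=notrunc because the target is a regular file
dd bs=440 count=1 if="$boot" of="$disk" conv=notrunc 2>/dev/null

dd if="$disk" bs=1 skip=440 count=72 of="$tmp/after" 2>/dev/null
cmp "$tmp/before" "$tmp/after" && echo "bytes 440-511 untouched"
```

Without `conv=notrunc`, dd would truncate disk.img before writing and leave a 440-byte file. Truncation doesn't apply to a block device like /dev/sda, which is why the original command gets away without it.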

jhugo · on May 23, 2022

  head -b 440 /usr/lib/syslinux/bios/mbr.bin > /dev/sda

inopinatus · on May 24, 2022

I suspect you meant -c 440; I can’t find a variant of head(1) that has a -b operand on any Unix. Note that -c is not POSIX but does have widespread support. Notably missing on Solaris.

Fun fact, the -c usage comes from ksh, where head is a shell builtin.

jhugo · on May 25, 2022

Oops, yes, -c indeed! Thanks!

gnubison · on May 23, 2022

Fun fact: 1-4 don’t work in the context of short reads — and GNU’s fullblock extension isn’t specified in POSIX.