
Useful uses of dd(1).

There remain some useful applications of dd. These may of course be achieved by other mechanisms, but typically less conveniently.

1. Read a specific number of blocks or bytes from a source:

  dd if=/dev/hda of=/root/mbr bs=512 count=1
This will make a copy of, say, your master boot record (first 512 bytes of your first disk drive) and stash it in your /root directory.

2. Read from specific bytes of a file

  dd if=mydata skip=1k bs=32 count=1
Reads 32 bytes after the first 1024 (1k) bytes of "mydata".

3. Write to specific bytes of a file

  dd if=source of=target seek=10k bs=512 count=1 conv=notrunc
That should write 512 bytes from "source" beginning 10k into "target". (I've not tested this, you should verify.)

4. Create a sparse file. Sparse files appear to have a nonzero size, but take up no space on disk, until data is actually written to them. These are often used as "inflating" dynamic filesystem images for virtual machines.

  dd if=/dev/zero of=sparsefile bs=1 count=0 seek=20000M # Create 20 GB sparse file

5. Case conversions. Sure, you could use tr(1), but where's the sport?

  dd if=MixEdCaSE of=lcase conv=lcase   # Convert to lower case
  dd if=MixEdCaSE of=ucase conv=ucase   # Convert to upper case

6. ASCII / EBCDIC conversions

  dd if=ebcdic of=ascii conv=ascii   # ebcdic -> ascii
  dd if=ascii of=ebcdic conv=ebcdic  # ascii -> ebcdic

When reading to or from IBM data tapes, you might find blocking / unblocking conversions useful. I've done this, but it's so long ago that I don't trust my memory on that any more. Odds are good you'll not have to worry about this.

There are other useful applications as well, though these are not typically encountered very often. Do feel free to explore and attempt these on safe media.



Careful, your #2 and #3 are incorrect -- skip and seek operate with blocks, not bytes. So your #2 would copy 32 bytes after the first 32kB of data, and #3 would write 512 bytes at position 5120k.
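
For anyone who wants to check the block arithmetic, here is a minimal sketch of #2 and #3 with the offsets expressed in whole blocks. It should work with any POSIX dd; the scratch directory and the file contents are made up for the demonstration.

```shell
# Scratch directory; all file names below are made up for the demo.
tmp=$(mktemp -d)

# mydata: 1024 bytes of 'a', a 32-byte run of 'B', then 992 more 'a'.
{
  dd if=/dev/zero bs=32 count=32 2>/dev/null | tr '\000' 'a'
  dd if=/dev/zero bs=32 count=1  2>/dev/null | tr '\000' 'B'
  dd if=/dev/zero bs=32 count=31 2>/dev/null | tr '\000' 'a'
} > "$tmp/mydata"

# #2: skip= counts input blocks, so byte 1024 is block 1024 / 32 = 32.
chunk=$(dd if="$tmp/mydata" bs=32 skip=32 count=1 2>/dev/null)
echo "$chunk"

# #3: seek= counts output blocks, so byte 10240 is block 10240 / 512 = 20.
dd if=/dev/zero bs=512 count=40 2>/dev/null | tr '\000' 'a' > "$tmp/target"
dd if=/dev/zero bs=512 count=1  2>/dev/null | tr '\000' 'X' > "$tmp/source"
dd if="$tmp/source" of="$tmp/target" bs=512 seek=20 count=1 conv=notrunc 2>/dev/null

# conv=notrunc keeps the rest of target intact, so the size is unchanged.
target_size=$(wc -c < "$tmp/target" | tr -d ' ')
echo "$target_size"
```

The only change from the original recipes is that skip= and seek= are divided by bs.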


Btw, there are iflags in GNU dd to work in bytes as well. I have ddbytes aliased to dd iflag=count_bytes,skip_bytes oflag=seek_bytes.
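
A minimal sketch of those byte-oriented flags, assuming GNU coreutils dd (other dd implementations will likely reject them). The sample file and its contents are hypothetical.

```shell
tmp=$(mktemp -d)
printf '0123456789abcdefghijklmnopqrstuvwxyz' > "$tmp/mydata"

# With skip_bytes and count_bytes, skip= and count= are byte counts,
# so bs= only sets the I/O size and can stay large.
picked=$(dd if="$tmp/mydata" bs=4096 skip=10 count=6 \
            iflag=skip_bytes,count_bytes 2>/dev/null)
echo "$picked"

# With seek_bytes, seek= is a byte offset into the output file.
printf 'XYZ' | dd of="$tmp/mydata" bs=4096 seek=3 \
                  oflag=seek_bytes conv=notrunc 2>/dev/null
patched=$(cat "$tmp/mydata")
echo "$patched"
```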


Thanks. Most of that was from memory and not tested.

The specific recipes should be vetted. The stated goals can be achieved with proper invocation.


> Read from specific bytes of a file

Especially handy when you've fed in a huge amount of JSON (sometimes all on one line because, y'know, why not) into jq and you get the inscrutable output:

    parse error: Invalid numeric literal at line 1, column 236162512


Tail can do this, too:

  tail --bytes=+236162512


Handy but `dd` is much better at giving you a limited context to look at - `dd bs=1 skip=236162504 count=20` just gives you 8 before and 12 after.

(edit: I do have to concede that the `tail|head` version is much faster than the `dd` - ~11s vs ~65s in my quick test with that ^skip)


I suspect you'd get better performance increasing the blocksize, and if necessary trimming the output in a second pass.

Large blocks -> efficient I/O. Within reason.
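
A rough sketch of that two-pass idea, using only portable dd options: skip in large blocks first, then trim the small remainder byte by byte. The sample file and the `offset`, `length` and `bs` values are made up, and `count=2` assumes the wanted length is no larger than one block.

```shell
tmp=$(mktemp -d)
# Hypothetical sample: 150000 bytes of 'a', a marker, then 50000 more 'a'.
dd if=/dev/zero bs=1000 count=150 2>/dev/null | tr '\000' 'a' > "$tmp/big"
printf 'MARKER' >> "$tmp/big"
dd if=/dev/zero bs=1000 count=50 2>/dev/null | tr '\000' 'a' >> "$tmp/big"

offset=150000   # byte offset of interest
length=6        # bytes of context wanted
bs=65536        # large block size for the first pass

blocks=$((offset / bs))   # whole blocks that can be skipped cheaply
rest=$((offset % bs))     # leftover bytes to trim in the second pass

# First pass: seek in big blocks and read just enough of them.
# Second pass: byte-granular trim, but only over the small remainder.
context=$(dd if="$tmp/big" bs="$bs" skip="$blocks" count=2 2>/dev/null |
          dd bs=1 skip="$rest" count="$length" 2>/dev/null)
echo "$context"
```

The slow bs=1 pass now touches at most one block's worth of data instead of the whole prefix of the file.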


A note here: if you're using GNU dd, you can also use iflag=count_bytes and then set the block-size to whatever you want. That'll give you the best of both worlds.


Ah, good to know. With a block size of 1M, that brings the `dd` version down to ~15s.


Ugh!

I'll keep that in mind, json-parsing being among my current hobbies...


Apologies. Tangent:

What does the (1) mean beside dd? I see this with man pages. Is it a version identifier?

Edit: thank you both for taking the time to share. I appreciate the quick response.


The “section”[0] or “subpage.” Programs are section 1, hence ‘dd(1)’. The idea is that it can be possible for a library function (section 3) to have the same name as an executable. In which case, you’d type into your shell:

    man 1 <name> # executable: <name>(1)
    man 3 <name> # function:   <name>(3)
Without the distinction argument, man will throw its arms up in the air and give up.

However, if there’s no ambiguity (such as with ‘dd’ where only the executable exists), you can drop the section number parameter when running man. man will then search all the sections, see that ‘dd’ only exists in section 1, and go from there.

[0]: https://linux.die.net/man/


Something like `read` gives a better example. `man 1 read` is "read - read from standard input into shell variables", "read [-r] var...", a shell builtin. And `man 2 read` is "read - read from a file descriptor", "ssize_t read(int fd, void *buf, size_t count);". You can also get into `man 8 catman`, "catman - create or update the pre-formatted manual pages", "catman [-d?V] [-M path] [-C file] [section] ...". The `catman` is a mirror-like hierarchy of man pages (usually in troff format using the 'an' macro package) pre-formatted for the standard terminal or whatever to shave off a bit of time running roff on the source files all the time.

This is all ancient knowledge and well in place back in the mid 1980's long before Linux or any of that stuff. The `man` page sections used to be different 3-ring binders all printed out sitting on a table in the computer lab. The 'sections' were just different binders of documentation. Get off my lawn!!!


Correct.

I also use the notation as a convention to indicate that I'm referring to a Unix command (or library function, etc.). E.g., "cat" might refer to a feline, but "cat(1)" should more clearly refer to the Unix / Linux command.

(Where I've got proper markup, I'll typically set commands or references in monospace using backtick notation: `dd`, `cat`, etc.)


What do you do if you need to unambiguously refer to cat-as-in-feline?


cat (Felis catus) ;)


I show claws.


> Where I've got proper markup, I'll typically set commands or references in monospace

You can do that here by putting them on a separate line (two newlines, separate paragraph) and indenting by a couple characters.


> Without the distinction argument, man will throw its arms up in the air and give up.

At least in man-db and FreeBSD implementations, it actually shows you the first page it found by default.


You're correct; I thought that might've been wrong...

I just tested, and `man man` opened man(1) despite man(7) existing.


Or config in 5 is another perhaps more likely to come across as a user.


Why is it random numbers rather than readable words?

    man exe <name>
    man lib <name>


As I understand it, because it descends from printed manuals with numbered sections.


Yep. Executables are section 1 simply because they came first in the binders.

There's nothing necessarily preventing man from allowing string codes as a replacement, but man interprets (fully) non-numeric arguments as pages to go. So `man cat dd` (on Ubuntu) will first open cat(1), and when you exit (with 'q'), will prompt if you want to continue. If you say yes, it'll open dd(1).

That has the side effect that `man exe dd` would be interpreted as opening exe(#) (printing "No manual entry for exe") followed by dd(1).


Man pages are divided into sections as follows (GNU & Linux):

    0  Header files (usually found in /usr/include)
    1  Executable programs or shell commands
    2  System calls (functions provided by the kernel)
    3  Library calls (functions within program libraries)
    4  Special files (usually found in /dev)
    5  File formats and conventions, e.g. /etc/passwd
    6  Games
    7  Miscellaneous (including macro packages and conventions), e.g. man(7), groff(7)
    8  System administration commands (usually only for root)
    9  Kernel routines [Non standard]


As explained in `man man`, of course.


[flagged]


Er… what?

GP’s comment should apply to most if not all systems with a few caveats / divergences.

FreeBSD certainly mentions and lists standard sections at the top of `man man`. OpenBSD mentions the concept of categories / sections early on, though it only lists the specific section when it comes around to documenting the corresponding filter.

GNU invented info(1), not man(1).


It is just a pet peeve of mine that a lot of younger pros and devs talk about Linux as though it were groundbreaking, when all of GNU/Linux is a copy of a copy of a copy of a copy of the actual earth-shaking developments, SysV (arbitrary) and BSD. Everything in Linux was there before Linux was a twinkle in Linus' eye. I remember when most webservers ran NetBSD, and 5 years later Linux took over the data center. Was NetBSD really so intolerable and Linux really that superior? No, it's just that with no memory of the past, one can't know any better. Linux really brought nothing new, no new advances, nothing that wasn't there before, and yet it took over like a jihad. That isn't accurate... not like a jihad... it was a jihad. We can thank fanaticism for Linux in the datacenter. I'm not unfaithful, just system-agnostic. Linux was a solution to a problem already solved many times. And since Linux is not original, it sort of gets under my skin when it is insinuated as such. When talking about any software, one should refer to its original development. When we talk about web browsers, we don't talk as though, say, Microsoft invented the web browser, just because it has one; instead we talk about Tim Berners-Lee, not a copy of a copy of a copy of his work.


The main reason Linux outcompeted the BSD-descendants is the GPL license. Instead of competing with proprietary extensions on a free base, fragmenting the ecosystem, upstreaming as much work as possible makes economic sense.

We can still observe that effect in the Android ecosystem which started out really bad and fragmented and slowly drifts into a more coherent whole, instead of the other way around.

Many saw what the UNIX war led to and tended to avoid similar situations. I too liked what the BSD homogeneous distribution led to on a technical basis, but the GPL makes sense as long as business cases can be made to fit.


Nah the main reason Linux outcompeted the BSD is that it arrived at exactly the worst moment for the BSDs: USL v. BSDi.

This case put a severe pall on the attractiveness of BSDs as they were suddenly in legal jeopardy just at the outset of the UNIX wars and as they were coming into their own.

And at the same moment, a cleanroom unix arrived on the market, limited in many ways but safe.

The GPL was at best neutral for most users, as can be seen from its adoption (or lack thereof). However the GPL was nowhere near as problematic as “AT&T might get our OS declared illegal”.


Yes, there was that, too. Now I did not mean that the license is of crucial importance to most end users (it might be for some, but likely the other way around as some will prefer the simpler BSD clauses). But it was decidedly important for the business and consulting side to form, and that was hugely where Linux won.

Red Hat, Cygnus and the IBM service group took early big bets on Linux, which could not have happened on a product where vendors based their respective offerings on proprietary lock-ins. That drove adoption in banking and defense whose existing UNIX stacks looked increasingly old, which drove a huge industry shift that took the better part of a decade.

It used to be quite common to find people arguing that BSD was the more "business friendly" license, which may be true in some specific ways but tends to miss the bigger picture. The adoption of a mainstream system under GPL license was important.

Then the situation was probably different in the web hosting business, in academia, and in other sectors where other factors dominate.


What NetBSD did not have was drivers for any old commodity PC hardware which everyone had lying around. That’s it. That’s why people ran Linux on their stuff, and then continued running Linux in the data center.


386BSD, the predecessor of Net/Free/OpenBSD, was published under a BSD license in 1992, 6 months after Linus posted his kernel sources on usenet. It was free and open source and no later than the "free" BSDs that are still available today. I'd say that is a reason why it is successful.

Edit: "free" in quotation marks to not confuse with FreeBSD.


Not to mention the GNU project had been around for nearly a decade previous


I fail to see how the comment you are so vehemently responding to in any way implied that GNU/Linux was somehow groundbreaking. The only mention of GNU & Linux was to clarify that the sections that followed are the man sections on GNU/Linux systems which is appropriate since not all UNIX flavors contain the same sections or the same order of sections. GNU/Linux systems mostly utilize the same sections and order as BSD based systems. SYSV based systems usually have some differing sections and a differing order of sections.


> We can thank fanaticism for Linux in the datacenter. I'm not unfaithful, just system-agnostic. Linux was a solution to a problem already solved many times

And outside the datacenter? Was the problem of an Open Common Desktop OS - for those people who are radically *not* system-agnostic - solved at the time?

What could have been the effects on the trends building the scenario until today and beyond, had Linux not appeared but keeping the rest of the chessboard intact?


While we're on a tangent. Here's a quick way to list all man pages available on your system:

    $ man -k ''
And if you just want to see the ones for high-level documentation (i.e. section 7), then

    $ man -s 7 -k ''
does the trick. Sections 7 and 5, in particular, are full of hidden gems.


On the topic of hidden gems. The info directory is frequently populated by default with manuals when you install a piece of software. These tend to be more in-depth and complete manuals than the man pages for the same tool. Just type `info` in a terminal and be amazed. Emacs has a convenient info viewer also, under `C-h i`.


usually `info` only works on a GNU derived OS. `man -k` has worked on every flavor of unix I've met.

edit: `man -k ''` might be a GNUism too. Just tried on a BSD derived OS and got back nothing.


Perhaps more directly to your question: Yes, as siblings note it is a manpage section number, but writing it like that is just a way to refer to programs; "cat" could be a feline, but "cat(1)" is a unix program. Oh, and it can disambiguate; printf(1) is a program you run from the shell, printf(3) is a C library function. IMO it's as much a cultural convention as anything.


Precisely this, esp. the cat vs. cat(1) distinction (which I'd just addressed in another response).


the other answers are good, but haven't quite covered the topic.

the unix manual was commonly printed out, and these section numbers were a necessity for looking things up. Collation was section number, and then alphabetical

when you were first introduced to unix, you sat down with the manual and read it.


For the first option a simple

    head -b 512
Will also copy the first 512 bytes in case you want to avoid dd for clarity. I have actually used that for moving mbrs around.


Though "-b" is a gnuism, and isn't there at all on many unixy OSes, or is "-c" on others.


What drives whoever's the second person to implement such a flag to make it different? An explicit desire to inconvenience other "tribes" of computer users and use the flag as a symbol of an in-group?


Following a convention already established in the local ecosystem, presumably.

Though in this case I think previous posters are mistaken. My GNU implementation of head supports -c and not -b.

Perhaps it's the fact that the longopt is --bytes that caused the confusion.


How do you change, improve, or extend a standard if no changes may be permitted?

What's keeping other implementations from adding these features?


Changes with a good reason are fine, but I don't see a good reason to make the same flag a different letter.


Apparently a typo on the post I was replying to...there is no "-b" switch, it's "-c" or "--bytes" for gnu head. Though there are versions of head without one or both of "-c", "--bytes".


As noted below my post above is wrong. It should be -c.

I simply misremembered.
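
For reference, a small sketch of the corrected form with `-c`, run against a made-up image file rather than a real disk. `-c` is widely supported, though as noted in this thread some versions of head lack it.

```shell
tmp=$(mktemp -d)
# Hypothetical stand-in for a disk: two 512-byte "sectors".
dd if=/dev/zero bs=512 count=2 2>/dev/null | tr '\000' 'm' > "$tmp/disk.img"

# -c takes a byte count, so this keeps only the first sector.
head -c 512 "$tmp/disk.img" > "$tmp/mbr"
mbr_size=$(wc -c < "$tmp/mbr" | tr -d ' ')
echo "$mbr_size"
```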


create a sparse file

Note that this can also be done using truncate(1).
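
A minimal sketch of the truncate(1) route, assuming a truncate that accepts size suffixes (GNU coreutils does) and a filesystem that supports holes. The file name and the 100M size are arbitrary.

```shell
tmp=$(mktemp -d)

# Create a 100 MiB file that consists of a single hole.
truncate -s 100M "$tmp/sparsefile"

# Apparent size in bytes versus KiB actually allocated on disk.
apparent=$(wc -c < "$tmp/sparsefile" | tr -d ' ')
used_kb=$(du -k "$tmp/sparsefile" | cut -f1)
echo "apparent: $apparent bytes, allocated: $used_kb KiB"
```

On most filesystems the allocated figure should come out at or near zero, while the apparent size is the full 100 MiB.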


Also fallocate (Linux only though).


dd can do it on an existing file or stream, transforming sequences of 0 it contains into "holes".


caveat operator: to ensure the conv=sparse operand achieves the desired outcome, be sure to use an output blocksize equal to st_blksize of the output filesystem.
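
A hedged sketch of that advice, assuming GNU dd (for conv=sparse) and GNU stat (whose `%o` format reports the preferred I/O block size). The input file is made up: data, a long run of zeros, then data again.

```shell
tmp=$(mktemp -d)

# Hypothetical input: 4 KiB of 'x', 1 MiB of zeros, 4 KiB of 'y'.
{
  dd if=/dev/zero bs=4096 count=1   2>/dev/null | tr '\000' 'x'
  dd if=/dev/zero bs=4096 count=256 2>/dev/null
  dd if=/dev/zero bs=4096 count=1   2>/dev/null | tr '\000' 'y'
} > "$tmp/dense"

# Ask the output filesystem for its preferred block size, then let dd
# seek over all-zero output blocks instead of writing them.
blksize=$(stat -c %o "$tmp")
dd if="$tmp/dense" of="$tmp/holey" bs="$blksize" conv=sparse 2>/dev/null

# The contents must be identical; only the on-disk allocation may differ.
cmp "$tmp/dense" "$tmp/holey" && echo "contents match"
du -k "$tmp/dense" "$tmp/holey"
```

How much space is actually saved depends on the filesystem, so compare the two du figures rather than expecting a particular number.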


or just use `cp --sparse=always` if it's from one file to another


That is not so portable, so I recommend sticking with dd.


Yup, `dd` helps in a lot of situations when you need portability, not just this one.

Commands like `cp --sparse=always` aren't worth such a blanket disrecommendation though; if you are working directly at a console as opposed to scripting you typically don't need portability.


Again, most of the consoles I work at don’t use GNU coretools.


Then you obviously can't use this, but many people reading my suggestion can.


My most common use for `dd` is using it with `sudo` to direct the output of an unprivileged pipeline to a root-owned file. Instead of running `echo hello >/root/test.txt`, which will fail, I use `echo hello | sudo dd of=/root/test.txt`.


a note: I'd recommend using tee instead of dd for that job, or add iflag=fullblock if your dd supports it.

The thing is that dd issues a read() for each block, but it doesn't actually care how many bytes it gets back in response (unless you turn on fullblock mode).

This isn't really a problem when you're reading from a block device, because it's pretty uncommon to get back less data than you requested. But when you're reading from a pipe, it can/does happen sometimes. So you might ask for five 32-byte chunks, and get [32, 32, 30, 32, 32]-sized chunks instead. This has the effect of messing up the contents of the file you're writing, with possibly destructive effects.

To avoid it, use `tee` or something else. Or use iflag=fullblock to ensure that you get every byte you request (up to EOF or count==N).
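
A small sketch of the short-read effect, assuming GNU dd for iflag=fullblock. The effect is easiest to see when count= is given, because dd counts a short read as a whole block; `slow_writer` is a made-up helper that writes in two bursts.

```shell
# A slow writer: two 4-byte writes, one second apart.
slow_writer() {
  printf 'aaaa'
  sleep 1
  printf 'bbbb'
}

# Plain dd treats the first short read() as its one block,
# so this typically copies only the first 4 bytes.
short=$(slow_writer | dd bs=8 count=1 2>/dev/null)

# With iflag=fullblock, dd keeps reading until the block is full.
full=$(slow_writer | dd bs=8 count=1 iflag=fullblock 2>/dev/null)

echo "without fullblock: $short"
echo "with fullblock:    $full"
```

For the sudo case above, the tee form would be `echo hello | sudo tee /root/test.txt >/dev/null`, which has no block accounting to get wrong.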


I've never had any trouble, but good to know.


7. Write a new MBR to a disk, keeping the partition table:

    dd bs=440 count=1 if=/usr/lib/syslinux/bios/mbr.bin of=/dev/sda
Even in the age of EFI/GPT, that still gets used often (usually VM providers that only offer MBR boot).
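
To try this safely, here is a sketch that runs the same command against a scratch image file instead of /dev/sda. The image layout and file names are made up; note the added conv=notrunc, which a regular file needs so that dd does not truncate it to 440 bytes.

```shell
tmp=$(mktemp -d)
img="$tmp/disk.img"     # stand-in for /dev/sda
boot="$tmp/mbr.bin"     # stand-in for the syslinux mbr.bin

# Fake disk: 440 bytes of old boot code, 72 bytes standing in for the
# disk signature, partition table and boot signature, then one more sector.
{
  dd if=/dev/zero bs=440 count=1 2>/dev/null | tr '\000' 'o'
  dd if=/dev/zero bs=72  count=1 2>/dev/null | tr '\000' 'P'
  dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\000' 'd'
} > "$img"
dd if=/dev/zero bs=440 count=1 2>/dev/null | tr '\000' 'N' > "$boot"

# Overwrite only the first 440 bytes of the image.
dd if="$boot" of="$img" bs=440 count=1 conv=notrunc 2>/dev/null

# Bytes 440..511 (the partition-table area) should be untouched.
table=$(dd if="$img" bs=1 skip=440 count=72 2>/dev/null)
img_size=$(wc -c < "$img" | tr -d ' ')
echo "$table"
```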


  head -b 440 /usr/lib/syslinux/bios/mbr.bin > /dev/sda


I suspect you meant -c 440; I can’t find a variant of head(1) that has a -b operand on any Unix. Note that -c is not POSIX but does have widespread support. Notably missing on Solaris.

Fun fact, the -c usage comes from ksh, where head is a shell builtin.


Oops, yes, -c indeed! Thanks!


Fun fact: 1-4 don’t work in the context of short reads — and GNU’s fullblock extension isn’t specified in POSIX.



