
The easy way to download something in a Dockerfile:

     RUN wget URL
Your better way?

    RUN wget URL && \
        if [ "$(sha256sum <the output> | cut -d' ' -f1)" != "the hash" ]; then \
            # Wow, I sure hope I spelled this right!  Also, can a comment end with \
            echo "Hmm, sha256 was wrong.  Let's log the actual hash we saw.  Oh wait, forgot to save that.  Run sha256sum again?" >&2; \
            echo "Hmm, better not forget to fail!" >&2; \
            exit 1; # Better remember that 1 is failure and 0 is success! \
        fi
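For what it's worth, sha256sum -c folds the comparison, the error message, and the nonzero exit into one step. A minimal sketch (the URL, filename, and hash are placeholders; note the two spaces between hash and filename that the check format expects):

    RUN wget -O app.tar.gz "https://example.com/app.tar.gz" && \
        echo "<expected sha256>  app.tar.gz" | sha256sum -c -
    # On a mismatch, sha256sum prints "app.tar.gz: FAILED" and exits nonzero,
    # so the build stops without any hand-rolled comparison or exit handling.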
An actual civilized solution would involve a manifest of external resources, a lockfile, and a little library of instructions that the tooling could use to fetch or build those external resources. Any competent implementation would result in VASTLY better caching behavior than Docker or Buildah can credibly implement today -- wget uses network resources and is usually slow, COPY is oddly slow, and the tooling has no real way to know that the import of a file could be cached even if something earlier in the Dockerfile (like "apt update"!) changed.

Think of it like modern cargo or npm or whatever, but agnostic to the kind of resource being fetched.
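To make the shape concrete, here is a purely hypothetical sketch; the lockfile format, filenames, and fetch loop below are made up for illustration, not any existing tool's:

    # resources.lock -- pinned manifest of external resources: url, sha256, destination
    #   https://example.com/tool-1.2.3.tar.gz  <sha256>  vendor/tool-1.2.3.tar.gz

    # A minimal fetcher the tooling could run (and cache) entirely outside the build:
    while read -r url sha dst; do
        wget -O "$dst" "$url" && echo "$sha  $dst" | sha256sum -c - || exit 1
    done < resources.lock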

If there was a manifest and lockfile, it really would not be that hard to wire apt or dnf up to it so that a dependency solver would run outside the container, fetch packages, and then install them inside the container. Of course, either COPY would need to become faster or bind mounts would have to start working reliably. Oh well.

> Honestly I've always found reproducibility harder to enforce when using Linux package managers

Timestamps could well cause issues (which would be fixable), but it's not conceptually difficult to download .rpm or .deb files and then install them. rpm -i works just fine. In fact, rpm -i --root arguably works quite a bit better than docker/podman build, and it would be straightforward to sandbox it.
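A rough illustration of that flow, with hypothetical package names and paths (dnf download comes from dnf-plugins-core):

    # Resolve and fetch everything outside the target root.
    dnf download --resolve --alldeps --destdir ./pkgs bash coreutils

    # Install into a fresh root; initialize an empty rpmdb there first.
    rpm --initdb --root "$PWD/rootfs"
    rpm -i --root "$PWD/rootfs" ./pkgs/*.rpm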



> An actual civilized solution would involve a manifest of external resources, a lockfile, and a little library of instructions that the tooling could use to fetch or build those external resources.

Sounds like you're describing Nix.

I actually thought the article would be framed a bit differently when I saw the title: I think Docker and its ecosystem solve several adjacent but not intrinsically intertwined problems:

- Creating repeatable or ideally reproducible runtime environments for applications (via Dockerfiles)
- Isolating applications' runtime environments (filesystems, networks, etc) from one another (via the Docker container runtime)
- Specifying a common distribution format for applications and their runtime environments (via Docker images)
- Providing a runtime to actually run applications in (via the Docker CLI and Docker Desktop)

In this context, a runtime environment consists of the application's dependencies, its configuration files, its temporary and cache files, its persistent state (usually via a volume or bind mount), its exposed ports, and so on.

I would argue that Docker is often used solely for dependency management and application distribution; for those use cases, network and filesystem isolation are just obstacles to be worked around, which is why developers complain about Docker's complexity.


What you are looking for is Mockerfiles.

https://matt-rickard.com/building-a-new-dockerfile-frontend

It’s just a proof of concept, but at least it shows what can be done if one peeks under the hood a bit.

With multi-stage builds you can already do quite a few of the things you mention, like downloading in one container and copying the result into another, with the download happening in parallel while apt install runs. It’s hopelessly verbose to do so, though, and one ends up not using it and just brute-forcing the simplest imperative Dockerfile instead.
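A minimal sketch of that pattern (image tags, URL, and hash are placeholders); with BuildKit the fetch stage can run in parallel with the apt-get layer, since only the final COPY depends on it:

    FROM alpine AS fetch
    RUN wget -O /tool.tar.gz "https://example.com/tool.tar.gz" && \
        echo "<expected sha256>  /tool.tar.gz" | sha256sum -c -

    FROM debian:bookworm-slim
    RUN apt-get update && \
        apt-get install -y --no-install-recommends ca-certificates && \
        rm -rf /var/lib/apt/lists/*
    # Only this COPY depends on the fetch stage, so the two can overlap.
    COPY --from=fetch /tool.tar.gz /opt/tool.tar.gz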


On the one hand, that’s really cool. On the other hand, I just learned (from that article!) that the Dockerfile “syntax” is actually a reference to a Docker container. It’s turtles all the way down!

Seriously, though:

> The external files are downloaded in separate alpine images, and then use the copy helper to move them into the final image. It uses a small script to verify the checksums of the downloaded binaries s = s.Run(shf("echo \"%s %s\" | sha256sum -c -", e.Sha256, downloadDst)).Root(). If the checksum does not match, the command fails, and the image build stops.

Having any nontrivial build operation be an invocation of an entire Docker container seems like a terrible design. Docker is cool, but host-native Linux userspace images are a nastily complicated way to express computation. What’s wrong with Lua or JavaScript or WASM or QuakeC or Java or Lisp or any other sandboxable way to express computation that is actually intended for this sort of application? (All of the above, unlike Docker, can actually represent a computation such that a defined runtime can run it portably.)

Docker images, being the sort of turtle that is not amenable to a clean build process, don’t seem like a good thing to try to fix by turtles-all-the-way-downing them.


We have built something very similar to what you are describing: https://github.com/chainguard-dev/apko



