Docker feels incomplete without Compose. Dockerfiles on their own are usually missing a lot of critical information about how to run the container in practice (volumes, environment variables, network dependencies, ports to forward, etc). With plain Docker you usually have to read a bunch of documentation to figure all that out; with Compose all that information is in the file and you can just run `docker compose up`.
Strongly agree. There's a big knowledge gap between the container author and the container consumer that Docker doesn't have a great toolkit for bridging. The best we get are the ENV, EXPOSE, and VOLUME directives, but authors should really have tools to say:
- what filepaths are important (both for volume-mounting and for files output by the container)
- what envvars (or CMD, or ENTRYPOINT) are needed
- what ports will be opened, and what they mean
- what protocols the ports talk
...etc. I'd really like containers to one day behave like software libraries: you pass in a set of documented parameters, and get an object that you can pass around and use in return.
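For now, the closest an author can get is stretching the directives we do have, plus free-form labels. A sketch (the label keys here are invented — there's no standard vocabulary for them, which is exactly the problem):

```dockerfile
FROM postgres:16
# Port opened (but nothing about its protocol or meaning):
EXPOSE 5432
# Filepath that should persist across container restarts:
VOLUME /var/lib/postgresql/data
# Everything richer has to go into free-form labels nothing enforces:
LABEL org.example.ports.5432="PostgreSQL wire protocol"
LABEL org.example.required-env="POSTGRES_PASSWORD"
```

Nothing consumes those labels automatically, which is why this information usually ends up in a README instead.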
I don't think that's really a Docker problem. It's more about the instances of software that you want to run together.
e.g. nginx in isolation is useless, but it's really useful when you want to expose a bunch of other services on a single http(s) interface.
When you are working at the Compose level you are thinking about an ecosystem of services working together, and of course then you are using the Compose tools to achieve that. But managing each component in isolation is much more tedious.
Even with only one service I still consider Compose to be a necessity. Otherwise you need to manually specify how to mount the volumes, set the environment variables, and forward ports when you run the container. Only the most trivial containers are easily usable with plain `docker run`.
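Even a single-service compose file captures what would otherwise live only in someone's shell history (image name and paths here are placeholders):

```yaml
services:
  app:
    image: example/app:1.0        # placeholder image
    ports:
      - "8080:80"                 # host:container
    environment:
      APP_ENV: production
    volumes:
      - ./data:/var/lib/app       # this path matters; now it's written down
# Equivalent invocation, easy to forget:
#   docker run -p 8080:80 -e APP_ENV=production -v "$PWD/data:/var/lib/app" example/app:1.0
```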
That's a fair point, I suppose: Docker itself doesn't provide a declarative config for starting containers. But that's also true of a whole bunch of tools, which provide either a CLI or a config format but not both.
> Dockerfiles on their own are usually missing a lot of critical information about how to run the container
Uh, of course they are, since Dockerfiles say nothing about how to run a container. A Dockerfile encodes an image (and not really deterministically, contrary to common belief, but that's a detail), not a container.
And docker-compose.yaml represents making a bunch of containers from images. These are completely separate things.
Dockerfiles are more versatile. For example Kubernetes also uses Docker images (thanks to the OCI specification), but you don't configure Kubernetes with docker-compose.yml files, and Kubernetes usually doesn't even use Docker under the hood.
But sure, for local deployment docker-compose is awesome.
Not sure I follow? Docker Compose in no way tries to replace a Dockerfile or Docker images: every container you specify in one still has the exact same Dockerfile, and Compose can't replace that.
I also think Compose is a fantastic tool, but to start comparing to Dockerfiles and Kubernetes suggests not understanding what Compose is for.
The thing Compose does better than a bunch of loose Dockerfiles is document, in a runnable fashion, the relationships between the key containers in a given stack. This information alone can tell you a lot about how you would deploy in more complex environments, and it also gives you something you can usually bring up instantly with a `docker compose up` for local development work.
Without Compose, you typically have to rely much more on loose documentation, which may be stale and obviously isn't "runnable" the way Compose is. I also find it far easier to understand which volumes are critical for persistence when it's coded in the Compose YAML, vs. a bunch of -v args in the documentation for a `docker run` command.
Love Docker Compose. At an old job, we used it to provide every developer with a local version of the frontend, backend and database, all in separate containers that only talked to each other. All of that with one command!
I had a similar set up and it made debugging really easy. Instead of building and running every app you could just run the full environment with known good images and replace the app you're working on with the debuggable instance running in the IDE.
These days you can even build and deploy a debuggable image from an IDE easily, so you don't need to bridge the network.
Practically speaking how was that implemented? I would guess maybe an environment variable per app? I'm just starting out with this kind of setup on a personal project
People have been making this facetious argument since Docker launched a decade ago, it's even more ridiculous to make it now. My job would very literally be harder and take much longer to accomplish things without the benefits of containers. Anyone who still feels this way about containers for software dev today needs to get off their grumpy stool and embrace the benefits.
Funnily enough I chose docker compose because it was such an easy abstraction. I tell it i want a bunch of containers and specify their network relationship in a simple flat file. I tell it that I want each of them to expose their debugger port on a different local port, then I can just hook up a normal debugger and go. Couldn't be easier!
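Roughly like this (service names, images, and port numbers are illustrative):

```yaml
services:
  frontend:
    image: example/frontend:dev
    ports:
      - "9229:9229"   # Node.js inspector port, mapped straight through
  backend:
    image: example/backend:dev
    ports:
      - "5006:5005"   # JVM debug port (JDWP), on a different host port
```

Then the debugger just attaches to localhost:9229 or localhost:5006 as usual.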
I'm an odd one out -- I actually don't like Compose and it may partly have to do with my dislike for YAML. I often have a hard time finding or understanding the docker-compose equivalent of a docker CLI command or flag.
I've taken to replacing docker compose files with bash scripts for my personal projects. So far, it's been OK.
> I've taken to replacing docker compose files with bash scripts for my personal projects. So far, it's been OK.
I'm sorry but this is just incredibly messy for anything that's not trivial. We have tons of docker-compose services, all instantiated N times and each having their own network, secret, volume, env var, target etc configuration. In my mind it doesn't make sense to replace a 100 line YAML with a gargantuan messy shell script only old graybeards understand.
I share the same feeling about Compose. I used to put things in Makefiles, to also capture the dependencies of containers on other containers. But I have not yet found a good way to replace the `docker compose up` functionality for the case where containers are already running. It does some magic to replace only the containers that need replacing and keeps everything else running.
You might be interested in what we're building at Kurtosis: you define your app in a deterministic subset of Python, and then we do the magic to change only what's necessary on subsequent runs (what we call "idempotent runs"). It also supports parameterization.
Where do all these people who keep complaining about YAML come from? Sure it's not perfect but don't you have other things to do in your dev job?
If your job is to write config all day, and nothing else (which is the only reason I think you'd care that much to dislike docker compose JUST because of yml), maybe the thing you actually dislike is the nature of the job?
Docker is also a company that sells 3 of these as a product.
What's interesting is that for those 3 there are often better versions. Docker Engine is not used in a number of popular examples – GCP for example uses their own engine if I remember correctly. There are other pluggable runtimes. The Docker desktop app isn't great, and is now expensive, compared to a number of cheaper or free or open source alternatives. And Docker Hub is basically just for free public publishing, as Docker registries are so commoditised.
I've always felt that the best bit of Docker is the file format and the UX around that. But that's not something Docker-the-company has managed to monetise.
This is the correct view. Docker is a company selling some useful products, but the OCI (Open Container Initiative) format is what is interesting, and it can be used with many different tools not just Docker's.
Docker Engine uses containerd to run containers. containerd was born at Docker, and it is probably the most popular runtime for Kubernetes; GKE uses containerd as well.
Docker Hub is absolutely the most popular registry for open source software, and it's the default registry for unqualified image names in Kubernetes.
None of it is irreplaceable, for sure. But Docker made a big contribution to the container world regardless.
I don't understand the need for Docker Desktop. When you install the docker-ce package, you get the docker client CLI and dockerd daemon managed as a systemd service. I have never seen anyone build or run images outside of the CLI. Why are people paying for a GUI wrapper?
Docker Desktop takes care of all the annoying config work needed to make Docker work on Windows, including managing virtual machine engines, configuring the Windows firewall, and the magic necessary to mount local drives in containers. Using Docker on Windows without Docker Desktop is a massive PITA.
Docker Desktop creates and manages a Linux VM with the Docker engine inside. It also does some magic to allow mounting host directories inside containers. I've yet to reproduce that magic, TBH.
It's obviously important for macOS, because macOS cannot run Linux containers natively.
While Windows has some native container support, most Docker containers are built for Linux, so Docker Desktop is important for Windows as well.
Now for Linux you can run the Docker engine directly on the host, if you're proficient with Linux. However, some people use Docker Desktop on Linux as well; I imagine it lets you keep running a very old host distro, since the engine lives in its own VM.
I, personally, avoid docker desktop on macOS. Right now I'm using remote Linux VM and in the future I'll use Linux in VM, configured manually. However I've yet to find out how to mount host directories inside that VM containers. Some magic.
On Windows and Mac, Docker Desktop is the only way to install Docker. If you work for a big company that isn't using Linux then you need to pay just to use Docker.
I've used docker on windows a lot and it never even occurred to me to install Docker Desktop. I just use normal docker inside of WSL since that's where I do all my actual work when I have to use windows. It pretty much works exactly like Linux from what I recall although I can't test it now because my windows machine is broken.
This is a hilarious comment because windows ignored the developer experience for YEARS with its horrible command line experience. Windows was a regression. Linux/unix, which powers the world by the way, is where it is at regardless of how old its foundations are.
I think we just have to agree to disagree. My point was, modern Windows is nothing like Windows 95, just like the modern command line experience is nothing like MS-DOS (as you seem to imply). Given a choice I strongly prefer the command line for most tasks. Reducing my preference to ignorance rubs me the wrong way.
Having been there when there was no other option, yes I pretty much consider some people prefer to be stuck in the past of green and amber phosphor terminals, like admiring the golden age of punch cards with diagonal red lines to avoid losing the deck order.
I love seeing this comment in a thread about OCI containers (a deeply Linux technology). This whole discussion is about the abstraction that people have to use on Windows to make it work.
When you want to talk about how the world has progressed, I guess you do not mean Kubernetes or Docker. Give me an example of how Windows has changed the world in the last 30 years.
COM as universal technology to share commercial libraries, language agnostic.
Also the basis to share document manipulation across applications.
A usable 3D API, which has to be emulated on Linux, as not even Android developers care to port their games.
Making computing mainstream for non technical users, where locating where a file was saved is already a challenge in itself.
As for OCI containers, we are back in the 1970s: Linux is catching up with what IBM already did back then, and it still misses some of the cool capabilities of their mainframes.
It wasn't even the first, as Tru64 and HP-UX Vaults had container-like capabilities in the late 1990s, followed by Solaris Zones and BSD jails.
What are better alternatives to Docker desktop on Windows?
I am using Rancher Desktop, and the number of bugs and lack of features would make paying for Docker Desktop worthwhile if I used it more than ~1 hour per week.
This is a weird question: why would you use anything like Docker Desktop to begin with?
I didn't even know Docker Desktop existed until a couple of months ago. My wife works on MS Windows and she needed to run someone's project that was for some reason distributed as Docker images... She couldn't get Docker installed on Windows, but that's kind of expected, as everything is screwy there... so I had to use a Web search to figure out the way to do it (she did try Docker Desktop).
I ended up configuring Hyper-V (solely because it comes bundled with the system) and installing Docker in some Arch VM I set up for that.
I cannot claim that this is a "comfortable" setup, but given the overall bad UX of that OS and that Docker isn't designed to run on it from the start, I think it's an OK solution.
I briefly tried opening the Docker Desktop GUI, and I just cannot understand why anyone would need that specific thing... it just seems to make everything worse at the baseline. You immediately miss the convenience of being able to feed its output into grep or a pager, or to combine information extracted from one output with the input to another operation.
Lol, this is exactly where I'm at. I push others to do wsl2 + podman but I'm doing Hyper-v myself just because fighting cgroup and systemd stuff was quite annoying. Further, you get some really nice space saving with the btrfs driver.
Maybe her work changed some settings in Windows? From a clean install of Windows I thought Docker Desktop installed fine for me but WSL2 needed me to change some setting.
It was definitely not a clean Windows install. She inherited this laptop from someone else, and it wasn't well "cleaned up" (some previous user settings were still there). Also, the IT had put some restrictions on how Hyper-V could be used, I think, or maybe have pre-configured it somehow (most likely unintentionally, when configuring something else).
The sort of problems it was having were related to virtualizing the network adapter. Somehow after about half an hour it would just "stop working". Since it's Windows... no debugging, no logs, and the Web search brings up a lot of nonsense when you try to figure out the problem. The kind of "stop working" was that the host could still use the adapter, but the guest system (the one running Docker) would still have an IP, knew the IP of the router... but sending anything to the outside world (i.e. the router) would end up lost.
Maybe it was a faulty adapter. Maybe faulty Windows driver, or maybe a problem in Hyper-V / its settings... I never figured it out. The same problem existed if I tried to do it with any VM I'd create there, if I tried to virtualize the adapter, so it wasn't unique to Docker. Eventually, I've given up on the idea of having a virtualized adapter, and put the VM in its own NAT'ed network.
If you don't need to access the `docker` command from Windows, just install Docker in WSL2 the same way you would on regular Linux (e.g. `sudo apt install docker-ce`, or the convenience script at `https://get.docker.com/`).
Podman in wsl2 works fairly well. However, there are some rough edges that you'll have to smooth out. (Such as setting up the docker host and fighting the podman socket stuff if you need it).
> First, the Dockerfile file format for declaratively describing a machine (operating system, installed packages, processes, etc).
I have a hard time calling Dockerfile declarative when the RUN statement literally runs a sequence of user provided shell commands in a container. Is a shell script declarative or imperative?
It is structured and mostly results in an easy to understand set of instructions to create a container image. But it is a set of instructions and you can shoot yourself in the foot (especially with reproducibility)
Which is funny because Moby split the Dockerfile format from Docker, so you no longer need the Dockerfile format to make Docker images. At the top of a “Dockerfile” is
# syntax=docker/dockerfile:v1
Which is a Docker image that translates the declarative syntax into the commands sent to the low-level builder. That syntax is pointing to a Docker image name and tag (`docker/dockerfile:v1`) that Buildkit pulls down and feeds the passed commands into. Technically, that could be anything. If you wanted to put in the work, you could write a Buildkit frontend for Ansible or Chef and use that to configure the image.
So, on the list of “4 things Docker is”, they were wrong on the first one.
RUN is the escape hatch that everybody (understandably) abused. And because we do, it's maybe the first thing you think of when thinking of a Dockerfile. But in the context of the other Dockerfile keywords I can probably live with calling it declarative. I'm a Java guy, I'd definitely call Maven declarative, but it surely has some escape hatches too. And what about <fill in your example>...
Two habits help a lot: 1. Be as reproducible as possible, e.g. if you're using RUN to download a file, check its hash to ensure that someone building the image in the future gets the same file. Lock your dependency versions wherever possible. Sometimes this is impractical, e.g. OS packages in most Linux distros expect other packages to be mostly up to date.
2. Separate RUN commands so that more frequently changed content is created later in the Dockerfile, to maximize caching.
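A sketch combining both points (base image, file names, and versions are illustrative):

```dockerfile
FROM python:3.12-slim
# (1) Pin what you install: requirements.txt should say "requests==2.31.0", not "requests"
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r /app/requirements.txt
# (2) Frequently-edited files come last, so code changes only invalidate the final layer
COPY src/ /app/src/
CMD ["python", "/app/src/main.py"]
```

Editing anything under src/ now reuses the cached dependency layer instead of reinstalling everything.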
I think it's important to recognize the difference between Dockerfiles and Docker images. Docker Engine builds Dockerfiles to Docker images, but you don't need a Dockerfile to run a pre-built image.
I came here to say the same thing. Images and their layers are loosely related to the Dockerfile format (each command is a layer), but Dockerfiles are as useful for execution as C Source files are. Likewise images can be sent and received without a Dockerfile just like an executable without its source.
The confusing part was the non-idiomatic English "builds X to Y" instead of "builds Y from X".
Edit: perhaps it is idiomatic somewhere in the world to say "builds X to Y", but not in the U.S., in my experience. Could be similar to British "different to" as compared to American "different than" or "different from".
With GitHub Container Registry, Podman, Colima, etc., the Dockerfile format is maybe the only one that will stick for a long time. I don't think Docker is going away, but I think it is such a great format that there is no need to reinvent it. Same for docker-compose.
The Dockerfile format is, in my opinion, crap. It’s extremely useful, and it’s fairly straightforward to kludge together a container build process using it, but:
- It's very hard to get reproducible output. It's even fairly hard to get output where the inputs are well controlled.
- It can't do a clean crossbuild: you have to run the container to build it. As a side effect, all the tooling for installing things into the container has to be in the container. (Yes, there are workarounds. They're ugly.)
- It leaves trash behind. You need to fight with it to even get /tmp to be temporary.
- It has no usable, efficient way to supply large input files. You can bind-mount into a RUN, but getting permissions right when doing so is an uphill battle.
It is inherently not possible for Dockerfiles, as a format, to generate reproducible outputs/images. You can run whatever command you want in a Dockerfile. Docker engine itself has no way of knowing whether that command's behavior is reproducible--and in turn, has no way to guarantee reproducible images from a Dockerfile.
The format and engine could try a lot harder to make improved reproducibility the default.
As a trivial example, network access for RUN should be opt-in, not opt-out. The fact that the easiest ways to pull data in involve things like RUN wget is a design error.
A much better approach would be to have packages that install with as little script involvement as possible. Most Linux images are put together using rpm or deb packages and, other than pre/post-install scripts (which are not usually particularly necessary), package installation is fundamentally reproducible and does not require running the image. A good image building system IMO would mostly look more like:
INSTALLPACKAGES foo bar baz
And dependencies would get solved and packages installed, reproducibly.
> The fact that the easiest ways to pull data in involve things like RUN wget is a design error
Why is that? You can perfectly well get a reproducible build even using wget: you wget your file, take its checksum, and compare it to an expected checksum. Boom, reproducible wget.
Honestly I've always found reproducibility harder to enforce when using Linux package managers (at least with apt-get which messes stuff up with timestamps)
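Concretely, the verify-or-fail pattern can be wrapped once and reused. A sketch — `fetch_verified` is a made-up helper name, and I use curl rather than wget so transfer errors fail the step cleanly:

```shell
# Download a URL and verify its SHA-256; delete the file and fail on mismatch.
fetch_verified() {
  url="$1"; sha="$2"; out="$3"
  curl -fsSL -o "$out" "$url" || return 1
  # sha256sum -c expects "<hash>  <file>" (two spaces between them)
  echo "$sha  $out" | sha256sum -c - >/dev/null 2>&1 || { rm -f "$out"; return 1; }
}
```

In a Dockerfile you'd inline the same idea: `RUN curl -fsSL -o /tool "$URL" && echo "$SHA  /tool" | sha256sum -c -`.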
The easy way to download something in a Dockerfile:
RUN wget URL
Your better way?
RUN wget URL && \
if [[ "$(sha256sum <the output>)" != "the hash" ]]; then \
# Wow, I sure hope I spelled this right! Also, can a comment end with \
echo "Hmm, sha256 was wrong. Let's log the actual hash we saw. Oh wait, forgot to save that. Run sha256sum again?" 2>&1 \
echo "Hmm, better not forget to fail!" 2>&1 \
exit 1 # Better remember that 1 is failure and 0 is success! \
fi
An actual civilized solution would involve a manifest of external resources, a lockfile, and a little library of instructions that the tooling could use to fetch or build those external resources. Any competent implementation would result in VASTLY better caching behavior than Docker or Buildah can credibly implement today -- wget uses network resources and is usually slow, COPY is oddly slow, and the tooling has no real way to know that the import of a file could be cached even if something earlier in the Dockerfile (like "apt update"!) changed.
Think of it like modern cargo or npm or whatever, but agnostic to the kind of resource being fetched.
If there was a manifest and lockfile, it really would not be that hard to wire apt or dnf up to it so that a dependency solver would run outside the container, fetch packages, and then install them inside the container. Of course, either COPY would need to become faster or bind mounts would have to start working reliably. Oh well.
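Purely as an illustration of the idea — this file format doesn't exist, and all names in it are invented:

```yaml
# resources.yaml: a hypothetical external-resource manifest
resources:
  - kind: file
    url: https://example.com/tool-1.2.3.tar.gz
    sha256: 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
    dest: /usr/local/src/tool.tar.gz
  - kind: apt-packages
    packages: [curl, ca-certificates]
    # the solver would run outside the container; a generated
    # resources.lock would record exact versions and digests
```

The tooling could then cache each resource by digest, independent of anything earlier in the Dockerfile.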
> Honestly I've always found reproducibility harder to enforce when using Linux package managers
Timestamps could well cause issues (which would be fixable), but it's not conceptually difficult to download .rpm or .deb files and then install them. rpm -i works just fine. In fact, rpm -i --root arguably works quite a bit better than docker/podman build, and it would be straightforward to sandbox it.
> An actual civilized solution would involve a manifest of external resources, a lockfile, and a little library of instructions that the tooling could use to fetch or build those external resources.
Sounds like you're describing Nix.
I actually thought the article would be framed a bit differently when I saw the title: I think Docker and its ecosystem solve several adjacent but not intrinsically intertwined problems:
- Creating repeatable or ideally reproducible runtime environments for applications (via Dockerfiles)
- Isolating applications' runtime environments (filesystems, networks, etc) from one another (via the Docker container runtime)
- Specifying a common distribution format for applications and their runtime environments (via Docker images)
- Providing a runtime to actually run applications in (via the Docker CLI and Docker Desktop)
In this context, a runtime environment consists of the application's dependencies, its configuration files, its temporary and cache files, its persistent state (usually via a volume or bind mount), its exposed ports, and so on.
I would argue that Docker is often used solely for dependency management and application distribution, and for such use cases things like network and filesystem isolation just present obstacles to be worked around, and this is why developers complain about Docker's complexity.
It’s just a proof of concept. At least shows what can be done if one peeks under the hood a bit.
With multi-stage builds you can already do quite a few of the things you mention, like downloading in one container and copying into another, happening in parallel while apt install is running. It's hopelessly verbose to do so, though, and one ends up not using it and instead just brute-forcing the simplest imperative file.
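The download-then-copy pattern looks roughly like this (image names, URL, and hash are placeholders):

```dockerfile
# Stage 1: fetch a binary in a throwaway image; builds in parallel with other stages
FROM alpine:3.19 AS fetch
RUN wget -q -O /tool https://example.com/tool \
 && echo "<expected-sha256>  /tool" | sha256sum -c - \
 && chmod +x /tool

# Stage 2: the final image; only the verified file is copied in
FROM debian:bookworm-slim
COPY --from=fetch /tool /usr/local/bin/tool
```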
On the one hand, that’s really cool. On the other hand, I just leaned (from that article!) that the Dockerfile “syntax” is actually a reference to a Docker container. It’s turtles all the way down!
Seriously, though:
> The external files are downloaded in separate alpine images, and then use the copy helper to move them into the final image. It uses a small script to verify the checksums of the downloaded binaries s = s.Run(shf("echo \"%s %s\" | sha256sum -c -", e.Sha256, downloadDst)).Root(). If the checksum does not match, the command fails, and the image build stops.
Having any nontrivial build operation be an invocation of an entire Docker container seems like a terrible design. Docker is cool, but actual host-native Linux userspace images are a rather nastily complicated way to express computation. What's wrong with Lua or JavaScript or WASM or QuakeC or Java or Lisp or any other sandboxable way to express computation that is actually intended for this sort of application? (All of the above, unlike Docker, can actually represent a computation such that a defined runtime can run it portably.)
Docker images, being the sort of turtle that is not amenable to a clean build process, don't seem like a good thing to try to fix by turtles-all-the-way-downing them.
That would require significant integration with the image; do you expect docker to know how to talk to apt, dnf, zypper, nix-env, apk, xbps-install, etc.?
It's at least possible in a limited sense. I'm not going to hold it up as a paragon of a solution, but cloud-init lets you just list packages and translates them automatically into a command line for most popular Linux package managers, even pacman.
But I strongly agree with what I think is your basic gist here, which is that too many people wish Docker was something it fundamentally isn't. Its ultimate intent is as a packaging system more than a build system. Yes, it builds container images, but a container image is just a packaging method. How the software running in it builds is up to the developers of that software. Thus, Docker's goal is to work with arbitrary tooling. Whatever compiler, dependency resolver, and whatever else your software already uses, you can keep using. That includes hacky bullshit shell scripts that pull in everything via wget. There is no good reason Docker should keep you from doing that if that's what you want to do. If you want deterministic, reproducible container builds, use a deterministic, reproducible build system, and put your outputs in a container image. Docker will gladly let you do that.
On the other hand, I somewhat agree with the other complaint above, about having to run the base image to build anything on top of it. I get why they did it: it's probably the simplest way to ensure you're not implicitly depending on the host system running Docker, so your containers won't crash for some stupid reason like the glibc in the container at runtime not matching what you had on your build host. But there were better ways to achieve this. arch-chroot, Debian's fakeroot+fakechroot, and plenty of other systems already existed for building a self-contained system on another system without implicitly building against dependencies that won't be there at runtime, and they don't require setting up and running the rather complicated Docker container engine, in particular the network bridging, which can get janky, especially if your host system is using systemd-networkd. It'd be nice to have the systems for building images and running containers entirely separate and self-contained, which you can have, of course, just not with Docker.
I don't quite get the need for perfectly reproducible builds.
At least in my org, that ends up being more of a detriment than a boon. The problem? Devs hate updating libraries, which is a crucial part of security with docker.
Call me crazy, but I prefer the fact that `apt install foo` gets the latest foo and not what was pinned. We test our images before sending them to prod so if something breaks it's pretty easy to catch it.
If you want the latest foo, then tell your pinning solution that you want that. Then you get a real record of what’s actually running, you can reproduce old builds to instrument them, and you get all the other benefits of tracking what you actually built.
> Dockerfile format is maybe the only one that will stick for a long time
Do you mean the image format? The Dockerfile as a recipe for building an image seems like it can be replaced pretty easily with a nicer interface whenever someone feels like building one. But the image format will probably continue to be supported for some time.
That thread is nuts! More than two years of complaints about the installer bricking the OS, followed by crickets from Docker and the thread being automatically closed. Yikes! How can you have a piece of software which you know might completely corrupt the user's OS just by installing, and not at least notify the users next to the Install button? "Warning: may be Docker, may be virus, use at your own risk."
Not actually bricked, I assume -- your laptop has not been "turned into a brick", just had the OS corrupted, right? You could presumably wipe and reinstall Windows to fix it.
(Not saying that doesn't suck, just arguing with the terminology.)
But reinstalling Windows crashes, the USB recovery drive only helps copy files from the command line, and system restore points all crash. "OS corruption" sounds like something that can be repaired while keeping files, but this is worse than that.
I fucking hate Docker on Linux, but that issue is so much worse, with the cherry on top of "Closed issues are locked after 30 days of inactivity". I can't tell if it was ever fixed.
I think it's pedantic to call it a "grand oversimplification". It seems like a reasonable simplification to me.
Like calling a .py file an application is reasonable IMO, even though it's a simplification because it's "just" instructions for an interpreter.
> It seems like a reasonable simplification to me.
It's not reasonable at all. If someone is trying to learn Docker and they read this misleading article, they will think they can somehow run their Dockerfile as a container, and they will be wholly confused as to why it's not going to work out.
Or they'll see that it's a small human-readable text file and not several hundred megabytes of machine code and come to a reasonable conclusion.
If I give you a cookie recipe and say "These are the best cookies", it's self-evident that I don't want you to eat the paper.
Nobody who is just learning docker knows or cares what Bazel or Crane are.
A pedant might similarly go the other way and say "C isn't just a compiled language, because personally I use the Ch C interpreter".
I don't think it's a good enough reason to avoid reasonable simplifications in technical writing because "Well actually, advanced users have an edge case where that assumption doesn't strictly hold....". The only way for anyone to get a reasonable understanding of anything complex is by starting with small simplifications and generalizations.
Of course you can run a .py file as a program. But would you say the same about, e.g., a C source file? It has to be compiled first, just as a Dockerfile has to be built into a container image, which in turn can be run.
I don't find it to be an oversimplification at all. If I say "the terminal takes this bash script and runs it", I understand the bash script may do a ton of complicated things, and all the system services that are set up in the first place to run bash are complicated, but I see nothing wrong or oversimplified about that sentence.
Well, actually... you can just run C++ source as an executable... there are a bunch of C++ interpreters out there (of various quality, and probably not quite useful, but hey, nobody said you will have a good time running your C++ source).
But, yeah, if the article wanted to clarify instead of obscuring the functionality of Docker, the one we know and use, then saying that it runs the Dockerfile is misleading. I mean, it does run it to build an image (that's how the image gets built), but then readers need an explanation for why there are images and why they're needed, if all they do is run Dockerfiles, and that's where the "explanation" becomes worthless.
The article is correct if simple. Running a container is not very hard in the linux case since it is just a new cgroup entry. Obviously there is an extra layer in the case of Mac/Windows but you're still just setting up cgroups.
The article is incorrect. If you don't download or build the Docker image from the Dockerfile, you will not be able to run a container from the Dockerfile alone.
I'm not even sure if it's an oversimplification or misconception. I think it might just be a plain old error, and the author meant to write "takes a Docker image".
I mean, that post is a grand oversimplification to the point of being basically wrong on so many levels, that I'm not even sure why do you pick specifically that point to argue about. Even though you are totally right.
Docker is a command that can build, fetch, and run docker images from a docker registry and orchestrates running those typically on a linux host or virtual machine.
A docker image is a set of hashes that identify layers in a layered filesystem. A docker registry stores those layers and the docker looks up layers from the registry. The combined layers form a filesystem that is mounted when you run the docker image. A running docker image is called a docker container.
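The layer model described above can be sketched in a few lines of Python (a conceptual toy, not how docker actually stores anything): each layer maps paths to contents and is identified by a hash of those contents, and "mounting" an image means unioning the layers in order, with later layers shadowing earlier ones.

```python
import hashlib
import json

def layer_id(layer: dict) -> str:
    """Content-address a layer: identical contents always hash to the same ID."""
    return hashlib.sha256(json.dumps(layer, sort_keys=True).encode()).hexdigest()

def mount(layers: list) -> dict:
    """Union the layers in order; later layers shadow earlier ones."""
    fs = {}
    for layer in layers:
        fs.update(layer)
    return fs

base = {"/etc/os-release": "debian", "/bin/sh": "<binary>"}
app  = {"/app/main.py": "print('hi')", "/etc/os-release": "patched"}

image = [base, app]                     # an image is just an ordered list of layers
print(mount(image)["/etc/os-release"])  # the app layer shadows the base file
```

A registry only needs to store each content-addressed layer once, which is why many images can cheaply share the same base layers.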
To create a docker image with docker, you use a Dockerfile that specifies a sequence of steps that modify the layered file system, typically starting from a base image that is specified in the first line.
The company Docker Inc. is a major contributor to the open source docker code base and also owns and runs a popular docker registry called Docker Hub, which is what the docker command uses as default place to look for docker images.
While docker is popular, there are alternative tools that are able to run docker images. For example Kubernetes implements its own way to run docker images. Likewise, there are build tools that produce docker images without using the docker command or a Dockerfile. And the docker registry API is simple enough that you don't need docker to fetch layers either.
Consequently, there are several popular tools in this space that do similar things with varying degrees of compatibility. Many of these can run docker containers from Docker Hub.
Why not "Docker is an OS level virtualization (or containerization) product"?
This is like saying "Tylenol/Panadol is a painkiller and a fever reducer" without ever mentioning that it is a branded offering of the generic drug Acetaminophen/Paracetamol. We need to start using generic terms for technologies, not product names first. E.g. "Reactive Web Components" could be React, Preact or a number of other libraries/frameworks.
I feel the same about 90% of stuff I see in this field. As for remedying it, that ship has sailed. Most of CS, AI/ML are all plagued with marketing terms for everything. People either reinvent something from 40 years prior, or stack two existing things, name it something completely stupid, never forgetting "blazing fast" at the start, all instead of a humble combination - preferably concatenation - of the simple base concepts it comprises.
It's also a CLI, a daemon, a network abstraction, a company… It just doesn't make any sense to try to count how many things "docker" is. It makes sense to ask which "docker" do you mean when you talk about "docker". But since the author doesn't really talk about anything coherent here, it doesn't make sense to ask.
The function name is misleading you into thinking that it is imperative. In reality, the function just returns a path to a file in the Nix store with the specified contents. The derivation then returns a container in which Nginx is pointed to that same config file. Nothing is temporal or happening in any particular order - you declare some inputs, and you get some outputs (a Docker container, in this case).
Recall, the Dockerfile syntax is 'imperative'. If we change the order of the commands in the Dockerfile, we likely end up with a different image.
In the Nix example, the image we build is in the expression `nix2container.buildImage { ... }`.
The `nginxWebRoot` is the package with the index.html:
nginxWebRoot = pkgs.writeTextDir "index.html" ''
<html><body><h1>Hello from NGINX</h1></body></html>
'';
It's reasonable to say "writeTextDir" modifies the disk. I don't think it's reasonable to say just because changes in state occur that the code is imperative. (e.g. SQL is a declarative language, but clearly allows modifying the database).
We can say the contents of the runCommand argument are run imperatively, sure. (Especially: if you change the order of the bash commands, you might get a different result).
But unlike the Dockerfile, the order where we declare this nginxVar package doesn't matter.
Or, say: the `copyToRoot` in the `nix2container.buildImage` takes in a list of packages where the contents are copied to root. The copying is an action; but the list of what to copy is not an action. -- And again, `copyToRoot` could be put after the `config` attribute.
The mechanisms describing how the copying is done is elsewhere.
One of the biggest ones is that the nix2container definition is evaluated in the context of a flake.nix file that specifies all the inputs, and a `flake.lock` that guarantees they stay frozen.
By comparison, "FROM nginx" is just grabbing whatever is the latest in some external registry that you don't control— it's the same as starting a Dockerfile with "apt update; apt dist-upgrade", you have this huge chunk of external mutable state that you're dependent on which immediately throws any kind of real reproducibility out the window.
(And yes, doing "FROM nginx:x.y" or "FROM nginx:<sha>" does help a little, but the point remains that you're pulling a big binary blob that is essentially mystery meat— trying to make sense of what's in there is why there's now entire companies dedicated to untangling software bills of materials.)
Good point on the mutability of docker tags. But not sure how applicable "you're pulling a big binary blob that is essentially mystery meat" is when cache.nixos.org exists.
Fair, I suppose— both are remote build systems that you have to put trust in when you pull their tarballs.
But even in a world where Debian's reproducible build project completely achieves all its goals, a given docker build is always going to have temporal state in it if it depends on external images or a mutating package repository. So yes, you may have the Dockerfile that purportedly produced that disk image, but you're unlikely to be able to completely rebuild or verify it unless you also have a snapshot of the apt Packages.gz.
A nix2container image could in principle build completely from scratch, in just one command line invocation, with no external cache present, and get a bit-for-bit identical result. The only real "trusted" input that you have to start with is I believe a small busybox binary and gcc toolchain that is the initial bootstrap.
But it seems to be doing it imperatively. I’d expect something like ‘nginxConf = pkgs.file “nginx.conf”, “contents”’ instead of ‘nginxConf = pkgs.writeText “nginx.conf”, “contents”’.
Not saying the system doesn’t apply this declaratively, but I find it difficult to intuit the above is checking for a state and applying changes only if necessary.
One distinction in Nix vs Docker is that Nix has a DAG structure, as opposed to the singly linked list structure of Docker's layers.
The "writeText" function produces a derivation (basically an atomic build recipe) that produces that file. The crux of nix is that you make deterministic derivations, and then you can always refer to the results of a derivation from the hash of the derivation and its inputs.
What nix adds is glue logic to chain these derivations together in a way that preserves reproducibility of the individual imperative, but deterministic, components.
Unless you are using something like recursive-nix, you can completely evaluate the nix expression without building any of the derivations.
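That evaluate/build split can be sketched like this (a toy model with made-up names like `Drv`, nothing like Nix's actual machinery): evaluation constructs a DAG of derivations and computes their hashes and output paths without running anything; building then realizes each derivation separately.

```python
import hashlib

class Drv:
    """A toy derivation: a name, a build action, and input derivations.
    (A drastic simplification; real Nix hashes cover the full build
    recipe, not just the name.)"""
    def __init__(self, name, action, inputs=()):
        self.name, self.action, self.inputs = name, action, tuple(inputs)

    @property
    def hash(self):
        # Computable from the recipe and the inputs' hashes alone --
        # no build action ever has to run during evaluation.
        h = hashlib.sha256(self.name.encode())
        for dep in self.inputs:
            h.update(dep.hash.encode())
        return h.hexdigest()[:12]

    @property
    def out_path(self):
        # The output path is known before (and without) building.
        return f"/store/{self.hash}-{self.name}"

def build(drv, store):
    """Realize a derivation and its inputs into the store, once each."""
    if drv.out_path not in store:
        for dep in drv.inputs:
            build(dep, store)
        store[drv.out_path] = drv.action([store[d.out_path] for d in drv.inputs])
    return store

# Evaluation: constructing the DAG already yields every output path...
conf = Drv("nginx.conf", lambda deps: "server { ... }")
web  = Drv("webroot",    lambda deps: "<h1>Hello</h1>")
img  = Drv("image",      lambda deps: f"image({','.join(deps)})", [conf, web])
print(img.out_path)

# ...while building is a separate, later step.
store = build(img, {})
```

Note that the order in which `conf`, `web`, and `img` are declared doesn't matter to the result; only the DAG of references does.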
Also relevant to note that although Nix builds individual derivations imperatively (call this compiler, write this file, rename this directory), it completely controls all the inputs to that imperative process.
This is fundamentally different from a Dockerfile or Ansible script which have no idea what the "starting point" of the target environment is and are pretty much just mindlessly imposing mutations on top of whatever happens to already be there.
You don't really get credit for being "both". There are maintainability and comprehensibility benefits to keeping anything imperative out of a language (you don't have to reason causally from one statement to the next), which is out the window when you introduce imperative elements. Also: in a Dockerfile, those imperative elements are the heart of the system.
`ENV` is a bad example because its effect differs greatly depending on where it's placed in the Dockerfile. E.g. before a RUN statement consuming its value, or after.
`FROM` also has more use cases when using multi stage builds.
oh, you're right. I forgot that a key characteristic of anything being "declarative" is that order of statements should not matter.
Actually, come to think of it: since `RUN` may depend on any other Dockerfile statement (even `EXPOSE` might make a difference in code), does this mean that even a single imperative statement introduced into a language makes the whole language imperative?
A Docker image is basically the cached result of a lucky, nondeterministic imperative build success from a Dockerfile.
In comparison, a Nix file is actually declarative ("I want my result system to have this" as opposed to "Do this to get me towards my result system"), and is actually reproducible.
But notice the sizes of both a Docker image (many megabytes) and a Nix file (a couple of K)...
You're right, and I meant "Docker image", was fortunately able to edit before the window closed! (Edited it for clarification. Sorry, 2-year-old plus no breakfast or coffee, minus sleep = daddy brain...)
A Dockerfile provides no guarantees that it will succeed, is what I was getting at- which is why people download Docker images to begin with, because the build product is guaranteed to work, since it's immutable at that point.
Like with functional programming, which at extremes is only declarative, there is a tendency to call “declarative” only those approaches that are perfectly declarative and incapable of imperativeness. However, not unlike purely functional programming, declarativeness is useless unless it is contaminated with real world at some boundary.
If you pretend that COPY means "with these files matching those files on the host machine as of build time", RUN means "with this command having been executed at build time", etc., then even a Dockerfile becomes fairly declarative. Every declaration defines a new immutable layer with its own unique hash; it's just that some declarations can easily be used in ways that make the outcome vary based on the state of the entire world as of build time.
Just as in Ansible you can use “declarative” YAML in a very imperative lasagna of a setup, you can do the same with a Dockerfile, Nix, Haskell, or Python. You can also get pretty close to purely declarative bliss with any of them, but it will grow impractical before that point.
The thing is, some of the first advice you'll get about Docker is that the order matters a ton. These two Dockerfiles technically create the same result, but the difference between the two is very important:
COPY . .
RUN npm install
And:
COPY package.* .
RUN npm install
COPY . .
The first file reruns npm install every single time any file changes in the code. The second only reruns npm install if the packages change. That can make the difference between a 5-minute build and a 5-second build, so it's not a small optimization.
Given how important the order of instructions is, it's hard for me to think of it as declarative. It fits better in my mind as a sequence of instructions, which is the very definition of imperative.
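A rough sketch of the caching rule at work here (a simplified model with hypothetical helper names like `step_key`, not BuildKit's actual algorithm): each step's cache key chains from the previous step's key, and for COPY it also covers the hashes of the copied files, so any change invalidates that step and everything after it.

```python
import hashlib

def step_key(prev_key: str, instruction: str, file_hashes: tuple = ()) -> str:
    """Cache key for one build step: depends on the previous step's key,
    the instruction text, and (for COPY) the content of the copied files."""
    h = hashlib.sha256()
    h.update(prev_key.encode())
    h.update(instruction.encode())
    for fh in file_hashes:
        h.update(fh.encode())
    return h.hexdigest()

def build_keys(steps):
    """steps: list of (instruction, file_hashes) tuples -> list of cache keys."""
    keys, prev = [], "scratch"
    for instruction, file_hashes in steps:
        prev = step_key(prev, instruction, file_hashes)
        keys.append(prev)
    return keys

# "COPY . ." first: editing app code changes every later key,
# so "RUN npm install" misses the cache.
v1 = build_keys([("COPY . .", ("pkg_v1", "app_v1")), ("RUN npm install", ())])
v2 = build_keys([("COPY . .", ("pkg_v1", "app_v2")), ("RUN npm install", ())])

# "COPY package.* ." first: the npm step's key only depends on package files,
# so editing app code leaves it cached.
w1 = build_keys([("COPY package.* .", ("pkg_v1",)),
                 ("RUN npm install", ()),
                 ("COPY . .", ("app_v1",))])
w2 = build_keys([("COPY package.* .", ("pkg_v1",)),
                 ("RUN npm install", ()),
                 ("COPY . .", ("app_v2",))])

print(v1[1] == v2[1])  # False: npm install reruns
print(w1[1] == w2[1])  # True: npm install stays cached
```

So the "5-minute vs 5-second build" difference falls directly out of which files each step's key happens to depend on.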
I find Docker an interesting example of a fundamentally elegant approach that, possibly to facilitate commercialisation, was documented in a way that, instead of imposing any particular culture (such as, say, Nix or Haskell), strongly aligned with the preexisting culture of operations and system administration. The resulting ease of adoption and popularity (and, I assume, broadly deserved financial well-being of the original creators), however, could not come without a variety of footguns along the lines of those you have described.
I feel like "not declarative" is fair enough if you look at how the important bits work in a Dockerfile. Like what software is installed, that's not usually some structured thing...but typically shell commands out to apk or apt-get.
I get why it is the way it is, but if it were more declarative, it would be easier to manage Dockerfiles through changes, security updates, etc.
You can shell out in Nix and Haskell. If they are not “declarative” then what is [both declarative and useful, a.k.a. capable of interfacing with outside world]?
Dockerfile is intentionally bare-bones. It just gives you RUN straight up, no scary hidden option, but it’s up to you how to use it. If you want to write imperative, you can. If you don’t write imperative, pin everything to a hash, shell out only to Dhall and Prolog, it can get very declarative…
…at a cost. The fact that people do not tend to go this route means equally that 1) they are lazy, and 2) using RUN this way is pragmatic.
And I’m saying that it is not fair, for the same reason (as well as for another reason, which in short is “nothing can be both declarative and useful by that logic”).
> Every declaration defines a new immutable layer with its own unique hash...
This blurs the meaning of "declarative" to the point of meaningless.
Consider the following pseudocode:
x = 5
print x
x = 10
print x
This is clearly 'imperative'.
But from the perspective of 'each line of code declares a new program', then we consider the code snippet a declaration of a program.
> Just as in Ansible you can use “declarative” YAML in a very imperative lasagna of a setup
One of the things which makes Dockerfiles imperative is the sequence of Dockerfile commands is significant; if you swap the order of a COPY, RUN to a RUN, COPY, the result changes significantly.
I would also tend to call Dockerfiles imperative on syntactic grounds alone, but I feel they have declarative semantics in a way that's noteworthy.
> But from the perspective of 'each line of code declares a new program', then we consider the code snippet a declaration of a program.
The critical difference is that each line in a Dockerfile yields (declares) a filesystem state which can be referenced and recreated. In contrast, in your example, I have no way to say "give me a snapshot of the system between the third and fourth instructions".
Dockerfiles have all sorts of rules and restrictions that make these semantics possible. You cannot create loops; there is nothing like a "function", at least within the context of one Dockerfile.
> One of the things which makes Dockerfiles imperative is the sequence of Dockerfile commands is significant; if you swap the order of a COPY, RUN to a RUN, COPY, the result changes significantly.
I reject this line of reasoning, simply because declarative languages can indeed be order-dependent.
If you call things “declarative” or “imperative” on syntactic grounds, which would you call Ansible (like Docker, a devops tool, famously using pure YAML but in many cases for basically running a bunch of scripts)?
It’s just not a reasonable way of making the distinction; claiming X is “imperative” because you are using it in imperative ways is logically flawed, it is not a statement of truth about X.
Dockerfile is fundamentally declarative, as you note (that’s just how Docker works: every line describes a layer), and it has not even enough features to make it imperative (control flow? goto?).
I think it is fair to distinguish between syntax and semantics here.
Haskell's "do" notation is frequently described as an imperative syntax for functional/declarative transformations. I would put Dockerfiles in the same boat. They behave declaratively, but users can think imperatively when they write them (to a certain extent) and this is part of what makes them more accessible to newcomers.
Ansible is a great example of the opposite. It looks declarative but, like you said, basically runs a bunch of scripts one after another on a system, with state and all.
This is a deeply mistaken view informed by the cargo-culting devops traditions of yore. A Dockerfile is a declaration of layers that does not resist being used in imperative way, in which sense it’s no different from YAML or any functional language you can think of.
> This blurs the meaning of "declarative" to the point of meaningless
The alternative is to draw a clear line where there is none.
> One of the things which makes Dockerfiles imperative is the sequence of Dockerfile commands is significant; if you swap the order of a COPY, RUN to a RUN, COPY, the result changes significantly.
By that logic you can call every Nix program “imperative”. They are sequences of strings, the order of which matters. The horror!
A Docker image is a stack of layers. Every layer points to a preceding layer. It’s not a huge leap from that structure to a fairly elegant sugar of a string array, where each string is a declaration that applies to preceding layer thus describing the next one. I don’t see anything fundamentally imperative about it; it can be used in imperative ways, but anything can.
> The alternative is to draw a clear line where there is none.
I'd say that more/less "imperative" refers to describing sequences of actions that take place (especially those which might modify some state), whereas "declarative" is more about a structure of what the result should be.
I can agree that a Docker image can be considered as a structure, and that it's possible to construct one declaratively. Whereas I'd say a Dockerfile is an imperative construction of that.
Dockerfile describes a set of immutable layers. Whether you model it in your head as a sequence of commands or as a description of a set of immutable layers, is up to you.
Yeah, that's why I would consider a Makefile declarative even though it can contain bits saying how to do something. I can't consider a Dockerfile declarative simply because it is necessarily executed top to bottom.
An array can be declarative even if it has order. Think of the ordered list of statements in a Dockerfile as syntax sugar for a linked list of (previousLayerPointer, nextLayerDeclaration) tuples. I expanded on this in another comment.
If you're going to make absolutist statements about nitpicky minutiae, you have to get it right.
In point of fact the image side of a Dockerfile, where an image is a DAG of other images referenced by an immutable ID or pointer to hosted content, is "100%" declarative. It's only the "build" syntax that is ordered.
If you're going to be pedantic about minutae, you have to get it right.
The "image side of a Dockerfile" isn't a Dockerfile, it's an image, more specifically an image in the OCI Image Format [0]. A Dockerfile is just the most common syntax for controlling software that can create an OCI image (such as Docker and Podman).
You could argue that the OCI Image Format is declarative, but that's not relevant to OP's comment about Dockerfiles.
In all the real-world cases I've seen, the base images in Dockerfiles are just tags, which are mutable (especially the :latest tag which changes with every release).
Unfortunately it also is a daemon that constantly runs on your machine even when you don't use Docker at all.
Which keeps all kind of mounts and virtual networks alive that you don't need and that interfere with other things you want to do.
For example I was on a train in Germany recently and could not use the Wifi. Why? Turned out the docker daemon occupied the IP range the Train Wifi uses. While I was not using docker at all.
First, Kubernetes is a resource allocator/scheduler. Find a place for this workload given my constraints. If something goes wrong, find a new place.
Second, Kubernetes is a dev-friendly app management tool. Start 5 copies of this app, restart if it fails, mount this config file, etc. Most Kube apps look roughly the same across teams and companies, so skills are transferrable.
Third, the Kubernetes API is a standardization layer between compute/cloud capabilities.
Fourth, the Kubernetes Container <foo> Interface (CxI) is an implementation layer for plugging into compute capabilities. You can swap Docker, CRI-O, etc. via the Container Runtime Interface (CRI). Use cloud block storage via the Container Storage Interface (CSI).
Fifth, Kubernetes custom resources (CRDs) are a standardized extension point for building custom behavior into a cluster in a sane way.
And across these four things, a large number of 3rd-party products and open source utilities do not recognize the difference between using Docker with and without Docker Desktop. Most only work without Docker Desktop, and don't even realize that an environment with Docker Desktop completely breaks their offering.
Missing a mention of buildkit, which compiles various high-level build description languages (including Dockerfile) into a low-level build graph execution, which supports all of the bells and whistles of advanced build infrastructure, such as remote execution and remote caching.
I am particularly interested in build systems, so I am biased, but I think this is one of the coolest things coming out of Docker. It is a bit painful to see so much fanfare at DockerCon about Docker Desktop stuff, and so few mentions of more foundational things like buildkit and rootless container support.
I remember when I was diving deep into Docker for the first time a few years ago, I would have really appreciated seeing something like this. I wrote something kind of similar in a blog post [0], but that was only a semi-confident note to self that took quite a bit of digging through READMEs and GitHub issues. All the different container runtimes/engines/interfaces are really enough to make your head spin.
Most commonly used languages are available on all platforms. Together with the vast cornucopia of their libraries. So if I want to develop on a Mac or Windows, as is common, native versions are available for either of them.
For those who target Linux as the deployment platform, as is common, Docker instead forces you to run the Linux version of interpreters and libraries on a Mac or Windows, on top of a Linux mini VM. Could it not be otherwise?
i feel like this author does not fully comprehend the topic (docker)... dockerfile is NOT declarative.
"Docker Engine which takes a Dockerfile and runs it on a Linux host natively" no, not really, you have to build the Dockerfile into a container image first. There are a lot of different steps in between.
This post is four things and i'll leave it to the reader to figure it out
Docker Desktop is too narrow to include this way. You might as well call it Docker Runtime or something. Personally I would just include it as part of the Docker Engine, along with the CLI and the Go SDK, which can equally be used to spawn docker containers. So Docker is 3 things.
Docker is an Engine and a Hub. The configuration file concept is common to any software which automates machine virtualization. The commercial emphasis behind Desktop, and the existence of alternatives like the CLI or Portainer, make that part dispensable.
As an aside, I love the design of this mini-blog format. And I really appreciate the author who says just one thing and doesn't have the need to generate walls of meaningless text to convey it.
I just did a post about this on LinkedIn (gotta build that cred).
Containers aren't a new concept. But in today's world, we're usually talking about the intersection of three Linux features:
* namespaces bundle resources (processes, disk, I/O, etc.) together, isolating them from each other. They're a teacher with a class full of rowdy kids.
* cgroups (control groups) limit and audit how many resources a group of processes can use. They're traffic cops.
* Union File System sits on top of another file system. Anything in the underlying file system is visible, but writes and new files only appear in the UnionFS mount. This layer can be thrown away at the end of a session, leaving the base filesystem pristine. It's the tracing paper you used as a kid when you were learning to draw.
Processes in a container run on the same kernel as the other apps you run, but they're isolated, controlled, and kept from scribbling all over the disk. Add some tooling and standards and you have the foundations of modern #softwaredevelopment !
This is a big over-simplification, but I think it's a good mental model:
Think of a container as a fancy chroot.
Imagine that:
1. You have a whole separate Linux installation in some random folder.
2. You ask your OS to launch some process, and to "trick" the process into thinking that "$YOUR_FOLDER" is actually "/".
3. You maybe also ask your OS to add some extra isolation (network, devices, /proc, etc).
4. The term "container" refers to the chroot, the isolation rules, and usually the process-tree running inside of it.
5. While a container is alive, you can also ask the OS to run some other programs that exist inside of the container's sandbox
Docker is a tool that creates and manages containers. In Docker's view of the world:
- Containers are made by extracting .tar.gz files into some folder and asking the OS to create a sandbox where that folder is a "fake" root directory.
- Those .tar.gz files are called "images".
- Docker can download images from the internet, or you can build your own by asking it to follow the steps contained in a Dockerfile.
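That last view can be sketched with the standard library (a toy model; real Docker uses OCI tarball layouts, overlay filesystems, and namespace isolation rather than a plain extract): an "image" is a tar archive, and creating a container starts with extracting it into a directory that will serve as the fake root.

```python
import io
import tarfile
import tempfile
from pathlib import Path

def make_image() -> bytes:
    """Build a tiny 'image': a tar archive holding a root filesystem."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"#!/bin/sh\necho hello\n"
        info = tarfile.TarInfo("bin/hello")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def create_container(image: bytes) -> Path:
    """Step 1 of 'running' an image: extract it into a folder that becomes
    the container's root. (A real runtime would then chroot/pivot_root into
    it and apply namespace isolation -- omitted here, since that needs
    root privileges.)"""
    root = Path(tempfile.mkdtemp(prefix="container-"))
    with tarfile.open(fileobj=io.BytesIO(image), mode="r:gz") as tar:
        tar.extractall(root)
    return root

root = create_container(make_image())
print((root / "bin/hello").read_text())
```

Everything past the extract (the sandboxing, the fake root, the extra isolation) is the part the kernel provides, which is why the same image runs under docker, podman, or any other OCI runtime.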
It's an individual OS with a filesystem of its own, based on the "image": an OS plus, if you choose, some extras pre-installed based on your project's needs.
You can link your laptop or server (AKA your "host" computer) to it through Docker Volumes. And, you can link containers together through Docker Networks, for example.
So, you can see how a single compartmentalized OS (a "container") can: 1. interact with another computer's file system and 2. interact with another computer on a network. And, each one can be quite customized.
we need more clean and short post like this
Docker is 4 things (clear title)
state of the art? / information needed
-First, the Dockerfile file format for declaratively describing a machine (operating system, installed packages, processes, etc).
-Second, the Docker Engine which takes a Dockerfile and runs it on a Linux host natively, without a virtual machine.
-Third, the Docker Desktop app which takes a Dockerfile and runs it on a Mac or Windows host, using a Linux virtual machine.
-Fourth, the Docker Hub container repository which allows a community to share Dockerfiles for common configurations.
problem that could arrive/why this is important
Often, when people evaluate Docker or make statements about "Docker," they're referring only to the Docker Engine.
conclusion
But most of the time, this is not a useful view to take. Most of the time, the practicality of Docker comes from a combination of all four of these things.