It's not often that an open-source tool or technology emerges and comes to a position of mainstream industry acceptance within a span of two years. Docker is one such technology that fits within that rarified category, and it deserves discovery and discussion. However, owing to Docker's roots, it often remains entirely a mystery to those developers who find themselves among “the Microsoft crowd,” particularly when the root of Docker's benefit - providing isolation at the computer or operating-system level - already seems to be covered by Hyper-V and its ilk. Why all the hubbub? What compelling capability does Docker offer that managing Hyper-V virtual machines doesn't?

Related CODE article: Docker for Developers

This article seeks to do two things: one, to get the Docker neophyte up and running with it, covering most of the “basics” you'd need to understand in order to use it for several simple purposes; and two, to explain how Docker differs from what appear to be its contemporary competitors/alternatives, and why Docker serves as a useful tool beyond the traditional Hyper-V story.

Let's begin.

Installing

Getting Docker onto your system is pretty trivial; by visiting the Docker home page, you can download the official Docker installer for Windows. The Docker download is available at https://docs.docker.com/install/. This installs the official “Docker for Windows” release, which will, among other things, start a process running in the background on your computer. This is the Docker daemon, and it's the daemon that does the majority of the work. When you use the Docker command-line tool, it sends instructions to the daemon to carry out. If, for any reason, the daemon isn't running, the Docker command-line tool errors out without being able to do any meaningful work.

One thing that's important to note about Docker for Windows: Unlike Docker on its original platform (Linux), Docker for Windows requires the use of some virtualization technology to carry out its agenda. If you have Hyper-V enabled, Docker for Windows can use that; however, older versions of Docker used VirtualBox (an open-source virtualization tool now managed and maintained by Oracle), and that is also usable, if you desire. The key thing to understand is that Hyper-V and VirtualBox cannot both be enabled on the same computer at the same time - it's a choice of one or the other, never both. (This is due to limitations in the virtualization stack, and not something that's easily fixable at this time.)

Assuming that the installation has worked, Docker for Windows is now running on your computer. To verify that it's installed correctly and running, the easiest thing to do is ask it to report the version installed:

docker -v

Docker responds with something similar to the following:

Docker version 17.09.1-ce, build 19e2cf6

Docker's version numbering scheme informs you of major and minor versions, and the fact that this is the “Community Edition” (hence the ce suffix). It also reports, as you can see, the build number, but this is generally of little use except in error log reports in the event something goes wrong.
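If you want to confirm that the daemon itself is reachable, and not just the command-line client, the docker version subcommand gives a fuller report, listing both a Client and a Server section; if the daemon isn't running, the Server section is replaced by an error.

docker version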

Exploring

To begin understanding the promise and potential of Docker, let's use it to do something.

In particular, let's do the traditional “Hello World” of Docker, which is to run its aptly named hello-world image:

$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to
be working correctly.
To generate this message, Docker took the following
steps:
( ... snip ... )

Docker's hello-world image is cute, but hardly functional. Let's do something slightly more ambitious. The Docker hello-world image suggests running a bash shell inside of an Ubuntu image, but you can do something a little different and closer to home:

docker run -it microsoft/dotnet
( ... snip ... )
root@83ea2918a67f:/#

What's happening here is both obvious and deceptive: The Docker daemon downloads a collection of filesystem images to your computer. Each filesystem image is, at heart, a delta of a previous image; in essence, a Docker image is made up of a series of filesystem diffs, in the same way that Git and other source-control systems have been keeping track of the changes to your source code for years. Each filesystem delta is downloaded, applied in order, and the end result...well, the end result is an executable environment in which you can run a command. In this particular case, the command is implicitly bash, which is the standard command shell in this particular image, and, because this is the standard Microsoft .NET Core Docker image, it has all of the .NET Core tools installed in it. Specifically, you can run dotnet help or any of the other .NET Core commands, including generating a new .NET project, should you choose. There's not much point to doing that at the moment, as there are a number of other things you need to do first, but the point is this: Docker can act as a virtualization tool, providing an environment in which you can execute commands and run processes. Or, to be more precise, a single process. But more on that in a moment.
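For instance, once that prompt appears, you can poke at the tooling right away. The exact commands available depend on which tag of the image you pulled, but on the default (SDK-equipped) image, something like the following works; the /tmp/probe directory is just an arbitrary scratch location for illustration:

root@83ea2918a67f:/# dotnet --info
root@83ea2918a67f:/# dotnet new console -o /tmp/probe
root@83ea2918a67f:/# exit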

Docker can act as a virtualization tool, providing an environment in which you can execute commands and run processes.

The command-line you used is important to parse. Like so many other command-line tools, a Docker command consists of a verb (run) and a target, in this case, the image to run (microsoft/dotnet). That image is stored in a cloud Docker repository called the Docker Hub, and it's accessible to anyone who wishes to use it. This is the central repository for all public Docker images, and holds the “official” images for a number of different tools, not just .NET Core. Numerous open-source tools, such as OpenJDK, MongoDB, Neo4J, Postgres, and others, are stored here, always available for use.

The naming convention takes one of two forms. If the image is an “official” DockerHub image, it has only one part to the name, such as mongo or postgres. If the image is managed by a third party, like, in this case, Microsoft, it has a two-part name, such as microsoft/dotnet. The third-party status isn't reserved exclusively for big companies. As a matter of fact, should you feel compelled, you can run the same kind of command using one of the Docker images that I've hosted on DockerHub, such as my image that hosts Common Lisp.

docker run -it tedneward/clisp
clisp@83ea2918a67f:~$

Ditto for any of my other images: rust, ooc, and purescript, plus a few more mainstream ones. The point is, the images are stored in DockerHub, ready for anyone to use.

Assuming that you decide you'd like to spend some time with Lisp, you could make use of my Common-Lisp Docker image. You can run the above, but there's a huge problem with that: Any changes you make to the filesystem are wiped out as soon as you exit the Docker image. Surely there has to be a way to preserve those changes.

In the Docker world, this means “mounting a local volume.” Put differently, you want to tell Docker to mount a directory on the computer to a mount point inside the Docker image. Contrary to expectations, the mount isn't part of the Docker image itself, but something that you specify when you start the image. In other words, you need to pass another command-line parameter to tell Docker where the volume lives on the host (the physical computer) and where its mount point should be inside the image.

In this particular case, I've created the image to have an empty home directory (/home/clisp inside the image) so that I have a place to mount a directory from the host computer. Doing that requires the use of the -v (volume) parameter in Docker, which looks something like this (be sure that the command is all on one line):

docker run -it -v $(pwd):/home/clisp tedneward/clisp
clisp@50c812137d8b:~$

Now, any changes that you make to your current working directory are reflected inside the Docker image, and vice versa. For example, if I fire up Visual Studio Code as an editor from the host and create the following file:

(defun hello (name)
  (format t "Hello, ~a" name))
(hello "Ted")

Then save it to the current directory as hello.cl. Inside the Docker image (which you previously fired up using the run command), you can run the Common Lisp code and see the lovely results:

clisp@98a2e68ea7a1:~$ clisp hello.cl
Hello, Ted
clisp@98a2e68ea7a1:~$

Congratulations! You're now officially a Lisp programmer and can add that to your resume. What's more, the image is stored to your local hard drive, so you can fire it up any time you like, including when you're offline. When you exit the image, any files stored in the /home/clisp directory (or any of its subdirectories that you might create along the way) in the image will be written to the host filesystem and thus preserved after the image terminates. The Docker documentation has a great deal more to say on this subject, including some suggestions for how best to manage databases that store a ton of data, but this suffices to make the point that Docker filesystems are ephemeral, and only mounted volumes persist beyond the image's execution. On the surface of it, that seems odd for a virtualization technology.

What's also odd is that you can't do anything inside the image that isn't restricted to a terminal shell. You can't fire up VSCode inside the image, for example, because there's no GUI system inside the image by which to display the VSCode windows. This, too, seems odd for a virtualization tool. It's almost as if we're back to the 1990s, with terminal shell access over the network as the only way to reach a computer. Even a decade ago, you could use remote-connect tools like VNC or RDP to connect GUIs to a remote computer, so what gives? (Hold onto that question for a moment.)

One Docker for Windows quirk also deserves mention. Docker for Windows can only use host drives/directories that are available under Windows Filesystem Sharing. By default, the Docker for Windows installation process tries to automate this so it's seamless to the developer, but if you try to use a host directory on a different drive than the boot (C:) drive, you can start to run into errors from the client. If you run into issues, try mounting a host directory under your user account's home directory (C:/Users/Ted/...) to see if that corrects the problem.
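For instance, to stay on the safe side of the filesystem-sharing rules, a Windows-side mount of the Lisp working directory might look something like the following. The C:/Users/Ted/lispwork path is hypothetical, and the exact drive-path syntax can vary a bit between Docker for Windows versions:

docker run -it -v C:/Users/Ted/lispwork:/home/clisp tedneward/clisp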

Even without these limitations, Docker wouldn't be all that useful if the only thing it could do was mount local filesystems; fortunately, Docker can also open ports in the image that connect to actual ports on the host. Doing so requires another mapping parameter, -p, which maps a port on the host to a port in the image. Common-Lisp is wonderful, but most readers are going to be more familiar with .NET Core, so let's turn back to the official Microsoft .NET Core image, create a new ASP.NET MVC Web app with only the scaffolded files, and run it.

To do this is actually pretty straightforward. Use the Docker command-line shown earlier to launch the Docker image, but with a volume parameter to map to the current working-directory, and the port parameter to map .NET Core's default port of 5000 in the image to the host computer's port of 5050 (or any other open port of your choice). The first step is to launch the .NET Core image. (All of the following code should be typed on one line, allowing it to wrap as needed.)

docker run -it -v $(pwd):/home/dotnet -p 5050:5000 microsoft/dotnet

This brings up the .NET Core image. From here, you can run dotnet new mvc -o hello, which scaffolds out the basic template, then change into the hello directory and dotnet run, which starts listening on the image's port 5000 for incoming traffic.
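Spelled out, the sequence inside the container looks something like this (a sketch; the app name and the /home/dotnet directory come from the command line above):

cd /home/dotnet          # the volume mounted from the host
dotnet new mvc -o hello  # scaffold the MVC template
cd hello
dotnet run               # listens on port 5000 inside the image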

Note that as of this writing, Docker for Windows and Docker for Mac still have some quirks with networking that could reasonably be called bugs. In particular, Docker doesn't seem to always do machine-IP forwarding correctly, so instead of being able to browse http://localhost:5050 on the host to see your scaffolded ASP.NET app, you may have to discover the Docker image's IP address on the network and use that directly instead. This is one of the weaker areas of Docker, and the Docker community is working to improve the experience here, but it's not unusual for Windows or Mac developers to end up having to do a little research to get things working. What's worse, any advice I might proffer here could easily be out-of-date by the time you read this. (This is, in many ways, Docker's worst story.)
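If you do end up needing that address, docker ps lists the running containers, and docker inspect can pull the IP out of the container's metadata. Something along these lines usually does the trick against the default bridge network (a sketch; substitute your actual container ID):

docker ps
docker inspect --format '{{ .NetworkSettings.IPAddress }}' <container-id>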

If I were to stop here, it would be an interesting - if limited - virtualization story. After all, a virtualization tool that can't track the changes to the filesystem, for example, is going to run into a number of limitations pretty quickly. And no GUI? This is not exactly a compelling discussion. Let's see what else we can do.

Building

Part of Docker's appeal is that it doesn't consist only of a virtualization layer. In many respects, Docker is about capturing not only the operating environment (such as the operating system and hardware), but also the application code and the immediate context required to run it.

Consider, if you will, the average ASP.NET application that we stand up, even something relatively simple like the ubiquitous “CRUD app over Northwind database” demo that has been the staple of every .NET column author for the last two decades. If you want to see that demo in action, frequently it requires that you download the Northwind dataset sample, stand up your own instance of SQL Server (regardless of which edition), run the scripts to install the data into the database, download the code for the demo, configure the ASP.NET configuration files to point to your particular instance of SQL Server, and so on. As much as you might like to pretend that the code is what's important, many (if not most) of the issues getting the application up and running have nothing to do with code. They are often configuration issues like connection strings, passwords, and the rest. Not to mention all the time required to do that configuration.

This is where some of the Docker peculiarities start to make sense. Docker lets us capture both the code (usually after a build is done, if compilation is required) and the configuration files as part of one of those filesystem deltas, pull the filesystem down to any Docker-enabled computer connected to a Docker repository (such as DockerHub), and deploy exactly the same bits in exactly the same way, over and over and over again.

It is, in short, the ultimate delivery artifact.

Or, to be more precise, Docker allows you to build a file that can create that ultimate delivery artifact, which can then be loaded and executed. This is partly why so much of the Docker discussion comes hand-in-hand with cloud providers. Instead of having to configure the cloud virtual machines, if the cloud VMs are simply running Docker daemons, they can each pull down the complete image and start execution from that known state. (Databases and other data-storage configurations are still a little tricky in Docker, but I'll talk about ways to handle that in a second.)

The secret to all of this is the Dockerfile, a text file containing a sequence of statements that describe how to build the image. Consider, for example, the Dockerfile from my Common-Lisp image of earlier (the trailing backslashes indicate that a single RUN command continues onto the next line):

FROM ubuntu
MAINTAINER Ted Neward <ted@tedneward.com>
RUN apt-get update && \
    apt-get install -y clisp
RUN groupadd --gid 1000 clisp \
    && useradd --uid 1000 --gid clisp --shell /bin/bash --create-home clisp \
    && chown -R clisp:clisp /usr/local
USER clisp
WORKDIR /home/clisp
ENTRYPOINT ["/bin/bash"]

For a full reference to the Dockerfile syntax, see the Docker documentation, but in a nutshell, here's what's going on.

For any public image in DockerHub, you can click on the “Dockerfile” link on its home page and see the syntax of the Dockerfile that built that image. Feel free to poke around!

First, this Docker image should be based on an existing image in the Docker repository called ubuntu. This is, not surprisingly, a standard image of the Ubuntu operating system, and I find it easy to work with for some of the images I want to create. The drawback to Ubuntu, bluntly, is that it's large and includes a number of tools and commands that won't be necessary in your typical headless production environment. For this reason, many Docker enthusiasts prefer a different flavor of Linux called Alpine Linux, which is a bare-bones OS and not much else. Were I building an image that I wanted to ship to a cloud cluster, I'd probably prefer that approach.

The MAINTAINER line is simply metadata recording who's responsible for the image; it has no effect on how the image behaves.

Each of the RUN lines is a command executed in the newly forged image. The first asks the standard Ubuntu package manager to update itself and then install the Common-Lisp package. The second takes the time to create a user and group on the filesystem specific to clisp; it's not strictly necessary, but by default these commands (and the process that you launch) run as the root user inside the image. Given that this image is intended for interactive (rather than production) use, I'd prefer not to be root while I'm toying around with Lisp. More importantly, each RUN command is “checkpointed” as a new filesystem layer inside the image, so there's a marginal benefit to keeping the number of RUN commands in a single Dockerfile small.
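You can see those checkpoints for yourself once the image is on your computer: docker history lists the layer that each Dockerfile instruction produced, along with its size.

docker history tedneward/clisp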

The USER command tells Docker to switch over to using the clisp user instead of root, and WORKDIR indicates that the working directory when the image launches its process should be the /home/clisp directory. Again, this is appropriate for an interactive image like the one I'm building here. Lastly, the ENTRYPOINT directive tells Docker to run /bin/bash, the interactive shell, as the command to launch the image.

The key thing to understand here is that a Dockerfile consists of basically two things: a series of commands and statements about how to set up the environment, and a single command to launch a single process. This is what happened earlier when you first ran the Docker hello-world image; the image did the setup necessary to get to the point of being able to run the command that printed “Hello from Docker”, and then the process terminated and Docker returned control back to you. Often, these Docker processes are long-running (“infinite”) processes like a Web server that should never terminate. In fact, that's so often the case that Docker assumes you don't want to interact with the running Docker image by default, and you have to tell Docker to keep stdin open for interactive use (the -i parameter to the docker run that you used earlier) and to allocate a pseudo-terminal so that you get a usable command prompt (the -t parameter).

Let's look at a more reasonable example of Docker use for an application.

Normally, you'll do development on a local computer, where you have access to all the usual goodness that you've come to expect, a la Visual Studio or VSCode. Assuming that you build an ASP.NET Core application, you'll ask it to publish the resulting build files to a known location on your computer, then tell Docker to build an image consisting of those build files. (The actual ASP.NET app doesn't matter for this scenario - anything will do.)

First, to generate the ASP.NET Core app, let's just do the traditional MVC app: dotnet new mvc -o hello, or File > New > Project from inside of Visual Studio, whichever floats your boat. (Remember, all of this is on the host computer.) Now, however, publish the app via either the Visual Studio GUI or dotnet publish -o ./published, which puts all the relevant files into a local “published” directory.

Next, you need a Dockerfile that tells Docker to build an image containing these bits. You could use the microsoft/dotnet .NET Core image, but that means you'd have to do any standard ASP.NET configuration/commands in every Dockerfile you build. Microsoft beat us to that particular punch by publishing the microsoft/aspnetcore image. It's probably a good idea to have your own directory inside the image that contains all of your published code, so let's create one and call it app, and copy all the published files there. Lastly, you'll need to tell Docker to fire up dotnet to run the published assembly that contains the application code, which, in the case of the hello app will be hello.dll. The Dockerfile looks like this:

FROM microsoft/aspnetcore:2.0
WORKDIR /app
COPY ./published .
ENTRYPOINT ["dotnet", "hello.dll"]

And you're done. Almost.

Docker doesn't want to have to rebuild the image every single time you tell it to run an instance of the image, so Docker wants you to compile these Dockerfiles into images. As a matter of fact, I've been a little loose with my terminology; an image is the filesystem contents that will be loaded into a container and executed. Running instances are called containers, and the thing they're running is an image, in much the same way that objects are the running instances of a class.

You need to build this Docker image, and you do that by running docker build . in the directory containing the published directory and the Dockerfile (which, by convention, is assumed to be named “Dockerfile”). Note that if you leave out the “.” from the command, Docker errors out because it needs to be told where to find the Dockerfile. It doesn't assume the current directory.

Assuming that everything works, you should see:

$ docker build .
Sending build context to Docker daemon  6.152MB
Step 1/4 : FROM microsoft/aspnetcore:2.0
---> bb8bdc966bb5
Step 2/4 : WORKDIR /app
---> Using cache
---> dd0676b321cf
Step 3/4 : COPY ./published .
---> c21725750682
Step 4/4 : ENTRYPOINT dotnet hello.dll
---> Running in 4a856a786707
---> 89ecf7a7917e
Removing intermediate container 4a856a786707
Successfully built 89ecf7a7917e
$

Yikes. It built, but Docker, by default, left the image identified only by the ID 89ecf7a7917e, which is not exactly a user-friendly name. Fortunately, you can name images using the -t parameter to docker build. Thus, docker build -t myaspnetapp . does the same thing again, but this time tags the image as myaspnetapp.
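As a rough sketch, rebuilding with the tag and then running the result looks like this. The microsoft/aspnetcore base image defaults to listening on port 80 inside the container (check the image's documentation if in doubt), so the port mapping differs from the earlier raw-dotnet example; the -d flag runs the container in the background:

docker build -t myaspnetapp .
docker run -d -p 5050:80 myaspnetapp
docker ps    # confirm the container is up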

Now, an image called myaspnetapp is sitting on your local computer. But this is hardly the point - if Docker is a deployment mechanism, how do you deploy?

Publishing

The DockerHub is only one of many places to which a Docker image can be published, but most work in much the same way, so you'll use it as the example. First, in order to make sure that your app doesn't conflict with anybody else's app on the DockerHub, the name of the image should be prefixed with your unique username on DockerHub. To do that, two things need to happen: you need to obtain credentials on DockerHub (go to http://hub.docker.com and create a free account), and the image needs to be rebuilt under the two-part naming scheme from before (“tedneward/myaspnetapp”, in this case).

Once that's done, from the command-line, you can log into DockerHub by issuing “docker login” and following the prompts. After that, the image can be uploaded to DockerHub by issuing “docker push tedneward/myaspnetapp”. This goes through a similar set of steps to building the image, uploading each filesystem delta as necessary, and concluding by printing a hash of the uploaded image for verification. If you visit the DockerHub page for your account, the new Docker image should be uploaded, ready, and waiting for anybody else to grab and run. They just need to do as you started with: “docker run tedneward/myaspnetapp”, and lo, they have a copy of the application - fully provisioned, configured, and ready to receive incoming requests - up and running. (To be fair, if it's an ASP.NET app, they'll need to configure the local-to-guest port mapping, but that's still a great deal less work than the pre-Docker alternative.)
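In command form, the whole publish sequence is short (substitute your own DockerHub username for mine):

docker build -t tedneward/myaspnetapp .
docker login
docker push tedneward/myaspnetapp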

Wrapping Up

Where does that leave us with respect to Docker?

This is where the fun begins for Docker. A docker build is often the end-step in a CI pipeline, so that the result of the build is a complete application and environment. The Dockerfile is most certainly a source artifact, so it needs to be kept in sync with any changes made to the project - if you introduce a configuration file, for example, you'll need to make sure it's reflected somewhere inside the Docker image - and needs to be checked into source control so that the CI pipeline can invoke “docker build” to produce the resulting image.
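As a rough sketch, the tail end of such a pipeline might be nothing more than a few commands in a build script (the tag name is an assumption; adapt it to your own project and registry):

dotnet publish -c Release -o ./published
docker build -t tedneward/myaspnetapp .
docker push tedneward/myaspnetapp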

But it gets more interesting from there. If your project, like so many, is a distributed system, each node in the system can be represented by a separate and standalone Docker image. Your database, your messaging server, even a file server - these can all be packaged up as Docker images, which gives you tremendous flexibility in how the system runs during development: parts of it can run locally on your computer while other parts run in a cloud environment, and so on. This forces you to deal with how to “discover” those nodes far earlier in the project, which has the lovely side-effect of making your project more resilient in the face of change.

Now the project consists of multiple Docker images, and starting and stopping all of them in sync can be a pain. It's the Docker community to the rescue! Several projects aid with managing the lifecycle of multiple containers together: Docker Compose can bring an entire multi-container system up with a single “docker-compose up” command, while orchestrators such as Kubernetes and Docker Swarm schedule and run those containers across a whole cluster. (Which of those two orchestrators will “win” in the end is a hot debate right now.)
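For a taste of what that looks like, here's a minimal docker-compose.yml sketch that pairs the ASP.NET image from earlier with a Postgres database; the service names, ports, and password are all hypothetical:

version: "3"
services:
  web:
    image: tedneward/myaspnetapp
    ports:
      - "5050:80"
    depends_on:
      - db
  db:
    image: postgres
    environment:
      POSTGRES_PASSWORD: example

With that file in place, docker-compose up starts both containers, and docker-compose down tears them back down.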

This wouldn't be a Docker article if the word “microservices” didn't get mentioned at least once, but there's a reason for that. The single-process nature of Docker encourages smaller deployment targets. The fact that a Docker container is built around running a single process means that developers and architects have to figure out what that one process will do, which encourages a smaller, more single-purpose-focused design.

The “micro” in microservice doesn't refer to the amount of code, but rather the surface area of its responsibilities; a microservice should be focused on one, and only one, domain concept, rather than putting all the services into a single deployment target.

Lastly, using Docker means never having to write a Microsoft Word doc with install instructions ever again. Combined with the fact that every major cloud vendor now supports Docker directly, developers can now make sure that the code that they write will be deployed exactly the way it needs to be in order to run correctly. In essence, Docker allows us to package up the code and its immediate surrounding environment into an image deployable to anywhere, from anywhere, and it will simply work.

And that, my friends, was the point all along.