Distributing confidential Docker images

Here’s another pet peeve of mine with Docker: the infrastructure for distributing images is simply Not There Yet. What do we have now? There’s a public image index (I still don’t fully get the distinction between the index and a registry, but it looks like a way for DotCloud to keep a centralized service in the loop even for private images). I can run my own registry, either keeping access completely open (limited only by IP address or network interface), or delegating authentication to DotCloud’s central index. Even if I choose to authenticate against the index, there doesn’t seem to be any way to actually limit access to the registry itself: it looks like anyone with an index account and HTTP(S) access to the registry can download or push images.

There doesn’t seem to be any way in the protocol to authenticate users against anything other than the central index, not even plain HTTP auth. Just to get HTTPS, I need to put Apache or nginx in front of the registry. And did I mention that there is no way to move complete images between Docker hosts without a registry, not even as a tarball export?
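
The HTTPS part alone is at least easy to bolt on. Here is a minimal sketch of terminating TLS in nginx and proxying to a registry listening on the loopback interface; the hostname and certificate paths are placeholders, not taken from any real setup:

    # Minimal TLS termination in front of a local registry; hostname and
    # certificate paths below are placeholders.
    server {
        listen 443 ssl;
        server_name registry.example.com;
        ssl_certificate     /etc/nginx/ssl/registry.crt;
        ssl_certificate_key /etc/nginx/ssl/registry.key;

        location / {
            proxy_pass http://127.0.0.1:5000;    # the registry itself, plain HTTP
            proxy_set_header Host $http_host;
        }
    }

Of course, this only takes care of transport encryption; it does nothing about the missing authentication.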

I fully understand that Docker is still under development, and the fact that these are the problems worth complaining about suggests there aren’t many bigger showstoppers, which is actually good news. Still, this seriously limits Docker’s usefulness in production environments: I either have to give up control over who can download my images, or I have to build each image locally on every Docker host, which prevents me from building an image once, testing it, and then running the very same image everywhere.

And the distribution problem is not only about in-house confidential software. A lot of open source projects run on Java (off the top of my head: Jenkins, RunDeck, Logstash + Elasticsearch, almost anything from the Apache Software Foundation…). While I support OpenJDK with all my heart, Oracle’s JVM still wins in terms of performance and reliability, and its license only permits distributing it internally within an organization. I may also want to keep my Docker images partially configured: the software itself is open, but I’d prefer not to publish internal passwords, access keys, or IP addresses.

I hope that in the long run it will be possible to exchange images in different ways (plain old rsync, distribution via BitTorrent, a git-annex network, shared filesystems… I could go on and on). Right now I’ve found only one way, and it doesn’t seem obvious, so I want to share it. Here it is:

Docker’s registry server doesn’t keep any local data; everything it knows lives in its storage backend (an on-disk directory, or an Amazon S3 bucket). This means it’s possible to run the registry locally (bound to 127.0.0.1) and move access control to the storage backend: instead of controlling the Docker client’s access to the registry, you control the registry’s access to the storage. The shared storage can be a network filesystem (GlusterFS, or even NFS), an automatically synced directory on disk, or, which is what I prefer, a shared S3 bucket. Each Docker host runs its own registry, attached to the same bucket with a read-only key pair (so that it cannot overwrite tags or push images). The central server that builds and tags images is the only one with write access. Images stay confidential, and there is even a crude form of access control (read-only vs read-write). It’s not the best-performing way to distribute images, but it gets the job done until there’s a more direct way to export and import a whole image.
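
To make this concrete, here is roughly what the setup looks like. Treat it as a sketch rather than a copy-paste recipe: the registry image name (3ofcoins/registry) is the one I use, the bucket name and keys are placeholders, and the environment variable names follow the stock docker-registry’s S3 settings, so adjust them to whatever your registry image expects:

    # On every application host: a local registry, bound to the loopback
    # interface only, attached to the shared bucket with the READ-ONLY key pair.
    docker run -d -p 127.0.0.1:5000:5000 \
      -e SETTINGS_FLAVOR=s3 \
      -e AWS_BUCKET=my-private-images \
      -e AWS_KEY=AKIA...READONLY \
      -e AWS_SECRET=... \
      3ofcoins/registry

    # Pulling goes through the local registry and never touches the public index:
    docker pull 127.0.0.1:5000/myapp

    # On the single build host, the same container runs with the read-write key
    # pair, and that host is the only one that tags and pushes:
    docker tag myapp 127.0.0.1:5000/myapp
    docker push 127.0.0.1:5000/myapp

The read-only keys are what enforce the crude access control: even if someone compromises an application host, the worst its key pair allows is reading images that host could pull anyway.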

I hope this approach is useful; have fun with it!

3 thoughts on “Distributing confidential Docker images”

  1. This comment was sent to me by e-mail due to an issue with the comment form (I’ll check it out soon). I’m pasting it here, as it mentions an interesting project, though you can see it’s quite young and I haven’t fully reviewed it yet:

    I know your frustration on this matter; it’s the same frustration that
    made us build a business out of it, because we realized we can’t be the
    only ones having these issues.

    Initially we went with nginx proxying for Docker, which was running on
    the default port (5000). The problems started when we wanted to make the
    private registry actually “private”. According to the guides, Docker is
    supposed to work with a basic-authentication proxy such as nginx, but it
    wasn’t working properly at all, so we looked at the registry code as an
    alternative, and it had no authentication in place, just like you said.

    Even if we had managed to make authentication work, there was another
    issue: once authenticated, a member can download repositories from other
    users, because there’s absolutely no checking being done.

    That’s when we started rolling our own version, and we added
    authentication as well as permission sharing: a user starts with full
    access to his own repository, but he can also share read or write access
    to it with other members. This is very useful for teams and
    organisations.

    Thank you for taking the time to read this, and please share your
    thoughts, as they matter to us. You can find our website at
    http://www.dockify.it.

  2. Thanks for your post. Authentication really is a big issue; when I started learning about Docker and saw the “private registry” feature, I assumed I would be fine.

    For now, is there any reason why I shouldn’t rely on the following: “I can run my own registry … with access limited only by IP …”? I will just have to allow access from all my deployment servers’ IPs. It does seem silly that the “export” command only exports the filesystem and does not create an “image” file that I could transfer to another machine and “import” from.

    Another thing to bear in mind is that you DO NOT want to push your registry container (with any S3/Glance/etc. authentication configuration baked in) to the central index. If you are only using local storage you should be fine, but otherwise you will have to build and run that container on the machine you wish to deploy on. I wouldn’t be surprised if a few people have already made that mistake, allowing everyone to download and read their AWS credentials.

    • Limiting access by IP should work, as long as it’s convenient for you; it gets annoying if the cluster changes often enough. At one of my projects I automatically spin up Jenkins slaves on EC2 and stop them when they’re not needed: the IP changes with every start, and the old IP can be reused by someone else’s instance. I’m using mixed hosting (EC2 and Hetzner), so I can’t rely on EC2 security groups alone. At the moment the slaves don’t need to run Docker, but if/when they do, I’ll need some other way to manage access. I’d probably run the registry on localhost and use SSH port tunnelling to make it available on each slave, along the lines of the sketch below.
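
      Just to give an idea (hostnames are made up), the tunnel would look something like this:

        # On the Jenkins slave: forward local port 5000 to the registry that
        # listens on the loopback interface of the registry host.
        ssh -f -N -L 5000:127.0.0.1:5000 deploy@registry-host.example.com

        # From then on, the slave pulls as if the registry were local:
        docker pull 127.0.0.1:5000/myapp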

      BTW, recent Docker versions support the docker save / docker load pair of commands, which save an image together with its ancestry and metadata so that it can be transferred between Docker hosts. I think this was added around 0.7, and it solves some of the problems here.
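
      For example (image name and hosts are made up), the round trip is just:

        # On the build host: write the image, with its full ancestry, to a
        # tarball and copy it wherever it is needed.
        docker save myapp > myapp.tar
        scp myapp.tar deploy@other-host:/tmp/

        # On the target host: load it back.
        docker load < /tmp/myapp.tar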

      Also, I obviously don’t want my storage credentials on the central index, but my registry image is there (3ofcoins/registry). The bucket name and access keys are configuration: the nodes in the cluster get them from Chef and pass them to the container as environment variables. This seems more manageable than hardcoding the credentials in the image itself, especially since hardcoding would create a chicken-and-egg problem: I’d need to somehow get the registry container to the node confidentially.
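
      In practice it boils down to something like this on each node (the shell variable names are placeholders for whatever Chef exports; the AWS_* names assume the stock docker-registry S3 settings):

        # Chef puts the bucket name and keys into the node's environment; the
        # container only sees them at run time, never baked into the image.
        docker run -d -p 127.0.0.1:5000:5000 \
          -e SETTINGS_FLAVOR=s3 \
          -e AWS_BUCKET="$REGISTRY_BUCKET" \
          -e AWS_KEY="$REGISTRY_ACCESS_KEY" \
          -e AWS_SECRET="$REGISTRY_SECRET_KEY" \
          3ofcoins/registry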
