Where does the distribution end?

Yesterday I had an inspiring Twitter conversation with Miah Johnson. The conversation was long, branchy, and restricted by the 140-character limit. It got me thinking. It seems the main difference between us was about where the distribution ends and the userspace begins.

It’s reasonable to expect that if a distribution has a mechanism for preconfiguring packages, automated installation and configuration, and (kind of) configuration management, then one can use it as an end tool to configure the system, or at least as one of the tools in the pipeline. Why reinvent the wheel?

For both Miah and me, trying to get things done with the Debian/Ubuntu toolchain turned out to be an uphill battle. Up a steep hill made of yak hair and duct tape, to be precise. Our conclusions were different, though: Miah wants to use the distribution’s toolchain, or switch to a distribution that has usable tools. This is how stuff should work, after all. I respect and admire that, because I myself… just gave up.

I find the clunky duct-tape automation and the distro’s idiosyncratic solutions workable, but only in the sense that 98% of the time I can just ignore them, and the remaining 2% needs just a small nudge to disable a piece of setup or to tell the system that I really, really want to do stuff myself, yes, thank you, I know what I’m doing.

Case in point: debconf-set-selections, which started the whole conversation. The only time I needed it was with Ubuntu’s MySQL package, to set the initial root password. Nowadays I prefer Percona Server, which doesn’t set an initial password, so I can have Chef set it right after the package is installed. Otherwise, the only nudge is to disable the automatic start of services when a package is installed, so that Chef can configure each service and start it when it’s ready.
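For readers who haven’t met these two nudges, here is a minimal Python sketch of both, under stated assumptions. The question names (`mysql-server/root_password` and `mysql-server/root_password_again`) are the ones commonly seeded for Ubuntu’s MySQL packages, but check them against your package version. Blocking service starts with a `policy-rc.d` script that exits with status 101 is one standard Debian mechanism, honored by `invoke-rc.d`; the post doesn’t say which mechanism it relies on, so treat that part as an assumption too.

```python
import os
import subprocess

# Assumption: question names commonly seeded for Ubuntu's MySQL server
# packages; check yours with `debconf-get-selections | grep mysql`.
MYSQL_QUESTIONS = ("mysql-server/root_password", "mysql-server/root_password_again")

# invoke-rc.d consults /usr/sbin/policy-rc.d before starting a service;
# exit status 101 means "action forbidden", so nothing gets started.
POLICY_RC_D = "#!/bin/sh\nexit 101\n"


def mysql_root_password_selections(password, owner="mysql-server"):
    """Render debconf-set-selections input: '<owner> <question> <type> <value>'."""
    return "".join(f"{owner} {q} password {password}\n" for q in MYSQL_QUESTIONS)


def preseed(selections):
    """Feed selections to debconf-set-selections on stdin (requires root)."""
    subprocess.run(["debconf-set-selections"], input=selections, text=True, check=True)


def block_service_autostart(path="/usr/sbin/policy-rc.d"):
    """Install a policy-rc.d that forbids service starts (remove it afterwards)."""
    with open(path, "w") as f:
        f.write(POLICY_RC_D)
    os.chmod(path, 0o755)


if __name__ == "__main__":
    # Dry run: print what would be piped into debconf-set-selections.
    print(mysql_root_password_selections("change-me"), end="")
```

The policy file would go in place before `apt-get install` and be deleted once Chef takes over; while it exists, later package upgrades can’t restart services either.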

Case in point: Python and Ruby libraries. In my view, the distribution’s packaged Python libraries and Ruby gems are not meant to be used by the user’s own applications; they exist only as dependencies for packaged applications written in Python or Ruby. For my applications, I install just the base language from a package (for Ruby I prefer Brightbox’s patched version) and use Bundler or Virtualenv to install the libraries my application needs.
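A minimal sketch of the per-application environment idea. It uses Python’s standard-library `venv` module rather than the third-party Virtualenv tool named above (both provide the same kind of isolation), and the `<app_root>/venv` layout is a hypothetical convention. The important part is `system_site_packages=False`, which keeps the distribution’s packaged libraries invisible to the application.

```python
import os
import tempfile
import venv


def create_app_env(app_root, with_pip=True):
    """Create <app_root>/venv and return the path of its Python interpreter."""
    env_dir = os.path.join(app_root, "venv")
    # system_site_packages=False hides the distro's packaged libraries.
    venv.create(env_dir, system_site_packages=False, with_pip=with_pip)
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


if __name__ == "__main__":
    # Throwaway demo; with_pip=False avoids needing ensurepip, which some
    # distributions ship in a separate package.
    demo_root = tempfile.mkdtemp()
    print(create_app_env(demo_root, with_pip=False))
```

From there, running `<python> -m pip install -r requirements.txt` with the returned interpreter installs the application’s own library versions into the environment; on the Ruby side, Bundler plays the same role with a Gemfile.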

Case in point: the init system. Until systemd arrives, if I need to manage a service that is not already packaged (such as my application’s processes), I don’t even try to write init scripts or Upstart configuration. I just install Supervisor or Runit and work from there. Systemd may change that, though, and I can’t wait until it’s supported in a stable distro.
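To illustrate handing an application’s process to Supervisor instead of writing an init script, this sketch renders a Supervisor `[program:…]` section. The option names (`command`, `directory`, `user`, `autostart`, `autorestart`, `redirect_stderr`) are standard Supervisor settings; the application name, paths and user below are hypothetical.

```python
import configparser
import io


def supervisor_program(name, command, *, user, directory):
    """Render a [program:<name>] section for supervisord."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[f"program:{name}"] = {
        "command": command,         # must stay in the foreground; supervisord won't track daemons
        "directory": directory,     # working directory for the process
        "user": user,               # drop privileges to this account
        "autostart": "true",        # start when supervisord starts
        "autorestart": "true",      # restart if the process exits
        "redirect_stderr": "true",  # merge stderr into the stdout log
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(supervisor_program("myapp", "/srv/myapp/bin/server --port 8000",
                             user="myapp", directory="/srv/myapp"))
```

On Debian and Ubuntu, the packaged supervisord typically includes `/etc/supervisor/conf.d/*.conf`, so output like this could be written there (by Chef, for instance) and picked up with `supervisorctl reread` followed by `supervisorctl update`.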

And so on. The distribution’s mechanisms are there, but the way I see it, they are there for internal use within distribution packages, not for poking at and configuring it from the outside. I can enjoy a wide range of already-built software that more or less fits together, security patches, and a wide user base (which means the base system is well tested and, much of the time, if I have problems, the solution is a search box away). If I need to, I can package my own stuff with FPM, ignoring this month’s preferred toolkit for Debian packagers. Recently, Aptly has let me keep my sanity points when I publish custom packages internally and pull other packages from a patchwork of PPAs and projects’ official repositories. I can run multiple instances and versions of a service contained in Docker. And I can happily ignore most of the automation that the distribution purportedly provides, because I simply gave up on it: Chef gets that job done.
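Packaging your own stuff with FPM can come down to a single command. This sketch only builds the argument list for turning a staging directory into a `.deb`, using fpm’s `-s`/`-t` (input and output type), `-n`/`-v` (name and version), `-C` (change into a directory first) and `-d` (dependency) options; the package name, version, path and dependency are hypothetical.

```python
import shlex


def fpm_dir_to_deb(name, version, staging_dir, depends=()):
    """Build an fpm argv that packages everything under staging_dir as a .deb."""
    argv = [
        "fpm",
        "-s", "dir",        # input: a plain directory tree
        "-t", "deb",        # output: a Debian package
        "-n", name,
        "-v", version,
        "-C", staging_dir,  # chdir here first, so package paths start at its root
    ]
    for dep in depends:
        argv += ["-d", dep]  # one -d per runtime dependency
    argv.append(".")         # package the whole (chdir'ed) tree
    return argv


if __name__ == "__main__":
    cmd = fpm_dir_to_deb("myapp", "1.0.0", "build/root", depends=["libc6"])
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would actually build the package.
```

Keeping the command as a list rather than a shell string means paths with spaces survive intact when it is passed to `subprocess.run`.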

One thought on “Where does the distribution end?”

  1. I agree in general. Distributions do provide some level of usefulness, and yes, it’s somewhere around 90% of what I need. Chef, Puppet, or another tool can handle the rest. But there are other things to think about.

    – Building your own packages can and will lead to dependency hell, especially if your package mirrors the name of, or ‘provides’ the same functionality as, something already packaged.

    – Building your own packages adds time to the security race. Not to mention any statically linked libraries that the system may update but that are also bundled in your package.

    – You still have to clean up the BS from the package. Even if I can keep apache2 from starting after the package is installed, I still need to remove the /etc/init.d/apache2 script, and that is something that will have to be done for the life of the system. And this is an easy one. What about databases that initialize a database in some predetermined directory? In some cases you want this, e.g. the whatis-db.

    – There isn’t a ‘universal’ way to disable functionality in a package. I can’t easily pass a --no-post-inst to dpkg; I _have_ to use debconf-get-selections to find out which specific answers need to go into a seed file, and pass that during package installation. What if the package is updated and new options are added that I need to include in my seed file? I don’t know many system administrators who write good tests around package installation and maintenance.
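The upgrade problem raised here (new debconf questions silently missing from a seed file) can at least be detected. The sketch below assumes the usual `<owner> <question> <type> <value>` selection format, which `debconf-get-selections` prints and `debconf-set-selections` reads. It parses both texts and lists the questions the system knows about that the seed doesn’t answer. The function names are hypothetical.

```python
def parse_selections(text):
    """Map (owner, question) -> (type, value) from debconf selection lines."""
    selections = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split(None, 3)  # fields are separated by tabs or spaces
        if len(parts) < 3:
            continue  # malformed; skip rather than guess
        owner, question, qtype = parts[:3]
        value = parts[3] if len(parts) == 4 else ""
        selections[(owner, question)] = (qtype, value)
    return selections


def missing_from_seed(current_text, seed_text, owner):
    """Questions owned by `owner` on the system that the seed never answers."""
    # Compare by question name only: seeds often use a different owner
    # string (e.g. an unversioned package name) than the installed package.
    seeded = {question for (_, question) in parse_selections(seed_text)}
    return sorted(
        question
        for (o, question) in parse_selections(current_text)
        if o == owner and question not in seeded
    )
```

On a live system, the first argument could come from `debconf-get-selections` (in the debconf-utils package), run after a package upgrade in a test environment, so a new question fails a check instead of surprising you in production.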

    I am a fan of what I’ve been calling the ‘instance pattern’, though it may already have another name. I definitely don’t claim to be the inventor =)

    With this pattern, as above, we use Runit or some other init system to manage a unique instance of a service. Let’s say apache2 for this example.

    – Consider /opt as our destination, but it could be named whatever you want. This destination is typically a separately mounted partition (not part of /).
    – I want the apache2 binary artifacts.
    – I will construct my own /opt/apache2/instance_1 directory structure.
    – I will create a /opt/apache2/instance_1/conf directory that includes all configuration related to that instance.
    – I will create a /opt/apache2/instance_1/logs directory for log files.
    – I will create a uniquely named apache2_instance_1 service for runit to manage.
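The instance layout above can be scripted. In this sketch, several details are assumptions of mine rather than the commenter’s: Debian’s `/usr/sbin/apache2` binary, an Apache build that supports `-DFOREGROUND` (runit needs the process to stay in the foreground), a hypothetical `conf/httpd.conf` file name, and runit service directories under a caller-supplied root such as `/etc/sv`.

```python
import os
import shlex
import tempfile


def apache_run_script(instance_dir):
    """runit 'run' script that keeps one Apache instance in the foreground."""
    conf = os.path.join(instance_dir, "conf", "httpd.conf")  # hypothetical file name
    return (
        "#!/bin/sh\n"
        "exec 2>&1\n"  # merge stderr into stdout, which a runit log service can capture
        f"exec /usr/sbin/apache2 -f {shlex.quote(conf)} -DFOREGROUND\n"
    )


def create_instance(opt_root, name, sv_root):
    """Create <opt_root>/apache2/<name>/{conf,logs} and <sv_root>/apache2_<name>/run."""
    instance_dir = os.path.join(opt_root, "apache2", name)
    for sub in ("conf", "logs"):
        os.makedirs(os.path.join(instance_dir, sub), exist_ok=True)
    service_dir = os.path.join(sv_root, f"apache2_{name}")
    os.makedirs(service_dir, exist_ok=True)
    run_path = os.path.join(service_dir, "run")
    with open(run_path, "w") as f:
        f.write(apache_run_script(instance_dir))
    os.chmod(run_path, 0o755)  # runit only executes an executable ./run
    return instance_dir, service_dir


if __name__ == "__main__":
    # Dry run into a scratch directory instead of /opt and /etc/sv.
    scratch = tempfile.mkdtemp()
    print(create_instance(os.path.join(scratch, "opt"), "instance_1", os.path.join(scratch, "sv")))
```

With Debian’s runit package, a service is typically enabled by symlinking its directory into `/etc/service`. The instance’s own Apache config would also need to point directives such as `PidFile` and `ErrorLog` into the instance directory.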

    With that example in mind, you can see that:
    – I _don’t_ care about the distribution’s default configuration, log placement or formatting, etc. In fact, I minimize the changes made to the / (root) partition.
    – I can pre-determine the disk space utilization required for my application and use the right file system and the right block sizes or whatever.
    – I can run multiple instances of apache2 without Docker or any other container-based wrapper (this is itself a kind of container).
    – I don’t have to worry that a package upgrade of apache2 is going to fubar my /etc/apache2 at all.

    The key, really, is segregating the changes made by administrators from those made by distribution maintainers. I really dislike that /etc/apache2 exists, because it becomes confusing to anyone debugging the machine in the future. Yes, you could run ps and see that apache2 is using the configuration from its instance directory, but many admins are going to look in /etc by default.

    Distros sometimes include other fun items in packages that have nothing to do with post-installation scripts. Take, for example, mdadm, which often includes a cron job that runs a check of your RAID array on Sunday evenings. If you’ve ever wondered why your system’s performance drops on Sundays, that may be it.
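In the spirit of minimizing the unknown, a small audit can answer one question: which package-installed cron jobs are tied to Sundays? The sketch below targets system crontab files like those in `/etc/cron.d`, which have an extra user field. It handles `*`, lists, ranges, steps, three-letter day names and `@weekly`. It doesn’t cover every cron dialect’s corner cases, and it ignores both the day-of-month field and any date checks a job performs inside its own shell command.

```python
import glob
import os

_DAYS = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def _day(token):
    token = token.strip().lower()
    return _DAYS[token] if token in _DAYS else int(token)


def dow_includes_sunday(field):
    """True if a cron day-of-week field can match Sunday (0 or 7)."""
    for part in field.lower().split(","):
        span, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if span == "*":
            start, end = 0, 7
        elif "-" in span:
            first, last = span.split("-", 1)
            start, end = _day(first), _day(last)
        else:
            start = end = _day(span)
        if any(d in (0, 7) for d in range(start, end + 1, step)):
            return True
    return False


def sunday_jobs(crontab_text):
    """(schedule, command) pairs whose day-of-week is restricted but includes Sunday."""
    jobs = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("@"):
            parts = line.split(None, 2)  # @shortcut user command
            if parts[0] == "@weekly" and len(parts) == 3:  # same as "0 0 * * 0"
                jobs.append(("@weekly", parts[2]))
            continue
        fields = line.split(None, 6)  # m h dom mon dow user command
        if len(fields) < 7 or "=" in fields[0]:
            continue  # environment assignment or malformed line
        dow = fields[4]
        if dow != "*" and dow_includes_sunday(dow):
            jobs.append((" ".join(fields[:5]), fields[6]))
    return jobs


def scan_cron_d(directory="/etc/cron.d"):
    """Print Sunday-related jobs found in every file under `directory`."""
    for path in sorted(glob.glob(os.path.join(directory, "*"))):
        if os.path.isfile(path):
            with open(path, errors="replace") as f:
                for schedule, command in sunday_jobs(f.read()):
                    print(f"{path}: {schedule}  {command}")
```

Running `scan_cron_d()` on a host prints each match along with the file it came from. That is usually enough to spot a weekly surprise before it shows up in the performance graphs.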

    It’s about minimizing the unknown. Unless you are a maintainer of that distribution, chances are you don’t know it as well as you think you do. And our job is all about knowing our systems intimately, inside and out.

