Of Containers, Dockers, Rockets, and Daemons

I started using Docker soon after it showed up, and I’ve been running dockerized services in production since well before 1.0. I still use it quite extensively and recommend it to people.

I have also recently decided to run my next project on FreeBSD. I’ve been playing with this system for quite a while. It purportedly performs better than Linux — it definitely feels faster and more reliable. It is engineered, not evolved, and the base system is a consistent composition of kernel and user space. It’s been around for 21 years, longer than any Linux distribution except Slackware. On top of that, it has native ZFS and the pf firewall. It’s quite refreshing to work with a system that… just works. Without fuss. Consistently. A system which is not a composition of tens or hundreds of independent pieces of software, glued together with duct tape.

There’s just one “but”: Docker runs only on Linux, and I got used to containerized services. Jails are kind of similar, but the toolkits around them are like Linux’s LXC: you wrap an entire system in a jail rather than a single service. I have spent much of the last few weeks on a research/exploration effort: how much work would it take to run Docker on FreeBSD, with jails for isolation, pf for networking, and ZFS for storage? (There’s apparently some effort underway, but it seems stalled, and porting and maintaining the whole thing looks like a lot of work.) What would it take to implement some scripting that would quack in a Docker-like enough way to be useful? (Not much less, but I’d have the freedom to be more opinionated about the mechanisms; on the other hand, the list of features that would make it fully useful seems to converge on Docker’s documentation.) All in all, to get things running in a reasonable time frame, I was going to suck it up, settle on nested jails with some ZFS-based tooling (of my own; no existing jail toolkit seems to fully utilize ZFS), and work on actual features. I had already started to write some exploratory code.

At this very moment, CoreOS (any association with Oreo cookies being purely accidental) announced that they were releasing their own container runtime, named Rocket. The announcement has since been edited (the first version was much more aggressive). Docker responded (the text currently online is also edited, and much less defensive than the original). A quick flame war ensued, then passed; there was some name calling and mud slinging, business as usual on the Internet.

While I wouldn’t call Docker fundamentally flawed, the announcement, and my later exploration of Rocket, have shed some light on problems with Docker that I was vaguely aware of, but couldn’t really put my finger on:

  • Docker is implementation-defined. All specs seem to be an afterthought, and it is hard to predict which pieces of the specification are stable. For example, Docker’s HTTP API is kind of documented (but not recommended: the officially supported “API” is the command-line docker tool; I recall having read that somewhere, but cannot find it now; the sentiment is mirrored in #7538, and by the fact that there still doesn’t seem to be an obvious way to express a docker run invocation through the API), but it leaks implementation details all over the place.
  • Docker is a monolith. There is one binary that is both server and client, and it is responsible for building images, retrieving them from the registry, managing volumes, layering filesystems, isolating processes in containers, managing containers as services (which makes it next to impossible to tie Dockerized services into your service manager without using a separate, idle process whose only role is to stay connected to the Docker daemon and terminate as soon as the container exits; hello, unnecessary overhead!), configuring NAT, proxying network traffic, and probably a couple more things I haven’t thought of.
  • Docker is opaque. I already wrote about the HTTP API leaking implementation details and being actively discouraged. The registry API is even worse. I have spent half an evening trying to figure out how to run a local, authenticated registry (and implement one if necessary). I gave up. Some of the API was documented; many details were buried in the behaviour of the monolithic docker binary. It felt like Perl: kind of standardized and documented, but only perl can parse Perl. Seriously, what the hell? Why on Earth can’t downloading tar files work over plain HTTP(S)?

All of that gives a feeling of vendor lock-in, of being at the whim of a company which may or may not be benevolent, which has just received a big investment, and which has started chasing enterprise customers and partnerships. And Docker’s current direction seems to point towards more complexity, not less. This direction makes all kinds of sense for Docker, but it’s not one I really feel comfortable with — especially since, until now, application container tooling has been a Docker monoculture.

Enter Rocket. It is developed specification-first, and the implementation is officially a prototype. The specification focuses on one area only, and strictly separates areas of responsibility: building an image is separate from image discovery is separate from the image format is separate from container construction is separate from the runtime environment. Each piece is independent and can be used separately. The specification is clear and precise.

I wrote above about spending half an evening trying to figure out just the Docker registry API (together with the relationship between registry, index, daemon, and client), and giving up. In a similar half-evening I went through Rocket’s app container specification (including image discovery, the equivalent of Docker’s registry), was able to ask one of the developers a couple of clarifying questions on IRC (the CoreOS people were very friendly and helpful), and now I can say I understand how it works well enough to implement it (given enough time, obviously), or to reuse pieces in my own projects.

I don’t feel up to porting the whole of Rocket to FreeBSD’s ports right now (but who knows? The codebase looks simple and straightforward), but as I try to quickly whip up something container-like on top of jails for my own project, I now have an already written, well-designed specification I can work with, and some code that I can at least partially reuse. Docker is useless unless I port all of it; Rocket is useful even if I don’t use any of its code, and that code is modular enough for me to use only the pieces I want. And while I am a bit sad that the whole thing began with name calling and mud slinging, and disappointed by Docker’s initial response, I am really happy to see some competition in container management tools. Even if Docker stays dominant and Rocket doesn’t take off all the way, diversity of tools and the prevention of a monoculture are valuable in their own right. I have to admit: I’m pretty excited now.


Where does the distribution end?

Yesterday I had an inspiring Twitter conversation with Miah Johnson. The conversation was long, branchy, and constrained by the 140-character limit. It kept me thinking. It seems the main difference between us was about where the distribution ends and the userspace begins.

It’s reasonable to expect that if a distribution has a mechanism for preconfiguring packages, automated installation and configuration, and (kind of) configuration management, then one can use it as the end tool to configure the system, or at least as one of the tools in the pipeline. Why reinvent the wheel?

For both me and Miah, the experience of trying to get things done with the Debian/Ubuntu toolchain turned out to be an uphill battle. Up a steep hill made of yak hair and duct tape, to be precise. Our conclusions were different, though: Miah wants to use the distribution’s toolchain, or switch to a distribution that has usable tools. This is how stuff should work, after all. I respect and admire that, because as for myself… I just gave up.

I find the clunky duct-tape automation and idiosyncratic distro solutions workable, but by that I only mean that 98% of the time I can just ignore them, and the remaining 2% needs just a small nudge to disable a piece of setup, or to tell the system that I really, really want to do stuff myself, yes, thank you, I know what I’m doing.

Case in point: debconf-set-selections, which started the whole conversation. The only time I needed it was with Ubuntu’s MySQL package, to set the initial root password. Nowadays I prefer Percona Server, which doesn’t set an initial password, so I can have Chef set it right after package installation. Otherwise, the only nudge is to disable the automatic start of services when a package is installed, so that Chef can configure each service and start it once it’s ready.
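
As an illustration, a Chef recipe along these lines might look like the sketch below; the resources are standard Chef, but the package name, attribute path, and the policy-rc.d trick are assumptions about the setup rather than anything prescribed here:

    # Sketch: install Percona Server without letting the package auto-start
    # the service, then set the root password from a node attribute.

    # Debian/Ubuntu consult /usr/sbin/policy-rc.d before starting services
    # from maintainer scripts; exit code 101 means "do not start".
    file '/usr/sbin/policy-rc.d' do
      content "#!/bin/sh\nexit 101\n"
      mode '0755'
    end

    package 'percona-server-server'

    execute 'set-mysql-root-password' do
      command "mysqladmin -u root password '#{node['mysql']['root_password']}'"
      not_if  "mysqladmin -u root -p'#{node['mysql']['root_password']}' status"
    end

    service 'mysql' do
      action [:enable, :start]
    end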

Case in point: Python and Ruby libraries. In my view, the distribution’s packages of Python libraries and Ruby gems are not meant to be used in users’ applications – they exist only as dependencies for packaged applications written in Python or Ruby. For my own applications, I just use the base language from a package (and with Ruby I prefer to go with the Brightbox patched version), and use Bundler or Virtualenv to install the libraries the application needs.
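
For the Ruby case that simply means a plain per-application Gemfile and a vendored bundle (the gems below are just placeholders):

    # Gemfile -- the application's own dependencies, independent of distro packages
    source 'https://rubygems.org'

    gem 'sinatra'
    gem 'sequel'
    gem 'pg'

Installed with something like bundle install --deployment --path vendor/bundle, so the application’s gems never mix with the system-wide ones.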

Case in point: the init system. If I need to manage a service that is not already packaged (such as my application’s processes), I don’t even try to write init scripts or upstart configuration; I just install Supervisor or Runit and work from there. Systemd may change that, though, and I can’t wait until it’s supported in a stable distro.

And so on. The distribution’s mechanisms are there, but the way I see it, they are there for internal use within distribution packages, not for poking and configuring the system from the outside. I get to enjoy a wide range of already built software that more or less fits together, security patches, and a wide user base (which means the base system is well tested, and much of the time when I have a problem, the solution is a search box away). If I need to, I can package my own stuff with FPM, ignoring this month’s preferred toolkit for Debian packagers. Recently, thanks to Aptly, I can keep my sanity points while publishing custom packages internally and pulling other packages from a patchwork of PPAs and projects’ official repositories. I can run multiple instances and versions of a service contained by Docker. And I can happily ignore most of the automation that the distribution purportedly provides, because I simply gave up on it — Chef gets that job done.


We visited Rails Girls Warsaw

I was invited to coach at last week’s Rails Girls Warsaw, and I’m delighted to have been given such an opportunity.

Three of Coins is by no means a RoR company – we write Chef cookbooks, Vagrantfiles run in our blood, and one of the best ways to increase our heart rate is to iterate through backup systems, looking for the Perfect One.

Having said that, I do believe engaging in initiatives that increase the diversity of our community is important enough to spend time learning new technologies. Even intermediate Rails knowledge is enough to help other people get started! In my case, not-so-distant memories of the issues I myself encountered at the beginning of my Rails path allowed me to better understand (and prepare for) questions and problems that might arise during the workshops.

Wonderful organizing team from Warsaw (Anksfoto)

To make sure that participants got the best out of the weekend, each coach was looking after a group of just three students. Although most attendees had no prior programming experience, one member of my team had learned C during computer science classes at her university – classes designed to actually introduce programming, classes that, suffice it to say, did not encourage her to study programming, nor did they shed light on what programming is about.

The group I was coaching. Yes, we had chocolate! (Anksfoto)

Of course, not everyone has to know how to program, and not every participant of the workshop wanted to pursue a programming career. Knowing how this stuff works is an advantage in many fields, though, even if the position you hold in your company or non-governmental organization is not the most technical one. And seeing my team’s enthusiasm rise as the workshop went on, and hearing them say they want to continue their programming education, made me very happy. I had a chance to donate some of my time towards increasing the diversity of our still homogeneous programming community.

Rails Girls Warsaw participants (Anksfoto)

If you’re interested, check out Rails Girls Warsaw’s Facebook page. If you’re wondering about coaching, you’ll find a list of local Rails Girls workshops here – as you can see, there are plenty of locations to choose from! And if you are (or you know) Polish, there’s a message board for Rails Girls Warsaw attendees, where participants can share their problems and accomplishments, supported by most of the coaches from previous editions. Give it some thought – educating others is always an invaluable learning experience.


This is what I found at FOSDEM

FOSDEM is considered the best European open source conference. I knew that before participating for the first time this year, but I was still impressed. This year’s edition had an astonishing 22 tracks. Most of them (though not all) last for one day, and a look at the schedule might give you a headache.

After a weekend spent on talks, discussions, and waiting in line to squeeze into the more popular devrooms, I prepared a digest of what I found interesting (or what I managed to squeeze into). Fortunately (and impressively!), the FOSDEM talks, a bit over 510 of them, were recorded and should be available at http://video.fosdem.org/ soon.

Chef’s Sean O’Meara gave an overview of configuration management. It was a good refresher for the more advanced and a useful introduction for people with less experience. Sean talked about the difference between convergence and idempotence and stressed using these two terms correctly, showed the importance of writing configuration in the correct order (to make sure the code is idempotent), and reminded us to always pull, never push. Sean’s presentation and recording aren’t online yet, but you can check out the code he wrote for FOSDEM to illustrate his points here.

You might enjoy the “The classification problem” talk by Marco Marongiu. Marco shared his experience with the pitfalls of internal classification, explained how exceptions are the unavoidable norm, and talked about CFEngine.

Michael Ducy from Chef talked about cross-distro automation. He showed how delivering everything together while abstracting away from the implementation is the way to go.

I did not manage to grab a seat at the “Metadata ocean in Puppet and Chef” talk, where Marc Cluet from Rackspace presented best practices for organizing metadata, so I’m looking forward to the video.

Peter Chanik’s lightning talk might interest you in syslog-ng, a tool that customizes log messages (but that’s putting it very simply; you should probably visit their website).

James Turnbull of Docker shared the results of his observations of OSS communities during his “Software Archaeology for Beginners” keynote. If you are a seasoned member of the open source community, you might not experience the problems he touched on as often as people new to open source, who are sometimes overwhelmed by trolling, not-always-transparent decision processes, or poor documentation. He advised getting to know a community before submitting any changes, asking contextual questions, over-sharing (dumps and logs!), and keeping comments as positive as possible, as this easily isolates trolls and bike-shedding. He also suggested that the biggest help might not always come from developing a new feature: fixing broken tests or updating outdated or scarce documentation might be even more welcome.

Florian Gilcher talked about unicorns: those mythical creatures that exist, just like good, article-style technical documentation. Florian asks interesting questions and goes through a few solutions.

What I found positively surprising was a whole track devoted to building more energy-efficient software. Looking for ways to minimize the energy consumption of a device has always been considered the domain of the “hardware people”. Jeremy Bennet from Embecosm and Kerstin Eder from the University of Bristol talked about initiatives that aim to raise awareness and conduct research to support innovation in energy-efficient software. The energy-efficient computing devroom hosted some noteworthy talks. You should check out MAGEEC and ENTRA.

The FOSDEM conference wouldn’t be possible without a hundred volunteers, a team of organizers, and sponsors. Participating was a last-minute decision for me, and I would have regretted it very much had I decided not to. It was impossible to see everything I found interesting (for example, the legal and policy issues track), so the video recordings are invaluable. Summing up: watch out for next year’s edition!


We released Chef Browser today

Even if you know your way around a set of tools and consider yourself an efficient user, some little helpers can still save you lots of time. Today, at the Warsaw Ruby Users Group evening, we officially released such a helper for Chef users. Chef Browser is an open source tool we created to support your daily work. As the name unsurprisingly suggests, Chef Browser lets you browse the data on your Chef server. If you use knife search and knife show a lot to find detailed information, you might be pleased to know you don’t have to re-type the same commands over and over again to access certain data.

Getting started

You’ll need Ruby 1.9.3 or 2.0.0 (Rubinius and JRuby are also supported) with RubyGems and Bundler. Download the code from the GitHub repository and run bundle install to install the needed gems. Create a settings.rb file, which will override the default settings; to do that, follow the examples in the settings.rb.example file, providing the details of your Chef server (URL, client name, client key). When you’re ready, run puma -e production. Unless specified otherwise, Chef Browser will be available at http://localhost:9292; for production deployment, tell your Apache or Nginx to proxy to that port.
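
For reference, the end result might look roughly like the sketch below; the setting names are illustrative placeholders, so copy the actual keys from settings.rb.example rather than from here:

    # settings.rb -- local overrides for Chef Browser (setting names are
    # placeholders; see settings.rb.example for the real ones)
    chef_server 'https://chef.example.com'
    client_name 'chef-browser'
    client_key  '/etc/chef-browser/client.pem'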

Listing resources

The main page of Chef Browser lists all known nodes. You can navigate to other resources from there: roles, environments, data bags and data bag items. After clicking on any item on the list, you will be taken to a page showing this item’s details.

Chef Browser » Roles

Neat attributes

Let’s use nodes as an example: when viewing the details of a node, the accessible data is close to what you’d get by typing knife raw /nodes/node_name. The difference lies in formatting: Chef Browser presents nested attributes with their JSONPath. This makes it easier not to get lost in (sometimes quite deeply) nested attributes, and to find the one you’re looking for without having to navigate down the hierarchy – especially handy if you’re not sure in which part of the attribute hierarchy your data lives. Since it won’t be uncommon for a table of JSON attributes to be over a thousand rows long, the table is backed by a jQuery live filter, narrowing the visible data to just what’s necessary.
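
Just to illustrate the idea (this is not Chef Browser’s actual code), flattening a nested attribute hash into JSONPath-style rows amounts to something like:

    # Turn a nested attribute hash into JSONPath-keyed rows, similar to how
    # Chef Browser displays node attributes.
    def flatten_attributes(node, path = '$', rows = {})
      case node
      when Hash  then node.each { |k, v| flatten_attributes(v, "#{path}.#{k}", rows) }
      when Array then node.each_with_index { |v, i| flatten_attributes(v, "#{path}[#{i}]", rows) }
      else rows[path] = node
      end
      rows
    end

    flatten_attributes({ 'mysql' => { 'server' => { 'port' => 3306 } }, 'tags' => ['db'] })
    # => { "$.mysql.server.port" => 3306, "$.tags[0]" => "db" }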

Chef Browser » Nodes » batch

We aimed at linking data together where possible. For example, tags and the environment of a given node will link to other nodes sharing the same tag or environment.

Configurable saved searches

How often do you need to find nodes by their content, and have to use (and remember) queries like knife search mysql_server_root_password:* -i? (Long) search queries that you run often can be saved and accessed from the search bar’s dropdown menu:

Chef Browser » Nodes

It’s enough to edit your settings.rb file, adding one line per saved search, like so: node_search['MySQL'] = 'mysql_server_root_password:*'

Under the hood

We decided to make Chef Browser as lightweight as possible. Since all data is acquired by querying the Chef server, there’s no need for a separate database; hence, Sinatra became our web framework of choice. The app is based on Bootstrap 3, which is 100% responsive and looks good on tablets and mobiles – a small thing people on pager duty might appreciate. To talk to the Chef server, we use Ridley, a Chef API client gem made available by Riot Games.
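
If you haven’t seen Ridley before, it is a thin Ruby client for the Chef server API; a minimal session looks roughly like this (the server URL, client name, and key path are placeholders):

    require 'ridley'

    # Connect to the Chef server using an API client's credentials.
    ridley = Ridley.new(
      server_url:  'https://chef.example.com',
      client_name: 'chef-browser',
      client_key:  '/etc/chef-browser/client.pem'
    )

    ridley.node.all.each { |node| puts node.name }  # list node names
    web1 = ridley.node.find('web1')                 # fetch a single node object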

Future

This is the first release. We’ve done our best to test Chef Browser, and we already use it every day. We also plan to keep working on it: the roadmap includes at least cookbook browsing and access to encrypted data bags. We’re also very open to suggestions and bug reports, as well as new issues and pull requests on GitHub.


Backups suck — a rant

I’m not even mad, I’m just disappointed. I’m tired. Tired of trying to force my cloud-shaped peg through a tape-shaped hole, of custom data formats and convoluted protocols, and of half-assed systems that work well as long as I have just one machine to manage, with just one kind of data on it — unless I want to hack all the management for different data myself. The current state of open source backup software is sad. I have even tried looking at commercial solutions, but couldn’t extract any real information on what’s inside the box from the enterprise marketing copy. Are my expectations unrealistic?

I’m writing this post fresh after a single restore of a 370-gigabyte database that took almost a week of wrestling with broken storage, incomplete archives, interrupted transfers, stuck communication, and — most of all — Waiting for Stuff to Complete, for hours and hours. In fact, much of this post has been written during the Waiting. Many of the issues I wrestled with were caused by mistakes on my side: misconfiguration, insufficient monitoring that should have detected issues earlier, and the fact that over the last month I was not able to pay enough attention to day-to-day maintenance, which allowed the suckage to accumulate. At the same time, the software should automate away the common parts, make it easy to get things right, and be easy to debug when things go wrong. Every backup system I’ve ever used or seen fails miserably at two or more of these three points.

But let’s start from the beginning.

Why do we even need backup?

When we hear the word “backup”, we usually imagine a disaster: say, a database server has failed, everything on it has been lost, and we need to get a new one up, as quickly as possible, losing as little data as possible. But this is just one of many cases when backups are helpful.

Actually, for this particular case, backups aren’t even the best tool; online live replication will be quicker at replacing the failed piece (just promote the slave to master, and we’re done), and will lose less data (only the replication lag, usually in the range of single seconds). The replication slave can even be used to take some load off the main server by answering read queries that don’t need perfectly synchronized results, such as analytics and reporting.

This often leads to the conclusion that a backup system isn’t needed after all. Come on, it’s the 21st century, we have online replication with heartbeat checks to automatically promote the slave. Why would we need a bunch of static archives that take up space to store and time to recover? What is this, the 1980s?

But disasters are not the only danger to data, and keeping an archive of historical snapshots can be useful in many other cases, including recovery from PEBKAC problems where somebody damaged the data, or from an application bug that sent a DELETE query to the database which then got happily replicated to the slave, in real time. If you have actual backups — a history of static snapshots going back into the past — you won’t be fazed by hearing any of these:

  • Hey, man, that customer just wrote us they can’t see the old comments on their widget listing page. They swear they didn’t click anything (yeah, right), and that the comments were there 17 days ago. Can you bring them back?
  • Our WordPress has just been hacked, and some code has been injected into the PHP files. Can you check when did it happen and give me the latest clean version to compare?
  • I have worked on that Accounting Spreadsheet three months ago, and I must have deleted it when cleaning my desktop. HALP?
  • Can we test the new release on our full production dataset? The database migration transforms every single document, and we hope we got all destructive corner cases, but you know how creative our users are…
  • I’m preparing a report for the investors, and need some growth figures – do you have some records on amount of customer data we keep for the last three years?

Once you have your backup policy right, it’s a time machine where nothing of value is truly lost. It is a safety net for the business. Why, then, does the state of backup software today look even worse than monitoring software did in 2011?

21st Century Backup Checklist

What would I expect of a perfect backup system? Besides seeing into the future and storing only the data that I will actually need, instantly available at the moment it’s needed, and compressed down to no storage space at all, that is? A decent backup system would be, in no particular order:

  • Using standard formats and tools. I want a clear and simple recovery procedure for when all I have is the backup volumes and a rescue boot / recovery CD. Needing an index with encryption keys and content details is still fine, if the system generates it for me in a readable format.
  • Encrypted and compressed. You can’t trust your datacenter anymore when you’re in the cloud, spread across five data centers, three continents, three hosting providers, and two storage providers. It’s not your company skyscraper’s reinforced concrete cellar anymore. I want my backups to be transparent to me, and opaque to the storage provider.
  • Centrally managed, but without bottlenecks. There should be a single place where I can check the status of backups across the whole system, drill down through a job’s history, trigger a restore or verification, and so on. On the other hand, some of the heavy lifting should be on the node’s side; in particular, if I’m using cloud storage, the node should upload the encrypted data directly to the storage, rather than push the same data to the central location, which would then bounce it to the cloud.
  • Application-specific. I want to be able to use my own tools, which are standard enough for me. If I’m backing up MySQL, I prefer xtrabackup to tar.
  • Zero-copy. I don’t want to have to copy the original data to another directory, then tar it up to an on-disk archive, then encrypt the tarball — still on disk — and only then copy it to the storage. This can and should be done online, in a pipeline (see the sketch after this list). We work with terabytes of data nowadays; needing double or triple the local storage just to make a backup is silly.
  • Able to use different storage backends. I want to be able to use three cheap, unreliable storage providers. I don’t want to be locked into any single silo. In particular, I don’t want to pretend that my backup storage is made of pools of magnetic tapes, and keep a set of intricate scripts pretending that Amazon S3/Glacier, my local disk directory, or a git-annex repository is a tape autochanger.
  • Supporting data rotation. I want to be able to delete old volumes, not just add data and keep it forever. I want to be able to easily specify how the rotation should work, and to change my mind later on.
  • Supporting data reshuffling. It should be possible to move a volume between storage backends: keep fresh data in the same datacenter, archive it in the cloud at the same time, and put monthly snapshots in deep freeze. If I feel like switching storage providers, adjusting my expiration scheme (and applying it to already created volumes), or just copying data around manually, I should be able to get it done.
  • Secure. No node in the system should have access to another node’s backed-up data. In fact, no node should even access its own historical data without explicit permission. The less the node itself can see or has to say, the better. It is especially bad if a node has direct access to the underlying storage: in case of a break-in, the attacker can not only take down the server, but also delete all its backups (sometimes even other nodes’ backups).
  • Flexible. I want to be able to restore one machine’s backup to another machine. I want to be able to restore just some files. I want to use the backup system to keep a staging database up to date with production data, or to provision load-test infrastructure that mirrors recent production.
  • Scriptable. I want to be able to give my client a “type this one command” or “push that one button” restore instruction, not three paragraphs. This includes restoring a production backup to a staging database and deleting sensitive data from it.
  • Testable. I want a simple way to specify how to verify that I not only have the backups, but will also be able to restore them. In a perfect world, a one-button fire drill that brings back a copy of the production environment and checks that it’s readable. And that button is pushed on a regular basis by the monitoring.
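
To make the zero-copy point concrete: the whole tar-encrypt-upload chain can be streamed end to end without ever touching the local disk. A minimal Ruby sketch, with the paths, key, and upload command as placeholders for whatever a real setup would use:

    require 'open3'

    # Stream a directory through tar and gpg straight into the uploader,
    # without writing any intermediate archive to local disk.
    backup_dir = '/var/backups/mysql-snapshot'   # placeholder
    gpg_key    = 'backups@example.com'           # placeholder

    Open3.pipeline(
      ['tar', '-cz', '-f', '-', '-C', backup_dir, '.'],  # archive + compress to stdout
      ['gpg', '--encrypt', '--recipient', gpg_key],      # encrypt the stream
      ['aws', 's3', 'cp', '-', 's3://example-backups/db-snapshot.tar.gz.gpg']
    )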

Where do we stand now?

Some of the currently available systems meet some of the points above.

Bacula is centrally managed, secure, flexible, can compress volumes, and with some gymnastics it can rotate and reshuffle volumes, be application-specific, and do zero-copy backups and restores (though I haven’t managed to do that, or even seen a tutorial or a report from anybody who has). It fails miserably when it comes to transparency, standard formats, encryption, and storage backends. And I have to pretend that all the storage I have is on tapes. It’s awfully clunky and hard to debug when something doesn’t work the way it’s supposed to. It’s also underdocumented.

Duplicity handles one machine and one dataset, but it is good with encryption, standard data formats, different storage backends, and rotation. I think it can also be made application-specific, though zero-copy backups may not be possible. Without central management, security and flexibility are irrelevant.

Some people have success with BackupPC, but it’s file-centric, too many decisions are left to the node being backed up, and it seems to be focused on backing up workstations.

The other systems I have looked at are either individual low-level pieces of the puzzle, focused on individual machines rather than whole systems, or overcomplicated behemoths from the ’90s that expect me to think in terms of tape archives. I couldn’t get through the marketing copy of the commercial solutions, but I don’t have high hopes for them.

There is one more project I forgot to mention when writing this rant yesterday: Obnam. It is built on some very good ideas (deduplication, data always visible as full snapshots even when incremental, encryption). However, it is still focused on a single node and on backing up files (I haven’t found any info about application-specific formats and tools), and it uses a custom, opaque storage format (which seems to be the price for deduplication — a necessary design trade-off). Without any special means, the node can access — and overwrite or delete — its own backup history. If we choose to share a repository between many nodes, each node also has access to all the other nodes’ backups.

Of all these, Bacula is closest to what I imagine a good solution to be: it got the architecture right (a director for the job/volume catalog, scheduling, and overall control; storage to store files; a client to receive commands from the director and push data directly to/from storage). On the other hand, its implementation is just unusable. Its communication protocols are custom, opaque, and tied to the particular version; the storage layer could have alternative implementations, but that’s practically impossible, as there is no complete specification of the protocol, nor any hint of which parts are stable and which can change. The storage, even on disk, is designed in terms of tape archives. There aren’t even hooks for disk volumes to allow moving files around in a smarter way (e.g. with git-annex). Its configuration is more fragile and idiosyncratic than Nagios’s. The whole system is opaque, and debugging it is a nightmare. It’s not scriptable at all (to make a script that just starts a predefined restore job, I had to use Expect on Bacula’s CLI console). It uses custom storage formats that cannot easily be restored without setting up the whole machinery. It supposedly can do zero-copy, application-specific backups, but that requires quite fragile configuration, and I haven’t ever seen a single working sample of it, or a report of anybody having done it. All of these issues are buried deep inside Bacula’s implementation and design. It’s hopeless.

I have also learned that there is a fork of Bacula, named Bareos. I’m not sure whether it’s actually moving towards something more transparent, or just maintaining the hopeless design.

What now, then?

Now, I’m trying to estimate how hard it would be to create a proof-of-concept system based on Bacula’s architecture, but using open protocols, standard tools, and encryption: HTTPS for communication, client SSL certificates for authentication, tar as the storage format, GnuPG for encryption. For an initial implementation, storage could be handed off to S3 (with its automatic Glacier expiration), but behind a façade API that would allow different backends to be used. In the last few days, I’ve been playing around with Go to try to implement a proof-of-concept sketch of such a system. This direction looks promising.
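
Purely as an illustration of where that façade boundary would sit (the actual prototype is in Go; the Ruby below and every name in it are made up for this sketch), the storage side could be as small as:

    # A storage facade the rest of the system talks to, plus one trivial
    # local-directory backend. An S3, Glacier, or git-annex backend would
    # implement the same four methods.
    class LocalDirStorage
      def initialize(root)
        @root = root
      end

      # Store an already-encrypted volume, reading it from an IO stream.
      def put(name, io)
        File.open(File.join(@root, name), 'wb') { |f| IO.copy_stream(io, f) }
      end

      # Yield a volume's contents as an IO stream.
      def get(name, &block)
        File.open(File.join(@root, name), 'rb', &block)
      end

      def delete(name)
        File.delete(File.join(@root, name))
      end

      def list
        Dir.entries(@root) - %w[. ..]
      end
    end

The director would keep the catalog and schedule jobs, while nodes would stream encrypted volumes through such a backend directly, never through the central server.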

If you have any remarks, want to add anything, maybe even offer help with the development or design of such a system — or to tell me that such a system already exists, which would be great news — feel free to use the comment form below, Twitter, or Hacker News. And if you happen to be in London at Velocity EU, just catch me around, or look for a BoF session.

Edits

2013-11-13
Added additional remark about security, mentioned Obnam and Bareos

Devopsdays Barcelona

As I used to live in Barcelona, I couldn’t pass up the opportunity to represent the dev part of 3ofcoins at the devopsdays event that took place there this week. Here are my takeaways from the dev point of view.

Firstly, I’d like to thank the organizers for making this happen. I’ve done my share of conference organizing and I know how much work and stress that can be. Kudos to you guys! There were some things that could’ve gone better, but you live and learn.

Secondly, it’s the first conference I’ve been to with a reliable wifi connection. Unbelievable. It’s probably mostly due to the size of the conference, but still impressive.

Now, moving on to the gist of the event. There were talks and open spaces. The talks varied in quality. The first day started poorly, with a buzzword-filled presentation on Disciplined Agile Delivery that made me reach for my laptop to work on some code instead. Later that day we saw a more interesting talk on the enemies of Continuous Delivery and a comprehensive report on the state of devops in various European countries. There was some clear confusion about what the Ignite talks should look like, as some of the presenters clearly weren’t ready for the format. Still, it was great to hear about banks giving back to the community (redborder.net).

On the second day there was a nice presentation on Continuous Performance testing – something few teams do and something we should all definitely look into. It was followed by a perfectly delivered talk on how to measure your value to the company, instead of being remembered only in the context of failures and problems. The day finished off with the guys from Tuenti talking about their well-automated development & deployment infrastructure.

But devopsdays is more than talks; there were also open spaces – a concept completely new to me. Unfortunately, I missed this part on the first day, and on the second day many people had already left. But it’s about quality, not quantity, and I still joined a nice discussion on devops and multisourcing. There was some misunderstanding with the venue owner and we had to cut the second round of open spaces short, which was a shame.

Overall, it was a nice time and an interesting glimpse into the devops community for me. I wish there had been some more technical talks, but I still found half of them quite interesting. I’d go again.

If you want to see some of the talks from the conference, they’re already up on the devops vimeo channel. Superfast.


Distributing confidential Docker images

Here’s another pet peeve of mine with Docker: the infrastructure for distributing images is simply Not There Yet. What do we have now? There’s a public image index (I still don’t fully get the distinction between index and registry, but it looks like a way for DotCloud to have some centralized service that’s needed even for private images). I can run my own registry, either keeping access completely open (limited only by IP or network interface), or delegating authentication to DotCloud’s central index. Even if I choose to authenticate against the index, there doesn’t seem to be any way to actually limit access to the registry — it looks like anyone who has an index account and HTTP(S) access to the registry can download or push images.

There doesn’t seem to be any way in the protocol to authenticate users against anything other than the central index – not even plain HTTP auth. Just to get HTTPS, I need to put Apache or nginx in front of the registry. And did I mention that there is no way to move a full image between Docker hosts without a registry, not even as a tarball export?

I fully understand that Docker is still in development, and if these are its problems, there can’t be many bigger showstopper issues left, which is actually good. However, this seriously limits Docker’s usefulness in production environments; I either need to give up controlling who is able to download my images, or I need to build the image locally on each Docker host — which prevents me from building an image once, testing it, and then using that very same image everywhere.

And the distribution problem is not only about distributing in-house confidential software. A lot of open source projects run on Java (off the top of my head: Jenkins, RunDeck, Logstash + Elasticsearch, almost anything from the Apache Software Foundation…). While I support OpenJDK with all my heart, Oracle’s JVM still wins in terms of performance and reliability, and distributing Oracle’s JVM is not allowed except internally within an organization. I may also want to keep my Docker images partially configured – the software is open, but I’d prefer not to publish internal passwords, access keys, or IP addresses.

I hope that in the long run it will be possible to exchange images in different ways (plain old rsync, distribution via BitTorrent, a git-annex network, shared filesystems… I could go on and on). Right now, I have found only one way, and it doesn’t seem obvious, so I want to share it. Here it is:

Docker’s registry server doesn’t keep any local data; everything it knows is in its storage backend (an on-disk directory, or an Amazon S3 bucket). This means it’s possible to run the registry locally (on 127.0.0.1) and move access control to the storage backend: you don’t control Docker’s access to the registry, but the registry’s access to the storage. It may be implemented as a shared filesystem (GlusterFS, or even NFS), as an automatically synced directory on disk, or – which is what I prefer – as a shared S3 bucket. Each Docker host runs its own registry, attached to the same bucket, with a read-only key pair (to make sure it won’t be able to overwrite tags or push images). The central server that is allowed to build and tag images is the only one with write access. Images stay confidential, and there is even a crude form of access control (read-only vs write access). It’s not the best performance you can get for distributing images, but it gets the job done until there’s a more direct way to export/import a whole image.

I hope this approach is useful; have fun with it!


Flat Docker images

Docker seems to be the New Hot Thing these days. It is an application container engine – it lets you pack any Linux software into a self-contained, isolated container image that can be easily distributed and run on different host machines. An image is somewhere in between a well-built Omnibus package and a full-on virtual machine: it’s an LXC container filesystem plus a bit of configuration on top of it (environment variables, the default command to run, the UID to run it as, TCP/UDP ports to expose, etc.). Once you have the image built or downloaded, you can use it to start one or many containers – live environments that actually run the software.

To conserve disk space (and RAM cache), Docker uses AUFS to overlay filesystems. When you start a container from an image, Docker doesn’t copy all the image’s files to the container’s root. It overlays a new read/write directory on top of the read-only directory holding the image’s filesystem. Any writes the container makes go to its read/write directory; all reads of unchanged files are actually served from the image’s read-only root. The image’s filesystem root can be shared between all its running containers, and a started container uses only as much space as it has actually written. This conserves more than disk space – when multiple containers are started from one image, the operating system can use the same memory cache for the files they all share. It also makes booting a container pretty much instantaneous, as Docker doesn’t need to copy the whole root filesystem.

This idea goes a bit further: the image itself is actually a frozen container. To prepare a new image, you just start a container, run one or more commands to install and prepare the application, and then commit the container as a new image. This means that your new image has only the files that have been added or changed since the image you started it from; that base image has only the files that have changed since its own base image, and so on. At the very bottom there’s a base image created from a filesystem archive – the only one that actually contains all of its files. There’s even a cool Dockerfile format that lets you describe, in one place, how to build the application from the base image. And this is where the layering goes a bit too far.

Two versions of the same Docker image

Docker itself is intentionally limited: when you start a container, you’re allowed to run only a single command, and that’s all. Then the container exits, and you can either dispose of it or commit it as a new image. For running containers, that’s fine – it enforces clean design and separation of concerns. When building images, though, every “RUN” line means a new image is committed, which becomes the base for the next “RUN” line, and so on. When building any reasonably complex software from a Dockerfile alone, we always end up with a whole stack of intermediate images that aren’t useful in any way. In fact, they are harmful, as there seems to be a limit on how many directories you can stack with AUFS. It’s reported to be 42 layers, which is not much considering the size of some Dockerfiles floating around.

It seems that flattening existing images is not a simple task. But can we build images in one go, without stacking dozens of layers on top of each other? It turns out to be quite easy. You just compress all the “RUN” Dockerfile statements into one shell script, and you’re halfway there. If you compress all the “ADD” statements into a single one too, you’re almost there: there’s one image for the ADD, and a second one to RUN the setup script. Besides the unnecessary intermediate image being plain ugly, there is another issue: the ADD line often copies a big installer or package into an image, only to have it removed by a RUN line after installation. The user still has to download the intermediate image with the huge package file, only to never see it, because the child image has deleted it.

It turns out we can use shared volumes instead of the ADD statement. If we use docker run + docker commit manually, rather than docker build with a Dockerfile, we can download all the installers on the builder host, expose them to the container as a shared volume, and then commit the container into an image in a single pass.

It’s possible to just write a shell script that does all of that manually. But the Dockerfile format is quite comfortable to use, and there are quite a few container definitions already available that we wouldn’t otherwise be able to reuse. And it’s completely feasible to have a script read the Dockerfile syntax and execute it as a one-pass, single-layer build (a rough sketch follows the list):

  • Compose a single dict out of metadata commands, such as MAINTAINER, CMD, or ENTRYPOINT
  • For each RUN command, add a line to a setup shell script in a shared directory
  • For each ADD command, copy the data into the shared directory, and add a line to the setup script that copies it to its final location
  • Build the image in one pass, with a single docker run -v $shared_dir:/.data $from /.data/setup.sh, and if it succeeds, commit the image with a single docker commit, using the metadata gathered earlier.
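
Here is what such a script might boil down to, sketched in Ruby (the real tool described below is written in Perl; this illustration handles only the basic keywords and leaves applying the collected metadata out):

    #!/usr/bin/env ruby
    # One-pass Dockerfile build: fold all RUN/ADD steps into a single setup
    # script in a shared volume, run it in one container, commit one image.
    require 'tmpdir'
    require 'fileutils'

    dockerfile, image_tag = ARGV
    abort "usage: #{$0} Dockerfile image:tag" unless dockerfile && image_tag

    from, metadata = nil, {}

    Dir.mktmpdir do |shared|
      setup = File.join(shared, 'setup.sh')
      File.write(setup, "#!/bin/sh\nset -e\n")

      File.foreach(dockerfile) do |line|
        line = line.strip
        next if line.empty? || line.start_with?('#')
        keyword, args = line.split(/\s+/, 2)
        case keyword.upcase
        when 'FROM' then from = args
        when 'MAINTAINER', 'CMD', 'ENTRYPOINT', 'ENV', 'EXPOSE'
          metadata[keyword.upcase] = args            # collected, not applied here
        when 'RUN' then File.open(setup, 'a') { |f| f.puts(args) }
        when 'ADD'
          src, dest = args.split(/\s+/, 2)
          FileUtils.cp_r(src, File.join(shared, File.basename(src)))
          File.open(setup, 'a') { |f| f.puts("cp -r /.data/#{File.basename(src)} #{dest}") }
        end
      end

      # One container for the whole setup script, then a single commit.
      cid = `docker run -d -v #{shared}:/.data #{from} /bin/sh /.data/setup.sh`.strip
      system('docker', 'wait', cid)
      system('docker', 'commit', cid, image_tag)
      warn "metadata to apply by hand: #{metadata.inspect}" unless metadata.empty?
    end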

As it turns out, 125 lines of Perl is all it takes. The docker-compile.pl script is in the Gist below, and in the image to the right you can see the inheritance diagram of the lopter/collectd-graphite image in two versions: on the right is the original, created by docker build; on the left is the one created with docker-compile.pl. The flat one takes 299 MB; the combined ancestry of the original is almost 600 MB.

All the script needs is Perl and the JSON CPAN module (on Debian or Ubuntu, you can install it with sudo apt-get install libjson-perl). I hope the idea will prove useful – if there’s demand, I can take this proof of concept, polish it, document it, and set up a proper GitHub repo with issue tracking and all. Happy hacking!


A brief look at Eurucamp 2013

Starting a day after jRuby conf and held by the lovely Müggelsee lake in south-east Berlin, this year’s Eurucamp was a well-balanced mix of talk and play. Kudos to the lovely organizing team for an extended lunch break that made it possible to enjoy the (probably) last days of summer.

The conference opened with an inspiring keynote by Joseph Wilk; you can see the slides for “Can Machines Be Creative” here. Wilk showed numerous examples of teaching computers to do things humans would consider creative if they were done by other humans, raising important questions about the role of technology in art, and beyond. The other conference talks touched quite evenly on technical, abstract, and community issues. Here’s what I picked:

Michael Grosser encouraged using the airbrake_tools and air_man libraries to debug more effectively. Both of them log, prioritize, and trace exceptions. Additionally, air_man can run airbrake_tools continuously, send emails about exceptions, and assign people to them, so nothing gets lost or left unattended. Michael also showcased his library request_recorder for effective and friendly logging. Request_recorder sits in your application’s stack and stores the full log; it also comes with a Chrome extension. You can see the whole presentation here.

As far as logging is concerned, Matthias Viehweger shared some useful tips. Most important message: never raise your log level globally — you’ll start treating everything as noise faster than you know it.

Arne Brasseur talked about web linguistics and why we should stop using strings to handle structured data. You should check out the slides for valuable insights into web security (and refreshing graphic representations).

If you’re striving to be a better programmer, Eurucamp gave you the opportunity to listen to three inspiring talks. Joanne Cheng, a developer at Thoughtbot, talked about how Ruby Processing helped her become a better programmer. Her talk is not online yet, but if you’re a beginning or intermediate programmer, have always wanted to be an artist, or just like seeing the effects of your work straight away, you ought to learn about Ruby Processing and go make some art. You should also visit the Processing website.

Ellen König presented the most effective learning techniques. We all know that learning is important and that all of us should develop our skills constantly, but picking the right thing to get better at, setting goals, and getting there may not be as easy as we think. Fortunately, Ellen has that covered. Watching Floor Drees’ talk “What I learned learning Rails” is a good wrap-up. You can watch the slides and/or read her talk.

Also worth checking out: Drew Neil’s “Modelling state machines with Ragel”.

I’d like to finish by linking to a great Eurucamp talk by Ashe Dryden. Ashe spoke about diversity in our programming community and quoted some powerful statistics. In her talk she goes through some important concepts that we should keep in mind. Did you know that women make up only 3% of OSS contributors, but at the same time are 73% of Bulgaria’s computer science students? Read Ashe’s slides to learn more and find ideas to make your community more diverse.