Chef vs Puppet – my take on the holy war

I’m often asked why I chose Chef over Puppet for my day-to-day configuration management work. Let me start by stating the now-obvious: the answer to the “Chef or Puppet?” question is “Yes.”

I don’t have much first-hand experience with Puppet, so my evaluation is based mostly on feature set and personal preference. In the long run, both ecosystems do pretty much the same thing; the main differences are in philosophy and in some of the features.

Here’s why Chef’s approach works better for me:

Fixed order of execution

Puppet orders the resources it applies by explicitly declared dependencies. Chef executes run lists and recipes top to bottom, branching only on explicitly declared notifications and subscriptions. While Puppet’s model is nicer in theory, in practice I prefer the stability of Chef’s approach. For example, Puppet can seemingly randomly reorder resources after a new one is added, due to hashing details, which makes me a bit afraid of unexpected side effects.
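To make the “top to bottom, plus notifications” idea concrete, here is a toy model in plain Ruby. It is only a sketch and not Chef’s implementation: the ToyRun class, its methods and the resource names are all made up for illustration, and it assumes notifications are queued and fired once at the end of the run.

```ruby
# Toy model, NOT Chef's implementation: resources are applied in the order
# they were declared, and a notification queues an extra action that fires
# once, after every declared resource has been applied.
class ToyRun
  attr_reader :log

  def initialize
    @resources = [] # an Array keeps declaration order
    @log = []
  end

  # Declare a resource. `changed` says whether applying it changes anything;
  # `notifies` names the action to queue if it does. Both are hypothetical.
  def resource(name, changed: false, notifies: nil)
    @resources << { name: name, changed: changed, notifies: notifies }
  end

  def converge
    delayed = []
    @resources.each do |res|
      @log << "apply #{res[:name]}"
      delayed << res[:notifies] if res[:changed] && res[:notifies]
    end
    # Each queued notification fires only once, at the very end.
    delayed.uniq.each { |target| @log << "notified #{target}" }
    @log
  end
end

run = ToyRun.new
run.resource('package[nginx]')
run.resource('template[nginx.conf]', changed: true, notifies: 'service[nginx] restart')
run.resource('template[site.conf]', changed: true, notifies: 'service[nginx] restart')
run.resource('service[nginx]')
puts run.converge
```

Adding a new resource to this list changes the run only at the point where it was inserted; everything before and after keeps its relative order, which is the property I care about.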

Native Ruby

When I was choosing, the Ruby DSL for Puppet was a new, unstable and incomplete feature. It may be better now, but the basic language for writing manifests is still a separate declarative language. As before, this is cleaner in theory, but it forces me to write configuration in a separate, limited language. Even though it’s supposedly Turing-complete (if you’re a wizard), it can still lead to either hairy code (I think I’ve heard you can do loops with recursion, Scheme-style) or copy-and-paste coding. Chef recipes are plain Ruby, so things like looping and mapping over lists, or adding custom libraries, are possible.
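As a rough illustration of what “plain Ruby” buys you, the sketch below loops over a list to declare one package resource per entry. To keep it runnable without Chef, it defines a stand-in `package` method that only records what was declared; in a real recipe the Chef DSL provides `package`. The package names and the DECLARED constant are assumptions for the example.

```ruby
# Stand-in for Chef's `package` resource so this sketch runs without Chef.
# It only records what would have been declared.
DECLARED = []

def package(name)
  DECLARED << "package[#{name}]"
end

# The part that would live in a recipe: ordinary Ruby iteration,
# one resource per list entry, no copy-and-paste.
%w[git curl htop].each do |pkg|
  package pkg
end

puts DECLARED
```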

Chef still has a clear separation between “wizard code” (definitions, resources and providers, libraries) and “code for mortals” (recipes themselves, templates, roles), which makes it easy for non-specialist programmers to pick up and modify code.

Data-driven approach and orchestration

I’ve had trouble explaining this one to Puppet people. They usually resort to “we have facts” (which seem to be equivalent to Chef’s “automatic attributes”) and “there are probably add-ons for this”. My guess is that this part of Chef’s out-of-the-box feature set is not trivial to get running with Puppet. Let me explain in detail:

The Chef server itself is a thin API over a document database (CouchDB) and a full-text search engine (Solr). This makes the server a bit tricky to set up and manage by yourself, as it depends on quite a few services (CouchDB, Solr, RabbitMQ, and a bunch of processes of the Chef server itself), but it has a few deep implications. The first is that once you get the idea that the Chef server is just a searchable document database, its internal model becomes very simple and consistent. What’s a Node? It’s a searchable JSON document. Role? Searchable JSON document. Environment? You guessed it.

Data bags? These are also searchable JSON documents, but they are a different beast: they are completely custom. A “data bag” is a named bucket for JSON documents, called “data bag items”. They are searchable and can be used by any recipe. An example data bag item from the “users” data bag in one of my projects looks like this (in YAML format for simplicity):

The generic-users cookbook searches the users data bag to create shell accounts on the machines and populate their ~/.ssh/authorized_keys files. The Nagios and Munin cookbooks search for users data bag items with group:sysadmin to configure the list of allowed OpenIDs; Nagios also sets up notifications based on these items. The Jenkins cookbook does the same, but for all users, not only sysadmins. I have a central place to configure users, and all other cookbooks pull data from it. Setting up a new employee is as simple as putting their data in one place and running chef-client on all machines.
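The pattern can be sketched in plain Ruby. Everything below is hypothetical: the two items, their field names (id, group, openid, ssh_keys) and the search_users helper are invented for illustration and are not my real data bag or Chef’s API. In an actual recipe, the lookup would go through Chef’s search with a query such as group:sysadmin rather than an in-memory filter.

```ruby
# Hypothetical "users" data bag items; the field names are assumptions.
USERS = [
  { 'id' => 'alice', 'group' => 'sysadmin',
    'openid' => 'https://openid.example.com/alice',
    'ssh_keys' => ['ssh-rsa EXAMPLEKEYALICE alice@example'] },
  { 'id' => 'bob', 'group' => 'developer',
    'openid' => 'https://openid.example.com/bob',
    'ssh_keys' => ['ssh-rsa EXAMPLEKEYBOB1 bob@example',
                   'ssh-rsa EXAMPLEKEYBOB2 bob@home'] }
].freeze

# In-memory stand-in for a data bag search: keep the items whose fields
# match every key/value pair in the query. An empty query matches all items.
def search_users(items, query = {})
  items.select { |item| query.all? { |field, value| item[field] == value } }
end

# Content of ~/.ssh/authorized_keys for one user: one key per line.
def authorized_keys(item)
  item['ssh_keys'].map { |key| "#{key}\n" }.join
end

# Nagios/Munin style: only sysadmins. Jenkins style: everyone.
nagios_openids  = search_users(USERS, 'group' => 'sysadmin').map { |u| u['openid'] }
jenkins_openids = search_users(USERS).map { |u| u['openid'] }

puts nagios_openids
puts jenkins_openids
puts authorized_keys(USERS.last)
```

Each cookbook asks its own question of the same data, so adding a third item to USERS would update all three outputs without touching the code that consumes them.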

That was just a simple example; this can get much more involved. I use a deployments data bag to configure all the projects I’m deploying (usernames, API access keys, etc.). Opscode has an “application” cookbook that achieves quite deep magic this way, including continuous deployment. Well-written cookbooks make for a data-driven setup: most minor changes mean updating the data bag(s) without even touching the recipes themselves.

In Chef 0.10, there is also the option of encrypted data bags: with a secret key shared between the node that needs to know the data and the sysadmin who uploads the data bag, the Chef server doesn’t even need to know the sensitive details. I’ve used it to protect, for example, AWS access keys with permissions to bring up and destroy RDS database instances.
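The idea behind this can be sketched with Ruby’s standard OpenSSL bindings. This is a minimal illustration of shared-secret encryption, not Chef’s actual encrypted data bag format: the helper names, the SHA-256 key derivation, the JSON wrapper and the example values are all assumptions, and a production scheme would also need to authenticate the ciphertext.

```ruby
require 'openssl'
require 'digest'
require 'base64'
require 'json'

# Derive a 32-byte AES key from the shared secret (an assumption for this
# sketch; a real setup would pick its key derivation more carefully).
def derive_key(secret)
  Digest::SHA256.digest(secret)
end

# Encrypt one value. Only the IV and ciphertext would be uploaded,
# so the server never sees the plaintext.
def encrypt_value(value, secret)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = derive_key(secret)
  iv = cipher.random_iv # generates a fresh IV and sets it on the cipher
  data = cipher.update(JSON.generate('v' => value)) + cipher.final
  { 'iv' => Base64.strict_encode64(iv), 'data' => Base64.strict_encode64(data) }
end

# Decrypt on the node, which holds the same shared secret.
def decrypt_value(blob, secret)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.decrypt
  cipher.key = derive_key(secret)
  cipher.iv = Base64.strict_decode64(blob['iv'])
  plain = cipher.update(Base64.strict_decode64(blob['data'])) + cipher.final
  JSON.parse(plain)['v']
end

secret = 'shared-secret-known-to-node-and-sysadmin' # hypothetical secret
stored = encrypt_value('EXAMPLE-AWS-SECRET-KEY', secret)
puts stored['data']                 # opaque to the server
puts decrypt_value(stored, secret)  # readable on the node
```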

The second aspect of the Chef server being a searchable document database is orchestration. A node can search for other nodes using Solr’s query language. This way, the frontend web server for a “production” deployment will know the addresses of the application servers, which in turn will know the address of the database to contact. And the other way around: the application server knows the IP of the frontend server, and the database server knows the IPs of the application servers, allowing them to configure firewall rules automatically. Nodes know each other’s public SSH keys, populating /etc/ssh/known_hosts automatically and avoiding host key warnings. Munin and Nagios servers know which clients are out there and what services to expect. And so on, and so on…
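Here is a rough plain-Ruby sketch of that two-way lookup. The node documents, their fields, the find_nodes helper and the rule strings are all made up for illustration. In a real recipe the lookup would be a Chef node search along the lines of role:app AND chef_environment:production, not a filter over a local array.

```ruby
# Hypothetical node documents, reduced to the fields this sketch needs.
NODES = [
  { 'name' => 'web1', 'environment' => 'production', 'roles' => ['frontend'], 'ipaddress' => '10.0.0.10' },
  { 'name' => 'app1', 'environment' => 'production', 'roles' => ['app'],      'ipaddress' => '10.0.0.21' },
  { 'name' => 'app2', 'environment' => 'production', 'roles' => ['app'],      'ipaddress' => '10.0.0.22' },
  { 'name' => 'app9', 'environment' => 'staging',    'roles' => ['app'],      'ipaddress' => '10.0.1.21' }
].freeze

# In-memory stand-in for a node search by role and environment.
def find_nodes(nodes, role:, environment:)
  nodes.select { |n| n['roles'].include?(role) && n['environment'] == environment }
end

# Frontend side: which backends should the web server proxy to?
upstreams = find_nodes(NODES, role: 'app', environment: 'production')
            .map { |n| "#{n['ipaddress']}:8080" } # port 8080 is an assumption

# App server side: which frontends may connect? Rules are plain strings here.
allow_rules = find_nodes(NODES, role: 'frontend', environment: 'production')
              .map { |n| "allow tcp from #{n['ipaddress']} to port 8080" }

puts upstreams
puts allow_rules
```

Bringing up a third production app server would just mean one more document in the search results; neither the frontend nor the firewall code would need to change.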

Search is also good for looking at and selecting sub-groups of nodes (see the Opscode blog post on finding recently created hosts with Chef). There’s also a “knife ssh” command built on top of search that opens parallel SSH connections to the nodes it finds, or opens a screen/tmux window to each of them.

Downside

Chef has one minus: there is no good web UI for the Chef server. There is a panel written by Opscode, but it’s clunky and user-unfriendly. While I sometimes use it to browse server state (usually to debug changes made by others and differences between the live state and the Git repository), I can’t seriously use it for anything more involved.