Sleeper Agent for Python

tl;dr I just wrote a Python library that helps with debugging live processes. Check it out on GitHub.

I love the idea of after-conference sprints. My usual reaction to a tech conference – days of listening to talks on hacking, talking with people about hacking, and being bombarded with varied, concentrated knowledge on hacking – is to want to sit down and actually hack something. In the process I get all kinds of ideas, or remember an idea I had during a deathmarch half a year ago, had no time to do properly, and have since forgotten. It seems I’m not the only one who reacts this way. Right after a conference is the perfect moment to give people time and a place to sit down and use all that built-up motivation. I’m glad to see this kind of afterparty happening more often.

This is what people from Python Italia did at Europython 2012 in Florence. One of the talks that caught my particular attention was sys._current_frames(): Take real-time X-rays of your software for fun and performance by Leonardo Rochael Almeida. It was a very technical and concrete talk, and it reminded me of one particular bug hunt some months ago. One of the projects I work on uses Celery for background work. Pretty standard. But at some point (we could not relate it to any particular code change or upgrade), the worker pool started to lock up. The master process stopped responding to celeryctl status queries, individual workers stopped receiving any work, and everything froze up hard. kill -9 kind of hard. We were also unable to replicate it in any environment smaller than production, which made debugging a little bit harder.

I fired up the usual tools: strace and lsof. I found that the processes were not frozen, but were actually doing something. I attached gdb to poke into the process, and crashed right into a wall. The backtrace I got looked something like this:

This one was taken from a regular, idle ipython session; the production backtrace was longer, but just as useless. Now, how could I get to a Python-level backtrace? Preferably, for all the threads?

The Python wiki includes a Debugging with Gdb page. Sweet. But it needs the Python interpreter to be compiled with debugging symbols, which is not something I’d like to run in production. Also, getting this to run under Debian Squeeze involved hot-patching gdb scripts and configuration. It looked like a deep hack that may be useful for debugging the standard library and CPython itself, but not so much in my case.

Trying to call traceback.print_stack from the gdb level resulted only in a segmentation fault (probably related to the GIL). That was when I dropped the idea and just worked around the issue by running a bunch of solo workers instead of the pool, and updating the configuration later.

To be honest, it later turned out that the issue would not have been debuggable this way. It was caused by the pool master fork()ing children after creating some POSIX threads, with the children not calling execv() immediately afterwards to reset the state. In human language: this means random crashes. For details, look into the CELERYD_FORCE_EXECV option and Python issue #6271. But still – the possibility of poking into a living process without debugging symbols and seeing, not hypothesizing, what it REALLY does right now would have been a good thing.

In the meantime, I have figured out how to (kind of) do it in Ruby (regular MRI, tested on REE). Just type this into gdb to see the main thread’s backtrace on the stdout of the process:

Not so easy in Python. I tried to improvise something that I could wrap inside a gdb macro, but this turned out to be too hairy with the GIL and all. This called for a different approach:

The Sleeper Agent

The idea I had then (and that I finally got to work on at the sprints) was to have a Python library loaded into the process, which would be composed of two pieces:

  • a Python function that gathers the stack traces and other state of the process, and returns it as a string, and
  • a C module, loaded into the process, that would export a C function calling out to the Python function, and return its result as a C string.
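As a rough illustration of the first piece, here is a minimal sketch of a function that collects the Python-level stack of every thread, built on sys._current_frames() (the call the talk was about). The name dump_all_stacks is made up for this example, and this is not necessarily how the library itself does it.

```python
import sys
import threading
import traceback


def dump_all_stacks():
    """Return the Python-level stack of every thread as one string."""
    # Map thread idents to names so the report is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps each thread id to its topmost frame.
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "unknown")))
        # format_stack() walks from the given frame up to the outermost caller.
        chunks.extend(traceback.format_stack(frame))
        chunks.append("\n")
    return "".join(chunks)
```

The C side would then only need to call this function and hand the resulting string back to the debugger.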

The library, when loaded, would do nothing. (Well, it might be used to log process state on error, but that’s optional.) It would be used only when I manually poke the process with gdb; then the sleeper agent would activate and give me an exact report on what’s happening inside the process.

It could also be used to do statistical profiling (see also this Stack Overflow answer). If I want to see where a long-running process spends its time, I can just poke it every 15 seconds (or every 15 minutes), dump the stack trace, and then analyze the dumps. The hot code paths will just show up.
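To make the sampling idea concrete, here is a small in-process sketch: it polls one thread’s stack a number of times and counts which functions were on it. The function name, the sample count and the interval are arbitrary choices for the example; the poke-from-gdb approach would produce the same kind of data from the outside.

```python
import collections
import sys
import time


def sample_thread(thread_id, samples=20, interval=0.01):
    """Poll one thread's stack and count how often each function is seen."""
    hits = collections.Counter()
    for _ in range(samples):
        frame = sys._current_frames().get(thread_id)
        # Walk from the innermost frame outwards; every function on the
        # stack gets one hit per sample.
        while frame is not None:
            code = frame.f_code
            hits[(code.co_filename, code.co_name)] += 1
            frame = frame.f_back
        time.sleep(interval)
    return hits
```

Calling hits.most_common(10) on the result lists the functions that were on the stack most often, which is a crude approximation of where the time goes.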

This is exactly what The Sleeper Agent does. It even includes a script that calls out to gdb for you! (It prints some extra info I wasn’t able to suppress, but I guess that’s not a big problem.)

The README file includes all kinds of technical details, and the code itself is short enough to be readable.

What next?

The agent can be expanded to be even more useful. Some ideas come to mind, like:

  • Extend the returned info with locals and globals, or maybe function arguments, for each stack level.
  • Dump the data in a parseable format (JSON, YAML?) to be parsed and analyzed later on. This would help with the statistical profiling use case. It may then be easier to have the Python side write to a file and return its path to C/gdb.
  • Deep Magic stuff: an API for interactively inspecting the stack, locals and globals, with gdb attached. This would be a big project that would need scripting gdb and devising an API to explore the stack. Maybe some parts of Pdb could be reused here?
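The first two ideas could look roughly like the sketch below: every frame of every thread is turned into a dictionary, locals included, and the whole thing is serialised as JSON. All the names here (safe_repr, snapshot_as_json, the dictionary keys) are invented for the illustration; nothing like this exists in the library yet.

```python
import json
import sys
import threading


def safe_repr(value, limit=200):
    """repr() that never raises and never returns a huge string."""
    try:
        text = repr(value)
    except Exception:
        text = "<unrepresentable>"
    return text if len(text) <= limit else text[:limit] + "..."


def snapshot_as_json():
    """Return every thread's stack, with repr()'d locals, as a JSON string."""
    names = {t.ident: t.name for t in threading.enumerate()}
    threads = []
    for thread_id, frame in sys._current_frames().items():
        frames = []
        while frame is not None:
            frames.append({
                "file": frame.f_code.co_filename,
                "line": frame.f_lineno,
                "function": frame.f_code.co_name,
                # Values are stored as strings so the dump is always serialisable.
                "locals": {k: safe_repr(v) for k, v in list(frame.f_locals.items())},
            })
            frame = frame.f_back
        threads.append({"id": thread_id, "name": names.get(thread_id), "frames": frames})
    return json.dumps(threads, indent=2)
```

Keep in mind that calling repr() on live objects of a running program is not free and may have side effects, so a real implementation would probably make the locals part optional.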

Any more ideas? This tool looks like it may be useful. I’d like to explore the statistical profiling path. The usual way to do it seems to be to set up a SIGALRM or custom signal handler triggered from the outside, or to have a background thread that wakes up regularly and saves the state. I kind of like the idea of not having any custom code actually plugged in – just poke the program with the debugger, and do it only when I need it.
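For comparison, the signal-handler variant can be sketched in a few lines with the standard signal module. SIGUSR1 is an arbitrary choice here (and Unix-only), and the function names are made up for the example.

```python
import signal
import sys
import traceback


def dump_stack_on_signal(signum, frame, out=None):
    """Signal handler: write the interrupted Python stack to `out`."""
    if out is None:
        out = sys.stderr
    out.write("Signal %d received; stack at that moment:\n" % signum)
    # `frame` is the frame that was executing when the signal arrived.
    out.write("".join(traceback.format_stack(frame)))


def install_dump_handler(signum=None):
    """Register the handler; must be called from the main thread."""
    if signum is None:
        signum = signal.SIGUSR1  # assumption: a Unix system, signal otherwise unused
    signal.signal(signum, dump_stack_on_signal)
    return signum
```

After install_dump_handler() has run, a kill -USR1 on the process prints the stack of the main thread to stderr. The price is exactly the one mentioned above: the code has to be plugged into the program beforehand.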

I hope that the result of the Europython sprint will prove useful – and that other tech conferences will pick up on the idea!


Yaclml in pictures, part II: Templating

After a short intermission and explanation/excuse, let’s move on to the interesting stuff: Yaclml. If you still haven’t read the first part, where I wrote about HTML generation and compared Yaclml to CL-WHO, do so now!

Continuing the first part’s focus on comparing Yaclml to Ediware, let’s start by mentioning that CL-WHO doesn’t do templating at all; Edi Weitz wrote a separate library for this: HTML-Template. It provides simple templating support, which turns a template of an HTML file (or actually any text file, not limited to HTML) into a closure which, when called, can fill the template with supplied values and write the result to a stream. Simple enough, and based on Perl’s HTML::Template, which shows in the templating language syntax. The template directives of HTML-Template are embedded in HTML comments.

Yaclml includes the templating feature, and it chose a different path. For me, with a tiny bit of Zope background, their path is a little nicer: they decided to support a Lisp variant of Zope’s Template Attribute Language. TAL is strictly an XHTML and XML templating language, whose directives are special XML tags and attributes, living in a separate XML namespace. I find this approach much more elegant than magic directives embedded in comments. This also makes it possible to use XML/XHTML editing tools (such as Emacs’ nxml-mode and the excellent nxhtml-mode built on top of it) to aid in authoring and validating the code. Supposedly it also plays fine with visual design tools, such as Adobe Dreamweaver, but I can’t confirm that, since I don’t use those.

In this section, I won’t go on comparing Yaclml to HTML-Template. I haven’t used the latter much, the two are not as similar as CL-WHO and Yaclml’s HTML generation, and it would take non-trivial examples to show meaningful differences; besides, Yaclml’s TAL is obviously better, and anyone will see it from TAL alone. ;) Seriously, HTML-Template’s documentation is great, it’s a simple package, and you can compare for yourself. The two packages do similar jobs; the main difference is TAL’s focus on XML and XHTML, and the syntactic differences are much more a matter of taste than they are with HTML generation. However, Yaclml, and especially its TAL support, is underdocumented (there was a tutorial over at the UCW Wiki, but at the moment it’s inaccessible, and the only way to read it is Google’s cache), and it’s really hard to figure out how to use it, especially outside of the UCW framework. That’s the hole I’m trying to fill here.

While writing this article, I found that Yaclml is actually somewhat documented, in the qbook format, but the generated, readable docs are nowhere to be found. I’ve rebuilt the HTML files and uploaded them at http://common-lisp.net/~mpasternacki/yaclml-qbook/. These are auto-generated API docs, so it’s not an easy-to-read tutorial, but rather a reference, and some descriptions in there might be dated; however, it may come in handy when you explore Yaclml’s APIs on your own.

Using templates

Let’s start with the basic Hello, World template and see how to render it from Lisp. I will write more about the template language in the next section, but I want to cover the Lisp part first, so that you’re able to run and test more complex examples as you read. Here’s the template:

We can see it’s an XHTML document, which also makes it proper XML, and that it uses XML namespaces. The tal namespace refers to the templating language’s tags and attributes; the tal:content attribute replaces the tag’s interior with the value of a variable. We’ll get to this later; for now we just want to display it.

To render a template from Lisp, we need the cooperation of three parts: the generator, the template itself, and the environment. The generator is an object that finds templates by name and compiles them to efficient closures. Yaclml provides a filesystem generator, which finds templates as files in specified directories (‘roots’), but it’s possible to get templates from, e.g., an SQL database or the network by creating a class that inherits from TAL-GENERATOR or FILE-SYSTEM-GENERATOR. A loaded template is a closure which, when called (with an environment and a generator for finding included templates as arguments), renders the template to *YACLML-STREAM*. The environment is a mapping from variables used in templates to values.

So, let’s render our template in the simplest way possible:

Here, we define a filesystem generator and a simple environment, then we load the template using the generator, and we call it. Simple. The result is rendered into the stream held in the YACLML:*YACLML-STREAM* variable (the default is T, which makes output go to *STANDARD-OUTPUT*; the macros (YACLML:WITH-YACLML-STREAM STREAM &BODY BODY) and (YACLML:WITH-YACLML-OUTPUT-TO-STRING &BODY BODY) can be used to redirect the output elegantly).

There is not much more to say about generators: the only generator type actually provided by Yaclml is FILE-SYSTEM-GENERATOR, which accepts a list of pathnames naming directories that will be searched for templates, and an optional initarg :CACHEP which tells whether to cache already parsed templates.

To get the template closure, we use the generator and the LOAD-TAL function. Alternatively, we can compile a template directly from a file or string, using COMPILE-TAL-FILE or COMPILE-TAL-STRING. To render a template, we simply call the resulting closure, passing it an environment and a generator (which is used for finding templates included by the template being called) as arguments.

Finally, we get to environments. These are used to fill in templates; they map variable names used in TAL expressions to values. Environments are lists of binding sets; a binding set may be a hash table, an association list, a CLOS object (in this case, the key would be a slot name), or anything on which a method for the FETCH-TAL-VALUE generic function is defined. A new environment can be constructed from key-value pairs using the TAL-ENV function, from a list of binding sets using MAKE-STANDARD-TAL-ENVIRONMENT, or from two existing environments with EXTEND-ENVIRONMENT. The last of these functions effectively lets you create binding stacks in Lisp code.
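If the “list of binding sets” idea sounds abstract, the following Python analogy may help. It is not Yaclml code: the function names are invented, dictionaries stand in for hash tables and plain objects for CLOS instances, and the “first match wins” lookup order is my assumption about how such a stack would normally behave.

```python
from types import SimpleNamespace


def fetch_value(environment, name, default=None):
    """Look `name` up in a list of binding sets; the first match wins."""
    for binding_set in environment:
        if isinstance(binding_set, dict):
            if name in binding_set:
                return binding_set[name]
        elif hasattr(binding_set, name):
            # Plain objects play the role of CLOS instances: attribute = slot.
            return getattr(binding_set, name)
    return default


def extend_environment(inner, outer):
    """Put `inner` in front of `outer`, so its bindings shadow the outer ones."""
    return list(inner) + list(outer)


# Usage: a per-request binding set shadows the site-wide defaults.
site = [{"title": "My site", "user": "anonymous"}]
request = [SimpleNamespace(user="alice")]
env = extend_environment(request, site)
```

With this setup, looking up user finds the per-request value, while title falls through to the site-wide binding set.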

Template syntax

As I already wrote, a template is plain XML (usually XHTML), and the whole logic is done by active tags and attributes. Yaclml maps XML namespaces to Lisp packages (see the YACLML:*URI-TO-PACKAGE* variable), so you can easily look up the definition of any tag with SLIME (or, if you’re one of those people, using your Lisp vendor’s IDE).

The only package/namespace that actually contains active tags/attributes provided by Yaclml is :IT.BESE.YACLML.TAL, AKA :TAL, bound to the namespace http://common-lisp.net/project/bese/tal/core.

Tags

There are only a handful of tags, so let’s start with them.

tal:tal

This tag is semantically neutral—meaningless, and this is why it’s useful. It is used whenever we want to do something with templates, but don’t want to introduce HTML/XML-level elements. I usually use it to group together a sequence of tags.

For example: if we include a sub-template, and the included template consists of more than one tag, we need a top-level tag for it to be well-formed XML, and to, e.g., set the XML namespace:

Another example: I need to conditionalize a sequence of list elements in a menu:


tal:lisp

This is the tag that you should never, ever use. Seriously. It is the root of all evil, the cause of mixing MVC layers by introducing logic into templates. However, it was included by the Yaclml authors, so I feel obligated to cover it. And when you’re a programmer creating a structure of templates for further use, or a library of widgets, it might have some limited use, and might save some keystrokes. But the rule of thumb when it comes to using this tag is: don’t.

OK, you’ve been warned. Now, here’s how this tag works: it simply interprets tag’s content as a TAL expression, which is actually Lisp code sprinkled with a bit of environment magic. That’s it. I don’t want to spoil you, so no examples for this tag; figure it out yourself (there is not much there to figure out anyway).

tal:include

This tag allows you to dynamically include templates within templates. It requires either a tal:name attribute, literally specifying the name of the included template, or a tal:name-expression attribute, which is interpreted as a TAL expression whose result specifies the name of the included template:

Let’s render these two templates:

But hey! There’s more!

I haven’t yet mentioned one special XML namespace, xmlns:param="http://common-lisp.net/project/bese/tal/params". It allows you to pass arguments (environment parameters) to the included template, which effectively gives you not only parametrized subtemplates, but also template inheritance, a very powerful tool for organizing your templates. Let’s look into Yaclml’s own test suite and see how it works:

We can pass not only plain parameters to subtemplates, but also whole HTML subtrees.

Attributes

More interesting work, and actual logic, lives in TAL’s attributes. Let’s look at those.

tal:content, tal:content-as-is, and tal:replace

These attributes insert the value of a TAL expression into their tag (content), or replace their tag entirely with it (replace). Plain content and replace escape HTML special chars (<">); content-as-is does not escape anything. We’ve already seen those in action.
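The difference between the escaped and the as-is variants is the usual one in any templating system. Here is a Python analogy using the standard html module; the two function names are made up, and html.escape also escapes & and ', which may not match Yaclml’s exact set of characters.

```python
import html


def render_content(value):
    """Escaped insertion, analogous to tal:content and tal:replace."""
    return html.escape(str(value), quote=True)


def render_content_as_is(value):
    """Unescaped insertion, analogous to tal:content-as-is."""
    return str(value)
```

The as-is variant is the one to use for markup you generated yourself and trust; anything that comes from a user should go through the escaping one.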

tal:when and tal:unless

These attributes are conditionals. They render the tag they belong to (and its content, of course) when a TAL expression is true (tal:when) or false (tal:unless). We’ve seen those too. Unfortunately, there is no if-then-else construct; it would be hard to encode elegantly in XML.

tal:dolist

A looping construct. Its TAL expression should return a list of environments. Its tag will be rendered once for each environment on the list, with the current environment extended by it. Let’s see it:

Not the most readable or elegant, but it is expressive and gets the work done. A bit like the next attribute…

tal:let

This one, added to Yaclml by yours truly (of which yours truly should be slightly ashamed), extends the environment for the tag’s content with variables specified as in a LET form. This breaks layer separation almost as badly as TAL:LISP, could be implemented way better (e.g. by using the xmlns:param="http://common-lisp.net/project/bese/tal/params" namespace and allowing the user to use it in all tags, not only tal:include), and is generally evil. Nevertheless, it’s already committed to Yaclml, and should be documented. Please, don’t look at the following example:

tal:in-package

This attribute sets the current package for TAL expressions within its tag. We’ll talk about expressions in a moment; for now, a silly example:

Expressions

Yaclml’s TAL expressions are simply Lisp expressions, with all their upsides and downsides. The current package is the one set with tal:in-package or, when none was set, the package that was active when the template was called. There is one difference, though: the readtable is modified. Symbols following the $ prefix are looked up in the current environment. That’s all. You can (and should not) use all the power of Lisp in the expressions; remember to quote double quotation marks (the " sign) as &quot; entities. Yes, this gets unreadable and ugly. Simply don’t overuse it. When in doubt, move logic to Lisp code.

These rules apply to most Yaclml attributes; unfortunately, not to all of them. The ugly exception is the tal:name-expression attribute of the tal:include tag. This attribute, like the attributes of all plain HTML tags, can include TAL expressions by wrapping them as ${tal-expression}. That’s where the ugly ${$included}.tal syntax in the tal:include example came from.

The second form, @{tal-expression}, expects tal-expression to return a list; the resulting string is the concatenation of this list’s elements.

Caveats

Yaclml has some issues to be aware of. Here are those that I know; this is probably not an exhaustive list, but it should be useful anyway. So…

Input TAL templates are read and interpreted as XML files. This means that XML comments are discarded before the interpreter even sees them. This is usually a good thing; however, if someone tries to use conditional comments, there’s a nasty surprise: the conditionals are eaten by the template engine. In the next part, which will be about extending Yaclml, we’ll learn how to work around this.
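This behaviour is not specific to Yaclml; most XML parsers drop comments unless told otherwise. A quick way to see the effect is Python’s standard xml.etree.ElementTree, used here purely as an analogy (the template text and the roundtrip function are made up for the example):

```python
import xml.etree.ElementTree as ET

TEMPLATE = """<div>
  <!--[if IE]><p>Old browser</p><![endif]-->
  <p>Hello</p>
</div>"""


def roundtrip(source):
    """Parse XML and serialise it again, as a template engine would."""
    # The default parser does not keep comment nodes in the tree.
    return ET.tostring(ET.fromstring(source), encoding="unicode")
```

After the round trip the conditional comment is gone, while the ordinary paragraph survives.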

Attributes aren’t interpreted consistently: most TAL attributes accept TAL expressions, except the tal:name-expression attribute of the tal:include tag, which is interpreted as a plain HTML attribute and needs expressions wrapped in the ${…} syntax; and the tal:name attribute of the same tag ignores every attempt to use any syntax at all.

TAL expressions need to be valid XML attributes, so Lisp has to be quoted. This is especially annoying when you try to use string literals within expressions. However, this has a simple workaround: don’t pack logic into your expressions. If this is annoying, make sure you’re not cramming controller logic into your view layer inside templates.
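To show what the quoting does to a string literal, here is a tiny Python helper that escapes an expression for use inside a double-quoted XML attribute. The helper name and the Lisp expression are invented for the illustration; only xml.sax.saxutils.escape is a real library call.

```python
from xml.sax.saxutils import escape


def tal_attribute(name, lisp_expression):
    """Build name="..." with the Lisp expression escaped for XML."""
    # escape() handles &, < and >; the extra mapping covers double quotes,
    # which would otherwise terminate the attribute value.
    escaped = escape(lisp_expression, {'"': "&quot;"})
    return '%s="%s"' % (name, escaped)


# A hypothetical expression with a string literal inside.
EXAMPLE = tal_attribute("tal:content", '(concatenate \'string "Hello, " $name)')
```

Two string literals already make the attribute noticeably harder to read, which is a good hint that the expression belongs in Lisp code instead.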

Summary

TAL is not perfect, but it is a solid, usable and quite optimized code base. For generating HTML and XML documents, it is way more convenient (for me) than text-based approaches, as it enforces well-formedness and (to some extent) validity of the generated HTML. It is also extensible, which I’ll write about in the next part. Stay tuned!