Lisp HTML sanitizer

Lately I have been thinking a lot about how to let webapp users edit rich text easily while keeping the application secure and injection-free.  Until recently, I would just use the trane-bb module of CL-Trane and make users type BBCode into a textarea: many users are familiar with it, and I could easily convert their BBCode to safe HTML.  However, JavaScript WYSIWYG editors produce HTML, which is not that surprising.  I googled around, read a bit about the issues with BBCode, Textile and other markup languages, and came to agree with Jeff Atwood (Is HTML a Humane Markup Language?) that HTML itself is the one markup language that is actually friendly to users.  I was pleasantly surprised to see that Bese’s fork of Franz’s phtml actually supports HTML sanitizing, and (having contributed quite a bit to Bese a few years ago) not surprised at all that this feature is not described or documented anywhere.  So, if you’re worried about accepting HTML (and if you’ve decided to accept HTML from users, you should be worried!), check this out:

darcs get http://common-lisp.net/project/bese/repos/parse-html/

2 thoughts on “Lisp HTML sanitizer”

  1. I’ve blocked out most of this weekend to think about how to sanitize any and all inputs into a webapp I’m working on (my first Lisp app). In my case I’m not providing textareas to anyone; I just need to figure out the most efficient way to run whitelist checks on everything. I don’t think that is any easier than dealing with a textarea, because some malicious type can always try to post stuff.

  2. Yes, that’s a big pain: anyone can post just about anything, and it’s my problem how to handle it. This time I have to allow basic formatting, so I have to parse the HTML to keep out injections, XSS, and so on; for simpler cases, I usually just use regexps.

    If a whitelist is all you need, you have a few options. With just a few possible values, you can use a simple list of strings; with more values, you can use a hash table. Sometimes I like to use a package instead of a hash table: you create a new package that doesn’t use any other package (including CL), use INTERN or READ to put symbols in the package, and FIND-SYMBOL to check whether a symbol exists. And if there are many possible values, or the check is complicated, I’d use a database anyway, so I’d leave consistency checking there.
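
    Here is a rough sketch of the first three options in Common Lisp. Every name in it (*allowed-tags*, allowed-tag-p, the whitelist-words package, and so on) is made up for illustration, and the tag list is only an example; none of it comes from parse-html or CL-Trane.

```lisp
;;; Three whitelist sketches. All names here are hypothetical.

;; 1. A short list of strings, compared case-insensitively.
(defparameter *allowed-tags* '("b" "i" "em" "strong" "a"))

(defun allowed-tag-p (name)
  "True if NAME is in the list; MEMBER with STRING-EQUAL ignores case."
  (and (member name *allowed-tags* :test #'string-equal) t))

;; 2. A hash table for larger sets. An EQUALP table compares
;;    string keys without regard to case.
(defparameter *allowed-table* (make-hash-table :test #'equalp))

(dolist (name '("b" "i" "em" "strong" "a"))
  (setf (gethash name *allowed-table*) t))

(defun allowed-in-table-p (name)
  "True if NAME is a key in the table."
  (values (gethash name *allowed-table*)))

;; 3. A dedicated package that uses no other package, not even CL,
;;    so it contains only the symbols interned into it below.
(defpackage :whitelist-words (:use))

(dolist (name '("B" "I" "EM" "STRONG" "A"))
  (intern name :whitelist-words))

(defun allowed-symbol-p (name)
  "True if NAME (upcased) is already a symbol in the whitelist package.
FIND-SYMBOL never creates symbols, so checking untrusted input
cannot grow the whitelist."
  (not (null (nth-value 1 (find-symbol (string-upcase name)
                                       :whitelist-words)))))
```

    With these definitions, (allowed-symbol-p "em") is true and (allowed-symbol-p "script") is NIL. The sketch only uses INTERN to fill the package from values I typed in myself; the lookup side sticks to FIND-SYMBOL, since that is the part that sees user input.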

    HTH :)
