March 11, 2007

Python String Functions

What use are python function decorators? Here is something fun: we can use them for speedy string templates. What follows is a further development of the templet python string-template idea. I find these techniques useful when building HTML and XML pages.

Here is a @stringfunction function decorator:

from templet import stringfunction

@stringfunction
def poem(jumper, jumpee="moon"):
  "The $jumper jumped over the $jumpee."

The @stringfunction decorator transforms "poem" into the following function:

def poem(jumper, jumpee="moon"):
  out = []
  out.append("The ")
  out.append(str(jumper))
  out.append(" jumped over the ")
  out.append(str(jumpee))
  out.append(".")
  return ''.join(out)

The decorator saves quite a bit of typing! In this implementation of @stringfunction, handy ${...} and ${{...}} syntax allows us to embed arbitrary python code, so we can make our templates as powerful as we need them to be. If we want to introduce complicated logic, our code can use "out.append(text)" to build up the string by hand.
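To make the two embedding forms concrete, here is a hand-written approximation of the expansion, based on the generated-code shape shown above rather than on output captured from templet. A ${expr} directive turns into out.append(str(expr)), and a ${{...}} block is copied into the function body as ordinary code. The shopping_list name and its template are invented for this illustration.

```python
# Hypothetical template, shown as a comment so this sketch runs without templet:
#
#   @stringfunction
#   def shopping_list(items): r"""
#     ${len(items)} items:
#     ${{
#       for item in items:
#         out.append('* ' + item + '\n')
#     }}
#     """

# Roughly the function the decorator would build from that template.
def shopping_list(items):
    out = []
    out.append(str(len(items)))         # ${len(items)}: evaluated, then str()'d
    out.append(' items:\n')             # constant text between directives
    for item in items:                  # ${{...}}: code copied in as-is
        out.append('* ' + item + '\n')
    return ''.join(out)

print(shopping_list(['milk', 'eggs']))
```

Because the embedded block shares the function's local scope, it can read the parameters directly and append to the same "out" list that the surrounding text uses.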

The templates we get this way have all their parameters nicely declared, so they are easy to call and misspelled arguments get caught. These function templates are also far faster than the most popular template solutions, benchmarking more than four times faster than Django templates, Python's string.Template templates, or the v1 templet classes. And we can use them by importing just one small python module, with no hacking of the import path mechanism or other major surgery.
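If you want to spot-check this kind of comparison yourself, a minimal timing harness might look like the following. It assumes Python 3 and only the standard library, times the hand-expanded poem() from above against string.Template, and makes no claim about what ratio you will see; the numbers depend on the interpreter version and the machine. The poem_via_template helper is a name made up for this sketch.

```python
import timeit
from string import Template

# Hand-expanded equivalent of the generated poem() shown earlier.
def poem(jumper, jumpee="moon"):
    out = []
    out.append("The ")
    out.append(str(jumper))
    out.append(" jumped over the ")
    out.append(str(jumpee))
    out.append(".")
    return ''.join(out)

# The same text rendered through the standard library's string.Template.
poem_template = Template("The $jumper jumped over the $jumpee.")

def poem_via_template(jumper, jumpee="moon"):
    return poem_template.substitute(jumper=jumper, jumpee=jumpee)

# Both approaches should render identical text before speed is compared.
assert poem("cow") == poem_via_template("cow")

CALLS = 20000
for name, fn in [("generated function", poem),
                 ("string.Template", poem_via_template)]:
    seconds = timeit.timeit(lambda: fn("cow"), number=CALLS)
    print("%-20s %.4fs for %d calls" % (name, seconds, CALLS))
```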

Does Python Need Standard Templates?

One of the reasons for Ruby's popularity is that there is a simple, natural-feeling templating system that is part of the Ruby standard library. ERB goes beyond simple substitution, and gives you a way to embed Ruby in a string.

Python's built-in %-string interpolation and string.Template don't go far enough, because they provide no clean solution for composition and control flow. The first time you find yourself going through the parameters of a long and complicated % expression, trying to figure out which ones need to be cgi.escaped and which ones don't, you will know what I mean. In 2007, we know that sometimes you really do want some code inside your strings, and that model-view religion should not be an orthodoxy. In the Python world, we have Django templates, Cheetah, and other solutions that have tried to fill the gap.

But, to my taste, none of these systems feel minimal, simple, or particularly natural for python. When I am doing weekend hacking and need a quick HTML UI, I don't really want to introduce a big templating package or a new language with unfamiliar syntax like {{ var|foo:"bar" }}. I just want to be able to put some python in my strings.

A Longer Example

The base-class based template approach I previously discussed is very natural for building HTML UI, but its use of exec (and unbound variables) during template expansion makes it slightly slower than other template solutions.

The decorator approach is much faster. It is also almost as nice for large templates.

Here is a longer example that shows @stringfunction being used in a slightly less trivial context, with a little bit of code and composition:

from templet import stringfunction
import cgi

# An example bit of data to display
class FavoritesInfo:
  def __init__(self, user):
    self.user = user
    self.favorites = []
  def addFavorite(self, url, description=None):
    self.favorites.append((url, description))

@stringfunction
def favoritesPage(info): r"""
  <html>
  <head><title>${cgi.escape(info.user)}'s Favorites</title></head>
  <body>
  <h1>${cgi.escape(info.user)}'s Favorites</h1>
  <ol>
  ${{
    for fav in info.favorites:
      out.append('<li>')
      out.append(buildLink(fav[0], fav[1]))
  }}
  </ol>
  </body>
  </html>
  """

@stringfunction
def buildLink(url, description):
  """<a href="${cgi.escape(url, quote='"')}">$description</a>"""

info = FavoritesInfo('Dave')
info.addFavorite('http://popurls.com/', 'PopURLs')
info.addFavorite('http://news.google.com/', 'Google News')
print favoritesPage(info)

How it Works

The @stringfunction decorator is a subversion of python's documentation strings. The decorator transforms a function's __doc__ string into executable python code, and then returns a function with the same signature as the original function, but with the code replaced by the generated code.
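The core of that transformation can be sketched in a few lines. The version below is a simplification written for Python 3, not the module's actual code: its pattern only knows $$, $name and ${expr}, and the template_to_code name is invented here. It relies on the fact that re.split with a single capturing group returns literal text and captured directives in strict alternation.

```python
import re

# Simplified pattern (an assumption for illustration): only $$, $name, ${expr}.
DIRECTIVE = re.compile(r"\$(\$|[_a-z][_a-z0-9]*|\{[^}]*\})", re.IGNORECASE)

def template_to_code(template, listname="out"):
    """Turn template text into a list of '<listname>.append(...)' source lines."""
    lines = []
    # Even indexes are literal text, odd indexes are the captured directives.
    for i, part in enumerate(DIRECTIVE.split(template)):
        if i % 2 == 0:
            if part:
                lines.append("%s.append(%r)" % (listname, part))
        elif part == "$":
            lines.append("%s.append('$')" % listname)          # $$ escape
        elif part.startswith("{"):
            lines.append("%s.append(str(%s))" % (listname, part[1:-1]))
        else:
            lines.append("%s.append(str(%s))" % (listname, part))
    return lines

for line in template_to_code("The $jumper jumped over the ${jumpee.upper()}."):
    print(line)
```

Unlike the real builder, this sketch quietly leaves an unmatched $ in the literal text instead of raising SyntaxError.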

The templet module code below supports @stringfunction and a similar @unicodefunction decorator. It clocks in at a brief 64 lines of code.

The full version here also includes unit tests and more docs, as well as including support for the StringTemplate and UnicodeTemplate base classes that I have discussed previously.

import sys, re, inspect

class _TemplateBuilder(object):
  __pattern = re.compile(r"""\$(        # Directives begin with a $
        \$                            | # $$ is an escape for $
        [^\S\n]*\n                    | # $\n is a line continuation
        [_a-z][_a-z0-9]*              | # $simple Python identifier
        \{(?!\{)[^\}]*\}              | # ${...} expression to eval
        \{\{.*?\}\}                   | # ${{...}} multiline code to exec
        <[_a-z][_a-z0-9]*>            | # $<sub_template> method call
      )(?:(?:(?<=\}\})|(?<=>))[^\S\n]*\n)? # eat some trailing newlines
    """, re.IGNORECASE | re.VERBOSE | re.DOTALL)

  def __init__(self, constpat, emitpat, callpat=None):
    self.constpat, self.emitpat, self.callpat = constpat, emitpat, callpat

  def __realign(self, str, spaces=''):
    """Removes any leading empty columns of spaces and an initial empty line"""
    lines = str.splitlines()
    if lines and not lines[0].strip(): del lines[0]
    lspace = [len(l) - len(l.lstrip()) for l in lines if l.lstrip()]
    margin = len(lspace) and min(lspace)
    return '\n'.join((spaces + l[margin:]) for l in lines)

  def build(self, template, filename, s=''):
    code = []
    for i, part in enumerate(self.__pattern.split(self.__realign(template))):
      if i % 2 == 0:
        if part: code.append(s + self.constpat % repr(part))
      else:
        if not part or (part.startswith('<') and self.callpat is None):
          raise SyntaxError('Unescaped $ in ' + filename)
        elif part.endswith('\n'): continue
        elif part == '$': code.append(s + self.emitpat % '"$"')
        elif part.startswith('{{'): code.append(self.__realign(part[2:-2], s))
        elif part.startswith('{'): code.append(s + self.emitpat % part[1:-1])
        elif part.startswith('<'): code.append(s + self.callpat % part[1:-1])
        else: code.append(s + self.emitpat % part)
    return '\n'.join(code)

def _templatefunction(func, listname, stringtype):
  globals, locals = sys.modules[func.__module__].__dict__, {}
  if '__file__' not in globals: filename = '<%s>' % func.__name__
  else: filename = '%s: <%s>' % (globals['__file__'], func.__name__)
  builder = _TemplateBuilder('%s.append(%%s)' % listname,
                             '%s.append(%s(%%s))' % (listname, stringtype))
  args = inspect.getargspec(func)
  code = [
    'def %s%s:' % (func.__name__, inspect.formatargspec(*args)),
    ' %s = []' % listname,
    builder.build(func.__doc__, filename, ' '),
    ' return "".join(%s)' % listname]
  code = compile('\n'.join(code), filename, 'exec')
  exec code in globals, locals
  return locals[func.__name__]

def stringfunction(func):
  """Function decorator for string template functions"""
  return _templatefunction(func, listname='out', stringtype='str')

def unicodefunction(func):
  """Function decorator for unicode template functions"""
  return _templatefunction(func, listname='out', stringtype='unicode')

An explanation of _templatefunction, which drives the code generation:

  1. First it grabs the global namespace within which the original function was defined. We need that because if you write code that accesses another symbol in that module (e.g., an imported module name, or another function in that module), we want to resolve it in that namespace. It also makes a temporary (empty) local namespace for the generated code.
  2. Then it generates a pseudo-filename to use when generating error messages and so on. This includes the original module's filename, if we can find it.
  3. Then we generate code. We begin by generating a "def" statement exactly like the original. The trick is to use inspect.getargspec(func) and inspect.formatargspec(*args) to grab and reproduce the signature for the function. The body of the code is generated by the _TemplateBuilder class. That class just rips apart templates using regular expressions, realigns spaces, and emits appropriate "out.append" code. In our generated function, we bracket this code with "out = []" and "return ''.join(out)".
  4. Finally, we use "exec" to define the generated function, and then we grab the generated function out of the temporary namespace to return it.
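The four steps above can be condensed into a small Python 3 sketch. This is an approximation under stated assumptions, not the templet module itself: it uses inspect.signature in place of getargspec/formatargspec, takes the namespace from func.__globals__ rather than sys.modules, and handles only $name and ${expr} directives. The stringfunction_sketch name is invented for this example.

```python
import inspect
import re

# Simplified directive pattern (assumption): only $name and ${expr}.
_DIRECTIVE = re.compile(r"\$([_a-z][_a-z0-9]*|\{[^}]*\})", re.IGNORECASE)

def stringfunction_sketch(func):
    # Step 1: the namespace the function was defined in, plus a scratch
    # local namespace that will receive the generated function.
    module_globals = func.__globals__
    scratch = {}
    # Step 2: a pseudo-filename that shows up in tracebacks.
    filename = "<%s>" % func.__name__
    # Step 3: reproduce the def line, then emit out.append(...) calls.
    code = ["def %s%s:" % (func.__name__, inspect.signature(func)),
            " out = []"]
    for i, part in enumerate(_DIRECTIVE.split(func.__doc__)):
        if i % 2 == 0:
            if part:
                code.append(" out.append(%r)" % part)
        else:
            expr = part[1:-1] if part.startswith("{") else part
            code.append(" out.append(str(%s))" % expr)
    code.append(" return ''.join(out)")
    # Step 4: exec the generated source and fetch the new function.
    exec(compile("\n".join(code), filename, "exec"), module_globals, scratch)
    return scratch[func.__name__]

@stringfunction_sketch
def poem(jumper, jumpee="moon"):
    "The $jumper jumped over the $jumpee."

print(poem("cow"))
```

Rebuilding the signature from its printed form only works when the default values have a repr that is valid Python source, which holds for the strings and numbers typical of template arguments.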

Is this Pretty or Ugly?

The @stringfunction decorator produces string functions that are somewhat reminiscent of PTL-style templates, allowing templates to be declared with normal python function syntax. Yet it does it in a much simpler way, without introducing a whole new python compiler, using just a single small module.

Part of the reason @stringfunction is so easy to implement is that it subverts Python's documentation strings for its own tricky ends. To me it feels like, for string template functions, a string that appears as the function body makes "more sense" as a template string than as a documentation string, and that the template string is a fine self-documenting description anyway. But maybe misusing documentation strings like this is distasteful.

For the Pythonistas out there: does this strike you as the right way to do string templates in python? It is certainly fast and simple. I think it is pretty readable, too.

Posted by David at March 11, 2007 01:37 PM
Comments

Your templatized example looks almost like Ruby:

def poem(jumper="cow", jumpee="moon")
  "The #{jumper} jumped over the #{jumpee}."
end

Posted by: Ed at March 14, 2007 12:21 AM

'%(key)s' % dict does most of this. You're going to a lot of trouble to reinvent the wheel, don't you think?

Python, the language in which everybody eventually creates their own bastardized version of php and calls themselves clever for it, sheesh...

Posted by: John James at March 14, 2007 12:53 AM

def poem(jumper="cow", jumpee="moon"):
  print "The %(jumper)s jumped over the %(jumpee)s." % locals()

One line shorter than yours, and nothing to import.

Posted by: Ryan at March 14, 2007 01:52 AM

Sure, you can use %, but there are reasons it falls short. A few more thoughts on "Why another template system?" are in the previous post:

http://davidbau.com/archives/2007/02/18/python_templates.html

Posted by: David at March 14, 2007 04:08 AM

John: How would you get this behavior using '%(key)s' % dict:

"${cgi.escape(info.user)}"

or...

"""${{
  for fav in info.favorites:
    out.append('<li>')
    out.append(buildLink(fav[0], fav[1]))
}}
"""

David's solution is very elegant and simple for not only doing dictionary interpolation (like when using %), but for doing real templating and embedding of Python code.

Posted by: Steven Kryskalla at March 14, 2007 03:19 PM

>>> data = {'won' : 'one', 'too' : 'two' }
>>> print "Python is simple. %(won)s and %(too)s. See?" % dict([(k, f(v)) for ((k, v), f) in zip(data.items(), [len, lambda x: x.strip()])])
Python is simple. 3 and two. See?
>>>

Posted by: Anonymous coward at March 16, 2007 12:17 AM

Yes Steven, embedding code in templates is a fantastic idea. It greatly improves both the readability and testability of your code - just look at the wonders PHP has done in that regard! Oh, wait...

I have trouble getting excited about little hacks like these, as elegant as they might at first appear. Build anything non-trivial using this type of construct and you're asking for trouble, but NIH is arguably Python's most pervasive problem, thanks in no small part to its expressive power. A blessing and a curse, that.

Posted by: John James at March 16, 2007 12:57 PM

At the risk of entering a religious debate, I'll bite. In your opinion, what is the best way to build HTML UI in Python?

I'd be the last to suggest putting _all_ your app code into a template. On the other hand, I do think humility is warranted. There is a lot of well-written PHP. And there are lessons to learn from the success of code-in-text in PHP, Ruby, JSP, and ASP. Sometimes the popular way is also the right way.

Not all code wants to be split from presentation. Some code really is part of the UI and belongs right in your templates.

Posted by: David at March 18, 2007 08:59 AM

I like to keep my .html and my .py separate, but I like this.

I am using mod_python, and I'll usually write a .html template with variables like e.g. %%LIST, and then pull something like:

foo = ''.join(str(x) for x in y)

open(template).read().replace('%%LIST',foo)

for whatever variables I've got, and then pass the resulting text to Apache to send out to the web.

I just want to throw out my crappy, most likely inefficient tactic in comparison to the guy's lambda function up there. I've been trying to figure that (idea, not particular example) out for weeks, and still don't really grasp it.

Personally, I always thought python's greatest strength was its simplicity and readability. I'd say you're pretty much winning. :)

A lot of times for me, reinventing the wheel on my own terms is kind of easier than figuring out the depths of a new, existing module. I'm lazy, I guess.

Posted by: Japhy Bartlett at September 14, 2007 01:23 AM

Personally, if I ever needed some serious string interpolation in the past I resorted to Ruby or Perl, even going to the extreme of running the Ruby interpreter from within Python code just for this thing!!! I don't really like the templating system in Django either and was kind of hoping that all this (i.e. the issue of variable interpolation) would have been addressed in Python 3k. It seems as though the purists out there aren't willing to yield to my pragmatic tendencies. In the meantime, this is a great, lightweight solution when I am coding in Python.

Posted by: Ric at December 18, 2007 02:42 PM

Thanks David,

That is a very nice module. I was writing a little generator with lots of small string interpolation. I switched it over to templet and the result is far more readable.

I considered using ZopePageTemplates or Zope DTML, but both of those are very XML orientated. The template would be unreadable. The templet result is very nice.

The generator is here, with the templates in the templates folder:
http://zope3tutorial.googlecode.com/svn/trunk/generator

Thanks

kevin

Posted by: Kevin at January 24, 2008 05:42 PM