Earlier today I tried to write an explanation of why WSGI technologies are useful to developers, but each attempt sounded too much like snake-oil. When you start to list what advantages you get from applying WSGI best practices to a problem, the results sound fantastic, and frankly unbelievable, but they’re all true.
Caveat emptor: I haven’t properly tested the code in this post, so it should all be treated as illustrative pseudo-code. Also, this is really long; you might want to get a cup of tea.
Separation of front- and back-end
There has been a lot of noise about Deliverance in the Plone community recently. Many consultancy companies have deployed Deliverance-based sites and even plone.org has a Deliverance front-end. The most important advantage of this pattern is that it stops front-end developers being blocked by back-end considerations.
When developing a Plone site, a designer can very rarely jump straight into modifying templates on day one. For one thing, the markup isn’t all in one place, as with the portal_tabs navigation bar. If custom markup is needed for these tabs, new viewlets and views first have to be created to override the templates in use, and a test instance set up to give the markup guys something to work with. Otherwise there’d be a significant amount of work to do integrating the markup later in the project.
Either way, nobody can see these new tabs in place until there has been some backend developer time allocated to the problem. When you consider the sheer number of places markup changes happen, this is clearly an untenable situation.
By using a transformational middleware like Deliverance, developers can start writing production markup on day one of a project. This can (and does!) happen before the backend developers have even finished talking about what technologies to use. This extra time and freedom truly is invaluable, as it means at no point will the look and feel of the site block development of the functionality or vice-versa.
Right from the start, functionality and UI can go to human testers who help develop the automated test suite. As time progresses and the Deliverance theme starts incorporating more rules to integrate with the back-end, testing of the integration of the two can begin. Without this separation, components of the UI can’t really be tested in isolation.
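The idea of a transformational middleware can be sketched in a few lines of standard-library Python. This is only an illustration of the pattern and not how Deliverance itself works (Deliverance is driven by a rules file). The `ThemeMiddleware` class, the `THEME` markup, the `<!-- content -->` placeholder and the `backend` and `get` helpers are all made up for this example, and it assumes the backend returns a complete HTML page with a single `<body>` element.

```python
import re
from wsgiref.util import setup_testing_defaults

# Hypothetical static theme, written by the front-end developers on day one.
THEME = (b'<html><head><title>My site</title></head><body>'
         b'<div id="nav">Home | News</div>'
         b'<div id="main"><!-- content --></div></body></html>')

BODY_RE = re.compile(br"<body[^>]*>(.*)</body>", re.DOTALL | re.IGNORECASE)


class ThemeMiddleware(object):
    """Drop the contents of the backend's <body> into a static theme."""

    def __init__(self, app, theme=THEME):
        self.app = app
        self.theme = theme

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold back the status and headers until the body is rewritten.
            captured["status"], captured["headers"] = status, list(headers)
            return lambda data: None  # legacy write() is ignored in this sketch

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        content_type = dict((k.lower(), v) for k, v in headers).get("content-type", "")
        match = BODY_RE.search(body)
        if content_type.startswith("text/html") and match:
            body = self.theme.replace(b"<!-- content -->", match.group(1))
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]


def backend(environ, start_response):
    """Stand-in for the real application, with deliberately plain markup."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body><p>Hello from the backend</p></body></html>"]


def get(app, path="/"):
    """Call a WSGI app directly and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body
```

Calling `get(ThemeMiddleware(backend))` returns the themed page, while `get(backend)` still returns the plain one, so the theme and the backend can each be worked on and tested without the other.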
Reusable components
There are two very popular Plone packages providing CAPTCHA support, collective.captcha and collective.recaptcha. To embed a CAPTCHA on your page you include:
    <tal:captcha replace="here/@@captcha/image_tag" />
    <input type="text" name="captcha" />
and in the code that verifies your form submission you extract the value from the request and pass it back to the captcha view:
    captcha = context.REQUEST.form.get("captcha", None)
    if not context.restrictedTraverse("@@captcha").verify(captcha):
        raise SpamBotException("CAPTCHA failed")
While there are convenience methods for integrating this with the various places forms are generated in Plone, this is really quite nasty. While we have a nice, generic method for getting and verifying CAPTCHAs, it is very much tied to Plone and its various forms.
If we were to write this as a WSGI middleware we’d want to take the CAPTCHA generation and checking away from the application, so there are no import dependencies. In essence, the CAPTCHA field is just a check if the user is human, the simplest one being a simple checkbox. This isn’t going to filter out many spambots, but it does capture the essence of the problem, and is very simple to include in forms:
    <input type="checkbox" name="isHuman" id="isHuman" />
This can be included in any forms readily, as any form library worth its salt can create a simple checkbox, then the backing code merely ensures that it’s a required field. Simples.
However, we’ve got all these nice CAPTCHA libraries, so we create a middleware that grabs this checkbox from outgoing responses and replaces it with the full CAPTCHA that we’d otherwise have pulled directly into the form. When a request comes back through with a CAPTCHA answer in it, we verify it’s correct and use that to set the value of isHuman. That would look a little like this:
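    from urllib.parse import urlencode

    import lxml.html
    from webob import Request

    FORM = "application/x-www-form-urlencoded"

    class CAPTCHAMiddleware(object):
        # verify() and getCaptcha() are left to the CAPTCHA library in use.
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            req = Request(environ)
            # Inbound: trade the CAPTCHA answer for a verified isHuman field.
            # req.params is read-only, so rebuild the (urlencoded) form body,
            # dropping any isHuman value the client supplied itself.
            if req.method == "POST" and req.content_type == FORM:
                fields = [(k, v) for k, v in req.POST.items()
                          if k not in ("captcha", "isHuman")]
                if self.verify(req.POST.get("captcha")):
                    fields.append(("isHuman", "on"))
                req.body = urlencode(fields).encode("utf-8")
            res = req.get_response(self.app)
            # Outbound: swap the plain checkbox for the full CAPTCHA markup.
            if res.content_type == "text/html" and res.body:
                doc = lxml.html.fromstring(res.body)
                for field in doc.xpath("//input[@name='isHuman']"):
                    new = lxml.html.fromstring(self.getCaptcha())
                    field.getparent().replace(field, new)
                res.body = lxml.html.tostring(doc)
            return res(environ, start_response)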
This method can then be tested to destruction to ensure that there are no ways of circumventing the CAPTCHA. All the time the site still works with its naïve CAPTCHA in place. Any updates to the CAPTCHA product can be done, tested and deployed separately to the main site.
Let’s do a feature comparison:
| | @@captcha | CAPTCHAMiddleware |
|---|---|---|
| Can switch to other implementations | ✔ | ✔ |
| Works without customising a form | ✘ | ✘ |
| Works with any form library OOTB | ✘ | ✔ |
| No vendor lock-in | ✘ | ✔ |
Seamless upgrades of legacy sites
So, there’s a distinct advantage shown above, but it’s not mind-blowing. The second row still bothers me: although it’s easy to create new sites that use this CAPTCHA middleware, it’s not easily backported. While I was discussing this with Alan Hoey of Team Rubber, he suggested that rather than using a new input method he’d have a configuration file listing which forms need CAPTCHAs added, and simply shield the backend from them.
At first, this sounded like a simple disagreement on implementation, until I realised that it could be elegantly implemented as a second middleware!
In this case, you’d not only have the CAPTCHA middleware I describe above, but a second middleware lower down in the stack which takes a configuration file and adds isHuman checkboxes to other forms. If the isHuman value in the request is set to False for one of these forms it will simply raise an error.
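A rough sketch of what that second middleware might look like, using only the standard library. Everything here is an assumption made for illustration: the `RequireHumanMiddleware` name, passing the configuration in as a list of form action paths rather than reading a file, answering with a 403 instead of raising, and the `legacy_app` and `call` helpers. It also assumes urlencoded form bodies and that the CAPTCHA middleware sits above it in the stack, replacing the checkbox on the way out and setting isHuman on the way in.

```python
import io
import re
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

CHECKBOX = b'<input type="checkbox" name="isHuman" id="isHuman" />'
FORM_RE = re.compile(br'(<form\b[^>]*\baction="([^"]*)"[^>]*>)(.*?)(</form>)',
                     re.DOTALL | re.IGNORECASE)


class RequireHumanMiddleware(object):
    """Add isHuman checkboxes to configured forms and enforce them on POST."""

    def __init__(self, app, protected_actions):
        self.app = app
        # Hypothetical configuration: the form action paths that need a CAPTCHA.
        self.protected = set(protected_actions)

    def __call__(self, environ, start_response):
        if (environ["REQUEST_METHOD"] == "POST"
                and environ.get("PATH_INFO") in self.protected):
            length = int(environ.get("CONTENT_LENGTH") or 0)
            raw = environ["wsgi.input"].read(length)
            environ["wsgi.input"] = io.BytesIO(raw)  # let the backend re-read it
            if parse_qs(raw.decode("utf-8")).get("isHuman") != ["on"]:
                return self.forbidden(start_response)

        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, list(headers)
            return lambda data: None  # legacy write() is ignored in this sketch

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        content_type = dict((k.lower(), v) for k, v in headers).get("content-type", "")
        if content_type.startswith("text/html"):
            body = FORM_RE.sub(self.add_checkbox, body)
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]

    def add_checkbox(self, match):
        opening, action, inner, closing = match.groups()
        if action.decode("utf-8") not in self.protected or b'name="isHuman"' in inner:
            return match.group(0)  # not configured, or the app already has one
        return opening + inner + CHECKBOX + closing

    def forbidden(self, start_response):
        body = b"CAPTCHA failed"
        start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                         ("Content-Length", str(len(body)))])
        return [body]


def legacy_app(environ, start_response):
    """Stand-in for an old site that knows nothing about CAPTCHAs."""
    start_response("200 OK", [("Content-Type", "text/html")])
    if environ["REQUEST_METHOD"] == "POST":
        return [b"<p>Comment saved</p>"]
    return [b'<form action="/comment" method="post">'
            b'<textarea name="text"></textarea></form>']


def call(app, method="GET", path="/", body=b""):
    """Call a WSGI app directly and return (status, body)."""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)}
    setup_testing_defaults(environ)
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status

    out = b"".join(app(environ, start_response))
    return seen["status"], out
```

Wrapping the old site as `RequireHumanMiddleware(legacy_app, ["/comment"])` adds the checkbox to the comment form and turns away any POST to `/comment` that arrives without `isHuman=on`, all without touching the legacy code.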
There is, however, good reason to stick with the original architecture instead of relying on searching for the isHuman checkbox: it allows applications to signal that they’re aware of how CAPTCHAs work and handle them elegantly. It also allows careful positioning of CAPTCHAs in forms, gives an earlier opportunity to test that they are shown everywhere they are needed, and makes dynamic forms, such as comments, easier to handle.
This new middleware, however, would make integration into older sites easy, and the feature matrix would start to look like:
| | @@captcha | CAPTCHAMiddleware | Two Middlewares |
|---|---|---|---|
| Can switch to other implementations | ✔ | ✔ | ✔ |
| Works without customising a form | ✘ | ✘ | ✔ |
| Works with any form library OOTB | ✘ | ✔ | ✔ |
| No vendor lock-in | ✘ | ✔ | ✔ |
We have a winner! By breaking the problem down into small sections we now have a very extensible system that could be applied to new and old sites with a minimum of effort. We’re not tied to Plone, or even Zope, and both the backend application and the CAPTCHA system can be readily tested in isolation.
It’s this kind of modularisation that provides the great wins for WSGI. This originally came out of discussion in #repoze in which it was suggested that the environ could hold a special environment variable and call a special function to add a CAPTCHA to a form. Such a system would fail all 4 of my tests above, and unfortunately many popular WSGI middlewares are written this way. Taking the extra time and fitting your problem to normal HTTP requests and responses makes for much more re-usable middlewares.
Testing
I’ve mentioned this a few times, but it’s worth really emphasising. Many WSGI middlewares will be barely a hundred lines long, and many will be much shorter. The tests for these can be orders of magnitude larger than the code itself. Many people claim that 100% test coverage just isn’t possible, but for this kind of function it is not only a very good idea, it is also easy: you can write a very comprehensive set of tests in a short period of time.
As the middleware can evolve on its own, separately from the application that spawned it, it can get new tests and releases as part of the development of other sites. This means that old applications see benefits of new client work faster and more often than monolithic applications.
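To give a feel for how little scaffolding such tests need, here is a minimal sketch using `unittest` and a stub WSGI application. The `InboundCaptcha` middleware is a deliberately cut-down stand-in that works on the query string rather than a form body, and it, `StubApp`, `run_tests` and the magic answer `xyzzy` are all invented for this example.

```python
import unittest
from urllib.parse import parse_qsl, urlencode
from wsgiref.util import setup_testing_defaults


class InboundCaptcha(object):
    """Stand-in middleware: trade a CAPTCHA answer for a trusted isHuman flag."""

    def __init__(self, app, verify):
        self.app = app
        self.verify = verify  # callable taking the answer, returning a bool

    def __call__(self, environ, start_response):
        fields = parse_qsl(environ.get("QUERY_STRING", ""))
        answer = dict(fields).get("captcha")
        # Drop anything the client claims about isHuman.
        fields = [(k, v) for k, v in fields if k not in ("captcha", "isHuman")]
        if answer is not None and self.verify(answer):
            fields.append(("isHuman", "on"))
        environ["QUERY_STRING"] = urlencode(fields)
        return self.app(environ, start_response)


class StubApp(object):
    """A fake backend that just records the query string it was given."""

    def __call__(self, environ, start_response):
        self.query = dict(parse_qsl(environ["QUERY_STRING"]))
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]


class InboundCaptchaTests(unittest.TestCase):
    def request(self, query):
        stub = StubApp()
        app = InboundCaptcha(stub, verify=lambda answer: answer == "xyzzy")
        environ = {"QUERY_STRING": query}
        setup_testing_defaults(environ)
        body = b"".join(app(environ, lambda status, headers: None))
        self.assertEqual(body, b"ok")
        return stub.query

    def test_correct_answer_sets_flag(self):
        self.assertEqual(self.request("captcha=xyzzy&name=bob"),
                         {"name": "bob", "isHuman": "on"})

    def test_wrong_answer_leaves_flag_unset(self):
        self.assertEqual(self.request("captcha=nope&name=bob"), {"name": "bob"})

    def test_forged_flag_is_stripped(self):
        self.assertEqual(self.request("isHuman=on&name=bob"), {"name": "bob"})


def run_tests():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(InboundCaptchaTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

No Zope, no Plone and no browser are involved: the middleware only ever sees an environ and a callable, so each way of trying to sneak past the CAPTCHA becomes one more three-line test.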
Outside the process
Matt Hamilton’s excellent Lipstick on a Pig talk is a great example of this. WSGI middlewares can be layered onto HTTP proxies, allowing them to work on any backend, Python or not. wsapi4plone can work in the same way: it is completely agnostic about whether it is plumbed directly into Zope by a WSGI stack or proxied out to a different instance.
By following the WSGI best practice of depending on HTTP requests and responses rather than making Python calls between layers, this kind of flexibility is built directly into your system. This means a legacy site can easily have WSGI middlewares layered on top of it without even restarting the Zope process.
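As a rough illustration of the proxy arrangement, the sketch below is a tiny WSGI application that forwards each request to another server over HTTP, so that any middleware can be wrapped around it. It is not taken from any of the projects mentioned: `ProxyApp`, `http_fetch`, `tag_responses`, the `X-Via` header and the `get` helper are names I have made up, only the path, query string, body and content type are forwarded, and a real deployment would want a proper proxy (streaming, cookies, redirects, timeouts and so on).

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen
from wsgiref.util import setup_testing_defaults

# Headers that describe a single connection and must not be forwarded.
HOP_BY_HOP = {"connection", "keep-alive", "proxy-authenticate",
              "proxy-authorization", "te", "trailers", "transfer-encoding",
              "upgrade"}


def http_fetch(url, method, body, headers):
    """Default transport: one plain HTTP request using urllib."""
    request = Request(url, data=body or None, headers=headers, method=method)
    try:
        with urlopen(request) as resp:
            return resp.status, resp.reason, resp.headers.items(), resp.read()
    except HTTPError as err:
        # 4xx and 5xx answers are still responses worth passing on.
        return err.code, err.reason, err.headers.items(), err.read()


class ProxyApp(object):
    """A WSGI app whose 'backend' is whatever answers at backend_url."""

    def __init__(self, backend_url, fetch=http_fetch):
        self.backend_url = backend_url.rstrip("/")
        self.fetch = fetch  # swappable, so the proxy can be tested offline

    def __call__(self, environ, start_response):
        # PATH_INFO arrives latin-1 decoded; re-quote the raw bytes.
        path = environ.get("PATH_INFO", "/").encode("latin-1")
        url = self.backend_url + quote(path)
        if environ.get("QUERY_STRING"):
            url += "?" + environ["QUERY_STRING"]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length)
        headers = {}
        if environ.get("CONTENT_TYPE"):
            headers["Content-Type"] = environ["CONTENT_TYPE"]
        status, reason, resp_headers, data = self.fetch(
            url, environ["REQUEST_METHOD"], body, headers)
        kept = [(k, v) for k, v in resp_headers
                if k.lower() not in HOP_BY_HOP and k.lower() != "content-length"]
        kept.append(("Content-Length", str(len(data))))
        start_response("%d %s" % (status, reason), kept)
        return [data]


def tag_responses(app):
    """A trivial middleware that neither knows nor cares what the backend is."""
    def middleware(environ, start_response):
        def tagged(status, headers, exc_info=None):
            return start_response(status, headers + [("X-Via", "wsgi")], exc_info)
        return app(environ, tagged)
    return middleware


def get(app, path="/"):
    """Call a WSGI app directly and return (status, headers, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, headers

    body = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], body
```

Something like `tag_responses(ProxyApp("http://localhost:8080"))` could then be served by any WSGI server in its own process (the address is just a placeholder), with the site behind it left running untouched.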
Cool, huh?