Circular Triangle

Author

Matthew Wilkes is a freelance Python developer, mostly using Zope and Plone. He is based in Bristol, in south-west England, and has a list of alcoholic beverages that he enjoys which this margin is too narrow to contain.

DNA and the Police

March 1st, 2010

I’ve mentioned this a few times to various people, but I think the time has come to write up what’s going on.

The background information

On 21st February 2007 one of my parents’ neighbours was found in a cupboard with a ~2 foot barbecue skewer stabbed through his chest. He’d been there for three days and was close to death. He subsequently died in hospital and a murder investigation was started.

Over the course of a year they advertised for witnesses many times, offered rewards and interviewed all his neighbours many times. I wasn’t interviewed as I’d been living in Bristol at the time. Eventually, they phoned me up and asked if I’d mind helping them out. It was clear they were scraping the bottom of the barrel.

Two very amiable officers drove down to Bristol and we had a chat in my flat about Les. Just the normal kind of thing: did he antagonise anyone a lot, what was the atmosphere like in his house, and who did I know that went in there. I was very good friends with his grandson, so had been in there a fair bit, including the garage (which was not often cleaned).

The sample collecting

They asked to take my fingerprints and DNA samples so they could exclude me from anything they’d already collected. They asked me if I’d like my samples destroyed after the investigation or kept on the national database, but the way they did so was so funny it stuck with me:

Policeman 1 We can destroy your samples after the investigation, or we can keep them on the database so we can use them again in future.
Me What’s in it for me?
Policeman 1 That sounds like a “no”, doesn’t it?
Me Pretty much. What if I want to commit a serious crime in the future?
Policeman 2 Hang on, you said you’re a computer science student, right?
Me Yes…
Policeman 2 So why are you worrying? There’s no DNA evidence with computer crime!
Me Hmm, still think I’ll pass.

So, that nice bit of banter out of the way, they took my samples and left.

I was not a suspect, I was not detained, I was not cautioned. I invited these policemen into my home and voluntarily gave them a DNA and fingerprint sample to help in a murder investigation because they assured me that is all it would ever be used for.

The fallout

Another year later an inquest ruled that he had accidentally stabbed himself with the skewer, and the investigation was closed. At the end of 2009 I decided to get in contact with the West Midlands police to ask them if they’d been true to their word. Here are some excerpts of our two-month correspondence:

Firstly, on 11th December 2009:

Then, 26th February 2010:

With reference to previous correspondence relating to your request for the removal and destruction of your DNA sample and fingerprints. Unfortunately, the review of the circumstances by the Chief Officer is taking longer than expected. However, I can assure that once complete, you will be notified immediately.

and finally, 1st March 2010:

At this moment in time, all I am able to confirm is that your request is still with the Senior Investigating Officer as the sample you provided is still currently held. Unfortunately, I am unable to comment as to the promise you were given as I was not involved in the investigation.

Lovely, isn’t it?

So, if you’ve ever helped the police and they’ve told you that the information you provided wouldn’t be retained, I’d recommend contacting them.

Personally, I feel like I have been assaulted by two police officers who entered my home on false pretences. I believe the individual officers were acting in good faith and the force’s administration has let them, and me, down.

Update – 24th May 2010

I have received a letter from Suzette Davenport, Assistant Chief Constable for West Midlands police. The following is an extract:

I can confirm that neither your DNA sample nor fingerprints were uploaded to the national databases and will be destroyed in accordance with strict procedures. Your fingerprints will be shredded forthwith, and the 2 mouth swabs will be destroyed by way of incineration as per clinical waste procedures.

It took almost six months after I began chasing this in earnest, and it certainly shouldn’t have taken any chasing, but I’m glad it’s finally over. My colleagues seem to think it’s to do with the change of government expediting this kind of back-pedalling on civil liberties screw-ups, but I’m not so sure the police would be prioritising based on fear of upcoming investigations.

Why WSGI?

January 31st, 2010

Earlier today I tried to write an explanation of why WSGI technologies are useful to developers, but each attempt sounded too much like snake-oil. When you start to list the advantages you get from applying WSGI best practice to a problem the results sound fantastic, and frankly unbelievable, but they’re all true.

Caveat emptor: I haven’t properly tested the code in this post, it should all be treated as illustrative pseudo-code. Also, this is really long, you might want to get a cup of tea.

Separation of front- and back-end

There has been a lot of noise about Deliverance in the Plone community recently: many consultancy companies have deployed Deliverance-based sites and even plone.org has a Deliverance front-end. The most important advantage of this pattern is that it stops front-end developers being blocked by back-end considerations.

When developing a Plone site a designer can very rarely jump straight into modifying templates on day one. For one thing, the markup isn’t all in one place, as with the portal_tabs navigation bar. If custom markup is needed for these tabs, new viewlets and views first need to be created to override the templates being used, and a test instance set up to give the markup guys something to work with. Otherwise there’d be another significant amount of work to do in integrating it later in the project.

Either way, nobody can see these new tabs in place until there has been some backend developer time allocated to the problem.  When you consider the sheer number of places markup changes happen, this is clearly an untenable situation.

By using a transformational middleware like Deliverance, developers can start writing production markup on day one of a project. This can (and does!) happen before the backend developers have even finished talking about what technologies to use. This extra time and freedom truly is invaluable, as it means at no point will the look and feel of the site block development of the functionality or vice-versa.
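To make the idea concrete, here is a minimal sketch of a transformational middleware using only the standard library. It is not Deliverance and doesn't use its rule language; it just lifts whatever sits between two marker comments in the backend's response and drops it into a static theme. ThemeMiddleware, THEME, the marker comments and the stand-in backend are all names I've made up for illustration.

```python
import re
from wsgiref.util import setup_testing_defaults

# A static theme the designers can work on from day one (hypothetical).
THEME = ("<html><body><div id='nav'>Home | Blog</div>"
         "<div id='main'>{content}</div></body></html>")

# Hypothetical markers around the part of the backend page worth keeping.
CONTENT = re.compile(r"<!-- content -->(.*?)<!-- /content -->", re.DOTALL)


class ThemeMiddleware:
    """Drop the backend's content region into THEME."""

    def __init__(self, app, theme=THEME):
        self.app = app
        self.theme = theme

    def __call__(self, environ, start_response):
        captured = {}
        written = []

        def capture(status, headers, exc_info=None):
            # Hold status and headers back until the body has been rewritten.
            captured.update(status=status, headers=list(headers))
            return written.append

        app_iter = self.app(environ, capture)
        try:
            rest = b"".join(app_iter)
        finally:
            if hasattr(app_iter, "close"):
                app_iter.close()
        body = b"".join(written) + rest
        headers = captured["headers"]
        is_html = any(name.lower() == "content-type" and value.startswith("text/html")
                      for name, value in headers)
        # Assumes the backend serves UTF-8; a real theme layer would check.
        match = CONTENT.search(body.decode("utf-8")) if is_html else None
        if match:
            body = self.theme.replace("{content}", match.group(1)).encode("utf-8")
            headers = [(n, v) for n, v in headers if n.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]


def backend(environ, start_response):
    """Stand-in for the real application, with unstyled markup."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body><h1>raw</h1>"
            b"<!-- content --><p>Hello</p><!-- /content --></body></html>"]


environ = {}
setup_testing_defaults(environ)
themed = b"".join(ThemeMiddleware(backend)(environ, lambda *args: None))
print(themed.decode("utf-8"))
```

The backend here knows nothing about the theme, and the theme is a plain string that could just as well live in a file the designers own. Responses that aren't HTML, or that lack the markers, pass through untouched.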

Right from the start functionality and UI can go to human testers who help develop the automated test suite. As time progresses and the Deliverance theme starts incorporating more rules to integrate with the back-end, testing of the integration of the two can begin. Without this separation components of the UI can’t really be tested in isolation.

Reusable components

There are two very popular Plone packages providing CAPTCHA support, collective.captcha and collective.recaptcha.  To embed a CAPTCHA on your page you include:

<tal:captcha replace="here/@@captcha/image_tag" />
<input type="text" name="captcha" />

and in the code that verifies your form submission you extract the value from the request and pass it back to the captcha view:

captcha = context.REQUEST.form.get("captcha", None)
if not context.restrictedTraverse("@@captcha").verify(captcha):
    raise SpamBotException("CAPTCHA failed")

While there are convenience methods for integrating this with the various places forms are generated in Plone, this is really quite nasty.  While we have a nice, generic method for getting and verifying CAPTCHAs, it is very much tied to Plone and its various forms.

If we were to write this as a WSGI middleware we’d want to take the CAPTCHA generation and checking away from the application, so there are no import dependencies.  In essence, the CAPTCHA field is just a check if the user is human, the simplest one being a simple checkbox.  This isn’t going to filter out many spambots, but it does capture the essence of the problem, and is very simple to include in forms:

<input type="checkbox" name="isHuman" id="isHuman" />

This can be included in any forms readily, as any form library worth its salt can create a simple checkbox, then the backing code merely ensures that it’s a required field. Simples.

However, we’ve got all these nice CAPTCHA libraries, so we create a middleware that grabs this checkbox from outgoing responses and replaces it with the full CAPTCHA that we’d otherwise have pulled directly into the form. When a request comes back through with a CAPTCHA response in it we verify it’s correct and use that to set the value of isHuman. That would look a little like this:

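from urllib.parse import urlencode

import lxml.html
from webob import Request


class CAPTCHAMiddleware(object):

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        req = Request(environ)
        if "captcha" in req.POST:
            # req.params is read-only, so rebuild the form body instead.
            # Any isHuman sent alongside the CAPTCHA answer is dropped.
            fields = [(k, v) for k, v in req.POST.items()
                      if k not in ("captcha", "isHuman")]
            if self.verify(req.POST["captcha"]):
                # A ticked checkbox; an unticked one is simply absent
                fields.append(("isHuman", "on"))
            req.body = urlencode(fields).encode("utf-8")
        res = req.get_response(self.app)
        if res.content_type == "text/html" and res.body:
            doc = lxml.html.fromstring(res.body)
            captcha = doc.xpath("//input[@name='isHuman']")
            if captcha:
                # getCaptcha() returns the replacement markup as a string
                new = lxml.html.fromstring(self.getCaptcha())
                captcha[0].getparent().replace(captcha[0], new)
                res.body = lxml.html.tostring(doc)
        return res(environ, start_response)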

This method can then be tested to destruction to ensure that there are no ways of circumventing the CAPTCHA.  All the time the site still works with its naïve CAPTCHA in place.  Any updates to the CAPTCHA product can be done, tested and deployed separately to the main site.

Let’s do a feature comparison:

@@captcha CAPTCHAMiddleware
Can switch to other implementations
Works without customising a form
Works with any form library OOTB
No vendor lock-in

Seamless upgrades of legacy sites

So, there’s a distinct advantage shown above, but it’s not mind-blowing.  The second element still bothers me.  This would mean that although it’s easy to create new sites that use this CAPTCHA middleware it’s not easily backported.  While having this discussion with Alan Hoey of Team Rubber he suggested that he’d not use a new input method, but have a configuration file for which forms need CAPTCHAs added, and simply shield the backend from them.

At first, this sounded like a simple disagreement on implementation, until I realised that it could be elegantly implemented as a second middleware!

In this case, you’d not only have the CAPTCHA middleware I describe above, but a second middleware lower down in the stack which takes a configuration file and adds isHuman checkboxes to other forms.  If the isHuman value in the request is set to False for one of these forms it will simply raise an error.
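As a rough sketch of what that second middleware might look like, here is a standard-library-only version. The set of protected paths stands in for the configuration file, and where I said "raise an error" this version answers with a 403 rather than raising. FormShieldMiddleware, legacy_app and the request helper are made-up names, and it naively adds the checkbox to every form on a protected page.

```python
import io
from urllib.parse import parse_qsl, urlencode
from wsgiref.util import setup_testing_defaults

CHECKBOX = b'<input type="checkbox" name="isHuman" id="isHuman" />'


class FormShieldMiddleware:
    """Add isHuman to configured forms and reject POSTs that lack it."""

    def __init__(self, app, protected_paths):
        self.app = app
        self.protected = set(protected_paths)  # stand-in for a config file

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") not in self.protected:
            return self.app(environ, start_response)
        if environ["REQUEST_METHOD"] == "POST":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            raw = environ["wsgi.input"].read(length).decode("utf-8")
            fields = parse_qsl(raw, keep_blank_values=True)
            if "isHuman" not in dict(fields):
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"CAPTCHA failed"]
            # The legacy backend never hears about isHuman.
            body = urlencode([(k, v) for k, v in fields if k != "isHuman"])
            body = body.encode("utf-8")
            environ["wsgi.input"] = io.BytesIO(body)
            environ["CONTENT_LENGTH"] = str(len(body))
        return self._add_checkbox(environ, start_response)

    def _add_checkbox(self, environ, start_response):
        captured = {}
        written = []

        def capture(status, headers, exc_info=None):
            captured.update(status=status, headers=list(headers))
            return written.append

        app_iter = self.app(environ, capture)
        try:
            rest = b"".join(app_iter)
        finally:
            if hasattr(app_iter, "close"):
                app_iter.close()
        body = (b"".join(written) + rest).replace(b"</form>", CHECKBOX + b"</form>")
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers)
        return [body]


def legacy_app(environ, start_response):
    """Stand-in for an old site that knows nothing about CAPTCHAs."""
    start_response("200 OK", [("Content-Type", "text/html")])
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return [b"saved: " + environ["wsgi.input"].read(length)]
    return [b'<form method="post"><input name="comment" /></form>']


def request(app, method="GET", body=b""):
    """Call a WSGI app directly and return (status, body)."""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": "/comment",
               "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)}
    setup_testing_defaults(environ)
    status = []
    out = b"".join(app(environ, lambda s, h, exc_info=None: status.append(s)))
    return status[0], out


shielded = FormShieldMiddleware(legacy_app, ["/comment"])
page = request(shielded)
rejected = request(shielded, "POST", b"comment=spam")
accepted = request(shielded, "POST", b"comment=hi&isHuman=on")
```

Stacked underneath the CAPTCHA middleware, the checkbox this adds would be swapped for a real CAPTCHA on the way out, and isHuman would only arrive here once the answer had been verified.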

There is, however, good reason to keep with the original architecture instead of relying on searching for the isHuman checkbox, and that is that it allows applications to signal that they’re aware of how CAPTCHAs work and handle them elegantly. It also allows careful positioning of CAPTCHAs in forms, an earlier opportunity to test that they are shown everywhere they are needed, and easier handling of dynamic forms, such as comments.

This new middleware, however, would make integration into older sites easy, and the feature matrix would start to look like:

@@captcha CAPTCHAMiddleware Two Middlewares
Can switch to other implementations
Works without customising a form
Works with any form library OOTB
No vendor lock-in

We have a winner! By breaking the problem down into small sections we now have a very extensible system that could be applied to new and old sites with a minimum of effort.  We’re not tied to Plone, or even Zope, and both the backend application and the CAPTCHA system can be readily tested in isolation.

It’s this kind of modularisation that provides the great wins for WSGI. This originally came out of a discussion in #repoze in which it was suggested that the environ could hold a special environment variable and call a special function to add a CAPTCHA to a form. Such a system would fail all four of my tests above, and unfortunately many popular WSGI middlewares are written this way. Taking the extra time and fitting your problem to normal HTTP requests and responses makes for much more re-usable middlewares.

Testing

I’ve mentioned this a few times, but it’s worth really emphasising. Many WSGI middlewares will be barely a hundred lines long, and many will be much shorter. The tests for these can be orders of magnitude larger than the code itself. Many people claim that 100% test coverage just isn’t possible, but when writing this kind of function it’s not only a very good idea, it’s easy: you can write a very comprehensive set of tests in a short period of time.
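To give a feel for the ratio, here is a sketch with a deliberately tiny middleware and a handful of unittest cases for it. require_human and echo_app are invented for the example; the point is only that a middleware can be called as a plain function with a hand-built environ, so no server or browser is needed.

```python
import io
import unittest
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def require_human(app):
    """Tiny middleware: reject form posts that do not tick isHuman."""
    def middleware(environ, start_response):
        if environ["REQUEST_METHOD"] == "POST":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            raw = environ["wsgi.input"].read(length)
            environ["wsgi.input"] = io.BytesIO(raw)  # let the app re-read it
            if "isHuman" not in parse_qs(raw.decode("utf-8")):
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"CAPTCHA failed"]
        return app(environ, start_response)
    return middleware


def echo_app(environ, start_response):
    """Stand-in backend that always answers ok."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


class RequireHumanTests(unittest.TestCase):

    def call(self, method="GET", body=b""):
        environ = {"REQUEST_METHOD": method, "CONTENT_LENGTH": str(len(body)),
                   "wsgi.input": io.BytesIO(body)}
        setup_testing_defaults(environ)
        statuses = []
        out = b"".join(require_human(echo_app)(
            environ, lambda status, headers, exc_info=None: statuses.append(status)))
        return statuses[0], out

    def test_get_passes_through(self):
        self.assertEqual(self.call(), ("200 OK", b"ok"))

    def test_post_without_checkbox_is_rejected(self):
        self.assertEqual(self.call("POST", b"comment=spam")[0], "403 Forbidden")

    def test_post_with_checkbox_reaches_app(self):
        self.assertEqual(self.call("POST", b"comment=hi&isHuman=on"), ("200 OK", b"ok"))

    def test_empty_post_is_rejected(self):
        self.assertEqual(self.call("POST")[0], "403 Forbidden")

    def test_misspelt_field_is_rejected(self):
        self.assertEqual(self.call("POST", b"ishuman=on")[0], "403 Forbidden")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RequireHumanTests)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, "tests run, all passed:", result.wasSuccessful())
```

The middleware is about ten lines and the tests are already longer than it, which is roughly the shape I'd expect for real ones too.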

As the middleware can evolve on its own, separately from the application that spawned it, it can get new tests and releases as part of the development of other sites. This means that old applications see benefits of new client work faster and more often than monolithic applications.

Outside the process

Matt Hamilton’s excellent Lipstick on a Pig talk is a great example of this. WSGI middlewares can be layered onto HTTP proxies, allowing them to work on any backend, Python or not. wsapi4plone can work in the same way: it is completely agnostic of whether it is plumbed directly into Zope by a WSGI stack or proxied out to a different instance.

By following the WSGI best practice of depending on HTTP requests and responses rather than making Python calls between layers, this kind of flexibility is built directly into your system. This means a legacy site can easily have WSGI middlewares layered on top of it without even restarting the Zope process.
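Here is a minimal sketch of that arrangement: a WSGI application that does nothing but forward GET requests to another server over HTTP, with an ordinary middleware wrapped around it. make_proxy, add_footer and the legacy.example address are made up for the example, error responses and request bodies aren't handled, and the demo swaps in a fake upstream so that it runs without a network.

```python
import urllib.request
from wsgiref.util import setup_testing_defaults


def make_proxy(backend_url, urlopen=urllib.request.urlopen):
    """Return a WSGI app that forwards GET requests to backend_url."""
    def proxy(environ, start_response):
        url = backend_url.rstrip("/") + environ.get("PATH_INFO", "/")
        if environ.get("QUERY_STRING"):
            url += "?" + environ["QUERY_STRING"]
        with urlopen(url) as upstream:
            body = upstream.read()
            content_type = upstream.headers.get("Content-Type", "text/html")
        start_response("200 OK", [("Content-Type", content_type),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return proxy


def add_footer(app, footer=b"<footer>added by middleware</footer>"):
    """An ordinary middleware; it neither knows nor cares what app is."""
    def middleware(environ, start_response):
        captured = []
        body = b"".join(app(environ,
                            lambda s, h, exc_info=None: captured.append((s, h))))
        status, headers = captured[0]
        body += footer
        headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
        start_response(status, headers + [("Content-Length", str(len(body)))])
        return [body]
    return middleware


class FakeUpstream:
    """Stands in for a non-Python backend so the demo needs no network."""
    headers = {"Content-Type": "text/html"}

    def __init__(self, url):
        self.url = url

    def read(self):
        return b"<p>served by " + self.url.encode("utf-8") + b"</p>"

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False


app = add_footer(make_proxy("http://legacy.example:8080", urlopen=FakeUpstream))
environ = {"PATH_INFO": "/news"}
setup_testing_defaults(environ)
page = b"".join(app(environ, lambda status, headers, exc_info=None: None))
print(page.decode("utf-8"))
```

Because add_footer only ever sees a request and a response, it would behave the same whether app is the proxy above or an application running in the same process.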

Cool, huh?

Plone 4 faster than Wordpress

January 13th, 2010

“Plone is too slow” – you hear that a lot. Everyone that has used Plone has been frustrated by it at some point. When you want to start up an instance it can seem instant or it can seem like it takes an eternity. The same is true for rendering pages. More recently seeming instant has become more and more common.

That’s because work has gone into making Plone 4 faster. My friends at Netsight ran the Bristol Performance Sprint about a year ago, explicitly aiming to improve performance. Plone 4 runs on Python 2.6, has support for BLOBs, and includes various other changes that all improve performance.

But has it worked?

I have been setting up this site in my spare time and pretty soon it was obvious that I needed an HTTP cache, but it wasn’t the same realisation as usual: I wasn’t waiting for Plone when I realised. The slowest bit of the site was the blog. Now, I don’t claim to be a Wordpress expert, so it was an out-of-the-box installation. However, the Plone site is untuned too and that wasn’t as much of an issue. At the request of Alex Limi I did some quick benchmarks of the uncached site:

Wordpress: 1.7 requests per second
Plone: 2.7 requests per second

More is better ;)

I used apachebench to access the Plone and Wordpress sections of the site 150 times and measured how many requests each could serve. I repeated this both with my theming layer in place and with the standard skin. For both, the differences between the raw and themed versions were small enough to be discounted, but the difference between the two platforms is pronounced.

So, the next time somebody complains Plone is too slow, remember Plone 4 is just around the corner.

Python Meme

January 2nd, 2010

Tarek has come up with a quick questionnaire for Python developers, here are my answers:

1. What’s the coolest Python application, framework or library you have discovered in 2009 ?

mr.developer has completely replaced SVN externals for me. The main advantage is that you can turn development packages on and off on the command line. That makes it easy to write a buildout that is identical to a production buildout, but that allows a developer to switch out the bit he’s working on with the current SVN version. This means that each developer only has the code he’s working on checked out, not everything in the project. Making a quick change to another package goes from being practically free to being a major time investment (check out package, activate it, rebuildout, test), which means that in a multi-developer environment you get communication rather than hacks.

It’s also being used as part of the Plone 4 release process, and I have to say it’s a lot nicer to deal with than the email-the-release-manager system.  Anyone can tell at a glance what releases are needed to make the KGS and anyone can easily manipulate that list in the same way they develop, as it’s just a part of a buildout config.

David rocks. (Sorry, David wrote mr.igor, which is another one I considered)
fschulze rocks.

2. What new programming technique did you learn in 2009 ?

Test Driven Development. Ok, it’s a bit of a cop-out answer, as we all know how to do test driven development, but this year I got good enough at it to be able to use it whenever, including when rushing to meet deadlines. Once you have done enough testing in Python you forget why you found it difficult to begin with. Not only does it mean that I’m more confident with the code I’m writing, as the tests are written without any prior knowledge of how the code works, but it means it’s easier to collaborate. While I was at Team Rubber I often had to design new parts of applications. Once the tests, interfaces and docs are written they can be handed off to other people. Using this pattern lots of interfaces became available for people to work against without the code necessarily being there yet.

3. What’s the name of the open source project you contributed the most in 2009 ? What did you do ?

Plone. I had the honour of being on the Framework Team for Plone 4, which meant that I have felt closer to the release process for Plone 4 than I have for any other release. I’ve represented the Plone Foundation for Google Summer of Code and their high-school programme for about 2 years now, and have been a foundation member for a little less, but being part of in-depth discussions about the future of the project on a regular basis has certainly brought me closer than anything else. Can’t imagine ever feeling like a core dev with the likes of Hanno and Martin committing every waking minute, though ;)

4. What was the Python blog or website you read the most in 2009 ?

Does Planet Plone count? I really enjoy the mix there, although I wish there were more contributors; there are some very slow days. If not, that’d probably be Chris McDonough’s plope: he posts a great mix of humour, valuable insight and blasphemy against all that we hold dear.

5. What are the three top things you want to learn in 2010 ?

  • How to survive in Germany.  I’m going to be moving here in 2010, and I’d like to be able to get creature comforts, such as real bread.  I’m already planning what to do on Guy Fawkes’ Night.
  • repoze.bfg – I want my next light-weight app to be in BFG, it looks shiny.
  • Some nice new pieces of mathematics, I can feel myself getting rusty in an arts faculty.  Perhaps getting really familiar with group theory.

Back to the blogosphere

December 18th, 2009

Well, it’s only been just over 100 days since I left Team Rubber (and almost half as long again since I last wrote a blog post), so it’s high time I got a new website set up.

So I have.

In the interests of making this at least vaguely technical, I’ve decided to go with a mixture of Plone 4 and Wordpress, fronted by Deliverance. Plone 4 being entirely distributed as eggs has made setting it up in a WSGI environment even easier, although there are still a few pain points to do with Deliverance and the whole setup. In particular, both Wordpress and Plone make it difficult to change the markup used in their portlets, something for which Deliverance isn’t well suited.