XML configuration vs wiring
Lessons from Java and Zope
History has a tendency to repeat itself, particularly in computing. In striving to invent the perfect framework, developers end up performing a number of balancing acts: control vs. convenience, size vs. speed, maintainability vs. ease of getting started. As frameworks grow and the teams that build them become more experienced, they tend to shift where they sit along these and other axes.
XML is a technology that has been at the centre of many such balancing acts over time. Let's consider its role in a few contexts:
- The Java Enterprise stack. That's enterprise with a big E. I'm talking about things like EJBs and inversion-of-control containers like Spring here.
- Zope, and in particular the Zope Component Architecture, which is the alpha and omega of Zope 3.
- Grok, the no-longer-quite-so-new kid on the block in the Zope world. Grok is very anti-XML (or rather, anti-ZCML; ZCML is the XML-based configuration language for the Zope Component Architecture).
Separation of concerns
Few concepts have had such sway over the way people program as the mantra of "separation of concerns". Whether you're talking objects, aspects or procedures, a thing should do one thing only, and leave it to another thing to do other things.
When I was learning object-oriented programming, and more generally got to understand how I would work with frameworks (as opposed to writing command line programs with a main() function), I was for a while confused by one question:
- "Where do objects come from?"
Or rather, where do factories, singletons and other frameworky things come from? These things are usually configured in some way. For example:
- When using Spring or another Inversion of Control container, we ask a factory to get hold of an object. That object will be inseminated... I mean, injected... with other components that fulfill some interface contract, usually acting as "services". There is a registry of services that says which implementation is to be used for each interface at the moment, how objects are re-used and so on.
- In the Zope Component Architecture, we may adapt an object from one interface to another or look up a utility (a context-less component) by its interface and possibly a name. In both cases, the Component Architecture looks up the correct implementation to use in a registry. My application code is blissfully unaware of the implementation that's used.
- When using a web MVC framework, a URL may be resolved to some View that is rendering some Model. The framework usually looks up the view in some kind of registry.
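To make "the registry" a little more concrete, here is a toy sketch in plain Python. It is not the Zope Component Architecture or Spring API: Registry, IMailer and SMTPMailer are hypothetical names, and abstract base classes stand in for interfaces. It only shows the property the three examples above share: application code asks for an interface, and the registry decides which implementation it gets.

```python
from abc import ABC, abstractmethod


class IMailer(ABC):
    """Hypothetical interface: anything that can deliver a message."""

    @abstractmethod
    def send(self, to: str, body: str) -> str:
        ...


class SMTPMailer(IMailer):
    """Hypothetical implementation; it only describes what it would send."""

    def send(self, to: str, body: str) -> str:
        return f"SMTP to {to}: {body}"


class Registry:
    """Toy registry mapping (interface, name) pairs to components."""

    def __init__(self):
        self._utilities = {}

    def register_utility(self, component, provides, name=""):
        # Refuse components that don't honour the interface contract.
        if not isinstance(component, provides):
            raise TypeError(f"{component!r} does not provide {provides.__name__}")
        self._utilities[(provides, name)] = component

    def get_utility(self, provides, name=""):
        try:
            return self._utilities[(provides, name)]
        except KeyError:
            raise LookupError(
                f"No {provides.__name__} utility named {name!r}"
            ) from None


registry = Registry()
registry.register_utility(SMTPMailer(), IMailer)


def notify(user: str) -> str:
    # Application code asks for the interface, never for SMTPMailer directly.
    return registry.get_utility(IMailer).send(user, "Your report is ready")


print(notify("alice"))  # SMTP to alice: Your report is ready
```

Replacing SMTPMailer with some other mailer is then a single register_utility call, and notify() does not change.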
This is where the XML comes in. XML is great because it's very easy for computers to parse and pretty easy for humans to understand. Since we are all breast-fed HTML these days, it looks pretty familiar and we don't ask too many questions about why we have to close tags with slashes. As such, it is pretty natural to implement the configuration of "the registry" as one or more XML files. Spring and Zope both use XML to configure their registries.
Separating out this wiring into separate files has many benefits. It is easier to change things, for example by exchanging the file for a different one. We don't get tempted to do overly clever things that break separation of concerns and re-use, as we might be if configuration were done in code. Configuration files can also be read by tools that help us visualise and manipulate the configuration data. In general, I'm all for dumb configuration languages.
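A minimal sketch of the "dumb configuration file" idea, using only the standard library's xml.etree.ElementTree. The wiring/component vocabulary is invented for illustration and is neither ZCML nor Spring's bean syntax, and factory names are looked up in a namespace dictionary rather than imported from dotted paths as a real framework would do. InMemoryStore and SqlStore are hypothetical classes.

```python
import xml.etree.ElementTree as ET


class InMemoryStore:
    """Hypothetical development implementation of an IStore service."""
    kind = "memory"


class SqlStore:
    """Hypothetical production implementation of an IStore service."""
    kind = "sql"


DEV_WIRING = """
<wiring>
  <component provides="IStore" factory="InMemoryStore" />
</wiring>
"""

PROD_WIRING = """
<wiring>
  <component provides="IStore" factory="SqlStore" />
</wiring>
"""


def load_wiring(xml_text, namespace):
    """Build an {interface name: instance} registry from an XML wiring document.

    Factory names are resolved in `namespace` (e.g. globals()); a real
    framework would import dotted paths such as ".sizes.SizeOfFoo" instead.
    """
    registry = {}
    root = ET.fromstring(xml_text.strip())
    for element in root.iter("component"):
        factory = namespace[element.get("factory")]
        registry[element.get("provides")] = factory()
    return registry


# Same code, different file: swapping the wiring swaps the implementation.
dev = load_wiring(DEV_WIRING, globals())
prod = load_wiring(PROD_WIRING, globals())
print(dev["IStore"].kind)   # memory
print(prod["IStore"].kind)  # sql
```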
Flexibility vs everything else
The first problem with this kind of strict separation is that it gets confusing. A developer has to look in at least two places to figure out what's going on. Wonder why that view is acting funny? Step 1: look up the name in the XML file and find the corresponding implementation. Step 2: find the implementation. Sometimes tools alleviate this, but usually only partially. I'm a firm believer that if your framework isn't usable without GUI tools, you need a different framework.
The second problem is that very often, there's a large amount of YAGNI - You Ain't Gonna Need It. People don't tend to replace components deep in their system very often (or ever), unless they're writing frameworks. So why optimise for that use case?
Note that I said optimise. Separation of concerns is still hugely important. It makes your code better. It makes it easier to test and mock. Asking yourself "could I swap this out without breaking everything else?" is a very good test of whether your code is properly isolated. And perhaps most importantly, knowing that you can swap things out in a standard, well-defined way means that your code is at least more likely to be re-usable. I'm all for component registries, so long as they make it easier to solve the immediate problem.
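The "could I swap this out?" test is easy to see in a small sketch. Greeter, Clock, SystemClock and FixedClock are hypothetical names. Greeter receives its clock through its constructor (plain dependency injection, with no registry or XML involved), so a test can hand it a fake without touching any production wiring.

```python
import datetime
from typing import Protocol


class Clock(Protocol):
    """Hypothetical service contract: something that tells the hour."""

    def now_hour(self) -> int: ...


class SystemClock:
    """Production implementation backed by the real clock."""

    def now_hour(self) -> int:
        return datetime.datetime.now().hour


class Greeter:
    """Depends only on the Clock contract, which is injected."""

    def __init__(self, clock: Clock):
        self.clock = clock

    def greet(self, name: str) -> str:
        period = "morning" if self.clock.now_hour() < 12 else "afternoon"
        return f"Good {period}, {name}"


class FixedClock:
    """Test double: satisfies Clock structurally and always returns one hour."""

    def __init__(self, hour: int):
        self.hour = hour

    def now_hour(self) -> int:
        return self.hour


# Swapping the implementation needs no change to Greeter.
print(Greeter(FixedClock(9)).greet("Ann"))   # Good morning, Ann
print(Greeter(FixedClock(15)).greet("Ann"))  # Good afternoon, Ann
```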
The configuration usability curve
On that test, I think that both Spring and the Zope Component Architecture do very well (the EJB standard perhaps does less well). However, that's sometimes in spite of, not because of, their strict separation of an application's wiring into separate files.
Let's imagine a curve (I'm too lazy to draw it). On the X axis we have project complexity. To the left is a simple application you knock up in a day. To the right is a multi-team project that runs over several months. On the Y axis we have developer usability - how easy is it to get something done?
For this type of framework, the curve normally starts pretty high. A well designed framework should be easy to get started with. If you write some code in one file and some XML in another, you still only have two files and you've got pretty good control over where that code is and what it does.
As your project grows, though, the curve drops off. Suddenly, you have two very long files. Or one configuration file and ten code files. You start spending your time trying to keep the two pieces organised and separate. You don't always know where to look for stuff. You invent your own conventions, but you can't quite keep them consistent. Trying to organise your code and make sure it's maintainable and testable is hard enough. Having to organise two things in parallel is no fun.
Things actually pick up again a little when you've got a huge project. Huge projects require lots of process. Demanding strict separation of code and wiring can help enforce certain good patterns. However, you should expect to spend a lot of time educating your developers about what your project's conventions are and how to follow them.
Configuration in code
To address these issues, framework authors started to create optional ways to write the configuration in code. Normally, this uses some kind of code annotation. Java has a pretty flexible annotation system that can annotate classes, methods and instance variables. Things are not quite so well formalised in the Python world, but Grok has settled on a "syntax" that is quite easy to understand.
For example, in "plain Zope 3" you may write:
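```python
from zope.interface import implements

from interfaces import ISize  # assumed: ISize lives in the package's interfaces module


class SizeOfFoo(object):
    implements(ISize)

    def __init__(self, context):
        # The IFoo object being adapted
        self.context = context

    def __len__(self):
        return len(self.context.data)
```
This component, which would be used to adapt an IFoo object to the ISize interface, could be registered in ZCML thus:
```xml
<adapter for=".interfaces.IFoo" factory=".sizes.SizeOfFoo" />
```
With Grok, you'd combine these two things into one:
```python
import grok

from interfaces import IFoo, ISize  # assumed: both interfaces live in the package's interfaces module


class SizeOfFoo(grok.Adapter):
    grok.implements(ISize)  # the interface this adapter provides
    grok.context(IFoo)      # the interface it adapts from

    # grok.Adapter's constructor stores the adapted object as self.context
    def __len__(self):
        return len(self.context.data)
```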
A special "grokker" will look through the code and find these registrations. In this case, it will look at the base class to determine that it should register an adapter, and look at the "directives" at class level to determine what the adapter is adapting from and what interface it provides.
The chief advantage of this approach is that I now only have one file, and my configuration follows my code. I haven't broken separation of concerns, because I can still override the adapter registration with another one (perhaps in a separate configuration file). I can still avoid running the configuration machinery (the grokker) to get proper isolation in tests. But we've optimised the conventions for immediate developer usability, reducing boilerplate and keeping things obvious.
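A toy version of the grokking step and the two escape hatches just described might look like the following. Real Grok directives such as grok.context() are implemented quite differently (in the martian library); here they are simplified to plain class attributes, with the hypothetical names adapts and provides (adapts rather than context, so it doesn't clash with the instance's self.context). Marker classes stand in for zope.interface interfaces, and Adapter, grok_module, register_adapter and query_adapter are all hypothetical.

```python
import types


class IFoo:
    """Marker class standing in for an IFoo interface."""


class ISize:
    """Marker class standing in for an ISize interface."""


class Adapter:
    """Base class the toy grokker looks for."""

    adapts = None    # class-level "directive": interface adapted from
    provides = None  # class-level "directive": interface provided

    def __init__(self, context):
        self.context = context


ADAPTERS = {}  # (adapts, provides) -> adapter factory


def register_adapter(factory, adapts, provides):
    """Explicit, long-winded registration; a later one replaces an earlier one."""
    ADAPTERS[(adapts, provides)] = factory


def grok_module(module):
    """Register every Adapter subclass found in `module`, using its directives."""
    for obj in vars(module).values():
        if isinstance(obj, type) and issubclass(obj, Adapter) and obj is not Adapter:
            register_adapter(obj, obj.adapts, obj.provides)


def query_adapter(obj, provides):
    """Find an adapter for obj by walking its class hierarchy."""
    for cls in type(obj).__mro__:
        factory = ADAPTERS.get((cls, provides))
        if factory is not None:
            return factory(obj)
    raise LookupError(f"cannot adapt {obj!r} to {provides.__name__}")


class Foo(IFoo):
    def __init__(self, data):
        self.data = data


class SizeOfFoo(Adapter):
    adapts = IFoo
    provides = ISize

    def __len__(self):
        return len(self.context.data)


# Pretend SizeOfFoo lives in a "sizes" module and grok that module.
sizes = types.ModuleType("sizes")
sizes.SizeOfFoo = SizeOfFoo
grok_module(sizes)
print(len(query_adapter(Foo([1, 0, 3]), ISize)))  # 3


# Escape hatch 1: an explicit registration overrides the grokked one.
class NonZeroSizeOfFoo(Adapter):
    def __len__(self):
        return sum(1 for item in self.context.data if item)


register_adapter(NonZeroSizeOfFoo, IFoo, ISize)
print(len(query_adapter(Foo([1, 0, 3]), ISize)))  # 2

# Escape hatch 2: in a unit test, skip grokking and build the adapter directly.
print(len(SizeOfFoo(Foo("ab"))))  # 2
```

Keeping the directives as inert data on the class is what makes both escape hatches cheap: nothing happens until something chooses to read them.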
Does convention over configuration scale?
This way of designing frameworks is often called "convention over configuration" (CoC), and was popularised by Ruby on Rails. CoC usually goes further, too, for example by making assumptions based on file names, object names, or the location of code on the filesystem.
Here be dragons.
Writing a good CoC framework requires some very opinionated, experienced and sensible designers. The kind of people who are in touch with Joe Average Programmer and feel his pain. The kind of people who've tried to teach a complex, fine-grained framework to people who are trying to learn general programming, web programming and the framework in question, all at the same time. It is very easy to be too clever by half and make things convoluted and difficult to understand.
In my observation, good CoC means:
- Focus on consistency of concepts
- Spend time on error reporting to make things easy to debug
- Don't try to infer everything.
- Make sure everything can still be done the explicit, long-winded way.
- Make sure you can "escape" the CoC when you don't want it, without having to drop everything and start from scratch.
- Don't try to use CoC for things that really are configuration. For example, if you need to define the connection string for a database, that belongs in a configuration file, not in code.
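As a sketch of a few of these points (consistent inference, a clear error when inference goes wrong, and an explicit way out), here is a toy view registry. It infers a view's name from its lowercased class name, roughly what Grok does by default, and lets a class-level name attribute override it. view_name, register_views and ConfigurationConflict are hypothetical names, not part of any framework.

```python
def view_name(cls):
    """Infer a view name by convention, unless the class states one explicitly."""
    explicit = getattr(cls, "name", None)  # hypothetical explicit "directive"
    return explicit or cls.__name__.lower()


class ConfigurationConflict(Exception):
    """Raised when two views end up with the same name."""


def register_views(classes):
    """Build a {name: view class} registry, reporting conflicts clearly."""
    views = {}
    for cls in classes:
        name = view_name(cls)
        if name in views:
            # Name both culprits and suggest the fix, rather than silently overwriting.
            raise ConfigurationConflict(
                f"view name {name!r} is claimed by both {views[name].__name__} "
                f"and {cls.__name__}; give one of them an explicit name"
            )
        views[name] = cls
    return views


class Index:
    pass  # convention: registered as "index"


class EditForm:
    name = "edit"  # explicit: overrides the inferred "editform"


views = register_views([Index, EditForm])
print(sorted(views))  # ['edit', 'index']
```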
In Java land, using annotations gives us a structured way to achieve much of this. I think Grok are doing a great job with it too. I'm personally working hard to bring this type of approachability to Zope 2 and Plone.
Which brings me to why I thought to write this in the first place. I've been watching Chris McDonough publicise repoze.bfg, a new, Zope-based framework that tries to reconceptualise Zope in a world of WSGI and middleware. Many things about repoze.bfg excite me. It looks pretty lightweight and easy to integrate with other Python frameworks, and it has some nice, sound design principles.
However, I wish that repoze.bfg would try to re-use elements of - and more importantly, the conventions (or "syntax") of - Grok (like grokcore.component and grokcore.view) as well as elements of Zope proper. Views and components are still defined in ZCML. I wonder whether repoze.bfg will soon come full circle and start looking for ways to escape from having to wire things up in a separate configuration file, when all you want is to write a view and go home.