Python package management

by Martin Aspeli last modified Sep 18, 2007 08:32 AM

easy_install, zc.buildout, workingenv, setuptools, virtualenv... are we done yet?

A while ago now, a debate was raging on the Plone mailing lists about the relative merits of workingenv and zc.buildout, two systems to make Python eggs easier to manage. Very briefly, the problem is that setuptools, the system that enables eggs with dependency management, will tend to install packages in the global Python site-packages directory. Sometimes that makes a lot of sense, but when we work with something like Plone, we often want different versions of different packages for different instances, and we don't necessarily want to pollute the global Python interpreter with packages that are used exclusively for one particular Plone instance.

There are roughly two approaches to this problem: Workingenv took the approach of creating a mini-environment inside a particular directory, complete with bin/python, bin/easy_install, lib/python2.4/site-packages and so on. By "activating" this environment, or simply by ensuring that you always used the binaries for python and easy_install inside this directory, you could make sure that eggs installed (whether by running "python setup.py install" or using easy_install) were contained within that environment only. For Zope development, one approach, championed by Daniel Nouri in the ploneenv tool, is to create a standard Zope 2 instance that is also a workingenv.
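As a rough illustration of what "being inside" such an environment means, the following sketch asks the running interpreter whether it is isolated and where it would install pure-Python packages. It is not how workingenv itself worked; it assumes a modern Python, where virtualenv and its descendants expose sys.base_prefix (or sys.real_prefix in older virtualenv releases). The function names are made up for this example.

```python
import sys
import sysconfig


def in_isolated_environment():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer tools set
    # sys.base_prefix to the interpreter the environment was created from.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


def install_location():
    """Return the site-packages directory this interpreter installs pure-Python code into."""
    return sysconfig.get_paths()["purelib"]


print("isolated:", in_isolated_environment())
print("packages go to:", install_location())
```

Running this once with the global python and once with the environment's own bin/python should show two different install locations.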

Buildout, in contrast, ties everything to a single configuration file - buildout.cfg. The buildout tool runs various "recipes" listed in this file. Some recipes will use setuptools to install eggs (and their dependencies) locally in the buildout. Buildout is usually very explicit about which eggs it activates, for example by generating wrapper scripts (such as ./bin/instance, which starts a Zope instance installed via the plone.recipe.zope2instance recipe) that explicitly activate a particular working set of eggs. Each time you change the config file, you have to re-run the buildout tool to re-create these working sets.
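To give a feel for the shape of such a file, here is a minimal sketch that reads a buildout.cfg-style configuration with Python's standard configparser module and lists each part with its recipe and eggs. It does not run buildout. The configuration is deliberately incomplete (a real plone.recipe.zope2instance part needs further options that are not shown), and my.package is a hypothetical egg name. The standard library's ExtendedInterpolation understands the same ${section:option} substitution syntax that buildout uses.

```python
import configparser

# Illustrative, incomplete buildout.cfg; "my.package" is a hypothetical egg.
BUILDOUT_CFG = """
[buildout]
parts = instance
eggs =
    Plone
    my.package

[instance]
recipe = plone.recipe.zope2instance
eggs = ${buildout:eggs}
"""

# ExtendedInterpolation resolves ${section:option} references.
parser = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation()
)
parser.read_string(BUILDOUT_CFG)

# The [buildout] section names the parts; each part names the recipe that builds it.
parts = parser["buildout"]["parts"].split()
for part in parts:
    recipe = parser[part]["recipe"]
    eggs = parser[part]["eggs"].split()
    print(part, recipe, eggs)
```

The point of the layout is that the egg list lives in one declared place, and every part that needs it refers back to that declaration.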

Ian Bicking recently created virtualenv, which supersedes workingenv. He blogged about some of the problems with workingenv (which he now seems to discourage people from using). At the moment, virtualenv doesn't work on Windows or on the standard Python installation on OS X, which is a bit of a problem, but I'm sure it will be overcome. I haven't had a chance to use it yet, but if Ian says it's more appropriate, I trust him. No doubt, ploneenv and other such scripts could quite easily be made to use virtualenv rather than workingenv.

I think Ian summed up the differences between the two approaches quite well on the virtualenv PyPI page:

zc.buildout doesn't create an isolated Python environment in the same style, but achieves similar results through a declarative config file that sets up scripts with very particular packages. As a declarative system, it is somewhat easier to repeat and manage, but more difficult to experiment with. zc.buildout includes the ability to setup non-Python systems (e.g., a database server or an Apache instance)

As he notes, buildout is somewhat broader in scope than virtualenv/workingenv. You can use it to run any kind of command, for example to download non-egg packages or create Zope instances or configure a cache server. The buildout tutorial on plone.org explains the concepts in more detail.

Are we there yet?

In my book, I advocate buildout pretty much exclusively. I'm quite glad that I did, especially given that workingenv now seems to be deprecated and virtualenv doesn't yet work on Windows (I'm sure it will eventually, though). More importantly, though, in shared development or deployment scenarios, I think it's very important to have something that's explicit, manageable and repeatable. If you're not careful, it can be hard to know exactly what's in your environment, and what's local and what's global, with the workingenv/virtualenv approach. Setuptools also does not have a very good uninstall story (buildout, which re-calculates the working set of eggs each time it's run, will simply not activate eggs it doesn't need, so there's nothing to install or uninstall).
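The "nothing to uninstall" point can be sketched in a few lines. The function below is a simplification, not buildout's actual resolver (which relies on setuptools and takes versions into account): it computes a working set as the requested eggs plus their transitive dependencies. All egg names and the dependency table are hypothetical.

```python
def working_set(requested, dependencies):
    """Return the requested eggs plus their transitive dependencies, in discovery order."""
    active = []
    pending = list(requested)
    while pending:
        egg = pending.pop(0)
        if egg in active:
            continue  # already activated via another dependency path
        active.append(egg)
        pending.extend(dependencies.get(egg, []))
    return active


# Hypothetical dependency table: egg name -> eggs it requires.
DEPENDENCIES = {
    "my.policy": ["my.theme", "some.library"],
    "my.theme": ["some.library"],
    "experimental.addon": ["other.library"],
}

before = working_set(["my.policy", "experimental.addon"], DEPENDENCIES)
after = working_set(["my.policy"], DEPENDENCIES)

# Eggs that are simply no longer activated once the add-on is dropped from the config.
print(sorted(set(before) - set(after)))
```

Because the set is recomputed from the configuration on every run, removing experimental.addon from the requested list also drops other.library, without any separate uninstall step.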

Buildout also gives me some peace of mind, in that I know I can write a new recipe to solve problems that aren't solved with existing recipes or the egg mechanism. For example, I had to create a recipe to build Deliverance, due to problems with dependencies on some platforms, and a recipe to configure Plone, in order to make the installation of a particular version of Plone predictable and simple. Ideally, neither should be necessary (both Deliverance and Plone could, in theory, just be installed as simple eggs), but for the moment, we need a bit more magic. By writing a recipe, I can isolate this magic, and when Plone does become just as simple as a collection of eggs in the future, I can update the recipe to take advantage of that - the user of the recipe will probably never notice.

However, this is still an over-generalisation. I think the buildout approach works well for Zope/Plone development and similar scenarios, but it would be quite cumbersome to create buildouts for every single Python script you ever wrote. I think we need to step back and look at the different use cases for Python package management, and promote appropriate tools for each one.

A standalone program

Say a user wants to install and use a program, which happens to be written in Python. This is where setuptools and easy_install shine. Simply easy_install the package, and you will get the latest version. Using the console_scripts support in setuptools, this will create an appropriate binary (which even works on Windows). For the most part, I think it's appropriate to install this globally (I install Paste Script this way, for example, and use its paster command extensively to create new buildouts and packages, via ZopeSkel).
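A console_scripts entry is a string of the form "name = package.module:function". The sketch below shows, in simplified form, what the generated wrapper script effectively does with such a string: import the module and call the named function. It is not the code setuptools generates. The entry used here is hypothetical and points at a standard-library function purely so the example runs as-is.

```python
import importlib


def parse_entry_point(spec):
    """Split 'name = package.module:function' into (name, module, attribute)."""
    name, _, target = spec.partition("=")
    module_name, _, attr = target.strip().partition(":")
    return name.strip(), module_name.strip(), attr.strip()


def load_entry_point(spec):
    """Import the module named in the spec and return the callable it points at."""
    _, module_name, attr = parse_entry_point(spec)
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Hypothetical entry, as it might be declared in a package's setup.py:
#   entry_points={"console_scripts": ["pyversion = platform:python_version"]}
SPEC = "pyversion = platform:python_version"

command = load_entry_point(SPEC)
print(command())  # what running the hypothetical "pyversion" script would compute
```

In a real package the function would usually be the program's own main(), and the script name on the left is what ends up in the bin/ (or Scripts) directory.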

Some developers will want to have multiple versions of these binaries, e.g. for testing purposes. Here, workingenv/virtualenv is a good approach, since it creates an isolated mini-environment. Most users won't need this though - they'll just want the latest version. easy_install also makes it easy to upgrade, and to install specific versions if necessary.

A dependency for an ad-hoc script

Developers and integrators may write simple ad-hoc scripts, that are not themselves packaged as eggs. If they need specific libraries, they may use easy_install to get these. Again, they're probably more interested in the latest version.

Using the global easy_install will typically work here. It would help if easy_install were better at uninstalling, especially when it comes to transitive dependencies. There is a chance that two globally easy_install'ed things could conflict. It may be better to use a workingenv/virtualenv here, though I suspect developers may not always have the patience to do so.

A dependency for a standalone program or library

Developers who create standalone programs or libraries will typically package these as eggs. Dependencies are declared in setup.py. During development, it may make sense to install these using workingenv/virtualenv or manage the application with a buildout, but the final version is just a standalone egg. How it ends up being used depends on what it'll be used for - typically, it'll correspond to one of the other use cases considered here.

Some of the Zope 3 packages actually take a hybrid approach here. They are standard eggs, with a setup.py, but they also ship with a buildout.cfg file. This means that developers can build a minimal environment with this package and its dependencies, for example to run tests. In the case of a standalone program with console scripts, this approach also lets you get an archive of the program, run buildout and get the executable in a bin/ folder inside the archive. To my mind, this is a little strange, but it should work. I can certainly see the attraction of having something like this in svn to make it easy to run tests.

A library used in a specific, standalone application

This is the Zope/Plone use case. You build a web server and various libraries, and you may need to deploy it to different environments, but it's not necessarily going to end up as an egg that other people will download and grapple with.

Here, I strongly prefer the buildout approach, not least because it gives a lot of predictability and flexibility. The main downside is that if you want to quickly experiment with a new egg, you need to edit buildout.cfg (or your own package's setup.py) to add it, then re-run buildout. It may be useful to have a standalone workingenv/virtualenv for trial-and-error as well, where you can easy_install anything you want (and blow it away when it goes awry). Some people in the past have experimented with creating a buildout directory that is also a workingenv/virtualenv. I think this gets pretty messy, though, and I wouldn't recommend it.

Conclusion

These are the use cases I can think of right now. I'd be interested to hear what other people use and what works for them. Also, although I think we have solutions for most of these, I still find there is plenty of scope for confusion, especially when you deal with packages that may come from the global site-packages or a local environment, or "stateful" environments such as workingenv that may or may not be active at any one time. Having tools to make it easier to manage (upgrade, uninstall, disable, trace dependencies of) packages installed via easy_install in site-packages would actually go a long way, I think.

The only parallels I can think of are .NET DLL assemblies and Java JAR files. I'm not sure these do much better, though. In a recent project, we had so many JAR files in Tomcat's "lib" directory, with such cryptic names and no clear dependency information, that we didn't know what libraries we were actually using. :)

development versus deployment

Posted by Anonymous User at Sep 18, 2007 12:31 PM
This anonymous user is actually Martijn Faassen.

Thanks for doing this overview! I think in discussions like this it is important to consider the difference between deployment for end-user installation and deployment for a developer. In your analysis you touch upon this distinction, but perhaps it needs to be made more explicit.

zc.buildout's original and primary intent is to help with development purposes. In the IDE world what zc.buildout manages might be termed a 'project' - something the user of what is being developed typically does not see, but the preferred environment for actually developing the software.

For end-user installation, other methods could be used altogether. You mention distributing a Python package as an egg (or tarball).

Of course many of us end up using buildout for production deployment as well, even though I understand this is explicitly not a primary purpose of buildout. For complicated software like servers, however, it's often easier to just set up a buildout, so I suspect we'll see more and more of this usage. I also hope that eventually we'll see something that makes it very easy to generate installers or packages from buildouts.

Development and deployment

Posted by Martin Aspeli at Sep 19, 2007 05:55 AM
Actually, I think zc.buildout is a fine tool for building deployment/production systems. In my book, I advocate using a buildout.cfg for development, and a deployment.cfg for deployment, which references and re-uses the relevant bits of buildout.cfg (e.g. egg lists), but sets up ZEO, Varnish and so on.

The Plone installer people are also experimenting with letting the installers essentially create a buildout, since this is a nice and self-contained environment.

bundle

Posted by Anonymous User at Sep 18, 2007 01:32 PM
Don't forget buildutils' python setup.py bundle -- the result of that operation isn't part of any environment or build system at all, it's just a self-contained directory with frozen dependencies. This is more for providing an application to people who don't want to build it at all. I guess it's similar to py2exe or py2app. It certainly could be better (e.g., use system application conventions, like py2exe and py2app do), or less buggy (it's just a few hours of fiddling currently), but I think it usefully fits a different use case.

-- Ian Bicking

couple minor corrections/notes

Posted by Anonymous User at Sep 21, 2007 05:46 PM
With workingenv, you would use your own python, but alter the PYTHONPATH so as to only use the libs in your env. So no env/bin/python.

virtualenv gives you a new copy of python local to your env (which you invoke a la env/bin/python). The most noticeable change is that easy_install doesn't get confused about certain dependencies and their existence.

Worth noting too, buildout now works with workingenv (and probably virtualenv, but I haven't tried). Ploneout for example, works fine within a workingenv: install zc.buildout into the env, run buildout from there and everything works as expected.

-whit