
James Antill - Software packaging, and the 10 × 10 problem

Jun. 20th, 2007, 05:09 am


First off, I hate all current software packaging for Linux. It's one step up from manually downloading tarballs, which I was doing 10 years ago, and it isn't even a real superset of that functionality. Yes, it's better than doing that, and yes, it's better than what Windows has to offer. But it's still crap, and it annoys me every day. It seems to be one of those problems that no one wants to fix properly, yet people keep thinking up stupid band-aids to work around the fact that it doesn't work well. Specifically, there's the major circular argument around what I've come to think of as the "10 × 10 problem"…

Repeatedly over the last 5+ years I've tried to tell people how we should be solving this problem, and I've mostly been dismissed … so here it is for posterity.

One of the major current problems with software packaging and distribution is updating it. All software has bugs or missing features, and those affect different people in different ways. Say you have two people: one who is writing a new web application in python using postgresql, and one who is just browsing the web, sharing some files via the web server, and running his own blog. It probably seems obvious to most sane people that the updates these two people want will vary wildly for the five packages: firefox, python, ipython, postgresql, apache-httpd. That is, unless you are in the business of distributing software packages.

With software packaging/release/whatever, the argument is most often about who gets screwed over least. In the above example a common "solution" is to do very minor updates for firefox and maybe ipython, but nothing else … want firefox-2, or the latest postgresql/python? Too bad. Wait for the next "major" release and get those with everything else. A commonly painful package is the Linux kernel, mainly just because everyone uses it, and so all the software packagers argue about how often it should or shouldn't be updated. This is roughly the same as arguing about which single human language would be best for the entire OS, whether everyone should use python or C to develop all applications, or what font everyone should use for all text. There is no correct answer, because the entire question is completely insane.

When I first thought about this, I was visiting/speaking with quite a few different Linux customers, and I came up with the general idea of the "10 × 10 problem". The general statement of the problem is that you have 10 customers, each of whom has 10 different "changes" they want for the next release. The current "solution" to the problem involves picking a number between 0 and 100, doing that amount of changes, and giving the result to everyone, so that you have a very small number of "releases" which can be managed by hand. This rarely makes anyone happy, and is basically what Microsoft, Sun, and SCO have done for the last 10 years.

Obviously in the real world things are more complicated than the above. One of the most common differences is that each of the 10 customers has changes that are important enough that they don't want to wait for "the next release"; they'll also want some of the changes you've done for the other people, but have desires like "apart from those changes, I don't want anything else to change". This also destroys any hope of there ever being a magic number of changes you can do that will actually make everyone happy. And yet, still we get arguments about what the best magic number is.

The "solution" to this problem is to admit defeat and stop managing software packaging and distribution like it's 1995, imagine for a moment that each "customer" had their own private software packaging and distribution team and used that instead of paying someone outside to do the work for them. In this senario it's very likely that all the high priority changes would be done immediately, and that all 10 changes would make it into "their" next release with no other changes. Now imagine that all those private teams communicated openly with each other, this would result in a similar outcome with the added benefit that significant changes done by one team that are desired by others would be merged into their releases.

The common complaint about why this "can't be done" by an external entity is that managing so many releases by hand is "hard" and would cost too much money. In my not-so-humble opinion this is a bit like saying that managing more than one version of a C source file is hard and time consuming, versus implementing some decent SCM tools so that you can manage hundreds of thousands of different version sets. The next common complaint is that the entire OS needs to be tested as one unit, which is mostly untrue, especially when you are talking about the changes the average customer desires … and again, any truth in this is mostly down to a lack of decent tools.

The reason this could never have happened over the last 20 years is that the code itself was hidden from the customers, so they had no choice except to take what they were given by their software packaging and distribution company. This is not the case now, and I've already started to see the emergence of people implementing the good solution privately, because they can't buy it from anyone. I'm just waiting for someone to give people what they want, and charge for it ... and I hope I work for the company that does it.

As a final optimistic note, I have helped the real solution come a tiny bit closer to reality. With Fedora 7 you can now install the yum-security package, and do things like "yum --security update -y" to get just security updates or "yum --bz 1234 update -y" to get just the updates which fix bugzilla.redhat.com#1234.
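
For the curious, a minimal session looks something like this (the plugin name and the --security/--bz options are as shipped in Fedora 7; check "man yum-security" for the exact spelling on your release):

    # Install the plugin, then list the pending security updates:
    yum install yum-security
    yum --security check-update

    # Apply only the security updates, and nothing else:
    yum --security update -y

    # Apply only the updates which fix bugzilla.redhat.com#1234:
    yum --bz 1234 update -y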

Current Mood: disappointed

Comments:

From: fanf
Date: June 20th, 2007 07:25 am (UTC)

A big problem is cascading dependencies, where upgrading app A pulls in an upgrade to library B which is not reliably backwards-compatible, so you have to upgrade app C to a version compiled against the new B - but you didn't want C to change.

The solution to this is to allow multiple parallel installations of packages, so that the two versions of B are not forced to conflict. This is useful for applications as well as libraries. If I can install the new version of A alongside the old version, then it's really easy to revert the upgrade - I just have to change a PATH or a symlink and restart it. I can also test the upgraded A (at least a little bit) before committing to running it. Package deinstallation then becomes a garbage collection job, instead of being crucial for the correctness of the package management tool.
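
A rough sketch of the kind of layout that makes this work, with invented paths and version numbers: each version of app A is installed under its own prefix, and a "current" symlink names the live one.

    # Both versions installed side by side, each under its own prefix:
    #   /opt/appA/1.0/bin/appA
    #   /opt/appA/2.0/bin/appA

    # "current" selects the live version, and PATH goes through it:
    ln -sfn /opt/appA/2.0 /opt/appA/current
    export PATH="/opt/appA/current/bin:$PATH"

    # If the upgrade turns out to be bad, reverting is one symlink flip:
    ln -sfn /opt/appA/1.0 /opt/appA/current

Deinstalling 1.0 later is then just removing a directory that nothing points at any more.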

OpenPKG sort-of does this, but it does so by allowing multiple installations of the package manager, so it's rather coarse-grained, which means you lose sharing between apps whose version dependencies you want to decouple.

Nix claims to do exactly what I want, but I have not investigated it in detail...
From: illiterat
Date: June 20th, 2007 02:12 pm (UTC)

Dependencies aren't as big a problem as assumed

A big problem is cascading dependencies, where upgrading app A pulls in an upgrade to library B which is not reliably backwards-compatible, so you have to upgrade app C to a version compiled against the new B - but you didn't want C to change.

Kind of. There are certain packages which tend to depend on newer core libraries, but it's not obvious how often you'll just be able to sign off on the deps. too. For instance, large GUI-based ones, like evolution, often depend on newer libs. ... but I think in the real world it'd be a pretty strange requirement to have the latest evolution while libgtk etc. can't change.

Much more likely is that you treat applications as groups with most or all of their deps. The easiest cases are things like "I've installed fish, am on the mailing list, and so want the latest version of that" ... or the "web developer" problem, where you don't want a 2-5 year old PostgreSQL/Apache-httpd on an OS that you'd otherwise classify as "very stable".

I'm prepared to be proven wrong though :), and I do think solving the multiple versions problem (while hard) is doable … and the solution is very likely to add useful features to the package manager/distribution layer. But I might not start out trying to solve that problem first :).

I hadn't heard of Nix, although I have played with rPath a bit and that's a significant step forward. I'd looked at OpenPKG before, and the "pipe everything into sh" part scares me a bit; also, while it seems to have a couple of nice things for solving some of the problem, it doesn't seem to attack the problem itself (so managing N slight variations of Fedora, say, wouldn't be significantly easier for the consumer or the producer).

From: fanf
Date: June 20th, 2007 02:40 pm (UTC)

Re: Dependencies aren't as big a problem as assumed

Right, treating apps and their deps as a group is basically what I'd like to achieve. It's easy to install a new app and pull in its dependencies automatically if you can be certain that this will not disrupt a running system. If each dependency is a separate package then you get sharing where possible.

Regarding your Evolution/gtk example, I'd say that if you are a heavy user of a gtk application then you don't want your work disrupted by a forced upgrade to the latest version just because Evolution wanted a new gtk. So the problem is not that gtk can't change, it's that you need to keep the old version around to support the important application.

The main difficulty with solving the multiple versions problem is that autoconf and libtool are built on the assumption that there's one of each thing on the system installed in a "usual" place. Fixing build scripts so that they work when you break this assumption is a tedious and soul-destroying occupation.
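
To give a flavour of it: for packages that support pkg-config, you can sometimes point a build at a parallel-installed library by hand, along these lines (the libB/appC names and paths are invented for illustration):

    # Install library B 2.0 under its own versioned prefix:
    cd libB-2.0
    ./configure --prefix=/opt/libB/2.0 && make && make install

    # Build app C against that copy instead of the system one:
    cd ../appC
    PKG_CONFIG_PATH=/opt/libB/2.0/lib/pkgconfig ./configure
    make

    # At run time the loader still has to be told where the right copy is:
    LD_LIBRARY_PATH=/opt/libB/2.0/lib ./appC

Every package that ignores PKG_CONFIG_PATH, or hard-codes /usr/lib somewhere, then has to be patched by hand.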
From: (Anonymous)
Date: January 10th, 2008 10:56 pm (UTC)

Re: Dependencies aren't as big a problem as assumed

You've identified the problem correctly - libraries aren't backwards-compatible. The solution you propose is not so good. Why not "fix the problem"? Come up with an API/policy to FORCE libraries to be backwards compatible? Essentially, if a new function needs to be added, or an existing function must be modified, create a new function name and add it to the API. Errors in existing API functions can be corrected without renaming. Or somesuch policy.
From: illiterat
Date: January 11th, 2008 05:53 am (UTC)

Re: Dependencies aren't as big a problem as assumed

Why not "fix the problem"? Come up with an API/policy to FORCE libraries to be backwards compatible?

Even if that were "just" very hard, how would you force libraries to do it? It's like saying the solution to a problem is to get all web pages to use white text on a black background. And a lot of libraries are backwards compatible at an API/ABI level, but that doesn't help, as sometimes the only way to be 100% backwards compatible is to not change.

Essentially, if a new function needs to be added, or an existing function must be modified, create a new function name and add it to the API. Errors in existing API functions can be corrected without renaming. Or somesuch policy.

And what do you do when the function is called from 6 other functions that you also export ... do you now have to create 7 new functions to make a one-line change? And then what do you do about the applications that call the function whose behaviour has changed ... do they call the new function or the old one? What about the two scripts calling those applications, one of which wants the old behaviour and one the new? Pretending you can solve all the problems with a single result just doesn't work.