James Antill - Why trusted third party repos. will always be a bad idea
Apr. 3rd, 2009, 06:37 pm
Why not make third party repos. first class
Every now and again, someone takes a look at apt/yum/zypper/smart/PK/whatever and decides that although they support third party repos., it's "too annoying" for third parties to get users, or for users to use them, and so this is a problem which needs to be fixed. Another way this is presented is that the package managers should support "One Click Install". I will hopefully explain (once and for all) why this isn't a problem, and what third parties can do to get what they actually want (to make their users' lives easier).
This is not a new problem
A long time ago now, I used Debian-2.2 on my desktop+server and it was good. But then the desktop seemed old, and luckily for me Ximian came out with a GNOME desktop for Debian-2.2. So, trusting that Ximian had tested everything and worked to get the latest nice desktop bits into my stable Debian-2.2, I added their repo. and upgraded. I noted that they were replacing large parts of the desktop, and not just adding packages, but this was pretty much what I expected so it didn't bother me. And life was good: I now had a stable system and a new desktop.
Then came the point where I wanted to "distro. upgrade" from Debian-2.2 to Debian-3.0, so I did what everyone does, "apt-get update && apt-get dist-upgrade", and I expected that the Ximian stuff was probably going to get lost, as this is what had happened on all the Red Hat CD updates I'd ever done. What actually happened, though, was that apt-get tried to resolve dependencies for a while, and then said it couldn't do it and gave up. Life was not good.
The core problem is distributed database synchronization
The core problem is that "package management" is actually "database management", where moving from pkgA-1 to pkgA-2 is more about database synchronization than anything else. So when you add a "third party repo." you now have "distributed database management/synchronization". In simple terms, this means that you can test that Debian-X to Debian-Y works, or Fedora-X to Fedora-Y, but this testing will not apply to Debian-X1 or Fedora-X1 (the distro. plus third party repo. 1), and while you could expand testing to cover those *-X1 cases (at great cost), no one knows how to make it work for all of -X1, -X2, -X3, ... -XN.
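To put a rough number on why testing every -X1 ... -XN case doesn't scale, here is a minimal sketch. It assumes each third party repo. can independently be enabled or not on a given machine; the repo. names and function names are made up for illustration and don't come from any package manager.

```python
from itertools import combinations


def repo_combinations(third_party_repos):
    """Yield every non-empty set of third party repos. a user might have enabled."""
    for size in range(1, len(third_party_repos) + 1):
        for combo in combinations(third_party_repos, size):
            yield combo


def starting_states(n_third_party):
    """Distinct systems one distro. upgrade has to work from:
    the plain distro. plus every possible mix of third party repos."""
    return 2 ** n_third_party


repos = ["X1", "X2", "X3"]          # hypothetical third party repos.
combos = list(repo_combinations(repos))

print(len(combos))                   # 7 mixes of third party repos.
print(starting_states(len(repos)))   # 8 once you add the plain distro.
print(starting_states(10))           # 1024
```

And that is before counting the different versions each of those repos. might be at, which only makes it worse.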
Ignoring the DB we still have the trust problem
Even if we could magically solve the distributed database problem, there are significant problems with splitting your database beyond a certain point. At the inevitable endgame, let's say that all the "large" applications have their own repository (openoffice, evolution, gnome, firefox, kde, apache-httpd, postgresql, mysql, etc. etc.). Now whenever you want to update your view of this distributed database you have to contact N different repositories instead of the 2 you have for Fedora. Making this usable for 10 repos. is a significant amount of work; making it usable for 100 or even 1,000 repos. is a huge amount of work.
Also the quality control of the packages that can get onto your system is now distributed, because the quality of any package on the system is the minimum of that applied on all the repositories you have available. This is also true of the reliability/security, so instead of a single point of failure for most users you have N single points of failure.
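As a back-of-the-envelope illustration of the "minimum quality" and "N single points of failure" points, here is a small sketch. The scores and probabilities are invented purely for the example, and it assumes repos. fail independently of each other, which is a simplification.

```python
def effective_quality(repo_quality):
    """The quality you can count on is that of the weakest enabled repo."""
    return min(repo_quality.values())


def chance_all_repos_ok(per_repo_ok, n_repos):
    """Probability that none of n independent repos. is broken or compromised."""
    return per_repo_ok ** n_repos


# Hypothetical QA scores between 0 and 1, not measurements of anything.
enabled = {"distro": 0.9, "distro-updates": 0.9, "random-third-party": 0.4}
print(effective_quality(enabled))      # 0.4

# Even if every repo. is fine 99% of the time, 100 of them together are not.
print(chance_all_repos_ok(0.99, 2))
print(chance_all_repos_ok(0.99, 100))  # roughly 0.37
```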
But what about "simple" packages and "semi-trusted third party repos"
One solution to the giant distributed database problem is to have "trusted third party repos." not participate in the database. They say something like "I need at least LSB-blah" but have no other dependencies. In theory this is workable, but to do this someone has to write a lot of code in all the package managers to handle these semi-trusted repos. And even after these code changes to sandbox the packages from the semi-trusted repos., you still have a significant portion of the trust problems that come with managing a lot of repos., dealing with the network, etc.
But IMO the most damning problem with this approach is how few uses it could be put to, because the packages within these semi-trusted repos. will have far fewer features than the first class packages used in trusted repos. This means that each application in these repos. would have to have its own copy of everything outside of LSB, so you might end up with 10 copies of FOO until it gets into the next LSB. And the semi-trusted repo. would have to deal with security updates for all of those things itself (and you have to trust it to keep doing so, in a timely manner).
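The "10 copies of FOO" situation is easy to sketch. This assumes a made-up set of applications that each bundle their own non-LSB libraries; the app and library names are placeholders.

```python
from collections import Counter


def bundled_copies(apps):
    """Count how many separate copies of each library end up installed
    when every application ships its own."""
    counts = Counter()
    for bundled in apps.values():
        counts.update(set(bundled))
    return counts


# Ten hypothetical semi-trusted applications, each bundling FOO.
apps = {f"app{i}": ["FOO"] for i in range(10)}
apps["app0"] = ["FOO", "BAR"]

copies = bundled_copies(apps)
print(copies["FOO"])  # 10
print(copies["BAR"])  # 1
```

A security fix in FOO then means 10 separate updates from (up to) 10 separate providers, instead of one update from the distro.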
What about if I just ignore all of the above
Even if you ignored or solved all of the above problems, a core point remains: you need a chain of trust starting from your main distribution. This could be Fedora installing the *-release file for the repo. you want, or some large amount of new code to do basically the same thing. The problem is that Fedora doesn't want to provide that chain of trust, for legal and other non-technical reasons, so you are back at the same place you started from.
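The chain of trust idea can be sketched as a walk over "who vouches for whom", starting at the distro. This is only a toy model with hypothetical repo. names; it is not how any package manager actually stores or checks keys.

```python
def trusted_repos(root, vouches_for):
    """Return every repo. reachable from root by following vouching links."""
    trusted, stack = set(), [root]
    while stack:
        current = stack.pop()
        if current in trusted:
            continue
        trusted.add(current)
        stack.extend(vouches_for.get(current, ()))
    return trusted


# Hypothetical setup: the distro. vouches for its own repos. only.
vouches_for = {
    "distro": ["distro-updates", "distro-updates-testing"],
    "third-party": ["third-party-updates"],
}

trusted = trusted_repos("distro", vouches_for)
print("distro-updates" in trusted)  # True
print("third-party" in trusted)     # False
```

Unless the distro. itself adds a link to the third party repo., nothing the third party does on its own puts it inside the trusted set.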
So why all the code to support multiple repos.
There are a number of cases where multiple repos. work well, and so this is a useful feature to have. However, in these cases the extra repos. should rarely be classed as "third party". For instance, Fedora has a "release" repo. and an "updates" repo., to cut down on metadata, and an "updates-testing" repo. so users can easily turn that on or off. RHEL also contains many extension repos. for specific applications like clustering, or updated MySQL ... but again, any problems here can be solved by changes to the "main" repo., and these specified sets of repos. are tested together.
Something that isn't obviously an extension repo. is the rpmfusion repos. for Fedora. But in reality they try to act as much as possible like an extension repo. for packages that are legally problematic in the US. For instance, while they are controlled outside of Fedora (and so, from the Fedora infrastructure POV, are third party and untrusted), some of the people who control them are part of the Fedora community and so do the integration testing and can help fix any problems caused by using them with Fedora. They also have similar package review guidelines to Fedora, so the quality aspect is maintained as much as possible. However, even with these constraints there are still some problems due to the database being distributed and not synchronized.
It's also somewhat common for specific companies/groups to have internal repos., either for custom builds of distro. applications or for addon applications they wrote/bought. However, these act exactly like extension repos. in that someone is charged with doing all the integration testing, and the trust problem is solved because they control the repos. themselves. These repos. can also easily be integrated into the installation, so they don't have the problems people are trying to work around with third party repos.
Possible solution for third parties
This does imply a possible solution for random third party repo. providers that want this problem solved: pool your resources and join the upstream community (to help with integration). An obvious choice is to join the work the rpmfusion community is doing (and also join the Fedora community). This way 100 or 1,000 third party package providers could all provide automatic updates etc., but with some implied level of QA/trust/etc., so that users could tell the good third party from the bad. This would significantly improve the user experience of getting packages from that third party, helping both sides. Another useful property of this is that nothing has to change in any of the package managers.