James Antill - Understanding groups in yum
Aug. 7th, 2008
11:40 pm
Every now and again someone will ask a question about groups in yum that amounts to:
If I do "groupinstall xyz" and then I do "groupremove xyz", why do I not end up where I started?
The main problem here is thinking of "groups" as objects that are installed and/or removed by yum. In fact, currently the only ways yum stores data on your computer are:
- via its caches of network data (can be removed at any time)
- via its log file (can be removed at any time)
- via its transaction log (can be removed at any time, although yum won't be able to recover from transaction errors)
- via. rpm
...and given that rpm only installs packages, and that this doesn't include extra package data like which "groups" the package is in, it's easy to see that yum cannot perform the groupinstall/groupremove operations using that model (even if that seems like the "correct" thing to do).
What really happens then?
The simple way of thinking about it is that each group is a collection of package names, and on a groupinstall/groupremove yum collects all the package names in the group and tries to install each one (or remove each one). This works exactly as if you had run groupinfo, put all the names in a text file and run that through the "yum shell" command. There is very little magic in how this works, and it is a simple model (once you forget the "obvious" model above). One way of thinking about it is that groups are more like "tags" in del.icio.us or livejournal etc.
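To make the text file analogy a bit more concrete, here is a minimal Python sketch that turns a list of package names into the kind of script you could feed to "yum shell". The function name yum_shell_script is made up for this illustration, and the package names are placeholders; the only thing taken from yum itself is that its shell accepts "install NAME" / "remove NAME" lines followed by "run".

```python
def yum_shell_script(action, package_names):
    """Build text for 'yum shell': one command per package, then 'run'.

    Hypothetical helper for illustration only; it does not call yum.
    """
    if action not in ("install", "remove"):
        raise ValueError("action must be 'install' or 'remove'")
    lines = [f"{action} {name}" for name in package_names]
    lines.append("run")  # ask the shell to run the queued transaction
    return "\n".join(lines) + "\n"


# Placeholder names, as if copied by hand from groupinfo output.
script = yum_shell_script("install", ["a", "b", "y"])
print(script, end="")
```

Saving that output to a file and passing the file to "yum shell" is roughly what a groupinstall amounts to in this model.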
There is a little more complication in that each group actually has four lists of package names, but groupinfo also displays the different lists in the groups so again the model is the same as the text file example.
So as you might expect from the above: if you have x, y and z installed; then groupinstall "foo" which contains a, b and y; then groupremove "foo" -- you'll end up with x and z.
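The same example can be written out as plain set operations. This is only a sketch of the model described above, not of yum's code: it assumes a group is a single flat set of names, and it ignores dependencies and the four separate lists. The function names are invented for the illustration.

```python
def group_install(installed, group):
    """Try to install every package named in the group."""
    return set(installed) | set(group)


def group_remove(installed, group):
    """Try to remove every package named in the group."""
    return set(installed) - set(group)


installed = {"x", "y", "z"}
foo = {"a", "b", "y"}

after_install = group_install(installed, foo)
after_remove = group_remove(after_install, foo)

print(sorted(after_install))  # ['a', 'b', 'x', 'y', 'z']
print(sorted(after_remove))   # ['x', 'z']
```

Nothing records that "y" was installed before the groupinstall, so the groupremove takes it away along with "a" and "b".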
But what about if we just add some magic?
The next question people ask (usually without fully understanding the above) is something like: OK, so groupremove will remove things I had installed before a groupinstall ... but if I do "groupinstall GRP1 GRP2" and then "groupremove GRP1", it should be easy to just keep any packages that are in both GRP1 and GRP2, no?
And this does sound easy to implement: just remove only the packages that are only in GRP1. Except that packages can be in any number of groups. So consider the first example again: the question implicitly assumes that whether "y" gets removed should depend on whether "y" happens to be in another group or not. This little bit of magic in yum would then become very magic for the user, and it would be very hard to tell what a groupremove command is going to do.
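Here is a small sketch of what that "magic" rule would look like, to show why it is hard to predict. It is a hypothetical variant, not something yum does: magic_group_remove and the group "bar" are invented for the illustration, and groups are again treated as flat sets of names. The package "y" is the one from the first example that was installed before the groupinstall and is also listed in "foo".

```python
def magic_group_remove(installed, target, all_groups):
    """Remove only the packages that are in `target` and in no other group.

    Hypothetical rule for illustration; `all_groups` maps group name -> set
    of package names, as merged from every enabled repository.
    """
    others = set().union(
        *(pkgs for name, pkgs in all_groups.items() if name != target)
    )
    removable = set(all_groups[target]) - others
    return set(installed) - removable


installed = {"a", "b", "x", "y", "z"}  # state after "groupinstall foo"

only_foo = {"foo": {"a", "b", "y"}}
foo_and_bar = {"foo": {"a", "b", "y"}, "bar": {"y", "q"}}

print(sorted(magic_group_remove(installed, "foo", only_foo)))     # ['x', 'z']
print(sorted(magic_group_remove(installed, "foo", foo_and_bar)))  # ['x', 'y', 'z']
```

The command and the installed packages are identical in both calls; the result changes only because some repository happens to define another group that mentions "y".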
But, but, usability! Do what the users expect!
In my opinion most of the problem here is with the way applications present the concept of "group" to the user, including yum itself (although no one would like the new command name for the cmd line client). GUI package management applications that use yum group definitions shouldn't present the user with an option to "install group X"; instead the operation should be presented to the user as "install/remove all packages in group X", with the option to (de)select only parts of the list.
The other thing to do is for people to fix their groups so that "common" applications aren't in weird groups, so that groupremove on those groups is a useful operation again.
Cool, groups are interesting but I can't control the groups from Fedora/etc.
Actually you can: the full list of groups in yum is taken from all the enabled repositories. This means that if you create an empty repository with a group file in it, you can create your own groups! Just like tagging your own URLs in del.icio.us.
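As a rough sketch of what such a group file might contain, the Python below writes out a minimal one. The element names follow the comps-style layout as I understand it (a "comps" root holding "group" entries with a "packagelist"), so treat the exact layout as an assumption and check it against a group file from a real repository. The group id, name and package names are made up, and build_group_file is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_group_file(group_id, name, description, package_names):
    """Return XML text for a one-group file (assumed comps-style layout)."""
    comps = ET.Element("comps")
    group = ET.SubElement(comps, "group")
    ET.SubElement(group, "id").text = group_id
    ET.SubElement(group, "name").text = name
    ET.SubElement(group, "description").text = description
    ET.SubElement(group, "default").text = "true"
    ET.SubElement(group, "uservisible").text = "true"
    packagelist = ET.SubElement(group, "packagelist")
    for pkg in package_names:
        # "default" is one of the per-group package lists.
        req = ET.SubElement(packagelist, "packagereq", type="default")
        req.text = pkg
    return ET.tostring(comps, encoding="unicode")


# Made-up group for illustration.
xml_text = build_group_file(
    "my-tools", "My Tools", "Packages I always want", ["vim-enhanced", "screen"]
)
print(xml_text)
```

The resulting file would then be attached to the otherwise empty repository when its metadata is generated (with createrepo, that is the group file option), and the repository enabled like any other.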