21 September 2011

(Long) I'd like to see a Linux package manager co-operate with development

An understandable choice

Now, this is all understandable. They don't call development versions "the bleeding edge" for nothing. If a package manager has to choose between blindly trusting locally built software and blindly ignoring it, it should ignore it.

And trying to make it all work together would raise problems of communication and coordination. How is a package manager supposed to know what it needs to know? And even if it knows, how is it supposed to coordinate with "make install"?

Does CheckInstall fix it?

Not for me. CheckInstall wants to watch "make install" and create a package (deb, RPM, others). The idea is that then you use that package to install and uninstall.

  • Creating a package every time I "make install" is a very heavy mechanism. It's not really for developers, it's for small distributors.
  • I'd forget to use it when I "make install", then what have I got?
  • The way I work would confuse it. I typically configure to install into usr/local/stow. Which leads me to the next section.
  • It's a roundabout way of doing things.

Stow could help

What Stow does

If you know about Stow, you probably should skip to Could a package manager work thru stow?

Stow is a Perl package by Bob Glickstein that helps install packages cleanly. More cleanly than you might think possible, if you're familiar with traditional installation. It works like this:

  • You install the package entirely inside one directory, usually a subdirectory of usr/stow or usr/local/stow We'll say it's in usr/local/stow/foo-1.0 No part of it is in bin or usr/bin or usr/doc etc. Every file lives under usr/local/stow/foo-1.0

    With most builds you can arrange for "make install" to do this by passing ./configure an argument like

    --prefix=/usr/local/stow/foo-1.0/
    
  • To complete the install, just:
    cd /usr/local/stow/ 
    stow foo-1.0
    
  • That makes symlinks from usr/doc, usr/bin, etc into the stow/foo-1.0 directory tree. No file is physically moved or copied out of stow/foo-1.0
  • Now the package is available just as if it had been installed.
  • Want it gone? Just
    cd /usr/local/stow/ 
    stow -D foo-1.0
    
  • Want it completely gone from your disk? After the above, you can just:
    rm -r foo-1.0
    

This is neat in every sense of the word. It also can manage multiple versions of a package neatly.

Must it be stow?

Maybe you don't like Perl. I don't either, though I've coded in it and I do like CPAN and a fair bit of stuff that's written in Perl.

stow is not the only tool for this, it's just the first. There are a number of variants or offshoots: Graft, ln_local, Reflect, Sencap, Toast. They mostly seem to be in Perl as well. One exception is Reflect, which requires only bash and coreutils. Unfortunately, Reflect appears to be abandoned.

So the idea here is not so much an application as a file-location protocol. Another tool could do the same job on the same directory. You could even change tools later on.

General approach

It's easy for me to say "it uses stow", but that leaves a lot of little issues. I'll tackle them one by one below, but first I'll outline a general approach.

Desiderata:

  • It needs to work safely even if interrupted. So all the relevant information needs to be stored in the filesystem.
  • It shouldn't require any component to understand details that are not its own concern, following the general rule that components should do one thing well.
  • If possible, it shouldn't require any component to do much more than it does now.
  • It should place little or no extra constraint on development
    • Especially it should place no constraint on how to develop packages that one's package manager is not interested in.
  • It should not make security holes.

File-naming scheme

So I will base this mostly on a file-naming scheme:

  • A package named "FOO" relates uniquely to /usr/stow/FOO/
  • The files of version N.N of a package FOO live under /usr/stow/FOO/N.N/
    • It's a slightly stronger version of the usual stow subdirectory naming practice. The difference from usual practice is just a level of subdirectory; it's FOO/N.N/ instead of FOO-N.N/
    • Almost no format is assumed for N.N, except that it can't start with a single colon.
    • I'm reserving double or more initial colons for situations where the version name wants to start with a colon.
  • FOO-related files and directories that are not themselves installed ("Magic" files) live in /usr/stow/FOO/
    • Their names all start with a single colon.

Specifically unaffected so far:

  • Most stow functionality.
  • Most package manager functionality. It mostly needs to build paths accordingly.
  • Naive local packages that live as usr/stow/FOO-N.N (no slash). They just are not understood as versions of FOO and thus receive no benefit.

stow2

I will talk about stow2, a hypothetical variant of stow. It's old stow with a few minor extra behaviors. It doesn't aim to be a package manager. It doesn't watch out for system coherence as a package manager does. In this vision, it's merely the part of the package manager that handles physically adding or removing a package. It just aims to do software-stowing for either a package manager, a developer or both without inviting problems.

Specific issues

What about shared config?

I mentioned sharing config above. Config shouldn't disappear or get overwritten when changing versions, or when changing how versions are managed.

So configuration data that should persist across versions will live under /usr/stow/FOO/:config/. It's stowed or unstowed just like a normal stow directory. There difference is in how it's used:

  • It is meant to contain data that ought to remain unchanged across versions, such as user preferences.
  • It initially contains very little. I am tempted to say nothing at all, but the case of config triggers makes me unsure.

Unaffected:

  • Unaware packages will just operate on config data where they think it lives, which is where it's stowed from /usr/stow/FOO/:config/.
  • Developers can treat config data for their builds by their own lights.
  • stow2 can stow this like any other directory tree.
  • Maybe package format, which would not need a separate space for config data if there isn't any.

Altered responsibility: A package manager should:

  • When installing or updating
    • Create the FOO/:config/ directory if it doesn't already exist.
    • If it already exists, don't alter it.
    • In any case, arrange for the package to update the config data, eg to add settings for new flags.
  • When told to remove a package's config data, unstow that directory before deleting it.

How the updating can be done is another issue, and this post is already long. I'll just say that I have a vision of farming off the (re)config control role entirely to make, at some stage higher than stow2, in such a way that developers and package managers can both use it.

What if a user unstows an installed package?

OK, everything works neat while the package manager is the only thing moving. But what happens when the user/developer starts using stow as it was meant to be used? He doesn't hurt anything by stowing new packages, but what happens when he unstows an installed package. . .:

I propose solving the unstow case by adding:

  • New flag: Whether a given stow directory is under control of a package manager or not.
  • New behavior:
    • Let stow2 refuse to unstow if that flag is set, unless forced.
    • If forced, inform the package manager.

So reserve the filename /usr/stow/FOO/:managed-by

  • If it doesn't exist, that means no package manager considers FOO installed.
  • If it exists, it is a symlink to a package manager that considers FOO installed.
    • I don't specify what part of the package manager. It might be useful to be a link to an executable.
    • A package-manager, as package itself, has to manage versions of where that symlink points
      • Maybe by pointing it towards an unchanging location.
      • Maybe by understanding previous installed versions of itself.
  • When stow2 stows, it should create ../:managed-by pointing to itself.
  • stow2 should refuse to unstow if ../:managed-by exists and doesn't point to itself, unless forced.
  • (Stows that require unstowing another version are really the unstow case)
  • If stow2 forces an unstow, it should erase ":managed-by".
  • stow2 should support an filename argument that means "Act for this package manager".

What if a user stows a package?

I said "He doesn't hurt anything by stowing new packages". But we wanted more than harmlessness. We wanted locally built packages to be basically on the same footing as installed packages; at least, we want the user to be able to make that happen without real extra work.

But that invites us into a potentially complex system of dependencies and signing. Again, stow2 shouldn't have to concern itself with that but it also shouldn't make a mess.

For dependencies etc

For some concerns, we're just going to have to bite the bullet and say that the developer must talk to the package manager in its own format. Dependencies are one example.

Let's reserve the directories /FOO/:control for all package control data about FOO. Now, various package managers might have their own formats, and different dependencies etc might apply to different versions. So let's further reserve any file or directory /FOO/:control/PMNAME/N.N for a package manager named PMNAME with regard to package FOO version N.N. Here we can't do much about name clashes, but it's just between a few package managers.

Sometimes the same control data applies to many versions of the same package, up to version identity data. Particularly in development, the version might change frequently but the developer shouldn't have to create new control data each time. So let's reserve /FOO/:control-default and /FOO/:control-default/PMNAME.

So a conforming package manager PMNAME:

  • Must recognize files of the form /FOO/:control/PMNAME/N.N and /FOO/:control-default/PMNAME even if FOO is not a package it knows about.
  • Similarly, must recognize files of /FOO/:control-default/PMNAME
    • For them, find all FOO/N.N and consider each a version of FOO.
  • Must treat all such package versions as available or installed, as appropriate.
    • Even where this requirement conflicts with security measures it uses for external packages, such as digital signing.
  • May use its own internal control format, except insofar as it conflicts with other requirements. For instance:
    • The requirement to associate :control-default/PMNAME data with multiple versions, which might conflict with a "version" field in control data.
    • The requirement to treat such packages as available, which might conflict with digital signing.
  • Is allowed but not required to distinguish such packages, for instance by:
    • presenting them differently
    • presenting their dependency chains differently
  • Is recommended to provide a means in control data for indicating particular packages as unstable. Generally package managers already provide this.

But this still requires that the developer create control data in the package manager's format. Let's make it a little easier for him and require any package manager that has FOO-N.N to create /FOO/:control/PMNAME/N.N on request. Generally that's simple. Then the developer can adapt that instead of starting from scratch.

Digital signatures

A package manager generally guarantees package integrity by checking signatures. They are considered part of the control data.

We could require the developer to cause each "make install" to be signed. But this would not make /usr/stow/ even slightly more secure. Someone already put the code in question into it (usually by "sudo make install") Any misbehavior they wanted to do was already possible.

So I just say, trust the local versions. They belong there - or if somehow they don't, it's beyond what a package manager can provide security against.

The exception is if some user:

  • Has write permission into /usr/stow/
  • Does not have permission to stow
  • Does have permission to run the package manager
  • Can stow via the package manager. Perhaps it has the sticky bit, or he can sudo it but can't sudo stow.

So a package manager ought not to consider local versions installable by a user who can't directly run stow. It seems difficult for that situation can arise.

What about name collision?

What if two packages have the same name? That wasn't an issue when we assumed a package manager, which assumes a naming authority that can manage the namespace. And what about the flip side, if one package has two names at various times?

There's often no problem, but when there is, the developer has all the power to solve it. He can name his local stow subdirectories anything he pleases (directly or thru ./configure --prefix=). So it's up to the developer to manage the stow namespace in harmony with his favorite package manager.

Now, if more than one package manager is operating on the same system, there could namespace problems. But that wasn't even possible before, so nothing is lost.

What about bootstrapping?

How do you use stow2 if it's sitting in /usr/stow/stow2/1.1 unstowed? This is a solved problem: To bootstrap, call stow2 by its absolute path. Its support apps (Perl interpreter or w/e) also would be used by absolute path if they haven't been installed. That's all.

What about system bootstrapping?

Some Linux boot systems require some boot software such as the kernel to live in a small1 (<100 Mb) partition at the front of the hard disk. You can't symlink from /boot/ into /usr/stow/, but there's already a well-known solution. You make a stow directory there, eg /boot/stow/, and you put such software in it.

One would have to define a mapping that the package manager understands, and affected systems would have to be configured, and affected packages tagged, but for most packages and systems it's no work.

What about an unbuilt package having requirements?

So I solved most of the obstacles, including "they cannot satisfy requirements", but not "they cannot have requirements". There's a place for them in control data, but:

  • It pertains to the wrong stage. Generally you need packages for the build stage, not after installation. If you as a developer only find you need some other package after installation, you just fetch it.
  • At the time /FOO/:control/PMNAME/N.N is read, FOO/N.N doesn't exist and the package presumably isn't available thru the package manager. So it would be fruitless for a package manager to try to install FOO-N.N.
  • It misses an opportunity to learn requirements from the configure stage.
  • It misses a wider opportunity for other apps to communicate their needs to a package manager. For instance, as a simple way for an app to pull in optional support that's not bundled with it (hopefully only with user approval)

"BUILD" packages don't suit development requirements either.

This one is unrelated to anything stow2 should do. It is properly the domain of package managers as requirement managers. Here I suggest:

  • A common file format for representing requirements, not particular to any package manager.
  • A canonical flag to invoke a package manager with regard to such a file, like:
    aptitude2 --wanted autoconf2-201109212117-1.requirements
    

The format seems to require at least these fields:

  • Name of requesting app
  • Date requested
  • List of desired packages, each giving:
    • Name
    • Version required
    • What exactly is wanted, eg app, static lib, dynamic lib, dev headers.

Because there is no one package manager in control, name collision and name splits are now an issue. In particular, virtual package names have little to go on.

One approach is to allow the package name to be represented in multiple ways, each with respect to some managed namespace. Like:

(("debian" "sawfish") ("redhat" "sawmill"))

Total requirements

So totalling up the requirements,

  • A file-naming scheme, consisting of:
    • One or more STOW directories
      • Zero or more package names
        • Zero or more version names (each a directory tree)
        • :managed-by (a symlink)
        • :installed (a symlink)
        • :config (a directory tree)
        • :control
          • Zero or more package-manager names
            • Zero or more version names
        • :control-default
          • Zero or more package-manager names
        • :stowing (a symlink)
  • A format for listing required packages, not specific to one project manager.
  • stow2 must
    • Respect ../:managed-by wrt any directory it (un)stows from.
    • Impersonate a given package-manager, by filename argument, to ../:managed-by.
    • Otherwise behave like stow
  • Package managers must
    • Respect /FOO/:managed-by
    • Recognize /FOO/:control-default/PMNAME
    • Recognize /FOO/:control/PMNAME/N.N
    • Create /FOO/:control/PMNAME/N.N on request
    • Obey a tag-to-stow-directory mapping, part of its own config.
    • Understand the above required packages format
    • Obey command-line flag --wanted
  • A common means for developers and package managers alike to cause config updates. I suggest leaning on make.
  • That's all.

Footnotes:

1 I just called 100 Mb small. There was a time when we all called that "huge".

2 comments:

  1. Hello,

    I came across this piece by way of Axis of Eval, and I have the following comments:

    You have essentially outlined the way that third-party package managers work on OS X. Since the UNIX software on OS X is not intended to be upgraded by users, Mac package managers have had to maintain packages that do not trample on the system's software, creating a system like you have described out of necessity. There is an extra onus on the user to intelligently manage their PATH variable, but it allows an advanced user to use both vendor and development versions of software in an easy to manage way.

    I would suggest looking at the Homebrew project [1], which, aside from their terrible advice about avoiding `sudo`, has the best implementation. It is essentially a shortcut for classic Unix administration best practices:

    * Install software to PREFIX/name/version
    * Link executables, libraries, and shared files to single global hierarchy, like /opt/brew
    * Prepend or append /opt/brew/{s,}bin to your PATH as you wish.

    This allows the filesystem itself to serve as the package database, which allows for a simple implementation and provides transparency.

    WRT to config management, this is indeed a hard problem, but I think Arch Linux has it right: [2]

    * If a user hasn't touched a config file, he doesn't have much invested in it, so go ahead and modify/delete on upgrade.
    * If the config file has been modified, don't upgrade it for the user on update; save the new file as conf.pacnew, and let the user manually merge the file using vimdiff, or whatever.
    * Ditto for updates that remove config files: rename obsolete config files that the user has modified to conf.pacsave

    Again, this places some of the onus of config management on the user [3], but it also allows for simpler implementation and prevents surprising behavior; note that git has a similar philosophy regarding merge conflicts.

    These two points don't address your whole post, but I thought I'd bring them to your attention. I left Debian last year for Arch Linux because I was tired of fighting with the package manager; my solution to running dev software on Debian was to install everything of interest from source. When this crossed the line to 10+ projects, I was exhausted and jumped ship.

    I would love to see Debian adopt something like you're suggesting, however. A rolling release system like Arch is _fantastic_ for development, but when I'm installing a server that needs to stay up for years with minimal maintenance, I want Debian holding down the fort, without having to install half the system from source.

    Cheers,
    guns


    [1]: http://mxcl.github.com/homebrew/
    [2]: https://wiki.archlinux.org/index.php/Pacnew_and_Pacsave_Files
    [3]: But really, how many Linux users aren't programmers of some sort?

    ReplyDelete
  2. @guns Interesting, I did not know that existed for OS X. Thanks for the links.

    ReplyDelete