Tehom: Thoughts on the structure of Klink Libraries

Klink Libraries

Previously

I blogged some ideas about library paths for Klink, the Kernel implementation I wrote. I listed several desiderata, based on lessons from the past. I also blogged about how I'd like to bridge the gap between package as used by developer and package as used by user.

Additional desiderata

(The desiderata from the previous blog post, plus:)
Should co-operate well with development. Switching from use to development shouldn't require gross changes or compete with library support.
Can fetch libraries automatically with reasonable power and control
- In particular, automatable enough to support "remote autoloading" but ultimately should be under the user's control.
Support clean packaging

Fetching libraries: mostly lean on git

The well-loved version manager git provides most of what I'd want, out of the box:

Co-operates well with development (More than co-operates, that's what it's usually for)
Reasonably compact for non-development. You can clone a repo with --depth=1
Fetching
- Via URL (git protocol or otherwise)
- Doesn't treat URLs as sexps - only a mild problem.
Finding out what's there to be fetched, in the sense of available versions (eg, looking for latest stable release)
```
git ls-remote --tags URL 
```
- But we have to distinguish tags from the tags^{} entries, which AIUI don't refer to versions.
Secure digital signatures are easy
- Creating them
```
git tag -s
```
- Verifying them
```
git verify-tag TAG
```
- Some pages about them:
  - http://learn.github.com/p/tagging.html
  - http://progit.org/book/ch2-6.html
Excluding local customizations from being updated
- This is possible with .gitignore and some care
- But customizations will live somewhere else entirely (See below)
Practices supporting stable releases. git-flow (code and practices) does this.
What git does NOT give us: a well-behaved heterogeneous tree of libraries.
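
As a rough illustration of the "finding out what's there" step, here is a minimal Python sketch that reads the text printed by `git ls-remote --tags URL` and folds each `^{}` line into the tag it belongs to, so that only real tag names are offered as versions. The function name and the sample hashes are made up for the example; it only parses text and does not call git itself.

```python
def parse_remote_tags(ls_remote_output):
    """Map tag name -> hash, given the text output of `git ls-remote --tags`."""
    direct = {}   # hash on the plain refs/tags/NAME line
    peeled = {}   # hash on the refs/tags/NAME^{} line (the dereferenced tag)
    prefix = "refs/tags/"
    for line in ls_remote_output.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[1].startswith(prefix):
            continue  # skip anything that is not a tag line
        sha, ref = fields
        name = ref[len(prefix):]
        if name.endswith("^{}"):
            peeled[name[:-3]] = sha
        else:
            direct[name] = sha
    # A ^{} line never adds a version; it only refines the hash of its tag.
    return {name: peeled.get(name, sha) for name, sha in direct.items()}


# Hypothetical sample output: one annotated tag (two lines) and one plain tag.
SAMPLE = (
    "1111111111111111111111111111111111111111\trefs/tags/v1.0\n"
    "2222222222222222222222222222222222222222\trefs/tags/v1.0^{}\n"
    "3333333333333333333333333333333333333333\trefs/tags/v1.1\n"
)

print(sorted(parse_remote_tags(SAMPLE)))
```

Picking the "latest stable release" from those names would still need an ordering on tag text, which this sketch leaves out.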

Of course git does not support knowing that a repo is intended as Kernel code. Looking at filename extensions does, but that seems to require fetching the repo first. For the same reason, the marker can't easily be any file that "lives" in the repo. It should be something about the repo itself.

So the convention I propose is that the presence of a branch named --kernel-source-release indicates a branch of stable Kernel code. Tags on that branch would indicate available versions, so even if coders are working informally and doing unstable work on "master", only tagged versions would be offered.
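
A minimal sketch of how a fetcher might test for that convention without cloning anything, assuming it is handed the text printed by `git ls-remote --heads URL`. The function name and sample hashes are hypothetical. The branch name is kept as a parameter, since git may be strict about branch names that begin with a dash and the marker might have to be spelled differently in practice.

```python
KERNEL_BRANCH = "--kernel-source-release"  # name taken from the proposal above


def advertises_kernel_release(ls_remote_heads_output, branch=KERNEL_BRANCH):
    """True if the remote lists a branch with the reserved marker name."""
    wanted = "refs/heads/" + branch
    for line in ls_remote_heads_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == wanted:
            return True
    return False


# Hypothetical sample output from a repo that follows the convention.
SAMPLE_HEADS = (
    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\trefs/heads/master\n"
    "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb\trefs/heads/--kernel-source-release\n"
)

print(advertises_kernel_release(SAMPLE_HEADS))
```

This only checks that the branch exists. Working out which tags sit on that branch needs the commit history, so it would have to happen after fetching.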

But does keeping --kernel-source-release up to date require extra effort for the maintainer? IIUC, git can simply make --kernel-source-release track "master", so if a coder's workflow is organized, he needn't make any extra effort beyond issuing a one-time command. Branch tracking is intended for remotes, but seems to support this.

Should there be other branches, like --kernel-source-unstable or --kernel-source-development? I suspect they're not needed, and any use of unstable branches should be specifically configured by the daring user.

I'm not proposing to tie Klink (much less Kernel) to git forever. But it serves so well and is so well supported that I'm not concerned.

Where to put it all?

That addressed how we can fetch code. In doing so, it put some restrictions on how we can organize the files on disk. So I should at least sketch how it could work on disk.

The easy part

Of course one would configure directories for libraries to live in. Presumably one would distinguish system, local, and user.

The hard part

A mature set of libraries tends to look like a tree. But above, we had to treat each one as a separate project.

Approaches:

Let the namespace tree exactly be the directory tree.
- Fatal con: Code repos want to "own" all their subdirectories. git will try to control everything in its working tree. It even wants submodules to covary with the working tree.
Have a dotty namespace where all the components are expressed in one name. So directories have names like "std.math.util.raw"
- Con: Yuck! There's everything wrong with this.
Reserve a name for subdirectories and .gitignore it.
- Con: Every level of subdirectory becomes two levels.
- Con: Still yuck.
Require that directories must be either inner nodes or leaves, never both. If a project directory becomes an inner node, move code in it to a directory with a special name, say "%main".
- Con: Requires moving subdirectories, possibly behind the back of other software.
Library filespace and project filespace are distinct. Stow the projects into a library tree. They live in some other directory as configured by user.
- Pro: Clean.
- Pro: Flexible.
- Pro: Lets local customizations live outside the project repo, as they should.
- Con: How is stow to know where a project's "root" in filespace is?
- Con (that the other approaches had too): Internal project filetree organization can still mess us up. What happens if its code lives in "src" instead of in "."?

I favor the "stow" approach.
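
To make the stow idea concrete, here is a small Python sketch that only plans the links and does not touch the disk. It assumes stow links individual files rather than whole directories, so that a directory in the library tree can be both an inner node and a leaf. The function name, the directory names and the `.krn` extension are all invented for the example.

```python
from pathlib import PurePosixPath


def plan_stow(library_root, prefix, source_dir, filenames):
    """Return (link, target) pairs placing a project's files under its prefix.

    library_root: root of the library tree (a real directory tree)
    prefix:       library prefix as a sequence of name components
    source_dir:   where the project's code actually lives
    filenames:    the source files to expose
    """
    dest_dir = PurePosixPath(library_root).joinpath(*prefix)
    src = PurePosixPath(source_dir)
    return [(dest_dir / name, src / name) for name in filenames]


# Two unrelated projects whose prefixes nest in the namespace.
util_links = plan_stow("/usr/local/lib/klink", ("std", "util"),
                       "/home/me/projects/util/src", ["strings.krn"])
app_links = plan_stow("/usr/local/lib/klink", ("std", "util", "my-app"),
                      "/home/me/projects/my-app", ["main.krn"])

for link, target in util_links + app_links:
    print(link, "->", target)
```

Neither repo has to know about the other, and neither working tree contains the other, which is the property the first approach could not give us.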

Path configuration

But the stow approach still left issues of where exactly to stow things. We can't solve it in the file system. That would result in one of two ugly things:

Making each project represent the entire library filespace, with its real code living at some depth below the project root.
Making each project physically live in a mirror of the target filespace. This would have all the problems we were avoiding above plus more.

So I propose per-project configuration data to tell stow about paths. I'd allow binding at least these things:

prefix: The library prefix, being a list of symbols.
parts

For example,

((prefix (std util my-app))
   (parts
      (
         (source
            [,,src,,]
            ())
         (source
            [,,tests,,]
            (tests))
         (info
            [,,doc,,]
            ())
         (default-customizations
            [,,defaults,,]
            ())
         (public-key
            [,,pub_key.asc,,]
            ()))))

That would live in a file with a reserved name, say "%kernel-paths" in the repo root. As the example implies, the contents of that file would be sexps, but it wouldn't be code as such. It'd be bindings, to be evaluated in a "sandbox" environment that supported little or no functionality. The expressions seem to be just literals, so no more is required.
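
Since the file holds only literals, a reader that never evaluates anything is enough. Below is a minimal Python sketch of such a reader. It is not Klink's reader, and it makes some assumptions that go beyond the text: paths are written as plain double-quoted strings, symbols and strings both come back as Python strings, and string escapes are not interpreted.

```python
import re

# One token: "(", ")", a double-quoted string, or a bare symbol.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"((?:[^"\\]|\\.)*)"|([^\s()"]+))')


def read_literal(text):
    """Read one s-expression as nested lists of strings. Nothing is evaluated."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("cannot read token at offset %d" % pos)
        pos = match.end()
        if match.group(1):
            tokens.append("(")
        elif match.group(2):
            tokens.append(")")
        elif match.group(3) is not None:
            tokens.append(("atom", match.group(3)))   # string literal
        else:
            tokens.append(("atom", match.group(4)))   # symbol

    def read(i):
        if i >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[i]
        if token == "(":
            items, i = [], i + 1
            while i < len(tokens) and tokens[i] != ")":
                item, i = read(i)
                items.append(item)
            if i >= len(tokens):
                raise ValueError("missing )")
            return items, i + 1
        if token == ")":
            raise ValueError("unexpected )")
        return token[1], i + 1

    value, end = read(0)
    if end != len(tokens):
        raise ValueError("trailing data after expression")
    return value


def load_paths(text):
    """Return (prefix, parts) from the text of a paths file."""
    bindings = {name: value for name, value in read_literal(text)}
    return bindings["prefix"], bindings["parts"]


# A cut-down version of the example, with paths as quoted strings.
SAMPLE = '''
((prefix (std util my-app))
 (parts
   ((source "src" ())
    (source "tests" (tests)))))
'''

prefix, parts = load_paths(SAMPLE)
print(prefix, parts)
```

Because the reader has no notion of application or lookup, a hostile paths file can at worst be malformed. It cannot run anything.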

Dependencies and version identity

Surfeit of ways to express version identity

There are a number of ways to indicate versions. All have their strengths and weaknesses:

ID hash
- Automatic
- Unique
- Says nothing about stability and features
Release timestamp
- Easily made automatic
- Time ordered
- Nearly unique, but can mess up.
- Says nothing about stability and features
Version major.minor.patch
- Just a little work
- Expresses stability
- Expresses time order, but can be messed up.
Test-satisfaction
- Lots of work
- Almost unused
- Automatically expresses stability and features
- No good convention for communicating the nature of tests
`stable', `unstable', `release', `current'.
- Expresses only stability and currency
By named sub-features
- Just a little work
- Expresses dependencies neatly
- Expressive
- Not automatic

I chose sub-feature names, based on how well that works for Emacs libraries, a real stress test. That is, I chose for code to express dependencies in a form like:

(require (li bra ry name) (feature-1 feature-2))

Co-ordinating sub-features with version identity

The other forms of version identity still exist as useful data: ID hash, version tags, results of tests. What makes sense to me is to translate them into sets of provided features. Do this somewhere between the repository and the require statement. require would still just see sets of features.

Desiderata for this translation:

Shouldn't be too much work for the developer.
- Probably easiest to support automatic rules and allow particular exceptions. With a git-flow workflow, this could almost be automatic. As soon as a feature branch is merged into "master", that version and later versions would be deemed to have a feature of that name.
Should be expressible at multiple points in the pipeline, at least:
- Annotations in the source code itself
- In the repo (In case the source code annotations had to be corrected)
- Stand-alone indexes of library identities. Such indexes would be libraries in their own right. Presumably they'd also record other version-relevant attributes such as signature and URL.
- Locally by user
Should be derivable from many types of data, at least:
- Branches (eg, everything on "master" branch has the feature stable)
- Tag text (eg, all versions after (2 3 3) provide foo-feature)
- Tag signature (eg, check it against a public key, possibly found in the repo)
- Source code annotations (eg, after coding foo-feature, write (provide-features ear lier fea tures foo-feature))
- Tests (eg, annotate foo-feature's (sub)test suite to indicate that passing it all means foo-feature is provided)
- ID
  - To express specific exceptions (eg, ID af84925ebdaf4 does not provide works)
  - To potentially compile a mapping from ID to features
- Upstream data. Eg, the bundles of library identities might largely collect and filter data from the libraries
Should be potentially independent of library's presence, so it can be consulted before fetching a version of a library.
Should potentially bundle groups of features under single names, to let require statements require them concisely.
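
Here is one way the translation could look, as a Python sketch and not as a design. It covers just three of the data sources listed above: branches, tag text read as a version tuple, and ID-specific exceptions. The `Release` record, the rule tables and the choice to grant `works` from the "master" branch are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    commit_id: str        # ID hash
    version: tuple        # parsed from tag text, e.g. (2, 3, 4)
    branches: frozenset   # names of branches that contain this release


def derive_features(release, branch_rules, version_rules, id_exceptions):
    """Translate version identity into the set of features a release provides."""
    features = set()
    # Branch rules: everything on a given branch has certain features.
    for branch, granted in branch_rules.items():
        if branch in release.branches:
            features |= set(granted)
    # Tag rules: all versions strictly after some version provide a feature.
    for after_version, granted in version_rules:
        if release.version > after_version:
            features |= set(granted)
    # ID exceptions are applied last so they can take features away.
    features -= set(id_exceptions.get(release.commit_id, ()))
    return features


BRANCH_RULES = {"master": {"stable", "works"}}
VERSION_RULES = [((2, 3, 3), {"foo-feature"})]
ID_EXCEPTIONS = {"af84925ebdaf4": {"works"}}

bad = Release("af84925ebdaf4", (2, 3, 4), frozenset({"master"}))
print(sorted(derive_features(bad, BRANCH_RULES, VERSION_RULES, ID_EXCEPTIONS)))
```

The rule tables are plain data, so they could come from any of the points in the pipeline mentioned above and be consulted before a library is fetched.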

Dependencies

With sub-features, we don't even need Scheme's modest treatment of dependencies, at least not in require. Instead, we could avoid bad versions by indicating that they lack a feature, or possibly possess a negative feature.

The usual configuration might implicitly require:

works
stable
trusted-source
all-tests-passed

The set of implicitly required features must be configurable by the user, eg for a developer to work on unstable branches.
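
A minimal sketch of the check itself, in Python. The implicit set is the list above; the function names, the `forbidden` parameter for negative features and the sample version names are hypothetical.

```python
DEFAULT_IMPLICIT = frozenset(
    {"works", "stable", "trusted-source", "all-tests-passed"})


def satisfies(provided, required, implicit=DEFAULT_IMPLICIT, forbidden=()):
    """True if a version's features meet a require form plus the implicit set."""
    provided = set(provided)
    needed = set(required) | set(implicit)
    return needed <= provided and not (set(forbidden) & provided)


def pick_version(candidates, required, implicit=DEFAULT_IMPLICIT):
    """Return the first acceptable version id from (id, features) pairs.

    Candidates are assumed to be listed in order of preference.
    """
    for version_id, provided in candidates:
        if satisfies(provided, required, implicit):
            return version_id
    return None


CANDIDATES = [
    ("v2.4-dev", {"works", "feature-1", "feature-2"}),
    ("v2.3", {"works", "stable", "trusted-source",
              "all-tests-passed", "feature-1"}),
]

# A normal user gets the stable version; a developer who empties the
# implicit set can reach the unstable one.
print(pick_version(CANDIDATES, {"feature-1"}))
print(pick_version(CANDIDATES, {"feature-1", "feature-2"}, implicit={"works"}))
```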

Library namespace conventions

On the whole, I like the CPAN namespace conventions. I'd like to suggest these additional (sub-)library-naming conventions:

raw

This interface provides "raw" functionality that favors regular operation and controllability over guessing intentions.

dwim

This interface provides "dwim" functionality that tries to do what is probably meant.

test

This sub-library contains tests for the library immediately enclosing it

testhelp

This sub-library contains code that helps test libraries that use the library immediately enclosing it. In particular, it should provide instances of objects the library builds or operates on for test purposes.

interaction

This library has no functionality per se; it combines one or more functional libraries with an interface (keybindings, menus, or w/e). This is intended to encourage separation of concerns.

inside-out

This library is young and has not yet been organized into a well-behaved namespace with parts. It can have sub-libraries, and their names should evolve to mirror the overall library organization so that it can become a real library.

(inside-out new-app)

user

This user is providing a library that doesn't yet have an official "home" in the namespace. The second component is a unique user-name.

(user tehom-blog/blogspot.com inside-out new-app)
(user tehom-blog/blogspot.com std utility new-util)

Other roots

But there are other files that are relevant to libraries but are not source code. ISTM there's often a benefit in locating them relative to a given library. But they shouldn't have an identity in the library tree itself:

They're not source
Libraries mustn't accidentally use their identifier as the name of a normal source file.

So I would define other trees exactly paralleling the library tree. I would include at least:

docs

Organized documentation

info

Documentation specifically as info files

customizations

For customizations relating to a library. Always local.

(customizations std util my-app)

Format : List of bindings

default-customizations

For the recurring idiom where one initially configures with defaults, and user changes to them are preserved.

Format : List of lists, each being in the format (name value first-appeared-in-version)
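
A small Python sketch of how the two formats might combine. The reading of first-appeared-in-version used here is my guess for the sake of the example: it lets an upgrade report which defaults are new since the version the user last saw. The setting names and version tuples are invented.

```python
def effective_settings(defaults, user_bindings):
    """Start from the defaults, then let the user's bindings override them.

    defaults:      list of (name, value, first_appeared_in_version)
    user_bindings: list of (name, value), as in the customizations tree
    """
    settings = {name: value for name, value, _since in defaults}
    settings.update(dict(user_bindings))
    return settings


def new_defaults_since(defaults, last_seen_version):
    """Names of defaults that first appeared after last_seen_version."""
    return [name for name, _value, since in defaults
            if since > last_seen_version]


DEFAULTS = [
    ("indent-width", 2, (1, 0, 0)),
    ("use-colors", True, (1, 2, 0)),
]
USER = [("indent-width", 4)]

print(effective_settings(DEFAULTS, USER))
print(new_defaults_since(DEFAULTS, (1, 1, 0)))
```

The user's file is never rewritten, so changes to it survive any number of upgrades.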

files

General non-source files supporting a library. In particular, in relation to tests or examples.

18 October 2011
