Klink Libraries
Previously
I blogged some ideas about library paths for Klink, the Kernel implementation I wrote. I listed several desiderata, based on lessons from the past. I also blogged about how I'd like to bridge the gap between a package as used by a developer and a package as used by a user.
Additional desiderata
- (The desiderata from the previous blog post, plus:)
- Should co-operate well with development. Switching from use to development shouldn't require gross changes or compete with library support.
- Can fetch libraries automatically with reasonable power and control
  - In particular, automatable enough to support "remote autoloading" but ultimately should be under the user's control.
- Support clean packaging
Fetching libraries: mostly lean on git
The well-loved version manager git provides most of what I'd want, out of the box:
- Co-operates well with development (More than co-operates, that's what it's usually for)
- Reasonably compact for non-development. You can clone a repo with depth=1
- Fetching
  - Via URL (git protocol or otherwise)
  - Doesn't treat URLs as sexps - only a mild problem.
- Finding out what's there to be fetched, in the sense of available versions (eg, looking for latest stable release): git ls-remote --tags URL
  - But we have to distinguish version tags from other tags, which AIUI don't refer to versions.
- Secure digital signatures are easy
  - Creating them: git tag -s
  - Verifying them: git verify-tag (or git tag -v)
  - Some pages about them:
Excluding local customizations from being updated
-
This is possible with
.gitignore
and some care - But customizations will live somewhere else entirely (See below)
-
This is possible with
- Practices supporting stable releases. git-flow (code and practices) does this.
- NOT a well-behaved heterogenerated tree of libraries.
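To make the git operations listed above concrete, here is a rough Python sketch (Python only because it is convenient for illustration, not because Klink would use it). The helper names are hypothetical, and the "vMAJOR.MINOR.PATCH" tag pattern is my assumption about how version tags might be told apart from other tags. The command builders only construct argument lists; nothing touches the network.

```python
import re

def shallow_clone_cmd(url, dest, branch=None):
    """Build the argv for a depth-1 clone (compact, non-development use)."""
    cmd = ["git", "clone", "--depth", "1"]
    if branch is not None:
        cmd += ["--branch", branch]
    return cmd + [url, dest]

def list_tags_cmd(url):
    """Build the argv that lists a remote's tags without cloning it."""
    return ["git", "ls-remote", "--tags", url]

def verify_tag_cmd(tag):
    """Build the argv that checks a signed tag's signature."""
    return ["git", "verify-tag", tag]

def parse_ls_remote_tags(output):
    """Map tag name -> object id from `git ls-remote --tags` output.

    An annotated tag is listed twice; the peeled `^{}` line carries the
    commit id, so it takes precedence over the tag-object id.
    """
    tags = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].startswith("refs/tags/"):
            continue
        sha, name = parts[0], parts[1][len("refs/tags/"):]
        if name.endswith("^{}"):
            tags[name[:-3]] = sha
        else:
            tags.setdefault(name, sha)
    return tags

# Assumed convention: only tags shaped like v1.2.3 (or 1.2.3) name versions.
VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")

def version_tags(tags):
    """Return sorted ((major, minor, patch), tag_name) pairs for version-like tags."""
    found = []
    for name in tags:
        m = VERSION_RE.match(name)
        if m:
            found.append((tuple(int(g) for g in m.groups()), name))
    return sorted(found)
```

The last element of version_tags(...) would be the newest version on offer; tags such as "nightly" are simply ignored.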
Of course git does not support knowing that a repo is intended as Kernel code. Looking at filename extensions does, but that seems to require fetching the repo first. For the same reason, it can't easily be any file that "lives" in the repo. It should be something about the repo itself.
So the convention I propose is that the presence of a branch named --kernel-source-release indicates a branch of stable Kernel code.
Tags on that branch would indicate available versions, so even if
coders are working informally and doing unstable work on "master",
only tagged versions would be offered.
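A small sketch of how a fetcher might check for that convention before cloning anything, assuming it has already captured the text printed by `git ls-remote --heads URL`. The function names are hypothetical, and the sketch only compares strings; it does not check whether git itself accepts that exact branch name.

```python
RELEASE_BRANCH = "--kernel-source-release"  # branch name proposed in the text

def advertised_branches(ls_remote_heads_output):
    """Return the set of branch names found in `git ls-remote --heads` output."""
    names = set()
    for line in ls_remote_heads_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].startswith("refs/heads/"):
            names.add(parts[1][len("refs/heads/"):])
    return names

def offers_kernel_code(ls_remote_heads_output):
    """True if the remote advertises the stable-Kernel-code branch."""
    return RELEASE_BRANCH in advertised_branches(ls_remote_heads_output)
```

Working out which tags actually sit on that branch would still need the repo's history, so this check only answers "is this repo meant as Kernel code at all?".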
But does keeping --kernel-source-release up to date require extra effort for the maintainer? IIUC, git can simply make --kernel-source-release track "master", so if a coder's workflow is organized, he needn't make any extra effort beyond issuing a one-time command. Branch tracking is intended for remotes, but seems to support this.
Should there be other branches, like --kernel-source-unstable or --kernel-source-development? I suspect they're not needed, and any use of unstable branches should be specifically configured by the daring user.
I'm not proposing to tie Klink (much less Kernel) to git forever. But git serves so well and is so well supported that I'm not concerned.
Where to put it all?
That addressed how we can fetch code. In doing so, it put some restrictions on how we can organize the files on disk. So I should at least sketch how it could work on disk.
The easy part
Of course one would configure directories for libraries to live in. Presumably one would distinguish system, local, and user.
The hard part
A mature set of libraries tends to look like a tree. But above, we had to treat each one as a separate project.
Approaches:
- Let the namespace tree exactly be the directory tree.
  - Fatal con: Code repos want to "own" all their subdirectories. git will try to control everything in its working tree. It even wants submodules to covary with the working tree.
- Have a dotty namespace where all the components are expressed in one name. So directories have names like "std.math.util.raw"
  - Con: Yuck! There's everything wrong with this.
- Reserve a name for subdirectories and .gitignore it.
  - Con: Every level of subdirectory becomes two levels.
  - Con: Still yuck.
- Require that directories must be either inner nodes or leaves, never both. If a project directory becomes an inner node, move code in it to a directory with a special name, say "%main".
  - Con: Requires moving subdirectories, possibly behind the back of other software.
- Library filespace and project filespace are distinct. Stow the projects into a library tree. They live in some other directory as configured by user.
  - Pro: Clean.
  - Pro: Flexible.
  - Pro: Lets local customizations live outside the project repo, as they should.
  - Con: How is stow to know where a project's "root" in filespace is?
  - Con (that the other approaches had too): Internal project filetree organization can still mess us up. What happens if its code lives in "src" instead of in "."?
I favor the "stow" approach.
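Here is a minimal sketch of what stowing could amount to, assuming symbolic links are acceptable and that the caller already knows both the project's source directory and its library prefix. The function name and argument layout are hypothetical.

```python
import os

def stow(project_src, library_root, prefix):
    """Symlink a project's source directory into the library tree.

    project_src  : directory holding the project's code (e.g. REPO/src)
    library_root : root of the library filespace
    prefix       : non-empty library name as components,
                   e.g. ("std", "util", "my-app")
    Returns the path of the link that was created.
    """
    target = os.path.join(library_root, *prefix)
    # Make sure the inner nodes (std/util/...) exist as real directories.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    if os.path.islink(target):
        os.remove(target)  # re-stowing replaces an earlier link
    elif os.path.exists(target):
        raise FileExistsError(f"{target} exists and is not a stow link")
    os.symlink(os.path.abspath(project_src), target, target_is_directory=True)
    return target
```

Because the project is linked as a whole directory, the repo keeps full control of its own working tree while the library tree stays a separate namespace. A library that must be both a leaf and an inner node would need per-file links instead, which this sketch does not attempt.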
Path configuration
But the stow approach still left issues of where exactly to stow things. We can't solve it in the file system. That would result in one of two ugly things:
- Making each project represent the entire library filespace, with its real code living at some depth below the project root.
- Making each project physically live in a mirror of the target filespace. This would have all the problems we were avoiding above plus more.
So I propose per-project configuration data to tell stow about paths. I'd allow binding at least these things:
- prefix: The library prefix, being a list of symbols.
- parts: List of sub-parts, each being a list, being:
  - Type of file as far as library-ing is concerned (source, docs, info, default-customizations, others to be defined. See below).
  - File name or directory relative to repo root
    - Or alternatively a URL, eg for a wiki or mailing list associated with the library.
  - Library name relative to library prefix
For example,
((prefix (std util my-app))
 (parts
  ((source "src" ())
   (source "tests" (tests))
   (info "doc" ())
   (default-customizations "defaults" ())
   (public-key "pub_key.asc" ()))))
That would live in a file with a reserved name, say "%kernel-paths" in the repo root. As the example implies, the contents of that file would be sexps, but it wouldn't be code as such. It'd be bindings, to be evaluated in a "sandbox" environment that supported little or no functionality. The expressions seem to be just literals, so no more is required.
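Since the file holds only literals, reading it needs no evaluator at all. The sketch below is a tiny literal-only reader in Python, under a few assumptions of mine: file names are written as double-quoted strings without escapes, everything else is a symbol, and the file contains a single list of (name value) bindings. All names in it are hypothetical.

```python
import re

class Symbol(str):
    """A symbol, kept distinct from string literals by its type."""

# One token: a paren, a double-quoted string (no escapes), or a symbol.
TOKEN = re.compile(r'\s*(?:([()])|"([^"]*)"|([^\s()"]+))')

def tokenize(text):
    """Split text into (kind, value) pairs; kind is paren, string or symbol."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad token at offset {pos}")
        paren, string, symbol = m.groups()
        if paren:
            tokens.append(("paren", paren))
        elif string is not None:
            tokens.append(("string", string))
        else:
            tokens.append(("symbol", symbol))
        pos = m.end()
    return tokens

def parse(tokens):
    """Parse tokens into nested Python lists; expects exactly one datum."""
    def datum(i):
        kind, value = tokens[i]
        if kind == "paren":
            if value == ")":
                raise SyntaxError("unexpected )")
            items, i = [], i + 1
            while i < len(tokens) and tokens[i] != ("paren", ")"):
                item, i = datum(i)
                items.append(item)
            if i >= len(tokens):
                raise SyntaxError("missing )")
            return items, i + 1
        return (Symbol(value) if kind == "symbol" else value), i + 1
    result, end = datum(0)
    if end != len(tokens):
        raise SyntaxError("trailing data")
    return result

def load_paths(text):
    """Read a list of (name value) bindings into a dict keyed by name."""
    return {str(name): value for name, value in parse(tokenize(text))}
```

With the example above, load_paths would give a dict whose "prefix" entry is the list of symbols std, util, my-app and whose "parts" entry is a list of three-element lists. Nothing is ever evaluated, which is about as sandboxed as it gets.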
Dependencies and version identity
Surfeit of ways to express version identity
There are a number of ways to indicate versions. All have their strengths:
- ID hash
  - Automatic
  - Unique
  - Says nothing about stability and features
- Release timestamp
  - Easily made automatic
  - Time ordered
  - Nearly unique, but can mess up.
  - Says nothing about stability and features
- Version major.minor.patch
  - Just a little work
  - Expresses stability
  - Expresses time order, but can be messed up.
- Test-satisfaction
  - Lots of work
  - Almost unused
  - Automatically expresses stability and features
  - No good convention for communicating the nature of tests
- `stable', `unstable', `release', `current'.
  - Expresses only stability and currency
- By named sub-features
  - Just a little work
  - Expresses dependencies neatly
  - Expressive
  - Not automatic
I chose sub-feature names, based on how well that works for Emacs libraries, a real stress test. That is, I chose for code to express dependencies in a form like:
(require (li bra ry name) (feature-1 feature-2))
Co-ordinating sub-features with version identity
The other forms of version identity still exist as useful data: ID hash, version tags, results of tests. What makes sense to me is to translate them into sets of provided features. Do this somewhere between the repository and the require statement. require would still just see sets of features.
Desiderata for this translation:
- Shouldn't be too much work for the developer.
  - Probably easiest to support automatic rules and allow particular exceptions. With a git-flow workflow, this could almost be automatic. As soon as a feature branch is merged into "master", that version and later versions would be deemed to have a feature of that name.
- Should be expressible at multiple points in the pipeline, at least:
  - Annotations in the source code itself
  - In the repo (in case the source code annotations had to be corrected)
  - Stand-alone indexes of library identities. Such indexes would be libraries in their own right. Presumably they'd also record other version-relevant attributes such as signature and URL.
  - Locally by user
- Should be derivable from many types of data, at least:
  - Branches (eg, everything on "master" branch has the feature stable)
  - Tag text (eg, all versions after (2 3 3) provide foo-feature)
  - Tag signature (eg, check it against a public key, possibly found in the repo)
  - Source code annotations (eg, after coding foo-feature, write (provide-features ear lier fea tures foo-feature))
  - Tests (eg, annotate foo-feature's (sub)test suite to indicate that passing it all means foo-feature is provided)
  - ID
    - To express specific exceptions (eg, ID af84925ebdaf4 does not provide works)
    - To potentially compile a mapping from ID to features
  - Upstream data. Eg, the bundles of library identities might largely collect and filter data from the libraries
- Should be potentially independent of library's presence, so it can be consulted before fetching a version of a library.
- Should potentially bundle groups of features under single names, to let require statements require them concisely.
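To show how a few of these data sources might combine, here is a hedged Python sketch of such a translation. The shape of the rules table is entirely my assumption: branch rules, "provided after tag" rules, per-ID exceptions, and named bundles. It needs only data about a version, not the library's files, so it could run before fetching.

```python
def derive_features(version, rules):
    """Translate version data into a set of provided features.

    version: {"id": str, "branches": set of str, "tag": tuple or None}
    rules:   {"branch_features": {branch: [feature, ...]},
              "after":           {feature: (major, minor, patch)},
              "exceptions":      {id: [feature not provided, ...]},
              "bundles":         {bundle_name: [feature, ...]}}
    """
    features = set()
    # Branch rules, eg everything on "master" has the feature stable.
    for branch in version["branches"]:
        features.update(rules["branch_features"].get(branch, ()))
    # Tag rules, eg all versions after (2 3 3) provide foo-feature.
    tag = version.get("tag")
    if tag is not None:
        for feature, threshold in rules["after"].items():
            if tag > threshold:
                features.add(feature)
    # Specific exceptions keyed by ID remove features again.
    features -= set(rules["exceptions"].get(version["id"], ()))
    # A bundle name is provided when all of its members are.
    for bundle, members in rules["bundles"].items():
        if set(members) <= features:
            features.add(bundle)
    return features
```

The order matters a little: exceptions are applied before bundles, so a version that is known not to provide a member feature does not get the bundle either.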
Dependencies
With sub-features, we don't even need Scheme's modest treatment of dependencies, at least not in require. Instead, we could avoid bad versions by indicating that they lack a feature, or possibly possess a negative feature.
The usual configuration might implicitly require:
- works
- stable
- trusted-source
- all-tests-passed
The set of implicitly required features must be configurable by the user, eg for a developer to work on unstable branches.
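A sketch of how require might then pick a version, assuming each candidate version arrives with its set of provided features already computed. The function names, the candidate format, and the negative feature name "known-bad" used in the test data are hypothetical.

```python
# The implicit requirements listed in the text; user-configurable.
DEFAULT_REQUIRED = frozenset(
    {"works", "stable", "trusted-source", "all-tests-passed"})

def acceptable(provided, required, implicit=DEFAULT_REQUIRED, forbidden=()):
    """True if a version provides every required and implicit feature
    and possesses none of the forbidden (negative) features."""
    provided = set(provided)
    needed = set(required) | set(implicit)
    return needed <= provided and not (provided & set(forbidden))

def pick_version(candidates, required, implicit=DEFAULT_REQUIRED, forbidden=()):
    """candidates: list of (version_key, provided_features).
    Returns the greatest acceptable version_key, or None."""
    ok = [key for key, feats in candidates
          if acceptable(feats, required, implicit, forbidden)]
    return max(ok) if ok else None
```

A developer working on unstable branches would pass a smaller implicit set, eg just works, and so see versions that the usual configuration hides.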
Library namespace conventions
On the whole, I like the CPAN namespace conventions. I'd like to suggest these additional (sub-)library-naming conventions:
- raw: This interface provides "raw" functionality that favors regular operation and controllability over guessing intentions.
- dwim: This interface provides "dwim" functionality that tries to do what is probably meant.
- test: This sub-library contains tests for the library immediately enclosing it
- testhelp: This sub-library contains code that helps test libraries that use the library immediately enclosing it. In particular, it should provide instances of objects the library builds or operates on for test purposes.
- interaction: This library has no functionality per se; it combines one or more functional libraries with an interface (keybindings, menus, or w/e). This is intended to encourage separation of concerns.
- inside-out: This library is young and has not yet been organized into a well-behaved namespace with parts. It can have sub-libraries, and their names should evolve to mirror the overall library organization so that it can become a real library. Example: (inside-out new-app)
- user: This user is providing a library that doesn't yet have an official "home" in the namespace. The second component is a unique user-name. Examples:
  (user tehom-blog/blogspot.com inside-out new-app)
  (user tehom-blog/blogspot.com std utility new-util)
Other roots
But there are other files that are relevant to libraries but are not source code. ISTM there's often a benefit in locating them relative to a given library. But they shouldn't have an identity in the library tree itself:
- They're not source
- Libraries mustn't accidentally use their identifier as the name of a normal source file.
So I would define other trees exactly paralleling the library tree. I would include at least:
- docs: Organized documentation
- info: Documentation specifically as info files
- customizations: For customizations relating to a library. Always local. Example: (customizations std util my-app)
  - Format: List of bindings
- default-customizations: For the recurring idiom where one initially configures with defaults, and user changes to them are preserved.
  - Format: List of lists, each being in the format (name value first-appeared-in-version)
- files: General non-source files supporting a library. In particular, in relation to test or examples.
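As a sketch of how the parallel trees and the defaults idiom could fit together: the first helper maps a library name into one of the parallel roots, and the other two combine default-customizations with the user's local customizations. The helper names are hypothetical, and using first-appeared-in-version to report defaults the user has not seen yet is my guess at what that field is for.

```python
import os

def parallel_path(base, tree, library):
    """Path of a library's entry in a parallel tree such as
    "docs", "info", "customizations" or "files"."""
    return os.path.join(base, tree, *library)

def effective_settings(defaults, customizations):
    """Overlay the user's bindings on the library's defaults.

    defaults       : list of (name, value, first_appeared_in_version)
    customizations : dict of name -> value (the user's local bindings)
    User values always win, so updating the defaults never clobbers them.
    """
    settings = {name: value for name, value, _since in defaults}
    settings.update(customizations)
    return settings

def new_since(defaults, last_seen_version):
    """Names of defaults that first appeared after the version the user
    last configured against (assumed use of the third field)."""
    return [name for name, _value, since in defaults
            if since > last_seen_version]
```

For example, parallel_path(base, "customizations", ("std", "util", "my-app")) gives the place where the bindings for (customizations std util my-app) would live, well outside the project repo.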