18 October 2011

Thoughts on the structure of Klink Libraries

Klink Libraries

Previously

I blogged some ideas about library paths for Klink, the Kernel implementation I wrote. I listed several desiderata, based on lessons from the past. I also blogged about how I'd like to bridge the gap between a package as used by its developer and a package as used by its user.

Additional desiderata

  • (The desiderata from the previous blog post, plus:)
  • Should co-operate well with development. Switching from use to development shouldn't require gross changes or compete with library support.
  • Can fetch libraries automatically with reasonable power and control
    • In particular, automatable enough to support "remote autoloading" but ultimately should be under the user's control.
  • Support clean packaging

Fetching libraries: mostly lean on git

The well-loved version manager git provides most of what I'd want, out of the box:

  • Co-operates well with development (More than co-operates, that's what it's usually for)
  • Reasonably compact for non-development. You can clone a repo with depth=1
  • Fetching
    • Via URL (git protocol or otherwise)
    • Doesn't treat URLs as sexps - only a mild problem.
  • Finding out what's there to be fetched, in the sense of available versions (eg, looking for latest stable release)
    git ls-remote --tags URL 
    
    • But we have to distinguish version tags from other tags, which AIUI needn't refer to versions.
  • Secure digital signatures are easy
  • Excluding local customizations from being updated
    • This is possible with .gitignore and some care
    • But customizations will live somewhere else entirely (See below)
  • Practices supporting stable releases. git-flow (code and practices) does this.
  • NOT a well-behaved heterogenerated tree of libraries.

Of course git does not support knowing that a repo is intended as Kernel code. Looking at filename extensions would reveal it, but that seems to require fetching the repo first. For the same reason, the marker can't easily be any file that "lives" in the repo. It should be something about the repo itself.

So the convention I propose is that the presence of a branch named --kernel-source-release indicates a branch of stable Kernel code. Tags on that branch would indicate available versions, so even if coders are working informally and doing unstable work on "master", only tagged versions would be offered.
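
To make the convention concrete, here is a rough Python sketch of how a fetcher might decide, from `git ls-remote` output alone, whether a repo offers Kernel releases. The function names and the sample hashes are made up for illustration, and the branch name is a parameter so a different spelling can be substituted. Note that `ls-remote` lists every tag in the repo, so confirming that a tag actually lies on the release branch would need a further step that this sketch leaves out.

```python
RELEASE_BRANCH = "--kernel-source-release"  # branch name proposed in the text


def parse_ls_remote(output):
    """Split `git ls-remote` output into a set of branch names and a set of tag names."""
    branches, tags = set(), set()
    for line in output.splitlines():
        parts = line.split(None, 1)  # each line is "<hash><TAB><ref>"
        if len(parts) != 2:
            continue
        ref = parts[1].strip()
        if ref.startswith("refs/heads/"):
            branches.add(ref[len("refs/heads/"):])
        elif ref.startswith("refs/tags/"):
            name = ref[len("refs/tags/"):]
            # Annotated tags are also listed in peeled form, "name^{}"; fold them.
            if name.endswith("^{}"):
                name = name[:-3]
            tags.add(name)
    return branches, tags


def offered_versions(output, release_branch=RELEASE_BRANCH):
    """Return the sorted tag names if the release branch is advertised, else None.

    Assumption: every tag is treated as a candidate version; checking that a
    tag is reachable from the release branch is not attempted here.
    """
    branches, tags = parse_ls_remote(output)
    if release_branch not in branches:
        return None
    return sorted(tags)


# Hypothetical ls-remote output, with invented hashes.
SAMPLE = (
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/heads/--kernel-source-release\n"
    "3333333333333333333333333333333333333333\trefs/tags/1.0.0\n"
    "4444444444444444444444444444444444444444\trefs/tags/1.0.0^{}\n"
    "5555555555555555555555555555555555555555\trefs/tags/1.1.0\n"
)

if __name__ == "__main__":
    print(offered_versions(SAMPLE))
```

The text to parse would come from something like `git ls-remote --heads --tags URL`, which means the check can run before anything is cloned.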

But does keeping --kernel-source-release up to date require extra effort for the maintainer? IIUC, git can simply make --kernel-source-release track "master", so if a coder's workflow is organized, he needn't make any extra effort beyond issuing a one-time command. Branch tracking is intended for remotes, but seems to support this.

Should there be other branches, like --kernel-source-unstable or --kernel-source-development? I suspect they're not needed, and any use of unstable branches should be specifically configured by the daring user.

I'm not proposing to tie Klink (much less Kernel) to git forever. But git serves so well and is so well supported that I'm not concerned.

Where to put it all?

That addressed how we can fetch code. In doing so, it put some restrictions on how we can organize the files on disk. So I should at least sketch how it could work on disk.

The easy part

Of course one would configure directories for libraries to live in. Presumably one would distinguish system, local, and user.

Path configuration

But the stow approach still left issues of where exactly to stow things. We can't solve it in the file system. That would result in one of two ugly things:

  • Making each project represent the entire library filespace, with its real code living at some depth below the project root.
  • Making each project physically live in a mirror of the target filespace. This would have all the problems we were avoiding above plus more.

So I propose per-project configuration data to tell stow about paths. I'd allow binding at least these things:

prefix
The library prefix, being a list of symbols.
parts
List of sub-parts, each being a list, being:

For example,

((prefix (std util my-app))
   (parts
      (
         (source
            "src"
            ())
         (source
            "tests"
            (tests))
         (info
            "doc"
            ())
         (default-customizations
            "defaults"
            ())
         (public-key
            "pub_key.asc"
            ()))))

That would live in a file with a reserved name, say "%kernel-paths" in the repo root. As the example implies, the contents of that file would be sexps, but it wouldn't be code as such. It'd be bindings, to be evaluated in a "sandbox" environment that supported little or no functionality. The expressions seem to be just literals, so no more is required.
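
Since the file holds only literals, a loader needn't evaluate anything at all. The following Python sketch shows one way a tool outside Klink might read such a file: a tiny reader that accepts only lists, double-quoted strings, and symbols, and rejects everything else. The names `read_literal`, `read_paths`, and `Symbol` are invented here, and the sketch assumes path strings are written in ordinary double quotes without escapes.

```python
import re

# One token: "(", ")", a double-quoted string (no escapes), or a bare symbol.
TOKEN = re.compile(
    r'\s*(?:(?P<open>\()|(?P<close>\))|"(?P<string>[^"]*)"|(?P<symbol>[^\s()"]+))'
)


class Symbol(str):
    """A bare symbol, kept distinct from a quoted string."""


def tokenize(text):
    """Return (kind, value) pairs, failing loudly on anything unreadable."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot read input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def read_literal(text):
    """Read one s-expression built only from lists, strings, and symbols.

    Nothing is evaluated, so the file cannot run code.
    """
    tokens = tokenize(text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        kind, value = tokens[pos]
        pos += 1
        if kind == "open":
            items = []
            while pos < len(tokens) and tokens[pos][0] != "close":
                items.append(parse())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # step over the ")"
            return items
        if kind == "close":
            raise ValueError("unexpected closing parenthesis")
        return value if kind == "string" else Symbol(value)

    result = parse()
    if pos != len(tokens):
        raise ValueError("trailing data after the first expression")
    return result


def read_paths(text):
    """Turn the list of (name value) bindings into a dict keyed by name."""
    return {str(name): value for name, value in read_literal(text)}


# A shortened version of the example bindings.
SAMPLE = '''
((prefix (std util my-app))
 (parts ((source "src" ())
         (source "tests" (tests)))))
'''

if __name__ == "__main__":
    print(read_paths(SAMPLE))
```

Keeping symbols and strings as different types lets later code tell a sub-library name like `tests` from a directory name like "tests".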

Dependencies and version identity

Surfeit of ways to express version identity

There are a number of ways to indicate versions. All have their strengths:

  • ID hash
    • Automatic
    • Unique
    • Says nothing about stability and features
  • Release timestamp
    • Easily made automatic
    • Time ordered
    • Nearly unique, but can mess up.
    • Says nothing about stability and features
  • Version major.minor.patch
    • Just a little work
    • Expresses stability
    • Expresses time order, but can be messed up.
  • Test-satisfaction
    • Lots of work
    • Almost unused
    • Automatically expresses stability and features
    • No good convention for communicating the nature of tests
  • `stable', `unstable', `release', `current'.
    • Expresses only stability and currency
  • By named sub-features
    • Just a little work
    • Expresses dependencies neatly
    • Expressive
    • Not automatic

I chose sub-feature names, based on how well that works for emacs libraries, a real stress test. That is, I chose for code to express dependencies in a form like:

(require (li bra ry name) (feature-1 feature-2))

Co-ordinating sub-features with version identity

The other forms of version identity still exist as useful data: ID hash, version tags, results of tests. What makes sense to me is to translate them into sets of provided features. Do this somewhere between the repository and the require statement. require would still just see sets of features.

Desiderata for this translation:

  • Shouldn't be too much work for the developer.
    • Probably easiest to support automatic rules and allow particular exceptions. With a git-flow workflow, this could almost be automatic. As soon as a feature branch is merged into "master", that version and later versions would be deemed to have a feature of that name.
  • Should be expressible at multiple points in the pipeline, at least:
    • Annotations in the source code itself
    • In the repo (In case the source code annotations had to be corrected)
    • Stand-alone indexes of library identities. Such indexes would be libraries in their own right. Presumably they'd also record other version-relevant attributes such as signature and URL.
    • Locally by user
  • Should be derivable from many types of data, at least:
    • Branches (eg, everything on "master" branch has the feature stable)
    • Tag text (eg, all versions after (2 3 3) provide foo-feature)
    • Tag signature (eg, check it against a public key, possibly found in the repo)
    • Source code annotations (eg, after coding foo-feature, write (provide-features ear lier fea tures foo-feature))
    • Tests (eg, annotate foo-feature's (sub)test suite to indicate that passing it all means foo-feature is provided)
    • ID
      • To express specific exceptions (eg, ID af84925ebdaf4 does not provide works)
      • To potentially compile a mapping from ID to features
    • Upstream data. Eg, the bundles of library identities might largely collect and filter data from the libraries
  • Should be potentially independent of library's presence, so it can be consulted before fetching a version of a library.
  • Should potentially bundle groups of features under single names, to let require statements require them concisely.
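
A rough Python sketch of such a translation, assuming a rule is just a predicate on a version paired with the feature it grants, and that exceptions are keyed by ID. The `Version` record, `provided_features`, and the blanket rule granting `works` are all invented for illustration; only the three sample facts (the "master" rule, the (2 3 3) rule, and the af84925ebdaf4 exception) come from the examples in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """What is known about one version without fetching its source."""
    commit_id: str
    tag: tuple = ()                    # e.g. (2, 3, 3); empty if untagged
    branches: frozenset = frozenset()  # branches that contain this version


def provided_features(version, rules, exceptions=None):
    """Apply every rule, then withdraw features named in per-ID exceptions."""
    features = {feature for applies, feature in rules if applies(version)}
    withdrawn = (exceptions or {}).get(version.commit_id, set())
    return features - withdrawn


RULES = [
    # Everything on the "master" branch has the feature `stable`.
    (lambda v: "master" in v.branches, "stable"),
    # All versions after (2 3 3) provide `foo-feature`.
    (lambda v: v.tag > (2, 3, 3), "foo-feature"),
    # Hypothetical default: a version is presumed to work unless excepted.
    (lambda v: True, "works"),
]

# ID af84925ebdaf4 does not provide `works`.
EXCEPTIONS = {"af84925ebdaf4": {"works"}}

if __name__ == "__main__":
    bad = Version("af84925ebdaf4", (2, 4, 0), frozenset({"master"}))
    print(sorted(provided_features(bad, RULES, EXCEPTIONS)))
```

Because the rules need only the ID, tags, and branches, they can be consulted before any version of the library has been fetched.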

Dependencies

With sub-features, we don't even need Scheme's modest treatment of dependencies, at least not in require. Instead, we could avoid bad versions by indicating that they lack a feature, or possibly possess a negative feature.

The usual configuration might implicitly require:

  • works
  • stable
  • trusted-source
  • all-tests-passed

The set of implicitly required features must be configurable by the user, eg for a developer to work on unstable branches.
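
In set terms, a version is acceptable when the explicitly and implicitly required features together form a subset of what it provides. A minimal Python sketch, with hypothetical function names and invented candidate data:

```python
# The usual implicit requirements listed above.
DEFAULT_IMPLICIT = frozenset(
    {"works", "stable", "trusted-source", "all-tests-passed"}
)


def acceptable(provided, required, implicit=DEFAULT_IMPLICIT):
    """True if the provided features cover both explicit and implicit needs."""
    return (set(required) | set(implicit)) <= set(provided)


def pick_version(candidates, required, implicit=DEFAULT_IMPLICIT):
    """Return the first acceptable version ID, or None if there is none.

    `candidates` is a list of (version_id, features) pairs, ordered by
    preference (here, newest first).
    """
    for version_id, features in candidates:
        if acceptable(features, required, implicit):
            return version_id
    return None


# Invented data: v3 is newest but neither stable nor fully tested.
CANDIDATES = [
    ("v3", {"works", "foo-feature"}),
    ("v2", {"works", "stable", "trusted-source", "all-tests-passed", "foo-feature"}),
    ("v1", {"works", "stable", "trusted-source", "all-tests-passed"}),
]

if __name__ == "__main__":
    print(pick_version(CANDIDATES, {"foo-feature"}))
    print(pick_version(CANDIDATES, {"foo-feature"}, implicit={"works"}))
```

With the default implicit set the request for `foo-feature` resolves to v2. A developer who relaxes the implicit set to just `works` gets v3 instead, without any change to the require statement itself.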

Library namespace conventions

On the whole, I like the CPAN namespace conventions. I'd like to suggest these additional (sub-)library-naming conventions:

raw
This interface provides "raw" functionality that favors regular operation and controllability over guessing intentions.
dwim
This interface provides "dwim" functionality that tries to do what is probably meant.
test
This sub-library contains tests for the library immediately enclosing it
testhelp
This sub-library contains code that helps test libraries that use the library immediately enclosing it. In particular, it should provide instances of objects the library builds or operates on for test purposes.
interaction
This library has no functionality per se; it combines one or more functional libraries with an interface (keybindings, menus, or w/e). This is intended to encourage separation of concerns.
inside-out
This library is young and has not yet been organized into a well-behaved namespace with parts. It can have sub-libraries, and their names should evolve to mirror the overall library organization so that it can become a real library.
(inside-out new-app)
user
This user is providing a library that doesn't yet have an official "home" in the namespace. The second component is a unique user-name.
(user tehom-blog/blogspot.com inside-out new-app)
(user tehom-blog/blogspot.com std utility new-util)
