## 18 October 2011

### Previously

I blogged some ideas about library paths for Klink, the Kernel implementation I wrote. I listed several desiderata, based on lessons from the past. I also blogged about how I'd like to bridge the gap between package as used by developer and package as used by user.

• (The desiderata from the previous blog post, plus:)
• Should co-operate well with development. Switching from use to development shouldn't require gross changes or compete with library support.
• Can fetch libraries automatically with reasonable power and control
• In particular, automatable enough to support "remote autoloading" but ultimately should be under the user's control.
• Support clean packaging

## Fetching libraries: mostly lean on git

The well-loved version control system git provides most of what I'd want, out of the box:

• Co-operates well with development (More than co-operates, that's what it's usually for)
• Reasonably compact for non-development. You can clone a repo with --depth=1
• Fetching
• Via URL (git protocol or otherwise)
• Doesn't treat URLs as sexps - only a mild problem.
• Finding out what's there to be fetched, in the sense of available versions (eg, looking for latest stable release)
git ls-remote --tags URL

• But we have to distinguish version tags from other tags, which AIUI don't refer to versions.
• Secure digital signatures are easy
• Creating them
git tag -s

• Verifying them
git verify-tag

• Excluding local customizations from being updated
• This is possible with .gitignore and some care
• But customizations will live somewhere else entirely (See below)
• Practices supporting stable releases. git-flow (code and practices) does this.
• NOT a well-behaved heterogeneous tree of libraries.

Of course git does not support knowing that a repo is intended as Kernel code. Looking at filename extensions would, but that seems to require fetching the repo first. For the same reason, it can't easily be any file that "lives" in the repo. It should be something about the repo itself.

So the convention I propose is that the presence of a branch named --kernel-source-release indicates a branch of stable Kernel code. Tags on that branch would indicate available versions, so even if coders are working informally and doing unstable work on "master", only tagged versions would be offered.
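
A client could then probe a repo for this convention without cloning it, along these lines (URL is a placeholder):

git ls-remote URL refs/heads/--kernel-source-release
git ls-remote --tags URL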

But does keeping --kernel-source-release up to date require extra effort for the maintainer? IIUC, git can simply make --kernel-source-release track "master", so if a coder's workflow is organized, he needn't make any extra effort beyond issuing a one-time command. Branch tracking is intended for remotes, but seems to support this.
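
And even if tracking doesn't cover it, re-pointing the branch at release time would be just one command per release, something like (a sketch; the extra -- guards the leading dashes in the branch name from git's option parsing):

git branch -f -- --kernel-source-release master
git push origin refs/heads/--kernel-source-release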

Should there be other branches, like --kernel-source-unstable or --kernel-source-development? I suspect they're not needed, and any use of unstable branches should be specifically configured by the daring user.

I'm not proposing to permanently tie Klink (much less Kernel) specifically to git forever. But it serves so well and is so well supported that I'm not concerned.

## Where to put it all?

That addressed how we can fetch code. In doing so, it put some restrictions on how we can organize the files on disk. So I should at least sketch how it could work on disk.

### The easy part

Of course one would configure directories for libraries to live in. Presumably one would distinguish system, local, and user.

### Path configuration

But the stow approach still left issues of where exactly to stow things. We can't solve this purely in the file system; that would result in one of two ugly things:

• Making each project represent the entire library filespace, with its real code living at some depth below the project root.
• Making each project physically live in a mirror of the target filespace. This would have all the problems we were avoiding above plus more.

So I propose per-project configuration data to tell stow about paths. I'd allow binding at least these things:

prefix
The library prefix, being a list of symbols.
parts
List of sub-parts, each being a list of: the part's type (eg source or info), its path relative to the repo root, and a list of extra name components to append to the prefix.

For example,

((prefix (std util my-app))
 (parts
  ((source "src" ())
   (source "tests" (tests))
   (info "doc" ())
   (default-customizations "defaults" ())
   (public-key "pub_key.asc" ()))))

That would live in a file with a reserved name, say "%kernel-paths" in the repo root. As the example implies, the contents of that file would be sexps, but it wouldn't be code as such. It'd be bindings, to be evaluated in a "sandbox" environment that supported little or no functionality. The expressions seem to be just literals, so no more is required.

## Dependencies and version identity

### Surfeit of ways to express version identity

There are a number of ways to indicate versions. All have their strengths:

• ID hash
• Automatic
• Unique
• Says nothing about stability and features
• Release timestamp
• Time ordered
• Nearly unique, but can be messed up.
• Says nothing about stability and features
• Version major.minor.patch
• Just a little work
• Expresses stability
• Expresses time order, but can be messed up.
• Test-satisfaction
• Lots of work
• Almost unused
• Automatically expresses stability and features
• No good convention for communicating the nature of tests
• Status names: stable, unstable, release, current.
• Expresses only stability and currency
• By named sub-features
• Just a little work
• Expresses dependencies neatly
• Expressive
• Not automatic

I chose sub-feature names, based on how well that works for emacs libraries, a real stress test. That is, I'd have code express dependencies in a form like:

(require (li bra ry name) (feature-1 feature-2))


### Co-ordinating sub-features with version identity

The other forms of version identity still exist as useful data: ID hash, version tags, results of tests. What makes sense to me is to translate them into sets of provided features. Do this somewhere between the repository and the require statement. require would still just see sets of features.

Desiderata for this translation:

• Shouldn't be too much work for the developer.
• Probably easiest to support automatic rules and allow particular exceptions. With a git-flow workflow, this could almost be automatic. As soon as a feature branch is merged into "master", that version and later versions would be deemed to have a feature of that name.
• Should be expressible at multiple points in the pipeline, at least:
• Annotations in the source code itself
• In the repo (In case the source code annotations had to be corrected)
• Stand-alone indexes of library identities. Such indexes would be libraries in their own right. Presumably they'd also record other version-relevant attributes such as signature and URL.
• Locally by user
• Should be derivable from many types of data (a sketch of such rules follows this list), at least:
• Branches (eg, everything on "master" branch has the feature stable)
• Tag text (eg, all versions after (2 3 3) provide foo-feature)
• Tag signature (eg, check it against a public key, possibly found in the repo)
• Source code annotations (eg, after coding foo-feature, write (provide-features ear lier fea tures foo-feature))
• Tests (eg, annotate foo-feature's (sub)test suite to indicate that passing it all means foo-feature is provided)
• ID
• To express specific exceptions (eg, ID af84925ebdaf4 does not provide works)
• To potentially compile a mapping from ID to features
• Upstream data. Eg, the bundles of library identities might largely collect and filter data from the libraries
• Should be potentially independent of library's presence, so it can be consulted before fetching a version of a library.
• Should potentially bundle groups of features under single names, to let require statements require them concisely.
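
To make that concrete, such translation rules might themselves be sexps, along these lines. This is a hypothetical sketch; the rule forms are invented for illustration:

((branch "master" (stable))
 (tag-after (2 3 3) (foo-feature))
 (tag-signed-by "pub_key.asc" (trusted-source))
 (id "af84925ebdaf4" (not works)))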

### Dependencies

With sub-features, we don't even need Scheme's modest treatment of dependencies, at least not in require. Instead, we could avoid bad versions by indicating that they lack a feature, or possibly possess a negative feature.

The usual configuration might implicitly require:

• works
• stable
• trusted-source
• all-tests-passed

The set of implicitly required features must be configurable by the user, eg for a developer to work on unstable branches.
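
For instance, something like (set-implicit-features is an invented name):

(set-implicit-features (works trusted-source))

That would drop stable and all-tests-passed, eg so that unstable branches can be loaded.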

## Library namespace conventions

On the whole, I like the CPAN namespace conventions. I'd like to suggest these additional (sub-)library-naming conventions:

raw
This interface provides "raw" functionality that favors regular operation and controllability over guessing intentions.
dwim
This interface provides "dwim" functionality that tries to do what is probably meant.
test
This sub-library contains tests for the library immediately enclosing it
testhelp
This sub-library contains code that helps test libraries that use the library immediately enclosing it. In particular, it should provide instances of objects the library builds or operates on for test purposes.
interaction
This library has no functionality per se; it combines one or more functional libraries with an interface (keybindings, menus, or whatever). This is intended to encourage separation of concerns.
inside-out
This library is young and has not yet been organized into a well-behaved namespace with parts. It can have sub-libraries, and their names should evolve to mirror the overall library organization so that it can become a real library.
(inside-out new-app)

user
This user is providing a library that doesn't yet have an official "home" in the namespace. The second component is a unique user-name.
(user tehom-blog/blogspot.com inside-out new-app)
(user tehom-blog/blogspot.com std utility new-util)


## Mutability and Signals

Recently I've been working on Rosegarden, the music sequencer. It uses Qt which uses signals.

Signals implement the Observer pattern, where an object notifies "observers" via signals. A signal is connected to one or more "slots" in other objects. The slots are basically normal methods, except they return nothing (void). When a signal is emitted, Qt arranges for the slots to be called, other than those of deleted objects. So far, I find it works easily and elegantly.

This made me wonder: Could signals take the place of mutability in Scheme? And might that give us both referential transparency and reactiveness simultaneously?

There's not much support for signals in Lisp and Scheme. There's Cells, but it seemed to be conceived of as just a super-spreadsheet. I want to go much further and use signals to re-imagine the basics of object mutation.

## Quasi-mutation: The general idea

Basically, we'd use signals between constant objects to fake mutability.

• Objects can't mutate.
• "Slots" are closures.
• Signals are emitted with respect to particular objects.
• Not "by objects". Again, we're not object-oriented. We're just indexing on objects.
• Ability to emit is controlled by access to objects and signal types.
• Indexing on one particular argument seems overly special, so I contemplate indexing on any relevant arguments. This is again similar to generic functions.
• Signals can be connected to slots.
• The signals go to where the object's respective signal is connected. They are indexed on objects.
• Constructors connect the signal replaced from the parts to the constructed object.
• More precisely, to a closure that knows the object.
• The closure would fully represent the objects' relation. For instance, mutable pairs might have the slots new-car and new-cdr with the obvious meanings.
• But not for immutable objects. Immutable objects' slots would not be new-car and new-cdr; they would raise an error.
• The constructed object can access its part objects, in appropriate ways by its own lights. For instance, a pair object could retrieve its car and cdr objects.
• This particular signal replaced is not exposed.
• The details of replaced will be refined below.
• Slots such as new-car will typically:
• Construct a near-copy of the object, with the new part in the old part's place. This effectively connects a new version of the object to the new part and disconnects it from the old part.
• Emit replaced with respect to the object, propagating the change.
• "Normal" setters such as set-car! emit replaced wrt the old object with the new object as value.
• That's just the classical way. There's plenty of room to do clever new things with signals.
• As above, doing this to immutable objects causes an error.
• Constructed objects behaving this way would include at least:
• Mutable pairs (and therefore mutable lists and trees)
• Environments
• Continuations. While not often considered object-like, continuations have parts such as the current environment and their parent continuations.
• External-world-ish objects such as ports react to signals in their own appropriate way, not necessarily propagating them further.
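
To make the shape of this concrete, here's a minimal sketch in plain Scheme (not Kernel). connect!, emit-replaced, and new-car are names I'm inventing here, and a real implementation would index connections on object identity rather than on a global alist:

(define connections '())                ; alist of (object . slot-closure)

(define (connect! obj slot)             ; connect SLOT to replaced wrt OBJ
  (set! connections (cons (cons obj slot) connections)))

(define (emit-replaced obj new)         ; emit replaced wrt OBJ
  (for-each (lambda (entry)
              (if (eq? (car entry) obj)
                  ((cdr entry) new)))
            connections))

(define (new-car pair new-part)         ; slot behavior: near-copy, then propagate
  (let ((copy (cons new-part (cdr pair))))
    (emit-replaced pair copy)           ; tell containers the pair was "replaced"
    copy))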

## As if

Literally implementing every pair with at least two signals between itself and its car and cdr seems prohibitive if not impossible. Physically, it couldn't be the only mechanism of mutation. So I'm talking about a mechanism that acts as if it's continuous down to basics like pairs and lists, but really uses a more modest mechanism where it can (presumably containment, as now).

## Object identity

Above, I said "constructed object can access a part object", not "a part's value". Since objects no longer ever change value, the difference is subtle. It's just this: the object has a single set of signal connections. So it has a single identity. So there is a trace of object identity remaining.

One could represent identity value-wise by saying that values consist of a (classical) value and an "object-identity" value, and that object-identity values are opaque and unique except as shared by this mechanism. So signals are connected with respect to object-identity values.

It has a flavor of "the long way around", but it lets us treat objects entirely as values.

### Object versioning

Different versions of an object have different Object-IDs. In a conventional imperative setting this distinction couldn't arise, since two simultaneous references to the same object can't have different values. But here, one can both mutate an object as far as the world is concerned and hold onto its old value. But it should never be the case that objects are eq? but not equal?. So different versions have different Object-IDs.

### The equality predicates

Equality:

equal?
is what it normally is in Scheme or Kernel: The objects have the same current value.
eq?
A and B are eq? just if
• (equal? identity-of-a identity-of-b)
=?
is just an optimization of equal?
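
In terms of that value-plus-identity picture, a toy sketch (the representation and names are purely illustrative):

(define (make-obj value id) (cons value id))   ; a value plus an opaque identity
(define (obj-value o) (car o))
(define (obj-id o) (cdr o))

(define (my-equal? a b) (equal? (obj-value a) (obj-value b)))   ; same current value
(define (my-eq? a b) (equal? (obj-id a) (obj-id b)))            ; same identity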

## Signals and immutability

I mentioned that I was thinking about immutability in regard to this. So far, I've just described how to duplicate mutability with signals.

For immutable objects, some or all slots would still get connections, but would raise an error instead of propagating mutation.

But that's only half of the control we should have over mutation. We'd also like to guarantee that certain evaluations don't mutate even mutable objects they have access to, eg their arguments, the environment, and dynamic variables. The "foo!" convention indicates this (negatively) but doesn't enforce anything. "foo!" convention notwithstanding, we'd like to guarantee this from outside an arbitrary call, not from inside trusted combiners.

### Blocking signals

So we'd like to sometimes block signals. If signals were emitted anyway, they'd be errors and would not reach their destinations. So if replaced is blocked, code either doesn't try to mutate objects, or tries and raises an error. Either is consistent with immutability.

ISTM the simplest way to block signals is to disconnect their existing connections and connect them to error combiners. When the call is done, restore their original connections. However, that doesn't play well with asynchronous execution.

Instead, we'll make a copy of the original object that will (probably lazily) infect its parts with "don't mutate me in this scope".

### Scope

For a traditional imperative language, where flow of control and scope are structurally the same, we could block signals in specific scopes, recursively. But for Scheme and Kernel, that won't suffice. What would happen if an object is passed to a continuation and mutated there? We've broken the guarantee that the object wouldn't be mutated. Any time we let objects be passed abnormally, this can happen.

We might try to:

1. raise error if affected objects are passed to continuation applications, or
2. "infect" the other scope with the signal restrictions.

Neither is appealing. In this mechanism, continuing normally is also passing to a less restrictive scope. And continuing normally should behave about the same way as continuing abnormally to the same destination. We also don't want error returns to permanently "freeze" objects.

So ISTM we must distinguish between continuing to a (not necessarily proper) parent of the restricting scope (normally or otherwise) and continuing elsewhere. Signal blocks are removed just if control reaches a parent. This is essentially how Kernel guards reckon continuation parentage.

### Doing this for all object parts

We'd usually want to say that no part of any argument to a combiner can be mutated. It's easy enough to treat signal connections from the root argobject. But we need to "infect" the whole object with immutability, and not infect local objects, which may well be temporary and legitimately mutable.

Since these arguments are ultimately derived from the root argobject, what we can do is arrange for accessors to give immutable objects in their turn. But they have to be only temporarily immutable - blocked, as we said above. And we'd prefer to manage it lazily.

So I propose that accessing a blocked object gives only objects:

• whose replaced signals are re-routed to error, as above, until control "escapes normally" (reaches a not-necessarily-proper parent continuation)
• which are in turn blocked, meaning they have the same property infecting all of their accessors.
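
A toy sketch of that lazy infection in plain Scheme (block, blocked?, and access are invented names):

(define (block obj) (list 'blocked obj))          ; wrap OBJ as temporarily immutable
(define (blocked? o) (and (pair? o) (eq? (car o) 'blocked)))

(define (access obj accessor)
  (if (blocked? obj)
      (block (accessor (cadr obj)))   ; parts of a blocked object come out blocked
      (accessor obj)))

So (access blocked-pair car) yields a blocked car, and so on recursively; the mutating slots of a blocked object would meanwhile be routed to error.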

#### Non-pair containers

Non-pair containers are not treated by mechanisms like copy-es-immutable. But we want to treat immutability fully, so they have to be treated too. This is the case even for:

• Environments
• Encapsulation types. In this case, their sole accessor is required to do this, as all accessors are.
• Ports. Administrative state or not, they can be immutable.
• Closures. Anything they return is considered accessed and automatically gets this blockingness. Their internal parts (static environment, etc) need not be blocked.
• Continuations. Like closures, what they return is considered accessed and automatically gets this blockingness.

#### Exemptions

We'd like to be able to exempt particular objects from this. Some combiners mutate an argument but shouldn't mutate anything else. There'd probably be a signal-block spec that would specify this.

### Blocking signals to keyed dynamic objects

We can easily extend the above to deal with dynamic environment, but keyed dynamic objects are not so simple. Their accessors would be covered by the above if derived from the argobject or the dynamic environment, but they need not be.

So we need an additional rule: Keyed dynamic objects are blocked if accessed in the dynamic scope of the blocking. That's recursive like other blockings. Keyed rebindings in the dynamic scope aren't, because one might bind a temporary that's legitimately mutable.

Side note: I'd like to see a stylistic convention to differentiate between combiners that mutate their arguments ("foo!") and combiners that mutate something else, meaning either their dynamic environment or a dynamic variable (Say "foo!!")

## Interface to be defined

I've already written a lot, so I'll leave this part as just a sketch. To support all this, we need to define an interface.

• A means of defining new signal types
• Returns a signal emitter
• Returns a signal identifier for connect to use (see the sketch after this list).
• A signal scheduler. By general principles, the user should be able to use his own, at least for exposed signals. It's not too hard to write one with closures and continuations.
• Means for the user to emit signals
• Means for the user to connect and disconnect signals
• Not exposed for many built-in objects such as pairs.
• "capture all current", as for blocking
• "disconnect all"
• block-all-signals: connects all signals to error continuation
• Possible argument except
• (Maybe) block-signal: connects given signal to error continuation
• A "dirty-flag" mechanism, often useful with signals.
• Possibly a priority mechanism.
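
To give the flavor of the first few items, a toy sketch in plain Scheme (all names invented):

(define signal-connections '())   ; alist: ((signal-id . object) . slot)

(define (connect signal-id obj slot)
  (set! signal-connections
        (cons (cons (cons signal-id obj) slot) signal-connections)))

(define (emit signal-id obj . args)
  (for-each (lambda (entry)
              (if (and (eq? (car (car entry)) signal-id)
                       (eq? (cdr (car entry)) obj))
                  (apply (cdr entry) args)))
            signal-connections))

(define (define-signal-type name)     ; returns (emitter . signal-identifier)
  (let ((id (list name)))             ; a fresh list is a unique identity
    (cons (lambda (obj . args) (apply emit id obj args))
          id)))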

## Some possibilities this raises

• Use the signal scheduling mechanism for managing constraints. Once we can enforce immutability, constraints become very tempting.
• Provide general signals
• Provide an interface to Lone-COWs, a sort of copy-on-write object optimized for usually being uniquely referenced/owned.
• Supplying "out-of-band" signals to a debugger or similar. They really do need to be out-of-band.
• Provide a broadly applicable interface for redo/undo. It could basically just capture historical copies of objects.

## Previously

Klink is my stand-alone implementation of Kernel by John Shutt.

## Types of immutability

There are several types of immutability used or contemplated in Klink.

Complete immutability
Not much to be said.
Pair immutability
A pair that forever holds the same two objects, though those objects' contents may change.
List immutability
For instance, a list where you can change the contents, but not the number of elements.
Recursive structural immutability
An immutable tree of mutable non-pair objects.
Eg, what ports typically have. When you read or write to a port, it remains "the same thing" administratively.

### Complete immutability

Available in C, used in some places such as symbols and strings. AFAICT there's no way to specify its use or non-use in Kernel; some objects just have it because of what they are or how they are created.

### Recursive structural immutability (Tree immutability)

Supported via copy-es-immutable

No C flag for it. Like complete immutability, some objects just have it because of what type they are.

## Non-recursive structural immutability?

If you read section 4.7.2 of the Kernel report (copy-es-immutable), you may notice that Pair immutability and List immutability are actually extensions. So I figured I should at least advance a rationale for them.

Is non-recursive immutability worth supporting? ISTM it's already strongly suggested by Kernel.

Implied by an informal type
Some combiners take finite lists as arguments; all applicatives require a finite list as argobject. That distinguishes the finite list as at least an informal type. There's a predicate for it, finite-list?, but pairs that "are" finite lists can "become" other sorts of lists (dotted or circular), so it falls short of being a formal type. This seems like an irregularity to me. Structural immutability would solve it.
Implied by implied algorithms
For some combiners (eg reduce), any practical algorithm seems to require doing sub-operations on counted lists. That implies a structurally immutable list, because otherwise the count could theoretically become wrong; in practice it's saved from this by writing the algorithm carefully. So there are at least ephemeral, implied structurally immutable lists present.
Vectors
John once told me that he would eventually like to have vectors in Kernel. Vectors are optimized structurally immutable finite lists.
Opportunity for optimization
There's an opportunity for other optimizations where list immutability is used and recognized. In particular, caching a list's element count is often nice if one can be sure the list count won't change.
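
For instance (names invented), the count becomes safe to cache exactly because the structure can't change:

(define (make-counted lst) (cons (length lst) lst))   ; count once at construction
(define (counted-length c) (car c))                   ; O(1) thereafter; sound only
                                                      ; if the structure is immutable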

## Comparison table

| Type of immutability | Pair | List | Tree |
|---|---|---|---|
| Special care needed for shared objects? | No | Only own tail | Yes |