Tehom: 2011

24 December 2011

Trade Logic - what mechanisms would a system need?

The mechanisms that a Trade Logic system would need

Previously

I introduced and specified Trade Logic, and posted a fair bit about it.

Now

Here's another post that's mostly a bullet-point list. In this post I try to mostly nail down all the mechanisms TL needs in order to function as intended. Some of it, notably the treatment of selector bids, is still only partly defined.

The mechanisms that a TL system needs

Pre-market
- Means to add definitions (understood to include selectors and bettable issues)
  - Initiated by a user
  - A definition is only accepted if it completely satisfies the typechecks
  - Definitions that are accepted persist.
In active market
- Holdings
  - A user's holdings are persistent except as modified by the other mechanisms here.
- Trading
  - Initiated by a user
  - User specifies:
    - What to trade
    - Buy or sell
    - Price
    - How much
    - How long the order is to persist.
  - A trade is accepted just if:
    - The user holds that amount or more of what he's selling.
    - It can be met from booked orders
    - It is to be retained as a booked order
  - Booked orders persist until either:
    - Acted on
    - Cancelled
    - Timed out
- Issue conversion
  - Initiated by a user
  - User specifies:
    - What to convert from
    - What to convert to
    - How much
  - A trade is accepted just if:
    - It's licensed by one of the conversion rules
    - User has sufficient holdings to convert from
Selectors - where the system meets the outside world. Largely TBD.
- As part of each selector issue, there is a defined mechanism that can be invoked.
  - Its actions when invoked are to:
    - Present itself to the outside world as invoked.
      - Also present its invocation parameters.
      - But not reveal whether it was invoked for settlement (randomly) or as challenge (user-selected values)
    - Be expected to act in the outside world:
      - Make a unique selection
      - Further query that unique result. This implies that selector issues are parameterized on the query, but that's still TBD.
      - Translate the result(s) of the query into language the system understands.
      - Present itself to the system as completed.
    - Accept from the real world the result(s) of that query.
      - How it is described to the system is TBD. Possibly the format it is described in is yet another parameter of a selector.
  - How an invocable mechanisms is described to the system is still to be decided.
  - (NB, a selector's acceptability is checked by the challenge mechanism below. Presumably those selectors that are found viable will have been defined in such a way as to be randomly inspectable etc; that's beyond my scope in this spec)
- Challenge mechanism to settle selector bets
  - Initiated by challengers
    - (What's available to pro-selector users is TBD, but probably just uses the betting mechanism above)
  - Challenger specifies:
    - The value that the choice input stream should have. It is basically a stream of bits, but:
      - Where split-random-stream is encountered, it will split and the challenger will specify both branches.
      - Challenger can specify that from some point on, random bits are used.
    - Probably some form of good-faith bond or bet on the outcome
  - The selector is invoked with the given values.
  - Some mechanism queries whether exactly one individual was returned (partly undecided).
  - The challenge bet is settled accordingly.
- There may possibly also be provision for multi-turn challenges ("I bet you can't specify X such that I can't specify f(X)") but that's to be decided.
Settling (Usually post-market)
- Public fair random-bit generating
  - Initiated as selectors need it.
  - For selectors, some interface for retrieving bits just as needed, so that we may use pick01 etc without physically requiring infinite bits.
    - This interface should not reveal whether it's providing random bits or bits given in a challenge.
- Mechanism to invoke selectors wrt a given bet:
  - It decides:
    - Whether to settle it at the current time.
    - If applicable, "how much" to settle it, for statistical bets that admit partial settlement
  - Probably a function of:
    - The cost of invoking its selectors
    - Random input
    - An ancillary market judging the value of settlement
    - Logic that relates the settlement of convertible bets. (To be decided, probably something like shares of the ancillary markets of convertible bets being in turn convertible)
  - Being activated, it in fact invokes particular selectors with fresh random values, querying them.
  - In accordance with the results of the query:
    - Shares of one side of the bet are converted to units
    - Shares of the other side are converted to zeros.
    - But there will also be allowance for partial results:
      - Partial settlement
      - Sliding payoffs

Trade Logic pick-boolean

Previously

I initially said that the basic choice operator in TL was pick01, which outputs a random scalar between 0 and 1.

What's wrong with `pick01` as basic

It occurs to me that picking a random scalar is not the most basic thing we could do.

It also occurred to me that pick01 as defined needs to provide infinite precision, which of course isn't physically realizable.

It couldn't be sensibly defined to generate both exact 0 and exact 1.

And the fact that I forget whether its interval was left-closed or right-closed suggests that there's a problem there.

So `pick-boolean`

So from infinite to minimal: pick-boolean is analogous to pick01, except that it outputs a bit, not a scalar.

But it shouldn't be unary

The temptation is to make pick-boolean unary. But if we did, it wouldn't be like other unary predicates. When two instances of these pick-boolean have the same inputs (ie, none), the outputs are not neccessarily the same. This lack of referential transparency potentially infects the outputs of any predicate that uses pick-boolean. This would restrict and complicate the conversions we are allowed to make.

So instead, let's provide a system random stream. A "random output" predicate such as pick-boolean will actually be ternary; its args will be:

Input bitstream
Output bitstream
Output random

This has the nice property that it gives us an upper bound on a proposition's sample space: It's the difference in position between the input stream and the output stream. Of course, that is not always predetermined.

We said earlier that bettable propositions had no arguments, and we need these to be bettable, so we have to revise slightly: A definition is bettable if its only inputs and outputs are system random streams, which will be a type.

There's another way this treatment is helpful: Sometimes we want to sample to settle issues (population bets). Such issues can sample by using the random stream argument, which feeds any selectors that are involved.

Random stream arguments also help validate selectors by challenge - the challenger specifies the bitstream. In this one case, the output bitstream is useful. It gives the various random calls an ordering, otherwise the challenger couldn't specify the bitstream.

Stream splitting

But sometimes a single stream doesn't work nicely. A predicate like pick01 may need to use arbitrarily many bits to generate arbitrary precision. So what can it do? It shouldn't consume the entire stream for itself, and it shouldn't break off just a finite piece and pass the rest on.

So let's provide split-random-stream. It inputs a random stream and outputs two random streams. We'll provide some notation for a challenger to split his specified stream accordingly.

Can't be built by users

We will provide no way to build an object of this new type from first principles. Users can get two of them from one by using split-random-stream, but can't get one from none. So the only way to have one is to get it from an input. Ultimately any proposition that uses a random stream must get it from system mechanisms of proposition resolution such as sampling and challenging.

New pieces

So let's define these built-ins:

Type: random-stream, the type of the system's random stream. No built-in predicate has any mode that outputs this type but doesn't also input it.
split-random-stream, ternary predicate that inputs a random stream and outputs two random streams.
pick-boolean, a ternary predicate outputting a bit.
Type: lazy scalar. A lazy scalar can only be compared with given precision, not compared absolutely. So any comparison predicates that take this also manage precision, possibly by inputting an argument giving the required precision.
pick01, ternary, outputs a lazy scalar.
Convenience
- Maybe pick-positive-rational, similar but picking binarily from a Stern-Brocot tree.
- Maybe pick-from-bell-curve, similar but picking lazily from the standard bell curve.

09 December 2011

Kernel WYSIWYG Digraphs

Kernel WYSIWYG rooted digraphs

Previously

I said in comments that I view backquote as a WYSIWYG tree ctor, analogous to `list'.

That's not fully general

But backquote doesn't suffice to construct the most general form of open structure of pairs, a rooted digraph. Those require the ability to share objects between multiple parents.

Things that don't do the job

One could write shared objects by naming objects in a `let' form and including them by name. That's not WYSIWYG, though. WYSIWYG would at least present the objects in place once, as #N# / #N= reader notation does.

But that's in the reader. It's not available to general code.

What to do

One could:

Enclose the whole construction in an empty `let'.
Provide a combiner expanding to ($sequence ($define! x DEF) x)

Signals, continuations, and constraint-handling

Signals, continuations, constraint-handling, and more

Previously

I wrote about mutability and signals and automatically forcing promises where they conflict with required types.

Signals for Constraint-handling

Constraint-handling, and specifically the late-ordered evaluation it uses, could be treated by signals. Something like this:

Promises are our non-ground objects. They are, if you think about it, analogous to Prolog variables.
- They can be passed around like objects.
- But they're not "really" objects yet.
- For them to "succeed", there must be some value that they can take.
- That value can be determined after they are passed.
- Everything else is either ground or a container that includes one or more of them.
As I already do in Klink, use of an non-ground argument is detected where it's unacceptable.
- Generally, they're unacceptable where a type is expected (eg they're acceptable as arguments to cons)
Unlike what I do now, non-ground unacceptables send signals to a scheduler.
- This implies that each evaluation is uniquely related to (owned by) one scheduler.
- That signal passes as objects
  - the continuation as object.
  - The promise to be forced.
It is the job of the scheduler to ensure that the resulting computation completes.
- In appropriate circumstances, eg with resource-quotas, or when doing parallel searches, it is reasonable to not complete.

An analogy

Computations whose flow can't be changed are analogous to constant objects and flow-changable ones analogous to mutable ones.

Another use: Precision

This mechanism could also be useful for "dataflow" precision management. Here the objects are not exactly promises, but they are incomplete in a very precise way. They are always numbers but their precision can be improved by steps.

Combiners that require more precision could signal to their generators, which could signal back their new values, propagating back their own requirements as needed. This is very manageable.

Apply-continuation and signals can mutually define

apply-continuation, while it could be used to implement signals, can also be implemented by signals between evaluations. It signals a scheduler to schedule the given continuation and deschedule the current continuation.

Done this way, it gives a scheduler flexibility. This might appear to make analysis more difficult. But a simple (brain-dead) scheduler could do it exactly the continuation way: exactly one evaluation is "in control" at all times. This might be the default scheduler.

In both cases this functionality is a neither-fish-nor-fowl hybrid between objects and computational flow. ISTM it has to be so.

signals allow indeterminate order

One mixed blessing about signals is that since they are "broadcast" 1-to-N, it can be indeterminate what is to be evaluated in what order.

Pro: This gives schedulers flexibility.
Con: Computations can give different results.
- Continuation-based scheduling doesn't neccessarily do better. It could consult "random" to determine the order of computations.
Pro: Signals and their schedulers give us another handle to manage evaluation order dependency. A scheduler could co-operate with analysis by:
- Promising that evaluations will behave "as if" done in some particular order.
- Taking into account analysis
- Providing information that makes analysis easier, short of simulating the scheduler.
  - Probably promising that no signals escape the scheduler itself, or only a few well-behaved ones do.
Pro: Regardless of the scheduler, analysis can treat signals as a propagating fringe. Everything not reached by the fringe is unaffected.

Should signal-blockers cause copies?

I suggested earlier that combiners might promise non-mutation by blocking mutation signals out.

One way to code this is to make such combiners deep-copy their arguments. Should this situation automatically cause copies to be made? ISTM yes.

Not an issue: User confusion. There is no case where a user reasonably expects different functionality than he gets.
- He might expect a function to mutate an object and it doesn't - but why is he expecting that of a function that doesn't claim to mutate anything and moreover explicitly blocks that from happening? Bad docs could mislead him, but that's a documentation issue.
Not an issue: Performance.
- Performance is always secondary to correctness.
- Performance can be rescued by
  - copy-on-write objects
  - immutable objects
  - Analysis proving that no mutation actually occurs.

18 October 2011

Thoughts on the structure of Klink Libraries

Klink Libraries

Previously

I blogged some ideas about Library paths for Klink, the Kernel implementation I wrote. I listed several desiderata, based on lessons from the past. I also blogged about how I'd like bridge the gap between package as used by developer and package as used by user.

Additional desiderata

(The desiderata from the previous blog post, plus:)
Should co-operate well with development. Switching from use to development shouldn't require gross changes or compete with library support.
Can fetch libraries automatically with reasonable power and control
- In particular, automatable enough to support "remote autoloading" but ultimately should be under the user's control.
Support clean packaging

Fetching libraries: mostly lean on git

The well-loved version manager git provides most of what I'd want, out of the box:

Co-operates well with development (More than co-operates, that's what it's usually for)
Reasonably compact for non-development. You can clone a repo with depth=1
Fetching
- Via URL (git protocol or otherwise)
- Doesn't treat URLs as sexps - only a mild problem.
Finding out what's there to be fetched, in the sense of available versions (eg, looking for latest stable release)
```
git ls-remote --tags URL 
```
- But we have to distinguish tags and tags^{, which AIUI don't
  refer to versions.}
Secure digital signatures are easy
- Creating them
```
git tag -s
```
- Verifying them
```
git-verify
```
- Some pages about them:
  - http://learn.github.com/p/tagging.html
  - http://progit.org/book/ch2-6.html
Excluding local customizations from being updated
- This is possible with .gitignore and some care
- But customizations will live somewhere else entirely (See below)
Practices supporting stable releases. git-flow (code and practices) does this.
NOT a well-behaved heterogenerated tree of libraries.

Of course git does not support knowing that a repo is intended as Kernel code. Looking at filename extensions does, but that seems to require fetching the repo first. For the same reason, it can't easily be any file that "lives" in the repo. It should be something about the repo itself.

So the convention I propose is that the presence of a branch named --kernel-source-release indicates a branch of stable Kernel code. Tags on that branch would indicate available versions, so even if coders are working informally and doing unstable work on "master", only tagged versions would be offered.

But does keeping --kernel-source-release up to date require extra effort for the maintainer? IIUC, git can simply make --kernel-source-release track "master", so if a coder's workflow is organized, he needn't make any extra effort beyond issuing a one-time command. Branch tracking is intended for remotes, but seems to support this.

Should there be other branches, like --kernel-source-unstable or --kernel-source-development? I suspect they're not needed, and any use of unstable branches should be specifically configured by the daring user.

I'm not proposing to permanently tie Klink (much less Kernel) specifically to git forever. But it serves so well and is so well supported that I'm not concerned.

Where to put it all?

That addressed how we can fetch code. In doing so, it put some restrictions on how we can organize the files on disk. So I should at least sketch how it could work on disk.

The easy part

Of course one would configure directories for libraries to live in. Presumably one would distinguish system, local, and user.

The hard part

A mature set of libraries tends to look like a tree. But above, we had to treat each one a separate project.

Approaches:

Let the namespace tree exactly be the directory tree.
- Fatal con: Code repos want to "own" all their subdirectories. git will try to control everything in its working tree. It even wants submodules to covary with the working tree.
Have a dotty namespace where all the components are expressed in one name. So directories have names like "std.math.util.raw"
- Con: Yuck! There's everything wrong with this.
Reserve a name for subdirectories and .gitignore it.
- Con: Every level of subdirectory becomes two levels.
- Con: Still yuck.
Require that directories must be either inner nodes or leaves, never both. If a project directory becomes an inner node, move code in it to a directory with a special name, say "%main".
- Con: Requires moving subdirectories, possibly behind the back of other software.
Library filespace and project filespace are distinct. Stow the projects into a library tree. They live in some other directory as configured by user.
- Pro: Clean.
- Pro: Flexible.
- Pro: Lets local customizations live outside the project repo, as they should.
- Con: How is stow to know where a project's "root" in filespace is?
- Con (that the other approaches had too): Internal project filetree organization can still mess us up. What happens if its code lives in "src" instead of in "."?

I favor the "stow" approach.

Path configuration

But the stow approach still left issues of where exactly to stow things. We can't solve it in the file system. That would result in one of two ugly things:

Making each project represent the entire library filespace, with its real code living at some depth below the project root.
Making each project physically live in a mirror of the target filespace. This would have all the problems we were avoiding above plus more.

So I propose per-project configuration data to tell stow about paths. I'd allow binding at least these things:

prefix: The library prefix, being a list of symbols.
parts

For example,

((prefix (std util my-app))
   (parts
      (
         (source
            [,,src,,]
            ())
         (source
            [,,tests,,]
            (tests))
         (info
            [,,doc,,]
            ())
         (default-customizations
            [,,defaults,,]
            ())
         (public-key
            [,,pub_key.asc,,]
            ()))))

That would live in a file with a reserved name, say "%kernel-paths" in the repo root. As the example implies, the contents of that file would be sexps, but it wouldn't be code as such. It'd be bindings, to be evaluated in a "sandbox" environment that supported little or no functionality. The expressions seem to be just literals, so no more is required.

Dependencies and version identity

Surfeit of ways to express version identity

There are a number of ways to indicate versions. All have their strengths:

ID hash
- Automatic
- Unique
- Says nothing about stability and features
Release timestamp
- Easily made automatic
- Time ordered
- Nearly unique, but can mess up.
- Says nothing about stability and features
Version major.minor.patch
- Just a little work
- Expresses stability
- Expresses time order, but can be messed up.
Test-satisfaction
- Lots of work
- Almost unused
- Automatically expresses stability and features
- No good convention for communicating the nature of tests
`stable', `unstable', `release', `current'.
- Expresses only stability and currency
By named sub-features
- Just a little work
- Expresses dependencies neatly
- Expressive
- Not automatic

I chose sub-feature names, based on how well that works for emacs libraries, a real stress test. That is, I choose for code to express dependencies in a form like:

(require (li bra ry name) (feature-1 feature-2))

Co-ordinating sub-features with version identity

The other forms of version identity still exist as useful data: ID hash, version tags, results of tests. What makes sense to me is to translate them into sets of provided features. Do this somewhere between the repository and the require statement. require would still just see sets of features.

Desiderata for this translation:

Shouldn't be too much work for the developer.
- Probably easiest to support automatic rules and allow particular exceptions. With a git-flow workflow, this could almost be automatic. As soon as a feature branch is merged into "master", that version and later versions would be deemed to have a feature of that name.
Should be expressable at multiple points in the pipeline, at least:
- Annotations in the source code itself
- In the repo (In case the source code annotations had to be corrected)
- Stand-alone indexes of library identities. Such indexes would be libraries in their own right. Presumably they'd also record other version-relevant attributes such as signature and URL.
- Locally by user
Should be derivable from many types of data, at least:
- Branches (eg, everything on "master" branch has the feature stable)
- Tag text (eg, all versions after (2 3 3) provide foo-feature)
- Tag signature (eg, check it against a public key, possibly found in the repo)
- Source code annotations (eg, after coding foo-feature, write (provide-features ear lier fea tures foo-feature))
- Tests (eg, annotate foo-feature's (sub)test suite to indicate that passing it all means foo-feature is provided)
- ID
  - To express specific exceptions (eg, ID af84925ebdaf4 does not provide works)
  - To potentially compile a mapping from ID to features
- Upstream data. Eg, the bundles of library identities might largely collect and filter data from the libraries
Should be potentially independent of library's presence, so it can be consulted before fetching a version of a library.
Should potentially bundle groups of features under single names, to let require statements require them concisely.

Dependencies

With sub-features, we don't even need Scheme's modest treatment of dependencies, at least not in require. Instead, we could avoid bad versions by indicating that they lack a feature, or possibly possess a negative feature.

The usual configuration might implicitly require:

works
stable
trusted-source
all-tests-passed

The set of implicitly required features must be configurable by the user, eg for a developer to work on unstable branches.

Library namespace conventions

On the whole, I like the CPAN namespace conventions. I'd like to suggest these additional (sub-)library-naming conventions:

raw

This interface provides "raw" functionality that favors regular operation and controllability over guessing intentions.

dwim

This interface provides "dwim" functionality that tries to do what is probably meant.

test

This sub-library contains tests for the library immediately enclosing it

testhelp

This sub-library contains code that helps test libraries that use the library immediately enclosing it. In particular, it should provide instances of objects the library builds or operates on for test purposes.

interaction

This library has no functionality per se, it combine one or more functional libraries with an interface (keybindings, menus, or w/e). This is intended to encourage separation of concerns.

inside-out

This library is young and has not yet been organized into a well-behaved namespace with parts. It can have sub-libraries, and their names should evolve to mirror the overall library organization so that it can become a real library.

(inside-out new-app)

user

This user is providing a library that doesn't yet have an official "home" in the namespace. The second component is a unique user-name.

(user tehom-blog/blogspot.com inside-out new-app)
(user tehom-blog/blogspot.com std utility new-util)

Other roots

But there are other files that are relevant to libraries but are not source code. ISTM there's often a benefit in locating them relative to a given library. But they shouldn't have an identity in the library tree itself:

They're not source
Libraries mustn't accidentally use their identifier as the name of a normal source file.

So I would define other trees exactly paralleling the library tree. I would include at least:

docs

Organized documentation

info

Documentation specifically as info files

customizations

For customizations relating to a library. Always local.

(customizations std util my-app)

Format : List of bindings

default-customizations

For the recurring idiom where one initially configures with defaults, and user changes to them are preserved.

Format : List of lists, each being in the format (name value first-appeared-in-version)

files

General non-source files supporting a library. In particular, in relation to test or examples.

13 October 2011

Mutability And Signals

Mutability and Signals

Recently I've been working on Rosegarden, the music sequencer. It uses Qt which uses signals.

Signals implement the Observer pattern, where an object notifies "observers" via signals. A signal is connected to one or more "slots" in other objects. The slots are basically normal methods, except they return nothing (void). When a signal is emitted, Qt arranges for the slots to be called, other than those of deleted objects. So far, I find it works easily and elegantly.

This made me wonder: Could signals take the place of mutability in Scheme? And might that give us both referential transparency and reactiveness simultaneously?

There's not much support for signals for Lisp and Scheme. There's Cells, but it seemed to be conceived of as just a super-spreadsheet. I want to go much further and use signals to re-imagine the basics of object mutation.

Quasi-mutation: The general idea

Basically, we'd use signals between constant objects to fake mutability.

Objects can't mutate.

"Slots" are closures.
- Closures would typically know a target object. Possibly as environment as record.
Signals are emitted with respect to particular objects.
- Not "by objects". Again, we're not object-oriented. We're just indexing on objects.
- Ability to emit is controlled by access to objects and signal types.
- Indexing on one particular argument seems overly special, so I contemplate indexing on any relevant arguments. This is again similar to generic functions.
Signals can be connected to slots.
- The signals go to where the object's respective signal is connected. They are indexed on objects.
Constructors connect the signal replaced from the parts to the constructed object.
- More precisely, to a closure that knows the object.
- The closure would fully represent the objects' relation. For instance, mutable pairs might have the slots new-car and new-cdr with the obvious meanings.
- But not for immutable objects. Immutable objects' slots would not be new-car and new-cdr, they would raise error.
- The constructed object can access its part objects, in appropriate ways by its own lights. For instance, a pair object could retrieve its car and cdr objects.
- This particular signal replaced is not exposed.
- The details of replaced will be refined below.
Slots such as new-car will typically:
- Construct a near-copy of the object, with the new part in the old part's place. This effectively connects a new version of the object to the new part and disconnects it from the old part.
- Emit replaced with respect to the object, propagating the change.
"Normal" setters such as set-car! emit replaced wrt the old object with the new object as value.
- That's just the classical way. There's plenty of room to do clever new things with signals.
- As above, doing this to immutable objects causes error.
Constructed objects behaving this way would include at least:
- Mutable pairs (and therefore mutable lists and trees)
- Environments
- Continuations. While not often considered object-like, continuations have parts such as the current environment and their parent continuations.
External-world-ish objects such as ports react to signals in their own appropriate way, not neccessarily propagating them further.

As if

Literally implementing ever pair with at least two signals between itself and its car and cdr seems prohibitive if not impossible. Physically, it couldn't be the only mechanism of mutation. So I'm talking about a mechanism that acts as if it's continuous down to basics like pairs and lists, but really uses a more modest mechanism where it can (presumably containment, as now).

Circularity

My blog readers, being smart, will already be wondering what happens when replaced is propagated in a circular structure. Does it propagate forever, crashing the interpreter? That might go something like this: The signal propagates from pair to pair, repeatedly reaching its original location, pointlessly representing more new objects that differ only from the k*N nth cdr, and never finishing because there's always an (k+1)*N nth cdr to represent.

For instance, in the circular list above, suppose we gave object C a new value. It emits (replaced New-C C), which B receives in new-cdr. B emits (replaced New-B C) which both A and old C receive. Leaving aside A's reaction, old C then emits (replaced New-Old-C C) which old B again receives. A is going to be deluged with infinite signals, and we will keep making new copies of old B and old C. Worse, we make many new-Bs when we only have a place for one.

The too-easy answer would be to block all repetitions between the same two objects. That fails, because we re-emit on new copies. So also block repetitions where the objects are the same except that one is replaced by an updated copy?

So as a first draft, I propose that these built-in signal emissions:

Be each associated with a unique ID
Refuse to propagate that ID circularly

The ID would be an opaque ID equal? only to itself. It would be unique to a given cascade of emissions. That is, set-car! etc would make fresh IDs but new=car etc would use the ID of the replaced signal that provoked it.

One subtle point: It suffices for the signalled object itself to recognize repititions. Originally I thought repititions would have to be recognized across a transitive closure. But only the original object is connected to by its contents, and connections from new parts to new object don't receive any signal to propagate.

What about structural information?

The above is progress, but it discards structural information.

Consider an application that displays the first N items from a possibly-circular list on screen, and redisplays them just as needed (Say they are formatted in some time-consuming way and different screen locations can't share this work) The display would want to be notified of all the nth cdr occurences of a changed element. The approach above wouldn't permit that.

So when there's circularity, let's switch tactics and use polling to retrieve information as needed. Let any circular object provide a stream of all the replacements that occured in it for a given emission-ID. Applications only read the stream as far as they need to and the rest is not really generated.

But often a slot would prefer to get a simple new object, without replaying all the detailed changes.

Desiderata

To sum up the desiderata on this:

We want slots to be able to replay detailed changes.
We want slots to be able to just get a new object
We don't want mutated objects to have to guess what their target slots want, which may not even be the same for different slots.
We don't want mutated objects to have to do inordinate (even infinite) work preparing these signals.
We don't want to make assumptions about the topology of the mutated object. It can be an arbitrary directed graph.

A thought-experiment sequence of operations

The sequence of operations on the circular list above would go something like this. Here I use FOO.N to indicate the nth version of FOO.

We gave object C.1 a new car, V (we emit replaced.1 to it) Its change is:
- "Replace car with object V". We'll represent this as
  - The trivial accessor identity.
  - A promise to construct V (Trivial since we started off knowing it)
replaced.1 reaches C
We emit replaced.2 for C. Its change is:
- A disjunction (or) of
  - "Replace what you're holding with C.2"
    - identity
    - A promise to construct C.2
  - "Replace car of it with V"
    - car
    - Promise of V
replaced.2 reaches B
We emit replaced.3 for B.1. Its change is:
- A disjunction (or) of
  - "Replace this with B.2"
    - identity
    - A promise to construct B.2
  - A disjunction (or) of
    - "Replace cdr of this with C.2"
      - cdr
      - A promise to construct C.2
    - "Replace cddr of this with V"
      - cadr
      - Promise of V
replaced.3 reaches A.1
- We'll say the promises are not forced yet. If they are, that still forces these following operations to occur first.
replaced.3 also reaches C.1. Here we encounter a repitition. C.1 has already encountered a signal from this trigger. We don't want to emit another consequence of it to the same places. Instead we want to fold the signals together.

Let's stop operations and look at it

If we made replaced.4, its change would be:
- A disjunction (or) of
  - "Replace this with C.2"
    - identity
    - A promise to construct C.2
  - A disjunction (or) of
    - "Replace cdr with B.2"
      - cdr
      - A promise to construct B.2
    - A disjunction (or) of
      - "Replace cddr of this with C.2"
        
        cddr
        
        Again, a promise to construct C.2
      - "Replace cadr of this with V"
        
        caddr
        
        Promise of V

So replaced.4 resembles replaced.2. The big change is that instead of "Replace car of this with V", we have a cdr replacement. It pushes "replace with V" down to another cycle of the loop. That doesn't support the "display circular list" example, which was our motivation for doing this. So:

We need to support something that can mention both the car replacement and the cdr replacement of C.1
We need to effectively be able to adjust previous signals instead of emitting more.

I first considered and discarded an approach of rewriting the second disjunct into a conjunction.

Now the most tempting approach is to rethink the car replacement operation. It did not fully describe the object, it just described the new car, V.

Let's say that all such operations are in fact constructors, and in the case of "Replace car of it with V", we let the cdr argument default to the current cdr.

Timing issues:

The default can't be committed to until the signal and its consequents have stopped propagating.
We know this by Emission-Id.
So forcing one of these promises first causes the signal to propagate until it's done.

This sounds quite like what I would expect from a solution. Constructors "stand for" the objects they make, as is often the case in RT languages. We are getting a structure that's isomorphic to the original object (circular list of period 2). "Just mutate contents" remains a valid optimization as long as we're in a closed system (And a tempting one; I have to keep reminding myself that we're conceptually replacing mutation with signals so it won't do)

The new structure of replacement info

So replaced.2's replacement info actually looks like:

A disjunction (or) of
- "Replace what you're holding with C.2"
  - identity
  - A promise to construct C.2
- "Construct it with V as car"
  - Arguments
    - car : Promise of V
    - cdr : C.2's cdr value, defaulting to C.1's cdr value.

Default arguments

The action of replaced.3 on C.1 just provides a value for the cdr argument. Re-emission is not needed, nor wanted since at C's cdr the signal has reached a join.

So the default cdr argument is held in a sort of limbo until it's given or not. That's a lot like mutation in itself. So it too is managed by signals:

"All done". Signal has fully propagated, use defaults for anything not filled.
"Use this" Accessor Value. After receiving "use this", "All done" has no effect.

These are of course quasi-mutating the stream argument of replace. There's an infinite regress there, but since it is all behind promises that delay evaluation until after this is done, I think we can consider it as if the stream had had one unchanging value forever.

Structure of the change stream

So each "cell" of the stream of operations changes consists of a disjunction on constructors and values, rather than a cons cell. Each disjunct can be:

A promise of a value
A promise of a constructor parameterized by promises of values

So it's like an n-ary tree-stream.

Note that this naturally orders the change stream from shallow to deep. If it was ordered from deep to shallow we couldn't find elements in finite time.

This doesn't preclude specialized representations that (say) compact multiple simple cdr accesses to an nthcdr access. These could be represented as additional constructors in the disjunct.

Object identity

Above, I said "constructed object can access a part object", not "a part's value". Since objects no longer ever change value, the difference is subtle. It's just this: the object has a single set of signal connections. So it has a single identity. So there is a trace of object identity remaining.

One could represent identity value-wise by saying that values consist of a (classical) value and an "object-identity" value, and that object-identity values are opaque and unique except as shared by this mechanism. So signals are connected with respect to object-identity values.

It has a flavor of "the long way around", but it lets us treat objects entirely as values.

Object versioning

Different versions of an object have different Object-IDs. Imperatively this wouldn't be anything, since two simultaneous references to the same object can't have different values. But here, one can both mutate an object as far the the world is concerned and hold onto its old value. But it should never be the case that objects are eq? but not equal?. So different versions have different Object-IDs.

The equality predicates

Equality:

equal?

is what it normally is in Scheme or Kernel: The objects have the same current value.

eq?

A and B are eq? just if

(equal? identity-of-a identity-of-b)

=?

is just an optimization of equal?

Signals and immutability

I mentioned that I was thinking about immutability in regard to this. So far, I've just described how to duplicate mutability with signals.

For immutable objects, some or all slots would still get connections, but would raise error instead of propagating mutation.

But that's only half of the control we should have over mutation. We'd also like to guarantee that certain evaluations don't mutate even mutable objects they have access to, eg their arguments, the environment, and dynamic variables. The "foo!" convention indicates this (negatively) but doesn't enforce anything. "foo!" convention notwithstanding, we'd like to guarantee this from outside an arbitrary call, not from inside trusted combiners.

Blocking signals

So we'd like to sometimes block signals. If signals were emitted anyways, they'd be errors and would not reach their destinations. So if replaced is blocked, code either doesn't try to mutate objects or tries and raises an error. Either is consistent with immutability.

ISTM the simplest way to block signals is to disconnect their existing connections and connect them to error combiners. When the call is done, restore their original connections. However, that doesn't play well with asynchrous execution.

Instead, we'll make a copy of the original object that will (probably lazily) infect its parts with "don't mutate me in this scope".

Scope

For a traditional imperative language, where flow of control and scope are structurally the same, we could block signals in specific scopes, recursively. But for Scheme and Kernel, that won't suffice. What would happen if an object is passed to a continuation and mutated there? We've broken the guarantee that the object wouldn't be mutated. Any time we let objects be passed abnormally, this can happen.

We might try to:

raise error if affected objects are passed to continuation applications, or
"infect" the other scope with the signal restrictions.

Neither is appealing. In this mechanism, continuing normally is also passing to a less restrictive scope. And continuing normally should behave about the same way as continuing abnormally to the same destination. We also don't want error returns to permanently "freeze" objects.

So ISTM we must distinguish between continuing to a (not neccessarily proper) parent of the restricting scope (normally or otherwise) and continuing elsewhere. Signal blocks are removed just if control reaches a parent. This is essentially how Kernel guards reckon continuation parentage.

Doing this for all object parts

We'd usually want to say that no part of any argument to a combiner can be mutated. It's easy enough to treat signal connections from the root argobject. But we need to "infect" the whole object with immutability, and not infect local objects, which may well be temporary and legitimately mutable.

Since these arguments are ultimately derived from the root argobject, what we can do is arrange for accessors to give immutable objects in their turn. But they have to be only temporarily immutable - blocked, as we said above. And we'd prefer to manage it lazily.

So I propose that accessing a blocked object gives only objects:

whose replaced signals are re-routed to error, as above, until control "escapes normally" (an improper parent continuation is reached)
which are in turn blocked, meaning they have the same property infecting all of their accessors.

Non-pair containers

Non-pair containers are not treated by mechanisms like copy-es-immutable. But we want to treat immutably fully, so they have to be treated too. This is the case even for:

Environments
Encapsulation types. In this case, their sole accessor is required to do this, as all accessors are.
Ports. Administrative state or not, they can be immutable.
Closures. Anything they return is considered accessed and automatically gets this blockingness. Their internal parts (static environment, etc) need not be blocked.
Continuations. Like closures, what they return is considered accessed and automatically gets this blockingness.

Exemptions

We'd like to be able to exempt particular objects from this. Some combiners mutate an argument but shouldn't mutate anything else. There'd probably be a signal-block spec that would specify this.

Blocking signals to keyed dynamic objects

We can easily extend the above to deal with dynamic environment, but keyed dynamic objects are not so simple. Their accessors would be covered by the above if derived from the argobject or the dynamic environment, but they need not be.

So we need an additional rule: Keyed dynamic objects are blocked if accessed in the dynamic scope of the blocking. That's recursive like other blockings. Keyed rebindings in the dynamic scope aren't, because one might bind a temporary that's legitimately mutable.

Side note: I'd like to see a stylistic convention to differentiate between combiners that mutate their arguments ("foo!") and combiners that mutate something else, meaning either their dynamic environment or a dynamic variable (Say "foo!!")

Interface to be defined

I've already written a lot, so I'll leave this part as just a sketch. To support all this, we need to define an interface.

A means of defining new signal types
- Returns a signal emitter
- Returns a signal identifier for `connect' to use.
A signal scheduler. By general principles, the user should be able to use his own, at least for exposed signals. It's not too hard to write one with closures and continuations.
Means for the user to emit signals
Means for the user to connect and disconnect signals
- Not exposed for many built-in objects such as pairs.
- "capture all current", as for blocking
- "disconnect all"
block-all-signals: connects all signals to error continuation
- Possible argument `except'
(Maybe) block-signal: connects given signal to error continuation
A "dirty-flag" mechanism, often useful with signals.
Possibly a priority mechanism.

Some possibilities this raises

Use the signal scheduling mechanism for managing constraints. Since when we can enforce immutability, constraints become very tempting.
Provide general signals
Provide an interface to Lone-COWs, a sort of copy-on-write object optimized for usually being uniquely referenced/owned.
Supplying "out-of-band" signals to a debugger or similar. They really do need to be out-of-band.
Provide a broadly applicable interface for redo/undo. It could basically just capture historical copies of objects.

Glorified footnote: Aren't signals a lot like callbacks?

Somewhat. The differences are:

Signals are N to 1.
Signals are associated to some sort of signal-type object. One creates a signal of a specific type. In Qt, this is specified as "signal:" members of a class.
Signals are scheduled more freely.
Signals to "dead" objects do nothing. Their existence doesn't keep objects alive; in this they are like weak references.
The sending object does not explicitly track its observers,

Thinking about Immutability in Klink

Previously

Klink is my stand-alone implementation of Kernel by John Shutt.

Types of immutability

There are several types of immutability used or contemplated in Klink.

Complete immutability: Not much to be said.
Pair immutability: A pair that forever holds the same two objects, though those objects' contents may change.
List immutability: For instance, a list where you can change the contents, but not the number of elements.
Recursive structural immutability: An immutable tree of mutable non-pair objects.
Administrative immutability: Eg, what ports typically have. When you read or write to a port, it remains "the same thing" administratively.

Current status in Klink

Complete immutability

Available in C, used in some places such as symbols and strings. AFAICT there's no way to specify its use or non-use in Kernel; some objects just have it because of what they are or how they are created.

Pair immutability

This is the default in Klink. To get mutable pairs, use mcons instead of cons. I was persuaded of this by Getting rid of set-car! and set-cdr! and Racket, also here.

List immutability

No way to have a pair whose car is mutable but whose cdr recursively isn't.

Recursive structural immutability (Tree immutability)

Supported via copy-es-immutable

Administrative immutability

No C flag for it. Like complete immutability, some objects just have it because of what type they are.

Non-recursive structural immutability?

If you read section 4.7.2 of the Kernel report (copy-es-immutable), you may notice that Pair immutability and List immutability are actually extensions. So I figured I should at least advance a rationale for them.

Is non-recursive immutability worth supporting? ISTM it's already strongly suggested by Kernel.

Implied by an informal type: Some combiners take finite lists as arguments; all applicatives require a finite list as argobject. That distinguishes the finite list as at least an informal type. There's a predicate for it, finite-list?, but pairs that "are" finite lists can "become" other sorts of lists (dotted or circular), so it falls short of being a formal type. This seems like an irregularity to me. Structural immutability would solve it.
Implied by implied algorithms: For some combiners (eg reduce), any practical algorithm seems to require doing sub-operations on counted lists. That implies a structurally immutable list, because otherwise the count could theoretically become wrong; in practice it's saved from this by writing the algorithm carefully. So there are at least ephemeral, implied structurally immutable lists present.
Vectors: John once told me that he would eventually like to have vectors in Kernel. Vectors are optimized structurally immutable finite lists.
Opportunity for optimization: There's an opportunity for other optimizations where list immutability is used and recognized. In particular, caching a list's element count is often nice if one can be sure the list count won't change.

Comparison table


Type of immutability	Pair	List	Tree
Special care needed
for shared objects?	No	Only own tail	Yes
Exists in Klink?	Yes	No	Yes
Exists in Kernel?	No	No	Yes
What might be mutable?	car, cdr	elements	leaves

21 September 2011

(Long) I'd like to see a Linux package manager co-operate with development

What provoked me to think about this

I recently went from using the Debian lenny version of Rosegarden to being a developer and using my development version. As transitions go, this was probably one of the easier ones. And overall, Debian is pretty clean about package management.

Yet it seemed to me that there were gratuitous obstacles. Most packages managers simply proceed as if versions that you build yourself do not exist:

They cannot satisfy requirements. For instance, a year or so ago when I built a newer gEda than Debian lenny had, I had to remove a perfectly good installed copy of easyspice and build it manually, because aptitude (and dpkg) couldn't understand that I still had gEda.
They cannot have requirements. For instance, Rosegarden wants jackd to be installed, but when I told aptitude to remove its copy of Rosegarden, it wanted to remove jackd too because it thought it was unused.
They tend to be stepped on when installing, and by the same token "make install" often steps on installed packages. Package managers do usually take care not to erase things they don't own. But this is of the nature of first endangering the other software, then sparing it.
Any config data that installed and built versions can share is purely accidental. The package manager thinks it controls all the config.

It's something I've noticed before, I'm just blogging about it for the first time.

An understandable choice

Now, this is all understandable. They don't call development versions "the bleeding edge" for nothing. If a package manager has to choose between blindly trusting locally built software and blindly ignoring it, it should ignore it.

And trying to make it all work together would raise problems of communication and coordination. How is a package manager supposed to know what it needs to know? And even if it knows, how is it supposed to coordinate with "make install"?

Does CheckInstall fix it?

Not for me. CheckInstall wants to watch "make install" and create a package (deb, RPM, others). The idea is that then you use that package to install and uninstall.

Creating a package every time I "make install" is a very heavy mechanism. It's not really for developers, it's for small distributors.
I'd forget to use it when I "make install", then what have I got?
The way I work would confuse it. I typically configure to install into usr/local/stow. Which leads me to the next section.
It's a roundabout way of doing things.

Stow could help

What Stow does

If you know about Stow, you probably should skip to Could a package manager work thru stow?

Stow is a Perl package by Bob Glickstein that helps install packages cleanly. More cleanly than you might think possible, if you're familiar with traditional installation. It works like this:

You install the package entirely inside one directory, usually a subdirectory of usr/stow or usr/local/stow We'll say it's in usr/local/stow/foo-1.0 No part of it is in bin or usr/bin or usr/doc etc. Every file lives under usr/local/stow/foo-1.0
With most builds you can arrange for "make install" to do this by passing ./configure an argument like
```
--prefix=/usr/local/stow/foo-1.0/
```
To complete the install, just:
```
cd /usr/local/stow/ 
stow foo-1.0
```
That makes symlinks from usr/doc, usr/bin, etc into the stow/foo-1.0 directory tree. No file is physically moved or copied out of stow/foo-1.0
Now the package is available just as if it had been installed.
Want it gone? Just
```
cd /usr/local/stow/ 
stow -D foo-1.0
```
Want it completely gone from your disk? After the above, you can just:
```
rm -r foo-1.0
```

This is neat in every sense of the word. It also can manage multiple versions of a package neatly.

Could a package manager work thru stow?

Imagine a package manager that simply put each installation into an appropriate subdirectory of usr/stow and then stowed it. As far as I can know, this hasn't been done.

That would make it easier for installed and built versions to live side by side.

Tracking a package's files is no longer an issue. They all live under a subdirectory and nothing else lives there (Oversimplification, but I'll get to that below). So stepping on built versions is no longer an issue. Neither is being stepped on by them. This also takes a lot of weight off the package manager.
Whether a package is considered installed is no longer tied to the package manager. A package would be considered installed just if it is stowed and considered uninstalled otherwise.
What about incomplete stowage, especially what if there's a race condition? A canonical flag could indicate when an (un)stow is in progress. I'm not aware whether stow does this; I've never seen it leave a stow half-done.
The directory provides a place that developer, stow, and package manager can all see for installation-related information about the software.

Must it be `stow`?

Maybe you don't like Perl. I don't either, though I've coded in it and I do like CPAN and a fair bit of stuff that's written in Perl.

stow is not the only tool for this, it's just the first. There are a number of variants or offshoots: Graft, ln_local, Reflect, Sencap, Toast. They mostly seem to be in Perl as well. One exception is Reflect, which requires only bash and coreutils. Unfortunately, Reflect appears to be abandoned.

So the idea here is not so much an application as a file-location protocol. Another tool could do the same job on the same directory. You could even change tools later on.

General approach

It's easy for me to say "it uses stow", but that leaves a lot of little issues. I'll tackle them one by one below, but first I'll outline a general approach.

Desiderata:

It needs to work safely even if interrupted. So all the relevant information needs to be stored in the filesystem.
It shouldn't require any component to understand details that are not its own concern, following the general rule that components should do one thing well.
If possible, it shouldn't require any component to do much more than it does now.
It should place little or no extra constraint on development
- Especially it should place no constraint on how to develop packages that one's package manager is not interested in.
It should not make security holes.

File-naming scheme

So I will base this mostly on a file-naming scheme:

A package named "FOO" relates uniquely to /usr/stow/FOO/
The files of version N.N of a package FOO live under /usr/stow/FOO/N.N/
- It's a slightly stronger version of the usual stow subdirectory naming practice. The difference from usual practice is just a level of subdirectory; it's FOO/N.N/ instead of FOO-N.N/
- Almost no format is assumed for N.N, except that it can't start with a single colon.
- I'm reserving double or more initial colons for situations where the version name wants to start with a colon.
FOO-related files and directories that are not themselves installed ("Magic" files) live in /usr/stow/FOO/
- Their names all start with a single colon.

Specifically unaffected so far:

Most stow functionality.
Most package manager functionality. It mostly needs to build paths accordingly.
Naive local packages that live as usr/stow/FOO-N.N (no slash). They just are not understood as versions of FOO and thus receive no benefit.

stow2

I will talk about stow2, a hypothetical variant of stow. It's old stow with a few minor extra behaviors. It doesn't aim to be a package manager. It doesn't watch out for system coherence as a package manager does. In this vision, it's merely the part of the package manager that handles physically adding or removing a package. It just aims to do software-stowing for either a package manager, a developer or both without inviting problems.

Specific issues

What about shared config?

I mentioned sharing config above. Config shouldn't disappear or get overwritten when changing versions, or when changing how versions are managed.

So configuration data that should persist across versions will live under /usr/stow/FOO/:config/. It's stowed or unstowed just like a normal stow directory. There difference is in how it's used:

It is meant to contain data that ought to remain unchanged across versions, such as user preferences.
It initially contains very little. I am tempted to say nothing at all, but the case of config triggers makes me unsure.

Unaffected:

Unaware packages will just operate on config data where they think it lives, which is where it's stowed from /usr/stow/FOO/:config/.
Developers can treat config data for their builds by their own lights.
stow2 can stow this like any other directory tree.
Maybe package format, which would not need a separate space for config data if there isn't any.

Altered responsibility: A package manager should:

When installing or updating
- Create the FOO/:config/ directory if it doesn't already exist.
- If it already exists, don't alter it.
- In any case, arrange for the package to update the config data, eg to add settings for new flags.
When told to remove a package's config data, unstow that directory before deleting it.

How the updating can be done is another issue, and this post is already long. I'll just say that I have a vision of farming off the (re)config control role entirely to make, at some stage higher than stow2, in such a way that developers and package managers can both use it.

What about config that consists of triggers?

For instance, what about scripts that live in /etc/init.d/? They are created when installing, but if the user removes them, they need to stay removed even when versions change. So they are shared config. Yet as scripts, they may freely change between versions. So they are not straightforwardly handled by stowing.

I'm not sure that what I'm about to propose will work with out-of-the-box tools, but I'll air it anyways:

Let scripts be stowed from FOO/N.N into a dedicated directory that isn't otherwise used, maybe /etc/triggers/FOO/. The contents of the scripts can vary freely across versions.
Let the triggers be symlinked into /etc/triggers/FOO/ from FOO:config

When scripts disappear between versions, it seems like it would leave dangling symlinks unless specifically removed. "make config" should handle removing obsolete ones, but it might not be perfect.

So one (possibly) new behavior is wanted: stow2 mustn't normally stow dangling symlinks, or at least ones into /etc/triggers/. (I'm not sure stow stows dangling symlinks even now)

What if a user unstows an installed package?

OK, everything works neat while the package manager is the only thing moving. But what happens when the user/developer starts using stow as it was meant to be used? He doesn't hurt anything by stowing new packages, but what happens when he unstows an installed package. . .:

. . .and he doesn't replace it, because he thinks that doing so constitutes uninstalling it. Operating behind a package manager's back could greatly confuse it.

Note that generally one needs to su in order to stow or unstow, so anything he can break this way he can break anyways. So we are only concerned about breakage by misunderstanding.

. . .and he replaces it with another version, presumably one he built? If his new version is buggy, he may have broken not only this software but packages that depend on it. But that's the same problem we developers already face, and it's not amplified by this. How his new version works with the package manager's various controls is a separate topic, below.

I propose solving the unstow case by adding:

New flag: Whether a given stow directory is under control of a package manager or not.
New behavior:
- Let stow2 refuse to unstow if that flag is set, unless forced.
- If forced, inform the package manager.

So reserve the filename /usr/stow/FOO/:managed-by

If it doesn't exist, that means no package manager considers FOO installed.
If it exists, it is a symlink to a package manager that considers FOO installed.
- I don't specify what part of the package manager. It might be useful to be a link to an executable.
- A package-manager, as package itself, has to manage versions of where that symlink points
  - Maybe by pointing it towards an unchanging location.
  - Maybe by understanding previous installed versions of itself.
When stow2 stows, it should create ../:managed-by pointing to itself.
stow2 should refuse to unstow if ../:managed-by exists and doesn't point to itself, unless forced.
(Stows that require unstowing another version are really the unstow case)
If stow2 forces an unstow, it should erase ":managed-by".
stow2 should support an filename argument that means "Act for this package manager".

What if a user stows a package?

I said "He doesn't hurt anything by stowing new packages". But we wanted more than harmlessness. We wanted locally built packages to be basically on the same footing as installed packages; at least, we want the user to be able to make that happen without real extra work.

But that invites us into a potentially complex system of dependencies and signing. Again, stow2 shouldn't have to concern itself with that but it also shouldn't make a mess.

For dependencies etc

For some concerns, we're just going to have to bite the bullet and say that the developer must talk to the package manager in its own format. Dependencies are one example.

Let's reserve the directories /FOO/:control for all package control data about FOO. Now, various package managers might have their own formats, and different dependencies etc might apply to different versions. So let's further reserve any file or directory /FOO/:control/PMNAME/N.N for a package manager named PMNAME with regard to package FOO version N.N. Here we can't do much about name clashes, but it's just between a few package managers.

Sometimes the same control data applies to many versions of the same package, up to version identity data. Particularly in development, the version might change frequently but the developer shouldn't have to create new control data each time. So let's reserve /FOO/:control-default and /FOO/:control-default/PMNAME.

So a conforming package manager PMNAME:

Must recognize files of the form /FOO/:control/PMNAME/N.N and /FOO/:control-default/PMNAME even if FOO is not a package it knows about.
Similarly, must recognize files of /FOO/:control-default/PMNAME
- For them, find all FOO/N.N and consider each a version of FOO.
Must treat all such package versions as available or installed, as appropriate.
- Even where this requirement conflicts with security measures it uses for external packages, such as digital signing.
May use its own internal control format, except insofar as it conflicts with other requirements. For instance:
- The requirement to associate :control-default/PMNAME data with multiple versions, which might conflict with a "version" field in control data.
- The requirement to treat such packages as available, which might conflict with digital signing.
Is allowed but not required to distinguish such packages, for instance by:
- presenting them differently
- presenting their dependency chains differently
Is recommended to provide a means in control data for indicating particular packages as unstable. Generally package managers already provide this.

But this still requires that the developer create control data in the package manager's format. Let's make it a little easier for him and require any package manager that has FOO-N.N to create /FOO/:control/PMNAME/N.N on request. Generally that's simple. Then the developer can adapt that instead of starting from scratch.

Digital signatures

A package manager generally guarantees package integrity by checking signatures. They are considered part of the control data.

We could require the developer to cause each "make install" to be signed. But this would not make /usr/stow/ even slightly more secure. Someone already put the code in question into it (usually by "sudo make install") Any misbehavior they wanted to do was already possible.

So I just say, trust the local versions. They belong there - or if somehow they don't, it's beyond what a package manager can provide security against.

The exception is if some user:

Has write permission into /usr/stow/
Does not have permission to stow
Does have permission to run the package manager
Can stow via the package manager. Perhaps it has the sticky bit, or he can sudo it but can't sudo stow.

So a package manager ought not to consider local versions installable by a user who can't directly run stow. It seems difficult for that situation can arise.

What about name collision?

What if two packages have the same name? That wasn't an issue when we assumed a package manager, which assumes a naming authority that can manage the namespace. And what about the flip side, if one package has two names at various times?

There's often no problem, but when there is, the developer has all the power to solve it. He can name his local stow subdirectories anything he pleases (directly or thru ./configure --prefix=). So it's up to the developer to manage the stow namespace in harmony with his favorite package manager.

Now, if more than one package manager is operating on the same system, there could namespace problems. But that wasn't even possible before, so nothing is lost.

What about bootstrapping?

How do you use stow2 if it's sitting in /usr/stow/stow2/1.1 unstowed? This is a solved problem: To bootstrap, call stow2 by its absolute path. Its support apps (Perl interpreter or w/e) also would be used by absolute path if they haven't been installed. That's all.

What about system bootstrapping?

Some Linux boot systems require some boot software such as the kernel to live in a small 1 (<100 Mb) partition at the front of the hard disk. You can't symlink from /boot/ into /usr/stow/, but there's already a well-known solution. You make a stow directory there, eg /boot/stow/, and you put such software in it.

One would have to define a mapping that the package manager understands, and affected systems would have to be configured, and affected packages tagged, but for most packages and systems it's no work.

What about an unbuilt package having requirements?

So I solved most of the obstacles, including "they cannot satisfy requirements", but not "they cannot have requirements". There's a place for them in control data, but:

It pertains to the wrong stage. Generally you need packages for the build stage, not after installation. If you as a developer only find you need some other package after installation, you just fetch it.
At the time /FOO/:control/PMNAME/N.N is read, FOO/N.N doesn't exist and the package presumably isn't available thru the package manager. So it would be fruitless for a package manager to try to install FOO-N.N.
It misses an opportunity to learn requirements from the configure stage.
It misses a wider opportunity for other apps to communicate their needs to a package manager. For instance, as a simple way for an app to pull in optional support that's not bundled with it (hopefully only with user approval)

"BUILD" packages don't suit development requirements either.

This one is unrelated to anything stow2 should do. It is properly the domain of package managers as requirement managers. Here I suggest:

A common file format for representing requirements, not particular to any package manager.
A canonical flag to invoke a package manager with regard to such a file, like:
```
aptitude2 --wanted autoconf2-201109212117-1.requirements
```

The format seems to require at least these fields:

Name of requesting app
Date requested
List of desired packages, each giving:
- Name
- Version required
- What exactly is wanted, eg app, static lib, dynamic lib, dev headers.

Because there is no one package manager in control, name collision and name splits are now an issue. In particular, virtual package names have little to go on.

One approach is to allow the package name to be represented in multiple ways, each with respect to some managed namespace. Like:

(("debian" "sawfish") ("redhat" "sawmill"))

Total requirements

So totalling up the requirements,

A file-naming scheme, consisting of:
- One or more STOW directories
  - Zero or more package names
    - Zero or more version names (each a directory tree)
    - :managed-by (a symlink)
    - :installed (a symlink)
    - :config (a directory tree)
    - :control
      - Zero or more package-manager names
        
        Zero or more version names
    - :control-default
      - Zero or more package-manager names
    - :stowing (a symlink)
A format for listing required packages, not specific to one project manager.
stow2 must
- Respect ../:managed-by wrt any directory it (un)stows from.
- Impersonate a given package-manager, by filename argument, to ../:managed-by.
- Otherwise behave like stow
Package managers must
- Respect /FOO/:managed-by
- Recognize /FOO/:control-default/PMNAME
- Recognize /FOO/:control/PMNAME/N.N
- Create /FOO/:control/PMNAME/N.N on request
- Obey a tag-to-stow-directory mapping, part of its own config.
- Understand the above required packages format
- Obey command-line flag --wanted
A common means for developers and package managers alike to cause config updates. I suggest leaning on make.
That's all.

Footnotes:

¹ I just called 100 Mb small. There was a time when we all called that "huge".

24 December 2011

The mechanisms that a Trade Logic system would need

Previously

Now

The mechanisms that a TL system needs

Trade Logic pick-boolean

Previously

What's wrong with pick01 as basic

So pick-boolean

But it shouldn't be unary

Stream splitting

Can't be built by users

New pieces

09 December 2011

Kernel WYSIWYG rooted digraphs

Previously

That's not fully general

Things that don't do the job

What to do

Signals, continuations, constraint-handling, and more

Previously

Signals for Constraint-handling

An analogy

Another use: Precision

Apply-continuation and signals can mutually define

signals allow indeterminate order

Should signal-blockers cause copies?

18 October 2011

Klink Libraries

Previously

Additional desiderata

Fetching libraries: mostly lean on git

Where to put it all?

The easy part

Path configuration

Dependencies and version identity

Surfeit of ways to express version identity

Co-ordinating sub-features with version identity

Dependencies

Library namespace conventions

13 October 2011

Mutability and Signals

As if

What about structural information?

Desiderata

A thought-experiment sequence of operations

Let's stop operations and look at it

The new structure of replacement info

Default arguments

Structure of the change stream

Object identity

Object versioning

The equality predicates

Signals and immutability

Blocking signals

Scope

Doing this for all object parts

Non-pair containers

Exemptions

Blocking signals to keyed dynamic objects

Interface to be defined

Some possibilities this raises

Previously

Types of immutability

Current status in Klink

Complete immutability

Recursive structural immutability (Tree immutability)

Administrative immutability

Non-recursive structural immutability?

Comparison table

21 September 2011

An understandable choice

Does CheckInstall fix it?

Stow could help

Must it be stow?

General approach

Desiderata:

File-naming scheme

stow2

Specific issues

What's wrong with `pick01` as basic

So `pick-boolean`

Must it be `stow`?