Tehom: programming

Showing posts with label programming. Show all posts

12 May 2012

Mutability And Signals 3

Previously

I have a crazy notion of using signals to fake mutability, thereby putting a sort of functional reactive programming on top of formally immutable data. (here and here)

Now

So recently I've been looking at how that might be done. Which basically means by fully persistent data structures. Other major requirements:

Cheap deep-copy
Support a mutate-in-place strategy (which I'd default to, though I'd also default to immutable nodes)
Means to propagate signals upwards in the overall digraph (ie, propagate in its transpose)

Fully persistent data promises much

As mentioned, signals formally replacing mutability.
Easily keep functions that shouldn't mutate objects outside themselves from doing so, even in the presence of keyed dynamic variables. For instance, type predicates.
From the above, cleanly support typed slots and similar.
Trivial undo.
Real Functional Reactive Programming in a Scheme. Implementations like Cell and FrTime are interesting but "bolted on" to languages that disagree with them. Flapjax certainly caught my interest but it's different (behavior based).
I'm tempted to implement logic programming and even constraint handling on top of it. Persistence does some major heavy lifting for those, though we'd have to distinguish "immutable", "mutate-in-place", and "constrain-only" versions.
If constraint handling works, that basically gives us partial evaluation.
And I'm tempted to implement Software Transactional Memory on it. Once you have fully persistent versioning, STM just looks like merging versions if they haven't collided or applying a failure continuation if they have. Detecting in a fine-grained way whether they have is the remaining challenge.

DSST: Great but yikes

So for fully persistent data structures, I read the Driscoll, Sarnak, Sleator and Tarjan paper (and others, but only DSST gave me the details). On the one hand, it basically gave me what I needed to impelement this, if in fact I do. On the other hand, there were a number of "yikes!" moments.

The first was discovering that their solution did not apply to arbitrary digraphs, but to digraphs with a constant upper bound p on the number of incoming pointers. So the O(1) cost they reported is misleading. p "doesn't count" because it's a constant, but really we do want in-degree to be arbitrarily large, so it does count. I don't think it will be a big deal because the typical node in-degree is small in every code I've seen, even in some relentlessly self-referring monstrosities that I expect are the high-water mark for this.

Second yikes was a gap between the version-numbering means they refer to (Dietz et al) and their actual needs for version-numbering. Dietz et al just tell how to efficiently renumber a list when there's no room to insert a new number.

Figured that out: I have to use a level of indirection for the real indexes. Everything (version data and persistent data structure) hold indirect indexes and looks up the real index when it needs it. The version-renumbering strategy is not crucial.

Third: Mutation boxes. DSST know about them, provide space for them, but then when they talk about the algorithm, totally ignore them. That would make the description much more complex, they explain. Yes, true, it would. But the reader is left staring at a gratuitously costly operation instead.

But I don't want to sound like I'm down on them. Their use of version-numbering was indispensable. Once I read and understood that, the whole thing suddenly seemed practical.

Deep copy

But that still didn't implement a cheap deep copy on top of mutate-in-place. You could freeze a copy of the whole digraph, everywhere, but then you couldn't both that and a newer copy in a single structure. Either you'd see two copies of version A or two copies of version B, but never A and B.

Mixing versions tends to call up thoughts of confluent persistence, but IIUC this is a completely different thing. Confluent persistence IIUC tries to merge versions for you, which limits its generality. That would be like (say) finding every item that was in some database either today or Jan 1; that's different.

What I need is to hold multiple versions of the same structure at the same time, otherwise deep-copy is going to be very misleading.

So I'd introduce "version-mapping" nodes, transparent single-child nodes that, when they are¹ accessed as one version, their child is explored as if a different version. Explore by one path, it's version A, by another it's version B.

Signals

Surprisingly, one part of what I needed for signals just fell out of DSST: parent pointers, kept up to date.

Aside from that, I'd:

Have signal receiver nodes. Constructed with a combiner and an arbitrary data object, it evaluates that combiner when anything below it is mutated, taking old copy, new copy, receiver object, and path. This argobject looks very different under the hood. Old and new copy are recovered from the receiver object plus version stamps; it's almost free.
When signals cross the mappers I added above, change the version stamps they hold. This is actually trivial.
As an optimization, so we wouldn't be sending signals when there's no possible receiver, I'd flag parent pointers as to whether anything above them wants a signal.

Change of project

If I code this, and that's a big if, it will likely be a different project than Klink, my Kernel interpreter, though I'll borrow code from it.

It's such a major change that it hardly seems right to call it a Kernel interpreter.
With experience, there are any number of things I'd do differently. So if I restart, it'll be in C++ with fairly heavy use of templates and inheritance.
It's also an excuse to use EMSIP.

Footnotes:

¹ Yes, I believe in using resumptive pronouns when it makes a sentence flow better.

01 March 2012

Digrasp - The options for representing digraphs with pairs

Digrasp 3

Previously

This is a long-ish answer to John's comment on How are dotted graphs second-class?, where he asks how I have in mind to represent digraphs using pairs.

The options for representing digraphs with pairs

I'm not surprising that it comes across as unclear. I'm deliberately leaving it open which of several possible approaches is "right". ISTM it would be premature to fix on one right now.

As I see it, the options include:

Unlabelled n-ary rooted digraph. Simplest in graph theory, strictest in Kernel: Cars are nodes, cdrs are edges (arcs) and may only point to pairs or nil. With this, there is no way to make dotted graphs or lists, so there is no issue of their standing nor any risk of "deep" conversion to dotted graphs. It loses or alters some functionality, alists in particular.
Labelled binary rooted digraph: More natural in Kernel, but more complex and messier in graph theory. Cars and cdrs are both edges, and are labelled (graph theory wise) as "car" or "cdr". List-processing operations are understood as distinguishing the two labels and expecting a pair in the cdr. They can encounter unexpected dotted ends, causing errors.
Dynamic hybrid: Essentially as now. Dottedness can be checked for, much like with `proper-list?' but would also be checkable recursively. There's risk of "deep" conversion from one to the other; list-processing operations may raise error.
Static hybrid: A type similar to pair (undottable-pair) can only contain unlabelled n-ary digraphs, recursively. List operations require that type and always succeed on it. There's some way to structurally copy conformant "classic" pair structures to undottable-pair structures.
Static hybrid II: As above, but an undottable-pair may hold a classic pair in its car but not its cdr, and that's understood as not part of the digraph.

And there's environments / labelled digraphs

By DIGRASP, I also mean fully labelled digraphs in which the nodes are environments and the labels are symbols. But they have little to do with the list-processing combiners.

27 February 2012

How are dotted graphs second class?

Digrasp

Previously

I said that dotted graphs seem to be second class objects and John asked me to elaborate.

How are dotted graphs second class?

A number of combiners in the spec accept cyclic but not dotted lists. These are:

All the type predicates
map and for-each
list-neighbors
append and append!
filter
reduce
"Constructably circular" combiners like $sequence

So they accept any undotted graph, but not general dotted graphs. This occurs often enough that to make dotted graphs seem second-class.

Could it be otherwise?

The "no way" cases

For some combiners I think there is no sane alternative, like `pair?' and the appends.

The "too painful" cases

For others, like filter or list-neighbors, the dotted end could have been treated like an item, but it seems klugey and irregular, and they can't do anything sane with a "unary dotted list", ie a non-list.

$sequence etc seem to belong here.

map and for-each

For map and for-each, dotted lists at the top level have the same problem as above, but ISTM "secondary" dotted lists and lists of varying length could work.

Those could be accomodated by passing another combiner argument (proc2) that, when any list runs out, is given the remaining tails isomorphically to Args, and its return is used as the tail of the return list. In other words, map over a "rectangle" of list-of-list and let proc2 work on the irregular overrun.

The existing behavior could be recaptured by passing a proc2 that, if it gets all nils, returns nil, and otherwise raises error. Other useful behaviors seem possible, such as continuing with default arguments or governing the length of the result by the shortest list.

Reduce

Reduce puzzles me. A cyclic list's cycle after it is collapsed to a single item resembles a dotted tail, and is legal. Does that imply that a dotted list should be able to shortcut to that stage?

25 February 2012

Digrasp

Previously

I have often blogged about Kernel, John Shutt's Scheme-like language.

Lisp becomes Digrasp?

One interesting thing about Kernel is that it treats pairs rather than lists as fundamental. Consequently, digraphs constructed from pairs have a certain fundamental status too. Most operations in Kernel allow arbitrary digraphs if they allow pairs. OK, dotted graphs seem to be second class objects. But as long as every cdr points to a pair or nil, you can pass it almost anywhere that accepts a pair.

So rather than LISt Processing, it's like DIrected GRAph ProceSsing. OK, the acronym's not perfect, but it sounds better than DIGRAP and echoes LISP.

09 December 2011

Kernel WYSIWYG Digraphs

Kernel WYSIWYG rooted digraphs

Previously

I said in comments that I view backquote as a WYSIWYG tree ctor, analogous to `list'.

That's not fully general

But backquote doesn't suffice to construct the most general form of open structure of pairs, a rooted digraph. Those require the ability to share objects between multiple parents.

Things that don't do the job

One could write shared objects by naming objects in a `let' form and including them by name. That's not WYSIWYG, though. WYSIWYG would at least present the objects in place once, as #N# / #N= reader notation does.

But that's in the reader. It's not available to general code.

What to do

One could:

Enclose the whole construction in an empty `let'.
Provide a combiner expanding to ($sequence ($define! x DEF) x)

Signals, continuations, and constraint-handling

Signals, continuations, constraint-handling, and more

Previously

I wrote about mutability and signals and automatically forcing promises where they conflict with required types.

Signals for Constraint-handling

Constraint-handling, and specifically the late-ordered evaluation it uses, could be treated by signals. Something like this:

Promises are our non-ground objects. They are, if you think about it, analogous to Prolog variables.
- They can be passed around like objects.
- But they're not "really" objects yet.
- For them to "succeed", there must be some value that they can take.
- That value can be determined after they are passed.
- Everything else is either ground or a container that includes one or more of them.
As I already do in Klink, use of an non-ground argument is detected where it's unacceptable.
- Generally, they're unacceptable where a type is expected (eg they're acceptable as arguments to cons)
Unlike what I do now, non-ground unacceptables send signals to a scheduler.
- This implies that each evaluation is uniquely related to (owned by) one scheduler.
- That signal passes as objects
  - the continuation as object.
  - The promise to be forced.
It is the job of the scheduler to ensure that the resulting computation completes.
- In appropriate circumstances, eg with resource-quotas, or when doing parallel searches, it is reasonable to not complete.

An analogy

Computations whose flow can't be changed are analogous to constant objects and flow-changable ones analogous to mutable ones.

Another use: Precision

This mechanism could also be useful for "dataflow" precision management. Here the objects are not exactly promises, but they are incomplete in a very precise way. They are always numbers but their precision can be improved by steps.

Combiners that require more precision could signal to their generators, which could signal back their new values, propagating back their own requirements as needed. This is very manageable.

Apply-continuation and signals can mutually define

apply-continuation, while it could be used to implement signals, can also be implemented by signals between evaluations. It signals a scheduler to schedule the given continuation and deschedule the current continuation.

Done this way, it gives a scheduler flexibility. This might appear to make analysis more difficult. But a simple (brain-dead) scheduler could do it exactly the continuation way: exactly one evaluation is "in control" at all times. This might be the default scheduler.

In both cases this functionality is a neither-fish-nor-fowl hybrid between objects and computational flow. ISTM it has to be so.

signals allow indeterminate order

One mixed blessing about signals is that since they are "broadcast" 1-to-N, it can be indeterminate what is to be evaluated in what order.

Pro: This gives schedulers flexibility.
Con: Computations can give different results.
- Continuation-based scheduling doesn't neccessarily do better. It could consult "random" to determine the order of computations.
Pro: Signals and their schedulers give us another handle to manage evaluation order dependency. A scheduler could co-operate with analysis by:
- Promising that evaluations will behave "as if" done in some particular order.
- Taking into account analysis
- Providing information that makes analysis easier, short of simulating the scheduler.
  - Probably promising that no signals escape the scheduler itself, or only a few well-behaved ones do.
Pro: Regardless of the scheduler, analysis can treat signals as a propagating fringe. Everything not reached by the fringe is unaffected.

Should signal-blockers cause copies?

I suggested earlier that combiners might promise non-mutation by blocking mutation signals out.

One way to code this is to make such combiners deep-copy their arguments. Should this situation automatically cause copies to be made? ISTM yes.

Not an issue: User confusion. There is no case where a user reasonably expects different functionality than he gets.
- He might expect a function to mutate an object and it doesn't - but why is he expecting that of a function that doesn't claim to mutate anything and moreover explicitly blocks that from happening? Bad docs could mislead him, but that's a documentation issue.
Not an issue: Performance.
- Performance is always secondary to correctness.
- Performance can be rescued by
  - copy-on-write objects
  - immutable objects
  - Analysis proving that no mutation actually occurs.

18 October 2011

Thoughts on the structure of Klink Libraries

Klink Libraries

Previously

I blogged some ideas about Library paths for Klink, the Kernel implementation I wrote. I listed several desiderata, based on lessons from the past. I also blogged about how I'd like bridge the gap between package as used by developer and package as used by user.

Additional desiderata

(The desiderata from the previous blog post, plus:)
Should co-operate well with development. Switching from use to development shouldn't require gross changes or compete with library support.
Can fetch libraries automatically with reasonable power and control
- In particular, automatable enough to support "remote autoloading" but ultimately should be under the user's control.
Support clean packaging

Fetching libraries: mostly lean on git

The well-loved version manager git provides most of what I'd want, out of the box:

Co-operates well with development (More than co-operates, that's what it's usually for)
Reasonably compact for non-development. You can clone a repo with depth=1
Fetching
- Via URL (git protocol or otherwise)
- Doesn't treat URLs as sexps - only a mild problem.
Finding out what's there to be fetched, in the sense of available versions (eg, looking for latest stable release)
```
git ls-remote --tags URL 
```
- But we have to distinguish tags and tags^{, which AIUI don't
  refer to versions.}
Secure digital signatures are easy
- Creating them
```
git tag -s
```
- Verifying them
```
git-verify
```
- Some pages about them:
  - http://learn.github.com/p/tagging.html
  - http://progit.org/book/ch2-6.html
Excluding local customizations from being updated
- This is possible with .gitignore and some care
- But customizations will live somewhere else entirely (See below)
Practices supporting stable releases. git-flow (code and practices) does this.
NOT a well-behaved heterogenerated tree of libraries.

Of course git does not support knowing that a repo is intended as Kernel code. Looking at filename extensions does, but that seems to require fetching the repo first. For the same reason, it can't easily be any file that "lives" in the repo. It should be something about the repo itself.

So the convention I propose is that the presence of a branch named --kernel-source-release indicates a branch of stable Kernel code. Tags on that branch would indicate available versions, so even if coders are working informally and doing unstable work on "master", only tagged versions would be offered.

But does keeping --kernel-source-release up to date require extra effort for the maintainer? IIUC, git can simply make --kernel-source-release track "master", so if a coder's workflow is organized, he needn't make any extra effort beyond issuing a one-time command. Branch tracking is intended for remotes, but seems to support this.

Should there be other branches, like --kernel-source-unstable or --kernel-source-development? I suspect they're not needed, and any use of unstable branches should be specifically configured by the daring user.

I'm not proposing to permanently tie Klink (much less Kernel) specifically to git forever. But it serves so well and is so well supported that I'm not concerned.

Where to put it all?

That addressed how we can fetch code. In doing so, it put some restrictions on how we can organize the files on disk. So I should at least sketch how it could work on disk.

The easy part

Of course one would configure directories for libraries to live in. Presumably one would distinguish system, local, and user.

The hard part

A mature set of libraries tends to look like a tree. But above, we had to treat each one a separate project.

Approaches:

Let the namespace tree exactly be the directory tree.
- Fatal con: Code repos want to "own" all their subdirectories. git will try to control everything in its working tree. It even wants submodules to covary with the working tree.
Have a dotty namespace where all the components are expressed in one name. So directories have names like "std.math.util.raw"
- Con: Yuck! There's everything wrong with this.
Reserve a name for subdirectories and .gitignore it.
- Con: Every level of subdirectory becomes two levels.
- Con: Still yuck.
Require that directories must be either inner nodes or leaves, never both. If a project directory becomes an inner node, move code in it to a directory with a special name, say "%main".
- Con: Requires moving subdirectories, possibly behind the back of other software.
Library filespace and project filespace are distinct. Stow the projects into a library tree. They live in some other directory as configured by user.
- Pro: Clean.
- Pro: Flexible.
- Pro: Lets local customizations live outside the project repo, as they should.
- Con: How is stow to know where a project's "root" in filespace is?
- Con (that the other approaches had too): Internal project filetree organization can still mess us up. What happens if its code lives in "src" instead of in "."?

I favor the "stow" approach.

Path configuration

But the stow approach still left issues of where exactly to stow things. We can't solve it in the file system. That would result in one of two ugly things:

Making each project represent the entire library filespace, with its real code living at some depth below the project root.
Making each project physically live in a mirror of the target filespace. This would have all the problems we were avoiding above plus more.

So I propose per-project configuration data to tell stow about paths. I'd allow binding at least these things:

prefix: The library prefix, being a list of symbols.
parts

For example,

((prefix (std util my-app))
   (parts
      (
         (source
            [,,src,,]
            ())
         (source
            [,,tests,,]
            (tests))
         (info
            [,,doc,,]
            ())
         (default-customizations
            [,,defaults,,]
            ())
         (public-key
            [,,pub_key.asc,,]
            ()))))

That would live in a file with a reserved name, say "%kernel-paths" in the repo root. As the example implies, the contents of that file would be sexps, but it wouldn't be code as such. It'd be bindings, to be evaluated in a "sandbox" environment that supported little or no functionality. The expressions seem to be just literals, so no more is required.

Dependencies and version identity

Surfeit of ways to express version identity

There are a number of ways to indicate versions. All have their strengths:

ID hash
- Automatic
- Unique
- Says nothing about stability and features
Release timestamp
- Easily made automatic
- Time ordered
- Nearly unique, but can mess up.
- Says nothing about stability and features
Version major.minor.patch
- Just a little work
- Expresses stability
- Expresses time order, but can be messed up.
Test-satisfaction
- Lots of work
- Almost unused
- Automatically expresses stability and features
- No good convention for communicating the nature of tests
`stable', `unstable', `release', `current'.
- Expresses only stability and currency
By named sub-features
- Just a little work
- Expresses dependencies neatly
- Expressive
- Not automatic

I chose sub-feature names, based on how well that works for emacs libraries, a real stress test. That is, I choose for code to express dependencies in a form like:

(require (li bra ry name) (feature-1 feature-2))

Co-ordinating sub-features with version identity

The other forms of version identity still exist as useful data: ID hash, version tags, results of tests. What makes sense to me is to translate them into sets of provided features. Do this somewhere between the repository and the require statement. require would still just see sets of features.

Desiderata for this translation:

Shouldn't be too much work for the developer.
- Probably easiest to support automatic rules and allow particular exceptions. With a git-flow workflow, this could almost be automatic. As soon as a feature branch is merged into "master", that version and later versions would be deemed to have a feature of that name.
Should be expressable at multiple points in the pipeline, at least:
- Annotations in the source code itself
- In the repo (In case the source code annotations had to be corrected)
- Stand-alone indexes of library identities. Such indexes would be libraries in their own right. Presumably they'd also record other version-relevant attributes such as signature and URL.
- Locally by user
Should be derivable from many types of data, at least:
- Branches (eg, everything on "master" branch has the feature stable)
- Tag text (eg, all versions after (2 3 3) provide foo-feature)
- Tag signature (eg, check it against a public key, possibly found in the repo)
- Source code annotations (eg, after coding foo-feature, write (provide-features ear lier fea tures foo-feature))
- Tests (eg, annotate foo-feature's (sub)test suite to indicate that passing it all means foo-feature is provided)
- ID
  - To express specific exceptions (eg, ID af84925ebdaf4 does not provide works)
  - To potentially compile a mapping from ID to features
- Upstream data. Eg, the bundles of library identities might largely collect and filter data from the libraries
Should be potentially independent of library's presence, so it can be consulted before fetching a version of a library.
Should potentially bundle groups of features under single names, to let require statements require them concisely.

Dependencies

With sub-features, we don't even need Scheme's modest treatment of dependencies, at least not in require. Instead, we could avoid bad versions by indicating that they lack a feature, or possibly possess a negative feature.

The usual configuration might implicitly require:

works
stable
trusted-source
all-tests-passed

The set of implicitly required features must be configurable by the user, eg for a developer to work on unstable branches.

Library namespace conventions

On the whole, I like the CPAN namespace conventions. I'd like to suggest these additional (sub-)library-naming conventions:

raw

This interface provides "raw" functionality that favors regular operation and controllability over guessing intentions.

dwim

This interface provides "dwim" functionality that tries to do what is probably meant.

test

This sub-library contains tests for the library immediately enclosing it

testhelp

This sub-library contains code that helps test libraries that use the library immediately enclosing it. In particular, it should provide instances of objects the library builds or operates on for test purposes.

interaction

This library has no functionality per se, it combine one or more functional libraries with an interface (keybindings, menus, or w/e). This is intended to encourage separation of concerns.

inside-out

This library is young and has not yet been organized into a well-behaved namespace with parts. It can have sub-libraries, and their names should evolve to mirror the overall library organization so that it can become a real library.

(inside-out new-app)

user

This user is providing a library that doesn't yet have an official "home" in the namespace. The second component is a unique user-name.

(user tehom-blog/blogspot.com inside-out new-app)
(user tehom-blog/blogspot.com std utility new-util)

Other roots

But there are other files that are relevant to libraries but are not source code. ISTM there's often a benefit in locating them relative to a given library. But they shouldn't have an identity in the library tree itself:

They're not source
Libraries mustn't accidentally use their identifier as the name of a normal source file.

So I would define other trees exactly paralleling the library tree. I would include at least:

docs

Organized documentation

info

Documentation specifically as info files

customizations

For customizations relating to a library. Always local.

(customizations std util my-app)

Format : List of bindings

default-customizations

For the recurring idiom where one initially configures with defaults, and user changes to them are preserved.

Format : List of lists, each being in the format (name value first-appeared-in-version)

files

General non-source files supporting a library. In particular, in relation to test or examples.

13 October 2011

Mutability And Signals

Mutability and Signals

Recently I've been working on Rosegarden, the music sequencer. It uses Qt which uses signals.

Signals implement the Observer pattern, where an object notifies "observers" via signals. A signal is connected to one or more "slots" in other objects. The slots are basically normal methods, except they return nothing (void). When a signal is emitted, Qt arranges for the slots to be called, other than those of deleted objects. So far, I find it works easily and elegantly.

This made me wonder: Could signals take the place of mutability in Scheme? And might that give us both referential transparency and reactiveness simultaneously?

There's not much support for signals for Lisp and Scheme. There's Cells, but it seemed to be conceived of as just a super-spreadsheet. I want to go much further and use signals to re-imagine the basics of object mutation.

Quasi-mutation: The general idea

Basically, we'd use signals between constant objects to fake mutability.

Objects can't mutate.

"Slots" are closures.
- Closures would typically know a target object. Possibly as environment as record.
Signals are emitted with respect to particular objects.
- Not "by objects". Again, we're not object-oriented. We're just indexing on objects.
- Ability to emit is controlled by access to objects and signal types.
- Indexing on one particular argument seems overly special, so I contemplate indexing on any relevant arguments. This is again similar to generic functions.
Signals can be connected to slots.
- The signals go to where the object's respective signal is connected. They are indexed on objects.
Constructors connect the signal replaced from the parts to the constructed object.
- More precisely, to a closure that knows the object.
- The closure would fully represent the objects' relation. For instance, mutable pairs might have the slots new-car and new-cdr with the obvious meanings.
- But not for immutable objects. Immutable objects' slots would not be new-car and new-cdr, they would raise error.
- The constructed object can access its part objects, in appropriate ways by its own lights. For instance, a pair object could retrieve its car and cdr objects.
- This particular signal replaced is not exposed.
- The details of replaced will be refined below.
Slots such as new-car will typically:
- Construct a near-copy of the object, with the new part in the old part's place. This effectively connects a new version of the object to the new part and disconnects it from the old part.
- Emit replaced with respect to the object, propagating the change.
"Normal" setters such as set-car! emit replaced wrt the old object with the new object as value.
- That's just the classical way. There's plenty of room to do clever new things with signals.
- As above, doing this to immutable objects causes error.
Constructed objects behaving this way would include at least:
- Mutable pairs (and therefore mutable lists and trees)
- Environments
- Continuations. While not often considered object-like, continuations have parts such as the current environment and their parent continuations.
External-world-ish objects such as ports react to signals in their own appropriate way, not neccessarily propagating them further.

As if

Literally implementing ever pair with at least two signals between itself and its car and cdr seems prohibitive if not impossible. Physically, it couldn't be the only mechanism of mutation. So I'm talking about a mechanism that acts as if it's continuous down to basics like pairs and lists, but really uses a more modest mechanism where it can (presumably containment, as now).

Circularity

My blog readers, being smart, will already be wondering what happens when replaced is propagated in a circular structure. Does it propagate forever, crashing the interpreter? That might go something like this: The signal propagates from pair to pair, repeatedly reaching its original location, pointlessly representing more new objects that differ only from the k*N nth cdr, and never finishing because there's always an (k+1)*N nth cdr to represent.

For instance, in the circular list above, suppose we gave object C a new value. It emits (replaced New-C C), which B receives in new-cdr. B emits (replaced New-B C) which both A and old C receive. Leaving aside A's reaction, old C then emits (replaced New-Old-C C) which old B again receives. A is going to be deluged with infinite signals, and we will keep making new copies of old B and old C. Worse, we make many new-Bs when we only have a place for one.

The too-easy answer would be to block all repetitions between the same two objects. That fails, because we re-emit on new copies. So also block repetitions where the objects are the same except that one is replaced by an updated copy?

So as a first draft, I propose that these built-in signal emissions:

Be each associated with a unique ID
Refuse to propagate that ID circularly

The ID would be an opaque ID equal? only to itself. It would be unique to a given cascade of emissions. That is, set-car! etc would make fresh IDs but new=car etc would use the ID of the replaced signal that provoked it.

One subtle point: It suffices for the signalled object itself to recognize repititions. Originally I thought repititions would have to be recognized across a transitive closure. But only the original object is connected to by its contents, and connections from new parts to new object don't receive any signal to propagate.

What about structural information?

The above is progress, but it discards structural information.

Consider an application that displays the first N items from a possibly-circular list on screen, and redisplays them just as needed (Say they are formatted in some time-consuming way and different screen locations can't share this work) The display would want to be notified of all the nth cdr occurences of a changed element. The approach above wouldn't permit that.

So when there's circularity, let's switch tactics and use polling to retrieve information as needed. Let any circular object provide a stream of all the replacements that occured in it for a given emission-ID. Applications only read the stream as far as they need to and the rest is not really generated.

But often a slot would prefer to get a simple new object, without replaying all the detailed changes.

Desiderata

To sum up the desiderata on this:

We want slots to be able to replay detailed changes.
We want slots to be able to just get a new object
We don't want mutated objects to have to guess what their target slots want, which may not even be the same for different slots.
We don't want mutated objects to have to do inordinate (even infinite) work preparing these signals.
We don't want to make assumptions about the topology of the mutated object. It can be an arbitrary directed graph.

A thought-experiment sequence of operations

The sequence of operations on the circular list above would go something like this. Here I use FOO.N to indicate the nth version of FOO.

We gave object C.1 a new car, V (we emit replaced.1 to it) Its change is:
- "Replace car with object V". We'll represent this as
  - The trivial accessor identity.
  - A promise to construct V (Trivial since we started off knowing it)
replaced.1 reaches C
We emit replaced.2 for C. Its change is:
- A disjunction (or) of
  - "Replace what you're holding with C.2"
    - identity
    - A promise to construct C.2
  - "Replace car of it with V"
    - car
    - Promise of V
replaced.2 reaches B
We emit replaced.3 for B.1. Its change is:
- A disjunction (or) of
  - "Replace this with B.2"
    - identity
    - A promise to construct B.2
  - A disjunction (or) of
    - "Replace cdr of this with C.2"
      - cdr
      - A promise to construct C.2
    - "Replace cddr of this with V"
      - cadr
      - Promise of V
replaced.3 reaches A.1
- We'll say the promises are not forced yet. If they are, that still forces these following operations to occur first.
replaced.3 also reaches C.1. Here we encounter a repitition. C.1 has already encountered a signal from this trigger. We don't want to emit another consequence of it to the same places. Instead we want to fold the signals together.

Let's stop operations and look at it

If we made replaced.4, its change would be:
- A disjunction (or) of
  - "Replace this with C.2"
    - identity
    - A promise to construct C.2
  - A disjunction (or) of
    - "Replace cdr with B.2"
      - cdr
      - A promise to construct B.2
    - A disjunction (or) of
      - "Replace cddr of this with C.2"
        
        cddr
        
        Again, a promise to construct C.2
      - "Replace cadr of this with V"
        
        caddr
        
        Promise of V

So replaced.4 resembles replaced.2. The big change is that instead of "Replace car of this with V", we have a cdr replacement. It pushes "replace with V" down to another cycle of the loop. That doesn't support the "display circular list" example, which was our motivation for doing this. So:

We need to support something that can mention both the car replacement and the cdr replacement of C.1
We need to effectively be able to adjust previous signals instead of emitting more.

I first considered and discarded an approach of rewriting the second disjunct into a conjunction.

Now the most tempting approach is to rethink the car replacement operation. It did not fully describe the object, it just described the new car, V.

Let's say that all such operations are in fact constructors, and in the case of "Replace car of it with V", we let the cdr argument default to the current cdr.

Timing issues:

The default can't be committed to until the signal and its consequents have stopped propagating.
We know this by Emission-Id.
So forcing one of these promises first causes the signal to propagate until it's done.

This sounds quite like what I would expect from a solution. Constructors "stand for" the objects they make, as is often the case in RT languages. We are getting a structure that's isomorphic to the original object (circular list of period 2). "Just mutate contents" remains a valid optimization as long as we're in a closed system (And a tempting one; I have to keep reminding myself that we're conceptually replacing mutation with signals so it won't do)

The new structure of replacement info

So replaced.2's replacement info actually looks like:

A disjunction (or) of
- "Replace what you're holding with C.2"
  - identity
  - A promise to construct C.2
- "Construct it with V as car"
  - Arguments
    - car : Promise of V
    - cdr : C.2's cdr value, defaulting to C.1's cdr value.

Default arguments

The action of replaced.3 on C.1 just provides a value for the cdr argument. Re-emission is not needed, nor wanted since at C's cdr the signal has reached a join.

So the default cdr argument is held in a sort of limbo until it's given or not. That's a lot like mutation in itself. So it too is managed by signals:

"All done". Signal has fully propagated, use defaults for anything not filled.
"Use this" Accessor Value. After receiving "use this", "All done" has no effect.

These are of course quasi-mutating the stream argument of replace. There's an infinite regress there, but since it is all behind promises that delay evaluation until after this is done, I think we can consider it as if the stream had had one unchanging value forever.

Structure of the change stream

So each "cell" of the stream of operations changes consists of a disjunction on constructors and values, rather than a cons cell. Each disjunct can be:

A promise of a value
A promise of a constructor parameterized by promises of values

So it's like an n-ary tree-stream.

Note that this naturally orders the change stream from shallow to deep. If it was ordered from deep to shallow we couldn't find elements in finite time.

This doesn't preclude specialized representations that (say) compact multiple simple cdr accesses to an nthcdr access. These could be represented as additional constructors in the disjunct.

Object identity

Above, I said "constructed object can access a part object", not "a part's value". Since objects no longer ever change value, the difference is subtle. It's just this: the object has a single set of signal connections. So it has a single identity. So there is a trace of object identity remaining.

One could represent identity value-wise by saying that values consist of a (classical) value and an "object-identity" value, and that object-identity values are opaque and unique except as shared by this mechanism. So signals are connected with respect to object-identity values.

It has a flavor of "the long way around", but it lets us treat objects entirely as values.

Object versioning

Different versions of an object have different Object-IDs. Imperatively this wouldn't be anything, since two simultaneous references to the same object can't have different values. But here, one can both mutate an object as far the the world is concerned and hold onto its old value. But it should never be the case that objects are eq? but not equal?. So different versions have different Object-IDs.

The equality predicates

Equality:

equal?

is what it normally is in Scheme or Kernel: The objects have the same current value.

eq?

A and B are eq? just if

(equal? identity-of-a identity-of-b)

=?

is just an optimization of equal?

Signals and immutability

I mentioned that I was thinking about immutability in regard to this. So far, I've just described how to duplicate mutability with signals.

For immutable objects, some or all slots would still get connections, but would raise error instead of propagating mutation.

But that's only half of the control we should have over mutation. We'd also like to guarantee that certain evaluations don't mutate even mutable objects they have access to, eg their arguments, the environment, and dynamic variables. The "foo!" convention indicates this (negatively) but doesn't enforce anything. "foo!" convention notwithstanding, we'd like to guarantee this from outside an arbitrary call, not from inside trusted combiners.

Blocking signals

So we'd like to sometimes block signals. If signals were emitted anyways, they'd be errors and would not reach their destinations. So if replaced is blocked, code either doesn't try to mutate objects or tries and raises an error. Either is consistent with immutability.

ISTM the simplest way to block signals is to disconnect their existing connections and connect them to error combiners. When the call is done, restore their original connections. However, that doesn't play well with asynchrous execution.

Instead, we'll make a copy of the original object that will (probably lazily) infect its parts with "don't mutate me in this scope".

Scope

For a traditional imperative language, where flow of control and scope are structurally the same, we could block signals in specific scopes, recursively. But for Scheme and Kernel, that won't suffice. What would happen if an object is passed to a continuation and mutated there? We've broken the guarantee that the object wouldn't be mutated. Any time we let objects be passed abnormally, this can happen.

We might try to:

raise error if affected objects are passed to continuation applications, or
"infect" the other scope with the signal restrictions.

Neither is appealing. In this mechanism, continuing normally is also passing to a less restrictive scope. And continuing normally should behave about the same way as continuing abnormally to the same destination. We also don't want error returns to permanently "freeze" objects.

So ISTM we must distinguish between continuing to a (not neccessarily proper) parent of the restricting scope (normally or otherwise) and continuing elsewhere. Signal blocks are removed just if control reaches a parent. This is essentially how Kernel guards reckon continuation parentage.

Doing this for all object parts

We'd usually want to say that no part of any argument to a combiner can be mutated. It's easy enough to treat signal connections from the root argobject. But we need to "infect" the whole object with immutability, and not infect local objects, which may well be temporary and legitimately mutable.

Since these arguments are ultimately derived from the root argobject, what we can do is arrange for accessors to give immutable objects in their turn. But they have to be only temporarily immutable - blocked, as we said above. And we'd prefer to manage it lazily.

So I propose that accessing a blocked object gives only objects:

whose replaced signals are re-routed to error, as above, until control "escapes normally" (an improper parent continuation is reached)
which are in turn blocked, meaning they have the same property infecting all of their accessors.

Non-pair containers

Non-pair containers are not treated by mechanisms like copy-es-immutable. But we want to treat immutably fully, so they have to be treated too. This is the case even for:

Environments
Encapsulation types. In this case, their sole accessor is required to do this, as all accessors are.
Ports. Administrative state or not, they can be immutable.
Closures. Anything they return is considered accessed and automatically gets this blockingness. Their internal parts (static environment, etc) need not be blocked.
Continuations. Like closures, what they return is considered accessed and automatically gets this blockingness.

Exemptions

We'd like to be able to exempt particular objects from this. Some combiners mutate an argument but shouldn't mutate anything else. There'd probably be a signal-block spec that would specify this.

Blocking signals to keyed dynamic objects

We can easily extend the above to deal with dynamic environment, but keyed dynamic objects are not so simple. Their accessors would be covered by the above if derived from the argobject or the dynamic environment, but they need not be.

So we need an additional rule: Keyed dynamic objects are blocked if accessed in the dynamic scope of the blocking. That's recursive like other blockings. Keyed rebindings in the dynamic scope aren't, because one might bind a temporary that's legitimately mutable.

Side note: I'd like to see a stylistic convention to differentiate between combiners that mutate their arguments ("foo!") and combiners that mutate something else, meaning either their dynamic environment or a dynamic variable (Say "foo!!")

Interface to be defined

I've already written a lot, so I'll leave this part as just a sketch. To support all this, we need to define an interface.

A means of defining new signal types
- Returns a signal emitter
- Returns a signal identifier for `connect' to use.
A signal scheduler. By general principles, the user should be able to use his own, at least for exposed signals. It's not too hard to write one with closures and continuations.
Means for the user to emit signals
Means for the user to connect and disconnect signals
- Not exposed for many built-in objects such as pairs.
- "capture all current", as for blocking
- "disconnect all"
block-all-signals: connects all signals to error continuation
- Possible argument `except'
(Maybe) block-signal: connects given signal to error continuation
A "dirty-flag" mechanism, often useful with signals.
Possibly a priority mechanism.

Some possibilities this raises

Use the signal scheduling mechanism for managing constraints. Since when we can enforce immutability, constraints become very tempting.
Provide general signals
Provide an interface to Lone-COWs, a sort of copy-on-write object optimized for usually being uniquely referenced/owned.
Supplying "out-of-band" signals to a debugger or similar. They really do need to be out-of-band.
Provide a broadly applicable interface for redo/undo. It could basically just capture historical copies of objects.

Glorified footnote: Aren't signals a lot like callbacks?

Somewhat. The differences are:

Signals are N to 1.
Signals are associated to some sort of signal-type object. One creates a signal of a specific type. In Qt, this is specified as "signal:" members of a class.
Signals are scheduled more freely.
Signals to "dead" objects do nothing. Their existence doesn't keep objects alive; in this they are like weak references.
The sending object does not explicitly track its observers,

Thinking about Immutability in Klink

Previously

Klink is my stand-alone implementation of Kernel by John Shutt.

Types of immutability

There are several types of immutability used or contemplated in Klink.

Complete immutability: Not much to be said.
Pair immutability: A pair that forever holds the same two objects, though those objects' contents may change.
List immutability: For instance, a list where you can change the contents, but not the number of elements.
Recursive structural immutability: An immutable tree of mutable non-pair objects.
Administrative immutability: Eg, what ports typically have. When you read or write to a port, it remains "the same thing" administratively.

Current status in Klink

Complete immutability

Available in C, used in some places such as symbols and strings. AFAICT there's no way to specify its use or non-use in Kernel; some objects just have it because of what they are or how they are created.

Pair immutability

This is the default in Klink. To get mutable pairs, use mcons instead of cons. I was persuaded of this by Getting rid of set-car! and set-cdr! and Racket, also here.

List immutability

No way to have a pair whose car is mutable but whose cdr recursively isn't.

Recursive structural immutability (Tree immutability)

Supported via copy-es-immutable

Administrative immutability

No C flag for it. Like complete immutability, some objects just have it because of what type they are.

Non-recursive structural immutability?

If you read section 4.7.2 of the Kernel report (copy-es-immutable), you may notice that Pair immutability and List immutability are actually extensions. So I figured I should at least advance a rationale for them.

Is non-recursive immutability worth supporting? ISTM it's already strongly suggested by Kernel.

Implied by an informal type: Some combiners take finite lists as arguments; all applicatives require a finite list as argobject. That distinguishes the finite list as at least an informal type. There's a predicate for it, finite-list?, but pairs that "are" finite lists can "become" other sorts of lists (dotted or circular), so it falls short of being a formal type. This seems like an irregularity to me. Structural immutability would solve it.
Implied by implied algorithms: For some combiners (eg reduce), any practical algorithm seems to require doing sub-operations on counted lists. That implies a structurally immutable list, because otherwise the count could theoretically become wrong; in practice it's saved from this by writing the algorithm carefully. So there are at least ephemeral, implied structurally immutable lists present.
Vectors: John once told me that he would eventually like to have vectors in Kernel. Vectors are optimized structurally immutable finite lists.
Opportunity for optimization: There's an opportunity for other optimizations where list immutability is used and recognized. In particular, caching a list's element count is often nice if one can be sure the list count won't change.

Comparison table


Type of immutability	Pair	List	Tree
Special care needed
for shared objects?	No	Only own tail	Yes
Exists in Klink?	Yes	No	Yes
Exists in Kernel?	No	No	Yes
What might be mutable?	car, cdr	elements	leaves

10 September 2011

Could backquote be adapted sensibly for Kernel?

Backquote vs Kernel

If you look at Kernel source, you'll quickly see constructions that a Lisper or Schemer would immediately use backquote for, like:

(eval ($if (null?/1 bindings)
           (list* $let () body)
           (list $let
                 (list (car bindings))
                 (list* $let* (cdr bindings) body)))
      env)

Quote and backquote are discouraged in Kernel. They tend to detach symbols from the environments they pertain to. They're not needed much, because in Kernel it's usually not neccessary to pass bare symbols around unevaluated.

But backquote is a lot more convenient than managing tree structure manually. It's more WYSIWYG. It lets you read and write something like:

(eval ($if (null?/1 bindings)
         ($` $let () . body)
         ($` $let
            (#unquote(car bindings))
            ($let* (cdr bindings) . body)))
   env)

Can we have it both ways? Yes.

Is it possible to have a variation of backquote that only exposes the sort of things manual list construction exposes, forms that don't naturally include symbols?

Yes. For instance, one could construct a form as traditional backquote does and evaluate that form in the current environment, without exposing the quoted form. The result would usually be evaluated a second time.

That gives the right result, but some pieces of the sexp get evaluated early (the low-quoted pieces), some get evaluated late (but still internally). It works but it seems dodgy.

With an equivalent result, one could say that backquote internally builds a form that, when evalled, recovers the original structure of the tree, except that low-quoted pieces are copied literally. Then it evals that form. In this formulation, there are no new quoted parts even internally, and all evaluation wrt the current environment happens in the final stage of backquote.

Any new machinery needed? Not much.

The backquote itself can be just an operative. The existing primitives supply enough functionality. The one piece of new language machinery needed is a means to introduce the low quote without causing special cases.

This rules out using a combiner, because backquote doesn't ordinarily evaluate combiners until it has settled the form structure, and low quote must be seen before that.

So low quote needs to be something that backquote understands particularly, which implies a unique object defined with backquote. But we can't convey this object by the usual means of binding it and writing its symbol. The logic is similar to the above: We'd need to use it before we evalled it and found it was special.

So we need a special way to read this object, thus the "#unquote" above. And that's about it.

24 August 2011

git-svn with less pain

How to make a svn clone with git-svn without spending 10 hours

I recently had the mixed experience of cloning the Rosegarden source from its SVN repo with git-svn.

First thing: Do not believe anyone who tells you to just launch:

git svn clone -s REPO-URL

If somebody tells you to do that, give them a mean glare from me. It will take hours and hundreds of megabytes. I quit after a total of maybe 10 or 11 hours, after maybe a dozen restarts with no end in sight. It had used about 350 megabytes of disk, and it wasn't anywhere near finished downloading.

Second thing: You're going to have to find the latest SVN revision number by hand. At least, I found no way to do it within the git workflow¹. You can find it remotely via svn but if you wanted to use svn you wouldn't be doing this.

What worked for me is:

git svn clone -r $REV:HEAD -s $URL $DIR

where $REV is a fairly recent SVN revision number; $URL and $DIR are obvious.

At the risk of making my own post redundant, this Stackflow thread showed me how to do it.

Footnotes:

¹ I tried just initting the directory with git-svn and launching various informational git-svn commands from in it; none worked.

18 June 2011

Kernel suggestions about modules

Two suggestions about Kernel modules

Foreword

The first part of this was inspired by my desire to bundle testing submodules with Kernel modules. The second part was inspired by my email exchange with John Shutt after I sent the first part to him.

An issue about modules

Klink is at the point where I'm starting to think about built-in test support. (Till now it has relied entirely on an external tester I wrote for emacs) That brought up an issue about modules.

The dilemma

As I see it, the dilemma (multilemma) is this:

IMO as a tester, tests ought to group with modules, for many reasons.
I doubt it makes sense for modules to have canonical names or identities, so I can't tell the tests about the module, I have to tell the module about the tests.
- So a test harness needs to be able to look at an environment and find the tests in it, if any. This probably implies they live somewhere in that environment.
Reserving a name to always have a special meaning, such as "tests" or "tests", seems wrong for many reasons.
make-keyed-static-variable wants to make a fresh environment.
- That requires always loading the tests first, which is problematic at best.
- Even if I can load a module before I load the tests for it, I'd still need to maintain a mapping from module to tests.
- That makes it impossible to define tests incrementally.
I could bind an object that a test-definer would write into. Say, an environment (In fact I will, to name sub-tests). But I'd still have to always place the binder around the entire module.
- It's an error opportunity, having to always remember to do that.
- It's structurally noisy.
- The same would have to be done for every functionality that wants to let individual modules "say something about themselves".
I could fake it with gensyms but with all the keyed variable support, it'd be a shame.

Possible solutions

Keyed setters

Have make-keyed-static-variable also make a setter, something with semantics similar to:

($vau (value) env ($set! env KEY value))

where KEY refers to the shared secret. If the "binder" return is analogous to `$let', this would be analogous to `$set!'. This would not make a fresh environment.

Con: Creates uncertainty and error opportunities about what environment is being defined into.
Con: Doesn't cooperate with constructs like `$provide'

Let accessors have defaults

Another possibility is for the accessor to optionally, if it finds no binding, make one and record it. Presumably it'd evaluate something to make a default.

Con: Same cons as above.

Smarter modules

Taking horn #5 of the multilemma as a first draft, provide a construction that surrounds a module with all the binders it should have.

Issues:

Its identity:
- Is the entry point `get-module' with a larger mandate?
- Or is it a separate thing? IMHO no.
- And should this be available on its own? Meaning "bind all the usual things but don't load anything". IMHO yes.
So which binders should it use?
- Interested ones are somehow registered externally.
What happens for binders that are defined after a module is loaded?
- Are they missing forever? That seems unfortunate.
- Alternatively, they could behave as if their binders had been used in the first place, since nothing can have accessed them yet (which must have made an error).
  - Pro: This neatly handles circular dependencies and even self-dependencies.
How may they be registered?
- If any combiner of the proper signature can be registered, stray code could be registered and subtly change the behavior of all sorts of modules. That'd be a serious problem.
- So ISTM registering should be something only make-keyed-static-variable or similar can do. We know it makes a normal, harmless binder.
What specifically registers a binder?
- make-keyed-static-variable, on some optional argument?
- A relative of make-keyed-static-variable that always registers the binder?
  - `make-keyed-module-variable'

Coda

I lean towards the 3rd solution, smarter modules.

What do you think?

Late addendum

This implies that `make-keyed-module-variable' takes code to make the initial object, so I better mention that. Probably it's a combiner that's run with no arguments in the dynamic extent of the call to `get-module'.

Reloading with secrets intact

ISTM surely there are situations where one wants to reload a module but keep its "secrets" (keyed variables, encapsulation types) as they are. The alternatives seem unacceptable:

Somehow prove that there's never a situation where one wants to reload a module that has secrets.
- Fatal con: There surely are such situations, eg minor bugfixes.
Require reloading everything to make sure that every instance of every encapsulation type etc is fully recreated.
- Con: This makes it very painful to use the interpreter interactively. It would be like having to restart emacs every time you fix an elisp bug.
Track everything affected and reload just those things.
- Con: Seems hugely difficult to track it all.
- Con: Still might require so much reloading that it's effectively a restart.
Name keyed variables etc in the usual Scheme way
- Fatal con: defeats their purpose.

A possible solution sketched in terms of the high-level behavior

Let secret-makers (make-keyed-static-variable, make-keyed-dynamic-variable, make-encapsulation-type) optionally be passed a symbol and a version-number (integer). Call those that aren't passed this argument "anonymous secret-makers".

Let each module retain, between loads in the same session, a mapping from (symbol version-number) to private info about the respective secret-maker. Anonymous secret-makers don't participate in the mapping.

When a secret-maker is being created, if its symbol and version-number match an earlier version, then the elements that it returns are to be `eq?' to what the earlier version returned, as if the "two" secret-makers were one and the same.

Anonymous secret-makers never satisfy that test. They behave as if they had a new symbol each time.

It is legal for secret-makers to have the same version-number across source-code changes, but then if changes occur within the same session, proper update behavior is not guaranteed.

Rationale: This allows "secret-makers" to usually carry over automatically, yet allows them to be overridden when desirable, eg when their old definition is wrong.

The version-number is separate to avoid making the user create a new name for each version. In principle it could also let an interpreter react intelligently to "going backwards", or warn on missing redefinitions without also giving false warnings for new versions.

We could have instead required the interpreter to treat source-code changes as new versions, but this seems an unreasonable burden and raises issues of code equivalence, and removes control from the user. But an interpreter is allowed to do this, and since it is legal for secret-makers to keep the same version-number across source-code changes, doing so requires nothing special.

To version a secret-maker, this requires changing source code, because the version-number lives in source code. This is less than ideal because it's really a session property that's being expressed, not a source property. But it is generally reasonable.

We require only that the elements that it returns be `eq?'. Requiring that the whole return value be eq? seems unneccessary, though it probably falls out.

Enabling mechanism: Cross-load memoization

The above all can be accomplished by cross-load memoization, which further makes it possible to make all sorts of objects eq? across multiple loads.

This requires mostly:

That modules retain an object across repeated loads.
That that object, relative to the module being loaded, be accessible for this purpose.
That the object be an environment, because it will map from symbol to object.
Code that:
- Takes a (name . version) argument
- Accesses the above environment relative to current module. It's the only thing that can access it.
- checks version
- re-uses old value if appropriate
- otherwise calls to create new object
- records current (name . version) and value

A recipe for using the secret-makers this way. Maybe simply:

($module-preserve (my-unique-name 1) (make-keyed-static-variable))

How should this object be shared?

Code defined in other modules shouldn't normally see this or use it, so these objects are not shared dynamically. They are shared statically. That implies affecting the environment that `get-module' makes, that loading runs in.

Presumably we'd use `make-keyed-static-variable' and share the binder with `get-module' and the accessor with `$module-preserve'.

Sketch of actual requirements on Kernel

The standard environment would contain `$module-preserve', defined as above.
`get-module' or similar would take an additional argument, the object, which it would make statically available to `$module-preserve'.
Any future `require' type mechanism would map module names to these objects.
- It would create new ones for new modules.

20 May 2011

Automatic forcing of promises in Klink

Automatic forcing of promises in Klink (addendum)

I had meant to explain also about auto-forcing in `$let' or `$define!', but for some reason I didn't. So I'm adding this now.

Background: In Kernel, combiners like `$let' and `$define!' destructure values. That is, they define not just one thing, but an arbitrarily detailed tree of definiendums.

So when a value doesn't match the tree of definiendums, or only partly matches, but the part that doesn't match is a promise, Klink forces the promise and tries the match again.

Unlike argobject destructuring, this doesn't check type.

Automatic forcing of promises in Klink

Automatic forcing in Klink

As I was coding EMSIP in Kernel, I realized that I was spending entirely too much time and testing to manage the forcing of promises.

I needed to use promises. In particular, some sexps had to have the capability of operating on what followed them, if only to quote the next sexp. But I couldn't expect every item to read its tail before operating. That would make me always read an entire list before acting. This isn't just inefficient, it is inconsistent with the design as an object port, from which objects can be extracted one by one.

What I do instead

So what I have now is automatic forcing of promises. This occurs in two destructuring situations. I've coded one, and I'm about to code the other.

Operatives' typespecs

Background: For a while now, Klink has been checking types before it calls any built-in operative. This operation can check an argobject piece by piece against a typespec piece by piece. That destructures it treewise to arguments that fill an array that is exactly the arguments for to the C call. It's very satisfactory.

Now when an argobject doesn't match a typespec, but the argobject is a promise, the destructure operation arranges for the promise to be forced. After that comes the tricky part. While the destructuring was all in C, it could just return, having filled the target array. But now it has to also reschedule another version of itself, possibly nested, and reschedule the C operation it was working towards.

All fairly tricky, but by using the chain combiners and their support, and by passing destructure suitable arguments, I was able to make it work.

Defining

As in `$let' or `$define!'. I'm about to code this part. I expect it to be along similar lines to the above, but simpler (famous last words).

Status

I haven't pushed this branch to the repo yet because I've written only one of the two parts, the destructuring. That part passes the entire test suite.

I haven't yet tried it with EMSIP to see if it solves the problem.

Does it lose anything?

ISTM this does not sacrifice anything, other than the {design, coding, testing, debugging} effort I've spent on it.

Functionality

It subtracts no functionality. `force' is still available for those situations when manual control is needed.

Restraint

The opposite side of functionality. Does this sacrifice the ability to refrain from an action? No. In every circumstance that a promise is forced, the alternative would be an immediate error. There could never have been a capacity to do the same thing except refraining from forcing.

But does it sacrifice the ability to make other code refrain from an action? No, the other code could have just called `force' at the same points.

Exposure

Does this expose whether a promise has been forced? No, not in any way that wasn't already there. Of course one can deduce that a promise has been forced from the fact that an operation has been done that must force that promise. That's always been the case.

Code size

The init.krn code is actually slightly smaller with this. The C code grew, but largely in a way that it would have had to grow anyways.

08 May 2011

Chains in Klink

As I mentioned in the previous post, Klink is a hybrid of C functions and a dispatch loop (_klink_cycle). The dispatch loop finds a "next" C-based combiner and runs it. This can be an applicative or an operative. Under the hood, operatives come in several C types.

About a week ago, I added chains to this set. Chains are really a vector of combiners. Evaling a chain queues all those combiners to run, then continues to them with the value it was passed.

Store and load

In order to make this work neatly, I also added two other new types of (pseudo)combiners, T_STORE and T_LOAD. Those are only available "under the hood" for optimization, because they know things about the stack that shouldn't be exposed.

T_STORE destructures the current value and stores on the stack a vector of the resulting objects.

T_LOAD builds a tree from recent values according to a template. "Leaves" of the tree index an element from a stored vector. That's a slight simplification: Leaves can also be constants, which are used unchanged.

About the stored vectors

These stored vectors collectively are somewhat like an environment in that they retain and retrieve values. But unlike the environment, they are distinct across distinct continuations that have yet to assign to them. In other words, if you use this mechanism to capture a combiner's return value, and then before using that value, you re-enter and capture it again, the new return value will not overwrite the old return value, nor vv. Both will be available in their respective continuations.

These stored vectors expire a certain number of frames down the stack, at the bottom of the chain. That keeps them from confusing other chains and keeps the vectors from hanging around uselessly forever.

Future improvements

At the moment, T_LOAD treats pairs of type (integer . integer) specially, as indexes. I should clean this up by making a dedicated type for indexes.

Together, T_LOAD and T_STORE may supersede T_CURRIED, a type of operative that combines current value with stored data in various possible ways. In order to do that, T_LOAD will have to can access the current value.

Eventually I may switch to a more linear spaghetti stack. Then I will have to change the indexing scheme from (integer . integer) to single integer.

I said earlier the stored vectors were somewhat like an environment but different. Eventually I'd like to use essentially this mechanism to optimize well-behaved environments, so we can index rather than always looking up a symbol. Those in contrast would have to be capturable by continuations. They would also need special care in cases where the environment is captured at some point - either skip those cases or make this data part of environments.

History

Originally instead of T_STORE and T_LOAD, I used special instructions that were assign values when the chain was run. This proved complex and difficult to work with.