## External testing frameworks revisited

Last week I blogged about external testing frameworks. Summary: There are (were) no good ones.

While writing the post, it occurred to me how I could extend my own Emtest testing framework to do this sort of thing. I hadn't designed Emtest for this sort of work, but I had designed it to be very modular and extensible.

Since then, I have gone ahead and added a runner to Emtest that does external testing. So I have to modify my previous position. This one, while it's young, is usable and you don't have to write in TCL.

### What it is

It's called "external.el" and it's rather like "expect", but all in Elisp.

Its test cases look almost the same as the existing test cases. They are declared in `deftest-3`, same as the others. They take a governor of `external` rather than `nil`. The test clause itself consists of, first, a list of bindings, and then a series of interaction forms. The bindings tell how to launch the external process and give other config information. The interaction forms each start with a string to be sent to the external process. They typically also contain a form that receives the response and tests it. The form can be omitted for prompts whose response is not tested.
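
To make that concrete, here is a sketch of what such a test case might look like. The macro shape, keyword names, and the `lambda`-style response check are my reconstruction from the description above, not Emtest's actual syntax:

```elisp
;; A hypothetical external.el test case.  The exact names (the binding
;; keywords, the response-form convention) are guesses, not the real API.
(deftest-3 my-klink-arithmetic
  external                                 ; governor, instead of nil
  ;; First, a list of bindings: how to launch the external process and
  ;; other config information.
  ((program "klink")
   (args ("--interactive")))
  ;; Then interaction forms: a string to send to the external process,
  ;; optionally followed by a form that receives and tests the response.
  ("(+ 1 2)"
   (lambda (response) (string-match "3" response)))
  ;; The form is omitted here because this response is not tested.
  ("(exit)"))
```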

I am already using this for testing Klink. It's much nicer to use than DejaGNU was, and it works, whereas Greg doesn't work here.

## There's another org2blog

Yesterday I found out that there's another emacs package called org2blog, just like mine. It does roughly the same task, too, except that it publishes to Wordpress instead of to Blogger and other Atom API sites.

It looks like its conventions are almost all different. The only thing really the same is the "#+TITLE: " file property.

I emailed the author, suggesting that we should at least try to co-ordinate. I haven't heard back yet.

## Can the stow model help me write design documents?

### The problem

I like to document my designs for my projects, such as Klink and Emtest. In theory writing down the things I decide or do should create a record of the design.

But I always face a problem. The issues, insights and decisions don't come in an organized manner. Quite the opposite. They trickle in a stream-of-consciousness manner, a few at a time, bearing no strong relation to the organization of the design. I want to bring order to this chaos without spending more time on it than it's worth.

I've tried several approaches:

• Buckling down and putting every new thing into its proper, final place. After doing that for too long, my conclusion is that it's far, far too much work, especially navigating to the right spot all the time.
• Separate files for major task areas: Design, decisions, progress. Before I switched to org mode, I wrote a meta-design of how to organize and some helper code and tried to stick to it. Again, after too much time and effort trying to keep it on track, I concluded that this just took far more work than it was worth.
• Tried out John Wiegley's remember / planner / emacs-muse, but it really wasn't what I was looking for.
• About this time I switched to Carsten Dominik's org-mode. It's wonderful for organizing text, but seemingly possesses no miracle big enough for this task.
• Accumulating everything as notes and placing them in their final place in the outline later. Problems:
  • Later never comes.
  • Notes get lost in the big amorphous blob of notes.
  • Often I write a note that wants to modify the design, but that I might or might not ultimately accept. It isn't clear what should be done with it. I could move it into the design, and maybe later have to move it back out and try to restore the design to its old state; or I could leave it where it is, where it itself becomes an informal focus of later additions.

I tend to choose the latter, so I am often finding recent notes and adding to them. This corrupts the file's organization too, so either way the design file degenerates.

• Tagging each incoming note. I hoped that by doing this, I could filter on tags and thereby see all the items that should be moved to a place in the design section. However, that offers little for the pieces that are maybe in, maybe out.

It's helpful up to a point. Theoretically I can filter on the tags, but in practice that hasn't done much for me. The tagset is simply nowhere near as fine-grained as my design outlines, and there's no reasonable way to make it so.

• Splitting my design file template into information sections. "Design", "Issues", "Info", "Communication", "Progress", "Notes". Again, it's helpful up to a point, but doesn't solve my problem.

One major problem is that items often evolve from one category to another. Eg, an issue will evolve into a task, which belongs in "Progress", or a design, which belongs in "Design". Progress items will spawn new issues, often as many as half a dozen. Decision items want to place one alternative into "Design" and archive the rest.

I'd really like to organize my designs even more strongly, separating out such facets as parts and behavior, and "cookbook" docs, etc. But without a much stronger approach, that's unlikely to happen.

### What stow does

In order to describe my shiny new idea, I first have to describe how Stow works. Stow uses indirection. The parts are in one place (in subtrees of /stow/ or /usr/local/stow/) and the organized tree (/usr/local/bin/, /usr/local/man/man1/, etc) lives in a different place. Please ignore the fact that /usr/local/stow/ is really a location in the filetree - that's an inescapable bootstrap issue and poses little practical problem.

What stow does is tell the organized tree to hold the parts indirectly, by symlinks. So the executable file /usr/local/bin/my-exec really points at /usr/local/stow/this-package-0.1/bin/my-exec.

It figures out what to hold where from each package's internal organization. Ie, /usr/local/stow/this-package-0.1/bin/ will be symlinked from /usr/local/bin/, because after we get past the stow prefix /usr/local/stow/this-package-0.1/, what we see is bin/.
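
The path arithmetic is simple enough to sketch. This toy function (the name is mine, not part of stow) strips the stow prefix and package directory from a path; what remains is where the symlink goes in the target tree:

```elisp
;; A toy illustration of stow's indirection.  Strip the stow directory
;; and the package directory from a file's path; the remainder, appended
;; to the target directory, is where the symlink should live.
(defun my-stow-link-location (stow-dir package path target-dir)
  "Return where PATH inside PACKAGE under STOW-DIR is linked from in TARGET-DIR."
  (concat target-dir
          (substring path (length (concat stow-dir package)))))

(my-stow-link-location "/usr/local/stow/" "this-package-0.1"
                       "/usr/local/stow/this-package-0.1/bin/my-exec"
                       "/usr/local")
;; => "/usr/local/bin/my-exec"
```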

• One can swap packages in and out very freely
• Consequently nearly all of the pain of managing package versions disappears. You install them to /usr/local/stow/ and you stow the version you want. Upgrade? Unstow the old one, stow the new one. Didn't like the upgrade? Reverse the procedure. Don't like the whole package any more? Unstow it and delete its subdirectory.
• Also, packages' install procedures can be much simplified, though most packages don't take advantage of this because they want to install on non-stow systems too.

### A possible solution

Now, this could be another shiny idea that proves to be more work than it's worth. But it works neatly for stow, so maybe it will work here.

#### Overview

My approach would have these parts in analogy to stow:

• /stow/ contents: Notes that mirror the existing structure of a document or subtree.
• symlinks: Items that don't have their own contents but act as if they were other items.
• /usr/local/: Documents or outline items intended to contain the symlinks as well as some text themselves.
• stow: A command relating to a note, that causes appropriate "symlinks" to it to be created in the target document.

#### stowable notes

##### They are like org notes

These might be just ordinary notes, or they could be notes in a specific place. They should be creatable by org-remember, so they should live in a refile target.

##### Multiple stowables might live in a note

But it shouldn't be the case that each note corresponds to exactly one target location. A single coherent note may well want to add parts to different documents. Also, its target may be so deeply nested that mirroring the entire path to it would be very annoying.

So ISTM best for each stowable item to parameterize on a prefix. Presumably that prefix is stored in a property. ISTM the right interface is a command or commands to move singleton "directories" into and out of the prefix. Binding-wise, ISTM best to overload `org-shiftmetaright` and `org-shiftmetaleft` to do that, since their ordinary function is not useful here.

Having stowables not correspond to notes creates a problem that stow also has: How to recognize a parts directory when there can be more than one. We could just record "stowable", presumably as a tag, and not let new stowables encompass items tagged "stowable".

In stow, items don't need to know whether they are currently stowed or not. Here, for user convenience, we'd probably want to at least tag stowed items "stowed".

##### Indicating the mirror structure

My first thought is to just mirror the headlines. It works but it's fragile for split trees. If the user renames an item in the target document, suddenly we can't find it anymore. Let's also have a backup: store the org-id of any node that our mirror structure splits on.

##### Creating the mirror structure

But we still haven't talked about creating the mirror structure. It would be a PITA to do manually. Doing it automatically with supervision doesn't seem difficult: it'd be much like the interaction of org-refile. Presumably we'd proceed in two stages:

• Creating a stowable
  • Interactively find a root item.
  • Make a fresh item with the same headline.
  • Store the root's id.
  • Tag the fresh item "stowable".
• Adding an item underneath a stowable
  • Choose an item in the respective target directory (whether our item is stowed or not).
  • Make a fresh item with the same headline.
  • Store its id.
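
The first stage might look something like the sketch below. It uses real org-mode functions (`org-id-get-create`, `org-entry-put`, `org-toggle-tag`), but the property name, tag name, and the notes file are my assumptions for illustration:

```elisp
;; A sketch of the "creating a stowable" steps above.  The notes file,
;; the STOW_TARGET_ID property, and the "stowable" tag are hypothetical.
(defun my-stow-make-stowable ()
  "Mirror the org item at point as a fresh stowable note."
  (interactive)
  (org-back-to-heading)
  (let ((headline (org-get-heading t t))  ; same headline as the root
        (id (org-id-get-create)))         ; store the root's id
    (with-current-buffer (find-file-noselect "~/notes.org")
      (goto-char (point-max))
      (org-insert-heading nil nil t)      ; fresh top-level item
      (insert headline)
      ;; Remember which target item this note mirrors.
      (org-entry-put (point) "STOW_TARGET_ID" id)
      ;; Mark it so new stowables can't encompass it.
      (org-toggle-tag "stowable" 'on))))
```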

I'd also like support to repair the mirror structure, if it goes astray. It'd go something like

• Visiting a possibly broken item:
  • Find where the org-id points.
    • No match, or no id? Uh-oh, that's not good.
      • Is there a headline match under the parent? If so, fix the id.
      • Otherwise, proceed as if creating a new item, except that it will have old contents.
    • Match found: fix the headline, if needed.

#### stow command

As with stow itself, there are a few issues under the surface:

• Conflicts
• Tree-splitting.

It's all fine to symlink /foo/bar/x into an empty directory, but what do we do when part of the path, eg /foo, already exists in the target?

Same thing that stow does: We split the tree. Sometimes to do so we must turn stowed "symlink" directories into real directories whose components are now symlinks. We may even have to do that recursively. Usually that suffices, because the user's intention was not to make competing, overlapping parts.

But sometimes it doesn't suffice. Two stowed parts might want to occupy the same position, and they can't both. Or maybe there is already text there.

Again we do as stow does: refuse to stow, report the conflict(s), and ask the user to resolve them manually. That's acceptable. If the user's design document really has such a structural conflict, there's no magic here to fix it. Of course he will need to actually change whatever is making the conflict.

It might be handy to have a tool to pull existing text that participates in a conflict into a new note or notes. The user could then resolve the conflict just by unstowing that text.

#### Integration with org-choose

org-choose is my old project to "choose" among org items, where choosing one item makes its siblings non-selected. It would be very nice to automatically stow the item that is selected, and unstow the alternatives (but not in that order). This probably wants its own tag, perhaps "stow-choice".

But what about impure choices, where some items build on others? For instance, in this very post, the item "Do the same in place" builds on "Weave a separate read-only document". Pretend for a moment that they are full stow-choice items.

Stowing "Do the same in place" by itself wouldn't work; what's it the same as? Stowing "Weave a separate read-only document" would be just wrong. If the two items have no structure, we're just out of luck. The second item will just have to be edited to be stand-alone.

So also pretend for a moment that the user has restructured both items cleanly: ie, each sub-item pertains entirely to one or entirely to both. Now there is something potentially stowable there. We just have to see how to do it right.

One thought is that the derived item is marked to be stowed into the base item, and if chosen, cascades to stow first the base and then itself. But this can't be entirely right: it implies that the derived item also overwrites the base item's text. That would be wrong, because what happens if we then choose the base item? Its text would be gone, replaced by that of the derived item. In any case, I don't contemplate using this mechanism to override text.

One could insist that the user rewrite further to remove conflicts, but the items are in fact alternatives. Sometimes we're lucky and one alternative builds on another in such a way that there is no conflict, but we can hardly rely on that. Making them never structurally conflict seems contrary to the spirit of alternative choices.

So instead, let this base/derived situation inform the choice of what to stow but not cause any base text to be overwritten. We'd recognize such situations just if a "stow-choice" alternative pointed into a "stow-choice" alternative in the same set.

## container-algebra

Yesterday I was reading some fascinating posts on sigfpe's blog, A Neighborhood of Infinity. In particular, "Fun With Derivatives Of Containers".

Fascinating stuff. The rest of this post is going to assume that you have read it, so check out the link.

One part got me blogging: He says "Pity we can't write X.F'[X]-F[X] to mean just the address, but '-' makes no sense in this context". In another place (somewhere) he (or someone) says that neither subtraction nor division makes sense for container types.

I got to thinking that that's not entirely true. Consider the case of a list from which we intend to pop an element. The input can be any list except the empty list. Following his notation, the type of a list of X's is:

1 + X + X^2 + etc


where '+' is understood as a discriminated union. Or equivalently,

sum(N,0,oo,X^N)


The type of the empty list is '1'. That's actually not specific to the empty list; 1 is the type of which there can be only one instance. Similarly, a boolean flag is of type 2, and an enumerated type of N elements is of type N.

So the possible inputs, list other than empty list, could be written:

sum(N,0,oo,X^N) - 1


And the type of `pop` as

sum(N,0,oo,X^N) - 1 -> sum(N,0,oo,X^N)


It's a little disconcerting that `pop` looks like "+1". When we look at it more closely, we see that's not really the case. In type theory, objects don't have identity beyond that of their type signature, so we can't see that all of the objects have moved one place closer to the list head. We can, however, let the upper limit not be infinity but merely approach infinity. Letting M be that limit, we see that pop's type signature is actually:

sum(N,0,M,X^N) - 1 -> sum(N,0,M-1,X^N)


which we can stretch to write as:

1/X


So we can have some semblance of division!
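
As a sanity check (my own derivation, not from sigfpe's post), write the list type in closed form. Removing the empty list is exactly multiplication by X, so pop undoes that multiplication:

```latex
L(X) = \sum_{N=0}^{\infty} X^N = \frac{1}{1-X},
\qquad
L(X) - 1 = \frac{1}{1-X} - 1 = \frac{X}{1-X} = X \cdot L(X).
```

So pop has type X·L(X) → L(X): formally, multiplication by 1/X.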

## What's "e"?

Reading thru his other posts on the topic (highly recommended), I find that amazingly, the Taylor series for exponentiation is applicable to container types, ie

exp(X) = 1 + X/1! + X^2/2! + X^3/3! + etc


where a container of the form

X^N/N!


is interpreted as a bag of N elements of type X. The "N!" comes from the fact that they are insensitive to permutations, ie, there are N! ways to compose a list of N elements of type X from those.

So I asked myself, what's the equivalent of "e" in this system? And the answer seems to be,

exp(1) = 1 + 1/1! + 1^2/2! + 1^3/3! + etc


A bag of any number of elements, each of which is void. Amazing.
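
If I'm reading this right, each term 1^N/N! is a bag of N voids, and a bag of N indistinguishable elements has exactly one inhabitant. So:

```latex
\exp(1) = \sum_{N=0}^{\infty} \frac{1^N}{N!}
\;\cong\; 1 + 1 + 1 + \cdots
\;\cong\; \mathbb{N}
```

That is, exp(1) is isomorphic to the natural numbers: a value of this type is just a count of how many voids are in the bag.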

## Can junk compaction be done with the Boehm garbage collector?

When I first started this blog, I blogged about junk compaction, which is akin to garbage collection, but instead of throwing out objects that have no use at all, it transforms objects that have a lot of "scratch space" into a more compact form when they are inactive.

The Boehm garbage collector doesn't give us access to generational information, which was my original idea for implementing junk compaction. It does give us a fair amount of flexibility, though. So let's revise the idea: Don't try to figure out whether the junky object is stale. Just assume that it is stale if referenced only by junk-releasing references - akin to weak pointers.

The Boehm GC does give us an opening for this: Finalization can leave an object alive if it points to itself. And we can make weak links by hiding pointers.

### First idea (not so hot)

• Hold objects to be junk-collected only thru non-junk-releasing pointers and weak links.
• So it will get finalized when it's no longer useful.
• During finalization
• Build the compact version (allocate it - is this OK during finalization in Boehm gc?)
• Change the weak links into real links to the compact version.

The last is the hard part because it requires finding all links to the object, unhiding them, and changing them to point to the compact version. GC_register_disappearing_link could probably do it "under the hood", but it doesn't expose any such functionality. It would have to be prearranged on every link anyway, which is the reverse of what we want: the non-junk-releasing pointers are special, not the junk-releasing ones.

### Second idea: Handles

It could work if junk-collected objects are only held by handles and normal (non-junk-releasing) pointers. The handle would hold it by a weak pointer and it would point to the handle by a normal pointer. Thus if nothing is holding the object directly (not thru the handle), it will soon be finalized and thus compacted.

When finalized, it would make the handle point to the compact object and indicate this. And of course make the pointer to the handle disappear.

### Third idea: Handles that become the compact object

This still leaves a superfluous handle. Is there some way to telescope it into an actual reference? We must do so all at once, not piecemeal, so that it doesn't mess up pointer comparison.

So instead, realloc the handle to be the size of the compact object and change it into that object.

This seems like it would work.

## External testing frameworks, yuck!

Yesterday I looked at external testing frameworks. I didn't find a satisfactory one. Looking back, I'm not glad that I spent my time on that.

Klink is now at the point where it makes sense to script tests for it. But it's not yet ready for internal, self-driven tests. So I looked for an external testing framework. I tried several, and none were satisfactory.

### Emtest

Let's get this out of the way first. Emtest is my emacs test harness. It does not do external testing. Making it do so would take more time than I want to spend on this. So I didn't look at it for this purpose.

#### But maybe I should have

In retrospect, doing so might have had more satisfying results than spending my time exploring the others. Now I'm wondering whether I should go and do that. It could run the external program as an inferior process. The major work would be writing the expect functionality.

But that's another topic. Right now I'm grousing. Er, I mean reviewing the current state of external testing frameworks.

### DejaGNU and Expect

DejaGNU does at least use POSIX-conformant test results. That's a format that I don't entirely agree with, but it's at least more reasonable than some formats out there. I could live with it.

Expect scripts are in TCL, and DejaGNU wants you to write tests in TCL. I had written an expect script years ago, back in the days of dialup. I vaguely remembered it being a PITA, but I thought that maybe my memory was wrong. It wasn't wrong, it was understated.

So I gave Greg a try.

### Greg

Greg seemed a lot more promising than DejaGNU. It is scripted in Scheme. Scheme vs TCL? No contest!

It is a little bit odd to be using Scheme for a Kernel project, especially since it wants Guile linked in. But I don't think it would be hard to convert it to Kernel.

It's also easier to control than DejaGNU. The command line options are pretty direct, and the documentation is better.

Unfortunately, Greg is badly out of date now[1]. In particular, out of the box, it is not capable of setting up the terminal-pair that it needs.

I inspected the code and made it set up the terminal-pair. After that it semi-ran but got the same error on every single pipe. Clearly there's something else that needs to be updated, but now it's less obvious.

I know that Guile converted the function that gives the error to return a wchar_t some years back, so I suspect that it may be involved. But that's buried deep in Guile's innards and I don't want to fool with it.

The other reason why I'm not fixing Greg is that at the moment Savannah's web interface is down, so I can't see whether there is a more recent version.

### DejaGNU/Expect again

So I gave DejaGNU another try.

While there is a fair amount of documentation for DejaGNU, it's all about how to set it up cross platform. I had to experiment to determine what it wanted. Key trick: It wants to be run in a directory of which subdirectories are named *.t00, *.t05 etc. It will then run all the *.exp files in those subdirectories. I only found that out by looking at an example.

More problems cropped up, related to TCL's stupid syntax. I got it working but I really don't look forward to maintaining this creature.

## POSIX-conformant format - not great but tolerable

[This started out as a note where I first talked about DejaGNU, but it grew into a section on its own]

What POSIX does right:

• The major taxonomy for tests results is almost right: PASS, FAIL, UNRESOLVED, UNSUPPORTED and UNTESTED.

DejaGNU extensions add four more categories: two for expected failures (XPASS, XFAIL) and two for known bugs (KPASS, KFAIL).

What it does wrong:

• Merges presentation format and report format. True, one could do further formatting on top of the given format, but that's a nitpick; the problem is that the merger cripples the report format. That leads to the next item.
• No provision for extra information, just a test name.

A test report format should allow and frame a fair bit of additional information: diagnostic information, additional taxonomic information (eg what caused a given dormancy), and context information (test location, exactly how to rerun the test). This information should be in a very flexible format. The TESTRAL spec is a good model for what should be allowed.
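
For illustration, a single report record carrying that kind of extra information might look like the alist below. The field names are mine, loosely following the categories above, not the TESTRAL spec itself:

```elisp
;; An illustrative report record (my field names, not TESTRAL's).
;; Beyond the bare name and result, it carries diagnostics, taxonomic
;; detail, and context for rerunning the test.
(setq my-example-test-report
      '((name           . "klink/pop-empty-list")
        (result         . fail)
        (diagnostics    . "expected error `empty-list', got nil")
        (dormancy-cause . nil)                      ; extra taxonomic info
        (location       . "tests/list-tests.el:42") ; context info
        (rerun          . "how to rerun this one test")))

(cdr (assq 'result my-example-test-report))
;; => fail
```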

Some might argue that analytical information has no place in test reporting. I disagree. It's not just convenient, it belongs there. It is entirely reasonable to treat the set of test cases for diagnosis as overlapping with or identical to the relevant set of test cases for regression testing. Indeed it would be wrong to omit relevant diagnostic cases from the regression test suite, and pointless to construct new cases for diagnostics where test cases exist. So obeying DRY, those cases should not be repeated in code, they should be used from where they sit, in test clauses.

• While the major headings of the taxonomy are nearly correct, POSIX doesn't place all results in the correct category. In particular, dormant results are placed into all the non-boolean categories: UNRESOLVED, UNSUPPORTED and UNTESTED. It also mingles dormant and bad-test results in UNRESOLVED.

Dormant tests are tests that are present in some form but deliberately not run for any reason. There is a wide spectrum of possible reasons for dormancy. Some are extremely lightweight (user temporarily marked a test dormant), some are heavyweight (system configuration, in all its many guises).

Splitting up dormant results into extra categories requires the test-runner to understand dormancy. That's not just extra work: in general, it can't. Dormancy can be marked for reasons far beyond what the test-runner could understand, and it's not even its job to understand them.

• The DejaGNU extensions move analysis and presentation logic into the test-runner. That's wrong.

However, it does make sense to recognize that a given test corresponds to a bug in a bug database, and to have persisting marks of expected failure, possibly even shared by a team. It shouldn't be the test-runner's job to do those things, however.

Whose job is it then? I long ago analyzed essentially this question for Emtest. I believe the same holds true for these uses of persistence. My conclusion is that:

• Persistence belongs on both sides of the test-report-presentation/test-runner interface
• Ultimate control wrt using persistence belongs on the presentation side.
• The test launcher has a read-only interest in the information: it can optionally skip running expected failures. It shouldn't be required to be that smart. It also belongs on both sides of the interface, with the higher-level launcher logic being on the presentation side.
• The test-runner has essentially no interest in persistence. At most, it incidentally calls libraries that know about persistence.

## Footnotes:

[1] AFAICT, it was updated in an apparent one-time effort in 2006.

[2] I'm not paying umpteen hundred to buy a printed copy of the spec. I'm going by D J Delorie's writeup of it.