30 October 2010

Kernel can't have and-let

Kernel is strict about boolean types

Kernel is the Scheme variant by John Shutt that I've been blogging about.

One thing about Kernel is that it does not let non-booleans be used as tests in conditionals. I.e., it doesn't use what other Lisps call the extended boolean, where every object other than false is deemed true.

In a sense, I agree with this. There is virtue in keeping the types straight. But OTOH, it blocks some Scheme conveniences, which then need to be reinvented.

So it can't use #f as an out-of-band object

A common Scheme (and Lisp) idiom is to use the false object as an out-of-band object. For instance, a search that finds nothing returns #f.

This is made easy by Scheme's latent typing. The #f object can go anywhere unless you explicitly check for type, and yet it is easily tested for. But that's never been a perfect plan. There's always the question of what you do when the search (or whatever) yields the #f object itself. Sometimes there's also the question of how to distinguish different non-fatal failures.
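For instance, in ordinary Scheme (a minimal sketch):

(define table (list (cons 'a 1) (cons 'b 2)))

(let ((hit (assoc 'c table)))
  (if hit
      (cdr hit)      ;; found: the value is the cell's cdr
      'not-found))   ;; assoc returned #f: no in-band value looks like it
;; => not-found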

But in Kernel that's not available, because #f and #t aren't supposed to mix with other objects, lest they cause $if to exhibit hard-to-find bugs.

Kernel does provide a special object called #inert for "no information" returns. John discusses this in the rationale for assoc, contemplating making assoc return a pair of a boolean and either the answer or #inert. For instance, a successful search could return:

(#t . The-Result)

and a failed search

(#f . #inert)

These are easily treated because Kernel makes multiple bindings available everywhere.
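For instance (a sketch; lookup, use, and handle-miss are hypothetical names):

($let (((found . result) (lookup key alist)))
   ($if found
        (use result)      ;; the interesting value
        (handle-miss)))   ;; result is #inert here; don't touch it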

But then he goes on to argue that under the hood, assoc will have to deal with a result object that is () on failure, and that the user can easily re-represent it as bool+result if desired. He therefore concludes that assoc should just return either () or the found object.

I'm not entirely sure I agree. This trick seems particular to assoc (and assq and a few others), not generally available. Won't other search functions still have to return bool+result? One could argue that it's more important to make assoc uniform with other searchers, and that the nil-on-failure version could also be provided.

Also, it has always struck me that assoc was often more difficult than it needed to be, because instead of returning the value that's usually sought, which is the cdr of the cell, it always returned the whole cell. It does this for two reasons:

  • Sometimes one wants the whole cell.
  • It needed to have a means of indicating failure. If it returned the cdr, that would sometimes be (), the same as the failure result.

Kernel's multiple bindings almost solve this problem. They solve it in the success case. But if assoc's failure value is (), multiple bindings won't help. One can't bind the result's cdr, because () doesn't have one. So there's another argument for assoc returning bool+result.
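Concretely (a sketch):

;; Fine when assoc finds a cell; an error when it returns ():
($let (((key . val) (assoc x alist)))
   val)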

So no and-let

One piece of convenience that this denies is and-let*. And-let* is a sort of hybrid of and and let*. It binds symbols successively to values, like let* does, but if any of those values is #f it terminates, like and does. and-let* makes it very convenient to alternate between getting values and testing that those values are suitable. I find that this situation comes up a lot.
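For instance, with SRFI 2's and-let* in Scheme (a sketch):

(and-let* ((entry (assoc key table))  ;; #f here stops the chain
           (value (cdr entry))
           ((number? value)))         ;; a bare test clause can stop it too
  (* 2 value))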

But since one can't (or shouldn't) mix booleans and non-booleans, there won't be many #f results to terminate it. They could only come from functions that return booleans, and binding their results makes little sense, since if and-let* succeeds those values can only be #t.

What to replace and-let* with

IMO the best replacement for `and-let*' is a pattern-matching `let*'. I wrote something like that as part of "match.el" in Emtest. It would go beyond what Kernel's multiple bindings can do and be a true pattern language. Constructor functions such as cons and list would have matching destructurers. It'd be something like:

(pattern-let*
   (
      ;;This clause has the same interpretation as in let
      (a value1)
      ;;Same interpretation as in Kernel's let but not Scheme's
      ((b c) value2)
      ;;Like ((d e) value3) in Kernel, because ~list~'s matcher
      ;;would destructure what ~list~ constructs.
      (,(list d e) value3)
      ;;Call a matcher associated with the constructor ~foo~ to
      ;;destructure value4 and bind its parts to f and g
      (,(foo f g) value4)
      ;;Result is not bound but must be 12
      (12 value5)
      ;;Result is not bound but must be #t
      (#t value6))
   ;;In the body, a thru g are all bound.
   (do-stuff a b c d e f g))

This is clearly quite powerful. It probably belongs in a pattern-matching module.

No type signatures

The non-mixing of booleans and non-booleans suggests to me that there should be some sort of a type signature in Kernel, at least to distinguish booleans from everything else. But there isn't.

The name #inert

Kernel also has a special object called #ignore. ISTM #ignore and #inert are too similar in name and role, and would be easily confused. So I have some suggestions for renaming #inert:

  • #n/a
  • #not-applicable
  • #not-available
  • #no-info

Kernel: rationale for not listing bindings

Understanding Kernel's rationale for not listing bindings

As I blogged, I have been reading about Kernel, a better Scheme.

One thing Kernel will not do is to list all of an environment's bindings. At first glance this seemed wrong, because the first rationale that I read for it seemed to contradict itself:

From (§6.7.1)

Presenting [$binds] as a library feature drives home the point that it doesn't introduce any capability that wasn't already provided by the language. In particular, for purposes of type encapsulation (§3.4), there is still no way for a Kernel program to generate a complete list of the variables exhibited in an arbitrary environment. Predicate $binds?, or the techniques used to derive it, could be used to formally enumerate such a list, by sequentially generating all possible variables (an infinite sequence) and testing each one; but there would be no way to know when all variables in the environment had been enumerated. This nontermination is critical because it means that no Kernel program can prove in finite time that two non-eq? environments are equivalent up to mutation; therefore, the rules governing predicate equal? (§4.3.1) do not require it to return false in any cases where predicate eq? returns true.

I read it thru 3 times, and he seems to have accidentally flipped polarity in his argument: He wants to avoid a situation where (called on two objects) eq? would give false and equal? true. But the thing that he fears will give rise to it is being able to completely compare non-eq? environments.

When I read this I got the impression that his rationale was mistaken. But then when I read the paper again, I saw a better rationale elsewhere.

Better rationale

When introducing environments, he gives a different rationale for linking equal? and eq? on environments:

(§4.8)

The behavior of equal? is tied to that of eq? to forestall the possibility of an implementation compromising the encapsulation of the type by allowing a program to determine, in finite time, that all bindings for one environment are the same as those for another. (Cf. the rationale discussion for the derivation of library predicate $binds?, §6.7.1.)

And (§4.9) reinforces it:

There is no need for this module to assume the Equivalence under mutation module, because environments are eq? iff they are equal?.

So the discussion in §6.7.1 was misleading. Now it seems that:

  • He fears that listing bindings will compromise the encapsulation of the environment type.
  • So he wants equal? and eq?, called on two environments, to always give the same answer.

Why does he want the environment type to be encapsulated in this way? Apparently to prevent mutation of parent environments. Also from (§4.8):

First-class environments offer a tremendous amount of control over what can be accessed from where - but only if there are limitations carefully placed on what can be done without explicit permission. In particular, whenever a combiner is called, it has the opportunity, in principle, to mutate the dynamic environment in which it was called. This power is balanced by omitting any general facility for determining the parents of a given environment, and also omitting any other general facility for mutating the parents, or other ancestors, of a given environment. (For an example of articulate environment- access control, see $provide!, §6.8.2.)

But what's the threat?

This unfortunately does not make it clear what the threat is. I just can't see any exploit that could use `list-bindings' to do something bad.

All I can guess is that it's that misbehaving code could get a list of all current bindings and comprehensively rebind all of them and thus fake the original environment, parents and all. But this is not the same thing as mutating that parent environment. And since one can sometimes correctly guess all the bindings, it is not clear how one can expect this to always work.

Why I suspect this prohibition isn't needed

Here's a positive argument that this prohibition is wrong: Containers such as alists can hold the same data that environments do: A set of bindings. Yet there is no problem being able to list all the elements of an alist or to compare alists `equal?' (and Shutt doesn't say there is).
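For instance (a sketch in plain Scheme):

(define a1 (list (cons 'x 1) (cons 'y 2)))
(define a2 (list (cons 'x 1) (cons 'y 2)))
(equal? a1 a2)  ;; => #t: same bindings, fully comparable
(eq? a1 a2)     ;; => #f: distinct objects
(map car a1)    ;; => (x y): the "bound names" are trivially listable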

kernel-interpreter

I'm writing a Kernel interpreter

Kernel is the Scheme variant by John Shutt that I've been blogging about.

For the past two and a half days, I've been adapting the Tinyscheme interpreter to run Kernel. Right now it doesn't compile. Some of the choices I'm making:

New eval functionality

The first thing I did was to write a new eval loop and support functions to make and evaluate vaus, which are basically Kernel's version of lambdas. Like in Tinyscheme, it's separated into parts, and in between the parts the interpreter does other stuff such as nested evals.

C functions instead of new opcodes

Rather than grossly extend the OP_ list, I've opted to add the capability for the interpreter to call a C function instead.

This is distinct from the old foreign function type. All these functions have the signature:

typedef pointer (*kernel_func)(scheme * sc, pointer args, pointer env);
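/* sc: the interpreter state; args: the operand tree (unevaluated when
   used as an operative); env: the dynamic environment of the call. */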

They are suitable for use as operatives, so a lot of the base Kernel functionality can directly be C functions.

Initially I coded some of the stuff as new opcodes, because that's the way TS has always done things. But I was never too thrilled about that way of doing things.

To support this, I:

  • Changed the interpreter to include a field for the next function to call.
  • Changed the stack frame code to save and restore that field.
  • Added one opcode, OPKERNELCALL, which calls that function (or complains if it's NULL).

So right now both opcodes and C functions are supported and opcodes are primary.

New code for registering foreign functions

Since I now support this new function type, I created a way of registering such functions: given a function's C name and its Kernel name (as a string), add it to a Kernel environment.

I wrote that facility for Tinyscheme in the first place, so it was pretty easily adapted. I corrected one thing I wish I'd done earlier: this registering can now apply to an arbitrary environment. Before, it always applied to the global environment (what Kernel calls the ground environment).

Added pseudo-object fields to interpreter

Tinyscheme creates some gotta-be-present objects as fields in the interpreter, such as #t, #f, and nil. Kernel has some new objects that I made similarly: #ignore and #inert.

One thing I did differently: For some obscure historical reason, TS has both objects and pointers for those fields. It seems pointless; the pointer must always point at the same object, so one could just take the object's address. So for the new ones, I left out the pointer field.

Formatting

Tinyscheme code has never really had a consistent indentation style. On request, I had added a style marking, "k&r", to the file local variables, which seemed to nearly match the original style. But I like GNU style better, so I switched the file local variable to that.

28 October 2010

Kernel: A better Scheme

Kernel

I have been reading the Revised-1 Report on the Kernel Programming Language1 about Kernel, a Scheme variant defined by John Shutt.

I found it an interesting read:

  • I like his language ideas.
  • I like his mechanisms for them.
  • He scrupulously explains his motivations for each choice.

About Kernel

There is little else out there about Kernel. There's this paper, and an implementation in Scheme called SINK. Also a Wikipedia stub and some commentary on Lambda the Ultimate. That's about it.

FWIW, the name Kernel is very hard to search for. Googling "Kernel Scheme" generated mostly false hits. I'd suggest a change of name.

Everything is first-class / fexprs done right

Kernel's primary claim to goodness is that it makes everything a first-class object, even the things that Scheme wants to treat specially such as `define', `lambda', `and', etc. It does this by having its evaluator distinguish between "applicatives" which are essentially Scheme functions, and "operatives", which are somewhat like macros except2 that they take an explicit environment argument.
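For a taste (a sketch based on my reading; $and2 is a made-up name): an operative receives its operands unevaluated, together with the environment of the call, and decides itself what to evaluate and where:

($define! $and2
  ($vau (x y) env            ;; env is the caller's environment
    ($if (eval x env)
         (eval y env)
         #f)))

Since ($and2 #f (launch-missiles)) never evaluates its second operand, the short-circuiting needs no macro machinery, and $and2 is itself an ordinary first-class value.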

Continuations

Kernel approaches continuations a little differently than Scheme. Unlike the rest of the paper, continuations were presented in a difficult way3, and it took me some careful reading to make sense of what he was saying.

Kernel has these primitives about continuations:

  • call/cc. The familiar Scheme construct.
  • extend-continuation. The purpose of this wasn't clear to me. It's not merely building a lambda form that will call the continuation. It seems to have to do with building trees of continuations, which guard-dynamic-extent treats specially.
  • guard-dynamic-extent. A more powerful, safer form of dynamic-wind. Each guard knows a continuation and only intercepts children of that continuation.
  • continuation->applicative. Makes a continuation into essentially a function. Apparently in Kernel continuations are not simply applied but must be converted first.

It would have been nice to see some examples for how these constructs would be used.

These extensions seem intended largely for handling exceptions, but he doesn't specify an error format. (I like a fat error format myself, but that's a topic for another time)

Multiple-value bindings

Kernel has multiple-value bindings, and they are available everywhere, not just in a special construct (as in Common Lisp). So you can write something like:

(let (((a b c) (list 1 2 3)))
   (list c b a))

And you can bind not only lists but trees:

(let (((a (b c)) (list 1 (list 2 3))))
   (list c b a))

I agree with this feature. I keep wanting this in Lisp and Scheme. I've built it with macros more than once, but it's nice to have it properly in the language.

But ISTM full pattern-matching is a fairly short jump away from this. As the writer of Emtest, I find pattern matching nearly indispensable.

Possible issues with full pattern-matching:

  • "What happens when a variable occurs twice in the pattern?"

    The occurrences must be eq?4 or the pattern fails to match.

  • "What if the pattern doesn't match?"

    Nothing new. It's essentially the same situation as when a list is too long or too short.

  • "So how do you match objects that are not lists?"

    Answer: Every constructor-like combiner5 should be able to have an associated pattern-matcher.

    NB, for encapsulations created by make-encapsulation-type (8.1.1), e should not be considered constructor-like. If we were to try to use e with a symbol argument as a matcher, there would be no correct return.

    • Always returning false would be wrong because the ctor might be instantiated with the same object, which should match.
    • Returning true would let the contents as returned by d bind to the symbol, which would break encapsulation and incorrectly make the returns of different calls to e eq?.
  • "If you have pattern-matchers, you need to indicate them, presumably by symbols. How do you distinguish those symbols from symbols to be bound?"

    I suggest a quasiquote-style syntax, where low-quote indicates a non-literal sexp.

  • "What about unexpected problems?"

    Having implemented more or less what I described for Emtest, I didn't find this especially problematic. However, I did not implement the quasiquote syntax because of how Elisp works.

  • "Is it really needed?"

    Maybe not in the core, but then again, maybe. ISTM one should be able to define functions that have patterns as argument lists.

More to come

I will probably post about Kernel again when I have more thoroughly digested the paper.

[fn:4] (Footnote from nowhere, I just thought this was interesting) I was surprised to find that John has, among his weblinks, a link to my old conlang Allnoun. Apparently he is interested in conlanging. Maybe conlanging and programming language conlanging go together.

Footnotes:

1 I suppose we should abbreviate it R-1RK. So should the next report be R0RK or just RK?

2 There's much more to be said, but this is the main thing. Read the paper (ps or pdf) to get the full story.

3 For instance, he refers to `$let/cc' and `apply-continuation' before he defines them, and the descriptions of extend-continuation and guard-dynamic-extent required two readings.

4 In Emtest's "match.el" I used `eql', but Kernel behaves differently than Elisp and has no `eql', so `eq?' is correct.

5 "Combiner" is Kernel's term for what you may think of as a function's actual workings. It differs from a function in that it does not cause its arguments to be evaluated. Applicatives (which you may think of as "normal functions") can be generated from combiners with "wrap".

23 October 2010

Review of On the Origin of Gravity

Review of Erik Verlinde's On the Origin of Gravity

I just read a paper that's been making the science headlines recently, On the Origin of Gravity and the Laws of Newton. In it Erik Verlinde develops an explanation of gravity as an emergent phenomenon driven by entropy.

It's a somewhat easier read than you might suppose. It helps to be familiar with the holographic universe principle. I get the impression that this might be the popular version of some longer paper or papers. He begins with an example of how entropy alone can give a force, in terms of a molecule (or rubber band) contracting due to assuming a higher-entropy configuration. He returns to this example thruout the paper.

His argument is often merely heuristic, which he admits. In other places it's elliptical; I get the impression that the ellipticality is not hiding anything too damaging, but I didn't try to verify it.

After reading, it's clear that the publicity surrounding it is completely appropriate. If even half of this paper's theory holds up it will be a milestone in physics.