17 November 2010

kernel-library-paths

I've been thinking about library paths in relation to Klink, the Kernel implementation I've been writing.

Some lessons from the past

  • Being familiar with Elisp's horrid treatment of library naming, I'd like to do library-finding right, and not hobble for years with a suboptimal system.
  • Scheme's use of lists as library specs is the right way to do it.
  • SRFI-103 suffered from a lack of encoding for identifiers that don't map cleanly to file names. Avoid that.
  • SRFI-103's attempt to escape some identifiers was overly special, resulting in a non-trivial knowledge burden. Don't do that; use something well-known.
  • It is really nice to allow a "main" file in order to group a file with its supporting packages.
    • From SRFI-103, though its "main^" syntax was awkward.
    • From my experience creating Elisp packages, especially Emtest.
  • Want to support at least these parameters:
    • An ordered list of directories to search.
    • An ordered list of file-name extensions
      • From SRFI-103 "to distinguish dialects, formats, and extensions"
  • Want to at least optionally choose among available versions of a library.

    I don't have to support that in the near future, but it should at least be specified.

  • It's nice to allow sub-features in "require" type statements
    • From emacs >= 22 `require' statements, and from Redhat rpm.
    • This clashes with the previous lesson.
  • Traditional major.minor.release version numbers have some problems, especially wrt project branching.
    • Git among others.
    • This clashes yet further with the previous two.
  • I'd really, really prefer versioning that doesn't make the version-update treadmill go any faster than it has to.
    • From too many turns on the version-update treadmill.
  • Reading the first syntactic datum is a wart on the procedure, and I may be able to solve it.
  • It is nice and convenient to have a way to abbreviate file locations, something more transparent than calling expand-file-name everywhere.
    • From SWI Prolog and from emacs' default-directory
  • Init files should be ordinary source files, implying that they should fit into this library scheme. They come in more flavors than you would expect.
    • Learned from many places, especially TinyScheme and emacs.
  • It's often convenient to be able to associate "support" files with a source file. Their extension should differ so that you can tell them apart.
    • Learned from Mercury and from C/C++
  • The normal source file extension should be unique. So use resources like Filext to check it.
    • From Prolog and Perl's clash over *.pl (Prolog was there first)
    • "KRN" is taken. Could take it anyways since it's unrelated, a font for a video game
    • "KRL" seems available.
  • Sometimes one would like to define multiple submodules within the same file.
    • Learned from Mercury and from an experiment I did in Prolog.

Answer 1: Just borrow URL encoding.

It's very simple and it's in widespread use.

Now, what to do about clashes? Eg, what if files "too%6Dany" and "toomany" both exist? (Decoded, both names read "toomany".)

Several answers present themselves:

  • Prefer them in alphabetical order?
  • Prefer them in almost alphabetical order, but sort the escape sequence before/after other files?
  • Check for the existence of clashing filenames and raise error.
  • Don't allow clashes. Enforce a one-to-one mapping of strings to filename encodings.

I like the last answer best. Encodings that are not mandatory are forbidden. In other words, for any file name it must be the case that:

(encode (decode filename))

gives the original filename again.
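
To make that concrete, here is a minimal sketch in ordinary Scheme (not Klink; the unreserved set and the helper names are my own assumptions):

(define (unreserved? c)
  (or (char-alphabetic? c) (char-numeric? c)
      (memv c '(#\- #\_ #\.))))

;; Escape everything outside the unreserved set as %XX, uppercase hex.
(define (encode str)
  (apply string-append
         (map (lambda (c)
                (if (unreserved? c)
                    (string c)
                    (let ((hex (number->string (char->integer c) 16)))
                      (string-append
                       "%"
                       (if (= (string-length hex) 1) "0" "")
                       (string-upcase hex)))))
              (string->list str))))

;; Undo %XX escapes; everything else passes through unchanged.
(define (decode str)
  (let loop ((cs (string->list str)) (acc '()))
    (cond ((null? cs) (list->string (reverse acc)))
          ((char=? (car cs) #\%)
           (let ((code (string->number (string (cadr cs) (caddr cs)) 16)))
             (loop (cdddr cs) (cons (integer->char code) acc))))
          (else (loop (cdr cs) (cons (car cs) acc))))))

;; The rule itself: a file name is legal only if it survives the round trip.
(define (canonical-filename? filename)
  (string=? (encode (decode filename)) filename))

;; (canonical-filename? "toomany")   => #t
;; (canonical-filename? "too%6Dany") => #f, so the clash cannot arise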

Answer 2: Encode "main" with a similar escape sequence.

That is, some escape sequence that would otherwise be illegal means "main". Let's use "%main" for that.

Answer 3: A file is a list of sexps

This will probably be the most controversial answer. Instead of reading the first syntactic datum in the source file, let the whole source file be a list. That is, when you start to load a file, pretend you just read an open bracket, and the matching close bracket is EOF. Each sexp in between is one element in the list.

That even handles the case where the file is just a script: Put Kernel's `$sequence'[fn:1] as the first symbol in the file.
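
The reading side is then just a few lines in any Scheme; a sketch, with the loader itself left out:

;; Read every datum, as if an open bracket preceded the first one and
;; the matching close bracket sat at EOF.
(define (read-file-as-list port)
  (let loop ((acc '()))
    (let ((datum (read port)))
      (if (eof-object? datum)
          (reverse acc)
          (loop (cons datum acc))))))

;; A script file whose text begins with the symbol $sequence reads
;; back as ($sequence form1 form2 ...), ready to evaluate as one form.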

Tentative answer 4: Library namespaces for special purpose

These would include at least:

  • Standard preludes for an implementation
    • Say the implementation is called "acme"; the prefix would be:
      (implementation acme) 
      
  • System rc files for various purposes
  • Personal rc files for various purposes

Tentative answer 5: Resolving the treatment of versioning.

Recapping, the things I wanted were:

  1. Want to be able to choose among available versions
  2. Use sub-features for version requirement.
  3. Version IDs need not be major.minor.release
  4. Don't make the version-update treadmill go faster than it has to.

If we were to use traditional major.minor.release version numbers, that would satisfy #1 but scuttle #2 and #3.

My tentative answer is to eschew the false economy of not reading a file before you need it. Our code should just partially evaluate the file and see what it places in a "subfeatures" area[fn:2]. If that's ambiguous, then we have a problem - but we always would have in that case.

If efficiency is required, do it behind the scenes. Possibly pre-compile data for individual files, or even index the entirety of the code available on a given system.
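
As a sketch of what that lookup might look like in Kernel itself, following footnote 2; `lib-env' stands for the environment obtained by partially evaluating the library file, and the feature names are invented:

;; The library's side: publish its subfeatures as an environment.
($define! subfeatures
  ($bindings->environment (fast-math #t) (utf-8 #t)))

;; The resolver's side: does the library behind lib-env offer
;; fast-math?
($binds? ($remote-eval subfeatures lib-env) fast-math)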

For #4, I have a tentative scheme involving automated tests and code coverage tools, but that's for another time.

Answer 6: Submodules within files

If a library path resolves partway thru to a file rather than a directory, look in that file for submodules. IIRC this is a direct clone of how Mercury does it.
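
Here's a sketch of that rule in ordinary Scheme, reusing `encode' from the earlier sketch; ".krl" is the extension mooted above, and error handling is omitted:

;; Walk the spec elements; as soon as one names a file, the remaining
;; elements name submodules inside it.
(define (resolve-spec dir spec)
  (if (null? spec)
      (values dir '())                   ; a pure directory path
      (let* ((part    (encode (symbol->string (car spec))))
             (as-file (string-append dir "/" part ".krl")))
        (if (file-exists? as-file)
            (values as-file (cdr spec))  ; the rest are submodules
            (resolve-spec (string-append dir "/" part)
                          (cdr spec))))))

;; (resolve-spec "/usr/lib/klink" '(foo bar baz)) returns
;; ".../foo/bar.krl" and (baz) when bar.krl is a file.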

Footnotes:

[fn:1] Equivalent to Lisp `progn' or Scheme `begin'.

[fn:2] Kernel is basically re-inventing these things with first-class environments. So very likely subfeatures will be an environment accessed by key from the main environment that the library provides.

05 November 2010

COW-stacks

Why

Klink (like Tinyscheme) has two representations of its stack:

  • As a list.
    • Pro: It allows reusable continuations
    • Con: It's very inefficient.
  • As a resizable array
    • Pro: It's much more efficient
    • Con: It does not allow reusable continuations.

Neither choice is very appealing.

The answer

Thinking about the problem, I invented what I call a copy-on-write stack. It's really a set of arbitrarily many stacks. The multiple stacks can share their deeper elements, while cheaply pushing and popping their shallow elements.

I'm not actually going to add COW-stacks to Klink any time soon, but I thought about how they would work and I wrote it down. Stray references below to Klink and continuations are due to the context in which I was thinking about COW-stacks.

How: COW stacks

Entities, briefly:

SE
Stack element
LS
Linear section (of stack elements)
Slider
The part that's visible from outside; the stack as a container.

Operations, briefly:

  • push
  • pop
  • capture a continuation
  • finalize a continuation

Overall idea

We hold a tree of linear sections of stack elements. The sliders, representing states of execution, can move along the tree, and can be cloned for continuation capture. This is all to work as if each continuation capture had copied the entire stack.

Desired properties
  • The most common operations go very fast. In particular, when there's only one slider, this should do almost nothing that an ordinary stack would not do
  • Allow arbitrarily many partly-shared stacks.
  • Minimize copying the stack.
  • Each stack works as if the entire stack had been copied

Entities in detail

Stack element
  • Essentially what a Tinyscheme dump-stack element is now
  • Parts:
    • Next instruction (ie, to be continued to)
    • Arguments
    • Possibly data
      • Rethinking data function types: "data" may just be a special argument that it passes to itself, one that our special data-function type gets from its car. Do we ever need to actually store that?
    • List of dynamic bindings
      • Could cache the identity of the next SE, if it never moves in memory.
Linear sections
  • A linear section of the tree. Generally associated to one or more sliders. It keeps a list of these sliders roughly in order from shallowest to deepest. Ie, highest index to lowest.
  • Parts:
    • Reference to its parent linear section
      • For the root, this is NULL
    • Index into its parent linear section
    • A growable array of stack elements
    • Length of that array
    • Pointer to a slider, understood as the head of a list.
      • Can be NULL (for an empty list) if this linear section has become unoccupied.
Slider
  • The object representing a state of execution.
  • Owners:
    • Captured continuations
    • The interpreter
  • Parts:
    • Reference to its LS (linear section)
    • Index into that LS's array of stack elements.
    • Reference to the "next" slider, understood to be used as a cell in a list.
      • For the last element, this is NULL
      • Would be nice for this to be a weak reference, since it shouldn't keep sliders alive.
    • States:
      surely-shallowest
      This element is the head of the list
      maybe-shallowest
      This element, while not the head of the list, may well be equally as shallow. It's not guaranteed to be, since we mostly don't conform marks when we move other sliders.
      deeper-than-this
      This element has been moved deeper than its "proper" position in the list. It can remain out of position indefinitely.
      dead
      This node represents an unreferenced slider and can safely be removed at any time. That's just if we choose strategy (2) for removing sliders.
      normal
      This node is none of the above.
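
As a concrete sketch, here are the three entities as Scheme record types; Klink itself would use C structs, the field names just follow the "Parts" lists above, and the growable array is modelled as a plain vector:

(define-record-type <stack-element>
  (make-stack-element next-instruction arguments data dyn-bindings)
  stack-element?
  (next-instruction se-next-instruction)  ; ie, to be continued to
  (arguments se-arguments)
  (data se-data)                          ; possibly absent
  (dyn-bindings se-dyn-bindings))         ; list of dynamic bindings

(define-record-type <linear-section>
  (make-linear-section parent parent-index elements sliders)
  linear-section?
  (parent ls-parent)                       ; parent LS; #f for the root
  (parent-index ls-parent-index)           ; index into the parent LS
  (elements ls-elements ls-set-elements!)  ; vector of SEs, grown on demand
  (sliders ls-sliders ls-set-sliders!))    ; head of the slider list; may be #f

(define-record-type <slider>
  (make-slider ls index next state)
  slider?
  (ls slider-ls slider-set-ls!)            ; the LS this slider sits in
  (index slider-index slider-set-index!)   ; into that LS's element vector
  (next slider-next slider-set-next!)      ; next slider in the list; #f at the end
  (state slider-state slider-set-state!))  ; surely-shallowest ... dead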

Two strategies for removing a slider

  • Traverse the list and when we find it, perform list surgery.
  • Mark it dead and when it's encountered, discard it.

I have no strong opinion on which is better.

Operations in detail

  • pop
    • These are the possible situations:
      • Decrement within linear section
        • When: index > 0
        • Action:
          • Decrement the index.
          • If state is:
            surely-shallowest
            Make new list of sliders, keeping this slider on list.
            (other states)
            Set it deeper-than-this.
            dead
            Shouldn't happen.
      • Decrement out of linear section to previous LS.
        • When: index == 0 and LS has a parent
        • If state is:
          surely-shallowest
          Make new list of sliders, removing this slider from the list.
          (other states)
          Do nothing special for this step.
          dead
          Shouldn't happen.
        • Remove it from the list
          • Clone a new slider and mark this one dead?
          • Extract it from the list?
        • Add it to the parent LS
          • Index is the index that the child LS had into its parent.
          • State: maybe-shallowest
      • Decrement out of root linear section
        • When: index == 0 and LS has no parent
        • Stack is empty. Interpreter is done.
  • push
    • These are the possible situations:
      • State is surely-shallowest
        • Remarks: This state does nothing special
      • State is maybe-shallowest
        • Compare to the actual head's index. If this slider is:
          • As deep (or deeper):
            • Mark that one maybe-shallowest
            • Remove ours from the list.
            • Mark ours (or its clone) surely-shallowest
            • Place ours (or its clone) at the beginning of the list.
            • Continue as surely-shallowest.
          • Not as deep
            • (We needn't mark ours normal because we're about to remove it)
            • Proceed as for normal.
      • State is normal or deeper-than-this
        • Make a new LS
          • Its parent is the old LS
          • Its index into parent is our old index
        • Remove this slider from the list.
    • In each case, having made sure we are safe, we add another SE:
      • If the array isn't large enough, enlarge it.
      • Write an element to the LS's array at our index
      • Increment our index by 1
  • capture a continuation
    • Make a new slider. Insert it behind the old slider, same index.
      • Index: Same as old one
      • State
        • If old slider is surely-shallowest, make the new one maybe-shallowest.
        • Otherwise give the new one the same state as the old one.
  • finalize a continuation
    • Remove its slider from the list
  • Initialization
    • Make one linear section
      • Initialize its array of SEs.
    • Make one slider, surely-shallowest, index 0, pointing at that linear section.
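
In code, the common fast path of push, using the record types sketched above; array enlargement and the list surgery of the other cases are omitted:

(define (push-fast! slider se)            ; slider is surely-shallowest
  (let ((ls (slider-ls slider))
        (i  (slider-index slider)))
    (vector-set! (ls-elements ls) i se)   ; write an SE at our index
    (slider-set-index! slider (+ i 1))))  ; increment our index by 1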

Supporting operations

Conform sliders
Conform a first element that moved or was removed.
  • Remarks
    • This was just intended for when the first element moves. Other elements mostly stay in place, and are only reconsidered if first element happens to see them while it's rebuilding a list.
    • We'll examine the next elements, collecting a new list that we'll give to the LS. We'll also conform the portion of the list that we see, and we'll make sure that that portion extends as far as the next-shallowest index.
  • Action:
    • (Simple case, for comparison) If there's no next element, we're done. New list is this element or nothing.
    • If an element is this slider itself, ignore it. We'll insert it when we have surely decided its position.
    • If there is one and its state is:
      surely-shallowest
      This shouldn't ever happen.
      dead
      Ignore this slider. It won't be part of the new list.
      (other)
      We'll put it in rough order in the collected list.

      Possibilities of its index relative to our slider:

      • Its index is shallower (presumably only by 1)
        • Set our slider's state normal since we can rule out the other states.
        • Do we know a definite head yet?
          • No:
            • This is now the definite head.
            • Mark it surely-shallowest
            • Record min-index
            • Record reference to its cdr, the tail cdr.
          • Yes:
            • Leave it in its place in the list
            • If its index is:
              • Same as min-index
                • Mark it maybe-shallowest
              • Larger than min-index
                • Mark it normal
              • Smaller than min-index
                • This shouldn't ever happen
            • Record reference to its cdr, the new tail cdr.
      • Its index is the same as our slider's.
        • Remarks: We may or may not want to also mark other nodes, since they may tie for shallowest.
        • Construct an intermediate list consisting of:
          • Our slider, if we're leaving it on the list
          • The new slider
          • Append the deeper-than-this list, if any.
          • Append the new slider's tail.
        • Remark: If there's no deeper-than-this list, this construction amounts to consing our slider to the new slider.
        • Do we know a definite head yet?
          • No:
            • Mark our slider surely-shallowest
            • Mark the other slider maybe-shallowest
            • Return the intermediate list
          • Yes:
            • Add it as the tail of the collected list
            • Return the collected list
      • Its index is larger than our slider's:
        • If other node is deeper-than-this:
          • Queue it to a deeper-than-this list
            • We'll add that list when we construct an intermediate list just before returning.
            • It's a queue so nodes stay in order.
          • Continue (ie, examine next node)
          • Remarks:
            • There's no point trying to re-mark it.
              • It won't match min-index
              • It won't match our slider's index.
        • Otherwise:
          • Same as for same index.
    • If there's no next element (the general case this time):
      • Construct an intermediate list consisting of:
        • Our slider, if we're leaving it on the list
        • Append the deeper-than-this list, if any.
        • No tail
      • Treat it as in maybe-shallowest, same index.

30 October 2010

Kernel can't have and-let

Kernel is strict about boolean types

Kernel is the Scheme variant by John Shutt that I've been blogging about.

One thing about Kernel is that it does not let non-booleans be used as tests in conditionals. Ie, it doesn't use what other Lisps call the extended boolean - every object other than false is deemed true.

In a sense, I agree with this. There is virtue in keeping the types straight. But OTOH, it blocks some Scheme conveniences, which then need to be reinvented.

So it can't use #f as an out-of-band object

A common Scheme (and Lisp) idiom is to use the false object as an out-of-band object. For instance, a search that finds nothing returns #f.

This is made easy by the latent typing that Scheme has. The #f object can go anywhere unless you explicitly check for type, and still it is easily checked for. But that's never been a perfect plan. There's always the question of what you do when the search (or w/e) gives the #f object itself. Sometimes there's also the question of how to distinguish different non-fatal failures.

But in Kernel that's not available, because #f and #t aren't supposed to mix with other objects, lest they cause $if to exhibit hard-to-find bugs.

Kernel does provide a special object called #inert for "no information" returns. John discusses this in the rationale for assoc, contemplating making assoc return a pair of a boolean and either the answer or #inert. For instance, a successful search could return:

(#t . The-Result)

and a failed search

(#f . #inert)

These are easily treated because Kernel makes multiple binding available everywhere.

But then he goes on to argue that under the hood, assoc will have to deal with a result object that is () on failure, and that the user can easily re-represent it as bool+result if desired. He therefore concludes that assoc should just return either () or the found object.

I'm not entirely sure I agree. It seems like this situation is particular to assoc (and assq and a few others), not generally applicable. Won't other search functions still have to return bool+result? One could argue that it's more important to make assoc uniform with other searchers, and the nil-on-failure version could be provided also.

Also, it has always struck me that assoc was often more difficult than it needed to be, because instead of returning the value that's usually sought, which is the cdr of the cell, it always returned the whole cell. It does this for two reasons:

  • Sometimes one wants the whole cell.
  • It needed to have a means of indicating failure. If it returned the cdr, that would sometimes be (), the same as the failure result.

Kernel's multiple bindings almost solve this problem. They solve it in the success case. But if assoc's failure value is (), multiple bindings won't help. One can't bind the result's cdr, because () doesn't have one. So there's another argument for assoc returning bool+result.
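
To illustrate, here's a sketch of the bool+result style in Kernel; `my-assoc', `use', and `handle-miss' are invented names, not from the report:

;; A bool+result assoc: returns (#t . value) on success, (#f . #inert)
;; on failure.
($define! my-assoc
  ($lambda (key alist)
    ($cond ((null? alist)             (cons #f #inert))
           ((equal? (caar alist) key) (cons #t (cdar alist)))
           (#t                        (my-assoc key (cdr alist))))))

;; Kernel's tree binding picks both parts apart at once:
($let (((found . value) (my-assoc key alist)))
   ($if found (use value) (handle-miss)))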

So no and-let

One piece of convenience that this denies is and-let*. And-let* is a sort of hybrid of and and let*. It binds symbols successively to values, like let* does, but if any of those values is #f it terminates, like and does. and-let* makes it very convenient to alternate between getting values and testing that those values are suitable. I find that this situation comes up a lot.

But since one can't (or shouldn't) mix booleans and non-booleans, there won't be many #f results to terminate it. They could only come from functions that return booleans, and binding their results makes little sense, since if and-let* succeeds the values can only be #t.

What to replace and-let* with

IMO the best replacement for `and-let*' is a pattern-matching `let*'. I wrote something like that as part of "match.el" in Emtest. It would go beyond what Kernel's multiple bindings can do and be a true pattern language. Constructor functions such as cons and list would have matching destructurers. It'd be something like:

(pattern-let*
   (
      ;;This clause has the same interpretation as in let
      (a value1)
      ;;Same interpretation as in Kernel's let but not Scheme's
      ((b c) value2)
      ;;Like ((d e) value3) in Kernel, because `list''s matcher
      ;;would accord with `list'.
      (,(list d e) value3)
      ;;Call a matcher associated with the constructor ~foo~ to
      ;;destructure value4 and bind its parts to f and g
      (,(foo f g) value4)
      ;;Result is not bound but must be 12
      (12 value5)
      ;;Result is not bound but must be #t
      (#t value6))
   ;;In the body, a thru g are all bound.
   (do-stuff a b c d e f g))

This is clearly quite powerful. It probably belongs in a pattern-matching module.

No type signatures

The non-mixing of booleans and non-booleans suggests to me that there should be some sort of a type signature in Kernel, at least to distinguish booleans from everything else. But there isn't.

The name #inert

Kernel also has a special object called #ignore. ISTM #ignore and #inert are too similar in name and role, and would be easily confused. So I have some suggestions for renaming #inert:

  • #n/a
  • #not-applicable
  • #not-available
  • #no-info

Kernel: rationale for not listing bindings

Understanding Kernel's rationale for not listing bindings

As I blogged, I have been reading about Kernel, a better Scheme.

One thing Kernel will not do is to list all of an environment's bindings. At first glance this seemed wrong, because the first rationale that I read for it seemed to contradict itself:

From (§6.7.1)

Presenting [$binds] as a library feature drives home the point that it doesn't introduce any capability that wasn't already provided by the language. In particular, for purposes of type encapsulation (§3.4), there is still no way for a Kernel program to generate a complete list of the variables exhibited in an arbitrary environment. Predicate $binds?, or the techniques used to derive it, could be used to formally enumerate such a list, by sequentially generating all possible variables (an infinite sequence) and testing each one; but there would be no way to know when all variables in the environment had been enumerated. This nontermination is critical because it means that no Kernel program can prove in finite time that two non-eq? environments are equivalent up to mutation; therefore, the rules governing predicate equal? (§4.3.1) do not require it to return false in any cases where predicate eq? returns true.

I read it thru 3 times, and he seems to have accidentally flipped polarity in his argument: He wants to avoid a situation where (called on two objects) eq? would give false and equal? true. But the thing that he fears will give rise to it is being able to completely compare non-eq? environments.

When I read this I got the impression that his rationale was mistaken. But then when I read the paper again, I saw a better rationale elsewhere.

Better rationale

When introducing environments, he gives a different rationale for linking equal? and eq? on environments:

(§4.8)

The behavior of equal? is tied to that of eq? to forestall the possibility of an implementation compromising the encapsulation of the type by allowing a program to determine, in finite time, that all bindings for one environment are the same as those for another. (Cf. the rationale discussion for the derivation of library predicate $binds?, §6.7.1.)

And (§4.9) reinforces it:

There is no need for this module to assume the Equivalence under mutation module, because environments are eq? iff they are equal?.

So the discussion in §6.7.1 was misleading. Now it seems that:

  • He fears that listing bindings will compromise the encapsulation of the environment type.
  • So he wants equal? and eq?, called on two environments, to always give the same answer.

Why does he want the environment type to be encapsulated in this way? Apparently to prevent mutation of parent environments. Also from (§4.8):

First-class environments offer a tremendous amount of control over what can be accessed from where - but only if there are limitations carefully placed on what can be done without explicit permission. In particular, whenever a combiner is called, it has the opportunity, in principle, to mutate the dynamic environment in which it was called. This power is balanced by omitting any general facility for determining the parents of a given environment, and also omitting any other general facility for mutating the parents, or other ancestors, of a given environment. (For an example of articulate environment-access control, see $provide!, §6.8.2.)

But what's the threat?

This unfortunately does not make it clear what the threat is. I just can't see any exploit that could use `list-bindings' to do something bad.

All I can guess is that it's that misbehaving code could get a list of all current bindings and comprehensively rebind all of them and thus fake the original environment, parents and all. But this is not the same thing as mutating that parent environment. And since one can sometimes correctly guess all the bindings, it is not clear how one can expect this to always work.

Why I suspect this prohibition isn't needed

Here's a positive argument that this prohibition is wrong: Containers such as alists can hold the same data that environments do: A set of bindings. Yet there is no problem being able to list all the elements of an alist or to compare alists `equal?' (and Shutt doesn't say there is).

kernel-interpreter

I'm writing a Kernel interpreter

Kernel is the Scheme variant by John Shutt that I've been blogging about.

For the last two and a half days, I've been adapting the Tinyscheme interpreter to run Kernel. Right now it doesn't compile. Some of the choices I'm making:

New eval functionality

The first thing I did was to write a new eval loop and support functions to make and evaluate vaus, which are basically Kernel's version of lambdas. Like in Tinyscheme, it's separated into parts, and in between the parts the interpreter does other stuff such as nested evals.

C functions instead of new opcodes

Rather than grossly extend the OP_ list, I've opted to add the capability for the interpreter to call a C function instead.

This is distinct from the old foreign function type. All these functions have the signature:

typedef pointer (*kernel_func)(scheme * sc, pointer args, pointer env);
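/* Here sc is the interpreter, args the operand list (unevaluated,
   since these run as operatives), and env the dynamic environment of
   the call. */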

They are suitable for use as operatives, so a lot of the base Kernel functionality can directly be C functions.

Initially I coded some of the stuff as new opcodes, because that's the way TS has always done things. But I was never too thrilled about that way of doing things.

To support this, I:

  • changed the interpreter to include a field for the next function to call.
  • changed the stack frame code to save and restore that field
  • Added one opcode: OPKERNELCALL, which calls that function (or complains if it's NULL)

So right now both opcodes and C functions are supported and opcodes are primary.

New code for registering foreign functions

Since I have the new supported function type, I created a way of registering those functions. That is, knowing their C name and Kernel name (as a string), add them to a Kernel environment.

I wrote that facility for Tinyscheme in the first place, so it was pretty easily adapted. I corrected one thing I wish I'd done earlier and allowed this registering to apply to an arbitrary environment. Before, it always applied to the global environment (what Kernel calls the ground environment).

Added pseudo-object fields to interpreter

Tinyscheme creates some gotta-be-present objects as fields in the interpreter, such as #t, #f, and nil. Kernel has some new objects that I made similarly: #ignore and #inert.

One thing I did differently: For some obscure historical reason, TS has both objects and pointers for those fields. It seems pointless; the pointer must always point at the same object, so one could just take the object's address. So for the new ones, I left out the pointer field.

Formatting

Tinyscheme code really hasn't had a consistent indentation style etc. On request, I had added a style marking to the file local variables, which was "k&r"; that seemed to nearly match the original style. But I like GNU style better, so I switched the file local variable to that.

28 October 2010

Kernel: A better Scheme

Kernel

I have been reading the Revised-1 Report on the Kernel Programming Language[fn:1], about Kernel, a Scheme variant defined by John Shutt.

I found it an interesting read:

  • I like his language ideas.
  • I like his mechanisms for them.
  • He scrupulously explains his motivations for each choice.

About Kernel

There is little else out there about Kernel. There's this paper, and an implementation in Scheme called SINK. Also a Wikipedia stub and some commentary on Lambda: The Ultimate. That's about it.

FWIW, the name Kernel is very hard to search for. Googling "Kernel Scheme" generated mostly false hits. I'd suggest a change of name.

Everything is first-class / fexprs done right

Kernel's primary claim to goodness is that it makes everything a first-class object, even the things that Scheme wants to treat specially such as `define', `lambda', `and', etc. It does this by having its evaluator distinguish between "applicatives", which are essentially Scheme functions, and "operatives", which are somewhat like macros except[fn:2] that they take an explicit environment argument.
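
For a taste of the mechanism: even $lambda needn't be primitive. The report derives it (quoted here roughly from memory) as an operative that builds a vau and wraps it into an applicative:

($define! $lambda
  ($vau (formals . body) env
    (wrap (eval (list* $vau formals #ignore body)
                env))))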

Continuations

Kernel approaches continuations a little differently than Scheme. Unlike the rest of the paper, continuations were presented in a difficult way[fn:3], and it took me some careful reading to make sense of what he was saying.

Kernel has these primitives about continuations:

  • call/cc The familiar Scheme construct
  • extend-continuation. The purpose of this wasn't clear to me. It's not merely building a lambda form that will call the continuation. It seems to have to do with building trees of continuations, which guard-dynamic-extent treats specially.
  • guard-dynamic-extent is a more powerful, safer form of dynamic-wind. Each guard knows a continuation and only intercepts children of that continuation.
  • continuation->applicative, which makes a continuation into essentially a function. Apparently in Kernel continuations are not simply applied but must be converted first.

It would have been nice to see some examples for how these constructs would be used.

These extensions seem intended largely for handling exceptions, but he doesn't specify an error format. (I like a fat error format myself, but that's a topic for another time)

Multiple-value bindings

Kernel has multiple-value bindings, and they are available everywhere, not just in a special construct (as in Common Lisp). So you can write something like:

(let (((a b c) (list 1 2 3)))
   (list c b a))

And you can bind not only lists but trees:

(let (((a (b c)) (list 1 (list 2 3))))
   (list c b a))

I agree with this feature. I keep wanting this in Lisp and Scheme. I've built it with macros more than once, but it's nice to have it properly in the language.

But ISTM full pattern-matching is a fairly short jump away from this. As the writer of Emtest, I find pattern matching nearly indispensable.

Possible issues with full pattern-matching:

  • "What happens when a variable occurs twice in the pattern?"

    The occurrences must be eq?[fn:4] or the pattern fails to match.

  • "What if the pattern doesn't match?"

    Nothing new. It's essentially the same situation as when a list is too long or too short.

  • "So how do you match objects that are not lists?"

    Answer: Every constructor-like combiner[fn:5] should be able to have an associated pattern-matcher.

    NB, for encapsulations created by make-encapsulation-type (§8.1.1), e should not be considered constructor-like. If we were to try to use e with a symbol argument as a matcher, there would be no correct return.

    • Always returning false would be wrong because the ctor might be instantiated with the same object, which should match.
    • Returning true would let the contents as returned by d bind to the symbol, which would break encapsulation and incorrectly make the returns of different calls to e eq?.
  • "If you have pattern-matchers, you need to indicate them, presumably by symbols. How do you distinguish those symbols from symbols to be bound?"

    I suggest a quasiquote-style syntax, where low-quote indicates a non-literal sexp.

  • "What about unexpected problems?"

    Having implemented more or less what I described for Emtest, I didn't find this especially problematic. However, I did not implement the quasiquote syntax because of how Elisp works.

  • "Is it really needed?"

    Maybe not in the core, but then again, maybe. ISTM one should be able to define functions that have patterns as argument lists.

More to come

I will probably post about Kernel again when I have more thoroughly digested the paper.

[fn:4] (Footnote from nowhere, I just thought this was interesting) I was surprised to find that John has, among his weblinks, a link to my old conlang Allnoun. Apparently he is interested in conlanging. Maybe conlanging and programming language conlanging go together.

Footnotes:

[fn:1] I suppose we should abbreviate it R-1RK. So should the next report be R0RK or just RK?

[fn:2] There's much more to be said, but this is the main thing. Read the paper (ps or pdf) to get the full story.

[fn:3] For instance, he refers to `$let/cc' and `apply-continuation' before he defines them, and the descriptions of extend-continuation and guard-dynamic-extent required two readings.

[fn:4] In Emtest's "match.el" I used `eql', but Kernel behaves differently than Elisp and has no `eql', so `eq?' is correct.

[fn:5] "Combiner" is Kernel's term for what you may think of as a function's actual workings. It differs from a function in that it does not cause its arguments to be evaluated. Applicatives (which you may think of as "normal functions") can be generated from combiners with "wrap".

23 October 2010

Review of On the Origin of Gravity

Review of Erik Verlinde's On the Origin of Gravity

I just read a paper that's been making the science headlines recently, On the Origin of Gravity and the Laws of Newton. In it Erik Verlinde develops an explanation of gravity as an emergent phenomenon driven by entropy.

It's a somewhat easier read than you might suppose. It helps to be familiar with the holographic universe principle. I get the impression that this might be the popular version of some longer paper or papers. He begins with an example of how entropy alone can give a force, in terms of a molecule (or rubber band) contracting due to assuming a higher-entropy configuration. He returns to this example thruout the paper.

His argument is often merely heuristic, which he admits. In other places it's elliptical; I get the impression that the ellipticality is not hiding anything too damaging, but I didn't try to verify it.

After reading, it's clear that the publicity surrounding it is completely appropriate. If even half of this paper's theory holds up it will be a milestone in physics.