20 July 2010

My elisp directory conventions

Here are two directory conventions I have found useful in organizing my elisp projects:

Library symbols

I use a convention of setting each subordinate library's symbol as a path-name relative to the main directory (ie, the root lisp directory of the package). For this to work, load-path must point only at the main directory.

For example, a directory structure might look like:

(main directory)
  (Name isn't important. load-path only knows about this directory)
  my-project-name/
    plugins/
      foo.el
        (provides `my-project-name/plugins/foo')
      bar.el
        (provides `my-project-name/plugins/bar')
      baz.el
        (provides `my-project-name/plugins/baz')

I do this because I feel that emacs' namespace is getting crowded and I keep encountering extension library names that try to squeeze project name and library name into one component, like "org-this-thing.el" or "pcmpl-that-thing.el".

This style has the official approval of emacs-devel (I asked) and I know of one other project besides mine that does this (CEDET).

Placement of support libraries

I put testing and other support libraries in a subdirectory with the same name as the library they support. As author of the testing package Emtest, I've found that this works nicely. Just customize uniquify-buffer-name-style so that the various "tests.el" buffers get named something clearer than "tests.el<2>", "tests.el<3>" etc.

Under this convention, part of the directory structure might look like:

my-project-name/
  plugins/
    foo.el
      (the library itself)
    foo/
      tests.el
        (tests pertaining to "foo.el")
      testhelp.el
        (code that is useful in testing "foo.el" and also useful testing other files that build on "foo.el")
      examples
        file1
          (a file used in tests)
        file2
          (another file used in tests)
    bar.el
    bar/
      tests.el
        (tests pertaining to "bar.el")
      (etc)
    baz.el
    baz/
      tests.el
        (tests pertaining to "baz.el")
      (etc)

13 July 2010

Scheme environments as object system

Disclaimer

First a disclaimer: This post isn't intended to be practical. Scheme already has an object system, introduced in R6RS. Beyond that, people have been using Scheme closures as a poor-man's object system for quite a while.

But this idea has been kicking around in my head for a while. So I will write it down, not expecting it to go anywhere. Without further ado:

The idea

The sine qua non of an object system is to:

  1. Have a related group of datums
  2. Sometimes treat the group as a whole, eg storing it in a single variable.
  3. But also sometimes treat it as separate fields.

Any non-trivial programming language already provides us with #1 and #3. All we need is #2. That is, a means to:

  • gather a user-specified group of datums together.
  • Pass the group around
  • Retrieve individual fields from the group.

Scheme already gives us a way to do that, because it lets us:

  • Get an empty environment via (null-environment)
  • Add name-and-datum pairs to that environment, via lambda, let, etc
  • Get that environment as a single object via (current-environment)
  • Having it as a single object, see it as separate fields via eval by passing it as environment arg. This last one is not as powerful as I'd like, but I think that's fixable. More on that later.

Some code

How to create an environment object. In this example it has the fields "field1" and "field2":

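(define (my-obj val1 val2)
   ;; Build the lambda inside the empty environment, then call it at once.
   ;; current-environment is passed in as an argument because the empty
   ;; environment has no procedure bindings of its own.
   (
      (eval
         '(lambda (current-environment field1 field2)
            (current-environment))
         (null-environment))
      current-environment val1 val2))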

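How to access a single field (here obj is an instance, eg the result of (my-obj 1 2), not the constructor my-obj itself):

(eval 'field1 obj)

How to set a single field (the new value is quoted so that it is stored as data, not evaluated as code):

(eval `(set! field1 ',new-val) obj)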
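The accessors above depend on a current-environment procedure, which is not part of standard Scheme. As one way to try the idea, here is a minimal sketch assuming a Guile version that ships the (ice-9 local-eval) module. In it, the-environment and local-eval come from that module and stand in for current-environment and eval. The names make-obj, get-field1 and set-field1! are hypothetical, chosen only for illustration.

```scheme
;; Assumes Guile with (ice-9 local-eval). This is a swapped-in
;; technique, not the implementation the post has in mind.
(use-modules (ice-9 local-eval))

;; Constructor: the parameters become the fields, and the captured
;; lexical environment is the object.
(define (make-obj field1 field2)
  (the-environment))

(define obj (make-obj 1 2))

;; Read a field by evaluating its name in the object's environment.
(define (get-field1 o)
  (local-eval 'field1 o))

;; Set a field. The value is quoted so that symbols and lists are
;; stored as data rather than evaluated as code.
(define (set-field1! o new-val)
  (local-eval `(set! field1 ',new-val) o))

(set-field1! obj 'hello)
(get-field1 obj)
```

Unlike the null-environment version, an object made this way can also see whatever was lexically visible where the-environment was called.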

What's neat about it

  • It feels very Scheme-ish.
  • It unifies the treatment of environments and objects.
  • It's very flexible. Objects can add fields at run-time.
  • One can define methods without additional mechanism. They are just functions in the special environment. But they need a less messy way of seeing functions they call that live in the normal environment.
  • It introduces inheritance naturally. Just extend a parent environment instead of (null-environment) above.
  • It introduces constructors naturally: They are just functions that return suitable environment objects.
  • It introduces static fields (aka nongenerative fields) naturally. They are just fields of a parent environment that's shared by all instances.
  • The objects are passed more directly than closure-based objects.

    With closures, you are passing around what looks like a function. That function internally holds an object, accesses it, and even mutates it. Sometimes it's not clear that passing the function to one call is causing the object to be mutated before it gets to another call.

    With this, the object is passed more traditionally, more directly.
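For contrast, the closure style described above can be sketched in portable Scheme. The names make-counter, bump!, increment! and value are hypothetical, chosen only for illustration.

```scheme
;; A closure-based "object": the state lives inside the closure and
;; the caller only ever holds a procedure.
(define (make-counter start)
  (let ((count start))
    (lambda (message)
      (cond ((eq? message 'increment!)
             (set! count (+ count 1))
             count)
            ((eq? message 'value) count)
            (else #f)))))

(define (bump! counter)
  ;; This looks like an ordinary procedure call, but it mutates the
  ;; state hidden inside the closure.
  (counter 'increment!))

(define c (make-counter 0))
(bump! c)
(c 'value)   ; => 1
```

Nothing at the call site of bump! shows that c is changed, which is the lack of clarity the point above is about.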

R6RS compatible? Not quite

Is it compatible with R6RS records? It can almost be made to be, but not quite.

Mostly, each object needs to have an associated record-type object. That would be an object in its own right, and would in turn have a type. (That doesn't recurse infinitely: eventually a type is its own record-type.) That type-object needs to be defined by name, so that record-type-descriptor etc can find it.

define-record-type can have the same interface as in R6RS. Having that, most of the other functions fall out.

In addition, each (non-opaque) environment object needs to know its own record-type, and record-rtd needs to be defined to look for that. So define record-rtd something like:

(define (record-rtd env)
   (if (environment? env)
      (eval '*the-rtd* env)
      (error "Not an object")))

record? could return #t just if its argument is an environment in which *the-rtd* is bound.

We can bind *the-rtd* manually. But current-environment-as-object (see here) should do it automatically unless (opaque #t) is specified.

record-predicate would check for *the-rtd* and compare it to what it should be.
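For reference, this is how the R6RS interface being emulated behaves for an ordinary record, written as a standard R6RS top-level program. The record type point is a hypothetical example.

```scheme
(import (rnrs))

;; An ordinary R6RS record with one mutable and one immutable field.
(define-record-type point
  (fields (mutable x) (immutable y)))

(define p (make-point 1 2))

(point-x p)                                  ; => 1
(point-x-set! p 10)
(point-x p)                                  ; => 10

;; Inspection: the procedures discussed above.
(record? p)                                  ; => #t
(define rtd (record-rtd p))
(eqv? rtd (record-type-descriptor point))    ; => #t
(record-type-field-names rtd)                ; => #(x y)
```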

Some incompatibilities remain:

  • record-type-field-names needs to list all the bindings in the environment. (In fact the list is returned as a vector) Some Scheme environments provide this, but there is no portable way to do it. Because we have the flexibility to add new bindings, we can't assume there is a pre-existing list of field names.
  • eqv? now needs to be able to compare environments. Again we need to list all the bindings in one or both environments, and then (recursing) compare them with eqv?. So if record-type-field-names works, eqv? can work.
  • R6RS allows fields to be marked immutable. This could be partially enforced by not providing a setter for those fields. But that's an ugly solution. It also doesn't hold when the programmer uses the environment object as an environment.
  • There's no way to express or enforce the various constraints on extendability such as sealed, opaque, nongenerative.
  • Extensions of an environment object will still appear to be that type of object. That could have unpredictable consequences.

These might be addressed by Scheme changes that IMO are minor compared to the record-type specs.

How it might be better supported

Provide type information about environments

Above, when we couldn't get a list of field-names and we wanted more information about environment type, what was really going on is that the Scheme R6RS records were tightly described by type information and our environment objects are not.

So we want a way to express and enforce the type of an environment. And that needs to be more powerful than the R6RS records' type needed to be. Some things are clear:

  • We'd like to be able to constrain the list of field names.
    • We'd like some flexibility in this. For instance, instead of specifying an exact list of field names, we could specify a list of fields that must exist and a list of fields that may exist.
    • We'd like at least the simple case to be easy and not create error opportunities, so we don't want the user to have to repeat the list of field names.
  • We'd like optimizations to work "under the hood". Eg, if the field list can be precomputed, the representation can be optimized without requiring user involvement.
    • There is something to be said for optionally giving user control of layout etc, but this is not the place for it.
  • Since environment objects may have parent environments that they extend, parents must be able to have a different type than their children, and that type must be enforceable too.
  • By the same reasoning, for constructs that change the environment (let, lambda, etc), their internal environment object is of a different type than their external environment object.
  • So object type is only applicable to environments at a particular extension.
  • Defining changes an environment object's type even more strongly by changing its list of field names in a way that doesn't end at the end of the current scope.
  • This mechanism must be safe for environment objects that are derived not from null-environment but from something else (say scheme-environment)
  • We do not require "sealed" objects in the R6RS sense, because it's possible to extend any environment via let and doing so must be harmless. Basically, that's a chain of environments, some of which are objects. We only need to be able to seal objects against "define".

So I suggest the following type support for environments:

  • Behind the scenes, annotate each environment with a record-type-descriptor. This replaces the magic field *the-rtd* above. It can be #f, signalling that the environment is opaque, ie has no rtd.
  • Let record-rtd return that record-type-descriptor.
  • Provide a means of specifying limitations on how an environment may be used. I tentatively call this ->object.

    ->object is passed an environment and either a record-type-descriptor or a list of limitations on how the environment may be used. It returns an environment object annotated with the appropriate rtd, and possibly optimized.

    If it is passed a record-type-descriptor, it will make an environment object with exactly the fields defined by the record-type-descriptor. Other bindings will be ignored and unbound fields will be initialized to their default values (possibly #f). The record-type-descriptor should not define a constructor - it's not clear how its arguments could be chosen.

    Otherwise it will return an environment object identical to what it was passed except annotated by a record-type-descriptor implied by the extant bindings and the list of limitations. The syntax of the limitations is the same form as for R6RS records except as noted. They can include the following:

    sealed
        Three possible values:
        #f
            No limitation on extending it or defining into it. Can't easily precompute its list of fields.
        semi
            One can't define into it, but can extend it and inherit from it. Ie, "children" extended from it satisfy this object type's predicate. This is the default.
        #t
            In contrast to R6RS records, this does allow extending. It does not allow inheritance or defining into.
    immutable
        Its bindings can't be changed, as by set!. But it may still be extended, according to sealed.
    nongenerative
        Only one copy of it exists with any one UID. If another copy has already been created with the same UID, it is as if ->object had been passed that record-type-descriptor. If other limitations are incompatible with that rtd, an exception is raised.
    ???
    Omitted:
    • opaque. To get the effect of not having an rtd, just don't use ->object.
    • parent. Not needed because an environment automatically inherits from any environment that it extends.
    • parent-rtd. Same reasoning.
    • protocol. It's largely about defining constructors, but environment objects have essentially run the constructor already.

So now a ctor might look something like:

(define (my-obj val1 val2)
     (
        (eval
           '(lambda (current-environment ->object field1 field2)
              (->object (current-environment) (sealed #t)))
           (null-environment))
        current-environment ->object val1 val2))

Provide a way to mark bindings immutable

This is pretty self-explanatory. Having this would give us immutable fields. Some Schemes have this but it's not portable.

Join environments

In a few places above, we wanted to get bindings from 2 places at once: The environment object and the normal outer environment. The ability to join environments together would solve much messiness above, and would be useful in and of itself.

This is frustratingly just short of being provided by R6RS. Libraries can be imported in groups (and thus joined), and renamed in useful ways. Modules can be converted to environments. But environments can't be joined.
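What R6RS does allow can be sketched as follows. The environment procedure from (rnrs eval) accepts several import sets, so libraries are joined and renamed on the way in. This is standard R6RS, written as a top-level program. It joins libraries only, not environment objects. The names joined and renamed are hypothetical.

```scheme
(import (rnrs) (rnrs eval))

;; Join two libraries into one environment: (rnrs base) supplies +
;; and quote, (rnrs lists) supplies fold-left.
(define joined
  (environment '(rnrs base)
               '(only (rnrs lists) fold-left)))

(eval '(fold-left + 0 '(1 2 3)) joined)      ; => 6

;; Rename on import: write becomes visible under the name scribble.
(define renamed
  (environment '(rename (rnrs) (write scribble))))

(eval '(let ((x 3)) (scribble x)) renamed)   ; writes 3
```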

I would suggest extending the import spec to allow an environment, just when the spec is given as an argument to environment. Syntax: <import set> would be extended to allow (environment <expression>).

NB, there are two senses of the Scheme identifier "environment" above: a procedure (available almost everywhere) and an auxiliary syntax (available just in an import spec when interpreted in a call to the procedure environment).

An example of usage:

(eval '(let ((x 3)) (scribble x))
   (environment 
      `(rename 
          (environment ,(current-environment)) 
          (write scribble))))

R6RS also says that such environments are immutable. I would loosen that restriction to not necessarily apply to imported environments, just to modules.

Annoyed by speedbar

Just now I tracked down a configuration bug in emacs. Cut to the ending: Speedbar had installed into etc/emacs/site-start.d an overly vigorous configuration file which did things I didn't want. Without telling me.

Backstory: I had tried out CEDET a month or so ago. CEDET encompasses speedbar, EDE, EIEIO, and other emacs devel tools. EDE the project manager was what I was interested in. (Didn't choose it. I chose project-buffer-mode instead)

I fetched it via Debian aptitude, which automatically installs it. When EDE absolutely wouldn't start, I uninstalled CEDET.

An hour ago, when I went to post "Review of What If the Earth Had Two Moons?", org2blog complained that it couldn't find a file called sb-html. I know very well that org2blog doesn't require this file, because I wrote org2blog. And I know that it was never an issue before, because I tested org2blog meticulously.

So what changed? Finding one change in the whole huge system is a chore not to be taken lightly. I started by using dired search in the packages that I knew had changed recently, org-mode and project-buffer-mode. Nothing about sb-html.

I started another emacs, hoping to find what added the sb-html requirement. To my surprise, it was already present at startup! I know it's not in my customizations, and a quick search confirmed this, so how did it get there?

my-site-start was great about helping. In addition to organizing my emacs startup files, I can binarily divide my load sequence just by renaming a debug file. But when I moved the debug file to first place, 00debug.el, the unwanted config was still there.

So what in .emacs loads before my-site-start does its work? Not much, and all quickly ruled out.

This could have been very frustrating, but fortunately I thought of etc/emacs. A quick check with dired search revealed the culprit to be speedbar, or rather its configuration file, which for some reason didn't get removed.

All of CEDET's config removed, and problem solved.

Review of What If the Earth Had Two Moons?

What it is

A book by Neil Comins about what the earth would be like if it had two moons, and nine other hypothetical scenarios about the solar system.

I'm tempted to call it science fiction. It nearly matches the original definition of the term. It is fiction about science, carefully based on existing science.

But it is not a story. There is no narrative, other than the vignettes he begins each chapter with. Instead, in each chapter he describes a hypothetical world that is as similar as possible to Earth, except for one major astronomical difference, such as:

  • The title essay What If the Earth Had Two Moons?
  • What if the Earth were a moon?
  • What if the moon orbited backwards?
  • What if the Earth's crust were thicker?
  • What if the Earth had formed fifteen billion years from now?
  • What if there were a Counter-Earth?
  • What if the Earth had formed elsewhere in the galaxy?
  • What if the Sun were less massive?
  • What if the Earth had two suns?
  • What if another galaxy collided with the Milky Way?

This is his second such book. The first was called What If The Earth Had No Moon. I haven't read that one.

Impressions

Comins tries hard to correctly apply current thinking in diverse fields such as astronomy and plate tectonics. To a reader who is already familiar with those topics, he spends too much time on basic explanations, but that can't be helped.

For this book, Comins has to tread a fine line between providing too much support for his conclusions and too little.

Neither goal is entirely satisfied IMO. On the one hand, despite the vignettes, it's a dry read as popular books go. On the other hand, I was occasionally left doubting his conclusions, or at least doubting that they applied with any real certainty.

I have to wonder if it would have been better to have structured the book so that it had multiple levels of support according to the reader's inclination to explore further. Basically I'm suggesting footnotes and a bibliography. He does provide appendixes but could have done more.