13 July 2010

Scheme environments as object system

Disclaimer

First a disclaimer: This post isn't intended to be practical. Scheme already has an object system of sorts, the record system introduced in R6RS. Beyond that, people have been using Scheme closures as a poor man's object system for quite a while.

But this idea has been kicking around in my head for a while. So I will write it down, not expecting it to go anywhere. Without further ado:

The idea

The sine qua non of an object system is to:

  1. Have a related group of datums
  2. Sometimes treat the group as a whole, e.g. by storing it in a single variable.
  3. But also sometimes treat it as separate fields.

Any non-trivial programming language already provides us with #1 and #3. All we need is #2. That is, a means to:

  • Gather a user-specified group of datums together.
  • Pass the group around.
  • Retrieve individual fields from the group.

Scheme already gives us a way to do that, because it lets us:

  • Get an empty environment via (null-environment 5). It binds only the standard syntactic keywords, such as lambda and let.
  • Add name-and-datum pairs to that environment, via lambda, let, etc.
  • Get that environment as a single object via (current-environment). This procedure is not standard, but some implementations provide it.
  • Having it as a single object, see it as separate fields by passing it to eval as the environment argument. This last one is not as powerful as I'd like, but I think that's fixable. More on that later.

Some code

How to create an environment object. In this example it has the fields "field1" and "field2":

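(define (my-obj val1 val2)
   ;; Evaluate a lambda in the null environment, then call it at once.
   ;; current-environment is passed in as an argument because the null
   ;; environment binds no procedures, only syntactic keywords.
   ((eval
       '(lambda (current-environment field1 field2)
          (current-environment))
       (null-environment 5))
    current-environment val1 val2))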

How to access a single field:

;; obj is an instance, e.g. (define obj (my-obj 1 2))
(eval 'field1 obj)

How to set a single field:

;; The value is quoted so that lists and symbols are not re-evaluated.
(eval `(set! field1 ',new-val) obj)
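
Since (current-environment) is not standard, the three operations above can't be tried everywhere. As a rough stand-in, here is a toy model in portable Scheme (R5RS/R7RS style) that treats an environment as a frame of name-value pairs plus a parent. It is only a sketch of the idea, not the real mechanism, and every name in it (make-env, env-cell, env-define!, env-ref, env-set!) is made up for this illustration:

```scheme
;; Toy model: an "environment" is a pair (frame . parent), where frame is
;; an association list of (name . value) and parent is an environment or #f.

(define (make-env parent)
  (cons '() parent))

;; Find the (name . value) cell, searching this frame and then the parents.
(define (env-cell env name)
  (cond ((not env) #f)
        ((assq name (car env)))
        (else (env-cell (cdr env) name))))

;; Add a binding to this frame only, roughly what define does.
(define (env-define! env name value)
  (set-car! env (cons (cons name value) (car env))))

;; Read a field, standing in for (eval 'name env).
(define (env-ref env name)
  (let ((cell (env-cell env name)))
    (if cell (cdr cell) (error "unbound field" name))))

;; Mutate an existing field, standing in for (eval '(set! name ...) env).
(define (env-set! env name value)
  (let ((cell (env-cell env name)))
    (if cell (set-cdr! cell value) (error "unbound field" name))))

;; Constructor: a fresh environment with two fields and no parent.
(define (my-obj val1 val2)
  (let ((env (make-env #f)))
    (env-define! env 'field1 val1)
    (env-define! env 'field2 val2)
    env))

(define obj (my-obj 1 2))
(env-set! obj 'field1 10)
(env-ref obj 'field1)   ; => 10
(env-ref obj 'field2)   ; => 2
```

Passing an existing environment as the parent to make-env gives a child that sees the parent's fields through env-ref, which is the same lookup rule that real nested environments follow.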

What's neat about it

  • It feels very Scheme-ish.
  • It unifies the treatment of environments and objects.
  • It's very flexible. Objects can add fields at run-time.
  • One can define methods without additional mechanism. They are just functions in the special environment. But they need a less messy way of seeing the functions they call that live in the normal environment.
  • It introduces inheritance naturally. Just extend a parent environment instead of the null environment used above.
  • It introduces constructors naturally: They are just functions that return suitable environment objects.
  • It introduces static fields (aka nongenerative fields) naturally. They are just fields of a parent environment that's shared by all instances.
  • The objects are passed more directly than closure-based objects.

    With closures, you are passing around what looks like a function. That function internally holds an object, accesses it, and even mutates it. Sometimes it's not clear that passing the function to one call is causing the object to be mutated before it gets to another call.

    With this, the object is passed more traditionally, more directly.
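
For comparison, here is a minimal closure-based object in portable Scheme. It is just one common way of writing such objects; make-counter, use-it and the message names are invented for this sketch:

```scheme
;; Closure-based object: the state lives in a variable captured by the
;; returned procedure, and is reachable only by sending it messages.
(define (make-counter start)
  (let ((count start))
    (lambda (message)
      (cond ((eq? message 'get) count)
            ((eq? message 'increment!)
             (set! count (+ count 1))
             count)
            (else (error "unknown message" message))))))

(define c (make-counter 0))

;; A procedure that receives c sees only "a function" ...
(define (use-it f)
  (f 'increment!))

(use-it c)   ; ... yet calling it mutates the hidden state
(c 'get)     ; => 1
```

Nothing at the call site of use-it says that c carries state, which is the unclear mutation described above.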

R6RS compatible? Not quite

Is it compatible with R6RS records? It can almost be made to be, but not quite.

Mostly, each object needs to have an associated record-type object. That would be an object in its own right, and would in turn have a type. (That doesn't recurse infinitely. Eventually a type is its own record-type.) That type-object needs to be defined by name, so that record-type-descriptor etc. can find it.

define-record-type can have the same interface as in R6RS. Having that, most of the other functions fall out.

In addition, each (non-opaque) environment object needs to know its own record-type, and record-rtd needs to be defined to look for that. So define record-rtd something like:

(define (record-rtd env)
   (if (environment? env)
      (eval '*the-rtd* env)
      (error "Not an object")))

record? could return #t just if its argument is an environment in which *the-rtd* is bound.

We can bind *the-rtd* manually. But current-environment-as-object (see here) should do it automatically unless (opaque #t) is specified.

record-predicate would check for *the-rtd* and compare it to what it should be.
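
For reference, this is roughly the R6RS behaviour that environment objects would have to mimic. The sketch uses only standard R6RS record procedures; the record type point and its fields are made up for the example:

```scheme
(import (rnrs))

;; A record type with one immutable field and one mutable field.
(define-record-type point
  (fields x (mutable y)))

(define p (make-point 1 2))

(record? p)                                            ; => #t
(eq? (record-rtd p) (record-type-descriptor point))    ; => #t
(record-type-name (record-rtd p))                      ; => point
(record-type-field-names (record-rtd p))               ; => #(x y)
((record-predicate (record-type-descriptor point)) p)  ; => #t

(point-y-set! p 5)
(point-y p)                                            ; => 5
```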

Some incompatibilities remain:

  • record-type-field-names needs to list all the bindings in the environment. (In fact the list is returned as a vector.) Some Scheme implementations provide this, but there is no portable way to do it. Because we have the flexibility to add new bindings, we can't assume there is a pre-existing list of field names.
  • eqv? now needs to be able to compare environments. Again we need to list all the bindings in one or both environments, and then (recursing) compare them with eqv?. So if record-type-field-names works, eqv? can work.
  • R6RS allows fields to be marked immutable. This could be partially enforced by not providing a setter for those fields. But that's an ugly solution. It also doesn't hold when the programmer uses the environment object as an environment.
  • There's no way to express or enforce the various constraints on extensibility such as sealed, opaque, nongenerative.
  • Extensions of an environment object will still appear to be that type of object. That could have unpredictable consequences.

These might be addressed by Scheme changes that IMO are minor compared to the record-type specs.

How it might be better supported

Provide type information about environments

Above, when we couldn't get a list of field names and wanted more information about an environment's type, what was really going on is that R6RS records are tightly described by type information and our environment objects are not.

So we want a way to express and enforce the type of an environment. And that needs to be more powerful than the R6RS records' type needed to be. Some things are clear:

  • We'd like to be able to constrain the list of field names.
    • We'd like some flexibility in this. For instance, instead of specifying an exact list of field names, we could specify a list of fields that must exist and a list of fields that may exist.
    • We'd like at least the simple case to be easy and not create error opportunities, so we don't want the user to have to repeat the list of field names.
  • We'd like optimizations to work "under the hood". E.g., if the field list can be precomputed, the representation can be optimized without requiring user involvement.
    • There is something to be said for optionally giving the user control of layout etc., but this is not the place for it.
  • Since environment objects may have parent environments that they extend, parents must be able to have a different type than their children, and that type must be enforceable too.
  • By the same reasoning, for constructs that change the environment (let, lambda, etc.), their internal environment object is of a different type than their external environment object.
  • So object type is only applicable to environments at a particular extension.
  • Defining changes an environment object's type even more strongly, by changing its list of field names in a way that doesn't end at the end of the current scope.
  • This mechanism must be safe for environment objects that are derived not from null-environment but from something else (say scheme-environment).
  • We do not require "sealed" objects in the R6RS sense, because it's possible to extend any environment via let, and doing so must be harmless. Basically, that's a chain of environments, some of which are objects. We only need to be able to seal objects against "define".
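
The "must exist" and "may exist" idea from the list above can be sketched as a plain check over lists of field names. This is only an illustration in portable Scheme; check-fields and all? are hypothetical helpers, and a real implementation would presumably read the names from the environment itself:

```scheme
;; all? returns #t when pred holds for every element of lst.
(define (all? pred lst)
  (or (null? lst)
      (and (pred (car lst)) (all? pred (cdr lst)))))

;; check-fields returns #t when every required name is present in actual,
;; and every name in actual is either required or optional.
(define (check-fields actual required optional)
  (and (all? (lambda (name) (memq name actual)) required)
       (all? (lambda (name)
               (or (memq name required) (memq name optional)))
             actual)))

(check-fields '(x y) '(x) '(y z))   ; => #t
(check-fields '(y)   '(x) '(y z))   ; => #f, x is missing
(check-fields '(x w) '(x) '(y z))   ; => #f, w is not allowed
```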

So I suggest the following type support for environments:

  • Behind the scenes, annotate each environment with a record-type-descriptor. This replaces the magic field *the-rtd* above. It can be #f, signalling that the environment is opaque, i.e. has no rtd.
  • Let record-rtd return that record-type-descriptor.
  • Provide a means of specifying limitations on how an environment may be used. I tentatively call this ->object.

    ->object is passed an environment and either a record-type-descriptor or a list of limitations on how the environment may be used. It returns an environment object annotated with the appropriate rtd, and possibly optimized.

    If it is passed a record-type-descriptor, it will make an environment object with exactly the fields defined by the record-type-descriptor. Other bindings will be ignored and unbound fields will be initialized to their default values (possibly #f). The record-type-descriptor should not define a constructor - it's not clear how its arguments could be chosen.

    Otherwise it will return an environment object identical to what it was passed except annotated by a record-type-descriptor implied by the extant bindings and the list of limitations. The syntax of the limitations is the same form as for R6RS records except as noted. They can include the following:

    sealed
        Three possible values:
        #f
            No limitation on extending it or defining into it. Can't easily precompute its list of fields.
        semi
            One can't define into it, but can extend it and inherit from it. I.e., "children" extended from it satisfy this object type's predicate. This is the default.
        #t
            In contrast to R6RS records, this does allow extending. It does not allow inheritance or defining into.
    immutable
        Its bindings can't be changed, as by set!. But it may still be extended, according to sealed.
    nongenerative
        Only one copy of it exists with any one UID. If another copy has already been created with the same UID, it is as if ->object had been passed that record-type-descriptor. If other limitations are incompatible with that rtd, an exception is raised.
    ???

    Omitted:
    • opaque. To get the effect of not having an rtd, just don't use ->object.
    • parent. Not needed because an environment automatically inherits from any environment that it extends.
    • parent-rtd. Same reasoning.
    • protocol. It's largely about defining constructors, but environment objects have essentially run the constructor already.

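So now a constructor might look something like:

(define (my-obj val1 val2)
   ;; As before, but ->object is passed in as well, and the list of
   ;; limitations is quoted so that it arrives as data.
   ((eval
       '(lambda (current-environment ->object field1 field2)
          (->object (current-environment) '((sealed #t))))
       (null-environment 5))
    current-environment ->object val1 val2))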

Provide a way to mark bindings immutable

This is pretty self-explanatory. Having this would give us immutable fields. Some Schemes have this but it's not portable.

Join environments

In a few places above, we wanted to get bindings from 2 places at once: The environment object and the normal outer environment. The ability to join environments together would solve much messiness above, and would be useful in and of itself.

This is frustratingly just short of being provided by R6RS. Libraries can be imported in groups (and thus joined), and renamed in useful ways. Libraries can be converted to environments. But environments can't be joined.
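
To make concrete what R6RS does provide today, here is a small sketch using the standard environment and eval procedures from (rnrs eval). The variable names joined and renamed and the new name plus are made up for the example:

```scheme
(import (rnrs) (rnrs eval))

;; Two libraries joined into one environment.
(define joined (environment '(rnrs base) '(rnrs lists)))
(eval '(fold-left + 0 '(1 2 3)) joined)    ; => 6

;; A library imported with one binding renamed.
(define renamed (environment '(rename (rnrs base) (+ plus))))
(eval '(let ((x 3)) (plus x 1)) renamed)   ; => 4
```

Both arguments are import specs naming libraries. There is no way to put an existing environment object in their place, which is the gap described above.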

I would suggest extending the import spec to allow an environment, just when the spec is given as an argument to the procedure environment. Syntax: <import set> would be extended to allow (environment <expression>).

NB, there are two senses of the Scheme identifier "environment" above, a procedure (available almost everywhere) and an auxiliary syntax (available just in import-spec when interpreted in a call to the procedure environment)

An example of usage:

(eval '(let ((x 3)) (scribble x))
   (environment 
      `(rename 
          (environment ,(current-environment)) 
          (write scribble))))

R6RS also says that such environments are immutable. I would loosen that restriction to not necessarily apply to imported environments, just to libraries.
