20 November 2010

reading and writing

Thoughts on reading and writing in Kernel

This is an email I sent to John Shutt, the designer of Kernel, for which I am writing an implementation. I thought I'd put it up for public scrutiny and comment.

The email

I am implementing printing and reading for Klink now, and it made me think about printing and reading objects in general, which I know you were looking for a general solution to.

From the partial state of printing, it's already clear that operatives frequently turn up for printing as bare objects rather than as symbols that evaluate to them.

So I'm proposing that as a fallback before we resort to printing an object as "#<whatever>" we do the following:

  • When printing an object:
    • find a visible symbol bound to it (if that's not possible, we can't do this)
      • NB, this is for objects, not for symbols that are intended to be captured, e.g. symbols bound inside a $let.
    • print an indication of an environment it can be found in and that environment's state. I'm thinking of a hash or a standard library path.

      We print the state to avoid problems mutation could cause. If we printed, rebound the symbol, and read, the result might not be equal? to the object.

      That environment is not necessarily the environment the symbol is bound in, nor necessarily the current environment. It need only be one in which the symbol is visibly bound to the object. It could be a standard library, because read neither binds anything in that environment nor gives access to it.

      That indication of environment might encompass a sexp of arbitrary size.

    • print an indication of that symbol (not necessarily the same as printing that symbol)
  • When reading:
    • know an environment to find symbols in.
    • when reading the symbol indication above, look it up in that environment
    • In response to environment indications, change to the right reading environment. It's an error if it doesn't exist or it's in a different state.
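
The print and read steps above can be sketched roughly as follows. This is a minimal illustration in Python, not Kernel or Klink code. The names `Env`, `print_object` and `read_object` are hypothetical, and the state hash is an assumption: it is computed from object identities, so it is only meaningful within one running process.

```python
import hashlib

class Env:
    """A toy environment: a name plus symbol -> object bindings."""
    def __init__(self, name, bindings):
        self.name = name
        self.bindings = dict(bindings)

    def state(self):
        # Assumption: the state is a hash over (symbol, object identity) pairs.
        pairs = sorted(self.bindings.items(), key=lambda kv: kv[0])
        text = ";".join(f"{sym}={id(obj)}" for sym, obj in pairs)
        return hashlib.sha256(text.encode()).hexdigest()[:16]

def print_object(obj, visible_envs):
    """Return (env name, env state, symbol) for a visible binding of obj, or None."""
    for env in visible_envs:
        for sym, bound in env.bindings.items():
            if bound is obj:
                return (env.name, env.state(), sym)
    return None  # no visible binding: the caller falls back to "#<whatever>"

def read_object(indication, known_envs):
    """Look the symbol up again, refusing if the environment is missing or has changed."""
    env_name, state, sym = indication
    env = known_envs.get(env_name)
    if env is None:
        raise LookupError(f"no such environment: {env_name}")
    if env.state() != state:
        raise ValueError(f"environment {env_name} is in a different state")
    return env.bindings[sym]
```

Reading performs only a lookup, never evaluation, and rebinding a symbol between print and read is detected through the state check rather than silently yielding a different object.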

It has some drawbacks:

  • Con: It introduces reverse-association into environments.
    • Mitigating: Introduces only what list-bindings would introduce, because one could just try every binding.
  • Not an issue: Finding multiple bindings for the same object. We can just use the most convenient one.
  • Con: Naively implemented, requires a lot of searching to find an object's symbol.
    • Mitigating: Need not be naively implemented.
  • Con: We need to control what counts as the visible environment for read and print.
    • Mitigating: That's easy.
    • Pro: OTOH that gives us some extra control.
    • Pro: G1 implies eliminating the special case of one predetermined reading environment.
  • Con: We need to calculate and represent an environment's state uniquely.
    • Calculating it is doable, but it's work.
    • Mitigating: There are ways to represent it: hashes, standard libraries.
  • Not an issue: Causing evaluation during read. This only causes symbol lookup.
  • Con: There'd be 2 ways of printing what are now printed as symbols. For someone writing forms, that's an error opportunity and violates G3.
    • Mitigating: That's only an error opportunity for manually written sexps, which generally means forms. But in that case the forms are evalled, so bare symbols will be looked up. This usually gives the same object, and when it doesn't, there really were 2 distinct objects by that name so there's no getting around needing 2 distinct representations.

      Further, the easier form to write (bare symbol) and the more likely meaning (local binding of that symbol) coincide.
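
As an illustration of "need not be naively implemented", here is one possible approach, sketched in Python under assumptions: the environment keeps a reverse index that is updated on every binding, so finding an object's symbol is a dictionary lookup rather than a search. `IndexedEnv` and its methods are hypothetical names, and "most convenient" is arbitrarily taken to mean the shortest symbol.

```python
class IndexedEnv:
    """Toy environment that maintains a reverse index from objects to symbols."""
    def __init__(self):
        self._bindings = {}    # symbol -> object
        self._symbols_of = {}  # id(object) -> set of symbols bound to that object

    def bind(self, symbol, obj):
        # Drop the stale reverse entry if the symbol is being rebound.
        if symbol in self._bindings:
            old_key = id(self._bindings[symbol])
            symbols = self._symbols_of[old_key]
            symbols.discard(symbol)
            if not symbols:
                del self._symbols_of[old_key]
        self._bindings[symbol] = obj
        self._symbols_of.setdefault(id(obj), set()).add(symbol)

    def lookup(self, symbol):
        return self._bindings[symbol]

    def symbol_for(self, obj):
        """Return the most convenient symbol bound to obj, or None if there is none."""
        symbols = self._symbols_of.get(id(obj))
        if not symbols:
            return None
        # Assumption: shortest name wins, ties broken alphabetically.
        return min(symbols, key=lambda s: (len(s), s))
```

The index only ever holds entries for objects that are still bound, so an identity key cannot be reused by a different object while its entry exists.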

That's a lot of cons, but ISTM each one is mitigated fairly well.

I'd say this strategy is a lot like paths (mymod:foo) and should be generalized in that direction. So:

  • the "indication of symbol" would be a sort of path.
  • the "indication of environment" would be contained in something like a $let, except purely syntactic and allowing only environments.

So as a first sketch, it might look something like:

#let-environments
(
   ((mod-1 (id 34fe93ab5539d9bc8))
    (mod-2 (kernel devel)))
   
   ;;The object bound to `foo' in `mod-1'
   #object (mod-1 foo)
   ;;Object bound to `bar' in (kernel devel testhelp mocks filebuf)
   #object (mod-2 testhelp mocks filebuf bar))

I'm certainly not committed to this syntax, but it gives the idea. I don't like the prefixuality of the sharps.
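
To make the intended lookup concrete, here is a rough Python sketch of how a reader might resolve such paths. It rests on several assumptions that the syntax sketch does not settle: environments are modelled as nested dictionaries, a spec headed by `id` names an environment by its state hash, any other spec is treated as a library path, and the function names are hypothetical.

```python
def open_environments(specs, by_id, libraries):
    """Map each alias to an environment, found by state id or by library path."""
    envs = {}
    for alias, spec in specs:
        if spec[0] == "id":
            envs[alias] = by_id[spec[1]]          # KeyError if no such state exists
        else:
            envs[alias] = libraries[tuple(spec)]  # KeyError if no such library exists
    return envs

def resolve_object(path, envs):
    """Resolve e.g. ["mod-2", "testhelp", "mocks", "filebuf", "bar"] to an object."""
    alias, *rest = path
    if not rest:
        raise ValueError("a path needs an alias and at least a symbol")
    *modules, symbol = rest
    current = envs[alias]
    for name in modules:
        current = current[name]
        if not isinstance(current, dict):
            raise TypeError(f"{name} is not an environment")
    return current[symbol]
```

With `mod-2` opened on the library `(kernel devel)`, the path `(mod-2 testhelp mocks filebuf bar)` descends through `testhelp`, `mocks` and `filebuf` and then looks up `bar`, with no evaluation involved.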

That gives us this overall strategy for printing objects:

  • For literals - Unvarying print logic that does only superficial inspection of data
    • I include symbols as honorary literals because when read and written, they have no relation to an environment.
  • For containers (at least for transparent ones) - print logic that includes printing their contents, read logic includes calling a constructor that reconstructs them from their contents. The results of write-then-read should satisfy equal?
  • For non-containers or opaque containers: The method described above.
  • Last resort method: "#<opaque-type ID>" style.
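
The four tiers can be pictured as a single dispatch. The following Python sketch is only an illustration under assumptions: Python lists stand in for transparent containers, a hypothetical `Symbol` class stands in for symbols, `name_of` is a hypothetical callback that returns a path for objects with a visible binding, and the `#object` form borrows the tentative syntax from the sketch above.

```python
import json

class Symbol(str):
    """Stand-in for a symbol; printed bare, with no relation to any environment."""

def write(obj, name_of=lambda obj: None):
    # 1. Literals (symbols included): superficial inspection only.
    if isinstance(obj, Symbol):
        return str(obj)
    if isinstance(obj, bool):
        return "#t" if obj else "#f"
    if isinstance(obj, (int, float)):
        return repr(obj)
    if isinstance(obj, str):
        return json.dumps(obj)
    # 2. Transparent containers: print the contents recursively.
    if isinstance(obj, list):
        return "(" + " ".join(write(item, name_of) for item in obj) + ")"
    # 3. Non-containers or opaque containers: print a path to a visible binding.
    path = name_of(obj)
    if path is not None:
        return "#object (" + " ".join(path) + ")"
    # 4. Last resort: an unreadable representation.
    return f"#<{type(obj).__name__} {id(obj):x}>"
```

Only the last tier produces output that cannot be read back.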

What do you think?

Tom Breton (Tehom)
