31 August 2010

First notes on implementing EMSIP

Implementing EMSIP in Scheme

What

These are my older notes on implementing EMSIP in Scheme. They give a flavor of my thinking, warts and all.

There's a fair bit of rethinking in here, and many parts are just obsolete.

Some ideas

Iterate over a sequence of characters

This allows all the lazy reading we could want.

The iterated proc mostly just checks for magic characters

Or more carefully, for attention-string followed by any magic character, and for non-empty attention-strings, also wraps characters in a different way that other parts recognize.

It builds a lazy structure as follows:

A left bracket
  • Calls the respective reader. That's one of 3 predetermined readers, which mutate into the actual readers.
  • Feeds the current reader the return value of the new reader as an object. Since Scheme supports lazy evaluation, this can be done without order-of-evaluation problems.
A right bracket
  • (Not immediately needed) Check for bracket agreement; error if bracket does not agree with initial bracket.
  • Terminates the current reader's input
    • And (as above) terminates the current read scope.
???
The comma is not special here, it's just treated specially by 2 of the built-in readers.
???
(Not immediately needed) For a non-magic character following a non-empty attention-string, feed it to the current reader as a semi-magic char.
Any other character
Feed it to the current reader as a simple char.

Mode of iteration

We can't just loop thru the characters, because different conditions apply at different points, and these conditions are hierarchical.

Instead, we are reading with specific state data, and sometimes recursing into another call that "uses up" a stretch of characters.

So each call would:

  • Take the initial stream
  • Take the state data
  • Return the remaining stream
  • Return the object that was read

So this call corresponds to a nested reader. It gets or immediately sets up a reader and uses that reader for its whole (non-nested) scope.

And we use this by:

  • When we call a new reader,
    • Give it state data
      • probably including the new reader
    • replace the stream with its promised return
    • Use its object return as the value to feed to the current reader.

To make the laziness work wrt the end-bracket termination, we'll pass the current continuation, and the iterator never stops, it just calls that continuation.

We also want to fill the list in a natural way. That suggests that rather than iterating as such, we are filling a list and making further promises.

Argument structure of the built-in readers

  • Takes a lazy stream
    • Element types can be, all distinct:
      • simple character
      • (Not immediately needed) semi-magic character
      • object, as returned by some other reader.
      • A promise which may be any of the above
      • A discriminated promise, which knows which type it is but not what it is.
      • And (not formally an element) end-of-list.
  • May also take further arguments, all objects:
    • Usually, as when reading a list, takes the next object as a promise
      • Must allow an indication of termination, possibly #f.
      • When possible, should treat that object as opaque; don't force it as a promise. That's not always possible, which is why that object would be an argument.
  • Returns an object

Flow control

Let them be promises.

Let both object and stream be promises.

  • Order of evaluation

    Wall-clock order of evaluation of a nested call:

    • Call nested emsip-read
    • Nested emsip-read formally returns
    • Current read goes on to the remainder of the read (but may not have the stream)
    • Current read forces the stream
    • Nested emsip-read tries to finish
    • Nested emsip-read finishes stream, makes it available
    • Nested emsip-read may ask for some or all of current read's remainder of the read (often just the first object)
    • Current read, certainly now knowing stream, reads further,
Pass xformers around.

The returned object becomes a chain of promised xformers.

What nested read returns

Of course its final state of the stream. But what else to return is conflicted:

  • OT1H, wants to return the object, letting reader treat it any way that it likes as it builds its ultimate returned object.
  • OTOH, wants to call an xformer on the object and a tail object.

We could let it return either, and reader can use it as an xformer if that's appropriate. So reader must know what it's getting. And reader must supply itself an xformer when none is provided.

Or return a tagged thing, arbitrary tag that other readers should understand.

Or predefine some tags, so it can be `xformer' or `object'. And every reader tells us which it supplies - really, we pass it thru.

It is up to readers to be suitably lazy and to call xformers.

Optimization: Skipping forward

Sometimes we'd like to can skip past characters we know aren't relevant. Some cases:

  • Non-magic characters, skip to the repective closing bracket. This reader's value remains just a promise and may yet be forced.
  • General skip-past-chars as dictated by reader, but of course still never skipping past magic characters.
    • So the other is a special case of this.

How to communicate it? Partly just by not forcing reads. Perhaps also as a read parameter.

Rethink: May need to allow more nested levels

The simple one that only sees comma as separator. This will be built by `emsip-make-list-reader'. One xformer only applies at single level here. This is basically for "repeat". But there are 2 levels in our basic reader: Objects and characters. Or we could just accumulate characters into lists. What's not nice is that the 2 levels are entangled with each other. Maybe intercept just comma and magicify it, then have really 2 levels. Init the inner xformer with `list->string'. Does the outer level (a) LEANINGTOWARDS pass in a false end, or (b) NOTCHOSEN act after the inner level passes on something? And it would make an object, as any other level does. But dual levels is just what I hoped to avoid here.

Perhaps reader would tell upper level about separators?

But it's really alternating string-constitutent and non-string-constitutent.

So very likely a new design: Push-down layers which always include the brackets. But comma is just a separator. It pops (like an end bracket), but higher level immediately pushes another. And hopefully the basic readers won't need anything more than that and sequence (which also pushes and makes non-match pop).

Readers forcing downwards

The previous item demonstrates that readers may want to immediately start a nested read. That gives opaques still a chance to organize strings rather than having pieces embedded in them. Do this before end-brackets - repeat-readers do expect to go to a second level first. But they shouldn't eat the forthcoming char

Repeat-readers do this. Separating repeat-readers don't have a way to return zero elements, nor should they.

Checking for matching brackets

Built-ins simply know the end-bracket char they expect. Readers probably shouldn't set up their own end-char expectations, and even if they did, they are passed their starting char which suffices to figure out the end char.

So there's nothing special to be done when starting a nested read.

Maybe something special to be done when ending one. That's most neatly done by passing the final char to `afterwards', the termination communication function.

Raise error when `emsip-object-stream-force1' gets empty stream? No.

Do not raise error when `emsip-object-stream-force1' gets an empty stream, because this condition might occur in innocent places such as a forced tail after a comment.

I may rethink that, though.

How to return the state of the stream

In normal use, `emsip-object-stream-force1' returns all the info it needs to. It returns a stream with the car forced, and the tail promise knows the state of the input stream.

But when it hits an end-bracket which ends the output stream, or hits the end of the input stream, it needs to communicate the new state of the input stream. We can no longer use the return value to communicate it, because that has to be the null stream.

Would be nice to have:

  • This manages bracket-match checking
  • This gives error when it is not set up, corresponding to a read that popped past the top level.

Possible approaches:

  • Call a continuation of something in the caller. emsip-object-stream-force1 would call it just when it hit an end, passing the new state of the input stream.

    The problem is that we need control to exit this call by its normal route so that the output stream gets assigned its right (forced) value.

  • Return two values, where the first is the output stream and the second one (usually redundant) is the input stream.

    It's also difficult or impossible to ensure that the calls we're interested in gets the values. In many cases there are other callers, usually delayed intermediate calls.

    And it's ugly to always return a value that usually isn't used and is usually redundant to another value.

  • Set a boxed value to the new state of the input stream.

    That's clumsy for the caller to set up.

  • Call a function that sets the value.

I will be posting some notes about EMSIP

I will be posting some notes about EMSIP

For the past few days, I've been sketching an implementation of EMSIP. EMSIP is my idea of how s-expressions could be extended without losing their well-behavedness.

It is a reference implementation and it's not ready for use yet. It's really just a sketch at this point. I haven't written the testing code, and I don't consider code "really written" until it passes tests.

Ordinarily I would write the tests first and write code afterwards. The design phase would overlap with test-writing. But in this case functionality was not the only thing that interested me. I was as interested in making an elegant, minimal design as I was in functionality. So instead of writing tests first, I sketched the code and re-examined it until it seemed as elegant and minimal as was reasonably possible.

I think I largely succeeded in my goal. The code even surprised me with its elegance at a few points.

Why EMSIP?

Why EMSIP is a good thing

EMSIP is my idea of how s-expressions could be extended without losing their well-behavedness.

Common Lisp reader macros read arbitrary stretches of text. They only need stop when the macro decides it has read enough. It need not respect bracketing. But without reliable bracketing, s-expressions become a lot less predictable, and therefore a lot harder for tools to handle. So Common Lisp reader macros destroy the well-behavedness of s-expressions.

There are several ways you can think about a solution:

  • Just don't expect tools to handle arbitrary reader macros. So now your tools can't fully read your code. Bad idea.
  • Don't use any extensions? Impractical.
  • Put all of the intelligence that your reader macros have into every one of your tools. There's no portable or reliable way of doing this.
  • Use a different system of s-expressions. But Scheme macros have the same problem.

EMSIP offers a solution. EMSIP always respects bracketing. A bracket always means a nesting operation, except for one special case. And even that case is very mild: In special scopes, there is an attention string and only brackets immediately preceded by that attention string "count".

Further (minor) deficiencies of s-expressions

EMSIP offers even more regularity. Although normal, non-reader-macro'd s-expressions are fairly well behaved, they have lots of exceptional cases. For instance:

  • The lowly string literal, eg "abc". It doesn't have start and end brackets, it has one unibracket that both starts and ends it.
  • Quoting and backquoting. They aren't standalone s-expressions, they want to alter the meaning of the s-expression that follows them.
  • Exceptional symbols such as `&optional' in formals lists. They don't want to be symbols in a list. Like quoting, they want to alter the meaning of what follows them.

EMSIP will make all of that more regular.

25 August 2010

The King's Singers: Chanson D'Amour (Review)

The title song has a trivial character and is done in a mostly silly way. A little silliness is good, but I wouldn't have made it the first song.

I was looking forward to their rendition of Kurt Weill's "My Romance" but it was uninspiring.

Good tracks:

  • Those Days Are Gone
  • Who Is Sylvia?
  • All I Ask Of You
  • Plaisir D'Amour
  • Phyllis Is My Only Joy

Pretty good tracks:

  • I Get Along Without You Very Well
  • Lydia
  • Down With Love

The King's Singers: Good Vibrations (Review)

Good Vibrations

Some of the tracks were, not badly done but so well-trodden that another cover of them just isn't exciting. Notably:

  • Good Vibrations
  • Cecilia (Paul Simon)
  • American Pie

I was disappointed by And So It Goes. It's not bad, but there is a much better King's Singers performance out there. I had expected that performance to be on this CD and it wasn't.

For some of the tracks, there's fun in seeing the King's Singers ham it up. In particular, Freddie Feel-Good.

Good tracks:

  • Father to Son
  • The Boxer
  • Some Folks' Lives Roll Easy
  • Fifty Ways To Leave Your Lover

Pretty Good:

  • That Lonesome Road
  • Freddie Feel-Good