21 November 2010

Introducing EMSIP

Some motivation

I like s-expressions

I like s-expressions. They don't make you memorize new syntax for every new feature someone adds. Editors and other tools can almost always make sense of them. With heavier syntaxes, there's always someone who thumps their chest and claims that they "could write a parser for it in an afternoon, no problem!", but when you go looking for existing tools that actually work on a given heavy syntax, they turn out to be hard to find.

Not only can editors handle s-expressions, you can write code to process them in new ways without ever having to wrestle with a parser. In both Scheme and Common Lisp, the code to read an s-expression is just `read' - can't get much simpler than that.

And I have philosophical reasons for liking them. S-expressions are regular, regular is clean, and clean is good.

But they could go further

Quick, where does this s-expression end?

#mynewreadermacro a b c () ) () "["]{} 52:abcde

Of course you can't tell: it depends entirely on what the reader macro does with the text that follows it. In Lisp (Common Lisp, that is) there are many dozens of standard reader macros, and new ones can be added.

And it's not just reader macros. There are a few built-in extra bumps:

  • Double-quotes for strings
    "A string"
    
  • Literal symbols in Common Lisp, like
    |One literal symbol|
    
  • Prefix operators such as quote, backquote, and low quote.

Scheme is better but headed worse

But surely Scheme is clean and small, right? Well, the impetus to write EMSIP up and post it came from Scheme. I've just been looking at the R6RS spec, and I'm stunned at how complex the Scheme syntax has become. Not up to the complexity of Common Lisp's full syntax spec, but headed in that direction.

EMSIP

I propose instead an idea that I've been developing for some time. I call it EMSIP. The idea is to have:

  • brackets (3 types and a comma)
  • A lot of little parsers that cannot violate brackets. Each is expected to parse only a regular language, that is, something a regular expression can recognize.

Think of it as a case of separation of concerns. No little parser needs to care how any other little parser behaves. It need only interface properly with EMSIP.

It may sound like not enough to implement a proper programming language, but it is. When I get more round tuits, I hope to add EMSIP into Klink, the interpreter I'm writing for Kernel.

Essentially all the complexity would be pushed out to little parsers:

  • comments
  • strings
  • radix numbers such as hex numbers
  • complex numbers
  • rationals
  • bytevectors
  • vectors
  • character escapes
  • (etc)

EMSIP main syntax

In EMSIP there are only seven magic characters:

  • open parenthesis
  • close parenthesis
  • open curly braces
  • close curly braces
  • open square brackets
  • close square brackets
  • comma
    {}[](),
    

Each of them always has the same structural effect, so even a naive processing tool can tell where structures begin and end. Every bracket type consumes exactly the text from the opening bracket to the matching closing bracket, never more and never less. So even if an editor or other tool knows nothing about the syntax, it knows where a parser's scope ends.
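
To illustrate how little a tool needs to know, here is a minimal sketch in Python of a scanner that finds where a bracketed scope ends. It is not part of EMSIP itself; the function name find_scope_end and the decision to reject mismatched brackets are assumptions made for this example.

```python
# Hypothetical helper, not taken from any EMSIP specification.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def find_scope_end(text, start):
    """Return the index of the bracket that closes the one at text[start]."""
    if text[start] not in PAIRS:
        raise ValueError("start must point at an opening bracket")
    stack = []  # closing brackets we still expect, innermost last
    for i in range(start, len(text)):
        ch = text[i]
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in CLOSERS:
            # Assumption: a closer that does not match is an error.
            if not stack or stack.pop() != ch:
                raise ValueError(f"mismatched bracket at index {i}")
            if not stack:
                return i  # the scope opened at `start` ends here
    raise ValueError("unterminated bracket")


text = "hex{ff (a, b) [c]} tail"
end = find_scope_end(text, 3)
print(text[3:end + 1])  # {ff (a, b) [c]}
```

The scanner never looks at anything except the six bracket characters, which is the point: it needs no knowledge of what the text inside a scope means.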

The little parsers have their own internal syntaxes which:

  • apply only to their scope.
  • cannot override brackets, but treat the comma as non-magic.
  • can only recognize regular languages, that is, what a regular expression can match.
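
As a rough sketch of what such little parsers might look like, the Python below gives each one a single regular expression and hands it only the text of its own scope. Everything here is hypothetical: the names parse_hex, parse_rational and parse_text, and the idea that a little parser is a plain function from scope text to a value, are assumptions for illustration, not EMSIP's actual design.

```python
import re
from fractions import Fraction

# Hypothetical little parsers. Each is one regular expression applied to
# the text of its own scope and nothing else.
HEX = re.compile(r"[0-9a-fA-F]+")
RATIONAL = re.compile(r"([+-]?[0-9]+)/(0*[1-9][0-9]*)")
# Assumption: plain text may contain commas but never a bracket.
TEXT = re.compile(r"[^()\[\]{}]*")


def parse_hex(scope_text):
    """Parse a hex number such as 'ff'."""
    if HEX.fullmatch(scope_text) is None:
        raise ValueError(f"not a hex number: {scope_text!r}")
    return int(scope_text, 16)


def parse_rational(scope_text):
    """Parse a rational such as '-3/4' (nonzero denominator)."""
    m = RATIONAL.fullmatch(scope_text)
    if m is None:
        raise ValueError(f"not a rational: {scope_text!r}")
    return Fraction(int(m.group(1)), int(m.group(2)))


def parse_text(scope_text):
    """Accept text in which the comma is an ordinary character."""
    if TEXT.fullmatch(scope_text) is None:
        raise ValueError("brackets are not allowed inside this scope")
    return scope_text


print(parse_hex("ff"))          # 255
print(parse_rational("-3/4"))   # -3/4
print(parse_text("a, b"))       # a, b
```

None of the three needs to know that the others exist, and none of them can change where its scope ends, because that was already decided by the brackets.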

More to come

I'll write more about EMSIP later, I hope.
