17 November 2010

kernel-library-paths

kernel-library-paths

I've been thinking about library paths in relation to Klink, the Kernel implementation I've been writing.

Some lessons from the past

  • Being familiar with Elisp's horrid treatment of library naming, I'd like to do library-finding right, and not hobble for years with a suboptimal system.
  • Scheme's use of lists as library specs is the right way to do it.
  • SRFI-103 suffered from a lack of encoding for identifiers that don't map cleanly to file names. Avoid that.
  • SRFI-103's attempt to escape some identifiers was overly special, resulting in a non-trivial knowledge bad. Don't do that, use something well-known.
  • It is really nice to allow a "main" file in order to group a file with its supporting packages.
    • From SRFI-103, though its "main^" syntax was awkward.
    • From my experience creating Elisp packages, especially Emtest.
  • Want to support at least these parameters:
    • An ordered list of directories to search.
    • An ordered list of file-name extensions
      • From SRFI-103 "to distinguish dialects, formats, and extensions"
  • Want to at least optionally choose among available versions of a library.

    I don't have to support that in the near future, but it should at least be specified.

  • It's nice to allow sub-features in "require" type statements
    • From emacs >= 22 `require' statements, and from Redhat rpm.
    • This clashes with the previous lesson.
  • Traditional major.minor.release version numbers have some problems, especially wrt project branching.
    • Git among others.
    • This clashes yet further with the previous two.
  • I'd really really prefer versioning that doesn't push the version-update treadmill to go any faster than it has to.
    • Too many times on the version-update treadmill.
  • Reading the first syntactic datum is a wart on the procedure, and I may be able to solve it.
  • It is nice and convenient to have a way to abbreviate file locations, something more transparent than calling expand-file-name everywhere.
    • From SWI Prolog and from emacs' default-directory
  • Init files should be ordinary source files, implying that they should fit into this library scheme. They come in more flavors than you would expect.
    • Learned from many places, especially TinyScheme and emacs.
  • It's often convenient to be able to associate "support" files with a source file. Their extension should differ so that you can tell them apart.
    • Learned from Mercury and from C/C++
  • The normal source file extension should be unique. So use resources like Filext to check it.
    • From Prolog and Perl's clash over *.pl (Prolog was there first)
    • "KRN" is taken. Could take it anyways since it's unrelated, a font for a video game
    • "KRL" seems available.
  • Sometimes one would like to define multiple submodules within the same file.
    • Learned from Mercury and from an experiment I did in Prolog.

Answer 1: Just borrow URL encoding.

It very simple and it's in widespread use.

Now, what to do about clashes? Eg, what if files "too%06dany" and "toomany" both exist?

Several answers present themselves:

  • Prefer them in alphabetical order?
  • Prefer them in almost alphabetical order, but sort the escape sequence to before/after other files?
  • Check for the existence of clashing filenames and raise error.
  • Don't allow clashes. Enforce a one-to-one mapping of strings to filename encodings.

I like the last answer best. Encodings that are not mandatory are forbidden. In other words, for any file name it must be the case that:

(encode (decode filename))

gives the original filename again.

Answer 2: Encode "main" with a similar escape sequence.

That is, some escape sequence that would otherwise be illegal means "main". Let's use "%main" for that.

Answer 3: A file is a list of sexps

This will probably be the most controversial answer. Instead of reading the first syntactic datum in the source file, let the whole source file be a list. That is, when you start to load a file, pretend you just read an open bracket, and the matching close bracket is EOF. Each sexp in between is one element in the list.

That even handles the case where the file is just a script: Put Kernel's `$sequence'1 as the first symbol in the file.

Tentative answer 4: Library namespaces for special purpose

These would include at least:

  • Standard preludes for an implementation
    • Say the implementation is called "acme", the prefix would be;
      (implementation acme) 
      
  • System rc files for various purposes
  • Personal rc files for various purposes

Tentative answer 5: Resolving the treatment of versioning.

Recapping, the things I wanted were:

  1. Want to can choose among available versions
  2. Use sub-features for version requirement.
  3. Version IDs need not be major.minor.release
  4. Don't make the version-update treadmill go faster than it has to.

If we were to use traditional major.minor.release version numbers, that would satisfy #1 but scuttle #2 and #3.

My tentative answer is to eschew the false economy of not reading a file before you need it. Our code should just partially evaluate the file and see what it places in a "subfeatures" area2. If that's ambiguous, then we have a problem - but we always would have in that case.

If efficiency is required, do it behind the scenes. Possibly pre-compile data for individual files or even to index the entirety of code available on a given system.

For #4, I have a tentative scheme involving automated tests and code coverage tools, but that's for another time.

Answer 6:

If a library path resolves partway thru to file rather than a directory, look in that file for submodules. IIRC this is a direct clone of how Mercury does it.

Footnotes:

1 equivalent to Lisp `progn' or Scheme `begin'.

2 Kernel is basically re-inventing these things with first-class environments. So very likely subfeatures will be an environment accessed by key from the main environment that the library provides.

No comments:

Post a Comment