24 January 2011

cords, emacs, and lone-COWs

Cords, emacs, and lone-COWs

Quick recap

I've been blogging about how I'd like to combine emacs and cords. My time is already committed to Klink and Emtest, and the time I don't spend on them is coveted by other projects. But the idea has been jangling around in my head for some time, so I blog about it.

I've already talked about the basics, the text structure emacs wants, and how cords might be adapted to support it. This led me to imagine dynamic cords that mirrored the state of objects, and to ask how their updates could cause redisplay in a reasonable way.

The problem

Propagating updates

How might mutation of an object cause redisplay? Here's a very naive approach: When anything mutates an object, we set a dirty flag in the window(s) (if any) that are displaying that object.

But this is not robust. In emacs, only a few operations intrinsically require redisplay: inserting into a buffer, changing certain text properties. Most operations do not directly require redisplay. Those that indirectly require it call one or more of the direct operations. So failing to force redisplay is mostly not an issue. (Even that is not completely robust1) But for us, mutations to arbitrary objects would require redisplay. This is far harder to track.

We'd also need to propagate the dirty flag to exactly the windows that are displaying that object, ie windows whose displayed cord is transitively built from it.

When a redisplay was appropriate, each window would check whether it was contributed to by each mutated object, and if so, recalculate the cord(s) it contributed to and redisplay it in its proper context.

This is a problem, because a cord node doesn't naturally know about other cords built from it, and because it gives the window no hint what changed, so it has to redisplay everything.

A new cord each time?

Cords are basically immutable. Mutating a cord is contrary to the spirit of cords, and that's bad. They are supposed to be immutable.

So new text means a new cord object. If the display should change, there should be a new cord to display.

Or maybe not?

But there are problems with making a new cord every time. The process is driven by object mutation. Every relevant object mutation wants to make a new chain of ancestors for the affected cord. That would be grossly inefficient. We should only make those cords that are required for display.

Another problem is that it makes it more work to track cord contributions because cords' identities keep changing. If we keep lists of them, we have to keep erasing the old one and making a new one.

And we usually don't need to

And consider dynamic cords made from objects when those objects have since mutated. It is possible that something might be interested in their previous value, but it's unlikely. Otherwise, they don't really mean anything. There is no live object that they are describing, except perhaps by pure co-incidence. Nothing wants to display them.

Lone-COWs

Introducing lone-COWs

So let's treat dynamic cords and their ancestors as a sort of copy-on-write objects. Only we'll optimize them for the common case where it is released from its old use before a new use is wanted.

In fact, we can imagine a fairly general facility for objects like this, objects that are dynamic but are treated as constant building blocks. I'll call them "lone-COWs", because they are copy-on-write (COW) objects that are often unique, but not always (otherwise I'd have called them "unique objects" and the COW part wouldn't be needed)

General nature

Lone-COWs, like COWs, are a type of reference. Technically they are handles, that is, references only pointed to by other references. A lone-COW holds an object OBJ.

When OBJ is to be mutated, a lone-COW figures out whether anything is interested in OBJ's old value. If anything is, it (being a COW) copies itself and OBJ and lets one OBJ be mutated while the other stays constant. In any case, the lone-COW sends notifications to interested parties.

References

There are these types of reference to lone-COWs:

  • "Owner" references that want to treat the lone-COW as constant. If an owner actually wants to mutate it, it isn't really an owner but a co-mutator (or has become one). It'd be a special type of reference that holds a lone-COW and decrements the count on finalization, but I won't explore that here.

    A lone-cow only needs a count of these. We expect this count to usually be 0.

  • "Tracker" references that want to track the lone-COW across its mutations.
  • "Co-mutator" references that want to (pretend to) copy some other object when the lone-COW (pretends to) copy itself. I will call the other object a "co-mutator".

External operations on lone-COWs

  • Copying OBJ. This is just making an "owner" reference, which just increments a reference count.
  • Co-mutating with OBJ. That means that an object A wants to, next time OBJ is mutated, effect something like:
    (set! A (make-a 'field-1 OBJ' 'field-2 (a-field-2 A) etc))
    

    This is expected to chain from child to parent to grandparent etc.

  • Tracking OBJ. That means that an object A wants to co-mutate with OBJ but A has not yet processed OBJ's earlier mutations. A wants OBJ to go ahead with any further mutations and A will catch up to it later. If the lone-COW splits, A wants to know about it.
  • Mutating OBJ. This must notify the lone-COW beforehand.

Internal operations

  • Copying the lone-COW. This is done if OBJ will be mutated and there are any "owner" references to it (Unless there is just one and there are no co-mutator references, but OBJ shouldn't be set up to be mutated this way. Owners expect to hold constant objects)
    • This notifies every tracker and co-mutator that now the lone-COW lives at the new location.
    • All the "owner" references continue pointing at the old copy.
  • Anticipating that OBJ will mutate. Operations:
    • Copy the lone-COW if it needs it
    • (No-op) Now there are certainly no owners
    • Put the lone-COW in a state where nothing can add co-mutator references or owners to it.
    • Notify every co-mutator that OBJ will mutate. Remove it from the list of co-mutators. The onus is on the co-mutators to arrange further relations such as still co-mutating or becoming a tracker.
    • (No-op) Trackers don't care about mutation.
    • Put the lone-COW in a state where it accepts co-mutator references and owners.
    • (No-op) The mutation can now proceed.

Notifying the references

  • Owner references don't ever need to be notified.
  • Tracker references are notified just if the lone-COW copies. We just change a pointer in them.
  • Co-mutator references, when notified, call back to act on an arbitrary object.

    A co-mutator that is a lone-COW itself will consider itself notified of mutation, thus propagating the notification wherever it belongs.

    A co-mutator reference would hold:

    • Idiosyncratic info. The callback interprets this. Generally this would include a weak reference to an interested object, effectively a backlink.
    • A callback for notifying the tracker that the lone-COW copied. It is to be called with these params:
      • the idiosyncratic info
      • the lone-COW

To make it easy to notify the references, a lone-COW keeps a list of tracker and co-mutator references to it.

Fields

A lone-COW has this data:

  • OBJ, a mutable object of arbitrary type.
  • A count of "owner" references
  • A list of tracker references
  • A list of co-mutator references

Requirements

Notification

A lone-COW must be notified before its mutable object changes. It has to be before, not after, just in case there are any "owner" references to it.

Ways to satisfy this requirement:

  • All its components being either immutable objects or lone-COWs themselves.
  • Some low-level mechanism that intercepts attempts to mutate OBJ or merely to arrange to mutate it.
    • Possibly this creates lone-COWs that hold parts of the object.
  • Or other code has the onus to notify it.
Uniquely own OBJ

OBJ should be refered to only by the lone-COW. This could be guaranteed by copying it when making the lone-COW.

Co-mutator and tracker references must set themselves up

The onus is on co-mutator and tracker references to arrange to be found by the lone-COW.

Copying must understand lone-COWs

Functions that copy objects must increment a lone-COW's reference-count instead of physically copying it.

Some co-mutator strategies

Lazy lone-COW

When it receives notification, it:

  • creates a "tracker" reference to the first lone-COW.
  • Arrange to conform itself later, possibly by setting a dirty flag.
  • (Of course considers itself mutated, see above)

When it updates, it:

  • Makes a co-mutate reference from the tracker reference
  • Drops the tracker reference.
Co-mutator that is not a lone-COW

So it gets notified but itself has nothing to notified. Probably this means it is some sort of user interface, say a window. When it receives notification, it arranges to update the display in the near future, possibly the next time it's waiting for a command. Until then, it just tracks the lone-COW.

So it uses the lazy lone-COW strategy except that it ultimately updates a display.

Co-mutator with parts

When something tries to get a reference to a mutable part, give it a lone-COW reference and arrange to be a co-mutator of that lone-COW.

Issues

Circularity

Circular lone-COW references are not a problem, because we erase each notification link as it is used. So even if a notification transitively reaches the place it started, it doesn't continue circling.

Repeated notification

Repeated notification is not a problem, again because we erase each co-mutator notification link as it is used.

Backdoor changes

Owners that are "cheated" by OBJ being mutated in some backdoor manner. This can't happen if requirement 1 is satisfied. That doesn't mean it can't happen.

Long-distance owners

Do new "owning" references to the top of the graph cause problems when the lower objects are mutated? No, because we copy before mutation (That's what "copy on write" (COW) is about). We don't need for child lone-COWs to have non-zero reference counts. When a lone-COW splits, we copy OBJ. If OBJ has lone-COW components, "copying" them will increase their "owner" reference count, which will cause them to split if the mutation came from within them. So as long as requirement 1 is satisfied, we're untouchable.

Multiple tracking?

Should there be multiple tracking & co-mutating? We've assumed thruout that all the mutations are of interest to all the trackers and co-mutators and to none of the owners. For our immediate purposes, single tracking is sufficient. And lack of multiple tracking does not create problems, it's just lack of a feature.

Should owners get the copy?

Should the treatment of trackers and owners be reversed? Maybe. I did say we want to optimize for the common case when we have a lone object being mutated and no owners. Yet the fact that the lone-COW copies means that we do have owner references, as we didn't expect to often have. For now, I don't want to muddy the waters by using the reverse polarity from what normal COWs use.

The semantics of new owners or co-mutators during notification.

At first I thought they'd apply before the mutation. But when they are notified, we've already committed to the mutation. The notification period is just meant for housekeeping actions, not for functional action. Then I tried applying them afterwards, but that was wrong too. If something re-notified the lone-COW of this mutation, it could cause them to be applied early. So we can't always control whether they are applied early or late. That's troublesome. So it seemed best to just block them.

We could try to control whether they are applied early or late, but that seems potentially complex.

Ramifications for our use

So we've come back to the design to merge emacs and cords. In this design, we'd use lone-COWs for:

  • dynamic cords
  • The arbitrary mutable objects dynamic cords are built from.
  • cords built from dynamic cords
  • Windows as displays
  • Windows as containers that track cords

Using a chain of lone-COWs has the effect of propagating any object mutations to all the windows that are displaying it, without further intervention. That's basically problem solved.

Footnotes:

1 Eg, auto-insert used to not cause redisplay, so you would only see the inserted template after you started typing

No comments:

Post a Comment