27 March 2010

Junk collection

Junk collection


Junk collection: The idea

My previous post made me wonder if it would be worthwhile to have "junk collection", in analogy to garbage collection. Instead of removing and finalizing an object, "junk collection" would transform it into a more compact form, idiosyncratically by object type. I previously called it "mess compaction", but I think the junk/garbage analogy explains the idea better.

Junk vs garbage

"junk" is different than "garbage":

JunkGarbageNeeded data
Still useful?MaybeNo

When to transform

Junk collection would transform objects only when:

  • The object is stale. This could simply mean that a generational GC knows it's in an old generation.
  • The object is only referenced by "junk-releasing" references, which would be loosely similar to weak references.

    Such references are intended to be used by referers that don't cause the object to be used, and therefore shouldn't hold the object in its ready-to-use state.

How to transform

There are competing desiderata here:

  • The junk-release operation needs to be particular to the object type. It might release its scratch space, empty a cache, etc.
  • The junk-release operation must be simple since it occurs during garbage collection or a similar operation. In particular, no arbitrary logic, no loops or at least none with arbitrary termination conditions.

For simply releasing junk parts, two approaches present themselves:

  • A junky object holds its junk parts by another variant of weak reference. Call that type junk-holding. In effect, junk-releasing and junk-holding combine to make a weak reference.
  • A junky object's type knows an idiosyncratic but limited means of junk-releasing it. Perhaps it can only assign #f (or nil, w/e) to certain places in it.

But we may want something more general: The junk-release operation could alternatively replace the object with a junkless type of object. To keep it simple, transformations might consist entirely of allocations and assignments. If further work were required to make the junkless object usable, a flag could be set and the operation could be done just-in-time instead.

Here, junk-releasing referers must treat the object as potentially being either of the accepted types. If the transformation is one-way, the references themselves might be transformed to normal references, since they can do no longer do anything useful with junk.

In the cord marker example, the junk-release operation would transform a marker and its temporary array into a constant string. The marker would disappear from the cord but the string wouldn't. Any marker's parents would hold it by a junk-releasing reference. I believe only markers should be held in this manner.

Replacing an object must be transparent to everything that holds a reference to the object. But since any generational GC already moves objects transparently, this is feasible.

Junk within junk

What happens if a junky object (BIG) contains another junky object (SMALL)? If there are also outside references to SMALL, not much. It's kept alive by those references, even it gets detached from BIG.

So we're talking about the case where the only reference to SMALL is thru BIG. In that case, SMALL will probably not be used in the near future, because is can only be used thru BIG, which is stale. So SMALL should be junk-released too.

A further generalization of this

In a different general vein, this could contemplate not just one type of "junk-filled object", but multiple potential uses that an object can be released from. For each use, it might discard different junk.

No comments:

Post a Comment