03 April 2011

Profiling in Klink

Profiling in Klink

Klink is now at the stage where it makes some sense to profile it. The C part can just be profiled as C. That makes sense for the C parts, but tells me nothing about the Kernel parts. (Some of Klink is in a Kernel prolog that is loaded first) As far as the C profiler knows, that's merely _klink_cycle being called many times.

What I need to know is how much each combiner of interest causes _klink_cycle to be called. ISTM I won't need to track C-to-C calls even when they are "the same" as Kernel code. The C profiler will do a far better job of profiling C-to-C calls.

Profiling in Guile

Profiling Scheme is not like profiling C. There's no single relevant call-stack. A call might apply continuations, force promises, or cause dynamic guards to be called. It might even be returned to multiple times. There's not neccessarily much relation between the stack below a call and the amount of evaluation that that call caused.

So I looked at how other Schemes do profiling, specifically Guile. Surprise, Guile's statprof just does it the C way! It just walks up "the" stack. Exactly what I didn't want to do.

What I'm doing instead

The stack in Klink is a spaghetti stack (as usual for Scheme). It's also heterogeneous. While call frames are the "main" thing, and they are what is popped when one combiner is evaluated, the stack includes:

  • call frames
  • guards
  • dynamic variables

So to that list I added profile structures. They contain:

  • Pointer to the combiner in question
  • Count of eval-cycles so far.

That much I've already written. For profiling, each _klink_cycle will push the combiner it is using. That's unless the same combiner is already being profiled in this frame, which happens when a combiner recurses as a tail call.

That's because _klink_cycle stores the profiling node above the original frame (ie, one place further away from the leaves of the stack). It has to, because it pops the frame that made that combiner launch. If it stored tail calls too, recursive functions could accumulate many profiling frames. In particular, the REPL loop would have a ton of them, because it's implemented as a recursive combiner.

I may make profiling count distinct tail calls, but for now I'm just treating tail calls as continuances of the original call.

This design will not show calls that never terminate. It's not clear that they could be meaningfully profiled. This specifically includes calls that never continue normally.

This design also doesn't try to handle re-entered continuations. The second time a continuation is used, the elapsed number of _klink_cycle counts bears no relation to how much evaluation the call caused. Again, it's not clear that this situation can be meaningfully profiled. I will probably just ignore returns other than the first time.

It's also not clear what this should do when a call's continuation is stored (say in user environment), and later is used for the first time. It would be misleading to count every _klink_cycle between the two times. So this design is far from perfect too.

Slightly different topic: Plans for the stack

Make it a list of Kernel objects

At the moment, the stack nodes are a different set of types than Klink objects. But it nags at me that they partly overlap and have an extra layer of indirection that was inherited from Tinyscheme. For instance, a combiner on the stack is very nearly a Kernel combiner object. But it also contains an environment, and it has to have a second tag to type it on the stack.

In the future I am hoping to make the stack just a list of Kernel objects. One tag would disambiguate these objects both type-wise and stack-wise.

That would let a number of "hot" stack-handling functions be simpler. But it would also eject the static environment object from stack frames. However, the static environment is mostly re-used every frame. We only need to restore environment occasionally:

  • When continuing abnormally
  • When popping (continuing normally out of) a vau
  • When popping an eval
  • In certain special contexts such as evaluating guards or promises

In some of those contexts I already have to manage environment explicitly. So I may simply move an environment argument into the continuation object, and treat an environment object on stack as meaning "restore this static environment".

Args on stack

At the moment, arglists in Klink are literally lists. There is no attempt to vectorize them, nor keep track of previous type-checks on them.

If the stack directly stored objects, it would be tempting to also use it to store an argvector when that is applicable, and to store intermediate variables on it, and even vectors of environments that we know won't be captured.

However, such objects must be categorically distinguished from all other stack objects. Combiner objects at least risk being falsely recognized as frames. All that might be required is a type that means "The next N slots are reserved, pop past them when popping frame".

COW spaghetti stacks

Months ago I blogged about a design I call COW spaghetti stacks. Basically spaghetti stacks that try to be vectors for sub-portions that allow that. The above changes would make it tempting to implement them in Klink.