Profiling in Klink
Klink is now at the stage where it makes some sense to profile it.
The C part can just be profiled as C. That makes sense for the C
parts, but tells me nothing about the Kernel parts. (Some of Klink is
in a Kernel prolog that is loaded first) As far as the C profiler
knows, that's merely _klink_cycle
being called many times.
What I need to know is how much each combiner of interest causes
_klink_cycle
to be called. ISTM I won't need to track C-to-C calls
even when they are "the same" as Kernel code. The C profiler will do
a far better job of profiling C-to-C calls.
Profiling in Guile
Profiling Scheme is not like profiling C. There's no single relevant call-stack. A call might apply continuations, force promises, or cause dynamic guards to be called. It might even be returned to multiple times. There's not neccessarily much relation between the stack below a call and the amount of evaluation that that call caused.
So I looked at how other Schemes do profiling, specifically Guile. Surprise, Guile's statprof just does it the C way! It just walks up "the" stack. Exactly what I didn't want to do.
What I'm doing instead
The stack in Klink is a spaghetti stack (as usual for Scheme). It's also heterogeneous. While call frames are the "main" thing, and they are what is popped when one combiner is evaluated, the stack includes:
- call frames
- guards
- dynamic variables
So to that list I added profile structures. They contain:
- Pointer to the combiner in question
- Count of eval-cycles so far.
That much I've already written. For profiling, each _klink_cycle
will
push the combiner it is using. That's unless the same combiner is
already being profiled in this frame, which happens when a combiner
recurses as a tail call.
That's because _klink_cycle
stores the profiling node above the
original frame (ie, one place further away from the leaves of the
stack). It has to, because it pops the frame that made that combiner
launch. If it stored tail calls too, recursive functions could
accumulate many profiling frames. In particular, the REPL loop would
have a ton of them, because it's implemented as a recursive combiner.
I may make profiling count distinct tail calls, but for now I'm just treating tail calls as continuances of the original call.
This design will not show calls that never terminate. It's not clear that they could be meaningfully profiled. This specifically includes calls that never continue normally.
This design also doesn't try to handle re-entered continuations. The
second time a continuation is used, the elapsed number of
_klink_cycle
counts bears no relation to how much evaluation the
call caused. Again, it's not clear that this situation can be
meaningfully profiled. I will probably just ignore returns other than
the first time.
It's also not clear what this should do when a call's continuation is
stored (say in user environment), and later is used for the first
time. It would be misleading to count every _klink_cycle
between
the two times. So this design is far from perfect too.
Slightly different topic: Plans for the stack
Make it a list of Kernel objects
At the moment, the stack nodes are a different set of types than Klink objects. But it nags at me that they partly overlap and have an extra layer of indirection that was inherited from Tinyscheme. For instance, a combiner on the stack is very nearly a Kernel combiner object. But it also contains an environment, and it has to have a second tag to type it on the stack.
In the future I am hoping to make the stack just a list of Kernel objects. One tag would disambiguate these objects both type-wise and stack-wise.
That would let a number of "hot" stack-handling functions be simpler. But it would also eject the static environment object from stack frames. However, the static environment is mostly re-used every frame. We only need to restore environment occasionally:
- When continuing abnormally
- When popping (continuing normally out of) a vau
- When popping an eval
- In certain special contexts such as evaluating guards or promises
In some of those contexts I already have to manage environment explicitly. So I may simply move an environment argument into the continuation object, and treat an environment object on stack as meaning "restore this static environment".
Args on stack
At the moment, arglists in Klink are literally lists. There is no attempt to vectorize them, nor keep track of previous type-checks on them.
If the stack directly stored objects, it would be tempting to also use it to store an argvector when that is applicable, and to store intermediate variables on it, and even vectors of environments that we know won't be captured.
However, such objects must be categorically distinguished from all other stack objects. Combiner objects at least risk being falsely recognized as frames. All that might be required is a type that means "The next N slots are reserved, pop past them when popping frame".
COW spaghetti stacks
Months ago I blogged about a design I call COW spaghetti stacks. Basically spaghetti stacks that try to be vectors for sub-portions that allow that. The above changes would make it tempting to implement them in Klink.