02 December 2010

No good external testing frameworks

External testing frameworks, yuck!

Yesterday I looked at external testing frameworks. I didn't find a satisfactory one. Looking back, I'm not glad that I spent my time on that.

Klink is now at the point where it makes sense to script tests for it. But it's not yet ready for internal, self-driven tests. So I looked for an external testing framework. I tried several, and none were satisfactory.

Emtest

Let's get this out of the way first. Emtest is my emacs test harness. It does not do external testing. Making it do so would take more time than I want to spend on this. So I didn't look at it for this purpose.

But maybe I should have

In retrospect, doing so might have had more satisfying results than spending my time exploring the others. Now I'm wondering whether I should go and do that. It could run the external program as an inferior process. The major work would be writing the expect functionality.

But that's another topic. Right now I'm grousing. Er, I mean reviewing the current state of external testing frameworks.

DejaGNU and Expect

DejaGNU does at least use POSIX-conformant test results. That's a format that I don't entirely agree with, but it's at least more reasonable than some formats out there. I could live with it.

Expect scripts are in TCL, and DejaGNU wants you to write tests in TCL. I had written an expect script years ago, back in the days of dialup. I vaguely remembered it being a PITA, but I thought that maybe my memory was wrong. It wasn't wrong, it was understated.

So I gave Greg a try.

Greg

Greg seemed a lot more promising than DejaGNU. It is scripted in Scheme. Scheme vs TCL? No contest!

It is a little bit odd to be using Scheme for a Kernel project, especially since it wants Guile linked in. But I don't think it would be hard to convert it to Kernel.

It's also easier to control than DejaGNU. The command line options are pretty direct, and the documentation is better.

Unfortunately, Greg is badly out of date now[1]. In particular, out of the box, it is not capable of setting up the terminal-pair that it needs.

I inspected the code and made it set up the terminal-pair. After that it semi-ran, but got the same error on every single pipe. Clearly something else needs to be updated, but what is less obvious.

I know that Guile converted the function that gives the error to return a wchar_t some years back, so I suspect that change may be involved. But that's buried deep in Guile's innards and I don't want to fool with it.

The other reason I'm not fixing Greg is that Savannah's web interface is down at the moment, so I can't see whether there is a more recent version.

DejaGNU/Expect again

So I gave DejaGNU another try.

While there is a fair amount of documentation for DejaGNU, it's all about how to set it up for cross-platform testing. I had to experiment to determine what it wanted. Key trick: it wants to be run in a directory whose subdirectories are named *.t00, *.t05, etc. It will then run all the *.exp files in those subdirectories. I only found that out by looking at an example.
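
For concreteness, here is a minimal sketch of such a test file, sitting in a subdirectory like klink.t00/. The interpreter path, the prompt, and the test expression are placeholders, not my actual suite:

    # klink.t00/basic.exp -- minimal sketch; binary, prompt and test are placeholders
    # Spawn the interpreter under test and check one expression.
    spawn ./klink
    expect {
        -re {klink> } { }
        timeout       { unresolved "start interpreter"; return }
    }
    send "(+ 1 2)\r"
    expect {
        -re {3}  { pass "simple addition" }
        timeout  { fail "simple addition" }
    }

pass, fail and unresolved are the DejaGNU procs that write the result lines; spawn, send and expect come from Expect. runtest, run from the parent directory, then picks up the *.exp files in those subdirectories.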

More problems cropped up, related to TCL's stupid syntax. I got it working but I really don't look forward to maintaining this creature.

POSIX-conformant format - not great but tolerable

[This started out as a note where I first talked about DejaGNU, but it grew into a section on its own]

When I say POSIX-conformant test results, I don't mean that there's a section of POSIX 1003.3 specifying test result formats. You won't find one; I checked the TOC[2]. Apparently it's implied throughout.

What POSIX does right:

  • The major taxonomy for test results is almost right: PASS, FAIL, UNRESOLVED, UNSUPPORTED and UNTESTED.

    The DejaGNU extensions add four more categories: two for expected failures (XPASS, XFAIL) and two for known bugs (KPASS, KFAIL). A sample of the resulting report lines is below.
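
The result lines themselves look roughly like this (the test names are invented for illustration, not from my actual suite):

    PASS: arithmetic: simple addition
    XFAIL: string ports: read back what was written
    UNTESTED: tail calls: deep recursion
    UNRESOLVED: start interpreter

One line per result: a category and a free-form test name, and that's the whole interface.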

What it does wrong:

  • Merges presentation format and report format. True, one could do further formatting on top of the given format, but that's a nitpick; the problem is that the merger cripples the report format. That leads to the next item.
  • No provision for extra information, just a test name.

    A test report format should allow and frame a fair bit of additional information: diagnostic information, additional taxonomic information (e.g. what caused a given dormancy), and context information (test location, exactly how to rerun the test). This information should be in a very flexible format. The TESTRAL spec is a good model for what should be allowed.

    Some might argue that analytical information has no place in test reporting. I disagree. It's not just convenient; it belongs there. It is entirely reasonable to treat the set of test cases for diagnosis as overlapping with or identical to the relevant set of test cases for regression testing. Indeed, it would be wrong to omit relevant diagnostic cases from the regression test suite, and pointless to construct new cases for diagnostics where test cases already exist. So, obeying DRY, those cases should not be repeated in code; they should be used from where they sit, in test clauses.

  • While the major headings of the taxonomy are nearly correct, POSIX doesn't place all results in the correct category. In particular, dormant results are placed into all the non-boolean categories: UNRESOLVED, UNSUPPORTED and UNTESTED. It also mingles dormant and bad-test results in UNRESOLVED.

    Dormant tests are tests that are present in some form but deliberately not run, for whatever reason. There is a wide spectrum of possible reasons for dormancy. Some are extremely lightweight (the user temporarily marked a test dormant); some are heavyweight (system configuration, in all its many guises).

    Splitting dormant results into extra categories requires the test-runner to understand dormancy. That's not just extra work; in general, it can't be done. A test can be marked dormant for reasons far beyond what the test-runner could understand, and it's not the test-runner's job to understand them.

  • The DejaGNU extensions move analysis and presentation logic into the test-runner. That's wrong.

    However, it does make sense to recognize that a given test corresponds to a bug in a bug database, and to have persisting marks of expected failure, possibly even shared by a team. It just shouldn't be the test-runner's job to do those things.

    Whose job is it, then? I long ago analyzed essentially this question for Emtest, and I believe the same holds true for these uses of persistence. My conclusions are:

    • Persistence belongs on both sides of the test-report-presentation/test-runner interface
    • Ultimate control wrt using persistence belongs on the presentation side.
    • The test launcher has a read-only interest in the information - it can optionally skip tests that are expected to fail. It shouldn't be required to be that smart. The launcher also belongs on both sides of the interface, with the higher-level launcher logic on the presentation side.
    • The test-runner has essentially no interest in persistence. At most, it incidentally calls libraries that know about persistence.

Footnotes:

[1] AFAICT, it was updated in an apparent one-time effort in 2006.

[2] I'm not paying umpteen hundred to buy a printed copy of the spec. I'm going by D J Delorie's writeup of it.

2 comments:

  1. At the expense of asking a stupid question, what do you mean, precisely, by "external testing"? I just googled for this, and some people take it to mean black box (as opposed to glass box) testing, almost all of them seem to think it means testing by some organization other than the software developer organization, and some of the security folks seem to think it's related to penetration testing....

    My guess is that you mean something like "out of process" testing --- where the process running the set of tests isn't the same as the process that is running the test code proper. Is that what you mean?

    thanks!

  2. Robert: Yes, your last paragraph is exactly what I meant.
