
Authors

John Hughes (posted by Thomas Arts)

http://www.quviq.com/

How to find buggy preconditions

When writing state machine models with QuickCheck's eqc_statem, you specify preconditions that should hold in order to include a command in a test. How can you find out that the precondition you wrote is buggy?

When tests pass, you never see the test data—so unless you take care to gather relevant statistics, you can end up running thousands of successful tests that test something quite different from what you intended! In particular, when you use eqc_statem to generate tests, you would like to know that your tests contain a good mix of all the commands that you are supposed to be testing—and hitherto, this hasn’t been easy to check. Sound familiar? Then read on.

Example - process registry

For example, suppose we test the process registry, as in our basic training course, generating command sequences that contain calls to spawn, register, unregister and whereis:

command(S) -> 
    oneof( [{call,?MODULE,spawn,[]}]++ 
           [{call,?MODULE,register,  
               [name(),elements(S#state.pids)]}
             || S#state.pids/=[]]++ 
           [{call,?MODULE,unregister,[name()]}, 
            {call,erlang,whereis,[name()]}]). 
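The generator refers to S#state.pids and S#state.regs, and to a name() generator, none of which are shown above; a plausible sketch of those definitions (our assumption, in the style of Quviq's registry examples) is:

%% Model state: the pids spawned so far, and the {Name,Pid} pairs
%% currently registered.
-record(state, {pids = [],
                regs = []}).

initial_state() ->
    #state{}.

%% A small pool of names, so that register, unregister and whereis
%% frequently collide on the same names within a test.
name() ->
    elements([a,b,c,d]).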

Here’s the property we’re testing:

prop_registration() -> 
   ?FORALL(Cmds,commands(?MODULE), 
           begin 
             {H,S,Res} = run_commands(?MODULE,Cmds), 
             [?MODULE:unregister(N) || {N,_} <- S#state.regs], 
             [exit(P,kill) || P <- S#state.pids], 
             Res==ok 
           end). 

There’s a little clean-up code in there to unregister and kill the pids used in each test, but otherwise this is pretty standard. Now, there is a problem in the QuickCheck specification we’re using here (not in these definitions, but elsewhere). All the tests pass, but we’re not really testing what we think. Let’s see how the problem can be found.

Collecting the lengths of test sequences

Of course, we can check the lengths of the generated test cases by adding collect(length(Cmds),…) to our property.
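For concreteness, collect/2 simply wraps the boolean body of the property; here is a sketch of prop_registration with the call added (this wrapped version is ours, not shown in the article):

prop_registration() ->
   ?FORALL(Cmds,commands(?MODULE),
           %% collect/2 records length(Cmds) for each test and
           %% prints the distribution once testing is complete.
           collect(length(Cmds),
                   begin
                     {H,S,Res} = run_commands(?MODULE,Cmds),
                     [?MODULE:unregister(N) || {N,_} <- S#state.regs],
                     [exit(P,kill) || P <- S#state.pids],
                     Res==ok
                   end)).

If we do so, we'll see output something like this: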

OK, passed 1000 tests 
8% 0 
8% 2 
8% 1 
6% 4 
6% 3 
4% 6
4% 5 
4% 8 
4% 7
3% 12 
3% 11 
3% 10 
3% 9 
2% 16 
2% 15 
2% 18 
2% 14 
… 

We can see that many different lengths of sequence were run… up to 111 commands in this run, in fact… but it’s hard to draw very firm conclusions about the test data just from this. It might seem strange, though, that 8% of the tests (80 of the 1000) have sequence length zero.

Summarizing test cases

We would like to check that we get a reasonably even distribution of all four commands in our test sequences. We could add a call of collect(Cmds,…) to our property, but doing so will display the distribution of test cases in their entirety—and most likely, will fill the screen with output containing an enormous number of different test sequences when we run QuickCheck. This is not really helpful.

Of course, we could define a way to “summarize” test cases, and collect the summaries instead… perhaps something like:

summarize(Cmds) ->
   lists:usort([Name || {set,_Var,{call,_Mod,Name,_Args}} <- Cmds]).

which extracts a list of the command names from each test case, then sorts it and eliminates duplicate names, converting each test case into the list of distinct command names that appear in it.
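To see what summarize operates on: eqc_statem represents a generated test case as a list of symbolic commands of the form {set,Var,{call,Mod,Fun,Args}}. On a small, hypothetical three-command sequence it behaves like this:

1> Cmds = [{set,{var,1},{call,reg_eqc_2,spawn,[]}},
           {set,{var,2},{call,erlang,whereis,[a]}},
           {set,{var,3},{call,reg_eqc_2,spawn,[]}}].
2> summarize(Cmds).
[spawn,whereis]

Adding collect(summarize(Cmds),…) to the property generates output something like this: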

OK, passed 1000 tests 
60% [register,spawn,whereis] 
13% [spawn,whereis] 
6% [whereis] 
6% [spawn] 
3% [register,spawn]

This is much more helpful: there are relatively few entries in the table, and we can see which combinations of commands appeared in tests. In fact, the problem in the specification is fairly visible just from this table—can you see it?

Function command_names

To make this kind of data easier to collect, QuickCheck provides a function command_names (version eqc-1.15 and above) which can be used to redefine summarize more easily:

summarize(Cmds) -> 
   lists:usort(command_names(Cmds)).
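On the same hypothetical sequence as before, command_names gives {Mod,Fun,Arity} triples (one per command, we assume, in test order), which lists:usort then sorts and deduplicates:

3> command_names(Cmds).
[{reg_eqc_2,spawn,0},{erlang,whereis,1},{reg_eqc_2,spawn,0}]
4> summarize(Cmds).
[{erlang,whereis,1},{reg_eqc_2,spawn,0}]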

Using this function, the collected output would appear like this instead:

OK, passed 1000 tests 
62% [{erlang,whereis,1},{reg_eqc_2,register,2},{reg_eqc_2,spawn,0}] 
12% [{erlang,whereis,1},{reg_eqc_2,spawn,0}] 
8% [{erlang,whereis,1}] 
5% [{reg_eqc_2,spawn,0}] 
3% [{reg_eqc_2,register,2},{reg_eqc_2,spawn,0}]

The function names are replaced by {module_name, function_name, arity} triples—just in case you happen to use two functions with the same name in your tests.

Aggregation: How often do we test each command?

But these tables still don’t answer the simple question: how often do I use each command? The fundamental problem is that we can only collect one value per test case, which QuickCheck then displays statistics over. What we would really like to do instead is to collect all the function names in each test case, and display statistics aggregated over all tests when QuickCheck is done testing.

If we want aggregated statistics over command names, we simply add aggregate(command_names(Cmds),…) (version eqc-1.15 and above) to our property, in the same way as we have so far added collect. This collects a list of values in each test, aggregates all the lists together, and then displays a table showing the frequency with which each list element occurs in the aggregated result.
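Here is a sketch of the same property with aggregate in place of collect (again, the wrapping is ours):

prop_registration() ->
   ?FORALL(Cmds,commands(?MODULE),
           %% aggregate/2 collects the whole list of command names from
           %% each test, and reports frequencies over the concatenation
           %% of all the lists collected across the run.
           aggregate(command_names(Cmds),
                     begin
                       {H,S,Res} = run_commands(?MODULE,Cmds),
                       [?MODULE:unregister(N) || {N,_} <- S#state.regs],
                       [exit(P,kill) || P <- S#state.pids],
                       Res==ok
                     end)).

In this case, the result we obtain is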

OK, passed 1000 tests 
35% {reg_eqc_2,spawn,0} 
35% {erlang,whereis,1} 
28% {reg_eqc_2,register,2}

which shows that 35% of tested commands were spawn, 35% were whereis, and 28% were register… wait a minute, what happened to unregister?

A buggy precondition

When a command fails to appear as often as we expect in tests, this is often because the precondition we have stated is more restrictive than we expect, and so prevents the inclusion of the command in test cases. When a command fails to appear at all, this is often because of a bug in the precondition, which makes it always false. Let’s check the precondition of unregister:

precondition(S,{call,_,unregister,[Name]}) ->
  lists:keymember(Name,2,S#state.regs);

But this is wrong! This specification uses a list of {Name,Pid} pairs to record the currently registered processes… so a Name can be unregistered if Name occurs as the first element of a tuple in S#state.regs. Of course, the name never appears as the second component of a tuple—because these components are pids. Yet the precondition above looks for Name as a second component! As a result this precondition is always false, and unregister is never included in test cases. The precondition should, of course, have been lists:keymember(Name,1,…) instead.
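The corrected clause looks like this (any other clauses of precondition/2, which are not shown here, stay as they are):

precondition(S,{call,_,unregister,[Name]}) ->
  %% Name is the first element of each {Name,Pid} pair in regs.
  lists:keymember(Name,1,S#state.regs);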

Fixing the bug, and retesting, the distribution we obtain is:

OK, passed 1000 tests 
32% {erlang,whereis,1} 
32% {reg_eqc_2,spawn,0} 
26% {reg_eqc_2,register,2} 
7% {reg_eqc_2,unregister,1}

which now contains all four commands, as we would expect. The unregister command occurs less often than the others, but this is not so surprising: before we can unregister anything, we must first spawn a process and register it. We are at any rate in a good position to start adjusting the command probabilities using frequency, and to see directly how that affects the actual command distributions in the generated tests.

The lesson

Always aggregate the command names in your tests, and check that all the commands you expect are present. This is an easy way to avoid the “gotcha” of a wrongly defined precondition preventing one of the commands you want to include from being tested at all.