Test = examples or theorem ?

Recently, I had a conversation with a friend about testing. The job of my friend is about the tests of system components from controlled conditions to real environment. Around an incredible meal and a tasty wine, it was the occasion to compare our understanding of testing. Mine is using the lens of maths. Let recap here!

(Is it an excuse for presenting the proof assistant Coq?)

Use-case presentation

The aim is to be didactic, so let a pick really simple use-case. Let consider an arbitrary list of natural numbers, say [5,4,3,2,1], and we want to be able to:

  • insert a number, say 3, i.e., resulting to [5,4,3,3,2,1];
  • count the occurrences of one number, say 3, i.e., resulting to the answer 2.

For the insertion, let consider that the list is read from left to right and the number is inserted when the next one is smaller. This choice could be flipped.

These functions are used by our fictional application and our goal is to test that its implementation is correct.

Weird C implementation

Because the syntax of C is very common, let fix the ideas using the C programming language. The code we write will compile and run but obviously it will not be C-compliant for efficiency (fool the one who uses recursion with C!) – the aim is to ease the reading with less common language as Haskell or Coq.

So, what do we need for the implementation? Easy, the very common linked list data structure and the functions insert and count. In addition, let also implement some helper functions. A question is the representation of the natural numbers, let keep it simple and just use integer. The linked list reads,

typedef struct List {
        Int head;
        struct List *tail;
} List;

where Int means an alias of int (typedef int Int;), using the Haskell typecase. The function insert takes a natural number, say n, and a list, say lst and returns another list. An implementation using recursion reads,

List* insert (Int n, List* lst)  {
        if (lst == NULL) {
                return Cons (n, NULL);
        } else {
                Int a = lst -> head;
                List *tail = lst -> tail;
                if (a <= n) {
                        return Cons (n, lst);
                } else {
                        return Cons (a, insert (n, tail));

where List* Cons (Int n, List* lst) is a function for helping the construction of new list. The function count takes a natural number, say n, and a list, say lst, and returns a natural number. A recursive implementation reads,

Int count (Int n, List* lst) {
        if (lst == NULL) {
                return 0;
        } else {
                Int a = lst -> head;
                List* tail = lst -> tail;
                Int r = count (n, tail);
                if (n == a) {
                        return 1 + r;
                } else {
                        return r;

So far, so good! We can test these two functions individually or we can also test them in combination. Somehow, we are looking for a property… Well, if we insert a number to a list, then the this number appears one more time. Other said, we have the test,

void test (Int n, List* lst) {
        if (count (n, insert (n, lst)) == 1 + count (n, lst)) {
                printf("test ok.\n");
        } else {
                printf("test ko!\n");

And now, we needs examples for testing. For instance, we can implement a function, say generate, that generates one list, and uses it for one test,

void main() {
        List *lst = generate (5);
        test (3, lst);

Hum?! But how do we know that this test covers well the real application? We have tested only one example, after all. Ok, we could test many more examples and complexify the generator. That's the direction of QuickCheck and other testing frameworks.

Here, we are trying to convince ourselves that many examples could be enough to be confidant. People with maths background know that the examples are often between the definition and the theorem. What acts as definition and as theorem?

Coq as a programming language

Coq is a proof assistant, i.e., an interactive theorem prover. Wow! it makes sense with the paragraph just above. However, we want a pragmatical implementation and test it – not fly in abstract maths. Let stay on track and use Coq for the implementation and the tests.

First thing first, quick presentation of Coq. For instance, it reads,

Require Import PeanoNat.
Notation Int := nat.

Definition y := 42.
Definition f x := x + 1.
Compute f y.

where it seems self explanatory; y is defined as the value 42 and f is defined as a function. The last instruction returns,

= 43
: Int

and an attentive reader could remark that the «mathematical» parenthesis as in \(f(y)\) are omitted; the notation is inspired by ML and derivative. The first instruction imports utilities for working with natural numbers and we will discuss the second instruction Notation in another section.

A recursive function cannot be defined using the keyword Definition and instead, we have to use the keyword Fixpoint. For instance, the factorial reads,

Fixpoint fac n :=
  match n with
    0 => 1
  | S n' => (S n')  * fac n'

where S n' means the successor of n', i.e., S n' = n' + 1; see the discussion about Notation for more details. The important point is the recursion: fac is defined using itself, i.e, fac and thus it is a Fixpoint. The keyword match allows to deconstruct the argument (here n) and check its pattern (it could be somehow thought as a if-branch; but the pattern matching is more powerful). And the evaluation of Compute fac 5. returns 120.

What is missing? The ability to define the linked list data structure. It is done using the keyword Inductive and it reads,

Inductive List : Type :=
| Cons (head : Int) (tail : List).

where List somehow defines a new type with 2 constructors: NULL without any argument, and Cons which is defined recursively by a simple number (head) and another List (tail). Note that using Coq (and similarly with OCaml or Haskell), it is possible – not to say strongly recommended – to pattern match on these « constructors ».

Now, we have defined the datastructure for the linked list, let implement the insert function,

Fixpoint insert (n : Int) (lst : List) : List :=
  match lst with
    NULL => Cons n NULL
  | Cons a tail => if a <=? n
		   then Cons n lst
		   else Cons a (insert n tail)

which takes two arguments, n of type Int and lst of type List and returns a List. The match acts as the if-branch in the C implementation. And reading both side by side, they appears similar, right?

The function count reads,

Fixpoint count (n : Int) (lst : List) : Int :=
  match lst with
    NULL => 0
  | Cons a tail => let r := count n tail in
		   if n =? a
		   then 1 + r
		   else r

where all should be almost self explanatory. Well, Coq is a programming language and thus it can be used for programming.

Wait, we were discussing about testing, so what about the tests?

Coq as a proof assistant

Now, let jump into the core point of the post. Having maths in mind, we would like to compute, i.e., specify a value for n and for the list lst, for instance,

Compute count 5 (insert 5 (Cons 1 (Cons 2 (Cons 3 NULL)))).

which returns 1 as expected and it also 1 + 0 since the value 5 does not appear in this list of the three values 1, 2 and 3. Let simplify this comparison and rewrite in Coq the functions generate and test,

Fixpoint generate (n : Int) : List :=
  match n with
    0 => Cons n NULL
  | S m => Cons (S m) (generate m)

Definition test (n : Int) (lst : List) :=
  count n (insert n lst) =? 1 + count n lst.

Nothing fancy in these definitions, right? Again, we can compute,

Compute test 3 (generate 5).

which returns true. From math in mind, instead we would like to write an example, for instance,

Example test_3_with_5 : test 3 (generate 5) = true.

however, Coq returns instead,

1 subgoal (ID 4)

  test 3 (generate 5) = true

Again, having maths in mind, we would write a Proof just by computing the few terms. Similarly, in Coq, we write,


which computes the left hand side and the right hand side, last comparing that they are correctly equal.

Ok, we have the intuition that this property (specification?) holds: for all natural number and for all list of natural numbers, counting the number of occurrences of one specific number when this specific number is inserted to this list is equal to the number of occurrences of this specific number in this very same list plus one. Perhaps it is a theorem… in Coq, it reads,

Theorem count_incr_by_insert : forall (n : Int) (l : List),
    count n (insert n l) = 1 + count n l.

and Coq returns,

1 subgoal (ID 6)

  forall (n : Int) (l : List), count n (insert n l) = 1 + count n l

which requires a Proof. It is far beyond this post to expose the Coq’s tactics for proving. Here a proof could be,

  intros n l; induction l.
  (* Search (_ =? _). *)
  rewrite Nat.eqb_refl.
  case (head <=? n).
  rewrite Nat.eqb_refl.
  case (n =? head).
  rewrite IHl; reflexivity.
  rewrite IHl; reflexivity.

Wow! We show that the both implementations insert and count associated with the custom data structure List are consistent (using the logic implemented by Coq).


We do not dive into all the extraction features of Coq. For instance, these functions could be converted as Haskell or OCaml and thus compiled to efficient native machine code. Here, for instance, using,

Require Extraction.
Extraction Language Haskell.
Extraction "Generated.hs" insert count.

then it extracts to the Haskell code,

data List =
 | Cons Nat List

insert :: Nat -> List -> List
insert n lst =
  case lst of {
   NULL -> Cons n NULL;
   Cons a tail -> case leb a n of {
		   True -> Cons n lst;
		   False -> Cons a (insert n tail)}}

count :: Nat -> List -> Nat
count n lst =
  case lst of {
   NULL -> O;
   Cons a tail -> let {r = count n tail} in case eqb n a of {
					     True -> add (S O) r;
					     False -> r}}

where it is not truly an idiomatic Haskell code but the code compiles.

Ah, why does the signature use the type Nat and not the type Int? The question is thus: how to define mathematically a natural number for computing? A computer is able to represent a finite set of natural numbers, usually named integer. Outside this range (finite set), the natural numbers cannot be represented by the computer; to be precise, it is because a computational efficiency for the processing unit that these integers are used.

From a mathematical point of view, these integers do not satisfy all the rules of the natural numbers. Instead, the natural numbers are defined as a zero, noted e.g., O, and a successor, i.e., the number one is defined as the successor of zero, noted S 0, and so on. For instance, the extraction of the addition reads,

data Nat =
 | S Nat

add :: Nat -> Nat -> Nat
add n m =
  case n of {
   O -> m;
   S p -> S (add p m)}

instead of the builtin n + m for the Haskell type Int. The number 5 is thus S (S (S (S (S 0)))) and such addition implies many computations or memory. Other said, the definition sacrifices the performance when computing. The issue also applies for the real numbers and the floating-point numbers; even probably worse.

Opinionated afterword

Usually, we have the specification, the implementation and the test. Then many efforts are provided to check the consistency of these three components. If we apply the lens of maths, then the components become,

  • specification = definition and theorem
  • test = proof
  • implementation = extraction

© 2014-2024 Simon Tournier <simon (at) tournier.info >

(last update: 2024-05-27 Mon 14:02)