Back from the Open Source Experience 2021 meeting, I got the chance to learn more about Pharo – a pure object-oriented programming language and a powerful environment à la Smalltalk – and about Coq – a formal proof management system. Even, I had the opportunity to discuss with Yves Bertot and I had been so lucky that he explained me the demo one-to-one. To be honnest, I am still rehashing many concepts and this experience is feeding my willing to learn more.
Once at home, I want to try by myself. Ah, because I am running Guix as a computational environment manager, the story starts with a yak shaving: close this old bug#46016 about Proof General. Other said, I want to interact with Coq from my regular text editor Emacs. Therefore, let fix the annoyance with patch#51755. And, since we are at it, although it is not necessary, let also add the Coq package semantics with the patch#51896. Now we are ready, let's go!
Our aim with this post is to sketch some ideas about compilation and correctness using a very basic --not to say trivial_– toy implementation. How to test the implementation? How to know the implementation is correct? And we get it, our aim is to use Coq.
Because we want to keep everything as simple as possible, the compiler is
about the simplest language ever: a calculator computing only additions but
with parenthesis, e.g.,
(1+2)+(3+(4+0)). Again, keeping everything as
simple as possible, we are using the simplest computational model: a stack
machine with two instructions (
ADD). Therefore, we need:
- a parser reading the input and returning a tree,
- a compiler transforming the tree to instructions for the stack machine,
- an emulator running the stack machine.
Our didactic goal
In our very restricted toy framework, the pipeline of a computation is defined by this composition –data flowing from right to left–,
pop . emulator . compiler . push
pushreturns a tree,
compilertransforms the tree to a sequence of instructions,
emulatorexecutes this list of instructions,
popreturns the last elements; we expect one value.
In this post, we are going to implement various
The aim is to discuss how it is possible to ensure the correctness of the
emulator . compiler.
Implementation using Haskell
Parsing with Haskell
Using the builtin Haskell parsing utilities, the
parser function reads,
parser :: String -> Maybe STree parser input = case parse expr "" input of Left _ -> Nothing Right e -> Just e
expr of type
Parser STree. Well, please read all the details in the
Parser.hs. Keeping simplicity in mind, the data type for the tree is
data STree = Val Integer | Add STree STree
where the leaf is named
Val and carries the value of an integer; the
Add is a node and by referring recursively to
STree it defines
the tree . For instance,
(1 + 2) + 3 is thus represented by the tree,
Add / \ Add Val 3 / \ Val 1 Val 2
Val are leaves and
Add are nodes. So far, so good!
The emulator (virtual machine) runs two instructions (
binary, here, is a list of instructions, and the stack is a list of integers.
The Haskell code implementing these words reads,
data Instr = PUSH Integer | ADD type Binary = [Instr] type Stack = [Integer]
Therefore, the emulator takes a binary, returns a stack and it executes (computes) starting with an empty stack,
emulator :: Binary -> Stack emulator b = exec b 
For the interested reader, the details of
exec is provided here.
The input of the compiler a tree
STree and it produces a binary (a list of
instructions), i.e., it reads,
compiler :: STree -> Binary compiler source = comp source 
comp is an auxiliary function dealing with each case of the tree: if
it is a leaf (
Val), then the instruction is to
PUSH the value in the stack
bytecode), otherwise if it is a node (
Add), then compile recursively the
subtrees and set the instruction
ADD to stack. These words are implemented
in Haskell with,
comp :: STree -> Binary -> Binary comp (Val n) bytecode = PUSH n : bytecode comp (Add x y) bytecode = comp x $ comp y $ ADD : bytecode
…to complete a working example, the functions
pop are here the
identity function (do nothing) and take the first element. Well, adding some
other minor code,
cmdfor the command line interface and parsing the input string,
gofor gluing the compilation and running parts,
emitfor debugging purpose,
and we end with,
$ ./calc-no-coq -x '(1+2)+3' = 6 $ ./calc-no-coq -c '(1+2)+3' Instructions: "PUSH 1, PUSH 2, ADD, PUSH 3, ADD, ;"
Implementation using Coq
As you see just above, we named the previous
calc-no-coq because the title
of this post is about «compilation and correctness», thus now we are going to
speak about correctness. The perfect tool is Coq the proof assistant. Before
proving, let use Coq as a programming language and implement these compiler
and emulator in Coq, then extract to Haskell. Other said, we reuse the same
pipeline as before,
pop . emulator . compiler . push
compiler are going to be implemented in Coq, the
push will be adapted accordingly, i.e.,
pushconverts from the Haskell tree datatype
STreeto the Coq tree datatype
Tree, i.e, signature
STree -> Coq.Tree,
popconverts from the Coq stack datatype to Haskell integer, i.e., signature
Coq.Stack -> Integer
Coq.Tree is defined by the module
Inductive Tree : Type := Val (n : nat) | Add (x : Tree) (y : Tree).
which looks similar to definition of
Coq.Stack is identically
defined as above, i.e., a list of number,
Definition Stack := list nat.
So far, so good!
Similarly to the previous implementation, we straightforwardly adpat the data structure for the instructions and the binary (list of instructions),
Inductive Instr:Type := PUSH (n : nat) | ADD. Definition Binary := list Instr.
The emulator executes a bytecode (
b : Binary) and returns a stack,
Fixpoint exec (b : Binary) (s : Stack) : Stack := match b, s with PUSH n :: rest, stack => exec rest (n :: stack) | ADD :: rest, m :: n :: stack => exec rest (m + n :: stack) | , stack => stack | _, stack => stack end. Definition emulator (b : Binary) := exec b .
exec is the function really computing; this function takes two
arguments: a list of instructions (
Binary) and stack and then returns
The compiler requires a tree (named above
Coq.Tree) and an auxiliary function,
Fixpoint comp (source : Tree) (bytecode : Binary) : Binary := match source with Lang.Val n => PUSH n :: bytecode | Lang.Add x y => comp x (comp y (ADD :: bytecode)) end. Definition compiler (source : Tree) := comp source .
where all is strongly similar with the previous implementation.
…to a complete working example, we just need to extract from Coq to Haskell with,
Require Export Lang Compiler Emulator. Require Extraction. Extraction Language Haskell. Extraction "Generated.hs" compiler emulator.
additing some Haskell glue code, as
pop, and command-line and
parser, we get,
$ ./calc -x '(1+2)+3' = 6 $ ./calc -c '(1+2)+3' Instructions: "PUSH 1, PUSH 2, ADD, PUSH 3, ADD, ;" $ ./calc -c '1+(2+3)' Instructions: "PUSH 1, PUSH 2, PUSH 3, ADD, ADD, ;"
Note the order of the instructions depending on the parenthesis order, i.e., depending on the tree, for instance,
(1+2)+3 1+(2+3) Add Add / \ / \ Add Val 3 Val 1 Add / \ / \ Val 1 Val 2 Val 2 Val 3
Neat! However, let avoid to go out of rail and thus let recap!
- we are interested by the
- for both implementations, the command-line is done by the same Haskell
cmd); idem for parsing (
Now, we want to have a proof that the implementations of
compiler are correct. For the implementation in Haskell, we cannot do much,
except run QuickCheck (large number of examples picked randomly). For the
implementation in Coq, well Coq is a proof assistant after all, so could we
prove something about the correctness?
compiler are correct
In place of a large test suite, we can reason about the logic of the functions
compiler. As we wrote in Test = examples or theorem?,
instead of ruling out many examples to convince us about the correctness, we
could try to prove a theorem. Which one?
We would like to prove that compiling the code and running it using the emulator is equal to interpret it using an (abstract) evaluator. It means prove this theorem,
Theorem correctness: forall (code : Tree), emulator (compiler code) = eval code :: .
eval is an abstract interpreter implemented in Coq but never
effectively doing a single real computation. Such interpreter is easily
defined by the recursive function,
Fixpoint eval (source:Tree) := match source with Val n => n | Add x y => (eval x) + (eval y) end.
A formal proof of the theorem
correctness requires other theorems, lemma or
corollary. Note that for Coq, theorem, lemma and corollary are the same but
they help the human reader.
How do we prove this theorem
correctness assisted by Coq?
Unwrap the emulator
The emulator is a layer on the top of the execution model – the function
exec. Instead of the theorem
correctness, we could think about the
Theorem correctness': forall (code : Tree) (stack : Stack), exec (compiler code) stack = eval code :: stack.
and if this theorem
correstness' holds, then the proof of the theorem
Corollary correctness'': forall (code : Tree), exec (compiler code)  = eval code :: . Proof. intro code. rewrite correctness'. reflexivity. Qed. Theorem correctness: forall (code : Tree), emulator (compiler code) = eval code :: . Proof. intro code. unfold emulator. rewrite correctness''. reflexivity. Qed.
where the corollary is a straightforward application of the (admitted for now)
theorem. Then, the proof of
correctness just unfolds the function of
emulator and applies (
rewrite) the previous proved corollary.
How do we prove this theorem
Unwrap the compiler
The correctness is now dependent on the theorem
compiler is a wrapper around the function
comp, we could think about the
Lemma correctness_comp: forall (code : Tree) (bytecode : Binary) (stack : Stack), exec (comp code bytecode) stack = exec bytecode (eval code :: stack).
The key point of the proof is the use of the additing commutativity proof
Nat.add_comm, \(\forall n, m : n + m = m + n\). Such could appear
straightforward, but it is not because it depends on how the numbers are
internally represented. For efficiency purpose, a computer somehow represents
only a finite set of natural numbers, commonly named integer. Instead Coq is
using, by default, the mathematical construction of natural numbers defining a
O and a successor, i.e., the data structure,
Inductive nat : Type := O | S n.
where e.g., 3 is thus
S (S (S 0)). That's why, the proof applies
Nat.add_comm. Aside this consideration, the proof mainly unfolds and folds
the definitions; see below.
correctness_comp lemma, then the corollary
just considers the case of an empty binary,
Corollary correctness_comp': forall (code:Tree) (stack:Stack), exec (comp code ) stack = eval code :: stack.
…ending the proof is just unfolding the functions of interest and applying
rewrite) the previous proved results. The complete proof of correctness is
therefore (presented linearly as in math textbook),
Lemma correctness_comp: forall (code:Tree) (bytecode:Binary) (stack:Stack), exec (comp code bytecode) stack = exec bytecode (eval code :: stack). Proof. intros code. induction code. - simpl. reflexivity. - unfold comp. fold comp. intros bytecode stack. rewrite IHcode1. rewrite IHcode2. unfold eval. fold eval. unfold exec. fold exec. rewrite Nat.add_comm. reflexivity. Qed. Corollary correctness_comp': forall (code:Tree) (stack:Stack), exec (comp code ) stack = eval code :: stack. Proof. intros code stack. rewrite correctness_comp. unfold exec. reflexivity. Qed. Theorem correctness': forall (code:Tree) (stack:Stack), exec (compiler code) stack = eval code :: stack. Proof. intros code stack. unfold compiler. rewrite correctness_comp'. reflexivity. Qed. Corollary correctness'': forall (code:Tree), exec (compiler code)  = eval code :: . Proof. intro code. rewrite correctness'. reflexivity. Qed. Theorem correctness: forall (code:Tree), emulator (compiler code) = eval code :: . Proof. intro code. unfold emulator. rewrite correctness''. reflexivity. Qed.
Done! Well, almost. We could also prove that the behaviour of the
eval is correct, i.e., the properties of the addition for
mathematical intergers are fully satisfied.
A Coq's proof is somehow a compilation. Therefore, similarly as Thompson's Reflections on Trusting Trust, what is the seed we have to trust? How to inspect this seed? What is the size / complexity of this seed?
For sure, realistic compilers usable for critical embedded software using formal verification are around, as CompCert – incredible piece!
Last, whatever the trust we have in the compiler, it runs on the top of a kernel (operating system). What trust do we have in this binary kernel? How this binary kernel is it built? Maybe, all the bootstrap chain could starts from stage0 (Hex0) on some formally verified microkernel as seL4 or on a Linux kernel compiled by CompCert from GNU Hurd microkernel1.
Assuming Coq would run Hurd. ;-)