Dijkstra on Abstraction for Program Correctness

This article is based on Edsger Dijkstra's "Notes on Structured Programming". I pick out only one string from this tapestry.

The effective exploitation of his powers of abstraction must be regarded as one of the most vital activities of a competent programmer. In this connection it might be worth-while to point out that the purpose of abstracting is not to be vague, but to create a new semantic level in which one can be absolutely precise.

— Edsger Dijkstra, The Humble Programmer

In science, before you make a formal claim, you first ought to be convinced that it’s correct. Proof is what convinces you (and others) that your claim is correct.

In the same way, before a programmer formally claims that their program is correct (i.e, all computations the program can evoke will produce the desired outcome), they ought to have been convinced of this by a proof.

A typical proof of a program’s correctness you can expect from programmers is “the tests passed”, where “tests” means black-box tests. Dijkstra didn’t think that passing black-box tests were sufficiently convincing proof. Dijkstra (ideally) wanted absolute proof and black-box testing could hardly offer this. He showed the insufficiency of black-box testing using an illustration of testing a machine that is claimed can multiply two 27-bit integers. How do you prove the absolute correctness of this machine using black-box testing? By verifying its output for each valid combination of input. But there are 2^54 combinations of input. Running a multiplication program 2^54 times in his day, he claimed, would take about 10,000 years (today, on my PC, it would take over 40 years). Therefore, this is impractical.

Dijkstra proposed that the only way to demonstrate absolute correctness of a program was to look inside the black box, to assess all possible computations that the program can evoke by assessing the program code. This meant going through the program one statement at a time, in the sequence of execution, assessing the effect of each statement’s execution and the cumulative effect of their execution, for all possible sequences of execution (as established by condition branching). He called this “enumeration”. (He also proposed the use of mathematical induction for loops and recursion for which enumeration can be impractical or impossible.)

(Notice, parenthetically, that a program is separate from the computations it evokes - the program is a bunch of text and the computations are a sequence of actions performed by a computer. And the ease of understanding and thus making claims about a computation by assessing its program depends on how much “the structure of the program text reflects the structure of the computation”. Hence the death of gotos.)

But the problem with this is that we humans are “slow-witted”. He said, “Enumerative reasoning is all right as far as it goes, but as we are rather slow-witted it does not go very far. Enumerative reasoning is only an adequate mental tool under the severe boundary condition that we use it very moderately.” Using enumeration on a highly complex monolithic program is too intellectually taxing to be useful.

We still need enumeration, though, since it’s our best bet for reaching absolute correctness. So, we must deal with the program's complexity. Dijkstra’s solution was to decompose the program into modules of manageable complexity. And his approach to achieving this was abstraction. “We should appreciate abstraction as our main mental technique to reduce the demands made upon enumerative reasoning,” he said.

Using abstraction, we can decompose a program into (a hierarchy of) modules, where each one is concerned with only a single objective and is conceptually stripped of all details that are not directly relevant to this objective. Each module may contain only the details that are essential to it (everything else is abstracted away). Therefore, each module may contain only its irreducible complexity or in other words, it can be made as simple as it can possibly be and thus more amenable to enumeration. Yay!

Dijkstra used the example of a program for printing the first 1,000 prime numbers. One of its modules (the root module) can be expressed as a function like this

fun `print first thousand prime numbers`() {
	var list: IntegerList
	list = `compute first thousand prime numbers`()
	`print list of numbers`(list)
}

This module is composed of three parts - the data type IntegerList, and the two functions compute first thousand prime numbers and print list of numbers. All these are essential to the module - the module only needs a way to get the primes, a way to print them and a common structure for the list of primes. Of course, each of them has an internal representation or implementation. IntegerList could be an array or a linked list; compute first thousand prime numbers may be using the Fermat test or the Sieve of Eratosthenes and so forth. But these details are not directly relevant to the module print first thousand prime numbers and so they are abstracted away. Only the essentials, the bare minimum are retained. Therefore, the module's complexity is minimised. And so enumeration becomes significantly more practical.