The Mathematics of Double-checking Programs

Read the original at UCL Discovery ↗ ← All Writing

The piece in brief

There is no algorithm that can check every program for bugs — Rice's theorem forbids it. So how do Amazon, Google and your bank sleep at night? With proof: Hoare Logic, Separation Logic, and the quiet mathematics of double-checking.

Imagine you own a large company delivering essential services, akin to industry giants like Amazon, Google, or Meta. Your business boasts an enormous client base, thousands of servers delivering these services, and generates millions (if not billions) of pounds in revenue per year. To keep up with the ever-evolving tech landscape, you constantly need to add new features, correct bugs, and otherwise work on the system delivering your service. This relentless drive is powered by a dedicated fleet of engineers maintaining the system.

It's a Friday afternoon, and Bob, one of your diligent software engineers, is working late at the London office to meet a critical deadline. In an effort to fix a minor bug, he uploads some new code to the server. Business continues to run smoothly for now, but a crucial question looms large: How can you be sure that Bob's code won't crash the system? After all, he's human, it's late, and a small mistake could easily slip through. With so much money and reputation at stake, the risk of your service going down over the weekend is a chilling prospect.

This scenario underscores a common dilemma faced by many businesses today. In a world increasingly driven by software, there is a pressing need to ensure that code performs as intended. Companies like Amazon, Google, and Meta invest substantial resources to mitigate the risk of system failures. As the digital economy expands, the ability to manage and minimize these risks becomes ever more critical.

There are not only financial concerns for big business, but safety and privacy ones for individuals too. For example, how can you trust that your bank account is safe? Can you trust that the online login system for your bank works correctly so that no one has access to your account that shouldn't? Ensuring system stability isn't just about maintaining operations; it's about safeguarding trust, reputation, and reliability in an interconnected world where downtime can have far-reaching consequences.

A pressing example is the CrowdStrike incident on the 19th of July 2024. The cybersecurity firm CrowdStrike released a flawed update for its Falcon Sensor security software, leading to significant disruptions on Microsoft Windows computers using the software. This update caused around 8.5 million systems to crash and become unable to restart properly, marking what has been described as the largest outage in the history of information technology.

There is an apparently obvious answer to the problem of program/system correctness: just double-check that Bob's code is correct before implementing it. This is surprisingly subtle. What does it mean to double-check it? One answer is to implement an algorithm for checking that code is OK before committing it to your operations. Unfortunately, providing such an algorithm is a mathematical impossibility.

A program is the specification of a process by which an input yields an output, called computation. For example, the program ‘$x + 1$’ takes an integer as an input and outputs another integer; namely, one more than the input. There are many equivalent accounts of what it means to be a computation, perhaps the most celebrated is the concept of a Turing Machine.

Alonzo Church in 1935 and Alan Turing in 1936 proved that there are limits to what one may actually say about and do with computations. An important consequence of which was a negative answer to the Entscheidungsproblem set by Hilbert and Ackermann in 1928: ‘Is there an algorithm that considers, as input, a statement and answers “yes” or “no” according to whether the statement is universally valid?’

Let a semantic property of a program be one about the program's behaviour (e.g., ‘Does it always output 0?’). This is in contrast to a syntactic property such as ‘Does the program contain an if-then-else statement?’. Call a property trivial if it is either true or false for every program. In his 1951 doctoral thesis, Henry Rice gave the following generalization of the limitative results by Church and Turing:

Theorem (Rice)

All non-trivial semantic properties of programs are undecidable. That is, no algorithm exists that determines whether or not a given program has the property or not.

Checking the presence of bugs is a semantic property. Consequently, Rice's theorem tells us that it is impossible to write a program that automatically checks for the absence of bugs in other programs; that is, taking a program and a specification as input, checks whether the program satisfies the specification or not. This does not imply an impossibility to prevent certain types of bugs, but the general problem is hopeless.

So, what are we to do? Of course, you can simply ask Bob and his colleagues to double-check each others' code before submitting it. But how do you ensure an adequate standard for them to do that? And, if one check is good, are two better? What you really desire is a mathematical proof that Bob's code is correct. To this end, you hire Alice.

Suppose Bob had the following specification for a project: ‘Write a function that computes the integer part of the square root of a non-negative integer $x$’. Bob chooses to use binary search and writes the following Python code $B$ (nb. // denotes integer division):

def sqrt(x):
    lt, rt = 0, x
    while lt <= rt:
        y = (lt + rt) // 2
        if y*y <= x < (y + 1)*(y + 1):
            return y
        elif y*y < x:
            lt = y + 1
        else:
            rt = y - 1

Inspecting Bob's code, one might be convinced that it is correct, but it is Alice's job to prove that the program satisfies the brief.

We can think of the code as a sequence of arithmetical operations. In doing so, the problem becomes analogous to checking a sequence of equations. When doing such checks, one must be careful to apply the laws of arithmetic correctly. For example, consider the following sequence of arithmetical steps:

$$\begin{aligned} a = b &\Rightarrow a^2 = ab \\ &\Rightarrow a^2 - b^2 = ab - b^2 \\ &\Rightarrow (a - b)(a + b) = b(a - b) \\ &\Rightarrow a + b = b \\ &\Rightarrow b + b = b \\ &\Rightarrow 2 = 1 \end{aligned}$$

This argument uses many familiar laws of arithmetic, but it has clearly gone awry. The subtlety is dividing by the factor $(a - b)$. Such divisions are only applicable in the case that the factor is non-zero, but this is not the case since we assumed $a = b$.

To check Bob's code, Alice needs to establish some fixed laws governing how programs manipulate expressions about the state of the variables. These laws need to be precisely stated so that no unsound inferences such as the division-by-zero example above are admitted. Having such laws, Alice merely needs to follow how the expression $x > 0$ evolves as Bob's program executes and verify that the property $y^2 \le x < (y + 1)^2$ holds at the end.

A Hoare triple specifying the correctness of a square-root program, over faint annotated Python code. — A *Hoare triple*: a pre-condition, a program, and a post-condition. Prove the triple, and you have proved the code correct — without ever running it.

This landmark observation was made by Tony Hoare in 1961. He came up with the Hoare Triple,

$$\{P\}\; C \;\{Q\}$$

in which $C$ is some program code, $P$ is a logical assertion about variables before executing $C$, and $Q$ is a logical assertion about the program state after executing $C$. We call $P$ and $Q$ the pre- and post-condition of the triple, respectively. The laws governing these triples are known as Hoare Logic. An example of a law in Hoare Logic is as follows:

$$\{P[E/x]\}\; x := e \;\{P\}$$

This says: ‘If $P$ is true with term $E$ (representing the number $e$) substituted for the variable $x$, then after the command $x := e$ is executed, $P$ will be true’. Since programs usually contain sequences of commands, we also have the following transitive law:

$$\dfrac{\{P\}\; C_1 \;\{Q\} \qquad \{Q\}\; C_2 \;\{R\}}{\{P\}\; C_1; C_2 \;\{R\}}$$

This expresses the following: ‘If $C_1$ executes with pre-condition $P$ and yields post-condition $Q$, and $C_2$ executes with pre-condition $Q$ and yields post-condition $R$, then $C_1; C_2$ executes with pre-condition $P$ and yields post-condition $R$’.

Thus, to check that Bob's code $B$ behaves as desired, Alice needs only to show that the following Hoare Triple holds:

$$\{x > 0\}\; B \;\{y^2 \le x < (y + 1)^2\}$$

That is: ‘If $x$ is a positive number, then executing $B$ yields a number $y$ such that $x$ is between (inclusive of the lower bound, exclusive of the upper) $y^2$ and $(y + 1)^2$’.

Alice writes a proof for the correctness of Bob's code. Rather than using the laws of arithmetic, Alice does this using Hoare Logic. A standard way of doing such ‘verification’ proofs is by annotating the code line by line with the post-condition (using the laws of Hoare Logic). One first writes the pre-condition and then writes the post-condition after each line of command. Alice may write the following:

def sqrt(x):
    # Precondition: x >= 0
    lt, rt = 0, x
    # { 0 <= lt <= rt <= x }
    while lt <= rt:
        y = (lt + rt) // 2
        # { 0 <= lt <= y <= rt <= x }
        if y*y <= x < (y + 1)*(y + 1):
            # { y^2 <= x < (y+1)^2 }
            return y
        elif y*y < x:
            # { y^2 < x }
            lt = y + 1
            # { 0 <= lt <= rt <= x }
        else:
            # { x <= y^2 }
            rt = y - 1
            # { 0 <= lt <= rt <= x }

Importantly, the condition above the while loop, { 0 <= lt <= rt <= x }, is a loop invariant — analogous to an induction invariant. That is, it is true before and after each iteration of a loop in a program.

This process may be articulated to such precision that it can be executed by a computer. Alice needs only to specify the laws and the pre- and post-condition suitable for the brief. This method enables automatic verification of thousands of lines of code. It is a seminal piece of mathematics that supports the vital role of technology in the modern world. However, today's businesses do not run on thousands of lines of code, but rather on tens of thousands, hundreds of thousands and millions of lines of code. Hoare Logic on its own does not scale to meet industry needs.

In 1999, through foundational work in formal logic, Peter O'Hearn and David Pym invented the ‘separating’ conjunction ($*$) which led to a refinement of Hoare Logic called Separation Logic. The separating conjunction renders program verification compositional: one does not need to tackle all million lines of code at once, but one can look at fragments of the code and fragments of the variables that it touches. It is governed by the law,

$$\dfrac{\{P\}\; C \;\{P\}}{\{P * R\}\; C \;\{P * R\}}\qquad \big(\mathrm{mod}(C) \cap \mathrm{fv}(R) = \varnothing\big)$$

This is known as the frame rule, named after the frame problem in artificial intelligence. It says that a program that executes safely in a small state (satisfying $P$) can also execute in any bigger state (satisfying $P * R$) and that its execution will not affect the additional part of the state (and so $R$ will remain true in the post-condition). This rule has the side condition that the variables modified by $C$ (i.e., $\mathrm{mod}(C)$) are disjoint from the free variables in $R$ (i.e., $\mathrm{fv}(R)$).

Separation Logic and its subsequent developments have scaled to meet the rigorous demands of modern technology. Consequently, your business is safer, and your bank is more secure. Rice's theorem still looms large, so no guarantee is possible, but while these double-checking proofs using modern techniques in logic and informatics are being implemented, we can sleep more soundly. Indeed, entire companies, such as Facebook's Infer, have been established to bring the rigour of mathematical proof to industry.

The power of Separation Logic lies in its unique operator, the separating conjunction ($*$). This operator originates from a nuanced observation about the relationship between implication ($\to$) and conjunction ($\land$ — logical ‘and’) in formal logic. We may write $P \vdash Q$ to denote that $Q$ is a logical consequence of $P$. The connection is the following:

$$P \vdash Q \to R \quad\text{iff}\quad P \land Q \vdash R$$

This is closely related to the notion of a Galois connection in order theory, and adjunction in algebraic topology.

By making subtle modifications to the definition of implication ($\to$), one derives a special kind of implication called the ‘magic wand’ ($-\!\!*$). The difference between $\to$ and $-\!\!*$ is fundamentally about how one ‘discharges’ assumptions in an argument. O'Hearn and Pym derived $*$ from the magic wand by the adjunction above,

$$P \vdash Q \mathbin{-\!\!*} R \quad\text{iff}\quad P * Q \vdash R$$

The subtle difference between $\to$ and $-\!\!*$ results in a significant distinction between $\land$ and $*$.

Let $M$ be a model of a system, such as the memory in the computer on which $B$ is running. Let $P$ and $R$ be properties, such as ‘$x = E$’, meaning ‘memory unit $x$ has value $E$’. If $P \land R$ is true of $M$, then each of $P$ and $R$ is true of $M$. However, if $P * R$ is true of $M$, then there exists a decomposition of $M$ into parts $M_1$ and $M_2$ such that $P$ is true of $M_1$ and $R$ is true of $M_2$. This property enables the frame rule in Separation Logic.

The separating conjunction and magic wand have many other uses than program verification, but we defer discussion to another time.

In practice, while powerful, formal verification of the kind described here is highly resource-intensive and time-consuming, often requiring significant computational effort and expert knowledge. Therefore, it is limited to those companies described above with the need and resources to use it. Additionally, it may struggle to catch certain types of bugs, particularly those related to performance, real-world data, or hardware-induced issues, as it focuses primarily on logical correctness rather than dynamic behaviour. It is for these reasons that not every tech company uses it. Nonetheless, as the mathematical discipline of logic has matured over recent years, formal verification has become more and more tractable.

Other ways of checking correctness are also available. For example, tech companies typically employ protocols such as unit and integration testing. Unit testing involves testing individual components or functions of a program/system in isolation to ensure they work as expected. Meanwhile, integration testing focuses on verifying the interactions between different modules or components to ensure they function correctly together, catching issues that may not be apparent during unit testing.

Over the last century, logic as a mathematical discipline has evolved significantly. It has expanded beyond set theory, theories of arithmetic, and model theory. Rather than being confined to the foundations of mathematics, it has become a versatile area in applied mathematics, enabling representation and reasoning about systems, broadly conceived. The concept of ‘system’ here encompasses, inter alia, transport networks (airports, train lines, bus routes), telecommunication networks (the internet, postal services, radio), manufacturing, logistics, and security policies. The question remains: can we trust these systems to function as intended? The answer lies in rigorous verification. We need to double-check these systems. We need proofs of their correctness.

References

Turing, A. (1936) On computable numbers, with an application to the Entscheidungsproblem. Journal of the London Mathematical Society, vol. 58, pp. 345–363.
Rice, H. (1953) Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, vol. 74, no. 2, pp. 358–366.
Hoare, C. (1969) An axiomatic basis for computer programming. Communications of the ACM, vol. 12, no. 10, pp. 576–580.
O'Hearn, P. & Pym, D. (1999) The Logic of Bunched Implications. Bulletin of Symbolic Logic, vol. 5, no. 2, pp. 215–244.
Reynolds, J. (2002) Separation Logic: A logic for shared mutable data structures. In Proceedings 17th Symposium on Logic in Computer Science, 2002; Copenhagen, Denmark.
McCarthy, J. & Hayes, P. J. (1981) Some philosophical problems from the standpoint of Artificial Intelligence. In Readings in Artificial Intelligence (pp. 431–450). Morgan Kaufmann.

This article was first published in Mathematics Today (January 2025), the magazine of the Institute of Mathematics and its Applications. An open-access record is held at UCL Discovery. Read the original at UCL Discovery ↗