Implementations as Formal Specifications

Implementations are formal, so an implementation may serve as its own formal specification.

That’s a really bad idea, for several reasons.

To start with, correctness is defined with respect to a specification. If an implementation serves as its own specification, then the implementation will be correct by definition, regardless of how badly the implementation fails to satisfy its intended purpose.

Even if the implementation satisfies every requirement of its intended purpose, using the implementation as its own specification will inevitably lead to over-specification, in which accidental features of the implementation will become just as mandatory as features that are essential to its purpose.

Consider, for example, early versions of Microsoft’s Windows operating system. Windows was written in assembly language, and some of its system calls were not very well documented. Without a good specification of those system calls, Windows programmers relied on experiments to discover exactly what those system calls did. Looking at the results of those experiments, a programmer might find that the result of a certain system call is being returned in two different machine registers, rB and rC. Knowing the result is available in both registers, a programmer might write code that overwrites register rB immediately after the system call, relying on the result remaining available in register rC. Microsoft, however, may have intended for that result to be available only in register rB; it was just an accident that the result was also available in register rC. Some time later, when Microsoft rewrites the system call to accommodate some new feature or to fix some bug, the rewritten system call might return the result only in register rB. When the new version of Windows is released, all code that depended on the result being available in register rC will break.

That sort of thing actually happened. Sometimes the code that broke had been written by Microsoft itself. Microsoft ended up having to test the major applications it cared about every time it released a new version of Windows. If Microsoft discovered that an important application was relying on some accidental feature of a system call, then Microsoft had to retain that accidental feature. If Microsoft didn’t care about your application, however, then Microsoft didn’t care whether their new version of Windows broke your application.

That situation improved somewhat after Microsoft replaced Windows with Windows NT, which was written in C instead of assembly language. C is not a terribly high-level language, but it does hide the details of register allocation from C programmers.

For another example, consider Java’s byte code verifier. At OOPSLA 1999, Stephen Freund and John Mitchell pointed out that the original specification of Java's byte code verifier "was an informal English description that was incomplete and even incorrect in some respects." Freund and Mitchell proposed a formal specification, which other academic papers built upon, but the immediate response from Sun Microsystems was to say its informal description wasn't intended to be authoritative, because the byte code verifier served as its own formal specification. (Indeed, the informal specification published in the second edition of the Java Virtual Machine Specification basically consisted of an informal description of algorithms used by Sun's implementation of the byte code verifier.) That meant the only practical way for compiler writers to tell what sequences of byte code were legal and which weren't was to run them through the byte code verifier. Although that produced an authoritative answer for any particular class file, compiler writers needed to know the general rules. As of Java SE 7, those general rules are expressed in Prolog, which provides a declarative formal specification that's independent of both informal specification and the implementation.

One final example: I once asked one of the people who wrote Microsoft’s C# compiler where I could find the official specification for automatic boxing and unboxing in C#. He told me the C# compiler was the only specification for that part of the C# language. Whatever the C# compiler did was correct. That meant, of course, that accidental changes to the C# compiler in future releases could have the effect of redefining the semantics of the C# language. Your C# program might be correct on Tuesday, but become incorrect when a new version of the C# compiler is released on Thursday.