Reading PlusCal Algorithms and Understanding their TLA+-based Error Statements by Gene Cooperman Copyright (c) 2017, Gene Cooperman. This text may be freely distributed and modified as long as this copyright notice remains. In these days of many-core CPUs, writing multi-threaded code is becoming more and more important. Unfortunately, guaranteeing that multi-threaded code is correct is also difficult, even for the experts. For example, the double-checked locking pattern (DCLP) for lock-free multi-threaded programming could not be implemented safely in Java until 2004, and could not be implemented safely in C++ until C++11. This meant that there were many legacy implementations of such code in Java and C++11 that were simply incorrect. Formal verification could have found bugs in those implementations. But formal verification has traditionally been considered difficult, and requiring substantial advanced training. So, developers checked their multi-threaded code by eye, and hoped that there were no rare race conditions (perhaps causing a bug only one time in a million). The PlusCal language is an attempt to make formal verification easily accessible. Please note the PlusCal-examples subdirectory. PlusCal provides a gentler introduction to the TLA+ toolbox that allows you to write an algorithm in a format very close to your original source code. TLA+ is a tool for explicit-state model checking. The idea is that you will re-write your original program in a form that plusCal can read. It is also crucial to add enough labels. The plusCal system will view each labeled statement (including all statements until the next label) as a single atomic statement. (These labels also serve a second person. They will act as our "line numbers" in referring to both your plusCal program and to the translated TLA+ program that is created in the same .tla file.) You choose the level of granularity of your PlusCal model. You decide what set of statements act atomically (are found betwee labels), and TLA+ will then report the results of all possible interleavings of those atomic statements as multiple threads (called process here) execute. For example, compare: PlusCal-examples/bank_account.tla PlusCal-examples/bank_account-assembly.tla to see the classical bank account example for multi-threading, presented first at the C source code level, where no bug exists, and then at the assembly level, where a bad interleaving of assembly instructions produces a bug. Let us now begin a reading knowledge-only examination of TLA+ and PlusCal. For a first look at TLA+, I suggest to ignore all the reference materials, and simply read carefully: PlusCal-examples/two_threads.tla . If you prefer pretty-printing, you can look instead at a pretty-printed version of the same source code: PlusCal-examples/two_threads.pdf Note that the code appears near the top in an "--algorithm threads" statement. This region holds the plusCal version of the code. The TLA+ toolbox allows you to translate that into direct TLA+ code. Note the TLA+ translation appearing after "BEGIN TRANSLATION", lower down in the file. Presumably, you have already written your multi-threaded code in C or a similar language. So, why is it useful to write your code a second time, but this time in PlusCal, and now with the addition of labels to indicate which sequences of statements are atomic? The reason is that the TLA+ toolbox is an explicit-state model checker. The model checker allows you to consider _all_ possible inter-leavings of statements (with some caveats) in the different processes of your PlusCal program. Your original multi-threaded program can only select the random inter-leavings of different threads chosen by the scheduler or your operating system. So, now you examine your code from a new viewpoint. The risk of model checking is that your model may or may not accurately reflect the original multi-threaded code. Note the famous aphorism: "All models are wrong but some are useful". by George Box, in _Robustness in Statistics_, Academic Press, 1979 Now, let's look at the low-level TLA+ code that starts with "BEGIN TRANSLATION". Here, your plusCal program has been translated into a language for explicit-state model checking. Luckily, with a few simple clues, this low-level TLA+ language is also easy to read. The first clue to reading TLA+ is that TLA+ tries to create a model based on a finite state automaton. Each state consists of: the current statement of each thread (indicated by the PlusCal label); and the values of the global variables. See the TLA+ "VARIABLES" statement. If your model allows an infinite number of variable values, then you will have an infinite number of states to check. So, your model should restrict the variable values. For example, maybe all integer values larger than 3 could be represented by "LARGE". The second clue to reading TLA+ is that each paragraph typically refers to a transition between two states. That is why it will refer to both pc and pc'. (Continue reading below for the meaning of pc and pc'.) The third clue to reading TLA+ is that it refers to a pc[] array. This is the "program counter". So, pc["thr1"] refers to the the current program counter, or statement label for process "thr1". The fourth clue to reading TLA+ is that a prime (') after a variable name, such as x', means that this is the value of the the variable x in the next state (the target of the current transition between states). Similarly, pc' refers to the value of the program counter in the next state (i.e., the label of the statement that we will execute next after the current transition). The fifth clue to reading TLA+ is that the "=" operator is the same as "==" in Java or C/C++. Still More important, no variable is ever modified or set to a new value in TLA+! Instead, we are simply comparing the variable values found in the source and target states, during a transition between the two states. Finally, for good or bad, you must get used to reading TLA+ code if you want to understand what your PlusCal program is doing. When TLA+ "compiles" a PlusCal program to TLA+, and when TLA+ runs the resulting TLA+ program within a "model" (analogous to an executable), then any TLA+ error statements will likely refer to the underlying TLA+ code, and not to the PlusCal code itself. TLA+ will provide an error trace of a series of states if a model error is found. But that error trace will refer to TLA+, and not to PlusCal. The program, two_threads.tla, is a good place to start reading PlusCal/TLA+. But when you understand it, consider reading some of the other PlusCal examples in the same directory.