Modeling programs
The goal of the following exercises is to build models that represents the behaviors of given programs. A model is a transition system (or state machine) that describes the behaviors of the program. We will start using the TLA+ toolbox and write models as PlusCal algorithms. These models may abstract some features of the actual programs under consideration. We will learn how to generate finite transition systems from these models. We will also discuss which properties of the programs are preserved by our abstractions and which ones are not.
Foreword on the PlusCal language
PlusCal is an algorithmic language. It differs from programming languages in several directions:
- PlusCal algorithms can manipulate arbitrary mathematical objects (sets, sequences, unbounded integers, etc). Programming languages require to implement these objects using low-level types (integers, booleans, pointers, etc) and can only manipulate values from fixed bounded types.
- PlusCal algorithms can be non-deterministic. Programs need to be deterministic in order to be executable.
- PlusCal has a well-defined notion of an atomic step. In general, the instructions in a program (C, C++, Java, Python, etc) are not atomic. Some mechanisms may be provided to group instructions in atomic steps (e.g. locks)
- PlusCal has a formal semantics defined as TLA+ expressions, that, in turns, formally defines the state-space and the runs of the algorithm.
The TLA tools offer three main features:
- The translation of PlusCal programs into TLA+ expressions.
- The TLC model-checker that allows to find bugs in algorithms, and to prove algorithms with finite resources.
- The TLA prover which is not in the scope of this class.
A simple C function
We consider the following C function. Our goal is to ensure that this function terminates, and that the assertion holds when the function terminates.
int f(int x)
{
while (x != 0) {
if (x < -1)
x = 0;
x = x - 1;
}
assert(x >= 0);
}
The file simple_process.tla contains the following PlusCal model of the function f
above:
---- MODULE simple_process ----
EXTENDS Integers, TLC
(* PlusCal options (-termination) *)
(*--algorithm simple_process {
variables
x \in -3..3;
{
w: while (x /= 0) {
i: if (x < -1)
r: x := 0;
d: x := (x - 1);
};
a: assert(x >= 0);
}
}
*)
====
- What is the meaning of the syntactic constructions that appear in this PlusCal algorithm? Run the command
Parse model
to check that the model is syntactically correct - Open the file
simple_process.tla
and check that the assertion holds. What can you conclude? - Replace the assertion by
assert(x < 0)
and check your model again. What happens? - Set back the assertion as in the model above. Then, add the line
PROPERTY Termination
in the filesimple_process.cfg
. Then, check your model again. What happens? - Restrict the initial values of variable
x
to0..3
. Check termination again. What happens? - Generate the transition system corresponding to the PlusCal model using the command-line tools. Explain your answers to the questions above using the generated transition system. What is a state of this model?
- What can you conclude on the C program from what you have observed on the PlusCal model?
A multi-process C program
We consider the C program below which has a shared variable x
and a function increment
that is executed by several threads (not shown in the code). This function first stores the value of x
in a local variable y
. Then, it writes y+1
into the shared variable x
. We assume that each line is an atomic instruction.
#define N 2
int x = 0;
void * increment(void *)
{
int y = x;
x = y + 1;
return NULL;
}
int main(int argc, char * argv[])
{
pthread_t T[N];
for (int i = 0; i < N; ++i)
pthread_create(T+i, NULL, increment, NULL);
for (int i = 0; i < N; ++i)
pthread_join(T+i, NULL);
return 0;
}
The following code is a PlusCal model of the C program above, with N
threads executing the function increment
(non_atomic_increment.tla).
---- MODULE non_atomic_increment ----
EXTENDS Naturals, TLC
CONSTANTS N (* Number of processes *)
(*--algorithm non_atomic_increment {
variables
x = 0;
process (P \in 1..N)
variables
y = 0;
{
l0: y := x;
l1: x := y + 1
}
}
*)
\* BEGIN TRANSLATION
\* END TRANSLATION
AllDone == \A self \in 1..N: pc[self] = "Done"
Correctness == [](AllDone => x=N)
====
A first model of a concurrent program
- Explain the meaning of the new syntactic constructions that appear in this model
- The property
Correctness
that is defined at the end of the PlusCal model expresses that upon termination of all the processes, then x=N. In order to check if this property hold, create the filenon_atomic_increment.cfg
and add the declarationPROPERTY Correctness
in the dedicated section. Does this property hold forN=1
process? ForN=2
processes? - Generate the transition system defined for
N=1
, and thenN=2
, using the command-line tools (to that purpose, commentPROPERTY Correctness
in thecfg
file). What is a state of this model? Are the transition systems consistent with your previous answers?
Atomic instructions in PlusCal
Labels in PlusCal algorithms define which blocs of statements are executed in an atomic way. Labeling is crucial to the modeling of concurrent and distributed algorithms where correctness heavily depends on which statements can be executed atomically. Moreover, atomic actions should be carefully defined to correspond to which statements are actually atomic in the system.
PlusCal puts restrictions on labels in order to let the translation in TLA+ be as close as possible to the algorithm. These constraints are described in section 3.7 of the PlusCal user manual. In particular:
- each statement must belong to exactly one label
- each variable is updated at most once in a label
Our goal is to understand the impact of labeling in concurrent algorithms.
- Remove the label
l1
in the previous algorithm. Check that the propertyCorrectness
holds forN=2
processes. - Output the corresponding transition system. What is a state in this model? What has changed when the label
l1
has been removed. - Explain which of the two PlusCal models (with label
l1
or withoutl1
) corresponds the best to the C program above (multiprocess.c)
A small model for a C program
We consider the following C program where the lock
is initially free, and variables old
and new
are 8-bytes pointers. The initial values of old
and new
are unknown. The actual condition in the if
instruction is abstract for simplicity (hence the ...
) and should be modelled by a non-deterministic choice.
// lock is free, old and new are 8-bytes pointers
do {
lock();
old = new;
if (...) {
unlock();
new++;
}
}
while (new != old);
- What is the impact of modeling the
if
statement as a non-deterministic choice, on the behaviors of the program? - How many initial states has this program? How long does it take to enumerate them at a rate of $10^9$ states per second?
- The module
lock_old_new
below is a skeleton of an abstract PlusCal model of the C program above. In this model, the lock is represented by the boolean variablelocked
. Finalize the model in such a way that it keeps track of whetherold == new
(as a boolean variable) instead of the actual values ofold
andnew
. The non-deterministicif (...)
can be modelled using theeither ... or ...
instruction. - How many states has your model? Can you prove that the
lock()
instruction is never blocking in your model? What can you deduce on the actual C program? - Can you use your model to prove termination of the C program?
---- MODULE lock_old_new ----
EXTENDS Integers, TLC
(*--algorithm lock_old_new {
variables
locked = FALSE;
define {
LOCKED == (locked = TRUE)
}
macro lock() { await locked = FALSE; locked := TRUE }
macro unlock() { locked := FALSE }
{
...
}
}
*)
====