Modeling programs

The goal of the following exercises is to build models that represents the behaviors of given programs. A model is a transition system (or state machine) that describes the behaviors of the program. We will start using the TLA+ toolbox and write models as PlusCal algorithms. These models may abstract some features of the actual programs under consideration. We will learn how to generate finite transition systems from these models. We will also discuss which properties of the programs are preserved by our abstractions and which ones are not.

Foreword on the PlusCal language

PlusCal is an algorithmic language. It differs from programming languages in several directions:

PlusCal algorithms can manipulate arbitrary mathematical objects (sets, sequences, unbounded integers, etc). Programming languages require to implement these objects using low-level types (integers, booleans, pointers, etc) and can only manipulate values from fixed bounded types.
PlusCal algorithms can be non-deterministic. Programs need to be deterministic in order to be executable.
PlusCal has a well-defined notion of an atomic step. In general, the instructions in a program (C, C++, Java, Python, etc) are not atomic. Some mechanisms may be provided to group instructions in atomic steps (e.g. locks)
PlusCal has a formal semantics defined as TLA+ expressions, that, in turns, formally defines the state-space and the runs of the algorithm.

The TLA tools offer three main features:

The translation of PlusCal programs into TLA+ expressions.
The TLC model-checker that allows to find bugs in algorithms, and to prove algorithms with finite resources.
The TLA prover which is not in the scope of this class.

A simple C function

We consider the following C function. Our goal is to ensure that this function terminates, and that the assertion holds when the function terminates.

int f(int x)
{
  while (x != 0) {
    if (x < -1)
      x = 0;
    x = x - 1;
  }
  assert(x >= 0);
}

The file simple_process.tla contains the following PlusCal model of the function f above:

---- MODULE simple_process ----
EXTENDS Integers, TLC

(* PlusCal options (-termination) *)

(*--algorithm simple_process {
  variables
    x \in -3..3;
  {
w: while (x /= 0) {
i:  if (x < -1)
r:      x := 0;
d:   x := (x - 1);
    };
a:  assert(x >= 0);
  }

}
*)
====

What is the meaning of the syntactic constructions that appear in this PlusCal algorithm? Run the command Parse model to check that the model is syntactically correct
Open the file simple_process.tla and check that the assertion holds. What can you conclude?
Replace the assertion by assert(x < 0) and check your model again. What happens?
Set back the assertion as in the model above. Then, add the line PROPERTY Termination in the file simple_process.cfg. Then, check your model again. What happens?
Restrict the initial values of variable x to 0..3. Check termination again. What happens?
Generate the transition system corresponding to the PlusCal model using the command-line tools. Explain your answers to the questions above using the generated transition system. What is a state of this model?
What can you conclude on the C program from what you have observed on the PlusCal model?

A multi-process C program

We consider the C program below which has a shared variable x and a function increment that is executed by several threads (not shown in the code). This function first stores the value of x in a local variable y. Then, it writes y+1 into the shared variable x. We assume that each line is an atomic instruction.

#define N 2

int x = 0;

void * increment(void *)
{
    int y = x;
    x = y + 1;
    return NULL;
}

int main(int argc, char * argv[])
{
    pthread_t T[N];

    for (int i = 0; i < N; ++i)
      pthread_create(T+i, NULL, increment, NULL);

    for (int i = 0; i < N; ++i)
      pthread_join(T+i, NULL);

    return 0;
}

The following code is a PlusCal model of the C program above, with N threads executing the function increment (non_atomic_increment.tla).

---- MODULE non_atomic_increment ----
EXTENDS Naturals, TLC

CONSTANTS N (* Number of processes *)

(*--algorithm non_atomic_increment {
  variables
    x = 0;

  process (P \in 1..N)
  variables
    y = 0;
  {
    l0:    y := x;
    l1:    x := y + 1
  }

}
*)
\* BEGIN TRANSLATION
\* END TRANSLATION

AllDone == \A self \in 1..N: pc[self] = "Done"
Correctness == [](AllDone => x=N)
====

A first model of a concurrent program

Explain the meaning of the new syntactic constructions that appear in this model
The property Correctness that is defined at the end of the PlusCal model expresses that upon termination of all the processes, then x=N. In order to check if this property hold, create the file non_atomic_increment.cfg and add the declaration PROPERTY Correctness in the dedicated section. Does this property hold for N=1 process? For N=2 processes?
Generate the transition system defined for N=1, and then N=2, using the command-line tools (to that purpose, comment PROPERTY Correctness in the cfg file). What is a state of this model? Are the transition systems consistent with your previous answers?

Atomic instructions in PlusCal

Labels in PlusCal algorithms define which blocs of statements are executed in an atomic way. Labeling is crucial to the modeling of concurrent and distributed algorithms where correctness heavily depends on which statements can be executed atomically. Moreover, atomic actions should be carefully defined to correspond to which statements are actually atomic in the system.

PlusCal puts restrictions on labels in order to let the translation in TLA+ be as close as possible to the algorithm. These constraints are described in section 3.7 of the PlusCal user manual. In particular:

each statement must belong to exactly one label
each variable is updated at most once in a label

Our goal is to understand the impact of labeling in concurrent algorithms.

Remove the label l1 in the previous algorithm. Check that the property Correctness holds for N=2 processes.
Output the corresponding transition system. What is a state in this model? What has changed when the label l1 has been removed.
Explain which of the two PlusCal models (with label l1 or without l1) corresponds the best to the C program above (multiprocess.c)

A small model for a C program

We consider the following C program where the lock is initially free, and variables old and new are 8-bytes pointers. The initial values of old and new are unknown. The actual condition in the if instruction is abstract for simplicity (hence the ...) and should be modelled by a non-deterministic choice.

// lock is free, old and new are 8-bytes pointers
do {
  lock();
  old = new;
  if (...) {
    unlock();
    new++;
  }
}
while (new != old);

What is the impact of modeling the if statement as a non-deterministic choice, on the behaviors of the program?
How many initial states has this program? How long does it take to enumerate them at a rate of $10^9$ states per second?
The module lock_old_new below is a skeleton of an abstract PlusCal model of the C program above. In this model, the lock is represented by the boolean variable locked. Finalize the model in such a way that it keeps track of whether old == new (as a boolean variable) instead of the actual values of old and new. The non-deterministic if (...) can be modelled using the either ... or ... instruction.
How many states has your model? Can you prove that the lock() instruction is never blocking in your model? What can you deduce on the actual C program?
Can you use your model to prove termination of the C program?

---- MODULE lock_old_new ----
EXTENDS Integers, TLC

(*--algorithm lock_old_new {
    variables
        locked = FALSE;

    define {
        LOCKED == (locked = TRUE)
    }

    macro lock() { await locked = FALSE; locked := TRUE }
    macro unlock() { locked := FALSE }
        
    {
      ...
    }
}
*)
====