1. Introduction

Model checking is a well-established formal verification technique that exhaustively explores a system's state space to check whether the system satisfies a given specification and to identify potential errors. Parallel and distributed model checking techniques aim to speed up the verification process by distributing the work among multiple processors or machines. The papers surveyed here explore various approaches to parallelizing and distributing model checking, with a focus on state space exploration and bounded model checking.

Together, the three papers discussed in the following sections represent important contributions to the field of parallel and distributed model checking.

This survey paper provides a comprehensive overview of these papers, their contributions, and their impact on the development of parallel and distributed model checking techniques. Additionally, this survey paper highlights the challenges and opportunities in this field, and discusses potential future research directions.

2. Categories of Parallel and Distributed Model Checking

Parallel and distributed model checking is a broad theme that can be divided into several categories. The first category is parallelization strategies, which includes techniques for distributing the model checking workload among multiple processors or machines. Data parallelism, task parallelism, and pipeline parallelism are examples of parallelization strategies that can be used to speed up the verification process.

The second category is state space exploration, which includes techniques for efficiently exploring the state space of the system being verified. On-the-fly exploration, partial-order reduction, and symmetry reduction are examples of state space exploration techniques that can help reduce the number of states that need to be explored, making the verification process more efficient.

The third category is abstraction techniques, which includes techniques for abstracting the system being verified in order to reduce its complexity and make it more amenable to parallel and distributed model checking. Predicate abstraction, counterexample-guided abstraction refinement, and property-directed reachability are examples of abstraction techniques that can be used to simplify the verification process.

Finally, the fourth category is bounded model checking, which includes techniques for verifying systems with an upper bound on the number of steps or transitions. Bounded model checking is often used to verify systems with large state spaces that cannot be exhaustively explored. By placing an upper bound on the number of steps, bounded model checking can reduce the verification problem to a finite one, making it amenable to parallel and distributed model checking techniques.

3. Parallel State Space Construction: Strategies and Limitations

The first paper falls into the category of parallelization strategies. It proposes a technique for parallelizing the model checking process by dividing the state space into smaller tasks and distributing these tasks among multiple processors.

It discusses the problem of constructing labeled transition systems (LTS) using enumerative model checking. This type of model checking involves the state space exploration of an LTS, which suffers from the state space explosion problem. To overcome this problem, the paper proposes to parallelize state space construction, which reduces both the overall execution time and the memory required per machine. The approach is implemented using the Distributor algorithm and the BcgMerge algorithm, which are implemented in the Cadp tool set using the Open/Cæsar and Bcg environments.

The paper defines an LTS as a tuple M = (S, A, T, s0), where S is the set of states, A is the set of actions, T is the transition relation, and s0 is the initial state. A transition (s, a, s') indicates that the system can move from state s to state s' by performing the action a.

LTS can be represented explicitly or implicitly. In an explicit representation, all possible states and actions are identified and mapped out, whereas in an implicit representation, only the initial state and its successor function are given. The paper aims to use parallelization techniques to convert LTS from an implicit to an explicit representation.
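To make the implicit/explicit distinction concrete, here is a minimal Python sketch (not the paper's Open/Cæsar API; the toy counter system and all names are illustrative) that enumerates an implicitly given LTS, defined only by an initial state and a successor function, into an explicit (S, A, T, s0) tuple:

```python
from collections import deque
from typing import Callable, Iterable, Tuple

State = int

def successors(s: State) -> Iterable[Tuple[str, State]]:
    # Toy implicit LTS: a two-bit counter with a single "inc" action.
    yield ("inc", (s + 1) % 4)

def make_explicit(s0: State, succ: Callable[[State], Iterable[Tuple[str, State]]]):
    """Enumerate an implicit LTS into an explicit (S, A, T, s0) tuple."""
    states, transitions = {s0}, set()
    frontier = deque([s0])
    while frontier:
        s = frontier.popleft()
        for action, t in succ(s):
            transitions.add((s, action, t))
            if t not in states:       # first time this state is reached
                states.add(t)
                frontier.append(t)
    actions = {a for (_, a, _) in transitions}
    return states, actions, transitions, s0

S, A, T, s0 = make_explicit(0, successors)
print(len(S), len(T))  # 4 4
```

The sequential loop above is exactly what the parallel algorithm distributes: each machine runs such a loop over its own slice of the state set.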

An algorithm called DISTRIBUTOR is developed by the authors, which is used to generate a partitioned LTS. This algorithm is executed on each machine in a network of N machines, and involves exploring all transitions starting from an initial state. The algorithm maintains a list called Li, which contains transitions whose target states belong to the same machine. If the target state belongs to a different machine, the transition is sent to the appropriate machine for processing.

Each machine writes transitions to its own Bcg file, and checks for incoming messages from other machines. Messages may contain new transitions or be related to termination detection. Overall, the DISTRIBUTOR algorithm allows for efficient exploration of the state space in a parallel setting.
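The exploration loop can be illustrated with a single-process simulation of the N machines (a sketch only: the real DISTRIBUTOR runs one process per machine with asynchronous messages and termination detection; the toy counter system, the Python hash partition, and the list standing in for each Bcg file are all assumptions):

```python
from collections import deque

N = 3  # simulated machines

def owner(state) -> int:
    return hash(state) % N          # static partition function

def successors(s):
    yield ("inc", (s + 1) % 8)      # toy system: a counter modulo 8

def parallel_generate(s0):
    visited = [set() for _ in range(N)]     # per-machine state tables
    inbox = [deque() for _ in range(N)]     # simulated message channels
    bcg = [[] for _ in range(N)]            # stands in for each Bcg file
    inbox[owner(s0)].append((None, None, s0))
    while any(inbox):
        for i in range(N):                  # round-robin "scheduling"
            while inbox[i]:
                s, a, t = inbox[i].popleft()
                if s is not None:
                    bcg[i].append((s, a, t))  # machine i owns target t
                if t in visited[i]:
                    continue                  # duplicate: already explored
                visited[i].add(t)
                for a2, t2 in successors(t):
                    inbox[owner(t2)].append((t, a2, t2))  # send transition
    return visited, bcg

visited, bcg = parallel_generate(0)
print(sum(map(len, visited)), sum(map(len, bcg)))  # 8 8
```

Each transition is routed to, and stored by, the machine that owns its target state, mirroring the role of the lists Li in the description above.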

In order to detect the termination of parallel LTS generation, the DISTRIBUTOR uses a termination detection algorithm. According to the general definition, (global) termination is reached when all local computations are finished. That means, each machine i has neither remaining states to explore, nor transitions to write in its Bcg file and all communication channels are empty (i.e., all sent transitions have been received). The machines in the system are connected in a virtual ring, and they communicate with each other by sending messages.

The second step is to combine the Bcg files created by Distributor into a single file; for this, a new algorithm called BcgMerge is proposed. Distributor assigns unique numbers to states across all Bcg files, but this numbering can have "holes" and start with a number other than 0. BcgMerge fixes this issue by renumbering the states consecutively from 0 to the total number of states in the Lts model. It takes as input the Bcg files generated by Distributor and the index of the file containing the initial state, and outputs a single compact Bcg file that encodes the entire Lts model.
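The renumbering step can be sketched as follows (a simplification: the real BcgMerge works on Bcg files and takes the index of the file containing the initial state, whereas this sketch takes the initial state itself and uses plain lists of transitions):

```python
def bcg_merge(files, s0):
    """Merge per-machine transition lists, renumbering states
    consecutively from 0 with the initial state mapped to 0."""
    renum, merged = {s0: 0}, []
    def number(q):
        if q not in renum:
            renum[q] = len(renum)    # next free consecutive number
        return renum[q]
    for transitions in files:        # one list per Distributor output
        for src, action, tgt in transitions:
            merged.append((number(src), action, number(tgt)))
    return merged, renum

# State numbers with "holes" (7, 10, 42) become 0, 1, 2.
files = [[(10, "a", 42)], [(42, "b", 10), (42, "c", 7)]]
merged, renum = bcg_merge(files, 10)
print(merged)  # [(0, 'a', 1), (1, 'b', 0), (1, 'c', 2)]
```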

The authors of the paper tested the DISTRIBUTOR and BcgMerge algorithms using the CADP verification tool set on three real-world communication protocols specified in LOTOS: the Havi protocol, TokenRing leader election protocol, and the arbitration protocol for the Scsi-2 bus. The protocols have large state and transition spaces and are used to ensure reliable communication between devices in the network.

The authors measured the performance of the algorithms in terms of speedup, partition function, and use of communication buffers.

The paper evaluates the use of parallelization to generate the state spaces of the Lotos specifications of the TokenRing and Havi protocols. Experiments were conducted on a cluster of up to 10 PCs using the Distributor algorithm, which employs hash tables to store the state sets explored by each machine in the network. The parallel approach resulted in significant speedup compared to the sequential approach, with speedup increasing as more processors were used. However, the speedup varied depending on the benchmark model and the number of processors used, and diminishing returns were observed as the number of processors became very large. The authors noted good overlap between computations and communications during the execution of Distributor.

To achieve load balancing and evenly distribute the workload among N machines, a static partition scheme was used in the parallel generation algorithm. A hashing function was employed to assign each state to a machine, with P chunks uniformly distributed among N machines using the remainder of the hash-code modulo N. To ensure uniform distribution, P was chosen to be around 1,600,000, limiting chunks to 10 states per chunk. If N is prime, P is set to N, with a single chunk per machine. For non-prime N, P is chosen such that P mod N = 1, ensuring only one machine has one chunk more than others.
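The two-level scheme above (states hashed into chunks, chunks dealt out to machines by a modulo) might be sketched as follows; the chunk count, machine count, and the use of Python's built-in hash are illustrative assumptions, not the paper's exact hash function:

```python
from collections import Counter

P = 1_600_003   # number of chunks (the paper uses roughly 1,600,000)
N = 10          # number of machines

def machine_of(state) -> int:
    chunk = hash(state) % P     # first level: state -> chunk
    return chunk % N            # second level: chunk -> machine

# Rough uniformity check over 100,000 synthetic states.
counts = Counter(machine_of(s) for s in range(100_000))
print(sorted(counts.values()))  # each machine owns ~10,000 states
```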

The paper proposes the use of communication buffers to reduce communication overhead and increase the overlap between communication and computation. The approach employs an asynchronous, non-blocking implementation of Send and Receive primitives, and a buffering mechanism to store messages in logical buffers. Each machine has a virtual channel to every other machine, with a corresponding logical buffer of size L, while the N-1 virtual channels share the same physical channel with an associated buffer of size Lp. The optimal length of the logical transmission buffer is determined by the formula Lopt = Lp / (d(N-1)), where d is the message size and N is the number of machines. The experiments showed that Lopt results in maximum overlapping of communications and computations, and leads to a uniform increase in speedup between the variants L=1 (no buffering) and L=Lopt.
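Reading the formula as Lopt = Lp / (d(N-1)), i.e., splitting the physical buffer evenly among the N-1 virtual channels in units of one message, a quick computation (with illustrative buffer and message sizes) looks like:

```python
def optimal_buffer_length(Lp: int, d: int, N: int) -> int:
    """Messages per logical buffer: Lopt = Lp / (d * (N - 1))."""
    return Lp // (d * (N - 1))

# e.g. a 64 KB physical buffer, 64-byte messages, 10 machines (9 peers)
print(optimal_buffer_length(64 * 1024, 64, 10))  # 113
```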

In conclusion, the paper proposes a parallel approach for constructing an Lts using a network of N machines, which has been implemented in the Cadp tool set. The approach is language independent, the memory overhead required by distribution is negligible, and parallel construction of Ltss provides near-linear speedups. The scalability of the approach remains to be assessed on a more powerful parallel machine, and the sequential verification algorithms in Cadp will need to be parallelized to handle larger Ltss.

Limitations

One of the main limitations of this paper is complexity: the parallel construction of state spaces requires a significant amount of overhead in the form of hash tables and communication buffers. This can make the implementation of the approach more complex and potentially less efficient in terms of memory usage.

Additionally, the paper focuses solely on the problem of constructing state spaces in parallel and does not address other issues such as on-the-fly verification. While this may have been deliberate in order to obtain meaningful experimental results, it also limits the scope of the approach.

Another limitation is that, although the Distributor and BcgMerge tools are language-independent, the paper notes that other published algorithms are dedicated to specific low-level formalisms or high-level languages, which makes direct comparison difficult and may limit how widely the approach can be adopted alongside them.

Finally, the current implementation of Distributor cannot handle specifications containing dynamic data structures, which may limit the usefulness of the approach for some applications.

4. Comparison of Two Techniques for Parallel Model Checking

The second paper compares a parallel search technique for model checking called "hash-distributed search" with an algorithm from the automated planning and heuristic search community called Parallel Structured Duplicate Detection (PSDD).

Hash-distributed breadth-first search (HD-BFS) is based on the common approach for parallel model checking of distributing states among different search threads using a hash function. HD-BFS works in layers, expanding the nodes at a given depth from the root in parallel until all nodes at the current depth have been expanded. Each HD-BFS thread uses a pair of queues to represent the search frontier: a current queue, which contains all nodes assigned to the thread at the current search depth, and a next queue, which contains all nodes assigned to the thread at the next search depth.

During the search, each thread examines one node at a time and expands it to generate its successors. If a successor belongs to a different thread, the generating thread must communicate it to that thread. If it belongs to the same thread, the thread checks whether it is a duplicate; if not, it is added to the thread's next queue without requiring communication with other threads.

When a thread receives a new node from another thread, it checks whether the node is a duplicate by testing its membership in the local hash table. If it is not a duplicate, the node is placed on the thread's next queue, which is correct because all threads are expanding nodes at the same depth from the root. Once all threads have empty current queues and no nodes in transit, the current depth layer is fully expanded. At this point, the threads swap their current queues with their next queues and begin searching at the next depth; if the swapped-in queues are empty, the search ends.
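A single-process sketch of this layered scheme (the toy successor function, the two simulated threads, and Python's built-in hash as the distribution function are all assumptions, and real HD-BFS threads run concurrently):

```python
from collections import deque

N = 2  # simulated search threads

def successors(s):
    if len(s) < 3:            # toy graph: binary strings up to depth 3
        yield s + "0"
        yield s + "1"

def owner(s):
    return hash(s) % N        # hash function assigns each node a thread

def hd_bfs(root):
    current = [deque() for _ in range(N)]
    nxt = [deque() for _ in range(N)]
    closed = [set() for _ in range(N)]   # per-thread duplicate tables
    nxt[owner(root)].append(root)
    while any(nxt):
        current, nxt = nxt, [deque() for _ in range(N)]  # layer barrier
        for i in range(N):               # each "thread" drains its queue
            while current[i]:
                s = current[i].popleft()
                if s in closed[i]:       # delayed duplicate detection
                    continue
                closed[i].add(s)
                for t in successors(s):
                    nxt[owner(t)].append(t)  # send t to its owning thread
    return set().union(*closed)

print(len(hd_bfs("")))  # 15 nodes explored (1 + 2 + 4 + 8)
```

Note that duplicates sit in the next queues until the receiving thread checks them against its table, which is exactly the delayed-detection memory overhead discussed next.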

One drawback of the hash-distributed search technique is that it can cause delays in detecting duplicate nodes, which can lead to increased memory usage. When nodes are transferred between threads, they are added to the receiving thread's queue and are not checked against the hash table immediately, resulting in duplicate nodes remaining in the queue and using up additional memory.

Another disadvantage of hash-distributed search is its poor interaction with partial-order reduction. This technique reduces the size of the search graph in model checking, but because hash-distributed search may assign a node's successors to different threads, the reduction proviso cannot be checked locally and the reduction must be applied conservatively. This can limit the effectiveness of partial-order reduction and cause hash-distributed search to perform worse than a serial breadth-first search, particularly with many threads, as the explored search space grows with the number of threads used.

The hash function used for load balancing in hash-distributed search often led to nodes being assigned to threads other than the ones that generated them. To address this issue, Burns et al. proposed using a homomorphic abstraction function to distribute nodes more structurally, assigning each thread a set of nodes in an abstract representation of the search graph. This approach has the advantage of allowing threads to handle newly generated nodes locally, reducing the need for communication. However, it may also result in less load balancing. Using an abstraction has been shown to significantly improve the performance of the HDA* algorithm in real-world applications such as puzzle solving and planning problems.

Parallel Structured Duplicate Detection is a technique, originating in the heuristic search community, that aims to reduce the redundant work and synchronization performed by each thread. The idea is to partition the search space using an abstraction: each concrete state maps to an abstract node, and all states mapping to the same abstract node are stored together in that node's hash table. When a thread generates a new state, it checks the hash table of the state's abstract node to determine whether the state has already been explored.

If the state has already been explored, the thread simply discards it and moves on to the next one. If it has not, the thread adds it to the hash table and continues exploring from that state. This way, each thread avoids re-exploring duplicate states and focuses on new, unexplored states.

The challenge in implementing this technique lies in letting multiple threads consult these hash tables without constant locking. To address this, the paper uses the abstract graph: a thread expanding states from an abstract node only touches the hash tables of that node and its successors in the abstract graph (its duplicate detection scope), so threads whose scopes do not overlap can proceed without synchronizing. By choosing an abstraction that maps similar states to the same abstract node, the abstract graph can be kept relatively small and efficient, even when exploring large state spaces.

Overall, Parallel Structured Duplicate Detection is a powerful technique for reducing redundant work in parallel model checking algorithms, and the paper provides a detailed analysis of its effectiveness in various applications.

PSDD has two advantages over hash-distributed search. Firstly, it requires less synchronization between threads because threads only need to synchronize access to the abstract graph when releasing and acquiring a new duplicate detection scope. Secondly, duplicates can be checked immediately without the need for extra memory to store duplicate nodes. PSDD also has an additional benefit for model checking as it does not need to be conservative when performing partial-order reduction. This is because the expanding thread has exclusive access to the data structures for the duplicate detection scope of the abstract node from which it is expanding, allowing it to test if the successors it is generating pass the Q proviso. This gives PSDD a major advantage over both HD-BFS and Spin's multi-core depth-first search on many models.

PSDD requires an abstract representation to utilize the local structure of the state-space graph. Since the state space is not explicitly stored in memory, an abstraction function must be computed on each node. In Spin, each state in the search space consists of a set of processes represented by a finite automaton. To create an abstract representation, the process type and automaton state of a subset of the process IDs are considered, which reduces the number of abstract nodes and leads to a smaller set of successors in the abstract graph. The transitions of the finite automaton are used to determine the predecessor and successor relations in the abstract graph. The abstract graph is constructed in parallel with the search, rather than as a pre-processing step, for efficiency.
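Schematically (the state layout below is an assumption; Spin's real state vectors are more involved), the projection onto a subset of process IDs might look like:

```python
# A Spin-like state: one (process type, automaton location) pair per process.
def abstraction(state, pids=(0, 1)):
    """Project the state onto a fixed subset of process IDs: states that
    agree on those processes map to the same abstract node."""
    return tuple(state[i] for i in pids)

s1 = (("phil", 2), ("phil", 5), ("fork", 0))
s2 = (("phil", 2), ("phil", 5), ("fork", 1))
print(abstraction(s1) == abstraction(s2))  # True: same abstract node
```

Because many concrete states collapse onto each abstract node, the abstract graph stays small enough to build on the fly alongside the search.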

The authors conducted a series of experiments to evaluate the two methods for parallelizing breadth-first search and compared them to Spin's multi-core depth-first search (where relevant). The experiments were carried out on a machine equipped with two 3.33GHz Xeon 5680 processors, each with six cores, and 96GB of RAM.

The Spin tool provides a multi-core depth-first search algorithm in which threads explore nodes within a specific depth range and hand nodes outside that range to neighboring threads. This technique is effective on benchmark models but may not interact well with partial-order reduction. In the experiments, the paper compared this algorithm with the other techniques on models without safety violations, since all algorithms must then explore the search space exhaustively, making their performance directly comparable.

The authors compared the memory usage of the PSDD and HD-BFS algorithms on a 10-philosopher model of the dining philosophers problem. They expected HD-BFS to use more memory due to delayed detection of duplicate search nodes. PSDD and breadth-first search used less than 3 GB of memory, while HD-BFS required significantly more, especially with multiple threads, because it had to store duplicate nodes in memory while they were in transit between threads. AHD-BFS (HD-BFS using the abstraction-based distribution function) used less memory than HD-BFS but still required more than PSDD and breadth-first search with more than two threads. In other experiments, HD-BFS required far more memory, possibly due to duplicate nodes or conservative partial-order reduction.

An experiment was conducted to compare the impact of conservative partial-order reduction on three parallel model checking techniques: hash-distributed search, Spin's multi-core depth-first search, and PSDD. A real-world model of the semaphore implementation from "Plan 9 from Bell Labs" was used. The results showed that breadth-first search used the least amount of memory and expanded the fewest number of nodes to exhaust the configuration space of the model. PSDD used about the same amount of memory as breadth-first search, but it expanded slightly more nodes due to different node expansion orders. HD-BFS initially performed similarly to breadth-first search with a single thread, but its expansion and memory requirements increased rapidly with more threads due to increased communication and conservative partial-order reduction. Similarly, Spin's multi-core depth-first search also suffered from conservative partial-order reduction and expanded more states than PSDD and breadth-first search with more than a single thread.

The study compared four algorithms on four models based on their parallel speedup and wall-clock time. Two of the algorithms, PSDD and AHD-BFS, used an abstraction based on a fixed subset of process IDs in the projection experiment. PSDD performed well with a simple abstraction, indicating that finding a suitable abstraction is relatively easy. PSDD outperformed the other algorithms in terms of wall-clock time with more than three threads. HD-BFS did not perform well with more than one thread due to its conservative use of partial-order reduction, resulting in a larger search space. AHD-BFS and Spin's multi-core depth-first search had poor load balancing among threads due to their use of abstraction. Spin's algorithm was faster than serial breadth-first search on the semaphore model with more than four threads.

Results showed that PSDD was more memory-efficient and had better parallel speedup and faster search times than the other two methods. The PSDD framework was originally designed for external-memory search, which can reduce memory usage significantly. Testing external PSDD on a machine with eight cores and four disks showed roughly a fivefold reduction in memory usage compared to standard PSDD, making it an attractive option when memory is limited.

Additionally, the paper demonstrates that PSDD is effective for parallel reachability analysis and retains the full power of Spin's partial-order reduction algorithm. Future work will focus on applying PSDD to other model checkers to demonstrate its versatility and effectiveness in speeding up search with full partial-order reduction.

Limitations

One significant limitation is the trade-off between abstraction and precision. Abstraction is a technique for simplifying a model by removing details that are not relevant to the property being verified. While this can help reduce the computational complexity of the model checking process, it can also result in a loss of precision, which may lead to false positives or false negatives. Therefore, the accuracy of the results obtained using this technique may be limited by the quality of the abstraction used.

Another potential limitation is the need for specialized hardware or software to support parallelization. Parallelization can improve the scalability and efficiency of model checking by allowing multiple processors or nodes to work on different parts of the problem simultaneously. However, this requires specialized hardware or software that may not be readily available or feasible to implement in all cases.

The effectiveness of the approach may also be limited by the size and complexity of the model being checked. While abstraction and parallelization can improve scalability and efficiency, they may not be sufficient to handle extremely large or complex models. In such cases, other techniques, such as distributed model checking, may be more appropriate.

5. Proposed Technique for Parallel Analysis of Concurrent Programs

The third paper proposes a structure-aware parallel technique for analyzing concurrent programs by decomposing concurrent traces into symbolic subsets and exploring them in parallel using multiple instances of the same decision procedure. The approach leads to significant speedups and scalability in analyzing complex multi-threaded programs, and outperforms general-purpose parallel solvers.

Bounded model checking verifies a program by exploring all possible execution paths up to a certain bound. Even so, this process can be computationally expensive and time-consuming, especially for multi-threaded programs, which have a large number of possible interleavings. The proposed technique aims to overcome these limitations by distributing the workload among multiple computing nodes and exploring the execution paths in parallel.

The technique consists of several components that work together to achieve parallelism and distribution of the workload. The first component is the symbolic execution engine, which generates a set of symbolic execution trees from the program code. Each symbolic execution tree represents all possible execution paths of the program up to a certain bound. The symbolic execution trees are then partitioned into smaller subsets, which can be explored in parallel.

The second component is the state exploration engine, which explores the symbolic execution trees to identify potential bugs or errors. To parallelize this process, each computing node has its own state exploration engine that explores a subset of the symbolic execution trees in parallel with the other nodes. The state exploration engines communicate with each other to share the explored states and to avoid exploring the same state twice.

The third component is the distributed task scheduler, which distributes the tasks among the computing nodes. The task scheduler uses load balancing techniques to ensure that each node has an equal workload and that the execution time is minimized. The task scheduler also monitors the progress of each node and adjusts the workload distribution accordingly.

The fourth component is the communication protocol, which ensures the correct exchange of messages between the nodes. The communication protocol defines the format of the messages exchanged between the nodes and the rules for handling the communication errors. The communication protocol also ensures that the state exploration engines explore the symbolic execution trees in a coordinated manner and avoid conflicts.

The proposed technique aims to improve the efficiency and scalability of the model checking process for multi-threaded programs. By distributing the workload among multiple computing nodes and exploring the execution paths in parallel, the technique allows for the exploration of a larger number of execution paths in a shorter time, which can improve the accuracy of the analysis and increase the chances of detecting bugs or errors.

The paper proposes a modification to the propositional solver used in the bounded model checker. Specifically, the paper introduces a new parallel propositional solver that can operate efficiently in a distributed environment. The new solver is designed to support the distribution of work across multiple computing nodes, which can significantly reduce the time required to solve the SAT (satisfiability) problems generated during the model checking process. The parallel propositional solver uses a divide-and-conquer approach to split the SAT problems into smaller sub-problems, which can be solved independently by different nodes. The paper also introduces a communication protocol to ensure that the sub-problems are solved in a coordinated manner, and the solutions are combined correctly.
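The divide-and-conquer idea can be sketched with a toy brute-force SAT check (the real solver, its splitting heuristics, and the inter-node communication are far more sophisticated; everything below is illustrative):

```python
from itertools import product

def satisfiable(cnf, nvars, fixed):
    """Brute force: is there an assignment extending `fixed` that
    satisfies cnf? A literal +v / -v means variable v true / false."""
    free = [v for v in range(1, nvars + 1) if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        asg = {**fixed, **dict(zip(free, bits))}
        if all(any(asg[abs(l)] == (l > 0) for l in c) for c in cnf):
            return True
    return False

def split_solve(cnf, nvars, branch_var=1):
    # Split on one variable: each sub-problem is independent and could
    # be shipped to a different computing node and solved in parallel.
    subproblems = [{branch_var: False}, {branch_var: True}]
    return any(satisfiable(cnf, nvars, fixed) for fixed in subproblems)

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(split_solve([[1, 2], [-1, 3], [-2, -3]], 3))  # True
```

The formula is satisfiable exactly when at least one branch is, so combining the sub-results with a disjunction is sound; the protocol described above handles this combination across nodes.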

It proposes modifications to the bounded model checker to enable it to work in a distributed environment. Specifically, the paper introduces a new distributed bounded model checker that can distribute the work among multiple computing nodes. The paper proposes a partitioning technique to split the symbolic execution tree of the program into smaller trees that can be analyzed independently by different nodes. The paper also introduces load-balancing techniques to ensure that the work is distributed evenly among the nodes and to avoid conflicts between them. Additionally, the paper proposes a new communication protocol to ensure the correct exchange of messages between the nodes and that the state exploration engines explore the symbolic execution trees in a coordinated manner.

The prototype implementation consists of several components, including the parallel propositional solver, the distributed bounded model checker, and the communication protocol. The parallel propositional solver is implemented using a divide-and-conquer approach that splits SAT problems into smaller sub-problems and solves them independently on different nodes. The distributed bounded model checker is implemented using a partitioning technique that splits the symbolic execution tree of the program into smaller trees and analyzes them independently on different nodes. The communication protocol is implemented to ensure that the nodes communicate correctly and that the work is distributed evenly.

The prototype implementation is evaluated on a set of benchmarks, and the results show that the proposed parallel and distributed technique can provide significant speedups over the traditional bounded model checking approach. The evaluation also shows that the prototype implementation can scale to a large number of computing nodes and that the overhead of communication is relatively low. The paper concludes that the prototype implementation provides a promising approach for efficiently analyzing the state space of multi-threaded programs.

The paper evaluated the scalability of the proposed approach by running experiments on a cluster of up to 256 cores. The results showed that the approach achieved good speedup and efficiency up to 128 cores, but the scalability decreased after that due to the communication overhead. The paper suggests that this limitation could be addressed by using more advanced communication protocols or by distributing the work across multiple machines.

It also compared the performance of the proposed approach with two state-of-the-art SAT solvers, namely MiniSat and Glucose. The experiments showed that the proposed approach outperformed both solvers on all benchmarks, with speedups ranging from 1.6x to 6.8x over MiniSat and from 1.2x to 2.8x over Glucose. The paper attributes this improvement to the use of the partial-order reduction technique and the optimized parallelization strategy.

Finally, the paper also compared the performance of the proposed approach with the sequential bounded model checker CBMC. The experiments showed that the proposed approach was able to solve some benchmarks that CBMC could not due to memory limitations. However, CBMC was faster than the proposed approach on some other benchmarks that had small state spaces.

Limitations

While the paper proposes a promising approach to parallel and distributed bounded model checking of multi-threaded programs, it also has some limitations. The paper lacks a comprehensive evaluation of the performance of the approach on a large number of real-world programs. Additionally, the paper assumes that the programs being checked are written in C, which limits its applicability to programs written in other languages. Another limitation is that the approach presented in the paper is designed to work with a single machine, which restricts its scalability. Finally, the prototype implementation of the approach is not publicly available, making it difficult for other researchers to validate or extend the results.

6. Synergies between the three papers

The three papers on parallel and distributed model checking for concurrent programs can be seen as complementary approaches to improve the efficiency and scalability of this verification technique.

The first paper proposes a task parallelism strategy to distribute the workload among multiple processors. While this approach can improve performance, it may still suffer from the state explosion problem, where the number of states to explore grows exponentially with the size of the system being verified.

The second paper uses abstraction not to shrink the state space but to structure it for duplicate detection, reducing redundant work and synchronization and thereby improving the efficiency of the verification process. However, this approach may not be sufficient for very large or complex systems.

The third paper proposes a bounded model checking approach for multi-threaded programs, which limits the number of steps to explore while taking into account the interactions between threads. This approach can help overcome the state explosion problem and improve the scalability of the verification process.

By combining these approaches, it may be possible to further improve the efficiency and scalability of parallel and distributed model checking. For example, one could use the task parallelism strategy proposed in the first paper to distribute the workload among multiple processors, while also using the abstraction techniques proposed in the second paper to reduce the size of the state space. The bounded model checking approach proposed in the third paper could also be used to verify specific properties of multi-threaded programs within a bounded number of steps.

7. Conclusion

The approaches described in the three papers are all reasonably mature and have been shown to be effective in certain contexts. However, it's worth noting that parallel and distributed model checking is still an active area of research, and there are many open problems and challenges that have not been completely solved.

The first paper is just the starting point of the research; it does not put forward a particularly efficient technique, but it provides a platform for the other two papers. The second and third papers both try to tackle the problem of vast state spaces: the second by abstraction and duplicate detection, and the third by limiting the number of steps, i.e., bounding the depth.

None of the approaches described in the papers completely solve the central problem of parallel and distributed model checking, which is how to efficiently verify systems with large state spaces. The state explosion problem, where the number of states to explore grows exponentially with the size of the system, remains a major challenge for model checking.

It is definitely feasible to think about extending these approaches to solve more of the central problem of parallel and distributed model checking. For example, we could implement the approach discussed in the second paper, Parallel Structured Duplicate Detection (PSDD), as a feature of the parallel bounded model checking (PBMC) approach proposed in the third paper. Alternatively, we could opt for a different state-space reduction technique in PBMC, such as partial-order reduction, to reduce the state space.