A distributed system (DS) is a collection of autonomous computing sites that share neither a common memory nor a global clock, and that communicate solely by exchanging messages.
When processes interact through shared resources (a critical section), the integrity of the resources may be violated if the accesses are not coordinated.
In a single-computer system, the status of a shared resource and the status of users are readily available in the shared memory, and solutions to the mutual exclusion problem can be easily implemented using shared variables (e.g., semaphores). However, in distributed systems, both the shared resources and the users may be distributed and shared memory does not exist. Consequently, approaches based on shared variables are not applicable to distributed systems and approaches based on message passing must be used.
The mutual exclusion problem requires that, at any time, only one of the contending processes be allowed to enter its critical section (CS). Mutual exclusion plays an important role in designing distributed systems. Problems such as replicated data, atomic commitment, and distributed shared memory require mutual exclusion. Mutual exclusion is also required in some systems where there is no sharing but the events happening in the system should be mutually exclusive.
Due to the nature of a distributed environment, many failures can occur. These can take place in a channel, in a node, or in both, and sometimes network partitions may also happen. A mutual exclusion algorithm is fault-tolerant if, in the wake of a failure, it can reorganize itself so that it continues to function without disruption.
Some algorithms exist to solve the mutual exclusion problem in distributed systems, such as priority-based mutual exclusion algorithms, quorum-based distributed mutual exclusion algorithms, and asynchronous group mutual exclusion. A priority-based mutual exclusion algorithm is useful in the group mutual exclusion problem.
First, I make a brief survey of a few existing fault-tolerant ME algorithms along with their characteristics and performance measures. Based on the survey, I choose one of the available algorithms and analyze its behavior. Finally, I propose an ME algorithm which is a modification of the available algorithms.
Abstract: The mutual exclusion (ME) problem in distributed systems has attracted considerable attention over the last two decades. The ME problem requires that, at any time, only one of the contending processes be allowed to enter its critical section (CS). A number of solutions have been proposed for the mutual exclusion problem in distributed systems. These algorithms can broadly be classified as token-based and permission-based algorithms. The metrics generally used to compare their performance are message complexity and synchronization delay. Fault tolerance, an essential requirement in a dynamic distributed environment where nodes as well as links can fail arbitrarily, is not considered as a performance metric in most of the existing ME algorithms except a few. In this paper, we make a brief survey of various fault-tolerant distributed ME algorithms and present a comparative study of these algorithms based on performance metrics, mainly message complexity, synchronization delay, and degree of fault tolerance for different types of failures such as communication link failure, node failure, and network partitioning.
A distributed computing system is a collection of autonomous computing sites, also referred to as "nodes", that do not share a global or common memory and communicate solely by exchanging messages over a communication facility. One of the most interesting problems in the design of distributed systems is the implementation of mutually exclusive execution of critical sections. The problem is to devise a mechanism such that each of the communicating sites may have access to a designated section of code called the critical section (for example, such code may involve using a common non-sharable resource). One of the basic properties of such a mechanism is mutual exclusion, which means that no two processes may be allowed to execute their critical sections simultaneously. This mutual exclusion problem in distributed systems is more complex than in centralized systems for two reasons: the state of the shared resource and of the requesting processes is distributed across sites rather than held in a common memory, and the sites can communicate only by exchanging messages, with no global clock.
Due to the nature of a distributed environment, many failures can occur: in a channel, in a node, or in both, and sometimes network partitions may also happen. A mutual exclusion algorithm is fault-tolerant if, in the wake of a failure, it can reorganize itself so that it continues to function without disruption. Here, an attempt is made to investigate the existing fault-tolerant mutual exclusion algorithms for distributed systems and to compare their performance in terms of parameters such as message complexity, synchronization delay, degree of fault tolerance, and freedom from starvation and deadlock.
The primary objective of a mutual exclusion algorithm is to maintain mutual exclusion; that is, to guarantee that only one request is granted access to the CS at a time. In addition, the following characteristics are considered important in a mutual exclusion algorithm: freedom from deadlock and starvation, fairness, and fault tolerance.
Distributed mutual exclusion algorithms can be classified into two groups by a basic principle in their design: token-based algorithms and permission-based algorithms. These algorithms are discussed in the next section. The basic principle for the design of a distributed mutual exclusion algorithm is the way in which the right to enter the critical section is formalized in the system. Most of these algorithms assume a fault-free channel and nodes which always operate correctly. But due to the nature of a distributed environment, many failures can occur. Out of all available algorithms, only some are fault tolerant.
The performance of the algorithms presented here is measured using the total number of messages required for a node to enter the critical section, the synchronization delay (SD) (the time elapsed between one site's exit from the CS and the next site's entry to the CS), and the degree of fault tolerance (the number of node failures, link failures, and network partitions tolerated). The size and complexity of the messages is also considered while evaluating the algorithms.
The effectiveness of an algorithm depends on the suitability of the model as well as the validity of the assumptions made about the distributed environment. Most algorithms assume the following conditions for the distributed environment:
When reviewing an algorithm, attention should be paid to the assumptions made about the communications network. This is very important because nodes communicate only by exchanging messages with each other. The following aspects about the reliability of the underlying communications network should be considered.
Early algorithms did not consider fault-tolerance issues. An algorithm in a distributed computing system should consider fault-tolerance aspects to detect and recover from failures. A resilient algorithm takes advantage of the high availability of the system in a distributed environment. Even when nodes fail, the rest of the system can still work, albeit with degraded performance.
The performance of a mutual exclusion algorithm is generally measured by four metrics: message complexity (the number of messages required per CS entry), synchronization delay, response time, and system throughput, defined as

System throughput = 1 / (SD + E)

where SD is the synchronization delay and E is the average critical section execution time.
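As a minimal illustration of this relation (the numeric values are hypothetical):

```python
# System throughput of a mutual exclusion algorithm: 1 / (SD + E),
# where SD is the synchronization delay and E is the average
# critical-section execution time, both in the same time unit.
def system_throughput(sd: float, e: float) -> float:
    return 1.0 / (sd + e)

# Hypothetical values: SD = 2 time units, E = 8 time units.
print(system_throughput(2.0, 8.0))  # 0.1 CS entries per time unit
```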
In the token-based group, the right to enter a critical section is materialized by a special object, namely a token. The token is unique in the whole system. Processes requesting to enter their critical section are allowed to do so only when they possess the token; the token gives a process the privilege of entering the critical section. A token is a special type of message, and its singular existence enforces mutual exclusion: only one process, the one holding the token, is allowed to enter its critical section. At any given time the token must be possessed by at most one process.
In the token-based approach, two methods are usually used: circulating token and requesting token. In the requesting-token method, a node requesting the CS has to obtain the token. The basic problem is how to reach the token holder. In some algorithms, the request is sent to all the nodes because the token holder is unknown; in others, a logical structure is defined to locate the token holder.
Ye-In Chang, M. Singhal and Ming T. Liu proposed a fault-tolerant token-based mutual exclusion algorithm for distributed systems which tolerates communication link and site failures. Here, the system topology is a graph in which each site has more than one path to the site holding the token. The algorithm is fault-tolerant because a site has alternative paths along which to search for the token in case of communication link or site failures. In the algorithm, every site communicates only with its neighboring sites and holds information only about its neighbors. When a site wants to execute the CS, a sequence of Request messages is sent between the requesting site and the site holding the token, and then the token is passed along the same path in the reverse direction. As the token passes through, the direction of the edges is reversed such that every path always leads to the site holding the token. Due to this reversal of edge directions, a directed cycle may occur, resulting in deadlock or a long delay. To avoid directed cycles, a site through which the token has passed should also reverse the direction of all its outgoing edges.
In case of a single site failure or communication link failure, a request (or the token) can always be sent along alternative paths, and a new site to hold the token is elected in case the site holding the token fails. The fault-tolerant algorithm works correctly as long as every site has at least one fault-free path to the site holding the token. When a site comes back from a crash, it reconstructs its local information by exchanging information with its connected neighbors.
When every site has K or more alternative paths, the number of messages exchanged is O((2 + K) * log N) for light traffic and is reduced to O(2 + K) messages for heavy traffic, where N is the number of sites in the system. However, the degree of fault tolerance in the system increases as K increases. Here SD is T * (maximum traversable path length). As the token-holding site maintains a FIFO queue to keep a record of pending requests, they are served in FIFO order, so the algorithm is free from starvation.
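The token search with edge reversal described above can be sketched as follows. This is a deliberately simplified, single-threaded model that assumes one outgoing edge per site (the algorithm itself allows K alternative paths); all names are illustrative:

```python
# Simplified sketch of token search with edge reversal. 'points_to'
# orients every site toward the token holder; the holder points to
# itself. (The actual algorithm allows K outgoing edges per site.)
def request_token(points_to, requester):
    """Forward the request along directed edges to the token holder,
    then reverse the edges on the path so that every site on it points
    at the new holder. Returns the path the request traveled."""
    path = [requester]
    while points_to[path[-1]] != path[-1]:
        path.append(points_to[path[-1]])
    for site in path:            # token travels back; edges are reversed
        points_to[site] = requester
    return path

orientation = {"A": "B", "B": "C", "C": "C"}   # C holds the token
print(request_token(orientation, "A"))          # ['A', 'B', 'C']
print(orientation)                              # every site now points to A
```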
S. Nishio, Kin F. Li and G. Manning proposed a resilient mutual exclusion algorithm which is an extension of the Suzuki-Kasami algorithm, but the movement of the token is different: the token is granted to the nearest node in a logical circular list whose last request has not yet been granted. Fault tolerance, based on time-out values, is incorporated in their algorithm. It is assumed that each node in the system consists of a processor and a communication controller. The algorithm can recover from processor failures, controller failures and communication link failures. A lost token can be regenerated, and duplicate tokens are eliminated from the system. The controller of each node k manages message exchanges with other nodes, controls processor k's right to enter its critical section, and is able to regenerate a new token if need be. Processor and controller interact, exchanging information; based on that information, the controller can decide to regenerate a new token or eliminate a duplicate one.
Request messages include the identification of each node to which they are sent. A field indicating the age of the token is added to the token message. Each node keeps the age of the most recent unique token generated in the system. When a process i has sent a request to all other N-1 nodes in the system, it sets a time-out for the token to arrive. If the token arrives within this time but its age is less than the age recorded at site i, it is a duplicate token and is discarded; the process then re-sends its request to all other nodes and the same procedure is repeated. If the age of the token is at least equal to the age value at site i, the age value of i is updated and it can proceed to enter its critical section. If the token does not arrive within the time-out, the token regeneration procedure commences: a Token_Missing message is sent to all other nodes. If some nodes do not respond, the procedure starts again. If all other N-1 nodes respond with an ACK message, a new token is regenerated and an incremented age is given to it.
The algorithm assumes finite transfer delays and does not require message-order preservation. The number of messages exchanged for a critical section entry is N. The resiliency mechanism presented can be easily modified to include recovery from a node insertion or removal. If there is a node or communication link failure, then 3(N-1) message exchanges are required for a critical section entry. If there is no failure, then the synchronization delay (SD) is 0 or T (T is the message transfer time); this is the same as in the Suzuki-Kasami algorithm. If the token is lost or an invalid token is received, then SD is 3T. If there is a node failure, then SD is more than 3T (here SD is linearly proportional to the number of node failures). The token is passed in a round-robin fashion to sites with outstanding requests, so this algorithm is free from starvation and deadlock.
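The token-age test that eliminates duplicate tokens can be sketched as follows (function and variable names are illustrative, not from the original paper):

```python
# Token-age test for discarding duplicate tokens (names illustrative).
def accept_token(token_age, local_age):
    """Return (accepted, new_local_age). A token younger than the most
    recent age known locally is a stale duplicate and is discarded;
    otherwise the local age is updated and the CS may be entered."""
    if token_age < local_age:
        return False, local_age        # duplicate token: discard
    return True, token_age             # valid token: enter the CS

print(accept_token(3, 5))   # (False, 5): stale duplicate
print(accept_token(5, 5))   # (True, 5): valid token
```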
In these algorithms, requesting processes wait to obtain permission from a set of processes in the system. Once a process obtains permission from a sufficient number of members of a set, it is allowed to enter the critical section (CS). Only one process at a time can collect enough rights to execute its CS, and each node grants its permission to only one node at a time; this ensures the mutual exclusion condition. Two inter-related aspects are considered in these algorithms to reduce the number of messages exchanged for an entry to the CS to take effect: how many rights are "enough", and which nodes should grant those rights. Some algorithms require that a node obtain permission from all nodes in the system. In other algorithms, nodes are divided into groups that intersect with each other in a pairwise non-null manner: any group must have at least one node in common with any other group to ensure mutual exclusion. A node then needs to obtain permission only from all the other members of its group.
The concept of obtaining permission from a group of nodes, which are not necessarily a majority, was formalized by Gifford. He introduced the notion of quorums, which are nonempty sets of nodes. Garcia-Molina and Barbara introduced the notion of coteries. A coterie is a nonempty set of quorums in which any two quorums must have at least one common node (intersection property), and no quorum is a subset of any other one (minimality property).
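The two defining properties of a coterie can be checked mechanically; the sketch below is illustrative:

```python
from itertools import combinations

def is_coterie(quorums):
    """Check the intersection and minimality properties of a set of
    quorums (Garcia-Molina and Barbara)."""
    qs = [set(q) for q in quorums]
    if not qs:
        return False
    for a, b in combinations(qs, 2):
        if not (a & b):          # intersection property violated
            return False
        if a <= b or b <= a:     # minimality property violated
            return False
    return True

# Majority quorums over three sites form a coterie:
print(is_coterie([{1, 2}, {2, 3}, {1, 3}]))   # True
print(is_coterie([{1}, {2, 3}]))              # False (disjoint quorums)
```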
D. Agrawal and A. El Abbadi proposed an efficient and fault-tolerant distributed mutual exclusion algorithm using quorums. This algorithm uses a logical tree organization of the network to generate tree quorums, which are logarithmic in the size of the network in the best case. The approach is resilient to both site and communication failures, even when such failures lead to network partitioning. Furthermore, the algorithm exhibits graceful degradation, i.e., it requires more messages only as the number of failures in the network increases.
Given a set of N sites, the sites are logically organized to form a binary tree. The algorithm for constructing quorums can be used with arbitrary trees; however, for simplicity and efficiency, assume that the tree is complete, i.e., if k is the level of the tree then it has 2^(k+1) - 1 sites, the root is at level k, and the leaves are all at level 0. For the purpose of this presentation, any site could be chosen as the root, any two sites may be chosen as its children, and so on. A path in the tree is a sequence of sites S1, S2, ..., Si, Si+1, ..., Sn such that Si+1 is a child of Si. In the failure-free case, a quorum is a path of sites from the root to a leaf. If failures occur, then for each failed site Si the quorum contains (recursively) paths of sites starting from Sj and Sk, where Sj and Sk are the two children of Si, and ending with leaves. The set constructed by this algorithm is termed a tree quorum. To enter its critical section, a process at site i must send request messages, Req, to a quorum of sites, which is determined using the tree quorum algorithm. Each Req message is time-stamped with a unique local timestamp (each site has a logical clock). Each site maintains a request queue, where Req messages are ordered in timestamp order. When a Req message is at the head of the queue, the site sends a reply message, Reply, to the requesting site. A process requesting to enter its critical section waits until it receives Reply messages from a set of sites that form a tree quorum before entering its critical section. Once a process exits from its critical section, it sends relinquish messages, Relinq, to all sites in the quorum, so that they may remove the corresponding Req message from the head of the queue. Message complexity depends on the quorum: if the environment is failure-free, the size of the tree quorum is minimal, i.e., log N. As failures occur, the size increases to a maximum of (N+1)/2. In the absence of failures, SD is 2T. If there is a node failure, SD is 2T + Reconstruct Quorum Time (RQT).
This algorithm is free from deadlock and starvation.
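A simplified, single-threaded sketch of tree-quorum construction on a complete binary tree (stored as an array, with node i having children 2i+1 and 2i+2) might look as follows. For brevity it always extends the path through the left child and does not try the sibling path when a subtree fails, so it is weaker than the actual algorithm:

```python
def tree_quorum(tree, i=0, alive=lambda s: True):
    """Return a path-based quorum rooted at array index i, or None if
    no quorum can be formed in this subtree."""
    if i >= len(tree):
        return []                        # walked past a leaf
    if alive(tree[i]):                   # live node: extend path downward
        rest = tree_quorum(tree, 2 * i + 1, alive)
        return None if rest is None else [tree[i]] + rest
    if 2 * i + 1 >= len(tree):
        return None                      # failed leaf: no substitute
    left = tree_quorum(tree, 2 * i + 1, alive)   # failed inner node:
    right = tree_quorum(tree, 2 * i + 2, alive)  # substitute BOTH children
    if left is None or right is None:
        return None
    return left + right

sites = [1, 2, 3, 4, 5, 6, 7]                      # complete tree, N = 7
print(tree_quorum(sites))                          # [1, 2, 4]: size log N
print(tree_quorum(sites, alive=lambda s: s != 1))  # [2, 4, 3, 6]: grows
```

Note how the quorum grows gracefully as failures occur, up to the (N+1)/2 bound mentioned above.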
D. Manivannan and M. Singhal proposed a token-based fault-tolerant mutual exclusion algorithm. In Nishio et al.'s fault-tolerant mutual exclusion algorithm, when a site detects the loss of the token, it requires a positive acknowledgment from every other site to generate a new token. This could be time-consuming in the presence of failed sites or failed communication links, because the failure of even a single site would delay token generation until that site comes up and sends a positive acknowledgment. Moreover, in that algorithm, if a site falsely detects token loss, it could cause the deletion of the existing token, thereby reducing availability of the token. This algorithm solves these problems; it is a fault-tolerant and efficient algorithm. In this algorithm, in order to generate a new token, only two sites and the communication link connecting them need to be operational. Also, when a site falsely detects token loss and initiates the generation of a new token, it does not prevent other sites from using the existing token to execute the critical section.
In this algorithm, in the event of token loss, a new token is generated successfully even in the presence of many failed sites and communication links. This algorithm is resilient to not only token loss but also message loss. When a site falsely detects a token loss and tries to generate new token, this algorithm does not cause the deletion of the existing token. In this algorithm, only two sites need to be up and be able to communicate with each other in order to generate a new token in the event of token loss; these sites are (i) the site which executed the CS most recently, say i, and (ii) the site j to which site i had sent the token after executing CS.
The main idea of this algorithm is as follows: when site i wants to execute the CS, it sends a request message to every other site. If site i does not receive the token within a timeout, it sends a "Tokenlost?" message to every other site. Then it waits for another timeout period to receive replies to this message from the other sites. From the information collected from the replies, it finds the site j which executed the CS most recently and requests it to generate a new token. When site j receives this request, it generates a new token if necessary and sends it to the site k to which it had sent the original token most recently. If site i does not receive the token within another timeout period (this may be due to repeated failure of sites and/or loss of messages), it sends the "Tokenlost?" message to every other site again and initiates the token generation process all over again. The site which executed the CS most recently is determined by keeping a count of the total number of times the CS has been executed by all sites; this count is passed along with the token, and each site also records the site id to which it most recently sent the token.
For example, when a site i detects the loss of the token, it sends a "Tokenlost?" message to every other site and collects replies from them. From the reply messages, site i learns that site j executed the CS most recently. Site i then asks site j to generate a new token. When site j receives this request, it generates a new token and sends it to the site k to which it had sent the original token most recently. Finally, site k sends the new token to site i. If site k fails again, the token generation process is repeated after a predefined timeout period. But the major drawback of this algorithm is that if site k fails, site i will not get the token until site k is repaired.
The figure illustrates this problem. Here site k has failed after executing the CS, and site i wants to execute the CS. After a timeout period, site i sends a "Tokenlost?" message to all other sites, and in response all non-failed sites send their local token-copy information. Using this information, site i learns that site j executed the CS most recently, so site i asks site j to generate a token. When site j receives this request, it generates a new token and sends it to site k, to which it had sent the original token most recently; but site k has failed. So the token generation process does not complete until site k comes up.
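The selection of the most recent CS executor from the "Tokenlost?" replies can be sketched as follows (the reply format is an assumption for illustration):

```python
# Each "Tokenlost?" reply is assumed to carry (cs_count, site_id):
# the highest CS-execution count the replying site has seen, and the
# site that reached it. The site with the highest count executed the
# CS most recently and is asked to regenerate the token.
def most_recent_executor(replies):
    return max(replies)[1]

replies = [(12, "j"), (9, "m"), (11, "n")]
print(most_recent_executor(replies))   # 'j'
```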
In Naimi and Trehel's algorithm, an underlying dynamic logical structure is imposed on the communication network. Requesting processes are logically arranged, by their requests, as a rooted tree. As a request from node i travels along the path from node i to the root node, node i becomes the new parent of each node on the path, except itself. Thus, node i becomes the new root node of the tree. Neither the nodes nor the token needs to maintain a queue of pending requests; this queue is implicitly maintained by the state of each node in the system. Each node keeps two integer variables, LAST and NEXT. LAST indicates the last node from which a request was received, i.e., the neighbor node on the path to the root to which this node will send a request message the next time it wants to enter its critical section. NEXT indicates the node to which the token will be granted after this node leaves its critical section.
When a node i wants to enter its critical section (CS) and LASTi ≠ i (node i is not holding the token), it sends a request to node LASTi, sets LASTi := i, and waits for the token to arrive. If it has the token or the token arrives, it enters its CS directly.
When the request from node i arrives at a non-privileged node j on the path to the token holder (the privileged node), node j forwards the request from i to node LASTj and sets LASTj := i. When the request from node i arrives at the privileged node, say node k, and k is the root node of the tree (LASTk = k) and it is in its critical section, then node k sets NEXTk := i and LASTk := i. If k is the root node and is holding an idle token, it sends the token to node i and sets LASTk := i. If node k is not the root node, it forwards the request from node i to node LASTk; this happens when node k received a request from another node prior to the request from node i and thus became part of the path from node i to the root node. When a privileged node k holding the token leaves the critical section, it sends the token to node NEXTk and sets NEXTk := 0. If there were no pending requests (NEXTk = 0), the node keeps the idle token.
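The LAST-pointer path reversal performed by a request can be sketched in a few lines; this single-process model collapses message passing into direct pointer updates and ignores the token and NEXT handling:

```python
# LAST points toward the probable token holder; the root points to
# itself. A request from node i rewrites LAST to i at every node on
# the path, making i the new root.
def send_request(last, i):
    """Perform path reversal for a request by node i; return the old
    root, which will grant (or queue) the token for i."""
    j = last[i]
    last[i] = i
    while last[j] != j:       # forward the request along the path
        nxt = last[j]
        last[j] = i           # path reversal: j's new parent is i
        j = nxt
    last[j] = i               # the old root now points at i as well
    return j

last = {1: 1, 2: 1, 3: 2}      # node 1 is the root and token holder
print(send_request(last, 3))   # 1: the old root
print(last)                    # {1: 3, 2: 3, 3: 3}: node 3 is the new root
```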
When a node i sends a request message, it enters a waiting state. If it does not receive the token within Twait, there is a presumption of failure. To check whether a failure has occurred, node i consults the other nodes to see whether any node k has a record of its request (NEXTk = i). After a further timeout, it queries the other nodes to detect whether the token is present in the system. When this Telec expires, node i becomes a candidate to regenerate the token, broadcasts an election message, and activates another Telec. When several nodes are candidates within the same Telec interval, the one with the smallest node number is elected. All other nodes in the system become observers and wait for a "candidate elected" message from the elected node to resume operations. The elected node possesses the new token and becomes the root of the reorganized tree. All nodes set their LAST to the elected node's number, and NEXT is set to 0. The average number of messages exchanged for an entry to the critical section is on the order of log N. A node holding an idle token does not need to send a request to enter its critical section again.
During the interval delays in the consulting and query phases, the token or an answer from another node may arrive, in which case the inquiring node goes back to the waiting state. Queries for the presence of the token in the system are recorded by the nodes to cover the case in which the token is traveling. The algorithm assumes that when a node fails, it remains failed during the election process. It does not consider what happens when a failed node recovers and is incorporated into the system; such a node may be holding an old token. Failures in the communications network are not considered in the algorithm.
R.L.N. Reddy, B. Gupta and Pradip K. Srimani proposed a permission-based fault-tolerant mutual exclusion algorithm for distributed systems derived from Maekawa's algorithm. It uses the same basic concepts as Maekawa's algorithm in order to minimize the number of messages exchanged. This algorithm functions properly when any number of the cooperating nodes in the system fail. The communication delay may vary but has a known upper bound. Messages are not guaranteed to be received in the same order in which they are sent. In this algorithm, all the nodes are assumed to form a logical cycle. Each node i has a designated node known as Maker(i) which takes over its responsibility in case of its failure; the node Maker(i) is placed immediately to the right of node i in the logical cycle. This algorithm tolerates any number of node failures, assuming there is no communication link failure.
If a node, after sending a request or inquire message, does not receive a reply within TMAX time units, it sends a probe message to see if that node has failed. This probe message has the highest priority to get a response. If a response to a probe message is not received within the maximum time required for a round trip in the network, the corresponding node is assumed to have failed. Whenever the failure of node i is detected, the node Maker(i) is informed and requested to simulate the failed node i so as to uphold the pairwise non-null intersection property required for the arbitrator sets of the algorithm. The Maker of the failed node starts simulating the failed node and sends a message to all other nodes notifying them that it is simulating that node. Then all the nodes which have the failed node in their arbitrator set send their status with respect to the failed node. These status messages carry the timestamp of the original request so that they can be queued up accordingly.
If there is no contention and no node fails, a critical section access requires (K-1) REQUEST, (K-1) REPLY and (K-1) RELEASE messages. If a single node fails, then (K-1) Are_You_There, (K-1) Makeup, (N-2) Making and (K-1) Status messages are required in addition to the regular messages. In the absence of node failures, SD is 2T; if node failures occur, SD is 4T or more. All requests are assigned a sequence number which is always increasing, and requests are resolved according to this sequence number. As there cannot be any cycle among these sequence numbers, the algorithm is free from deadlock. It is also free from starvation.
G. Cao and M. Singhal proposed a delay-optimal quorum-based mutual exclusion algorithm that optimizes both synchronization delay and message complexity. This algorithm is derived from Maekawa's algorithm. The main idea is: instead of first sending a Release message to unlock the arbiter site, which in turn sends a Reply message to the next site to enter the CS, a site exiting the CS directly sends a Reply message to the site entering the CS next.
As shown in the figure, Si receives a request from Sj after Si has sent a reply to Sk. On receipt of the request, Si sends a Transfer message to Sk to notify it that Sj is the next site to execute the CS. When Sk finishes its CS access, it sends a Reply to Sj on behalf of Si. When Sj receives the Reply, it has permission to enter the CS from Si even though the Reply was sent by Sk. This algorithm requires 3(√N-1) messages to access the CS under light load without failures. Under heavy load and in the absence of node failures, 5(√N-1) to 6(√N-1) messages are required. If a site finds that another site, say Si, has failed, it sends a failure(i) message to all other sites, so N-2 extra messages are required. In the absence of node failures, SD is only T; otherwise SD is 2T. This algorithm is free from starvation and deadlock.
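The saving can be illustrated by counting message hops between one site's CS exit and the next site's CS entry (site names as in the text, message names illustrative):

```python
# Hop counts between CS exit and the next CS entry.
def exit_cs_maekawa(exiting, arbiter, waiting):
    # Classic scheme: Release to the arbiter, then Reply to the waiter.
    return [(exiting, arbiter, "Release"), (arbiter, waiting, "Reply")]

def exit_cs_transfer(exiting, waiting):
    # Cao-Singhal: the exiting site replies on the arbiter's behalf.
    return [(exiting, waiting, "Reply")]

print(len(exit_cs_maekawa("Sk", "Si", "Sj")))   # 2 hops: SD = 2T
print(len(exit_cs_transfer("Sk", "Sj")))        # 1 hop:  SD = T
```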
Based on the analysis of the fault-tolerant algorithms discussed in the previous sections, the following table gives a performance comparison of the algorithms.
Algorithm | Message Complexity | SD | Fairness | Deadlock | Starvation | Node failures tolerated | Link failure | Net. partition
Token based:
Chang, Singhal and Liu | O((2 + K) * log N) | T * (path len.) | Satisfying | Deadlock free$ | Does not occur | Up to K* | Yes | No
Nishio, Li and Manning | N+ / 3(N-1)++ | 0 or T+ / >3T++ | Not satisfying | Deadlock free | Does not occur | Up to N-1 | Yes | No
Naimi-Trehel | log N | T * (path len.) | Satisfying | Deadlock free | Does not occur | One | Yes | No
Manivannan and Singhal | 0 or N+ / 2(N-1)+3++ | 0 or T+ / 6T++ | Satisfying | Deadlock free | Does not occur | Up to N-1 | Yes | Yes
Permission based:
Agrawal and El Abbadi | log N+ / (N+1)/2++ | 2T+ / 2T+RQT++ | Satisfying | Deadlock free | Does not occur | Up to N-1 | Yes | Yes
Reddy, Gupta and Srimani | 7√N to 9√N++ | 2T+ / 4T++ | Satisfying | Deadlock free | Does not occur | Up to N-1 | No | No
Cao and Singhal | 6(√N-1)+ / 6(√N-1)+(N-2)++ | T+ / 2T++ | Satisfying | Deadlock free | Does not occur | Up to N-1 | Yes | No

*K is the number of alternative paths. + without failure; ++ with failure or an invalid token.
$ Dhamdhere and Kulkarni proved the algorithm is not deadlock free.
Table: Performance comparison of existing fault-tolerant mutual exclusion algorithms
A major task in a distributed mutual exclusion algorithm is to reduce the number of messages exchanged for an entry to the CS to take effect. The number of messages can be reduced considerably if the nodes are logically structured, and different structures are used to reduce the overhead of achieving mutual exclusion. There is a tradeoff between fault tolerance and message complexity (and also SD) in distributed mutual exclusion algorithms. It is also observed that SD is linearly proportional to the number of node failures. Permission-based algorithms require high communication overhead to invoke the CS.
Average-Case Analysis of Path Reversal
Abstract: The algorithm designed by Suzuki and Kasami and by S. Nishio, K. F. Li and E. G. Manning was the very first distributed algorithm to solve the mutual exclusion problem in complete networks by using a dynamic logical tree structure as its basic distributed data structure, viz. a path reversal transformation in rooted n-node trees; besides, it was also the first one to achieve a logarithmic average-case message complexity. The present paper proposes a direct and general approach to compute the moments of the cost of path reversal. It basically uses one-one correspondences between combinatorial structures and the associated probability generating functions: the expected cost of path reversal is thus proved to be exactly H_{n-1}, the (n-1)-st harmonic number. Moreover, the time and message complexity of the algorithm as well as randomized bounds on its worst-case message complexity in arbitrary networks are also given. The average-case analysis of path reversal and the analysis of this distributed algorithm for mutual exclusion are thus fully completed in the paper.
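Writing out the harmonic-number bound mentioned in the abstract explicitly (our transcription of the claim, with the standard asymptotic form added):

```latex
% Expected cost of a path reversal in a random rooted n-node tree:
% the (n-1)-st harmonic number, hence logarithmic on average.
\[
  \mathbb{E}[\text{cost of path reversal}]
  \;=\; H_{n-1}
  \;=\; \sum_{k=1}^{n-1} \frac{1}{k}
  \;=\; \ln n + O(1).
\]
```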
A distributed system consists of a collection of geographically dispersed autonomous sites, which are connected by a communication network. The sites -- or processes -- have no shared memory and can only communicate with one another by means of messages.
In the mutual exclusion problem, concurrent access to a shared resource, called the critical section (CS), must be synchronized such that at any time only one process can access the CS. Mutual exclusion is crucial for the design of distributed systems. Many problems involving replicated data, atomic commitment, synchronization, and others require that a resource be allocated to a single process at a time. Solutions to this problem often entail high communication costs and are vulnerable to site and communication failures. Several distributed algorithms exist to implement mutual exclusion. They are usually designed for complete or general networks, and the most recent ones are often fault tolerant. But, whatever the algorithm, it is either a permission-based or a token-based algorithm, and thus it uses appropriate data structures. Lamport's algorithm maintains a waiting queue at each site, and its message complexity is 3(n - 1), where n is the number of sites. Several algorithms were presented later which reduce the number of messages to O(n) with a smaller constant factor [S. Mishra and P. K. Srimani, fault-tolerant mutual exclusion algorithm]. Maekawa's permission-based algorithm imposes a logical structure on the network and only requires c√n messages to be exchanged (where c is a constant which varies between 3 and 5).
The token-based algorithm A, which is analyzed in the present paper, is the first mutual exclusion algorithm for complete networks which achieves a logarithmic average message complexity; besides, it is the very first one to use a tree-based structure, namely a path reversal, as its basic distributed data structure. More recently, various mutual exclusion algorithms (e.g., by R. L. N. Reddy, B. Gupta, and P. K. Srimani) have been designed which use either the same data structure or some very close tree-based data structures. They usually also provide efficient (possibly fault-tolerant) solutions to the mutual exclusion problem. The general model used in [I. Suzuki and T. Kasami; S. Nishio, K.F. Li and E.G. Manning] to design algorithm A assumes the underlying communication links and the processes to be reliable. Message propagation delay is finite but unpredictable, and the messages are not assumed to obey the FIFO rule. A process entering the CS releases it within a finite delay. Moreover, the communication network is complete. To ensure fair mutual exclusion, each node in the network maintains two pointers, Last and Next, at all times. Last indicates the node to which requests for CS access should be forwarded; Next points to the node to which access permission must be forwarded after the current node has executed its own CS. As described below, the dynamic updating of these two pointers involves two distributed data structures: a waiting queue and a dynamic logical rooted tree structure, which is nothing but a path reversal.
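The path-reversal update of the Last pointers can be sketched as follows (a minimal single-process sketch; the array representation and function name are ours, not from the paper):

```python
def path_reversal(last, i):
    """Node i requests the CS: every node on the path from i to the
    current root has its Last pointer redirected to i, and i becomes
    the new root. Returns the old root (the node that holds, or will
    forward, the token)."""
    j = i
    while last[j] != j:        # follow Last pointers toward the root
        nxt = last[j]
        last[j] = i            # reverse the path: j now points to i
        j = nxt
    last[j] = i                # the old root also points to i ...
    last[i] = i                # ... and i is the new root
    return j

# Example: 0 is the root; 3 requests the CS and becomes the new root.
# Node 2 keeps its old pointer, since it is not on the reversed path.
last = [0, 0, 1, 1]
old_root = path_reversal(last, 3)
assert old_root == 0 and last == [3, 3, 1, 3]
```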
Sites in a distributed system communicate by message exchanges through communication channels and do not share global memory. Mutual exclusion in a distributed system is achieved by a mechanism that ensures involved sites get access to a designated section of code (called the critical section) in a mutually exclusive way. Mutual exclusion has been widely applied to solve many problems, such as replicated data consistency and distributed shared memory. In particular, the release consistency model in distributed shared-memory systems heavily utilizes the concept of mutual exclusion. To measure the traffic overhead caused by message exchanges due to mutual exclusion, it is common to adopt the parameter of mean number of messages exchanged per critical section entry, or NME for short. An algorithm that leads to a smaller NME is preferable because it tends to yield lower traffic overhead. From the user's point of view, however, response time appears more important than NME. To allow wide applicability, a mutual exclusion algorithm should scale well. In addition, an efficient algorithm ought to avoid exceedingly long response times under various critical section request rates and critical section durations. The focus of this paper is on evaluating various distributed mutual exclusion algorithms on two real machines, comparing their behaviors in terms of NME and response time, and taking into account the effects of critical section request rate, critical section duration, and system size. Three algorithms are compared: Nielsen and Mizuno's algorithm with star topology (called the Star algorithm), the improved Ring algorithm, and Chang, Singhal, and Liu's algorithm (CSL for short). Our improved Ring algorithm is a variation of the one described earlier and exhibits improved performance due to the elimination of unnecessary messages, as detailed in the section on distributed mutual exclusion algorithms.
The behavior of distributed mutual exclusion algorithms is very complex and hard to analyze mathematically. It is difficult to obtain practical insights through theoretical analysis only. Most previous works thus limit their analytic scopes to parameters that can be analyzed, mainly NME, while employing simulation techniques to measure parameters that cannot be analyzed, such as response time. In general, analytic evaluation can predict the complexity of algorithms in terms of orders of magnitude, but it cannot clearly distinguish which algorithm outperforms others when the algorithms (such as those investigated in this paper) all have the same order of NME complexity. As a result, early performance studies essentially relied on simulation for evaluating distributed mutual exclusion algorithms. Performance comparison through simulation, while remedying the shortcomings of theoretical analysis, has limitations in its assumptions about traffic patterns and program execution, usually failing to reflect the actual behaviors of algorithms and leading to impractical results. It is therefore highly desirable to conduct an empirical study that actually implements these algorithms on real systems for performance comparison, to overcome the shortcomings and limitations associated with simulation or analysis.
In order to observe the actual behaviors of these algorithms, we implemented them on IBM's Scalable POWERparallel System 2 (SP2) and the Intel iPSC/860, collecting empirical results under different critical section request rates and critical section durations. We carried out our study on the IBM SP2 machine with up to 64 nodes and on the Intel iPSC/860 with 16 nodes (the largest subcube available to users). Our results on both distributed-memory machines suggest that Nielsen and Mizuno's Star algorithm outperforms all other algorithms with respect to response time in most cases, when the critical section is requested by sites repeatedly and no barrier synchronization is involved in the meantime. This is due to the following two facts: (1) the Star algorithm has the lowest NME unless the request rate is extremely high, and (2) while all sites contend for the critical section initially (since they start issuing their first critical section requests within a short interval), fewer and fewer sites experience contention over time, as only one site is permitted to enter and then leave the critical section at a time, in sequence, and a site does not generate another critical section request until its earlier request has been served. Consequently, sites gradually serialize their generation of critical section requests and soon avoid most contention, as long as no barrier synchronization is involved in the meantime. If every site issues just one request to the critical section before being involved in barrier synchronization, our improved Ring algorithm exhibits the best performance for a high request rate, but the Star and CSL algorithms are superior for a low request rate.
Our experimental results shed some light on the design of distributed systems comprising computers interconnected by networks. In such a common distributed system configuration, critical section execution typically involves operations that read from, or write to, remote memory locations. If future interconnection networks reduce latency, the critical section duration in such a distributed system becomes shorter, which will make the Star and CSL algorithms more attractive than others.
Distributed mutual exclusion algorithms can be divided into two classes: (1) permission-based algorithms, where all involved sites vote to determine which site receives permission to enter the critical section, and (2) token-based algorithms, in which only the site with the token may enter the critical section. For an algorithm in class (1), a site must obtain permission from a quorum of involved sites in the same subset before entering the critical section. A system may consist of only one set, which contains all sites [G. Ricart and A.K. Agrawala; I. Suzuki and T. Kasami], or multiple subsets. A class (2) algorithm, on the other hand, manages a unique token and guarantees that the site acquiring the token may enter the critical section. There are many algorithms in this class. In permission-based algorithms, each site typically communicates with most of the sites and keeps status information about many sites, creating excessive message exchanges and high storage overhead. In general, a permission-based algorithm involves higher communication traffic overhead than a token-based algorithm, because the latter often communicates with fewer sites before entering the critical section, as validated by Chang's simulation results. We therefore focus our attention on token-based algorithms in our empirical study. Among known token-based algorithms, five have been shown to achieve better performance, with average NME values all being O(log N) per critical section entry. The simulation study compared a variety of permission-based and token-based algorithms, indicating that the algorithm by Naimi and Trehel exhibits better NME and time delay. This algorithm adopted the token queue concept, where a queue contains the identifiers of the sites requesting the critical section and is attached to the token message when it travels around the system.
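The token-queue concept can be sketched as follows (a toy illustration under simplifying assumptions; the names and the single-process framing are ours, not from the Naimi-Trehel paper):

```python
from collections import deque

class Token:
    """The token message carries a queue of waiting requesters, so the
    queue travels with the token and can grow up to N-1 entries in an
    N-site system."""
    def __init__(self):
        self.queue = deque()

def on_request(token, holder_busy, site_id):
    """At the token holder: grant immediately if the holder is idle and
    nobody is waiting, otherwise append the requester to the queue."""
    if not holder_busy and not token.queue:
        return site_id                    # token handed straight over
    token.queue.append(site_id)
    return None

def on_release(token):
    """Holder leaves the CS: the token (queue attached) moves to the
    site at the head of the queue, if any."""
    return token.queue.popleft() if token.queue else None

t = Token()
assert on_request(t, True, 2) is None     # holder busy: site 2 queued
on_request(t, True, 3)                    # site 3 queued behind site 2
assert on_release(t) == 2 and on_release(t) == 3
```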
The disadvantages of this algorithm are that (1) the size of the token queue is not fixed and can grow up to N-1 in a system with N sites, and (2) the time overhead to prepare and process the token message and maintain local information at each site is high.
This token queue overhead has been overcome by the CSL algorithm. To avoid duplicating previous studies, we selected the CSL algorithm among those described above. Additionally, in considering a variety of measurement issues, three other algorithms were chosen, as stated in sequence. Both the Raymond and the CSL algorithms have the same NME complexity of O(log N), but the former employs a static structure while the latter assumes a dynamic underlying structure. To compare and contrast the impact of logical structures on performance, we included Raymond's algorithm in our portfolio. To measure the impacts of system size, request rate, and critical section duration on the performance of algorithms with different orders of magnitude in complexity, we selected the improved Ring algorithm. We also employed Nielsen and Mizuno's algorithm in our study, since its fixed star topology has the merits of a simple structure, easy implementation, and the potential to exhibit better performance than other algorithms in most cases. The algorithms under study are reviewed briefly below.
Raymond's algorithm determines and maintains a static logical structure. The logical structure (for example, a spanning tree) is kept unchanged throughout its lifetime, but the directions of edges in the structure change dynamically as the token migrates among sites, in order to point toward the possible token holder. The token moves in response to a token request issued by a site wishing to enter the critical section, and it stays at its current site until a token request arrives. When the token travels over a link (in response to a token request) toward the requesting site, the direction of the link is reversed, because the requesting site will eventually become the new token holder after receiving the token. As a result, the directions of edges in the structure always point to the possible token holder, making the token holder a sink node in the structure. Each site has a local queue to hold requests coming from its neighbors and itself. When a request message arrives at a site that has already issued a request, no further message is sent by the site, as the token message will be drawn to the site (by the earlier request). Under such a circumstance, each site has only one outstanding request at any given time, so the local queue length is no more than the node degree in the embedding structure. This property eliminates unnecessary message exchanges and keeps the amount of communication traffic low under any request load. Each site wishing to enter the critical section inserts its local request at the rear of its local queue, so that all requests appearing at that site are served in first-come-first-served order.
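The request forwarding and edge reversal described above can be illustrated with a toy synchronous trace (a sketch under simplifying assumptions: one request at a time, no concurrency, our own variable names; the full algorithm additionally suppresses duplicate forwarded requests):

```python
from collections import deque

def raymond_trace(holder, token_at, requester):
    """holder[i] is the neighbour through which site i reaches the
    token (holder[i] == i marks the token holder). Returns the list of
    sites the token visits on its way to the requester."""
    holder = dict(holder)
    holder[token_at] = token_at
    queues = {i: deque() for i in holder}

    # The request travels along holder pointers toward the token; each
    # intermediate site remembers which neighbour the request came from.
    cur = requester
    queues[cur].append(cur)              # local request, rear of queue
    while holder[cur] != cur:
        queues[holder[cur]].append(cur)
        cur = holder[cur]

    # The token travels back; each traversed edge is reversed so that
    # it points toward the new token holder.
    visited = []
    while True:
        visited.append(cur)
        nxt = queues[cur].popleft()
        if nxt == cur:
            return visited               # requester now holds the token
        holder[cur] = nxt                # edge reversed
        holder[nxt] = nxt
        cur = nxt

# Chain 2 - 1 - 0 with the token at site 0; site 2 requests the CS.
assert raymond_trace({0: 0, 1: 0, 2: 1}, 0, 2) == [0, 1, 2]
```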
While it is possible to obtain better performance by inserting a locally generated request at the front of the local queue, referred to as the eager Raymond algorithm (because the local site is then allowed to enter the critical section immediately when the token reaches the site), this raises a concern about the fairness of requests being served and is thus not considered here.
We modified Manivannan and Singhal's algorithm in order to overcome its basic limitation: the requirement that failed sites/links be repaired in order to handle failures of sites/links. Our design also provides better guarantees in terms of message complexity and synchronization delay in comparison with Manivannan and Singhal's algorithm, by reducing the number of message exchanges.
When site i wants to execute the CS, it sends a request message with a sequence number to all other sites. When a site j receives this message, if it holds an idle token and site i's request has higher priority (priorities are computed from the implied position of a site in a round-robin scheme), it sends the token to site i; otherwise it queues up the request. After receiving the token, site i executes the CS. If site i does not receive the token within a timeout period, it sends an "is_token_lost?" message to all other sites. In response to this "is_token_lost?" message, all non-failed sites send a reply to site i. Site i then waits for a predefined time (depending on the maximum round-trip time between any two sites in the system) and records the reply messages received within this period. From those replies, site i learns which site, say j, has most recently executed the CS. The site which has executed the CS most recently can be found by keeping a count of the total number of times the CS has been executed by all the sites and passing that information along with the token. Site i then requests site j to generate a new token, and finally site j generates and sends a token to site i. If site i does not receive the token within another timeout period (which could be the result of the failure of site j or of message loss), it sends the "is_token_lost?" message to every other site again, repeating the token generation process until it gets a token.
The system has N sites numbered 0, 1, ..., N-1, which form a synchronous, fully connected network, and there is only one CS in the system. The sites communicate only through messages and do not share any memory or a global clock. We make the following further assumptions about message delays and failures in our system:
We use timeouts for detecting token loss; the timeout period is similar to the one used in the algorithm of S. Nishio, K.F. Li and E.G. Manning.
A site can execute the CS only when it possesses the token, a privilege message, which is a data structure of type token_type. It has two fields: total, an integer, and cs_exuted, an array of N integers.
In this algorithm, the following data structure is used for the token.

Type
  token_type = record
    cs_exuted: array [0..N-1] of integer; /* cs_exuted[i] = number of times site i executed the CS */
    total: integer; /* total number of CS executions by all sites */
  end;
Initially, token.total = C, where C is a positive integer constant, token.cs_exuted[i] = 0 for all i, and the token is at site 0. When a site holding the token, say i, finishes executing the CS, it increments token.total and updates token.cs_exuted[i]. At any given time, the value of token.total equals the total number of times all the sites have executed the CS plus C, and token.cs_exuted[i] equals the number of times site i has executed the CS.
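In Python terms, the token record and its update rule look like this (a direct transcription of the description above; the dictionary stand-in for the Pascal record and the example values of C and N are ours):

```python
C, N = 1, 4   # initial constant and number of sites (example values)

def new_token():
    """Fresh token: total starts at C, per-site counters at zero."""
    return {"total": C, "cs_exuted": [0] * N}

def finish_cs(token, i):
    """Site i leaves the CS: both counters are bumped, preserving the
    invariant token['total'] == C + sum(token['cs_exuted'])."""
    token["cs_exuted"][i] += 1
    token["total"] += 1

t = new_token()
finish_cs(t, 3)
finish_cs(t, 0)
assert t["total"] == C + sum(t["cs_exuted"]) == 3
```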
The algorithm at each site i is given below, represented using a Pascal-like notation; comments are enclosed by /* and */.
Algorithm at site i
RNi: array [0..N-1] of integer;
token_copyi: token_type;
seq_noi: integer;
token_herei: boolean;
requested_for_CSi: boolean;
token_generated_fori: integer;
Procedure Request_CS() is executed when a site wants to execute the CS. First, it increments the sequence number and stores it in the local variable RNi[i] (lines 2 to 3). Then, if the site holds the token, it executes the CS, updates the token values, and copies the token to a local variable (lines 4 to 8). If the token is not at the site, it broadcasts a request to all sites carrying its site id and current sequence number (line 10).
Procedure Handle_CS_Request() is executed when a site receives a CS request from another site. It checks whether the received request is outdated by comparing the received sequence number with the stored latest sequence number (line 3). If the request is not outdated and the site holds an idle token, it sends the token to the requesting site and updates the token_here variable.
The procedure Handle_nodefailure_messagelost() handles token loss, node failure, and message loss. If the token is not received within the timeout period, the site broadcasts an "is_token_lost?" message along with its site id and sequence number to every site (lines 2 and 3). Suppose site i receives this "is_token_lost?" message from site j; it checks the sequence number, and if the request is not outdated, it updates the stored sequence number, and if it holds the token, it sends the token to site j (lines 4 to 9). If site i does not hold the token, it sends a reply with its site id and token-copy information (line 11). Then site j collects reply messages from all sites and finds out which site (say k) executed the CS most recently (lines 12 to 15). Accordingly, it asks site k to generate a new token. If a site i receives a generate-token message from site j, and a token has not been generated within the predefined time period (2*timeout), then site i generates a token and sends it to site j (lines 16 to 20). If a token has already been generated within the predefined period, it replies with the id of the site to which it gave the last generated token (line 22).
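The reply-processing step (lines 12 to 15) amounts to picking the site whose local token copy has the largest total (a minimal sketch; the function name and dictionary representation are ours):

```python
def most_recent_executor(replies):
    """replies maps site id -> the `total` field of that site's local
    token copy. The site with the largest total executed the CS most
    recently and is the one asked to generate the new token."""
    return max(replies, key=replies.get)

# E.g., if the surviving sites 0 and 3 report totals 1 and 2, site 3
# executed the CS most recently and regenerates the token.
assert most_recent_executor({0: 1, 3: 2}) == 3
```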
If the site that holds the token and executed the CS most recently fails, the algorithm generates a new token. This new token's total value may not equal the total number of times all the sites have executed the CS plus C, but this does not affect the system's performance.
Procedure On_receive_token() is executed when a site receives the token. It checks whether the token is valid by comparing token.total with its local copy. If the token is valid, it updates the token_here variable and executes the CS (lines 2 to 4). It then updates the token values and the local copy, checks for waiting CS requests, and sends the token to the appropriate site in round-robin fashion (lines 5 to 12). If the token is not valid, it deletes the token (line 13).
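The validity test (our reading of the comparison on line 2) can be sketched as:

```python
def token_is_valid(token_total, local_total):
    """A genuine token's total can never lag behind any site's local
    copy, so a received token whose total is smaller than the local
    copy's must be a stale duplicate and is deleted."""
    return token_total >= local_total

assert token_is_valid(5, 3)        # fresh token: accept and enter CS
assert not token_is_valid(2, 3)    # stale duplicate: delete
```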
A site executes procedure On_site_recovery() when it recovers from failure. Here the site takes the latest sequence number from the persistent stable storage. Also, token_copyi and RNi[j] are reinitialized with their initial values.
Figure 4.1 shows the sequence of steps involved in the execution of the algorithm in a normal situation where there is no failure. Initially site 0 holds the token (token: {[1,0,0,0],1}) and site 3 wants to execute the CS, so it sends a CS request message to all other sites. After receiving this CS request message, site 0 sends the token to site 3. Finally, site 3 executes the CS and updates the necessary data structures.
Figure 4.2 shows a situation where site 1 failed after executing the CS (token: {[1,1,0,1],3}) and site 2 wants to execute the CS. After a timeout period, site 2 sends an "is_token_lost?" message to all other sites, and in response all non-failed sites send their local token-copy information. Using this information (token copy: {[1,0,0,1],2}), site 2 learns that site 3 executed the CS most recently. Finally, site 2 asks site 3 to generate a token. Site 3 then generates a new token and sends it to site 2, and site 2 executes the CS (token: {[1,0,1,1],3}).
In this section we prove that the algorithm achieves mutual exclusion and it is free from starvation and deadlock.
Lemma 1: If the token is lost, a new token will be generated in finite time.
Proof: If the token is lost, one of the sites requesting the CS (say site i) will time out and send an "is_token_lost?" message to all other sites. In response to this "is_token_lost?" message, all non-failed sites send a reply to site i. From those replies, site i learns which site (say site j) executed the CS most recently. Site i then requests site j to generate a new token, and finally site j generates and sends a token to site i. If site i does not receive the token within another timeout period (as a result of the failure of site j or of message loss), it sends the "is_token_lost?" message to every other site again, repeating the token generation process until it gets a token.
Lemma 2: When two sites initiate token generation simultaneously, only one token is generated.
Proof: Suppose site m receives a token generation request from site i. When site m receives this message, the token generation process is enabled at site m. When site m executes the token generation process, it generates a new token and sends it to site i. During the execution of this process, site m ignores token generation requests from any other site for a predefined time period (2*timeout). This helps in generating a single token corresponding to two simultaneous requests from two different sites. Figure 4.3 shows a situation where site n and site i send token generation requests to site m simultaneously. In this case, site m generates the token corresponding to the first request (suppose from site i), whereas it sends a message to site n indicating that a token was generated for site i.
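The suppression of the second simultaneous request can be sketched as follows (abstract time ticks; the class, its bookkeeping, and the timeout value are ours):

```python
TIMEOUT = 10   # illustrative timeout value (abstract ticks)

class TokenGenerator:
    """State kept at a site asked to regenerate the token: at most one
    token is generated per 2*TIMEOUT window; later requesters within
    the window are told which site received the fresh token."""
    def __init__(self):
        self.last_time = None
        self.last_for = None

    def on_generate_request(self, frm, now):
        if self.last_time is None or now - self.last_time >= 2 * TIMEOUT:
            self.last_time, self.last_for = now, frm
            return ("token", frm)
        return ("already_generated_for", self.last_for)

g = TokenGenerator()
# Sites i (=1) and n (=4) ask site m almost simultaneously: only one
# token is generated; site n is redirected to site i's token.
assert g.on_generate_request(1, 0) == ("token", 1)
assert g.on_generate_request(4, 3) == ("already_generated_for", 1)
```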
Theorem 1: At any point of time at most one site holds the token.
Proof: Initially, there is one token in the system, at site 0, and a site is allowed to execute the CS only if it has the token (procedures Request_CS() and On_receive_token()). Moreover, the token is forwarded only to a site that has an outstanding request for the CS. If two sites initiate token generation simultaneously, at most one new token is generated (by Lemma 2). So at any given time there is at most one token in the system, and hence at most one site is executing the CS at any given time.
In this report, we presented the main existing fault-tolerant distributed mutual exclusion algorithms along with their principles and characteristics. A comparative study of their performance was made, based on message complexity, synchronization delay, and degree of fault tolerance under different types of failures, such as communication link failure, node failure, and network partitioning. We developed an efficient fault-tolerant token-based mutual exclusion algorithm for distributed systems, which is an extension of Manivannan and Singhal's algorithm. Our algorithm avoids unnecessary message exchanges of the Manivannan and Singhal algorithm and gives improved performance by reducing the number of messages needed to detect token loss and generate a new token. Synchronization delay is lower compared to [8], achieved by reducing the number of rounds of communication, which leads to lower response time. The simulation results reveal the performance gains. The algorithm uses a timeout mechanism to detect the loss of the token or of a message in the system, and it can tolerate both node and link failures to a good extent.
Neither Manivannan and Singhal's algorithm nor our proposed algorithm handles multiple simultaneous entries into the CS, i.e., the k-mutual exclusion problem. A k-mutual exclusion algorithm would be advantageous in a situation where there are k copies of a non-sharable resource and at most k processes should be allowed to enter the CS simultaneously. Future work can extend the algorithm to handle this situation.