
Bidirectional Chain Replication for Higher Throughput Provision

  • Mostafa, Almetwally M. (College of Computer and Information Sciences, King Saud University) ;
  • Youssef, Ahmed E. (College of Computer and Information Sciences, King Saud University) ;
  • Aljarbua, Yazeed Ali (College of Computer and Information Sciences, King Saud University)
  • Received : 2018.02.16
  • Accepted : 2018.06.10
  • Published : 2019.02.28

Abstract

Provision of higher throughput without sacrificing consistency guarantees in replication systems is a critical problem. In this paper, we propose a novel approach called Bidirectional Chain Replication (BCR) to improve throughput over traditional Chain Replication (CR) through better utilization of the computing and communication resources of the chain. Unlike CR, where the whole replicated data store is treated as a single unit, in BCR the replicated shared data at each server in the chain is split into two disjoint Logical Partitions ($LP_1$, $LP_2$). This forms two chains running concurrently on the same hardware in two opposite directions: the first chain ($CR_1$) exclusively manipulates data objects in $LP_1$, while the second chain ($CR_2$) exclusively manipulates data objects in $LP_2$; conflicts are therefore avoided and concurrency is guaranteed. The simultaneous employment of these two chains results in better utilization of the hardware in the sense that the two chains can evenly share the workload; hence, throughput can be improved without sacrificing consistency. Experimental results showed an improvement of approximately 85% in throughput of BCR over CR.

Keywords

1. Introduction

The increasing deployment of databases by various distributed systems in recent years has caused a dramatic growth in the importance of data replication. Replication is the process of copying the data held by one server to (n-1) other servers (i.e., replicas) to provide high availability and fault tolerance. In replication systems, as many as (n-1) servers can fail without compromising data availability. In addition, client request processing can proceed in case of server failure. However, maintaining data consistency represents a critical challenge in these systems. Consistency guarantees assert that operations to query and update individual objects are executed in some sequential order and that the effects of update operations are necessarily reflected in the results returned by subsequent query operations [31 and 32]. In order to ensure that all data ends up on all replicas, every write (update) request needs to be processed by every replica in its local data store; otherwise, the replicas would no longer contain the same data.

Strong consistency guarantees are often thought to be in conflict with high throughput and availability [23, 32 and 34]. Hence, many replication systems sacrifice throughput or availability to support strong consistency guarantees. For example, Paxos [15-18 and 26] and Primary Backup Replication (PBR) [11] employ a single server (i.e., a leader) to ensure that consistency and serialization are applied at all times. Accordingly, even during normal, failure-free operation, all clients communicate with the single leader at all times. This limitation has an important consequence since it reduces throughput by placing a disproportionately high load on the leader, which must process more messages than the other replicas [28 and 30]. Moreover, in case of leader failure, throughput drops to zero until a new leader is elected. Hence, the throughput of the replication cell is limited by the performance of the leader. In other words, the employment of a single-leader technique to maintain consistency causes inefficient utilization of the available servers in the replication cell and an imbalanced load distribution among these servers. This problem, in turn, results in performance degradation.

Chain Replication (CR) [32-36] was first proposed by van Renesse et al. [32] to improve throughput in replication systems while maintaining consistency. They showed that, in a large-scale storage system, maintaining strong consistency is not in conflict with achieving high throughput and availability. The chain consists of four linearly ordered servers; the first server is called “the head” and the last server is called “the tail”. The main idea is to classify client requests into two different classes, write-requests and read-requests. The head is assigned to exclusively process the write (update) requests over the entire replicated data, while the tail exclusively performs the read (query) requests over the whole replicated data store. In CR, the workload (update and query processing) is divided between the head and the tail, resulting in better resource utilization and improved throughput. To maintain consistency, CR employs a single writer (the head) and a single reader (the tail); consistency is guaranteed since read requests are processed only by the tail. Although CR simultaneously supports strong consistency and high throughput, it suffers from some limitations. Firstly, the communication and computing resources of the chain are not fully utilized since the data are transferred and processed in only one direction (i.e., from the head to the tail). Secondly, the replicated data are still treated as a single, coarse-grained unit.

The objective of this research is to improve throughput in CR, without sacrificing consistency, through better utilization of computing and communication resources. To achieve this goal, we propose a novel approach called Bidirectional Chain Replication (BCR). Like CR, BCR consists of four linearly ordered servers connected to form a chain; thus, CR and BCR have the same hardware. In contrast to CR, where the replicated data store on each server is treated as a single unit, in BCR the replicated data on each server in the chain are split into two disjoint Logical Partitions (LP1, LP2). BCR forms two chains; the first chain (CR1) exclusively manipulates data objects in LP1 and the second chain (CR2) exclusively manipulates data objects in LP2. Therefore, conflict is avoided and the two chains can work concurrently on the same hardware in two opposite directions. The simultaneous deployment of these two chains results in better utilization of the hardware in the sense that the two chains can evenly share the workload; therefore, throughput can be improved.

Experimental performance evaluation of BCR showed an improvement of approximately 85% in throughput over traditional CR. Practically, every two physical partitions used by CR can be merged into a single physical partition in BCR, which reduces the number of physical partitions needed for the same set of servers in a datacenter. The main contribution of this work is to provide a method for attaining higher throughput in replication systems through better utilization of computing resources without sacrificing consistency.

The rest of this paper is organized as follows: in Section 2, we briefly review essential concepts related to chain replication. In Section 3, we review related work. In Section 4, we describe our proposed BCR approach in detail. In Section 5, an empirical evaluation of BCR is described and the results are analyzed and interpreted. Finally, in Section 6, we give our conclusions and suggestions for future work.

2. Chain Replication

Van Renesse et al. [32] proposed Chain Replication (CR), an approach that simultaneously supports high throughput, availability, and strong consistency in large-scale storage services. In this approach, the servers replicating the data are linearly ordered to form a chain, as shown in Fig. 1; the first server in the chain is called “the head” and the last server is called “the tail”. Request execution is implemented by the servers roughly as follows: each update-request is directed to the head and executed at its local store, then the state changes are forwarded along a reliable FIFO link to the next server of the chain, which updates its local data store and forwards the changes to the next server, and so on until the changes reach the tail. At this point, the tail sends a reply (write-notification) to the client and the write finishes. An update acknowledgement is generated at the tail and sent back along the chain until it reaches the head. Query requests are directed to the tail and processed there atomically. The reply for every query request is generated by the tail and sent to the client. Since all the values stored in the tail are guaranteed to have been propagated to all replicas, reads are always consistent [36].

 

Fig. 1. Chain Replication

Unlike Paxos and Primary Backup Replication (PBR), where all decisions for update and query requests and their replies are made by a single server (i.e., the leader), CR deploys a better resource utilization scheme to improve throughput. The workload is distributed between the head and the tail by splitting client requests into updates (writes), which are processed by the head, and queries (reads), which are processed by the tail; the two servers (head and tail) work concurrently. However, communication and computing resources are still not fully utilized since the data are transferred and processed in only one direction (i.e., from the head to the tail).
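For illustration only, the following minimal Python sketch mimics this request flow for a four-server chain: a write is applied at the head and forwarded server by server until the tail replies, while reads are served by the tail alone. The class and method names are assumptions introduced here, not the implementation of [32].

```python
# Minimal sketch of CR request flow; the in-memory dict store and synchronous
# forwarding are illustrative assumptions, not the paper's code.

class ChainServer:
    def __init__(self, name):
        self.name = name
        self.store = {}          # local replica of the data store
        self.successor = None    # next server toward the tail

    def write(self, key, value):
        # Apply the update locally, then forward the state change down the chain.
        self.store[key] = value
        if self.successor is not None:
            return self.successor.write(key, value)
        return "write-notification"   # the tail replies to the client

    def read(self, key):
        # Only the tail serves reads, so every returned value has reached all replicas.
        return self.store.get(key)

# Build a four-server chain: head -> S2 -> S3 -> tail.
head, s2, s3, tail = (ChainServer(n) for n in ("S1", "S2", "S3", "S4"))
head.successor, s2.successor, s3.successor = s2, s3, tail

head.write("x", 1)        # clients send updates to the head
print(tail.read("x"))     # clients send queries to the tail -> 1
```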

3. Related work

A wide variety of replication techniques has been introduced in the literature [1-4]. Generally speaking, replication techniques can be classified into active replication and passive replication. In active replication [5-7], all replicas execute all client requests in the same order, assuming that the processes hosted by the replicas are deterministic. Deterministic means that, given the same initial state and the same request sequence, all processes will produce the same response sequence and end up in the same final state. The Chubby lock service for loosely coupled distributed systems [20], Spanner, Google’s globally distributed database [21], and PaxStore [19] are examples of services that utilize active replication. The disadvantage of active replication is that, in practice, most real-world servers are nondeterministic. In passive replication [8-14], there is one replica, the primary, which executes the client requests and propagates the new states to all other replicas (backup servers). The backups then apply the updates in the same order sent by the primary. If the primary server fails, one of the backup servers takes its place. Passive replication may be used even for nondeterministic processes. One service that sometimes utilizes passive replication is ZooKeeper [22]. However, the disadvantage of passive replication compared to active replication is that, in case of failure, the response is delayed.

Paxos [15-18 and 26] is a widely used active replication technique. It is a consensus protocol that results in an agreement on an order of inputs among a group of replicas, even when the replicas in the group crash and restart or when a minority of them permanently fail. Paxos classifies the processes by their roles in the protocol: Client, Proposer, Acceptor, Learner, and Leader. A single processor may play one or more roles at the same time without affecting the correctness of the protocol. The Client issues a request to the proposer, and waits for a response. The proposer promotes the client request, attempting to convince the acceptors to agree on it. Learners act as the replication factor for the protocol. Once a client request has been agreed on by the majority of the acceptors, the learner may execute the request and send a response to the client. The leader is a distinguished proposer that acts as a coordinator to move the protocol forward when conflicts occur.

The Primary-Backup Replication (PBR) protocol [11] is widely used in passive replication. In this approach, one server, the primary, performs the following jobs [32]: 1) imposes a sequencing on client requests to ensure that strong consistency holds, 2) executes the client requests locally, 3) distributes the updates resulting from client requests to the other servers (backups), 4) awaits acknowledgements from all non-faulty backups, and 5) sends a reply to the client after receiving those acknowledgements. If the primary fails, one of the backups is elected to take over that role.

Both Paxos and PBR employ a single server (leader/primary) to manage consistency and serialization. Accordingly, even during normal, failure-free operation, all clients communicate with the single leader at all times. This limitation has an important consequence since it impairs throughput and scalability by placing a disproportionately high load on the leader, which must process more messages than the other replicas [28 and 30]. Moreover, in case of leader failure, throughput drops to zero until a new leader is elected. Several approaches have been proposed to resolve the single-leader bottleneck. Their main objective is to improve replication cell performance by distributing consistency management responsibilities among servers. These approaches include different variants of Paxos such as Multi-Paxos [16 and 17], Fast Paxos [27], Mencius [28], Generalized Paxos [29], and EPaxos [30], as well as Object Ownership Distribution (OOD) [13] and chain replication [32-36].

Chain replication (CR) is intended for supporting large-scale storage services that exhibit high throughput and availability without sacrificing strong consistency guarantees. In chain replication, the primary’s role in sequencing requests is partitioned between two servers: the head sequences update requests, and the tail extends that sequence by handling query requests. This sharing of responsibility enables lower-latency and lower-overhead processing for query requests, because only the tail is involved in processing a query and that processing is never delayed by activity elsewhere in the chain. This is in contrast to the primary-backup approach, where the primary must await acknowledgements from the backups for prior updates before responding to a query. In both approaches, update requests must be distributed to all servers replicating an object, otherwise the replicas will deviate. Chain replication does this broadcasting serially, resulting in higher latency than the primary/backup approach, where updates are disseminated to the backups in parallel. With parallel dissemination, the time needed to generate a reply is proportional to the maximum latency of any non-faulty backup; with serial dissemination, it is proportional to the sum of those latencies [32]. Chain replication exhibits a higher latency than multicast-based replication solutions but, on the other hand, it is extremely resource efficient and, therefore, it has been adopted in several practical systems. FAWN-KV [39] and HyperDex [40] are two data stores that offer strong consistency using chain replication as the main replication technique. BCR extends CR to achieve better utilization of computing and communication resources and to gain higher throughput.
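This latency argument can be stated compactly. With notation introduced here for illustration (n non-faulty backups and $l_i$ the dissemination latency to the i-th of them), the reply time behaves roughly as:

```latex
T_{\text{parallel}} \;\propto\; \max_{1 \le i \le n} l_i ,
\qquad
T_{\text{serial}} \;\propto\; \sum_{i=1}^{n} l_i .
```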

The work in [34] proposed a novel datastore design, named ChainReaction. The proposed solution relies on a novel variant of chain replication that offers the causal+ consistency criterion [37 and 38] and is able to leverage the existence of multiple replicas to distribute the load of read requests. As a result, ChainReaction avoids the bottlenecks of linearizability while providing competitive performance when compared with systems merely offering eventual consistency. ChainReaction can be deployed either on a single datacenter or in geo-replicated scenarios over multiple datacenters. Additionally, ChainReaction provides a transactional construct that allows a client to read the value of multiple objects in a causal+ consistent way.

The work in [35] presents the design, implementation, and evaluation of CRAQ (Chain Replication with Apportioned Queries), an object storage system that, while maintaining the strong consistency properties of chain replication, provides lower latency and higher throughput for read operations by supporting apportioned queries, that is, by dividing read operations over all nodes in a chain rather than requiring that they all be handled by a single primary node. CRAQ enables any chain node to handle read operations while preserving strong consistency, thus supporting load balancing across all nodes storing an object. Furthermore, when workloads are read-mostly, an assumption used in other systems such as the Google File System [24] and Memcached [25], the performance of CRAQ rivals that of systems offering only eventual consistency. In addition to strong consistency, CRAQ’s design naturally supports eventual consistency among read operations for lower-latency reads during write contention, and degradation to read-only behavior during transient partitions. CRAQ allows applications to specify the maximum staleness acceptable for read operations. CRAQ techniques can be directly supported in BCR.

4. The Proposed BCR

4.1 BCR Intuitions and Contributions

The main objective of BCR is to improve throughput in CR without sacrificing consistency, through better utilization of the computing and communication resources of the chain. As shown in Fig. 2, BCR is composed of four uniquely identified servers (S1, S2, S3, and S4) which are linearly connected to form a chain. In order to maintain data availability in BCR, data is replicated on each server, and the replicated data is divided into two logical partitions (LP1, LP2). Thus, data redundancy is achieved by this replication: if LP1 or LP2 is corrupted on one server, the data can be recovered from another server. It is worth mentioning here that the fault tolerance techniques used in traditional CR remain valid in BCR; hence, fault tolerance is out of the scope of this work. At the server level, the first server in BCR, S1 (formerly the head in CR), has the exclusive right to concurrently write to LP1 and read from LP2. Conversely, the last server in BCR, S4 (formerly the tail in CR), has the exclusive right to concurrently write to LP2 and read from LP1. Thus, S1 employs two processes: H1 to exclusively write to LP1 and T2 to exclusively read from LP2. Conversely, S4 employs two processes: H2 to exclusively write to LP2 and T1 to exclusively read from LP1. The inverse assignment of the read and write operations on S1 and S4 with respect to the logical partitions is motivated by attaining concurrent operation of S1 and S4 without conflict. Server S2 writes to LP1 when triggered by H1 and writes to LP2 when triggered by S3. Conversely, server S3 writes to LP1 when triggered by S2 and writes to LP2 when triggered by H2. At the chain level, BCR forms two chains running concurrently in two opposite directions (a bidirectional chain). The first chain (CR1) runs from left to right (from S1 to S4) and manipulates data objects belonging to LP1. The second chain (CR2) runs from right to left (from S4 to S1) and manipulates data objects belonging to LP2. The processes H1 and T1 form, respectively, the head and the tail of CR1, while the processes H2 and T2 form, respectively, the head and the tail of CR2.
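The role and partition assignment just described can be captured in a small configuration sketch (Python). The CHAINS/ROLES names and the dictionary layout are illustrative assumptions, not part of the BCR implementation:

```python
# Each chain owns one logical partition and runs over the same four servers
# in opposite directions.
CHAINS = {
    "CR1": {"partition": "LP1", "order": ["S1", "S2", "S3", "S4"]},  # head H1 at S1, tail T1 at S4
    "CR2": {"partition": "LP2", "order": ["S4", "S3", "S2", "S1"]},  # head H2 at S4, tail T2 at S1
}

# Per-server view: S1 and S4 each host one head and one tail process, so both
# end servers serve writes and reads, but on disjoint partitions.
ROLES = {
    "S1": {"writes": "LP1", "reads": "LP2"},   # runs H1 and T2
    "S4": {"writes": "LP2", "reads": "LP1"},   # runs H2 and T1
    "S2": {"forwards": ["LP1", "LP2"]},        # middle node of both chains
    "S3": {"forwards": ["LP1", "LP2"]},        # middle node of both chains
}
```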

From the client point of view, BCR has two servers (S1, S4), each of which can execute both write and read requests, but on two different data slices. This is contrary to traditional CR, where S1 can only write and S4 can only read. In BCR, a client request manipulating a data object needs to be directed to the partition to which this object belongs (either LP1 or LP2). This is illustrated in Table 1 shown below. From this table, we notice that BCR is able to perform four requests concurrently without affecting consistency: CR1 is able to write (through H1) to LP1 and read (through T1) from LP1, while CR2 is able to write (through H2) to LP2 and read (through T2) from LP2. This represents twice the number of requests that traditional CR can perform concurrently, since CR can only write through the head and read through the tail. Servers S1 and S4 share the write and read workload since S1 writes to LP1 and reads from LP2 while S4 writes to LP2 and reads from LP1. The intermediate servers (S2, S3) are able to write to both LP1 and LP2. Hence, BCR distributes the workload (write and read requests) over both chains without conflict between the two chains since the data on each server is partitioned into two disjoint sets. The concurrent deployment of the two chains provides better utilization of the chain hardware, which results in improved throughput compared to traditional CR.

 

Fig. 2. BCR architecture

Table 1. BCR from the client view

| Request type | Target partition | Directed to | Handling process | Chain |
|---|---|---|---|---|
| Write | LP1 | S1 | H1 | CR1 |
| Read | LP1 | S4 | T1 | CR1 |
| Write | LP2 | S4 | H2 | CR2 |
| Read | LP2 | S1 | T2 | CR2 |

4.2 BCR Protocols

The design of BCR is shown in Figs. 3 and 4. The first chain (CR1: H1→S2→S3→T1), shown in Fig. 3, consists of a head (H1 at S1), two middle servers (S2, S3), and a tail (T1 at S4). Client requests targeting data in LP1 are handled as follows: each write-request is directed to the head (H1) and executed at its local store (LP1), then the state changes are forwarded along the chain link to the next server (S2), which updates its local data store (LP1), and so on until the changes reach the tail (T1). T1 updates its local store (LP1) and sends a write-notification to the client and an update-acknowledgement to S3. This acknowledgement is propagated back along the chain until it reaches H1. Each query-request is directed to T1, which replies directly to the client.

 

Fig. 3. The First Chain (CR1)

Similarly, the second chain (CR2: T2←S2←S3←H2), shown in Fig. 4, consists of a head (H2 at S4), two middle servers (S3, S2), and a tail (T2 at S1). Client requests targeting data in LP2 are handled as follows: each write-request is directed to the head (H2) and executed at its local store (LP2), then the state changes are forwarded to the next server (S3), which updates its local data store (LP2), and so on until the changes reach the tail (T2). T2 updates its local data store (LP2) and sends a write-notification to the client and an update-acknowledgement to S2. This acknowledgement is propagated back along the chain until it reaches H2. Each query-request is directed to T2, which replies directly to the client.

 

Fig. 4. The Second Chain (CR2)
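The forwarding step that both chains share (Figs. 3 and 4) can be sketched as follows. This is a simplified illustration under assumed names (SUCCESSOR, send, handle_update), not the servers' actual code:

```python
# Next hop per chain; None marks the tail of that chain.
SUCCESSOR = {
    "CR1": {"S1": "S2", "S2": "S3", "S3": "S4", "S4": None},
    "CR2": {"S4": "S3", "S3": "S2", "S2": "S1", "S1": None},
}

def send(server_id, message):
    """Stand-in for the reliable FIFO link between neighbouring servers."""
    print(f"-> {server_id}: {message}")

def handle_update(self_id, store, chain, key, value):
    # Apply the state change to the partition owned by this chain
    # (LP1 for CR1, LP2 for CR2), then forward it toward the chain's tail.
    partition = "LP1" if chain == "CR1" else "LP2"
    store[partition][key] = value
    nxt = SUCCESSOR[chain][self_id]
    if nxt is not None:
        send(nxt, ("update", chain, key, value))
    else:
        # Tail of this chain: reply to the client; the acknowledgement then
        # travels back along the chain toward the head (not shown).
        send("client", ("write-notification", chain, key))

store = {"LP1": {}, "LP2": {}}
handle_update("S2", store, "CR1", "x", 42)   # S2 applies the update to LP1 and forwards it to S3
```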

The protocol of the client side of BCR is shown in Fig. 5. Upon a write request, the client has to determine the partition (LP1 or LP2) to which the target object belongs. If the object belongs to LP1, the request is directed to S1; otherwise, it is directed to S4. Similarly, upon a read request, the client determines the partition to which the target object belongs. If the object belongs to LP1, the request is directed to S4; otherwise, it is directed to S1. The code implementing the server side is similar to the code of traditional CR with minor modifications. We enabled the read process at S1 and the write process at S4 to allow both servers to read and write. In addition, we modified the code on all servers to allow update notifications and acknowledgements to travel in both directions (i.e., from S1 to S4 and vice versa).

 

Fig. 5. Client side read and write protocols in BCR
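The client-side routing of Fig. 5 amounts to a partition lookup followed by a server choice. A minimal Python sketch is given below; the key-range rule used in partition_of is an assumption made here for illustration (any fixed mapping of objects to LP1/LP2 would do):

```python
def partition_of(key):
    # Hypothetical rule: the 4000 objects are split evenly by object id.
    return "LP1" if key < 2000 else "LP2"

def route_write(key):
    # Writes to LP1 go to S1 (head H1 of CR1); writes to LP2 go to S4 (head H2 of CR2).
    return "S1" if partition_of(key) == "LP1" else "S4"

def route_read(key):
    # Reads from LP1 go to S4 (tail T1 of CR1); reads from LP2 go to S1 (tail T2 of CR2).
    return "S4" if partition_of(key) == "LP1" else "S1"

assert route_write(10) == "S1" and route_read(10) == "S4"       # object in LP1
assert route_write(3000) == "S4" and route_read(3000) == "S1"   # object in LP2
```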

Finally, it is worth noticing that the mechanisms employed by BCR to recover from the failure of a node are similar to those in the original chain replication. There are three types of failures and corresponding repairs [32]: 1) Head failure: when the head node (S1 for CR1 or S4 for CR2) fails, its successor (S2 for CR1 or S3 for CR2) takes over as the new head, as it contains most of the previous state of the head. All updates that were at the head but had not been propagated to its successor are retransmitted by the client proxy when the failure is detected. 2) Tail failure: when the tail node (S4 for CR1 or S1 for CR2) fails, it is easily recovered by replacing it with its predecessor (S3 for CR1 or S2 for CR2). Because of the properties of the chain, the predecessor is guaranteed to have state that is newer than or equal to that of the failing tail. 3) Failure of a middle node (S2 or S3): when a middle node (S2) fails, the chain is repaired by connecting S1 to S3 without any state transfer; however, the two nodes (S1, S3) may have to exchange some pending PUT operations that were sent to S2 but did not arrive at either of them. Similarly, when a middle node (S3) fails, the chain is repaired by connecting S4 to S2 without any state transfer; however, the two nodes (S4, S2) may have to exchange some pending PUT operations that were sent to S3 but did not arrive at either of them.
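As a rough illustration of these repair rules (not the recovery code used in the experiments), the membership update for one chain can be sketched as follows, with the reconciliation of in-flight updates only indicated in comments:

```python
def repair_chain(order, failed):
    """Return the new server order for one chain after `failed` is removed."""
    i = order.index(failed)
    new_order = order[:i] + order[i + 1:]
    if i == 0:
        # Head failure: the old successor becomes the new head; the client
        # proxy retransmits updates the failed head had not yet forwarded.
        pass
    elif i == len(order) - 1:
        # Tail failure: the predecessor, which has equal or newer state,
        # becomes the new tail.
        pass
    else:
        # Middle failure: predecessor and successor are linked directly and
        # exchange any pending updates that were in flight through the
        # failed node.
        pass
    return new_order

print(repair_chain(["S1", "S2", "S3", "S4"], "S2"))  # CR1 after S2 fails -> ['S1', 'S3', 'S4']
```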

5. Experiments

In order to evaluate the throughput of BCR versus the throughput of CR, we conducted a set of experiments. The experiments were run on six Virtual Machines (VMs) outsourced from the private cloud of King Saud University. Each VM is an eight-core machine with 8 GB of memory. Four VMs are dedicated to the servers S1, S2, S3, and S4, one VM is used as a master, and the last VM is used to generate client requests. Since we have up to 500 clients, we cannot provide a separate VM for each client program; hence, we created a thread for each client program on the same VM. The object store is replicated at each server and contains 4000 objects, which are evenly divided into the two partitions LP1 and LP2, so each partition contains 2000 objects.

5.1 Experiments Setup

We conducted four experiments, as shown in Table 2. The number of clients in the experiments is 200, 300, 400, and 500, respectively, where each client sends 10 requests. The total number of requests sent by the clients is therefore 2000, 3000, 4000, and 5000, respectively. Client requests are generated at a ratio of 10% writes to 90% reads since this ratio is very common in large-scale applications such as Facebook and Twitter. Another reason for choosing this ratio is that the original CR is more suitable for applications with a high read rate. However, we also conducted experiments with different ratios (i.e., 20% writes-80% reads, 30% writes-70% reads, and 40% writes-60% reads) and obtained similar results. In all experiments, we measured throughput and execution time for both BCR and CR. The execution time is measured empirically by registering the VM local time at the instant the first client request is issued and at the instant the last request finishes executing, and taking the difference between them.
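The workload generator summarized in Table 2 can be sketched as follows (Python threads standing in for clients). The request() stub and the constants are assumptions used only to illustrate the 10%/90% request mix and the timing method:

```python
import random
import threading
import time

NUM_CLIENTS = 500          # 200, 300, 400 or 500 depending on the experiment
REQUESTS_PER_CLIENT = 10
WRITE_RATIO = 0.10         # 10% writes, 90% reads

def request(kind, key):
    """Stub standing in for sending one request to the replication cell."""
    pass

def client():
    for _ in range(REQUESTS_PER_CLIENT):
        key = random.randrange(4000)                    # 4000 replicated objects
        kind = "write" if random.random() < WRITE_RATIO else "read"
        request(kind, key)

start = time.time()
threads = [threading.Thread(target=client) for _ in range(NUM_CLIENTS)]
for t in threads: t.start()
for t in threads: t.join()
elapsed = time.time() - start                           # execution time
print("throughput:", NUM_CLIENTS * REQUESTS_PER_CLIENT / elapsed, "req/s")
```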

Table 2. Experiment Setup

 

5.2 Experimental Results

Tables 3, 4, 5, and 6 and Figs. 6, 7, 8, and 9 show the load distribution in BCR and CR in Experiments 1, 2, 3, and 4, respectively. From these tables, we can see that in CR, S1 (the head) executes only write requests, which represent 10% of all requests, while S4 (the tail) executes only read requests, which represent 90% of all requests. On the other hand, in BCR, both servers S1 and S4 can execute write and read requests, and they share these requests approximately evenly. The tables also show that in BCR the two chains execute approximately the same number of requests.

Table 3. Load Distribution (no. of requests) in CR and BCR (Exp.1)

 

 

Fig. 6. Load Distribution (no. of requests) in CR and BCR (Exp.1)

Table 4. Load Distribution (no. of requests) in CR and BCR (Exp.2)

 

 

Fig. 7. Load Distribution (no. of requests) in CR and BCR (Exp.2)

Table 5. Load Distribution (no. of requests) in CR and BCR (Exp.3)

 

 

Fig. 8. Load Distribution (no. of requests) in CR and BCR (Exp.3)

Table 6. Load Distribution (no. of requests) in CR and BCR (Exp.4)

 

 

Fig. 9. Load Distribution (no. of requests) in CR and BCR (Exp.4)

Fig. 10 compares the load distribution in BCR and CR. The results show that, while in CR there is a significant difference between the loads on S1 and S4, in BCR these servers carry approximately the same load. Moreover, Fig. 11 shows that in BCR the loads on CR1 and CR2 are approximately the same.

 

Fig. 10. Load distribution on S1 and S4 in BCR and CR

 

Fig. 11. Load distribution on CR1 and CR2 in BCR

Fig. 12 compares the throughput of BCR and CR. A quick inspection of this figure reveals that BCR outperforms CR in terms of read, write, and total throughput by approximately 85%. The reason for this result is that BCR utilizes two chains in parallel to execute all requests. This result is also reflected in Fig. 13, which shows that in all experiments the BCR execution time is lower than the CR execution time by a factor of approximately two.

 

Fig. 12. Throughput (request/sec) in BCR and CR

 

Fig. 13. Execution time in BCR and CR
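The relation between Figs. 12 and 13 follows directly from the definition of throughput. With R the total number of requests and $T_{CR}$, $T_{BCR}$ the measured execution times (symbols introduced here for illustration), the throughput ratio equals the inverse ratio of the execution times, which is consistent with the reported gain of roughly 85%:

```latex
\frac{\text{Throughput}_{BCR}}{\text{Throughput}_{CR}}
  = \frac{R / T_{BCR}}{R / T_{CR}}
  = \frac{T_{CR}}{T_{BCR}} \approx 1.85 .
```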

6. Conclusions and Future Work

In this paper, we proposed a novel approach called Bidirectional Chain Replication (BCR) to improve throughput over traditional Chain Replication (CR) through better utilization of the computing and communication resources of the chain. Unlike CR, where the whole replicated data store is treated as a single unit, in BCR the replicated data at each server in the chain are split into two disjoint Logical Partitions (LP1, LP2). This forms two chains that can work concurrently on the same hardware in two opposite directions and share the workload without conflict, since the first chain (CR1) exclusively manipulates data objects in LP1 and the second chain (CR2) exclusively manipulates data objects in LP2. Experimental performance evaluation of BCR showed an improvement of approximately 85% in throughput over traditional CR. One limitation of BCR is that its performance depends on how requests are distributed over the logical partitions. In the ideal case, both types of requests are evenly distributed over the two partitions. In practice, some data objects belonging to one partition are requested more frequently than objects belonging to the other partition, which results in an unfair distribution of the requests over the partitions. The challenge is how to dynamically redistribute the popular data objects between the two logical partitions such that requests are uniformly distributed over the partitions. This challenge is left for future work. Another future research direction is how to apply BCR to existing practical systems such as ChainReaction [34] and CRAQ [35].

References

  1. Charron-Bost, B., Pedone, F. & Schiper, A., "Replication: Theory and Practice," Springer, 2010.
  2. Furat F. and Almetwally M., "Challenges and New Avenues in Existing Replication Techniques," in Proc. of the 6th International Conference on Cloud Computing and Service Science (CLOSER2016), vol. 1, pp. 147-154, Rome, Italy, April 2016.
  3. Safa Albasam and Almetwally M., "Dynamic Health-based Object Ownership Distributed Protocol," in Proc. of proceeding of 6th International Conference on Digital Information Processing and Communications (ICDIPC), Beirut, Lebanon, April 2016.
  4. F.P. Junqueira and M. Serafini, "On barriers and the gap between active and passive replication," in Proc. of Proceeding of the 27th International Symposium on Distributed Computing (DISC2013), Springer, vol. 8205, pp. 299-313, October 14-18, Jerusalem, Israel, 2013.
  5. Schneider, F. B. & Zhou, L., "Implementing Trustworthy Services Using Replicated State Machines," in Proc. of IEEE Symposium on Security & Privacy, vol. 3, pp. 34-43, Oakland, California, USA, 2005.
  6. Sousa, J. & Bessani, A. "From Byzantine Consensus to BFT State Machine Replication: A Latency-Optimal Transformation," in Proc. of proceeding of the 9th IEEE European Dependable Computing Conference (EDCC2012), pp. 37-48, Sibiu, Romania, 2012.
  7. Dettoni, F., Lung, L. C., Correia, M. & Luiz, A. F. "Byzantine Fault-Tolerant State Machine Replication with Twin Virtual Machines," in Proc. of IEEE Symposium On Computers and Communications (ISCC2013), Split, Croatia, July 2013.
  8. Cecchet, E., Candea, G. & Ailamaki, A. "Middleware Based Database Replication: The Gaps Between Theory and Practice," in Proc. of Proceedings of the 2008 ACM Sigmod International Conference On Management of Data, pp. 739-752, 2008.
  9. Lang, W., Patel, J. M. & Naughton, J. F. "On Energy Management, Load Balancing and Replication," ACM SIGMOD Record, vol. 38, pp. 35-42, 2010.
  10. Effatparvar, M., Yazdani, N., Effatparvar, M., Dadlani, A. & Khonsari, A. "Improved Algorithms for Leader Election in Distributed Systems," in Proc. of proceeding of the 2nd IEEE International Conference on Computer Engineering and Technology (ICCET2010), V2-6-V2-10, 2010.
  11. Budhiraja, N., Marzullo, K., Schneider, F. B. & Toueg, S. "The Primary-Backup Approach," Distributed Systems, vol. 2, pp. 199-216, 1993.
  12. Mostafa, A. M. & Youssef, A. E. A, "Primary Shift Protocol for Improving Availability in Replication Systems," International Journal of Computer Applications, vol. 72, pp. 37-44, 2013. https://doi.org/10.5120/12485-8905
  13. Mostafa, A. M. & Youssef, A. E. "Improving Resource Utilization, Scalability, and Availability in Replication Systems Using Object Ownership Distribution," Arabian Journal for Science and Engineering, vol. 39, no. 12, pp. 8731-8741, 2014. https://doi.org/10.1007/s13369-014-1375-1
  14. Mostafa, A. M. & Youssef, A. E. "PRP: A Primary Replacement Protocol Based On Early Discovery of Battery Power Failure in MANETS," Multimedia Tools and Applications, vol. 74, no. 16, pp. 6243-6254, 2015. https://doi.org/10.1007/s11042-014-2091-2
  15. Bolosky, W. J., Bradshaw, D., Haagens, R. B., Kusters, N. P. & Li, P. "Paxos Replicated State Machines as The Basis of a High-Performance Data Store," in Proc. of Proceeding of 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI2011), pp.141-154, Boston, MA, April 2011.
  16. Lamport, L. "The Part-Time Parliament", ACM Transactions On Computer Systems (ToCS), vol. 16, pp. 133- 169, 1998. https://doi.org/10.1145/279227.279229
  17. Lamport, L. "Paxos Made Simple," ACM SIGACT News, vol. 32, pp. 18-25, 2001.
  18. Lampson, B. "The ABCD's of Paxos,", Proceedings of the twentieth annual ACM symposium on Principles of Distributed Computing (PODC2001), pp. 13, Newport, Rhode Island, USA, 2001.
  19. Tan, Z., Dang, Y., Sun, J., Zhou, W. & Feng, D. "Paxstore: A Distributed Key Value Storage System," in Proc. of Proceeding of International Conference on Network and Parallel Computing (NPC2014), Springer, Lecture Notes in Computer Science, LNCS-8707, pp. 471-484, Ilan, Taiwan, 2014.
  20. Burrows, M. "The Chubby Lock Service for Loosely Coupled Distributed System," in Proc. of Proceedings of the 7th USENIX Symposium On Operating Systems Design and Implementation (OSDI2006), pp. 335-350, Seattle, WA, Nov, 2006.
  21. Corbett, J. C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J. J., Ghemawat, S., Gubarev, A., Heiser, C. & Hochschild, P. "Spanner: Google's Globally Distributed Database," ACM Transactions On Computer Systems (ToCS), vol. 31, no. 8, 2013.
  22. Hunt, P., Konar, M., Junqueira, F. P. & Reed, B. "Zookeeper: Wait-Free Coordination for Internet-Scale Systems," in Proc. of USENIX Annual Technical Conference, 2010.
  23. Shapiro, Marc, et al. "Conflict-free replicated data types," in Proc. of Symposium on Self-Stabilizing Systems, Springer Berlin Heidelberg. 2011.
  24. S. Ghemawat, H. Gobioff, and S.-T. Leung. "The google file system," in Proc. of Symposium on Operating Systems Principles (SOSP), Oct. 2003.
  25. B. Fitzpatrick. "Memcached: a distributed memory object caching system," 2009.
  26. T. D. Chandra, R. Griesemer, and J. Redstone. "Paxos made live: an engineering perspective," in Proc. of 26th ACM SOSP, PODC '07, pages 398-407, New York, NY, USA, 2007.
  27. L. Lamport. Fast Paxos, 2006. Available: https://doi.org/10.1007/s00446-006-0005-x
  28. Y. Mao, F. P. Junqueira, and K. Marzullo. "Mencius: building efficient replicated state machines for WANs," in Proc. of 8th USENIX OSDI, pages 369-384, San Diego, CA, Dec. 2008.
  29. L. Lamport. "Generalized consensus and Paxos," 2005.
  30. Iulian Moraru, David G. Andersen, Michael Kaminsky. "There Is More Consensus in Egalitarian Parliaments," in Proc. of Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13), pp. 358-372, Farminton, Pennsylvania - November 03 - 06, 2013.
  31. Phillipe Ajoux, Nathan Bronson, Sanjeev Kumar, Wyatt Lloyd, and Kaushik Veeraraghavan. "Challenges to adopting stronger consistency at scale," in Proc. of 15th Workshop on Hot Topics in Operating Systems, 2015.
  32. Van Renesse, Robbert, and Fred B. Schneider. "Chain Replication for Supporting High Throughput and Availability," USENIX Symposium On Operating Systems Design and Implementation (OSDI04), vol. 4, pp:91-104, 2004.
  33. Van Renesse, Robbert, Chi Ho, and Nicolas Schiper. "Byzantine chain replication," in Proc. of International Conference On Principles of Distributed Systems, Springer Berlin Heidelberg, 2012.
  34. Almeida, Sergio, Joao Leitao, and Luis Rodrigues. "ChainReaction: a causal+ consistent datastore based on chain replication," in Proc. of Proceedings of the 8th ACM European Conference on Computer Systems, ACM, 2013.
  35. Terrace, Jeff, and Michael J. Freedman. "Object Storage on CRAQ: High-Throughput Chain Replication for Read-Mostly Workloads," in Proc. of USENIX Annual Technical Conference. 2009.
  36. Fritchie, Scott Lystig. "Chain replication in theory and in practice," in Proc. of Proceedings of the 9th ACM SIGPLAN workshop on Erlang, 2010.
  37. W. Lloyd, M. Freedman, M. Kaminsky, and D. Andersen. "Don't settle for eventual: scalable causal consistency for wide- area storage with cops," in Proc. of ACM SOSP, pages 401-416, 2011.
  38. P. Mahajan, L. Alvisi, and M. Dahlin. "Consistency, availability, and convergence," Technical Report TR-11-22, Univ. Texas at Austin, 2011.
  39. D. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. "FAWN: a fast array of wimpy nodes," Comm. ACM, vol. 54, no. 7, pp:101-109, 2011. https://doi.org/10.1145/1965724.1965747
  40. R. Escriva, B. Wong, and E. G. Sirer. "HyperDex: A distributed, searchable key-value store for cloud computing," Technical report, CSD, Cornell University, 2011.