Indistributed computing, aconflict-free replicated data type(CRDT) is adata structurewhich can bereplicatedacross multiple computers in anetwork, where the replicas can be updated independently andconcurrentlywithoutcoordinationbetween the replicas, and where it is always mathematically possible to resolve inconsistencies which might result.

The CRDT concept was formally defined in 2011 by Marc Shapiro, Nuno Preguiça, Carlos Baquero and Marek Zawirski. Development was initially motivated bycollaborative text editingandmobile computing. CRDTs have also been used inonline chatsystems,online gambling, and in theSoundCloudaudio distribution platform. TheNoSQLdistributed databasesRedisandRiakhave CRDT data types.[1][2][3][4][5][6][7][8]

Background[edit]

Concurrent updates to multiple replicas of the same data, without coordination between the computers hosting the replicas, can result in inconsistenciesbetween the replicas, which in the general case may not be resolvable. Restoring consistency and data integrity when there are conflicts between updates may require some or all of the updates to be entirely or partially dropped.

Accordingly, much of distributed computing focuses on the problem of how to prevent concurrent updates to replicated data. But another possible approach isoptimistic replication, where all concurrent updates are allowed to go through, with inconsistencies possibly created, and the results are merged or “resolved” later. In this approach, consistency between the replicas iseventuallyre-established via “merges” of differing replicas. While optimistic replication might not work in the general case, it turns out that there is a significant and practically useful class of data structures, CRDTs, where it does work — where it is mathematically always possible to merge or resolve concurrent updates on different replicas of the data structure without conflicts. This makes CRDTs ideal for optimistic replication.

As an example, a one-wayBooleanevent flag is a trivial CRDT: one bit, with a value of true or false. True means some particular event has occurred at least once. False means the event has not occurred. Once set to true, the flag cannot be set back to false. (An event, having occurred, cannot un-occur.) The resolution method is “true wins”: when merging a replica where the flag is true (that replica has observed the event), and another one where the flag is false (that replica hasn’t observed the event), the resolved result is true — the event has been observed.

Types of CRDTs[edit]

There are two approaches to CRDTs, both of which can providestrongeventual consistency: operation-based CRDTs[9][10]and state-based CRDTs.[11][12]

The two alternatives are equivalent, as one can emulate the other.[1] Operation-based CRDTs require additional guarantees from thecommunication middleware;[1]namely that the operations not be dropped or duplicated when transmitted to the other replicas, though they can be delivered in any order. State-based CRDTs also have a disadvantage, which is that the entire state must be transmitted to the other replicas, which may be costly.

Operation-based CRDTs[edit]

Operation-based CRDTs are referred to ascommutative replicated data types, orCmRDTs. CmRDT replicas propagate state by transmitting only the update operation. For example, a CmRDT of a single integer might broadcast the operations (+10) or (−20). Replicas receive the updates and apply them locally. The operations arecommutative. However, they are notidempotent. The communications infrastructure must therefore ensure that all operations on a replica are delivered to the other replicas, without duplication, but in any order.

Pureoperation-based CRDTs[10]are a variant of operation-based CRDTs that reduces the metadata size.

State-based CRDTs[edit]

State-based CRDTs are calledconvergent replicated data types, orCvRDTs. In contrast to CmRDTs, CvRDTs send their full local state to other replicas, where the states are merged by a function which must becommutative,associative, andidempotent. Themergefunction provides ajoinfor any pair of replica states, so the set of all states forms asemilattice. Theupdatefunction mustmonotonically increasethe internal state, according to the samepartial orderrules as the semilattice.

Delta stateCRDTs[12][13](or simply Delta CRDTs) are optimized state-based CRDTs where only recently applied changes to a state are disseminated instead of the entire state.

Comparison[edit]

While CmRDTs place more requirements on the protocol for transmitting operations between replicas, they use less bandwidth than CvRDTs when the number of transactions is small in comparison to the size of internal state. However, since the CvRDT merge function is associative, merging with the state of some replica yields all previous updates to that replica.Gossip protocolswork well for propagating CvRDT state to other replicas while reducing network use and handling topology changes.

Some lower bounds[14]on the storage complexity of state-based CRDTs are known.

Known CRDTs[edit]

G-Counter (Grow-only Counter)[edit]

payload integer[n] P
    initial [0,0,...,0]
updateincrement()
    let g=myId()
    P[g] :=P[g] + 1
queryvalue() : integer v
    let v=P[i]
compare (X, Y) : boolean b
    let b=([0, n - 1] : X.P[i]Y.P[i])
merge (X, Y) : payload Z
    let[0, n - 1] : Z.P[i]=max(X.P[i], Y.P[i])

This CvRDT implements a counter for a cluster ofnnodes. Each node in the cluster is assigned an ID from 0 ton– 1, which is retrieved with a call tomyId(). Thus each node is assigned its own slot in the arrayP, which it increments locally. Updates are propagated in the background, and merged by taking themax() of every element inP. The compare function is included to illustrate a partial order on the states. The merge function is commutative, associative, and idempotent. The update function monotonically increases the internal state according to the compare function. This is thus a correctly-defined CvRDT and will provide strong eventual consistency. The CmRDT equivalent broadcasts increment operations as they are received.[2]

PN-Counter (Positive-Negative Counter)[edit]

payload integer[n] P, integer[n] N
    initial [0,0,...,0], [0,0,...,0]
updateincrement()
    let g=myId()
    P[g] :=P[g] + 1
updatedecrement()
    let g=myId()
    N[g] :=N[g] + 1
queryvalue() : integer v
    let v=P[i] -N[i]
compare (X, Y) : boolean b
    let b=([0, n - 1] : X.P[i]Y.P[i][0, n - 1] : X.N[i]Y.N[i])
merge (X, Y) : payload Z
    let[0, n - 1] : Z.P[i]=max(X.P[i], Y.P[i])
    let[0, n - 1] : Z.N[i]=max(X.N[i], Y.N[i])

A common strategy in CRDT development is to combine multiple CRDTs to make a more complex CRDT. In this case, two G-Counters are combined to create a data type supporting both increment and decrement operations. The “P” G-Counter counts increments; and the “N” G-Counter counts decrements. The value of the PN-Counter is the value of the P counter minus the value of the N counter. Merge is handled by letting the merged P counter be the merge of the two P G-Counters, and similarly for N counters. Note that the CRDT’s internal state must increase monotonically, even though its external state as exposed throughquerycan return to previous values.[2]

G-Set (Grow-only Set)[edit]

payload set A
    initial
updateadd(element e)
    A :=A{e}
querylookup(element e) : boolean b
    let b=(eA)
compare (S, T) : boolean b
    let b=(S.AT.A)
merge (S, T) : payload U
    let U.A=S.AT.A

The G-Set (grow-only set) is a set which only allows adds. An element, once added, cannot be removed. The merger of two G-Sets is their union.[2]

2P-Set (Two-Phase Set)[edit]

payload set A, set R
    initial,
querylookup(element e) : boolean b
    let b=(eAeR)
updateadd(element e)
    A :=A{e}
updateremove(element e)
   prelookup(e)
    R :=R{e}
compare (S, T) : boolean b
    let b=(S.AT.AS.RT.R)
merge (S, T) : payload U
    let U.A=S.AT.A
    let U.R=S.RT.R

Two G-Sets (grow-only sets) are combined to create the 2P-set. With the addition of a remove set (called the “tombstone” set), elements can be added and also removed. Once removed, an element cannot be re-added; that is, once an elementeis in the tombstone set,querywill never again return True for that element. The 2P-set uses “remove-wins” semantics, soremove(e) takes precedence overadd(e).[2]

LWW-Element-Set (Last-Write-Wins-Element-Set)[edit]

LWW-Element-Set is similar to 2P-Set in that it consists of an “add set” and a “remove set”, with a timestamp for each element. Elements are added to an LWW-Element-Set by inserting the element into the add set, with a timestamp. Elements are removed from the LWW-Element-Set by being added to the remove set, again with a timestamp. An element is a member of the LWW-Element-Set if it is in the add set, and either not in the remove set, or in the remove set but with an earlier timestamp than the latest timestamp in the add set. Merging two replicas of the LWW-Element-Set consists of taking the union of the add sets and the union of the remove sets. When timestamps are equal, the “bias” of the LWW-Element-Set comes into play. A LWW-Element-Set can be biased towards adds or removals. The advantage of LWW-Element-Set over 2P-Set is that, unlike 2P-Set, LWW-Element-Set allows an element to be reinserted after having been removed.[2]

OR-Set (Observed-Remove Set)[edit]

OR-Set resembles LWW-Element-Set, but using unique tags instead of timestamps. For each element in the set, a list of add-tags and a list of remove-tags are maintained. An element is inserted into the OR-Set by having a new unique tag generated and added to the add-tag list for the element. Elements are removed from the OR-Set by having all the tags in the element’s add-tag list added to the element’s remove-tag (tombstone) list. To merge two OR-Sets, for each element, let its add-tag list be the union of the two add-tag lists, and likewise for the two remove-tag lists. An element is a member of the set if and only if the add-tag list less the remove-tag list is nonempty.[2]An optimization that eliminates the need for maintaining a tombstone set is possible; this avoids the potentially unbounded growth of the tombstone set. The optimization is achieved by maintaining a vector of timestamps for each replica.[15]

Sequence CRDTs[edit]

A sequence, list, orordered setCRDT can be used to build aCollaborative real-time editor, as an alternative toOperational transformation(OT).

Some known Sequence CRDTs are Treedoc,[5]
RGA,[16]Woot,[4]
Logoot,[17]and LSEQ.[18]
CRATE[19]is a decentralized real-time editor built on top of LSEQ and runnable on a network of browsers usingWebRTC.
LogootSplit[20]was proposed as an extension of Logoot in order to reduce the metadata for sequence CRDTs. MUTE[21][22]is an online web-based peer-to-peer real-time collaborative editor relying on LogootSplit algorithm.

Industry use[edit]

Redisis a distributed, highly available and scalable in-memory database that uses CRDTs for implementing globally distributed databases based on and fully compatible with Redis open source.SoundCloudopen-sourcedRoshi, a LWW-element-set CRDT for the SoundCloud stream implemented on top ofRedis.[23]

Riakis a distributed NoSQL key-value data store based on CRDTs.[24]League of Legendsuses the Riak CRDT implementation for its in-game chat system, which handles 7.5 million concurrent users and 11,000 messages per second.[25]Bet365(the largest European on-line betting company with 2.5 million simultaneous users peak), stores hundreds of megabytes of data in theRiakimplementation of OR-Set.[26]

TomTomemploys CRDTs to synchronize navigation data between the devices of a user.[27]

Phoenix (web framework), a web framework written inElixir, uses CRDTs to support real time multi-node information sharing in version 1.2.[28]

Facebookimplements CRDTs in their Apollo low-latency “consistency at scale” database.[29]

Teletype forAtomemploys CRDTs to enable developers share their workspace with team members and collaborate on code in real time.[30]

Microsoft’sCosmos DBuses CRDTs to enable a multi-master write mode.[31]

Haja Networks’ OrbitDB uses operation-based CRDTs in its core data structure, IPFS-Log.[32]

Appleimplements CRDTs in the Notes app for syncing offline edits between multiple devices.[33]

References[edit]

  1. ^abc
    Shapiro, Marc; Preguiça, Nuno; Baquero, Carlos; Zawirski, Marek (2011),Conflict-Free Replicated Data Types(PDF), Lecture Notes in Computer Science,6976, Grenoble, France: Springer Berlin Heidelberg, pp. 386–400,doi:10.1007/978-3-642-24550-3_29,ISBN 978-3-642-24549-7
  2. ^abcdefg
    Shapiro, Marc; Preguiça, Nuno; Baquero, Carlos; Zawirski, Marek (13 January 2011). “A Comprehensive Study of Convergent and Commutative Replicated Data Types”.Rr-7506.
  3. ^
    Shapiro, Marc; Preguiça, Nuno (2007). “Designing a Commutative Replicated Data Type”.Computing Research Repository (CoRR). abs/0710.1784.arXiv:0710.1784.Bibcode:2007arXiv0710.1784S.
  4. ^abOster, Gérald; Urso, Pascal; Molli, Pascal; Imine, Abdessamad (2006).Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work – CSCW ’06. p. 259.CiteSeerX 10.1.1.554.3168.doi:10.1145/1180875.1180916.ISBN 978-1595932495.
  5. ^ab
    Letia, Mihai; Preguiça, Nuno; Shapiro, Marc (2009). “CRDTs: Consistency without Concurrency Control”.Computing Research Repository (CoRR). abs/0907.0929.arXiv:0907.0929.Bibcode:2009arXiv0907.0929L.
  6. ^
    Preguiça, Nuno; Marques, Joan Manuel; Shapiro, Marc; Letia, Mihai (June 2009),A Commutative Replicated Data Type for Cooperative Editing(PDF), Montreal, Quebec, Canada: IEEE Computer Society, pp. 395–403,doi:10.1109/ICDCS.2009.20,ISBN 978-0-7695-3659-0
  7. ^
    Baquero, Carlos; Moura, Francisco (1997). “Specification of Convergent Abstract Data Types for Autonomous Mobile Computing”. Universidade do Minho.
  8. ^
    Schneider, Fred (December 1990). “Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial”.
  9. ^
    Letia, Mihai; Preguiça, Nuno; Shapiro, Marc (1 April 2010).“Consistency without Concurrency Control in Large, Dynamic Systems”(PDF).SIGOPS Oper. Syst. Rev.44(2): 29–34.doi:10.1145/1773912.1773921.
  10. ^ab
    Baquero, Carlos; Almeida, Paulo Sérgio; Shoker, Ali (2014-06-03). Magoutis, Kostas; Pietzuch, Peter (eds.).Making Operation-Based CRDTs Operation-Based. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 126–140.CiteSeerX 10.1.1.492.8742.doi:10.1007/978-3-662-43352-2_11.ISBN 9783662433515.
  11. ^
    Baquero, Carlos; Moura, Francisco (1 October 1999). “Using Structural Characteristics for Autonomous Operation”.SIGOPS Oper. Syst. Rev.: 90–96.
  12. ^ab
    Almeida, Paulo Sérgio; Shoker, Ali; Baquero, Carlos (2015-05-13). Bouajjani, Ahmed; Fauconnier, Hugues (eds.).Efficient State-Based CRDTs by Delta-Mutation. Lecture Notes in Computer Science. Springer International Publishing. pp. 62–76.arXiv:1410.2803.doi:10.1007/978-3-319-26850-7_5.ISBN 9783319268491.
  13. ^
    Almeida, Paulo Sérgio; Shoker, Ali; Baquero, Carlos (2016-03-04). “Delta State Replicated Data Types”.Journal of Parallel and Distributed Computing.111: 162–173.arXiv:1603.01529.Bibcode:2016arXiv160301529S.doi:10.1016/j.jpdc.2017.08.003.
  14. ^
    Burckhardt, Sebastian; Gotsman, Alexey; Yang, Hongseok; Zawirski, Marek (23 January 2014). “Replicated Data Types: Specification, Verification, Optimality”.Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 271–284.doi:10.1145/2535838.2535848.ISBN 9781450325448.
  15. ^Annette Bieniusa, Marek Zawirski, Nuno Preguiça, Marc Shapiro, Carlos Baquero, Valter Balegas, Sérgio Duarte, “An Optimized Conflict-free Replicated Set” (2012)arXiv:1210.3368
  16. ^Roh, Huyn-Gul; Jeon, Myeongjae; Kim, Jin-Soo; Lee, Joonwon (2011). “Replicated Abstract Data Types: Building Blocks for Collaborative Applications”.Journal of Parallel and Distributed Computing.71(2): 354–368.doi:10.1016/j.jpdc.2010.12.006.
  17. ^
    Weiss, Stephane; Urso, Pascal; Molli, Pascal (2010). “Logoot-Undo: Distributed Collaborative Editing System on P2P Networks”.IEEE Transactions on Parallel and Distributed Systems.21(8): 1162–1174.doi:10.1109/TPDS.2009.173.ISSN 1045-9219.
  18. ^
    Nédelec, Brice; Molli, Pascal; Mostefaoui, Achour; Desmontils, Emmanuel (2013). “LSEQ”.LSEQ an adaptive structure for sequences in distributed collaborative editing. p. 37.doi:10.1145/2494266.2494278.ISBN 9781450317894.
  19. ^
    Nédelec, Brice; Molli, Pascal; Mostefaoui, Achour (2016). “CRATE: Writing Stories Together with our Browsers”.Proceedings of the 25th International Conference Companion on World Wide Web. p. 231.doi:10.1145/2872518.2890539.[dead link]
  20. ^
    André, Luc; Martin, Stéphane; Oster, Gérald; Ignat, Claudia-Lavinia (2013). “Supporting Adaptable Granularity of Changes for Massive-scale Collaborative Editing”.Proceedings of the International Conference on Collaborative Computing: Networking, Applications and Worksharing – CollaborateCom 2013. pp. 50–59.doi:10.4108/icst.collaboratecom.2013.254123.
  21. ^
    “MUTE”. Coast Team. March 24, 2016.
  22. ^
    Nicolas, Matthieu; Elvinger, Victorien; Oster, Gérald; Ignat, Claudia-Lavinia; Charoy, François (2017). “MUTE: A Peer-to-Peer Web-based Real-time Collaborative Editor”.Proceedings of ECSCW Panels, Demos and Posters 2017.doi:10.18420/ecscw2017_p5.
  23. ^
    Bourgon, Peter (9 May 2014).“Roshi: a CRDT system for timestamped events”. SoundCloud.
  24. ^
    “Introducing Riak 2.0: Data Types, Strong Consistency, Full-Text Search, and Much More”. Basho Technologies, Inc. 29 October 2013.
  25. ^
    Hoff, Todd (13 October 2014).“How League of Legends Scaled Chat to 70 Million Players – It Takes Lots of Minions”.High Scalability.
  26. ^
    Macklin, Dan.“bet365: Why bet365 chose Riak”. Basho.
  27. ^
    Ivanov, Dmitry.“Practical Demystification of CRDTs”.
  28. ^
    McCord, Chris.“What makes Phoenix Presence Special”.
  29. ^
    Mak, Sander.“Facebook Announces Apollo at QCon NY 2014”.
  30. ^
    “Code together in real time with Teletype for Atom”. Atom.io. November 15, 2017.
  31. ^rimman.“Multi-master at global scale with Azure Cosmos DB”.docs.microsoft.com. Retrieved2018-05-07.
  32. ^“OrbitDB/ipfs-log on Github”. Retrieved2018-09-07.
  33. ^“IOS Objective-C headers as derived from runtime introspection: NST/IOS-Runtime-Headers”. 2019-07-25.

External links[edit]

Read More

LEAVE A REPLY

Please enter your comment!
Please enter your name here