An Introduction to Eventual Consistency
In a perfect world, the internet computing environment would consist of a single gigantic computer and hard drive that stored all data without worrying about hardware failure or losing data.
Of course, our computing world is not perfect, and interconnected computing environments spread computing requirements over many different computers, networks, and storage systems. In failure, data is backed up or replicated on multiple storage units, also known as nodes. Developers need to consider how consistent data appears to the user and how available it is when designing new database applications.
Eventual consistency is a design model that prioritizes computation speed (low latency) over data consistency (whether or not the data is replicated precisely on all nodes). Strong consistency is a model that favors consistent data delivery in exchange for system availability.
Each model is about different types of databases and how the data is going to be used. The CAP Theorem explains that a distributed database can have only had two out of consistency, availability, and partition tolerance. While the consistency availability partition is acceptable for some applications, it is not the best for financial transactions or streaming data analysis.
What is CAP Theorem?
In the early days of computing, all operations were conducted on a single CPU or mainframe computer. Client-server architecture separated the user nodes from the back-end processing. However, a single processor does all of the computing work. Distributed computing spreads the processing work between multiple processors, all working together.
Network equipment and storage systems also distribute computing power. Distributed computing is made possible by connecting separate computer systems to improve processing power, speed, and redundancy. If one processor (or node) goes down, the other nodes will complete the process with consistent and accurate data.
The CAP Theorem (aka Brewer’s Theorem named after Professor Eric Brewer of UC Berkeley, who introduced the concept in 1998) states that there are tradeoffs between data Consistency, Availability, and Partition tolerance. The “C” means that users see the same data on all nodes at the same time. “A” means the system operates during a node failure. “P” is when the node runs despite network failure.
In some cases, the data presented to the user must be the same on all nodes. This concept is also known as strong consistency. This state is a requirement for financial transactions. Its downside is the too much time it takes for all nodes to replicate data and become available.
If the priority is data availability, then the data may not be updated on all the nodes simultaneously. The availability is faster, but the tradeoff is data accuracy. This model is called eventual consistency. Non-financial website updates on domain-name-servers (DNS) is an example of eventual consistency.
If your data goal is Consistency and Availability, using a relational database (SQL, MySQL, Postgres) is the best choice. If Consistency and Partition Tolerance is the goal, then using a non-relational (NoSQL) database like MongoDB, Redis or CouchBase might be a better choice. For Consistency and Availability, using CouchBase, Cassandra, and Amazon DynamoDB is an excellent option. Note that different database characteristics may change over time.
Background on Eventual Consistency
Eventual consistency is a distributed computing model emphasizing speed or low latency over the risk of displaying stale or outdated data. The data will eventually show once all of the replicated nodes are up-to-date. Updating Domain Name Services (DNS) is an example of an eventual consistency model. DNS updates can take from several minutes up to 24 hours, depending on the domain’s popularity on the internet. The fact the DNS is not updated right away is not critical to the application and promises availability over consistency.
Applications that update single keys over vast geography are built using the eventual consistency model. Consider a social media application that asks readers to like a post. A user in Sweden can like a post, and the state change is transmitted around the world. At the same time, a user in Mexico will also like the same post. The state difference will eventually update on the Swedish’s user account. Since that update is not essential, the system is readily available and more important than data consistency. These data state changes are typical of many cloud-based applications that update databases on hundreds if not thousands of computers worldwide. The expectation to keep these systems available and responsive is more important than showing the data consistently.
Non-relational databases (NoSQL) are a popular choice for eventual consistency applications for a variety of reasons. These databases store non-structured data in JSON document-style format versus the structured rows and columns used in structured (SQL) databases. Popular NoSQL databases include MongoDB, Cassandra, and HBase.
Eventual Consistency vs. Strong Consistency
When comparing eventual vs. strong consistency, developers need to determine which model fits their users’ needs and application requirements. If you need a cloud-based application with a lot of flexibility, then eventual consistency will provide low latency and availability for a quality web-based user experience.
Another use case for eventual consistency is when data is changed infrequently. Let’s consider the address in your online account user profile. It might take a few minutes for the new address to appear on the screen. While the delay might not be desirable, it will not impact the performance of the application.
The BASE consistency model is well-accepted for eventual consistency applications. The BASE acronym stands for:
BA - The database is available most of the time.
Soft-State (S) - Data replicas do not need to be always consistent.
Eventual Consistency (E) - Once a write completes, writes from other nodes will finish eventually.
Databases with BASE consistency prefer availability over data consistency and is less strict than the ACID model.
The strong consistency model emphasizes data consistency, with financial transactions being a great example. If you deposit money into your online banking account, then drive to your local branch, you will want to see the same balance in both locations. You may need to wait a bit longer for the data to replicate through the network and update all of the bank’s computers. But the data accuracy you see is what is most important.
If you used the model for eventual consistency applications, you might see a different balance on your mobile app than in the branch and wonder what is wrong with your bank. The consistency is obtained by restricting the read/write operations while the different nodes update the databases. The tradeoff for data consistency is high latency, but there is no alternative in this case. The transaction data must be accurate.
Structured (SQL) databases are the best choice for strong consistency applications. When a transaction occurs, all network nodes are locked down until the user’s correct data displays. SQL databases need to meet the ACID consistency model requirements.
ACID is an acronym for:
Atomic - The transaction includes many operations, and all transactions must succeed.
Consistent - After the transaction finishes, the database must be in a consistent state.
Isolation - There may be multiple transactions happening concurrently, but each transaction must be isolated.
Durable - The result of the transaction is stable.
The ACID model ensures data is consistent and always displays correctly.
Eventual Consistency in Streaming
Eventual consistency works for widely distributed applications where availability is the top priority, and data changes do not consistently appear across all nodes. But eventual consistency in streaming is not a preferred model since the dataflow is always changing. The eventual consistency model results will generate inaccurate results since the data is inconsistent and requires constant updating.
For low latency results, using a streaming database is a better choice. Streaming databases frequently check the dataflow and update database results when those values change. Since streaming databases are SQL relational tables, strong consistency is built into the equation delivering consistent and accurate results as the data changes.
Data consistency models based on the CAP Theorem require distributed computing applications to consider Consistency, Availability, and Partition tolerance when designed. Developers need to decide which model fits their use case. Strong consistency models require that users see consistent data across all network nodes.
Eventual consistency models suggest that data be displayed but that some nodes be updated later, as long as all information is eventually updated. Developers also need to consider the type of database. SQL or relational databases work well for strong consistency models. Non-relational (NoSQL) databases that manage non-structured data are often good choices for eventual consistency models.
Eventual consistency is not a robust model for streaming data applications. Since the dataflow is continually changing, the NoSQL databases will return a consistent flow of inaccurate data. Streaming SQL databases can provide strong consistency with accurate results and low-latency availability.
For more about software development tutorials and recent developments in programming, please check our dedicated blog section.