ACID vs BASE: Comparison of Two Design Philosophies

#concurrency #database #data-engineering #architecture #basics

Let's talk about a couple of basic concepts in the data space: how they started and how they relate to each other. This is a loose interpretation based on my own understanding, so if you want strict definitions, this is not the place to learn them.

So, ACID and BASE represent two design philosophies at opposite ends of the consistency-availability spectrum. Whaaat?

ACID

Once upon a time, in the dark ages of the early Internet, that is, somewhere around the turn of the millennium, the question of choosing the right architecture often came down to choosing the right database. For years, people tried to solve all service-building problems with relational DBMSs, and for years, attempts to scale these services tended to run into trouble as soon as the schema became complex enough.

Transactional database design is a very mature model, backed by a great set of technologies with a lot of benefits. But it comes with problems, the main one being scalability. That problem stems mostly from transactions. And when people talk about transactions, they usually mean ACID transactions.

Atomicity. Atomicity guarantees that each transaction will be executed entirely or not at all. No intermediate states are allowed.
Consistency. The problem with the term "Consistency" is that it is used in too many contexts. Here it has a narrower meaning and is not the same thing as consistency as a quality attribute of distributed systems. Consistency is the requirement that a transaction results in valid data. For example, the amount of money in an account cannot be negative, and the number of people cannot be a fraction. Each successful transaction, by definition, records only valid results.
Isolation. Events occurring within a transaction must be hidden from other concurrent transactions. If this condition were not met, other transactions could act on changes that are later rolled back, and the rollback would no longer be clean (which would break Atomicity).
Durability. Once a transaction has completed and committed its results to the database, the system must ensure that these results survive any subsequent failures.
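
To make Atomicity and Consistency a bit more tangible, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer helper, and the balances helper are my own illustrative names, not something from a particular system. The CHECK constraint plays the role of the "no negative balance" rule, and the transfer deliberately credits the receiver first, so that a failed debit has something to undo.

```python
import sqlite3

# In-memory database; the schema and names are assumptions made for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # Consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst in one transaction; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance, so the credit
        # made by the first UPDATE has been rolled back as well.
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30))   # True: both updates are committed together
print(transfer(conn, "alice", "bob", 500))  # False: would overdraw alice, nothing is applied
print(balances(conn))                       # {'alice': 70, 'bob': 80}
```

The second transfer is the interesting one: bob's balance was already increased inside the transaction, yet after the rollback neither account has changed. All or nothing.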

In practice, a system that insists on ACID guarantees chooses consistency over availability: it would rather reject or delay a request than accept invalid data.

Do you see why ACID transactions can be a problem? They guarantee a lot, and all those guarantees greatly affect the scalability of systems.

BASE

With the increasing amount of data and high availability requirements, the approach to database design has also changed dramatically. To increase the ability to scale and at the same time be highly available, we move the logic from the database to separate servers. In this way, the database becomes more independent and focused on the actual process of storing data.

In this case, we have to almost completely abandon the consistency requirements of the ACID model. When we talk about "Consistency" here, we are referring to a scenario where different nodes have their own copy of data. In such a situation, conflicts arise because each node can update its own copy, so if I read data from different nodes, I will see different values. Conflicts can occur, but nodes communicate their changes to each other to resolve these conflicts, so eventually, they agree on the final value. This is where Eventual Consistency, Strong Eventual Consistency, and Strong Consistency come into play.
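
Here is a rough sketch of that scenario: two nodes accept writes to their own copy, disagree for a while, then exchange changes and settle on one value. I'm assuming a "last write wins" rule with timestamps supplied by the caller and the node name as a tie-breaker. This is just one of several possible conflict-resolution strategies, and the Replica class and its methods are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A node with its own copy of the data: key -> (timestamp, node_id, value)."""

    node_id: str
    store: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        """Accept a local write, tagged with a caller-supplied logical timestamp."""
        self._apply(key, (timestamp, self.node_id, value))

    def read(self, key):
        """Return this node's current view of the key (None if unknown)."""
        entry = self.store.get(key)
        return None if entry is None else entry[2]

    def _apply(self, key, entry):
        # Last write wins: keep the entry with the larger (timestamp, node_id).
        # The node_id only breaks ties between writes with equal timestamps.
        current = self.store.get(key)
        if current is None or entry[:2] > current[:2]:
            self.store[key] = entry

    def sync_from(self, other):
        """Pull the other replica's entries and resolve conflicts locally."""
        for key, entry in other.store.items():
            self._apply(key, entry)


a, b = Replica("a"), Replica("b")
a.write("color", "red", timestamp=1)
b.write("color", "blue", timestamp=2)
before_sync = (a.read("color"), b.read("color"))  # the two nodes disagree here

a.sync_from(b)
b.sync_from(a)
after_sync = (a.read("color"), b.read("color"))   # both nodes now hold the same value
print(before_sync, after_sync)
```

Notice that the "red" write is silently discarded. That is the price of this particular rule, and a reason why real systems sometimes pick a more careful merge strategy.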

Now we are moving from transactional database models to NoSQL approaches. They cover situations where the ACID model is more than you need. NoSQL relies on a softer model, known as BASE. BASE consists of three principles:

Basically Available. The system prioritizes the availability of data, even in the presence of failures or inconsistencies. This is achieved by using a distributed approach. Instead of maintaining one large data store and focusing on the fault tolerance of that store, data is spread across many storage systems with a high degree of replication. If a failure disrupts access to some partitions of the data, it doesn't necessarily result in a complete database outage. It might even mean that you don't control the data sources; for example, you might link to publicly available datasets for part of your workloads.
Soft State. One of the basic concepts at the heart of BASE. The state of the system can change over time, even during periods without input, because changes may still be propagating due to eventual consistency. That's why it is called "soft" state. By contrast, the position of a typical light switch is "hard" state: if you flip it up, it will stay up, possibly forever. It will only change back to down when you (or some other user) explicitly come back to flip it.
Eventual Consistency. The only consistency requirement is that the data must converge to a consistent state at some point in the future. However, there is no guarantee as to when this will happen. This is the opposite of the ACID approach, where a transaction's results are valid and visible as soon as it commits.
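
The three principles can be seen together in a toy model. Everything here is an assumption made for illustration: the Cluster class, its method names, and especially the single global version counter, which a real distributed system would not have. Writes land on one node and are queued for the others, reads are answered by any live node even if its copy is stale, and a background pass delivers the queued updates later.

```python
from collections import deque


class Cluster:
    """Toy replicated key-value store with asynchronous replication (illustrative only)."""

    def __init__(self, node_names):
        self.data = {name: {} for name in node_names}  # node -> {key: (version, value)}
        self.up = {name: True for name in node_names}  # node -> is it reachable?
        self.pending = deque()                         # (target, key, version, value)
        self.version = 0                               # simplification: one global counter

    def write(self, node, key, value):
        """Apply the write on one node and queue it for every other node."""
        if not self.up[node]:
            raise ConnectionError(f"{node} is down")
        self.version += 1
        self.data[node][key] = (self.version, value)
        for other in self.data:
            if other != node:
                self.pending.append((other, key, self.version, value))

    def read(self, key, preferred):
        """Basically available: answer from any live node, even if its copy is stale."""
        for node in [preferred] + [n for n in self.data if n != preferred]:
            if self.up[node]:
                entry = self.data[node].get(key)
                return None if entry is None else entry[1]
        raise ConnectionError("no replica available")

    def replicate(self):
        """One background pass: deliver queued updates to live nodes, keep the rest."""
        still_pending = deque()
        for target, key, version, value in self.pending:
            if not self.up[target]:
                still_pending.append((target, key, version, value))
                continue
            current = self.data[target].get(key)
            if current is None or version > current[0]:
                self.data[target][key] = (version, value)
        self.pending = still_pending


cluster = Cluster(["n1", "n2", "n3"])
cluster.write("n1", "x", "v1")
cluster.replicate()                            # every node now has "v1"
cluster.write("n1", "x", "v2")                 # only n1 has "v2" so far
stale = cluster.read("x", preferred="n2")      # n2 still answers with the old value
cluster.up["n1"] = False                       # the node with the newest data fails
fallback = cluster.read("x", preferred="n1")   # the read is served by another node
cluster.replicate()                            # no client input, yet the state changes
converged = cluster.read("x", preferred="n2")  # n2 has caught up
print(stale, fallback, converged)
```

The read after the last replicate call returns a different value than before, even though no client wrote anything in between. That is soft state in action.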

At its core, BASE is considered the opposite of the ACID model, arguing that true consistency (in the BASE sense, not the ACID sense) cannot be achieved in the real world in highly scalable systems.

Conclusion

ACID vs BASE

Whether ACID or BASE is better depends on your project and the context of the problem. For example, if approximate answers are fine but the user really cares about the speed of interaction, BASE will be the better option. If the opposite is true, ACID will help you make your data system as robust as possible. Today's large-scale systems, including cloud platforms, use a combination of both approaches.

To be honest, I find the notion of BASE to be more of a marketing wrapper than ACID, because it doesn't offer anything new and doesn't characterize a database in any way. Putting labels on certain databases can only confuse developers. I decided to introduce these terms because it's hard to avoid them when studying databases, but now that you know what they mean, I want you to forget about them.