Design Scalable, Fault-Tolerant Replication System: Best Practices

So, you must have heard the word “Replication” multiple times, let’s crack it down.

Replication is simply creating multiple copies of data, but isn’t it bad? we are just wasting storage by copying the same data multiple times.

No, it’s not, let's understand why we need replication:

To keep the database reliable, allowing the system to work fine even if some database fails.
To keep data geographically closer to us for faster response
To scale the system to the moon 🚀

How do we do it in distributed systems, when all the databases are connected via the network?

There are multiple algorithms which can be used based on different system requirements for replicating a database.

Single Leader Replication
Multi Leader Replication
Leaderless Replication

Almost all distributed databases use one of these three approaches. They all have various pros and cons, which we will examine in detail.

Each node containing a copy of the database is called a “replica”. Every write in the database needs to be written to all these replicas. But how do we achieve it?

There are many possible approaches to achieving it. We will discuss only the Single Leader Replication, the next two will be covered in upcoming articles.

Single Leader Replication :

All the write requests will be sent to a single database which is termed as “leader”. Whenever the leader writes new data to its local storage, it also sends the data change to all of its followers as part of a replication log or change stream.
The followers receive this log and apply the changes to their local copies of the database, making sure to apply the updates in the same order that they were processed on the leader.
Whenever a client wants to read, the followers are there to handle the read requests.

Now there’s a catch. How will the logs send to the followers? Synchronously or Asynchronously

Let’s figure it out

Synchronously — We want to have at least one of the followers to be up to date with the leader so that in case the leader fails we will have our most recent updates safe. So on every write, the leader will write to one follower and when the write is complete, it will send the response to the client. And all other followers will asynchronously get updates from that follower. It’s pretty good until something fails.

The advantage of synchronous replication is that the follower is guaranteed to have an up-to-date copy of the data that is consistent with the leader.

Leader Failure: If the leader suddenly fails, we can be sure that the data is still available to the follower.

Follower Failure: If the synchronous follower doesn’t respond, the writer cannot be processed. The leader must block all writes and wait until the synchronous replica is available again. That’s why it is impractical for all followers to be synchronous: any one node outage would cause the whole system to grind to a halt.

Semi-Synchronous: If the synchronous database fails, make one of the asynchronous databases synchronous. This will guarantee up-to-date data on at least two nodes.

Asynchronously — Often this is used nowadays because it allows the leader to continue processing write requests without waiting for the followers to catch up. This can improve the performance and availability of the system since the leader is not slowed down by the need to wait for the followers to apply the updates.

However, this also means that there may be a delay between the time that a write is made to the leader and when it is applied to the followers. This can result in a slight inconsistency between the leader and the followers until the followers catch up, which is why asynchronous replication is generally best suited for scenarios where strong consistency is not a strict requirement.

Simply copying data files from one node to another is not an effective way to replicate the database, because the data is constantly changing as clients make updates. If a file copy were performed while the data was in flux, the copy would contain a snapshot of the data at a specific point in time, which might not be consistent with other parts of the database. As a result, the copied data might not be meaningful or useful.

How to prevent it?

Simple, by taking a lock on the database after every writes operation. But this will make the system unavailable for that period, which we don’t want.

Then what?

Here’s how we can do it-

Take a consistent snapshot of the leader’s database at some time.
Copy the entire snapshot to the new follower node.
The follower connects to the leader and requests all the data changes that have happened since the snapshot was taken.
Once the follower has applied all of the updates to its local copy of the database that were included in the replication log or change stream since the last snapshot, it is said to have caught up. At this point, the follower can continue processing updates from the leader in real time as they are made, rather than needing to catch up on a backlog of changes.

That’s how we can replicate using a Single Leader Replication.

Like if you find it helpful.

Thanks for reading.

Replication in System Design

Techniques and Best Practices for scalable systems

Table of contents

No headings in the article.

Replication in System Design

Techniques and Best Practices for scalable systems

Table of contents

No headings in the article.

Did you find this article valuable?