Durability is the guarantee that once the Kafka broker confirms that data has been written, that data is permanent. Databases typically implement durability by persisting writes to non-volatile storage before confirming them. Kafka doesn't follow the database approach!
Short Answer
The short answer is that Kafka doesn't rely on physical storage (i.e., the file system) to decide that a message write is complete. It relies on the replicas.
Long Answer
When a message arrives at the broker, the leader replica first appends it to its log, which initially lives in memory (the OS page cache). Before the write can be considered successful, two more things need to happen.
Assume that the replication factor is greater than 1.
1. Persist the message in the file system of the partition leader.
2. Replicate the message to all ISRs (in-sync replicas).
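The two steps can be sketched with a toy in-memory model of a single partition. `ToyPartition`, `produce`, `follower_fetch` and the broker names are hypothetical illustrations, not Kafka's API, and the model ignores disk flushing entirely. It only tracks how far each replica's copy of the log has progressed (its log end offset, or LEO). Followers pull from the leader, as Kafka followers do, so each one can lag behind the leader independently.

```python
class ToyPartition:
    """Hypothetical in-memory model of one partition; not Kafka's real API."""

    def __init__(self, followers):
        self.leader_log = []  # the leader's copy of the log
        self.follower_logs = {f: [] for f in followers}

    def produce(self, message):
        """Step 1: append to the leader's log and return the new offset."""
        self.leader_log.append(message)
        return len(self.leader_log) - 1

    def follower_fetch(self, follower, max_messages=1):
        """Step 2: a follower copies the next messages it is missing from the leader."""
        log = self.follower_logs[follower]
        start = len(log)
        log.extend(self.leader_log[start:start + max_messages])

    def log_end_offsets(self):
        """Next offset each replica would write (its log end offset, LEO)."""
        leos = {"leader": len(self.leader_log)}
        leos.update({f: len(log) for f, log in self.follower_logs.items()})
        return leos


p = ToyPartition(["broker-2", "broker-3"])
for i in range(5):
    p.produce(f"m{i}")
p.follower_fetch("broker-2", max_messages=5)
p.follower_fetch("broker-3", max_messages=4)
print(p.log_end_offsets())  # {'leader': 5, 'broker-2': 5, 'broker-3': 4}
```

Here the leader holds offsets 0 to 4, but `broker-3` has copied only offsets 0 to 3.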
Ideally, both steps should happen, and in no particular order. But the real question is: when does Kafka consider a message write complete? To answer that, consider the following question:
If a consumer asks for message 4, which has only just been persisted on the leader, will the leader return it? The answer is NO!
Interestingly, not all data that exists on the leader is available for clients to read. Clients can read only those messages that have been written to all in-sync replicas (Kafka tracks this boundary as the high watermark). The leader knows which messages have been replicated to which replica, so until a message is fully replicated, it will not be returned to the client. An attempt to read such messages results in an empty response.
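Here is a minimal sketch of this read-side rule. It assumes the `leos` map contains only the leader and the members of the ISR. The high watermark is the smallest log end offset among them, and offsets below it exist on every in-sync replica. The functions `high_watermark` and `consumer_fetch` are hypothetical. The sketch also recomputes the high watermark on every read purely for simplicity.

```python
def high_watermark(leos):
    """Offsets below this value exist on every replica in `leos` (leader + ISR)."""
    return min(leos.values())


def consumer_fetch(leader_log, leos, offset):
    """Return readable messages starting at `offset`; empty if none are committed yet."""
    hw = high_watermark(leos)
    if offset >= hw:
        return []  # the data may exist on the leader, but it is not yet visible
    return leader_log[offset:hw]


leader_log = ["m0", "m1", "m2", "m3", "m4"]
leos = {"leader": 5, "broker-2": 5, "broker-3": 4}  # log end offsets
print(consumer_fetch(leader_log, leos, 4))  # []
print(consumer_fetch(leader_log, leos, 2))  # ['m2', 'm3']
```

Message 4 sits on the leader and on `broker-2`, but `broker-3` has not copied it yet. The high watermark is therefore 4, and the read for message 4 comes back empty.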
So it's now clear that just writing a message to the leader (even persisting it to the file system) is of little use on its own. Kafka considers a message written (committed) only once it has been replicated to all in-sync replicas.
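On the producer side, the same rule determines when a write is acknowledged. This sketch assumes the producer is configured with `acks=all`, which is a real Kafka producer setting. The function `produce_with_acks_all` and its round-based replication loop are hypothetical simplifications. They are not how the broker actually schedules follower fetches.

```python
def produce_with_acks_all(leader_log, follower_logs, message):
    """Hypothetical acks=all flow: append on the leader, then run simulated
    follower fetch rounds until every in-sync follower holds the new offset."""
    leader_log.append(message)  # the leader's own write
    offset = len(leader_log) - 1
    rounds = 0
    # No acknowledgement while any in-sync follower still lacks `offset`.
    while any(len(log) <= offset for log in follower_logs.values()):
        for log in follower_logs.values():
            # Each follower copies at most one missing message per round.
            if len(log) < len(leader_log):
                log.append(leader_log[len(log)])
        rounds += 1
    return offset, rounds  # the point at which the producer would get its ack


leader_log = ["m0", "m1"]
follower_logs = {"broker-2": ["m0", "m1"], "broker-3": ["m0"]}  # broker-3 lags
result = produce_with_acks_all(leader_log, follower_logs, "m2")
print(result)  # (2, 2)
```

The acknowledgement arrives only after the slowest in-sync follower, `broker-3`, has caught up, which takes two rounds after the leader's own write.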
~Happy replication!