Time, Clocks, and the Ordering of Events in a Distributed System

Introduction

Time, Clocks, and the Ordering of Events in a Distributed System (link to paper) is a paper by Leslie Lamport that covers some of the core challenges we face when building a distributed system. As the title suggests, those challenges revolve around time, clocks, and when events within your system occur relative to each other.

People tend to think about events in the real world as things that happened at a particular instant. You can reflect back on your day and create a timeline showing, in order, all the events that happened throughout the day.

This approach will get you into trouble as you design a distributed system. To understand why, we should first look at how a distributed system is defined:

"A distributed system is a collection of processes which communicate by exchanging messages, in which the time it takes to transmit a message is not negligible compared to the time between events in a single process."

If you rely solely on physical time, it is impossible for processes to know in which order events truly happened. Physical clocks drift out of sync, the times at which events occur become distorted by message transmission times, and messages might arrive delayed or out of order.

The only thing that can be agreed upon is the order of events that directly follow each other within a single process. We can also agree that the sending of a message occurs before its receipt. But this only gives us a partial view into the ordering of events.

To build a distributed system, we have to devise a way to take this partial ordering of events, and extend it to a consistent total order of events that the system can agree upon. Having the consistent total order of all events across all processes is the key to building a distributed system.

Our Distributed System

Before we move on to how we can build a consistent total order, let's define what our distributed system is:

  • It is composed of a collection of processes.
  • Each process consists of a sequence of events.
  • The events of a process form a sequence: event a occurs before event b in that sequence if a happens before b within the process.
  • Sending or receiving a message is an event in a process.

Partial Ordering

Next we need a way of saying that an event a happened before event b. But we must be able to do so without a physical clock.

This is easy for a single process. It knows of each event that occurred within itself, and in what order (note the use of the word order, not time) the events occurred with respect to each other. Note: we could say that each process has a total ordering of the events that occurred within itself.

It gets trickier when talking about the order of events across processes, but this is what we can say so far:

  1. If a and b are events in the same process, and a comes before b in the process's order of events, then a happened before b.
    • From now on we will denote "happened before" with the symbol "->".
  2. If a is the sending of a message by one process, and b is the receipt of the same message by another process, then a -> b.
  3. If a -> b and b -> c, then a -> c. Also, two distinct events a and b are concurrent if a doesn't "happen before" b, and b doesn't "happen before" a.

These statements imply that "->" is a partial ordering of all events in the system. We still can't say in what order all events across the system occurred, but through these statements, we can order some of the events.
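
To make the partial ordering concrete, here is a small sketch in Python (the event names and the message edge are made up for illustration) that derives "happened before" from the three statements above: intra-process order, message send/receive pairs, and transitivity. Two distinct events are concurrent exactly when neither can reach the other.

    # Sketch only: events a1, a2 live on process P1 and b1, b2 on process P2.
    # Statement 1 gives the intra-process edges, statement 2 the message edge.
    edges = {
        ("a1", "a2"),   # a1 -> a2 within P1
        ("b1", "b2"),   # b1 -> b2 within P2
        ("a1", "b2"),   # a1 is the sending, b2 the receipt, of the same message
    }

    def happened_before(x, y, edges):
        """True if x -> y under the transitive closure of the edges (statement 3)."""
        frontier, seen = {x}, set()
        while frontier:
            node = frontier.pop()
            seen.add(node)
            for (u, v) in edges:
                if u == node and v == y:
                    return True
                if u == node and v not in seen:
                    frontier.add(v)
        return False

    def concurrent(x, y, edges):
        """Two distinct events are concurrent if neither happened before the other."""
        return not happened_before(x, y, edges) and not happened_before(y, x, edges)

    print(happened_before("a1", "b2", edges))  # True: the send happens before the receipt
    print(concurrent("a2", "b1", edges))       # True: neither event can reach the other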

So to recap, the partial ordering is the ordering of events related by "happened before". But that relationship is constrained to events that happened within a single process, and to the sending and receipt of a message between two processes. If we have many processes with events, not all of those events will be related in this way. How can we take these unrelated events and decide in what order they occurred?

Logical Clocks

The next step toward finding a total ordering of all events is to introduce clocks.

So we now introduce clocks into the system. But they are not related to physical time: they simply assign a number to each event, and can be implemented as counters with no timing mechanism. For example, if a process has two events a and b with a -> b, the clock time assigned to a could be 1 and the clock time assigned to b could be 2.

And that is what we'll define as the clock condition: for any two events a and b, if a -> b, then the clock time assigned to a is less than the clock time assigned to b.

Based on what we said above, the following two conditions must hold for the clock condition to be satisfied:

  1. If a and b are events on process Pi, and a -> b, then Ci(a) < Ci(b). Note: Ci denotes Pi's clock, and Ci(e) is the clock time assigned to event e.
  2. If a is the sending of a message by process Pi, and b is the receipt of that message by process Pj, then Ci(a) < Cj(b).

Then all we have to do to implement those two conditions is:

  • Each process Pi increments Ci between any two successive events.
  • (a) If an event a is the sending of a message m by process Pi, then the message m contains a timestamp Tm = Ci(a).

    (b) Upon receiving the message m, process Pj must set Cj to a value greater than or equal to its present value and greater than Tm.

Notice that in the implementation, (a) lets other processes know at what logical time the event took place, and (b) is needed to ensure that the sending of the message takes place before the receipt of the message in the system's logical time.
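
As an illustration, here is a minimal sketch of these two implementation rules in Python. The class and method names are my own; the paper specifies only the rules themselves.

    class LamportClock:
        """Minimal logical clock implementing the two rules above (sketch only)."""

        def __init__(self):
            self.time = 0

        def tick(self):
            # Rule 1: increment the clock between any two successive local events.
            self.time += 1
            return self.time

        def send(self):
            # Rule 2(a): a send event carries the timestamp Tm = Ci(a).
            return self.tick()

        def receive(self, tm):
            # Rule 2(b): advance the clock past both its present value and Tm.
            self.time = max(self.time, tm) + 1
            return self.time

    # Usage sketch: process P1 sends a message to P2.
    p1, p2 = LamportClock(), LamportClock()
    tm = p1.send()           # P1's send event, carrying timestamp Tm
    t_recv = p2.receive(tm)  # P2's receive event is assigned a time greater than Tm
    assert tm < t_recv       # the clock condition holds for this send/receive pair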

Ordering the Events Totally

Now that we have a system of logical clocks that satisfy the clock condition, we just need to extend it a little bit to have a total order of events.

First we decide how to break ties if two events from different processes are assigned the same logical clock time: we simply assign an arbitrary priority to the processes. Event a on Pi, which takes place at time 1, occurs before event b on Pj at time 1 because we say so. We just need to keep this tie-breaking priority order consistent throughout the system.

Now we can define the "happened before" partial ordering in terms of a total ordering (which we denote as "=>"):

If a is an event in process Pi, and b is an event in process Pj, then a => b if and only if either:

  • Ci(a) < Cj(b)
  • Ci(a) = Cj(b), and we've decided that Pi beats Pj on ties.

We can see that a -> b implies a => b.

With this method, we can now take all of the events that occurred across every process, and say in what order they occurred. This is the total order of events.
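
One simple way to realize this total order (a sketch, not notation from the paper) is to tag each event with the pair (clock time, process priority) and sort lexicographically, where the process priority is the arbitrary, system-wide tie-breaking rank described above.

    # Hypothetical events: (clock time, process priority, label).
    # A lower priority number wins ties; the ranking just has to be agreed on system-wide.
    events = [
        (2, 1, "b on P1"),
        (1, 2, "a on P2"),
        (2, 3, "c on P3"),  # same clock time as "b on P1": the tie is broken by priority
    ]

    # Sorting by (clock time, priority) yields the total order "=>".
    for clock, priority, label in sorted(events):
        print(clock, priority, label)
    # prints: a on P2, then b on P1, then c on P3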

The partial ordering of events in the system is unique; the total ordering is not. Different clocks satisfying the clock condition yield different total orderings. The important thing is that all processes end up with the same total order.

Now that we have devised a way to have a consistent total order of all of the events in the system, we can build a distributed system. Let's take a look at an example that shows why the total order makes it possible to do so.

Example: Synchronizing Access to a Shared Resource in a Distributed System

We can use the total order to solve the mutual exclusion problem of synchronizing access to a shared resource.

Let's say we have a system that consists of a fixed set of processes and one shared resource. We want to make sure that only one process at a time has access to the shared resource, so that there are no conflicts. The algorithm must satisfy three conditions:

  1. A process that has access to the resource must release it before access is given to another process.
  2. Different requests for access to the resource are granted in the order in which the requests are made.
  3. If every process which requests access to the resource releases it, then all requests are eventually granted.

If we use the logical clocks from above, we can have a total ordering of all request and release operations. We just have to make sure each process learns of every other process's operations.

Each process maintains its own request queue which is never seen by other processes. The request queue keeps all the requests sorted in total order.

The algorithm then has 5 rules:

  1. To request a resource, a process Pi sends a message "Tm: Pi requests resource" to every other process, and then puts the message on its own request queue. Note: Tm is the logical timestamp of the message.
  2. When process Pj receives the message "Tm: Pi requests resource", it puts it on its request queue and sends an acknowledgment message to Pi.
  3. To release the resource, Pi removes any "Tm: Pi requests resource" message from its queue and sends the message "Tm: Pi releases resource" to all the other processes.
  4. When Pj receives "Tm: Pi releases resource" it removes any "Tm: Pi requests resource" message from its request queue.
  5. Pi is granted access to the resource when (i) there is a "Tm: Pi requests resource" message at the front of its request queue (which is sorted in the total order), and (ii) Pi has received a message with a timestamp later than Tm from every other process.

This guarantees that the same (totally ordered) request queue is synchronized across all processes.
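
Below is a compact sketch of these five rules in Python. The class and method names are mine, message delivery is simulated by direct method calls, and acknowledgments are sent immediately; the paper assumes each pair of processes is connected by a channel that delivers messages reliably and in order.

    import heapq

    class Process:
        """Sketch of the five rules; the request queue is a heap of (Tm, pid) pairs."""

        def __init__(self, pid):
            self.pid = pid
            self.clock = 0
            self.queue = []   # local request queue, kept in total order
            self.latest = {}  # latest timestamp received from each other process
            self.peers = []

        def _tick(self, tm=0):
            # Logical clock rules from earlier: advance past both the local clock and Tm.
            self.clock = max(self.clock, tm) + 1
            return self.clock

        def request(self):
            # Rule 1: broadcast "Tm: Pi requests resource" and enqueue it locally.
            tm = self._tick()
            heapq.heappush(self.queue, (tm, self.pid))
            for p in self.peers:
                p.on_request(tm, self.pid)
            return tm

        def on_request(self, tm, sender):
            # Rule 2: enqueue the request and send an acknowledgment back.
            self._tick(tm)
            heapq.heappush(self.queue, (tm, sender))
            self.latest[sender] = tm
            ack = self._tick()
            next(p for p in self.peers if p.pid == sender).on_ack(ack, self.pid)

        def on_ack(self, tm, sender):
            self._tick(tm)
            self.latest[sender] = tm

        def release(self, req_tm):
            # Rule 3: remove our own request and broadcast the release.
            self.queue = [e for e in self.queue if e != (req_tm, self.pid)]
            heapq.heapify(self.queue)
            rel = self._tick()
            for p in self.peers:
                p.on_release(rel, self.pid, req_tm)

        def on_release(self, rel_tm, sender, req_tm):
            # Rule 4: remove the sender's request from the local queue.
            self._tick(rel_tm)
            self.latest[sender] = rel_tm
            self.queue = [e for e in self.queue if e != (req_tm, sender)]
            heapq.heapify(self.queue)

        def has_resource(self, req_tm):
            # Rule 5: our request heads the queue and every peer has sent something later.
            head = bool(self.queue) and self.queue[0] == (req_tm, self.pid)
            acked = all(self.latest.get(p.pid, 0) > req_tm for p in self.peers)
            return head and acked

    # Usage sketch with two processes.
    p1, p2 = Process(1), Process(2)
    p1.peers, p2.peers = [p2], [p1]
    tm = p1.request()
    print(p1.has_resource(tm))  # True: P2's acknowledgment arrived with a later timestamp
    p1.release(tm)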

Each process in the system can be thought of as a state machine processing a series of commands that transition the state machine to its next state. The series of commands is the request queue.

Each process will end up in a consistent state because each one processes the commands in the same total order.

Anomalous Behavior

We've come a long way, but the system of clocks described so far has a problem if users interact with it in the real world, in real time. One user might take some action a, then call up another user on the phone and tell them to take some action b after a. But the system might not know that a -> b unless the two processes involved have already communicated by sending a message.

This anomalous behavior can happen because there is no physical clock in the system. The logical clock has no way of knowing that a physically happened before b unless the two processes in which those events occurred happened to have sent messages to each other. To put it another way, the ordering of the logical clock differs from that perceived by the user.

There are a couple of ways this can be fixed. One is to have the users include timestamps that tell the system when each event physically occurred, giving it the information it needs to calculate a correct total ordering. But this might put an unwanted burden on the users of the system.

Another way to avoid this behavior is to introduce physical clocks into the system.

We will go this route. Let's take a look at another example, this time synchronizing physical clocks in a distributed system.

Example: Synchronizing Physical Clocks

This approach is similar to the previous example, where we synchronized access to a shared resource across different processes, but this time we will synchronize physical clocks across processes.

When we talk about physical time, we imagine that time is continuous. We can take that view with our clocks too, rather than imagining that clocks advance in discrete ticks.

Two clocks will never run at the same rate, and will slowly drift apart. The algorithm must ensure they don't drift further than an acceptable margin.

In addition to the clocks running at approximately the same rate, the algorithm must ensure that they also are synchronized to the same time within an acceptable margin of error.

We can introduce physical time to our distributed system by the following algorithm, which extends the implementation rules for the clock condition.

  1. Within a single process, the clock is continuous, changing at a constant rate.
  2. (a) If process Pi sends a message m at physical time t, the message carries a timestamp Tm = Ci(t). (b) Upon receiving a message at time t', process Pj sets its clock Cj to the maximum of its current reading Cj(t') and Tm plus the shortest message transmission time.

Note: deciding on the shortest transmission time can be fairly complicated; check the paper for more details.

With this algorithm, each process only needs the reading of its own clock and the timestamps of the messages it receives in order to keep its physical clock synchronized.
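
Here is a rough sketch of rule 2 in Python. The names and the minimum-delay constant are assumptions for illustration; the paper treats clocks as continuous and derives bounds on how closely this rule keeps them synchronized.

    MIN_DELAY = 0.001  # assumed lower bound on message transmission time, in seconds

    class PhysicalClock:
        """Sketch of rule 2 only; a real clock would also advance continuously (rule 1)."""

        def __init__(self, now):
            self.now = now  # this process's current clock reading

        def send(self):
            # Rule 2(a): the message carries the timestamp Tm = Ci(t).
            return self.now

        def receive(self, tm):
            # Rule 2(b): set Cj to the maximum of its current reading and
            # Tm plus the minimum transmission time.
            self.now = max(self.now, tm + MIN_DELAY)
            return self.now

    # Usage sketch: a message from a fast clock drags a slow clock forward.
    fast, slow = PhysicalClock(now=10.000), PhysicalClock(now=9.990)
    tm = fast.send()
    print(slow.receive(tm))  # 10.001: the receiver's clock jumps ahead of Tm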

This algorithm satisfies what is known as the strong clock condition, which is stronger than the clock condition because it takes physical clocks into account as well as the logical clocks.

My Thoughts

Things I found interesting about this paper:

  • This distributed system works by each process communicating the events that occur within it to the other processes. As mentioned above, each process then becomes a state machine that processes each event, which leads to a new state. Because each process has each event in total order, they all end up in the same state after processing each event.

    It seems like many systems built today become very brittle because their processes are isolated: they do not communicate the events that occur within them. This creates complexity when these isolated processes start to rely on each other and need to orchestrate the sharing of each process's state with one another.

  • In the algorithm described in the paper, each process follows the algorithm independently, without any central synchronization process or central storage. It is a truly distributed algorithm.

Author: Caleb Gossler

Last Modified: 2016-10-21 Fri 11:43