Asynchronous programming. Blocking I/O and non-blocking I/O

This is the first post in a series on asynchronous programming. The whole series tries to answer a simple question: "What is asynchrony?". In the beginning, when I first started digging into the question, I thought I knew what it was. It turned out that I didn't know the slightest thing about asynchrony. So let's find out!

In this post, we will be talking about networking, but you can easily map it to other input/output (I/O) operations, for example, by changing sockets to file descriptors. The explanation is not focused on any specific programming language, although the examples will be in Python (what can I say, I love Python).


One way or another, when you have a question about blocking or non-blocking calls, it most commonly means dealing with I/O. The most frequent case in our age of information, microservices, and lambda functions is processing requests. We can immediately imagine that you, dear reader, are a user of a website, while your browser (or the application in which you're reading these lines) is a client. Somewhere in the depths of Amazon, there is a server that handles your incoming requests to generate the same lines that you're reading.

In order to start an interaction in such client-server communication, the client and the server must first establish a connection with each other. We will not go into the depths of the 7-layer model and the protocol stack involved in this interaction; all of that can easily be found on the internet. What we need to understand is that on both sides (client and server) there are special connection points known as sockets. Both the client and the server must be bound to each other's sockets and listen on them to understand what the other says on the other side of the wire.

Client-server communication

In our communication, the server does some kind of processing: it handles the request, converts markdown to HTML, looks up where the images are, and so on.

If you look at the ratio between CPU speed and network speed, the difference is a couple of orders of magnitude. It turns out that if our application uses I/O most of the time, in most cases the processor simply does nothing. This type of application is called I/O-bound. For applications that require high performance, it's a bottleneck, and that's what we'll talk about next.

There are two ways to organize I/O (I will give examples based on Linux): blocking and non-blocking.

And two types of I/O operations: synchronous and asynchronous.

All together they represent possible I/O models.

I/O models

Each of these I/O models has usage patterns that are advantageous for particular applications. Here I will be showing the difference between the two ways of organizing I/O.

Blocking I/O

With blocking I/O, when the client makes a connection request to the server, the socket processing that connection, and the corresponding thread that reads from it, are blocked until some data appears to be read. The data is placed in the network buffer until it is all read and ready for processing. Until the operation is complete, the server can do nothing but wait.

The simplest conclusion from this is that we cannot serve more than one connection within a single thread. By default, TCP sockets work in blocking mode.

A simple example in Python. The client:

import socket
import sys
import time


def main() -> None:
    host = socket.gethostname()
    port = 12345

    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # connect once; calling connect() again on an already
        # connected socket would raise an error
        sock.connect((host, port))
        while True:
            data = str.encode(sys.argv[1])
            sock.send(data)
            time.sleep(0.5)

if __name__ == "__main__":
    assert len(sys.argv) > 1, "Please provide message"
    main()

Here we send a message to the server every 0.5 seconds, forever. Imagine that this client-server communication is the download of a big file: it takes some time to finish.

And the server:

import socket


def main() -> None:
    host = socket.gethostname()
    port = 12345
    
    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # bind the socket to the port
        sock.bind((host, port))
        # listen for incoming connections
        sock.listen(5)
        print("Server started...")

        while True:
            conn, addr = sock.accept()  # accepting the incoming connection, blocking
            print('Connected by ' + str(addr))
            while True:
                data = conn.recv(1024)  # receiving data, blocking
                if not data:
                    break
                print(data)

if __name__ == "__main__":
    main()

I'm running several clients in separate terminal windows as:

$ python client.py "client N"

And the server as:

$ python server.py

Here we are just listening on the socket and accepting incoming connections, then trying to receive data from each connection.

In the above code, the server will essentially be blocked by a single client connection! If we run another client with another message, you will not see its output. I highly recommend that you play with this example to understand what is happening.

What is going on here?

The send() method will try to send all the data to the server while the receive buffer on the server continues to accept it. When the server makes the system call for reading, the application is blocked and the context is switched to the kernel. The kernel initiates the read, and the data is transferred into the user-space buffer. When the buffer runs empty, the kernel blocks the process again until the next portion of data arrives over the wire.

Now, to handle two clients with this approach, we need several threads, i.e. a new thread allocated for each client connection. We will get back to that soon.

Non-blocking I/O

But there is also a second option: non-blocking I/O. The difference is obvious from the name: instead of blocking, any operation returns immediately. Non-blocking I/O means that the request is immediately queued and the function returns; the actual I/O is then processed at some later point.

By setting a socket to non-blocking mode, you can effectively poll it. If you try to read from a non-blocking socket and there is no data, it will return an error code (EAGAIN or EWOULDBLOCK).
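
In Python that error code surfaces as a BlockingIOError exception. A minimal sketch, reusing the host and port of our example server:

import socket


def main() -> None:
    host = socket.gethostname()
    port = 12345

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.connect((host, port))
        sock.setblocking(False)  # switch the socket to non-blocking mode

        try:
            data = sock.recv(1024)  # returns immediately, ready or not
            print(data)
        except BlockingIOError:
            # nothing in the buffer right now: EAGAIN / EWOULDBLOCK
            print("no data yet")

if __name__ == "__main__":
    main()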

Actually, this kind of polling is a bad idea. If you run your program in a constant cycle of polling the socket for data, it will consume expensive CPU time. This can be extremely inefficient, because in many cases the application must busy-wait until the data is available, or attempt to do other work while the command is performed in the kernel. A more elegant way to check whether the data is readable is to use select().

Let's go back to our example with the changes on the server:

import select
import socket


def main() -> None:
    host = socket.gethostname()
    port = 12345

    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setblocking(False)  # put the listening socket into non-blocking mode
        # bind the socket to the port
        sock.bind((host, port))
        # listen for incoming connections
        sock.listen(5)
        print("Server started...")

        # sockets from which we expect to read
        inputs = [sock]
        outputs = []

        while inputs:
            # wait for at least one of the sockets to be ready for processing
            readable, writable, exceptional = select.select(inputs, outputs, inputs)

            for s in readable:
                if s is sock:
                    # the listening socket is readable: a new incoming connection
                    conn, addr = s.accept()
                    inputs.append(conn)
                else:
                    # an established connection has data, or was closed by the peer
                    data = s.recv(1024)
                    if data:
                        print(data)
                    else:
                        inputs.remove(s)
                        s.close()

if __name__ == "__main__":
    main()

Now if we run this code with more than one client, you will see that the server is not blocked by a single client and handles all of them, as the displayed messages show. Again, I suggest that you try this example yourself.

What's going on here?

Here the server does not wait for all the data to be written to the buffer. When we make a socket non-blocking by calling setblocking(False), it will never wait for an operation to complete. So when we call the recv method, it will return to the main thread immediately. The main mechanical difference is that send, recv, connect and accept can return without having done anything at all.

With this approach, we can perform multiple I/O operations with different sockets from the same thread concurrently. But since we don't know whether a socket is ready for an I/O operation, we would have to ask every socket the same question and essentially spin in an infinite loop (this non-blocking but still synchronous approach is called I/O multiplexing). A naive version of this loop is sketched below.
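
Here is a minimal sketch of such a loop, assuming sockets is a list of already-accepted, non-blocking connections:

# busy-wait over a list of non-blocking sockets; burns CPU while idle
while sockets:
    for s in list(sockets):
        try:
            data = s.recv(1024)
        except BlockingIOError:
            continue  # this socket has nothing for us yet, ask the next one
        if data:
            print(data)
        else:
            sockets.remove(s)  # the peer closed the connection
            s.close()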

To get rid of this inefficient loop, we need a readiness polling mechanism, through which we could interrogate the readiness of all sockets at once: they would tell us which ones are ready for a new I/O operation and which are not, without being asked one by one. When any of the sockets are ready, we perform the queued operations and can then return to the blocking state, waiting for the sockets to become ready for the next I/O operation.

There are several readiness polling mechanisms. They differ in performance and detail, but usually the details are hidden "under the hood" and not visible to us.
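
In Python, for example, the standard selectors module hides this choice: DefaultSelector picks the best mechanism available on the current platform. A quick sketch:

import selectors

# DefaultSelector picks the best implementation for the platform:
# epoll on Linux, kqueue on BSD/macOS, with poll/select as fallbacks
sel = selectors.DefaultSelector()
print(type(sel).__name__)  # e.g. "EpollSelector" on Linux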

Keywords to search:

Notifications:

  • Level Triggering (state)
  • Edge Triggering (state changed)

Mechanics:

  • select(), poll()
  • epoll(), kqueue()
  • EAGAIN, EWOULDBLOCK

Multitasking

So, our goal is to manage multiple clients at once. How can we ensure multiple requests are processed at the same time?

There are several options:

Separate processes

The simplest and historically first approach is to handle each request in a separate process. This is good because we can use the same blocking I/O API. If a process suddenly crashes, it will only affect the operations processed in that particular process and not any others.

The downside is complex communication. Formally, there is almost nothing in common between the processes, and any non-trivial inter-process communication that we want to organize requires additional effort to synchronize access, etc. Also, at any moment there can be several processes that just wait for client requests, which is a waste of resources.

Let's see how this works in practice. Usually the first process (the main/master process) starts, then it spawns a set of worker processes, each of which can receive requests on the same socket and wait for incoming clients. As soon as an incoming connection appears, one of the processes receives it, handles it from beginning to end, closes the socket, and becomes ready for the next request. Variations are possible: a process can be spawned for each incoming connection, or they can all be started in advance, etc. This may affect performance, but it is not so important for us now. A minimal sketch of such a server follows.
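
Here is a minimal prefork sketch of our server. It assumes Linux or macOS, since it relies on os.fork, and the worker count of 4 is arbitrary:

import os
import socket


def worker(sock: socket.socket) -> None:
    # each worker blocks on accept() independently; the kernel hands
    # every incoming connection to exactly one of them
    while True:
        conn, addr = sock.accept()
        with conn:
            print(f"[pid {os.getpid()}] connected by {addr}")
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                print(data)


def main() -> None:
    host = socket.gethostname()
    port = 12345

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, port))
        sock.listen(5)
        print("Server started...")

        for _ in range(4):  # pre-fork 4 workers
            if os.fork() == 0:  # 0 means we are inside a child process
                worker(sock)

        os.wait()  # the master just waits for its workers

if __name__ == "__main__":
    main()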

Examples of such systems:

  • Apache (the prefork MPM);
  • FastCGI, for those who most often run PHP;
  • Phusion Passenger, for those who write in Ruby on Rails;
  • PostgreSQL.

Threads

Another approach is to use operating system (OS) threads. Within one process we can create several threads. Blocking I/O can also be used, because only one thread will be blocked.

Example:

import threading
import socket


def handler(client):
    with client:  # close the connection when the loop ends
        while True:
            data = client.recv(1024)  # blocks only this thread
            if not data:
                break
            print(data)

def main() -> None:
    host = socket.gethostname()
    port = 12345

    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # bind the socket to the port
        sock.bind((host, port))
        # listen for incoming connections
        sock.listen(5)
        print("Server started...")

        while True:
            client, addr = sock.accept()
            threading.Thread(target=handler, args=(client,)).start()

if __name__ == "__main__":
    main()

To check the number of threads in the server process, you can use the Linux ps command with the server process PID:

$ ps huH p <PID> | wc -l

The operating system manages the threads itself and is capable of distributing them between the available CPU cores. Threads are lighter than processes, which in essence means we can generate more threads than processes on the same system. We can hardly run 10,000 processes, but 10,000 threads can be easy. Not that it'll be efficient.

On the other hand, there is no isolation: any crash may bring down not just one particular thread but the whole process. And the biggest difficulty is that the memory of the process where the threads run is shared by all of them. We have a shared resource, memory, and that means access to it must be synchronized. And synchronizing access to shared memory is the simplest case; there can also be, for example, a connection to the database, or a pool of database connections, shared by all the threads of the application that handles incoming connections. It is difficult to synchronize access to third-party resources.

There are some problems:

  1. First, deadlocks are possible during synchronization. A deadlock occurs when a process or thread enters a waiting state because the requested resource is held by another waiting process, which in turn is waiting for a resource held by yet another waiting process (hope that makes sense).
  2. Lack of synchronization when we have concurrent access to shared data. Roughly speaking, two threads change the data at the same time and corrupt it. Such applications are harder to debug, and not all the errors show up at once. For instance, the well-known GIL, the Global Interpreter Lock, is one of the simplest ways to make a multithreaded interpreter: it declares that all the data structures, all our memory, are protected by just one semaphore for the entire process. The sketch after this list shows the kind of race this is about.
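
A minimal sketch of the shared-data problem (the counter and the thread count are made up for illustration). Note that even with the GIL, counter += 1 is not atomic: it is a load, an add, and a store, and threads can interleave between those steps, so updates get lost without a lock:

import threading

counter = 0
lock = threading.Lock()


def increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:  # remove the lock and the final count will likely be wrong
            counter += 1


def main() -> None:
    threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # 400000 with the lock; usually less without it

if __name__ == "__main__":
    main()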

In the next post, we will be talking about cooperative multitasking and its implementations.
