Asynchronous programming. Blocking I/O and non-blocking I/O

This is the first post in a series on asynchronous programming. The whole series tries to answer a simple question: "What is asynchrony?". In the beginning, when I first started digging into the question, I thought I knew what it was. It turned out that I didn't know the slightest thing about asynchrony. So let's find out!

In this post, we will be talking about networking, but you can easily map it to other input/output (I/O) operations, for example, by changing sockets to file descriptors. The explanation is not focused on any specific programming language, although the examples will be in Python (what can I say, I love Python).


One way or another, when you have a question about blocking or non-blocking calls, it most commonly means dealing with I/O. The most frequent case in our age of information, microservices, and lambda functions is processing requests. We can immediately imagine that you, dear reader, are a user of a website, while your browser (or the application in which you're reading these lines) is a client. Somewhere in the depths of Amazon, there is a server that handles your incoming requests to generate the same lines that you're reading.

In order to start an interaction in such client-server communication, the client and the server must first establish a connection with each other. We will not go into the depths of the 7-layer model and the protocol stack involved in this interaction; all of that can easily be found on the internet. What we need to understand is that on both sides (client and server) there are special connection points known as sockets. Both the client and the server must be bound to each other's sockets and listen on them to understand what the other says on the other side of the wire.

Client-server communication

In our communication, the server does some kind of processing: it handles the request, converts markdown to HTML, looks up where the images are, and so on.

If you look at the ratio between CPU speed and network speed, the difference is a couple of orders of magnitude. It turns out that if our application uses I/O most of the time, in most cases the processor simply does nothing. This type of application is called I/O-bound. For applications that require high performance, it's a bottleneck, and that's what we'll talk about next.

There are two ways to organize I/O (I will give examples based on Linux): blocking and non-blocking.

And two types of I/O operations: synchronous and asynchronous.

All together they represent possible I/O models.

I/O models

Each of these I/O models has usage patterns that are advantageous for particular applications. Here I will be showing the difference between the two ways of organizing I/O.

Blocking I/O

With blocking I/O, when the client makes a connection request to the server, the socket processing that connection, and the corresponding thread that reads from it, are blocked until some data appears to be read. The data is placed in the network buffer until it is all read and ready for processing. Until the operation is complete, the server can do nothing but wait.

The simplest conclusion from this is that we cannot serve more than one connection within a single thread. By default, TCP sockets work in blocking mode.

A simple example in Python. The client:

import socket
import sys
import time


def main() -> None:
    host = socket.gethostname()
    port = 12345

    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # connect once; calling connect() again on an already
        # connected socket would raise an error
        sock.connect((host, port))
        while True:
            data = str.encode(sys.argv[1])
            sock.send(data)
            time.sleep(0.5)

if __name__ == "__main__":
    assert len(sys.argv) > 1, "Please provide message"
    main()

Here we send a message to the server every 0.5 seconds, forever. Imagine that this client-server communication is the download of a big file: it takes some time to finish.

And the server:

import socket


def main() -> None:
    host = socket.gethostname()
    port = 12345
    
    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # bind the socket to the port
        sock.bind((host, port))
        # listen for incoming connections
        sock.listen(5)
        print("Server started...")

        while True:
            conn, addr = sock.accept()  # accepting the incoming connection, blocking
            print('Connected by ' + str(addr))
            while True:
                data = conn.recv(1024)  # receiving data, blocking
                if not data:
                    break
                print(data)

if __name__ == "__main__":
    main()

I'm running several clients in separate terminal windows as:

$ python client.py "client N"

And the server as:

$ python server.py

Here we are just listening on the socket and accepting incoming connections, then trying to receive data from each connection.

In the above code, the server will essentially be blocked by a single client connection! If we run another client with another message, you will not see its output. I highly recommend that you play with this example to understand what is happening.

What is going on here?

The send() method will try to send all the data to the server while the receive buffer on the server continues to accept it. When the server makes the system call for reading, the application is blocked and the context is switched to the kernel. The kernel initiates the read, and the data is transferred into the user-space buffer. When the buffer runs empty, the kernel blocks the process again until the next portion of data arrives over the wire.

Now, to handle two clients with this approach, we need several threads, i.e. a new thread allocated for each client connection. We will get back to that soon.

Non-blocking I/O

But there is also a second option: non-blocking I/O. The difference is obvious from the name: instead of blocking, any operation returns immediately. Non-blocking I/O means that the request is immediately queued and the function returns; the actual I/O is then processed at some later point.

By setting a socket to non-blocking mode, you can effectively poll it. If you try to read from a non-blocking socket and there is no data, it will return an error code (EAGAIN or EWOULDBLOCK).
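
In Python that error code surfaces as a BlockingIOError exception. A minimal sketch, reusing the host and port of our example server:

import socket


def main() -> None:
    host = socket.gethostname()
    port = 12345

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.connect((host, port))
        sock.setblocking(False)  # switch the socket to non-blocking mode

        try:
            data = sock.recv(1024)  # returns immediately, ready or not
            print(data)
        except BlockingIOError:
            # nothing in the buffer right now: EAGAIN / EWOULDBLOCK
            print("no data yet")

if __name__ == "__main__":
    main()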

Actually, this kind of polling is a bad idea. If you run your program in a constant cycle of polling the socket for data, it will consume expensive CPU time. This can be extremely inefficient, because in many cases the application must busy-wait until the data is available, or attempt to do other work while the command is performed in the kernel. A more elegant way to check whether the data is readable is to use select().

Let's go back to our example with the changes on the server:

import select
import socket


def main() -> None:
    host = socket.gethostname()
    port = 12345

    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setblocking(False)  # put the listening socket into non-blocking mode
        # bind the socket to the port
        sock.bind((host, port))
        # listen for incoming connections
        sock.listen(5)
        print("Server started...")

        # sockets from which we expect to read
        inputs = [sock]
        outputs = []

        while inputs:
            # wait for at least one of the sockets to be ready for processing
            readable, writable, exceptional = select.select(inputs, outputs, inputs)

            for s in readable:
                if s is sock:
                    # the listening socket is readable: a new incoming connection
                    conn, addr = s.accept()
                    inputs.append(conn)
                else:
                    # an established connection has data, or was closed by the peer
                    data = s.recv(1024)
                    if data:
                        print(data)
                    else:
                        inputs.remove(s)
                        s.close()

if __name__ == "__main__":
    main()

Now if we run this code with more than one client, you will see that the server is not blocked by a single client and handles all of them, as the displayed messages show. Again, I suggest that you try this example yourself.

What's going on here?

Here the server does not wait for all the data to be written to the buffer. When we make a socket non-blocking by calling setblocking(False), it will never wait for an operation to complete. So when we call the recv method, it will return to the main thread immediately. The main mechanical difference is that send, recv, connect and accept can return without having done anything at all.

With this approach, we can perform multiple I/O operations with different sockets from the same thread concurrently. But since we don't know whether a socket is ready for an I/O operation, we would have to ask every socket the same question and essentially spin in an infinite loop (this non-blocking but still synchronous approach is called I/O multiplexing). A naive version of this loop is sketched below.
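
Here is a minimal sketch of such a loop, assuming sockets is a list of already-accepted, non-blocking connections:

# busy-wait over a list of non-blocking sockets; burns CPU while idle
while sockets:
    for s in list(sockets):
        try:
            data = s.recv(1024)
        except BlockingIOError:
            continue  # this socket has nothing for us yet, ask the next one
        if data:
            print(data)
        else:
            sockets.remove(s)  # the peer closed the connection
            s.close()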

To get rid of this inefficient loop, we need a readiness polling mechanism, through which we could interrogate the readiness of all sockets at once: they would tell us which ones are ready for a new I/O operation and which are not, without being asked one by one. When any of the sockets are ready, we perform the queued operations and can then return to the blocking state, waiting for the sockets to become ready for the next I/O operation.

There are several readiness polling mechanisms. They differ in performance and detail, but usually the details are hidden "under the hood" and not visible to us.
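
In Python, for example, the standard selectors module hides this choice: DefaultSelector picks the best mechanism available on the current platform. A quick sketch:

import selectors

# DefaultSelector picks the best implementation for the platform:
# epoll on Linux, kqueue on BSD/macOS, with poll/select as fallbacks
sel = selectors.DefaultSelector()
print(type(sel).__name__)  # e.g. "EpollSelector" on Linux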

Keywords to search:

Notifications:

  • Level Triggering (state)
  • Edge Triggering (state changed)

Mechanics:

  • select(), poll()
  • epoll(), kqueue()
  • EAGAIN, EWOULDBLOCK

Multitasking

So, our goal is to manage multiple clients at once. How can we ensure multiple requests are processed at the same time?

There are several options:

Separate processes

The simplest and historically first approach is to handle each request in a separate process. This is good because we can use the same blocking I/O API. If a process suddenly crashes, it will only affect the operations processed in that particular process and not any others.

The downside is complex communication. Formally, there is almost nothing in common between the processes, and any non-trivial inter-process communication that we want to organize requires additional effort to synchronize access, etc. Also, at any moment there can be several processes that just wait for client requests, which is a waste of resources.

Let's see how this works in practice. Usually the first process (the main/master process) starts, then it spawns a set of worker processes, each of which can receive requests on the same socket and wait for incoming clients. As soon as an incoming connection appears, one of the processes receives it, handles it from beginning to end, closes the socket, and becomes ready for the next request. Variations are possible: a process can be spawned for each incoming connection, or they can all be started in advance, etc. This may affect performance, but it is not so important for us now. A minimal sketch of such a server follows.
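
Here is a minimal prefork sketch of our server. It assumes Linux or macOS, since it relies on os.fork, and the worker count of 4 is arbitrary:

import os
import socket


def worker(sock: socket.socket) -> None:
    # each worker blocks on accept() independently; the kernel hands
    # every incoming connection to exactly one of them
    while True:
        conn, addr = sock.accept()
        with conn:
            print(f"[pid {os.getpid()}] connected by {addr}")
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                print(data)


def main() -> None:
    host = socket.gethostname()
    port = 12345

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, port))
        sock.listen(5)
        print("Server started...")

        for _ in range(4):  # pre-fork 4 workers
            if os.fork() == 0:  # 0 means we are inside a child process
                worker(sock)

        os.wait()  # the master just waits for its workers

if __name__ == "__main__":
    main()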

Examples of such systems:

  • Apache (the prefork MPM);
  • FastCGI, for those who most often run PHP;
  • Phusion Passenger, for those who write in Ruby on Rails;
  • PostgreSQL.

Threads

Another approach is to use operating system (OS) threads. Within one process we can create several threads. Blocking I/O can also be used, because only one thread will be blocked.

Example:

import threading
import socket


def handler(client):
    with client:  # close the connection when the loop ends
        while True:
            data = client.recv(1024)  # blocks only this thread
            if not data:
                break
            print(data)

def main() -> None:
    host = socket.gethostname()
    port = 12345

    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # bind the socket to the port
        sock.bind((host, port))
        # listen for incoming connections
        sock.listen(5)
        print("Server started...")

        while True:
            client, addr = sock.accept()
            threading.Thread(target=handler, args=(client,)).start()

if __name__ == "__main__":
    main()

To check the number of threads in the server process, you can use the Linux ps command with the server process PID:

$ ps huH p <PID> | wc -l

The operating system manages the threads itself and is capable of distributing them between the available CPU cores. Threads are lighter than processes, which in essence means we can generate more threads than processes on the same system. We can hardly run 10,000 processes, but 10,000 threads can be easy. Not that it'll be efficient.

On the other hand, there is no isolation: any crash may bring down not just one particular thread but the whole process. And the biggest difficulty is that the memory of the process where the threads run is shared by all of them. We have a shared resource, memory, and that means access to it must be synchronized. And synchronizing access to shared memory is the simplest case; there can also be, for example, a connection to the database, or a pool of database connections, shared by all the threads of the application that handles incoming connections. It is difficult to synchronize access to third-party resources.

There are some problems:

  1. First, deadlocks are possible during synchronization. A deadlock occurs when a process or thread enters a waiting state because the requested resource is held by another waiting process, which in turn is waiting for a resource held by yet another waiting process (hope that makes sense).
  2. Lack of synchronization when we have concurrent access to shared data. Roughly speaking, two threads change the data at the same time and corrupt it. Such applications are harder to debug, and not all the errors show up at once. For instance, the well-known GIL, the Global Interpreter Lock, is one of the simplest ways to make a multithreaded interpreter: it declares that all the data structures, all our memory, are protected by just one semaphore for the entire process. The sketch after this list shows the kind of race this is about.
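
A minimal sketch of the shared-data problem (the counter and the thread count are made up for illustration). Note that even with the GIL, counter += 1 is not atomic: it is a load, an add, and a store, and threads can interleave between those steps, so updates get lost without a lock:

import threading

counter = 0
lock = threading.Lock()


def increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:  # remove the lock and the final count will likely be wrong
            counter += 1


def main() -> None:
    threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # 400000 with the lock; usually less without it

if __name__ == "__main__":
    main()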

In the next post, we will be talking about cooperative multitasking and its implementations.
