This is the second post in a series on asynchronous programming. The whole series explores a single question: What is asynchrony? When I first started digging into this, I thought I had a solid grasp of it. Turns out, I didn't know the first thing about asynchrony. So, let’s dive in together!
Whole series:
- Asynchronous Programming. Blocking I/O and non-blocking I/O
- Asynchronous Programming. Threads and Processes
- Asynchronous Programming. Cooperative multitasking
- Asynchronous Programming. Await the Future
- Asynchronous Programming. Python3.5+
Our goal in multitasking is to efficiently manage multiple client connections at once. Let’s explore some different approaches and their impacts on performance and resource utilization.
Separate Processes
The simplest (and historically, the first) approach is to handle each request in a separate process. This method works because we can use the familiar blocking I/O API, and if a process fails, only that process is affected, not the others.
The downside is complex communication. The processes share almost nothing, so any non-trivial communication between them requires extra effort to organize and to synchronize access to whatever they do share. Also, at any given moment several processes may be sitting idle, just waiting for client requests, which is a waste of resources.
How does this work in practice? The main (or master) process starts and spawns worker processes. Each worker can accept requests on the same listening socket and waits for incoming clients. When a new connection appears, one of the workers takes it, processes it, closes the connection, and waits for the next request. Variations are possible: a process can be spawned for each incoming connection, or they can all be started in advance, and so on. This may affect performance, but it is not so important for us now.
Here’s a simple Python example of a process-based server using the multiprocessing library to create a new process for each client connection:
import socket
from multiprocessing import Process

def handle_client(connection):
    """Function to handle client connections in separate processes."""
    with connection:
        print("Connected:", connection)
        while True:
            data = connection.recv(1024)
            if not data:
                break
            print("Received:", data.decode("utf-8"))
            connection.sendall(b"Echo: " + data)

def main():
    host = socket.gethostname()
    port = 12345
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, port))
        sock.listen(5)
        print("Server started...")
        while True:
            client_conn, client_addr = sock.accept()
            print(f"Connection from {client_addr}")
            # Create a new process for each client connection
            process = Process(target=handle_client, args=(client_conn,))
            process.start()
            client_conn.close()  # Close our copy in the main process; the child keeps its own

if __name__ == "__main__":
    main()
Examples of systems that use this approach:
- Apache with the prefork MPM (mpm_prefork)
- FastCGI, commonly used with PHP
- Phusion Passenger (Ruby on Rails)
- PostgreSQL
Threads
Another approach is to use Operating System (OS) threads, allowing multiple threads within a single process. Here, blocking I/O is manageable since only the thread performing I/O will be blocked.
Example:
import socket
import threading

def handler(client):
    """Handle a single client connection; runs in its own OS thread."""
    while True:
        data = client.recv(1024)
        if data:
            print(data)
        else:
            break
    client.close()

def main() -> None:
    host = socket.gethostname()
    port = 12345
    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # bind the socket to the port
        sock.bind((host, port))
        # listen for incoming connections
        sock.listen(5)
        print("Server started...")
        while True:
            client, addr = sock.accept()
            # handle each client in a separate thread
            threading.Thread(target=handler, args=(client,)).start()

if __name__ == "__main__":
    main()
To check the number of threads in the server process, you can use the Linux ps command with the server process's PID:
$ ps huH p <PID> | wc -l
The operating system manages the threads itself and can distribute them across the available CPU cores. Threads are also much cheaper than processes, so we can create far more of them on the same system: we can hardly run 10,000 processes, but 10,000 threads are easy enough. Not that it will be efficient.
On the other hand, there is no isolation: a crash in any thread can take down not just that thread but the whole process. And the biggest difficulty is that all threads share the memory of the process they run in, so access to it has to be coordinated.
Hybrid Models
Some architectures use a hybrid approach, combining processes and threads to balance isolation with performance. By creating multiple processes, each with its own set of threads, we get a blend of both models’ benefits: the isolation and fault tolerance of processes and the efficiency of threads.
In a hybrid model, the main process (often called a master process) spawns several worker processes. Each worker then manages a pool of threads to handle multiple requests concurrently. If a process fails, it only takes down the threads within that process, which the master can restart without affecting the rest of the system.
Hybrid models also suit CPU-intensive workloads: the processes can be spread across CPU cores, while the threads within each process handle smaller concurrent tasks, such as I/O.
For example:
- Apache worker MPM: uses a process-based model where each process handles a group of threads, combining the stability of process-based isolation with the performance benefits of threading.
- NGINX with epoll or kqueue: although primarily asynchronous, it also uses a worker-process model, balancing lightweight concurrency with robust process-level fault tolerance.
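To make the idea concrete, here is a minimal sketch of a hybrid server in Python. It assumes a Unix-like system where forked workers inherit the listening socket; the worker count and the echo handler are illustrative choices, not taken from any of the systems above. The master pre-forks a few worker processes, and each worker serves every accepted client in its own thread.

import socket
import threading
from multiprocessing import Process

NUM_WORKERS = 4  # illustrative: number of pre-forked worker processes

def handle_client(client):
    """Runs in a thread inside a worker process: echo data back to the client."""
    with client:
        while True:
            data = client.recv(1024)
            if not data:
                break
            client.sendall(data)

def worker(sock):
    """Worker process: accept on the shared listening socket, one thread per client."""
    while True:
        client, _addr = sock.accept()
        threading.Thread(target=handle_client, args=(client,)).start()

def main():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((socket.gethostname(), 12345))
        sock.listen(128)
        # The master process pre-forks the workers and then just waits for them.
        workers = [Process(target=worker, args=(sock,)) for _ in range(NUM_WORKERS)]
        for p in workers:
            p.start()
        for p in workers:
            p.join()

if __name__ == "__main__":
    main()

If a worker crashes, the other workers and the master keep running, which is exactly the isolation benefit described above; a real server would also restart dead workers.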
Preemptive Multitasking
A key concept in managing multiple threads or processes is preemptive multitasking. Preemptive multitasking is managed entirely by the operating system: the OS scheduler has the authority to interrupt tasks and switch between them to maximize CPU efficiency. This prevents any single task from monopolizing the CPU and improves responsiveness, especially when tasks may block or run for an unpredictable amount of time.
In a preemptive system, each task (or thread) is given a time slice, a short period during which it can run. Once this slice ends, the scheduler decides whether to let the task continue or switch to another task waiting in the queue. This makes for smoother multitasking: CPU time is shared among tasks automatically, according to their priority, and the OS manages them without the application having to be aware of it.
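A quick way to see preemption at work is this toy sketch: two CPU-bound threads that never sleep or yield voluntarily, yet both make progress because the scheduler keeps interrupting them and handing out time slices. (In CPython, the interpreter's own switch interval also shapes the result, but the principle is the same.)

import threading
import time

counters = {"thread-1": 0, "thread-2": 0}
stop = threading.Event()

def busy(name):
    # A CPU-bound loop that never sleeps or yields voluntarily.
    while not stop.is_set():
        counters[name] += 1

threads = [threading.Thread(target=busy, args=(name,)) for name in counters]
for t in threads:
    t.start()

time.sleep(1)   # let both threads compete for the CPU for a second
stop.set()
for t in threads:
    t.join()

# Both counters have advanced even though neither thread ever yielded.
print(counters)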
This leads to several advantages:
- Better Resource Utilization: The OS efficiently uses CPU time, ensuring that idle or blocked tasks don’t waste resources.
- Improved System Responsiveness: The system can interrupt a long-running task to attend to more critical tasks, improving overall responsiveness.
- Isolation of Tasks: Preemptive multitasking reduces the chance that a single misbehaving task can lock up the entire system since the OS can switch to other tasks or terminate problematic ones if needed.
Challenges with Preemptive Multitasking
However, preemptive multitasking introduces challenges, particularly around synchronization. Memory is a shared resource, so access to it has to be synchronized. And shared memory is actually the simplest case: there may also be, for example, a database connection, or a pool of database connections, shared by all the threads of an application that handles incoming connections. Synchronizing access to such third-party resources is even harder.
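As a rough sketch of what sharing such a resource takes, here is a toy thread-safe connection pool. The class and its acquire/release methods are illustrative names, not any real driver's API; the thread safety comes from queue.Queue, which does the locking internally.

import queue

class ConnectionPool:
    """A toy thread-safe pool: at most max_size connections are ever handed out."""

    def __init__(self, create_connection, max_size=5):
        self._idle = queue.Queue()
        for _ in range(max_size):
            self._idle.put(create_connection())

    def acquire(self):
        # Blocks if all connections are currently in use by other threads.
        return self._idle.get()

    def release(self, conn):
        # Returns the connection so another waiting thread can use it.
        self._idle.put(conn)

Every thread that needs the database goes through acquire() and release(), so access is serialized by the queue rather than by ad-hoc conventions scattered across the code.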
There are common synchronization problems:
Deadlocks
During synchronization, deadlocks can occur. A deadlock happens when a process or thread enters a waiting state because the resource it needs is held by another waiting process, which itself is waiting for another resource held by a different process. For instance:
- Process 1 needs Resource B, which is held by Process 2.
- Process 2, in turn, requires Resource A to complete, but it’s held by Process 1.
This circular dependency causes both processes to wait indefinitely, locking each other out. To address deadlocks, engineers often employ strategies like assigning a priority order to resources (avoiding circular dependencies), setting timeouts, or using deadlock detection algorithms to identify and resolve deadlocks before they escalate.
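The same scenario as a runnable sketch, with threads standing in for the two processes and locks standing in for the resources (the names are chosen to mirror the description above): run it and both threads hang forever. Acquiring the locks in the same fixed order in both functions is the simplest way to break the cycle.

import threading
import time

resource_a = threading.Lock()
resource_b = threading.Lock()

def process_1():
    with resource_a:          # holds Resource A
        time.sleep(0.1)
        with resource_b:      # waits for Resource B, held by process_2
            print("process_1 done")

def process_2():
    with resource_b:          # holds Resource B
        time.sleep(0.1)
        with resource_a:      # waits for Resource A, held by process_1
            print("process_2 done")

t1 = threading.Thread(target=process_1)
t2 = threading.Thread(target=process_2)
t1.start()
t2.start()
t1.join()   # never returns: each thread is stuck waiting for the other's lock
t2.join()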
Lack of Synchronization
Lack of synchronization happens when multiple threads access and modify shared data without appropriate controls, leading to race conditions and data inconsistency. For example, two threads might attempt to update a shared counter simultaneously, resulting in unpredictable values. This type of error can be hard to debug because not all issues appear immediately — some only manifest under specific conditions, making them elusive and often intermittent.
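Here is a small sketch of such a race on a shared counter. The time.sleep(0) call is only there to make the thread switch between the read and the write happen reliably; real code loses updates the same way, just less predictably.

import threading
import time

counter = 0

def increment_many(n):
    """Increment the shared counter n times without synchronization."""
    global counter
    for _ in range(n):
        current = counter       # read
        time.sleep(0)           # give the scheduler a chance to switch threads
        counter = current + 1   # write back, possibly overwriting another thread's update

threads = [threading.Thread(target=increment_many, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # expected 2000, but typically far less: updates get lost

Wrapping the read-modify-write in a threading.Lock makes the final value deterministic again, at the cost of serializing those updates.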
One blunt approach to synchronization is the Global Interpreter Lock (GIL) in CPython: a single lock that allows only one thread to execute Python bytecode at a time. While the GIL protects the interpreter's internal state from race conditions, it also rules out true parallelism on multi-core systems. This makes the GIL a double-edged sword: it simplifies thread safety inside the interpreter, but at the cost of scalability in CPU-bound programs.
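A classic way to observe this is to split the same CPU-bound work across two threads and compare timings; exact numbers depend on your machine and Python version, but on standard CPython the two-thread version usually takes about as long as the single-threaded one, or longer.

import threading
import time

def count_down(n):
    """Pure CPU-bound work: no I/O that would release the GIL."""
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
count_down(N)
print("one thread: ", time.perf_counter() - start)

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("two threads:", time.perf_counter() - start)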
In the next post, we’ll dive into cooperative multitasking and its implementations. Stay tuned!