This is the second post in a series on asynchronous programming. The whole series explores a single question: What is asynchrony? When I first started digging into this, I thought I had a solid grasp of it. Turns out, I didn't know the first thing about asynchrony. So, let’s dive in together!
Whole series:
- Asynchronous Programming. Blocking I/O and non-blocking I/O
- Asynchronous Programming. Threads and Processes
- Asynchronous Programming. Cooperative multitasking
- Asynchronous Programming. Await the Future
- Asynchronous Programming. Python3.5+
Our goal in multitasking is to efficiently manage multiple client connections at once. Let’s explore some different approaches and their impacts on performance and resource utilization.
Separate Processes
The simplest (and historically, the first) approach is to handle each request in a separate process. This method works because we can use the familiar blocking I/O API, and if a process fails, only that process is affected, not the others.
The downside is complex communication. The processes share almost nothing, so any non-trivial communication between them requires extra effort to organize and to synchronize access to whatever they do share. Also, at any given moment several processes may be sitting idle, just waiting for client requests, which is a waste of resources.
How does this work in practice? The main (or master) process starts and spawns worker processes. Each worker can accept requests on the same listening socket and waits for incoming clients. When a new connection appears, one of the workers takes it, processes it, closes the connection, and waits for the next request. Variations are possible: a process can be spawned for each incoming connection, or they can all be started in advance, and so on. This may affect performance, but it is not so important for us now.
Here’s a simple Python example of a process-based server using the multiprocessing library to create a new process for each client connection:
import socket
from multiprocessing import Process

def handle_client(connection):
    """Function to handle client connections in separate processes."""
    with connection:
        print("Connected:", connection)
        while True:
            data = connection.recv(1024)
            if not data:
                break
            print("Received:", data.decode("utf-8"))
            connection.sendall(b"Echo: " + data)

def main():
    host = socket.gethostname()
    port = 12345
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, port))
        sock.listen(5)
        print("Server started...")
        while True:
            client_conn, client_addr = sock.accept()
            print(f"Connection from {client_addr}")
            # Create a new process for each client connection
            process = Process(target=handle_client, args=(client_conn,))
            process.start()
            client_conn.close()  # Close our copy in the main process; the child keeps its own

if __name__ == "__main__":
    main()
Examples of systems that use this approach:
- Apache with the prefork MPM (mpm_prefork)
- FastCGI, commonly used with PHP
- Phusion Passenger (Ruby on Rails)
- PostgreSQL
Threads
Another approach is to use Operating System (OS) threads, allowing multiple threads within a single process. Here, blocking I/O is manageable since only the thread performing I/O will be blocked.
Example:
import socket
import threading

def handler(client):
    """Handle a single client connection; runs in its own OS thread."""
    while True:
        data = client.recv(1024)
        if data:
            print(data)
        else:
            break
    client.close()

def main() -> None:
    host = socket.gethostname()
    port = 12345
    # create a TCP/IP socket
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # bind the socket to the port
        sock.bind((host, port))
        # listen for incoming connections
        sock.listen(5)
        print("Server started...")
        while True:
            client, addr = sock.accept()
            # handle each client in a separate thread
            threading.Thread(target=handler, args=(client,)).start()

if __name__ == "__main__":
    main()
To check the number of threads in the server process, you can use the Linux ps command with the server process's PID:
$ ps huH p <PID> | wc -l
The operating system manages the threads itself and can distribute them across the available CPU cores. Threads are also much cheaper than processes, so we can create far more of them on the same system: we can hardly run 10,000 processes, but 10,000 threads are easy enough. Not that it will be efficient.
On the other hand, there is no isolation: a crash in any thread can take down not just that thread but the whole process. And the biggest difficulty is that all threads share the memory of the process they run in, so access to it has to be coordinated.
Hybrid Models
Some architectures use a hybrid approach, combining processes and threads to balance isolation with performance. By creating multiple processes, each with its own set of threads, we get a blend of both models’ benefits: the isolation and fault tolerance of processes and the efficiency of threads.
In a hybrid model, the main process (often called a master process) spawns several worker processes. Each worker then manages a pool of threads to handle multiple requests concurrently. If a process fails, it only takes down the threads within that process, which the master can restart without affecting the rest of the system.
Hybrid models also suit CPU-intensive workloads: the processes can be spread across CPU cores, while the threads within each process handle smaller concurrent tasks, such as I/O.
For example:
- Apache worker MPM: uses a process-based model where each process handles a group of threads, combining the stability of process-based isolation with the performance benefits of threading.
- NGINX with epoll or kqueue: although primarily asynchronous, it also uses a worker-process model, balancing lightweight concurrency with robust process-level fault tolerance.
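To make the idea concrete, here is a minimal sketch of a hybrid server in Python. It assumes a Unix-like system where forked workers inherit the listening socket; the worker count and the echo handler are illustrative choices, not taken from any of the systems above. The master pre-forks a few worker processes, and each worker serves every accepted client in its own thread.

import socket
import threading
from multiprocessing import Process

NUM_WORKERS = 4  # illustrative: number of pre-forked worker processes

def handle_client(client):
    """Runs in a thread inside a worker process: echo data back to the client."""
    with client:
        while True:
            data = client.recv(1024)
            if not data:
                break
            client.sendall(data)

def worker(sock):
    """Worker process: accept on the shared listening socket, one thread per client."""
    while True:
        client, _addr = sock.accept()
        threading.Thread(target=handle_client, args=(client,)).start()

def main():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((socket.gethostname(), 12345))
        sock.listen(128)
        # The master process pre-forks the workers and then just waits for them.
        workers = [Process(target=worker, args=(sock,)) for _ in range(NUM_WORKERS)]
        for p in workers:
            p.start()
        for p in workers:
            p.join()

if __name__ == "__main__":
    main()

If a worker crashes, the other workers and the master keep running, which is exactly the isolation benefit described above; a real server would also restart dead workers.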
Preemptive Multitasking
A key concept in managing multiple threads or processes is preemptive multitasking. Preemptive multitasking is managed entirely by the operating system: the OS scheduler has the authority to interrupt tasks and switch between them to maximize CPU efficiency. This prevents any single task from monopolizing the CPU and improves responsiveness, especially when tasks may block or run for an unpredictable amount of time.
In a preemptive system, each task (or thread) is given a time slice, a short period during which it can run. Once this slice ends, the scheduler decides whether to let the task continue or switch to another task waiting in the queue. This makes for smoother multitasking: CPU time is shared among tasks automatically, according to their priority, and the OS manages them without the application having to be aware of it.
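A quick way to see preemption at work is this toy sketch: two CPU-bound threads that never sleep or yield voluntarily, yet both make progress because the scheduler keeps interrupting them and handing out time slices. (In CPython, the interpreter's own switch interval also shapes the result, but the principle is the same.)

import threading
import time

counters = {"thread-1": 0, "thread-2": 0}
stop = threading.Event()

def busy(name):
    # A CPU-bound loop that never sleeps or yields voluntarily.
    while not stop.is_set():
        counters[name] += 1

threads = [threading.Thread(target=busy, args=(name,)) for name in counters]
for t in threads:
    t.start()

time.sleep(1)   # let both threads compete for the CPU for a second
stop.set()
for t in threads:
    t.join()

# Both counters have advanced even though neither thread ever yielded.
print(counters)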
This leads to several advantages:
- Better Resource Utilization: The OS efficiently uses CPU time, ensuring that idle or blocked tasks don’t waste resources.
- Improved System Responsiveness: The system can interrupt a long-running task to attend to more critical tasks, improving overall responsiveness.
- Isolation of Tasks: Preemptive multitasking reduces the chance that a single misbehaving task can lock up the entire system since the OS can switch to other tasks or terminate problematic ones if needed.
Challenges with Preemptive Multitasking
However, preemptive multitasking introduces challenges, particularly around synchronization. Memory is a shared resource, so access to it has to be synchronized. And shared memory is actually the simplest case: there may also be, for example, a database connection, or a pool of database connections, shared by all the threads of an application that handles incoming connections. Synchronizing access to such third-party resources is even harder.
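As a rough sketch of what sharing such a resource takes, here is a toy thread-safe connection pool. The class and its acquire/release methods are illustrative names, not any real driver's API; the thread safety comes from queue.Queue, which does the locking internally.

import queue

class ConnectionPool:
    """A toy thread-safe pool: at most max_size connections are ever handed out."""

    def __init__(self, create_connection, max_size=5):
        self._idle = queue.Queue()
        for _ in range(max_size):
            self._idle.put(create_connection())

    def acquire(self):
        # Blocks if all connections are currently in use by other threads.
        return self._idle.get()

    def release(self, conn):
        # Returns the connection so another waiting thread can use it.
        self._idle.put(conn)

Every thread that needs the database goes through acquire() and release(), so access is serialized by the queue rather than by ad-hoc conventions scattered across the code.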
There are common synchronization problems:
Deadlocks
During synchronization, deadlocks can occur. A deadlock happens when a process or thread enters a waiting state because the resource it needs is held by another waiting process, which itself is waiting for another resource held by a different process. For instance:
- Process 1 needs Resource B, which is held by Process 2.
- Process 2, in turn, requires Resource A to complete, but it’s held by Process 1.
This circular dependency causes both processes to wait indefinitely, locking each other out. To address deadlocks, engineers often employ strategies like assigning a priority order to resources (avoiding circular dependencies), setting timeouts, or using deadlock detection algorithms to identify and resolve deadlocks before they escalate.
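The same scenario as a runnable sketch, with threads standing in for the two processes and locks standing in for the resources (the names are chosen to mirror the description above): run it and both threads hang forever. Acquiring the locks in the same fixed order in both functions is the simplest way to break the cycle.

import threading
import time

resource_a = threading.Lock()
resource_b = threading.Lock()

def process_1():
    with resource_a:          # holds Resource A
        time.sleep(0.1)
        with resource_b:      # waits for Resource B, held by process_2
            print("process_1 done")

def process_2():
    with resource_b:          # holds Resource B
        time.sleep(0.1)
        with resource_a:      # waits for Resource A, held by process_1
            print("process_2 done")

t1 = threading.Thread(target=process_1)
t2 = threading.Thread(target=process_2)
t1.start()
t2.start()
t1.join()   # never returns: each thread is stuck waiting for the other's lock
t2.join()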
Lack of Synchronization
Lack of synchronization happens when multiple threads access and modify shared data without appropriate controls, leading to race conditions and data inconsistency. For example, two threads might attempt to update a shared counter simultaneously, resulting in unpredictable values. This type of error can be hard to debug because not all issues appear immediately — some only manifest under specific conditions, making them elusive and often intermittent.
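Here is a small sketch of such a race on a shared counter. The time.sleep(0) call is only there to make the thread switch between the read and the write happen reliably; real code loses updates the same way, just less predictably.

import threading
import time

counter = 0

def increment_many(n):
    """Increment the shared counter n times without synchronization."""
    global counter
    for _ in range(n):
        current = counter       # read
        time.sleep(0)           # give the scheduler a chance to switch threads
        counter = current + 1   # write back, possibly overwriting another thread's update

threads = [threading.Thread(target=increment_many, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # expected 2000, but typically far less: updates get lost

Wrapping the read-modify-write in a threading.Lock makes the final value deterministic again, at the cost of serializing those updates.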
One blunt approach to synchronization is the Global Interpreter Lock (GIL) in CPython: a single lock that allows only one thread to execute Python bytecode at a time. While the GIL protects the interpreter's internal state from race conditions, it also rules out true parallelism on multi-core systems. This makes the GIL a double-edged sword: it simplifies thread safety inside the interpreter, but at the cost of scalability in CPU-bound programs.
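A classic way to observe this is to split the same CPU-bound work across two threads and compare timings; exact numbers depend on your machine and Python version, but on standard CPython the two-thread version usually takes about as long as the single-threaded one, or longer.

import threading
import time

def count_down(n):
    """Pure CPU-bound work: no I/O that would release the GIL."""
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
count_down(N)
print("one thread: ", time.perf_counter() - start)

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("two threads:", time.perf_counter() - start)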
In the next post, we’ll dive into cooperative multitasking and its implementations. Stay tuned!