Asynchronous programming. Cooperative multitasking

This is the second post in a series on asynchronous programming. The whole series tries to answer a simple question: "What is asynchrony?". When I first started digging into the question, I thought I knew what it was. It turned out that I didn't know the slightest thing about asynchrony. So let's find out!

In the previous post, we talked about how to process several requests concurrently and suggested that this could be implemented with threads or processes. However, there is one more option: cooperative multitasking (aka non-preemptive multitasking).

The operating system is definitely amazing: it has schedulers, it can manage processes and threads, switch between them, and so on. Unfortunately, it still doesn't know how our application works, so it will pause a thread or process at an arbitrary moment (probably not the best one), save its context, and switch to the next thread or process. This is preemptive multitasking. But we, the developers, know how our application works. We know that it has short periods when some computation is performed on the CPU, but that most of the time it waits for network I/O, so we know better when to switch between processing individual requests.

From the OS perspective, a cooperatively multitasked application is just one thread of execution, but inside it the application itself decides when to switch between processing individual requests/commands. Once some data arrives, the application reads it, parses the request, and, for example, sends a query to the database. Ordinarily this would be a blocking operation, but instead of waiting for the response from the database, the application can start processing another request. This is called "cooperation" because all tasks/commands have to cooperate to make the entire scheduling scheme work. They are interleaved with each other within a single thread of control by a cooperative scheduler, whose role is to start tasks and give them the opportunity to voluntarily hand control back.
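The scheduling scheme described above can be sketched with plain Python generators. This is a minimal illustration, not how any particular framework implements it; the names `handle_request` and `run` are made up for the example, and each `yield` marks a point where a task voluntarily hands control back.

```python
from collections import deque

def handle_request(name, steps, log):
    """A task: does a little work, then voluntarily gives up control."""
    for step in range(1, steps + 1):
        log.append(f"{name}: step {step}")
        yield  # checkpoint: hand control back to the scheduler

def run(tasks):
    """Round-robin cooperative scheduler running in a single thread."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)        # resume the task until its next yield
        except StopIteration:
            continue          # the task has finished; drop it
        ready.append(task)    # the task yielded; queue it to run again

log = []
run([handle_request("A", 2, log), handle_request("B", 3, log)])
print("\n".join(log))
```

The two tasks take turns only because each of them yields; a task that never yielded would starve the other one, which is exactly why the tasks have to cooperate.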

This is easier than preemptive multitasking because the developer always knows that while one task is being performed, the others are not. Although a multithreaded application on a single-processor system is also executed in an interleaved manner, a programmer using threads must still guard against race conditions so that the application doesn't break when it is moved to a multi-core system. A single-threaded asynchronous system, however, will always be executed in an interleaved fashion, even on a multi-core system.
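A small sketch of what that guarantee buys us, using Python's asyncio as one possible single-threaded runtime (the `worker` function and the numbers are arbitrary). Ten tasks update a shared counter without any lock, and the update is safe because control can only change hands at the `await`.

```python
import asyncio

counter = 0

async def worker(iterations):
    global counter
    for _ in range(iterations):
        current = counter        # read
        counter = current + 1    # write: no await in between, so no other task can run here
        await asyncio.sleep(0)   # the only point where control can switch to another task

async def main():
    await asyncio.gather(*(worker(1000) for _ in range(10)))
    return counter

total = asyncio.run(main())
print(total)  # 10000
```

If an `await` were placed between the read and the write, other tasks could run in that gap and updates would be lost, so the guarantee holds only between checkpoints.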

The complexity of writing such programs lies in the fact that the switching itself, maintaining the context, and organizing each task as a sequence of smaller steps that run with interruptions all fall on the shoulders of developers. On the other hand, we gain efficiency because there are no unnecessary context switches, such as saving and restoring the processor context when switching between threads and processes.

There are two ways to implement cooperative multitasking — callbacks and cooperative threads.

Callbacks

Every cooperative operation starts an action whose result will be available only sometime in the future, and our thread of execution should receive that result when it is ready. To get the result, we register callbacks: if the request/operation succeeds, one function is called; if it fails, another one is called. Callbacks are an explicit approach, i.e., the developer has to write the program knowing that there is no telling when the callback function will be called.
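Here is a rough sketch of that registration step. Everything in it is hypothetical: `query_database` stands in for a non-blocking call, `run_pending` stands in for the engine that completes operations later, and the "database" is just a dictionary.

```python
from collections import deque

pending = deque()  # operations that will complete sometime in the future

def query_database(key, on_success, on_error):
    """Hypothetical non-blocking call: registers the callbacks and returns at once."""
    pending.append((key, on_success, on_error))

def run_pending(fake_db):
    """Stand-in for the processing engine: completes operations and fires callbacks."""
    while pending:
        key, on_success, on_error = pending.popleft()
        if key in fake_db:
            on_success(fake_db[key])
        else:
            on_error(KeyError(key))

results = []
for user in ("alice", "bob"):
    query_database(
        user,
        lambda row: results.append(("ok", row)),            # called on success
        lambda exc: results.append(("error", repr(exc))),   # called on failure
    )
results.append(("registered", None))  # this line runs before any callback fires
run_pending({"alice": {"id": 1}})
print(results)
```

Note that the code after the two registrations runs first: the callbacks are invoked only later, when the engine gets around to the completed operations.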

It is the most widely used option because it is explicit and supported by most modern languages. There are also Futures or Promises, which are much the same thing internally but with a clearer API.
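To illustrate how a Future wraps the same idea, here is a short sketch with `asyncio.Future` from the Python standard library. The "database reply" is simulated with a timer, and the value `"row #1"` is invented for the example. The same future is consumed twice: once through a registered callback and once by awaiting it.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()   # placeholder for a result that isn't ready yet
    seen = []
    # Callback style: the function runs once the future has a result.
    future.add_done_callback(lambda f: seen.append(f.result()))
    # Simulate the database answering a little later.
    loop.call_later(0.01, future.set_result, "row #1")
    value = await future            # the same future, consumed like a return value
    await asyncio.sleep(0)          # give any still-pending callbacks a chance to run
    return value, seen

value, seen = asyncio.run(main())
print(value, seen)
```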

The pros and cons:

They differ from threaded programs and don't have their problems;
Callbacks swallow exceptions;
Callback-heavy code becomes confusing and difficult to debug.
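The point about swallowed exceptions can be seen in a small asyncio sketch (the callback and its error are made up). The `try`/`except` around the registration catches nothing, because the callback runs later, from inside the event loop; the exception ends up in the loop's exception handler instead of in the code that scheduled the callback.

```python
import asyncio

def failing_callback():
    raise ValueError("boom")

async def main():
    loop = asyncio.get_running_loop()
    reported = []
    # Collect what the loop reports instead of letting it only be logged.
    loop.set_exception_handler(lambda loop, context: reported.append(context["exception"]))
    caught = False
    try:
        loop.call_soon(failing_callback)  # only schedules the call; nothing is raised here
    except ValueError:
        caught = True
    await asyncio.sleep(0.01)             # let the loop run the callback
    return caught, reported

caught, reported = asyncio.run(main())
print(caught, reported)
```

Without a custom handler, asyncio merely logs such errors, so the program keeps going as if nothing had happened.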

Cooperative Threads

The second way is implicit: developers write the program in such a way that there seems to be no cooperative multitasking at all. There are different shades of this approach: user-level threads (aka green threads) or coroutines.

Using green threads, we can call a blocking operation as we have done before and use the result right away, as in ordinary synchronous code. But there is black magic "under the hood": a framework or the programming language itself makes the blocking operation non-blocking and transfers control to some other thread of execution, not an operating system thread but a logical one (a user-level thread). These threads are scheduled by a "normal" user process and not by the OS.

With coroutines, you write programs that contain "checkpoints" where a function can be paused and resumed. A coroutine can give up control by calling other coroutines, which may later return control to the point where the original coroutine was paused. Coroutines are very similar to threads. However, coroutines are cooperatively multitasked, while threads tend to be preemptively multitasked. Because control can change hands only at those checkpoints, there is usually no need for synchronization primitives such as mutexes and semaphores, and no need for support from the operating system.
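As an illustration, here is how such checkpoints look in Python's asyncio, where every `await` is a point at which the coroutine may be paused and resumed. The `handle` coroutine and its delays are invented for the example; `asyncio.sleep` stands in for waiting on a database reply.

```python
import asyncio

async def handle(name, delay, log):
    log.append(f"{name}: sent query")
    await asyncio.sleep(delay)        # checkpoint: paused here while "the database" works
    log.append(f"{name}: got reply")  # resumed here once the wait is over
    return name

async def main():
    log = []
    finished = await asyncio.gather(
        handle("A", 0.05, log),
        handle("B", 0.01, log),
    )
    return finished, log

finished, log = asyncio.run(main())
print("\n".join(log))
```

The code reads top to bottom like synchronous code, yet request B gets its reply while request A is still waiting.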

The pros and cons:

They are controlled at the user-space level, not the OS;
They feel like synchronous programming;
They include all the problems of normal threaded programming except CPU context switching.

Reactor/Proactor patterns

Within cooperative multitasking, there is always a processing engine that is responsible for all I/O processing. It is called the Reactor, after the name of the design pattern. The reactor interface says: "Give me a bunch of your sockets and your callbacks, and when a socket is ready for I/O, I will call the matching callback function." A reactor's job is to react to I/O events by delegating all the processing to the appropriate handler (worker). The handlers perform the processing, so there is no need to block on I/O as long as handlers or callbacks are registered to take care of the events.
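A bare-bones sketch of that interface, built on the standard `selectors` module. The `Reactor` class and its method names are assumptions made for the example, and a connected socket pair stands in for a real network client.

```python
import selectors
import socket

class Reactor:
    """Minimal reactor: maps sockets to callbacks and dispatches readiness events."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def register(self, sock, callback):
        # "Give me your socket and your callback."
        self._selector.register(sock, selectors.EVENT_READ, callback)

    def unregister(self, sock):
        self._selector.unregister(sock)

    def run_once(self, timeout=1.0):
        # Wait until some socket is ready, then call the matching callback.
        for key, _events in self._selector.select(timeout):
            key.data(key.fileobj)

    def close(self):
        self._selector.close()

received = []
reactor = Reactor()
client, server = socket.socketpair()   # stands in for a real client connection
server.setblocking(False)

def on_readable(sock):
    received.append(sock.recv(1024))   # the socket is ready, so this does not block
    reactor.unregister(sock)

reactor.register(server, on_readable)
client.sendall(b"ping")
reactor.run_once()
print(received)  # [b'ping']

client.close()
server.close()
reactor.close()
```

A real reactor would call `run_once` in a loop and would also watch for write readiness, timers, and closed connections.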

The purpose of the Reactor design pattern is to avoid the common problem of creating a thread for each message, request, and connection. It receives events from multiple sources and sequentially distributes them to the corresponding event handlers. In principle, the standard Reactor lets an application handle simultaneous events while keeping the simplicity of single-threaded processing. It usually uses non-blocking synchronous I/O (check out multiplexing in the I/O models). What is more interesting is the Proactor pattern, an asynchronous version of the Reactor pattern. It usually uses true asynchronous I/O operations provided by the OS (check out AIO in the I/O models).
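The difference in interface can be sketched with asyncio's `loop.sock_recv`. Instead of asking "tell me when the socket is readable", the application asks for the read itself and is handed the finished data. This only illustrates the completion-style interface: on Windows the default asyncio event loop is proactor-based, while on Unix the same call is emulated on top of a selector, so it is not true OS-level asynchronous I/O there.

```python
import asyncio
import socket

async def main():
    loop = asyncio.get_running_loop()
    client, server = socket.socketpair()   # stands in for a real connection
    client.setblocking(False)
    server.setblocking(False)
    try:
        # Start the read operation up front; completion delivers the data itself.
        read = asyncio.ensure_future(loop.sock_recv(server, 1024))
        await loop.sock_sendall(client, b"ping")
        return await read
    finally:
        client.close()
        server.close()

data = asyncio.run(main())
print(data)  # b'ping'
```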

But the Proactor approach has limitations.

Firstly, the pattern restricts the types of operations that you can perform to those the platform supports asynchronously. Reactors, on the other hand, can handle any event type. Secondly, there are buffer space limitations: a buffer has to be allocated for each asynchronous operation for the whole duration of the I/O, which can last basically forever. These two patterns underlie the nginx HTTP server, Node.js via libuv, Twisted Python, and the asyncio library in Python.

Best approach

But none of the options is perfect on its own. A combination works best. Cooperative multitasking usually wins, especially if your connections stay open for a long time. For example, a WebSocket is a long-lived connection. If you allocate a whole process or thread to each WebSocket, you significantly limit the number of connections a single backend server can handle at a time. And because each connection lasts a long time but has little work to do, it is important to be able to keep many connections open simultaneously.

The problem with cooperative multitasking is that it can only use one processor core. Of course, you can run multiple instances of the application on the same machine, although this is not always convenient and has its drawbacks. Therefore, it is a good idea to run multiple processes, each with a reactor/proactor, and use cooperative multitasking within each process.

This combination, on the one hand, uses all the processor cores available in the system and, on the other hand, works efficiently inside each core without allocating a lot of resources to handle each individual connection.
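One possible shape of this combination, sketched with `multiprocessing` and asyncio. All names are hypothetical, connections are simulated by integer ids, and `asyncio.sleep` stands in for network I/O. A real server would more likely have the processes share a listening socket than split a fixed list of connections.

```python
import asyncio
import multiprocessing
import os

async def handle_connection(conn_id):
    await asyncio.sleep(0.01)   # stands in for waiting on network I/O
    return conn_id

async def serve(connection_ids):
    # Cooperative multitasking inside one process: many connections, one thread.
    return await asyncio.gather(*(handle_connection(c) for c in connection_ids))

def worker(connection_ids):
    """Entry point of one OS process: runs its own single-threaded event loop."""
    return os.getpid(), asyncio.run(serve(connection_ids))

def main(total_connections=100):
    cores = multiprocessing.cpu_count()
    # Split the simulated connections into one shard per core.
    shards = [list(range(i, total_connections, cores)) for i in range(cores)]
    with multiprocessing.Pool(processes=cores) as pool:
        return pool.map(worker, shards)   # one event loop per process
```

To try it, call `main()` from under an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that start worker processes by spawning a fresh interpreter.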

Conclusion

The difficulty in writing applications that use cooperative multitasking is that the switching process and the maintenance of context fall on the shoulders of us poor developers. On the other hand, this approach gains efficiency by avoiding unnecessary context switches.

A more interesting solution comes from combining cooperative multitasking with Reactor/Proactor patterns.

In the next post, we will talk about asynchronous programming itself and how it differs from synchronous programming, revisiting old concepts at a new level and with new terms.

Additional materials

Grokking Concurrency by Kirill Bobrov