At first glance, concurrency and parallelism might seem identical, but they are not; the confusion arises from overlooking the core distinction between them. Let’s break down what sets these concepts apart.
Concurrency
Concurrency occurs when multiple tasks are processed in overlapping time periods. Importantly, this doesn’t mean the tasks are always processed at the same time (although they can be). The key here is interruptibility: tasks can be divided into smaller, interleaved subtasks that don’t depend on each other’s completion order.
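To make interruptibility concrete, here is a minimal sketch in Python (the task names and step counts are made up for illustration): two tasks written as generators yield after every step, and a small loop alternates between them.

```python
# A minimal sketch of interleaving: each task yields control after every step,
# so a simple scheduling loop can switch between tasks at will.

def task(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield  # pause here so another task can run

def run_interleaved(*generators):
    pending = list(generators)
    while pending:
        current = pending.pop(0)
        try:
            next(current)           # run one step of the task
            pending.append(current) # re-queue it for its next turn
        except StopIteration:
            pass                    # task finished, drop it

run_interleaved(task("answer phones", 3), task("check calendar", 2))
```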
Concurrency can be implemented in many ways, depending on whether it is managed at the user or system level and whether it runs on a single-core or multi-core CPU.
User-Level Concurrency
User-level concurrency is managed within the application code itself, independently of the operating system. Here are a few common approaches:
- Green Threads: Green threads are user-space threads managed by a runtime library or virtual machine, such as Python’s `greenlet` or Java’s early green threads implementation. This approach allows applications to create and switch between multiple tasks without involving the OS, which is resource-efficient but limits true parallelism.
- Asynchronous Programming (Async I/O): In async programming, tasks yield control when waiting for external events (e.g., I/O operations), allowing other tasks to proceed. Frameworks like Python’s `asyncio` and JavaScript’s `async/await` facilitate overlapping tasks, even on single-core CPUs, by handling I/O-bound operations non-blockingly (see the sketch after this list).
- Coroutines: Similar to async programming, coroutines are functions that pause and resume execution. Languages like Python and Kotlin support coroutines, which are lightweight, making them ideal for high-frequency context switches, such as in web servers or real-time applications.
- Fibers: Fibers are cooperative, user-managed concurrency primitives that allow precise control over execution, making them well-suited for workflows needing specific yield points. Languages like Ruby and C++ (using Boost or `libfib`) offer fiber support.
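Here is a sketch of the async/coroutine style using Python’s standard-library `asyncio` (the file names and delays are invented, and real I/O is simulated with `asyncio.sleep`):

```python
import asyncio

# A sketch of async I/O with coroutines: each "download" yields control while
# it waits, so a single thread can overlap many I/O-bound tasks.

async def fetch(name, delay):
    print(f"start {name}")
    await asyncio.sleep(delay)   # stand-in for a non-blocking I/O wait
    print(f"done  {name}")
    return name

async def main():
    # All three coroutines wait concurrently, so total time is ~2s, not ~4.5s.
    results = await asyncio.gather(
        fetch("report.pdf", 2.0),
        fetch("logo.png", 1.5),
        fetch("data.csv", 1.0),
    )
    print(results)

asyncio.run(main())
```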
System-Level Concurrency
The operating system manages concurrency at the system level, allowing processes or threads to run independently and concurrently:
- Processes: Processes are isolated instances of a program, each with its own memory space. This makes them concurrency-safe but resource-intensive. Processes can run concurrently on multiple CPUs/cores, using Inter-Process Communication (IPC) mechanisms (e.g., pipes, shared memory, sockets) for interaction.
- Kernel Threads: Kernel threads are managed directly by the OS, allowing context switching between threads and true parallelism on multi-core systems. Java and C++ provide built-in support for kernel threads, which can run on different cores independently.
- Hybrid Approaches: Many programming environments combine user-level and system-level concurrency:
  - Thread Pools: Thread pools are reusable thread groups that handle multiple tasks, ideal for high-throughput or I/O-bound applications. They’re commonly available in languages like Java (`ExecutorService`) and Python (`concurrent.futures`); a sketch follows after this list.
  - Work-Stealing Schedulers: Work-stealing allows idle threads to "steal" tasks from busy threads, balancing load across CPU cores. This model, found in Go’s goroutines and Java’s `ForkJoinPool`, is particularly useful for high-performance or distributed applications.
  - Actor Model: The actor model treats "actors" (independent entities) as units of concurrency that communicate by passing messages. Languages and frameworks like Erlang and Akka use this model, which is helpful in distributed systems where task isolation is essential (a toy example follows after this list).
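A sketch of the thread-pool idea from the list above, using Python’s standard-library `concurrent.futures` (the job names and durations are made up, and `time.sleep` stands in for blocking I/O such as a network request):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# A sketch of a thread pool for I/O-bound work: a fixed group of worker
# threads serves many tasks without spawning a new thread per task.

def download(name, seconds):
    time.sleep(seconds)          # pretend this is a blocking download
    return f"{name} finished in {seconds}s"

jobs = [("report.pdf", 2), ("logo.png", 1), ("data.csv", 3), ("index.html", 1)]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(download, name, secs) for name, secs in jobs]
    for future in as_completed(futures):
        print(future.result())
```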
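And a toy, hand-rolled sketch of the actor idea in Python, where an actor owns a mailbox and a worker thread and reacts only to messages (this illustrates message passing, not how Erlang or Akka actually implement actors):

```python
import queue
import threading

# A toy actor: the only way to interact with it is by sending messages to its
# mailbox; no other code touches its internal state directly.

class PrinterActor:
    def __init__(self):
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        self.mailbox.put(message)

    def stop(self):
        self.mailbox.put(None)   # poison pill: ask the actor to shut down
        self._thread.join()

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:
                break
            print(f"actor received: {message}")

actor = PrinterActor()
actor.send("hello")
actor.send("world")
actor.stop()
```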
Hardware-Level Concurrency
Concurrency can also occur at the hardware level, leveraging multiple processors or specialized hardware resources to execute tasks in parallel:
- Simultaneous Multithreading (SMT): Processors supporting SMT (e.g., Intel’s Hyper-Threading) allow multiple threads to run on the same physical core by sharing resources. This setup maximizes processor efficiency, enabling two tasks to run concurrently.
- GPUs for Parallel Execution: GPUs provide massive parallelism with thousands of cores optimized for batch processing of similar tasks (e.g., matrix multiplications in ML). CUDA C/C++ and frameworks like TensorFlow and PyTorch take advantage of GPUs for high-throughput parallel processing (see the sketch after this list).
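A sketch of handing a batch of similar work to a GPU, assuming PyTorch is installed (it falls back to the CPU when no CUDA device is available; the matrix sizes are arbitrary):

```python
import torch

# A sketch of GPU parallelism: one matrix multiplication is dispatched across
# thousands of GPU cores at once. Assumes PyTorch; falls back to the CPU if
# no CUDA device is present.

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

c = a @ b   # a single call, executed in parallel on the selected device
print(device, c.shape)
```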
Let's draw an analogy: Consider a secretary who alternates between answering phone calls and checking appointments. They need to pause answering the phone to check the calendar, then return to answering calls, repeating this sequence throughout the day.
As you may have noticed, concurrency is mostly about logistics: deciding how to split work and when to switch between tasks. Without it, the secretary would handle tasks strictly one after another: wait for the appointment, deal with it, and only then pick up the ringing phone.
Parallelism
Parallelism, on the other hand, involves the simultaneous execution of tasks — literally doing things "in parallel". This type of execution requires at least two computational resources (e.g., cores or processors). Parallelism is often used as an implementation method within concurrent systems, utilizing threads or processes for actual simultaneous execution.
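A sketch of true parallelism in Python, splitting CPU-bound work across processes so it can run simultaneously on separate cores (the workload is an arbitrary sum of squares; the speedup depends on how many cores the machine has):

```python
from concurrent.futures import ProcessPoolExecutor

# A sketch of parallel execution: CPU-bound work is split across worker
# processes, each of which can run on its own core at the same time.

def heavy_sum(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":   # required for process spawning on some platforms
    inputs = [5_000_000] * 4
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(heavy_sum, inputs))
    print(results)
```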
Back to the office: Now we have two secretaries. One keeps an eye on the phone, and the other makes appointments. Since there are now two people to handle the workload, tasks can truly proceed in parallel.
Parallelism is a subset of concurrency. To run tasks concurrently, we first organize them. Once organized, they can be executed either in an interleaved manner (concurrency) or truly simultaneously (parallelism).
Additional materials
- Andrew Gerrand's post
- Rob Pike's talk
- Asynchronous programming by Kirill Bobrov
- Grokking Concurrency by Kirill Bobrov