OS Design: Multithread support

DmFrPro
6 min read · Feb 25, 2021

In this article I will tell you about multithreading support in modern operating systems. Scientists were already discussing the problem of slow single-threaded systems back in the 70s, but when implementing multithreaded systems, developers faced a number of problems.

Windows, Linux and Mac OS

I strongly recommend reading my article OS Design: Threads and dispatcher before reading this one.

Let’s start with basic terms you must know before reading this article:

  1. Atomic operation is the smallest indivisible operation in the system. It must be performed without suspension
  2. Critical section is a block of the program (for example, a block of code or instructions) that works with a resource that can’t support multiple simultaneous operations (for example, writing something to a file located on an HDD)
  3. Mutual exclusion (Mutex) is a multithread synchronization primitive. It works like a “lavatory rule”: only one thread at a time can enter the critical section, and other threads must wait until the first thread completes its task
  4. Race condition is a program state in which the result depends on the order in which the threads are executed. It’s a big problem, because we can’t foresee which thread will run before the others; it seems the threads run randomly
  5. Resource starvation is a situation when a thread can’t run and may get stuck in the queue of waiting threads due to a shortage of OS resources (for example, the OS must run a large number of critically important threads with a small amount of RAM, and lower-priority threads may be stuck forever waiting for their turn)
  6. Deadlock is a situation when the first thread waits for the result of a second thread, while the second thread waits for the result of the first. Both threads then get stuck forever
  7. Livelock is a situation similar to deadlock, but the threads don’t get stuck: they keep performing some tasks without making progress
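To make the atomic-operation and race-condition terms concrete, here is a small sketch in Java (the language is my choice for illustration, not from the article): two threads increment a shared counter 100,000 times each. A plain `int` increment is a read-modify-write sequence, not an atomic operation, so updates can be lost; `AtomicInteger` makes each increment atomic.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RaceDemo {
    static int plain = 0;                                // unprotected counter
    static AtomicInteger atomic = new AtomicInteger(0);  // atomic counter

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                plain++;                  // read-modify-write: NOT atomic, updates may be lost
                atomic.incrementAndGet(); // atomic operation: never loses an update
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // atomic is always 200000; plain is usually less because of the race
        System.out.println("atomic = " + atomic.get() + ", plain = " + plain);
    }
}
```

Run it a few times: the atomic counter is always 200000, while the plain counter varies from run to run, which is exactly the unpredictability described in term 4.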

Switcher vs SMP

Let’s start by explaining the two models of a multithreaded system.

  1. Switcher is a technique used on single-core CPUs: it switches between the threads very quickly, and we get a simulation of multithreading.
  2. SMP (Symmetric MultiProcessing) is a technology used on multi-core CPUs or with multiple CPUs. It implements truly parallel thread execution on multiple cores and synchronizes the threads through shared RAM.

Look at the dispatcher diagram from my previous article:

How dispatcher works (also thread’s life cycle)

We have a queue of ready threads. This diagram corresponds to Switcher, because there is only one queue of threads ready for execution: the dispatcher quickly takes a thread, executes it for a while, and sends it back to the queue. The result is a simulation of multithreading.

SMP gives us multiple queues. The diagram below shows a part of the dispatcher’s logic, but with SMP:

SMP dispatcher

So, an SMP dispatcher on a 3-core CPU uses 3 queues, which gives us true 3-way parallel execution of threads.

The responsibilities of the OS are:

  1. to manage the execution and performance of the threads
  2. to protect threads from other threads
  3. to control CPU and memory resources and grant them to threads
  4. to minimize race conditions, deadlocks and livelocks

Kernel-level and User-level threads

In the modern world there are two types of threads:

  1. Kernel-level threads
  2. User-level threads

Let’s look at the diagram below:

Difference between kernel-level and user-level threads

In the first situation, threads are controlled entirely by the kernel.

In the second one, threads are controlled not by the kernel but by a concurrency library that provides mechanisms for working with threads.

Kernel-level threads advantages and disadvantages:

  1. Threads are safely controlled by the kernel
  2. The performance is quite slow because of the large number of switches between kernel mode and user mode
  3. This type requires kernel modifications

User-level threads advantages and disadvantages:

  1. Threads can be executed in parallel even on systems without multithreading support; this feature is implemented by the concurrency library
  2. This type doesn’t require kernel modifications
  3. User-level threads have a simple interface and often perform faster than kernel-level threads
  4. But user-level threads aren’t always executed as fast as kernel-level threads: every thread operation has to go through the concurrency library

Concurrent library mechanisms

1. Lock

Lock is a mechanism which implements mutual exclusion.

Each lock has a mutex flag that shows the lock status: the lock may be held or not.

So, Lock mechanism has two atomic operations — lock() and unlock().

Let’s look at this pseudocode that shows us Lock usage:

var logger = getLogger();
var lock = getLock();

lock.lock();
logger.write_log("[W] the resource is blocked");
lock.unlock();

In this example, logger.write_log(“[W] the resource is blocked”) is the critical section, which we must protect from other threads to prevent a race condition. lock.lock() is the lock() operation, and lock.unlock() is the unlock() operation.

Why must lock() and unlock() be atomic? Because any thread may be suspended by the dispatcher at any moment, and if that happens to a thread in the middle of a lock() or unlock() operation, the operation might not complete, and we get a race condition.
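As a sketch of how this looks in a real language (Java here, my choice for illustration), java.util.concurrent.locks.ReentrantLock provides exactly these lock()/unlock() operations. The idiomatic pattern puts unlock() in a finally block, so the lock is released even if the critical section throws:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockDemo {
    static final ReentrantLock lock = new ReentrantLock();
    static int shared = 0; // resource that must not be updated concurrently

    static void increment() {
        lock.lock();       // enter the critical section; other threads block here
        try {
            shared++;      // critical section: only one thread at a time
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> { for (int i = 0; i < 100_000; i++) increment(); };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("shared = " + shared); // 200000: no lost updates
    }
}
```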

2. Semaphore

Semaphore is a mechanism that allows a fixed number of threads to access the critical section at the same time.

It has three atomic operations: init(), wait() and signal(). init() is an operation that initializes the semaphore instance with a fixed number of permitted threads given as an integer value.

Let’s look at this pseudocode that shows us Semaphore usage:

var logger = getLogger();
var semaphore = getSemaphore(3);

semaphore.wait();
logger.write_log("[W] the resource is blocked");
semaphore.signal();

In this example, logger.write_log(“[W] the resource is blocked”) is the critical section, which we must protect from other threads to prevent a race condition. getSemaphore(3) is the init() operation, semaphore.wait() is the wait() operation, and semaphore.signal() is the signal() operation.

How does semaphore work? Semaphore has an inner thread counter.

  1. After the init() operation, the counter equals the given value. In the example above it is equal to 3.
  2. When a thread enters the critical section, the counter is decreased by 1.
  3. When the counter equals 0, other threads are blocked by the Lock mechanism.
  4. When an active thread leaves the critical section, the counter is increased by 1.
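The four steps above map directly onto java.util.concurrent.Semaphore (again, Java is my choice of language for illustration; the class names the operations acquire()/release() instead of wait()/signal()). This sketch starts 10 threads against a semaphore with 3 permits and records the peak number of threads inside the critical section:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class SemaphoreDemo {
    static final Semaphore semaphore = new Semaphore(3);      // init(): counter starts at 3
    static final AtomicInteger inside = new AtomicInteger(0); // threads currently inside
    static volatile int maxInside = 0;                        // peak occupancy observed

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                try {
                    semaphore.acquire();   // wait(): decrease the counter, block at 0
                } catch (InterruptedException e) {
                    return;                // interrupted before entering: nothing to release
                }
                try {
                    int now = inside.incrementAndGet();
                    synchronized (SemaphoreDemo.class) {   // record the peak occupancy
                        if (now > maxInside) maxInside = now;
                    }
                    Thread.sleep(10);      // simulate work in the critical section
                    inside.decrementAndGet();
                } catch (InterruptedException ignored) {
                } finally {
                    semaphore.release();   // signal(): increase the counter
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("max threads inside = " + maxInside); // never more than 3
    }
}
```

However the 10 threads are scheduled, maxInside never exceeds 3: the counter reaching 0 blocks everyone else, exactly as in step 3.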

If it still isn’t clear, just look at this gif (T is a thread, Permits is the counter, and the critical section is the room):

Source: https://habr.com/en/post/277669/

3. Monitor

Monitor is like a semaphore that supports only one active thread.

It has two atomic operations: wait() and signal(). An init() operation isn’t needed, because a monitor supports only one active thread.

Let’s look at this pseudocode that shows us Monitor usage:

var logger = getLogger();
var monitor = getMonitor();

monitor.wait();
logger.write_log("[W] the resource is blocked");
monitor.signal();

In this example, logger.write_log(“[W] the resource is blocked”) is the critical section, which we must protect from other threads to prevent a race condition. monitor.wait() is the wait() operation, and monitor.signal() is the signal() operation.
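Java objects are, in fact, built-in monitors (once more, the language is my choice for illustration): a synchronized block gives the monitor's mutual exclusion, and wait()/notify() let a thread sleep inside the monitor until a condition holds. A minimal sketch: one thread waits inside the monitor until another thread sets a flag and signals it.

```java
public class MonitorDemo {
    static final Object monitor = new Object(); // every Java object can act as a monitor
    static boolean ready = false;
    static volatile int observed = -1;

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {        // enter the monitor (mutual exclusion)
                while (!ready) {            // guard against spurious wake-ups
                    try {
                        monitor.wait();     // wait(): release the monitor and sleep
                    } catch (InterruptedException ignored) {}
                }
                observed = 42;              // condition holds; do the work
            }
        });
        waiter.start();
        Thread.sleep(50);                   // give the waiter time to block first
        synchronized (monitor) {
            ready = true;
            monitor.notify();               // signal(): wake one waiting thread
        }
        waiter.join();
        System.out.println("observed = " + observed);
    }
}
```

Note the while loop around wait(): a woken thread must re-check the condition, because the wake-up alone doesn't guarantee the condition is still true.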

In this article I focus on the basics: in a real OS, every atomic operation is a complex operation, and every mechanism has a lot of conditions and optimizations.

Deadlock and Livelock examples

Let’s check again the terms — what is deadlock and livelock:

  1. Deadlock is a situation when the first thread waits for the result of a second thread, while the second thread waits for the result of the first. Both threads then get stuck forever
  2. Livelock is a situation similar to deadlock, but the threads don’t get stuck: they keep performing some tasks without making progress

By the diagram below, you can easily understand the deadlock and livelock models:

Deadlock vs Livelock

A deadlock or a livelock is a very bad situation that is hard to find and debug. In most cases, deadlocks and livelocks are prevented by the right program architecture built with the concurrency tools of the programming language.
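One classic example of such "right architecture" is a global lock ordering: if every thread acquires the locks in the same order, the circular wait that produces a deadlock can never form. A sketch in Java (illustrative; the names are mine): two threads each need both lockA and lockB, and both take lockA first.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDemo {
    static final ReentrantLock lockA = new ReentrantLock();
    static final ReentrantLock lockB = new ReentrantLock();
    static int done = 0;

    // Both threads acquire the locks in the same global order: A, then B.
    // If one thread took B first while the other held A, each could wait
    // on the other's lock forever, which is exactly a deadlock.
    static void doWork() {
        lockA.lock();
        try {
            lockB.lock();
            try {
                done++; // critical section guarded by both locks
            } finally {
                lockB.unlock();
            }
        } finally {
            lockA.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(LockOrderDemo::doWork);
        Thread t2 = new Thread(LockOrderDemo::doWork);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("done = " + done); // both threads finished: no deadlock
    }
}
```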
