In a previous article I described the basic concepts of computer virtualization and how the MIPS architecture efficiently implements hardware virtualization. This article focuses on multithreading: what it is, and why it's useful.
A thread is a sequence of instructions. Multithreading refers to the ability of a given processor (e.g. a CPU or GPU) to run multiple threads. Without going too much into the specifics of what constitutes a thread, you can think of threads as tasks or jobs.
A program can spawn one or more threads. In today's computing landscape, threads will invariably compete concurrently for hardware resources. But what happens in a resource-limited environment? If you have two threads running on a multicore processor, the first instinct would be to distribute them equally, one per core.
But that seems like a brute-force approach. Could we do better? The answer is yes, and it leads to a follow-up question: does your multicore processor also support hardware multithreading?
Hardware techniques used to support multithreading are often confused with the software mechanisms used for multitasking programs on multicore processors. There is, however, a significant difference between the two; the diagram below explains how hardware and software multithreading differ:
From here onward, we’ll use the term multithreading to refer to the hardware multithreading as supported by the MIPS architecture through the MT extensions (MT ASE) and MIPS MT module.
Embedded processors are hitting a performance wall. For the past decade, SoC designers have relied increasingly on multicore designs to provide performance scalability. Where has this race led us? Mainstream smartphone processors have gone from dual-core configurations running at 1 GHz in 2012 to 2.5 GHz deca-core versions in 2015: a 5x increase in the number of CPUs over three years, for those counting at home. Process technology has improved too, but at a much slower pace, from 40nm in 2012 to 28nm or 14nm now. However, 28nm looks set to stay for mainstream processors, while 16nm and beyond will be reserved mostly for the high-end computing market.
That means CPU designers need to find other solutions to increase performance while keeping the total die size down.
Doubling the core count automatically implies twice the area but not necessarily twice the performance. In fact, benchmarking data reveals that a dual-core CPU can achieve better scores than a quad- or octa-core CPU.
Further increasing the frequency of the CPU cluster is one option, but that brings complications in managing power consumption. Microarchitectural improvements (e.g. bonded load/store instructions, out-of-order execution) also provide tangible benefits, but may result in a high-end design with limited applications in the embedded market. There is also the question of execution efficiency: a single-threaded CPU can spend a lot of time in an idle state, waiting for data from memory.
Multithreading cleverly addresses the issues presented above. For example, opting for a dual-threaded, single-core CPU instantly offers system architects a 30-50% boost in performance for a minimal increase in area. In addition, waking up a thread is an instant process that does not require the additional power-management logic needed by multicore systems.
There’s another reason why multithreading is useful for embedded SoCs: reliable real-time operation. System architects can dynamically allocate threads to demanding I/O services; for example, a thread will attempt to read a value from the I/O system, go to idle, and spring back to life as soon as the data is ready. This reduces the overhead associated with servicing interrupts.
Another important reason to use multithreading is to fully utilize the execution pipeline resources for every CPU cycle in the presence of cache misses and potentially other events that would normally stall one thread.
Finally, multithreading is geared towards delivering superior overall throughput for parallel-oriented applications such as web browsing.
Looking at the information presented above, the immediate conclusion is that multithreading proves useful for a variety of scenarios. But how do MIPS CPUs implement multithreading?
One way to understand multithreading is through a crash course in modern operating systems. When a thread is not running, the program counter (PC) needs to store the address from which the thread will resume execution. In addition, the CPU must track all the saved values of the programmer-visible registers; for programs running in user mode on a MIPS CPU, that means the general-purpose registers (GPRs) and the multiplier accumulator (MAC). The CPU also stores an identifier for every running thread, as well as a flag to indicate any kernel privileges (e.g. when executing a system call).
An operating system running on a multithreaded MIPS CPU maintaining multiple address spaces will require the address space identifier (ASID) value too, because the hardware uses it when translating addresses.
Therefore, every thread inside a MIPS CPU will contain a copy of the PC and GPRs, a unique thread ID, the kernel mode, and the ASID value; these values make up the thread context.
Popular operating systems that are capable of detecting and using multithreading hardware include SMP Linux, Android, and Windows. Imagination has worked closely with its ecosystem partners to ensure that all operating systems that support MIPS can take full advantage of our hardware multithreading technology. For example, ThreadX from Express Logic features optimized context switching implemented on top of our MIPS MT extensions for I-class CPUs.
So whether you're running an SMP Linux-based distribution like Android or a real-time operating system like ThreadX, if it supports MIPS then it can support MIPS multithreading.