Multithreading
Threads
- A thread is a process with its own instructions and data
- Thread may be a part of a larger program or the whole program
- Each thread has all the state (instructions, data, PC, register state, etc.) necessary to allow it to execute
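
As a rough illustration of the per-thread state listed above, the sketch below models each thread as a small record holding its own instructions, PC, and registers. The class name `ThreadContext`, its fields, and the `(dest, value)` instruction format are all hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """Per-thread state (illustrative names, not a real ISA)."""
    program: list                                   # this thread's instructions
    pc: int = 0                                     # program counter
    registers: dict = field(default_factory=dict)   # private register state

    def step(self):
        """Execute one (dest, value) instruction and advance this thread's PC."""
        dest, value = self.program[self.pc]
        self.registers[dest] = value
        self.pc += 1


# Two threads write the same register name without interfering,
# because each one carries its own PC and register state.
t0 = ThreadContext(program=[("r1", 10), ("r2", 20)])
t1 = ThreadContext(program=[("r1", 99)])
t0.step()
t1.step()
t0.step()
```

Because every context is complete on its own, a core can stop issuing from one thread and resume another just by choosing which context to read from.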
Types

Coarse-Grained
- Switches threads only on costly stalls, such as L2 cache misses
- Advantages
- Does not require very fast thread switching
- Does not slow down an individual thread, since instructions from other threads are issued only when that thread reaches a costly stall
- Disadvantages
- A thread switch is expensive: the pipeline must be emptied and then refilled with the new thread's instructions
- Hard to overcome throughput losses from shorter stalls, due to pipeline start-up costs
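
The coarse-grained policy can be sketched as a toy cycle counter. This is a simplified model under stated assumptions, not a description of any real core: each instruction is represented only by the stall (in cycles) that follows it, and the names `coarse_grained`, `threshold`, and `switch_penalty` are hypothetical.

```python
def coarse_grained(threads, threshold=10, switch_penalty=3):
    """Toy coarse-grained core. Returns (total_cycles, issue_trace).

    threads: list of lists; each number is the stall, in cycles, that
    follows that instruction (0 means no stall).
    """
    n = len(threads)
    pcs = [0] * n          # next instruction index per thread
    ready_at = [0] * n     # cycle at which each thread may issue again
    cycle, active, trace = 0, 0, []

    def runnable(i):
        return pcs[i] < len(threads[i]) and ready_at[i] <= cycle

    while any(pcs[i] < len(threads[i]) for i in range(n)):
        if not runnable(active):
            others = [(active + k) % n for k in range(1, n)]
            nxt = next((i for i in others if runnable(i)), None)
            if nxt is None:
                cycle += 1             # nothing can run: idle one cycle
                continue
            active = nxt
            cycle += switch_penalty    # assumed pipeline refill cost
        stall = threads[active][pcs[active]]
        pcs[active] += 1
        trace.append(active)
        cycle += 1
        if stall >= threshold:
            ready_at[active] = cycle + stall   # costly stall: forces a switch
        else:
            cycle += stall                     # short stall: waited out, no switch
    return cycle, trace


cycles, trace = coarse_grained([[0, 20, 0], [0, 0, 2, 0]])
print(cycles, trace)
```

In this workload the 20-cycle stall in thread 0 triggers a switch, while the 2-cycle stall in thread 1 is simply waited out, which is the throughput loss on shorter stalls noted above.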
Fine-Grained
- Switches between threads on each instruction, causing the execution of multiple threads to be interleaved
- Usually done in a round-robin fashion, skipping any stalled threads
- CPU must be able to switch threads on every clock cycle
- Advantage
- Can hide both short and long stalls, since instructions from other threads execute while one thread is stalled
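- Disadvantages
- Slows down the execution of individual threads, since a thread that is ready to run is delayed by instructions from other threads
- More expensive, since it requires specialized hardware to switch threads every cycle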
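
The round-robin selection with skipping of stalled threads can be sketched as follows. This is a minimal model assuming one issue slot per cycle and instructions described only by the stall that follows them; the function name `fine_grained` and the `None` marker for an empty cycle are choices made for this example.

```python
def fine_grained(threads):
    """Toy fine-grained core. Returns (total_cycles, issue_trace).

    threads: list of lists; each number is the stall, in cycles, that
    follows that instruction (0 means no stall).
    """
    n = len(threads)
    pcs = [0] * n          # next instruction index per thread
    ready_at = [0] * n     # cycle at which each thread may issue again
    cycle, last, trace = 0, -1, []
    while any(pcs[i] < len(threads[i]) for i in range(n)):
        # Round-robin: start after the thread that issued last,
        # skipping threads that are finished or still stalled.
        order = [(last + k) % n for k in range(1, n + 1)]
        pick = next((i for i in order
                     if pcs[i] < len(threads[i]) and ready_at[i] <= cycle), None)
        if pick is None:
            trace.append(None)         # every remaining thread is stalled
        else:
            stall = threads[pick][pcs[pick]]
            pcs[pick] += 1
            ready_at[pick] = cycle + 1 + stall
            trace.append(pick)
            last = pick
        cycle += 1
    return cycle, trace


cycles, trace = fine_grained([[0, 3, 0], [0, 0, 0], [0, 0]])
print(cycles, trace)
```

Here the 3-cycle stall in thread 0 is covered by instructions from the other two threads, so eight instructions issue in eight cycles. Thread 0 itself still takes all eight cycles to finish its three instructions, which is the per-thread slowdown listed above.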
Simultaneous Multithreading
- Requires fine-grained switching ability
- Does not need multiple processors, just multiple parallel execution units within one processor
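
To illustrate how parallel execution units can be shared, the sketch below fills the issue slots of a single cycle with instructions from several threads. It is a deliberately simplified assumption-based model: `smt_issue`, `ready_counts`, and `issue_width` are hypothetical names, and real SMT hardware also has to handle dependences and shared resources, which are ignored here.

```python
def smt_issue(ready_counts, issue_width=4):
    """Fill one cycle's issue slots from several threads.

    ready_counts[i] is how many independent instructions thread i has
    ready this cycle. Returns one thread id per filled slot.
    """
    remaining = list(ready_counts)
    slots = []
    # Take one instruction per thread per pass until the slots are
    # full or no thread has anything left to issue.
    while len(slots) < issue_width and any(remaining):
        for tid in range(len(remaining)):
            if remaining[tid] > 0 and len(slots) < issue_width:
                slots.append(tid)
                remaining[tid] -= 1
    return slots


alone = smt_issue([1])            # a single thread with little parallelism
shared = smt_issue([1, 2, 0])     # other threads fill otherwise idle slots
print(alone, shared)
```

A single thread with one ready instruction leaves most of the slots empty, while several threads together fill more of them in the same cycle on the same processor.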