Data-Level Parallelism
Vector Architecture (extreme SIMD)
- Architecture with an additional set of vector registers, each of which stores an array of values
- Performs operations on entire arrays using single instructions
- Similar to the multimedia SIMD instructions
- Application: scientific code written in FORTRAN (“formula translation”)
- Speedups: up to triple-digit factors
Multimedia SIMD Extensions
- Exploit the fact that registers are wider than the data types media applications need
- e.g., 8 bits per color component, 16 bits per audio sample
- 2-8 values can be packed into each register
- Allows all of them to be added at once with a single instruction
- Uses a partitioned adder that does not propagate carries across value boundaries
- Speedups: single-digit factors
Graphics Processing Units (GPUs)
- Heterogeneous execution model
- CPU is the host, GPU is the device
- Programming model is “single instruction, multiple thread” (SIMT)
- Uses a language (e.g., CUDA) that abstracts GPU parallelism
- Draws heavily from vector architectures, with some differences:
- No scalar processor
- Uses multithreading to hide memory latency
- Has many functional units, rather than a few deeply pipelined ones