Data Level Parallelism

Vector Architecture (extreme SIMD)

  • Architecture with an additional set of registers, each of which stores an array of values
  • Allows you to do operations on entire arrays using single instructions
  • Similar to the multimedia SIMD instructions
  • Application: Formula translation
  • Speedups: up to three-digit factors (hundreds of times faster)
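
The idea of one instruction operating on a whole array can be sketched in software. The following is a minimal emulation, not how real hardware is programmed: the maximum vector length `MVL = 64` and the function names are assumptions made for illustration. It computes Y = a*X + Y and counts how many "vector instructions" are needed when the array is processed in register-sized strips.

```python
MVL = 64  # assumed maximum vector length (elements per vector register)

def daxpy_scalar(a, x, y):
    """Scalar version: one multiply-add per loop iteration."""
    out = list(y)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

def daxpy_vector(a, x, y, mvl=MVL):
    """Emulated vector version: each pass handles up to mvl elements at once."""
    out = []
    vector_ops = 0
    for start in range(0, len(x), mvl):
        vx = x[start:start + mvl]  # stands in for a vector load of X
        vy = y[start:start + mvl]  # stands in for a vector load of Y
        # Stands in for a single vector multiply-add over the whole strip.
        out.extend(a * xi + yi for xi, yi in zip(vx, vy))
        vector_ops += 1
    return out, vector_ops

x = [float(i) for i in range(200)]
y = [1.0] * 200
result, ops = daxpy_vector(2.0, x, y)
```

Under these assumptions, 200 elements need 4 vector multiply-adds where the scalar loop executes 200, which gives a rough feel for where the large speedups come from.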

Multimedia Instructions (normal SIMD)

  • Exploits the fact that registers are wider than most media data types need
    • e.g., 8 bits per color channel, 16 bits per audio sample
  • 2-8 values can be stored in each register
    • Allows all of them to be added with a single instruction
    • Uses a partitioned adder that does not propagate carries across the boundaries between values
  • Speedups: single-digit factors (a few times faster)
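
The "no carry across boundaries" adder can be imitated in software with masks. The sketch below assumes four unsigned 8-bit values packed into one 32-bit word; the helper names (`pack`, `unpack`, `packed_add_u8x4`) are made up for this example. Real multimedia instructions do this in hardware, so this only illustrates the principle.

```python
LOW7 = 0x7F7F7F7F  # low 7 bits of each 8-bit lane
HIGH = 0x80808080  # top bit of each 8-bit lane

def packed_add_u8x4(a, b):
    """Add four 8-bit lanes at once; each lane wraps around independently."""
    # Adding only the low 7 bits cannot overflow out of a lane.
    low = (a & LOW7) + (b & LOW7)
    # Restore the top bit of each lane: a7 XOR b7 XOR (carry from the low bits).
    return (low ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF

def pack(values):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word

def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

a = pack([200, 100, 255, 1])
b = pack([100, 100, 1, 2])
lanes = unpack(packed_add_u8x4(a, b))  # [44, 200, 0, 3]
```

Lanes 0 and 2 overflow and wrap to 44 and 0, but the overflow does not leak into their neighbors. An ordinary 32-bit add of the same two words would give 201 in lane 1 instead of 200.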

Graphics Processing Units (GPUs)

  • Heterogeneous execution model
    • CPU is the host, GPU is the device
  • Programming model is “single instruction, multiple thread” (SIMT)
  • Programmed in a language that abstracts GPU parallelism (e.g., CUDA or OpenCL)
  • Draws heavily from vector architectures, but differs in that:
    • No scalar processor
    • Uses multithreading to hide memory latency
    • Has many functional units
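
The host/device split and the thread-per-element programming style can be sketched as follows. This is a sequential CPU emulation, not GPU code: `launch`, `block_idx`, `thread_idx`, and `block_dim` are hypothetical names loosely modeled on the indices a language such as CUDA exposes, and a real GPU would run the threads in parallel rather than in nested loops.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y):
    """Code run by every thread; each thread updates one array element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < n:  # guard: threads past the end of the array do nothing
        y[i] = a * x[i] + y[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Host-side stand-in for a kernel launch over grid_dim blocks of threads."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # round up so every element is covered
launch(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y)
```

The programmer writes the kernel for a single thread and the launch replicates it across the grid. Here 3 blocks of 4 threads cover 10 elements, and the last 2 threads fall through the bounds check.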