Multicore Parallelism

The power limit (dynamic power grows rapidly with clock frequency, roughly as frequency times voltage squared) forced chip designers to put multiple processors (cores) on a chip instead of building one ever-more-powerful core

New challenges

  • How to write programs that actually use all the cores
  • How to keep every core busy (load balancing)

Calculating Performance

  • The potential (ideal) speedup assumes 100% of the program is parallelizable: with N cores, the speedup is N
  • Use Amdahl's Law to calculate what fraction of the program must be parallelized to reach a given speedup (worked formula below)
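
For reference, the standard form of Amdahl's Law, with P the fraction of the program that can run in parallel and N the number of cores, rearranged to solve for P:

\[
\text{Speedup} = \frac{1}{(1 - P) + P/N}
\qquad\Longrightarrow\qquad
P = \frac{1 - 1/\text{Speedup}}{1 - 1/N}
\]

For example, reaching a speedup of 80 on 100 cores requires P = (1 - 1/80)/(1 - 1/100) ≈ 0.9975, so roughly 99.75% of the program must be parallelizable.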

Types of Memory

  • Centralized Shared-Memory
    • Properties
      • All processors read and write a single shared main memory
      • Each processor has its own private caches
      • Processors share the I/O system
    • Works well for small processor counts (roughly 4-16); see the shared-memory sketch after this list
    • Main memory has a symmetric relationship to all processors and uniform access time from any processor
      • SMP: symmetric shared-memory multiprocessors
      • UMA: uniform memory access architectures
  • Distributed-Memory
    • Properties
      • Each processor has its own local memory and I/O
      • Processors must communicate with one another over an interconnection network
      • Scales to large numbers of processors
    • Shared address space
      • The same physical address refers to the same memory location on every processor
      • DSM: Distributed Shared-Memory Architectures
      • NUMA: non-uniform memory access, since access time depends on where the data is located (see the NUMA sketch after this list)
  • Cache coherence problems
    • With a write-back cache, a newly written value stays in the cache and is only written back to main memory when the block is evicted
    • So other processors might read a stale value from main memory
    • Solutions
      • Centralized (bus-based): snooping protocol
        • Each cache controller watches (snoops) the shared bus to see what the other processors are doing (see the MSI sketch below)
      • Distributed: directory-based protocol
        • A directory keeps track of which caches hold a copy of each block
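
To make the centralized shared-memory model concrete, here is a minimal pthreads sketch; the thread count, array size, and partial-sum scheme are illustrative choices, not from the notes. Every thread reads and writes the same globally visible arrays, which is exactly what an SMP/UMA machine provides: one address space, private caches per core.

```c
/* Shared-memory sketch: all threads see one address space, so they can
 * cooperate simply by reading and writing the same arrays.
 * Compile with -pthread.  Sizes are illustrative. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N (1 << 20)

static double data[N];            /* lives in the single shared memory   */
static double partial[NTHREADS];  /* one slot per thread, no lock needed */

static void *worker(void *arg) {
    long id = (long)arg;
    long chunk = N / NTHREADS;
    double sum = 0.0;
    for (long i = id * chunk; i < (id + 1) * chunk; i++)
        sum += data[i];
    partial[id] = sum;            /* write result into shared memory */
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++)
        data[i] = 1.0;

    pthread_t tid[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);

    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial[t];      /* main thread reads the same shared memory */
    }
    printf("total = %f (expected %d)\n", total, N);
    return 0;
}
```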
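
For the NUMA case, a minimal sketch assuming a Linux machine with at least two NUMA nodes and libnuma installed (link with -lnuma); the buffer size and node choices are illustrative. The address space is still shared, but sweeping memory that lives on a remote node should be measurably slower than sweeping local memory.

```c
/* NUMA sketch: same shared address space, but access time depends on
 * which node's memory holds the data. */
#include <numa.h>
#include <stdio.h>
#include <time.h>

#define N (64 * 1024 * 1024)   /* 64 MB buffer */

static double sweep(volatile char *buf) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    long sum = 0;
    for (long i = 0; i < N; i += 64)   /* touch one byte per cache line */
        sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sum;
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    if (numa_available() < 0 || numa_max_node() < 1) {
        printf("need a NUMA machine with at least 2 nodes\n");
        return 1;
    }
    numa_run_on_node(0);                                   /* pin to node 0 */
    char *local  = numa_alloc_onnode(N, 0);                /* local memory  */
    char *remote = numa_alloc_onnode(N, numa_max_node());  /* remote memory */

    printf("local sweep:  %.3f s\n", sweep(local));
    printf("remote sweep: %.3f s\n", sweep(remote));

    numa_free(local, N);
    numa_free(remote, N);
    return 0;
}
```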
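
And for cache coherence, a simplified MSI state machine of the kind a snoopy cache controller implements; the state and event names follow the common textbook simplification rather than anything specific in the notes. Directory protocols track the same information per block in a directory instead of broadcasting on a bus, which is why they scale to distributed-memory machines.

```c
/* Simplified MSI snoopy-protocol sketch: how one cache line's state changes
 * in response to this processor's accesses and to bus traffic snooped from
 * other caches. */
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } LineState;

typedef enum {
    CPU_READ,       /* this processor loads the line            */
    CPU_WRITE,      /* this processor stores to the line        */
    BUS_READ,       /* another cache broadcasts a read miss     */
    BUS_WRITE       /* another cache broadcasts a write/upgrade */
} Event;

/* Returns the next state; write-back / flush actions noted in comments. */
LineState msi_next(LineState s, Event e) {
    switch (s) {
    case INVALID:
        if (e == CPU_READ)  return SHARED;    /* fetch line from memory/owner */
        if (e == CPU_WRITE) return MODIFIED;  /* broadcast BUS_WRITE first    */
        return INVALID;
    case SHARED:
        if (e == CPU_WRITE) return MODIFIED;  /* broadcast upgrade            */
        if (e == BUS_WRITE) return INVALID;   /* another cache now owns it    */
        return SHARED;
    case MODIFIED:
        if (e == BUS_READ)  return SHARED;    /* flush dirty data to memory   */
        if (e == BUS_WRITE) return INVALID;   /* flush, then give up the line */
        return MODIFIED;
    }
    return s;
}

int main(void) {
    LineState s = INVALID;
    s = msi_next(s, CPU_WRITE);   /* we write: MODIFIED, memory is now stale */
    s = msi_next(s, BUS_READ);    /* another core reads: flush, go SHARED    */
    printf("final state = %d\n", s);   /* 1 == SHARED */
    return 0;
}
```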