Multicore Parallelism
The power wall (dynamic power scales roughly as P ≈ C·V²·f, and since voltage rises with frequency, power grows close to the cube of clock speed) forced chip designers to switch to multiple processors (cores) instead of one ever-faster core
New challenges
- How to utilize all the cores
- How to keep all the cores busy
Calculating Performance
- Potential speedup: the speedup if 100% of the program were parallelizable (equal to the number of cores)
- Use Amdahl's Law for the actual speedup, or solve it for the parallel fraction needed to reach a given speedup: Speedup = 1 / ((1 − p) + p/n), where p is the parallelizable fraction and n is the number of cores
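The formulas above can be sketched numerically (a minimal illustration of Amdahl's Law; the helper names are my own):

```python
# Amdahl's Law: with parallel fraction p and n cores,
# speedup = 1 / ((1 - p) + p / n).
def speedup(p, n):
    """Overall speedup when fraction p runs perfectly parallel on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

def fraction_needed(target, n):
    """Parallel fraction p required for a target speedup on n cores
    (solve target = 1 / ((1 - p) + p / n) for p)."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / n)

print(round(speedup(0.9, 8), 2))        # 90% parallel on 8 cores -> 4.71
print(round(fraction_needed(4, 8), 3))  # need p ~= 0.857 for 4x on 8 cores
```

Note that even a 90%-parallel program gets nowhere near 8x on 8 cores: the serial 10% dominates.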
Types of Memory
- Centralized Shared-Memory
- Properties
- Processors mutate shared memory
- Each processor has its own private caches
- Processors share I/O
- Good for 4-16 processors
- Main memory has a symmetric relationship to all processors and uniform access time from any processor
- SMP: symmetric shared-memory multiprocessors
- UMA: uniform memory access architectures
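A minimal sketch of the shared-memory model (illustrative only, with Python threads standing in for cores): every thread reads and writes the same address space, so access to shared data must be coordinated by the programmer.

```python
import threading

counter = 0                      # one location in the shared address space
lock = threading.Lock()          # coordination is the programmer's job

def worker():
    global counter
    for _ in range(10_000):
        with lock:               # without the lock, updates can be lost
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                   # 4 workers x 10,000 increments -> 40000
```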
- Distributed-Memory
- Properties
- Each processor gets its own local memory and I/O
- Processors have to communicate with each other explicitly (message passing)
- Scales to very large numbers of processors
- Shared address space
- Same physical address refers to same memory location
- DSM: Distributed Shared-Memory Architectures
- NUMA: Non-uniform memory access since the access time depends on the location of the data
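By contrast, a distributed-memory sketch (illustrative, with Python processes standing in for nodes): each process owns private memory, and results are combined only by sending messages.

```python
from multiprocessing import Process, Queue

def node(rank, outbox):
    local = list(range(rank * 4, rank * 4 + 4))  # memory private to this node
    outbox.put(sum(local))       # explicit message; no shared loads/stores

if __name__ == "__main__":
    results = Queue()
    nodes = [Process(target=node, args=(r, results)) for r in range(4)]
    for p in nodes: p.start()
    for p in nodes: p.join()
    print(sum(results.get() for _ in nodes))     # combine partial sums -> 120
```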
- Cache coherence problems
- With a write-back cache, a modified value stays in the cache and is not written to main memory until the line is evicted
- So other processors can read a stale value from main memory (or keep stale copies in their own caches)
- Solutions
- Centralized: Snoopy based protocol
- Each cache watches (snoops on) the shared bus to see the other processors' memory traffic
- Distributed: Directory based protocol
- A directory tracks which caches hold a copy of each memory block, so messages go only to the caches that need them
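The stale-read problem and the snoopy fix can be modeled in a few lines (a toy sketch, not a real MSI implementation; all class and method names are mine):

```python
class Cache:
    """Write-back cache; attaches to a snooping bus if one is given."""
    def __init__(self, memory, bus=None):
        self.memory, self.bus = memory, bus
        self.lines = {}                          # addr -> [value, dirty]
        if bus:
            bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:               # miss
            if self.bus:                         # snoop: force dirty owners
                self.bus.flush_if_dirty(addr, self)  # to write back first
            self.lines[addr] = [self.memory[addr], False]
        return self.lines[addr][0]

    def write(self, addr, value):
        if self.bus:                             # snoop: invalidate other copies
            self.bus.invalidate(addr, self)
        self.lines[addr] = [value, True]         # write-back: memory untouched

class Bus:
    """Shared bus: every attached cache snoops the others' traffic."""
    def __init__(self):
        self.caches = []
    def attach(self, cache):
        self.caches.append(cache)
    def invalidate(self, addr, owner):
        for c in self.caches:
            if c is not owner:
                c.lines.pop(addr, None)          # drop now-stale copies
    def flush_if_dirty(self, addr, requester):
        for c in self.caches:
            if c is not requester and c.lines.get(addr, [0, False])[1]:
                c.memory[addr] = c.lines[addr][0]   # write dirty data back
                c.lines[addr][1] = False

# Without snooping: P1 keeps reading a stale copy after P0's write
mem = {0x10: 1}
p0, p1 = Cache(mem), Cache(mem)
p1.read(0x10)                 # P1 caches the old value 1
p0.write(0x10, 42)            # 42 stays dirty in P0's cache only
print(p1.read(0x10))          # -> 1 (stale!)

# With snooping: the write invalidates P1's copy, and P1's re-read
# forces P0 to write the dirty line back before memory is read
mem2 = {0x10: 1}
bus = Bus()
s0, s1 = Cache(mem2, bus), Cache(mem2, bus)
s1.read(0x10)
s0.write(0x10, 42)
print(s1.read(0x10))          # -> 42 (coherent)
```

The key design point: the bus is a broadcast medium, so every write is visible to every cache, which is exactly why snooping stops scaling past a modest core count and directory protocols take over.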