The growing performance gap between memory (DRAM latency) and processors due to technology scaling motivates the investigation of Processing in Memory (PIM) architectures, also known as Near-Data Processing. Energy, performance, and cost must all be optimized; PIM tackles this by bringing computation to the data, eliminating data-movement latencies. PIM spans both completely new architectures and varying degrees of architectural modification to existing memory subsystems. This paper discusses the application of PIM to real-world workloads (such as machine learning, data analytics, and genome sequencing), how to design the programming framework, and the challenge of driving adoption of a new framework among the developer community.
Data moves from memory to the CPU via the memory channel, a pin-limited off-chip bus (e.g., double-data-rate (DDR) memories use a 64-bit memory channel).
The CPU issues a request to the memory controller, which issues commands across the memory channel to the DRAM.
The DRAM then reads the data and moves it across the memory channel back to the CPU, where it must travel through the memory hierarchy into the CPU's registers.[^1]
Therefore, we need to rethink computer architecture; PIM is one such approach. The idea is almost 40 years old, but the technology was not mature enough to integrate memory with processing elements. Technologies such as 3D-stacked memory (DRAM layers connected by through-silicon vias, together with a logic layer) and more computation-friendly resistive memory technologies now make it possible to embed general-purpose computation directly within memory.
- Identifying application properties that can benefit from PIM architectures.
- Making the architecture heterogeneous requires understanding: a) architectural constraints (area and energy limitations, along with what logic is implementable within the memory), and b) application properties, such as memory access patterns and data shared across different functions.
- Therefore, we need to determine the partition between PIM logic and CPU-driven logic, and to establish the interfaces and mechanisms for programming it (while staying close to the conventional programming model).
- Overview of PIM
- Opportunities in PIM applications
- Key issues in programming PIM architectures
- Pros and cons of the paper
[^1]: Memory bottleneck: moving large amounts of data in high-performance and data-intensive applications bottlenecks both the energy and performance of the processor. The limited width of the memory channel limits the number of access requests that can be issued in parallel, and random access patterns often lead to inefficient caching. The total cost of computation (in performance and energy) is dominated by the cost of data movement. ↩