Amitabh's Weblog

Processing in Memory: A workload driven perspective

Performance gap between Memory (DRAM latencies) and Processor due to technology scaling is the motivation to investigate Processing in Memory (PIM) architectures, also known as Near Data Processing. Energy, Performance and Cost must be optimized. PIM tackles this by bringing computation to the data (and removing the data movement latencies). PIM is both: Completely new architectures and varying degree of architectural modifications in Memory Subsystems. This paper discusses the application of PIM in real-world applications (such as, Machine Learning, Data analytics, genome sequencing etc.), how to design the programming framework and the challenge of adoption of the new framework among the developer community.


  1. Data moves from the Memory to the CPU via the memory channel (a pin-limited off-chip Bus e.g. double data-rate Memories aka DDR use a 64-bit memory channel.)

  2. The CPU issues request to the Memory Controller, which issues command across the memory channel to the DRAM.

  3. The DRAM then reads the data and moves across the memory channel to the CPU (where the data has to travel through the memory hierarchy into the CPU’s register).1

  4. Therefore, we need to rethink the computer architecture. PIM is one of such methods. The idea is almost 40 years old, but the technology was not mature enough to integrate a Memory with Processor elements. Technology such as 3D Stacked Memory (combining DRAM Memory layers connected using through-layer via along with a logic layer), and more-computation friendly resistive memory technologies, makes it possible to embed general-purpose computation directly within the memory.

  5. Challenges:

Overview of PIM

Opportunities in PIM applications

Key Issues in Programming PIM architectures

Future Challenges


Lessons Learnt

Pros and Cons of the Paper

Improvement Ideas

  1. Memory Bottleneck: Moving large amount of data for High-Performance and Data-Intense applications causes the bottleneck on energy and performance of the processor. The limited size of memory channel limits the number of access requests that can be issued in parallel, and the Random access patterns often leads to inefficient caching. The total cost of computation (in terms of performance and energy) is dominated by the cost of data movement.