Aussie AI

Partitioning

  • Last Updated 28 November, 2024
  • by David Spuler, Ph.D.

Partitioning is a model inference optimization technique that organizes data in memory, particularly the ordering and layout of vectors and tensors. In-memory partitioning can serve several goals:

  • Faster memory access. Storing data contiguously, or keeping it resident in memory longer rather than swapping it in and out, speeds up access.
  • Pipelining operations to GPUs. Organizing the data before it is sent keeps the GPU busy.
  • Parallelization of operations across multiple GPUs.
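
As a rough illustration of the first two goals, the sketch below packs a small matrix into one contiguous row-major buffer and then hands out consecutive blocks of rows as zero-copy views, the kind of units a pipeline could feed to a device one after another. It uses only the Python standard library; the names make_row_major and row_blocks are hypothetical helpers invented for this example, and no real GPU transfer is performed.

```python
from array import array


def make_row_major(rows):
    """Pack a non-empty list of equal-length rows into one contiguous float buffer."""
    n_cols = len(rows[0])
    flat = array("f")  # single contiguous block of 32-bit floats
    for row in rows:
        if len(row) != n_cols:
            raise ValueError("all rows must have the same length")
        flat.extend(row)
    return flat, n_cols


def row_blocks(flat, n_cols, rows_per_block):
    """Yield zero-copy views over consecutive blocks of rows."""
    view = memoryview(flat)
    step = rows_per_block * n_cols
    for start in range(0, len(flat), step):
        # Slicing a memoryview does not copy the underlying data.
        yield view[start:start + step]


flat, n_cols = make_row_major([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
blocks = [list(b) for b in row_blocks(flat, n_cols, rows_per_block=2)]
print(blocks)  # [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0]]
```

Because each block is a view into the same buffer, the partitioning step itself moves no data; only the eventual transfer to the device would.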

Research Papers on Partitioning

GPU partitioning is a type of software acceleration that makes hardware acceleration more effective. A well-chosen data partitioning can improve throughput and efficiency when using multiple GPUs.
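
One simple way to picture partitioning across multiple GPUs is to split a batch into contiguous, near-equal shards, one per device, so that no device waits long for the others. The following is a minimal sketch under that assumption only; partition_ranges is a hypothetical helper, devices are represented by plain indices, and real frameworks may balance work quite differently.

```python
def partition_ranges(n_items, n_devices):
    """Split range(n_items) into n_devices contiguous (start, stop) ranges.

    Shard sizes differ by at most one item; earlier devices get the extras.
    """
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    base, extra = divmod(n_items, n_devices)
    ranges = []
    start = 0
    for device in range(n_devices):
        size = base + (1 if device < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


shards = partition_ranges(10, 4)
print(shards)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Keeping each shard contiguous means a device's portion can be sent as a single block rather than gathered from scattered locations.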

More AI Research

Read more about: