
Scope Batching

GPU blocks batch multiple operations into a single dispatch call for efficiency.


Automatic Batching

All operations inside a gpu { } block are collected and submitted as a batch:

gpu {
a is X + Y;
b is X * Y;
c is (X * Y) + Z;
}
// All three operations dispatched in a single GPU submission
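The collect-then-submit behavior described above can be modeled in a few lines of Python. This is only a conceptual sketch, not the runtime's actual implementation: `BatchScope`, `record`, and `submit` are hypothetical names, and the "GPU" here is ordinary CPU code.

```python
class BatchScope:
    """Toy stand-in for a gpu { } block: records operations, submits once on exit."""

    def __init__(self, env):
        self.env = dict(env)   # named inputs such as X, Y, Z
        self.pending = []      # operations recorded but not yet dispatched
        self.submissions = 0   # number of times a batch was "dispatched"

    def record(self, name, fn):
        # Queue the operation instead of running it immediately.
        self.pending.append((name, fn))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leaving the scope triggers exactly one submission of everything queued.
        if exc_type is None and self.pending:
            self.submit()
        return False

    def submit(self):
        self.submissions += 1
        for name, fn in self.pending:
            self.env[name] = fn(self.env)
        self.pending.clear()


def run_example():
    with BatchScope({"X": 2.0, "Y": 3.0, "Z": 1.0}) as scope:
        scope.record("a", lambda e: e["X"] + e["Y"])
        scope.record("b", lambda e: e["X"] * e["Y"])
        scope.record("c", lambda e: e["X"] * e["Y"] + e["Z"])
    return scope.env, scope.submissions
```

Calling `run_example()` records three operations but performs a single submission, mirroring the comment in the example above.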

Loop Batching

Operations inside loops within gpu blocks are also batched:

gpu {
for (i is 0; i < 100; i++) {
result[i] is data[i] * scale;
}
}
// All 100 iterations batched into one GPU kernel
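One way to picture loop batching is that the loop body becomes a kernel and the loop bounds become the launch size. The Python sketch below illustrates that idea under stated assumptions: `launch` and `scale_kernel` are hypothetical names, and the indices run sequentially here, whereas a GPU would run them in parallel.

```python
def launch(kernel, n, *args):
    """Hypothetical single dispatch: run `kernel` once for each index in range(n)."""
    for i in range(n):
        kernel(i, *args)


def scale_kernel(i, result, data, scale):
    # Body of the loop from the example above, for a single index.
    result[i] = data[i] * scale


def run_loop_example():
    data = [float(i) for i in range(100)]
    result = [0.0] * 100
    # One launch covers all 100 iterations instead of 100 separate dispatches.
    launch(scale_kernel, 100, result, data, 2.0)
    return result
```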

Benefits

  • Reduced GPU driver overhead (fewer dispatch calls)
  • Better memory coalescing
  • Automatic operation fusion (e.g., multiply-add → FMA)
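To make the fusion bullet concrete, here is a minimal Python sketch of a multiply-add fusion pass over a recorded batch. It is an illustration only and not the compiler's actual pass: the tuple format `(kind, dst, lhs, rhs)`, the `fma` op name, and the convention that single-use temporaries start with an underscore are all assumptions made for this example.

```python
def fuse_multiply_add(ops):
    """Fuse a mul directly followed by an add that consumes its result.

    Each op is (kind, dst, lhs, rhs). A pair is fused only when the mul writes
    a temporary (name starting with "_", assumed to be used exactly once) and
    exactly one operand of the add is that temporary.
    """
    fused = []
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if (
            nxt is not None
            and op[0] == "mul"
            and nxt[0] == "add"
            and op[1].startswith("_")
            and (nxt[2] == op[1]) != (nxt[3] == op[1])
        ):
            # The add operand that is not the temporary becomes the addend.
            addend = nxt[3] if nxt[2] == op[1] else nxt[2]
            fused.append(("fma", nxt[1], op[2], op[3], addend))
            i += 2
        else:
            fused.append(op)
            i += 1
    return fused
```

For instance, `[("mul", "_t", "X", "Y"), ("add", "c", "_t", "Z")]` collapses to a single `("fma", "c", "X", "Y", "Z")` entry, while a multiply whose result is a named variable is left alone. Note that a hardware FMA rounds once rather than twice, so fused results can differ slightly from the unfused sequence.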

Next Steps