Skip to main content

Scope Batching

GPU blocks batch multiple operations into single dispatch calls for efficiency.

Automatic Batching

All operations inside a gpu { } block are collected and submitted as a batch:

gpu {
    a is X + Y;
    b is X * Y;
    c is (X * Y) + Z;
}
// All three operations dispatched in a single GPU submission

Loop Batching

Operations inside loops within gpu blocks are also batched:

gpu {
    for (i is 0; i < 100; i++) {
        result[i] is data[i] * scale;
    }
}
// All 100 iterations batched into one GPU kernel

Benefits

Reduced GPU driver overhead
Better memory coalescence
Automatic operation fusion (e.g., multiply-add → FMA)

Next Steps

Performance — Optimization tips

Automatic Batching
Loop Batching
Benefits
Next Steps