Render a deforming grid of many points, additively blending them.
There's a checkbox to rasterize points with a compute shader; when it is off, the regular GPU point rasterization is used. The compute shader path is the most naïve one possible: it just does atomic writes to screen-sized R, G, B buffers, using fixed-point colors. A resolve pass at the end turns those into display colors.
The "Scale" control changes how spread out the points are on screen. It seems that at small scales (i.e. lots of points overlapping on the same pixel), regular rasterization performance drops sharply on some GPUs. The compute shader path also gets slower, but not as drastically.
The buttons set the sliders to some predefined scenarios I was measuring.
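
The compute shader path described above can be illustrated with a small CPU sketch. This is not the demo's actual shader code: the buffer layout, the `FIXED_ONE` scale, and the helper names `make_buffers`, `splat_point`, and `resolve` are assumptions made up for illustration. It shows the general idea only: integer (fixed-point) sums per channel, where each `+=` stands in for a GPU atomic add, followed by a resolve step that converts the sums back to display colors.

```python
# CPU sketch of fixed-point additive accumulation into three
# screen-sized integer buffers, followed by a resolve pass.
# All names and constants here are illustrative assumptions.

WIDTH, HEIGHT = 4, 2
FIXED_ONE = 1 << 8  # hypothetical fixed-point scale: 1.0 maps to 256


def make_buffers(width, height):
    """One integer accumulator per pixel for each of R, G, B."""
    n = width * height
    return [0] * n, [0] * n, [0] * n


def splat_point(buffers, width, height, x, y, color):
    """Add one point's color to the pixel it lands on.

    On a GPU each `+=` would be an atomic add, because many threads
    can hit the same pixel at once. Integer adds commute, so the
    order in which points arrive does not change the result.
    """
    if not (0 <= x < width and 0 <= y < height):
        return  # off-screen points are dropped
    idx = y * width + x
    for buf, channel in zip(buffers, color):
        buf[idx] += int(round(channel * FIXED_ONE))


def resolve(buffers):
    """Turn fixed-point sums into display colors, clamped to 1.0."""
    r, g, b = buffers
    return [
        tuple(min(v / FIXED_ONE, 1.0) for v in (r[i], g[i], b[i]))
        for i in range(len(r))
    ]


buffers = make_buffers(WIDTH, HEIGHT)
points = [
    (1, 0, (0.25, 0.5, 0.0)),   # two points overlap on pixel (1, 0)
    (1, 0, (0.25, 0.25, 0.5)),
    (3, 1, (1.0, 1.0, 1.0)),
    (9, 9, (1.0, 0.0, 0.0)),    # off-screen, ignored
]
for x, y, color in points:
    splat_point(buffers, WIDTH, HEIGHT, x, y, color)
image = resolve(buffers)
```

Fixed point is a natural fit here because shader atomics commonly operate on integers; the choice of 8 fractional bits is arbitrary and trades color precision against how many overlapping points fit before an accumulator overflows.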
⚠️ Note: it looks like Firefox reports incorrect GPU timings for some people (about 10x lower than they should be). Chrome and other Chromium-based browsers might give more consistent results at this point.
A blog post about all of this: aras-p.info/blog/2025/08/24/This-many-points-is-surely-out-of-scope/
Data gathered so far is below. Each case is labeled by point count and "Scale" setting, and lists two GPU times in milliseconds: regular GPU rasterization first, then compute shader rasterization.
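GPU Model | 1M, 90%: raster | 1M, 90%: compute | 4M, 90%: raster | 4M, 90%: compute | 16M, 90%: raster | 16M, 90%: compute | 1M, 1%: raster | 1M, 1%: compute | 4M, 1%: raster | 4M, 1%: compute | 16M, 1%: raster | 16M, 1%: compute |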
---|---|---|---|---|---|---|---|---|---|---|---|---|
NVIDIA | ||||||||||||
RTX 5070Ti | 0.2 | 0.0 | 1.0 | 0.1 | 4.8 | 0.7 | 1.6 | 0.0 | 6.8 | 0.1 | 28.5 | 0.2 |
RTX 4080 | 0.8 | 0.1 | 1.2 | 0.6 | 5.3 | 1.2 | 1.6 | 0.3 | 7.2 | 1.1 | 27.3 | 3.7 |
RTX 3090 | 0.7 | 0.2 | 2.0 | 1.1 | 7.5 | 2.1 | 2.5 | 0.9 | 10.1 | 0.9 | 38.6 | 4.9 |
RTX 3080Ti | 1.3 | 0.8 | 3.2 | 1.8 | 6.3 | 5.1 | 2.6 | 1.0 | 9.2 | 2.2 | 37.7 | 5.7 |
RTX 4070 | 1.0 | 0.5 | 1.7 | 1.2 | 7.3 | 2.3 | 1.7 | 0.5 | 2.1 | 1.5 | 26.5 | 6.2 |
RTX 3070Ti | 1.2 | 0.8 | 2.4 | 1.4 | 7.8 | 2.5 | 2.6 | 1.3 | 10.3 | 2.3 | 40.3 | 5.4 |
RTX 4070 laptop | 2.9 | 0.3 | 5.9 | 3.3 | 10.3 | 6.5 | 10.0 | 1.2 | 12.6 | 5.0 | 33.7 | 14.1 |
RTX 4060 | 6.8 | 1.9 | 8.4 | 6.8 | 10.0 | 6.1 | 8.5 | 2.9 | 7.0 | 6.9 | 28.2 | 7.5 |
RTX 2070 Super | 1.0 | 0.2 | 2.5 | 1.0 | 13.0 | 2.7 | 2.8 | 0.9 | 11.4 | 1.6 | 45.2 | 8.2 |
RTX 2060 | 0.9 | 0.2 | 4.7 | 0.9 | 15.5 | 4.6 | 4.7 | 1.7 | 10.1 | 2.5 | 41.6 | 7.9 |
GTX 1060 | 1.8 | 0.4 | 9.0 | 3.2 | 15.7 | 6.1 | 6.8 | 0.5 | 10.4 | 2.6 | 41.9 | 8.9 |
AMD | ||||||||||||
RX 9070 XT | 0.2 | 0.0 | 1.7 | 0.1 | 7.3 | 0.7 | 3.8 | 0.4 | 15.0 | 2.1 | 60.9 | 7.4 |
RX 9070 | 0.6 | 0.0 | 2.6 | 0.2 | 9.0 | 2.1 | 4.0 | 0.8 | 15.9 | 2.6 | 64.0 | 8.0 |
RX 7900 XTX | 0.4 | 0.0 | 2.6 | 0.2 | 7.3 | 1.2 | 7.7 | 1.0 | 30.9 | 4.4 | 123.6 | 9.7 |
RX 6950 XT | 0.9 | 0.2 | 4.1 | 1.0 | 6.5 | 4.0 | 4.3 | 3.1 | 15.5 | 4.8 | 62.0 | 9.4 |
RX 9060 XT | 0.8 | 0.1 | 3.3 | 0.3 | 13.3 | 1.9 | 4.6 | 1.2 | 14.4 | 2.4 | 58.2 | 9.8 |
RX 7800 XT | 1.0 | 0.1 | 6.1 | 0.4 | 14.5 | 0.9 | 9.5 | 0.9 | 17.7 | 4.2 | 71.1 | 12.5 |
RX 6600 XT | 1.2 | 0.3 | 4.6 | 1.6 | 10.7 | 4.8 | 4.7 | 2.3 | 15.2 | 4.8 | 60.8 | 10.9 |
Apple | ||||||||||||
M4 Max | 1.7 | 0.2 | 4.5 | 0.8 | 12.4 | 4.8 | 21.8 | 2.5 | 87.4 | 10.2 | 351.0 | 15.2 |
M2 Max | 1.9 | 0.3 | 5.0 | 1.3 | 18.4 | 9.5 | 14.6 | 3.4 | 58.8 | 9.6 | 234.5 | 25.1 |
M1 Max | 1.3 | 0.5 | 4.9 | 2.9 | 11.1 | 10.8 | 15.6 | 2.9 | 62.5 | 10.2 | 250.3 | 28.0 |
M3 | 2.9 | 1.0 | 5.1 | 2.3 | 23.1 | 8.0 | 22.0 | 4.1 | 89.0 | 8.4 | 340.3 | 20.5 |
A18 Pro | 10.5 | 6.6 | 13.1 | 10.4 | 61.5 | 15.8 | 33.8 | 18.9 | 102.5 | 17.1 | 432.7 | 55.4 |
A17 Pro | 6.8 | 5.2 | 14.2 | 7.7 | 67.2 | 20.5 | 29.1 | 12.2 | 110.0 | 22.5 | 470.0 | 64.8 |
Intel | ||||||||||||
Iris Xe | 4.4 | 3.6 | 12.5 | 10.6 | 68.4 | 35.3 | 12.9 | 11.1 | 44.7 | 22.8 | 228.3 | 119.2 |
Alder Lake GT2 | 2.9 | 3.8 | 24.5 | 10.4 | 191.5 | 60.6 | 53.1 | 7.4 | 213.8 | 12.8 | 858.3 | 41.5 |
gen-12lp (i9-14900HX) | 3.1 | 6.3 | 24.2 | 12.8 | 156.0 | 78.2 | 15.4 | 12.6 | 70.0 | 49.8 | 280.0 | 199.0 |
gen-12lp (i5-12450H) | 6.5 | 4.2 | 37.1 | 18.8 | 160.4 | 109.1 | 23.1 | 4.8 | 99.0 | 14.1 | 406.1 | 45.3 |
gen-12lp (i7-11800H) | 12.4 | 7.9 | 45.1 | 19.9 | 193.9 | 111.5 | 18.4 | 8.5 | 78.9 | 11.6 | 79.3 | 10.6 |
UHD 730 | 8.1 | 5.5 | 47.5 | 22.5 | 200.0 | 116.5 | 18.8 | 5.7 | 47.7 | 21.7 | 330.0 | 36.5 |
ARM | ||||||||||||
G715s MC10 (Pixel 8 Pro) | 8.3 | 3.2 | 17.2 | 6.1 | 92.6 | 15.6 | 34.3 | 5.1 | 112.9 | 13.6 | 469.1 | 53.1 |
G78 MP20 (Pixel 6) | 6.8 | 2.0 | 13.8 | 6.5 | 57.1 | 11.5 | 21.8 | 7.7 | 102.6 | 15.8 | 452.2 | 54.2 |
G715 MP7 (Pixel 9) | 5.5 | 2.1 | 15.5 | 4.6 | 75.4 | 19.2 | 25.8 | 3.8 | 106.2 | 15.1 | 429.3 | 50.3 |
G710 MP7 (Pixel 7a) | 10.2 | 7.3 | 18.2 | 9.0 | 69.7 | 13.7 | 14.5 | 8.0 | 78.2 | 13.0 | 394.0 | 52.8 |
Qualcomm | ||||||||||||
Adreno X1-85 | 2.4 | 1.4 | 12.8 | 4.7 | 57.8 | 20.1 | 9.4 | 2.7 | 38.0 | 11.4 | 174.4 | 63.9 |
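
To compare the two paths for a given GPU, each row can be split into (raster, compute) pairs and turned into a ratio. The sketch below is only a reading aid: `CASES`, `parse_row`, and `speedup` are hypothetical helpers, and the sample row is the RTX 4080 line copied from the table above. Entries where the compute time rounds to 0.0 are reported as `None` rather than as a ratio.

```python
# Split one table row into (raster_ms, compute_ms) pairs per case and
# compute how many times faster the compute path is than regular raster.
# Helper names are illustrative, not part of the demo.

CASES = ["1M, 90%", "4M, 90%", "16M, 90%", "1M, 1%", "4M, 1%", "16M, 1%"]


def parse_row(row):
    """Return (gpu_name, {case: (raster_ms, compute_ms)}) for one row."""
    cells = [c.strip() for c in row.split("|")]
    cells = [c for c in cells if c]  # drop the empty trailing cell
    name, values = cells[0], [float(v) for v in cells[1:]]
    if len(values) != 2 * len(CASES):
        raise ValueError(f"expected {2 * len(CASES)} timings, got {len(values)}")
    # Even positions are raster timings, odd positions are compute timings.
    pairs = list(zip(values[0::2], values[1::2]))
    return name, dict(zip(CASES, pairs))


def speedup(raster_ms, compute_ms):
    """Raster time divided by compute time, or None if compute is 0.0."""
    if compute_ms == 0.0:
        return None  # timing too small to measure at one decimal place
    return raster_ms / compute_ms


row = "RTX 4080 | 0.8 | 0.1 | 1.2 | 0.6 | 5.3 | 1.2 | 1.6 | 0.3 | 7.2 | 1.1 | 27.3 | 3.7 |"
name, cases = parse_row(row)
ratios = {case: speedup(*pair) for case, pair in cases.items()}

for case, ratio in ratios.items():
    label = "n/a" if ratio is None else f"{ratio:.1f}x"
    print(f"{name} {case}: {label}")
```

Since the timings are given to one decimal place, ratios for the smallest values are very coarse and should be read with that in mind.
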
Thanks to: @ascentress, Andrew Willmott, Arseny Kapoulkine, Benji Smith, Brandon Jones, @Gargaj, @Geegaz, @grapefrukt, @horenmar, Jak Boulton, Javier Arevalo, @kolyasisan, Marcel Wiessler, Mārtiņš Možeiko, Mikko Mononen, @NohatCoder, @Professor_Stevens, Robin van Ee, @rokups, Sascha Willems, @scoopr, Simen Storsveen, Simon Rolfmore, @squirrelbaffler, Steve Anichini, @vfig, for providing some of the above results!