Fast Mobile Shaders or, I did a talk at SIGGRAPH!
Finally, after many years of dreaming, I made it to SIGGRAPH! And not only that, I also did a 1.5-hour talk/course with ReJ. This was the first time Unity had a real presence at SIGGRAPH, and I hope we’ll be more active & visible next time around.
Here it is, 100+ slides with notes: Fast Mobile Shaders (17MB pdf). This isn’t strictly about shaders; there’s info about mobile GPU architectures, general performance, hidden surface removal and so on. Also, graphs with logarithmic scales; can’t go wrong with that!
[...] Fast Mobile Shaders (originally How to Write Fast iPhone and Android Shaders in Unity) [...]
You state that GPU profiling is hard on iOS?
Surely, you haven’t found the OpenGL analyze profiler then, in Xcode 4?
You can see how long each and every glCall took, and will get a lot of suggestions to improve performance.
On iOS it is almost impossible to measure the GPU cost of draw calls because of the deferred nature of the SGX chip… the OpenGL profiler may only display how long the CPU part of the driver took?
@bram: that profiler tells you how much CPU time each draw call took. So it’s good for measuring CPU overhead of the draw calls, but tells you nothing about which draw calls are expensive on the GPU.
This is full of extremely useful information. Thanks for posting!
Thanks for the wonderful paper.
The paper states that Mali does tile-based rendering; however, it’s worth noting that the ARM site claims that the Mali200 through the Mali400 do TBDR.
It would also be interesting to find out some timings of the Mali400 in various tests (Galaxy S 2, Galaxy Tab 7.7, Galaxy Note)!
@Ben: my understanding is that Mali has small tiles (so tiled, but with much smaller tiles than Adreno), but there’s no “deferred” bit: within a tile, all primitives are processed in order. Which is why Mali’s development guides tell you to sort geometry front to back (if it was full TBDR, it wouldn’t need this step much, right?)
Thanks for the clarification, Aras!
I’m looking at the dev guide now, and it seems as if they do use deferred rendering. However, their method of deferring comes at a significant enough performance hit (around a 3 frame lag in their job pipeline) to warrant sorting. Their implementation explains their recommendation to developers to sort geometry for maximum performance.
From the guide (2.6.8 note #2, page 2-13):
“Deferred processing and the pipeline nature of the Mali GPU implementation results in some latency. Typically, three frames are in various stages of processing while a fourth is displayed. At rates below 20 frames per second, you might notice a delay between a button being pushed and observing a reaction in the image. So, if you want a very fast-moving game, either simplify your scene, or optimize processing to get a higher frame rate.”
http://infocenter.arm.com/help/topic/com.arm.doc.dui0363d/DUI0363D_opengl_es_app_dev_guide.pdf
A 3-frame lag at 50 ms/frame would result in a noticeable delay in reaction! It would also be interesting to do some timings with significant overdraw to get a feel for the costs associated with ARM’s tile-based renderer, and compare it to, say, SGX or Adreno. I’m just thinking out loud, not to worry; I am not actually requesting anything!
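To put a number on that delay, here is a rough back-of-the-envelope sketch. It assumes the input-to-display latency is simply the number of frames in flight times the frame time, which is a simplification of the guide's description; the function name and the default of three frames are illustrative assumptions, not anything taken from ARM's documentation.

```python
def pipeline_latency_ms(fps: float, frames_in_flight: int = 3) -> float:
    """Rough latency estimate: frames queued in the pipeline times the frame time.

    Assumption: every frame takes the same time and latency scales linearly
    with the number of frames in flight.
    """
    frame_time_ms = 1000.0 / fps  # e.g. 20 fps -> 50 ms per frame
    return frames_in_flight * frame_time_ms


if __name__ == "__main__":
    for fps in (20, 30, 60):
        print(f"{fps} fps: about {pipeline_latency_ms(fps):.0f} ms of lag")
```

At 20 fps this works out to 3 × 50 ms = 150 ms, while at 60 fps the same three frames cost about 50 ms, which is consistent with the guide's advice to aim for a higher frame rate.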
It’s very hard to get a feel for the strengths/weaknesses of modern mobile GPUs via benchmarks, so your paper was extremely refreshing. Thank you! I wish I could have attended the talk!
@Ben: I still think what they call “deferred” is different from “deferred” in the PowerVR case ;) That was my understanding when talking with Mali engineers at SIGGRAPH as well. Their “deferred” means that they do not immediately process commands; they need to “defer” pixel processing, for example, until after tile assignments are done.
In the PowerVR architecture, however, “deferred” means that pixel shading is deferred until they find the closest fragment in the tile. So for opaque geometry there, the pixel shader is executed exactly once per pixel in the tile, no matter what the level of overdraw or the submission order of the opaque geometry is.
In all other architectures that’s not the case; if hi-z/zcull/early-z and your submission order help to cull the hidden fragments, then great, but in the worst case the pixel shader might be executed for each overdraw layer.
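The difference can be illustrated with a toy model of a single pixel covered by several opaque layers. This is only a sketch of the idea described above, not how any actual GPU is implemented; the function names and the depth values are made up for illustration, and a smaller depth is assumed to mean closer to the camera.

```python
def shader_runs_early_z(depths):
    """Pixel shader runs for one pixel when opaque fragments are depth-tested
    and shaded in submission order (smaller depth = closer to the camera)."""
    nearest = float("inf")
    runs = 0
    for depth in depths:
        if depth < nearest:  # fragment passes the early depth test
            nearest = depth
            runs += 1        # it is shaded, even if a later fragment covers it
    return runs


def shader_runs_deferred(depths):
    """Pixel shader runs when visibility is resolved first and only the
    closest opaque fragment is shaded, regardless of submission order."""
    return 1 if depths else 0


if __name__ == "__main__":
    layers = [1.0, 2.0, 3.0, 4.0]  # four opaque layers covering the pixel
    front_to_back = sorted(layers)
    back_to_front = sorted(layers, reverse=True)
    print("early-z, front to back:", shader_runs_early_z(front_to_back))
    print("early-z, back to front:", shader_runs_early_z(back_to_front))
    print("deferred, any order:   ", shader_runs_deferred(back_to_front))
```

In this model, front-to-back submission lets the depth test reject everything after the first layer, back-to-front submission shades every layer, and the deferred variant shades once in either case.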
Thanks Aras! This makes a ton of sense, and clarifies quite a bit!
[...] on mobile platforms, I recommend reading this PDF, which is a presentation given by Aras at the last SIGGRAPH. It’s high-level material, so [...]