Implementing fixed function T&L in vertex shaders

Almost half a year ago I was wondering how to implement T&L in vertex shaders.

Well, I finally implemented it for the upcoming Unity 2.6, and wrote a sort of technical report here.

In short, I’m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. On the several cards I tried (Radeon HD 3xxx, GeForce 8xxx, Intel GMA 950), performance is very similar to using fixed function (which, I know, the runtime/driver implements internally as vertex shaders anyway).
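To make "combining assembly fragments with simple temporary register allocation" a bit more concrete, here is a minimal sketch of the idea. It is not Unity's code: the `$name` placeholder syntax, the `combine_fragments` function and the register/constant names in the usage lines are all hypothetical. Fragments are concatenated, each named temporary gets a free `rN` register on first use, and that register goes back to the pool right after the last instruction that mentions it. The default limit of 12 matches the r0..r11 temporaries of vs_2_0.

```python
import re

# Hypothetical placeholder syntax: "$name" marks a named temporary register.
TEMP_RE = re.compile(r"\$(\w+)")


def combine_fragments(fragments, max_temps=12):
    """Concatenate assembly fragments and map named temporaries to r0..rN.

    A register returns to the free pool right after the last instruction that
    mentions its temporary, so later fragments can reuse it.
    """
    lines = [line for fragment in fragments for line in fragment]

    # Index of the last line that references each temporary.
    last_use = {}
    for i, line in enumerate(lines):
        for name in TEMP_RE.findall(line):
            last_use[name] = i

    free = list(range(max_temps))  # free register numbers, lowest first
    assigned = {}                  # temporary name -> register number
    out = []
    for i, line in enumerate(lines):
        names = TEMP_RE.findall(line)
        # Allocate a register on a temporary's first appearance.
        for name in names:
            if name not in assigned:
                if not free:
                    raise RuntimeError("out of temporary registers")
                assigned[name] = free.pop(0)
        out.append(TEMP_RE.sub(lambda m: "r%d" % assigned[m.group(1)], line))
        # Release registers whose temporaries are dead after this line.
        for name in set(names):
            if last_use[name] == i:
                free.append(assigned[name])
        free.sort()
    return out


# Usage: a position-transform fragment followed by a color fragment.
# Register and constant assignments (v0, v5, c0..c4) are illustrative only.
transform = ["dp4 $pos.x, v0, c0", "dp4 $pos.y, v0, c1",
             "dp4 $pos.z, v0, c2", "dp4 $pos.w, v0, c3", "mov oPos, $pos"]
color = ["mul $col, v5, c4", "mov oD0, $col"]
for line in combine_fragments([transform, color]):
    print(line)  # $pos and $col both end up in r0, since their lifetimes don't overlap
```

Because allocation is driven purely by first and last use in the combined instruction stream, fragments can be written independently and still share a small register budget.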

What was unexpected: the most complex piece is not the vertex lighting! Most of the complexity is in routing or generating texture coordinates and transforming them. There’s a huge combinatorial explosion there.
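A quick way to see the combinatorial explosion is to count setups. The option lists below are hypothetical and simplified, loosely modeled on D3D9's texture coordinate modes (pass-through, camera-space normal/position/reflection vector, sphere map) plus a choice of texture matrix handling; the real fixed-function state has more knobs. Even so, 6 sources times 3 transform choices gives 18 options per stage, and 18^4 = 104,976 setups for four stages. That is far too many to precompile, which is why generating shaders on demand and caching them by a packed state key makes sense. `state_key` and `get_shader` are illustrative names, not an existing API.

```python
from itertools import product

# Hypothetical, simplified per-stage options (not the full fixed-function state).
TEXGEN_MODES = ("uv0", "uv1", "camera_normal", "camera_position",
                "camera_reflection", "sphere_map")
TRANSFORMS = ("none", "matrix", "matrix_projected")

# Every (source, transform) pair a single texture stage can use.
STAGE_CHOICES = list(product(TEXGEN_MODES, TRANSFORMS))


def total_variants(num_stages):
    """Number of distinct texture-coordinate setups for num_stages stages."""
    return len(STAGE_CHOICES) ** num_stages


def state_key(stages):
    """Pack per-stage (mode, transform) choices into a compact cache key.

    The stage count is kept alongside the packed integer so that setups with
    different numbers of stages never collide.
    """
    key = 0
    for mode, transform in stages:
        code = TEXGEN_MODES.index(mode) * len(TRANSFORMS) + TRANSFORMS.index(transform)
        key = key * len(STAGE_CHOICES) + code
    return len(stages), key


def get_shader(stages, cache, generate):
    """Generate the shader for a setup on first use and reuse it afterwards."""
    key = state_key(stages)
    if key not in cache:
        cache[key] = generate(stages)
    return cache[key]


print(total_variants(1))  # 18
print(total_variants(4))  # 104976
```

Only the setups a scene actually uses ever get generated; the rest of the 104,976 never cost anything.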

Otherwise – I like! Here’s a link to the article again.

4 Responses to 'Implementing fixed function T&L in vertex shaders'

  1. steve

    Nice work! I think I would have done it as a Cg/HLSL fragment generator (and maybe cached compiled assembler in GL and bytecode in D3D), but that’s because I’ve totally gone off working with assembler these days (it’s the age). Your way is clearly more hardcore :)
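The alternative steve describes (generate Cg/HLSL source per fixed-function state, then cache the compiled result) could look roughly like this minimal sketch. The state is a hypothetical `(lighting_enabled, num_lights)` tuple, only diffuse directional lights are handled, normals and light directions are assumed to be in the same space, and `compile_fn` stands in for whatever Cg or HLSL compiler call an engine actually uses.

```python
def generate_source(state):
    """Build HLSL vertex shader source for a (lighting_enabled, num_lights) state.

    Simplification: diffuse directional lights only; lightDirN is assumed to
    point toward the light and to be in the same space as the normal.
    """
    lighting, num_lights = state
    lines = ["float4x4 mvp;"]
    if lighting:
        for i in range(num_lights):
            lines.append("float3 lightDir%d;" % i)
            lines.append("float3 lightColor%d;" % i)
    lines += [
        "struct VOut { float4 pos : POSITION; float4 col : COLOR0; };",
        "VOut main(float4 pos : POSITION, float3 n : NORMAL, float4 c : COLOR0) {",
        "  VOut o;",
        "  o.pos = mul(pos, mvp);",
    ]
    if lighting:
        lines.append("  float3 lit = 0;")
        for i in range(num_lights):
            lines.append("  lit += lightColor%d * saturate(dot(n, lightDir%d));" % (i, i))
        lines.append("  o.col = float4(lit, 1) * c;")
    else:
        lines.append("  o.col = c;")
    lines += ["  return o;", "}"]
    return "\n".join(lines)


class ShaderCache:
    """Compile each distinct state once and hand back the cached result."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # placeholder for the real compiler call
        self.compiled = {}
        self.compile_count = 0

    def get(self, state):
        if state not in self.compiled:
            self.compiled[state] = self.compile_fn(generate_source(state))
            self.compile_count += 1
        return self.compiled[state]
```

The trade-off versus stitching assembly directly is that the high-level compiler handles register allocation and scheduling, at the cost of running a full compiler whenever a new state shows up (hence the cache).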

  2. Fabian "ryg" Giesen

    Just for the sake of showing how a bytecode-based solution looks, I’ve put the shader generator code that was used for debris (and also for the never-materialized kkrieger final and a couple of other things) online here: http://www.farbrausch.de/~fg/code/shadergen/. It’s basically a custom shader assembler that supports flow control at the macro-assembler level. Everything is stored as shader bytecode, with some new opcodes for the flow control instructions. This was mainly for size reasons: the only thing needed on the app side is shadercodegen.cpp, which is comparatively tiny (it boiled down to about 880 bytes of code in the final executable), somewhat at the expense of readability. That’s also the part that resolves flow control and register allocation, and even though it has some limitations you need to work around at the source level, I’m still quite fond of it. :)
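Resolving compile-time flow control at load time, as described above, amounts to walking the instruction stream and dropping whatever sits in branches that aren't taken. The sketch below does not reproduce the actual encoding in shadercodegen.cpp. It assumes a hypothetical tuple-based instruction list where `("if", lo, hi, value)` takes its branch when bits `lo..hi` of a parameter word equal `value`, with matching `("else",)` and `("endif",)` markers and support for nesting.

```python
def resolve_flow_control(program, params):
    """Strip compile-time if/else/endif markers from an instruction list.

    ("if", lo, hi, value) takes its branch when bits lo..hi (inclusive) of
    params equal value. Nested blocks are handled with a stack.
    """
    out = []
    stack = []     # per open "if": (parent_active, branch_taken)
    active = True  # whether instructions at the current point are emitted
    for instr in program:
        op = instr[0]
        if op == "if":
            _, lo, hi, value = instr
            field = (params >> lo) & ((1 << (hi - lo + 1)) - 1)
            taken = field == value
            stack.append((active, taken))
            active = active and taken
        elif op == "else":
            parent_active, taken = stack[-1]
            active = parent_active and not taken
        elif op == "endif":
            active, _ = stack.pop()
        elif active:
            out.append(instr)
    if stack:
        raise ValueError("unterminated if block")
    return out


program = [
    ("mov", "r0", "v0"),
    ("if", 0, 1, 2),           # taken when bits 0..1 of params == 2
    ("mul", "r0", "r0", "c0"),
    ("else",),
    ("add", "r0", "r0", "c1"),
    ("endif",),
    ("mov", "oPos", "r0"),
]
print(resolve_flow_control(program, 0b10))  # keeps the "mul" branch
print(resolve_flow_control(program, 0b01))  # keeps the "add" branch
```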

    I’ve also thrown in the sources for the “ubershader” used in kkrieger (long before that term was coined) and the tons-of-permutations multipass lighting stuff used in debris to show some examples. I certainly don’t miss writing these, but still, merely having variable names with automatic register allocation made them an order of magnitude more useful than “plain” ASM shaders; the flow-control with automatic bitfields on everything (that’s what the [16..19] after a variable name means, it’s bits 16 through 19) is admittedly weird, but that allowed me to pass the material parameters to the shader compiler directly in the bitpacked format in which it was stored. Not very clean, but it got rid of a whole translation layer, which was lots of code. (This kind of fakery is precisely why I’m sick of size-optimizing!)
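The `[16..19]` notation (bits 16 through 19, inclusive) is easy to mimic, and it shows why passing the bitpacked material word straight to the shader generator removes a translation layer: the generator just reads fields out of the same integer the material is stored in. `parse_field`, `read_field` and `pack_field` are hypothetical helpers for illustration, not functions from the farbrausch sources.

```python
import re

# Matches references like "flags[16..19]" (inclusive bit range).
FIELD_RE = re.compile(r"^(\w+)\[(\d+)\.\.(\d+)\]$")


def parse_field(ref):
    """Parse 'name[lo..hi]' into (name, lo, hi) with inclusive bounds."""
    m = FIELD_RE.match(ref)
    if not m:
        raise ValueError("not a bitfield reference: %r" % ref)
    name, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    if lo > hi:
        raise ValueError("empty bit range in %r" % ref)
    return name, lo, hi


def read_field(variables, ref):
    """Extract the referenced bits from a packed integer variable."""
    name, lo, hi = parse_field(ref)
    width = hi - lo + 1
    return (variables[name] >> lo) & ((1 << width) - 1)


def pack_field(word, lo, hi, value):
    """Return word with bits lo..hi replaced by value."""
    width = hi - lo + 1
    if value < 0 or value >> width:
        raise ValueError("value does not fit in %d bits" % width)
    mask = ((1 << width) - 1) << lo
    return (word & ~mask) | (value << lo)


material = pack_field(0, 16, 19, 0b1011)            # store 11 in bits 16..19
print(read_field({"flags": material}, "flags[16..19]"))  # 11
```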

  3. Fabian "ryg" Giesen

    Whoops, just noticed: view the files with a tab width of 2, or they’re screwed up (sorry, I should’ve cleaned this up before uploading).

  4. Aras Pranckevičius

    @ryg: that’s beyond awesome. Thanks for sharing!
