LTGameJam 2009 postmortem

So LTGameJam 2009 is over. I was there as part organizer, part participant, so my views are both biased and incomplete (being an organizer means you have to run around a bit instead of just focusing on making the game).

The theme for the games was “as long as we have each other, we will never run out of problems”. Additionally, games had to be short (5 minutes of play or less), and somehow incorporate one of the words “affectionate”, “patriotic” or “missing”.

(screenshot: Missing Peace)

I worked on a game called Missing Peace. It’s nothing really fancy, does not quite follow the theme and incorporates the above-mentioned words in a cheap way (“just stick it into the title! haha!”). It was probably the most polished game of all the games made there though (for some definition of polish)… too bad it’s not actually fun to play :)

Oh well. I just did not have any interesting ideas and wasn’t particularly inspired, so that’s the result. Probably the burnout from trying to finish Unity 2.5 at work took its toll as well.

Overall, the good parts about this game jam:

  • It was fun (hey, that’s the whole idea)

  • Some very positive progress compared to LTGameJams 2002/2003: more people (20-25, up from 10-15), a much better proportion of artists (about 30%, up from almost zero), more people who don’t know each other, more games made by folks outside of the nesnausk! group :)

  • Some of the ideas that were brainstormed have interesting bits.

  • Did I mention it was fun?

On the downside, I get the feeling that the games made this time were not crazy enough. GameJams are meant to generate totally whacky, crazy and amazing ideas; however this time most of the games used known game mechanics, pretty safe ideas and so on. Have to improve on that next time.

So that’s about it!




Fixed function lighting in vertex shader - how?

Sometime soon I’ll have to implement the fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function rendering and vertex shaders across multiple passes does not guarantee identical transformation results; that requires depth bias or projection matrix tweaks, which lead to various artifacts that annoy people to no end.

I don’t really know why that happens, because it seems that most modern cards don’t have fixed function units, so internally they are running shaders anyway. The DX9 runtime on Vista’s WDDM also seems to only hand shaders to the driver internally. Still, for some reason the precision does not match somewhere…

How should such a task be approached?

My requirements are:

  • Should handle any possible state combination in D3D fixed function T&L.

  • D3D 9.0c, using vertex shader 2.0 is ok. For now I don’t care about OpenGL.

  • No HLSL at runtime. I don’t want to add a megabyte or more to the Unity web player just for HLSL. DX9 shader assembly is ok, because we already have the assembler code.

  • Should work as fast as (or close to) the regular fixed function pipeline.

I looked at ATI’s FixedFuncShader sample. It’s an ubershader approach: one large shader (230 instructions or so) with static VS2.0 branching. It had some obvious places to optimize; I could get it down to 190 or so instructions, kill some rcps and reduce constant storage by 2x.

Still, it did not handle some things in D3D T&L, and had some other issues:

  • It assumes one input UV, one output UV and no texture matrices. This part of T&L gets quite convoluted: any input UV or texgen mode can be transformed by matrices of various sizes and routed into any output UV (see the sketch after this list).

  • It was not using the full T&L lighting model. No biggie here.

  • I haven’t checked with NVShaderPerf or AMD ShaderAnalyzer yet, but last time I checked, the static branch instruction took two clocks on some NVIDIA architecture. So the ubershader approach does not come for free.
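
For concreteness, here is a minimal sketch (purely my illustration, not how the ATI sample or any engine actually stores this) of what would need to be captured per texture stage to express that texcoord routing:

```cpp
// What feeds each texture stage in D3D fixed function T&L: either one of the
// input UV sets (D3DTSS_TEXCOORDINDEX) or a texgen mode, optionally run
// through a texture matrix of 1..4 rows (D3DTTFF_COUNT1..4), and written to
// some output texcoord. Names and layout here are illustrative only.
enum class TexGenMode {
    None,                        // read the inputUVIndex'th input UV set as-is
    CameraSpacePosition,         // D3DTSS_TCI_CAMERASPACEPOSITION
    CameraSpaceNormal,           // D3DTSS_TCI_CAMERASPACENORMAL
    CameraSpaceReflectionVector  // D3DTSS_TCI_CAMERASPACEREFLECTIONVECTOR
};

struct TexCoordRoute {
    TexGenMode texGen;        // texgen mode, or None to read an input UV set
    int        inputUVIndex;  // which input UV set to read when texGen == None
    int        matrixRows;    // 0 = no texture matrix, otherwise 1..4 rows applied
    int        outputUVIndex; // which output texcoord register to write
};
```

A full T&L replacement would hold one of these per active texture stage and pick or compose the matching shader code from it.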

Another thing I’m considering is combining the final shader(s) from assembly fragments, with some simple register allocation.

In T&L shader code there’s only a limited set of could-be-redundant computations, mostly computing the world space position, camera space normal, view vector and so on (those can be used by lighting, texgen or fog). These computations can be explicitly put into separate fragments, and later fragments can just use their results.
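
As a rough illustration of that idea, a fragment could be described like this. The structure, names and assembly snippets below are assumptions for the sake of the example (the “@” register placeholders, c4..c7 as the world matrix and c12 as the camera position are made up), not actual Unity code:

```cpp
#include <string>
#include <vector>

// Each fragment names the shared values it consumes and produces, so e.g. the
// world space position is computed once even if lighting, texgen and fog all need it.
struct ShaderFragment {
    std::string name;                  // e.g. "worldPos", "viewVector"
    std::vector<std::string> inputs;   // shared values consumed (by name)
    std::vector<std::string> outputs;  // shared values produced (by name)
    std::string asmText;               // vs_2_0 assembly with "@name" register placeholders
};

static const ShaderFragment kFragments[] = {
    { "worldPos", {}, {"worldPos"},
      "dp4 @worldPos.x, v0, c4\n"
      "dp4 @worldPos.y, v0, c5\n"
      "dp4 @worldPos.z, v0, c6\n"
      "dp4 @worldPos.w, v0, c7\n" },
    { "viewVector", {"worldPos"}, {"viewVec"},
      "add @viewVec.xyz, c12, -@worldPos\n"   // camera position minus world position
      "nrm @viewVec.xyz, @viewVec\n" },       // or an explicit dp3/rsq/mul sequence
    // ... lighting, texgen and fog fragments would list their dependencies the same way
};
```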

What is left then is some register allocation. A shader assembly fragment might need some temporary registers for internal use (this is simple: just give it a bunch of unused registers), some registers as input (from previous fragments), and some registers to store its outputs for later fragments.
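
A correspondingly simple allocator might look like this; again just a sketch under the assumptions above. vs_2_0 guarantees 12 temporary registers (r0..r11), which should be plenty for T&L-sized shaders:

```cpp
#include <map>
#include <string>

// Hand out r# registers: each named shared value keeps one register for the
// whole shader, and per-fragment scratch registers start above everything
// currently allocated (so the next fragment can reuse them).
struct RegisterAllocator {
    int nextFree = 0;                       // next unallocated r# index
    std::map<std::string, int> sharedRegs;  // shared value name -> r# index

    // Register holding a named shared value; allocated on first request.
    int getShared(const std::string& name) {
        auto it = sharedRegs.find(name);
        if (it != sharedRegs.end())
            return it->second;
        return sharedRegs[name] = nextFree++;
    }

    // Base index for a fragment's internal scratch registers.
    int scratchBase() const { return nextFree; }
};
```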

Again, I haven’t checked with shader performance tools, but I think, guess and hope that the drivers do additional register allocation, liveness analysis etc. when converting D3D shader bytecode into the hardware format. This would mean I can be quite sloppy with it, i.e. I don’t have to implement some super smart allocation scheme.

I wrote some experimental code for the shader assembly combiner and so far it looks like a reasonable approach (and not too hard either).
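
To show how the pieces could fit together, here is a hypothetical combine step reusing the ShaderFragment and RegisterAllocator sketches above; dependency ordering, scratch register renaming and error handling are left out:

```cpp
#include <string>
#include <vector>

// Replace every occurrence of 'from' in 'text' with 'to'.
static void ReplaceAll(std::string& text, const std::string& from, const std::string& to)
{
    for (size_t pos = text.find(from); pos != std::string::npos;
         pos = text.find(from, pos + to.size()))
        text.replace(pos, from.size(), to);
}

// Walk fragments in dependency order and rewrite "@name" placeholders into
// the r# registers handed out by the allocator.
std::string CombineFragments(const std::vector<const ShaderFragment*>& orderedFragments,
                             RegisterAllocator& regs)
{
    std::string result = "vs_2_0\n";
    for (const ShaderFragment* frag : orderedFragments)
    {
        std::string code = frag->asmText;
        // Inputs and outputs are both shared values; each gets one stable register.
        for (const std::string& name : frag->inputs)
            ReplaceAll(code, "@" + name, "r" + std::to_string(regs.getShared(name)));
        for (const std::string& name : frag->outputs)
            ReplaceAll(code, "@" + name, "r" + std::to_string(regs.getShared(name)));
        result += code;
    }
    return result;
}
```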

Does that make sense? Or did everyone solve those problems eons ago already?

Edit: half a year later, I wrote a technical report on how I implemented all this: aras-p.info/texts/VertexShaderTnL.html


Quote of the day

Somewhat amusing quote from gamedeff.com:

Дешевая популярность в тяжелые времена не мешает, поэтому в блог срать надо почаще (всем, кстати, рекомендую).

Roughly: “Cheap popularity doesn’t hurt in hard times, so you should crap into your blog more often (I recommend it to everyone, by the way).”

Preemptive note: Google Translate does not quite cope with it.