Pair programming / animations

Tried out pair programming the other day. I can definitely see it working, especially on hard topics (i.e. where you spend lots of time thinking, explaining, arguing and brainstorming rather than just typing in code). I'm still not sure whether it really suits "ordinary" day-to-day programming though.

The topic Joe and I tried it on was a pretty hard one - related to the core animation system. Now, of course I don't know anything about animation systems, but my impression is that there's just no "universal" way of designing one. The ones floating around inside free/open engines/libraries (cal3d, nebula2) are quite fine, but not much more impressive than my own very simplistic attempts at doing animations. There's nothing wrong with that of course - it's simple, it gets the job done, and it's okay in most cases when you're doing simple stuff.

But then, if you want something more advanced, you either have to go and get the big serious libraries, or just… well… not do it.

So, back to pair programming - making a core animation system that would have transitions, continuous blends, animation layers, bone masks and whatnot (and the kitchen sink of course!) is just not very easy. We paired mostly on writing the pseudocode of the system, or some sort of outline; changed the implementation several times along the way, and in the end we were left with a really nice and fast system whose code is actually quite simple. Much of the credit goes to Joe, as he figured out some really cool ways to optimize the expensive parts away (by that point we weren't pair programming anymore - I had gone off to do some research on shadows!)
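
For illustration only, here's roughly the shape a layered blend with bone masks could take - a minimal sketch, not our actual code; the Transform/AnimLayer types and the nlerp-based Lerp helper are all made up for the example:

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Hypothetical types for the sketch - not the actual system.
    struct Transform { float pos[3]; float rot[4]; }; // translation + quaternion

    // Blend two transforms: lerp the translation, nlerp the rotation.
    static Transform Lerp(const Transform& a, const Transform& b, float t)
    {
        Transform r;
        for (int i = 0; i < 3; ++i)
            r.pos[i] = a.pos[i] + (b.pos[i] - a.pos[i]) * t;
        float dot = 0.0f;
        for (int i = 0; i < 4; ++i)
            dot += a.rot[i] * b.rot[i];
        float sign = dot >= 0.0f ? 1.0f : -1.0f; // take the shortest path
        float len = 0.0f;
        for (int i = 0; i < 4; ++i) {
            r.rot[i] = a.rot[i] + (b.rot[i] * sign - a.rot[i]) * t;
            len += r.rot[i] * r.rot[i];
        }
        len = std::sqrt(len);
        for (int i = 0; i < 4; ++i)
            r.rot[i] /= len;
        return r;
    }

    struct AnimLayer {
        std::vector<Transform> pose;   // sampled pose, one transform per bone
        std::vector<float>     mask;   // per-bone weight (the bone mask), 0..1
        float                  weight; // layer weight, driven by transitions/blends
    };

    // Blend the layers bottom-up into the final pose (assumes at least one layer).
    void EvaluateLayers(const std::vector<AnimLayer>& layers, std::vector<Transform>& out)
    {
        out = layers[0].pose; // base layer
        for (std::size_t i = 1; i < layers.size(); ++i) {
            const AnimLayer& l = layers[i];
            for (std::size_t bone = 0; bone < out.size(); ++bone) {
                float w = l.weight * l.mask[bone];
                if (w > 0.0f)
                    out[bone] = Lerp(out[bone], l.pose[bone], w);
            }
        }
    }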

But to reiterate - pairing can definitely work. I guess mostly because the other person just keeps asking "why are you doing this?" or saying "this is wrong", "we're in deep shit now" or "that's awesome" :)


Switched jobs - (almost) back in this crazy industry

Switched to a new job recently. So here I am, sitting in the OTEE office in Copenhagen, working on the Unity game development tool. Not exactly game development, but quite related. So far so good; we'll see how it goes. If you haven't yet - take a look at Unity, it is good and will only get better.


Zen bondage!

Check out a small game by demogroup Moppi: Zen Bondage.

Zen Bondage is a puzzle game about control. The motivation was to try to make a puzzle game which evokes adult emotions.

However, it fails to evoke any “adult emotions” in me. Maybe I’m just not into this bondage thing :)

I like the simplicity/purity of the idea, and it's executed very well.


64k coding continued

I'm making steady but very slow progress on "my" 64k intro. Over the last week I couldn't get past 13 kilobytes, so you can see the progress really is slow. Not because I'm not writing any code, but because all the code growth was cancelled out by data size optimizations.

So far, coding and designing data for small sizes isn't that much pain at all. Just, well, code, and, well, keep your data small :) We're only talking about the size of the initial data though, not the runtime size.

A few obvious or new notes:

  • Code to construct a cylinder is more complex than the one to construct a sphere. That’s what I expected. However, code to construct a box with multiple segments per side is the most complex of all!

  • Dropping the last byte from floats is usually okay - and an instant 25% saving! For some of the numbers, I plan to switch to a half-style float (2 bytes) if space becomes a concern. (There's a small packing sketch after this list.)

  • Storing quaternions in 4 bytes (a byte per component) works well. Actually, now that I think of it, it makes more sense to store three components at 10 bits each and just store the sign of the 4th component - better precision for the same size. (Sketch after the list.)

  • This intro literally has the most complex and most automated "art pipeline" of any demo/game I've (directly) worked on! I've got maxscripts generating C++ sources, custom command-line tools preprocessing C++ sources (mostly float packing - due to lack of maxscript functionality), lua scripts for batch-compiling HLSL shaders, "development code" generating .obj models for import back into max, etc. It's wacky, weird and cool!

  • Compiling HLSL in two steps (HLSL->asm, then asm->bytecode) instead of directly (HLSL->bytecode) gets rid of the constant table and some copyright strings, and hence is good (thanks blackpawn!). (Sketch after the list.)

  • Getting FFD code to behave even remotely like 3dsmax's FFD is hard :)
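
To make the float-dropping note concrete, a minimal sketch (assuming little-endian IEEE 754 floats, as on x86; the PackFloat24/UnpackFloat24 names are made up for the example):

    #include <cstdio>
    #include <cstring>

    // Drop the least significant mantissa byte of a float (little-endian
    // IEEE 754 assumed): 3 bytes instead of 4, an instant 25% saving.
    void PackFloat24(float f, unsigned char out[3])
    {
        unsigned char b[4];
        std::memcpy(b, &f, 4);
        out[0] = b[1]; out[1] = b[2]; out[2] = b[3]; // b[0] is lost
    }

    float UnpackFloat24(const unsigned char in[3])
    {
        unsigned char b[4] = { 0, in[0], in[1], in[2] }; // lost byte reads as zero
        float f;
        std::memcpy(&f, b, 4);
        return f;
    }

    int main()
    {
        unsigned char packed[3];
        PackFloat24(3.14159265f, packed);
        std::printf("%g\n", UnpackFloat24(packed)); // prints ~3.14157, close enough
    }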
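And a sketch of the 10-10-10-plus-sign quaternion idea (again just an illustration, assuming a unit quaternion with w as the 4th component; PackQuat/UnpackQuat are made-up names). Three components at 10 bits plus the sign of w is 31 bits, so it still fits in 4 bytes:

    #include <cmath>
    #include <cstdint>

    // Pack a unit quaternion into 31 bits: x,y,z at 10 bits each
    // (mapped from [-1,1]) plus the sign bit of w.
    uint32_t PackQuat(const float q[4])
    {
        uint32_t r = q[3] < 0.0f ? 1u : 0u; // sign of w in the lowest bit
        for (int i = 0; i < 3; ++i) {
            uint32_t v = (uint32_t)((q[i] * 0.5f + 0.5f) * 1023.0f + 0.5f);
            r |= v << (1 + i * 10);
        }
        return r;
    }

    void UnpackQuat(uint32_t p, float q[4])
    {
        float sum = 0.0f;
        for (int i = 0; i < 3; ++i) {
            uint32_t v = (p >> (1 + i * 10)) & 1023u;
            q[i] = (v / 1023.0f) * 2.0f - 1.0f;
            sum += q[i] * q[i];
        }
        // Reconstruct w from the unit length constraint, restore its sign.
        float w2 = 1.0f - sum;
        q[3] = w2 > 0.0f ? std::sqrt(w2) : 0.0f;
        if (p & 1u) q[3] = -q[3];
    }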
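The two-step shader compile can also be done in-process via D3DX instead of separate command-line tools - a rough sketch along those lines (error handling omitted; treat the buffer-size handling as an assumption of mine):

    #include <d3dx9shader.h>
    #pragma comment(lib, "d3dx9.lib")

    // Two-step compile: HLSL -> asm listing -> bytecode. Re-assembling the
    // disassembled listing drops the embedded constant table and copyright
    // comments, so the bytecode ends up smaller. Error handling omitted.
    LPD3DXBUFFER CompileSmall(const char* src, UINT srcLen,
                              const char* entry, const char* profile)
    {
        LPD3DXBUFFER code = 0, asmText = 0, stripped = 0;
        D3DXCompileShader(src, srcLen, 0, 0, entry, profile, 0, &code, 0, 0);
        D3DXDisassembleShader((const DWORD*)code->GetBufferPointer(), FALSE, 0, &asmText);
        D3DXAssembleShader((const char*)asmText->GetBufferPointer(),
                           asmText->GetBufferSize() - 1, // assumes NUL-terminated text
                           0, 0, 0, &stripped, 0);
        asmText->Release();
        code->Release();
        return stripped; // caller releases
    }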

The best thing so far is that I've got the music track from x_dynamics - it's already done in the V2 synth, takes a small amount of space and is really good. Now I "just" have to finish the intro…


Speculation: pipelining geometry shaders

A follow-up to the older "discussion" about how/why geometry shaders would be okay/slow:

Graphics hardware has been quite successful so far at hiding memory latencies (e.g. when sampling textures). It does so (according to my understanding) by having a looong pixel pipeline, where hundreds (or thousands) of pixels can be at one processing stage or another. ATI talks about this in big letters (the R520 dispatch processor), and speculation suggests that the GeForceFX had something like that too (article). I have no idea about older cards, but presumably they did something similar as well.

I'm not sure how vertex texture fetches are pipelined - the pretty slow performance on GeForce 6/7 suggests that they aren't :) Probably vertex shaders on current cards operate in a simpler way - just fetch the vertices and run the whole shader on them (in contrast to pixel shaders, which seem to run just several instructions, switch to other pixels, come back, etc.).

With DX10, we have arbitrary memory fetches in any stage of the pipeline. Even the boundary between different fetch types is somewhat blurry (constant buffers vs. arbitrary buffers vs. textures) - perhaps they will differ only in bandwidth/latency (e.g. constant buffers live near the GPU while textures live in video memory).

So, with arbitrary memory fetches anywhere (and some of them being high latency), everything needs long pipelines (again, just my guess). That's all great, but the longer the pipeline, the worse it performs in unfriendly scenarios: a pipeline flush is more expensive, drawing just a couple of "things" (primitives, vertices, pixels) is inefficient, etc.

I guess we’ll just learn a new set of performance rules for tomorrow’s hardware!

Back to GS pipelining: I imagine the "slow" scenarios would go like this: the vertices have shaders with dynamic branches or memory fetches that vastly differ in execution length - so the GS has to wait for all vertex shaders of the current primitive (optional: plus topology) to finish; and then each GS invocation has dynamic branches or memory fetches of its own, and outputs a different number of primitives to the rasterizer. If I were hardware, I'd be scared :)