Depth bias and the power of deceiving yourself

In Unity we very often mix fixed function and programmable vertex pipelines. In our lighting model, some amount of brightest lights per object are drawn in pixel lit mode, and the rest are drawn using fixed function vertex lighting. Naturally the pixel lights most often use vertex shaders, as they want to calculate some texcoords for light cookies, or do something with tangent space, or calculate some texcoords for shadow mapping, and so on. The vertex lighting pass uses fixed function, because it’s the easiest way. It is possible to implement fixed function lighting equivalent in vertex shaders, but we haven’t done that yet because of complexities of Direct3D and OpenGL, the need to support shader model 1.1 and various other issues. Call me lazy.

And herein lies the problem: most often precision of vertex transformations is not the same in fixed function versus programmable vertex pipelines. If you’d just draw some objects in multiple passes, mixing fixed function and programmable paths, this is roughly what you will get (excuse my programmer’s art):
Mixing fixed function and vertex shaders

Not pretty at all! This should have looked like this:
All good here

So what do we do to make it look like this? We “pull” (bias) some rendering passes slighly towards the camera, so there is no depth fighting.

Now, at the moment Unity editor runs only on the Macs, which use OpenGL. In there, most of hardware configurations do not need this depth bias at all - they are able to generate same results in fixed function and programmable pipelines. Only Intel cards do need the depth bias on Mac OS X (on Windows, AMD and Intel cards need depth bias). So people author their games using OpenGL, where it does not need depth bias in most cases.

How do you apply depth bias in OpenGL? Enable GL_POLYGON_OFFSET_FILL and set glPolygonOffset to something like -1, -1. This works.

How do you apply depth bias in Direct3D 9? Conceptually, you do the same. There are DEPTHBIAS and SLOPESCALEDEPTHBIAS render states that do just that. And so we did use them.

And people complained about funky results on Windows.

And I’d look at their projects, see that they are using something like 0.01 for camera’s near plane and 1000.0 for the far plane, and tell them something along the lines of “increase your near plane, stupid!” (well ok, without the “stupid” part). And I’d explain all the above about mixing fixed function and vertex shaders, and how we do depth bias in that case, and how on OpenGL it’s often not needed but on Direct3D it’s pretty much always needed. And yes, how sometimes that can produce “double lighting” artifacts on close or intersecting geometry, and how the only solution is to increase the near plane and/or avoid close or intersecting geometry.

Sometimes this helped! I was so convinced that their too-low-near-plane was always the culprit.

And then one day I decided to check. This is what I’ve got on Direct3D:
Depth bias artefacts

Ok, this scene is intentionally using a low near plane, but let me stress this again. This is what I’ve got:
Epic fail!

Not good at all.

What happened? It happened in roughly this way:

  1. First, depth bias documentation on Direct3D is wrong. Depth bias is not in 0..16 range, it is in 0..1 range which corresponds to entire range of depth buffer.
  2. Back then, our code was always using 16 bit depth buffers, so the equivalent of -1,-1 depth bias in OpenGL was multiplied with something like 1.0/65535.0, and that was fed into Direct3D. Hey, it seemed to work!
  3. Later on, the device setup code was modified to do proper format selection, so most often it ended up using 24 bit depth buffer. Of course no one I never modified the depth bias code to account for this change…
  4. And it stayed there. And I kept deceiving myself that the content of the users is to blame, and not some stupid code of mine.

It’s good to check your assumptions once in a while.

So yeah, the proper multiplier for depth bias on Direct3D with 24 bit depth buffer should be not 1.0/65535.0, but something like 1.0/(2^24-1). Except that this value is really small, so something like 4.8e-7 should be used instead (see Lengyel’s GDC2007 talk). Oh, but for some reason it’s not really enough in practice, so something like 2.0*4.8e-7 should be used instead (tested so far on GeForce 8600, Radeon HD 3850, Radeon 9600, Intel 945, reference rasterizer). Oh, and the same value should be used even when a 16 bit depth buffer is used; using 1.0/65535.0 multiplier with 16 bit depth buffer produces way too large bias.

With proper bias values the image is good on Direct3D again. Yay for that (fix is coming in Unity 2.1 soon).

…and yes, I know that real men fudge projection matrix instead of using depth bias… someday maybe.

OpenCL?

Okay, so Apple just announced OpenCL (Open Computing Language) technology in upcoming OS X 10.6. This is starting to get interesting.

My prediction? OpenCL should be something along lines of CUDA or BrookGPU. Will work on various DX10-level graphics cards, and on the CPU. I think trying to target older graphics cards does not make sense - using real actual integer types is useful in general purpose computing (DX10 tech), and Apple will probably only be shipping DX10 level graphics cards in a year (at the moment only Intel cards in Macs are DX9 level; the rest is GeForce 8s and Radeon HDs). With a multithreaded CPU fallback any older machines will be taken care of anyway (and leaves the future open for Larrabees). So yeah, quite similar to BrookGPU actually.

It has “open” in the title, so maybe they will make it for other platforms as well. I doubt that they will ship implementation though; perhaps just make it royalty/patent/whatever free and publish the spec. Which is about the same level of “openness” as other technologies with “open” in their name (OpenGL, OpenAL, OpenMP, OpenCV, …) - not exactly open, but not the worst kind either.

Oh, and suddenly there are new uses for other technologies recently developed at Apple, like LLVM or clang.

We’ll see how it goes.

The problem with Vista

Jeff Atwood notes the lack of polish in Windows Vista UI. Long Zheng has started Windows UI Taskforce. I agree - Vista’s UI has tons of polish problems.

You know, little things that would seem unimportant, but screams something like “I was made in a hurry by people who don’t really love me”. Aliased shield icon overlays? Check. Horrible screen flickering when logging in or UAC prompt pops up? Check. The infamous Shut Down menu? Check. Awful file copy progress dialogs? Check. Explorer window title bar sometimes displaying green progress bar inside of it for some reason I can never understand? Check. General lack of unified style for UI? Check. The list goes on.

But still, I wonder whether lack of polish is the real problem with Vista. From my point of view, lack of direction or lack of vision seems to be a problem of similar size, if not larger. What is the vision for Vista?

“Security!” is not a vision. However hard it is to make something secure, “more security” is an improvement in one area, and not a vision on what a product should be. And second, “security” does not explain everything else about Vista. At start, it looked like some architecture astronauts had some fancy visions, like “all your filesystem is a database now!”… Well, that did not end up in Vista, and it is something that users genuinely don’t care about.

I might sound like an Apple fanboy (and indeed, OS X grows on you after a while), but when upgrading from OS X 10.4 (Tiger) to 10.5 (Leopard) I had a pretty clear list of what will be more useful to me:

  • New version feels faster (on the same machine). I am not sure if it is actually faster; or it’s only a perceived improvement. Maybe they optimized something, maybe they multithreaded something, I don’t really care. It feels faster and smoother. That’s good.
  • Quick Look is amazing. A seemingly simple feature - press Space over a file to preview it. With added polish, like when pressing space over multiple images selected, you can go into slideshow mode. Simple, yet highly effective.
  • Spotlight (desktop search) that is fast.
  • …and so on.

Those are things I, as a user, care about. I want computer to feel faster. I want to instantly preview files. I want to search for something fast.

A filesystem that is a database? I can almost see the regular user salivating over that… Yeah right. Users don’t want a platform, users want useful features.

And this is where Vista fails - it does not have obvious new useful features or improvements. Aside from Direct3D 10 - which I am not using yet - all so called “improvements” just feel like gimmicks.

  • It feels slower (I don’t care whether it actually is faster but just feels sluggish). And yes, it feels slower on a quad-core CPU with 4 gigs of RAM and a fast graphics card, so no “Vista runs circles around XP on a new box” please.
  • The reorganized menus, title bars and layout of Explorer just scream “I totally don’t understand what users need” at you. Previews are too small to be useable, organization of menus and buttons is horrible, and the constantly fading-in-and-fading-out user interface elements (folders tree view) are just distracting. I dig the new Office 2007 UI and I can see some understanding of users and vision behind it (see Jensen Harris), but Vista’s UI feels like it was designed by a bunch of people who never talked to each other. And it’s not just lack of polish, the “design” of it is wrong.
  • The Sidebar? Again, an attempt at doing something that seemed good, but without any understanding. Yes, I know that Apple might have taken the idea and implemented it right, but that does not leave Sidebar as being useful.
  • The new skin? Oh come on. How many users did upgrade because window close buttons now glow in red when you hover over them?
  • Was there anything else new in Vista? I didn’t notice anything.

So this pretty much sums up my view on Vista. Zero new useful things, many annoyances. Microsoft, here’s you chance to execute it better next time around.

Amazing! Demoscene news that actually make sense!

There’s a news item on next-gen.biz on Plastic’s Linger in Shadows PS3 demo.

What is totally amazing, is that the news item does actually make sense. It does not treat the demo as a game, or as some “what the f?” thing. Kudos.

About the demo itself - I totally dig the insane amount of work put in there; but I was quite confused with “story” or “meaning”. The visuals are good, the tech is good, it is impressive, but the message of the demo I just did not get. But still great work, go Plastic!

Argh MFC!

When introductory documentation for something has this, you know it won’t be pretty:

CAsyncMonikerFile is derived from CMonikerFile, which in turn is derived from COleStreamFile. A COleStreamFile object represents a stream of data; a CMonikerFile object uses an IMoniker to obtain the data, and a CAsyncMonikerFile object does so asynchronously.

So yeah, I am dealing with downloading something from the internet inside an ActiveX control that is written in MFC. A seemingly simple task - I give you an URL, you give me back the bytes. But no! That would not be a proper architecture, so instead it has asynchronous monikers which are based on monikers which are based on stream files which use some interfaces and whatnot. And for ActiveX controls the docs suggest using CDataPathProperty or CCachedDataPathProperty, which are abstractions build on top of the above crap. And I don’t even know what “a moniker” is!

Of course all this complexity fails spectacularly in some quite common situations. For example, try downloading something when the web server serves gzip compressed html output. Good luck trying to figure out why everything seemingly works, you are notified of downloading progress, but never get the actual downloaded bytes.

Turns out the solution is to change downloading behaviour of the above pile of abstractions to use “pull data” model, instead of default “push data” model. The default behaviour just seems to be broken (though it is not broken in that pile of abstractions, instead it is broken somewhere deeper in Windows code). Is this mentioned anywhere in the docs? Of course not!

This is pretty much how a code comment looks like for this:

We don’t use CCachedDataPathProperty because it’s awfully slow, doing data reallocations for each 1KB received. For 8MB file it’s 8000 reallocations and 32 GB (!) of data copied for no good reason!

While we’re at it, we don’t use CDataPathProperty either, because it’s a useless wrapper over CAsyncMonikerFile.

Oh, and we don’t use CAsyncMonikerFile either, because it has bugs in VS2003′ MFC where it never notifies the container that it is done with download, making IE still display “X items remaining” indefinitely. Some smart coder was converting information message and returning “out of memory” error if result was NULL, even if input message was NULL (which it often was). So we use our own “fixed” version of CAsyncMonikerFile instead.

Oh MFC, how we love thee.

Lolshadows strike again

Continuing the old theme

CAN I HAS MOIRE SHADOWS

CAN I HAS MOIRE SHADOWS?

On job titles and design patterns

I just changed my job title to say “Code Chef“. I like it, and it represents my current understanding of programming pretty well. I cook code. That’s my job.

Some N years ago I would have liked a title with “Architect” or “Analyst” or something like that. I would have called myself “developer” instead of “programmer” because hey, a developer thinks up things, whereas a programmer is a mere “code monkey”. More on code monkeys below.

But wait! Back then I also believed that knowing and using Design Patterns is essential for a programmer! In one place when I was interviewing new hires, design pattern knowledge was something I would look for… how stupid! Nowadays my view of patterns is more along the lines of “yeah, whatever”. I don’t exactly think of them as things from hell, but they could have caused more harm than good already.

Back to job titles. Code monkey is actually the key employee. A software product is largely defined by the code, heck, it is code. Sure, it also has the user interface, the fancy icons, the documentation, the website, the support, the roadmap and whatnot, but the code is the product, whereas everything else is more or less addons (possibly excluding UI… UI also defines the product).

Code design? Design patterns? Who cares about that.

It’s the final result that matters. Futurist programming for the win.

On the other hand, Memento Observer is probably very cool.

One-liners: biawesome filtering

Said by Jonathan Czeck of Graveck:

What kind of filtering does Resize() function use? Nearest-neighbor, bilinear, bicubic, biawesome?

Since then “biawesome” became a local meme at work. Biawesome is awesome on steroids.

Tricky bugs: peculiarities of dynamic linking, and magic divisions

After wasting nearly two days on some really funky animation import crash, I checked in a code change with this log message:

Fix FBX animation import crash once more. When exported symbols are not listed for a dylib, it seems to link back to calling executable (?!), making them share function impls with the same name. And because Keyframe is actually different in editor vs ImportFBX, this is wrong. Apparently this is OS X Leopard only, or something. Argh.

The code change in question was just telling the compiler “here’s the list of the functions that are exported from this dynamic library”. The list was already there, just the compiler was never told about existence of it.

The bug manifested itself as a crash when importing animations. But it would not happen when importer was run from a small unit test application. There were no memory corruptions happening, it was not running out of memory, yet the code was crashing with access violation, usually because STL’s vector was returning it’s wrong size (but the actual data of the vector was correct; it was just returning bogus size). And it was doing that only on OS X Leopard, and not on OS X Tiger. Huh?

Turns out what did happen - and I’m not sure if that’s a bug in OS X or a feature - is that the calling application did contain a class called Keyframe. And the shared library (where the crash was happening) also contained a class called Keyframe. But those classes were slightly different; first was 20 bytes in size, and second one was 16 bytes.

Now, somehow when the shared library was calling vector<Keyframe>::size(), the function from the calling application was used. I have no idea at all how or why this was happening, but it sure was! I could see from tracing the assembly code, that it was doing difference of two pointers, and then doing something that for sure was not division by 16.

What was the code doing? Turns out it was calculating division by 20 in a cunning way:

mov  edx,esi   # edx = end()
sub  edx,eax   # edx -= begin()
mov  eax,edx   # eax = edx
sar  eax,0x2   # eax >>= 2
imul eax,eax,0xcccccccd # eax *= 0xcccccccd

In other words, the compiler was replacing division by constant (as used in vector’s size()) by a shift and multiplication with a magic number. You can read more about the technique here or here.

But of course the code above only works if the number was actually divisible by 20; otherwise it returns totally wrong result. This is perfectly fine for computing the difference in two pointers to structures of known size… Except that inside the shared library the Keyframe structures are 16 bytes, and not 20!

So yeah. Watch out for peculiarities of dynamic linking on your platform.

SwiftShader 2.0 experience

ShiftShader 2.0, a pure software renderer with a Direct3D 9 interface, just got released. I tried it on rendering unit tests and some benchmark tests we have for Unity.

In short, I’m impressed.

It runs rendering tests almost correctly; the only minor bugs seem to be somewhere in attenuation of fixed function vertex lights. Everything else, including shaders, shadows, render to texture works without any problems.

Performance wise, of course it’s dozens to hundreds times slower than a real graphics card, but hey. I also tested with Intel 965 (aka GMA X3000) integrated graphics for comparison. All this on Intel Core2 Quad (Q6600), 3 GB RAM, Windows XP SP2.

  • Avert Fate demo: Radeon HD 3850 about 300 FPS, SwiftShader about 5 FPS (about 15 FPS if per-pixel lighting is turned off), Intel 965 about 22 FPS (about 50 FPS if per-pixel lighting is turned off).
  • Scene with lots of objects and lots of shadow-casting lights: Radeon HD 3850 about 76 FPS, SwiftShader 2.5 FPS, Intel - shadows not supported, duh.
  • High detail terrain with lots of vegetation and four cameras rendering it simultaneously: Radeon HD 3850 about 68 FPS, SwiftShader about 3 FPS, Intel 965 about 12 FPS.

Ok, so SwiftShader loses on performance to Intel 965, but the difference is only “a couple of times”, and not in order of magnitude or so. Pretty good I’d say.