Archive for 'code'
In Unity we very often mix fixed function and programmable vertex pipelines. In our lighting model, a number of the brightest lights per object are drawn in pixel lit mode, and the rest are drawn using fixed function vertex lighting. Naturally the pixel lights most often use vertex shaders, as they want to calculate some texcoords for light cookies, or do something with tangent space, or calculate some texcoords for shadow mapping, and so on. The vertex lighting pass uses fixed function, because it’s the easiest way. It is possible to implement the equivalent of fixed function lighting in vertex shaders, but we haven’t done that yet because of the complexities of Direct3D and OpenGL, the need to support shader model 1.1 and various other issues. Call me lazy.
And herein lies the problem: most often precision of vertex transformations is not the same in fixed function versus programmable vertex pipelines. If you’d just draw some objects in multiple passes, mixing fixed function and programmable paths, this is roughly what you will get (excuse my programmer’s art):

Not pretty at all! This should have looked like this:

So what do we do to make it look like this? We “pull” (bias) some rendering passes slightly towards the camera, so there is no depth fighting.
Now, at the moment the Unity editor runs only on Macs, which use OpenGL. There, most hardware configurations do not need this depth bias at all – they are able to generate the same results in fixed function and programmable pipelines. Only Intel cards need the depth bias on Mac OS X (on Windows, AMD and Intel cards need depth bias). So people author their games using OpenGL, where depth bias is not needed in most cases.
How do you apply depth bias in OpenGL? Enable GL_POLYGON_OFFSET_FILL and set glPolygonOffset to something like -1, -1. This works.
How do you apply depth bias in Direct3D 9? Conceptually, you do the same. There are DEPTHBIAS and SLOPESCALEDEPTHBIAS render states that do just that. And so we did use them.
And people complained about funky results on Windows.
And I’d look at their projects, see that they are using something like 0.01 for camera’s near plane and 1000.0 for the far plane, and tell them something along the lines of “increase your near plane, stupid!” (well ok, without the “stupid” part). And I’d explain all the above about mixing fixed function and vertex shaders, and how we do depth bias in that case, and how on OpenGL it’s often not needed but on Direct3D it’s pretty much always needed. And yes, how sometimes that can produce “double lighting” artifacts on close or intersecting geometry, and how the only solution is to increase the near plane and/or avoid close or intersecting geometry.
Sometimes this helped! I was so convinced that their too-low-near-plane was always the culprit.
And then one day I decided to check. This is what I’ve got on Direct3D:

Ok, this scene is intentionally using a low near plane, but let me stress this again. This is what I’ve got:

Not good at all.
What happened? It happened in roughly this way:
- First, depth bias documentation on Direct3D is wrong. Depth bias is not in 0..16 range, it is in 0..1 range which corresponds to entire range of depth buffer.
- Back then, our code was always using 16 bit depth buffers, so the equivalent of -1,-1 depth bias in OpenGL was multiplied with something like 1.0/65535.0, and that was fed into Direct3D. Hey, it seemed to work!
- Later on, the device setup code was modified to do proper format selection, so most often it ended up using a 24 bit depth buffer. Of course I never modified the depth bias code to account for this change…
- And it stayed there. And I kept deceiving myself that the content of the users is to blame, and not some stupid code of mine.
It’s good to check your assumptions once in a while.
So yeah, the proper multiplier for depth bias on Direct3D with 24 bit depth buffer should be not 1.0/65535.0, but something like 1.0/(2^24-1). Except that this value is really small, so something like 4.8e-7 should be used instead (see Lengyel’s GDC2007 talk). Oh, but for some reason it’s not really enough in practice, so something like 2.0*4.8e-7 should be used instead (tested so far on GeForce 8600, Radeon HD 3850, Radeon 9600, Intel 945, reference rasterizer). Oh, and the same value should be used even when a 16 bit depth buffer is used; using 1.0/65535.0 multiplier with 16 bit depth buffer produces way too large bias.
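To put the numbers above side by side, here is a small arithmetic sketch. It only restates the values from the text; the helper name `d3d_depth_bias` and its signature are made up for illustration and are not Unity's actual code.

```python
# Depth bias multipliers mentioned above, all expressed in Direct3D's
# 0..1 range (which spans the entire depth buffer).
ONE_STEP_16_BIT = 1.0 / 65535.0        # one depth step of a 16 bit buffer
ONE_STEP_24_BIT = 1.0 / (2**24 - 1)    # one depth step of a 24 bit buffer
LENGYEL_EPSILON = 4.8e-7               # value quoted from Lengyel's GDC2007 talk
PRACTICAL_MULTIPLIER = 2.0 * LENGYEL_EPSILON  # what worked on the tested cards


def d3d_depth_bias(gl_units, multiplier=PRACTICAL_MULTIPLIER):
    """Scale an OpenGL-style polygon offset 'units' value into the 0..1 range.

    Hypothetical helper, named here only for the sake of the example.
    """
    return gl_units * multiplier


if __name__ == "__main__":
    print("16 bit step:       ", ONE_STEP_16_BIT)
    print("24 bit step:       ", ONE_STEP_24_BIT)
    print("practical bias:    ", d3d_depth_bias(-1.0))
    print("old bias / new bias:", ONE_STEP_16_BIT / PRACTICAL_MULTIPLIER)
```

The last line shows why the old 1.0/65535.0 multiplier produced such a large bias: it is roughly sixteen times bigger than the value that ended up working.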
With proper bias values the image is good on Direct3D again. Yay for that (fix is coming in Unity 2.1 soon).
…and yes, I know that real men fudge projection matrix instead of using depth bias… someday maybe.
Posted on 2008-06-12 8:52 in code, d3d, opengl, unity, work | 2 Comments »
When introductory documentation for something has this, you know it won’t be pretty:
CAsyncMonikerFile is derived from CMonikerFile, which in turn is derived from COleStreamFile. A COleStreamFile object represents a stream of data; a CMonikerFile object uses an IMoniker to obtain the data, and a CAsyncMonikerFile object does so asynchronously.
So yeah, I am dealing with downloading something from the internet inside an ActiveX control that is written in MFC. A seemingly simple task – I give you a URL, you give me back the bytes. But no! That would not be a proper architecture, so instead it has asynchronous monikers which are based on monikers which are based on stream files which use some interfaces and whatnot. And for ActiveX controls the docs suggest using CDataPathProperty or CCachedDataPathProperty, which are abstractions built on top of the above crap. And I don’t even know what “a moniker” is!
Of course all this complexity fails spectacularly in some quite common situations. For example, try downloading something when the web server serves gzip compressed html output. Good luck trying to figure out why everything seemingly works, you are notified of downloading progress, but never get the actual downloaded bytes.
Turns out the solution is to change downloading behaviour of the above pile of abstractions to use “pull data” model, instead of default “push data” model. The default behaviour just seems to be broken (though it is not broken in that pile of abstractions, instead it is broken somewhere deeper in Windows code). Is this mentioned anywhere in the docs? Of course not!
This is pretty much what the code comment for this looks like:
We don’t use CCachedDataPathProperty because it’s awfully slow, doing data reallocations for each 1KB received. For 8MB file it’s 8000 reallocations and 32 GB (!) of data copied for no good reason!
While we’re at it, we don’t use CDataPathProperty either, because it’s a useless wrapper over CAsyncMonikerFile.
Oh, and we don’t use CAsyncMonikerFile either, because it has bugs in VS2003’s MFC where it never notifies the container that it is done with download, making IE still display “X items remaining” indefinitely. Some smart coder was converting information message and returning “out of memory” error if result was NULL, even if input message was NULL (which it often was). So we use our own “fixed” version of CAsyncMonikerFile instead.
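The “8000 reallocations and 32 GB” figure in the comment above can be checked with a quick model. This sketch assumes the simplest possible growth strategy, where the buffer is reallocated for every 1KB chunk and all existing bytes are copied each time; that matches the description in the comment, not any inspected MFC source.

```python
def grow_by_chunk_cost(total_bytes, chunk_bytes):
    """Count reallocations and bytes copied when a buffer grows one chunk at a time.

    Assumes every reallocation copies the whole existing buffer.
    """
    reallocations = 0
    bytes_copied = 0
    size = 0
    while size < total_bytes:
        bytes_copied += size   # existing contents move into the new buffer
        size += chunk_bytes    # the new chunk is appended
        reallocations += 1
    return reallocations, bytes_copied


if __name__ == "__main__":
    reallocs, copied = grow_by_chunk_cost(8 * 1024 * 1024, 1024)
    print(reallocs, "reallocations,", copied / 2**30, "GiB copied")
```

The copied amount grows quadratically with the file size, which is why a growth strategy that doubles the capacity instead is the usual fix.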
Oh MFC, how we love thee.
Posted on 2008-05-20 9:02 in code, rant, work | 3 Comments »
I just changed my job title to say “Code Chef“. I like it, and it represents my current understanding of programming pretty well. I cook code. That’s my job.
Some N years ago I would have liked a title with “Architect” or “Analyst” or something like that. I would have called myself “developer” instead of “programmer” because hey, a developer thinks up things, whereas a programmer is a mere “code monkey”. More on code monkeys below.
But wait! Back then I also believed that knowing and using Design Patterns is essential for a programmer! In one place when I was interviewing new hires, design pattern knowledge was something I would look for… how stupid! Nowadays my view of patterns is more along the lines of “yeah, whatever”. I don’t exactly think of them as things from hell, but they could have caused more harm than good already.
Back to job titles. Code monkey is actually the key employee. A software product is largely defined by the code, heck, it is code. Sure, it also has the user interface, the fancy icons, the documentation, the website, the support, the roadmap and whatnot, but the code is the product, whereas everything else is more or less addons (possibly excluding UI… UI also defines the product).
Code design? Design patterns? Who cares about that.
It’s the final result that matters. Futurist programming for the win.
On the other hand, Memento Observer is probably very cool.
Posted on 2008-05-09 13:25 in code, rant, work | 4 Comments »
After wasting nearly two days on some really funky animation import crash, I checked in a code change with this log message:
Fix FBX animation import crash once more. When exported symbols are not listed for a dylib, it seems to link back to calling executable (?!), making them share function impls with the same name. And because Keyframe is actually different in editor vs ImportFBX, this is wrong. Apparently this is OS X Leopard only, or something. Argh.
The code change in question was just telling the compiler “here’s the list of the functions that are exported from this dynamic library”. The list was already there, just the compiler was never told about existence of it.
The bug manifested itself as a crash when importing animations. But it would not happen when the importer was run from a small unit test application. There were no memory corruptions happening, it was not running out of memory, yet the code was crashing with an access violation, usually because STL’s vector was returning the wrong size (the actual data of the vector was correct; it was just returning a bogus size). And it was doing that only on OS X Leopard, and not on OS X Tiger. Huh?
Turns out what did happen – and I’m not sure if that’s a bug in OS X or a feature – is that the calling application did contain a class called Keyframe. And the shared library (where the crash was happening) also contained a class called Keyframe. But those classes were slightly different; first was 20 bytes in size, and second one was 16 bytes.
Now, somehow when the shared library was calling vector<Keyframe>::size(), the function from the calling application was used. I have no idea at all how or why this was happening, but it sure was! I could see from tracing the assembly code, that it was doing difference of two pointers, and then doing something that for sure was not division by 16.
What was the code doing? Turns out it was calculating division by 20 in a cunning way:
mov edx,esi # edx = end()
sub edx,eax # edx -= begin()
mov eax,edx # eax = edx
sar eax,0x2 # eax >>= 2
imul eax,eax,0xcccccccd # eax *= 0xcccccccd
In other words, the compiler was replacing division by constant (as used in vector’s size()) by a shift and multiplication with a magic number. You can read more about the technique here or here.
But of course the code above only works if the number was actually divisible by 20; otherwise it returns totally wrong result. This is perfectly fine for computing the difference in two pointers to structures of known size… Except that inside the shared library the Keyframe structures are 16 bytes, and not 20!
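Here is the same trick replayed in Python, as a sketch of what the last two instructions of the assembly above compute. The constant 0xcccccccd is the multiplicative inverse of 5 modulo 2^32, so “shift right by 2, then multiply” is an exact division by 20 whenever the byte difference is a multiple of 20. The function name is made up for illustration.

```python
MASK32 = 0xFFFFFFFF
INV5 = 0xCCCCCCCD  # 5 * INV5 == 1 (mod 2**32)


def size_as_compiled_for_20_bytes(byte_diff):
    """Mimic 'sar eax,0x2' followed by 'imul eax,eax,0xcccccccd'.

    byte_diff is end() - begin() in bytes, assumed non-negative here.
    """
    quarter = byte_diff >> 2            # divide by 4
    return (quarter * INV5) & MASK32    # exact divide by 5, modulo 2**32


if __name__ == "__main__":
    # Three 20-byte Keyframes: the right answer comes out.
    print(size_as_compiled_for_20_bytes(3 * 20))
    # Three 16-byte Keyframes run through the 20-byte code: garbage.
    print(size_as_compiled_for_20_bytes(3 * 16))
```

With 48 bytes (three 16-byte elements) the result is a huge bogus number rather than 3, which is exactly the kind of nonsense size the vector was reporting.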
So yeah. Watch out for peculiarities of dynamic linking on your platform.
Posted on 2008-04-19 21:00 in code, work | 4 Comments »
I decided to make a very small game with Unity. Coincidentally, Danc of Lost Garden fame just announced a small game design challenge called “Play With Your Peas“. It comes with a set of cute graphics and a ready-to-be-implemented game design. What more could I want?
So it’s a very small 2D game without any next-gen bells and whistles. It can probably be done casually on the side, by allocating an hour here and there. We’ll see how it goes. Hey, I’ve never actually made a game in Unity, I only make or break some underlying parts…
Of course, first I start with no game, just imported graphics. Hey look, I can do sprites!
Then cook up some base things: define the game grid, throw in some basic user interface on the right hand side, and make it actually do something. This wasn’t so hard; that already gets me an almost working level building functionality. It does not have fancy block building delay or block deletion yet; that will come later.
Next come basic physics. Danc’s design calls for simple arcade-like physics (things moving at constant speeds, bouncing off at equal angles, and so on), but in Unity I have a fully fledged physics engine just waiting to be used. Let’s use that.
The design has sloped ramp pieces, which are hard to approximate using any primitive colliders, so instead I’ll use convex mesh colliders for them. Now, on this machine I only have Blender, which I totally don’t know how to use; and I was too lazy to go to the PC and use 3ds Max there. What does a coder do? Of course, just type in the mesh file in ASCII FBX format. Excerpt:
; scaled 2x in Z, by 0.85 in Y
Vertices: -0.5,-0.425,-1.0, 0.5,-0.425,-1.0, -0.5,-0.425,1.0, 0.5,-0.425,1.0, -0.5,0.425,-1.0, -0.5,0.425,1.0
PolygonVertexIndex: 0,1,-3,2,1,-4,1,0,-5,2,3,-6,0,2,-5,2,5,-5,3,1,-5,5,3,-5
It’s a left ramp mesh! So much for fancy asset auto-importing functionality, when you don’t know how to use those 3D apps :)
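If the negative numbers in PolygonVertexIndex look odd: in ASCII FBX, the last index of each polygon is stored bitwise-negated (so -3 means vertex 2 and “end of polygon”). A small sketch that decodes the excerpt above; the function name is made up for illustration.

```python
def decode_polygons(polygon_vertex_index):
    """Split an FBX PolygonVertexIndex list into per-polygon vertex index lists."""
    polygons = []
    current = []
    for raw in polygon_vertex_index:
        if raw < 0:
            current.append(~raw)      # ~raw == -raw - 1 recovers the real index
            polygons.append(current)  # a negative entry closes the polygon
            current = []
        else:
            current.append(raw)
    return polygons


if __name__ == "__main__":
    ramp = [0, 1, -3, 2, 1, -4, 1, 0, -5, 2, 3, -6,
            0, 2, -5, 2, 5, -5, 3, 1, -5, 5, 3, -5]
    for triangle in decode_polygons(ramp):
        print(triangle)
```

For the ramp this gives eight triangles over the six vertices listed in the Vertices line.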
After a while I’ve got peas being controlled by physics, colliding with the level and so on. Physics is very bad for productivity, as I ended up just playing around with pea-stacks!
So far there’s no game yet… Next up: implement some AI for the peas, so they can wander around, climb the walls, fall down and bounce around. I guess that will be more work and less playing around… We’ll see.
Posted on 2008-02-20 21:42 in code, games, unity | 3 Comments »
Ever wondered what takes up space in the programs you write? I certainly did on a number of occasions.
For some reason though, I could not find a decent tool that would look at a Visual Studio compiled executable or a DLL, and report an overview of how large are the functions, classes, object files and whatnot. .kkrunchy executable packer does have a very nice size report, but it’s not exactly suitable for large executables…
Anyway, ryg of farbrausch fame was kind enough to donate the size reporting code, I did some modifications, and here it is: Sizer – executable symbol size reporting utility.
Enjoy. Oh, and the source code looks messy mostly because ryg and I use different indentation, and I never cared to format everything with a single style. No one cares about the source code anyway, as long as it works. I’m not claiming that this code works, of course!
Posted on 2008-01-17 12:24 in code | 8 Comments »
Could not find any info on how to do an oblique near clipping plane for orthographic projections, so had to figure it out myself. It wasn’t even hard!
Here it is.
Posted on 2007-11-12 9:58 in code, papers | No Comments »
Everyone is saying “unit tests for the win!” all over the place. That’s good, but how would you actually test graphics related code? Especially considering all the different hardware and drivers out there, where the result might be different just because the hardware is different, or because the hardware/driver understands your code in a funky way…
Here is how we do it at work. This took quite some time to set up, but I think it’s very worth it.
First you need hardware to test things on. For a start just a couple of graphics cards that you can swap in and out might do the trick. A larger problem is integrated graphics cards – it’s quite hard to swap them in and out, so we bit the bullet and bought a machine for each integrated card that we care about. The same machines are then used to test discrete cards (we have several shelves of those by now, going all the way back to… does ATI Rage, Matrox G45 or S3 ProSavage say anything to you?).
Then you make the unit tests (or perhaps these should be called the functional tests). Build a small scene for every possible thing that you can imagine. Some examples:
- Do all blend modes work?
- Do light cookies work?
- Do automatic texture coordinate generation and texture transforms work?
- Does rendering of particles work?
- Does glow image postprocessing effect work?
- Does mesh skinning work?
- Do shadows from point lights work?
This will result in a lot of tests, with each test hopefully testing a small, isolated feature. Make some setup that can load all defined tests in succession and take screenshots of the results. Make sure time always progresses at fixed rate (for the case where a test does not produce a constant image… like particle or animation tests), and take a screenshot of, for example, frame 5 for each test (so that some tests have some data to warm up… for example motion blur test).
By this time you have something that you can run and it spits out lots of screenshots. This is already very useful. Get a new graphics card, upgrade to new OS or install a new shiny driver? Run the tests, and obvious errors (if any) can be found just by quickly flipping through the shots. Same with the changes that are made in rendering related code – run the tests, see if anything became broken.
The testing process can be further automated. Here we have a small set of Perl scripts that can either produce a suite of test images for the current hardware, or run all the tests and compare the results with “known to be correct” suite of images. As graphics cards are different from each other, the “correct” results will be somewhat different (because of different capabilities, internal precision etc.). So we keep a set of test results for each graphics card.
Then these scripts can be run for various driver versions on every graphics card. They compare results for each test case, and for failed tests copy out the resulting screenshot, the correct screenshot, log the failures into a wiki-compatible format (to be posted on some internal wiki), etc.
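The comparison step boils down to something like the sketch below. Our actual scripts are Perl and are not shown here; this is only an illustration in Python, working on raw pixel bytes, and the `tolerance` parameter is an assumption added for the example (the text above only says results are compared against a known-good set per card).

```python
def images_match(expected, actual, tolerance=0):
    """Compare two raw images (bytes objects of equal layout) channel by channel."""
    if len(expected) != len(actual):
        return False
    return all(abs(e - a) <= tolerance for e, a in zip(expected, actual))


def failed_tests(reference, results, tolerance=0):
    """Return the names of tests whose screenshot is missing or differs.

    reference and results map test name -> raw image bytes for one graphics card.
    """
    failures = []
    for name, expected in sorted(reference.items()):
        actual = results.get(name)
        if actual is None or not images_match(expected, actual, tolerance):
            failures.append(name)
    return failures


if __name__ == "__main__":
    reference = {"blend_modes": bytes([10, 20, 30]), "light_cookies": bytes([0, 0, 0])}
    results = {"blend_modes": bytes([10, 21, 30]), "light_cookies": bytes([0, 0, 0])}
    print(failed_tests(reference, results))
```

Keeping one reference set per graphics card, as described above, means the same function can be reused unchanged; only the `reference` dictionary differs.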
I’ve heard that some folks even go a step further – fully automate the testing of all driver versions. Install one driver in silent mode, reboot the machine, after reboot runs another script that launches the tests and proceeds with the next driver version. I don’t know if that is only an urban legend or if someone actually does this*, but that would be an interesting thing to try. The testing per card then would be: 1) install a card, 2) run the test script, 3) coffee break, happiness and profit!
* My impression is that at least with the big games it works the other way around – you don’t test with the hardware; instead the hardware guys test with your game. That’s how it looks for a clueless observer like me at least.
So far this unit test suite was really helpful in a couple of ways: making of the just-announced Direct3D renderer and discovering new & exciting graphics card/driver workarounds that we have to do. Making of the suite did take a lot of time, but I’m happy with it!
Posted on 2007-07-31 23:49 in code, unity, work | 12 Comments »
A paper on Electronic Arts’ implementation of Standard Template Library.
Is it insane or the only sane thing to do? It’s an insane amount of work, but it looks like they know what they’re doing. STL is broken in many ways, especially on memory limited systems… Now they could release it as open source with a decent license!
Posted on 2007-07-16 14:57 in code, papers | 1 Comment »
I ranted about OpenGL p-buffers a while ago. Time for the whole story!
From time to time I hit some nasty debugging situation, and it always takes ages to figure out, and the path to the solution is always different. This is an example of such a debugging story.
While developing shadow mapping I implemented a “screen space shadows” thing (where cascaded shadow maps are gathered into a screen-space texture and shadow receiver rendering later uses only that texture). Then while being in the editor and maximizing/restoring the window a few times, everything locks up for 3 or 5 seconds, then resumes normally.
So there’s a problem: a complete freeze after the editor window has been resized a couple of times (not immediately!), but otherwise everything just works. Where is the bug? What caused it?
Since shadows were working fine before, and I never noticed such lock-ups – it must be the screen-space shadow gathering thing that I just implemented, right? (Fast-forward answer: no) So I try to figure out where the lock-up is happening. Profiling does not give any insights – the lock-up is not even in my process, instead “somewhere”. Hm… I insert lots of manual timing code around various code blocks (that deal with shadows). They say the lock-up most often happens when activating a new render texture (an OpenGL p-buffer), specifically, calling a glFlush(). But not always, sometimes it’s still somewhere else.
After some head-scratching, a session with OpenGL Driver Profiler reveals what is actually happening – video memory is leaked! Apparently Mac OS X “virtualizes” VRAM, and when it runs out, the OS will still happily create p-buffers and so on, it will just start swapping VRAM contents to AGP/PCIe area. This swapping causes the lock-up. Ok, so now I know what is happening, I just need to find out why.
I look at all the code that deals with render textures – it looks ok. And it would be pretty strange if a VRAM leak would be unnoticed for two years since Unity is out in the wild… So that must be the depth render textures that are causing a leak (since they are a new type for the shadows), right? (Answer: no)
I build a test case that allocates and deallocates a bunch of depth render textures each frame. No leaks… Huh.
I change my original code so that it gathers screen-space shadows onto the screen directly, instead of the screen-sized texture. No leaks… Hm… So it must be the depth render texture followed by screen-size render texture, that is causing the leaks, right? (Answer: no) Because when I have just the depth render texture, I have no leaks; and when I have no depth render texture, instead I gather shadows “from nothing” into a screen-size texture, I also have no leaks. So it must be the combination!
So far, the theory is that rendering into a depth texture followed by creation of screen-size texture will cause a video memory leak (Answer: no). It looks like it leaks the amount that should be taken by depth texture (I say “it looks” because in OpenGL you never know… it’s all abstracted to make my life easier, hurray!). Looks like a fine bug report, time to build a small repro application that is completely separate from Unity.
So I grab some p-buffer sample code from Apple’s developer site, change it to also use depth textures and rectangle textures, remove all unused cruft, code the expected bug pattern (render into depth texture followed by rectangle p-buffer creation) and… it does not leak. D’oh.
Ok, another attempt: I take the p-buffer related code out of Unity, build a small application with just that code, code the expected bug pattern and… it does not leak! Huh?
Now what?
I compare the OpenGL call traces of Unity-in-test-case (leaks) and Unity-code-in-a-separate-app (does not leak). Of course, the Unity case does a lot more; setting up various state, shaders, textures, rendering actual objects with actual shaders, filtering out redundant state changes and whatnot. So I try to bring in bits of stuff that Unity does into my test application.
After a while I made my test app leak video memory (now that’s an achievement)! Turns out the leak happens when doing this:
1. Create depth p-buffer
2. Draw to depth p-buffer
3. Copy its contents into a depth texture
4. Create a screen-sized p-buffer
5. Draw something into it using the depth texture
6. Release the depth texture and p-buffer
7. Release the screen-sized p-buffer
My initial test app was not doing step 5… Now, why does the leak happen? Is it a bug or something I am doing wrong? And more importantly: how do I get rid of it?
My suspicion was that OpenGL context sharing was somehow to blame here (finally, a correct suspicion). We share OpenGL contexts, because, well, it’s the only sane thing to do – if you have a texture, mesh or shader somewhere, you really want to have it available both to the screen and when rendering into something else. The documentation on sharing of OpenGL contexts is extremely spartan, however. Like: “yeah, when they are shared, then the resources are shared” – great. Well, the actual text is like this (Apple’s QA1248):
All sharing is peer to peer and developers can assume that shared resources are reference counted and thus will be maintained until explicitly released or when the last context sharing resources is itself released. It is helpful to think of this in the simplest terms possible and not to assume excess complication.
Ok, I am thinking of this in the simplest terms possible… and it leaks video memory! The docs do not have a single word on how the resources are reference counted and what happens when a context is deleted.
Anyway, armed with my suspicion of context sharing being The Bad Guy here, I tried random things in my small test app. Turns out that unbinding any active textures from a context before switching to new one got rid of the leak. It looks like objects are refcounted by contexts, and they are not actually deleted while they are bound in some context (that is what I expect to happen). However, when a context itself is deleted, it seems as if it does not decrease refcounts of these objects (that is definitely what I don’t expect to happen). I am not sure if that’s a bug, or just undocumented “feature”…
All happy, I bring in my changes to the full codebase (“unbind any active textures before switching to a new context!”)… and the leak is still there. Huh?
After some head-scratching and randomly experimenting with whatever, turns out that you have to unbind any active “things” before switching to a new context. Even leaving a vertex buffer object bound can make a depth texture memory be leaked when another context is destroyed. Funky, eh?
So that was some 4 days wasted on chasing the bug that started out as “mysterious 5 second lock-ups”, went through “screen-space shadows leak video memory”, then through “depth textures followed by screen-size textures leak video memory” and through “unbind textures before switching contexts” to “unbind everything before switching contexts”. Would I have guessed it would end up like this? Not at all. I am still not sure if that’s the intended behavior or a bug; it looks more like a bug to me.
The take-away for OpenGL developers: when using shared contexts, unbind active textures, VBOs, shader programs etc. before switching OpenGL contexts. Otherwise at least on Mac OS X you will hit video memory leaks.
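For what it’s worth, here is a toy model of the behaviour I suspect. This is not OpenGL code and makes no claim about how the driver is really implemented; the `Resource` and `Context` classes are invented purely to illustrate the guess that a binding holds a reference and that destroying a context does not give that reference back.

```python
class Resource:
    """Stand-in for a texture, VBO or shader living in shared video memory."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # the application's own reference


class Context:
    """Stand-in for an OpenGL context that shares resources with others."""

    def __init__(self):
        self.bound = []

    def bind(self, resource):
        resource.refcount += 1  # assumed: a binding keeps the object alive
        self.bound.append(resource)

    def unbind_all(self):
        for resource in self.bound:
            resource.refcount -= 1
        self.bound.clear()

    def destroy(self):
        # Suspected behaviour: bindings are forgotten, references are NOT released.
        self.bound.clear()


def release(resource):
    """Drop the application's reference; True means the memory is really freed."""
    resource.refcount -= 1
    return resource.refcount == 0


if __name__ == "__main__":
    leaky_ctx, leaked = Context(), Resource("depth texture")
    leaky_ctx.bind(leaked)
    leaky_ctx.destroy()
    print("freed without unbinding:", release(leaked))

    clean_ctx, freed = Context(), Resource("depth texture")
    clean_ctx.bind(freed)
    clean_ctx.unbind_all()  # the workaround: unbind everything first
    clean_ctx.destroy()
    print("freed after unbinding:  ", release(freed))
```

In this model the first resource can never reach a refcount of zero, which is the “leak”; unbinding everything before the context goes away is what makes the second one freeable.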
It’s somewhat sad that I find myself fighting issues like that most of my development time – not actually implementing some cool new stuff, but making stuff actually work. Oh well, I guess that is the difference between making (tech)demos and an actual software product.
Posted on 2007-07-14 21:31 in code, opengl, work | 2 Comments »