Archive for 'code'

Debugging story: video memory leaks

I ranted about OpenGL p-buffers a while ago. Time for the whole story!

From time to time I hit some nasty debugging situation, and it always takes ages to figure out, and the path to the solution is always different. This is an example of such a debugging story.

While developing shadow mapping I implemented a “screen space shadows” thing (where cascaded shadow maps are gathered into a screen-space texture and shadow receiver rendering later uses only that texture). Then, after maximizing/restoring the editor window a few times, everything would lock up for 3 to 5 seconds and then resume normally.

So there’s a problem: a complete freeze after the editor window has been resized a couple of times (not immediately!), but otherwise everything just works. Where is the bug? What caused it?

Since shadows were working fine before, and I never noticed such lock-ups - it must be the screen-space shadow gathering thing that I just implemented, right? (Fast-forward answer: no) So I try to figure out where the lock-up is happening. Profiling does not give any insights - the lock-up is not even in my process, instead it is “somewhere”. Hm… I insert lots of manual timing code around various code blocks (that deal with shadows). The timings say the lock-up most often happens when activating a new render texture (an OpenGL p-buffer), specifically when calling glFlush(). But not always; sometimes it’s still somewhere else.
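As an illustration of what such manual timing code can look like, here is a minimal sketch of a scoped wall-clock timer. It is not the code used in Unity; ScopedTimer, its threshold parameter and the optional output pointer are made-up names. Wall-clock time is the interesting quantity here, because the stall happens outside the process and so does not show up as CPU time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical helper: measures the wall-clock time spent in a scope and
// reports it to stderr when it exceeds a threshold.
class ScopedTimer {
public:
    ScopedTimer(const char* label, double thresholdMs, double* outMs = nullptr)
        : label_(label), thresholdMs_(thresholdMs), outMs_(outMs),
          start_(std::chrono::steady_clock::now())
    {
        assert(label != nullptr);
    }

    ~ScopedTimer()
    {
        const auto end = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start_).count();
        if (outMs_) *outMs_ = ms;              // let the caller inspect the value
        if (ms >= thresholdMs_)
            std::fprintf(stderr, "[slow] %s took %.2f ms\n", label_, ms);
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    const char* label_;
    double thresholdMs_;
    double* outMs_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping a suspicious block in braces with a `ScopedTimer t("activate render texture", 100.0);` at the top would then log only the calls that take longer than 100 ms.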

After some head-scratching, a session with OpenGL Driver Profiler reveals what is actually happening - video memory is leaked! Apparently Mac OS X “virtualizes” VRAM, and when it runs out, the OS will still happily create p-buffers and so on, it will just start swapping VRAM contents to AGP/PCIe area. This swapping causes the lock-up. Ok, so now I know what is happening, I just need to find out why.

I look at all the code that deals with render textures - it looks ok. And it would be pretty strange if a VRAM leak had gone unnoticed for the two years that Unity has been out in the wild… So it must be the depth render textures that are causing a leak (since they are a new type for the shadows), right? (Answer: no)

I build a test case that allocates and deallocates a bunch of depth render textures each frame. No leaks… Huh.

I change my original code so that it gathers screen-space shadows onto the screen directly, instead of the screen-sized texture. No leaks… Hm… So it must be the depth render texture followed by screen-size render texture, that is causing the leaks, right? (Answer: no) Because when I have just the depth render texture, I have no leaks; and when I have no depth render texture, instead I gather shadows “from nothing” into a screen-size texture, I also have no leaks. So it must be the combination!

So far, the theory is that rendering into a depth texture followed by creation of screen-size texture will cause a video memory leak (Answer: no). It looks like it leaks the amount that should be taken by depth texture (I say “it looks” because in OpenGL you never know… it’s all abstracted to make my life easier, hurray!). Looks like a fine bug report, time to build a small repro application that is completely separate from Unity.

So I grab some p-buffer sample code from Apple’s developer site, change it to also use depth textures and rectangle textures, remove all unused cruft, code the expected bug pattern (render into depth texture followed by rectangle p-buffer creation) and… it does not leak. D’oh.

Ok, another attempt: I take the p-buffer related code out of Unity, build a small application with just that code, code the expected bug pattern and… it does not leak! Huh?

Now what?

I compare the OpenGL call traces of Unity-in-test-case (leaks) and Unity-code-in-a-separate-app (does not leak). Of course, the Unity case does a lot more; setting up various state, shaders, textures, rendering actual objects with actual shaders, filtering out redundant state changes and whatnot. So I try to bring in bits of stuff that Unity does into my test application.

After a while I made my test app leak video memory (now that’s an achievement)! Turns out the leak happens when doing this:

  1. Create depth p-buffer
  2. Draw to depth p-buffer
  3. Copy its contents into a depth texture
  4. Create a screen-sized p-buffer
  5. Draw something into it using the depth texture
  6. Release the depth texture and p-buffer
  7. Release the screen-sized p-buffer

My initial test app was not doing step 5… Now, why does the leak happen? Is it a bug or something I am doing wrong? And more importantly: how to get rid of it?

My suspicion was that OpenGL context sharing was somehow to blame here (finally, a correct suspicion). We share OpenGL contexts, because, well, it’s the only sane thing to do - if you have a texture, mesh or shader somewhere, you really want to have it available both to the screen and when rendering into something else. The documentation on sharing of OpenGL contexts is extremely spartan, however. Like: “yeah, when they are shared, then the resources are shared” - great. Well, the actual text is like this (Apple’s QA1248):

All sharing is peer to peer and developers can assume that shared resources are reference counted and thus will be maintained until explicitly released or when the last context sharing resources is itself released. It is helpful to think of this in the simplest terms possible and not to assume excess complication.

Ok, I am thinking of this in the simplest terms possible… and it leaks video memory! The docs do not have a single word on how the resources are reference counted and what happens when a context is deleted.

Anyway, armed with my suspicion of context sharing being The Bad Guy here, I tried random things in my small test app. Turns out that unbinding any active textures from a context before switching to new one got rid of the leak. It looks like objects are refcounted by contexts, and they are not actually deleted while they are bound in some context (that is what I expect to happen). However, when a context itself is deleted, it seems as if it does not decrease refcounts of these objects (that is definitely what I don’t expect to happen). I am not sure if that’s a bug, or just undocumented “feature”…
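To make that hypothesis concrete, here is a toy model of it in plain C++. It is only a sketch of the suspected behavior, not a description of how any real driver works, and every name in it (SharedPool, Context, requestDelete, unbindAll) is made up for illustration. Objects live in a pool shared by all contexts, a delete request only frees an object once nothing binds it, and destroying a context deliberately does not release its bindings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Toy model of the hypothesized behavior; all names are illustrative.
struct SharedPool {
    struct Object { std::size_t bytes; int bindings; bool deleteRequested; };
    std::map<int, Object> objects;   // objects shared by all contexts
    int nextId = 1;

    int create(std::size_t bytes) {
        objects[nextId] = Object{bytes, 0, false};
        return nextId++;
    }
    // Like deleting a texture: memory is freed only once nothing binds it.
    void requestDelete(int id) {
        auto it = objects.find(id);
        assert(it != objects.end());
        it->second.deleteRequested = true;
        collect(id);
    }
    void collect(int id) {
        auto it = objects.find(id);
        if (it != objects.end() && it->second.deleteRequested &&
            it->second.bindings == 0)
            objects.erase(it);
    }
    std::size_t bytesInUse() const {
        std::size_t total = 0;
        for (const auto& kv : objects) total += kv.second.bytes;
        return total;
    }
};

struct Context {
    SharedPool& pool;
    std::set<int> bound;             // objects currently bound in this context
    explicit Context(SharedPool& p) : pool(p) {}

    void bind(int id) {
        if (bound.insert(id).second) ++pool.objects.at(id).bindings;
    }
    void unbindAll() {
        for (int id : bound) {
            --pool.objects.at(id).bindings;
            pool.collect(id);
        }
        bound.clear();
    }
    // Deliberately does NOT release its bindings: this is the suspected
    // misbehavior, so anything still bound here stays allocated forever.
    ~Context() {}
};
```

In this model, a context that is destroyed while an object is still bound leaves that object's bytes counted in bytesInUse() even after requestDelete(), while calling unbindAll() first brings the total back to zero.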

All happy, I bring in my changes to the full codebase (“unbind any active textures before switching to a new context!”)… and the leak is still there. Huh?

After some head-scratching and randomly experimenting with whatever, it turns out that you have to unbind any active “things” before switching to a new context. Even leaving a vertex buffer object bound can cause depth texture memory to leak when another context is destroyed. Funky, eh?

So that was some 4 days wasted on chasing the bug that started out as “mysterious 5 second lock-ups”, went through “screen-space shadows leak video memory”, then through “depth textures followed by screen-size textures leak video memory” and through “unbind textures before switching contexts” to “unbind everything before switching contexts”. Would I have guessed it would end up like this? Not at all. I am still not sure if that’s the intended behavior or a bug; it looks more like a bug to me.

The take-away for OpenGL developers: when using shared contexts, unbind active textures, VBOs, shader programs etc. before switching OpenGL contexts. Otherwise at least on Mac OS X you will hit video memory leaks.

It’s somewhat sad that I find myself fighting issues like that most of my development time - not actually implementing some cool new stuff, but making stuff actually work. Oh well, I guess that is the difference between making (tech)demos and an actual software product.

Now that’s what I call a good API (stb_image)

The other day at work I needed a command line tool to compare some images (whether they mostly match, used in unit/functional tests). For some unknown reason I could not get ImageMagick’s compare to work like I wanted, so I just wrote my own.

I used the stb_image library from Sean Barrett - and it just rocks! Here’s the code to load a PNG image from file:

int width, height, bpp;
unsigned char* rgb = stbi_load( "myimage.png", &width, &height, &bpp, 3 );
// rgb is now three bytes per pixel, width*height size. Or NULL if load failed.
// Do something with it...
stbi_image_free( rgb );

That’s it! Basically a single line to load the image (and of course the library has similar functions to load from a block of memory, etc.). And the whole “library” is a single file - just add it to your project and there it is. In comparison, loading a PNG file using the de facto standard libpng takes more than 100 lines of code (and some time to read the docs).
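With the pixels in memory, the “do they mostly match” part can be quite small as well. The following is a hypothetical sketch, not the actual tool from work: the function name and the per-channel tolerance are assumptions, and it expects two 8-bit RGB buffers of the same size, such as those returned by stbi_load with 3 requested components.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical comparison: returns the fraction of color channels that
// differ by more than `tolerance` between two 8-bit RGB images of the
// same dimensions (3 bytes per pixel, width*height pixels).
double FractionOfDifferingChannels(const unsigned char* a, const unsigned char* b,
                                   int width, int height, int tolerance)
{
    assert(a && b && width > 0 && height > 0);
    const std::size_t count = static_cast<std::size_t>(width) * height * 3;
    std::size_t differing = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (std::abs(int(a[i]) - int(b[i])) > tolerance)
            ++differing;
    }
    return double(differing) / double(count);
}
```

A test could then treat two images as matching when the returned fraction stays below some small threshold; both the tolerance and that threshold would need tuning for the rendering being tested.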

Small is beautiful.

…and the way we do graphics related unit/functional/compatibility testing deserves a separate article. Sometime in the future!

What’s wrong with this code?

Here’s a short function:

inline int SecondsToEnergy( float time )
{
  return FastFloorfToInt( time * (float)(1 << kEnergyFixedPoint) );
}

It’s used in the particle system, and converts particle lifetime to an internal fixed point representation (10 bits for fractional part, i.e. kEnergyFixedPoint=10).

Some of the emitted particles are okay on a Mac, but completely not visible on Windows. This function is to blame.

Of course, what’s wrong is the possible overflow in the float-to-int conversion. Whenever someone tries to use a lifetime longer than about 2097151 seconds (2^31 / 2^10 = 2097152), the scaled value no longer fits into a signed 32 bit integer and the conversion is undefined. It seems to clamp the result in gcc and produce something like -1 in msvc.
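One way to make the conversion well defined is to clamp before converting. This is a minimal sketch, not necessarily the fix that went into the codebase: SecondsToEnergyClamped is a made-up name, std::floor stands in for FastFloorfToInt (whose implementation is not shown here), and a 32 bit int is assumed.

```cpp
#include <cassert>
#include <climits>
#include <cmath>

const int kEnergyFixedPoint = 10;  // fractional bits, as described above

static_assert(sizeof(int) == 4, "sketch assumes a 32 bit int");

// Clamped variant of the conversion (hypothetical name).
inline int SecondsToEnergyClamped(float time)
{
    assert(!std::isnan(time));  // a NaN lifetime is a caller bug
    const float scaled = time * (float)(1 << kEnergyFixedPoint);
    // 2^31 is exactly representable as a float; INT_MAX is not (it would
    // round up to 2^31), so the range check is done against 2^31 itself.
    const float kLimit = 2147483648.0f;
    if (!(scaled < kLimit)) return INT_MAX;  // too large (NaN also lands here in release builds)
    if (scaled <= -kLimit) return INT_MIN;   // too small
    return (int)std::floor(scaled);          // now guaranteed to fit
}
```

With this version a lifetime of 2097152 seconds or more saturates at INT_MAX on every compiler instead of depending on what the float-to-int instruction happens to do.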

Using multiple compilers can be hard, but it can also help in finding obscure bugs. Ha!

On work and clean code

It’s been like 6 months of me working on Unity. So far so good. We’ve done a big new release recently, so after some pre-release insanity we’re a bit more relaxed. I guess not for very long though, we have more stuff planned than we can handle :)

It sure feels nice to work on an actual software product. I think it’s probably the first time in my career that I know people are using my work and I do care about that. Having worked on projects before, it’s very different - a project just comes and goes, and once it’s finished you never think about it again. And most of the time you don’t care about “the clients” that much either. Working on a product is much more rewarding (especially if the users seem to like it).

Another interesting thing here is that we are a very small software shop. So everyone has to be a one-man-army (the others certainly are, not sure about myself). Design, program, fix bugs, decide on features, do support, write docs and even do html tweaks for the website. Of course, it could be Jack of all trades, master of none (*), but somehow I feel that we are managing pretty well. And I like to be involved in various aspects of making a product.

(*) though Wikipedia says that the full saying is Jack of all trades, master of none, though ofttimes better than master of one - which looks like a positive thing to me.

A completely different theme: when programming, it’s always good to massage the code you’re working with a bit. Remove unnecessary #includes. Write a comment on a tricky code block. Fix warnings. Do small refactorings. Remove unused code paths. It does not take much time and helps to keep the codebase clean. Removing unused code is especially good - for some reason I love removing code. Could do that all day long; probably I’m some kind of anti-programmer :)

OOP and other things now and then

Approximate conversation at work the other day:

Yeah, I split this into separate files, removed this and made these classes to make it actually work.
Ok, but don’t go too fancy with objects here.
Sure! I think it’s the only place where I actually use inheritance!

Heh. I’d imagine how that would have looked back some 5 years ago. “What design patterns did you use here?” etc. Funny how things change.

I think I’ve got it by now - took me way too much time for such a trivial thing - there is no silver bullet. OOP or any other buzzword is just a means to do something; sometimes it fits, sometimes it does not. Regarding OOP, I highly recommend Execution in the Kingdom of Nouns essay - it’s way exaggerated, but has the point. The best part:

advocating Object-Oriented Programming is like advocating Pants-Oriented Clothing

There is one thing about the codebase that we have at work that I love: it does not use any particular design/programming technique. A bit of OO, a bit of metaprogramming, a bit of plain C style, a bit of preprocessor macros, etc. I like to think that we’re using the best of those worlds, of course :)

Pair programming / animations

Tried out pair programming the other day. I can definitely see it working, especially on hard topics (i.e. where you spend more time thinking, explaining, arguing and brainstorming than just typing in code). I am still not sure whether it really suits “ordinary” day-to-day programming though.

The topic Joe and I tried it on was a pretty hard one - related to the core animation system. Now, of course I don’t know anything about animation systems, but my impression is that there is just no “universal” way of designing one. The ones that are floating around inside free/open engines/libraries (cal3d, nebula2) are quite fine, but not much more impressive than my own very simplistic attempts at doing animations. There is nothing wrong with that of course - it’s simple, it gets the job done, and it’s okay in most of the cases when you’re doing simple stuff.

But then, if you want something more advanced, you either have to go and get the big serious libraries, or just… well… not do it.

So, back to pair programming - making the core animation system that would have transitions, continuous blends, animation layers, bone masks and whatnot (and the kitchen sink of course!) is just not very easy. We paired basically on writing the pseudocode of the system, or some sort of outline; changed the implementation several times along the way, and in the end we’re left with a really nice and fast system and the code is actually quite simple. Much of the credit for that goes to Joe, as he found out some really cool ways to optimize the expensive things away (at that time we were not doing pair programming anymore - I went to do some research on shadows!)

But to reiterate - pairing can definitely work. I guess mostly because the other person just keeps asking “why you’re doing this?” or “this is wrong” or “we’re in a deep shit now” or “that’s awesome” :)

64k coding continued

I’m making steady but very slow progress on “my” 64k intro. Over the last week I couldn’t get over 13 kilobytes, so you can see that the progress is really slow. Not because I don’t code anything, but because every increase in code size was cancelled out by data size optimizations.

So far coding and data design for small sizes is not that much pain at all. Just, well, code and, well, keep your data small :) We’re only talking about the size of initial data, not the runtime size though.

A few obvious or new notes:

  • Code to construct a cylinder is more complex than the one to construct a sphere. That’s what I expected. However, code to construct a box with multiple segments per side is the most complex of all!
  • Dropping the last byte from floats is usually okay. An instant 25% saving! For some of the numbers, I plan to switch to half-style floats (2 bytes) if space becomes a concern.
  • Storing quaternions in 4 bytes (byte per component) is good. Actually, now that I think of it, it makes more sense to store three components at 10 bits each, and just store the sign of 4th component - better precision for the same size.
  • This intro literally has the most complex and most automated “art pipeline” of any demo/game I (directly) worked on! I’ve got maxscripts generating C++ sources, custom commandline tools preprocessing C++ sources (mostly floats packing - due to lack of maxscript functionality), lua scripts for batch-compiling HLSL shaders, “development code” generating .obj models for import back into max, etc. It’s wacky, weird and cool!
  • Compiling HLSL in two steps (HLSL->asm and asm->bytecode) instead of direct (HLSL->bytecode) gets rid of the constant table, some copyright strings and hence is good. (thanks blackpawn!)
  • Getting FFD code to behave remotely similar to how 3dsmax does FFD is hard :)

The best thing so far is that I’ve got the music track from x_dynamics - it’s already done in V2 synth, takes small amount of space and is really good. Now I “just” have to finish the intro…
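The two packing notes from the list above (dropping the last byte of a float, and storing a quaternion as three 10 bit components plus the sign of the fourth) can be sketched roughly like this. It is an illustration under assumptions rather than the intro’s actual code: all function names are made up, floats are assumed to be 32 bit IEEE-754, and the quaternion is assumed to be normalized.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes 32 bit IEEE-754 floats");

// Keep the top 3 bytes of a float: sign, exponent and 15 mantissa bits.
inline void PackFloat24(float value, std::uint8_t out[3])
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    out[0] = std::uint8_t(bits >> 8);
    out[1] = std::uint8_t(bits >> 16);
    out[2] = std::uint8_t(bits >> 24);
}

inline float UnpackFloat24(const std::uint8_t in[3])
{
    const std::uint32_t bits = (std::uint32_t(in[0]) << 8) |
                               (std::uint32_t(in[1]) << 16) |
                               (std::uint32_t(in[2]) << 24);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

struct Quat { float x, y, z, w; };

// 10 bits each for x, y, z plus one bit for the sign of w (31 bits used).
inline std::uint32_t PackQuat(const Quat& q)
{
    const float len2 = q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w;
    assert(std::fabs(len2 - 1.0f) < 1e-3f);  // must be normalized
    auto quantize = [](float v) {
        v = std::fmin(1.0f, std::fmax(-1.0f, v));
        return std::uint32_t(std::lround((v * 0.5f + 0.5f) * 1023.0f));
    };
    return quantize(q.x) | (quantize(q.y) << 10) | (quantize(q.z) << 20) |
           (q.w < 0.0f ? (1u << 30) : 0u);
}

// w is rebuilt from the unit-length constraint and the stored sign.
inline Quat UnpackQuat(std::uint32_t packed)
{
    auto dequantize = [](std::uint32_t bits) {
        return float(bits & 1023u) / 1023.0f * 2.0f - 1.0f;
    };
    Quat q;
    q.x = dequantize(packed);
    q.y = dequantize(packed >> 10);
    q.z = dequantize(packed >> 20);
    const float w2 = 1.0f - (q.x * q.x + q.y * q.y + q.z * q.z);
    q.w = std::sqrt(w2 > 0.0f ? w2 : 0.0f);
    if (packed & (1u << 30)) q.w = -q.w;
    return q;
}
```

One caveat of rebuilding w this way: its precision gets worse as |w| approaches zero, because small errors in x, y and z are amplified by the square root.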

Debugging plus

Yesterday I had a cool debugging session while working on my HDR demo. One of postprocessing filters produced weird results and I went off to investigate that. The usual tricks: debugging in Visual Studio to make sure right sample offsets are generated; D3D debug runtime, D3DX debug, reference rasterizer, firing up NVPerfHud and doing frame analysis, doing full capture with PIX and inspecting device state, etc.

Nothing helped.

Then I noticed that in the pixel shader, I wrote

sample = tex2D( s0, uv + vSmpOffsets[i] );

instead of

sample += tex2D( s0, uv + vSmpOffsets[i] );

Aaargh. So much for a plus sign.

How to deal with such bugs? Why are some bugs trivial to find, and some hard? Why does the time required to find a bug sometimes (often?) not correlate with the bug’s “trickiness”? Why can I sometimes find a tricky bug in a big unknown codebase in a couple of minutes, yet spend two hours on a plus sign in my own small code?

I’ve got no answers to the above.

By the way: PIX is a great tool, but D3D guys should really polish the UI :)

C++ clunkiness

Here I am coding various things and it suddenly struck me: C++ is pretty clunky. Now, I have known this to some extent for quite some time already, but the more I code C++ the more clunky it feels.

I admire it as a low-level language; it’s very powerful and there are lots of unbelievable things you can do with it (think templates).

But still, it feels like a low-level one. I really want to code my “next project” (whatever that might be) in Lua, for example (especially now that LuaJIT is out).

Immediate mode GUI

It’s already advertised somewhere else, but I’ll add it as well: Casey Muratori has an amazing lecture on Immediate Mode GUI. I like the idea, and while I think IMGUI mostly applies to realtime UIs (in-game, editors etc.), this is a good read (er… watch) for any UI programmer.

I was subconsciously heading towards that UI style as well; drawing and processing some UI “immediately”, though not at the level Casey does.
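For readers who have not seen the style, here is a minimal sketch of an immediate mode button in C++. It is not taken from the lecture and not from any real codebase; the names (UiState, Button, BeginFrame, EndFrame) and the hot/active bookkeeping are one plausible way to structure it. The key idea is that there is no button object: the widget exists only as a function call made every frame, which both processes input and (in a real version) draws.

```cpp
#include <cassert>

// Minimal immediate mode button; all names are illustrative.
struct Rect { int x, y, w, h; };

struct UiState {
    int mouseX = 0, mouseY = 0;
    bool mouseDown = false;
    int hotItem = 0;     // widget under the mouse this frame (0 = none)
    int activeItem = 0;  // widget that captured the mouse press (0 = none)
};

inline bool Inside(const UiState& ui, const Rect& r)
{
    return ui.mouseX >= r.x && ui.mouseX < r.x + r.w &&
           ui.mouseY >= r.y && ui.mouseY < r.y + r.h;
}

inline void BeginFrame(UiState& ui) { ui.hotItem = 0; }

inline void EndFrame(UiState& ui)
{
    if (!ui.mouseDown)
        ui.activeItem = 0;        // button released: nothing is active
    else if (ui.activeItem == 0)
        ui.activeItem = -1;       // press landed on no widget; block others
}

// Called every frame. Returns true on the frame the button is clicked.
// A real version would also issue the draw calls for the button here.
inline bool Button(UiState& ui, int id, const Rect& r)
{
    assert(id > 0);  // 0 and -1 are reserved by the bookkeeping above
    if (Inside(ui, r)) {
        ui.hotItem = id;
        if (ui.activeItem == 0 && ui.mouseDown)
            ui.activeItem = id;   // capture the press
    }
    // A click is a release over the same widget that captured the press.
    return !ui.mouseDown && ui.hotItem == id && ui.activeItem == id;
}
```

Application code then reads like `if (Button(ui, 1, rect)) DoSomething();` between BeginFrame and EndFrame each frame, with no separate widget tree or callbacks to keep in sync.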