Archive for 'd3d'
In Unity we very often mix fixed function and programmable vertex pipelines. In our lighting model, a certain number of the brightest lights per object are drawn in pixel lit mode, and the rest are drawn using fixed function vertex lighting. Naturally the pixel lights most often use vertex shaders, as they want to calculate some texcoords for light cookies, or do something with tangent space, or calculate some texcoords for shadow mapping, and so on. The vertex lighting pass uses fixed function, because it’s the easiest way. It is possible to implement a fixed function lighting equivalent in vertex shaders, but we haven’t done that yet because of the complexities of Direct3D and OpenGL, the need to support shader model 1.1 and various other issues. Call me lazy.
And herein lies the problem: most often precision of vertex transformations is not the same in fixed function versus programmable vertex pipelines. If you’d just draw some objects in multiple passes, mixing fixed function and programmable paths, this is roughly what you will get (excuse my programmer’s art):

Not pretty at all! This should have looked like this:

So what do we do to make it look like this? We “pull” (bias) some rendering passes slightly towards the camera, so there is no depth fighting.
Now, at the moment the Unity editor runs only on Macs, which use OpenGL. There, most hardware configurations do not need this depth bias at all: they are able to generate the same results in fixed function and programmable pipelines. Only Intel cards need the depth bias on Mac OS X (on Windows, AMD and Intel cards need it). So people author their games using OpenGL, where depth bias is not needed in most cases.
How do you apply depth bias in OpenGL? Enable GL_POLYGON_OFFSET_FILL and set glPolygonOffset to something like -1, -1. This works.
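For reference, the OpenGL specification defines the offset that gets added to each fragment’s depth as factor * m + units * r, where m is the polygon’s maximum depth slope and r is the smallest depth difference the implementation guarantees to resolve. Here is a minimal sketch of just that formula; the function and parameter names are mine, and r is passed in by the caller because its real value is implementation-dependent.

```cpp
// Depth offset as defined by the OpenGL specification:
//   offset = factor * m + units * r
// maxDepthSlope (m): maximum depth slope of the polygon.
// minResolvable (r): smallest resolvable depth difference. It is
//   implementation-dependent, so this sketch takes it as a parameter
//   instead of guessing what a given driver uses.
double polygonOffset(double factor, double units,
                     double maxDepthSlope, double minResolvable) {
    return factor * maxDepthSlope + units * minResolvable;
}

// The corresponding GL calls (not compiled here) are:
//   glEnable(GL_POLYGON_OFFSET_FILL);
//   glPolygonOffset(-1.0f, -1.0f);   // factor, units
```

So with -1, -1 a polygon facing the camera head-on (m = 0) is pulled forward by one resolvable depth step, and steeper polygons are pulled forward by more.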
How do you apply depth bias in Direct3D 9? Conceptually, you do the same. There are DEPTHBIAS and SLOPESCALEDEPTHBIAS render states that do just that. And so we did use them.
And people complained about funky results on Windows.
And I’d look at their projects, see that they are using something like 0.01 for camera’s near plane and 1000.0 for the far plane, and tell them something along the lines of “increase your near plane, stupid!” (well ok, without the “stupid” part). And I’d explain all the above about mixing fixed function and vertex shaders, and how we do depth bias in that case, and how on OpenGL it’s often not needed but on Direct3D it’s pretty much always needed. And yes, how sometimes that can produce “double lighting” artifacts on close or intersecting geometry, and how the only solution is to increase the near plane and/or avoid close or intersecting geometry.
Sometimes this helped! I was so convinced that their too-low-near-plane was always the culprit.
And then one day I decided to check. This is what I’ve got on Direct3D:

Ok, this scene is intentionally using a low near plane, but let me stress this again. This is what I’ve got:

Not good at all.
What happened? It happened in roughly this way:
- First, depth bias documentation on Direct3D is wrong. Depth bias is not in 0..16 range, it is in 0..1 range which corresponds to entire range of depth buffer.
- Back then, our code was always using 16 bit depth buffers, so the equivalent of -1,-1 depth bias in OpenGL was multiplied with something like 1.0/65535.0, and that was fed into Direct3D. Hey, it seemed to work!
- Later on, the device setup code was modified to do proper format selection, so most often it ended up using 24 bit depth buffer. Of course I never modified the depth bias code to account for this change…
- And it stayed there. And I kept deceiving myself that the content of the users is to blame, and not some stupid code of mine.
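To put a number on that mismatch, here is a small sketch that expresses a bias in units of “smallest depth buffer step”. It assumes an idealized fixed-point depth buffer and a bias given in the 0..1 depth range; the helper names are made up for illustration.

```cpp
#include <cstdint>

// Size of one step in an idealized fixed-point depth buffer.
// Assumption: depth covers 0..1 uniformly with 2^bits - 1 steps.
double depthStep(int depthBits) {
    return 1.0 / static_cast<double>((std::uint64_t{1} << depthBits) - 1);
}

// How many depth buffer steps a given bias (in 0..1 depth range) spans.
double biasInSteps(double bias, int depthBits) {
    return bias / depthStep(depthBits);
}
```

With these, `biasInSteps(1.0 / 65535.0, 16)` is one step, while `biasInSteps(1.0 / 65535.0, 24)` comes out at roughly 256 steps. That is, the old 16 bit multiplier applied to a 24 bit depth buffer pulls geometry forward about 256 times further than intended.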
It’s good to check your assumptions once in a while.
So yeah, the proper multiplier for depth bias on Direct3D with 24 bit depth buffer should be not 1.0/65535.0, but something like 1.0/(2^24-1). Except that this value is really small, so something like 4.8e-7 should be used instead (see Lengyel’s GDC2007 talk). Oh, but for some reason it’s not really enough in practice, so something like 2.0*4.8e-7 should be used instead (tested so far on GeForce 8600, Radeon HD 3850, Radeon 9600, Intel 945, reference rasterizer). Oh, and the same value should be used even when a 16 bit depth buffer is used; using 1.0/65535.0 multiplier with 16 bit depth buffer produces way too large bias.
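A rough sketch of what the conversion could look like in code. The 2.0*4.8e-7 constant is the one quoted above; the helper names are hypothetical, and mapping the OpenGL factor straight onto the slope scale state is my assumption, not something verified here. The bit-cast helper is there because Direct3D 9 render states are DWORDs, so float-valued states are passed as raw bits.

```cpp
#include <cstdint>
#include <cstring>

// Multiplier quoted in the post: 4.8e-7, doubled because the plain value
// turned out not to be enough in practice.
constexpr float kDepthBiasScale = 2.0f * 4.8e-7f;

// Convert OpenGL-style offset units (e.g. -1) into a Direct3D 9 depth bias,
// which is expressed in the 0..1 range of the whole depth buffer.
float d3dDepthBias(float glUnits) {
    return glUnits * kDepthBiasScale;
}

// Reinterpret a float as the 32 bit value that SetRenderState expects.
std::uint32_t floatBits(float value) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "float must be 32 bit");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// With a real device (not compiled here) usage would look roughly like:
//   device->SetRenderState(D3DRS_DEPTHBIAS, floatBits(d3dDepthBias(-1.0f)));
//   device->SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS, floatBits(-1.0f));
```

Using `memcpy` for the reinterpretation avoids the pointer-cast trick that is often seen in D3D samples and that technically violates C++ aliasing rules.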
With proper bias values the image is good on Direct3D again. Yay for that (fix is coming in Unity 2.1 soon).
…and yes, I know that real men fudge projection matrix instead of using depth bias… someday maybe.
Posted on 2008-06-12 8:52 in code, d3d, opengl, unity, work | 1 Comment »
(cross-posted from blogs.unity3d.com)
One of our customers found an interesting bug the other day: embedding Unity Web Player into a web page makes some javascript animation libraries not work correctly. For example, script.aculo.us or Dojo Toolkit would stop doing some of their tasks. But only on Windows, and only on some browsers (Firefox and Safari).
Wait a moment… Unity plugin makes nice wobbling web page elements not wobble anymore!? Sounds like an interesting issue…
So I prepared for a debug session and tried the usual “divide by two until you locate the problem” approach.
- Unity Web Player is composed of two parts: a small browser plugin, and the actual “engine” (let’s call it “runtime”). First I change the plugin so that it only loads the data, but never loads or starts the runtime. Everything works. So the problem is not in the plugin. Good.
- Load the runtime and do basic initialization (create child window, load Mono, …), but never actually start playing the content - everything works.
- Load the runtime and fully initialize everything, but never actually start playing the content - the bug appears! By now I know that the problem is somewhere in the initialization.
Initialization reads some settings from the data file, creates some “manager objects” for the runtime, initializes graphics device, loads first game “level” and then the game can play.
What of the above could cause something inside the browser’s JavaScript engine to stop working? And do that only on Windows, and only on some browsers? My first guess was the most platform-specific part: initialization of the graphics device, which on Windows usually happens to be Direct3D.
So I continued:
- Try using OpenGL instead of Direct3D - everything works. By now it’s confirmed that initializing Direct3D causes something else in the browser to stop working.
- “A-ha!” moment: tell Direct3D to not change floating point precision (via a create flag). Voilà, everything works!
I don’t know how I actually came up with the idea of testing the floating point precision flag. Maybe I remembered some related problems we had a while ago, where Direct3D would cause timing calculations to be “off” if the user’s machine had not been rebooted for a couple of weeks or more. That time around we properly changed our timing code to use 64 bit integers, but left the Direct3D precision setting intact.
Side note: Intel x86 floating point unit (FPU) can operate in various precision modes, usually 32, 64 or 80 bit. By default Direct3D 9 sets FPU precision to 32 bit (i.e. single precision). Telling D3D to not change FPU settings could lower performance somewhat, but in my tests it did not have any noticeable impact.
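In Direct3D 9 the create flag that leaves the FPU alone is D3DCREATE_FPU_PRESERVE (presumably the one meant above). To get a feel for why single precision hurts timing code, here is a small sketch. It does not touch the FPU control word; it just stores a millisecond counter in a 32 bit float, which gives the same kind of rounding. The function names are made up for illustration.

```cpp
#include <cstdint>

// True if advancing a millisecond counter by deltaMs has no effect at all
// once the counter is kept in a 32 bit float. The volatile stores force the
// intermediate results to really be rounded to float.
bool tickLostInFloat(std::int64_t counterMs, std::int64_t deltaMs) {
    volatile float before = static_cast<float>(counterMs);
    volatile float after = before + static_cast<float>(deltaMs);
    return after == before;
}

// Same check with a 64 bit double, for comparison.
bool tickLostInDouble(std::int64_t counterMs, std::int64_t deltaMs) {
    volatile double before = static_cast<double>(counterMs);
    volatile double after = before + static_cast<double>(deltaMs);
    return after == before;
}
```

Two weeks of uptime is 1209600000 ms. At that magnitude neighbouring floats are 128 ms apart, so `tickLostInFloat(1209600000LL, 16)` returns true (a 16 ms frame simply disappears), while `tickLostInDouble(1209600000LL, 16)` returns false. JavaScript numbers are doubles, so it seems plausible that animation timers in the browser broke in a similar way once the FPU was switched to single precision, though I have not verified exactly which calculation went wrong.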
So there it was. A debugging session, one line of change in the code, and fancy javascript webpage animations work on Windows in Firefox and Safari. This is coming out in Unity 2.0.2 update soon.
The moral? Something in one place can affect seemingly completely unrelated things in another place!
Posted on 2008-01-22 11:46 in d3d, rant, unity, work | No Comments »
The common knowledge is that drawing stuff in OpenGL is much faster than in D3D9. I wonder - is this actually true, or just an urban legend? I could very well imagine that setting everything up to draw a single model and then issuing 1000 draw calls for it is faster in OpenGL… but come on, that’s not a very life-like scenario!
At work we now have both D3D9 and OpenGL renderers on Windows. The original codebase was very much designed for OpenGL, so I had to jump through a lot of hoops to get it fully working on D3D… small differences that add up, like: there’s no object space texgen on D3D, shaders don’t track built-in state (world, modelview matrices, light positions, …), textures in GL vs. textures + sampler state in D3D, and so on. Anyway, the codebase was definitely not designed to exploit D3D strengths and OpenGL weaknesses, more likely the other way around.
But wait! I look at our benchmark tests, and D3D9 is consistently faster than OpenGL. Some examples:
- Real world scene with lots of shadow casting lights (different objects, different shaders, different lights, different shadow types in one scene):
  - Core Duo with Radeon X1600: 23 FPS D3D9, 13 FPS GL.
  - P4 with GeForce 6800GT: 16 FPS D3D9, 9 FPS GL.
  - Core2 Duo with Radeon HD 2600: 41 FPS D3D9, 35 FPS GL.
- High object count test (1000 objects, multiple lights, 5 passes per object total):
  - Core Duo with Radeon X1600: 18.3 FPS D3D9, 12.5 FPS GL.
  - P4 with GeForce 6800GT: 13.2 FPS D3D9, 9.4 FPS GL.
  - Core2 Duo with Radeon HD 2600: 34.8 FPS D3D9, 29.3 FPS GL.
- Dynamic geometry (lots of particle systems) test (this is limited by vertex buffer writing speed and the CPU calculating the particles, not by draw calls):
  - Core Duo with Radeon X1600: 170 FPS D3D9, 102 FPS GL.
  - P4 with GeForce 6800GT: 108 FPS D3D9, 74 FPS GL.
  - Core2 Duo with Radeon HD 2600: 325 FPS D3D9, 242 FPS GL.
- …and so on.
To be fair, there are a couple of tests where on some hardware OpenGL has a slight edge. But in 95% of the cases, D3D9 is faster. Not to mention that we have about 10x fewer broken hardware/driver workarounds for D3D9 than we have for OpenGL…
What gives? Either our OpenGL code is horribly suboptimal, or “OpenGL is faster!!!!11oneoneeleven” is a myth. I have trouble figuring out in which places our code would be horribly suboptimal; I think we follow all the advice given by hardware vendors on how to make OpenGL efficient (not that there is much advice out there though…).
There isn’t much software that can run the same content on both D3D and OpenGL and is suitable for benchmarking. I tried Ogre 3D demos on one machine (GeForce 6800GT card) and guess what? D3D9 is faster in tests that specifically stress draw count (like the instancing demo… D3D9 is faster both in instanced and non-instanced modes).
Am I crazy?
Posted on 2007-09-23 1:50 in d3d, opengl | 12 Comments »
Just got back from MVP Global Summit 2007 in Seattle. Among usual things, like watching Bill’s keynote, meeting other MVPs, DirectX/XNA guys, getting a grip of some NDA information and such, here are some of the other highlights:
Amsterdam airport:
Officer: You speak English sir?
Me: Yeah.
O (takes a look at my passport): Ah, you speak Russian of course!
M: No, not really.
O: But your language is very similar to Russian, right?
M: Hm…
Well, here we know who gets the Linguist of the Year award.
Seattle-Tacoma airport, lady at check-in: “what kind of passport is that?”. It also takes her five attempts to enter my last name properly, from the printed letters in the passport, each time trying to persuade me that of course I did change the ticket date!
Seattle-Tacoma airport, security: “sir, you have been selected for additional screening”. Do they randomly select people for that quite involved process? Why does this “selection” happen immediately after they take a look at my passport?
Random quotes:
Ten minutes walk is a long distance! Ten minutes of walking distance in the States is a very good reason to buy a car. At least SUV; preferably a Hummer.
DirectX SDK is the source of all sorts of high frequency goodness.
Sony is always good at announcements.
No? Rumours on the internet? Shock! Horror!
Posted on 2007-03-17 23:13 in conferences, d3d, random | 5 Comments »
I have a MacBook Pro now and am slowly getting used to it. It’s quite hard, considering that I’ve never had a laptop before, and actually used a Mac for the first time just a couple of months ago. My daughter thinks the best part about it is the weird image effects in PhotoBooth. I just can’t disagree.
On an unrelated note, I am now a Microsoft DirectX MVP. Just about the time when I almost stopped using it! I’d love to, but we’re making a product that primarily runs on Macs… quite hard to use D3D there. But almost every day I wish I could, and every second day I’m annoying my coworkers by saying that D3D is lightyears ahead of TheOtherAPI!
The MVP award just came out of nowhere. It’s one of the things that you never expect - but hey, it feels good anyway. And now I have an MVP laptop case for my MacBook :)
Posted on 2006-04-20 19:22 in d3d | No Comments »
Reading DirectX10 preview documentation right now (you know, it’s released with Dec2005 SDK). It is pretty impressive, I must say! Seems like a huge leap forward. Back to reading!
Posted on 2005-12-16 19:58 in d3d | 7 Comments »
Posted on 2005-10-03 12:38 in d3d, papers | No Comments »
I’ve written down the basic idea here. Done some tests and it really seems to work!
That required tiny 700 lines of hacky C++ code in the engine; but in exchange there’s no longer a need to write state restoring passes by hand. Maybe such effect usage scheme would even be useable in RealWorld!
Too bad I didn’t think it up a couple of months ago. My ShaderX4 article about this subject would have been much better…
Ok, still got to test this stuff on real world data (i.e. trying it on our demos)
Posted on 2005-09-27 18:46 in d3d, papers | No Comments »
In my projects I’ve been using D3DX Effects with no device state saving/restoring. Instead, each effect contained a dummy “last pass” that restores “needed” state (see here; more lengthy article coming in ShaderX4).
I always wrote this “state restore” by hand. This is obviously very error-prone; it’s ok if I’m the only one writing effects but would be unusable in any real world scenario.
I think I could automatically generate the “state restore” pass. Somehow the engine knows which states need to be restored; which must be set in every effect etc. (this could be read from some file). It first loads each effect file and examines what states it touches. This can be done by supplying a custom ID3DXEffectStateManager and “executing” the effect - the state manager then would remember all states (left-hand sides of state assignments) touched by the effect.
Then the engine generates the “state restore” pass and loads the effect again. I’d imagine it would do it like this: each effect has to contain a macro RESTORE_PASS:
technique Foo {
    pass P1 { ... }
    pass P2 { ... }
    RESTORE_PASS
}
Which would be empty during first load and which would expand to the generated restore pass on the second load (you can supply generated macro definitions when loading the effect). The engine can check whether the generated pass exists after second load (if it doesn’t then RESTORE_PASS is missing from the effect - an error).
The downside of this scheme is that each effect file has to be loaded twice - first time for examining its state assignments and second time for actually loading it with the generated restore pass. It’s not a problem for me, I guess, because effect loading doesn’t take much time anyway… And if it would become really slow, all this stuff can be done as a preprocess (e.g. during a build).
There are many upsides of this scheme, I think: the whole system is robust and error-proof again (no longer depends on the effect writer to remember all the details about states). And as far as I can see, no performance would be lost at all (performance was the main point why I’m using this “restore pass”).
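To make the two-load idea a bit more concrete, here is a rough sketch of the recording and generation steps in plain C++. Everything in it is hypothetical: `StateRecorder` only stands in for a custom ID3DXEffectStateManager (a real one would implement that interface’s callbacks), the state names and default values are made up, and `generateRestorePass` is just one way the text for RESTORE_PASS could be built.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a custom ID3DXEffectStateManager: instead of
// forwarding state changes to the device, it only remembers which states
// the effect assigned while being "executed" on the first load.
class StateRecorder {
public:
    void onStateSet(const std::string& stateName) { touched_.insert(stateName); }
    const std::set<std::string>& touched() const { return touched_; }

private:
    std::set<std::string> touched_;
};

// Builds the text that RESTORE_PASS would expand to on the second load.
// `defaults` maps state names to the values the engine wants restored
// (the post suggests reading this from some file). States the effect
// touched but that have no entry in `defaults` are left alone.
std::string generateRestorePass(const std::set<std::string>& touched,
                                const std::map<std::string, std::string>& defaults) {
    std::string body = "pass Restore {";
    for (const std::string& name : touched) {
        auto it = defaults.find(name);
        if (it != defaults.end()) {
            body += " " + name + " = " + it->second + ";";
        }
    }
    body += " }";
    return body;
}
```

For an effect that touched ZWriteEnable and AlphaBlendEnable, with defaults of True and False respectively, this produces `pass Restore { AlphaBlendEnable = False; ZWriteEnable = True; }`, which would then be handed to the effect loader as the definition of the RESTORE_PASS macro.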
Gotta go and implement all this!
Posted on 2005-09-24 18:45 in d3d, papers | 2 Comments »