Archive for 'opengl'

Mobile graphics API wishlist: some features
In my previous post I talked about things I’d want from OpenGL ES 2.0 in the performance area. Now it’s time to look at what extra features it might expose with an extension here or there.

Mobile graphics API wishlist: performance
Most mobile platforms are currently based on OpenGL ES 2.0. While it is much better than traditional OpenGL, there are ways in which it limits performance or does not expose some interesting hardware features. So here’s an unorganized wishlist for GLES2.0, performance part!

iOS shader tricks, or it’s 2001 all over again
I was recently optimizing some OpenGL ES 2.0 shaders for iOS/Android, and it was funny to see how performance tricks that were cool in 2001 are having their revenge again. Here’s a small example of starting with a normalmapped Blinn-Phong shader and optimizing it to run several times faster. Most of the clever stuff below was actually done by ReJ, props to him! Here’s a small test I’ll be working on: just a single plane with albedo and normal map textures:

GLSL Optimizer
During development of Unity 3.0, I was not-so-pleasantly surprised to see that our cross-compiled shaders run slow on iPhone 3GS. And by “slow”, I mean SLOW; at the speeds of “stop the presses, we cannot ship brand new OpenGL ES 2.0 support with THAT performance”.

Compiling HLSL into GLSL in 2010
Realtime shader languages these days have settled down into two camps: HLSL (or Cg, which for all practical purposes is the same) and GLSL (or GLSL ES, which is sufficiently similar). HLSL/Cg is used by Direct3D and the big consoles (Xbox 360, PS3). GLSL/ES is used by OpenGL and pretty much all modern mobile platforms (iPhone, Android, …). Since shaders are more or less “assets”, having two different languages to deal with is not very nice. What, I’m supposed to write my shader twice just to support both (for example) D3D and iPad? You would think that in 2010, almost a decade after high-level realtime shader languages appeared, this problem would be solved… but it isn’t!

ARB_draw_buffers
No, I don’t have any particular point to make. But I did not even get the t-shirt…

OpenGL 3: a big step in no direction at all?
Well, the post title pretty much summarizes my take on it, doesn’t it? I guess I could just stop typing now… but I won’t!

So after some promises, delays and a period of deadly silence, OpenGL 3.0 was released. Response to it was “interesting”, to say the least. Part of that response is related to seriously mishandled communication on Khronos’ part. Part is because GL 3.0 is not what it was promised to be. Let’s just ignore the communication issue; it does not affect OpenGL itself in a direct way (it affects the developer community though).

By the way, I borrowed part of the post title from a blog post linked from opengl.org. In general, I do not agree with that blog post, but it’s a valid point of view. Unlike some other blog posts linked from opengl.org that are just pure garbage…

I am not sure what the goals of OpenGL are at this point. OpenGL’s current position, as far as games are concerned, seems to be roughly this:
Why? Because Windows has got D3D, which is far more stable, comes with useful tools, is updated more often and actually works for a variety of users (I’ll get to this point in a second). Mobile platforms have OpenGL ES, which is decent. All consoles have their own APIs (some of them similar to D3D, none of them similar to GL). So that leaves OpenGL as the choice on OS X, Linux and such. Not because it’s better. Because it’s the only choice.

“Oh, but look, id uses OpenGL! Two other games use OpenGL as well!” Well, good for them. But they are in a different league than “the rest of us”. For some games, driver writers will do whatever it takes to get those games running correctly and fast. Surprise surprise, id games fall into this category. For the rest of us, no such luxury. Hey, try talking to your friendly IHV; the most likely answer is “yeah, but we are really busy with some high profile games right now, ping us back in two months”. After two months, repeat.

So the rest comes from someone who is not working on the high-profile games that IHVs specially tune drivers for.

If OpenGL’s goal is to stay in this current position, then GL 3.0 is okay. It adds some new features, brings some extensions into core, hey, it even says “it’s quite likely that maybe perhaps someday some of the old cruft in the API will be removed, if we feel like it”. No problem with that.

However, OpenGL is advertised as something different, as if it wants to:
Which is quite different from its current position. I’m not sure if that’s the goal of OpenGL. Myself, I don’t care about the mythical cross-platform API that would actually work on those different platforms. An API is a tool to do stuff; if different platforms have different APIs, no problem with that. However, if OpenGL wants to achieve this advertised goal, it has to do several things. First and foremost:

Actually work
Stable drivers and runtime. In its current state, GL is too complex to implement good quality drivers/runtime for. Complexity can be reduced in several ways:
GL 3.0 could have done both of the above; instead it did neither. It could have cleaned up the API and provided one platform-independent GL 1.x/2.x library that calls into the actual 3.0 runtime. All the fixed function, immediate mode, display lists, whatever would be in one nice library. Even existing apps could continue to function transparently this way (with the benefit of simpler, and therefore more stable, drivers).

Support platforms/hardware/features user needs
This is of course dependent on the user in question. For someone like us, we still have to support 10 year old hardware. D3D9 does a fine job for that (provided you have drivers installed, and the DX9 runtime installed, which comes included in XP SP2 and upwards). OpenGL 2.1 and earlier would do a fine job for that, provided it would “actually work” (see above).

If GL 3.0 were as originally promised (an almost new API, for shader model 2.0+ hardware), it would be sort of fine. In our case, that would mean writing and supporting two renderers, “old GL” and “new GL”, where the old one would be used on old hardware or old platforms where “new GL” is not available. If the new runtime were much leaner, much more stable and generally nicer, this would not be a big problem.

With actual GL 3.0, in theory one does not have to write two renderers. Minimum hardware level for GL 3.0 is shader model 4+ though. So to support both old hardware/platforms and new hardware/platforms, quite a lot of duplication has to be done. Especially if you intend to go towards the proposed “future GL path”, i.e. start dropping deprecated functionality from the codebase. At which point you’ll probably write two separate renderers already. So we’re back to where the original GL 3.0 would have been, just without any extra niceness/stability/leanness right now.

Oh, and look at vendor announcements from the 2008 OpenGL BOF. NVIDIA: we have almost full drivers now. AMD: we’re committed to having drivers. Intel: look for GL 3.0 on future platforms.
In other words, it looks like Intel’s current cards won’t ever have GL 3.0 drivers. And in our target market, Intel has the majority of cards. That sounds very much like a “just ignore the whole GL 3.0 thing” plan to me.

Be nice
This is a point of far lesser importance than the “actually work” and “support what is needed” ones. Having good tools (PIX, …), documentation, code examples etc. is nice. But not much more; being the nicest API in the world does not do much if it does not actually work or does not support what you need. Even in this area, actual GL 3.0 is not nice: it’s full of redundancies and crap that goes 15 years back in history.

Summing it up
To me, GL 3.0 looks like a blunder. Instead of fixing the core problems, they just postponed that. Well, keep up the good work!

Depth bias and the power of deceiving yourself
In Unity we very often mix fixed function and programmable vertex pipelines. In our lighting model, some number of the brightest lights per object are drawn in pixel lit mode, and the rest are drawn using fixed function vertex lighting. Naturally the pixel lights most often use vertex shaders, as they want to calculate some texcoords for light cookies, or do something with tangent space, or calculate some texcoords for shadow mapping, and so on. The vertex lighting pass uses fixed function, because it’s the easiest way. It is possible to implement a fixed function lighting equivalent in vertex shaders, but we haven’t done that yet because of complexities of Direct3D and OpenGL, the need to support shader model 1.1 and various other issues. Call me lazy.

And herein lies the problem: most often the precision of vertex transformations is not the same in fixed function versus programmable vertex pipelines. If you just draw some objects in multiple passes, mixing fixed function and programmable paths, this is roughly what you will get (excuse my programmer’s art): Not pretty at all! This should have looked like this: So what do we do to make it look like this?
We “pull” (bias) some rendering passes slightly towards the camera, so there is no depth fighting.

Now, at the moment the Unity editor runs only on Macs, which use OpenGL. In there, most hardware configurations do not need this depth bias at all; they are able to generate the same results in fixed function and programmable pipelines. Only Intel cards need the depth bias on Mac OS X (on Windows, AMD and Intel cards need depth bias). So people author their games using OpenGL, where depth bias is not needed in most cases.

How do you apply depth bias in OpenGL? Enable GL_POLYGON_OFFSET_FILL and set glPolygonOffset to something like -1, -1. This works.

How do you apply depth bias in Direct3D 9? Conceptually, you do the same. There are DEPTHBIAS and SLOPESCALEDEPTHBIAS render states that do just that. And so we did use them. And people complained about funky results on Windows. And I’d look at their projects, see that they are using something like 0.01 for the camera’s near plane and 1000.0 for the far plane, and tell them something along the lines of “increase your near plane, stupid!” (well ok, without the “stupid” part). And I’d explain all the above about mixing fixed function and vertex shaders, and how we do depth bias in that case, and how on OpenGL it’s often not needed but on Direct3D it’s pretty much always needed. And yes, how sometimes that can produce “double lighting” artifacts on close or intersecting geometry, and how the only solution is to increase the near plane and/or avoid close or intersecting geometry. Sometimes this helped! I was so convinced that their too-low near plane was always the culprit.

And then one day I decided to check. This is what I’ve got on Direct3D: Ok, this scene is intentionally using a low near plane, but let me stress this again. This is what I’ve got: Not good at all. What happened? It happened in roughly this way:
It’s good to check your assumptions once in a while.

So yeah, the proper multiplier for depth bias on Direct3D with a 24 bit depth buffer should be not 1.0/65535.0, but something like 1.0/(2^24-1). Except that this value is really small (about 5.96e-8), so something like 4.8e-7 should be used instead (see Lengyel’s GDC2007 talk). Oh, but for some reason it’s not really enough in practice, so something like 2.0*4.8e-7 should be used instead (tested so far on GeForce 8600, Radeon HD 3850, Radeon 9600, Intel 945, reference rasterizer). Oh, and the same value should be used even when a 16 bit depth buffer is used; using the 1.0/65535.0 multiplier with a 16 bit depth buffer produces way too large a bias.

With proper bias values the image is good on Direct3D again. Yay for that (fix is coming in Unity 2.1 soon).

…and yes, I know that real men fudge the projection matrix instead of using depth bias… someday maybe.

What OpenGL actually needs
Ok, it looks like the OpenGL 3.0 specification will be delayed a bit. Oh well, spec now, first drivers a bit later, sort-of-stable drivers a year or two later, and Joe-the-average-user will hopefully have some OpenGL 3.0 support in his Windows box after 5 years.

Still, progress has to be made. The idea of abandoning the old concept of “bind the current object and do stuff on it” and replacing it with direct functions that take the object as a parameter is very good. Too much state-machine-like functionality in current OpenGL is just a pain for no good reason. Another very good idea is to make most objects immutable once they are created. Too much flexibility for no good reason just makes the lives of driver developers harder (and gives them many more opportunities to introduce bugs). All in all, OpenGL’s API is becoming more like Direct3D, which is good in my eyes.

What does OpenGL need, besides all the work that goes into OpenGL 3.0? Certainly not lengthy discussions on whether alpha test should be kept or removed (it does not matter! just pick one) or whether shader assembly is actually assembly (it’s not. but current implementations of GLSL are too unusable, so…).

What OpenGL needs is implementation quality. Of all crashes in Unity 1.x web games, close to 100% are inside the OpenGL driver’s DLL, occurring in totally unpredictable situations. I’ve yet to see a crash in the D3D driver in Unity 2.0 web games.

Why is this? My thinking is that in D3D, quite a chunk of work is done by Microsoft (the D3D runtime). And as it’s a component of the OS, they probably try hard to make it stable, and they have WHQL tests at least. It’s a somewhat similar situation on the Mac with OpenGL: Apple does the runtime, and IHVs do the drivers. Thus OpenGL on the Mac is much more stable than on Windows (it’s not as stable as I’d like it to be, but hey).

Get someone out of the whole Khronos conglomerate to write GLSL parsers, format conversions, whatever else that is not directly tied to the hardware. Make it open source if you wish, so that some bugs can be found by mere mortals (instead of waiting indefinitely for IHVs to reply because we’re not important enough). Write very extensive testing suites that not only test rasterization rules, but also try to do something more complex than drawing a couple of primitives. The more tests the better. And make it required for all implementations to use this common codebase and pass all the tests, otherwise they won’t have the right to call themselves “OpenGL”.

Oh, and get more games to actually use OpenGL, because right now all drivers have to do is make sure the current id Software engine runs okay :)

Is OpenGL really faster than D3D9?
The common knowledge is that drawing stuff in OpenGL is much faster than in D3D9. I wonder: is this actually true, or just an urban legend? I could very well imagine that setting everything up to draw a single model and then issuing 1000 draw calls for it is faster in OpenGL… but come on, that’s not a very life-like scenario!
At work we now have both D3D9 and OpenGL renderers on Windows. The original codebase was very much designed for OpenGL, so I had to jump through a lot of hoops to get it fully working on D3D… small differences that add up, like: there’s no object space texgen on D3D, shaders don’t track built-in state (world, modelview matrices, light positions, …), textures in GL vs. textures + sampler state in D3D, and so on. Anyway, the codebase was definitely not designed to exploit D3D strengths and OpenGL weaknesses; more likely the other way around.

But wait! I look at our benchmark tests, and D3D9 is consistently faster than OpenGL. Some examples:
To be fair, there are a couple of tests where on some hardware OpenGL has a slight edge. But in 95% of the cases, D3D9 is faster. Not to mention that we have about 10x fewer workarounds for broken hardware/drivers on D3D9 than we have on OpenGL…

What gives? Either our OpenGL code is horribly suboptimal, or “OpenGL is faster!!!!11oneoneeleven” is a myth. I have trouble figuring out in which places our code would be horribly suboptimal; I think we follow all the advice given by hardware vendors on how to make OpenGL efficient (not that there is much advice out there though…).

There isn’t much software that can run the same content on both D3D and OpenGL and is suitable for benchmarking. I tried the Ogre 3D demos on one machine (GeForce 6800GT card) and guess what? D3D9 is faster in tests that specifically stress draw count (like the instancing demo… D3D9 is faster both in instanced and non-instanced modes).

Am I crazy?
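
For anyone who wants to check a “tests that stress draw count” claim on their own renderer, the measurement can be as simple as timing how long the CPU spends issuing calls. The following is a hypothetical sketch, not the benchmark behind the numbers above: `averageMicrosPerCall` and its `submit` parameter are names made up for illustration, and `submit` stands in for whatever issues one draw call in a given renderer. It only measures CPU-side submission cost; GPU work runs asynchronously and would need separate timing.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Average CPU-side cost of one call, in microseconds, over `count` calls.
// `submit` is a hypothetical stand-in for the code that issues one draw call
// (state setup plus the actual draw) in the renderer being measured.
double averageMicrosPerCall(const std::function<void()>& submit, std::size_t count)
{
    assert(static_cast<bool>(submit)); // an empty std::function cannot be called
    if (count == 0)
        return 0.0; // nothing to measure, avoid dividing by zero

    using Clock = std::chrono::steady_clock; // monotonic, unaffected by clock changes
    const auto start = Clock::now();
    for (std::size_t i = 0; i < count; ++i)
        submit();
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;

    return elapsed.count() / static_cast<double>(count);
}
```

Running the same workload through both renderers and comparing the two averages gives a rough per-call comparison. The “one model, 1000 draw calls” case would be just one such workload, and probably not the most representative one.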