Archive for 'work'

Four years ago today…
…I took a plane to Copenhagen. Oh, this sounds familiar…
Well ok, it all started a bit before: (more…)

Deferred Cascaded Shadow Maps
Reading “Rendering Technology at Black Rock Studios” made me realize that cascaded shadow maps I did 2+ years ago in Unity 2.0 are probably called “deferred shadowing”. Since I never wrote how they are done… here:

The process is roughly this (all of this is DX9 level tech on PCs; later tech or consoles could and should use more optimizations):
More detail:

Render Shadow Cascades
Nothing fancy here. All cascades are packed into a single shadow map. For example, two 512×512 cascades would be packed side by side into a 1024×512 shadow map.

Screen-space Shadow Term
Render all shadow receivers with a shader that “collects” the shadow map term. In effect, shadows from all cascades are collected into a screen-sized texture. After this step, the original cascaded shadow maps are not needed anymore.

Unity supports up to 4 shadow map cascades, which neatly fit into a float4 register in the pixel shader. The correct cascade is sampled just once, without using static or dynamic branching. Pixel shader pseudocode:
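A minimal sketch of branch-free cascade selection, written in Python rather than shader code so it can be run directly. It is not Unity's actual shader: the split distances, the function names and the side-by-side atlas layout (as in the 1024×512 example) are assumptions made for illustration. The idea is that two 4-wide comparisons multiplied together give a mask with a single 1.0 in it, and that mask picks the shadow map coordinate.

```python
# Hypothetical split distances: cascade i covers view depth [NEAR[i], FAR[i]).
SPLITS_NEAR = (0.0, 10.0, 30.0, 80.0)
SPLITS_FAR = (10.0, 30.0, 80.0, 200.0)


def cascade_weights(view_z, near=SPLITS_NEAR, far=SPLITS_FAR):
    """Return a 4-component mask with 1.0 for the cascade containing view_z.

    Stands in for two float4 comparisons multiplied component-wise:
    at most one component is 1.0 (none beyond the last cascade).
    """
    return tuple(float(view_z >= n) * float(view_z < f) for n, f in zip(near, far))


def atlas_uv(cascade_index, uv, cascade_count):
    """Map a cascade-local uv in [0, 1] into an atlas of side-by-side cascades."""
    return ((cascade_index + uv[0]) / cascade_count, uv[1])


def select_cascade_uv(cascade_uvs, weights):
    """Blend the per-cascade coordinates by the mask, so no if/else is needed."""
    u = sum(w * uv[0] for w, uv in zip(weights, cascade_uvs))
    v = sum(w * uv[1] for w, uv in zip(weights, cascade_uvs))
    return (u, v)


if __name__ == "__main__":
    # Assume the receiver projects to the middle of every cascade.
    uvs = [atlas_uv(i, (0.5, 0.5), 4) for i in range(4)]
    weights = cascade_weights(25.0)
    print(weights, select_cascade_uv(uvs, weights))
```

The shadow map is then sampled once at the selected coordinate. A point beyond the last split gets an all-zero mask, so a real shader would need to treat that case as "not in shadow".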
Additionally, shadow fadeout is applied here (shadows in Unity can be cast up to a specified distance from the camera, and they fade out when approaching that distance).

After this I end up having the shadow term in screen space. Note that here I do not do any shadow map filtering; that is done in screen space later.

On PCs in DX9 there is (or there was?) no easy/sane way to read the depth buffer in the pixel shader, so while collecting shadows the shader also outputs depth packed into two channels of the render target.

Screen-space Shadow Blur
The previous step results in a screen-space shadow term and depth. The shadow term is blurred into another render target, using a spatially varying Poisson disc-like filter. Filter size depends on depth (shadow boundaries closer to the camera are blurred more). The filter also discards samples if the difference in depth is larger than some threshold, to avoid blurring over object boundaries. It's not totally robust, but seems to work quite well.

Using shadow term in forward rendering
In forward rendering, this blurred shadow term texture is used. Here the shadow term already has filtering & fadeout applied, and the shaders do not need to know anything about shadow cascades. Just read a pixel from the texture and use it in the lighting computation. Done!

Fin
Back then I didn't know this would be called "deferred" (that would probably have scared me away!). I don't know if this approach is any good, but so far it works quite well for Unity needs. Also, it reduces shader permutation count a lot, which I like.

Fixing bugs, in Tom Waits’ words
Mixing a sprint of bug fixing before the release with Tom Waits’ music results in an interesting combination. For example, Crossroads describes the bug fixing process perfectly:
Uhm. Yeah.

Strided blur and other tips for SSAO
If you’re new to SSAO, here are good overview blog posts: meshula.net and levelofdetail. Some tips and an idea on strided blur below.

Usability depends on context!
Here’s a little story on how usability decisions need to depend on context. In Unity editor pretty much any window can be “detached” from the main window. An obvious use case is putting it onto a separate monitor. But of course you can just end up having a ton of detached windows overlapping each other. Here I have four windows in total on OS X: (more…)

Compact Normal Storage for small g-buffers
I’ve been experimenting with compact storage of view space normals for small g-buffers. Think about storing depth and normal in a single 8 bit/channel RGBA texture. Here are my findings – with error visualization and shader performance numbers for some GPUs. If you know any other method to encode/store normals in a compact way, please let me know!

Implementing fixed function T&L in vertex shaders
Almost half a year ago I was wondering how to implement T&L in vertex shaders. Well, finally I implemented it for upcoming Unity 2.6. I wrote some sort of a technical report here. In short, I’m combining assembly fragments and doing simple temporary register allocation, which seems to work quite well. Performance is very similar to using fixed function (I know it’s implemented as vertex shaders internally by the runtime/driver) on several different cards I tried (Radeon HD 3xxx, GeForce 8xxx, Intel GMA 950).

What was unexpected: the most complex piece is not the vertex lighting! Most complexity is in how to route/generate texture coordinates and transform them. Huge combination explosion there. Otherwise – I like! Here’s a link to the article again.

Unity 2.5 is out
Unity 2.5 is finally released. In summary: Here’s what’s new. Here’s the download page. My 11th Unity release since I joined 3+ years ago.
This is quite a crazy release that involved an almost complete editor tools rewrite and lots of other juggling. It was not exactly a walk in the park, but it’s done now. Meet me at GDC in San Francisco next week and I’ll tell you the war stories (Unity booth is 5110 NH). Here’s the obligatory source code commits graph:

Another Vista review (after 6 months of usage)
Ok, I don’t exactly like Windows Vista. But I just spent 6 months using Vista as my primary OS at work… because everyone else was using XP, and someone had to make sure everything works on Vista as well. So it was me.

In summary, Vista is not that bad. Once you get used to the changes in Explorer, the different skin and so on – it’s actually usable. I think they have made some real improvements in the underlying technology; too bad they managed to “compensate” for all of that with inconsistencies and lack of polish in the user interface. At this point it’s the minor quirks in the UI that annoy me, but apart from that, Vista is okay. Look:
So yeah. It’s not stellar, it has tons of small annoyances (and some large ones – try developing web plugins with UAC on…), but it’s usable. I might have gotten used to it by now, actually.

Fixed function lighting in vertex shader – how?
Sometime soon I’ll have to implement the fixed function lighting pipeline in vertex shaders. Why? Because mixing fixed function and vertex shaders in multiple passes does not guarantee identical transformation results, thus requiring depth bias or projection matrix tweaks, which leads to various artifacts that annoy people to hell.

I don’t really know why that happens, because it seems that most modern cards don’t have fixed function units, so internally they are running shaders anyway. The DX9 runtime on Vista’s WDDM also seems to be handing only shaders to the driver internally. Still, for some reason somewhere the precision does not match…

How should such a task be approached? My requirements are:
I looked at ATI’s FixedFuncShader sample. It’s an ubershader approach: one large (230 instructions or so) shader with static VS2.0 branching. It had some obvious places to optimize; I could get it down to 190 or so instructions, kill some rcp's and reduce the amount of constant storage by 2x. Still, it did not handle some things in the D3D T&L or had some issues:
Another thing I’m considering is to combine the final shader(s) from assembly fragments, with some simple register allocation. In T&L shader code, there’s only a limited set of could-be-redundant computations, mostly computing world space position, camera space normal, view vector and so on (those could be used by lighting, texgen or fog). Those computations can be explicitly put into separate fragments, and later fragments could just use their result.

What is left then is some register allocation. A shader assembly fragment could want some temporary registers for internal use (this is simple, just give it a bunch of unused registers), also want some registers as input (from previous fragments), and save some output in registers. Again, I haven’t checked with shader performance tools, but I think, guess and hope that the drivers do additional register allocation, liveness analysis etc. when converting D3D shader bytecode into hardware format. This would mean that I can be quite sloppy with it, i.e. I don’t have to implement some super smart allocation scheme.

I wrote some experimental code for the shader assembly combiner and so far it looks like a reasonable approach (and not too hard either).

Does that make sense? Or did everyone solve those problems eons ago already?

Edit: half a year later, I wrote a technical report on how I implemented all this: http://aras-p.info/texts/VertexShaderTnL.html
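The fragment-combining idea described in this post can be sketched roughly as follows. This is a small Python illustration under stated assumptions, not the experimental combiner mentioned above and not the method from the technical report: the `Fragment` class, the `combine` function, the `{name}` template syntax and the constant register numbers in the example fragments are all hypothetical. Outputs keep their register for the rest of the shader, while temporaries are handed back after each fragment, which is the "sloppy" allocation the text hopes the driver will tidy up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One hypothetical assembly fragment using symbolic register names."""
    name: str
    inputs: tuple   # named values produced by earlier fragments
    outputs: tuple  # named values this fragment leaves in registers
    temps: int      # scratch registers needed only inside the fragment
    code: str       # template with {value_name} and {tmp0}, {tmp1}, ...


def combine(fragments, register_count=12):
    """Concatenate fragments, mapping symbolic names to r0..r(N-1)."""
    free = [f"r{i}" for i in range(register_count)]
    live = {}   # value name -> register that holds it from now on
    lines = []

    def take():
        if not free:
            raise RuntimeError("out of temporary registers")
        return free.pop(0)

    for frag in fragments:
        mapping = {}
        for name in frag.inputs:
            if name not in live:
                raise ValueError(f"{frag.name} needs '{name}' before it is computed")
            mapping[name] = live[name]
        for name in frag.outputs:
            if name not in live:
                live[name] = take()
            mapping[name] = live[name]
        scratch = [take() for _ in range(frag.temps)]
        for i, reg in enumerate(scratch):
            mapping[f"tmp{i}"] = reg
        lines.append(frag.code.format(**mapping))
        free = scratch + free  # scratch registers are reusable by later fragments
    return "\n".join(lines)


# Illustrative fragments only; the constant registers (c4, c8) are made up.
WORLD_POS = Fragment("world_pos", (), ("world_pos",), 0,
                     "m4x3 {world_pos}.xyz, v0, c4")
VIEW_VEC = Fragment("view_vec", ("world_pos",), ("view_vec",), 1,
                    "sub {tmp0}.xyz, c8, {world_pos}\nnrm {view_vec}.xyz, {tmp0}")

if __name__ == "__main__":
    print(combine([WORLD_POS, VIEW_VEC]))
```

Because shared values such as the world space position live in their own fragment, lighting, texgen and fog fragments can all list it as an input and it is computed only once.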