Deferred Cascaded Shadow Maps

Reading “Rendering Technology at Black Rock Studios” made me realize that the cascaded shadow maps I did 2+ years ago for Unity 2.0 are probably called “deferred shadowing”. Since I never wrote up how they are done… here goes:

The process is roughly this (all of this is DX9 level tech on PCs; later tech or consoles could and should use more optimizations):

  1. Render shadow map cascades. All of them packed into one shadow map via viewports.

  2. Collect shadows into a screen-sized render target. This is the shadow term.

  3. Blur the shadow term.

  4. In regular forward rendering, use the shadow term from that screen-space texture.

More detail:

Render Shadow Cascades

Nothing fancy here. All cascades are packed into a single shadow map. For example, two 512x512 cascades would be packed side by side into a 1024x512 shadow map.

Screen-space Shadow Term

Render all shadow receivers with a shader that “collects” the shadow term. In effect, shadows from all cascades are collected into a screen-sized texture. After this step, the original cascaded shadow maps are not needed anymore.

Unity supports up to 4 shadow map cascades, which neatly fit into a float4 register in the pixel shader. The correct cascade is sampled just once, without using static or dynamic branching. Pixel shader pseudocode:

 // z = pixel's distance from the camera;
 // _LightSplitsNear/_LightSplitsFar = near & far distances of the 4 cascades
 float4 near = float4 (z >= _LightSplitsNear); // 1 for each split the pixel is past
 float4 far = float4 (z < _LightSplitsFar);    // 1 for each split the pixel is before
 float4 weights = near * far; // exactly one component ends up being 1
 // pick the right cascade's shadow map coordinates, with no branching
 float2 coord =
     i._ShadowCoord[0] * weights.x +
     i._ShadowCoord[1] * weights.y +
     i._ShadowCoord[2] * weights.z +
     i._ShadowCoord[3] * weights.w;
 float sm = tex2D (_ShadowMapTexture, coord).r;

Additionally, shadow fadeout is applied here (shadows in Unity can be cast up to a specified distance from the camera, and they fade out when approaching that distance).
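
For illustration, the fadeout could look something like this (a minimal sketch; the function and parameter names are made up, not Unity’s actual ones):

 float FadeShadow (float shadow, float z, float fadeStart, float fadeLength)
 {
     // blend towards 1.0 (fully lit) as the pixel's distance z
     // approaches the maximum shadow distance
     float fade = saturate ((z - fadeStart) / fadeLength);
     return lerp (shadow, 1.0, fade);
 }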

After this I end up with the shadow term in screen space. Note that I do not do any shadow map filtering here; that is done in screen space later.

On PCs in DX9 there is (or was?) no easy/sane way to read the depth buffer in the pixel shader, so while collecting shadows the shader also outputs depth, packed into two channels of the render target.
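
The packing itself is the usual “split a float across two 8-bit channels” trick. A sketch of one way to do it (function names are mine):

 float2 PackDepth (float z) // z assumed to be in [0,1)
 {
     // coarse part in x, the remainder scaled up in y
     float2 enc = float2 (z, frac (z * 255.0));
     enc.x -= enc.y / 255.0; // remove the part that y already stores
     return enc;
 }

 float UnpackDepth (float2 enc)
 {
     return enc.x + enc.y / 255.0;
 }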

Screen-space Shadow Blur

The previous step results in a screen-space shadow term and depth. The shadow term is blurred into another render target, using a spatially varying Poisson disc-like filter.

The filter size depends on depth (shadow boundaries closer to the camera are blurred more). The filter also discards samples when the difference in depth is larger than a threshold, to avoid blurring over object boundaries. It’s not totally robust, but it seems to work quite well.
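
Here’s a rough sketch of what such a filter could look like. This is not the actual Unity shader; the tap count, the disc, the depth-based radius formula and the names (_ShadowTermTex, _BlurRadius, _DepthTolerance, _TexelSize) are all assumptions for illustration. It reuses the hypothetical UnpackDepth from above:

 sampler2D _ShadowTermTex; // r = shadow term, gb = packed depth
 float2 _TexelSize;        // 1 / render target size
 float _BlurRadius;        // base blur radius, in pixels
 float _DepthTolerance;    // max depth difference for a tap to be accepted

 static const int TAP_COUNT = 8;
 static const float2 poisson[TAP_COUNT] = {
     float2(-0.326,-0.406), float2(-0.840,-0.074),
     float2(-0.696, 0.457), float2(-0.203, 0.621),
     float2( 0.962,-0.195), float2( 0.473,-0.480),
     float2( 0.519, 0.767), float2( 0.185,-0.893)
 };

 struct v2f { float4 pos : POSITION; float2 uv : TEXCOORD0; };

 half4 fragBlur (v2f i) : COLOR
 {
     half4 center = tex2D (_ShadowTermTex, i.uv);
     float centerDepth = UnpackDepth (center.gb);
     // closer pixels (smaller depth) get a wider kernel
     float radius = _BlurRadius * (1.0 - centerDepth);
     float sum = center.r;
     float count = 1.0;
     for (int t = 0; t < TAP_COUNT; ++t)
     {
         half4 s = tex2D (_ShadowTermTex, i.uv + poisson[t] * radius * _TexelSize);
         // discard taps that land across a depth discontinuity
         if (abs (UnpackDepth (s.gb) - centerDepth) < _DepthTolerance)
         {
             sum += s.r;
             count += 1.0;
         }
     }
     // keep the packed depth around in case further passes need it
     return half4 (sum / count, center.g, center.b, 1);
 }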

Using the shadow term in forward rendering

In forward rendering, this blurred shadow term texture is used. The shadow term already has filtering & fadeout applied, so the shaders do not need to know anything about shadow cascades. Just read a pixel from the texture and use it in the lighting computation. Done!
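
In shader terms it boils down to roughly this (a sketch with a simple diffuse light; _ScreenShadowTex and the v2f contents are assumptions, the point is the single tex2Dproj read):

 sampler2D _ScreenShadowTex; // the blurred screen-space shadow term
 float4 _LightDir;           // world-space direction towards the light
 fixed4 _LightColor;

 struct v2f {
     float4 pos : POSITION;
     float4 screenPos : TEXCOORD0; // projective screen coords from the vertex shader
     float3 normal : TEXCOORD1;
 };

 half4 frag (v2f i) : COLOR
 {
     // one texture read; no cascade selection, filtering or fadeout here
     half shadow = tex2Dproj (_ScreenShadowTex, i.screenPos).r;
     half diff = saturate (dot (normalize (i.normal), _LightDir.xyz));
     return half4 (_LightColor.rgb * diff * shadow, 1);
 }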

Fin

Back then I didn’t know this would be called “deferred” (that would probably have scared me away!). I don’t know if this approach is any good, but so far it works quite well for Unity’s needs. It also reduces the shader permutation count a lot, which I like.


Fixing bugs, in Tom Waits' words

Mixing a sprint of bug fixing before the release with Tom Waits’ music results in an interesting combination. For example, Crossroads describes the bug fixing process perfectly:

And that’s where ol’ George found himself out there at the FogBugz
Fixin’ the devil’s bugs
Now, a man figures it’s his bugs and he’ll assign whom he wants
But it don’t always work out that way
You see, some bugs are special for a certain target
A certain platform, or a certain person
And no matter whom you’re assignin’, that’s where the bug ’ll end up
And in the moment of assigning your mouse turns into a dowser’s wand
And clicks where the bug wants to go.

Uhm. Yeah.


Strided blur and other tips for SSAO

If you’re new to SSAO, here are a couple of good overview blog posts: meshula.net and levelofdetail. Below are some tips and an idea about strided blur.

Bits and pieces I found useful

  • SSAO can be generated at a smaller resolution than the screen, with a depth+normals aware upsample/blur step.

  • If a random offset vector points away from the surface normal, flip it. This keeps the random vectors in the upper hemisphere, which reduces false occlusion on flat surfaces. Of course this requires having surface normals (see the sketch after this list).

  • When generating random vectors for your AO kernel:

    • Generate vectors inside unit sphere (not on unit sphere).
    • Use energy minimization to distribute your samples better, especially at low sample counts. See the malmer.ru blog post.
  • In your AO blurring/upsampling step: there’s no need to sample every pixel for the blur. Just skip some of them, i.e. make the kernel offsets larger. See below.
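
Here’s the sketch promised in the random-vector tip above; this is the entire trick (the function name is made up):

 float3 FlipToHemisphere (float3 offset, float3 normal)
 {
     // mirror kernel vectors that point away from the surface,
     // so every sample lands in the normal-facing hemisphere
     return (dot (offset, normal) < 0.0) ? -offset : offset;
 }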

Strided blur for AO

Normally you’d blur the AO term using some sort of standard blur, for example a separable Gaussian: a horizontal blur followed by a vertical blur. Here’s how one can imagine the horizontal blur kernel:

Horizontal Blur Kernel

Here’s how Rune taught me to blur better:

Rune: The other thing is the blur. I tried to make the blur 4 times stronger, and it looks much better IMO without any artifacts I could see. I could even use 4x downsampling with that blur amount and still get acceptable results.

Aras: how did you make it 4x stronger? (I was going to say that blur step is already quite expensive, and I don’t want to add more samples to make it even more expensive, yadda yadda)

Rune:

 m_SSAOMaterial.SetVector ("_TexelOffsetScale", m_IsOpenGL ?
     new Vector4 (4, 0, 1.0f/m_Downsampling, 0) :
     new Vector4 (4.0f/source.width, 0, 0, 0));

And similar for vertical.

Aras: hmm. that’s strange :)

Rune: I have no idea what I’m doing of course but it looks good.

Aras: so this way it does not do Gaussian on 9x9 pixels, but instead only takes each 4th pixel. Wider area, but… it should not work! :)

Rune: It creates a very fine pattern at pixel level but it’s way more subtle than the noise you get otherwise.

Aras: ok (hides in the corner and weeps)

So yeah. The blur kernel can be “spread” to skip some pixels, effectively resulting in a larger blur radius for the same sample count:

Blur with 2 pixel stride

Or even this:

Blur with 3 pixel stride

Yes, it’s not a correct blur. But that’s okay; we’re not building nuclear reactors that depend on SSAO blur being accurate. If you are, SSAO is probably the wrong approach anyway, I’ve heard it’s not that useful for nuclear stuff.

I’m not sure what this blur should be called. Strided blur? Interleaved blur? Interlaced blur? Or maybe everyone is doing this already and it has a well-established name? Let me know.
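
Whatever the name, in shader terms the change is tiny. A hedged sketch of one horizontal pass (not the actual Unity shader, and it omits the depth+normals aware sample rejection; just a 9-tap Gaussian where the offsets get multiplied by a stride):

 sampler2D _MainTex; // AO term to blur
 float2 _TexelSize;  // 1 / render target size
 float _Stride;      // 1 = regular blur, 2 or 3 = strided

 struct v2f { float4 pos : POSITION; float2 uv : TEXCOORD0; };

 half4 fragBlurH (v2f i) : COLOR
 {
     // 9-tap Gaussian weights: center + 4 taps on each side
     const float w[5] = { 0.227027, 0.194595, 0.121622, 0.054054, 0.016216 };
     float2 offs = float2 (_TexelSize.x * _Stride, 0); // the stride widens the kernel
     half sum = tex2D (_MainTex, i.uv).r * w[0];
     for (int t = 1; t < 5; ++t)
     {
         sum += tex2D (_MainTex, i.uv + offs * t).r * w[t];
         sum += tex2D (_MainTex, i.uv - offs * t).r * w[t];
     }
     return half4 (sum, sum, sum, 1);
 }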

Some images of the blur in action. Raw AO term (very low sample count - 8 - and increased contrast, on purpose):

Raw AO at low sample count

Regular 9x9 blur (does not blur over depth+normals discontinuities):

Blurred AO

Blur with a 2 pixel stride (effectively 17x17):

Blurred AO with stride 2

It does create a fine interleaved pattern because it skips pixels. But you get a wider blur!

Blurred AO with stride 2, magnified

Blur with a 3 pixel stride (effectively 25x25):

Blurred AO with stride 3

At a 3 pixel stride the artifacts are becoming apparent. But hey, this is a very low AO sample count, with increased contrast and no textures in the scene.

Blurred AO with stride 3, magnified

For the sake of completeness, the same raw AO term, but computed at 2x2 smaller resolution (still using a low sample count etc.):

AO computed at lower resolution

Now, the 2x2 smaller AO, blurred with a 3 pixel stride:

AO at lower resolution, blurred with 3 pixel stride

Happy blurring!


Usability depends on context!

Here’s a little story on how usability decisions need to depend on context.

In the Unity editor pretty much any window can be “detached” from the main window. An obvious use case is putting it onto a separate monitor. But of course, you can also just end up with a ton of detached windows overlapping each other.

Here I have four windows in total on OS X:

Overlapped Windows on OS X

Here I have four windows on Windows:

Overlapped Windows on Windows

However, users of OS X and Windows are used to applications behaving differently.

On OS X, it is very common that a single application has many overlapping windows. Usually users don’t have problems finding their windows either, thanks to Exposé. Press a key, voilà, here they are:

Exposé on OS X

On Windows, there is no Exposé. So there’s a problem: when a detached window is obscured by another window, how do you get to it? One might ask “well, what’s wrong with having windows partially overlapped, like in the screenshot above?”, to which I’d say “you’re a Mac user”.

Windows users do not keep a ton of windows on screen. They tend to maximize the application they are currently working with. I was doing this myself all the time; it took 3 years of Mac laptop usage before I stopped maximizing everything on my Windows box!

So here is what a typical Windows user might see when using Unity. Now, where are the other three detached windows?

Maximized

On Windows, it is very uncommon for a single application to have many overlapped windows. When an application does this, the “detached” windows are usually positioned on top of the main window. There are some applications that do not do this (yes, I’m looking at you, GIMP), and almost no one is happy with their usability.

So we decided to take this context into account. Windows users do not have Exposé, and they expect “detached” windows to always stay on top of the main window. Unity 2.6 will do just this:

In Front on Windows

Of course, you can still dock all the windows together, and then this whole “windows are obscured by other windows” issue goes away:

Docked on Windows

Hmm… I think the screenshots above show two big new features coming in Unity 2.6. Preemptive note: the UI of the stuff above is not final. Anything might change; don’t get attached to any particular pixel!


Talks & Demos from Assembly 2009

I went to the Assembly 2009 demoparty this year.

No demo submissions from me, but I did give a seminar presentation about developing graphics technology for small games (PDF slides). It’s mostly about hardware statistics, GPU features, testing and stability:

Asm'09 seminar: developing gfx tech for small games from Unity3D on Vimeo.

However, the awesome talk was given by ReJ: low level iPhone (pre-3GS) rendering details (PDF slides). The inner workings of the iPhone’s GPU, OpenGL ES drivers, command buffers, VFP assembly and so on. Bringing assembly back to the Assembly, yeah!


If you’re going to watch some demos from Assembly 2009, make sure to see:

  • Frameranger (1st place demo). Rocked the big screen! Seems somewhat unfinished though.

  • The Golden Path (3rd place demo) - for something fresh. Also, a good way to disprove the saying that “the winners don’t take drugs” :)

  • Muon Baryon (1st place 4 kilobyte intro) - that’s what kids do with sphere marching on the GPU these days.