Deferred Cascaded Shadow Maps
Reading “Rendering Technology at Black Rock Studios” made me realize that the cascaded shadow maps I did 2+ years ago in Unity 2.0 would probably be called “deferred shadowing”. Since I never wrote up how they are done, here it is.
The process is roughly this (all of this is DX9-level tech on PCs; later tech or consoles could and should use more optimizations):
- Render the shadow map cascades, all packed into one shadow map via viewports.
- Collect shadows into a screen-sized render target. This is the shadow term.
- Blur the shadow term.
- In regular forward rendering, use the shadow term in screen space.
More detail:
Render Shadow Cascades
Nothing fancy here: all cascades are packed into a single shadow map. For example, two 512×512 cascades would be packed side by side into a 1024×512 shadow map.
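A small sketch of the atlas bookkeeping. It assumes a grid with at most two cascades per row, so two cascades give the 1024×512 layout described here. Four cascades would then form a 2×2 grid, which is my assumption about the layout. The function names are hypothetical. In practice the per-cascade offset and scale would presumably be folded into each cascade's shadow matrix, so the pixel shader can sample the atlas directly.

```python
import math


def atlas_layout(cascade_size, count, columns=2):
    """Return ((atlas_width, atlas_height), viewports) for cascades in a grid.

    Each viewport is (x, y, width, height) in texels; cascades fill rows
    left to right, at most `columns` per row (hypothetical layout rule).
    """
    rows = math.ceil(count / columns)
    width = min(count, columns) * cascade_size
    height = rows * cascade_size
    viewports = []
    for i in range(count):
        col, row = i % columns, i // columns
        viewports.append((col * cascade_size, row * cascade_size,
                          cascade_size, cascade_size))
    return (width, height), viewports


def to_atlas_uv(u, v, viewport, atlas_size):
    """Map a [0, 1] UV inside one cascade to a UV in the packed atlas."""
    x, y, w, h = viewport
    atlas_w, atlas_h = atlas_size
    return ((x + u * w) / atlas_w, (y + v * h) / atlas_h)
```

For example, the center of the second cascade in the two-cascade layout lands at atlas UV (0.75, 0.5).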
Screen-space Shadow Term
Render all shadow receivers with a shader that “collects” the shadow map term. In effect, shadows from all cascades are collected into a screen-sized texture. After this step, the original cascaded shadow maps are not needed anymore.
Unity supports up to 4 shadow map cascades, so their split distances fit neatly into a float4 register in the pixel shader. The correct cascade is sampled just once, without using static or dynamic branching. Pixel shader pseudocode:
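// z: this pixel's view-space depth. _LightSplitsNear/_LightSplitsFar hold
// the near and far split distances of the 4 cascades, one per component.
float4 nearMask = float4(z >= _LightSplitsNear);
float4 farMask = float4(z < _LightSplitsFar);
float4 weights = nearMask * farMask; // 1 for the cascade containing z, 0 elsewhere
// Branchless selection: only the matching cascade's coordinate survives.
float3 coord =
    i._ShadowCoord[0] * weights.x +
    i._ShadowCoord[1] * weights.y +
    i._ShadowCoord[2] * weights.z +
    i._ShadowCoord[3] * weights.w;
float sm = tex2D(_ShadowMapTexture, coord.xy).r;
// Assuming the shadow map stores light-space depth: the pixel is shadowed
// if it lies behind the stored occluder depth.
float shadow = (coord.z > sm) ? 0.0 : 1.0;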
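The same branchless selection can be run on the CPU in Python. Lists stand in for float4, and the split distances and coordinates in the usage example are made up. Each weight is 1 only for the cascade whose [near, far) range contains z, so the weighted sum returns exactly that cascade's coordinate. Past the last far split, all weights are 0.

```python
def select_cascade_coord(z, splits_near, splits_far, shadow_coords):
    """Mirror of the shader's weights * coordinate selection (hypothetical helper).

    Returns (weights, coord); coord is the weighted sum of all cascade
    coordinates, which equals the single matching cascade's coordinate.
    """
    weights = [float(z >= n) * float(z < f)
               for n, f in zip(splits_near, splits_far)]
    coord = [0.0, 0.0, 0.0]
    for w, c in zip(weights, shadow_coords):
        # Multiply-add instead of an if: zero weights contribute nothing.
        coord = [acc + w * ci for acc, ci in zip(coord, c)]
    return weights, coord


near = [0.0, 10.0, 20.0, 40.0]
far = [10.0, 20.0, 40.0, 80.0]
coords = [(0.1, 0.2, 0.3), (0.6, 0.2, 0.5), (0.3, 0.7, 0.4), (0.9, 0.9, 0.8)]
weights, coord = select_cascade_coord(15.0, near, far, coords)
```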
Additionally, shadow fadeout is applied here (shadows in Unity can be cast up to a specified distance from the camera, and they fade out when approaching that distance).
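Here is a sketch of one way such a fadeout could work: a linear blend to “fully lit” over the last part of the shadow distance. The linear ramp, the 20% fade band, and measuring `distance` from the camera are all assumptions, not Unity's exact formula.

```python
def apply_shadow_fade(shadow, distance, shadow_distance, fade_fraction=0.2):
    """Fade a shadow term (0 = shadowed, 1 = lit) to fully lit near shadow_distance.

    fade_fraction: hypothetical fraction of shadow_distance over which the
    fade happens (0.2 means the last 20% of the range).
    """
    fade_start = shadow_distance * (1.0 - fade_fraction)
    t = (distance - fade_start) / (shadow_distance - fade_start)
    t = min(max(t, 0.0), 1.0)           # saturate()
    return shadow + (1.0 - shadow) * t  # lerp(shadow, 1.0, t)
```

A fully shadowed pixel at 90 units with a 100-unit shadow distance comes out half lit.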
After this I end up with the shadow term in screen space. Note that I do not do any shadow map filtering here; that is done later, in screen space.
On PCs in DX9 there is (or was?) no easy or sane way to read the depth buffer in the pixel shader, so while collecting shadows the shader also outputs depth, packed into two channels of the render target.
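One common way to store a [0, 1) depth in two 8-bit channels is to split it into a coarse part and a fine part. This sketch assumes 8-bit channels and a depth already normalized to [0, 1). It is not necessarily the exact encoding Unity uses, and the function names are hypothetical. The round-trip error is at most half a step of the fine channel, which is 0.5 / 255².

```python
import math


def encode_depth_rg(depth):
    """Pack depth in [0, 1) into two 8-bit channel values (0..255)."""
    if not 0.0 <= depth < 1.0:
        raise ValueError("depth must be in [0, 1)")
    scaled = depth * 255.0
    hi = math.floor(scaled)            # coarse part, 0..254
    lo = round((scaled - hi) * 255.0)  # fine part, rounded to 8 bits
    return hi, lo


def decode_depth_rg(r, g):
    """Reconstruct depth from the coarse (r) and fine (g) channels."""
    return r / 255.0 + g / (255.0 * 255.0)
```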
Screen-space Shadow Blur
The previous step produces the screen-space shadow term and depth. The shadow term is blurred into another render target using a spatially varying, Poisson-disc-like filter.
The filter size depends on depth (shadow boundaries closer to the camera are blurred more). The filter also discards samples whose depth differs from the center pixel's by more than a threshold, to avoid blurring across object boundaries. It's not totally robust, but it seems to work quite well.
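Here is a CPU sketch of a depth-aware, spatially varying blur like this. The Poisson-like offsets, the 1/depth radius scaling, and the fixed depth threshold are all assumptions for illustration, not Unity's actual values.

```python
# Hypothetical Poisson-disc-like sample offsets on the unit disc (not Unity's).
POISSON_OFFSETS = [
    (0.0, 0.0), (0.53, 0.21), (-0.42, 0.47), (-0.31, -0.55),
    (0.25, -0.62), (0.78, -0.35), (-0.85, 0.05),
]


def blur_shadow_term(shadow, depth, base_radius=4.0, depth_threshold=0.1):
    """Depth-aware blur of a screen-space shadow term.

    shadow, depth: 2D lists (rows of floats) of the same size.
    base_radius: kernel radius in pixels at depth <= 1.0 (assumed units);
    beyond that the radius shrinks as 1/depth, so nearer pixels blur more.
    depth_threshold: samples whose depth differs from the center pixel by
    more than this are discarded, to avoid bleeding across object edges.
    """
    height, width = len(shadow), len(shadow[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center_depth = depth[y][x]
            radius = base_radius / max(center_depth, 1.0)
            total, count = 0.0, 0
            for ox, oy in POISSON_OFFSETS:
                sx = min(max(int(round(x + ox * radius)), 0), width - 1)
                sy = min(max(int(round(y + oy * radius)), 0), height - 1)
                if abs(depth[sy][sx] - center_depth) > depth_threshold:
                    continue  # across a depth discontinuity: skip this sample
                total += shadow[sy][sx]
                count += 1
            # The (0, 0) offset always passes, so count >= 1.
            result[y][x] = total / count
    return result
```

The depth test is what keeps a dark foreground object from smearing its shadow onto a distant background.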
Using shadow term in forward rendering
Forward rendering then uses this blurred shadow-term texture. The shadow term already has filtering and fadeout applied, so the shaders do not need to know anything about shadow cascades. They just read the pixel from the texture and use it in the lighting computation. Done!
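The final lookup really is that simple. Here is a minimal CPU sketch. It assumes the pixel's clip-space position is available, a D3D-style top-left texture origin (hence the optional Y flip), point sampling, and a plain Lambert diffuse term. DX9 half-texel offsets are ignored. All function names are hypothetical.

```python
def clip_to_screen_uv(clip_x, clip_y, clip_w, flip_y=True):
    """Perspective-divide a clip-space position and map it to [0, 1] UV.

    flip_y assumes a D3D-style texture origin at the top-left.
    """
    ndc_x = clip_x / clip_w
    ndc_y = clip_y / clip_w
    u = ndc_x * 0.5 + 0.5
    v = ndc_y * 0.5 + 0.5
    return (u, 1.0 - v) if flip_y else (u, v)


def sample_nearest(texture, u, v):
    """Point-sample a 2D list texture (rows of floats) at UV in [0, 1]."""
    height, width = len(texture), len(texture[0])
    x = min(max(int(u * width), 0), width - 1)
    y = min(max(int(v * height), 0), height - 1)
    return texture[y][x]


def shade_pixel(albedo, light_color, n_dot_l, shadow):
    """Lambert diffuse term attenuated by the screen-space shadow term."""
    return albedo * light_color * max(n_dot_l, 0.0) * shadow
```

The lighting shader needs no cascade matrices or split distances. It only needs the screen position of the pixel.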
Fin
Back then I didn't know this would be called “deferred” (that would probably have scared me away!). I don't know if this approach is any good, but so far it works quite well for Unity's needs. It also reduces the shader permutation count a lot, which I like.
Good post, and I like the term “deferred shadowing” for this as well!
This is very similar to one of Wolf Engel’s techniques in ShaderX5, I think :). Not sure if you’ve seen that or not.
Yep, definitely deferred shadowing :) The variable sized filter and depth discontinuity discards are staples in SSAO too. Nice work!
CryEngine 2 did something similar, but stored the shadows for 4 different lights in the 4 channels of the screen-space shadow buffer.
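Here is a tiny sketch of the four-lights-per-target idea. It assumes an RGBA8 target with one light per channel and unused channels set to 1 (fully lit). It only illustrates the packing and is not CryEngine's code; the function names are hypothetical.

```python
def pack_light_shadows(shadows):
    """Pack up to 4 per-light shadow terms (0 = shadowed, 1 = lit) into RGBA8."""
    if len(shadows) > 4:
        raise ValueError("at most 4 lights fit into one RGBA target")
    padded = list(shadows) + [1.0] * (4 - len(shadows))  # unused channels: lit
    return tuple(round(min(max(s, 0.0), 1.0) * 255) for s in padded)


def unpack_light_shadow(rgba, light_index):
    """Read back the shadow term of one light from its channel."""
    return rgba[light_index] / 255.0
```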
I did that around 2003 in my trucks game. Unfortunately the GeForce3 didn’t allow a fancy blur scheme, so it was just blurring outward with a semi-acceptable radius.
I can still see artifacts in recent screenshots, though :(
For the weights calculation, I used this formulation:
float4 weights = ( z > _LightSplitsNear );
weights.xyz -= weights.yzw;
The rest is the same.
_LightSplitsNear has to be ordered from nearest to furthest, and the cascades should overlap. Also, if z is past the furthest cascade’s far distance, ‘weights’ does not return to all zeroes.
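Here is a quick check of the two weight formulations against each other, written with Python lists instead of float4. The helper names are hypothetical, and the example splits are contiguous. Inside the covered range they pick the same cascade. They differ exactly at split boundaries (`>` vs `>=`) and past the last far split, where the subtraction version keeps the last cascade selected.

```python
def weights_product(z, near, far):
    """Original formulation: (z >= near) * (z < far), per cascade."""
    return [float(z >= n) * float(z < f) for n, f in zip(near, far)]


def weights_subtract(z, near):
    """Alternative: w = (z > near); w.xyz -= w.yzw (needs ascending near)."""
    w = [float(z > n) for n in near]
    return [w[0] - w[1], w[1] - w[2], w[2] - w[3], w[3]]


near = [0.0, 10.0, 20.0, 40.0]
far = [10.0, 20.0, 40.0, 80.0]
```

The subtraction version saves the far-split compares, at the cost of the ordering requirement and the behavior past the last split.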
@Alex: nice!
I’m doing this pretty much the same way, but I had a very hard time making this work with anti-aliasing (DirectX 9). In the end, I added a resolve pass that tries to fill invalid gaps with some approximate shadow value (the darkest of all neighbors after applying some validity and continuity heuristics) to get rid of all the wrongly lit edges. This has artifacts of its own, but at least it doesn’t look totally wrong. The big drawback here is that this resolve pass is not separable and quite expensive.
Any experience on this issue?
@CodingCat: I just ignored the whole AA issue. So yes, when AA is used the shadow term is sometimes wrong on edges. So far I haven’t heard lots of complaints from Unity users, so I assume everyone is fine with that :)
So I continued working on this topic and finally seem to have come to a solution. In case anyone is still interested, I just published a rather lengthy article on this whole AA matter: http://www.alphanew.net/index.php?section=articles&site=multisampling
Summed up in one sentence, I am now shifting the shadow mask look-up coordinate away from polygon edges towards the polygon center on edge pixels, thus rather elegantly circumventing wrongly shadowed pixels.
@CodingCat: thanks for sharing!
I think I coined the term Deferred Cascaded Shadow Maps and the term shadow collector in 2005 in my ShaderX article …
@Wolf: yeah, very well might be ;) I think I read your article halfway through Unity’s shadow system development, and IIRC the trick of gathering shadows from cascades without branches came from there. But it’s all quite hazy now, feels like ages ago!
Yup, this is industry standard at this point. Are you using Early-Stencil mask to get your “each pixel selects correct map without branches”?
You can take this a bit further by creating another screen-space Early-Stencil edge mask (it's actually very quick; I got mine to ~0.5 ms on PS3) and running your screen-space blur pass only on the edges of the shadow mask. I realized this a couple of years ago: you don’t need to run a blur on all-white or all-black pixels, since it won’t change anything. You only need it along the edges of the shadows that the mask creates. This should cut the cost of the blur to about 20% to 60% of the total, depending on shadow complexity on screen at the time.
@Szymon: what would stencil masking save in selecting the cascade? Per pixel, it just computes the final UV (a couple of compares and several multiply-adds); see the code above.
But yeah, for blurring something could be done. IIRC, ATI had a technique for that back in the day. Not sure if they used stencil or just dynamic branching (might be the latter, as it was to show off dynamic branching on the Radeon X1xxx).
How do you create the mask? I’d imagine you either need to sample the shadow map once, detect edges and dilate them (into stencil… somehow), or sample the shadow map with a small number of widely spread samples and hope you won’t miss some very thin shadows?
@Aras I imagined this: render a volume for each cascade slice, and create a rejection mask in stencil for everything but the first cascade you want to compute. Then run a full-screen quad with the right shadow matrix and texture uploaded; it will only operate on pixels for that cascade. Update stencil to mask away that cascade, and allow the next cascade to be updated. Repeat: full-screen quad, new matrix, new texture, etc. That way you only run minimal pixel operations full screen for an arbitrary number of cascades (computing only 1 UV per pixel), with the exception of grabbing the depth once per pixel to do the shadow test.
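Here is a simplified CPU model of this multi-pass scheme. A per-pixel list entry stands in for the stencil buffer, and each cascade volume is reduced to its far split distance, with cascades processed nearest first. The names are hypothetical. The model only shows which pixels each pass touches, not the GPU cost.

```python
def stencil_multipass_cascades(depths, splits_far):
    """Assign each pixel to the first (nearest) cascade pass that covers it.

    depths: per-pixel view depths; splits_far: far split per cascade,
    ascending. Returns the cascade index per pixel, or None past the last one.
    """
    assigned = [None] * len(depths)  # None plays the role of a clear stencil
    for cascade, far in enumerate(splits_far):
        for i, z in enumerate(depths):
            # Stencil test (not yet handled) plus "inside this cascade volume".
            if assigned[i] is None and z < far:
                # Shade with this cascade's matrix, then mark it in the stencil.
                assigned[i] = cascade
    return assigned
```

Each pixel is shaded exactly once with a single UV computation, which is the point of the scheme.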
One way to create the blur mask: box-downsample the full-screen shadow mask to quarter resolution by sampling the 4 corners of each 16-pixel block and averaging. That’s it. Sample that smaller texture at full-screen resolution and write to stencil, alpha testing on whether the bilinearly sampled value is anything other than 0 or 1. If it has a gradient (i.e. it was blurred by the box downsample), then it is a natural edge, and it was naturally dilated by the downsample.
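Here is a CPU sketch of this downsample-based edge mask. It makes three assumptions:

- A 4×4 box average, which is what the 4 bilinear taps per 16-pixel block produce when they land between 2×2 texel groups.
- Bilinear upsampling with clamp-to-edge.
- A small epsilon for the “not 0 or 1” test.

A boolean mask stands in for the stencil, and the function names are hypothetical.

```python
def downsample_4x4(mask):
    """Box-average each 4x4 block of a 2D list (dimensions multiple of 4)."""
    h, w = len(mask), len(mask[0])
    return [[sum(mask[by * 4 + dy][bx * 4 + dx]
                 for dy in range(4) for dx in range(4)) / 16.0
             for bx in range(w // 4)]
            for by in range(h // 4)]


def bilinear(tex, u, v):
    """Bilinear sample; u, v in texels of tex (centers at n + 0.5), clamped."""
    h, w = len(tex), len(tex[0])
    x = min(max(u - 0.5, 0.0), w - 1.0)
    y = min(max(v - 0.5, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = tex[y0][x0] * (1 - fx) + tex[y0][x1] * fx
    bottom = tex[y1][x0] * (1 - fx) + tex[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def blur_edge_mask(mask, eps=1e-6):
    """True where the upsampled low-res mask is neither 0 nor 1 (an edge)."""
    low = downsample_4x4(mask)
    h, w = len(mask), len(mask[0])
    return [[eps < bilinear(low, (x + 0.5) / 4.0, (y + 0.5) / 4.0) < 1.0 - eps
             for x in range(w)]
            for y in range(h)]
```

On a hard vertical shadow edge, a few pixels on each side get flagged. Uniform regions stay unflagged, so the blur can be skipped there.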
@Aras Sorry, I just re-read your code snippet. When I skimmed it, I initially thought it was a weighted blur; now I realize it’s the code for selecting the UVs from the atlas. Very clever (I will actually “read” fully next time).